Hi, I have a problem that is easy in plain Scala code, but I cannot take the
top N from an RDD and get the result back as an RDD.
There is a set of StudentScore records. The task is to take the top 10 ages,
and then the top 10 records from each age, so the result is 100 records.
The Scala code is here, but how can I do it with an RDD, given that RDD.take
returns an Array?
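For readers following along, the query can be sketched on local collections roughly as follows. This is not the code from the original post. The StudentScore fields, the object name LocalTopN, and the reading of "top 10 ages" as "the 10 largest age values" are all assumptions made for illustration.

```scala
// Hypothetical record type; the field names are assumed, not from the post.
case class StudentScore(age: Int, name: String, score: Int)

object LocalTopN {
  // Keep the `ages` largest age groups, then the `perAge` best scores in each.
  def topPerAge(scores: Seq[StudentScore], ages: Int, perAge: Int): Seq[StudentScore] =
    scores
      .groupBy(_.age).toSeq
      .sortBy { case (age, _) => -age }   // largest ages first
      .take(ages)
      .flatMap { case (_, group) => group.sortBy(-_.score).take(perAge) }
}
```

Calling LocalTopN.topPerAge(scores, 10, 10) would give at most 100 records, which matches the count described above.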
To convert an Array or any List to an RDD, we can try:
sc.parallelize(groupedScore) // or whatever the name of the list variable is
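A minimal sketch of that round trip, assuming Spark is on the classpath and a local master is acceptable. The object name TopAsRdd, the (age, score) pair layout, and the sample data are made up for illustration. Note that top returns a local Array on the driver, so this only makes sense when n is small.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object TopAsRdd {
  // Take the n highest-scoring (age, score) pairs and hand them back as an RDD.
  def topAsRdd(sc: SparkContext, scores: RDD[(Int, Int)], n: Int): RDD[(Int, Int)] = {
    // The Array lives on the driver; parallelize redistributes it.
    val best: Array[(Int, Int)] = scores.top(n)(Ordering.by[(Int, Int), Int](_._2))
    sc.parallelize(best)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("top-as-rdd").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Assumed sample data: (age, score)
      val scores = sc.parallelize(Seq((18, 91), (19, 77), (18, 64)))
      topAsRdd(sc, scores, 2).collect().foreach(println)
    } finally sc.stop()
  }
}
```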
On Mon, Dec 1, 2014 at 8:14 PM, Xuefeng Wu ben...@gmail.com wrote:
Hi, I have a problem, it is easy in Scala code, but I can not take the top
N
rdd.top collects the result on the master (the driver).
If you want the top k per key, run map / mapPartitions, put the values into a
bounded priority queue, and then reduceByKey the queues.
I experimented with top-k from Algebird and with a bounded priority queue
wrapped over the Java PriorityQueue (the Spark default). The bounded priority
queue is faster.
Code example is here:
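This is not the code linked above, only a minimal sketch of the idea as described: build one bounded queue per key inside mapPartitions, then merge the queues with reduceByKey. It assumes Spark on the classpath, a local master, and (age, score) pairs with Int scores. The names TopKPerKey, newHeap, add, merge, and toDescending are hypothetical helpers, not Spark API.

```scala
import java.util.{PriorityQueue => JPriorityQueue}
import org.apache.spark.{SparkConf, SparkContext}
import scala.collection.mutable

object TopKPerKey {
  // java.util.PriorityQueue is a min-heap, so peek() is the smallest kept score.
  type Heap = JPriorityQueue[Int]

  def newHeap(k: Int): Heap = {
    require(k > 0, "k must be positive")
    new JPriorityQueue[Int](k, Ordering[Int])
  }

  // Add one score while keeping at most k elements (the k largest seen so far).
  def add(k: Int)(heap: Heap, score: Int): Heap = {
    if (heap.size < k) heap.add(score)
    else if (score > heap.peek()) { heap.poll(); heap.add(score) }
    heap
  }

  // Merge two bounded heaps into a fresh one without mutating the inputs.
  def merge(k: Int)(a: Heap, b: Heap): Heap = {
    val out = newHeap(k)
    Seq(a, b).foreach { h =>
      val it = h.iterator()
      while (it.hasNext) add(k)(out, it.next())
    }
    out
  }

  // Drain a copy of the heap into a list sorted from largest to smallest.
  def toDescending(heap: Heap): List[Int] = {
    val copy = newHeap(math.max(heap.size, 1))
    copy.addAll(heap)
    var out = List.empty[Int]
    while (!copy.isEmpty) out = copy.poll() :: out
    out
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("topk-per-key").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val k = 2
      // Assumed sample data: (age, score)
      val scores = sc.parallelize(Seq((18, 91), (18, 64), (18, 80), (19, 77), (19, 55)))
      val topK = scores
        .mapPartitions { records =>
          // One bounded heap per key within this partition.
          val heaps = mutable.Map.empty[Int, Heap]
          records.foreach { case (age, score) =>
            add(k)(heaps.getOrElseUpdate(age, newHeap(k)), score)
          }
          heaps.iterator
        }
        .reduceByKey((a, b) => merge(k)(a, b))
        .mapValues(toDescending)
      topK.collect().foreach(println)
    } finally sc.stop()
  }
}
```

Only k values per key per partition cross the shuffle, and the result stays an RDD, so nothing has to be collected on the driver until you ask for it.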
Hi Debasish,
I found this code in the map transformation.
Would it collect all the products too?
+ val sortedProducts = products.toArray.sorted(ord.reverse)
Yours respectfully, Xuefeng Wu 吴雪峰
On December 2, 2014, at 1:33 AM, Debasish Das debasish.da...@gmail.com wrote:
rdd.top collects it on master...
If you want