How take top N of top M from RDD as RDD

2014-12-01 Thread Xuefeng Wu
Hi, I have a problem that is easy in plain Scala code, but I cannot take the top N from an RDD and keep the result as an RDD. There are 1 Student Score, and the task is to take the top 10 ages, and then take the top 10 from each age, so the result is 100 records. The Scala code is here, but how can I do it with an RDD, since RDD.take returns an Array,
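
For reference, the plain Scala version of the task might look like the sketch below. The `Student` case class, its fields, and the reading of "top 10 ages" as the ten highest distinct ages are assumptions, since the linked code was not preserved in this archive.

```scala
// Hypothetical record type; the original post does not show its schema.
case class Student(name: String, age: Int, score: Double)

object TopNOfTopM {
  /** Keep the m highest distinct ages, then the n highest scores within each age. */
  def topNOfTopM(students: Seq[Student], m: Int, n: Int): Seq[Student] = {
    // The m highest distinct ages, in descending order.
    val topAges = students.map(_.age).distinct.sorted(Ordering[Int].reverse).take(m)
    val byAge = students.groupBy(_.age)
    // For each selected age, sort by descending score and keep at most n records.
    topAges.flatMap(age => byAge(age).sortBy(-_.score).take(n))
  }
}
```

With m = 10 and n = 10 this yields at most 100 records, matching the count in the question, but everything here lives in local collections rather than in an RDD.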

Re: How take top N of top M from RDD as RDD

2014-12-01 Thread Ritesh Kumar Singh
To convert an Array or any List to an RDD, we can try using: sc.parallelize(groupedScore) // or whatever the name of the list variable is
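
A hedged sketch of that round trip, assuming Spark is on the classpath and a `SparkContext` already exists. The object and method names (`TakeAsRdd`, `topAsRdd`) are hypothetical. Note that `top` first brings the selected elements to the driver, so this only suits small results.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

object TakeAsRdd {
  /** Take the n largest elements on the driver, then redistribute them as an RDD. */
  def topAsRdd[T: ClassTag: Ordering](sc: SparkContext, rdd: RDD[T], n: Int): RDD[T] = {
    val local: Array[T] = rdd.top(n) // a local Array held by the driver
    sc.parallelize(local.toSeq)      // back to a (small) distributed RDD
  }
}
```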

Re: How take top N of top M from RDD as RDD

2014-12-01 Thread Debasish Das
rdd.top collects the result on the master... If you want top-k per key, run map / mapPartitions, use a bounded priority queue, and reduceByKey the queues. I experimented with top-k from Algebird and with a bounded priority queue wrapped over the Java PriorityQueue (Spark's default)... the BPQ is faster. Code example is here:
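
Since the linked example was not preserved, here is a minimal sketch of the per-key top-k idea, assuming an RDD of (key, value) pairs such as (age, score). It is not the code referred to above: it swaps the bounded priority queue for a small sorted `Vector` capped at k, which is simpler but re-sorts on every insert, and it uses `aggregateByKey` rather than map plus reduceByKey. The names `TopKPerKey`, `insert`, `merge`, `topKPerKey`, and `topNOfTopM` are hypothetical.

```scala
// Harmless on newer Spark; supplies the pair-RDD implicits on versions before 1.3.
import org.apache.spark.SparkContext._
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

object TopKPerKey {
  /** Add one value to a descending buffer, keeping at most k elements. */
  def insert[V](buf: Vector[V], v: V, k: Int)(implicit ord: Ordering[V]): Vector[V] =
    (buf :+ v).sorted(ord.reverse).take(k)

  /** Merge two descending buffers, keeping at most k elements. */
  def merge[V](a: Vector[V], b: Vector[V], k: Int)(implicit ord: Ordering[V]): Vector[V] =
    (a ++ b).sorted(ord.reverse).take(k)

  /** The k largest values for every key, returned as an RDD. */
  def topKPerKey[K: ClassTag, V: ClassTag: Ordering](rdd: RDD[(K, V)], k: Int): RDD[(K, V)] =
    rdd
      .aggregateByKey(Vector.empty[V])(insert(_, _, k), merge(_, _, k))
      .flatMap { case (key, vs) => vs.map(v => (key, v)) }

  /** The n largest values for each of the m largest keys. */
  def topNOfTopM[K: ClassTag: Ordering, V: ClassTag: Ordering](
      rdd: RDD[(K, V)], m: Int, n: Int): RDD[(K, V)] = {
    // top(m) returns an Array to the driver, but it holds only m keys.
    val topKeys = rdd.keys.distinct().top(m).toSet
    topKPerKey(rdd.filter { case (key, _) => topKeys(key) }, n)
  }
}
```

Each per-key buffer never holds more than k values after an insert or merge, so only the small set of top keys is ever collected on the driver and the final result stays an RDD.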

Re: How take top N of top M from RDD as RDD

2014-12-01 Thread Xuefeng Wu
Hi Debasish, I found this test code in the map transformation; would it collect all the products too? + val sortedProducts = products.toArray.sorted(ord.reverse) Yours, Xuefeng Wu