Re: Index-wise most frequently occurring element

2015-01-27 Thread Sven Krasser
Use combineByKey. Taking the top 10 as an example (the bottom 10 works
similarly): add each element to a list. If the list grows larger than 10,
delete the smallest elements until its size is back to 10.
-Sven
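
The suggestion above can be sketched roughly as follows. This is only an
illustration, not code from the thread: the object name TopN, the limit n,
and the local helper topPerIndex are hypothetical, and it assumes each
(word, count) pair already carries its final count. The three functions have
the shapes combineByKey expects; topPerIndex merely imitates, on a local
collection, what Spark would do per key inside one partition, so the sketch
runs without a cluster.

```scala
// Sketch only: TopN, n, and topPerIndex are hypothetical names for illustration.
object TopN {
  type Entry = (String, Int) // (word, count)
  val n = 10

  // Keep only the n entries with the largest counts.
  def trim(entries: List[Entry]): List[Entry] =
    entries.sortBy(e => -e._2).take(n)

  // The three functions that combineByKey takes.
  def createCombiner(v: Entry): List[Entry] = List(v)
  def mergeValue(acc: List[Entry], v: Entry): List[Entry] = trim(v :: acc)
  def mergeCombiners(a: List[Entry], b: List[Entry]): List[Entry] = trim(a ::: b)

  // Local stand-in for the per-key merging Spark performs within a partition.
  def topPerIndex(data: Seq[(Int, Entry)]): Map[Int, List[Entry]] =
    data.foldLeft(Map.empty[Int, List[Entry]]) { case (m, (idx, v)) =>
      val merged = m.get(idx).map(acc => mergeValue(acc, v)).getOrElse(createCombiner(v))
      m.updated(idx, merged)
    }

  // With an RDD[(Int, (String, Int))] named rdd (assumed), the Spark call would be:
  //   rdd.combineByKey(createCombiner _, mergeValue _, mergeCombiners _)
}
```

For the bottom 10, trim would sort ascending instead (sortBy(_._2)) and the
rest stays the same. Because each list never holds more than n + 1 entries
before trimming, the per-key state stays small regardless of how many
elements an index has.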

On Tue, Jan 27, 2015 at 3:35 AM, kundan kumar wrote:

> I have an array of the form
>
> val array: Array[(Int, (String, Int))] = Array(
>   (idx1,(word1,count1)),
>   (idx2,(word2,count2)),
>   (idx1,(word1,count1)),
>   (idx3,(word3,count1)),
>   (idx4,(word4,count4)))
>
> I want to get the top 10 and bottom 10 elements from this array for each
> index (idx1, idx2, ...). Basically, I want the 10 most frequently occurring
> and the 10 least frequently occurring elements for each index value.
>
> Please suggest how to achieve this in Spark in the most efficient way. I
> have tried using for loops over each index, but this runs sequentially and
> makes the program too slow.
>
> Thanks,
>
> Kundan


-- 
http://sites.google.com/site/krasser/?utm_source=sig


Index-wise most frequently occurring element

2015-01-27 Thread kundan kumar
I have an array of the form

val array: Array[(Int, (String, Int))] = Array(
  (idx1,(word1,count1)),
  (idx2,(word2,count2)),
  (idx1,(word1,count1)),
  (idx3,(word3,count1)),
  (idx4,(word4,count4)))

I want to get the top 10 and bottom 10 elements from this array for each
index (idx1, idx2, ...). Basically, I want the 10 most frequently occurring
and the 10 least frequently occurring elements for each index value.

Please suggest how to achieve this in Spark in the most efficient way. I
have tried using for loops over each index, but this runs sequentially and
makes the program too slow.

Thanks,

Kundan