Re: Use combineByKey and StatCount

2014-04-14 Thread dachuan
It seems you can imitate the implementation of RDD.top(). For each partition, compute the number of records and the sum of their values. Then, in the final result handler, add the per-partition sums together and the per-partition counts together. Dividing the total sum by the total count gives you the mean (the arithmetic mean, that is). On Tue, Apr
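A minimal sketch of that idea, applied per key as the original question asks. It runs in plain Python without Spark: the "partitions" are ordinary lists, and the helper names (create_combiner, merge_value, merge_combiners, combine_partition, mean_by_key) and the sample data are made up for illustration. The three combiner functions follow the shape that combineByKey expects, with a (sum, count) pair as the running state.

```python
# Running state per key is a (sum, count) pair.

def create_combiner(value):
    # First value seen for a key inside a partition.
    return (value, 1)

def merge_value(acc, value):
    # Fold one more value into a partition-local (sum, count).
    return (acc[0] + value, acc[1] + 1)

def merge_combiners(a, b):
    # Merge (sum, count) pairs that came from different partitions.
    return (a[0] + b[0], a[1] + b[1])

def combine_partition(records):
    # Hypothetical stand-in for the work done on a single partition.
    acc = {}
    for key, value in records:
        if key in acc:
            acc[key] = merge_value(acc[key], value)
        else:
            acc[key] = create_combiner(value)
    return acc

def mean_by_key(partitions):
    # Hypothetical stand-in for the final result handler:
    # add the sums together, add the counts together, then divide.
    merged = {}
    for part in partitions:
        for key, acc in combine_partition(part).items():
            if key in merged:
                merged[key] = merge_combiners(merged[key], acc)
            else:
                merged[key] = acc
    return {key: total / count for key, (total, count) in merged.items()}

# Two simulated partitions of (key, value) records.
partitions = [
    [("a", 1.0), ("b", 4.0), ("a", 3.0)],
    [("b", 6.0), ("a", 5.0)],
]
means = mean_by_key(partitions)
print(means)  # {'a': 3.0, 'b': 5.0}
```

With a real pair RDD in PySpark, the same three functions should be usable as rdd.combineByKey(create_combiner, merge_value, merge_combiners).mapValues(lambda p: p[0] / p[1]). Carrying (sum, count) rather than a running mean keeps the merge step a simple addition, so the result does not depend on how the records are split across partitions.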

Use combineByKey and StatCount

2014-04-01 Thread Jaonary Rabarisoa
Hi all, can someone give me some tips on computing the mean of an RDD by key, maybe with combineByKey and StatCounter? Cheers, Jaonary