Re: about aggregateByKey and standard deviation

2014-11-03 Thread Kamal Banga
I don't think this can be done directly with .aggregateByKey(), because we will need the count of values per key (for the average). Maybe we can use .countByKey(), which returns a map, and .foldByKey(0)(_+_) (or aggregateByKey()), which gives the sum of values per key. I'm not sure how to proceed from there myself. Regards On Fri, Oct 31,
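
One possible way around the missing count is to carry it inside the accumulator itself: keep (count, sum, sum of squares) per key, so a single aggregation yields both the average and the standard deviation. The sketch below is plain Python that only imitates the per-partition fold and cross-partition merge of aggregateByKey; the sample data, the helper aggregate_by_key, and the names ZERO, seq_op, comb_op and mean_and_std are assumptions for illustration, not something from this thread.

```python
import math

# Accumulator kept per key: (count, sum, sum of squares).
ZERO = (0, 0.0, 0.0)

def seq_op(acc, value):
    """Fold one value into a partition-local accumulator."""
    n, s, sq = acc
    return (n + 1, s + value, sq + value * value)

def comb_op(a, b):
    """Merge two accumulators for the same key from different partitions."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def mean_and_std(acc):
    """Population mean and standard deviation from (count, sum, sum of squares)."""
    n, s, sq = acc
    mean = s / n
    variance = max(sq / n - mean * mean, 0.0)  # clamp tiny negative rounding error
    return mean, math.sqrt(variance)

def aggregate_by_key(partitions, zero, seq, comb):
    """Hypothetical plain-Python stand-in: fold within each partition, then merge."""
    merged = {}
    for part in partitions:
        local = {}
        for key, value in part:
            local[key] = seq(local.get(key, zero), value)
        for key, acc in local.items():
            merged[key] = comb(merged[key], acc) if key in merged else acc
    return merged

# Made-up sample data, split across two partitions.
partitions = [
    [("k1", 1.0), ("k1", 2.0), ("k2", 10.0)],
    [("k1", 3.0), ("k2", 20.0), ("k2", 30.0)],
]
accumulators = aggregate_by_key(partitions, ZERO, seq_op, comb_op)
stats = {key: mean_and_std(acc) for key, acc in accumulators.items()}
```

With PySpark, the same three functions could presumably be passed as rdd.aggregateByKey(ZERO, seq_op, comb_op).mapValues(mean_and_std). This computes the population standard deviation (dividing by n), and the sum-of-squares formula can lose precision when the values are large compared with their spread.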

about aggregateByKey and standard deviation

2014-10-31 Thread qinwei
Hi, everyone

I have an RDD filled with data like

    (k1, v11)
    (k1, v12)
    (k1, v13)
    (k2, v21)
    (k2, v22)
    (k2, v23)
    ...

I want to calculate the average and standard deviation of (v11, v12, v13) and (v21, v22, v23), grouped by their keys    for