about aggregateByKey and standard deviation

qinwei Fri, 31 Oct 2014 00:59:26 -0700





Hi everyone,

I have an RDD filled with data like

    (k1, v11)
    (k1, v12)
    (k1, v13)
    (k2, v21)
    (k2, v22)
    (k2, v23)
    ...

I want to calculate the average and standard deviation of (v11, v12, v13)
and (v21, v22, v23), grouped by their keys. For the moment I have done that
using groupByKey and map, but I notice that groupByKey is very expensive, and
I cannot figure out how to do it using aggregateByKey. Is there a better way
to do this?
Thanks!


qinwei
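
For reference, here is a minimal sketch of the kind of accumulator the question is asking about. It is not taken from the thread. It assumes the values are numeric and that the population standard deviation is wanted. Each key carries a (count, sum, sum of squares) triple: one function folds a single value into the triple, and a second merges two triples from different partitions. The names ZERO, seq_op, comb_op, finalize and aggregate_by_key are hypothetical, and aggregate_by_key is only a local stand-in that mimics the per-partition fold followed by a merge, so the logic can be checked without a cluster.

```python
import math

# Accumulator per key: (count, sum, sum of squares). All names here are illustrative.
ZERO = (0, 0.0, 0.0)

def seq_op(acc, v):
    """Fold one value into a partition-local accumulator."""
    n, s, ss = acc
    return (n + 1, s + v, ss + v * v)

def comb_op(a, b):
    """Merge two accumulators built on different partitions."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def finalize(acc):
    """Turn (count, sum, sum of squares) into (mean, population std dev)."""
    n, s, ss = acc
    mean = s / n
    # Clamp at zero: rounding error can push the difference slightly negative.
    variance = max(ss / n - mean * mean, 0.0)
    return (mean, math.sqrt(variance))

def aggregate_by_key(partitions, zero, seq, comb):
    """Local stand-in for the distributed operation: fold each partition
    separately, then merge the per-partition results key by key."""
    partials = []
    for part in partitions:
        local = {}
        for k, v in part:
            local[k] = seq(local.get(k, zero), v)
        partials.append(local)
    merged = {}
    for local in partials:
        for k, acc in local.items():
            merged[k] = comb(merged[k], acc) if k in merged else acc
    return merged

# Made-up sample data, split across two simulated partitions.
partitions = [
    [("k1", 1.0), ("k2", 2.0), ("k1", 2.0)],
    [("k1", 3.0), ("k2", 4.0), ("k2", 6.0)],
]
merged = aggregate_by_key(partitions, ZERO, seq_op, comb_op)
stats = {k: finalize(acc) for k, acc in merged.items()}

# Assumption: with PySpark, the same functions could be passed as
#   rdd.aggregateByKey(ZERO, seq_op, comb_op).mapValues(finalize)
```

Only one small triple per key is moved between partitions, instead of every value as with groupByKey. One caveat: the sum-of-squares formula can lose precision when the values are large compared to their spread, so a mean/M2 style accumulator may be a safer choice in that case.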
