Hi everyone,

I have an RDD filled with data like

(k1, v11) (k1, v12) (k1, v13) (k2, v21) (k2, v22) (k2, v23) ...

I want to calculate the average and standard deviation of (v11, v12, v13) and (v21, v22, v23), grouped by their keys. For the moment I have done that using groupByKey and map. I notice that groupByKey is very expensive, but I cannot figure out how to do it using aggregateByKey, so I wonder: is there a better way to do this?

Thanks!
qinwei
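For reference, one possible way to phrase this with aggregateByKey is to keep a small running accumulator per key instead of collecting all the values. The sketch below is only an illustration under some assumptions: the values are numeric, the population (not sample) standard deviation is wanted, and the names ZERO, seq_op, comb_op, finalize and local_aggregate_by_key are made up for this example. The last helper is a plain-Python stand-in for what Spark does across partitions, so the snippet runs without a cluster.

```python
import math

# Accumulator per key: (count, sum, sum of squares). Hypothetical layout.
ZERO = (0, 0.0, 0.0)

def seq_op(acc, v):
    """Fold one value into a partition-local accumulator."""
    n, s, sq = acc
    return (n + 1, s + v, sq + v * v)

def comb_op(a, b):
    """Merge two accumulators that came from different partitions."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def finalize(acc):
    """Turn (count, sum, sum of squares) into (mean, population std dev)."""
    n, s, sq = acc
    mean = s / n
    # Clamp at zero: rounding can make the difference slightly negative.
    variance = max(sq / n - mean * mean, 0.0)
    return (mean, math.sqrt(variance))

def local_aggregate_by_key(partitions):
    """Plain-Python imitation of aggregateByKey, for illustration only."""
    merged = {}
    for part in partitions:
        # Step 1: fold values within one partition using seq_op.
        accs = {}
        for k, v in part:
            accs[k] = seq_op(accs.get(k, ZERO), v)
        # Step 2: merge the partition-local results using comb_op.
        for k, acc in accs.items():
            merged[k] = comb_op(merged.get(k, ZERO), acc)
    return {k: finalize(acc) for k, acc in merged.items()}
```

With PySpark the same three functions would be passed as rdd.aggregateByKey(ZERO, seq_op, comb_op).mapValues(finalize), so only three numbers per key are shuffled rather than every value. Note that the sum-of-squares formula can lose precision when the values are large and close together; a merge-based running variance would be more robust in that case.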
- about aggregateByKey and standard deviation qinwei
- Re: about aggregateByKey and standard deviation Kamal Banga