Hi All.

I am trying to understand why accumulators and accumulables seem to be the 
most recommended way to compute running sums, averages, variances and the 
like, when the aggregate method seems to me to be the right tool. I have no 
performance measurements yet, but aggregate seems simpler and more intuitive 
(and it does what one might expect an accumulator to do), whereas accumulators 
and accumulables seem to bring extra complications and overhead.
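
To make the aggregate side concrete, here is a rough sketch of what I have in 
mind. It is plain Python with no Spark involved: the list of lists is a 
made-up stand-in for an RDD's partitions, and the names seq_op, comb_op and 
the local aggregate helper are only illustrative. As far as I understand, 
PySpark's rdd.aggregate(zeroValue, seqOp, combOp) takes the same three pieces.

```python
from functools import reduce

# Hypothetical stand-in for an RDD: each inner list plays the role of a partition.
partitions = [[1.0, 2.0], [3.0, 4.0, 5.0]]

# Zero value: (count, sum, sum of squares).
zero = (0, 0.0, 0.0)

def seq_op(acc, x):
    """Fold one element into a partition-local accumulator."""
    n, s, sq = acc
    return (n + 1, s + x, sq + x * x)

def comb_op(a, b):
    """Merge two partition-local accumulators."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def aggregate(parts, zero, seq_op, comb_op):
    """Local imitation of the aggregate pattern: fold each partition, then merge."""
    per_partition = [reduce(seq_op, p, zero) for p in parts]
    return reduce(comb_op, per_partition, zero)

count, total, total_sq = aggregate(partitions, zero, seq_op, comb_op)
mean = total / count
variance = total_sq / count - mean * mean  # population variance
```

The result comes back as an ordinary value from a single call, which is the 
part that feels more natural to me than updating an accumulator as a side 
effect.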

So...

What's the real difference between an accumulator/accumulable and aggregating 
an RDD? When is one method of aggregation preferred over the other?

Thanks,
Nate
