Flashback: RDD.aggregate versus accumulables...

2016-03-10 Thread jiml
And Lord Joe, you were right: future versions did protect accumulators in actions. I wonder if anyone has a "modern" take on the accumulator vs. aggregate question. It seems that if I need to do it by key or control partitioning, I would use aggregate. Bottom line question / reason for post: I wonder

Re: RDD.aggregate?

2014-12-11 Thread ll
Any explanation of how aggregate works would be much appreciated. I already looked at the Spark example and am still confused about seqOp and combOp... Thanks. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/RDD-aggregate-tp20434p20634.html Sent from
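
A rough sketch of what the question is about: `aggregate(zeroValue, seqOp, combOp)` folds the elements of each partition into an accumulator with `seqOp`, starting from `zeroValue`, and then merges the per-partition accumulators with `combOp`. The plain-Python model below needs no Spark. `aggregate_model` and the hand-made `partitions` list are hypothetical stand-ins for an RDD, used only to show the two steps; they are not Spark's implementation.

```python
from functools import reduce


def aggregate_model(partitions, zero_value, seq_op, comb_op):
    """Toy model of RDD.aggregate semantics over a list of partitions."""
    # Step 1: inside each partition, fold the elements into an accumulator.
    per_partition = [reduce(seq_op, part, zero_value) for part in partitions]
    # Step 2: merge the per-partition accumulators into one result.
    return reduce(comb_op, per_partition, zero_value)


def seq_op(acc, x):
    """Fold one element (a number) into a (sum, count) accumulator."""
    return (acc[0] + x, acc[1] + 1)


def comb_op(a, b):
    """Merge two (sum, count) accumulators."""
    return (a[0] + b[0], a[1] + b[1])


# Hypothetical data, split across three partitions.
partitions = [[1, 2], [3, 4, 5], [6]]

total, count = aggregate_model(partitions, (0, 0), seq_op, comb_op)
average = total / count
print(total, count, average)  # 21 6 3.5
```

Note that `seq_op` takes an accumulator and an element, while `comb_op` takes two accumulators, which is why the accumulator type (a pair here) can differ from the element type (a number). With a real PySpark RDD the corresponding call would be `rdd.aggregate((0, 0), seq_op, comb_op)`.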

Re: RDD.aggregate?

2014-12-11 Thread Gerard Maas
There's some explanation and an example here: http://stackoverflow.com/questions/26611471/spark-data-processing-with-grouping/26612246#26612246 -kr, Gerard. On Thu, Dec 11, 2014 at 7:15 PM, ll duy.huynh@gmail.com wrote: any explanation on how aggregate works would be much appreciated. i

RDD.aggregate?

2014-12-04 Thread ll
Can someone please explain how RDD.aggregate works? I looked at the average example done with aggregate() but I'm still confused about this function... Much appreciated. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/RDD-aggregate-tp20434.html Sent from

Re: RDD.aggregate versus accumulables...

2014-11-17 Thread Daniel Siegmann
You should *never* use accumulators for this purpose because you may get incorrect answers. Accumulators can count the same thing multiple times - you cannot rely upon the correctness of the values they compute. See SPARK-732 https://issues.apache.org/jira/browse/SPARK-732 for more info. On Sun,
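
The over-counting described here can be illustrated with a toy model, assuming a partition gets computed more than once. Everything below is hypothetical: `ToyAccumulator` and `run_task` are plain-Python stand-ins, and the recomputation of partition 1 is hard-coded for illustration. It is not a model of Spark's scheduler.

```python
class ToyAccumulator:
    """Hypothetical stand-in for an accumulator: a counter updated by side effect."""

    def __init__(self):
        self.value = 0

    def add(self, n):
        self.value += n


def run_task(partition, acc):
    """One attempt at a partition: bumps acc per element, returns (sum, count)."""
    for _ in partition:
        acc.add(1)
    return (sum(partition), len(partition))


partitions = [[1, 2], [3, 4, 5], [6]]
acc = ToyAccumulator()
results = {}

# First pass: every partition is computed once.
for index, part in enumerate(partitions):
    results[index] = run_task(part, acc)

# Assumption for illustration: partition 1 is recomputed (say its output was lost).
# The new result replaces the old one, but the side effects happen a second time.
results[1] = run_task(partitions[1], acc)

returned_count = sum(count for _, count in results.values())
print(acc.value, returned_count)  # 9 6
```

The count carried in the returned results stays at 6 because only one result per partition is kept, whereas the side-effect counter reaches 9. That difference is the reason values returned through `aggregate` are not exposed to this failure mode in the toy model.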

Re: RDD.aggregate versus accumulables...

2014-11-17 Thread Surendranauth Hiraman
We use Algebird for calculating things like min/max, stddev, variance, etc. https://github.com/twitter/algebird/wiki -Suren On Mon, Nov 17, 2014 at 11:32 AM, Daniel Siegmann daniel.siegm...@velos.io wrote: You should *never* use accumulators for this purpose because you may get incorrect

RE: RDD.aggregate versus accumulables...

2014-11-17 Thread Segerlind, Nathan L
You should never use accumulators for this purpose because you may get incorrect answers. Accumulators can count the same thing multiple times - you cannot rely upon the correctness of the values they compute. See SPARK-732 https://issues.apache.org/jira/browse/SPARK

RE: RDD.aggregate versus accumulables...

2014-11-17 Thread lordjoe
I have been playing with using accumulators (despite the possible error with multiple attempts). These provide a convenient way to get some numbers while still performing business logic. I posted some sample code at http://lordjoesoftware.blogspot.com/. Even if accumulators are not perfect today -

RDD.aggregate versus accumulables...

2014-11-16 Thread Segerlind, Nathan L
Hi All. I am trying to get my head around why using accumulators and accumulables seems to be the most recommended method for accumulating running sums, averages, variances and the like, whereas the aggregate method seems to me to be the right one. I have no performance measurements as of yet,
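
As a hedged sketch of the aggregate-based alternative raised in this message, a count, mean, and variance can be carried in one mergeable state. The example assumes a `(count, mean, m2)` triple updated per element with Welford's method and merged across partitions with the pairwise formula of Chan et al.; this is one possible choice, not something taken from the thread. The `partitions` list is a hypothetical stand-in for an RDD's partitions, and no Spark is required to run it.

```python
from functools import reduce


def seq_op(acc, x):
    """Fold one value into a (count, mean, m2) state (Welford's update)."""
    n, mean, m2 = acc
    n += 1
    delta = x - mean
    mean += delta / n
    m2 += delta * (x - mean)
    return (n, mean, m2)


def comb_op(a, b):
    """Merge two (count, mean, m2) states (pairwise formula of Chan et al.)."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    if n == 0:
        return (0, 0.0, 0.0)
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return (n, mean, m2)


zero = (0, 0.0, 0.0)
# Hypothetical data, split across three partitions.
partitions = [[2.0, 4.0], [4.0, 4.0, 5.0], [5.0, 7.0, 9.0]]

per_partition = [reduce(seq_op, part, zero) for part in partitions]
n, mean, m2 = reduce(comb_op, per_partition, zero)
variance = m2 / n  # population variance; use m2 / (n - 1) for the sample variance
print(n, mean, variance)
```

For this data the result is a count of 8, a mean of 5, and a population variance of 4, up to floating-point rounding. With a PySpark RDD the same two functions could be passed as `rdd.aggregate(zero, seq_op, comb_op)`.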