And Lord Joe, you were right: future versions did protect accumulators in
actions. I wonder if anyone has a "modern" take on the accumulator vs.
aggregate question. It seems that if I need to aggregate by key or control
partitioning, I would use aggregate.
Bottom line question / reason for post: I wonder
Any explanation of how aggregate works would be much appreciated. I already
looked at the Spark example and am still confused about the seqOp and
combOp... Thanks.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/RDD-aggregate-tp20434p20634.html
Sent from
There's some explanation and an example here:
http://stackoverflow.com/questions/26611471/spark-data-processing-with-grouping/26612246#26612246
-kr, Gerard.
On Thu, Dec 11, 2014 at 7:15 PM, ll duy.huynh@gmail.com wrote:
Any explanation of how aggregate works would be much appreciated. I
Can someone please explain how RDD.aggregate works? I looked at the average
example done with aggregate(), but I'm still confused about this function...
Much appreciated.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/RDD-aggregate-tp20434.html
Sent from
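For readers with the same question, here is a minimal plain-Python sketch of
what aggregate does with its two functions. It is a model of the semantics
only, not Spark code: the helper name `aggregate`, the list-of-lists standing
in for partitions, and the sample numbers are all assumptions made for
illustration. The seqOp folds each element of one partition into an
accumulator that starts at the zero value; the combOp merges the accumulators
produced by different partitions.

```python
from functools import reduce


def aggregate(partitions, zero, seq_op, comb_op):
    """Plain-Python model of RDD.aggregate (hypothetical helper, not Spark)."""
    # seqOp: inside each partition, fold every element into an accumulator
    # that starts from the zero value.
    partials = [reduce(seq_op, part, zero) for part in partitions]
    # combOp: merge the per-partition accumulators into one result.
    return reduce(comb_op, partials, zero)


# Made-up data, already split into three "partitions".
partitions = [[1, 2, 3], [4, 5], [6]]

# The accumulator is a (sum, count) pair, so its type differs from the
# element type; that is why two separate functions are needed.
total, count = aggregate(
    partitions,
    (0, 0),
    lambda acc, x: (acc[0] + x, acc[1] + 1),   # seqOp: accumulator + element
    lambda a, b: (a[0] + b[0], a[1] + b[1]),   # combOp: accumulator + accumulator
)
average = total / count
```

The zero value here is an immutable tuple, so reusing it for every partition
is safe in this sketch.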
You should *never* use accumulators for this purpose because you may get
incorrect answers. Accumulators can count the same thing multiple times -
you cannot rely upon the correctness of the values they compute. See
SPARK-732 https://issues.apache.org/jira/browse/SPARK-732 for more info.
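The double-counting concern can be illustrated with a plain-Python sketch.
This is not Spark's scheduler or its accumulator API: the `Counter` class and
`run_task` function are hypothetical stand-ins, and the second call simply
models a task whose work is executed more than once. The point is that a
side-effecting counter records every execution, while a value derived from
the returned result does not.

```python
class Counter:
    """Hypothetical stand-in for an accumulator: a side-effecting counter."""

    def __init__(self):
        self.value = 0

    def add(self, n):
        self.value += n


def run_task(partition, counter):
    """Hypothetical task: doubles each element and counts it as a side effect."""
    out = []
    for x in partition:
        counter.add(1)       # side effect, applied on every execution
        out.append(x * 2)    # the actual result of the task
    return out


partition = [1, 2, 3]
seen = Counter()

first_attempt = run_task(partition, seen)    # imagine this result is lost
second_attempt = run_task(partition, seen)   # the same work is executed again

counted_by_side_effect = seen.value          # includes both executions
counted_from_result = len(second_attempt)    # depends only on the final result
```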
On Sun,
We use Algebird for calculating things like min/max, stddev, variance, etc.
https://github.com/twitter/algebird/wiki
-Suren
On Mon, Nov 17, 2014 at 11:32 AM, Daniel Siegmann daniel.siegm...@velos.io
wrote:
You should *never* use accumulators for this purpose because you may get
incorrect
: RDD.aggregate versus accumulables...
I have been playing with accumulators (despite the possible errors with
multiple attempts). These provide a convenient way to get some numbers while
still performing business logic.
I posted some sample code at http://lordjoesoftware.blogspot.com/.
Even if accumulators are not perfect today -
Hi All.
I am trying to get my head around why using accumulators and accumulables seems
to be the most recommended method for accumulating running sums, averages,
variances and the like, whereas the aggregate method seems to me to be the
right one. I have no performance measurements as of yet,