How does the Spark Accumulator work under the covers?

2014-10-10 Thread Areg Baghdasaryan (BLOOMBERG/ 731 LEX -)
Hello, I was wondering what the Spark accumulator does under the covers. I've implemented my own associative addInPlace function for the accumulator; where is this function actually run? Let's say you call something like myRdd.map(x => sum += x): is "sum" being accumulated locally in any way, …
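For context, a minimal sketch of where addInPlace fits in the Spark 1.x API used in this thread; the vector element type and all names below are illustrative, not from the original message. Spark invokes addInPlace both inside each task (folding += updates into that task's local copy) and on the driver (merging each task's partial result into the global value):

    import org.apache.spark.{AccumulatorParam, SparkConf, SparkContext}

    // A custom accumulator over a vector of doubles (hypothetical example).
    object VectorAccumulatorParam extends AccumulatorParam[Vector[Double]] {
      // zero: identity element for merging, shaped like the initial value
      def zero(initial: Vector[Double]): Vector[Double] =
        Vector.fill(initial.length)(0.0)

      // addInPlace: associative merge of two partial results; runs in tasks
      // (via +=) and again on the driver when task results are combined
      def addInPlace(v1: Vector[Double], v2: Vector[Double]): Vector[Double] =
        v1.zip(v2).map { case (a, b) => a + b }
    }

    object AccumulatorSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("acc-sketch").setMaster("local[2]"))
        val vecAcc = sc.accumulator(Vector(0.0, 0.0))(VectorAccumulatorParam)
        sc.parallelize(Seq(Vector(1.0, 2.0), Vector(3.0, 4.0)))
          .foreach(v => vecAcc += v)  // tasks add; the driver merges
        println(vecAcc.value)        // Vector(4.0, 6.0), read on the driver
        sc.stop()
      }
    }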

Re: How does the Spark Accumulator work under the covers?

2014-10-10 Thread HARIPRIYA AYYALASOMAYAJULA
If you use parallelize, the data is distributed across the available nodes, the sum is computed individually within each partition, and the partial results are later merged; the driver manages the entire process. Is my understanding correct? Can someone please correct me if I am wrong? …
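That two-phase picture (per-partition accumulation, then a merge coordinated by the driver) can be made explicit with aggregate. This is only an analogy for the pattern, not the accumulator implementation itself, and it assumes an existing SparkContext named sc:

    // seqOp runs inside each task against its own partition; combOp merges
    // the per-partition partials, mirroring how task-local accumulator
    // updates are merged back at the driver.
    val rdd = sc.parallelize(1 to 100, 4)   // 4 partitions
    val total = rdd.aggregate(0)(
      (partial, x) => partial + x,          // seqOp: partition-local sum
      (a, b) => a + b                       // combOp: merge partial sums
    )
    // total == 5050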

Re: How does the Spark Accumulator work under the covers?

2014-10-10 Thread Jayant Shekhar
Hi Areg,

Check out http://spark.apache.org/docs/latest/programming-guide.html#accumulators

    val sum = sc.accumulator(0) // accumulator created from an initial value in the driver

The accumulator variable is created in the driver. Tasks running on the cluster can then add to it. However, they cannot read its value; only the driver program can, using its value method. …
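A minimal end-to-end sketch of that driver/task split, following the guide's API (assumes an existing SparkContext named sc):

    val sum = sc.accumulator(0)                     // created in the driver
    sc.parallelize(1 to 10).foreach(x => sum += x)  // tasks only add to it;
                                                    // their updates travel back
                                                    // with each task's result
    println(sum.value)                              // 55, read on the driver
    // Reading sum.value inside a task throws UnsupportedOperationException.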