[ https://issues.apache.org/jira/browse/SPARK-733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275608#comment-14275608 ]

Apache Spark commented on SPARK-733:
------------------------------------

User 'ilganeli' has created a pull request for this issue:
https://github.com/apache/spark/pull/4022

> Add documentation on use of accumulators in lazy transformation
> ---------------------------------------------------------------
>
>                 Key: SPARK-733
>                 URL: https://issues.apache.org/jira/browse/SPARK-733
>             Project: Spark
>          Issue Type: Bug
>          Components: Documentation
>            Reporter: Josh Rosen
>
> Accumulator updates are side effects of RDD computations. Unlike RDDs,
> accumulators do not carry lineage that would allow them to be computed when
> their values are accessed on the master.
> This can lead to confusion when accumulators are used in lazy transformations
> like `map`:
> {code}
> val acc = sc.accumulator(0)
> data.map { x => acc += x; f(x) }
> // Here, acc is still 0 because no action has caused the `map` to be computed.
> {code}
> As far as I can tell, our documentation only includes examples of using
> accumulators in `foreach`, for which this problem does not occur.
> This pattern of using accumulators in map() occurs in Bagel and other Spark
> code found in the wild.
> It might be nice to document this behavior in the accumulators section of the
> Spark programming guide.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
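The pitfall described in the issue is not specific to Spark: any lazily evaluated `map` defers side effects until something forces the computation. As a self-contained analogy (plain Python, not Spark; the names `counter` and `add_and_double` are illustrative, standing in for the accumulator and the mapped function), a sketch of the same behavior:

```python
# Analogy to the Spark accumulator pitfall: Python's built-in map() is lazy,
# so side effects in the mapped function run only when results are consumed.

counter = {"total": 0}  # stands in for the accumulator

def add_and_double(x):
    counter["total"] += x  # side effect, like `acc += x` inside a transformation
    return 2 * x

lazy = map(add_and_double, [1, 2, 3])
print(counter["total"])  # 0: nothing has been computed yet

forced = list(lazy)      # consuming the iterator plays the role of an action
print(counter["total"])  # 6: the side effects have now run
print(forced)            # [2, 4, 6]
```

As in Spark, the fix the issue suggests documenting is the same in spirit: only read the side-effect state after an action (here, after the iterator has been consumed), or perform the update inside an eager operation such as `foreach`.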