[ https://issues.apache.org/jira/browse/SPARK-30666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-30666:
------------------------------------

    Assignee:     (was: Apache Spark)

> Reliable single-stage accumulators
> ----------------------------------
>
>                 Key: SPARK-30666
>                 URL: https://issues.apache.org/jira/browse/SPARK-30666
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.1.0
>            Reporter: Enrico Minack
>            Priority: Major
>
> This proposes a pragmatic improvement to allow for reliable single-stage
> accumulators. Under the assumption that a given stage / partition / RDD
> produces identical results, non-deterministic code produces identical
> accumulator increments on success. Rerunning partitions for any reason should
> always produce the same increments per partition on success.
> With this pragmatic approach, increments from individual partitions / tasks
> are merged into the accumulator on the driver side only the first time each
> partition succeeds. This is useful for accumulators registered with
> {{countFailedValues == false}}. Hence, the accumulator aggregates each
> successful partition exactly once.
> The implementation requires extra memory that scales with the number of
> partitions.
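
The merge rule described in the issue can be illustrated with a minimal sketch. This is not Spark's implementation or API: the class `FirstWinsAccumulator` and its method `on_task_end` are hypothetical names invented for illustration, and the sketch assumes that every successful run of a given partition reports the same increment. The set of already-merged partition IDs is where the extra memory that scales with the number of partitions would come from.

```python
from typing import Callable, Generic, Set, TypeVar

T = TypeVar("T")


class FirstWinsAccumulator(Generic[T]):
    """Hypothetical driver-side accumulator (not a Spark class).

    Merges the increment of each partition at most once, and only
    for successful tasks (mirrors countFailedValues == false).
    """

    def __init__(self, zero: T, merge: Callable[[T, T], T]) -> None:
        self.value = zero
        self._merge = merge
        # One entry per merged partition: memory grows with partition count.
        self.merged_partitions: Set[int] = set()

    def on_task_end(self, partition_id: int, increment: T, succeeded: bool) -> bool:
        """Return True if the increment was merged, False if it was ignored."""
        if not succeeded:
            return False  # failed tasks never contribute
        if partition_id in self.merged_partitions:
            return False  # rerun of a partition that was already counted
        self.merged_partitions.add(partition_id)
        self.value = self._merge(self.value, increment)
        return True


acc = FirstWinsAccumulator(0, lambda a, b: a + b)
results = [
    acc.on_task_end(0, 10, succeeded=True),   # first success of partition 0
    acc.on_task_end(1, 5, succeeded=False),   # failed attempt of partition 1
    acc.on_task_end(1, 7, succeeded=True),    # successful retry of partition 1
    acc.on_task_end(0, 10, succeeded=True),   # rerun of partition 0, ignored
]
total = acc.value
```

In this example only the first successful run of partition 0 and the successful retry of partition 1 are merged, so the rerun of partition 0 does not double count.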