[ 
https://issues.apache.org/jira/browse/SPARK-30666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-30666:
------------------------------------

    Assignee:     (was: Apache Spark)

> Reliable single-stage accumulators
> ----------------------------------
>
>                 Key: SPARK-30666
>                 URL: https://issues.apache.org/jira/browse/SPARK-30666
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.1.0
>            Reporter: Enrico Minack
>            Priority: Major
>
> This proposes a pragmatic improvement to allow for reliable single-stage 
> accumulators. Under the assumption that a given stage / partition / RDD 
> produces identical results each time it runs, even non-deterministic code 
> produces identical accumulator increments on success. Rerunning a partition 
> for any reason should always produce the same increments for that partition 
> on success.
> With this pragmatic approach, the increments from an individual partition / 
> task are merged into the accumulator on the driver side only the first time 
> that partition succeeds. This is useful for accumulators registered with 
> {{countFailedValues == false}}. Hence, the accumulator aggregates each 
> successful partition exactly once.
> The implementations require extra memory that scales with the number of 
> partitions.
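
The driver-side merge rule described above can be illustrated with a minimal sketch. This is plain Python with no Spark dependency, not the code proposed in the ticket; the class name `OncePerPartitionAccumulator` and its `merge` method are hypothetical and chosen only for illustration. It assumes each successful task reports a partition id together with its increment.

```python
class OncePerPartitionAccumulator:
    """Hypothetical driver-side accumulator (not a Spark API).

    Merges the increment of each partition at most once, so reruns of an
    already merged partition do not change the total.
    """

    def __init__(self):
        self.value = 0
        # Extra memory that grows with the number of partitions:
        # one entry per partition whose increment was already merged.
        self._merged_partitions = set()

    def merge(self, partition_id, increment):
        """Merge the increment of a successful task.

        Returns True if the increment was merged, False if this partition
        had already contributed and the increment was ignored.
        """
        if partition_id in self._merged_partitions:
            return False
        self._merged_partitions.add(partition_id)
        self.value += increment
        return True


acc = OncePerPartitionAccumulator()
acc.merge(0, 10)   # first success of partition 0
acc.merge(1, 5)    # first success of partition 1
acc.merge(0, 10)   # partition 0 was rerun; its increment is ignored
total = acc.value
```

The set of merged partition ids is what makes the memory cost scale with the number of partitions. The sketch relies on the stated assumption that a rerun yields the same increment; if a rerun could yield a different one, keeping only the first would no longer be safe.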



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
