[ https://issues.apache.org/jira/browse/SPARK-30108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Herman van Hövell updated SPARK-30108:
--------------------------------------
    Description: 
Spark accumulators reflect the work that has been done, not the data that has 
been processed. There are situations where one tuple can be processed multiple 
times, e.g. task/stage retries, speculation, or the determination of ranges for 
a global ordering. For observed metrics we need the value of the accumulator to 
be based on the data, not on the processing.
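As a minimal sketch (not from this ticket) of the processing-based behaviour: a 
plain long accumulator updated inside a transformation counts every time a 
partition is processed, so a retried or speculative task re-adds the same rows.

{code:scala}
import org.apache.spark.sql.SparkSession

// Illustrative only: a plain accumulator tracks work done, not data processed.
val spark = SparkSession.builder().master("local[2]").appName("acc-sketch").getOrCreate()
val sc = spark.sparkContext

val rowsSeen = sc.longAccumulator("rowsSeen")
val data = sc.parallelize(1 to 1000, numSlices = 4)

// The accumulator is bumped inside a map; if a task is retried (failure,
// speculation, stage retry) its partition is processed and counted again.
data.map { x => rowsSeen.add(1); x }.count()

// Under retries this can print a value larger than 1000.
println(s"distinct rows: 1000, rowsSeen: ${rowsSeen.value}")
{code}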

The current aggregating accumulator is already robust to some of these issues 
(such as task failure), but we need to add additional checks to make sure it is 
foolproof.
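
For reference, a hedged sketch of the observable metrics API this accumulator 
backs (Dataset.observe, new in 3.0); the metric and column names below are 
illustrative. The observed values are delivered to a QueryExecutionListener and 
are computed from the data that flowed past the CollectMetrics node, so they 
must not change when the same data is processed more than once.

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.functions._
import org.apache.spark.sql.util.QueryExecutionListener

val spark = SparkSession.builder().master("local[2]").appName("observe-sketch").getOrCreate()
import spark.implicits._

// Listener that receives the observed metrics once the query finishes.
spark.listenerManager.register(new QueryExecutionListener {
  override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit = {
    // One Row per named observation, computed from the observed data itself.
    qe.observedMetrics.get("input_stats").foreach(row => println(s"input_stats: $row"))
  }
  override def onFailure(funcName: String, qe: QueryExecution, error: Exception): Unit = ()
})

val df = Seq((1, 10L), (2, 20L), (3, 30L)).toDF("id", "value")
df.observe("input_stats", count(lit(1)).as("rows"), sum($"value").as("total_value"))
  .write.format("noop").mode("overwrite").save()
{code}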

> Add robust accumulator for observable metrics
> ---------------------------------------------
>
>                 Key: SPARK-30108
>                 URL: https://issues.apache.org/jira/browse/SPARK-30108
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Herman van Hövell
>            Priority: Major
>
> Spark accumulators reflect the work that has been done, not the data that has 
> been processed. There are situations where one tuple can be processed multiple 
> times, e.g. task/stage retries, speculation, or the determination of ranges 
> for a global ordering. For observed metrics we need the value of the 
> accumulator to be based on the data, not on the processing.
> The current aggregating accumulator is already robust to some of these issues 
> (such as task failure), but we need to add additional checks to make sure it 
> is foolproof.


