[ https://issues.apache.org/jira/browse/SPARK-30108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17001619#comment-17001619 ]
Ankit Raj Boudh commented on SPARK-30108:
-----------------------------------------

[~hvanhovell], thank you. I will take care of this point during development of this feature.

> Add robust accumulator for observable metrics
> ---------------------------------------------
>
>                 Key: SPARK-30108
>                 URL: https://issues.apache.org/jira/browse/SPARK-30108
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Herman van Hövell
>            Priority: Major
>
> Spark accumulators reflect the work that has been done, not the data that
> has been processed. There are situations where one tuple can be processed
> multiple times, e.g. task/stage retries, speculation, or the determination
> of ranges for a global ordering. For observed metrics, the value of the
> accumulator must be based on the data, not on the processing.
> The current aggregating accumulator is already robust to some of these issues
> (like task failure), but we need to add some additional checks to make it
> foolproof.
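To make the "work vs. data" distinction concrete, here is a minimal Python sketch under simplifying assumptions. It is not Spark's API or the planned implementation: `TaskAttempt`, `NaiveSumAccumulator`, and `PartitionDedupAccumulator` are hypothetical names. It also assumes that recomputing a partition always yields the same rows. The naive accumulator drops failed attempts but adds every successful one, so speculative duplicates and stage retries are counted again. The deduplicating variant keeps only the first successful result per partition id.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class TaskAttempt:
    """One execution of a task over a single partition (hypothetical model)."""
    partition_id: int
    attempt_number: int
    rows: Tuple[int, ...]
    succeeded: bool


class NaiveSumAccumulator:
    """Tracks work done: every successful attempt's rows are added again."""

    def __init__(self) -> None:
        self.value = 0

    def add_attempt(self, attempt: TaskAttempt) -> None:
        # Failed attempts are dropped, so plain task failure is harmless here...
        if attempt.succeeded:
            # ...but speculative duplicates and stage retries are counted again.
            self.value += sum(attempt.rows)


class PartitionDedupAccumulator:
    """Tracks data processed: keeps one result per partition id."""

    def __init__(self) -> None:
        self._per_partition: Dict[int, int] = {}

    def add_attempt(self, attempt: TaskAttempt) -> None:
        if not attempt.succeeded:
            return
        # The first successful attempt for a partition wins; later attempts
        # over the same partition (retries, speculation) are ignored.
        self._per_partition.setdefault(attempt.partition_id, sum(attempt.rows))

    @property
    def value(self) -> int:
        return sum(self._per_partition.values())


def run(attempts: Iterable[TaskAttempt], acc) -> int:
    """Feed attempts to an accumulator in completion order and return its value."""
    for attempt in attempts:
        acc.add_attempt(attempt)
    return acc.value


# Two partitions whose data sums to 6 + 30 = 36.
ATTEMPTS = [
    TaskAttempt(0, 0, (1, 2, 3), succeeded=True),
    TaskAttempt(1, 0, (10, 20), succeeded=False),  # task failure
    TaskAttempt(1, 1, (10, 20), succeeded=True),   # retry after the failure
    TaskAttempt(1, 2, (10, 20), succeeded=True),   # speculative duplicate
    TaskAttempt(0, 1, (1, 2, 3), succeeded=True),  # stage retry recomputes partition 0
]

if __name__ == "__main__":
    print(run(ATTEMPTS, NaiveSumAccumulator()))        # 72
    print(run(ATTEMPTS, PartitionDedupAccumulator()))  # 36
```

Keying on the partition id only works if recomputing a partition is deterministic. If a retry could see different rows, a different check would be needed to keep the metric tied to the data.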