[ https://issues.apache.org/jira/browse/SPARK-30108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17138118#comment-17138118 ]
Wenchen Fan commented on SPARK-30108:
-------------------------------------

Is there any progress on this?

> Add robust accumulator for observable metrics
> ---------------------------------------------
>
>                 Key: SPARK-30108
>                 URL: https://issues.apache.org/jira/browse/SPARK-30108
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Herman van Hövell
>            Priority: Major
>
> Spark accumulators reflect the work that has been done, and not the data that
> has been processed. There are situations where one tuple can be processed
> multiple times, e.g. task/stage retries, speculation, determination of
> ranges for a global ordering, etc. For observed metrics we need the value of
> the accumulator to be based on the data and not on processing.
> The current aggregating accumulator is already robust to some of these issues
> (like task failure), but we need to add some additional checks to make sure
> it is foolproof.
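The distinction the issue description draws between "work done" and "data processed" can be illustrated with a small plain-Python simulation. This is only a sketch under stated assumptions: it does not use Spark, and `NaiveCounter`, `PerPartitionCounter`, and `run_job` are hypothetical names invented for illustration, not Spark APIs or the approach proposed in the ticket. It assumes every attempt of a partition runs to completion and produces the same result.

```python
from dataclasses import dataclass, field


@dataclass
class NaiveCounter:
    """Hypothetical counter that adds every update, so it measures work done."""
    value: int = 0

    def add(self, partition_id: int, row_count: int) -> None:
        # Every task attempt contributes, including retries and speculation.
        self.value += row_count


@dataclass
class PerPartitionCounter:
    """Hypothetical counter that keeps one result per partition (data processed)."""
    by_partition: dict = field(default_factory=dict)

    def add(self, partition_id: int, row_count: int) -> None:
        # A repeated attempt overwrites the earlier result instead of adding to it.
        self.by_partition[partition_id] = row_count

    @property
    def value(self) -> int:
        return sum(self.by_partition.values())


def run_job(partitions, attempts_per_partition, counter):
    """Simulate a job in which partition i is processed attempts_per_partition[i] times."""
    for partition_id, rows in enumerate(partitions):
        for _ in range(attempts_per_partition[partition_id]):
            counter.add(partition_id, len(rows))
    return counter.value


# Six rows in three partitions; partition 1 is processed twice
# (standing in for a retry or a speculative attempt).
partitions = [[1, 2, 3], [4, 5], [6]]
attempts = [1, 2, 1]

naive_total = run_job(partitions, attempts, NaiveCounter())
robust_total = run_job(partitions, attempts, PerPartitionCounter())
print(naive_total)   # 8: counts the repeated attempt
print(robust_total)  # 6: matches the number of rows in the data
```

Keying results by partition is just one way to make a count insensitive to repeated processing in this toy model; which additional checks the real aggregating accumulator needs is the open question of the ticket.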