[ https://issues.apache.org/jira/browse/SPARK-46294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Davin Tjong updated SPARK-46294: -------------------------------- Component/s: SQL (was: Spark Core) > Clean up initValue vs zeroValue semantics in SQLMetrics > ------------------------------------------------------- > > Key: SPARK-46294 > URL: https://issues.apache.org/jira/browse/SPARK-46294 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 3.5.0 > Reporter: Davin Tjong > Priority: Minor > > The semantics of initValue and _zeroValue in SQLMetrics is a little bit > confusing, since they effectively mean the same thing. Changing it to the > following would be clearer, especially in terms of defining what an "invalid" > metric is. > > proposed definitions: > > initValue is the starting value for a SQLMetric. If a metric has value equal > to its initValue, then it should be filtered out before aggregating with > SQLMetrics.stringValue(). > > zeroValue defines the lowest value considered valid. If a SQLMetric is > invalid, it is set to zeroValue upon receiving any updates, and it also > reports zeroValue as its value to avoid exposing it to the user > programatically (concern previouosly addressed in SPARK-41442). > For many SQLMetrics, we use initValue = -1 and zeroValue = 0 to indicate that > the metric is by default invalid. At the end of a task, we will update the > metric making it valid, and the invalid metrics will be filtered out when > calculating min, max, etc. as a workaround for SPARK-11013. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org