Davin Tjong created SPARK-46294:
-----------------------------------

             Summary: Clean up initValue vs zeroValue semantics in SQLMetrics
                 Key: SPARK-46294
                 URL: https://issues.apache.org/jira/browse/SPARK-46294
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.5.0
            Reporter: Davin Tjong


The semantics of initValue and _zeroValue in SQLMetrics is a little bit 
confusing, since they effectively mean the same thing. Changing it to the 
following would be clearer, especially in terms of defining what an "invalid" 
metric is.
 
proposed definitions:
 
initValue is the starting value for a SQLMetric. If a metric has value equal to 
its initValue, then it should be filtered out before aggregating with 
SQLMetrics.stringValue().
 
zeroValue defines the lowest value considered valid. If a SQLMetric is invalid, 
it is set to zeroValue upon receiving any updates, and it also reports 
zeroValue as its value to avoid exposing it to the user programatically 
(concern previouosly addressed in SPARK-41442).

For many SQLMetrics, we use initValue = -1 and zeroValue = 0 to indicate that 
the metric is by default invalid. At the end of a task, we will update the 
metric making it valid, and the invalid metrics will be filtered out when 
calculating min, max, etc. as a workaround for SPARK-11013.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to