[ 
https://issues.apache.org/jira/browse/SPARK-46294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davin Tjong updated SPARK-46294:
--------------------------------
    Component/s: SQL
                     (was: Spark Core)

> Clean up initValue vs zeroValue semantics in SQLMetrics
> -------------------------------------------------------
>
>                 Key: SPARK-46294
>                 URL: https://issues.apache.org/jira/browse/SPARK-46294
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: Davin Tjong
>            Priority: Minor
>
> The semantics of initValue and _zeroValue in SQLMetrics is a little bit 
> confusing, since they effectively mean the same thing. Changing it to the 
> following would be clearer, especially in terms of defining what an "invalid" 
> metric is.
>  
> proposed definitions:
>  
> initValue is the starting value for a SQLMetric. If a metric has value equal 
> to its initValue, then it should be filtered out before aggregating with 
> SQLMetrics.stringValue().
>  
> zeroValue defines the lowest value considered valid. If a SQLMetric is 
> invalid, it is set to zeroValue upon receiving any updates, and it also 
> reports zeroValue as its value to avoid exposing it to the user 
> programatically (concern previouosly addressed in SPARK-41442).
> For many SQLMetrics, we use initValue = -1 and zeroValue = 0 to indicate that 
> the metric is by default invalid. At the end of a task, we will update the 
> metric making it valid, and the invalid metrics will be filtered out when 
> calculating min, max, etc. as a workaround for SPARK-11013.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to