[ 
https://issues.apache.org/jira/browse/SPARK-35259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen updated SPARK-35259:
--------------------------------
    Description: 
Today {{ExternalBlockHandler}} exposes a few {{Timer}} metrics:
{code}
    // Time latency for open block request in ms
    private final Timer openBlockRequestLatencyMillis = new Timer();
    // Time latency for executor registration latency in ms
    private final Timer registerExecutorRequestLatencyMillis = new Timer();
    // Time latency for processing fetch merged blocks meta request latency in 
ms
    private final Timer fetchMergedBlocksMetaLatencyMillis = new Timer();
    // Time latency for processing finalize shuffle merge request latency in ms
    private final Timer finalizeShuffleMergeLatencyMillis = new Timer();
{code}
However these Dropwizard Timers by default use nanoseconds 
([documentation|https://metrics.dropwizard.io/3.2.3/getting-started.html#timers]).
 It's certainly possible to extract milliseconds from them, but it seems 
misleading to have millis in the name here.

This causes {{YarnShuffleServiceMetrics}} to expose confusingly-named metrics 
like {{openBlockRequestLatencyMillis_count}} and 
{{openBlockRequestLatencyMillis_nanos}}. It should be up to the metrics 
exporter, like {{YarnShuffleServiceMetrics}}, to decide the unit and adjust the 
name accordingly, so the unit shouldn't be included in the name of the metric 
itself.

  was:
Today {{ExternalBlockHandler}} exposes a few {{Timer}} metrics:
{code}
    // Time latency for open block request in ms
    private final Timer openBlockRequestLatencyMillis = new Timer();
    // Time latency for executor registration latency in ms
    private final Timer registerExecutorRequestLatencyMillis = new Timer();
    // Time latency for processing finalize shuffle merge request latency in ms
    private final Timer finalizeShuffleMergeLatencyMillis = new Timer();
{code}
However these Dropwizard Timers by default use nanoseconds 
([documentation|https://metrics.dropwizard.io/3.2.3/getting-started.html#timers]).
 It's certainly possible to extract milliseconds from them, but it seems 
misleading to have millis in the name here.

{{YarnShuffleServiceMetrics}} currently doesn't expose any incorrectly-named 
metrics since it doesn't export any timing information from these metrics 
(which I am trying to address in SPARK-35258), but these names still result in 
kind of misleading metric names like 
{{finalizeShuffleMergeLatencyMillis_count}} -- a count doesn't have a unit. It 
should be up to the metrics exporter, like {{YarnShuffleServiceMetrics}}, to 
decide the unit and adjust the name accordingly.


> ExternalBlockHandler metrics have misleading unit in the name
> -------------------------------------------------------------
>
>                 Key: SPARK-35259
>                 URL: https://issues.apache.org/jira/browse/SPARK-35259
>             Project: Spark
>          Issue Type: Bug
>          Components: Shuffle
>    Affects Versions: 3.1.1
>            Reporter: Erik Krogen
>            Priority: Major
>
> Today {{ExternalBlockHandler}} exposes a few {{Timer}} metrics:
> {code}
>     // Time latency for open block request in ms
>     private final Timer openBlockRequestLatencyMillis = new Timer();
>     // Time latency for executor registration latency in ms
>     private final Timer registerExecutorRequestLatencyMillis = new Timer();
>     // Time latency for processing fetch merged blocks meta request latency 
> in ms
>     private final Timer fetchMergedBlocksMetaLatencyMillis = new Timer();
>     // Time latency for processing finalize shuffle merge request latency in 
> ms
>     private final Timer finalizeShuffleMergeLatencyMillis = new Timer();
> {code}
> However these Dropwizard Timers by default use nanoseconds 
> ([documentation|https://metrics.dropwizard.io/3.2.3/getting-started.html#timers]).
>  It's certainly possible to extract milliseconds from them, but it seems 
> misleading to have millis in the name here.
> This causes {{YarnShuffleServiceMetrics}} to expose confusingly-named metrics 
> like {{openBlockRequestLatencyMillis_count}} and 
> {{openBlockRequestLatencyMillis_nanos}}. It should be up to the metrics 
> exporter, like {{YarnShuffleServiceMetrics}}, to decide the unit and adjust 
> the name accordingly, so the unit shouldn't be included in the name of the 
> metric itself.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to