[ https://issues.apache.org/jira/browse/HADOOP-14972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16216627#comment-16216627 ]

Steve Loughran commented on HADOOP-14972:
-----------------------------------------

I don't know about the perf cost of quantile stats, but yes, performance would 
be good to have.

Everyone playing with metrics should spend an afternoon instrumenting an app of 
theirs with CodaHale metrics and Java 8. I've done something similar with Scala 
in the past: you can use the way CodaHale probes its metrics to implement the 
lookup as closures probing the running app, rather than having the app publish 
information which is often not needed at all.
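
For example, a minimal sketch of that pattern with the Dropwizard (CodaHale) 
MetricRegistry: the Gauge is a Java 8 closure over live app state (the queue 
here is hypothetical), so nothing is computed unless a reporter actually reads 
the metric:
{code:java}
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

import com.codahale.metrics.Gauge;
import com.codahale.metrics.MetricRegistry;

public class GaugeClosures {
  public static void main(String[] args) {
    MetricRegistry registry = new MetricRegistry();
    Queue<Runnable> pending = new ConcurrentLinkedQueue<>();  // hypothetical app state

    // Evaluated lazily: the closure probes the running app only when a
    // reporter or JMX client asks for the value.
    registry.register("pending.queue.size", (Gauge<Integer>) pending::size);
  }
}
{code}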

See also [~iyonger]'s HADOOP-14475 patch, which I have sadly neglected and 
which I'm aware we need to pull in. Sean: can you look at that patch before we 
do other things? I don't want it obsoleted by later work.

> Histogram metrics types for latency, etc.
> -----------------------------------------
>
>                 Key: HADOOP-14972
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14972
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.9.0, 3.0.0
>            Reporter: Sean Mackrory
>            Assignee: Sean Mackrory
>
> We'd like metrics to track latencies for various operations, such as 
> latencies for the various request types. This may need to be done differently 
> from the current metric types, which are just counters of type long, and it 
> needs to be done intelligently: these measurements are very numerous, and 
> they are primarily interesting because of outliers that are unpredictably far 
> from normal. A few ideas on how we might implement something like this:
> * An adaptive, sparse histogram type (see the first sketch below). I envision 
> something configurable with a maximum granularity and a maximum number of 
> bins. Initially, datapoints are tallied in bins at the maximum granularity; 
> as we reach the maximum number of bins, bins are merged in even / odd pairs. 
> There's some complexity here, especially to make it perform well and allow 
> safe concurrency, but I like the ability to configure reasonable limits and 
> retain as much granularity as possible without knowing the exact shape of the 
> data beforehand.
> * LongMetrics named "read_latency_600ms", "read_latency_800ms", etc. to 
> represent bins (see the second sketch below). This was suggested to me by 
> [~fabbri]. I initially did not like the idea of hard-coding so many bins for 
> however many op types, but this could also be done dynamically (we just 
> hard-code which measurements we take and the granularity to group them at, 
> e.g. read_latency, 200 ms). The resulting dataset could be sparse and dynamic 
> enough to allow for extreme outliers, but the granularity is still 
> pre-determined.
> * We could also simply track a certain number of the highest latencies, plus 
> basic descriptive statistics like a running average and min / max (see the 
> third sketch below). This is inherently more limited in what it can show us, 
> but it is much simpler and might still provide some insight when analyzing 
> performance.
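>
> A minimal sketch of the first idea, assuming a hypothetical AdaptiveHistogram 
> class (single-threaded for clarity, so it sidesteps the concurrency question 
> entirely). When the bin count exceeds the limit, the bin width doubles, 
> folding even / odd pairs of bins together:
> {code:java}
> import java.util.Map;
> import java.util.TreeMap;
>
> public class AdaptiveHistogram {
>   private final int maxBins;
>   private long binWidth;                    // current bin width in ms
>   private final TreeMap<Long, Long> bins = new TreeMap<>(); // bin start -> count
>
>   public AdaptiveHistogram(long finestWidthMs, int maxBins) {
>     this.binWidth = finestWidthMs;
>     this.maxBins = maxBins;
>   }
>
>   public void add(long valueMs) {
>     bins.merge((valueMs / binWidth) * binWidth, 1L, Long::sum);
>     while (bins.size() > maxBins) {
>       coarsen();
>     }
>   }
>
>   // Double the bin width and fold each old bin into its new, wider bin.
>   private void coarsen() {
>     binWidth *= 2;
>     TreeMap<Long, Long> merged = new TreeMap<>();
>     for (Map.Entry<Long, Long> e : bins.entrySet()) {
>       merged.merge((e.getKey() / binWidth) * binWidth, e.getValue(), Long::sum);
>     }
>     bins.clear();
>     bins.putAll(merged);
>   }
> }
> {code}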
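>
> A sketch of the second idea: counters created lazily under generated names, 
> so the set of metrics stays sparse even with extreme outliers. The 
> read_latency prefix and 200 ms granularity follow the example above; the 
> class itself is hypothetical:
> {code:java}
> import java.util.Map;
> import java.util.concurrent.ConcurrentHashMap;
> import java.util.concurrent.atomic.LongAdder;
>
> public class NamedBinLatencies {
>   private final String prefix;
>   private final long granularityMs;
>   private final Map<String, LongAdder> bins = new ConcurrentHashMap<>();
>
>   public NamedBinLatencies(String prefix, long granularityMs) {
>     this.prefix = prefix;                  // e.g. "read_latency"
>     this.granularityMs = granularityMs;    // e.g. 200
>   }
>
>   public void record(long latencyMs) {
>     // Round up to the bin's upper bound: 523 ms -> "read_latency_600ms".
>     long bound = ((latencyMs + granularityMs - 1) / granularityMs) * granularityMs;
>     bins.computeIfAbsent(prefix + "_" + bound + "ms", k -> new LongAdder())
>         .increment();
>   }
> }
> {code}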
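>
> And a sketch of the third idea: keep the N highest latencies in a min-heap 
> alongside a few running statistics (again a hypothetical class, synchronized 
> for simplicity):
> {code:java}
> import java.util.PriorityQueue;
>
> public class TopNLatencyTracker {
>   private final int n;
>   private final PriorityQueue<Long> topN = new PriorityQueue<>(); // min-heap
>   private long count, sum;
>   private long min = Long.MAX_VALUE, max = Long.MIN_VALUE;
>
>   public TopNLatencyTracker(int n) {
>     this.n = n;
>   }
>
>   public synchronized void record(long latencyMs) {
>     count++;
>     sum += latencyMs;
>     min = Math.min(min, latencyMs);
>     max = Math.max(max, latencyMs);
>     topN.offer(latencyMs);
>     if (topN.size() > n) {
>       topN.poll();   // evict the smallest, keeping the N largest seen
>     }
>   }
>
>   public synchronized double mean() {
>     return count == 0 ? 0.0 : (double) sum / count;
>   }
> }
> {code}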


