[ https://issues.apache.org/jira/browse/HADOOP-14972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Loughran updated HADOOP-14972:
------------------------------------
    Issue Type: Sub-task  (was: New Feature)
        Parent: HADOOP-14831

> Histogram metrics types for latency, etc.
> -----------------------------------------
>
>                 Key: HADOOP-14972
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14972
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.9.0, 3.0.0
>            Reporter: Sean Mackrory
>            Assignee: Sean Mackrory
>
> We'd like metrics to track latencies for various operations, such as
> latencies for various request types, etc. This may need to be done
> differently from the current metrics types, which are just counters of type
> long, and it needs to be done intelligently, as these measurements are very
> numerous and are primarily interesting because of the outliers that are
> unpredictably far from normal. A few ideas on how we might implement
> something like this:
> * An adaptive, sparse histogram type. I envision something configurable with
> a maximum granularity and a maximum number of bins. Initially, datapoints
> are tallied in bins with the maximum granularity. As we reach the maximum
> number of bins, bins are merged in even / odd pairs. There's some complexity
> here, especially to make it perform well and allow safe concurrency, but I
> like the ability to configure reasonable limits and retain as much
> granularity as possible without knowing the exact shape of the data
> beforehand.
> * LongMetrics named "read_latency_600ms", "read_latency_800ms" to represent
> bins. This was suggested to me by [~fabbri]. I initially did not like the
> idea of having so many hard-coded bins for however many op types, but
> this could also be done dynamically (we just hard-code which measurements we
> take, and with what granularity to group them, e.g. read_latency, 200 ms).
> The resulting dataset could be sparse and dynamic to allow for extreme
> outliers, but the granularity is still pre-determined.
> * We could also simply track a certain number of the highest latencies, and
> basic descriptive statistics like a running average, min / max, etc.
> Inherently more limited in what it can show us, but much simpler and might
> still provide some insight when analyzing performance.
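
For illustration only, here is a minimal sketch of how the first idea (the adaptive, sparse histogram) might look. It is not taken from the issue or from any Hadoop code: the class name AdaptiveLatencyHistogram, its constructor parameters and its methods are all hypothetical, and it uses a single coarse lock rather than the performant concurrency scheme the description says would be needed. Bins start at the finest width; once the bin count exceeds the limit, the width doubles and each even / odd pair of neighbouring bins is merged.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not a Hadoop API: sparse histogram that coarsens itself.
final class AdaptiveLatencyHistogram {
  private final int maxBins;
  private long binWidth;                                // current width of every bin
  private TreeMap<Long, Long> bins = new TreeMap<>();   // bin index -> count

  AdaptiveLatencyHistogram(long initialBinWidth, int maxBins) {
    // maxBins >= 2 is an assumption of this sketch; it also keeps binWidth
    // from overflowing while bins are being merged.
    if (initialBinWidth <= 0 || maxBins < 2) {
      throw new IllegalArgumentException("need initialBinWidth > 0 and maxBins >= 2");
    }
    this.binWidth = initialBinWidth;
    this.maxBins = maxBins;
  }

  // Tally one non-negative measurement, coarsening the bins if the limit is hit.
  synchronized void add(long value) {
    if (value < 0) {
      throw new IllegalArgumentException("negative latency: " + value);
    }
    bins.merge(value / binWidth, 1L, Long::sum);
    while (bins.size() > maxBins) {
      mergePairs();
    }
  }

  // Double the bin width: bins 2k (even) and 2k+1 (odd) collapse into bin k.
  private void mergePairs() {
    TreeMap<Long, Long> merged = new TreeMap<>();
    for (Map.Entry<Long, Long> e : bins.entrySet()) {
      merged.merge(e.getKey() / 2, e.getValue(), Long::sum);
    }
    bins = merged;
    binWidth *= 2;
  }

  synchronized long getBinWidth() {
    return binWidth;
  }

  // Copy of the histogram keyed by each bin's lower bound, in the input's units.
  synchronized TreeMap<Long, Long> snapshot() {
    TreeMap<Long, Long> out = new TreeMap<>();
    for (Map.Entry<Long, Long> e : bins.entrySet()) {
      out.put(e.getKey() * binWidth, e.getValue());
    }
    return out;
  }
}
```

Because only non-empty bins are stored, a single extreme outlier costs one extra bin rather than a long run of empty ones, which is the sparseness the description asks for. A merge pass touches every bin, so a real implementation would probably want something cheaper than a full rebuild under a lock.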
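
The second idea (dynamically named counters such as "read_latency_600ms") could be sketched roughly as follows. Again this is only an assumption-laden illustration: BinnedLatencyCounters and its methods are hypothetical names, plain LongAdder counters stand in for whatever long-valued metrics type would actually be used, and the sketch assumes the number in the name is the lower bound of the bin, which the description does not specify.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch, not a Hadoop API: one counter per (measurement, bin),
// created lazily the first time a latency falls into that bin.
final class BinnedLatencyCounters {
  private final String prefix;        // e.g. "read_latency"
  private final long granularityMs;   // e.g. 200
  private final ConcurrentHashMap<String, LongAdder> counters = new ConcurrentHashMap<>();

  BinnedLatencyCounters(String prefix, long granularityMs) {
    if (granularityMs <= 0) {
      throw new IllegalArgumentException("granularity must be positive");
    }
    this.prefix = prefix;
    this.granularityMs = granularityMs;
  }

  // Increment the counter for the bin containing latencyMs; returns its name.
  String record(long latencyMs) {
    if (latencyMs < 0) {
      throw new IllegalArgumentException("negative latency: " + latencyMs);
    }
    // Assumption: a bin is named after its lower bound.
    long lowerBound = (latencyMs / granularityMs) * granularityMs;
    String name = prefix + "_" + lowerBound + "ms";
    counters.computeIfAbsent(name, k -> new LongAdder()).increment();
    return name;
  }

  // Current count for a named bin, or 0 if nothing has landed in it yet.
  long count(String name) {
    LongAdder adder = counters.get(name);
    return adder == null ? 0L : adder.sum();
  }
}
```

With a prefix of "read_latency" and a granularity of 200 ms, a 650 ms read would be counted under "read_latency_600ms", and an outlier of several seconds would simply create one more counter rather than requiring a hard-coded bin.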
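
The third idea (keep the N highest latencies plus basic running statistics) is the simplest to sketch. The class below is hypothetical and not part of Hadoop; it assumes a bounded min-heap for the highest values and an incrementally updated mean, and it uses one coarse lock for simplicity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch, not a Hadoop API: top-N latencies plus running stats.
final class LatencySummary {
  private final int topN;
  // Min-heap: the smallest of the retained "highest" values sits at the head.
  private final PriorityQueue<Long> highest = new PriorityQueue<>();
  private long count;
  private double mean;
  private long min = Long.MAX_VALUE;
  private long max = Long.MIN_VALUE;

  LatencySummary(int topN) {
    if (topN < 1) {
      throw new IllegalArgumentException("topN must be at least 1");
    }
    this.topN = topN;
  }

  synchronized void add(long value) {
    count++;
    mean += (value - mean) / count;   // running average, no stored samples
    min = Math.min(min, value);
    max = Math.max(max, value);
    if (highest.size() < topN) {
      highest.add(value);
    } else if (value > highest.peek()) {
      highest.poll();                 // evict the smallest retained value
      highest.add(value);
    }
  }

  synchronized long getCount() { return count; }
  synchronized double getMean() { return mean; }
  synchronized long getMin() { return min; }
  synchronized long getMax() { return max; }

  // The retained highest latencies, largest first.
  synchronized List<Long> getHighest() {
    List<Long> out = new ArrayList<>(highest);
    out.sort(Collections.reverseOrder());
    return out;
  }
}
```

As the description notes, this shows far less than a histogram would: the outliers and the overall average are visible, but the shape of the distribution between them is lost.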