[ https://issues.apache.org/jira/browse/HBASE-12133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155160#comment-14155160 ]
Yi Deng commented on HBASE-12133:
---------------------------------

[~stack] I've checked the Histogram implementation of *yammer*, assuming this is the right one:
http://grepcode.com/file/repo1.maven.org/maven2/com.yammer.metrics/metrics-core/2.2.0/com/yammer/metrics/core/Histogram.java#Histogram

First, about performance. `updateVariance` is a significant cost: it allocates at least one double array per call, which adds CPU work for later GC. The other parts are similar to my version but not faster.

Second, Histogram uses a sampling strategy to trade off accuracy against resources (CPU/memory); strictly speaking, this is not a histogram algorithm. There are two types of samples:
# `UniformSample` randomly samples a fixed number of values and uses them to estimate the percentiles. The performance is good, but I'm concerned about how many samples we need to keep to get a sound result. If we need to keep a lot of samples, the time to compute the percentiles increases accordingly.
# `ExponentiallyDecayingSample` could be very slow for updating.

`FastLongHistogram` uses a different strategy:
# It maintains a series of uniformly split buckets (or bins) for histogram counting. Values outside the bucket range are counted with special counters. Values larger than 10X are counted in yet another counter, to handle outliers.
# No synchronization (lock) in each update.
# No decaying, but the user can periodically reset the histogram. Information collected in the last round is used to guide the bucket range in the next round.

Currently this class is mainly intended for use in 0.98.

As for `updateMin`/`Max`, if you mean not putting them in a separate util class, I'm open to that.
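
To make the bucketed strategy above concrete, here is a minimal sketch of the idea, not the code from the attached patch. The class name `BucketHistogramSketch`, its methods, and the choice to read "10X" as ten times the upper bound of the bucket range are all assumptions made for illustration. It uses atomic counters so that an update takes no lock, and it leaves out the periodic reset and range re-estimation.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical illustration only; this is NOT the FastLongHistogram from the patch.
class BucketHistogramSketch {
  private final long min;        // inclusive lower bound of the bucketed range
  private final long max;        // exclusive upper bound of the bucketed range
  private final int numBuckets;
  private final AtomicLongArray counts;                  // one counter per uniform bucket
  private final AtomicLong below = new AtomicLong();     // value < min
  private final AtomicLong above = new AtomicLong();     // max <= value < 10 * max
  private final AtomicLong outliers = new AtomicLong();  // value >= 10 * max (assumed meaning of "10X")

  BucketHistogramSketch(long min, long max, int numBuckets) {
    if (min < 0 || max <= min || numBuckets <= 0) {
      throw new IllegalArgumentException("need 0 <= min < max and numBuckets > 0");
    }
    this.min = min;
    this.max = max;
    this.numBuckets = numBuckets;
    this.counts = new AtomicLongArray(numBuckets);
  }

  // Lock-free update: exactly one atomic increment per value.
  void add(long value) {
    if (value < min) {
      below.incrementAndGet();
    } else if (value >= max) {
      // value / 10 >= max avoids overflowing 10 * max for large bounds.
      if (value / 10 >= max) {
        outliers.incrementAndGet();
      } else {
        above.incrementAndGet();
      }
    } else {
      // Assumes (max - min) * numBuckets fits in a long.
      int idx = (int) ((value - min) * numBuckets / (max - min));
      counts.incrementAndGet(idx);
    }
  }

  long belowCount() { return below.get(); }
  long aboveCount() { return above.get(); }
  long outlierCount() { return outliers.get(); }

  // Estimates the q-quantile (0 <= q <= 1) by walking the buckets and
  // interpolating linearly inside the bucket that contains the target rank.
  long quantile(double q) {
    long total = below.get() + above.get() + outliers.get();
    for (int i = 0; i < numBuckets; i++) {
      total += counts.get(i);
    }
    if (total == 0) {
      return min;
    }
    double target = q * total;
    double seen = below.get();
    if (seen >= target) {
      return min;  // rank falls in the underflow counter; only the bound is known
    }
    double width = (double) (max - min) / numBuckets;
    for (int i = 0; i < numBuckets; i++) {
      long c = counts.get(i);
      if (c > 0 && seen + c >= target) {
        double frac = (target - seen) / c;
        return min + (long) ((i + frac) * width);
      }
      seen += c;
    }
    return max;  // rank falls in an overflow counter; only the bound is known
  }
}
```

With this layout the cost of a percentile query depends on the number of buckets rather than on the number of recorded values, which is the trade-off against keeping a large sample. The result is only as precise as the bucket width, and a concurrent reader may see a slightly inconsistent snapshot because the counters are read one at a time.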

> Add FastLongHistogram for metric computation
> --------------------------------------------
>
>                 Key: HBASE-12133
>                 URL: https://issues.apache.org/jira/browse/HBASE-12133
>             Project: HBase
>          Issue Type: New Feature
>          Components: metrics
>    Affects Versions: 0.98.8
>            Reporter: Yi Deng
>            Assignee: Yi Deng
>            Priority: Minor
>              Labels: histogram, metrics
>             Fix For: 0.98.8
>
>         Attachments: 0001-Add-FastLongHistogram-for-fast-histogram-estimation.patch, 0001-Add-FastLongHistogram-for-fast-histogram-estimation.patch, 0001-Add-FastLongHistogram-for-fast-histogram-estimation.patch
>
>
> FastLongHistogram is a thread-safe class that estimates the distribution of data
> and computes the quantiles. It's useful for computing aggregated metrics like
> P99/P95.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)