[ 
https://issues.apache.org/jira/browse/HBASE-12133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155160#comment-14155160
 ] 

Yi Deng commented on HBASE-12133:
---------------------------------

[~stack] I've checked yammer's Histogram implementation (assuming this is the 
correct one): 
http://grepcode.com/file/repo1.maven.org/maven2/com.yammer.metrics/metrics-core/2.2.0/com/yammer/metrics/core/Histogram.java#Histogram

First, about performance. `updateVariance` is a performance hotspot: it 
allocates at least one double array per call, and that garbage adds CPU cost to 
later GC work. The other parts are similar to my version, but not faster.
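To make the allocation point concrete, here is a minimal sketch contrasting two ways of keeping a running variance. `AllocatingVariance` follows the pattern described above (a Welford update published by CAS on a freshly allocated `double[]` every call); it illustrates the pattern and is not yammer's actual code. `AdderVariance` is a hypothetical allocation-free alternative that keeps count, sum and sum of squares in `LongAdder`/`DoubleAdder` (Java 8+). Both class names are made up for this example.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.atomic.DoubleAdder;
import java.util.concurrent.atomic.LongAdder;

/** Illustrative only: Welford's method, published via CAS on a new array per update. */
final class AllocatingVariance {
    // state[0] = count, state[1] = running mean, state[2] = sum of squared deviations
    private final AtomicReference<double[]> state =
            new AtomicReference<>(new double[] {0.0, 0.0, 0.0});

    void update(long value) {
        while (true) {
            double[] old = state.get();
            double n = old[0] + 1;
            double delta = value - old[1];
            double mean = old[1] + delta / n;
            double m2 = old[2] + delta * (value - mean);
            // A fresh array on every call: this is the per-update garbage.
            if (state.compareAndSet(old, new double[] {n, mean, m2})) {
                return;
            }
        }
    }

    double variance() {
        double[] s = state.get();
        return s[0] > 1 ? s[2] / (s[0] - 1) : 0.0;  // sample variance
    }
}

/** Hypothetical alternative: no array is allocated per update. */
final class AdderVariance {
    private final LongAdder count = new LongAdder();
    private final DoubleAdder sum = new DoubleAdder();
    private final DoubleAdder sumOfSquares = new DoubleAdder();

    void update(long value) {
        count.increment();
        sum.add(value);
        sumOfSquares.add((double) value * value);
    }

    double variance() {
        // The three adders are read separately, so a concurrent snapshot is approximate.
        long n = count.sum();
        if (n < 2) {
            return 0.0;
        }
        double s = sum.sum();
        return (sumOfSquares.sum() - s * s / n) / (n - 1);  // sample variance
    }
}
```

The adder version gives up some numerical stability (sum of squares loses precision when the variance is small relative to the mean) and snapshot consistency in exchange for no per-update array allocation.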

Second, Histogram uses a sampling strategy to trade off accuracy against 
resources (CPU/memory); strictly speaking, this is not a histogram algorithm. 
There are two types of samples: 

# `UniformSample` randomly samples a fixed number of values and uses them to 
estimate the percentiles. Its performance is good, but I'm concerned about how 
many samples we need to keep to get a sound result. If we need to keep a lot of 
samples, the time to compute a percentile will grow accordingly.
# `ExponentiallyDecayingSample` can be very slow to update.
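A uniform sample of this kind is typically implemented as a reservoir (Vitter's Algorithm R). The minimal sketch below, using a hypothetical `ReservoirSample` class rather than yammer's code, shows where the sample-size concern comes from: updates are O(1), but every percentile query copies and sorts all k retained values, which costs O(k log k).

```java
import java.util.Arrays;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicLongArray;

/** Hypothetical reservoir sample (Algorithm R) with nearest-rank quantiles. */
final class ReservoirSample {
    private final AtomicLongArray values;
    private final AtomicLong count = new AtomicLong();

    ReservoirSample(int size) {
        values = new AtomicLongArray(size);
    }

    void update(long value) {
        long c = count.incrementAndGet();
        if (c <= values.length()) {
            values.set((int) (c - 1), value);  // fill the reservoir first
        } else {
            // Keep the new value with probability size / c.
            long r = ThreadLocalRandom.current().nextLong(c);
            if (r < values.length()) {
                values.set((int) r, value);
            }
        }
    }

    /** Copies and sorts the whole reservoir on every call: O(k log k). */
    long quantile(double q) {
        int n = (int) Math.min(count.get(), values.length());
        if (n == 0) {
            return 0;
        }
        long[] copy = new long[n];
        for (int i = 0; i < n; i++) {
            copy[i] = values.get(i);
        }
        Arrays.sort(copy);
        int idx = (int) Math.ceil(q * n) - 1;  // nearest-rank index
        return copy[Math.max(0, Math.min(n - 1, idx))];
    }
}
```

Doubling the reservoir size improves the estimate of the tails but roughly doubles the memory and the per-query sort cost, which is the trade-off described above.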

`FastLongHistogram` uses a different strategy:
# It maintains a series of uniformly split buckets (bins) for histogram 
counting. Values outside the bucket range are counted in special counters, and 
values larger than 10X are counted in yet another counter to handle outliers.
# No synchronization (locks) on updates.
# No decaying, but the user can periodically reset the histogram. Information 
collected in the last round is used to guide the bucket range for the next round.
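The bucket strategy above can be sketched as follows. This is a hypothetical `BucketedHistogram`, not the attached patch: the counter layout, reading "10X" as ten times the upper bound of the bucket range, linear interpolation inside a bucket, and the `nextRound()` policy of sizing the next range from this round's observed min/max are all assumptions made for illustration.

```java
import java.util.concurrent.atomic.AtomicLongArray;
import java.util.concurrent.atomic.LongAccumulator;

/** Hypothetical lock-free, fixed-bucket histogram; not the HBASE-12133 code. */
final class BucketedHistogram {
    private final long lower;      // inclusive lower bound of the uniform buckets
    private final long upper;      // exclusive upper bound of the uniform buckets
    private final int numBuckets;
    // [0] below lower, [1..numBuckets] uniform bins, [numBuckets+1] in [upper, 10*upper),
    // [numBuckets+2] >= 10*upper (outliers). The 10x threshold is an assumption.
    private final AtomicLongArray counts;
    private final LongAccumulator seenMin = new LongAccumulator(Math::min, Long.MAX_VALUE);
    private final LongAccumulator seenMax = new LongAccumulator(Math::max, Long.MIN_VALUE);

    BucketedHistogram(long lower, long upper, int numBuckets) {
        if (upper <= lower || numBuckets <= 0) {
            throw new IllegalArgumentException("empty range or no buckets");
        }
        this.lower = lower;
        this.upper = upper;
        this.numBuckets = numBuckets;
        this.counts = new AtomicLongArray(numBuckets + 3);
    }

    /** No lock: one atomic increment plus two accumulator updates. */
    void add(long value) {
        seenMin.accumulate(value);
        seenMax.accumulate(value);
        counts.incrementAndGet(indexOf(value));
    }

    private int indexOf(long v) {
        if (v < lower) return 0;
        if (v < upper) return 1 + (int) ((v - lower) * numBuckets / (upper - lower));
        if (v < 10 * upper) return numBuckets + 1;  // ignores overflow of 10 * upper
        return numBuckets + 2;
    }

    private double bucketLow(int i) {
        if (i == 0) return seenMin.get();
        if (i <= numBuckets) return lower + (double) (upper - lower) * (i - 1) / numBuckets;
        if (i == numBuckets + 1) return upper;
        return 10.0 * upper;
    }

    private double bucketHigh(int i) {
        if (i == 0) return lower;
        if (i <= numBuckets) return lower + (double) (upper - lower) * i / numBuckets;
        if (i == numBuckets + 1) return Math.min(10.0 * upper, (double) seenMax.get());
        return seenMax.get();
    }

    /** Walks cumulative counts (read one by one, so only approximately consistent). */
    double quantile(double q) {
        long[] snapshot = new long[counts.length()];
        long total = 0;
        for (int i = 0; i < snapshot.length; i++) {
            snapshot[i] = counts.get(i);
            total += snapshot[i];
        }
        if (total == 0) return 0.0;
        double target = q * total;
        long cumulative = 0;
        for (int i = 0; i < snapshot.length; i++) {
            if (snapshot[i] == 0) continue;
            if (cumulative + snapshot[i] >= target) {
                double fraction = (target - cumulative) / snapshot[i];
                return bucketLow(i) + fraction * (bucketHigh(i) - bucketLow(i));
            }
            cumulative += snapshot[i];
        }
        return seenMax.get();
    }

    /** Assumed reset policy: size the next round's range from this round's min/max. */
    BucketedHistogram nextRound() {
        long min = seenMin.get();
        long max = seenMax.get();
        if (min > max) return new BucketedHistogram(lower, upper, numBuckets);  // nothing recorded
        return new BucketedHistogram(min, max + 1, numBuckets);  // ignores max == Long.MAX_VALUE
    }
}
```

Whatever the exact layout, the property this illustrates is that a quantile query costs O(number of buckets) no matter how many values were recorded, instead of a sort over a retained sample.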

Currently this class is intended mainly for 0.98.

As for `updateMin`/`Max`, if you're suggesting not putting them in a separate 
util class, I'm open to that.
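Wherever they end up living, lock-free `updateMin`/`Max` on a shared counter are usually written as a compare-and-set retry loop. A minimal sketch, assuming `AtomicLong` targets; the class name `AtomicMinMax` and these signatures are hypothetical, not taken from the patch:

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helpers: lock-free min/max updates via CAS retry loops. */
final class AtomicMinMax {
    private AtomicMinMax() {}

    /** Lowers target to value if value is smaller; retries when another thread wins. */
    static void updateMin(AtomicLong target, long value) {
        long current = target.get();
        while (value < current) {
            if (target.compareAndSet(current, value)) {
                return;
            }
            current = target.get();  // lost a race: re-read and re-check
        }
    }

    /** Raises target to value if value is larger; retries when another thread wins. */
    static void updateMax(AtomicLong target, long value) {
        long current = target.get();
        while (value > current) {
            if (target.compareAndSet(current, value)) {
                return;
            }
            current = target.get();
        }
    }
}
```

On Java 8+, `target.accumulateAndGet(value, Math::min)` (or a `LongAccumulator`) achieves the same without a hand-written loop.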


> Add FastLongHistogram for metric computation
> --------------------------------------------
>
>                 Key: HBASE-12133
>                 URL: https://issues.apache.org/jira/browse/HBASE-12133
>             Project: HBase
>          Issue Type: New Feature
>          Components: metrics
>    Affects Versions: 0.98.8
>            Reporter: Yi Deng
>            Assignee: Yi Deng
>            Priority: Minor
>              Labels: histogram, metrics
>             Fix For: 0.98.8
>
>         Attachments: 
> 0001-Add-FastLongHistogram-for-fast-histogram-estimation.patch, 
> 0001-Add-FastLongHistogram-for-fast-histogram-estimation.patch, 
> 0001-Add-FastLongHistogram-for-fast-histogram-estimation.patch
>
>
> FastLongHistogram is a thread-safe class that estimates the distribution of data 
> and computes quantiles. It's useful for computing aggregated metrics like 
> P99/P95.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
