[jira] [Commented] (HBASE-6261) Better approximate high-percentile percentile latency metrics

2012-07-17 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13416680#comment-13416680
 ] 

Andrew Wang commented on HBASE-6261:


Sorry I haven't had time to push on this more. I talked with Jon Hsieh last 
week about doing a more convincing analysis of the performance of the new 
MutableQuantiles class from HADOOP-8541 vs the existing reservoir-sampling 
histogram method. I'll try to get that done within a week.

I'm also not sure about the right course of action for getting it used in HBase. 
Stack indicated way back on the mailing list that he was okay waiting for a 
hadoop-common version bump, which is kind of a long timescale. If people really 
urgently want this, we could just copy the code over and then refactor it away 
when it's released in hadoop-common.

> Better approximate high-percentile percentile latency metrics
> -
>
> Key: HBASE-6261
> URL: https://issues.apache.org/jira/browse/HBASE-6261
> Project: HBase
> Issue Type: New Feature
> Reporter: Andrew Wang
> Assignee: Andrew Wang
> Labels: metrics
> Attachments: Latencyestimation.pdf
>
>
> The existing reservoir-sampling based latency metrics in HBase are not 
> well-suited for providing accurate estimates of high-percentile (e.g. 90th, 
> 95th, or 99th) latency. This is a well-studied problem in the literature (see 
> [1] and [2]); the question is determining which methods best suit our needs 
> and then implementing them.
> Ideally, we should be able to estimate these high percentiles with minimal 
> memory and CPU usage as well as minimal error (e.g. 1% error on 90th, or .1% 
> on 99th). It's also desirable to provide this over different time-based 
> sliding windows, e.g. last 1 min, 5 mins, 15 mins, and 1 hour.
> I'll note that this would also be useful in HDFS, or really anywhere latency 
> metrics are kept.
> [1] http://www.cs.rutgers.edu/~muthu/bquant.pdf
> [2] http://infolab.stanford.edu/~manku/papers/04pods-sliding.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6261) Better approximate high-percentile percentile latency metrics

2012-07-17 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13416706#comment-13416706
 ] 

Zhihong Ted Yu commented on HBASE-6261:
---

bq. copy the code over and then refactor it away
+1 on above.





[jira] [Commented] (HBASE-6261) Better approximate high-percentile percentile latency metrics

2012-07-18 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13417178#comment-13417178
 ] 

stack commented on HBASE-6261:
--

bq. Stack indicated way back on the mailing list that he was okay waiting for a 
hadoop-common version bump, which is kind of a long timescale.

Yeah.  Code copied in tends to never go away (for example, see MurmurHash, which 
started out in hbase and has now been in hadoop for a good few years).

bq. If people really urgently want this, we could just copy the code over and 
then refactor it away when it's released in hadoop-common.

Sounds like a nice to have.  How much code would you have to copy in?  What 
would it be?  Thanks Andrew.







[jira] [Commented] (HBASE-6261) Better approximate high-percentile percentile latency metrics

2012-07-18 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13417302#comment-13417302
 ] 

Andrew Wang commented on HBASE-6261:


It'd be these files from hadoop-common:

* src/main/java/org/apache/hadoop/metrics2/lib/MutableQuantiles.java
* src/main/java/org/apache/hadoop/metrics2/lib/Quantiles.java
* src/main/java/org/apache/hadoop/metrics2/util/SampleQuantiles.java

{{wc -l}} reports 534 lines across those three files, heavily commented of 
course. {{MutableQuantiles}} is a hadoop2 metrics2 interface for 
{{SampleQuantiles}}, and might need to be modified for use in HBase. I haven't 
looked at what Elliott's done for HBASE-4050 yet.





[jira] [Commented] (HBASE-6261) Better approximate high-percentile percentile latency metrics

2012-07-18 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13417891#comment-13417891
 ] 

stack commented on HBASE-6261:
--

@Elliott Are you going to bring the files Andrew listed above into the compat 
modules anyway?





[jira] [Commented] (HBASE-6261) Better approximate high-percentile percentile latency metrics

2012-07-23 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13421031#comment-13421031
 ] 

Andrew Wang commented on HBASE-6261:


Based on feedback from Elliott and Jon, I've done some analysis of both 
SampleQuantiles and MetricsHistogram.

For both, I tried item counts of 1k, 10k, 100k, 1M, 2.5M, 5M, and 10M. For each 
count, I randomly shuffled longs from {{[0, count)}}, pushed them through the 
estimator, and measured the runtime, # of samples, and error for various 
quantiles. This was repeated ten times, giving stddev error bars for each point.

MetricsHistogram was left using default settings (1028 item reservoir). 
SampleQuantiles was also left with default settings, tracking the same 
quantiles as MetricsHistogram, but with bounded error. I threw away the 0.90 
quantile from SampleQuantiles since MetricsHistogram didn't have a function to 
compute it (though adding one would be trivial).

This was all run single-threaded on my couple-years-old T410s laptop.

You can view the imgur album of just the plots here: [http://imgur.com/a/gTDYr]

h2. Runtime

Note that the y-axis is log-scale in this plot. SampleQuantiles is roughly an 
order of magnitude slower at 10 million items (26.8s vs. 3.3s), but the scaling 
pattern overall looks good. The two are comparable at low item counts (<=10k).

!http://i.imgur.com/c6SIl.png!

h2. Memory usage

Note that the y-axis is again log-scale in this plot.

MetricsHistogram uses a flat 1028 items of storage, so it has constant memory 
usage. At 10 million items, SampleQuantiles uses roughly an order of magnitude 
more memory (19.4k items vs. 1k). Since SampleQuantiles samples are about 40B 
each and MetricsHistogram samples are 8B each, this is approximately 776KB vs. 
8KB.
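
The arithmetic behind those two figures can be reproduced with a short sketch. The item counts and per-sample sizes are the ones quoted above; the helper name {{footprint_bytes}} is made up for illustration, and KB is taken as 1000 bytes.

```python
def footprint_bytes(items, bytes_per_item):
    """Approximate storage for `items` samples of `bytes_per_item` each."""
    return items * bytes_per_item


# Figures quoted in the comment above, measured at 10 million input items.
sample_quantiles = footprint_bytes(19_400, 40)
metrics_histogram = footprint_bytes(1_028, 8)

print(sample_quantiles / 1000, "KB")   # 776.0 KB
print(metrics_histogram / 1000, "KB")  # 8.224 KB
```

Note that the gap is about 19x when counted in items but closer to 94x when counted in bytes, because each SampleQuantiles sample is larger.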

This matters less for small numbers of items. The crossover point on the graph 
falls between 10k and 100k items. The scaling pattern looks similar to the 
runtime, overall good.

!http://i.imgur.com/3E3RQ.png!

h2. Error bounds

Note that for this series of plots, the y-axis is linear and the x-axis is log. 
This makes the actual error values easier to interpret. Error was calculated by 
taking the difference between the actual and the estimated rank of the 
percentile, and dividing by the total count.
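
A minimal sketch of that error calculation, assuming a generic reservoir sampler (Algorithm R) as a stand-in for the estimator. This is not the MetricsHistogram or SampleQuantiles code, and every function name here is hypothetical; only the 1028-item reservoir size and the shuffled {{[0, count)}} input come from the description above.

```python
import random

RESERVOIR_SIZE = 1028  # reservoir size quoted in the comment above


def reservoir_sample(stream, capacity, rng):
    """Algorithm R: keep a uniform random sample of at most `capacity` items."""
    reservoir = []
    for seen, item in enumerate(stream, start=1):
        if len(reservoir) < capacity:
            reservoir.append(item)
        else:
            # Keep the new item with probability capacity / seen.
            slot = rng.randrange(seen)
            if slot < capacity:
                reservoir[slot] = item
    return reservoir


def estimate_quantile(sample, q):
    """Return the sample value at quantile q (0 <= q < 1)."""
    ordered = sorted(sample)
    return ordered[int(q * len(ordered))]


def rank_error(estimate, q, count):
    """Rank error as a fraction of the stream length.

    The stream is a permutation of [0, count), so the true rank of a value
    is the value itself and the target rank is q * count.
    """
    return abs(estimate - q * count) / count


def run_trial(count, q, seed):
    """Shuffle [0, count), sample it, and report the rank error at q."""
    rng = random.Random(seed)
    values = list(range(count))
    rng.shuffle(values)
    sample = reservoir_sample(values, RESERVOIR_SIZE, rng)
    return rank_error(estimate_quantile(sample, q), q, count)


# With fewer items than the reservoir holds, nothing is discarded.
print(run_trial(1000, 0.5, seed=0))  # 0.0
```

Calling {{run_trial}} with larger counts and several seeds gives a spread of errors comparable in spirit to the error bars in the plots, though the exact values depend on the sampler and the random seed.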

SampleQuantiles is by default configured to track 50th with 5% error, 75th with 
2.5%, 95th with 0.5%, and 99th with 0.1%. We see less error at higher 
percentiles, and with larger sized streams. For 95th and 99th, we reach 
essentially 0% error at around 1 million items (0.009% for 95th, 0.004% for 
99th).

MetricsHistogram doesn't really provide great error, and high percentiles seem 
to get worse as the number of items increases. There's also large standard 
deviation in error, which is unfortunate if these values are going to be used 
for thresholding. For 95th, it looks like 0.4% to 0.6% error. For 99th, we're 
looking at 0.2% to 0.3%.

An error of half a percent doesn't sound huge, but remember that this is error 
in rank, or effectively error on a uniform latency distribution. To translate 
this, I fitted against the get latency distribution I got from running a mixed 
get/scan YCSB workload against CDH3u1 HBase. At the 95th percentile, a rank 
error of 0.5% translated to 137ms with a range of -3.4% to +4%. At the 99th, a 
rank error of 0.5% translated to 310ms with a range of -21.7% to +43.3%. These 
are just indicative numbers; the important point is that half a percent on the 
tail of a Zipf distribution is pretty meaningful.
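
To illustrate why the same rank error matters more further out in the tail, here is a rough sketch that uses a Pareto distribution as a stand-in for a heavy-tailed latency distribution. The distribution choice, the tail index {{ALPHA}}, and the function names are all assumptions for illustration; this is not the fit to the YCSB data described above and will not reproduce those numbers.

```python
def pareto_quantile(q, alpha, x_min=1.0):
    """Inverse CDF of a Pareto distribution: the latency at quantile q."""
    return x_min / (1.0 - q) ** (1.0 / alpha)


def latency_error_range(q, rank_err, alpha):
    """Relative latency change when the reported rank is off by +/- rank_err."""
    true_latency = pareto_quantile(q, alpha)
    low = pareto_quantile(q - rank_err, alpha) / true_latency - 1.0
    high = pareto_quantile(q + rank_err, alpha) / true_latency - 1.0
    return low, high


ALPHA = 2.0  # hypothetical tail index, chosen only for illustration

for q in (0.95, 0.99):
    low, high = latency_error_range(q, 0.005, ALPHA)
    print(f"p{q * 100:.0f}: {low:+.1%} to {high:+.1%}")
```

Under these assumptions the range at the 99th percentile comes out several times wider than at the 95th, and it is asymmetric, with the upside larger than the downside, which matches the shape of the figures reported above.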

!http://i.imgur.com/m0ERq.png!
!http://i.imgur.com/qvfpR.png!
!http://i.imgur.com/k5y5o.png!
!http://i.imgur.com/uyqAK.png!

h2. Conclusion

For low-rate events (order 0.1s on up) like compactions or flushes, I think it 
can go either way. SampleQuantiles has similar CPU/memory usage up until ~10k 
items, but MetricsHistogram is perfectly accurate up until 1028 items, has 
bounded memory, and can be used to compute other statistics. The 1028 mark 
seems important here; just keep all the data for low-rate events.

For high-rate events (order ms) like RPCs, it depends on whether you care at 
all about accuracy. The memory/CPU overhead of SampleQuantiles is high in 
relative terms (order of magnitude), but you need to use it if you're measuring 
for SLAs since MetricsHistogram basically isn't accurate. It also seems unlikely 
that you'll have that many 1M+ item streams you want to track, and it's just a 
couple hundred KB more memory. Use MetricsHistogram if accuracy isn't important, 
but I feel like SampleQuantiles is a pretty reasonable choice.

Hopefully that was enlightening. I posted the raw data and plotting script if 
anyone else wants to play with it, and I can post the test code snippets used 
to make the data if anyone's interested in that too.


[jira] [Commented] (HBASE-6261) Better approximate high-percentile percentile latency metrics

2012-07-23 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13421063#comment-13421063
 ] 

Elliott Clark commented on HBASE-6261:
--

@Stack
Yes.  I was thinking that I would bring Andrew's quantile stuff in while 
working on the transition to metrics2.

@Andrew
Wow.  That's pretty awesome stuff.  I think your final recommendations are spot 
on.  We should have a way for users to turn on/off these high fidelity 
histograms for different sets of metrics (rpc, compactions, etc).

> Better approximate high-percentile percentile latency metrics
> -
>
> Key: HBASE-6261
> URL: https://issues.apache.org/jira/browse/HBASE-6261
> Project: HBase
>  Issue Type: New Feature
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>  Labels: metrics
> Attachments: Latencyestimation.pdf, MetricsHistogram.data, 
> SampleQuantiles.data, parse.py
>




[jira] [Commented] (HBASE-6261) Better approximate high-percentile percentile latency metrics

2012-06-25 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401178#comment-13401178
 ] 

Otis Gospodnetic commented on HBASE-6261:
-

@Andrew - Ted Dunning may have thoughts on this and/or pointers to Mahout math 
or something else.





[jira] [Commented] (HBASE-6261) Better approximate high-percentile percentile latency metrics

2012-06-26 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401463#comment-13401463
 ] 

Otis Gospodnetic commented on HBASE-6261:
-

@Andrew See https://twitter.com/otisg/status/217487624804376576





[jira] [Commented] (HBASE-6261) Better approximate high-percentile percentile latency metrics

2012-06-26 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401512#comment-13401512
 ] 

Andrew Purtell commented on HBASE-6261:
---

Which provoked this response: 
https://twitter.com/ted_dunning/status/217488314297626625
{quote}
The basic techniques from the Mahout OnlineSummarizer will work for this.
{quote}

Would be great if any subsequent conversation happened on this JIRA instead of 
in twitterspace. 





[jira] [Commented] (HBASE-6261) Better approximate high-percentile percentile latency metrics

2012-06-26 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401946#comment-13401946
 ] 

Zhihong Ted Yu commented on HBASE-6261:
---

"Approximate Counts and Quantiles over Sliding Windows" is more desirable for 
its ability to do arbitrary percentiles for sliding time windows.





[jira] [Commented] (HBASE-6261) Better approximate high-percentile percentile latency metrics

2012-06-26 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401978#comment-13401978
 ] 

Elliott Clark commented on HBASE-6261:
--

Sliding windows are much less useful if they come with a big cost.  I'd much 
rather move the moving average computation into something like OpenTSDB than 
have it in hbase.  HBase should keep as little history as possible.  That way 
people who are interested in deep metrics can get them and move them into a 
dedicated system; everyone else can ignore them without paying a high cost.

Speed > Memory > Accuracy





[jira] [Commented] (HBASE-6261) Better approximate high-percentile percentile latency metrics

2012-06-27 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402354#comment-13402354
 ] 

Zhihong Ted Yu commented on HBASE-6261:
---

So far I think the assumption is that the new algorithm would apply to the 
computation of all metrics.

Is it possible to configure "Approximate Counts and Quantiles over Sliding 
Windows" for a few selected metrics (to be consumed by load balancer, e.g.) 
while the others get computed with a lightweight algorithm?





[jira] [Commented] (HBASE-6261) Better approximate high-percentile percentile latency metrics

2012-06-27 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402429#comment-13402429
 ] 

Andrew Wang commented on HBASE-6261:


@Elliott: Moving averages can be cheaply computed on the existing reservoir 
sample; this is more about percentiles. I'm not sure how OpenTSDB factors into 
this, since you'd have to feed the latency stream to OpenTSDB to figure out 
percentiles, which seems expensive. Depending on how tight your speed and 
memory constraints are, I think we could do this in HBase at acceptably minimal 
cost, or make this configurable somehow.

@Ted: The additional cost to do sliding windows is somewhat significant (I 
think 10s of MB more memory). Both the sliding and non-sliding methods allow 
for arbitrary percentiles. Anyway, I think reporting the 50th, 90th, 95th, and 
99th should satisfy anyone. Mixing and matching algorithms is possible and 
probably even advised since it's only worth doing this for high-rate streams 
where accuracy is important. Implementations of the cheaper and less accurate 
algos are already available.





[jira] [Commented] (HBASE-6261) Better approximate high-percentile percentile latency metrics

2012-06-27 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402443#comment-13402443
 ] 

Elliott Clark commented on HBASE-6261:
--

Basically I'm saying that I don't think that sliding windows are useful since 
most things that consume the metrics can do a moving average, which does much 
the same job as sliding windows.







[jira] [Commented] (HBASE-6261) Better approximate high-percentile percentile latency metrics

2012-06-27 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402733#comment-13402733
 ] 

Andrew Wang commented on HBASE-6261:


I've got my Java implementation of the non-sliding biased quantiles algorithm 
(QuantileEstimationCKMS.java) up on github:

https://github.com/umbrant/QuantileEstimation

Benchmarking on my laptop, I pushed 1 million shuffled items in [0, 10**9) 
through it in 1.2 seconds while asking it to track the 50th, 90th, 95th, and 
99th percentiles with low error. It kept ~5500 samples to do this, which, at 
~36B per sample, is about 193KiB. Empirical error was basically 0. I also ran 
it for 10 million random longs, which took 19s and about 685KiB.

I think this is pretty lightweight. If this sounds reasonable, I'll start 
working on a patch.
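
The footprint and throughput figures above can be reproduced with a few lines of arithmetic. The helper below is only a sketch; the class and method names are made up for illustration, and the ~36B per-sample cost is taken from the comment as given.

```java
/** Illustrative arithmetic for the benchmark figures; names are hypothetical. */
final class SampleFootprint {
    /** Approximate footprint in KiB for a number of retained samples. */
    static double kib(long samples, int bytesPerSample) {
        return samples * (double) bytesPerSample / 1024.0;
    }

    /** Average throughput in items per second. */
    static double itemsPerSecond(long items, double seconds) {
        return items / seconds;
    }

    public static void main(String[] args) {
        // 5500 samples at 36 bytes each, expressed in KiB.
        System.out.println(kib(5500, 36));
        // 1 million items in 1.2 seconds.
        System.out.println(itemsPerSecond(1_000_000L, 1.2));
    }
}
```

Run the other way, 685KiB at the same assumed 36B per sample would correspond to roughly 19,500 retained samples for the 10 million item run.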





[jira] [Commented] (HBASE-6261) Better approximate high-percentile percentile latency metrics

2012-06-28 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403225#comment-13403225
 ] 

Zhihong Ted Yu commented on HBASE-6261:
---

Nice work.

In QuantileEstimationCKMS.java:
{code}
  long[] buffer = new long[500];
{code}
I think the buffer size should be configurable.
Can we maintain a metric for how often compress() is called ?
Should compress() return an int indicating how many items are removed ?
What if no item gets removed coming out of a call to compress() ?

Please work on a patch.





[jira] [Commented] (HBASE-6261) Better approximate high-percentile percentile latency metrics

2012-06-28 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403280#comment-13403280
 ] 

Andrew Wang commented on HBASE-6261:


I don't think performance is very sensitive to the buffer size; it's just a way 
of batching inserts for efficiency. It definitely doesn't affect accuracy 
because I have it call insertBatch() on every query().

We can maintain the compress count and track the # items removed, but I don't 
know if it's really worth exposing to the user (metrics for our metrics?). I 
think it's nice for testing though, so I'll try to expose it internally.

I've never seen compress() fail to remove any items, but I guess this could 
happen with some adversarial pattern. I don't think you can do much about it 
though, since the algo needs those items to maintain the error bounds.
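
As a rough illustration of what compress() returning a removal count could look like, here is a simplified targeted-quantile summary in the style of the biased-quantiles paper [1]. It is a sketch under stated assumptions, not QuantileEstimationCKMS.java: there is no insert buffer, an ArrayList stands in for the LinkedList, and every name (TargetedQuantilesSketch, allowableError, compressCalls, and so on) is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative targeted-quantile summary; not the code under review. */
final class TargetedQuantilesSketch {
    private static final class Item {
        final long value; // observed value
        int g;            // rank gap to the previous retained item
        final int delta;  // rank uncertainty assigned at insert time
        Item(long value, int g, int delta) {
            this.value = value;
            this.g = g;
            this.delta = delta;
        }
    }

    private final double[][] targets; // rows of {quantile, allowed error}
    private final List<Item> sample = new ArrayList<>();
    private long count = 0;
    private long compressCalls = 0;   // kept internally, e.g. for tests

    TargetedQuantilesSketch(double[][] targets) {
        this.targets = targets;
    }

    /** Rank uncertainty the tracked targets allow at the given rank. */
    private double allowableError(double rank) {
        double min = count + 1;
        for (double[] t : targets) {
            double q = t[0];
            double eps = t[1];
            double f = rank <= q * count
                    ? 2 * eps * (count - rank) / (1 - q)
                    : 2 * eps * rank / q;
            min = Math.min(min, f);
        }
        return min;
    }

    void insert(long v) {
        int i = 0;
        long rank = 0; // sum of g over the items before position i
        while (i < sample.size() && sample.get(i).value <= v) {
            rank += sample.get(i).g;
            i++;
        }
        int delta = 0; // a new minimum or maximum is stored exactly
        if (i > 0 && i < sample.size()) {
            delta = Math.max(0, (int) Math.floor(allowableError(rank)) - 1);
        }
        sample.add(i, new Item(v, 1, delta));
        count++;
    }

    /** Merges neighbours where the error budget allows; returns the number removed. */
    int compress() {
        compressCalls++;
        if (sample.size() < 3) {
            return 0;
        }
        int removed = 0;
        long rank = count - sample.get(sample.size() - 1).g; // rank before the last item
        for (int i = sample.size() - 2; i >= 1; i--) {
            Item cur = sample.get(i);
            Item next = sample.get(i + 1);
            rank -= cur.g; // now the rank before item i
            if (cur.g + next.g + next.delta <= allowableError(rank)) {
                next.g += cur.g; // fold item i into its right-hand neighbour
                sample.remove(i);
                removed++;
            }
        }
        return removed;
    }

    long query(double q) {
        if (sample.isEmpty()) {
            throw new IllegalStateException("no samples");
        }
        double desired = q * count;
        double bound = desired + allowableError(desired) / 2;
        long rank = 0;
        for (int i = 1; i < sample.size(); i++) {
            rank += sample.get(i - 1).g;
            Item cur = sample.get(i);
            if (rank + cur.g + cur.delta > bound) {
                return sample.get(i - 1).value;
            }
        }
        return sample.get(sample.size() - 1).value;
    }

    long count() { return count; }
    int sampleSize() { return sample.size(); }
    long compressCalls() { return compressCalls; }
}
```

In this sketch a compress() call that returns 0 just means no adjacent pair fit within the error budget, so the summary keeps every item until a later call can merge some of them.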





[jira] [Commented] (HBASE-6261) Better approximate high-percentile percentile latency metrics

2012-06-28 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403284#comment-13403284
 ] 

Zhihong Ted Yu commented on HBASE-6261:
---

bq. algo needs those items to maintain the error bounds.
Right. That's why I was looking for a data structure that can grow in size.





[jira] [Commented] (HBASE-6261) Better approximate high-percentile percentile latency metrics

2012-06-28 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403304#comment-13403304
 ] 

Andrew Wang commented on HBASE-6261:


Yeah, everything ultimately goes into the {{sample}} LinkedList. The 
fixed-size {{buffer}} is just used to do more efficient batch inserts into 
{{sample}}.
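
The buffer-then-merge pattern can be sketched on its own, leaving out the compression step entirely. The class below is a hypothetical illustration (BufferedSortedSample and its method names are not from the GitHub code): values collect in a fixed-size array, get sorted, and are merged into a sorted LinkedList in one forward pass, and query() flushes first so the answer does not depend on the buffer size.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.ListIterator;

/** Illustrative only: a fixed-size buffer feeding a sorted list, with no compression. */
final class BufferedSortedSample {
    private final LinkedList<Long> sample = new LinkedList<>();
    private final long[] buffer;
    private int buffered = 0;

    BufferedSortedSample(int bufferSize) {
        if (bufferSize < 1) {
            throw new IllegalArgumentException("bufferSize must be >= 1");
        }
        this.buffer = new long[bufferSize];
    }

    void insert(long v) {
        buffer[buffered++] = v;
        if (buffered == buffer.length) {
            insertBatch();
        }
    }

    /** Sorts the buffer and merges it into the sorted list in a single forward pass. */
    void insertBatch() {
        if (buffered == 0) {
            return;
        }
        Arrays.sort(buffer, 0, buffered);
        ListIterator<Long> it = sample.listIterator();
        for (int i = 0; i < buffered; i++) {
            long v = buffer[i];
            // Skip every element <= v; step back once when a larger one is found.
            while (it.hasNext()) {
                if (it.next() > v) {
                    it.previous();
                    break;
                }
            }
            it.add(v);
        }
        buffered = 0;
    }

    /** Flushes pending values first, then returns the nearest-rank quantile. */
    long query(double q) {
        insertBatch();
        if (sample.isEmpty()) {
            throw new IllegalStateException("no samples");
        }
        int rank = (int) Math.ceil(q * sample.size());
        int index = Math.max(0, Math.min(sample.size() - 1, rank - 1));
        return sample.get(index);
    }

    int size() {
        return sample.size() + buffered;
    }
}
```

Because the buffer is sorted before the merge, one pass over the list handles the whole batch instead of one list traversal per inserted value, which is the efficiency argument for batching.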





[jira] [Commented] (HBASE-6261) Better approximate high-percentile percentile latency metrics

2012-06-28 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403456#comment-13403456
 ] 

Zhihong Ted Yu commented on HBASE-6261:
---

Makes sense.
Looking forward to the patch.





[jira] [Commented] (HBASE-6261) Better approximate high-percentile percentile latency metrics

2012-06-28 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403571#comment-13403571
 ] 

Andrew Wang commented on HBASE-6261:


I filed HADOOP-8541, since this is going to be landing in hadoop-common's 
metrics2. When HBASE-5040 clears, we can look into actually hooking it up in 
HBase.





[jira] [Commented] (HBASE-6261) Better approximate high-percentile percentile latency metrics

2012-06-28 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403576#comment-13403576
 ] 

Elliott Clark commented on HBASE-6261:
--

If it only lands in hadoop, are we going to be able to use it at all? 
Reflection doesn't seem like it's really viable here, where we're trying to 
call the same method on lots of different Histogram objects; it would be pretty 
slow on top of the perf hit we would be taking for the added accuracy.

Can it just replace MetricsHistogram ?





[jira] [Commented] (HBASE-6261) Better approximate high-percentile percentile latency metrics

2012-06-28 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403586#comment-13403586
 ] 

Andrew Wang commented on HBASE-6261:


I think it'll be usable from common; it's going to be like the existing 
MutableCounter or MutableStat in that you instantiate it once and then call 
updateMethod() a bunch. Unless HBase does it differently than the datanode, I 
don't think reflection is used on the hot path of tracking the stream of 
values, just occasionally to publish it via JMX.





[jira] [Commented] (HBASE-6261) Better approximate high-percentile percentile latency metrics

2012-06-28 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403595#comment-13403595
 ] 

Elliott Clark commented on HBASE-6261:
--

But we won't be able to require the new version of hadoop that would contain 
the code for quite a while. So we would have to keep our current Histogram 
implementation, use reflection to see if the hadoop jars contain UberHistogram 
(or whatever you plan on calling it), and if so, use reflection to interact 
with it.
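
One way to keep reflection off the hot path in that scenario is to probe once at startup and then talk to whichever implementation was chosen through a plain interface. The sketch below assumes a hypothetical LatencyRecorder interface, a FallbackRecorder standing in for the current Histogram, and an adapter class (named by the caller) that would be compiled against the newer hadoop-common; none of these names exist in HBase or Hadoop.

```java
/** Illustrative only: pick an implementation once, then call it directly. */
interface LatencyRecorder {
    void add(long latency);
    long count();
}

/** Stand-in for the existing implementation that is always available. */
final class FallbackRecorder implements LatencyRecorder {
    private long count = 0;

    public void add(long latency) {
        count++;
    }

    public long count() {
        return count;
    }
}

final class RecorderFactory {
    /** True if the named class can be loaded; reflection is confined to this check. */
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, RecorderFactory.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Tries the hypothetical adapter class once; falls back if it cannot be used. */
    static LatencyRecorder create(String adapterClassName) {
        if (isPresent(adapterClassName)) {
            try {
                Object o = Class.forName(adapterClassName)
                        .getDeclaredConstructor()
                        .newInstance();
                if (o instanceof LatencyRecorder) {
                    return (LatencyRecorder) o;
                }
            } catch (ReflectiveOperationException | LinkageError e) {
                // Adapter or its dependencies are unusable; use the fallback below.
            }
        }
        return new FallbackRecorder();
    }
}
```

After create() returns, every add() is an ordinary interface call, so the reflective lookup is paid once per recorder at setup rather than once per recorded value.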
