[ https://issues.apache.org/jira/browse/HBASE-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402733#comment-13402733 ]
Andrew Wang commented on HBASE-6261:
------------------------------------

I've got my Java implementation of the non-sliding biased quantiles algorithm (QuantileEstimationCKMS.java) up on github:

https://github.com/umbrant/QuantileEstimation

Benchmarking on my laptop, I pushed 1 million shuffled items [0, 10**9) through it in 1.2 seconds while asking it to track the 50th, 90th, 95th, and 99th percentiles with low error. It kept ~5500 samples to do this, which, at ~36B per sample, is about 193KiB. Empirical error was basically 0.

I also ran it for 10 million random longs, which took 19s and about 685KiB. I think this is pretty lightweight.

If this sounds reasonable, I'll start working on a patch.

> Better approximate high-percentile percentile latency metrics
> -------------------------------------------------------------
>
>                 Key: HBASE-6261
>                 URL: https://issues.apache.org/jira/browse/HBASE-6261
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Andrew Wang
>              Labels: metrics
>         Attachments: Latencyestimation.pdf
>
>
> The existing reservoir-sampling based latency metrics in HBase are not
> well-suited for providing accurate estimates of high-percentile (e.g. 90th,
> 95th, or 99th) latency. This is a well-studied problem in the literature (see
> [1] and [2]); the question is determining which methods best suit our needs
> and then implementing them.
> Ideally, we should be able to estimate these high percentiles with minimal
> memory and CPU usage as well as minimal error (e.g. 1% error on 90th, or .1%
> on 99th). It's also desirable to provide this over different time-based
> sliding windows, e.g. last 1 min, 5 mins, 15 mins, and 1 hour.
> I'll note that this would also be useful in HDFS, or really anywhere latency
> metrics are kept.
> [1] http://www.cs.rutgers.edu/~muthu/bquant.pdf
> [2] http://infolab.stanford.edu/~manku/papers/04pods-sliding.pdf

--
This message is automatically generated by JIRA.