[ https://issues.apache.org/jira/browse/KAFKA-5203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16002605#comment-16002605 ]
ASF GitHub Bot commented on KAFKA-5203:
---------------------------------------

GitHub user iv-m opened a pull request:

    https://github.com/apache/kafka/pull/3002

    KAFKA-5203: Metrics: fix resetting of histogram sample

    Without the histogram cleanup, the percentiles are calculated
    incorrectly after purging of one or more samples: event counts
    go out of sync with counts in histogram buckets, and bucket with
    lower value gets chosen for the given quantile.

    This change adds the necessary histogram cleanup.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/iv-m/kafka kafka-5203-percentiles-fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/3002.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3002

----
commit 07c176a23ea11b6ba7a103d8aaf4867591ba9293
Author: Ivan A. Melnikov <i...@altlinux.org>
Date:   2017-05-09T12:15:27Z

    KAFKA-5203: Metrics: fix resetting of histogram sample

    Without the histogram cleanup, the percentiles are calculated
    incorrectly after purging of one or more samples: event counts
    go out of sync with counts in histogram buckets, and bucket with
    lower value gets chosen for the given quantile.

    This change adds the necessary histogram cleanup.

----

> Percentilles are calculated incorrectly
> ---------------------------------------
>
>                 Key: KAFKA-5203
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5203
>             Project: Kafka
>          Issue Type: Bug
>          Components: metrics
>            Reporter: Ivan A. Melnikov
>            Priority: Minor
>
> After the samples are purged a couple of times, the calculated percentile
> values tend to decrease compared to the expected values.
> Consider the following simple example (sorry, idk if I can make it shorter):
> {code}
> int buckets = 100;
> Metrics metrics = new Metrics(new MetricConfig().eventWindow(buckets/2).samples(2));
> Sensor sensor = metrics.sensor("test");
> sensor.add(new Percentiles(4 * buckets, 100.0, Percentiles.BucketSizing.CONSTANT,
>     new Percentile(metrics.metricName("test.p50", "grp1"), 50),
>     new Percentile(metrics.metricName("test.p75", "grp1"), 75)));
> Metric p50 = metrics.metrics().get(metrics.metricName("test.p50", "grp1"));
> Metric p75 = metrics.metrics().get(metrics.metricName("test.p75", "grp1"));
> for (int i = 0; i < buckets; i++) sensor.record(i);
> System.out.printf("p50=%.3f p75=%.3f\n", p50.value(), p75.value());
> for (int i = 0; i < buckets; i++) sensor.record(i);
> System.out.printf("p50=%.3f p75=%.3f\n", p50.value(), p75.value());
> for (int i = 0; i < buckets; i++) sensor.record(i);
> System.out.printf("p50=%.3f p75=%.3f\n", p50.value(), p75.value());
> {code}
> The output from this is:
> {noformat}
> p50=50.000 p75=74.490
> p50=24.490 p75=36.735
> p50=15.306 p75=24.490
> {noformat}
> The expected output is, of course, with all three lines similar to the first
> one.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
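
For readers who want to see the mechanism in isolation: the pull request attributes the drift to event counts going out of sync with the histogram bucket counts once a sample is purged. The sketch below models that with a single sample and no Kafka dependency. It is a simplified illustration, not Kafka's implementation; the class `HistogramResetSketch`, the nested `Sample` type, and the `clearHistogram` flag are all hypothetical names introduced here.

```java
import java.util.Arrays;

public class HistogramResetSketch {

    /** Hypothetical, simplified stand-in for one metrics sample; not Kafka's actual class. */
    static final class Sample {
        private final int[] bins;   // one bin per integer value in [0, bins.length)
        private long eventCount;    // number of events recorded since the last reset

        Sample(int binCount) {
            this.bins = new int[binCount];
        }

        void record(int value) {
            bins[value]++;
            eventCount++;
        }

        /** Resets the event counter; clears the bins only if clearHistogram is true. */
        void reset(boolean clearHistogram) {
            eventCount = 0;
            if (clearHistogram) {
                Arrays.fill(bins, 0);
            }
        }

        /** Returns the first bin whose cumulative share of eventCount exceeds the quantile. */
        int percentile(double quantile) {
            long seen = 0;
            for (int b = 0; b < bins.length; b++) {
                seen += bins[b];
                if ((double) seen / eventCount > quantile) {
                    return b;
                }
            }
            return bins.length - 1;
        }
    }

    /** Records 0..99, resets the sample, records 0..99 again, and returns the median. */
    static int medianAfterReset(boolean clearHistogram) {
        Sample sample = new Sample(100);
        for (int i = 0; i < 100; i++) sample.record(i);
        sample.reset(clearHistogram);
        for (int i = 0; i < 100; i++) sample.record(i);
        return sample.percentile(0.5);
    }

    public static void main(String[] args) {
        System.out.println("stale histogram:   p50=" + medianAfterReset(false));
        System.out.println("cleared histogram: p50=" + medianAfterReset(true));
    }
}
```

With the stale histogram every bin holds two events while the counter says 100, so the cumulative share crosses 0.5 at bin 25 instead of bin 50. Clearing the bins on reset restores the median to 50, which is the same direction of error as the shrinking values in the report.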