nicmichael opened a new issue #2053: FastCodahale Timers miscalculate percentiles URL: https://github.com/apache/bookkeeper/issues/2053 **BUG REPORT** ***Describe the bug*** FastCodahale Timer implementation may miscalculate percentiles if snapshots of values are slightly out of sync, and if only few events have been recorded. FastCodahale Timers use fine-grained locking and are meant to tolerate that (some) values change while being recorded or while snapshots are created. Currently, the total count of requests is not synchronized with the number of requests recorded in percentile buckets. If a snapshot is created while the total count of the timer has been incremented beyond the sum of values in the percentile buckets, the percentile calculation may produce wrong values. For example, if 3 percentile values have been recorded, but the overall count is 4, then the percentile calculation would be based on 4 values. This becomes most obvious if a percentile > .75 (e.g. p95) is being calculated. For this, the implementation will try to find 0.95 * 4 values, which is more than the 3 values recorded in the buckets. Since no bucket fulfills the criteria, the bound of the last (overflow) bucket will be returned, i.e. Long.MAX_VALUE. ***To Reproduce*** I added a test case in my pull request to fix this bug which demonstrates the behavior. ***Expected behavior*** Snapshots should base the percentile calculation on the sum of all values contained in the buckets, not on any value provided by a caller (which might be wrong or out of sync).
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services
