[https://issues.apache.org/jira/browse/CASSANDRA-11752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15366750#comment-15366750]

Per Otterström commented on CASSANDRA-11752:
--------------------------------------------

I understand your concern. I have not verified the performance impact myself. 
The locking scheme is very much inspired by the one used by the 
[ExponentiallyDecayingReservoir|https://github.com/dropwizard/metrics/blob/3.1-maintenance/metrics-core/src/main/java/com/codahale/metrics/ExponentiallyDecayingReservoir.java]
 in the Metrics library.
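For reference, a minimal sketch of that general read/write locking pattern, written roughly in the spirit of ExponentiallyDecayingReservoir. It is not the 11752-2.2.txt patch or the Metrics code. `RescaleLockingSketch` is a hypothetical class, and the forward-decay weight that doubles every minute since the landmark is an assumption inferred from the rescale arithmetic in this comment. Updaters share the read lock. The single thread that wins a compare-and-set on the next rescale time takes the write lock, which waits for in-flight updates and blocks new ones while the buckets are divided down.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicLongArray;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch of a read/write locking scheme for a forward-decaying histogram. */
public class RescaleLockingSketch {
    private final AtomicLongArray decayingBuckets;
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final AtomicLong nextRescaleMillis;
    private final long rescaleIntervalMillis;
    private volatile long landmarkMillis;

    public RescaleLockingSketch(int buckets, long nowMillis, long rescaleIntervalMillis) {
        this.decayingBuckets = new AtomicLongArray(buckets);
        this.rescaleIntervalMillis = rescaleIntervalMillis;
        this.landmarkMillis = nowMillis;
        this.nextRescaleMillis = new AtomicLong(nowMillis + rescaleIntervalMillis);
    }

    // Assumed forward-decay weight: 2^(whole minutes since the landmark).
    // The rescale interval keeps the exponent far below 63.
    private long weight(long nowMillis) {
        long minutes = Math.max(0L, (nowMillis - landmarkMillis) / 60_000L);
        return 1L << (int) minutes;
    }

    public void update(int bucket, long nowMillis) {
        rescaleIfNeeded(nowMillis);
        lock.readLock().lock(); // many updaters may hold the read lock at once
        try {
            decayingBuckets.addAndGet(bucket, weight(nowMillis));
        } finally {
            lock.readLock().unlock();
        }
    }

    private void rescaleIfNeeded(long nowMillis) {
        long next = nextRescaleMillis.get();
        // Only the thread that wins the CAS performs the rescale.
        if (nowMillis >= next
                && nextRescaleMillis.compareAndSet(next, nowMillis + rescaleIntervalMillis)) {
            lock.writeLock().lock(); // waits for in-flight updates, blocks new ones
            try {
                long factor = weight(nowMillis);
                for (int i = 0; i < decayingBuckets.length(); i++) {
                    decayingBuckets.set(i, decayingBuckets.get(i) / factor);
                }
                landmarkMillis = nowMillis; // weights restart at 2^0
            } finally {
                lock.writeLock().unlock();
            }
        }
    }

    public long get(int bucket) {
        return decayingBuckets.get(bucket);
    }

    public static void main(String[] args) {
        RescaleLockingSketch h = new RescaleLockingSketch(4, 0L, 30 * 60_000L);
        h.update(1, 0L);      // weight 1
        h.update(1, 60_000L); // weight 2
        System.out.println("bucket 1 before rescale: " + h.get(1));
    }
}
```

The cost the comment is worried about is visible here: every updater blocks for as long as the writer holds the lock and walks all buckets.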

It should be possible to add some kind of buffering during rescale. Another 
option with less complexity could be to simply skip recording into the decaying 
buckets during rescale and let updating threads continue. We would lose some 
accuracy in the percentiles and possibly an outlier in the min/max values. We 
can still add metrics to the non-decaying buckets during rescale, so getValues() 
will still be just as accurate as it is now. Any opinions on this?
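A minimal sketch of the "skip during rescale" option, assuming the rescaling thread holds the write lock of a ReentrantReadWriteLock while it divides the decaying buckets. `SkipDuringRescaleSketch` and its methods are hypothetical names, not part of the patch. The non-decaying count is always incremented. The decaying sample is recorded only if `readLock().tryLock()` succeeds, which per the JDK docs fails only while another thread holds the write lock.

```java
import java.util.concurrent.atomic.AtomicLongArray;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch: updates never block on a rescale; decaying samples are dropped instead. */
public class SkipDuringRescaleSketch {
    private final AtomicLongArray decaying;
    private final AtomicLongArray nonDecaying;
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    public SkipDuringRescaleSketch(int buckets) {
        decaying = new AtomicLongArray(buckets);
        nonDecaying = new AtomicLongArray(buckets);
    }

    /** Returns true if the decaying bucket was updated, false if the sample was skipped. */
    public boolean update(int bucket, long weight) {
        // Cumulative counts are unaffected by rescaling, so always record them.
        nonDecaying.incrementAndGet(bucket);
        // Fails immediately (no waiting) while another thread holds the write lock.
        if (!lock.readLock().tryLock()) {
            return false; // rescale in progress: drop this decaying sample
        }
        try {
            decaying.addAndGet(bucket, weight);
            return true;
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Divides every decaying bucket by factor while holding the write lock. */
    public void rescale(long factor) {
        lock.writeLock().lock();
        try {
            for (int i = 0; i < decaying.length(); i++) {
                decaying.set(i, decaying.get(i) / factor);
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    public long decayingCount(int bucket) {
        return decaying.get(bucket);
    }

    public long nonDecayingCount(int bucket) {
        return nonDecaying.get(bucket);
    }

    public static void main(String[] args) {
        SkipDuringRescaleSketch h = new SkipDuringRescaleSketch(4);
        h.update(1, 4L);
        h.update(1, 4L);
        h.rescale(2L);
        System.out.println("decaying=" + h.decayingCount(1) + " nonDecaying=" + h.nonDecayingCount(1));
    }
}
```

One caveat with this choice: `ReadLock.tryLock()` barges even under a fair policy, so a steady stream of updates could delay the rescaling thread in acquiring the write lock.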

I went for a 30 minute rescale interval based on the assumption that in an 
extreme case a metric could hit the same bucket a million times every second, 
i.e. 60M times every minute. After 30 minutes the forward decay factor will be 
2^29 = 536870912. The accumulated value will be 60M, 180M, 420M, ... 64P, which 
can be represented with 56 bits, giving us some extra headroom in a signed 
64-bit long. Based on these assumptions it could be possible to fit another few 
minutes, but 60 would be too much. We should perhaps mention these assumptions 
in the Javadoc.
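The arithmetic above can be checked with a small sketch. It assumes, as the numbers imply, a forward-decay weight of 2^m during minute m after the landmark and 60M hits per minute on a single bucket, so the value after n minutes is 60M * (2^n - 1). `RescaleHeadroomSketch` is a hypothetical helper, not code from the patch.

```java
/** Hypothetical back-of-envelope check of the rescale-interval headroom. */
public class RescaleHeadroomSketch {
    static final long HITS_PER_MINUTE = 60_000_000L; // extreme case: 1M hits per second

    /** Assumed forward-decay weight applied during minute m (0-based). */
    static long decayFactor(int minute) {
        return 1L << minute;
    }

    /** Bucket value after the given number of minutes: HITS_PER_MINUTE * (2^minutes - 1). */
    static long accumulated(int minutes) {
        long sum = 0;
        for (int m = 0; m < minutes; m++) {
            // Exact arithmetic throws ArithmeticException instead of silently wrapping.
            sum = Math.addExact(sum, Math.multiplyExact(HITS_PER_MINUTE, decayFactor(m)));
        }
        return sum;
    }

    /** Number of bits needed to represent a non-negative long. */
    static int bitsNeeded(long value) {
        return 64 - Long.numberOfLeadingZeros(value);
    }

    /** Number of whole minutes that fit before the bucket overflows a signed long. */
    static int maxMinutesBeforeOverflow() {
        long sum = 0;
        int m = 0;
        try {
            while (true) {
                sum = Math.addExact(sum, Math.multiplyExact(HITS_PER_MINUTE, decayFactor(m)));
                m++;
            }
        } catch (ArithmeticException overflow) {
            return m;
        }
    }

    public static void main(String[] args) {
        System.out.println("factor in minute 29: " + decayFactor(29));
        System.out.println("after 1, 2, 3 minutes: "
                + accumulated(1) + ", " + accumulated(2) + ", " + accumulated(3));
        long after30 = accumulated(30);
        System.out.println("after 30 minutes: " + after30 + " (" + bitsNeeded(after30) + " bits)");
        System.out.println("minutes that fit in a signed long: " + maxMinutesBeforeOverflow());
    }
}
```

Under these assumptions the bucket needs 56 bits after 30 minutes and 37 whole minutes fit before a signed long overflows. That matches "another few minutes" and rules out a 60 minute interval.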

I don't have plots showing the effect of the rescale. I'm out of office for a 
few weeks, but I'll try to verify this and the performance impact as soon as I 
find the time.



> histograms/metrics in 2.2 do not appear recency biased
> ------------------------------------------------------
>
>                 Key: CASSANDRA-11752
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11752
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Chris Burroughs
>            Assignee: Per Otterström
>              Labels: metrics
>             Fix For: 2.2.8
>
>         Attachments: 11752-2.2.txt, boost-metrics.png, 
> c-jconsole-comparison.png, c-metrics.png, default-histogram.png
>
>
> In addition to upgrading to metrics3, CASSANDRA-5657 switched to using a 
> custom histogram implementation.  After upgrading to Cassandra 2.2 
> histograms/timer metrics are now suspiciously flat.  To be useful for 
> graphing and alerting, metrics need to be biased towards recent events.
> I have attached images that I think illustrate this.
>  * The first two are a comparison between latency observed by a C* 2.2 (us) 
> cluster showing very flat lines and a client (using metrics 2.2.0, ms) 
> showing server performance problems.  We can't rule out with total certainty 
> that something else isn't the cause (that's why we measure from both the 
> client & server) but they very rarely disagree.
>  * The 3rd image compares jconsole viewing of metrics on a 2.2 and 2.1 
> cluster over several minutes.  Not a single digit changed on the 2.2 cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)