[jira] [Comment Edited] (CASSANDRA-20250) Optimize Counter, Meter and Histogram metrics using thread local counters

Benedict Elliott Smith (Jira) Mon, 24 Feb 2025 05:26:43 -0800


    [ https://issues.apache.org/jira/browse/CASSANDRA-20250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17929761#comment-17929761 ]


Benedict Elliott Smith edited comment on CASSANDRA-20250 at 2/24/25 1:25 PM:
-----------------------------------------------------------------------------

[~mmuzaf] this still uses the DropWizard API, and implements the correct 
interfaces for querying the metrics. It just doesn't use the concrete classes 
defined by DropWizard.

Looking at the vertx library, they correctly use {{Metered}}, and so the 
majority of these changes would be correctly supported. However, it looks like 
vertx incorrectly uses {{Counter}} rather than {{Counting}}, and {{Histogram}} 
rather than {{Sampling}}. We might want to contribute a patch upstream to fix 
that. It should be a very small patch.
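
To illustrate the distinction between depending on the concrete class and depending on the read-side interface, here is a minimal sketch. It is self-contained and does not use the real library: {{Counting}} and {{Counter}} below are local stand-ins that mirror the shape of the DropWizard types ({{Counting}} declares {{long getCount()}}, and {{Counter}} implements it), while {{CustomCounter}}, {{exportFragile}} and {{exportRobust}} are hypothetical names invented for this example. It is not taken from the patch, vertx or k8ssandra.

```java
import java.util.concurrent.atomic.AtomicLong;

public class InterfaceVsConcreteDemo {
    // Stand-in for com.codahale.metrics.Counting.
    interface Counting {
        long getCount();
    }

    // Stand-in for the concrete DropWizard Counter class.
    static class Counter implements Counting {
        private final AtomicLong count = new AtomicLong();
        public void inc() { count.incrementAndGet(); }
        @Override public long getCount() { return count.get(); }
    }

    // Hypothetical replacement metric: implements the interface,
    // but is NOT a subclass of the concrete Counter.
    static class CustomCounter implements Counting {
        private long count; // single-threaded placeholder storage
        public void inc() { count++; }
        @Override public long getCount() { return count; }
    }

    // Fragile consumer: only recognises the concrete class.
    static String exportFragile(String name, Object metric) {
        if (metric instanceof Counter) {
            return name + "=" + ((Counter) metric).getCount();
        }
        return name + "=unsupported";
    }

    // Robust consumer: depends on the interface only.
    static String exportRobust(String name, Object metric) {
        if (metric instanceof Counting) {
            return name + "=" + ((Counting) metric).getCount();
        }
        return name + "=unsupported";
    }

    public static void main(String[] args) {
        CustomCounter reads = new CustomCounter();
        reads.inc();
        reads.inc();
        System.out.println(exportFragile("reads", reads));
        System.out.println(exportRobust("reads", reads));
    }
}
```

A consumer written like {{exportRobust}} keeps working when the concrete implementation is swapped, which is the kind of small upstream fix suggested above.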

It looks like the k8ssandra project has used entirely the wrong objects 
throughout. I guess we will want to contribute a fix there too.

My personal view is that we should not make unbounded affordances to 
incorrectly implemented consumers of DropWizard. Once we are confident the 
exporting services we directly support are compatible, we should proceed. But, 
given even some of our own consumers are incorrectly implemented, this will 
have to land in a major release, and we will need to discuss it on dev@ as no 
doubt there will be dissenters.


was (Author: benedict):
[~mmuzaf] this still uses the DropWizard API, and implements the correct 
interfaces for querying them. It just doesn't use the concrete classes defined 
by DropWizard.

Looking at the vertx library, they correctly use {{Metered}}, and so the 
majority of these changes would be correctly supported. It however looks like 
vertx incorrectly uses {{Counter}} rather than {{Counting}} ,and {{Histogram}} 
rather than {{Sampling}}. We might want to contribute a patch upstream to fix 
that. It should be a very small patch.

It looks like the k8ssandra project has used entirely the wrong objects 
throughout. I guess that will want a fix contributing too.

My personal view is that we should not make unbounded affordances to 
incorrectly implemented consumers of DropWizard. Once we are confident the 
exporting services we directly support are compatible, we should process. But, 
given our own consumers are incorrectly implemented this will have to land in a 
major release, and we will need to discuss it on dev@ as no doubt there will be 
dissenters.

> Optimize Counter, Meter and Histogram metrics using thread local counters
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-20250
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20250
>             Project: Apache Cassandra
>          Issue Type: New Feature
>          Components: Observability/Metrics
>            Reporter: Dmitry Konstantinov
>            Assignee: Dmitry Konstantinov
>            Priority: Normal
>             Fix For: 5.x
>
>         Attachments: 5.1_profile_cpu.html, 
> 5.1_profile_cpu_without_metrics.html, 5.1_tl4_profile_cpu.html, 
> Histogram_AtomicLong.png, async_profiler_cpu_profiles.zip, 
> cpu_profile_insert.html, image-2025-02-18-23-22-19-983.png, jmh-result.json, 
> vmstat.log, vmstat_without_metrics.log
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Cassandra collects a lot of metrics, and many of them are collected per 
> table, so the number of metric instances is multiplied by the number of 
> tables. On the one hand this gives better observability; on the other hand 
> metrics are not free, and there is an overhead associated with them:
> 1) CPU overhead: under a simple CPU-bound load I already see about 5.5% of 
> total CPU spent on metrics in CPU flame graphs for a read load and 11% for a 
> write load. 
> Example: [^cpu_profile_insert.html] (search for the "codahale" pattern). The 
> flame graph was captured using the Async profiler build 
> async-profiler-3.0-29ee888-linux-x64
> 2) Memory overhead: we spend memory on the entities used to aggregate 
> metrics, such as LongAdders and reservoirs, and on MBeans (String 
> concatenation within object names is a major cause of it: a new String is 
> created for each table+metric name combination).
> LongAdder is used by Dropwizard Counter/Meter and Histogram metrics for 
> counting purposes. It has a severe memory overhead and, while it scales 
> better than AtomicLong, we still have to pay some cost for the concurrent 
> operations. Additionally, in the case of Meter we have non-optimal 
> behaviour, as we count the same things several times.
> The idea (suggested by [~benedict]) is to switch to thread-local counters, 
> which we can store in a common thread-local array to reduce memory overhead. 
> In this way we can avoid concurrent update overhead and contention and 
> reduce the memory footprint as well.
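
The thread-local array idea described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the implementation attached to this ticket: the class name {{ThreadLocalCounters}}, the fixed {{MAX_COUNTERS}} capacity and the use of {{AtomicLongArray}} with {{lazySet}} are all choices made for this example. Each counter owns one slot index; every thread writes only to its own array, so increments need no CAS, and a read sums that slot across all per-thread arrays.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLongArray;

public class ThreadLocalCounters {
    // Assumption for this sketch: a fixed upper bound on the number of counters.
    static final int MAX_COUNTERS = 1024;

    private static final AtomicInteger NEXT_ID = new AtomicInteger();

    // Every per-thread array is registered here so readers can sum across threads.
    // Arrays of finished threads are kept so their counts are not lost
    // (a real implementation would need a strategy for reclaiming them).
    private static final List<AtomicLongArray> ALL_SLOTS = new CopyOnWriteArrayList<>();

    // One shared array per thread, holding that thread's share of every counter.
    private static final ThreadLocal<AtomicLongArray> SLOTS = ThreadLocal.withInitial(() -> {
        AtomicLongArray slots = new AtomicLongArray(MAX_COUNTERS);
        ALL_SLOTS.add(slots);
        return slots;
    });

    static final class Counter {
        private final int id = NEXT_ID.getAndIncrement();

        Counter() {
            if (id >= MAX_COUNTERS) {
                throw new IllegalStateException("too many counters for this sketch");
            }
        }

        // Only the owning thread writes to its array, so a read followed by an
        // ordered store is enough; no compare-and-swap loop is required.
        void inc(long delta) {
            AtomicLongArray mine = SLOTS.get();
            mine.lazySet(id, mine.get(id) + delta);
        }

        // Reads sum this counter's slot over all threads; the result may lag
        // slightly behind increments that are still in flight.
        long getCount() {
            long sum = 0;
            for (AtomicLongArray slots : ALL_SLOTS) {
                sum += slots.get(id);
            }
            return sum;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Counter writes = new Counter();
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 100_000; j++) {
                    writes.inc(1);
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println("writes=" + writes.getCount());
    }
}
```

The trade-off in this sketch is that reads become more expensive (one array lookup per thread), while the hot increment path touches only memory owned by the current thread.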



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
