[jira] [Comment Edited] (CASSANDRA-20250) Optimize Counter, Meter and Histogram metrics using thread local counters

Maxim Muzafarov (Jira) Mon, 24 Feb 2025 05:38:39 -0800


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-20250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17929759#comment-17929759
 ]


Maxim Muzafarov edited comment on CASSANDRA-20250 at 2/24/25 1:12 PM:
----------------------------------------------------------------------

As far as I can see, reporters are not the only problem here. A few projects in 
the Cassandra ecosystem rely heavily on Cassandra exposing Dropwizard metrics 
and the related API. This includes using listeners for the MetricsRegistry and 
the metric types provided by the Dropwizard library. Without support for the 
Dropwizard API, the changes would likely be incompatible, and the impact on the 
community would be quite high.

Examples:

{*}cassandra-sidecar{*} collects metrics via the library [1]. I'm not fully 
familiar with this project, but I briefly checked the sources and it seems it 
also has a registry that collects metrics from a C* node.
[1] [https://vertx.io/docs/4.4.9/vertx-dropwizard-metrics/java/]
[https://github.com/apache/cassandra-sidecar/blob/trunk/server/src/main/java/org/apache/cassandra/sidecar/metrics/SidecarMetricsImpl.java#L42]



{*}management-api-for-apache-cassandra{*} uses CassandraMetricsRegistry 
directly, along with listeners and the metric types.
[https://github.com/k8ssandra/management-api-for-apache-cassandra/blob/master/management-api-agent-common/src/main/java/io/k8ssandra/metrics/interceptors/MetricsInterceptor.java#L71]

 


was (Author: mmuzaf):
Reporters are not the only problem here as far as I can see. A few projects in 
the Cassandra ecosystem rely heavily on Cassandra to have dropwizard metrics 
and the related API. This includes using listeners for the MetricsRegistry and 
metric types provided by the dropwizard library. Without support for the 
Dropwizard API, changes would likely be incompatible and the impact on the 
community is quite high.

Examples:

{*}cassandra-sidecar{*}, collects metrics via the library [1], I'm not fully 
familiar with this project, but briefly checked the sources and it seems it has 
also a registry, that collects metrics from a C* node.
[1] [https://vertx.io/docs/4.4.9/vertx-dropwizard-metrics/java/]

{*}management-api-for-apache-cassandra{*}, they just use 
CassandraMetricsRegistry directly, and listeners of course with metrics types.
[https://github.com/k8ssandra/management-api-for-apache-cassandra/blob/master/management-api-agent-common/src/main/java/io/k8ssandra/metrics/interceptors/MetricsInterceptor.java#L71]

 

> Optimize Counter, Meter and Histogram metrics using thread local counters
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-20250
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20250
>             Project: Apache Cassandra
>          Issue Type: New Feature
>          Components: Observability/Metrics
>            Reporter: Dmitry Konstantinov
>            Assignee: Dmitry Konstantinov
>            Priority: Normal
>             Fix For: 5.x
>
>         Attachments: 5.1_profile_cpu.html, 
> 5.1_profile_cpu_without_metrics.html, 5.1_tl4_profile_cpu.html, 
> Histogram_AtomicLong.png, async_profiler_cpu_profiles.zip, 
> cpu_profile_insert.html, image-2025-02-18-23-22-19-983.png, jmh-result.json, 
> vmstat.log, vmstat_without_metrics.log
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Cassandra collects a lot of metrics, and many of them are collected per 
> table, so the number of metric instances is multiplied by the number of 
> tables. On the one hand this gives better observability; on the other hand 
> metrics are not free, and there is an overhead associated with them:
> 1) CPU overhead: for a simple CPU-bound load, the CPU flamegraphs already 
> show about 5.5% of total CPU spent on metrics for a read load and 11% for a 
> write load. 
> Example: [^cpu_profile_insert.html] (search for the "codahale" pattern). The 
> flamegraph was captured using the Async profiler build 
> async-profiler-3.0-29ee888-linux-x64.
> 2) Memory overhead: memory is spent on the entities used to aggregate 
> metrics, such as LongAdders and reservoirs, and on MBeans (String 
> concatenation within object names is a major cause of it: a new String is 
> created for each table+metric name combination).
> LongAdder is used by the Dropwizard Counter, Meter and Histogram metrics for 
> counting. It has a severe memory overhead, and although it scales better 
> than AtomicLong, we still pay some cost for the concurrent operations. 
> Additionally, Meter behaves non-optimally: it counts the same things 
> several times.
> The idea (suggested by [~benedict]) is to switch to thread-local counters, 
> which we can store in a common thread-local array to reduce memory overhead. 
> This way we can avoid the overhead and contention of concurrent updates and 
> reduce the memory footprint as well.
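
For readers who want a concrete picture of the thread-local array idea in the description, here is a minimal sketch. It is not the CASSANDRA-20250 patch and not a Dropwizard API: the class name ThreadLocalCounters and its register/increment/sum methods are hypothetical, and the fixed capacity is an assumption made to keep the example short. Each thread owns one array holding a slot per counter, so an update is a single-writer store with no CAS; a reader sums the slot across all per-thread arrays.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical sketch: many counters share one array per thread.
final class ThreadLocalCounters {
    private final int capacity;
    private final AtomicInteger nextId = new AtomicInteger();
    // Every per-thread array is kept here so that a reader can sum them.
    private final List<AtomicLongArray> allSlots = new CopyOnWriteArrayList<>();
    private final ThreadLocal<AtomicLongArray> local;

    ThreadLocalCounters(int capacity) {
        this.capacity = capacity;
        this.local = ThreadLocal.withInitial(() -> {
            AtomicLongArray slots = new AtomicLongArray(capacity);
            allSlots.add(slots);
            return slots;
        });
    }

    // Reserves a slot index for a new counter.
    int register() {
        int id = nextId.getAndIncrement();
        if (id >= capacity) throw new IllegalStateException("no free counter slots");
        return id;
    }

    // Only the owning thread writes its array, so no CAS is needed;
    // lazySet publishes the new value with an ordered store.
    void increment(int id, long delta) {
        AtomicLongArray slots = local.get();
        slots.lazySet(id, slots.get(id) + delta);
    }

    // Sums the slot across all threads; the result may lag behind
    // concurrent writers, which is usually acceptable for metrics.
    long sum(int id) {
        long total = 0;
        for (AtomicLongArray slots : allSlots) total += slots.get(id);
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadLocalCounters counters = new ThreadLocalCounters(16);
        int writes = counters.register();
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < 100_000; i++) counters.increment(writes, 1);
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        System.out.println("writes = " + counters.sum(writes));
    }
}
```

The sketch leaves out things a real implementation would have to handle, such as growing the arrays when new counters are registered and releasing the arrays of threads that have exited (here they are retained so that their counts are not lost).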



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
