[jira] [Comment Edited] (CASSANDRA-20250) Provide the ability to disable specific metrics collection

Dmitry Konstantinov (Jira) Thu, 06 Feb 2025 15:00:29 -0800


    [ https://issues.apache.org/jira/browse/CASSANDRA-20250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17924714#comment-17924714 ]


Dmitry Konstantinov edited comment on CASSANDRA-20250 at 2/6/25 10:48 PM:
--------------------------------------------------------------------------

I have added a very basic JMH benchmark to compare increment throughput - 
[code|https://github.com/apache/cassandra/compare/trunk...netudima:cassandra:20250_proto-trunk]
These are the results from a run on my laptop (macOS, OpenJDK 11, 2.6 GHz 6-core 
Intel Core i7). Ideally I need to run it on a server machine with more cores 
for more relevant results:
{code:java}
     [java] Benchmark                                  (type)   Mode  Cnt        Score       Error   Units
     [java] ThreadLocalMetricsBench.increment       LongAdder  thrpt   16   528024.232 ±  5197.699  ops/ms
     [java] ThreadLocalMetricsBench.increment    LazySetArray  thrpt   16   957165.739 ±  7878.474  ops/ms
     [java] ThreadLocalMetricsBench.increment  PiggybackArray  thrpt   16  1016821.284 ± 24017.736  ops/ms
{code}
[^async_profiler_cpu_profiles.zip] 
[^jmh-result.json]

The next step, I think, is to add delta and delta + Int2IntHashMap 
implementations.
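
For readers unfamiliar with the idea, here is a minimal sketch of the kind of thread-local counter that a benchmark like this might compare against LongAdder. It is not the code from the linked branch: the class name ThreadLocalCounters, the integer metric ids, and the use of AtomicLongArray.lazySet are assumptions made purely for illustration. Each thread owns its own array of slots, so an increment needs no CAS, and a reader sums the slots across all threads.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical sketch; names and layout are assumptions, not the prototype's code.
final class ThreadLocalCounters {
    // Every per-thread array is registered here so that readers can sum over them.
    private final List<AtomicLongArray> allArrays = new CopyOnWriteArrayList<>();
    private final ThreadLocal<AtomicLongArray> local;

    ThreadLocalCounters(int metricCount) {
        local = ThreadLocal.withInitial(() -> {
            AtomicLongArray slots = new AtomicLongArray(metricCount);
            allArrays.add(slots);
            return slots;
        });
    }

    // Only the owning thread writes to its array, so a read followed by an
    // ordered (lazySet) write is enough; no compare-and-swap is required.
    void increment(int metricId) {
        AtomicLongArray slots = local.get();
        slots.lazySet(metricId, slots.get(metricId) + 1);
    }

    // Sums the slot over all threads. Concurrent readers may briefly lag
    // behind the most recent increments.
    long sum(int metricId) {
        long total = 0;
        for (AtomicLongArray slots : allArrays) {
            total += slots.get(metricId);
        }
        return total;
    }
}
```

The trade-off in this sketch is that writes avoid contention while reads cost time proportional to the number of threads, and the arrays of terminated threads stay registered unless extra cleanup is added.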



> Provide the ability to disable specific metrics collection
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-20250
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20250
>             Project: Apache Cassandra
>          Issue Type: New Feature
>          Components: Observability/Metrics
>            Reporter: Dmitry Konstantinov
>            Priority: Normal
>         Attachments: async_profiler_cpu_profiles.zip, 
> cpu_profile_insert.html, jmh-result.json
>
>
> Cassandra collects a lot of metrics. Many of them are collected per table, 
> so the number of metric instances is multiplied by the number of tables. On 
> the one hand this gives better observability; on the other hand metrics are 
> not free, and there is overhead associated with them:
> 1) CPU overhead: with a simple CPU-bound load I already see about 5.5% of 
> total CPU spent on metrics in CPU flame graphs for a read load and 11% for a 
> write load. 
> Example: [^cpu_profile_insert.html] (search for the "codahale" pattern)
> 2) Memory overhead: we spend memory on the entities used to aggregate 
> metrics, such as LongAdders and reservoirs, and on MBeans (String 
> concatenation within object names is a major cause: a new String is created 
> for each table + metric name combination)
>  
> The idea of this ticket is to allow an operator to configure a list of 
> disabled metrics in cassandra.yaml, like:
> {code:java}
> disabled_metrics:
>     - metric_a
>     - metric_b
> {code}
> From an implementation point of view I see two possible approaches (which 
> can be combined):
>  # Generic: when a metric is being registered, if it is listed in 
> disabled_metrics we do not publish it via JMX and we provide a no-op 
> implementation of the metric object (such as a histogram) for it.
> Logging analogy: the log level check within a log method
>  # Specialized: for some metrics the value calculation itself is not free 
> and introduces overhead as well. In such cases it would be useful for the 
> specific logic to check through an API (like isMetricEnabled) whether the 
> calculation is needed. Example of such a metric: 
> ClientRequestSizeMetrics.recordRowAndColumnCountMetrics
> Logging analogy: an explicit 'if (isDebugEnabled())' condition used when a 
> message parameter is expensive.
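
The two approaches described above can be sketched roughly as follows. This is only an illustration under stated assumptions: SimpleCounter, LongAdderCounter, NoopCounter and SketchMetricRegistry are hypothetical names, not Cassandra or Dropwizard Metrics types. The constructor argument stands in for the disabled_metrics list from cassandra.yaml, and JMX publication is left out entirely. Only the method name isMetricEnabled is taken from the description.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical counter abstraction used only in this sketch.
interface SimpleCounter {
    void inc(long n);
    long count();
}

// Regular implementation backed by a LongAdder.
final class LongAdderCounter implements SimpleCounter {
    private final LongAdder adder = new LongAdder();
    public void inc(long n) { adder.add(n); }
    public long count() { return adder.sum(); }
}

// Generic approach: a shared no-op object handed out for disabled metrics.
final class NoopCounter implements SimpleCounter {
    static final NoopCounter INSTANCE = new NoopCounter();
    private NoopCounter() { }
    public void inc(long n) { /* intentionally does nothing */ }
    public long count() { return 0L; }
}

// Hypothetical registry; disabledMetrics stands in for the disabled_metrics setting.
final class SketchMetricRegistry {
    private final Set<String> disabledMetrics;

    SketchMetricRegistry(Set<String> disabledMetrics) {
        this.disabledMetrics = new HashSet<>(disabledMetrics);
    }

    // Specialized approach: callers check this before computing an expensive value.
    boolean isMetricEnabled(String name) {
        return !disabledMetrics.contains(name);
    }

    // Generic approach: a disabled metric gets the no-op implementation
    // (a real registry would also skip publishing it via JMX).
    SimpleCounter counter(String name) {
        return isMetricEnabled(name) ? new LongAdderCounter() : NoopCounter.INSTANCE;
    }
}
```

A caller with an expensive calculation would wrap it in a check such as `if (registry.isMetricEnabled("metric_a")) { ... }`, mirroring the 'if (isDebugEnabled())' logging analogy.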



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

