[ https://issues.apache.org/jira/browse/CASSANDRA-20250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17924503#comment-17924503 ]

Benedict Elliott Smith edited comment on CASSANDRA-20250 at 2/6/25 11:50 AM:
-----------------------------------------------------------------------------

bq. I think we have some natural clustering of them, e.g. there is one set 
used by write threads and a very different one used by repair threads

Yes, I agree, and had considered e.g. simply chunking our address space so that 
including an adjacent few hundred metrics might suffice. The problem is doing 
this without introducing a lot of extra latency, but you could simply mask the 
total address space. In fact, there is a fairly simple approach: divide the 
global address space by 64, use simple bit fiddling to determine which slices 
of it you currently have, and still use a single linear array. 

But I think simple is good for a first version.
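A minimal sketch of the slice-mask idea, under one interpretation of "divide the global address space by 64": the space is split into 64 equal slices and a single `long` records which of them a thread has touched, while the values still live in one linear array. The class and method names (`SlicedMetricBuffer`, `add`, `drainInto`) are hypothetical, and visibility between a writer thread and a reader thread is deliberately left out.

```java
import java.util.Objects;

// Hypothetical sketch, single-threaded: one thread's metric buffer over a
// global address space of metric ids, split into 64 equal slices. A single
// 64-bit mask records which slices have been written since the last drain.
final class SlicedMetricBuffer {
    static final int SLICES = 64;

    private final int addressSpace;
    private final int sliceSize;
    private final long[] values; // one linear array covering the whole address space
    private long touchedSlices;  // bit i set => slice i has pending values

    SlicedMetricBuffer(int addressSpace) {
        if (addressSpace <= 0) throw new IllegalArgumentException("addressSpace must be > 0");
        this.addressSpace = addressSpace;
        // Round up so that 64 slices always cover the whole address space.
        this.sliceSize = (addressSpace + SLICES - 1) / SLICES;
        this.values = new long[addressSpace];
    }

    void add(int metricId, long delta) {
        Objects.checkIndex(metricId, addressSpace);
        touchedSlices |= 1L << (metricId / sliceSize); // mark the slice as active
        values[metricId] += delta;
    }

    long touchedSlices() { return touchedSlices; }

    // Fold pending values into totals, visiting only slices whose bit is set.
    void drainInto(long[] totals) {
        long mask = touchedSlices;
        while (mask != 0) {
            int slice = Long.numberOfTrailingZeros(mask);
            mask &= mask - 1; // clear the lowest set bit
            int start = slice * sliceSize;
            int end = Math.min(start + sliceSize, addressSpace);
            for (int i = start; i < end; i++) {
                totals[i] += values[i];
                values[i] = 0;
            }
        }
        touchedSlices = 0;
    }
}
```

With 1000 metric ids each slice covers 16 ids, so a thread that only ever touches ids 3 and 999 scans 32 entries on drain instead of 1000.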

bq. Regarding PhantomReference - I thought about it, but I remember that it may 
negatively impact GC pauses

For a small number of objects like this it should be fine. But this is also 
not a hard requirement: if we run the update method frequently enough, 
this is unlikely to be a serious problem.
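A minimal sketch of PhantomReference-based cleanup, assuming each owner (e.g. a thread-local holder) gets its own counter cell that must not be lost when the owner dies. The names (`PhantomTrackedCounter`, `register`, `update`, `total`) are hypothetical; only the `java.lang.ref` calls are standard. The periodic `update()` folds the cells of collected owners into a retired total.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: one counter cell per owner; a PhantomReference notices
// when the owner is collected so its cell can be folded into a retired total.
final class PhantomTrackedCounter {
    // The tracker holds the cell strongly, but the cell never references the
    // owner, so the owner can still become phantom reachable.
    private static final class Tracker extends PhantomReference<Object> {
        final AtomicLong cell;
        Tracker(Object owner, AtomicLong cell, ReferenceQueue<Object> queue) {
            super(owner, queue);
            this.cell = cell;
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // Trackers must stay strongly reachable themselves, or they are never enqueued.
    private final Set<Tracker> live = ConcurrentHashMap.newKeySet();
    private final AtomicLong retired = new AtomicLong();

    AtomicLong register(Object owner) {
        AtomicLong cell = new AtomicLong();
        live.add(new Tracker(owner, cell, queue));
        return cell;
    }

    // Run periodically: the more often it runs, the fewer dead cells accumulate.
    void update() {
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            Tracker t = (Tracker) ref;
            if (live.remove(t)) retired.addAndGet(t.cell.get());
        }
    }

    // Not an atomic snapshot: concurrent updates may be partially observed.
    long total() {
        update();
        long sum = retired.get();
        for (Tracker t : live) sum += t.cell.get();
        return sum;
    }
}
```

The total is the same whether or not an owner has already been collected; collection only moves its count from the live set to the retired total.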



> Provide the ability to disable specific metrics collection
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-20250
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20250
>             Project: Apache Cassandra
>          Issue Type: New Feature
>          Components: Observability/Metrics
>            Reporter: Dmitry Konstantinov
>            Priority: Normal
>         Attachments: cpu_profile_insert.html
>
>
> Cassandra collects a lot of metrics, and many of them are collected per 
> table, so the number of metric instances is multiplied by the number of 
> tables. On one hand this gives better observability; on the other hand, 
> metrics are not free, and there is an overhead associated with them:
> 1) CPU overhead: for a simple CPU-bound load, I already see about 5.5% of 
> total CPU spent on metrics in CPU flame graphs for a read load and 11% for a 
> write load. 
> Example: [^cpu_profile_insert.html] (search for the "codahale" pattern)
> 2) Memory overhead: we spend memory on the entities used to aggregate 
> metrics, such as LongAdders and reservoirs, plus on MBeans (String 
> concatenation within object names is a major cause of it: a new String is 
> created for each table+metric name combination)
>  
> The idea of this ticket is to allow an operator to configure a list of 
> disabled metrics in cassandra.yaml, like:
> {code:java}
> disabled_metrics:
>     - metric_a
>     - metric_b
> {code}
> From an implementation point of view, I see two possible approaches (which 
> can be combined):
>  # Generic: when a metric is being registered, if it is listed in 
> disabled_metrics, we do not publish it via JMX and instead provide a no-op 
> implementation of the metric object (such as a histogram).
> Logging analogy: the log level check within a log method
>  # Specialized: for some metrics, calculating the value is not free and 
> introduces an overhead as well; in such cases it would be useful to check 
> within the specific logic, using an API (like isMetricEnabled), whether we 
> need to do it at all. Example of such a metric: 
> ClientRequestSizeMetrics.recordRowAndColumnCountMetrics
> Logging analogy: an explicit 'if (isDebugEnabled())' condition used when a 
> message parameter is expensive to compute.
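A minimal sketch of how the two approaches in the description could combine, assuming the disabled_metrics list has already been loaded into a `Set<String>`. Every name here (`MetricsFacade`, `Counter`, `NoopCounter`, `AdderCounter`, `recordIfEnabled`) is hypothetical and not Cassandra's actual API; JMX publication is omitted entirely.

```java
import java.util.Set;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.LongSupplier;

// Hypothetical metric type used only for this sketch.
interface Counter {
    void inc(long n);
    long count();
}

// Shared no-op instance handed out for every disabled metric.
final class NoopCounter implements Counter {
    static final NoopCounter INSTANCE = new NoopCounter();
    private NoopCounter() {}
    public void inc(long n) {}
    public long count() { return 0; }
}

final class AdderCounter implements Counter {
    private final LongAdder adder = new LongAdder();
    public void inc(long n) { adder.add(n); }
    public long count() { return adder.sum(); }
}

final class MetricsFacade {
    private final Set<String> disabled; // assumed to come from disabled_metrics

    MetricsFacade(Set<String> disabled) { this.disabled = Set.copyOf(disabled); }

    boolean isMetricEnabled(String name) { return !disabled.contains(name); }

    // Generic approach: a disabled metric gets the no-op object, so callers
    // need no changes (like the level check inside a log method).
    Counter counter(String name) {
        return isMetricEnabled(name) ? new AdderCounter() : NoopCounter.INSTANCE;
    }

    // Specialized approach: skip an expensive value computation entirely,
    // like wrapping a costly log message in if (isDebugEnabled()).
    void recordIfEnabled(String name, Counter c, LongSupplier expensiveValue) {
        if (isMetricEnabled(name)) c.inc(expensiveValue.getAsLong());
    }
}
```

The generic path still pays for the call into the no-op object but allocates no aggregation state; the specialized path also avoids computing the value being recorded.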



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
