[jira] [Comment Edited] (CASSANDRA-20250) Optimize Counter, Meter and Histogram metrics using thread local counters

Dmitry Konstantinov (Jira) Tue, 18 Feb 2025 15:40:24 -0800


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-20250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17928218#comment-17928218
 ]


Dmitry Konstantinov edited comment on CASSANDRA-20250 at 2/18/25 11:38 PM:
---------------------------------------------------------------------------

I have implemented PhantomReference-based logic to release the metric ids used 
by metric objects (I plan to cover it with tests and commit soon). I used the 
PhantomReference-related logic in LocalPool as an example.

In the case of ThreadLocalMetrics I see the following resolvable issue: if I 
track the ThreadLocalMetrics objects themselves, I cannot keep a hard reference 
to them, because that would prevent the PhantomReference from ever triggering. 
At the same time I need an accurate value for count, so I cannot lose the 
content of a ThreadLocalMetrics when its thread dies: the count logic still 
needs access to it until the content has been transferred to a summary. So, we 
need a list of ThreadLocalMetrics (or a list of the ThreadLocalMetrics inner 
arrays) to be able to calculate and return an actual count.

One possible way to resolve this issue is to use a PhantomReference to track 
another object associated with the thread, for example the Thread itself:

!image-2025-02-18-23-22-19-983.png|width=600!

So, we have a scheme like this: we keep hard references to ThreadLocalMetrics 
and read them to calculate a count even if the thread is dead. A 
PhantomReference to the Thread object itself triggers the eventual transfer of 
the ThreadLocalMetrics values to a summary and the removal of the 
ThreadLocalMetrics object from the shared list, which releases it.
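
A minimal sketch of the scheme described above, not the actual patch. The 
class names (LocalCounter, ThreadTrackedCounter, ThreadRef) are hypothetical 
stand-ins, and a single per-thread counter is assumed in place of the real 
ThreadLocalMetrics arrays:

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for ThreadLocalMetrics: one counter per thread.
final class LocalCounter {
    volatile long value; // written only by the owning thread
}

final class ThreadTrackedCounter {
    // Phantom reference to the owning Thread. It keeps a hard reference to the
    // thread's LocalCounter, but the LocalCounter never points back to the
    // Thread, so the Thread can still become phantom reachable.
    private static final class ThreadRef extends PhantomReference<Thread> {
        final LocalCounter local;

        ThreadRef(Thread owner, LocalCounter local, ReferenceQueue<Thread> queue) {
            super(owner, queue);
            this.local = local;
        }
    }

    private final ReferenceQueue<Thread> queue = new ReferenceQueue<>();
    private final Set<ThreadRef> live = ConcurrentHashMap.newKeySet(); // shared list
    private long summary; // values of collected threads, guarded by "this"

    private final ThreadLocal<LocalCounter> local = ThreadLocal.withInitial(() -> {
        LocalCounter counter = new LocalCounter();
        live.add(new ThreadRef(Thread.currentThread(), counter, queue));
        return counter;
    });

    void inc() {
        local.get().value++; // single writer, so no atomic update is needed
    }

    // Sums the summary and all still-registered per-thread values. Counters of
    // dead threads are included whether or not the GC has reported them yet.
    synchronized long getCount() {
        drain();
        long sum = summary;
        for (ThreadRef ref : live)
            sum += ref.local.value;
        return sum;
    }

    // Moves the values of threads reported by the GC into the summary and
    // drops their entries from the shared list.
    private void drain() {
        Reference<? extends Thread> ref;
        while ((ref = queue.poll()) != null) {
            ThreadRef threadRef = (ThreadRef) ref;
            if (live.remove(threadRef))
                summary += threadRef.local.value;
        }
    }
}
```

In this sketch the transfer and the summation share one lock, so a value is 
counted either in the shared list or in the summary, never in both.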

This is the simplest way to handle unused ThreadLocalMetrics that I have found 
so far.

An additional thought about PhantomReferences: if the referent object (Thread 
in our case) is long-lived and has been promoted to the old generation, we may 
wait quite a long time for an old-gen GC to collect it. The related 
ThreadLocalMetrics can then be kept alive much longer (which adds to the 
overhead of getting a count) than with the alternative of cleaning up 
ThreadLocalMetrics in a periodic task that checks thread liveness directly.
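
For comparison, a rough sketch of the periodic-task alternative mentioned 
above. Again, all names (PolledCounter, Cell, sweep, trackedThreads) are 
hypothetical, and a single per-thread counter stands in for ThreadLocalMetrics. 
Here the shared map holds a hard reference to each Thread until a sweep finds 
it dead:

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class PolledCounter {
    // Hypothetical stand-in for ThreadLocalMetrics: one counter per thread.
    static final class Cell {
        volatile long value; // written only by the owning thread
    }

    private final Map<Thread, Cell> cells = new ConcurrentHashMap<>();
    private long summary; // values of dead threads, guarded by "this"

    private final ThreadLocal<Cell> local = ThreadLocal.withInitial(() -> {
        Cell cell = new Cell();
        cells.put(Thread.currentThread(), cell);
        return cell;
    });

    void inc() {
        local.get().value++;
    }

    // Intended to be called from a periodic task: folds the counters of
    // terminated threads into the summary and stops tracking them.
    synchronized void sweep() {
        Iterator<Map.Entry<Thread, Cell>> it = cells.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Thread, Cell> entry = it.next();
            if (!entry.getKey().isAlive()) {
                summary += entry.getValue().value;
                it.remove();
            }
        }
    }

    synchronized long getCount() {
        long sum = summary;
        for (Cell cell : cells.values())
            sum += cell.value;
        return sum;
    }

    synchronized int trackedThreads() {
        return cells.size();
    }
}
```

sweep() could be driven by, for example, a ScheduledExecutorService. The 
interval is a tuning choice and is not something specified in this ticket.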

 


> Optimize Counter, Meter and Histogram metrics using thread local counters
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-20250
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20250
>             Project: Apache Cassandra
>          Issue Type: New Feature
>          Components: Observability/Metrics
>            Reporter: Dmitry Konstantinov
>            Assignee: Dmitry Konstantinov
>            Priority: Normal
>             Fix For: 5.x
>
>         Attachments: 5.1_profile_cpu.html, 
> 5.1_profile_cpu_without_metrics.html, 5.1_tl4_profile_cpu.html, 
> Histogram_AtomicLong.png, async_profiler_cpu_profiles.zip, 
> cpu_profile_insert.html, image-2025-02-18-23-22-19-983.png, jmh-result.json, 
> vmstat.log, vmstat_without_metrics.log
>
>
> Cassandra collects a lot of metrics, and many of them are collected per 
> table, so the number of metric instances is multiplied by the number of 
> tables. On the one hand this gives better observability; on the other hand 
> metrics are not free, there is an overhead associated with them:
> 1) CPU overhead: for a simple CPU-bound load I already see about 5.5% of 
> total CPU spent on metrics in the CPU flame graphs for a read load and 11% 
> for a write load. 
> Example: [^cpu_profile_insert.html] (search by "codahale" pattern). The 
> flame graph was captured using the Async profiler build: 
> async-profiler-3.0-29ee888-linux-x64
> 2) memory overhead: we spend memory on the entities used to aggregate 
> metrics, such as LongAdders and reservoirs, and on MBeans (String 
> concatenation within object names is a major cause of it: a new String is 
> created for each table+metric name combination)
> LongAdder is used by Dropwizard Counter/Meter and Histogram metrics for 
> counting purposes. It has a severe memory overhead, and while it scales 
> better than AtomicLong we still have to pay some cost for the concurrent 
> operations. 
> Additionally, in the case of Meter we have non-optimal behaviour: the same 
> things are counted several times.
> The idea (suggested by [~benedict]) is to switch to thread-local counters, 
> which we can store in a common thread-local array to reduce memory overhead. 
> In this way we can avoid the overhead and contention of concurrent updates 
> and reduce the memory footprint as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
