[jira] [Updated] (CASSANDRA-20760) Optimize calculating of partition key sizes in TrieMemtable#getFlushSet

Dmitry Konstantinov (Jira) Sun, 13 Jul 2025 07:25:07 -0700


     [ 
https://issues.apache.org/jira/browse/CASSANDRA-20760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Dmitry Konstantinov updated CASSANDRA-20760:
--------------------------------------------
    Description: 
Currently within org.apache.cassandra.db.memtable.TrieMemtable#getFlushSet we 
iterate over all partitions to retrieve partition key sizes by constructing 
byte[] for every partition key.

We can do a similar kind of traversal logic but only count bytes instead of 
allocating and filling byte[] values. Additionally, we can skip token bytes 
instead of parsing them.
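
As a rough illustration of the idea (not the actual patch), the sketch below contrasts the two approaches on a hypothetical byte-at-a-time view of an encoded key. ByteStream, streamOf and both size methods are made-up names for this example and are not Cassandra APIs. The sketch also assumes the token occupies a fixed-length prefix and ignores any escaping or decoding that the real encoding may require.

```java
import java.util.Arrays;

class KeySizeSketch {
    /** Hypothetical byte-at-a-time view of an encoded key; not a Cassandra API. */
    interface ByteStream {
        int END = -1;

        /** Returns the next unsigned byte, or END when the stream is exhausted. */
        int next();
    }

    /** Wraps an array as a ByteStream so the example is self-contained. */
    static ByteStream streamOf(byte[] data) {
        return new ByteStream() {
            private int pos = 0;

            public int next() {
                return pos < data.length ? (data[pos++] & 0xFF) : END;
            }
        };
    }

    /** Baseline: copy every byte into a growing byte[] only to read its length. */
    static int sizeByMaterializing(ByteStream stream) {
        byte[] buf = new byte[16];
        int n = 0;
        for (int b = stream.next(); b != ByteStream.END; b = stream.next()) {
            if (n == buf.length)
                buf = Arrays.copyOf(buf, n * 2);
            buf[n++] = (byte) b;
        }
        return Arrays.copyOf(buf, n).length;
    }

    /**
     * Alternative: skip an assumed fixed-length token prefix without
     * interpreting it, then count the remaining bytes with no allocation.
     */
    static int sizeByCounting(ByteStream stream, int tokenPrefixLength) {
        for (int i = 0; i < tokenPrefixLength; i++) {
            if (stream.next() == ByteStream.END)
                return 0; // stream ended inside the prefix, nothing left to count
        }
        int n = 0;
        while (stream.next() != ByteStream.END)
            n++;
        return n;
    }
}
```

Under these assumptions the counting variant walks the same bytes as the baseline but never allocates or fills a buffer, which is the saving described above.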

!cpu_profile_before.png|width=1000!

When we do a flush, we consume almost one CPU core for a second within getFlushSet:
 !cpu_profile_before_pattern.png!  

  was:
Currently within org.apache.cassandra.db.memtable.TrieMemtable#getFlushSet we 
iterate over all partitions to retrieve partition key sizes by constructing 
byte[] for every partition key.

We can do a similar kind of traversal logic but only count bytes instead of 
allocating and filling byte[] values. Additionally, we can skip token bytes 
instead of parsing them.

!cpu_profile_before.png|width=1000!


> Optimize calculating of partition key sizes in TrieMemtable#getFlushSet
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-20760
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20760
>             Project: Apache Cassandra
>          Issue Type: Improvement
>          Components: Local/Memtable
>            Reporter: Dmitry Konstantinov
>            Assignee: Dmitry Konstantinov
>            Priority: Normal
>             Fix For: 5.x
>
>         Attachments: cpu_profile_after.png, cpu_profile_before.png, 
> cpu_profile_before_pattern.png
>
>
> Currently within org.apache.cassandra.db.memtable.TrieMemtable#getFlushSet we 
> iterate over all partitions to retrieve partition key sizes by constructing 
> byte[] for every partition key.
> We can do a similar kind of traversal logic but only count bytes instead of 
> allocating and filling byte[] values. Additionally, we can skip token bytes 
> instead of parsing them.
> !cpu_profile_before.png|width=1000!
> When we do a flush, we consume almost one CPU core for a second within 
> getFlushSet:
>  !cpu_profile_before_pattern.png!  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
