[ https://issues.apache.org/jira/browse/CASSANDRA-20760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18004990#comment-18004990 ]
Dmitry Konstantinov commented on CASSANDRA-20760:
-------------------------------------------------

For the typical scenario where we flush the whole memtable into a single directory, we can go even further and calculate the total partition key size at write time; the partition count is already captured. In this case the cost of getFlushSet drops to nearly zero and is no longer visible in the profile graph at all.

> Optimize calculating of partition key sizes in TrieMemtable#getFlushSet
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-20760
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20760
>             Project: Apache Cassandra
>          Issue Type: Improvement
>          Components: Local/Memtable
>            Reporter: Dmitry Konstantinov
>            Assignee: Dmitry Konstantinov
>            Priority: Normal
>             Fix For: 5.x
>
>         Attachments: 5.1_cpu_before.html, cpu_profile_after.png,
> cpu_profile_before.png, cpu_profile_before_pattern.png
>
>
> Currently, within org.apache.cassandra.db.memtable.TrieMemtable#getFlushSet we
> iterate over all partitions and retrieve the partition key sizes by constructing
> a byte[] for every partition key.
> We can use a similar traversal but only count the bytes instead of
> allocating and filling byte[] values. Additionally, we can skip the token bytes
> instead of parsing them.
> CPU heatmap captured using async-profiler: [^5.1_cpu_before.html]
> !cpu_profile_before.png|width=1000!
> When we do a flush, we consume almost one CPU core for about a second within
> getFlushSet:
> !cpu_profile_before_pattern.png|width=1000!
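
The two ideas discussed in this message (counting key bytes instead of materializing a byte[], and keeping a running total at write time) can be illustrated with a minimal, self-contained sketch. This is not the Cassandra code: the class KeySizeSketch, the ByteStream interface, the fixed 8-byte token prefix and the Totals holder are all hypothetical names and simplifications introduced here for illustration, and the real trie key encoding is more involved.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Illustrative only: all names and the entry layout below are assumptions,
// not the TrieMemtable implementation.
final class KeySizeSketch {
    // Assumed layout: fixed-width token prefix followed by raw key bytes.
    static final int TOKEN_BYTES = 8;

    /** Hypothetical pull-style byte source; returns -1 when exhausted. */
    interface ByteStream {
        int next();
    }

    static ByteStream streamOf(final byte[] data) {
        return new ByteStream() {
            private int pos = 0;

            @Override
            public int next() {
                return pos < data.length ? (data[pos++] & 0xFF) : -1;
            }
        };
    }

    /** Builds an entry with a zeroed token prefix followed by the key bytes. */
    static byte[] entryFor(String key) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        byte[] entry = new byte[TOKEN_BYTES + k.length];
        System.arraycopy(k, 0, entry, TOKEN_BYTES, k.length);
        return entry;
    }

    /** Baseline: copy the key into a byte[] only to read its length. */
    static int keySizeByCopying(ByteStream entry) {
        for (int i = 0; i < TOKEN_BYTES; i++) entry.next(); // drop token prefix
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int b;
        while ((b = entry.next()) != -1) out.write(b);
        return out.toByteArray().length;
    }

    /** Counting variant: skip the token bytes, then count without allocating. */
    static int keySizeByCounting(ByteStream entry) {
        for (int i = 0; i < TOKEN_BYTES; i++) entry.next(); // skip, do not parse
        int size = 0;
        while (entry.next() != -1) size++;
        return size;
    }

    /** Write-time variant: running totals, so a flush needs no traversal. */
    static final class Totals {
        long partitionCount;
        long partitionKeyBytes;

        void onNewPartition(byte[] key) {
            partitionCount++;
            partitionKeyBytes += key.length;
        }
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("user:1", "user:42", "order:7");
        Totals totals = new Totals();
        long copied = 0;
        long counted = 0;
        for (String key : keys) {
            totals.onNewPartition(key.getBytes(StandardCharsets.UTF_8));
            copied += keySizeByCopying(streamOf(entryFor(key)));
            counted += keySizeByCounting(streamOf(entryFor(key)));
        }
        System.out.println("copying=" + copied + " counting=" + counted
                + " writeTime=" + totals.partitionKeyBytes);
    }
}
```

In this sketch all three approaches yield the same total; the difference is only in cost. The copying variant allocates per key, the counting variant still walks every key at flush time, and the write-time variant reads a precomputed field, which only applies when the whole memtable is flushed as a single range.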