Taejin Koo created CASSANDRA-20909:
--------------------------------------

             Summary: Potential inefficiency in first flush with TrieMemtable + 
UnifiedCompaction
                 Key: CASSANDRA-20909
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20909
             Project: Apache Cassandra
          Issue Type: Bug
            Reporter: Taejin Koo


Hello,

I’m a Cassandra newbie, so please correct me if I’ve misunderstood something.

While evaluating Cassandra for adoption, I have been testing different 
configurations. One combination I considered is TrieMemtable together with 
UnifiedCompactionStrategy (leveled). My expectation was that combining the two 
would let SSTables flushed from memtables use token ranges much more 
efficiently, so I have been investigating whether this is feasible.

During this testing, I noticed that the very first flush always produces a 
single SSTable. Because only one SSTable is produced, it covers the entire 
token range, which I believe is highly inefficient.

After digging into the source, I found that the cause seems to be in
{{org.apache.cassandra.db.compaction.unified.Controller#getFlushSizeBytes}}:
 
{{double envFlushSize = cfs.metric.flushSizeOnDisk.get();}}
For the very first flush, this metric returns {{0}}, which results in the 
minimum value of {{1}} being used. This explains why the first SSTable covers 
the entire token range.
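
To make the effect concrete, here is a minimal Java sketch. It is not the 
actual Controller code: the class name, the {{shardCount}} rule (flush size 
divided by a target SSTable size, floored at 1) and the 64 MiB target are all 
assumptions made for illustration. It only shows how an observed flush size of 
{{0}} falls through to the minimum of {{1}}.

```java
// Hypothetical model of the behaviour described above.
// NOT the real org.apache.cassandra.db.compaction.unified.Controller logic.
final class FirstFlushSketch {

    /** Rounds an observed flush size (bytes) up to a whole MiB; 0 stays 0. */
    static long roundUpToMiB(double observedBytes) {
        return ((long) Math.ceil(Math.scalb(observedBytes, -20))) << 20;
    }

    /** Assumed rule: one shard per target-sized chunk, never fewer than 1. */
    static int shardCount(long flushSizeBytes, long targetSSTableBytes) {
        return (int) Math.max(1L, flushSizeBytes / targetSSTableBytes);
    }

    public static void main(String[] args) {
        long target = 64L << 20;                        // assumed 64 MiB target

        // First flush: nothing has been observed yet, so the metric is 0.
        long first = roundUpToMiB(0.0);
        System.out.println(shardCount(first, target));  // prints 1

        // Later flush: the metric now reports about 500 MiB.
        long later = roundUpToMiB(500.0 * (1 << 20));
        System.out.println(shardCount(later, target));  // prints 7
    }
}
```

Under these assumptions, only flushes that happen after the metric has been 
populated can be split into more than one shard.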

I also experimented with using {{flushSizeOverride}} to force a shard count of 
my choosing, but since this is a fixed value that does not adapt to the 
observed flush size, I feel this is not a good long-term solution.
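
The limitation of a fixed override can be sketched as follows. This is again 
a hypothetical model, not Cassandra's implementation: {{effectiveFlushSize}} 
and its precedence rule (a positive override wins over the observed value) are 
assumptions made for illustration.

```java
// Hypothetical model of a static override versus an observed metric.
// The method name and precedence rule are assumptions, not Cassandra's API.
final class FlushSizeOverrideSketch {

    /** Returns the override when it is set (> 0), else the observed size. */
    static long effectiveFlushSize(long overrideBytes, double observedBytes) {
        if (overrideBytes > 0)
            return overrideBytes;       // fixed: ignores what was observed
        return (long) observedBytes;    // dynamic: follows the metric
    }

    public static void main(String[] args) {
        long override = 32L << 20;      // assumed 32 MiB override
        double small = 0.0;             // first flush, nothing observed yet
        double large = 500.0 * (1 << 20);

        // With an override, the result is the same for both workloads.
        System.out.println(effectiveFlushSize(override, small));
        System.out.println(effectiveFlushSize(override, large));

        // Without one, the result tracks the observed value.
        System.out.println(effectiveFlushSize(0L, large));
    }
}
```

The override avoids the zero on the first flush, but it keeps returning the 
same value even when the real flush size changes later.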

Once again, I may be misunderstanding this behavior, so please correct me if my 
interpretation is wrong.

Thank you. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
