[ https://issues.apache.org/jira/browse/CASSANDRA-18945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17778600#comment-17778600 ]
Stefan Miklosovic commented on CASSANDRA-18945:
-----------------------------------------------

Can you elaborate on what performance problems you saw? I was testing UCS by loading around 1 TB (more like 800 GB) and I did not see any problems in cassandra-stress. It was still responsive, and write throughput was not affected.

> Unified Compaction Strategy is creating too many sstables
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-18945
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18945
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Compaction
>            Reporter: Branimir Lambov
>            Assignee: Ethan Brown
>            Priority: Normal
>
> The unified compaction strategy currently aims to create sstables of close to the same size, defaulting to 1 GiB. Unfortunately, tests show that Cassandra starts to have performance problems when the number of sstables grows to the order of a thousand, and in particular that even 1 TiB of data with the default configuration creates too many sstables for efficient processing. This matters even more for SAI, where the number of sstables in the system can have a proportional effect on the complexity of operations.
> It is quite easy to add a configuration option that lets the sstable size absorb some part of the data growth by adding a multiplier to [the shard count calculation|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/UnifiedCompactionStrategy.md#sharding] formula, replacing
> {{2 ^ round(log2(d / (t * b))) * b}}
> with
> {{2 ^ round((1 - 𝜆) * log2(d / (t * b))) * b}},
> where 𝜆 is a parameter whose value is between 0 and 1.
> With this, a 𝜆 of 0.5 would mean that shard count and sstable size grow in parallel, each at the square root of the data size growth; 0 would result in no growth of the sstable size, and 1 in always using the same number of shards.
> It may also be valuable to introduce a threshold for engaging the base shard count, to avoid splitting lowest-level sstables into fragments that are too small.
> Once both of these are in place, we can set defaults that better suit all node densities, including 10 TiB and beyond, for example:
> - target size of 1 GiB
> - 𝜆 of 1/3
> - base shard count of 4
> - minimum size of 100 MiB
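For reference, here is a minimal sketch of the two shard-count formulas quoted above, useful for comparing the resulting shard counts at a few data sizes. This is an illustration only, not the actual UnifiedCompactionStrategy code: the class and method names, the standalone main(), and the clamping of the exponent to zero for data sizes below t * b are assumptions made for the example. The symbols d, t, b and 𝜆 follow the definitions in the description (data size, target sstable size, base shard count, growth parameter).

{code:java}
// Illustration only: compares the current and proposed UCS shard-count
// formulas from the ticket. Not the real UnifiedCompactionStrategy code;
// names, the exponent clamp, and main() are assumptions for this sketch.
public class ShardCountSketch {

    /** Current formula: 2 ^ round(log2(d / (t * b))) * b */
    static long currentShards(double d, double t, int b) {
        double exponent = Math.log(d / (t * b)) / Math.log(2);
        return (1L << Math.max(0, Math.round(exponent))) * b;
    }

    /** Proposed formula: 2 ^ round((1 - lambda) * log2(d / (t * b))) * b */
    static long proposedShards(double d, double t, int b, double lambda) {
        double exponent = (1 - lambda) * Math.log(d / (t * b)) / Math.log(2);
        return (1L << Math.max(0, Math.round(exponent))) * b;
    }

    public static void main(String[] args) {
        double gib = 1L << 30;
        double targetSize = gib;   // proposed target sstable size: 1 GiB
        int baseShards = 4;        // proposed base shard count
        double lambda = 1.0 / 3;   // proposed growth parameter

        for (double tib : new double[] {1, 4, 10}) {
            double d = tib * 1024 * gib;
            System.out.printf("%2.0f TiB: current = %5d shards, proposed (lambda = 1/3) = %4d shards%n",
                              tib,
                              currentShards(d, targetSize, baseShards),
                              proposedShards(d, targetSize, baseShards, lambda));
        }
    }
}
{code}

With the proposed defaults (1 GiB target, base shard count 4, 𝜆 = 1/3), this should print roughly 1024 shards at 1 TiB and 8192 at 10 TiB for the current formula, versus 128 and 1024 for the proposed one, which illustrates the scale of the reduction the ticket is aiming for.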