[ https://issues.apache.org/jira/browse/CASSANDRA-18945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780969#comment-17780969 ]
Stefan Miklosovic commented on CASSANDRA-18945:
-----------------------------------------------

I noticed that UCS has not been added to cqlsh, so it cannot be autocompleted; see (1) and (2). I think this is a good opportunity to finish that, since we are introducing new options here.

Also, ant rat-check was failing on missing license headers in SVG files. I created a PR against that PR where it is all fixed (3).

We should also add the parameters of UCS to doc/cql3/CQL.textile, alongside the other compaction strategies. I have left this exercise to the authors of this ticket.

(1) https://github.com/apache/cassandra/blob/trunk/pylib/cqlshlib/cqlhandling.py#L49
(2) https://github.com/apache/cassandra/blob/trunk/pylib/cqlshlib/cql3handling.py#L86-L103
(3) https://github.com/datastax/cassandra/pull/833

> Unified Compaction Strategy is creating too many sstables
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-18945
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18945
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Compaction
>            Reporter: Branimir Lambov
>            Assignee: Ethan Brown
>            Priority: Normal
>             Fix For: 5.0, 5.x
>
>         Attachments: file_ucs_shenandoah.html, file_ucs_shenandoah_3.html,
> file_ucs_shenandoah_off_heap_memtable.html,
> file_ucs_shenandoah_on_heap_memtable_2.html,
> file_ucs_shenandoah_on_heap_memtable_3.html, key-value-oss.html
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> The unified compaction strategy currently aims to create sstables with close
> to the same size, defaulting to 1 GiB. Unfortunately tests show that
> Cassandra starts to have performance problems when the number of sstables
> grows to the order of a thousand, and in particular that even 1 TiB of data
> with the default configuration is creating too many sstables for efficient
> processing. This matters even more for SAI, where the number of sstables in
> the system can have a proportional effect on the complexity of operations.
> It is quite easy to create a configuration option that allows sstables to
> take some part of the data growth by adding a multiplier to
> [the shard count calculation|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/UnifiedCompactionStrategy.md#sharding]
> formula, replacing
> {{2 ^ round(log2(d / (t * b))) * b}}
> with
> {{2 ^ round((1 - 𝜆) * log2(d / (t * b))) * b}},
> where 𝜆 is a parameter whose value is between 0 and 1.
> With this, a 𝜆 of 0.5 would mean that shard count and sstable size grow in
> parallel at the square root of the data size growth. A 𝜆 of 0 would result
> in no sstable size growth (the current behaviour), and a 𝜆 of 1 in always
> using the same number of shards.
> It may also be valuable to introduce a threshold for engaging the base shard
> count to avoid splitting lowest-level sstables into fragments that are too
> small.
> Once both of these are in place, we can set defaults that better suit all
> node densities, including 10 TiB and beyond, for example:
> - target size of 1 GiB
> - 𝜆 of 1/3
> - base shard count of 4
> - minimum size 100 MiB
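
To make the effect of 𝜆 in the quoted formula concrete, here is a minimal Python sketch that evaluates {{2 ^ round((1 - 𝜆) * log2(d / (t * b))) * b}} for a few values. It is not taken from the Cassandra code base: the function name `shard_count` and its parameter names are hypothetical, and clamping the exponent at 0 (so that the result never drops below the base shard count) is an assumption made for illustration rather than something the ticket specifies. Python's `round` also rounds exact halves to the nearest even integer, which may differ from the rounding used in the real implementation.

```python
import math

GIB = 2 ** 30
TIB = 2 ** 40


def shard_count(data_size, target_size, base_shards, growth):
    """Evaluate 2 ^ round((1 - growth) * log2(d / (t * b))) * b.

    data_size   -- d, total data size in bytes
    target_size -- t, target sstable size in bytes
    base_shards -- b, base shard count
    growth      -- the lambda parameter from the ticket, between 0 and 1
    """
    if not 0.0 <= growth <= 1.0:
        raise ValueError("growth must be between 0 and 1")
    ratio = data_size / (target_size * base_shards)
    exponent = round((1 - growth) * math.log2(ratio))
    # Assumption for this sketch: never go below the base shard count.
    exponent = max(0, exponent)
    return (2 ** exponent) * base_shards


if __name__ == "__main__":
    # 1 TiB of data, 1 GiB target size, base shard count of 4.
    for growth in (0.0, 1 / 3, 0.5, 1.0):
        shards = shard_count(1 * TIB, 1 * GIB, 4, growth)
        print(f"growth={growth:.2f}: {shards} shards, "
              f"~{TIB / shards / GIB:.0f} GiB per shard")
```

With these inputs, a 𝜆 of 0 reproduces the current behaviour (shard count follows the data size), a 𝜆 of 1 stays at the base shard count, and a 𝜆 of 0.5 doubles the shard count when the data size quadruples, matching the square-root description in the ticket.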