[ https://issues.apache.org/jira/browse/CASSANDRA-18945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17785945#comment-17785945 ]

Stefan Miklosovic commented on CASSANDRA-18945:
-----------------------------------------------

[j17 trunk|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3488/workflows/69f2bb67-68bc-45b3-a2ee-54fe4bdf8952]
[j11 trunk|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3488/workflows/be20958a-6b4f-43ba-858c-0450684e8d2c]

There were failures in dtests, so I ran them once more [here|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3492/workflows/c7dbced5-f8b7-423b-90e9-672d167874ea/jobs/151050].

[j17 5.0|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3491/workflows/8207e0a6-eb6f-4bb6-a5ef-bb43aa7ca003]
[j11 5.0|https://app.circleci.com/pipelines/github/instaclustr/cassandra/3491/workflows/ee1e7b06-b0db-4365-9982-c3d1184bc9f7]

Repeatable tests are looking stable.

The last j11 5.0 job is running right now. Once it has finished, I will merge. 
[~blambov] has already approved the PR.

> Unified Compaction Strategy is creating too many sstables
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-18945
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18945
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Compaction
>            Reporter: Branimir Lambov
>            Assignee: Ethan Brown
>            Priority: Normal
>             Fix For: 5.0-beta
>
>         Attachments: file_ucs_shenandoah.html, file_ucs_shenandoah_3.html, 
> file_ucs_shenandoah_off_heap_memtable.html, 
> file_ucs_shenandoah_on_heap_memtable_2.html, 
> file_ucs_shenandoah_on_heap_memtable_3.html, key-value-oss.html
>
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The unified compaction strategy currently aims to create sstables with close 
> to the same size, defaulting to 1 GiB. Unfortunately tests show that 
> Cassandra starts to have performance problems when the number of sstables 
> grows to the order of a thousand, and in particular that even 1 TiB of data 
> with the default configuration is creating too many sstables for efficient 
> processing. This matters even more for SAI, where the number of sstables in 
> the system can have a proportional effect on the complexity of operations.
> It is quite easy to create a configuration option that allows sstables to 
> take some part of the data growth by adding a multiplier to [the shard count 
> calculation|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/UnifiedCompactionStrategy.md#sharding] formula, replacing 
> {{2 ^ round(log2(d / (t * b))) * b}} 
> with 
> {{2 ^ round((1 - 𝜆) * log2(d / (t * b))) * b}}, 
> where 𝜆 is a parameter whose value is between 0 and 1.
> With this, a 𝜆 of 0.5 would mean that shard count and sstable size grow in 
> parallel, each at the square root of the data size growth. A 𝜆 of 0 would 
> result in no sstable size growth, and a 𝜆 of 1 in always using the same 
> number of shards.
> It may also be valuable to introduce a threshold for engaging the base shard 
> count to avoid splitting lowest-level sstables into fragments that are too 
> small.
> Once both of these are in place, we can set defaults that better suit all 
> node densities, including 10 TiB and beyond, for example:
>  - target size of 1 GiB
>  - 𝜆 of 1/3
>  - base shard count of 4
>  - minimum size 100 MiB
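
For illustration (not part of the ticket text above): a minimal stand-alone Java 
sketch of the two shard-count formulas from the description, evaluated at the 
proposed defaults (target size t = 1 GiB, base shard count b = 4, 𝜆 = 1/3). The 
class and method names are hypothetical, and the minimum-size threshold is not 
modelled.

{code:java}
// Illustrative sketch only, not Cassandra code.
// d = local data size in bytes, t = target sstable size in bytes, b = base shard count.
public class ShardCountSketch
{
    static double log2(double x)
    {
        return Math.log(x) / Math.log(2);
    }

    // current formula: 2 ^ round(log2(d / (t * b))) * b
    static long currentShards(double d, double t, long b)
    {
        long exponent = Math.max(0, Math.round(log2(d / (t * b)))); // assume the exponent is clamped at 0
        return (1L << exponent) * b;
    }

    // proposed formula: 2 ^ round((1 - lambda) * log2(d / (t * b))) * b
    static long proposedShards(double d, double t, long b, double lambda)
    {
        long exponent = Math.max(0, Math.round((1 - lambda) * log2(d / (t * b))));
        return (1L << exponent) * b;
    }

    public static void main(String[] args)
    {
        double gib = 1L << 30;
        double target = gib;      // target sstable size of 1 GiB
        long base = 4;            // base shard count of 4
        double lambda = 1.0 / 3;  // proposed default

        for (double tib : new double[] { 1, 4, 10 })
        {
            double d = tib * 1024 * gib;
            long current = currentShards(d, target, base);
            long proposed = proposedShards(d, target, base, lambda);
            System.out.printf("%5.0f TiB: current = %5d shards (~%.1f GiB each), lambda = 1/3 -> %5d shards (~%.1f GiB each)%n",
                              tib, current, d / current / gib, proposed, d / proposed / gib);
        }
    }
}
{code}

Under these assumptions, 1 TiB goes from roughly 1024 shards to 128 (about 8 GiB 
per sstable) and 10 TiB from roughly 8192 to 1024 (about 10 GiB per sstable), 
which matches the intent of growing shard count and sstable size together.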


