Joey Lynch created CASSANDRA-17381:
--------------------------------------

             Summary: Produce and verify BoundedReadCompactionStrategy as a 
unified general purpose compaction algorithm
                 Key: CASSANDRA-17381
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17381
             Project: Cassandra
          Issue Type: Improvement
          Components: Local/Compaction
            Reporter: Joey Lynch
            Assignee: Joey Lynch


The existing compaction strategies have a number of drawbacks that make all 
three unsuitable as a general-purpose compaction strategy. For example, STCS 
creates giant files that are hard to back up and disrupt the page cache, and 
these files led to many of the early re-open bugs. LCS improved dramatically on 
this but has its own issues, e.g. the lack of a performant full compaction, and 
problems caused by strict leveling, e.g. during bulk loading, when writes exceed 
the rate at which we can promote data from L0 to L1.

In this 
[talk|https://github.com/ngcc/ngcc2019/blob/master/NextGenerationCassandraCompactionGoingBeyondLCS.pdf]
I introduced a novel compaction strategy that aims to expose a single tunable 
with which the user controls read amplification: raise max_read_per_read and 
you trade read and space performance for write performance. Since then, a 
proof-of-concept 
[patch|https://github.com/jolynch/cassandra/tree/jolynch_bounded_read_final] 
has been published along with some rudimentary 
[documentation|https://gist.github.com/jolynch/9118465f32ad5298b4e39d03ccd4370e], 
but this work is still not tracked in Jira.
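To make the tunable concrete, the following toy sketch assumes data is organized into sorted runs (each a set of non-overlapping SSTables, so a point read consults at most one SSTable per run) and treats the number of runs as the read cost. The `SortedRun` type, the `select_runs_to_merge` function, and the smallest-first selection policy are all hypothetical illustrations; this is not the algorithm in the linked patch or documentation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SortedRun:
    """Hypothetical model: a set of non-overlapping SSTables.

    A point read consults at most one SSTable per run, so the number of
    runs is an upper bound on SSTables touched per read.
    """
    name: str
    size_bytes: int


def select_runs_to_merge(runs: List[SortedRun], max_read_per_read: int) -> List[SortedRun]:
    """Pick runs to merge so that at most max_read_per_read runs remain.

    Hypothetical smallest-first policy (not the policy from the PoC patch):
    rewriting the smallest runs costs the least write amplification.
    """
    if max_read_per_read < 1:
        raise ValueError("max_read_per_read must be >= 1")
    excess = len(runs) - max_read_per_read
    if excess <= 0:
        return []  # read amplification is already within the bound
    # Merging (excess + 1) runs into one removes exactly `excess` runs.
    by_size = sorted(runs, key=lambda r: r.size_bytes)
    return by_size[: excess + 1]
```

For example, with four runs of 10, 500, 20 and 30 bytes and max_read_per_read=2, the three smallest runs are merged into one, leaving two runs. Raising the bound to 4 selects nothing to merge: fewer rewrites in exchange for more runs per read.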

The remaining work is:

1. Validate that the algorithm is correct via test suites, performance and 
stress testing, and benchmarking with OSS tools (e.g. cassandra-stress, 
[tlp-stress|https://github.com/thelastpickle/tlp-stress], or 
[ndbench|https://github.com/Netflix/ndbench]). When issues are found (there 
likely will be, as the patch is a PoC), work out how to adjust the algorithm 
and implementation appropriately. The key metric of success is that Cassandra 
runs stably for more than 24 hours under sustained load and compaction keeps 
up.

2. Run more in-depth experiments measuring performance across a wide range of 
workloads (e.g. write heavy, read heavy, balanced, time series, register 
update, etc.) and in comparison with LCS (leveled), STCS (size tiered), and 
TWCS (time window). The key metrics of success are establishing that, as we 
tune max_read_per_read, read latency becomes more predictable under low system 
load (ρ < 30%) without degrading under high system load (ρ > 70%), and that we 
match LCS performance under low load while doing better under high load.
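One way to turn the first criterion ("compaction can keep up") into a pass/fail check, assuming pending-compaction counts are sampled at a fixed interval during the 24-hour run (for example from the "pending tasks" line of `nodetool compactionstats`), is to require that the backlog shows no sustained upward trend. The function name and the `max_growth_per_hour` threshold are hypothetical choices, not an agreed acceptance criterion.

```python
from typing import Sequence


def compaction_keeps_up(pending_tasks: Sequence[int], sample_interval_s: float,
                        max_growth_per_hour: float = 1.0) -> bool:
    """Return True if the pending-compaction backlog shows no sustained growth.

    Fits an ordinary least-squares line to evenly spaced samples and
    compares its slope (tasks gained per hour) to max_growth_per_hour,
    a hypothetical threshold.
    """
    n = len(pending_tasks)
    if n < 2:
        raise ValueError("need at least two samples")
    if sample_interval_s <= 0:
        raise ValueError("sample_interval_s must be positive")
    hours = [i * sample_interval_s / 3600.0 for i in range(n)]  # elapsed time
    mean_x = sum(hours) / n
    mean_y = sum(pending_tasks) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, pending_tasks))
    var = sum((x - mean_x) ** 2 for x in hours)
    slope = cov / var  # pending tasks gained per hour
    return slope <= max_growth_per_hour
```

A trend test is used rather than a fixed ceiling because a backlog that oscillates around a steady level is healthy, while one that grows slowly over a day means compaction is falling behind.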
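The load thresholds in the second criterion need a way to estimate ρ for a benchmark run. A minimal sketch, assuming ρ is server utilization computed with the utilization law (ρ = λS/c: arrival rate times mean service time, divided by the number of parallel servers such as cores), shows how a run could be bucketed into the bands the criterion names. The helper names are hypothetical, and modelling a node as c identical servers is a simplifying assumption.

```python
def utilization(arrival_rate_per_s: float, mean_service_time_s: float,
                servers: int = 1) -> float:
    """Utilization law: rho = lambda * S / c for c identical servers."""
    if servers < 1:
        raise ValueError("servers must be >= 1")
    return arrival_rate_per_s * mean_service_time_s / servers


def load_band(rho: float) -> str:
    """Map utilization onto the load bands named in success criterion 2."""
    if rho < 0.30:
        return "low"     # expect more predictable read latency as the tunable changes
    if rho > 0.70:
        return "high"    # expect no degradation, and better-than-LCS performance
    return "medium"      # no explicit criterion covers this range
```

For example, 5,000 requests/s at a mean service time of 0.5 ms on 4 cores gives ρ = 5000 × 0.0005 / 4 = 0.625, which falls between the two bands.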

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
