[ https://issues.apache.org/jira/browse/CASSANDRA-15379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dinesh Joshi updated CASSANDRA-15379:
-------------------------------------
    Status: Changes Suggested  (was: Review In Progress)

Hi [~jolynch], thanks for the patch. I went over it and it looks generally good. At a high level, my only concern is that introducing a {{NoOpCompressor}} may cause a performance regression compared to our current state, mainly because the Java JIT cannot optimize megamorphic call sites. However, this is just a theory, and we should validate it with an actual performance test. IMHO, the advantages that you have laid out would outweigh a small performance penalty.

Other than that, I have some code-related feedback. My changes fix the {{DatabaseDescriptorRefTest}} and also make minor structural modifications for safety and clarity. I have illustrated them in my branch [here|https://github.com/apache/cassandra/compare/trunk...dineshjoshi:CASSANDRA-15379-review?expand=1]. Please feel free to cherry-pick the commits into your branch.

> Make it possible to flush with a different compression strategy than we compact with
> ------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15379
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15379
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/Compaction, Local/Config, Local/Memtable
>            Reporter: Joey Lynch
>            Assignee: Joey Lynch
>            Priority: Normal
>             Fix For: 4.0-alpha
>
>
> [~josnyder] and I have been testing out CASSANDRA-14482 (Zstd compression) on some of our most dense clusters and have been observing close to 50% reduction in footprint with Zstd on some of our workloads! Unfortunately though we have been running into an issue where the flush might take so long (Zstd is slower to compress than LZ4) that we can actually block the next flush and cause instability.
> Internally we are working around this with a very simple patch which flushes SSTables as the default compression strategy (LZ4) regardless of the table params. This is a simple solution, but I think the ideal solution might be for the flush compression strategy to be configurable separately from the table compression strategy (while defaulting to the same thing). Instead of adding yet another compression option to the yaml (like hints and commitlog), I was thinking of just adding it to the table parameters and then adding a {{default_table_parameters}} yaml option like:
> {noformat}
> # Default table properties to apply on freshly created tables. The currently supported defaults are:
> # * compression       : How are SSTables compressed in general (flush, compaction, etc ...)
> # * flush_compression : How are SSTables compressed as they flush
> # supported
> default_table_parameters:
>   compression:
>     class_name: 'LZ4Compressor'
>     parameters:
>       chunk_length_in_kb: 16
>   flush_compression:
>     class_name: 'LZ4Compressor'
>     parameters:
>       chunk_length_in_kb: 4
> {noformat}
> This would have the nice effect as well of giving our configuration a path forward to providing user specified defaults for table creation (so e.g. if a particular user wanted to use a different default chunk_length_in_kb they can do that).
> So the proposed (~mandatory) scope is:
> * Flush with a faster compression strategy
> I'd like to implement the following at the same time:
> * Per table flush compression configuration
> * Ability to default the table flush and compaction compression in the yaml.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
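
For readers following the proposal, here is a minimal Java sketch of the selection rule the description implies: use the flush-specific compression when one is configured, and otherwise fall back to the table's general compression ("defaulting to the same thing"). Every name in it ({{FlushCompressionSketch}}, {{WriteReason}}, {{CompressionSpec}}, {{TableParams}}, {{forWrite}}) is hypothetical and chosen for illustration; none of it is taken from the patch or from Cassandra's actual classes.

```java
import java.util.Objects;

/**
 * Illustrative sketch only. All type and method names are hypothetical
 * and are not Cassandra's actual API.
 */
final class FlushCompressionSketch {

    /** Why an SSTable is being written. */
    enum WriteReason { FLUSH, COMPACTION }

    /** Stand-in for one compression entry: class_name plus chunk_length_in_kb. */
    static final class CompressionSpec {
        final String className;
        final int chunkLengthInKb;

        CompressionSpec(String className, int chunkLengthInKb) {
            this.className = Objects.requireNonNull(className);
            this.chunkLengthInKb = chunkLengthInKb;
        }

        @Override
        public String toString() {
            return className + "(chunk_length_in_kb=" + chunkLengthInKb + ")";
        }
    }

    /** Stand-in for table parameters carrying both compression settings. */
    static final class TableParams {
        final CompressionSpec compression;
        final CompressionSpec flushCompression; // null means "same as compression"

        TableParams(CompressionSpec compression, CompressionSpec flushCompression) {
            this.compression = Objects.requireNonNull(compression);
            this.flushCompression = flushCompression;
        }

        /** Picks the flush-specific setting for flushes if present, else the general one. */
        CompressionSpec forWrite(WriteReason reason) {
            if (reason == WriteReason.FLUSH && flushCompression != null) {
                return flushCompression;
            }
            return compression;
        }
    }

    public static void main(String[] args) {
        // Assumed example: a dense table compacting with Zstd but flushing with LZ4.
        TableParams dense = new TableParams(
                new CompressionSpec("ZstdCompressor", 16),
                new CompressionSpec("LZ4Compressor", 4));
        System.out.println("flush      -> " + dense.forWrite(WriteReason.FLUSH));
        System.out.println("compaction -> " + dense.forWrite(WriteReason.COMPACTION));
    }
}
```

Keeping the fallback inside one method means a table that never sets a flush-specific value behaves exactly as it does today, which matches the stated goal of defaulting both settings to the same thing.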