[jira] [Comment Edited] (CASSANDRA-15379) Make it possible to flush with a different compression strategy than we compact with
[ https://issues.apache.org/jira/browse/CASSANDRA-15379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17088266#comment-17088266 ]

Joey Lynch edited comment on CASSANDRA-15379 at 4/21/20, 11:21 PM:
---

*Zstd Defaults Benchmark:*
* Load pattern: 1.2k wps and 1.2k rps at LOCAL_ONE consistency with a random load pattern.
* Data sizing: ~100 million partitions with 2 rows each of 10 columns, for a total size per partition of about 4 KiB of random data; ~120 GiB of data per node (replicated 6 ways)
* Compaction settings: LCS with size=320MiB, fanout=20
* Compression: Zstd with a 16 KiB block size

I had to tweak some settings to make compaction a smaller share of the overall trace (it was 50+% of the traces), since that was hiding the flush behavior. Specifically, I increased the size of the memtable before flush by raising the {{memtable_cleanup_threshold}} setting from 0.11 to 0.5, which allowed flushes to grow to 1.4 GiB, and I configured compaction to defer the L0 -> L1 transition as long as possible:
{noformat}
compaction = {'class': 'LeveledCompactionStrategy', 'fanout_size': '20', 'max_threshold': '128', 'min_threshold': '32', 'sstable_size_in_mb': '320'}
compression = {'chunk_length_in_kb': '16', 'class': 'org.apache.cassandra.io.compress.ZstdCompressor'}
{noformat}
I would prefer to raise fanout_size even further to defer compactions more, but with the increased memtable size, sstable size, and fanout I was able to reduce the compaction load to the point where the cluster was stable (pending compactions not growing without bound) on both baseline and candidate.

*Zstd Defaults Benchmark Results:*
Candidate flushes were spaced about 4 minutes apart and took about 8 seconds to flush 1.4 GiB. Flamegraphs show 50% of on-CPU time in the flush writer and ~45% in compression. [^15379_candidate_flush_trace.png]

Baseline flushes were spaced about 4 minutes apart and took about 22 seconds to flush 1.4 GiB.
Flamegraphs show 20% of on-CPU time in the flush writer and ~75% in compression. [^15379_baseline_flush_trace.png]

No significant change in coordinator-level or replica-level latency or in system metrics; some latencies were better on candidate, some worse. [^15379_system_zstd_defaults.png] [^15379_coordinator_zstd_defaults.png] [^15379_replica_zstd_defaults.png]

I think the main finding here is that already, with the cheapest Zstd level, we are running closer to the flush interval than I'd like (if a flush takes longer than the interval until the next flush, it's bad news bears for the cluster), and this is with a relatively small number of writes per second (~400 coordinator writes per second per node).

*Next steps:* I've published a final squashed commit to:
||trunk||
|[657c39d4|https://github.com/jolynch/cassandra/commit/657c39d4aba0888c6db6a46d1b1febf899de9578]|
|[branch|https://github.com/apache/cassandra/compare/trunk...jolynch:CASSANDRA-15379-final]|
|[!https://circleci.com/gh/jolynch/cassandra/tree/CASSANDRA-15379-final.png?circle-token=1102a59698d04899ec971dd36e925928f7b521f5!|https://circleci.com/gh/jolynch/cassandra/tree/CASSANDRA-15379-final]|

There appear to be a lot of failures in the java8 runs that I'm pretty sure are unrelated to my change (unit tests and in-jvm dtests passed, along with long unit tests). I'll look into all the failures and make sure they're unrelated (on a related note I'm :( that trunk is so red again).

I am now running a test with Zstd compression set to a block size of 256 KiB and level 10, which is how we typically run it in production for write-mostly, read-rarely datasets such as trace data (for the significant reduction in disk space).
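A quick back-of-envelope check on the flush numbers above (a sketch; the ~3 GiB memtable space is an assumption based on the 12 GiB heap from the experimental setup below and the usual default of roughly a quarter of the heap for memtables):

```python
GiB = 1024  # MiB per GiB

# Flush size estimate: with memtable_cleanup_threshold = 0.5 the largest
# memtable is flushed once memtable usage crosses half the memtable space
# limit. Assuming ~3 GiB of memtable space, that predicts flushes of
# roughly 1.5 GiB, close to the observed 1.4 GiB.
memtable_space_mib = 3 * GiB
cleanup_threshold = 0.5
predicted_flush_mib = memtable_space_mib * cleanup_threshold

flush_mib = 1.4 * GiB           # observed flush size
interval_s = 4 * 60             # flushes spaced ~4 minutes apart

candidate_s, baseline_s = 8.0, 22.0
candidate_mib_per_s = flush_mib / candidate_s   # ~179 MiB/s
baseline_mib_per_s = flush_mib / baseline_s     # ~65 MiB/s

# Fraction of the flush interval spent flushing: if this approaches 1.0,
# the next flush arrives before the current one finishes and memtables
# back up, which is the instability the ticket describes.
candidate_busy = candidate_s / interval_s       # ~3%
baseline_busy = baseline_s / interval_s         # ~9%
```

The headroom numbers shrink quickly as write throughput grows, which is why "already running closer to the flush interval than I'd like" matters even at ~400 coordinator writes per second per node.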
[jira] [Comment Edited] (CASSANDRA-15379) Make it possible to flush with a different compression strategy than we compact with
[ https://issues.apache.org/jira/browse/CASSANDRA-15379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17087187#comment-17087187 ]

Joey Lynch edited comment on CASSANDRA-15379 at 4/19/20, 9:44 PM:
--

Alright, I finally fixed our internal trunk build so we can do performance validations again. I ran the following performance benchmark, and the results are essentially identical for the default configuration (so testing _just_ the addition of the NoopCompressor on the megamorphic call sites).

*Experimental Setup:*
A baseline and a candidate cluster of EC2 machines running the following:
* C* cluster: 3x3 (us-east-1 and eu-west-1) i3.2xlarge
* Load cluster: 3 m5.2xlarge nodes running ndbench in us-east-1, generating a consistent load against the cluster
* Baseline C* version: latest trunk (b05fe7ab)
* Candidate C* version: the proposed patch applied to the same version of trunk
* Relevant system configuration: Ubuntu xenial running Linux 4.15, with the kyber io scheduler (vs noop), 32 KiB readahead (vs 128), and the tc-fq network qdisc (vs pfifo_fast)
* Relevant JVM configuration: 12 GiB heap size

In all cases load is applied and then we wait for metrics to settle, especially things like pending compactions, read/write latencies, p99 latencies, etc.

*Defaults Benchmark:*
* Load pattern: 1.2k wps and 1.2k rps at LOCAL_ONE consistency with a random load pattern.
* Data sizing: 10 million partitions with 2 rows each of 10 columns, for a total size per partition of about 10 KiB of random data; ~100 GiB of data per node (replicated 6 ways)
* Compaction settings: LCS with size=256MiB, fanout=20
* Compression: LZ4 with a 16 KiB block size

*Defaults Benchmark Results:*
We do not have data to support the hypothesis that the megamorphic call sites have become more expensive due to the addition of the NoopCompressor.
1. No significant change at the coordinator level (least relevant metric): [^15379_coordinator_defaults.png]
2.
No significant change at the replica level (most relevant metric): [^15379_replica_defaults.png]
3. No significant change at the system resource level (second most relevant metrics): [^15379_system_defaults.png]

Our external flamegraph exports appear to be broken, but I looked at them and they also show no noticeable difference (I'll work with our performance team to fix the exports so I can share the data here).

*Next steps for me:*
* Squash, rebase, and re-run unit and dtests with latest trunk in preparation for commit
* Run a benchmark of `ZstdCompressor` with and without the patch; we expect to see reduced CPU usage during flushes. I will likely have to reduce the read/write throughput due to compactions taking a crazy amount of our on-CPU time with this configuration.
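The data sizing above can be sanity-checked with quick arithmetic (a sketch; the 6-node count is my assumption, reading "3x3" as 3 nodes in each of the two regions, with "replicated 6 ways" meaning RF 3 per region):

```python
GIB_IN_KIB = 1024 * 1024

partitions = 10_000_000
partition_kib = 10           # ~10 KiB of random data per partition
replication = 6              # replicated 6 ways (assumed RF 3 per region)
nodes = 6                    # assumed: "3x3" = 3 nodes per region

# Random data barely compresses, so the on-disk size should be close to
# the raw size: ~95 GiB of unique data, and with replication spread over
# the same number of nodes, roughly the same per node (~100 GiB quoted).
raw_gib = partitions * partition_kib / GIB_IN_KIB
per_node_gib = raw_gib * replication / nodes
```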
[jira] [Comment Edited] (CASSANDRA-15379) Make it possible to flush with a different compression strategy than we compact with
[ https://issues.apache.org/jira/browse/CASSANDRA-15379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17059531#comment-17059531 ]

Joey Lynch edited comment on CASSANDRA-15379 at 3/15/20, 2:03 AM:
--

Cool, I took your changes and [rebased on trunk with a few fixups|https://github.com/apache/cassandra/compare/trunk...jolynch:CASSANDRA-15379]. Tests are running now. I am having some trouble with our performance integration suite for trunk right now, but should hopefully be able to run those performance tests on Monday.

Just to confirm, you would like performance numbers for a write-heavy test for the baseline (trunk without my patch):
* No compressor
* LZ4 compressor
* Zstd compressor

And the following candidates:
* No compressor
* Noop compressor
* LZ4 compressor
* Zstd compressor

> Make it possible to flush with a different compression strategy than we
> compact with
>
> Key: CASSANDRA-15379
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15379
> Project: Cassandra
> Issue Type: Improvement
> Components: Local/Compaction, Local/Config, Local/Memtable
> Reporter: Joey Lynch
> Assignee: Joey Lynch
> Priority: Normal
> Fix For: 4.0-alpha
>
> [~josnyder] and I have been testing out CASSANDRA-14482 (Zstd compression) on
> some of our most dense clusters and have been observing close to a 50%
> reduction in footprint with Zstd on some of our workloads! Unfortunately,
> we have been running into an issue where the flush might take so long
> (Zstd is slower to compress than LZ4) that we can actually block the next
> flush and cause instability.
> Internally we are working around this with a very simple patch which flushes
> SSTables with the default compression strategy (LZ4) regardless of the table
> params. This is a simple solution, but I think the ideal solution might
> be for the flush compression strategy to be configurable separately from the
> table compression strategy (while defaulting to the same thing). Instead of
> adding yet another compression option to the yaml (like hints and commitlog)
> I was thinking of just adding it to the table parameters and then adding a
> {{default_table_parameters}} yaml option like:
> {noformat}
> # Default table properties to apply on freshly created tables. The currently
> # supported defaults are:
> #   * compression       : How SSTables are compressed in general (flush,
> #                         compaction, etc ...)
> #   * flush_compression : How SSTables are compressed as they flush
> default_table_parameters:
>   compression:
>     class_name: 'LZ4Compressor'
>     parameters:
>       chunk_length_in_kb: 16
>   flush_compression:
>     class_name: 'LZ4Compressor'
>     parameters:
>       chunk_length_in_kb: 4
> {noformat}
> This would also have the nice effect of giving our configuration a path
> forward to providing user-specified defaults for table creation (so e.g. if a
> particular user wanted to use a different default chunk_length_in_kb they can
> do that).
> So the proposed (~mandatory) scope is:
> * Flush with a faster compression strategy
> I'd like to implement the following at the same time:
> * Per-table flush compression configuration
> * Ability to default the table flush and compaction compression in the yaml.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
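The yaml proposal quoted above is essentially a layered-defaults lookup. A minimal sketch of one possible resolution order (hypothetical names and precedence, not the committed Cassandra behavior): an explicit table-level flush_compression wins, then the yaml default, then whatever the table compacts with, then a hardcoded LZ4 default.

```python
# Hardcoded fallback, mirroring the LZ4 default from the yaml example.
HARDCODED = {"class_name": "LZ4Compressor",
             "parameters": {"chunk_length_in_kb": 16}}

def effective_flush_compression(table_params: dict, yaml_defaults: dict) -> dict:
    """Resolve the compressor used for flushes (assumed precedence)."""
    return (table_params.get("flush_compression")
            or yaml_defaults.get("flush_compression")
            or table_params.get("compression")
            or HARDCODED)

yaml_defaults = {
    "compression": {"class_name": "LZ4Compressor",
                    "parameters": {"chunk_length_in_kb": 16}},
    "flush_compression": {"class_name": "LZ4Compressor",
                          "parameters": {"chunk_length_in_kb": 4}},
}

# A table that only sets `compression` (e.g. to Zstd) still flushes with
# the cheap yaml default; a table that sets `flush_compression` wins.
zstd_table = {"compression": {"class_name": "ZstdCompressor",
                              "parameters": {"chunk_length_in_kb": 16}}}
pinned_table = {"flush_compression": {"class_name": "NoopCompressor",
                                      "parameters": {}}}
```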
[jira] [Comment Edited] (CASSANDRA-15379) Make it possible to flush with a different compression strategy than we compact with
[ https://issues.apache.org/jira/browse/CASSANDRA-15379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16971010#comment-16971010 ]

Joey Lynch edited comment on CASSANDRA-15379 at 11/10/19 4:03 AM:
--

[~djoshi] per your feedback in slack, I've added the ability for the user to control the flush compression via a yaml option while doing the right thing by default.
||trunk||
|[branch|https://github.com/apache/cassandra/compare/trunk...jolynch:CASSANDRA-15379]|
|[!https://circleci.com/gh/jolynch/cassandra/tree/CASSANDRA-15379.png?circle-token=1102a59698d04899ec971dd36e925928f7b521f5!|https://circleci.com/gh/jolynch/cassandra/tree/CASSANDRA-15379]|

In order to implement the "don't compress during the flush" [option you suggested|https://the-asf.slack.com/archives/CK23JSY2K/p1572905922120300?thread_ts=1572905763.117000=CK23JSY2K], I figured the easiest way was to just implement the simple [NoopCompressor|https://github.com/apache/cassandra/commit/9030d8abcf593c06e85f549947ad41621d4776d1] everyone has been mentioning for years. I was having a hard time turning off compression at the level of abstraction BigTableWriter operates at, since it doesn't control, for example, whether the compression offsets file gets written. This way, even if you select "none", your flush is still protected by block-level checksums. Separately, I think it gives us a good path forward for mitigating CASSANDRA-12682 and CASSANDRA-9264 if we want it to.
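To illustrate the "even 'none' is still checksummed" point: a toy pass-through compressor whose blocks still carry a CRC32 so corruption is detectable on read. This is an illustrative sketch in Python, not the actual ICompressor interface (in Cassandra the block checksums live alongside the compressed chunks rather than inside the compressor).

```python
import struct
import zlib

class NoopCompressor:
    """Toy pass-through 'compressor': bytes are stored verbatim, but each
    block is framed with a CRC32 so bit rot is still detected on read."""

    def compress(self, block: bytes) -> bytes:
        # Prepend a 4-byte big-endian CRC32 of the raw block.
        return struct.pack(">I", zlib.crc32(block)) + block

    def uncompress(self, framed: bytes) -> bytes:
        (expected,) = struct.unpack(">I", framed[:4])
        block = framed[4:]
        if zlib.crc32(block) != expected:
            raise IOError("block checksum mismatch: corrupt chunk")
        return block
```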
[jira] [Comment Edited] (CASSANDRA-15379) Make it possible to flush with a different compression strategy than we compact with
[ https://issues.apache.org/jira/browse/CASSANDRA-15379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966938#comment-16966938 ]

Joey Lynch edited comment on CASSANDRA-15379 at 11/4/19 7:30 PM:
-

My rationale for the {{EnumSet}} over a boolean member function is:
# Unlike the boolean function idea, it doesn't break the ICompressor abstraction by letting compressors know that flushes exist. It is very easy for an ICompressor author to claim to be good at {{FAST_COMPRESSION}}, but that author probably can't make the call on whether the compressor should be used in flushes or other situations. I could have an {{isFastCompressor}} boolean function, but given that {{ICompressor}} is a public API interface, I think sets of capabilities will be more maintainable going forward than a collection of boolean functions, especially if we start adding more capabilities (see #2).
# If we go down the path of _not_ adding more knobs and instead try to have the database figure out the best way to compress data for users, this is easier to maintain long term, since compressors can offer multiple types of hints to the database. For example, the database might refuse to use slow compressors in flushes, commitlogs, etc., or compaction strategies might opt into higher-ratio compression strategies at higher "levels". If we do go down this path there are fewer interface changes (instead of adding and removing functions, we just add ICompressor.Uses hints).
# Versus the set-of-strings idea, it has compile-time checks that are useful (which is the primary argument against sets of strings, afaik).

After thinking about this problem space more, I'm no longer convinced that giving general users more knobs here (the table properties) is the right choice. By using a {{suitableUses}} hint, the database can in future 4.x releases internally optimize:
* Flushes: "get this data off my heap as fast as possible".
We don't care about ratio (since the products will be re-compacted shortly) or decompression speed, only about compression speed.
* Commitlog: "some compression is nice, but get this data off my heap fast". We mostly care about compression speed, and only marginally about ratio.
* Compaction: "the older the data, the more compressed it should be". We care a lot about decompression speed and ratio, but we don't want to pick expensive compressors at the high-churn points (L0 in LCS, small tables in STCS, before the time-window bucket in TWCS).

The interface still gives advanced users a backdoor (they can extend the compressor they want to change the behavior of and change what capabilities it offers).

edit: I pinged this ticket into [slack|https://the-asf.slack.com/archives/CK23JSY2K/p1572881897039500] to seek more feedback.
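The {{EnumSet}}-of-uses idea can be sketched outside Java, here with Python's enum.Flag (the names Uses, SUITABLE, and compressor_for_flush are hypothetical stand-ins for the proposed ICompressor.Uses hints, and the capability assignments are illustrative):

```python
from enum import Flag, auto

class Uses(Flag):
    GENERAL = auto()           # compaction / normal reads and writes
    FAST_COMPRESSION = auto()  # latency-critical paths such as flush

# Hypothetical capability declarations: each compressor advertises where
# it is suitable, and the database (not the operator) picks per call site.
SUITABLE = {
    "LZ4Compressor": Uses.GENERAL | Uses.FAST_COMPRESSION,
    "ZstdCompressor": Uses.GENERAL,
    "NoopCompressor": Uses.FAST_COMPRESSION,
}

def compressor_for_flush(table_compressor: str) -> str:
    """Keep the table's compressor for flush only if it claims to be
    fast; otherwise fall back to a fast default (LZ4 here)."""
    if Uses.FAST_COMPRESSION in SUITABLE[table_compressor]:
        return table_compressor
    return "LZ4Compressor"
```

The point of the flags over booleans is visible even in this sketch: adding a new hint (say, a high-ratio capability for cold compaction levels) means adding one enum member, not another method to a public interface.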