[ https://issues.apache.org/jira/browse/CASSANDRA-10995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15113334#comment-15113334 ]
Jim Witschey commented on CASSANDRA-10995:
------------------------------------------

I started with this workload from [~enigmacurry], which he says he uses as a go-to starting point:

http://cstar.datastax.com/tests/id/a4963d82-a596-11e5-8573-0256e416528f

I ran the workload with each SSTable compressor and with no compression:

* http://cstar.datastax.com/tests/id/872be204-c073-11e5-b8b1-0256e416528f
* http://cstar.datastax.com/tests/id/3aa9a452-c055-11e5-8c22-0256e416528f
* http://cstar.datastax.com/tests/id/82bcb414-bfb0-11e5-8c22-0256e416528f
* http://cstar.datastax.com/tests/id/0ef49fb8-bf94-11e5-8c22-0256e416528f

Here are the summary statistics from stress:

{code}
Write
=======================================================================
                             Deflate       LZ4    Snappy  no compression
latency 95th percentile          3.5       3.3       3.3             3.4
latency 99th percentile          5.3       4.8       4.7             5.3
latency 99.9th percentile      103.5      86.1      87.7            86.7
latency max                   9357.7     513.3     471.7           397.9
op rate                     146818.0  226499.0  227101.0        227818.0
partition rate              146818.0  226499.0  227101.0        227818.0
row rate                    146818.0  226499.0  227101.0        227818.0
latency mean                     3.4       2.2       2.2             2.2
latency median                   1.5       1.5       1.5             1.5

Read
=======================================================================
                             Deflate       LZ4    Snappy  no compression
latency 95th percentile         11.6       4.2       4.5             3.5
latency 99th percentile         27.3       6.4       7.0             5.1
latency 99.9th percentile       56.3      48.5      49.1            48.4
latency max                    363.6     403.1     385.9           469.0
op rate                      69999.0  204419.0  197231.0        229806.0
partition rate               69999.0  204419.0  197231.0        229806.0
row rate                     69999.0  204419.0  197231.0        229806.0
latency mean                     7.1       2.4       2.5             2.1
latency median                   6.1       1.8       1.9             1.6

Mixed Read/Write
=======================================================================
                             Deflate       LZ4    Snappy  no compression
latency 95th percentile         12.0       4.9       5.1             3.5
latency 99th percentile         25.2       9.4       9.0             5.0
latency 99.9th percentile       61.5      59.2      58.8            57.6
latency max                    261.8    7436.9    6741.0          3443.8
op rate                      76038.0  181384.0  177463.0        217650.0
partition rate               76038.0  181384.0  177463.0        217650.0
row rate                     76038.0  181384.0  177463.0        217650.0
latency mean                     6.5       2.7       2.8             2.3
latency median                   5.4       1.7       1.8             1.6
{code}

(For future reference, I generated this chart using the data and IPython notebook posted here: https://gist.github.com/mambocab/7bf14e0ff23e0f807f67)

I would want to re-run this a couple of times before drawing conclusions, but so far, no compression does better than any of the compressors on most read metrics.

Any particular requests for follow-up? My thoughts were:

* at the very least, more runs of this same workload
* probably runs of some of the small workloads we use for daily regressions

I also have access to Windows machines, so if this benchmark is good, I can run it on that cluster as well.

It may also be worth the time to confirm that turning off compression doesn't negatively impact, e.g., MVs, 2Is, or a larger variety of datasets. I'm not sure exactly what information we need to make this decision.

> Consider disabling sstable compression by default in 3.x
> --------------------------------------------------------
>
>                 Key: CASSANDRA-10995
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10995
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Aleksey Yeschenko
>            Assignee: Jim Witschey
>
> With the new sstable format introduced in CASSANDRA-8099, it's very likely
> that enabled sstable compression is no longer the right default option.
> [~slebresne]'s [blog post|http://www.datastax.com/2015/12/storage-engine-30]
> on the new storage engine has some comparison numbers for 2.2/3.0, with and
> without compression, that show that in many cases compression no longer has a
> significant effect on sstable sizes, all while still consuming extra
> resources for both writes (compression) and reads (decompression).
> We should run a comprehensive set of benchmarks to determine whether or not
> compression should be switched to 'off' now in 3.x.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)