[ 
https://issues.apache.org/jira/browse/CASSANDRA-10995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15113334#comment-15113334
 ] 

Jim Witschey commented on CASSANDRA-10995:
------------------------------------------

I started with this workload from [~enigmacurry], which he says he uses as a 
go-to starting point:

http://cstar.datastax.com/tests/id/a4963d82-a596-11e5-8573-0256e416528f

I ran the workload with each SSTable compressor and with no compression:

* http://cstar.datastax.com/tests/id/872be204-c073-11e5-b8b1-0256e416528f
* http://cstar.datastax.com/tests/id/3aa9a452-c055-11e5-8c22-0256e416528f
* http://cstar.datastax.com/tests/id/82bcb414-bfb0-11e5-8c22-0256e416528f
* http://cstar.datastax.com/tests/id/0ef49fb8-bf94-11e5-8c22-0256e416528f
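
For anyone reproducing the four configurations by hand, the sketch below prints one CQL statement per configuration. It is an assumption-laden illustration, not a record of how the runs above were set up: the comment does not say how compression was configured, `keyspace1.standard1` is assumed because it is the default cassandra-stress table, and the option maps use the 3.x syntax (`class` / `enabled`). Changing the option only affects SSTables written afterwards, so it would need to be applied before loading data.

```python
# Hypothetical sketch: the original comment does not say how compression
# was configured for each run.
# Assumption: keyspace1.standard1, the table cassandra-stress creates by default.
TABLE = "keyspace1.standard1"

# Compression option maps in Cassandra 3.x CQL syntax, keyed by the
# column labels used in the summary tables.
COMPRESSION_OPTIONS = {
    "Deflate": "{'class': 'DeflateCompressor'}",
    "LZ4": "{'class': 'LZ4Compressor'}",
    "Snappy": "{'class': 'SnappyCompressor'}",
    "no compression": "{'enabled': 'false'}",
}


def alter_statement(label):
    """Build the ALTER TABLE statement for one benchmark configuration."""
    return f"ALTER TABLE {TABLE} WITH compression = {COMPRESSION_OPTIONS[label]};"


for label in COMPRESSION_OPTIONS:
    print(alter_statement(label))
```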

Here are the summary statistics reported by cassandra-stress:

{code}
Write
=======================================================================
                            Deflate       LZ4    Snappy  no compression
latency 95th percentile         3.5       3.3       3.3             3.4
latency 99th percentile         5.3       4.8       4.7             5.3
latency 99.9th percentile     103.5      86.1      87.7            86.7
latency max                  9357.7     513.3     471.7           397.9
op rate                    146818.0  226499.0  227101.0        227818.0
partition rate             146818.0  226499.0  227101.0        227818.0
row rate                   146818.0  226499.0  227101.0        227818.0
latency mean                    3.4       2.2       2.2             2.2
latency median                  1.5       1.5       1.5             1.5

Read
=======================================================================
                           Deflate       LZ4    Snappy  no compression
latency 95th percentile       11.6       4.2       4.5             3.5
latency 99th percentile       27.3       6.4       7.0             5.1
latency 99.9th percentile     56.3      48.5      49.1            48.4
latency max                  363.6     403.1     385.9           469.0
op rate                    69999.0  204419.0  197231.0        229806.0
partition rate             69999.0  204419.0  197231.0        229806.0
row rate                   69999.0  204419.0  197231.0        229806.0
latency mean                   7.1       2.4       2.5             2.1
latency median                 6.1       1.8       1.9             1.6

Mixed Read/Write
=======================================================================
                           Deflate       LZ4    Snappy  no compression
latency 95th percentile       12.0       4.9       5.1             3.5
latency 99th percentile       25.2       9.4       9.0             5.0
latency 99.9th percentile     61.5      59.2      58.8            57.6
latency max                  261.8    7436.9    6741.0          3443.8
op rate                    76038.0  181384.0  177463.0        217650.0
partition rate             76038.0  181384.0  177463.0        217650.0
row rate                   76038.0  181384.0  177463.0        217650.0
latency mean                   6.5       2.7       2.8             2.3
latency median                 5.4       1.7       1.8             1.6
{code}

(For future reference, I generated this table using the data and IPython 
notebook posted here: https://gist.github.com/mambocab/7bf14e0ff23e0f807f67)

I would want to re-run this a couple of times before drawing conclusions, but 
so far, no compression beats every compressor on every read metric except max 
latency.
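
As a quick check of that reading, here is a minimal sketch that picks the best column for each read metric. The numbers are copied from the Read table above; the names `READ_STATS` and `best_column` are made up for this illustration, and the rule "lower is better for latency, higher is better for rate" is my assumption about how to rank the columns.

```python
# Read-phase summary statistics copied from the table above.
# Column order: Deflate, LZ4, Snappy, no compression.
COLUMNS = ["Deflate", "LZ4", "Snappy", "no compression"]

# Partition rate and row rate are omitted because they equal op rate.
READ_STATS = {
    "latency 95th percentile": [11.6, 4.2, 4.5, 3.5],
    "latency 99th percentile": [27.3, 6.4, 7.0, 5.1],
    "latency 99.9th percentile": [56.3, 48.5, 49.1, 48.4],
    "latency max": [363.6, 403.1, 385.9, 469.0],
    "op rate": [69999.0, 204419.0, 197231.0, 229806.0],
    "latency mean": [7.1, 2.4, 2.5, 2.1],
    "latency median": [6.1, 1.8, 1.9, 1.6],
}


def best_column(metric, values):
    """Return the column with the best value for this metric."""
    # Assumption: higher is better for rates, lower is better for latencies.
    pick = max if metric.endswith("rate") else min
    return COLUMNS[values.index(pick(values))]


winners = {metric: best_column(metric, values) for metric, values in READ_STATS.items()}
wins_for_uncompressed = sum(1 for w in winners.values() if w == "no compression")

for metric, winner in winners.items():
    print(f"{metric:<27}{winner}")
print(f"no compression is best on {wins_for_uncompressed} of {len(winners)} read metrics")
```

On these single-run numbers it reports no compression as best on six of the seven metrics, with Deflate having the lowest max latency. That is a ranking only; it says nothing about whether the differences would survive repeated runs.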

Any particular requests for follow-up? My thought was:

* at the very least, more runs of this same workload
* probably runs of some of the small workloads we use for daily regressions

I also have access to Windows machines, so if this benchmark is good, I can run 
it on that cluster as well.

It may also be worth the time to confirm that turning off compression doesn't 
negatively impact other cases, e.g. materialized views (MVs), secondary 
indexes (2Is), and a larger variety of datasets. I'm not sure exactly what 
information we need to make this decision.

> Consider disabling sstable compression by default in 3.x
> --------------------------------------------------------
>
>                 Key: CASSANDRA-10995
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10995
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Aleksey Yeschenko
>            Assignee: Jim Witschey
>
> With the new sstable format introduced in CASSANDRA-8099, it's very likely 
> that enabled sstable compression is no longer the right default option.
> [~slebresne]'s [blog post|http://www.datastax.com/2015/12/storage-engine-30] 
> on the new storage engine has some comparison numbers for 2.2/3.0, with and 
> without compression that show that in many cases compression no longer has a 
> significant effect on sstable sizes - all while still consuming extra 
> resources for both writes (compression) and reads (decompression).
> We should run a comprehensive set of benchmarks to determine whether or not 
> compression should be switched to 'off' now in 3.x.



