[ 
https://issues.apache.org/jira/browse/CASSANDRA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071504#comment-13071504
 ] 

Terje Marthinussen commented on CASSANDRA-47:
---------------------------------------------

Instead of a plain on/off switch, we could make compression depend on sstable 
size.

In the Cassandra version we run, we have compression implemented at the 
supercolumn level.

It turned out to be very good for performance for us not to compress data in 
memtables (which compression on supercolumns would normally do) or while 
flushing memtables, as both of these slowed down the write path.

Under heavy write activity, the sstables resulting from memtable flushes often 
get pretty small (maybe 20MB on average in our case), so compression does not 
really make much difference to disk consumption there, but the performance 
penalty does.

All the compression/decompression when compacting the smallest sstables also 
makes a noticeable difference when trying to keep up on the compaction side.

Instead, we went for compression that only happens when a source sstable in a 
compaction is larger than 4GB.
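
A minimal sketch of such a size check, for illustration only. The class and 
method names are hypothetical, "4GB" is read as 4 GiB, and "a source sstable" 
is read as "at least one source sstable"; none of this is taken from the 
actual patch.

```java
import java.util.List;

// Hypothetical helper: decides whether the output of a compaction should be
// compressed, based only on the sizes of the source sstables.
final class CompactionCompressionPolicy {
    // Threshold from the comment above; interpreting "4GB" as 4 GiB is an assumption.
    static final long THRESHOLD_BYTES = 4L * 1024 * 1024 * 1024;

    // Returns true if at least one source sstable is larger than the threshold.
    static boolean shouldCompress(List<Long> sourceSizesBytes) {
        for (long size : sourceSizesBytes) {
            if (size > THRESHOLD_BYTES) {
                return true;
            }
        }
        return false;
    }
}
```

With this rule, flushes and compactions of small sstables never pay the 
compression cost; only compactions that involve an already large sstable do.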

I would recommend considering similar functionality here.

I started off with ning for our compression, but I now run the built-in Java 
deflate to get even better compression. Since we only compress the largest 
sstables, and do no other compression in the write path or on compaction of 
small sstables, the very slow compression of deflate does not bother us that 
much.

The read side is of course still slower with inflate, but it is still more than 
fast enough to not be a problem. 
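
For reference, a small sketch of a deflate/inflate round trip with the JDK's 
java.util.zip classes. The class name, buffer size and the choice of 
BEST_COMPRESSION are illustrative assumptions, not the settings we run with.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper showing the JDK's built-in deflate and inflate.
final class DeflateRoundTrip {
    // Compresses the input with deflate at the highest (slowest) level.
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompresses a complete deflate stream produced by compress().
    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                // No progress and nothing left to read: the stream is incomplete.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated or unsupported deflate stream");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt deflate stream", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```

Compression is the expensive side here; decompression only has to replay the 
stream, which fits the observation that reads stay fast enough.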

OS caching will also be more effective thanks to the better compression ratio, 
so we can regain some of the performance lost vs. ning/snappy there.

We could also consider making this very tunable, with deflate for very large 
sstables, ning/snappy for smaller ones and no compression for the smallest, 
but I am not sure it is worth it.
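
A sketch of what such a tiered choice could look like. Everything here is 
hypothetical: the names are made up, the 4 GiB upper limit is borrowed from 
the threshold mentioned earlier, and the 256 MiB lower limit is an arbitrary 
placeholder that would need tuning.

```java
// Hypothetical codec selection by sstable size.
final class TieredCodecChooser {
    enum Codec { NONE, FAST, DEFLATE }  // FAST stands in for ning/snappy

    // Arbitrary placeholder; below this size sstables stay uncompressed.
    static final long SMALL_LIMIT_BYTES = 256L * 1024 * 1024;
    // Borrowed from the 4GB threshold above, read as 4 GiB (an assumption).
    static final long LARGE_LIMIT_BYTES = 4L * 1024 * 1024 * 1024;

    static Codec choose(long sstableSizeBytes) {
        if (sstableSizeBytes > LARGE_LIMIT_BYTES) {
            return Codec.DEFLATE;  // slow to write, best ratio, largest sstables only
        }
        if (sstableSizeBytes > SMALL_LIMIT_BYTES) {
            return Codec.FAST;     // cheap compression for mid-sized sstables
        }
        return Codec.NONE;         // freshly flushed, small sstables
    }
}
```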

By the way, how much difference did you see between ning and snappy? When I 
tested, there was not all that much difference, and I felt ning was easier to 
bundle, so to me it seemed like the better alternative.

> SSTable compression
> -------------------
>
>                 Key: CASSANDRA-47
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-47
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>              Labels: compression
>             Fix For: 1.0
>
>         Attachments: CASSANDRA-47-v2.patch, CASSANDRA-47-v3-rebased.patch, 
> CASSANDRA-47-v3.patch, CASSANDRA-47-v4.patch, CASSANDRA-47.patch, 
> snappy-java-1.0.3-rc4.jar
>
>
> We should be able to do SSTable compression which would trade CPU for I/O 
> (almost always a good trade).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
