[ 
https://issues.apache.org/jira/browse/CASSANDRA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062381#comment-13062381
 ] 

Pavel Yaskevich commented on CASSANDRA-47:
------------------------------------------

bq. This seems like an unrealistically good compression ratio. If I gzip a real 
world SSTable that has redundant data that should be ripe for compression I 
only see 641M-->217M. What's the gzip compression ratio with the SSTables that 
stress.java workload generates?

You can easily test it yourself. For example, run ./bin/stress -S 1024 -n 
1000000 -C 250 -V, wait for compactions to finish, and check the block size of 
the resulting files (using ls -lahs). I see 3.8GB compressed into 781MB in my 
tests. internal_op_rate with the current trunk code is around 450-500, but with 
the current patch it is about 2800-3000 on a Quad-Core AMD Opteron(tm) 
Processor 2374 HE 4229730MHz on each core, 2GB mem (Rackspace instance). The 
cardinality of 250 is 5 times bigger than the default, and the -V option is 
used for average size values. 
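
To put the two sets of numbers in this thread side by side, here is a minimal 
sketch that turns the quoted sizes into compression ratios. The helper name 
compression_ratio is only for illustration, and it assumes the sizes reported 
by ls -h are in powers of 1024 (so 3.8GB is taken as 3.8 * 1024 MB).

```python
def compression_ratio(original_mb: float, compressed_mb: float) -> float:
    """Return how many times smaller the compressed data is than the original."""
    if compressed_mb <= 0:
        raise ValueError("compressed size must be positive")
    return original_mb / compressed_mb


# Sizes quoted in this thread, in MB (assumption: ls -h uses powers of 1024).
stress_ratio = compression_ratio(3.8 * 1024, 781)  # stress.java SSTables with the patch
gzip_ratio = compression_ratio(641, 217)           # gzip on the real-world SSTable

print(f"stress workload: {stress_ratio:.2f}x")
print(f"real-world gzip: {gzip_ratio:.2f}x")
```

The stress-generated data compresses noticeably better than the real-world 
SSTable from the question, so the two ratios describe different data sets and 
are not directly comparable.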

> SSTable compression
> -------------------
>
>                 Key: CASSANDRA-47
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-47
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>              Labels: compression
>             Fix For: 1.0
>
>         Attachments: CASSANDRA-47.patch, snappy-java-1.0.3-rc4.jar
>
>
> We should be able to do SSTable compression which would trade CPU for I/O 
> (almost always a good trade).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
