[ https://issues.apache.org/jira/browse/CASSANDRA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062381#comment-13062381 ]
Pavel Yaskevich commented on CASSANDRA-47:
------------------------------------------

bq. This seems like an unrealistically good compression ratio. If I gzip a real world SSTable that has redundant data that should be ripe for compression I only see 641M-->217M. What's the gzip compression ratio with the SSTables that stress.java workload generates?

You can easily test it yourself, for example:

./bin/stress -S 1024 -n 1000000 -C 250 -V

Wait for compactions to finish, then check the block size of the resulting files (using ls -lahs). I see 3.8GB compressed into 781MB in my tests.

internal_op_rate with the current trunk code is around 450-500, but with the current patch it is about 2800-3000 on a Quad-Core AMD Opteron(tm) Processor 2374 HE, 4229730MHz on each core, 2GB mem (Rackspace instance).

A cardinality of 250 is 5 times bigger than the default, and the -V option makes the run use average-size values.

> SSTable compression
> -------------------
>
>                 Key: CASSANDRA-47
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-47
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>              Labels: compression
>             Fix For: 1.0
>
>         Attachments: CASSANDRA-47.patch, snappy-java-1.0.3-rc4.jar
>
>
> We should be able to do SSTable compression which would trade CPU for I/O
> (almost always a good trade).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
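
As a rough check of the figures quoted in the comment above, the two compression ratios can be compared with a few lines of Python. This is only a sketch: the helper names are hypothetical, the sizes are assumed to be in binary units as printed by ls -h, and allocated_vs_apparent() assumes a POSIX system where st_blocks is counted in 512-byte units. It mirrors the idea of comparing on-disk block usage with file length, and is not taken from the patch itself.

```python
import os

MB = 1024 ** 2
GB = 1024 ** 3


def compression_ratio(original_bytes, compressed_bytes):
    """Return how many times smaller the compressed data is."""
    if compressed_bytes <= 0:
        raise ValueError("compressed size must be positive")
    return original_bytes / compressed_bytes


def allocated_vs_apparent(path):
    """Return (allocated_bytes, apparent_bytes) for a file.

    Assumption: POSIX stat semantics, where st_blocks counts
    512-byte units. This is roughly what the first column of
    `ls -s` reports next to the file length.
    """
    st = os.stat(path)
    return st.st_blocks * 512, st.st_size


# Figures quoted in the comment: stress.java data with the patch,
# and gzip applied to a real-world SSTable.
stress_ratio = compression_ratio(3.8 * GB, 781 * MB)
gzip_ratio = compression_ratio(641 * MB, 217 * MB)

print(f"stress.java data with patch: {stress_ratio:.2f}x smaller")
print(f"gzip on real-world SSTable:  {gzip_ratio:.2f}x smaller")
```

Under these assumptions the stress.java data shrinks by roughly 5x and the gzipped real-world SSTable by roughly 3x, which is the gap the quoted question is asking about.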