[ https://issues.apache.org/jira/browse/CASSANDRA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014531#comment-13014531 ]
Terje Marthinussen commented on CASSANDRA-47:
---------------------------------------------

This is probably not that interesting for a "proper" solution, but I am adding it for reference.

I needed space for more data, so I recently threw together a quick compression hack for supercolumns.

I considered compressing the index blocks as Jonathan suggested, but I could not make up my mind about how safe that would be with respect to other code accessing the data, and I was short on time, so I looked for something more isolated.

The final decision was to simply compress the serialized columns in a supercolumn (plus a bit of caching to avoid recompressing every time the serialized size is requested).

My data has supercolumns with typically 50-60 subcolumns, mostly small strings or numbers. In total, the subcolumns make up 600-1200 bytes of data when serialized. There is usually a fair number of supercolumns per row.

My test data was 447 keys.

I tested with the Ning LZF jars and the default java.util.zip. This is not necessarily a good test, but I think json2sstable is somewhat useful for measuring the relative impact between implementations, although it is not useful for determining real performance in any way.

In addition, I made a simple dictionary of column names (applied only to supercolumns), since the column names did not compress very well when looking at just a single supercolumn at a time.

The result of both the dictionary and the compression:

Standard Cassandra, json2sstable:
  real 0m55.148s
  user 1m50.023s
  sys  0m2.856s
  sstable: 190MB

ning.com:
  real 1m8.315s
  user 2m18.361s
  sys  0m4.600s
  sstable: 108MB

java.util.zip:
  real 1m35.899s
  user 2m49.691s
  sys  0m2.940s
  sstable: 90MB

As a reference, the whole sstable file compresses as follows:

ning.com (command line):
  real 0m1.803s
  user 0m1.536s
  sys  0m0.396s
  sstable: 80MB

gzip (command line):
  real 0m6.175s
  user 0m6.076s
  sys  0m0.084s
  sstable: 48MB

I doubt this implementation has much merit for inclusion in a release. I just added the numbers for reference.
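For illustration only, here is a minimal sketch of the idea described above: compress the serialized subcolumns of one supercolumn, and cache the result so that asking for the serialized size does not recompress every time. This is not the actual patch. The class name CompressedSubcolumns and its methods are hypothetical and are not part of Cassandra; the sketch assumes the java.util.zip variant (Deflater/Inflater) and leaves out the column-name dictionary.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical holder for the serialized subcolumns of one supercolumn. */
final class CompressedSubcolumns {
    private final byte[] serialized; // uncompressed serialized subcolumns
    private byte[] compressed;       // cached result; null until first use

    CompressedSubcolumns(byte[] serialized) {
        this.serialized = serialized.clone();
    }

    /** Compresses on the first call and returns the cached bytes afterwards. */
    synchronized byte[] compressed() {
        if (compressed == null) {
            Deflater deflater = new Deflater(Deflater.BEST_SPEED);
            try {
                deflater.setInput(serialized);
                deflater.finish();
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                while (!deflater.finished()) {
                    int n = deflater.deflate(buf);
                    out.write(buf, 0, n);
                }
                compressed = out.toByteArray();
            } finally {
                deflater.end(); // release native zlib memory
            }
        }
        return compressed;
    }

    /** Size of the compressed form; repeated calls reuse the cache. */
    int serializedSize() {
        return compressed().length;
    }

    /** Restores the original bytes; the caller must know the original length. */
    static byte[] decompress(byte[] data, int originalLength) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(data);
            byte[] out = new byte[originalLength];
            int off = 0;
            while (off < originalLength && !inflater.finished()) {
                int n = inflater.inflate(out, off, originalLength - off);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // no progress possible: input is truncated or invalid
                }
                off += n;
            }
            if (off != originalLength) {
                throw new IllegalArgumentException("compressed data shorter than expected");
            }
            return out;
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt compressed data", e);
        } finally {
            inflater.end();
        }
    }
}
```

In a sketch like this the original length has to be stored next to the compressed bytes so the reader can size its buffer. With only 600-1200 bytes per supercolumn there is little redundancy for the compressor to find, which is consistent with a shared column-name dictionary helping.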
Of course, if requested, I could see if I could make the patch available somewhere.

> SSTable compression
> -------------------
>
>                 Key: CASSANDRA-47
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-47
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 0.8
>
>
> We should be able to do SSTable compression which would trade CPU for I/O
> (almost always a good trade).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira