[ https://issues.apache.org/jira/browse/CASSANDRA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067831#comment-13067831 ]
Pavel Yaskevich commented on CASSANDRA-47:
------------------------------------------

bq. Why? Is there something in compressed files that requires an Input and an Output class? Can't we just have the CDF seek() method throw an exception if the CDF has been opened in "rw" mode? And if we don't split it (which, again, I don't see why we would have to, but maybe I'm missing something), I'm pretty sure very few parts will require refactoring (skip cache isn't one of them, CDF will just set skipCache to false; even though I don't see why skip cache would be a problem with compression).

The split into Input/Output classes was mentioned previously in CASSANDRA-1470. I'm -1 on having seek() throw an exception when the CDF has been opened in "rw" mode, because that is not a clean interface; I would rather make separate classes, which is a more reasonable and cleaner design. In any case, even right now the common ancestor of both is RandomAccessFile (or even FileDataInput). So I'm -1 on merging CDF and BRAF before BRAF has been refactored.

bq. In any case, having compression optional is a requirement and in my book, the more important one. To be clear, I'm -1 on committing anything where compression is not optional (we cannot ask people to trust compression on day 1, and I strongly think that the "let's commit and fix after" is the wrong way to go). So we at least need CDF and BRAF to have some common ancestor for that.

To be clear, I'm not proposing "let's commit and fix after". Compression can easily be made optional with the current state of the patch, and I'm making that my top priority.

bq. I would prefer putting this index and the header into a separate component (a -Compression component?).
Thinking about that further, I'm a bit concerned about adding one more file to handle a single SSTable. The main goal of my design here was to keep CDF independent from the other components of the system and avoid any additional complexity. So maybe it's better to stream file offsets to a temporary file while the SSTable is being written and afterwards store the index section at the end of the file (as a way to avoid keeping that index in memory)?

bq. Talking about the header, the control bytes detection is not correct since we haven't done this so far: there is no guarantee an existing data file won't start by the bytes 'C' then 'D' (having or not having a -Compression component could serve this purpose though).

We can use a magic number the same way gzip does: http://en.wikipedia.org/wiki/Gzip#File_format

> SSTable compression
> -------------------
>
>                 Key: CASSANDRA-47
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-47
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>              Labels: compression
>             Fix For: 1.0
>
>         Attachments: CASSANDRA-47-v2.patch, CASSANDRA-47.patch, snappy-java-1.0.3-rc4.jar
>
>
> We should be able to do SSTable compression which would trade CPU for I/O (almost always a good trade).
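
For illustration only, the trailing-index idea from the comment above (stream offsets to a temporary file during the write, then append them as an index section at the end of the data file) could look roughly like the sketch below. This is not code from the attached patches: the class name TrailingIndexSketch, the 12-byte trailer layout (index start as a long, then entry count as an int), and the roundTrip helper are all assumptions made for the example, and chunks are written uncompressed to keep it short.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch, not Cassandra code: chunk offsets are spooled to a
// temporary file during the write and appended afterwards as an index section.
final class TrailingIndexSketch {

    // Assumed trailer layout: [index start: long][entry count: int].
    static final int TRAILER_BYTES = Long.BYTES + Integer.BYTES;

    // Writes every chunk, then the spooled offsets, then the trailer.
    static void write(Path dataFile, List<byte[]> chunks) {
        try {
            Path spool = Files.createTempFile("chunk-offsets", ".tmp");
            try (DataOutputStream data = new DataOutputStream(Files.newOutputStream(dataFile))) {
                long position = 0;
                try (DataOutputStream offsets = new DataOutputStream(Files.newOutputStream(spool))) {
                    for (byte[] chunk : chunks) {
                        offsets.writeLong(position); // offset goes to disk, not to the heap
                        data.write(chunk);
                        position += chunk.length;
                    }
                }
                Files.copy(spool, data);      // stream the spooled offsets into the data file
                data.writeLong(position);     // where the index section starts
                data.writeInt(chunks.size()); // number of index entries
            } finally {
                Files.deleteIfExists(spool);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Locates the index through the fixed-size trailer and reads the offsets back.
    static long[] readOffsets(Path dataFile) {
        try (RandomAccessFile in = new RandomAccessFile(dataFile.toFile(), "r")) {
            in.seek(in.length() - TRAILER_BYTES);
            long indexStart = in.readLong();
            int count = in.readInt();
            in.seek(indexStart);
            long[] offsets = new long[count];
            for (int i = 0; i < count; i++) {
                offsets[i] = in.readLong();
            }
            return offsets;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo helper: writes the chunks to a temporary data file and returns the offsets read back.
    static long[] roundTrip(List<byte[]> chunks) {
        try {
            Path dataFile = Files.createTempFile("sstable-sketch", ".db");
            try {
                write(dataFile, chunks);
                return readOffsets(dataFile);
            } finally {
                Files.deleteIfExists(dataFile);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With chunks of 3, 5 and 2 bytes, roundTrip should return the offsets 0, 3 and 8. The fixed-size trailer is just one way to let a reader find the index without a separate component; a real format would presumably also need the header and magic number discussed above.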