[ https://issues.apache.org/jira/browse/CASSANDRA-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117032#comment-13117032 ]
Alan Liang commented on CASSANDRA-2735:
---------------------------------------

We've tested this patch internally and noticed that it actually resulted in many more compactions than the SizeTieredCompactionStrategy. The increase in IO was not acceptable for our use, so we stopped working on this patch.

Internally, we ended up implementing expiration of sstables within SizeTieredCompactionStrategy. We've called it SizeTieredExpirableCompactionStrategy. Given the set of all sstables, the compaction procedure becomes:

1. Expire sstables based on the max timestamp of each sstable, and remove the expired sstables from the set.
2. Remove sstables whose size is >= a max sstable size from the set.
3. Run the SizeTieredCompactionStrategy on the remaining sstables.

The downside of this strategy is that during compaction, newer sstables can be mixed with older ones, and the resulting compacted sstable is marked with the max timestamp of the newest input sstable. This means you can't expire the older rows within that sstable until the entire sstable is due to expire.

This problem of compacting very old sstables with newer ones is mitigated by a restriction: an sstable is taken out of consideration for compaction once it reaches a certain max sstable size. This works because older sstables tend to be larger files.

We found this currently works for our specific use case of storing time-series data. I can post the patch for SizeTieredExpirableCompactionStrategy if there is interest; I'll have to rebase it.
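The three-step procedure above can be sketched as a planner over sstable metadata. This is a minimal sketch under stated assumptions, not the unposted SizeTieredExpirableCompactionStrategy patch: the `SSTable` and `Plan` records, the TTL-based expiry test (`maxTimestamp < now - ttl`), and all parameter names are hypothetical. Step 3 uses a simplified size-tiered grouping (sstables sorted by size join the current bucket while they are at most 1.5x the bucket's average size) that is loosely modeled on, not copied from, Cassandra's SizeTieredCompactionStrategy.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SizeTieredExpirableSketch {
    // Hypothetical sstable descriptor holding only what this planner needs.
    record SSTable(String name, long sizeBytes, long maxTimestampMillis) {}

    // Hypothetical planning result: sstables to drop and groups to compact.
    record Plan(List<SSTable> expired, List<List<SSTable>> compactions) {}

    static Plan plan(List<SSTable> all, long nowMillis, long ttlMillis,
                     long maxSSTableSize, int minThreshold, int maxThreshold) {
        List<SSTable> expired = new ArrayList<>();
        List<SSTable> candidates = new ArrayList<>();
        for (SSTable s : all) {
            if (s.maxTimestampMillis() < nowMillis - ttlMillis) {
                // Step 1: even the newest row is past the (assumed) TTL, so the whole sstable expires.
                expired.add(s);
            } else if (s.sizeBytes() < maxSSTableSize) {
                // Step 2: only sstables below the max size stay eligible for compaction.
                candidates.add(s);
            }
            // Sstables at or above the max size are left alone until step 1 expires them.
        }

        // Step 3 (simplified size tiering): walk sstables from smallest to largest and
        // start a new bucket when the next one exceeds 1.5x the current bucket's average.
        candidates.sort(Comparator.comparingLong(SSTable::sizeBytes));
        List<List<SSTable>> buckets = new ArrayList<>();
        List<SSTable> current = new ArrayList<>();
        long currentTotal = 0;
        for (SSTable s : candidates) {
            if (!current.isEmpty() && s.sizeBytes() > 1.5 * currentTotal / current.size()) {
                buckets.add(current);
                current = new ArrayList<>();
                currentTotal = 0;
            }
            current.add(s);
            currentTotal += s.sizeBytes();
        }
        if (!current.isEmpty()) {
            buckets.add(current);
        }

        // Compact buckets holding at least minThreshold sstables, at most maxThreshold at a time.
        List<List<SSTable>> compactions = new ArrayList<>();
        for (List<SSTable> bucket : buckets) {
            if (bucket.size() >= minThreshold) {
                compactions.add(new ArrayList<>(bucket.subList(0, Math.min(bucket.size(), maxThreshold))));
            }
        }
        return new Plan(expired, compactions);
    }

    public static void main(String[] args) {
        List<SSTable> sstables = List.of(
                new SSTable("a", 10, 850_000),    // newest row is older than the TTL window
                new SSTable("b", 10, 950_000),
                new SSTable("c", 12, 960_000),
                new SSTable("d", 11, 970_000),
                new SSTable("e", 10, 980_000),
                new SSTable("f", 1_000, 990_000), // at or above the max sstable size
                new SSTable("g", 100, 995_000));
        Plan p = plan(sstables, 1_000_000, 100_000, 500, 4, 32);
        System.out.println(p.expired().stream().map(SSTable::name).toList());   // [a]
        System.out.println(p.compactions().stream()
                .map(b -> b.stream().map(SSTable::name).toList()).toList());    // [[b, e, d, c]]
    }
}
```

Because a compacted sstable inherits the newest max timestamp of its inputs, the step-2 size cap is what keeps a large, older sstable (here `f`) out of buckets with fresh ones; in this sketch it is only ever removed by step 1.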
> Timestamp Based Compaction Strategy
> -----------------------------------
>
>                 Key: CASSANDRA-2735
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2735
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Alan Liang
>            Assignee: Alan Liang
>            Priority: Minor
>              Labels: compaction
>         Attachments: 0001-timestamp-bucketed-compaction-strategy-V2.patch, 0001-timestamp-bucketed-compaction-strategy.patch
>
>
> Compaction strategy implementation based on max timestamp ordering of the
> sstables while satisfying max sstable size, min and max compaction
> thresholds. It also handles expiration of sstables based on a timestamp.
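The issue description summarizes the original timestamp-bucketed strategy in one sentence. The attached patches are not reproduced here, so the following is only a hypothetical reading of that description, not their code: live sstables are ordered by max timestamp and grouped into contiguous buckets that stay within a max combined size and `maxThreshold` sstables, and only buckets with at least `minThreshold` sstables are compacted. The `SSTable` record, the fixed expiration cutoff, and the parameter names are assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TimestampBucketedSketch {
    // Hypothetical sstable descriptor for illustration; not Cassandra's SSTableReader.
    record SSTable(String name, long sizeBytes, long maxTimestampMillis) {}

    // Groups live sstables that are adjacent in max-timestamp order. A bucket is closed
    // once it holds maxThreshold sstables or the next one would push its combined size
    // past maxSSTableSize; buckets with fewer than minThreshold sstables are skipped.
    static List<List<SSTable>> buckets(List<SSTable> all, long expireBeforeMillis,
                                       long maxSSTableSize, int minThreshold, int maxThreshold) {
        // Expiration (assumed fixed cutoff): drop sstables whose newest row predates it.
        List<SSTable> live = new ArrayList<>();
        for (SSTable s : all) {
            if (s.maxTimestampMillis() >= expireBeforeMillis) {
                live.add(s);
            }
        }
        // Order by max timestamp so each bucket covers a contiguous time range.
        live.sort(Comparator.comparingLong(SSTable::maxTimestampMillis));

        List<List<SSTable>> result = new ArrayList<>();
        List<SSTable> current = new ArrayList<>();
        long currentSize = 0;
        for (SSTable s : live) {
            boolean full = !current.isEmpty()
                    && (current.size() == maxThreshold || currentSize + s.sizeBytes() > maxSSTableSize);
            if (full) {
                if (current.size() >= minThreshold) {
                    result.add(current);
                }
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(s);
            currentSize += s.sizeBytes();
        }
        if (current.size() >= minThreshold) {
            result.add(current);
        }
        return result;
    }

    public static void main(String[] args) {
        List<SSTable> sstables = List.of(
                new SSTable("t3", 10, 130),
                new SSTable("old", 5, 50), // newest row is before the cutoff
                new SSTable("t1", 10, 110),
                new SSTable("t5", 10, 150),
                new SSTable("t2", 10, 120),
                new SSTable("t4", 10, 140));
        List<List<String>> names = buckets(sstables, 100, 35, 2, 4).stream()
                .map(b -> b.stream().map(SSTable::name).toList())
                .toList();
        System.out.println(names); // [[t1, t2, t3], [t4, t5]]
    }
}
```

Keeping buckets contiguous in time means a compacted sstable's max timestamp stays within its bucket's time range, which is the property the comment above says the size-tiered variant can lose when it mixes old and new sstables.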