[ 
https://issues.apache.org/jira/browse/CASSANDRA-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117032#comment-13117032
 ] 

Alan Liang commented on CASSANDRA-2735:
---------------------------------------

We've tested this patch internally and noticed that it actually resulted in 
a lot more compactions than the SizeTieredCompactionStrategy. The increase in 
IO was not acceptable for our use, so we stopped working on this patch.

Internally, we ended up implementing expiration of sstables within 
SizeTieredCompactionStrategy. We've called it 
SizeTieredExpirableCompactionStrategy. Given a set of all sstables, the 
compaction procedure becomes:

1. Expire sstables based on the max timestamp of each sstable. Remove expired 
sstables from the set.
2. Remove from the set any sstables whose size is >= a max size.
3. Run the SizeTieredCompactionStrategy on the remaining sstables.
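
As an illustration only, the filtering in steps 1 and 2 above could be sketched 
as follows. This is not the patch itself: the class, field and parameter names 
(ExpirableSelectionSketch, SSTable, expireBefore, maxSizeBytes) are 
hypothetical, and the rule "expired means max timestamp older than a cutoff" is 
an assumption about how expiration is decided.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names come from the actual patch.
class ExpirableSelectionSketch {

    // Minimal stand-in for an sstable: only the fields the steps above rely on.
    static final class SSTable {
        final String name;
        final long maxTimestamp;
        final long sizeBytes;

        SSTable(String name, long maxTimestamp, long sizeBytes) {
            this.name = name;
            this.maxTimestamp = maxTimestamp;
            this.sizeBytes = sizeBytes;
        }
    }

    // Step 1: sstables whose newest write is older than the cutoff (assumed rule).
    static List<SSTable> expired(List<SSTable> all, long expireBefore) {
        List<SSTable> out = new ArrayList<>();
        for (SSTable t : all) {
            if (t.maxTimestamp < expireBefore) out.add(t);
        }
        return out;
    }

    // Steps 1 and 2 combined: what is left to hand to size-tiered compaction (step 3).
    static List<SSTable> candidates(List<SSTable> all, long expireBefore, long maxSizeBytes) {
        List<SSTable> out = new ArrayList<>();
        for (SSTable t : all) {
            boolean isExpired = t.maxTimestamp < expireBefore;
            boolean tooLarge = t.sizeBytes >= maxSizeBytes;
            if (!isExpired && !tooLarge) out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        List<SSTable> all = Arrays.asList(
            new SSTable("a", 100L, 50L),    // old: expires
            new SSTable("b", 900L, 5000L),  // live but at the size cap: left alone
            new SSTable("c", 950L, 40L));   // live and small: compaction candidate
        System.out.println(expired(all, 500L).size());
        System.out.println(candidates(all, 500L, 1000L).size());
    }
}
```

Note that an sstable excluded in step 2 is only taken out of compaction; it 
stays on disk until step 1 expires it in a later run.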

The downside of this strategy is that during compaction, newer sstables can 
be mixed with older sstables, and the resulting compacted sstable is marked 
with the max timestamp of the newest input. This means the older rows within 
that sstable cannot be expired until the entire sstable expires. We mitigate 
this problem of compacting really old sstables with newer ones by taking an 
sstable out of consideration for compaction once it reaches a certain max 
sstable size. This works because older sstables tend to be larger files.
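
To make the downside concrete, here is a small sketch with made-up numbers. 
The names (MergedTimestampSketch, mergedMaxTimestamp, isExpired) are 
hypothetical, and the cutoff comparison is an assumed expiration rule, not 
taken from the patch.

```java
// Hypothetical sketch; names and the expiration rule are assumptions.
class MergedTimestampSketch {

    // A compacted sstable carries the newest max timestamp of its inputs.
    static long mergedMaxTimestamp(long[] inputMaxTimestamps) {
        long max = Long.MIN_VALUE;
        for (long ts : inputMaxTimestamps) {
            max = Math.max(max, ts);
        }
        return max;
    }

    // Assumed rule: an sstable expires once its max timestamp is older than the cutoff.
    static boolean isExpired(long maxTimestamp, long expireBefore) {
        return maxTimestamp < expireBefore;
    }

    public static void main(String[] args) {
        long oldTs = 100L;
        long newTs = 900L;
        long cutoff = 500L;
        long merged = mergedMaxTimestamp(new long[] {oldTs, newTs});
        // On its own the old sstable would be dropped; merged with the new one it is not.
        System.out.println(isExpired(oldTs, cutoff));
        System.out.println(isExpired(merged, cutoff));
    }
}
```

With these numbers the old sstable alone would already be expired, but once it 
is merged with the newer one its rows survive until the cutoff passes the 
newer timestamp.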

We've found that this works for our specific use case of storing time-series 
data. I can post the patch for this SizeTieredExpirableCompactionStrategy if 
there is interest. I'll have to rebase it first.
                
> Timestamp Based Compaction Strategy
> -----------------------------------
>
>                 Key: CASSANDRA-2735
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2735
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Alan Liang
>            Assignee: Alan Liang
>            Priority: Minor
>              Labels: compaction
>         Attachments: 0001-timestamp-bucketed-compaction-strategy-V2.patch, 
> 0001-timestamp-bucketed-compaction-strategy.patch
>
>
> Compaction strategy implementation based on max timestamp ordering of the 
> sstables while satisfying max sstable size, min and max compaction 
> thresholds. It also handles expiration of sstables based on a timestamp.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
