[ https://issues.apache.org/jira/browse/CASSANDRA-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154250#comment-13154250 ]

Jonathan Ellis commented on CASSANDRA-3442:
-------------------------------------------

Should we do single-sstable compactions *after* the bucket compactions? Doing them first means we might compact them twice, when the bucket-based compaction would have been adequate.

It looks like this will never stop recompacting sstables with high expiring column counts, until they finally expire and are expunged. I think we need to address this somehow, possibly by waiting until some fraction of gc_grace_seconds has elapsed since sstable creation (which we can just get from mtime).

If we can reasonably test this in CompactionsTest I'd like to add that.

> TTL histogram for sstable metadata
> ----------------------------------
>
>                 Key: CASSANDRA-3442
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3442
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Yuki Morishita
>            Priority: Minor
>              Labels: compaction
>             Fix For: 1.1
>
>         Attachments: 3442.txt
>
>
> Under size-tiered compaction, you can generate large sstables that compact
> infrequently. With expiring columns mixed in, we could waste a lot of space
> in this situation.
> If we kept a TTL EstimatedHistogram in the sstable metadata, we could do a
> single-sstable compaction against sstables with over 20% (?) expired data.
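
To make the suggested guard concrete, here is a minimal sketch of the eligibility check discussed in the comment: only consider a single-sstable compaction once the estimated expired ratio is over the threshold and the sstable is older than some fraction of gc_grace_seconds. This is not the attached patch or Cassandra's actual API. The class name, method name, and parameters are hypothetical, the 0.20 threshold is the tentative "20% (?)" from the issue description, and the 0.5 fraction is an assumption chosen only for illustration.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration only; not part of Cassandra.
final class SingleSSTableCompactionCheck {
    // Tentative value from the issue description ("over 20% (?) expired data").
    static final double EXPIRED_RATIO_THRESHOLD = 0.20;
    // Assumed fraction of gc_grace_seconds to wait; the comment leaves it open.
    static final double GC_GRACE_FRACTION = 0.5;

    /**
     * @param estimatedExpiredRatio fraction of expired data, e.g. derived from a TTL histogram
     * @param sstableMtimeMillis    sstable creation time, taken from the file mtime
     * @param nowMillis             current time in milliseconds
     * @param gcGraceSeconds        the column family's gc_grace_seconds
     */
    static boolean shouldCompact(double estimatedExpiredRatio,
                                 long sstableMtimeMillis,
                                 long nowMillis,
                                 int gcGraceSeconds) {
        // Not enough expired data to be worth rewriting the sstable.
        if (estimatedExpiredRatio <= EXPIRED_RATIO_THRESHOLD) {
            return false;
        }
        // A freshly written sstable is skipped so it is not recompacted
        // over and over while its columns are still waiting to be purged.
        long ageMillis = nowMillis - sstableMtimeMillis;
        long minAgeMillis = (long) (TimeUnit.SECONDS.toMillis(gcGraceSeconds) * GC_GRACE_FRACTION);
        return ageMillis >= minAgeMillis;
    }
}
```

With this shape, an sstable that was just produced by a single-sstable compaction gets a new mtime and so is not picked again until the wait has elapsed, which is the recompaction loop the comment is worried about.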