[ https://issues.apache.org/jira/browse/CASSANDRA-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154250#comment-13154250 ]

Jonathan Ellis commented on CASSANDRA-3442:
-------------------------------------------

Should we do single-sstable compactions *after* the bucket compactions?  Doing 
them first means we might compact them twice, when the bucket-based compaction 
would have been adequate.
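A minimal sketch of the ordering suggested above, assuming sstables are identified by name and that a size-tiered bucket is only compacted once it reaches a minimum threshold. CompactionOrdering, plan, and the parameter names are hypothetical and not Cassandra's API. Bucket compactions are scheduled first, and a single-sstable compaction is only added for an sstable that no bucket compaction will already rewrite.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch (not Cassandra code): schedule bucket compactions first,
// then single-sstable compactions only for sstables left untouched by them.
class CompactionOrdering {
    static List<List<String>> plan(List<List<String>> buckets, int minThreshold,
                                   List<String> singleCandidates) {
        List<List<String>> tasks = new ArrayList<>();
        Set<String> alreadyCompacted = new HashSet<>();
        // 1. Size-tiered bucket compactions that meet the threshold.
        for (List<String> bucket : buckets) {
            if (bucket.size() >= minThreshold) {
                tasks.add(bucket);
                alreadyCompacted.addAll(bucket);
            }
        }
        // 2. Single-sstable compactions, skipping anything a bucket already covers,
        //    so no sstable is rewritten twice in the same round.
        for (String sstable : singleCandidates) {
            if (!alreadyCompacted.contains(sstable)) {
                tasks.add(List.of(sstable));
            }
        }
        return tasks;
    }
}
```

For example, with buckets [a, b, c, d] and [e], a threshold of 4, and single-sstable candidates b and e, the plan is [[a, b, c, d], [e]]: b is dropped as a single candidate because the bucket compaction already rewrites it.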

It looks like this will never stop recompacting sstables with high expiring 
column counts, until they finally expire and are expunged.  I think we need to 
address this somehow, possibly by waiting until some fraction of 
gc_grace_seconds has elapsed since sstable creation (which we can just get from 
mtime).
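A rough sketch of the proposed guard, assuming the sstable's creation time is approximated by its file mtime (for example via java.io.File#lastModified, which returns milliseconds since the epoch) and that the required fraction of gc_grace_seconds is a tunable. SingleSSTableGuard, oldEnough, and the fraction parameter are hypothetical names for illustration only.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: an sstable becomes eligible for a single-sstable
// compaction only once `fraction` of gc_grace_seconds has elapsed since it
// was written, so freshly rewritten sstables are not recompacted immediately.
class SingleSSTableGuard {
    static boolean oldEnough(long mtimeMillis, long nowMillis,
                             int gcGraceSeconds, double fraction) {
        long ageMillis = nowMillis - mtimeMillis;
        long requiredMillis =
                (long) (TimeUnit.SECONDS.toMillis(gcGraceSeconds) * fraction);
        return ageMillis >= requiredMillis;
    }
}
```

With the default gc_grace_seconds of 864000 (ten days) and a fraction of 0.5, an sstable would have to be at least five days old before it could be picked again, which bounds how often a file full of not-yet-expired columns gets rewritten.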

If we can reasonably test this in CompactionsTest I'd like to add that.
                
> TTL histogram for sstable metadata
> ----------------------------------
>
>                 Key: CASSANDRA-3442
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3442
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Yuki Morishita
>            Priority: Minor
>              Labels: compaction
>             Fix For: 1.1
>
>         Attachments: 3442.txt
>
>
> Under size-tiered compaction, you can generate large sstables that compact 
> infrequently.  With expiring columns mixed in, we could waste a lot of space 
> in this situation.
> If we kept a TTL EstimatedHistogram in the sstable metadata, we could do a 
> single-sstable compaction against sstables with over 20% (?) expired data.

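A simplified sketch of the histogram idea, assuming the metadata records each expiring column's expiration time (rather than its raw TTL) and that a column's data can be purged once that time falls before gcBefore (now minus gc_grace_seconds). Cassandra's EstimatedHistogram uses fixed bucket offsets; the exact TreeMap here is a stand-in to keep the example short, and ExpirationHistogram and its methods are hypothetical.

```java
import java.util.TreeMap;

// Hypothetical stand-in for a per-sstable expiration histogram stored in
// sstable metadata. Keys are expiration times in seconds since the epoch.
class ExpirationHistogram {
    private final TreeMap<Long, Long> counts = new TreeMap<>();
    private long totalColumns = 0;

    void addExpiring(long expiresAtSeconds) {
        counts.merge(expiresAtSeconds, 1L, Long::sum);
        totalColumns++;
    }

    void addNonExpiring() {
        totalColumns++;
    }

    // Fraction of all columns whose expiration time is strictly before gcBefore.
    double droppableRatio(long gcBeforeSeconds) {
        if (totalColumns == 0) return 0.0;
        long expired = 0;
        for (long c : counts.headMap(gcBeforeSeconds, false).values()) {
            expired += c;
        }
        return (double) expired / totalColumns;
    }

    // The "over 20% (?)" rule from the description, with the threshold as a parameter.
    boolean worthSingleCompaction(long gcBeforeSeconds, double threshold) {
        return droppableRatio(gcBeforeSeconds) > threshold;
    }
}
```

With 10 columns of which 3 expire at times 100, 100 and 200, a gcBefore of 150 gives a ratio of exactly 0.2 (not over the threshold), while a gcBefore of 201 gives 0.3 and would trigger a single-sstable compaction.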
