[ https://issues.apache.org/jira/browse/CASSANDRA-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784300#comment-13784300 ]
Tyler Hobbs commented on CASSANDRA-6109:
----------------------------------------

I think I have some clearer ideas about how to do this now. We should be able to combine hotness and overlap concerns at the different levels.

At level (1), avoid compacting comparatively cold data by dropping sstables from buckets when their hotness is less than, say, 25% of the bucket average (this avoids the low-variance problem of using the stddev). If the bucket falls below the min compaction threshold, ignore it (to make sure we're compacting enough sstables at once).

At level (2), submit the hottest bucket to the executor for compaction.

The average number of sstables hit per-read is actually a decent measure for prioritizing compactions at the executor level. At level (3), we can combine that with the bucket hotness to get a rough idea of how many individual sstable reads per second we could save by compacting a given bucket (hotness * avg_sstables_per_read). Prioritize compaction tasks in the queue based on this measure.

That should give us a nice balance of not compacting cold data and prioritizing compaction of the most read and most fragmented sstables.

> Consider coldness in STCS compaction
> ------------------------------------
>
>                 Key: CASSANDRA-6109
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6109
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Tyler Hobbs
>             Fix For: 2.0.2
>
>
> I see two options:
> # Don't compact cold sstables at all
> # Compact cold sstables only if there is nothing more important to compact
> The latter is better if you have cold data that may become hot again... but
> it's confusing if you have a workload such that you can't keep up with *all*
> compaction, but you can keep up with hot sstable. (Compaction backlog stat
> becomes useless since we fall increasingly behind.)

--
This message was sent by Atlassian JIRA
(v6.1#6144)
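The three levels proposed in the comment can be sketched roughly as follows. This is only an illustration of the idea, not Cassandra code: the `SSTable` class and the function names are hypothetical, hotness is assumed to be a reads-per-second figure, bucket hotness is assumed to be the sum of its sstables' hotness, and the threshold of 4 is assumed as the min compaction threshold.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SSTable:
    # Hypothetical stand-in for an sstable; not a Cassandra class.
    name: str
    hotness: float  # assumed unit: reads per second


COLD_FRACTION = 0.25          # "say, 25% of the bucket average"
MIN_COMPACTION_THRESHOLD = 4  # assumed value for the min compaction threshold


def trim_cold(bucket: List[SSTable],
              cold_fraction: float = COLD_FRACTION,
              min_threshold: int = MIN_COMPACTION_THRESHOLD) -> List[SSTable]:
    """Level 1: drop sstables colder than cold_fraction of the bucket average.

    Returns an empty list if the trimmed bucket falls below min_threshold,
    meaning the bucket should be ignored.
    """
    if not bucket:
        return []
    average = sum(s.hotness for s in bucket) / len(bucket)
    kept = [s for s in bucket if s.hotness >= cold_fraction * average]
    return kept if len(kept) >= min_threshold else []


def bucket_hotness(bucket: List[SSTable]) -> float:
    # Assumption: a bucket's hotness is the sum of its sstables' hotness.
    return sum(s.hotness for s in bucket)


def hottest_bucket(buckets: List[List[SSTable]]) -> Optional[List[SSTable]]:
    """Level 2: pick the hottest bucket that survives trimming, if any."""
    trimmed = [trim_cold(b) for b in buckets]
    candidates = [b for b in trimmed if b]
    return max(candidates, key=bucket_hotness, default=None)


def task_priority(bucket: List[SSTable], avg_sstables_per_read: float) -> float:
    """Level 3: rough estimate of sstable reads per second saved by compacting."""
    return bucket_hotness(bucket) * avg_sstables_per_read
```

With these assumptions, a bucket with hotness values 100, 90, 110, 100 and 5 loses only the last sstable (5 is below 25% of the average of 81), and the remaining four still meet the threshold. The cutoff is relative to each bucket's own average, so a uniformly cold bucket is not trimmed at level (1); it simply loses out at levels (2) and (3).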