[ 
https://issues.apache.org/jira/browse/CASSANDRA-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784300#comment-13784300
 ] 

Tyler Hobbs commented on CASSANDRA-6109:
----------------------------------------

I think I have some clearer ideas about how to do this now.  We should be able 
to combine hotness and overlap concerns at the different levels.

At level (1), avoid compacting comparatively cold data by dropping sstables 
from buckets when their hotness is less than, say, 25% of the bucket average 
(this avoids the low-variance problem of using the stddev).  If the bucket 
falls below the min compaction threshold, ignore it (to make sure we're 
compacting enough sstables at once).
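
A minimal sketch of the level (1) filter described above, in Python rather than Cassandra's Java. The `SSTable` class, the `trim_cold_sstables` name, and the `hotness` field are hypothetical, and the defaults (a 25% cutoff, a min threshold of 4) are illustrative assumptions. The average is computed once, before anything is dropped; the comment does not say whether it should be recomputed.

```python
from dataclasses import dataclass


@dataclass
class SSTable:
    # Hypothetical stand-in for an sstable; "hotness" is assumed to be
    # some read-rate measure, which the comment does not define.
    name: str
    hotness: float


def trim_cold_sstables(bucket, cold_fraction=0.25, min_threshold=4):
    """Drop comparatively cold sstables from a bucket.

    Returns the trimmed bucket, or an empty list if too few sstables
    remain to be worth compacting.
    """
    if not bucket:
        return []
    average = sum(s.hotness for s in bucket) / len(bucket)
    cutoff = cold_fraction * average
    # Keep only sstables at or above the cutoff (25% of the bucket average).
    kept = [s for s in bucket if s.hotness >= cutoff]
    # Ignore the bucket entirely if it fell below the min compaction threshold.
    return kept if len(kept) >= min_threshold else []
```

Comparing against a fraction of the mean, rather than the stddev, means a bucket of uniformly warm sstables loses nothing, while a single cold outlier is still dropped.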

At level (2), submit the hottest bucket to the executor for compaction.
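
As a rough illustration of level (2), the sketch below picks the hottest bucket, assuming a bucket's hotness is the sum of its members' hotness values. That definition is an assumption, and `hottest_bucket` is a hypothetical name; each bucket is represented simply as a list of per-sstable hotness numbers.

```python
def hottest_bucket(buckets):
    """Return the bucket with the largest total hotness, or None if empty.

    Each bucket is a list of per-sstable hotness values (assumed
    representation); bucket hotness is taken to be their sum.
    """
    return max(buckets, key=sum, default=None)
```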

The average number of sstables hit per read is actually a decent measure for 
prioritizing compactions at the executor level.  At level (3), we can combine 
that with the bucket hotness to get a rough estimate of how many individual 
sstable reads per second we could save by compacting a given bucket (hotness * 
avg_sstables_per_read).  Compaction tasks in the queue would then be 
prioritized by this measure, highest first.
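
The level (3) ordering could be sketched as a priority queue keyed on hotness * avg_sstables_per_read. This is an illustration under stated assumptions, not the executor's actual design: `CompactionQueue`, `submit`, and `next_task` are hypothetical names, and tasks are plain strings.

```python
import heapq
import itertools


class CompactionQueue:
    """Hypothetical queue that hands out the highest-scoring task first."""

    def __init__(self):
        self._heap = []
        # Tie-breaker so tasks with equal scores come out in submission order.
        self._counter = itertools.count()

    def submit(self, task_name, bucket_hotness, avg_sstables_per_read):
        # Rough estimate of sstable reads per second saved by compacting.
        score = bucket_hotness * avg_sstables_per_read
        # heapq is a min-heap, so negate the score to pop the largest first.
        heapq.heappush(self._heap, (-score, next(self._counter), task_name))
        return score

    def next_task(self):
        if not self._heap:
            return None
        _, _, task_name = heapq.heappop(self._heap)
        return task_name
```

With this scoring, a moderately hot bucket in a badly fragmented table can outrank a hotter bucket in a table where reads already touch few sstables.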

That should give us a nice balance between not compacting cold data and 
prioritizing compaction of the most-read and most-fragmented sstables.

> Consider coldness in STCS compaction
> ------------------------------------
>
>                 Key: CASSANDRA-6109
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6109
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Tyler Hobbs
>             Fix For: 2.0.2
>
>
> I see two options:
> # Don't compact cold sstables at all
> # Compact cold sstables only if there is nothing more important to compact
> The latter is better if you have cold data that may become hot again...  but 
> it's confusing if you have a workload such that you can't keep up with *all* 
> compaction, but you can keep up with hot sstables.  (Compaction backlog stat 
> becomes useless since we fall increasingly behind.)



--
This message was sent by Atlassian JIRA
(v6.1#6144)
