[ https://issues.apache.org/jira/browse/CASSANDRA-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13790622#comment-13790622 ]

Tyler Hobbs commented on CASSANDRA-6109:
----------------------------------------

bq. Suppose for instance that I have 11 sstables, one of which has 10M reads 
recently and 10 of which have 1M reads. If I set my threshold to 25% then 
nothing gets compacted which is probably not what we want, since the 10 "cold" 
sstables collectively represent 50% of the read activity.

Actually, in this case none of the sstables would be considered cold (assuming 
they all have similar key estimates).  The mean reads would be about 1.8M 
(20M total / 11 sstables), and 0.25 * 1.8M = 0.45M, so even the 1M-read 
sstables are comfortably above the threshold.
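A minimal Python sketch of the mean-based check in this example. It assumes that "hotness" means reads per estimated key and that an sstable counts as cold when its hotness is below `threshold * mean` across the sstables being compared. The helper `cold_sstables` is hypothetical, not Cassandra's code, and the 1,000-key estimate per sstable is made up to satisfy the "similar key estimates" assumption.

```python
from statistics import mean


def cold_sstables(reads, keys, threshold=0.25):
    """Return indexes of sstables whose hotness (reads per estimated key)
    is below threshold * mean hotness of all sstables considered.

    Hypothetical helper for illustration; not Cassandra's implementation.
    """
    hotness = [r / k for r, k in zip(reads, keys)]
    cutoff = threshold * mean(hotness)
    return [i for i, h in enumerate(hotness) if h < cutoff]


# The example above: one sstable with 10M reads, ten with 1M reads,
# all with the same (assumed) key estimate.
reads = [10_000_000] + [1_000_000] * 10
keys = [1_000] * 11
print(cold_sstables(reads, keys))  # []
```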

I agree that it might be difficult to tune intelligently, though.

bq. analyze hotness globally (per-CF) rather than per-bucket

That seems reasonable to me.
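To illustrate the per-CF versus per-bucket difference, here is a hypothetical sketch using an assumed rule (cold means hotness below `threshold * mean`). Each sstable is modeled as a `(reads, estimated_keys)` tuple; the function names and the data are illustrative only.

```python
from statistics import mean


def hotness(sstable):
    reads, keys = sstable
    return reads / keys


def cold_per_bucket(buckets, threshold=0.25):
    """Cold sstables when the mean is computed separately inside each bucket."""
    cold = []
    for bucket in buckets:
        cutoff = threshold * mean(hotness(s) for s in bucket)
        cold.extend(s for s in bucket if hotness(s) < cutoff)
    return cold


def cold_global(buckets, threshold=0.25):
    """Cold sstables when the mean is computed over every sstable in the CF."""
    everything = [s for bucket in buckets for s in bucket]
    cutoff = threshold * mean(hotness(s) for s in everything)
    return [s for s in everything if hotness(s) < cutoff]


# Hypothetical CF: one bucket of hot sstables, one bucket of uniformly cold ones.
buckets = [
    [(10_000_000, 1_000), (8_000_000, 1_000)],
    [(10_000, 1_000), (12_000, 1_000)],
]
print(cold_per_bucket(buckets))  # []
print(cold_global(buckets))      # [(10000, 1000), (12000, 1000)]
```

Measured only against its own bucket, a bucket of uniformly cold sstables never looks cold; comparing against the whole column family flags it.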

bq. configure the threshold based on hotness percentile (compact me if I am 
hotter than N% of my peers)

This has the problem of always treating the coldest sstable as cold, even when 
there is little variation between them.  So if you have four sstables with 1M, 
1M, 1M, and 0.999M reads, the last will be considered cold and never compacted.
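A hypothetical sketch of the percentile rule, contrasted with a mean-based rule, on the four-sstable example. "Hotter than N% of my peers" is interpreted here (an assumption) as: at least N% of the other sstables have read counts less than or equal to this one; otherwise it is cold. Both function names are illustrative only.

```python
def cold_by_percentile(reads, pct=0.25):
    """Cold unless at least `pct` of the other sstables have reads <= mine."""
    if len(reads) < 2:
        return []  # no peers to compare against
    cold = []
    for i, r in enumerate(reads):
        peers = reads[:i] + reads[i + 1:]
        share = sum(p <= r for p in peers) / len(peers)
        if share < pct:
            cold.append(i)
    return cold


def cold_by_mean(reads, threshold=0.25):
    """Cold when below threshold * mean reads (similar key estimates assumed)."""
    cutoff = threshold * sum(reads) / len(reads)
    return [i for i, r in enumerate(reads) if r < cutoff]


reads = [1_000_000, 1_000_000, 1_000_000, 999_000]
print(cold_by_percentile(reads))  # [3]
print(cold_by_mean(reads))        # []
```

For any `pct` above zero, a uniquely coldest sstable has no peers at or below it, so its share is 0 and it is always flagged, however small the gap.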

> Consider coldness in STCS compaction
> ------------------------------------
>
>                 Key: CASSANDRA-6109
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6109
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Tyler Hobbs
>             Fix For: 2.0.2
>
>         Attachments: 6109-v1.patch, 6109-v2.patch
>
>
> I see two options:
> # Don't compact cold sstables at all
> # Compact cold sstables only if there is nothing more important to compact
> The latter is better if you have cold data that may become hot again...  but 
> it's confusing if you have a workload such that you can't keep up with *all* 
> compaction, but you can keep up with hot sstables.  (Compaction backlog stat 
> becomes useless since we fall increasingly behind.)
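The two options can be sketched as a bucket-selection function. This is hypothetical Python, not Cassandra's STCS code: sstables are `(name, reads)` tuples, `is_cold` is any coldness predicate, `min_threshold` stands in for a minimum bucket size, and the sketch simply picks the largest eligible bucket, which may not match how Cassandra actually ranks buckets.

```python
def choose_bucket(buckets, is_cold, min_threshold=4, compact_cold=True):
    """Pick one bucket of sstables to compact (hypothetical sketch).

    compact_cold=False -> option 1: cold sstables are never compacted.
    compact_cold=True  -> option 2: cold sstables are compacted only when
                          no bucket of hot sstables is eligible.
    """
    # Drop cold sstables from every bucket and keep buckets still big enough.
    hot_buckets = [[s for s in b if not is_cold(s)] for b in buckets]
    eligible_hot = [b for b in hot_buckets if len(b) >= min_threshold]
    if eligible_hot:
        return max(eligible_hot, key=len)
    if compact_cold:
        # Nothing hot to do: fall back to buckets that include cold sstables.
        eligible_any = [b for b in buckets if len(b) >= min_threshold]
        if eligible_any:
            return max(eligible_any, key=len)
    return []


def is_cold(sstable):
    # Hypothetical predicate: fewer than 100 reads counts as cold.
    return sstable[1] < 100


mixed = [[("a", 500), ("b", 400), ("c", 5)], [("d", 1), ("e", 2)]]
all_cold = [[("d", 1), ("e", 2)]]

print(choose_bucket(mixed, is_cold, min_threshold=2))
# [('a', 500), ('b', 400)]
print(choose_bucket(all_cold, is_cold, min_threshold=2))
# [('d', 1), ('e', 2)]
print(choose_bucket(all_cold, is_cold, min_threshold=2, compact_cold=False))
# []
```

With `compact_cold=True`, a workload that keeps producing eligible hot buckets never reaches the cold fallback, so the cold work stays pending indefinitely, which is the backlog concern the description raises.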



--
This message was sent by Atlassian JIRA
(v6.1#6144)
