[ 
https://issues.apache.org/jira/browse/CASSANDRA-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13790610#comment-13790610
 ] 

Jonathan Ellis commented on CASSANDRA-6109:
-------------------------------------------

I'm thinking about how I tune this as an operator.  If we're going by 
coldness-relative-to-mean, I'm not really sure where to set that to achieve my 
read performance goals other than trial and error.

Suppose for instance that I have 11 sstables, one of which has 10M reads 
recently and 10 of which have 1M reads.  If I set my threshold to 25% then 
nothing gets compacted which is probably not what we want, since the 10 "cold" 
sstables collectively represent 50% of the read activity.
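
The arithmetic behind this example can be checked with a short sketch. The variable names are illustrative only, and the sketch does not model how the patch compares an sstable against the threshold; it only reproduces the numbers quoted above.

```python
# Read counts from the example: one hot sstable and ten colder ones.
reads = [10_000_000] + [1_000_000] * 10

total = sum(reads)                    # all recent reads across the 11 sstables
mean = total / len(reads)             # mean reads per sstable
cold_share = sum(reads[1:]) / total   # fraction of reads served by the ten colder sstables
cold_vs_mean = reads[1] / mean        # one colder sstable relative to the mean

print(f"mean reads per sstable: {mean:,.0f}")
print(f"share of reads on the ten colder sstables: {cold_share:.0%}")
print(f"one colder sstable relative to the mean: {cold_vs_mean:.0%}")
```

Each of the ten colder sstables sits at roughly 55% of the mean, yet together they serve half of the reads, which is why a relative-to-mean threshold is hard to reason about here.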

What if instead we:
# analyze hotness globally (per-CF) rather than per-bucket, and
# configure the threshold based on hotness percentile (compact me if I am 
hotter than N% of my peers)?
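
A minimal sketch of what a percentile-style rule could look like, assuming read counts are available per sstable for the whole column family. The function names, the dictionary layout, and the tie handling (equal read counts do not count as "hotter") are assumptions made for illustration, not anything taken from the attached patches.

```python
def hotter_than_percent(reads, name):
    """Percentage of peers (all other sstables) with strictly fewer reads."""
    peers = [r for n, r in reads.items() if n != name]
    if not peers:
        return 0.0
    colder = sum(1 for r in peers if r < reads[name])
    return 100.0 * colder / len(peers)


def candidates(reads, n_percent):
    """Hypothetical rule: keep sstables hotter than at least n_percent of their peers."""
    return [n for n in reads if hotter_than_percent(reads, n) >= n_percent]


# The 11-sstable example from above, ranked across the whole column family.
reads = {"hot": 10_000_000}
reads.update({f"cold-{i}": 1_000_000 for i in range(1, 11)})

selected = candidates(reads, 50)
print(selected)
```

With strict comparison the ten equally read sstables all rank at 0%, so only the hot one passes a 50% setting; a real implementation would have to decide how to treat ties.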

> Consider coldness in STCS compaction
> ------------------------------------
>
>                 Key: CASSANDRA-6109
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6109
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Tyler Hobbs
>             Fix For: 2.0.2
>
>         Attachments: 6109-v1.patch, 6109-v2.patch
>
>
> I see two options:
> # Don't compact cold sstables at all
> # Compact cold sstables only if there is nothing more important to compact
> The latter is better if you have cold data that may become hot again...  but 
> it's confusing if you have a workload such that you can't keep up with *all* 
> compaction, but you can keep up with hot sstables.  (Compaction backlog stat 
> becomes useless since we fall increasingly behind.)



--
This message was sent by Atlassian JIRA
(v6.1#6144)
