[ https://issues.apache.org/jira/browse/CASSANDRA-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13790610#comment-13790610 ]
Jonathan Ellis commented on CASSANDRA-6109:
-------------------------------------------

I'm thinking about how I tune this as an operator. If we're going by coldness-relative-to-mean, I'm not really sure where to set that to achieve my read performance goals other than trial and error.

Suppose for instance that I have 11 sstables, one of which has 10M reads recently and 10 of which have 1M reads. If I set my threshold to 25% then nothing gets compacted, which is probably not what we want, since the 10 "cold" sstables collectively represent 50% of the read activity.

What if instead we
# analyze hotness globally (per-CF) rather than per-bucket, and
# configure the threshold based on hotness percentile (compact me if I am hotter than N% of my peers)

> Consider coldness in STCS compaction
> ------------------------------------
>
>                 Key: CASSANDRA-6109
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6109
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Tyler Hobbs
>             Fix For: 2.0.2
>
>         Attachments: 6109-v1.patch, 6109-v2.patch
>
>
> I see two options:
> # Don't compact cold sstables at all
> # Compact cold sstables only if there is nothing more important to compact
> The latter is better if you have cold data that may become hot again... but
> it's confusing if you have a workload such that you can't keep up with *all*
> compaction, but you can keep up with hot sstables. (Compaction backlog stat
> becomes useless since we fall increasingly behind.)

--
This message was sent by Atlassian JIRA
(v6.1#6144)
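
The 11-sstable example in the comment can be worked through numerically. The following is a minimal Python sketch, not code from either attached patch: the function names `relative_to_mean` and `percentile_rank` are hypothetical, and "hotter than N% of my peers" is assumed to mean the fraction of other sstables with strictly fewer reads. How a given threshold then maps to a compaction decision depends on the patch's semantics, which the comment does not spell out, so the sketch stops at the raw figures.

```python
# Read counts from the example in the comment: one hot sstable, ten cold ones.
reads = [10_000_000] + [1_000_000] * 10

total = sum(reads)
mean = total / len(reads)

# Share of all read activity served by the ten "cold" sstables.
cold_share = sum(reads[1:]) / total


def relative_to_mean(r):
    """Hotness of one sstable as a multiple of the mean read count.

    Hypothetical metric illustrating "coldness-relative-to-mean".
    """
    return r / mean


def percentile_rank(r, population):
    """Fraction of peers (all other sstables) with strictly fewer reads.

    Hypothetical metric illustrating "hotter than N% of my peers".
    Assumes the population holds at least two sstables.
    """
    peers = len(population) - 1
    colder = sum(1 for other in population if other < r)
    return colder / peers


if __name__ == "__main__":
    print(f"total reads:         {total}")
    print(f"cold share of reads: {cold_share:.0%}")
    for label, r in (("hot", reads[0]), ("cold", reads[1])):
        print(
            f"{label:>4}: {relative_to_mean(r):.2f}x mean, "
            f"hotter than {percentile_rank(r, reads):.0%} of peers"
        )
```

Under these assumptions the two metrics behave differently on the same data: the mean is pulled up by the single hot sstable, so each cold sstable sits at roughly half the mean even though the ten together serve half the reads, while the percentile rank depends only on ordering and is unaffected by how extreme the hot sstable is.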