[ 
https://issues.apache.org/jira/browse/CASSANDRA-6109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13783326#comment-13783326
 ] 

Tyler Hobbs commented on CASSANDRA-6109:
----------------------------------------

bq. The latter is better if you have cold data that may become hot again... but 
it's confusing if you have a workload such that you can't keep up with all 
compaction, but you can keep up with hot sstable. (Compaction backlog stat 
becomes useless since we fall increasingly behind.)

The pending compactions stat is already pretty wonky, so I'm not sure we should 
prioritize keeping that sane.

Option 1 (don't compact cold sstables) seems dangerous as a first step compared 
to option 2, especially because it's hard to decide what is "cold".  
Prioritizing compaction of hotter sstables seems like the better first step.

When comparing hotness of sstables, I think a good measure is 
{{avg_reads_per_sec / number_of_keys}} rather than just {{avg_reads_per_sec}} 
so that large sstables aren't over-weighted.  When I mention the hotness of a 
bucket of sstables below, I'm talking about the sum of the hotness measure 
across the individual sstables.
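
As a rough illustration of the measure described above, here is a minimal Python sketch. The {{SSTableStats}} class and its field names are hypothetical stand-ins, not Cassandra's actual classes, and treating an empty sstable as having zero hotness is an assumption made only to avoid dividing by zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SSTableStats:
    """Hypothetical per-sstable statistics (not a real Cassandra class)."""
    name: str
    avg_reads_per_sec: float
    number_of_keys: int


def hotness(sstable):
    """Reads per second per key, so large sstables are not over-weighted."""
    if sstable.number_of_keys == 0:
        return 0.0  # assumption: an empty sstable counts as cold
    return sstable.avg_reads_per_sec / sstable.number_of_keys


def bucket_hotness(bucket):
    """Hotness of a bucket: the sum of the hotness of its members."""
    return sum(hotness(s) for s in bucket)


# A large sstable with many reads vs. a small one with fewer reads.
big = SSTableStats("big", avg_reads_per_sec=1000.0, number_of_keys=1_000_000)
small = SSTableStats("small", avg_reads_per_sec=100.0, number_of_keys=10_000)
```

Ranked by raw {{avg_reads_per_sec}}, {{big}} would look ten times hotter than {{small}}; ranked by reads per key, {{small}} comes out ahead.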

For prioritizing compaction of hotter sstables, it seems like there are a few 
levels this can operate at:
# Picking sstable members for compaction buckets
# Picking the most "interesting" bucket to submit to the compaction executor 
(currently the smallest sstables are considered the most interesting)
# At the compaction executor level, prioritizing tasks in the queue (the queue 
is not currently prioritized)

(1) seems like the most difficult point to make good decisions at.  I can 
imagine a scheme like dropping members whose hotness is more than 
{{2 * stdev}} below the mean hotness for the bucket working decently, but some 
of the efficiency of compacting many sstables at once is lost, and some of the 
drops would be poor when there is little variance among the sstables.
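
The dropping scheme for (1) could be sketched roughly as follows. This is only an illustration: the function name, the dict-of-hotness input, and the use of the population standard deviation are all assumptions, not anything from the Cassandra code base.

```python
import statistics


def drop_cold_members(hotness_by_sstable, num_stdevs=2.0):
    """Keep only members whose hotness is not more than num_stdevs
    standard deviations below the bucket's mean hotness.

    hotness_by_sstable maps a (hypothetical) sstable name to its hotness.
    """
    values = list(hotness_by_sstable.values())
    if len(values) < 2:
        return dict(hotness_by_sstable)
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # assumption: population stdev
    threshold = mean - num_stdevs * stdev
    return {name: h for name, h in hotness_by_sstable.items() if h >= threshold}


# Seven equally hot sstables plus one that is never read.
clear_outlier = {name: 1.0 for name in "abcdefg"}
clear_outlier["h"] = 0.0

# Same shape, but the "cold" member is only 1% colder than the rest.
little_variance = {name: 1.0 for name in "abcdefg"}
little_variance["h"] = 0.99

kept_outlier_case = drop_cold_members(clear_outlier)
kept_little_variance_case = drop_cold_members(little_variance)
```

Both calls drop {{h}}: the cutoff is relative to the spread, so a member that is only marginally colder is still dropped when variance is small, which is the poor drop mentioned above. Also, with the population standard deviation no member of an n-member bucket can lie more than {{sqrt(n - 1)}} standard deviations from the mean, so this particular sketch never drops anything from a bucket of four sstables.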

(2) would probably work well by itself, although, as discussed below, sstable 
overlap is a better measure than hotness for this.

(3) depends on (2) in order to be somewhat fair.  Each table submits its hottest buckets 
for compaction, and the executor prioritizes the hottest buckets in the queue 
(regardless of which table they came from).  There is a potential for 
starvation among colder tables when compaction falls behind, but that may be 
mitigated by a few things:
* If the compaction of the hotter sstables is very effective at merging rows, 
the hotness of future buckets for that table should be lower.  Since the 
hotness of a bucket is the sum of its members, if four totally overlapping 
sstables are merged into one sstable, the hotness of the new sstable should be 
1/4 of the hotness of the previous bucket.  I'll point out that tracking how 
much overlap there is among sstables would be a much better measure than 
hotness for picking which compactions to prioritize; in the worst case here (no 
overlap), the hotness of the newly compacted sstable could be the same as the 
bucket it came from.
* If we were willing to discard cold items in the queue when hotter items came 
in and the queue was full, colder tables would eventually submit new tasks with 
more sstables in them (thus having greater hotness).
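
To make (3) and the discard idea concrete, here is a small Python sketch of a bounded queue that hands out the hottest bucket first and discards the coldest queued task when a hotter one arrives and the queue is full. {{CompactionTask}} and {{HotnessQueue}} are hypothetical names, and the linear scans are chosen for readability rather than efficiency; none of this reflects the actual compaction executor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompactionTask:
    """Hypothetical queued task: one bucket submitted by one table."""
    table: str
    bucket_hotness: float


class HotnessQueue:
    """Bounded queue that serves the hottest task first."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._tasks = []

    def submit(self, task):
        """Queue a task; return whichever task was discarded, or None."""
        if len(self._tasks) < self.capacity:
            self._tasks.append(task)
            return None
        coldest = min(self._tasks, key=lambda t: t.bucket_hotness)
        if task.bucket_hotness <= coldest.bucket_hotness:
            return task  # queue is full and the new task is not hotter
        self._tasks.remove(coldest)
        self._tasks.append(task)
        return coldest

    def take_hottest(self):
        """Remove and return the hottest task, or None if the queue is empty."""
        if not self._tasks:
            return None
        hottest = max(self._tasks, key=lambda t: t.bucket_hotness)
        self._tasks.remove(hottest)
        return hottest


queue = HotnessQueue(capacity=2)
queue.submit(CompactionTask("users", 0.5))
queue.submit(CompactionTask("logs", 0.1))
discarded = queue.submit(CompactionTask("events", 0.9))  # evicts the "logs" task
```

A table whose task is discarded would later resubmit a bucket that has accumulated more sstables, and therefore more summed hotness, which is the mitigation described in the last bullet.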

While I'm thinking about it, do we have any tickets or features in place to 
track sstable overlap (beyond average number of sstables hit per read at the 
table level)?

> Consider coldness in STCS compaction
> ------------------------------------
>
>                 Key: CASSANDRA-6109
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6109
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Tyler Hobbs
>             Fix For: 2.0.2
>
>
> I see two options:
> # Don't compact cold sstables at all
> # Compact cold sstables only if there is nothing more important to compact
> The latter is better if you have cold data that may become hot again...  but 
> it's confusing if you have a workload such that you can't keep up with *all* 
> compaction, but you can keep up with hot sstable.  (Compaction backlog stat 
> becomes useless since we fall increasingly behind.)


