[ 
https://issues.apache.org/jira/browse/CASSANDRA-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-3854:
--------------------------------------

             Priority: Minor  (was: Major)
    Affects Version/s:     (was: 1.1)
                       1.0.0
    
> Lower levels should have higher priority in LeveledCompactionStrategy
> ---------------------------------------------------------------------
>
>                 Key: CASSANDRA-3854
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3854
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.0.0
>            Reporter: Stu Hood
>            Priority: Minor
>              Labels: compaction, leveled
>
> In LeveledCompactionStrategy, there is a comment explaining why compactions 
> prioritize the top levels:
> bq. So instead, we force compacting higher levels first.  This may not 
> minimize the number of reads done as quickly in the short term, but it 
> minimizes the i/o needed to compact optimially which gives us a long term win.
> The argument is that compacting the lower levels first causes data to be 
> re-compacted more frequently.
> But the result of compacting the top levels first is that each compaction 
> pass does less total work, because the number of overlapping files in a 
> particular level is limited to the number of files that can be generated by a 
> single compaction pass.
> Consider the overload example from that comment:
> {quote}
> L0: 988 [ideal: 4]
> L1: 117 [ideal: 10]
> L2: 12  [ideal: 100]
> {quote}
> Assuming that there are initially no overlapping files in L2 or L1, the 
> current implementation will:
> # Compact 32 files from L0 into L1 (which will cause 32 L0 files and 117 L1 
> files to be compacted together)
> # Possibly compact L1 and L2 in order to remove overlapping files due to the 
> L0 compaction
> # Repeat
> The problem with this strategy is that only every 3rd compaction will be 
> working to drain L0. Additionally, each compaction that occurs in L1 and L2 
> will be doing the minimum possible work that can trigger a compaction (the 
> equivalent of SizeTieredStrategy triggering the min threshold).
> ----
> If we agree that the goal is to ensure that L0 is drained (since a 
> proliferation of tiny overlapping files means worst case read behavior), then 
> a better strategy would be to start from the lower levels, and to change 
> compaction of L0 to not include L1. With this strategy, the steps would be:
> # Compact 32 files from L0 (causing them to be partitioned and placed in L1)
> # Repeat
> # Compact ?? files from LN into LN+1 (existing strategy)
> # Repeat
> With this approach, L0 is quickly partitioned, considerably shortening the 
> time during which 988 files need to be checked. Additionally, because each 
> level is completely drained before moving to the next, compactions occurring 
> within a level are triggering the equivalent of SizeTieredStrategy's max 
> threshold (aka, always hitting a cap of 32 involved files.)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to