[ 
https://issues.apache.org/jira/browse/CASSANDRA-7272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307969#comment-14307969
 ] 

Björn Hegerfors commented on CASSANDRA-7272:
--------------------------------------------

I don't understand why major compaction for STCS isn't already optimal. I do 
see why one might want to compact some but not all SSTables in a 
multi-tombstone compaction (CASSANDRA-7019) (though DTCS should be a better fit 
for anyone wanting this). But if every single SSTable is being rewritten to 
disk, why not write them into one file? As far as I understand, the ultimate 
goal of STCS is to end up as a single SSTable. STCS only gets there naturally 
once in a blue moon, but that is the optimal state for it to be in. Am I wrong?

The only explanation I can see for splitting the result of compacting all 
SSTables into fragments is if both of the following hold:
1. The fragments are partitioned smartly, for example into separate token 
ranges (à la LCS), timestamp ranges (à la DTCS) or clustering column ranges 
(which would be interesting), or a combination of these.
2. The structure upheld by the resulting fragments is not subsequently 
demolished by the running compaction strategy going about its usual business.
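
As a rough illustration of point 1, here is a minimal Python sketch of 
token-range partitioning. It is not Cassandra code: `split_by_token_range`, 
the `boundaries` list and the (token, value) row shape are hypothetical, and 
the input is assumed to be the already merged, token-sorted output of a major 
compaction.

```python
from bisect import bisect_right


def split_by_token_range(rows, boundaries):
    """Split merged rows into non-overlapping token-range fragments.

    rows: iterable of (token, value) pairs, assumed already merged and
          sorted by token (hypothetical stand-in for compaction output).
    boundaries: sorted list of split points (hypothetical). Fragment i
          holds tokens t with boundaries[i-1] <= t < boundaries[i].
    """
    fragments = [[] for _ in range(len(boundaries) + 1)]
    for token, value in rows:
        # bisect_right counts the boundaries <= token, which is exactly
        # the index of the fragment whose range contains the token.
        fragments[bisect_right(boundaries, token)].append((token, value))
    return fragments


merged = [(10, "a"), (100, "b"), (150, "c"), (200, "d"), (999, "e")]
fragments = split_by_token_range(merged, [100, 200])
```

Because every token falls into exactly one range, no two fragments overlap, 
so a read for a given token only needs to consult one of them. Whether that 
layout survives depends on point 2.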

> Add "Major" Compaction to LCS 
> ------------------------------
>
>                 Key: CASSANDRA-7272
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7272
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: T Jake Luciani
>            Assignee: Marcus Eriksson
>            Priority: Minor
>              Labels: compaction
>             Fix For: 3.0
>
>
> LCS has a number of minor issues (maybe major depending on your perspective).
> LCS is primarily used for wide rows, so when you repair data in LCS, for 
> instance, you end up with a copy of an entire repaired row in L0.  Over 
> time, if you repair repeatedly, you end up with multiple copies of a row in 
> L0 - L5.  This can make disk usage hard to predict.
> Another issue is cleaning up tombstoned data.  If a tombstone lives in level 
> 1 and data for the cell lives in level 5, the data will not be reclaimed from 
> disk until the tombstone reaches level 5.
> I propose we add a "major" compaction for LCS that forces consolidation of 
> data to level 5 to address these.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
