[ 
https://issues.apache.org/jira/browse/CASSANDRA-8360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-8360:
--------------------------------------
    Reviewer: Marcus Eriksson

> In DTCS, always compact SSTables in the same time window, even if they are 
> fewer than min_threshold
> ---------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8360
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8360
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Björn Hegerfors
>            Assignee: Björn Hegerfors
>            Priority: Minor
>         Attachments: cassandra-2.0-CASSANDRA-8360.txt
>
>
> DTCS uses min_threshold to decide how many time windows of the same size 
> must accumulate before they are merged into a larger window. The age of an 
> SSTable is defined by its min timestamp, so each SSTable always falls into 
> exactly one time window. If multiple SSTables fall into the same window, 
> DTCS considers compacting them, but if there are fewer than min_threshold 
> of them, it decides not to.
> You might ask when more than 1 but fewer than min_threshold SSTables end up 
> in the same time window (other than the current window). In the current 
> state, DTCS can spill some extra SSTables into bigger windows when the 
> previous window wasn't fully compacted, which happens all the time when the 
> latest window stops being the current one. Also, repairs and hints can put 
> new SSTables in old windows.
> I think, and [~jjordan] agreed in a comment on CASSANDRA-6602, that DTCS 
> should ignore min_threshold and compact SSTables in the same window 
> regardless of how few there are. I guess max_threshold should still be 
> respected.
> [~jjordan] suggested that this should apply to all windows but the current 
> window, where all the new SSTables end up. That could make sense. It's not 
> clear to me whether compacting many SSTables at once is more cost-efficient 
> when it comes to the very newest and smallest SSTables. Maybe compacting as 
> soon as 2 SSTables are seen is fine if the initial window size is small 
> enough? The opposite could be the case too: that the very newest SSTables 
> should be compacted many at a time.
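
For illustration only, the rule discussed in the description could be sketched roughly as below. This is not the attached patch and not Cassandra code: it assumes fixed-size windows (real DTCS windows grow in size as they merge), and the names group_by_window, compaction_candidates and ignore_min_for_old are hypothetical. The defaults of 4 and 32 for the thresholds are also just assumed values for the example.

```python
from collections import defaultdict


def group_by_window(sstables, window_size):
    """Group (name, min_timestamp) pairs by the window their min timestamp falls in.

    Simplification: all windows have the same fixed size.
    """
    windows = defaultdict(list)
    for name, min_ts in sstables:
        windows[min_ts // window_size].append(name)
    return dict(windows)


def compaction_candidates(windows, current_window,
                          min_threshold=4, max_threshold=32,
                          ignore_min_for_old=True):
    """Return one list of SSTable names per window that qualifies for compaction.

    ignore_min_for_old=False models the behaviour described in the issue:
    every window needs at least min_threshold SSTables.
    ignore_min_for_old=True models the proposal: any window other than the
    current one is compacted as soon as it holds 2 SSTables.
    max_threshold caps the number of SSTables per compaction in both cases.
    """
    candidates = []
    for window, names in sorted(windows.items()):
        if window == current_window or not ignore_min_for_old:
            required = min_threshold
        else:
            required = 2
        if len(names) >= required:
            candidates.append(names[:max_threshold])
    return candidates


if __name__ == "__main__":
    # Hypothetical SSTables: (name, min timestamp)
    sstables = [("a", 5), ("b", 7), ("c", 12), ("d", 31), ("e", 33), ("f", 35)]
    windows = group_by_window(sstables, window_size=10)
    print(compaction_candidates(windows, current_window=3))
    print(compaction_candidates(windows, current_window=3,
                                ignore_min_for_old=False))
```

In this sketch the old window holding "a" and "b" is only picked up under the proposed rule, while the current window (three SSTables, below min_threshold) is left alone in both cases.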



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
