[ 
https://issues.apache.org/jira/browse/CASSANDRA-9572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antti Nissinen updated CASSANDRA-9572:
--------------------------------------
    Attachment: first_results_after_patch.txt

> DateTieredCompactionStrategy fails to combine SSTables correctly when TTL is 
> used.
> ----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-9572
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9572
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Antti Nissinen
>            Assignee: Marcus Eriksson
>              Labels: dtcs
>             Fix For: 3.x, 2.1.x, 2.0.x, 2.2.x
>
>         Attachments: cassandra_sstable_metadata_reader.py, 
> cassandra_sstable_timespan_graph.py, compaction_stage_test01_jira.log, 
> compaction_stage_test02_jira.log, datagen.py, explanation_jira.txt, 
> first_results_after_patch.txt, motivation_jira.txt, src_2.1.5_with_debug.zip
>
>
> DateTieredCompactionStrategy (DTCS) works correctly when data for a certain 
> time period is written into short SSTables in chronological order and then 
> compacted together. However, if a TTL is applied to the data columns, DTCS 
> fails to compact the files correctly in a timely manner. In our opinion the 
> problem is caused by two issues:
> A) During the DateTieredCompaction process, getFullyExpiredSSTables is 
> called twice: first from the DateTieredCompactionStrategy class and then 
> from the CompactionTask class. The first call aims to find the fully 
> expired SSTables that do not overlap with any non-fully expired SSTables. 
> That works correctly. When getFullyExpiredSSTables is called the second 
> time, from the CompactionTask class, the selection of fully expired 
> SSTables differs from the first selection.
> B) The minimum timestamp of the new SSTables created by combining a fully 
> expired SSTable with files from the most interesting bucket is not 
> correct.
> Together, these two issues cause problems for DTCS when it combines 
> SSTables that overlap in time and have a TTL on the column. This is 
> demonstrated by first generating test data without compactions and 
> showing the distribution of the files over time. When compaction is 
> enabled, DTCS combines files together, but the end result is not what 
> would be expected. This is demonstrated in the file motivation_jira.txt.
> The attachments contain the following material:
> - motivation_jira.txt: practical examples of how DTCS behaves with TTL
> - explanation_jira.txt: gives more details, explains the test cases and 
> demonstrates the problems in the compaction process
> - Log file for the compactions in the first test case 
> (compaction_stage_test01_jira.log)
> - Log file for the compactions in the second test case 
> (compaction_stage_test02_jira.log)
> - Source code zip file for version 2.1.5 with additional comment statements 
> (src_2.1.5_with_debug.zip)
> - Python script to generate test data (datagen.py)
> - Python script to read metadata from SSTables 
> (cassandra_sstable_metadata_reader.py)
> - Python script to generate a timeline representation of SSTables 
> (cassandra_sstable_timespan_graph.py)
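
Point A in the description above says that the set of fully expired SSTables
can differ between two calls. The following is a minimal sketch of how that
could happen, not Cassandra's actual code. It assumes that an SSTable counts
as fully expired when all of its TTLs ran out before gc_before and its time
span overlaps no live SSTable. The names SSTableMeta, fully_expired and
gc_before, and all of the numbers, are hypothetical and chosen only for
illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SSTableMeta:
    """Hypothetical stand-in for the metadata of one SSTable."""
    name: str
    min_timestamp: int            # oldest write timestamp in the file
    max_timestamp: int            # newest write timestamp in the file
    max_local_deletion_time: int  # time at which the last TTL in the file runs out


def overlaps(a: SSTableMeta, b: SSTableMeta) -> bool:
    """True if the write-timestamp ranges of a and b intersect."""
    return a.min_timestamp <= b.max_timestamp and b.min_timestamp <= a.max_timestamp


def fully_expired(sstables: List[SSTableMeta], gc_before: int) -> List[SSTableMeta]:
    """Return expired SSTables whose time span overlaps no live SSTable.

    Assumption for this sketch: only the SSTables passed in are considered
    when checking for overlap.
    """
    expired = [s for s in sstables if s.max_local_deletion_time < gc_before]
    live = [s for s in sstables if s.max_local_deletion_time >= gc_before]
    return [e for e in expired if not any(overlaps(e, l) for l in live)]


# Made-up example data: "mid" has expired but overlaps the live "new".
old = SSTableMeta("old", min_timestamp=0, max_timestamp=10, max_local_deletion_time=100)
mid = SSTableMeta("mid", min_timestamp=20, max_timestamp=30, max_local_deletion_time=150)
new = SSTableMeta("new", min_timestamp=25, max_timestamp=60, max_local_deletion_time=400)

# First call sees every SSTable; second call sees only a subset.
first = [s.name for s in fully_expired([old, mid, new], gc_before=200)]
second = [s.name for s in fully_expired([old, mid], gc_before=200)]
print(first, second)
```

In this toy model the first call selects only "old", because "mid" overlaps
the live "new". The second call is given a narrower set and therefore also
selects "mid". Whether this is the mechanism behind the behaviour reported
here is not established by the sketch; the details are in
explanation_jira.txt.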



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
