[ https://issues.apache.org/jira/browse/CASSANDRA-9572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583313#comment-14583313 ]
Antti Nissinen commented on CASSANDRA-9572:
-------------------------------------------

I ran the tests again and looked at the log file in detail. Now it works as expected. Thank you very much to all involved!

> DateTieredCompactionStrategy fails to combine SSTables correctly when TTL is
> used.
> ----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-9572
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9572
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Antti Nissinen
>            Assignee: Marcus Eriksson
>              Labels: dtcs
>             Fix For: 3.x, 2.1.x, 2.0.x, 2.2.x
>
>         Attachments: cassandra_sstable_metadata_reader.py,
> cassandra_sstable_timespan_graph.py, compaction_stage_test01_jira.log,
> compaction_stage_test02_jira.log, datagen.py, explanation_jira.txt,
> first_results_after_patch.txt, motivation_jira.txt, src_2.1.5_with_debug.zip
>
>
> DateTieredCompaction works correctly when data is written for a certain time
> period into SSTables that each cover a short time span and is then compacted
> together. However, if a TTL is applied to the data columns, DTCS fails to
> compact the files correctly in a timely manner. In our opinion the problem is
> caused by two issues:
> A) During the DateTieredCompaction process, getFullyExpiredSSTables is
> called twice: first from the DateTieredCompactionStrategy class and a second
> time from the CompactionTask class. The first call aims to find the fully
> expired SSTables that do not overlap with any non-fully-expired SSTables.
> That works correctly. When getFullyExpiredSSTables is called the second
> time, from the CompactionTask class, the selection of fully expired SSTables
> differs from the first selection.
> B) The minimum timestamp of the new SSTables created by combining fully
> expired SSTables with files from the most interesting bucket is not
> correct.
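To make issue A above more concrete, here is a minimal Python sketch of a "fully expired" selection. It is not the Cassandra implementation: the `SSTable` record, its field names, and the `fully_expired` function are hypothetical and only follow the rule stated in the description (a fully expired SSTable may be dropped only if it does not overlap in time with any non-fully-expired SSTable). The sketch only shows that the outcome depends on which SSTables are passed in, so two calls with different inputs can select different files. It makes no claim about the actual cause found in this issue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SSTable:
    # Hypothetical, simplified stand-in for SSTable metadata.
    name: str
    min_timestamp: int             # oldest write timestamp in the file
    max_timestamp: int             # newest write timestamp in the file
    max_local_deletion_time: int   # when the last cell in the file expires


def fully_expired(compacting, overlapping, gc_before):
    """Return names of expired SSTables that no live SSTable overlaps in time."""
    everything = list(compacting) + list(overlapping)
    # Candidates: every cell in the file has expired before gc_before.
    candidates = [s for s in everything if s.max_local_deletion_time < gc_before]
    # Live files still hold data that has not expired.
    live = [s for s in everything if s.max_local_deletion_time >= gc_before]
    if not live:
        return {s.name for s in candidates}
    oldest_live = min(s.min_timestamp for s in live)
    # Keep only candidates that are strictly older than all live data.
    return {s.name for s in candidates if s.max_timestamp < oldest_live}


a = SSTable("a", min_timestamp=0, max_timestamp=10, max_local_deletion_time=50)
d = SSTable("d", min_timestamp=0, max_timestamp=4, max_local_deletion_time=50)
b = SSTable("b", min_timestamp=5, max_timestamp=20, max_local_deletion_time=200)

# With the live SSTable "b" in view, "a" overlaps it in time and is kept.
first = fully_expired([a, d], [b], gc_before=100)   # {"d"}
# Without "b" in view, both expired files are selected.
second = fully_expired([a, d], [], gc_before=100)   # {"a", "d"}
```

The two calls use the same compacting set and the same `gc_before`, and differ only in the overlapping set, which is enough to change the selection in this simplified model.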
> These two issues together cause problems for the DTCS process when it
> combines SSTables that overlap in time and have a TTL on the column.
> This is demonstrated by first generating test data without compactions and
> showing the distribution of the files over time. When compaction is enabled,
> DTCS combines files together, but the end result is not what one would
> expect. This is demonstrated in the file motivation_jira.txt.
> The attachments contain the following material:
> - motivation_jira.txt: practical examples of how DTCS behaves with TTL
> - explanation_jira.txt: gives more details, explains the test cases and
> demonstrates the problems in the compaction process
> - Log file for the compactions in the first test case
> (compaction_stage_test01_jira.log)
> - Log file for the compactions in the second test case
> (compaction_stage_test02_jira.log)
> - Source code zip file for version 2.1.5 with additional comment statements
> (src_2.1.5_with_debug.zip)
> - Python script to generate test data (datagen.py)
> - Python script to read metadata from SSTables
> (cassandra_sstable_metadata_reader.py)
> - Python script to generate a timeline representation of SSTables
> (cassandra_sstable_timespan_graph.py)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)