[ https://issues.apache.org/jira/browse/CASSANDRA-8683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benedict updated CASSANDRA-8683:
--------------------------------
    Description: 
When introducing CASSANDRA-6916 we permitted the early-opened files to overlap 
with the files they were replacing by one DecoratedKey, as this permitted a few 
minor simplifications. Unfortunately this breaks assumptions in 
LeveledCompactionScanner, which is causing the intermittent unit test failures: 
http://cassci.datastax.com/job/trunk_utest/1330/testReport/junit/org.apache.cassandra.db.compaction/LeveledCompactionStrategyTest/testValidationMultipleSSTablePerLevel/

This patch by itself does not fix the bug, but it fixes the described aspect of 
it by ensuring the replaced and replacing files never overlap. This is achieved 
by always selecting, as the start of the replaced file, the next key present in 
that file that is greater than the last key in the new file(s). If there is no 
such key, the reader has no data to return. However, to permit abort and atomic 
replacement at the end of a macro compaction action, we must keep the file in 
the DataTracker for replacement purposes while not returning it to consumers 
(especially as many assume a non-empty range). For this I have introduced a new 
OpenReason called SHADOWED, and a DataTracker.View.shadowed collection of 
sstables that tracks those we still consider to be in the live set, but from 
which we no longer answer any queries.
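
The start-selection rule above can be sketched roughly as follows. This is a 
minimal illustration, not the code in the patch: keys are plain integers 
rather than DecoratedKey, and the class EarlyOpenSketch, the method moveStart, 
the Replaced holder and the MOVED_START value are names assumed for the 
example. Only SHADOWED is taken from the description.

```java
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical sketch: choose the new start of a replaced sstable so that it
// never overlaps the early-opened replacement file(s).
final class EarlyOpenSketch {
    // SHADOWED comes from the description; MOVED_START is assumed for the example.
    enum OpenReason { MOVED_START, SHADOWED }

    // Result holder (assumed): why the reader was reopened and where it now starts.
    static final class Replaced {
        final OpenReason reason;
        final Integer firstKey; // null when the reader is shadowed (nothing left to serve)

        Replaced(OpenReason reason, Integer firstKey) {
            this.reason = reason;
            this.firstKey = firstKey;
        }
    }

    // oldKeys: keys present in the file being replaced.
    // lastKeyInNewFiles: last key already written to the replacement file(s).
    static Replaced moveStart(NavigableSet<Integer> oldKeys, int lastKeyInNewFiles) {
        // higher() returns the least key strictly greater than the argument,
        // so the two key ranges cannot share even a single key.
        Integer next = oldKeys.higher(lastKeyInNewFiles);
        if (next == null) {
            // No remaining key: keep the file tracked for abort/atomic
            // replacement, but stop answering queries from it.
            return new Replaced(OpenReason.SHADOWED, null);
        }
        return new Replaced(OpenReason.MOVED_START, next);
    }
}
```

With keys {10, 20, 30} in the old file, a replacement ending at 20 moves the 
old file's start to 30, while a replacement ending at 30 leaves nothing to 
serve and the old file would be shadowed.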

CASSANDRA-8744 (followed by CASSANDRA-8750) then ensures that these bounds are 
honoured, so that we never break the assumption that files in LCS never overlap.
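
For illustration, the non-overlap assumption for a single level can be 
expressed as a simple check. This is a sketch under assumed names 
(LevelOverlapSketch, Bounds, isNonOverlapping) with integer keys and 
inclusive bounds; it is not how LCS or the referenced tickets implement it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the invariant: within one level, no two files may
// share any key, not even a single boundary key.
final class LevelOverlapSketch {
    // Inclusive [first, last] key bounds of one file (assumed representation).
    static final class Bounds {
        final int first;
        final int last;

        Bounds(int first, int last) {
            this.first = first;
            this.last = last;
        }
    }

    static boolean isNonOverlapping(List<Bounds> level) {
        // Sort a copy by first key so only neighbours need comparing.
        List<Bounds> sorted = new ArrayList<>(level);
        sorted.sort(Comparator.comparingInt(b -> b.first));
        for (int i = 1; i < sorted.size(); i++) {
            // Each file must start strictly after the previous one ends.
            if (sorted.get(i).first <= sorted.get(i - 1).last) {
                return false;
            }
        }
        return true;
    }
}
```

Under this check, files covering [10, 20] and [20, 30] fail because they 
share key 20, which is the one-key overlap described above; [10, 20] and 
[21, 30] pass.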

  was:
Incremental repair holds a set of the sstables it started the repair on (we 
need to know which sstables were actually validated to be able to anticompact 
them). This includes any tmplink files that existed when the compaction started 
(if we didn't include those, we would miss data, since we move the start point 
of the existing non-tmplink files).

With CASSANDRA-6916 we swap out those instances with new ones 
(SSTR.cloneWithNewStart / SSTW.openEarly), meaning that the underlying file can 
get deleted even though we hold a reference.

This causes the unit test error: 
http://cassci.datastax.com/job/trunk_utest/1330/testReport/junit/org.apache.cassandra.db.compaction/LeveledCompactionStrategyTest/testValidationMultipleSSTablePerLevel/

(Note that it only fails on trunk, though; in 2.1 we don't hold references to 
the repairing files for non-incremental repairs, but the bug should exist in 
2.1 as well.)


> Ensure early reopening has no overlap with replaced files
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-8683
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8683
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Marcus Eriksson
>            Assignee: Benedict
>            Priority: Critical
>             Fix For: 2.1.3
>
>         Attachments: 0001-avoid-NPE-in-getPositionsForRanges.patch



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
