[jira] [Commented] (CASSANDRA-8683) Incremental repairs broken with early opening of compaction results

Benedict (JIRA) Tue, 27 Jan 2015 04:42:49 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-8683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293442#comment-14293442
 ]


Benedict commented on CASSANDRA-8683:
-------------------------------------

So I introduced the following code to the end of getPosition:

{code}
        assert op == SSTableReader.Operator.EQ;
        if (updateCacheAndStats)
            bloomFilterTracker.addFalsePositive();
        Tracing.trace("Partition index lookup complete (bloom filter false positive) for sstable {}", descriptor.generation);
        return null;
{code}

and this resulted in the following output (after enough runs):

{noformat}
    [junit] Testcase: testValidationMultipleSSTablePerLevel(org.apache.cassandra.db.compaction.LeveledCompactionStrategyTest):  Caused an ERROR
    [junit] java.lang.AssertionError
    [junit] java.util.concurrent.ExecutionException: java.lang.AssertionError
    [junit]     at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    [junit]     at java.util.concurrent.FutureTask.get(FutureTask.java:188)
    [junit]     at org.apache.cassandra.db.compaction.LeveledCompactionStrategyTest.testValidationMultipleSSTablePerLevel(LeveledCompactionStrategyTest.java:184)
    [junit] Caused by: java.lang.AssertionError
    [junit]     at org.apache.cassandra.io.sstable.format.big.BigTableReader.getPosition(BigTableReader.java:243)
    [junit]     at org.apache.cassandra.io.sstable.format.SSTableReader.getPosition(SSTableReader.java:1355)
    [junit]     at org.apache.cassandra.io.sstable.format.SSTableReader.getPositionsForRanges(SSTableReader.java:1282)
    [junit]     at org.apache.cassandra.io.sstable.format.big.BigTableScanner.getScanner(BigTableScanner.java:67)
    [junit]     at org.apache.cassandra.io.sstable.format.big.BigTableReader.getScanner(BigTableReader.java:101)
    [junit]     at org.apache.cassandra.io.sstable.format.SSTableReader.getScanner(SSTableReader.java:1538)
    [junit]     at org.apache.cassandra.db.compaction.LeveledCompactionStrategy$LeveledScanner.<init>(LeveledCompactionStrategy.java:318)
    [junit]     at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.getScanners(LeveledCompactionStrategy.java:245)
    [junit]     at org.apache.cassandra.db.compaction.WrappingCompactionStrategy.getScanners(WrappingCompactionStrategy.java:357)
    [junit]     at org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:999)
    [junit]     at org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:95)
    [junit]     at org.apache.cassandra.db.compaction.CompactionManager$9.call(CompactionManager.java:591)
    [junit]     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    [junit]     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    [junit]     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    [junit]     at java.lang.Thread.run(Thread.java:745)
{noformat}

This suggests that we are indeed skipping over the entire contents of our 
index page. However, this may have nothing to do with the 
getToken().maxKeyBound() business, since that would likely fail 
deterministically (although it is likely also a bug). It's possible, though, 
that memory is being freed at just the right moment to give us a bad binary 
search result. This seems a little unlikely, but _is_ possible, so I suggest 
we get 7705 finished and running so we can nail down the early release of 
memory, and then continue our investigations here.
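
To make the hypothesised failure mode concrete, here is a toy model of a sampled index lookup. It is only a sketch and not the real getPosition: the class IndexLookupSketch, the Summary type, the integer keys, and the "stale" values are all invented for illustration. It assumes the summary stores sampled keys plus offsets into the index, and that a released summary can be read back as arbitrary data. Under those assumptions, a bad binary search result hands back an offset past the index contents, the scan sees no entries, and a GE lookup falls through to the point where the assertion above sits.

```java
import java.util.Arrays;

/** Toy model of a sampled index lookup. Not Cassandra's actual code. */
public class IndexLookupSketch {
    public enum Operator { EQ, GE }

    /** Sorted keys standing in for the on-disk partition index. */
    static final int[] INDEX = {10, 20, 30, 40, 50, 60, 70, 80};

    /** Hypothetical in-memory summary: sampled keys and their offsets into INDEX. */
    public static final class Summary {
        final int[] keys;
        final int[] offsets;
        Summary(int[] keys, int[] offsets) { this.keys = keys; this.offsets = offsets; }
    }

    /** Every 4th key of INDEX, with its offset. */
    public static Summary healthySummary() {
        return new Summary(new int[] {10, 50}, new int[] {0, 4});
    }

    /** Stand-in for a summary whose memory was released and reused (arbitrary contents). */
    public static Summary staleSummary() {
        return new Summary(new int[] {0, 0}, new int[] {1000, 1000});
    }

    /** Returns the index of the first entry satisfying op, or -1 if the scan finds none. */
    public static int getPosition(Summary summary, int key, Operator op) {
        int found = Arrays.binarySearch(summary.keys, key);
        // Use the last sample whose key is <= the search key (or the first sample).
        int sample = found >= 0 ? found : Math.max(0, -found - 2);
        for (int i = summary.offsets[sample]; i < INDEX.length; i++) {
            if (INDEX[i] == key) return i;                          // exact match: EQ and GE
            if (INDEX[i] > key) return op == Operator.GE ? i : -1;  // passed the key
        }
        // Nothing at or after the start offset satisfied op. For EQ this is an
        // ordinary miss; for GE on an in-range key it means the start was wrong.
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(getPosition(healthySummary(), 45, Operator.GE)); // 4
        System.out.println(getPosition(staleSummary(), 45, Operator.GE));   // -1
    }
}
```

With the healthy summary the GE lookup for 45 lands on the entry for 50. With the stale one the same lookup scans nothing and returns -1, which is the toy equivalent of reaching the end of getPosition with op != EQ.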

> Incremental repairs broken with early opening of compaction results
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-8683
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8683
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Marcus Eriksson
>            Assignee: Marcus Eriksson
>             Fix For: 2.1.3
>
>         Attachments: 0001-avoid-NPE-in-getPositionsForRanges.patch
>
>
> Incremental repair holds a set of the sstables it started the repair on (we 
> need to know which sstables were actually validated to be able to anticompact 
> them). This includes any tmplink files that existed when the compaction 
> started (if we didn't include those, we would miss data, since we move the 
> start point of the existing non-tmplink files).
> With CASSANDRA-6916 we swap out those instances with new ones 
> (SSTR.cloneWithNewStart / SSTW.openEarly), meaning that the underlying file 
> can get deleted even though we hold a reference.
> This causes the unit test error: 
> http://cassci.datastax.com/job/trunk_utest/1330/testReport/junit/org.apache.cassandra.db.compaction/LeveledCompactionStrategyTest/testValidationMultipleSSTablePerLevel/
> (Note that it only fails on trunk, though: in 2.1 we don't hold references to 
> the repairing files for non-incremental repairs, but the bug should exist in 
> 2.1 as well.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
