[ 
https://issues.apache.org/jira/browse/CASSANDRA-9908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14679517#comment-14679517
 ] 

Stefania commented on CASSANDRA-9908:
-------------------------------------

bq. Thinking about it, we need to replicate a little more code from the regular 
CF drop: interruption of compactions. We should be calling both 
CompactionManager.instance.interruptCompactionForCFs and waitForCessation, to 
ensure that most references to the files are also relinquished before we signal 
successful drop.

Done. However, {{waitForCessasion}} is not called by {{Keyspace.dropCf}} - just 
pointing it out, I agree that we should call it.

bq. Then, unfortunately, that still doesn't put us in the clear, but I'm 
beginning to think it would be better and more robust to just accelerate 
SecondaryIndexes having their own UUID, which I believe is already on the cards 
for the near future. As we can still have other long running operations - such 
as keySamples which have references to sstables that can be long lived, that 
aren't compactions and won't acquiesce. So, really, we need to sleep-spin until 
they are all released. This is pretty unpleasant, and it does run the risk of 
schema changes becoming blocked indefinitely on a leaked reference.

Other than this race, which would result in a log message after this patch, 
what are the other risks? From what I understand we include temporary files 
when calculating the latest generation number in 
{{ColumnFamilyStore.createColumnFamilyStore}} but we exclude them when loading 
the sstables in the constructor of CFS. So, we should not reuse old generation 
numbers (which would be really bad) and we should not load old files (also 
quite bad). We delete the tx log file last, and we call getTemporaryFiles() 
before calling listFiles() in Directories.filter(). So can we still pick up 
temporary files by mistake? If anything we should always ensure that 
listFiles() is never called before getTemporaryFiles() by perhaps moving this 
code together? Or am I missing something completely? I do agree on using 
separate UUID for indexes by the way, I am just reasoning on how to make this 
safe given the current status.

bq. I would also prefer we at least logged an error message if we encounter the 
race condition with the NoSuchFileException. Since it definitely means we still 
have problems.

I changed the log message from WARN to ERROR.

--

Please disregard the old patch based on trunk, I rebased it on cassandra-3.0 
and renamed it to 
[9908-3.0|https://github.com/stef1927/cassandra/commits/9908-3.0].


> Potential race caused by async cleanup of transaction log files
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-9908
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9908
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sam Tunnicliffe
>            Assignee: Stefania
>              Labels: benedict-to-commit
>             Fix For: 3.0 beta 1
>
>         Attachments: TEST-org.apache.cassandra.db.SecondaryIndexTest.log
>
>
> There seems to be a potential race in the cleanup of transaction log files, 
> introduced in CASSANDRA-7066
> It's pretty hard to trigger on trunk, but it's possible to hit it via 
> {{o.a.c.db.SecondaryIndexTest#testCreateIndex}} 
> That test creates an index, then removes it to check that the removal is 
> correctly recorded, then adds the index again to assert that it gets rebuilt 
> from the existing data. 
> The removal causes the SSTables of the index CFS to be dropped, which is a 
> transactional operation and so writes a transaction log. When the drop is 
> completed and the last reference to an SSTable is released, the cleanup of 
> the transaction log is scheduled on the periodic tasks executor. The issue is 
> that re-creating the index re-creates the index CFS. When this happens, it's 
> possible for the cleanup of the txn log to have not yet happened. If so, the 
> initialization of the CFS attempts to read the log to identify any orphaned 
> temporary files. The cleanup can happen between the finding the log file and 
> reading it's contents, which results in a {{NoSuchFileException}}
> {noformat}
> [junit] java.nio.file.NoSuchFileException: 
> build/test/cassandra/data:1/SecondaryIndexTest1/CompositeIndexToBeAdded-d0885f60323211e5a5e8ad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_d4b69fc0-3232-11e5-a5e8-ad83a3dc3e9c_old.log
> [junit] java.lang.RuntimeException: java.nio.file.NoSuchFileException: 
> build/test/cassandra/data:1/SecondaryIndexTest1/CompositeIndexToBeAdded-d0885f60323211e5a5e8ad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_d4b69fc0-3232-11e5-a5e8-ad83a3dc3e9c_old.log
> [junit]     at 
> org.apache.cassandra.io.util.FileUtils.readLines(FileUtils.java:620)
> [junit]     at 
> org.apache.cassandra.db.lifecycle.TransactionLogs$TransactionFile.getTrackedFiles(TransactionLogs.java:190)
> [junit]     at 
> org.apache.cassandra.db.lifecycle.TransactionLogs$TransactionData.getTemporaryFiles(TransactionLogs.java:338)
> [junit]     at 
> org.apache.cassandra.db.lifecycle.TransactionLogs.getTemporaryFiles(TransactionLogs.java:739)
> [junit]     at 
> org.apache.cassandra.db.lifecycle.LifecycleTransaction.getTemporaryFiles(LifecycleTransaction.java:541)
> [junit]     at 
> org.apache.cassandra.db.Directories$SSTableLister.getFilter(Directories.java:652)
> [junit]     at 
> org.apache.cassandra.db.Directories$SSTableLister.filter(Directories.java:641)
> [junit]     at 
> org.apache.cassandra.db.Directories$SSTableLister.list(Directories.java:606)
> [junit]     at 
> org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:351)
> [junit]     at 
> org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:313)
> [junit]     at 
> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:511)
> [junit]     at 
> org.apache.cassandra.index.internal.CassandraIndexer.addIndexedColumn(CassandraIndexer.java:115)
> [junit]     at 
> org.apache.cassandra.index.SecondaryIndexManager.addIndexedColumn(SecondaryIndexManager.java:265)
> [junit]     at 
> org.apache.cassandra.db.SecondaryIndexTest.testIndexCreate(SecondaryIndexTest.java:467)
> [junit] Caused by: java.nio.file.NoSuchFileException: 
> build/test/cassandra/data:1/SecondaryIndexTest1/CompositeIndexToBeAdded-d0885f60323211e5a5e8ad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_d4b69fc0-3232-11e5-a5e8-ad83a3dc3e9c_old.log
> [junit]     at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
> [junit]     at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
> [junit]     at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
> [junit]     at 
> sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
> [junit]     at java.nio.file.Files.newByteChannel(Files.java:361)
> [junit]     at java.nio.file.Files.newByteChannel(Files.java:407)
> [junit]     at 
> java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384)
> [junit]     at java.nio.file.Files.newInputStream(Files.java:152)
> [junit]     at java.nio.file.Files.newBufferedReader(Files.java:2784)
> [junit]     at java.nio.file.Files.readAllLines(Files.java:3202)
> [junit]     at 
> org.apache.cassandra.io.util.FileUtils.readLines(FileUtils.java:616)
> [junit] 
> [junit] 
> [junit] Test org.apache.cassandra.db.SecondaryIndexTest FAILED
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to