[ 
https://issues.apache.org/jira/browse/CASSANDRA-10181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712014#comment-14712014
 ] 

Sam Tunnicliffe edited comment on CASSANDRA-10181 at 8/26/15 7:54 AM:
----------------------------------------------------------------------

I think this has always been an issue. It probably hasn't been a problem until 
now because we enforced a check in {{SIM#addIndexedColumn}} that a custom index 
didn't extend {{AbstractSimplePerColumnSecondaryIndex}}. Aside from that 
check, though, I think a CFS backed custom index would have deadlocked earlier 
versions too, had anyone been sufficiently motivated to write one. 

In the linked branch, I've modified the post flush task to force flush only 
non-CFS backed indexes, rather than all custom indexes. I was expecting to have 
to modify {{CFS#concatWithIndexes}} so that it would include CFS backed custom 
indexes in the actual flush task, but it was already doing so, which is another 
latent bug (although now it's actually the right thing to do). The branch is 
built on your patch from CASSANDRA-10180.
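
As a rough illustration of the split described above (not the code in the branch), the sketch below sorts indexes into those flushed together with the base table and those force flushed from the post flush task. {{IndexInfo}}, {{FlushPlan}} and {{planFlush}} are hypothetical names invented for this example; they are not Cassandra APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FlushPlanSketch {
    /** Hypothetical stand-in for an index; NOT Cassandra's index interface. */
    static final class IndexInfo {
        final String name;
        final boolean cfsBacked; // true if the index stores its data in its own CFS

        IndexInfo(String name, boolean cfsBacked) {
            this.name = name;
            this.cfsBacked = cfsBacked;
        }
    }

    /** Hypothetical result type: which indexes are flushed at which stage. */
    static final class FlushPlan {
        final List<String> withBaseTable = new ArrayList<>();
        final List<String> inPostFlush = new ArrayList<>();
    }

    /**
     * CFS backed indexes join the actual flush task alongside the base table;
     * only non-CFS backed indexes are left for the post flush task.
     */
    static FlushPlan planFlush(List<IndexInfo> indexes) {
        FlushPlan plan = new FlushPlan();
        for (IndexInfo index : indexes) {
            if (index.cfsBacked) {
                plan.withBaseTable.add(index.name);
            } else {
                plan.inPostFlush.add(index.name);
            }
        }
        return plan;
    }

    public static void main(String[] args) {
        FlushPlan plan = planFlush(Arrays.asList(
                new IndexInfo("cfs_backed_custom_index", true),
                new IndexInfo("external_custom_index", false)));
        System.out.println("flushed with base table: " + plan.withBaseTable);
        System.out.println("force flushed post flush: " + plan.inPostFlush);
    }
}
```

The point of the split is that nothing running on the post flush executor ever needs to trigger another CFS flush.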

Patches:
* [3.0 branch|https://github.com/beobal/cassandra/tree/10181-3.0]
* [trunk branch|https://github.com/beobal/cassandra/tree/10181-trunk]

CI Tests:
* [3.0 testall|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-10181-3.0-testall/]
* [3.0 dtest|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-10181-3.0-dtest/]
* [trunk testall|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-10181-trunk-testall/]
* [trunk dtest|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-10181-dtest/]



> Deadlock flushing tables with CUSTOM indexes
> --------------------------------------------
>
>                 Key: CASSANDRA-10181
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10181
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Tyler Hobbs
>            Assignee: Sam Tunnicliffe
>             Fix For: 3.0 beta 2
>
>         Attachments: flush-deadlock-repro.txt
>
>
> In 3.0, if a table with a CUSTOM secondary index is force flushed, Cassandra 
> will deadlock while attempting to perform a blocking flush on the tables 
> backing the secondary indexes.
> The basic problem is that the base table's post-flush task ends up waiting on 
> the post-flush task for the secondary index to complete.  However, since the 
> post-flush executor is single-threaded, this results in a deadlock.
> Here's the partial stacktrace for the base table part of this (line numbers 
> may not be 100% accurate):
> {noformat}
> org.apache.cassandra.db.ColumnFamilyStore.forceBlockingFlush(ColumnFamilyStore.java:927)
>       at org.apache.cassandra.index.internal.CustomIndex.lambda$getBlockingFlushTask$0(VertexCentricIndex.java:114)
>       at org.apache.cassandra.index.internal.CustomIndex$$Lambda$95/057902870.call(Unknown Source)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at com.google.common.util.concurrent.MoreExecutors$DirectExecutorService.execute(MoreExecutors.java:299)
>       at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
>       at com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:58)
>       at com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:37)
>       at org.apache.cassandra.index.SecondaryIndexManager.lambda$executeAllBlocking$39(SecondaryIndexManager.java:896)
>       at org.apache.cassandra.index.SecondaryIndexManager$$Lambda$94/25774682.accept(Unknown Source)
>       at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
>       at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
>       at org.apache.cassandra.index.SecondaryIndexManager.executeAllBlocking(SecondaryIndexManager.java:893)
>       at org.apache.cassandra.index.SecondaryIndexManager.flushIndexesBlocking(SecondaryIndexManager.java:346)
>       at org.apache.cassandra.index.SecondaryIndexManager.flushAllCustomIndexesBlocking(SecondaryIndexManager.java:358)
>       at org.apache.cassandra.db.ColumnFamilyStore$PostFlush.run(ColumnFamilyStore.java:960)
> {noformat}
> First, note that the base of this stacktrace is in CFS$PostFlush.run(), which 
> means it's running on the post-flush executor.  When 
> {{CFS.forceBlockingFlush()}} is called on the secondary index table, we end 
> up blocking on another task that's submitted to the post-flush executor.  
> Since that executor is single-threaded and is already running the base table 
> task, this results in deadlock.
> The attached patch includes a unit test and custom secondary index class 
> (basically just KeysIndex) to reproduce the issue.
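
The general pattern described above can be reproduced outside Cassandra. The following is a minimal sketch using only {{java.util.concurrent}}, not Cassandra code: a task running on an executor submits a second task to the same executor and blocks on it. The class and method names are made up for illustration, and a timeout stands in for the real, unbounded wait so that the example terminates instead of hanging.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class PostFlushDeadlockSketch {
    /**
     * Runs an outer task that submits an inner task to the SAME executor and
     * waits for it. Returns true if the wait timed out, i.e. the inner task
     * could not start while the outer task was occupying the executor.
     */
    static boolean nestedSubmitStalls(ExecutorService executor, long timeoutMs) throws Exception {
        Future<Boolean> outer = executor.submit(() -> {
            // Plays the role of the base table's post-flush task.
            Future<?> inner = executor.submit(() -> {
                // Plays the role of the index table's post-flush task.
            });
            try {
                inner.get(timeoutMs, TimeUnit.MILLISECONDS);
                return false; // inner task completed: no stall
            } catch (TimeoutException e) {
                return true;  // with an unbounded get() this would never return
            }
        });
        return outer.get();
    }

    public static void main(String[] args) throws Exception {
        ExecutorService single = Executors.newSingleThreadExecutor();
        ExecutorService pair = Executors.newFixedThreadPool(2);
        try {
            System.out.println("single thread stalls: " + nestedSubmitStalls(single, 200));
            System.out.println("two threads stall: " + nestedSubmitStalls(pair, 200));
        } finally {
            single.shutdownNow();
            pair.shutdownNow();
        }
    }
}
```

On the single-threaded executor the inner task sits in the queue behind the outer task that is waiting for it, which is the same shape as the base table and index post-flush tasks. With a second worker thread the inner task can run and the wait completes.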


