[ 
https://issues.apache.org/jira/browse/CASSANDRA-12251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15401718#comment-15401718
 ] 

Alex Petrov commented on CASSANDRA-12251:
-----------------------------------------

Unfortunately, I could not reproduce this locally. However, given the exception 
it's more or less clear what has happened. It's just much harder to land nodes 
into the same state. 

The tasks executed by {{MigrationManager}} are landing in the {{optional}} 
tasks of scheduled executor 
[here|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/MigrationManager.java].
 Optional tasks are not being waited for during shutdown or drain. Although 
even putting the task into the {{nonPeriodicTasks}} executor won't fix the 
problem since post-flush executor is being shut down before the periodic 
executor 
[here|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/StorageService.java#L4251].
 

In order to fix the issue, we need to both move the migration task to 
non-periodic and make sure that shutdown order is correct. Since mutation stage 
is already shut down, I left commit log shutdown on ints place and only moved 
the post flush executor shutdown after the periodic tasks.

> dtest failure in 
> upgrade_tests.cql_tests.TestCQLNodes3RF3_Upgrade_current_3_x_To_indev_3_x.whole_list_conditional_test
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-12251
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12251
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Philip Thompson
>            Assignee: Alex Petrov
>              Labels: dtest
>         Attachments: node1.log, node1_debug.log, node1_gc.log, node2.log, 
> node2_debug.log, node2_gc.log, node3.log, node3_debug.log, node3_gc.log
>
>
> example failure:
> http://cassci.datastax.com/job/cassandra-3.8_dtest_upgrade/1/testReport/upgrade_tests.cql_tests/TestCQLNodes3RF3_Upgrade_current_3_x_To_indev_3_x/whole_list_conditional_test
> Failed on CassCI build cassandra-3.8_dtest_upgrade #1
> Relevant error in logs is
> {code}
> Unexpected error in node1 log, error: 
> ERROR [InternalResponseStage:2] 2016-07-20 04:58:45,876 
> CassandraDaemon.java:217 - Exception in thread 
> Thread[InternalResponseStage:2,5,main]
> java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut 
> down
>       at 
> org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:61)
>  ~[apache-cassandra-3.7.jar:3.7]
>       at 
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823) 
> ~[na:1.8.0_51]
>       at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369) 
> ~[na:1.8.0_51]
>       at 
> org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.execute(DebuggableThreadPoolExecutor.java:165)
>  ~[apache-cassandra-3.7.jar:3.7]
>       at 
> java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112)
>  ~[na:1.8.0_51]
>       at 
> org.apache.cassandra.db.ColumnFamilyStore.switchMemtable(ColumnFamilyStore.java:842)
>  ~[apache-cassandra-3.7.jar:3.7]
>       at 
> org.apache.cassandra.db.ColumnFamilyStore.switchMemtableIfCurrent(ColumnFamilyStore.java:822)
>  ~[apache-cassandra-3.7.jar:3.7]
>       at 
> org.apache.cassandra.db.ColumnFamilyStore.forceFlush(ColumnFamilyStore.java:891)
>  ~[apache-cassandra-3.7.jar:3.7]
>       at 
> org.apache.cassandra.schema.SchemaKeyspace.lambda$flush$1(SchemaKeyspace.java:279)
>  ~[apache-cassandra-3.7.jar:3.7]
>       at 
> org.apache.cassandra.schema.SchemaKeyspace$$Lambda$200/1129213153.accept(Unknown
>  Source) ~[na:na]
>       at java.lang.Iterable.forEach(Iterable.java:75) ~[na:1.8.0_51]
>       at 
> org.apache.cassandra.schema.SchemaKeyspace.flush(SchemaKeyspace.java:279) 
> ~[apache-cassandra-3.7.jar:3.7]
>       at 
> org.apache.cassandra.schema.SchemaKeyspace.mergeSchema(SchemaKeyspace.java:1271)
>  ~[apache-cassandra-3.7.jar:3.7]
>       at 
> org.apache.cassandra.schema.SchemaKeyspace.mergeSchemaAndAnnounceVersion(SchemaKeyspace.java:1253)
>  ~[apache-cassandra-3.7.jar:3.7]
>       at 
> org.apache.cassandra.service.MigrationTask$1.response(MigrationTask.java:92) 
> ~[apache-cassandra-3.7.jar:3.7]
>       at 
> org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:53)
>  ~[apache-cassandra-3.7.jar:3.7]
>       at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64) 
> ~[apache-cassandra-3.7.jar:3.7]
>       at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[na:1.8.0_51]
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[na:1.8.0_51]
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[na:1.8.0_51]
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_51]
>       at java.lang.Thread.run(Thread.java:745) [na:1.8.0_51]
> {code}
> This is on a mixed 3.0.8, 3.8-tentative cluster



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to