[ https://issues.apache.org/jira/browse/CASSANDRA-11729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sam Tunnicliffe updated CASSANDRA-11729:
----------------------------------------
    Attachment: node3_debug.log.gz
                node2_debug.log.gz
                node1_debug.log.gz

This isn't actually related to indexes, but it highlights a race condition which is pretty pervasive. You can see from the stacktrace that the assertion error is actually being thrown from a lambda defined in {{CassandraDaemon::setup}}, which only runs when a node is started. From inspection of the code and logs, what seems to be happening is this:

* At startup, node1 creates a task to submit rebuilds of all MVs in all keyspaces and submits it to the {{OptionalTasks}} executor to run after {{RING_DELAY}}.
* While this is still pending, all 3 nodes finish startup and proceed with the test, creating and then dropping the {{ks}} keyspace.
* It so happens that all of the "DROP KEYSPACE" statements hit node3 as the coordinator. From its log, we can see that the 4th of these executes at {{00:33:56,585}}, so shortly after that point it pushes a defs change to node1 and node2.
* Back on node1, the MV building runnable is executed; it calls {{Keyspace::all}} and begins to iterate the keyspaces, submitting MV builds.

This is where the race occurs. {{Keyspace::all}} provides an iterable of {{Keyspace}} instances by transforming the key set of {{Schema.instance.keyspaces}}, using {{Keyspace::open}} as the transformation function. Concurrently, processing the schema update pushed by node3 follows the path
{code}
SchemaKeyspace::mergeSchema
  -> Schema.instance.dropKeyspace
    -> Schema.instance.clearKeyspaceMetadata
      -> Schema.instance.keyspaces.remove
{code}
If the removal from {{Schema.instance.keyspaces}} happens after the transforming iterable has read the keyspace name from the key set, but before it attempts to open the {{Keyspace}}, the assertion error is thrown.

This is really a deep-rooted problem: schema is not properly safe under any level of concurrency.
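
To make the interleaving concrete, here is a minimal single-threaded sketch that replays the three steps in order (read the name, drop the keyspace, open the keyspace). Everything in it is a hypothetical stand-in rather than Cassandra code: {{keyspaces}} models {{Schema.instance.keyspaces}}, {{open}} models the assertion in {{Keyspace::open}}, and {{openIfExists}} is only an illustration of a caller that tolerates a concurrent drop, not the fix proposed for CASSANDRA-9424.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SchemaRaceSketch {
    // Hypothetical stand-in for Schema.instance.keyspaces (name -> metadata).
    static final Map<String, String> keyspaces = new ConcurrentHashMap<>();

    // Hypothetical stand-in for Keyspace::open: fails if the keyspace is no longer known.
    static String open(String name) {
        if (keyspaces.get(name) == null)
            throw new AssertionError("Unknown keyspace " + name);
        return "Keyspace(" + name + ")";
    }

    // Illustrative tolerant variant (an assumption, not from the ticket):
    // returns null instead of failing when the keyspace has been dropped.
    static String openIfExists(String name) {
        return keyspaces.get(name) == null ? null : "Keyspace(" + name + ")";
    }

    // Replays the problematic interleaving deterministically in one thread.
    static boolean raceThrows() {
        keyspaces.clear();
        keyspaces.put("ks", "metadata");
        Iterator<String> names = keyspaces.keySet().iterator();
        String name = names.next();   // 1. MV build task reads the keyspace name
        keyspaces.remove(name);       // 2. schema merge drops the keyspace
        try {
            open(name);               // 3. MV build task tries to open it
            return false;
        } catch (AssertionError e) {
            return true;
        }
    }

    // Same interleaving, but the caller skips keyspaces that vanished in between.
    static List<String> openAllTolerant() {
        keyspaces.clear();
        keyspaces.put("ks", "metadata");
        keyspaces.put("system", "metadata");
        List<String> opened = new ArrayList<>();
        for (String name : new ArrayList<>(keyspaces.keySet())) {
            if (name.equals("ks"))
                keyspaces.remove("ks");   // simulated concurrent drop
            String ks = openIfExists(name);
            if (ks != null)
                opened.add(ks);
        }
        return opened;
    }

    public static void main(String[] args) {
        System.out.println("race throws: " + raceThrows());
        System.out.println("tolerant open: " + openAllTolerant());
    }
}
```

In the real code the read and the open are separated only by the lazy transformation inside the iterable, so the window is small, which would be consistent with the test failing only occasionally.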
{{Keyspace::all}} has many callsites, all of which are potentially vulnerable to this, and fixing that properly should be done as a subtask of CASSANDRA-9424.

[~iamaleksey], I don't think that any of the existing subtasks fully capture this. Do you think it may fit in CASSANDRA-9425, or do you think a new ticket is called for?

[~philipthompson], is the best thing to do here just to mark the test as flaky for now?

> dtest failure in secondary_indexes_test.TestSecondaryIndexes.test_6924_dropping_ks
> ----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-11729
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11729
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Russ Hatch
>            Assignee: Sam Tunnicliffe
>              Labels: dtest
>             Fix For: 3.x
>
>         Attachments: node1_debug.log.gz, node2_debug.log.gz, node3_debug.log.gz
>
>
> looks to be a single flap. might be worth trying to reproduce. example failure:
> http://cassci.datastax.com/job/trunk_dtest/1204/testReport/secondary_indexes_test/TestSecondaryIndexes/test_6924_dropping_ks
> Failed on CassCI build trunk_dtest #1204