[ https://issues.apache.org/jira/browse/CASSANDRA-10104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707082#comment-14707082 ]
Paulo Motta commented on CASSANDRA-10104:
-----------------------------------------

The initial error was related to a hinted handoff failure, but that error [went away in the latest build|http://cassci.datastax.com/view/win32/job/cassandra-3.0_dtest_win32/lastCompletedBuild/testReport/jmx_test/TestJMX/netstats_test_2/] with the introduction of the new [hinted handoff implementation|https://issues.apache.org/jira/browse/CASSANDRA-6230]. There is now a new problem:

{noformat}
CassandraDaemon.java:635 - Exception encountered during startup
java.lang.IllegalArgumentException: Unknown CF 5bc52802-de25-35ed-aeab-188eecebb090
	at org.apache.cassandra.db.Keyspace.getColumnFamilyStore(Keyspace.java:209) ~[main/:na]
	at org.apache.cassandra.db.Keyspace.getColumnFamilyStore(Keyspace.java:202) ~[main/:na]
	at org.apache.cassandra.cql3.restrictions.StatementRestrictions.<init>(StatementRestrictions.java:125) ~[main/:na]
	at org.apache.cassandra.cql3.statements.SelectStatement$RawStatement.prepareRestrictions(SelectStatement.java:790) ~[main/:na]
	at org.apache.cassandra.cql3.statements.SelectStatement$RawStatement.prepare(SelectStatement.java:740) ~[main/:na]
	at org.apache.cassandra.auth.CassandraRoleManager.prepare(CassandraRoleManager.java:423) ~[main/:na]
	at org.apache.cassandra.auth.CassandraRoleManager.setup(CassandraRoleManager.java:139) ~[main/:na]
	at org.apache.cassandra.service.StorageService.doAuthSetup(StorageService.java:1044) ~[main/:na]
	at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:975) ~[main/:na]
	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:696) ~[main/:na]
	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:570) ~[main/:na]
	at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:320) [main/:na]
	at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:516) [main/:na]
	at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:622) [main/:na]
{noformat}

It seems this is a race similar to [CASSANDRA-9201|https://issues.apache.org/jira/browse/CASSANDRA-9201] that happens when multiple nodes are started concurrently. Apparently the new auth schema is already available, but its ColumnFamilyStore has not been created yet when {{CassandraRoleManager.setup()}} prepares the statements it uses to retrieve roles. The not-so-elegant fix is to wait in a busy loop until the CFS is available before calling {{CassandraRoleManager.setup()}}. There may be a better way of synchronizing this, so I'm open to suggestions.

The patch is available [here|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:10104-3.0] for review. Test results will be available shortly:
* [3.0 testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-10104-3.0-testall/lastCompletedBuild/testReport/]
* [3.0 dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-10104-3.0-dtest/lastCompletedBuild/testReport/]
* [trunk testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-10104-trunk-testall/lastCompletedBuild/testReport/]
* [trunk dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-10104-trunk-dtest/lastCompletedBuild/testReport/]

> Windows dtest 3.0: jmx_test.py:TestJMX.netstats_test fails
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-10104
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10104
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: Joshua McKenzie
>            Assignee: Paulo Motta
>              Labels: Windows
>             Fix For: 3.0.x
>
>
> {noformat}
> Unexpected error in node1 node log: ['ERROR [HintedHandoff:2] 2015-08-16
> 23:14:04,419 CassandraDaemon.java:191 - Exception in thread
> Thread[HintedHandoff:2,1,main]
> org.apache.cassandra.exceptions.WriteFailureException: Operation failed -
> received 0 responses and 1 failures \tat
> org.apache.cassandra.service.AbstractWriteResponseHandler.get(AbstractWriteResponseHandler.java:106)
> ~[main/:na] \tat
> org.apache.cassandra.db.HintedHandOffManager.checkDelivered(HintedHandOffManager.java:358)
> ~[main/:na] \tat
> org.apache.cassandra.db.HintedHandOffManager.doDeliverHintsToEndpoint(HintedHandOffManager.java:414)
> ~[main/:na] \tat
> org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:346)
> ~[main/:na] \tat
> org.apache.cassandra.db.HintedHandOffManager.access$400(HintedHandOffManager.java:91)
> ~[main/:na] \tat
> org.apache.cassandra.db.HintedHandOffManager$5.run(HintedHandOffManager.java:537)
> ~[main/:na] \tat
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> ~[na:1.8.0_45] \tat
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> ~[na:1.8.0_45] \tat java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_45]']
> -------------------- >> begin captured logging << --------------------
> dtest: DEBUG: cluster ccm directory: d:\temp\dtest-j1ttp3
> dtest: DEBUG: Nodetool command
> 'D:\jenkins\workspace\cassandra-3.0_dtest_win32\cassandra\bin\nodetool.bat -h
> localhost -p 7100 netstats' failed; exit status: 1; stdout: Starting NodeTool
> ; stderr: nodetool: Failed to connect to 'localhost:7100' - ConnectException:
> 'Connection refused: connect'.
> dtest: DEBUG: removing ccm cluster test at: d:\temp\dtest-j1ttp3
> dtest: DEBUG: clearing ssl stores from [d:\temp\dtest-j1ttp3] directory
> --------------------- >> end captured logging << ---------------------
> {noformat}
> Failure history:
> [consistent|http://cassci.datastax.com/view/cassandra-3.0/job/cassandra-3.0_dtest_win32/17/testReport/junit/jmx_test/TestJMX/netstats_test/history/].
> Looks to have regressed on build
> [#5|http://cassci.datastax.com/view/cassandra-3.0/job/cassandra-3.0_dtest_win32/5/]
> which seems unlikely given the commit.
> Env: Both, though on a local run the test fails due to:
> {noformat}
> Traceback (most recent call last):
>   File "c:\src\cassandra-dtest\dtest.py", line 532, in tearDown
>     raise AssertionError('Unexpected error in %s node log: %s' % (node.name, errors))
> AssertionError: Unexpected error in node1 node log: ['ERROR [main] 2015-08-17
> 15:42:07,717 NoSpamLogger.java:97 - This platform does not support atomic
> directory streams (SecureDirectoryStream); race conditions when loading
> sstable files could occurr', 'ERROR [main] 2015-08-17 15:50:43,978
> NoSpamLogger.java:97 - This platform does not support atomic directory
> streams (SecureDirectoryStream); race conditions when loading sstable files
> could occurr']
> {noformat}
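The busy-loop workaround proposed in the comment (wait until the roles table's ColumnFamilyStore exists, then call {{CassandraRoleManager.setup()}}) can be sketched in Java as a bounded poll on a readiness check. This is a minimal sketch under stated assumptions, not the code in the linked patch: {{StartupWait}} and {{awaitCondition}} are hypothetical names, and the real readiness check is left abstract as a {{BooleanSupplier}} because the patch's exact check is not quoted here.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical helper (not from the patch): a bounded busy-wait that polls a
// readiness check until it passes or a deadline expires.
public final class StartupWait {

    private StartupWait() {}

    /**
     * Polls {@code ready} every {@code pollMillis} ms until it returns true or
     * {@code timeoutMillis} ms have elapsed.
     *
     * @return true if the condition held before the deadline; false on timeout
     *         or if the waiting thread was interrupted
     */
    public static boolean awaitCondition(BooleanSupplier ready, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!ready.getAsBoolean()) {
            // Compare by subtraction so that nanoTime overflow is handled correctly.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis); // back off instead of spinning a core
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status for the caller
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Stand-in for "the roles table's ColumnFamilyStore now exists":
        // this check simply starts passing about 50 ms from now.
        long readyAt = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(50);
        boolean ready = awaitCondition(() -> System.nanoTime() - readyAt >= 0, 5_000, 10);
        // Only proceed to the equivalent of CassandraRoleManager.setup() once ready.
        System.out.println(ready ? "ready: run role manager setup" : "timed out: fail startup");
    }
}
```

Bounding the wait with a deadline means a node that never sees the table fails startup with a clear error instead of hanging silently, and sleeping between polls avoids burning a core while the schema catches up.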