[ 
https://issues.apache.org/jira/browse/CASSANDRA-10104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707082#comment-14707082
 ] 

Paulo Motta commented on CASSANDRA-10104:
-----------------------------------------

The initial error was related to a hinted handoff failure, but that error [went 
away in the latest 
build|http://cassci.datastax.com/view/win32/job/cassandra-3.0_dtest_win32/lastCompletedBuild/testReport/jmx_test/TestJMX/netstats_test_2/]
 with the introduction of the new [hinted handoff 
implementation|https://issues.apache.org/jira/browse/CASSANDRA-6230]. There is 
now a new problem:

{noformat}
CassandraDaemon.java:635 - Exception encountered during startup 
java.lang.IllegalArgumentException: Unknown CF 5bc52802-de25-35ed-aeab-188eecebb090
at org.apache.cassandra.db.Keyspace.getColumnFamilyStore(Keyspace.java:209) 
~[main/:na]
at org.apache.cassandra.db.Keyspace.getColumnFamilyStore(Keyspace.java:202) 
~[main/:na]
at 
org.apache.cassandra.cql3.restrictions.StatementRestrictions.<init>(StatementRestrictions.java:125)
 ~[main/:na]
at 
org.apache.cassandra.cql3.statements.SelectStatement$RawStatement.prepareRestrictions(SelectStatement.java:790)
 ~[main/:na]
at 
org.apache.cassandra.cql3.statements.SelectStatement$RawStatement.prepare(SelectStatement.java:740)
 ~[main/:na]
at 
org.apache.cassandra.auth.CassandraRoleManager.prepare(CassandraRoleManager.java:423)
 ~[main/:na]
at 
org.apache.cassandra.auth.CassandraRoleManager.setup(CassandraRoleManager.java:139)
 ~[main/:na]
at 
org.apache.cassandra.service.StorageService.doAuthSetup(StorageService.java:1044)
 ~[main/:na]
at 
org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:975)
 ~[main/:na]
at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:696) 
~[main/:na]
at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:570) 
~[main/:na]
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:320) 
[main/:na]
at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:516) 
[main/:na]
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:622) 
[main/:na]
{noformat}

This appears to be a race similar to 
[CASSANDRA-9201|https://issues.apache.org/jira/browse/CASSANDRA-9201], which 
happens when multiple nodes are started concurrently. Apparently the new auth 
schema is already available, but the corresponding ColumnFamilyStore (CFS) has 
not yet been created when {{CassandraRoleManager}} prepares its role queries 
during setup.

The not-so-elegant fix is to busy-wait in a loop until the CFS is available 
before calling {{CassandraRoleManager.setup()}}. There may be a better way to 
synchronize this, so I'm open to suggestions.
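
For illustration only, the kind of bounded wait described above could be 
sketched roughly as follows. This is not the code in the linked patch: the 
{{StartupWait}} class, the {{awaitCondition}} method, and the timeout and poll 
parameters are all hypothetical names introduced here, and the readiness check 
(whether the ColumnFamilyStore exists) is left abstract as a 
{{BooleanSupplier}} rather than tied to any real Cassandra call.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of Cassandra: polls a readiness check
// with a deadline so that startup cannot spin forever.
final class StartupWait {
    private StartupWait() {}

    /**
     * Polls {@code ready} until it returns true or {@code timeoutMillis} elapses.
     * Returns true if the condition was met, false on timeout or interruption.
     */
    static boolean awaitCondition(BooleanSupplier ready, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!ready.getAsBoolean()) {
            // Overflow-safe comparison against the monotonic clock.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                // Sleep between checks instead of spinning at full speed.
                TimeUnit.MILLISECONDS.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt status for the caller and give up.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

A caller would pass a check for the store's existence as {{ready}} and only 
proceed to the auth setup when the method returns true. Bounding the wait with 
a timeout is an assumption of this sketch; it turns a stuck startup into an 
explicit failure instead of a hang.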

The patch is available 
[here|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:10104-3.0]
 for review. Test results will be available shortly at the links below:
* [3.0 
testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-10104-3.0-testall/lastCompletedBuild/testReport/]
* [3.0 
dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-10104-3.0-dtest/lastCompletedBuild/testReport/]
* [trunk 
testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-10104-trunk-testall/lastCompletedBuild/testReport/]
* [trunk 
dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-10104-trunk-dtest/lastCompletedBuild/testReport/]



> Windows dtest 3.0: jmx_test.py:TestJMX.netstats_test fails
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-10104
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10104
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: Joshua McKenzie
>            Assignee: Paulo Motta
>              Labels: Windows
>             Fix For: 3.0.x
>
>
> {noformat}
> Unexpected error in node1 node log: ['ERROR [HintedHandoff:2] 2015-08-16 
> 23:14:04,419 CassandraDaemon.java:191 - Exception in thread 
> Thread[HintedHandoff:2,1,main] 
> org.apache.cassandra.exceptions.WriteFailureException: Operation failed - 
> received 0 responses and 1 failures \tat 
> org.apache.cassandra.service.AbstractWriteResponseHandler.get(AbstractWriteResponseHandler.java:106)
>  ~[main/:na] \tat 
> org.apache.cassandra.db.HintedHandOffManager.checkDelivered(HintedHandOffManager.java:358)
>  ~[main/:na] \tat 
> org.apache.cassandra.db.HintedHandOffManager.doDeliverHintsToEndpoint(HintedHandOffManager.java:414)
>  ~[main/:na] \tat 
> org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:346)
>  ~[main/:na] \tat 
> org.apache.cassandra.db.HintedHandOffManager.access$400(HintedHandOffManager.java:91)
>  ~[main/:na] \tat 
> org.apache.cassandra.db.HintedHandOffManager$5.run(HintedHandOffManager.java:537)
>  ~[main/:na] \tat 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[na:1.8.0_45] \tat 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  ~[na:1.8.0_45] \tat java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_45]']
> -------------------- >> begin captured logging << --------------------
> dtest: DEBUG: cluster ccm directory: d:\temp\dtest-j1ttp3
> dtest: DEBUG: Nodetool command 
> 'D:\jenkins\workspace\cassandra-3.0_dtest_win32\cassandra\bin\nodetool.bat -h 
> localhost -p 7100 netstats' failed; exit status: 1; stdout: Starting NodeTool
> ; stderr: nodetool: Failed to connect to 'localhost:7100' - ConnectException: 
> 'Connection refused: connect'.
> dtest: DEBUG: removing ccm cluster test at: d:\temp\dtest-j1ttp3
> dtest: DEBUG: clearing ssl stores from [d:\temp\dtest-j1ttp3] directory
> --------------------- >> end captured logging << ---------------------
> {noformat}
> Failure history: 
> [consistent|http://cassci.datastax.com/view/cassandra-3.0/job/cassandra-3.0_dtest_win32/17/testReport/junit/jmx_test/TestJMX/netstats_test/history/].
>  Looks to have regressed on build 
> [#5|http://cassci.datastax.com/view/cassandra-3.0/job/cassandra-3.0_dtest_win32/5/]
>  which seems unlikely given the commit.
> Env: Both, though on a local run the test fails due to:
> {noformat}
> Traceback (most recent call last):
>   File "c:\src\cassandra-dtest\dtest.py", line 532, in tearDown
>     raise AssertionError('Unexpected error in %s node log: %s' % (node.name, 
> errors))
> AssertionError: Unexpected error in node1 node log: ['ERROR [main] 2015-08-17 
> 15:42:07,717 NoSpamLogger.java:97 - This platform does not support atomic 
> directory streams (SecureDirectoryStream); race conditions when loading 
> sstable files could occurr', 'ERROR [main] 2015-08-17 15:50:43,978 
> NoSpamLogger.java:97 - This platform does not support atomic directory 
> streams (SecureDirectoryStream); race conditions when loading sstable files 
> could occurr']
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
