[ 
https://issues.apache.org/jira/browse/CASSANDRA-13407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15959301#comment-15959301
 ] 

Joel Knighton commented on CASSANDRA-13407:
-------------------------------------------

For posterity, this is the race possible when the Gossiper is started, as far 
as I can tell.

In setup, we initialize a fake ring using Util.createInitialRing. This will 
initialize the nodes in an unsafe manner and then inject the token states. If a 
status check runs before the token state is set, the previously decommissioned 
node will look like a fat client, since it won't have tokens and will not have 
a DEAD_STATE. Since we aren't gossiping, we won't have heard from it for longer 
than fatClientTimeout, so we'll remove it. If this races with the ss.onChange 
in createInitialRing, we can remove the endpoint state while processing it, 
which will cause the NPE shown in the issue's trace. This race can be seen at 
16:15:51,205 in the log linked from the test failure.
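
To make the interleaving concrete, here is a minimal sketch that replays it 
sequentially instead of with real threads. It is not Cassandra code: 
EndpointInfo, statusCheck, getHostId, replay, the 30 second timeout value and 
the address are all hypothetical stand-ins chosen for illustration, and the 
real Gossiper and StorageService logic is considerably more involved.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class FatClientRaceSketch {
    // Hypothetical stand-in for per-endpoint gossip state (not Cassandra's EndpointState).
    static final class EndpointInfo {
        volatile boolean hasTokens;
        final boolean deadState;
        final long lastHeardMillis;
        final String hostId;

        EndpointInfo(boolean hasTokens, boolean deadState, long lastHeardMillis, String hostId) {
            this.hasTokens = hasTokens;
            this.deadState = deadState;
            this.lastHeardMillis = lastHeardMillis;
            this.hostId = hostId;
        }
    }

    // Assumed timeout value, for illustration only.
    static final long FAT_CLIENT_TIMEOUT_MILLIS = 30_000;

    static final Map<String, EndpointInfo> endpointStates = new ConcurrentHashMap<>();

    // A node with no tokens and no dead state looks like a fat client.
    static boolean looksLikeFatClient(EndpointInfo s) {
        return !s.hasTokens && !s.deadState;
    }

    // Periodic status check: evict fat clients that have been silent for too long.
    static void statusCheck(long nowMillis) {
        endpointStates.entrySet().removeIf(e ->
            looksLikeFatClient(e.getValue())
                && nowMillis - e.getValue().lastHeardMillis > FAT_CLIENT_TIMEOUT_MILLIS);
    }

    // Unguarded lookup: throws NullPointerException if the entry was evicted.
    static String getHostId(String endpoint) {
        return endpointStates.get(endpoint).hostId;
    }

    // Replays one ordering of the two steps; returns true if the NPE occurs.
    static boolean replay(boolean tokensSetBeforeStatusCheck) {
        endpointStates.clear();
        String endpoint = "127.0.0.6";
        EndpointInfo info = new EndpointInfo(false, false, 0L, "host-6");
        endpointStates.put(endpoint, info);

        if (tokensSetBeforeStatusCheck) {
            info.hasTokens = true; // token state injected first: node is not a fat client
        }
        // No gossip has happened, so the node has been "silent" past the timeout.
        statusCheck(FAT_CLIENT_TIMEOUT_MILLIS + 1);

        try {
            getHostId(endpoint); // stands in for the onChange -> getHostId path
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("status check first, NPE: " + replay(false));
        System.out.println("tokens set first, NPE:   " + replay(true));
    }
}
```

In this sketch the failure depends only on whether the status check runs 
before the token state is set, which is why not starting the Gossiper in the 
test removes the race.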

We also need to remove SchemaLoader.loadSchema() as you did in the patch - this 
is because it starts the Gossiper as well. This is fine; we don't appear to 
need it.

The patch looks good - the race exists in theory on 2.1/2.2, but it appears to 
only manifest on 3.0+. I don't think it is worth committing to 2.1 for that 
reason - let's do 2.2+ forward and run the test at least once on each branch 
before committing.



> test failure at RemoveTest.testBadHostId
> ----------------------------------------
>
>                 Key: CASSANDRA-13407
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13407
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Alex Petrov
>            Assignee: Alex Petrov
>
> Example trace:
> {code}
> java.lang.NullPointerException
>       at org.apache.cassandra.gms.Gossiper.getHostId(Gossiper.java:881)
>       at org.apache.cassandra.gms.Gossiper.getHostId(Gossiper.java:876)
>       at org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:2201)
>       at org.apache.cassandra.service.StorageService.onChange(StorageService.java:1855)
>       at org.apache.cassandra.Util.createInitialRing(Util.java:216)
>       at org.apache.cassandra.service.RemoveTest.setup(RemoveTest.java:89)
> {code} 
> [failure example|https://cassci.datastax.com/job/trunk_testall/1491/testReport/org.apache.cassandra.service/RemoveTest/testBadHostId/]
> [history|https://cassci.datastax.com/job/trunk_testall/lastCompletedBuild/testReport/org.apache.cassandra.service/RemoveTest/testBadHostId/history/]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
