[jira] [Commented] (CASSANDRA-16588) NPE getting host_id in Gossiper.isSafeForStartup

Matt Fleming (Jira) Fri, 16 Apr 2021 09:07:14 -0700


    [ https://issues.apache.org/jira/browse/CASSANDRA-16588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17323917#comment-17323917 ]


Matt Fleming commented on CASSANDRA-16588:
------------------------------------------

I think the patch from Sam is a bit too aggressive: it will incorrectly treat 
gossip data for the local node that contains a dead state ("left", 
"removing", "hibernate", etc.) as the bad ACK that we're trying to detect to 
avoid the NPE in isSafeForStartup. You should be able to trigger this by 
assassinating a non-seed node in a cluster.

We should probably filter out deadStates because they won't trigger the NPE.

Something like this: 
https://github.com/mfleming/cassandra/commit/e68602ae300e6a2567e1b59efa4229ff3456e521
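
To illustrate the filtering idea, here is a minimal standalone sketch. It is 
not the code from the linked commit and does not use Cassandra's classes: the 
class name, the isSuspectAck method, and the map of endpoint to status string 
are all hypothetical, and the dead-state set holds only the three states named 
above (the real list in Cassandra may contain more, with different spellings).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; none of these names come from Cassandra itself.
final class DeadStateFilterSketch {
    // Only the states named in the comment above; assumed, not exhaustive.
    static final Set<String> DEAD_STATES = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList("left", "removing", "hibernate")));

    /**
     * Returns true when an ACK seen during the shadow round should be treated
     * as the stale ("bad") ACK: it carries state for the local endpoint and
     * that state is not one of the dead states.
     *
     * statusByEndpoint is an assumed simplification of the gossip state map:
     * endpoint address -> STATUS value as a plain string.
     */
    static boolean isSuspectAck(String localEndpoint, Map<String, String> statusByEndpoint) {
        String localStatus = statusByEndpoint.get(localEndpoint);
        if (localStatus == null) {
            // No state for the local node: nothing to flag.
            return false;
        }
        // Dead states are filtered out because they won't trigger the NPE.
        return !DEAD_STATES.contains(localStatus);
    }
}
```

With this shape, an ACK carrying "shutdown" for the local node is still 
flagged, while one carrying "left" (for example after an assassinate) is let 
through.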

> NPE getting host_id in Gossiper.isSafeForStartup
> ------------------------------------------------
>
>                 Key: CASSANDRA-16588
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16588
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Gossip
>            Reporter: Brandon Williams
>            Assignee: Brandon Williams
>            Priority: Normal
>             Fix For: 3.11.x, 4.0-rc
>
>
> As seen here: 
> https://ci-cassandra.apache.org/job/Cassandra-devbranch/604/testReport/junit/org.apache.cassandra.distributed.upgrade/MixedModeGossipTest/testStatusFieldShouldExistInOldVersionNodesEdgeCase/
> {noformat}
> java.lang.NullPointerException
>       at org.apache.cassandra.gms.Gossiper.isSafeForStartup(Gossiper.java:952)
>       at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:657)
>       at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:933)
>       at org.apache.cassandra.service.StorageService.initServer(StorageService.java:784)
>       at org.apache.cassandra.service.StorageService.initServer(StorageService.java:729)
>       at org.apache.cassandra.distributed.impl.Instance.lambda$startup$10(Instance.java:541)
>       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>       at java.lang.Thread.run(Thread.java:748)
> {noformat}
> I believe what is happening is that a GossipDigestAck has been queued on the 
> seed to ack the node's shutdown state, but isn't actually sent until the 
> node has restarted and entered its shadow round.  Since the ack contains the 
> node's IP, isSafeForStartup assumes a host_id will be present, but because 
> this is not an actual shadow round response, it is not.



