[DISCUSS] Gossip shutdown may corrupt peers making it so the cluster never converges, and a small protocol change to fix

David Capwell Fri, 06 Oct 2023 15:50:21 -0700

Just filed https://issues.apache.org/jira/browse/CASSANDRA-18913 (Gossip NPE 
due to shutdown event corrupting empty statuses), which is where I saw this 
issue.


When we do gossip shutdown we send a GOSSIP_SHUTDOWN message, which is then 
handled by org.apache.cassandra.gms.Gossiper#markAsShutdown. There is an issue 
with the current implementation: the peers mutate the state for the node 
shutting down, which causes pending gossip events to get ignored!

A simple example of the issue is the following sequence:

Node1 starts up and starts bootstrapping
Node1 joins the ring
Node1 disables gossip (or halts)

In this case some nodes in the cluster will see node1 join the ring, and 
others won't.  Now, the ones that have seen the gossip shutdown will set the 
version for node1 to Integer.MAX_VALUE, which stops gossip from syncing any 
states they have not yet seen.
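
To make the failure mode concrete, here is a toy model of the version 
comparison. It is only a sketch: GossipVersionSketch, EndpointView, 
wouldRequestUpdate and markAsShutdownLocally are made-up names, not the real 
org.apache.cassandra.gms classes, and it assumes a peer only pulls states that 
are newer than the highest version it already holds for that endpoint.

```java
// Toy model only; none of these names exist in Cassandra.
class GossipVersionSketch {
    // Hypothetical view a peer keeps of another endpoint.
    static final class EndpointView {
        int maxVersion; // highest version this peer has recorded for the endpoint

        EndpointView(int maxVersion) {
            this.maxVersion = maxVersion;
        }
    }

    // Assumption: a peer asks for updates only when the remote side
    // advertises a version higher than what it already holds.
    static boolean wouldRequestUpdate(EndpointView local, int remoteMaxVersion) {
        return remoteMaxVersion > local.maxVersion;
    }

    // Models the peer-side mutation on GOSSIP_SHUTDOWN described above.
    static void markAsShutdownLocally(EndpointView local) {
        local.maxVersion = Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        // This peer saw node1 bootstrapping (version 5) but not the join (version 9).
        EndpointView node1 = new EndpointView(5);
        System.out.println(wouldRequestUpdate(node1, 9)); // still behind, would sync
        markAsShutdownLocally(node1);
        System.out.println(wouldRequestUpdate(node1, 9)); // no version can be newer now
    }
}
```

Once the local version is Integer.MAX_VALUE, no remote version can compare as 
newer, so in this model the join state that was missed is never requested.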

Why is this a problem?  Let's say you now need to host replace node1, and the 
seeds you are using didn't see the join ring event. You then get the following 
error during the host replacement: "Could not find tokens for %s to replace"

To solve this and clean things up, I would like to send the state from the node 
shutting down and avoid peers mutating endpoint states they don’t own; with 
this the cluster should eventually converge!  This would be a protocol change, 
so would need to make sure everyone is cool with me doing this in 5.0.
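
A rough sketch of the idea, not the actual patch: ShutdownStateSketch, 
VersionedValue and applyOwnerState are hypothetical names, and the merge rule 
(keep whichever entry has the higher version) is an assumption made for 
illustration. The point is that the peer applies states carried in the 
shutdown message from the owner instead of inventing a version of its own.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only; these are not Cassandra classes.
class ShutdownStateSketch {
    static final class VersionedValue {
        final String value;
        final int version;

        VersionedValue(String value, int version) {
            this.value = value;
            this.version = version;
        }
    }

    // Merge the states sent by the node that owns them into the local view.
    // An entry is taken only if it is missing locally or has a higher version.
    static void applyOwnerState(Map<String, VersionedValue> local,
                                Map<String, VersionedValue> fromOwner) {
        for (Map.Entry<String, VersionedValue> e : fromOwner.entrySet()) {
            VersionedValue current = local.get(e.getKey());
            if (current == null || e.getValue().version > current.version) {
                local.put(e.getKey(), e.getValue());
            }
        }
    }

    public static void main(String[] args) {
        // Peer that saw bootstrapping but missed the join.
        Map<String, VersionedValue> local = new HashMap<>();
        local.put("STATUS", new VersionedValue("BOOT", 5));

        // State carried by the (hypothetical) shutdown message from node1.
        Map<String, VersionedValue> fromNode1 = new HashMap<>();
        fromNode1.put("TOKENS", new VersionedValue("t1,t2", 9));
        fromNode1.put("STATUS", new VersionedValue("shutdown", 12));

        applyOwnerState(local, fromNode1);
        System.out.println(local.get("TOKENS").value + " " + local.get("STATUS").value);
    }
}
```

In this model a peer that missed the join still ends up with the tokens, 
because they arrive with the shutdown state from node1 itself, and versions 
stay comparable since nobody but the owner sets them.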
