[ 
https://issues.apache.org/jira/browse/CASSANDRA-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17231934#comment-17231934
 ] 

David Capwell commented on CASSANDRA-16213:
-------------------------------------------

Finished assassinate and made sure to flesh out the different cases I could 
see.  org.apache.cassandra.gms.EndpointState#isEmpty does need to check for 
status in order for assassinate to work with this patch.

If you stop all nodes and bring up all but the host to remove, then assassinate 
the node to remove, it will still be "empty" based off version, but will have a 
status.  If we do not check the status when we check for empty, we would then 
treat this endpoint as normal and move on, which isn't correct as it's in the 
LEFT state.
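
As a rough illustration of the check described above, here is a minimal, 
self-contained sketch. It is not the patch itself and does not use Cassandra's 
classes: EndpointStateSketch, AppState, heartbeatVersion, and both method names 
are hypothetical, and a heartbeat version of 0 is assumed to model "empty based 
off version".

```java
import java.util.EnumMap;
import java.util.Map;

public class EmptyEndpointSketch {
    // Hypothetical subset of gossip application-state keys (illustration only).
    enum AppState { STATUS, TOKENS }

    // Hypothetical stand-in for a gossip endpoint state; NOT Cassandra's EndpointState.
    static final class EndpointStateSketch {
        // Assumption: 0 models an endpoint that looks "empty" based off version.
        final int heartbeatVersion;
        final Map<AppState, String> appStates = new EnumMap<>(AppState.class);

        EndpointStateSketch(int heartbeatVersion) {
            this.heartbeatVersion = heartbeatVersion;
        }

        // Version-only check: cannot tell an assassinated node from an unknown one.
        boolean isEmptyByVersionOnly() {
            return heartbeatVersion == 0;
        }

        // Version plus status: an endpoint that carries a status is not "empty".
        boolean isEmpty() {
            return heartbeatVersion == 0 && !appStates.containsKey(AppState.STATUS);
        }
    }

    public static void main(String[] args) {
        // Endpoint with no heartbeat and no status.
        EndpointStateSketch neverSeen = new EndpointStateSketch(0);

        // Endpoint assassinated while down: no heartbeat, but it has a status.
        EndpointStateSketch assassinated = new EndpointStateSketch(0);
        assassinated.appStates.put(AppState.STATUS, "LEFT");

        System.out.println("neverSeen empty: " + neverSeen.isEmpty());
        System.out.println("assassinated empty (version only): "
                + assassinated.isEmptyByVersionOnly());
        System.out.println("assassinated empty (version + status): "
                + assassinated.isEmpty());
    }
}
```

With the version-only check the assassinated endpoint is reported as empty, 
which is the case described above; including the status in the check keeps it 
from being treated that way.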

 

[~paulo] I added 
org.apache.cassandra.distributed.test.hostreplacement.AssassinatedEmptyNodeTest 
to flesh this case out if you want to take a closer look. EndpointState.isEmpty 
is only used in one spot now since we removed the filter, so I feel it's still 
best to check the state to make sure it is this specific case.

> Cannot replace_address /X because it doesn't exist in gossip
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-16213
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16213
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Gossip, Cluster/Membership
>            Reporter: David Capwell
>            Assignee: David Capwell
>            Priority: Normal
>             Fix For: 4.0-beta
>
>
> We see this exception around nodes crashing and trying to do a host 
> replacement; this error appears to be correlated around multiple node 
> failures.
> A simplified case to trigger this is the following
> *) Have a N node cluster
> *) Shutdown all N nodes
> *) Bring up N-1 nodes (at least 1 seed, else replace seed)
> *) Host replace the N-1th node -> this will fail with the above
> The reason this happens is that the N-1th node isn’t gossiping anymore, and 
> the existing nodes do not have its details in gossip (but have the details in 
> the peers table), so the host replacement fails as the node isn’t known in 
> gossip.
> This affects all versions (tested 3.0 and trunk, assume 2.2 as well)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

