[ https://issues.apache.org/jira/browse/CASSANDRA-16182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208240#comment-17208240 ]

Brandon Williams edited comment on CASSANDRA-16182 at 10/5/20, 6:24 PM:
------------------------------------------------------------------------

bq. I believe the same (that C was still alive)

Then C was dead to C', or the replace would've failed on that node.  That's 
interesting, since C' should have seen C alive via A or B, given that it could 
talk to them.  So you were doing a topology change on a split-brain cluster, 
which is generally ok though not ideal, but an unexpected healing of the 
partition during the operation might produce some weird results. I'm not sure 
what else we can do, but the important thing is that the cluster handled it 
deterministically.


> A replacement node, although it completed bootstrap and joined the ring 
> according to itself, is stuck in the Joining state as per its peers
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-16182
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16182
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Gossip
>            Reporter: Sumanth Pasupuleti
>            Assignee: Sumanth Pasupuleti
>            Priority: Normal
>             Fix For: 3.0.x
>
>
> This issue occurred in a production 3.0.21 cluster.
> Here is what happened:
> # We had, say, a three-node Cassandra cluster with nodes A, B and C
> # C got "terminated" due to a health check failure, and a replacement node 
> C' was launched.
> # C' started bootstrapping data from its neighbors
> # Network flaw: nodes A and B were still able to communicate with the 
> terminated node C, and consequently still saw C as alive.
> # The replacement node C' learnt about C through gossip but was unable to 
> communicate with C and marked C as DOWN.
> # C' completed bootstrapping successfully, and both it and its peers logged 
> the statement "Node C' will complete replacement of C for tokens 
> [-7686143363672898397]"
> # C' logged the statement "Nodes C' and C have the same token 
> -7686143363672898397. C' is the new owner"
> # C' started listening for Thrift and CQL clients
> # Peer nodes A and B logged "Node C' cannot complete replacement of alive 
> node C"
> # A few seconds later, A and B marked C as DOWN
> C' then continued to log the lines below endlessly:
> {code:java}
> Node C is now part of the cluster
> Nodes () and C' have the same token C.  Ignoring -7686143363672898397 (Needs 
> a log statement fix)
> FatClient C has been silent for 30000ms, removing from gossip
> {code}
> My reasoning about what happened: 
> By the time the replacement node C' finished bootstrapping and announced its 
> state as Normal, A and B were still able to communicate with the node being 
> replaced, C (while C' was not able to), and hence rejected C' replacing C. 
> C' does not know this and does not attempt to re-announce its "Normal" 
> state to the rest of the cluster. (Worth noting that A and B marked C as 
> DOWN soon after.)
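> For illustration, here is a minimal sketch of the kind of liveness check A 
> and B would apply on seeing C' claim C's token (the shape, variable names, 
> and surrounding context are illustrative assumptions, not the actual 3.0.21 
> source):
> {code:java}
> // Illustrative sketch, not the real StorageService code: a peer
> // deciding whether the replacement C' may take over C's token.
> InetAddress existing = tokenMetadata.getEndpoint(token); // resolves to C
> if (existing != null && !existing.equals(replacementNode)) // C' != C
> {
>     if (Gossiper.instance.isAlive(existing))
>     {
>         // A and B still see C as alive here, so they reject the
>         // handover and never record C' as the owner of the token.
>         logger.error("Node {} cannot complete replacement of alive node {}",
>                      replacementNode, existing);
>         return;
>     }
> }
> {code}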
> Gossip keeps telling C' to add C to its metadata, and C' keeps evicting C 
> based on the FailureDetector. 
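> To make that cycle concrete, a rough sketch of the fat-client eviction on C' 
> (paraphrasing the general shape of the Gossiper's periodic status check from 
> memory; the field and method names are assumptions, not the exact source):
> {code:java}
> // Illustrative sketch of the eviction loop on C'. Because C never
> // becomes a ring member from C''s perspective, it looks like a "fat
> // client" and is evicted after a period of silence...
> for (InetAddress endpoint : endpointStateMap.keySet())
> {
>     EndpointState state = endpointStateMap.get(endpoint);
>     boolean isMember = tokenMetadata.isMember(endpoint);
>     if (!isMember && now - state.getUpdateTimestamp() > fatClientTimeout)
>     {
>         logger.info("FatClient {} has been silent for {}ms, removing from gossip",
>                     endpoint, fatClientTimeout);
>         removeEndpoint(endpoint); // ...until A or B gossip C right back.
>     }
> }
> {code}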
> Proposed fix:
> When C' is notified through gossip about C, given that both own the same 
> token and that C' has finished bootstrapping, C' can emit its Normal state 
> again, which should fix this in my opinion (so long as A and B have marked C 
> as DOWN, which they eventually did).
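> A rough sketch of what that re-emission could look like (the hook and guard 
> names here are hypothetical; setGossipTokens() is the real method named in 
> the call chain below):
> {code:java}
> // Hypothetical hook for the proposed fix: on learning via gossip of an
> // endpoint that owns our token after bootstrap has completed,
> // re-broadcast our own tokens and Normal status.
> private void maybeReannounceNormal(InetAddress endpoint, Token token)
> {
>     Collection<Token> localTokens = getLocalTokens();
>     if (localTokens.contains(token) && !isBootstrapMode())
>     {
>         // Re-emit TOKENS and STATUS so peers that earlier rejected the
>         // replacement (A and B) learn that C' owns the token.
>         setGossipTokens(localTokens);
>     }
> }
> {code}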
> I ended up manually fixing this by restarting Cassandra on C', which forced 
> it to announce its "Normal" state via
> StorageService.initServer --> joinTokenRing() --> finishJoiningRing() --> 
> setTokens() --> setGossipTokens()
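> For reference, setGossipTokens() essentially re-publishes the local TOKENS 
> and STATUS application states through the Gossiper, which is why the restart 
> unstuck the peers (paraphrased from the 3.0 source from memory; check the 
> actual tree for the exact code):
> {code:java}
> public void setGossipTokens(Collection<Token> tokens)
> {
>     List<Pair<ApplicationState, VersionedValue>> states = new ArrayList<>();
>     states.add(Pair.create(ApplicationState.TOKENS, valueFactory.tokens(tokens)));
>     states.add(Pair.create(ApplicationState.STATUS, valueFactory.normal(tokens)));
>     Gossiper.instance.addLocalApplicationStates(states);
> }
> {code}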
> Alternatively, I could possibly have achieved the same behavior by disabling 
> and re-enabling gossip via JMX/nodetool.


