[ https://issues.apache.org/jira/browse/CASSANDRA-18319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17700391#comment-17700391 ]

Raymond Huffman commented on CASSANDRA-18319:
---------------------------------------------

I will add that the rolling restart is necessary to get the cluster into this
state. Because the restarts are staggered, each node's FatClient timeout for
the old IP triggers at a different time, and the subsequent gossip quarantine
for the old IP on each node also ends at a different time. This produces logs
like the following; note the difference in the times at which the FatClient is
removed.

(127.0.0.1)
INFO [GossipTasks:1] 2023-03-14 19:21:32,223 Gossiper.java:933 - FatClient /127.0.0.6 has been silent for 30000ms, removing from gossip
DEBUG [GossipTasks:1] 2023-03-14 19:22:32,239 Gossiper.java:961 - 60000 elapsed, /127.0.0.6 gossip quarantine over
INFO [GossipStage:1] 2023-03-14 19:22:36,042 Gossiper.java:1199 - Node /127.0.0.6 is now part of the cluster

(127.0.0.2)
INFO [GossipTasks:1] 2023-03-14 19:21:40,138 Gossiper.java:933 - FatClient /127.0.0.6 has been silent for 30000ms, removing from gossip
DEBUG [GossipTasks:1] 2023-03-14 19:22:40,155 Gossiper.java:961 - 60000 elapsed, /127.0.0.6 gossip quarantine over
INFO [GossipStage:1] 2023-03-14 19:22:41,243 Gossiper.java:1199 - Node /127.0.0.6 is now part of the cluster

(127.0.0.3)
INFO [GossipTasks:1] 2023-03-14 19:21:54,060 Gossiper.java:933 - FatClient /127.0.0.6 has been silent for 30000ms, removing from gossip
DEBUG [GossipTasks:1] 2023-03-14 19:22:54,081 Gossiper.java:961 - 60000 elapsed, /127.0.0.6 gossip quarantine over
INFO [GossipStage:1] 2023-03-14 19:22:55,161 Gossiper.java:1199 - Node /127.0.0.6 is now part of the cluster

(127.0.0.4)
INFO [GossipTasks:1] 2023-03-14 19:22:08,209 Gossiper.java:933 - FatClient /127.0.0.6 has been silent for 30000ms, removing from gossip
DEBUG [GossipTasks:1] 2023-03-14 19:23:08,229 Gossiper.java:961 - 60000 elapsed, /127.0.0.6 gossip quarantine over
INFO [GossipStage:1] 2023-03-14 19:23:08,252 Gossiper.java:1199 - Node /127.0.0.6 is now part of the cluster

(127.0.0.5)
INFO [GossipTasks:1] 2023-03-14 19:22:22,226 Gossiper.java:933 - FatClient /127.0.0.6 has been silent for 30000ms, removing from gossip
DEBUG [GossipTasks:1] 2023-03-14 19:23:22,247 Gossiper.java:961 - 60000 elapsed, /127.0.0.6 gossip quarantine over
INFO [GossipStage:1] 2023-03-14 19:23:25,172 Gossiper.java:1199 - Node /127.0.0.6 is now part of the cluster
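
To make the skew concrete, here is a minimal, self-contained Java sketch (illustrative only, not Cassandra code; the restart offsets are read off the timestamps above, and the 30000 ms / 60000 ms constants are the FatClient timeout and quarantine delay visible in the logs). It simply prints each node's removal and quarantine-expiry instants, which shows how far apart the nodes' expiration times for the old IP end up.

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of the per-node FatClient timeout and gossip quarantine, using the
// constants visible in the logs above. Not Cassandra code.
public class QuarantineSkew
{
    static final long FAT_CLIENT_TIMEOUT_MS = 30_000; // "silent for 30000ms"
    static final long QUARANTINE_DELAY_MS   = 60_000; // "60000 elapsed ... quarantine over"

    public static void main(String[] args)
    {
        // Approximate offsets (ms) at which each node last heard from the old IP,
        // i.e. when its silence timer started; staggered by the rolling restart
        // and read off the timestamps in the logs above.
        Map<String, Long> silenceStart = new LinkedHashMap<>();
        silenceStart.put("127.0.0.1", 0L);
        silenceStart.put("127.0.0.2", 8_000L);
        silenceStart.put("127.0.0.3", 22_000L);
        silenceStart.put("127.0.0.4", 36_000L);
        silenceStart.put("127.0.0.5", 50_000L);

        silenceStart.forEach((node, start) -> {
            long removedAt        = start + FAT_CLIENT_TIMEOUT_MS;
            long quarantineEndsAt = removedAt + QUARANTINE_DELAY_MS;
            System.out.printf("%s removes old IP at t=%ds, quarantine over at t=%ds%n",
                              node, removedAt / 1000, quarantineEndsAt / 1000);
        });
    }
}
{code}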

> Cassandra in Kubernetes: IP switch decommission issue
> -----------------------------------------------------
>
>                 Key: CASSANDRA-18319
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18319
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Ines Potier
>            Priority: Normal
>         Attachments: 3.11_gossipinfo.zip, node1_gossipinfo.txt, 
> test_decommission_after_ip_change_logs.zip
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> We have recently encountered a recurring old IP reappearance issue while 
> testing decommissions on some of our Kubernetes Cassandra staging clusters.
> *Issue Description*
> In Kubernetes, a Cassandra node can change IP at each pod bounce. We have
> noticed that this behavior, combined with a decommission operation, can leave
> the cluster in an erroneous state.
> Consider the following situation: a Cassandra node {{node1}}, with
> {{hostId1}}, owning 20.5% of the token ring, bounces and switches IP
> ({{old_IP}} → {{new_IP}}). After a couple of gossip iterations, every other
> node’s nodetool status output includes a {{new_IP}} UN entry owning 20.5% of
> the token ring and no {{old_IP}} entry.
> Shortly after the bounce, {{node1}} gets decommissioned. Our cluster does not
> have much data, so the decommission operation completes quickly. Logs on the
> other nodes acknowledge that {{node1}} has left, and soon the {{new_IP}} UL
> entry disappears from nodetool status. {{node1}}’s pod is deleted.
> About a minute later, the cluster enters the erroneous state. An {{old_IP}}
> DN entry reappears in nodetool status, owning 20.5% of the token ring. No
> node owns this IP anymore, and according to the logs, {{old_IP}} is still
> associated with {{hostId1}}.
> *Issue Root Cause*
> By digging through Cassandra logs and re-testing this scenario repeatedly, we
> have reached the following conclusions:
>  * Other nodes continue exchanging gossip about {{old_IP}}, even after it
> becomes a fatClient.
>  * The fatClient timeout and subsequent quarantine do not stop {{old_IP}}
> from reappearing in a node’s Gossip state once its quarantine is over. We
> believe this is because the nodes’ expiration times for {{old_IP}} are not
> aligned.
>  * Once {{new_IP}} has left the cluster and a node receives {{old_IP}}’s next
> gossip state message, StorageService no longer faces a collision (or faces
> one only with an even older IP) for {{hostId1}} and its corresponding tokens.
> As a result, {{old_IP}} regains ownership of 20.5% of the token ring, as the
> sketch below illustrates.
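> The last bullet can be pictured with a small, hypothetical sketch (invented
> names such as {{claimWins}}; the real resolution happens inside
> StorageService when it processes gossip state, and is more involved): while
> {{new_IP}} is still the ring member holding {{hostId1}}, a claim from the
> older {{old_IP}} loses the comparison, but once {{new_IP}} has left there is
> no competing claimant and the stale claim goes through.
> {code:java}
> import java.net.InetAddress;
> import java.util.Map;
> import java.util.UUID;
>
> // Simplified, hypothetical stand-in for the hostId collision check; this is
> // not actual StorageService code, only an illustration of the described race.
> final class OwnershipResolution
> {
>     /**
>      * Decide whether a gossiped token claim from 'claimant' for 'hostId' wins.
>      * currentOwners maps hostId -> the endpoint currently owning it in the ring;
>      * startupOf gives a startup-time ordering (larger means started later).
>      */
>     static boolean claimWins(InetAddress claimant, UUID hostId,
>                              Map<UUID, InetAddress> currentOwners,
>                              Map<InetAddress, Long> startupOf)
>     {
>         InetAddress current = currentOwners.get(hostId);
>         if (current == null || current.equals(claimant))
>             return true; // no collision: after new_IP left, old_IP regains the tokens
>         // Collision: the endpoint that started more recently keeps the hostId,
>         // so old_IP loses only while new_IP (or a newer holder) is still present.
>         return startupOf.get(claimant) > startupOf.get(current);
>     }
> }
> {code}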
> *Proposed fix*
> Following the above investigation, we have been considering the following fix:
> When a node receives a gossip status change with {{STATE_LEFT}} for a leaving
> endpoint {{new_IP}}, before evicting {{new_IP}} from the token ring, purge
> from Gossip (i.e. {{evictFromMembership}}) all endpoints that meet the
> following criteria:
>  * {{endpointStateMap}} contains this endpoint
>  * The endpoint is not currently a token owner
> ({{!tokenMetadata.isMember(endpoint)}})
>  * The endpoint’s {{hostId}} matches the {{hostId}} of {{new_IP}}
>  * The endpoint is older than the leaving {{new_IP}}
> ({{Gossiper.instance.compareEndpointStartup}})
>  * The endpoint’s token range (from {{endpointStateMap}}) intersects with
> {{new_IP}}’s
> This modification’s intention is to force nodes to realign on {{old_IP}}
> expiration, and to expunge it from Gossip so it does not reappear after
> {{new_IP}} leaves the ring; the sketch below illustrates the check.
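> As a rough illustration only (this is not a patch; {{EndpointInfo}} and
> {{staleEndpointsToPurge}} are invented names, and a real implementation would
> consult {{Gossiper}} and {{TokenMetadata}} directly), the criteria above could
> be expressed as a filter like this:
> {code:java}
> import java.net.InetAddress;
> import java.util.ArrayList;
> import java.util.Collections;
> import java.util.List;
> import java.util.Map;
> import java.util.Set;
> import java.util.UUID;
>
> // Hypothetical, simplified view of the gossip state the criteria consult.
> final class EndpointInfo
> {
>     final UUID hostId;        // host id advertised by this endpoint
>     final Set<Long> tokens;   // tokens advertised in its gossip state
>     final long generation;    // proxy for startup time (smaller = older)
>
>     EndpointInfo(UUID hostId, Set<Long> tokens, long generation)
>     {
>         this.hostId = hostId;
>         this.tokens = tokens;
>         this.generation = generation;
>     }
> }
>
> final class StaleEndpointPurge
> {
>     /** Endpoints to evict from membership when STATE_LEFT arrives for leavingEp. */
>     static List<InetAddress> staleEndpointsToPurge(InetAddress leavingEp,
>                                                    Map<InetAddress, EndpointInfo> endpointStateMap,
>                                                    Set<InetAddress> tokenOwners)
>     {
>         List<InetAddress> purge = new ArrayList<>();
>         EndpointInfo leaving = endpointStateMap.get(leavingEp);
>         if (leaving == null)
>             return purge;
>
>         // Iterating over endpointStateMap already enforces criterion 1.
>         for (Map.Entry<InetAddress, EndpointInfo> e : endpointStateMap.entrySet())
>         {
>             InetAddress ep = e.getKey();
>             EndpointInfo info = e.getValue();
>             if (ep.equals(leavingEp))
>                 continue;
>             boolean notATokenOwner = !tokenOwners.contains(ep);                          // criterion 2
>             boolean sameHostId     = info.hostId.equals(leaving.hostId);                 // criterion 3
>             boolean olderStartup   = info.generation < leaving.generation;               // criterion 4
>             boolean tokensOverlap  = !Collections.disjoint(info.tokens, leaving.tokens); // criterion 5
>             if (notATokenOwner && sameHostId && olderStartup && tokensOverlap)
>                 purge.add(ep);
>         }
>         return purge;
>     }
> }
> {code}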
> Another approach we have also been considering is expunging {{old_IP}} at the
> moment StorageService resolves the collision.


