[jira] [Commented] (CASSANDRA-18319) Cassandra in Kubernetes: IP switch decommission issue

Raymond Huffman (Jira) Wed, 15 Mar 2023 15:41:08 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-18319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17700889#comment-17700889
 ]


Raymond Huffman commented on CASSANDRA-18319:
---------------------------------------------

As one attempt at remediation, I added a call to 
`Gossiper.instance.replacementQuarantine(ep)` in `SS.handleStateNormal`, if a 
hostId collision is detected, but since this only doubles the value of 
QUARANTINE_DELAY, I can still get the issue to appear if I restart the nodes 
more slowly during the rolling restart. Presumably there is some value for 
QUARANTINE_DELAY that is large enough, but it would probably need to scale with 
the number of nodes in the cluster.

> Cassandra in Kubernetes: IP switch decommission issue
> -----------------------------------------------------
>
>                 Key: CASSANDRA-18319
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18319
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Ines Potier
>            Priority: Normal
>         Attachments: 3.11_gossipinfo.zip, node1_gossipinfo.txt, 
> test_decommission_after_ip_change_logs.zip, 
> v4.0_1678853171792_test_decommission_after_ip_change.zip
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> We have recently encountered a recurring old IP reappearance issue while 
> testing decommissions on some of our Kubernetes Cassandra staging clusters.
> *Issue Description*
> In Kubernetes, a Cassandra node can change IP at each pod bounce. We have 
> noticed that this behavior, associated with a decommission operation, can get 
> the cluster into an erroneous state.
> Consider the following situation: a Cassandra node {{node1}} , with 
> {{{}hostId1{}}}, owning 20.5% of the token ring, bounces and switches IP 
> ({{{}old_IP{}}} → {{{}new_IP{}}}). After a couple gossip iterations, all 
> other nodes’ nodetool status output includes a {{new_IP}} UN entry owning 
> 20.5% of the token ring and no {{old_IP}} entry.
> Shortly after the bounce, {{node1}} gets decommissioned. Our cluster does not 
> have a lot of data, and the decommission operation completes pretty quickly. 
> Logs on other nodes start showing acknowledgment that {{node1}} has left and 
> soon, nodetool status’ {{new_IP}} UL entry disappears. {{node1}} ‘s pod is 
> deleted.
> After a minute delay, the cluster enters the erroneous state. An  {{old_IP}} 
> DN entry reappears in nodetool status, owning 20.5% of the token ring. No 
> node owns this IP anymore and according to logs, {{old_IP}} is still 
> associated with {{{}hostId1{}}}.
> *Issue Root Cause*
> By digging through Cassandra logs, and re-testing this scenario over and over 
> again, we have reached the following conclusion: 
>  * Other nodes will continue exchanging gossip about {{old_IP}} , even after 
> it becomes a fatClient.
>  * The fatClient timeout and subsequent quarantine does not stop {{old_IP}} 
> from reappearing in a node’s Gossip state, once its quarantine is over. We 
> believe that this is due to a misalignment on all nodes’ {{old_IP}} 
> expiration time.
>  * Once {{new_IP}} has left the cluster, and {{old_IP}} next gossip state 
> message is received by a node, StorageService will no longer face collisions 
> (or will, but with an even older IP) for {{hostId1}} and its corresponding 
> tokens. As a result, {{old_IP}} will regain ownership of 20.5% of the token 
> ring.
> *Proposed fix*
> Following the above investigation, we were thinking about implementing the 
> following fix:
> When a node receives a gossip status change with {{STATE_LEFT}} for a leaving 
> endpoint {{{}new_IP{}}}, before evicting {{{}new_IP from the token ring, 
> purge from Gossip (ie evictFromMembership{}}}) all endpoints that meet the 
> following criteria:
>  * {{endpointStateMap}} contains this endpoint
>  * The endpoint is not currently a token owner 
> ({{{}!tokenMetadata.isMember(endpoint){}}})
>  * The endpoint’s {{hostId}} matches the {{hostId}} of {{new_IP}}
>  * The endpoint is older than {{leaving_IP}} 
> ({{{}Gossiper.instance.compareEndpointStartup{}}})
>  * The endpoint’s token range (from {{{}endpointStateMap{}}}) intersects with 
> {{{}new_IP{}}}’s
> This modification’s intention is to force nodes to realign on {{old_IP}} 
> expiration, and expunge it from Gossip so it does not reappear after 
> {{new_IP}} leaves the ring.
> Another approach we have also been considering is expunging {{old_IP}} at the 
> moment of the StorageService collision resolution.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Commented] (CASSANDRA-18319) Cassandra in Kubernetes: IP switch decommission issue

Reply via email to