[ https://issues.apache.org/jira/browse/CASSANDRA-18319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702960#comment-17702960 ]

Raymond Huffman commented on CASSANDRA-18319:
---------------------------------------------

Thanks for the response. I tested this a bit more, and I don't think it's just 
log noise: while the cluster is in this state, reads and writes at consistency 
ALL fail. Do you think Ines' suggestion above, purging fat clients that match 
a decommissioning node, could be a reasonable way to avoid getting into this 
state?


> Cassandra in Kubernetes: IP switch decommission issue
> -----------------------------------------------------
>
>                 Key: CASSANDRA-18319
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18319
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Ines Potier
>            Priority: Normal
>         Attachments: 3.11_gossipinfo.zip, node1_gossipinfo.txt, 
> test_decommission_after_ip_change_logs.zip, 
> v4.0_1678853171792_test_decommission_after_ip_change.zip
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> We have recently encountered a recurring old IP reappearance issue while 
> testing decommissions on some of our Kubernetes Cassandra staging clusters.
> *Issue Description*
> In Kubernetes, a Cassandra node can change its IP on each pod bounce. We have 
> noticed that this behavior, combined with a decommission operation, can get 
> the cluster into an erroneous state.
> Consider the following situation: a Cassandra node {{node1}}, with 
> {{hostId1}}, owning 20.5% of the token ring, bounces and switches IP 
> ({{old_IP}} → {{new_IP}}). After a couple of gossip iterations, every other 
> node's nodetool status output includes a {{new_IP}} UN entry owning 20.5% of 
> the token ring and no {{old_IP}} entry.
> Shortly after the bounce, {{node1}} gets decommissioned. Our cluster does not 
> hold much data, and the decommission operation completes quickly. Logs on 
> other nodes start acknowledging that {{node1}} has left, and soon the 
> {{new_IP}} UL entry disappears from nodetool status. {{node1}}'s pod is 
> deleted.
> About a minute later, the cluster enters the erroneous state: an {{old_IP}} 
> DN entry reappears in nodetool status, owning 20.5% of the token ring. No 
> node owns this IP anymore, and according to the logs, {{old_IP}} is still 
> associated with {{hostId1}}.
> *Issue Root Cause*
> By digging through Cassandra logs and re-testing this scenario repeatedly, 
> we have reached the following conclusions:
>  * Other nodes will continue exchanging gossip about {{old_IP}}, even after 
> it becomes a fatClient.
>  * The fatClient timeout and subsequent quarantine do not stop {{old_IP}} 
> from reappearing in a node's Gossip state once its quarantine is over. We 
> believe this is due to a misalignment of {{old_IP}}'s expiration time across 
> nodes (see the sketch after this list).
>  * Once {{new_IP}} has left the cluster and a node receives {{old_IP}}'s 
> next gossip state message, StorageService will no longer face collisions 
> (or will, but with an even older IP) for {{hostId1}} and its corresponding 
> tokens. As a result, {{old_IP}} will regain ownership of 20.5% of the token 
> ring.
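> To make the misalignment concrete, here is a minimal sketch of the 
> quarantine bookkeeping as we understand it (illustrative only, not verbatim 
> Cassandra code; {{justRemovedEndpoints}} and the delay constant only 
> approximate the Gossiper's internals). Because each node stamps the expiry 
> with its own clock, nodes release {{old_IP}} at different times and can 
> re-learn it from a peer whose quarantine ended earlier:
> {code:java}
> import java.net.InetSocketAddress;
> import java.util.Map;
> import java.util.concurrent.ConcurrentHashMap;
>
> // Illustrative sketch: each node stamps the quarantine expiry with its OWN
> // clock at the moment it evicts the fat client, so old_IP's expiry is never
> // aligned across the cluster. A node whose quarantine ends first can then
> // re-advertise old_IP's state to peers that are still quarantining it.
> class QuarantineSketch
> {
>     static final long QUARANTINE_DELAY_MS = 60_000; // placeholder value
>
>     // endpoint -> local wall-clock time at which its quarantine ends
>     final Map<InetSocketAddress, Long> justRemovedEndpoints = new ConcurrentHashMap<>();
>
>     void quarantineEndpoint(InetSocketAddress endpoint)
>     {
>         // relative to this node's local clock, not a cluster-wide instant
>         justRemovedEndpoints.put(endpoint, System.currentTimeMillis() + QUARANTINE_DELAY_MS);
>     }
>
>     boolean isQuarantined(InetSocketAddress endpoint)
>     {
>         Long expiry = justRemovedEndpoints.get(endpoint);
>         return expiry != null && expiry > System.currentTimeMillis();
>     }
> }
> {code}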
> *Proposed fix*
> Following the above investigation, we were thinking about implementing the 
> following fix:
> When a node receives a gossip status change with {{STATE_LEFT}} for a leaving 
> endpoint {{new_IP}}, before evicting {{new_IP}} from the token ring, purge 
> from Gossip (i.e. {{evictFromMembership}}) all endpoints that meet the 
> following criteria:
>  * {{endpointStateMap}} contains this endpoint
>  * The endpoint is not currently a token owner 
> ({{!tokenMetadata.isMember(endpoint)}})
>  * The endpoint's {{hostId}} matches the {{hostId}} of {{new_IP}}
>  * The endpoint is older than the leaving endpoint {{new_IP}} 
> ({{Gossiper.instance.compareEndpointStartup}})
>  * The endpoint's token range (from {{endpointStateMap}}) intersects with 
> {{new_IP}}'s
> This modification's intention is to force nodes to realign on {{old_IP}}'s 
> expiration and expunge it from Gossip so that it does not reappear after 
> {{new_IP}} leaves the ring. A rough sketch of this purge follows.
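> The sketch below expresses the criteria above as a filter over the gossip 
> endpoint state map. All helpers ({{endpointStateMap()}}, 
> {{hostIdFromGossip}}, {{isTokenOwner}}, {{compareStartup}}, 
> {{tokensIntersect}}, {{evictFromMembership}}) are hypothetical stand-ins 
> with assumed shapes for the real Gossiper/TokenMetadata calls named above, 
> not existing Cassandra APIs:
> {code:java}
> import java.net.InetSocketAddress;
> import java.util.Map;
> import java.util.UUID;
>
> // Sketch only: the purge criteria expressed as a filter over the gossip
> // endpoint state map. Every helper is an assumed stand-in, not a real API.
> abstract class LeavePurgeSketch
> {
>     abstract Map<InetSocketAddress, Object> endpointStateMap();   // gossip states
>     abstract UUID hostIdFromGossip(InetSocketAddress ep);         // hostId lookup
>     abstract boolean isTokenOwner(InetSocketAddress ep);          // ~tokenMetadata.isMember
>     abstract int compareStartup(InetSocketAddress a, InetSocketAddress b); // ~compareEndpointStartup
>     abstract boolean tokensIntersect(InetSocketAddress a, InetSocketAddress b);
>     abstract void evictFromMembership(InetSocketAddress ep);      // purge from Gossip
>
>     void purgeStaleEndpointsFor(InetSocketAddress leaving /* new_IP */)
>     {
>         UUID leavingHostId = hostIdFromGossip(leaving);
>         if (leavingHostId == null)
>             return;                                               // no hostId known for new_IP
>         for (InetSocketAddress ep : endpointStateMap().keySet())
>         {
>             if (ep.equals(leaving)) continue;                     // skip new_IP itself
>             if (isTokenOwner(ep)) continue;                       // still a token owner
>             if (!leavingHostId.equals(hostIdFromGossip(ep))) continue; // different hostId
>             if (compareStartup(ep, leaving) >= 0) continue;       // not older than new_IP
>             if (!tokensIntersect(ep, leaving)) continue;          // disjoint token ranges
>             evictFromMembership(ep);                              // expunge, e.g. old_IP
>         }
>     }
> }
> {code}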
> Another approach we have been considering is expunging {{old_IP}} at the 
> moment of the StorageService collision resolution.


