[ https://issues.apache.org/jira/browse/CASSANDRA-19221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837763#comment-17837763 ]
Alex Petrov edited comment on CASSANDRA-19221 at 4/16/24 3:28 PM:
------------------------------------------------------------------

I've had a closer look at it, and wanted to mention that the 5.0 behaviour is most likely unintended: it contains at least one bug and is potentially dangerous.

In short, my test was to spin up a 3-node cluster ({{127.0.0.1}}, {{127.0.0.2}}, {{127.0.0.3}}) and swap the IP addresses of the latter two nodes ({{.2}} and {{.3}}). As a result, the nodes did in fact swap their IPs, but:
* if you shut down {{.2}} and {{.3}}, then start {{.2}} followed by {{.3}}, startup of {{.3}} won't even begin, because ccm considers its IP address to be occupied; the test as a whole only works if you start the two nodes in parallel
* after the swap, ccm breaks, since it looks for an {{UP}} message carrying a node's specific IP address, which it doesn't find if you merely change the address in the conf file
* the node formerly at {{.2}}, whose address is now {{.3}}, still has {{.3}} in its peers table.

In general, since we use IP addresses for node identity, I am wary of allowing identity transfers for occupied pairs. By this I mean that if an {{ip <-> node id}} pair exists in the directory, the IP address has to be freed up before another node can claim it. The test would therefore look as follows: to swap {{.2}} and {{.3}}, one of the nodes would first have to migrate to {{.4}}, and only then could the freed-up IP address be occupied again.

Submitting a patch that fixes the peers table behaviour and codifies the requirement of a separate node for swapping addresses.
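
A minimal sketch of the rule proposed in the comment, under stated assumptions: the directory keeps a one-to-one {{ip <-> node id}} mapping, refuses to let a node claim an address that another node still holds, and performs a swap by first parking one node on a spare address ({{.4}} in the comment). {{AddressDirectory}}, {{claim}} and {{swap_via_spare}} are hypothetical names for illustration only; this is not Cassandra's actual directory code, and it does not model the peers table. The host IDs are taken from the nodetool output in the report.

```python
import uuid


class AddressDirectory:
    """Hypothetical directory keeping a one-to-one ip <-> node id mapping.

    Illustrative only: not Cassandra's directory implementation.
    """

    def __init__(self):
        self._node_by_addr = {}  # address -> node id
        self._addr_by_node = {}  # node id -> address

    def claim(self, node_id, address):
        """Register node_id at address (or move it there).

        Raises ValueError if a different node still holds the address.
        """
        holder = self._node_by_addr.get(address)
        if holder is not None and holder != node_id:
            raise ValueError(f"{address} is still held by {holder}; free it first")
        previous = self._addr_by_node.get(node_id)
        if previous is not None and previous != address:
            del self._node_by_addr[previous]  # release the node's old address
        self._addr_by_node[node_id] = address
        self._node_by_addr[address] = node_id

    def address_of(self, node_id):
        return self._addr_by_node.get(node_id)

    def holder_of(self, address):
        return self._node_by_addr.get(address)


def swap_via_spare(directory, a, b, spare):
    """Swap the addresses of nodes a and b, parking b on a free spare address first."""
    addr_a, addr_b = directory.address_of(a), directory.address_of(b)
    directory.claim(b, spare)   # b frees addr_b
    directory.claim(a, addr_b)  # a takes addr_b, freeing addr_a
    directory.claim(b, addr_a)  # b takes addr_a, freeing the spare


if __name__ == "__main__":
    n2 = uuid.UUID("6d194555-f6eb-41d0-c000-000000000002")
    n3 = uuid.UUID("6d194555-f6eb-41d0-c000-000000000003")
    d = AddressDirectory()
    d.claim(n2, "127.0.0.2")
    d.claim(n3, "127.0.0.3")
    try:
        d.claim(n2, "127.0.0.3")  # direct takeover of an occupied address
    except ValueError as err:
        print("rejected:", err)
    swap_via_spare(d, n2, n3, "127.0.0.4")
    print(d.address_of(n2), d.address_of(n3))  # 127.0.0.3 127.0.0.2
```

Parking one node on the spare address is what keeps every intermediate state valid: at no point do two nodes map to the same IP, so a direct swap is never needed.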

> CMS: Nodes can restart with new ipaddress already defined in the cluster
> ------------------------------------------------------------------------
>
> Key: CASSANDRA-19221
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19221
> Project: Cassandra
> Issue Type: Bug
> Components: Transactional Cluster Metadata
> Reporter: Paul Chandler
> Assignee: Alex Petrov
> Priority: Normal
> Fix For: 5.1-alpha1
>
>
> I am simulating running a cluster in Kubernetes and testing what happens when
> several pods go down and ip addresses are swapped between nodes. In 4.0 this
> is blocked and the node cannot be restarted.
> To simulate this I create a 3 node cluster on a local machine using 3
> loopback addresses
> {code}
> 127.0.0.1
> 127.0.0.2
> 127.0.0.3
> {code}
> The nodes are created correctly and the first node is assigned as a CMS node
> as shown:
> {code}
> bin/nodetool -p 7199 describecms
> {code}
> Cluster Metadata Service:
> {code}
> Members: /127.0.0.1:7000
> Is Member: true
> Service State: LOCAL
> {code}
> At this point I bring down the nodes 127.0.0.2 and 127.0.0.3 and swap the ip
> addresses for the rpc_address and listen_address
>
> The nodes come back as normal, but the nodeid has now been swapped against
> the ip address:
> Before:
> {code}
> Datacenter: datacenter1
> =======================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address    Load        Tokens  Owns (effective)  Host ID                               Rack
> UN  127.0.0.3  75.2 KiB    16      76.0%             6d194555-f6eb-41d0-c000-000000000003  rack1
> UN  127.0.0.2  86.77 KiB   16      59.3%             6d194555-f6eb-41d0-c000-000000000002  rack1
> UN  127.0.0.1  80.88 KiB   16      64.7%             6d194555-f6eb-41d0-c000-000000000001  rack1
> {code}
> After:
> {code}
> Datacenter: datacenter1
> =======================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address    Load        Tokens  Owns (effective)  Host ID                               Rack
> UN  127.0.0.3  149.62 KiB  16      76.0%             6d194555-f6eb-41d0-c000-000000000003  rack1
> UN  127.0.0.2  155.48 KiB  16      59.3%             6d194555-f6eb-41d0-c000-000000000002  rack1
> UN  127.0.0.1  75.74 KiB   16      64.7%             6d194555-f6eb-41d0-c000-000000000001  rack1
> {code}
> On previous tests of this I have created a table with a replication factor of
> 1, inserted some data before the swap. After the swap the data on nodes 2
> and 3 is now missing.
> One theory I have is that I am using different port numbers for the different
> nodes, and I am only swapping the ip addresses and not the port numbers, so
> the ip:port still looks unique
> i.e.
> 127.0.0.2:9043 becomes 127.0.0.2:9044
> and 127.0.0.3:9044 becomes 127.0.0.3:9043
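
The reporter's port theory can be illustrated with a small, hypothetical sketch: if node identity were keyed on the full {{ip:port}} pair, swapping only the IPs (while each node keeps its port) produces no collision, whereas keying on the IP alone flags both nodes. The node names and the helpers {{by_ip_port}}, {{by_ip}} and {{conflicts}} are illustrative assumptions; the report does not say which Cassandra port 9043 and 9044 refer to, and this is not a claim about how Cassandra actually checks identity.

```python
def by_ip_port(addr):
    """Identity key using the full ip:port pair."""
    ip, port = addr
    return f"{ip}:{port}"


def by_ip(addr):
    """Identity key using the IP address only."""
    ip, _port = addr
    return ip


def conflicts(before, after, key):
    """Return nodes whose new identity key belonged to a different node before."""
    owners = {key(addr): node for node, addr in before.items()}
    return sorted(
        node for node, addr in after.items()
        if owners.get(key(addr), node) != node
    )


# Addresses from the report: only the IPs are swapped, ports stay with each node.
before = {"node2": ("127.0.0.2", 9043), "node3": ("127.0.0.3", 9044)}
after = {"node2": ("127.0.0.3", 9043), "node3": ("127.0.0.2", 9044)}

if __name__ == "__main__":
    print(conflicts(before, after, by_ip_port))  # []
    print(conflicts(before, after, by_ip))       # ['node2', 'node3']
```

Under this assumption, a uniqueness check on {{ip:port}} would see two brand-new endpoints, which is consistent with the swap not being blocked.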