[ 
https://issues.apache.org/jira/browse/CASSANDRA-19221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837763#comment-17837763
 ] 

Alex Petrov edited comment on CASSANDRA-19221 at 4/16/24 3:28 PM:
------------------------------------------------------------------

I've had a closer look at it, and wanted to mention that the 5.0 behaviour is 
most likely unintended; it contains at least one bug, and is potentially 
dangerous. In short, my test was to spin up a 3 node cluster: {{127.0.0.1}}, 
{{127.0.0.2}}, {{127.0.0.3}}, and swap IP addresses for the latter two nodes 
({{.2}} and {{.3}}). As a result of this test, the nodes have in fact swapped 
their IPs, but: 

  * if you shut down {{.2}} and {{.3}}, then start {{.2}} and then {{.3}}, 
{{.3}} startup won't even begin because ccm considers its IP address to be 
occupied, so the entire test can work only if you start the two nodes in 
parallel
  * after swapping IP addresses, ccm breaks, since it searches for an {{UP}} 
message containing a specific IP address for a node, which it doesn't find if 
you merely change the address in the conf file
  * the peers table of the node formerly at {{.2}}, whose address is now 
{{.3}}, will still contain {{.3}}. 

In general, since we are using IP addresses for node identity, I am wary of 
allowing identity transfers for occupied pairs. By this I mean that if an 
{{ip <-> node id}} pair exists in the directory, we have to free up the IP 
address before another node can claim it. So the test would look as follows: 
to swap {{.2}} and {{.3}}, one of the nodes would have to migrate to {{.4}} 
first, and only then can the freed-up IP address be occupied again. 
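
To illustrate the rule above, here is a minimal sketch of a directory that 
refuses to hand an occupied address to a different node. It is a toy model, 
not the Cassandra implementation: the {{Directory}} class, the {{register}} 
method, the {{AddressInUse}} exception and the node names are all hypothetical 
and exist only for this example.

```python
class AddressInUse(Exception):
    """Raised when a node tries to claim an address owned by another node."""


class Directory:
    """Toy ip <-> node id directory (hypothetical, not Cassandra's API)."""

    def __init__(self):
        self._addr_to_node = {}
        self._node_to_addr = {}

    def register(self, node_id, addr):
        # Reject the claim if a *different* node currently owns the address.
        owner = self._addr_to_node.get(addr)
        if owner is not None and owner != node_id:
            raise AddressInUse(f"{addr} is still owned by {owner}")
        # Moving to a new address frees the node's previous one.
        old = self._node_to_addr.get(node_id)
        if old is not None and old != addr:
            del self._addr_to_node[old]
        self._addr_to_node[addr] = node_id
        self._node_to_addr[node_id] = addr

    def address_of(self, node_id):
        return self._node_to_addr[node_id]


def demo():
    d = Directory()
    d.register("node1", "127.0.0.1")
    d.register("node2", "127.0.0.2")
    d.register("node3", "127.0.0.3")

    # A direct swap is rejected: .3 is still occupied by node3.
    try:
        d.register("node2", "127.0.0.3")
        direct_swap_rejected = False
    except AddressInUse:
        direct_swap_rejected = True

    # Swap via a spare address: node2 moves to .4, which frees .2.
    d.register("node2", "127.0.0.4")
    d.register("node3", "127.0.0.2")  # frees .3
    d.register("node2", "127.0.0.3")  # frees .4
    return direct_swap_rejected, d.address_of("node2"), d.address_of("node3")
```

In this sketch, the direct claim of {{.3}} by the second node fails, while the 
three-step move through {{.4}} ends with the two nodes holding each other's 
original addresses.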

Submitting a patch that fixes the peers table behaviour and codifies a 
requirement of a separate node for swapping addresses.


> CMS: Nodes can restart with new ipaddress already defined in the cluster
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-19221
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19221
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Transactional Cluster Metadata
>            Reporter: Paul Chandler
>            Assignee: Alex Petrov
>            Priority: Normal
>             Fix For: 5.1-alpha1
>
>
> I am simulating running a cluster in Kubernetes and testing what happens when 
> several pods go down and IP addresses are swapped between nodes. In 4.0 this 
> is blocked and the node cannot be restarted.
> To simulate this I create a 3 node cluster on a local machine using 3 
> loopback addresses:
> {code}
> 127.0.0.1
> 127.0.0.2
> 127.0.0.3
> {code}
> The nodes are created correctly and the first node is assigned as a CMS node 
> as shown:
> {code}
> bin/nodetool -p 7199 describecms
> {code}
> {code}
> Cluster Metadata Service:
> Members: /127.0.0.1:7000
> Is Member: true
> Service State: LOCAL
> {code}
> At this point I bring down the nodes 127.0.0.2 and 127.0.0.3 and swap the IP 
> addresses for the rpc_address and listen_address.
>  
> The nodes come back as normal, but the node IDs have now been swapped 
> relative to the IP addresses:
> Before:
> {code}
> Datacenter: datacenter1
> =======================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address    Load       Tokens  Owns (effective)  Host ID                               Rack
> UN  127.0.0.3  75.2 KiB   16      76.0%             6d194555-f6eb-41d0-c000-000000000003  rack1
> UN  127.0.0.2  86.77 KiB  16      59.3%             6d194555-f6eb-41d0-c000-000000000002  rack1
> UN  127.0.0.1  80.88 KiB  16      64.7%             6d194555-f6eb-41d0-c000-000000000001  rack1
> {code}
> After:
> {code}
> Datacenter: datacenter1
> =======================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address    Load        Tokens  Owns (effective)  Host ID                               Rack
> UN  127.0.0.3  149.62 KiB  16      76.0%             6d194555-f6eb-41d0-c000-000000000003  rack1
> UN  127.0.0.2  155.48 KiB  16      59.3%             6d194555-f6eb-41d0-c000-000000000002  rack1
> UN  127.0.0.1  75.74 KiB   16      64.7%             6d194555-f6eb-41d0-c000-000000000001  rack1
> {code}
> In previous tests of this I created a table with a replication factor of 1 
> and inserted some data before the swap. After the swap the data on nodes 2 
> and 3 is now missing. 
> One theory I have is that I am using different port numbers for the different 
> nodes, and I am only swapping the IP addresses and not the port numbers, so 
> the ip:port still looks unique,
> i.e. 127.0.0.2:9043 becomes 127.0.0.2:9044
> and 127.0.0.3:9044 becomes 127.0.0.3:9043
>  


