[ https://issues.apache.org/jira/browse/CASSANDRA-19221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sam Tunnicliffe updated CASSANDRA-19221:
----------------------------------------
    Reviewers: Sam Tunnicliffe, Sam Tunnicliffe Sam Tunnicliffe, Sam Tunnicliffe  (was: Sam Tunnicliffe)
       Status: Review In Progress  (was: Patch Available)

> CMS: Nodes can restart with new ipaddress already defined in the cluster
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-19221
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19221
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Transactional Cluster Metadata
>            Reporter: Paul Chandler
>            Assignee: Alex Petrov
>            Priority: Normal
>             Fix For: 5.1-alpha1
>
>         Attachments: ci_summary.html
>
>
> I am simulating running a cluster in Kubernetes and testing what happens when
> several pods go down and ip addresses are swapped between nodes. In 4.0 this
> is blocked and the node cannot be restarted.
> To simulate this I create a 3 node cluster on a local machine using 3
> loopback addresses
> {code}
> 127.0.0.1
> 127.0.0.2
> 127.0.0.3
> {code}
> The nodes are created correctly and the first node is assigned as a CMS node
> as shown:
> {code}
> bin/nodetool -p 7199 describecms
> {code}
> Cluster Metadata Service:
> {code}
> Members: /127.0.0.1:7000
> Is Member: true
> Service State: LOCAL
> {code}
> At this point I bring down the nodes 127.0.0.2 and 127.0.0.3 and swap the ip
> addresses for the rpc_address and listen_address
>
> The nodes come back as normal, but the nodeid has now been swapped against
> the ip address:
> Before:
> {code}
> Datacenter: datacenter1
> =======================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address    Load        Tokens  Owns (effective)  Host ID                               Rack
> UN  127.0.0.3  75.2 KiB    16      76.0%             6d194555-f6eb-41d0-c000-000000000003  rack1
> UN  127.0.0.2  86.77 KiB   16      59.3%             6d194555-f6eb-41d0-c000-000000000002  rack1
> UN  127.0.0.1  80.88 KiB   16      64.7%             6d194555-f6eb-41d0-c000-000000000001  rack1
> {code}
> After:
> {code}
> Datacenter: datacenter1
> =======================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address    Load        Tokens  Owns (effective)  Host ID                               Rack
> UN  127.0.0.3  149.62 KiB  16      76.0%             6d194555-f6eb-41d0-c000-000000000003  rack1
> UN  127.0.0.2  155.48 KiB  16      59.3%             6d194555-f6eb-41d0-c000-000000000002  rack1
> UN  127.0.0.1  75.74 KiB   16      64.7%             6d194555-f6eb-41d0-c000-000000000001  rack1
> {code}
> On previous tests of this I have created a table with a replication factor of
> 1, inserted some data before the swap. After the swap the data on nodes 2
> and 3 is now missing.
> One theory I have is that I am using different port numbers for the different
> nodes, and I am only swapping the ip addresses and not the port numbers, so
> the ip:port still looks unique
> i.e. 127.0.0.2:9043 becomes 127.0.0.2:9044
> and 127.0.0.3:9044 becomes 127.0.0.3:9043
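
The reporter's theory about ip:port uniqueness can be illustrated with a small Python sketch. It is not Cassandra code and makes no claim about how Cassandra or the CMS actually compares endpoints: the function conflicting_hosts and the dictionary layout are hypothetical, and only the host IDs, addresses, and ports are taken from the report. It shows that a check keyed on the IP alone would flag the swap, while a check keyed on the full ip:port pair would not.

```python
# Hypothetical illustration of the reporter's theory; not Cassandra's logic.
# Host IDs, IPs and ports come from the report; everything else is assumed.

# Endpoint owned by each host ID before the restart.
before = {
    "6d194555-f6eb-41d0-c000-000000000002": ("127.0.0.2", 9043),
    "6d194555-f6eb-41d0-c000-000000000003": ("127.0.0.3", 9044),
}

# Only the IP addresses are swapped; each node keeps its own port.
after = {
    "6d194555-f6eb-41d0-c000-000000000002": ("127.0.0.3", 9043),
    "6d194555-f6eb-41d0-c000-000000000003": ("127.0.0.2", 9044),
}


def conflicting_hosts(before, after, key):
    """Return host IDs whose new endpoint, reduced by `key`,
    was previously owned by a different host ID."""
    owners = {key(endpoint): host for host, endpoint in before.items()}
    return sorted(
        host
        for host, endpoint in after.items()
        # An endpoint nobody owned before is treated as non-conflicting.
        if owners.get(key(endpoint), host) != host
    )


# Keyed on IP only: both nodes now claim an address another node held.
by_ip = conflicting_hosts(before, after, key=lambda ep: ep[0])

# Keyed on (ip, port): every new pair is one that no node held before.
by_ip_port = conflicting_hosts(before, after, key=lambda ep: ep)

print(by_ip)
print(by_ip_port)
```

Under these assumptions the IP-only check reports both swapped nodes, and the ip:port check reports none, which matches the "still looks unique" behaviour the reporter suspects. Whether this is the actual cause is not established by the report.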