[jira] [Commented] (CASSANDRA-19219) CMS: restarting a CMS node with different ip address

Paul Chandler (Jira) Thu, 21 Dec 2023 04:55:05 -0800


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-19219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17799417#comment-17799417
 ]


Paul Chandler commented on CASSANDRA-19219:
-------------------------------------------

"If you are seeing these happen concurrently (i.e. both nodes go down, then 
come back up with the other's address), we should investigate."

This is what I am seeing at the moment. I will raise a separate Jira for it. I 
have seen the code that I thought would trap it, but it does not seem to be 
catching this example.  

> CMS: restarting a CMS node with different ip address
> ----------------------------------------------------
>
>                 Key: CASSANDRA-19219
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19219
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Transactional Cluster Metadata
>            Reporter: Paul Chandler
>            Priority: Normal
>
> I am simulating running a cluster in Kubernetes and testing what happens when 
> a pod goes down and is re created with a new ip address, the data is all 
> stored on a detached volume so when the new pod is created all the old data 
> for the node is reattached. In 4.0 this is handled correctly the node will 
> come back up with the same hostid, tokens etc, just a new ip address and the 
> cluster is healthy throughout.
>  
> To simulate this I create a 3 node cluster on a local machine using 3 
> loopback addresses
> 127.0.0.1
> 127.0.0.2
> 127.0.0.3
> I then run nodetool -p 7199 reconfigurecms datacenter1:3 --sync to create 3 
> CMS nodes
> I then bring down 127.0.0.1 and replace the rpc_address and listen_address 
> with 127.0.0.4 and re start the node. The node then hangs with this as the 
> last error message:
> (8821185654333640868,9200867415893016118]=ForRange\{lastModified=Epoch{epoch=12},
>  
> endpointsForRange=[Full(/127.0.0.1:7000,(8821185654333640868,9200867415893016118]),
>  Full(/127.0.0.2:7000,(8821185654333640868,9200867415893016118]), 
> Full(/127.0.0.3:7000,(8821185654333640868,9200867415893016118])]},
> }}}, lockedRanges=LockedRanges\{lastModified=Epoch{epoch=14}, locked={}}}. 
> This can mean that this node is configured differently from CMS.
> java.lang.AssertionError: not aware of any cluster members
>         at 
> org.apache.cassandra.locator.NetworkTopologyStrategy.calculateNaturalReplicas(NetworkTopologyStrategy.java:233)
>         at 
> org.apache.cassandra.locator.CMSPlacementStrategy$DatacenterAware.reconfigure(CMSPlacementStrategy.java:119)
>         at 
> org.apache.cassandra.tcm.transformations.cms.PrepareCMSReconfiguration$Complex.execute(PrepareCMSReconfiguration.java:164)
>         at 
> org.apache.cassandra.tcm.log.LocalLog.processPendingInternal(LocalLog.java:429)
>         at 
> org.apache.cassandra.tcm.log.LocalLog$Async$AsyncRunnable.run(LocalLog.java:682)
>         at 
> org.apache.cassandra.concurrent.InfiniteLoopExecutor.loop(InfiniteLoopExecutor.java:121)
>         at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>         at java.base/java.lang.Thread.run(Thread.java:829)
> WARN  [GlobalLogFollower] 2023-12-21 11:11:34,408 LocalLog.java:693 - 
> Stopping log processing on the node... All subsequent epochs will be ignored.
> org.apache.cassandra.tcm.log.LocalLog$StopProcessingException: 
> java.lang.AssertionError: not aware of any cluster members
>         at 
> org.apache.cassandra.tcm.log.LocalLog.processPendingInternal(LocalLog.java:434)
>         at 
> org.apache.cassandra.tcm.log.LocalLog$Async$AsyncRunnable.run(LocalLog.java:682)
>         at 
> org.apache.cassandra.concurrent.InfiniteLoopExecutor.loop(InfiniteLoopExecutor.java:121)
>         at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>         at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: java.lang.AssertionError: not aware of any cluster members
>         at 
> org.apache.cassandra.locator.NetworkTopologyStrategy.calculateNaturalReplicas(NetworkTopologyStrategy.java:233)
>         at 
> org.apache.cassandra.locator.CMSPlacementStrategy$DatacenterAware.reconfigure(CMSPlacementStrategy.java:119)
>         at 
> org.apache.cassandra.tcm.transformations.cms.PrepareCMSReconfiguration$Complex.execute(PrepareCMSReconfiguration.java:164)
>         at 
> org.apache.cassandra.tcm.log.LocalLog.processPendingInternal(LocalLog.java:429)
>         ... 4 common frames omitted



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Commented] (CASSANDRA-19219) CMS: restarting a CMS node with different ip address

Reply via email to