Re: [PR] CASSANDRA-20476 & CASSANDRA-20736 Handle CMS member addresses changing concurrently [cassandra]

via GitHub Mon, 02 Mar 2026 08:07:24 -0800


beobal commented on code in PR #4613:
URL: https://github.com/apache/cassandra/pull/4613#discussion_r2873282343



##########
src/java/org/apache/cassandra/tcm/Startup.java:
##########
@@ -184,6 +239,128 @@ public static void initializeAsNonCmsNode(Function<Processor, Processor> wrapPro
         }
     }
 
+
+    /**
+     * If the broadcast address of this node has changed, we must verify the endpoints it knows for
+     * the members of the CMS are still reachable and valid. This is necessary for the node to submit
+     * a STARTUP transformation which updates its broadcast address in ClusterMetadata.
+     *
+     * If the node is itself a CMS member, it is also a requirement to be able to contact a
+     * majority of the other CMS members in order to perform the serial reads and writes which
+     * constitute committing to and fetching from the distributed metadata log.
+     *
+     * To do this, we use a simple protocol:
+     * 1. For each CMS member in our replayed ClusterMetadata, ping the associated broadcast address
+     *   to query for id of the node at that address. This determines whether the endpoint still
+     *   belongs to that same node (which is/was a CMS member).
+     * 2. While we don't have confirmed current addresses for a majority of CMS nodes:
+     * 2a. Run discovery to locate as many peer addresses as possible.
+     * 2b. Query every discovered endpoint and ask for its node id.
+     * If we still don't have confirmed addresses for a majority of CMS members, go to 2a and
+     * repeat as peers may themselves still be starting up and so may have become discoverable.
+     *
+     * This process builds up a mapping of id -> current address for CMS members which can then be
+     * used to construct a set of temporary redirects between addresses according to ClusterMetadata
+     * and the newly discovered ones.
+     *
+     * As each CMS node with a changed address goes through the startup process, it will commit its
+     * STARTUP transformation and the new broadcast address will be found in ClusterMetadata. A log
+     * listener is used to react to these transformations by removing redundant address overrides
+     * as they are enacted.
+     *
+     * @param nodeId derived from the persisted id of this node from the system.peers table
+     * @param replayed current ClusterMetadata after replaying the metadata log for startup
+     */
+    private static ClusterMetadata initializeCMSLookup(NodeId nodeId, ClusterMetadata replayed)
+    {
+        InetAddressAndPort oldAddress = replayed.directory.endpoint(nodeId);
+        InetAddressAndPort newAddress = FBUtilities.getBroadcastAddressAndPort();
+        if (newAddress.equals(oldAddress))
+            return replayed;
+
+        Map<NodeId, InetAddressAndPort> previousCMS = new HashMap<>();
+        replayed.fullCMSMemberIds().forEach(id -> previousCMS.put(id, replayed.directory.endpoint(id)));
+        Map<NodeId, InetAddressAndPort> confirmedCMS = new HashMap<>();
+
+        Set<InetAddressAndPort> candidates = new HashSet<>(previousCMS.values());
+        candidates.add(newAddress);
+
+        int maxRounds = 5;
+        int currentRound = 0;
+        long roundTimeNanos = Math.min(TimeUnit.SECONDS.toNanos(4),
+                                       DatabaseDescriptor.getDiscoveryTimeout(TimeUnit.NANOSECONDS) / maxRounds);
+        // TODO a non-CMS node only needs to be able to contact a single CMS member to commit its STARTUP

Review Comment:
   I think discovering the full CMS in the common case is not really a problem.
   To the second point, I don't think it makes an awful lot of difference, but
once we are sure we know a majority of CMS member addresses, isn't it better to
start trying to commit the Startup? If we wait for confirmed addresses for the
whole CMS, we could end up being blocked unnecessarily by a single DOWN CMS
member.
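
   To illustrate the majority-based stopping condition, here is a minimal, self-contained sketch of the protocol described in the Javadoc above. It is not the PR's code: `CmsLookupSketch`, `majority`, `confirm`, `ping` and `discover` are hypothetical names, node ids and addresses are plain strings, and per-round timeouts are left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Supplier;

// Illustrative only: none of these names exist in Cassandra.
public class CmsLookupSketch
{
    // Smallest number of members that forms a strict majority of the CMS.
    public static int majority(int cmsSize)
    {
        return cmsSize / 2 + 1;
    }

    /**
     * @param previousCMS node id -> address according to the replayed metadata
     * @param ping        asks an address for the id of the node living there (empty if unreachable)
     * @param discover    returns the peer addresses found by one discovery round
     * @param maxRounds   upper bound on discovery rounds
     * @return node id -> confirmed current address, for as many CMS members as were found
     */
    public static Map<String, String> confirm(Map<String, String> previousCMS,
                                              Function<String, Optional<String>> ping,
                                              Supplier<Set<String>> discover,
                                              int maxRounds)
    {
        Map<String, String> confirmed = new HashMap<>();

        // Step 1: check whether each previously known address still belongs to the same node.
        previousCMS.forEach((id, address) -> {
            if (ping.apply(address).equals(Optional.of(id)))
                confirmed.put(id, address);
        });

        // Step 2: run discovery rounds only until a majority is confirmed,
        // so a single unreachable member cannot block progress.
        int needed = majority(previousCMS.size());
        for (int round = 0; round < maxRounds && confirmed.size() < needed; round++)
        {
            for (String address : discover.get())
            {
                Optional<String> id = ping.apply(address);
                if (id.isPresent() && previousCMS.containsKey(id.get()))
                    confirmed.put(id.get(), address);
            }
        }
        return confirmed;
    }
}
```

   With a three-member CMS where one member has moved and another is down, this loop returns as soon as two addresses are confirmed instead of spending every round waiting for the third.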



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

