beobal commented on code in PR #4613:
URL: https://github.com/apache/cassandra/pull/4613#discussion_r2873282343
##########
src/java/org/apache/cassandra/tcm/Startup.java:
##########
@@ -184,6 +239,128 @@ public static void initializeAsNonCmsNode(Function<Processor, Processor> wrapPro
}
}
+
+    /**
+     * If the broadcast address of this node has changed, we must verify that the endpoints it knows for
+     * the members of the CMS are still reachable and valid. This is necessary for the node to submit
+     * a STARTUP transformation which updates its broadcast address in ClusterMetadata.
+     *
+     * If the node is itself a CMS member, it must also be able to contact a majority of the other CMS
+     * members in order to perform the serial reads and writes which constitute committing to and
+     * fetching from the distributed metadata log.
+     *
+     * To do this, we use a simple protocol:
+     *   1. For each CMS member in our replayed ClusterMetadata, ping the associated broadcast address
+     *      to query for the id of the node at that address. This determines whether the endpoint still
+     *      belongs to the same node (which is/was a CMS member).
+     *   2. While we don't have confirmed current addresses for a majority of CMS nodes:
+     *      2a. Run discovery to locate as many peer addresses as possible.
+     *      2b. Query every discovered endpoint for its node id.
+     *      If we still don't have confirmed addresses for a majority of CMS members, go to 2a and
+     *      repeat, as peers may themselves still be starting up and so may since have become
+     *      discoverable.
+     *
+     * This process builds up a mapping of id -> current address for CMS members, which can then be
+     * used to construct a set of temporary redirects between the addresses recorded in ClusterMetadata
+     * and the newly discovered ones.
+     *
+     * As each CMS node with a changed address goes through the startup process, it will commit its
+     * STARTUP transformation and the new broadcast address will be found in ClusterMetadata. A log
+     * listener is used to react to these transformations by removing redundant address overrides
+     * as they are enacted.
+     *
+     * @param nodeId derived from the persisted id of this node from the system.peers table
+     * @param replayed current ClusterMetadata after replaying the metadata log for startup
+     */
+    private static ClusterMetadata initializeCMSLookup(NodeId nodeId, ClusterMetadata replayed)
+    {
+        InetAddressAndPort oldAddress = replayed.directory.endpoint(nodeId);
+        InetAddressAndPort newAddress = FBUtilities.getBroadcastAddressAndPort();
+        if (newAddress.equals(oldAddress))
+            return replayed;
+
+        Map<NodeId, InetAddressAndPort> previousCMS = new HashMap<>();
+        replayed.fullCMSMemberIds().forEach(id -> previousCMS.put(id, replayed.directory.endpoint(id)));
+        Map<NodeId, InetAddressAndPort> confirmedCMS = new HashMap<>();
+
+        Set<InetAddressAndPort> candidates = new HashSet<>(previousCMS.values());
+        candidates.add(newAddress);
+
+        int maxRounds = 5;
+        int currentRound = 0;
+        long roundTimeNanos = Math.min(TimeUnit.SECONDS.toNanos(4),
+                                       DatabaseDescriptor.getDiscoveryTimeout(TimeUnit.NANOSECONDS) / maxRounds);
+        // TODO a non-CMS node only needs to be able to contact a single CMS member to commit its STARTUP
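A minimal sketch of the confirmation protocol described in the Javadoc above, under simplifying assumptions: node ids are `int`s, addresses are `String`s, and the hypothetical `probe` and `discover` functions stand in for the real messaging and discovery calls. `CmsLookupSketch`, `redirects` and `onAddressEnacted` are illustrative names only, and per-round timeouts are omitted.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical, simplified model of the CMS address confirmation protocol.
final class CmsLookupSketch
{
    // probe: asks whatever node answers at an address for its id (empty if unreachable).
    // discover: returns the peer addresses currently discoverable.
    static Map<Integer, String> confirmCmsAddresses(Map<Integer, String> previousCms,
                                                    String selfAddress,
                                                    Function<String, Optional<Integer>> probe,
                                                    Supplier<Set<String>> discover,
                                                    int maxRounds)
    {
        int majority = previousCms.size() / 2 + 1;
        Map<Integer, String> confirmed = new HashMap<>();
        // Step 1: start from the addresses recorded in the replayed metadata, plus our own.
        Set<String> candidates = new HashSet<>(previousCms.values());
        candidates.add(selfAddress);
        for (int round = 0; round < maxRounds; round++)
        {
            // Steps 1 / 2b: ask every candidate for its id and keep only answers
            // from nodes that were CMS members. (This sketch simply re-probes all candidates.)
            for (String address : candidates)
                probe.apply(address)
                     .filter(previousCms::containsKey)
                     .ifPresent(id -> confirmed.put(id, address));
            if (confirmed.size() >= majority)
                break;
            // Step 2a: peers may still be starting up, so rerun discovery and retry.
            candidates.addAll(discover.get());
        }
        return confirmed;
    }

    // Temporary redirects: recorded (old) address -> confirmed (new) address, for moved members only.
    static Map<String, String> redirects(Map<Integer, String> previousCms, Map<Integer, String> confirmed)
    {
        Map<String, String> overrides = new HashMap<>();
        confirmed.forEach((id, current) -> {
            String recorded = previousCms.get(id);
            if (!current.equals(recorded))
                overrides.put(recorded, current);
        });
        return overrides;
    }

    // Models the log listener: once a STARTUP enacts the new address in the metadata,
    // any override pointing at that address is redundant and can be dropped.
    static void onAddressEnacted(Map<String, String> overrides, String enactedAddress)
    {
        overrides.values().removeIf(enactedAddress::equals);
    }
}
```

The loop stops as soon as a majority is confirmed rather than waiting for every member. A real implementation would also bound each round by time, as `roundTimeNanos` in the diff appears intended to do, instead of retrying immediately.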
Review Comment:
I think discovering the full CMS in the common case is not really a problem.
To the second point, I don't think it makes an awful lot of difference, but
once we are sure we know the addresses of a majority of CMS members, isn't it
better to start trying to commit the Startup? If we wait for confirmation of
addresses for the whole CMS, we could end up being blocked unnecessarily by a
single DOWN CMS member?
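The difference between the two stop conditions can be sketched as follows, using hypothetical names and `int` node ids. It contrasts a majority-based check with waiting for the whole CMS, and encodes the TODO in the diff, under which a non-CMS node would need only one confirmed CMS member.

```java
import java.util.Set;

// Hypothetical sketch contrasting "majority confirmed" with "all confirmed".
final class CmsQuorumSketch
{
    // Smallest number of members forming a majority of a CMS of the given size.
    static int majority(int cmsSize)
    {
        return cmsSize / 2 + 1;
    }

    // Per the TODO: a non-CMS node only needs one reachable CMS member to commit its STARTUP,
    // while a CMS member needs a majority for serial reads and writes on the log.
    static int requiredConfirmations(boolean isCmsMember, int cmsSize)
    {
        return isCmsMember ? majority(cmsSize) : 1;
    }

    // Stop condition suggested in the review: proceed once enough members are confirmed.
    static boolean canProceed(Set<Integer> cms, Set<Integer> confirmed, boolean isCmsMember)
    {
        long known = cms.stream().filter(confirmed::contains).count();
        return known >= requiredConfirmations(isCmsMember, cms.size());
    }

    // Stricter alternative: wait until every CMS member's address is confirmed.
    static boolean allConfirmed(Set<Integer> cms, Set<Integer> confirmed)
    {
        return confirmed.containsAll(cms);
    }
}
```

With a five-member CMS where one member is DOWN, `allConfirmed` never becomes true, whereas `canProceed` is satisfied for a CMS member once three addresses are confirmed.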
--
This is an automated message from the Apache Git Service.