szetszwo commented on code in PR #10488:
URL: https://github.com/apache/ozone/pull/10488#discussion_r3437621520


##########
hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/OzoneConfigKeys.java:
##########
@@ -617,6 +617,17 @@ public final class OzoneConfigKeys {
   public static final boolean OZONE_CLIENT_FAILOVER_RESOLVE_NEEDED_DEFAULT =
           false;
 
+  /**
+   * Number of consecutive heartbeat failures the DN tolerates against the
+   * same SCM endpoint before re-resolving its hostname. Only consulted when
+   * {@link #OZONE_CLIENT_FAILOVER_RESOLVE_NEEDED_KEY} is true. Conservative
+   * default avoids re-resolution on transient blips while still recovering
+   * from a peer pod IP change within seconds.
+   */
+  public static final String OZONE_DN_SCM_HEARTBEAT_REFRESH_THRESHOLD_KEY =
+          "ozone.datanode.scm.heartbeat.address.refresh.threshold";

Review Comment:
   The existing confs have profix "hdds.heartbeat.xxx" and they are in 
HddsConfigKeys.  Please follow the current convention.



##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/EndpointStateMachine.java:
##########
@@ -79,6 +107,59 @@ public EndpointStateMachine(InetSocketAddress address,
             .build());
   }
 
+  /**
+   * The original "host:port" string used to construct {@link #getAddress()}.
+   * @return the host:port string, or null if not preserved at construction
+   */
+  public String getHostAndPort() {
+    return hostAndPort;
+  }
+
+  /**
+   * Re-resolves the configured hostname and reports whether the resolved
+   * IP differs from the cached {@link #address}. Does not mutate any
+   * state on this endpoint -- the caller is responsible for swapping
+   * this endpoint out (via {@link SCMConnectionManager#refreshSCMServer})
+   * if a refresh is desired.
+   * <p>
+   * Returns null when:
+   * <ul>
+   *   <li>{@link #hostAndPort} was not preserved at construction</li>
+   *   <li>the resolved IP is unchanged</li>
+   *   <li>the new resolution is unresolved (DNS lookup failed)</li>
+   * </ul>
+   * Returns the freshly-resolved {@link InetSocketAddress} when the IP
+   * has changed under the same hostname.
+   */
+  public InetSocketAddress resolveLatestAddress() {
+    if (hostAndPort == null) {
+      return null;
+    }
+    InetSocketAddress refreshed;
+    try {
+      refreshed = NetUtils.createSocketAddr(hostAndPort);
+    } catch (IllegalArgumentException ex) {
+      LOG.warn("Failed to re-resolve {}: {}", hostAndPort, ex.getMessage());
+      return null;
+    }
+    if (refreshed.isUnresolved()) {
+      LOG.warn("Re-resolution of {} produced an unresolved address; "
+          + "leaving cached address {} in place.", hostAndPort, address);
+      return null;
+    }
+    // Null-safe comparison: if the cached address itself was unresolved
+    // (initial DNS lookup failed; ctor accepted it with a warning),
+    // address.getAddress() is null and a successful re-resolution
+    // counts as a change. NPE-ing here would wedge the heartbeat
+    // refresh path on exactly the case it most needs to fix.
+    java.net.InetAddress cachedIp = address.getAddress();
+    if (cachedIp != null
+        && refreshed.getAddress().equals(cachedIp)) {
+      return null;
+    }
+    return refreshed;

Review Comment:
   Get refreshedIp and check null.
   ```java
       final InetAddress refreshedIp = refreshed.getAddress();
       if (refreshedIp == null) {
         LOG.warn("Failed to resolve {}; reusing previous address {}.", 
hostAndPort, address);
         return null;
       }
       if (refreshedIp.equals(address.getAddress())) {
         return null;
       }
       return refreshed;
   ```



##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/SCMConnectionManager.java:
##########


Review Comment:
   We should change the key of this map to hostAndPort first.  It does not make 
sense to check if the resolved addresses collide.
   



##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/EndpointStateMachine.java:
##########
@@ -79,6 +107,59 @@ public EndpointStateMachine(InetSocketAddress address,
             .build());
   }
 
+  /**
+   * The original "host:port" string used to construct {@link #getAddress()}.
+   * @return the host:port string, or null if not preserved at construction
+   */
+  public String getHostAndPort() {
+    return hostAndPort;
+  }
+
+  /**
+   * Re-resolves the configured hostname and reports whether the resolved
+   * IP differs from the cached {@link #address}. Does not mutate any
+   * state on this endpoint -- the caller is responsible for swapping
+   * this endpoint out (via {@link SCMConnectionManager#refreshSCMServer})
+   * if a refresh is desired.
+   * <p>
+   * Returns null when:
+   * <ul>
+   *   <li>{@link #hostAndPort} was not preserved at construction</li>
+   *   <li>the resolved IP is unchanged</li>
+   *   <li>the new resolution is unresolved (DNS lookup failed)</li>
+   * </ul>
+   * Returns the freshly-resolved {@link InetSocketAddress} when the IP
+   * has changed under the same hostname.
+   */
+  public InetSocketAddress resolveLatestAddress() {
+    if (hostAndPort == null) {
+      return null;
+    }
+    InetSocketAddress refreshed;
+    try {
+      refreshed = NetUtils.createSocketAddr(hostAndPort);
+    } catch (IllegalArgumentException ex) {
+      LOG.warn("Failed to re-resolve {}: {}", hostAndPort, ex.getMessage());
+      return null;

Review Comment:
   It should throw an exception:
   ```java
         throw new IllegalStateException("Malformed host address: " + 
hostAndPort, ex);
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to