Copilot commented on code in PR #2961:
URL: https://github.com/apache/hugegraph/pull/2961#discussion_r2930111538


##########
hugegraph-pd/hg-pd-core/src/main/java/org/apache/hugegraph/pd/raft/RaftEngine.java:
##########
@@ -228,19 +230,42 @@ public PeerId getLeader() {
     }
 
     /**
-     * Send a message to the leader to get the grpc address;
+     * Send a message to the leader to get the grpc address.
      */
     public String getLeaderGrpcAddress() throws ExecutionException, 
InterruptedException {
         if (isLeader()) {
             return config.getGrpcAddress();
         }
 
         if (raftNode.getLeaderId() == null) {
-            waitingForLeader(10000);
+            waitingForLeader(config.getRpcTimeout());
         }
 
-        return 
raftRpcClient.getGrpcAddress(raftNode.getLeaderId().getEndpoint().toString()).get()
-                            .getGrpcAddress();
+        // Cache leader to avoid repeated getLeaderId() calls and guard against
+        // waitingForLeader() returning without a leader being elected.
+        PeerId leader = raftNode.getLeaderId();
+        if (leader == null) {
+            throw new ExecutionException(new IllegalStateException("Leader is 
not ready"));
+        }
+
+        try {
+            RaftRpcProcessor.GetMemberResponse response = raftRpcClient
+                    .getGrpcAddress(leader.getEndpoint().toString())
+                    .get(config.getRpcTimeout(), TimeUnit.MILLISECONDS);
+            if (response != null && response.getGrpcAddress() != null) {
+                return response.getGrpcAddress();
+            }
+        } catch (TimeoutException | ExecutionException e) {
+            // TODO: a more complete fix would need a source of truth for the 
leader's
+            // actual grpcAddress rather than deriving it from the local 
node's port config.
+            throw new ExecutionException(
+                    String.format("Failed to resolve leader gRPC address for 
%s", leader), e);
+        }
+
+        log.warn("Leader gRPC address field is null in RPC response for {}", 
leader);
+        throw new ExecutionException(
+                new IllegalStateException(
+                        String.format("Leader gRPC address unavailable for 
%s", leader)));

Review Comment:
   The PR description says we should fall back to deriving the leader gRPC 
address from the raft endpoint IP + local gRPC port when the bolt RPC times 
out/fails, but the current implementation still throws ExecutionException on 
TimeoutException/ExecutionException and also throws when the 
response/grpcAddress is null. This means callers like 
PDService.redirectToLeader() can still fail to forward requests when the bolt 
RPC is broken in bridge networking, so the cluster behavior described in #2959 
likely remains broken. Consider returning a derived address (e.g., leader 
endpoint host + config grpcPort) on timeout/failed response (and only throw if 
that derivation is impossible).



##########
hugegraph-pd/hg-pd-core/src/main/java/org/apache/hugegraph/pd/raft/RaftEngine.java:
##########
@@ -228,19 +230,42 @@ public PeerId getLeader() {
     }
 
     /**
-     * Send a message to the leader to get the grpc address;
+     * Send a message to the leader to get the grpc address.
      */
     public String getLeaderGrpcAddress() throws ExecutionException, 
InterruptedException {
         if (isLeader()) {
             return config.getGrpcAddress();
         }
 
         if (raftNode.getLeaderId() == null) {
-            waitingForLeader(10000);
+            waitingForLeader(config.getRpcTimeout());
         }
 
-        return 
raftRpcClient.getGrpcAddress(raftNode.getLeaderId().getEndpoint().toString()).get()
-                            .getGrpcAddress();
+        // Cache leader to avoid repeated getLeaderId() calls and guard against
+        // waitingForLeader() returning without a leader being elected.
+        PeerId leader = raftNode.getLeaderId();
+        if (leader == null) {
+            throw new ExecutionException(new IllegalStateException("Leader is 
not ready"));
+        }
+
+        try {
+            RaftRpcProcessor.GetMemberResponse response = raftRpcClient
+                    .getGrpcAddress(leader.getEndpoint().toString())
+                    .get(config.getRpcTimeout(), TimeUnit.MILLISECONDS);
+            if (response != null && response.getGrpcAddress() != null) {
+                return response.getGrpcAddress();
+            }
+        } catch (TimeoutException | ExecutionException e) {
+            // TODO: a more complete fix would need a source of truth for the 
leader's
+            // actual grpcAddress rather than deriving it from the local 
node's port config.
+            throw new ExecutionException(
+                    String.format("Failed to resolve leader gRPC address for 
%s", leader), e);

Review Comment:
   In this catch block, catching ExecutionException and then wrapping it in a 
new ExecutionException leads to double-wrapping (the real root cause ends up at 
e.getCause()). It will also make it harder for callers to distinguish timeout 
vs other failures. Consider handling TimeoutException separately and either 
rethrow the original ExecutionException or wrap e.getCause() so the root cause 
is preserved more directly.
   



##########
hugegraph-pd/hg-pd-core/src/main/java/org/apache/hugegraph/pd/raft/RaftEngine.java:
##########
@@ -119,8 +122,7 @@ public synchronized boolean init(PDConfig.Raft config) {
         nodeOptions.setRpcConnectTimeoutMs(config.getRpcTimeout());
         nodeOptions.setRpcDefaultTimeout(config.getRpcTimeout());
         nodeOptions.setRpcInstallSnapshotTimeout(config.getRpcTimeout());
-        // Set the raft configuration
-        RaftOptions raftOptions = nodeOptions.getRaftOptions();
+        // TODO: tune RaftOptions for PD (see hugegraph-store PartitionEngine 
for reference)
 
         nodeOptions.setEnableMetrics(true);
 

Review Comment:
   nodeOptions.setEnableMetrics(true) is called twice (earlier right after 
setFsm(), and again here). The second call is redundant and can be removed to 
reduce noise and avoid implying there are two separate metric toggles.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to