Copilot commented on code in PR #2961:
URL: https://github.com/apache/hugegraph/pull/2961#discussion_r2960380363


##########
hugegraph-pd/hg-pd-core/src/main/java/org/apache/hugegraph/pd/raft/RaftEngine.java:
##########
@@ -228,19 +229,57 @@ public PeerId getLeader() {
     }
 
     /**
-     * Send a message to the leader to get the grpc address;
+     * Send a message to the leader to get the grpc address.
      */
     public String getLeaderGrpcAddress() throws ExecutionException, 
InterruptedException {
         if (isLeader()) {
             return config.getGrpcAddress();
         }
 
         if (raftNode.getLeaderId() == null) {
-            waitingForLeader(10000);
+            waitingForLeader(config.getRpcTimeout());

Review Comment:
   `waitingForLeader()` waits in fixed 1000ms intervals (see 
`waitingForLeader(long)`), so passing `config.getRpcTimeout()` here can violate 
the configured timeout when `rpcTimeout < 1000` (it may still block ~1s). 
Consider either clamping the value passed in (e.g., 
`Math.max(config.getRpcTimeout(), 1000)`) or updating `waitingForLeader()` to 
wait for `min(remaining, 1000)` so the timeout is respected.
   



##########
hugegraph-pd/hg-pd-test/src/main/java/org/apache/hugegraph/pd/raft/RaftEngineLeaderAddressTest.java:
##########
@@ -0,0 +1,183 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hugegraph.pd.raft;
+
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+
+import org.apache.hugegraph.pd.config.PDConfig;
+import org.apache.hugegraph.testutil.Whitebox;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import com.alipay.sofa.jraft.Node;
+import com.alipay.sofa.jraft.entity.PeerId;
+import com.alipay.sofa.jraft.util.Endpoint;
+
+import static org.mockito.ArgumentMatchers.anyLong;
+import static org.mockito.ArgumentMatchers.anyString;
+import static org.mockito.ArgumentMatchers.eq;
+import static org.mockito.Mockito.mock;
+import static org.mockito.Mockito.when;
+
+public class RaftEngineLeaderAddressTest {
+
+    private static final String LEADER_IP = "10.0.0.1";
+    private static final int GRPC_PORT = 8686;
+    private static final String LEADER_GRPC_ADDRESS = "10.0.0.1:8686";
+
+    private Node originalRaftNode;
+    private RaftRpcClient originalRaftRpcClient;
+    private PDConfig.Raft originalConfig;
+
+    private Node mockNode;
+    private RaftRpcClient mockRpcClient;
+    private PDConfig.Raft mockConfig;
+    private PeerId mockLeader;
+
+    @Before
+    public void setUp() {
+        RaftEngine engine = RaftEngine.getInstance();
+
+        // Save originals
+        originalRaftNode = engine.getRaftNode();
+        originalRaftRpcClient = Whitebox.getInternalState(engine, 
"raftRpcClient");
+        originalConfig = Whitebox.getInternalState(engine, "config");
+
+        // Build mock leader PeerId with real Endpoint
+        mockLeader = mock(PeerId.class);
+        Endpoint endpoint = new Endpoint(LEADER_IP, 8610);
+        when(mockLeader.getEndpoint()).thenReturn(endpoint);
+
+        // Build mock Node that reports itself as follower with a known leader
+        mockNode = mock(Node.class);
+        when(mockNode.isLeader(true)).thenReturn(false);
+        when(mockNode.getLeaderId()).thenReturn(mockLeader);
+
+        // Build mock config
+        // Use a short timeout (100ms) so the null-leader test doesn't block 
for seconds
+        mockConfig = mock(PDConfig.Raft.class);
+        when(mockConfig.getGrpcAddress()).thenReturn("127.0.0.1:" + GRPC_PORT);
+        when(mockConfig.getGrpcPort()).thenReturn(GRPC_PORT);
+        when(mockConfig.getRpcTimeout()).thenReturn(100);
+

Review Comment:
   This comment is now misleading: the tests generally don't hit 
`waitingForLeader()` with the 100ms timeout (the only null-leader test 
explicitly overrides `getRpcTimeout()` to 0ms). Consider updating/removing the 
comment to match the actual behavior being tested.



##########
hugegraph-pd/hg-pd-core/src/main/java/org/apache/hugegraph/pd/raft/RaftEngine.java:
##########
@@ -228,19 +229,57 @@ public PeerId getLeader() {
     }
 
     /**
-     * Send a message to the leader to get the grpc address;
+     * Send a message to the leader to get the grpc address.
      */
     public String getLeaderGrpcAddress() throws ExecutionException, 
InterruptedException {
         if (isLeader()) {
             return config.getGrpcAddress();
         }
 
         if (raftNode.getLeaderId() == null) {
-            waitingForLeader(10000);
+            waitingForLeader(config.getRpcTimeout());
+        }
+
+        // Cache leader to avoid repeated getLeaderId() calls and guard against
+        // waitingForLeader() returning without a leader being elected.
+        PeerId leader = raftNode.getLeaderId();
+        if (leader == null) {
+            throw new ExecutionException(new IllegalStateException("Leader is 
not ready"));
+        }
+
+        RaftRpcProcessor.GetMemberResponse response = null;
+        try {
+            // TODO: a more complete fix would need a source of truth for the 
leader's
+            // actual grpcAddress rather than deriving it from the local 
node's port config.
+            response = raftRpcClient
+                    .getGrpcAddress(leader.getEndpoint().toString())
+                    .get(config.getRpcTimeout(), TimeUnit.MILLISECONDS);
+            if (response != null && response.getGrpcAddress() != null) {
+                return response.getGrpcAddress();
+            }
+            if (response == null) {
+                log.warn("Leader RPC response is null for {}, falling back to 
derived address",
+                         leader);
+            } else {
+                log.warn("Leader gRPC address field is null in RPC response 
for {}, "
+                         + "falling back to derived address", leader);
+            }
+        } catch (TimeoutException e) {
+            log.warn("Timed out resolving leader gRPC address for {}, falling 
back to derived "
+                     + "address", leader);
+        } catch (ExecutionException e) {
+            Throwable cause = e.getCause() != null ? e.getCause() : e;
+            log.warn("Failed to resolve leader gRPC address for {}, falling 
back to derived "
+                     + "address", leader, cause);
         }
 
-        return 
raftRpcClient.getGrpcAddress(raftNode.getLeaderId().getEndpoint().toString()).get()
-                            .getGrpcAddress();
+        // Best-effort fallback: derive from leader raft endpoint IP + local 
gRPC port.
+        // WARNING: this may be incorrect in clusters where PD nodes use 
different grpc.port
+        // values, a proper fix requires a cluster-wide source of truth for 
gRPC addresses.
+        String derived = leader.getEndpoint().getIp() + ":" + 
config.getGrpcPort();
+        log.warn("Using derived leader gRPC address {} — may be incorrect if 
nodes use different ports",

Review Comment:
   The fallback path logs warnings multiple times per call (one for the failure 
condition and another for "Using derived leader gRPC address..."). Since 
`getLeaderGrpcAddress()` is invoked in hot paths like 
`PDService.redirectToLeader()` (and even called twice there), this can generate 
excessive WARN logs under failure conditions. Consider consolidating to a 
single WARN that includes the derived address (or downgrading the final message 
to INFO/DEBUG) to reduce log volume.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to