Copilot commented on code in PR #2961:
URL: https://github.com/apache/hugegraph/pull/2961#discussion_r2960380363
##########
hugegraph-pd/hg-pd-core/src/main/java/org/apache/hugegraph/pd/raft/RaftEngine.java:
##########
@@ -228,19 +229,57 @@ public PeerId getLeader() {
}
/**
- * Send a message to the leader to get the grpc address;
+ * Send a message to the leader to get the grpc address.
*/
public String getLeaderGrpcAddress() throws ExecutionException,
InterruptedException {
if (isLeader()) {
return config.getGrpcAddress();
}
if (raftNode.getLeaderId() == null) {
- waitingForLeader(10000);
+ waitingForLeader(config.getRpcTimeout());
Review Comment:
`waitingForLeader()` waits in fixed 1000ms intervals (see
`waitingForLeader(long)`), so passing `config.getRpcTimeout()` here can violate
the configured timeout when `rpcTimeout < 1000` (it may still block ~1s).
Consider either clamping the value passed in (e.g.,
`Math.max(config.getRpcTimeout(), 1000)`) or updating `waitingForLeader()` to
wait for `min(remaining, 1000)` so the timeout is respected.
##########
hugegraph-pd/hg-pd-test/src/main/java/org/apache/hugegraph/pd/raft/RaftEngineLeaderAddressTest.java:
##########
@@ -0,0 +1,183 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hugegraph.pd.raft;
+
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+
+import org.apache.hugegraph.pd.config.PDConfig;
+import org.apache.hugegraph.testutil.Whitebox;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import com.alipay.sofa.jraft.Node;
+import com.alipay.sofa.jraft.entity.PeerId;
+import com.alipay.sofa.jraft.util.Endpoint;
+
+import static org.mockito.ArgumentMatchers.anyLong;
+import static org.mockito.ArgumentMatchers.anyString;
+import static org.mockito.ArgumentMatchers.eq;
+import static org.mockito.Mockito.mock;
+import static org.mockito.Mockito.when;
+
+public class RaftEngineLeaderAddressTest {
+
+ private static final String LEADER_IP = "10.0.0.1";
+ private static final int GRPC_PORT = 8686;
+ private static final String LEADER_GRPC_ADDRESS = "10.0.0.1:8686";
+
+ private Node originalRaftNode;
+ private RaftRpcClient originalRaftRpcClient;
+ private PDConfig.Raft originalConfig;
+
+ private Node mockNode;
+ private RaftRpcClient mockRpcClient;
+ private PDConfig.Raft mockConfig;
+ private PeerId mockLeader;
+
+ @Before
+ public void setUp() {
+ RaftEngine engine = RaftEngine.getInstance();
+
+ // Save originals
+ originalRaftNode = engine.getRaftNode();
+ originalRaftRpcClient = Whitebox.getInternalState(engine,
"raftRpcClient");
+ originalConfig = Whitebox.getInternalState(engine, "config");
+
+ // Build mock leader PeerId with real Endpoint
+ mockLeader = mock(PeerId.class);
+ Endpoint endpoint = new Endpoint(LEADER_IP, 8610);
+ when(mockLeader.getEndpoint()).thenReturn(endpoint);
+
+ // Build mock Node that reports itself as follower with a known leader
+ mockNode = mock(Node.class);
+ when(mockNode.isLeader(true)).thenReturn(false);
+ when(mockNode.getLeaderId()).thenReturn(mockLeader);
+
+ // Build mock config
+ // Use a short timeout (100ms) so the null-leader test doesn't block
for seconds
+ mockConfig = mock(PDConfig.Raft.class);
+ when(mockConfig.getGrpcAddress()).thenReturn("127.0.0.1:" + GRPC_PORT);
+ when(mockConfig.getGrpcPort()).thenReturn(GRPC_PORT);
+ when(mockConfig.getRpcTimeout()).thenReturn(100);
+
Review Comment:
This comment is now misleading: the tests generally don't hit
`waitingForLeader()` with the 100ms timeout (the only null-leader test
explicitly overrides `getRpcTimeout()` to 0ms). Consider updating/removing the
comment to match the actual behavior being tested.
##########
hugegraph-pd/hg-pd-core/src/main/java/org/apache/hugegraph/pd/raft/RaftEngine.java:
##########
@@ -228,19 +229,57 @@ public PeerId getLeader() {
}
/**
- * Send a message to the leader to get the grpc address;
+ * Send a message to the leader to get the grpc address.
*/
public String getLeaderGrpcAddress() throws ExecutionException,
InterruptedException {
if (isLeader()) {
return config.getGrpcAddress();
}
if (raftNode.getLeaderId() == null) {
- waitingForLeader(10000);
+ waitingForLeader(config.getRpcTimeout());
+ }
+
+ // Cache leader to avoid repeated getLeaderId() calls and guard against
+ // waitingForLeader() returning without a leader being elected.
+ PeerId leader = raftNode.getLeaderId();
+ if (leader == null) {
+ throw new ExecutionException(new IllegalStateException("Leader is
not ready"));
+ }
+
+ RaftRpcProcessor.GetMemberResponse response = null;
+ try {
+ // TODO: a more complete fix would need a source of truth for the
leader's
+ // actual grpcAddress rather than deriving it from the local
node's port config.
+ response = raftRpcClient
+ .getGrpcAddress(leader.getEndpoint().toString())
+ .get(config.getRpcTimeout(), TimeUnit.MILLISECONDS);
+ if (response != null && response.getGrpcAddress() != null) {
+ return response.getGrpcAddress();
+ }
+ if (response == null) {
+ log.warn("Leader RPC response is null for {}, falling back to
derived address",
+ leader);
+ } else {
+ log.warn("Leader gRPC address field is null in RPC response
for {}, "
+ + "falling back to derived address", leader);
+ }
+ } catch (TimeoutException e) {
+ log.warn("Timed out resolving leader gRPC address for {}, falling
back to derived "
+ + "address", leader);
+ } catch (ExecutionException e) {
+ Throwable cause = e.getCause() != null ? e.getCause() : e;
+ log.warn("Failed to resolve leader gRPC address for {}, falling
back to derived "
+ + "address", leader, cause);
}
- return
raftRpcClient.getGrpcAddress(raftNode.getLeaderId().getEndpoint().toString()).get()
- .getGrpcAddress();
+ // Best-effort fallback: derive from leader raft endpoint IP + local
gRPC port.
+ // WARNING: this may be incorrect in clusters where PD nodes use
different grpc.port
+ // values, a proper fix requires a cluster-wide source of truth for
gRPC addresses.
+ String derived = leader.getEndpoint().getIp() + ":" +
config.getGrpcPort();
+ log.warn("Using derived leader gRPC address {} — may be incorrect if
nodes use different ports",
Review Comment:
The fallback path logs warnings multiple times per call (one for the failure
condition and another for "Using derived leader gRPC address..."). Since
`getLeaderGrpcAddress()` is invoked in hot paths like
`PDService.redirectToLeader()` (and even called twice there), this can generate
excessive WARN logs under failure conditions. Consider consolidating to a
single WARN that includes the derived address (or downgrading the final message
to INFO/DEBUG) to reduce log volume.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]