imbajin commented on code in PR #2961:
URL: https://github.com/apache/hugegraph/pull/2961#discussion_r2909775766
##########
hugegraph-pd/hg-pd-core/src/main/java/org/apache/hugegraph/pd/raft/RaftEngine.java:
##########
@@ -239,8 +241,20 @@ public String getLeaderGrpcAddress() throws
ExecutionException, InterruptedExcep
waitingForLeader(10000);
}
- return
raftRpcClient.getGrpcAddress(raftNode.getLeaderId().getEndpoint().toString()).get()
- .getGrpcAddress();
+ try {
+ RaftRpcProcessor.GetMemberResponse response = raftRpcClient
+
.getGrpcAddress(raftNode.getLeaderId().getEndpoint().toString())
+ .get(config.getRpcTimeout(), TimeUnit.MILLISECONDS);
Review Comment:
⚠️ `config.getRpcTimeout()` only bounds the callers
`CompletableFuture.get(...)` here; it does **not** bound the underlying Bolt
RPC. `RaftRpcClient` is still initialized with a fresh `new RpcOptions()` and
`invokeAsync(...)` uses `rpcOptions.getRpcDefaultTimeout()`, so in the
bridge-network failure case well return from this method quickly but still
leave the actual RPC running until the client default timeout fires. On a
follower that keeps redirecting traffic, those in-flight requests can
accumulate.
Could we wire the configured timeout into `RaftRpcClient` itself (or cancel
the future on timeout) so the transport timeout and the caller timeout stay
consistent?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]