ivandika3 opened a new pull request, #9682:
URL: https://github.com/apache/ozone/pull/9682
## What changes were proposed in this pull request?
Currently in OMRatisHelper#getOMResponseFromRaftClientReply, OMResponse.leaderOMNodeId
is set to the replierId (i.e. RaftClientReply.serverId), which is the ID of the Raft
peer that served the request.
Previously this was fine, since all Ratis requests were write requests that had to be
served by the leader. However, now that Ozone supports OM follower reads, OM cannot
use the replierId as the OMResponse.leaderOMNodeId.
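For illustration, here is a minimal sketch of that distinction using hypothetical simplified types (not the real Ozone/Ratis classes): the replier ID can only be advertised as the leader ID when the replier actually is the leader.

```java
// Hypothetical sketch, not the real OMRatisHelper code: shows why the replier ID can no
// longer be copied blindly into leaderOMNodeId once followers are allowed to serve reads.
public class LeaderIdFromReplySketch {

  // Stand-in for the parts of RaftClientReply that matter here.
  record RaftReply(String replierId, boolean replierIsLeader) {}

  // Stand-in for filling OMResponse.leaderOMNodeId.
  static String resolveLeaderOmNodeId(RaftReply reply) {
    // Old assumption: every request is a write handled by the leader, so the
    // replier is always the leader. With follower reads this no longer holds.
    if (reply.replierIsLeader()) {
      return reply.replierId();
    }
    // A follower served the request; its ID must not be advertised as the leader.
    return null;
  }

  public static void main(String[] args) {
    System.out.println(resolveLeaderOmNodeId(new RaftReply("om1", true)));  // om1
    System.out.println(resolveLeaderOmNodeId(new RaftReply("om3", false))); // null
  }
}
```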
Hadoop27OmTransport#submitRequest and Hadoop3OmTransport#submitRequest will fail over
to this new OM node ID. If leaderOMNodeId is incorrectly set to a non-leader OM,
subsequent requests might be wrongly sent to that follower, which might add to the
write latency.
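Roughly, the client-side effect described above looks like this (a hypothetical simplified sketch, not the actual transport implementation):

```java
import java.util.Optional;

// Hypothetical sketch: the transport trusts the leaderOMNodeId it receives, so a wrongly
// filled value points the proxy at a follower and the next write has to fail over again.
public class SuggestedLeaderFailoverSketch {

  private String currentProxyOmNodeId = "om1"; // om1 is the actual leader

  void maybeFailover(Optional<String> leaderOmNodeIdFromResponse) {
    leaderOmNodeIdFromResponse
        .filter(suggested -> !suggested.equals(currentProxyOmNodeId))
        .ifPresent(suggested -> currentProxyOmNodeId = suggested);
  }

  public static void main(String[] args) {
    SuggestedLeaderFailoverSketch transport = new SuggestedLeaderFailoverSketch();
    // A follower served a read, and the response carried the follower's ID as the "leader".
    transport.maybeFailover(Optional.of("om3"));
    // The next write now goes to om3 (a follower) first, adding latency.
    System.out.println(transport.currentProxyOmNodeId); // om3
  }
}
```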
I also decided to remove the failover logic from the Hadoop transport, because it
causes the client to fail over twice (see the sketch after this list):
1. The setNextOmProxy call in Hadoop3OmTransport#submitRequest sets the current
proxy to the new OM node ID.
2. However, selectNextOmProxy will then be triggered in
OmFailoverProxyProviderBase#getRetryPolicy.
    - This updates the OM proxy from pointing to the leader to the next OM node.
    - This triggers a long failover, especially since we need to wait longer once
the client has already gone through all the OM nodes.
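To make the double failover concrete, here is a rough sketch with hypothetical simplified names (not the actual Hadoop3OmTransport / OmFailoverProxyProviderBase code):

```java
// Hypothetical sketch of the double failover: step 1 points the proxy at the suggested
// leader, and step 2 then advances past it to the next node anyway.
public class DoubleFailoverSketch {

  private String currentOmNodeId = "om1";
  private int failoverCount = 0;

  // Step 1: the transport applies the leaderOMNodeId from the response.
  void setNextOmProxy(String suggestedLeaderId) {
    currentOmNodeId = suggestedLeaderId;
    failoverCount++;
  }

  // Step 2: the retry policy independently advances to the "next" OM node,
  // moving the proxy off the leader that step 1 just selected.
  void selectNextOmProxy() {
    currentOmNodeId = nextNodeAfter(currentOmNodeId);
    failoverCount++;
  }

  private String nextNodeAfter(String nodeId) {
    // Illustrative round-robin over om1 -> om2 -> om3 -> om1.
    switch (nodeId) {
      case "om1": return "om2";
      case "om2": return "om3";
      default:    return "om1";
    }
  }

  public static void main(String[] args) {
    DoubleFailoverSketch sketch = new DoubleFailoverSketch();
    sketch.setNextOmProxy("om2");  // the response says om2 is the leader
    sketch.selectNextOmProxy();    // the retry policy then moves on to om3 anyway
    // Prints "om3, failovers=2": two failovers, ending on a non-leader.
    System.out.println(sketch.currentOmNodeId + ", failovers=" + sketch.failoverCount);
  }
}
```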
## What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-14426
## How was this patch tested?
IT (Clean CI: https://github.com/ivandika3/ozone/actions/runs/21431935770)
Note: `testOMProxyProviderFailoverToCurrentLeader` is failing because of the removal
of the `setNextOmProxy` logic in the Hadoop transport. I removed the test because I
believe it is not correct.
The test assumes that if we trigger `OmFailoverProxyProvider#performFailover`, the
next OM request will be sent to the new current OM node ID, since
`FailoverProxyProvider#getProxy` is already updated to the new OM node ID. However,
this is incorrect: the OM client wraps the RPC proxy with `RetryProxy#create`, which
uses `RetryInvocationHandler`. `RetryInvocationHandler` caches the current proxy in
`ProxyDescriptor.proxyInfo` and only calls `FailoverProxyProvider#getProxy` when
there is a failover event that triggers `ProxyDescriptor#failover`. So even if the
test changes the OM node ID to a non-leader, `RetryInvocationHandler` will keep
sending requests to the leader until the leader steps down, goes down, or is
disconnected.
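A minimal sketch of that caching behavior, under hypothetical simplified types rather than Hadoop's actual `RetryInvocationHandler`/`FailoverProxyProvider`:

```java
import java.util.function.Supplier;

// Hypothetical sketch: the handler resolves the proxy once, caches it, and only re-reads
// the provider on a failover event, so performFailover on the provider alone changes nothing.
public class CachedProxySketch {

  private final Supplier<String> provider; // stand-in for FailoverProxyProvider#getProxy
  private String cachedProxy;              // stand-in for ProxyDescriptor.proxyInfo

  CachedProxySketch(Supplier<String> provider) {
    this.provider = provider;
    this.cachedProxy = provider.get();     // resolved once up front
  }

  String proxyForNextRequest() {
    return cachedProxy;                    // the provider is NOT consulted here
  }

  void onFailoverEvent() {
    cachedProxy = provider.get();          // only an actual failover refreshes the cache
  }

  public static void main(String[] args) {
    String[] current = {"om1"};            // om1 is the leader
    CachedProxySketch handler = new CachedProxySketch(() -> current[0]);

    current[0] = "om2";                    // the test flips the provider's current OM node
    System.out.println(handler.proxyForNextRequest()); // still om1: requests keep hitting the leader

    handler.onFailoverEvent();             // leader steps down / goes down / disconnects
    System.out.println(handler.proxyForNextRequest()); // om2
  }
}
```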
We could technically make the test pass by removing the cached
`ProxyDescriptor.proxyInfo` and making `RetryInvocationHandler.ProxyDescriptor#getProxy`
call `FailoverProxyProvider#getProxy` directly. However, this might affect other
Hadoop client use cases where `FailoverProxyProvider#getProxy` recreates the
proxy every time.
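For contrast, the alternative would look roughly like this (again only a hypothetical sketch, not a proposed change to Hadoop):

```java
import java.util.function.Supplier;

// Hypothetical sketch of the alternative: consult the provider on every request instead of
// caching. If a FailoverProxyProvider#getProxy implementation builds a new proxy object each
// time, every request would churn proxies, which is why this change could hurt other clients.
public class UncachedProxySketch {

  private final Supplier<String> provider; // stand-in for FailoverProxyProvider#getProxy

  UncachedProxySketch(Supplier<String> provider) {
    this.provider = provider;
  }

  String proxyForNextRequest() {
    return provider.get(); // no cache: the proxy is re-resolved (and possibly recreated) each time
  }
}
```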