[ https://issues.apache.org/jira/browse/HDDS-14516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ivan Andika updated HDDS-14516:
-------------------------------
Description:
From TestOzoneShellHAWithFollowerRead, it is observed that when OM linearizable
read is enabled, the first OM read request from each new client (e.g.
getServiceInfo() during RpcClient initialization) has much higher latency
(around 500ms) than the subsequent OM requests from the same client (each under
10ms). When another client sends a request, the same slowdown occurs again for
that client's first request.
{code:java}
2026-01-27 13:41:29,696 [IPC Server handler 14 on default port 15041] INFO protocolPB.OzoneManagerProtocolServerSideTranslatorPB (OzoneManagerProtocolServerSideTranslatorPB.java:submitReadRequestToOM(302)) - Linearizable read submit request ServiceList on omNode-2 elapsed 492ms
2026-01-27 13:41:29,700 [IPC Server handler 12 on default port 15041] INFO protocolPB.OzoneManagerProtocolServerSideTranslatorPB (OzoneManagerProtocolServerSideTranslatorPB.java:submitReadRequestToOM(302)) - Linearizable read submit request InfoVolume on omNode-2 elapsed 2ms
2026-01-27 13:41:29,703 [IPC Server handler 10 on default port 15041] INFO protocolPB.OzoneManagerProtocolServerSideTranslatorPB (OzoneManagerProtocolServerSideTranslatorPB.java:submitReadRequestToOM(302)) - Linearizable read submit request InfoBucket on omNode-2 elapsed 1ms
{code}
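To confirm that the extra latency is tied to the first request from each client rather than to a particular request type, a small instrumentation helper could tag every timed read with whether it is the first one seen from that client. This is a hypothetical sketch: FirstRequestTracker, Timed and the String clientId are illustrative assumptions, not Ozone classes, and how the client ID would be obtained on the OM side is left out.

{code:java}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/**
 * Hypothetical helper (not part of Ozone): times each read and records
 * whether it was the first request observed from a given client ID, so
 * first-request latency can be separated from steady-state latency.
 */
final class FirstRequestTracker {
  private final Set<String> seenClients = ConcurrentHashMap.newKeySet();

  /** Result of one timed call: returned value, elapsed millis, first-ness. */
  static final class Timed<T> {
    final T value;
    final long elapsedMs;
    final boolean firstFromClient;

    Timed(T value, long elapsedMs, boolean firstFromClient) {
      this.value = value;
      this.elapsedMs = elapsedMs;
      this.firstFromClient = firstFromClient;
    }
  }

  <T> Timed<T> time(String clientId, Supplier<T> request) {
    // add() returns true only the first time this client ID is observed,
    // even if several requests from the same client race here.
    boolean first = seenClients.add(clientId);
    long start = System.nanoTime();
    T value = request.get();
    long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    return new Timed<>(value, elapsedMs, first);
  }
}
{code}

Logging firstFromClient next to the elapsed time would show directly whether the ~500ms outliers always coincide with a client's first request, independent of whether that request is ServiceList or InfoVolume.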
This does not appear to be specific to getServiceInfo(): after removing the
initial getServiceInfo() call, InfoVolume becomes the slow request instead. It
also does not appear to be caused by slow ReadIndex processing, since the issue
still occurs even when the appliedIndex remains unchanged and with optimizations
such as [RATIS-2379|https://github.com/apache/ratis/pull/1332] and RATIS-2382
applied. Network slowness can also be ruled out, since the slowness occurs in a
test.
We need to investigate the root cause.
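For reference, in the Raft ReadIndex approach a server serves a linearizable read only after its applied index has reached the read index obtained from the leader. The simplified sketch below models just that local wait; it is not the Ratis implementation, ReadIndexGate is a hypothetical name, and the leader round trip that produces the read index is not modeled. It illustrates why an unchanged appliedIndex that already covers the read index should add no waiting on the serving OM.

{code:java}
import java.util.concurrent.TimeUnit;

/**
 * Simplified, hypothetical model of the ReadIndex wait on the serving node.
 * The leader round trip that returns readIndex is not modeled here.
 */
final class ReadIndexGate {
  private long appliedIndex;

  ReadIndexGate(long initialAppliedIndex) {
    this.appliedIndex = initialAppliedIndex;
  }

  /** Called by the state machine after applying a log entry. */
  synchronized void onApplied(long index) {
    if (index > appliedIndex) {
      appliedIndex = index;
      notifyAll(); // wake readers waiting for this index
    }
  }

  /**
   * Waits until appliedIndex >= readIndex or the timeout expires.
   * Returns immediately if the index is already applied.
   */
  synchronized boolean awaitApplied(long readIndex, long timeoutMs) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    try {
      while (appliedIndex < readIndex) {
        long remainingNs = deadline - System.nanoTime();
        if (remainingNs <= 0) {
          return false; // read index not reached in time
        }
        // Releases the monitor while waiting, as Object.wait does.
        TimeUnit.NANOSECONDS.timedWait(this, remainingNs);
      }
      return true;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
{code}

Under this model, the remaining candidates for the first-request cost lie outside the local wait, for example in the round trip that obtains the read index or in per-client setup on the RPC path.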
was:
From TestOzoneShellHAWithFollowerRead, it is observed that when OM linearizable
read is enabled, the first OM read request from each new client (e.g.
getServiceInfo() during RpcClient initialization) has much higher latency
(around 500ms) than the subsequent OM requests from the same client (each under
10ms). When another client sends a request, the same slowdown occurs again for
that client's first request.
{code:java}
2026-01-27 13:41:29,696 [IPC Server handler 14 on default port 15041] INFO protocolPB.OzoneManagerProtocolServerSideTranslatorPB (OzoneManagerProtocolServerSideTranslatorPB.java:submitReadRequestToOM(302)) - Linearizable read submit request ServiceList on omNode-2 elapsed 492ms
2026-01-27 13:41:29,700 [IPC Server handler 12 on default port 15041] INFO protocolPB.OzoneManagerProtocolServerSideTranslatorPB (OzoneManagerProtocolServerSideTranslatorPB.java:submitReadRequestToOM(302)) - Linearizable read submit request InfoVolume on omNode-2 elapsed 2ms
2026-01-27 13:41:29,703 [IPC Server handler 10 on default port 15041] INFO protocolPB.OzoneManagerProtocolServerSideTranslatorPB (OzoneManagerProtocolServerSideTranslatorPB.java:submitReadRequestToOM(302)) - Linearizable read submit request InfoBucket on omNode-2 elapsed 1ms
{code}
This does not appear to be specific to that request, as I tried removing the
initial getServiceInfo() call. It also does not appear to be caused by slow
ReadIndex processing, since the issue still occurs even after applying
[RATIS-2379|https://github.com/apache/ratis/pull/1332] and RATIS-2382.
We need to investigate the root cause.
> Investigate high latency on first OM linearizable read request
> --------------------------------------------------------------
>
> Key: HDDS-14516
> URL: https://issues.apache.org/jira/browse/HDDS-14516
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Ivan Andika
> Assignee: Ivan Andika
> Priority: Major