[
https://issues.apache.org/jira/browse/HDDS-14516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ivan Andika updated HDDS-14516:
-------------------------------
Summary: Investigate high latency OM follower linearizable read request
(was: Investigate high latency on first OM linearizable read request)
> Investigate high latency OM follower linearizable read request
> --------------------------------------------------------------
>
> Key: HDDS-14516
> URL: https://issues.apache.org/jira/browse/HDDS-14516
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Ivan Andika
> Assignee: Ivan Andika
> Priority: Major
>
> TestOzoneShellHAWithFollowerRead shows that when linearizable read is enabled
> on the OM, the first read request from each new client (e.g. getServiceInfo()
> during RpcClient initialization) has much higher latency (around 500 ms) than
> the subsequent requests from the same client (each under 10 ms). When another
> client sends a request, the first request of that client is slow in the same
> way.
> {code:java}
> 2026-01-27 13:41:29,696 [IPC Server handler 14 on default port 15041] INFO
> protocolPB.OzoneManagerProtocolServerSideTranslatorPB
> (OzoneManagerProtocolServerSideTranslatorPB.java:submitReadRequestToOM(302))
> - Linearizable read submit request ServiceList on omNode-2 elapsed 492ms
> 2026-01-27 13:41:29,700 [IPC Server handler 12 on default port 15041] INFO
> protocolPB.OzoneManagerProtocolServerSideTranslatorPB
> (OzoneManagerProtocolServerSideTranslatorPB.java:submitReadRequestToOM(302))
> - Linearizable read submit request InfoVolume on omNode-2 elapsed 2ms
> 2026-01-27 13:41:29,703 [IPC Server handler 10 on default port 15041] INFO
> protocolPB.OzoneManagerProtocolServerSideTranslatorPB
> (OzoneManagerProtocolServerSideTranslatorPB.java:submitReadRequestToOM(302))
> - Linearizable read submit request InfoBucket on omNode-2 elapsed 1ms
> {code}
> The latency does not seem to be tied to getServiceInfo() itself: when I
> removed the initial getServiceInfo() call, InfoVolume became the slow request
> instead. It also does not seem to be caused by ReadIndex network slowness,
> since the high latency happens only in a test.
> We need to find the cause.
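To compare the first request against the later ones across several test runs, the "elapsed" values can be pulled out of the log with a small helper. The following is only a sketch: the class LinearizableReadLatency, its Entry type, and its methods are hypothetical names, not part of Ozone, and the regex assumes the message format matches the lines quoted above. The message carries no client identifier, so the sketch treats its input as the requests of a single client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinearizableReadLatency {

  /** One parsed log entry (hypothetical helper type, not part of Ozone). */
  static final class Entry {
    final String request;
    final String node;
    final long elapsedMs;

    Entry(String request, String node, long elapsedMs) {
      this.request = request;
      this.node = node;
      this.elapsedMs = elapsedMs;
    }
  }

  // Assumed format, taken from the log lines quoted in the issue:
  // "Linearizable read submit request <type> on <node> elapsed <n>ms"
  private static final Pattern ENTRY = Pattern.compile(
      "Linearizable read submit request (\\S+) on (\\S+) elapsed (\\d+)ms");

  /** Extracts every matching entry from the log text, in order of appearance. */
  static List<Entry> parse(String log) {
    List<Entry> entries = new ArrayList<>();
    Matcher m = ENTRY.matcher(log);
    while (m.find()) {
      entries.add(new Entry(m.group(1), m.group(2), Long.parseLong(m.group(3))));
    }
    return entries;
  }

  /** Ratio of the first entry's latency to the mean latency of the rest. */
  static double firstToRestRatio(List<Entry> entries) {
    if (entries.size() < 2) {
      throw new IllegalArgumentException("need at least two entries");
    }
    double restSum = 0;
    for (Entry e : entries.subList(1, entries.size())) {
      restSum += e.elapsedMs;
    }
    double restMean = restSum / (entries.size() - 1);
    return entries.get(0).elapsedMs / restMean;
  }

  public static void main(String[] args) {
    // Message tails copied from the log excerpt in the issue description.
    String log = String.join("\n",
        "- Linearizable read submit request ServiceList on omNode-2 elapsed 492ms",
        "- Linearizable read submit request InfoVolume on omNode-2 elapsed 2ms",
        "- Linearizable read submit request InfoBucket on omNode-2 elapsed 1ms");
    List<Entry> entries = parse(log);
    for (Entry e : entries) {
      System.out.println(e.request + " on " + e.node + ": " + e.elapsedMs + " ms");
    }
    System.out.println("first/rest ratio: " + firstToRestRatio(entries));
  }
}
```

Running the same helper over the log of a run with the initial getServiceInfo() removed should show the large value moving to InfoVolume, which would match the observation that the cost follows the first request rather than a specific request type.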
--
This message was sent by Atlassian Jira
(v8.20.10#820010)