[
https://issues.apache.org/jira/browse/HDDS-13621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18056294#comment-18056294
]
Tsz-wo Sze commented on HDDS-13621:
-----------------------------------
[~ssa], I agree that Ratis could clean up the cache as proposed in RATIS-2385.
However, Ozone should also check for null to avoid the NPE.
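A minimal sketch of such a null check follows. It does not use the real Ratis or Ozone classes: Message and Reply are hypothetical stand-ins for org.apache.ratis.protocol.Message and RaftClientReply, and cachedContent is an invented helper name. Treating a reply without a message as a cache miss is an assumption made for illustration, not necessarily what the actual fix does.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;

public class RetryCacheNullCheckSketch {
  /** Hypothetical stand-in for org.apache.ratis.protocol.Message. */
  public interface Message {
    byte[] getContent();
  }

  /** Hypothetical stand-in for RaftClientReply; getMessage() may return null. */
  public static final class Reply {
    private final Message message;

    public Reply(Message message) {
      this.message = message;
    }

    public Message getMessage() {
      return message;
    }
  }

  /**
   * Returns the cached payload, or Optional.empty() when the reply or its
   * message is missing (assumption: the caller treats empty as a cache miss).
   */
  public static Optional<byte[]> cachedContent(Reply reply) {
    if (reply == null || reply.getMessage() == null) {
      return Optional.empty();
    }
    return Optional.ofNullable(reply.getMessage().getContent());
  }

  public static void main(String[] args) {
    Reply withoutMessage = new Reply(null);
    Reply withMessage = new Reply(() -> "ok".getBytes(StandardCharsets.UTF_8));

    System.out.println(cachedContent(withoutMessage).isPresent()); // prints "false"
    System.out.println(cachedContent(withMessage).isPresent());    // prints "true"
  }
}
```

With a guard like this, a cached reply that carries no message would fall through to normal request processing instead of throwing from getContent().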
> NPE in OzoneManagerRatisServer.checkRetryCache
> ----------------------------------------------
>
> Key: HDDS-13621
> URL: https://issues.apache.org/jira/browse/HDDS-13621
> Project: Apache Ozone
> Issue Type: Bug
> Components: Ozone Manager
> Affects Versions: 2.0.0
> Reporter: Sergey Soldatov
> Assignee: Sergey Soldatov
> Priority: Major
> Labels: pull-request-available
>
> Under load, OM periodically fails to check the RetryCache:
> {code:java}
> 2025-08-27 16:18:09,562 WARN ipc.Server: IPC Server handler 0 on default port 9862, call Call#5998989 Retry#2 org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest from 10.88.252.12:48376
> java.lang.NullPointerException: Cannot invoke "org.apache.ratis.protocol.Message.getContent()" because the return value of "org.apache.ratis.protocol.RaftClientReply.getMessage()" is null
> at org.apache.hadoop.ozone.om.helpers.OMRatisHelper.getOMResponseFromRaftClientReply(OMRatisHelper.java:68)
> at org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.getOMResponse(OzoneManagerRatisServer.java:570)
> at org.apache.hadoop.ozone.om.ratis.OzoneManagerRatisServer.checkRetryCache(OzoneManagerRatisServer.java:495)
> at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.internalProcessRequest(OzoneManagerProtocolServerSideTranslatorPB.java:168)
> at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:124)
> at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:87)
> at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:115)
> at org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.processCall(ProtobufRpcEngine.java:484)
> at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:595)
> at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1246)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1169)
> at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
> at java.base/javax.security.auth.Subject.doAs(Subject.java:439)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1953)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3198){code}
> It's not clear yet whether this is an Ozone or a Ratis issue. RCA is in progress.