smengcl opened a new pull request, #9781:
URL: https://github.com/apache/ozone/pull/9781

## What changes were proposed in this pull request?

1. Catch `IOException` instead of `Exception` in `compactDB` and `triggerSnapshotDefrag`. Note: an `IOException` is not guaranteed to carry a detailed message either (i.e. `getMessage()` can still return null).
2. Add a null check and fallback wherever `setErrorMsg(ex.getMessage())` was used in the code base.

## What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-14649

## How was this patch tested?

- Manually tested on a cluster (custom build):

Before:

```bash
$ sudo -u om ozone admin om snapshot defrag --service-id=ozone1771242317 --node-id=om1546336036
Triggering Snapshot Defrag Service ...
com.google.protobuf.ServiceException: org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException
    at org.apache.hadoop.ozone.protocol.proto.OzoneManagerAdminProtocolProtos$TriggerSnapshotDefragResponse$Builder.setErrorMsg(OzoneManagerAdminProtocolProtos.java:5369)
    at org.apache.hadoop.ozone.protocolPB.OMAdminProtocolServerSideImpl.triggerSnapshotDefrag(OMAdminProtocolServerSideImpl.java:133)
    at org.apache.hadoop.ozone.protocol.proto.OzoneManagerAdminProtocolProtos$OzoneManagerAdminService$2.callBlockingMethod(OzoneManagerAdminProtocolProtos.java:5549)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:995)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:923)
    at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
    at java.base/javax.security.auth.Subject.doAs(Subject.java:439)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1910)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2905)
, while invoking $Proxy20.triggerSnapshotDefrag over null. Retrying after sleeping for 1000ms.
com.google.protobuf.ServiceException: org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): java.lang.NullPointerException
    at org.apache.hadoop.ozone.protocol.proto.OzoneManagerAdminProtocolProtos$TriggerSnapshotDefragResponse$Builder.setErrorMsg(OzoneManagerAdminProtocolProtos.java:5369)
    at org.apache.hadoop.ozone.protocolPB.OMAdminProtocolServerSideImpl.triggerSnapshotDefrag(OMAdminProtocolServerSideImpl.java:133)
    at org.apache.hadoop.ozone.protocol.proto.OzoneManagerAdminProtocolProtos$OzoneManagerAdminService$2.callBlockingMethod(OzoneManagerAdminProtocolProtos.java:5549)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:995)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:923)
    at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
    at java.base/javax.security.auth.Subject.doAs(Subject.java:439)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1910)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2905)
, while invoking $Proxy20.triggerSnapshotDefrag over null. Retrying after sleeping for 1000ms.
```

After:

```bash
$ sudo -u om ozone admin om snapshot defrag --service-id=ozone1771242317 --node-id=om1546336036
Triggering Snapshot Defrag Service ...
Failed to trigger snapshot defragmentation: Failed to Decommission OM.
Error: Request to trigger snapshot defragmentation, sent to om1546336036[ccycloud-8.quasar-quotgb.root.comops.site:9862] failed with error: java.lang.UnsupportedOperationException
    at org.apache.hadoop.hdds.utils.db.Codec.fromCodecBuffer(Codec.java:94)
    at org.apache.hadoop.hdds.utils.db.TypedTable$1.convert(TypedTable.java:603)
    at org.apache.hadoop.hdds.utils.db.TypedTable$RawIterator.next(TypedTable.java:684)
    at org.apache.hadoop.hdds.utils.db.TypedTable$RawIterator.next(TypedTable.java:635)
    at org.apache.hadoop.ozone.om.snapshot.defrag.SnapshotDefragService.getTableBounds(SnapshotDefragService.java:242)
    at org.apache.hadoop.ozone.om.snapshot.defrag.SnapshotDefragService.performFullDefragmentation(SnapshotDefragService.java:272)
    at org.apache.hadoop.ozone.om.snapshot.defrag.SnapshotDefragService.checkAndDefragSnapshot(SnapshotDefragService.java:615)
    at org.apache.hadoop.ozone.om.snapshot.defrag.SnapshotDefragService.triggerSnapshotDefragOnce(SnapshotDefragService.java:670)
    at org.apache.hadoop.ozone.om.OzoneManager.triggerSnapshotDefrag(OzoneManager.java:3518)
    at org.apache.hadoop.ozone.protocolPB.OMAdminProtocolServerSideImpl.triggerSnapshotDefrag(OMAdminProtocolServerSideImpl.java:130)
    at org.apache.hadoop.ozone.protocol.proto.OzoneManagerAdminProtocolProtos$OzoneManagerAdminService$2.callBlockingMethod(OzoneManagerAdminProtocolProtos.java:5549)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:995)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:923)
    at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
    at java.base/javax.security.auth.Subject.doAs(Subject.java:439)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1910)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2905)
Failed to Decommission OM.
Error: Request to trigger snapshot defragmentation, sent to om1546336036[ccycloud-8.quasar-quotgb.root.comops.site:9862] failed with error: java.lang.UnsupportedOperationException
```
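
For illustration, the null check and fallback from item 2 of the proposed changes could look roughly like the sketch below. This is not the code from the patch: `ErrorMsgFallback`, `errorMessageOrFallback`, and the `ResponseBuilder` stand-in are hypothetical names, and falling back to the exception's class name is an assumption about what a reasonable fallback might be. The stand-in builder only mimics the behavior seen in the "Before" trace, where the generated protobuf `setErrorMsg` throws `NullPointerException` when given null.

```java
import java.io.IOException;
import java.util.Objects;

public class ErrorMsgFallback {

  /** Hypothetical stand-in for a generated protobuf builder; its string setter rejects null. */
  static final class ResponseBuilder {
    private boolean success = true;
    private String errorMsg = "";

    ResponseBuilder setSuccess(boolean value) {
      this.success = value;
      return this;
    }

    ResponseBuilder setErrorMsg(String value) {
      // Mimics the generated setter, which throws NullPointerException on null input.
      this.errorMsg = Objects.requireNonNull(value);
      return this;
    }

    boolean getSuccess() {
      return success;
    }

    String getErrorMsg() {
      return errorMsg;
    }
  }

  /**
   * Returns the exception's message, or its class name when the message is null.
   * The class-name fallback is an assumption made for this sketch.
   */
  static String errorMessageOrFallback(Throwable t) {
    String msg = t.getMessage();
    return msg != null ? msg : t.getClass().getName();
  }

  public static void main(String[] args) {
    ResponseBuilder builder = new ResponseBuilder();
    try {
      // An IOException created without a detail message: getMessage() returns null.
      throw new IOException();
    } catch (IOException ex) {
      // Passing ex.getMessage() directly would throw NullPointerException in setErrorMsg.
      builder.setSuccess(false).setErrorMsg(errorMessageOrFallback(ex));
    }
    System.out.println(builder.getErrorMsg());
  }
}
```

Routing every `setErrorMsg` call through one helper keeps the null handling in a single place, so a message-less exception surfaces as a readable error response rather than a server-side `NullPointerException`.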
