[ https://issues.apache.org/jira/browse/HDDS-8271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17704814#comment-17704814 ]
Attila Doroszlai commented on HDDS-8271: ---------------------------------------- CC [~ritesh], [~duongnguyen] > Ozone Manager crashed caused by rocksdb closed > ---------------------------------------------- > > Key: HDDS-8271 > URL: https://issues.apache.org/jira/browse/HDDS-8271 > Project: Apache Ozone > Issue Type: Bug > Components: OM > Affects Versions: 1.3.0 > Reporter: Hongbing Wang > Priority: Critical > > An OM crashed, its log has: > {noformat} > 2023-03-23 20:32:06,079 [om3@group-197E298202B9-SegmentedRaftLogWorker] INFO > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: > om3@group-197E298202B9-SegmentedRaftLogWorker: Rolled log segment from > /data/ozonedata/om/ratis/e961d78e-fe7e-32f2-a53f-197e298202b9/current/log_inprogress_61913753 > to > /data/ozonedata/om/ratis/e961d78e-fe7e-32f2-a53f-197e298202b9/current/log_61913753-61920134 > 2023-03-23 20:32:06,087 [om3@group-197E298202B9-SegmentedRaftLogWorker] INFO > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: > om3@group-197E298202B9-SegmentedRaftLogWorker: created new log segment > /data/ozonedata/om/ratis/e961d78e-fe7e-32f2-a53f-197e298202b9/current/log_inprogress_61920135 > 2023-03-23 20:32:13,820 [om3-client-thread180] INFO > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: > om3@group-197E298202B9-SegmentedRaftLogWorker: Rolling segment > log-61920135_61926451 to index:61926451 > 2023-03-23 20:32:13,821 [om3@group-197E298202B9-SegmentedRaftLogWorker] INFO > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: > om3@group-197E298202B9-SegmentedRaftLogWorker: Rolled log segment from > /data/ozonedata/om/ratis/e961d78e-fe7e-32f2-a53f-197e298202b9/current/log_inprogress_61920135 > to > /data/ozonedata/om/ratis/e961d78e-fe7e-32f2-a53f-197e298202b9/current/log_61920135-61926451 > 2023-03-23 20:32:13,827 [om3@group-197E298202B9-SegmentedRaftLogWorker] INFO > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: > om3@group-197E298202B9-SegmentedRaftLogWorker: created new log segment > /data/ozonedata/om/ratis/e961d78e-fe7e-32f2-a53f-197e298202b9/current/log_inprogress_61926452 > 2023-03-23 20:32:15,699 [Finalizer] WARN > org.apache.hadoop.hdds.utils.db.managed.ManagedRocksObjectUtils: > RocksIterator is not closed properly > 2023-03-23 20:32:15,699 [Finalizer] WARN > org.apache.hadoop.hdds.utils.db.managed.ManagedRocksObjectUtils: > RocksIterator is not closed properly > 2023-03-23 20:32:21,281 [om3-client-thread183] INFO > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: > om3@group-197E298202B9-SegmentedRaftLogWorker: Rolling segment > log-61926452_61932758 to index:61932758 > 2023-03-23 20:32:21,281 [om3@group-197E298202B9-SegmentedRaftLogWorker] INFO > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: > om3@group-197E298202B9-SegmentedRaftLogWorker: Rolled log segment from > /data/ozonedata/om/ratis/e961d78e-fe7e-32f2-a53f-197e298202b9/current/log_inprogress_61926452 > to > /data/ozonedata/om/ratis/e961d78e-fe7e-32f2-a53f-197e298202b9/current/log_61926452-61932758 > 2023-03-23 20:32:21,288 [om3@group-197E298202B9-SegmentedRaftLogWorker] INFO > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: > om3@group-197E298202B9-SegmentedRaftLogWorker: created new log segment > /data/ozonedata/om/ratis/e961d78e-fe7e-32f2-a53f-197e298202b9/current/log_inprogress_61932759 > 2023-03-23 20:32:28,387 [om3-client-thread160] INFO > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: > om3@group-197E298202B9-SegmentedRaftLogWorker: Rolling segment > log-61932759_61939170 to index:61939170 > 2023-03-23 20:32:28,388 [om3@group-197E298202B9-SegmentedRaftLogWorker] INFO > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: > om3@group-197E298202B9-SegmentedRaftLogWorker: Rolled log segment from > /data/ozonedata/om/ratis/e961d78e-fe7e-32f2-a53f-197e298202b9/current/log_inprogress_61932759 > to > /data/ozonedata/om/ratis/e961d78e-fe7e-32f2-a53f-197e298202b9/current/log_61932759-61939170 > 2023-03-23 20:32:28,395 [om3@group-197E298202B9-SegmentedRaftLogWorker] INFO > org.apache.ratis.server.raftlog.segmented.SegmentedRaftLogWorker: > om3@group-197E298202B9-SegmentedRaftLogWorker: created new log segment > /data/ozonedata/om/ratis/e961d78e-fe7e-32f2-a53f-197e298202b9/current/log_inprogress_61939171 > 2023-03-23 20:32:29,003 [OM StateMachine ApplyTransaction Thread - 0] ERROR > org.apache.hadoop.ozone.om.KeyManagerImpl: CheckAccess operation failed for > key:vol-XXX/prod-XXX/FILE_2022-03-28-13-38-36_10003/_2022-03-28-13-38-45_0.FILE > java.io.IOException: Rocks Database is closed > at > org.apache.hadoop.hdds.utils.db.RocksDatabase.assertClose(RocksDatabase.java:395) > at > org.apache.hadoop.hdds.utils.db.RocksDatabase.get(RocksDatabase.java:517) > at org.apache.hadoop.hdds.utils.db.RDBTable.get(RDBTable.java:110) > at org.apache.hadoop.hdds.utils.db.RDBTable.get(RDBTable.java:40) > at > org.apache.hadoop.hdds.utils.db.TypedTable.getFromTable(TypedTable.java:255) > at org.apache.hadoop.hdds.utils.db.TypedTable.get(TypedTable.java:181) > at > org.apache.hadoop.ozone.om.KeyManagerImpl.checkAccess(KeyManagerImpl.java:1037) > at > org.apache.hadoop.ozone.security.acl.OzoneNativeAuthorizer.checkAccess(OzoneNativeAuthorizer.java:164) > at > org.apache.hadoop.ozone.om.OzoneManager.checkAcls(OzoneManager.java:2584) > at > org.apache.hadoop.ozone.om.OzoneManager.checkAcls(OzoneManager.java:2570) > at > org.apache.hadoop.ozone.om.OzoneAclUtils.checkAllAcls(OzoneAclUtils.java:128) > at > org.apache.hadoop.ozone.om.request.OMClientRequest.checkAcls(OMClientRequest.java:360) > at > org.apache.hadoop.ozone.om.request.OMClientRequest.checkAcls(OMClientRequest.java:215) > at > org.apache.hadoop.ozone.om.request.key.OMKeyRequest.checkKeyAcls(OMKeyRequest.java:339) > at > org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest.validateAndUpdateCache(S3MultipartUploadCommitPartRequest.java:138) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequest(OzoneManagerRequestHandler.java:311) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:533) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:324) > at > java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > 2023-03-23 20:32:29,004 [OM StateMachine ApplyTransaction Thread - 0] ERROR > org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest: > MultipartUpload Commit is failed for > Key:FILE_2022-03-28-13-38-36_10003/_2022-03-28-13-38-45_0.FILE in > Volume/Bucket vol-XXX/prod-XXX > INTERNAL_ERROR org.apache.hadoop.ozone.om.exceptions.OMException: Check > access operation failed for > key:FILE_2022-03-28-13-38-36_10003/_2022-03-28-13-38-45_0.FILE > at > org.apache.hadoop.ozone.om.KeyManagerImpl.checkAccess(KeyManagerImpl.java:1083) > at > org.apache.hadoop.ozone.security.acl.OzoneNativeAuthorizer.checkAccess(OzoneNativeAuthorizer.java:164) > at > org.apache.hadoop.ozone.om.OzoneManager.checkAcls(OzoneManager.java:2584) > at > org.apache.hadoop.ozone.om.OzoneManager.checkAcls(OzoneManager.java:2570) > at > org.apache.hadoop.ozone.om.OzoneAclUtils.checkAllAcls(OzoneAclUtils.java:128) > at > org.apache.hadoop.ozone.om.request.OMClientRequest.checkAcls(OMClientRequest.java:360) > at > org.apache.hadoop.ozone.om.request.OMClientRequest.checkAcls(OMClientRequest.java:215) > at > org.apache.hadoop.ozone.om.request.key.OMKeyRequest.checkKeyAcls(OMKeyRequest.java:339) > at > org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest.validateAndUpdateCache(S3MultipartUploadCommitPartRequest.java:138) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleWriteRequest(OzoneManagerRequestHandler.java:311) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.runCommand(OzoneManagerStateMachine.java:533) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerStateMachine.lambda$1(OzoneManagerStateMachine.java:324) > at > java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: Rocks Database is closed > at > org.apache.hadoop.hdds.utils.db.RocksDatabase.assertClose(RocksDatabase.java:395) > at > org.apache.hadoop.hdds.utils.db.RocksDatabase.get(RocksDatabase.java:517) > at org.apache.hadoop.hdds.utils.db.RDBTable.get(RDBTable.java:110) > at org.apache.hadoop.hdds.utils.db.RDBTable.get(RDBTable.java:40) > at > org.apache.hadoop.hdds.utils.db.TypedTable.getFromTable(TypedTable.java:255) > at org.apache.hadoop.hdds.utils.db.TypedTable.get(TypedTable.java:181) > at > org.apache.hadoop.ozone.om.KeyManagerImpl.checkAccess(KeyManagerImpl.java:1037) > ... 15 more > 2023-03-23 20:32:29,012 [OMDoubleBufferFlushThread] ERROR > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer: Terminating with > exit status 1: During flush to DB encountered error in OMDoubleBuffer flush > thread OMDoubleBufferFlushThread > java.io.IOException: Rocks Database is closed > at > org.apache.hadoop.hdds.utils.db.RocksDatabase$ColumnFamily.assertClosed(RocksDatabase.java:310) > at > org.apache.hadoop.hdds.utils.db.RocksDatabase$ColumnFamily.batchPut(RocksDatabase.java:293) > at > org.apache.hadoop.hdds.utils.db.RDBBatchOperation.put(RDBBatchOperation.java:62) > at > org.apache.hadoop.hdds.utils.db.RDBTable.putWithBatch(RDBTable.java:77) > at > org.apache.hadoop.hdds.utils.db.RDBTable.putWithBatch(RDBTable.java:40) > at > org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:132) > at > org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:149) > at > org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.checkAndUpdateDB(S3MultipartUploadCommitPartResponse.java:117) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$2(OzoneManagerDoubleBuffer.java:283) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.addToBatchWithTrace(OzoneManagerDoubleBuffer.java:228) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$1(OzoneManagerDoubleBuffer.java:281) > at java.util.Iterator.forEachRemaining(Iterator.java:116) > at > org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:277) > at java.lang.Thread.run(Thread.java:745) > 2023-03-23 20:32:29,016 [shutdown-hook-0] INFO > org.apache.hadoop.ozone.om.OzoneManagerStarter: SHUTDOWN_MSG: > /************************************************************ > SHUTDOWN_MSG: Shutting down OzoneManager at bigdata-hostxxx.com/10.xx.xx.xx > ************************************************************/ > {noformat} > more OM log see: [OM crashed detail log > |https://github.com/whbing/issue_logs/blob/main/ozone/om/om_crash.20230323.log] > rocksdb log: LOG.old.1679574802031612 > {noformat} > 2023/03/23-20:32:26.278546 140423117969152 [/db_impl/db_impl_files.cc:445] > [JOB 4050] Try to delete WAL files size 42545044203, prev total WAL file size > 42958842785, number of live WAL files 326. > ... > 2023/03/23-20:32:28.853876 140423224928000 [/version_set.cc:3800] More > existing levels in DB than needed. max_bytes_for_level_multiplier may not be > guaranteed. > ... > 2023/03/23-20:32:28.854153 140423224928000 (Original Log Time > 2023/03/23-20:32:28.854112) [/db_impl/db_impl_compaction_flush.cc:284] > [multipartInfoTable] Level summary: base level 6 level multiplier 10.00 max > bytes base 268435456 files[2 0 0 0 0 0 3] max score 0.50 > ... > 2023/03/23-20:32:28.998765 140423117969152 [le/delete_scheduler.cc:77] > Deleted file /data/ozonedata/om/db/om.db/098324.log immediately, > rate_bytes_per_sec 0, total_trash_size 0 max_trash_db_ratio 0.250000 > 2023/03/23-20:32:29.009957 140422757758720 [/db_impl/db_impl.cc:480] > Shutdown: canceling all background work > 2023/03/23-20:32:29.020213 140423117969152 [le/delete_scheduler.cc:77] > Deleted file /data/ozonedata/om/db/om.db/098322.log immediately, > rate_bytes_per_sec 0, total_trash_size 0 max_trash_db_ratio 0.250000 > ... > 2023/03/23-20:32:32.829353 140423117969152 [le/delete_scheduler.cc:77] > Deleted file /data/ozonedata/om/db/om.db/097921.log immediately, > rate_bytes_per_sec 0, total_trash_size 0 max_trash_db_ratio 0.250000 > 2023/03/23-20:32:32.883735 140422757758720 [/db_impl/db_impl.cc:710] Shutdown > complete > {noformat} > more see : [detail rocks > log|https://github.com/whbing/issue_logs/blob/main/ozone/om/LOG.old.1679574802031612] > core file: hs_err_pid1017701.log : > {noformat} > Stack: [0x00007fb6b8883000,0x00007fb6b8984000], sp=0x00007fb6b89817f8, free > space=1017k > Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) > j org.rocksdb.RocksDB.getLatestSequenceNumber(J)J+0 > j org.rocksdb.RocksDB.getLatestSequenceNumber()J+5 > j org.apache.hadoop.hdds.utils.db.RocksDatabase.getLatestSequenceNumber()J+18 > j > org.apache.hadoop.hdds.utils.db.RDBStore.getUpdatesSince(JJ)Lorg/apache/hadoop/hdds/utils/db/DBUpdatesWrapper;+395 > j > org.apache.hadoop.ozone.om.OzoneManager.getDBUpdates(Lorg/apache/hadoop/ozone/protocol/proto/OzoneManagerProtocolProtos$DBUpdatesRequest;)Lorg/apache/hadoop/ozone/om/helpers/DBUpdates;+30 > j > org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.getOMDBUpdates(Lorg/apache/hadoop/ozone/protocol/proto/OzoneManagerProtocolProtos$DBUpdatesRequest;)Lorg/apache/hadoop/ozone/p > rotocol/proto/OzoneManagerProtocolProtos$DBUpdatesResponse;+9 > J 19759 C2 > org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleReadRequest(Lorg/apache/hadoop/ozone/protocol/proto/OzoneManagerProtocolProtos$OMRequest;)Lorg/apache/hadoop/ozo > ne/protocol/proto/OzoneManagerProtocolProtos$OMResponse; (840 bytes) @ > 0x00007fc99793523c [0x00007fc997933ec0+0x137c] > J 19799 C2 > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(Lcom/google/protobuf/RpcController;Lorg/apache/hadoop/ozone/protocol/proto/OzoneManagerP > rotocolProtos$OMRequest;)Lorg/apache/hadoop/ozone/protocol/proto/OzoneManagerProtocolProtos$OMResponse; > (81 bytes) @ 0x00007fc997dec564 [0x00007fc997deb460+0x1104] > J 17937 C2 > org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(Lcom/google/protobuf/Descriptors$MethodDescriptor;Lcom/google/protobuf/Rp > cController;Lcom/google/protobuf/Message;)Lcom/google/protobuf/Message; (69 > bytes) @ 0x00007fc995ec62a0 [0x00007fc995ec6140+0x160] > J 17503 C2 > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(Lorg/apache/hadoop/ipc/RPC$Server;Ljava/lang/String;Lorg/apache/hadoop/io/Writable;J)Lorg/apache/hadoop/io/Wr > itable; (830 bytes) @ 0x00007fc99714df78 [0x00007fc99714c9c0+0x15b8] > J 17497 C2 org.apache.hadoop.ipc.Server$RpcCall.run()Ljava/lang/Object; (5 > bytes) @ 0x00007fc997237f70 [0x00007fc997237c60+0x310] > v ~StubRoutines::call_stub > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org