[ https://issues.apache.org/jira/browse/HDDS-8267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17704600#comment-17704600 ]
Wei-Chiu Chuang commented on HDDS-8267:
---------------------------------------

FWIW, the log file was not missing. It was deliberately deleted by RocksDB. RocksDB itself should be able to tell whether its WAL file has been deleted. IMO it's a bug in RocksDB.

> getOMDBUpdates requests crashed Ozone Manager
> ---------------------------------------------
>
> Key: HDDS-8267
> URL: https://issues.apache.org/jira/browse/HDDS-8267
> Project: Apache Ozone
> Issue Type: Bug
> Affects Versions: 1.4.0
> Reporter: Wei-Chiu Chuang
> Priority: Critical
> Attachments: HDDS-8267-ozone-om.log, LOG, LOG.old.1679288337380113, LOG.old.1679301915549029, LOG.old.1679315208126016, LOG.old.1679350165641592, hs_err_pid195294.log, stderr.log
>
>
> An OM crashed, its log has:
> {noformat}
> 2023-03-20 23:29:11,751 ERROR org.apache.hadoop.hdds.utils.db.RDBStore: Unable to get delta updates since sequenceNumber 98273. This exception will not be thrown to the client
> java.io.IOException: RocksDatabase[/var/lib/hadoop-ozone/om/data/om.db]: Failed to getUpdatesSince 98273; status : IOError(Undefined); message : while stat a file for size: /var/lib/hadoop-ozone/om/data/om.db/000121.log: No such file or directory
>     at org.apache.hadoop.hdds.utils.HddsServerUtil.toIOException(HddsServerUtil.java:576)
>     at org.apache.hadoop.hdds.utils.db.RocksDatabase.toIOException(RocksDatabase.java:85)
>     at org.apache.hadoop.hdds.utils.db.RocksDatabase.getUpdatesSince(RocksDatabase.java:724)
>     at org.apache.hadoop.hdds.utils.db.RDBStore.getUpdatesSince(RDBStore.java:368)
>     at org.apache.hadoop.ozone.om.OzoneManager.getDBUpdates(OzoneManager.java:3975)
>     at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.getOMDBUpdates(OzoneManagerRequestHandler.java:354)
>     at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleReadRequest(OzoneManagerRequestHandler.java:233)
>     at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitReadRequestToOM(OzoneManagerProtocolServerSideTranslatorPB.java:223)
>     at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:177)
>     at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:87)
>     at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:147)
>     at org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:989)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:917)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2894)
> Caused by: org.rocksdb.RocksDBException: while stat a file for size: /var/lib/hadoop-ozone/om/data/om.db/000121.log: No such file or directory
>     at org.rocksdb.RocksDB.getUpdatesSince(Native Method)
>     at org.rocksdb.RocksDB.getUpdatesSince(RocksDB.java:3966)
>     at org.apache.hadoop.hdds.utils.db.RocksDatabase.getUpdatesSince(RocksDatabase.java:721)
>     ... 17 more
> {noformat}
> There is no 000121.log under the directory for sure.
> {noformat}
> ...
> -rw-r--r-- 1 hdfs hdfs      1219 Mar 20 22:07 000091.sst
> -rw-r--r-- 1 hdfs hdfs     36002 Mar 20 22:07 000092.sst
> -rw-r--r-- 1 hdfs hdfs     86659 Mar 20 22:07 000093.sst
> -rw-r--r-- 1 hdfs hdfs 137260115 Mar 20 23:28 000166.log
> -rw-r--r-- 1 hdfs hdfs     23833 Mar 20 23:28 000167.sst
> -rw-r--r-- 1 hdfs hdfs      1221 Mar 20 23:28 000168.sst
> -rw-r--r-- 1 hdfs hdfs     11350 Mar 20 23:28 000169.sst
> -rw-r--r-- 1 hdfs hdfs      1888 Mar 20 23:28 000170.sst
> -rw-r--r-- 1 hdfs hdfs      6001 Mar 20 23:28 000171.sst
> -rw-r--r-- 1 hdfs hdfs     10118 Mar 20 23:28 000172.sst
> -rw-r--r-- 1 hdfs hdfs      1330 Mar 20 23:28 000173.sst
> -rw-r--r-- 1 hdfs hdfs    303259 Mar 20 23:28 000175.sst
> -rw-r--r-- 1 hdfs hdfs 295146251 Mar 20 23:29 000176.log
> -rw-r--r-- 1 hdfs hdfs     90863 Mar 20 23:28 000177.sst
> {noformat}
> The stderr has the following rocksdb error repeatedly:
> {noformat}
> Exception in thread "Thread-835" java.lang.IllegalArgumentException: Illegal value provided for FlushReason: 13
>     at org.rocksdb.FlushReason.fromValue(FlushReason.java:51)
>     at org.rocksdb.FlushJobInfo.<init>(FlushJobInfo.java:41)
> Exception in thread "Thread-837" java.lang.IllegalArgumentException: Illegal value provided for FlushReason: 13
>     at org.rocksdb.FlushReason.fromValue(FlushReason.java:51)
>     at org.rocksdb.FlushJobInfo.<init>(FlushJobInfo.java:41)
> Exception in thread "Thread-836" java.lang.IllegalArgumentException: Illegal value provided for FlushReason: 13
>     at org.rocksdb.FlushReason.fromValue(FlushReason.java:51)
>     at org.rocksdb.FlushJobInfo.<init>(FlushJobInfo.java:41)
> Exception in thread "Thread-842" java.lang.IllegalArgumentException: Illegal value provided for FlushReason: 13
>     at org.rocksdb.FlushReason.fromValue(FlushReason.java:51)
>     at org.rocksdb.FlushJobInfo.<init>(FlushJobInfo.java:41)
> {noformat}
> There is a crash report file hs_err_pid195294.log:
> {noformat}
> Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
> j  org.rocksdb.RocksDB.getLatestSequenceNumber(J)J+0
> j  org.rocksdb.RocksDB.getLatestSequenceNumber()J+5
> j  org.apache.hadoop.hdds.utils.db.RocksDatabase.getLatestSequenceNumber()J+18
> j  org.apache.hadoop.hdds.utils.db.RDBStore.getUpdatesSince(JJ)Lorg/apache/hadoop/hdds/utils/db/DBUpdatesWrapper;+395
> j  org.apache.hadoop.ozone.om.OzoneManager.getDBUpdates(Lorg/apache/hadoop/ozone/protocol/proto/OzoneManagerProtocolProtos$DBUpdatesRequest;)Lorg/apache/hadoop/ozone/om/helpers/DBUpdates;+30
> j  org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.getOMDBUpdates(Lorg/apache/hadoop/ozone/protocol/proto/OzoneManagerProtocolProtos$DBUpdatesRequest;)Lorg/apache/hadoop/ozone/protocol/proto/OzoneManagerProtocolProtos$DBUpdatesResponse;+9
> j  org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleReadRequest(Lorg/apache/hadoop/ozone/protocol/proto/OzoneManagerProtocolProtos$OMRequest;)Lorg/apache/hadoop/ozone/protocol/proto/OzoneManagerProtocolProtos$OMResponse;+533
> J 19482 C1 org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitReadRequestToOM(Lorg/apache/hadoop/ozone/protocol/proto/OzoneManagerProtocolProtos$OMRequest;)Lorg/apache/hadoop/ozone/protocol/proto/OzoneManagerProtocolProtos$OMResponse; (45 bytes) @ 0x00007f37b24d48e4 [0x00007f37b24d44a0+0x444]
> J 17472 C1 org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(Lorg/apache/hadoop/ozone/protocol/proto/OzoneManagerProtocolProtos$OMRequest;)Lorg/apache/hadoop/ozone/protocol/proto/OzoneManagerProtocolProtos$OMResponse; (214 bytes) @ 0x00007f37b3789ac4 [0x00007f37b3788ea0+0xc24]
> J 17794 C1 org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB$$Lambda$468.apply(Ljava/lang/Object;)Ljava/lang/Object; (12 bytes) @ 0x00007f37b163c124 [0x00007f37b163bfc0+0x164]
> {noformat}
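
For readers puzzling over the repeated stderr error in the quoted issue: the trace shows `FlushReason.fromValue` rejecting the value 13 with an `IllegalArgumentException`. Below is a minimal sketch of how a strict value-to-enum lookup produces that kind of failure, next to a lenient variant that does not throw. Everything in it is an assumption for illustration: `SketchFlushReason`, its constants and numeric values, and `tryFromValue` are hypothetical and are not RocksJava's actual code or API.

```java
import java.util.Optional;

// Hypothetical stand-in for a binding-side enum. The constants and their
// numeric values are made up; they are NOT RocksJava's FlushReason list.
enum SketchFlushReason {
    REASON_A((byte) 0),
    REASON_B((byte) 1),
    REASON_C((byte) 2);

    private final byte value;

    SketchFlushReason(byte value) {
        this.value = value;
    }

    // Strict lookup: throws when the caller passes a value that this enum
    // does not define, which is the failure shape seen in the stderr excerpt.
    static SketchFlushReason fromValue(byte value) {
        for (SketchFlushReason reason : values()) {
            if (reason.value == value) {
                return reason;
            }
        }
        throw new IllegalArgumentException(
            "Illegal value provided for FlushReason: " + value);
    }

    // Lenient lookup (hypothetical alternative): reports an unknown value
    // as an empty Optional instead of throwing.
    static Optional<SketchFlushReason> tryFromValue(byte value) {
        for (SketchFlushReason reason : values()) {
            if (reason.value == value) {
                return Optional.of(reason);
            }
        }
        return Optional.empty();
    }
}

final class FlushReasonSketchDemo {
    public static void main(String[] args) {
        // A value the enum defines resolves normally.
        System.out.println(SketchFlushReason.fromValue((byte) 1));

        // A value the enum does not define makes the strict lookup throw.
        try {
            SketchFlushReason.fromValue((byte) 13);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }

        // The lenient lookup lets the caller decide what to do instead.
        System.out.println(SketchFlushReason.tryFromValue((byte) 13).isPresent());
    }
}
```

Whether this stderr error is related to the deleted WAL file or to the JVM crash is not established by anything in this message; the sketch only illustrates the exception itself.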