[ https://issues.apache.org/jira/browse/HDDS-8267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wei-Chiu Chuang updated HDDS-8267:
----------------------------------
    Attachment: stderr.log

> getOMDBUpdates requests crashed Ozone Manager
> ---------------------------------------------
>
>                 Key: HDDS-8267
>                 URL: https://issues.apache.org/jira/browse/HDDS-8267
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Wei-Chiu Chuang
>            Priority: Critical
>         Attachments: HDDS-8267-ozone-om.log, hs_err_pid195294.log, stderr.log
>
>
> An OM crashed, its log has:
> {noformat}
> 2023-03-20 23:29:11,751 ERROR org.apache.hadoop.hdds.utils.db.RDBStore: Unable to get delta updates since sequenceNumber 98273. This exception will not be thrown to the client
> java.io.IOException: RocksDatabase[/var/lib/hadoop-ozone/om/data/om.db]: Failed to getUpdatesSince 98273; status : IOError(Undefined); message : while stat a file for size: /var/lib/hadoop-ozone/om/data/om.db/000121.log: No such file or directory
>     at org.apache.hadoop.hdds.utils.HddsServerUtil.toIOException(HddsServerUtil.java:576)
>     at org.apache.hadoop.hdds.utils.db.RocksDatabase.toIOException(RocksDatabase.java:85)
>     at org.apache.hadoop.hdds.utils.db.RocksDatabase.getUpdatesSince(RocksDatabase.java:724)
>     at org.apache.hadoop.hdds.utils.db.RDBStore.getUpdatesSince(RDBStore.java:368)
>     at org.apache.hadoop.ozone.om.OzoneManager.getDBUpdates(OzoneManager.java:3975)
>     at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.getOMDBUpdates(OzoneManagerRequestHandler.java:354)
>     at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleReadRequest(OzoneManagerRequestHandler.java:233)
>     at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitReadRequestToOM(OzoneManagerProtocolServerSideTranslatorPB.java:223)
>     at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:177)
>     at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:87)
>     at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:147)
>     at org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:989)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:917)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2894)
> Caused by: org.rocksdb.RocksDBException: while stat a file for size: /var/lib/hadoop-ozone/om/data/om.db/000121.log: No such file or directory
>     at org.rocksdb.RocksDB.getUpdatesSince(Native Method)
>     at org.rocksdb.RocksDB.getUpdatesSince(RocksDB.java:3966)
>     at org.apache.hadoop.hdds.utils.db.RocksDatabase.getUpdatesSince(RocksDatabase.java:721)
>     ... 17 more
> {noformat}
> There is no 000121.log under the directory for sure.
> {noformat}
> ...
> -rw-r--r-- 1 hdfs hdfs 1219 Mar 20 22:07 000091.sst
> -rw-r--r-- 1 hdfs hdfs 36002 Mar 20 22:07 000092.sst
> -rw-r--r-- 1 hdfs hdfs 86659 Mar 20 22:07 000093.sst
> -rw-r--r-- 1 hdfs hdfs 137260115 Mar 20 23:28 000166.log
> -rw-r--r-- 1 hdfs hdfs 23833 Mar 20 23:28 000167.sst
> -rw-r--r-- 1 hdfs hdfs 1221 Mar 20 23:28 000168.sst
> -rw-r--r-- 1 hdfs hdfs 11350 Mar 20 23:28 000169.sst
> -rw-r--r-- 1 hdfs hdfs 1888 Mar 20 23:28 000170.sst
> -rw-r--r-- 1 hdfs hdfs 6001 Mar 20 23:28 000171.sst
> -rw-r--r-- 1 hdfs hdfs 10118 Mar 20 23:28 000172.sst
> -rw-r--r-- 1 hdfs hdfs 1330 Mar 20 23:28 000173.sst
> -rw-r--r-- 1 hdfs hdfs 303259 Mar 20 23:28 000175.sst
> -rw-r--r-- 1 hdfs hdfs 295146251 Mar 20 23:29 000176.log
> -rw-r--r-- 1 hdfs hdfs 90863 Mar 20 23:28 000177.sst
> {noformat}
> The stderr has the following rocksdb error repeatedly:
> {noformat}
> Exception in thread "Thread-835" java.lang.IllegalArgumentException: Illegal value provided for FlushReason: 13
>     at org.rocksdb.FlushReason.fromValue(FlushReason.java:51)
>     at org.rocksdb.FlushJobInfo.<init>(FlushJobInfo.java:41)
> Exception in thread "Thread-837" java.lang.IllegalArgumentException: Illegal value provided for FlushReason: 13
>     at org.rocksdb.FlushReason.fromValue(FlushReason.java:51)
>     at org.rocksdb.FlushJobInfo.<init>(FlushJobInfo.java:41)
> Exception in thread "Thread-836" java.lang.IllegalArgumentException: Illegal value provided for FlushReason: 13
>     at org.rocksdb.FlushReason.fromValue(FlushReason.java:51)
>     at org.rocksdb.FlushJobInfo.<init>(FlushJobInfo.java:41)
> Exception in thread "Thread-842" java.lang.IllegalArgumentException: Illegal value provided for FlushReason: 13
>     at org.rocksdb.FlushReason.fromValue(FlushReason.java:51)
>     at org.rocksdb.FlushJobInfo.<init>(FlushJobInfo.java:41)
> {noformat}
> There is a core dump and hs_err_pid195294.log:
> {noformat}
> Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
> j  org.rocksdb.RocksDB.getLatestSequenceNumber(J)J+0
> j  org.rocksdb.RocksDB.getLatestSequenceNumber()J+5
> j  org.apache.hadoop.hdds.utils.db.RocksDatabase.getLatestSequenceNumber()J+18
> j  org.apache.hadoop.hdds.utils.db.RDBStore.getUpdatesSince(JJ)Lorg/apache/hadoop/hdds/utils/db/DBUpdatesWrapper;+395
> j  org.apache.hadoop.ozone.om.OzoneManager.getDBUpdates(Lorg/apache/hadoop/ozone/protocol/proto/OzoneManagerProtocolProtos$DBUpdatesRequest;)Lorg/apache/hadoop/ozone/om/helpers/DBUpdates;+30
> j  org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.getOMDBUpdates(Lorg/apache/hadoop/ozone/protocol/proto/OzoneManagerProtocolProtos$DBUpdatesRequest;)Lorg/apache/hadoop/ozone/protocol/proto/OzoneManagerProtocolProtos$DBUpdatesResponse;+9
> j  org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handleReadRequest(Lorg/apache/hadoop/ozone/protocol/proto/OzoneManagerProtocolProtos$OMRequest;)Lorg/apache/hadoop/ozone/protocol/proto/OzoneManagerProtocolProtos$OMResponse;+533
> J 19482 C1 org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitReadRequestToOM(Lorg/apache/hadoop/ozone/protocol/proto/OzoneManagerProtocolProtos$OMRequest;)Lorg/apache/hadoop/ozone/protocol/proto/OzoneManagerProtocolProtos$OMResponse; (45 bytes) @ 0x00007f37b24d48e4 [0x00007f37b24d44a0+0x444]
> J 17472 C1 org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(Lorg/apache/hadoop/ozone/protocol/proto/OzoneManagerProtocolProtos$OMRequest;)Lorg/apache/hadoop/ozone/protocol/proto/OzoneManagerProtocolProtos$OMResponse; (214 bytes) @ 0x00007f37b3789ac4 [0x00007f37b3788ea0+0xc24]
> J 17794 C1 org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB$$Lambda$468.apply(Ljava/lang/Object;)Ljava/lang/Object; (12 bytes) @ 0x00007f37b163c124 [0x00007f37b163bfc0+0x164]
> {noformat}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)