[ https://issues.apache.org/jira/browse/HDFS-14529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373706#comment-17373706 ]
Wei-Chiu Chuang commented on HDFS-14529:
----------------------------------------

We encountered this bug again, and it is reproducible with this set of fsimage/edit logs.

We added debug logs and found that the IIP is missing components. The path was supposed to have 8 components, but only 6 were found; the last two were null. This is likely caused by files that were already deleted from snapshots. Somehow the active NN keeps the file in memory, so the standby NameNode crashes when loading the edits.

Comparing this method with other similar methods, I think we should check whether iip.getLastINode() is null and, if so, throw FileNotFoundException. There are other places in the code where we could add the same null check as well. It also failed several times on other edit log ops (mkdir, rename, renameSnapshot).

{noformat}
21/07/02 11:39:39 ERROR namenode.FSEditLogLoader: AssertionError caught in unprotectedSetTimes: iip=INodesInPath: path = /apps/hive/warehouse/ea_common.db/sls_blng_rw/ins_gmt_dt=2021-06-22/part-00001-087de2ec-7888-4f2b-bea6-3702c69cf953.c000 inodes = [, apps, hive, warehouse, ea_common.db, sls_blng_rw, null, null], length=8 isSnapshot = false snapshotId = 8014, lastINode=null, mtime=-1, atime=1624825911021, force? true
java.lang.AssertionError: i = 6 != 8, this=INodesInPath: path = /apps/hive/warehouse/ea_common.db/sls_blng_rw/ins_gmt_dt=2021-06-22/part-00001-087de2ec-7888-4f2b-bea6-3702c69cf953.c000 inodes = [, apps, hive, warehouse, ea_common.db, sls_blng_rw, null, null], length=8 isSnapshot = false snapshotId = 8014
	at org.apache.hadoop.hdfs.server.namenode.INodesInPath.validate(INodesInPath.java:488)
	at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSetTimes(FSDirAttrOp.java:355)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:631)
{noformat}

> NPE while Loading the Editlogs
> ------------------------------
>
>                 Key: HDFS-14529
>                 URL: https://issues.apache.org/jira/browse/HDFS-14529
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 3.1.1
>            Reporter: Harshakiran Reddy
>            Assignee: Ayush Saxena
>            Priority: Major
>
> {noformat}
> 2019-05-31 15:15:42,397 ERROR namenode.FSEditLogLoader: Encountered exception on operation TimesOp [length=0, path=/testLoadSpace/dir0/dir0/dir0/dir2/_file_9096763, mtime=-1, atime=1559294343288, opCode=OP_TIMES, txid=18927893]
> java.lang.NullPointerException
> 	at org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.unprotectedSetTimes(FSDirAttrOp.java:490)
> 	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:711)
> 	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:286)
> 	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:181)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:924)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:771)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:331)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1105)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:726)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNode.doRecovery(NameNode.java:1558)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1640)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1725)
> {noformat}
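
For illustration, the null check proposed in the comment above (test the last inode of the resolved path and throw FileNotFoundException when it is null) could look roughly like the sketch below. This is not the actual Hadoop patch: `ResolvedPathSketch`, `SetTimesNullCheckSketch`, and `requireLastINode` are hypothetical stand-ins invented here, and only the `getLastINode()` name and the use of `java.io.FileNotFoundException` come from the comment itself.

```java
import java.io.FileNotFoundException;

// Hypothetical stand-in for a resolved path; NOT the real
// org.apache.hadoop.hdfs.server.namenode.INodesInPath class.
class ResolvedPathSketch {
    private final String path;
    private final String[] inodes; // null entries model components that did not resolve

    ResolvedPathSketch(String path, String[] inodes) {
        this.path = path;
        this.inodes = inodes;
    }

    String getPath() {
        return path;
    }

    // Returns the last component, or null if the path did not fully resolve.
    String getLastINode() {
        return inodes.length == 0 ? null : inodes[inodes.length - 1];
    }
}

class SetTimesNullCheckSketch {
    // Hypothetical helper: fail early with a checked, descriptive exception
    // instead of hitting an AssertionError or NullPointerException later.
    static String requireLastINode(ResolvedPathSketch iip) throws FileNotFoundException {
        String last = iip.getLastINode();
        if (last == null) {
            throw new FileNotFoundException(
                "File/Directory " + iip.getPath() + " does not exist.");
        }
        return last;
    }

    public static void main(String[] args) {
        // Mirrors the shape of the logged IIP: trailing components are null.
        ResolvedPathSketch missing = new ResolvedPathSketch(
            "/a/b/gone", new String[] {"", "a", "b", null});
        try {
            requireLastINode(missing);
        } catch (FileNotFoundException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

Whether the edit log loader should then skip the op or abort when this exception is raised is a separate decision that the sketch does not address.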