[ https://issues.apache.org/jira/browse/HDFS-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kihwal Lee resolved HDFS-8011.
------------------------------
    Resolution: Cannot Reproduce

It was very likely fixed in later releases, when we fixed similar issues.

> standby nn can't started
> ------------------------
>
>                 Key: HDFS-8011
>                 URL: https://issues.apache.org/jira/browse/HDFS-8011
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha
>    Affects Versions: 2.3.0
>         Environment: centeros 6.2 64bit
>            Reporter: fujie
>
> We have seen a crash with fatal errors when starting the standby namenode.
> Any solutions, workarounds, or ideas would be helpful for us.
>
> 1. Here is the context:
>    At the beginning we had 2 namenodes, with A as active and B as standby.
>    For some reason, namenode A died, so namenode B took over as active.
>    When we tried to restart A after a minute, it would not start. During
>    this time a lot of files were put into HDFS, and a lot of files were
>    renamed. Namenode A crashed each time while "awaiting reported blocks
>    in safemode".
>
> 2. We can see the error log below:
>
> 1) 2015-03-30 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader:
> Encountered exception on operation CloseOp [length=0, inodeId=0,
> path=/xxx/_temporary/xxx/part-r-00074.bz2, replication=3,
> mtime=1427699913947, atime=1427699081161, blockSize=268435456,
> blocks=[blk_2103131025_1100889495739], permissions=dm:dm:rw-r--r--,
> clientName=, clientMachine=, opCode=OP_CLOSE, txid=7632753612]
> java.lang.NullPointerException
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction.setGenerationStampAndVerifyReplicas(BlockInfoUnderConstruction.java:247)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoUnderConstruction.commitBlock(BlockInfoUnderConstruction.java:267)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.forceCompleteBlock(BlockManager.java:639)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.updateBlocks(FSEditLogLoader.java:813)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:383)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:209)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:122)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:737)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:227)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:321)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$0(EditLogTailer.java:302)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:296)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:356)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528)
>     at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:413)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:292)
>
> 2) 2015-03-30 FATAL org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer:
> Unknown error encountered while tailing edits. Shutting down standby NN.
> java.io.IOException: Failed to apply edit log operation AddBlockOp
> [path=/xxx/_temporary/xxx/part-m-00121,
> penultimateBlock=blk_2102331803_1100888911441,
> lastBlock=blk_2102661068_1100889009168, RpcClientId=, RpcCallId=-2]: error
> null
>     at org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:94)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:215)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:122)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:737)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:227)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:321)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$0(EditLogTailer.java:302)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:296)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:356)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528)
>     at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:413)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:292)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)