Log output prior to first shutdown: Many StateChange lines: .. 2011-10-24 10:09:56,665 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 178.21.118.71:50010 to delete blk_-6474991874786417864_1 10366 blk_4835119182915024496_110378 blk_5400176265071942037_110378 blk_3844651748475104977_110378 blk_-1608821730784127393_110376 bl k_6509383172138875668_110378 blk_5451246938053636538_110378 blk_2897026915704967576_110378 blk_-8171354591956681743_110358 blk_831230 8457484333010_110378 blk_-3230829712632219383_110378 2011-10-24 10:09:56,665 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 178.21.118.127:50010 to delete blk_-3399604157806337594_ 110378 blk_6747099268224653868_110330 blk_-8319971768460712343_110366 blk_-2961731871717799971_110378 blk_7869501598834020846_110374 blk_8093290740979129346_110370 blk_-6013635848766803390_110378 blk_7993259237237421138_110378 blk_5759478854526777476_110378 blk_-249 5357027905458427_110366 blk_-1854083208115601228_110366 2011-10-24 10:09:56,665 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 178.21.118.106:50010 to delete blk_-7802030530595759204_ 110378 blk_8721277224606661791_110374 blk_6747099268224653868_110330 blk_5656147162240510324_110378 blk_2897026915704967576_110378 bl k_-240991845576310137_110378 blk_5938453705724038271_110378 blk_8312308457484333010_110378 blk_-1854083208115601228_110366 blk_469356 3029097482715_110378 blk_2642639928334327569_110374 blk_-1087530163655362136_110378 blk_-330417000415464230_110378 blk_54512469380536 36538_110378 blk_6300864378343690857_110378 blk_9188777915480180330_110378 blk_6198541037144409710_110378 2011-10-24 10:09:56,665 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 178.21.118.102:50010 to delete blk_-8874651062228528404_ 110378 blk_-3399604157806337594_110378 blk_5374073877837004061_110338 blk_-3734465814030860048_110370 blk_4355408303621436967_110378 blk_4924117654799449604_110378 blk_1289214837876543883_110378 blk_-240991845576310137_110378 blk_5938453705724038271_110378 blk_-8294 59405706536971_110370 blk_-7851154197821762387_110322 blk_2792433696712651711_110378 blk_-361468857299887395_110378 blk_5345112692694 71506_110342 blk_-7018473240249214767_110378 blk_5393329649654239848_110370 2011-10-24 10:10:12,065 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions: 1281 Total time for transac tions(ms): 21Number of transactions batched in Syncs: 10 Number of syncs: 870 SyncTimes(ms): 13578 2011-10-24 10:10:26,250 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG /************************************************************ SHUTDOWN_MSG: Shutting down NameNode at namenode01.openindex.io/178.21.118.98 ************************************************************/
Following startup: .. 2011-10-24 10:11:58,601 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s) 2011-10-24 10:12:00,478 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStateMBean and NameNodeMXBean 2011-10-24 10:12:00,592 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names occuring more than 10 times 2011-10-24 10:12:00,609 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 58901 2011-10-24 10:12:17,101 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 1 2011-10-24 10:12:17,373 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 8688539 loaded in 16 seconds. 2011-10-24 10:12:17,376 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.NullPointerException at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1113) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1125) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addNode(FSDirectory.java:1028) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedAddFile(FSDirectory.java:205) at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:613) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1009) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:827) at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:365) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:97) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:379) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:353) at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:254) at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:434) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1153) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1162) > Hi, > > We are on Hadoop 0.20.203 and use HDFS and MapRed for crawling using Apache > Nutch. This morning we had our first encounter with an unwilling NameNode > due to a very similar issue descriped on the list earlier [1]. > > We replaced with a backed up checkpoint, restarted the daemon and moved > files with missing blocks to lost+found. Only transient data was affected. > > Just now it happened again, same remedy and same results but it occured to > me that there may be a reason besides coincidence or bad luck. After > processing several GB's of prepared data we delete (-skipTrash) what we > don't need and hurry on. > > Prior to both corruptions a several GB's were deleted and HDFS stopped only > seconds later. Is there a possible relation here? It's not something i can > test because we are not willing to risk corruption of important data; we're > moving it to other locations as we speak. > > Any tips on how to prevent this from happening again or finding the source > of the problem are very much appreciated. > > [1]: > http://hadoop-common.472056.n3.nabble.com/addChild-NullPointerException- > when-starting-namenode-and-reading-edits-file-td92226.html > > Thanks