Hi,

We are on Hadoop 0.20.203 and use HDFS and MapReduce for crawling with Apache Nutch. This morning we had our first encounter with an unwilling NameNode, due to an issue very similar to one described on the list earlier [1]. We restored from a backed-up checkpoint, restarted the daemon, and moved the files with missing blocks to lost+found; only transient data was affected.
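For completeness, the cleanup amounted to roughly the following (a sketch; "/" can be narrowed to a subtree, and -move relocates the affected files to /lost+found rather than deleting them):

  # report files with missing or corrupt blocks
  hadoop fsck / -files -blocks -locations

  # move the affected files into /lost+found
  hadoop fsck / -move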
Just now it happened again; the same remedy gave the same result, but it occurred to me that there may be a reason besides coincidence or bad luck. After processing several GB of prepared data we delete (-skipTrash) what we don't need and hurry on. Prior to both corruptions several GB were deleted, and HDFS stopped only seconds later. Is there a possible relation here? It's not something I can test, because we are not willing to risk corrupting important data; we're moving it to other locations as we speak.

Any tips on how to prevent this from happening again, or on finding the source of the problem, are very much appreciated.

[1]: http://hadoop-common.472056.n3.nabble.com/addChild-NullPointerException-when-starting-namenode-and-reading-edits-file-td92226.html

Thanks
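P.S. For concreteness, the delete that precedes each crash looks roughly like this (the path is a placeholder for our prepared-data directory):

  # recursive delete of several GB of intermediate data, bypassing the trash
  hadoop fs -rmr -skipTrash /data/prepared/segments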