Hi,

We are on Hadoop 0.20.203 and use HDFS and MapReduce for crawling with Apache Nutch. This morning we had our first encounter with an unwilling NameNode, due to an issue very similar to one described on the list earlier [1]. We restored from a backed-up checkpoint, restarted the daemon, and moved the files with missing blocks to lost+found; only transient data was affected.
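For completeness, the cleanup amounted to roughly the following (a sketch; "/" can be narrowed to a subtree, and -move relocates the affected files to /lost+found rather than deleting them):

  # report files with missing or corrupt blocks
  hadoop fsck / -files -blocks -locations

  # move the affected files into /lost+found
  hadoop fsck / -move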
Just now it happened again; the same remedy gave the same result, but it occurred to me that there may be a reason besides coincidence or bad luck. After processing several GB of prepared data we delete (-skipTrash) what we don't need and hurry on. Prior to both corruptions several GB were deleted, and HDFS stopped only seconds later. Is there a possible relation here? It's not something I can test, because we are not willing to risk corrupting important data; we're moving it to other locations as we speak.

Any tips on how to prevent this from happening again, or on finding the source of the problem, are very much appreciated.

[1]: http://hadoop-common.472056.n3.nabble.com/addChild-NullPointerException-when-starting-namenode-and-reading-edits-file-td92226.html

Thanks
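P.S. For concreteness, the delete that precedes each crash looks roughly like this (the path is a placeholder for our prepared-data directory):

  # recursive delete of several GB of intermediate data, bypassing the trash
  hadoop fs -rmr -skipTrash /data/prepared/segments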