Hi. This is quite a worrisome issue.
Can anyone advise on this? I'm really concerned that it could appear in production and cause a huge data loss.

Is there any way to recover from this?

Regards.

2009/5/5 Tamir Kamara <tamirkam...@gmail.com>

> I didn't have a space problem which led to it (I think). The corruption
> started after I bounced the cluster.
> At the time, I tried to investigate what led to the corruption but didn't
> find anything useful in the logs besides this line:
> saveLeases found path
> /tmp/temp623789763/tmp659456056/_temporary_attempt_200904211331_0010_r_000002_0/part-00002
> but no matching entry in namespace
>
> I also tried to recover from the secondary name node files but the
> corruption was too widespread and I had to format.
>
> Tamir
>
> On Mon, May 4, 2009 at 4:48 PM, Stas Oskin <stas.os...@gmail.com> wrote:
>
> > Hi.
> >
> > Same conditions - where the space has run out and the fs got corrupted?
> >
> > Or it got corrupted by itself (which is even more worrying)?
> >
> > Regards.
> >
> > 2009/5/4 Tamir Kamara <tamirkam...@gmail.com>
> >
> > > I had the same problem a couple of weeks ago with 0.19.1. Had to reformat
> > > the cluster too...
> > >
> > > On Mon, May 4, 2009 at 3:50 PM, Stas Oskin <stas.os...@gmail.com> wrote:
> > >
> > > > Hi.
> > > >
> > > > After rebooting the NameNode server, I found out the NameNode doesn't
> > > > start anymore.
> > > >
> > > > The logs contained this error:
> > > > "FSNamesystem initialization failed"
> > > >
> > > > I suspected filesystem corruption, so I tried to recover from
> > > > SecondaryNameNode. Problem is, it was completely empty!
> > > >
> > > > I had an issue that might have caused this - the root mount has run out of
> > > > space. But, both the NameNode and the SecondaryNameNode directories were on
> > > > another mount point with plenty of space there - so it's very strange that
> > > > they were impacted in any way.
> > > >
> > > > Perhaps the logs, which were located on root mount and as a result
> > > > could not be written, have caused this?
> > > >
> > > > To get HDFS back running, I had to format the HDFS (including manually
> > > > erasing the files from DataNodes). While this is reasonable in a test
> > > > environment, production-wise it would be very bad.
> > > >
> > > > Any idea why it happened, and what can be done to prevent it in the future?
> > > > I'm using the stable 0.18.3 version of Hadoop.
> > > >
> > > > Thanks in advance!