Hi.

This is quite a worrisome issue.

Can anyone advise on this? I'm really concerned it could appear in
production and cause huge data loss.

Is there any way to recover from this?

Regards.

2009/5/5 Tamir Kamara <tamirkam...@gmail.com>

> I didn't have a space problem which led to it (I think). The corruption
> started after I bounced the cluster.
> At the time, I tried to investigate what led to the corruption but didn't
> find anything useful in the logs besides this line:
> saveLeases found path
>
> /tmp/temp623789763/tmp659456056/_temporary_attempt_200904211331_0010_r_000002_0/part-00002
> but no matching entry in namespace
>
> I also tried to recover from the secondary name node files but the
> corruption was too widespread and I had to format.
>
> Tamir
>
> On Mon, May 4, 2009 at 4:48 PM, Stas Oskin <stas.os...@gmail.com> wrote:
>
> > Hi.
> >
> > Same conditions - where the space has run out and the fs got corrupted?
> >
> > Or did it get corrupted by itself (which is even more worrying)?
> >
> > Regards.
> >
> > 2009/5/4 Tamir Kamara <tamirkam...@gmail.com>
> >
> > > I had the same problem a couple of weeks ago with 0.19.1. Had to reformat
> > > the cluster too...
> > >
> > > On Mon, May 4, 2009 at 3:50 PM, Stas Oskin <stas.os...@gmail.com> wrote:
> > >
> > > > Hi.
> > > >
> > > > After rebooting the NameNode server, I found that the NameNode doesn't
> > > > start anymore.
> > > >
> > > > The logs contained this error:
> > > > "FSNamesystem initialization failed"
> > > >
> > > >
> > > > I suspected filesystem corruption, so I tried to recover from
> > > > SecondaryNameNode. Problem is, it was completely empty!
> > > >
> > > > I had an issue that might have caused this - the root mount had run out
> > > > of space. But both the NameNode and the SecondaryNameNode directories
> > > > were on another mount point with plenty of space - so it's very strange
> > > > that they were impacted in any way.
> > > >
> > > > Perhaps the logs, which were located on the root mount and as a result
> > > > could not be written, caused this?
> > > >
> > > >
> > > > To get HDFS running again, I had to format HDFS (including manually
> > > > erasing the files from the DataNodes). While this is reasonable in a test
> > > > environment, in production it would be very bad.
> > > >
> > > > Any idea why it happened, and what can be done to prevent it in the
> > > > future?
> > > > I'm using the stable 0.18.3 version of Hadoop.
> > > >
> > > > Thanks in advance!
> > > >
> > >
> >
>
