Tamir,

Please file a jira on the problem you are seeing with 'saveLeases'. In the past there have been multiple fixes in this area (HADOOP-3418, HADOOP-3724, and more mentioned in HADOOP-3724).

Also refer to the thread you started: http://www.mail-archive.com/core-user@hadoop.apache.org/msg09397.html

I think another user reported the same problem recently.

These are indeed very serious and very annoying bugs.

Raghu.

Tamir Kamara wrote:
I didn't have a space problem which led to it (I think). The corruption
started after I bounced the cluster.
At the time, I tried to investigate what led to the corruption but didn't
find anything useful in the logs besides this line:
saveLeases found path
/tmp/temp623789763/tmp659456056/_temporary_attempt_200904211331_0010_r_000002_0/part-00002
but no matching entry in namespace

I also tried to recover from the secondary namenode files, but the
corruption was too wide-spread and I had to format.
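
(For reference, the usual way to seed a fresh namenode from the secondary's
last checkpoint - assuming fs.checkpoint.dir still holds a good image - is
something along these lines; exact paths and options may differ by version:

  # start the namenode against an empty dfs.name.dir and pull in
  # the most recent checkpoint from fs.checkpoint.dir
  bin/hadoop namenode -importCheckpoint

In my case the checkpoint itself was unusable, so this didn't help.)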

Tamir

On Mon, May 4, 2009 at 4:48 PM, Stas Oskin <stas.os...@gmail.com> wrote:

Hi.

Same conditions - where the space has run out and the fs got corrupted?

Or it got corrupted by itself (which is even more worrying)?

Regards.

2009/5/4 Tamir Kamara <tamirkam...@gmail.com>

I had the same problem a couple of weeks ago with 0.19.1. Had to reformat
the cluster too...

On Mon, May 4, 2009 at 3:50 PM, Stas Oskin <stas.os...@gmail.com> wrote:

Hi.

After rebooting the NameNode server, I found out the NameNode doesn't start
anymore.

The logs contained this error:
"FSNamesystem initialization failed"


I suspected filesystem corruption, so I tried to recover from
SecondaryNameNode. Problem is, it was completely empty!

I had an issue that might have caused this - the root mount had run out of
space. But both the NameNode and the SecondaryNameNode directories were on
another mount point with plenty of space there - so it's very strange that
they were impacted in any way.
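
(I understand that dfs.name.dir and fs.checkpoint.dir can each take a
comma-separated list of directories, so the image is written to several
volumes and one bad disk shouldn't lose it - would that have helped here?
A rough hadoop-site.xml sketch, with placeholder paths:

  <property>
    <name>dfs.name.dir</name>
    <value>/mnt/disk1/dfs/name,/mnt/disk2/dfs/name</value>
  </property>
  <property>
    <name>fs.checkpoint.dir</name>
    <value>/mnt/disk1/dfs/namesecondary,/mnt/disk2/dfs/namesecondary</value>
  </property>

As far as I understand, the namenode writes the image and edit log to every
directory listed in dfs.name.dir, so one entry could even be an NFS mount.)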

Perhaps the logs, which were located on the root mount and as a result could
not be written, have caused this?


To get HDFS running again, I had to format the HDFS (including manually
erasing the files from the DataNodes). While this is reasonable in a test
environment - production-wise it would be very bad.
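
For completeness, the reformat amounted to roughly the following (the
dfs.data.dir path below is only a placeholder - it depends on the
configuration):

  # on the NameNode: stop HDFS and wipe the namespace
  bin/stop-dfs.sh
  bin/hadoop namenode -format
  # on every DataNode: remove the old block data so it can join the new namespace
  rm -rf /path/to/dfs/data/*
  # bring HDFS back up
  bin/start-dfs.sh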

Any idea why it happened, and what can be done to prevent it in the
future?
I'm using the stable 0.18.3 version of Hadoop.

Thanks in advance!


