Tamir,

Please file a jira on the problem you are seeing with 'saveLeases'. In the past there have been multiple fixes in this area (HADOOP-3418, HADOOP-3724, and more mentioned in HADOOP-3724).

Also refer to the thread you started: http://www.mail-archive.com/core-user@hadoop.apache.org/msg09397.html

I think another user reported the same problem recently.

These are indeed very serious and very annoying bugs.

Raghu.

Tamir Kamara wrote:
I didn't have a space problem which led to it (I think). The corruption
started after I bounced the cluster.
At the time, I tried to investigate what led to the corruption but didn't
find anything useful in the logs besides this line:
saveLeases found path
/tmp/temp623789763/tmp659456056/_temporary_attempt_200904211331_0010_r_000002_0/part-00002
but no matching entry in namespace

I also tried to recover from the secondary namenode files, but the
corruption was too wide-spread and I had to format.
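
(For reference, the usual way to seed a fresh namenode from the secondary's
last checkpoint - assuming fs.checkpoint.dir still holds a good image - is
something along these lines; exact paths and options may differ by version:

  # start the namenode against an empty dfs.name.dir and pull in
  # the most recent checkpoint from fs.checkpoint.dir
  bin/hadoop namenode -importCheckpoint

In my case the checkpoint itself was unusable, so this didn't help.)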

Tamir

On Mon, May 4, 2009 at 4:48 PM, Stas Oskin <stas.os...@gmail.com> wrote:

Hi.

Same conditions - where the space has run out and the fs got corrupted?

Or it got corrupted by itself (which is even more worrying)?

Regards.

2009/5/4 Tamir Kamara <tamirkam...@gmail.com>

I had the same problem a couple of weeks ago with 0.19.1. Had to reformat
the cluster too...

On Mon, May 4, 2009 at 3:50 PM, Stas Oskin <stas.os...@gmail.com> wrote:

Hi.

After rebooting the NameNode server, I found out the NameNode doesn't start
anymore.

The logs contained this error:
"FSNamesystem initialization failed"


I suspected filesystem corruption, so I tried to recover from
SecondaryNameNode. Problem is, it was completely empty!

I had an issue that might have caused this - the root mount had run out of
space. But both the NameNode and the SecondaryNameNode directories were on
another mount point with plenty of space there - so it's very strange that
they were impacted in any way.
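
(I understand that dfs.name.dir and fs.checkpoint.dir can each take a
comma-separated list of directories, so the image is written to several
volumes and one bad disk shouldn't lose it - would that have helped here?
A rough hadoop-site.xml sketch, with placeholder paths:

  <property>
    <name>dfs.name.dir</name>
    <value>/mnt/disk1/dfs/name,/mnt/disk2/dfs/name</value>
  </property>
  <property>
    <name>fs.checkpoint.dir</name>
    <value>/mnt/disk1/dfs/namesecondary,/mnt/disk2/dfs/namesecondary</value>
  </property>

As far as I understand, the namenode writes the image and edit log to every
directory listed in dfs.name.dir, so one entry could even be an NFS mount.)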

Perhaps the logs, which were located on the root mount and as a result could
not be written, have caused this?


To get HDFS running again, I had to format the HDFS (including manually
erasing the files from the DataNodes). While this is reasonable in a test
environment - production-wise it would be very bad.
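
For completeness, the reformat amounted to roughly the following (the
dfs.data.dir path below is only a placeholder - it depends on the
configuration):

  # on the NameNode: stop HDFS and wipe the namespace
  bin/stop-dfs.sh
  bin/hadoop namenode -format
  # on every DataNode: remove the old block data so it can join the new namespace
  rm -rf /path/to/dfs/data/*
  # bring HDFS back up
  bin/start-dfs.sh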

Any idea why it happened, and what can be done to prevent it in the
future?
I'm using the stable 0.18.3 version of Hadoop.

Thanks in advance!


