Tamir Kamara wrote:
Hi Raghu,

The thread you posted is my original post written when this problem first
happened on my cluster. I can file a JIRA but I wouldn't be able to provide
information other than what I already posted and I don't have the logs from
that time. Should I still file?

Yes. Jira is a better place for tracking and fixing bugs. I am pretty sure what you saw is a bug (either one that has already been fixed or one that still needs to be fixed).

Raghu.

Thanks,
Tamir


On Tue, May 5, 2009 at 9:14 PM, Raghu Angadi <rang...@yahoo-inc.com> wrote:

Tamir,

Please file a jira on the problem you are seeing with 'saveLeases'. In the
past there have been multiple fixes in this area (HADOOP-3418, HADOOP-3724,
and more mentioned in HADOOP-3724).

Also refer to the thread you started:
http://www.mail-archive.com/core-user@hadoop.apache.org/msg09397.html

I think another user reported the same problem recently.

These are indeed very serious and very annoying bugs.

Raghu.


Tamir Kamara wrote:

I didn't have a space problem which led to it (I think). The corruption
started after I bounced the cluster.
At the time, I tried to investigate what led to the corruption but didn't
find anything useful in the logs besides this line:
saveLeases found path /tmp/temp623789763/tmp659456056/_temporary_attempt_200904211331_0010_r_000002_0/part-00002 but no matching entry in namespace

I also tried to recover from the secondary name node files but the corruption was too widespread and I had to format.
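For what it's worth, the recovery attempt went roughly like this; I'm writing it from memory and the directory name is only a placeholder for my dfs.name.dir, so treat it as a sketch rather than an exact transcript:

  # stop HDFS, set the damaged name directory aside, and try to let the
  # NameNode rebuild dfs.name.dir from the secondary's last checkpoint
  bin/stop-dfs.sh
  mv /data/dfs/name /data/dfs/name.broken
  mkdir /data/dfs/name
  # if I remember correctly, -importCheckpoint loads the image from
  # fs.checkpoint.dir into the empty dfs.name.dir and then starts the NameNode
  bin/hadoop namenode -importCheckpoint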

Tamir

On Mon, May 4, 2009 at 4:48 PM, Stas Oskin <stas.os...@gmail.com> wrote:

Hi.
Same conditions, i.e. the space ran out and then the fs got corrupted?

Or did it get corrupted by itself (which is even more worrying)?

Regards.

2009/5/4 Tamir Kamara <tamirkam...@gmail.com>

I had the same problem a couple of weeks ago with 0.19.1. Had to reformat the cluster too...

On Mon, May 4, 2009 at 3:50 PM, Stas Oskin <stas.os...@gmail.com> wrote:

Hi.

After rebooting the NameNode server, I found out the NameNode doesn't start anymore.

The logs contained this error:
"FSNamesystem initialization failed"


I suspected filesystem corruption, so I tried to recover from
SecondaryNameNode. Problem is, it was completely empty!
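To be clear, by "completely empty" I mean the checkpoint directory on disk had nothing usable in it. Roughly what I looked at (the paths are placeholders standing in for my dfs.name.dir and fs.checkpoint.dir):

  # NameNode image directory (dfs.name.dir) - if memory serves, it normally
  # holds current/fsimage, current/edits, current/fstime and current/VERSION
  ls -lR /mnt/data/dfs/name
  # SecondaryNameNode checkpoint directory (fs.checkpoint.dir) -
  # this is the one that turned out to be empty
  ls -lR /mnt/data/dfs/namesecondary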

I had an issue that might have caused this - the root mount has run out of space. But both the NameNode and the SecondaryNameNode directories were on another mount point with plenty of space there - so it's very strange that they were impacted in any way.
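For reference, the relevant entries in my conf/hadoop-site.xml look more or less like this (the mount point names are placeholders). As far as I understand, dfs.name.dir can take a comma-separated list of directories so the namespace image is written to more than one disk, which I haven't set up here:

  <property>
    <name>dfs.name.dir</name>
    <!-- can be a comma-separated list, e.g. a second local disk or an
         NFS mount, so the image and edits survive a single bad disk -->
    <value>/mnt/data/dfs/name</value>
  </property>
  <property>
    <name>fs.checkpoint.dir</name>
    <value>/mnt/data/dfs/namesecondary</value>
  </property>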

Perhaps the logs, which were located on the root mount and as a result could not be written, have caused this?

To get HDFS running again, I had to format the HDFS (including manually erasing the files from the DataNodes). While this is reasonable in a test environment, production-wise it would be very bad.

Any idea why it happened, and what can be done to prevent it in the future?

I'm using the stable 0.18.3 version of Hadoop.

Thanks in advance!



