A corrupted HDFS edits log file can lead to major data loss.
------------------------------------------------------------
Key: HADOOP-760
URL: http://issues.apache.org/jira/browse/HADOOP-760
Project: Hadoop
Issue Type: Bug
Components: dfs
Affects Versions: 0.6.1
Reporter: Philippe Gassmann
Priority: Critical
On one of our test systems, our HDFS got corrupted after the edits log file
itself had been corrupted (I can tell how).
When we restarted HDFS, the namenode refused to start, with an exception
logged in hadoop-namenode-xxx.out.
Unfortunately, due to an rm mistake, I was not able to save that exception
anywhere.
But it was an ArrayIndexOutOfBoundsException thrown somewhere in a UTF8 method
called from FSEditLog.loadFSEdits.
The result: the namenode was unable to start, and the only way to fix it was
to remove the edits log file.
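
To illustrate the failure mode, here is a minimal, self-contained Java sketch.
It is not Hadoop's UTF8 or FSEditLog code (the class and helper names below
are made up); it only shows how replaying a length-prefixed string record
whose length field no longer matches the bytes actually on disk surfaces as a
low-level runtime exception instead of a clean "edits file is corrupt" error:

    import java.io.ByteArrayInputStream;
    import java.io.DataInputStream;
    import java.io.IOException;

    public class CorruptEditsReplay {
        // Hypothetical record format: a two-byte length prefix followed by
        // that many bytes of string data (the general shape used by
        // UTF8-style writables).
        static String readLengthPrefixedString(DataInputStream in) throws IOException {
            int len = in.readUnsignedShort();   // a corrupted prefix can claim far more data than exists
            byte[] buf = new byte[len];
            in.readFully(buf);                  // fails when the stream is shorter than 'len'
            return new String(buf, "UTF-8");
        }

        public static void main(String[] args) throws IOException {
            // A record whose prefix claims 0x7FFF bytes but whose payload was
            // truncated, e.g. by a partial write before a crash.
            byte[] corrupted = { 0x7F, (byte) 0xFF, 'h', 'i' };
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(corrupted));
            // This line throws java.io.EOFException with readFully; a
            // hand-rolled byte-copy loop without bounds checks can instead
            // throw an ArrayIndexOutOfBoundsException at the same point.
            System.out.println(readLengthPrefixedString(in));
        }
    }

With a hand-rolled copy loop instead of readFully, the same corruption can
show up as an ArrayIndexOutOfBoundsException, which matches the symptom we
saw during FSEditLog.loadFSEdits.
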
As it was a test machine, we had no backup, so all files created in HDFS since
the last start of the namenode were lost.
Is there a way to periodically commit HDFS changes into the fsimage instead of
keeping a huge edits log (e.g. every 10 minutes or so)?
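
For context on that periodic-commit question, the sketch below is purely
conceptual and does not reflect Hadoop's actual checkpointing code (the
NamespaceStore interface and file names are invented). It just illustrates the
"merge the edits into the image every N minutes, then start a fresh edits
file" idea, so that a corrupt edit-log tail can only ever lose the most recent
window of changes:

    import java.io.File;
    import java.io.IOException;
    import java.util.Timer;
    import java.util.TimerTask;

    public class PeriodicCheckpointer {
        // Hypothetical hooks standing in for loading the namespace image,
        // replaying the edit log, and writing a merged image back out.
        interface NamespaceStore {
            void loadImage(File image) throws IOException;
            void replayEdits(File edits) throws IOException;
            void saveImage(File newImage) throws IOException;
        }

        private final NamespaceStore store;
        private final File image;
        private final File edits;
        private final File checkpoint;

        PeriodicCheckpointer(NamespaceStore store, File dir) {
            this.store = store;
            this.image = new File(dir, "fsimage");
            this.edits = new File(dir, "edits");
            this.checkpoint = new File(dir, "fsimage.ckpt");
        }

        // Run a checkpoint every 'periodMillis' on a background daemon timer.
        void start(long periodMillis) {
            new Timer("checkpoint", true).schedule(new TimerTask() {
                public void run() {
                    try {
                        checkpoint();
                    } catch (IOException e) {
                        e.printStackTrace();   // a real implementation would alert, not swallow
                    }
                }
            }, periodMillis, periodMillis);
        }

        synchronized void checkpoint() throws IOException {
            store.loadImage(image);        // current on-disk namespace
            store.replayEdits(edits);      // apply the accumulated edit log
            store.saveImage(checkpoint);   // write the merged namespace to a temp file
            // Swap the new image in, and only then truncate the edit log, so
            // a crash in the middle of a checkpoint never drops committed edits.
            if (checkpoint.renameTo(image) && edits.delete()) {
                edits.createNewFile();     // start a fresh, empty edits file
            }
        }
    }

Writing the merged image to a temporary file and renaming it before truncating
the edits file keeps the worst-case loss bounded to whatever arrived after the
last successful checkpoint.
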
Even if the namenode files are rsync'ed, what can be done in that particular
case (i.e. if we periodically rsync the fsimage along with its corrupted edits
file)?
This issue affects HDFS 0.6.1. After looking at the Hadoop trunk code, I
cannot say whether this can still happen... (I would say yes, because the UTF8
class is used there in the same way as in 0.6.1.)