A corrupted HDFS edits log file can lead to major data loss.
------------------------------------------------------------

                 Key: HADOOP-760
                 URL: http://issues.apache.org/jira/browse/HADOOP-760
             Project: Hadoop
          Issue Type: Bug
          Components: dfs
    Affects Versions: 0.6.1
            Reporter: Philippe Gassmann
            Priority: Critical


In one of our test systems, HDFS got corrupted after the edits log file was 
corrupted (I can tell how).

When we restarted HDFS, the namenode refused to start, with an exception logged 
in hadoop-namenode-xxx.out.

Unfortunately, due to an rm mistake, I was not able to save that exception 
anywhere.

It was an ArrayIndexOutOfBoundsException thrown from a UTF8 method called by 
FSEditLog.loadFSEdits.

The result: the namenode was unable to start, and the only way to fix it was to 
remove the edits log file.
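
To make the failure mode concrete, here is a minimal, self-contained sketch of a 
tolerant replay loop over length-prefixed records that stops at the first 
unreadable record instead of aborting the whole load. This is illustrative only; 
it does not use the real FSEditLog or UTF8 code, and the names EditRecord and 
replayEdits are hypothetical.

import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative only: replays length-prefixed records from an edits-like
 * stream, stopping at the first record that cannot be read instead of
 * letting a corrupted length field abort the whole load.
 */
public class TolerantEditsReplay {

    /** Hypothetical stand-in for a single edits-log record. */
    public static class EditRecord {
        public final byte[] payload;
        public EditRecord(byte[] payload) { this.payload = payload; }
    }

    /**
     * Reads records of the form [int length][length bytes]. A corrupted
     * length (negative or absurdly large) or a truncated trailing record
     * stops replay; everything read up to that point is kept.
     */
    public static List<EditRecord> replayEdits(InputStream raw) throws IOException {
        List<EditRecord> applied = new ArrayList<>();
        DataInputStream in = new DataInputStream(raw);
        while (true) {
            int length;
            try {
                length = in.readInt();
            } catch (EOFException cleanEnd) {
                break;                       // clean end of log
            }
            if (length < 0 || length > (1 << 20)) {
                // Corrupted length field: stop here rather than blow up
                // later with an out-of-bounds or out-of-memory error.
                System.err.println("Stopping replay at corrupted record, length=" + length);
                break;
            }
            byte[] payload = new byte[length];
            try {
                in.readFully(payload);
            } catch (EOFException truncated) {
                System.err.println("Stopping replay at truncated record");
                break;                       // partial trailing record
            }
            applied.add(new EditRecord(payload));
        }
        return applied;
    }
}

With such a loop the namenode could at least come up with the edits applied up 
to the corruption point, rather than requiring the whole edits file to be removed.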

As this was a test machine, we did not have any backup, so all files created in 
HDFS since the last namenode start were lost.

Is there a way to periodically commit changes into the fsimage (e.g. every 10 
minutes or so) instead of keeping a huge edits log?
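
A rough sketch of what such a periodic commit could look like: a background task 
that every N minutes folds the current edits into the image and rolls the log, so 
a later corruption of the edits file can cost at most one checkpoint interval of 
changes. The checkpoint() method below is a hypothetical placeholder, not an 
existing Hadoop API.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Illustrative only: schedules a periodic "commit edits into fsimage"
 * task so that a corrupted edits file loses at most one checkpoint
 * interval of changes. checkpoint() is a hypothetical placeholder.
 */
public class PeriodicCheckpointer {

    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    /** Placeholder: merge the edits log into fsimage, then roll the edits log. */
    private void checkpoint() {
        // 1. write the current namespace to fsimage.ckpt
        // 2. atomically rename fsimage.ckpt -> fsimage
        // 3. truncate (or roll) the edits log
        System.out.println("checkpoint taken at " + System.currentTimeMillis());
    }

    /** Starts checkpointing every intervalMinutes (e.g. 10). */
    public void start(long intervalMinutes) {
        scheduler.scheduleAtFixedRate(this::checkpoint,
                intervalMinutes, intervalMinutes, TimeUnit.MINUTES);
    }

    public void stop() {
        scheduler.shutdown();
    }
}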

Even if the namenode files are rsync'ed, what can be done in this particular 
case (i.e. if we periodically rsync the fsimage together with its corrupted 
edits file)?

This issue affects HDFS 0.6.1. After looking at the Hadoop trunk code, I cannot 
say whether this can still happen (I would say yes, because the UTF8 class is 
used in the same way as in 0.6.1).



