[ https://issues.apache.org/jira/browse/HDFS-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413326#comment-13413326 ]

Todd Lipcon commented on HDFS-3652:
-----------------------------------

This has data-loss implications as well. I am able to reproduce the following:

- NN is writing to three dirs: /data/1/nn, /data/2/nn, and /data/3/nn
- I modified the NN to inject an IOException when creating "edits.new" in 
/data/3/nn, which causes "removeEditsForStorageDir" to get called inside 
{{rollEditLog}}
- Upon triggering a checkpoint:
-- all three logs are closed successfully
-- /data/1/nn and /data/2/nn are successfully opened for "edits.new"
-- /data/3/nn throws an IOE which gets caught. This calls 
{{removeEditsForStorageDir}}, which removes the wrong stream (augmented 
logging):
{code}
12/07/12 16:23:54 INFO namenode.FSNamesystem: Roll Edit Log from 127.0.0.1
12/07/12 16:23:54 INFO namenode.FSNamesystem: Number of transactions: 0 Total time for transactions(ms): 0Number of transactions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0 0 0 
12/07/12 16:23:54 WARN namenode.FSNamesystem: Removing edits stream /tmp/name1/nn/current/edits.new
12/07/12 16:23:54 WARN common.Storage: Removing storage dir /tmp/name3/nn
java.io.IOException: Injected fault for /tmp/name3/nn/current/edits.new
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog$EditLogFileOutputStream.<init>(FSEditLog.java:146)
{code}
- The NN is now _only_ writing to /tmp/name2/nn/current/edits.new, but 
considers both name1 and name2 to be good from a storage-directory standpoint. 
However, {{/tmp/name1/nn/current/edits.new}} exists as an empty edit log file 
(just the header and preallocated region of 0xffs)
- When {{rollFSImage}} is called, it successfully calls {{close}} only on the 
name2 log, which truncates it to the correct transaction boundary. Then it 
renames both {{name2/.../edits.new}} and {{name1/.../edits.new}} to {{edits}}, 
and opens them both for append (assuming they've been truncated to a 
transaction boundary).
- The NN is now writing to name1 and name2, but name1's log looks like this:

{code}
<valid header> <preallocated bytes of 0xffffffffffff.....> <transactions>
{code}

- Upon the next checkpoint, the 2NN will likely download this log, since it's 
listed first in the name directory list. Upon doing so, it will see the 0xff at 
the head of the log and not read any of the edits (which come after all of the 
0xffs)
- The 2NN then uploads the "merged" image back to the NN. That image is missing 
every transaction hidden behind the 0xffs, yet uploading it blows away the 
"edits" file. Thus, the NN's in-memory state is now out of sync with its on-disk 
data, and the next time a checkpoint occurs or the NN restarts, it will fail.
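
To make the read-side failure concrete, here is a toy Java model of a reader that treats the first 0xff byte as end-of-log. It is NOT the Hadoop 1.x edit log loader: {{ToyEditLogReader}} and {{readOpcodes}} are hypothetical names, the header is omitted, and each "transaction" is assumed to be a single opcode byte with no payload. It only shows why a log laid out as <0xff padding> <transactions> reads back as empty.

{code:java}
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;

// Toy model only: not the real FSEditLog loading code.
class ToyEditLogReader {
  // Assumption: a 0xff byte where an opcode is expected means
  // "end of valid transactions; the rest is preallocated padding".
  static final int OP_INVALID = 0xff;

  // Returns the opcodes read before the first 0xff byte or end of stream.
  static List<Integer> readOpcodes(byte[] log) {
    List<Integer> ops = new ArrayList<>();
    ByteArrayInputStream in = new ByteArrayInputStream(log);
    int b;
    while ((b = in.read()) != -1) { // read() returns -1 at end of stream
      if (b == OP_INVALID) {
        break; // everything after this point is silently ignored
      }
      ops.add(b);
    }
    return ops;
  }

  public static void main(String[] args) {
    // Healthy log: transactions first, then padding.
    byte[] healthy = {1, 2, 3, (byte) 0xff, (byte) 0xff};
    // name1's log after the bad roll: padding first, then transactions.
    byte[] corrupted = {(byte) 0xff, (byte) 0xff, (byte) 0xff, 1, 2, 3};
    System.out.println(readOpcodes(healthy));   // [1, 2, 3]
    System.out.println(readOpcodes(corrupted)); // []
  }
}
{code}

In the corrupted layout the reader stops on the very first byte, so transactions 1, 2 and 3 are never applied to the merged image.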

This is not an issue in trunk since the code was largely rewritten by HDFS-1073.

The workaround for existing users is simple: rename the directories so that each 
has a distinct final path component, e.g. /data/1/nn1 and /data/2/nn2. The fix 
is also simple. I will upload the fix this afternoon.
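
A hypothetical pre-flight check for the workaround, sketched in Java: group the configured name/edits directories (e.g. the entries of {{dfs.name.dir}}) by their last path component and report any group with more than one member. {{NameDirCheck}} and {{sharedTerminalNames}} are illustrative names, not part of Hadoop.

{code:java}
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: not part of Hadoop.
class NameDirCheck {
  // Returns groups of directories that share the same last path component,
  // i.e. the configurations that trigger the getName()-based mismatch.
  static List<List<String>> sharedTerminalNames(List<String> dirs) {
    Map<String, List<String>> byName = new HashMap<>();
    for (String d : dirs) {
      byName.computeIfAbsent(new File(d).getName(), k -> new ArrayList<>()).add(d);
    }
    List<List<String>> clashes = new ArrayList<>();
    for (List<String> group : byName.values()) {
      if (group.size() > 1) {
        clashes.add(group);
      }
    }
    return clashes;
  }

  public static void main(String[] args) {
    // Affected layout: all three dirs end in "nn".
    System.out.println(sharedTerminalNames(
        Arrays.asList("/data/1/nn", "/data/2/nn", "/data/3/nn")));
    // After the workaround: no clashes.
    System.out.println(sharedTerminalNames(
        Arrays.asList("/data/1/nn1", "/data/2/nn2")));
  }
}
{code}

An empty result means no two directories can be confused by a name-only comparison.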
                
> 1.x: FSEditLog failure removes the wrong edit stream when storage dirs have 
> same name
> -------------------------------------------------------------------------------------
>
>                 Key: HDFS-3652
>                 URL: https://issues.apache.org/jira/browse/HDFS-3652
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 1.0.3, 1.1.0, 1.2.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Blocker
>
> In {{FSEditLog.removeEditsForStorageDir}}, we iterate over the edits streams 
> trying to find the stream corresponding to a given dir. To check equality, we 
> currently use the following condition:
> {code}
>       File parentDir = getStorageDirForStream(idx);
>       if (parentDir.getName().equals(sd.getRoot().getName())) {
> {code}
> ... which is horribly incorrect. If two or more storage dirs happen to have 
> the same terminal path component (e.g. /data/1/nn and /data/2/nn) then it will 
> pick the wrong stream(s) to remove.
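
A minimal Java sketch contrasting the name-only match with a full-path match, assuming a stream's storage root is two levels above its {{current/edits.new}} file. {{StreamMatchDemo}}, {{storageDirForStream}}, {{findBuggy}} and {{findFixed}} are hypothetical stand-ins, not the actual {{FSEditLog}} code, and {{findFixed}} is one possible approach rather than necessarily what the committed patch does.

{code:java}
import java.io.File;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of removeEditsForStorageDir's matching loop.
class StreamMatchDemo {
  // Stand-in for getStorageDirForStream(idx): <root>/current/edits.new -> <root>.
  static File storageDirForStream(File editsFile) {
    return editsFile.getParentFile().getParentFile();
  }

  // Buggy: compares only the last path component, so "nn" matches any ".../nn".
  static int findBuggy(List<File> streams, File sdRoot) {
    for (int idx = 0; idx < streams.size(); idx++) {
      File parentDir = storageDirForStream(streams.get(idx));
      if (parentDir.getName().equals(sdRoot.getName())) {
        return idx;
      }
    }
    return -1;
  }

  // Fixed: compares the whole absolute path of the storage root.
  static int findFixed(List<File> streams, File sdRoot) {
    for (int idx = 0; idx < streams.size(); idx++) {
      File parentDir = storageDirForStream(streams.get(idx));
      if (parentDir.getAbsoluteFile().equals(sdRoot.getAbsoluteFile())) {
        return idx;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    List<File> streams = Arrays.asList(
        new File("/tmp/name1/nn/current/edits.new"),
        new File("/tmp/name2/nn/current/edits.new"),
        new File("/tmp/name3/nn/current/edits.new"));
    File failed = new File("/tmp/name3/nn");
    System.out.println(findBuggy(streams, failed)); // 0: name1, the wrong stream
    System.out.println(findFixed(streams, failed)); // 2: name3
  }
}
{code}

With the stream order from the log in the comment above, the name-only comparison hits name1 first, which is exactly the stream that was wrongly removed. Comparing absolute paths is the simplest correction; {{File.getCanonicalFile()}} would additionally resolve symlinks and "..", at the cost of declaring {{IOException}}.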

--
This message is automatically generated by JIRA.
