[ 
https://issues.apache.org/jira/browse/HDFS-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831200#action_12831200
 ] 

Todd Lipcon commented on HDFS-909:
----------------------------------

Unfortunately, I wasn't able to write a unit test that reproduces your race.
This is because the race only occurs if the NN exits before rollEditLogs()
is called at the end of FSImage.saveFSImage(boolean). The race induces
corruption in EDITS, but EDITS_NEW is a correct empty file, so rollEditLogs
fixes up the state of the file.

I think we'll deal with this issue in the other JIRA regarding saveFSImage 
operation.

bq. think we will need to call waitForSyncToFinish() both before entering safe 
mode

I think it's actually impossible to be correct here. The issue with 
waitForSyncToFinish in enterSafeMode is that many
of the FSNamesystem calls have a structure that looks like:
{code}
1 void someOperation() {
2   synchronized (this) {
3     if (!isInSafeMode()) { explode; }
4     internalSomeOperation();
5   }
6   getEditLog().logSync();
7 }
{code}

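If we call enterSafeMode between line 5 and 6, a waitForSyncToFinish would
return immediately, since the sync isn't running yet. Really this is the case
for any of lines 3-6: enterSafeMode is synchronized as well, so a call that
arrives while another thread is on lines 3-5 blocks until the lock is released
at line 5, and then runs before that thread reaches line 6.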

I think we need an additional method with stronger guarantees than 
waitForSyncToFinish - something like
syncAllOutstandingOperations that waits until synctxid == txid.
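
A minimal sketch of that idea, not a patch: EditLogSketch is a hypothetical
stand-in for FSEditLog that models only the txid bookkeeping. The field names
txid, synctxid and isSyncRunning, the simplified logSync(), and the
hasUnsyncedEdits() helper are all assumptions made for illustration. It shows
why waiting for "no sync running" is weaker than waiting for
synctxid == txid.

```java
/** Hypothetical stand-in for FSEditLog; only transaction-id bookkeeping is modeled. */
class EditLogSketch {
    private long txid = 0;                  // last transaction id handed out
    private long synctxid = 0;              // last transaction id known to be synced
    private boolean isSyncRunning = false;  // true while some thread is flushing

    /** Buffer an edit (line 4 of the example above, done under the namesystem lock). */
    synchronized long logEdit() {
        return ++txid;
    }

    /** Simplified logSync() (line 6): sync everything logged so far. */
    void logSync() {
        long target;
        synchronized (this) {
            while (isSyncRunning) {
                await();
            }
            if (synctxid >= txid) {
                return;                     // another thread already synced our edit
            }
            target = txid;
            isSyncRunning = true;
        }
        // A real implementation would flush and fsync here, outside the lock.
        synchronized (this) {
            synctxid = target;
            isSyncRunning = false;
            notifyAll();
        }
    }

    /** Weak guarantee: returns as soon as no sync is in progress. */
    synchronized void waitForSyncToFinish() {
        while (isSyncRunning) {
            await();
        }
    }

    /** Stronger guarantee: returns only once every logged edit has been synced. */
    synchronized void syncAllOutstandingOperations() {
        while (synctxid < txid) {
            await();
        }
    }

    synchronized boolean hasUnsyncedEdits() {
        return synctxid < txid;
    }

    /** wait() without the checked exception; callers already hold the monitor. */
    private void await() {
        try {
            wait();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

With one edit logged but not yet synced (a writer sitting between lines 5 and
6), waitForSyncToFinish() returns at once and hasUnsyncedEdits() is still true,
whereas syncAllOutstandingOperations() blocks until some thread's logSync()
catches synctxid up. This version waits passively for the writers to sync;
driving the flush from the waiting thread would be another option.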

What do you think?

> Race condition between rollEditLog or rollFSImage and FSEditsLog.write
> operations corrupts edits log
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-909
>                 URL: https://issues.apache.org/jira/browse/HDFS-909
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.20.1, 0.20.2, 0.21.0, 0.22.0
>         Environment: CentOS
>            Reporter: Cosmin Lehene
>            Assignee: Todd Lipcon
>            Priority: Blocker
>             Fix For: 0.21.0, 0.22.0
>
>         Attachments: hdfs-909-unittest.txt, hdfs-909.txt, hdfs-909.txt, 
> hdfs-909.txt
>
>
> Closing the edits log file can race with a write to the edits log file,
> resulting in the OP_INVALID end-of-file marker being initially overwritten
> by the concurrent threads (in setReadyToFlush) and then removed twice from
> the buffer, losing a good byte from the edits log.
> Example:
> {code}
> FSNameSystem.rollEditLog() -> FSEditLog.divertFileStreams() -> 
> FSEditLog.closeStream() -> EditLogOutputStream.setReadyToFlush()
> FSNameSystem.rollEditLog() -> FSEditLog.divertFileStreams() -> 
> FSEditLog.closeStream() -> EditLogOutputStream.flush() -> 
> EditLogFileOutputStream.flushAndSync()
> OR
> FSNameSystem.rollFSImage() -> FSIMage.rollFSImage() -> 
> FSEditLog.purgeEditLog() -> FSEditLog.revertFileStreams() -> 
> FSEditLog.closeStream() ->EditLogOutputStream.setReadyToFlush() 
> FSNameSystem.rollFSImage() -> FSIMage.rollFSImage() -> 
> FSEditLog.purgeEditLog() -> FSEditLog.revertFileStreams() -> 
> FSEditLog.closeStream() ->EditLogOutputStream.flush() -> 
> EditLogFileOutputStream.flushAndSync()
> VERSUS
> FSNameSystem.completeFile -> FSEditLog.logSync() -> 
> EditLogOutputStream.setReadyToFlush()
> FSNameSystem.completeFile -> FSEditLog.logSync() -> 
> EditLogOutputStream.flush() -> EditLogFileOutputStream.flushAndSync()
> OR 
> Any FSEditLog.write
> {code}
> Access to the edits flush operations is synchronized only at the
> FSEdits.logSync() method level. At a lower level, access to
> EditsLogOutputStream setReadyToFlush(), flush() or flushAndSync() is NOT
> synchronized. These can be called from concurrent threads, as in the example
> above.
> So if a rollEditLog or rollFSIMage happens at the same time as a write
> operation, the two can race in EditLogFileOutputStream.setReadyToFlush, which
> will overwrite the last byte (normally FSEditsLog.OP_INVALID, the
> "end-of-file marker"), and each thread will then remove it once in
> flushAndSync(), so it is removed twice. Hence a valid byte goes missing from
> the edits log, which leads to a silent SecondaryNameNode failure and a full
> HDFS failure upon cluster restart.
> We got to this point after investigating a corrupted edits file that made 
> HDFS unable to start with 
> {code:title=namenode.log}
> java.io.IOException: Incorrect data format. logVersion is -20 but 
> writables.length is 768. 
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadEditRecords(FSEditLog.java:450
> {code}
> EDIT: moved the logs to a comment to make this readable
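
The double removal described above can be illustrated with a toy model. This
is a simplified sketch, not the real EditLogFileOutputStream: EditStreamModel,
its in-memory "file", and the integer position are hypothetical stand-ins, and
the interleaving shown is just one assumed ordering of the two threads' calls.

```java
import java.io.ByteArrayOutputStream;

/** Toy model of a double-buffered edit stream; not the real Hadoop class. */
class EditStreamModel {
    static final byte OP_INVALID = -1;          // stand-in end-of-file marker

    private ByteArrayOutputStream bufCurrent = new ByteArrayOutputStream();
    private ByteArrayOutputStream bufReady = new ByteArrayOutputStream();
    private final byte[] file = new byte[32];   // stand-in for the edits file
    private int position = 0;                   // stand-in for the file position

    void write(int b) {
        bufCurrent.write(b);
    }

    /** Append the end marker and swap the buffers, with no locking. */
    void setReadyToFlush() {
        write(OP_INVALID);
        ByteArrayOutputStream tmp = bufReady;
        bufReady = bufCurrent;
        bufCurrent = tmp;
    }

    /** Write the ready buffer, then step back over the end marker. */
    void flushAndSync() {
        byte[] data = bufReady.toByteArray();
        System.arraycopy(data, 0, file, position, data.length);
        position += data.length;
        bufReady.reset();
        position -= 1;                          // the next flush overwrites the marker
    }

    int position() {
        return position;
    }

    byte byteAt(int index) {
        return file[index];
    }
}
```

In this model, write(1), write(2), setReadyToFlush(), flushAndSync() leaves the
file as 1, 2, OP_INVALID with the position on the marker (index 2). If two
threads then interleave as setReadyToFlush(), setReadyToFlush(),
flushAndSync(), flushAndSync(), the second flush finds an empty ready buffer
but still steps back, so the position ends at index 1, on the valid byte 2,
which the next flush would overwrite.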

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
