[
https://issues.apache.org/jira/browse/HDFS-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859133#action_12859133
]
Konstantin Shvachko commented on HDFS-909:
------------------------------------------
- The issue is not closed, so it would be better to have a unified patch,
rather than doing 2 commits. I don't mind recommitting.
- Test for 0.20 passes fine now. Found 2 (eclipse) warnings in TestEditLogRace:
-- Method {{getFormattedFSImage()}} is not used anywhere.
-- Static method {{setBufferCapacity()}} should be called in a static manner,
like {{FSEditLog.setBufferCapacity()}}.
- I understand Tom's plan for 0.21. It does not hurt to commit though.
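The static-access warning above can be illustrated with a minimal sketch. {{ToyEditLog}} is a hypothetical stand-in for {{FSEditLog}}, and the {{int}} parameter and default value are assumptions made for illustration, not the real signature:

```java
/** Shows why a static method should be called through its class name. */
final class StaticCallSketch {
    static int viaInstance() {
        ToyEditLog log = null;
        // Compiles and runs without a NullPointerException: the call is
        // resolved against the declared type, the instance is never used.
        log.setBufferCapacity(1024);
        return ToyEditLog.bufferCapacity;
    }

    static int viaClass() {
        // The form the warning asks for: the static nature is explicit.
        ToyEditLog.setBufferCapacity(2048);
        return ToyEditLog.bufferCapacity;
    }

    public static void main(String[] args) {
        System.out.println(viaInstance()); // 1024
        System.out.println(viaClass());    // 2048
    }
}

/** Hypothetical stand-in for FSEditLog; only the static setter matters here. */
final class ToyEditLog {
    static int bufferCapacity = 512; // assumed default, for illustration only

    static void setBufferCapacity(int size) {
        bufferCapacity = size;
    }
}
```

Both forms behave identically; the instance form only hides the fact that the setting is shared by every user of the class, which is presumably why Eclipse flags it.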
> Race condition between rollEditLog or rollFSImage and FSEditLog.write
> operations corrupts edits log
> -----------------------------------------------------------------------------------------------------
>
> Key: HDFS-909
> URL: https://issues.apache.org/jira/browse/HDFS-909
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.20.1, 0.20.2, 0.21.0, 0.22.0
> Environment: CentOS
> Reporter: Cosmin Lehene
> Assignee: Todd Lipcon
> Priority: Blocker
> Fix For: 0.21.0, 0.22.0
>
> Attachments: hdfs-909-ammendation.txt, hdfs-909-branch-0.20.txt,
> hdfs-909-branch-0.20.txt, hdfs-909-branch-0.21.txt, hdfs-909-unittest.txt,
> hdfs-909.txt, hdfs-909.txt, hdfs-909.txt, hdfs-909.txt, hdfs-909.txt,
> hdfs-909.txt
>
>
> Closing the edits log file can race with a write to the edits log file.
> The OP_INVALID end-of-file marker is first overwritten by the concurrent
> threads (in setReadyToFlush) and then removed twice from the buffer, so a
> good byte is lost from the edits log.
> Example:
> {code}
> FSNamesystem.rollEditLog() -> FSEditLog.divertFileStreams() ->
> FSEditLog.closeStream() -> EditLogOutputStream.setReadyToFlush()
> FSNamesystem.rollEditLog() -> FSEditLog.divertFileStreams() ->
> FSEditLog.closeStream() -> EditLogOutputStream.flush() ->
> EditLogFileOutputStream.flushAndSync()
> OR
> FSNamesystem.rollFSImage() -> FSImage.rollFSImage() ->
> FSEditLog.purgeEditLog() -> FSEditLog.revertFileStreams() ->
> FSEditLog.closeStream() -> EditLogOutputStream.setReadyToFlush()
> FSNamesystem.rollFSImage() -> FSImage.rollFSImage() ->
> FSEditLog.purgeEditLog() -> FSEditLog.revertFileStreams() ->
> FSEditLog.closeStream() -> EditLogOutputStream.flush() ->
> EditLogFileOutputStream.flushAndSync()
> VERSUS
> FSNamesystem.completeFile -> FSEditLog.logSync() ->
> EditLogOutputStream.setReadyToFlush()
> FSNamesystem.completeFile -> FSEditLog.logSync() ->
> EditLogOutputStream.flush() -> EditLogFileOutputStream.flushAndSync()
> OR
> Any FSEditLog.write
> {code}
> Access to the edits flush operations is synchronized only at the
> FSEditLog.logSync() method level. At a lower level, however, access to
> EditLogOutputStream.setReadyToFlush(), flush() and flushAndSync() is NOT
> synchronized. These can be called from concurrent threads, as in the example
> above.
> So if a rollEditLog or rollFSImage happens at the same time as a write
> operation, the two can race on EditLogFileOutputStream.setReadyToFlush(),
> which overwrites the last byte (normally FSEditLog.OP_INVALID, the
> "end-of-file marker"). The marker is then removed twice (once by each thread)
> in flushAndSync()! Hence a valid byte goes missing from the edits log, which
> leads to a silent SecondaryNameNode failure and a full HDFS failure upon
> cluster restart.
> We got to this point after investigating a corrupted edits file that left
> HDFS unable to start, failing with:
> {code:title=namenode.log}
> java.io.IOException: Incorrect data format. logVersion is -20 but
> writables.length is 768.
> at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadEditRecords(FSEditLog.java:450)
> {code}
> EDIT: moved the logs to a comment to make this readable
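The interleaving described in the quoted report can be replayed deterministically in a toy model. This is a sketch under stated assumptions, not the Hadoop code: {{ToyEditStream}}, its byte-array "file" and integer "position" are hypothetical stand-ins for {{EditLogFileOutputStream}} and its file channel, and the two "threads" are simulated by calling the methods in a fixed order.

```java
import java.io.ByteArrayOutputStream;

/** Toy model of a double-buffered edit log; NOT the Hadoop implementation. */
final class EditLogRaceSketch {
    static final byte OP_INVALID = -1; // end-of-file marker named in the report

    /** Hypothetical stand-in for EditLogFileOutputStream, with no locking. */
    static final class ToyEditStream {
        private ByteArrayOutputStream bufCurrent = new ByteArrayOutputStream();
        private ByteArrayOutputStream bufReady = new ByteArrayOutputStream();
        private final byte[] file = new byte[32]; // stands in for the edits file
        private int position = 0;                 // stands in for the file position

        void write(int b) { bufCurrent.write(b); }

        void setReadyToFlush() {
            write(OP_INVALID);                    // append the end-of-file marker
            ByteArrayOutputStream tmp = bufReady; // swap the two buffers
            bufReady = bufCurrent;
            bufCurrent = tmp;
        }

        void flushAndSync() {
            byte[] data = bufReady.toByteArray();
            System.arraycopy(data, 0, file, position, data.length);
            position += data.length;
            bufReady.reset();
            position -= 1;                        // step back over the marker
        }

        int position() { return position; }
    }

    /** File holds edits 1 and 2 plus marker (position 2); edits 3 and 4 pending. */
    private static ToyEditStream streamWithPendingEdits() {
        ToyEditStream s = new ToyEditStream();
        s.write(1); s.write(2);
        s.setReadyToFlush();
        s.flushAndSync();
        s.write(3); s.write(4);
        return s;
    }

    /** Each simulated thread runs its two steps without interruption. */
    static int serialPosition() {
        ToyEditStream s = streamWithPendingEdits();
        s.setReadyToFlush(); s.flushAndSync(); // thread A
        s.setReadyToFlush(); s.flushAndSync(); // thread B
        return s.position();
    }

    /** Both threads pass setReadyToFlush() before either one flushes. */
    static int racyPosition() {
        ToyEditStream s = streamWithPendingEdits();
        s.setReadyToFlush(); // thread A
        s.setReadyToFlush(); // thread B: swaps the buffers back
        s.flushAndSync();    // thread A: flushes only a marker
        s.flushAndSync();    // thread B: nothing to flush, still steps back
        return s.position();
    }

    public static void main(String[] args) {
        System.out.println("serial position: " + serialPosition()); // 4
        System.out.println("racy position:   " + racyPosition());   // 1
    }
}
```

In this model the serial run leaves the position on the marker after edit 4, while the interleaved run leaves it at index 1, so the next flush would overwrite the already written edit 2. Holding a single lock around each setReadyToFlush()/flush pair is one way to rule out this interleaving; whether the attached patches take that approach is not stated in this message.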