[ https://issues.apache.org/jira/browse/HDFS-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828498#action_12828498 ]

Hadoop QA commented on HDFS-909:
--------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12434498/hdfs-909.txt
  against trunk revision 903906.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 2 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/110/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/110/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/110/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/110/console

This message is automatically generated.

> Race condition between rollEditLog or rollFSImage and FSEditLog.write 
> operations corrupts edits log
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-909
>                 URL: https://issues.apache.org/jira/browse/HDFS-909
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.20.1, 0.20.2, 0.21.0, 0.22.0
>         Environment: CentOS
>            Reporter: Cosmin Lehene
>            Assignee: Todd Lipcon
>            Priority: Blocker
>             Fix For: 0.21.0, 0.22.0
>
>         Attachments: hdfs-909-unittest.txt, hdfs-909.txt, hdfs-909.txt
>
>
> Closing the edits log file can race with a write to the edits log file, resulting in the 
> OP_INVALID end-of-file marker first being overwritten by the concurrent threads (in 
> setReadyToFlush) and then removed twice from the buffer, losing a good byte from the 
> edits log.
> Example:
> {code}
> FSNameSystem.rollEditLog() -> FSEditLog.divertFileStreams() -> 
> FSEditLog.closeStream() -> EditLogOutputStream.setReadyToFlush()
> FSNameSystem.rollEditLog() -> FSEditLog.divertFileStreams() -> 
> FSEditLog.closeStream() -> EditLogOutputStream.flush() -> 
> EditLogFileOutputStream.flushAndSync()
> OR
> FSNameSystem.rollFSImage() -> FSImage.rollFSImage() -> 
> FSEditLog.purgeEditLog() -> FSEditLog.revertFileStreams() -> 
> FSEditLog.closeStream() -> EditLogOutputStream.setReadyToFlush()
> FSNameSystem.rollFSImage() -> FSImage.rollFSImage() -> 
> FSEditLog.purgeEditLog() -> FSEditLog.revertFileStreams() -> 
> FSEditLog.closeStream() -> EditLogOutputStream.flush() -> 
> EditLogFileOutputStream.flushAndSync()
> VERSUS
> FSNameSystem.completeFile -> FSEditLog.logSync() -> 
> EditLogOutputStream.setReadyToFlush()
> FSNameSystem.completeFile -> FSEditLog.logSync() -> 
> EditLogOutputStream.flush() -> EditLogFileOutputStream.flushAndSync()
> OR
> Any FSEditLog.write
> {code}
> Access to the edits flush operations is synchronized only at the FSEditLog.logSync() 
> method level. However, at a lower level, access to EditLogOutputStream.setReadyToFlush(), 
> flush(), and flushAndSync() is NOT synchronized; these can be called from concurrent 
> threads as in the example above.
> So if a rollEditLog or rollFSImage happens at the same time as a write operation, the two 
> threads can race in EditLogFileOutputStream.setReadyToFlush(): the last byte (normally 
> FSEditLog.OP_INVALID, the "end-of-file marker") is overwritten and then removed twice 
> (once from each thread) in flushAndSync()! Hence a valid byte is missing from the edits 
> log, which leads to a silent SecondaryNameNode failure and a full HDFS failure upon 
> cluster restart.
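> To make the interleaving concrete, here is a minimal, self-contained sketch of the 
> unsynchronized double-buffer pattern described above (illustrative only, not the actual 
> Hadoop source; names and details are approximate): setReadyToFlush() appends the 
> end-of-file marker and swaps the buffers, and flushAndSync() writes the ready buffer and 
> then steps the file position back over the marker. Two threads running this pair 
> concurrently can each step the position back, so the second step discards a byte of a 
> real edit record rather than the marker.
> {code:title=race-sketch.java}
> import java.io.ByteArrayOutputStream;
> import java.io.File;
> import java.io.IOException;
> import java.io.RandomAccessFile;
>
> // Simplified stand-in for EditLogFileOutputStream; illustrative only.
> class SketchEditLogStream {
>   static final byte OP_INVALID = -1;          // end-of-file marker
>
>   private ByteArrayOutputStream bufCurrent = new ByteArrayOutputStream();
>   private ByteArrayOutputStream bufReady   = new ByteArrayOutputStream();
>   private final RandomAccessFile file;
>
>   SketchEditLogStream(File f) throws IOException {
>     file = new RandomAccessFile(f, "rw");
>   }
>
>   // Called by FSEditLog.write-style code: edits accumulate in the current buffer.
>   void write(byte op) {
>     bufCurrent.write(op);
>   }
>
>   // No synchronization: two threads can both append a marker and interleave the swap.
>   void setReadyToFlush() {
>     bufCurrent.write(OP_INVALID);
>     ByteArrayOutputStream tmp = bufReady;
>     bufReady = bufCurrent;
>     bufCurrent = tmp;
>   }
>
>   // No synchronization: each concurrent caller rewinds the file by one byte, so the
>   // second rewind discards a byte of a real record instead of the OP_INVALID marker.
>   void flushAndSync() throws IOException {
>     file.write(bufReady.toByteArray());
>     bufReady.reset();
>     file.seek(file.getFilePointer() - 1);     // intended to drop only the marker
>   }
> }
> {code}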
> We got to this point after investigating a corrupted edits file that made HDFS unable to 
> start with:
> {code:title=namenode.log}
> java.io.IOException: Incorrect data format. logVersion is -20 but 
> writables.length is 768. 
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadEditRecords(FSEditLog.java:450
> {code}
> In the edits file we found the first 2 entries:
> {code:title=edits}
> FFFFFFEC0900000005003F2F68626173652F64656D6F5F5F75736572732F636F6D70616374696F6E2E6469722F3336343035313634362F38333238313438373139303730333137323739000133000D31323631303832363331383335000D313236313038323632383039340008363731303838363400000003F6CBB87EF376E3E600000000040000000000000000039665F9549DE069A5735E00000000040000000000000000039665ADCC71A050B16ABF00000000015A179A0000000000039665066861646F6F700A737570657267726F7570010100000003003F2F68626173652F64656D6F5F5F75736572732F636F6D70616374696F6E2E6469722F3336343035313634362F3833323831343837313930373033313732373900352F68626173652F64656D6F5F5F75736572732F3336343035313634362F746573742F36393137333831323838333034343734333836000D3132363130383236333138363902
> ...
> {code}
> This is the completeFile operation that is missing its last byte:
> {code:title=completeFile}
> FFFFFFEC0900000005003F2F68626173652F64656D6F5F5F75736572732F636F6D70616374696F6E2E6469722F3336343035313634362F38333238313438373139303730333137323739000133000D31323631303832363331383335000D313236313038323632383039340008363731303838363400000003F6CBB87EF376E3E600000000040000000000000000039665F9549DE069A5735E00000000040000000000000000039665ADCC71A050B16ABF00000000015A179A0000000000039665066861646F6F700A737570657267726F757001??
> {code}
> followed by a rename operation:
> {code:title=rename}
> 0100000003003F2F68626173652F64656D6F5F5F75736572732F636F6D70616374696F6E2E6469722F3336343035313634362F3833323831343837313930373033313732373900352F68626173652F64656D6F5F5F75736572732F3336343035313634362F746573742F36393137333831323838333034343734333836000D31323631303832363331383639
> {code}
> The first byte of the rename was instead read as part of the completeFile() operation. 
> As a result, the next operation was read as 0x00 (OP_ADD), followed by an int (the length 
> field) of 0x00000300, i.e. 768, which was read and failed the check in the following code:
> {code:title=FSEditLog.java}
>         case OP_ADD:
>         case OP_CLOSE: {
>           // versions > 0 support per file replication
>           // get name and replication
>           int length = in.readInt();
>           if (-7 == logVersion && length != 3||
>               -17 < logVersion && logVersion < -7 && length != 4 ||
>               logVersion <= -17 && length != 5) {
>               throw new IOException("Incorrect data format."  +
>                                     " logVersion is " + logVersion +
>                                     " but writables.length is " +
>                                     length + ". ");
> {code}
> Filling in the missing byte with a value resulted in correct interpretation of the 0x01 
> (OP_RENAME) opcode and correct parsing of the rest of the edits file (>1 MB).
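> A quick way to see where the 768 comes from is to replay the misaligned read on the bytes 
> above. The small demo below (not Hadoop code; the class name is made up) skips the rename 
> record's leading 0x01 byte, as happened in the corrupted file, and then reads an opcode 
> and a length the way loadEditRecords does: the opcode comes out as 0x00 (OP_ADD) and the 
> length as 0x00000300, i.e. 768.
> {code:title=MisalignmentDemo.java}
> import java.io.ByteArrayInputStream;
> import java.io.DataInputStream;
> import java.io.IOException;
>
> public class MisalignmentDemo {
>   public static void main(String[] args) throws IOException {
>     // First bytes of the rename record from the edits dump above.
>     byte[] rename = {0x01, 0x00, 0x00, 0x00, 0x03, 0x00, 0x3F};
>
>     // Skip the leading 0x01, as if it had been consumed by the completeFile record.
>     DataInputStream in = new DataInputStream(
>         new ByteArrayInputStream(rename, 1, rename.length - 1));
>     byte opcode = in.readByte();  // 0x00, misread as OP_ADD
>     int length  = in.readInt();   // 0x00000300 = 768, failing the length check
>     System.out.printf("opcode=0x%02X length=%d%n", opcode, length);
>   }
> }
> {code}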
> This theory is also supported by the NameNode log file from the date the 
> corruption took place:
> {code:title=namenode.log}
> 2009-12-17 12:43:51,276 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 
> 10.72.17.162
> 2009-12-17 12:43:51,338 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> NameSystem.addStoredBlock: blockMap updated: 10.72.17.165:50010 is added to 
> blk_-480585673051114658_235109{blockUCState=UNDER_CONSTRUCTION, 
> primaryNodeIndex=-1, 
> replicas=[ReplicaUnderConstruction[10.72.17.166:50010|RBW], 
> ReplicaUnderConstruction[10.72.17.165:50010|RBW], 
> ReplicaUnderConstruction[10.72.17.167:50010|RBW]]} size 67108864
> 2009-12-17 12:43:51,339 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> NameSystem.addStoredBlock: blockMap updated: 10.72.17.166:50010 is added to 
> blk_-480585673051114658_235109{blockUCState=UNDER_CONSTRUCTION, 
> primaryNodeIndex=-1, 
> replicas=[ReplicaUnderConstruction[10.72.17.166:50010|RBW], 
> ReplicaUnderConstruction[10.72.17.165:50010|RBW], 
> ReplicaUnderConstruction[10.72.17.167:50010|RBW]]} size 67108864
> 2009-12-17 12:43:51,342 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> NameSystem.allocateBlock: 
> /hbase/demo__users/compaction.dir/364051646/8328148719070317279. 
> blk_-5923234476536534337_235109{blockUCState=UNDER_CONSTRUCTION, 
> primaryNodeIndex=-1, 
> replicas=[ReplicaUnderConstruction[10.72.17.166:50010|RBW], 
> ReplicaUnderConstruction[10.72.17.164:50010|RBW], 
> ReplicaUnderConstruction[10.72.17.163:50010|RBW]]}
> 2009-12-17 12:43:51,352 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> NameSystem.addStoredBlock: blockMap updated: 10.72.17.167:50010 is added to 
> blk_-480585673051114658_235109{blockUCState=COMMITTED, primaryNodeIndex=-1, 
> replicas=[ReplicaUnderConstruction[10.72.17.166:50010|RBW], 
> ReplicaUnderConstruction[10.72.17.165:50010|RBW], 
> ReplicaUnderConstruction[10.72.17.167:50010|RBW]]} size 67108864
> 2009-12-17 12:43:51,833 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> NameSystem.addStoredBlock: blockMap updated: 10.72.17.163:50010 is added to 
> blk_-5923234476536534337_235109{blockUCState=UNDER_CONSTRUCTION, 
> primaryNodeIndex=-1, 
> replicas=[ReplicaUnderConstruction[10.72.17.166:50010|RBW], 
> ReplicaUnderConstruction[10.72.17.164:50010|RBW], 
> ReplicaUnderConstruction[10.72.17.163:50010|RBW]]} size 22681498
> 2009-12-17 12:43:51,834 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> NameSystem.addStoredBlock: blockMap updated: 10.72.17.164:50010 is added to 
> blk_-5923234476536534337_235109{blockUCState=UNDER_CONSTRUCTION, 
> primaryNodeIndex=-1, 
> replicas=[ReplicaUnderConstruction[10.72.17.166:50010|RBW], 
> ReplicaUnderConstruction[10.72.17.164:50010|RBW], 
> ReplicaUnderConstruction[10.72.17.163:50010|RBW]]} size 22681498
> 2009-12-17 12:43:51,834 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> NameSystem.addStoredBlock: blockMap updated: 10.72.17.166:50010 is added to 
> blk_-5923234476536534337_235109{blockUCState=UNDER_CONSTRUCTION, 
> primaryNodeIndex=-1, 
> replicas=[ReplicaUnderConstruction[10.72.17.166:50010|RBW], 
> ReplicaUnderConstruction[10.72.17.164:50010|RBW], 
> ReplicaUnderConstruction[10.72.17.163:50010|RBW]]} size 22681498
> 2009-12-17 12:43:51,835 INFO org.apache.hadoop.hdfs.StateChange: DIR* 
> NameSystem.completeFile: file 
> /hbase/demo__users/compaction.dir/364051646/8328148719070317279 is closed by 
> DFSClient_-1989779667
> 2009-12-17 12:43:51,835 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll FSImage from 
> 10.72.17.162
> 2009-12-17 12:43:51,870 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=hadoop,hadoop  
>      ip=/10.72.17.166        cmd=rename      
> src=/hbase/demo__users/compaction.dir/364051646/8328148719070317279     
> dst=/hbase/demo__users/364051646/test/6917381288304474386       
> perm=hadoop:supergroup:rw-r--r--
> {code}
> The last three entries show a completeFile operation, followed by a rollFSImage 
> operation, followed by a rename operation. This is most probably where the race 
> condition took place.
> Synchronizing access to EditLogOutputStream could fix the problem; however, other races 
> or deadlocks may still occur.
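> For illustration, one possible shape of that synchronization is sketched below (this is 
> not the attached patch; the Stream interface is a hypothetical stand-in for 
> EditLogOutputStream): take a single per-stream lock around the swap-and-flush pair so 
> that a rollEditLog/rollFSImage cannot interleave with logSync() on the same stream.
> {code:title=synchronization-sketch.java}
> import java.io.IOException;
>
> // Illustrative only: serialize setReadyToFlush()/flushAndSync() per stream so the
> // swap-and-trim sequence runs atomically with respect to concurrent callers.
> class GuardedFlush {
>   interface Stream {                      // hypothetical stand-in for EditLogOutputStream
>     void setReadyToFlush() throws IOException;
>     void flushAndSync() throws IOException;
>   }
>
>   private final Object flushLock = new Object();
>   private final Stream stream;
>
>   GuardedFlush(Stream stream) {
>     this.stream = stream;
>   }
>
>   void flush() throws IOException {
>     synchronized (flushLock) {            // only one thread may swap and trim at a time
>       stream.setReadyToFlush();
>       stream.flushAndSync();
>     }
>   }
> }
> {code}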

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
