[ https://issues.apache.org/jira/browse/HDFS-14941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968569#comment-16968569 ]
Hudson commented on HDFS-14941:
-------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17616 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17616/])
HDFS-14941. Potential editlog race condition can cause corrupted file. (cliang: rev dd900259c421d6edd0b89a535a1fe08ada91735f)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirWriteFileOp.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogOp.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NameNodeAdapter.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockIdManager.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/SequentialNumber.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java
* (add) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestAddBlockTailing.java

> Potential editlog race condition can cause corrupted file
> ---------------------------------------------------------
>
>                 Key: HDFS-14941
>                 URL: https://issues.apache.org/jira/browse/HDFS-14941
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Chen Liang
>            Assignee: Chen Liang
>            Priority: Major
>              Labels: ha
>             Fix For: 3.3.0
>
>         Attachments: HDFS-14941.001.patch, HDFS-14941.002.patch, HDFS-14941.003.patch, HDFS-14941.004.patch, HDFS-14941.005.patch, HDFS-14941.006.patch
>
>
> Recently we encountered an issue where, after a failover, the NameNode complained about corrupted files/missing blocks. The blocks did recover after full block reports, so they were not actually missing. After further investigation, we believe this is what happened:
>
> First of all, on the SbN it is possible that a block report arrives before the corresponding edits have been tailed, in which case the SbN postpones processing the DN block report. This is handled by the guarding logic below:
> {code:java}
> if (shouldPostponeBlocksFromFuture &&
>     namesystem.isGenStampInFuture(iblk)) {
>   queueReportedBlock(storageInfo, iblk, reportedState,
>       QUEUE_REASON_FUTURE_GENSTAMP);
>   continue;
> }
> {code}
> Basically, if a reported block has a future generation stamp, the DN report gets requeued.
>
> However, in {{FSNamesystem#storeAllocatedBlock}}, we have the following code:
> {code:java}
> // allocate new block, record block locations in INode.
> newBlock = createNewBlock();
> INodesInPath inodesInPath = INodesInPath.fromINode(pendingFile);
> saveAllocatedBlock(src, inodesInPath, newBlock, targets);
> persistNewBlock(src, pendingFile);
> offset = pendingFile.computeFileSize();
> {code}
> The line
> {{newBlock = createNewBlock();}}
> logs an edit entry {{OP_SET_GENSTAMP_V2}} to bump the generation stamp on the Standby, while the following line
> {{persistNewBlock(src, pendingFile);}}
> logs another edit entry {{OP_ADD_BLOCK}} to actually add the block on the Standby.
>
> The race condition is this: imagine the Standby has just processed {{OP_SET_GENSTAMP_V2}} but not yet {{OP_ADD_BLOCK}} (if the two entries happen to land in different segments).
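> To make the window between these two edits concrete, below is a minimal, self-contained sketch of the Standby-side check (the class and fields are hypothetical simplifications for illustration, not the actual NameNode code):
> {code:java}
> import java.util.HashSet;
> import java.util.Set;
>
> /** Hypothetical stand-in for the Standby state between the two edits. */
> public class StandbyRaceSketch {
>   // Stands in for the global generation stamp (V2) on the Standby.
>   static long currentGenStamp = 100;
>   // Stands in for blockmap membership, keyed by block id.
>   static Set<Long> blockMap = new HashSet<>();
>
>   static boolean isGenStampInFuture(long reportedGenStamp) {
>     return reportedGenStamp > currentGenStamp;
>   }
>
>   public static void main(String[] args) {
>     // Standby tails OP_SET_GENSTAMP_V2 only; OP_ADD_BLOCK is still pending.
>     currentGenStamp = 101;
>
>     // An IBR for the newly allocated block arrives in this window.
>     long reportedBlockId = 1001L;
>     long reportedGenStamp = 101;
>
>     if (isGenStampInFuture(reportedGenStamp)) {
>       System.out.println("queued for later (safe path)");
>     } else if (!blockMap.contains(reportedBlockId)) {
>       // The guard passes, yet the block is unknown, so the replica is
>       // treated as not belonging to any file and marked for invalidation.
>       System.out.println("BLOCK* addBlock: block " + reportedBlockId
>           + " does not belong to any file");
>     }
>   }
> }
> {code}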
> Now a block report with the new generation stamp comes in. Since the genstamp bump has already been processed, the reported block is not considered a future block, so the guarding logic passes. But the block has not actually been added to the blockmap, because the second edit has yet to be tailed. The block then gets added to the invalidate list, and we see messages like:
> {code:java}
> BLOCK* addBlock: block XXX on node XXX size XXX does not belong to any file
> {code}
> Even worse, since this IBR is effectively lost, the NameNode has no information about the block until the next full block report. So after a failover, the NN marks it as corrupt.
>
> This issue would not happen if both edit entries were tailed together, since no IBR processing could happen in between. But in our case we set the edit tailing interval very low (to allow Standby reads), so under high workload there is a much higher chance that the two entries are tailed separately, causing the issue.
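> For contrast, under the same simplified model as above, if both edits are applied in one tailing batch before the IBR arrives, the report resolves normally (again, hypothetical stand-ins, not the actual NameNode code):
> {code:java}
> import java.util.HashSet;
> import java.util.Set;
>
> /** Same hypothetical model, but both edits are tailed before the IBR. */
> public class NoRaceSketch {
>   public static void main(String[] args) {
>     long currentGenStamp = 100;
>     Set<Long> blockMap = new HashSet<>();
>
>     // Both edits land in the same tailing batch:
>     currentGenStamp = 101;  // OP_SET_GENSTAMP_V2
>     blockMap.add(1001L);    // OP_ADD_BLOCK
>
>     // The IBR for the new block arrives only after both are applied.
>     long reportedBlockId = 1001L;
>     long reportedGenStamp = 101;
>
>     if (reportedGenStamp > currentGenStamp) {
>       System.out.println("queued for later (future genstamp)");
>     } else if (blockMap.contains(reportedBlockId)) {
>       System.out.println("replica recorded for block " + reportedBlockId);
>     }
>   }
> }
> {code}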