[ https://issues.apache.org/jira/browse/HDFS-17150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17753917#comment-17753917 ]
ASF GitHub Bot commented on HDFS-17150:
---------------------------------------

zhangshuyan0 commented on code in PR #5937:
URL: https://github.com/apache/hadoop/pull/5937#discussion_r1293027160


##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java:
##########

@@ -3802,16 +3803,26 @@ boolean internalReleaseLease(Lease lease, String src, INodesInPath iip,
           lastBlock.getBlockType());
     }

-    if (uc.getNumExpectedLocations() == 0 && lastBlock.getNumBytes() == 0) {
+    int minLocationsNum = 1;
+    if (lastBlock.isStriped()) {
+      minLocationsNum = ((BlockInfoStriped) lastBlock).getRealDataBlockNum();
+    }
+    if (uc.getNumExpectedLocations() < minLocationsNum &&
+        lastBlock.getNumBytes() == 0) {
       // There is no datanode reported to this block.
       // may be client have crashed before writing data to pipeline.
       // This blocks doesn't need any recovery.
       // We can remove this block and close the file.
       pendingFile.removeLastBlock(lastBlock);
       finalizeINodeFileUnderConstruction(src, pendingFile,
           iip.getLatestSnapshotId(), false);
-      NameNode.stateChangeLog.warn("BLOCK* internalReleaseLease: "
-          + "Removed empty last block and closed file " + src);
+      if (uc.getNumExpectedLocations() == 0) {
+        NameNode.stateChangeLog.warn("BLOCK* internalReleaseLease: "
+            + "Removed empty last block and closed file " + src);
+      } else {
+        NameNode.stateChangeLog.warn("BLOCK* internalReleaseLease: "

Review Comment:
   > If uc.getNumExpectedLocations() is 0, regardless of whether it is a striped block or not, I think we should consider it to be an empty block, not an unrecoverable one.

   How about adding some comments here?


> EC: Fix the bug of failed lease recovery.
> -----------------------------------------
>
>                 Key: HDFS-17150
>                 URL: https://issues.apache.org/jira/browse/HDFS-17150
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Shuyan Zhang
>            Priority: Major
>              Labels: pull-request-available
>
> If the client crashes without writing the minimum number of internal blocks
> required by the EC policy, the lease recovery process for the corresponding
> unclosed file may keep failing. Taking the RS(6,3) policy as an example, the
> timeline is as follows:
> 1. The client writes some data to only 5 datanodes;
> 2. The client crashes;
> 3. The NN fails over;
> 4. Now the result of `uc.getNumExpectedLocations()` depends entirely on
> block reports, and only 5 datanodes report internal blocks;
> 5. When the lease hard limit expires, the NN issues a block recovery command;
> 6. The datanode checks the command and finds that the number of internal
> blocks is insufficient, resulting in an error and a failed recovery;
> 7. The hard limit expires again, the NN issues another block recovery
> command, and the recovery fails again, and so on.
> When the number of internal blocks written by the client is less than 6, the
> block group is actually unrecoverable. We should treat this situation the
> same way we treat a replicated file whose last block has zero replicas, i.e.,
> directly remove the last block group and close the file.
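
For readers following along, below is a minimal standalone Java sketch of the
decision the patch introduces: a replicated block needs at least one reported
location to be worth recovering, while a striped block group needs at least
getRealDataBlockNum() internal blocks. The class LastBlockState, the method
shouldDropLastBlock, and all field names are hypothetical illustration names,
not Hadoop APIs.

// Standalone sketch of the patched check in internalReleaseLease.
// All names here are hypothetical; this is not Hadoop code.
public final class LeaseRecoverySketch {

  /** Hypothetical view of a file's last block under construction. */
  static final class LastBlockState {
    final boolean striped;        // true for an EC (striped) block group
    final int realDataBlockNum;   // data blocks in the EC schema, e.g. 6 for RS(6,3)
    final int expectedLocations;  // analogue of uc.getNumExpectedLocations()
    final long numBytes;          // block length as recorded by the NameNode

    LastBlockState(boolean striped, int realDataBlockNum,
        int expectedLocations, long numBytes) {
      this.striped = striped;
      this.realDataBlockNum = realDataBlockNum;
      this.expectedLocations = expectedLocations;
      this.numBytes = numBytes;
    }
  }

  /**
   * An empty last block with fewer reported locations than the minimum
   * can never be recovered, so it is removed and the file is closed
   * instead of being sent into an endless recovery loop.
   */
  static boolean shouldDropLastBlock(LastBlockState b) {
    int minLocationsNum = b.striped ? b.realDataBlockNum : 1;
    return b.expectedLocations < minLocationsNum && b.numBytes == 0;
  }

  public static void main(String[] args) {
    // The Jira scenario: RS(6,3), client crashed after only 5 of the 6
    // required internal blocks were reported, recorded length still 0.
    System.out.println(shouldDropLastBlock(
        new LastBlockState(true, 6, 5, 0L)));   // true: drop and close

    // A replicated block with one reported location is left alone and
    // goes through the normal block recovery path.
    System.out.println(shouldDropLastBlock(
        new LastBlockState(false, 0, 1, 0L)));  // false: attempt recovery
  }
}

Under RS(6,3), getRealDataBlockNum() is 6, so a block group that only ever
reached 5 datanodes now fails the check and is dropped, whereas the old
condition (expectedLocations == 0) would have retried recovery forever.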