[ https://issues.apache.org/jira/browse/HDFS-17150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17752648#comment-17752648 ]
ASF GitHub Bot commented on HDFS-17150:
---------------------------------------

hfutatzhanghb commented on code in PR #5937:
URL: https://github.com/apache/hadoop/pull/5937#discussion_r1289651742


##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java:
##########

@@ -3802,7 +3803,12 @@ boolean internalReleaseLease(Lease lease, String src, INodesInPath iip,
           lastBlock.getBlockType());
     }

-    if (uc.getNumExpectedLocations() == 0 && lastBlock.getNumBytes() == 0) {
+    int minLocationsNum = 1;
+    if (lastBlock.isStriped()) {
+      minLocationsNum = ((BlockInfoStriped) lastBlock).getRealDataBlockNum();
+    }
+    if (uc.getNumExpectedLocations() < minLocationsNum &&

Review Comment:
   @zhangshuyan0 Hi Shuyan, please also check the code snippet below from FSDirWriteFileOp#storeAllocatedBlock:
   ```java
   final BlockType blockType = pendingFile.getBlockType();
   // allocate new block, record block locations in INode.
   Block newBlock = fsn.createNewBlock(blockType);
   INodesInPath inodesInPath = INodesInPath.fromINode(pendingFile);
   saveAllocatedBlock(fsn, src, inodesInPath, newBlock, targets, blockType);
   persistNewBlock(fsn, src, pendingFile);
   ```
   Is BlockUnderConstructionFeature#replicas also written to the edit log, since it is part of lastBlock? Thanks a lot.


> EC: Fix the bug of failed lease recovery.
> -----------------------------------------
>
>                 Key: HDFS-17150
>                 URL: https://issues.apache.org/jira/browse/HDFS-17150
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Shuyan Zhang
>            Priority: Major
>              Labels: pull-request-available
>
> If the client crashes without writing the minimum number of internal blocks
> required by the EC policy, the lease recovery process for the corresponding
> unclosed file may continue to fail. Taking the RS(6,3) policy as an example,
> the timeline is as follows:
> 1. The client writes some data to only 5 datanodes;
> 2. The client crashes;
> 3. The NN fails over;
> 4.
> Now the result of `uc.getNumExpectedLocations()` depends entirely on
> block reports, and only 5 datanodes have reported internal blocks;
> 5. When the lease hard limit expires, the NN issues a block recovery command;
> 6. The datanode checks the command and finds that the number of internal
> blocks is insufficient, resulting in an error and recovery failure;
> 7. The lease hard limit expires again, the NN issues a block recovery command
> again, and the recovery fails again......
>
> When the number of internal blocks written by the client is less than 6, the
> block group is actually unrecoverable. We should treat this situation the same
> as the case where the number of replicas is 0 when processing replicated
> files, i.e., directly remove the last block group and close the file.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
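To make the patched condition above concrete, here is a standalone sketch (not actual Hadoop code: it reduces the check to plain ints and booleans, uses hypothetical helper names `minLocationsNum` and `tooFewLocationsToRecover`, and covers only the first operand of the `&&`, since the second is truncated in the diff) of why an RS(6,3) file needs at least 6 reported internal blocks before block recovery can succeed:

```java
// Illustrative model of the patched branch in FSNamesystem#internalReleaseLease.
// Real HDFS uses BlockInfoStriped.getRealDataBlockNum() and
// BlockUnderConstructionFeature.getNumExpectedLocations(); here those values
// are passed in as plain parameters.
public class LeaseRecoveryCheck {

  // For a striped (EC) last block, recovery needs at least realDataBlockNum
  // reported internal blocks; for a replicated block, a single replica suffices.
  static int minLocationsNum(boolean striped, int realDataBlockNum) {
    return striped ? realDataBlockNum : 1;
  }

  // With fewer expected locations than the minimum, the block group cannot be
  // recovered, so the NN should drop the last block group and close the file
  // instead of reissuing block recovery commands forever.
  static boolean tooFewLocationsToRecover(int numExpectedLocations,
      boolean striped, int realDataBlockNum) {
    return numExpectedLocations < minLocationsNum(striped, realDataBlockNum);
  }

  public static void main(String[] args) {
    // RS(6,3): only 5 of 6 data blocks were ever reported -> unrecoverable.
    System.out.println(tooFewLocationsToRecover(5, true, 6));  // true
    // All 6 data blocks reported -> normal block recovery can proceed.
    System.out.println(tooFewLocationsToRecover(6, true, 6));  // false
    // Replicated block, pre-patch behavior: only 0 locations triggers removal.
    System.out.println(tooFewLocationsToRecover(0, false, 0)); // true
  }
}
```

This also shows why the old `== 0` test was insufficient for EC: with 5 reporting datanodes the expected-locations count is nonzero, so the unclosed file never hit the remove-and-close path.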