Chenyu Zheng created HDFS-17542:
-----------------------------------

Summary: EC: Optimize the EC block reconstruction.
Key: HDFS-17542
URL: https://issues.apache.org/jira/browse/HDFS-17542
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Chenyu Zheng
Assignee: Chenyu Zheng

The current reconstruction process for EC blocks is built on the one originally designed for contiguous blocks. It is mainly implemented through the work constructed by computeReconstructionWorkForBlocks, and can be roughly divided into three steps:

* scheduleReconstruction
* chooseTargets
* validateReconstructionWork

For ordinary contiguous blocks:

* (1) scheduleReconstruction: Select srcNodes as the source of the block copy, according to the status of each replica of the block.
* (2) chooseTargets: Select the target of the copy.
* (3) validateReconstructionWork: Add the copy command to srcNode. srcNode receives the command through a heartbeat and copies the block from src to target.

For EC blocks:

(1) and (2) are nearly the same. However, in (3), block copying or block reconstruction may occur, or no work may be generated at all, for example when some storages are busy. If no work is generated, this leads to the problem described in HDFS-17516. Even if no block copying or block reconstruction is generated, pendingReconstruction and neededReconstruction will still be updated until the block times out, which wastes the scheduling opportunity.

To stay compatible with the original contiguous-block path and to decide the specific action in (3), liveBlockIndices, liveBusyBlockIndices, and excludeReconstructedIndices were introduced, although they are otherwise unnecessary. Many known bugs are related to this logic. These can be avoided.

Improvements:

* Move the work of deciding whether to copy or reconstruct blocks from (3) to (1).

This change makes it easier to implement the explicit specification of the reconstruction block index mentioned in HDFS-16874, and removes the need to pass liveBlockIndices and liveBusyBlockIndices.
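
The proposed improvement, deciding between copy and reconstruction in step (1) rather than step (3), could be sketched roughly as follows. This is only an illustrative model, not the actual BlockManager code: the class EcSchedulingSketch, the types InternalBlock, Plan and Action, and the schedule method are all hypothetical names, and the decision rules are a simplification assumed for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of EC scheduling; none of these types exist in Hadoop. */
public class EcSchedulingSketch {

  enum Action { COPY, RECONSTRUCT, NONE }

  /** Assumed view of one internal block of an EC block group. */
  static final class InternalBlock {
    final int index;               // position inside the block group
    final boolean live;            // a readable copy exists
    final boolean busy;            // holder cannot serve as a source right now
    final boolean decommissioning; // holder is leaving the cluster

    InternalBlock(int index, boolean live, boolean busy, boolean decommissioning) {
      this.index = index;
      this.live = live;
      this.busy = busy;
      this.decommissioning = decommissioning;
    }
  }

  /** Result of step (1): the action plus the indices it applies to. */
  static final class Plan {
    final Action action;
    final List<Integer> sourceIndices;
    final List<Integer> targetIndices;

    Plan(Action action, List<Integer> sourceIndices, List<Integer> targetIndices) {
      this.action = action;
      this.sourceIndices = sourceIndices;
      this.targetIndices = targetIndices;
    }
  }

  /**
   * Decide the action while scheduling. A caller that receives NONE can skip
   * target selection and leave its pending/needed queues untouched.
   */
  static Plan schedule(List<InternalBlock> group, int dataUnits) {
    List<Integer> usable = new ArrayList<>();  // live and not busy
    List<Integer> missing = new ArrayList<>(); // indices to rebuild
    List<Integer> toCopy = new ArrayList<>();  // indices to copy as-is
    for (InternalBlock b : group) {
      if (!b.live) {
        missing.add(b.index);
      } else if (!b.busy) {
        usable.add(b.index);
        if (b.decommissioning) {
          toCopy.add(b.index);
        }
      }
    }
    if (!missing.isEmpty()) {
      // Decoding needs at least dataUnits readable internal blocks.
      if (usable.size() >= dataUnits) {
        List<Integer> sources = new ArrayList<>(usable.subList(0, dataUnits));
        return new Plan(Action.RECONSTRUCT, sources, missing);
      }
      return new Plan(Action.NONE, new ArrayList<>(), new ArrayList<>());
    }
    if (!toCopy.isEmpty()) {
      // Nothing is missing: a plain copy of the affected indices is enough.
      return new Plan(Action.COPY, toCopy, toCopy);
    }
    return new Plan(Action.NONE, new ArrayList<>(), new ArrayList<>());
  }

  public static void main(String[] args) {
    // Assumed layout with 3 data units and 2 parity units; index 4 is lost.
    List<InternalBlock> group = List.of(
        new InternalBlock(0, true, false, false),
        new InternalBlock(1, true, false, false),
        new InternalBlock(2, true, false, false),
        new InternalBlock(3, true, false, false),
        new InternalBlock(4, false, false, false));
    Plan plan = schedule(group, 3);
    System.out.println(plan.action + " sources=" + plan.sourceIndices
        + " targets=" + plan.targetIndices);
  }
}
```

Because the plan already names the source and target indices, a later validation step in this model would not need separate live or busy index arrays to work out what to do.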