Chenyu Zheng created HDFS-17516:
-----------------------------------
Summary: Erasure Coding: Some reconstruction blocks and metrics
are inaccuracy when decommission DN which contains many EC blocks.
Key: HDFS-17516
URL: https://issues.apache.org/jira/browse/HDFS-17516
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Chenyu Zheng
Assignee: Chenyu Zheng
Attachments: 截屏2024-05-09 下午3.59.22.png, 截屏2024-05-09 下午3.59.44.png
When decommissioning a DN that contains many EC blocks, the DN is marked as
busy by scheduleReconstruction, so ErasureCodingWork::addTaskToDatanode does
not add any block to ecBlocksToBeReplicated.
Although no DNA_TRANSFER BlockCommand is generated for such a block,
pendingReconstruction and neededReconstruction are still updated, so
BlockManager mistakenly believes that the block is being copied.
The periodic increases in the metrics
`fs_namesystem_num_timed_out_pending_reconstructions` and
`fs_namesystem_under_replicated_blocks` confirm this. In fact, many blocks
are never actually copied. They stay in pendingReconstruction until they
time out and are then re-added to neededReconstruction.
!截屏2024-05-09 下午3.59.44.png!!截屏2024-05-09 下午3.59.22.png!
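The mismatch described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the class `EcDecommissionSketch`, its methods `scheduleAsReported`, `scheduleGuarded` and `expirePending`, and the `timedOutPending` counter are hypothetical stand-ins and not the real BlockManager code. Only the names neededReconstruction, pendingReconstruction, ecBlocksToBeReplicated and addTaskToDatanode are borrowed from the report. The guarded variant shows one possible way to keep the bookkeeping consistent; it is not necessarily the fix chosen for this issue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

/** Toy model of the bookkeeping mismatch; NOT the real HDFS BlockManager. */
class EcDecommissionSketch {
    // Hypothetical, simplified stand-ins for the structures named in the report.
    final Queue<String> neededReconstruction = new ArrayDeque<>();
    final Set<String> pendingReconstruction = new HashSet<>();
    final List<String> ecBlocksToBeReplicated = new ArrayList<>();
    int timedOutPending = 0; // stand-in for the timed-out pending metric

    /** Mimics the reported behaviour: a busy source DN receives no transfer task. */
    boolean addTaskToDatanode(String block, boolean sourceBusy) {
        if (sourceBusy) {
            return false; // nothing queued, so no DNA_TRANSFER command would follow
        }
        ecBlocksToBeReplicated.add(block);
        return true;
    }

    /** Behaviour as reported: bookkeeping is updated even when no task was queued. */
    void scheduleAsReported(boolean sourceBusy) {
        String block = neededReconstruction.poll();
        if (block == null) {
            return;
        }
        addTaskToDatanode(block, sourceBusy); // result ignored
        pendingReconstruction.add(block);     // block now looks "in flight"
    }

    /** One possible guard (an assumption): move the block only if a task was queued. */
    void scheduleGuarded(boolean sourceBusy) {
        String block = neededReconstruction.peek();
        if (block != null && addTaskToDatanode(block, sourceBusy)) {
            neededReconstruction.poll();
            pendingReconstruction.add(block);
        }
    }

    /** Models the pending timeout: expired blocks are counted and re-queued. */
    void expirePending() {
        for (String block : pendingReconstruction) {
            neededReconstruction.add(block);
            timedOutPending++;
        }
        pendingReconstruction.clear();
    }
}
```

In this model, calling `scheduleAsReported(true)` leaves a block in pendingReconstruction with nothing in ecBlocksToBeReplicated, and each later `expirePending()` call increments the timeout counter, which mirrors the periodic metric increases. With `scheduleGuarded(true)` the block simply stays in neededReconstruction.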
--
This message was sent by Atlassian Jira
(v8.20.10#820010)