Chenyu Zheng created HDFS-17515:
-----------------------------------
Summary: Erasure Coding: ErasureCodingWork is not effectively
limited during a block reconstruction cycle.
Key: HDFS-17515
URL: https://issues.apache.org/jira/browse/HDFS-17515
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Chenyu Zheng
Assignee: Chenyu Zheng
In a block reconstruction cycle, ErasureCodingWork is not effectively limited.
I added some debug logging that prints a message whenever ecBlocksToBeReplicated
reaches an integer multiple of 100.
{code:java}
2024-05-09 10:46:06,986 DEBUG org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManagerZCY: ecBlocksToBeReplicated for IP:PORT already have 100 blocks
2024-05-09 10:46:06,987 DEBUG org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManagerZCY: ecBlocksToBeReplicated for IP:PORT already have 200 blocks
...
2024-05-09 10:46:06,992 DEBUG org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManagerZCY: ecBlocksToBeReplicated for IP:PORT already have 2000 blocks
2024-05-09 10:46:06,992 DEBUG org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManagerZCY: ecBlocksToBeReplicated for IP:PORT already have 2100 blocks {code}
During a single block reconstruction cycle, ecBlocksToBeReplicated increases from 0 to
2100, which is much larger than replicationStreamsHardLimit. This is unfair to other
work and makes the NameNode tend to schedule EC block copies disproportionately.
For non-EC blocks this is not a problem: pendingReplicationWithoutTargets is
incremented when work is scheduled, and once it grows too large, no further work is
scheduled for that node. A comparable per-node check is missing for EC reconstruction,
as sketched below.
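Below is a minimal, self-contained sketch (not the actual HDFS code; the class and
method names are hypothetical) of the kind of per-node guard that replication work
effectively gets and that EC reconstruction currently lacks: stop handing a node more
work once its queued count reaches the hard limit, and only decrement the count as the
work is dispatched.
{code:java}
/**
 * Illustrative sketch only, not the real BlockManager/DatanodeDescriptor code.
 * Tracks how much reconstruction work is already queued for one DataNode and
 * refuses to schedule more once the hard limit is reached within a cycle.
 */
public class NodeReconstructionLimiter {
  // Conceptually the value of dfs.namenode.replication.max-streams-hard-limit.
  private final int replicationStreamsHardLimit;
  // Work already queued for this node, analogous to ecBlocksToBeReplicated.
  private int queuedReconstructionWork = 0;

  public NodeReconstructionLimiter(int hardLimit) {
    this.replicationStreamsHardLimit = hardLimit;
  }

  /** Returns true and queues one block only if the node is below the hard limit. */
  public boolean tryScheduleOneBlock() {
    if (queuedReconstructionWork >= replicationStreamsHardLimit) {
      // Without an early check like this, one reconstruction cycle can keep
      // queuing EC work (0 -> 2100 in the logs above) before the DataNode
      // ever heartbeats and drains its queue.
      return false;
    }
    queuedReconstructionWork++;
    return true;
  }

  /** Called when the DataNode picks up (or completes) the queued work. */
  public void onWorkDispatched() {
    if (queuedReconstructionWork > 0) {
      queuedReconstructionWork--;
    }
  }
}
{code}
With the configured dfs.namenode.replication.max-streams-hard-limit, a guard along these
lines would keep the per-cycle EC queue near the hard limit instead of letting it grow
into the thousands.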