Shangshu Qian created HDFS-17660:
------------------------------------
Summary: HDFS cache commands should be throttled to avoid
contention with the write pipeline
Key: HDFS-17660
URL: https://issues.apache.org/jira/browse/HDFS-17660
Project: Hadoop HDFS
Issue Type: Bug
Components: caching
Affects Versions: 2.10.2, 3.4.0
Reporter: Shangshu Qian
We found a potential feedback loop between the HDFS write pipeline and the
block caching commands. Currently, there is no throttling on the number of
cache commands generated for each heartbeat (HB) reply, unlike the block
replication commands, which are throttled by
`dfs.namenode.replication.work.multiplier.per.iteration`.
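For reference, the existing replication throttle scales the work scheduled per iteration with the number of live DNs. The sketch below paraphrases that logic with simplified names (the constant and method names here are illustrative, not the exact Hadoop fields):

```java
// Simplified sketch of the replication throttle applied in
// BlockManager.computeDatanodeWork; names are paraphrased.
public class ReplicationThrottleSketch {
    // Corresponds to dfs.namenode.replication.work.multiplier.per.iteration
    // (default 2 in hdfs-default.xml).
    static final int REPL_WORK_MULTIPLIER = 2;

    // The NN caps replication work per iteration at
    // (number of live DNs) * multiplier, so the scheduled work
    // grows with cluster capacity instead of being unbounded.
    static int blocksToProcess(int numLiveDatanodes) {
        return numLiveDatanodes * REPL_WORK_MULTIPLIER;
    }

    public static void main(String[] args) {
        // With 100 live DNs and the default multiplier, at most
        // 200 block replications are scheduled per iteration.
        System.out.println(blocksToProcess(100));
    }
}
```

Cache command generation currently has no analogous per-iteration cap.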
The positive feedback loop can be described as follows:
# Under a heavy write workload on a DN, IOExceptions may be thrown in the
write pipeline, causing more IncrementalBlockReports (IBRs) to be sent to the
NN.
# The IBRs can contend with the HB handling and the cache command generation
on the NN, because they are all part of the HB handling logic.
# When a DN's heartbeats are delayed,
`CacheReplicationMonitor.chooseDatanodesForCaching` may take longer, because it
must iterate through more DNs while some of them are temporarily unavailable
due to the HB delays. Some cached blocks can also become temporarily
unavailable, so the NN must generate commands for those blocks again, which
further slows cache command generation for each HB.
# The extra cache commands cause extra workload on the DNs, making them more
vulnerable to IOExceptions in the write pipeline.
Adding throttling similar to that in `BlockManager.computeDatanodeWork` would
make this feedback loop less likely to occur.
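A minimal sketch of one possible shape for the fix, assuming a hypothetical cap on cache commands per HB reply and simplified types (this is not the actual Hadoop API; the class, queue type, and cap are all illustrative):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch: cap the number of cache/uncache commands sent in each
// heartbeat reply, carrying any backlog over to later heartbeats instead of
// flooding a struggling DN all at once.
public class CacheCommandThrottle {
    // Hypothetical cap, analogous in spirit to
    // dfs.namenode.replication.work.multiplier.per.iteration.
    private final int maxCommandsPerHeartbeat;
    private final Queue<String> pendingCommands = new ArrayDeque<>();

    public CacheCommandThrottle(int maxCommandsPerHeartbeat) {
        this.maxCommandsPerHeartbeat = maxCommandsPerHeartbeat;
    }

    // Commands produced by the cache monitor are queued, not sent directly.
    public void enqueue(List<String> commands) {
        pendingCommands.addAll(commands);
    }

    // Called when building a heartbeat reply: drain at most the cap.
    public List<String> nextBatch() {
        List<String> batch = new ArrayList<>();
        while (batch.size() < maxCommandsPerHeartbeat && !pendingCommands.isEmpty()) {
            batch.add(pendingCommands.poll());
        }
        return batch;
    }
}
```

With a cap of 2, a backlog of 5 commands would be spread over three heartbeat replies (2, 2, 1), bounding the per-HB work on both the NN and the DN.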
--
This message was sent by Atlassian Jira
(v8.20.10#820010)