Shangshu Qian created HDFS-17660:
------------------------------------
Summary: HDFS cache commands should be throttled to avoid
contention with the write pipeline
Key: HDFS-17660
URL: https://issues.apache.org/jira/browse/HDFS-17660
Project: Hadoop HDFS
Issue Type: Bug
Components: caching
Affects Versions: 2.10.2, 3.4.0
Reporter: Shangshu Qian
We found a potential feedback loop between the HDFS write pipeline and the
block caching commands. Currently, there is no throttling on the number of
cache commands generated for each heartbeat (HB) reply, unlike the block
replication commands, which are throttled by
`dfs.namenode.replication.work.multiplier.per.iteration`.
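For reference, the existing replication throttle scales the work scheduled per iteration with the number of live DNs. The sketch below paraphrases that logic with simplified names (the constant and method names here are illustrative, not the exact Hadoop fields):

```java
// Simplified sketch of the replication throttle applied in
// BlockManager.computeDatanodeWork; names are paraphrased.
public class ReplicationThrottleSketch {
    // Corresponds to dfs.namenode.replication.work.multiplier.per.iteration
    // (default 2 in hdfs-default.xml).
    static final int REPL_WORK_MULTIPLIER = 2;

    // The NN caps replication work per iteration at
    // (number of live DNs) * multiplier, so the scheduled work
    // grows with cluster capacity instead of being unbounded.
    static int blocksToProcess(int numLiveDatanodes) {
        return numLiveDatanodes * REPL_WORK_MULTIPLIER;
    }

    public static void main(String[] args) {
        // With 100 live DNs and the default multiplier, at most
        // 200 block replications are scheduled per iteration.
        System.out.println(blocksToProcess(100));
    }
}
```

Cache command generation currently has no analogous per-iteration cap.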
The positive feedback loop can be described as follows:
# Under a heavy write workload on a DN, IOExceptions may be thrown in the
write pipeline, causing more IncrementalBlockReports (IBRs) to be sent to the
NN.
# The IBRs can contend with the HB handling and the cache command generation
on the NN, because they are all part of the HB handling logic.
# When a DN's heartbeats are delayed,
`CacheReplicationMonitor.chooseDatanodesForCaching` may take longer, because it
must iterate through more DNs while some of them are temporarily unavailable
due to the HB delays. Some cached blocks can also become temporarily
unavailable, so the NN must generate commands for those blocks again, which
further slows cache command generation for each HB.
# The extra cache commands cause extra workload on the DNs, making them more
vulnerable to IOExceptions in the write pipeline.
Adding throttling similar to that in `BlockManager.computeDatanodeWork` would
make this feedback loop less likely to occur.
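A minimal sketch of one possible shape for the fix, assuming a hypothetical cap on cache commands per HB reply and simplified types (this is not the actual Hadoop API; the class, queue type, and cap are all illustrative):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch: cap the number of cache/uncache commands sent in each
// heartbeat reply, carrying any backlog over to later heartbeats instead of
// flooding a struggling DN all at once.
public class CacheCommandThrottle {
    // Hypothetical cap, analogous in spirit to
    // dfs.namenode.replication.work.multiplier.per.iteration.
    private final int maxCommandsPerHeartbeat;
    private final Queue<String> pendingCommands = new ArrayDeque<>();

    public CacheCommandThrottle(int maxCommandsPerHeartbeat) {
        this.maxCommandsPerHeartbeat = maxCommandsPerHeartbeat;
    }

    // Commands produced by the cache monitor are queued, not sent directly.
    public void enqueue(List<String> commands) {
        pendingCommands.addAll(commands);
    }

    // Called when building a heartbeat reply: drain at most the cap.
    public List<String> nextBatch() {
        List<String> batch = new ArrayList<>();
        while (batch.size() < maxCommandsPerHeartbeat && !pendingCommands.isEmpty()) {
            batch.add(pendingCommands.poll());
        }
        return batch;
    }
}
```

With a cap of 2, a backlog of 5 commands would be spread over three heartbeat replies (2, 2, 1), bounding the per-HB work on both the NN and the DN.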
--
This message was sent by Atlassian Jira
(v8.20.10#820010)