Shangshu Qian created HDFS-17837:
------------------------------------
Summary: Potential Feedback Loop in Write Pipeline
Key: HDFS-17837
URL: https://issues.apache.org/jira/browse/HDFS-17837
Project: Hadoop HDFS
Issue Type: Bug
Components: datanode
Affects Versions: 2.10.2
Reporter: Shangshu Qian
We find that a delay in the pipeline recovery operations may cause block
recovery to fail, resulting in workload amplification.
Pipeline rebuilds put extra workload on the DNs in the cluster.
-> The pipeline rebuilds can contend with block recovery operations, which
are also inter-datanode operations.
-> Failed block recoveries may trigger extra retries, raising the DN load further.
-> IBR (incremental block report) delivery in the heartbeat is delayed by
IOExceptions caused by the congestion.
-> The write pipeline fails because the IBR is delayed, causing more pipeline
rebuilds.
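
To make the amplification easier to reason about, the loop above can be sketched as a toy model. This is not HDFS code and is not derived from measurements: the function name and all parameters (base_load, capacity, retry_cost, initial_load) are hypothetical, and the assumption that the failure fraction grows linearly with overload is only for illustration.

```python
def simulate_feedback_loop(base_load, capacity, retry_cost, initial_load, rounds):
    """Toy model of the write-pipeline feedback loop (hypothetical, not HDFS code).

    base_load    -- steady client-driven work per round on a DN
    capacity     -- work a DN can absorb per round without timeouts
    retry_cost   -- extra work (rebuilds, recovery retries) per unit of failed work
    initial_load -- load in the first round, e.g. after a transient delay
    """
    loads = []
    load = float(initial_load)
    for _ in range(rounds):
        loads.append(load)
        # Assumption: the fraction of operations that fail (delayed IBR,
        # failed block recovery) grows linearly with overload, capped at 1.
        overload = max(0.0, load - capacity)
        failure_fraction = min(1.0, overload / capacity)
        # Failed operations add rebuild/retry work on top of the base load.
        load = base_load + retry_cost * failure_fraction * base_load
    return loads


if __name__ == "__main__":
    # A small disturbance decays back to the base load.
    print(simulate_feedback_loop(90, 100, 2.0, 105, 5))
    # A larger disturbance is amplified and the overload sustains itself.
    print(simulate_feedback_loop(90, 100, 2.0, 150, 6))
```

In this sketch the base load stays below capacity in both runs. Only the size of the initial disturbance decides whether the DN recovers or stays overloaded, which is the behavior the report is concerned about.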