[ 
https://issues.apache.org/jira/browse/HDFS-17693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunhui updated HDFS-17693:
--------------------------
    Description: 
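At my company, we run an HDFS cluster with erasure coding enabled. We frequently run into issues when decommissioning datanodes on such clusters. We have hit two problem scenarios: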

 
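 # We're replacing a non-impaired host. The decommissioning process causes elevated network I/O on the datanode, which effectively turns it into a hotspot.
 # We're replacing an impaired host. The decommissioning process is slow, and reads to this host continue to cause issues until the decommission process finishes.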

 

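Lowering `dfs.namenode.decommission.blocks.per.interval` 
helps with the first category of decommissions but hurts the second, because we want to remove impaired decommissioning datanodes from the cluster as quickly as possible. 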

 

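The underlying issue is that reads still go to these decommissioning datanodes. Ideally, decommissioning datanodes would be de-prioritized on the read path, similar to how writes are de-prioritized.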

  was:
At my company, we are running an HDFS cluster with erasure coding enabled. 
Frequently, we run into issues when decommissioning datanodes on such clusters. 
We've run into two problem scenarios:

 
 # We're replacing a non-impaired host. The decommissioning process causes 
elevated network I/O on the datanode, which effectively hotspots the datanode.
 # We're replacing an impaired host. The decommissioning process is slow, and 
reads to this host will continue to cause issues until the decommission process 
is finished.

 

Lowering `dfs.namenode.decommission.blocks.per.interval` helps for the first 
category of decommissions, but hurts the second, as we want to remove 
decommissioning impaired datanodes from the cluster as quickly as possible. 

 

The underlying issue here is that reads are still going to these 
decommissioning datanodes. Ideally, it would be great for decommissioning 
datanodes to be de-prioritized from the read path, similar to how writes are 
de-prioritized.
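
The de-prioritization idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not HDFS code: `ReadOrderSketch`, `AdminState`, `Location`, and `deprioritizeDecommissioning` are hypothetical names, and the sketch only reorders a flat list of candidate locations. It does not model erasure-coded block groups, where each internal block normally lives on a single datanode.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ReadOrderSketch {
    /** Hypothetical stand-in for a datanode's admin state; not the HDFS type. */
    enum AdminState { NORMAL, DECOMMISSION_IN_PROGRESS }

    /** Hypothetical candidate location for a read: a host plus its admin state. */
    static final class Location {
        final String host;
        final AdminState state;

        Location(String host, AdminState state) {
            this.host = host;
            this.state = state;
        }
    }

    /**
     * Returns a copy of the candidate list with decommissioning nodes moved
     * to the end. List.sort is stable, so any existing ordering (for example
     * by network distance) is preserved within each group.
     */
    static List<Location> deprioritizeDecommissioning(List<Location> locations) {
        List<Location> ordered = new ArrayList<>(locations);
        // Boolean.FALSE sorts before Boolean.TRUE, so normal nodes come first.
        ordered.sort(Comparator.comparing(
                (Location l) -> l.state == AdminState.DECOMMISSION_IN_PROGRESS));
        return ordered;
    }

    public static void main(String[] args) {
        List<Location> candidates = List.of(
                new Location("dn1", AdminState.DECOMMISSION_IN_PROGRESS),
                new Location("dn2", AdminState.NORMAL),
                new Location("dn3", AdminState.DECOMMISSION_IN_PROGRESS),
                new Location("dn4", AdminState.NORMAL));

        List<String> hosts = new ArrayList<>();
        for (Location l : deprioritizeDecommissioning(candidates)) {
            hosts.add(l.host);
        }
        System.out.println(hosts); // prints [dn2, dn4, dn1, dn3]
    }
}
```

Decommissioning nodes stay in the list as a last resort, so a read can still succeed when no other location is available; they are only tried after the healthy candidates.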


> De-prioritize reads against EC-enabled decommissioning datanodes
> ----------------------------------------------------------------
>
>                 Key: HDFS-17693
>                 URL: https://issues.apache.org/jira/browse/HDFS-17693
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>            Reporter: Hernan Gelaf-Romer
>            Priority: Major
>
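> At my company, we run an HDFS cluster with erasure coding enabled. We frequently run into issues when decommissioning datanodes on such clusters. We have hit two problem scenarios: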
>  
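>  # We're replacing a non-impaired host. The decommissioning process causes elevated network I/O on the datanode, which effectively turns it into a hotspot.
>  # We're replacing an impaired host. The decommissioning process is slow, and reads to this host continue to cause issues until the decommission process finishes.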
>  
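> Lowering `dfs.namenode.decommission.blocks.per.interval` 
> helps with the first category of decommissions but hurts the second, because we want to remove impaired decommissioning datanodes from the cluster as quickly as possible. 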
>  
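> The underlying issue is that reads still go to these decommissioning datanodes. Ideally, decommissioning datanodes would be de-prioritized on the read path, similar to how writes are de-prioritized.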



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
