[ 
https://issues.apache.org/jira/browse/HDFS-14563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16866538#comment-16866538
 ] 

He Xiaoqiao commented on HDFS-14563:
------------------------------------

Thanks [~sodonnell].
{quote}If we are adding decommission / recommission, we should also add enter / 
leave maintenance mode as they are closely related{quote}
Sure, we should add maintenance mode alongside decommission. I will update the 
patch later.
{quote}Writing to the existing include / exclude files, or another separate 
file is one option, but that would be difficult to replicate to the SBNN.{quote}
I prefer ZK over the other options, since YARN and RBF have used it for a long 
time and I think it is a mature solution. Of course, we should provide different 
options and let users choose for themselves.
On the other hand, introducing another component will bring operational cost. 
Other graceful approaches are welcome.

> Enhance interface about recommissioning/decommissioning
> -------------------------------------------------------
>
>                 Key: HDFS-14563
>                 URL: https://issues.apache.org/jira/browse/HDFS-14563
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client, namenode
>            Reporter: He Xiaoqiao
>            Assignee: He Xiaoqiao
>            Priority: Major
>         Attachments: HDFS-14563.001.patch
>
>
> In the current implementation, the only way to decommission or recommission 
> a datanode is to add it to the include or exclude file under the namenode 
> configuration path and then run `bin/hadoop dfsadmin -refreshNodes`, which 
> triggers the namenode to reload the include/exclude files and start 
> recommissioning or decommissioning the datanode.
> This approach has two shortcomings:
> a. The namenode reloads the include/exclude configuration files from disk; 
> if I/O load is high, the handler may be blocked.
> b. The namenode has to process every datanode in the include and exclude 
> configurations. If many datanodes are pending (very common in a large 
> cluster), the namenode can hang for hundreds of seconds in the worst case 
> while it waits for recommission/decommission to finish, since it holds the 
> write lock.
> I think we should expose a lightweight interface that supports recommissioning 
> or decommissioning a single datanode, so that we can operate on datanodes 
> through dfsadmin more smoothly.
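
For illustration, the file-based workflow described above can be sketched as a small helper. This is only a rough sketch under assumptions: the function names, the exclude file path, and the hostname are hypothetical, and the helper only returns the `-refreshNodes` command quoted in the description rather than running it.

```python
import os

# Command quoted in the issue description; the sketch returns it, never runs it.
REFRESH_CMD = ["bin/hadoop", "dfsadmin", "-refreshNodes"]


def add_to_exclude(exclude_path, hostname):
    """Append hostname to the exclude file unless it is already listed.

    Returns True if the file was changed. The helper name and the one-host-
    per-line handling are assumptions made for this sketch.
    """
    content = ""
    if os.path.exists(exclude_path):
        with open(exclude_path) as f:
            content = f.read()
    if hostname in {line.strip() for line in content.splitlines()}:
        return False
    with open(exclude_path, "a") as f:
        # Avoid gluing the new entry onto an unterminated last line.
        if content and not content.endswith("\n"):
            f.write("\n")
        f.write(hostname + "\n")
    return True


def decommission_steps(exclude_path, hostname):
    """Return the refresh command to run next, or None if nothing changed."""
    if add_to_exclude(exclude_path, hostname):
        return list(REFRESH_CMD)
    return None


if __name__ == "__main__":
    # Hypothetical path and hostname, for demonstration only.
    cmd = decommission_steps("dfs.exclude.example", "dn1.example.com")
    print("next step:", " ".join(cmd) if cmd else "nothing to do")
```

Even in this small form, the operator has to edit a file on the namenode host and then trigger a reload of the whole include/exclude configuration to change the state of one datanode, which is the overhead the proposed single-datanode interface is meant to avoid.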



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
