[ https://issues.apache.org/jira/browse/HDFS-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13458926#comment-13458926 ]

Luke Lu commented on HDFS-1590:
-------------------------------

This has become a FEI (Frequently Encountered Issue) for new dev/QAs :)

Maybe we should introduce a dfsadmin -decommission command that handles all 
this logic. It could take a --force option that automatically runs setrep 
#remaining-nodes on files whose replication is > #remaining-nodes, and a 
--force-data-loss option that removes nodes immediately (mostly for testing 
purposes).
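A minimal sketch of the planning step behind the proposed --force option, written in plain Java with no Hadoop dependency. The class and method names (ForceSetrepSketch, plannedSetrep) and the example paths are hypothetical, and neither dfsadmin -decommission nor --force exists; both are only proposed above. A real tool would presumably apply each planned change through FileSystem.setReplication(Path, short).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the proposed "--force" behaviour: every file whose
 * target replication exceeds the number of DataNodes that will remain after
 * decommissioning is lowered to that number. Not actual Hadoop code.
 */
class ForceSetrepSketch {

    /**
     * Returns the files that would need a setrep, mapped to their new
     * replication factor. Files already at or below remainingNodes are skipped.
     */
    static Map<String, Short> plannedSetrep(Map<String, Short> targetReplication,
                                            int remainingNodes) {
        if (remainingNodes < 1) {
            throw new IllegalArgumentException("need at least one remaining node");
        }
        Map<String, Short> plan = new LinkedHashMap<>();
        for (Map.Entry<String, Short> e : targetReplication.entrySet()) {
            if (e.getValue() > remainingNodes) {
                plan.put(e.getKey(), (short) remainingNodes);
            }
        }
        return plan;
    }

    public static void main(String[] args) {
        // Hypothetical files: one at the default replication, one job.jar at 10.
        Map<String, Short> files = new LinkedHashMap<>();
        files.put("/user/test/data.txt", (short) 3);
        files.put("/user/test/job.jar", (short) 10);
        // 4 DataNodes minus the one being decommissioned leaves 3.
        System.out.println(plannedSetrep(files, 3)); // prints {/user/test/job.jar=3}
    }
}
```

Only files above the remaining node count are touched, which matches the manual 'fs -setrep 3' workaround described in the report below.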
                
> Decommissioning never ends when node to decommission has blocks that are 
> under-replicated and cannot be replicated to the expected level of replication
> -------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-1590
>                 URL: https://issues.apache.org/jira/browse/HDFS-1590
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.20.2
>         Environment: Linux
>            Reporter: Mathias Herberts
>            Assignee: Harsh J
>            Priority: Minor
>
> On a test cluster with 4 DNs and a default repl level of 3, I recently 
> attempted to decommission one of the DNs. Right after modifying the 
> dfs.hosts.exclude file and running 'dfsadmin -refreshNodes', I could see the 
> blocks being replicated to other nodes.
> After a while, the replication stopped but the node was not marked as 
> decommissioned.
> When running an 'fsck -files -blocks -locations' I saw that all files had a 
> replication of 4 (which is logical given there are 4 DNs), but some of the 
> files had an expected replication set to 10 (those were job.jar files from 
> M/R jobs).
> I ran 'fs -setrep 3' on those files and shortly after the namenode reported 
> the DN as decommissioned.
> Shouldn't this case be checked by the NameNode when decommissioning a node? 
> I.e., consider a node decommissioned if, for each block on the node being 
> decommissioned, either of the following is true:
> 1. It is replicated more than the expected replication level.
> 2. It is replicated as much as possible given the available nodes, even 
> though it is less replicated than expected.
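A minimal sketch of the completion rule proposed above, assuming the NameNode knows, for each block, how many live replicas sit on DataNodes that are not being decommissioned, and how many such DataNodes exist. Condition 1 is read here as "at least the expected level" (a block at exactly the expected level should also qualify), so the two conditions collapse to liveReplicas >= min(expected, availableNodes). DecommissionCheckSketch, BlockState and both methods are hypothetical names, not the NameNode's actual code.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of the decommission completion rule suggested in
 * HDFS-1590. "liveReplicas" counts replicas on DataNodes that are NOT being
 * decommissioned; "availableNodes" is the number of such DataNodes.
 */
class DecommissionCheckSketch {

    /** Per-block state; a hypothetical stand-in for NameNode bookkeeping. */
    static final class BlockState {
        final int liveReplicas;
        final int expectedReplication;

        BlockState(int liveReplicas, int expectedReplication) {
            this.liveReplicas = liveReplicas;
            this.expectedReplication = expectedReplication;
        }
    }

    /** True if the block meets condition 1 or condition 2 from the report. */
    static boolean isSufficientlyReplicated(BlockState b, int availableNodes) {
        // 1. at least the expected replication, or
        // 2. a replica on every node that could still hold one.
        return b.liveReplicas >= Math.min(b.expectedReplication, availableNodes);
    }

    /** The node can be marked decommissioned once every block qualifies. */
    static boolean canMarkDecommissioned(List<BlockState> blocksOnNode, int availableNodes) {
        for (BlockState b : blocksOnNode) {
            if (!isSufficientlyReplicated(b, availableNodes)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Scenario from the report: 4 DNs, one being decommissioned, so 3 remain.
        int availableNodes = 3;
        List<BlockState> blocks = Arrays.asList(
                new BlockState(3, 3),   // ordinary file, default replication 3
                new BlockState(3, 10)); // job.jar, replication 10 can never be met
        System.out.println(canMarkDecommissioned(blocks, availableNodes)); // prints true
    }
}
```

Under a strict liveReplicas >= expectedReplication check, the job.jar block (3 >= 10) never qualifies, which is why the decommission in the report never finished until the replication was lowered.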

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
