[ 
https://issues.apache.org/jira/browse/HDFS-7742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14389145#comment-14389145
 ] 

Hudson commented on HDFS-7742:
------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2099 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2099/])
HDFS-7742. Favoring decommissioning node for replication can cause a block to 
stay (kihwal: rev 04ee18ed48ceef34598f954ff40940abc9fde1d2)
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> favoring decommissioning node for replication can cause a block to stay 
> underreplicated for long periods
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-7742
>                 URL: https://issues.apache.org/jira/browse/HDFS-7742
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.6.0
>            Reporter: Nathan Roberts
>            Assignee: Nathan Roberts
>             Fix For: 2.7.0
>
>         Attachments: HDFS-7742-v0.patch
>
>
> When choosing a source node to replicate a block from, a decommissioning node 
> is favored. The reason for the favoritism is that decommissioning nodes 
> aren't servicing any writes, so in theory they are less loaded.
> However, the same selection algorithm also tries to make sure it doesn't get 
> "stuck" on any particular node:
> {noformat}
>       // switch to a different node randomly
>       // this to prevent from deterministically selecting the same node even
>       // if the node failed to replicate the block on previous iterations
> {noformat}
> Unfortunately, the decommissioning check runs before this randomization, so 
> the algorithm can get stuck trying to replicate from a decommissioning node. 
> We've seen this in practice: a decommissioning datanode kept failing to 
> replicate a block for many days, even though other viable replicas of the 
> block were available.
> Given that we limit the number of streams we'll assign to a given node 
> (default soft limit of 2, hard limit of 4), it doesn't seem like favoring a 
> decommissioning node has significant benefit. That is, when there is 
> significant replication work to do, we'll quickly hit the stream limit of the 
> decommissioning nodes and use other nodes in the cluster anyway; when there 
> isn't significant replication work, then in theory we've got plenty of 
> replication bandwidth available, so choosing a decommissioning node isn't much 
> of a win.
> I see two choices:
> 1) Change the algorithm to still favor decommissioning nodes but with some 
> level of randomness that will avoid always selecting the decommissioning node
> 2) Remove the favoritism for decommissioning nodes
> I prefer #2. It simplifies the algorithm, and given the other throttles we 
> have in place, I'm not sure there is a significant benefit to selecting 
> decommissioning nodes. 
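
The behavior described above can be illustrated with a minimal sketch. This
is not the actual BlockManager code: the class SourceSelectionSketch, the
Node type, and the favorDecommissioning flag are hypothetical names invented
for illustration, and stream limits and other checks are left out. With the
flag set, the random switch is never reached once a decommissioning replica
is seen, so the same node is chosen on every pass. With the flag cleared
(roughly option 2), every replica has a chance of being picked.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Simplified, hypothetical model of source-node selection for replication.
class SourceSelectionSketch {

    /** Stand-in for a datanode that holds a replica of the block. */
    static final class Node {
        final String name;
        final boolean decommissioning;

        Node(String name, boolean decommissioning) {
            this.name = name;
            this.decommissioning = decommissioning;
        }
    }

    /**
     * Picks a source replica. When favorDecommissioning is true, a
     * decommissioning node always wins and the random switch is skipped,
     * which is the "stuck" case. When false, only the random switch applies.
     */
    static Node chooseSource(List<Node> replicas,
                             boolean favorDecommissioning,
                             Random rng) {
        Node src = null;
        for (Node node : replicas) {
            if (src == null) {
                src = node;              // first candidate seen
                continue;
            }
            if (favorDecommissioning) {
                if (node.decommissioning) {
                    src = node;          // decommissioning node takes over
                    continue;
                }
                if (src.decommissioning) {
                    continue;            // never switch away from it
                }
            }
            // switch to a different node randomly
            if (rng.nextBoolean()) {
                src = node;
            }
        }
        return src;
    }

    public static void main(String[] args) {
        List<Node> replicas = Arrays.asList(
            new Node("dn1", false),
            new Node("dn2", true),       // the decommissioning replica
            new Node("dn3", false));
        Random rng = new Random();
        for (int i = 0; i < 5; i++) {
            System.out.println("favoring: "
                + chooseSource(replicas, true, rng).name
                + "   not favoring: "
                + chooseSource(replicas, false, rng).name);
        }
    }
}
```

In this model, if dn2 repeatedly fails to replicate the block, the favoring
variant has no way to move on to dn1 or dn3, while the non-favoring variant
will eventually try them.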



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
