[ https://issues.apache.org/jira/browse/HDFS-5837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896499#comment-13896499 ]

Hudson commented on HDFS-5837:
------------------------------

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1669 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1669/])
HDFS-5837. dfs.namenode.replication.considerLoad should consider decommissioned 
nodes. Contributed by Tao Luo. (shv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1566410)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSClusterStats.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyConsiderLoad.java


> dfs.namenode.replication.considerLoad does not consider decommissioned nodes
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-5837
>                 URL: https://issues.apache.org/jira/browse/HDFS-5837
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.0.0-alpha, 3.0.0, 2.0.6-alpha, 2.2.0
>            Reporter: Bryan Beaudreault
>            Assignee: Tao Luo
>             Fix For: 2.3.0
>
>         Attachments: HDFS-5837.patch, HDFS-5837_B.patch, HDFS-5837_C.patch, 
> HDFS-5837_branch_2.2.0.patch
>
>
> In BlockPlacementPolicyDefault, the setting 
> dfs.namenode.replication.considerLoad tries to balance the load of the 
> cluster when choosing replica locations.  This code does not take 
> decommissioned nodes into account.
> The considerLoad code calculates the average load as TotalClusterLoad / 
> numNodes.  However, numNodes includes decommissioned nodes (which have 0 
> load), so the average load is artificially low.  Example:
> TotalLoad = 250
> numNodes = 100
> decommissionedNodes = 70
> remainingNodes = numNodes - decommissionedNodes = 30
> avgLoad = 250/100 = 2.50
> trueAvgLoad = 250 / 30 = 8.33
> If the real load of the remaining 30 nodes is (on average) 8.33, it exceeds 
> the rejection threshold of 2x the calculated average load (2 x 2.50 = 5.00).  
> These nodes are therefore rejected as replica locations.  The final result 
> is that all nodes are rejected, and no replicas can be placed.  
> See exceptions printed from client during this scenario: 
> https://gist.github.com/bbeaudreault/49c8aa4bb231de54e9c1
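
The arithmetic in the description can be sketched in Java as follows. This is
a minimal illustration only, not the code from BlockPlacementPolicyDefault or
from the attached patches: the class ConsiderLoadSketch and its method names
are hypothetical, and the 2x rejection threshold is taken from the
description above.

```java
// Hypothetical sketch of the averaging problem described in HDFS-5837.
// None of these names come from the Hadoop code base.
class ConsiderLoadSketch {

    /** Average load over every registered node, decommissioned ones included. */
    static double naiveAvgLoad(int totalLoad, int numNodes) {
        return (double) totalLoad / numNodes;
    }

    /** Average load over only the nodes that are still in service. */
    static double inServiceAvgLoad(int totalLoad, int numNodes, int decommissionedNodes) {
        int inService = numNodes - decommissionedNodes;
        // Guard against division by zero when every node is decommissioned.
        return inService > 0 ? (double) totalLoad / inService : 0.0;
    }

    /** A node is rejected when its load is more than twice the average (per the description). */
    static boolean isRejected(double nodeLoad, double avgLoad) {
        return nodeLoad > 2.0 * avgLoad;
    }

    public static void main(String[] args) {
        // Numbers from the example in the issue description.
        int totalLoad = 250;
        int numNodes = 100;
        int decommissionedNodes = 70;

        double naive = naiveAvgLoad(totalLoad, numNodes);
        double inService = inServiceAvgLoad(totalLoad, numNodes, decommissionedNodes);

        // Assume a typical live node carries the true average load.
        double typicalNodeLoad = inService;

        System.out.printf("naive average:      %.2f%n", naive);
        System.out.printf("in-service average: %.2f%n", inService);
        System.out.println("rejected against naive average:      "
                + isRejected(typicalNodeLoad, naive));
        System.out.println("rejected against in-service average: "
                + isRejected(typicalNodeLoad, inService));
    }
}
```

With the naive average, a node carrying the true average load is over the
threshold and gets rejected; dividing by the in-service node count instead
keeps the same node eligible.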



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
