[ https://issues.apache.org/jira/browse/HDFS-5837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896037#comment-13896037 ]
Hudson commented on HDFS-5837:
------------------------------

SUCCESS: Integrated in Hadoop-trunk-Commit #5134 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5134/])
HDFS-5837. dfs.namenode.replication.considerLoad should consider decommissioned nodes. Contributed by Tao Luo. (shv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1566410)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSClusterStats.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyConsiderLoad.java

> dfs.namenode.replication.considerLoad does not consider decommissioned nodes
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-5837
>                 URL: https://issues.apache.org/jira/browse/HDFS-5837
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.0.0-alpha, 3.0.0, 2.0.6-alpha, 2.2.0
>            Reporter: Bryan Beaudreault
>            Assignee: Tao Luo
>             Fix For: 2.3.0
>
>         Attachments: HDFS-5837.patch, HDFS-5837_B.patch, HDFS-5837_C.patch,
> HDFS-5837_branch_2.2.0.patch
>
>
> In DefaultBlockPlacementPolicy, there is a setting
> dfs.namenode.replication.considerLoad which tries to balance the load of the
> cluster when choosing replica locations. This code does not take into
> account decommissioned nodes.
> The code for considerLoad calculates the load by doing: TotalClusterLoad /
> numNodes. However, numNodes includes decommissioned nodes (which have 0
> load). Therefore, the average load is artificially low.
> Example:
> TotalLoad = 250
> numNodes = 100
> decommissionedNodes = 70
> remainingNodes = numNodes - decommissionedNodes = 30
> avgLoad = 250/100 = 2.50
> trueAvgLoad = 250 / 30 = 8.33
> If the real load of the remaining 30 nodes is (on average) 8.33, this is more
> than 2x the calculated average load of 2.50. This causes these nodes to be
> rejected as replica locations. The final result is that all nodes are
> rejected, and no replicas can be placed.
> See exceptions printed from client during this scenario:
> https://gist.github.com/bbeaudreault/49c8aa4bb231de54e9c1
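
For readers who want to check the arithmetic in the quoted description, here is a minimal standalone Java sketch. It is not the code from the committed patch: the class name ConsiderLoadSketch and its three methods are hypothetical and exist only for illustration. The only things taken from the report are the numbers (250, 100, 70) and the "more than 2x the average" rejection rule.

```java
import java.util.Locale;

// Hypothetical illustration only; not taken from BlockPlacementPolicyDefault.
class ConsiderLoadSketch {

    // Average as described in the report: total load divided by ALL nodes,
    // including decommissioned ones that carry no load.
    public static double naiveAverageLoad(int totalLoad, int numNodes) {
        return numNodes <= 0 ? 0.0 : (double) totalLoad / numNodes;
    }

    // Average over in-service nodes only (assumed shape of a fix).
    public static double inServiceAverageLoad(int totalLoad, int numNodes,
                                              int decommissionedNodes) {
        int inService = numNodes - decommissionedNodes;
        return inService <= 0 ? 0.0 : (double) totalLoad / inService;
    }

    // Rejection rule from the report: a node is skipped when its load is
    // more than twice the computed average.
    public static boolean isRejected(double nodeLoad, double avgLoad) {
        return nodeLoad > 2.0 * avgLoad;
    }

    public static void main(String[] args) {
        int totalLoad = 250, numNodes = 100, decommissioned = 70;

        double naive = naiveAverageLoad(totalLoad, numNodes);
        double fixed = inServiceAverageLoad(totalLoad, numNodes, decommissioned);

        // A typical in-service node carries the true average load.
        double typicalNodeLoad = fixed;

        System.out.println(String.format(Locale.ROOT,
            "avgLoad=%.2f trueAvgLoad=%.2f", naive, fixed));
        System.out.println("rejected against naive average: "
            + isRejected(typicalNodeLoad, naive));
        System.out.println("rejected against in-service average: "
            + isRejected(typicalNodeLoad, fixed));
    }
}
```

With the numbers from the report, a node carrying the true average load is rejected when compared against the naive average and accepted when compared against the in-service average, which matches the failure described above.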