[ https://issues.apache.org/jira/browse/HDFS-5837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13893919#comment-13893919 ]
Hadoop QA commented on HDFS-5837:
---------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12627409/HDFS-5837_B.patch
  against trunk revision .

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}. There were no new javadoc warning messages.

    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6060//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6060//console

This message is automatically generated.

> dfs.namenode.replication.considerLoad does not consider decommissioned nodes
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-5837
>                 URL: https://issues.apache.org/jira/browse/HDFS-5837
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.0.0-alpha, 2.0.6-alpha, 2.2.0
>            Reporter: Bryan Beaudreault
>            Assignee: Tao Luo
>         Attachments: HDFS-5837.patch, HDFS-5837_B.patch
>
>
> In DefaultBlockPlacementPolicy, there is a setting
> dfs.namenode.replication.considerLoad which tries to balance the load of the
> cluster when choosing replica locations. This code does not take into
> account decommissioned nodes.
> The code for considerLoad calculates the load by doing: TotalClusterLoad /
> numNodes. However, numNodes includes decommissioned nodes (which have 0
> load). Therefore, the average load is artificially low. Example:
> TotalLoad = 250
> numNodes = 100
> decommissionedNodes = 70
> remainingNodes = numNodes - decommissionedNodes = 30
> avgLoad = 250 / 100 = 2.50
> trueAvgLoad = 250 / 30 = 8.33
> If the real load of the remaining 30 nodes is (on average) 8.33, this is more
> than 2x the calculated average load of 2.50. This causes these nodes to be
> rejected as replica locations. The final result is that all nodes are
> rejected, and no replicas can be placed.
> See exceptions printed from client during this scenario:
> https://gist.github.com/bbeaudreault/49c8aa4bb231de54e9c1

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
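
The arithmetic in the issue description can be reproduced with a small standalone sketch. This is not the Hadoop code and not the attached patch: the class `ConsiderLoadSketch` and the helpers `averageLoad` and `isRejected` are hypothetical names chosen for illustration, and the rejection rule is only the "more than 2x the average load" behaviour as the reporter describes it.

```java
// Illustrative sketch only; all names here are hypothetical, not Hadoop APIs.
class ConsiderLoadSketch {

    // Average load over the given number of nodes (0.0 if there are none).
    static double averageLoad(int totalLoad, int nodeCount) {
        return nodeCount == 0 ? 0.0 : (double) totalLoad / nodeCount;
    }

    // Assumed rule from the description: a node is rejected as a replica
    // target when its load is more than twice the average load.
    static boolean isRejected(double nodeLoad, double avgLoad) {
        return nodeLoad > 2.0 * avgLoad;
    }

    public static void main(String[] args) {
        // Numbers taken from the example in the issue description.
        int totalLoad = 250;
        int numNodes = 100;
        int decommissionedNodes = 70;
        int remainingNodes = numNodes - decommissionedNodes;

        // Average as described in the report: decommissioned nodes are counted.
        double avgLoad = averageLoad(totalLoad, numNodes);
        // Average over only the nodes that actually carry load.
        double trueAvgLoad = averageLoad(totalLoad, remainingNodes);

        // A typical remaining node carries roughly the true average load.
        double typicalNodeLoad = trueAvgLoad;

        System.out.println("avgLoad = " + avgLoad);
        System.out.println("trueAvgLoad = " + trueAvgLoad);
        System.out.println("rejected against avgLoad: "
                + isRejected(typicalNodeLoad, avgLoad));
        System.out.println("rejected against trueAvgLoad: "
                + isRejected(typicalNodeLoad, trueAvgLoad));
    }
}
```

With the reporter's numbers, a typical remaining node is rejected when compared against the average that counts decommissioned nodes, and accepted when the average is taken over the 30 remaining nodes only.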