[ https://issues.apache.org/jira/browse/HDFS-5837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tao Luo updated HDFS-5837:
--------------------------
    Status: Patch Available  (was: Open)

> dfs.namenode.replication.considerLoad does not consider decommissioned nodes
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-5837
>                 URL: https://issues.apache.org/jira/browse/HDFS-5837
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.2.0, 2.0.6-alpha, 2.0.0-alpha
>            Reporter: Bryan Beaudreault
>            Assignee: Tao Luo
>         Attachments: HDFS-5837.patch, HDFS-5837_B.patch
>
>
> In DefaultBlockPlacementPolicy, there is a setting
> dfs.namenode.replication.considerLoad which tries to balance the load of the
> cluster when choosing replica locations. This code does not take into
> account decommissioned nodes.
> The code for considerLoad calculates the load by doing: TotalClusterLoad /
> numNodes. However, numNodes includes decommissioned nodes (which have 0
> load). Therefore, the average load is artificially low. Example:
> TotalLoad = 250
> numNodes = 100
> decommissionedNodes = 70
> remainingNodes = numNodes - decommissionedNodes = 30
> avgLoad = 250/100 = 2.50
> trueAvgLoad = 250 / 30 = 8.33
> If the real load of the remaining 30 nodes is (on average) 8.33, this is more
> than 2x the calculated average load of 2.50. This causes these nodes to be
> rejected as replica locations. The final result is that all nodes are
> rejected, and no replicas can be placed.
> See exceptions printed from client during this scenario:
> https://gist.github.com/bbeaudreault/49c8aa4bb231de54e9c1



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
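
For readers who want to check the arithmetic in the description, here is a minimal sketch of the two averages. It is not the Hadoop source: the function names `average_load` and `is_overloaded` are hypothetical, and the 2x rejection threshold is an assumption taken from the issue text above rather than from the namenode code.

```python
def average_load(total_load, node_count):
    """Mean load per node; 0.0 when there are no nodes to divide by."""
    return total_load / node_count if node_count > 0 else 0.0


def is_overloaded(node_load, avg_load, factor=2.0):
    """Assumed rule from the issue text: reject a node whose load
    exceeds `factor` times the cluster average."""
    return node_load > factor * avg_load


# Figures from the example in the issue description.
total_load = 250
num_nodes = 100
decommissioned_nodes = 70
remaining_nodes = num_nodes - decommissioned_nodes  # 30

# Average as described in the report: decommissioned nodes are counted.
avg_load = average_load(total_load, num_nodes)

# Average over the nodes that actually carry load.
true_avg_load = average_load(total_load, remaining_nodes)

# A typical remaining node carries roughly the true average load.
node_load = true_avg_load

rejected_with_avg_load = is_overloaded(node_load, avg_load)
rejected_with_true_avg = is_overloaded(node_load, true_avg_load)

print(f"avgLoad = {avg_load:.2f}, trueAvgLoad = {true_avg_load:.2f}")
print(f"rejected using avgLoad: {rejected_with_avg_load}")
print(f"rejected using trueAvgLoad: {rejected_with_true_avg}")
```

With the numbers from the report, a node carrying the true average load is rejected when the divisor includes decommissioned nodes and accepted when it does not, which matches the behavior described above.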