Daryn Sharp created HDFS-7967:
---------------------------------

             Summary: Reduce the performance impact of the balancer
                 Key: HDFS-7967
                 URL: https://issues.apache.org/jira/browse/HDFS-7967
             Project: Hadoop HDFS
          Issue Type: Sub-task
    Affects Versions: 2.0.0-alpha
            Reporter: Daryn Sharp
            Assignee: Daryn Sharp

The balancer needs to query for blocks to move from overly full DNs. The block lookup is extremely inefficient. An iterator over the node's blocks is built from the iterators over its storages' blocks. A random number is chosen to determine how many blocks the iterator will skip, and each skip requires a costly scan of triplets. The current design also considers only imbalances between nodes and ignores imbalances among a node's storages.

A more efficient and intelligent design may eliminate the costly skipping of blocks by selecting blocks round-robin from the storages based on their remaining capacity.
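
The proposed selection could be sketched roughly as follows. This is only an illustration of round-robin selection across storages, visiting the fullest storage first; it is not the balancer's actual code. The names RoundRobinBlockPicker, Storage, and pickBlocks are hypothetical and do not correspond to HDFS classes, and plain lists of block IDs stand in for the real block structures.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

public class RoundRobinBlockPicker {
    /** Hypothetical stand-in for one storage of a DN; not an HDFS class. */
    static final class Storage {
        final String id;
        final long remainingBytes;
        final List<Long> blockIds;

        Storage(String id, long remainingBytes, List<Long> blockIds) {
            this.id = id;
            this.remainingBytes = remainingBytes;
            this.blockIds = blockIds;
        }
    }

    /**
     * Picks up to maxBlocks block IDs by cycling over the storages,
     * taking one block per storage per pass. No blocks are skipped,
     * so each selected block costs a single iterator step.
     */
    static List<Long> pickBlocks(List<Storage> storages, int maxBlocks) {
        // Assumption: storages with the least remaining capacity go first.
        List<Storage> ordered = new ArrayList<>(storages);
        ordered.sort(Comparator.comparingLong((Storage s) -> s.remainingBytes));

        // One cursor per storage, kept alive across passes.
        List<Iterator<Long>> cursors = new ArrayList<>();
        for (Storage s : ordered) {
            cursors.add(s.blockIds.iterator());
        }

        List<Long> picked = new ArrayList<>();
        boolean progressed = true;
        while (picked.size() < maxBlocks && progressed) {
            progressed = false;
            for (Iterator<Long> cursor : cursors) {
                if (picked.size() >= maxBlocks) {
                    break;
                }
                if (cursor.hasNext()) {
                    picked.add(cursor.next());
                    progressed = true;
                }
            }
        }
        return picked;
    }

    public static void main(String[] args) {
        List<Storage> storages = List.of(
            new Storage("disk-a", 500L, List.of(1L, 2L, 3L)),
            new Storage("disk-b", 100L, List.of(10L, 11L)),
            new Storage("disk-c", 900L, List.of(20L)));
        System.out.println(pickBlocks(storages, 4));
    }
}
```

In this sketch the remaining capacity only decides the visiting order. A fuller design might weight the number of blocks taken per pass by how full each storage is, which would also address the imbalance among a node's storages.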