[
https://issues.apache.org/jira/browse/HDFS-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Todd Lipcon updated HDFS-1480:
------------------------------
Summary: All replicas of a block can end up on the same rack when some
datanodes are decommissioning. (was: All replicas for a block with repl=2 end
up in same rack)
> All replicas of a block can end up on the same rack when some datanodes are
> decommissioning.
> --------------------------------------------------------------------------------------------
>
> Key: HDFS-1480
> URL: https://issues.apache.org/jira/browse/HDFS-1480
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.20.2
> Reporter: T Meyarivan
> Assignee: Todd Lipcon
> Fix For: 0.23.0
>
> Attachments: hdfs-1480-test.txt, hdfs-1480.txt, hdfs-1480.txt,
> hdfs-1480.txt
>
>
> It appears that all replicas of a block can end up in the same rack. The
> likelihood of such placement seems to be directly related to the
> decommissioning of nodes.
> After a rolling OS upgrade of a running cluster (decommission 3-10% of the
> nodes, re-install, add them back), all replicas of about 0.16% of blocks
> ended up in the same rack.
> The Hadoop NameNode UI etc. does not seem to know about such incorrectly
> replicated blocks; "hadoop fsck .." does report that the blocks must be
> replicated on additional racks.
> Looking at ReplicationTargetChooser.java, the following snippets seem suspect:
> snippet-01:
> {code}
> int maxNodesPerRack =
>   (totalNumOfReplicas-1)/clusterMap.getNumOfRacks()+2;
> {code}
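The arithmetic in snippet-01 can be checked in isolation. A minimal sketch (the class and method names below are mine, not Hadoop's): for repl=2 the integer division makes the cap evaluate to 2 on any multi-rack cluster, so the per-rack limit by itself never forces a second rack.

```java
public class MaxNodesPerRackDemo {
    // Same integer arithmetic as snippet-01 (hypothetical standalone helper).
    static int maxNodesPerRack(int totalNumOfReplicas, int numOfRacks) {
        return (totalNumOfReplicas - 1) / numOfRacks + 2;
    }

    public static void main(String[] args) {
        // repl=2, 40 racks: (2-1)/40 + 2 = 2 -> both replicas may share a rack.
        System.out.println(maxNodesPerRack(2, 40)); // prints 2
        // repl=3, 40 racks: (3-1)/40 + 2 = 2 -> cap is still only 2 per rack.
        System.out.println(maxNodesPerRack(3, 40)); // prints 2
    }
}
```

In other words, the cap only rises above 2 once the replication factor exceeds the number of racks, so for typical repl=2 or repl=3 files the limit permits two replicas on one rack.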
> snippet-02:
> {code}
> case 2:
>   if (clusterMap.isOnSameRack(results.get(0), results.get(1))) {
>     chooseRemoteRack(1, results.get(0), excludedNodes,
>                      blocksize, maxNodesPerRack, results);
>   } else if (newBlock) {
>     chooseLocalRack(results.get(1), excludedNodes, blocksize,
>                     maxNodesPerRack, results);
>   } else {
>     chooseLocalRack(writer, excludedNodes, blocksize,
>                     maxNodesPerRack, results);
>   }
>   if (--numOfReplicas == 0) {
>     break;
>   }
> {code}
> snippet-03:
> {code}
> do {
>   DatanodeDescriptor[] selectedNodes =
>       chooseRandom(1, nodes, excludedNodes);
>   if (selectedNodes.length == 0) {
>     throw new NotEnoughReplicasException(
>         "Not able to place enough replicas");
>   }
>   result = (DatanodeDescriptor)(selectedNodes[0]);
> } while (!isGoodTarget(result, blocksize, maxNodesPerRack, results));
> {code}
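The retry loop in snippet-03 accepts any candidate that passes isGoodTarget, and one of that method's conditions is the per-rack cap from snippet-01. A hedged sketch (not the actual Hadoop code; names are mine) of that cap check shows why a second same-rack replica still passes when the cap is 2:

```java
import java.util.Arrays;
import java.util.List;

public class RackLimitCheck {
    // Approximation of the per-rack condition inside isGoodTarget:
    // true if placing a replica on candidateRack stays within maxNodesPerRack.
    static boolean underRackLimit(String candidateRack,
                                  List<String> chosenRacks,
                                  int maxNodesPerRack) {
        int counter = 1; // count the candidate itself
        for (String rack : chosenRacks) {
            if (rack.equals(candidateRack)) {
                counter++;
            }
        }
        return counter <= maxNodesPerRack;
    }

    public static void main(String[] args) {
        // With repl=2 the cap from snippet-01 is 2, so a candidate on the
        // same rack as the one existing replica is still a "good" target.
        boolean ok = underRackLimit("rackA", Arrays.asList("rackA"), 2);
        System.out.println(ok); // prints true
    }
}
```

Under this reading, nothing in the random-retry loop itself prefers a remote rack; rack diversity relies entirely on the case logic in snippet-02 having chosen a remote rack earlier, which is consistent with the mis-placed blocks observed after decommissioning.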
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira