Nikola Vujic created HDFS-5184:
----------------------------------
Summary: BlockPlacementPolicyWithNodeGroup does not work correctly
when avoidStaleNodes is true
Key: HDFS-5184
URL: https://issues.apache.org/jira/browse/HDFS-5184
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Nikola Vujic
Priority: Minor
If avoidStaleNodes is true, choosing targets may take two attempts. If the
first attempt does not find enough targets to place the replicas, a second
attempt is invoked that is allowed to use stale nodes to find the remaining
targets. This second attempt breaks the node group rule of not having two
replicas in the same node group.
The invocation of the second attempt looks like this:
DatanodeDescriptor chooseTarget(excludedNodes, ...) {
  // copy of the excluded nodes, taken before the first attempt
  oldExcludedNodes = new HashMap<Node, Node>(excludedNodes);
  // first attempt
  // if we don't find enough targets then
  if (avoidStaleNodes) {
    for (Node node : results) {
      oldExcludedNodes.put(node, node);
    }
    numOfReplicas = totalReplicasExpected - results.size();
    return chooseTarget(numOfReplicas, writer, oldExcludedNodes, blocksize,
        maxNodesPerRack, results, false);
  }
}
So, any node that was excluded during the first attempt but is neither in
oldExcludedNodes nor in results will be ignored, and the second invocation of
chooseTarget will use an incomplete set of excluded nodes. For example, if we
have the following topology:
dn1 -> /d1/r1/n1
dn2 -> /d1/r1/n1
dn3 -> /d1/r1/n2
dn4 -> /d1/r1/n2
and if we want to choose 3 targets with avoidStaleNodes=true, then the first
attempt will choose only 2 targets, since we have only two node groups. Let's
say we choose dn1 and dn3. Then we will add dn1 and dn3 to oldExcludedNodes and
use that set of excluded nodes in the second attempt. This set of excluded
nodes is incomplete and allows us to select dn2 or dn4, which should not be
selected due to node group awareness, but that is what happens in the current
code!
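The scenario above can be simulated outside Hadoop with a minimal sketch. Everything in it is hypothetical and is not HDFS code: the class NodeGroupExclusionSketch, the methods topology(), choose() and place(), and the keepFirstAttemptExclusions flag are names invented for illustration. Staleness is not modelled at all; the sketch only shows how restarting from the original excluded set plus results loses the node group exclusions collected in the first attempt, and how carrying those exclusions over (one possible direction for a fix, assumed here) avoids the violation.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the placement logic; not Hadoop code.
class NodeGroupExclusionSketch {

    // Topology from the report: datanode -> node group path.
    static Map<String, String> topology() {
        Map<String, String> topo = new LinkedHashMap<>();
        topo.put("dn1", "/d1/r1/n1");
        topo.put("dn2", "/d1/r1/n1");
        topo.put("dn3", "/d1/r1/n2");
        topo.put("dn4", "/d1/r1/n2");
        return topo;
    }

    // Picks up to `wanted` nodes that are not in `excluded`. Every pick adds
    // the whole node group of the chosen node to `excluded`, which mimics the
    // node group awareness described in the report.
    static List<String> choose(Map<String, String> topo, int wanted,
                               Set<String> excluded) {
        List<String> chosen = new ArrayList<>();
        for (String dn : topo.keySet()) {
            if (chosen.size() == wanted) {
                break;
            }
            if (excluded.contains(dn)) {
                continue;
            }
            chosen.add(dn);
            for (Map.Entry<String, String> e : topo.entrySet()) {
                if (e.getValue().equals(topo.get(dn))) {
                    excluded.add(e.getKey());
                }
            }
        }
        return chosen;
    }

    // Two-attempt placement. With keepFirstAttemptExclusions == false the
    // retry starts from the pre-attempt copy plus the results, as in the
    // report. With true, the retry reuses the exclusions from attempt one.
    static List<String> place(int wanted, boolean keepFirstAttemptExclusions) {
        Map<String, String> topo = topology();
        Set<String> excluded = new HashSet<>();
        Set<String> oldExcluded = new HashSet<>(excluded); // copy before attempt one
        List<String> results = new ArrayList<>(choose(topo, wanted, excluded));
        if (results.size() < wanted) {
            Set<String> retryExcluded =
                keepFirstAttemptExclusions ? excluded : oldExcluded;
            retryExcluded.addAll(results);
            results.addAll(choose(topo, wanted - results.size(), retryExcluded));
        }
        return results;
    }
}
```

Calling NodeGroupExclusionSketch.place(3, false) returns dn1, dn3 and dn2, so two replicas land in node group /d1/r1/n1. Calling place(3, true) returns only dn1 and dn3, because both node groups are already fully excluded when the retry starts.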
Quick repro:
- Add
  CONF.setBoolean(DFSConfigKeys.DFS_NAMENODE_AVOID_STALE_DATANODE_FOR_WRITE_KEY, true);
  to TestReplicationPolicyWithNodeGroup.
- testChooseMoreTargetsThanNodeGroups() should then fail.