[ https://issues.apache.org/jira/browse/HDFS-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987312#comment-13987312 ]
Tsz Wo Nicholas Sze commented on HDFS-5168:
-------------------------------------------

The failure of TestBalancerWithNodeGroup was not related. It also failed in some other builds, e.g., [build #6780|https://builds.apache.org/job/PreCommit-HDFS-Build/6780/testReport/junit/org.apache.hadoop.hdfs.server.balancer/TestBalancerWithNodeGroup/testBalancerWithRackLocality/], and on my local machine without the patch.

The javadoc warning was due to the new ScriptBasedMapping constructor: "@param argument "conf" is not a parameter name."

Nikola, could you upload a patch to fix the javadoc warning?

> BlockPlacementPolicy does not work for cross node group dependencies
> --------------------------------------------------------------------
>
>                 Key: HDFS-5168
>                 URL: https://issues.apache.org/jira/browse/HDFS-5168
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Nikola Vujic
>            Assignee: Nikola Vujic
>            Priority: Critical
>         Attachments: HDFS-5168.patch, HDFS-5168.patch, HDFS-5168.patch,
> HDFS-5168.patch, HDFS-5168.patch, HDFS-5168.patch, HDFS-5168.patch,
> HDFS-5168.patch
>
>
> Block placement policies do not work for cross rack/node group dependencies.
> This matters when compute servers and storage fall into two independent
> fault domains: in that case neither BlockPlacementPolicyDefault nor
> BlockPlacementPolicyWithNodeGroup can provide proper block placement.
> Let's suppose that we have a Hadoop cluster with one rack with two servers,
> and we run 2 VMs per server. The node group topology for this cluster would be:
> server1-vm1 -> /d1/r1/n1
> server1-vm2 -> /d1/r1/n1
> server2-vm1 -> /d1/r1/n2
> server2-vm2 -> /d1/r1/n2
> This works fine as long as server and storage fall into the same fault
> domain, but if storage is in a different fault domain from the server, we
> cannot handle that. For example, if the storage of server1-vm1 is in the
> same fault domain as the storage of server2-vm1, then we must not place two
> replicas on these two nodes, although they are in different node groups.
> Two possible approaches:
> - One approach would be to define cross rack/node group dependencies and to
> use them when excluding nodes from the search space. This looks like the
> cleanest way to fix this, as it requires only minor changes in the
> BlockPlacementPolicy classes.
> - The other approach would be to allow nodes to fall into more than one node
> group. When we choose a node to hold a replica, we have to exclude from the
> search space all nodes from the node groups to which the chosen node belongs.
> This approach may require major changes in NetworkTopology.
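
The first approach in the description (declaring cross node group dependencies and excluding dependent nodes from the search space) can be illustrated with a small sketch. This is not the code from the attached patch and does not use Hadoop's API. The names `NODE_GROUP`, `DEPENDENCIES`, `excluded_after_choosing`, and `choose_targets` are hypothetical, the dependency between server1-vm1 and server2-vm1 is taken from the example in the description, and candidates are scanned in sorted order only to keep the result deterministic (a real placement policy picks candidates randomly).

```python
from typing import Dict, List, Set

# Node -> node group, copied from the topology in the issue description.
NODE_GROUP: Dict[str, str] = {
    "server1-vm1": "/d1/r1/n1",
    "server1-vm2": "/d1/r1/n1",
    "server2-vm1": "/d1/r1/n2",
    "server2-vm2": "/d1/r1/n2",
}

# Hypothetical dependency map: nodes whose storage shares a fault domain.
# Assumption for this sketch: server1-vm1 and server2-vm1 share storage.
DEPENDENCIES: Dict[str, Set[str]] = {
    "server1-vm1": {"server2-vm1"},
    "server2-vm1": {"server1-vm1"},
}


def excluded_after_choosing(chosen: str) -> Set[str]:
    """Nodes that must leave the search space once `chosen` holds a replica."""
    # Existing node group rule: exclude every node in the same node group.
    same_group = {n for n, g in NODE_GROUP.items() if g == NODE_GROUP[chosen]}
    # Additional rule: exclude nodes declared as dependent on the chosen node.
    return same_group | DEPENDENCIES.get(chosen, set())


def choose_targets(replicas: int) -> List[str]:
    """Pick up to `replicas` nodes, shrinking the search space after each pick."""
    chosen: List[str] = []
    excluded: Set[str] = set()
    for node in sorted(NODE_GROUP):  # deterministic order, for illustration only
        if len(chosen) == replicas:
            break
        if node in excluded:
            continue
        chosen.append(node)
        excluded |= excluded_after_choosing(node)
    return chosen
```

In this sketch, choosing server1-vm1 removes server1-vm2 (same node group) and server2-vm1 (declared dependency), so the second replica can only go to server2-vm2. Without the dependency map, server2-vm1 would remain a valid target, which is the failure the issue describes.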