[ 
https://issues.apache.org/jira/browse/HDFS-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951189#comment-13951189
 ] 

Nikola Vujic commented on HDFS-5168:
------------------------------------

Hi [~djp],

This task has finally reached the top of my list. I have developed and 
submitted a demo patch.

Here is a short description of the patch:

All of the logic for excluding the appropriate nodes from the replica 
placement search space is in the {{addDependentNodesToExcludedNodes}} method 
of the {{BlockPlacementPolicyWithNodeGroup}} class. The 
{{addDependentNodesToExcludedNodes}} method is called from the 
{{addToExcludedNodes}} method and adjusts the {{excludedNodes}} set by 
adding the nodes that are dependent on the {{chosenNode}}.
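
To illustrate the exclusion step, here is a minimal sketch. It is not taken 
from the patch: node identities are plain strings, the dependency table is 
assumed to be already populated, and every name other than 
{{addDependentNodesToExcludedNodes}}, {{chosenNode}} and {{excludedNodes}} is 
hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the placement-policy logic; not the patch code.
final class DependencyExclusionSketch {
    // Host name -> host names that share a fault domain with it
    // (assumed to have been discovered earlier).
    private final Map<String, List<String>> dependencies = new HashMap<>();

    void setDependencies(String hostName, List<String> dependentHostNames) {
        dependencies.put(hostName, dependentHostNames);
    }

    // Adds every node that depends on chosenNode to excludedNodes and
    // returns the number of nodes that were newly excluded.
    int addDependentNodesToExcludedNodes(String chosenNode,
                                         Set<String> excludedNodes) {
        List<String> dependents = dependencies.get(chosenNode);
        if (dependents == null) {
            dependents = Collections.emptyList();
        }
        int added = 0;
        for (String dependent : dependents) {
            if (excludedNodes.add(dependent)) {
                added++;
            }
        }
        return added;
    }
}
```

With the example from the issue description, choosing server1-vm1 would also 
put server2-vm1 into the excluded set if the two share a storage fault domain.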

The list of dependent nodes for a given node is now exposed via the 
{{DNSToSwitchMapping}} interface, which is extended with a {{public 
List<String> getDependency(String name);}} method for this purpose. The 
{{getDependency}} method takes a host name or IP address as its argument and 
returns a list of host names. I chose host names to define dependencies in 
order to avoid inconsistency on systems where multiple IP addresses are used 
per host. Also, in some systems the network can fail over from one NIC to 
another, and we would then have to rediscover the dependencies because the IP 
address changed; otherwise, the dependencies would be left in an inconsistent 
state. Using host names solves this problem.
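
As a rough sketch of the contract, the following shows only the new method on 
a simplified stand-in interface, plus an in-memory implementation. The 
{{DependencyAwareMapping}} and {{InMemoryDependencyMapping}} names are 
hypothetical and this is not the real {{DNSToSwitchMapping}} interface; 
returning an empty list for an unknown host is also an assumption.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in: only the method added for dependencies is shown.
interface DependencyAwareMapping {
    // Takes a host name or IP address, returns dependent host names.
    List<String> getDependency(String name);
}

// Hypothetical table-backed implementation, for illustration only.
final class InMemoryDependencyMapping implements DependencyAwareMapping {
    private final Map<String, List<String>> table = new HashMap<>();

    void put(String name, List<String> dependentHostNames) {
        table.put(name, dependentHostNames);
    }

    @Override
    public List<String> getDependency(String name) {
        List<String> dependents = table.get(name);
        // Assumption: an unknown host has no dependencies.
        return dependents == null
            ? Collections.<String>emptyList()
            : dependents;
    }
}
```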

Dependencies are discovered during node registration and kept in 
{{dependentHostNames}} in the {{DatanodeInfo}} class. The {{Host2NodesMap}} 
class is extended with a {{mapHost}} hash map, which maps host names to IP 
addresses. It is used in the {{BlockPlacementPolicyWithNodeGroup}} class when 
we search for the dependent nodes by host name.
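
The host name lookup could be sketched as below. Only the {{mapHost}} name 
comes from the patch; the class, its methods and the choice to skip 
unregistered host names are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a host name -> IP address index.
final class HostNameIndexSketch {
    private final Map<String, String> mapHost = new HashMap<>();

    // Called when a node registers.
    void add(String hostName, String ipAddr) {
        mapHost.put(hostName, ipAddr);
    }

    // Called when a node is removed.
    void remove(String hostName) {
        mapHost.remove(hostName);
    }

    // Returns the IP address for hostName, or null if it is not registered.
    String getIpByHostName(String hostName) {
        return mapHost.get(hostName);
    }

    // Resolves dependent host names to IP addresses, skipping hosts
    // that are not currently registered.
    List<String> resolve(List<String> dependentHostNames) {
        List<String> ips = new ArrayList<>();
        for (String hostName : dependentHostNames) {
            String ip = mapHost.get(hostName);
            if (ip != null) {
                ips.add(ip);
            }
        }
        return ips;
    }
}
```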

This demo patch implements {{getDependency}} only for {{ScriptBasedMapping}}. 
{{net.topology.dependency.script.file.name}} is a new configuration property 
whose value is the path to the script used to get the dependencies.

If this approach is fine, I will implement {{getDependency}} for the 
{{TableMapping}} and {{StaticMapping}} classes as well.


Thanks,
Nikola.

> BlockPlacementPolicy does not work for cross node group dependencies
> --------------------------------------------------------------------
>
>                 Key: HDFS-5168
>                 URL: https://issues.apache.org/jira/browse/HDFS-5168
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Nikola Vujic
>            Assignee: Nikola Vujic
>            Priority: Critical
>         Attachments: HDFS-5168.patch
>
>
> Block placement policies do not work for cross rack/node group dependencies. 
> In practice this is needed when compute servers and storage fall into two 
> independent fault domains; in that case neither BlockPlacementPolicyDefault 
> nor BlockPlacementPolicyWithNodeGroup is able to provide proper block 
> placement.
> Let's suppose that we have a Hadoop cluster with one rack with two servers, 
> and we run 2 VMs per server. The node group topology for this cluster would be:
>  server1-vm1 -> /d1/r1/n1
>  server1-vm2 -> /d1/r1/n1
>  server2-vm1 -> /d1/r1/n2
>  server2-vm2 -> /d1/r1/n2
> This works fine as long as server and storage fall into the same fault 
> domain, but if the storage is in a different fault domain from the server, we 
> will not be able to handle that. For example, if the storage of server1-vm1 
> is in the same fault domain as the storage of server2-vm1, then we must not 
> place two replicas on these two nodes even though they are in different node 
> groups.
> Two possible approaches:
> - One approach would be to define cross rack/node group dependencies and to 
> use them when excluding nodes from the search space. This looks like the 
> cleanest way to fix this, as it requires only minor changes in the 
> BlockPlacementPolicy classes.
> - The other approach would be to allow nodes to fall into more than one node 
> group. When we choose a node to hold a replica, we have to exclude from the 
> search space all nodes from the node groups to which the chosen node belongs. 
> This approach may require major changes in the NetworkTopology.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
