[ https://issues.apache.org/jira/browse/MAPREDUCE-4168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261753#comment-14261753 ]
Allen Wittenauer commented on MAPREDUCE-4168:
---------------------------------------------

Why would one use multiple NICs?

The easy and obvious reason is security. It's an extremely desirable config to have compute nodes be 'outbound only'. If YARN is providing compute power to an automated system, there is no reason for the client to talk to anything other than the RM and maybe the proxy server. Input is fetched via some other system and output is pushed as the last part of the pipeline.

By the same token, in some networks there is a separate admin network that is used for operational processes. That network is trusted; the user-facing one is not.

Hadoop supports multiple file systems. There are many filesystem designs where it's reasonable to configure another NIC acting as a backhaul to the backup infrastructure. The last thing you'd want going across that pipe is user network traffic.

... and those are just the ones off the top of my head.

bq. Clients that would need RPC connectivity to compute nodes, would be within cluster network.

This seems to be too narrow a view of the potential operating environment. In other words, who says that the multiple NICs are there because Hadoop needs them? What if Hadoop is going into a DC that is brownfield or has other custom needs?

Of course, as pointed out above, it's trivial to come up with a realistic use case where clients only need RPC access to master nodes which are, unfortunately, the key problem bits.

> Support multiple network interfaces
> -----------------------------------
>
>                 Key: MAPREDUCE-4168
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4168
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Tom White
>
> Umbrella jira to track the MapReduce side of HADOOP-8198.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)