Hi, I am trying to set up a Hadoop cluster (hadoop-0.20.2) on a set of machines, each of which has two network interfaces: a control interface and an internal interface. I want Hadoop to use only the internal interface, so that all Hadoop control and data traffic goes over it. I set dfs.datanode.dns.interface in hdfs-site.xml and mapred.tasktracker.dns.interface in mapred-site.xml to the internal interface on each machine in my cluster. Even so, communication still happens on the control interface: a tcpdump shows the nodes' control interfaces being used to transfer data during the shuffle phase.
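For reference, the two settings look roughly like the sketch below on each node. The interface name eth1 is only a placeholder for the internal interface (an assumption for illustration; the actual name differs per machine), and the first property goes in hdfs-site.xml while the second goes in mapred-site.xml.

```xml
<!-- hdfs-site.xml: interface the datanode reports its address from.
     "eth1" is a placeholder for the internal interface name. -->
<property>
  <name>dfs.datanode.dns.interface</name>
  <value>eth1</value>
</property>

<!-- mapred-site.xml: interface the tasktracker reports its address from.
     "eth1" is again a placeholder for the internal interface name. -->
<property>
  <name>mapred.tasktracker.dns.interface</name>
  <value>eth1</value>
</property>
```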
How can I make sure that all data exchanged between the slaves in my cluster goes through the internal interface rather than the control interface? Any help would be appreciated. Thanks, Virajith