Author: arp
Date: Tue Jul 22 20:28:57 2014
New Revision: 1612695

URL: http://svn.apache.org/r1612695
Log:
HDFS-6712. Document HDFS Multihoming Settings. (Contributed by Arpit Agarwal)
Added:
    hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsMultihoming.apt.vm
Modified:
    hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

Modified: hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt?rev=1612695&r1=1612694&r2=1612695&view=diff
==============================================================================
--- hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt (original)
+++ hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Tue Jul 22 20:28:57 2014
@@ -602,6 +602,8 @@ Release 2.5.0 - UNRELEASED
     HDFS-6680. BlockPlacementPolicyDefault does not choose favored nodes
     correctly. (szetszwo)
 
+    HDFS-6712. Document HDFS Multihoming Settings. (Arpit Agarwal)
+
   OPTIMIZATIONS
 
     HDFS-6214. Webhdfs has poor throughput for files >2GB (daryn)

Added: hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsMultihoming.apt.vm
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsMultihoming.apt.vm?rev=1612695&view=auto
==============================================================================
--- hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsMultihoming.apt.vm (added)
+++ hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsMultihoming.apt.vm Tue Jul 22 20:28:57 2014
@@ -0,0 +1,145 @@
+~~ Licensed under the Apache License, Version 2.0 (the "License");
+~~ you may not use this file except in compliance with the License.
+~~ You may obtain a copy of the License at
+~~
+~~   http://www.apache.org/licenses/LICENSE-2.0
+~~
+~~ Unless required by applicable law or agreed to in writing, software
+~~ distributed under the License is distributed on an "AS IS" BASIS,
+~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+~~ See the License for the specific language governing permissions and
+~~ limitations under the License. See accompanying LICENSE file.
+
+  ---
+  Hadoop Distributed File System-${project.version} - Support for Multi-Homed Networks
+  ---
+  ---
+  ${maven.build.timestamp}
+
+HDFS Support for Multihomed Networks
+
+  This document is targeted at cluster administrators deploying <<<HDFS>>> in
+  multihomed networks. Similar support for <<<YARN>>>/<<<MapReduce>>> is
+  work in progress and will be documented when available.
+
+%{toc|section=1|fromDepth=0}
+
+* Multihoming Background
+
+  In multihomed networks the cluster nodes are connected to more than one
+  network interface. There could be multiple reasons for doing so.
+
+  [[1]] <<Security>>: Security requirements may dictate that intra-cluster
+  traffic be confined to a different network than the network used to
+  transfer data in and out of the cluster.
+
+  [[2]] <<Performance>>: Intra-cluster traffic may use one or more high-bandwidth
+  interconnects like Fibre Channel, Infiniband or 10GbE.
+
+  [[3]] <<Failover/Redundancy>>: The nodes may have multiple network adapters
+  connected to a single network to handle network adapter failure.
+
+
+  Note that NIC Bonding (also known as NIC Teaming or Link
+  Aggregation) is a related but separate topic. The following settings
+  are usually not applicable to a NIC bonding configuration, which handles
+  multiplexing and failover transparently while presenting a single 'logical
+  network' to applications.
+
+* Fixing Hadoop Issues In Multihomed Environments
+
+** Ensuring HDFS Daemons Bind All Interfaces
+
+  By default <<<HDFS>>> endpoints are specified as either hostnames or IP
+  addresses. In either case <<<HDFS>>> daemons will bind to a single IP
+  address, making the daemons unreachable from other networks.
+
+  The solution is to have separate settings for server endpoints that force
+  binding to the wildcard IP address <<<INADDR_ANY>>> i.e. <<<0.0.0.0>>>. Do
+  NOT supply a port number with any of these settings.
+
+----
+<property>
+  <name>dfs.namenode.rpc-bind-host</name>
+  <value>0.0.0.0</value>
+  <description>
+    The actual address the RPC server will bind to. If this optional address is
+    set, it overrides only the hostname portion of dfs.namenode.rpc-address.
+    It can also be specified per name node or name service for HA/Federation.
+    This is useful for making the name node listen on all interfaces by
+    setting it to 0.0.0.0.
+  </description>
+</property>
+
+<property>
+  <name>dfs.namenode.servicerpc-bind-host</name>
+  <value>0.0.0.0</value>
+  <description>
+    The actual address the service RPC server will bind to. If this optional
+    address is set, it overrides only the hostname portion of
+    dfs.namenode.servicerpc-address.
+    It can also be specified per name node or name service for HA/Federation.
+    This is useful for making the name node listen on all interfaces by
+    setting it to 0.0.0.0.
+  </description>
+</property>
+
+<property>
+  <name>dfs.namenode.http-bind-host</name>
+  <value>0.0.0.0</value>
+  <description>
+    The actual address the HTTP server will bind to. If this optional address
+    is set, it overrides only the hostname portion of dfs.namenode.http-address.
+    It can also be specified per name node or name service for HA/Federation.
+    This is useful for making the name node HTTP server listen on all
+    interfaces by setting it to 0.0.0.0.
+  </description>
+</property>
+
+<property>
+  <name>dfs.namenode.https-bind-host</name>
+  <value>0.0.0.0</value>
+  <description>
+    The actual address the HTTPS server will bind to. If this optional address
+    is set, it overrides only the hostname portion of dfs.namenode.https-address.
+    It can also be specified per name node or name service for HA/Federation.
+    This is useful for making the name node HTTPS server listen on all
+    interfaces by setting it to 0.0.0.0.
+  </description>
+</property>
+----
+
+** Clients use Hostnames when connecting to DataNodes
+
+  By default <<<HDFS>>> clients connect to DataNodes using the IP address
+  provided by the NameNode. Depending on the network configuration this
+  IP address may be unreachable by the clients. The fix is to let clients
+  perform their own DNS resolution of the DataNode hostname. The following
+  setting enables this behavior.
+
+----
+<property>
+  <name>dfs.client.use.datanode.hostname</name>
+  <value>true</value>
+  <description>Whether clients should use datanode hostnames when
+    connecting to datanodes.
+  </description>
+</property>
+----
+
+** DataNodes use Hostnames when connecting to other DataNodes
+
+  Rarely, the NameNode-resolved IP address for a DataNode may be unreachable
+  from other DataNodes. The fix is to force DataNodes to perform their own
+  DNS resolution for inter-DataNode connections. The following setting enables
+  this behavior.
+
+----
+<property>
+  <name>dfs.datanode.use.datanode.hostname</name>
+  <value>true</value>
+  <description>Whether datanodes should use datanode hostnames when
+    connecting to other datanodes for data transfer.
+  </description>
+</property>
+----
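
For reference, below is a minimal sketch of how the client-side setting documented
above (dfs.client.use.datanode.hostname) could also be applied programmatically
through the Hadoop Configuration API rather than via hdfs-site.xml. The class name,
NameNode URI and path used here are illustrative assumptions, not part of the
patch; only the property name comes from the documentation above.

----
// Illustrative sketch only: apply the client-side multihoming setting from
// this document programmatically. The NameNode URI and the path checked
// below are placeholder assumptions.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.net.URI;

public class MultihomedHdfsClient {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Ask the client to resolve DataNode hostnames itself instead of using
    // the IP addresses reported by the NameNode (see the section above).
    conf.setBoolean("dfs.client.use.datanode.hostname", true);

    // Placeholder NameNode URI; in practice this comes from fs.defaultFS.
    try (FileSystem fs =
        FileSystem.get(URI.create("hdfs://namenode.example.com:8020"), conf)) {
      System.out.println("Root exists: " + fs.exists(new Path("/")));
    }
  }
}
----

Note that the server-side bind-host settings described earlier must be present in
the NameNode's own hdfs-site.xml before the daemon starts; they cannot be applied
from a client in this way.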