[ 
https://issues.apache.org/jira/browse/HDFS-1875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027895#comment-13027895
 ] 

Tanping Wang commented on HDFS-1875:
------------------------------------

I like this idea.  It would be really useful if we could have multiple 
simulated data nodes bound to different hosts and the DFS client bound to a 
particular host.  Further down the road, we could have some of the simulated 
data nodes on different hosts but in the same rack.  We could use this to test 
issues related to network topology distance.

One related problem that I ran into is that the order of data nodes in a 
LocatedBlock returned by the name node is determined by 
NetworkTopology#pseudoSortByDistance().  In the current MiniDFSCluster, there 
is no way to bind the client to a host or to bind a simulated data node to a 
particular host/rack.  It would be nice if MiniDFSCluster made this possible, 
so that the network topology distance from the client to each data node is 
fixed and, therefore, the order of data nodes returned within a LocatedBlock 
on MiniDFSCluster is fixed as well.  Currently the order of data nodes in a 
LocatedBlock is effectively random, because NetworkTopology does not see the 
DFSClient and the simulated data nodes as being on different hosts and 
different racks. 
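
To illustrate why a fixed client/data node placement gives a fixed order, here 
is a minimal, self-contained sketch.  It does not use the real NetworkTopology 
class: the Node type, distance() and sortByDistance() are hypothetical 
stand-ins, the 0/2/4 distance values are an assumed convention (same host, same 
rack, different rack), and a plain stable sort is used in place of the partial 
reordering that pseudoSortByDistance() performs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class TopologyDistanceSketch {
    // Hypothetical stand-in for a node location: a rack path plus a host name.
    static final class Node {
        final String rack;
        final String host;

        Node(String rack, String host) {
            this.rack = rack;
            this.host = host;
        }

        @Override
        public String toString() {
            return rack + "/" + host;
        }
    }

    // Assumed convention: 0 = same host, 2 = same rack, 4 = different rack.
    static int distance(Node a, Node b) {
        boolean sameRack = a.rack.equals(b.rack);
        if (sameRack && a.host.equals(b.host)) {
            return 0;
        }
        return sameRack ? 2 : 4;
    }

    // Returns a copy of the replicas ordered by distance from the client.
    // List.sort is stable, so equidistant replicas keep their input order.
    static List<Node> sortByDistance(Node client, List<Node> replicas) {
        List<Node> sorted = new ArrayList<>(replicas);
        sorted.sort(Comparator.comparingInt(n -> distance(client, n)));
        return sorted;
    }

    public static void main(String[] args) {
        Node client = new Node("/rack1", "hostA");
        List<Node> replicas = Arrays.asList(
                new Node("/rack2", "hostC"),
                new Node("/rack1", "hostB"),
                new Node("/rack1", "hostA"));
        // prints [/rack1/hostA, /rack1/hostB, /rack2/hostC]
        System.out.println(sortByDistance(client, replicas));
    }
}
```

When every simulated data node and the client share one host and rack, all 
distances in this sketch are equal, so the distance carries no ordering 
information at all.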

Also, the Mini DFS client currently provides the -racks option when starting 
data nodes.  But we cannot bind multiple simulated data nodes to one rack, so 
it is not really that useful.

> MiniDFSCluster hard-codes dfs.datanode.address to localhost
> -----------------------------------------------------------
>
>                 Key: HDFS-1875
>                 URL: https://issues.apache.org/jira/browse/HDFS-1875
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 0.22.0
>            Reporter: Eric Payne
>            Assignee: Eric Payne
>             Fix For: 0.23.0
>
>
> When creating RPC addresses that represent the communication sockets for each 
> simulated DataNode, the MiniDFSCluster class hard-codes the address of the 
> dfs.datanode.address port to be "127.0.0.1:0".
> The DataNodeCluster test tool uses the MiniDFSCluster class to create a 
> selected number of simulated datanodes on a single host. In the 
> DataNodeCluster setup, the NameNode is not simulated but is started as a 
> separate daemon.
> The problem is that if the write requests into the simulated datanodes 
> originate on a host that is not the same host running the simulated 
> datanodes, the connections are refused. This is because the RPC sockets that 
> are started by MiniDFSCluster are for "localhost" (127.0.0.1) and are not 
> accessible from outside that same machine.
> It is proposed that the MiniDFSCluster.setupDatanodeAddress() method be 
> overloaded in order to accommodate an environment where the NameNode is on 
> one host, the client is on another host, and the simulated DataNodes are on 
> yet another host (or even multiple hosts simulating multiple DataNodes each).
> The overloaded API would add a parameter that would be used as the basis for 
> creating the RPC sockets. By default, it would remain 127.0.0.1.
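
The overload proposed in the description above could look roughly like the 
following sketch.  This is not the MiniDFSCluster code: a plain Map stands in 
for the Hadoop configuration object, and the class name, method signatures and 
the bindHost parameter name are assumptions made for illustration.  Only the 
dfs.datanode.address key and the 127.0.0.1 default come from the description.

```java
import java.util.HashMap;
import java.util.Map;

public class DatanodeAddressSketch {
    // Default taken from the issue description: loopback only.
    static final String DEFAULT_BIND_HOST = "127.0.0.1";

    // Existing behaviour in this sketch: always bind to the loopback address.
    static void setupDatanodeAddress(Map<String, String> conf) {
        setupDatanodeAddress(conf, DEFAULT_BIND_HOST);
    }

    // Hypothetical overload: the caller chooses the host part of the address.
    // Port 0 asks the operating system for any free port.
    static void setupDatanodeAddress(Map<String, String> conf, String bindHost) {
        conf.put("dfs.datanode.address", bindHost + ":0");
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        setupDatanodeAddress(conf);
        System.out.println(conf.get("dfs.datanode.address"));
        // An address other than loopback lets remote clients connect.
        setupDatanodeAddress(conf, "0.0.0.0");
        System.out.println(conf.get("dfs.datanode.address"));
    }
}
```

Keeping the one-argument form delegating to the new one means callers that do 
not pass a host keep the loopback default.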

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
