Enhance MiniDFSCluster to improve testing of network topology distance related issues.
--------------------------------------------------------------------------------------
Key: HDFS-1962
URL: https://issues.apache.org/jira/browse/HDFS-1962
Project: Hadoop HDFS
Issue Type: Improvement
Components: test
Affects Versions: 0.22.0
Reporter: Eric Payne
Fix For: 0.23.0

In Jira HDFS-1875, Tanping Wang added the following comment. In order to keep the scope of HDFS-1875 small, I have created this Jira to capture this need.

-------------------------------------------------
It would be really useful if we could have multiple simulated data nodes bound to different hosts and the dfs client bound to a particular host. And further down the road, some of the simulated data nodes could be on different hosts but in the same rack. We can use this to test network topology distance related issues.

One of the related problems that I ran into was that the order of data nodes in a LocatedBlock returned by the name node is sorted by NetworkTopology#pseudoSortByDistance(). In the current Mini dfs cluster, there is no way I can bind the client to a host or bind a simulated data node to a particular host/rack. It would be nice if the mini dfs cluster could make this possible, so that the network topology distance from the client to each data node is fixed. The order of data nodes returned within a LocatedBlock on the MiniDFS cluster would then be fixed as well.

Currently the order of data nodes in a LocatedBlock is randomly sorted, which means NetworkTopology does not see the DFSClient and the simulated data nodes as being on different hosts and different racks.

Also, the Mini DFS client currently provides the -racks option when starting data nodes. But we cannot bind multiple simulated data nodes to one rack, so it is not really that useful.
-------------------------------------------------
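To make the request concrete, here is a minimal, self-contained Java sketch of why fixed host/rack placement gives a fixed data node order. It does not use any Hadoop classes. The class name TopologyDistanceSketch, its methods, and the "/rack/host" location strings are hypothetical, chosen only for illustration. The sketch assumes distance is counted as the number of hops from each node up to their closest common ancestor (0 for the same node, 2 for the same rack, 4 for different racks), and it does a plain stable sort by that distance, which is a simplification and not the actual logic of NetworkTopology#pseudoSortByDistance().

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not part of Hadoop or MiniDFSCluster.
class TopologyDistanceSketch {

    // Distance between two locations written as "/rack/host":
    // hops from each node up to their closest common ancestor, summed.
    static int distance(String a, String b) {
        String[] pa = a.split("/");
        String[] pb = b.split("/");
        int common = 0;
        while (common < pa.length && common < pb.length
                && pa[common].equals(pb[common])) {
            common++;
        }
        return (pa.length - common) + (pb.length - common);
    }

    // Returns a copy of the data node list ordered by distance to the client.
    // List.sort is stable, so nodes at equal distance keep their input order.
    static List<String> sortByDistance(String client, List<String> dataNodes) {
        List<String> sorted = new ArrayList<>(dataNodes);
        sorted.sort(Comparator.comparingInt((String dn) -> distance(client, dn)));
        return sorted;
    }

    public static void main(String[] args) {
        // Assumed placement: client shares a host with dn1 and a rack with dn2.
        String client = "/rack1/dn1";
        List<String> nodes = List.of("/rack2/dn3", "/rack1/dn2", "/rack1/dn1");
        // Prints [/rack1/dn1, /rack1/dn2, /rack2/dn3]
        System.out.println(sortByDistance(client, nodes));
    }
}
```

With every node pinned to a known rack and host, the distances are deterministic and so is the resulting order. If all nodes and the client resolve to the same location, every distance is equal and the order carries no topology information, which is the situation the comment above describes.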