[ https://issues.apache.org/jira/browse/HDDS-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974047#comment-16974047 ]
Sammi Chen edited comment on HDDS-2249 at 11/14/19 8:47 AM:
------------------------------------------------------------

Thanks [~swagle] for reporting this. One idea that comes to mind: how about using hostname:port as the key in dnsToUuidMap? If it works, it might solve this issue.


was (Author: sammi):
Thanks [~swagle] for reporting this. One idea that comes to mind: how about using hostname:port as the key in dnsToUuidMap? If it works, will it solve this issue?


> SortDatanodes does not return correct orders when many DNs on a given host
> --------------------------------------------------------------------------
>
>                 Key: HDDS-2249
>                 URL: https://issues.apache.org/jira/browse/HDDS-2249
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: SCM
>    Affects Versions: 0.5.0
>            Reporter: Stephen O'Donnell
>            Priority: Major
>
> In HDDS-2199 ScmNodeManager.getNodeByAddress() was changed to return a list
> of nodes rather than a single entry, to handle the case where many datanodes
> are running on the same host.
> In SCMBlockProtocol.sortDatanodes(), it uses the results returned from
> getNodesByAddress to determine if the client submitting the request is
> running on a cluster node, and if it is, it attempts to sort the datanodes by
> distance from the client machine.
> To do this, the code currently takes the first DatanodeDetails object
> returned by getNodesByAddress and then compares it with the other passed-in
> nodes. If any of the passed nodes are equal to the client node (based on the
> Java object ID) it returns a zero distance, otherwise the distance is
> calculated.
> The sort is performed in NetworkTopologyImpl.sortByDistanceCost() which later
> calls NetworkTopologyImpl.getDistanceCost() which is where the object
> comparison is performed:
> {code}
> if ((node1 != null && node2 != null && node1.equals(node2)) ||
>     (node1 == null && node2 == null)) {
>   return 0;
> }
> {code}
> This does not always work when there are many datanodes on the same host, as
> the first node returned from getNodesByAddress() is guaranteed to be on the
> same host as the client, but the list of passed datanodes may not include
> that datanode instance.
> To fix this, we should probably have getDistanceCost() compare hostnames or
> IPs as a second check, or instead of the object equality; however, this is not
> trivial to implement.
> The reason is that getDistanceCost() takes Node objects (not
> DatanodeDetails) and a Node does not have an IP or hostname field. It does
> have a getNetworkName method, which should return the hostname, but it is
> overwritten by the host's UUID when it registers with the node manager, by this
> line in NodeManager.register():
> datanodeDetails.setNetworkName(datanodeDetails.getUuidString());
>
> Note this only affects test clusters where many DNs are on a single host, and
> it does not cause any failures. The DNs may be returned in a less than ideal
> order.
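
To make the failure mode in the description concrete, here is a minimal, self-contained sketch. It is not the Ozone code: SameHostSortSketch, Dn, REMOTE_COST, costByEquality, costWithHostFallback and sortByCost are all hypothetical names invented for illustration, and the constant cost stands in for the real topology-based distance. It only shows why an equality-only check misses a datanode that shares the client's host, and how a hostname comparison as a second check would change the order. The sketch sidesteps the difficulty noted above (a Node has no hostname field) by giving its stand-in type a hostname directly.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch; none of these types are the real Ozone classes. */
final class SameHostSortSketch {

  /** Placeholder for "some non-zero distance"; the real cost comes from the topology. */
  static final int REMOTE_COST = 2;

  /** Stand-in for a datanode: a UUID-like id plus the host it runs on. */
  static final class Dn {
    final String uuid;
    final String hostname;

    Dn(String uuid, String hostname) {
      this.uuid = uuid;
      this.hostname = hostname;
    }

    @Override
    public boolean equals(Object o) {
      // Assumption: equality is per datanode instance (by id), not per host.
      return o instanceof Dn && uuid.equals(((Dn) o).uuid);
    }

    @Override
    public int hashCode() {
      return uuid.hashCode();
    }
  }

  /** Mirrors the equality-only check quoted in the description. */
  static int costByEquality(Dn client, Dn node) {
    if ((client != null && node != null && client.equals(node))
        || (client == null && node == null)) {
      return 0;
    }
    return REMOTE_COST;
  }

  /** Same check, with a hostname comparison added as a second test. */
  static int costWithHostFallback(Dn client, Dn node) {
    if (costByEquality(client, node) == 0) {
      return 0;
    }
    if (client != null && node != null && client.hostname.equals(node.hostname)) {
      return 0;
    }
    return REMOTE_COST;
  }

  /** Returns a sorted copy; List.sort is stable, so equal costs keep their input order. */
  static List<Dn> sortByCost(Dn client, List<Dn> nodes, boolean hostFallback) {
    List<Dn> sorted = new ArrayList<>(nodes);
    sorted.sort(Comparator.comparingInt((Dn n) ->
        hostFallback ? costWithHostFallback(client, n) : costByEquality(client, n)));
    return sorted;
  }
}
```

For example, suppose dn1 and dn2 both run on host-a, dn3 runs on host-b, the client lookup picks dn1 as the first entry, and the list to sort is [dn3, dn2]. With the equality-only cost, neither entry equals dn1, so the order is unchanged; with the hostname fallback, dn2 sorts first.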