[ https://issues.apache.org/jira/browse/HDFS-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15673673#comment-15673673 ]
Nandakumar commented on HDFS-10206:
-----------------------------------

Thanks for the review [~mingma].

{quote}
When the conditions {{reader.equals(node) & isOnSameRack(reader, node) }} aren't satisfied, this patch will cause extra string parsing. Wonder if there is any major performance impact. If that isn't an issue, can getDistanceUsingNetworkLocation handle all scenarios including {{reader.equals(node) & isOnSameRack(reader, node) }}?
{quote}
I was also worried about the performance impact of the extra string parsing. That is why {{getDistanceUsingNetworkLocation}} is called only when neither {{reader.equals(node)}} nor {{isOnSameRack(reader, node)}} is satisfied.

{quote}
It probably doesn't matter much. getWeight used to return 0, 1, 2, 3, etc. as network layer increases. With the patch it changes to 0, 1, 2, 4, etc.
{quote}
I didn't quite understand this point. Previously {{getWeight}} returned 0 for local, 1 for same rack and 2 for off rack. With this patch it returns 0 for local and 1 for same rack; after that the value is incremented by 1 for each level.

> getBlockLocations might not sort datanodes properly by distance
> ---------------------------------------------------------------
>
>                 Key: HDFS-10206
>                 URL: https://issues.apache.org/jira/browse/HDFS-10206
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Ming Ma
>            Assignee: Nandakumar
>         Attachments: HDFS-10206.000.patch
>
>
> If the DFSClient machine is not a datanode, but it shares its rack with some
> datanodes of the HDFS block requested, {{DatanodeManager#sortLocatedBlocks}}
> might not put the local-rack datanodes at the beginning of the sorted list.
> That is because the function didn't call {{networktopology.add(client);}} to
> properly set the node's parent node; something required by
> {{networktopology.sortByDistance}} to compute distance between two nodes in
> the same topology tree.
> Another issue with {{networktopology.sortByDistance}} is it only
> distinguishes local rack from remote rack, but it doesn't support general
> distance calculation to tell how remote the rack is.
> {noformat}
> NetworkTopology.java
>   protected int getWeight(Node reader, Node node) {
>     // 0 is local, 1 is same rack, 2 is off rack
>     // Start off by initializing to off rack
>     int weight = 2;
>     if (reader != null) {
>       if (reader.equals(node)) {
>         weight = 0;
>       } else if (isOnSameRack(reader, node)) {
>         weight = 1;
>       }
>     }
>     return weight;
>   }
> {noformat}
> HDFS-10203 has suggested moving the sorting from namenode to DFSClient to
> address another issue. Regardless of where we do the sorting, we still need
> to fix the issues outlined here.
> Note that BlockPlacementPolicyDefault shares the same NetworkTopology object
> used by DatanodeManager and requires Nodes stored in the topology to be
> {{DatanodeDescriptor}} for block placement. So we need to make sure we don't
> pollute the NetworkTopology if we plan to fix it on the server side.
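
For illustration only, here is a minimal sketch of one way to read the weighting discussed in this thread: 0 for the local node, 1 for the same rack, then one more for each level the reader's rack has to climb to reach a common ancestor. It is not the code from HDFS-10206.000.patch. The class {{WeightSketch}}, the type {{SimpleNode}} and the helper {{levelsToCommonAncestor}} are hypothetical names, and the sketch assumes network locations are slash-separated rack paths such as /d1/r1. As in the comment above, the string parsing happens only when the two cheap checks fail.

```java
import java.util.Objects;

// Hypothetical sketch; these are not Hadoop's NetworkTopology classes.
final class WeightSketch {

    /** Stand-in for a topology node: a host name plus a rack path such as "/d1/r1". */
    static final class SimpleNode {
        final String name;
        final String location;

        SimpleNode(String name, String location) {
            this.name = Objects.requireNonNull(name);
            this.location = Objects.requireNonNull(location);
        }
    }

    /** 0 is local, 1 is same rack, then +1 for each extra level up to the common ancestor. */
    static int getWeight(SimpleNode reader, SimpleNode node) {
        Objects.requireNonNull(reader);
        Objects.requireNonNull(node);
        boolean sameRack = reader.location.equals(node.location);
        if (sameRack && reader.name.equals(node.name)) {
            return 0; // local node
        }
        if (sameRack) {
            return 1; // same rack, no path parsing needed
        }
        // Only off-rack pairs pay for the string parsing.
        return 1 + levelsToCommonAncestor(reader.location, node.location);
    }

    /** Number of levels from the reader's rack up to the deepest ancestor shared with the node. */
    static int levelsToCommonAncestor(String readerLocation, String nodeLocation) {
        String[] a = readerLocation.split("/");
        String[] b = nodeLocation.split("/");
        int common = 0;
        while (common < a.length && common < b.length && a[common].equals(b[common])) {
            common++;
        }
        return a.length - common;
    }

    public static void main(String[] args) {
        SimpleNode reader = new SimpleNode("h1", "/d1/r1");
        System.out.println(getWeight(reader, new SimpleNode("h1", "/d1/r1"))); // 0
        System.out.println(getWeight(reader, new SimpleNode("h2", "/d1/r1"))); // 1
        System.out.println(getWeight(reader, new SimpleNode("h3", "/d1/r2"))); // 2
        System.out.println(getWeight(reader, new SimpleNode("h4", "/d2/r3"))); // 3
    }
}
```

Under these assumptions a rack in the same data center sorts ahead of a rack in a different one, which the 0/1/2 scheme in the quoted {{getWeight}} cannot express.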