[ https://issues.apache.org/jira/browse/HDFS-9149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14942031#comment-14942031 ]
He Tianyi commented on HDFS-9149:
---------------------------------

Perhaps we can take either an explicit approach or an implicit one.

The explicit approach makes the notion of IDC explicit, as part of the network location, just like {{NodeGroup}}. That may require extending {{NetworkTopology}}, because the default implementation does not imply anything about IDC, so its behavior may change. If we go this way, other components may also benefit from DC awareness. For example, one can add {{IsDCAware}} and {{getDataCenterOfNode}} to {{NetworkTopology}}, implement {{NetworkTopologyWithMultiDC}}, and then take the DC into account in {{BlockPlacementPolicy}} to achieve better locality.

The implicit approach suggests we are better off replacing {{getWeight}} with a distance function or a more elaborate weight function, for example one that counts the number of common ancestors. This does not imply anything about IDC and can be extended to whatever hierarchy is necessary. In cases where the reader is not part of the cluster, we still need to maintain the correct network location for potential readers (e.g. maintain a rack table for all hosts in every DC, and when dealing with an unknown host, give it a default location containing only the DC, judged by its IP address). The only remaining case is a reader whose location is something like {{/DC1}}. This won't be a problem either: such a reader has the same number of common ancestors with every datanode within {{DC1}} and none with datanodes in other DCs, so the outcome is still correct.

> Consider multi datacenter when sortByDistance
> ---------------------------------------------
>
>                 Key: HDFS-9149
>                 URL: https://issues.apache.org/jira/browse/HDFS-9149
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: He Xiaoqiao
>            Assignee: He Tianyi
>
> {{sortByDistance}} doesn't consider multiple datacenters when reading data, so
> data may be read via another datacenter when Hadoop is deployed across multiple IDCs.
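To make the implicit approach from the comment concrete, here is a minimal sketch of ranking replicas by the number of common ancestors (the length of the shared network-location prefix) instead of a fixed local/same-rack/off-rack weight. It is not Hadoop code: the class {{CommonAncestorSort}} and the methods {{commonAncestors}} and {{sortByCommonAncestors}} are hypothetical names, and locations are plain strings such as {{/DC1/rack1/dn1}} rather than {{Node}} objects from {{NetworkTopology}}. It also shows the {{/DC1}} reader case: every datanode in DC1 scores 1, every datanode elsewhere scores 0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical sketch of the "implicit" approach: rank datanodes by how many
 * leading network-location components they share with the reader.
 * Locations are plain strings like "/DC1/rack1/dn1", not Hadoop Node objects.
 */
class CommonAncestorSort {

    /** Splits "/DC1/rack1/dn1" into ["DC1", "rack1", "dn1"], dropping empty parts. */
    static String[] components(String path) {
        return Arrays.stream(path.split("/"))
                .filter(s -> !s.isEmpty())
                .toArray(String[]::new);
    }

    /** Number of leading path components the two locations have in common. */
    static int commonAncestors(String reader, String node) {
        String[] a = components(reader);
        String[] b = components(node);
        int n = Math.min(a.length, b.length);
        int i = 0;
        while (i < n && a[i].equals(b[i])) {
            i++;
        }
        return i;
    }

    /** Returns a copy of nodes, most closely related to the reader first. */
    static List<String> sortByCommonAncestors(String reader, List<String> nodes) {
        List<String> sorted = new ArrayList<>(nodes);
        // ArrayList.sort is stable, so nodes with equal scores keep their input order.
        sorted.sort(Comparator
                .comparingInt((String node) -> commonAncestors(reader, node))
                .reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<String> replicas = List.of(
                "/DC2/rack3/dn5",
                "/DC1/rack2/dn3",
                "/DC1/rack1/dn1");
        // Reader on a known host in DC1/rack1.
        System.out.println(sortByCommonAncestors("/DC1/rack1/client", replicas));
        // Reader outside the cluster whose location is only known down to the DC.
        System.out.println(sortByCommonAncestors("/DC1", replicas));
    }
}
```

Running it prints {{[/DC1/rack1/dn1, /DC1/rack2/dn3, /DC2/rack3/dn5]}} for the reader in rack1, and {{[/DC1/rack2/dn3, /DC1/rack1/dn1, /DC2/rack3/dn5]}} for the {{/DC1}} reader, so the DC2 replica ends up last in both cases. Ties are kept in input order here only to keep the sketch deterministic; a real policy could shuffle or otherwise balance nodes with equal scores.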
-- This message was sent by Atlassian JIRA (v6.3.4#6332)