[ 
https://issues.apache.org/jira/browse/HDFS-9149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14942031#comment-14942031
 ] 

He Tianyi commented on HDFS-9149:
---------------------------------

Perhaps we can take either an explicit approach or an implicit one.

The explicit approach makes the notion of IDC explicit, as a part of the 
network location, just like {{NodeGroup}}. That may require extending 
{{NetworkTopology}}, since the default implementation does not imply anything 
about IDC, so behavior may change.
If we go this way, other components may benefit from DC awareness. For example, 
one could add {{IsDCAware}} and {{getDataCenterOfNode}} to {{NetworkTopology}}, 
implement {{NetworkTopologyWithMultiDC}}, and then take DC into account in 
{{BlockPlacementPolicy}} to achieve better locality.
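
A rough sketch of what the explicit approach could look like. Everything here 
is hypothetical: the interface below is a stand-in and not the real 
{{NetworkTopology}} class, {{isDCAware}} is spelled in Java method style, and 
the assumption that the first path component of a network location names the 
DC (e.g. {{/DC1/rack1}}) is mine.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the two methods proposed above; this is NOT the
// real org.apache.hadoop.net.NetworkTopology API.
interface DcAwareTopology {
    boolean isDCAware();
    String getDataCenterOfNode(String host);
}

// Assumption: network locations look like "/DC1/rack1", with the DC first.
final class MultiDcTopologySketch implements DcAwareTopology {
    // host name -> network location
    private final Map<String, String> locations = new HashMap<>();

    void add(String host, String networkLocation) {
        locations.put(host, networkLocation);
    }

    @Override
    public boolean isDCAware() {
        return true;
    }

    // Returns the DC part of the location (e.g. "/DC1"), or null if unknown.
    @Override
    public String getDataCenterOfNode(String host) {
        String loc = locations.get(host);
        if (loc == null || loc.length() < 2 || !loc.startsWith("/")) {
            return null;
        }
        int end = loc.indexOf('/', 1);
        return end < 0 ? loc : loc.substring(0, end);
    }

    // True only when both hosts are known and resolve to the same DC.
    boolean sameDataCenter(String hostA, String hostB) {
        String a = getDataCenterOfNode(hostA);
        return a != null && a.equals(getDataCenterOfNode(hostB));
    }
}
```

A placement policy or {{sortByDistance}} could then consult something like 
{{sameDataCenter}} before falling back to rack-level comparison.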

The implicit approach, on the other hand, replaces {{getWeight}} with a 
distance function or a more sophisticated weight function, for example one 
that counts the number of common ancestors. This does not imply anything about 
IDC and can be extended to whatever hierarchy is necessary.
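
A minimal sketch of such a weight function, assuming network locations are 
slash-separated paths such as {{/DC1/rack1}} and that "common ancestors" means 
the length of the shared path prefix (the root is not counted). The class and 
method names are made up for illustration and are not existing Hadoop APIs. 
Note that, unlike {{getWeight}}, a larger value here means a closer node.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper; not part of Hadoop.
final class AncestorWeightSketch {

    // Number of leading path components shared by two network locations.
    static int commonAncestors(String locA, String locB) {
        String[] a = split(locA);
        String[] b = split(locB);
        int n = Math.min(a.length, b.length);
        int i = 0;
        while (i < n && a[i].equals(b[i])) {
            i++;
        }
        return i;
    }

    // Returns a copy of the datanode locations, closest to the reader first.
    // List.sort is stable, so ties keep their original relative order.
    static List<String> sortByCommonAncestors(String reader, List<String> datanodes) {
        List<String> sorted = new ArrayList<>(datanodes);
        sorted.sort(Comparator
                .comparingInt((String dn) -> commonAncestors(reader, dn))
                .reversed());
        return sorted;
    }

    // "/DC1/rack1" -> ["DC1", "rack1"]; "/" or "" -> []
    private static String[] split(String loc) {
        String s = loc.startsWith("/") ? loc.substring(1) : loc;
        return s.isEmpty() ? new String[0] : s.split("/");
    }
}
```

Because only the shared prefix is counted, {{/DC1/rack1}} and {{/DC2/rack1}} 
score 0 even though the rack names happen to match, and adding another level 
to the hierarchy needs no code change.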

In cases where the reader is not part of the cluster, we still need to maintain 
the correct network location for potential readers (e.g. maintain a rack table 
for all hosts in every DC, and for an unknown host assign a default location 
containing only the DC, judged by IP address). The only questionable case is a 
reader whose location is something like {{/DC1}}. This is not actually a 
problem: the reader has the same number of common ancestors with every datanode 
in {{DC1}} and none with datanodes in other DCs, so the outcome is still 
correct.
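
The lookup for out-of-cluster readers might be sketched as follows. This is 
only an illustration under assumptions of mine: the class is hypothetical, the 
DC is guessed from a plain IP prefix (the comment above only says "judged by ip 
address"), and {{/default-rack}} is used as the last-resort location.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical resolver for reader locations; not an existing Hadoop class.
final class ReaderLocationSketch {
    // Exact entries: IP -> full location such as "/DC1/rack1".
    private final Map<String, String> rackTable = new LinkedHashMap<>();
    // Fallback entries: IP prefix such as "10.1." -> DC-only location "/DC1".
    // Insertion order is kept so the first matching prefix wins.
    private final Map<String, String> dcByIpPrefix = new LinkedHashMap<>();

    void addHost(String ip, String location) {
        rackTable.put(ip, location);
    }

    void addDataCenterPrefix(String ipPrefix, String dcLocation) {
        dcByIpPrefix.put(ipPrefix, dcLocation);
    }

    // Known host: full location. Unknown host: DC-only location if a prefix
    // matches, otherwise the default rack.
    String resolve(String ip) {
        String known = rackTable.get(ip);
        if (known != null) {
            return known;
        }
        for (Map.Entry<String, String> e : dcByIpPrefix.entrySet()) {
            if (ip.startsWith(e.getKey())) {
                return e.getValue();
            }
        }
        return "/default-rack";
    }
}
```

A reader resolved to {{/DC1}} this way shares exactly one ancestor with every 
datanode under {{/DC1}} and none with datanodes elsewhere, which is the case 
discussed above.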

> Consider multi datacenter when sortByDistance
> ---------------------------------------------
>
>                 Key: HDFS-9149
>                 URL: https://issues.apache.org/jira/browse/HDFS-9149
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: He Xiaoqiao
>            Assignee: He Tianyi
>
> {{sortByDistance}} doesn't consider multiple datacenters when reading data, so 
> data may be read from another datacenter when Hadoop is deployed across 
> multiple IDCs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
