[ https://issues.apache.org/jira/browse/HADOOP-4567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644897#action_12644897 ]
Jothi Padmanabhan commented on HADOOP-4567:
-------------------------------------------
This looks good. The only debatable point is whether to introduce a new 'racks'
variable in BlockLocations or to just prefix the network topology in the hosts
variable itself. Having a separate racks variable carries an implicit
requirement that the hosts and the racks are ordered identically (which is true
in this case). However, a separate racks variable does make it easier to handle
cases where no topology information is available: a simple check on the racks
variable would do, instead of adding logic to the parsing of host names. I am
fine with either approach.
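
To make the trade-off concrete, here is a minimal sketch of the two options. It
is only an illustration, not the code in dfsRackLocation.patch: the class
BlockLocationSketch, its method names, and the "/rack1/host1" path format are
all assumptions made for this example.

```java
// Hypothetical sketch; this is NOT the Hadoop BlockLocation class or the patch.
public class BlockLocationSketch {

    // Option 1: a separate racks array kept parallel to hosts.
    // Assumes racks[i] describes hosts[i]; racks is null (or mismatched)
    // when no topology information is available.
    public static String rackOf(String[] hosts, String[] racks, int i) {
        if (racks == null || racks.length != hosts.length) {
            return null; // one simple check covers the "no topology" case
        }
        return racks[i];
    }

    // Option 2: topology prefixed to the host entry, e.g. "/rack1/host1"
    // (assumed format). The "no topology" case must be detected while parsing.
    public static String rackOfPrefixed(String entry) {
        int slash = entry.lastIndexOf('/');
        if (slash <= 0) {
            return null; // bare "host1" or "/host1": no rack component
        }
        return entry.substring(0, slash);
    }

    // Recovers the plain host name from a (possibly) prefixed entry.
    public static String hostOfPrefixed(String entry) {
        return entry.substring(entry.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        String[] hosts = {"host1", "host2"};
        String[] racks = {"/rack1", "/rack2"};
        System.out.println(rackOf(hosts, racks, 1));
        System.out.println(rackOf(hosts, null, 1));
        System.out.println(rackOfPrefixed("/rack1/host1"));
        System.out.println(hostOfPrefixed("/rack1/host1"));
    }
}
```

With the parallel-array form, callers that ignore topology keep reading hosts
unchanged; with the prefixed form, every caller has to strip the prefix before
using an entry as a host name.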
> GetFileBlockLocations should return the NetworkTopology information of the
> machines that hosts those blocks
> -----------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-4567
> URL: https://issues.apache.org/jira/browse/HADOOP-4567
> Project: Hadoop Core
> Issue Type: Improvement
> Components: dfs
> Reporter: dhruba borthakur
> Assignee: dhruba borthakur
> Attachments: dfsRackLocation.patch
>
>
> MultiFileInputFormat and FileInputFormat should use block locality
> information to construct splits.