[
https://issues.apache.org/jira/browse/HADOOP-2060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12534962
]
Runping Qi commented on HADOOP-2060:
------------------------------------
OK, that is the part I didn't get.
I'll examine the location list to confirm.
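For reference, something like the sketch below should dump the per-block host
lists in the order the namenode returns them, so the sorting is easy to verify.
This is written against the public FileSystem API; whether
getFileBlockLocations and BlockLocation.getHosts exist under these names in
this tree is an assumption on my part.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Print the replica hosts of each block of a file, in the order the
// namenode returns them. If the namenode sorts locations by proximity
// to the client, a local replica (when one exists) should appear first.
public class DumpBlockLocations {
  public static void main(String[] args) throws Exception {
    Path path = new Path(args[0]);
    FileSystem fs = FileSystem.get(new Configuration());
    FileStatus stat = fs.getFileStatus(path);
    BlockLocation[] blocks =
        fs.getFileBlockLocations(stat, 0, stat.getLen());
    for (int i = 0; i < blocks.length; i++) {
      System.out.println("block " + i + ": "
          + java.util.Arrays.toString(blocks[i].getHosts()));
    }
  }
}
{code}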
Thanks,
> DFSClient should choose a block that is local to the node where the client is
> running
> -------------------------------------------------------------------------------------
>
> Key: HADOOP-2060
> URL: https://issues.apache.org/jira/browse/HADOOP-2060
> Project: Hadoop
> Issue Type: Bug
> Components: dfs
> Reporter: Runping Qi
>
> When I chased down the DFSClient code to see how data locality affects
> the DFS read throughput,
> I realized that DFSClient does not use data locality info (at least not
> obviously to me)
> when it chooses a replica to read a block from.
> Here is the relevant code:
> {code}
> /**
>  * Pick the best node from which to stream the data.
>  * Entries in <i>nodes</i> are already in the priority order
>  */
> private DatanodeInfo bestNode(DatanodeInfo nodes[],
>                               AbstractMap<DatanodeInfo, DatanodeInfo> deadNodes)
>     throws IOException {
>   if (nodes != null) {
>     for (int i = 0; i < nodes.length; i++) {
>       if (!deadNodes.containsKey(nodes[i])) {
>         return nodes[i];
>       }
>     }
>   }
>   throw new IOException("No live nodes contain current block");
> }
>
> private DNAddrPair chooseDataNode(LocatedBlock block)
>     throws IOException {
>   int failures = 0;
>   while (true) {
>     DatanodeInfo[] nodes = block.getLocations();
>     try {
>       DatanodeInfo chosenNode = bestNode(nodes, deadNodes);
>       InetSocketAddress targetAddr =
>           DataNode.createSocketAddr(chosenNode.getName());
>       return new DNAddrPair(chosenNode, targetAddr);
>     } catch (IOException ie) {
>       String blockInfo = block.getBlock() + " file=" + src;
>       if (failures >= MAX_BLOCK_ACQUIRE_FAILURES) {
>         throw new IOException("Could not obtain block: " + blockInfo);
>       }
>       if (nodes == null || nodes.length == 0) {
>         LOG.info("No node available for block: " + blockInfo);
>       }
>       LOG.info("Could not obtain block " + block.getBlock()
>           + " from any node: " + ie);
>       try {
>         Thread.sleep(3000);
>       } catch (InterruptedException iex) {
>       }
>       deadNodes.clear(); //2nd option is to remove only nodes[blockId]
>       openInfo();
>       failures++;
>       continue;
>     }
>   }
> }
> {code}
> It seems to pick the first good replica.
> This means that even when some replica is local to the node where the
> client runs,
> it may actually pick a remote replica.
> Map/reduce tries to schedule a mapper on a node that holds a local copy
> of its input split.
> However, if the DFSClient does not use that info when choosing a replica
> to read from, the mapper may well have to read the data
> over the network, even though a local replica is available.
> I hope I missed something and misunderstood the code.
> Otherwise, this will be a serious performance problem.
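> For what it's worth, a minimal locality-aware variant of bestNode() could
> look like the sketch below. This is only an illustration, not a patch: it
> assumes DatanodeInfo exposes a getHost()-style accessor (as on DatanodeID)
> and that comparing host names is a good enough locality test.
> {code}
> // Sketch: prefer a live replica on the client's own host, then fall
> // back to the namenode-supplied priority order. Uses
> // java.net.InetAddress for the local host name; nodes[i].getHost()
> // is an assumed accessor.
> private DatanodeInfo bestNode(DatanodeInfo nodes[],
>                               AbstractMap<DatanodeInfo, DatanodeInfo> deadNodes)
>     throws IOException {
>   if (nodes != null) {
>     String localHost = InetAddress.getLocalHost().getHostName();
>     // First pass: a live replica on this host, if any.
>     for (int i = 0; i < nodes.length; i++) {
>       if (!deadNodes.containsKey(nodes[i]) &&
>           localHost.equals(nodes[i].getHost())) {
>         return nodes[i];
>       }
>     }
>     // Second pass: the first live replica in the given priority order.
>     for (int i = 0; i < nodes.length; i++) {
>       if (!deadNodes.containsKey(nodes[i])) {
>         return nodes[i];
>       }
>     }
>   }
>   throw new IOException("No live nodes contain current block");
> }
> {code}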
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.