[ https://issues.apache.org/jira/browse/HADOOP-2060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Runping Qi resolved HADOOP-2060.
--------------------------------
    Resolution: Invalid

The suggested feature is already in.

> DFSClient should choose a block that is local to the node where the client is running
> --------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2060
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2060
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Runping Qi
>
> When I chased down the DFSClient code to see how data locality impacts the DFS read throughput,
> I realized that DFSClient does not use data locality info (at least not obviously to me)
> when it chooses which replica of a block to read from.
> Here is the relevant code:
> {code}
> /**
>  * Pick the best node from which to stream the data.
>  * Entries in <i>nodes</i> are already in the priority order
>  */
> private DatanodeInfo bestNode(DatanodeInfo nodes[],
>                               AbstractMap<DatanodeInfo, DatanodeInfo> deadNodes)
>     throws IOException {
>   if (nodes != null) {
>     for (int i = 0; i < nodes.length; i++) {
>       if (!deadNodes.containsKey(nodes[i])) {
>         return nodes[i];
>       }
>     }
>   }
>   throw new IOException("No live nodes contain current block");
> }
>
> private DNAddrPair chooseDataNode(LocatedBlock block)
>     throws IOException {
>   int failures = 0;
>   while (true) {
>     DatanodeInfo[] nodes = block.getLocations();
>     try {
>       DatanodeInfo chosenNode = bestNode(nodes, deadNodes);
>       InetSocketAddress targetAddr =
>           DataNode.createSocketAddr(chosenNode.getName());
>       return new DNAddrPair(chosenNode, targetAddr);
>     } catch (IOException ie) {
>       String blockInfo = block.getBlock() + " file=" + src;
>       if (failures >= MAX_BLOCK_ACQUIRE_FAILURES) {
>         throw new IOException("Could not obtain block: " + blockInfo);
>       }
>       if (nodes == null || nodes.length == 0) {
>         LOG.info("No node available for block: " + blockInfo);
>       }
>       LOG.info("Could not obtain block " + block.getBlock() + " from any node: " + ie);
>       try {
>         Thread.sleep(3000);
>       } catch (InterruptedException iex) {
>       }
>       deadNodes.clear(); // 2nd option is to remove only nodes[blockId]
>       openInfo();
>       failures++;
>       continue;
>     }
>   }
> }
> {code}
> It seems to pick the first good replica.
> This means that even when some replica is local to the node where the client runs,
> it may actually pick a remote replica.
> Map/reduce tries to schedule a mapper on a node that holds a local copy of its input split.
> However, if the DFSClient does not use that info when choosing a replica for a read,
> the mapper may well have to read the data over the network, even though a local replica is available.
> I hope I missed something and misunderstood the code.
> Otherwise, this will be a serious performance problem.
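The resolution follows from the javadoc on bestNode() above: the locations are "already in the priority order" because the NameNode sorts each block's replicas by network distance to the requesting client before returning them, so taking the first live entry already prefers a local replica. Below is a minimal, self-contained sketch of that two-step scheme. The class and method names (ReplicaChoiceSketch, Node, sortByProximity) are hypothetical, not Hadoop API, and the distance metric is simplified to same-host-versus-remote rather than Hadoop's full rack-aware topology.

{code}
import java.io.IOException;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins illustrating why picking the first live entry is
// locality-aware once the NameNode has sorted replicas by distance to the client.
class ReplicaChoiceSketch {
  static class Node {
    final String host;
    Node(String host) { this.host = host; }
    @Override public String toString() { return host; }
  }

  // Simplified distance: 0 if the replica is on the client's own host,
  // 1 otherwise. (Real HDFS uses rack-aware network-topology distance.)
  static int distance(Node replica, String clientHost) {
    return replica.host.equals(clientHost) ? 0 : 1;
  }

  // NameNode side: return the replicas sorted nearest-first for this client.
  static Node[] sortByProximity(Node[] replicas, String clientHost) {
    Node[] sorted = replicas.clone();
    Arrays.sort(sorted, Comparator.comparingInt((Node n) -> distance(n, clientHost)));
    return sorted;
  }

  // Client side: same logic as bestNode() quoted above. The first live entry
  // wins, which after the sort is the closest live replica.
  static Node bestNode(Node[] nodes, Map<Node, Node> deadNodes) throws IOException {
    for (Node n : nodes) {
      if (!deadNodes.containsKey(n)) {
        return n;
      }
    }
    throw new IOException("No live nodes contain current block");
  }

  public static void main(String[] args) throws IOException {
    Node[] replicas = { new Node("rackB-node7"), new Node("rackA-node1"), new Node("rackC-node3") };
    String clientHost = "rackA-node1";
    Node[] prioritized = sortByProximity(replicas, clientHost);
    System.out.println(bestNode(prioritized, new HashMap<>())); // prints rackA-node1
  }
}
{code}

Running the sketch prints rackA-node1: the client's own host is chosen even though the NameNode listed it second, because the proximity sort happens before the first-live scan. Keeping the sort on the NameNode (which knows the cluster topology) is what lets the client-side selection stay as simple as the loop quoted in the issue.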