[ 
https://issues.apache.org/jira/browse/HBASE-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733651#action_12733651
 ] 

Jean-Daniel Cryans commented on HBASE-1672:
-------------------------------------------

We already do this inside TableInputFormatBase:

{code}
String regionLocation = table.getRegionLocation(startKeys[startPos]).
  getServerAddress().getHostname(); 
splits[i] = new TableSplit(this.table.getTableName(),
  startKeys[startPos], ((i + 1) < realNumSplits) ? startKeys[lastPos]:
  HConstants.EMPTY_START_ROW, regionLocation);
LOG.info("split: " + i + "->" + splits[i]);
{code}

I don't know if we can do anything more than that. One difference between HBase 
and mapred on HDFS is that a region is served by only one node, whereas an HDFS 
block is stored on 3 nodes under the default replication factor. So each map 
task has a single preferred host, and it may be difficult for the JobTracker to 
get the right map task on the right RS at the right moment.
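
To make the one-host-versus-three point concrete, here is a toy model of how 
locality can drop when every split names a single host. It is only a sketch: 
the class, the method names, and the node names n1..n4 are hypothetical, and 
the greedy loop is a simplification, not the actual JobTracker scheduling code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: hypothetical names, not the real JobTracker or HBase code.
class LocalitySketch {

    // Convenience: the set of hosts that hold the data for one split.
    static Set<String> hosts(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    // Hands out one pending split per free slot, in the order slots free up.
    // Prefers a split whose data lives on the requesting host; otherwise
    // falls back to the first pending split. Returns the number of local tasks.
    static int countLocal(List<Set<String>> splitHosts, List<String> freeSlots) {
        List<Set<String>> pending = new ArrayList<>(splitHosts);
        int local = 0;
        for (String host : freeSlots) {
            if (pending.isEmpty()) {
                break;
            }
            int pick = -1;
            for (int i = 0; i < pending.size(); i++) {
                if (pending.get(i).contains(host)) {
                    pick = i;
                    break;
                }
            }
            if (pick >= 0) {
                local++;
            } else {
                pick = 0; // no local work for this host: run a remote task
            }
            pending.remove(pick);
        }
        return local;
    }

    public static void main(String[] args) {
        // Order in which task slots become free (assumed for illustration).
        List<String> freeSlots = Arrays.asList("n4", "n2", "n3", "n1");

        // HBase-like: each region (split) is served by exactly one node.
        List<Set<String>> oneHost = Arrays.asList(
            hosts("n1"), hosts("n1"), hosts("n2"), hosts("n3"));

        // HDFS-like: each block (split) has three replicas.
        List<Set<String>> threeHosts = Arrays.asList(
            hosts("n1", "n2", "n4"), hosts("n1", "n3", "n4"),
            hosts("n2", "n3", "n4"), hosts("n1", "n2", "n3"));

        // Prints "one host per split: 3 of 4 local"
        System.out.println("one host per split: "
            + countLocal(oneHost, freeSlots) + " of 4 local");
        // Prints "three hosts per split: 4 of 4 local"
        System.out.println("three hosts per split: "
            + countLocal(threeHosts, freeSlots) + " of 4 local");
    }
}
```

In this model the slot on n4 has no region to serve, so it takes a remote 
task; with three candidate hosts per split the same slot finds local work.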

> Map tasks not local to RS
> -------------------------
>
>                 Key: HBASE-1672
>                 URL: https://issues.apache.org/jira/browse/HBASE-1672
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: mapred, master, regionserver
>    Affects Versions: 0.20.0, 0.19.3
>         Environment: DN, TT and RS running on the same nodes.
>            Reporter: Amandeep Khurana
>             Fix For: 0.20.0, 0.19.4
>
>
> The number of data-local map tasks while scanning a table is only about 10% 
> of the total map tasks...
> My table had 280 regions and 13M records... The number of map tasks in the 
> scan job was equal to the number of regions (280). Only 25 of them were 
> data-local tasks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
