[ https://issues.apache.org/jira/browse/HBASE-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733651#action_12733651 ]
Jean-Daniel Cryans commented on HBASE-1672:
-------------------------------------------
We already do this inside TableInputFormatBase:
{code}
String regionLocation = table.getRegionLocation(startKeys[startPos])
    .getServerAddress().getHostname();
splits[i] = new TableSplit(this.table.getTableName(),
    startKeys[startPos],
    ((i + 1) < realNumSplits) ? startKeys[lastPos] : HConstants.EMPTY_START_ROW,
    regionLocation);
LOG.info("split: " + i + "->" + splits[i]);
{code}
I don't know if we can do anything more than that. One difference between HBase
and mapred on HDFS is that a region is served by only one node, whereas an HDFS
block lives on 3 nodes under the default replication factor. So getting the
right map task onto the right RS at the right moment may be difficult for the
JobTracker.
> Map tasks not local to RS
> -------------------------
>
> Key: HBASE-1672
> URL: https://issues.apache.org/jira/browse/HBASE-1672
> Project: Hadoop HBase
> Issue Type: Bug
> Components: mapred, master, regionserver
> Affects Versions: 0.20.0, 0.19.3
> Environment: DN, TT and RS running on the same nodes.
> Reporter: Amandeep Khurana
> Fix For: 0.20.0, 0.19.4
>
>
> The number of data local map tasks while scanning a table is only about 10%
> of the total map tasks...
> My table had 280 regions and 13M records... The number of map tasks in the
> scan job was equal to the number of regions (280). Only 25 of them were data
> local tasks.