[ https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193306#comment-14193306 ]
Ted Yu commented on HBASE-12394:
--------------------------------

Mind putting the patch on Review Board?
hbase.mapreduce.scan.regionspermapper controls how many mappers would be used.
Have you considered specifying the number of mappers for this feature?

Thanks

> Support multiple regions as input to each mapper in map/reduce jobs
> -------------------------------------------------------------------
>
>                 Key: HBASE-12394
>                 URL: https://issues.apache.org/jira/browse/HBASE-12394
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>    Affects Versions: 2.0.0, 0.98.6.1
>            Reporter: Weichen Ye
>         Attachments: HBASE-12394.patch
>
>
> On a Hadoop cluster, a job with a large HBase table as input always consumes a
> large amount of computing resources. For example, we need to create a job
> with 1000 mappers to scan a table with 1000 regions. This patch lets
> one mapper use multiple regions as input.
>
> The following new files are included in this patch:
> TableMultiRegionInputFormat.java
> TableMultiRegionInputFormatBase.java
> TableMultiRegionMapReduceUtil.java
> *TestTableMultiRegionInputFormatScan1.java
> *TestTableMultiRegionInputFormatScan2.java
> *TestTableMultiRegionInputFormatScanBase.java
> *TestTableMultiRegionMapReduceUtil.java
>
> Files whose names start with * are tests.
> In order to support multiple regions for one mapper, we need a new property
> in the configuration: "hbase.mapreduce.scan.regionspermapper"
> This is an example, which means each mapper has 3 regions as input:
> <property>
>   <name>hbase.mapreduce.scan.regionspermapper</name>
>   <value>3</value>
> </property>
> This is an example for Java code:
> TableMultiRegionMapReduceUtil.initTableMapperJob(tablename, scan, Map.class,
> Text.class, Text.class, job);
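
To make the trade-off between "regions per mapper" and "number of mappers" concrete, here is a minimal sketch of the arithmetic involved. It is not code from the patch and does not use any HBase API; the class RegionGroupingSketch and its methods are hypothetical names introduced only for illustration, and the assumption is that regions are grouped consecutively with the last group allowed to be smaller.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBASE-12394.patch or of HBase itself.
// It only illustrates how a regions-per-mapper value maps to a mapper count.
final class RegionGroupingSketch {

    // Mappers needed when each mapper reads at most regionsPerMapper regions.
    static int mapperCount(int regionCount, int regionsPerMapper) {
        if (regionCount < 0 || regionsPerMapper < 1) {
            throw new IllegalArgumentException(
                "need regionCount >= 0 and regionsPerMapper >= 1");
        }
        // Ceiling division: a partially filled last group still needs a mapper.
        return (regionCount + regionsPerMapper - 1) / regionsPerMapper;
    }

    // Smallest regions-per-mapper value that stays within a target mapper count.
    static int regionsPerMapperFor(int regionCount, int targetMappers) {
        if (regionCount < 0 || targetMappers < 1) {
            throw new IllegalArgumentException(
                "need regionCount >= 0 and targetMappers >= 1");
        }
        return Math.max(1, (regionCount + targetMappers - 1) / targetMappers);
    }

    // Split an ordered list of region names into consecutive groups,
    // one group per mapper (assumed grouping strategy).
    static List<List<String>> group(List<String> regions, int regionsPerMapper) {
        if (regionsPerMapper < 1) {
            throw new IllegalArgumentException("need regionsPerMapper >= 1");
        }
        List<List<String>> groups = new ArrayList<>();
        for (int start = 0; start < regions.size(); start += regionsPerMapper) {
            int end = Math.min(start + regionsPerMapper, regions.size());
            groups.add(new ArrayList<>(regions.subList(start, end)));
        }
        return groups;
    }

    public static void main(String[] args) {
        // The example from the issue: 1000 regions, 3 regions per mapper.
        System.out.println(mapperCount(1000, 3));
        // Asking for about 400 mappers instead.
        int perMapper = regionsPerMapperFor(1000, 400);
        System.out.println(perMapper + " regions/mapper gives "
            + mapperCount(1000, perMapper) + " mappers");
    }
}
```

Under these assumptions, a target mapper count can only be met approximately: with 1000 regions, asking for 400 mappers yields 3 regions per mapper, which results in 334 mappers rather than 400. That granularity is one thing to weigh when choosing which of the two settings to expose.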