Hi,

When TableInputFormat is used to read an HBase table in a MapReduce job, its splitter creates one map task per region of the table. However, in some cases the user's scan range falls entirely within a single region, so only one mapper ends up doing all the work. For example, suppose the table's rowkey is 'md5(userid) + timestamp' and a client wants to use MR to scan one user's data from the latest month. All of that user's rows share the same md5 prefix and are stored contiguously, so they will most likely sit in one region, and only one mapper will be working.
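For the rowkey example above, the following plain-Java sketch (no HBase API involved) shows what one user's month looks like as a single [start, stop) key range, and how that range could be cut into N contiguous segments. It assumes the timestamp is encoded as an 8-byte big-endian long after the 16-byte MD5 digest; the class and method names (`ScanRangeSplitter`, `rowKey`, `splitRange`) are hypothetical, not part of HBase. Because equal-length keys sort like unsigned big-endian integers, the range can be divided arithmetically with `BigInteger`.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

public class ScanRangeSplitter {

    // Hypothetical row key layout (assumption): 16-byte MD5(userid) + 8-byte big-endian timestamp.
    static byte[] rowKey(String userId, long timestampMillis) {
        try {
            byte[] hash = MessageDigest.getInstance("MD5")
                    .digest(userId.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.allocate(hash.length + Long.BYTES)
                    .put(hash).putLong(timestampMillis).array();
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to provide MD5, so this should not happen.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Splits [start, stop) into at most n contiguous segments and returns the segment
    // boundaries (n + 1 of them, fewer if the range is narrower than n keys).
    static List<byte[]> splitRange(byte[] start, byte[] stop, int n) {
        if (start.length != stop.length) {
            throw new IllegalArgumentException("keys must have equal length");
        }
        BigInteger lo = new BigInteger(1, start); // unsigned interpretation of the key bytes
        BigInteger hi = new BigInteger(1, stop);
        if (n < 1 || lo.compareTo(hi) >= 0) {
            throw new IllegalArgumentException("empty range or n < 1");
        }
        BigInteger width = hi.subtract(lo);
        List<byte[]> bounds = new ArrayList<>();
        BigInteger prev = null;
        for (int i = 0; i <= n; i++) {
            // i-th boundary = lo + width * i / n; i = 0 gives start, i = n gives stop.
            BigInteger b = lo.add(width.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(n)));
            if (b.equals(prev)) {
                continue; // drop duplicate boundaries when the range is very narrow
            }
            bounds.add(toFixedBytes(b, start.length));
            prev = b;
        }
        return bounds;
    }

    // Converts back to exactly len bytes: strips BigInteger's sign byte or left-pads with zeros.
    static byte[] toFixedBytes(BigInteger v, int len) {
        byte[] raw = v.toByteArray();
        byte[] out = new byte[len];
        int copy = Math.min(raw.length, len);
        System.arraycopy(raw, raw.length - copy, out, len - copy, copy);
        return out;
    }

    static String hex(byte[] key) {
        return String.format("%0" + (key.length * 2) + "x", new BigInteger(1, key));
    }

    public static void main(String[] args) {
        long now = 1_700_000_000_000L;                   // hypothetical "now" (epoch millis)
        long monthAgo = now - 30L * 24 * 60 * 60 * 1000; // 30 days earlier
        byte[] start = rowKey("user42", monthAgo);
        byte[] stop = rowKey("user42", now);             // exclusive upper bound
        List<byte[]> bounds = splitRange(start, stop, 4);
        for (int i = 0; i + 1 < bounds.size(); i++) {
            // Each [bounds[i], bounds[i + 1]) pair could back its own scan, i.e. its own mapper.
            System.out.println("segment " + i + ": " + hex(bounds.get(i)) + " -> " + hex(bounds.get(i + 1)));
        }
    }
}
```

Since both keys share the same MD5 prefix, the four segments here simply divide the month into equal time slices. Equal key width only translates into equal work per mapper if the user's rows are spread roughly evenly over time, which is an assumption about the data.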
To scan data in parallel when the user's scan range falls within a single region, should we split the scan range into several segments within that region?

Best,
xinxin
