Should we split the scan range into several segments when it falls within a single region?

libis Mon, 04 Sep 2017 00:06:43 -0700

Hi

When TableInputFormat is used to read an HBase table in a MapReduce job,
its splitter creates one map task for each region of the table. However, in
some cases the user’s scan range falls entirely within a single region, so
the job runs with only one mapper. For example, if the rowkey of the table is
‘md5(userid) + timestamp’ and a client wants to scan the last month of data
for a specific user with MR, it is very likely that only one mapper will do
all the work.


To scan the data in parallel when the user's scan range lies within a
single region, should we split the scan range into several segments within
that region?
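
To make the idea concrete, here is a rough sketch of how a [start, stop)
row range could be cut into roughly equal segments. It is plain Java with no
HBase dependency, and it is not the existing TableInputFormat behaviour. The
class name ScanRangeSplitter and its split method are hypothetical names for
illustration only. It assumes both keys are explicit, non-empty byte arrays
and that rows are spread fairly evenly over the byte range, which may not
hold for real data.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only: not part of HBase.
public class ScanRangeSplitter {

    /** Returns up to n contiguous {start, stop} pairs that together cover [start, stop). */
    public static List<byte[][]> split(byte[] start, byte[] stop, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        // Pad both keys with trailing zero bytes to a common length (plus one
        // extra byte of resolution) and read them as unsigned big-endian integers.
        int len = Math.max(start.length, stop.length) + 1;
        BigInteger lo = new BigInteger(1, Arrays.copyOf(start, len));
        BigInteger hi = new BigInteger(1, Arrays.copyOf(stop, len));
        if (lo.compareTo(hi) >= 0) {
            throw new IllegalArgumentException("start must sort before stop");
        }
        BigInteger width = hi.subtract(lo);

        List<byte[][]> segments = new ArrayList<>();
        byte[] prevKey = start;
        BigInteger prevValue = lo;
        for (int i = 1; i < n; i++) {
            // Interpolate the i-th cut point between the two keys.
            BigInteger cut = lo.add(
                width.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(n)));
            if (cut.compareTo(prevValue) <= 0) {
                continue; // range too narrow for this cut; skip the empty segment
            }
            byte[] key = toFixedLength(cut, len);
            segments.add(new byte[][] {prevKey, key});
            prevKey = key;
            prevValue = cut;
        }
        // The last segment ends at the caller's original stop key.
        segments.add(new byte[][] {prevKey, stop});
        return segments;
    }

    // Converts a non-negative value back into exactly len bytes, big-endian.
    private static byte[] toFixedLength(BigInteger value, int len) {
        byte[] raw = value.toByteArray(); // may have a leading sign byte or be shorter
        byte[] out = new byte[len];
        int copy = Math.min(raw.length, len);
        System.arraycopy(raw, raw.length - copy, out, len - copy, copy);
        return out;
    }
}
```

Each returned pair could then serve as the start and stop row of its own
scan, so that one mapper handles one segment instead of the whole range.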

Best,

xinxin
