Re: should we split the scan range into several segments when the scan range is located in only a single region?

libis Mon, 04 Sep 2017 05:26:16 -0700

Thanks for the prompt reply. I think it may be hard for an HBase user to
set a proper number of mappers per region, and with that approach small
regions could end up creating many small jobs. However, we could simply
use a fixed number of mappers only when the scan range is located in a
single region, which may be a common production scenario for large
regions (>30 GB). What do you think?
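
To make the idea concrete, here is a rough sketch of how one scan range
could be cut into a fixed number of contiguous sub-ranges, one per mapper.
It is plain Python rather than HBase code, and everything in it is an
assumption for illustration: the helper name split_range, the 4 extra
padding bytes, and the example row keys. It treats row keys as big-endian
integers and places the boundaries at evenly spaced values, which only
balances the work if the keys are spread fairly evenly over the range.

```python
import hashlib
from typing import List, Tuple


def split_range(start: bytes, stop: bytes, n: int) -> List[Tuple[bytes, bytes]]:
    """Split the row-key range [start, stop) into at most n contiguous sub-ranges.

    Hypothetical helper, not part of HBase.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    if start >= stop:
        raise ValueError("start must sort before stop")

    # Right-pad both keys with zero bytes to a common width. The 4 extra
    # bytes (an arbitrary choice) give room for boundaries between two
    # keys that differ only in their last byte.
    width = max(len(start), len(stop)) + 4
    lo = int.from_bytes(start.ljust(width, b"\x00"), "big")
    hi = int.from_bytes(stop.ljust(width, b"\x00"), "big")

    # Never create more segments than there are distinct integer values.
    k = max(1, min(n, hi - lo))

    # Interior boundaries are evenly spaced between lo and hi.
    bounds = [start]
    for i in range(1, k):
        bounds.append((lo + (hi - lo) * i // k).to_bytes(width, "big"))
    bounds.append(stop)

    # Pair up neighbouring boundaries: each pair is (start row, stop row).
    return list(zip(bounds[:-1], bounds[1:]))


if __name__ == "__main__":
    # Example keys in the spirit of 'md5(userid) + timestamp' (made up).
    prefix = hashlib.md5(b"user42").hexdigest().encode("ascii")
    for seg_start, seg_stop in split_range(prefix + b"20170801",
                                           prefix + b"20170901", 4):
        print(seg_start, seg_stop)
```

Each (start row, stop row) pair would then become its own input split, so
a scan that falls inside a single region is still read by several mappers.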


2017-09-04 17:13 GMT+08:00 Chia-Ping Tsai <[email protected]>:

> That sounds good. There are some related issues; see
> https://issues.apache.org/jira/browse/HBASE-4914 and
> https://issues.apache.org/jira/browse/HBASE-4063.
>
> On 2017-09-04 15:06, libis <[email protected]> wrote:
> > Hi
> >
> > When TableInputFormat is used to source an HBase table in a MapReduce
> > job, its splitter creates one map task for each region of the table.
> > However, in some cases the user’s scan range may fall within a single
> > region, so that only one mapper is created. For example, if the rowkey
> > of the table is ‘md5(userid) + timestamp’ and a client wants to scan
> > the data of a specified user for the latest month with MR, it is very
> > likely that only one mapper will be working.
> >
> > In order to scan data in parallel when the user's scan range is located
> > in a single region, should we split the scan range into several
> > segments within a region?
> >
> > Best,
> >
> > xinxin
> >
>
