Re: should we split the scan range into serveral segments when the scan range only located in a single region?

Chia-Ping Tsai Mon, 04 Sep 2017 05:43:01 -0700

Thanks for the information, Mikhail. It seems to me this issue comes up often.
libis, could you take over HBASE-18090? I can assign the issue to you if you
send me your JIRA account name.


On 2017-09-04 20:26, Mikhail Antonov <[email protected]> wrote: 
> I filed https://issues.apache.org/jira/browse/HBASE-18090 some time ago
> and attached a draft patch to it. It's not complete, as we need some deeper
> changes in the way we open regions (see comments), but the basic stuff works
> (I ended up going the other route and didn't have the bandwidth to finish it;
> it would be great if someone picked it up).
> 
> Mikhail
> 
> On Mon, Sep 4, 2017 at 11:13 AM Chia-Ping Tsai <[email protected]> wrote:
> 
> > That sounds good. There are some related issues; see
> > https://issues.apache.org/jira/browse/HBASE-4914 and
> > https://issues.apache.org/jira/browse/HBASE-4063.
> >
> > On 2017-09-04 15:06, libis <[email protected]> wrote:
> > > Hi
> > >
> > > When TableInputFormat is used to read an HBase table in a MapReduce job,
> > > its splitter creates one map task for each region of the table. However,
> > > in some cases the user's scan range may fall within a single region,
> > > resulting in only one mapper. For example, if the rowkey of the table is
> > > "md5(userid) + timestamp" and a client wants to scan the data of a
> > > specified user for the latest month with MR, it is quite likely that
> > > only one mapper will be working.
> > >
> > > To scan data in parallel when the user's scan range is located in a
> > > single region, should we split the scan range into several segments
> > > within the region?
> > >
> > > Best,
> > >
> > > xinxin
> > >
> >
> -- 
> Thanks,
> Michael Antonov
>
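
The idea libis raises above, cutting one region's scan range into several sub-ranges so that each can feed its own mapper, comes down to interpolating between the start and stop row keys. What follows is a minimal sketch in Python, not the HBASE-18090 patch and not HBase code: `split_scan_range` and its parameters are hypothetical names, both keys are assumed to be non-empty, and the cut points assume rows are spread roughly evenly over the byte range, which real data may not satisfy.

```python
from typing import List, Tuple


def split_scan_range(start_row: bytes, stop_row: bytes,
                     num_segments: int) -> List[Tuple[bytes, bytes]]:
    """Split [start_row, stop_row) into at most num_segments contiguous ranges.

    Hypothetical helper for illustration only; it is not part of HBase.
    """
    if num_segments < 1:
        raise ValueError("num_segments must be at least 1")
    if not start_row or not stop_row or start_row >= stop_row:
        raise ValueError("need non-empty keys with start_row < stop_row")

    # Right-pad the shorter key with zero bytes so both keys can be read as
    # fixed-width big-endian integers that keep the lexicographic order.
    width = max(len(start_row), len(stop_row))
    lo = int.from_bytes(start_row.ljust(width, b"\x00"), "big")
    hi = int.from_bytes(stop_row.ljust(width, b"\x00"), "big")

    # Keep the caller's exact keys at both ends; only interior cut points
    # are interpolated.
    boundaries = [start_row]
    previous = lo
    for i in range(1, num_segments):
        point = lo + (hi - lo) * i // num_segments
        # Skip cut points that would produce an empty segment (this happens
        # when the range is narrower than the requested segment count).
        if point > previous:
            boundaries.append(point.to_bytes(width, "big"))
            previous = point
    boundaries.append(stop_row)

    # Pair consecutive boundaries into (start, stop) sub-ranges.
    return list(zip(boundaries[:-1], boundaries[1:]))


if __name__ == "__main__":
    # Assumed example keys: one user's rows for a single month.
    for seg_start, seg_stop in split_scan_range(b"u42|20170801",
                                                b"u42|20170901", 4):
        print(seg_start, seg_stop)
```

Each returned pair could serve as the start and stop row of a separate scan. Because the cut points are purely arithmetic, the segments may hold very different amounts of data when the keys are skewed, so the work per mapper may still be unbalanced.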
