Thanks for the information, Mikhail. It seems to me the issue is popular. libis, could you take over HBASE-18090? I can assign the issue to you if I get your JIRA account.
On 2017-09-04 20:26, Mikhail Antonov <[email protected]> wrote:
> I filed https://issues.apache.org/jira/browse/HBASE-18090 some time ago
> and attached a draft patch to it. It's not complete, as we need some deeper
> changes in the way we open regions (see comments), but the basic stuff works.
> (I ended up going the other route and didn't have the bandwidth to finish it;
> it would be great if someone picked it up.)
>
> Mikhail
>
> On Mon, Sep 4, 2017 at 11:13 AM Chia-Ping Tsai <[email protected]> wrote:
>
> > That sounds good. There are some related issues; see
> > https://issues.apache.org/jira/browse/HBASE-4914 and
> > https://issues.apache.org/jira/browse/HBASE-4063.
> >
> > On 2017-09-04 15:06, libis <[email protected]> wrote:
> > > Hi,
> > >
> > > When TableInputFormat is used to source an HBase table in a MapReduce
> > > job, its splitter makes one map task for each region of the table.
> > > However, in some cases the user's scan range may fall within a single
> > > region, so that there is only one mapper. For example, if the rowkey of
> > > the table is "md5(userid) + timestamp" and a client wants to scan the
> > > data of a specified user for the latest month with MR, it is quite
> > > likely that only one mapper is working.
> > >
> > > In order to scan data in parallel when the user's scan range is located
> > > in a single region, should we split the scan range into several segments
> > > within a region?
> > >
> > > Best,
> > >
> > > xinxin
> >
>
> --
> Thanks,
> Michael Antonov
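
For illustration only, the idea raised in the original question (splitting one scan range inside a region into several segments) could be sketched roughly as below. This is not the HBASE-18090 patch and not HBase API: the class ScanRangeSplitter and its methods are hypothetical names, and the sketch assumes both row keys are non-empty and that keys are roughly uniformly distributed between them (as with an md5 prefix). It treats the keys as fixed-width unsigned integers and cuts the interval into n equal parts.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of HBase): cuts a row-key range into n sub-ranges. */
final class ScanRangeSplitter {

    /**
     * Returns n + 1 boundary keys: start, n - 1 interior split points, stop.
     * Adjacent pairs (points[i], points[i + 1]) form the sub-ranges.
     * Assumption: start and stop are non-empty and start sorts before stop
     * after both are zero-padded on the right to the same length.
     */
    static List<byte[]> splitPoints(byte[] start, byte[] stop, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        // Pad with trailing zero bytes so both keys have the same width.
        int len = Math.max(start.length, stop.length);
        BigInteger lo = new BigInteger(1, Arrays.copyOf(start, len));
        BigInteger hi = new BigInteger(1, Arrays.copyOf(stop, len));
        if (lo.compareTo(hi) >= 0) {
            throw new IllegalArgumentException("start must sort before stop");
        }
        BigInteger width = hi.subtract(lo);
        List<byte[]> points = new ArrayList<>();
        points.add(start);
        for (int i = 1; i < n; i++) {
            // lo + width * i / n; if width < n, some points may coincide,
            // which yields empty sub-ranges that a caller should skip.
            BigInteger p = lo.add(
                width.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(n)));
            points.add(toFixedWidth(p, len));
        }
        points.add(stop);
        return points;
    }

    /** Converts a non-negative value below 256^len to exactly len big-endian bytes. */
    static byte[] toFixedWidth(BigInteger v, int len) {
        byte[] raw = v.toByteArray(); // may carry a leading sign byte or be shorter than len
        byte[] out = new byte[len];
        int copy = Math.min(raw.length, len);
        System.arraycopy(raw, raw.length - copy, out, len - copy, copy);
        return out;
    }
}
```

Each adjacent pair of boundaries could then serve as the start row and stop row of one input split, so that several mappers read the same region in parallel. Evenly spaced boundaries only balance the work if the data is evenly spread over the key range; skewed data would need split points derived from actual key distribution.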
