Re: Parallel Scanner

2017-02-19 Thread ramkrishna vasudevan
Hi Anil,

HBase does not directly provide parallel scans. If you know the start and
end keys of the table's regions, you can create parallel scans in your
application code.

In your code snippet, the intent is right: you get the required regions and
can issue parallel scans from your app.

One thing to watch out for is that if a region splits, its start and end
rows may change, so it is better to fetch the regions every time before you
issue a scan. Does that make sense to you?
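
A minimal sketch of the client-side approach, using only the JDK so that it
runs without a cluster. It assumes the region boundaries were fetched just
before the scan (in the HBase 1.x client they could come from something like
RegionLocator#getStartEndKeys(), which is an assumption about your setup).
RegionRangeSplitter, KeyRange, subRanges and scanInParallel are hypothetical
names, and the function passed to scanInParallel is a placeholder for the
real per-region scan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

/** Hypothetical helper (not an HBase class): splits a scan into per-region ranges. */
final class RegionRangeSplitter {

    /** Half-open key range [start, stop). An empty array means "unbounded". */
    static final class KeyRange {
        final byte[] start;
        final byte[] stop;
        KeyRange(byte[] start, byte[] stop) { this.start = start; this.stop = stop; }
    }

    /** Unsigned lexicographic comparison of row keys. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    /**
     * Clips the scan range [scanStart, scanStop) to each region.
     * regionStarts[i] and regionEnds[i] are assumed to be region i's boundaries,
     * fetched just before the scan so that recent splits are reflected.
     */
    static List<KeyRange> subRanges(byte[][] regionStarts, byte[][] regionEnds,
                                    byte[] scanStart, byte[] scanStop) {
        List<KeyRange> out = new ArrayList<>();
        for (int i = 0; i < regionStarts.length; i++) {
            // Later of the two start keys (an empty start sorts first).
            byte[] start = compare(regionStarts[i], scanStart) >= 0 ? regionStarts[i] : scanStart;
            // Earlier of the two stop keys (an empty stop means "to the end").
            byte[] stop;
            if (regionEnds[i].length == 0) stop = scanStop;
            else if (scanStop.length == 0) stop = regionEnds[i];
            else stop = compare(regionEnds[i], scanStop) <= 0 ? regionEnds[i] : scanStop;
            // Keep only regions that overlap the scan range.
            if (stop.length == 0 || compare(start, stop) < 0) out.add(new KeyRange(start, stop));
        }
        return out;
    }

    /** Runs scanOne for every range on a fixed pool; results come back in range order. */
    static <T> List<T> scanInParallel(List<KeyRange> ranges,
                                      Function<KeyRange, T> scanOne, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<T>> futures = new ArrayList<>();
            for (KeyRange r : ranges) {
                futures.add(CompletableFuture.supplyAsync(() -> scanOne.apply(r), pool));
            }
            List<T> results = new ArrayList<>();
            for (CompletableFuture<T> f : futures) results.add(f.join());
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling subRanges again before every load is what picks up region splits;
the ranges themselves are only a snapshot.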

Regards
Ram

On Sat, Feb 18, 2017 at 1:44 PM, Anil wrote:

> Hi,
>
> I am building a use case where I have to load HBase data into an in-memory
> database (IMDB). I am scanning each region and loading its data into the
> IMDB.
>
> I am looking at the parallel scanner work
> ( https://issues.apache.org/jira/browse/HBASE-8504, HBASE-1935 ) to reduce
> the load time. However,
> HTable#getRegionsInRange(byte[] startKey, byte[] endKey, boolean reload) is
> deprecated, and HBASE-1935 is still open.
>
> I see that the Connection from ConnectionFactory is an
> HConnectionImplementation by default, and that it creates HTable instances.
>
> Do you see any issues in using an HTable obtained from the Table instance?
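> // For each region overlapping the scan range, compute the sub-range to scan.
> int i = 0;
> List<HRegionLocation> regions =
>     hTable.getRegionsInRange(scans.getStartRow(), scans.getStopRow(), true);
>
> for (HRegionLocation region : regions) {
>   // The first region is clipped to the scan's start row.
>   byte[] startRow = i == 0 ? scans.getStartRow() :
>       region.getRegionInfo().getStartKey();
>   i++;
>   // The last region is clipped to the scan's stop row.
>   byte[] endRow = i == regions.size() ? scans.getStopRow() :
>       region.getRegionInfo().getEndKey();
> }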
>
> Are there any alternatives for achieving a parallel scan? Thanks.
>
> Thanks
>


Re: On HBase Read Replicas

2017-02-19 Thread Anoop John
Thanks Enis. I did not know about setting a replica id explicitly. So what
happens if that replica is down at read time? Will the read go to another
replica?

-Anoop-

On Sat, Feb 18, 2017 at 3:34 AM, Enis Söztutar wrote:
> You can do gets using two different "modes":
>  - Do a read with backup RPCs. In this case, the algorithm that I described
> earlier will be used: 1 RPC to the primary, and 2 more RPCs after the
> primary times out.
>  - Do a read to a single replica. In this case, there is only 1 RPC, which
> goes to that given replica.
>
> Enis
>
> On Fri, Feb 17, 2017 at 12:03 PM, jeff saremi wrote:
>
>> Enis
>>
>> Thanks for taking the time to reply
>>
>> I had thought that a read request is sent to all replicas regardless. If we
>> have the option of sending to one, analyzing the response, and then sending
>> to another, that bodes well for our scenarios.
>>
>> Please confirm
>>
>> thanks
>>
>> 
>> From: Enis Söztutar 
>> Sent: Friday, February 17, 2017 11:38:42 AM
>> To: hbase-user
>> Subject: Re: On HBase Read Replicas
>>
>> You can use read replicas to distribute the read load if you are fine with
>> stale reads. The read replicas normally have a "backup RPC" path, which
>> implements logic like this:
>>  - Send the RPC to the primary replica.
>>  - If there is no response for 100ms (or the configured timeout), send RPCs
>> to the other replicas.
>>  - Return the first non-exception response.
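
The three steps listed above can be sketched with plain JDK futures. This is
only an illustration of the pattern, not the HBase client's implementation:
BackupReads and readWithBackups are hypothetical names, each Supplier stands
in for the RPC to one replica, and the timeout is whatever the caller passes
in.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical illustration of the backup-RPC pattern; not HBase client code. */
final class BackupReads {

    /**
     * replicas.get(0) stands in for the RPC to the primary, the rest for the
     * RPCs to the other replicas. Returns the first successful response.
     */
    static <T> T readWithBackups(List<Supplier<T>> replicas, long timeoutMs,
                                 ExecutorService pool) {
        CompletableFuture<T> first = new CompletableFuture<>();
        AtomicInteger failures = new AtomicInteger();
        int total = replicas.size();

        // Step 1: send the RPC to the primary replica.
        launch(replicas.get(0), total, first, failures, pool);
        try {
            return first.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Step 2: no successful response in time, so also ask the other replicas.
            for (int i = 1; i < total; i++) {
                launch(replicas.get(i), total, first, failures, pool);
            }
            // Step 3: return whichever successful response arrives first.
            return first.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a replica", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("all replicas failed", e.getCause());
        }
    }

    private static <T> void launch(Supplier<T> rpc, int total, CompletableFuture<T> first,
                                   AtomicInteger failures, ExecutorService pool) {
        pool.execute(() -> {
            try {
                first.complete(rpc.get()); // ignored if another replica already answered
            } catch (RuntimeException e) {
                // Fail the read only once every replica has failed.
                if (failures.incrementAndGet() == total) first.completeExceptionally(e);
            }
        });
    }
}
```

Note that a response from a non-primary replica may be stale, so a caller of
such a pattern has to tolerate that.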
>>
>> However, there is also another feature for read replicas, where you can
>> indicate which exact replica_id you want to read from when you are doing a
>> get. If you do this:
>> Get get = new Get(row);
>> get.setReplicaId(2);
>>
>> the Get RPC will go only to replica_id=2. Note that if you have region
>> replication = 3, then you will have regions with replica ids {0, 1, 2},
>> where replica_id=0 is the primary.
>>
>> So you can do load balancing with a get.setReplicaId(random() %
>> num_replicas) kind of pattern.
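
A small illustration of the random() % num_replicas idea, assuming Java.
ReplicaPicker and pickReplicaId are hypothetical names; the returned value is
what would be passed to get.setReplicaId(...). One detail worth noting: in
Java, Random.nextInt() % n can be negative, so the bounded nextInt(n) is the
safer form.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper: picks a replica id uniformly at random. */
final class ReplicaPicker {

    /** Returns a replica id in [0, numReplicas); id 0 is the primary. */
    static int pickReplicaId(int numReplicas) {
        if (numReplicas < 1) {
            throw new IllegalArgumentException("numReplicas must be >= 1");
        }
        // nextInt(bound) never returns a negative value, unlike nextInt() % bound.
        return ThreadLocalRandom.current().nextInt(numReplicas);
    }
}
```

With region replication = 3 this spreads gets evenly over replica ids 0, 1
and 2, so roughly two thirds of the reads may return stale data.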
>>
>> Enis
>>
>>
>>
>> On Thu, Feb 16, 2017 at 9:41 AM, Anoop John wrote:
>>
>> > I have never seen this kind of discussion.
>> >
>> > -Anoop-
>> >
>> > On Thu, Feb 16, 2017 at 10:13 PM, jeff saremi wrote:
>> > > Thanks Anoop.
>> > >
>> > > Understood.
>> > >
>> > > Have there been enhancement requests or discussions in the past on
>> > > load balancing by providing additional replicas? Has anyone else come
>> > > up with anything on this?
>> > > thanks
>> > >
>> > > 
>> > > From: Anoop John 
>> > > Sent: Thursday, February 16, 2017 2:35:48 AM
>> > > To: user@hbase.apache.org
>> > > Subject: Re: On HBase Read Replicas
>> > >
>> > > The region replica feature was introduced to reduce MTTR and thereby
>> > > increase data availability.  When the RS hosting the primary region
>> > > dies, clients can read from the secondary regions.  Keep in mind,
>> > > though, that data from the secondary regions may be slightly out of
>> > > sync, because the replicas are eventually consistent.  For this
>> > > reason, changing the client to share the load across different RSs
>> > > might be tough.
>> > >
>> > > -Anoop-
>> > >
>> > > On Sun, Feb 12, 2017 at 8:13 AM, jeff saremi wrote:
>> > >> Yes indeed. thank you very much Ted
>> > >>
>> > >> 
>> > >> From: Ted Yu 
>> > >> Sent: Saturday, February 11, 2017 3:40:50 PM
>> > >> To: user@hbase.apache.org
>> > >> Subject: Re: On HBase Read Replicas
>> > >>
>> > >> Please take a look at the design doc attached to
>> > >> https://issues.apache.org/jira/browse/HBASE-10070.
>> > >>
>> > >> Your first question would be answered by that document.
>> > >>
>> > >> Cheers
>> > >>
>> > >> On Sat, Feb 11, 2017 at 2:06 PM, jeff saremi wrote:
>> > >>
>> > >>> The first time I heard about replicas in HBase, the following
>> > >>> thought immediately came to my mind:
>> > >>> To alleviate the load in read-heavy clusters, one could assign region
>> > >>> servers to be replicas of others so that the load is distributed and
>> > >>> there is less pressure on the main RS.
>> > >>>
>> > >>> Just 2 days ago a colleague quoted a paragraph from the HBase manual
>> > >>> that contradicted this completely. Apparently, the replicas do not
>> > >>> help with the load; they actually contribute to more traffic on the
>> > >>> network and on the underlying file system.
>> > >>>
>> > >>> Would someone be able to give us some insight into why anyone would
>> > >>> want replicas?
>> > >>>
>> > >>> And also, could one easily change this behavior in the HBase native
>> > >>> Java client to support what I had been imagining as the concept for
>> > >>> replicas?
>> > >>>
>> > >>>
>> > >>> thanks
>> > >>>
>> >
>>


Re: Parallel Scanner

2017-02-19 Thread Anil
Thanks Ram.

So, you mean that there is no harm in using HTable#getRegionsInRange in
the application code.

HTable#getRegionsInRange returned a single entry for all my region start
and end keys. I need to explore this further.

"If you know the table region's start and end keys you could create
parallel scans in your application code."  - Is there any way to scan a
region in the application code other than the one I put in the original
email?

"One thing to watch out is that if there is a split in the region then
this start and end row may change so in that case it is better you try to
get the regions every time before you issue a scan"
 - Agreed. I am dynamically determining the region start and end keys
before initiating the scan operations for every initial load.

Thanks.






Re: Parallel Scanner

2017-02-19 Thread ramkrishna vasudevan
Yes, there is a way.

Have you seen Endpoints? Endpoints are trigger-like hooks that your client
can invoke in parallel on one or more regions, using the start and end keys
of the regions. They execute in parallel, and then you may have to sort out
the results as per your need.

But these endpoints have to be running on your region servers, so it is not
a client-only solution.
https://blogs.apache.org/hbase/entry/coprocessor_introduction

Be careful when you use them. Since these endpoints run on the server,
ensure that they are not heavy and do not consume a lot of memory, which
could have adverse effects on the server.


Regards
Ram



Re: Parallel Scanner

2017-02-19 Thread Anil
Thanks Ram. I will look into EndPoints.
