RE: Question on hbase.client.scanner.timeout.period

Eric Owhadi Fri, 28 Aug 2015 16:45:06 -0700

OK will do. Not yet sure if it is easy, will know on Monday :-). Was
struggling today to see how to regression test this without putting
breakpoints to simulate busy client not calling next() on time in trafodion
code...
Eric
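
One possible way to get such a regression test without breakpoints,
sketched here as a toy model and not as HBase or Trafodion code: put the
lease check behind an injectable clock, so the test makes the client
"late" by advancing time instead of sleeping. ToyLeaseServer, FakeClock
and ScannerGoneException are made-up names for illustration only, and the
lease is checked lazily inside next() (an assumption of the sketch, not a
claim about how a region server expires leases).

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the server rejecting an expired or unknown scanner. */
class ScannerGoneException extends RuntimeException {
    ScannerGoneException(String msg) { super(msg); }
}

/** Manually advanced clock, so a test can "wait" without sleeping. */
class FakeClock {
    private long nowMs = 0;
    long now() { return nowMs; }
    void advance(long ms) { nowMs += ms; }
}

/** Toy server side: each scanner holds a lease that every next() call renews. */
class ToyLeaseServer {
    private final FakeClock clock;
    private final long leasePeriodMs;
    private final Map<Long, Long> lastTouched = new HashMap<>();
    private long nextId = 1;

    ToyLeaseServer(FakeClock clock, long leasePeriodMs) {
        this.clock = clock;
        this.leasePeriodMs = leasePeriodMs;
    }

    long openScanner() {
        long id = nextId++;
        lastTouched.put(id, clock.now());
        return id;
    }

    /** Renews the lease, or rejects the call if the lease already ran out. */
    void next(long scannerId) {
        Long last = lastTouched.get(scannerId);
        if (last == null || clock.now() - last > leasePeriodMs) {
            lastTouched.remove(scannerId); // an expired scanner is forgotten
            throw new ScannerGoneException("scanner " + scannerId + " is unknown");
        }
        lastTouched.put(scannerId, clock.now());
    }
}

class LeaseDemo {
    public static void main(String[] args) {
        FakeClock clock = new FakeClock();
        ToyLeaseServer server = new ToyLeaseServer(clock, 60_000);
        long id = server.openScanner();

        clock.advance(30_000);
        server.next(id); // within the lease period: accepted and renewed

        clock.advance(61_000); // the "busy client" gap, longer than the lease
        try {
            server.next(id);
            System.out.println("unexpected: late next() succeeded");
        } catch (ScannerGoneException e) {
            System.out.println("late next() rejected: " + e.getMessage());
        }
    }
}
```

The same idea would carry over to a real test only if the code under test
lets the time source or the delay be injected; that part is untested here.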


-----Original Message-----
From: saint....@gmail.com [mailto:saint....@gmail.com] On Behalf Of Stack
Sent: Friday, August 28, 2015 6:35 PM
To: HBase Dev List <dev@hbase.apache.org>
Subject: Re: Question on hbase.client.scanner.timeout.period

On Fri, Aug 28, 2015 at 11:31 AM, Eric Owhadi <eric.owh...@esgyn.com> wrote:

> That sounds good, but given trafodion needs to work on current and
> future released versions of HBase, unpatched, I will first implement a
> ClientScannerTrafodion (to be deprecated), inheriting from
> ClientScanner, that will just override loadCache(), and make sure
> that the code that picks the right scanner based on the scan
> object is bypassed to force getting the ClientScannerTrafodion when
> appropriate.
> Not very elegant, but need to take into consideration trafodion
> deployment requirements.
> Then, if we do not discover any side effect during our QA related to
> this code I will port the fix on HBase to deprecate the custom scanner
> (probably first on HBase 2.0, then will let the community decide if
> this fix is worth it for back porting...). It will be a first for me,
> but that's great, I'll take your offer to help ;-)...
>

Sweet. Suggest opening an umbrella issue in hbase to implement this
feature. Reference HBASE-2161 (it is closed now). Link trafodion issue to
it. A subtask could have implementation in hbase 2.0, another could be
backport.

Is it easy to insert your T*ClientScanner?
St.Ack



> Regards,
> Eric
>
> -----Original Message-----
> From: saint....@gmail.com [mailto:saint....@gmail.com] On Behalf Of
> Stack
> Sent: Thursday, August 27, 2015 3:55 PM
> To: HBase Dev List <dev@hbase.apache.org>
> Subject: Re: Question on hbase.client.scanner.timeout.period
>
> On Thu, Aug 27, 2015 at 1:39 PM, Eric Owhadi <eric.owh...@esgyn.com>
> wrote:
>
> > Oops, my bad, the related JIRA was :
> > https://issues.apache.org/jira/browse/HBASE-2161
> >
> > I am suggesting a change to the special client-side code in
> > loadCache() of ClientScanner. Today it traps the
> > UnknownScannerException, then deliberately checks whether it comes
> > from a lease timeout (and not from a region move) and, if so, throws
> > a ScannerTimeoutException instead of letting the code go on, reset
> > the scanner and restart from the last successful retrieve (the way
> > it works for an UnknownScannerException due to a region moving).
> > By just removing the special handling that tries to single out an
> > UnknownScannerException due to a lease timeout, we should have a
> > resolution to JIRA 2161, and to our trafodion issue.
> >
> > We are still protected against a dead client that would cause a
> > resource leak at the region server, since we keep the lease timeout
> > mechanism.
> >
> > Not sure if I have overlooked something, as usual, code is here
> > for a reason :-)...
> >
> >
> Your proposal sounds good to me.
>
> Scanner works the way it does because it has always worked this way (smile).
> A while back, one of the lads suggested we do like dynamodb and have
> scanner have no state on the serverside, the scan next would just
> supply all necessary context. It was argued against because serverside
> setup is so costly. Your suggestion is similar only we do it only if
> Scanner has timed out.
>
> Suggest we keep the current semantic in 1.x at least. We could flip to
> your behavior in 2.x.  Meantime, you'd have to ask for it when you set
> up your Scan object by setting a flag.
>
> Would that work? If you want to have a go at it, I could help out on
> the issue.
>
> St.Ack
>
>
>
>
> > Regards,
> > Eric
> >
> >
> >
> > -----Original Message-----
> > From: saint....@gmail.com [mailto:saint....@gmail.com] On Behalf Of
> > Stack
> > Sent: Thursday, August 27, 2015 3:23 PM
> > To: HBase Dev List <dev@hbase.apache.org>
> > Subject: Re: Question on hbase.client.scanner.timeout.period
> >
> > On Tue, Aug 25, 2015 at 8:03 AM, Eric Owhadi <eric.owh...@esgyn.com>
> > wrote:
> >
> > > Hello St.Ack,
> > > Thanks for your pointer, but I had already investigated JIRA
> > > https://issues.apache.org/jira/browse/HBASE-13090
> > > Unfortunately, this heartbeat will protect against rpc timeout,
> > > not server side lease timeout that we are experiencing right now.
> > > I have not seen an active JIRA fixing our issue.
> > > Only https://issues.apache.org/jira/browse/HBASE-6121 is
> > > complaining about the exact same issue, but was never resolved.
> > >
> > >
> > Which issue? https://issues.apache.org/jira/browse/HBASE-6121 seems
> > unrelated.
> >
> >
> >
> > > The heartbeat in JIRA 13090 protects against the situation where the
> > > server scanner takes so long to retrieve highly filtered information
> > > that it exceeds the RPC timeout (hbase.rpc.timeout).
> >
> >
> >
> > > The timeout we are experiencing is the
> > > hbase.client.scanner.timeout.period,
> > > formerly known as hbase.regionserver.lease.period (now deprecated).
> > > The mechanism is different: here, region server scanners want to
> > > protect themselves against dead clients that would never perform
> > > "close" and so never release server side scanner resources. To do
> > > that, a lease mechanism is implemented, and if more than
> > > hbase.regionserver.lease.period elapses between 2 next() calls,
> > > the server side scanner will have been forcibly closed by this
> > > lease timeout safety mechanism. On the late next() call, the client
> > > will receive a DNRIOE of type UnknownScannerException, and will
> > > assess that it most likely comes from the lease timeout (and not
> > > from a region move), therefore throwing an exception instead of
> > > resetting the scanner (as it does in the region move scenario).
> > >
> > > Hbase 1.1 does not address, as far as I have researched, the
> > > hbase.client.scanner.timeout.period issue we are facing.
> > >
> > >
> >
> > Can you not have the high-level query that is being fed by a scan do
> > HBASE-13333? That is, tickle the ongoing scan on occasion just to
> > say that I'm still alive?
> >
> > Otherwise, what would you suggest? A scan that does not timeout? Or
> > the client being able to set a timeout in the Scan passed to the server?
> >
> > Sorry for late reply,
> > St.Ack
> >
> >
> >
> > > And yes, we will move to HBase 1.1 and 1.0, as Cloudera and
> > > Hortonworks have a version mismatch in the next official
> > > builds trafodion will support.
> > >
> > > So my question is still open.
> > >
> > > Best regards,
> > > Eric Owhadi
> > >
> > >
> > >
> > > -----Original Message-----
> > > From: saint....@gmail.com [mailto:saint....@gmail.com] On Behalf
> > > Of Stack
> > > Sent: Monday, August 24, 2015 11:07 PM
> > > To: HBase Dev List
> > > Subject: Re: Question on hbase.client.scanner.timeout.period
> > >
> > > On Mon, Aug 24, 2015 at 4:48 PM, Eric Owhadi
> > > <eric.owh...@esgyn.com>
> > > wrote:
> > >
> > > > Hello everyone,
> > > > We have been facing a situation on trafodion, where we are
> > > > hitting the hbase.client.scanner.timeout.period scenario:
> > > > basically, when doing queries that require spilling to disk
> > > > because of the high complexity of what is involved, the underlying
> > > > hbase scanner serving one of the operations involved in the
> > > > complex query cannot call next() within the timeout
> > > > specified... too busy taking care of other business.
> > > > This is a legit scenario, and I was wondering why, in the code,
> > > > special care is taken client side so that, if a DNRIOE
> > > > of type UnknownScannerException shows up and the
> > > > hbase.client.scanner.timeout.period time has elapsed, we make sure
> > > > to throw a ScannerTimeoutException, instead of just letting it go
> > > > and resetting the scanner.
> > > >
> > > > Scanners were redone in hbase 1.1. Can Trafodion come up onto
> > > > hbase 1.1?
> > > See
> > > https://blogs.apache.org/hbase/entry/scan_improvements_in_hbase_1
> > > for summary.
> > > St.Ack
> > >
> > >
> > >
> > > > > I imagine that the lease timeout implementation on the region
> > > > > server side is supposed to protect from resource leaks of scanner
> > > > > objects server side. But I am not sure why we would make it so
> > > > > that the client side throws this timeout exception, when in fact
> > > > > what just happened was that the client was too busy to call
> > > > > next() on time.
> > > >
> > > > I am sure there is a reason, but cannot figure it out :-).
> > > >
> > > > > BTW, I found this JIRA, talking about exact same thing:
> > > > > https://issues.apache.org/jira/browse/HBASE61-21 but with no
> > > > > resolution.
> > > >
> > >
> > >
> > > > > Any help understanding the reason for the timeout thrown client
> > > > > side instead of an automatic reset would be much appreciated.
> > > > > Best regards, Eric Owhadi
> > > >
> > >
> >
>
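
For readers following the thread, the "automatic reset" being proposed can
be sketched roughly as follows. This is a toy model under stated
assumptions, not the actual ClientScanner.loadCache() code: the classes
ToyServerScanner, ToyUnknownScannerException and ResumingClientScanner are
invented for illustration, rows are plain strings, and "lease expiry" is
triggered by hand. The client remembers the last row it returned and, when
the server no longer knows the scanner, reopens just after that row
instead of raising a timeout error.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Hypothetical stand-in for the server saying it no longer knows the scanner. */
class ToyUnknownScannerException extends RuntimeException {}

/** Toy server-side scanner over sorted row keys; can be made to "expire". */
class ToyServerScanner {
    private final NavigableSet<String> rows;
    private String cursor; // last row handed out, or the exclusive start row
    private boolean expired;

    ToyServerScanner(NavigableSet<String> rows, String startAfter) {
        this.rows = rows;
        this.cursor = startAfter;
    }

    void expire() { expired = true; }

    /** Returns the next row key, or null at the end of the table. */
    String next() {
        if (expired) throw new ToyUnknownScannerException();
        String key = (cursor == null)
                ? (rows.isEmpty() ? null : rows.first())
                : rows.higher(cursor);
        if (key != null) cursor = key;
        return key;
    }
}

/** Client that resets and resumes instead of surfacing a timeout. */
class ResumingClientScanner {
    private final NavigableSet<String> table;
    private ToyServerScanner server;
    private String lastRow; // last row successfully returned to the caller
    private int reopenCount;

    ResumingClientScanner(NavigableSet<String> table) {
        this.table = table;
        this.server = new ToyServerScanner(table, null);
    }

    /** Test hook: pretend the server-side lease ran out. */
    void expireServerSide() { server.expire(); }

    int reopenCount() { return reopenCount; }

    String next() {
        String row;
        try {
            row = server.next();
        } catch (ToyUnknownScannerException e) {
            // Same treatment whatever the cause: reopen right after the
            // last row the caller has already seen, then carry on.
            reopenCount++;
            server = new ToyServerScanner(table, lastRow);
            row = server.next();
        }
        if (row != null) lastRow = row;
        return row;
    }
}

class ResumeDemo {
    public static void main(String[] args) {
        NavigableSet<String> table =
                new TreeSet<>(Arrays.asList("row1", "row2", "row3", "row4"));
        ResumingClientScanner scanner = new ResumingClientScanner(table);
        List<String> seen = new ArrayList<>();
        seen.add(scanner.next());
        seen.add(scanner.next());
        scanner.expireServerSide(); // lease ran out while the client was busy
        for (String row = scanner.next(); row != null; row = scanner.next()) {
            seen.add(row);
        }
        System.out.println(seen + " after " + scanner.reopenCount() + " reopen(s)");
    }
}
```

In this model every row is delivered exactly once despite the expiry. The
server side still drops the idle scanner, which is the resource-leak
protection the thread wants to keep.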
