RE: Question on hbase.client.scanner.timeout.period

Eric Owhadi Fri, 28 Aug 2015 11:32:35 -0700

That sounds good, but given that trafodion needs to work on current and future
released versions of HBase, unpatched, I will first implement a
ClientScannerTrafodion (to be deprecated), inheriting from ClientScanner,
that will just override loadCache(), and make sure that the code that
picks the right scanner based on the scan object is bypassed to force
getting the ClientScannerTrafodion when appropriate.
Not very elegant, but I need to take trafodion deployment requirements
into consideration.
Then, if we do not discover any side effects during our QA related to this
code, I will port the fix to HBase to deprecate the custom scanner (probably
first on HBase 2.0, then I will let the community decide if this fix is worth
back porting...). It will be a first for me, but that's great, I'll
take your offer to help ;-)...
Regards,
Eric


-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Stack
Sent: Thursday, August 27, 2015 3:55 PM
To: HBase Dev List <[email protected]>
Subject: Re: Question on hbase.client.scanner.timeout.period

On Thu, Aug 27, 2015 at 1:39 PM, Eric Owhadi <[email protected]> wrote:

> Oops, my bad, the related JIRA was :
> https://issues.apache.org/jira/browse/HBASE-2161
>
> I am suggesting that we remove the special client-side code in
> loadCache() of ClientScanner that traps the UnknownScannerException and
> then deliberately checks whether it comes from a lease timeout (and not
> from a region move), in which case it throws a ScannerTimeoutException
> instead of just resetting the scanner and restarting from the last
> successful retrieve (the way it works for an UnknownScannerException
> due to a region move).
> By just removing the special handling that tries to single out an
> UnknownScannerException due to a lease timeout, we should have a
> resolution to JIRA 2161, and to our trafodion issue.
>
> We are still protecting against dead clients that would cause a resource
> leak at the region server, since we keep the lease timeout mechanism.
>
> Not sure if I have overlooked something, as usually code is there for
> a reason :-)...
>
>
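To illustrate the loadCache() proposal quoted above, here is a minimal Java sketch. It is not code from either correspondent and does not use the real HBase classes: `ServerScannerSketch`, `ClientScannerSketch`, `UnknownScannerSketch`, `ScannerTimeoutSketch` and the `failOnLeaseTimeout` switch are all hypothetical names, and time is passed in explicitly so the behavior is deterministic. With the switch on, the client turns a lease expiry into a timeout exception (roughly the current behavior as described in the thread); with it off, the client reopens the scan after the last row it received, the way the thread says a region move is handled.

```java
import java.util.Arrays;
import java.util.List;

// All types below are hypothetical stand-ins, not the real HBase classes.
class UnknownScannerSketch extends RuntimeException {}

class ScannerTimeoutSketch extends RuntimeException {}

// Server side: forgets the scanner when two next() calls are too far apart.
class ServerScannerSketch {
    private final List<String> rows;
    private final long leaseMillis;
    private int position;
    private long lastAccess;
    private boolean open;

    ServerScannerSketch(List<String> rows, long leaseMillis) {
        this.rows = rows;
        this.leaseMillis = leaseMillis;
    }

    // Opens (or reopens) the scan just after afterRow; null means from the start.
    void open(String afterRow, long now) {
        position = (afterRow == null) ? 0 : rows.indexOf(afterRow) + 1;
        lastAccess = now;
        open = true;
    }

    String next(long now) {
        if (!open || now - lastAccess > leaseMillis) {
            open = false; // lease expired: server-side state is gone
            throw new UnknownScannerSketch();
        }
        lastAccess = now;
        return position < rows.size() ? rows.get(position++) : null;
    }
}

// Client side: decides what to do when the server no longer knows the scanner.
class ClientScannerSketch {
    private final ServerScannerSketch server;
    private final long timeoutMillis;
    private final boolean failOnLeaseTimeout;
    private String lastRow;
    private long lastNext;

    ClientScannerSketch(ServerScannerSketch server, long timeoutMillis,
                        boolean failOnLeaseTimeout, long now) {
        this.server = server;
        this.timeoutMillis = timeoutMillis;
        this.failOnLeaseTimeout = failOnLeaseTimeout;
        this.lastNext = now;
        server.open(null, now);
    }

    String next(long now) {
        String row;
        try {
            row = server.next(now);
        } catch (UnknownScannerSketch e) {
            // Special handling under discussion: treat a late call as a timeout.
            if (failOnLeaseTimeout && now - lastNext > timeoutMillis) {
                throw new ScannerTimeoutSketch();
            }
            // Proposed behavior: reset and resume after the last row received.
            server.open(lastRow, now);
            row = server.next(now);
        }
        lastNext = now;
        if (row != null) {
            lastRow = row;
        }
        return row;
    }
}

class ScannerResetDemo {
    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a", "b", "c");
        ClientScannerSketch scanner =
            new ClientScannerSketch(new ServerScannerSketch(rows, 100), 100, false, 0);
        System.out.println(scanner.next(0));   // served normally
        System.out.println(scanner.next(500)); // lease expired; scan is reset and resumed
    }
}
```

In this sketch the server still drops its state when the lease expires, so the protection against dead clients is unchanged; only the client's reaction differs.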
Your proposal sounds good to me.

Scanner works the way it does because it has always worked this way (smile).
A while back, one of the lads suggested we do like dynamodb and have scanner
have no state on the serverside, the scan next would just supply all
necessary context. It was argued against because serverside setup is so
costly. Your suggestion is similar only we do it only if Scanner has timed
out.

Suggest we keep the current semantic in 1.x at least. We could flip to your
behavior in 2.x.  Meantime, you'd have to ask for it when you set up your
Scan object by setting a flag.

Would that work? If you want to have a go at it, I could help out on the
issue.

St.Ack




> Regards,
> Eric
>
>
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of
> Stack
> Sent: Thursday, August 27, 2015 3:23 PM
> To: HBase Dev List <[email protected]>
> Subject: Re: Question on hbase.client.scanner.timeout.period
>
> On Tue, Aug 25, 2015 at 8:03 AM, Eric Owhadi <[email protected]>
> wrote:
>
> > Hello St.Ack,
> > Thanks for your pointer, but I had already investigated JIRA
> > https://issues.apache.org/jira/browse/HBASE-13090
> > Unfortunately, this heartbeat will protect against rpc timeout, not
> > server side lease timeout that we are experiencing right now. I have
> > not seen an active JIRA fixing our issue.
> > Only https://issues.apache.org/jira/browse/HBASE6121 is complaining
> > about the exact same issue, but was never resolved.
> >
> >
> Which issue? https://issues.apache.org/jira/browse/HBASE-6121 seems
> unrelated.
>
>
>
> > The heartbeat work in JIRA 13090 protects against the situation where
> > the server scanner takes so long to retrieve highly filtered information
> > that it exceeds the RPC timeout (hbase.rpc.timeout).
>
>
>
> > The timeout we are experiencing is the
> > hbase.client.scanner.timeout.period,
> > also known by its deprecated name hbase.regionserver.lease.period. The
> > mechanism is different: here, region server scanners want to
> > protect themselves against dead clients that would not perform
> > "close", so that server side scanner resources can be released. To do
> > that, a lease mechanism is implemented, and if more than
> > hbase.regionserver.lease.period elapses between two next() calls,
> > the server side scanner will have been forcibly closed by this lease
> > timeout safety mechanism. On the late next() call, the client will
> > receive a DNRIOE of type UnknownScannerException, and the client will
> > assess that it most likely comes from the lease timeout (and not from
> > a region move), therefore throwing an exception instead of resetting
> > the scanner (as it does in the region move scenario).
> >
> > Hbase 1.1 does not address, as far as I have researched, the
> > hbase.client.scanner.timeout.period issue we are facing.
> >
> >
>
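The lease mechanism described in the quoted text above can be pictured with a short Java sketch. It is an illustration under stated assumptions, not the region server's implementation: `LeaseTableSketch` and its methods are hypothetical names, scanners are identified by a plain `long`, and time is passed in explicitly. Each next() call renews the lease; a periodic reaper pass releases scanners whose clients have gone quiet for longer than the lease period.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical sketch of a scanner lease table; not an HBase class.
class LeaseTableSketch {
    private final long leasePeriodMillis;
    // scanner id -> time of the last open() or successful renew()
    private final Map<Long, Long> lastSeen = new HashMap<>();

    LeaseTableSketch(long leasePeriodMillis) {
        this.leasePeriodMillis = leasePeriodMillis;
    }

    // Called when a client opens a scanner.
    void open(long scannerId, long now) {
        lastSeen.put(scannerId, now);
    }

    // Called on every next(). Returns false when the lease is gone, which is
    // the case where a server would report an unknown scanner to the client.
    boolean renew(long scannerId, long now) {
        Long seen = lastSeen.get(scannerId);
        if (seen == null || now - seen > leasePeriodMillis) {
            lastSeen.remove(scannerId);
            return false;
        }
        lastSeen.put(scannerId, now);
        return true;
    }

    // Reaper pass: releases scanners that were not renewed in time and
    // returns how many were released.
    int expire(long now) {
        int released = 0;
        Iterator<Map.Entry<Long, Long>> it = lastSeen.entrySet().iterator();
        while (it.hasNext()) {
            if (now - it.next().getValue() > leasePeriodMillis) {
                it.remove();
                released++;
            }
        }
        return released;
    }

    int openScanners() {
        return lastSeen.size();
    }
}

class LeaseTableDemo {
    public static void main(String[] args) {
        LeaseTableSketch leases = new LeaseTableSketch(60_000);
        leases.open(1L, 0);
        leases.open(2L, 0);
        leases.renew(1L, 50_000);                       // client 1 is still alive
        System.out.println(leases.expire(70_000));      // scanners released by the reaper
        System.out.println(leases.renew(2L, 70_001));   // late call from client 2
    }
}
```

A client that is merely slow and a client that has died look the same to this table, which is the ambiguity the thread is about.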
> Can you not have the high-level query that is being fed by a scan do
> HBASE-13333? That is, tickle the ongoing scan on occasion just to say
> that I'm still alive?
>
> Otherwise, what would you suggest? A scan that does not timeout? Or
> the client being able to set a timeout in the Scan passed to the server?
>
> Sorry for late reply,
> St.Ack
>
>
>
> > And yes, we will move to Hbase 1.1, and 1.0 as Cloudera and
> > Hortonworks are having version mismatch on the next official builds
> > trafodion will support.
> >
> > So my question is still open?
> >
> > Best regards,
> > Eric Owhadi
> >
> >
> >
> > -----Original Message-----
> > From: [email protected] [mailto:[email protected]] On Behalf Of
> > Stack
> > Sent: Monday, August 24, 2015 11:07 PM
> > To: HBase Dev List
> > Subject: Re: Question on hbase.client.scanner.timeout.period
> >
> > On Mon, Aug 24, 2015 at 4:48 PM, Eric Owhadi <[email protected]>
> > wrote:
> >
> > > Hello everyone,
> > > We have been facing a situation on trafodion, where we are hitting
> > > the hbase.client.scanner.timeout.period scenario:
> > > basically, when doing queries that require spilling to disk
> > > because of the high complexity of what is involved, the underlying
> > > hbase scanner serving one of the operations involved in the complex
> > > query cannot call next() within the specified timeout... too
> > > busy taking care of other business.
> > > This is a legit scenario, and I was wondering why, in the code,
> > > special care is taken to make sure that client side, if a DNRIOE of
> > > type UnknownScannerException shows up and the
> > > hbase.client.scanner.timeout.period time has elapsed, we make sure
> > > to throw a ScannerTimeoutException, instead of just letting it go
> > > and resetting the scanner.
> > >
> > Scanners were redone in hbase 1.1. Can Trafodion come up onto hbase 1.1?
> > See
> > https://blogs.apache.org/hbase/entry/scan_improvements_in_hbase_1
> > for summary.
> > St.Ack
> >
> >
> >
> > > I imagine that the lease time out implementation on region server
> > > side is supposed to protect from resource leak of scanner object
> > > server side. But I am not sure why we would make it so that the client
> > > side throws this timeout exception, when in fact what just happened
> > > was that the client was too busy to call next() on time.
> > >
> > > I am sure there is a reason, but cannot figure it out :-).
> > >
> > > BTW, I found this JIRA, talking about exact same thing:
> > > https://issues.apache.org/jira/browse/HBASE61-21 but with no
> > > resolution.
> > >
> >
> >
> > > Any help understanding the reason for the timeout thrown client
> > > side instead of an automatic reset would be much appreciated.
> > > Best regards,
> > > Eric Owhadi
> > >
> >
>
