RE: Question on hbase.client.scanner.timeout.period

Eric Owhadi Thu, 27 Aug 2015 13:40:02 -0700

Oops, my bad, the related JIRA was:
https://issues.apache.org/jira/browse/HBASE-2161


I am suggesting that we remove the special client-side code in loadCache() of
ClientScanner that traps the UnknownScannerException and then deliberately
checks whether it comes from a lease timeout (rather than from a region move)
in order to throw a ScannerTimeoutException, instead of letting the code carry
on, reset the scanner, and restart from the last successful retrieve (the way
it works for an UnknownScannerException caused by a region move).
Simply removing the special handling that tries to single out an
UnknownScannerException caused by a lease timeout should resolve
JIRA 2161, and our Trafodion issue.

We would still be protected against dead clients that would cause resource
leaks on the region server, since we keep the lease timeout mechanism.
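
To make the two behaviours concrete, here is a toy Java model. It is only a sketch, not the actual ClientScanner or region server code: ToyRegionServer, scan(), the resetOnLeaseTimeout flag, and the simulated clock are all made up for illustration, and the two exception classes are local stand-ins for the real HBase ones. The server still expires the lease in both modes; only the client's reaction differs.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the lease timeout discussion; none of this is real HBase code. */
class ScannerLeaseSketch {

    /** Local stand-in for the exception the region server raises. */
    static class UnknownScannerException extends RuntimeException {}

    /** Local stand-in for the client-side timeout exception. */
    static class ScannerTimeoutException extends RuntimeException {}

    /** Hypothetical region server that expires an idle scanner lease. */
    static class ToyRegionServer {
        final int[] rows;
        final long leaseMs;
        long lastAccessMs;
        int position;
        boolean open;

        ToyRegionServer(int[] rows, long leaseMs) {
            this.rows = rows;
            this.leaseMs = leaseMs;
        }

        void openScanner(int startIndex, long nowMs) {
            position = startIndex;
            lastAccessMs = nowMs;
            open = true;
        }

        int next(long nowMs) {
            if (!open || nowMs - lastAccessMs > leaseMs) {
                open = false; // lease expired: server-side scanner is released
                throw new UnknownScannerException();
            }
            lastAccessMs = nowMs;
            return rows[position++];
        }
    }

    /**
     * Scans every row. The client stalls once, for stallMs, just before
     * fetching the row at index stallAt (simulated clock, no real sleeping).
     */
    static List<Integer> scan(ToyRegionServer server, long timeoutMs,
                              int stallAt, long stallMs,
                              boolean resetOnLeaseTimeout) {
        List<Integer> results = new ArrayList<>();
        long now = 0;
        long lastNext = now;
        boolean stalled = false;
        server.openScanner(0, now);
        while (results.size() < server.rows.length) {
            if (!stalled && results.size() == stallAt) {
                now += stallMs; // client was busy with other work
                stalled = true;
            }
            try {
                results.add(server.next(now));
                lastNext = now;
            } catch (UnknownScannerException e) {
                boolean leaseTimedOut = now - lastNext > timeoutMs;
                if (leaseTimedOut && !resetOnLeaseTimeout) {
                    // Current behaviour: treat the late next() as fatal.
                    throw new ScannerTimeoutException();
                }
                // Region-move style recovery: reopen after the last row received.
                server.openScanner(results.size(), now);
                lastNext = now;
            }
        }
        return results;
    }

    public static void main(String[] args) {
        int[] rows = {10, 20, 30, 40};
        // Proposed behaviour: prints [10, 20, 30, 40]
        System.out.println(scan(new ToyRegionServer(rows, 60_000L), 60_000L, 2, 120_000L, true));
        try {
            scan(new ToyRegionServer(rows, 60_000L), 60_000L, 2, 120_000L, false);
        } catch (ScannerTimeoutException e) {
            // Current behaviour: prints ScannerTimeoutException
            System.out.println("ScannerTimeoutException");
        }
    }
}
```

In this model the region server drops the scanner after the lease expires in both runs, so a dead client still cannot pin server-side resources; the flag only decides whether a live but slow client gets an exception or a transparent restart.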

Not sure if I have overlooked something; as usual, code is there for a
reason :-)...

Regards,
Eric



-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Stack
Sent: Thursday, August 27, 2015 3:23 PM
To: HBase Dev List <[email protected]>
Subject: Re: Question on hbase.client.scanner.timeout.period

On Tue, Aug 25, 2015 at 8:03 AM, Eric Owhadi <[email protected]> wrote:

> Hello St.Ack,
> Thanks for your pointer, but I had already investigated JIRA
> https://issues.apache.org/jira/browse/HBASE-13090
> Unfortunately, this heartbeat protects against the RPC timeout, not the
> server-side lease timeout that we are experiencing right now. I have
> not seen an active JIRA fixing our issue.
> Only https://issues.apache.org/jira/browse/HBASE-6121 complains
> about the exact same issue, but it was never resolved.
>
>
Which issue? https://issues.apache.org/jira/browse/HBASE-6121 seems
unrelated.



> The heartbeat in JIRA 13090 protects against the situation where the
> server-side scanner takes so long to retrieve highly filtered
> information that it exceeds the RPC timeout (hbase.rpc.timeout).



> The timeout we are experiencing is the
> hbase.client.scanner.timeout.period,
> formerly known by its deprecated name hbase.regionserver.lease.period.
> The mechanism is different: here, region server scanners want to protect
> themselves against dead clients that never call "close", so that
> server-side scanner resources can be released. To do that, a lease
> mechanism is implemented: if more than hbase.regionserver.lease.period
> elapses between two next() calls, the server-side scanner will
> have been forcibly closed by this lease timeout safety mechanism. On
> the late next() call, the client receives a DNRIOE of type
> UnknownScannerException and assesses that it most likely comes
> from the lease timeout (and not from a region move),
> therefore throwing an exception instead of resetting the scanner (as it
> does in the region move scenario).
>
> HBase 1.1 does not address, as far as I have researched, the
> hbase.client.scanner.timeout.period issue we are facing.
>
>

Can you not have the high-level query that is being fed by a scan do
HBASE-13333? That is, tickle the ongoing scan on occasion just to say that
I'm still alive?

Otherwise, what would you suggest? A scan that does not time out? Or the
client being able to set a timeout in the Scan passed to the server?

Sorry for late reply,
St.Ack



> And yes, we will move to HBase 1.1, and also 1.0, since Cloudera and
> Hortonworks have mismatched versions in the next official builds that
> Trafodion will support.
>
> So my question is still open.
>
> Best regards,
> Eric Owhadi
>
>
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of
> Stack
> Sent: Monday, August 24, 2015 11:07 PM
> To: HBase Dev List
> Subject: Re: Question on hbase.client.scanner.timeout.period
>
> On Mon, Aug 24, 2015 at 4:48 PM, Eric Owhadi <[email protected]>
> wrote:
>
> > Hello everyone,
> > We have been facing a situation on Trafodion where we are hitting
> > the hbase.client.scanner.timeout.period scenario:
> > basically, when running queries that require spilling to disk because
> > of the complexity of what is involved, the underlying HBase scanner
> > serving one of the operations in the complex query cannot
> > call next() within the specified timeout... it is too busy taking
> > care of other business.
> > This is a legitimate scenario, and I was wondering why the code takes
> > special care, client side, to throw a ScannerTimeoutException when a
> > DNRIOE of type UnknownScannerException shows up and the
> > hbase.client.scanner.timeout.period has elapsed, instead of just
> > letting it go and resetting the scanner.
> >
> Scanners were redone in hbase 1.1. Can Trafodion come up onto hbase 1.1?
> See https://blogs.apache.org/hbase/entry/scan_improvements_in_hbase_1
> for summary.
> St.Ack
>
>
>
> > I imagine that the lease timeout implementation on the region server
> > side is meant to protect against leaking scanner objects
> > server side. But I am not sure why we would make the client
> > side throw this timeout exception, when in fact what just happened
> > was that the client was too busy to call next() on time.
> >
> > I am sure there is a reason, but cannot figure it out :-).
> >
> > BTW, I found this JIRA, talking about the exact same thing:
> > https://issues.apache.org/jira/browse/HBASE-6121 but with no resolution.
> >
>
>
> > Any help understanding the reason for the timeout thrown client side
> > instead of an automatic reset would be much appreciated.
> > Best regards,
> > Eric Owhadi
> >
>
