Hello St.Ack,
Thanks for the pointer, but I had already investigated JIRA
https://issues.apache.org/jira/browse/HBASE-13090
Unfortunately, that heartbeat protects against the RPC timeout, not against
the server-side lease timeout we are experiencing right now. I have not seen
an active JIRA fixing our issue.
Only https://issues.apache.org/jira/browse/HBASE-6121 describes the exact
same issue, and it was never resolved.

The heartbeat mechanism in HBASE-13090 protects against the situation where
the server-side scanner takes so long to retrieve heavily filtered data that
it exceeds the RPC timeout (hbase.rpc.timeout).
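For contrast, here is a minimal, self-contained Java sketch of the heartbeat
idea as I understand it. It is a simplified model, not the HBASE-13090 code:
ScanBatch and scanWithTimeLimit are hypothetical names, and the time limit
is an assumed parameter (how HBase actually derives it is not covered here).
The server-side loop stops early once the limit is reached and returns a
partial batch flagged as a heartbeat, so the RPC completes before
hbase.rpc.timeout and the client simply issues the next call.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

public class HeartbeatSketch {
    /** Hypothetical result of one scan RPC: matching rows plus a heartbeat flag. */
    static final class ScanBatch {
        final List<String> rows;
        final boolean heartbeat;

        ScanBatch(List<String> rows, boolean heartbeat) {
            this.rows = rows;
            this.heartbeat = heartbeat;
        }
    }

    /**
     * Hypothetical server-side scan step: examines rows until batchSize rows have
     * passed the filter, the input is exhausted, or timeLimitMillis has elapsed.
     * In the last case the partial batch is returned with heartbeat = true so the
     * RPC can complete before the client-side RPC timeout fires.
     */
    static ScanBatch scanWithTimeLimit(Iterator<String> rows, Predicate<String> filter,
                                       int batchSize, long timeLimitMillis) {
        long deadline = System.currentTimeMillis() + timeLimitMillis;
        List<String> matched = new ArrayList<>();
        while (rows.hasNext() && matched.size() < batchSize) {
            String row = rows.next();
            if (filter.test(row)) {
                matched.add(row);
            }
            // Heavily filtered scans may examine many rows without a match,
            // so the clock is checked after every row, not only after a match.
            if (System.currentTimeMillis() >= deadline) {
                return new ScanBatch(matched, true);
            }
        }
        return new ScanBatch(matched, false);
    }

    public static void main(String[] args) {
        List<String> rows = List.of("a1", "b1", "a2", "b2", "a3");
        // Generous limit: the whole input is scanned, no heartbeat needed.
        ScanBatch full = scanWithTimeLimit(rows.iterator(), r -> r.startsWith("a"), 10, 60_000L);
        // Zero limit: the scan stops after the first row and returns a heartbeat.
        ScanBatch early = scanWithTimeLimit(rows.iterator(), r -> r.startsWith("a"), 10, 0L);
        System.out.println(full.rows + " heartbeat=" + full.heartbeat);
        System.out.println(early.rows + " heartbeat=" + early.heartbeat);
    }
}
```

Note that this only shortens each individual RPC; it does nothing for a
client that waits too long between two next() calls, which is the lease
timeout case we are hitting.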
The timeout we are experiencing is hbase.client.scanner.timeout.period,
formerly known by the deprecated name hbase.regionserver.lease.period.
The mechanism is different: here, region server scanners protect
themselves against dead clients that never call close(), so that
server-side scanner resources can be released. To do that, a lease
mechanism is implemented: if more than hbase.regionserver.lease.period
elapses between two next() calls, the lease timeout forcibly closes the
server-side scanner. On the late next() call, the client receives a DNRIOE
(DoNotRetryIOException) of type UnknownScannerException. Because the
timeout period has elapsed, the client concludes that the exception most
likely comes from the lease timeout (and not from a region move), and
therefore throws a ScannerTimeoutException instead of resetting the scanner
(as it would in the region move scenario).
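The interaction can be sketched in plain Java. This is a simplified model
under stated assumptions, not HBase's region server or client scanner code:
LeaseTable, renew, expire, and onUnknownScanner are hypothetical names, and
time is passed in explicitly instead of being read from a clock. The only
behavior it is meant to mirror is the one described above: the server drops
scanners whose lease was not renewed within the lease period, and the client
treats an UnknownScannerException that arrives after the timeout period as a
lease timeout rather than a region move.

```java
import java.util.HashMap;
import java.util.Map;

public class ScannerLeaseSketch {
    /** What the client does after receiving an UnknownScannerException. */
    enum Action { THROW_SCANNER_TIMEOUT, RESET_SCANNER }

    /** Hypothetical server-side lease table: scanner id -> time of last lease renewal. */
    static final class LeaseTable {
        private final long leasePeriodMillis;
        private final Map<Long, Long> lastRenewed = new HashMap<>();

        LeaseTable(long leasePeriodMillis) {
            this.leasePeriodMillis = leasePeriodMillis;
        }

        /** Registers a newly opened scanner and starts its lease. */
        void open(long scannerId, long nowMillis) {
            lastRenewed.put(scannerId, nowMillis);
        }

        /**
         * Called on every next(). Returns false if the scanner no longer exists,
         * which is the point where the server would answer with an
         * UnknownScannerException.
         */
        boolean renew(long scannerId, long nowMillis) {
            if (!lastRenewed.containsKey(scannerId)) {
                return false;
            }
            lastRenewed.put(scannerId, nowMillis);
            return true;
        }

        /** Periodic sweep: force-closes every scanner whose lease was not renewed in time. */
        void expire(long nowMillis) {
            lastRenewed.values().removeIf(last -> nowMillis - last > leasePeriodMillis);
        }
    }

    /**
     * Client-side decision, modeled on the behavior described above: if the
     * timeout period has passed since the last next(), blame the lease and
     * throw; otherwise assume the region moved and reopen the scanner.
     */
    static Action onUnknownScanner(long lastNextMillis, long nowMillis, long timeoutPeriodMillis) {
        return lastNextMillis + timeoutPeriodMillis < nowMillis
                ? Action.THROW_SCANNER_TIMEOUT
                : Action.RESET_SCANNER;
    }

    public static void main(String[] args) {
        long period = 60_000L;                 // assumed lease / timeout period: 60 s
        LeaseTable leases = new LeaseTable(period);
        leases.open(1L, 0L);
        leases.renew(1L, 30_000L);             // client calls next() in time
        leases.expire(100_000L);               // 70 s without next(): scanner closed
        boolean stillOpen = leases.renew(1L, 100_000L);   // the late next()
        System.out.println("scanner still open: " + stillOpen);
        System.out.println("client action: " + onUnknownScanner(30_000L, 100_000L, period));
    }
}
```

The point the sketch makes is that the client infers the lease expiry from
elapsed time alone, so a client that is merely busy (not dead) ends up with
a timeout exception rather than a transparent scanner reset.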

As far as I have researched, HBase 1.1 does not address the
hbase.client.scanner.timeout.period issue we are facing.

And yes, we will move to HBase 1.1 and 1.0, since Cloudera and Hortonworks
ship mismatched HBase versions in the next official builds Trafodion will
support.

So my question is still open.

Best regards,
Eric Owhadi



-----Original Message-----
From: saint....@gmail.com [mailto:saint....@gmail.com] On Behalf Of Stack
Sent: Monday, August 24, 2015 11:07 PM
To: HBase Dev List
Subject: Re: Question on hbase.client.scanner.timeout.period

On Mon, Aug 24, 2015 at 4:48 PM, Eric Owhadi <eric.owh...@esgyn.com> wrote:

> Hello everyone,
> We have been facing a situation in Trafodion where we are hitting the
> hbase.client.scanner.timeout.period scenario:
> basically, when running queries complex enough to require spilling to
> disk, the underlying HBase scanner serving one of the operations in the
> query cannot call next() within the specified timeout... it is too busy
> taking care of other business.
> This is a legitimate scenario, and I was wondering why the code takes
> special care, on the client side, to throw a ScannerTimeoutException when
> a DNRIOE of type UnknownScannerException shows up and the
> hbase.client.scanner.timeout.period has elapsed, instead of just letting
> it go and resetting the scanner.
>

Scanners were redone in HBase 1.1. Can Trafodion move up to HBase 1.1?
See https://blogs.apache.org/hbase/entry/scan_improvements_in_hbase_1 for a
summary.
St.Ack



> I imagine that the lease timeout implementation on the region server side
> is meant to protect against leaking scanner objects on the server. But I
> am not sure why the client side is made to throw this timeout exception,
> when in fact all that happened was that the client was too busy to call
> next() in time.
>
> I am sure there is a reason, but cannot figure it out :-).
>
> BTW, I found this JIRA, which describes exactly the same thing:
> https://issues.apache.org/jira/browse/HBASE-6121, but it has no resolution.
>


> Any help understanding why the timeout is thrown client side instead of
> an automatic reset would be much appreciated.
> Best regards,
> Eric Owhadi
>
