Re: Separately configurable client meta rpc timeout

Andrew Purtell Mon, 20 Jun 2022 09:46:29 -0700

Our default position should be to resist adding new configuration
variables, but in this case, I think it makes sense.
+1 for adding a distinct timeout setting for meta. Definitely a valid
special case.
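
As a rough illustration of what a distinct setting could look like on the
client side, here is a minimal sketch. It is not HBase code: a plain Map
stands in for the client configuration, the class and method names are
hypothetical, and the fallback to hbase.rpc.timeout when the proposed
hbase.client.meta.rpc.timeout is unset is an assumption, not something
decided in this thread.

```java
import java.util.HashMap;
import java.util.Map;

class MetaTimeoutSketch {
    static final String RPC_TIMEOUT_KEY = "hbase.rpc.timeout";
    // Proposed in this thread; not an existing setting at the time of writing.
    static final String META_RPC_TIMEOUT_KEY = "hbase.client.meta.rpc.timeout";
    // Placeholder default for the sketch, in milliseconds.
    static final int DEFAULT_RPC_TIMEOUT_MS = 60_000;

    // Reads an int setting, returning the given default when the key is absent.
    static int getInt(Map<String, String> conf, String key, int defaultValue) {
        String value = conf.get(key);
        return value == null ? defaultValue : Integer.parseInt(value.trim());
    }

    // Timeout applied to ordinary user RPCs.
    static int userRpcTimeoutMs(Map<String, String> conf) {
        return getInt(conf, RPC_TIMEOUT_KEY, DEFAULT_RPC_TIMEOUT_MS);
    }

    // Timeout applied to meta RPCs. Assumption: falls back to the user rpc
    // timeout when the meta-specific key is not set.
    static int metaRpcTimeoutMs(Map<String, String> conf) {
        return getInt(conf, META_RPC_TIMEOUT_KEY, userRpcTimeoutMs(conf));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(RPC_TIMEOUT_KEY, "500");           // aggressive user-facing timeout
        conf.put(META_RPC_TIMEOUT_KEY, "10000");    // more patient meta timeout
        System.out.println("user rpc timeout ms: " + userRpcTimeoutMs(conf));
        System.out.println("meta rpc timeout ms: " + metaRpcTimeoutMs(conf));
    }
}
```

The point of the fallback is that existing deployments would keep their
current behavior until they opt in to the new key.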


On Mon, Jun 20, 2022 at 9:09 AM 张铎(Duo Zhang) <[email protected]> wrote:

> You can see the comments at the top of the method, which explain why we
> honor neither the rpc timeout nor the operation timeout.
>
> So here maybe we should introduce a special scan timeout for the meta
> table?
>
> Bryan Beaudreault <[email protected]> wrote on Mon, Jun 20, 2022 at 23:45:
>
> > Hi Duo, just getting back to this. Thanks for your response.
> >
> > Actually I'm pretty sure there is a simple retry for all scanner next
> > calls. In the master branch this occurs in
> > AsyncScanSingleRegionRpcRetryingCaller#call(), which is called from
> > #next(). The stub.scan() call in call() passes an onComplete callback,
> > which handles errors by calling onError. At the end of onError, a retry
> > is scheduled, which calls call() again. See
> > https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncScanSingleRegionRpcRetryingCaller.java#L584
> >
> > Let me know if I'm missing something. The branch-2 blocking client has
> > similar logic.
> >
> > But most meta calls are small scans that return their results in the
> > openScanner call anyway. So improperly tuned (too short) rpc timeouts
> > can cause retries in openScanner, and probably in next() as well, if
> > applicable.
> >
> > I took another look and we do not have any special
> > hbase.client.scanner.timeout.period or hbase.rpc.timeout for meta.
> > Unless I'm missing something in the link above, I'm going to move forward
> > with adding these in the jira.
> >
> > On Tue, May 31, 2022 at 8:55 PM 张铎(Duo Zhang) <[email protected]>
> > wrote:
> >
> > > Scan will not honor the operation timeout configuration, as its logic
> > > is a bit different from normal read/write operations.
> > >
> > > For scan, usually there is no simple 'retry' (except for the open
> > > scanner call). If you hit an error, you usually need to restart the
> > > scan by making a new open scanner call, not retry the scanner next call.
> > >
> > > IIRC we have a special hbase.client.scanner.timeout.period and also a
> > > special hbase.rpc.timeout for meta?
> > >
> > > Thanks.
> > >
> > > Bryan Beaudreault <[email protected]> wrote on Wed, Jun 1, 2022 at 00:47:
> > >
> > > > Hi all,
> > > >
> > > > We just had a production issue where a user-facing API service had a
> > > > low hbase.rpc.timeout, and this contributed significantly to a meta
> > > > hotspotting issue. The issue is that user requests can only be
> > > > submitted once the necessary RegionLocation is in the MetaCache. But
> > > > in a meta hotspotting scenario it may be impossible to return a
> > > > RegionLocation for hbase:meta in a timely manner. This will trigger
> > > > the rpc timeout, which may result in a number of retries. This retry
> > > > storm (across many client instances) can further exacerbate meta
> > > > hotspotting issues.
> > > >
> > > > My thought is to decouple meta rpc timeout from user rpc timeouts,
> > > > because generally you would prefer to allow a longer meta request to
> > > > succeed because it may unblock many user requests.
> > > >
> > > > I think our current timeouts for meta scans are a bit confusing.
> > > > There's a hbase.client.meta.operation.timeout, but actually that does
> > > > not apply to meta scans. Instead they are configured via
> > > > hbase.rpc.timeout and hbase.client.scanner.timeout.period.
> > > >
> > > > I was considering special casing meta scans so that they are
> > > > configured via (new) hbase.client.meta.rpc.timeout and (existing)
> > > > hbase.client.meta.operation.timeout. This would be different from
> > > > typical scan requests, but may be more intuitive overall? Does anyone
> > > > have any opinions?
> > > >
> > > > See https://issues.apache.org/jira/browse/HBASE-27078
> > > >
> > >
> >
>
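
To make the retry-storm reasoning in the quoted thread concrete, here is a
toy model of a single client. It is not HBase code: the class and method
names, the fixed server latency, and the attempt cap are all assumptions
chosen for illustration. It only counts how many RPC attempts one meta
lookup generates when each attempt is abandoned after the rpc timeout and
an onError-style handler schedules another one.

```java
class RetryStormSketch {
    // Toy model, not HBase code: counts the RPC attempts one client issues
    // for a single meta lookup when every attempt takes serverLatencyMs on
    // the server and the client gives up on it after rpcTimeoutMs.
    static int attemptsForOneLookup(long serverLatencyMs, long rpcTimeoutMs, int maxAttempts) {
        int attempts = 0;
        while (attempts < maxAttempts) {
            attempts++;
            if (serverLatencyMs <= rpcTimeoutMs) {
                return attempts; // the response arrives before the timeout
            }
            // Timed out: an onError-style handler schedules another attempt.
        }
        return attempts; // retries exhausted without a successful response
    }

    public static void main(String[] args) {
        long hotMetaLatencyMs = 2_000; // assumed latency of an overloaded meta region
        int maxAttempts = 5;           // hypothetical retry cap
        System.out.println("500 ms timeout: "
            + attemptsForOneLookup(hotMetaLatencyMs, 500, maxAttempts) + " attempts");
        System.out.println("10 s timeout: "
            + attemptsForOneLookup(hotMetaLatencyMs, 10_000, maxAttempts) + " attempts");
    }
}
```

Under these assumed numbers, the short timeout issues five attempts against
meta for one lookup, while the longer timeout issues a single attempt.
Multiplied across many client instances, that difference is the extra load
described above.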


-- 
Best regards,
Andrew

Unrest, ignorance distilled, nihilistic imbeciles -
    It's what we’ve earned
Welcome, apocalypse, what’s taken you so long?
Bring us the fitting end that we’ve been counting on
   - A23, Welcome, Apocalypse
