Scans do not honor the operation timeout configuration, because their logic differs somewhat from normal read/write operations.
For scans there is usually no simple "retry" (except for the open scanner call). If you hit an error, you usually need to restart the scan by making a new open scanner call, rather than retrying the scanner next call.

IIRC we have a special hbase.client.scanner.timeout.period, and also a special hbase.rpc.timeout for meta?

Thanks.

Bryan Beaudreault <[email protected]> wrote on Wed, Jun 1, 2022 at 00:47:

> Hi all,
>
> We just had a production issue where a user-facing API service had a low
> hbase.rpc.timeout, and this majorly contributed to a meta hotspotting
> issue. The issue is, user requests can only be submitted once the necessary
> RegionLocation is in the MetaCache. But in a meta hotspotting scenario it
> may be impossible to return a RegionLocation for hbase:meta in a timely
> manner. This will trigger the rpc timeout, which may result in a number of
> retries. This retry storm (across many client instances) can further
> exacerbate meta hotspotting issues.
>
> My thought is to decouple meta rpc timeout from user rpc timeouts, because
> generally you would prefer to allow a longer meta request to succeed
> because it may unblock many user requests.
>
> I think our current timeouts for meta scans are a bit confusing. There's
> a hbase.client.meta.operation.timeout, but actually that does not apply to
> meta scans. Instead they are configured via hbase.rpc.timeout
> and hbase.client.scanner.timeout.period.
>
> I was considering special casing meta scans so that they are configured via
> (new) hbase.client.meta.rpc.timeout and (existing)
> hbase.client.meta.operation.timeout. This would be different from typical
> scan requests, but may be more intuitive overall? Does anyone have any
> opinions?
>
> See https://issues.apache.org/jira/browse/HBASE-27078
>
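
For reference, a rough sketch of the restart-rather-than-retry behaviour described in the reply at the top of this message. This is plain Java and deliberately does not use the HBase client API: RowScanner, ScannerFactory, scanAll and flakyTable are hypothetical names made up for illustration, and the real client logic is more involved. The idea is only that a failed next call is not repeated on the same scanner; a new scanner is opened after the last row already returned.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ScanRestartSketch {

    /** Hypothetical stand-in for an open scanner; NOT the HBase API. */
    interface RowScanner {
        /** Returns the next row key, or null when the scan is exhausted. */
        String next() throws Exception;
    }

    /** Hypothetical stand-in for the "open scanner" call. */
    interface ScannerFactory {
        /** Opens a scanner at the first row strictly after afterRow (null = from the start). */
        RowScanner open(String afterRow) throws Exception;
    }

    /**
     * Reads all rows. A failure in next() is handled by opening a NEW scanner
     * after the last row seen, not by calling next() again on the old one.
     * A failure in open() itself simply propagates in this sketch.
     */
    static List<String> scanAll(ScannerFactory factory, int maxRestarts) throws Exception {
        List<String> rows = new ArrayList<>();
        String lastRow = null;
        int restarts = 0;
        RowScanner scanner = factory.open(lastRow);
        while (true) {
            String row;
            try {
                row = scanner.next();
            } catch (Exception e) {
                if (++restarts > maxRestarts) {
                    throw e; // give up after too many restarts
                }
                scanner = factory.open(lastRow); // restart the scan
                continue;
            }
            if (row == null) {
                return rows; // scan finished
            }
            rows.add(row);
            lastRow = row;
        }
    }

    /** In-memory table (sorted row keys) whose failOnCall-th next() call throws once. */
    static ScannerFactory flakyTable(List<String> sortedRows, int failOnCall) {
        int[] calls = {0}; // shared across all scanners opened by this factory
        return afterRow -> {
            Iterator<String> it = sortedRows.stream()
                    .filter(r -> afterRow == null || r.compareTo(afterRow) > 0)
                    .iterator();
            return () -> {
                if (++calls[0] == failOnCall) {
                    throw new IllegalStateException("simulated scanner failure");
                }
                return it.hasNext() ? it.next() : null;
            };
        };
    }

    public static void main(String[] args) throws Exception {
        // The third next() call fails; the scan restarts after row "b".
        List<String> rows = scanAll(flakyTable(List.of("a", "b", "c", "d"), 3), 2);
        System.out.println(rows); // prints [a, b, c, d]
    }
}
```

Because each restart is a fresh open scanner call, a per-operation deadline that spans "the whole operation including retries" does not map cleanly onto a scan, which is consistent with the point above that scans are governed by their own timeout settings.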
