Ray Mattingly created HBASE-28672:
-------------------------------------
Summary: Large batch requests can be blocked indefinitely by quotas
Key: HBASE-28672
URL: https://issues.apache.org/jira/browse/HBASE-28672
Project: HBase
Issue Type: Improvement
Affects Versions: 2.6.0
Reporter: Ray Mattingly
At my day job we are trying to implement default quotas for a variety of access
patterns. We began by introducing a default read IO limit per-user, per-machine
— this has been very successful in reducing hotspots, even on clusters with
thousands of distinct users.
While implementing a default writes/second throttle, I realized that doing so
would put us in a precarious situation where large-enough batches may never
succeed. If your batch size is greater than your TimeLimiter's max throughput,
then you will always fail in the quota estimation stage. Meanwhile, [IO
estimates are deliberately more
optimistic|https://github.com/apache/hbase/blob/bdb3f216e864e20eb2b09352707a751a5cf7460f/hbase-server/src/main/java/org/apache/hadoop/hbase/quotas/DefaultOperationQuota.java#L192-L193],
which can let large requests do targeted oversubscription of an IO quota:
{code:java}
// assume 1 block required for reads. this is probably a low estimate, which is okay
readConsumed = numReads > 0 ? blockSizeBytes : 0;
{code}
This is okay because the Limiter's availability will go negative and force a
longer backoff on subsequent requests. I believe this is preferable UX compared
to a doomed throttling loop.
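To make the failure mode concrete, here is a minimal sketch (the class and method names are illustrative, not HBase's actual quota API): a limiter that refills at most {{max}} units per interval can never admit a request whose estimated cost exceeds {{max}}, no matter how many times the client retries.

{code:java}
// Hypothetical demo, not the real RateLimiter: if the estimated cost is
// computed as the raw batch size, any batch larger than the limiter's max
// throughput fails the admission check on every retry, forever.
public class DoomedBatchDemo {
    static final long LIMITER_MAX_THROUGHPUT = 100; // e.g. 100 writes/sec quota

    // Naive estimation: estimated cost == batchSize. A full limiter can
    // supply at most LIMITER_MAX_THROUGHPUT units, so this check is a
    // permanent "no" for any batchSize above that ceiling.
    static boolean canEverBeAdmitted(long batchSize) {
        return batchSize <= LIMITER_MAX_THROUGHPUT;
    }

    public static void main(String[] args) {
        System.out.println(canEverBeAdmitted(50));  // small batch: admitted eventually
        System.out.println(canEverBeAdmitted(500)); // oversized batch: throttled forever
    }
}
{code}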
In my opinion, we should do something similar in batch request estimation:
estimate a batch request's workload as {{Math.min(batchSize,
limiterMaxThroughput)}} rather than simply {{batchSize}}.
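A sketch of the proposed clamp (again with illustrative names, not the actual {{DefaultOperationQuota}} signature): the estimate is capped at the limiter's max throughput, so an oversized batch is admitted once, drives the limiter's availability negative, and is paid for by longer backoffs on subsequent requests rather than failing estimation forever.

{code:java}
// Hypothetical helper showing the proposed estimation, mirroring the
// optimistic read estimate already used for IO quotas.
public class BatchEstimate {
    // Cap the estimated workload at what the limiter could ever supply in
    // one interval. Batches under the cap are estimated exactly as before.
    static long estimateBatchConsumption(long batchSize, long limiterMaxThroughput) {
        return Math.min(batchSize, limiterMaxThroughput);
    }

    public static void main(String[] args) {
        // oversized batch: clamped to the limiter's ceiling instead of doomed
        System.out.println(estimateBatchConsumption(500, 100));
        // batch within quota: estimate unchanged
        System.out.println(estimateBatchConsumption(50, 100));
    }
}
{code}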
--
This message was sent by Atlassian Jira
(v8.20.10#820010)