rmdmattingly opened a new pull request, #5683: URL: https://github.com/apache/hbase/pull/5683
We've been seeing RpcThrottlingException thrown with a 0ms waitInterval. This is odd and wasteful, since the client will immediately retry without any backoff.

I think the problem is related to the synchronization of RateLimiter. The TimeBasedLimiter checkQuota method does the following:

```
if (!reqSizeLimiter.canExecute(estimateWriteSize + estimateReadSize)) {
  RpcThrottlingException.throwRequestSizeExceeded(
    reqSizeLimiter.waitInterval(estimateWriteSize + estimateReadSize));
}
```

Both canExecute and waitInterval are synchronized, but we call them independently. So under high concurrency it's possible for canExecute to return false and for waitInterval to then return 0 (that is, a second canExecute call would have returned true).

This PR simplifies the API by having canExecute return the waitInterval, which is greater than 0 if we should throttle the client.

This also implicitly fixes a bug with request number quotas: we were returning a waitInterval that assumed a single resource was consumed, regardless of how many resources the operation actually consumed: https://github.com/apache/hbase/blob/9c8c9e7fbf8005ea89fa9b13d6d063b9f0240443/hbase-server/src/main/java/org/apache/hadoop/hbase/quotas/TimeBasedLimiter.java#L144-L146

I'm marking this as a draft while I deploy it to a test cluster to confirm that it resolves our 0ms throttle backoffs.
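To make the check-then-act race concrete, here is a minimal sketch that replays it deterministically. Everything in it is an assumption for illustration: `LimiterSketch` is a hypothetical fixed-window limiter with an injectable clock, not HBase's actual RateLimiter, and it only shares the method names canExecute and waitInterval with the code quoted in the description.

```java
import java.util.function.LongSupplier;

/** Hypothetical fixed-window limiter for illustration; not HBase's RateLimiter. */
class LimiterSketch {
  private final long limit;         // units allowed per window
  private final long windowMs;      // window length in milliseconds
  private final LongSupplier clock; // injectable clock so the interleaving can be replayed
  private long available;
  private long windowStart;

  LimiterSketch(long limit, long windowMs, LongSupplier clock) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.clock = clock;
    this.available = limit;
    this.windowStart = clock.getAsLong();
  }

  // Refill once the current window has elapsed. Callers must hold the lock.
  private void refill(long now) {
    if (now - windowStart >= windowMs) {
      available = limit;
      windowStart = now;
    }
  }

  synchronized void consume(long amount) {
    refill(clock.getAsLong());
    available -= amount;
  }

  // First half of the two-call pattern: each call locks on its own.
  synchronized boolean canExecute(long amount) {
    refill(clock.getAsLong());
    return amount <= available;
  }

  // Decides and computes the wait under a single lock acquisition.
  // Returns 0 if the request may proceed, else milliseconds until the next refill.
  synchronized long waitInterval(long amount) {
    long now = clock.getAsLong();
    refill(now);
    return amount <= available ? 0 : windowStart + windowMs - now;
  }

  public static void main(String[] args) {
    // Two-call pattern: the window rolls over between the two calls.
    long[] now = {0};
    LimiterSketch racy = new LimiterSketch(10, 1000, () -> now[0]);
    racy.consume(10);                       // quota exhausted at t=0
    now[0] = 999;
    boolean allowed = racy.canExecute(1);   // false: still inside the first window
    now[0] = 1000;                          // time passes before the second call
    long racyWait = racy.waitInterval(1);   // 0: the limiter has refilled
    System.out.println("two calls: allowed=" + allowed + " wait=" + racyWait);
    // prints "two calls: allowed=false wait=0"

    // Single-call pattern: the decision and the wait come from the same snapshot.
    long[] now2 = {0};
    LimiterSketch single = new LimiterSketch(10, 1000, () -> now2[0]);
    single.consume(10);
    now2[0] = 999;
    long wait = single.waitInterval(1);     // 1: throttle only if wait > 0
    System.out.println("single call: wait=" + wait);
    // prints "single call: wait=1"
  }
}
```

In the two-call replay the caller would throw a throttling exception carrying a 0ms wait, which matches the symptom described above. With a single call, a throttle decision always comes with a positive wait, because both are derived under the same lock acquisition.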