[PR] HBASE-28359 Improve quota RateLimiter synchronization [hbase]

via GitHub Thu, 15 Feb 2024 06:30:25 -0800


rmdmattingly opened a new pull request, #5683:
URL: https://github.com/apache/hbase/pull/5683


   We've been experiencing RpcThrottlingException with 0ms waitInterval. This 
seems odd and wasteful, since the client side will immediately retry without 
backoff. I think the problem is related to the synchronization of RateLimiter.
   
   The TimeBasedLimiter checkQuota method does the following:
   ```
   if (!reqSizeLimiter.canExecute(estimateWriteSize + estimateReadSize)) {
     RpcThrottlingException.throwRequestSizeExceeded(
       reqSizeLimiter.waitInterval(estimateWriteSize + estimateReadSize));
   } 
   ```
   
   Both canExecute and waitInterval are synchronized, but we call them 
independently, so the lock is released between the two calls. Under high 
concurrency the limiter's state can change in that gap: canExecute returns 
false, but by the time waitInterval runs it returns 0, because canExecute 
would now have returned true.
   
   This PR simplifies the API by having canExecute return the waitInterval, 
which is greater than 0 if we should throttle the client. The check and the 
wait interval then come from a single synchronized call.
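
   As a minimal sketch of the idea above (not the actual HBase change), a 
limiter can check capacity and compute the wait in one synchronized call, so 
the two answers cannot disagree. `SketchRateLimiter`, `getWaitIntervalMs`, the 
fixed-window refill, and the explicit `nowMs` clock parameter are all 
hypothetical simplifications for illustration:

```java
// Hypothetical, simplified limiter for illustration; not HBase's RateLimiter.
final class SketchRateLimiter {
  private final long limit;     // units allowed per refill period
  private final long periodMs;  // length of the refill period in milliseconds
  private long available;       // units currently available
  private long lastRefillMs;    // time of the last refill

  SketchRateLimiter(long limit, long periodMs, long nowMs) {
    this.limit = limit;
    this.periodMs = periodMs;
    this.available = limit;
    this.lastRefillMs = nowMs;
  }

  /**
   * Checks capacity and computes the wait under a single lock acquisition.
   * Returns 0 if `amount` can run now, otherwise the milliseconds to wait (> 0).
   */
  synchronized long getWaitIntervalMs(long amount, long nowMs) {
    refillIfDue(nowMs);
    if (amount <= available) {
      return 0;
    }
    // Fixed-window assumption: capacity returns at the next full refill.
    return Math.max(1, lastRefillMs + periodMs - nowMs);
  }

  /** Deducts units after the caller has decided to run the operation. */
  synchronized void consume(long amount) {
    available -= amount;
  }

  // Only called while holding the lock.
  private void refillIfDue(long nowMs) {
    if (nowMs - lastRefillMs >= periodMs) {
      available = limit;
      lastRefillMs = nowMs;
    }
  }
}
```

   A caller would throw only when the returned value is greater than 0, and 
pass that same value along as the wait interval, so a throttling exception can 
never carry a 0ms wait.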
   
   This also implicitly fixes a bug with request number quotas: we were 
returning a waitInterval that assumed a single resource would be consumed, 
regardless of how many resources the operation actually consumes: 
https://github.com/apache/hbase/blob/9c8c9e7fbf8005ea89fa9b13d6d063b9f0240443/hbase-server/src/main/java/org/apache/hadoop/hbase/quotas/TimeBasedLimiter.java#L144-L146
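
   To illustrate why the amount matters, here is a small sketch assuming a 
limiter that refills continuously at `limit` units per `periodMs`. The class 
`WaitIntervalSketch` and its method are hypothetical and only show the 
arithmetic; they are not the HBase implementation:

```java
// Hypothetical helper for illustration; assumes continuous (average-rate) refill.
final class WaitIntervalSketch {
  private WaitIntervalSketch() {}

  /**
   * Milliseconds until `amount` units are available, given `available` units
   * now and a refill rate of `limit` units per `periodMs`.
   */
  static long waitIntervalMs(long limit, long periodMs, long available, long amount) {
    if (amount <= available) {
      return 0;
    }
    long deficit = amount - available;
    // Ceiling division so the caller never retries slightly too early.
    return (deficit * periodMs + limit - 1) / limit;
  }
}
```

   With a limit of 100 requests per 1000ms and nothing available, an operation 
that consumes 50 requests needs to wait 500ms under this model, whereas a 
wait computed for a single request is only 10ms, so the client would retry 
well before the quota could admit it.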
   
   I'm marking this as a draft while I deploy it to a test cluster to 
confirm that it resolves our 0ms throttle backoffs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
