[ https://issues.apache.org/jira/browse/HBASE-28453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17831083#comment-17831083 ]
Hudson commented on HBASE-28453:
--------------------------------

Results for branch master
[build #1037 on builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/1037/]: (x) *{color:red}-1 overall{color}*
----
details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/1037/General_20Nightly_20Build_20Report/]

(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/1037/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]

(x) {color:red}-1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/master/1037/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(/) {color:green}+1 client integration test{color}


> Support a middle ground between the Average and Fixed interval rate limiters
> ----------------------------------------------------------------------------
>
>                 Key: HBASE-28453
>                 URL: https://issues.apache.org/jira/browse/HBASE-28453
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 2.6.0
>            Reporter: Ray Mattingly
>            Assignee: Ray Mattingly
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.6.0, 3.0.0-beta-2
>
>         Attachments: Screenshot 2024-03-21 at 2.08.51 PM.png, Screenshot 2024-03-21 at 2.30.01 PM.png
>
>
> h3. Background
> HBase quotas support two rate limiters: a "fixed" and an "average" interval rate limiter.
>
> h4. FixedIntervalRateLimiter
> The fixed interval rate limiter is the simpler one: it has a TimeUnit, say 1 second, and it refills the resource allotment once per interval. So you may get 10 resources every second, and if you exhaust all 10 resources in the first millisecond of an interval, you will need to wait 999ms to acquire even 1 more resource.
>
> h4. AverageIntervalRateLimiter
> The average interval rate limiter, HBase's default, refills the resource allotment on a more flexible schedule. Extending the previous example, say you have a 10 reads/sec quota and you have exhausted all 10 resources within 1ms of the last full refill. If you then request 1 more read, rather than returning a 999ms wait interval that points at the next full refill, the rate limiter recognizes that you only need to wait 99ms before 1 read is available. Once 100ms has passed in aggregate since the last full refill, it refills 1/10th of the limit, which is enough to satisfy a request for 1/10th of the resources.
>
> h3. The Problems with Current RateLimiters
> The problem with the fixed interval rate limiter is that it is too strict from a latency perspective: it leaves us unable to consistently fully subscribe to our quota limits.
> The problem with the average interval rate limiter is that, in practice, it is far too optimistic. For example, a real rate limiter might limit read IO to 100MB/sec per machine. Any multiget that comes in will require only a tiny fraction of this limit; a 64KB block, for example, is only about 0.06% of the total. As a result, the vast majority of wait intervals end up being tiny, often under 5ms. This can cause the opposite of what you intended: setting up a throttle causes a DDoS of your RPC layer through continuous throttling and near-immediate retrying. I've discussed this problem in https://issues.apache.org/jira/browse/HBASE-28429 and proposed a minimum wait interval as the solution there; after some more thinking, I believe this new rate limiter would be a less hacky fix for this shortcoming, so I'd like to close that Jira in favor of this one.
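To make the two behaviors concrete, here is a simplified Python model that reproduces the 10 reads/sec example and the 100MB/sec read IO example. It is not HBase's `FixedIntervalRateLimiter` or `AverageIntervalRateLimiter` code. The class and method names (`FixedIntervalLimiter`, `AverageIntervalLimiter`, `wait_ms`, `try_acquire`) are hypothetical. The model assumes integer resource units and caller-supplied millisecond timestamps. It also assumes that the average-style limiter only advances its refill timestamp once at least one whole unit has been refilled.

```python
class _SimpleLimiter:
    """Shared acquire logic: succeed only when no wait is required (hypothetical model)."""

    available: int

    def wait_ms(self, amount: int, now_ms: int) -> int:
        raise NotImplementedError

    def try_acquire(self, amount: int, now_ms: int) -> bool:
        if self.wait_ms(amount, now_ms) == 0:
            self.available -= amount
            return True
        return False


class FixedIntervalLimiter(_SimpleLimiter):
    """Refills the whole allotment only at each interval boundary."""

    def __init__(self, limit: int, interval_ms: int, start_ms: int = 0) -> None:
        self.limit = limit
        self.interval_ms = interval_ms
        self.available = limit
        self.next_refill_ms = start_ms + interval_ms

    def wait_ms(self, amount: int, now_ms: int) -> int:
        if now_ms >= self.next_refill_ms:
            # Full refill, then jump to the first boundary after now_ms.
            self.available = self.limit
            skipped = (now_ms - self.next_refill_ms) // self.interval_ms
            self.next_refill_ms += (skipped + 1) * self.interval_ms
        if self.available >= amount:
            return 0
        # Nothing comes back until the next boundary.
        return self.next_refill_ms - now_ms


class AverageIntervalLimiter(_SimpleLimiter):
    """Refills in proportion to the time elapsed since the last refill."""

    def __init__(self, limit: int, interval_ms: int, start_ms: int = 0) -> None:
        self.limit = limit
        self.interval_ms = interval_ms
        self.available = limit
        self.last_refill_ms = start_ms

    def wait_ms(self, amount: int, now_ms: int) -> int:
        gained = self.limit * (now_ms - self.last_refill_ms) // self.interval_ms
        if gained > 0:
            self.available = min(self.limit, self.available + gained)
            self.last_refill_ms = now_ms
        if self.available >= amount:
            return 0
        missing = amount - self.available
        needed_ms = -(-missing * self.interval_ms // self.limit)  # ceiling division
        # Time already elapsed since the last refill counts toward the wait.
        return max(0, needed_ms - (now_ms - self.last_refill_ms))


fixed = FixedIntervalLimiter(10, 1000)
fixed.try_acquire(10, 1)              # exhaust 10 reads at t=1ms
print(fixed.wait_ms(1, 1))            # 999

average = AverageIntervalLimiter(10, 1000)
average.try_acquire(10, 1)
print(average.wait_ms(1, 1))          # 99

MiB = 1024 * 1024
io = AverageIntervalLimiter(100 * MiB, 1000)  # ~100MB/sec of read IO
io.try_acquire(100 * MiB, 0)
print(io.wait_ms(64 * 1024, 0))       # 1 (a 64KB block is ~0.06% of the limit)
```

Under this model, an exhausted 100MB/sec limiter asks for only a 1ms wait before a 64KB read. A throttled client that honors the wait therefore retries almost immediately, which produces the throttle/retry loop described above.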
>
> See the attached chart, where I put a 10k req/sec/machine throttle in place for this user at 10:43 to try to curb this high traffic. The result was a huge spike in req/sec, caused by the throttle/retry loop that the AverageIntervalRateLimiter created.
>
> h3. Original Proposal: PartialIntervalRateLimiter as a Solution
> I've implemented a RateLimiter that allows partial chunks of the overall interval to be refilled; by default these chunks are 10% of the interval (100ms of a 1s interval). I've deployed this to a test cluster at my day job and have seen it really help our ability to fully subscribe to a quota limit without executing superfluous retries. See the other attached chart, which shows a cluster undergoing a rolling restart from FixedIntervalRateLimiter to my new PartialIntervalRateLimiter, after which it is able to fully subscribe to its allotted 25MB/sec/machine read IO quota.
>
> h3. Updated Proposal: Improving FixedIntervalRateLimiter
> Rather than implement a new rate limiter, we can make a lower-touch change that just adds support for a refill interval shorter than the time unit on a FixedIntervalRateLimiter. This can be a no-op change for anyone who has not opted into the feature, because the refill interval defaults to the time unit. For clarity, see [my branch here|https://github.com/apache/hbase/compare/master...HubSpot:hbase:HBASE-28453], which I will open a PR for soon.
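The updated proposal can be sketched as a fixed interval limiter with a configurable refill interval. This is a simplified Python model, not the code on the linked branch. It assumes that each refill boundary restores `limit * refill_interval_ms / time_unit_ms` units (capped at the limit), and that a request waits for as many boundaries as it needs. `ChunkedFixedIntervalLimiter` and `refill_interval_ms` are hypothetical names. Leaving `refill_interval_ms` unset reproduces the classic fixed interval behavior, mirroring the "no-op by default" point above.

```python
from typing import Optional


class ChunkedFixedIntervalLimiter:
    """Fixed interval limiter whose refill interval may be shorter than its time unit."""

    def __init__(self, limit: int, time_unit_ms: int,
                 refill_interval_ms: Optional[int] = None, start_ms: int = 0) -> None:
        # Defaulting to the time unit keeps the classic fixed interval behavior.
        if refill_interval_ms is None:
            refill_interval_ms = time_unit_ms
        if not 0 < refill_interval_ms <= time_unit_ms:
            raise ValueError("refill interval must be in (0, time unit]")
        self.limit = limit
        self.refill_interval_ms = refill_interval_ms
        # Units restored at each boundary: the limit's share of one refill interval.
        self.chunk = limit * refill_interval_ms // time_unit_ms
        if self.chunk == 0:
            raise ValueError("limit too small for a refill interval this short")
        self.available = limit
        self.next_refill_ms = start_ms + refill_interval_ms

    def _refill(self, now_ms: int) -> None:
        if now_ms < self.next_refill_ms:
            return
        # Credit every refill boundary that has passed, capped at the full limit.
        boundaries = (now_ms - self.next_refill_ms) // self.refill_interval_ms + 1
        self.available = min(self.limit, self.available + boundaries * self.chunk)
        self.next_refill_ms += boundaries * self.refill_interval_ms

    def wait_ms(self, amount: int, now_ms: int) -> int:
        """Milliseconds until `amount` units can be acquired (assumes amount <= limit)."""
        self._refill(now_ms)
        if self.available >= amount:
            return 0
        chunks_needed = -(-(amount - self.available) // self.chunk)  # ceiling division
        return self.next_refill_ms + (chunks_needed - 1) * self.refill_interval_ms - now_ms

    def try_acquire(self, amount: int, now_ms: int) -> bool:
        if self.wait_ms(amount, now_ms) == 0:
            self.available -= amount
            return True
        return False


MiB = 1024 * 1024
classic = ChunkedFixedIntervalLimiter(100 * MiB, 1000)
chunked = ChunkedFixedIntervalLimiter(100 * MiB, 1000, refill_interval_ms=100)
for limiter in (classic, chunked):
    limiter.try_acquire(100 * MiB, 0)         # exhaust the allotment at t=0
    print(limiter.wait_ms(64 * 1024, 0))      # 1000, then 100
```

With a 100ms refill interval, the 64KB request waits 100ms instead of the full 1000ms. Proportional (average-style) refilling would make it ready in under 1ms. The refill that follows restores 10MiB at once, enough for 160 such blocks. In this model, the refill interval is the knob that trades worst-case wait against how quickly throttled clients come back to retry.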