[ https://issues.apache.org/jira/browse/HBASE-28963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ray Mattingly resolved HBASE-28963.
-----------------------------------
Release Note: The horizontal scalability of the Quotas refresh chore was
improved. A side effect of this change is that a Quotas cache miss no longer
triggers an immediate refresh of the cache.
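The trade-off in the release note can be illustrated with a minimal, hypothetical Java sketch. The class name LazyRefreshCache and its methods are illustrative assumptions, not HBase's QuotaCache API. A miss stores and returns a default entry, and the expensive load happens only during the periodic refresh, so a burst of misses does not turn into a burst of fetches.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch, not HBase's QuotaCache: misses are cheap, loading is batched.
class LazyRefreshCache<K, V> {
  private final ConcurrentHashMap<K, V> entries = new ConcurrentHashMap<>();
  private final Function<K, V> defaultValue; // value served until the next refresh
  private final Function<K, V> loader;       // stand-in for the expensive fetch

  LazyRefreshCache(Function<K, V> defaultValue, Function<K, V> loader) {
    this.defaultValue = defaultValue;
    this.loader = loader;
  }

  // On a miss, remember the key and return a default; the loader is NOT called here.
  V get(K key) {
    return entries.computeIfAbsent(key, defaultValue);
  }

  // Intended to run on a fixed schedule (a "chore"); reloads every known key in one pass.
  void refresh() {
    for (K key : entries.keySet()) {
      entries.put(key, loader.apply(key));
    }
  }
}
```

The cost of this design is staleness. A key first seen just after a refresh keeps its default value until the next scheduled pass.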
Resolution: Fixed
> Updating Quota Factors is too expensive
> ---------------------------------------
>
> Key: HBASE-28963
> URL: https://issues.apache.org/jira/browse/HBASE-28963
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 2.6.1
> Reporter: Ray Mattingly
> Assignee: Ray Mattingly
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0-alpha-1, 2.7.0, 3.0.0-beta-2, 2.6.2
>
> Attachments: image-2024-11-06-12-06-44-317.png,
> quota-refresh-hmaster.png
>
>
> My company is running Quotas across a few hundred clusters of varied size.
> One cluster has hundreds of servers and tens of thousands of regions. We
> noticed that the HMaster was quite busy for this cluster, and after some
> investigation we realized that RegionServers were hammering the HMaster's
> ClusterMetrics endpoint in order to refresh their table machine quota
> factors.
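The following hypothetical Java sketch shows why every RegionServer needs a cluster-wide view. It assumes that a table machine quota factor is the share of a table's open regions hosted on the local server. The local region count is available on the RegionServer itself. The per-table cluster-wide count is what, per the description above, comes from the HMaster's ClusterMetrics. Class and method names are illustrative, not HBase's.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of per-table machine quota factors (names are illustrative).
final class TableMachineQuotaFactors {
  private TableMachineQuotaFactors() {}

  // Share of a table's open regions hosted on this server; 0 when none are open.
  static double factor(int localRegions, long clusterOpenRegions) {
    if (clusterOpenRegions <= 0) {
      return 0.0;
    }
    return (double) localRegions / clusterOpenRegions;
  }

  // One factor per table that has a table-scoped quota. clusterOpenRegions stands in
  // for the cluster-wide view that a RegionServer cannot compute on its own.
  static Map<String, Double> computeAll(Set<String> quotaTables,
      Map<String, Integer> localRegions, Map<String, Long> clusterOpenRegions) {
    Map<String, Double> factors = new HashMap<>();
    for (String table : quotaTables) {
      int local = localRegions.getOrDefault(table, 0);
      long total = clusterOpenRegions.getOrDefault(table, 0L);
      factors.put(table, factor(local, total));
    }
    return factors;
  }
}
```

If this assumption holds, a cluster-wide table limit multiplied by the factor gives this server's share. Refreshing that share across hundreds of servers means each one repeatedly asks the HMaster for the same cluster-wide counts.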
> There are a few things that we could do here. In a perfect world, I think
> the RegionServers would have better P2P communication of region states, and
> of whatever else is necessary to derive new quota factors. Relying solely on
> the HMaster for this coordination creates a tricky bottleneck for the
> horizontal scalability of clusters.
> That said, I think that a simpler and preferable initial step would be to
> make our code a bit more cost-conscious. At my company, for example, we don't
> even define any table-scoped quotas. Without any table-scoped quotas in the
> cache, our cache could be much more selective about the work that it chooses
> to do on each refresh. So I'm proposing that we check [the size of the
> tableQuotaCache
> keyset|https://github.com/apache/hbase/blob/db3ba44a4c692d26e70b6030fc519e92fd79f638/hbase-server/src/main/java/org/apache/hadoop/hbase/quotas/QuotaCache.java#L418]
> earlier, and use the result to decide which ClusterMetrics are worth
> fetching.
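Here is a minimal sketch of the proposed check, under stated assumptions. MetricsOption is a hypothetical stand-in for the kinds of cluster metrics a refresh might request. The server list is assumed to be needed regardless, for example for factors that are not table-scoped. The sketch only shows the decision itself: per-table region counts are requested only when the table quota cache has at least one key.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch of deciding what to fetch before fetching.
final class QuotaFactorRefresh {
  // Illustrative stand-ins for cluster metric categories, not HBase's enum.
  enum MetricsOption { SERVER_NAMES, TABLE_REGION_COUNTS }

  private QuotaFactorRefresh() {}

  // Per-table region counts only matter if some table-scoped quota is cached.
  static EnumSet<MetricsOption> optionsFor(Set<String> tableQuotaKeys) {
    EnumSet<MetricsOption> options = EnumSet.of(MetricsOption.SERVER_NAMES);
    if (!tableQuotaKeys.isEmpty()) {
      options.add(MetricsOption.TABLE_REGION_COUNTS);
    }
    return options;
  }
}
```

With no table-scoped quotas, as at the reporter's company, a refresh built this way would never request per-table region counts. On a cluster with tens of thousands of regions, that is presumably the bulkier part of the response.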
--
This message was sent by Atlassian Jira
(v8.20.10#820010)