This is an automated email from the ASF dual-hosted git repository.
Apache9 pushed a commit to branch branch-2.6
in repository https://gitbox.apache.org/repos/asf/hbase.git
The following commit(s) were added to refs/heads/branch-2.6 by this push:
new 7e4a75c96a4 HBASE-30111 Fix for flapper TestBlockBytesScannedQuota by
AI (#8117)
7e4a75c96a4 is described below
commit 7e4a75c96a4abc5998abe7868177a647c528448b
Author: Hari Krishna Dara <[email protected]>
AuthorDate: Fri Apr 24 20:39:57 2026 +0530
HBASE-30111 Fix for flapper TestBlockBytesScannedQuota by AI (#8117)
AI tool: Claude Code
AI Model: Sonnet 4.5 (reverified using Opus 4.6 (1M context))
Prompt: You need to diagnose the HBase test TestBlockBytesScannedQuota that
is flapping. It had multiple failures and errors in the PR validation build,
but none when I ran locally. The source code is at @../../src/apache/hbase/ and
the logs for good run is at @Good-Run and bad run is at @bad-run. Can you
compare the logs and try to understand why the run failed and look at the code
for a potential fix?
Summary by AI:
The test is flapping due to a **timing/race condition** in the quota system:
1. **5-second timeout too short**: The `testTraffic` method only waited 5
seconds for quotas to take effect
2. **Quota cache not fully propagated**: On slower systems (like CI), the
quota cache refresh can be asynchronous and may not fully propagate in time
3. **Quotas bypassed**: When the cache has not been refreshed, the logs show
`"bypass expected false, actual true"`, meaning all requests succeed instead
of being throttled
4. **Insufficient retries**: Each iteration takes ~1.3 seconds, so only 3-4
retries fit in 5 seconds, which is not enough for the quota system to stabilize
**Bad run pattern:**
- Test expects 1 successful request but gets 5 (all succeed because quotas
are not enforced)
- Retries every ~1.3 seconds for 4 attempts
- Times out after 5 seconds with "Waiting timed out after [5,000] msec"
**Good run pattern:**
- Quotas enforced immediately
- Tests pass quickly (36.97s total vs 63.14s for the failed run)
**Fix:** Increased the timeout in `testTraffic()` from **5,000ms to 30,000ms**
(line 263). This gives the quota system sufficient time to:
- Complete cache refresh
- Propagate quota settings across all components
- Handle slower CI environments
This is a conservative fix that maintains the retry logic while allowing
adequate time for the distributed quota system to stabilize. The 30-second
timeout is still reasonable for a test and should handle the asynchronous
nature of quota enforcement.
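
The retry arithmetic above can be checked with a small model. The sketch below is illustrative only and is not HBase's `Waiter` implementation: it assumes a loop that always makes at least one attempt and that each attempt (call plus sleep) costs a fixed ~1.3 seconds. The class name, the method names, and the 8-second propagation delay are hypothetical values chosen for the example.

```java
// Illustrative model only; this is not HBase's Waiter implementation.
class WaitBudgetSketch {

  /**
   * Counts how many attempts start before the timeout expires, assuming the
   * loop always makes at least one attempt and each attempt (call plus sleep)
   * costs a fixed attemptCostMs. Uses a virtual clock, so nothing sleeps.
   */
  static int attemptsBeforeTimeout(long timeoutMs, long attemptCostMs) {
    int attempts = 0;
    long elapsedMs = 0;
    do {
      attempts++;
      elapsedMs += attemptCostMs;
    } while (elapsedMs < timeoutMs);
    return attempts;
  }

  /**
   * Returns true if some attempt starts at or after readyAtMs (the
   * hypothetical moment the quota becomes enforced) while still inside
   * the timeout window.
   */
  static boolean succeedsWithin(long timeoutMs, long attemptCostMs, long readyAtMs) {
    for (long startMs = 0; startMs < timeoutMs; startMs += attemptCostMs) {
      if (startMs >= readyAtMs) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println(attemptsBeforeTimeout(5_000, 1_300));   // prints 4
    System.out.println(attemptsBeforeTimeout(30_000, 1_300));  // prints 24
    // readyAtMs = 8_000 is a made-up propagation delay, not a measured value.
    System.out.println(succeedsWithin(5_000, 1_300, 8_000));   // prints false
    System.out.println(succeedsWithin(30_000, 1_300, 8_000));  // prints true
  }
}
```

Under these assumptions the 5-second budget allows 4 attempts, matching the bad-run pattern, while the 30-second budget allows about 24, so a cache refresh that lands a few seconds late is still observed by a later attempt.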
Signed-off-by: Duo Zhang <[email protected]>
(cherry picked from commit de30a2bdb08326ec69195bf0da52e4fa2d18311a)
---
.../java/org/apache/hadoop/hbase/quotas/TestBlockBytesScannedQuota.java | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/hbase-server/src/test/java/org/apache/hadoop/hbase/quotas/TestBlockBytesScannedQuota.java b/hbase-server/src/test/java/org/apache/hadoop/hbase/quotas/TestBlockBytesScannedQuota.java
index 543f138da73..a4c522447ec 100644
--- a/hbase-server/src/test/java/org/apache/hadoop/hbase/quotas/TestBlockBytesScannedQuota.java
+++ b/hbase-server/src/test/java/org/apache/hadoop/hbase/quotas/TestBlockBytesScannedQuota.java
@@ -260,7 +260,7 @@ public class TestBlockBytesScannedQuota {
private void testTraffic(Callable<Long> trafficCallable, long expectedSuccess, long marginOfError)
throws Exception {
- TEST_UTIL.waitFor(5_000, () -> {
+ TEST_UTIL.waitFor(30_000, () -> {
long actualSuccess;
try {
actualSuccess = trafficCallable.call();