Duo Zhang created HBASE-30097:
---------------------------------

             Summary: Fix flaky TestBlockBytesScannedQuota
                 Key: HBASE-30097
                 URL: https://issues.apache.org/jira/browse/HBASE-30097
             Project: HBase
          Issue Type: Sub-task
          Components: test
            Reporter: Duo Zhang


Sonnet 4.5(4.6?) summary

TestBlockBytesScannedQuota
The test is flapping due to a timing/race condition in the quota system:

 * 5-second timeout too short: The testTraffic method only waited 5 seconds
   for quotas to take effect.
 * Quota cache not fully propagated: On slower systems (like CI), the quota
   cache refresh can be asynchronous and may not fully propagate in time.
 * Quotas bypassed: When the cache isn't refreshed, the logs show "bypass
   expected false, actual true", meaning all requests succeed instead of
   being throttled.
 * Insufficient retries: Each iteration takes ~1.3 seconds, so only 3-4
   retries fit in 5 seconds, which is not enough for the quota system to
   stabilize.
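
The retry arithmetic above can be checked with a small sketch. This is not
code from the test: the class and method names are hypothetical, and it
assumes each iteration takes a fixed 1,300 ms with the first attempt starting
at time zero.

```java
// Hypothetical helper, not part of HBase: estimates how many polling
// attempts fit into a timeout when every attempt takes a fixed duration.
final class RetryBudget {
  private RetryBudget() {
  }

  // Attempts that have fully finished before the timeout expires.
  static long completedAttempts(long timeoutMs, long iterationMs) {
    return timeoutMs / iterationMs;
  }

  // Attempts that have at least started before the timeout expires
  // (ceiling division, so a partially finished attempt is counted).
  static long startedAttempts(long timeoutMs, long iterationMs) {
    return (timeoutMs + iterationMs - 1) / iterationMs;
  }
}
```

With a 5,000 ms timeout this gives 3 completed and 4 started attempts, which
matches the "3-4 retries" seen in the failing runs. With 30,000 ms the same
calculation gives 23 completed attempts.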

Bad run pattern:

 * Test expects 1 successful request but gets 5 (all succeed because quotas
   are not enforced)
 * Retries every ~1.3 seconds for 4 attempts
 * Times out after 5 seconds with "Waiting timed out after [5,000] msec"

Good run pattern:

 * Quotas enforced immediately
 * Tests pass quickly (36.97s total vs 63.14s for failed run)

Increased the timeout in testTraffic() from 5,000ms to 30,000ms (line 263).
This gives the quota system sufficient time to:

 * Complete cache refresh
 * Propagate quota settings across all components
 * Handle slower CI environments

This is a conservative fix that maintains the retry logic while allowing 
adequate time for the distributed quota system to stabilize. The 30-second 
timeout is still reasonable for a test and should handle the asynchronous 
nature of quota enforcement.
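
As a rough illustration of why a longer timeout keeps the retry logic intact,
here is a minimal polling sketch. It is a generic stand-in, not the wait
utility the test actually uses: the class name, method name, and parameters
are all assumptions made for this example.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for a test wait helper, not the HBase implementation.
final class PollingWait {
  private PollingWait() {
  }

  // Re-checks the condition every intervalMs until it holds or timeoutMs
  // elapses. Returns true if the condition held, false on timeout or
  // interruption.
  static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        // Preserve the interrupt flag and stop waiting.
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }
}
```

The loop returns as soon as the condition holds, so a run where quotas are
enforced immediately does not get slower with a larger timeout. Only runs
that previously timed out spend the extra time waiting.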



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
