haridsv commented on PR #8117:
URL: https://github.com/apache/hbase/pull/8117#issuecomment-4301939754

   Here is a more detailed reasoning by AI. Attaching the log files from bad 
run.
   
   
[bad-run.tar.gz](https://github.com/user-attachments/files/26998411/bad-run.tar.gz)
   
   Here are the key pieces of evidence:
   
   ## 1. The timeout error (bad run)
   
   **File:** 
`bad-run/org.apache.hadoop.hbase.quotas.TestBlockBytesScannedQuota.txt`, line 6:
   ```
   java.lang.AssertionError: Waiting timed out after [5,000] msec
   ```
   All 3 failures share this same 5-second timeout at 
`TestBlockBytesScannedQuota.java:263`.
   
   ## 2. Quota cache stuck in bypass mode (bad run)
   
   **File:** 
`bad-run/org.apache.hadoop.hbase.quotas.TestBlockBytesScannedQuota-output.txt`, 
repeated pattern starting around timestamp `18:29:34`:
   ```
   Traffic test yielded 5 successful requests. Expected 1 +/- 0
   User limiter for user=runner ... not refreshed, bypass expected false, 
actual true
   ```
   
   The test expects only 1 request to succeed (because quotas should throttle), 
but **5 succeed every time** because the limiter is still in `bypass=true` (no 
quota enforced). The retry loop hits this same state 4 times in ~4 seconds, 
then times out at 5 seconds.
   
   ## 3. Compare with good run
   
   **File:** 
`good-run/org.apache.hadoop.hbase.quotas.TestBlockBytesScannedQuota-output.txt`:
   ```
   Traffic test yielded 1 successful requests. Expected 1 +/- 1
   ```
   Quota enforcement kicks in immediately — no "not refreshed" messages at all 
for these assertions.
   
   ## 4. The 5-second timeout in source code
   
   **File:** `TestBlockBytesScannedQuota.java`, line 263:
   ```java
   TEST_UTIL.waitFor(5_000, () -> {
   ```
   
   ## 5. The retry penalty eats the budget
   
   **File:** `TestBlockBytesScannedQuota.java`, lines 274-277:
   ```java
   if (!success) {
       triggerUserCacheRefresh(TEST_UTIL, false, TABLE_NAME);
       waitMinuteQuota();
   }
   ```
   
   Each failed attempt calls `triggerUserCacheRefresh`, which in 
**`ThrottleQuotaTestUtil.java` line 224** does `Thread.sleep(250)` plus a 
`waitFor(60000, 250, ...)` poll loop, and `waitMinuteQuota()` (line 316) calls 
`envEdge.incValue(70000)`. The real wall-clock cost of each retry is ~1.3 
seconds (visible in the bad-run log timestamps: `18:29:34` → `18:29:35` → 
`18:29:36` → `18:29:38`), so only 3-4 attempts fit within the 5-second window.
   
   On CI (slower), the quota cache takes longer to propagate from bypass to 
enforcing. The good run on your local machine was fast enough that it resolved 
on the first attempt.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to