AlinsRan commented on PR #13602:
URL: https://github.com/apache/apisix/pull/13602#issuecomment-4786124185

   ### On the lack of an APISIX-level test
   
   This is a dependency bump that pulls in the upstream fix for duplicate 
metric emission. I deliberately did **not** add a `t/` case, and want to record 
why, since "where's the test?" is a fair question.
   
   The duplicate is a **cross-worker race** in `prometheus_keys`: a worker 
lists the same series twice only when a TTL'd slot expires and is re-added 
*before* that worker's incremental-sync cursor, so `KeyIndex:sync_range()` 
never revisits and reconciles the stale slot. I tried to reproduce it 
deterministically at the APISIX level several ways, and none works:
   
   - **Real multi-worker + HTTP traffic** (create metric → let it expire → 
re-hit → scrape, ×20 scrapes): 0 duplicates, even on the old library.
   - **Two real `KeyIndex` instances over a real `ngx.shared` dict** (real 
`add`/`sync`/expire/`re-add`, 1-key and 2-key constructions, traced step by 
step): the stale slot is reconciled away on the next incremental sync → no 
duplicate.
   - **Mock dict + controllable clock** (same realistic sequence): same result 
— no duplicate.
   
   In every straightforward sequence the shared-dict sync removes the stale 
slot, so the duplicate simply does not occur. That intermittency is exactly why 
#11934 only showed up in production.
   
   Consistent with this, the upstream library verifies the fix at the 
`KeyIndex` unit level rather than through realistic traffic — 
`testListDeduplicatesStaleSlot` and `testExpiredReAddNoDuplicate` in 
`prometheus_test.lua`. Those are the authoritative tests for this fix, and they 
ship with the bumped version. Adding a contrived white-box test inside APISIX 
would only duplicate them without exercising anything APISIX-specific.
   
   Happy to add a mock-dict scenario test here as well if reviewers would 
prefer an in-repo check.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to