sihyeonn opened a new pull request, #13195:
URL: https://github.com/apache/apisix/pull/13195

   ## Summary
   
   When metrics are configured with an `expire` value, entries in the nginx 
shared dict expire logically, but the slab allocator does **not** automatically 
return their underlying pages to the free-space pool. As a result, 
`apisix_shared_dict_free_space_bytes` for `prometheus-metrics` decreases 
monotonically over time; the slabs are reclaimed only when expired entries are 
explicitly flushed.
   
   ## Root Cause
   
   `ngx.shared.DICT:flush_expired()` must be called explicitly to reclaim slab 
memory from expired entries. Without it:
   
   - Time series expire logically (reads return nil after `expire` seconds)
   - But the slab memory is **not** returned to free space
   - `free_space_bytes` trends toward zero regardless of actual active 
time-series count
   
   This can be observed by comparing `free_space_bytes` with the active 
time-series count: the count fluctuates (e.g. drops significantly during 
low-traffic periods) while free space never recovers, even after most entries 
have expired.
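   
   A minimal sketch of how the free-space side of that comparison could be 
read directly from the dict. The helper name `dict_usage` is hypothetical and 
not part of APISIX; it assumes the dict exposes `capacity()` and `free_space()` 
as `ngx.shared.DICT` does in recent OpenResty releases. Note that 
`free_space()` counts only slab pages that are entirely free, so it is a coarse 
signal.
   
   ```lua
   -- Hypothetical helper: summarize how full a shared dict is.
   -- `dict` must provide capacity() and free_space(), as ngx.shared.DICT does.
   local function dict_usage(dict)
       local capacity = dict:capacity()
       local free = dict:free_space()
       return {
           capacity_bytes = capacity,
           free_bytes = free,
           -- fraction of the zone not reported as free pages
           used_ratio = (capacity - free) / capacity,
       }
   end

   -- Inside OpenResty only: log usage for the prometheus-metrics dict.
   if ngx and ngx.shared and ngx.shared["prometheus-metrics"] then
       local usage = dict_usage(ngx.shared["prometheus-metrics"])
       ngx.log(ngx.INFO, "prometheus-metrics used ratio: ", usage.used_ratio)
   end
   ```
   
   Logging this before and after a `flush_expired` call is one way to check 
whether a flush actually frees whole pages.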
   
   ## Fix
   
   Call `flush_expired(1000)` on the `prometheus-metrics` shared dict inside 
`exporter_timer`, which already runs every `refresh_interval` (default 15 s) in 
the privileged agent process.
   
   ```lua
   local prom_dict = ngx.shared["prometheus-metrics"]
   if prom_dict then
       prom_dict:flush_expired(1000)
   end
   ```
   
   **Why `max_count=1000`**: Without a limit, a single flush call could hold 
the shared-dict lock for an extended time if many expired entries have 
accumulated. Limiting each cycle to 1000 entries keeps the lock time well under 
10 ms in practice, while the remaining entries are flushed in subsequent timer 
ticks (every `refresh_interval`, 15 s by default).
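   
   As a rough worked example of that trade-off, the sketch below estimates how 
long a backlog of expired entries takes to drain at one bounded flush per 
tick. The function name and the backlog figure are illustrative assumptions, 
and the estimate ignores entries that expire while the backlog is draining.
   
   ```lua
   -- Hypothetical estimate: ticks and seconds needed to flush `backlog`
   -- expired entries when each timer tick flushes at most `max_count`.
   local function drain_time_seconds(backlog, max_count, refresh_interval)
       local ticks = math.ceil(backlog / max_count)
       return ticks * refresh_interval, ticks
   end

   -- Example: 50,000 expired entries, 1000 per tick, one tick every 15 s.
   local seconds, ticks = drain_time_seconds(50000, 1000, 15)
   print(ticks, seconds)  -- 50 ticks, 750 seconds
   ```
   
   A larger `max_count` shortens the drain time at the cost of a longer lock 
hold per tick.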
   
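   The call runs in the privileged agent process, which is separate from the 
worker processes that handle requests, so no request handler executes the 
flush itself. The shared-dict lock is still shared across processes, so workers 
updating metrics may wait briefly while a flush is in progress; the `max_count` 
limit keeps that wait short and the impact on request throughput minimal.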
   
   ## Checklist
   
   - [x] No functional change to metric collection or rendering
   - [x] Compatible with existing `expire` metric configuration
   - [x] Works with any `refresh_interval` setting

