shreemaan-abhishek opened a new pull request, #13139:
URL: https://github.com/apache/apisix/pull/13139
## Problem
`apisix_llm_active_connections` is a Prometheus Gauge that tracks in-flight
LLM requests. The gauge leaks (it is never decremented) whenever a plugin calls
`ngx.exit()` during request processing. This happens not only in SSE streaming
but also in **non-streaming responses**.
**Root cause**: When `ai-aliyun-content-moderation` (or any other plugin)
calls `ngx.exit()` inside a phase handler (e.g. `body_filter`,
`header_filter`), OpenResty terminates the current coroutine immediately. This
exit is **not** caught by the `pcall` wrapping the upstream request in
`ai-proxy/base.lua`. As a result:
1. `exporter.inc_llm_active_connections(ctx)` is called before
`pcall(do_request)` ✓
2. A plugin calls `ngx.exit()` — either mid-stream (SSE) or after receiving
a complete non-streaming response
3. `exporter.dec_llm_active_connections(ctx)` placed after `pcall` is
**never reached** ✗
4. Gauge leaks — only goes up, never down
This affects both `ai-proxy` and `ai-proxy-multi` in all request types:
non-streaming chat, SSE streaming, and any other path where a downstream plugin
exits early.
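
The sequence above can be reproduced outside of APISIX. The following is a minimal sketch, not the actual `ai-proxy` code: it assumes `ngx.exit()` can be modeled as a coroutine yield that is never resumed, and it assumes an interpreter where `pcall` is yieldable (LuaJIT or Lua 5.2+). `fake_exit`, `handler`, and the `active` counter are hypothetical stand-ins.

```lua
-- Stand-in for the Prometheus gauge (hypothetical, not the APISIX exporter).
local active = 0
local function inc() active = active + 1 end
local function dec() active = active - 1 end

-- Assumed model of ngx.exit(): suspend the request coroutine for good.
local function fake_exit()
    coroutine.yield("exit")
end

-- Plays the role of the upstream request plus a plugin that exits early.
local function do_request()
    fake_exit()
end

local function handler()
    inc()
    local ok = pcall(do_request)
    dec()  -- not reached: the yield passes through pcall without an error
    return ok
end

-- Run the handler as a "request" coroutine and never resume it afterwards.
local co = coroutine.create(handler)
coroutine.resume(co)

-- The gauge stays incremented and the coroutine is left suspended.
print(active, coroutine.status(co))
```

Because the yield is not an error, `pcall` has nothing to catch, so any cleanup placed after it depends on the coroutine being resumed.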
## Fix
Remove the `dec` call from after `pcall` in `ai-proxy/base.lua` and instead
rely solely on the **log phase**, which always runs even after `ngx.exit()`.
Introduce a `ctx.llm_active_connections_tracked` flag so that the log phase
decrements only for requests that incremented the gauge, and never more than
once:
**`ai-proxy/base.lua`** — increment and set flag, no `dec` after `pcall`:
```lua
exporter.inc_llm_active_connections(ctx)
ctx.llm_active_connections_tracked = true
local ok, code_or_err, body = pcall(do_request)
-- dec is intentionally NOT here — handled in log phase
```
**`ai-proxy.lua` and `ai-proxy-multi.lua` log phase**:
```lua
function _M.log(conf, ctx)
    if ctx.llm_active_connections_tracked then
        exporter.dec_llm_active_connections(ctx)
        ctx.llm_active_connections_tracked = false
    end
    -- ...
end
```
The log phase runs unconditionally regardless of how the request ended
(normal completion, upstream error, or `ngx.exit()` from any plugin), so the
gauge is always correctly decremented.
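
As a rough illustration of why the flag matters, here is a self-contained sketch with a mock exporter. The `exporter` table, the `access` and `log` functions, and the `active` counter are hypothetical stand-ins for the real modules; only the flag name comes from this PR.

```lua
-- Mock exporter backed by a plain counter (not the APISIX implementation).
local active = 0
local exporter = {
    inc_llm_active_connections = function(ctx) active = active + 1 end,
    dec_llm_active_connections = function(ctx) active = active - 1 end,
}

-- Where the request starts: increment and remember that we did so.
local function access(ctx)
    exporter.inc_llm_active_connections(ctx)
    ctx.llm_active_connections_tracked = true
    -- the upstream request would run here and may be cut short
end

-- Log phase: decrement only if this request was tracked, then clear the flag.
local function log(ctx)
    if ctx.llm_active_connections_tracked then
        exporter.dec_llm_active_connections(ctx)
        ctx.llm_active_connections_tracked = false
    end
end

-- A tracked request: a repeated log call does not decrement twice.
local ctx = {}
access(ctx)
log(ctx)
log(ctx)
assert(active == 0)

-- A request that never incremented the gauge leaves it untouched.
local untracked_ctx = {}
log(untracked_ctx)
assert(active == 0)
```

Clearing the flag after the decrement makes the log handler idempotent per request, which is what keeps the gauge from going negative.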
## Tests
Added a regression test in `t/plugin/ai-aliyun-content-moderation.t`:
- Creates a route with `prometheus` + `ai-proxy` +
`ai-aliyun-content-moderation` (`check_response=true`)
- Sends a **non-streaming** chat request (LLM mock always returns offensive
content)
- Content moderation denies the response via `ngx.exit(400)`
- Asserts `apisix_llm_active_connections{...} 0` in Prometheus metrics after
the log phase completes
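
The final assertion amounts to checking a gauge value in Prometheus text output. The sketch below shows one way to do that check in plain Lua; it is not the test added in this PR. `gauge_values` is a hypothetical helper, and the sample metrics text, including its label set, is made up for illustration.

```lua
-- Hypothetical helper: collect every sample value of one labeled metric
-- from Prometheus text-format output.
local function gauge_values(metrics_text, name)
    local values = {}
    for line in metrics_text:gmatch("[^\n]+") do
        -- Skip "# HELP" and "# TYPE" comment lines.
        if line:sub(1, 1) ~= "#" then
            -- name, balanced {...} label block, whitespace, value
            local metric, value = line:match("^([%w_:]+)%b{}%s+(%S+)$")
            if metric == name then
                values[#values + 1] = tonumber(value)
            end
        end
    end
    return values
end

-- Made-up sample output; the labels are illustrative only.
local sample = table.concat({
    "# TYPE apisix_llm_active_connections gauge",
    'apisix_llm_active_connections{route_id="1"} 0',
    'apisix_http_status{code="400"} 1',
}, "\n")

local values = gauge_values(sample, "apisix_llm_active_connections")
assert(#values == 1 and values[1] == 0)
```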
All existing tests in `t/plugin/prometheus-ai-proxy.t` (40 tests) continue
to pass.
### Checklist
- [x] I have explained the need for this PR and the problem it solves
- [x] I have explained the changes or the new features added to this PR
- [x] I have added tests corresponding to this change
- [ ] I have updated the documentation to reflect this change
- [x] I have verified that this change is backward compatible (If not,
please discuss on the [APISIX mailing
list](https://github.com/apache/apisix/tree/master#community) first)