nic-6443 opened a new pull request, #13250: URL: https://github.com/apache/apisix/pull/13250
### What this does

Adds two opt-in configuration knobs to `ai-proxy` and `ai-proxy-multi` to protect the gateway from a runaway upstream LLM service:

- `max_stream_duration_ms`: wall-clock cap on total streaming response duration.
- `max_response_bytes`: cap on total bytes read from the upstream for a single response (streaming or non-streaming).

Both are opt-in (no default), so existing deployments are unaffected.

### Why

The existing `timeout` field is fed to `httpc:set_timeout()`, which is a per-socket-operation timeout (connect / send / read-one-block). It does **not** bound the total duration of a streaming response. If an upstream LLM has a bug that causes it to continuously emit valid SSE tokens without ever sending a terminator (`[DONE]`, `message_stop`, `response.completed`), `parse_streaming_response` sits in an uncapped `while true` loop, pinning the worker at ~100% CPU indefinitely and degrading availability for all other traffic on that worker.

### Behavior on abort

- **Streaming, limit hit mid-stream (bytes already flushed):** stop feeding chunks and force-close the upstream httpc (`close()` + `res._httpc = nil`, so we don't pool a half-drained connection). nginx closes the downstream connection at end of content phase. The client detects truncation via the missing protocol-specific terminator. We intentionally do **not** synthesize a per-protocol "graceful error" SSE frame: we support three client protocols (OpenAI chat, Anthropic messages, OpenAI responses) with different terminators, and a missing terminator is the standard SSE way any mid-stream network failure is communicated to clients.
- **Streaming, limit hit before any output:** return `504` (duration) or `502` (size) so `on_error` / fallback / retry hooks can kick in like any other upstream failure.
- **Non-streaming, `Content-Length` exceeds cap:** pre-check the header, force-close the connection, return `502` without ever reading the body.
- **Non-streaming, chunked / no `Content-Length`:** post-read size check catches the oversized body and returns `502`.
- `ctx.var.llm_request_done = true` is set on abort so downstream filters (e.g. moderation plugins that defer work until completion) finalize their state.
- A `core.log.warn` line is emitted on every abort (`aborting AI stream: <limit> exceeded; bytes=X duration_ms=Y route_id=Z`) so log-based alerting can surface the event. No new Prometheus metric is added: the log line is sufficient and avoids expanding the plugin's metric surface.

### Caveat (documented)

Both limits are best-effort: they are enforced after each chunk is read from the upstream, so the byte cap can overshoot by up to one upstream chunk (≈8 KiB in practice) and the duration cap can overshoot by up to one chunk's processing time. This is acceptable for the failure mode we are defending against (runaway streams produce tens of MB/s, so a one-chunk overshoot is negligible compared to "run forever").

### Testing

New `t/plugin/ai-proxy-stream-limits.t` with a mock upstream that either streams OpenAI chat SSE chunks forever (no `[DONE]`) or returns a 100 KB body with matching `Content-Length`. Covers:

1. `max_stream_duration_ms=500` → request aborted in <5 s with the expected log line.
2. `max_response_bytes=2048` → request aborted in <5 s with the expected log line.
3. Non-streaming `max_response_bytes=1024` vs 100 KB upstream response → 502 + expected log line.
4. Schema validation rejects `max_stream_duration_ms: 0`.

`luacheck` passes on all three modified Lua files.

### Docs

Added rows to the config tables in `docs/en/latest/plugins/ai-proxy.md`, `ai-proxy-multi.md`, and their Chinese translations, with a clarifying note that `timeout` only bounds per-socket-operation timeouts and the new fields are needed to bound total stream duration / total bytes read.

-- This is an automated message from the Apache Git Service.
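As an illustration of the streaming abort rules and the one-chunk overshoot described in the PR, here is a minimal sketch in plain Lua. It is not the plugin's code: `stream_with_limits`, `read_chunk`, `now_ms`, and `flush` are hypothetical names injected so the loop can run without nginx, and the real implementation may order its checks differently.

```lua
-- Hypothetical sketch; none of these names come from the PR.
-- read_chunk(): next upstream chunk (string), or nil at end of stream.
-- now_ms():     clock reading in milliseconds.
-- flush(chunk): forwards a chunk to the client.
local function stream_with_limits(read_chunk, now_ms, flush, limits)
    local started = now_ms()
    local bytes, flushed = 0, false

    while true do
        local chunk = read_chunk()
        if not chunk then
            return { aborted = false, bytes = bytes }
        end

        -- Limits are only checked after a chunk has been read, which is
        -- why each cap can overshoot by up to one chunk.
        bytes = bytes + #chunk
        local elapsed = now_ms() - started

        local reason, status
        if limits.max_stream_duration_ms
                and elapsed > limits.max_stream_duration_ms then
            reason, status = "max_stream_duration_ms", 504
        elseif limits.max_response_bytes
                and bytes > limits.max_response_bytes then
            reason, status = "max_response_bytes", 502
        end

        if reason then
            -- After the first flush the response has already started, so
            -- the only remaining signal is a truncated stream (status nil).
            return {
                aborted = true,
                reason = reason,
                bytes = bytes,
                duration_ms = elapsed,
                status = (not flushed) and status or nil,
            }
        end

        flush(chunk)
        flushed = true
    end
end

-- Usage: a fake endless upstream with a fake clock (100 ms per chunk).
local clock = 0
local sent = {}
local result = stream_with_limits(
    function() clock = clock + 100; return "data: {}\n\n" end,
    function() return clock end,
    function(chunk) sent[#sent + 1] = chunk end,
    { max_stream_duration_ms = 500 }
)
print(result.reason, result.status, #sent)
```

With the fake clock above, five chunks are forwarded and the sixth read trips the duration cap at 600 ms, so the result carries no status code: the abort happened mid-stream. If the very first chunk trips a cap, the same function reports `504` or `502` instead, matching the "before any output" case.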

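The two non-streaming cases (header pre-check versus post-read check) can be sketched the same way. This is an assumption-laden illustration only: `read_body_with_cap` and `read_body` are hypothetical, headers are modeled as a plain table with an exact `Content-Length` key, and the force-close of the upstream connection is left out.

```lua
-- Hypothetical helper; not the plugin's real function.
-- headers:   table of upstream response headers.
-- read_body: function returning the full body as a string.
local function read_body_with_cap(headers, read_body, max_response_bytes)
    if not max_response_bytes then
        return read_body()  -- knob not set: behavior unchanged
    end

    -- Pre-check: a declared length above the cap is rejected without
    -- reading the body at all.
    local declared = tonumber(headers["Content-Length"] or "")
    if declared and declared > max_response_bytes then
        return nil, 502, "Content-Length exceeds max_response_bytes"
    end

    -- Post-check: chunked or header-less bodies can only be measured
    -- after they have been read.
    local body = read_body()
    if #body > max_response_bytes then
        return nil, 502, "body exceeds max_response_bytes"
    end
    return body
end

-- Usage: a 100 KB body against a 1024-byte cap, as in the PR's test case.
local reads = 0
local big = string.rep("x", 100 * 1024)
local function read_body()
    reads = reads + 1
    return big
end

local body, status = read_body_with_cap(
    { ["Content-Length"] = tostring(#big) }, read_body, 1024)
print(body, status, reads)
```

In the usage above the body is never read, because the declared length already exceeds the cap. Passing an empty header table instead exercises the post-read path, where the body is read once and then rejected.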