(apisix) branch master updated: docs(ai-proxy): restore max_stream_duration_ms and max_response_bytes (#13539)

alinsran Sun, 14 Jun 2026 19:47:20 -0700

This is an automated email from the ASF dual-hosted git repository.

AlinsRan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/apisix.git



The following commit(s) were added to refs/heads/master by this push:
     new d5004d1fa docs(ai-proxy): restore max_stream_duration_ms and max_response_bytes (#13539)
d5004d1fa is described below

commit d5004d1fab7dc972c2758b5cc29442728b77235a
Author: AlinsRan <[email protected]>
AuthorDate: Mon Jun 15 10:46:02 2026 +0800

    docs(ai-proxy): restore max_stream_duration_ms and max_response_bytes (#13539)
---
 docs/en/latest/plugins/ai-proxy.md | 4 +++-
 docs/zh/latest/plugins/ai-proxy.md | 4 +++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/docs/en/latest/plugins/ai-proxy.md b/docs/en/latest/plugins/ai-proxy.md
index 802b49593..14906b20c 100644
--- a/docs/en/latest/plugins/ai-proxy.md
+++ b/docs/en/latest/plugins/ai-proxy.md
@@ -93,7 +93,9 @@ When `provider` is set to `bedrock`, the Plugin expects requests in the [Bedrock
 | logging        | object  | False    |         |                                          | Logging configurations. Does not affect `error.log`. |
 | logging.summaries | boolean | False | false |                                          | If true, logs request LLM model, duration, request, and response tokens. |
 | logging.payloads  | boolean | False | false |                                          | If true, logs request and response payload. |
-| timeout        | integer | False    | 30000    | 1 - 600000                               | Request timeout in milliseconds when requesting the LLM service. |
+| timeout        | integer | False    | 30000    | 1 - 600000                               | Request timeout in milliseconds when requesting the LLM service. Applied per socket operation (connect / send / read block); does not cap the total duration of a streaming response. |
+| max_stream_duration_ms | integer | False |        | ≥ 1                                      | Maximum wall-clock duration (in milliseconds) for a streaming AI response. If the upstream keeps sending data past this deadline, the gateway closes the connection. Unset means no cap. Use this to protect the gateway from upstream bugs that produce tokens indefinitely. When the limit is hit mid-stream, the downstream SSE stream is truncated (no protocol-specific terminator such as `[DONE]`, ` [...]
+| max_response_bytes     | integer | False |        | ≥ 1                                      | Maximum total bytes read from the upstream for a single AI response (streaming or non-streaming). If exceeded, the gateway closes the connection. For non-streaming responses with `Content-Length`, the check is performed before reading the body; for chunked (no-`Content-Length`) non-streaming responses and for streaming responses, the cap is enforced incrementally as bytes are received. Unset  [...]
 | max_req_body_size | integer | False | 67108864 | >= 1 | Maximum request body size in bytes that the plugin reads into memory. Requests whose body exceeds this limit are rejected with `413`. Prevents unbounded memory buffering of large request bodies. |
 | keepalive      | boolean | False    | true   |                                          | If true, keeps the connection alive when requesting the LLM service. |
 | keepalive_timeout | integer | False | 60000  | ≥ 1000                                   | Keepalive timeout in milliseconds when connecting to the LLM service. |
diff --git a/docs/zh/latest/plugins/ai-proxy.md b/docs/zh/latest/plugins/ai-proxy.md
index 8d7b18102..78d0fecb7 100644
--- a/docs/zh/latest/plugins/ai-proxy.md
+++ b/docs/zh/latest/plugins/ai-proxy.md
@@ -93,7 +93,9 @@ import TabItem from '@theme/TabItem';
 | logging        | object  | 否    |         |                                          | 日志配置。不影响 `error.log`。 |
 | logging.summaries | boolean | 否 | false |                                          | 如果为 true，记录请求 LLM 模型、持续时间、请求和响应令牌。 |
 | logging.payloads  | boolean | 否 | false |                                          | 如果为 true，记录请求和响应负载。 |
-| timeout        | integer | 否    | 30000    | 1 - 600000                               | 请求 LLM 服务时的请求超时时间（毫秒）。 |
+| timeout        | integer | 否    | 30000    | 1 - 600000                               | 请求 LLM 服务时的请求超时时间（毫秒）。按单次 socket 操作（连接 / 发送 / 读取数据块）计算，不限制流式响应的总时长。 |
+| max_stream_duration_ms | integer | 否 |        | ≥ 1                                      | 流式 AI 响应的最大墙钟时长（毫秒）。如果上游在该截止时间后仍持续发送数据，网关会关闭连接。不设置表示不限制。用于防止上游异常无限产出 token。当在流式过程中触发该限制时，下游 SSE 流会被截断（不会发送 `[DONE]`、`message_stop`、`response.completed` 等协议终止标记）；行为正常的客户端应将缺少终止标记视为不完整的响应。 |
+| max_response_bytes     | integer | 否 |        | ≥ 1                                      | 单次 AI 响应（流式或非流式）从上游读取的最大总字节数。超过则网关关闭连接。对于带 `Content-Length` 的非流式响应，在读取响应体前进行检查；对于分块（无 `Content-Length`）的非流式响应以及流式响应，则在接收字节的过程中增量地强制执行该上限。不设置表示不限制。 |
 | keepalive      | boolean | 否    | true   |                                          | 如果为 true，在请求 LLM 服务时保持连接活跃。 |
 | keepalive_timeout | integer | 否 | 60000  | ≥ 1000                                   | 连接到 LLM 服务时的保活超时时间（毫秒）。 |
 | keepalive_pool | integer | 否    | 30       | ≥ 1                                      | LLM 服务连接的保活池大小。 |
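
The two restored rows describe caps that are enforced incrementally while the response is read: a total-byte cap and a wall-clock deadline, each independent of the per-operation `timeout`. The following is a minimal Python sketch of that idea only. It is not the plugin's implementation (which is not part of this diff), and `read_with_limits`, `LimitExceeded`, and the injectable `clock` parameter are hypothetical names introduced for illustration.

```python
import time
from typing import Callable, Iterable, Iterator, Optional


class LimitExceeded(Exception):
    """Hypothetical error: a cap was hit and the caller would close the connection."""


def read_with_limits(
    chunks: Iterable[bytes],
    max_response_bytes: Optional[int] = None,
    max_stream_duration_ms: Optional[int] = None,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[bytes]:
    """Yield upstream chunks until either cap is exceeded. None means no cap."""
    start = clock()
    total = 0
    for chunk in chunks:
        # Wall-clock deadline: measured from the start of the stream,
        # not per read, so a slow trickle of data cannot run forever.
        if max_stream_duration_ms is not None:
            elapsed_ms = (clock() - start) * 1000
            if elapsed_ms > max_stream_duration_ms:
                raise LimitExceeded("max_stream_duration_ms")
        # Byte cap: checked incrementally as bytes arrive, which is the
        # only option when no Content-Length is available up front.
        total += len(chunk)
        if max_response_bytes is not None and total > max_response_bytes:
            raise LimitExceeded("max_response_bytes")
        yield chunk
```

In this sketch a consumer that catches `LimitExceeded` mid-stream has already forwarded the earlier chunks, which mirrors the truncation behaviour the table describes: the downstream client sees a stream that ends without its protocol terminator.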
