This is an automated email from the ASF dual-hosted git repository.
yilialin pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/apisix.git
The following commit(s) were added to refs/heads/master by this push:
new 05a9bf252 docs: fix ai-proxy-multi attribute nesting and add missing health check sub-attributes (#13169)
05a9bf252 is described below
commit 05a9bf2526cdc89d1b0fac0cdae7082b281d06f7
Author: Yilia Lin <[email protected]>
AuthorDate: Fri Apr 17 16:30:31 2026 +0800
docs: fix ai-proxy-multi attribute nesting and add missing health check sub-attributes (#13169)
---
docs/en/latest/plugins/ai-proxy-multi.md | 1678 ++++++++++++++++++++++++++++-
docs/zh/latest/plugins/ai-proxy-multi.md | 1697 +++++++++++++++++++++++++++++-
2 files changed, 3293 insertions(+), 82 deletions(-)
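The attribute-nesting fix in this patch moves `checks` from the top level of the Plugin configuration into each entry of `instances`. The fragment below roughly illustrates the resulting shape, written in the ADC format that the patch uses for its examples. It is a sketch that assumes an `openai-compatible` provider exposing an HTTPS health endpoint. The instance name, model, host, endpoint URL, `/health` path, and probe values are hypothetical placeholders and do not come from the patch.

```yaml
# Sketch only: instance name, model, host, endpoint, /health path and
# probe values are hypothetical placeholders.
services:
  - name: ai-proxy-multi-service
    routes:
      - name: ai-proxy-multi-route
        uris:
          - /anything
        methods:
          - POST
        plugins:
          ai-proxy-multi:
            instances:
              - name: my-llm-instance        # hypothetical
                provider: openai-compatible
                weight: 1
                auth:
                  header:
                    Authorization: "Bearer ${LLM_API_KEY}"
                options:
                  model: my-model            # hypothetical
                override:
                  endpoint: "https://llm.example.com/v1/chat/completions"
                # checks is a sibling of auth/options/override inside the instance
                checks:
                  active:
                    type: https
                    host: llm.example.com
                    http_path: /health       # assumes the provider serves this path
                    timeout: 1
                    healthy:
                      interval: 5
                      successes: 2
                    unhealthy:
                      interval: 5
                      http_failures: 3
```

For instances whose provider has no health endpoint (the attribute table notes that OpenAI, DeepSeek, and AIMLAPI currently do not provide one), the `checks` block would simply be omitted from that instance.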
diff --git a/docs/en/latest/plugins/ai-proxy-multi.md b/docs/en/latest/plugins/ai-proxy-multi.md
index dedde607c..2a71760b3 100644
--- a/docs/en/latest/plugins/ai-proxy-multi.md
+++ b/docs/en/latest/plugins/ai-proxy-multi.md
@@ -33,6 +33,9 @@ description: The ai-proxy-multi Plugin extends the capabilities of ai-proxy with
<link rel="canonical" href="https://docs.api7.ai/hub/ai-proxy-multi" />
</head>
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
## Description
The `ai-proxy-multi` Plugin simplifies access to LLM and embedding models by transforming Plugin configurations into the designated request format for OpenAI, DeepSeek, Azure, AIMLAPI, Anthropic, OpenRouter, Gemini, Vertex AI, and other OpenAI-compatible APIs. It extends the capabilities of [`ai-proxy`](./ai-proxy.md) with load balancing, retries, fallbacks, and health checks.
@@ -49,9 +52,9 @@ In addition, the Plugin also supports logging LLM request information in the acc
## Attributes
-| Name | Type | Required | Default | Valid Values | Description |
+| Name | Type | Required | Default | Valid values | Description |
|------------------------------------|----------------|----------|-----------------------------------|--------------|-------------|
-| fallback_strategy | string or array | False | | string: "instance_health_and_rate_limiting", "http_429", "http_5xx"<br />array: ["rate_limiting", "http_429", "http_5xx"] | Fallback strategy. When set, the Plugin will check whether the specified instance’s token has been exhausted when a request is forwarded. If so, forward the request to the next instance regardless of the instance priority. When not set, the Plugin will not forward the request to low prior [...]
+| fallback_strategy | string or array | False | | string: "instance_health_and_rate_limiting", "http_429", "http_5xx"<br />array: ["rate_limiting", "http_429", "http_5xx"] | Fallback strategy. When set, the Plugin will check whether the specified instance's token has been exhausted when a request is forwarded. If so, forward the request to the next instance regardless of the instance priority. When not set, the Plugin will not forward the request to low prior [...]
| balancer | object | False | | | Load balancing configurations. |
| balancer.algorithm | string | False | roundrobin | [roundrobin, chash] | Load balancing algorithm. When set to `roundrobin`, weighted round robin algorithm is used. When set to `chash`, consistent hashing algorithm is used. |
| balancer.hash_on | string | False | | [vars, headers, cookie, consumer, vars_combinations] | Used when `type` is `chash`. Support hashing on [NGINX variables](https://nginx.org/en/docs/varindex.html), headers, cookie, consumer, or a combination of [NGINX variables](https://nginx.org/en/docs/varindex.html). |
@@ -73,31 +76,29 @@ In addition, the Plugin also supports logging LLM request information in the acc
| instances.auth.gcp.expire_early_secs | integer | False | 60 | minimum = 0 | Seconds to expire the access token before its actual expiration time to avoid edge cases. |
| instances.options | object | False | | | Model configurations. In addition to `model`, you can configure additional parameters and they will be forwarded to the upstream LLM service in the request body. For instance, if you are working with OpenAI, DeepSeek, or AIMLAPI, you can configure additional parameters such as `max_tokens`, `temperature`, `top_p`, and `stream`. See your LLM provider's API documentation f [...]
| instances.options.model | string | False | | | Name of the LLM model, such as `gpt-4` or `gpt-3.5`. See your LLM provider's API documentation for more available models. |
-| instances.override | object | False | | | Override setting. |
-| instances.override.endpoint | string | False | | | LLM provider endpoint to replace the default endpoint with. If not configured, the Plugin uses the default OpenAI endpoint `https://api.openai.com/v1/chat/completions`. |
-| logging | object | False | | | Logging configurations. Does not affect `error.log`. |
-| logging.summaries | boolean | False | false | | If true, logs request LLM model, duration, request, and response tokens. |
-| logging.payloads | boolean | False | false | | If true, logs request and response payload. |
-| checks | object | False | | | Health check configurations. Note that at the moment, OpenAI, DeepSeek, and AIMLAPI do not provide an official health check endpoint. Other LLM services that you can configure under `openai-compatible` provider may have available health check endpoints. |
-| checks.active | object | True | | | Active health check configurations. |
-| checks.active.type | string | False | http | [http, https, tcp] | Type of health check connection. |
-| checks.active.timeout | number | False | 1 | | Health check timeout in seconds. |
-| checks.active.concurrency | integer | False | 10 | | Number of upstream nodes to be checked at the same time. |
-| checks.active.host | string | False | | | HTTP host. |
-| checks.active.port | integer | False | | between 1 and 65535 inclusive | HTTP port. |
-| checks.active.http_path | string | False | / | | Path for HTTP probing requests. |
-| checks.active.https_verify_certificate | boolean | False | true | | If true, verify the node's TLS certificate. |
-| checks.active.req_headers | array[string] | False | | | Additional request headers for the active health check probe. |
-| checks.active.healthy | object | False | | | Healthy check configurations. |
-| checks.active.healthy.interval | integer | False | 1 | minimum = 1 | Time interval of checking healthy nodes, in seconds. |
-| checks.active.healthy.http_statuses | array[integer] | False | [200, 302] | between 200 and 599 | HTTP status codes defining a healthy node. |
-| checks.active.healthy.successes | integer | False | 2 | between 1 and 254 | Number of successful probes to define a healthy node. |
-| checks.active.unhealthy | object | False | | | Unhealthy check configurations. |
-| checks.active.unhealthy.interval | integer | False | 1 | minimum = 1 | Time interval of checking unhealthy nodes, in seconds. |
-| checks.active.unhealthy.http_statuses | array[integer] | False | [429, 404, 500, 501, 502, 503, 504, 505] | between 200 and 599 | HTTP status codes defining an unhealthy node. |
-| checks.active.unhealthy.http_failures | integer | False | 5 | between 1 and 254 | Number of HTTP failures to define an unhealthy node. |
-| checks.active.unhealthy.tcp_failures | integer | False | 2 | between 1 and 254 | Number of TCP failures to define an unhealthy node. |
-| checks.active.unhealthy.timeouts | integer | False | 3 | between 1 and 254 | Number of probe timeouts to define an unhealthy node. |
+| logging | object | False | | | Logging configurations. |
+| logging.summaries | boolean | False | false | | If true, log request LLM model, duration, request, and response tokens. |
+| logging.payloads | boolean | False | false | | If true, log request and response payload. |
+| instances.override | object | False | | | Override setting. |
+| instances.override.endpoint | string | False | | | LLM provider endpoint to replace the default endpoint with. If not configured, the Plugin uses the default OpenAI endpoint `https://api.openai.com/v1/chat/completions`. |
+| instances.checks | object | False | | | Health check configurations. Note that at the moment, OpenAI, DeepSeek, and AIMLAPI do not provide an official health check endpoint. Other LLM services that you can configure under `openai-compatible` provider may have available health check endpoints. |
+| instances.checks.active | object | True | | | Active health check configurations. |
+| instances.checks.active.type | string | False | http | [http, https, tcp] | Type of health check connection. |
+| instances.checks.active.timeout | number | False | 1 | | Health check timeout in seconds. |
+| instances.checks.active.concurrency | integer | False | 10 | | Number of upstream nodes to be checked at the same time. |
+| instances.checks.active.host | string | False | | | HTTP host. |
+| instances.checks.active.port | integer | False | | between 1 and 65535 inclusive | HTTP port. |
+| instances.checks.active.http_path | string | False | / | | Path for HTTP probing requests. |
+| instances.checks.active.https_verify_certificate | boolean | False | true | | If true, verify the node's TLS certificate. |
+| instances.checks.active.healthy | object | False | | | Healthy check configurations. |
+| instances.checks.active.healthy.interval | integer | False | 1 | | Time interval of checking healthy nodes, in seconds. |
+| instances.checks.active.healthy.http_statuses | array[integer] | False | [200,302] | status code between 200 and 599 inclusive | An array of HTTP status codes that defines a healthy node. |
+| instances.checks.active.healthy.successes | integer | False | 2 | between 1 and 254 inclusive | Number of successful probes to define a healthy node. |
+| instances.checks.active.unhealthy | object | False | | | Unhealthy check configurations. |
+| instances.checks.active.unhealthy.interval | integer | False | 1 | | Time interval of checking unhealthy nodes, in seconds. |
+| instances.checks.active.unhealthy.http_statuses | array[integer] | False | [429,404,500,501,502,503,504,505] | status code between 200 and 599 inclusive | An array of HTTP status codes that defines an unhealthy node. |
+| instances.checks.active.unhealthy.http_failures | integer | False | 5 | between 1 and 254 inclusive | Number of HTTP failures to define an unhealthy node. |
+| instances.checks.active.unhealthy.timeout | integer | False | 3 | between 1 and 254 inclusive | Number of probe timeouts to define an unhealthy node. |
| timeout | integer | False | 30000 | greater than or equal to 1 | Request timeout in milliseconds when requesting the LLM service. |
| keepalive | boolean | False | true | | If true, keep the connection alive when requesting the LLM service. |
| keepalive_timeout | integer | False | 60000 | greater than or equal to 1000 | Request timeout in milliseconds when requesting the LLM service. |
@@ -126,6 +127,17 @@ For demonstration and easier differentiation, you will be configuring one OpenAI
Create a Route as such and update with your LLM providers, models, API keys, and endpoints if applicable:
+<Tabs
+groupId="api"
+defaultValue="admin-api"
+values={[
+{label: 'Admin API', value: 'admin-api'},
+{label: 'ADC', value: 'adc'},
+{label: 'Ingress Controller', value: 'aic'}
+]}>
+
+<TabItem value="admin-api">
+
```shell
curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
-H "X-API-KEY: ${admin_key}" \
@@ -168,6 +180,166 @@ curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
}'
```
+</TabItem>
+
+<TabItem value="adc">
+
+```yaml title="adc.yaml"
+services:
+  - name: ai-proxy-multi-service
+    routes:
+      - name: ai-proxy-multi-route
+        uris:
+          - /anything
+        methods:
+          - POST
+        plugins:
+          ai-proxy-multi:
+            instances:
+              - name: openai-instance
+                provider: openai
+                weight: 8
+                auth:
+                  header:
+                    Authorization: "Bearer ${OPENAI_API_KEY}"
+                options:
+                  model: gpt-4
+              - name: deepseek-instance
+                provider: deepseek
+                weight: 2
+                auth:
+                  header:
+                    Authorization: "Bearer ${DEEPSEEK_API_KEY}"
+                options:
+                  model: deepseek-chat
+```
+
+Synchronize the configuration to the gateway:
+
+```shell
+adc sync -f adc.yaml
+```
+
+</TabItem>
+
+<TabItem value="aic">
+
+<Tabs
+groupId="k8s-api"
+defaultValue="gateway-api"
+values={[
+{label: 'Gateway API', value: 'gateway-api'},
+{label: 'APISIX CRD', value: 'apisix-crd'}
+]}>
+
+<TabItem value="gateway-api">
+
+```yaml title="ai-proxy-multi-ic.yaml"
+apiVersion: apisix.apache.org/v1alpha1
+kind: PluginConfig
+metadata:
+  namespace: aic
+  name: ai-proxy-multi-plugin-config
+spec:
+  plugins:
+    - name: ai-proxy-multi
+      config:
+        instances:
+          - name: openai-instance
+            provider: openai
+            weight: 8
+            auth:
+              header:
+                Authorization: "Bearer your-api-key"
+            options:
+              model: gpt-4
+          - name: deepseek-instance
+            provider: deepseek
+            weight: 2
+            auth:
+              header:
+                Authorization: "Bearer your-api-key"
+            options:
+              model: deepseek-chat
+---
+apiVersion: gateway.networking.k8s.io/v1
+kind: HTTPRoute
+metadata:
+  namespace: aic
+  name: ai-proxy-multi-route
+spec:
+  parentRefs:
+    - name: apisix
+  rules:
+    - matches:
+        - path:
+            type: Exact
+            value: /anything
+          method: POST
+      filters:
+        - type: ExtensionRef
+          extensionRef:
+            group: apisix.apache.org
+            kind: PluginConfig
+            name: ai-proxy-multi-plugin-config
+```
+
+</TabItem>
+
+<TabItem value="apisix-crd">
+
+```yaml title="ai-proxy-multi-ic.yaml"
+apiVersion: apisix.apache.org/v2
+kind: ApisixRoute
+metadata:
+  namespace: aic
+  name: ai-proxy-multi-route
+spec:
+  ingressClassName: apisix
+  http:
+    - name: ai-proxy-multi-route
+      match:
+        paths:
+          - /anything
+        methods:
+          - POST
+      plugins:
+        - name: ai-proxy-multi
+          enable: true
+          config:
+            instances:
+              - name: openai-instance
+                provider: openai
+                weight: 8
+                auth:
+                  header:
+                    Authorization: "Bearer your-api-key"
+                options:
+                  model: gpt-4
+              - name: deepseek-instance
+                provider: deepseek
+                weight: 2
+                auth:
+                  header:
+                    Authorization: "Bearer your-api-key"
+                options:
+                  model: deepseek-chat
+```
+
+</TabItem>
+
+</Tabs>
+
+Apply the configuration to your cluster:
+
+```shell
+kubectl apply -f ai-proxy-multi-ic.yaml
+```
+
+</TabItem>
+
+</Tabs>
+
Send 10 POST requests to the Route with a system prompt and a sample user question in the request body, to see the number of requests forwarded to OpenAI and DeepSeek:
```shell
@@ -208,6 +380,17 @@ The following example demonstrates how you can configure two models with differe
Create a Route as such and update with your LLM providers, models, API keys, and endpoints if applicable:
+<Tabs
+groupId="api"
+defaultValue="admin-api"
+values={[
+{label: 'Admin API', value: 'admin-api'},
+{label: 'ADC', value: 'adc'},
+{label: 'Ingress Controller', value: 'aic'}
+]}>
+
+<TabItem value="admin-api">
+
```shell
curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
-H "X-API-KEY: ${admin_key}" \
@@ -217,7 +400,7 @@ curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
"methods": ["POST"],
"plugins": {
"ai-proxy-multi": {
- "fallback_strategy: ["rate_limiting"],
+ "fallback_strategy": ["rate_limiting"],
"instances": [
{
"name": "openai-instance",
@@ -263,6 +446,199 @@ curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
}'
```
+</TabItem>
+
+<TabItem value="adc">
+
+```yaml title="adc.yaml"
+services:
+  - name: ai-proxy-multi-service
+    routes:
+      - name: ai-proxy-multi-route
+        uris:
+          - /anything
+        methods:
+          - POST
+        plugins:
+          ai-proxy-multi:
+            fallback_strategy:
+              - rate_limiting
+            instances:
+              - name: openai-instance
+                provider: openai
+                priority: 1
+                weight: 0
+                auth:
+                  header:
+                    Authorization: "Bearer ${OPENAI_API_KEY}"
+                options:
+                  model: gpt-4
+              - name: deepseek-instance
+                provider: deepseek
+                priority: 0
+                weight: 0
+                auth:
+                  header:
+                    Authorization: "Bearer ${DEEPSEEK_API_KEY}"
+                options:
+                  model: deepseek-chat
+          ai-rate-limiting:
+            instances:
+              - name: openai-instance
+                limit: 10
+                time_window: 60
+            limit_strategy: total_tokens
+```
+
+Synchronize the configuration to the gateway:
+
+```shell
+adc sync -f adc.yaml
+```
+
+</TabItem>
+
+<TabItem value="aic">
+
+<Tabs
+groupId="k8s-api"
+defaultValue="gateway-api"
+values={[
+{label: 'Gateway API', value: 'gateway-api'},
+{label: 'APISIX CRD', value: 'apisix-crd'}
+]}>
+
+<TabItem value="gateway-api">
+
+```yaml title="ai-proxy-multi-ic.yaml"
+apiVersion: apisix.apache.org/v1alpha1
+kind: PluginConfig
+metadata:
+  namespace: aic
+  name: ai-proxy-multi-plugin-config
+spec:
+  plugins:
+    - name: ai-proxy-multi
+      config:
+        fallback_strategy:
+          - rate_limiting
+        instances:
+          - name: openai-instance
+            provider: openai
+            priority: 1
+            weight: 0
+            auth:
+              header:
+                Authorization: "Bearer your-api-key"
+            options:
+              model: gpt-4
+          - name: deepseek-instance
+            provider: deepseek
+            priority: 0
+            weight: 0
+            auth:
+              header:
+                Authorization: "Bearer your-api-key"
+            options:
+              model: deepseek-chat
+    - name: ai-rate-limiting
+      config:
+        instances:
+          - name: openai-instance
+            limit: 10
+            time_window: 60
+        limit_strategy: total_tokens
+---
+apiVersion: gateway.networking.k8s.io/v1
+kind: HTTPRoute
+metadata:
+  namespace: aic
+  name: ai-proxy-multi-route
+spec:
+  parentRefs:
+    - name: apisix
+  rules:
+    - matches:
+        - path:
+            type: Exact
+            value: /anything
+          method: POST
+      filters:
+        - type: ExtensionRef
+          extensionRef:
+            group: apisix.apache.org
+            kind: PluginConfig
+            name: ai-proxy-multi-plugin-config
+```
+
+</TabItem>
+
+<TabItem value="apisix-crd">
+
+```yaml title="ai-proxy-multi-ic.yaml"
+apiVersion: apisix.apache.org/v2
+kind: ApisixRoute
+metadata:
+  namespace: aic
+  name: ai-proxy-multi-route
+spec:
+  ingressClassName: apisix
+  http:
+    - name: ai-proxy-multi-route
+      match:
+        paths:
+          - /anything
+        methods:
+          - POST
+      plugins:
+        - name: ai-proxy-multi
+          enable: true
+          config:
+            fallback_strategy:
+              - rate_limiting
+            instances:
+              - name: openai-instance
+                provider: openai
+                priority: 1
+                weight: 0
+                auth:
+                  header:
+                    Authorization: "Bearer your-api-key"
+                options:
+                  model: gpt-4
+              - name: deepseek-instance
+                provider: deepseek
+                priority: 0
+                weight: 0
+                auth:
+                  header:
+                    Authorization: "Bearer your-api-key"
+                options:
+                  model: deepseek-chat
+        - name: ai-rate-limiting
+          enable: true
+          config:
+            instances:
+              - name: openai-instance
+                limit: 10
+                time_window: 60
+            limit_strategy: total_tokens
+```
+
+</TabItem>
+
+</Tabs>
+
+Apply the configuration to your cluster:
+
+```shell
+kubectl apply -f ai-proxy-multi-ic.yaml
+```
+
+</TabItem>
+
+</Tabs>
+
Send a POST request to the Route with a system prompt and a sample user question in the request body:
```shell
@@ -316,7 +692,7 @@ You should receive a response similar to the following:
Since the `total_tokens` value exceeds the configured quota of `10`, the next request within the 60-second window is expected to be forwarded to the other instance.
-Within the same 60-second window, send another POST request to the route:
+Within the same 60-second window, send another POST request to the Route:
```shell
curl "http://127.0.0.1:9080/anything" -X POST \
@@ -351,10 +727,21 @@ You should see a response similar to the following:
### Load Balance and Rate Limit by Consumers
-The following example demonstrates how you can configure two models for load balancing and apply rate limiting by consumer.
+The following example demonstrates how you can configure two models for load balancing and apply rate limiting by Consumer.
Create a Consumer `johndoe` and a rate limiting quota of 10 tokens in a 60-second window on `openai-instance` instance:
+<Tabs
+groupId="api"
+defaultValue="admin-api"
+values={[
+{label: 'Admin API', value: 'admin-api'},
+{label: 'ADC', value: 'adc'},
+{label: 'Ingress Controller', value: 'aic'}
+]}>
+
+<TabItem value="admin-api">
+
```shell
curl "http://127.0.0.1:9180/apisix/admin/consumers" -X PUT \
-H "X-API-KEY: ${admin_key}" \
@@ -376,7 +763,7 @@ curl "http://127.0.0.1:9180/apisix/admin/consumers" -X PUT \
}'
```
-Configure `key-auth` credential for `johndoe`:
+Configure `key-auth` Credential for `johndoe`:
```shell
curl "http://127.0.0.1:9180/apisix/admin/consumers/johndoe/credentials" -X PUT \
@@ -397,7 +784,7 @@ Create another Consumer `janedoe` and a rate limiting quota of 10 tokens in a 60
curl "http://127.0.0.1:9180/apisix/admin/consumers" -X PUT \
-H "X-API-KEY: ${admin_key}" \
-d '{
- "username": "johndoe",
+ "username": "janedoe",
"plugins": {
"ai-rate-limiting": {
"instances": [
@@ -414,7 +801,7 @@ curl "http://127.0.0.1:9180/apisix/admin/consumers" -X PUT \
}'
```
-Configure `key-auth` credential for `janedoe`:
+Configure `key-auth` Credential for `janedoe`:
```shell
curl "http://127.0.0.1:9180/apisix/admin/consumers/janedoe/credentials" -X PUT \
@@ -429,8 +816,183 @@ curl "http://127.0.0.1:9180/apisix/admin/consumers/janedoe/credentials" -X PUT \
}'
```
+</TabItem>
+
+<TabItem value="adc">
+
+```yaml title="adc.yaml"
+consumers:
+ - username: johndoe
+ plugins:
+ ai-rate-limiting:
+ instances:
+ - name: openai-instance
+ limit: 10
+ time_window: 60
+ rejected_code: 429
+ limit_strategy: total_tokens
+ credentials:
+ - name: key-auth
+ type: key-auth
+ config:
+ key: john-key
+ - username: janedoe
+ plugins:
+ ai-rate-limiting:
+ instances:
+ - name: deepseek-instance
+ limit: 10
+ time_window: 60
+ rejected_code: 429
+ limit_strategy: total_tokens
+ credentials:
+ - name: key-auth
+ type: key-auth
+ config:
+ key: jane-key
+```
+
+Synchronize the configuration to the gateway:
+
+```shell
+adc sync -f adc.yaml
+```
+
+</TabItem>
+
+<TabItem value="aic">
+
+<Tabs
+groupId="k8s-api"
+defaultValue="gateway-api"
+values={[
+{label: 'Gateway API', value: 'gateway-api'},
+{label: 'APISIX CRD', value: 'apisix-crd'}
+]}>
+
+<TabItem value="gateway-api">
+
+```yaml title="ai-proxy-multi-consumer-ic.yaml"
+apiVersion: apisix.apache.org/v1alpha1
+kind: Consumer
+metadata:
+  namespace: aic
+  name: johndoe
+spec:
+  gatewayRef:
+    name: apisix
+  plugins:
+    - name: ai-rate-limiting
+      config:
+        instances:
+          - name: openai-instance
+            limit: 10
+            time_window: 60
+        rejected_code: 429
+        limit_strategy: total_tokens
+  credentials:
+    - type: key-auth
+      name: primary-key
+      config:
+        key: john-key
+---
+apiVersion: apisix.apache.org/v1alpha1
+kind: Consumer
+metadata:
+  namespace: aic
+  name: janedoe
+spec:
+  gatewayRef:
+    name: apisix
+  plugins:
+    - name: ai-rate-limiting
+      config:
+        instances:
+          - name: deepseek-instance
+            limit: 10
+            time_window: 60
+        rejected_code: 429
+        limit_strategy: total_tokens
+  credentials:
+    - type: key-auth
+      name: primary-key
+      config:
+        key: jane-key
+```
+
+</TabItem>
+
+<TabItem value="apisix-crd">
+
+```yaml title="ai-proxy-multi-consumer-ic.yaml"
+apiVersion: apisix.apache.org/v2
+kind: ApisixConsumer
+metadata:
+  namespace: aic
+  name: johndoe
+spec:
+  ingressClassName: apisix
+  authParameter:
+    keyAuth:
+      value:
+        key: john-key
+  plugins:
+    ai-rate-limiting:
+      instances:
+        - name: openai-instance
+          limit: 10
+          time_window: 60
+      rejected_code: 429
+      limit_strategy: total_tokens
+---
+apiVersion: apisix.apache.org/v2
+kind: ApisixConsumer
+metadata:
+  namespace: aic
+  name: janedoe
+spec:
+  ingressClassName: apisix
+  authParameter:
+    keyAuth:
+      value:
+        key: jane-key
+  plugins:
+    ai-rate-limiting:
+      instances:
+        - name: deepseek-instance
+          limit: 10
+          time_window: 60
+      rejected_code: 429
+      limit_strategy: total_tokens
+```
+
+</TabItem>
+
+</Tabs>
+
+Apply the configuration to your cluster:
+
+```shell
+kubectl apply -f ai-proxy-multi-consumer-ic.yaml
+```
+
+</TabItem>
+
+</Tabs>
+
Create a Route as such and update with your LLM providers, models, API keys, and endpoints if applicable:
+<Tabs
+groupId="api"
+defaultValue="admin-api"
+values={[
+{label: 'Admin API', value: 'admin-api'},
+{label: 'ADC', value: 'adc'},
+{label: 'Ingress Controller', value: 'aic'}
+]}>
+
+<TabItem value="admin-api">
+
```shell
curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
-H "X-API-KEY: ${admin_key}" \
@@ -441,7 +1003,7 @@ curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
"plugins": {
"key-auth": {},
"ai-proxy-multi": {
- "fallback_strategy: ["rate_limiting"],
+ "fallback_strategy": ["rate_limiting"],
"instances": [
{
"name": "openai-instance",
@@ -475,7 +1037,180 @@ curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
}'
```
-Send a POST request to the Route without any consumer key:
+</TabItem>
+
+<TabItem value="adc">
+
+```yaml title="adc.yaml"
+services:
+  - name: ai-proxy-multi-service
+    routes:
+      - name: ai-proxy-multi-route
+        uris:
+          - /anything
+        methods:
+          - POST
+        plugins:
+          key-auth: {}
+          ai-proxy-multi:
+            fallback_strategy:
+              - rate_limiting
+            instances:
+              - name: openai-instance
+                provider: openai
+                weight: 0
+                auth:
+                  header:
+                    Authorization: "Bearer ${OPENAI_API_KEY}"
+                options:
+                  model: gpt-4
+              - name: deepseek-instance
+                provider: deepseek
+                weight: 0
+                auth:
+                  header:
+                    Authorization: "Bearer ${DEEPSEEK_API_KEY}"
+                options:
+                  model: deepseek-chat
+```
+
+Synchronize the configuration to the gateway:
+
+```shell
+adc sync -f adc.yaml
+```
+
+</TabItem>
+
+<TabItem value="aic">
+
+<Tabs
+groupId="k8s-api"
+defaultValue="gateway-api"
+values={[
+{label: 'Gateway API', value: 'gateway-api'},
+{label: 'APISIX CRD', value: 'apisix-crd'}
+]}>
+
+<TabItem value="gateway-api">
+
+```yaml title="ai-proxy-multi-ic.yaml"
+apiVersion: apisix.apache.org/v1alpha1
+kind: PluginConfig
+metadata:
+  namespace: aic
+  name: ai-proxy-multi-plugin-config
+spec:
+  plugins:
+    - name: key-auth
+      config:
+        _meta:
+          disable: false
+    - name: ai-proxy-multi
+      config:
+        fallback_strategy:
+          - rate_limiting
+        instances:
+          - name: openai-instance
+            provider: openai
+            weight: 0
+            auth:
+              header:
+                Authorization: "Bearer your-api-key"
+            options:
+              model: gpt-4
+          - name: deepseek-instance
+            provider: deepseek
+            weight: 0
+            auth:
+              header:
+                Authorization: "Bearer your-api-key"
+            options:
+              model: deepseek-chat
+---
+apiVersion: gateway.networking.k8s.io/v1
+kind: HTTPRoute
+metadata:
+  namespace: aic
+  name: ai-proxy-multi-route
+spec:
+  parentRefs:
+    - name: apisix
+  rules:
+    - matches:
+        - path:
+            type: Exact
+            value: /anything
+          method: POST
+      filters:
+        - type: ExtensionRef
+          extensionRef:
+            group: apisix.apache.org
+            kind: PluginConfig
+            name: ai-proxy-multi-plugin-config
+```
+
+</TabItem>
+
+<TabItem value="apisix-crd">
+
+```yaml title="ai-proxy-multi-ic.yaml"
+apiVersion: apisix.apache.org/v2
+kind: ApisixRoute
+metadata:
+  namespace: aic
+  name: ai-proxy-multi-route
+spec:
+  ingressClassName: apisix
+  http:
+    - name: ai-proxy-multi-route
+      match:
+        paths:
+          - /anything
+        methods:
+          - POST
+      plugins:
+        - name: key-auth
+          enable: true
+        - name: ai-proxy-multi
+          enable: true
+          config:
+            fallback_strategy:
+              - rate_limiting
+            instances:
+              - name: openai-instance
+                provider: openai
+                weight: 0
+                auth:
+                  header:
+                    Authorization: "Bearer your-api-key"
+                options:
+                  model: gpt-4
+              - name: deepseek-instance
+                provider: deepseek
+                weight: 0
+                auth:
+                  header:
+                    Authorization: "Bearer your-api-key"
+                options:
+                  model: deepseek-chat
+```
+
+</TabItem>
+
+</Tabs>
+
+Apply the configuration to your cluster:
+
+```shell
+kubectl apply -f ai-proxy-multi-ic.yaml
+```
+
+</TabItem>
+
+</Tabs>
+
+Send a POST request to the Route without any Consumer key:
```shell
curl -i "http://127.0.0.1:9080/anything" -X POST \
@@ -661,7 +1396,7 @@ You should see a response similar to the following:
}
```
-This shows `ai-proxy-multi` load balance the traffic with respect to the rate limiting rules in `ai-rate-limiting` by consumers.
+This shows `ai-proxy-multi` load balance the traffic with respect to the rate limiting rules in `ai-rate-limiting` by Consumers.
### Restrict Maximum Number of Completion Tokens
@@ -671,6 +1406,17 @@ For demonstration and easier differentiation, you will be configuring one OpenAI
Create a Route as such and update with your LLM providers, models, API keys, and endpoints if applicable:
+<Tabs
+groupId="api"
+defaultValue="admin-api"
+values={[
+{label: 'Admin API', value: 'admin-api'},
+{label: 'ADC', value: 'adc'},
+{label: 'Ingress Controller', value: 'aic'}
+]}>
+
+<TabItem value="admin-api">
+
```shell
curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
-H "X-API-KEY: ${admin_key}" \
@@ -715,6 +1461,172 @@ curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
}'
```
+</TabItem>
+
+<TabItem value="adc">
+
+```yaml title="adc.yaml"
+services:
+  - name: ai-proxy-multi-service
+    routes:
+      - name: ai-proxy-multi-route
+        uris:
+          - /anything
+        methods:
+          - POST
+        plugins:
+          ai-proxy-multi:
+            instances:
+              - name: openai-instance
+                provider: openai
+                weight: 0
+                auth:
+                  header:
+                    Authorization: "Bearer ${OPENAI_API_KEY}"
+                options:
+                  model: gpt-4
+                  max_tokens: 50
+              - name: deepseek-instance
+                provider: deepseek
+                weight: 0
+                auth:
+                  header:
+                    Authorization: "Bearer ${DEEPSEEK_API_KEY}"
+                options:
+                  model: deepseek-chat
+                  max_tokens: 100
+```
+
+Synchronize the configuration to the gateway:
+
+```shell
+adc sync -f adc.yaml
+```
+
+</TabItem>
+
+<TabItem value="aic">
+
+<Tabs
+groupId="k8s-api"
+defaultValue="gateway-api"
+values={[
+{label: 'Gateway API', value: 'gateway-api'},
+{label: 'APISIX CRD', value: 'apisix-crd'}
+]}>
+
+<TabItem value="gateway-api">
+
+```yaml title="ai-proxy-multi-ic.yaml"
+apiVersion: apisix.apache.org/v1alpha1
+kind: PluginConfig
+metadata:
+  namespace: aic
+  name: ai-proxy-multi-plugin-config
+spec:
+  plugins:
+    - name: ai-proxy-multi
+      config:
+        instances:
+          - name: openai-instance
+            provider: openai
+            weight: 0
+            auth:
+              header:
+                Authorization: "Bearer your-api-key"
+            options:
+              model: gpt-4
+              max_tokens: 50
+          - name: deepseek-instance
+            provider: deepseek
+            weight: 0
+            auth:
+              header:
+                Authorization: "Bearer your-api-key"
+            options:
+              model: deepseek-chat
+              max_tokens: 100
+---
+apiVersion: gateway.networking.k8s.io/v1
+kind: HTTPRoute
+metadata:
+  namespace: aic
+  name: ai-proxy-multi-route
+spec:
+  parentRefs:
+    - name: apisix
+  rules:
+    - matches:
+        - path:
+            type: Exact
+            value: /anything
+          method: POST
+      filters:
+        - type: ExtensionRef
+          extensionRef:
+            group: apisix.apache.org
+            kind: PluginConfig
+            name: ai-proxy-multi-plugin-config
+```
+
+</TabItem>
+
+<TabItem value="apisix-crd">
+
+```yaml title="ai-proxy-multi-ic.yaml"
+apiVersion: apisix.apache.org/v2
+kind: ApisixRoute
+metadata:
+  namespace: aic
+  name: ai-proxy-multi-route
+spec:
+  ingressClassName: apisix
+  http:
+    - name: ai-proxy-multi-route
+      match:
+        paths:
+          - /anything
+        methods:
+          - POST
+      plugins:
+        - name: ai-proxy-multi
+          enable: true
+          config:
+            instances:
+              - name: openai-instance
+                provider: openai
+                weight: 0
+                auth:
+                  header:
+                    Authorization: "Bearer your-api-key"
+                options:
+                  model: gpt-4
+                  max_tokens: 50
+              - name: deepseek-instance
+                provider: deepseek
+                weight: 0
+                auth:
+                  header:
+                    Authorization: "Bearer your-api-key"
+                options:
+                  model: deepseek-chat
+                  max_tokens: 100
+```
+
+</TabItem>
+
+</Tabs>
+
+Apply the configuration to your cluster:
+
+```shell
+kubectl apply -f ai-proxy-multi-ic.yaml
+```
+
+</TabItem>
+
+</Tabs>
+
Send a POST request to the Route with a system prompt and a sample user question in the request body:
```shell
@@ -803,6 +1715,17 @@ The following example demonstrates how you can configure the `ai-proxy-multi` Pl
Create a Route as such and update with your LLM providers, embedding models, API keys, and endpoints:
+<Tabs
+groupId="api"
+defaultValue="admin-api"
+values={[
+{label: 'Admin API', value: 'admin-api'},
+{label: 'ADC', value: 'adc'},
+{label: 'Ingress Controller', value: 'aic'}
+]}>
+
+<TabItem value="admin-api">
+
```shell
curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
-H "X-API-KEY: ${admin_key}" \
@@ -851,6 +1774,178 @@ curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
}'
```
+</TabItem>
+
+<TabItem value="adc">
+
+```yaml title="adc.yaml"
+services:
+  - name: ai-proxy-multi-service
+    routes:
+      - name: ai-proxy-multi-route
+        uris:
+          - /anything
+        methods:
+          - POST
+        plugins:
+          ai-proxy-multi:
+            instances:
+              - name: openai-instance
+                provider: openai
+                weight: 0
+                auth:
+                  header:
+                    Authorization: "Bearer ${OPENAI_API_KEY}"
+                options:
+                  model: text-embedding-3-small
+                override:
+                  endpoint: "https://api.openai.com/v1/embeddings"
+              - name: az-openai-instance
+                provider: azure-openai
+                weight: 0
+                auth:
+                  header:
+                    api-key: "${AZ_OPENAI_API_KEY}"
+                options:
+                  model: text-embedding-3-small
+                override:
+                  endpoint: "https://ai-plugin-developer.openai.azure.com/openai/deployments/text-embedding-3-small/embeddings?api-version=2023-05-15"
+```
+
+Synchronize the configuration to the gateway:
+
+```shell
+adc sync -f adc.yaml
+```
+
+</TabItem>
+
+<TabItem value="aic">
+
+<Tabs
+groupId="k8s-api"
+defaultValue="gateway-api"
+values={[
+{label: 'Gateway API', value: 'gateway-api'},
+{label: 'APISIX CRD', value: 'apisix-crd'}
+]}>
+
+<TabItem value="gateway-api">
+
+```yaml title="ai-proxy-multi-ic.yaml"
+apiVersion: apisix.apache.org/v1alpha1
+kind: PluginConfig
+metadata:
+  namespace: aic
+  name: ai-proxy-multi-plugin-config
+spec:
+  plugins:
+    - name: ai-proxy-multi
+      config:
+        instances:
+          - name: openai-instance
+            provider: openai
+            weight: 0
+            auth:
+              header:
+                Authorization: "Bearer your-api-key"
+            options:
+              model: text-embedding-3-small
+            override:
+              endpoint: "https://api.openai.com/v1/embeddings"
+          - name: az-openai-instance
+            provider: azure-openai
+            weight: 0
+            auth:
+              header:
+                api-key: "your-api-key"
+            options:
+              model: text-embedding-3-small
+            override:
+              endpoint: "https://ai-plugin-developer.openai.azure.com/openai/deployments/text-embedding-3-small/embeddings?api-version=2023-05-15"
+---
+apiVersion: gateway.networking.k8s.io/v1
+kind: HTTPRoute
+metadata:
+  namespace: aic
+  name: ai-proxy-multi-route
+spec:
+  parentRefs:
+    - name: apisix
+  rules:
+    - matches:
+        - path:
+            type: Exact
+            value: /anything
+          method: POST
+      filters:
+        - type: ExtensionRef
+          extensionRef:
+            group: apisix.apache.org
+            kind: PluginConfig
+            name: ai-proxy-multi-plugin-config
+```
+
+</TabItem>
+
+<TabItem value="apisix-crd">
+
+```yaml title="ai-proxy-multi-ic.yaml"
+apiVersion: apisix.apache.org/v2
+kind: ApisixRoute
+metadata:
+  namespace: aic
+  name: ai-proxy-multi-route
+spec:
+  ingressClassName: apisix
+  http:
+    - name: ai-proxy-multi-route
+      match:
+        paths:
+          - /anything
+        methods:
+          - POST
+      plugins:
+        - name: ai-proxy-multi
+          enable: true
+          config:
+            instances:
+              - name: openai-instance
+                provider: openai
+                weight: 0
+                auth:
+                  header:
+                    Authorization: "Bearer your-api-key"
+                options:
+                  model: text-embedding-3-small
+                override:
+                  endpoint: "https://api.openai.com/v1/embeddings"
+              - name: az-openai-instance
+                provider: azure-openai
+                weight: 0
+                auth:
+                  header:
+                    api-key: "your-api-key"
+                options:
+                  model: text-embedding-3-small
+                override:
+                  endpoint: "https://ai-plugin-developer.openai.azure.com/openai/deployments/text-embedding-3-small/embeddings?api-version=2023-05-15"
+```
+
+</TabItem>
+
+</Tabs>
+
+Apply the configuration to your cluster:
+
+```shell
+kubectl apply -f ai-proxy-multi-ic.yaml
+```
+
+</TabItem>
+
+</Tabs>
+
Send a POST request to the Route with an input string:
```shell
@@ -895,6 +1990,17 @@ The following example demonstrates how you can configure the `ai-proxy-multi` Pl
Create a Route as such and update the LLM providers, embedding models, API keys, and health check related configurations:
+<Tabs
+groupId="api"
+defaultValue="admin-api"
+values={[
+{label: 'Admin API', value: 'admin-api'},
+{label: 'ADC', value: 'adc'},
+{label: 'Ingress Controller', value: 'aic'}
+]}>
+
+<TabItem value="admin-api">
+
```shell
curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
-H "X-API-KEY: ${admin_key}" \
@@ -952,6 +2058,199 @@ curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
}'
```
+</TabItem>
+
+<TabItem value="adc">
+
+```yaml title="adc.yaml"
+services:
+  - name: ai-proxy-multi-service
+    routes:
+      - name: ai-proxy-multi-route
+        uris:
+          - /anything
+        methods:
+          - POST
+        plugins:
+          ai-proxy-multi:
+            instances:
+              - name: llm-instance-1
+                provider: openai-compatible
+                weight: 0
+                auth:
+                  header:
+                    Authorization: "Bearer ${YOUR_LLM_API_KEY}"
+                options:
+                  model: "${YOUR_LLM_MODEL}"
+              - name: llm-instance-2
+                provider: openai-compatible
+                weight: 0
+                auth:
+                  header:
+                    Authorization: "Bearer ${YOUR_LLM_API_KEY}"
+                options:
+                  model: "${YOUR_LLM_MODEL}"
+                checks:
+                  active:
+                    type: https
+                    host: yourhost.com
+                    http_path: /your/probe/path
+                    healthy:
+                      interval: 2
+                      successes: 1
+                    unhealthy:
+                      interval: 1
+                      http_failures: 3
+```
+
+Synchronize the configuration to the gateway:
+
+```shell
+adc sync -f adc.yaml
+```
+
+</TabItem>
+
+<TabItem value="aic">
+
+<Tabs
+groupId="k8s-api"
+defaultValue="gateway-api"
+values={[
+{label: 'Gateway API', value: 'gateway-api'},
+{label: 'APISIX CRD', value: 'apisix-crd'}
+]}>
+
+<TabItem value="gateway-api">
+
+```yaml title="ai-proxy-multi-ic.yaml"
+apiVersion: apisix.apache.org/v1alpha1
+kind: PluginConfig
+metadata:
+  namespace: aic
+  name: ai-proxy-multi-plugin-config
+spec:
+  plugins:
+    - name: ai-proxy-multi
+      config:
+        instances:
+          - name: llm-instance-1
+            provider: openai-compatible
+            weight: 0
+            auth:
+              header:
+                Authorization: "Bearer your-api-key"
+            options:
+              model: your-model
+          - name: llm-instance-2
+            provider: openai-compatible
+            weight: 0
+            auth:
+              header:
+                Authorization: "Bearer your-api-key"
+            options:
+              model: your-model
+            checks:
+              active:
+                type: https
+                host: yourhost.com
+                http_path: /your/probe/path
+                healthy:
+                  interval: 2
+                  successes: 1
+                unhealthy:
+                  interval: 1
+                  http_failures: 3
+---
+apiVersion: gateway.networking.k8s.io/v1
+kind: HTTPRoute
+metadata:
+  namespace: aic
+  name: ai-proxy-multi-route
+spec:
+  parentRefs:
+    - name: apisix
+  rules:
+    - matches:
+        - path:
+            type: Exact
+            value: /anything
+          method: POST
+      filters:
+        - type: ExtensionRef
+          extensionRef:
+            group: apisix.apache.org
+            kind: PluginConfig
+            name: ai-proxy-multi-plugin-config
+```
+
+</TabItem>
+
+<TabItem value="apisix-crd">
+
+```yaml title="ai-proxy-multi-ic.yaml"
+apiVersion: apisix.apache.org/v2
+kind: ApisixRoute
+metadata:
+  namespace: aic
+  name: ai-proxy-multi-route
+spec:
+  ingressClassName: apisix
+  http:
+    - name: ai-proxy-multi-route
+      match:
+        paths:
+          - /anything
+        methods:
+          - POST
+      plugins:
+        - name: ai-proxy-multi
+          enable: true
+          config:
+            instances:
+              - name: llm-instance-1
+                provider: openai-compatible
+                weight: 0
+                auth:
+                  header:
+                    Authorization: "Bearer your-api-key"
+                options:
+                  model: your-model
+              - name: llm-instance-2
+                provider: openai-compatible
+                weight: 0
+                auth:
+                  header:
+                    Authorization: "Bearer your-api-key"
+                options:
+                  model: your-model
+                checks:
+                  active:
+                    type: https
+                    host: yourhost.com
+                    http_path: /your/probe/path
+                    healthy:
+                      interval: 2
+                      successes: 1
+                    unhealthy:
+                      interval: 1
+                      http_failures: 3
+```
+
+</TabItem>
+
+</Tabs>
+
+Apply the configuration to your cluster:
+
+```shell
+kubectl apply -f ai-proxy-multi-ic.yaml
+```
+
+</TabItem>
+
+</Tabs>
+
For verification, the behaviours should be consistent with the verification in [active health checks](../tutorials/health-check.md).
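The `healthy` and `unhealthy` thresholds in the health check configuration above can be illustrated with a small shell sketch. It replays a sequence of probe results through a hypothetical `replay_probes` function, using `successes: 1` and `http_failures: 3` from the example, and assumes results are counted consecutively (a success resets the failure count and a failure resets the success count). This is a simplified model for intuition, not APISIX's health checker.

```shell
# Thresholds taken from the example configuration above.
HEALTHY_SUCCESSES=1
UNHEALTHY_HTTP_FAILURES=3

# Hypothetical helper: replays probe results ("ok" or "fail") for one node
# that starts out healthy, and prints the resulting state.
# Assumption: counters are consecutive, so one kind of result resets the other.
replay_probes() {
  local state="healthy" successes=0 failures=0 result
  for result in "$@"; do
    if [ "$result" = "ok" ]; then
      successes=$((successes + 1))
      failures=0
      if [ "$state" = "unhealthy" ] && [ "$successes" -ge "$HEALTHY_SUCCESSES" ]; then
        state="healthy"
      fi
    else
      failures=$((failures + 1))
      successes=0
      if [ "$state" = "healthy" ] && [ "$failures" -ge "$UNHEALTHY_HTTP_FAILURES" ]; then
        state="unhealthy"
      fi
    fi
  done
  echo "$state"
}

replay_probes fail fail          # prints: healthy (only 2 consecutive failures)
replay_probes fail fail fail     # prints: unhealthy
replay_probes fail fail fail ok  # prints: healthy (1 success is enough)
```

Under the same assumptions, because a healthy node is probed every `healthy.interval` (2 seconds in the example), three consecutive failed probes would take roughly 6 seconds before the node is marked unhealthy.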
### Include LLM Information in Access Log
@@ -959,7 +2258,7 @@ For verification, the behaviours should be consistent with the verification in [
The following example demonstrates how you can log LLM request related information in the gateway's access log to improve analytics and audit. The following variables are available:
* `request_llm_model`: LLM model name specified in the request.
-* `apisix_upstream_response_time`: Time taken for APISIX to send the request to the upstream service and receive the full response
+* `apisix_upstream_response_time`: Time taken for APISIX to send the request to the upstream service and receive the full response.
* `request_type`: Type of request, where the value could be `traditional_http`, `ai_chat`, or `ai_stream`.
* `llm_time_to_first_token`: Duration from request sending to the first token received from the LLM service, in milliseconds.
* `llm_model`: LLM model.
@@ -1017,3 +2316,308 @@ In the gateway's access log, you should see a log entry similar to the following
```
The access log entry shows the request type is `ai_chat`, APISIX upstream response time is `5765` milliseconds, time to first token is `2858` milliseconds, requested LLM model is `gpt-4`, LLM model is `gpt-4`, prompt token usage is `23`, and completion token usage is `8`.
+
+### Send Request Log to Logger
+
+The following example demonstrates how you can log request and response information, including the LLM model, token usage, and payloads, and push them to a logger. Before proceeding, you should first set up a logger, such as Kafka. See [`kafka-logger`](./kafka-logger.md) for more information.
+
+Create a Route to your LLM services and configure logging details as such:
+
+<Tabs
+groupId="api"
+defaultValue="admin-api"
+values={[
+{label: 'Admin API', value: 'admin-api'},
+{label: 'ADC', value: 'adc'},
+{label: 'Ingress Controller', value: 'aic'}
+]}>
+
+<TabItem value="admin-api">
+
+```shell
+curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
+  -H "X-API-KEY: ${admin_key}" \
+  -d '{
+    "id": "ai-proxy-multi-route",
+    "uri": "/anything",
+    "methods": ["POST"],
+    "plugins": {
+      "ai-proxy-multi": {
+        "instances": [
+          {
+            "name": "openai-instance",
+            "provider": "openai",
+            "weight": 8,
+            "auth": {
+              "header": {
+                "Authorization": "Bearer '"$OPENAI_API_KEY"'"
+              }
+            },
+            "options": {
+              "model": "gpt-4"
+            }
+          },
+          {
+            "name": "deepseek-instance",
+            "provider": "deepseek",
+            "weight": 2,
+            "auth": {
+              "header": {
+                "Authorization": "Bearer '"$DEEPSEEK_API_KEY"'"
+              }
+            },
+            "options": {
+              "model": "deepseek-chat"
+            }
+          }
+        ],
+        "logging": {
+          "summaries": true,
+          "payloads": true
+        }
+      },
+      "kafka-logger": {
+        "brokers": [
+          {
+            "host": "127.0.0.1",
+            "port": 9092
+          }
+        ],
+        "kafka_topic": "test2",
+        "key": "key1",
+        "batch_max_size": 1
+      }
+    }
+  }'
+```
+
+</TabItem>
+
+<TabItem value="adc">
+
+```yaml title="adc.yaml"
+services:
+  - name: ai-proxy-multi-service
+    routes:
+      - name: ai-proxy-multi-route
+        uris:
+          - /anything
+        methods:
+          - POST
+        plugins:
+          ai-proxy-multi:
+            instances:
+              - name: openai-instance
+                provider: openai
+                weight: 8
+                auth:
+                  header:
+                    Authorization: "Bearer ${OPENAI_API_KEY}"
+                options:
+                  model: gpt-4
+              - name: deepseek-instance
+                provider: deepseek
+                weight: 2
+                auth:
+                  header:
+                    Authorization: "Bearer ${DEEPSEEK_API_KEY}"
+                options:
+                  model: deepseek-chat
+            logging:
+              summaries: true
+              payloads: true
+          kafka-logger:
+            brokers:
+              - host: 127.0.0.1
+                port: 9092
+            kafka_topic: test2
+            key: key1
+            batch_max_size: 1
+```
+
+Synchronize the configuration to the gateway:
+
+```shell
+adc sync -f adc.yaml
+```
+
+</TabItem>
+
+<TabItem value="aic">
+
+<Tabs
+groupId="k8s-api"
+defaultValue="gateway-api"
+values={[
+{label: 'Gateway API', value: 'gateway-api'},
+{label: 'APISIX CRD', value: 'apisix-crd'}
+]}>
+
+<TabItem value="gateway-api">
+
+```yaml title="ai-proxy-multi-ic.yaml"
+apiVersion: apisix.apache.org/v1alpha1
+kind: PluginConfig
+metadata:
+  namespace: aic
+  name: ai-proxy-multi-plugin-config
+spec:
+  plugins:
+    - name: ai-proxy-multi
+      config:
+        instances:
+          - name: openai-instance
+            provider: openai
+            weight: 8
+            auth:
+              header:
+                Authorization: "Bearer your-api-key"
+            options:
+              model: gpt-4
+          - name: deepseek-instance
+            provider: deepseek
+            weight: 2
+            auth:
+              header:
+                Authorization: "Bearer your-api-key"
+            options:
+              model: deepseek-chat
+        logging:
+          summaries: true
+          payloads: true
+    - name: kafka-logger
+      config:
+        brokers:
+          - host: kafka.aic.svc.cluster.local
+            port: 9092
+        kafka_topic: test2
+        key: key1
+        batch_max_size: 1
+---
+apiVersion: gateway.networking.k8s.io/v1
+kind: HTTPRoute
+metadata:
+  namespace: aic
+  name: ai-proxy-multi-route
+spec:
+  parentRefs:
+    - name: apisix
+  rules:
+    - matches:
+        - path:
+            type: Exact
+            value: /anything
+          method: POST
+      filters:
+        - type: ExtensionRef
+          extensionRef:
+            group: apisix.apache.org
+            kind: PluginConfig
+            name: ai-proxy-multi-plugin-config
+```
+
+</TabItem>
+
+<TabItem value="apisix-crd">
+
+```yaml title="ai-proxy-multi-ic.yaml"
+apiVersion: apisix.apache.org/v2
+kind: ApisixRoute
+metadata:
+  namespace: aic
+  name: ai-proxy-multi-route
+spec:
+  ingressClassName: apisix
+  http:
+    - name: ai-proxy-multi-route
+      match:
+        paths:
+          - /anything
+        methods:
+          - POST
+      plugins:
+        - name: ai-proxy-multi
+          enable: true
+          config:
+            instances:
+              - name: openai-instance
+                provider: openai
+                weight: 8
+                auth:
+                  header:
+                    Authorization: "Bearer your-api-key"
+                options:
+                  model: gpt-4
+              - name: deepseek-instance
+                provider: deepseek
+                weight: 2
+                auth:
+                  header:
+                    Authorization: "Bearer your-api-key"
+                options:
+                  model: deepseek-chat
+            logging:
+              summaries: true
+              payloads: true
+        - name: kafka-logger
+          enable: true
+          config:
+            brokers:
+              - host: kafka.aic.svc.cluster.local
+                port: 9092
+            kafka_topic: test2
+            key: key1
+            batch_max_size: 1
+```
+
+</TabItem>
+
+</Tabs>
+
+Apply the configuration to your cluster:
+
+```shell
+kubectl apply -f ai-proxy-multi-ic.yaml
+```
+
+</TabItem>
+
+</Tabs>
+
+Send a POST request to the Route:
+
+```shell
+curl "http://127.0.0.1:9080/anything" -X POST \
+  -H "Content-Type: application/json" \
+  -d '{
+    "messages": [
+      { "role": "system", "content": "You are a mathematician" },
+      { "role": "user", "content": "What is 1+1?" }
+    ]
+  }'
+```
+
+You should receive a response similar to the following if the request is forwarded to OpenAI:
+
+```json
+{
+  ...,
+  "model": "gpt-4-0613",
+  "choices": [
+    {
+      "index": 0,
+      "message": {
+        "role": "assistant",
+        "content": "1+1 equals 2.",
+        "refusal": null
+      },
+      "logprobs": null,
+      "finish_reason": "stop"
+    }
+  ],
+  ...
+}
+```
+
+In the Kafka topic, you should also see a log entry corresponding to the request with the LLM summary and request/response payload.
diff --git a/docs/zh/latest/plugins/ai-proxy-multi.md b/docs/zh/latest/plugins/ai-proxy-multi.md
index 9cc22ed43..a764d7d11 100644
--- a/docs/zh/latest/plugins/ai-proxy-multi.md
+++ b/docs/zh/latest/plugins/ai-proxy-multi.md
@@ -33,6 +33,9 @@ description: The ai-proxy-multi Plugin, through load balancing, retries, fallbacks, and
<link rel="canonical" href="https://docs.api7.ai/hub/ai-proxy-multi" />
</head>
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
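## Description
The `ai-proxy-multi` Plugin simplifies access to LLM and embedding models by transforming Plugin configurations into the designated request format for OpenAI, DeepSeek, Azure, AIMLAPI, Anthropic, OpenRouter, Gemini, Vertex AI, and other OpenAI-compatible APIs. It extends the capabilities of [`ai-proxy`](./ai-proxy.md) with load balancing, retries, fallbacks, and health checks.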
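@@ -68,7 +71,7 @@ description: The ai-proxy-multi Plugin, through load balancing, retries, fallbacks, and
| instances.auth.header | object | False | | | Authentication headers. At least one of `header` and `query` should be configured. |
| instances.auth.query | object | False | | | Authentication query parameters. At least one of `header` and `query` should be configured. |
| instances.auth.gcp | object | False | | | Google Cloud Platform (GCP) authentication configuration. |
-| instances.auth.gcp.service_account_json | string | False | | | Content of the GCP service account JSON file. This can also be configured by setting the “GCP_SERVICE_ACCOUNT” environment variable. |
+| instances.auth.gcp.service_account_json | string | False | | | Content of the GCP service account JSON file. This can also be configured by setting the "GCP_SERVICE_ACCOUNT" environment variable. |
| instances.auth.gcp.max_ttl | integer | False | | minimum = 1 | Maximum TTL in seconds for caching the GCP access token. |
| instances.auth.gcp.expire_early_secs | integer | False | 60 | minimum = 0 | Number of seconds to expire the access token before its actual expiration time, to avoid edge cases. |
| instances.options | object | False | | | Model configurations. In addition to `model`, you can configure additional parameters and they will be forwarded to the upstream LLM service in the request body. For instance, if you are working with OpenAI, DeepSeek, or AIMLAPI, you can configure additional parameters such as `max_tokens`, `temperature`, `top_p`, and `stream`. See your LLM provider's API documentation for more available options. |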
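@@ -78,26 +81,26 @@ description: The ai-proxy-multi Plugin, through load balancing, retries, fallbacks, and
| logging | object | False | | | Logging configurations. Does not affect `error.log`. |
| logging.summaries | boolean | False | false | | If true, logs the request LLM model, duration, and request and response tokens. |
| logging.payloads | boolean | False | false | | If true, logs the request and response payloads. |
-| checks | object | False | | | Health check configurations. Note that at the moment, OpenAI, DeepSeek, and AIMLAPI do not provide an official health check endpoint. Other LLM services that you can configure under the `openai-compatible` provider may have available health check endpoints. |
-| checks.active | object | True | | | Active health check configurations. |
-| checks.active.type | string | False | http | [http, https, tcp] | Type of health check connection. |
-| checks.active.timeout | number | False | 1 | | Health check timeout in seconds. |
-| checks.active.concurrency | integer | False | 10 | | Number of upstream nodes to be checked at the same time. |
-| checks.active.host | string | False | | | HTTP host. |
-| checks.active.port | integer | False | | between 1 and 65535 inclusive | HTTP port. |
-| checks.active.http_path | string | False | / | | Path for HTTP probing requests. |
-| checks.active.https_verify_certificate | boolean | False | true | | If true, verify the node's TLS certificate. |
-| checks.active.req_headers | array[string] | False | | | Additional request headers for active health check probes. |
-| checks.active.healthy | object | False | | | Healthy check configurations. |
-| checks.active.healthy.interval | integer | False | 1 | minimum = 1 | Time interval of checking healthy nodes, in seconds. |
-| checks.active.healthy.http_statuses | array[integer] | False | [200, 302] | between 200 and 599 | HTTP status codes that define a healthy node. |
-| checks.active.healthy.successes | integer | False | 2 | between 1 and 254 | Number of successful probes required to define a healthy node. |
-| checks.active.unhealthy | object | False | | | Unhealthy check configurations. |
-| checks.active.unhealthy.interval | integer | False | 1 | minimum = 1 | Time interval of checking unhealthy nodes, in seconds. |
-| checks.active.unhealthy.http_statuses | array[integer] | False | [429, 404, 500, 501, 502, 503, 504, 505] | between 200 and 599 | HTTP status codes that define an unhealthy node. |
-| checks.active.unhealthy.http_failures | integer | False | 5 | between 1 and 254 | Number of HTTP failures required to define an unhealthy node. |
-| checks.active.unhealthy.tcp_failures | integer | False | 2 | between 1 and 254 | Number of TCP failures required to define an unhealthy node. |
-| checks.active.unhealthy.timeouts | integer | False | 3 | between 1 and 254 | Number of probe timeouts required to define an unhealthy node. |
+| instances.override | object | False | | | Override setting. |
+| instances.override.endpoint | string | False | | | LLM provider endpoint to replace the default endpoint with. If not configured, the Plugin uses the default OpenAI endpoint `https://api.openai.com/v1/chat/completions`. |
+| instances.checks | object | False | | | Health check configurations. Note that at the moment, OpenAI, DeepSeek, and AIMLAPI do not provide an official health check endpoint. Other LLM services that you can configure under the `openai-compatible` provider may have available health check endpoints. |
+| instances.checks.active | object | True | | | Active health check configurations. |
+| instances.checks.active.type | string | False | http | [http, https, tcp] | Type of health check connection. |
+| instances.checks.active.timeout | number | False | 1 | | Health check timeout in seconds. |
+| instances.checks.active.concurrency | integer | False | 10 | | Number of upstream nodes to be checked at the same time. |
+| instances.checks.active.host | string | False | | | HTTP host. |
+| instances.checks.active.port | integer | False | | between 1 and 65535 inclusive | HTTP port. |
+| instances.checks.active.http_path | string | False | / | | Path for HTTP probing requests. |
+| instances.checks.active.https_verify_certificate | boolean | False | true | | If true, verify the node's TLS certificate. |
+| instances.checks.active.healthy | object | False | | | Healthy check configurations. |
+| instances.checks.active.healthy.interval | integer | False | 1 | | Time interval of checking healthy nodes, in seconds. |
+| instances.checks.active.healthy.http_statuses | array[integer] | False | [200,302] | status code between 200 and 599 inclusive | An array of HTTP status codes that defines a healthy node. |
+| instances.checks.active.healthy.successes | integer | False | 2 | between 1 and 254 inclusive | Number of successful probes to define a healthy node. |
+| instances.checks.active.unhealthy | object | False | | | Unhealthy check configurations. |
+| instances.checks.active.unhealthy.interval | integer | False | 1 | | Time interval of checking unhealthy nodes, in seconds. |
+| instances.checks.active.unhealthy.http_statuses | array[integer] | False | [429,404,500,501,502,503,504,505] | status code between 200 and 599 inclusive | An array of HTTP status codes that defines an unhealthy node. |
+| instances.checks.active.unhealthy.http_failures | integer | False | 5 | between 1 and 254 inclusive | Number of HTTP failures to define an unhealthy node. |
+| instances.checks.active.unhealthy.timeouts | integer | False | 3 | between 1 and 254 inclusive | Number of probe timeouts to define an unhealthy node. |
| timeout | integer | False | 30000 | greater than or equal to 1 | Request timeout in milliseconds when requesting the LLM service. |
| keepalive | boolean | False | true | | If true, keep the connection alive when requesting the LLM service. |
| keepalive_timeout | integer | False | 60000 | greater than or equal to 1000 | Keepalive timeout in milliseconds when connecting to the LLM service. |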
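@@ -124,7 +127,18 @@ admin_key=$(yq '.deployment.admin.admin_key[0].key' conf/config.yaml | sed 's/"/
For demonstration and easier differentiation, you will be configuring one OpenAI instance and one DeepSeek instance as the upstream LLM services.
-Create a route and update with your LLM providers, models, API keys, and endpoints if applicable:
+Create a Route and update with your LLM providers, models, API keys, and endpoints if applicable: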
+
+<Tabs
+groupId="api"
+defaultValue="admin-api"
+values={[
+{label: 'Admin API', value: 'admin-api'},
+{label: 'ADC', value: 'adc'},
+{label: 'Ingress Controller', value: 'aic'}
+]}>
+
+<TabItem value="admin-api">
```shell
curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
@@ -168,7 +182,167 @@ curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
}'
```
-Send 10 POST requests to the route with a system prompt and a sample user question in the request body, to see the number of requests forwarded to OpenAI and DeepSeek:
+</TabItem>
+
+<TabItem value="adc">
+
+```yaml title="adc.yaml"
+services:
+  - name: ai-proxy-multi-service
+    routes:
+      - name: ai-proxy-multi-route
+        uris:
+          - /anything
+        methods:
+          - POST
+        plugins:
+          ai-proxy-multi:
+            instances:
+              - name: openai-instance
+                provider: openai
+                weight: 8
+                auth:
+                  header:
+                    Authorization: "Bearer ${OPENAI_API_KEY}"
+                options:
+                  model: gpt-4
+              - name: deepseek-instance
+                provider: deepseek
+                weight: 2
+                auth:
+                  header:
+                    Authorization: "Bearer ${DEEPSEEK_API_KEY}"
+                options:
+                  model: deepseek-chat
+```
+
+Synchronize the configuration to the gateway:
+
+```shell
+adc sync -f adc.yaml
+```
+
+</TabItem>
+
+<TabItem value="aic">
+
+<Tabs
+groupId="k8s-api"
+defaultValue="gateway-api"
+values={[
+{label: 'Gateway API', value: 'gateway-api'},
+{label: 'APISIX CRD', value: 'apisix-crd'}
+]}>
+
+<TabItem value="gateway-api">
+
+```yaml title="ai-proxy-multi-ic.yaml"
+apiVersion: apisix.apache.org/v1alpha1
+kind: PluginConfig
+metadata:
+  namespace: aic
+  name: ai-proxy-multi-plugin-config
+spec:
+  plugins:
+    - name: ai-proxy-multi
+      config:
+        instances:
+          - name: openai-instance
+            provider: openai
+            weight: 8
+            auth:
+              header:
+                Authorization: "Bearer your-api-key"
+            options:
+              model: gpt-4
+          - name: deepseek-instance
+            provider: deepseek
+            weight: 2
+            auth:
+              header:
+                Authorization: "Bearer your-api-key"
+            options:
+              model: deepseek-chat
+---
+apiVersion: gateway.networking.k8s.io/v1
+kind: HTTPRoute
+metadata:
+  namespace: aic
+  name: ai-proxy-multi-route
+spec:
+  parentRefs:
+    - name: apisix
+  rules:
+    - matches:
+        - path:
+            type: Exact
+            value: /anything
+          method: POST
+      filters:
+        - type: ExtensionRef
+          extensionRef:
+            group: apisix.apache.org
+            kind: PluginConfig
+            name: ai-proxy-multi-plugin-config
+```
+
+</TabItem>
+
+<TabItem value="apisix-crd">
+
+```yaml title="ai-proxy-multi-ic.yaml"
+apiVersion: apisix.apache.org/v2
+kind: ApisixRoute
+metadata:
+  namespace: aic
+  name: ai-proxy-multi-route
+spec:
+  ingressClassName: apisix
+  http:
+    - name: ai-proxy-multi-route
+      match:
+        paths:
+          - /anything
+        methods:
+          - POST
+      plugins:
+        - name: ai-proxy-multi
+          enable: true
+          config:
+            instances:
+              - name: openai-instance
+                provider: openai
+                weight: 8
+                auth:
+                  header:
+                    Authorization: "Bearer your-api-key"
+                options:
+                  model: gpt-4
+              - name: deepseek-instance
+                provider: deepseek
+                weight: 2
+                auth:
+                  header:
+                    Authorization: "Bearer your-api-key"
+                options:
+                  model: deepseek-chat
+```
+
+</TabItem>
+
+</Tabs>
+
+Apply the configuration to your cluster:
+
+```shell
+kubectl apply -f ai-proxy-multi-ic.yaml
+```
+
+</TabItem>
+
+</Tabs>
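Assuming requests are distributed in proportion to `weight` among instances of the same priority, the expected split for the 10 requests below can be worked out directly: 10 × 8 / (8 + 2) = 8 for `openai-instance` and 10 × 2 / (8 + 2) = 2 for `deepseek-instance`. The shell sketch below computes the same proportional shares with a hypothetical `expected_split` helper; it is arithmetic for intuition only, and the counts observed in a given run may differ slightly depending on the balancer.

```shell
# Hypothetical helper: split TOTAL requests across instances in proportion to
# their weights, using integer division (rounds down).
# Usage: expected_split TOTAL WEIGHT [WEIGHT...]
expected_split() {
  local total=$1
  shift
  local sum=0 w shares=""
  for w in "$@"; do
    sum=$((sum + w))
  done
  if [ "$sum" -eq 0 ]; then
    echo "weights must not all be zero" >&2
    return 1
  fi
  for w in "$@"; do
    # Append each share, separated by a single space.
    shares="${shares:+$shares }$((total * w / sum))"
  done
  echo "$shares"
}

# openai-instance (weight 8) and deepseek-instance (weight 2)
expected_split 10 8 2   # prints: 8 2
```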
+
+Send 10 POST requests to the Route with a system prompt and a sample user question in the request body, to see the number of requests forwarded to OpenAI and DeepSeek:
```shell
openai_count=0
@@ -206,7 +380,18 @@ DeepSeek responses: 2
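The following example demonstrates how you can configure two models with different priorities and apply rate limiting on the instance with a higher priority. In the case where `fallback_strategy` is set to `["rate_limiting"]`, the Plugin should continue to forward requests to the low priority instance once the high priority instance's rate limiting quota is fully consumed.
-Create a route and update with your LLM providers, models, API keys, and endpoints if applicable:
+Create a Route and update with your LLM providers, models, API keys, and endpoints if applicable: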
+
+<Tabs
+groupId="api"
+defaultValue="admin-api"
+values={[
+{label: 'Admin API', value: 'admin-api'},
+{label: 'ADC', value: 'adc'},
+{label: 'Ingress Controller', value: 'aic'}
+]}>
+
+<TabItem value="admin-api">
```shell
curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
@@ -263,7 +448,200 @@ curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
}'
```
-Send a POST request to the route with a system prompt and a sample user question in the request body:
+</TabItem>
+
+<TabItem value="adc">
+
+```yaml title="adc.yaml"
+services:
+  - name: ai-proxy-multi-service
+    routes:
+      - name: ai-proxy-multi-route
+        uris:
+          - /anything
+        methods:
+          - POST
+        plugins:
+          ai-proxy-multi:
+            fallback_strategy:
+              - rate_limiting
+            instances:
+              - name: openai-instance
+                provider: openai
+                priority: 1
+                weight: 0
+                auth:
+                  header:
+                    Authorization: "Bearer ${OPENAI_API_KEY}"
+                options:
+                  model: gpt-4
+              - name: deepseek-instance
+                provider: deepseek
+                priority: 0
+                weight: 0
+                auth:
+                  header:
+                    Authorization: "Bearer ${DEEPSEEK_API_KEY}"
+                options:
+                  model: deepseek-chat
+          ai-rate-limiting:
+            instances:
+              - name: openai-instance
+                limit: 10
+                time_window: 60
+            limit_strategy: total_tokens
+```
+
+Synchronize the configuration to the gateway:
+
+```shell
+adc sync -f adc.yaml
+```
+
+</TabItem>
+
+<TabItem value="aic">
+
+<Tabs
+groupId="k8s-api"
+defaultValue="gateway-api"
+values={[
+{label: 'Gateway API', value: 'gateway-api'},
+{label: 'APISIX CRD', value: 'apisix-crd'}
+]}>
+
+<TabItem value="gateway-api">
+
+```yaml title="ai-proxy-multi-ic.yaml"
+apiVersion: apisix.apache.org/v1alpha1
+kind: PluginConfig
+metadata:
+  namespace: aic
+  name: ai-proxy-multi-plugin-config
+spec:
+  plugins:
+    - name: ai-proxy-multi
+      config:
+        fallback_strategy:
+          - rate_limiting
+        instances:
+          - name: openai-instance
+            provider: openai
+            priority: 1
+            weight: 0
+            auth:
+              header:
+                Authorization: "Bearer your-api-key"
+            options:
+              model: gpt-4
+          - name: deepseek-instance
+            provider: deepseek
+            priority: 0
+            weight: 0
+            auth:
+              header:
+                Authorization: "Bearer your-api-key"
+            options:
+              model: deepseek-chat
+    - name: ai-rate-limiting
+      config:
+        instances:
+          - name: openai-instance
+            limit: 10
+            time_window: 60
+        limit_strategy: total_tokens
+---
+apiVersion: gateway.networking.k8s.io/v1
+kind: HTTPRoute
+metadata:
+  namespace: aic
+  name: ai-proxy-multi-route
+spec:
+  parentRefs:
+    - name: apisix
+  rules:
+    - matches:
+        - path:
+            type: Exact
+            value: /anything
+          method: POST
+      filters:
+        - type: ExtensionRef
+          extensionRef:
+            group: apisix.apache.org
+            kind: PluginConfig
+            name: ai-proxy-multi-plugin-config
+```
+
+</TabItem>
+
+<TabItem value="apisix-crd">
+
+```yaml title="ai-proxy-multi-ic.yaml"
+apiVersion: apisix.apache.org/v2
+kind: ApisixRoute
+metadata:
+  namespace: aic
+  name: ai-proxy-multi-route
+spec:
+  ingressClassName: apisix
+  http:
+    - name: ai-proxy-multi-route
+      match:
+        paths:
+          - /anything
+        methods:
+          - POST
+      plugins:
+        - name: ai-proxy-multi
+          enable: true
+          config:
+            fallback_strategy:
+              - rate_limiting
+            instances:
+              - name: openai-instance
+                provider: openai
+                priority: 1
+                weight: 0
+                auth:
+                  header:
+                    Authorization: "Bearer your-api-key"
+                options:
+                  model: gpt-4
+              - name: deepseek-instance
+                provider: deepseek
+                priority: 0
+                weight: 0
+                auth:
+                  header:
+                    Authorization: "Bearer your-api-key"
+                options:
+                  model: deepseek-chat
+        - name: ai-rate-limiting
+          enable: true
+          config:
+            instances:
+              - name: openai-instance
+                limit: 10
+                time_window: 60
+            limit_strategy: total_tokens
+```
+
+</TabItem>
+
+</Tabs>
+
+Apply the configuration to your cluster:
+
+```shell
+kubectl apply -f ai-proxy-multi-ic.yaml
+```
+
+</TabItem>
+
+</Tabs>
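The expected behavior of this configuration can be modeled as: among instances whose rate limiting quota is not exhausted, pick the one with the highest `priority`. The shell sketch below illustrates that rule with a hypothetical `pick_instance` helper, where a `yes`/`no` flag stands in for the rate limiting state. It is a simplified illustration under that assumption, not the Plugin's implementation, and it ignores weighting among instances of equal priority.

```shell
# Hypothetical helper. Each argument has the form "name:priority:limited",
# where limited is "yes" if that instance's quota is used up.
# Prints the selected instance name, or "none" if every instance is limited.
pick_instance() {
  local best="" best_prio=0 entry name prio limited
  for entry in "$@"; do
    # Split "name:priority:limited" using parameter expansion.
    name=${entry%%:*}
    prio=${entry#*:}
    limited=${prio#*:}
    prio=${prio%%:*}
    # Skip instances whose rate limiting quota is exhausted.
    if [ "$limited" = "yes" ]; then
      continue
    fi
    # Keep the instance with the highest priority seen so far.
    if [ -z "$best" ] || [ "$prio" -gt "$best_prio" ]; then
      best=$name
      best_prio=$prio
    fi
  done
  echo "${best:-none}"
}

pick_instance openai-instance:1:no deepseek-instance:0:no    # prints: openai-instance
pick_instance openai-instance:1:yes deepseek-instance:0:no   # prints: deepseek-instance
```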
+
+Send a POST request to the Route with a system prompt and a sample user question in the request body:
```shell
curl "http://127.0.0.1:9080/anything" -X POST \
@@ -316,7 +694,7 @@ curl "http://127.0.0.1:9080/anything" -X POST \
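Since the `total_tokens` value exceeds the configured quota of `10`, the next request within the 60-second window is expected to be forwarded to the other instance.
-Within the same 60-second window, send another POST request to the route:
+Within the same 60-second window, send another POST request to the Route: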
```shell
curl "http://127.0.0.1:9080/anything" -X POST \
@@ -347,12 +725,24 @@ curl "http://127.0.0.1:9080/anything" -X POST \
],
...
}
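-```#
-## Load Balancing and Rate Limiting by Consumers
+```
+
+### Load Balancing and Rate Limiting by Consumers
The following example demonstrates how you can configure two models for load balancing and apply rate limiting by consumer.
-Create a consumer `johndoe` and a rate limiting quota of 10 tokens in a 60-second window on the `openai-instance` instance:
+Create a Consumer `johndoe` and a rate limiting quota of 10 tokens in a 60-second window on the `openai-instance` instance: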
+
+<Tabs
+groupId="api"
+defaultValue="admin-api"
+values={[
+{label: 'Admin API', value: 'admin-api'},
+{label: 'ADC', value: 'adc'},
+{label: 'Ingress Controller', value: 'aic'}
+]}>
+
+<TabItem value="admin-api">
```shell
curl "http://127.0.0.1:9180/apisix/admin/consumers" -X PUT \
@@ -375,7 +765,7 @@ curl "http://127.0.0.1:9180/apisix/admin/consumers" -X PUT \
}'
```
-Configure `key-auth` credential for `johndoe`:
+Configure `key-auth` Credential for `johndoe`:
```shell
curl "http://127.0.0.1:9180/apisix/admin/consumers/johndoe/credentials" -X PUT \
@@ -390,7 +780,7 @@ curl "http://127.0.0.1:9180/apisix/admin/consumers/johndoe/credentials" -X PUT \
}'
```
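-Create another consumer `janedoe` and a rate limiting quota of 10 tokens in a 60-second window on the `deepseek-instance` instance:
+Create another Consumer `janedoe` and a rate limiting quota of 10 tokens in a 60-second window on the `deepseek-instance` instance: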
```shell
curl "http://127.0.0.1:9180/apisix/admin/consumers" -X PUT \
@@ -413,7 +803,7 @@ curl "http://127.0.0.1:9180/apisix/admin/consumers" -X PUT \
}'
```
-Configure `key-auth` credential for `janedoe`:
+Configure `key-auth` Credential for `janedoe`:
```shell
curl "http://127.0.0.1:9180/apisix/admin/consumers/janedoe/credentials" -X PUT \
@@ -428,7 +818,182 @@ curl "http://127.0.0.1:9180/apisix/admin/consumers/janedoe/credentials" -X PUT \
}'
```
-Create a route and update with your LLM providers, models, API keys, and endpoints if applicable:
+</TabItem>
+
+<TabItem value="adc">
+
+```yaml title="adc.yaml"
+consumers:
+  - username: johndoe
+    plugins:
+      ai-rate-limiting:
+        instances:
+          - name: openai-instance
+            limit: 10
+            time_window: 60
+        rejected_code: 429
+        limit_strategy: total_tokens
+    credentials:
+      - name: key-auth
+        type: key-auth
+        config:
+          key: john-key
+  - username: janedoe
+    plugins:
+      ai-rate-limiting:
+        instances:
+          - name: deepseek-instance
+            limit: 10
+            time_window: 60
+        rejected_code: 429
+        limit_strategy: total_tokens
+    credentials:
+      - name: key-auth
+        type: key-auth
+        config:
+          key: jane-key
+```
+
+Synchronize the configuration to the gateway:
+
+```shell
+adc sync -f adc.yaml
+```
+
+</TabItem>
+
+<TabItem value="aic">
+
+<Tabs
+groupId="k8s-api"
+defaultValue="gateway-api"
+values={[
+{label: 'Gateway API', value: 'gateway-api'},
+{label: 'APISIX CRD', value: 'apisix-crd'}
+]}>
+
+<TabItem value="gateway-api">
+
+```yaml title="ai-proxy-multi-consumer-ic.yaml"
+apiVersion: apisix.apache.org/v1alpha1
+kind: Consumer
+metadata:
+  namespace: aic
+  name: johndoe
+spec:
+  gatewayRef:
+    name: apisix
+  plugins:
+    - name: ai-rate-limiting
+      config:
+        instances:
+          - name: openai-instance
+            limit: 10
+            time_window: 60
+        rejected_code: 429
+        limit_strategy: total_tokens
+  credentials:
+    - type: key-auth
+      name: primary-key
+      config:
+        key: john-key
+---
+apiVersion: apisix.apache.org/v1alpha1
+kind: Consumer
+metadata:
+  namespace: aic
+  name: janedoe
+spec:
+  gatewayRef:
+    name: apisix
+  plugins:
+    - name: ai-rate-limiting
+      config:
+        instances:
+          - name: deepseek-instance
+            limit: 10
+            time_window: 60
+        rejected_code: 429
+        limit_strategy: total_tokens
+  credentials:
+    - type: key-auth
+      name: primary-key
+      config:
+        key: jane-key
+```
+
+</TabItem>
+
+<TabItem value="apisix-crd">
+
+```yaml title="ai-proxy-multi-consumer-ic.yaml"
+apiVersion: apisix.apache.org/v2
+kind: ApisixConsumer
+metadata:
+  namespace: aic
+  name: johndoe
+spec:
+  ingressClassName: apisix
+  authParameter:
+    keyAuth:
+      value:
+        key: john-key
+  plugins:
+    ai-rate-limiting:
+      instances:
+        - name: openai-instance
+          limit: 10
+          time_window: 60
+      rejected_code: 429
+      limit_strategy: total_tokens
+---
+apiVersion: apisix.apache.org/v2
+kind: ApisixConsumer
+metadata:
+  namespace: aic
+  name: janedoe
+spec:
+  ingressClassName: apisix
+  authParameter:
+    keyAuth:
+      value:
+        key: jane-key
+  plugins:
+    ai-rate-limiting:
+      instances:
+        - name: deepseek-instance
+          limit: 10
+          time_window: 60
+      rejected_code: 429
+      limit_strategy: total_tokens
+```
+
+</TabItem>
+
+</Tabs>
+
+Apply the configuration to your cluster:
+
+```shell
+kubectl apply -f ai-proxy-multi-consumer-ic.yaml
+```
+
+</TabItem>
+
+</Tabs>
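To reason about when a Consumer's quota runs out, the sketch below models one Consumer's quota on one instance (`limit: 10`, `time_window: 60`, `limit_strategy: total_tokens`) as a fixed window. The fixed-window behavior, the window starting at the first request, and charging a response's total tokens after it completes are all assumptions for illustration, and `quota_state` is a hypothetical helper. Under these assumptions, because a single chat completion can easily use more than 10 total tokens, the quota can be exhausted after just one request.

```shell
# Quota values taken from the Consumer configuration above.
LIMIT=10
WINDOW=60

# Hypothetical helper. Each argument has the form "timestamp_seconds:total_tokens".
# Prints, for each request in order, whether the quota still had room.
quota_state() {
  local used=0 window_start="" entry ts tokens
  for entry in "$@"; do
    ts=${entry%%:*}
    tokens=${entry#*:}
    # Start a new window on the first request or once WINDOW seconds have passed.
    if [ -z "$window_start" ] || [ $((ts - window_start)) -ge "$WINDOW" ]; then
      window_start=$ts
      used=0
    fi
    if [ "$used" -ge "$LIMIT" ]; then
      echo "quota-exhausted"
    else
      echo "within-quota"
      # Charge the response's total tokens after it completes.
      used=$((used + tokens))
    fi
  done
}

# Three requests that each use 31 total tokens, at t = 0 s, 5 s, and 61 s.
quota_state 0:31 5:31 61:31
# prints: within-quota, quota-exhausted, within-quota (one per line)
```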
+
+Create a Route and update with your LLM providers, models, API keys, and endpoints if applicable:
+
+<Tabs
+groupId="api"
+defaultValue="admin-api"
+values={[
+{label: 'Admin API', value: 'admin-api'},
+{label: 'ADC', value: 'adc'},
+{label: 'Ingress Controller', value: 'aic'}
+]}>
+
+<TabItem value="admin-api">
```shell
curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
@@ -474,7 +1039,180 @@ curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
}'
```
-Send a POST request to the route without any consumer key:
+</TabItem>
+
+<TabItem value="adc">
+
+```yaml title="adc.yaml"
+services:
+ - name: ai-proxy-multi-service
+ routes:
+ - name: ai-proxy-multi-route
+ uris:
+ - /anything
+ methods:
+ - POST
+ plugins:
+ key-auth: {}
+ ai-proxy-multi:
+ fallback_strategy:
+ - rate_limiting
+ instances:
+ - name: openai-instance
+ provider: openai
+ weight: 0
+ auth:
+ header:
+ Authorization: "Bearer ${OPENAI_API_KEY}"
+ options:
+ model: gpt-4
+ - name: deepseek-instance
+ provider: deepseek
+ weight: 0
+ auth:
+ header:
+ Authorization: "Bearer ${DEEPSEEK_API_KEY}"
+ options:
+ model: deepseek-chat
+```
+
+Synchronize the configuration to the gateway:
+
+```shell
+adc sync -f adc.yaml
+```
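
The ADC configuration above references `${OPENAI_API_KEY}` and `${DEEPSEEK_API_KEY}`. These are assumed to be resolved from environment variables when you run `adc sync`. The POSIX shell helper below is a hypothetical convenience and not part of ADC. It is a minimal sketch that reports any listed variable that is unset or empty, so you can catch a missing key before syncing.

```shell
#!/bin/sh
# Hypothetical helper (not part of ADC): report variables referenced by adc.yaml
# that are unset or empty. Returns 0 only if every named variable has a value.
check_adc_env() {
  missing=0
  for var in "$@"; do
    # Indirect expansion in POSIX sh: read the value of the variable named by $var.
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, after exporting both keys, `check_adc_env OPENAI_API_KEY DEEPSEEK_API_KEY && adc sync -f adc.yaml` runs the sync only when both variables are set.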
+
+</TabItem>
+
+<TabItem value="aic">
+
+<Tabs
+groupId="k8s-api"
+defaultValue="gateway-api"
+values={[
+{label: 'Gateway API', value: 'gateway-api'},
+{label: 'APISIX CRD', value: 'apisix-crd'}
+]}>
+
+<TabItem value="gateway-api">
+
+```yaml title="ai-proxy-multi-ic.yaml"
+apiVersion: apisix.apache.org/v1alpha1
+kind: PluginConfig
+metadata:
+  namespace: aic
+  name: ai-proxy-multi-plugin-config
+spec:
+  plugins:
+    - name: key-auth
+      config:
+        _meta:
+          disable: false
+    - name: ai-proxy-multi
+      config:
+        fallback_strategy:
+          - rate_limiting
+        instances:
+          - name: openai-instance
+            provider: openai
+            weight: 0
+            auth:
+              header:
+                Authorization: "Bearer your-api-key"
+            options:
+              model: gpt-4
+          - name: deepseek-instance
+            provider: deepseek
+            weight: 0
+            auth:
+              header:
+                Authorization: "Bearer your-api-key"
+            options:
+              model: deepseek-chat
+---
+apiVersion: gateway.networking.k8s.io/v1
+kind: HTTPRoute
+metadata:
+  namespace: aic
+  name: ai-proxy-multi-route
+spec:
+  parentRefs:
+    - name: apisix
+  rules:
+    - matches:
+        - path:
+            type: Exact
+            value: /anything
+          method: POST
+      filters:
+        - type: ExtensionRef
+          extensionRef:
+            group: apisix.apache.org
+            kind: PluginConfig
+            name: ai-proxy-multi-plugin-config
+```
+
+</TabItem>
+
+<TabItem value="apisix-crd">
+
+```yaml title="ai-proxy-multi-ic.yaml"
+apiVersion: apisix.apache.org/v2
+kind: ApisixRoute
+metadata:
+  namespace: aic
+  name: ai-proxy-multi-route
+spec:
+  ingressClassName: apisix
+  http:
+    - name: ai-proxy-multi-route
+      match:
+        paths:
+          - /anything
+        methods:
+          - POST
+      plugins:
+        - name: key-auth
+          enable: true
+        - name: ai-proxy-multi
+          enable: true
+          config:
+            fallback_strategy:
+              - rate_limiting
+            instances:
+              - name: openai-instance
+                provider: openai
+                weight: 0
+                auth:
+                  header:
+                    Authorization: "Bearer your-api-key"
+                options:
+                  model: gpt-4
+              - name: deepseek-instance
+                provider: deepseek
+                weight: 0
+                auth:
+                  header:
+                    Authorization: "Bearer your-api-key"
+                options:
+                  model: deepseek-chat
+```
+
+</TabItem>
+
+</Tabs>
+
+Apply the configuration to your cluster:
+
+```shell
+kubectl apply -f ai-proxy-multi-ic.yaml
+```
+
+</TabItem>
+
+</Tabs>
+
+Send a POST request to the Route without any consumer key:
```shell
curl -i "http://127.0.0.1:9080/anything" -X POST \
@@ -489,7 +1227,7 @@ curl -i "http://127.0.0.1:9080/anything" -X POST \
You should receive an `HTTP/1.1 401 Unauthorized` response.
-Send a POST request to the route with `johndoe`'s key:
+Send a POST request to the Route with `johndoe`'s key:
```shell
curl "http://127.0.0.1:9080/anything" -X POST \
@@ -543,7 +1281,7 @@ curl "http://127.0.0.1:9080/anything" -X POST \
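Since the `total_tokens` value exceeds the quota configured for `johndoe` on the `openai` instance, the next request from `johndoe` within the 60-second window is expected to be forwarded to the `deepseek` instance.
-Within the same 60-second window, send another POST request to the route with `johndoe`'s key:
+Within the same 60-second window, send another POST request to the Route with `johndoe`'s key: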
```shell
curl "http://127.0.0.1:9080/anything" -X POST \
@@ -577,7 +1315,7 @@ curl "http://127.0.0.1:9080/anything" -X POST \
}
```
-Send a POST request to the route with `janedoe`'s key:
+Send a POST request to the Route with `janedoe`'s key:
```shell
curl "http://127.0.0.1:9080/anything" -X POST \
@@ -624,7 +1362,7 @@ curl "http://127.0.0.1:9080/anything" -X POST \
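Since the `total_tokens` value exceeds the quota configured for `janedoe` on the `deepseek` instance, the next request from `janedoe` within the 60-second window is expected to be forwarded to the `openai` instance.
-Within the same 60-second window, send another POST request to the route with `janedoe`'s key:
+Within the same 60-second window, send another POST request to the Route with `janedoe`'s key: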
```shell
curl "http://127.0.0.1:9080/anything" -X POST \
@@ -660,7 +1398,7 @@ curl "http://127.0.0.1:9080/anything" -X POST \
}
```
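-This shows `ai-proxy-multi` load balancing traffic according to each consumer's rate limiting rules in `ai-rate-limiting`.
+This shows `ai-proxy-multi` load balancing traffic according to each Consumer's rate limiting rules in `ai-rate-limiting`.
### Restrict Maximum Number of Completion Tokens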
@@ -668,7 +1406,18 @@ curl "http://127.0.0.1:9080/anything" -X POST \
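For demonstration and easier differentiation, you will configure one OpenAI instance and one DeepSeek instance as the upstream LLM services.
-Create a route and update it with your LLM providers, models, API keys, and endpoints (if applicable):
+Create a Route and update it with your LLM providers, models, API keys, and endpoints (if applicable):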
+
+<Tabs
+groupId="api"
+defaultValue="admin-api"
+values={[
+{label: 'Admin API', value: 'admin-api'},
+{label: 'ADC', value: 'adc'},
+{label: 'Ingress Controller', value: 'aic'}
+]}>
+
+<TabItem value="admin-api">
```shell
curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
@@ -714,7 +1463,173 @@ curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
}'
```
-Send a POST request to the route with a system prompt and a sample user question in the request body:
+</TabItem>
+
+<TabItem value="adc">
+
+```yaml title="adc.yaml"
+services:
+  - name: ai-proxy-multi-service
+    routes:
+      - name: ai-proxy-multi-route
+        uris:
+          - /anything
+        methods:
+          - POST
+        plugins:
+          ai-proxy-multi:
+            instances:
+              - name: openai-instance
+                provider: openai
+                weight: 0
+                auth:
+                  header:
+                    Authorization: "Bearer ${OPENAI_API_KEY}"
+                options:
+                  model: gpt-4
+                  max_tokens: 50
+              - name: deepseek-instance
+                provider: deepseek
+                weight: 0
+                auth:
+                  header:
+                    Authorization: "Bearer ${DEEPSEEK_API_KEY}"
+                options:
+                  model: deepseek-chat
+                  max_tokens: 100
+```
+
+Synchronize the configuration to the gateway:
+
+```shell
+adc sync -f adc.yaml
+```
+
+</TabItem>
+
+<TabItem value="aic">
+
+<Tabs
+groupId="k8s-api"
+defaultValue="gateway-api"
+values={[
+{label: 'Gateway API', value: 'gateway-api'},
+{label: 'APISIX CRD', value: 'apisix-crd'}
+]}>
+
+<TabItem value="gateway-api">
+
+```yaml title="ai-proxy-multi-ic.yaml"
+apiVersion: apisix.apache.org/v1alpha1
+kind: PluginConfig
+metadata:
+  namespace: aic
+  name: ai-proxy-multi-plugin-config
+spec:
+  plugins:
+    - name: ai-proxy-multi
+      config:
+        instances:
+          - name: openai-instance
+            provider: openai
+            weight: 0
+            auth:
+              header:
+                Authorization: "Bearer your-api-key"
+            options:
+              model: gpt-4
+              max_tokens: 50
+          - name: deepseek-instance
+            provider: deepseek
+            weight: 0
+            auth:
+              header:
+                Authorization: "Bearer your-api-key"
+            options:
+              model: deepseek-chat
+              max_tokens: 100
+---
+apiVersion: gateway.networking.k8s.io/v1
+kind: HTTPRoute
+metadata:
+  namespace: aic
+  name: ai-proxy-multi-route
+spec:
+  parentRefs:
+    - name: apisix
+  rules:
+    - matches:
+        - path:
+            type: Exact
+            value: /anything
+          method: POST
+      filters:
+        - type: ExtensionRef
+          extensionRef:
+            group: apisix.apache.org
+            kind: PluginConfig
+            name: ai-proxy-multi-plugin-config
+```
+
+</TabItem>
+
+<TabItem value="apisix-crd">
+
+```yaml title="ai-proxy-multi-ic.yaml"
+apiVersion: apisix.apache.org/v2
+kind: ApisixRoute
+metadata:
+  namespace: aic
+  name: ai-proxy-multi-route
+spec:
+  ingressClassName: apisix
+  http:
+    - name: ai-proxy-multi-route
+      match:
+        paths:
+          - /anything
+        methods:
+          - POST
+      plugins:
+        - name: ai-proxy-multi
+          enable: true
+          config:
+            instances:
+              - name: openai-instance
+                provider: openai
+                weight: 0
+                auth:
+                  header:
+                    Authorization: "Bearer your-api-key"
+                options:
+                  model: gpt-4
+                  max_tokens: 50
+              - name: deepseek-instance
+                provider: deepseek
+                weight: 0
+                auth:
+                  header:
+                    Authorization: "Bearer your-api-key"
+                options:
+                  model: deepseek-chat
+                  max_tokens: 100
+```
+
+</TabItem>
+
+</Tabs>
+
+Apply the configuration to your cluster:
+
+```shell
+kubectl apply -f ai-proxy-multi-ic.yaml
+```
+
+</TabItem>
+
+</Tabs>
+
+Send a POST request to the Route with a system prompt and a sample user question in the request body:
```shell
curl "http://127.0.0.1:9080/anything" -X POST \
@@ -800,7 +1715,18 @@ curl "http://127.0.0.1:9080/anything" -X POST \
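The following example demonstrates how you can configure the `ai-proxy-multi` Plugin to proxy requests and load balance between embedding models.
-Create a route and update it with your LLM providers, embedding models, API keys, and endpoints:
+Create a Route and update it with your LLM providers, embedding models, API keys, and endpoints: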
+
+<Tabs
+groupId="api"
+defaultValue="admin-api"
+values={[
+{label: 'Admin API', value: 'admin-api'},
+{label: 'ADC', value: 'adc'},
+{label: 'Ingress Controller', value: 'aic'}
+]}>
+
+<TabItem value="admin-api">
```shell
curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
@@ -850,7 +1776,179 @@ curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
}'
```
-Send a POST request to the route with an input string:
+</TabItem>
+
+<TabItem value="adc">
+
+```yaml title="adc.yaml"
+services:
+  - name: ai-proxy-multi-service
+    routes:
+      - name: ai-proxy-multi-route
+        uris:
+          - /embeddings
+        methods:
+          - POST
+        plugins:
+          ai-proxy-multi:
+            instances:
+              - name: openai-instance
+                provider: openai
+                weight: 0
+                auth:
+                  header:
+                    Authorization: "Bearer ${OPENAI_API_KEY}"
+                options:
+                  model: text-embedding-3-small
+                override:
+                  endpoint: "https://api.openai.com/v1/embeddings"
+              - name: az-openai-instance
+                provider: openai-compatible
+                weight: 0
+                auth:
+                  header:
+                    Authorization: "Bearer ${AZ_OPENAI_API_KEY}"
+                options:
+                  model: text-embedding-3-small
+                override:
+                  endpoint: "https://ai-plugin-developer.openai.azure.com/openai/deployments/text-embedding-3-small/embeddings?api-version=2023-05-15"
+```
+
+Synchronize the configuration to the gateway:
+
+```shell
+adc sync -f adc.yaml
+```
+
+</TabItem>
+
+<TabItem value="aic">
+
+<Tabs
+groupId="k8s-api"
+defaultValue="gateway-api"
+values={[
+{label: 'Gateway API', value: 'gateway-api'},
+{label: 'APISIX CRD', value: 'apisix-crd'}
+]}>
+
+<TabItem value="gateway-api">
+
+```yaml title="ai-proxy-multi-ic.yaml"
+apiVersion: apisix.apache.org/v1alpha1
+kind: PluginConfig
+metadata:
+  namespace: aic
+  name: ai-proxy-multi-plugin-config
+spec:
+  plugins:
+    - name: ai-proxy-multi
+      config:
+        instances:
+          - name: openai-instance
+            provider: openai
+            weight: 0
+            auth:
+              header:
+                Authorization: "Bearer your-api-key"
+            options:
+              model: text-embedding-3-small
+            override:
+              endpoint: "https://api.openai.com/v1/embeddings"
+          - name: az-openai-instance
+            provider: openai-compatible
+            weight: 0
+            auth:
+              header:
+                Authorization: "Bearer your-api-key"
+            options:
+              model: text-embedding-3-small
+            override:
+              endpoint: "https://ai-plugin-developer.openai.azure.com/openai/deployments/text-embedding-3-small/embeddings?api-version=2023-05-15"
+---
+apiVersion: gateway.networking.k8s.io/v1
+kind: HTTPRoute
+metadata:
+  namespace: aic
+  name: ai-proxy-multi-route
+spec:
+  parentRefs:
+    - name: apisix
+  rules:
+    - matches:
+        - path:
+            type: Exact
+            value: /embeddings
+          method: POST
+      filters:
+        - type: ExtensionRef
+          extensionRef:
+            group: apisix.apache.org
+            kind: PluginConfig
+            name: ai-proxy-multi-plugin-config
+```
+
+</TabItem>
+
+<TabItem value="apisix-crd">
+
+```yaml title="ai-proxy-multi-ic.yaml"
+apiVersion: apisix.apache.org/v2
+kind: ApisixRoute
+metadata:
+  namespace: aic
+  name: ai-proxy-multi-route
+spec:
+  ingressClassName: apisix
+  http:
+    - name: ai-proxy-multi-route
+      match:
+        paths:
+          - /embeddings
+        methods:
+          - POST
+      plugins:
+        - name: ai-proxy-multi
+          enable: true
+          config:
+            instances:
+              - name: openai-instance
+                provider: openai
+                weight: 0
+                auth:
+                  header:
+                    Authorization: "Bearer your-api-key"
+                options:
+                  model: text-embedding-3-small
+                override:
+                  endpoint: "https://api.openai.com/v1/embeddings"
+              - name: az-openai-instance
+                provider: openai-compatible
+                weight: 0
+                auth:
+                  header:
+                    Authorization: "Bearer your-api-key"
+                options:
+                  model: text-embedding-3-small
+                override:
+                  endpoint: "https://ai-plugin-developer.openai.azure.com/openai/deployments/text-embedding-3-small/embeddings?api-version=2023-05-15"
+```
+
+</TabItem>
+
+</Tabs>
+
+Apply the configuration to your cluster:
+
+```shell
+kubectl apply -f ai-proxy-multi-ic.yaml
+```
+
+</TabItem>
+
+</Tabs>
+
+Send a POST request to the Route with an input string:
```shell
curl "http://127.0.0.1:9080/embeddings" -X POST \
@@ -892,7 +1990,18 @@ curl "http://127.0.0.1:9080/embeddings" -X POST \
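The following example demonstrates how you can configure the `ai-proxy-multi` Plugin to proxy requests and load balance between models, with active health checks enabled to improve service availability. You can enable health checks on one or more instances.
-Create a route and update it with the LLM providers, models, API keys, and health check related configurations:
+Create a Route and update it with the LLM providers, models, API keys, and health check related configurations: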
+
+<Tabs
+groupId="api"
+defaultValue="admin-api"
+values={[
+{label: 'Admin API', value: 'admin-api'},
+{label: 'ADC', value: 'adc'},
+{label: 'Ingress Controller', value: 'aic'}
+]}>
+
+<TabItem value="admin-api">
```shell
curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
@@ -951,8 +2060,506 @@ curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
}'
```
+</TabItem>
+
+<TabItem value="adc">
+
+```yaml title="adc.yaml"
+services:
+  - name: ai-proxy-multi-service
+    routes:
+      - name: ai-proxy-multi-route
+        uris:
+          - /anything
+        methods:
+          - POST
+        plugins:
+          ai-proxy-multi:
+            instances:
+              - name: llm-instance-1
+                provider: openai-compatible
+                weight: 0
+                auth:
+                  header:
+                    Authorization: "Bearer ${YOUR_LLM_API_KEY}"
+                options:
+                  model: "${YOUR_LLM_MODEL}"
+              - name: llm-instance-2
+                provider: openai-compatible
+                weight: 0
+                auth:
+                  header:
+                    Authorization: "Bearer ${YOUR_LLM_API_KEY}"
+                options:
+                  model: "${YOUR_LLM_MODEL}"
+                checks:
+                  active:
+                    type: https
+                    host: yourhost.com
+                    http_path: /your/probe/path
+                    healthy:
+                      interval: 2
+                      successes: 1
+                    unhealthy:
+                      interval: 1
+                      http_failures: 3
+```
+
+Synchronize the configuration to the gateway:
+
+```shell
+adc sync -f adc.yaml
+```
+
+</TabItem>
+
+<TabItem value="aic">
+
+<Tabs
+groupId="k8s-api"
+defaultValue="gateway-api"
+values={[
+{label: 'Gateway API', value: 'gateway-api'},
+{label: 'APISIX CRD', value: 'apisix-crd'}
+]}>
+
+<TabItem value="gateway-api">
+
+```yaml title="ai-proxy-multi-ic.yaml"
+apiVersion: apisix.apache.org/v1alpha1
+kind: PluginConfig
+metadata:
+  namespace: aic
+  name: ai-proxy-multi-plugin-config
+spec:
+  plugins:
+    - name: ai-proxy-multi
+      config:
+        instances:
+          - name: llm-instance-1
+            provider: openai-compatible
+            weight: 0
+            auth:
+              header:
+                Authorization: "Bearer your-api-key"
+            options:
+              model: your-model
+          - name: llm-instance-2
+            provider: openai-compatible
+            weight: 0
+            auth:
+              header:
+                Authorization: "Bearer your-api-key"
+            options:
+              model: your-model
+            checks:
+              active:
+                type: https
+                host: yourhost.com
+                http_path: /your/probe/path
+                healthy:
+                  interval: 2
+                  successes: 1
+                unhealthy:
+                  interval: 1
+                  http_failures: 3
+---
+apiVersion: gateway.networking.k8s.io/v1
+kind: HTTPRoute
+metadata:
+  namespace: aic
+  name: ai-proxy-multi-route
+spec:
+  parentRefs:
+    - name: apisix
+  rules:
+    - matches:
+        - path:
+            type: Exact
+            value: /anything
+          method: POST
+      filters:
+        - type: ExtensionRef
+          extensionRef:
+            group: apisix.apache.org
+            kind: PluginConfig
+            name: ai-proxy-multi-plugin-config
+```
+
+</TabItem>
+
+<TabItem value="apisix-crd">
+
+```yaml title="ai-proxy-multi-ic.yaml"
+apiVersion: apisix.apache.org/v2
+kind: ApisixRoute
+metadata:
+  namespace: aic
+  name: ai-proxy-multi-route
+spec:
+  ingressClassName: apisix
+  http:
+    - name: ai-proxy-multi-route
+      match:
+        paths:
+          - /anything
+        methods:
+          - POST
+      plugins:
+        - name: ai-proxy-multi
+          enable: true
+          config:
+            instances:
+              - name: llm-instance-1
+                provider: openai-compatible
+                weight: 0
+                auth:
+                  header:
+                    Authorization: "Bearer your-api-key"
+                options:
+                  model: your-model
+              - name: llm-instance-2
+                provider: openai-compatible
+                weight: 0
+                auth:
+                  header:
+                    Authorization: "Bearer your-api-key"
+                options:
+                  model: your-model
+                checks:
+                  active:
+                    type: https
+                    host: yourhost.com
+                    http_path: /your/probe/path
+                    healthy:
+                      interval: 2
+                      successes: 1
+                    unhealthy:
+                      interval: 1
+                      http_failures: 3
+```
+
+</TabItem>
+
+</Tabs>
+
+Apply the configuration to your cluster:
+
+```shell
+kubectl apply -f ai-proxy-multi-ic.yaml
+```
+
+</TabItem>
+
+</Tabs>
+
To verify, the behavior should be consistent with the verification in [active health checks](../tutorials/health-check.md).
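
The following is a rough worked example for `llm-instance-2` above. It assumes the probe semantics of lua-resty-healthcheck, the library APISIX health checks are built on. Under those semantics, `healthy.interval` sets how often healthy targets are probed and `unhealthy.interval` sets how often unhealthy targets are probed. The script estimates how long detection and recovery take. These are approximations only: they ignore probe timeouts and timer jitter, and they are not values reported by APISIX.

```shell
#!/bin/sh
# Values copied from the checks.active block of llm-instance-2.
healthy_interval=2    # checks.active.healthy.interval (seconds)
healthy_successes=1   # checks.active.healthy.successes
unhealthy_interval=1  # checks.active.unhealthy.interval (seconds)
http_failures=3       # checks.active.unhealthy.http_failures

# A healthy instance is probed every healthy_interval seconds and needs
# http_failures failed probes before it is marked unhealthy.
to_unhealthy=$((healthy_interval * http_failures))

# An unhealthy instance is probed every unhealthy_interval seconds and needs
# healthy_successes successful probes before it is marked healthy again.
to_healthy=$((unhealthy_interval * healthy_successes))

echo "marked unhealthy after about ${to_unhealthy}s of failing probes"
echo "marked healthy again after about ${to_healthy}s of passing probes"
```

Lowering `healthy.interval` or `http_failures` shortens detection but increases probe traffic to the LLM endpoint.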
+### Send Request Logs to Logger
+
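+The following example demonstrates how you can log request and response information, including the LLM model, tokens, and payloads, and push it to a logger. Before proceeding, you should first set up a logger, such as Kafka. See [`kafka-logger`](./kafka-logger.md) for more information.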
+
+Create a Route to your LLM services and configure the logging details:
+
+<Tabs
+groupId="api"
+defaultValue="admin-api"
+values={[
+{label: 'Admin API', value: 'admin-api'},
+{label: 'ADC', value: 'adc'},
+{label: 'Ingress Controller', value: 'aic'}
+]}>
+
+<TabItem value="admin-api">
+
+```shell
+curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
+  -H "X-API-KEY: ${admin_key}" \
+  -d '{
+    "id": "ai-proxy-multi-route",
+    "uri": "/anything",
+    "methods": ["POST"],
+    "plugins": {
+      "ai-proxy-multi": {
+        "instances": [
+          {
+            "name": "openai-instance",
+            "provider": "openai",
+            "weight": 8,
+            "auth": {
+              "header": {
+                "Authorization": "Bearer '"$OPENAI_API_KEY"'"
+              }
+            },
+            "options": {
+              "model": "gpt-4"
+            }
+          },
+          {
+            "name": "deepseek-instance",
+            "provider": "deepseek",
+            "weight": 2,
+            "auth": {
+              "header": {
+                "Authorization": "Bearer '"$DEEPSEEK_API_KEY"'"
+              }
+            },
+            "options": {
+              "model": "deepseek-chat"
+            }
+          }
+        ],
+        "logging": {
+          "summaries": true,
+          "payloads": true
+        }
+      },
+      "kafka-logger": {
+        "brokers": [
+          {
+            "host": "127.0.0.1",
+            "port": 9092
+          }
+        ],
+        "kafka_topic": "test2",
+        "key": "key1",
+        "batch_max_size": 1
+      }
+    }
+  }'
+```
+
+</TabItem>
+
+<TabItem value="adc">
+
+```yaml title="adc.yaml"
+services:
+  - name: ai-proxy-multi-service
+    routes:
+      - name: ai-proxy-multi-route
+        uris:
+          - /anything
+        methods:
+          - POST
+        plugins:
+          ai-proxy-multi:
+            instances:
+              - name: openai-instance
+                provider: openai
+                weight: 8
+                auth:
+                  header:
+                    Authorization: "Bearer ${OPENAI_API_KEY}"
+                options:
+                  model: gpt-4
+              - name: deepseek-instance
+                provider: deepseek
+                weight: 2
+                auth:
+                  header:
+                    Authorization: "Bearer ${DEEPSEEK_API_KEY}"
+                options:
+                  model: deepseek-chat
+            logging:
+              summaries: true
+              payloads: true
+          kafka-logger:
+            brokers:
+              - host: 127.0.0.1
+                port: 9092
+            kafka_topic: test2
+            key: key1
+            batch_max_size: 1
+```
+
+Synchronize the configuration to the gateway:
+
+```shell
+adc sync -f adc.yaml
+```
+
+</TabItem>
+
+<TabItem value="aic">
+
+<Tabs
+groupId="k8s-api"
+defaultValue="gateway-api"
+values={[
+{label: 'Gateway API', value: 'gateway-api'},
+{label: 'APISIX CRD', value: 'apisix-crd'}
+]}>
+
+<TabItem value="gateway-api">
+
+```yaml title="ai-proxy-multi-ic.yaml"
+apiVersion: apisix.apache.org/v1alpha1
+kind: PluginConfig
+metadata:
+  namespace: aic
+  name: ai-proxy-multi-plugin-config
+spec:
+  plugins:
+    - name: ai-proxy-multi
+      config:
+        instances:
+          - name: openai-instance
+            provider: openai
+            weight: 8
+            auth:
+              header:
+                Authorization: "Bearer your-api-key"
+            options:
+              model: gpt-4
+          - name: deepseek-instance
+            provider: deepseek
+            weight: 2
+            auth:
+              header:
+                Authorization: "Bearer your-api-key"
+            options:
+              model: deepseek-chat
+        logging:
+          summaries: true
+          payloads: true
+    - name: kafka-logger
+      config:
+        brokers:
+          - host: kafka.aic.svc.cluster.local
+            port: 9092
+        kafka_topic: test2
+        key: key1
+        batch_max_size: 1
+---
+apiVersion: gateway.networking.k8s.io/v1
+kind: HTTPRoute
+metadata:
+  namespace: aic
+  name: ai-proxy-multi-route
+spec:
+  parentRefs:
+    - name: apisix
+  rules:
+    - matches:
+        - path:
+            type: Exact
+            value: /anything
+          method: POST
+      filters:
+        - type: ExtensionRef
+          extensionRef:
+            group: apisix.apache.org
+            kind: PluginConfig
+            name: ai-proxy-multi-plugin-config
+```
+
+</TabItem>
+
+<TabItem value="apisix-crd">
+
+```yaml title="ai-proxy-multi-ic.yaml"
+apiVersion: apisix.apache.org/v2
+kind: ApisixRoute
+metadata:
+  namespace: aic
+  name: ai-proxy-multi-route
+spec:
+  ingressClassName: apisix
+  http:
+    - name: ai-proxy-multi-route
+      match:
+        paths:
+          - /anything
+        methods:
+          - POST
+      plugins:
+        - name: ai-proxy-multi
+          enable: true
+          config:
+            instances:
+              - name: openai-instance
+                provider: openai
+                weight: 8
+                auth:
+                  header:
+                    Authorization: "Bearer your-api-key"
+                options:
+                  model: gpt-4
+              - name: deepseek-instance
+                provider: deepseek
+                weight: 2
+                auth:
+                  header:
+                    Authorization: "Bearer your-api-key"
+                options:
+                  model: deepseek-chat
+            logging:
+              summaries: true
+              payloads: true
+        - name: kafka-logger
+          enable: true
+          config:
+            brokers:
+              - host: kafka.aic.svc.cluster.local
+                port: 9092
+            kafka_topic: test2
+            key: key1
+            batch_max_size: 1
+```
+
+</TabItem>
+
+</Tabs>
+
+Apply the configuration to your cluster:
+
+```shell
+kubectl apply -f ai-proxy-multi-ic.yaml
+```
+
+</TabItem>
+
+</Tabs>
+
+Send a POST request to the Route:
+
+```shell
+curl "http://127.0.0.1:9080/anything" -X POST \
+  -H "Content-Type: application/json" \
+  -d '{
+    "messages": [
+      { "role": "system", "content": "You are a mathematician" },
+      { "role": "user", "content": "What is 1+1?" }
+    ]
+  }'
+```
+
+If the request is forwarded to OpenAI, you should receive a response similar to the following:
+
+```json
+{
+  ...,
+  "model": "gpt-4-0613",
+  "choices": [
+    {
+      "index": 0,
+      "message": {
+        "role": "assistant",
+        "content": "1+1 equals 2.",
+        "refusal": null
+      },
+      "logprobs": null,
+      "finish_reason": "stop"
+    }
+  ],
+  ...
+}
+```
+
+In the Kafka topic, you should also see a log entry corresponding to the request, containing the LLM summary and the request and response payloads.
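
The two instances in this example are weighted 8 and 2, so not every request lands on OpenAI. The sketch below assumes the default weighted round-robin balancer (`balancer.algorithm` set to `roundrobin`) and that both instances are healthy and not rate limited. Under those assumptions, plain shell arithmetic gives the expected long-run split. This is a back-of-the-envelope calculation, not output from APISIX.

```shell
#!/bin/sh
# Weights copied from the two instances configured in this example.
openai_weight=8
deepseek_weight=2
total_weight=$((openai_weight + deepseek_weight))

# Expected long-run share of requests per instance under weighted round robin
# (integer percentages; exact here because the weights sum to 10).
openai_pct=$((openai_weight * 100 / total_weight))
deepseek_pct=$((deepseek_weight * 100 / total_weight))

echo "openai-instance:   ${openai_pct}%"
echo "deepseek-instance: ${deepseek_pct}%"
```

Roughly one request in five is therefore expected to be served by `deepseek-instance`. Its log entry in Kafka should reflect the DeepSeek model rather than `gpt-4`.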
+
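### Include LLM Information in Access Log
The following example demonstrates how you can log LLM request information in the gateway's access log to improve analytics and auditing. The following variables are available: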
@@ -975,7 +2582,7 @@ nginx_config:
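Reload APISIX for the configuration changes to take effect.
-Next, create a route with the `ai-proxy-multi` Plugin and send a request. For example, if the request is forwarded to OpenAI and you receive the following response:
+Next, create a Route with the `ai-proxy-multi` Plugin and send a request. For example, if the request is forwarded to OpenAI and you receive the following response: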
```json
{