Yilialinn commented on code in PR #2028: URL: https://github.com/apache/apisix-website/pull/2028#discussion_r3085433623
########## blog/en/blog/2026/04/14/apisix-3.16-dynamic-rate-limiting.md: ########## @@ -0,0 +1,329 @@ +--- +title: "What's New in Apache APISIX 3.16: Dynamic Rate Limiting for Your API Gateway" +authors: + - name: "Ming Wen" + title: "Author" + url: "https://github.com/moonming" + image_url: "https://github.com/moonming.png" +keywords: + - Apache APISIX + - API Gateway + - Rate Limiting + - Dynamic Rate Limiting + - AI Gateway + - Multi-Tenant + - Token Budget +description: Apache APISIX 3.16 introduces dynamic rate limiting with multiple rules and variable support across limit-count, limit-conn, and ai-rate-limiting plugins, enabling context-aware, per-tier, and multi-tenant rate limiting in a single route configuration. +tags: [Products] +--- + +Rate limiting is one of the most critical capabilities in any API gateway. Yet for years, most gateways — including APISIX — have treated it as a static, one-size-fits-all configuration: set a number, set a time window, done. + +In practice, real-world rate limiting is far more nuanced. A SaaS platform needs different quotas for free and paid users. An AI gateway must enforce token budgets that vary by model and consumer. A multi-tenant API must isolate rate limits per tenant without duplicating routes. + +Apache APISIX 3.16 addresses these challenges head-on with two powerful enhancements to the rate limiting plugins: **multiple rules** and **variable support**. Together, they transform rate limiting from static configuration into a dynamic, context-aware policy engine. 
+ +<!--truncate--> + +## What Changed in APISIX 3.16 + +APISIX 3.16 introduces two complementary features across the `limit-count`, `limit-conn`, and `ai-rate-limiting` plugins: + +| Feature | Description | Supported Plugins | +|---------|-------------|-------------------| +| Multiple rules | Define an array of rate limiting rules with independent thresholds and time windows | `limit-count`, `limit-conn`, `ai-rate-limiting` | +| Variable support | Use APISIX variables (`${remote_addr}`, `${http_*}`, `${consumer_name}`, etc.) in `count`, `time_window`, and `key` fields, with optional default values via `${var ?? default}` | `limit-count`, `limit-conn`, `ai-rate-limiting` | + +Both features are fully backward compatible. Existing configurations continue to work without modification. + +## Multiple Rules: Beyond Single-Threshold Rate Limiting + +### The Problem + +Consider a common requirement: limit an API to **10 requests per second** and **500 requests per minute**. Before 3.16, you had to configure two separate plugin instances or chain multiple routes. This was verbose, error-prone, and hard to maintain. + +### The Solution + +The new `rules` array lets you define multiple rate limiting policies in a single plugin configuration. Each rule operates independently with its own counter, time window, and key. + +```json +{ + "uri": "/api/v1/*", + "plugins": { + "limit-count": { + "rules": [ + { + "count": 10, + "time_window": 1, + "key": "${remote_addr}_per_second", + "header_prefix": "per-second" + }, + { + "count": 500, + "time_window": 60, + "key": "${remote_addr}_per_minute", + "header_prefix": "per-minute" + }, + { + "count": 10000, + "time_window": 86400, + "key": "${remote_addr}_per_day", + "header_prefix": "per-day" + } + ], + "rejected_code": 429 + } + }, + "upstream": { + "type": "roundrobin", + "nodes": { + "127.0.0.1:1980": 1 + } + } +} +``` + +With this configuration, APISIX enforces all three limits simultaneously. 
A client hitting the per-second limit receives a `429` response with headers indicating which limit was exceeded: + +``` +X-Per-Second-RateLimit-Limit: 10 +X-Per-Second-RateLimit-Remaining: 0 +X-Per-Second-RateLimit-Reset: 1 +X-Per-Minute-RateLimit-Limit: 500 +X-Per-Minute-RateLimit-Remaining: 499 +X-Per-Minute-RateLimit-Reset: 60 +``` + +The `header_prefix` field lets clients distinguish which rule triggered the rejection — critical for debugging and client-side retry logic. + +## Variable Support: Context-Aware Rate Limiting + +### The Problem + +Static rate limits assume every consumer is equal. In reality, a free-tier user and an enterprise customer should have very different quotas. Before 3.16, supporting this meant creating separate routes for each tier — leading to route explosion and configuration drift. + +### The Solution + +Variable support lets you pull rate limiting parameters directly from the request context. The `count`, `time_window`, and `key` fields now accept APISIX variables. + +### Example 1: Per-Tier Rate Limiting via HTTP Header + +Suppose your authentication middleware injects an `X-Rate-Quota` header based on the user's subscription tier: + +```json +{ + "uri": "/api/v1/*", + "plugins": { + "limit-count": { + "rules": [ + { + "count": "${http_x_rate_quota ?? 100}", + "time_window": 60, + "key": "${consumer_name}" + } + ], + "rejected_code": 429 + } + }, + "upstream": { Review Comment: **Bug: this example returns `500 Internal Server Error` at request time when no consumer is authenticated.** Tested on APISIX 3.16.0. 
Error log:

```
[lua] init.lua:456: phase_func(): failed to get rate limit rules: nil
limit-count exits with http status code 500
```

Root cause: in `limit-count/init.lua` `get_rules()`, after resolving the key string, there is a check:

```lua
local key, _, n_resolved = core.utils.resolve_var(rule.key, ctx.var)
if n_resolved == 0 then
    goto CONTINUE -- rule is silently skipped
end
```

`n_resolved` counts how many `${}` variables were successfully resolved to a non-nil value. When `consumer_name` is nil (no authenticated consumer), `n_resolved` stays `0` and the rule is skipped. When all rules are skipped, the rules list is empty and APISIX returns 500.

**This is a code-level bug in APISIX**: `n_resolved == 0` was likely intended to skip rules where the key variable is missing, but it also incorrectly skips rules whose key variables resolve to nil. A separate issue should be filed against `apache/apisix`.

**To make this example work as written**, the route must include an authentication plugin so that `consumer_name` is populated:

```json
"plugins": {
  "key-auth": {},
  "limit-count": { ... }
}
```

And a consumer with that auth plugin must exist. The blog should either add this prerequisite or use a key that is always available (e.g., `"${remote_addr}"`) for the basic illustration.
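For readers who want to reason about the failure without reading the plugin source, here is a minimal Python model of the skip behavior described in this comment. It is a sketch, not the APISIX implementation: `resolve_var`, `get_rules`, and `outcome` are hypothetical stand-ins, the variable values are made up, and the handling of unresolved placeholders (replaced with empty text) is an assumption.

```python
import re

# Matches ${name} placeholders; the "?? default" form is deliberately not modeled.
VAR_PATTERN = re.compile(r"\$\{(\w+)\}")


def resolve_var(template, variables):
    """Substitute ${name} placeholders and count how many resolved to a non-None value."""
    resolved = 0

    def substitute(match):
        nonlocal resolved
        value = variables.get(match.group(1))
        if value is None:
            return ""  # assumption: an unresolved placeholder becomes empty text
        resolved += 1
        return str(value)

    return VAR_PATTERN.sub(substitute, template), resolved


def get_rules(rules, variables):
    """Keep only rules whose key resolved at least one variable (the quoted guard)."""
    kept = []
    for rule in rules:
        key, n_resolved = resolve_var(rule["key"], variables)
        if n_resolved == 0:
            continue  # mirrors `goto CONTINUE`: the rule is silently skipped
        kept.append({**rule, "key": key})
    return kept


def outcome(rules, variables):
    """Return 500 when every rule was skipped, else None (limiting proceeds normally)."""
    return None if get_rules(rules, variables) else 500


rules = [{"count": 100, "time_window": 60, "key": "${consumer_name}"}]
anonymous = {"remote_addr": "203.0.113.7"}
authenticated = {"remote_addr": "203.0.113.7", "consumer_name": "alice"}

print(outcome(rules, anonymous))      # every rule skipped
print(outcome(rules, authenticated))  # rule kept, key resolved to "alice"
```

Under this model a key with no variables at all, such as `"global"`, also yields a resolved count of zero and would be dropped by the same guard.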
########## blog/en/blog/2026/04/14/apisix-3.16-dynamic-rate-limiting.md: ########## @@ -0,0 +1,329 @@ +--- +title: "What's New in Apache APISIX 3.16: Dynamic Rate Limiting for Your API Gateway" +authors: + - name: "Ming Wen" + title: "Author" + url: "https://github.com/moonming" + image_url: "https://github.com/moonming.png" +keywords: + - Apache APISIX + - API Gateway + - Rate Limiting + - Dynamic Rate Limiting + - AI Gateway + - Multi-Tenant + - Token Budget +description: Apache APISIX 3.16 introduces dynamic rate limiting with multiple rules and variable support across limit-count, limit-conn, and ai-rate-limiting plugins, enabling context-aware, per-tier, and multi-tenant rate limiting in a single route configuration. +tags: [Products] +--- + +Rate limiting is one of the most critical capabilities in any API gateway. Yet for years, most gateways — including APISIX — have treated it as a static, one-size-fits-all configuration: set a number, set a time window, done. + +In practice, real-world rate limiting is far more nuanced. A SaaS platform needs different quotas for free and paid users. An AI gateway must enforce token budgets that vary by model and consumer. A multi-tenant API must isolate rate limits per tenant without duplicating routes. + +Apache APISIX 3.16 addresses these challenges head-on with two powerful enhancements to the rate limiting plugins: **multiple rules** and **variable support**. Together, they transform rate limiting from static configuration into a dynamic, context-aware policy engine. 
+ +<!--truncate--> + +## What Changed in APISIX 3.16 + +APISIX 3.16 introduces two complementary features across the `limit-count`, `limit-conn`, and `ai-rate-limiting` plugins: + +| Feature | Description | Supported Plugins | +|---------|-------------|-------------------| +| Multiple rules | Define an array of rate limiting rules with independent thresholds and time windows | `limit-count`, `limit-conn`, `ai-rate-limiting` | +| Variable support | Use APISIX variables (`${remote_addr}`, `${http_*}`, `${consumer_name}`, etc.) in `count`, `time_window`, and `key` fields, with optional default values via `${var ?? default}` | `limit-count`, `limit-conn`, `ai-rate-limiting` | + +Both features are fully backward compatible. Existing configurations continue to work without modification. + +## Multiple Rules: Beyond Single-Threshold Rate Limiting + +### The Problem + +Consider a common requirement: limit an API to **10 requests per second** and **500 requests per minute**. Before 3.16, you had to configure two separate plugin instances or chain multiple routes. This was verbose, error-prone, and hard to maintain. + +### The Solution + +The new `rules` array lets you define multiple rate limiting policies in a single plugin configuration. Each rule operates independently with its own counter, time window, and key. + +```json +{ + "uri": "/api/v1/*", + "plugins": { + "limit-count": { + "rules": [ + { + "count": 10, + "time_window": 1, + "key": "${remote_addr}_per_second", + "header_prefix": "per-second" + }, + { + "count": 500, + "time_window": 60, + "key": "${remote_addr}_per_minute", + "header_prefix": "per-minute" + }, + { + "count": 10000, + "time_window": 86400, + "key": "${remote_addr}_per_day", + "header_prefix": "per-day" + } + ], + "rejected_code": 429 + } + }, + "upstream": { + "type": "roundrobin", + "nodes": { + "127.0.0.1:1980": 1 + } + } +} +``` + +With this configuration, APISIX enforces all three limits simultaneously. 
A client hitting the per-second limit receives a `429` response with headers indicating which limit was exceeded: + +``` +X-Per-Second-RateLimit-Limit: 10 +X-Per-Second-RateLimit-Remaining: 0 +X-Per-Second-RateLimit-Reset: 1 +X-Per-Minute-RateLimit-Limit: 500 +X-Per-Minute-RateLimit-Remaining: 499 +X-Per-Minute-RateLimit-Reset: 60 +``` + +The `header_prefix` field lets clients distinguish which rule triggered the rejection — critical for debugging and client-side retry logic. + +## Variable Support: Context-Aware Rate Limiting + +### The Problem + +Static rate limits assume every consumer is equal. In reality, a free-tier user and an enterprise customer should have very different quotas. Before 3.16, supporting this meant creating separate routes for each tier — leading to route explosion and configuration drift. + +### The Solution + +Variable support lets you pull rate limiting parameters directly from the request context. The `count`, `time_window`, and `key` fields now accept APISIX variables. + +### Example 1: Per-Tier Rate Limiting via HTTP Header + +Suppose your authentication middleware injects an `X-Rate-Quota` header based on the user's subscription tier: + +```json +{ + "uri": "/api/v1/*", + "plugins": { + "limit-count": { + "rules": [ + { + "count": "${http_x_rate_quota ?? 100}", + "time_window": 60, + "key": "${consumer_name}" + } + ], + "rejected_code": 429 + } + }, + "upstream": { + "type": "roundrobin", + "nodes": { + "127.0.0.1:1980": 1 + } + } +} +``` + +Now the same route handles all tiers: + +| Tier | `X-Rate-Quota` Header | Effective Limit | +|------|----------------------|-----------------| +| Free | 100 | 100 req/min | +| Pro | 1000 | 1,000 req/min | +| Enterprise | 50000 | 50,000 req/min | + +One route. One plugin configuration. All tiers. 
+ +### Example 2: Multi-Tenant Isolation with Variable Combination + +For a multi-tenant SaaS API, you can combine variables to create isolated rate limit buckets per tenant per endpoint: + +```json +{ + "uri": "/api/v1/*", + "plugins": { + "limit-count": { + "rules": [ + { + "count": 1000, + "time_window": 60, + "key": "${http_x_tenant_id} ${uri}" + } + ], + "rejected_code": 429 + } + }, + "upstream": { + "type": "roundrobin", + "nodes": { + "127.0.0.1:1980": 1 + } + } +} +``` + +Tenant A calling `/api/v1/users` and Tenant B calling the same endpoint get independent counters. Tenant A calling `/api/v1/orders` gets yet another counter. This creates a natural per-tenant-per-endpoint isolation without any route duplication. + +### Example 3: Dynamic Concurrent Connection Limits + +The `limit-conn` plugin also supports rules and variables, enabling dynamic concurrency control: + +```json +{ + "uri": "/api/v1/inference", + "plugins": { + "limit-conn": { + "default_conn_delay": 0.1, + "rules": [ + { + "conn": 5, + "burst": 2, + "key": "${consumer_name}" + }, + { + "conn": 100, + "burst": 20, + "key": "global" + }

Review Comment:

**Bug: same `n_resolved == 0` issue: the `${consumer_name}` key causes this rule to be skipped when no consumer is authenticated, resulting in `500`.**

Tested on APISIX 3.16.0. Error log:

```
[lua] init.lua:208: phase_func(): failed to get limit conn rules: nil
limit-conn exits with http status code 500
```

Same root cause as the `limit-count` case: `limit-conn/init.lua` has the identical `n_resolved == 0` guard. The `${consumer_name}` rule is skipped when there is no consumer context.

**Fix needed**: add an auth plugin (e.g., `key-auth`) to the route so `consumer_name` is populated at request time.
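One way to catch this class of mistake before publishing is a small pre-flight check over each example route. The Python sketch below is illustrative only: `plugins_missing_auth` is a hypothetical helper, the `AUTH_PLUGINS` set is a non-exhaustive assumption about which plugins attach a consumer, and the check looks only at `rules[].key`.

```python
import re

# Assumption: a non-exhaustive set of authentication plugins that attach a consumer.
AUTH_PLUGINS = {"key-auth", "basic-auth", "jwt-auth", "hmac-auth"}

# Matches a ${consumer_name...} reference inside a rule key.
CONSUMER_VAR = re.compile(r"\$\{consumer_name\b")


def plugins_missing_auth(route):
    """List plugins whose rule keys use ${consumer_name} on a route with no auth plugin."""
    plugins = route.get("plugins", {})
    if AUTH_PLUGINS.intersection(plugins):
        return []  # some auth plugin is configured, so a consumer can be attached
    flagged = []
    for name, conf in plugins.items():
        keys = [rule.get("key", "") for rule in conf.get("rules", [])]
        if any(CONSUMER_VAR.search(key) for key in keys):
            flagged.append(name)
    return sorted(flagged)


# The limit-conn route from Example 3, reduced to the fields the check reads.
example_3 = {
    "uri": "/api/v1/inference",
    "plugins": {
        "limit-conn": {
            "rules": [
                {"conn": 5, "burst": 2, "key": "${consumer_name}"},
                {"conn": 100, "burst": 20, "key": "global"},
            ]
        }
    },
}

# The same route with key-auth added, as suggested above.
fixed = {**example_3, "plugins": {"key-auth": {}, **example_3["plugins"]}}

print(plugins_missing_auth(example_3))
print(plugins_missing_auth(fixed))
```

The check only confirms that an auth plugin is present on the route. It cannot tell whether a matching consumer exists, so that prerequisite still needs to be stated in the blog text.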
########## blog/en/blog/2026/04/14/apisix-3.16-dynamic-rate-limiting.md: ########## @@ -0,0 +1,329 @@ +--- +title: "What's New in Apache APISIX 3.16: Dynamic Rate Limiting for Your API Gateway" +authors: + - name: "Ming Wen" + title: "Author" + url: "https://github.com/moonming" + image_url: "https://github.com/moonming.png" +keywords: + - Apache APISIX + - API Gateway + - Rate Limiting + - Dynamic Rate Limiting + - AI Gateway + - Multi-Tenant + - Token Budget +description: Apache APISIX 3.16 introduces dynamic rate limiting with multiple rules and variable support across limit-count, limit-conn, and ai-rate-limiting plugins, enabling context-aware, per-tier, and multi-tenant rate limiting in a single route configuration. +tags: [Products] +--- + +Rate limiting is one of the most critical capabilities in any API gateway. Yet for years, most gateways — including APISIX — have treated it as a static, one-size-fits-all configuration: set a number, set a time window, done. + +In practice, real-world rate limiting is far more nuanced. A SaaS platform needs different quotas for free and paid users. An AI gateway must enforce token budgets that vary by model and consumer. A multi-tenant API must isolate rate limits per tenant without duplicating routes. + +Apache APISIX 3.16 addresses these challenges head-on with two powerful enhancements to the rate limiting plugins: **multiple rules** and **variable support**. Together, they transform rate limiting from static configuration into a dynamic, context-aware policy engine. 
+ +<!--truncate--> + +## What Changed in APISIX 3.16 + +APISIX 3.16 introduces two complementary features across the `limit-count`, `limit-conn`, and `ai-rate-limiting` plugins: + +| Feature | Description | Supported Plugins | +|---------|-------------|-------------------| +| Multiple rules | Define an array of rate limiting rules with independent thresholds and time windows | `limit-count`, `limit-conn`, `ai-rate-limiting` | +| Variable support | Use APISIX variables (`${remote_addr}`, `${http_*}`, `${consumer_name}`, etc.) in `count`, `time_window`, and `key` fields, with optional default values via `${var ?? default}` | `limit-count`, `limit-conn`, `ai-rate-limiting` | + +Both features are fully backward compatible. Existing configurations continue to work without modification. + +## Multiple Rules: Beyond Single-Threshold Rate Limiting + +### The Problem + +Consider a common requirement: limit an API to **10 requests per second** and **500 requests per minute**. Before 3.16, you had to configure two separate plugin instances or chain multiple routes. This was verbose, error-prone, and hard to maintain. + +### The Solution + +The new `rules` array lets you define multiple rate limiting policies in a single plugin configuration. Each rule operates independently with its own counter, time window, and key. + +```json +{ + "uri": "/api/v1/*", + "plugins": { + "limit-count": { + "rules": [ + { + "count": 10, + "time_window": 1, + "key": "${remote_addr}_per_second", + "header_prefix": "per-second" + }, + { + "count": 500, + "time_window": 60, + "key": "${remote_addr}_per_minute", + "header_prefix": "per-minute" + }, + { + "count": 10000, + "time_window": 86400, + "key": "${remote_addr}_per_day", + "header_prefix": "per-day" + } + ], + "rejected_code": 429 + } + }, + "upstream": { + "type": "roundrobin", + "nodes": { + "127.0.0.1:1980": 1 + } + } +} +``` + +With this configuration, APISIX enforces all three limits simultaneously. 
A client hitting the per-second limit receives a `429` response with headers indicating which limit was exceeded: + +``` +X-Per-Second-RateLimit-Limit: 10 +X-Per-Second-RateLimit-Remaining: 0 +X-Per-Second-RateLimit-Reset: 1 +X-Per-Minute-RateLimit-Limit: 500 +X-Per-Minute-RateLimit-Remaining: 499 +X-Per-Minute-RateLimit-Reset: 60 +``` + +The `header_prefix` field lets clients distinguish which rule triggered the rejection — critical for debugging and client-side retry logic. + +## Variable Support: Context-Aware Rate Limiting + +### The Problem + +Static rate limits assume every consumer is equal. In reality, a free-tier user and an enterprise customer should have very different quotas. Before 3.16, supporting this meant creating separate routes for each tier — leading to route explosion and configuration drift. + +### The Solution + +Variable support lets you pull rate limiting parameters directly from the request context. The `count`, `time_window`, and `key` fields now accept APISIX variables. + +### Example 1: Per-Tier Rate Limiting via HTTP Header + +Suppose your authentication middleware injects an `X-Rate-Quota` header based on the user's subscription tier: + +```json +{ + "uri": "/api/v1/*", + "plugins": { + "limit-count": { + "rules": [ + { + "count": "${http_x_rate_quota ?? 100}", + "time_window": 60, + "key": "${consumer_name}" + } + ], + "rejected_code": 429 + } + }, + "upstream": { + "type": "roundrobin", + "nodes": { + "127.0.0.1:1980": 1 + } + } +} +``` + +Now the same route handles all tiers: + +| Tier | `X-Rate-Quota` Header | Effective Limit | +|------|----------------------|-----------------| +| Free | 100 | 100 req/min | +| Pro | 1000 | 1,000 req/min | +| Enterprise | 50000 | 50,000 req/min | + +One route. One plugin configuration. All tiers. 
+ +### Example 2: Multi-Tenant Isolation with Variable Combination + +For a multi-tenant SaaS API, you can combine variables to create isolated rate limit buckets per tenant per endpoint: + +```json +{ + "uri": "/api/v1/*", + "plugins": { + "limit-count": { + "rules": [ + { + "count": 1000, + "time_window": 60, + "key": "${http_x_tenant_id} ${uri}" + } + ], + "rejected_code": 429 + } + }, + "upstream": { + "type": "roundrobin", + "nodes": { + "127.0.0.1:1980": 1 + } + } +} +``` + +Tenant A calling `/api/v1/users` and Tenant B calling the same endpoint get independent counters. Tenant A calling `/api/v1/orders` gets yet another counter. This creates a natural per-tenant-per-endpoint isolation without any route duplication. + +### Example 3: Dynamic Concurrent Connection Limits + +The `limit-conn` plugin also supports rules and variables, enabling dynamic concurrency control: + +```json +{ + "uri": "/api/v1/inference", + "plugins": { + "limit-conn": { + "default_conn_delay": 0.1, + "rules": [ + { + "conn": 5, + "burst": 2, + "key": "${consumer_name}" + }, + { + "conn": 100, + "burst": 20, + "key": "global" + } + ], + "rejected_code": 503 + } + }, + "upstream": { + "type": "roundrobin", + "nodes": { + "127.0.0.1:1980": 1 + } + } +} +``` + +This limits each consumer to 5 concurrent connections while capping the total at 100 — preventing any single consumer from monopolizing backend capacity. + +## AI Rate Limiting: Token Budget Management + +For AI gateway use cases, the `ai-rate-limiting` plugin combines multiple rules with variable support for fine-grained token budget control: + +```json +{ + "uri": "/v1/chat/completions", + "plugins": { + "ai-rate-limiting": { + "limit_strategy": "total_tokens", + "rules": [ + { + "count": 10000, + "time_window": 60, + "key": "${consumer_name}_per_minute", Review Comment: **Bug: `ai-rate-limiting` is silently non-functional when used without `ai-proxy`.** Tested on APISIX 3.16.0. 
The route accepts requests and returns `502` (upstream unreachable), but the rate limiting plugin does absolutely nothing: no rate limit headers, no token counting, no rejection.

Root cause: `ai-rate-limiting`'s `access()` begins with:

```lua
local ai_instance_name = ctx.picked_ai_instance_name
if not ai_instance_name then
    return -- silently exits, no limiting applied
end
```

`ctx.picked_ai_instance_name` is only set by the `ai-proxy` (or `ai-proxy-multi`) plugin. Without it, the entire plugin is a no-op.

**The example as written is incomplete.** It must include an `ai-proxy` plugin configuration on the same route for `ai-rate-limiting` to have any effect. Suggest adding a note:

> **Prerequisite**: `ai-rate-limiting` must be used alongside the `ai-proxy` plugin. It relies on `ai-proxy` to populate the AI instance context (`ctx.picked_ai_instance_name`) and token usage (`ctx.ai_token_usage`). Without `ai-proxy`, the plugin is silently inactive.

########## blog/en/blog/2026/04/14/apisix-3.16-dynamic-rate-limiting.md: ########## @@ -0,0 +1,329 @@ +--- +title: "What's New in Apache APISIX 3.16: Dynamic Rate Limiting for Your API Gateway" +authors: + - name: "Ming Wen" + title: "Author" + url: "https://github.com/moonming" + image_url: "https://github.com/moonming.png" +keywords: + - Apache APISIX + - API Gateway + - Rate Limiting + - Dynamic Rate Limiting + - AI Gateway + - Multi-Tenant + - Token Budget +description: Apache APISIX 3.16 introduces dynamic rate limiting with multiple rules and variable support across limit-count, limit-conn, and ai-rate-limiting plugins, enabling context-aware, per-tier, and multi-tenant rate limiting in a single route configuration. +tags: [Products] +--- + +Rate limiting is one of the most critical capabilities in any API gateway. Yet for years, most gateways — including APISIX — have treated it as a static, one-size-fits-all configuration: set a number, set a time window, done. 
+ +In practice, real-world rate limiting is far more nuanced. A SaaS platform needs different quotas for free and paid users. An AI gateway must enforce token budgets that vary by model and consumer. A multi-tenant API must isolate rate limits per tenant without duplicating routes. + +Apache APISIX 3.16 addresses these challenges head-on with two powerful enhancements to the rate limiting plugins: **multiple rules** and **variable support**. Together, they transform rate limiting from static configuration into a dynamic, context-aware policy engine. + +<!--truncate--> + +## What Changed in APISIX 3.16 + +APISIX 3.16 introduces two complementary features across the `limit-count`, `limit-conn`, and `ai-rate-limiting` plugins: + +| Feature | Description | Supported Plugins | +|---------|-------------|-------------------| +| Multiple rules | Define an array of rate limiting rules with independent thresholds and time windows | `limit-count`, `limit-conn`, `ai-rate-limiting` | +| Variable support | Use APISIX variables (`${remote_addr}`, `${http_*}`, `${consumer_name}`, etc.) in `count`, `time_window`, and `key` fields, with optional default values via `${var ?? default}` | `limit-count`, `limit-conn`, `ai-rate-limiting` | + +Both features are fully backward compatible. Existing configurations continue to work without modification. + +## Multiple Rules: Beyond Single-Threshold Rate Limiting + +### The Problem + +Consider a common requirement: limit an API to **10 requests per second** and **500 requests per minute**. Before 3.16, you had to configure two separate plugin instances or chain multiple routes. This was verbose, error-prone, and hard to maintain. + +### The Solution + +The new `rules` array lets you define multiple rate limiting policies in a single plugin configuration. Each rule operates independently with its own counter, time window, and key. 
+ +```json +{ + "uri": "/api/v1/*", + "plugins": { + "limit-count": { + "rules": [ + { + "count": 10, + "time_window": 1, + "key": "${remote_addr}_per_second", + "header_prefix": "per-second" + }, + { + "count": 500, + "time_window": 60, + "key": "${remote_addr}_per_minute", + "header_prefix": "per-minute" + }, + { + "count": 10000, + "time_window": 86400, + "key": "${remote_addr}_per_day", + "header_prefix": "per-day" + } + ], + "rejected_code": 429 + } + }, + "upstream": { + "type": "roundrobin", + "nodes": { + "127.0.0.1:1980": 1 + } + } +} +``` + +With this configuration, APISIX enforces all three limits simultaneously. A client hitting the per-second limit receives a `429` response with headers indicating which limit was exceeded: + +``` +X-Per-Second-RateLimit-Limit: 10 +X-Per-Second-RateLimit-Remaining: 0 +X-Per-Second-RateLimit-Reset: 1 +X-Per-Minute-RateLimit-Limit: 500 +X-Per-Minute-RateLimit-Remaining: 499 +X-Per-Minute-RateLimit-Reset: 60 +``` + +The `header_prefix` field lets clients distinguish which rule triggered the rejection — critical for debugging and client-side retry logic. + +## Variable Support: Context-Aware Rate Limiting + +### The Problem + +Static rate limits assume every consumer is equal. In reality, a free-tier user and an enterprise customer should have very different quotas. Before 3.16, supporting this meant creating separate routes for each tier — leading to route explosion and configuration drift. + +### The Solution + +Variable support lets you pull rate limiting parameters directly from the request context. The `count`, `time_window`, and `key` fields now accept APISIX variables. + +### Example 1: Per-Tier Rate Limiting via HTTP Header + +Suppose your authentication middleware injects an `X-Rate-Quota` header based on the user's subscription tier: + +```json +{ + "uri": "/api/v1/*", + "plugins": { + "limit-count": { + "rules": [ + { + "count": "${http_x_rate_quota ?? 
100}", + "time_window": 60, + "key": "${consumer_name}" + } + ], + "rejected_code": 429 + } + }, + "upstream": { + "type": "roundrobin", + "nodes": { + "127.0.0.1:1980": 1 + } + } +} +``` + +Now the same route handles all tiers: + +| Tier | `X-Rate-Quota` Header | Effective Limit | +|------|----------------------|-----------------| +| Free | 100 | 100 req/min | +| Pro | 1000 | 1,000 req/min | +| Enterprise | 50000 | 50,000 req/min | + +One route. One plugin configuration. All tiers. + +### Example 2: Multi-Tenant Isolation with Variable Combination + +For a multi-tenant SaaS API, you can combine variables to create isolated rate limit buckets per tenant per endpoint: + +```json +{ + "uri": "/api/v1/*", + "plugins": { + "limit-count": { + "rules": [ + { + "count": 1000, + "time_window": 60, + "key": "${http_x_tenant_id} ${uri}" + } + ], + "rejected_code": 429 + } + }, + "upstream": { + "type": "roundrobin", + "nodes": { + "127.0.0.1:1980": 1 + } + } +} +``` + +Tenant A calling `/api/v1/users` and Tenant B calling the same endpoint get independent counters. Tenant A calling `/api/v1/orders` gets yet another counter. This creates a natural per-tenant-per-endpoint isolation without any route duplication. + +### Example 3: Dynamic Concurrent Connection Limits + +The `limit-conn` plugin also supports rules and variables, enabling dynamic concurrency control: + +```json +{ + "uri": "/api/v1/inference", + "plugins": { + "limit-conn": { + "default_conn_delay": 0.1, + "rules": [ + { + "conn": 5, + "burst": 2, + "key": "${consumer_name}" + }, + { + "conn": 100, + "burst": 20, + "key": "global" + } + ], + "rejected_code": 503 + } + }, + "upstream": { + "type": "roundrobin", + "nodes": { + "127.0.0.1:1980": 1 + } + } +} +``` + +This limits each consumer to 5 concurrent connections while capping the total at 100 — preventing any single consumer from monopolizing backend capacity. 
+ +## AI Rate Limiting: Token Budget Management + +For AI gateway use cases, the `ai-rate-limiting` plugin combines multiple rules with variable support for fine-grained token budget control: + +```json +{ + "uri": "/v1/chat/completions", + "plugins": { + "ai-rate-limiting": { + "limit_strategy": "total_tokens", + "rules": [ + { + "count": 10000, + "time_window": 60, + "key": "${consumer_name}_per_minute", + "header_prefix": "consumer" + }, + { + "count": 500000, + "time_window": 86400, + "key": "${consumer_name}_per_day", + "header_prefix": "daily" + }, + { + "count": 1000000, + "time_window": 60, + "key": "global", + "header_prefix": "global" + } + ], + "rejected_code": 429 + } + }, + "upstream": { + "type": "roundrobin", + "nodes": { + "127.0.0.1:1980": 1 + } + } +} +``` + +This configuration enforces three simultaneous constraints: + +1. **Per-consumer burst**: 10,000 tokens per minute per consumer +2. **Per-consumer daily**: 500,000 tokens per day per consumer +3. **Global capacity**: 1,000,000 tokens per minute across all consumers + +As AI API costs scale directly with token usage, this kind of layered budget control is essential for production AI gateways. + +## Combining Multiple Rules with Variables + +The real power emerges when you combine both features. Here is a complete example for an API platform with tiered pricing: + +```json +{ + "uri": "/api/v1/*", + "plugins": { + "limit-count": { + "rules": [ + { + "count": "${http_x_burst_quota ?? 10}", + "time_window": 1, + "key": "${consumer_name}_per_second", + "header_prefix": "burst" + }, + { + "count": "${http_x_sustained_quota ?? 500}", + "time_window": 60, + "key": "${consumer_name}_per_minute", Review Comment: **Bug: `${consumer_name}_per_second` key fails when no consumer is authenticated — same `n_resolved == 0` issue.** Without an auth plugin on the route, `consumer_name` is nil. The rule is skipped, and since rule 3 (`"global"`) is also skipped (see next comment), all rules are dropped → `500`. 
**Fix**: add `"key-auth": {}` (or another auth plugin) to the route plugins. ########## blog/en/blog/2026/04/14/apisix-3.16-dynamic-rate-limiting.md: ########## @@ -0,0 +1,329 @@ +--- +title: "What's New in Apache APISIX 3.16: Dynamic Rate Limiting for Your API Gateway" +authors: + - name: "Ming Wen" + title: "Author" + url: "https://github.com/moonming" + image_url: "https://github.com/moonming.png" +keywords: + - Apache APISIX + - API Gateway + - Rate Limiting + - Dynamic Rate Limiting + - AI Gateway + - Multi-Tenant + - Token Budget +description: Apache APISIX 3.16 introduces dynamic rate limiting with multiple rules and variable support across limit-count, limit-conn, and ai-rate-limiting plugins, enabling context-aware, per-tier, and multi-tenant rate limiting in a single route configuration. +tags: [Products] +--- + +Rate limiting is one of the most critical capabilities in any API gateway. Yet for years, most gateways — including APISIX — have treated it as a static, one-size-fits-all configuration: set a number, set a time window, done. + +In practice, real-world rate limiting is far more nuanced. A SaaS platform needs different quotas for free and paid users. An AI gateway must enforce token budgets that vary by model and consumer. A multi-tenant API must isolate rate limits per tenant without duplicating routes. + +Apache APISIX 3.16 addresses these challenges head-on with two powerful enhancements to the rate limiting plugins: **multiple rules** and **variable support**. Together, they transform rate limiting from static configuration into a dynamic, context-aware policy engine. 
+ +<!--truncate--> + +## What Changed in APISIX 3.16 + +APISIX 3.16 introduces two complementary features across the `limit-count`, `limit-conn`, and `ai-rate-limiting` plugins: + +| Feature | Description | Supported Plugins | +|---------|-------------|-------------------| +| Multiple rules | Define an array of rate limiting rules with independent thresholds and time windows | `limit-count`, `limit-conn`, `ai-rate-limiting` | +| Variable support | Use APISIX variables (`${remote_addr}`, `${http_*}`, `${consumer_name}`, etc.) in `count`, `time_window`, and `key` fields, with optional default values via `${var ?? default}` | `limit-count`, `limit-conn`, `ai-rate-limiting` | + +Both features are fully backward compatible. Existing configurations continue to work without modification. + +## Multiple Rules: Beyond Single-Threshold Rate Limiting + +### The Problem + +Consider a common requirement: limit an API to **10 requests per second** and **500 requests per minute**. Before 3.16, you had to configure two separate plugin instances or chain multiple routes. This was verbose, error-prone, and hard to maintain. + +### The Solution + +The new `rules` array lets you define multiple rate limiting policies in a single plugin configuration. Each rule operates independently with its own counter, time window, and key. + +```json +{ + "uri": "/api/v1/*", + "plugins": { + "limit-count": { + "rules": [ + { + "count": 10, + "time_window": 1, + "key": "${remote_addr}_per_second", + "header_prefix": "per-second" + }, + { + "count": 500, + "time_window": 60, + "key": "${remote_addr}_per_minute", + "header_prefix": "per-minute" + }, + { + "count": 10000, + "time_window": 86400, + "key": "${remote_addr}_per_day", + "header_prefix": "per-day" + } + ], + "rejected_code": 429 + } + }, + "upstream": { + "type": "roundrobin", + "nodes": { + "127.0.0.1:1980": 1 + } + } +} +``` + +With this configuration, APISIX enforces all three limits simultaneously. 
A client hitting the per-second limit receives a `429` response with headers indicating which limit was exceeded: + +``` +X-Per-Second-RateLimit-Limit: 10 +X-Per-Second-RateLimit-Remaining: 0 +X-Per-Second-RateLimit-Reset: 1 +X-Per-Minute-RateLimit-Limit: 500 +X-Per-Minute-RateLimit-Remaining: 499 +X-Per-Minute-RateLimit-Reset: 60 +``` + +The `header_prefix` field lets clients distinguish which rule triggered the rejection — critical for debugging and client-side retry logic. + +## Variable Support: Context-Aware Rate Limiting + +### The Problem + +Static rate limits assume every consumer is equal. In reality, a free-tier user and an enterprise customer should have very different quotas. Before 3.16, supporting this meant creating separate routes for each tier — leading to route explosion and configuration drift. + +### The Solution + +Variable support lets you pull rate limiting parameters directly from the request context. The `count`, `time_window`, and `key` fields now accept APISIX variables. + +### Example 1: Per-Tier Rate Limiting via HTTP Header + +Suppose your authentication middleware injects an `X-Rate-Quota` header based on the user's subscription tier: + +```json +{ + "uri": "/api/v1/*", + "plugins": { + "limit-count": { + "rules": [ + { + "count": "${http_x_rate_quota ?? 100}", + "time_window": 60, + "key": "${consumer_name}" + } + ], + "rejected_code": 429 + } + }, + "upstream": { + "type": "roundrobin", + "nodes": { + "127.0.0.1:1980": 1 + } + } +} +``` + +Now the same route handles all tiers: + +| Tier | `X-Rate-Quota` Header | Effective Limit | +|------|----------------------|-----------------| +| Free | 100 | 100 req/min | +| Pro | 1000 | 1,000 req/min | +| Enterprise | 50000 | 50,000 req/min | + +One route. One plugin configuration. All tiers. 
+ +### Example 2: Multi-Tenant Isolation with Variable Combination + +For a multi-tenant SaaS API, you can combine variables to create isolated rate limit buckets per tenant per endpoint: + +```json +{ + "uri": "/api/v1/*", + "plugins": { + "limit-count": { + "rules": [ + { + "count": 1000, + "time_window": 60, + "key": "${http_x_tenant_id} ${uri}" + } + ], + "rejected_code": 429 + } + }, + "upstream": { + "type": "roundrobin", + "nodes": { + "127.0.0.1:1980": 1 + } + } +} +``` + +Tenant A calling `/api/v1/users` and Tenant B calling the same endpoint get independent counters. Tenant A calling `/api/v1/orders` gets yet another counter. This creates a natural per-tenant-per-endpoint isolation without any route duplication. + +### Example 3: Dynamic Concurrent Connection Limits + +The `limit-conn` plugin also supports rules and variables, enabling dynamic concurrency control: + +```json +{ + "uri": "/api/v1/inference", + "plugins": { + "limit-conn": { + "default_conn_delay": 0.1, + "rules": [ + { + "conn": 5, + "burst": 2, + "key": "${consumer_name}" + }, + { + "conn": 100, + "burst": 20, + "key": "global" + } + ], + "rejected_code": 503 + } + }, + "upstream": { + "type": "roundrobin", + "nodes": { + "127.0.0.1:1980": 1 + } + } +} +``` + +This limits each consumer to 5 concurrent connections while capping the total at 100 — preventing any single consumer from monopolizing backend capacity. 
+ +## AI Rate Limiting: Token Budget Management + +For AI gateway use cases, the `ai-rate-limiting` plugin combines multiple rules with variable support for fine-grained token budget control: + +```json +{ + "uri": "/v1/chat/completions", + "plugins": { + "ai-rate-limiting": { + "limit_strategy": "total_tokens", + "rules": [ + { + "count": 10000, + "time_window": 60, + "key": "${consumer_name}_per_minute", + "header_prefix": "consumer" + }, + { + "count": 500000, + "time_window": 86400, + "key": "${consumer_name}_per_day", + "header_prefix": "daily" + }, + { + "count": 1000000, + "time_window": 60, + "key": "global", + "header_prefix": "global" + } + ], + "rejected_code": 429 + } + },

Review Comment:

**Bug: `"key": "global"` is silently skipped here too (same `n_resolved == 0` issue).**

`ai-rate-limiting` delegates to `limit-count` internally for the `rules` path. The same `n_resolved == 0` guard applies, so the `"global"` constant key rule is never enforced. Combined with the missing `ai-proxy` issue on the same example, the global capacity cap described in point 3 of the explanation below this block never takes effect.

**Workaround**: `"key": "${http_host ?? global}"` (same fix as the `limit-conn` example).

########## blog/en/blog/2026/04/14/apisix-3.16-dynamic-rate-limiting.md: ##########
@@ -0,0 +1,329 @@ +--- +title: "What's New in Apache APISIX 3.16: Dynamic Rate Limiting for Your API Gateway" +authors: + - name: "Ming Wen" + title: "Author" + url: "https://github.com/moonming" + image_url: "https://github.com/moonming.png" +keywords: + - Apache APISIX + - API Gateway + - Rate Limiting + - Dynamic Rate Limiting + - AI Gateway + - Multi-Tenant + - Token Budget +description: Apache APISIX 3.16 introduces dynamic rate limiting with multiple rules and variable support across limit-count, limit-conn, and ai-rate-limiting plugins, enabling context-aware, per-tier, and multi-tenant rate limiting in a single route configuration.
+tags: [Products] +--- + +Rate limiting is one of the most critical capabilities in any API gateway. Yet for years, most gateways — including APISIX — have treated it as a static, one-size-fits-all configuration: set a number, set a time window, done. + +In practice, real-world rate limiting is far more nuanced. A SaaS platform needs different quotas for free and paid users. An AI gateway must enforce token budgets that vary by model and consumer. A multi-tenant API must isolate rate limits per tenant without duplicating routes. + +Apache APISIX 3.16 addresses these challenges head-on with two powerful enhancements to the rate limiting plugins: **multiple rules** and **variable support**. Together, they transform rate limiting from static configuration into a dynamic, context-aware policy engine. + +<!--truncate--> + +## What Changed in APISIX 3.16 + +APISIX 3.16 introduces two complementary features across the `limit-count`, `limit-conn`, and `ai-rate-limiting` plugins: + +| Feature | Description | Supported Plugins | +|---------|-------------|-------------------| +| Multiple rules | Define an array of rate limiting rules with independent thresholds and time windows | `limit-count`, `limit-conn`, `ai-rate-limiting` | +| Variable support | Use APISIX variables (`${remote_addr}`, `${http_*}`, `${consumer_name}`, etc.) in `count`, `time_window`, and `key` fields, with optional default values via `${var ?? default}` | `limit-count`, `limit-conn`, `ai-rate-limiting` | + +Both features are fully backward compatible. Existing configurations continue to work without modification. + +## Multiple Rules: Beyond Single-Threshold Rate Limiting + +### The Problem + +Consider a common requirement: limit an API to **10 requests per second** and **500 requests per minute**. Before 3.16, you had to configure two separate plugin instances or chain multiple routes. This was verbose, error-prone, and hard to maintain. 
+ +### The Solution + +The new `rules` array lets you define multiple rate limiting policies in a single plugin configuration. Each rule operates independently with its own counter, time window, and key. + +```json +{ + "uri": "/api/v1/*", + "plugins": { + "limit-count": { + "rules": [ + { + "count": 10, + "time_window": 1, + "key": "${remote_addr}_per_second", + "header_prefix": "per-second" + }, + { + "count": 500, + "time_window": 60, + "key": "${remote_addr}_per_minute", + "header_prefix": "per-minute" + }, + { + "count": 10000, + "time_window": 86400, + "key": "${remote_addr}_per_day", + "header_prefix": "per-day" + } + ], + "rejected_code": 429 + } + }, + "upstream": { + "type": "roundrobin", + "nodes": { + "127.0.0.1:1980": 1 + } + } +} +``` + +With this configuration, APISIX enforces all three limits simultaneously. A client hitting the per-second limit receives a `429` response with headers indicating which limit was exceeded: + +``` +X-Per-Second-RateLimit-Limit: 10 +X-Per-Second-RateLimit-Remaining: 0 +X-Per-Second-RateLimit-Reset: 1 +X-Per-Minute-RateLimit-Limit: 500 +X-Per-Minute-RateLimit-Remaining: 499 +X-Per-Minute-RateLimit-Reset: 60 +``` + +The `header_prefix` field lets clients distinguish which rule triggered the rejection — critical for debugging and client-side retry logic. + +## Variable Support: Context-Aware Rate Limiting + +### The Problem + +Static rate limits assume every consumer is equal. In reality, a free-tier user and an enterprise customer should have very different quotas. Before 3.16, supporting this meant creating separate routes for each tier — leading to route explosion and configuration drift. + +### The Solution + +Variable support lets you pull rate limiting parameters directly from the request context. The `count`, `time_window`, and `key` fields now accept APISIX variables. 
+ +### Example 1: Per-Tier Rate Limiting via HTTP Header + +Suppose your authentication middleware injects an `X-Rate-Quota` header based on the user's subscription tier: + +```json +{ + "uri": "/api/v1/*", + "plugins": { + "limit-count": { + "rules": [ + { + "count": "${http_x_rate_quota ?? 100}", + "time_window": 60, + "key": "${consumer_name}" + } + ], + "rejected_code": 429 + } + }, + "upstream": { + "type": "roundrobin", + "nodes": { + "127.0.0.1:1980": 1 + } + } +} +``` + +Now the same route handles all tiers: + +| Tier | `X-Rate-Quota` Header | Effective Limit | +|------|----------------------|-----------------| +| Free | 100 | 100 req/min | +| Pro | 1000 | 1,000 req/min | +| Enterprise | 50000 | 50,000 req/min | + +One route. One plugin configuration. All tiers. + +### Example 2: Multi-Tenant Isolation with Variable Combination + +For a multi-tenant SaaS API, you can combine variables to create isolated rate limit buckets per tenant per endpoint: + +```json +{ + "uri": "/api/v1/*", + "plugins": { + "limit-count": { + "rules": [ + { + "count": 1000, + "time_window": 60, + "key": "${http_x_tenant_id} ${uri}" + } + ], + "rejected_code": 429 + } + }, + "upstream": { + "type": "roundrobin", + "nodes": { + "127.0.0.1:1980": 1 + } + } +} +``` + +Tenant A calling `/api/v1/users` and Tenant B calling the same endpoint get independent counters. Tenant A calling `/api/v1/orders` gets yet another counter. This creates a natural per-tenant-per-endpoint isolation without any route duplication. 
+ +### Example 3: Dynamic Concurrent Connection Limits + +The `limit-conn` plugin also supports rules and variables, enabling dynamic concurrency control: + +```json +{ + "uri": "/api/v1/inference", + "plugins": { + "limit-conn": { + "default_conn_delay": 0.1, + "rules": [ + { + "conn": 5, + "burst": 2, + "key": "${consumer_name}" + }, + { + "conn": 100, + "burst": 20, + "key": "global" + } + ], + "rejected_code": 503 + } + }, + "upstream": {

Review Comment:

**Bug: a plain string key like `"global"` also triggers the `n_resolved == 0` skip and is never applied.**

This is the second part of the `limit-conn` bug. Since `"global"` contains no `${}` variable expression, `core.utils.resolve_var("global", ctx.var)` returns `n_resolved = 0`. The guard `if n_resolved == 0 then goto CONTINUE` then skips this rule too. In this example both rules are skipped → empty rules list → `500`.

**This is also a code-level bug in APISIX**: constant string keys should be valid and usable directly. The `n_resolved == 0` check should only skip a rule when the key *contains* a variable expression that failed to resolve, not when the key is a plain constant.

**Workaround until the APISIX bug is fixed**: replace the bare constant with a variable expression that always resolves:

```json
"key": "${http_host ?? global}"
```

`http_host` is present on every request, so `n_resolved = 1`. The resulting key value is the Host header (effectively a per-service shared counter), or `"global"` as fallback if the header is absent.
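To make the skip behaviour described in this comment easier to reason about, here is a minimal Python model of the guard. It is a sketch, not APISIX's Lua source: `resolve_key` and `active_rules` are hypothetical names, the regex only approximates the `${var ?? default}` syntax, and counting a `??` fallback as resolved is an assumption made for illustration.

```python
import re

# Approximates ${name} and ${name ?? default}; not the real APISIX pattern.
_VAR = re.compile(r"\$\{\s*(\w+)\s*(?:\?\?\s*([^}]*?)\s*)?\}")


def resolve_key(template, variables):
    """Return (resolved_string, n_resolved) for a key template."""
    n_resolved = 0

    def substitute(match):
        nonlocal n_resolved
        name, default = match.group(1), match.group(2)
        value = variables.get(name)
        if value is None:
            value = default  # still None when no default was written
        if value is None:
            return ""  # unresolved expression contributes nothing
        n_resolved += 1  # ASSUMPTION: a fallback value also counts
        return str(value)

    return _VAR.sub(substitute, template), n_resolved


def active_rules(rules, variables):
    """Keep only rules whose key resolved at least one expression."""
    kept = []
    for rule in rules:
        key, n_resolved = resolve_key(rule["key"], variables)
        if n_resolved == 0:
            continue  # models `if n_resolved == 0 then goto CONTINUE`
        kept.append({**rule, "key": key})
    return kept


rules = [
    {"conn": 5, "key": "${consumer_name}"},
    {"conn": 100, "key": "global"},
]
print(active_rules(rules, {}))                         # no consumer: nothing survives
print(active_rules(rules, {"consumer_name": "alice"}))  # constant key still dropped
```

Under this model, an unauthenticated request leaves an empty rule list, which matches the `500` case above, and the constant `"global"` rule is dropped even when a consumer is present. Swapping the constant for `"${http_host ?? global}"` makes the rule survive because the template now contains an expression that resolves.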
########## blog/en/blog/2026/04/14/apisix-3.16-dynamic-rate-limiting.md: ########## @@ -0,0 +1,329 @@ +--- +title: "What's New in Apache APISIX 3.16: Dynamic Rate Limiting for Your API Gateway" +authors: + - name: "Ming Wen" + title: "Author" + url: "https://github.com/moonming" + image_url: "https://github.com/moonming.png" +keywords: + - Apache APISIX + - API Gateway + - Rate Limiting + - Dynamic Rate Limiting + - AI Gateway + - Multi-Tenant + - Token Budget +description: Apache APISIX 3.16 introduces dynamic rate limiting with multiple rules and variable support across limit-count, limit-conn, and ai-rate-limiting plugins, enabling context-aware, per-tier, and multi-tenant rate limiting in a single route configuration. +tags: [Products] +--- + +Rate limiting is one of the most critical capabilities in any API gateway. Yet for years, most gateways — including APISIX — have treated it as a static, one-size-fits-all configuration: set a number, set a time window, done. + +In practice, real-world rate limiting is far more nuanced. A SaaS platform needs different quotas for free and paid users. An AI gateway must enforce token budgets that vary by model and consumer. A multi-tenant API must isolate rate limits per tenant without duplicating routes. + +Apache APISIX 3.16 addresses these challenges head-on with two powerful enhancements to the rate limiting plugins: **multiple rules** and **variable support**. Together, they transform rate limiting from static configuration into a dynamic, context-aware policy engine. 
+ +<!--truncate--> + +## What Changed in APISIX 3.16 + +APISIX 3.16 introduces two complementary features across the `limit-count`, `limit-conn`, and `ai-rate-limiting` plugins: + +| Feature | Description | Supported Plugins | +|---------|-------------|-------------------| +| Multiple rules | Define an array of rate limiting rules with independent thresholds and time windows | `limit-count`, `limit-conn`, `ai-rate-limiting` | +| Variable support | Use APISIX variables (`${remote_addr}`, `${http_*}`, `${consumer_name}`, etc.) in `count`, `time_window`, and `key` fields, with optional default values via `${var ?? default}` | `limit-count`, `limit-conn`, `ai-rate-limiting` | + +Both features are fully backward compatible. Existing configurations continue to work without modification. + +## Multiple Rules: Beyond Single-Threshold Rate Limiting + +### The Problem + +Consider a common requirement: limit an API to **10 requests per second** and **500 requests per minute**. Before 3.16, you had to configure two separate plugin instances or chain multiple routes. This was verbose, error-prone, and hard to maintain. + +### The Solution + +The new `rules` array lets you define multiple rate limiting policies in a single plugin configuration. Each rule operates independently with its own counter, time window, and key. + +```json +{ + "uri": "/api/v1/*", + "plugins": { + "limit-count": { + "rules": [ + { + "count": 10, + "time_window": 1, + "key": "${remote_addr}_per_second", + "header_prefix": "per-second" + }, + { + "count": 500, + "time_window": 60, + "key": "${remote_addr}_per_minute", + "header_prefix": "per-minute" + }, + { + "count": 10000, + "time_window": 86400, + "key": "${remote_addr}_per_day", + "header_prefix": "per-day" + } + ], + "rejected_code": 429 + } + }, + "upstream": { + "type": "roundrobin", + "nodes": { + "127.0.0.1:1980": 1 + } + } +} +``` + +With this configuration, APISIX enforces all three limits simultaneously. 
A client hitting the per-second limit receives a `429` response with headers indicating which limit was exceeded: + +``` +X-Per-Second-RateLimit-Limit: 10 +X-Per-Second-RateLimit-Remaining: 0 +X-Per-Second-RateLimit-Reset: 1 +X-Per-Minute-RateLimit-Limit: 500 +X-Per-Minute-RateLimit-Remaining: 499 +X-Per-Minute-RateLimit-Reset: 60 +``` + +The `header_prefix` field lets clients distinguish which rule triggered the rejection — critical for debugging and client-side retry logic. + +## Variable Support: Context-Aware Rate Limiting + +### The Problem + +Static rate limits assume every consumer is equal. In reality, a free-tier user and an enterprise customer should have very different quotas. Before 3.16, supporting this meant creating separate routes for each tier — leading to route explosion and configuration drift. + +### The Solution + +Variable support lets you pull rate limiting parameters directly from the request context. The `count`, `time_window`, and `key` fields now accept APISIX variables. + +### Example 1: Per-Tier Rate Limiting via HTTP Header + +Suppose your authentication middleware injects an `X-Rate-Quota` header based on the user's subscription tier: + +```json +{ + "uri": "/api/v1/*", + "plugins": { + "limit-count": { + "rules": [ + { + "count": "${http_x_rate_quota ?? 100}", + "time_window": 60, + "key": "${consumer_name}" + } + ], + "rejected_code": 429 + } + }, + "upstream": { + "type": "roundrobin", + "nodes": { + "127.0.0.1:1980": 1 + } + } +} +``` + +Now the same route handles all tiers: + +| Tier | `X-Rate-Quota` Header | Effective Limit | +|------|----------------------|-----------------| +| Free | 100 | 100 req/min | +| Pro | 1000 | 1,000 req/min | +| Enterprise | 50000 | 50,000 req/min | + +One route. One plugin configuration. All tiers. 
+ +### Example 2: Multi-Tenant Isolation with Variable Combination + +For a multi-tenant SaaS API, you can combine variables to create isolated rate limit buckets per tenant per endpoint: + +```json +{ + "uri": "/api/v1/*", + "plugins": { + "limit-count": { + "rules": [ + { + "count": 1000, + "time_window": 60, + "key": "${http_x_tenant_id} ${uri}" + } + ], + "rejected_code": 429 + } + }, + "upstream": { + "type": "roundrobin", + "nodes": { + "127.0.0.1:1980": 1 + } + } +} +``` + +Tenant A calling `/api/v1/users` and Tenant B calling the same endpoint get independent counters. Tenant A calling `/api/v1/orders` gets yet another counter. This creates a natural per-tenant-per-endpoint isolation without any route duplication. + +### Example 3: Dynamic Concurrent Connection Limits + +The `limit-conn` plugin also supports rules and variables, enabling dynamic concurrency control: + +```json +{ + "uri": "/api/v1/inference", + "plugins": { + "limit-conn": { + "default_conn_delay": 0.1, + "rules": [ + { + "conn": 5, + "burst": 2, + "key": "${consumer_name}" + }, + { + "conn": 100, + "burst": 20, + "key": "global" + } + ], + "rejected_code": 503 + } + }, + "upstream": { + "type": "roundrobin", + "nodes": { + "127.0.0.1:1980": 1 + } + } +} +``` + +This limits each consumer to 5 concurrent connections while capping the total at 100 — preventing any single consumer from monopolizing backend capacity. 
+ +## AI Rate Limiting: Token Budget Management + +For AI gateway use cases, the `ai-rate-limiting` plugin combines multiple rules with variable support for fine-grained token budget control: + +```json +{ + "uri": "/v1/chat/completions", + "plugins": { + "ai-rate-limiting": { + "limit_strategy": "total_tokens", + "rules": [ + { + "count": 10000, + "time_window": 60, + "key": "${consumer_name}_per_minute", + "header_prefix": "consumer" + }, + { + "count": 500000, + "time_window": 86400, + "key": "${consumer_name}_per_day", + "header_prefix": "daily" + }, + { + "count": 1000000, + "time_window": 60, + "key": "global", + "header_prefix": "global" + } + ], + "rejected_code": 429 + } + }, + "upstream": { + "type": "roundrobin", + "nodes": { + "127.0.0.1:1980": 1 + } + } +} +``` + +This configuration enforces three simultaneous constraints: + +1. **Per-consumer burst**: 10,000 tokens per minute per consumer +2. **Per-consumer daily**: 500,000 tokens per day per consumer +3. **Global capacity**: 1,000,000 tokens per minute across all consumers + +As AI API costs scale directly with token usage, this kind of layered budget control is essential for production AI gateways. + +## Combining Multiple Rules with Variables + +The real power emerges when you combine both features. Here is a complete example for an API platform with tiered pricing: + +```json +{ + "uri": "/api/v1/*", + "plugins": { + "limit-count": { + "rules": [ + { + "count": "${http_x_burst_quota ?? 10}", + "time_window": 1, + "key": "${consumer_name}_per_second", + "header_prefix": "burst" + }, + { + "count": "${http_x_sustained_quota ?? 
500}", + "time_window": 60, + "key": "${consumer_name}_per_minute", + "header_prefix": "sustained" + }, + { + "count": 100000, + "time_window": 60, + "key": "global", + "header_prefix": "global" + } + ], + "rejected_code": 429 + } + },

Review Comment:

**Bug: `"key": "global"` is silently skipped (same `n_resolved == 0` issue as the `limit-conn` example).**

Even if rules 1 and 2 were working (with a consumer), this global safety cap rule would never be applied because the constant string `"global"` produces `n_resolved = 0`. This means the "static global safety cap" described in the paragraph below this block is never actually enforced, which defeats a key part of the example's purpose.

**Workaround**: `"key": "${http_host ?? global}"` until the upstream APISIX bug is fixed.
