This is an automated email from the ASF dual-hosted git repository.

baoyuan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/apisix-website.git


The following commit(s) were added to refs/heads/master by this push:
     new 2769ec3e98e blog: add article on APISIX 3.16 dynamic rate limiting (#2028)
2769ec3e98e is described below

commit 2769ec3e98eed49ea3eee1ad49a9a0baa0b315af
Author: Ming Wen <[email protected]>
AuthorDate: Thu Apr 16 11:44:32 2026 +0800

    blog: add article on APISIX 3.16 dynamic rate limiting (#2028)
---
 .../04/14/apisix-3.16-dynamic-rate-limiting.md     | 332 +++++++++++++++++++++
 1 file changed, 332 insertions(+)

diff --git a/blog/en/blog/2026/04/14/apisix-3.16-dynamic-rate-limiting.md b/blog/en/blog/2026/04/14/apisix-3.16-dynamic-rate-limiting.md
new file mode 100644
index 00000000000..447c9985da2
--- /dev/null
+++ b/blog/en/blog/2026/04/14/apisix-3.16-dynamic-rate-limiting.md
@@ -0,0 +1,332 @@
+---
+title: "What's New in Apache APISIX 3.16: Dynamic Rate Limiting for Your API Gateway"
+authors:
+  - name: "Ming Wen"
+    title: "Author"
+    url: "https://github.com/moonming"
+    image_url: "https://github.com/moonming.png"
+keywords:
+  - Apache APISIX
+  - API Gateway
+  - Rate Limiting
+  - Dynamic Rate Limiting
+  - AI Gateway
+  - Multi-Tenant
+  - Token Budget
+description: Apache APISIX 3.16 introduces dynamic rate limiting with multiple rules and variable support across limit-count, limit-conn, and ai-rate-limiting plugins, enabling context-aware, per-tier, and multi-tenant rate limiting in a single route configuration.
+tags: [Community]
+---
+
+Rate limiting is one of the most critical capabilities in any API gateway. Yet for years, most gateways, APISIX included, have treated it as a static, one-size-fits-all configuration: set a number, set a time window, done.
+
+In practice, real-world rate limiting is far more nuanced. A SaaS platform needs different quotas for free and paid users. An AI gateway must enforce token budgets that vary by model and consumer. A multi-tenant API must isolate rate limits per tenant without duplicating routes.
+
+Apache APISIX 3.16 addresses these challenges with two enhancements to the rate limiting plugins: **multiple rules** and **variable support**. Together, they turn rate limiting from static configuration into a dynamic, context-aware policy engine.
+
+<!--truncate-->
+
+## What Changed in APISIX 3.16
+
+APISIX 3.16 introduces two complementary features across the `limit-count`, `limit-conn`, and `ai-rate-limiting` plugins:
+
+| Feature | Description | Supported Plugins |
+|---------|-------------|-------------------|
+| Multiple rules | Define an array of rate limiting rules with independent thresholds and time windows | `limit-count`, `limit-conn`, `ai-rate-limiting` |
+| Variable support | Use APISIX variables (`$remote_addr`, `$http_*`, `$consumer_name`, etc.) in `key` and plugin-specific rate or threshold fields | `limit-count`, `limit-conn`, `ai-rate-limiting` |
+
+Both features are fully backward compatible. Existing configurations continue to work without modification.
+
+## Multiple Rules: Beyond Single-Threshold Rate Limiting
+
+### The Problem
+
+Consider a common requirement: limit an API to **10 requests per second** and **500 requests per minute**. Before 3.16, you had to configure two separate plugin instances or chain multiple routes. This was verbose, error-prone, and hard to maintain.
+
+### The Solution
+
+The new `rules` array lets you define multiple rate limiting policies in a single plugin configuration. Each rule operates independently with its own counter, time window, and key.
+
+```json
+{
+  "uri": "/api/v1/*",
+  "plugins": {
+    "limit-count": {
+      "rules": [
+        {
+          "count": 10,
+          "time_window": 1,
+          "key": "${remote_addr}_per_second",
+          "header_prefix": "per-second"
+        },
+        {
+          "count": 500,
+          "time_window": 60,
+          "key": "${remote_addr}_per_minute",
+          "header_prefix": "per-minute"
+        },
+        {
+          "count": 10000,
+          "time_window": 86400,
+          "key": "${remote_addr}_per_day",
+          "header_prefix": "per-day"
+        }
+      ],
+      "rejected_code": 429
+    }
+  },
+  "upstream": {
+    "type": "roundrobin",
+    "nodes": {
+      "127.0.0.1:1980": 1
+    }
+  }
+}
+```
+
+With this configuration, APISIX enforces all three limits simultaneously. A client hitting the per-second limit receives a `429` response with headers indicating which limit was exceeded:
+
+```
+X-Per-Second-RateLimit-Limit: 10
+X-Per-Second-RateLimit-Remaining: 0
+X-Per-Second-RateLimit-Reset: 1
+X-Per-Minute-RateLimit-Limit: 500
+X-Per-Minute-RateLimit-Remaining: 499
+X-Per-Minute-RateLimit-Reset: 60
+```
+
+The `header_prefix` field lets clients distinguish which rule triggered the rejection, which is critical for debugging and client-side retry logic.
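
How several rules interact on a single request can be sketched in a few lines of Python. This is a toy model, not APISIX's implementation: the `Rule` class, the in-memory `hits` store, and the decision not to count rejected requests are all assumptions made for illustration. Each rule is modeled as an independent fixed-window counter, and `check` reports the prefixes of the rules that are exhausted.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    prefix: str        # mirrors "header_prefix" in the config above
    count: int         # maximum requests per window
    time_window: int   # window length in seconds
    # (key, window index) -> requests seen; hypothetical in-memory store
    hits: dict = field(default_factory=dict)

    def remaining(self, key: str, now: float) -> int:
        window = int(now // self.time_window)
        return self.count - self.hits.get((key, window), 0)

    def record(self, key: str, now: float) -> None:
        window = int(now // self.time_window)
        self.hits[(key, window)] = self.hits.get((key, window), 0) + 1


def check(rules, key, now):
    """Return (allowed, prefixes of exhausted rules)."""
    exhausted = [r.prefix for r in rules if r.remaining(key, now) <= 0]
    if exhausted:
        # Assumption: a rejected request does not consume quota anywhere.
        return False, exhausted
    for r in rules:
        r.record(key, now)
    return True, []


rules = [Rule("per-second", 10, 1), Rule("per-minute", 500, 60)]
results = [check(rules, "203.0.113.7", now=0.5) for _ in range(11)]
print(results[-1])  # (False, ['per-second'])
```

In this model the eleventh request within the same second is rejected by the per-second rule alone while the per-minute rule still has room, which is the situation the prefixed headers let a client detect.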
+
+## Variable Support: Context-Aware Rate Limiting
+
+### The Problem
+
+Static rate limits assume every consumer is equal. In reality, a free-tier user and an enterprise customer should have very different quotas. Before 3.16, supporting this meant creating separate routes for each tier, leading to route explosion and configuration drift.
+
+### The Solution
+
+Variable support lets you pull rate limiting parameters directly from the request context. The `count`, `time_window`, and `key` fields now accept APISIX variables.
+
+### Example 1: Per-Tier Rate Limiting via HTTP Header
+
+Suppose your authentication middleware injects an `X-Rate-Quota` header based on the user's subscription tier. Pair `limit-count` with an auth plugin such as `key-auth` so that `${consumer_name}` is available as the rate limit key:
+
+```json
+{
+  "uri": "/api/v1/*",
+  "plugins": {
+    "key-auth": {},
+    "limit-count": {
+      "rules": [
+        {
+          "count": "${http_x_rate_quota ?? 100}",
+          "time_window": 60,
+          "key": "${consumer_name}"
+        }
+      ],
+      "rejected_code": 429
+    }
+  },
+  "upstream": {
+    "type": "roundrobin",
+    "nodes": {
+      "127.0.0.1:1980": 1
+    }
+  }
+}
+```
+
+Now the same route handles all tiers:
+
+| Tier | `X-Rate-Quota` Header | Effective Limit |
+|------|----------------------|-----------------|
+| Free | 100 | 100 req/min |
+| Pro | 1000 | 1,000 req/min |
+| Enterprise | 50000 | 50,000 req/min |
+
+One route. One plugin configuration. All tiers.
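
The substitution behind a value such as `${http_x_rate_quota ?? 100}` can be approximated as below. The `resolve` helper and the `ctx` dictionary are hypothetical, and treating a missing or empty variable as the trigger for the default is an assumption rather than documented APISIX behavior.

```python
import re

# Matches ${name} or ${name ?? default}, the syntax used in the configs above.
_VAR = re.compile(r"\$\{\s*(\w+)\s*(?:\?\?\s*([^}]*?)\s*)?\}")


def resolve(template: str, ctx: dict) -> str:
    """Substitute variables from ctx; fall back to the default when unset."""
    def sub(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        value = ctx.get(name)
        if value is None or value == "":
            # Assumption: no value and no default resolves to an empty string.
            return default if default is not None else ""
        return str(value)
    return _VAR.sub(sub, template)


free = {"consumer_name": "alice"}                             # no quota header
pro = {"consumer_name": "bob", "http_x_rate_quota": "1000"}   # header present

print(int(resolve("${http_x_rate_quota ?? 100}", free)))  # 100
print(int(resolve("${http_x_rate_quota ?? 100}", pro)))   # 1000
print(resolve("${consumer_name}_per_second", pro))        # bob_per_second
```

The same template therefore yields a different `count` per request, which is what lets one route serve every tier in the table above.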
+
+### Example 2: Multi-Tenant Isolation with Variable Combination
+
+For a multi-tenant SaaS API, you can combine variables to create isolated rate limit buckets per tenant per endpoint:
+
+```json
+{
+  "uri": "/api/v1/*",
+  "plugins": {
+    "limit-count": {
+      "rules": [
+        {
+          "count": 1000,
+          "time_window": 60,
+          "key": "${http_x_tenant_id} ${uri}"
+        }
+      ],
+      "rejected_code": 429
+    }
+  },
+  "upstream": {
+    "type": "roundrobin",
+    "nodes": {
+      "127.0.0.1:1980": 1
+    }
+  }
+}
+```
+
+Tenant A calling `/api/v1/users` and Tenant B calling the same endpoint get independent counters, and Tenant A calling `/api/v1/orders` gets yet another. The result is per-tenant, per-endpoint isolation without any route duplication.
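
A minimal sketch of why a combined key isolates tenants and endpoints: every distinct key string gets its own counter. The `bucket` helper and the in-memory `Counter` are illustrative assumptions, not how APISIX stores its counters.

```python
from collections import Counter

counters = Counter()  # hypothetical in-memory counter store


def bucket(tenant_id: str, uri: str) -> str:
    # Mirrors the "${http_x_tenant_id} ${uri}" key template above.
    return f"{tenant_id} {uri}"


for tenant, uri in [
    ("tenant-a", "/api/v1/users"),
    ("tenant-b", "/api/v1/users"),
    ("tenant-a", "/api/v1/orders"),
    ("tenant-a", "/api/v1/users"),
]:
    counters[bucket(tenant, uri)] += 1

for key, hits in sorted(counters.items()):
    print(key, hits)
# tenant-a /api/v1/orders 1
# tenant-a /api/v1/users 2
# tenant-b /api/v1/users 1
```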
+
+### Example 3: Dynamic Concurrent Connection Limits
+
+The `limit-conn` plugin also supports rules and variables, enabling dynamic concurrency control. The example below uses `key-auth` so each consumer gets its own connection quota, while a second rule keyed on `${http_host ?? global}` applies a shared cap across all consumers that send the same `Host` header:
+
+```json
+{
+  "uri": "/api/v1/inference",
+  "plugins": {
+    "key-auth": {},
+    "limit-conn": {
+      "default_conn_delay": 0.1,
+      "rules": [
+        {
+          "conn": 5,
+          "burst": 2,
+          "key": "${consumer_name}"
+        },
+        {
+          "conn": 100,
+          "burst": 20,
+          "key": "${http_host ?? global}"
+        }
+      ],
+      "rejected_code": 503
+    }
+  },
+  "upstream": {
+    "type": "roundrobin",
+    "nodes": {
+      "127.0.0.1:1980": 1
+    }
+  }
+}
+```
+
+This limits each consumer to 5 concurrent connections while capping the total at 100, which prevents any single consumer from monopolizing backend capacity.
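
The roles of `conn` and `burst` can be sketched with a toy concurrency limiter. This is an assumption-laden model rather than the plugin's code: it assumes requests up to `conn` proceed immediately, requests between `conn` and `conn + burst` are delayed, and anything beyond that is rejected. The `ConnLimiter` class and its method names are hypothetical.

```python
class ConnLimiter:
    """Toy concurrency limiter keyed by an arbitrary string."""

    def __init__(self, conn: int, burst: int):
        self.conn = conn
        self.burst = burst
        self.active = {}  # key -> requests currently in flight

    def enter(self, key: str) -> str:
        current = self.active.get(key, 0) + 1
        if current > self.conn + self.burst:
            return "reject"  # rejected requests are not counted as in flight
        self.active[key] = current
        return "allow" if current <= self.conn else "delay"

    def leave(self, key: str) -> None:
        # Call once for each request that was allowed or delayed.
        self.active[key] -= 1


per_consumer = ConnLimiter(conn=5, burst=2)
outcomes = [per_consumer.enter("alice") for _ in range(8)]
print(outcomes)
# ['allow', 'allow', 'allow', 'allow', 'allow', 'delay', 'delay', 'reject']

per_consumer.leave("alice")         # one request finishes
print(per_consumer.enter("alice"))  # delay
```

A second `ConnLimiter(conn=100, burst=20)` keyed on the host value would model the shared cap; a request would need to pass both limiters.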
+
+## AI Rate Limiting: Token Budget Management
+
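+For AI gateway use cases, the `ai-rate-limiting` plugin works alongside `ai-proxy` to enforce token budgets at the gateway level. It combines multiple rules with variable support for fine-grained control. The example below shows only the rate limiting plugin; `${consumer_name}` is populated only when an authentication plugin such as `key-auth` identifies the consumer: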
+
+```json
+{
+  "uri": "/v1/chat/completions",
+  "plugins": {
+    "ai-rate-limiting": {
+      "limit_strategy": "total_tokens",
+      "rules": [
+        {
+          "count": 10000,
+          "time_window": 60,
+          "key": "${consumer_name}_per_minute",
+          "header_prefix": "consumer"
+        },
+        {
+          "count": 500000,
+          "time_window": 86400,
+          "key": "${consumer_name}_per_day",
+          "header_prefix": "daily"
+        },
+        {
+          "count": 1000000,
+          "time_window": 60,
+          "key": "${http_host ?? global}",
+          "header_prefix": "global"
+        }
+      ],
+      "rejected_code": 429
+    }
+  },
+  "upstream": {
+    "type": "roundrobin",
+    "nodes": {
+      "127.0.0.1:1980": 1
+    }
+  }
+}
+```
+
+This configuration enforces three simultaneous constraints:
+
+1. **Per-consumer burst**: 10,000 tokens per minute per consumer
+2. **Per-consumer daily**: 500,000 tokens per day per consumer
+3. **Global capacity**: 1,000,000 tokens per minute across all consumers
+
+As AI API costs scale directly with token usage, this kind of layered budget control is essential for production AI gateways.
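
Token budgets differ from request counts in one respect: the cost of a request is known only after the model responds. The sketch below models that under stated assumptions: a request is admitted when every budget still has room, and the actual token usage is charged afterwards. The `TokenBudget` class, the `handle` function, and the `api.example.com` host key are hypothetical and do not reflect the plugin's internals.

```python
class TokenBudget:
    """Fixed-window token budget; usage is charged after the response."""

    def __init__(self, name: str, limit: int, time_window: int):
        self.name = name
        self.limit = limit
        self.time_window = time_window
        self.used = {}  # (key, window index) -> tokens consumed

    def _slot(self, key: str, now: float):
        return (key, int(now // self.time_window))

    def has_room(self, key: str, now: float) -> bool:
        return self.used.get(self._slot(key, now), 0) < self.limit

    def charge(self, key: str, tokens: int, now: float) -> None:
        slot = self._slot(key, now)
        self.used[slot] = self.used.get(slot, 0) + tokens


def handle(budgets, keys, tokens_used, now):
    """Admit only if every budget has room, then charge actual usage."""
    blocked = [b.name for b, k in zip(budgets, keys) if not b.has_room(k, now)]
    if blocked:
        return blocked
    for b, k in zip(budgets, keys):
        b.charge(k, tokens_used, now)
    return []


budgets = [
    TokenBudget("consumer", 10_000, 60),
    TokenBudget("daily", 500_000, 86_400),
    TokenBudget("global", 1_000_000, 60),
]
keys = ["alice_per_minute", "alice_per_day", "api.example.com"]

print(handle(budgets, keys, tokens_used=9_500, now=0))   # []
print(handle(budgets, keys, tokens_used=2_000, now=10))  # []
print(handle(budgets, keys, tokens_used=100, now=20))    # ['consumer']
print(handle(budgets, keys, tokens_used=100, now=61))    # []
```

Because usage is charged after the fact, the second call pushes the per-minute budget past its limit before the third call is blocked. In this model a budget can be overshot by at most one request's worth of tokens.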
+
+## Combining Multiple Rules with Variables
+
+The real power emerges when you combine both features. Here is a complete example for an API platform with tiered pricing. It uses `key-auth` to identify consumers, reads per-consumer quotas from request headers, and maintains a shared global safety cap via `${http_host ?? global}`:
+
+```json
+{
+  "uri": "/api/v1/*",
+  "plugins": {
+    "key-auth": {},
+    "limit-count": {
+      "rules": [
+        {
+          "count": "${http_x_burst_quota ?? 10}",
+          "time_window": 1,
+          "key": "${consumer_name}_per_second",
+          "header_prefix": "burst"
+        },
+        {
+          "count": "${http_x_sustained_quota ?? 500}",
+          "time_window": 60,
+          "key": "${consumer_name}_per_minute",
+          "header_prefix": "sustained"
+        },
+        {
+          "count": 100000,
+          "time_window": 60,
+          "key": "${http_host ?? global}",
+          "header_prefix": "global"
+        }
+      ],
+      "rejected_code": 429
+    }
+  },
+  "upstream": {
+    "type": "roundrobin",
+    "nodes": {
+      "127.0.0.1:1980": 1
+    }
+  }
+}
+```
+
+The authentication layer sets per-consumer burst and sustained quotas via headers. APISIX enforces both per-consumer limits dynamically while maintaining a static global safety cap. No route duplication. No configuration drift between tiers.
+
+## What's Next
+
+The `limit-req` plugin (leaky bucket algorithm) does not yet support the `rules` array ([#13179](https://github.com/apache/apisix/issues/13179)). We welcome community contributions to bring it to feature parity.
+
+We are also exploring deeper integration with external policy engines, enabling rate limiting quotas to be fetched from external key-value stores or policy services at runtime.
+
+## Getting Started
+
+Upgrade to APISIX 3.16:
+
+```bash
+# Docker
+docker pull apache/apisix:3.16.0
+
+# Helm
+helm repo update
+helm upgrade apisix apisix/apisix --set image.tag=3.16.0
+```
+
+Check the full documentation:
+
+- [limit-count plugin](https://apisix.apache.org/docs/apisix/plugins/limit-count/)
+- [limit-conn plugin](https://apisix.apache.org/docs/apisix/plugins/limit-conn/)
+- [ai-rate-limiting plugin](https://apisix.apache.org/docs/apisix/plugins/ai-rate-limiting/)
