jman0815 opened a new issue, #12896:
URL: https://github.com/apache/apisix/issues/12896

   ### Current Behavior
   
   When I add a consumer (e.g. georg) with an ai-rate-limiting configuration as shown in the example below, and then create a second consumer (e.g. martin) with the same plugin configuration but a different key-auth API key, both consumers end up sharing the same rate limit for an instance.
   Once georg reaches the configured token limit, requests from martin are also rejected with 429 "Configured rate limit reached", even though martin has his own consumer entry and API key.
   The rate limiting appears to be applied globally per model instance, rather than per consumer as described in the documentation.
   
   ```json
   {
     "username": "georg",
     "plugins": {
       "key-auth": {
         "key": "Bearer "
       },
       "ai-rate-limiting": {
         "instances": [
           {
             "name": "gpt-oss-120b",
             "limit_strategy": "prompt_tokens",
             "time_window": 60,
             "limit": 400
           },
           {
             "name": "bge-m3",
             "limit_strategy": "prompt_tokens",
             "time_window": 60,
             "limit": 10000
           },
           {
             "name": "gpt-oss-120b",
             "limit_strategy": "completion_tokens",
             "time_window": 60,
             "limit": 400
           }
         ],
         "rejected_code": 429,
         "rejected_msg": "Configured rate limit reached",
         "show_limit_quota_header": true
       }
     }
   }
   ```
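
   For reference, the second consumer was created in the same way, differing only in the key-auth key. A minimal sketch of creating it via the Admin API (the admin address, admin key, and martin's API key are placeholders; adjust if consumers are provisioned differently, e.g. through the ingress controller):

   ```shell
   # Create the second consumer "martin" with identical ai-rate-limiting settings;
   # only the key-auth key differs from georg's configuration above.
   curl -s "http://127.0.0.1:9180/apisix/admin/consumers" -X PUT \
     -H "X-API-KEY: ${ADMIN_KEY}" \
     -H "Content-Type: application/json" \
     -d '{
       "username": "martin",
       "plugins": {
         "key-auth": { "key": "<martin-api-key>" },
         "ai-rate-limiting": {
           "instances": [
             { "name": "gpt-oss-120b", "limit_strategy": "prompt_tokens", "time_window": 60, "limit": 400 },
             { "name": "bge-m3", "limit_strategy": "prompt_tokens", "time_window": 60, "limit": 10000 },
             { "name": "gpt-oss-120b", "limit_strategy": "completion_tokens", "time_window": 60, "limit": 400 }
           ],
           "rejected_code": 429,
           "rejected_msg": "Configured rate limit reached",
           "show_limit_quota_header": true
         }
       }
     }'
   ```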
   
   ### Expected Behavior
   
   Each consumer should have an independent rate limit quota.
   When multiple consumers (e.g. georg and martin) are configured with the same ai-rate-limiting plugin settings but different key-auth API keys, the token limit should be enforced per consumer, not shared globally.
   If georg reaches his configured token limit, only requests from georg should be rejected with 429 "Configured rate limit reached", while martin should still be able to make requests within his own quota.
   
   ### Error Logs
   
   _No response_
   
   ### Steps to Reproduce
   
   - Create a Consumer named georg with key-auth enabled and configure the 
ai-rate-limiting plugin with a token limit (e.g. 400 tokens per 60 seconds for 
an instance).
   - Create a second Consumer named martin with the same ai-rate-limiting 
configuration, but with a different key-auth API key.
   - Send requests using georg’s API key until the configured token limit is 
reached and requests start returning: "429 Configured rate limit reached"
   - Immediately send requests using martin’s API key.
   - Observe that martin’s requests are also rejected with: "429 Configured rate limit reached" (see the reproduction sketch below).
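
   A minimal shell sketch of the steps above (the route path, request body, and GEORG_KEY/MARTIN_KEY are placeholders; key-auth is assumed to read its default apikey header, so adjust to the actual setup):

   ```shell
   # Exhaust georg's prompt_tokens quota for the gpt-oss-120b instance.
   for i in $(seq 1 50); do
     curl -s -o /dev/null -w "georg: %{http_code}\n" \
       "http://127.0.0.1:9080/v1/chat/completions" \
       -H "apikey: ${GEORG_KEY}" -H "Content-Type: application/json" \
       -d '{"model": "gpt-oss-120b", "messages": [{"role": "user", "content": "hello"}]}'
   done

   # Immediately afterwards, a single request with martin's key is also
   # rejected with 429 instead of being counted against his own quota.
   curl -s -o /dev/null -w "martin: %{http_code}\n" \
     "http://127.0.0.1:9080/v1/chat/completions" \
     -H "apikey: ${MARTIN_KEY}" -H "Content-Type: application/json" \
     -d '{"model": "gpt-oss-120b", "messages": [{"role": "user", "content": "hello"}]}'
   ```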
   
   ### Environment
   
   - APISIX version (run `apisix version`):
     3.14.1
   
   - Operating system (run `uname -a`):
     Linux (Kubernetes container, official APISIX Docker image)
   
   - OpenResty / Nginx version (run `openresty -V` or `nginx -V`):
     OpenResty (bundled with APISIX Docker image)
   
   - etcd version, if relevant (run `curl 
http://127.0.0.1:9090/v1/server_info`):
     v3.6.0 (self-deployed in Kubernetes)
   
   - APISIX Dashboard version, if relevant:
     Not used
   
   - Plugin runner version, for issues related to plugin runners:
     Not used (using a serverless-post-function)
   
   - LuaRocks version, for installation issues (run `luarocks --version`):
     Not applicable (using official Docker image)
   

