nic-6443 opened a new pull request, #13191:
URL: https://github.com/apache/apisix/pull/13191

   ## Description
   
   Add a new `expression` option for the `limit_strategy` field in the 
`ai-rate-limiting` plugin, allowing users to define custom Lua arithmetic 
expressions for dynamic token cost calculation.
   
   ### Motivation
   
   LLM providers report token usage in different fields; Anthropic, for example,
returns `cache_creation_input_tokens` and `cache_read_input_tokens`. The existing
strategies (`total_tokens`, `prompt_tokens`, `completion_tokens`) cannot
account for provider-specific fields or apply weighted costs. The `expression`
strategy lets users define exactly how the token cost is calculated.
   
   ### Changes
   
   **Plugin (`apisix/plugins/ai-rate-limiting.lua`):**
   - Added `"expression"` to the `limit_strategy` enum
   - Added `cost_expr` schema field (Lua arithmetic expression string)
   - Added sandboxed expression evaluation:
     - `expr_safe_env`: safe math functions (`abs`, `ceil`, `floor`, `max`, `min`)
     - `compile_cost_expr()`: validates expression syntax at config time
     - `eval_cost_expr()`: evaluates expression against raw LLM usage at runtime
   - Updated `check_schema()` to require valid `cost_expr` when strategy is `expression`
   - Updated `get_token_usage()` to evaluate expression against `ctx.llm_raw_usage`
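
A minimal sketch of the whitelist-style evaluation listed above, written in Python purely for illustration. The plugin itself is Lua and its implementation is not reproduced here. The function names mirror `compile_cost_expr` and `eval_cost_expr`, but the bodies, the `SAFE_FUNCS` table, and the set of allowed operators are assumptions.

```python
import ast
import math
import operator

# Assumed whitelist, mirroring the functions named for expr_safe_env.
SAFE_FUNCS = {"abs": abs, "ceil": math.ceil, "floor": math.floor,
              "max": max, "min": min}
# Assumed operator set; anything not listed here is rejected.
BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv,
           ast.Mod: operator.mod}
UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def _validate(node):
    """Reject every syntax node that is not plain arithmetic."""
    if isinstance(node, ast.Constant):
        if isinstance(node.value, bool) or not isinstance(node.value, (int, float)):
            raise ValueError("only numeric literals are allowed")
    elif isinstance(node, ast.Name):
        pass  # variables are resolved against the usage table at runtime
    elif isinstance(node, ast.BinOp) and type(node.op) in BIN_OPS:
        _validate(node.left)
        _validate(node.right)
    elif isinstance(node, ast.UnaryOp) and type(node.op) in UNARY_OPS:
        _validate(node.operand)
    elif (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
          and node.func.id in SAFE_FUNCS and not node.keywords):
        for arg in node.args:
            _validate(arg)
    else:
        raise ValueError(f"unsupported syntax: {type(node).__name__}")


def compile_cost_expr(expr):
    """Config-time step: parse and validate, raising ValueError on bad input."""
    if not isinstance(expr, str) or not expr.strip():
        raise ValueError("cost_expr must be a non-empty string")
    try:
        tree = ast.parse(expr.strip(), mode="eval")
    except SyntaxError as exc:
        raise ValueError(f"invalid cost_expr: {exc.msg}") from exc
    _validate(tree.body)
    return tree


def _eval(node, usage):
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        value = usage.get(node.id, 0)  # missing variables default to 0
        return value if isinstance(value, (int, float)) else 0
    if isinstance(node, ast.BinOp):
        return BIN_OPS[type(node.op)](_eval(node.left, usage),
                                      _eval(node.right, usage))
    if isinstance(node, ast.UnaryOp):
        return UNARY_OPS[type(node.op)](_eval(node.operand, usage))
    # Only whitelisted calls survive validation, so this is the last case.
    return SAFE_FUNCS[node.func.id](*[_eval(a, usage) for a in node.args])


def eval_cost_expr(tree, usage):
    """Runtime step: evaluate a validated expression against raw usage."""
    return _eval(tree.body, usage or {})


# Hypothetical usage payload; the field names follow the Anthropic example.
raw_usage = {"input_tokens": 120, "cache_creation_input_tokens": 30,
             "output_tokens": 50}
tree = compile_cost_expr(
    "input_tokens + cache_creation_input_tokens + output_tokens")
cost = eval_cost_expr(tree, raw_usage)
```

Splitting validation from evaluation lets a bad expression fail when the config is saved rather than on the first request, which matches the config-time and runtime roles described for the two functions.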
   
   **Tests (`t/plugin/ai-rate-limiting-expression.t`):**
   - Schema validation (`expression` requires `cost_expr`; empty or syntactically invalid expressions are rejected)
   - Non-streaming Anthropic requests with expression rate limiting
   - Streaming Anthropic requests with expression rate limiting
   - Cache-aware expressions (excluding `cache_read_input_tokens`)
   - Weighted expressions (different cost multipliers per field)
   - Missing variables default to 0
   
   ### Usage Examples
   
   ```json
   {
     "ai-rate-limiting": {
       "limit": 500,
       "time_window": 60,
       "limit_strategy": "expression",
       "cost_expr": "input_tokens + cache_creation_input_tokens + output_tokens"
     }
   }
   ```
   
   Weighted cost with cache discount:
   ```json
   {
     "cost_expr": "input_tokens + cache_read_input_tokens * 0.1 + cache_creation_input_tokens * 1.25 + output_tokens"
   }
   ```
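
To make the weighting concrete, here is the same formula worked through in Python for one hypothetical response. The usage numbers are invented for illustration, and `weighted_cost` is a hypothetical helper, not part of the plugin.

```python
def weighted_cost(usage):
    """Apply the weighted formula above; absent fields count as 0."""
    def get(name):
        return usage.get(name, 0)

    return (get("input_tokens")
            + get("cache_read_input_tokens") * 0.1
            + get("cache_creation_input_tokens") * 1.25
            + get("output_tokens"))


# Hypothetical usage values, chosen only to keep the arithmetic easy to follow.
usage = {
    "input_tokens": 1000,
    "cache_read_input_tokens": 2000,
    "cache_creation_input_tokens": 400,
    "output_tokens": 300,
}

unweighted = sum(usage.values())   # 1000 + 2000 + 400 + 300
weighted = weighted_cost(usage)    # 1000 + 200 + 500 + 300
```

With these numbers the unweighted sum is 3700 tokens, while the weighted cost is 2000: cache reads are charged at a tenth of their count and cache writes at a 25% premium.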
   

