membphis commented on PR #13578:
URL: https://github.com/apache/apisix/pull/13578#issuecomment-4786071487

   I found two merge-blocking issues in the current `ai-cache` implementation:
   
   ### [P1] Cache key does not include the effective model or picked AI instance
   
   `ai-cache` computes the fingerprint from `ctx.var.request_llm_model or body.model`, but it does not include `ctx.picked_ai_instance_name`, provider, or the route / instance effective `options.model`:
   
   - `apisix/plugins/ai-cache/key.lua`: the fingerprint uses only protocol, 
requested model, normalized messages, and remaining body params.
   - `apisix/plugins/ai-cache.lua`: the lookup happens in `access`, before the 
upstream request is built.
   - `ai-proxy-multi` has already selected `ctx.picked_ai_instance` before 
lower-priority plugins run, so that selected instance is available at cache 
lookup time.
   
   This can return the wrong provider/model response on an `ai-proxy-multi` 
route. A request can warm the cache through instance A, then a later identical 
request can be routed to instance B but still hit and replay instance A's 
response because both requests share the same cache key.
   
   This should be fixed before merge by including the selected AI instance 
and/or effective model/provider in the cache key or scope, with a regression 
test covering `ai-proxy-multi` instances that use different models or providers.
   
   ### [P2] The plugin can cache ordinary JSON traffic when it is not behind `ai-proxy`
   
   The docs say `ai-cache` must be used with `ai-proxy` or `ai-proxy-multi`, 
but the implementation does not enforce or safely bypass that condition. 
`ai-cache.access` reads any JSON request body, computes a key, and marks the 
request as `MISS`; then `log` writes any 200 response to Redis. There is no 
`ctx.picked_ai_instance` guard like the existing AI moderation plugins use.
   
   If the plugin is accidentally attached at Route / Service / Consumer level 
without an AI proxy, ordinary JSON upstream responses can be cached and 
replayed. That is a surprising behavior and can leak stale or incorrect non-AI 
responses.
   
   Please add a guard before key computation, either bypassing by default or 
using the shared `ai-protocols.binding` `fail_mode` behavior, and add coverage 
for the no-`ai-proxy` case.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

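
The two fixes requested in the comment above, sketched as an added example (not code from the PR or from APISIX): a cache key scoped by picked instance, provider and effective model, plus a bypass when `ctx.picked_ai_instance` is absent. The function names, the instance table shape (`provider`, `options.model`) and all sample values are assumptions; plain length-prefixed concatenation stands in for whatever hash the plugin applies.

```lua
-- Added illustration, not apache/apisix code. Function names are hypothetical;
-- the ctx field names come from the review, the instance table shape is assumed.

-- Length-prefix each part so adjacent fields cannot run into each other.
local function enc(s)
    s = tostring(s)
    return #s .. ":" .. s
end

-- Constructed stand-in for the reviewed fingerprint: protocol, requested
-- model and messages only, with nothing about the selected instance.
local function fingerprint(protocol, body)
    local parts = { enc(protocol), enc(body.model or "") }
    for _, m in ipairs(body.messages or {}) do
        parts[#parts + 1] = enc(m.role) .. enc(m.content)
    end
    return table.concat(parts)
end

-- [P2] return nil (bypass the cache) when no AI proxy picked an instance.
-- [P1] otherwise scope the key by instance name, provider, effective model.
local function scoped_cache_key(ctx, protocol, body)
    local inst = ctx.picked_ai_instance
    if not inst then
        return nil
    end
    local model = (inst.options and inst.options.model) or body.model or ""
    local scope = enc(ctx.picked_ai_instance_name or "")
        .. enc(inst.provider or "") .. enc(model)
    return scope .. fingerprint(protocol, body)
end

-- Constructed sample request and contexts.
local body = { model = "requested-model",
               messages = { { role = "user", content = "hello" } } }
local ctx_a = { picked_ai_instance_name = "instance-a",
    picked_ai_instance = { provider = "provider-a", options = { model = "model-a" } } }
local ctx_b = { picked_ai_instance_name = "instance-b",
    picked_ai_instance = { provider = "provider-b", options = { model = "model-b" } } }
local ctx_plain = {}  -- route with ai-cache attached but no ai-proxy

local key_a = scoped_cache_key(ctx_a, "openai-chat", body)
local key_b = scoped_cache_key(ctx_b, "openai-chat", body)
assert(key_a ~= key_b, "instances A and B must not share a cache entry")
assert(key_a == scoped_cache_key(ctx_a, "openai-chat", body), "key must be stable")
assert(scoped_cache_key(ctx_plain, "openai-chat", body) == nil, "non-AI traffic must bypass")
print("scoped keys differ per instance; non-AI request bypassed")
```

The unscoped `fingerprint` never sees `ctx`, so it returns the same value for both instances; that is the collision described under [P1].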