Re: [PR] feat: ai-cache plugin (exact-match cache for openai-chat) [apisix]

via GitHub Fri, 12 Jun 2026 09:41:02 -0700


janiussyafiq commented on PR #13424:
URL: https://github.com/apache/apisix/pull/13424#issuecomment-4693267842


   **Restructure incoming: converging on a reference exact-match cache design — 
smaller, with zero shared-file changes.**
   
   Following up on the phase-1 direction requested in #13308 ("focus only on 
exact cache… make the cache key correct and conservative") and RFC #13290, I am 
reshaping this PR to shrink its review surface and remove **all** edits to 
`ai-proxy/base.lua`, `ai-providers/base.lua`, `apisix/cli/ngx_tpl.lua`, 
`apisix/utils/redis-schema.lua`, and `t/APISIX.pm`. The next push brings the 
diff from 2,395 additions / 15 files down to ~1,900 / 9 files, with production 
code down from ~510 to ~370 lines — all in new files, except one plugin-list 
line each in `cli/config.lua` and `config.yaml.example`.
   
   What changes and why:
   
   - **Key on the request as received**, scoped by `ctx.conf_id` + 
`ctx.conf_version`. This covers the "route/service/plugin config" context asked 
for in #13308 directly: any config edit (including an in-place `ai-proxy` 
model-override change) bumps `conf_version` and makes old entries unreachable — 
stricter than the previous instance-name scoping, which missed in-place 
override edits. The model and all generation parameters are hashed in the body; 
the effective-request computation, the provider-base refactor, and the failover 
write-skip guard are deleted entirely. Routes using `ai-proxy-multi` **bypass 
caching** in PR1, so one route = one fixed instance and cross-instance serving 
is impossible; multi-instance semantics get their own follow-up PR where that 
design can be discussed in isolation.
   - **Lookup moves from `before_proxy` (priority 1086) to `access` (priority 
1028)**, following the existing `ai-aliyun-content-moderation` pattern, with 
the response captured via the plugin's own `lua_body_filter`. Hit/write 
semantics are unchanged, hits stay subject to `ai-rate-limiting` and 
request-side moderation, and this fixes an ordering issue where protocol 
detection at 1086 ran before `ai-request-rewrite` (1073) could mutate the body.
   - **Single-node Redis only**; `redis-cluster` (schema branch + slot-lock 
dict wiring) moves to a stacked follow-up, where it will also gain a real 
cluster round-trip test against the CI cluster service.
   - Kept deliberately: 2xx-only writes and hit/miss headers per RFC #13290, 
JSON validation + 1 MiB write cap, corrupt-entry cleanup, fail-open on every 
Redis/key error, temperature/top_p milli-quantisation, and 
`core.json.stably_encode` canonicalisation with SHA-256.
   
   **Stack:** ① `redis-schema` shared-table fix (separate small PR, opening now 
— it is a pre-existing bug this PR previously fixed in passing) → ② this PR → ③ 
redis-cluster support → ④ ai-proxy-multi semantics (draft, design question 
stated up front) → ⑤ RFC #13290 alignment (consumer scoping, `bypass_on`, 
headers schema, Prometheus metrics). Each stage is independently shippable, and 
every deferred behavior degrades to a cache MISS, never an incorrect hit.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] feat: ai-cache plugin (exact-match cache for openai-chat) [apisix]

Reply via email to