janiussyafiq opened a new pull request, #13578:
URL: https://github.com/apache/apisix/pull/13578

   ### Description
   
   Adds a new `ai-cache` plugin that caches LLM responses and replays them for 
subsequent requests that resolve to the same prompt, cutting upstream token 
cost and latency for repetitive workloads (FAQ bots, document Q&A, translation).
   
   This PR implements the **exact (L1)** cache layer:
   
   - **Cache key** — a SHA-256 fingerprint of the request as received: client 
protocol, requested model, normalized messages, and the remaining 
response-determining body parameters (`temperature`, `top_p`, `max_tokens`, 
`tools`, …). Provider-agnostic via `ai-protocols`, so it works for every chat 
protocol `ai-proxy` supports (OpenAI Chat, Anthropic Messages, Bedrock 
Converse, OpenAI Responses).
   - **Storage** — Redis (single-node); connection fields are sourced from 
`apisix.utils.redis-schema` via the `policy` + `if/then` convention used by 
`limit-count` / `limit-req` / `limit-conn`.
   - **Scope** — shared cache by default; opt-in per-consumer / per-variable 
isolation (`cache_key.include_consumer` / `include_vars`).
   - **Behavior** — write-on-2xx only (non-streaming); `cache_bypass` opt-out 
(proxy-cache convention); `max_cache_body_size` cap; `X-AI-Cache-Status` / 
`X-AI-Cache-Age` response headers; fails open (proxies as a normal miss) when 
Redis is unreachable.
   - Runs at priority `1035`, below `ai-proxy`, and depends on `ai-proxy` / 
`ai-proxy-multi`.
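
To illustrate the cache-key bullet above, here is a minimal Python sketch of an exact-match fingerprint. It is not the plugin's code (the plugin itself is written in Lua): `cache_key`, `normalize_messages`, `PARAM_FIELDS`, the `scope` argument, and the protocol labels are hypothetical names, and the whitespace-trimming normalization is an assumption about what "normalized messages" could mean.

```python
import hashlib
import json

# Assumed subset of response-determining parameters; the PR lists more ("…").
PARAM_FIELDS = ("temperature", "top_p", "max_tokens", "tools")


def normalize_messages(messages):
    """Illustrative normalization: keep role and content, trim string content."""
    normalized = []
    for m in messages:
        content = m.get("content")
        if isinstance(content, str):
            content = content.strip()
        normalized.append({"role": m.get("role"), "content": content})
    return normalized


def cache_key(protocol, body, scope=None):
    """Return a SHA-256 hex digest over the parts of a request that determine the response.

    `scope` stands in for the opt-in isolation fields (consumer name, variables).
    """
    material = {
        "protocol": protocol,
        "model": body.get("model"),
        "messages": normalize_messages(body.get("messages", [])),
        "params": {k: body[k] for k in PARAM_FIELDS if k in body},
        "scope": scope or {},
    }
    # Sorted keys and fixed separators make the serialization canonical, so
    # two bodies that differ only in JSON key order hash to the same key.
    canonical = json.dumps(material, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

In this sketch, two requests share a key only when protocol, model, normalized messages, and the selected parameters all match; passing a non-empty `scope` yields a different key, which is how per-consumer isolation would fall out of the same function.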
   
   Semantic cache, streaming support, and observability are planned as 
follow-up PRs. User-facing documentation will be added in a later PR once the 
series is further along.
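
For reviewers, the non-streaming request path described in the Behavior bullet can be sketched as follows. This is a rough Python model under stated assumptions, not the plugin's Lua implementation: `MemoryStore` stands in for Redis, `handle` and `StoreUnavailable` are hypothetical names, and the header values `HIT` / `MISS` / `BYPASS` are assumed rather than taken from the PR.

```python
import time


class StoreUnavailable(Exception):
    """Raised by the store when the backend cannot be reached."""


class MemoryStore:
    """In-memory stand-in for Redis; set `down = True` to simulate an outage."""

    def __init__(self):
        self.data = {}
        self.down = False

    def get(self, key):
        if self.down:
            raise StoreUnavailable()
        return self.data.get(key)

    def set(self, key, value):
        if self.down:
            raise StoreUnavailable()
        self.data[key] = value


def handle(key, store, upstream, bypass=False, max_body_size=1024, now=time.time):
    """Return (status, body, headers) for one non-streaming request.

    `upstream` is a zero-argument callable returning (status, body).
    """
    if bypass:
        # Opt-out: skip both lookup and write.
        status, body = upstream()
        return status, body, {"X-AI-Cache-Status": "BYPASS"}

    store_ok = True
    try:
        entry = store.get(key)
    except StoreUnavailable:
        # Fail open: treat an unreachable store as an ordinary miss.
        entry, store_ok = None, False

    if entry is not None:
        age = int(now() - entry["stored_at"])
        headers = {"X-AI-Cache-Status": "HIT", "X-AI-Cache-Age": str(age)}
        return entry["status"], entry["body"], headers

    status, body = upstream()
    # Write only 2xx responses whose body fits under the size cap.
    cacheable = 200 <= status < 300 and len(body.encode("utf-8")) <= max_body_size
    if store_ok and cacheable:
        try:
            store.set(key, {"status": status, "body": body, "stored_at": now()})
        except StoreUnavailable:
            pass  # a failed write must not affect the client response
    return status, body, {"X-AI-Cache-Status": "MISS"}
```

The model keeps the store strictly off the critical path: any store failure degrades to a miss, and the client response never depends on whether the write succeeded.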
   
   #### Which issue(s) this PR fixes:
   
   Related to #13290
   
   ### Checklist
   
   - [x] I have explained the need for this PR and the problem it solves
   - [x] I have explained the changes or the new features added to this PR
   - [x] I have added tests corresponding to this change
   - [ ] I have updated the documentation to reflect this change
   - [x] I have verified that this change is backward compatible
   