janiussyafiq commented on PR #13308:
URL: https://github.com/apache/apisix/pull/13308#issuecomment-4443532662

   Closing this PR to address @membphis's review properly.
   
   Splitting the work into a phased series so the cache-key correctness 
conversation can be reviewed independently of semantic caching, embedding 
providers, and multi-protocol support:
   
   - **Phase 1**: exact (L1) cache only, **for `openai-chat` protocol only**, 
with a conservative cache key that accounts for everything affecting LLM output 
— effective model post-override, protocol, picked upstream instance, full 
structured `messages` (not concatenated text), whitelisted generation 
parameters, stream flag, and operator-scoped consumer/vars. All 
`openai-chat`-compatible providers (Azure OpenAI, DeepSeek, OpenRouter, 
Together, Fireworks, Ollama, etc.) work transparently. Sliced into 5 small 
reviewable PRs.
   - **Phase 2**: additional client protocols (`openai-responses`, 
`anthropic-messages`, `bedrock-converse`) — one PR each.
   - **Phase 3+**: semantic (L2) embedding-based cache as its own design track.
   
   The full plan including PR breakdown, cache-key invariants, and per-PR test 
gates is captured in a PRD on my fork: 
https://github.com/janiussyafiq/apisix/issues/13
   
   The first PR in the series — a small, behavior-preserving `ai-proxy` 
refactor that extracts the instance-override application logic into helpers (so 
the cache key can reflect operator-side overrides correctly) — will follow 
shortly.
   
   Thanks for the careful review.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to