janiussyafiq opened a new issue, #13352:
URL: https://github.com/apache/apisix/issues/13352

   ### Context
   
   `apisix/plugins/ai-aliyun-content-moderation.lua`, 
`apisix/plugins/ai-aws-content-moderation.lua`, and the upcoming 
`ai-lakera-guard` plugin ([api7/rfcs#32](https://github.com/api7/rfcs/pull/32)) 
all use `protocols.get(ctx.ai_client_protocol).extract_request_content(body)` 
to assemble the text passed to their vendor moderation API.
   
   Each protocol's `extract_request_content` (`openai-chat`, 
`openai-responses`, `anthropic-messages`, `bedrock-converse`) currently walks 
**only `body.messages[].content`**. Two coverage gaps are inherited by every 
moderation plugin that uses it.
   
   ### Gap 1 — Anthropic top-level `system:`
   
   Anthropic places system prompts outside `messages[]` (top-level 
`body.system` string). `anthropic-messages.lua`'s `extract_request_content` 
(lines 177-198) does not surface it. The same module's `get_messages` (lines 
203-228) **does** include it, so there's a clear precedent for treating 
`system` as scannable content — `extract_request_content` just doesn't.
   
   The textbook prompt-injection attack — *"ignore your previous instructions 
and..."* — most commonly targets the system role. Anthropic-format requests 
bypass scanning entirely for this attack class. OpenAI Chat / Responses and 
Bedrock Converse put system content *inside* `messages[]` and are unaffected.
   
   ### Gap 2 — `body.tools[]` definitions
   
   Function-call schemas — `tools[].function.description`, 
`tools[].function.parameters`, the equivalents in Anthropic and Bedrock — are 
not scanned by any current `extract_request_content` implementation. A 
maliciously-crafted tool description (instructions to exfiltrate, jailbreak via 
tool name, etc.) bypasses scanning across all four protocols.
   
   ### Proposed fix
   
   Extend `extract_request_content` in each protocol module:
   
   - `anthropic-messages.lua`: include `body.system` string content; walk 
`body.tools[]` (`name`, `description`, `input_schema` text content).
   - `openai-chat.lua` and `openai-responses.lua`: walk `body.tools[]` 
(`function.name`, `function.description`, `function.parameters` schema text 
content).
   - `bedrock-converse.lua`: include `body.system` block content; walk 
`body.toolConfig.tools[]` (`toolSpec.name`, `toolSpec.description`, 
`toolSpec.inputSchema` text content).
   
   ### Side effects
   
   This is a behavior change. Existing `ai-aliyun-content-moderation` and 
`ai-aws-content-moderation` users may see traffic that previously passed now 
flag, because more content is being scanned. Test plans for both plugins should 
add coverage for the new fields. Worth flagging in release notes.
   
   ### Discovered
   
   While drafting [api7/rfcs#32](https://github.com/api7/rfcs/pull/32) §3.2 
(`ai-lakera-guard` known limitations).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to