codeant-ai-for-open-source[bot] commented on code in PR #40448:
URL: https://github.com/apache/superset/pull/40448#discussion_r3305660053
##########
tests/unit_tests/mcp_service/utils/test_sanitization.py:
##########
@@ -824,3 +824,169 @@ def
test_error_responses_sanitize_prompt_facing_error_text(error_schema: type) -
"Missing x [ESCAPED-UNTRUSTED-CONTENT-CLOSE] y\n"
f"{LLM_CONTEXT_CLOSE_DELIMITER}"
)
+
+
+# ---------------------------------------------------------------------------
+# sanitize_sql_expression — Ticket #3.
+#
+# Locks in three properties of the SQL-metric sanitizer:
+# 1. legitimate SQL aggregate expressions pass through unchanged,
+# 2. the on\w+= event-handler check is NOT inherited (would false-positive
+# on `monthly = 12`),
+# 3. statement stacking / comments / DDL+DML / XSS are rejected, while
+# subqueries pass through (subquery policy lives in Superset core's
+# ALLOW_ADHOC_SUBQUERY feature flag, not here).
+# ---------------------------------------------------------------------------
+
+
+def _sanitize_sql():
+ """Import lazily so the import error surfaces as a per-test failure."""
+ from superset.mcp_service.utils.sanitization import sanitize_sql_expression
+
+ return sanitize_sql_expression
+
+
+def test_sanitize_sql_expression_allows_ticket_example():
+ sanitize_sql_expression = _sanitize_sql()
+ expr = "COUNT(CASE WHEN closed_won THEN 1 END)::numeric /
NULLIF(COUNT(*),0)"
+ assert sanitize_sql_expression(expr, "sql_expression") == expr
+
+
+def test_sanitize_sql_expression_no_false_positive_on_equals():
+ """`monthly = 12` must pass; sanitize_user_input's on\\w+= check matches
+ `on`+`thly`+`=` and would block it. This locks in that the new sanitizer
+ is independent of sanitize_user_input."""
+ sanitize_sql_expression = _sanitize_sql()
+ expr = "SUM(CASE WHEN monthly = 12 THEN 1 END)"
+ assert sanitize_sql_expression(expr, "sql_expression") == expr
+
+
+def test_sanitize_sql_expression_allows_abs_and_casts():
+ sanitize_sql_expression = _sanitize_sql()
+ expr = "ABS(SUM(amount))::numeric / 100.0"
+ assert sanitize_sql_expression(expr, "sql_expression") == expr
+
+
+def test_sanitize_sql_expression_allows_subquery():
+ """Subquery policy belongs to Superset core (ALLOW_ADHOC_SUBQUERY).
+ The MCP-layer sanitizer must NOT block SELECT — otherwise it would
+ override the admin's feature-flag choice."""
+ sanitize_sql_expression = _sanitize_sql()
+ expr = "(SELECT AVG(x) FROM other_table)"
+ assert sanitize_sql_expression(expr, "sql_expression") == expr
+
+
+def test_sanitize_sql_expression_allows_backticks():
+ """MySQL/MariaDB use backticks for identifier quoting
+ (``SUM(`Order Date`)``). The SQL execution path has no shell, so the
+ shell-metacharacter concern that blocks backticks in filter values
+ does not apply here. Regression test for an earlier defensive block
+ that broke MySQL identifier syntax."""
+ sanitize_sql_expression = _sanitize_sql()
+ expr = "SUM(`Order Date`)"
+ assert sanitize_sql_expression(expr, "sql_expression") == expr
+
+
+def test_sanitize_sql_expression_blocks_statement_stacking():
+ sanitize_sql_expression = _sanitize_sql()
+ with pytest.raises(ValueError, match="statement stacking"):
+ sanitize_sql_expression("SUM(amount); DROP TABLE users",
"sql_expression")
+
+
+def test_sanitize_sql_expression_blocks_line_comment():
+ sanitize_sql_expression = _sanitize_sql()
+ with pytest.raises(ValueError, match="comment"):
+ sanitize_sql_expression("SUM(amount) -- inject", "sql_expression")
+
+
+def test_sanitize_sql_expression_blocks_block_comment():
+ sanitize_sql_expression = _sanitize_sql()
+ with pytest.raises(ValueError, match="comment"):
+ sanitize_sql_expression("SUM(amount) /* inject */", "sql_expression")
+
+
[email protected](
+ "expr",
+ [
+ "DROP TABLE users",
+ "DELETE FROM users",
+ "INSERT INTO users VALUES (1)",
+ "UPDATE users SET x=1",
+ "ALTER TABLE users ADD COLUMN x int",
+ "TRUNCATE users",
+ "GRANT ALL ON users TO public",
+ "EXEC sp_helpdb",
+ ],
+)
+def test_sanitize_sql_expression_blocks_ddl_dml(expr: str):
+ sanitize_sql_expression = _sanitize_sql()
+ with pytest.raises(ValueError, match="disallowed"):
+ sanitize_sql_expression(expr, "sql_expression")
+
+
+def test_sanitize_sql_expression_strips_script_tags():
+ """XSS guard: the raw expression is echoed back into chart names, labels,
+ previews, and LLM-context responses, so HTML tags must not survive."""
+ sanitize_sql_expression = _sanitize_sql()
+ raw = "SUM(amount)<script>alert(1)</script>"
+ cleaned = sanitize_sql_expression(raw, "sql_expression")
+ assert "<script" not in cleaned.lower()
+ assert "alert(1)" not in cleaned
Review Comment:
**Suggestion:** This test asserts that `alert(1)` is removed from the
sanitized SQL expression, but `sanitize_sql_expression` only strips HTML tags
and does not remove safe text content inside those tags. With current sanitizer
behavior, `<script>` tags are removed while `alert(1)` remains, so this
assertion will fail even when sanitization is working as designed. Update the
expectation to validate tag stripping only (or change sanitizer behavior
explicitly if full script-content removal is actually required). [incorrect
condition logic]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ❌ MCP SQL expression sanitizer tests fail in CI.
- ⚠️ XSS test expectation misaligned with sanitizer design.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Open `tests/unit_tests/mcp_service/utils/test_sanitization.py` and locate
`test_sanitize_sql_expression_strips_script_tags` (around lines 88–95 in the
current
file), where `raw = "SUM(amount)<script>alert(1)</script>"`, `cleaned =
sanitize_sql_expression(raw, "sql_expression")`, and the test asserts both
`"<script" not
in cleaned.lower()` and `"alert(1)" not in cleaned` (lines 92–95 from the
tool output).
2. Note that `_sanitize_sql()` in the same file imports
`sanitize_sql_expression` from
`superset.mcp_service.utils.sanitization` (lines 3–7), so this test directly
exercises the
production sanitizer implementation.
3. In `superset/mcp_service/utils/sanitization.py`, locate
`sanitize_sql_expression` at
line 474 (from Grep output) and see that it calls `_strip_html_tags(value)`
before other
checks (lines 49–52 of the snippet at offset 460). `_strip_html_tags` is
defined at line
174 and uses `nh3.clean(decoded, tags=set(), url_schemes=set())` (lines
35–37 of that
snippet) to strip all HTML tags but preserve the inner text content.
4. Run `pytest
tests/unit_tests/mcp_service/utils/test_sanitization.py::test_sanitize_sql_expression_strips_script_tags`.
The call `sanitize_sql_expression("SUM(amount)<script>alert(1)</script>",
"sql_expression")` returns a string where `<script>`/`</script>` have been
removed but
`alert(1)` remains (e.g. `SUM(amount)alert(1)`), so the first assertion
about `<script`
passes, but the second assertion `assert "alert(1)" not in cleaned` fails,
even though the
sanitizer is behaving as implemented (stripping tags only, not their text
content).
```
</details>
[Fix in
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=690f1674718d467195e97e3d76c0ee3e&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
| [Fix in VSCode
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=690f1674718d467195e97e3d76c0ee3e&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** tests/unit_tests/mcp_service/utils/test_sanitization.py
**Line:** 931:934
**Comment:**
*Incorrect Condition Logic: This test asserts that `alert(1)` is
removed from the sanitized SQL expression, but `sanitize_sql_expression` only
strips HTML tags and does not remove safe text content inside those tags. With
current sanitizer behavior, `<script>` tags are removed while `alert(1)`
remains, so this assertion will fail even when sanitization is working as
designed. Update the expectation to validate tag stripping only (or change
sanitizer behavior explicitly if full script-content removal is actually
required).
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40448&comment_hash=ea8ae54b787783db5f4e27e59ea11b864973a55c824404fdb42f9f41efdd5f20&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40448&comment_hash=ea8ae54b787783db5f4e27e59ea11b864973a55c824404fdb42f9f41efdd5f20&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]