codeant-ai-for-open-source[bot] commented on code in PR #40448:
URL: https://github.com/apache/superset/pull/40448#discussion_r3305660053


##########
tests/unit_tests/mcp_service/utils/test_sanitization.py:
##########
@@ -824,3 +824,169 @@ def 
test_error_responses_sanitize_prompt_facing_error_text(error_schema: type) -
         "Missing x [ESCAPED-UNTRUSTED-CONTENT-CLOSE] y\n"
         f"{LLM_CONTEXT_CLOSE_DELIMITER}"
     )
+
+
+# ---------------------------------------------------------------------------
+# sanitize_sql_expression — Ticket #3.
+#
+# Locks in three properties of the SQL-metric sanitizer:
+#   1. legitimate SQL aggregate expressions pass through unchanged,
+#   2. the on\w+= event-handler check is NOT inherited (would false-positive
+#      on `monthly = 12`),
+#   3. statement stacking / comments / DDL+DML / XSS are rejected, while
+#      subqueries pass through (subquery policy lives in Superset core's
+#      ALLOW_ADHOC_SUBQUERY feature flag, not here).
+# ---------------------------------------------------------------------------
+
+
+def _sanitize_sql():
+    """Import lazily so the import error surfaces as a per-test failure."""
+    from superset.mcp_service.utils.sanitization import sanitize_sql_expression
+
+    return sanitize_sql_expression
+
+
+def test_sanitize_sql_expression_allows_ticket_example():
+    sanitize_sql_expression = _sanitize_sql()
+    expr = "COUNT(CASE WHEN closed_won THEN 1 END)::numeric / 
NULLIF(COUNT(*),0)"
+    assert sanitize_sql_expression(expr, "sql_expression") == expr
+
+
+def test_sanitize_sql_expression_no_false_positive_on_equals():
+    """`monthly = 12` must pass; sanitize_user_input's on\\w+= check matches
+    `on`+`thly`+`=` and would block it. This locks in that the new sanitizer
+    is independent of sanitize_user_input."""
+    sanitize_sql_expression = _sanitize_sql()
+    expr = "SUM(CASE WHEN monthly = 12 THEN 1 END)"
+    assert sanitize_sql_expression(expr, "sql_expression") == expr
+
+
+def test_sanitize_sql_expression_allows_abs_and_casts():
+    sanitize_sql_expression = _sanitize_sql()
+    expr = "ABS(SUM(amount))::numeric / 100.0"
+    assert sanitize_sql_expression(expr, "sql_expression") == expr
+
+
+def test_sanitize_sql_expression_allows_subquery():
+    """Subquery policy belongs to Superset core (ALLOW_ADHOC_SUBQUERY).
+    The MCP-layer sanitizer must NOT block SELECT — otherwise it would
+    override the admin's feature-flag choice."""
+    sanitize_sql_expression = _sanitize_sql()
+    expr = "(SELECT AVG(x) FROM other_table)"
+    assert sanitize_sql_expression(expr, "sql_expression") == expr
+
+
+def test_sanitize_sql_expression_allows_backticks():
+    """MySQL/MariaDB use backticks for identifier quoting
+    (``SUM(`Order Date`)``). The SQL execution path has no shell, so the
+    shell-metacharacter concern that blocks backticks in filter values
+    does not apply here. Regression test for an earlier defensive block
+    that broke MySQL identifier syntax."""
+    sanitize_sql_expression = _sanitize_sql()
+    expr = "SUM(`Order Date`)"
+    assert sanitize_sql_expression(expr, "sql_expression") == expr
+
+
+def test_sanitize_sql_expression_blocks_statement_stacking():
+    sanitize_sql_expression = _sanitize_sql()
+    with pytest.raises(ValueError, match="statement stacking"):
+        sanitize_sql_expression("SUM(amount); DROP TABLE users", 
"sql_expression")
+
+
+def test_sanitize_sql_expression_blocks_line_comment():
+    sanitize_sql_expression = _sanitize_sql()
+    with pytest.raises(ValueError, match="comment"):
+        sanitize_sql_expression("SUM(amount) -- inject", "sql_expression")
+
+
+def test_sanitize_sql_expression_blocks_block_comment():
+    sanitize_sql_expression = _sanitize_sql()
+    with pytest.raises(ValueError, match="comment"):
+        sanitize_sql_expression("SUM(amount) /* inject */", "sql_expression")
+
+
[email protected](
+    "expr",
+    [
+        "DROP TABLE users",
+        "DELETE FROM users",
+        "INSERT INTO users VALUES (1)",
+        "UPDATE users SET x=1",
+        "ALTER TABLE users ADD COLUMN x int",
+        "TRUNCATE users",
+        "GRANT ALL ON users TO public",
+        "EXEC sp_helpdb",
+    ],
+)
+def test_sanitize_sql_expression_blocks_ddl_dml(expr: str):
+    sanitize_sql_expression = _sanitize_sql()
+    with pytest.raises(ValueError, match="disallowed"):
+        sanitize_sql_expression(expr, "sql_expression")
+
+
+def test_sanitize_sql_expression_strips_script_tags():
+    """XSS guard: the raw expression is echoed back into chart names, labels,
+    previews, and LLM-context responses, so HTML tags must not survive."""
+    sanitize_sql_expression = _sanitize_sql()
+    raw = "SUM(amount)<script>alert(1)</script>"
+    cleaned = sanitize_sql_expression(raw, "sql_expression")
+    assert "<script" not in cleaned.lower()
+    assert "alert(1)" not in cleaned

Review Comment:
   **Suggestion:** This test asserts that `alert(1)` is removed from the 
sanitized SQL expression, but `sanitize_sql_expression` only strips HTML tags 
and does not remove safe text content inside those tags. With current sanitizer 
behavior, `<script>` tags are removed while `alert(1)` remains, so this 
assertion will fail even when sanitization is working as designed. Update the 
expectation to validate tag stripping only (or change sanitizer behavior 
explicitly if full script-content removal is actually required). [incorrect 
condition logic]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ MCP SQL expression sanitizer tests fail in CI.
   - ⚠️ XSS test expectation misaligned with sanitizer design.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Open `tests/unit_tests/mcp_service/utils/test_sanitization.py` and locate
   `test_sanitize_sql_expression_strips_script_tags` (around lines 88–95 in the 
current
   file), where `raw = "SUM(amount)<script>alert(1)</script>"`, `cleaned =
   sanitize_sql_expression(raw, "sql_expression")`, and the test asserts both 
`"<script" not
   in cleaned.lower()` and `"alert(1)" not in cleaned` (lines 92–95 from the 
tool output).
   
   2. Note that `_sanitize_sql()` in the same file imports 
`sanitize_sql_expression` from
   `superset.mcp_service.utils.sanitization` (lines 3–7), so this test directly 
exercises the
   production sanitizer implementation.
   
   3. In `superset/mcp_service/utils/sanitization.py`, locate 
`sanitize_sql_expression` at
   line 474 (from Grep output) and see that it calls `_strip_html_tags(value)` 
before other
   checks (lines 49–52 of the snippet at offset 460). `_strip_html_tags` is 
defined at line
   174 and uses `nh3.clean(decoded, tags=set(), url_schemes=set())` (lines 
35–37 of that
   snippet) to strip all HTML tags but preserve the inner text content.
   
   4. Run `pytest
   
tests/unit_tests/mcp_service/utils/test_sanitization.py::test_sanitize_sql_expression_strips_script_tags`.
   The call `sanitize_sql_expression("SUM(amount)<script>alert(1)</script>",
   "sql_expression")` returns a string where `<script>`/`</script>` have been 
removed but
   `alert(1)` remains (e.g. `SUM(amount)alert(1)`), so the first assertion 
about `<script`
   passes, but the second assertion `assert "alert(1)" not in cleaned` fails, 
even though the
   sanitizer is behaving as implemented (stripping tags only, not their text 
content).
   ```
   </details>
   
   [Fix in 
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=690f1674718d467195e97e3d76c0ee3e&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 | [Fix in VSCode 
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=690f1674718d467195e97e3d76c0ee3e&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** tests/unit_tests/mcp_service/utils/test_sanitization.py
   **Line:** 931:934
   **Comment:**
        *Incorrect Condition Logic: This test asserts that `alert(1)` is 
removed from the sanitized SQL expression, but `sanitize_sql_expression` only 
strips HTML tags and does not remove safe text content inside those tags. With 
current sanitizer behavior, `<script>` tags are removed while `alert(1)` 
remains, so this assertion will fail even when sanitization is working as 
designed. Update the expectation to validate tag stripping only (or change 
sanitizer behavior explicitly if full script-content removal is actually 
required).
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40448&comment_hash=ea8ae54b787783db5f4e27e59ea11b864973a55c824404fdb42f9f41efdd5f20&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40448&comment_hash=ea8ae54b787783db5f4e27e59ea11b864973a55c824404fdb42f9f41efdd5f20&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to