codeant-ai-for-open-source[bot] commented on code in PR #36722:
URL: https://github.com/apache/superset/pull/36722#discussion_r2629524856


##########
superset/db_engine_specs/mysql.py:
##########
@@ -293,6 +293,11 @@ def cancel_query(cls, cursor: Any, query: Query, 
cancel_query_id: str) -> bool:
         :param cancel_query_id: MySQL Connection ID
         :return: True if query cancelled successfully, False otherwise
         """
+        # Validate cancel_query_id to prevent SQL injection
+        # MySQL CONNECTION_ID() returns an unsigned integer
+        if not cls.validate_cancel_query_id(cancel_query_id, r"^\d+$"):

Review Comment:
   **Suggestion:** Calling `cls.validate_cancel_query_id` will raise an 
AttributeError if `validate_cancel_query_id` is a module-level function (not 
defined as a classmethod/attribute on the class); import and call the function 
from the module instead to ensure it exists at runtime and avoid an attribute 
lookup failure. [possible bug]
   
   **Severity Level:** Critical 🚨
   ```suggestion
           from superset.db_engine_specs.base import validate_cancel_query_id
           if not validate_cancel_query_id(cancel_query_id, r"^\d+$"):
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   The downstream context shows validate_cancel_query_id is a module-level 
function (not guaranteed to be an attribute on the class), so invoking it 
through cls.validate_cancel_query_id may fail at runtime with AttributeError. 
Importing and calling the function directly fixes a real runtime/attribute 
lookup issue introduced by the new code.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/db_engine_specs/mysql.py
   **Line:** 298:298
   **Comment:**
        *Possible Bug: Calling `cls.validate_cancel_query_id` will raise an 
AttributeError if `validate_cancel_query_id` is a module-level function (not 
defined as a classmethod/attribute on the class); import and call the function 
from the module instead to ensure it exists at runtime and avoid an attribute 
lookup failure.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
superset/db_engine_specs/base.py:
##########
@@ -2177,6 +2177,27 @@ def get_cancel_query_id(  # pylint: 
disable=unused-argument
 
         return None
 
+    @staticmethod
+    def validate_cancel_query_id(
+        cancel_query_id: str | None,
+        pattern: str = r"^\d+$",
+    ) -> bool:
+        """
+        Validate that a cancel_query_id matches expected format.
+
+        This is a defense-in-depth measure to prevent SQL injection in 
cancel_query
+        implementations that use string interpolation. While cancel_query_id 
typically
+        comes from trusted database sources (e.g., CONNECTION_ID()), 
validation ensures
+        safety even if the data source is compromised.
+
+        :param cancel_query_id: The query identifier to validate
+        :param pattern: Regex pattern to match (default: numeric only)
+        :return: True if valid, False otherwise
+        """
+        if cancel_query_id is None:
+            return False
+        return bool(re.match(pattern, str(cancel_query_id)))

Review Comment:
   **Suggestion:** Using re.match can result in partial matches if callers pass 
a pattern without anchors; this may incorrectly validate malicious or malformed 
IDs — use re.fullmatch to require the entire string to match the pattern. 
[possible bug]
   
   **Severity Level:** Critical 🚨
   ```suggestion
           return bool(re.fullmatch(pattern, str(cancel_query_id)))
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   This fixes a real correctness issue: re.match can accept partial matches 
depending
   on the provided pattern, while re.fullmatch ensures the entire identifier 
conforms
   to the expectation, preventing accidental validation of malicious substrings.
   It's a small, safe improvement with no cross-cutting changes.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/db_engine_specs/base.py
   **Line:** 2199:2199
   **Comment:**
        *Possible Bug: Using re.match can result in partial matches if callers 
pass a pattern without anchors; this may incorrectly validate malicious or 
malformed IDs — use re.fullmatch to require the entire string to match the 
pattern.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
superset/db_engine_specs/impala.py:
##########
@@ -187,9 +187,14 @@ def cancel_query(cls, cursor: Any, query: Query, 
cancel_query_id: str) -> bool:
 
         :param cursor: New cursor instance to the db of the query
         :param query: Query instance
-        :param cancel_query_id: impala db not need
+        :param cancel_query_id: Impala query ID in format "hex:hex"
         :return: True if query cancelled successfully, False otherwise
         """
+        # Validate cancel_query_id to prevent URL injection
+        # Impala query IDs are in format: "hex_string:hex_string" (32 hex 
chars with colon)
+        if not cls.validate_cancel_query_id(cancel_query_id, 
r"^[a-f0-9]+:[a-f0-9]+$"):
+            return False
+
         try:
             impala_host = query.database.url_object.host
             url = 
f"http://{impala_host}:25000/cancel_query?query_id={cancel_query_id}";

Review Comment:
   **Suggestion:** The code assumes `query.database.url_object.host` is 
present; if `impala_host` is None or empty the constructed URL will be invalid 
(e.g., "http://None:25000";) and may lead to unexpected requests or errors, so 
check the host and return False early when missing. [null pointer]
   
   **Severity Level:** Minor ⚠️
   ```suggestion
               if not impala_host:
                   return False
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   Verifying impala_host before building the URL avoids attempting an HTTP call 
to an invalid
   host (e.g. "None") and prevents unnecessary DNS resolution/network errors. 
The current try/except
   would catch those failures, but an explicit guard is clearer and avoids 
making a doomed request.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/db_engine_specs/impala.py
   **Line:** 200:200
   **Comment:**
        *Null Pointer: The code assumes `query.database.url_object.host` is 
present; if `impala_host` is None or empty the constructed URL will be invalid 
(e.g., "http://None:25000";) and may lead to unexpected requests or errors, so 
check the host and return False early when missing.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
superset/db_engine_specs/impala.py:
##########
@@ -187,9 +187,14 @@ def cancel_query(cls, cursor: Any, query: Query, 
cancel_query_id: str) -> bool:
 
         :param cursor: New cursor instance to the db of the query
         :param query: Query instance
-        :param cancel_query_id: impala db not need
+        :param cancel_query_id: Impala query ID in format "hex:hex"
         :return: True if query cancelled successfully, False otherwise
         """
+        # Validate cancel_query_id to prevent URL injection
+        # Impala query IDs are in format: "hex_string:hex_string" (32 hex 
chars with colon)
+        if not cls.validate_cancel_query_id(cancel_query_id, 
r"^[a-f0-9]+:[a-f0-9]+$"):

Review Comment:
   **Suggestion:** The regex used for validation is too permissive and only 
allows lowercase hex and variable-length parts; require exactly 16 hex chars on 
each side (case-insensitive) to match the expected "16hex:16hex" format so an 
attacker cannot pass crafted longer/shorter values to bypass validation. 
[security]
   
   **Severity Level:** Critical 🚨
   ```suggestion
           # Impala query IDs are in format: "hex_string:hex_string" (32 hex 
chars with colon, 16 chars per side)
           if not cls.validate_cancel_query_id(cancel_query_id, 
r"^[A-Fa-f0-9]{16}:[A-Fa-f0-9]{16}$"):
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   The PR currently accepts variable-length hex parts and only lowercase chars. 
The actual
   cancel ID is produced as two 16-char hex halves (from .hex()), so tightening 
the regex
   to enforce exactly 16 hex chars per side (and allowing upper/lower) closes a 
small
   validation gap and reduces the attack surface for crafted IDs. This is a 
real security
   hardening and safe to apply.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/db_engine_specs/impala.py
   **Line:** 194:195
   **Comment:**
        *Security: The regex used for validation is too permissive and only 
allows lowercase hex and variable-length parts; require exactly 16 hex chars on 
each side (case-insensitive) to match the expected "16hex:16hex" format so an 
attacker cannot pass crafted longer/shorter values to bypass validation.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
superset/db_engine_specs/ocient.py:
##########
@@ -381,7 +381,13 @@ def handle_cursor(cls, cursor: Any, query: Query) -> None:
     def cancel_query(cls, cursor: Any, query: Query, cancel_query_id: str) -> 
bool:
         with OcientEngineSpec.query_id_mapping_lock:

Review Comment:
   **Suggestion:** The code holds `query_id_mapping_lock` while calling 
`cursor.execute(...)`; this performs I/O while a global lock is held, which can 
cause contention or deadlocks. Fetch and validate the mapping under the lock, 
release the lock before executing the CANCEL command, then reacquire the lock 
to remove the mapping (use a safe removal like `pop`). [race condition]
   
   **Severity Level:** Minor ⚠️
   



##########
tests/unit_tests/db_engine_specs/test_cancel_query_validation.py:
##########
@@ -0,0 +1,365 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""
+Tests for cancel_query_id validation to prevent SQL/command injection.
+
+These tests verify that the validate_cancel_query_id method properly
+rejects malicious input that could be used for SQL injection attacks.
+"""
+
+from unittest.mock import Mock, patch
+
+import pytest
+
+
+class TestValidateCancelQueryId:
+    """Tests for BaseEngineSpec.validate_cancel_query_id"""
+
+    def test_validate_cancel_query_id_valid_numeric(self) -> None:
+        """Test that valid numeric IDs pass validation"""
+        from superset.db_engine_specs.base import BaseEngineSpec
+
+        assert BaseEngineSpec.validate_cancel_query_id("12345") is True
+        assert BaseEngineSpec.validate_cancel_query_id("1") is True
+        assert BaseEngineSpec.validate_cancel_query_id("999999999") is True
+
+    def test_validate_cancel_query_id_invalid_sql_injection(self) -> None:
+        """Test that SQL injection payloads are rejected"""
+        from superset.db_engine_specs.base import BaseEngineSpec
+
+        # Common SQL injection payloads
+        assert BaseEngineSpec.validate_cancel_query_id("1; DROP TABLE users; 
--") is False
+        assert BaseEngineSpec.validate_cancel_query_id("1' OR '1'='1") is False
+        assert BaseEngineSpec.validate_cancel_query_id("1 UNION SELECT * FROM 
users") is False
+        assert BaseEngineSpec.validate_cancel_query_id("1; DELETE FROM 
users;") is False
+
+    def test_validate_cancel_query_id_invalid_special_chars(self) -> None:
+        """Test that special characters are rejected"""
+        from superset.db_engine_specs.base import BaseEngineSpec
+
+        assert BaseEngineSpec.validate_cancel_query_id("123&admin=true") is 
False
+        assert BaseEngineSpec.validate_cancel_query_id("123#fragment") is False
+        assert BaseEngineSpec.validate_cancel_query_id("123\n456") is False
+        assert BaseEngineSpec.validate_cancel_query_id("123\r\nHost: evil") is 
False
+
+    def test_validate_cancel_query_id_none(self) -> None:
+        """Test that None is rejected"""
+        from superset.db_engine_specs.base import BaseEngineSpec
+
+        assert BaseEngineSpec.validate_cancel_query_id(None) is False
+
+    def test_validate_cancel_query_id_empty_string(self) -> None:
+        """Test that empty string is rejected"""
+        from superset.db_engine_specs.base import BaseEngineSpec
+
+        assert BaseEngineSpec.validate_cancel_query_id("") is False
+
+    def test_validate_cancel_query_id_custom_pattern(self) -> None:
+        """Test custom regex patterns"""
+        from superset.db_engine_specs.base import BaseEngineSpec
+
+        # Hex pattern (for Impala)
+        assert BaseEngineSpec.validate_cancel_query_id(
+            "abc123:def456", r"^[a-f0-9]+:[a-f0-9]+$"
+        ) is True
+        assert BaseEngineSpec.validate_cancel_query_id(
+            "invalid:pattern!", r"^[a-f0-9]+:[a-f0-9]+$"
+        ) is False
+
+        # Alphanumeric with underscores (for Trino)
+        assert BaseEngineSpec.validate_cancel_query_id(
+            "20240101_123456_00001_abcde", r"^[a-zA-Z0-9_]+$"
+        ) is True
+        assert BaseEngineSpec.validate_cancel_query_id(
+            "20240101-123456", r"^[a-zA-Z0-9_]+$"
+        ) is False
+
+
+class TestMySQLCancelQueryValidation:
+    """Tests for MySQL cancel_query input validation"""
+
+    @patch("sqlalchemy.engine.Engine.connect")
+    def test_cancel_query_valid_id(self, engine_mock: Mock) -> None:
+        """Test that valid MySQL connection ID works"""
+        from superset.db_engine_specs.mysql import MySQLEngineSpec
+        from superset.models.sql_lab import Query
+
+        query = Query()
+        cursor_mock = engine_mock.return_value.__enter__.return_value

Review Comment:
   **Suggestion:** Tests patching "sqlalchemy.engine.Engine.connect" but then 
derive a cursor via the patch's return value; this is unnecessary and fragile 
for unit tests that directly call engine-spec `cancel_query` implementations. 
Create an explicit `Mock` cursor instead and remove the patch 
decorator/parameter to avoid coupling tests to SQLAlchemy internals and to 
ensure the cursor contract is explicit. [possible bug]
   
   **Severity Level:** Critical 🚨
   ```suggestion
       def test_cancel_query_valid_id(self) -> None:
           """Test that valid MySQL connection ID works"""
           from superset.db_engine_specs.mysql import MySQLEngineSpec
           from superset.models.sql_lab import Query
   
           query = Query()
           cursor_mock = Mock()
           cursor_mock.execute = Mock()
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   The current test unnecessarily ties itself to SQLAlchemy internals by 
patching
   Engine.connect only to extract a cursor mock. Replacing that with an explicit
   Mock cursor makes the test clearer and less brittle without changing 
behavior.
   The improved code is executable and simplifies intent; it's a valid test 
robustness improvement.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** tests/unit_tests/db_engine_specs/test_cancel_query_validation.py
   **Line:** 96:103
   **Comment:**
        *Possible Bug: Tests patching "sqlalchemy.engine.Engine.connect" but 
then derive a cursor via the patch's return value; this is unnecessary and 
fragile for unit tests that directly call engine-spec `cancel_query` 
implementations. Create an explicit `Mock` cursor instead and remove the patch 
decorator/parameter to avoid coupling tests to SQLAlchemy internals and to 
ensure the cursor contract is explicit.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
superset/db_engine_specs/redshift.py:
##########
@@ -173,6 +173,11 @@ def cancel_query(cls, cursor: Any, query: Query, 
cancel_query_id: str) -> bool:
         :param cancel_query_id: Redshift PID
         :return: True if query cancelled successfully, False otherwise
         """
+        # Validate cancel_query_id to prevent SQL injection
+        # Redshift pg_backend_pid() returns an integer
+        if not cls.validate_cancel_query_id(cancel_query_id, r"^\d+$"):
+            return False

Review Comment:
   **Suggestion:** Early return on invalid `cancel_query_id` leaks the provided 
`cursor` because the function returns before the cursor is closed; ensure the 
cursor is closed (or at least attempted to be closed) before returning False. 
[resource leak]
   
   **Severity Level:** Minor ⚠️
   ```suggestion
               try:
                   if cursor is not None:
                       cursor.close()
               except Exception:
                   # best effort close; swallow to preserve original behavior 
of returning False
                   pass
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   The early return skips the same cursor cleanup that happens on the success 
path (cursor.close() after execute). Given the docstring says the function 
receives a "New cursor instance", the function should be responsible for 
closing it. The proposed change is safe (best-effort close, swallow exceptions) 
and prevents a potential resource leak without changing behavior otherwise.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/db_engine_specs/redshift.py
   **Line:** 179:179
   **Comment:**
        *Resource Leak: Early return on invalid `cancel_query_id` leaks the 
provided `cursor` because the function returns before the cursor is closed; 
ensure the cursor is closed (or at least attempted to be closed) before 
returning False.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
superset/db_engine_specs/singlestore.py:
##########
@@ -541,6 +541,11 @@ def cancel_query(cls, cursor: Any, query: Query, 
cancel_query_id: str) -> bool:
         :param cancel_query_id: SingleStore connection ID and aggregator ID
         :return: True if query cancelled successfully, False otherwise
         """
+        # Validate cancel_query_id to prevent SQL injection
+        # SingleStore format: "CONNECTION_ID AGGREGATOR_ID" (two 
space-separated integers)
+        if not cls.validate_cancel_query_id(cancel_query_id, 
r"^\d+(\s+\d+)?$"):
+            return False

Review Comment:
   **Suggestion:** When validation fails the method returns False silently; log 
a warning with the invalid `cancel_query_id` to aid debugging and alert 
operators rather than failing quietly. [possible bug]
   
   **Severity Level:** Critical 🚨
   ```suggestion
               logger.warning("cancel_query called with invalid 
cancel_query_id: %r", cancel_query_id)
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   Returning False silently on validation failure makes debugging harder. 
Emitting a logger.warning with the invalid value (non-sensitive) improves 
observability and helps operators triage issues; this is a non-invasive, useful 
change.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/db_engine_specs/singlestore.py
   **Line:** 547:547
   **Comment:**
        *Possible Bug: When validation fails the method returns False silently; 
log a warning with the invalid `cancel_query_id` to aid debugging and alert 
operators rather than failing quietly.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
superset/db_engine_specs/postgres.py:
##########
@@ -473,6 +473,11 @@ def cancel_query(cls, cursor: Any, query: Query, 
cancel_query_id: str) -> bool:
         :param cancel_query_id: Postgres PID
         :return: True if query cancelled successfully, False otherwise
         """
+        # Validate cancel_query_id to prevent SQL injection
+        # PostgreSQL pg_backend_pid() returns an integer
+        if not cls.validate_cancel_query_id(cancel_query_id, r"^\d+$"):

Review Comment:
   **Suggestion:** `cancel_query_id` may not be an integer type; converting it 
to `int` when executing the parameterized query can raise ValueError if the 
value isn't numeric — validate and convert to an integer first and only call 
execute if conversion succeeds. [type error]
   
   **Severity Level:** Minor ⚠️
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to