codeant-ai-for-open-source[bot] commented on code in PR #36422:
URL: https://github.com/apache/superset/pull/36422#discussion_r2589099886


##########
superset/tags/filters.py:
##########
@@ -68,7 +68,7 @@ def apply(self, query: Query, value: Any) -> Query:
             .join(self.model.tags)
             .filter(Tag.name.ilike(ilike_value))
         )
-        return query.filter(self.model.id.in_(tags_query))  # type: 
ignore[union-attr]
+        return query.filter(self.model.id.in_(tags_query))

Review Comment:
   **Suggestion:** Using a SQLAlchemy Query object directly inside `.in_()` can 
raise "Select object is not selectable" or similar errors on some SQLAlchemy 
versions; convert the query to a selectable (subquery) before using it in 
`in_()`. [possible bug]
   
   **Severity Level:** Critical 🚨
   ```suggestion
           return query.filter(self.model.id.in_(tags_query.subquery()))
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   The current code uses a SQLAlchemy Query object inside .in_(), which can 
raise
   "Select object is not selectable" on newer SQLAlchemy versions. Converting 
the
   query to a selectable (e.g. .subquery() or .statement) fixes that 
compatibility
   issue and produces a valid SQL expression.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/tags/filters.py
   **Line:** 71:71
   **Comment:**
        *Possible Bug: Using a SQLAlchemy Query object directly inside `.in_()` 
can raise "Select object is not selectable" or similar errors on some 
SQLAlchemy versions; convert the query to a selectable (subquery) before using 
it in `in_()`.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
superset/connectors/sqla/models.py:
##########
@@ -1787,7 +1787,7 @@ def query_datasources_by_permissions(  # pylint: 
disable=invalid-name
         #
         #   table.column IN (SELECT 1 FROM (SELECT 1) WHERE 1!=1)
         filters = [
-            method.in_(perms)  # type: ignore[union-attr]
+            method.in_(perms)

Review Comment:
   **Suggestion:** If `perms` is a plain string (e.g., a single permission 
passed as "perm_string"), SQLAlchemy's `in_` will iterate the string and 
produce an IN clause over individual characters instead of the intended single 
value; wrap non-iterable-or-non-sequence single values in a list to ensure 
correct semantics. [logic error]
   
   **Severity Level:** Minor ⚠️
   ```suggestion
               (
                   method.in_([perms])
                   if isinstance(perms, (str, bytes))
                   else method.in_(perms)
               )
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   This is a reasonable defensive improvement: if a caller accidentally passes 
a plain string/bytes,
   SQLAlchemy will treat it as an iterable of characters and build an incorrect 
IN clause.
   The PR's typing suggests sets are expected, but runtime callers might pass a 
single string.
   Wrapping string-like values in a single-element list fixes a real footgun 
without changing
   intended behavior for proper iterables.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/connectors/sqla/models.py
   **Line:** 1790:1790
   **Comment:**
        *Logic Error: If `perms` is a plain string (e.g., a single permission 
passed as "perm_string"), SQLAlchemy's `in_` will iterate the string and 
produce an IN clause over individual characters instead of the intended single 
value; wrap non-iterable-or-non-sequence single values in a list to ensure 
correct semantics.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
superset/commands/security/create.py:
##########
@@ -44,7 +44,7 @@ def run(self) -> Any:
     def validate(self) -> None:
         roles = populate_roles(self._roles)
         tables = (
-            
db.session.query(SqlaTable).filter(SqlaTable.id.in_(self._tables)).all()  # 
type: ignore[attr-defined]
+            
db.session.query(SqlaTable).filter(SqlaTable.id.in_(self._tables)).all()

Review Comment:
   **Suggestion:** Some DB backends can produce invalid SQL for an empty IN 
list; short-circuit when `self._tables` is empty to avoid generating an IN () 
clause and return an empty result list immediately. [possible bug]
   
   **Severity Level:** Critical 🚨
   ```suggestion
               [] if not self._tables else 
db.session.query(SqlaTable).filter(SqlaTable.id.in_(self._tables)).all()
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   Short-circuiting when self._tables is empty to return [] avoids generating 
an empty IN clause on backends that mishandle it. SQLAlchemy often handles 
empty sequences, but this explicit guard is cheap and prevents subtle 
DB-specific failures.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/commands/security/create.py
   **Line:** 47:47
   **Comment:**
        *Possible Bug: Some DB backends can produce invalid SQL for an empty IN 
list; short-circuit when `self._tables` is empty to avoid generating an IN () 
clause and return an empty result list immediately.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
superset/tags/filters.py:
##########
@@ -90,4 +90,4 @@ def apply(self, query: Query, value: Any) -> Query:
             .join(self.model.tags)
             .filter(Tag.id == value)
         )
-        return query.filter(self.model.id.in_(tags_query))  # type: 
ignore[union-attr]
+        return query.filter(self.model.id.in_(tags_query))

Review Comment:
   **Suggestion:** For filtering by tag id, `IN(subquery)` is unnecessary and 
may be less efficient; use the relationship `.any()` with `Tag.id == value` to 
produce an EXISTS-style predicate and avoid creating an explicit subquery 
result. [performance]
   
   **Severity Level:** Minor ⚠️
   ```suggestion
           return query.filter(self.model.tags.any(Tag.id == value))
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   Replacing the IN(subquery) with self.model.tags.any(Tag.id == value) yields 
an
   EXISTS-style predicate that is clearer and generally performs better; it's a
   safe semantic equivalent to the current filter.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/tags/filters.py
   **Line:** 93:93
   **Comment:**
        *Performance: For filtering by tag id, `IN(subquery)` is unnecessary 
and may be less efficient; use the relationship `.any()` with `Tag.id == value` 
to produce an EXISTS-style predicate and avoid creating an explicit subquery 
result.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
tests/integration_tests/databases/api_tests.py:
##########
@@ -4126,6 +4126,74 @@ def test_validate_sql_endpoint_failure(self, 
get_validator_by_name):
         assert rv.status_code == 422
         assert "Kaboom!" in response["errors"][0]["message"]
 
+    @mock.patch.dict(
+        "superset.config.SQL_VALIDATORS_BY_ENGINE",
+        SQL_VALIDATORS_BY_ENGINE,
+        clear=True,
+    )
+    def test_validate_sql_with_jinja_templates(self):
+        """
+        Database API: validate SQL with Jinja templates
+        """
+        request_payload = {
+            "sql": (
+                "SELECT *\nFROM birth_names\nWHERE 1=1\n"
+                "{% if city_filter is defined %}\n"
+                "    AND city = '{{ city_filter }}'\n{% endif %}\n"
+                "LIMIT {{ limit | default(100) }}"
+            ),
+            "schema": None,
+            "template_params": {},
+        }
+
+        example_db = get_example_database()
+        if example_db.backend not in ("presto", "postgresql"):
+            pytest.skip("Only presto and PG are implemented")
+
+        self.login(ADMIN_USERNAME)
+        uri = f"api/v1/database/{example_db.id}/validate_sql/"
+        rv = self.client.post(uri, json=request_payload)
+        response = json.loads(rv.data.decode("utf-8"))
+        assert rv.status_code == 200
+        # Should not contain Jinja syntax errors like "syntax error at or near 
{"
+        for error in response["result"]:
+            assert "{" not in error["message"]
+            assert "%" not in error["message"]

Review Comment:
   **Suggestion:** The tests assert that no Jinja syntax remains by checking 
for any "{" or "%" characters, which is overly broad and may reject legitimate 
messages containing those characters (e.g., SQL with modulo operator or JSON 
snippets). Instead, check for Jinja-specific tokens "{{" and "{%" to avoid 
false positives. [logic error]
   
   **Severity Level:** Minor ⚠️
   ```suggestion
           for error in response.get("result", []):
               message = error.get("message") if isinstance(error, dict) else 
str(error)
               # Ensure Jinja template constructs are not leaked in validator 
messages.
               assert "{{" not in message
               assert "{%" not in message
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   The current assertions are too coarse and can flag legitimate messages that 
contain a brace or percent sign unrelated to Jinja templates (e.g. JSON 
snippets or SQL with a modulo operator). The proposed change tightens the check 
to Jinja constructs ("{{" and "{%") and adds defensive accessors for keys, 
which makes the test both more accurate and more robust to varied validator 
message shapes. This is a functional improvement to the test, not merely 
stylistic.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** tests/integration_tests/databases/api_tests.py
   **Line:** 4159:4161
   **Comment:**
        *Logic Error: The tests assert that no Jinja syntax remains by checking 
for any "{" or "%" characters, which is overly broad and may reject legitimate 
messages containing those characters (e.g., SQL with modulo operator or JSON 
snippets). Instead, check for Jinja-specific tokens "{{" and "{%" to avoid 
false positives.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
superset/daos/base.py:
##########
@@ -173,7 +173,7 @@ class BaseDAO(CoreBaseDAO[T], Generic[T]):
 
     def __init_subclass__(cls) -> None:
         cls.model_cls = get_args(
-            cls.__orig_bases__[0]  # type: ignore  # pylint: disable=no-member
+            cls.__orig_bases__[0]  # pylint: disable=no-member
         )[0]

Review Comment:
   **Suggestion:** Accessing `cls.__orig_bases__[0]` unguarded can raise 
AttributeError (if `__orig_bases__` doesn't exist) or IndexError when classes 
aren't parameterized; guard the access and fall back to `cls.__bases__` and 
handle empty/get_args results so `__init_subclass__` doesn't crash at subclass 
creation. [possible bug]
   
   **Severity Level:** Critical 🚨
   ```suggestion
           cls_bases = getattr(cls, "__orig_bases__", None) or cls.__bases__
           args = get_args(cls_bases[0]) if cls_bases else ()
           cls.model_cls = args[0] if args else None
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   The current code assumes __orig_bases__ exists and contains at least one 
entry with generic args. The proposed guard prevents AttributeError/IndexError 
and makes subclass initialization robust for non-generic or unexpected bases. 
It's a straightforward, correct hardening of the current fragile access.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/daos/base.py
   **Line:** 175:177
   **Comment:**
        *Possible Bug: Accessing `cls.__orig_bases__[0]` unguarded can raise 
AttributeError (if `__orig_bases__` doesn't exist) or IndexError when classes 
aren't parameterized; guard the access and fall back to `cls.__bases__` and 
handle empty/get_args results so `__init_subclass__` doesn't crash at subclass 
creation.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
superset/commands/database/validate_sql.py:
##########
@@ -62,6 +63,12 @@ def run(self) -> list[dict[str, Any]]:
         sql = self._properties["sql"]
         catalog = self._properties.get("catalog")
         schema = self._properties.get("schema")
+        template_params = self._properties.get("template_params", {})
+

Review Comment:
   **Suggestion:** Unpacking `template_params` with `**template_params` will 
raise a TypeError if `template_params` is not a dict or contains 
non-string/non-identifier keys (e.g. numbers, dashes). Validate and sanitize 
`template_params` (ensure it's a dict and filter keys to string identifiers) 
before unpacking to avoid runtime TypeError. [type error]
   
   **Severity Level:** Minor ⚠️
   ```suggestion
           # Ensure template_params is a mapping we can safely unpack as 
keyword args
           if not isinstance(template_params, dict):
               template_params = {}
           # Filter out keys that can't be used as keyword argument names to 
avoid TypeError
           template_params = {
               k: v for k, v in template_params.items() if isinstance(k, str) 
and k.isidentifier()
           }
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   The existing code unpacks template_params with ** which will raise a 
TypeError
   if template_params is not a dict or contains non-string / non-identifier 
keys.
   Adding a defensive check to ensure template_params is a dict prevents an
   unexpected runtime crash. The suggested filtering to only use string
   identifiers is a conservative choice; it does change semantics (drops keys
   that aren't valid kwarg names) so it must be done intentionally, but the
   core idea — validate the shape before using ** — is correct and fixes a
   real potential runtime error visible from the diff.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/commands/database/validate_sql.py
   **Line:** 67:67
   **Comment:**
        *Type Error: Unpacking `template_params` with `**template_params` will 
raise a TypeError if `template_params` is not a dict or contains 
non-string/non-identifier keys (e.g. numbers, dashes). Validate and sanitize 
`template_params` (ensure it's a dict and filter keys to string identifiers) 
before unpacking to avoid runtime TypeError.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
superset/commands/security/update.py:
##########
@@ -51,7 +51,7 @@ def validate(self) -> None:
             raise RLSRuleNotFoundError()
         roles = populate_roles(self._roles)
         tables = (
-            
db.session.query(SqlaTable).filter(SqlaTable.id.in_(self._tables)).all()  # 
type: ignore[attr-defined]
+            
db.session.query(SqlaTable).filter(SqlaTable.id.in_(self._tables)).all()
         )

Review Comment:
   **Suggestion:** Passing an empty list into SQLAlchemy's IN may behave 
unexpectedly or produce an unnecessary query; short-circuit when `self._tables` 
is empty to avoid querying and ensure the subsequent length check behaves 
deterministically. [possible bug]
   
   **Severity Level:** Critical 🚨
   ```suggestion
           # If no table ids were provided, avoid querying the DB and set 
tables to empty list
           if not self._tables:
               tables = []
           else:
               tables = 
db.session.query(SqlaTable).filter(SqlaTable.id.in_(self._tables)).all()
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   Short-circuiting when self._tables is empty avoids an unnecessary DB 
roundtrip and makes intent explicit. It's a small, safe optimization that 
doesn't change semantics (query would return no rows anyway) and avoids any 
edge behavior some DB backends/sqlalchemy versions might have with empty IN 
lists.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/commands/security/update.py
   **Line:** 53:55
   **Comment:**
        *Possible Bug: Passing an empty list into SQLAlchemy's IN may behave 
unexpectedly or produce an unnecessary query; short-circuit when `self._tables` 
is empty to avoid querying and ensure the subsequent length check behaves 
deterministically.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to