codeant-ai-for-open-source[bot] commented on code in PR #36422:
URL: https://github.com/apache/superset/pull/36422#discussion_r2589099886
##########
superset/tags/filters.py:
##########
@@ -68,7 +68,7 @@ def apply(self, query: Query, value: Any) -> Query:
.join(self.model.tags)
.filter(Tag.name.ilike(ilike_value))
)
- return query.filter(self.model.id.in_(tags_query)) # type:
ignore[union-attr]
+ return query.filter(self.model.id.in_(tags_query))
Review Comment:
**Suggestion:** Using a SQLAlchemy Query object directly inside `.in_()` can
raise "Select object is not selectable" or similar errors on some SQLAlchemy
versions; convert the query to a selectable (subquery) before using it in
`in_()`. [possible bug]
**Severity Level:** Critical 🚨
```suggestion
return query.filter(self.model.id.in_(tags_query.subquery()))
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
The current code uses a SQLAlchemy Query object inside .in_(), which can
raise
"Select object is not selectable" on newer SQLAlchemy versions. Converting
the
query to a selectable (e.g. .subquery() or .statement) fixes that
compatibility
issue and produces a valid SQL expression.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/tags/filters.py
**Line:** 71:71
**Comment:**
*Possible Bug: Using a SQLAlchemy Query object directly inside `.in_()`
can raise "Select object is not selectable" or similar errors on some
SQLAlchemy versions; convert the query to a selectable (subquery) before using
it in `in_()`.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
##########
superset/connectors/sqla/models.py:
##########
@@ -1787,7 +1787,7 @@ def query_datasources_by_permissions( # pylint:
disable=invalid-name
#
# table.column IN (SELECT 1 FROM (SELECT 1) WHERE 1!=1)
filters = [
- method.in_(perms) # type: ignore[union-attr]
+ method.in_(perms)
Review Comment:
**Suggestion:** If `perms` is a plain string (e.g., a single permission
passed as "perm_string"), SQLAlchemy's `in_` will iterate the string and
produce an IN clause over individual characters instead of the intended single
value; wrap non-iterable-or-non-sequence single values in a list to ensure
correct semantics. [logic error]
**Severity Level:** Minor ⚠️
```suggestion
(
method.in_([perms])
if isinstance(perms, (str, bytes))
else method.in_(perms)
)
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
This is a reasonable defensive improvement: if a caller accidentally passes
a plain string/bytes,
SQLAlchemy will treat it as an iterable of characters and build an incorrect
IN clause.
The PR's typing suggests sets are expected, but runtime callers might pass a
single string.
Wrapping string-like values in a single-element list fixes a real footgun
without changing
intended behavior for proper iterables.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/connectors/sqla/models.py
**Line:** 1790:1790
**Comment:**
*Logic Error: If `perms` is a plain string (e.g., a single permission
passed as "perm_string"), SQLAlchemy's `in_` will iterate the string and
produce an IN clause over individual characters instead of the intended single
value; wrap non-iterable-or-non-sequence single values in a list to ensure
correct semantics.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
##########
superset/commands/security/create.py:
##########
@@ -44,7 +44,7 @@ def run(self) -> Any:
def validate(self) -> None:
roles = populate_roles(self._roles)
tables = (
-
db.session.query(SqlaTable).filter(SqlaTable.id.in_(self._tables)).all() #
type: ignore[attr-defined]
+
db.session.query(SqlaTable).filter(SqlaTable.id.in_(self._tables)).all()
Review Comment:
**Suggestion:** Some DB backends can produce invalid SQL for an empty IN
list; short-circuit when `self._tables` is empty to avoid generating an IN ()
clause and return an empty result list immediately. [possible bug]
**Severity Level:** Critical 🚨
```suggestion
[] if not self._tables else
db.session.query(SqlaTable).filter(SqlaTable.id.in_(self._tables)).all()
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
Short-circuiting when self._tables is empty to return [] avoids generating
an empty IN clause on backends that mishandle it. SQLAlchemy often handles
empty sequences, but this explicit guard is cheap and prevents subtle
DB-specific failures.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/commands/security/create.py
**Line:** 47:47
**Comment:**
*Possible Bug: Some DB backends can produce invalid SQL for an empty IN
list; short-circuit when `self._tables` is empty to avoid generating an IN ()
clause and return an empty result list immediately.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
##########
superset/tags/filters.py:
##########
@@ -90,4 +90,4 @@ def apply(self, query: Query, value: Any) -> Query:
.join(self.model.tags)
.filter(Tag.id == value)
)
- return query.filter(self.model.id.in_(tags_query)) # type:
ignore[union-attr]
+ return query.filter(self.model.id.in_(tags_query))
Review Comment:
**Suggestion:** For filtering by tag id, `IN(subquery)` is unnecessary and
may be less efficient; use the relationship `.any()` with `Tag.id == value` to
produce an EXISTS-style predicate and avoid creating an explicit subquery
result. [performance]
**Severity Level:** Minor ⚠️
```suggestion
return query.filter(self.model.tags.any(Tag.id == value))
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
Replacing the IN(subquery) with self.model.tags.any(Tag.id == value) yields
an
EXISTS-style predicate that is clearer and generally performs better; it's a
safe semantic equivalent to the current filter.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/tags/filters.py
**Line:** 93:93
**Comment:**
*Performance: For filtering by tag id, `IN(subquery)` is unnecessary
and may be less efficient; use the relationship `.any()` with `Tag.id == value`
to produce an EXISTS-style predicate and avoid creating an explicit subquery
result.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
##########
tests/integration_tests/databases/api_tests.py:
##########
@@ -4126,6 +4126,74 @@ def test_validate_sql_endpoint_failure(self,
get_validator_by_name):
assert rv.status_code == 422
assert "Kaboom!" in response["errors"][0]["message"]
+ @mock.patch.dict(
+ "superset.config.SQL_VALIDATORS_BY_ENGINE",
+ SQL_VALIDATORS_BY_ENGINE,
+ clear=True,
+ )
+ def test_validate_sql_with_jinja_templates(self):
+ """
+ Database API: validate SQL with Jinja templates
+ """
+ request_payload = {
+ "sql": (
+ "SELECT *\nFROM birth_names\nWHERE 1=1\n"
+ "{% if city_filter is defined %}\n"
+ " AND city = '{{ city_filter }}'\n{% endif %}\n"
+ "LIMIT {{ limit | default(100) }}"
+ ),
+ "schema": None,
+ "template_params": {},
+ }
+
+ example_db = get_example_database()
+ if example_db.backend not in ("presto", "postgresql"):
+ pytest.skip("Only presto and PG are implemented")
+
+ self.login(ADMIN_USERNAME)
+ uri = f"api/v1/database/{example_db.id}/validate_sql/"
+ rv = self.client.post(uri, json=request_payload)
+ response = json.loads(rv.data.decode("utf-8"))
+ assert rv.status_code == 200
+ # Should not contain Jinja syntax errors like "syntax error at or near
{"
+ for error in response["result"]:
+ assert "{" not in error["message"]
+ assert "%" not in error["message"]
Review Comment:
**Suggestion:** The tests assert that no Jinja syntax remains by checking
for any "{" or "%" characters, which is overly broad and may reject legitimate
messages containing those characters (e.g., SQL with modulo operator or JSON
snippets). Instead, check for Jinja-specific tokens "{{" and "{%" to avoid
false positives. [logic error]
**Severity Level:** Minor ⚠️
```suggestion
for error in response.get("result", []):
message = error.get("message") if isinstance(error, dict) else
str(error)
# Ensure Jinja template constructs are not leaked in validator
messages.
assert "{{" not in message
assert "{%" not in message
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
The current assertions are too coarse and can flag legitimate messages that
contain a brace or percent sign unrelated to Jinja templates (e.g. JSON
snippets or SQL with a modulo operator). The proposed change tightens the check
to Jinja constructs ("{{" and "{%") and adds defensive accessors for keys,
which makes the test both more accurate and more robust to varied validator
message shapes. This is a functional improvement to the test, not merely
stylistic.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** tests/integration_tests/databases/api_tests.py
**Line:** 4159:4161
**Comment:**
*Logic Error: The tests assert that no Jinja syntax remains by checking
for any "{" or "%" characters, which is overly broad and may reject legitimate
messages containing those characters (e.g., SQL with modulo operator or JSON
snippets). Instead, check for Jinja-specific tokens "{{" and "{%" to avoid
false positives.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
##########
superset/daos/base.py:
##########
@@ -173,7 +173,7 @@ class BaseDAO(CoreBaseDAO[T], Generic[T]):
def __init_subclass__(cls) -> None:
cls.model_cls = get_args(
- cls.__orig_bases__[0] # type: ignore # pylint: disable=no-member
+ cls.__orig_bases__[0] # pylint: disable=no-member
)[0]
Review Comment:
**Suggestion:** Accessing `cls.__orig_bases__[0]` unguarded can raise
AttributeError (if `__orig_bases__` doesn't exist) or IndexError when classes
aren't parameterized; guard the access and fall back to `cls.__bases__` and
handle empty/get_args results so `__init_subclass__` doesn't crash at subclass
creation. [possible bug]
**Severity Level:** Critical 🚨
```suggestion
cls_bases = getattr(cls, "__orig_bases__", None) or cls.__bases__
args = get_args(cls_bases[0]) if cls_bases else ()
cls.model_cls = args[0] if args else None
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
The current code assumes __orig_bases__ exists and contains at least one
entry with generic args. The proposed guard prevents AttributeError/IndexError
and makes subclass initialization robust for non-generic or unexpected bases.
It's a straightforward, correct hardening of the current fragile access.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/daos/base.py
**Line:** 175:177
**Comment:**
*Possible Bug: Accessing `cls.__orig_bases__[0]` unguarded can raise
AttributeError (if `__orig_bases__` doesn't exist) or IndexError when classes
aren't parameterized; guard the access and fall back to `cls.__bases__` and
handle empty/get_args results so `__init_subclass__` doesn't crash at subclass
creation.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
##########
superset/commands/database/validate_sql.py:
##########
@@ -62,6 +63,12 @@ def run(self) -> list[dict[str, Any]]:
sql = self._properties["sql"]
catalog = self._properties.get("catalog")
schema = self._properties.get("schema")
+ template_params = self._properties.get("template_params", {})
+
Review Comment:
**Suggestion:** Unpacking `template_params` with `**template_params` will
raise a TypeError if `template_params` is not a dict or contains
non-string/non-identifier keys (e.g. numbers, dashes). Validate and sanitize
`template_params` (ensure it's a dict and filter keys to string identifiers)
before unpacking to avoid runtime TypeError. [type error]
**Severity Level:** Minor ⚠️
```suggestion
# Ensure template_params is a mapping we can safely unpack as
keyword args
if not isinstance(template_params, dict):
template_params = {}
# Filter out keys that can't be used as keyword argument names to
avoid TypeError
template_params = {
k: v for k, v in template_params.items() if isinstance(k, str)
and k.isidentifier()
}
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
The existing code unpacks template_params with ** which will raise a
TypeError
if template_params is not a dict or contains non-string / non-identifier
keys.
Adding a defensive check to ensure template_params is a dict prevents an
unexpected runtime crash. The suggested filtering to only use string
identifiers is a conservative choice; it does change semantics (drops keys
that aren't valid kwarg names) so it must be done intentionally, but the
core idea — validate the shape before using ** — is correct and fixes a
real potential runtime error visible from the diff.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/commands/database/validate_sql.py
**Line:** 67:67
**Comment:**
*Type Error: Unpacking `template_params` with `**template_params` will
raise a TypeError if `template_params` is not a dict or contains
non-string/non-identifier keys (e.g. numbers, dashes). Validate and sanitize
`template_params` (ensure it's a dict and filter keys to string identifiers)
before unpacking to avoid runtime TypeError.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
##########
superset/commands/security/update.py:
##########
@@ -51,7 +51,7 @@ def validate(self) -> None:
raise RLSRuleNotFoundError()
roles = populate_roles(self._roles)
tables = (
-
db.session.query(SqlaTable).filter(SqlaTable.id.in_(self._tables)).all() #
type: ignore[attr-defined]
+
db.session.query(SqlaTable).filter(SqlaTable.id.in_(self._tables)).all()
)
Review Comment:
**Suggestion:** Passing an empty list into SQLAlchemy's IN may behave
unexpectedly or produce an unnecessary query; short-circuit when `self._tables`
is empty to avoid querying and ensure the subsequent length check behaves
deterministically. [possible bug]
**Severity Level:** Critical 🚨
```suggestion
# If no table ids were provided, avoid querying the DB and set
tables to empty list
if not self._tables:
tables = []
else:
tables =
db.session.query(SqlaTable).filter(SqlaTable.id.in_(self._tables)).all()
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
Short-circuiting when self._tables is empty avoids an unnecessary DB
roundtrip and makes intent explicit. It's a small, safe optimization that
doesn't change semantics (query would return no rows anyway) and avoids any
edge behavior some DB backends/sqlalchemy versions might have with empty IN
lists.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/commands/security/update.py
**Line:** 53:55
**Comment:**
*Possible Bug: Passing an empty list into SQLAlchemy's IN may behave
unexpectedly or produce an unnecessary query; short-circuit when `self._tables`
is empty to avoid querying and ensure the subsequent length check behaves
deterministically.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]