codeant-ai-for-open-source[bot] commented on code in PR #40421:
URL: https://github.com/apache/superset/pull/40421#discussion_r3324712120
##########
superset/config.py:
##########
@@ -1775,6 +1775,19 @@ def engine_context_manager( # pylint:
disable=unused-argument
"pg_read_file",
"pg_ls_dir",
"pg_read_binary_file",
+ # PostgreSQL large-object functions: writers can plant arbitrary
+ # bytes on the server filesystem (lo_export, lo_from_bytea, lowrite,
+ # lo_put, lo_create, lo_import) and readers can pull bytes back out
+ # (lo_get, loread). Defense-in-depth on top of is_mutating()'s
+ # function-name check.
+ "lo_from_bytea",
+ "lo_export",
+ "lo_import",
+ "lo_put",
+ "lo_create",
+ "lowrite",
+ "lo_get",
+ "loread",
Review Comment:
**Suggestion:** The PostgreSQL disallowed function list is incomplete for
large-object mutation paths: `lo_unlink` is not included, so environments
relying on `DISALLOWED_SQL_FUNCTIONS` defense-in-depth can still execute
destructive large-object operations via function calls. Add `lo_unlink` to the
PostgreSQL set. [security]
<details>
<summary><b>Severity Level:</b> Critical 🚨</summary>
```mdx
- ❌ SQL Lab allows lo_unlink despite function denylist config.
- ⚠️ Executor-based function checks also miss lo_unlink.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Inspect the PostgreSQL disallowed-function configuration in
`superset/config.py:20-65`,
where `DISALLOWED_SQL_FUNCTIONS["postgresql"]` includes `lo_from_bytea`,
`lo_export`,
`lo_import`, `lo_put`, `lo_create`, `lowrite`, `lo_get`, and `loread` but
does NOT include
`lo_unlink`.
2. Note how SQL Lab enforces this denylist in `superset/sql_lab.py:406-414`:
it pulls
`disallowed_functions =
app.config["DISALLOWED_SQL_FUNCTIONS"].get(db_engine_spec.engine,
set())` and calls
`parsed_script.check_functions_present(disallowed_functions)`, raising
`SupersetDisallowedSQLFunctionException(disallowed_functions)` only when a
listed function
appears.
3. Observe the function detection implementation in
`superset/sql/parse.py:345-388`
(`SQLStatement.check_functions_present`), which walks the AST
(`self._parsed.find_all(exp.Func)`) and collects SQL function names like
`LO_UNLINK` into
the `present` set when they appear in a query.
4. Run a query like `SELECT lo_unlink(12345);` in SQL Lab against a
PostgreSQL database
whose engine name is `"postgresql"`: `check_functions_present` will see
`LO_UNLINK` in the
AST, but because `DISALLOWED_SQL_FUNCTIONS["postgresql"]` does not contain
`"lo_unlink"`,
the method returns `False`, no `SupersetDisallowedSQLFunctionException` is
raised, and the
destructive `lo_unlink` operation is allowed despite the intended
defense-in-depth
denylist.
```
</details>
[Fix in
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=a60c0e2a145045ff87e7478166ba15b5&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
| [Fix in VSCode
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=a60c0e2a145045ff87e7478166ba15b5&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/config.py
**Line:** 1778:1790
**Comment:**
*Security: The PostgreSQL disallowed function list is incomplete for
large-object mutation paths: `lo_unlink` is not included, so environments
relying on `DISALLOWED_SQL_FUNCTIONS` defense-in-depth can still execute
destructive large-object operations via function calls. Add `lo_unlink` to the
PostgreSQL set.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40421&comment_hash=39d7419dee77a2857df9e0fa077efeb2f924fbebbac7c31a13a7fd0c6f6fa1b7&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40421&comment_hash=39d7419dee77a2857df9e0fa077efeb2f924fbebbac7c31a13a7fd0c6f6fa1b7&reaction=dislike'>👎</a>
##########
superset/sql/parse.py:
##########
@@ -562,6 +562,62 @@ class SQLStatement(BaseSQLStatement[exp.Expression]):
This class is used for all engines with dialects that can be parsed using
sqlglot.
"""
+ # Function names that mutate server-side state but appear in the AST as
+ # plain function calls inside a non-mutating wrapper. Used by
+ # ``is_mutating()`` to classify e.g. PostgreSQL large-object writers.
+ # Names are uppercased for comparison.
+ _MUTATING_FUNCTION_NAMES: frozenset[str] = frozenset(
+ {
+ "LO_FROM_BYTEA",
+ "LO_EXPORT",
+ "LO_IMPORT",
+ "LO_PUT",
+ "LO_CREATE",
+ "LOWRITE",
+ # PostgreSQL sequence mutators. `SELECT setval('seq', N)` looks
+ # like a read but changes sequence state for every subsequent
+ # `nextval` caller.
+ "SETVAL",
+ }
+ )
Review Comment:
**Suggestion:** The mutating-function denylist is incomplete: `lo_unlink` is
a PostgreSQL large-object writer (it deletes large objects) but is missing from
`_MUTATING_FUNCTION_NAMES`. A query like `SELECT lo_unlink(...)` will not be
classified as mutating and can bypass the read-only `allow_dml=False` guard.
Add `LO_UNLINK` to the mutating function set. [security]
<details>
<summary><b>Severity Level:</b> Critical 🚨</summary>
```mdx
- ❌ Read-only databases can execute lo_unlink mutations.
- ❌ DML guard in SQL Lab is bypassed for lo_unlink.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Inspect the mutating-function set in `superset/sql/parse.py:569-582`,
where
`SQLStatement._MUTATING_FUNCTION_NAMES` includes `LO_FROM_BYTEA`,
`LO_EXPORT`,
`LO_IMPORT`, `LO_PUT`, `LO_CREATE`, `LOWRITE`, and `SETVAL`, but omits
`LO_UNLINK`, even
though `lo_unlink` deletes large objects in PostgreSQL.
2. Follow the mutation classification logic in `SQLStatement.is_mutating` at
`superset/sql/parse.py:734-217`: after checking AST node types, it walks
`self._parsed.find_all(exp.Func)` and returns `True` when `name in
self._MUTATING_FUNCTION_NAMES`; because `LO_UNLINK` is not in that set,
`SELECT
lo_unlink(...)` is treated as non-mutating.
3. See how script-level mutation is derived in `SQLScript.has_mutation` at
`superset/sql/parse.py:49-55`, which returns `any(statement.is_mutating()
for statement in
self.statements)`, and how SQL Lab enforces read-only mode in
`superset/sql_lab.py:429-431` via `if parsed_script.has_mutation() and not
database.allow_dml: raise SupersetDMLNotAllowedException()`.
4. Configure a PostgreSQL database in Superset with `database.allow_dml` set
to `False`,
then execute `SELECT lo_unlink(12345);` in SQL Lab: `SQLScript` parses the
script using
the `"postgresql"` engine, `SQLStatement.is_mutating` returns `False`
because `LO_UNLINK`
is not in `_MUTATING_FUNCTION_NAMES`, `parsed_script.has_mutation()` is
`False`, the
`SupersetDMLNotAllowedException` guard at `superset/sql_lab.py:429-431` is
not triggered,
and the supposedly read-only connection executes a mutating `lo_unlink`
operation.
```
</details>
[Fix in
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=855ef03db5184636a882edaa9eaddf50&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
| [Fix in VSCode
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=855ef03db5184636a882edaa9eaddf50&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/sql/parse.py
**Line:** 569:582
**Comment:**
*Security: The mutating-function denylist is incomplete: `lo_unlink` is
a PostgreSQL large-object writer (it deletes large objects) but is missing from
`_MUTATING_FUNCTION_NAMES`. A query like `SELECT lo_unlink(...)` will not be
classified as mutating and can bypass the read-only `allow_dml=False` guard.
Add `LO_UNLINK` to the mutating function set.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40421&comment_hash=ff50aead1334d7d8402317988ed4af0a9b15b8a015546b269dfd563daf4eb8d8&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40421&comment_hash=ff50aead1334d7d8402317988ed4af0a9b15b8a015546b269dfd563daf4eb8d8&reaction=dislike'>👎</a>
##########
superset/sql/parse.py:
##########
@@ -829,17 +917,45 @@ def check_functions_present(self, functions: set[str]) ->
bool:
else:
present.add(function.name.upper())
+ # MySQL `@@<name>` syntax (also Oracle/SQL-Server `@@name`) parses as
+ # `exp.SessionParameter`, which is *not* a subclass of `exp.Func`, so
+ # the walk above misses it. Include those names so denylist entries
+ # like `version` or `hostname` match `SELECT @@version`.
+ for param in self._parsed.find_all(exp.SessionParameter):
+ present.add(param.name.upper())
+
return any(function.upper() in present for function in functions)
def check_tables_present(self, tables: set[str]) -> bool:
"""
Check if any of the given tables are present in the statement.
+ Denylist entries may be bare (``pg_stat_activity``) or
+ schema-qualified (``information_schema.tables``). Bare entries
+ match by table name regardless of schema; qualified entries
+ require the schema to match too. This lets us block all access
+ to ``information_schema`` without also blocking any
+ user-authored table that happens to be named ``tables``.
+
:param tables: Set of table names to check for (case-insensitive)
- :return: True if any of the tables are present
- """
- present = {table.table.lower() for table in self.tables}
- return any(table.lower() in present for table in tables)
+ :return: True if any of the given tables is referenced
+ """
+ present_bare: set[str] = set()
+ present_qualified: set[str] = set()
+ for t in self.tables:
+ bare = t.table.lower()
+ present_bare.add(bare)
+ if t.schema:
+ present_qualified.add(f"{t.schema.lower()}.{bare}")
+ for entry in tables:
+ needle = entry.lower()
+ if "." in needle:
+ if needle in present_qualified:
+ return True
+ else:
+ if needle in present_bare:
+ return True
+ return False
Review Comment:
**Suggestion:** This adds schema-qualified table matching, but the SQL Lab
caller that builds `found_tables` still only compares bare table names. As a
result, when a qualified denylist entry (like `information_schema.tables`) is
matched, the exception often reports the entire denylist instead of the actual
referenced table. Update the caller contract to return/report matched qualified
entries, not only bare names. [api mismatch]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ⚠️ SQL Lab error lists all disallowed tables, not matched ones.
- ⚠️ Users cannot see which system view triggered rejection.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Review the PostgreSQL table denylist in `superset/config.py:127-176`,
where
`DISALLOWED_SQL_TABLES["postgresql"]` now includes schema-qualified entries
such as
`"information_schema.tables"`, `"information_schema.columns"`, and others in
addition to
bare names like `"pg_stat_activity"`.
2. Examine the updated `SQLStatement.check_tables_present` implementation in
`superset/sql/parse.py:390-419`, which builds `present_bare` (table names)
and
`present_qualified` (e.g. `"information_schema.tables"`) and returns `True`
whenever a
denylist entry, including schema-qualified ones, is present in the parsed
script.
3. Look at the SQL Lab enforcement in `superset/sql_lab.py:415-428`: after
`parsed_script.check_tables_present(disallowed_tables)` returns `True`, it
recomputes
`found_tables` using only bare names via `present = {table.table.lower() for
table in
statement.tables}` and checks `if table.lower() in present:` for each
denylist entry; for
a denylist entry like `"information_schema.tables"`, `table.table.lower()`
is just
`"tables"`, so `"information_schema.tables"` is never added to
`found_tables`.
4. Run a query such as `SELECT * FROM information_schema.tables;` in SQL Lab
against a
PostgreSQL database: `check_tables_present` returns `True` thanks to the
schema-qualified
matching in `present_qualified`, but `found_tables` remains empty due to the
bare-name
comparison, so `SupersetDisallowedSQLTableException` at
`superset/sql_lab.py:427` is
raised with `found_tables or disallowed_tables`, causing the error to report
the entire
`DISALLOWED_SQL_TABLES["postgresql"]` set instead of accurately listing only
`information_schema.tables`.
```
</details>
[Fix in
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=38879d183bea49dda87d4bffad7c574e&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
| [Fix in VSCode
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=38879d183bea49dda87d4bffad7c574e&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/sql/parse.py
**Line:** 945:958
**Comment:**
*Api Mismatch: This adds schema-qualified table matching, but the SQL
Lab caller that builds `found_tables` still only compares bare table names. As
a result, when a qualified denylist entry (like `information_schema.tables`) is
matched, the exception often reports the entire denylist instead of the actual
referenced table. Update the caller contract to return/report matched qualified
entries, not only bare names.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40421&comment_hash=09e83ffada428dbc51b22d247e7235e21072473d3c38f14abcf1102214166948&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40421&comment_hash=09e83ffada428dbc51b22d247e7235e21072473d3c38f14abcf1102214166948&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]