codeant-ai-for-open-source[bot] commented on code in PR #38601:
URL: https://github.com/apache/superset/pull/38601#discussion_r2924153618
##########
tests/integration_tests/security/row_level_security_tests.py:
##########
@@ -845,3 +845,47 @@ def test_dataset_id_can_be_string(self):
sql = dataset.get_query_str(self.query_obj)
assert re.search(RLS_ALICE_REGEX, sql)
+
+ @pytest.mark.usefixtures("load_birth_names_dashboard_with_slices")
+ @pytest.mark.usefixtures("load_energy_table_with_slice")
+ def test_global_rls_not_applied_to_virtual_dataset_inner_tables(self):
+ """
+ Global guest RLS (no dataset ID) should only be applied to the outer
+ query of a virtual dataset, not to individual inner tables. This
+ prevents SQL errors when the RLS column doesn't exist on all inner
+ tables (e.g., a virtual dataset JOINing tables where only one has the
+ filtered column).
+ """
+ birth_names = self.get_table(name="birth_names")
+
+ virtual_dataset = SqlaTable(
+ table_name="virtual_birth_names_rls_test",
+ database=birth_names.database,
+ schema=birth_names.schema,
+ sql=f"SELECT * FROM {birth_names.table_name}", # noqa: S608
Review Comment:
**Suggestion:** The new regression test does not actually create a virtual
dataset with multiple inner tables, so it cannot catch the original bug
scenario where a global guest filter is incorrectly pushed down to joined inner
tables that lack the filtered column. Build the virtual dataset with a JOIN so
the test exercises the intended failure mode. [logic error]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ⚠️ Embedded guest JOIN regressions escape integration test coverage.
- ⚠️ CI may pass despite broken virtual-dataset RLS behavior.
```
</details>
```suggestion
sql=f"SELECT b.name, e.value FROM {birth_names.table_name} b
JOIN energy_usage e ON 1=1", # noqa: S608
```
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Run
`GuestTokenRowLevelSecurityTests.test_global_rls_not_applied_to_virtual_dataset_inner_tables`
in `tests/integration_tests/security/row_level_security_tests.py:851-891`;
it builds
virtual SQL at line 865 as `SELECT * FROM birth_names` (single inner table).
2. The test calls `virtual_dataset.get_query_str(query_obj)` at
`row_level_security_tests.py:885`, which flows into
`superset/models/helpers.py:get_query_str` (line 2186) and then virtual SQL
parsing in
`get_from_clause` (line 2029).
3. Inner-table RLS application loops `parsed_statement.tables` in
`superset/utils/rls.py:53-63`; with current test SQL there is only one table
(`birth_names`), so no multi-table pushdown case is exercised.
4. The documented failure mode needs joined inner tables with different
schemas;
`energy_usage` fixture columns are `source/target/value`
(`tests/integration_tests/fixtures/energy_dashboard.py:49`), so adding JOIN
coverage is
required to validate "column missing on some inner tables" regressions.
```
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** tests/integration_tests/security/row_level_security_tests.py
**Line:** 865:865
**Comment:**
*Logic Error: The new regression test does not actually create a
virtual dataset with multiple inner tables, so it cannot catch the original bug
scenario where a global guest filter is incorrectly pushed down to joined inner
tables that lack the filtered column. Build the virtual dataset with a JOIN so
the test exercises the intended failure mode.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38601&comment_hash=7feef62c49d8f78bee472b3b2fcfc07d34c99302e6c3d87b3e325d7d8a93f25f&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38601&comment_hash=7feef62c49d8f78bee472b3b2fcfc07d34c99302e6c3d87b3e325d7d8a93f25f&reaction=dislike'>👎</a>
##########
tests/integration_tests/security/row_level_security_tests.py:
##########
@@ -845,3 +845,47 @@ def test_dataset_id_can_be_string(self):
sql = dataset.get_query_str(self.query_obj)
assert re.search(RLS_ALICE_REGEX, sql)
+
+ @pytest.mark.usefixtures("load_birth_names_dashboard_with_slices")
+ @pytest.mark.usefixtures("load_energy_table_with_slice")
+ def test_global_rls_not_applied_to_virtual_dataset_inner_tables(self):
+ """
+ Global guest RLS (no dataset ID) should only be applied to the outer
+ query of a virtual dataset, not to individual inner tables. This
+ prevents SQL errors when the RLS column doesn't exist on all inner
+ tables (e.g., a virtual dataset JOINing tables where only one has the
+ filtered column).
+ """
+ birth_names = self.get_table(name="birth_names")
+
+ virtual_dataset = SqlaTable(
+ table_name="virtual_birth_names_rls_test",
+ database=birth_names.database,
+ schema=birth_names.schema,
+ sql=f"SELECT * FROM {birth_names.table_name}", # noqa: S608
+ )
+ db.session.add(virtual_dataset)
+ db.session.commit()
+
+ try:
+ # Global RLS rule (no dataset ID) — should only apply to outer
+ # query, not to the inner birth_names table reference
+ g.user = self.guest_user_with_rls(rules=[{"clause": "name =
'Alice'"}])
+ query_obj = {
+ "groupby": [],
+ "metrics": None,
+ "filter": [],
+ "is_timeseries": False,
+ "columns": ["name"],
+ "granularity": None,
+ "from_dttm": None,
+ "to_dttm": None,
+ "extras": {},
+ }
+ sql = virtual_dataset.get_query_str(query_obj)
+
+ # Guest RLS should appear in the outer WHERE clause
+ assert re.search(RLS_ALICE_REGEX, sql)
Review Comment:
**Suggestion:** The assertion only checks that the guest RLS clause exists
somewhere in the SQL, which still passes if the clause is applied both to inner
tables and the outer query. Assert that the clause appears exactly once to
verify the fix truly prevents duplicate inner-table application. [logic error]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ⚠️ Duplicate guest RLS injection not detected by assertion.
- ⚠️ Regression can ship despite failing outer-only guarantee.
```
</details>
```suggestion
# Guest RLS should appear exactly once (outer query only)
assert len(re.findall(RLS_ALICE_REGEX, sql)) == 1
```
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. In
`tests/integration_tests/security/row_level_security_tests.py:887-888`, the
check
uses `re.search`, which validates only "at least one match".
2. Query generation path for this test is `virtual_dataset.get_query_str()`
(`row_level_security_tests.py:885`) →
`superset/models/helpers.py:get_sqla_query` where
outer RLS is appended at line 3215.
3. Guest RLS clauses are produced in
`superset/connectors/sqla/models.py:770-772`; if
guest RLS is mistakenly also included for inner virtual-dataset tables, SQL
can contain
the same clause multiple times (inner + outer).
4. With duplicated predicates, `re.search(RLS_ALICE_REGEX, sql)` still
passes;
`len(re.findall(...)) == 1` is needed to assert the fix intent ("outer query
only") and
catch duplicate injection regressions.
```
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** tests/integration_tests/security/row_level_security_tests.py
**Line:** 887:888
**Comment:**
*Logic Error: The assertion only checks that the guest RLS clause
exists somewhere in the SQL, which still passes if the clause is applied both
to inner tables and the outer query. Assert that the clause appears exactly
once to verify the fix truly prevents duplicate inner-table application.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38601&comment_hash=b28796140ece76cc56b2b796e8a1b1c3c831de7f00971ab253415af072df5323&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38601&comment_hash=b28796140ece76cc56b2b796e8a1b1c3c831de7f00971ab253415af072df5323&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]