betodealmeida opened a new pull request, #38683:
URL: https://github.com/apache/superset/pull/38683

   
   ### SUMMARY
   
   This PR fixes errors like this:
   
   ```
   psycopg2.errors.UndefinedColumn: column "is_green" does not exist in virtual_table
   ```
   
   This happens on Redshift (and potentially other databases) when RLS rules 
are applied to virtual datasets whose SQL uses table-name-qualified column 
references without an explicit alias.
   
   #### Root Cause
   
   `RLSAsSubqueryTransformer` replaces tables with filtered subqueries when 
applying RLS. When a table has no explicit alias, it constructed the subquery 
alias from the fully-qualified table name:
   
   ```sql
   -- Virtual dataset SQL:
   SELECT pens.pen_id, pens.is_green FROM public.pens
   
   -- After RLS (before fix):
   SELECT pens.pen_id, pens.is_green
   -- column refs use "pens" ↑
   FROM (SELECT * FROM public.pens WHERE user_id = 1) AS "public.pens"
   --                                        but alias is ↑ "public.pens"
   ```
   
   The column references (`pens.pen_id`, `pens.is_green`) can't resolve against 
the quoted alias `"public.pens"`: quoting makes it a single identifier that 
contains a dot, not the table `pens` in schema `public`. Redshift therefore 
returns `column "is_green" does not exist in virtual_table`.
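
The alias mismatch can be illustrated with a minimal, standard-library-only sketch. The helper names (`alias_from_qualified_name`, `alias_from_table_name`, `wrap_with_rls`) are hypothetical and are not the code in `RLSAsSubqueryTransformer`; the sketch only contrasts an alias built from the schema-qualified name with one built from the table name alone.

```python
def alias_from_qualified_name(qualified_name: str) -> str:
    """Hypothetical: quote the whole schema-qualified name as one identifier."""
    return f'"{qualified_name}"'


def alias_from_table_name(qualified_name: str) -> str:
    """Hypothetical: keep only the last dotted component (the table name).

    Assumption: a naive split on "." is enough for illustration; it does not
    handle quoted identifiers that themselves contain dots.
    """
    table_name = qualified_name.rsplit(".", 1)[-1]
    return f'"{table_name}"'


def wrap_with_rls(qualified_name: str, predicate: str, alias: str) -> str:
    """Build the filtered subquery that stands in for the original table."""
    return f"(SELECT * FROM {qualified_name} WHERE {predicate}) AS {alias}"


# Alias is "public.pens", so column references such as pens.is_green cannot match it.
mismatched = wrap_with_rls(
    "public.pens", "user_id = 1", alias_from_qualified_name("public.pens")
)

# Alias is "pens", which is what pens.pen_id and pens.is_green refer to.
matching = wrap_with_rls(
    "public.pens", "user_id = 1", alias_from_table_name("public.pens")
)
```
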
   
   #### Fix
   
   - `superset/sql/parse.py`: Use just the table name as the subquery alias 
instead of the schema-qualified path:
   
   ```sql
     -- After RLS (after fix):
     SELECT pens.pen_id, pens.is_green
     FROM (SELECT * FROM public.pens WHERE user_id = 1) AS "pens"
   ```
   
   - `superset/utils/rls.py`: `apply_rls()` now returns a `bool` indicating 
whether any RLS predicates were actually applied.
   - `superset/models/helpers.py`: `get_from_clause()` and 
`validate_adhoc_subquery()` only regenerate SQL through sqlglot's `format()` 
when RLS was actually applied. Previously, all virtual dataset SQL was 
round-tripped through sqlglot even when no RLS rules existed, which could 
silently rewrite dialect-specific syntax (e.g., `NVL` → `COALESCE`, 
`current_timestamp` → `GETDATE()`, `::` casts → `CAST()` on Redshift).
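
The "regenerate only when RLS was applied" behavior can be sketched as follows. This is an illustration under stated assumptions, not the Superset implementation: `StatementSketch`, `apply_rls_sketch`, and `render_sql` are hypothetical names, and the `format()` method here merely imitates a dialect rewrite (`NVL` to `COALESCE`) instead of calling sqlglot.

```python
from dataclasses import dataclass, field


@dataclass
class StatementSketch:
    """Hypothetical stand-in for a parsed statement that can be regenerated."""

    original_sql: str
    predicates: list = field(default_factory=list)

    def format(self) -> str:
        # Assumption: regeneration may normalize dialect-specific syntax.
        # A plain string replacement imitates that side effect here.
        sql = self.original_sql.replace("NVL(", "COALESCE(")
        if self.predicates:
            sql += " WHERE " + " AND ".join(self.predicates)
        return sql


def apply_rls_sketch(statement: StatementSketch, rules: dict, table: str) -> bool:
    """Attach the predicates for `table` and report whether any were applied."""
    predicates = rules.get(table, [])
    statement.predicates.extend(predicates)
    return bool(predicates)


def render_sql(statement: StatementSketch, rules: dict, table: str) -> str:
    """Regenerate SQL only if RLS changed the statement; else keep it verbatim."""
    if apply_rls_sketch(statement, rules, table):
        return statement.format()
    return statement.original_sql


sql = "SELECT NVL(a, 0) FROM t"

# No rules for the table: the author's SQL is returned untouched.
untouched = render_sql(StatementSketch(sql), {}, "t")

# A rule exists: the statement is regenerated, including the rewrite.
regenerated = render_sql(StatementSketch(sql), {"t": ["user_id = 1"]}, "t")
```

Returning a flag from the RLS step keeps the decision about regeneration with the caller, so SQL that has no RLS rules is never passed through the formatter.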
   
   ### BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
   
   N/A
   
   ### TESTING INSTRUCTIONS
   
   Added 12 unit tests in 
`tests/unit_tests/models/test_virtual_dataset_format.py` covering:
   
   - SQL is preserved verbatim when no RLS applies (4 tests including 
Redshift-specific syntax)
   - SQL is regenerated when RLS is applied (1 test)
   - `apply_rls()` return value correctness (3 tests)
   - RLS subquery alias uses table name only, not schema-qualified path (4 
tests)
   
   ### ADDITIONAL INFORMATION
   - [ ] Has associated issue:
   - [ ] Required feature flags:
   - [ ] Changes UI
   - [ ] Includes DB Migration (follow approval process in 
[SIP-59](https://github.com/apache/superset/issues/13351))
     - [ ] Migration is atomic, supports rollback & is backwards-compatible
     - [ ] Confirm DB migration upgrade and downgrade tested
     - [ ] Runtime estimates and downtime expectations provided
   - [ ] Introduces new feature or API
   - [ ] Removes existing feature or API
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

