GitHub user dosubot[bot] added a comment to the discussion: RLS condition inconsistently applied inside dataset metric subqueries
<!-- Answer --> Great questions — here's a detailed breakdown of how the RLS/sqlglot machinery works internally: ## AST Traversal: Not Every SELECT — Every `exp.Table` Node The RLS transformers don't inject into every SELECT. Instead, they operate at the **`exp.Table` node level**. When `self._parsed.transform(transformer)` is called, sqlglot recursively walks the entire AST and the transformer's `__call__` method fires on every node — but it only acts on `exp.Table` nodes [[1]](https://github.com/apache/superset/blob/829e4d92d91ceae4b43b1ed3b063ffe45377799c/superset/sql/parse.py#L203-L204). Other nodes pass through unchanged. - **AS_PREDICATE**: Finds the parent context (FROM → appends to WHERE; JOIN → appends to ON clause) [[2]](https://github.com/apache/superset/blob/829e4d92d91ceae4b43b1ed3b063ffe45377799c/superset/sql/parse.py#L216-L232) - **AS_SUBQUERY**: Replaces the table node with `(SELECT * FROM table WHERE <rls>) AS alias` [[3]](https://github.com/apache/superset/blob/829e4d92d91ceae4b43b1ed3b063ffe45377799c/superset/sql/parse.py#L268-L276) ## What Counts as a "Table Reference" Table detection uses `traverse_scope()` from sqlglot to iterate all scopes and collect `exp.Table` sources [[4]](https://github.com/apache/superset/blob/829e4d92d91ceae4b43b1ed3b063ffe45377799c/superset/sql/parse.py#L1316-L1321): ```python sources = [ source for scope in traverse_scope(statement) for source in scope.sources.values() if isinstance(source, exp.Table) and not is_cte(source, scope) ] ``` **Detected as tables**: base tables, schema-qualified tables, aliased tables, JOIN targets. **NOT detected as tables**: CTE references (explicitly filtered), derived tables (subqueries in FROM — these are `exp.Subquery`, not `exp.Table`), table-valued functions. ## CTEs, Scalar Subqueries, Correlated Subqueries, Derived Tables They are **not** treated the same: - **CTEs**: Explicitly excluded from table detection via `is_cte()` [[5]](https://github.com/apache/superset/blob/829e4d92d91ceae4b43b1ed3b063ffe45377799c/superset/sql/parse.py#L1333-L1352). But tables *inside* the CTE definition body are detected and receive RLS. This prevents bypass attacks like `WITH t AS (SELECT * FROM restricted) SELECT * FROM t`. - **Derived tables** (`FROM (SELECT ...)`): The derived table itself is not an `exp.Table`, but tables *inside* the subquery are found via scope traversal and get RLS. - **Scalar/correlated subqueries**: Same — the subquery isn't a table, but tables referenced within it are discovered and injected with RLS. So yes, detection is fundamentally **FROM/JOIN-based** at the scope level — it finds `exp.Table` nodes that appear as sources in any scope. ## Your Three Metric Examples ```sql SUM(amount) ``` → **No subquery detected.** `has_subquery()` returns `False` [[6]](https://github.com/apache/superset/blob/829e4d92d91ceae4b43b1ed3b063ffe45377799c/superset/sql/parse.py#L809-L816). No special RLS processing of the metric expression itself. RLS is applied only to the dataset's main query. ```sql (SELECT SUM(s.amount) FROM sales s WHERE s.product_id = main.product_id) ``` → **Subquery detected.** `has_subquery()` finds the nested SELECT [[6]](https://github.com/apache/superset/blob/829e4d92d91ceae4b43b1ed3b063ffe45377799c/superset/sql/parse.py#L809-L816). If `ALLOW_ADHOC_SUBQUERY` is `False` (default): **raises `SupersetSecurityException`**. If enabled: `apply_rls()` injects RLS into the `sales` table reference inside the subquery [[7]](https://github.com/apache/superset/blob/829e4d92d91ceae4b43b1ed3b063ffe45377799c/superset/models/helpers.py#L136-L138). ```sql SUM(CASE WHEN EXISTS (SELECT 1 FROM stock st WHERE st.product_id = product_id) THEN qty END) ``` → **Subquery detected** (the `SELECT 1 FROM stock` is a nested SELECT found by AST walking). Same behavior as above — blocked or has RLS applied to `stock`. ## Alias/Scope Resolution The transformer **does not resolve aliases before injection**. Instead, it qualifies RLS predicate columns with the table's alias (or name if no alias) [[8]](https://github.com/apache/superset/blob/829e4d92d91ceae4b43b1ed3b063ffe45377799c/superset/sql/parse.py#L212-L214): ```python for column in predicate.find_all(exp.Column): column.set("table", node.alias or node.this) ``` So if your table is `FROM sales AS s`, the RLS predicate `warehouse_id = XX` becomes `s.warehouse_id = XX`. This is literal qualification, not semantic resolution — the transformer doesn't "understand" scope context, it just stamps the table/alias onto predicate columns. ## Jinja vs RLS Ordering **Jinja renders FIRST, then RLS is applied to the rendered SQL.** The pipeline is [[9]](https://github.com/apache/superset/blob/829e4d92d91ceae4b43b1ed3b063ffe45377799c/superset/sqllab/query_render.py#L60-L63) [[10]](https://github.com/apache/superset/blob/829e4d92d91ceae4b43b1ed3b063ffe45377799c/superset/sql_lab.py#L392-L409): 1. **Jinja rendering** → produces final SQL string 2. **SQL parsing** → sqlglot parses into AST 3. **RLS injection** → transforms AST 4. **Format back to SQL** → string output 5. **Execution** So the AST is built from the **final rendered SQL**, not the raw Jinja template. Your Jinja-generated SQL is fully expanded before sqlglot ever sees it. ## Debugging / Inspecting Transformed SQL A few options: - **SQL Lab query history**: The `executed_sql` field on the Query model stores the final SQL post-RLS [[11]](https://github.com/apache/superset/blob/829e4d92d91ceae4b43b1ed3b063ffe45377799c/superset/sql_lab.py#L478). Visible in the SQL Lab UI query history panel. - **Debug logging**: Set `logging.getLogger('superset.sql_lab').setLevel(logging.DEBUG)` — the final SQL is logged at debug level [[12]](https://github.com/apache/superset/blob/829e4d92d91ceae4b43b1ed3b063ffe45377799c/superset/sql_lab.py#L1230). - **QUERY_LOGGER callback** in `superset_config.py` [[13]](https://github.com/apache/superset/blob/829e4d92d91ceae4b43b1ed3b063ffe45377799c/superset/config.py#L963-L972): receives the final transformed SQL before execution. - **Chart "View Query" button**: In Explore view, the displayed SQL should reflect post-RLS transformation. ## Practical Takeaway for "Safe" Metric Patterns | Pattern | RLS behavior | |---------|-------------| | Simple aggregates (`SUM(col)`, `COUNT(*)`) | No subquery → no RLS on metric itself; RLS only on main dataset query | | Expressions with no SELECT (`CASE WHEN ... THEN col END`) | Same — safe, no subquery detection | | Metrics with subqueries (correlated or scalar) | Triggers subquery detection → blocked by default, or RLS injected into all tables in the subquery | | CTEs used in dataset SQL | CTE *references* excluded from RLS, but tables *inside* CTE definitions receive RLS | The safest approach: **keep metrics as simple aggregate expressions** without embedded SELECT statements. Move complex filtering logic into the dataset's base SQL (virtual dataset definition) where RLS will be applied uniformly at the outer level. <!-- Dosu Comment Footer --> *To reply, just mention [@dosu](https://go.dosu.dev/dosubot?utm_source=github&utm_medium=bot-comment&utm_campaign=github-comment-footer-20260415&utm_content=reply-with-mention&utm_term=apache%2Fsuperset).* --- Share context across your team and agents. Try [Dosu](https://dosu.dev?utm_source=github&utm_medium=bot-comment&utm_campaign=github-comment-footer-20260415&utm_content=knowledge-infrastructure-tagline&utm_term=apache%2Fsuperset). [](https://app.dosu.dev/response-feedback?utm_source=github&utm_medium=bot-comment&utm_campaign=github-comment-footer-20260415&utm_content=knowledge-infrastructure-feedback&utm_term=apache%2Fsuperset&message_id=115c7cfa-4e16-4ce9-9746-2ee4cd7e709e) [](https://github.dosu.com/apache/superset?utm_source=github&utm_medium=bot-comment&utm_campaign=github-comment-footer-20260415&utm_content=knowledge-infrastructure-learn-repo&utm_term=apache%2Fsuperset) [](https://app.dosu.dev/signup?referrer=openSource&source=github-footer&utm_source=github&utm_medium=bot-comment&utm_campaign=github-comment-footer-20260415&utm_content=knowledge-infrastructure-add-team&utm_term=apache%2Fsuperset) GitHub link: https://github.com/apache/superset/discussions/40400#discussioncomment-17039404 ---- This is an automatically sent email for [email protected]. To unsubscribe, please send an email to: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
