fl0-m opened a new issue, #40465:
URL: https://github.com/apache/superset/issues/40465
### Bug description
In Superset 6.1.0, the new streaming CSV export pipeline introduced by
#35478 (*"feat(streaming): Streaming CSV uploads for over 100k records for
constant memory usage"*) bypasses Superset's standard query-preparation
pipeline. This produces two distinct regressions, both reproducible against
Trino.
**Bug 1 — CSV exports crash on Trino with `__STREAM_ERROR__`**
The streaming path in
`superset/commands/streaming_export/base.py::_execute_query_and_stream` sends
raw chart SQL directly to `engine.execute(text(sql))` without running it
through `database.mutate_sql_based_on_config()` first. The SQL Superset
generates for a chart ends with a `LIMIT N;` line, and Trino's HTTP statement
endpoint rejects the trailing semicolon with
`mismatched input ';'. Expecting: <EOF>`.
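To make the failure mode concrete, here is a minimal sketch of the kind of normalization the streaming path appears to skip. The helper name `strip_trailing_semicolons` is hypothetical and is not part of Superset; it is a naive illustration, not what the standard query-preparation pipeline actually does internally.

```python
def strip_trailing_semicolons(sql: str) -> str:
    """Return ``sql`` without trailing whitespace or statement terminators.

    Hypothetical helper for illustration only. It is deliberately naive:
    it does not parse the statement, so a trailing comment placed after
    the semicolon would not be handled.
    """
    # rstrip with a character set removes any trailing run of these
    # characters, so "LIMIT 10;\n" and "LIMIT 10 ; ;" both end in "LIMIT 10".
    return sql.rstrip("; \t\r\n")


chart_sql = "SELECT 1\nLIMIT 500000;"
print(strip_trailing_semicolons(chart_sql))
```

A semicolon inside a string literal is left alone, because only characters at the very end of the statement are trimmed.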
Because the streaming response has already flushed headers by the time the
exception fires, Flask cannot change the status code. The generator instead
writes the sentinel string `__STREAM_ERROR__: Export failed. Please try again
in some time.` (63 bytes) into the response body and closes the stream. The
user receives an HTTP 200 with that text inside what should have been their CSV
file. The frontend has no way to distinguish this from a successful download.
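The mechanism described above can be modelled in a few lines. This is a simplified sketch, not the code in `base.py`: the generator and the `is_stream_error` check are hypothetical names, and only the sentinel text is taken from the report. It shows why the failure can only surface in the body once streaming has begun, and how a client-side script could detect it.

```python
from typing import Iterable, Iterator

STREAM_ERROR_PREFIX = "__STREAM_ERROR__"  # sentinel text quoted in the report


def csv_generator(rows: Iterable[str]) -> Iterator[str]:
    """Yield CSV lines; on failure, yield a sentinel instead of raising."""
    try:
        for row in rows:
            yield row
    except Exception:
        # By this point a streaming response has already sent its headers
        # (and its 200 status), so the body is the only channel left.
        yield f"{STREAM_ERROR_PREFIX}: Export failed. Please try again in some time."


def is_stream_error(body: bytes) -> bool:
    """Hypothetical check a client could run on a downloaded file."""
    return body.lstrip().startswith(STREAM_ERROR_PREFIX.encode())


def failing_rows() -> Iterator[str]:
    """Simulate a query that fails before producing any row."""
    raise RuntimeError("mismatched input ';'")
    yield  # unreachable; makes this function a generator


body = "".join(csv_generator(failing_rows())).encode()
print(len(body), is_stream_error(body))
```

A check like this is only a stopgap for scripted downloads; it does not address the missing error signal in the UI.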
**Bug 2 — User impersonation is bypassed**
On databases configured with `impersonate_user: true` (Trino, Presto, etc.),
every other Superset execution site acquires the engine via
`database.get_sqla_engine_with_context(user_name=…)` so the end user's identity
is forwarded as the `X-Trino-User` header. The streaming export path acquires
its engine without this context and runs every query as the service principal.
Consequences:
- **Audit trail broken** — every CSV export, from every user, shows up in
the Trino query log as the service account.
- **Resource-group routing broken** — exports no longer route to the user's
configured Trino resource group.
- **Possible authorization bypass** — engines that key per-user authz off
`X-Trino-User` (Ranger, OPA, file-based ACLs, row/column-level security via
session-aware views) will see the service account on the streaming path. A
Superset user may be able to export data via "Download CSV" that they are not
permitted to read via SQL Lab.
Bug 1 is the visible crash. Bug 2 is independently reproducible: even with
bug 1 patched, every CSV export query in the Trino query log is attributed to
the service account instead of the end user.
The non-streaming export paths (Excel export, SQL Lab, `/api/v1/chart/data`
JSON renders) are unaffected because they go through the proper pipeline.
### How to reproduce the bug
1. Connect Superset 6.1.0 to a Trino cluster with `impersonate_user: true`.
2. Create a dashboard tile or standalone chart backed by a Trino dataset.
3. As any logged-in OAuth user (not the service principal), click `…` →
`Download` → `Export to CSV`.
4. Open the downloaded file.
5. Open the Trino UI / query history and locate the corresponding query.
**Expected**
- The CSV contains the chart's data.
- The Trino query record shows `User: <logged-in user>`, the user's normal
resource group, and the database's default schema.
**Actual**
- The downloaded file is 63 bytes and contains only:
```
__STREAM_ERROR__: Export failed. Please try again in some time.
```
- The Trino query record shows:
  - `Error Type: USER_ERROR`
  - `Error Code: SYNTAX_ERROR (1)`
  - `Message: line N:13: mismatched input ';'. Expecting: <EOF>`
  - `User: <service principal>` (not the end user)
  - `Resource Group: n/a`
  - `Schema: <empty>`
Performing the same action with `Export to Excel` instead of `Export to CSV`
works correctly and shows the end user, the right resource group, the default
schema, and a sqlglot-reformatted SQL body.
### Side-by-side evidence
Same chart, same user, two consecutive export attempts seconds apart.
**Failing CSV export — streaming path**
```
User: superset
Principal: superset
Source: Apache Superset
Catalog: my_catalog
Schema: (empty)
Resource Group: n/a
Status: USER_ERROR / SYNTAX_ERROR
SQL (last line): LIMIT 500000;
SQL form: raw, lowercase keywords, DATE '2026-05-20'
```
**Succeeding Excel export — non-streaming path**
```
User: [email protected] <-- end user via X-Trino-User
Principal: superset
Source: Apache Superset
Catalog: my_catalog
Schema: my_schema
Resource Group: analysts
Status: FINISHED
SQL (last line): LIMIT 500000
SQL form: uppercased keywords, CAST('2026-05-20' AS DATE)
```
Both SQL strings are derived from the same chart definition. The differences
(trailing `;`, missing sqlglot reformat, missing schema context, missing user
impersonation) are all consequences of the streaming path skipping
`mutate_sql_based_on_config()` and `get_sqla_engine_with_context(user_name=…)`.
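The ordering that the other export paths appear to follow can be sketched with stand-in objects. Everything below is hypothetical: `FakeDatabase`, `FakeEngine`, `prepare_sql`, `engine_for`, and `stream_export` are illustrative names, not Superset classes or methods, and the service account name `superset` is taken from the query record above. The point is only the sequence: prepare the SQL first, then acquire the engine with the end user's identity.

```python
from contextlib import contextmanager
from typing import Iterator, List, Optional, Tuple


class FakeEngine:
    """Stand-in for a SQLAlchemy engine; records what it was asked to run."""

    def __init__(self, user: str, log: List[Tuple[str, str]]) -> None:
        self.user = user
        self.log = log

    def execute(self, sql: str) -> None:
        self.log.append((self.user, sql))


class FakeDatabase:
    """Stand-in for a database connection with impersonation enabled."""

    service_account = "superset"

    def __init__(self) -> None:
        self.log: List[Tuple[str, str]] = []

    def prepare_sql(self, sql: str) -> str:
        # Hypothetical preparation step; here it only trims the terminator.
        return sql.rstrip("; \t\r\n")

    @contextmanager
    def engine_for(self, user_name: Optional[str] = None) -> Iterator[FakeEngine]:
        # Without a user, the connection falls back to the service account.
        yield FakeEngine(user_name or self.service_account, self.log)


def stream_export(db: FakeDatabase, sql: str, user_name: str) -> None:
    prepared = db.prepare_sql(sql)  # step 1: prepare the SQL
    with db.engine_for(user_name=user_name) as engine:  # step 2: user context
        engine.execute(prepared)


db = FakeDatabase()
stream_export(db, "SELECT 1\nLIMIT 500000;", "alice")
print(db.log)
```

Calling `engine_for()` without a user and executing the raw SQL reproduces, in miniature, what the failing query record shows: the service account as the user and the trailing `;` still attached.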
### Minimal SQL illustrating the difference
What the streaming CSV path sends to Trino (fails):
```sql
SELECT category AS category, region AS region, sum(amount) AS "SUM(amount)"
FROM (select date, order_id, region, amount, category
from my_catalog.my_schema.orders) AS virtual_table
WHERE date >= DATE '2026-05-20' AND date < DATE '2026-05-27'
AND amount > 100 AND region IS NOT NULL
GROUP BY category, region
ORDER BY "SUM(amount)" DESC
LIMIT 500000;
```
What the non-streaming Excel path sends to Trino (works):
```sql
SELECT
category AS category,
region AS region,
SUM(amount) AS "SUM(amount)"
FROM (
SELECT date, order_id, region, amount, category
FROM my_catalog.my_schema.orders
) AS virtual_table
WHERE
date >= CAST('2026-05-20' AS DATE)
AND date < CAST('2026-05-27' AS DATE)
AND amount > 100
AND NOT region IS NULL
GROUP BY category, region
ORDER BY "SUM(amount)" DESC
LIMIT 500000
```
### Stack trace
```
ERROR:superset.commands.streaming_export.base:Traceback: Traceback (most recent call last):
  File ".../sqlalchemy/engine/base.py", line 1910, in _execute_context
    self.dialect.do_execute(
  File ".../trino/sqlalchemy/dialect.py", line 442, in do_execute
    cursor.execute(statement, parameters)
  File ".../trino/dbapi.py", line 640, in execute
    self._iterator = iter(self._query.execute())
  File ".../trino/client.py", line 938, in execute
    self._result.rows += self.fetch()
  File ".../trino/client.py", line 958, in fetch
    status = self._request.process(response)
  File ".../trino/client.py", line 727, in process
    raise self._process_error(response["error"], response.get("id"))
trino.exceptions.TrinoUserError: TrinoUserError(type=USER_ERROR, name=SYNTAX_ERROR, message="line 24:13: mismatched input ';'. Expecting: <EOF>", query_id=...)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/superset/commands/streaming_export/base.py", line 225, in csv_generator
    yield from self._execute_query_and_stream(sql, database, limit)
  File "/app/superset/commands/streaming_export/base.py", line 168, in _execute_query_and_stream
    ).execute(text(sql))
  ...
sqlalchemy.exc.ProgrammingError: (trino.exceptions.TrinoUserError) TrinoUserError(type=USER_ERROR, name=SYNTAX_ERROR, message="line 24:13: mismatched input ';'. Expecting: <EOF>", query_id=...)
```
Trino-side parser stack (from the corresponding query in the Trino UI):
```
io.trino.sql.parser.ParsingException: line 24:13: mismatched input ';'. Expecting: <EOF>
    at io.trino.sql.parser.ErrorHandler.syntaxError(ErrorHandler.java:108)
    ...
    at io.trino.dispatcher.DispatchManager.createQueryInternal(DispatchManager.java:225)
```
### Environment
- **Superset version:** 6.1.0
- **Database engine:** Trino 480 (`trino-python-client` via SQLAlchemy)
- **DB connection setting:** `impersonate_user: true`
- **Python:** 3.10
- **Deployment:** Helm chart on Kubernetes
- **Auth:** OAuth2
### Severity
I'd argue this is release-blocker class, for two reasons:
1. **Functional:** every dashboard/chart CSV export against Trino or Presto
in 6.1.0 is broken, with no in-UI signal of failure (HTTP 200 + sentinel text
inside the file).
2. **Security:** missing impersonation may silently bypass per-user
authorization on deployments that key Trino authz off `X-Trino-User`. Any
deployment using Ranger / OPA / file-based ACLs / RLS views with Superset +
Trino should validate before upgrading.
### Screenshots/recordings
_No response_
### Superset version
master / latest-dev
### Python version
3.10
### Node version
I don't know
### Browser
Chrome
### Additional context
_No response_
### Checklist
- [x] I have searched Superset docs and Slack and didn't find a solution to
my problem.
- [x] I have searched the GitHub issue tracker and didn't find a similar bug
report.
- [x] I have checked Superset's logs for errors and if I found a relevant
Python stacktrace, I included it here as text in the "additional context"
section.