fitzee commented on code in PR #40194:
URL: https://github.com/apache/superset/pull/40194#discussion_r3255671463
##########
superset/commands/streaming_export/base.py:
##########
@@ -160,9 +165,20 @@ def _execute_query_and_stream(
# Merge database to prevent DetachedInstanceError
merged_database = session.merge(database)
- # Execute query with streaming
- with merged_database.get_sqla_engine() as engine:
+ with merged_database.get_sqla_engine(
+ catalog=catalog, schema=schema
+ ) as engine:
with engine.connect() as connection:
+ # Run prequeries (e.g. SET search_path for PostgreSQL)
before
+ # streaming. get_prequeries() is the only mechanism that
sets
+ # search_path for PostgreSQL — adjust_engine_params()
ignores
+ # schema for that engine.
+ for prequery in
merged_database.db_engine_spec.get_prequeries(
+ database=merged_database,
+ catalog=catalog,
+ schema=schema,
+ ):
+ connection.execute(text(prequery))
result_proxy = connection.execution_options(
stream_results=True
Review Comment:
Done! Implemented exactly as you suggested.
**What changed:** Moved prequery execution out of the streaming command and
into a SQLAlchemy `connect` event listener registered inside
`Database.get_sqla_engine()`:
```python
prequeries = self.db_engine_spec.get_prequeries(
database=self, catalog=catalog, schema=schema
)
if prequeries:
def run_prequeries(dbapi_connection, connection_record):
cursor = dbapi_connection.cursor()
for prequery in prequeries:
cursor.execute(prequery)
cursor.close()
sqla.event.listen(engine, "connect", run_prequeries)
yield engine
```
The streaming commands now just call `engine.connect()` directly — the
prequeries are automatic for any caller of `get_sqla_engine()`.
**Validation:** Tested end-to-end locally against the running Docker
environment:
1. Created a non-public schema (`test_schema`) in PostgreSQL with 120,000
rows
2. Executed a SQL Lab query against `test_schema.big_table` with
`expected_rows=120000` (above the `CSV_STREAMING_ROW_THRESHOLD`)
3. Hit the `/api/v1/sqllab/export_streaming/` endpoint — received 7.6MB of
clean CSV, no `__STREAM_ERROR__`
4. Confirmed via PostgreSQL statement logs that `set search_path =
"test_schema"` was being executed on every new connection through the event
listener
Thanks for steering this to the right architectural fix!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]