fitzee commented on code in PR #40194:
URL: https://github.com/apache/superset/pull/40194#discussion_r3255671463


##########
superset/commands/streaming_export/base.py:
##########
@@ -160,9 +165,20 @@ def _execute_query_and_stream(
             # Merge database to prevent DetachedInstanceError
             merged_database = session.merge(database)
 
-            # Execute query with streaming
-            with merged_database.get_sqla_engine() as engine:
+            with merged_database.get_sqla_engine(
+                catalog=catalog, schema=schema
+            ) as engine:
                 with engine.connect() as connection:
+                    # Run prequeries (e.g. SET search_path for PostgreSQL) 
before
+                    # streaming. get_prequeries() is the only mechanism that 
sets
+                    # search_path for PostgreSQL — adjust_engine_params() 
ignores
+                    # schema for that engine.
+                    for prequery in 
merged_database.db_engine_spec.get_prequeries(
+                        database=merged_database,
+                        catalog=catalog,
+                        schema=schema,
+                    ):
+                        connection.execute(text(prequery))
                     result_proxy = connection.execution_options(
                         stream_results=True

Review Comment:
   Done! Implemented exactly as you suggested.
   
   **What changed:** Moved prequery execution out of the streaming command and 
into a SQLAlchemy `connect` event listener registered inside 
`Database.get_sqla_engine()`:
   
   ```python
   prequeries = self.db_engine_spec.get_prequeries(
       database=self, catalog=catalog, schema=schema
   )
   if prequeries:
       def run_prequeries(dbapi_connection, connection_record):
           cursor = dbapi_connection.cursor()
           for prequery in prequeries:
               cursor.execute(prequery)
           cursor.close()
   
       sqla.event.listen(engine, "connect", run_prequeries)
   yield engine
   ```
   
   The streaming commands now just call `engine.connect()` directly — the 
prequeries are automatic for any caller of `get_sqla_engine()`.
   
   **Validation:** Tested end-to-end locally against the running Docker 
environment:
   1. Created a non-public schema (`test_schema`) in PostgreSQL with 120,000 
rows
   2. Executed a SQL Lab query against `test_schema.big_table` with 
`expected_rows=120000` (above the `CSV_STREAMING_ROW_THRESHOLD`)
   3. Hit the `/api/v1/sqllab/export_streaming/` endpoint — received 7.6MB of 
clean CSV, no `__STREAM_ERROR__`
   4. Confirmed via PostgreSQL statement logs that `set search_path = 
"test_schema"` was being executed on every new connection through the event 
listener
   
   Thanks for steering this to the right architectural fix!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to