michael-s-molina opened a new pull request, #36529:
URL: https://github.com/apache/superset/pull/36529

   ### Summary
   
   This PR introduces a unified SQL execution API for Superset through 
`Database.execute()` and `Database.execute_async()` methods.
   
   ### Background
   
   During analysis of the codebase, I identified **4 distinct SQL execution 
paths**:
   
   1. **Database model** (`Database.get_df()`) - Direct query execution
   2. **SQL Lab** (`sql_lab.py`, `celery_task.py`) - Interactive query 
execution with async support
   3. **Charts** (`ExploreMixin.query()`) - Chart data fetching via SQLAlchemy 
query building
   4. **MCP Tools** (`mcp_service.py`) - External tool integration
   
   Each path implements its own subset of features, leading to inconsistent 
behavior and code duplication:
   
   | Feature                  | Database.get_df() | SQL Lab     | Charts        
    | MCP Tools        | Unified API |
   | ------------------------ | ----------------- | ----------- | 
----------------- | ---------------- | ----------- |
   | **Jinja2 Templates**     | ❌                | ✅          | ✅               
 | ❌               | ✅          |
   | **Row-Level Security**   | ❌                | ⚠️ Optional | ✅ (WHERE 
clause) | ❌               | ✅ (Always) |
   | **Result Caching**       | ❌                | ✅          | ✅               
 | ❌               | ✅          |
   | **SQL Mutation**         | ✅                | ✅          | ✅               
 | ❌               | ✅          |
   | **DML Permission Check** | ❌                | ✅          | ❌ (SELECT only) 
 | ⚠️ Basic keyword | ✅          |
   | **Disallowed Functions** | ❌                | ✅          | ❌               
 | ⚠️ Hardcoded     | ✅          |
   | **Query Timeout**        | ❌                | ✅          | ⚠️ 
Request-level  | ⚠️ Not enforced  | ✅          |
   | **Query Logging**        | ✅                | ✅          | ❌               
 | ❌               | ✅          |
   | **Async Execution**      | ❌                | ✅          | ❌               
 | ❌               | ✅          |
   | **Progress Tracking**    | ❌                | ✅          | ❌               
 | ❌               | ✅          |
   | **Query Audit Trail**    | ❌                | ✅          | ❌               
 | ❌               | ✅          |
   | **Result Limit**         | ❌                | ✅          | ✅               
 | ✅               | ✅          |
   | **Dry Run Mode**         | ❌                | ❌          | ❌               
 | ❌               | ✅          |
   
   This inconsistency creates security gaps (e.g., MCP Tools don't apply RLS) 
and maintenance challenges (e.g., fixing a caching bug requires changes in 
multiple places).
   
   ### Objective
   
   This PR **does not** consolidate all execution paths. Instead, it provides a 
**unified API** that:
   
   - Is compatible with all existing execution paths
   - Can be incrementally adopted by each path
   - Ensures consistent security guarantees (RLS always applied, DML permission 
checks)
   - Supports both synchronous and asynchronous execution
   - Provides a stable interface for **Superset extensions** to execute queries 
with full feature support
   
   ### API Overview
   
   #### Synchronous Execution
   
   ```python
   from superset_core.api.types import QueryOptions, QueryStatus
   
   result = database.execute(
       "SELECT * FROM users WHERE active = true",
       options=QueryOptions(schema="public", limit=100)
   )
   
   if result.status == QueryStatus.SUCCESS:
       df = result.data
       print(f"Found {result.row_count} rows")
   ```
   
   #### Asynchronous Execution
   
   ```python
   result = database.execute_async(
       "SELECT * FROM large_table",
       options=QueryOptions(schema="analytics")
   )
   
   # Check status and get results
   status = result.handle.get_status()
   if status == QueryStatus.SUCCESS:
       query_result = result.handle.get_result()
       df = query_result.data
   
   # Cancel if needed
   result.handle.cancel()
   ```
   
   #### QueryOptions
   
   - `catalog` / `schema` - Target database location
   - `limit` - Maximum rows to return
   - `template_params` - Variables for Jinja2 template rendering
   - `cache` - Result caching configuration (timeout, force_refresh)
   - `dry_run` - Return transformed SQL without execution
   - `timeout_seconds` - Query timeout override
   
   ### Security Guarantees
   
   Both `execute()` and `execute_async()` enforce:
   
   - **Row-level security (RLS)** - Always applied via AST transformation
   - **DML permission checks** - Requires `database.allow_dml=True`
   - **Disallowed SQL functions** - Blocked per configuration
   - **Query timeout protection** - Prevents runaway queries
   
   ### Future Consolidation Tasks
   
   The following tasks will migrate each execution path to use the new unified 
API:
   
   - [Remove query execution code in MCP 
service](https://github.com/orgs/apache/projects/544?pane=issue&itemId=144286307)
   - [Remove query execution code in Database 
model](https://github.com/orgs/apache/projects/544?pane=issue&itemId=144286406)
   - [Remove query execution code in SQL 
Lab](https://github.com/orgs/apache/projects/544?pane=issue&itemId=144286423)
   - [Remove query execution code in 
charts](https://github.com/orgs/apache/projects/544?pane=issue&itemId=144510614)
   
   ### Test Plan
   
   - [x] Unit tests for sync execution (SELECT, DML, templates, RLS, caching, 
timeouts)
   - [x] Unit tests for async execution (query creation, handle methods, 
cancellation)
   - [x] Pre-commit checks pass (mypy, ruff, pylint)
   
   ### ADDITIONAL INFORMATION
   <!--- Check any relevant boxes with "x" -->
   <!--- HINT: Include "Fixes #nnn" if you are fixing an existing issue -->
   - [ ] Has associated issue:
   - [ ] Required feature flags:
   - [ ] Changes UI
   - [ ] Includes DB Migration (follow approval process in 
[SIP-59](https://github.com/apache/superset/issues/13351))
     - [ ] Migration is atomic, supports rollback & is backwards-compatible
     - [ ] Confirm DB migration upgrade and downgrade tested
     - [ ] Runtime estimates and downtime expectations provided
   - [ ] Introduces new feature or API
   - [ ] Removes existing feature or API
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to