michael-s-molina opened a new pull request, #36529:
URL: https://github.com/apache/superset/pull/36529
### Summary
This PR introduces a unified SQL execution API for Superset through
`Database.execute()` and `Database.execute_async()` methods.
### Background
During analysis of the codebase, I identified **4 distinct SQL execution
paths**:
1. **Database model** (`Database.get_df()`) - Direct query execution
2. **SQL Lab** (`sql_lab.py`, `celery_task.py`) - Interactive query
execution with async support
3. **Charts** (`ExploreMixin.query()`) - Chart data fetching via SQLAlchemy
query building
4. **MCP Tools** (`mcp_service.py`) - External tool integration
Each path implements its own subset of features, leading to inconsistent
behavior and code duplication:
| Feature | Database.get_df() | SQL Lab | Charts
| MCP Tools | Unified API |
| ------------------------ | ----------------- | ----------- |
----------------- | ---------------- | ----------- |
| **Jinja2 Templates** | ❌ | ✅ | ✅
| ❌ | ✅ |
| **Row-Level Security** | ❌ | ⚠️ Optional | ✅ (WHERE
clause) | ❌ | ✅ (Always) |
| **Result Caching** | ❌ | ✅ | ✅
| ❌ | ✅ |
| **SQL Mutation** | ✅ | ✅ | ✅
| ❌ | ✅ |
| **DML Permission Check** | ❌ | ✅ | ❌ (SELECT only)
| ⚠️ Basic keyword | ✅ |
| **Disallowed Functions** | ❌ | ✅ | ❌
| ⚠️ Hardcoded | ✅ |
| **Query Timeout** | ❌ | ✅ | ⚠️
Request-level | ⚠️ Not enforced | ✅ |
| **Query Logging** | ✅ | ✅ | ❌
| ❌ | ✅ |
| **Async Execution** | ❌ | ✅ | ❌
| ❌ | ✅ |
| **Progress Tracking** | ❌ | ✅ | ❌
| ❌ | ✅ |
| **Query Audit Trail** | ❌ | ✅ | ❌
| ❌ | ✅ |
| **Result Limit** | ❌ | ✅ | ✅
| ✅ | ✅ |
| **Dry Run Mode** | ❌ | ❌ | ❌
| ❌ | ✅ |
This inconsistency creates security gaps (e.g., MCP Tools don't apply RLS)
and maintenance challenges (e.g., fixing a caching bug requires changes in
multiple places).
### Objective
This PR **does not** consolidate all execution paths. Instead, it provides a
**unified API** that:
- Is compatible with all existing execution paths
- Can be incrementally adopted by each path
- Ensures consistent security guarantees (RLS always applied, DML permission
checks)
- Supports both synchronous and asynchronous execution
- Provides a stable interface for **Superset extensions** to execute queries
with full feature support
### API Overview
#### Synchronous Execution
```python
from superset_core.api.types import QueryOptions, QueryStatus
result = database.execute(
"SELECT * FROM users WHERE active = true",
options=QueryOptions(schema="public", limit=100)
)
if result.status == QueryStatus.SUCCESS:
df = result.data
print(f"Found {result.row_count} rows")
```
#### Asynchronous Execution
```python
result = database.execute_async(
"SELECT * FROM large_table",
options=QueryOptions(schema="analytics")
)
# Check status and get results
status = result.handle.get_status()
if status == QueryStatus.SUCCESS:
query_result = result.handle.get_result()
df = query_result.data
# Cancel if needed
result.handle.cancel()
```
#### QueryOptions
- `catalog` / `schema` - Target database location
- `limit` - Maximum rows to return
- `template_params` - Variables for Jinja2 template rendering
- `cache` - Result caching configuration (timeout, force_refresh)
- `dry_run` - Return transformed SQL without execution
- `timeout_seconds` - Query timeout override
### Security Guarantees
Both `execute()` and `execute_async()` enforce:
- **Row-level security (RLS)** - Always applied via AST transformation
- **DML permission checks** - Requires `database.allow_dml=True`
- **Disallowed SQL functions** - Blocked per configuration
- **Query timeout protection** - Prevents runaway queries
### Future Consolidation Tasks
The following tasks will migrate each execution path to use the new unified
API:
- [Remove query execution code in MCP
service](https://github.com/orgs/apache/projects/544?pane=issue&itemId=144286307)
- [Remove query execution code in Database
model](https://github.com/orgs/apache/projects/544?pane=issue&itemId=144286406)
- [Remove query execution code in SQL
Lab](https://github.com/orgs/apache/projects/544?pane=issue&itemId=144286423)
- [Remove query execution code in
charts](https://github.com/orgs/apache/projects/544?pane=issue&itemId=144510614)
### Test Plan
- [x] Unit tests for sync execution (SELECT, DML, templates, RLS, caching,
timeouts)
- [x] Unit tests for async execution (query creation, handle methods,
cancellation)
- [x] Pre-commit checks pass (mypy, ruff, pylint)
### ADDITIONAL INFORMATION
<!--- Check any relevant boxes with "x" -->
<!--- HINT: Include "Fixes #nnn" if you are fixing an existing issue -->
- [ ] Has associated issue:
- [ ] Required feature flags:
- [ ] Changes UI
- [ ] Includes DB Migration (follow approval process in
[SIP-59](https://github.com/apache/superset/issues/13351))
- [ ] Migration is atomic, supports rollback & is backwards-compatible
- [ ] Confirm DB migration upgrade and downgrade tested
- [ ] Runtime estimates and downtime expectations provided
- [ ] Introduces new feature or API
- [ ] Removes existing feature or API
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]