[PR] feat(mcp): add sql_expression to ColumnRef for calculated metrics [superset]

via GitHub Sat, 23 May 2026 21:11:37 -0700


gkneighb opened a new pull request, #40398:
URL: https://github.com/apache/superset/pull/40398


   ### SUMMARY
   
   `ColumnRef` (the metric/column reference used in `generate_chart` configs) 
previously offered two modes for defining a metric:
   
   - **SIMPLE**: `{name, aggregate}` — e.g. `SUM(sales)`
   - **saved_metric**: `{name, saved_metric: true}` — reference a stored 
dataset metric by name
   
   This left **calculated metrics** — ratios, growth rates, conversion 
percentages — unreachable from MCP unless the dataset already had a saved 
metric defined. There's no MCP tool to add saved metrics to an existing 
dataset, so an LLM trying to build a "Profit Ratio" KPI is stuck:
   
   ```
   big_number's metric only accepts ColumnRef (name + aggregate)
   ↳ No SQL expression slot
   ↳ saved_metric requires manually editing the dataset (no MCP path)
   ↳ Fallback: use `update_chart` to patch raw form_data
      — defeats the typed-config story
   ```
   
   This PR adds a third mode:
   - **SQL**: `{name, sql_expression}` — e.g. `SUM(profit) / NULLIF(SUM(sales), 
0)`
   
   `name` becomes the metric's label; `sql_expression` carries the formula.
   
   ### BEFORE/AFTER
   
   **Before** — `generate_chart` rejects calculated metrics:
   
   ```
   generate_chart({
     config: {
       chart_type: "big_number",
       metric: {name: "profit_ratio", sql_expression: "SUM(profit)/SUM(sales)"},
     }
   })
   → Error: ColumnRef has no field 'sql_expression'
   ```
   
   **After** — calculated metric works in a single call:
   
   ```
   generate_chart({
     config: {
       chart_type: "big_number",
       metric: {
         name: "profit_ratio",
         sql_expression: "SUM(profit) / NULLIF(SUM(sales), 0)",
         label: "Profit Ratio",
       },
       y_axis_format: ".2%",
     }
   })
   → Chart created. get_chart_data returns {'Profit Ratio': 0.15063…}
   ```
   
   ### CHANGES
   
   Four files touched:
   
   1. **`superset/mcp_service/chart/schemas.py`**
      - Add `sql_expression: str | None` field to `ColumnRef` with description 
geared at LLM consumers (mentions ratios, growth rates, conversion percentages).
      - Extend `is_metric` to recognize sql_expression as valid.
      - Extend `clear_aggregate_for_saved_metric` so sql_expression also takes 
over from aggregate (matching saved_metric semantics).
      - Add `disallow_saved_and_sql` validator — saved_metric and 
sql_expression are mutually exclusive paths.
   
   2. **`superset/mcp_service/chart/chart_utils.py`**
      - Extend `create_metric_object` to emit Superset's SQL form_data shape 
when sql_expression is set.
   
   3. **`superset/mcp_service/chart/validation/schema_validator.py`**
      - Update the big_number metric guard to accept sql_expression alongside 
aggregate and saved_metric. Updated error message + suggestions teach the LLM 
the new path.
   
   4. **`superset/mcp_service/chart/validation/dataset_validator.py`**
      - Skip the dataset-column-existence check for sql_expression metrics — 
their column references live inline in the expression, not as the 
ColumnRef.name.
   
   The REST API has supported this shape via adhoc form_data metrics forever 
(chart `params.metric = {expressionType: "SQL", sqlExpression, label}`). This 
PR exposes the same capability through the typed MCP schema so an LLM can 
construct `big_number`, `xy`, `table`, `pivot_table`, and `mixed_timeseries` 
charts with calculated metrics in one round-trip.
   
   ### TESTING INSTRUCTIONS
   
   ```bash
   pytest 
tests/unit_tests/mcp_service/chart/test_chart_schemas.py::TestColumnRefSqlExpression
   pytest 
tests/unit_tests/mcp_service/chart/test_chart_utils.py::TestCreateMetricObject
   pytest tests/unit_tests/mcp_service/chart/test_big_number_chart.py
   ```
   
   New tests:
   - `TestColumnRefSqlExpression` — 7 tests covering field defaults, 
acceptance, aggregate-clearing, is_metric, mutual exclusion with saved_metric, 
use in XYChartConfig, max-length enforcement.
   - `TestCreateMetricObject` (2 added) — SQL emission shape + label fallback.
   - `TestBigNumberChartConfig` (3 added) — config acceptance, SchemaValidator 
pre-validation guard, and that bare ColumnRef (no aggregate/sql/saved) is still 
rejected.
   
   **306 tests pass across the four touched modules; zero regressions.**
   
   ### ADDITIONAL INFORMATION
   
   - [ ] Has associated issue: _(filing alongside, will link)_
   - [ ] Required feature flags: _none_
   - [x] Changes UI: _no_
   - [x] Includes DB Migration: _no_
   - [x] Includes packaged template files: _no_
   - [x] Introduces new feature or API: _adds `sql_expression` field to the 
`ColumnRef` schema used by `generate_chart` and related MCP tools_
   - [x] Removes existing feature or API: _no_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[PR] feat(mcp): add sql_expression to ColumnRef for calculated metrics [superset]

Reply via email to