codeant-ai-for-open-source[bot] commented on code in PR #40398:
URL: https://github.com/apache/superset/pull/40398#discussion_r3293888224
##########
superset/mcp_service/chart/validation/dataset_validator.py:
##########
@@ -139,6 +139,11 @@ def _validate_columns_exist(
for col_ref in column_refs:
if col_ref.saved_metric:
continue
+ if col_ref.sql_expression:
+ # SQL-expression metrics define their column references
+ # inline in the expression itself; the ColumnRef's `name`
+ # is just a label. Skip the dataset-column lookup.
+ continue
Review Comment:
**Suggestion:** This skips dataset-column validation for any `ColumnRef`
that carries `sql_expression`, but `ColumnRef` is also used for non-metric
positions (like x-axis, dimensions, and group-bys). That lets invalid column
names in those positions bypass Tier-1 validation and fail later at query time
with a generic DB error instead of a clear validation error. Restrict this
bypass to metric contexts only (or track whether each extracted ref came from a
metric field before skipping). [incorrect condition logic]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ❌ generate_chart accepts invalid dimension columns with sql_expression.
- ❌ XY and mixed_timeseries X-axis errors occur at database.
- ⚠️ Handlebars and table raw columns may bypass validation.
- ⚠️ LLM callers receive opaque DB errors, not clear hints.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. A caller invokes the MCP `generate_chart` tool with a JSON request parsed
by
`SchemaValidator.validate_request()` in
`superset/mcp_service/chart/validation/schema_validator.py:36-55`, using
`chart_type:
"xy"` and a config where a non-metric position carries `sql_expression`,
e.g.:
- `config.x = {"name": "bogus_dim", "sql_expression":
"SUM(profit)/SUM(sales)"}` and a
valid `y` metric list.
2. `ValidationPipeline._validate_dataset()` in
`superset/mcp_service/chart/validation/pipeline.py:17-28` calls
`DatasetValidator.validate_against_dataset(config, dataset_id,
dataset_context)`, which in
turn calls `_extract_column_references(config)` in
`superset/mcp_service/chart/validation/dataset_validator.py:153-199`. For an
`XYChartConfig`, `_extract_column_references` appends:
- `config.x` (the X-axis `ColumnRef`), all `config.y` metric
`ColumnRef`s, and any
`group_by` `ColumnRef`s into a single `column_refs` list without tracking
which refs
are metrics vs dimensions.
3. Still in `DatasetValidator.validate_against_dataset`,
`_validate_columns_exist(column_refs, dataset_context)` is invoked at
`superset/mcp_service/chart/validation/dataset_validator.py:118-120`. In
that method, the
loop over `column_refs` at lines 24-32 checks:
- `if col_ref.saved_metric: continue`
- `if col_ref.sql_expression: ... continue` (lines 142-146 in the PR hunk)
Because `config.x` has `sql_expression` set, this X-axis `ColumnRef` hits
the `if
col_ref.sql_expression:` branch and skips the dataset column lookup
entirely, even
though for X/group_by/dimension positions `name` is expected to be a real
dataset
column.
4. Dataset validation therefore returns `True` with no
`ChartGenerationError`, and the
pipeline proceeds to build form_data via `map_xy_config(config, dataset_id)`
in
`superset/mcp_service/chart/chart_utils.py:704-79`. `map_xy_config` uses
`config.x.name`
and `config.group_by[i].name` directly as column names in form_data
(`"x_axis":
config.x.name`, `"groupby": [c.name ...]`) and ignores `sql_expression`.
Superset then
executes a query referencing `bogus_dim`, which does not exist in the
dataset, causing a
database-level "column not found" failure that surfaces as a generic DB
engine error
instead of the intended Tier-1 validation error that would have been raised
if
`_validate_columns_exist` had actually checked non-metric column names
carrying
`sql_expression`.
```
</details>
[Fix in
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=4c608f3bd5654bb0b433d6334af4565e&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
| [Fix in VSCode
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=4c608f3bd5654bb0b433d6334af4565e&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/mcp_service/chart/validation/dataset_validator.py
**Line:** 142:146
**Comment:**
*Incorrect Condition Logic: This skips dataset-column validation for
any `ColumnRef` that carries `sql_expression`, but `ColumnRef` is also used for
non-metric positions (like x-axis, dimensions, and group-bys). That lets
invalid column names in those positions bypass Tier-1 validation and fail later
at query time with a generic DB error instead of a clear validation error.
Restrict this bypass to metric contexts only (or track whether each extracted
ref came from a metric field before skipping).
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40398&comment_hash=3746ecec817bf86cb9f72f371d69a598e59f1507ce4f9010fa5384f3b2732e36&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40398&comment_hash=3746ecec817bf86cb9f72f371d69a598e59f1507ce4f9010fa5384f3b2732e36&reaction=dislike'>👎</a>
##########
superset/mcp_service/chart/validation/schema_validator.py:
##########
@@ -392,18 +392,25 @@ def _pre_validate_big_number_config(
],
error_code="INVALID_BIG_NUMBER_METRIC_TYPE",
)
- if not metric.get("aggregate") and not metric.get("saved_metric"):
+ if (
+ not metric.get("aggregate")
+ and not metric.get("saved_metric")
+ and not metric.get("sql_expression")
+ ):
Review Comment:
**Suggestion:** The new pre-validation treats any truthy `sql_expression` as
valid, so whitespace-only expressions (for example `" "`) bypass this guard
and proceed to query building, where they fail with database SQL syntax errors.
Normalize/strip the value and reject blank expressions here so invalid SQL
metrics fail fast with a clear validation error. [incorrect condition logic]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ❌ Big Number charts with blank SQL metrics crash at query.
- ⚠️ MCP callers receive backend SQL errors, not validation feedback.
- ⚠️ LLM-generated configs with whitespace SQL fail non-gracefully.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. A caller constructs MCP chart request data targeting the MCP validation
pipeline (see
`superset/mcp_service/chart/validation/pipeline.py:96-115`), with a Big
Number config
whose metric uses a whitespace-only SQL expression:
2.
```python
request_data = {
"dataset_id": 1,
"config": {
"chart_type": "big_number",
"metric": {
"name": "bad_metric",
"sql_expression": " ", # only spaces
},
},
}
```
3. `ValidationPipeline.validate_request_with_warnings()`
(pipeline.py:96-115) calls
`SchemaValidator.validate_request(request_data)`
(`schema_validator.py:8-31`), which in
turn calls `_pre_validate(request_data)` and then
`_pre_validate_chart_type(chart_type,
config)` (`schema_validator.py:7-20). For `chart_type == "big_number"`,
`_pre_validate_big_number_config` is invoked (`schema_validator.py:22-25).
4. Inside `_pre_validate_big_number_config` at lines 395-399, the guard:
```python
if (
not metric.get("aggregate")
and not metric.get("saved_metric")
and not metric.get("sql_expression")
):
...
```
treats `" "` as truthy, so the guard passes and no validation error is
returned, even
though the SQL expression is effectively blank.
5. `SchemaValidator.validate_request` then constructs a
`GenerateChartRequest` whose
`config.metric` is a `ColumnRef` with `sql_expression=" "`
(`superset/mcp_service/chart/schemas.py:36-45). `ColumnRef.is_metric`
(`schemas.py:49-51`)
uses `bool(self.sql_expression)`, so this whitespace value still marks the
column as a
metric.
6. When the chart is executed, `map_big_number_config` (`chart_utils.py:8-20
around
815-822) calls `create_metric_object(config.metric)`
(`chart_utils.py:47-73), which builds
a metric dict with `"expressionType": "SQL"` and `"sqlExpression": " "`.
This is sent to
Superset's query engine as a SQL metric, resulting in an invalid/empty SQL
expression and
a database-level SQL syntax error instead of a clear MCP validation error.
```
</details>
[Fix in
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=cc0541817eca469abfdecb842e10001b&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
| [Fix in VSCode
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=cc0541817eca469abfdecb842e10001b&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/mcp_service/chart/validation/schema_validator.py
**Line:** 395:399
**Comment:**
*Incorrect Condition Logic: The new pre-validation treats any truthy
`sql_expression` as valid, so whitespace-only expressions (for example `" "`)
bypass this guard and proceed to query building, where they fail with database
SQL syntax errors. Normalize/strip the value and reject blank expressions here
so invalid SQL metrics fail fast with a clear validation error.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40398&comment_hash=38849d1dcc6b58a79da010039836228033e7346fa880874d44fb30009683386c&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40398&comment_hash=38849d1dcc6b58a79da010039836228033e7346fa880874d44fb30009683386c&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]