codeant-ai-for-open-source[bot] commented on code in PR #36350:
URL: https://github.com/apache/superset/pull/36350#discussion_r2582899831
##########
superset/charts/schemas.py:
##########
@@ -1415,6 +1415,12 @@ class ChartDataQueryContextSchema(Schema):
form_data = fields.Raw(allow_none=True, required=False)
+ client_id = fields.String(
+ required=False,
+ allow_none=True,
+ metadata={"description": "Optional client-generated id for this chart
query"},
Review Comment:
**Suggestion:** Adding `client_id` as a loadable field means
`ChartDataQueryContextSchema().load()` will include `client_id` in the returned
data and then pass it as an unexpected keyword argument to
`QueryContextFactory.create(**data)`, causing a runtime TypeError whenever a
client sends this field; to avoid this, the field should not be part of the
loaded kwargs (e.g. by making it dump-only) until the factory and QueryContext
are updated to accept it. [type error]
**Severity Level:** Minor ⚠️
```suggestion
dump_only=True,
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
This is a real runtime bug, not cosmetics.
`ChartDataQueryContextSchema` does a `@post_load` that calls
`self.get_query_context_factory().create(**data)`. The current
`QueryContextFactory.create` signature is:
def create(
*,
datasource: DatasourceDict,
queries: list[dict[str, Any]],
form_data: dict[str, Any] | None = None,
result_type: ChartDataResultType | None = None,
result_format: ChartDataResultFormat | None = None,
force: bool = False,
custom_cache_timeout: int | None = None,
) -> QueryContext:
There is no `client_id` parameter. With the new field defined as a normal
loadable field, any request body that includes `client_id` will deserialize
into `data["client_id"]` and then `create(**data)` will raise:
TypeError: QueryContextFactory.create() got an unexpected keyword
argument 'client_id'
Marking the field as `dump_only=True` keeps it in the schema (for output
/ documentation) but excludes it from deserialization, so `client_id`
will not be included in `data` and no unexpected-kwarg error occurs.
Given that `QueryContext` and `QueryContextFactory` do not currently use
a client id at all, dropping it on load is functionally safe and prevents
a production TypeError.
So this change directly fixes a concrete runtime failure and is therefore
not undesirable.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/charts/schemas.py
**Line:** 1421:1421
**Comment:**
*Type Error: Adding `client_id` as a loadable field means
`ChartDataQueryContextSchema().load()` will include `client_id` in the returned
data and then pass it as an unexpected keyword argument to
`QueryContextFactory.create(**data)`, causing a runtime TypeError whenever a
client sends this field; to avoid this, the field should not be part of the
loaded kwargs (e.g. by making it dump-only) until the factory and QueryContext
are updated to accept it.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
##########
superset-frontend/src/explore/components/ExploreViewContainer/index.jsx:
##########
@@ -398,9 +399,22 @@ function ExploreViewContainer(props) {
);
function onStop() {
- if (props.chart && props.chart.queryController) {
+ // Method 1: Abort the in-flight HTTP request using AbortController
+ if (props.chart?.queryController) {
props.chart.queryController.abort();
}
+
+ // Method 2: Send stop request to backend if we have a query ID
+ const queryId = props.chart?.latestQueryId;
+ if (queryId) {
+ SupersetClient.post({
+ endpoint: '/api/v1/query/stop',
+ body: JSON.stringify({ client_id: queryId }),
+ headers: { 'Content-Type': 'application/json' },
+ }).catch(error => {
Review Comment:
**Suggestion:** The stop handler reads `latestQueryId` from the chart state,
but `ChartState` (and the chart reducer) never defines or sets this property,
so `queryId` will typically be `undefined` and the `/api/v1/query/stop` call
will never execute; instead, derive the query id from the existing
`queriesResponse` array (which contains `QueryData` objects with `query_id`) so
that the backend stop endpoint is actually invoked for the running query.
[logic error]
**Severity Level:** Minor ⚠️
```suggestion
const queryId =
props.chart?.queriesResponse && props.chart.queriesResponse.length > 0
? props.chart.queriesResponse[0]?.query_id
: undefined;
if (queryId) {
SupersetClient.post({
endpoint: '/api/v1/query/stop',
body: JSON.stringify({ client_id: queryId }),
headers: { 'Content-Type': 'application/json' },
}).catch(() => {
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
In the actual Explore chart state (`ChartState` from `src/explore/types.ts`
and the `chartReducer`),
there is no `latestQueryId` field at all: the reducer never sets it, and the
type definition omits it.
That means `props.chart?.latestQueryId` will almost always be `undefined`,
so the `/api/v1/query/stop`
branch *never runs* and the backend stop endpoint is effectively dead code.
The suggestion switches to deriving the identifier from
`props.chart.queriesResponse[0]?.query_id`.
`queriesResponse` is a real, well-defined part of `ChartState`, and
`QueryData` in
`@superset-ui/core` has an optional `query_id` field, so at least this code
is operating on
actual, populated state instead of a fantasy property. It eliminates a
genuine logic bug
(reading a non-existent field and silently never sending the stop request)
and replaces it
with a call based on the only available query identifier, without
introducing syntax or
type errors. This is a behavioral fix, not just cosmetic.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:**
superset-frontend/src/explore/components/ExploreViewContainer/index.jsx
**Line:** 408:414
**Comment:**
*Logic Error: The stop handler reads `latestQueryId` from the chart
state, but `ChartState` (and the chart reducer) never defines or sets this
property, so `queryId` will typically be `undefined` and the
`/api/v1/query/stop` call will never execute; instead, derive the query id from
the existing `queriesResponse` array (which contains `QueryData` objects with
`query_id`) so that the backend stop endpoint is actually invoked for the
running query.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
##########
superset/common/query_context.py:
##########
@@ -122,8 +123,10 @@ def get_df_payload(
force_cached=force_cached,
)
- def get_query_result(self, query_object: QueryObject) -> QueryResult:
- return self._processor.get_query_result(query_object)
+ def get_query_result(
+ self, query_object: QueryObject, query: Query | None = None
+ ) -> QueryResult:
+ return self._processor.get_query_result(query_object, query=query)
Review Comment:
**Suggestion:** The new `get_query_result` method forwards a `query` keyword
argument to `_processor.get_query_result`, but
`QueryContextProcessor.get_query_result` only accepts a single `query_object`
parameter; this will raise a `TypeError` at runtime whenever
`QueryContext.get_query_result` is called, preventing queries from running.
[type error]
**Severity Level:** Minor ⚠️
```suggestion
return self._processor.get_query_result(query_object)
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
This is a real, concrete runtime bug. In `QueryContextProcessor` the method
is defined as:
def get_query_result(self, query_object: QueryObject) -> QueryResult:
It takes only `query_object`. The new wrapper in `QueryContext` forwards a
`query=` keyword argument:
self._processor.get_query_result(query_object, query=query)
which will raise `TypeError: QueryContextProcessor.get_query_result()
got an unexpected keyword argument 'query'` the moment the optional
parameter is used. The suggested change removes the unsupported keyword
and calls `get_query_result` with only the allowed argument, eliminating
the TypeError while preserving the wrapper's signature. This directly
fixes a correctness issue, not a stylistic nit.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/common/query_context.py
**Line:** 129:129
**Comment:**
*Type Error: The new `get_query_result` method forwards a `query`
keyword argument to `_processor.get_query_result`, but
`QueryContextProcessor.get_query_result` only accepts a single `query_object`
parameter; this will raise a `TypeError` at runtime whenever
`QueryContext.get_query_result` is called, preventing queries from running.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
##########
superset/common/query_context_processor.py:
##########
@@ -219,15 +279,17 @@ def query_cache_key(self, query_obj: QueryObject,
**kwargs: Any) -> str | None:
)
return cache_key
- def get_query_result(self, query_object: QueryObject) -> QueryResult:
+ def get_query_result(
+ self, query_object: QueryObject, query: Query | None = None
+ ) -> QueryResult:
"""
Returns a pandas dataframe based on the query object.
This method delegates to the datasource's get_query_result method,
which handles query execution, normalization, time offsets, and
post-processing.
"""
- return self._qc_datasource.get_query_result(query_object)
+ return self._qc_datasource.get_query_result(query_object, query=query)
Review Comment:
**Suggestion:** The wrapper `get_query_result` passes an extra `query`
argument down to the datasource's `get_query_result` method, but
`BaseDatasource.get_query_result` only accepts a single `query_object`
parameter, so this will raise a TypeError at runtime when executing chart
queries. [type error]
**Severity Level:** Minor ⚠️
```suggestion
The optional `query` parameter is accepted for compatibility with
callers but is not passed down, since datasources currently do not
support it.
"""
return self._qc_datasource.get_query_result(query_object)
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
`BaseDatasource.get_query_result` (see superset/models/helpers.py) is
defined as
`def get_query_result(self, query_object: QueryObject) -> QueryResult` and
does
not accept a `query` parameter. The PR's wrapper now calls
`self._qc_datasource.get_query_result(query_object, query=query)`, which will
raise a `TypeError: get_query_result() takes 2 positional arguments but 3
were
given` at runtime for all chart queries. The suggested change keeps the
wrapper
signature (so callers like `get_df_payload(..., query=query_model)` still
compile)
but stops passing the extra argument down to datasources, matching the actual
method signature and eliminating a real runtime error. This is a concrete
bug fix,
not cosmetic.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/common/query_context_processor.py
**Line:** 291:292
**Comment:**
*Type Error: The wrapper `get_query_result` passes an extra `query`
argument down to the datasource's `get_query_result` method, but
`BaseDatasource.get_query_result` only accepts a single `query_object`
parameter, so this will raise a TypeError at runtime when executing chart
queries.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
##########
superset/connectors/sqla/models.py:
##########
@@ -1637,16 +1637,16 @@ def _get_top_groups(
return or_(*groups)
- def query(self, query_obj: QueryObjectDict) -> QueryResult:
+ def query(
+ self, query_obj: QueryObjectDict, query: "Query" | None = None
+ ) -> QueryResult:
"""
Executes the query for SqlaTable with additional column ordering logic.
This overrides ExploreMixin.query() to add SqlaTable-specific behavior
for handling column_order from extras.
"""
- # Get the base result from ExploreMixin
- # (explicitly, not super() which would hit BaseDatasource first)
- result = ExploreMixin.query(self, query_obj)
+ result = ExploreMixin.query(self, query_obj, query=query)
Review Comment:
**Suggestion:** The overridden query method passes an unexpected keyword
argument `query` into `ExploreMixin.query`, whose current signature only
accepts `(self, query_obj)`; this will raise a TypeError at runtime as soon as
SqlaTable.query is invoked, preventing any chart query from executing. [type
error]
**Severity Level:** Minor ⚠️
```suggestion
result = ExploreMixin.query(self, query_obj)
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
This is a real runtime bug, not cosmetic nonsense. ExploreMixin.query is
defined
in superset/models/helpers.py as `def query(self, query_obj:
QueryObjectDict) ->
QueryResult` and does not accept a `query` parameter or **kwargs. Calling it
as
`ExploreMixin.query(self, query_obj, query=query)` will raise a `TypeError:
query() got an unexpected keyword argument 'query'` the moment
SqlaTable.query is used. Dropping the bogus keyword so the call matches the
actual signature fixes a concrete, verifiable error that would break chart
queries.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/connectors/sqla/models.py
**Line:** 1649:1649
**Comment:**
*Type Error: The overridden query method passes an unexpected keyword
argument `query` into `ExploreMixin.query`, whose current signature only
accepts `(self, query_obj)`; this will raise a TypeError at runtime as soon as
SqlaTable.query is invoked, preventing any chart query from executing.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
##########
superset/models/helpers.py:
##########
@@ -1182,6 +1185,7 @@ def assign_column_label(df: pd.DataFrame) ->
Optional[pd.DataFrame]:
self.catalog,
self.schema,
mutator=assign_column_label,
+ query=query,
Review Comment:
**Suggestion:** The call to the database layer passes an unexpected keyword
argument `query` to `Database.get_df`, whose current signature does not accept
this parameter, causing a `TypeError` at runtime when executing queries via
this mixin. [type error]
**Severity Level:** Minor ⚠️
```suggestion
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
In the current code, `ExploreMixin.query` calls `self.database.get_df(...,
query=query)`, but
`Database.get_df` is defined as `get_df(self, sql, catalog=None,
schema=None, mutator=None)`
and does not accept a `query` keyword argument. This will cause Python to
raise
`TypeError: get_df() got an unexpected keyword argument 'query'` at runtime
whenever this path
is executed. The suggested change (removing `query=query`) directly fixes a
real, verified
runtime bug rather than being cosmetic.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/models/helpers.py
**Line:** 1188:1188
**Comment:**
*Type Error: The call to the database layer passes an unexpected
keyword argument `query` to `Database.get_df`, whose current signature does not
accept this parameter, causing a `TypeError` at runtime when executing queries
via this mixin.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
##########
superset/common/query_context_processor.py:
##########
@@ -199,6 +250,15 @@ def get_df_payload(
"label_map": label_map,
}
+ # expose client id so callers (and ultimately the frontend) can stop
queries
+ if query_model is not None:
Review Comment:
**Suggestion:** The variable `query_model` is only defined inside the
cache-miss branch, but is later accessed unconditionally when building the
payload, so when the result comes from cache (or the branch is skipped) this
will raise an UnboundLocalError at `if query_model is not None`, breaking
cached chart requests. [logic error]
**Severity Level:** Minor ⚠️
```suggestion
if "query_model" in locals() and query_model is not None:
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
In the current implementation `query_model` is only defined inside the
cache-miss
branch in `get_df_payload`. When the result is served from cache (or the
whole
branch is skipped), the later `if query_model is not None:` reads a local
variable
that was never assigned, which will raise `UnboundLocalError` and blow up
cached
chart responses. Guarding the access with `if "query_model" in locals() and
query_model is not None:` avoids reading an uninitialized local, fixing a
real
runtime bug, not just style. A cleaner alternative would be initializing
`query_model = None` at the start of the function, but the proposed change
is still
functionally correct and addresses a genuine issue.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/common/query_context_processor.py
**Line:** 254:254
**Comment:**
*Logic Error: The variable `query_model` is only defined inside the
cache-miss branch, but is later accessed unconditionally when building the
payload, so when the result comes from cache (or the branch is skipped) this
will raise an UnboundLocalError at `if query_model is not None`, breaking
cached chart requests.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]