codeant-ai-for-open-source[bot] commented on code in PR #38562:
URL: https://github.com/apache/superset/pull/38562#discussion_r2930703935
##########
superset/mcp_service/flask_singleton.py:
##########
@@ -52,62 +51,45 @@
logger.info("Reusing existing Flask app from app context for MCP
service")
# Use _get_current_object() to get the actual Flask app, not the
LocalProxy
app = current_app._get_current_object()
+ elif appbuilder_initialized:
+ # appbuilder is initialized but we have no app context. Calling
+ # create_app() here would invoke appbuilder.init_app() a second
+ # time with a *different* Flask app, overwriting shared internal
+ # state (views, security manager, etc.). Fail loudly instead of
+ # silently corrupting the singleton.
+ raise RuntimeError(
+ "appbuilder is already initialized but no Flask app context is "
+ "available. Cannot call create_app() as it would re-initialize "
+ "appbuilder with a different Flask app instance."
+ )
Review Comment:
**Suggestion:** The new `RuntimeError` path makes `get_flask_app()` fail
whenever `appbuilder` is initialized but no request/app context is active,
which is exactly the situation some MCP paths hit before they push a context.
This can cause MCP startup/tool calls to crash in-process even though a Flask
app already exists. Reuse the already-bound app instance from `appbuilder` when
available, and only raise if no concrete app can be resolved. [logic error]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ❌ MCP initialization fails in no-context in-process paths.
- ⚠️ Tool auth/context setup can fail before execution.
- ⚠️ Middleware factories depending on Flask app may not initialize.
```
</details>
```suggestion
elif appbuilder_initialized:
# appbuilder is initialized but we have no app context. If the
# concrete Flask app is already bound, reuse it directly.
existing_app = getattr(appbuilder, "_app", None)
if isinstance(existing_app, Flask):
logger.info("Reusing existing Flask app from appbuilder for MCP
service")
app = existing_app
else:
raise RuntimeError(
"appbuilder is already initialized but no Flask app context
is "
"available. Cannot resolve an existing Flask app instance."
)
```
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Start Superset so `appbuilder` is initialized
(`superset/extensions/__init__.py:11`
defines global `appbuilder`; `superset/mcp_service/flask_singleton.py:46`
checks
`appbuilder._session is not None`).
2. Execute MCP setup from a code path without Flask app context (real
callers use
`get_flask_app()` outside request context, e.g.
`superset/mcp_service/server.py:341-345`,
`superset/mcp_service/auth.py:47-54` when `has_app_context()` is false).
3. Import/load `superset.mcp_service.flask_singleton` via those callers;
module-level
initialization runs immediately
(`superset/mcp_service/flask_singleton.py:35-65`).
4. When `appbuilder_initialized=True` and no context, branch `elif
appbuilder_initialized`
raises `RuntimeError` (`superset/mcp_service/flask_singleton.py:54-64`),
aborting MCP
server/middleware/auth initialization instead of reusing existing app.
```
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/mcp_service/flask_singleton.py
**Line:** 54:64
**Comment:**
*Logic Error: The new `RuntimeError` path makes `get_flask_app()` fail
whenever `appbuilder` is initialized but no request/app context is active,
which is exactly the situation some MCP paths hit before they push a context.
This can cause MCP startup/tool calls to crash in-process even though a Flask
app already exists. Reuse the already-bound app instance from `appbuilder` when
available, and only raise if no concrete app can be resolved.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38562&comment_hash=0a6a9ef7cfc88e24582948d0c1b9ec3e22b3ff722b493abbeb40f688f7c644e3&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38562&comment_hash=0a6a9ef7cfc88e24582948d0c1b9ec3e22b3ff722b493abbeb40f688f7c644e3&reaction=dislike'>👎</a>
##########
superset/mcp_service/system/schemas.py:
##########
@@ -125,33 +111,19 @@ class FeatureAvailability(BaseModel):
class InstanceInfo(BaseModel):
- instance_summary: InstanceSummary = Field(
- ..., description="Instance summary information"
- )
- recent_activity: RecentActivity = Field(
- ..., description="Recent activity information"
- )
- dashboard_breakdown: DashboardBreakdown = Field(
- ..., description="Dashboard breakdown information"
- )
- database_breakdown: DatabaseBreakdown = Field(
- ..., description="Database breakdown by type"
- )
- popular_content: PopularContent = Field(
- ..., description="Popular content information"
- )
+ instance_summary: InstanceSummary
+ recent_activity: RecentActivity
+ dashboard_breakdown: DashboardBreakdown
+ database_breakdown: DatabaseBreakdown
+ popular_content: PopularContent
current_user: UserInfo | None = Field(
None,
- description="The authenticated user making the request. "
- "Use current_user.id with created_by_fk filter to find your own
assets.",
- )
- feature_availability: FeatureAvailability = Field(
- ...,
description=(
- "Dynamic feature availability for the current user and deployment"
+ "Use current_user.id with created_by_fk filter to find your own
assets."
),
)
- timestamp: datetime = Field(..., description="Response timestamp")
+ feature_availability: FeatureAvailability
Review Comment:
**Suggestion:** `feature_availability` is now required, but
`instance_metadata.py` builds `InstanceInfo` without registering that metric
calculator. This causes a Pydantic validation failure during resource
generation and forces the metadata endpoint into its error fallback path. Make
this field default to an empty `FeatureAvailability` so callers that don't
provide it still produce valid responses. [logic error]
<details>
<summary><b>Severity Level:</b> Critical 🚨</summary>
```mdx
- ❌ instance://metadata returns fallback error payload only.
- ❌ Available datasets/databases context missing from metadata resource.
- ⚠️ Accessible menus not provided in metadata response.
```
</details>
```suggestion
feature_availability: FeatureAvailability =
Field(default_factory=FeatureAvailability)
```
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Start MCP app import path; `superset/mcp_service/app.py:419-423` imports
`system.resources`, and
`superset/mcp_service/system/resources/__init__.py:21-24`
registers `instance://metadata` via `instance_metadata.py`.
2. Trigger resource read for `instance://metadata` (resource function
`get_instance_metadata_resource` at
`superset/mcp_service/system/resources/instance_metadata.py:36-38`).
3. In that function, `InstanceInfoCore` is configured with
`output_schema=InstanceInfo`
(`instance_metadata.py:78`) but `metric_calculators` omits
`feature_availability`
(`instance_metadata.py:79-85`), then calls
`instance_info_core.get_resource()`
(`instance_metadata.py:95`).
4. `InstanceInfoCore._generate_instance_info()` builds `response_data` and
instantiates
schema (`superset/mcp_service/mcp_core.py:121-129`); `InstanceInfo` requires
`feature_availability` (`superset/mcp_service/system/schemas.py:125`), so
Pydantic raises
`ValidationError` (subclass of `ValueError`), caught by
`instance_metadata.py:143-155`,
returning fallback error JSON instead of metadata.
```
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/mcp_service/system/schemas.py
**Line:** 125:125
**Comment:**
*Logic Error: `feature_availability` is now required, but
`instance_metadata.py` builds `InstanceInfo` without registering that metric
calculator. This causes a Pydantic validation failure during resource
generation and forces the metadata endpoint into its error fallback path. Make
this field default to an empty `FeatureAvailability` so callers that don't
provide it still produce valid responses.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38562&comment_hash=5f7cf68eead1fa1f4deaab812137a251e3658c8488ba3d94f9ca57aa6b31f468&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38562&comment_hash=5f7cf68eead1fa1f4deaab812137a251e3658c8488ba3d94f9ca57aa6b31f468&reaction=dislike'>👎</a>
##########
superset/mcp_service/dashboard/schemas.py:
##########
@@ -286,45 +286,31 @@ class GetDashboardInfoRequest(MetadataCacheControl):
class DashboardInfo(BaseModel):
- id: int | None = Field(None, description="Dashboard ID")
- dashboard_title: str | None = Field(None, description="Dashboard title")
- slug: str | None = Field(None, description="Dashboard slug")
- description: str | None = Field(None, description="Dashboard description")
- css: str | None = Field(None, description="Custom CSS for the dashboard")
- certified_by: str | None = Field(None, description="Who certified the
dashboard")
- certification_details: str | None = Field(None, description="Certification
details")
- json_metadata: str | None = Field(
- None, description="Dashboard metadata (JSON string)"
- )
- position_json: str | None = Field(None, description="Chart positions (JSON
string)")
- published: bool | None = Field(
- None, description="Whether the dashboard is published"
- )
- is_managed_externally: bool | None = Field(
- None, description="Whether managed externally"
- )
- external_url: str | None = Field(None, description="External URL")
- created_on: str | datetime | None = Field(None, description="Creation
timestamp")
- changed_on: str | datetime | None = Field(
- None, description="Last modification timestamp"
- )
- created_by: str | None = Field(None, description="Dashboard creator
(username)")
- changed_by: str | None = Field(None, description="Last modifier
(username)")
- uuid: str | None = Field(None, description="Dashboard UUID (converted to
string)")
- url: str | None = Field(None, description="Dashboard URL")
- created_on_humanized: str | None = Field(
- None, description="Humanized creation time"
- )
- changed_on_humanized: str | None = Field(
- None, description="Humanized modification time"
- )
- chart_count: int = Field(0, description="Number of charts in the
dashboard")
- owners: List[UserInfo] = Field(default_factory=list,
description="Dashboard owners")
- tags: List[TagInfo] = Field(default_factory=list, description="Dashboard
tags")
- roles: List[RoleInfo] = Field(default_factory=list, description="Dashboard
roles")
- charts: List[ChartInfo] = Field(
- default_factory=list, description="Dashboard charts"
- )
+ id: int | None = None
+ dashboard_title: str | None = None
+ slug: str | None = None
+ description: str | None = None
+ css: str | None = None
+ certified_by: str | None = None
+ certification_details: str | None = None
+ json_metadata: str | None = None
+ position_json: str | None = None
+ published: bool | None = None
+ is_managed_externally: bool | None = None
+ external_url: str | None = None
+ created_on: str | datetime | None = None
+ changed_on: str | datetime | None = None
+ created_by: str | None = None
+ changed_by: str | None = None
Review Comment:
**Suggestion:** `DashboardInfo` now omits `created_by_name` and
`changed_by_name`, but `serialize_dashboard_object` and dashboard schema
discovery still use those field names. Pydantic will silently ignore these
unexpected inputs, so requested creator/modifier display-name columns disappear
from list responses. Restore the name-based fields to keep serializer and
schema contracts aligned. [logic error]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ⚠️ list_dashboards omits requested creator display-name columns.
- ⚠️ get_schema advertises columns list_dashboards cannot return.
- ❌ MCP client column-selection contract becomes inconsistent.
```
</details>
```suggestion
created_by: str | None = None
changed_by: str | None = None
created_by_name: str | None = None
changed_by_name: str | None = None
```
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Start MCP server where tools are registered in
`superset/mcp_service/app.py:402-407`
(`list_dashboards`) and `superset/mcp_service/app.py:423-426` (`get_schema`).
2. Call `get_schema(model_type=\"dashboard\")`; this executes
`superset/mcp_service/system/tool/get_schema.py:126` and builds dashboard
select columns
via `select_columns=get_dashboard_columns()` at `.../get_schema.py:103`.
3. `get_dashboard_columns()` includes
`"created_by_name"`/`"changed_by_name"` in
`superset/mcp_service/common/schema_discovery.py:293-295` and `:275-277`, so
clients are
told these are valid selectable columns.
4. Call
`list_dashboards(select_columns=[\"created_by_name\",\"changed_by_name\"])`;
execution goes through
`superset/mcp_service/dashboard/tool/list_dashboards.py:68`,
computes columns from schema discovery at `:103`, and serializes with
context filtering at
`:149` and `:156`.
5. During item serialization, `serialize_dashboard_object()` passes
`created_by_name`/`changed_by_name` into `DashboardInfo` at
`superset/mcp_service/dashboard/schemas.py:515` and `:512`, but
`DashboardInfo` no longer
defines these fields (`:303-304` only has `created_by`/`changed_by`), so
requested
display-name columns are dropped from output.
```
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/mcp_service/dashboard/schemas.py
**Line:** 303:304
**Comment:**
*Logic Error: `DashboardInfo` now omits `created_by_name` and
`changed_by_name`, but `serialize_dashboard_object` and dashboard schema
discovery still use those field names. Pydantic will silently ignore these
unexpected inputs, so requested creator/modifier display-name columns disappear
from list responses. Restore the name-based fields to keep serializer and
schema contracts aligned.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38562&comment_hash=36ffa07b4a287e27404606d8aef5b334442903f5b815d4e79dd7bf361f4e547d&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38562&comment_hash=36ffa07b4a287e27404606d8aef5b334442903f5b815d4e79dd7bf361f4e547d&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]