aminghadersohi commented on code in PR #36458:
URL: https://github.com/apache/superset/pull/36458#discussion_r2599621316


##########
superset/mcp_service/chart/tool/list_charts.py:
##########
@@ -132,20 +133,16 @@ def _serialize_chart(
             % (count, total_pages)
         )
 
-        # Apply field filtering via serialization context if select_columns specified
-        # This triggers ChartInfo._filter_fields_by_context for each chart
-        if request.select_columns:
-            await ctx.debug(
-                "Applying field filtering via serialization context: select_columns=%s"
-                % (request.select_columns,)
-            )
-            # Return dict with context - FastMCP will serialize it
-            return result.model_dump(
-                mode="json", context={"select_columns": request.select_columns}
-            )
-
-        # No filtering - return full result as dict
-        return result.model_dump(mode="json")
+        # Apply field filtering via serialization context
+        # Use requested columns or defaults - always filter to reduce token usage
+        columns_to_filter = request.select_columns or DEFAULT_CHART_COLUMNS

Review Comment:
   Addressed - Now using `result.columns_requested` which is already 
resolved/validated by ModelListCore instead of raw `request.select_columns`.
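
For reference, the updated serializer tail now looks roughly like this (a sketch, not the exact committed code; it assumes `columns_requested` carries either the caller's `select_columns` or the resolved defaults):

```python
# Rough sketch of the updated path - columns_requested is already
# resolved/validated by ModelListCore, so the raw request.select_columns
# is no longer consulted here.
await ctx.debug(
    "Applying field filtering via serialization context: select_columns=%s"
    % (result.columns_requested,)
)
return result.model_dump(
    mode="json", context={"select_columns": result.columns_requested}
)
```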



##########
superset/mcp_service/dashboard/tool/list_dashboards.py:
##########
@@ -108,17 +112,11 @@ def _serialize_dashboard(
         page_size=request.page_size,
     )
 
-    # Apply field filtering via serialization context if select_columns specified
-    # This triggers DashboardInfo._filter_fields_by_context for each dashboard
-    if request.select_columns:
-        await ctx.debug(
-            "Applying field filtering via serialization context: select_columns=%s"
-            % (request.select_columns,)
-        )
-        # Return dict with context - FastMCP will serialize it
-        return result.model_dump(
-            mode="json", context={"select_columns": request.select_columns}
-        )
-
-    # No filtering - return full result as dict
-    return result.model_dump(mode="json")
+    # Apply field filtering via serialization context
+    # Use requested columns or defaults - always filter to reduce token usage
+    columns_to_filter = request.select_columns or DEFAULT_DASHBOARD_COLUMNS

Review Comment:
   Addressed - Same fix applied: now using `result.columns_requested` which is 
resolved by ModelListCore.



##########
superset/mcp_service/dataset/tool/list_datasets.py:
##########
@@ -144,20 +144,16 @@ def _serialize_dataset(
             )
         )
 
-        # Apply field filtering via serialization context if select_columns specified
-        # This triggers DatasetInfo._filter_fields_by_context for each dataset
-        if request.select_columns:
-            await ctx.debug(
-                "Applying field filtering via serialization context: select_columns=%s"
-                % (request.select_columns,)
-            )
-            # Return dict with context - FastMCP will serialize it
-            return result.model_dump(
-                mode="json", context={"select_columns": request.select_columns}
-            )
-
-        # No filtering - return full result as dict
-        return result.model_dump(mode="json")
+        # Apply field filtering via serialization context
+        # Use requested columns or defaults - always filter to reduce token usage
+        columns_to_filter = request.select_columns or DEFAULT_DATASET_COLUMNS
+        await ctx.debug(
+            "Applying field filtering via serialization context: select_columns=%s"
+            % (columns_to_filter,)
+        )
+        return result.model_dump(
+            mode="json", context={"select_columns": columns_to_filter}
+        )

Review Comment:
   Addressed - Now returning `DatasetList.model_validate(filtered)` which 
properly converts the dict back to a Pydantic model instance.
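
The shape of that conversion is roughly the following (sketch only; the surrounding function and field handling are omitted):

```python
# Rough sketch: dump with the filtering context, then re-validate so the
# tool still returns a typed DatasetList instead of a plain dict.
filtered = result.model_dump(
    mode="json", context={"select_columns": result.columns_requested}
)
return DatasetList.model_validate(filtered)
```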



##########
superset/mcp_service/mcp_core.py:
##########
@@ -106,6 +106,8 @@ def __init__(
         search_columns: List[str],
         list_field_name: str,
         output_list_schema: Type[L],
+        all_columns: List[str] | None = None,
+        sortable_columns: List[str] | None = None,
         logger: logging.Logger | None = None,

Review Comment:
   Addressed - Moved `logger` parameter before `all_columns` and 
`sortable_columns` to preserve backwards compatibility with existing callers.
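
The resulting tail of the signature is roughly (abbreviated sketch; earlier parameters and the body are unchanged and omitted here):

```python
# Rough sketch of the reordered parameter tail: the new optional parameters
# are appended after logger, so existing callers that already pass logger
# (positionally or by keyword) keep working unmodified.
def __init__(
    self,
    search_columns: List[str],
    list_field_name: str,
    output_list_schema: Type[L],
    logger: logging.Logger | None = None,
    all_columns: List[str] | None = None,
    sortable_columns: List[str] | None = None,
) -> None:
    ...
```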



##########
superset/mcp_service/mcp_core.py:
##########
@@ -195,6 +199,8 @@ def get_keys(obj: BaseModel | dict[str, Any] | Any) -> List[str]:
             "has_next": page < total_pages - 1,
             "columns_requested": columns_requested,
             "columns_loaded": columns_to_load,
+            "columns_available": self.all_columns,
+            "sortable_columns": self.sortable_columns,

Review Comment:
   Addressed - Now returning `list(...)` copies in the result to prevent 
external mutation of internal state.
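
To illustrate why the copies matter, here is a standalone sketch (hypothetical `SchemaCore`, not code from this PR):

```python
# Hypothetical example: returning list(...) copies means a caller mutating
# the response payload cannot alter the core's internal column configuration.
class SchemaCore:
    def __init__(self, all_columns: list[str], sortable_columns: list[str]) -> None:
        self.all_columns = all_columns
        self.sortable_columns = sortable_columns

    def build_result(self) -> dict[str, list[str]]:
        return {
            "columns_available": list(self.all_columns),
            "sortable_columns": list(self.sortable_columns),
        }


core = SchemaCore(["id", "slice_name", "changed_on"], ["changed_on"])
payload = core.build_result()
payload["columns_available"].append("mutated")
assert core.all_columns == ["id", "slice_name", "changed_on"]  # internal state untouched
```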



##########
superset/mcp_service/system/tool/get_schema.py:
##########
@@ -0,0 +1,142 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""
+Unified schema discovery tool for MCP service.
+
+This tool consolidates schema discovery for all model types (chart, dataset,
+dashboard) into a single endpoint, reducing token usage and API calls.
+"""
+
+import logging
+from typing import Literal
+
+from fastmcp import Context
+from superset_core.mcp import tool
+
+from superset.daos.chart import ChartDAO
+from superset.daos.dashboard import DashboardDAO
+from superset.daos.dataset import DatasetDAO
+from superset.mcp_service.common.schema_discovery import (
+    CHART_DEFAULT_COLUMNS,
+    CHART_SEARCH_COLUMNS,
+    CHART_SELECT_COLUMNS,
+    CHART_SORTABLE_COLUMNS,
+    DASHBOARD_DEFAULT_COLUMNS,
+    DASHBOARD_SEARCH_COLUMNS,
+    DASHBOARD_SELECT_COLUMNS,
+    DASHBOARD_SORTABLE_COLUMNS,
+    DATASET_DEFAULT_COLUMNS,
+    DATASET_SEARCH_COLUMNS,
+    DATASET_SELECT_COLUMNS,
+    DATASET_SORTABLE_COLUMNS,
+    GetSchemaRequest,
+    GetSchemaResponse,
+    ModelSchemaInfo,
+)
+from superset.mcp_service.mcp_core import ModelGetSchemaCore
+from superset.mcp_service.utils.schema_utils import parse_request
+
+logger = logging.getLogger(__name__)
+
+# Create core instances for each model type
+_chart_schema_core = ModelGetSchemaCore(
+    model_type="chart",
+    dao_class=ChartDAO,
+    output_schema=ModelSchemaInfo,
+    select_columns=CHART_SELECT_COLUMNS,
+    sortable_columns=CHART_SORTABLE_COLUMNS,
+    default_columns=CHART_DEFAULT_COLUMNS,
+    search_columns=CHART_SEARCH_COLUMNS,
+    default_sort="changed_on",
+    default_sort_direction="desc",
+    logger=logger,
+)
+
+_dataset_schema_core = ModelGetSchemaCore(
+    model_type="dataset",
+    dao_class=DatasetDAO,
+    output_schema=ModelSchemaInfo,
+    select_columns=DATASET_SELECT_COLUMNS,
+    sortable_columns=DATASET_SORTABLE_COLUMNS,
+    default_columns=DATASET_DEFAULT_COLUMNS,
+    search_columns=DATASET_SEARCH_COLUMNS,
+    default_sort="changed_on",
+    default_sort_direction="desc",
+    logger=logger,
+)
+
+_dashboard_schema_core = ModelGetSchemaCore(
+    model_type="dashboard",
+    dao_class=DashboardDAO,
+    output_schema=ModelSchemaInfo,
+    select_columns=DASHBOARD_SELECT_COLUMNS,
+    sortable_columns=DASHBOARD_SORTABLE_COLUMNS,
+    default_columns=DASHBOARD_DEFAULT_COLUMNS,
+    search_columns=DASHBOARD_SEARCH_COLUMNS,
+    default_sort="changed_on",
+    default_sort_direction="desc",
+    logger=logger,
+)
+
+# Map model types to their core instances
+_SCHEMA_CORES: dict[
+    Literal["chart", "dataset", "dashboard"], ModelGetSchemaCore[ModelSchemaInfo]
+] = {
+    "chart": _chart_schema_core,
+    "dataset": _dataset_schema_core,
+    "dashboard": _dashboard_schema_core,
+}
+
+
+@tool(tags=["discovery"])
+@parse_request(GetSchemaRequest)
+async def get_schema(request: GetSchemaRequest, ctx: Context) -> GetSchemaResponse:
+    """
+    Get comprehensive schema metadata for a model type.
+
+    Returns all information needed to construct valid queries:
+    - select_columns: All columns available for selection
+    - filter_columns: Filterable columns with their operators
+    - sortable_columns: Columns valid for order_column
+    - default_select: Columns returned when select_columns not specified
+    - search_columns: Columns searched by the search parameter
+
+    This unified tool consolidates discovery, reducing API calls and token usage.
+    Use this instead of calling get_chart_available_filters,
+    get_dataset_available_filters, and get_dashboard_available_filters separately.
+
+    Args:
+        model_type: One of "chart", "dataset", or "dashboard"
+
+    Returns:
+        Comprehensive schema information for the requested model type
+    """
+    await ctx.info(f"Getting schema for model_type={request.model_type}")
+
+    # Get the appropriate core instance and run it
+    core = _SCHEMA_CORES[request.model_type]

Review Comment:
   Addressed - Now using `.get()` with explicit error handling that returns a 
clear error message listing valid model types.
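
A minimal sketch of that guard (illustrative only; whether the error surfaces as an exception or as an error payload is an assumption here):

```python
# Rough sketch: an unknown model_type yields a clear message listing the
# valid types instead of an unhandled KeyError.
core = _SCHEMA_CORES.get(request.model_type)
if core is None:
    valid = ", ".join(sorted(_SCHEMA_CORES))
    raise ValueError(
        f"Unknown model_type {request.model_type!r}; valid types are: {valid}"
    )
```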



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

