Re: [PR] feat(mcp): add unified get_schema tool for schema discovery [superset]

via GitHub Mon, 08 Dec 2025 11:43:48 -0800


aminghadersohi commented on code in PR #36458:
URL: https://github.com/apache/superset/pull/36458#discussion_r2599908256



##########
superset/mcp_service/dataset/schemas.py:
##########
@@ -200,6 +179,14 @@ class DatasetList(BaseModel):
     has_next: bool
     columns_requested: List[str] | None = None
     columns_loaded: List[str] | None = None

Review Comment:
   Addressed - Updated all three schema files (chart, dashboard, dataset) to 
use `Field(default_factory=list)` for `columns_requested` and `columns_loaded`, 
ensuring consistent list defaults across all model types.



##########
superset/mcp_service/mcp_core.py:
##########
@@ -473,34 +479,96 @@ def _generate_instance_info(self) -> BaseModel:
             raise
 
 
-class ModelGetAvailableFiltersCore(BaseCore, Generic[S]):
+class ModelGetSchemaCore(BaseCore, Generic[S]):
     """
-    Generic tool for retrieving available filterable columns and operators for 
a
-    model. Used for get_dataset_available_filters, get_chart_available_filters,
-    get_dashboard_available_filters, etc.
+    Generic tool for retrieving comprehensive schema metadata for a model type.
+
+    Provides unified schema discovery for list tools:
+    - select_columns: All columns available for selection
+    - filter_columns: Filterable columns with their operators
+    - sortable_columns: Columns valid for order_column
+    - default_columns: Columns returned when select_columns not specified
+    - search_columns: Columns searched by the search parameter
+    - default_sort: Default column for sorting
+    - default_sort_direction: Default sort direction ("asc" or "desc")
+
+    Replaces the individual get_*_available_filters tools with a unified 
approach.
     """
 
     def __init__(
         self,
+        model_type: Literal["chart", "dataset", "dashboard"],
         dao_class: Type[BaseDAO[Any]],
         output_schema: Type[S],
+        select_columns: List[Any],
+        sortable_columns: List[str],
+        default_columns: List[str],
+        search_columns: List[str],
+        default_sort: str = "changed_on",
+        default_sort_direction: Literal["asc", "desc"] = "desc",
         logger: logging.Logger | None = None,
     ) -> None:
+        """
+        Initialize the schema discovery core.
+
+        Args:
+            model_type: The type of model (chart, dataset, dashboard)
+            dao_class: The DAO class to query for filter columns
+            output_schema: Pydantic schema for the response (e.g., 
ModelSchemaInfo)
+            select_columns: Column metadata (List[ColumnMetadata] or similar)
+            sortable_columns: Column names that support sorting
+            default_columns: Column names returned by default
+            search_columns: Column names used for text search
+            default_sort: Default sort column
+            default_sort_direction: Default sort direction
+            logger: Optional logger instance
+        """
         super().__init__(logger)
+        self.model_type = model_type
         self.dao_class = dao_class
         self.output_schema = output_schema
+        self.select_columns = select_columns
+        self.sortable_columns = sortable_columns
+        self.default_columns = default_columns
+        self.search_columns = search_columns
+        self.default_sort = default_sort
+        self.default_sort_direction = default_sort_direction
 
-    def run_tool(self) -> S:
+    def _get_filter_columns(self) -> Dict[str, List[str]]:
+        """Get filterable columns and operators from the DAO."""
         try:
             filterable = self.dao_class.get_filterable_columns_and_operators()
-            # Ensure column_operators is a plain dict, not a custom type
-            column_operators = dict(filterable)
-            response = self.output_schema(column_operators=column_operators)
+            return dict(filterable)

Review Comment:
   Addressed - Added defensive handling in `_get_filter_columns()`: checks for 
None return value, handles both dict and dict-like objects, and logs a warning 
with the unexpected type if conversion fails.



##########
superset/mcp_service/system/resources/__init__.py:
##########
@@ -18,4 +18,7 @@
 """System resources for Superset MCP service"""
 
 # Import to register resources when module is imported
-from . import instance_metadata  # noqa: F401
+from . import (
+    instance_metadata,  # noqa: F401
+    schema_discovery,  # noqa: F401
+)

Review Comment:
   Addressed - Changed from grouped tuple import to separate import lines for 
better clarity and per-import linting.



##########
tests/unit_tests/mcp_service/chart/tool/test_list_charts.py:
##########
@@ -170,3 +173,37 @@ def test_model_dump_serialization(self):
         assert "page_size" in data
         assert data["filters"][0]["col"] == "slice_name"
         assert data["select_columns"] == ["id", "slice_name"]
+
+
+class TestChartDefaultColumnFiltering:
+    """Test default column filtering behavior for charts."""
+
+    def test_minimal_default_columns_constant(self):
+        """Test that minimal default columns are properly defined."""
+        from superset.mcp_service.common.schema_discovery import 
CHART_DEFAULT_COLUMNS
+
+        # Should have exactly 4 minimal columns
+        assert len(CHART_DEFAULT_COLUMNS) == 4
+        assert set(CHART_DEFAULT_COLUMNS) == {"id", "slice_name", "viz_type", 
"uuid"}

Review Comment:
   Addressed - Replaced the exact length assertion with `issubset` check: 
`required_columns.issubset(set(CHART_DEFAULT_COLUMNS))`. This ensures the 
required columns are present while allowing future additions.



##########
tests/unit_tests/mcp_service/chart/tool/test_list_charts.py:
##########
@@ -170,3 +173,37 @@ def test_model_dump_serialization(self):
         assert "page_size" in data
         assert data["filters"][0]["col"] == "slice_name"
         assert data["select_columns"] == ["id", "slice_name"]
+
+
+class TestChartDefaultColumnFiltering:
+    """Test default column filtering behavior for charts."""
+
+    def test_minimal_default_columns_constant(self):
+        """Test that minimal default columns are properly defined."""
+        from superset.mcp_service.common.schema_discovery import 
CHART_DEFAULT_COLUMNS
+
+        # Should have exactly 4 minimal columns
+        assert len(CHART_DEFAULT_COLUMNS) == 4
+        assert set(CHART_DEFAULT_COLUMNS) == {"id", "slice_name", "viz_type", 
"uuid"}
+
+        # Heavy columns should NOT be in defaults
+        assert "form_data" not in CHART_DEFAULT_COLUMNS
+        assert "query_context" not in CHART_DEFAULT_COLUMNS
+        assert "description" not in CHART_DEFAULT_COLUMNS
+        assert "datasource_name" not in CHART_DEFAULT_COLUMNS
+
+    def test_empty_select_columns_default(self):
+        """Test that select_columns defaults to empty list which triggers
+        minimal defaults in tool."""
+        request = ListChartsRequest()
+        assert request.select_columns == []
+
+    def test_explicit_select_columns(self):
+        """Test that explicit select_columns can include non-default 
columns."""
+        request = ListChartsRequest(
+            select_columns=["id", "slice_name", "description", "form_data"]
+        )
+        assert "description" in request.select_columns
+        assert "form_data" in request.select_columns
+        # These are explicitly requested, not defaults
+        assert len(request.select_columns) == 4

Review Comment:
   Addressed - Replaced the loose length assertion with a precise set equality 
check: `assert set(request.select_columns) == {"id", "slice_name", 
"description", "form_data"}`



##########
superset/mcp_service/system/resources/schema_discovery.py:
##########
@@ -0,0 +1,194 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""
+MCP Resources for schema discovery.
+
+These resources provide static metadata about available columns, filters,
+and sorting options for chart, dataset, and dashboard models.
+LLM clients can use these resources to understand what parameters are valid
+without making API calls.
+"""
+
+import logging
+from typing import Any
+
+from superset.mcp_service.app import mcp
+from superset.mcp_service.auth import mcp_auth_hook
+
+logger = logging.getLogger(__name__)
+
+
+def _build_schema_resource(model_type: str) -> dict[str, Any]:
+    """Build schema resource data for a model type."""
+    from superset.mcp_service.common.schema_discovery import (
+        CHART_ALL_COLUMNS,
+        CHART_DEFAULT_COLUMNS,
+        CHART_SEARCH_COLUMNS,
+        CHART_SELECT_COLUMNS,
+        CHART_SORTABLE_COLUMNS,
+        DASHBOARD_ALL_COLUMNS,
+        DASHBOARD_DEFAULT_COLUMNS,
+        DASHBOARD_SEARCH_COLUMNS,
+        DASHBOARD_SELECT_COLUMNS,
+        DASHBOARD_SORTABLE_COLUMNS,
+        DATASET_ALL_COLUMNS,
+        DATASET_DEFAULT_COLUMNS,
+        DATASET_SEARCH_COLUMNS,
+        DATASET_SELECT_COLUMNS,
+        DATASET_SORTABLE_COLUMNS,
+    )
+
+    schemas = {
+        "chart": {
+            "model_type": "chart",
+            "select_columns": [
+                {"name": col.name, "description": col.description, "type": 
col.type}

Review Comment:
   Addressed - Added `_safe_col_dict()` helper function that uses `getattr()` 
with defaults and `str()` coercion to ensure JSON-safe serialization of column 
metadata.



##########
superset/mcp_service/system/resources/schema_discovery.py:
##########
@@ -0,0 +1,194 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""
+MCP Resources for schema discovery.
+
+These resources provide static metadata about available columns, filters,
+and sorting options for chart, dataset, and dashboard models.
+LLM clients can use these resources to understand what parameters are valid
+without making API calls.
+"""
+
+import logging
+from typing import Any
+
+from superset.mcp_service.app import mcp
+from superset.mcp_service.auth import mcp_auth_hook
+
+logger = logging.getLogger(__name__)
+
+
+def _build_schema_resource(model_type: str) -> dict[str, Any]:
+    """Build schema resource data for a model type."""
+    from superset.mcp_service.common.schema_discovery import (
+        CHART_ALL_COLUMNS,
+        CHART_DEFAULT_COLUMNS,
+        CHART_SEARCH_COLUMNS,
+        CHART_SELECT_COLUMNS,
+        CHART_SORTABLE_COLUMNS,
+        DASHBOARD_ALL_COLUMNS,
+        DASHBOARD_DEFAULT_COLUMNS,
+        DASHBOARD_SEARCH_COLUMNS,
+        DASHBOARD_SELECT_COLUMNS,
+        DASHBOARD_SORTABLE_COLUMNS,
+        DATASET_ALL_COLUMNS,
+        DATASET_DEFAULT_COLUMNS,
+        DATASET_SEARCH_COLUMNS,
+        DATASET_SELECT_COLUMNS,
+        DATASET_SORTABLE_COLUMNS,
+    )
+
+    schemas = {
+        "chart": {
+            "model_type": "chart",
+            "select_columns": [
+                {"name": col.name, "description": col.description, "type": 
col.type}
+                for col in CHART_SELECT_COLUMNS
+            ],
+            "all_column_names": CHART_ALL_COLUMNS,
+            "default_columns": CHART_DEFAULT_COLUMNS,
+            "sortable_columns": CHART_SORTABLE_COLUMNS,
+            "search_columns": CHART_SEARCH_COLUMNS,
+            "default_sort": "changed_on",
+            "default_sort_direction": "desc",
+        },
+        "dataset": {
+            "model_type": "dataset",
+            "select_columns": [
+                {"name": col.name, "description": col.description, "type": 
col.type}

Review Comment:
   Addressed - Same fix applied using the shared `_safe_col_dict()` helper for 
JSON-safe serialization.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] feat(mcp): add unified get_schema tool for schema discovery [superset]

Reply via email to