codeant-ai-for-open-source[bot] commented on code in PR #36458:
URL: https://github.com/apache/superset/pull/36458#discussion_r2599536502


##########
superset/mcp_service/chart/tool/list_charts.py:
##########
@@ -132,20 +133,16 @@ def _serialize_chart(
             % (count, total_pages)
         )
 
-        # Apply field filtering via serialization context if select_columns specified
-        # This triggers ChartInfo._filter_fields_by_context for each chart
-        if request.select_columns:
-            await ctx.debug(
-                "Applying field filtering via serialization context: select_columns=%s"
-                % (request.select_columns,)
-            )
-            # Return dict with context - FastMCP will serialize it
-            return result.model_dump(
-                mode="json", context={"select_columns": request.select_columns}
-            )
-
-        # No filtering - return full result as dict
-        return result.model_dump(mode="json")
+        # Apply field filtering via serialization context
+        # Use requested columns or defaults - always filter to reduce token usage
+        columns_to_filter = request.select_columns or DEFAULT_CHART_COLUMNS

Review Comment:
   **Suggestion:** `request.select_columns` may still be a raw string (JSON) or
other unparsed input; passing that raw value into the serialization `context`
can cause incorrect behavior or type errors. Use the parsed `columns_requested`
from the `result` (which ModelListCore normalizes) so the context always
receives a proper list of column names. [type error]
   
   **Severity Level:** Minor ⚠️
   ```suggestion
           columns_to_filter = getattr(result, "columns_requested", DEFAULT_CHART_COLUMNS)
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   Good catch — ModelListCore normalizes/records the parsed columns as 
result.columns_requested.
   request.select_columns can be an unparsed string or other form, so using the 
value returned by
   the tool (result.columns_requested) ensures the serialization context always 
gets a proper list.
   This fixes a real type/behavior mismatch and is safe to apply here.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/mcp_service/chart/tool/list_charts.py
   **Line:** 138:138
   **Comment:**
        *Type Error: `request.select_columns` may still be a raw string (JSON) or
other unparsed input; passing that raw value into the serialization `context`
can cause incorrect behavior or type errors. Use the parsed `columns_requested`
from the `result` (which ModelListCore normalizes) so the context always
receives a proper list of column names.
   
   Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and keep it concise.
   ```
   </details>
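
For context, here is a minimal sketch of how context-based field filtering of this kind can work, assuming Pydantic v2.7+ (where `model_dump` accepts `context`). The model and fields are hypothetical stand-ins for `ChartInfo`, purely to illustrate why the `context` value must be a real list of column names:

```python
from typing import Any

from pydantic import BaseModel, model_serializer


class ChartInfoSketch(BaseModel):
    """Hypothetical stand-in for ChartInfo, for illustration only."""

    id: int
    slice_name: str
    viz_type: str | None = None

    @model_serializer(mode="wrap")
    def _filter_fields_by_context(self, handler: Any, info: Any) -> dict[str, Any]:
        data = handler(self)
        selected = (info.context or {}).get("select_columns")
        # Membership tests only behave as intended when this is a list;
        # a raw JSON string like '["id"]' would do substring matching instead.
        if selected:
            return {k: v for k, v in data.items() if k in selected}
        return data


chart = ChartInfoSketch(id=1, slice_name="Sales", viz_type="bar")
print(chart.model_dump(mode="json", context={"select_columns": ["id", "slice_name"]}))
# {'id': 1, 'slice_name': 'Sales'}
```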



##########
superset/mcp_service/dashboard/tool/list_dashboards.py:
##########
@@ -108,17 +112,11 @@ def _serialize_dashboard(
         page_size=request.page_size,
     )
 
-    # Apply field filtering via serialization context if select_columns specified
-    # This triggers DashboardInfo._filter_fields_by_context for each dashboard
-    if request.select_columns:
-        await ctx.debug(
-            "Applying field filtering via serialization context: select_columns=%s"
-            % (request.select_columns,)
-        )
-        # Return dict with context - FastMCP will serialize it
-        return result.model_dump(
-            mode="json", context={"select_columns": request.select_columns}
-        )
-
-    # No filtering - return full result as dict
-    return result.model_dump(mode="json")
+    # Apply field filtering via serialization context
+    # Use requested columns or defaults - always filter to reduce token usage
+    columns_to_filter = request.select_columns or DEFAULT_DASHBOARD_COLUMNS

Review Comment:
   **Suggestion:** `request.select_columns` may be provided as a JSON string or in
other unparsed forms; using it directly as `columns_to_filter` can pass a string
into the serialization `context` (which expects a list), causing incorrect
filtering or runtime type errors. Parse `request.select_columns` into a list
(e.g., with `parse_json_or_list`) before using it in the serialization context.
[type error]
   
   **Severity Level:** Minor ⚠️
   ```suggestion
       from superset.mcp_service.utils.schema_utils import parse_json_or_list
       # Apply field filtering via serialization context
       # Use requested columns or defaults - always filter to reduce token usage
       if request.select_columns:
           columns_to_filter = parse_json_or_list(
               request.select_columns, param_name="select_columns"
           )
       else:
           columns_to_filter = DEFAULT_DASHBOARD_COLUMNS
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   ModelListCore.run_tool parses select_columns for DAO querying, but the 
original request.select_columns may still be a JSON string or non-list 
representation depending on the caller/schema. Passing that raw value into the 
serialization context can produce incorrect filtering or downstream runtime 
errors. Parsing (e.g., via parse_json_or_list) before using it in the 
serialization context is a sensible, low-risk correctness fix.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/mcp_service/dashboard/tool/list_dashboards.py
   **Line:** 115:117
   **Comment:**
        *Type Error: `request.select_columns` may be provided as a JSON string or
in other unparsed forms; using it directly as `columns_to_filter` can pass a
string into the serialization context (which expects a list), causing incorrect
filtering or runtime type errors. Parse `request.select_columns` into a list
(e.g., with `parse_json_or_list`) before using it in the serialization context.
   
   Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and keep it concise.
   ```
   </details>
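
For reference, a helper with the shape the suggestion assumes might look like the sketch below; this is an illustrative equivalent, not the actual `parse_json_or_list` from `superset.mcp_service.utils.schema_utils`:

```python
import json
from typing import Any


def parse_json_or_list(value: Any, param_name: str = "value") -> list[str]:
    """Accept a list of strings or a JSON-encoded array; always return a list."""
    if isinstance(value, list):
        return [str(item) for item in value]
    if isinstance(value, str):
        try:
            parsed = json.loads(value)
        except json.JSONDecodeError as exc:
            raise ValueError(f"{param_name} is not valid JSON: {value!r}") from exc
        if isinstance(parsed, list):
            return [str(item) for item in parsed]
    raise ValueError(f"{param_name} must be a list or a JSON array")


assert parse_json_or_list('["id", "dashboard_title"]') == ["id", "dashboard_title"]
assert parse_json_or_list(["id"]) == ["id"]
```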



##########
superset/mcp_service/mcp_core.py:
##########
@@ -195,6 +199,8 @@ def get_keys(obj: BaseModel | dict[str, Any] | Any) -> List[str]:
             "has_next": page < total_pages - 1,
             "columns_requested": columns_requested,
             "columns_loaded": columns_to_load,
+            "columns_available": self.all_columns,
+            "sortable_columns": self.sortable_columns,

Review Comment:
   **Suggestion:** Exposing internal list objects directly in the response can 
leak internal mutable state or allow external mutation; the new response fields 
`columns_available` and `sortable_columns` should return a shallow copy (or 
empty list if None) to avoid accidental external modification or sharing the 
same list object used internally. [resource leak]
   
   **Severity Level:** Minor ⚠️
   ```suggestion
               "columns_available": list(self.all_columns) if self.all_columns is not None else [],
               "sortable_columns": list(self.sortable_columns) if self.sortable_columns is not None else [],
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   Exposing internal list objects in the response can lead to accidental 
external mutation.
   Returning shallow copies (or empty lists when None) is a cheap, defensive 
improvement that avoids surprising shared-state bugs.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/mcp_service/mcp_core.py
   **Line:** 202:203
   **Comment:**
        *Resource Leak: Exposing internal list objects directly in the response 
can leak internal mutable state or allow external mutation; the new response 
fields `columns_available` and `sortable_columns` should return a shallow copy 
(or empty list if None) to avoid accidental external modification or sharing 
the same list object used internally.
   
   Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and keep it concise.
   ```
   </details>
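
A quick demonstration of the aliasing problem the copy avoids (the names here are made up):

```python
# Internal state shared by reference: a caller's edit leaks back in.
all_columns = ["id", "slice_name"]
response = {"columns_available": all_columns}
response["columns_available"].append("oops")
assert all_columns == ["id", "slice_name", "oops"]  # internal list mutated

# A shallow copy keeps the caller's edits local to the response.
all_columns = ["id", "slice_name"]
response = {"columns_available": list(all_columns)}
response["columns_available"].append("oops")
assert all_columns == ["id", "slice_name"]  # internal list untouched
```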



##########
tests/unit_tests/mcp_service/chart/tool/test_list_charts.py:
##########
@@ -170,3 +173,37 @@ def test_model_dump_serialization(self):
         assert "page_size" in data
         assert data["filters"][0]["col"] == "slice_name"
         assert data["select_columns"] == ["id", "slice_name"]
+
+
+class TestChartDefaultColumnFiltering:
+    """Test default column filtering behavior for charts."""
+
+    def test_minimal_default_columns_constant(self):
+        """Test that minimal default columns are properly defined."""
+        from superset.mcp_service.common.schema_discovery import CHART_DEFAULT_COLUMNS
+
+        # Should have exactly 4 minimal columns
+        assert len(CHART_DEFAULT_COLUMNS) == 4
+        assert set(CHART_DEFAULT_COLUMNS) == {"id", "slice_name", "viz_type", "uuid"}
+
+        # Heavy columns should NOT be in defaults
+        assert "form_data" not in CHART_DEFAULT_COLUMNS
+        assert "query_context" not in CHART_DEFAULT_COLUMNS
+        assert "description" not in CHART_DEFAULT_COLUMNS
+        assert "datasource_name" not in CHART_DEFAULT_COLUMNS
+
+    def test_empty_select_columns_default(self):
+        """Test that select_columns defaults to empty list which triggers
+        minimal defaults in tool."""
+        request = ListChartsRequest()
+        assert request.select_columns == []
+
+    def test_explicit_select_columns(self):
+        """Test that explicit select_columns can include non-default columns."""
+        request = ListChartsRequest(
+            select_columns=["id", "slice_name", "description", "form_data"]
+        )
+        assert "description" in request.select_columns
+        assert "form_data" in request.select_columns
+        # These are explicitly requested, not defaults
+        assert len(request.select_columns) == 4

Review Comment:
   **Suggestion:** Replace the loose length assertion for explicitly requested 
columns with a precise equality check to ensure the request preserves the 
requested columns and their order, making the test fail only if the behavior 
actually changes. [logic error]
   
   **Severity Level:** Minor ⚠️
   ```suggestion
           expected = ["id", "slice_name", "description", "form_data"]
           request = ListChartsRequest(select_columns=expected)
           # Ensure the explicit columns are preserved exactly
           assert request.select_columns == expected
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   Tightening the assertion to check exact equality (including order) makes the 
test more precise and will catch regressions where the request coercion or 
parsing mutates contents or order.
   This is a reasonable, actionable improvement to test correctness rather than 
a purely cosmetic change.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** tests/unit_tests/mcp_service/chart/tool/test_list_charts.py
   **Line:** 203:209
   **Comment:**
        *Logic Error: Replace the loose length assertion for explicitly 
requested columns with a precise equality check to ensure the request preserves 
the requested columns and their order, making the test fail only if the 
behavior actually changes.
   
   Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and keep it concise.
   ```
   </details>



##########
tests/unit_tests/mcp_service/chart/tool/test_list_charts.py:
##########
@@ -170,3 +173,37 @@ def test_model_dump_serialization(self):
         assert "page_size" in data
         assert data["filters"][0]["col"] == "slice_name"
         assert data["select_columns"] == ["id", "slice_name"]
+
+
+class TestChartDefaultColumnFiltering:
+    """Test default column filtering behavior for charts."""
+
+    def test_minimal_default_columns_constant(self):
+        """Test that minimal default columns are properly defined."""
+        from superset.mcp_service.common.schema_discovery import CHART_DEFAULT_COLUMNS
+
+        # Should have exactly 4 minimal columns
+        assert len(CHART_DEFAULT_COLUMNS) == 4
+        assert set(CHART_DEFAULT_COLUMNS) == {"id", "slice_name", "viz_type", "uuid"}

Review Comment:
   **Suggestion:** The test is brittle because it asserts the exact length of
`CHART_DEFAULT_COLUMNS`; if a new benign default column is added, the test will
fail unnecessarily. Instead, assert that the required minimal columns are
present (subset) and that there are at least the expected number, which
preserves intent without being overly strict. [maintainability]
   
   **Severity Level:** Minor ⚠️
   ```suggestion
           # The minimal required columns must be present
           minimal_required = {"id", "slice_name", "viz_type", "uuid"}
           assert minimal_required.issubset(set(CHART_DEFAULT_COLUMNS))
        # Ensure there are at least the expected minimal columns (not brittle to additions)
           assert len(CHART_DEFAULT_COLUMNS) >= len(minimal_required)
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   This suggestion improves test robustness: the intent is to ensure the 
minimal set exists, not to lock the list length to exactly 4 forever.
   Using issubset and a >= check preserves intent while avoiding brittle 
failures when unrelated, benign defaults are added later.
   It's a valid maintenance change for tests and doesn't hide real regressions 
because the test still rejects presence of heavy columns.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** tests/unit_tests/mcp_service/chart/tool/test_list_charts.py
   **Line:** 185:187
   **Comment:**
        *Maintainability: The test is brittle because it asserts the exact length
of `CHART_DEFAULT_COLUMNS`; if a new benign default column is added, the test
will fail unnecessarily. Instead, assert that the required minimal columns are
present (subset) and that there are at least the expected number, which
preserves intent without being overly strict.
   
   Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and keep it concise.
   ```
   </details>



##########
superset/mcp_service/system/resources/schema_discovery.py:
##########
@@ -0,0 +1,194 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""
+MCP Resources for schema discovery.
+
+These resources provide static metadata about available columns, filters,
+and sorting options for chart, dataset, and dashboard models.
+LLM clients can use these resources to understand what parameters are valid
+without making API calls.
+"""
+
+import logging
+from typing import Any
+
+from superset.mcp_service.app import mcp
+from superset.mcp_service.auth import mcp_auth_hook
+
+logger = logging.getLogger(__name__)
+
+
+def _build_schema_resource(model_type: str) -> dict[str, Any]:
+    """Build schema resource data for a model type."""
+    from superset.mcp_service.common.schema_discovery import (
+        CHART_ALL_COLUMNS,
+        CHART_DEFAULT_COLUMNS,
+        CHART_SEARCH_COLUMNS,
+        CHART_SELECT_COLUMNS,
+        CHART_SORTABLE_COLUMNS,
+        DASHBOARD_ALL_COLUMNS,
+        DASHBOARD_DEFAULT_COLUMNS,
+        DASHBOARD_SEARCH_COLUMNS,
+        DASHBOARD_SELECT_COLUMNS,
+        DASHBOARD_SORTABLE_COLUMNS,
+        DATASET_ALL_COLUMNS,
+        DATASET_DEFAULT_COLUMNS,
+        DATASET_SEARCH_COLUMNS,
+        DATASET_SELECT_COLUMNS,
+        DATASET_SORTABLE_COLUMNS,
+    )
+
+    schemas = {
+        "chart": {
+            "model_type": "chart",
+            "select_columns": [
+                {"name": col.name, "description": col.description, "type": col.type}
+                for col in CHART_SELECT_COLUMNS
+            ],
+            "all_column_names": CHART_ALL_COLUMNS,
+            "default_columns": CHART_DEFAULT_COLUMNS,
+            "sortable_columns": CHART_SORTABLE_COLUMNS,
+            "search_columns": CHART_SEARCH_COLUMNS,
+            "default_sort": "changed_on",
+            "default_sort_direction": "desc",
+        },
+        "dataset": {
+            "model_type": "dataset",
+            "select_columns": [
+                {"name": col.name, "description": col.description, "type": col.type}
+                for col in DATASET_SELECT_COLUMNS
+            ],
+            "all_column_names": DATASET_ALL_COLUMNS,
+            "default_columns": DATASET_DEFAULT_COLUMNS,
+            "sortable_columns": DATASET_SORTABLE_COLUMNS,
+            "search_columns": DATASET_SEARCH_COLUMNS,
+            "default_sort": "changed_on",
+            "default_sort_direction": "desc",
+        },
+        "dashboard": {
+            "model_type": "dashboard",
+            "select_columns": [
+                {"name": col.name, "description": col.description, "type": col.type}

Review Comment:
   **Suggestion:** Same issue for dashboard `select_columns`: `col.type` or 
`col.description` may not be JSON-safe. Convert `type` to a primitive (string) 
and use `getattr` for safety so `json.dumps` does not raise a TypeError or an 
AttributeError. [type error]
   
   **Severity Level:** Minor ⚠️
   ```suggestion
                   {
                       "name": getattr(col, "name", None),
                       "description": getattr(col, "description", "") or "",
                       "type": getattr(col, "type", None)
                       if isinstance(getattr(col, "type", None), str)
                       else str(getattr(col, "type", None)),
                   }
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   Same rationale: converting potentially non-primitive `type` to a string and 
using getattr
   is a safe guard against serialization errors or missing attributes. It's 
practical and low-risk.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/mcp_service/system/resources/schema_discovery.py
   **Line:** 86:86
   **Comment:**
        *Type Error: Same issue for dashboard `select_columns`: `col.type` or 
`col.description` may not be JSON-safe. Convert `type` to a primitive (string) 
and use `getattr` for safety so `json.dumps` does not raise a TypeError or an 
AttributeError.
   
   Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and keep it concise.
   ```
   </details>
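
Since the same coercion applies to all three per-model comprehensions, one option is to factor it into a small helper; `_column_entry` is a hypothetical name, sketched under the assumption that `col` may carry non-primitive attributes:

```python
import json
from typing import Any


def _column_entry(col: Any) -> dict[str, Any]:
    """Coerce column metadata to JSON-safe primitives."""
    raw_type = getattr(col, "type", None)
    return {
        "name": getattr(col, "name", None),
        "description": getattr(col, "description", "") or "",
        "type": raw_type if isinstance(raw_type, (str, type(None))) else str(raw_type),
    }


class FakeColumn:
    name = "slice_name"
    description = None
    type = int  # a non-primitive "type" that would break json.dumps


json.dumps(_column_entry(FakeColumn()))  # works: type is coerced to "<class 'int'>"
```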



##########
superset/mcp_service/dataset/schemas.py:
##########
@@ -200,6 +179,14 @@ class DatasetList(BaseModel):
     has_next: bool
     columns_requested: List[str] | None = None
     columns_loaded: List[str] | None = None

Review Comment:
   **Suggestion:** Inconsistent list defaults: `columns_requested` and 
`columns_loaded` are typed as optional (`List[str] | None = None`) while the 
newly added `columns_available` and `sortable_columns` use 
`default_factory=list`. This creates inconsistent runtime types (None vs list) 
for fields that are semantically lists and may cause callers that iterate these 
attributes to raise TypeError when they encounter None. Make 
`columns_requested` and `columns_loaded` default to empty lists using 
`Field(default_factory=list, ...)` to keep types consistent and avoid needing 
None checks. [type error]
   
   **Severity Level:** Minor ⚠️
   ```suggestion
       columns_requested: List[str] = Field(
           default_factory=list,
           description="Requested columns for the response",
       )
       columns_loaded: List[str] = Field(
           default_factory=list,
           description="Columns that were actually loaded for each dataset",
       )
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   This is a valid and practical improvement: having some list fields default 
to None while newly added ones default to [] is an inconsistency that easily 
causes callers to hit TypeError when they iterate without None checks. 
Converting columns_requested/columns_loaded to Field(default_factory=list) 
makes the runtime shape consistent and avoids noisy guards throughout the code. 
Be aware this is a behavioural/API change — callers/tests expecting None will 
see [] instead.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/mcp_service/dataset/schemas.py
   **Line:** 180:181
   **Comment:**
        *Type Error: Inconsistent list defaults: `columns_requested` and 
`columns_loaded` are typed as optional (`List[str] | None = None`) while the 
newly added `columns_available` and `sortable_columns` use 
`default_factory=list`. This creates inconsistent runtime types (None vs list) 
for fields that are semantically lists and may cause callers that iterate these 
attributes to raise TypeError when they encounter None. Make 
`columns_requested` and `columns_loaded` default to empty lists using 
`Field(default_factory=list, ...)` to keep types consistent and avoid needing 
None checks.
   
   Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and keep it concise.
   ```
   </details>
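
A minimal sketch of the consistent shape, assuming Pydantic v2 and using a hypothetical subset of the `DatasetList` fields:

```python
from typing import List

from pydantic import BaseModel, Field


class DatasetListSketch(BaseModel):
    """Illustrative model: every list field defaults to [] rather than None."""

    columns_requested: List[str] = Field(
        default_factory=list, description="Requested columns for the response"
    )
    columns_loaded: List[str] = Field(
        default_factory=list, description="Columns actually loaded per dataset"
    )


result = DatasetListSketch()
# Safe to iterate without None checks; default_factory also gives each
# instance its own list, avoiding a shared mutable default.
assert list(result.columns_requested) == []
```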



##########
tests/unit_tests/mcp_service/system/tool/test_get_schema.py:
##########
@@ -0,0 +1,376 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""
+Tests for the get_schema unified schema discovery tool.
+"""
+
+from unittest.mock import patch
+
+import pytest
+from fastmcp import Client
+
+from superset.mcp_service.app import mcp
+from superset.mcp_service.common.schema_discovery import (
+    CHART_DEFAULT_COLUMNS,
+    CHART_SEARCH_COLUMNS,
+    CHART_SORTABLE_COLUMNS,
+    DASHBOARD_DEFAULT_COLUMNS,
+    DASHBOARD_SEARCH_COLUMNS,
+    DASHBOARD_SORTABLE_COLUMNS,
+    DATASET_DEFAULT_COLUMNS,
+    DATASET_SEARCH_COLUMNS,
+    DATASET_SORTABLE_COLUMNS,
+    GetSchemaRequest,
+    ModelSchemaInfo,
+)
+from superset.utils import json
+
+
[email protected]
+def mcp_server():
+    return mcp
+
+
[email protected](autouse=True)
+def mock_auth():
+    """Mock authentication for all tests."""
+    from unittest.mock import Mock
+
+    with patch("superset.mcp_service.auth.get_user_from_request") as mock_get_user:
+        mock_user = Mock()
+        mock_user.id = 1
+        mock_user.username = "admin"
+        mock_get_user.return_value = mock_user
+        yield mock_get_user
+
+
+class TestGetSchemaRequest:
+    """Test the GetSchemaRequest schema validation."""
+
+    def test_chart_model_type(self):
+        """Test creating request for chart model type."""
+        request = GetSchemaRequest(model_type="chart")
+        assert request.model_type == "chart"
+
+    def test_dataset_model_type(self):
+        """Test creating request for dataset model type."""
+        request = GetSchemaRequest(model_type="dataset")
+        assert request.model_type == "dataset"
+
+    def test_dashboard_model_type(self):
+        """Test creating request for dashboard model type."""
+        request = GetSchemaRequest(model_type="dashboard")
+        assert request.model_type == "dashboard"
+
+    def test_invalid_model_type(self):
+        """Test that invalid model types raise validation error."""
+        with pytest.raises(ValueError, match="Input should be"):
+            GetSchemaRequest(model_type="invalid")
+
+
+class TestModelSchemaInfo:
+    """Test the ModelSchemaInfo response structure."""
+
+    def test_chart_schema_info(self):
+        """Test chart schema info structure."""
+        from superset.mcp_service.common.schema_discovery import CHART_SELECT_COLUMNS
+
+        info = ModelSchemaInfo(
+            model_type="chart",
+            select_columns=CHART_SELECT_COLUMNS,
+            filter_columns={"slice_name": ["eq", "sw"]},
+            sortable_columns=CHART_SORTABLE_COLUMNS,
+            default_select=CHART_DEFAULT_COLUMNS,
+            default_sort="changed_on",
+            default_sort_direction="desc",
+            search_columns=CHART_SEARCH_COLUMNS,
+        )
+
+        assert info.model_type == "chart"
+        assert len(info.select_columns) > 0
+        assert len(info.default_select) == 4  # id, slice_name, viz_type, uuid
+        assert "slice_name" in info.filter_columns
+        assert info.default_sort == "changed_on"
+
+    def test_dataset_default_columns(self):
+        """Test dataset default columns are minimal set."""
+        assert len(DATASET_DEFAULT_COLUMNS) == 4
+        assert "id" in DATASET_DEFAULT_COLUMNS
+        assert "table_name" in DATASET_DEFAULT_COLUMNS
+        assert "schema" in DATASET_DEFAULT_COLUMNS
+        assert "uuid" in DATASET_DEFAULT_COLUMNS
+        # These should NOT be in defaults
+        assert "columns" not in DATASET_DEFAULT_COLUMNS
+        assert "metrics" not in DATASET_DEFAULT_COLUMNS
+
+    def test_dashboard_default_columns(self):
+        """Test dashboard default columns are minimal set."""
+        assert len(DASHBOARD_DEFAULT_COLUMNS) == 4
+        assert "id" in DASHBOARD_DEFAULT_COLUMNS
+        assert "dashboard_title" in DASHBOARD_DEFAULT_COLUMNS
+        assert "slug" in DASHBOARD_DEFAULT_COLUMNS
+        assert "uuid" in DASHBOARD_DEFAULT_COLUMNS
+        # These should NOT be in defaults
+        assert "published" not in DASHBOARD_DEFAULT_COLUMNS
+        assert "charts" not in DASHBOARD_DEFAULT_COLUMNS
+
+    def test_chart_default_columns(self):
+        """Test chart default columns are minimal set."""
+        assert len(CHART_DEFAULT_COLUMNS) == 4
+        assert "id" in CHART_DEFAULT_COLUMNS
+        assert "slice_name" in CHART_DEFAULT_COLUMNS
+        assert "viz_type" in CHART_DEFAULT_COLUMNS
+        assert "uuid" in CHART_DEFAULT_COLUMNS
+        # These should NOT be in defaults
+        assert "description" not in CHART_DEFAULT_COLUMNS
+        assert "form_data" not in CHART_DEFAULT_COLUMNS
+
+
+class TestGetSchemaToolViaClient:
+    """Test the get_schema tool via MCP client."""
+
+    @patch("superset.daos.chart.ChartDAO.get_filterable_columns_and_operators")
+    @pytest.mark.asyncio
+    async def test_get_schema_chart(self, mock_filters, mcp_server):
+        """Test get_schema for chart model type."""
+        mock_filters.return_value = {
+            "slice_name": ["eq", "sw", "ilike"],
+            "viz_type": ["eq", "in"],
+        }
+
+        async with Client(mcp_server) as client:
+            result = await client.call_tool(
+                "get_schema", {"request": {"model_type": "chart"}}
+            )
+
+            assert result.content is not None
+            data = json.loads(result.content[0].text)
+
+            # Check structure
+            assert "schema_info" in data
+            info = data["schema_info"]
+            assert info["model_type"] == "chart"
+            assert "select_columns" in info
+            assert "filter_columns" in info
+            assert "sortable_columns" in info
+            assert "default_select" in info
+            assert "search_columns" in info
+
+            # Check default columns are minimal
+            assert len(info["default_select"]) == 4
+            assert set(info["default_select"]) == {
+                "id",
+                "slice_name",
+                "viz_type",
+                "uuid",
+            }
+
+    @patch("superset.daos.dataset.DatasetDAO.get_filterable_columns_and_operators")
+    @pytest.mark.asyncio
+    async def test_get_schema_dataset(self, mock_filters, mcp_server):
+        """Test get_schema for dataset model type."""
+        mock_filters.return_value = {
+            "table_name": ["eq", "sw"],
+            "schema": ["eq"],
+        }
+
+        async with Client(mcp_server) as client:
+            result = await client.call_tool(
+                "get_schema", {"request": {"model_type": "dataset"}}
+            )
+
+            assert result.content is not None
+            data = json.loads(result.content[0].text)
+
+            info = data["schema_info"]
+            assert info["model_type"] == "dataset"
+
+            # Check default columns are minimal
+            assert len(info["default_select"]) == 4
+            assert set(info["default_select"]) == {
+                "id",
+                "table_name",
+                "schema",
+                "uuid",
+            }
+
+            # Check search columns
+            assert "table_name" in info["search_columns"]
+            assert "description" in info["search_columns"]
+
+    @patch("superset.daos.dashboard.DashboardDAO.get_filterable_columns_and_operators")
+    @pytest.mark.asyncio
+    async def test_get_schema_dashboard(self, mock_filters, mcp_server):
+        """Test get_schema for dashboard model type."""
+        mock_filters.return_value = {
+            "dashboard_title": ["eq", "ilike"],
+            "published": ["eq"],
+        }
+
+        async with Client(mcp_server) as client:
+            result = await client.call_tool(
+                "get_schema", {"request": {"model_type": "dashboard"}}
+            )
+
+            assert result.content is not None
+            data = json.loads(result.content[0].text)
+
+            info = data["schema_info"]
+            assert info["model_type"] == "dashboard"
+
+            # Check default columns are minimal
+            assert len(info["default_select"]) == 4
+            assert set(info["default_select"]) == {
+                "id",
+                "dashboard_title",
+                "slug",
+                "uuid",
+            }
+
+            # Check sortable columns include expected values
+            assert "dashboard_title" in info["sortable_columns"]
+            assert "changed_on" in info["sortable_columns"]
+
+    @patch("superset.daos.chart.ChartDAO.get_filterable_columns_and_operators")
+    @pytest.mark.asyncio
+    async def test_get_schema_with_json_string_request(self, mock_filters, mcp_server):
+        """Test get_schema accepts JSON string request (Claude Code compatibility)."""
+        mock_filters.return_value = {"slice_name": ["eq"]}
+
+        async with Client(mcp_server) as client:
+            # Send request as JSON string (Claude Code bug workaround)
+            result = await client.call_tool(
+                "get_schema", {"request": '{"model_type": "chart"}'}
+            )
+
+            assert result.content is not None
+            data = json.loads(result.content[0].text)
+            assert data["schema_info"]["model_type"] == "chart"
+
+    @patch("superset.daos.chart.ChartDAO.get_filterable_columns_and_operators")
+    @pytest.mark.asyncio
+    async def test_get_schema_select_columns_have_metadata(
+        self, mock_filters, mcp_server
+    ):
+        """Test that select_columns include metadata (description, type)."""
+        mock_filters.return_value = {}
+
+        async with Client(mcp_server) as client:
+            result = await client.call_tool(
+                "get_schema", {"request": {"model_type": "chart"}}
+            )
+
+            data = json.loads(result.content[0].text)

Review Comment:
   **Suggestion:** `result.content` is used without verifying it's not None; 
add an explicit assertion before accessing `result.content[0].text` to avoid a 
TypeError/IndexError if the tool returned no content. [null pointer]
   
   **Severity Level:** Minor ⚠️
   ```suggestion
               assert result.content is not None
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   Adding an explicit assertion that result.content is not None (or that it's 
truthy and has at least one element) is a defensive, low-cost change that 
prevents a TypeError/IndexError in flaky conditions and makes tests' 
assumptions explicit. Several other tests in the file already do this 
assertion, so consistency is good.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** tests/unit_tests/mcp_service/system/tool/test_get_schema.py
   **Line:** 278:278
   **Comment:**
        *Null Pointer: `result.content` is used without verifying it's not 
None; add an explicit assertion before accessing `result.content[0].text` to 
avoid a TypeError/IndexError if the tool returned no content.
   
   Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and keep it concise.
   ```
   </details>



##########
superset/mcp_service/dataset/tool/list_datasets.py:
##########
@@ -144,20 +144,16 @@ def _serialize_dataset(
             )
         )
 
-        # Apply field filtering via serialization context if select_columns specified
-        # This triggers DatasetInfo._filter_fields_by_context for each dataset
-        if request.select_columns:
-            await ctx.debug(
-                "Applying field filtering via serialization context: 
select_columns=%s"
-                % (request.select_columns,)
-            )
-            # Return dict with context - FastMCP will serialize it
-            return result.model_dump(
-                mode="json", context={"select_columns": request.select_columns}
-            )
-
-        # No filtering - return full result as dict
-        return result.model_dump(mode="json")
+        # Apply field filtering via serialization context
+        # Use requested columns or defaults - always filter to reduce token 
usage
+        columns_to_filter = request.select_columns or DEFAULT_DATASET_COLUMNS
+        await ctx.debug(
+            "Applying field filtering via serialization context: 
select_columns=%s"
+            % (columns_to_filter,)
+        )
+        return result.model_dump(
+            mode="json", context={"select_columns": columns_to_filter}
+        )

Review Comment:
   **Suggestion:** Returning the result of `model_dump(...)` returns a plain 
dict/JSON-serializable structure instead of the expected Pydantic `DatasetList` 
instance; callers and the function signature expect a `DatasetList` object — 
re-validate the dumped data back into `DatasetList` before returning so the 
return type and downstream behavior remain correct. [type error]
   
   **Severity Level:** Minor ⚠️
   ```suggestion
           filtered = result.model_dump(mode="json", context={"select_columns": columns_to_filter})
           return DatasetList.model_validate(filtered)
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   The function is annotated to return DatasetList and `result` is a Pydantic 
model instance.
   Returning the dict produced by `model_dump` is a type mismatch and can break 
callers that
   expect a DatasetList (and may rely on Pydantic model behavior). 
Re-validating back into
   DatasetList (e.g. DatasetList.model_validate(...)) fixes the mismatch. The 
suggested
   improvement is correct and straightforward.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/mcp_service/dataset/tool/list_datasets.py
   **Line:** 151:156
   **Comment:**
        *Type Error: Returning the result of `model_dump(...)` returns a plain 
dict/JSON-serializable structure instead of the expected Pydantic `DatasetList` 
instance; callers and the function signature expect a `DatasetList` object — 
re-validate the dumped data back into `DatasetList` before returning so the 
return type and downstream behavior remain correct.
   
   Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and keep it concise.
   ```
   </details>
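
The round trip the suggestion describes, shown on a toy model. One caveat worth verifying: if the serialization context filters out fields that are required on the list's item models, `model_validate` will reject the dumped dict:

```python
from pydantic import BaseModel


class ItemList(BaseModel):
    count: int
    items: list[str]


result = ItemList(count=2, items=["a", "b"])

dumped = result.model_dump(mode="json")  # plain dict; the model type is lost
assert isinstance(dumped, dict)

rebuilt = ItemList.model_validate(dumped)  # back to a typed instance
assert isinstance(rebuilt, ItemList) and rebuilt == result
```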



##########
superset/mcp_service/mcp_core.py:
##########
@@ -106,6 +106,8 @@ def __init__(
         search_columns: List[str],
         list_field_name: str,
         output_list_schema: Type[L],
+        all_columns: List[str] | None = None,
+        sortable_columns: List[str] | None = None,
         logger: logging.Logger | None = None,

Review Comment:
   **Suggestion:** Breaking change in constructor parameter order: two new 
parameters (`all_columns`, `sortable_columns`) were inserted before `logger`, 
which will shift positional arguments for existing callers that pass `logger` 
positionally and silently assign the logger instance to `all_columns`. Restore 
backward compatibility by keeping `logger` at the same positional index (move 
`logger` immediately after `output_list_schema`) so existing positional calls 
still bind correctly. [possible bug]
   
   **Severity Level:** Critical 🚨
   ```suggestion
           logger: logging.Logger | None = None,
           all_columns: List[str] | None = None,
           sortable_columns: List[str] | None = None,
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   This is a valid backward-compatibility concern: adding two optional 
parameters before `logger`
   will rebind positional arguments for existing callers that pass `logger` 
positionally.
   Moving `logger` back (or making the new parameters keyword-only) fixes a 
real, silent break.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/mcp_service/mcp_core.py
   **Line:** 109:111
   **Comment:**
        *Possible Bug: Breaking change in constructor parameter order: two new 
parameters (`all_columns`, `sortable_columns`) were inserted before `logger`, 
which will shift positional arguments for existing callers that pass `logger` 
positionally and silently assign the logger instance to `all_columns`. Restore 
backward compatibility by keeping `logger` at the same positional index (move 
`logger` immediately after `output_list_schema`) so existing positional calls 
still bind correctly.
   
   Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and keep it concise.
   ```
   </details>
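
The "Why it matters" note mentions an alternative to reordering: making the new parameters keyword-only. A sketch under assumed signatures (the real constructor takes more parameters):

```python
import logging
from typing import List


class ModelListCoreSketch:
    """Illustrative constructor, not the real ModelListCore."""

    def __init__(
        self,
        list_field_name: str,
        logger: logging.Logger | None = None,
        *,  # everything after the bare * must be passed by keyword
        all_columns: List[str] | None = None,
        sortable_columns: List[str] | None = None,
    ) -> None:
        self.all_columns = all_columns or []
        self.sortable_columns = sortable_columns or []


# Existing positional callers keep binding logger correctly:
core = ModelListCoreSketch("charts", logging.getLogger(__name__), all_columns=["id"])
```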



##########
superset/mcp_service/mcp_core.py:
##########
@@ -473,34 +479,96 @@ def _generate_instance_info(self) -> BaseModel:
             raise
 
 
-class ModelGetAvailableFiltersCore(BaseCore, Generic[S]):
+class ModelGetSchemaCore(BaseCore, Generic[S]):
     """
-    Generic tool for retrieving available filterable columns and operators for a
-    model. Used for get_dataset_available_filters, get_chart_available_filters,
-    get_dashboard_available_filters, etc.
+    Generic tool for retrieving comprehensive schema metadata for a model type.
+
+    Provides unified schema discovery for list tools:
+    - select_columns: All columns available for selection
+    - filter_columns: Filterable columns with their operators
+    - sortable_columns: Columns valid for order_column
+    - default_columns: Columns returned when select_columns not specified
+    - search_columns: Columns searched by the search parameter
+    - default_sort: Default column for sorting
+    - default_sort_direction: Default sort direction ("asc" or "desc")
+
+    Replaces the individual get_*_available_filters tools with a unified approach.
     """
 
     def __init__(
         self,
+        model_type: Literal["chart", "dataset", "dashboard"],
         dao_class: Type[BaseDAO[Any]],
         output_schema: Type[S],
+        select_columns: List[Any],
+        sortable_columns: List[str],
+        default_columns: List[str],
+        search_columns: List[str],
+        default_sort: str = "changed_on",
+        default_sort_direction: Literal["asc", "desc"] = "desc",
         logger: logging.Logger | None = None,
     ) -> None:
+        """
+        Initialize the schema discovery core.
+
+        Args:
+            model_type: The type of model (chart, dataset, dashboard)
+            dao_class: The DAO class to query for filter columns
+            output_schema: Pydantic schema for the response (e.g., ModelSchemaInfo)
+            select_columns: Column metadata (List[ColumnMetadata] or similar)
+            sortable_columns: Column names that support sorting
+            default_columns: Column names returned by default
+            search_columns: Column names used for text search
+            default_sort: Default sort column
+            default_sort_direction: Default sort direction
+            logger: Optional logger instance
+        """
         super().__init__(logger)
+        self.model_type = model_type
         self.dao_class = dao_class
         self.output_schema = output_schema
+        self.select_columns = select_columns
+        self.sortable_columns = sortable_columns
+        self.default_columns = default_columns
+        self.search_columns = search_columns
+        self.default_sort = default_sort
+        self.default_sort_direction = default_sort_direction
 
-    def run_tool(self) -> S:
+    def _get_filter_columns(self) -> Dict[str, List[str]]:
+        """Get filterable columns and operators from the DAO."""
         try:
             filterable = self.dao_class.get_filterable_columns_and_operators()
-            # Ensure column_operators is a plain dict, not a custom type
-            column_operators = dict(filterable)
-            response = self.output_schema(column_operators=column_operators)
+            return dict(filterable)

Review Comment:
   **Suggestion:** Fragile handling of DAO filter metadata: 
`get_filterable_columns_and_operators()` result is converted with `dict(...)` 
without guarding for None or unexpected types, which can raise TypeError. 
Validate the returned value first, handle None, and accept both dict and 
iterable-of-pairs formats; log and return empty dict on unexpected formats to 
avoid runtime errors. [logic error]
   
   **Severity Level:** Minor ⚠️
   ```suggestion
               if not filterable:
                   return {}
               if isinstance(filterable, dict):
                   return filterable.copy()
               try:
                   return dict(filterable)
               except TypeError:
                self._log_warning(
                    f"Unexpected format for filter columns returned by {self.dao_class}: {filterable}"
                   )
                   return {}
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   The DAO may return None, a dict, or an iterable-of-pairs. Calling dict(...) 
unconditionally can raise
   TypeError or produce confusing results. Validating and handling multiple 
shapes (dict copy, dict() from pairs,
   or empty on None) is a correct robustness fix.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/mcp_service/mcp_core.py
   **Line:** 541:541
   **Comment:**
        *Logic Error: Fragile handling of DAO filter metadata: 
`get_filterable_columns_and_operators()` result is converted with `dict(...)` 
without guarding for None or unexpected types, which can raise TypeError. 
Validate the returned value first, handle None, and accept both dict and 
iterable-of-pairs formats; log and return empty dict on unexpected formats to 
avoid runtime errors.
   
   Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and keep it concise.
   ```
   </details>
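
The same guard as a standalone function, with the shapes it is meant to accept; `_coerce_filterable` is a hypothetical name:

```python
import logging
from typing import Any, Dict, List

logger = logging.getLogger(__name__)


def _coerce_filterable(filterable: Any) -> Dict[str, List[str]]:
    """Normalize DAO filter metadata that may be None, a dict, or pairs."""
    if not filterable:
        return {}
    if isinstance(filterable, dict):
        return dict(filterable)
    try:
        return dict(filterable)  # accepts an iterable of (key, value) pairs
    except (TypeError, ValueError):
        logger.warning("Unexpected filter column format: %r", filterable)
        return {}


assert _coerce_filterable(None) == {}
assert _coerce_filterable({"slice_name": ["eq"]}) == {"slice_name": ["eq"]}
assert _coerce_filterable([("viz_type", ["in"])]) == {"viz_type": ["in"]}
```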



##########
superset/mcp_service/system/resources/__init__.py:
##########
@@ -18,4 +18,7 @@
 """System resources for Superset MCP service"""
 
 # Import to register resources when module is imported
-from . import instance_metadata  # noqa: F401
+from . import (
+    instance_metadata,  # noqa: F401
+    schema_discovery,  # noqa: F401
+)

Review Comment:
   **Suggestion:** Grouped parenthesized imports with inline comments reduce
clarity and make per-import linting and future edits error-prone; import each
resource on its own line (with a per-line noqa) to improve readability and avoid
accidental removal of the noqa markers. [maintainability]
   
   **Severity Level:** Minor ⚠️
   ```suggestion
   # Import to register resources when module is imported
   from . import instance_metadata  # noqa: F401
   from . import schema_discovery  # noqa: F401
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   Splitting the grouped import into explicit one-line imports is a reasonable, 
low-risk maintainability improvement: it's clearer, makes per-import linting 
straightforward, and avoids accidental removal of noqa comments. It does not 
change runtime behavior.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/mcp_service/system/resources/__init__.py
   **Line:** 21:24
   **Comment:**
        *Maintainability: Grouped parenthesized imports with inline comments
reduce clarity and make per-import linting and future edits error-prone; import
each resource on its own line (with a per-line noqa) to improve readability and
avoid accidental removal of the noqa markers.
   
   Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and keep it concise.
   ```
   </details>



##########
tests/unit_tests/mcp_service/dashboard/tool/test_dashboard_tools.py:
##########
@@ -513,6 +514,96 @@ async def test_list_dashboards_custom_uuid_slug_columns(mock_list, mcp_server):
         assert "slug" in result.data["columns_loaded"]
 
 
+class TestDashboardDefaultColumnFiltering:
+    """Test default column filtering behavior for dashboards."""
+
+    def test_minimal_default_columns_constant(self):
+        """Test that minimal default columns are properly defined."""
+        from superset.mcp_service.common.schema_discovery import (
+            DASHBOARD_DEFAULT_COLUMNS,
+        )
+
+        # Should have exactly 4 minimal columns
+        assert len(DASHBOARD_DEFAULT_COLUMNS) == 4
+        assert set(DASHBOARD_DEFAULT_COLUMNS) == {
+            "id",
+            "dashboard_title",
+            "slug",
+            "uuid",
+        }
+
+        # Heavy columns should NOT be in defaults
+        assert "charts" not in DASHBOARD_DEFAULT_COLUMNS
+        assert "published" not in DASHBOARD_DEFAULT_COLUMNS
+        assert "json_metadata" not in DASHBOARD_DEFAULT_COLUMNS
+        assert "position_json" not in DASHBOARD_DEFAULT_COLUMNS
+
+    def test_empty_select_columns_default(self):
+        """Test that select_columns defaults to empty list which triggers
+        minimal defaults in tool."""
+        request = ListDashboardsRequest()
+        assert request.select_columns == []
+
+    def test_explicit_select_columns(self):
+        """Test that explicit select_columns can include non-default columns."""
+        request = ListDashboardsRequest(
+            select_columns=["id", "dashboard_title", "published", "charts"]
+        )
+        assert "published" in request.select_columns
+        assert "charts" in request.select_columns
+        # These are explicitly requested, not defaults
+        assert len(request.select_columns) == 4
+
+    @patch("superset.daos.dashboard.DashboardDAO.list")
+    @pytest.mark.asyncio
+    async def test_default_columns_in_response(self, mock_list, mcp_server):
+        """Test that minimal default columns appear in response metadata."""
+        dashboard = Mock()
+        dashboard.id = 1
+        dashboard.dashboard_title = "Test Dashboard"
+        dashboard.slug = "test-dashboard"
+        dashboard.uuid = "test-uuid-123"
+        dashboard.url = "/dashboard/1"
+        dashboard.published = True
+        dashboard.changed_by_name = "admin"
+        dashboard.changed_on = None
+        dashboard.changed_on_humanized = None
+        dashboard.created_by_name = "admin"
+        dashboard.created_on = None
+        dashboard.created_on_humanized = None
+        dashboard.tags = []
+        dashboard.owners = []
+        dashboard.slices = []
+        dashboard.description = None
+        dashboard.css = None
+        dashboard.certified_by = None
+        dashboard.certification_details = None
+        dashboard.json_metadata = None
+        dashboard.position_json = None
+        dashboard.is_managed_externally = False
+        dashboard.external_url = None
+        dashboard.thumbnail_url = None
+        dashboard.roles = []
+        dashboard.charts = []
+        mock_list.return_value = ([dashboard], 1)

Review Comment:
   **Suggestion:** The test `test_default_columns_in_response` creates a 
dashboard mock but does not set `dashboard._mapping`. Other list-related tests 
rely on `_mapping` to determine which columns were loaded/requested; omitting 
it makes this test brittle or likely to fail because the tool builds 
`columns_loaded` from `_mapping`. Add a `_mapping` dict that includes the 
minimal/default columns (id, dashboard_title, slug, uuid) so the tool can 
compute `columns_loaded` consistently. [possible bug]
   
   **Severity Level:** Critical 🚨
   ```suggestion
        # Ensure a mapping is present so list tools can compute columns_loaded/columns_requested
           dashboard._mapping = {
               "id": dashboard.id,
               "dashboard_title": dashboard.dashboard_title,
               "slug": dashboard.slug,
               "uuid": dashboard.uuid,
               "url": dashboard.url,
               "published": dashboard.published,
               "changed_by_name": dashboard.changed_by_name,
               "changed_on": dashboard.changed_on,
               "changed_on_humanized": dashboard.changed_on_humanized,
               "created_by_name": dashboard.created_by_name,
               "created_on": dashboard.created_on,
               "created_on_humanized": dashboard.created_on_humanized,
               "tags": dashboard.tags,
               "owners": dashboard.owners,
               "charts": [],
           }
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   This test, unlike the other list tests in the file, does not set 
dashboard._mapping.
   The codebase and other tests rely on that attribute to determine which 
columns were
   loaded/requested (result.data["columns_loaded"] is typically derived from 
_mapping).
   Omitting _mapping makes the test brittle and likely to produce false 
negatives.
   Adding the small _mapping dict is a targeted, correct fix that aligns this 
test with
   existing expectations and other tests in the file.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** tests/unit_tests/mcp_service/dashboard/tool/test_dashboard_tools.py
   **Line:** 588:588
   **Comment:**
        *Possible Bug: The test `test_default_columns_in_response` creates a 
dashboard mock but does not set `dashboard._mapping`. Other list-related tests 
rely on `_mapping` to determine which columns were loaded/requested; omitting 
it makes this test brittle or likely to fail because the tool builds 
`columns_loaded` from `_mapping`. Add a `_mapping` dict that includes the 
minimal/default columns (id, dashboard_title, slug, uuid) so the tool can 
compute `columns_loaded` consistently.
   
   Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and keep it concise.
   ```
   </details>



##########
superset/mcp_service/system/tool/get_schema.py:
##########
@@ -0,0 +1,142 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""
+Unified schema discovery tool for MCP service.
+
+This tool consolidates schema discovery for all model types (chart, dataset,
+dashboard) into a single endpoint, reducing token usage and API calls.
+"""
+
+import logging
+from typing import Literal

Review Comment:
   **Suggestion:** Missing top-level asyncio import: if you run synchronous 
core work on a thread (using `asyncio.to_thread`) it's cleaner and slightly 
faster to import `asyncio` at module scope instead of importing inside the 
function; add `import asyncio` to the import block. [possible bug]
   
   **Severity Level:** Critical 🚨
   ```suggestion
   import asyncio
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   If you follow the first suggestion and use asyncio.to_thread, importing 
asyncio at module scope is cleaner than importing it inside the function. It's 
a small, safe, and idiomatic change.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/mcp_service/system/tool/get_schema.py
   **Line:** 26:26
   **Comment:**
        *Possible Bug: Missing top-level asyncio import: if you run synchronous 
core work on a thread (using `asyncio.to_thread`) it's cleaner and slightly 
faster to import `asyncio` at module scope instead of importing inside the 
function; add `import asyncio` to the import block.
   
   Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and keep it concise.
   ```
   </details>
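
For reference, the pattern this comment assumes, with a placeholder standing in for the synchronous core work:

```python
import asyncio


def _build_schema_sync(model_type: str) -> dict:
    # Placeholder for the blocking schema-building work.
    return {"model_type": model_type}


async def get_schema(model_type: str) -> dict:
    # Offloading to a worker thread keeps the event loop responsive.
    return await asyncio.to_thread(_build_schema_sync, model_type)


print(asyncio.run(get_schema("chart")))  # {'model_type': 'chart'}
```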



##########
superset/mcp_service/system/resources/schema_discovery.py:
##########
@@ -0,0 +1,194 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""
+MCP Resources for schema discovery.
+
+These resources provide static metadata about available columns, filters,
+and sorting options for chart, dataset, and dashboard models.
+LLM clients can use these resources to understand what parameters are valid
+without making API calls.
+"""
+
+import logging
+from typing import Any
+
+from superset.mcp_service.app import mcp
+from superset.mcp_service.auth import mcp_auth_hook
+
+logger = logging.getLogger(__name__)
+
+
+def _build_schema_resource(model_type: str) -> dict[str, Any]:
+    """Build schema resource data for a model type."""
+    from superset.mcp_service.common.schema_discovery import (
+        CHART_ALL_COLUMNS,
+        CHART_DEFAULT_COLUMNS,
+        CHART_SEARCH_COLUMNS,
+        CHART_SELECT_COLUMNS,
+        CHART_SORTABLE_COLUMNS,
+        DASHBOARD_ALL_COLUMNS,
+        DASHBOARD_DEFAULT_COLUMNS,
+        DASHBOARD_SEARCH_COLUMNS,
+        DASHBOARD_SELECT_COLUMNS,
+        DASHBOARD_SORTABLE_COLUMNS,
+        DATASET_ALL_COLUMNS,
+        DATASET_DEFAULT_COLUMNS,
+        DATASET_SEARCH_COLUMNS,
+        DATASET_SELECT_COLUMNS,
+        DATASET_SORTABLE_COLUMNS,
+    )
+
+    schemas = {
+        "chart": {
+            "model_type": "chart",
+            "select_columns": [
+                {"name": col.name, "description": col.description, "type": 
col.type}
+                for col in CHART_SELECT_COLUMNS
+            ],
+            "all_column_names": CHART_ALL_COLUMNS,
+            "default_columns": CHART_DEFAULT_COLUMNS,
+            "sortable_columns": CHART_SORTABLE_COLUMNS,
+            "search_columns": CHART_SEARCH_COLUMNS,
+            "default_sort": "changed_on",
+            "default_sort_direction": "desc",
+        },
+        "dataset": {
+            "model_type": "dataset",
+            "select_columns": [
+                {"name": col.name, "description": col.description, "type": 
col.type}

Review Comment:
   **Suggestion:** Same JSON-serialization risk for dataset `select_columns`: 
`col.description` or `col.type` may be non-primitive and break `json.dumps`. 
Use `getattr` and coerce `type` to a string when necessary to ensure JSON 
serializability and avoid AttributeError. [type error]
   
   **Severity Level:** Minor ⚠️
   ```suggestion
                   {
                       "name": getattr(col, "name", None),
                       "description": getattr(col, "description", "") or "",
                       "type": getattr(col, "type", None)
                       if isinstance(getattr(col, "type", None), (str, type(None)))
                       else str(getattr(col, "type", None)),
                   }
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   Same as above: coercing to primitives avoids json.dumps TypeError and 
prevents AttributeError
   when attributes are missing. This is a sensible defensive change for 
serialization safety.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/mcp_service/system/resources/schema_discovery.py
   **Line:** 73:73
   **Comment:**
        *Type Error: Same JSON-serialization risk for dataset `select_columns`: 
`col.description` or `col.type` may be non-primitive and break `json.dumps`. 
Use `getattr` and coerce `type` to a string when necessary to ensure JSON 
serializability and avoid AttributeError.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>
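
Since the same coercion is needed in both the chart and dataset branches, it could be factored into a shared helper. A standalone sketch, assuming nothing about the real column descriptor type (`FakeCol` below just simulates one with a non-primitive `type`):

```python
import json
from enum import Enum
from typing import Any


def _col_to_json_safe(col: Any) -> dict[str, Any]:
    """Coerce a column descriptor to JSON-safe primitives."""
    col_type = getattr(col, "type", None)
    return {
        "name": getattr(col, "name", None),
        "description": getattr(col, "description", "") or "",
        # Keep strings (and None) as-is; stringify enums, SQLAlchemy
        # types, and anything else json.dumps cannot handle.
        "type": col_type
        if col_type is None or isinstance(col_type, str)
        else str(col_type),
    }


class ColType(Enum):
    TEXT = "text"


class FakeCol:
    name = "table_name"
    description = None
    type = ColType.TEXT  # deliberately non-primitive


print(json.dumps(_col_to_json_safe(FakeCol())))
# {"name": "table_name", "description": "", "type": "ColType.TEXT"}
```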



##########
superset/mcp_service/system/tool/get_schema.py:
##########
@@ -0,0 +1,142 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""
+Unified schema discovery tool for MCP service.
+
+This tool consolidates schema discovery for all model types (chart, dataset,
+dashboard) into a single endpoint, reducing token usage and API calls.
+"""
+
+import logging
+from typing import Literal
+
+from fastmcp import Context
+from superset_core.mcp import tool
+
+from superset.daos.chart import ChartDAO
+from superset.daos.dashboard import DashboardDAO
+from superset.daos.dataset import DatasetDAO
+from superset.mcp_service.common.schema_discovery import (
+    CHART_DEFAULT_COLUMNS,
+    CHART_SEARCH_COLUMNS,
+    CHART_SELECT_COLUMNS,
+    CHART_SORTABLE_COLUMNS,
+    DASHBOARD_DEFAULT_COLUMNS,
+    DASHBOARD_SEARCH_COLUMNS,
+    DASHBOARD_SELECT_COLUMNS,
+    DASHBOARD_SORTABLE_COLUMNS,
+    DATASET_DEFAULT_COLUMNS,
+    DATASET_SEARCH_COLUMNS,
+    DATASET_SELECT_COLUMNS,
+    DATASET_SORTABLE_COLUMNS,
+    GetSchemaRequest,
+    GetSchemaResponse,
+    ModelSchemaInfo,
+)
+from superset.mcp_service.mcp_core import ModelGetSchemaCore
+from superset.mcp_service.utils.schema_utils import parse_request
+
+logger = logging.getLogger(__name__)
+
+# Create core instances for each model type
+_chart_schema_core = ModelGetSchemaCore(
+    model_type="chart",
+    dao_class=ChartDAO,
+    output_schema=ModelSchemaInfo,
+    select_columns=CHART_SELECT_COLUMNS,
+    sortable_columns=CHART_SORTABLE_COLUMNS,
+    default_columns=CHART_DEFAULT_COLUMNS,
+    search_columns=CHART_SEARCH_COLUMNS,
+    default_sort="changed_on",
+    default_sort_direction="desc",
+    logger=logger,
+)
+
+_dataset_schema_core = ModelGetSchemaCore(
+    model_type="dataset",
+    dao_class=DatasetDAO,
+    output_schema=ModelSchemaInfo,
+    select_columns=DATASET_SELECT_COLUMNS,
+    sortable_columns=DATASET_SORTABLE_COLUMNS,
+    default_columns=DATASET_DEFAULT_COLUMNS,
+    search_columns=DATASET_SEARCH_COLUMNS,
+    default_sort="changed_on",
+    default_sort_direction="desc",
+    logger=logger,
+)
+
+_dashboard_schema_core = ModelGetSchemaCore(
+    model_type="dashboard",
+    dao_class=DashboardDAO,
+    output_schema=ModelSchemaInfo,
+    select_columns=DASHBOARD_SELECT_COLUMNS,
+    sortable_columns=DASHBOARD_SORTABLE_COLUMNS,
+    default_columns=DASHBOARD_DEFAULT_COLUMNS,
+    search_columns=DASHBOARD_SEARCH_COLUMNS,
+    default_sort="changed_on",
+    default_sort_direction="desc",
+    logger=logger,
+)
+
+# Map model types to their core instances
+_SCHEMA_CORES: dict[
+    Literal["chart", "dataset", "dashboard"], 
ModelGetSchemaCore[ModelSchemaInfo]
+] = {
+    "chart": _chart_schema_core,
+    "dataset": _dataset_schema_core,
+    "dashboard": _dashboard_schema_core,
+}
+
+
+@tool(tags=["discovery"])
+@parse_request(GetSchemaRequest)
+async def get_schema(request: GetSchemaRequest, ctx: Context) -> GetSchemaResponse:
+    """
+    Get comprehensive schema metadata for a model type.
+
+    Returns all information needed to construct valid queries:
+    - select_columns: All columns available for selection
+    - filter_columns: Filterable columns with their operators
+    - sortable_columns: Columns valid for order_column
+    - default_select: Columns returned when select_columns not specified
+    - search_columns: Columns searched by the search parameter
+
+    This unified tool consolidates discovery, reducing API calls and token usage.
+    Use this instead of calling get_chart_available_filters,
+    get_dataset_available_filters, and get_dashboard_available_filters separately.
+
+    Args:
+        model_type: One of "chart", "dataset", or "dashboard"
+
+    Returns:
+        Comprehensive schema information for the requested model type
+    """
+    await ctx.info(f"Getting schema for model_type={request.model_type}")
+
+    # Get the appropriate core instance and run it
+    core = _SCHEMA_CORES[request.model_type]

Review Comment:
   **Suggestion:** Key lookup can raise KeyError: 
`_SCHEMA_CORES[request.model_type]` will raise a KeyError if 
`request.model_type` is not present. Lookups should use `.get()` and return a 
clear error (or raise a controlled exception) with an informative message so 
the tool fails gracefully. [possible bug]
   
   **Severity Level:** Critical 🚨
   ```suggestion
       core = _SCHEMA_CORES.get(request.model_type)
       if core is None:
           await ctx.error(f"Unsupported model_type: {request.model_type}")
           raise ValueError(f"Unsupported model_type: {request.model_type}")
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   Defensive checking here is reasonable: a direct dict lookup will raise 
KeyError for unexpected model_type values. Even if GetSchemaRequest likely 
constrains allowed literals, validating and returning a clear error (and 
logging it) produces a nicer failure mode and avoids an unhelpful stack trace. 
The proposed .get() + explicit ValueError (and ctx.error) is sensible.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/mcp_service/system/tool/get_schema.py
   **Line:** 132:132
   **Comment:**
        *Possible Bug: Key lookup can raise KeyError: 
`_SCHEMA_CORES[request.model_type]` will raise a KeyError if 
`request.model_type` is not present. Lookups should use `.get()` and return a 
clear error (or raise a controlled exception) with an informative message so 
the tool fails gracefully.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>
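
A self-contained sketch of the guarded lookup; the registry values are placeholders, and the error message follows the suggestion rather than Superset's actual wording:

```python
_SCHEMA_CORES: dict[str, object] = {
    "chart": object(),      # placeholder for _chart_schema_core
    "dataset": object(),    # placeholder for _dataset_schema_core
    "dashboard": object(),  # placeholder for _dashboard_schema_core
}


def resolve_core(model_type: str) -> object:
    # .get() converts an unknown key into a controlled, informative
    # error instead of a bare KeyError with no context.
    core = _SCHEMA_CORES.get(model_type)
    if core is None:
        raise ValueError(
            f"Unsupported model_type: {model_type!r}; "
            f"expected one of {sorted(_SCHEMA_CORES)}"
        )
    return core


resolve_core("chart")  # ok
try:
    resolve_core("database")
except ValueError as exc:
    print(exc)  # Unsupported model_type: 'database'; expected one of [...]
```

Since `GetSchemaRequest` already constrains `model_type` to a `Literal`, the guard mainly protects programmatic callers that bypass request validation.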



##########
superset/mcp_service/system/resources/schema_discovery.py:
##########
@@ -0,0 +1,194 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""
+MCP Resources for schema discovery.
+
+These resources provide static metadata about available columns, filters,
+and sorting options for chart, dataset, and dashboard models.
+LLM clients can use these resources to understand what parameters are valid
+without making API calls.
+"""
+
+import logging
+from typing import Any
+
+from superset.mcp_service.app import mcp
+from superset.mcp_service.auth import mcp_auth_hook
+
+logger = logging.getLogger(__name__)
+
+
+def _build_schema_resource(model_type: str) -> dict[str, Any]:
+    """Build schema resource data for a model type."""
+    from superset.mcp_service.common.schema_discovery import (
+        CHART_ALL_COLUMNS,
+        CHART_DEFAULT_COLUMNS,
+        CHART_SEARCH_COLUMNS,
+        CHART_SELECT_COLUMNS,
+        CHART_SORTABLE_COLUMNS,
+        DASHBOARD_ALL_COLUMNS,
+        DASHBOARD_DEFAULT_COLUMNS,
+        DASHBOARD_SEARCH_COLUMNS,
+        DASHBOARD_SELECT_COLUMNS,
+        DASHBOARD_SORTABLE_COLUMNS,
+        DATASET_ALL_COLUMNS,
+        DATASET_DEFAULT_COLUMNS,
+        DATASET_SEARCH_COLUMNS,
+        DATASET_SELECT_COLUMNS,
+        DATASET_SORTABLE_COLUMNS,
+    )
+
+    schemas = {
+        "chart": {
+            "model_type": "chart",
+            "select_columns": [
+                {"name": col.name, "description": col.description, "type": 
col.type}

Review Comment:
   **Suggestion:** JSON serialization will fail or produce unexpected output if 
`col.description` or `col.type` are non-primitive objects (e.g. enums or 
SQLAlchemy types). Coerce these fields to JSON-safe primitives (strings or 
None) and use `getattr` to avoid AttributeError if an element lacks an 
attribute. [type error]
   
   **Severity Level:** Minor ⚠️
   ```suggestion
                   {
                       "name": getattr(col, "name", None),
                       "description": getattr(col, "description", "") or "",
                       "type": getattr(col, "type", None)
                       if isinstance(getattr(col, "type", None), (str, type(None)))
                       else str(getattr(col, "type", None)),
                   }
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   This is a reasonable defensive improvement: if any `col.type` or 
`col.description`
   is a non-primitive (SQLAlchemy TypeEngine, Enum, or other object) json.dumps 
would
   raise TypeError. Using getattr and coercing `type` to str prevents that and 
also
   avoids AttributeError if a col is missing an attribute. The change is local 
and
   corrects a real potential runtime error when serializing.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/mcp_service/system/resources/schema_discovery.py
   **Line:** 60:60
   **Comment:**
        *Type Error: JSON serialization will fail or produce unexpected output 
if `col.description` or `col.type` are non-primitive objects (e.g. enums or 
SQLAlchemy types). Coerce these fields to JSON-safe primitives (strings or 
None) and use `getattr` to avoid AttributeError if an element lacks an 
attribute.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>
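
To make the failure mode concrete, a tiny repro: `json.dumps` raises `TypeError` on any non-primitive value, such as an enum member, and succeeds once the value is coerced to a string:

```python
import json
from enum import Enum


class ColType(Enum):
    TEXT = "text"


try:
    json.dumps({"type": ColType.TEXT})  # non-primitive value
except TypeError as exc:
    print(f"json.dumps failed: {exc}")
    # json.dumps failed: Object of type ColType is not JSON serializable

# After coercion to a string, serialization succeeds.
print(json.dumps({"type": str(ColType.TEXT)}))  # {"type": "ColType.TEXT"}
```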


