elasticdotventures commented on issue #35003: URL: https://github.com/apache/superset/issues/35003#issuecomment-3940271286
Hi @betodealmeida and the Superset community 👋

We wanted to share some context from a fork we've been developing, in case it's useful to anyone looking for **DataFrame, MCP (Model Context Protocol), or AI-agent-driven chart/dashboard generation** capabilities today while SIP-182 matures.

**The fork**: https://github.com/PromptExecution/superset-datafusion-mcp

### Why the fork exists

[@PromptExecution](https://github.com/PromptExecution) is a consulting org working with a client that operates a data platform currently in testing, expected to reach roughly **1,000 daily users within the next few months**. Many of those users are Python-proficient analysts who need capabilities beyond what standard BI tools offer today:

- **In-session DataFrames as chart sources**: ingest an Arrow/Parquet table via an AI agent and immediately generate a Superset chart against it, no database required
- **MCP tool surface**: expose chart creation, dashboard assembly, and DataFrame querying as first-class tools that LLM agents can call
- **"Better than Grafana" diagram and dashboard generation**: including Mermaid diagram output and composite dashboard assembly from agent conversations

The delivery timeline made a clean upstream contribution path impractical for this cycle. Rather than wait, we made a **hard fork** to ship the MCP service layer on top of Superset's existing chart infrastructure.

### What we built (relevant to SIP-182)

The fork adds a `VirtualDatasetRegistry` backed by **[Apache Arrow](https://arrow.apache.org/)** (in-memory tables, TTL-scoped, session-isolated) and **[Apache DataFusion](https://datafusion.apache.org/)** / DuckDB for query execution. We think this is the natural internal engine choice for Apache Superset: Arrow and DataFusion are both Apache projects with strong columnar performance characteristics, and Arrow in particular is already the lingua franca for DataFrame interchange across the Python ecosystem.

An AI agent can:

1. Ingest a DataFrame → register it as a virtual dataset (an Arrow table in memory)
2. Call `generate_chart(dataset_id="virtual:{uuid}", config={...})` → DataFusion/DuckDB executes the query → Superset renders the chart
3. Query the virtual dataset with arbitrary SQL via the MCP tool surface

The bridge between virtual datasets and chart rendering lives entirely outside Superset's `get_sqla_query()` path, which means **it is structurally aligned with the decoupling SIP-182 proposes**. The `Explorable` protocol would give our bridge a proper first-class home.

### How we're planning to harmonize

This fork is also serving as a live test of **[`gh-aw`](https://github.com/PromptExecution/superset-datafusion-mcp/tree/master/.github/workflows)** (GitHub Agentic Workflows) for CI/CD automation. We've wired up a breaking-change checker agent that watches specifically for SIP-182 milestones:

- `Explorable` protocol introduction (Phase 0 / PR #36245)
- `form_data` key renames (Phases 2/3): our bridge centralises all `form_data` reads into accessor functions, so a rename is a single-file update
- `get_sqla_query()` removal (Phase 4): low direct risk since we already bypass it, but we'll do a full audit when it lands

When Phase 0 merges, our plan is to implement `Explorable` for the `VirtualDatasetRegistry` so virtual datasets work natively through Superset's chart pipeline. At that point we'd love to discuss upstreaming the registry, the MCP tool surface, and potentially the Prometheus query tool (which has no upstream equivalent proposed yet).

### Cherry-pick contributions

In the meantime we're tracking upstream closely and tagging anything that looks like a clean upstream contribution candidate. If any of the patterns we've built (session-scoped in-memory datasets, TTL lifecycle management, Arrow-native query results, or the MCP agentic tool layer) would be useful reference material as Phases 1–3 land, we're happy to share specifics or open draft PRs for discussion.
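
To make the first two of those patterns concrete, here is a minimal sketch of a session-scoped, TTL-bound in-memory registry. It is an illustration only, not the fork's actual code: the class name `VirtualDatasetRegistrySketch`, its methods, and the injectable `clock` parameter are all hypothetical, and the stored `table` is typed as `Any` as a stand-in for an Arrow table so the example runs on the standard library alone. Only the `virtual:{uuid}` identifier shape is taken from the description above.

```python
from __future__ import annotations

import time
import uuid
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class _Entry:
    session_id: str     # owning session; other sessions cannot read this entry
    table: Any          # stand-in for an Arrow table (assumption)
    expires_at: float   # clock value at which the entry becomes invalid


class VirtualDatasetRegistrySketch:
    """Hypothetical session-isolated, TTL-scoped registry of in-memory tables."""

    def __init__(
        self,
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested deterministically
        self._entries: dict[str, _Entry] = {}

    def register(self, session_id: str, table: Any) -> str:
        """Store a table for one session and return a 'virtual:{uuid}' id."""
        dataset_id = f"virtual:{uuid.uuid4()}"
        self._entries[dataset_id] = _Entry(
            session_id, table, self._clock() + self._ttl
        )
        return dataset_id

    def get(self, session_id: str, dataset_id: str) -> Any:
        """Return the table, or raise KeyError if unknown, foreign, or expired."""
        entry = self._entries.get(dataset_id)
        if entry is None or entry.session_id != session_id:
            # A foreign session gets the same error as a missing id,
            # so it cannot probe for the existence of other sessions' data.
            raise KeyError(dataset_id)
        if self._clock() >= entry.expires_at:
            del self._entries[dataset_id]
            raise KeyError(dataset_id)
        return entry.table

    def purge_expired(self) -> int:
        """Drop every expired entry and return how many were removed."""
        now = self._clock()
        expired = [k for k, e in self._entries.items() if now >= e.expires_at]
        for key in expired:
            del self._entries[key]
        return len(expired)
```

In a sketch like this, expiry is enforced lazily on `get` and eagerly by `purge_expired`, which a periodic task could call to release memory held by abandoned sessions. A real implementation would also need locking or per-worker isolation, which is left out here.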
Thanks for the thoughtful design work here. SIP-182 is exactly the right abstraction boundary, and we're genuinely excited to see it mature.

[@PromptExecution](https://github.com/PromptExecution)

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
