Vara Bonthu created SPARK-52806:
-----------------------------------

             Summary: SPIP: AI-Native Observability for Apache Spark History 
Server via Model Context Protocol
                 Key: SPARK-52806
                 URL: https://issues.apache.org/jira/browse/SPARK-52806
             Project: Spark
          Issue Type: New Feature
          Components: Documentation, Web UI
    Affects Versions: 4.0.0, 3.5.6
         Environment: h2. Environment:

- This solution works with any Spark History server deployment, irrespective of 
the cloud provider
            Reporter: Vara Bonthu


This SPIP proposes adding AI-native observability capabilities to Apache Spark 
through a Model Context Protocol (MCP) server that enables natural language 
querying and analysis of Spark History Server data.
h2. Summary

We propose creating a bridge between AI assistants (Claude, GPT, Amazon Q) and 
Apache Spark History Server data, enabling users to ask questions like "Why is 
my Spark job slow?" and receive AI-powered analysis with actionable 
recommendations.
h2. Key Features
 * Natural language interface for Spark diagnostics
 * 17+ pre-built diagnostic tools for common performance scenarios
 * AI-powered root cause analysis and optimization recommendations
 * Zero modifications required to existing Spark installations
 * Compatible with multiple AI assistants via Model Context Protocol

h2. Community Value
 * 10x faster troubleshooting workflows
 * Lower barrier to entry for Spark performance optimization
 * Positions Apache Spark as AI-ready for next-generation observability
 * Addresses growing demand for AI-powered developer tools

h2. Implementation Approach
 * Standalone MCP server consuming existing Spark History Server REST APIs
 * No changes to Spark core required
 * Kubernetes-native deployment with Helm charts or on any virtual machine
 * Built on the emerging MCP standard for AI-tool integration
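
As a rough illustration of the first point, the sketch below shows how a standalone tool might consume the History Server REST API without touching Spark itself. It is not taken from the proposed project. The {{/api/v1/applications/[app-id]/stages}} endpoint and the {{status}} and {{executorRunTime}} fields come from Spark's documented monitoring REST API. The base URL and the helper names ({{stages_url}}, {{fetch_stages}}, {{slowest_stages}}) are assumptions made for this example only.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed History Server address (18080 is the usual default port); adjust as needed.
BASE_URL = "http://localhost:18080/api/v1"


def stages_url(app_id, base_url=BASE_URL):
    """Build the REST URL that lists all stages of one application."""
    return f"{base_url.rstrip('/')}/applications/{quote(app_id, safe='')}/stages"


def fetch_stages(app_id, base_url=BASE_URL, timeout=10):
    """Fetch stage records from a running History Server (requires network access)."""
    with urlopen(stages_url(app_id, base_url), timeout=timeout) as resp:
        return json.load(resp)


def slowest_stages(stages, top_n=3):
    """Return the top_n completed stages ranked by total executor run time."""
    completed = [s for s in stages if s.get("status") == "COMPLETE"]
    ranked = sorted(completed, key=lambda s: s.get("executorRunTime", 0), reverse=True)
    return [
        {
            "stageId": s["stageId"],
            "name": s.get("name", ""),
            "executorRunTime": s.get("executorRunTime", 0),
        }
        for s in ranked[:top_n]
    ]
```

A diagnostic tool exposed over MCP could then return something like {{slowest_stages(fetch_stages(app_id))}} as context for the assistant. Keeping the ranking logic separate from the HTTP call makes it testable without a live History Server.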

h2. Related Work
 * We are not aware of any related projects that address this problem
 * The project is currently hosted under a neutral GitHub organization: 
[https://github.com/DeepDiagnostix-AI/spark-history-server-mcp]

h2. Who maintains

- Currently, the project is maintained by Vara Bonthu (AWS Open Source 
Specialist SA) and Manabu McCloskey (AWS, Open Source Engineer), along with 
Amazon EMR service teams, until a broader community forms.

We have also submitted a proposal to Kubeflow: 
[https://github.com/kubeflow/community/issues/872]. We would like to hear from 
the Apache Spark community about this step forward for AI observability, and we 
are willing to support this project.

The full SPIP document, with detailed technical design, timeline, and success 
metrics, will be attached as a comment.

This proposal aligns with Apache Spark's mission to make big data processing 
accessible while positioning the project at the forefront of AI-native tooling.

 

*NOTE: We would be happy to demo this solution to the community if given the 
opportunity to present.*

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
