mistercrunch opened a new issue, #35820:
URL: https://github.com/apache/superset/issues/35820

   # [SIP-XX] Structured Context for AI in Superset
   
   ## Motivation
   
   AI-assisted coding is changing how we build software. Tools like Claude Code, Cursor, and Copilot make developers faster and more productive, but only when they get the right context.
   
   Right now, Superset’s AI context is scattered:
   
   - Partial instructions in `AGENTS.md` and `CLAUDE.md`, tied together with symlinks  
   - Docs spread across repos, wikis, and PRs  
   - Tribal knowledge that isn’t written anywhere  
   
   This makes it hard for both humans and AI tools to get up to speed.
   
   We need a better system. “Context engineering” means treating the information AI relies on like code: organized, versioned, and reviewed. This SIP proposes adding a structured `context/` directory that makes Superset’s core knowledge discoverable, modular, and up to date.
   
   ---
   
   ## Problem
   
   1. Monolithic files don’t scale: AI tools perform poorly when their context is overloaded  
   2. Contributors and AIs can’t find relevant info  
   3. No structure for how context should live in the repo  
   4. Fear of duplication blocks people from writing good summaries  
   5. No process to keep AI context fresh, curated, and organized  
   
   ---
   
   ## Proposal
   
   ### Principles
   
   - **Context as Code** – versioned, reviewed, and owned  
   - **Small Chunks** – short, focused Markdown files  
   - **Graph Structure** – link files using `@context/...` references  
   - **OK to Duplicate** – optimize for clarity, not deduplication  
   - **Task-Scoped** – load only what’s needed for the current change  
   
   ---
   
   ### Directory Layout
   
   ```
   context/
   ├── lexicon.md          # Index of all context files
   ├── core.md             # What Superset is and why it exists
   ├── architecture.md     # High-level system overview
   ├── architecture/       # Frontend, backend, pipelines, etc.
   ├── concepts/           # Core ideas like datasets, dashboards, plugins
   ├── guidelines/         # Coding and design patterns
   ├── workflows/          # Common dev tasks
   └── refactors/          # Ongoing cleanup and migration efforts
   ```
   
   Each file explains one topic clearly, with links to other files.  
   AI tools can load subsets based on the task at hand.
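   
   As a rough illustration of task-scoped loading, the sketch below starts from one context file and follows its `@context/...` references to collect only the files that task needs. The function name `collect_context`, the regular expression, and the breadth-first order are assumptions made for this example; the SIP does not prescribe a loader.

```python
import re
from pathlib import Path

# Assumed reference syntax: "@context/<path>.md", as used in this SIP's examples.
REF_PATTERN = re.compile(r"@(context/[\w./-]+\.md)")


def collect_context(start: str, repo_root: Path) -> list[str]:
    """Return `start` plus every file reachable through @context/ references."""
    ordered: list[str] = []
    seen: set[str] = set()
    queue = [start]
    while queue:
        rel = queue.pop(0)  # breadth-first: closest files come first
        if rel in seen:
            continue  # already visited, which also protects against cycles
        seen.add(rel)
        path = repo_root / rel
        if not path.is_file():
            continue  # dangling reference: skipped here, a linter could flag it
        ordered.append(rel)
        queue.extend(REF_PATTERN.findall(path.read_text(encoding="utf-8")))
    return ordered
```

   Calling `collect_context("context/workflows/some-task.md", Path("."))` (a hypothetical file name) would return that file followed by the files it links to, directly or indirectly, which a tool could then pass to the model instead of the whole directory.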
   
   ---
   
   ### Example
   
   ```markdown
   # Adding a Visualization Plugin
   
   See @context/architecture/frontend.md for how the frontend works.  
   See @context/concepts/viz-plugins.md for plugin architecture.  
   
   Steps to add a plugin:
   1. Create a new directory under `superset-frontend/plugins/`
   2. ...
   ```
   
   ---
   
   ### Scope
   
   **Include:**  
   Architecture, core concepts, workflows, guidelines, refactors, testing, 
performance.
   
   **Exclude:**  
   Issue threads, PR details, user docs, or fast-changing ops info.
   
   ---
   
   ### Best Practices
   
   - Keep each file under 500 lines  
   - Single topic per file  
   - Add “context owner” headers where applicable  
   - Write for both humans and AI  
   - Cross-link freely  
   - Prefer practical, example-driven writing  
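   
   Some of these practices could be checked automatically, for example in CI. The sketch below is one possible check, not part of the proposal: it flags files at or over the 500-line limit and `@context/...` references that point to missing files. The name `lint_context` and the message format are hypothetical.

```python
import re
from pathlib import Path

MAX_LINES = 500  # files should stay under this many lines
REF_PATTERN = re.compile(r"@(context/[\w./-]+\.md)")


def lint_context(repo_root: Path) -> list[str]:
    """Return human-readable problems found under <repo_root>/context."""
    problems: list[str] = []
    for path in sorted((repo_root / "context").rglob("*.md")):
        rel = path.relative_to(repo_root).as_posix()
        text = path.read_text(encoding="utf-8")
        line_count = len(text.splitlines())
        if line_count >= MAX_LINES:
            problems.append(f"{rel}: {line_count} lines (limit is under {MAX_LINES})")
        # Report each broken reference once per file.
        for ref in sorted(set(REF_PATTERN.findall(text))):
            if not (repo_root / ref).is_file():
                problems.append(f"{rel}: broken reference @{ref}")
    return problems
```

   An empty list means no problems were found; a CI job could fail the build when the list is non-empty.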
   
   ---
   
   ### Ownership
   
   - Reviewed like code  
   - PRs that add new patterns must update context  
   - Committers can own files or sections  
   - Quarterly cleanup to remove stale content  
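   
   To support the quarterly cleanup, a small helper could list files that have not changed recently. The sketch below assumes a 90-day threshold as a stand-in for "quarterly" and takes the last-change dates as input; both the threshold and the function name `stale_files` are assumptions for illustration.

```python
from datetime import date, timedelta


def stale_files(
    last_changed: dict[str, date], today: date, max_age_days: int = 90
) -> list[str]:
    """Return paths last changed more than `max_age_days` ago, oldest first."""
    cutoff = today - timedelta(days=max_age_days)
    stale = [
        (changed, path) for path, changed in last_changed.items() if changed < cutoff
    ]
    # Sorting the (date, path) pairs puts the oldest files first.
    return [path for _changed, path in sorted(stale)]
```

   The dates could come from version control, for instance from `git log -1 --format=%cs -- <path>` for each file. An old file is only a candidate for review, not necessarily outdated.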
   
   ---
   
   ### Integration with AI Tools
   
   The `context/` layout works with Claude Code, Cursor, Copilot, and others.
   
   Example:
   
   ```bash
   # Claude Code's CLI is `claude`; files are referenced with @path inside the prompt
   claude "Using @context/lexicon.md as the index, add caching to the dataset endpoint"
   ```
   
   Developers can also copy-paste references directly in chat or terminal.
   
   ---
   
   ### Rollout Plan
   
   **Phase 1** – create structure and seed `core.md`, `architecture.md`, 
`lexicon.md`  
   **Phase 2** – add high-value workflows and guidelines  
   **Phase 3** – integrate into PR templates and `CONTRIBUTING.md`  
   
   No API, database, or migration changes. Purely additive.
   
   ---
   
   ### Success Metrics
   
   - Number of context files  
   - Frequency of updates  
   - Feedback from contributors and AI tool users  
   - Reduction in repetitive contributor questions  
   
   ---
   
   ## Rejected Alternatives
   
   - One giant `AGENTS.md` – doesn’t scale  
   - Wiki – not versioned or reviewed  
   - Code comments – too granular  
   - Auto-generated docs – lack curation  
   - “Superset for AI” single doc – same scale issue  
   
   ---
   
   ## Summary
   
   This SIP formalizes **context as code**.  
   A structured `context/` directory helps both humans and AI assistants work 
faster and with fewer mistakes.  
   It’s low cost, easy to maintain, and sets us up for the agentic coding 
future.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

