[I] [SIP-189] Context Engineering: Structured AI Context Management for Superset [superset]

via GitHub Thu, 23 Oct 2025 14:49:02 -0700


mistercrunch opened a new issue, #35822:
URL: https://github.com/apache/superset/issues/35822


   ## Motivation
   
   AI-assisted coding is changing how we build software. Tools like Claude 
Code, Cursor, and Copilot make developers faster and more productive — but only 
when they get the right context.
   
   Right now, Superset's AI context is scattered:
   * Partial instructions in `AGENTS.md` and `CLAUDE.md` with symlinks
   * Docs spread across repos, wikis, and PRs
   * Tribal knowledge that isn't written anywhere
   
   This makes it hard for both humans and AI tools to get up to speed.
   
   Across the development spectrum—large refactors, feature development, bug 
fixing, code review—we're seeing consistent 2x-10x velocity gains. Given these 
multipliers, **curating context may be one of the highest-leverage activities 
we can undertake**.
   
   We need a better system. "Context engineering" means treating the 
information AI relies on like code — organized, versioned, and reviewed. This 
SIP proposes adding a structured `context/` directory that makes Superset's 
core knowledge discoverable, modular, and up to date.
   
   ## Problem
   
   1. **Monolithic files don't scale** – AIs don't perform well with overloaded 
context
   2. **No discoverability** – Contributors and AIs can't find relevant info
   3. **No structure** – Unclear how context should live in the repo
   4. **Fear of duplication** – Contributors hold back from writing good 
summaries because the content already exists somewhere else
   5. **No curation process** – Context goes stale and stays incorrect
   
   Note: Stale or incorrect context is worse than no context at all.
   
   ## Proposed Change
   
   ### Principles
   
   * **Context as Code** – versioned, reviewed, and owned
   * **Small Chunks** – short, focused Markdown files
   * **Knowledge Graph** – link files using `@context/...` to build navigable 
structure
   * **OK to Duplicate** – optimize for clarity, not deduplication
   * **Task-Scoped** – load only what's needed for the current change
   
   ### Directory Structure
   
   ```
   context/
   ├── lexicon.md          # Index of all context files
   ├── core.md             # What Superset is and why it exists
   ├── architecture.md     # High-level system overview
   ├── architecture/       # Detailed architecture topics
   │   ├── frontend.md
   │   ├── backend.md
   │   ├── data-pipeline.md
   │   └── caching.md
   ├── concepts/           # Core ideas
   │   ├── viz-plugins.md
   │   ├── datasets.md
   │   ├── dashboards.md
   │   └── semantic-layer.md
   ├── guidelines/         # Development patterns
   │   ├── coding/
   │   │   ├── python-style.md
   │   │   ├── react-patterns.md
   │   │   └── testing.md
   │   └── design/
   │       ├── ui-patterns.md
   │       └── accessibility.md
   ├── workflows/          # Common dev tasks
   │   ├── adding-viz-plugin.md
   │   ├── database-migration.md
   │   └── release-process.md
   └── refactors/          # Ongoing migrations
       ├── legacy-charts-migration.md
       └── typescript-conversion.md
   ```
   
   Each file explains one topic clearly, with links to other files. AI tools 
can load subsets based on the task at hand.
   
   ### Foundation Files
   
   **lexicon.md** – Master index listing all context files with short 
descriptions. Entry point for discovering context.
   
   Usage:
   ```
   "Review @context/lexicon.md and load relevant files for adding a caching 
layer to the dataset endpoint"
   ```
   
   **core.md** – Essential overview of what Superset is, core philosophy, key 
capabilities, target users.
   
   **architecture.md** – High-level view of system components, tech stack, 
architectural patterns, deployment variations.
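
   As an illustration of how `lexicon.md` could be regenerated instead of 
maintained by hand, here is a minimal sketch. It assumes (the SIP does not 
specify this) that each context file opens with a `# Title` heading that can 
serve as its short description. `first_heading` and `build_lexicon` are 
hypothetical helper names.

```python
from pathlib import Path


def first_heading(path: Path) -> str:
    """Return the text of the first '# ' heading, or the file stem as a fallback."""
    for line in path.read_text(encoding="utf-8").splitlines():
        if line.startswith("# "):
            return line[2:].strip()
    return path.stem


def build_lexicon(context_dir: Path) -> str:
    """Build the body of lexicon.md: one bullet per context file, sorted by path."""
    lines = ["# Lexicon", ""]
    for path in sorted(context_dir.rglob("*.md")):
        rel = path.relative_to(context_dir).as_posix()
        if rel == "lexicon.md":
            continue  # the index does not list itself
        # Assumed entry format: "@context/<path> - <first heading>"
        lines.append(f"* @context/{rel} - {first_heading(path)}")
    return "\n".join(lines) + "\n"
```

   Running `build_lexicon(Path("context"))` in CI and comparing the result 
with the committed file would be one way to keep the index from drifting.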
   
   ### Self-Referencing
   
   Context files freely reference each other to build a knowledge graph:
   
   ```markdown
   # Adding a Visualization Plugin
   
   See @context/architecture/frontend.md for frontend architecture.
   See @context/concepts/viz-plugins.md for plugin concepts.
   
   Steps to add a plugin:
   1. Create directory under `superset-frontend/plugins/`
   2. ...
   ```
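
   A knowledge graph built from `@context/...` links is only useful if the 
links resolve. The sketch below shows one way to extract references and report 
broken ones. The regular expression is an assumption about what a reference 
looks like (a path ending in `.md`), and `extract_refs` and `broken_refs` are 
hypothetical names, not part of any existing tool.

```python
import re
from pathlib import Path

# Assumed reference syntax, e.g. @context/architecture/frontend.md
REF_PATTERN = re.compile(r"@context/([\w./-]+\.md)")


def extract_refs(text: str) -> list[str]:
    """Return context-relative paths referenced in the text, in order, without repeats."""
    seen: dict[str, None] = {}
    for match in REF_PATTERN.finditer(text):
        seen.setdefault(match.group(1), None)
    return list(seen)


def broken_refs(context_dir: Path) -> dict[str, list[str]]:
    """Map each context file to the references that do not resolve to a file."""
    problems: dict[str, list[str]] = {}
    for path in sorted(context_dir.rglob("*.md")):
        missing = [
            ref
            for ref in extract_refs(path.read_text(encoding="utf-8"))
            if not (context_dir / ref).is_file()
        ]
        if missing:
            problems[path.relative_to(context_dir).as_posix()] = missing
    return problems
```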
   
   ### Scope
   
   **Include:**
   - Architectural overviews and deep dives
   - Core concepts and patterns
   - Development guidelines and best practices
   - Common workflows and procedures
   - Active refactoring initiatives
   - Design patterns, testing strategies, security considerations
   - RFCs that define ongoing patterns
   - Tribal knowledge not documented elsewhere
   
   **Exclude:**
   - Individual issue discussions (keep in GitHub Issues)
   - Pull request details (keep in PRs)
   - User-facing documentation (keep in `docs/`)
   - Historical decisions without current relevance
   - Auto-generated documentation (reference from context)
   - Rapidly changing operational details (link to canonical sources)
   
   ### Best Practices
   
   * Keep each file short: aim for ~1000-2000 tokens, with 500 lines as a hard 
upper bound
   * Single topic per file
   * Include "Last updated" metadata, optional "Context owner"
   * Write for both humans and AI
   * Cross-link liberally
   * Prefer practical, example-driven writing
   * Keep each file self-contained enough to understand without reading the 
entire graph
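
   Two of these practices are mechanical enough to check automatically. The 
following sketch flags files that reach the 500-line limit or lack a 
"Last updated" line. The `Last updated: YYYY-MM-DD` format is an assumption, 
since the SIP does not define one, and `lint_context_file` is a hypothetical 
name.

```python
import re

MAX_LINES = 500  # limit taken from the best practices above
# Assumed metadata format; the SIP does not define one.
LAST_UPDATED = re.compile(r"^Last updated:\s*\d{4}-\d{2}-\d{2}\s*$", re.MULTILINE)


def lint_context_file(name: str, text: str) -> list[str]:
    """Return human-readable problems found in one context file."""
    problems = []
    line_count = len(text.splitlines())
    if line_count >= MAX_LINES:
        problems.append(f"{name}: {line_count} lines, expected fewer than {MAX_LINES}")
    if not LAST_UPDATED.search(text):
        problems.append(f"{name}: missing 'Last updated: YYYY-MM-DD' line")
    return problems
```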
   
   ### Ownership and Maintenance
   
   * Context files reviewed like code
   * PRs that add new patterns must update context
   * Committers can designate context owners for specific files
   * Quarterly reviews to remove stale content
   * Stale context is worse than no context—keep it fresh
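
   A quarterly review is easier to run if stale files can be listed up front. 
This sketch treats a file as stale when its "Last updated" date is more than 
90 days old or cannot be found. Both the 90-day window and the 
`Last updated: YYYY-MM-DD` format are assumptions made for illustration, and 
`is_stale` is a hypothetical name.

```python
import re
from datetime import date, timedelta

# Assumed metadata format; the SIP only asks for "Last updated" metadata.
LAST_UPDATED = re.compile(r"^Last updated:\s*(\d{4}-\d{2}-\d{2})\s*$", re.MULTILINE)
REVIEW_WINDOW = timedelta(days=90)  # roughly one quarter; an assumed threshold


def is_stale(text: str, today: date) -> bool:
    """True if the file has no valid date or was last updated before the window."""
    match = LAST_UPDATED.search(text)
    if match is None:
        return True  # undated files are treated as needing review
    try:
        updated = date.fromisoformat(match.group(1))
    except ValueError:
        return True  # an impossible date such as 2025-13-40 also needs review
    return today - updated > REVIEW_WINDOW
```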
   
   ### Integration with AI Tools
   
   Works with Claude Code, Cursor, Copilot, and custom agents through a 
standardized file structure.
   
   Examples:
   ```bash
   # In chat
   "I'm adding a new viz plugin. Reference 
@context/workflows/adding-viz-plugin.md 
   and @context/concepts/viz-plugins.md"
   
   # Command line (hypothetical invocation; check your tool's CLI for the 
   # actual command name and flags)
   claude-code --context @context/lexicon.md "Add caching to dataset endpoint"
   ```
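
   For custom agents, task-scoped loading could be approximated by starting 
from a few files and following their `@context/...` links until a size budget 
is used up. The sketch below is one possible approach, not a description of how 
any of the tools above behave. `collect_context`, the line-based budget, and 
the reference pattern are all assumptions.

```python
import re
from collections import deque
from pathlib import Path

# Assumed reference syntax, e.g. @context/concepts/viz-plugins.md
REF_PATTERN = re.compile(r"@context/([\w./-]+\.md)")


def collect_context(context_dir: Path, start: list[str], max_lines: int) -> list[str]:
    """Breadth-first walk from the starting files, following @context/ links and
    skipping any file that would push the combined line count past max_lines."""
    queue = deque(start)
    seen: set[str] = set()
    selected: list[str] = []
    used = 0
    while queue:
        rel = queue.popleft()
        path = context_dir / rel
        if rel in seen or not path.is_file():
            continue
        seen.add(rel)
        text = path.read_text(encoding="utf-8")
        size = len(text.splitlines())
        if used + size > max_lines:
            continue  # does not fit; its links are not followed either
        used += size
        selected.append(rel)
        queue.extend(REF_PATTERN.findall(text))
    return selected
```

   Breadth-first order keeps the files closest to the starting point, which 
is a simple stand-in for relevance when the budget is tight.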
   
   ## New or Changed Public Interfaces
   
   **Context Directory Structure**
   - New developer-facing interface: standardized `context/` directory
   - Consumers: developers, AI tools, documentation generators
   - Versioning: follows semantic versioning
     - Major: breaking changes to structure
     - Minor: new files/directories
     - Patch: content updates
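
   The versioning rule above can be read as a small decision procedure. In 
this sketch, a removed or renamed path counts as a breaking structural change 
because it invalidates existing `@context/...` links. That reading is an 
interpretation, not something the SIP states, and `classify_bump` is a 
hypothetical name.

```python
def classify_bump(before: set[str], after: set[str], edited: set[str]) -> str:
    """Classify a change to context/ using the Major/Minor/Patch rules above.

    before and after are the sets of file paths in context/ before and after
    the change; edited is the set of existing files whose content changed.
    """
    if before - after:
        return "major"  # a removed or renamed path breaks existing links (assumed rule)
    if after - before:
        return "minor"  # only new files or directories were added
    if edited:
        return "patch"  # content updates to existing files
    return "none"
```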
   
   **Developer Workflow**
   - PR checklist includes "Context updated if introducing new patterns"
   - Context reviewed like code
   
   **Documentation Relationship**
   - User docs in `docs/` remain authoritative for end-users
   - Context in `context/` optimized for developer/AI consumption
   - Intentional overlap between the two
   
   ## Migration Plan and Compatibility
   
   **Rollout Phases:**
   
   **Phase 1** – Foundation
   - Create `context/` directory structure
   - Seed `core.md`, `architecture.md`, `lexicon.md`
   - Migrate useful content from `AGENTS.md` / `CLAUDE.md`
   - Document in `CONTRIBUTING.md`
   
   **Phase 2** – Population
   - Add high-value workflows and guidelines
   - Encourage context updates in feature PRs
   - Document 3-5 complete workflow guides
   
   **Phase 3** – Integration
   - Add to PR template
   - Create examples for popular AI tools
   - Announce on mailing list
   
   **Compatibility:**
   - Backward compatible—existing `AGENTS.md` can remain during transition
   - No breaking changes, database migrations, or API changes
   - Purely additive
   - Deprecate `AGENTS.md` after 6 months with pointer to `context/`
   
   ## Success Metrics
   
   * Number of context files created
   * Frequency of context updates in PRs
   * Contributor feedback on AI-assisted development velocity
   * Reduction in repetitive "how do I..." questions
   
   ## Rejected Alternatives
   
   **One giant `AGENTS.md`**  
   Doesn't scale beyond ~1000 lines. Poor signal-to-noise for specific tasks. 
Limited context windows make large files problematic.
   
   **Wiki for AI context**  
   Not versioned with code. Diverges from codebase. No code review. Less 
discoverable for AI tools.
   
   **Embed in code comments**  
   Pollutes codebase. Difficult to maintain consistency. Doesn't help with 
architectural/workflow context.
   
   **Auto-generated from code**  
   Lacks curation, architectural decisions, and tribal knowledge. Better as 
complement, not replacement.
   
   **Single "Superset for AI" doc**  
   Doesn't scale—would become tens of thousands of lines. Impossible to 
maintain. Poor for context windows.
   
   **`AGENTS/` instead of `context/`**  
   Context serves humans too, not just agents, so `context/` is the more 
accurate and future-proof name.
   
   ## Summary
   
   This SIP formalizes context as code. A structured `context/` directory helps 
both humans and AI assistants work faster with fewer mistakes. It's low cost, 
easy to maintain, and positions Superset for the agentic coding era.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

