GitHub user nsivabalan added a comment to the discussion: Making Hudi Github Project More Agentic
Let me recap a structured approach for integrating AI agents into Hudi in a practical and incremental way.

Workstream 1: Agent-Friendly Repository

Introduce lightweight, maintainable documentation to help both humans and AI agents better understand the project:

- Add a concise AGENTS.md describing key concepts, workflows, and entry points
- Add an ARCHITECTURE.md capturing high-level system design and core components
- Optionally introduce a PR checklist item to keep these documents updated as the code evolves

The goal is to improve discoverability without introducing heavy maintenance overhead.

Workstream 2: Agent-Friendly Runtime Interface

This aligns with the broader vision of a Claude/Codex-like experience for building and operating lakehouses. (Vinoth's proposal)

Phase 1: Read-Only Investigation Tools

Create a modern Hudi operator surface (via MCP and/or an enhanced CLI) focused on safe, read-only operations:

- List tables
- Describe table metadata
- Fetch and inspect timeline
- Summarize recent commits (inserts/updates)
- Show table services state (compaction, clustering, cleaning)
- Explain key configurations and their impact

This phase focuses on observability and understanding, enabling users (and agents) to reason about table state safely.

Phase 2: Guided Actions (Controlled Mutations)

After validating read-only workflows and gathering community feedback, introduce controlled write operations:

- Trigger compaction and clustering
- Trigger cleaner and archival
- Trigger rollbacks for pending/inflight ingestion commits
- Suggest configuration changes based on observed patterns
- Generate commands first (instead of executing directly), allowing users to review before applying

The emphasis here is on safe, guided execution, not full automation.
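
To make the "generate commands first" idea from Phase 2 a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `Instant`, `ProposedAction`, `pending_instants`, and `propose_rollbacks` are hypothetical names, the timeline is a simplified in-memory model rather than Hudi's actual classes, and the command string is a placeholder, not a real Hudi CLI invocation. The only point is the shape: read-only inspection produces reviewable proposals, and nothing is executed.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Instant:
    """Simplified, hypothetical model of a timeline instant (not Hudi's API)."""
    timestamp: str  # e.g. "20240101120000"
    action: str     # e.g. "commit", "deltacommit", "compaction"
    state: str      # "REQUESTED", "INFLIGHT", or "COMPLETED"


@dataclass(frozen=True)
class ProposedAction:
    """A suggestion for a human to review; this sketch never executes it."""
    reason: str
    command: str    # placeholder text, not a real CLI command


def pending_instants(timeline: List[Instant]) -> List[Instant]:
    """Read-only step: return instants that have not completed."""
    return [i for i in timeline if i.state != "COMPLETED"]


def propose_rollbacks(table_path: str, timeline: List[Instant]) -> List[ProposedAction]:
    """Guided step: turn pending ingestion commits into reviewable proposals.

    A real tool would need further checks before suggesting a rollback;
    this only illustrates generating commands instead of running them.
    """
    proposals = []
    for inst in pending_instants(timeline):
        if inst.action in ("commit", "deltacommit"):
            proposals.append(ProposedAction(
                reason=f"{inst.action} {inst.timestamp} is {inst.state}",
                # Placeholder command text (assumption, not Hudi CLI syntax).
                command=f"<rollback-tool> --table {table_path} --instant {inst.timestamp}",
            ))
    return proposals


if __name__ == "__main__":
    sample = [
        Instant("20240101120000", "commit", "COMPLETED"),
        Instant("20240101130000", "deltacommit", "INFLIGHT"),
        Instant("20240101140000", "compaction", "REQUESTED"),
    ]
    for p in propose_rollbacks("/data/my_table", sample):
        print(f"{p.reason}\n  suggested: {p.command}")
```

Keeping the proposal as plain data (a reason plus a command string) means the same output can be shown in a CLI, returned from an MCP tool, or logged for review, and execution stays a separate, explicit step.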
Workstream 3: Guided Labs and Onboarding

Provide structured, hands-on experiences to help users get started quickly. (This is what Soumil proposed above or has already started work on.)

- Create guided workflows for:
  - Table creation and ingestion
  - Incremental processing
  - Table services (compaction, clustering)
- Optionally integrate agent-assisted walkthroughs

The goal is to reduce the learning curve and accelerate adoption.

Workstream 4: Troubleshooting and Diagnostics

Enable agents to assist with common operational issues:

- Diagnose ingestion failures and stuck pipelines
- Identify performance bottlenecks (e.g., small files, skew, slow queries)
- Analyze timeline issues (pending/inflight commits)
- Provide actionable recommendations for fixes

This workstream focuses on operational productivity and debugging efficiency.

GitHub link: https://github.com/apache/hudi/discussions/18324#discussioncomment-16578617
