Re: [D] Making Hudi Github Project More Agentic [hudi]

via GitHub Wed, 15 Apr 2026 19:59:17 -0700


GitHub user nsivabalan added a comment to the discussion: Making Hudi Github 
Project More Agentic


Let me recap a structured approach for integrating AI agents into Hudi in a 
practical and incremental way.

Workstream 1: Agent-Friendly Repository

Introduce lightweight, maintainable documentation to help both humans and AI 
agents better understand the project:

Add a concise AGENTS.md describing key concepts, workflows, and entry points
Add an ARCHITECTURE.md capturing high-level system design and core components
Optionally introduce a PR checklist item to keep these documents updated as the 
code evolves

The goal is to improve discoverability without introducing heavy maintenance 
overhead.

Workstream 2: Agent-Friendly Runtime Interface

This aligns with Vinoth's proposal: the broader vision of a Claude/Codex-like 
experience for building and operating lakehouses.

Phase 1: Read-Only Investigation Tools

Create a modern Hudi operator surface (via MCP and/or an enhanced CLI) focused 
on safe, read-only operations:

List tables
Describe table metadata
Fetch and inspect timeline
Summarize recent commits (inserts/updates)
Show table services state (compaction, clustering, cleaning)
Explain key configurations and their impact

This phase focuses on observability and understanding, enabling users (and 
agents) to reason about table state safely.
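
To make the "fetch and inspect timeline" item more concrete, here is a rough 
sketch of what a read-only summary could look like. It is not Hudi's actual 
API. It assumes a simplified instant-file naming scheme of 
"<timestamp>.<action>[.<state>]" (real layouts differ across Hudi versions), 
and the names Instant, parse_instant and summarize_timeline are made up for 
illustration. It only looks at file names and never writes anything.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Assumption: simplified naming "<timestamp>.<action>[.<state>]".
# Real Hudi timeline layouts vary by version; this is illustrative only.
STATE_RANK = {"requested": 0, "inflight": 1, "completed": 2}


@dataclass(frozen=True)
class Instant:
    timestamp: str
    action: str
    state: str  # "requested", "inflight", or "completed"


def parse_instant(file_name: str) -> Optional[Instant]:
    """Parse one timeline file name; return None for anything unrecognized."""
    parts = file_name.split(".")
    if not parts[0].isdigit():
        return None
    if len(parts) == 2 and parts[1] not in STATE_RANK:
        return Instant(parts[0], parts[1], "completed")
    if len(parts) == 3 and parts[2] in ("requested", "inflight"):
        return Instant(parts[0], parts[1], parts[2])
    return None


def summarize_timeline(file_names: List[str]) -> dict:
    """Keep the most advanced state per instant, then summarize."""
    latest: Dict[Tuple[str, str], Instant] = {}
    for name in file_names:
        inst = parse_instant(name)
        if inst is None:
            continue
        key = (inst.timestamp, inst.action)
        current = latest.get(key)
        if current is None or STATE_RANK[inst.state] > STATE_RANK[current.state]:
            latest[key] = inst
    instants = sorted(latest.values(), key=lambda i: i.timestamp)
    completed = [i for i in instants if i.state == "completed"]
    pending = [i for i in instants if i.state != "completed"]
    return {
        "completed": len(completed),
        "pending": [(i.timestamp, i.action, i.state) for i in pending],
        "latest_completed": completed[-1].timestamp if completed else None,
    }
```

An MCP tool or CLI command could return a summary like this as-is, since it 
is derived purely from a directory listing.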

Phase 2: Guided Actions (Controlled Mutations)

After validating read-only workflows and gathering community feedback, 
introduce controlled write operations:

Trigger compaction and clustering
Trigger cleaner and archival
Trigger rollbacks for pending/inflight ingestion commits
Suggest configuration changes based on observed patterns
Generate commands first (instead of executing directly), allowing users to 
review before applying

The emphasis here is on safe, guided execution, not full automation.
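
As a rough illustration of the "generate commands first" idea, the sketch 
below separates proposing an action from executing it. Everything here is 
hypothetical: "hudi-operator" and its arguments are placeholders rather than 
a real CLI, and ProposedAction, propose_compaction and apply_if_approved are 
names invented for this example.

```python
import shlex
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class ProposedAction:
    description: str
    command: List[str]  # shown to the user for review; not run by default


def propose_compaction(table_path: str) -> ProposedAction:
    # "hudi-operator" and its flags are placeholders, not a real command.
    return ProposedAction(
        description=f"Run compaction on {table_path}",
        command=["hudi-operator", "compaction", "run", "--table", table_path],
    )


def apply_if_approved(
    action: ProposedAction,
    approved: bool,
    runner: Callable[[List[str]], int],
) -> str:
    """Execute the command only after explicit approval; otherwise just render it."""
    rendered = shlex.join(action.command)
    if not approved:
        return f"DRY RUN (not executed): {rendered}"
    exit_code = runner(action.command)
    return f"EXECUTED (exit {exit_code}): {rendered}"
```

The runner is injected so the same flow can be exercised with a stub during 
review and wired to a real executor only once the community is comfortable 
with it.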

Workstream 3: Guided Labs and Onboarding

Provide structured, hands-on experiences to help users get started quickly:

Note: this is what Soumil proposed above or has already started work on.
Create guided workflows for:
  Table creation and ingestion
  Incremental processing
  Table services (compaction, clustering)
Optionally integrate agent-assisted walkthroughs

The goal is to reduce the learning curve and accelerate adoption.

Workstream 4: Troubleshooting and Diagnostics

Enable agents to assist with common operational issues:

Diagnose ingestion failures and stuck pipelines
Identify performance bottlenecks (e.g., small files, skew, slow queries)
Analyze timeline issues (pending/inflight commits)
Provide actionable recommendations for fixes

This workstream focuses on operational productivity and debugging efficiency.
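
One of the checks above, spotting small files, could be sketched roughly as 
follows. This is only an illustration: the 100 MB size threshold, the 50% 
fraction cutoff, and the function name small_file_fractions are assumptions 
picked for the example, and the input is a plain mapping from partition to 
file sizes rather than anything read from a real table.

```python
from typing import Dict, List

MB = 1024 * 1024


def small_file_fractions(
    sizes_by_partition: Dict[str, List[int]],
    small_file_bytes: int = 100 * MB,  # illustrative threshold, an assumption
    min_fraction: float = 0.5,         # illustrative cutoff, an assumption
) -> Dict[str, float]:
    """Return partitions where the share of small files is at least min_fraction."""
    flagged: Dict[str, float] = {}
    for partition, sizes in sizes_by_partition.items():
        if not sizes:
            continue  # nothing to judge in an empty partition
        small = sum(1 for size in sizes if size < small_file_bytes)
        fraction = small / len(sizes)
        if fraction >= min_fraction:
            flagged[partition] = fraction
    return flagged
```

An agent could turn each flagged partition into a recommendation, for example 
suggesting clustering for that partition, while leaving the decision to the 
user.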

GitHub link: 
https://github.com/apache/hudi/discussions/18324#discussioncomment-16578617

