This is really cool, thanks for sharing, Kaxil and Pavan.

Thanks & Regards,
Amogh Desai

On Thu, Mar 5, 2026 at 6:34 PM Kaxil Naik <[email protected]> wrote:

> Hi everyone,
>
> Pavan and I have been working on AIP-99 native agentic AI for Airflow 3.
> The first set of PRs has landed.
>
> The core idea: Airflow already has 350+ provider hooks, each
> pre-authenticated through connections. AIP-99 turns those hooks directly
> into AI agent tools.
>
> What's available now:
>
> 1. HookToolset: wraps any Airflow hook into AI-callable tools with
>    explicit allowed_methods:
>
>    from airflow.providers.amazon.aws.hooks.s3 import S3Hook
>    from airflow.providers.common.ai.toolsets import HookToolset
>
>    HookToolset(
>        hook=S3Hook(aws_conn_id="my_aws"),
>        allowed_methods=["list_keys"],
>    )
>
> 2. SQLToolset: 4 curated database tools (list tables, describe schema,
>    execute query, fetch results) scoped to specific tables.
>
> 3. DataFusionToolset: lets AI agents query files on object stores (S3,
>    local filesystem, Iceberg) through Apache DataFusion. Agents get SQL
>    access to Parquet, CSV, and Avro files without loading them into a
>    database.
>
> 4. MCPToolset: connects to external MCP servers via Airflow connections.
>
> 5. Task decorators (Operators are also available :) ):
>    - @task.llm : single LLM call with structured output
>    - @task.agent : multi-step agent with tool access
>    - @task.llm_sql : text-to-SQL pipelines
>    - @task.llm_schema_compare : cross-database schema diffing
>
> LLM connections are configured through Airflow's standard connection
> model, supporting OpenAI, Anthropic, Google, Ollama, etc.
>
> HITL (Human-in-the-Loop) integration is also in progress as a draft PR.
>
> Project Board:
> - https://github.com/orgs/apache/projects/586
>
> Summit talk where we previewed this:
> https://www.youtube.com/watch?v=XSAzSDVUi2o
>
> Separate from the AI work, AIP-99 also adds an AnalyticsOperator powered
> by Apache DataFusion for high-performance SQL on object stores:
>
> - AnalyticsOperator: run SQL queries directly against S3, GCS, local
>   files, and Iceberg tables. Supports Parquet, CSV, Avro.
> - @task.analytics decorator: TaskFlow API support for the above.
> - Iceberg support via PyIceberg with Glue catalog integration.
>
> Pavan and I would love it if folks could start testing it out and create
> GitHub issues if you run into bugs. Our intention is to keep it at a 0.x
> version so we can iterate on it faster. Looking forward to feedback.
>
> Thanks,
> Kaxil
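
For anyone trying to picture the allowed_methods idea from the HookToolset item, here is a rough plain-Python sketch of the allow-list pattern. It is not the AIP-99 implementation and does not import Airflow: FakeS3Hook and AllowListToolset are hypothetical names made up for illustration, and the real HookToolset may behave differently.

```python
from typing import Any, Callable, Dict, Iterable, List


class FakeS3Hook:
    """Hypothetical stand-in for a provider hook (not a real Airflow class)."""

    def __init__(self, keys: Iterable[str]) -> None:
        self._keys = list(keys)

    def list_keys(self, prefix: str = "") -> List[str]:
        # Read-only method: safe to expose to an agent.
        return [k for k in self._keys if k.startswith(prefix)]

    def delete_objects(self, keys: Iterable[str]) -> None:
        # Destructive method: should stay hidden unless explicitly allowed.
        doomed = set(keys)
        self._keys = [k for k in self._keys if k not in doomed]


class AllowListToolset:
    """Illustrative only: exposes just the allow-listed methods of a hook."""

    def __init__(self, hook: Any, allowed_methods: Iterable[str]) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}
        for name in allowed_methods:
            method = getattr(hook, name, None)
            if not callable(method):
                raise ValueError(f"{type(hook).__name__} has no method {name!r}")
            self._tools[name] = method

    def tool_names(self) -> List[str]:
        # The names an agent would be shown as callable tools.
        return sorted(self._tools)

    def call(self, name: str, **kwargs: Any) -> Any:
        # Anything outside the allow-list is rejected before touching the hook.
        if name not in self._tools:
            raise PermissionError(f"method {name!r} is not in allowed_methods")
        return self._tools[name](**kwargs)


hook = FakeS3Hook(["raw/a.parquet", "raw/b.parquet", "tmp/c.csv"])
toolset = AllowListToolset(hook, allowed_methods=["list_keys"])
print(toolset.tool_names())
print(toolset.call("list_keys", prefix="raw/"))
```

The point of the pattern is that the agent only ever sees the names passed in allowed_methods, so a destructive method such as delete_objects on the stand-in hook stays unreachable even though the hook itself supports it.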
