Hi David,

Thanks for raising this proposal; it would be a great addition. I was actually planning to discuss this with you, as I believe it will support AIP-99 by allowing this hook to provide rich context and sample data for LLMs.
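To make the "rich context and sample data" idea concrete, here is a minimal sketch of the kind of helper I have in mind. It is not taken from the draft PR: `sample_context` is a hypothetical name, and the standard-library `sqlite3` module stands in for an ADBC connection, on the assumption that the hook exposes a DB-API 2.0 style cursor (which is what implementing DbApiHook suggests).

```python
import sqlite3


def sample_context(conn, table, limit=3):
    """Return column names and a few sample rows for `table`.

    Hypothetical helper for illustration only; it relies solely on
    DB-API 2.0 cursor calls (execute, description, fetchall).
    """
    # Table names cannot be bound as parameters, so validate before formatting.
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    cur = conn.cursor()
    try:
        # Assumption: qmark paramstyle, as used by sqlite3; other drivers may differ.
        cur.execute(f"SELECT * FROM {table} LIMIT ?", (limit,))
        columns = [col[0] for col in cur.description]
        rows = cur.fetchall()
    finally:
        cur.close()
    return {"table": table, "columns": columns, "sample_rows": rows}


# Stand-in database; with the proposed hook this would be its connection instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 20.0)])
context = sample_context(conn, "orders")
```

The resulting dictionary (table name, column names, a handful of rows) is the sort of payload that could be passed to an LLM as schema context. An Arrow-native variant would presumably return a table or record batch instead of row tuples, but that depends on what the hook ends up exposing.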
I am +1 for this :)

Regards
Pavan

On Tue, Mar 3, 2026 at 3:53 PM Blain David <[email protected]> wrote:

> Hello everyone,
>
> Following some initial discussions with Jarek Potiuk and a previously
> opened PR, I would like to formally propose the introduction of an Apache
> Arrow / ADBC provider for Airflow.
>
> Context & Motivation:
>
> While Airflow has a rich set of database-specific providers, the data
> ecosystem is rapidly shifting toward ADBC (Arrow Database Connectivity).
> ADBC solves many of the "bottleneck" issues associated with traditional
> DB-API 2.0, ODBC or JDBC drivers by leveraging columnar data access and
> Arrow-native memory representation.
>
> We are seeing significant momentum here:
>
> * Performance: Significant reduction in serialization overhead for
>   bulk operations. While results vary by driver maturity and server-side
>   native Arrow support (e.g., flight endpoints), ADBC provides a much higher
>   performance ceiling than standard PEP 249 drivers.
> * Standardization: Systems like Snowflake, Apache DataFusion and
>   DuckDB are increasingly treating Arrow as a first-class citizen.
> * Future-proofing: Tools like dbt-fusion and various lakehouse
>   architectures are moving toward Arrow-based execution.
>
> The Proposal:
>
> I propose adding an apache-airflow-providers-apache-arrow (or similar)
> that introduces an AdbcHook.
>
> Key Technical Highlights:
>
> * Compatibility: By implementing DbApiHook, the AdbcHook will be
>   immediately compatible with existing SQL operators.
> * Efficiency: It will offer a high-performance alternative to
>   traditional row-based drivers without requiring users to rewrite their DAG
>   logic.
> * Scope: Focus on providing a standardized interface for Arrow-native
>   bulk reads and writes (future enhancement in AdbcHook).
>
> Community & Maintenance:
>
> I have already started the groundwork in a Draft PR (#52330).
>
> I believe this aligns with the project's goal of supporting
> high-performance data engineering patterns. I'm looking for feedback on:
>
> * Naming: Should this be a standalone adbc provider or part of an
>   apache.arrow provider? I chose the latter, but this is open to discussion.
> * Scope: At the moment I am focusing purely on the Hook/Connection.
>   Since it extends DbApiHook and implements all required methods, it is
>   already directly usable in SQL operators.
>
> I'd love to gather your thoughts and gauge interest before moving to a
> formal voting thread.
>
> Draft PR: https://github.com/apache/airflow/pull/52330
>
> Best regards,
> David
