Hi everyone,

I'm a PhD student at Carnegie Mellon working on database research,
co-advised by Andy Pavlo and Jignesh Patel. I've worked on DuckDB in the
past through my research (https://github.com/duckdb/duckdb/pull/7528), and
I'm currently doing an internship at Columnar.

My main internship project has been developing a DuckDB extension for ADBC.
The extension lets DuckDB users connect to Snowflake, Databricks, BigQuery,
PostgreSQL, MySQL, and any other system with an ADBC driver.

The extension supports querying ADBC databases directly through a
`read_adbc` table function. It also supports using `ATTACH` to connect to
an ADBC database and then running `SELECT`, `INSERT`, `COPY`, and
`CREATE TABLE AS SELECT` (CTAS) statements as if the database were local
to DuckDB.

The repo is now public here:
https://github.com/columnar-tech/duckdb-adbc-client/

Those following recent work at the intersection of DuckDB and ADBC may
have seen that community member Rusty Conover previously published an
AI-developed `adbc_scanner` community extension for DuckDB. We took a
different approach with this extension, choosing to hand-code the core
pieces as part of the academic goals of my internship. The extension also
integrates with ADBC connection profiles, aims for broad database
compatibility, and includes automatic connection pooling, automatic
metadata caching, and memory-efficient `INSERT` and CTAS support through
streaming bulk ingest operations.

Since we made the repo public, we have also seen some of these ideas and
capabilities begin to appear in related community work. That is allowed
under the Apache 2.0 license, and we want to be clear that we welcome
experimentation and reuse. At the same time, the speed with which
AI-assisted development can absorb and repackage work makes attribution,
coordination, and shared governance especially important. Columnar is also
a financial supporter of DuckDB Labs, and we care about keeping the
DuckDB/ADBC ecosystem collaborative and healthy.

With that in mind, although we initially developed this extension under the
Columnar GitHub organization for convenience, we are interested in donating
it to the Arrow project and moving it to an ASF repo. There is some work to
do to assess the feasibility and details, including how DuckDB release
cycles would interact with ADBC release cycles. But we think it would be
valuable to have an official ADBC client extension for DuckDB maintained
under ASF governance, much like the official ADBC client libraries for
various languages. Our hope is that this could provide a neutral place for
third-party contributors, including Rusty and others in the DuckDB and ADBC
communities, to collaborate rather than maintaining separate DuckDB/ADBC
extensions with overlapping goals.

I’d appreciate feedback from the Arrow community on whether this seems like
a good direction and what the right next steps would be.

In the meantime, please take a look at the repo, try the extension using
the instructions in the README, and open issues for any bugs, compatibility
problems, or design feedback:
https://github.com/columnar-tech/duckdb-adbc-client/

- Sam
