Some updates:

The proposal is being updated based on feedback from contributors to DuckDB and 
DBI. We've been using GitHub issues on the fork to discuss the API design and 
how to implement data ingestion/bound parameters: 
https://github.com/lidavidm/arrow/issues 

If anyone has suggestions/ideas/questions, or would like to jump in as well, 
please feel free to chime in there too.

I have also been wondering whether we should plan to split off a new repo for 
this work. In particular, some components might be easiest to consume if they 
didn't also have a hard dependency on the Arrow C++ libraries. We could also use 
the repo to manage contributed drivers (some of which may individually use 
the Arrow libraries). Of course, maintaining a parallel build system, setting 
up releases, etc. is also a lot of work.

-David

On Tue, Apr 26, 2022, at 15:01, Wes McKinney wrote:
> I don't have major new things to add on this topic except that I've
> long had the aspiration of creating something like Python's DBAPI 2.0
> [1] at the C or C++ level to enable a measure of API standardization
> for Arrow-native read/write interfaces with database drivers. It seems
> like a natural complement to the wire-protocol standardization work
> with FlightSQL. I had previously brought in some code that I had
> worked on related to interfacing with the HiveServer2 wire protocol
> (for Hive and Impala, or other HS2-compatible query engines) with the
> intention of prototyping but never was able to find the time.
>
> From an external messaging standpoint, one thing that will be
> important is to assert that this is not intended to displace or
> deprecate ODBC or JDBC drivers. In fact, I would hope that the
> Arrow-native APIs could be added somehow to existing driver libraries
> where it made sense, so that if they are used in an application that
> uses Arrow, they can opt in to using the Arrow-based APIs for getting
> result sets, or doing bulk inserts, etc.
>
> [1]: https://peps.python.org/pep-0249/
>
> On Tue, Apr 26, 2022 at 12:36 PM Antoine Pitrou <anto...@python.org> wrote:
>>
>>
>> Do we want something more flexible than dlopen() and runtime symbol
>> lookup (a mechanism which constrains the way you can organize and
>> distribute drivers)?
>>
>> For example, perhaps we could expose an API struct of function pointers
>> that could be obtained through driver-specific means.
>>
>>
>> Le 26/04/2022 à 18:29, David Li a écrit :
>> > Hello,
>> >
>> > In light of recent efforts around Flight SQL, projects like pgeon [1], and 
>> > long-standing tickets/discussions about database support in Arrow [2], it 
>> > seems there's an opportunity to define standard database interfaces for 
>> > Arrow that could unify these efforts. So we've put together a proposal for 
>> > "ADBC", a common Arrow-based database client API:
>> >
>> > https://docs.google.com/document/d/1t7NrC76SyxL_OffATmjzZs2xcj1owdUsIF2WKL_Zw1U/edit#heading=h.r6o6j2navi4c
>> >
>> > A common API and implementations could help combine/simplify client-side 
>> > projects like pgeon, or what DBI is considering [3], and help them take 
>> > advantage of developments like Flight SQL and existing columnar APIs.
>> >
>> > We'd appreciate any feedback. (Comments should be open, please let me know 
>> > if not.)
>> >
>> > [1]: https://github.com/0x0L/pgeon
>> > [2]: https://issues.apache.org/jira/browse/ARROW-11670
>> > [3]: https://github.com/r-dbi/dbi3/issues/48
>> >
>> > Thanks,
>> > David
