Thank you Kou!

At least initially, I don't think I'll be able to complete the Dataset 
integration in time, so 10.0.0 probably won't ship with a hard dependency on 
arrow-adbc. That said, I am hoping to have PyArrow take an optional dependency 
on it (so Flight SQL can finally be available from Python).

On Fri, Aug 26, 2022, at 01:01, Sutou Kouhei wrote:
> Hi,
>
> As a maintainer of Linux packages, I want apache/arrow-adbc
> to be released before apache/arrow is released so that
> apache/arrow's .deb/.rpm can depend on apache/arrow-adbc's
> .deb/.rpm.
>
> (If Apache Arrow Dataset uses apache/arrow-adbc,
> apache/arrow's .deb/.rpm needs to depend on
> apache/arrow-adbc's .deb/.rpm.)
>
> We can add .deb/.rpm related files
> (dev/tasks/linux-packages/ in apache/arrow) to
> apache/arrow-adbc to build .deb/.rpm for apache/arrow-adbc.
>
> FYI: I did it for datafusion-contrib/datafusion-c:
>
> * https://github.com/datafusion-contrib/datafusion-c/tree/main/package
> * https://github.com/datafusion-contrib/datafusion-c/blob/main/.github/workflows/package.yaml
>
> I can work on it in apache/arrow-adbc.
>
>
> Thanks,
> -- 
> kou
>
> In <5cbf2923-4fb4-4c5e-b11d-007209fdd...@www.fastmail.com>
>   "Re: [DISC] Improving Arrow's database support" on Thu, 25 Aug 2022 11:51:08 -0400,
>   "David Li" <lidav...@apache.org> wrote:
>
>> Fair enough, thank you. I'll try to expand a bit. (Sorry for the wall of 
>> text that follows…)
>> 
>> These are the components:
>> 
>> - Core adbc.h header
>> - Driver manager for C/C++
>> - Flight SQL-based driver
>> - Postgres-based driver (WIP)
>> - SQLite-based driver (more of a testbed for me than an actual component - I 
>> don't think we'd actually distribute this)
>> - Java core interfaces
>> - Java driver manager
>> - Java JDBC-based driver
>> - Java Flight SQL-based driver
>> - Python driver manager
>> 
>> I think: adbc.h gets mirrored into the Arrow repo. The Flight SQL drivers 
>> get moved to the main Arrow repo and distributed as part of the regular 
>> Arrow releases.
>> 
>> For the rest of the components: they could be packaged individually, but 
>> versioned and released together. Also, each C/C++ driver probably needs a 
>> corresponding Python package so Python users do not have to futz with shared 
>> library configurations. (See [1].) So for instance, installing PyArrow would 
>> also give you the Flight SQL driver, and `pip install adbc_postgres` would 
>> get you the Postgres-based driver.
>> 
>> That would mean setting up separate CI, release, etc. (and eventually 
>> linking Crossbow & Conbench as well?). That does mean duplication of effort, 
>> but the trade-off is avoiding bloating the main release process even 
>> further. However, I'd like to hear from those closer to the release process 
>> on this subject - if it would make people's lives easier, we could merge 
>> everything into one repo/process.
>> 
>> Integrations would be distributed as part of their respective packages (e.g. 
>> Arrow Dataset would optionally link to the driver manager). So the "part of 
>> Arrow 10.0.0" aspect means having a stable interface for adbc.h, and getting 
>> the Flight SQL drivers into the main repo.
>> 
>> [1]: https://github.com/apache/arrow-adbc/issues/53
>> 
>> On Thu, Aug 25, 2022, at 11:34, Antoine Pitrou wrote:
>>> On Fri, 19 Aug 2022 14:09:44 -0400
>>> "David Li" <lidav...@apache.org> wrote:
>>>> Since it's been a while, I'd like to give an update. There are also a few 
>>>> questions I have around distribution.
>>>> 
>>>> Currently:
>>>> - Supported in C, Java, and Python.
>>>> - For C/Python, there are basic drivers wrapping Flight SQL and SQLite, 
>>>> with a draft of a libpq (Postgres) driver (using nanoarrow).
>>>> - For Java, there are drivers wrapping JDBC and Flight SQL.
>>>> - For Python, there are low-level bindings to the C API, and the DBAPI 
>>>> interface on top of that (+a few extension methods resembling 
>>>> DuckDB/Turbodbc).
>>>>  
>>>> There are drafts of integrations with Ibis [1], DBI (R), and DuckDB. (I'd 
>>>> like to thank Hannes and Kirill for their comments, as well as Antoine, 
>>>> Dewey, and Matt here.)
>>>> 
>>>> I'd like to have this as part of 10.0.0 in some fashion. However, I'm not 
>>>> sure how we would like to handle packaging and distribution. In 
>>>> particular, there are several sub-components for each language (the driver 
>>>> manager + the drivers), increasing the work. Any thoughts here?
>>>
>>> Sorry, forgot to answer here. But I think your question is too broadly
>>> formulated. It probably deserves a case-by-case discussion, IMHO.
>>>
>>>> I'm also wondering how we want to handle this in terms of specification - 
>>>> I assume we'd consider the core header file/Java interfaces a spec like 
>>>> the C Data Interface/Flight RPC, and vote on them/mirror them into the 
>>>> format/ directory?
>>>
>>> That sounds like the right way to me indeed.
>>>
>>> Regards
>>>
>>> Antoine.
