Thank you, Kou! At least initially, I don't think I'll be able to complete the Dataset integration in time, so 10.0.0 probably won't ship with a hard dependency on apache/arrow-adbc. That said, I am hoping to have PyArrow take an optional dependency (so Flight SQL can finally be available from Python).
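To illustrate what an optional dependency could look like on the Python side, here is a minimal sketch. It is only an illustration under stated assumptions, not how PyArrow is or will be implemented: the module name `adbc_flight_sql` and both helper functions are hypothetical.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Check whether a top-level module is importable, without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec can raise for malformed names or half-initialized modules;
        # treat those cases as "not available".
        return False


def describe_flight_sql_support(driver_module: str = "adbc_flight_sql") -> str:
    """Report whether the optional driver module is present.

    "adbc_flight_sql" is a hypothetical module name used only for illustration.
    """
    if module_available(driver_module):
        return f"Flight SQL enabled via {driver_module}"
    return f"Flight SQL disabled: optional module {driver_module} is not installed"


if __name__ == "__main__":
    print(describe_flight_sql_support())
```

The point of the pattern is that the core package still imports cleanly when the driver is missing; only the feature that needs it is switched off.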

On Fri, Aug 26, 2022, at 01:01, Sutou Kouhei wrote:
> Hi,
>
> As a maintainer of Linux packages, I want apache/arrow-adbc
> to be released before apache/arrow is released so that
> apache/arrow's .deb/.rpm can depend on apache/arrow-adbc's
> .deb/.rpm.
>
> (If Apache Arrow Dataset uses apache/arrow-adbc,
> apache/arrow's .deb/.rpm needs to depend on
> apache/arrow-adbc's .deb/.rpm.)
>
> We can add .deb/.rpm related files
> (dev/tasks/linux-packages/ in apache/arrow) to
> apache/arrow-adbc to build .deb/.rpm for apache/arrow-adbc.
>
> FYI: I did it for datafusion-contrib/datafusion-c:
>
> * https://github.com/datafusion-contrib/datafusion-c/tree/main/package
> * https://github.com/datafusion-contrib/datafusion-c/blob/main/.github/workflows/package.yaml
>
> I can work on it in apache/arrow-adbc.
>
>
> Thanks,
> --
> kou
>
> In <5cbf2923-4fb4-4c5e-b11d-007209fdd...@www.fastmail.com>
> "Re: [DISC] Improving Arrow's database support" on Thu, 25 Aug 2022 11:51:08 -0400,
> "David Li" <lidav...@apache.org> wrote:
>
>> Fair enough, thank you. I'll try to expand a bit. (Sorry for the wall of
>> text that follows…)
>>
>> These are the components:
>>
>> - Core adbc.h header
>> - Driver manager for C/C++
>> - Flight SQL-based driver
>> - Postgres-based driver (WIP)
>> - SQLite-based driver (more of a testbed for me than an actual component - I
>> don't think we'd actually distribute this)
>> - Java core interfaces
>> - Java driver manager
>> - Java JDBC-based driver
>> - Java Flight SQL-based driver
>> - Python driver manager
>>
>> I think: adbc.h gets mirrored into the Arrow repo. The Flight SQL drivers
>> get moved to the main Arrow repo and distributed as part of the regular
>> Arrow releases.
>>
>> For the rest of the components: they could be packaged individually, but
>> versioned and released together. Also, each C/C++ driver probably needs a
>> corresponding Python package so Python users do not have to futz with shared
>> library configurations. (See [1].) So for instance, installing PyArrow would
>> also give you the Flight SQL driver, and `pip install adbc_postgres` would
>> get you the Postgres-based driver.
>>
>> That would mean setting up separate CI, release, etc. (and eventually
>> linking Crossbow & Conbench as well?). That does mean duplication of effort,
>> but the trade off is avoiding bloating the main release process even
>> further. However, I'd like to hear from those closer to the release process
>> on this subject - if it would make people's lives easier, we could merge
>> everything into one repo/process.
>>
>> Integrations would be distributed as part of their respective packages (e.g.
>> Arrow Dataset would optionally link to the driver manager). So the "part of
>> Arrow 10.0.0" aspect means having a stable interface for adbc.h, and getting
>> the Flight SQL drivers into the main repo.
>>
>> [1]: https://github.com/apache/arrow-adbc/issues/53
>>
>> On Thu, Aug 25, 2022, at 11:34, Antoine Pitrou wrote:
>>> On Fri, 19 Aug 2022 14:09:44 -0400
>>> "David Li" <lidav...@apache.org> wrote:
>>>> Since it's been a while, I'd like to give an update. There are also a few
>>>> questions I have around distribution.
>>>>
>>>> Currently:
>>>> - Supported in C, Java, and Python.
>>>> - For C/Python, there are basic drivers wrapping Flight SQL and SQLite,
>>>> with a draft of a libpq (Postgres) driver (using nanoarrow).
>>>> - For Java, there are drivers wrapping JDBC and Flight SQL.
>>>> - For Python, there's low-level bindings to the C API, and the DBAPI
>>>> interface on top of that (+a few extension methods resembling
>>>> DuckDB/Turbodbc).
>>>>
>>>> There's drafts of integration with Ibis [1], DBI (R), and DuckDB. (I'd
>>>> like to thank Hannes and Kirill for their comments, as well as Antoine,
>>>> Dewey, and Matt here.)
>>>>
>>>> I'd like to have this as part of 10.0.0 in some fashion. However, I'm not
>>>> sure how we would like to handle packaging and distribution. In
>>>> particular, there are several sub-components for each language (the driver
>>>> manager + the drivers), increasing the work. Any thoughts here?
>>>
>>> Sorry, forgot to answer here. But I think your question is too broadly
>>> formulated. It probably deserves a case-by-case discussion, IMHO.
>>>
>>>> I'm also wondering how we want to handle this in terms of specification -
>>>> I assume we'd consider the core header file/Java interfaces a spec like
>>>> the C Data Interface/Flight RPC, and vote on them/mirror them into the
>>>> format/ directory?
>>>
>>> That sounds like the right way to me indeed.
>>>
>>> Regards
>>>
>>> Antoine.