Hi,

> Do we have a preference for versioning strategy? Should we
> proceed in lockstep with the Arrow C++ library et al. and
> release "ADBC 1.0.0" (the API standard) with "drivers
> version 10.0.0", or use an independent versioning scheme?
> (For example, release API standard and components at
> "1.0.0". Then further releases of components that do not
> change the spec would be "1.1", "1.2", ...; if/when we
> change the spec, start over with "2.0", "2.1", ...)

I like an independent versioning scheme. I assume that ADBC
doesn't need backward incompatible changes frequently. How
about incrementing the major version only when ADBC needs a
backward incompatible change?

e.g.:

  1. Release ADBC (the API standard) 1.0.0
  2. Release adbc_driver_manager 1.0.0
  3. Release adbc_driver_postgres 1.0.0
  4. Add a new feature to adbc_driver_postgres without any backward incompatible changes
  5. Release adbc_driver_postgres 1.1.0
  6. Fix a bug in adbc_driver_manager without any backward incompatible changes
  7. Release adbc_driver_manager 1.0.1
  8. Add a backward incompatible change to adbc_driver_manager
  9. Release adbc_driver_manager 2.0.0
  10. Add a new feature to ADBC without any backward incompatible changes
  11. Release ADBC (the API standard) 1.1.0

Thanks,
--
kou

In <7b20d730-b85e-4818-b99e-3335c40c2...@www.fastmail.com>
"Re: [DISC] Improving Arrow's database support" on Thu, 01 Sep 2022 16:36:43 -0400,
"David Li" <lidav...@apache.org> wrote:

> Following up here with some specific questions:
>
> Matt Topol added some Go definitions [1] (thanks!) I'd assume we want to vote
> on those as well?
>
> How should the process work for Java/Go? For C/C++, I assume we'd treat it
> like the C Data Interface and copy adbc.h to format/ after a vote, and then
> vote on releases of components. Or do we really only consider the C header as
> the 'format', with the others being language-specific affordances?
>
> What about for Java and for Go? We could vote on and tag a release for Go,
> and add a documentation page that links to the Java/Go definitions at a
> specific revision (as the equivalent 'format' definition for Java/Go)? Or
> would we vendor the entire Java module/Go package as the 'format'?
>
> Do we have a preference for versioning strategy? Should we proceed in
> lockstep with the Arrow C++ library et al. and release "ADBC 1.0.0" (the API
> standard) with "drivers version 10.0.0", or use an independent versioning
> scheme?
> (For example, release API standard and components at "1.0.0". Then
> further releases of components that do not change the spec would be "1.1",
> "1.2", ...; if/when we change the spec, start over with "2.0", "2.1", ...)
>
> [1]: https://github.com/apache/arrow-adbc/blob/main/go/adbc/adbc.go
>
> -David
>
> On Sun, Aug 28, 2022, at 10:56, Sutou Kouhei wrote:
>> Hi,
>>
>> OK. I'll send pull requests for GLib and Ruby soon.
>>
>>> I'm curious if you have a particular use case in mind.
>>
>> I don't have any production-ready use case yet, but I want to
>> implement an Active Record adapter for ADBC. Active Record
>> is the O/R mapper for Ruby on Rails. Implementing Web
>> applications with Ruby on Rails is one of the major Ruby use
>> cases. So providing an Active Record interface for ADBC will
>> increase the number of Apache Arrow users in the Ruby community.
>>
>> NOTE: Generally, Ruby on Rails users don't process large
>> data, but they sometimes need to process large (medium?) data
>> in a batch process. An Active Record adapter for ADBC may be
>> useful for such use cases.
>>
>>> There's a little bit more API cleanup to do [1]. If you
>>> have comments on that or anything else, I'd appreciate
>>> them. Otherwise, pull requests would also be appreciated.
>>
>> OK. I'll open issues/pull requests when I find
>> something. For now, I think that a "MODULE" type library
>> instead of a "SHARED" type library in CMake terminology
>> [cmake] is better for driver modules. (I'll open an issue
>> for this later.)
>>
>> [cmake]: https://cmake.org/cmake/help/latest/command/add_library.html
>>
>>
>> Thanks,
>> --
>> kou
>>
>> In <e6380315-94aa-4dd1-8685-268edd597...@www.fastmail.com>
>> "Re: [DISC] Improving Arrow's database support" on Sat, 27 Aug 2022
>> 15:28:56 -0400,
>> "David Li" <lidav...@apache.org> wrote:
>>
>>> I would be very happy to see GLib/Ruby bindings! I'm curious if you have a
>>> particular use case in mind.
>>>
>>> There's a little bit more API cleanup to do [1].
>>> If you have comments on that or anything else, I'd appreciate them.
>>> Otherwise, pull requests would also be appreciated.
>>>
>>> [1]: https://github.com/apache/arrow-adbc/issues/79
>>>
>>> On Fri, Aug 26, 2022, at 21:53, Sutou Kouhei wrote:
>>>> Hi,
>>>>
>>>> Thanks for sharing the current status!
>>>> I understand.
>>>>
>>>> BTW, can I add GLib/Ruby bindings to apache/arrow-adbc
>>>> before we release the first version? (I want to use ADBC
>>>> from Ruby.) Or should I wait for the first release? If I can
>>>> work on it now, I'll open pull requests for it.
>>>>
>>>> Thanks,
>>>> --
>>>> kou
>>>>
>>>> In <8703efd9-51bd-4f91-b550-73830667d...@www.fastmail.com>
>>>> "Re: [DISC] Improving Arrow's database support" on Fri, 26 Aug 2022
>>>> 11:03:26 -0400,
>>>> "David Li" <lidav...@apache.org> wrote:
>>>>
>>>>> Thank you Kou!
>>>>>
>>>>> At least initially, I don't think I'll be able to complete the Dataset
>>>>> integration in time. So 10.0.0 probably won't ship with a hard
>>>>> dependency. That said, I am hoping to have PyArrow take an optional
>>>>> dependency (so Flight SQL can finally be available from Python).
>>>>>
>>>>> On Fri, Aug 26, 2022, at 01:01, Sutou Kouhei wrote:
>>>>>> Hi,
>>>>>>
>>>>>> As a maintainer of Linux packages, I want apache/arrow-adbc
>>>>>> to be released before apache/arrow is released so that
>>>>>> apache/arrow's .deb/.rpm can depend on apache/arrow-adbc's
>>>>>> .deb/.rpm.
>>>>>>
>>>>>> (If Apache Arrow Dataset uses apache/arrow-adbc,
>>>>>> apache/arrow's .deb/.rpm needs to depend on
>>>>>> apache/arrow-adbc's .deb/.rpm.)
>>>>>>
>>>>>> We can add .deb/.rpm related files
>>>>>> (dev/tasks/linux-packages/ in apache/arrow) to
>>>>>> apache/arrow-adbc to build .deb/.rpm for apache/arrow-adbc.
>>>>>>
>>>>>> FYI: I did it for datafusion-contrib/datafusion-c:
>>>>>>
>>>>>> * https://github.com/datafusion-contrib/datafusion-c/tree/main/package
>>>>>> * https://github.com/datafusion-contrib/datafusion-c/blob/main/.github/workflows/package.yaml
>>>>>>
>>>>>> I can work on it in apache/arrow-adbc.
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> --
>>>>>> kou
>>>>>>
>>>>>> In <5cbf2923-4fb4-4c5e-b11d-007209fdd...@www.fastmail.com>
>>>>>> "Re: [DISC] Improving Arrow's database support" on Thu, 25 Aug 2022
>>>>>> 11:51:08 -0400,
>>>>>> "David Li" <lidav...@apache.org> wrote:
>>>>>>
>>>>>>> Fair enough, thank you. I'll try to expand a bit. (Sorry for the wall
>>>>>>> of text that follows…)
>>>>>>>
>>>>>>> These are the components:
>>>>>>>
>>>>>>> - Core adbc.h header
>>>>>>> - Driver manager for C/C++
>>>>>>> - Flight SQL-based driver
>>>>>>> - Postgres-based driver (WIP)
>>>>>>> - SQLite-based driver (more of a testbed for me than an actual
>>>>>>> component; I don't think we'd actually distribute this)
>>>>>>> - Java core interfaces
>>>>>>> - Java driver manager
>>>>>>> - Java JDBC-based driver
>>>>>>> - Java Flight SQL-based driver
>>>>>>> - Python driver manager
>>>>>>>
>>>>>>> I think: adbc.h gets mirrored into the Arrow repo. The Flight SQL
>>>>>>> drivers get moved to the main Arrow repo and distributed as part of the
>>>>>>> regular Arrow releases.
>>>>>>>
>>>>>>> For the rest of the components: they could be packaged individually,
>>>>>>> but versioned and released together. Also, each C/C++ driver probably
>>>>>>> needs a corresponding Python package so Python users do not have to
>>>>>>> futz with shared library configurations. (See [1].) So for instance,
>>>>>>> installing PyArrow would also give you the Flight SQL driver, and
>>>>>>> `pip install adbc_postgres` would get you the Postgres-based driver.
>>>>>>>
>>>>>>> That would mean setting up separate CI, release, etc. (and eventually
>>>>>>> linking Crossbow & Conbench as well?).
>>>>>>> That does mean duplication of effort, but the trade-off is avoiding
>>>>>>> bloating the main release process even further. However, I'd like to
>>>>>>> hear from those closer to the release process on this subject: if it
>>>>>>> would make people's lives easier, we could merge everything into one
>>>>>>> repo/process.
>>>>>>>
>>>>>>> Integrations would be distributed as part of their respective packages
>>>>>>> (e.g. Arrow Dataset would optionally link to the driver manager). So
>>>>>>> the "part of Arrow 10.0.0" aspect means having a stable interface for
>>>>>>> adbc.h, and getting the Flight SQL drivers into the main repo.
>>>>>>>
>>>>>>> [1]: https://github.com/apache/arrow-adbc/issues/53
>>>>>>>
>>>>>>> On Thu, Aug 25, 2022, at 11:34, Antoine Pitrou wrote:
>>>>>>>> On Fri, 19 Aug 2022 14:09:44 -0400
>>>>>>>> "David Li" <lidav...@apache.org> wrote:
>>>>>>>>> Since it's been a while, I'd like to give an update. There are also a
>>>>>>>>> few questions I have around distribution.
>>>>>>>>>
>>>>>>>>> Currently:
>>>>>>>>> - Supported in C, Java, and Python.
>>>>>>>>> - For C/Python, there are basic drivers wrapping Flight SQL and
>>>>>>>>> SQLite, with a draft of a libpq (Postgres) driver (using nanoarrow).
>>>>>>>>> - For Java, there are drivers wrapping JDBC and Flight SQL.
>>>>>>>>> - For Python, there are low-level bindings to the C API, and the DBAPI
>>>>>>>>> interface on top of that (+a few extension methods resembling
>>>>>>>>> DuckDB/Turbodbc).
>>>>>>>>>
>>>>>>>>> There are drafts of integration with Ibis [1], DBI (R), and DuckDB.
>>>>>>>>> (I'd like to thank Hannes and Kirill for their comments, as well as
>>>>>>>>> Antoine, Dewey, and Matt here.)
>>>>>>>>>
>>>>>>>>> I'd like to have this as part of 10.0.0 in some fashion. However, I'm
>>>>>>>>> not sure how we would like to handle packaging and distribution. In
>>>>>>>>> particular, there are several sub-components for each language (the
>>>>>>>>> driver manager + the drivers), increasing the work.
>>>>>>>>> Any thoughts here?
>>>>>>>>
>>>>>>>> Sorry, forgot to answer here. But I think your question is too broadly
>>>>>>>> formulated. It probably deserves a case-by-case discussion, IMHO.
>>>>>>>>
>>>>>>>>> I'm also wondering how we want to handle this in terms of
>>>>>>>>> specification; I assume we'd consider the core header file/Java
>>>>>>>>> interfaces a spec like the C Data Interface/Flight RPC, and vote on
>>>>>>>>> them/mirror them into the format/ directory?
>>>>>>>>
>>>>>>>> That sounds like the right way to me indeed.
>>>>>>>>
>>>>>>>> Regards
>>>>>>>>
>>>>>>>> Antoine.
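A minimal Python sketch of the per-component versioning scheme kou proposes at the top of this thread, assuming plain MAJOR.MINOR.PATCH numbers: a backward incompatible change bumps MAJOR, a backward compatible feature bumps MINOR, a bug fix bumps PATCH, and every component (the API standard, the driver manager, each driver) keeps its own independent version. `Change`, `Version`, `replay`, and `EXAMPLE_HISTORY` are hypothetical names for illustration only, not part of any ADBC tooling; the history simply replays kou's numbered example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, Optional, Tuple


class Change(Enum):
    """Kind of change shipped in a release (hypothetical labels)."""
    BUG_FIX = "bug_fix"    # backward compatible fix -> bump PATCH
    FEATURE = "feature"    # backward compatible addition -> bump MINOR
    BREAKING = "breaking"  # backward incompatible change -> bump MAJOR


@dataclass(frozen=True)
class Version:
    major: int
    minor: int
    patch: int

    def bump(self, change: Change) -> "Version":
        """Return the next version for the given kind of change."""
        if change is Change.BREAKING:
            return Version(self.major + 1, 0, 0)
        if change is Change.FEATURE:
            return Version(self.major, self.minor + 1, 0)
        return Version(self.major, self.minor, self.patch + 1)

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"


def replay(history: Iterable[Tuple[str, Optional[Change]]]) -> Dict[str, str]:
    """Replay releases in order; each component has its own version counter.

    A change of None marks a component's first release (1.0.0).
    """
    versions: Dict[str, Version] = {}
    for component, change in history:
        if change is None:
            versions[component] = Version(1, 0, 0)
        else:
            versions[component] = versions[component].bump(change)
    return {name: str(v) for name, v in versions.items()}


# kou's numbered example, one entry per release (hypothetical encoding).
EXAMPLE_HISTORY = [
    ("ADBC (API standard)", None),              # 1.0.0
    ("adbc_driver_manager", None),              # 1.0.0
    ("adbc_driver_postgres", None),             # 1.0.0
    ("adbc_driver_postgres", Change.FEATURE),   # 1.1.0
    ("adbc_driver_manager", Change.BUG_FIX),    # 1.0.1
    ("adbc_driver_manager", Change.BREAKING),   # 2.0.0
    ("ADBC (API standard)", Change.FEATURE),    # 1.1.0
]

if __name__ == "__main__":
    for component, version in replay(EXAMPLE_HISTORY).items():
        print(f"{component}: {version}")
```

Replaying the example ends with the API standard at 1.1.0, adbc_driver_manager at 2.0.0, and adbc_driver_postgres at 1.1.0. Because no version counter is shared, a breaking change in the driver manager does not force a new version of the API standard or of the drivers, which is the property that distinguishes this scheme from releasing in lockstep with Arrow.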