Re: [DISC] Improving Arrow's database support

Matthew Topol Mon, 12 Sep 2022 07:20:00 -0700

Automated semver would be ideal if we can do it.....

There's quite a lot of utilities that exist which would automatically handle the versioning if we're using conventional commits.
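
As a rough illustration of what such a utility does, here is a minimal sketch, assuming the usual Conventional Commits mapping (`fix:` implies a patch bump, `feat:` a minor bump, and a `!` in the header or a `BREAKING CHANGE:` footer a major bump). The names `bump_for` and `next_version` are made up for this example and do not come from any particular tool.

```python
import re
from typing import Iterable

# Conventional Commits header: type, optional (scope), optional "!", then ": description".
HEADER = re.compile(r"^(?P<type>[a-z]+)(\((?P<scope>[^)]+)\))?(?P<bang>!)?: .+")

# Ordering used to pick the largest bump among several commits.
LEVELS = {"none": 0, "patch": 1, "minor": 2, "major": 3}


def bump_for(message: str) -> str:
    """Return the semver level implied by one (squashed) commit message."""
    lines = message.strip().splitlines()
    if not lines:
        return "none"
    match = HEADER.match(lines[0])
    if match is None:
        return "none"  # not a conventional commit; leave it for manual review
    # A "!" in the header or a BREAKING CHANGE footer marks an incompatible change.
    breaking = match["bang"] is not None or any(
        line.startswith(("BREAKING CHANGE:", "BREAKING-CHANGE:")) for line in lines[1:]
    )
    if breaking:
        return "major"
    if match["type"] == "feat":
        return "minor"
    if match["type"] == "fix":
        return "patch"
    return "none"


def next_version(current: str, messages: Iterable[str]) -> str:
    """Apply the largest bump found in `messages` to a MAJOR.MINOR.PATCH string."""
    major, minor, patch = (int(part) for part in current.split("."))
    level = max((bump_for(m) for m in messages), key=LEVELS.__getitem__, default="none")
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    if level == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return current
```

Run per component over the commits since its last tag, something like this could either produce the next version directly or be used to double check a manually chosen one.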

On Mon, Sep 12 2022 at 02:26:15 PM +0200, Jacob Wujciak <ja...@voltrondata.com.INVALID> wrote:

+1 to independent, semver versioning for ADBC.
I would propose we use conventional commit style [1] commit messages for
the PR commits (I assume squash + merge) so we can automate the
versioning or double-check manual versioning.

[1]: <https://www.conventionalcommits.org/>

On Thu, Sep 8, 2022 at 6:05 PM David Li <lidav...@apache.org> wrote:

Thanks all, I've updated the header with the proposed versioning scheme.

At this point I believe the core definitions are ready. (Note that I'm
explicitly punting on [1][2][3] here.) Absent further comments, I'd like to
do the following:

 - Start a vote on mirroring adbc.h to arrow/format, as well as adding
   docs/source/format/ADBC.rst that describes the header, the Java interface,
   the Go interface, and the versioning scheme (I will put up a PR beforehand)
 - Begin work on CI/packaging, with a release hopefully coinciding with
   Arrow 10.0.0
 - Begin work on changes to the main repository, also hopefully in time for
   10.0.0 (moving the Flight SQL driver to be part of apache/arrow; exposing
   it in PyArrow; possibly also exposing Acero via ADBC)

 [1]: <https://github.com/apache/arrow-adbc/issues/46>
 [2]: <https://github.com/apache/arrow-adbc/issues/55>
 [3]: <https://github.com/apache/arrow-adbc/issues/59>

 On Sat, Sep 3, 2022, at 18:36, Matthew Topol wrote:
 > +1 from me on the strategy proposed by Kou.
 >
 > That would be my preference also. I agree it is preferable to be versioned
 > independently.
 >
 > --Matt
 >
 > On Sat, Sep 3, 2022, 6:24 PM Sutou Kouhei <k...@clear-code.com> wrote:
 >
 >> Hi,
 >>
 >> > Do we have a preference for versioning strategy? Should we
 >> > proceed in lockstep with the Arrow C++ library et al. and
 >> > release "ADBC 1.0.0" (the API standard) with "drivers
 >> > version 10.0.0", or use an independent versioning scheme?
 >> > (For example, release API standard and components at
 >> > "1.0.0". Then further releases of components that do not
 >> > change the spec would be "1.1", "1.2", ...; if/when we
 >> > change the spec, start over with "2.0", "2.1", ...)
 >>
 >> I like an independent versioning scheme. I assume that ADBC
 >> doesn't need backward-incompatible changes frequently. How
 >> about incrementing the major version only when ADBC needs
 >> a backward-incompatible change?
 >>
 >> e.g.:
 >>
 >>   1.  Release ADBC (the API standard) 1.0.0
 >>   2.  Release adbc_driver_manager 1.0.0
 >>   3.  Release adbc_driver_postgres 1.0.0
 >>   4.  Add a new feature to adbc_driver_postgres without
 >>       any backward incompatible changes
 >>   5.  Release adbc_driver_postgres 1.1.0
 >>   6.  Fix a bug in adbc_driver_manager without
 >>       any backward incompatible changes
 >>   7.  Release adbc_driver_manager 1.0.1
 >>   8.  Add a backward incompatible change to adbc_driver_manager
 >>   9.  Release adbc_driver_manager 2.0.0
 >>   10. Add a new feature to ADBC without any
 >>       backward incompatible changes
 >>   11. Release ADBC (the API standard) 1.1.0
 >>
 >>
 >> Thanks,
 >> --
 >> kou
 >>
 >> In <7b20d730-b85e-4818-b99e-3335c40c2...@www.fastmail.com>
 >>   "Re: [DISC] Improving Arrow's database support" on Thu, 01 Sep 2022 16:36:43 -0400,
 >>   "David Li" <lidav...@apache.org> wrote:
 >>
 >> > Following up here with some specific questions:
 >> >
 >> > Matt Topol added some Go definitions [1] (thanks!) I'd assume we want
 >> > to vote on those as well?
 >> >
 >> > How should the process work for Java/Go? For C/C++, I assume we'd
 >> > treat it like the C Data Interface and copy adbc.h to format/ after a
 >> > vote, and then vote on releases of components. Or do we really only
 >> > consider the C header as the 'format', with the others being
 >> > language-specific affordances?
 >> >
 >> > What about for Java and for Go? We could vote on and tag a release for
 >> > Go, and add a documentation page that links to the Java/Go definitions
 >> > at a specific revision (as the equivalent 'format' definition for
 >> > Java/Go)? Or would we vendor the entire Java module/Go package as the
 >> > 'format'?
 >> >
 >> > Do we have a preference for versioning strategy? Should we proceed in
 >> > lockstep with the Arrow C++ library et al. and release "ADBC 1.0.0"
 >> > (the API standard) with "drivers version 10.0.0", or use an
 >> > independent versioning scheme? (For example, release API standard and
 >> > components at "1.0.0". Then further releases of components that do not
 >> > change the spec would be "1.1", "1.2", ...; if/when we change the spec,
 >> > start over with "2.0", "2.1", ...)
 >> >
 >> > [1]: <https://github.com/apache/arrow-adbc/blob/main/go/adbc/adbc.go>
 >> >
 >> > -David
 >> >
 >> > On Sun, Aug 28, 2022, at 10:56, Sutou Kouhei wrote:
 >> >> Hi,
 >> >>
 >> >> OK. I'll send pull requests for GLib and Ruby soon.
 >> >>
 >> >>> I'm curious if you have a particular use case in mind.
 >> >>
 >> >> I don't have any production-ready use case yet but I want to
 >> >> implement an Active Record adapter for ADBC. Active Record
 >> >> is the O/R mapper for Ruby on Rails. Implementing Web
 >> >> application by Ruby on Rails is one of major Ruby use
 >> >> cases. So providing Active Record interface for ADBC will
 >> >> increase Apache Arrow users in Ruby community.
 >> >>
 >> >> NOTE: Generally, Ruby on Rails users don't process large
 >> >> data but they sometimes need to process large (medium?) data
 >> >> in a batch process. Active Record adapter for ADBC may be
 >> >> useful for such use case.
 >> >>
 >> >>> There's a little bit more API cleanup to do [1]. If you
 >> >>> have comments on that or anything else, I'd appreciate
 >> >>> them. Otherwise, pull requests would also be appreciated.
 >> >>
 >> >> OK. I'll open issues/pull requests when I find
 >> >> something. For now, I think that "MODULE" type library
 >> >> instead of "SHARED" type library in CMake terminology
 >> >> [cmake] is better for driver modules. (I'll open an issue
 >> >> for this later.)
 >> >>
 >> >> [cmake]: <https://cmake.org/cmake/help/latest/command/add_library.html>
 >> >>
 >> >>
 >> >> Thanks,
 >> >> --
 >> >> kou
 >> >>
 >> >> In <e6380315-94aa-4dd1-8685-268edd597...@www.fastmail.com>
 >> >>   "Re: [DISC] Improving Arrow's database support" on Sat, 27 Aug 2022 15:28:56 -0400,
 >> >>   "David Li" <lidav...@apache.org> wrote:
 >> >>
 >> >>> I would be very happy to see GLib/Ruby bindings! I'm curious if you
 >> >>> have a particular use case in mind.
 >> >>>
 >> >>> There's a little bit more API cleanup to do [1]. If you have comments
 >> >>> on that or anything else, I'd appreciate them. Otherwise, pull requests
 >> >>> would also be appreciated.
 >> >>>
 >> >>> [1]: <https://github.com/apache/arrow-adbc/issues/79>
 >> >>>
 >> >>> On Fri, Aug 26, 2022, at 21:53, Sutou Kouhei wrote:
 >> >>>> Hi,
 >> >>>>
 >> >>>> Thanks for sharing the current status!
 >> >>>> I understand.
 >> >>>>
 >> >>>> BTW, can I add GLib/Ruby bindings to apache/arrow-adbc
 >> >>>> before we release the first version? (I want to use ADBC
 >> >>>> from Ruby.) Or should I wait for the first release? If I can
 >> >>>> work on it now, I'll open pull requests for it.
 >> >>>>
 >> >>>> Thanks,
 >> >>>> --
 >> >>>> kou
 >> >>>>
 >> >>>> In <8703efd9-51bd-4f91-b550-73830667d...@www.fastmail.com>
 >> >>>>   "Re: [DISC] Improving Arrow's database support" on Fri, 26 Aug 2022 11:03:26 -0400,
 >> >>>>   "David Li" <lidav...@apache.org> wrote:
 >> >>>>
 >> >>>>> Thank you Kou!
 >> >>>>>
 >> >>>>> At least initially, I don't think I'll be able to complete the
 >> >>>>> Dataset integration in time. So 10.0.0 probably won't ship with a hard
 >> >>>>> dependency. That said I am hoping to have PyArrow take an optional
 >> >>>>> dependency (so Flight SQL can finally be available from Python).
 >> >>>>>
 >> >>>>> On Fri, Aug 26, 2022, at 01:01, Sutou Kouhei wrote:
 >> >>>>>> Hi,
 >> >>>>>>
 >> >>>>>> As a maintainer of Linux packages, I want apache/arrow-adbc
 >> >>>>>> to be released before apache/arrow is released so that
 >> >>>>>> apache/arrow's .deb/.rpm can depend on apache/arrow-adbc's
 >> >>>>>> .deb/.rpm.
 >> >>>>>>
 >> >>>>>> (If Apache Arrow Dataset uses apache/arrow-adbc,
 >> >>>>>> apache/arrow's .deb/.rpm needs to depend on
 >> >>>>>> apache/arrow-adbc's .deb/.rpm.)
 >> >>>>>>
 >> >>>>>> We can add .deb/.rpm related files
 >> >>>>>> (dev/tasks/linux-packages/ in apache/arrow) to
 >> >>>>>> apache/arrow-adbc to build .deb/.rpm for apache/arrow-adbc.
 >> >>>>>>
 >> >>>>>> FYI: I did it for datafusion-contrib/datafusion-c:
 >> >>>>>>
 >> >>>>>> * <https://github.com/datafusion-contrib/datafusion-c/tree/main/package>
 >> >>>>>> * <https://github.com/datafusion-contrib/datafusion-c/blob/main/.github/workflows/package.yaml>
 >> >>>>>>
 >> >>>>>> I can work on it in apache/arrow-adbc.
 >> >>>>>>
 >> >>>>>>
 >> >>>>>> Thanks,
 >> >>>>>> --
 >> >>>>>> kou
 >> >>>>>>
 >> >>>>>> In <5cbf2923-4fb4-4c5e-b11d-007209fdd...@www.fastmail.com>
 >> >>>>>>   "Re: [DISC] Improving Arrow's database support" on Thu, 25 Aug 2022 11:51:08 -0400,
 >> >>>>>>   "David Li" <lidav...@apache.org> wrote:
 >> >>>>>>
 >> >>>>>>> Fair enough, thank you. I'll try to expand a bit. (Sorry for the
 >> >>>>>>> wall of text that follows…)
 >> >>>>>>>
 >> >>>>>>> These are the components:
 >> >>>>>>>
 >> >>>>>>> - Core adbc.h header
 >> >>>>>>> - Driver manager for C/C++
 >> >>>>>>> - Flight SQL-based driver
 >> >>>>>>> - Postgres-based driver (WIP)
 >> >>>>>>> - SQLite-based driver (more of a testbed for me than an actual
 >> >>>>>>>   component - I don't think we'd actually distribute this)
 >> >>>>>>> - Java core interfaces
 >> >>>>>>> - Java driver manager
 >> >>>>>>> - Java JDBC-based driver
 >> >>>>>>> - Java Flight SQL-based driver
 >> >>>>>>> - Python driver manager
 >> >>>>>>>
 >> >>>>>>> I think: adbc.h gets mirrored into the Arrow repo. The Flight SQL
 >> >>>>>>> drivers get moved to the main Arrow repo and distributed as part of the
 >> >>>>>>> regular Arrow releases.
 >> >>>>>>>
 >> >>>>>>> For the rest of the components: they could be packaged
 >> >>>>>>> individually, but versioned and released together. Also, each C/C++ driver
 >> >>>>>>> probably needs a corresponding Python package so Python users do not have
 >> >>>>>>> to futz with shared library configurations. (See [1].) So for instance,
 >> >>>>>>> installing PyArrow would also give you the Flight SQL driver, and `pip
 >> >>>>>>> install adbc_postgres` would get you the Postgres-based driver.
 >> >>>>>>>
 >> >>>>>>> That would mean setting up separate CI, release, etc. (and
 >> >>>>>>> eventually linking Crossbow & Conbench as well?). That does mean
 >> >>>>>>> duplication of effort, but the trade off is avoiding bloating the main
 >> >>>>>>> release process even further. However, I'd like to hear from those closer
 >> >>>>>>> to the release process on this subject - if it would make people's lives
 >> >>>>>>> easier, we could merge everything into one repo/process.
 >> >>>>>>>
 >> >>>>>>> Integrations would be distributed as part of their respective
 >> >>>>>>> packages (e.g. Arrow Dataset would optionally link to the driver manager).
 >> >>>>>>> So the "part of Arrow 10.0.0" aspect means having a stable interface for
 >> >>>>>>> adbc.h, and getting the Flight SQL drivers into the main repo.
 >> >>>>>>>
 >> >>>>>>> [1]: <https://github.com/apache/arrow-adbc/issues/53>
 >> >>>>>>>
 >> >>>>>>> On Thu, Aug 25, 2022, at 11:34, Antoine Pitrou wrote:
 >> >>>>>>>> On Fri, 19 Aug 2022 14:09:44 -0400
 >> >>>>>>>> "David Li" <lidav...@apache.org> wrote:
 >> >>>>>>>>> Since it's been a while, I'd like to give an update. There are
 >> >>>>>>>>> also a few questions I have around distribution.
 >> >>>>>>>>>
 >> >>>>>>>>> Currently:
 >> >>>>>>>>> - Supported in C, Java, and Python.
 >> >>>>>>>>> - For C/Python, there are basic drivers wrapping Flight SQL and
 >> >>>>>>>>>   SQLite, with a draft of a libpq (Postgres) driver (using nanoarrow).
 >> >>>>>>>>> - For Java, there are drivers wrapping JDBC and Flight SQL.
 >> >>>>>>>>> - For Python, there's low-level bindings to the C API, and the
 >> >>>>>>>>>   DBAPI interface on top of that (+a few extension methods resembling
 >> >>>>>>>>>   DuckDB/Turbodbc).
 >> >>>>>>>>>
 >> >>>>>>>>> There's drafts of integration with Ibis [1], DBI (R), and
 >> >>>>>>>>> DuckDB. (I'd like to thank Hannes and Kirill for their comments, as well as
 >> >>>>>>>>> Antoine, Dewey, and Matt here.)
 >> >>>>>>>>>
 >> >>>>>>>>> I'd like to have this as part of 10.0.0 in some fashion.
 >> >>>>>>>>> However, I'm not sure how we would like to handle packaging and
 >> >>>>>>>>> distribution. In particular, there are several sub-components for each
 >> >>>>>>>>> language (the driver manager + the drivers), increasing the work. Any
 >> >>>>>>>>> thoughts here?
 >> >>>>>>>>
 >> >>>>>>>> Sorry, forgot to answer here. But I think your question is too broadly
 >> >>>>>>>> formulated. It probably deserves a case-by-case discussion, IMHO.
 >> >>>>>>>>
 >> >>>>>>>>> I'm also wondering how we want to handle this in terms of
 >> >>>>>>>>> specification - I assume we'd consider the core header file/Java interfaces
 >> >>>>>>>>> a spec like the C Data Interface/Flight RPC, and vote on them/mirror them
 >> >>>>>>>>> into the format/ directory?
 >> >>>>>>>>
 >> >>>>>>>> That sounds like the right way to me indeed.
 >> >>>>>>>>
 >> >>>>>>>> Regards
 >> >>>>>>>>
 >> >>>>>>>> Antoine.
 >>
