Thanks all, I've updated the header with the proposed versioning scheme.

At this point I believe the core definitions are ready. (Note that I'm
explicitly punting on [1][2][3] here.) Absent further comments, I'd like
to do the following:

- Start a vote on mirroring adbc.h to arrow/format, as well as adding
  docs/source/format/ADBC.rst that describes the header, the Java
  interface, the Go interface, and the versioning scheme (I will put up
  a PR beforehand)
- Begin work on CI/packaging, with a release hopefully coinciding with
  Arrow 10.0.0
- Begin work on changes to the main repository, also hopefully in time
  for 10.0.0 (moving the Flight SQL driver to be part of apache/arrow;
  exposing it in PyArrow; possibly also exposing Acero via ADBC)
[1]: <https://github.com/apache/arrow-adbc/issues/46>
[2]: <https://github.com/apache/arrow-adbc/issues/55>
[3]: <https://github.com/apache/arrow-adbc/issues/59>
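
For illustration only, the independent scheme Kou describes in the quoted
message below (bump the major version only for backward-incompatible
changes) could be sketched roughly like this. The function name `bump` and
the change labels "incompatible", "feature", and "fix" are made up for this
example and are not part of ADBC; the component names are the ones from
Kou's list.

```python
def bump(version: str, change: str) -> str:
    """Return the next version of one component (hypothetical helper)."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "incompatible":
        # Backward-incompatible change: new major version, reset the rest.
        return f"{major + 1}.0.0"
    if change == "feature":
        # Backward-compatible feature: new minor version.
        return f"{major}.{minor + 1}.0"
    if change == "fix":
        # Backward-compatible bug fix: new patch version.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change kind: {change!r}")


# The API standard and each component carry their own version.
versions = {
    "adbc": "1.0.0",  # the API standard
    "adbc_driver_manager": "1.0.0",
    "adbc_driver_postgres": "1.0.0",
}

# Replay the sequence from Kou's example.
versions["adbc_driver_postgres"] = bump(versions["adbc_driver_postgres"], "feature")
versions["adbc_driver_manager"] = bump(versions["adbc_driver_manager"], "fix")
versions["adbc_driver_manager"] = bump(versions["adbc_driver_manager"], "incompatible")
versions["adbc"] = bump(versions["adbc"], "feature")
```

The point of keeping one version per component is that a breaking change in
the driver manager does not force a new major version of the API standard or
of the other drivers.
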
On Sat, Sep 3, 2022, at 18:36, Matthew Topol wrote:
> +1 from me on the strategy proposed by Kou.
>
> That would be my preference also. I agree it is preferable to be
> versioned independently.
>
> --Matt
>
> On Sat, Sep 3, 2022, 6:24 PM Sutou Kouhei <k...@clear-code.com> wrote:
>
>> Hi,
>>
>> > Do we have a preference for versioning strategy? Should we
>> > proceed in lockstep with the Arrow C++ library et al. and
>> > release "ADBC 1.0.0" (the API standard) with "drivers
>> > version 10.0.0", or use an independent versioning scheme?
>> > (For example, release API standard and components at
>> > "1.0.0". Then further releases of components that do not
>> > change the spec would be "1.1", "1.2", ...; if/when we
>> > change the spec, start over with "2.0", "2.1", ...)
>>
>> I like an independent versioning scheme. I assume that ADBC
>> doesn't need backward incompatible changes frequently. How
>> about incrementing the major version only when ADBC needs
>> a backward incompatible change?
>>
>> e.g.:
>>
>> 1. Release ADBC (the API standard) 1.0.0
>> 2. Release adbc_driver_manager 1.0.0
>> 3. Release adbc_driver_postgres 1.0.0
>> 4. Add a new feature to adbc_driver_postgres without
>>    any backward incompatible changes
>> 5. Release adbc_driver_postgres 1.1.0
>> 6. Fix a bug in adbc_driver_manager without
>>    any backward incompatible changes
>> 7. Release adbc_driver_manager 1.0.1
>> 8. Add a backward incompatible change to adbc_driver_manager
>> 9. Release adbc_driver_manager 2.0.0
>> 10. Add a new feature to ADBC without any
>>     backward incompatible changes
>> 11. Release ADBC (the API standard) 1.1.0
>>
>>
>> Thanks,
>> --
>> kou
>>
>> In <7b20d730-b85e-4818-b99e-3335c40c2...@www.fastmail.com>
>> "Re: [DISC] Improving Arrow's database support" on Thu, 01 Sep 2022
>> 16:36:43 -0400,
>> "David Li" <lidav...@apache.org> wrote:
>>
>> > Following up here with some specific questions:
>> >
>> > Matt Topol added some Go definitions [1] (thanks!). I'd assume we
>> > want to vote on those as well?
>> >
>> > How should the process work for Java/Go? For C/C++, I assume we'd
>> > treat it like the C Data Interface and copy adbc.h to format/ after
>> > a vote, and then vote on releases of components. Or do we really
>> > only consider the C header as the 'format', with the others being
>> > language-specific affordances?
>> >
>> > What about for Java and for Go? We could vote on and tag a release
>> > for Go, and add a documentation page that links to the Java/Go
>> > definitions at a specific revision (as the equivalent 'format'
>> > definition for Java/Go)? Or would we vendor the entire Java
>> > module/Go package as the 'format'?
>> >
>> > Do we have a preference for versioning strategy? Should we proceed
>> > in lockstep with the Arrow C++ library et al. and release "ADBC
>> > 1.0.0" (the API standard) with "drivers version 10.0.0", or use an
>> > independent versioning scheme? (For example, release API standard
>> > and components at "1.0.0". Then further releases of components that
>> > do not change the spec would be "1.1", "1.2", ...; if/when we
>> > change the spec, start over with "2.0", "2.1", ...)
>> >
>> > [1]: <https://github.com/apache/arrow-adbc/blob/main/go/adbc/adbc.go>
>> >
>> > -David
>> >
>> > On Sun, Aug 28, 2022, at 10:56, Sutou Kouhei wrote:
>> >> Hi,
>> >>
>> >> OK. I'll send pull requests for GLib and Ruby soon.
>> >>
>> >>> I'm curious if you have a particular use case in mind.
>> >>
>> >> I don't have any production-ready use case yet, but I want to
>> >> implement an Active Record adapter for ADBC. Active Record
>> >> is the O/R mapper for Ruby on Rails. Implementing Web
>> >> applications with Ruby on Rails is one of the major Ruby use
>> >> cases, so providing an Active Record interface for ADBC will
>> >> increase Apache Arrow users in the Ruby community.
>> >>
>> >> NOTE: Generally, Ruby on Rails users don't process large
>> >> data, but they sometimes need to process large (medium?) data
>> >> in a batch process. An Active Record adapter for ADBC may be
>> >> useful for such use cases.
>> >>
>> >>> There's a little bit more API cleanup to do [1]. If you
>> >>> have comments on that or anything else, I'd appreciate
>> >>> them. Otherwise, pull requests would also be appreciated.
>> >>
>> >> OK. I'll open issues/pull requests when I find
>> >> something. For now, I think that a "MODULE" type library,
>> >> rather than a "SHARED" type library in CMake terminology
>> >> [cmake], is better for driver modules. (I'll open an issue
>> >> for this later.)
>> >>
>> >> [cmake]: <https://cmake.org/cmake/help/latest/command/add_library.html>
>> >>
>> >>
>> >> Thanks,
>> >> --
>> >> kou
>> >>
>> >> In <e6380315-94aa-4dd1-8685-268edd597...@www.fastmail.com>
>> >> "Re: [DISC] Improving Arrow's database support" on Sat, 27 Aug 2022
>> >> 15:28:56 -0400,
>> >> "David Li" <lidav...@apache.org> wrote:
>> >>
>> >>> I would be very happy to see GLib/Ruby bindings! I'm curious if
>> >>> you have a particular use case in mind.
>> >>>
>> >>> There's a little bit more API cleanup to do [1]. If you have
>> >>> comments on that or anything else, I'd appreciate them. Otherwise,
>> >>> pull requests would also be appreciated.
>> >>>
>> >>> [1]: <https://github.com/apache/arrow-adbc/issues/79>
>> >>>
>> >>> On Fri, Aug 26, 2022, at 21:53, Sutou Kouhei wrote:
>> >>>> Hi,
>> >>>>
>> >>>> Thanks for sharing the current status!
>> >>>> I understand.
>> >>>>
>> >>>> BTW, can I add GLib/Ruby bindings to apache/arrow-adbc
>> >>>> before we release the first version? (I want to use ADBC
>> >>>> from Ruby.) Or should I wait for the first release? If I can
>> >>>> work on it now, I'll open pull requests for it.
>> >>>>
>> >>>> Thanks,
>> >>>> --
>> >>>> kou
>> >>>>
>> >>>> In <8703efd9-51bd-4f91-b550-73830667d...@www.fastmail.com>
>> >>>> "Re: [DISC] Improving Arrow's database support" on Fri, 26 Aug 2022
>> >>>> 11:03:26 -0400,
>> >>>> "David Li" <lidav...@apache.org> wrote:
>> >>>>
>> >>>>> Thank you Kou!
>> >>>>>
>> >>>>> At least initially, I don't think I'll be able to complete the
>> >>>>> Dataset integration in time. So 10.0.0 probably won't ship with
>> >>>>> a hard dependency. That said, I am hoping to have PyArrow take
>> >>>>> an optional dependency (so Flight SQL can finally be available
>> >>>>> from Python).
>> >>>>>
>> >>>>> On Fri, Aug 26, 2022, at 01:01, Sutou Kouhei wrote:
>> >>>>>> Hi,
>> >>>>>>
>> >>>>>> As a maintainer of Linux packages, I want apache/arrow-adbc
>> >>>>>> to be released before apache/arrow is released so that
>> >>>>>> apache/arrow's .deb/.rpm can depend on apache/arrow-adbc's
>> >>>>>> .deb/.rpm.
>> >>>>>>
>> >>>>>> (If Apache Arrow Dataset uses apache/arrow-adbc,
>> >>>>>> apache/arrow's .deb/.rpm needs to depend on
>> >>>>>> apache/arrow-adbc's .deb/.rpm.)
>> >>>>>>
>> >>>>>> We can add .deb/.rpm related files
>> >>>>>> (dev/tasks/linux-packages/ in apache/arrow) to
>> >>>>>> apache/arrow-adbc to build .deb/.rpm for apache/arrow-adbc.
>> >>>>>>
>> >>>>>> FYI: I did it for datafusion-contrib/datafusion-c:
>> >>>>>>
>> >>>>>> * <https://github.com/datafusion-contrib/datafusion-c/tree/main/package>
>> >>>>>> * <https://github.com/datafusion-contrib/datafusion-c/blob/main/.github/workflows/package.yaml>
>> >>>>>>
>> >>>>>> I can work on it in apache/arrow-adbc.
>> >>>>>>
>> >>>>>>
>> >>>>>> Thanks,
>> >>>>>> --
>> >>>>>> kou
>> >>>>>>
>> >>>>>> In <5cbf2923-4fb4-4c5e-b11d-007209fdd...@www.fastmail.com>
>> >>>>>> "Re: [DISC] Improving Arrow's database support" on Thu, 25 Aug 2022
>> >>>>>> 11:51:08 -0400,
>> >>>>>> "David Li" <lidav...@apache.org> wrote:
>> >>>>>>
>> >>>>>>> Fair enough, thank you. I'll try to expand a bit. (Sorry for
>> >>>>>>> the wall of text that follows…)
>> >>>>>>>
>> >>>>>>> These are the components:
>> >>>>>>>
>> >>>>>>> - Core adbc.h header
>> >>>>>>> - Driver manager for C/C++
>> >>>>>>> - Flight SQL-based driver
>> >>>>>>> - Postgres-based driver (WIP)
>> >>>>>>> - SQLite-based driver (more of a testbed for me than an actual
>> >>>>>>>   component - I don't think we'd actually distribute this)
>> >>>>>>> - Java core interfaces
>> >>>>>>> - Java driver manager
>> >>>>>>> - Java JDBC-based driver
>> >>>>>>> - Java Flight SQL-based driver
>> >>>>>>> - Python driver manager
>> >>>>>>>
>> >>>>>>> I think: adbc.h gets mirrored into the Arrow repo. The Flight
>> >>>>>>> SQL drivers get moved to the main Arrow repo and distributed
>> >>>>>>> as part of the regular Arrow releases.
>> >>>>>>>
>> >>>>>>> For the rest of the components: they could be packaged
>> >>>>>>> individually, but versioned and released together. Also, each
>> >>>>>>> C/C++ driver probably needs a corresponding Python package so
>> >>>>>>> Python users do not have to futz with shared library
>> >>>>>>> configurations. (See [1].) So for instance, installing PyArrow
>> >>>>>>> would also give you the Flight SQL driver, and
>> >>>>>>> `pip install adbc_postgres` would get you the Postgres-based
>> >>>>>>> driver.
>> >>>>>>>
>> >>>>>>> That would mean setting up separate CI, release, etc. (and
>> >>>>>>> eventually linking Crossbow & Conbench as well?). That does
>> >>>>>>> mean duplication of effort, but the trade-off is avoiding
>> >>>>>>> bloating the main release process even further. However, I'd
>> >>>>>>> like to hear from those closer to the release process on this
>> >>>>>>> subject - if it would make people's lives easier, we could
>> >>>>>>> merge everything into one repo/process.
>> >>>>>>>
>> >>>>>>> Integrations would be distributed as part of their respective
>> >>>>>>> packages (e.g. Arrow Dataset would optionally link to the
>> >>>>>>> driver manager). So the "part of Arrow 10.0.0" aspect means
>> >>>>>>> having a stable interface for adbc.h, and getting the Flight
>> >>>>>>> SQL drivers into the main repo.
>> >>>>>>>
>> >>>>>>> [1]: <https://github.com/apache/arrow-adbc/issues/53>
>> >>>>>>>
>> >>>>>>> On Thu, Aug 25, 2022, at 11:34, Antoine Pitrou wrote:
>> >>>>>>>> On Fri, 19 Aug 2022 14:09:44 -0400
>> >>>>>>>> "David Li" <lidav...@apache.org> wrote:
>> >>>>>>>>> Since it's been a while, I'd like to give an update. There
>> >>>>>>>>> are also a few questions I have around distribution.
>> >>>>>>>>>
>> >>>>>>>>> Currently:
>> >>>>>>>>> - Supported in C, Java, and Python.
>> >>>>>>>>> - For C/Python, there are basic drivers wrapping Flight SQL
>> >>>>>>>>>   and SQLite, with a draft of a libpq (Postgres) driver
>> >>>>>>>>>   (using nanoarrow).
>> >>>>>>>>> - For Java, there are drivers wrapping JDBC and Flight SQL.
>> >>>>>>>>> - For Python, there are low-level bindings to the C API, and
>> >>>>>>>>>   the DBAPI interface on top of that (+a few extension
>> >>>>>>>>>   methods resembling DuckDB/Turbodbc).
>> >>>>>>>>>
>> >>>>>>>>> There are drafts of integration with Ibis [1], DBI (R), and
>> >>>>>>>>> DuckDB. (I'd like to thank Hannes and Kirill for their
>> >>>>>>>>> comments, as well as Antoine, Dewey, and Matt here.)
>> >>>>>>>>>
>> >>>>>>>>> I'd like to have this as part of 10.0.0 in some fashion.
>> >>>>>>>>> However, I'm not sure how we would like to handle packaging
>> >>>>>>>>> and distribution. In particular, there are several
>> >>>>>>>>> sub-components for each language (the driver manager + the
>> >>>>>>>>> drivers), increasing the work. Any thoughts here?
>> >>>>>>>>
>> >>>>>>>> Sorry, forgot to answer here. But I think your question is
>> >>>>>>>> too broadly formulated. It probably deserves a case-by-case
>> >>>>>>>> discussion, IMHO.
>> >>>>>>>>
>> >>>>>>>>> I'm also wondering how we want to handle this in terms of
>> >>>>>>>>> specification - I assume we'd consider the core header
>> >>>>>>>>> file/Java interfaces a spec like the C Data Interface/Flight
>> >>>>>>>>> RPC, and vote on them/mirror them into the format/
>> >>>>>>>>> directory?
>> >>>>>>>>
>> >>>>>>>> That sounds like the right way to me indeed.
>> >>>>>>>>
>> >>>>>>>> Regards
>> >>>>>>>>
>> >>>>>>>> Antoine.
>>