Hi,

OK. I'll send pull requests for GLib and Ruby soon.

> I'm curious if you have a particular use case in mind.

I don't have any production-ready use case yet, but I want to
implement an Active Record adapter for ADBC. Active Record
is the O/R mapper for Ruby on Rails. Building web
applications with Ruby on Rails is one of the major Ruby use
cases, so providing an Active Record interface for ADBC will
increase the number of Apache Arrow users in the Ruby community.

NOTE: Generally, Ruby on Rails users don't process large
data, but they sometimes need to process large (medium?) data
in a batch process. An Active Record adapter for ADBC may be
useful for such use cases.

> There's a little bit more API cleanup to do [1]. If you
> have comments on that or anything else, I'd appreciate
> them. Otherwise, pull requests would also be appreciated.

OK. I'll open issues/pull requests when I find
something. For now, I think that a "MODULE" library
rather than a "SHARED" library, in CMake terminology
[cmake], is better for driver modules. (I'll open an issue
for this later.)

[cmake]: https://cmake.org/cmake/help/latest/command/add_library.html


Thanks,
-- 
kou

In <e6380315-94aa-4dd1-8685-268edd597...@www.fastmail.com>
  "Re: [DISC] Improving Arrow's database support" on Sat, 27 Aug 2022 15:28:56 -0400,
  "David Li" <lidav...@apache.org> wrote:

> I would be very happy to see GLib/Ruby bindings! I'm curious if you have a 
> particular use case in mind. 
> 
> There's a little bit more API cleanup to do [1]. If you have comments on that 
> or anything else, I'd appreciate them. Otherwise, pull requests would also be 
> appreciated.
> 
> [1]: https://github.com/apache/arrow-adbc/issues/79
> 
> On Fri, Aug 26, 2022, at 21:53, Sutou Kouhei wrote:
>> Hi,
>>
>> Thanks for sharing the current status!
>> I understand.
>>
>> BTW, can I add GLib/Ruby bindings to apache/arrow-adbc
>> before we release the first version? (I want to use ADBC
>> from Ruby.) Or should I wait for the first release? If I can
>> work on it now, I'll open pull requests for it.
>>
>> Thanks,
>> -- 
>> kou
>>
>> In <8703efd9-51bd-4f91-b550-73830667d...@www.fastmail.com>
>>   "Re: [DISC] Improving Arrow's database support" on Fri, 26 Aug 2022 11:03:26 -0400,
>>   "David Li" <lidav...@apache.org> wrote:
>>
>>> Thank you Kou!
>>> 
>>> At least initially, I don't think I'll be able to complete the Dataset 
>>> integration in time. So 10.0.0 probably won't ship with a hard dependency. 
>>> That said I am hoping to have PyArrow take an optional dependency (so 
>>> Flight SQL can finally be available from Python).
>>> 
>>> On Fri, Aug 26, 2022, at 01:01, Sutou Kouhei wrote:
>>>> Hi,
>>>>
>>>> As a maintainer of Linux packages, I want apache/arrow-adbc
>>>> to be released before apache/arrow is released so that
>>>> apache/arrow's .deb/.rpm can depend on apache/arrow-adbc's
>>>> .deb/.rpm.
>>>>
>>>> (If Apache Arrow Dataset uses apache/arrow-adbc,
>>>> apache/arrow's .deb/.rpm needs to depend on
>>>> apache/arrow-adbc's .deb/.rpm.)
>>>>
>>>> We can add .deb/.rpm related files
>>>> (dev/tasks/linux-packages/ in apache/arrow) to
>>>> apache/arrow-adbc to build .deb/.rpm for apache/arrow-adbc.
>>>>
>>>> FYI: I did it for datafusion-contrib/datafusion-c:
>>>>
>>>> * https://github.com/datafusion-contrib/datafusion-c/tree/main/package
>>>> * https://github.com/datafusion-contrib/datafusion-c/blob/main/.github/workflows/package.yaml
>>>>
>>>> I can work on it in apache/arrow-adbc.
>>>>
>>>>
>>>> Thanks,
>>>> -- 
>>>> kou
>>>>
>>>> In <5cbf2923-4fb4-4c5e-b11d-007209fdd...@www.fastmail.com>
>>>>   "Re: [DISC] Improving Arrow's database support" on Thu, 25 Aug 2022 11:51:08 -0400,
>>>>   "David Li" <lidav...@apache.org> wrote:
>>>>
>>>>> Fair enough, thank you. I'll try to expand a bit. (Sorry for the wall of 
>>>>> text that follows…)
>>>>> 
>>>>> These are the components:
>>>>> 
>>>>> - Core adbc.h header
>>>>> - Driver manager for C/C++
>>>>> - Flight SQL-based driver
>>>>> - Postgres-based driver (WIP)
>>>>> - SQLite-based driver (more of a testbed for me than an actual component -
>>>>>   I don't think we'd actually distribute this)
>>>>> - Java core interfaces
>>>>> - Java driver manager
>>>>> - Java JDBC-based driver
>>>>> - Java Flight SQL-based driver
>>>>> - Python driver manager
>>>>> 
>>>>> I think: adbc.h gets mirrored into the Arrow repo. The Flight SQL drivers 
>>>>> get moved to the main Arrow repo and distributed as part of the regular 
>>>>> Arrow releases.
>>>>> 
>>>>> For the rest of the components: they could be packaged individually, but 
>>>>> versioned and released together. Also, each C/C++ driver probably needs a 
>>>>> corresponding Python package so Python users do not have to futz with 
>>>>> shared library configurations. (See [1].) So for instance, installing 
>>>>> PyArrow would also give you the Flight SQL driver, and `pip install 
>>>>> adbc_postgres` would get you the Postgres-based driver.
>>>>> 
>>>>> That would mean setting up separate CI, release, etc. (and eventually 
>>>>> linking Crossbow & Conbench as well?). That does mean duplication of 
>>>>> effort, but the trade-off is avoiding bloating the main release process 
>>>>> even further. However, I'd like to hear from those closer to the release 
>>>>> process on this subject - if it would make people's lives easier, we 
>>>>> could merge everything into one repo/process.
>>>>> 
>>>>> Integrations would be distributed as part of their respective packages 
>>>>> (e.g. Arrow Dataset would optionally link to the driver manager). So the 
>>>>> "part of Arrow 10.0.0" aspect means having a stable interface for adbc.h, 
>>>>> and getting the Flight SQL drivers into the main repo.
>>>>> 
>>>>> [1]: https://github.com/apache/arrow-adbc/issues/53
>>>>> 
>>>>> On Thu, Aug 25, 2022, at 11:34, Antoine Pitrou wrote:
>>>>>> On Fri, 19 Aug 2022 14:09:44 -0400
>>>>>> "David Li" <lidav...@apache.org> wrote:
>>>>>>> Since it's been a while, I'd like to give an update. There are also a 
>>>>>>> few questions I have around distribution.
>>>>>>> 
>>>>>>> Currently:
>>>>>>> - Supported in C, Java, and Python.
>>>>>>> - For C/Python, there are basic drivers wrapping Flight SQL and SQLite, 
>>>>>>> with a draft of a libpq (Postgres) driver (using nanoarrow).
>>>>>>> - For Java, there are drivers wrapping JDBC and Flight SQL.
>>>>>>> - For Python, there are low-level bindings to the C API, and the DBAPI 
>>>>>>> interface on top of that (+a few extension methods resembling 
>>>>>>> DuckDB/Turbodbc).
>>>>>>>  
>>>>>>> There are drafts of integration with Ibis [1], DBI (R), and DuckDB. (I'd 
>>>>>>> like to thank Hannes and Kirill for their comments, as well as Antoine, 
>>>>>>> Dewey, and Matt here.)
>>>>>>> 
>>>>>>> I'd like to have this as part of 10.0.0 in some fashion. However, I'm 
>>>>>>> not sure how we would like to handle packaging and distribution. In 
>>>>>>> particular, there are several sub-components for each language (the 
>>>>>>> driver manager + the drivers), increasing the work. Any thoughts here?
>>>>>>
>>>>>> Sorry, forgot to answer here. But I think your question is too broadly
>>>>>> formulated. It probably deserves a case-by-case discussion, IMHO.
>>>>>>
>>>>>>> I'm also wondering how we want to handle this in terms of specification 
>>>>>>> - I assume we'd consider the core header file/Java interfaces a spec 
>>>>>>> like the C Data Interface/Flight RPC, and vote on them/mirror them into 
>>>>>>> the format/ directory?
>>>>>>
>>>>>> That sounds like the right way to me indeed.
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>> Antoine.
