Re: Seeking a small group to package Apache Arrow (was: Bug#970021: RFP: apache-arrow -- cross-language development platform for in-memory analytics)

Dirk Eddelbuettel Sun, 31 Mar 2024 04:48:41 -0700


Julian,


Arrow is a complicated and large package. We use it at work (where there is a
fair amount of Python, also to Conda etc) and do have issues with more
complex builds especially because it is 'data infrastructure' and can come in
from different parts. I would recommend against packaging at old one -- we
also have seen issues with different (py)arrow version biting.

Have you seen https://github.com/apache/arrow-nanoarrow ?

It works via the C API to Arrow which interchanges data via two void* to the
the two structs for arrow array and schema -- and avoids linkage issue. (In
user space the pyarrow or R arrow packages can still be used also interfacing
via these.)  I have been using it for R package bindings for some time and we
plan to expand that (again, at work) -- as do others. It is already use by
duckdb, by the Arrow 'ADBC' interfaces (which are generic in the ODBC/JDBC
sense but for Arrow, and also by a python interface to snowflake.

Dirk

-- 
dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org

Re: Seeking a small group to package Apache Arrow (was: Bug#970021: RFP: apache-arrow -- cross-language development platform for in-memory analytics)

Reply via email to