Hi Tom,

This does not address the question directly, but for what it's worth, I had the same issue and thus released a Python binding for DataFusion <https://pypi.org/project/datafusion/>.

It allows you, e.g., to create a pyarrow RecordBatch by reading from S3 (via pyarrow) and use it as a source for a DataFusion plan via the SQL or DataFrame API. Because it uses the C data interface, there is virtually no cost in moving data between DataFusion and pyarrow.

It supports UDFs and UDAFs on native pyarrow arrays, which means there is also no performance hit when a UDF calls a pyarrow/C++ kernel. Performance degrades when you need to map the pyarrow array to some other format (e.g. numpy), typically to push it to sklearn, scipy, etc.
`pip install datafusion`, but fyi this is *not* production ready and many of the pyarrow types are not supported yet. :)

Best,
Jorge

On Fri, Feb 12, 2021 at 5:41 PM Tom Scheffers <t...@youngbulls.nl.invalid> wrote:

> Dear devs,
>
> I am really interested in an in-memory query interface to Arrow tables
> (like DataFusion is for Rust), preferably in Python. In my opinion, there
> are three routes: 1. create a wrapper/interface to DataFusion directly, 2.
> copy Arrow to pandas and use an existing framework (like Ibis) and 3.
> build/extend something new based on pyarrow (with small conversions back
> and forth to numpy or pandas).
>
> The Arrow / DataFusion route currently lacks some capabilities, like
> parquet files directly from S3, but also the push down of predicates.
> Therefore, I would rather wait for things to mature. Besides, the C++
> branch of Arrow seems to be more mature and integrates nicely with Python.
>
> The pandas route is probably more convenient, however it will be much less
> efficient. Columnar storage, predicate push downs and statistics
> optimizations are the main reason for using Arrow, which will not be fully
> utilized in this route.
>
> Is there already something like DataFusion on the roadmap for C++ (and thus
> Python)? Or is there an Ibis like engine which acts directly on Pyarrow? I
> would like to help on advancements into this direction, but struggle in
> finding where to start.
>
> Thanks for your help.
>
> Kind regards,
>
> Tom