Would "incorporate" mean that the codebase is moved into the Arrow repository, or is the plan to keep a separate repository for datafusion-python but under the Apache org?
On Sun, Apr 25, 2021 at 10:40 PM Daniël Heres <danielhe...@gmail.com> wrote:

> Hi Jorge,
>
> Awesome, I think this is a super valuable addition and makes DataFusion
> much more accessible / approachable for anyone wanting to experiment
> with DataFusion.
> Would be very cool to update it to the latest version and include it in
> the project.
>
> Best,
>
> Daniël
>
> On Sun, Apr 25, 2021, 22:32 Micah Kornfield <emkornfi...@gmail.com> wrote:
>
> > Hi Jorge,
> > I think this would certainly be a valuable contribution. How were you
> > thinking of hosting (which repo)/publishing it (maintaining a separate
> > wheel)? Also, did you have thoughts on integration testing with pyarrow?
> >
> > Cheers,
> > Micah
> >
> > On Sun, Apr 25, 2021 at 9:13 AM Jorge Cardoso Leitão <
> > jorgecarlei...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I filed a PR [1] to open up a discussion to incorporate
> > > python-datafusion [2] into the Apache Arrow project.
> > >
> > > Python-datafusion is a Python library [3] built on top of DataFusion
> > > that enables people to use DataFusion from Python. It leverages the
> > > C data interface for zero-cost copy between DataFusion and pyarrow
> > > (a bunch of pointers is shared around).
> > >
> > > For example, it allows users to read a CSV from Rust, pass the arrays
> > > to a C++ kernel, continue the computation in Rust's kernels, and
> > > export to parquet using Rust (or C++ parquet, or whatever ^_^). It
> > > supports UDFs and UDAFs, in case someone wants to go crazy with
> > > Pyarrow, Pandas, numpy or tensorflow. =)
> > >
> > > Best,
> > > Jorge
> > >
> > > [1] https://github.com/apache/arrow-datafusion/pull/69
> > > [2] https://github.com/jorgecarleitao/datafusion-python
> > > [3] https://pypi.org/project/datafusion/