Would "incorporate" mean that the codebase is moved into the Arrow
repository, or is the plan to keep a separate repository
for datafusion-python but under the Apache org?

On Sun, Apr 25, 2021 at 10:40 PM Daniël Heres <danielhe...@gmail.com> wrote:

> Hi Jorge,
>
> Awesome, I think this is a super valuable addition and makes DataFusion
> much more accessible / approachable for anyone wanting to experiment with
> DataFusion.
> Would be very cool to update it to the latest version and include it in the
> project.
>
> Best,
>
> Daniël
>
> On Sun, Apr 25, 2021, 22:32 Micah Kornfield <emkornfi...@gmail.com> wrote:
>
> > Hi Jorge,
> > I think this would certainly be a valuable contribution.  How were you
> > thinking of hosting (which repo)/publishing it (maintaining a separate
> > wheel)?  Also, did you have thoughts on integration testing with pyarrow?
> >
> > Cheers,
> > Micah
> >
> > On Sun, Apr 25, 2021 at 9:13 AM Jorge Cardoso Leitão <
> > jorgecarlei...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > > I filed a PR [1] to open up a discussion to incorporate
> > > > python-datafusion [2] into the Apache Arrow project.
> > >
> > > > Python-datafusion is a Python library [3] built on top of DataFusion
> > > > that enables people to use DataFusion from Python. It leverages the C
> > > > data interface for zero-copy data sharing between DataFusion and
> > > > pyarrow (only a bunch of pointers is shared around).
> > >
> > > > For example, it allows users to read a CSV from Rust, pass the arrays
> > > > to a C++ kernel, continue the computation in Rust's kernels, and export
> > > > to Parquet using Rust (or C++ Parquet, or whatever ^_^). It supports
> > > > UDFs and UDAFs, in case someone wants to go crazy with pyarrow, pandas,
> > > > NumPy or TensorFlow. =)
> > >
> > > Best,
> > > Jorge
> > >
> > > [1] https://github.com/apache/arrow-datafusion/pull/69
> > > [2] https://github.com/jorgecarleitao/datafusion-python
> > > [3] https://pypi.org/project/datafusion/
> > > >
> >
>
