Wes, thanks for following up on this and making sure that we are following
the process here. I have merged a PR to revert the previous revert, so the
Python bindings are now back in the repo.

On Tue, May 4, 2021 at 4:14 PM Wes McKinney <wesmck...@gmail.com> wrote:

> Based on the general@incubator thread, there isn't 100% consensus,
> but I think we can accept the PR as is and move forward. I appreciate
> everyone's patience.
>
> On Tue, May 4, 2021 at 10:24 AM Wes McKinney <wesmck...@gmail.com> wrote:
> >
> > See thread on general@incubator
> >
> > https://lists.apache.org/thread.html/r3108dd293240967cab4d75a8003895b247b3b3b726a7e1e54f3d9b65%40%3Cgeneral.incubator.apache.org%3E
> >
> > On Tue, May 4, 2021 at 9:35 AM Wes McKinney <wesmck...@gmail.com> wrote:
> > >
> > > > I admit it's an unusual situation to have a single-author codebase
> > > > where the developer is on the PMC. Let's determine what the
> > > > protocol is for this kind of thing in the future so we don't create
> > > > unnecessary work for ourselves.
> > >
> > > > On Tue, May 4, 2021 at 9:15 AM Andy Grove <andygrov...@gmail.com> wrote:
> > > >
> > > > > I apologize. For some reason, I had thought that because Jorge was the only
> > > > > contributor (except for one contribution fixing a typo in the README),
> > > > > the IP clearance process did not apply in this case.
> > > >
> > > > I will create a PR to revert.
> > > >
> > > > > On Tue, May 4, 2021 at 8:06 AM Wes McKinney <wesmck...@gmail.com> wrote:
> > > >
> > > > > > Just to circle back on this. Since this was an independent codebase
> > > > > > previously developed over a 10 month period, I had assumed we would be
> > > > > > looking at an IP clearance vote, but instead it was just merged into
> > > > > > arrow-datafusion.
> > > > >
> > > > > > On Tue, Apr 27, 2021 at 10:50 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
> > > > > >
> > > > > > Hi Jorge,
> > > > > > > This all sounds good to me.  It might be nice to test against both the
> > > > > > > pinned released version of pyarrow and at head if possible.
> > > > > > >
> > > > > > > I like the idea of not causing release churn as long as all the underlying
> > > > > > > libraries are compatible.
> > > > > >
> > > > > > Thanks for the write up.
> > > > > >
> > > > > > -Micah
> > > > > >
> > > > > > > On Mon, Apr 26, 2021 at 10:30 AM Jorge Cardoso Leitão <jorgecarlei...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi Micah,
> > > > > > >
> > > > > > > > All testing is actually done from Python: create a record batch in
> > > > > > > > pyarrow, push it to datafusion, consume it back in Python, and compare
> > > > > > > > the result using pyarrow's equality. Sometimes parquet is used instead.
> > > > > > > > The library is tested against pyarrow==1 from pypi: we can bump that, but
> > > > > > > > if it works in pyarrow==1, chances are things will improve with higher
> > > > > > > > versions :)
> > > > > > >
> > > > > > > > Releases: I thought to have it released as a separate wheel for two
> > > > > > > > reasons:
> > > > > > > >
> > > > > > > > * not force people that want pyarrow to download datafusion binaries with it
> > > > > > > > * have independent versioning from pyarrow
> > > > > > > >
> > > > > > > > and "bracket" the pyarrow versions that we ensure compatibility with.
> > > > > > >
> > > > > > > > Another alternative is to release with the same versioning as datafusion,
> > > > > > > > like arrow c++ / pyarrow and spark / pyspark.
> > > > > > > > The upside is that the versions are aligned. The downside is that we will
> > > > > > > > be releasing a lot of majors for no reason: so far, all backward
> > > > > > > > incompatible changes in datafusion were not backward incompatible in
> > > > > > > > python-datafusion: it is easier to break backward compat. in a Rust library
> > > > > > > > than it is in a Python wrapper to a Rust library.
> > > > > > >
> > > > > > > What are your thoughts, Micah?
> > > > > > >
> > > > > > > Best,
> > > > > > > Jorge
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > On Sun, Apr 25, 2021 at 10:32 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
> > > > > > >
> > > > > > >> Hi Jorge,
> > > > > > > >> I think this would certainly be a valuable contribution.  How were you
> > > > > > > >> thinking of hosting (which repo)/publishing it (maintaining a separate
> > > > > > > >> wheel)?  Also, did you have thoughts on integration testing with pyarrow?
> > > > > > >>
> > > > > > >> Cheers,
> > > > > > >> Micah
> > > > > > >>
> > > > > > > >> On Sun, Apr 25, 2021 at 9:13 AM Jorge Cardoso Leitão <jorgecarlei...@gmail.com> wrote:
> > > > > > >>
> > > > > > >> > Hi,
> > > > > > >> >
> > > > > > > >> > I filed a PR [1] to open up a discussion to incorporate python-datafusion
> > > > > > > >> > [2] into the Apache Arrow project.
> > > > > > >> >
> > > > > > > >> > Python-datafusion is a Python library [3] built on top of DataFusion that
> > > > > > > >> > enables people to use DataFusion from Python. It leverages the C data
> > > > > > > >> > interface for zero-cost copy between DataFusion and pyarrow (a bunch of
> > > > > > > >> > pointers is shared around).
> > > > > > >> >
> > > > > > > >> > For example, it allows users to read a CSV from Rust, pass the arrays to a
> > > > > > > >> > C++ kernel, continue the computation in Rust's kernels, and export to
> > > > > > > >> > parquet using Rust (or C++ parquet, or whatever ^_^). It supports UDFs and
> > > > > > > >> > UDAFs, in case someone wants to go crazy with pyarrow, Pandas, numpy or
> > > > > > > >> > tensorflow. =)
> > > > > > >> >
> > > > > > >> > Best,
> > > > > > >> > Jorge
> > > > > > >> >
> > > > > > >> > [1] https://github.com/apache/arrow-datafusion/pull/69
> > > > > > >> > [2] https://github.com/jorgecarleitao/datafusion-python
> > > > > > >> > [3] https://pypi.org/project/datafusion/
> > > > > > >> >
> > > > > > >>
> > > > > > >
> > > > >
>
