At least for the filtering part, isn't it already possible via Gandiva
filters [1]? I had a similar question about pushing record-level filtering
into the Parquet reader.

[1]
https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_gandiva.py#L86-L100
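
For reference, here is a minimal sketch of the two operations John lists
below (filter then sort, and binary search for slice bounds). It uses plain
Python dicts as stand-ins for record batch rows, so it only illustrates the
intended semantics. It is not pyarrow or Gandiva API, and the function names
filter_and_sort and slice_bounds are hypothetical.

```python
from bisect import bisect_left, bisect_right
from operator import itemgetter


def filter_and_sort(rows, predicate, key):
    """Keep the rows matching predicate, then return them sorted by key.

    Hypothetical helper: rows is a list of dicts standing in for a batch.
    """
    return sorted((r for r in rows if predicate(r)), key=itemgetter(key))


def slice_bounds(sorted_rows, key, value):
    """Return (start, stop) such that sorted_rows[start:stop] contains
    exactly the rows whose key equals value.

    Assumes sorted_rows is already sorted by key; uses binary search.
    """
    keys = [r[key] for r in sorted_rows]
    return bisect_left(keys, value), bisect_right(keys, value)


rows = [
    {"sym": "B", "px": 3},
    {"sym": "A", "px": 7},
    {"sym": "C", "px": 1},
    {"sym": "A", "px": 5},
]

# Need 1: filter the batch, then sort the surviving rows into a new batch.
kept = filter_and_sort(rows, lambda r: r["px"] > 2, "sym")

# Need 2: locate the slice of the sorted batch where sym == "A".
start, stop = slice_bounds(kept, "sym", "A")
print(kept[start:stop])
```

A real kernel would operate on columnar buffers rather than row dicts; the
sketch is only meant to pin down the expected results.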

On Mon, May 13, 2019 at 8:51 AM Wes McKinney <[email protected]> wrote:

> https://issues.apache.org/jira/browse/ARROW-1558
>
> On Mon, May 13, 2019 at 10:47 AM Micah Kornfield <[email protected]>
> wrote:
> >
> > There are also some open JIRA issues for sorting in
> > cpp/src/arrow/compute [1][2]. I couldn't find one for filtering, but I'm
> > surprised one doesn't exist.
> >
> > [1] https://issues.apache.org/jira/browse/ARROW-4631
> > [2] https://issues.apache.org/jira/browse/ARROW-1566
> >
> >
> > On Mon, May 13, 2019 at 8:36 AM Wes McKinney <[email protected]> wrote:
> >
> > > hi John -- I'd recommend implementing these capabilities as Kernel
> > > functions under cpp/src/arrow/compute, then they can be exposed in
> > > Python easily.
> > >
> > > - Wes
> > >
> > > On Mon, May 13, 2019 at 9:01 AM John Muehlhausen <[email protected]> wrote:
> > > >
> > > > Does pyarrow currently support filter/sort/search without conversion
> > > > to pandas? I don’t see anything but want to be sure. Sorry if I
> > > > overlooked it.
> > > >
> > > > Specific needs:
> > > >
> > > > 1- filter an arrow record batch and sort the results into a new batch
> > > > 2- find slice locations for a sorted batch using binary search
> > > >
> > > > If I wanted to contribute this functionality to pyarrow, how would I
> > > > plug in to that effort?
> > > >
> > > > Thanks,
> > > > John
> > >
>
