Hello!

I am trying to use the expression and compute APIs for query processing,
and in my searches so far, this thread seems to be the most relevant.

Many of the operators and functions that I need in the short term appear
to be implemented, but the documentation seems sparse, or at least not all
in one place. The document that Micah linked has been useful, and I've
been perusing the source. I was wondering whether a good initial
contribution would be to document the model as designed and then propose
further changes or designs afterwards.

Is anyone already working on (or has anyone completed) consolidating or
expanding the documentation on the compute and dataset/expression APIs and
how they interact?

Thanks!

Aldrin Montana
Computer Science PhD Student
UC Santa Cruz


On Mon, Nov 30, 2020 at 7:40 AM Wes McKinney <wesmck...@gmail.com> wrote:

> One objective of the precompiled kernels project is to have meaningful
> computational functionality in a package that does not need to include
> the LLVM runtime -- to require the LLVM dependency even for simple
> functions would more than double the size of our Python packages, for
> example.
>
> There is currently little code sharing between functions that do
> identical work in arrow::compute:: versus gandiva:: -- this has been
> discussed, but it needs a champion to do something about it. When I
> was working on the new function framework earlier this year, I spent a
> day or so perusing src/gandiva/precompiled and reasoned it would be a
> prohibitive amount of refactoring for me to undertake at that time. In
> principle many of these functions (e.g. string functions) can be
> incrementally refactored into reusable inline functions / templates
> for improved code reuse. We could also explore common infrastructure
> for unit testing and benchmarking. Anything is possible if enough
> engineering time is invested.
>
> I would hope in the future to see a generalized expression API as part
> of a logical query plan-type system (for query processing) that has
> the ability to use Gandiva (if it's available) to compile
> subexpressions for better performance. I had hoped to spend some time
> on this myself earlier this year, but I've gotten busy with some other
> things and won't be able to devote much development time to this
> myself.
>
> - Wes
>
> On Sun, Nov 29, 2020 at 11:18 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
> >
> > >
> > > There are some computation kernels in Arrow, and it looks like this part is
> > > in active development right now. I wonder if there is a document or some
> > > emails describing the goals and use cases for this part of the code
> > > base. It would be very interesting to know a bit more, and I would like to
> > > contribute at some point.
> >
> >
> >
> > https://docs.google.com/document/d/1LFk3WRfWGQbJ9uitWwucjiJsZMqLh8lC1vAUOscLtj8/edit
> > talks about some of the goals of the compute module.
> >
> > > I'm interested because I am developing a proof-of-concept for a declarative
> > > language to perform statistical computations on top of Gandiva.
> >
> >
> > I think upon cursory examination someone (maybe Wes) thought Gandiva and
> > the compute kernels might not play nicely together, but I can't find a
> > reference to that at the moment.
> >
> >
> > On Sat, Nov 21, 2020 at 3:09 AM Kirill Lykov <lykov.kir...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > There are some computation kernels in Arrow, and it looks like this part is
> > > in active development right now. I wonder if there is a document or some
> > > emails describing the goals and use cases for this part of the code
> > > base. It would be very interesting to know a bit more, and I would like to
> > > contribute at some point.
> > > I'm interested because I am developing a proof-of-concept for a declarative
> > > language to perform statistical computations on top of Gandiva.
> > >
> > > --
> > > Best regards,
> > > Kirill Lykov
> > >
>
