Welcome Aldrin,

This sounds like a very reasonable way to start contributing.
-Micah

On Fri, Jan 29, 2021 at 1:53 PM Aldrin <akmon...@ucsc.edu.invalid> wrote:

> Hello!
>
> I am trying to use the expression and compute APIs for query processing,
> and in my searches so far, this thread seems to be the most relevant.
>
> A lot of the operators and functions that I need in the short-term appear
> to be implemented, but the documentation seems sparse or at least not all
> in the same place. The document that Micah linked has been useful, and I've
> been perusing the source, but I was wondering if some initial contributions
> I can make would be to document the designed model and then propose further
> changes or designs afterwards.
>
> Is anyone already putting effort in (or completed) consolidating or
> expanding documentation on the compute and dataset/expression APIs and how
> they interact, etc.?
>
> Thanks!
>
> Aldrin Montana
> Computer Science PhD Student
> UC Santa Cruz
>
>
> On Mon, Nov 30, 2020 at 7:40 AM Wes McKinney <wesmck...@gmail.com> wrote:
>
> > One objective of the precompiled kernels project is to have meaningful
> > computational functionality in a package that does not need to include
> > the LLVM runtime -- to require the LLVM dependency even for simple
> > functions would more than double the size of our Python packages, for
> > example.
> >
> > There is currently little code sharing between functions that do
> > identical work in arrow::compute:: versus gandiva:: -- this has been
> > discussed, but it needs a champion to do something about it. When I
> > was working on the new function framework earlier this year, I spent a
> > day or so perusing src/gandiva/precompiled and reasoned it would be a
> > prohibitive amount of refactoring for me to undertake at that time. In
> > principle many of these functions (e.g. string functions) can be
> > incrementally refactored into reusable inline functions / templates
> > for improved code reuse. We could also explore common infrastructure
> > for unit testing and benchmarking. Anything is possible if enough
> > engineering time is invested.
> >
> > I would hope in the future to see a generalized expression API as part
> > of a logical query plan-type system (for query processing) that has
> > the ability to use Gandiva (if it's available) to compile
> > subexpressions for better performance. I had hoped to spend some time
> > on this myself earlier this year, but I've gotten busy with some other
> > things and won't be able to devote much development time to this
> > myself.
> >
> > - Wes
> >
> > On Sun, Nov 29, 2020 at 11:18 PM Micah Kornfield <emkornfi...@gmail.com>
> > wrote:
> >
> > > > There are some computations kernels in arrow and it looks that this
> > > > part is in active development right now. I wonder if there is a
> > > > document / some emails describing what is the goal and uses cases
> > > > for this part of the code base. Would be very interesting to know a
> > > > bit more and I would like to contribute at some point.
> > >
> > > https://docs.google.com/document/d/1LFk3WRfWGQbJ9uitWwucjiJsZMqLh8lC1vAUOscLtj8/edit
> > > talks about some of the goals of the compute module.
> > >
> > > > I'm interested because I develop a Proof-of-concept for a declarative
> > > > language to perform statistical computations on top of gandiva.
> > >
> > > I think upon cursory examination someone (maybe Wes) thought Gandiva
> > > and the compute kernels might not play nicely together, but I can't
> > > find a reference to that at the moment.
> > >
> > > On Sat, Nov 21, 2020 at 3:09 AM Kirill Lykov <lykov.kir...@gmail.com>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > There are some computations kernels in arrow and it looks that this
> > > > part is in active development right now. I wonder if there is a
> > > > document / some emails describing what is the goal and uses cases
> > > > for this part of the code base. Would be very interesting to know a
> > > > bit more and I would like to contribute at some point.
> > > >
> > > > I'm interested because I develop a Proof-of-concept for a declarative
> > > > language to perform statistical computations on top of gandiva.
> > > >
> > > > --
> > > > Best regards,
> > > > Kirill Lykov