Changing the subject as we've veered off topic

On Mon, Dec 10, 2018 at 8:04 AM Andy Grove <andygrov...@gmail.com> wrote:
>
> Cool. I will continue to add primitive operations but I am now adding this
> in a separate source file to keep it separate from the core array code.
>
> I'm not sure how important it will be to support Rust data sources with
> Gandiva. I can see that each language should be able to construct the
> logical query plan to submit to Gandiva and let Gandiva handle execution.

Note: Gandiva isn't an execution engine. It generates compiled
function kernels given an expression tree. It depends on an execution
engine to invoke the kernels in a database runtime-type environment --
Dremio is doing so in production already IIUC.

It might be that Rust developers would choose someday to develop a
Rust-native query runtime, in which case Gandiva's JIT compilation
could be used to generate custom kernels in a similar fashion to how
it is being used by Dremio in Java.

> I think the more interesting part is how do we support language-specific
> lambda functions as part of that logical query plan. Maybe it is possible
> to compile the lambda down to LLVM (I haven't started learning about LLVM
> in detail yet so this is wild speculation on my part).

Generally database systems define operator nodes for each type of
user-defined function, and the user code is invoked dynamically,
similar to interpreted languages. Compiling to LLVM isn't possible in
general.

> Another option is for Gandiva to support calling into shared libraries and
> that maybe is simpler for languages that support building C-native shared
> libraries (Rust supports this with zero overhead).

These would be C UDFs. I'm familiar with Impala's UDF system, for example:

https://www.cloudera.com/documentation/enterprise/5-8-x/topics/impala_udf.html

There you can declare a new function that is looked up in a shared
library using dlopen / dlsym

- Wes

>
> Andy.
>
> On Sun, Dec 9, 2018 at 11:42 AM Wes McKinney <wesmck...@gmail.com> wrote:
>
> > hi Andy,
> >
> > I can see an argument for having some basic native function kernel
> > support in Rust. One of the things that Gandiva has begun is a
> > Protobuf-based serialized representation of projection
> > and filter expressions. In the long run I would like to see a more
> > complete relational algebra / logical query plan that can be submitted
> > for execution.
> > There are complexities, though, such as bridging
> > iteration of data sources written in Rust, say, with a query engine
> > written in C++. You would need to provide some kind of a callback
> > mechanism for the query engine to request the next chunk of a dataset
> > to be materialized.
> >
> > It will be interesting to see what contributors will be motivated
> > enough to build over the next few years. At the end of the day, Apache
> > projects are do-ocracies.
> >
> > - Wes
> > On Fri, Dec 7, 2018 at 6:22 AM Andy Grove <andygrov...@gmail.com> wrote:
> > >
> > > I've added one PR to the list (https://github.com/apache/arrow/pull/3119)
> > > to update the project to use Rust 2018 Edition.
> > >
> > > I'm also considering removing one PR from the list and would like to get
> > > opinions here.
> > >
> > > I have a PR (https://github.com/apache/arrow/pull/3033) to add some basic
> > > math and comparison operators to primitive arrays. These are baby steps
> > > towards implementing more query execution capabilities such as projection,
> > > selection, etc but Chao made a good point that other Rust implementations
> > > don't have these kinds of capabilities and I am now wondering if this is a
> > > distraction. We already have Gandiva and the new efforts in Ursa Labs and
> > > it would probably make more sense to look at having Rust bindings for the
> > > query execution capabilities there rather than having a competing (and less
> > > capable) implementation in Rust.
> > >
> > > Thoughts?
> > >
> > > Andy.
> > >
> > > On Thu, Dec 6, 2018 at 8:42 PM paddy horan <paddyho...@hotmail.com> wrote:
> > >
> > > > Other than Andy’s PR below I’m going to try and find time to work on
> > > > ARROW-3827, I’ll bump it to 0.13 if I can’t find the time early next week.
> > > > There is nothing else in the 0.12 backlog for Rust. It would be nice to
> > > > get the parquet merge in though.
> > > >
> > > > Paddy
> > > >
> > > > ________________________________
> > > > From: Andy Grove <andygrov...@gmail.com>
> > > > Sent: Thursday, December 6, 2018 10:20:48 AM
> > > > To: dev@arrow.apache.org
> > > > Subject: Re: Timeline for Arrow 0.12.0 release
> > > >
> > > > I have PRs pending for all the Rust issues that I want to get into 0.12.0
> > > > and would appreciate some reviews so I can go ahead and merge:
> > > >
> > > > https://github.com/apache/arrow/pull/3033 (covers ARROW-3880 and ARROW-3881
> > > > - add math and comparison operations to primitive arrays)
> > > > https://github.com/apache/arrow/pull/3096 (ARROW-3885 - Rust release process)
> > > > https://github.com/apache/arrow/pull/3111 (ARROW-3838 - CSV Writer)
> > > >
> > > > With these in place I plan on writing a tutorial for reading a CSV file,
> > > > performing some operations on primitive arrays and writing the output to a
> > > > new CSV file.
> > > >
> > > > I am deferring ARROW-3882 (casting for primitive arrays) to 0.13.0
> > > >
> > > > Thanks,
> > > >
> > > > Andy.
> > > >
> > > > On Tue, Dec 4, 2018 at 7:57 PM Andy Grove <andygrov...@gmail.com> wrote:
> > > >
> > > > > I'd love to tackle the three related issues for supporting simple
> > > > > math/comparison operations on primitive arrays and casting primitive arrays
> > > > > but since the change to use Rust specialization feature I'm a bit stuck and
> > > > > need some assistance applying the math operations to the numeric types and
> > > > > not the boolean primitives. I have added a comment to
> > > > > https://github.com/apache/arrow/pull/3033 ... if I can get help solving
> > > > > for this PR then I should be able to handle the others. I'll also do some
> > > > > research and try and figure this out myself.
> > > > >
> > > > > Andy.
> > > > >
> > > > > On Tue, Dec 4, 2018 at 7:03 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > > > >
> > > > >> Andy, Paddy, or other Rust developers -- could you review the 6 issues
> > > > >> in TODO in the 0.12 backlog and either assign them or move them to the
> > > > >> next release if they aren't going to be completed this week or next?
> > > > >>
> > > > >> On Fri, Nov 30, 2018 at 4:34 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > > > >> >
> > > > >> > hi folks,
> > > > >> >
> > > > >> > Tomorrow is December 1. The last major Arrow release (0.11.0) took
> > > > >> > place on October 8. Given how much work has happened in the project in
> > > > >> > the last ~2 months, I think it would be great to complete the next
> > > > >> > major release before the end-of-year holidays set in.
> > > > >> >
> > > > >> > I've been curating the JIRA backlog the last couple of weeks, and have
> > > > >> > just created a 0.12.0 release wiki page to help us stay organized
> > > > >> >
> > > > >> > https://cwiki.apache.org/confluence/display/ARROW/Arrow+0.12.0+Release
> > > > >> >
> > > > >> > Given that there are only 3 full working weeks between now and
> > > > >> > Christmas, I think we should be in position to cut a release by the
> > > > >> > end of the week of December 10, i.e. by Friday December 14. Not all of
> > > > >> > the TODO issues have to be completed to make the release, but it would
> > > > >> > be good to push to complete as much as possible. Please help by
> > > > >> > reviewing the backlog, and if possible, assigning issues to yourself
> > > > >> > that you'd like to pursue in the next 2 weeks.
> > > > >> >
> > > > >> > Let me know if this sounds reasonable, or any concerns.
> > > > >> >
> > > > >> > Thanks
> > > > >> > Wes
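
To make the C UDF idea from the top of this thread a bit more concrete, here is a minimal Rust sketch of a function exposed with the C ABI and invoked by an engine through a plain function pointer, which is what a dlopen / dlsym lookup would hand back. Everything named here (`BinaryF64Udf`, `hypot_udf`, `apply_udf`) is hypothetical and for illustration only; it is not Gandiva's or Impala's actual UDF interface, and the shared-library lookup itself is left out so the example stays self-contained.

```rust
/// Hypothetical signature an engine might expect for a scalar UDF that
/// takes two f64 arguments. This is an assumption for illustration, not
/// the ABI of any real UDF system.
pub type BinaryF64Udf = extern "C" fn(f64, f64) -> f64;

/// A UDF written in Rust using the C calling convention.
/// To be found by name via dlsym, it would additionally have to be built
/// into a C-compatible shared library with an unmangled exported symbol;
/// that step is omitted here.
pub extern "C" fn hypot_udf(x: f64, y: f64) -> f64 {
    (x * x + y * y).sqrt()
}

/// Stand-in for the engine side: call the UDF once per row over two
/// equally long columns. A real engine would obtain `udf` from the
/// shared library instead of receiving it directly.
pub fn apply_udf(udf: BinaryF64Udf, a: &[f64], b: &[f64]) -> Vec<f64> {
    a.iter().zip(b).map(|(&x, &y)| udf(x, y)).collect()
}
```

Calling `apply_udf(hypot_udf, &[3.0], &[4.0])` invokes the UDF once per row. The per-row indirect call is the dynamic invocation cost mentioned above, in contrast to a kernel that is JIT-compiled for the whole expression.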