hey Andy, I replied on GitHub and then saw your e-mail thread.
The Gandiva library as it stands right now is not a query engine or an execution engine, properly speaking. It is a subgraph compiler for creating accelerated expressions for use inside another execution or query engine, as it is being used now in Dremio.

For this reason I am -1 on adding logical query plan definitions to Gandiva until a more rigorous design effort takes place to decide where to build an actual query/execution engine (which includes file / dataset scanners, projections, joins, aggregates, filters, etc.) in C++. My preference is to start building a from-the-ground-up system that will depend on Gandiva to compile expressions during execution. Among other things, I don't think it is necessarily a good idea to require a query engine to depend on LLVM, so tight coupling to an LLVM-based component may not be desirable.

In the meantime, if you want to start creating an (experimental) Protobuf / Flatbuffer definition of a general query execution plan (one that lives outside Gandiva for the time being) to assist with building a query engine in Rust, I think that is fine, but I want to make sure we are being deliberate and layering the project components in a good way.

- Wes

On Sat, Jan 5, 2019 at 8:15 AM Andy Grove <andygrov...@gmail.com> wrote:
>
> I have created a PR to start a discussion around representing logical query
> plans in Gandiva (ARROW-4163).
>
> https://github.com/apache/arrow/pull/3319
>
> I think that adding the various steps such as projection, selection, sort,
> and so on are fairly simple and not contentious. The harder part is how we
> represent data sources, since this likely has different meanings to
> different use cases. My thought is that we can register data sources by
> name (similar to CREATE EXTERNAL TABLE in Hadoop) or tie this into the IPC
> meta-data somehow so we can pass memory addresses and schema information.
>
> I would love to hear others' thoughts on this.
>
> Thanks,
>
> Andy.
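For context, a minimal sketch of what such an experimental Protobuf plan definition could look like, representing a logical plan as a tree of operators with named data sources. Every message, field, and package name below is hypothetical and purely illustrative, not an agreed design:

```proto
syntax = "proto3";

// Hypothetical package name; this would live outside Gandiva for now.
package arrow.logicalplan;

// A logical plan is a tree of relational operators.
message LogicalPlan {
  oneof node {
    TableScan scan = 1;
    Projection projection = 2;
    Selection selection = 3;
    Sort sort = 4;
  }
}

// Data sources registered by name, similar to CREATE EXTERNAL TABLE;
// a registry outside the plan maps names to schemas / memory addresses.
message TableScan {
  string table_name = 1;
}

// Computes a new set of expressions over its input.
message Projection {
  LogicalPlan input = 1;
  repeated Expression exprs = 2;
}

// Filters input rows by a boolean predicate expression.
message Selection {
  LogicalPlan input = 1;
  Expression predicate = 2;
}

// Orders input rows by the given sort keys.
message Sort {
  LogicalPlan input = 1;
  repeated Expression sort_keys = 2;
}

// Placeholder; the expression representation is a separate question
// (e.g. whether to reuse Gandiva's expression definitions or not).
message Expression {
  string column_name = 1;
}
```

This kind of sketch keeps the plan representation decoupled from any particular engine: the scan node carries only a registered name, so the binding to schemas or memory addresses can happen at execution time.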