I am also interested in this.

In my personal project, which uses Arrow, I currently use protobuf to
represent expressions (as well as logical and physical query plans). I used
the Gandiva protobuf definition as a starting point.

Protobuf works for going between different languages in the same process as
well as for passing query plans over the network. I'm passing these
protobuf definitions over the Flight protocol.
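
As a rough illustration of the build / serialize / deserialize round trip
(a minimal sketch only, not the Ballista or Gandiva schema): the class names
Column, Lit and BinaryExpr are hypothetical, and JSON stands in for protobuf
so that the example runs without generated code. With protobuf, the
serialize and deserialize steps would be handled by the generated message
classes instead.

```python
import json
from dataclasses import dataclass
from typing import Union


# Hypothetical expression node types, invented for this sketch.
@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class Lit:
    value: Union[int, float, str, bool]


@dataclass(frozen=True)
class BinaryExpr:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Column, Lit, BinaryExpr]


def to_dict(expr):
    """Convert an expression tree into plain, language-neutral data."""
    if isinstance(expr, Column):
        return {"column": expr.name}
    if isinstance(expr, Lit):
        return {"literal": expr.value}
    if isinstance(expr, BinaryExpr):
        return {
            "binary": {
                "op": expr.op,
                "left": to_dict(expr.left),
                "right": to_dict(expr.right),
            }
        }
    raise TypeError(f"unsupported expression: {expr!r}")


def from_dict(node):
    """Rebuild an expression tree from the plain-data form."""
    if "column" in node:
        return Column(node["column"])
    if "literal" in node:
        return Lit(node["literal"])
    if "binary" in node:
        b = node["binary"]
        return BinaryExpr(b["op"], from_dict(b["left"]), from_dict(b["right"]))
    raise ValueError(f"unknown node: {node!r}")


def serialize(expr) -> bytes:
    # JSON is an assumption made for self-containment; protobuf would go here.
    return json.dumps(to_dict(expr)).encode("utf-8")


def deserialize(data: bytes):
    return from_dict(json.loads(data.decode("utf-8")))


# Build the filter "x > 5", turn it into bytes, and read it back.
expr = BinaryExpr(">", Column("x"), Lit(5))
payload = serialize(expr)
restored = deserialize(payload)
```

The bytes in payload are what would cross the language or network boundary;
the receiving side only needs to agree on the schema, not on any
language-specific types.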

I only have support for a few simple expressions so far, but here is my
protobuf file for reference:

https://github.com/ballista-compute/ballista/blob/main/proto/ballista.proto

Andy.

On Mon, Jul 6, 2020 at 1:50 PM Steve Kim <chairm...@gmail.com> wrote:

> I have been following the discussion on a pull request (
> https://github.com/apache/arrow/pull/7030) by Hongze Zhang to use the
> high-level dataset API via JNI.
>
> An obstacle that was encountered in this PR is that there is not a good way
> to pass a filter expression via JNI. Expressions have a defined
> serialization in the C++ implementation, but this serialization includes
> enums and types that are only defined in C++ and are not accessible in
> other languages.
>
> I agree with Micah Kornfield's comment (
> https://github.com/apache/arrow/pull/7030#discussion_r425563920) that
> there ought to be one representation that we reuse across languages. If
> we had this cross-language functionality, then we could do the following:
>
>    1. build an arbitrary filter expression in Java
>    2. serialize the expression to bytes to be passed via JNI
>    3. deserialize from bytes to a native filter expression in the C++
>    implementation
>
> Has there already been discussion about what a cross-language
> representation of filter expressions (and possibly other parts of the
> Dataset API) might look like? I see that we use Flatbuffers in other parts
> of Arrow.
>
> What would need to change in the C++ implementation to make use of such a
> representation?
>
> Steve
>
