I have been following the discussion on a pull request (
https://github.com/apache/arrow/pull/7030) by Hongze Zhang to use the
high-level dataset API via JNI.

An obstacle that was encountered in this PR is that there is not a good way
to pass a filter expression via JNI. Expressions have a defined
serialization in the C++ implementation, but this serialization includes
enums and types that are only defined in C++ and are not accessible in
other languages.

I agree with Micah Kornfield's comment (
https://github.com/apache/arrow/pull/7030#discussion_r425563920) that there
ought to be one representation that we reuse across languages. If we had
this cross-language functionality, then we could do the following:

   1. build an arbitrary filter expression in Java
   2. serialize the expression to bytes to be passed via JNI
   3. deserialize from bytes to a native filter expression in the C++
   implementation

Has there already been discussion about what a cross-language
representation of filter expressions (and possibly other parts of the
Dataset API) might look like? I see that we use Flatbuffers in other parts
of Arrow.

What would need to change in the C++ implementation to make use of such a
representation?

Steve

Reply via email to