I have been following the discussion on a pull request (https://github.com/apache/arrow/pull/7030) by Hongze Zhang to use the high-level dataset API via JNI.
An obstacle that was encountered in this PR is that there is not a good way to pass a filter expression via JNI. Expressions have a defined serialization in the C++ implementation, but this serialization includes enums and types that are only defined in C++ and are not accessible in other languages. I agree with Micah Kornfield's comment (https://github.com/apache/arrow/pull/7030#discussion_r425563920) that there ought to be one representation that we reuse across languages.

If we had this cross-language functionality, then we could do the following:

1. build an arbitrary filter expression in Java
2. serialize the expression to bytes to be passed via JNI
3. deserialize from bytes to a native filter expression in the C++ implementation

Has there already been discussion about what a cross-language representation of filter expressions (and possibly other parts of the Dataset API) might look like? I see that we use Flatbuffers in other parts of Arrow. What would need to change in the C++ implementation to make use of such a representation?

Steve
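
P.S. To make the three steps concrete, here is a rough Java sketch of what I have in mind. Everything in it is hypothetical: FilterExprSketch, the node tags, and the byte layout are made up for illustration and are not part of Arrow, and the hand-rolled encoding is only a stand-in for a properly specified format (for example a Flatbuffers schema). The decoder is also written in Java so that the round trip can be run; in the real flow a C++ reader of the same layout would take its place on the other side of JNI.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: none of these names come from Arrow.
final class FilterExprSketch {
    // Node tags belong to the (assumed) wire format, not to any one language's enums.
    static final int TAG_FIELD = 1, TAG_INT_LITERAL = 2, TAG_EQUAL = 3, TAG_GREATER = 4, TAG_AND = 5;

    static final class Expr {
        final int tag;
        final String name;      // set for TAG_FIELD only
        final long value;       // set for TAG_INT_LITERAL only
        final Expr left, right; // set for binary nodes only
        private Expr(int tag, String name, long value, Expr left, Expr right) {
            this.tag = tag; this.name = name; this.value = value; this.left = left; this.right = right;
        }
    }

    // Step 1: build an expression tree in Java.
    static Expr field(String name) { return new Expr(TAG_FIELD, name, 0, null, null); }
    static Expr literal(long v) { return new Expr(TAG_INT_LITERAL, null, v, null, null); }
    static Expr equal(Expr l, Expr r) { return new Expr(TAG_EQUAL, null, 0, l, r); }
    static Expr greater(Expr l, Expr r) { return new Expr(TAG_GREATER, null, 0, l, r); }
    static Expr and(Expr l, Expr r) { return new Expr(TAG_AND, null, 0, l, r); }

    // Step 2: serialize in prefix order. DataOutputStream writes big-endian integers.
    static byte[] serialize(Expr e) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            write(e, out);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return buf.toByteArray();
    }

    private static void write(Expr e, DataOutputStream out) throws IOException {
        out.writeByte(e.tag);
        switch (e.tag) {
            case TAG_FIELD: // length-prefixed standard UTF-8, so any language can read it
                byte[] utf8 = e.name.getBytes(StandardCharsets.UTF_8);
                out.writeInt(utf8.length);
                out.write(utf8);
                break;
            case TAG_INT_LITERAL:
                out.writeLong(e.value);
                break;
            default: // binary node: left operand, then right operand
                write(e.left, out);
                write(e.right, out);
        }
    }

    // Step 3 (stand-in): decode the same layout; a C++ reader would mirror this logic.
    static Expr deserialize(byte[] bytes) {
        try {
            return read(new DataInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    private static Expr read(DataInputStream in) throws IOException {
        int tag = in.readUnsignedByte();
        switch (tag) {
            case TAG_FIELD:
                byte[] utf8 = new byte[in.readInt()];
                in.readFully(utf8);
                return field(new String(utf8, StandardCharsets.UTF_8));
            case TAG_INT_LITERAL:
                return literal(in.readLong());
            case TAG_EQUAL: case TAG_GREATER: case TAG_AND:
                Expr l = read(in);
                Expr r = read(in);
                return new Expr(tag, null, 0, l, r);
            default:
                throw new IOException("unknown node tag " + tag);
        }
    }

    // Human-readable rendering, used only to check the round trip.
    static String show(Expr e) {
        switch (e.tag) {
            case TAG_FIELD: return e.name;
            case TAG_INT_LITERAL: return Long.toString(e.value);
            case TAG_EQUAL: return "(" + show(e.left) + " == " + show(e.right) + ")";
            case TAG_GREATER: return "(" + show(e.left) + " > " + show(e.right) + ")";
            default: return "(" + show(e.left) + " and " + show(e.right) + ")";
        }
    }

    public static void main(String[] args) {
        Expr filter = and(greater(field("x"), literal(5)), equal(field("y"), literal(7)));
        byte[] bytes = serialize(filter); // this is what would be handed across JNI
        System.out.println(show(deserialize(bytes))); // prints ((x > 5) and (y == 7))
    }
}
```

The point of the sketch is only that the tags and layout are defined by the format itself, so neither side needs access to the other's enums or types. A real design would also have to cover the full set of expression and literal types, plus versioning.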