I am OK with using the .proto files for now, while the serialization protocol is in development. That lets us focus on capturing the functional requirements and leave the Protobuf vs. Flatbuffers debate for later.

I don't think that JSON is an adequate substitute. If Protobuf is the desired / official serialization protocol, then we must ship C++ libraries that support it, which means pulling in the libprotobuf.a dependency. This also creates a transitive dependency on Protocol Buffers for any third-party library that wants to import or export the serialized expressions, so it is not just the Arrow C++ libraries that are impacted, but every library that wants to participate in the ecosystem.

Flatbuffers, by contrast, can be used in C/C++ applications without introducing any build or runtime dependencies. Indeed, when you run "cmake .; make" in the Arrow C++ project now, Flatbuffers does not need to be installed to get IPC protocol support, because we have the Flatbuffers header files and generated sources checked in to the repo.

The Protobuf dependency is a meaningful tax on the ecosystem that IMHO significantly outweighs the negative effects of a worse API / developer UX when using Flatbuffers in Java.

On Fri, Jul 24, 2020 at 3:28 PM Jacques Nadeau <jacq...@apache.org> wrote:
>
> Per my other email, you can generate JSON that is canonical protobuf if you
> don't want to pull the protobuf dependency. In terms of field typing: I
> could see treating that as optional in a user expression that is resolved
> later.
>
> On Fri, Jul 24, 2020 at 12:47 PM Patrick Pai <p...@drwholdings.com> wrote:
>
> > I only briefly looked into the Gandiva protobuf, but one issue seems to be
> > using protobuf (Wes is against this for dependency reasons). There's also
> > some inconsistencies between the Gandiva protobuf and how filter
> > expressions should be represented, i.e. in the Gandiva protobuf fields are
> > typed when I think fields should just contain a field name.
> >
> > -----Original Message-----
> > From: Jacques Nadeau <jacq...@apache.org>
> > Sent: Thursday, July 23, 2020 10:14 PM
> > To: dev <dev@arrow.apache.org>
> > Subject: [ext] Re: language independent representation of filter
> > expressions
> >
> > Have you tried to use the existing expression representation provided by
> > Gandiva? What are the issues you've seen with it?
> >
> > On Wed, Jul 22, 2020 at 10:24 AM Patrick Pai <p...@drwholdings.com> wrote:
> >
> > > Hi all,
> > >
> > > After some discussion with Steve, we'd like to propose and get
> > > feedback on an alternative to representing expressions entirely with
> > > flatbuffers.
> > >
> > > To give some context, we thought about how we'd construct flatbuffer
> > > expressions in Java or another language if we went down that route. We
> > > realized that it'd be possible, but not user friendly. An example is
> > > specifying an array of int values in Java for an InExpression. In
> > > Java, we'd ideally have some user-friendly class (i.e. arrow's
> > > IntVector) that then gets converted to the appropriate flatbuffer
> > > representation. I think this is what Jacques was saying about language
> > > support being too weak - it's possible for Java users to construct a
> > > flatbuffer expression, but not easily without an additional conversion
> > > layer for every language.
> > >
> > > An alternative we're thinking about is to only represent enum values
> > > (i.e. those defined in arrow::dataset::ExpressionType::type) in a
> > > flatbuffer schema, and rely on the existing IPC format (used to
> > > serialize/deserialize cpp expressions) to pass the struct array
> > > representation of an expression from for example Java to C++. The one
> > > difference is in the struct array representation, we use the enum
> > > values defined in our flatbuffer schema instead of existing cpp enums.
> > > This approach requires us on the Java side (and languages other than
> > > C++) to construct the struct array, but the benefit is minimal changes
> > > to the C++ code (the main change being using our flatbuffer schema enum
> > > values).
> > >
> > >
> > > On 2020/07/13 09:21:19, Antoine Pitrou <solip...@pitrou.net> wrote:
> > > > On Sat, 11 Jul 2020 09:55:16 -0700
> > > > Jacques Nadeau <jacq...@apache.org> wrote:
> > > > >
> > > > > I'm against extending use of flatbuf within Arrow. The language
> > > > > support is too weak. Language support isn't just about having a
> > > > > binding for different languages, it is about having a high-quality
> > > > > binding.
> > > >
> > > > Could you please expand on this? ("the language support is too
> > > > weak")
> > > >
> > > > Thank you
> > > >
> > > > Antoine.
> > > >
> > >
> > > This e-mail and any attachments may contain information that is
> > > confidential and proprietary and otherwise protected from disclosure.
> > > If you are not the intended recipient of this e-mail, do not read,
> > > duplicate or redistribute it by any means. Please immediately delete
> > > it and any attachments and notify the sender that you have received it
> > > by mistake.
> > > Unintended recipients are prohibited from taking action on the basis
> > > of information in this e-mail or any attachments. The DRW Companies
> > > make no representations that this e-mail or any attachments are free
> > > of computer viruses or other defects.
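
To make the field-typing point in this thread concrete, here is a minimal sketch of a field reference that carries only a name in the user-facing expression and has its type filled in later from a schema. Everything in it is hypothetical and for illustration only: the FieldRef and InExpression classes, the resolve and to_json helpers, the JSON layout, and the use of plain strings for type names are not part of Arrow, Gandiva, Protobuf, or Flatbuffers.

```python
import json
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FieldRef:
    """Hypothetical column reference; the type stays unset until resolved."""
    name: str
    type: Optional[str] = None  # assumption: type names as plain strings


@dataclass
class InExpression:
    """Hypothetical 'field IN (values)' filter node with int values."""
    field: FieldRef
    values: List[int]


def resolve(expr: InExpression, schema: Dict[str, str]) -> InExpression:
    """Return a copy of expr with the field type looked up in the schema."""
    if expr.field.name not in schema:
        raise KeyError(f"unknown field: {expr.field.name}")
    resolved = FieldRef(expr.field.name, schema[expr.field.name])
    return InExpression(resolved, list(expr.values))


def to_json(expr: InExpression) -> str:
    """Serialize to JSON, omitting the type while it is still unresolved."""
    field = {"name": expr.field.name}
    if expr.field.type is not None:
        field["type"] = expr.field.type
    return json.dumps({"in": {"field": field, "values": expr.values}},
                      sort_keys=True)


# The user writes an expression with a bare field name...
user_expr = InExpression(FieldRef("x"), [1, 2, 3])
# ...and the consumer fills in the type once the schema is known.
resolved_expr = resolve(user_expr, {"x": "int32", "y": "utf8"})

print(to_json(user_expr))
print(to_json(resolved_expr))
```

The same two-stage shape (untyped reference first, typed reference after resolution) could in principle be expressed in either a .proto or a Flatbuffers schema by making the type field optional; the sketch takes no position on which serialization format carries it.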