Re: [ext] Re: language independent representation of filter expressions

Wes McKinney Mon, 27 Jul 2020 14:45:09 -0700

I am OK with using the .proto files for now while the serialization
protocol is in development, focusing on capturing the functional
requirements and leaving the Protobuf vs Flatbuffers debate for later.


I don't think that JSON is an adequate substitute, because if Protobuf
is the desired / official serialization protocol, then we must ship
C++ libraries that support it, which means pulling in the
libprotobuf.a dependency. This also creates a transitive dependency on
Protocol Buffers for any third party library that wants to import or
export the serialized expressions, so it won't just be the Arrow C++
libraries that are impacted, but also any library that wants to
participate in the ecosystem. Flatbuffers by contrast can be used in
C/C++ applications without introducing any build or runtime
dependencies (indeed, when you do "cmake .; make" in the Arrow C++
project now, Flatbuffers is not required to have IPC protocol support
because we have the Flatbuffers header files and generated sources
checked in to the repo). The Protobuf dependency is a meaningful tax
on the ecosystem that IMHO significantly outweighs the negative
effects of a worse API / developer UX when using Flatbuffers in Java.
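
To make the "functional requirements first" point concrete, here is a
minimal sketch (Python, standard library only) of an expression tree in
which a field reference carries only a name, with types resolved later
against a schema, as discussed further down the thread. Everything in it
is an assumption made for illustration: the class names, the operator
string "gt", the dict layout, and the string type names are hypothetical
and are not taken from Gandiva, Arrow, or any existing .proto file.

```python
import json
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Field:
    name: str  # only a name; no type is attached when the user builds it


@dataclass(frozen=True)
class Literal:
    value: Union[int, float, str, bool]


@dataclass(frozen=True)
class Compare:
    op: str  # hypothetical operator name, e.g. "gt"
    left: "Expr"
    right: "Expr"


Expr = Union[Field, Literal, Compare]


def to_dict(expr):
    """Encode an expression as plain dicts, independent of the wire format."""
    if isinstance(expr, Field):
        return {"field": expr.name}
    if isinstance(expr, Literal):
        return {"literal": expr.value}
    if isinstance(expr, Compare):
        return {"op": expr.op, "args": [to_dict(expr.left), to_dict(expr.right)]}
    raise TypeError(f"unsupported expression node: {expr!r}")


def resolve_field_types(node, schema):
    """Return a copy of the encoded tree with each field annotated by type.

    `schema` is a hypothetical mapping of field name to type name.
    """
    if "field" in node:
        name = node["field"]
        if name not in schema:
            raise KeyError(f"unknown field: {name}")
        return {"field": name, "type": schema[name]}
    if "op" in node:
        args = [resolve_field_types(arg, schema) for arg in node["args"]]
        return {"op": node["op"], "args": args}
    return dict(node)  # literals are copied unchanged


# price > 100, built without knowing the type of "price"
expr = Compare("gt", Field("price"), Literal(100))
encoded = to_dict(expr)
resolved = resolve_field_types(encoded, {"price": "int64"})
wire = json.dumps(encoded)  # JSON is used here only as a stand-in encoding
```

The same dict structure could be mapped onto Protobuf messages or
Flatbuffers tables later; the sketch only pins down what information an
expression needs to carry, not how it is serialized.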

On Fri, Jul 24, 2020 at 3:28 PM Jacques Nadeau <jacq...@apache.org> wrote:
>
> Per my other email, you can generate JSON that is canonical protobuf if you
> don't want to pull the protobuf dependency. In terms of field typing: I
> could see treating that as optional in a user expression that is resolved
> later.
>
> On Fri, Jul 24, 2020 at 12:47 PM Patrick Pai <p...@drwholdings.com> wrote:
>
> > I only briefly looked into the Gandiva protobuf, but one issue seems to be
> > the use of protobuf itself (Wes is against this for dependency reasons).
> > There are also some inconsistencies between the Gandiva protobuf and how
> > filter expressions should be represented, e.g. in the Gandiva protobuf
> > fields are typed, whereas I think fields should just contain a field name.
> >
> > -----Original Message-----
> > From: Jacques Nadeau <jacq...@apache.org>
> > Sent: Thursday, July 23, 2020 10:14 PM
> > To: dev <dev@arrow.apache.org>
> > Subject: [ext] Re: language independent representation of filter
> > expressions
> >
> > Have you tried to use the existing expression representation provided by
> > Gandiva? What are the issues you've seen with it?
> >
> > On Wed, Jul 22, 2020 at 10:24 AM Patrick Pai <p...@drwholdings.com> wrote:
> >
> > > Hi all,
> > >
> > > > After some discussion with Steve, we'd like to propose and get
> > > > feedback on an alternative to representing expressions entirely with
> > > > flatbuffers.
> > >
> > > > To give some context, we thought about how we'd construct flatbuffer
> > > > expressions in Java or another language if we went down that route. We
> > > > realized that it'd be possible, but not user friendly. An example is
> > > > specifying an array of int values in Java for an InExpression. In
> > > > Java, we'd ideally have some user-friendly class (e.g. Arrow's
> > > > IntVector) that then gets converted to the appropriate flatbuffer
> > > > representation. I think this is what Jacques was saying about language
> > > > support being too weak: it's possible for Java users to construct a
> > > > flatbuffer expression, but not easily without an additional conversion
> > > > layer for every language.
> > >
> > > > An alternative we're thinking about is to only represent enum values
> > > > (i.e. those defined in arrow::dataset::ExpressionType::type) in a
> > > > flatbuffer schema, and rely on the existing IPC format (used to
> > > > serialize/deserialize C++ expressions) to pass the struct array
> > > > representation of an expression from, for example, Java to C++. The one
> > > > difference is that, in the struct array representation, we use the enum
> > > > values defined in our flatbuffer schema instead of the existing C++ enums.
> > > > This approach requires us on the Java side (and in languages other than
> > > > C++) to construct the struct array, but the benefit is minimal changes
> > > > to the C++ code (the main change being the use of our flatbuffer schema
> > > > enum values).
> > >
> > >
> > > On 2020/07/13 09:21:19, Antoine Pitrou <solip...@pitrou.net> wrote:
> > > > On Sat, 11 Jul 2020 09:55:16 -0700
> > > > Jacques Nadeau <jacq...@apache.org> wrote:
> > > > >
> > > > > > I'm against extending use of flatbuf within Arrow. The language
> > > > > > support is too weak. Language support isn't just about having a
> > > > > > binding for different languages, it is about having a
> > > > > > high-quality binding.
> > > >
> > > > Could you please expand on this?  ("the language support is too
> > > > weak")
> > > >
> > > > Thank you
> > > >
> > > > Antoine.
> > > >
> > > >
> > > >
> > >
> > > This e-mail and any attachments may contain information that is
> > > confidential and proprietary and otherwise protected from disclosure.
> > > If you are not the intended recipient of this e-mail, do not read,
> > > duplicate or redistribute it by any means. Please immediately delete
> > > > it and any attachments and notify the sender that you have received it
> > > > by mistake.
> > > Unintended recipients are prohibited from taking action on the basis
> > > of information in this e-mail or any attachments. The DRW Companies
> > > make no representations that this e-mail or any attachments are free
> > > of computer viruses or other defects.
> > >
> >
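
As a rough illustration of the alternative Patrick describes above
(shared enum values, with the expression itself passed as a struct
array), the sketch below flattens a small expression tree into parallel
columns. It is an assumption-laden toy, not the layout used by the Arrow
C++ code, which this thread does not show: the enum members and their
numbers, the column names ("type", "payload", "children"), and the tuple
shape of a tree node are all hypothetical, and plain Python lists stand
in for Arrow arrays.

```python
from enum import IntEnum


class ExprType(IntEnum):
    # Hypothetical codes. The idea is that a single schema (for example a
    # Flatbuffers enum) defines these once so every language agrees on them.
    FIELD = 0
    SCALAR = 1
    COMPARISON = 2


def flatten(node, rows):
    """Append `node` and its children to `rows` in post-order.

    A node is a (kind, payload, children) tuple. Returns the row index
    assigned to `node`, so parents can refer to children by index.
    """
    kind, payload, children = node
    child_ids = [flatten(child, rows) for child in children]
    rows.append({"type": int(kind), "payload": payload, "children": child_ids})
    return len(rows) - 1


def to_columns(rows):
    """Turn row dicts into parallel columns, one list per struct member."""
    return {key: [row[key] for row in rows] for key in ("type", "payload", "children")}


# price > 100
tree = (
    ExprType.COMPARISON,
    "gt",
    [(ExprType.FIELD, "price", []), (ExprType.SCALAR, 100, [])],
)
rows = []
root = flatten(tree, rows)
columns = to_columns(rows)
```

A real struct array would need a typed column per kind of payload rather
than one mixed "payload" list; the point here is only that the "type"
column holds the shared enum codes, which is the one piece every
language has to agree on in this approach.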
