Fair enough. I'm okay with the bytes approach and the proposal looks good
to me.

On Fri, Nov 8, 2019 at 11:37 AM David Li <li.david...@gmail.com> wrote:

> I've updated the proposal.
>
> On the subject of Protobuf Any vs bytes, and how to handle
> errors/metadata, I still think using bytes is preferable:
> - It doesn't require (conditionally) exposing or wrapping Protobuf types,
> - We wouldn't be able to practically expose the Protobuf field to C++
> users without causing build pains,
> - We can't let Python users take advantage of the Protobuf field
> without somehow being compatible with the Protobuf wheels (by linking
> to the same version, and doing magic to turn the C++ Protobufs into
> the Python ones),
> - All our other application-defined fields are already bytes.
>
> Applications that want structure can encode JSON or Protobuf Any into
> the bytes field themselves, much as you can already do for Ticket,
> commands in FlightDescriptors, and application metadata in
> DoGet/DoPut. I don't think this is (much) less efficient than using
> Any directly, since Any itself is a bytes field with a tag, and must
> invoke the Protobuf deserializer again to read the actual message.
>
> If we decide on using bytes, then I don't think it makes sense to
> define a new message with a oneof either, since it would be redundant.
>
> Thanks,
> David
>
> On 11/7/19, David Li <li.david...@gmail.com> wrote:
> > I've been extremely backlogged, I will update the proposal when I get
> > a chance and reply here when done.
> >
> > Best,
> > David
> >
> > On 11/7/19, Wes McKinney <wesmck...@gmail.com> wrote:
> >> Bumping this discussion since a couple of weeks have passed. It seems
> >> there are still some questions here, could we summarize what are the
> >> alternatives along with any public API implications so we can try to
> >> render a decision?
> >>
> >> On Sat, Oct 26, 2019 at 7:19 PM David Li <li.david...@gmail.com> wrote:
> >>>
> >>> Hi Wes,
> >>>
> >>> Responses inline:
> >>>
> >>> On Sat, Oct 26, 2019, 13:46 Wes McKinney <wesmck...@gmail.com> wrote:
> >>>
> >>> > On Mon, Oct 21, 2019 at 7:40 PM David Li <li.david...@gmail.com>
> >>> > wrote:
> >>> > >
> >>> > > The question is whether to repurpose the existing FlightData
> >>> > > structure, and allow for the metadata field to be filled in and
> data
> >>> > > fields to be blank (as a control message), or to wrap the
> FlightData
> >>> > > structure in another structure that explicitly distinguishes
> between
> >>> > > control and data messages.
> >>> >
> >>> > I'm not super against having metadata-only FlightData with empty
> body.
> >>> > One question to consider is what changes (if any) would need to be
> >>> > made to public APIs in either scenario.
> >>> >
> >>>
> >>> We could leave DoGet/DoPut as-is for now, and allow empty data messages
> >>> in
> >>> the future. This would be a breaking change, but wouldn't change the
> >>> wire
> >>> format. I think the APIs could be changed backwards compatibly, though.
> >>>
> >>>
> >>>
> >>> > > The other question is how to handle the metadata fields. So far,
> >>> > > we've
> >>> > > used bytestring fields for application-defined data. This is
> >>> > > workable
> >>> > > if you want to use Protobuf to define the contents of those fields,
> >>> > > but requires you to pack/unpack your Protobuf into/from the
> >>> > > bytestring
> >>> > > field. If we instead used the Protobuf Any field, a dynamically
> >>> > > typed
> >>> > > field, this would be more convenient, but then we'd be exposing
> >>> > > Protobuf types. We could alternatively use a combination of a type
> >>> > > field and a bytestring field, mimicking what the Protobuf Any type
> >>> > > looks like on the wire. I'm not sure this is actually cleaner in
> any
> >>> > > of the language APIs, though.
> >>> >
> >>> > Leaving the deserialization of the app metadata to the particular
> >>> > Flight implementation seems on first principles like the most
> flexible
> >>> > thing, if Any is used, does that mean the metadata _must_ be a
> >>> > protobuf?
> >>> >
> >>>
> >>>
> >>> If Any is used, we could still expose a bytes-based API, but it would
> >>> have
> >>> some more wrapping. (We could put a ByteString in Any.) Then the
> >>> question
> >>> would just be how to expose this (would be easier in Java, harder in
> >>> C++).
> >>>
> >>>
> >>>
> >>> > > David
> >>> > >
> >>> > > On 10/21/19, Antoine Pitrou <anto...@python.org> wrote:
> >>> > > >
> >>> > > > Can one of you explain what is being proposed in non-protobuf
> >>> > > > terms?
> >>> > > > Knowledge of protobuf shouldn't be required to use Flight.
> >>> > > >
> >>> > > > Regards
> >>> > > >
> >>> > > > Antoine.
> >>> > > >
> >>> > > >
> >>> > > > Le 21/10/2019 à 15:46, David Li a écrit :
> >>> > > >> Oneof doesn't actually change the wire encoding; it would just
> be
> >>> > > >> application-level logic. (The official guide doesn't even
> mention
> >>> > > >> it
> >>> > > >> in the encoding docs; I found
> >>> > > >>
> >>> >
> https://stackoverflow.com/questions/52226409/how-protobuf-encodes-oneof-message-construct
> >>> > > >> as well.)
> >>> > > >>
> >>> > > >> If I follow you, Jacques, then you are proposing essentially
> >>> > > >> inlining
> >>> > > >> the definition of Any, e.g.
> >>> > > >>
> >>> > > >> message FlightMessage {
> >>> > > >>   oneof message {
> >>> > > >>     FlightData data = 1;
> >>> > > >>     FlightAny metadata = 2;
> >>> > > >>   }
> >>> > > >> }
> >>> > > >>
> >>> > > >> message FlightAny {
> >>> > > >>   string type = 1;
> >>> > > >>   bytes data = 2;
> >>> > > >> }
> >>> > > >>
> >>> > > >> Is this correct?
> >>> > > >>
> >>> > > >> It might be nice to consider the wrapper message for DoGet/DoPut
> >>> > > >> as
> >>> > > >> well, but at that point, I'd rather we be consistent with all of
> >>> > > >> them,
> >>> > > >> rather than have one of the three methods do its own thing.
> >>> > > >>
> >>> > > >> Thanks,
> >>> > > >> David
> >>> > > >>
> >>> > > >> On 10/20/19, Jacques Nadeau <jacq...@apache.org> wrote:
> >>> > > >>> I think we could probably expose the oneof behavior without
> >>> > > >>> exposing
> >>> > the
> >>> > > >>> protobuf functions. On the any... hmm. I guess we could expose
> >>> > > >>> as
> >>> > > >>> two
> >>> > > >>> fields: type and data. Then users could use it for whatever but
> >>> > > >>> if
> >>> > > >>> people
> >>> > > >>> wanted to treat it as any, it would work. (Basically a user
> >>> > > >>> could
> >>> > > >>> use
> >>> > > >>> any
> >>> > > >>> with it easily but they could also use any other mechanism). At
> >>> > least in
> >>> > > >>> java, the any concepts are pretty simple/diy. Are other
> language
> >>> > > >>> bindings
> >>> > > >>> less diy?
> >>> > > >>>
> >>> > > >>> I'm *not* hardcore against the empty FlightData + metadata but
> >>> > > >>> it
> >>> > just
> >>> > > >>> seemed a bit janky.
> >>> > > >>>
> >>> > > >>> Thinking about the control message/wrapper object thing, I
> >>> > > >>> wonder
> >>> > > >>> if
> >>> > we
> >>> > > >>> should redefine DoPut and DoGet to have the same property if we
> >>> > think it
> >>> > > >>> is
> >>> > > >>> a good idea...
> >>> > > >>>
> >>> > > >>> On Wed, Oct 16, 2019 at 5:13 PM David Li <
> li.david...@gmail.com>
> >>> > wrote:
> >>> > > >>>
> >>> > > >>>> I was definitely considering having control messages without
> >>> > > >>>> data,
> >>> > and
> >>> > > >>>> I thought that could be encoded by a FlightData with only
> >>> > app_metadata
> >>> > > >>>> set. I think I understand your position now: FlightData should
> >>> > always
> >>> > > >>>> carry (some) data (with optional metadata)?
> >>> > > >>>>
> >>> > > >>>> That makes sense to me, and is consistent with the
> >>> > > >>>> documentation
> >>> > > >>>> on
> >>> > > >>>> FlightData in the Protobuf file. I was worried about having a
> >>> > > >>>> redundant metadata field, but oneof prevents that from
> >>> > > >>>> happening,
> >>> > and
> >>> > > >>>> overall having a clear separation between data and control
> >>> > > >>>> messages
> >>> > is
> >>> > > >>>> cleaner.
> >>> > > >>>>
> >>> > > >>>> As for using Protobuf's Any: so far, we've refrained from
> >>> > > >>>> exposing
> >>> > > >>>> Protobuf by using bytes, would we want to change that now?
> >>> > > >>>>
> >>> > > >>>> Best,
> >>> > > >>>> David
> >>> > > >>>>
> >>> > > >>>> On 10/16/19, Jacques Nadeau <jacq...@apache.org> wrote:
> >>> > > >>>>> Hey David,
> >>> > > >>>>>
> >>> > > >>>>> RE: Async: I was trying to match the pattern we use for
> >>> > > >>>>> doget/doput
> >>> > > >>>>> for
> >>> > > >>>>> async. Yes, more thinking java given java grpc's async always
> >>> > pattern.
> >>> > > >>>>>
> >>> > > >>>>> On the comment around the FlightData, I think it is
> >>> > > >>>>> overloading
> >>> > > >>>>> the
> >>> > > >>>> message
> >>> > > >>>>> to use metadata for this. If I want to send a control message
> >>> > > >>>> independently
> >>> > > >>>>> of the data message, I would have to define something like an
> >>> > > >>>>> empty
> >>> > > >>>> flight
> >>> > > >>>>> data message that has custom metadata. Why not support a
> >>> > > >>>>> container
> >>> > > >>>>> object
> >>> > > >>>>> with a oneof{FlightData, Any} in it instead so users can add
> >>> > > >>>>> more
> >>> > data
> >>> > > >>>>> as
> >>> > > >>>>> desired. The default impl could be a noop for the Any
> >>> > > >>>>> messages.
> >>> > > >>>>>
> >>> > > >>>>> On Tue, Oct 15, 2019 at 6:50 PM David Li
> >>> > > >>>>> <li.david...@gmail.com>
> >>> > > >>>>> wrote:
> >>> > > >>>>>
> >>> > > >>>>>> Hi Jacques,
> >>> > > >>>>>>
> >>> > > >>>>>> Thanks for the comments.
> >>> > > >>>>>>
> >>> > > >>>>>> - I do agree DoExchange is a better name!
> >>> > > >>>>>> - FlightData already has metadata fields as a result of
> prior
> >>> > > >>>>>> proposals, so I don't think we need a new message to carry
> >>> > > >>>>>> that
> >>> > kind
> >>> > > >>>>>> of information.
> >>> > > >>>>>> - I like the suggestion of an async handler to handle
> >>> > > >>>>>> incoming
> >>> > > >>>>>> messages as the fundamental API; it would actually be quite
> >>> > natural
> >>> > > >>>>>> to
> >>> > > >>>>>> implement in Flight/Java. I will note that it's not possible
> >>> > > >>>>>> in
> >>> > > >>>>>> C++/Python without spawning a thread, though. (In essence,
> >>> > gRPC-Java
> >>> > > >>>>>> is async-always and gRPC-C++ is sync-always.) There are
> >>> > experimental
> >>> > > >>>>>> C++ APIs that would let us do something similar to Java, but
> >>> > > >>>>>> those
> >>> > > >>>>>> are
> >>> > > >>>>>> only in relatively recent gRPC versions and are still under
> >>> > > >>>>>> development (contrary to the interceptor APIs which have
> been
> >>> > around
> >>> > > >>>>>> for quite a while).
> >>> > > >>>>>>
> >>> > > >>>>>> Thanks,
> >>> > > >>>>>> David
> >>> > > >>>>>>
> >>> > > >>>>>> On 10/15/19, Jacques Nadeau <jacq...@apache.org> wrote:
> >>> > > >>>>>>> I like it. Added some comments to the doc. Might worth
> >>> > > >>>>>>> discussion
> >>> > > >>>>>>> here
> >>> > > >>>>>>> depending on your thoughts.
> >>> > > >>>>>>>
> >>> > > >>>>>>> On Tue, Oct 15, 2019 at 7:11 AM David Li
> >>> > > >>>>>>> <li.david...@gmail.com>
> >>> > > >>>> wrote:
> >>> > > >>>>>>>
> >>> > > >>>>>>>> Hey Ryan,
> >>> > > >>>>>>>>
> >>> > > >>>>>>>> Thanks for the comments.
> >>> > > >>>>>>>>
> >>> > > >>>>>>>> Concrete example: I've edited the doc to provide a Python
> >>> > strawman.
> >>> > > >>>>>>>>
> >>> > > >>>>>>>> Sync vs async: while I don't touch on it, you could
> >>> > > >>>>>>>> interleave
> >>> > > >>>> uploads
> >>> > > >>>>>>>> and downloads if you were so inclined. Right now,
> >>> > > >>>>>>>> synchronous
> >>> > APIs
> >>> > > >>>>>>>> make this error-prone, e.g. if both client and server wait
> >>> > > >>>>>>>> for
> >>> > each
> >>> > > >>>>>>>> other due to an application logic bug. (gRPC doesn't give
> >>> > > >>>>>>>> us
> >>> > > >>>>>>>> the
> >>> > > >>>>>>>> ability to have per-read timeouts, only an overall
> >>> > > >>>>>>>> timeout.)
> >>> > > >>>>>>>> As
> >>> > an
> >>> > > >>>>>>>> example of this happening with DoPut, see ARROW-6063:
> >>> > > >>>>>>>> https://issues.apache.org/jira/browse/ARROW-6063
> >>> > > >>>>>>>>
> >>> > > >>>>>>>> This is mostly tangential though, eventually we will want
> >>> > > >>>>>>>> to
> >>> > design
> >>> > > >>>>>>>> asynchronous APIs for Flight as a whole. A bidirectional
> >>> > > >>>>>>>> stream
> >>> > > >>>>>>>> like
> >>> > > >>>>>>>> this (and like DoPut) just makes these pitfalls easier to
> >>> > > >>>>>>>> run
> >>> > into.
> >>> > > >>>>>>>>
> >>> > > >>>>>>>> Using DoPut+DoGet: I discussed this in the proposal, but
> >>> > > >>>>>>>> the
> >>> > main
> >>> > > >>>>>>>> concern is that depending on how you deploy, two separate
> >>> > > >>>>>>>> calls
> >>> > > >>>>>>>> could
> >>> > > >>>>>>>> get routed to different instances. Additionally, gRPC has
> >>> > > >>>>>>>> some
> >>> > > >>>>>>>> reconnection behaviors; if the server goes away in between
> >>> > > >>>>>>>> the
> >>> > two
> >>> > > >>>>>>>> calls, but it then restarts or there is another instance
> >>> > available,
> >>> > > >>>>>>>> the client will happily reconnect to the new server
> without
> >>> > > >>>>>>>> warning.
> >>> > > >>>>>>>>
> >>> > > >>>>>>>> Thanks,
> >>> > > >>>>>>>> David
> >>> > > >>>>>>>>
> >>> > > >>>>>>>> On 10/15/19, Ryan Murray <rym...@dremio.com> wrote:
> >>> > > >>>>>>>>> Hey David,
> >>> > > >>>>>>>>>
> >>> > > >>>>>>>>> I think this proposal makes a lot of sense. I like it and
> >>> > > >>>>>>>>> the
> >>> > > >>>>>>>>> possibility
> >>> > > >>>>>>>>> of remote compute via arrow buffers. One thing that would
> >>> > > >>>>>>>>> help
> >>> > me
> >>> > > >>>>>> would
> >>> > > >>>>>>>> be
> >>> > > >>>>>>>>> a concrete example of the API in a real life use case.
> >>> > > >>>>>>>>> Also,
> >>> > what
> >>> > > >>>>>> would
> >>> > > >>>>>>>> the
> >>> > > >>>>>>>>> client experience be in terms of sync vs asyc? Would the
> >>> > > >>>>>>>>> client
> >>> > > >>>>>>>>> block
> >>> > > >>>>>>>> till
> >>> > > >>>>>>>>> the bidirectional call return ie c =
> flight.vector_mult(a,
> >>> > > >>>>>>>>> b)
> >>> > or
> >>> > > >>>>>>>>> would
> >>> > > >>>>>>>> the
> >>> > > >>>>>>>>> client wait to be signaled that computation was done. If
> >>> > > >>>>>>>>> the
> >>> > > >>>>>>>>> later
> >>> > > >>>>>>>>> how
> >>> > > >>>>>>>>> is
> >>> > > >>>>>>>>> that different from a DoPut then DoGet? I suppose that
> >>> > > >>>>>>>>> this
> >>> > could
> >>> > > >>>> be
> >>> > > >>>>>>>>> implemented without extending the RPC interface but
> rather
> >>> > > >>>>>>>>> by a
> >>> > > >>>>>>>>> function/util?
> >>> > > >>>>>>>>>
> >>> > > >>>>>>>>>
> >>> > > >>>>>>>>> Best,
> >>> > > >>>>>>>>>
> >>> > > >>>>>>>>> Ryan
> >>> > > >>>>>>>>>
> >>> > > >>>>>>>>> On Sun, Oct 13, 2019 at 9:24 PM David Li <
> >>> > li.david...@gmail.com>
> >>> > > >>>>>> wrote:
> >>> > > >>>>>>>>>
> >>> > > >>>>>>>>>> Hi all,
> >>> > > >>>>>>>>>>
> >>> > > >>>>>>>>>> We've been using Flight quite successfully so far, but
> we
> >>> > > >>>>>>>>>> have
> >>> > > >>>>>>>>>> identified a new use case on the horizon: being able to
> >>> > > >>>>>>>>>> both
> >>> > > >>>>>>>>>> send
> >>> > > >>>>>>>>>> and
> >>> > > >>>>>>>>>> retrieve Arrow data within a single RPC call. To that
> >>> > > >>>>>>>>>> end,
> >>> > I've
> >>> > > >>>>>>>>>> written up a proposal for a new RPC method:
> >>> > > >>>>>>>>>>
> >>> > > >>>>>>>>>>
> >>> > > >>>>>>>>
> >>> > > >>>>>>
> >>> > > >>>>
> >>> >
> https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing
> >>> > > >>>>>>>>>>
> >>> > > >>>>>>>>>> Please let me know if you can't view or comment on the
> >>> > document.
> >>> > > >>>>>>>>>> I'd
> >>> > > >>>>>>>>>> appreciate any feedback; I think this is a relatively
> >>> > > >>>>>>>>>> straightforward
> >>> > > >>>>>>>>>> addition - it is essentially "DoPutThenGet".
> >>> > > >>>>>>>>>>
> >>> > > >>>>>>>>>> This is a format change and would require a vote. I've
> >>> > > >>>>>>>>>> decided
> >>> > > >>>>>>>>>> to
> >>> > > >>>>>>>>>> table the other format change I had proposed (on DoPut),
> >>> > > >>>>>>>>>> as
> >>> > > >>>>>>>>>> it
> >>> > > >>>>>> doesn't
> >>> > > >>>>>>>>>> functionally change Flight, just the interpretation of
> >>> > > >>>>>>>>>> the
> >>> > > >>>>>>>>>> semantics.
> >>> > > >>>>>>>>>>
> >>> > > >>>>>>>>>> Thanks,
> >>> > > >>>>>>>>>> David
> >>> > > >>>>>>>>>>
> >>> > > >>>>>>>>>
> >>> > > >>>>>>>>>
> >>> > > >>>>>>>>> --
> >>> > > >>>>>>>>>
> >>> > > >>>>>>>>> Ryan Murray  | Principal Consulting Engineer
> >>> > > >>>>>>>>>
> >>> > > >>>>>>>>> +447540852009 | rym...@dremio.com
> >>> > > >>>>>>>>>
> >>> > > >>>>>>>>> <https://www.dremio.com/>
> >>> > > >>>>>>>>> Check out our GitHub <https://www.github.com/dremio>,
> join
> >>> > > >>>>>>>>> our
> >>> > > >>>>>>>>> community
> >>> > > >>>>>>>>> site <https://community.dremio.com/> & Download Dremio
> >>> > > >>>>>>>>> <https://www.dremio.com/download>
> >>> > > >>>>>>>>>
> >>> > > >>>>>>>>
> >>> > > >>>>>>>
> >>> > > >>>>>>
> >>> > > >>>>>
> >>> > > >>>>
> >>> > > >>>
> >>> > > >
> >>> >
> >>
> >
>

Reply via email to