Hi Wes,

Responses inline:

On Sat, Oct 26, 2019, 13:46 Wes McKinney <wesmck...@gmail.com> wrote:

> On Mon, Oct 21, 2019 at 7:40 PM David Li <li.david...@gmail.com> wrote:
> >
> > The question is whether to repurpose the existing FlightData
> > structure, and allow for the metadata field to be filled in and data
> > fields to be blank (as a control message), or to wrap the FlightData
> > structure in another structure that explicitly distinguishes between
> > control and data messages.
>
> I'm not super against having metadata-only FlightData with empty body.
> One question to consider is what changes (if any) would need to be
> made to public APIs in either scenario.
>

We could leave DoGet/DoPut as-is for now, and allow empty data messages in
the future. This would be a breaking change, but wouldn't change the wire
format. I think the APIs could be changed backwards compatibly, though.



> > The other question is how to handle the metadata fields. So far, we've
> > used bytestring fields for application-defined data. This is workable
> > if you want to use Protobuf to define the contents of those fields,
> > but requires you to pack/unpack your Protobuf into/from the bytestring
> > field. If we instead used the Protobuf Any field, a dynamically typed
> > field, this would be more convenient, but then we'd be exposing
> > Protobuf types. We could alternatively use a combination of a type
> > field and a bytestring field, mimicking what the Protobuf Any type
> > looks like on the wire. I'm not sure this is actually cleaner in any
> > of the language APIs, though.
>
> Leaving the deserialization of the app metadata to the particular
> Flight implementation seems on first principles like the most flexible
> thing, if Any is used, does that mean the metadata _must_ be a
> protobuf?
>


If Any is used, we could still expose a bytes-based API, but it would have
some more wrapping. (We could put a ByteString in Any.) Then the question
would just be how to expose this (would be easier in Java, harder in C++).



> > David
> >
> > On 10/21/19, Antoine Pitrou <anto...@python.org> wrote:
> > >
> > > Can one of you explain what is being proposed in non-protobuf terms?
> > > Knowledge of protobuf shouldn't be required to use Flight.
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> > >
> > > Le 21/10/2019 à 15:46, David Li a écrit :
> > >> Oneof doesn't actually change the wire encoding; it would just be
> > >> application-level logic. (The official guide doesn't even mention it
> > >> in the encoding docs; I found
> > >>
> https://stackoverflow.com/questions/52226409/how-protobuf-encodes-oneof-message-construct
> > >> as well.)
> > >>
> > >> If I follow you, Jacques, then you are proposing essentially inlining
> > >> the definition of Any, e.g.
> > >>
> > >> message FlightMessage {
> > >>   oneof message {
> > >>     FlightData data = 1;
> > >>     FlightAny metadata = 2;
> > >>   }
> > >> }
> > >>
> > >> message FlightAny {
> > >>   string type = 1;
> > >>   bytes data = 2;
> > >> }
> > >>
> > >> Is this correct?
> > >>
> > >> It might be nice to consider the wrapper message for DoGet/DoPut as
> > >> well, but at that point, I'd rather we be consistent with all of them,
> > >> rather than have one of the three methods do its own thing.
> > >>
> > >> Thanks,
> > >> David
> > >>
> > >> On 10/20/19, Jacques Nadeau <jacq...@apache.org> wrote:
> > >>> I think we could probably expose the oneof behavior without exposing
> the
> > >>> protobuf functions. On the any... hmm. I guess we could expose as two
> > >>> fields: type and data. Then users could use it for whatever but if
> > >>> people
> > >>> wanted to treat it as any, it would work. (Basically a user could use
> > >>> any
> > >>> with it easily but they could also use any other mechanism). At
> least in
> > >>> java, the any concepts are pretty simple/diy. Are other language
> > >>> bindings
> > >>> less diy?
> > >>>
> > >>> I'm *not* hardcore against the empty FlightData + metadata but it
> just
> > >>> seemed a bit janky.
> > >>>
> > >>> Thinking about the control message/wrapper object thing, I wonder if
> we
> > >>> should redefine DoPut and DoGet to have the same property if we
> think it
> > >>> is
> > >>> a good idea...
> > >>>
> > >>> On Wed, Oct 16, 2019 at 5:13 PM David Li <li.david...@gmail.com>
> wrote:
> > >>>
> > >>>> I was definitely considering having control messages without data,
> and
> > >>>> I thought that could be encoded by a FlightData with only
> app_metadata
> > >>>> set. I think I understand your position now: FlightData should
> always
> > >>>> carry (some) data (with optional metadata)?
> > >>>>
> > >>>> That makes sense to me, and is consistent with the documentation on
> > >>>> FlightData in the Protobuf file. I was worried about having a
> > >>>> redundant metadata field, but oneof prevents that from happening,
> and
> > >>>> overall having a clear separation between data and control messages
> is
> > >>>> cleaner.
> > >>>>
> > >>>> As for using Protobuf's Any: so far, we've refrained from exposing
> > >>>> Protobuf by using bytes, would we want to change that now?
> > >>>>
> > >>>> Best,
> > >>>> David
> > >>>>
> > >>>> On 10/16/19, Jacques Nadeau <jacq...@apache.org> wrote:
> > >>>>> Hey David,
> > >>>>>
> > >>>>> RE: Async: I was trying to match the pattern we use for doget/doput
> > >>>>> for
> > >>>>> async. Yes, more thinking java given java grpc's async always
> pattern.
> > >>>>>
> > >>>>> On the comment around the FlightData, I think it is overloading the
> > >>>> message
> > >>>>> to use metadata for this. If I want to send a control message
> > >>>> independently
> > >>>>> of the data message, I would have to define something like an empty
> > >>>> flight
> > >>>>> data message that has custom metadata. Why not support a container
> > >>>>> object
> > >>>>> with a oneof{FlightData, Any} in it instead so users can add more
> data
> > >>>>> as
> > >>>>> desired. The default impl could be a noop for the Any messages.
> > >>>>>
> > >>>>> On Tue, Oct 15, 2019 at 6:50 PM David Li <li.david...@gmail.com>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> Hi Jacques,
> > >>>>>>
> > >>>>>> Thanks for the comments.
> > >>>>>>
> > >>>>>> - I do agree DoExchange is a better name!
> > >>>>>> - FlightData already has metadata fields as a result of prior
> > >>>>>> proposals, so I don't think we need a new message to carry that
> kind
> > >>>>>> of information.
> > >>>>>> - I like the suggestion of an async handler to handle incoming
> > >>>>>> messages as the fundamental API; it would actually be quite
> natural
> > >>>>>> to
> > >>>>>> implement in Flight/Java. I will note that it's not possible in
> > >>>>>> C++/Python without spawning a thread, though. (In essence,
> gRPC-Java
> > >>>>>> is async-always and gRPC-C++ is sync-always.) There are
> experimental
> > >>>>>> C++ APIs that would let us do something similar to Java, but those
> > >>>>>> are
> > >>>>>> only in relatively recent gRPC versions and are still under
> > >>>>>> development (contrary to the interceptor APIs which have been
> around
> > >>>>>> for quite a while).
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>> David
> > >>>>>>
> > >>>>>> On 10/15/19, Jacques Nadeau <jacq...@apache.org> wrote:
> > >>>>>>> I like it. Added some comments to the doc. Might worth discussion
> > >>>>>>> here
> > >>>>>>> depending on your thoughts.
> > >>>>>>>
> > >>>>>>> On Tue, Oct 15, 2019 at 7:11 AM David Li <li.david...@gmail.com>
> > >>>> wrote:
> > >>>>>>>
> > >>>>>>>> Hey Ryan,
> > >>>>>>>>
> > >>>>>>>> Thanks for the comments.
> > >>>>>>>>
> > >>>>>>>> Concrete example: I've edited the doc to provide a Python
> strawman.
> > >>>>>>>>
> > >>>>>>>> Sync vs async: while I don't touch on it, you could interleave
> > >>>> uploads
> > >>>>>>>> and downloads if you were so inclined. Right now, synchronous
> APIs
> > >>>>>>>> make this error-prone, e.g. if both client and server wait for
> each
> > >>>>>>>> other due to an application logic bug. (gRPC doesn't give us the
> > >>>>>>>> ability to have per-read timeouts, only an overall timeout.) As
> an
> > >>>>>>>> example of this happening with DoPut, see ARROW-6063:
> > >>>>>>>> https://issues.apache.org/jira/browse/ARROW-6063
> > >>>>>>>>
> > >>>>>>>> This is mostly tangential though, eventually we will want to
> design
> > >>>>>>>> asynchronous APIs for Flight as a whole. A bidirectional stream
> > >>>>>>>> like
> > >>>>>>>> this (and like DoPut) just makes these pitfalls easier to run
> into.
> > >>>>>>>>
> > >>>>>>>> Using DoPut+DoGet: I discussed this in the proposal, but the
> main
> > >>>>>>>> concern is that depending on how you deploy, two separate calls
> > >>>>>>>> could
> > >>>>>>>> get routed to different instances. Additionally, gRPC has some
> > >>>>>>>> reconnection behaviors; if the server goes away in between the
> two
> > >>>>>>>> calls, but it then restarts or there is another instance
> available,
> > >>>>>>>> the client will happily reconnect to the new server without
> > >>>>>>>> warning.
> > >>>>>>>>
> > >>>>>>>> Thanks,
> > >>>>>>>> David
> > >>>>>>>>
> > >>>>>>>> On 10/15/19, Ryan Murray <rym...@dremio.com> wrote:
> > >>>>>>>>> Hey David,
> > >>>>>>>>>
> > >>>>>>>>> I think this proposal makes a lot of sense. I like it and the
> > >>>>>>>>> possibility
> > >>>>>>>>> of remote compute via arrow buffers. One thing that would help
> me
> > >>>>>> would
> > >>>>>>>> be
> > >>>>>>>>> a concrete example of the API in a real life use case. Also,
> what
> > >>>>>> would
> > >>>>>>>> the
> > >>>>>>>>> client experience be in terms of sync vs asyc? Would the client
> > >>>>>>>>> block
> > >>>>>>>> till
> > >>>>>>>>> the bidirectional call return ie c = flight.vector_mult(a, b)
> or
> > >>>>>>>>> would
> > >>>>>>>> the
> > >>>>>>>>> client wait to be signaled that computation was done. If the
> > >>>>>>>>> later
> > >>>>>>>>> how
> > >>>>>>>>> is
> > >>>>>>>>> that different from a DoPut then DoGet? I suppose that this
> could
> > >>>> be
> > >>>>>>>>> implemented without extending the RPC interface but rather by a
> > >>>>>>>>> function/util?
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Best,
> > >>>>>>>>>
> > >>>>>>>>> Ryan
> > >>>>>>>>>
> > >>>>>>>>> On Sun, Oct 13, 2019 at 9:24 PM David Li <
> li.david...@gmail.com>
> > >>>>>> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> Hi all,
> > >>>>>>>>>>
> > >>>>>>>>>> We've been using Flight quite successfully so far, but we have
> > >>>>>>>>>> identified a new use case on the horizon: being able to both
> > >>>>>>>>>> send
> > >>>>>>>>>> and
> > >>>>>>>>>> retrieve Arrow data within a single RPC call. To that end,
> I've
> > >>>>>>>>>> written up a proposal for a new RPC method:
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>
> > >>>>>>
> > >>>>
> https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing
> > >>>>>>>>>>
> > >>>>>>>>>> Please let me know if you can't view or comment on the
> document.
> > >>>>>>>>>> I'd
> > >>>>>>>>>> appreciate any feedback; I think this is a relatively
> > >>>>>>>>>> straightforward
> > >>>>>>>>>> addition - it is essentially "DoPutThenGet".
> > >>>>>>>>>>
> > >>>>>>>>>> This is a format change and would require a vote. I've decided
> > >>>>>>>>>> to
> > >>>>>>>>>> table the other format change I had proposed (on DoPut), as it
> > >>>>>> doesn't
> > >>>>>>>>>> functionally change Flight, just the interpretation of the
> > >>>>>>>>>> semantics.
> > >>>>>>>>>>
> > >>>>>>>>>> Thanks,
> > >>>>>>>>>> David
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> --
> > >>>>>>>>>
> > >>>>>>>>> Ryan Murray  | Principal Consulting Engineer
> > >>>>>>>>>
> > >>>>>>>>> +447540852009 | rym...@dremio.com
> > >>>>>>>>>
> > >>>>>>>>> <https://www.dremio.com/>
> > >>>>>>>>> Check out our GitHub <https://www.github.com/dremio>, join our
> > >>>>>>>>> community
> > >>>>>>>>> site <https://community.dremio.com/> & Download Dremio
> > >>>>>>>>> <https://www.dremio.com/download>
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>
> > >
>

Reply via email to