I've been extremely backlogged, I will update the proposal when I get
a chance and reply here when done.

Best,
David

On 11/7/19, Wes McKinney <wesmck...@gmail.com> wrote:
> Bumping this discussion since a couple of weeks have passed. It seems
> there are still some questions here, could we summarize what are the
> alternatives along with any public API implications so we can try to
> render a decision?
>
> On Sat, Oct 26, 2019 at 7:19 PM David Li <li.david...@gmail.com> wrote:
>>
>> Hi Wes,
>>
>> Responses inline:
>>
>> On Sat, Oct 26, 2019, 13:46 Wes McKinney <wesmck...@gmail.com> wrote:
>>
>> > On Mon, Oct 21, 2019 at 7:40 PM David Li <li.david...@gmail.com> wrote:
>> > >
>> > > The question is whether to repurpose the existing FlightData
>> > > structure, and allow for the metadata field to be filled in and data
>> > > fields to be blank (as a control message), or to wrap the FlightData
>> > > structure in another structure that explicitly distinguishes between
>> > > control and data messages.
>> >
>> > I'm not super against having metadata-only FlightData with empty body.
>> > One question to consider is what changes (if any) would need to be
>> > made to public APIs in either scenario.
>> >
>>
>> We could leave DoGet/DoPut as-is for now, and allow empty data messages
>> in
>> the future. This would be a breaking change, but wouldn't change the wire
>> format. I think the APIs could be changed backwards compatibly, though.
>>
>>
>>
>> > > The other question is how to handle the metadata fields. So far,
>> > > we've
>> > > used bytestring fields for application-defined data. This is workable
>> > > if you want to use Protobuf to define the contents of those fields,
>> > > but requires you to pack/unpack your Protobuf into/from the
>> > > bytestring
>> > > field. If we instead used the Protobuf Any field, a dynamically typed
>> > > field, this would be more convenient, but then we'd be exposing
>> > > Protobuf types. We could alternatively use a combination of a type
>> > > field and a bytestring field, mimicking what the Protobuf Any type
>> > > looks like on the wire. I'm not sure this is actually cleaner in any
>> > > of the language APIs, though.
>> >
>> > Leaving the deserialization of the app metadata to the particular
>> > Flight implementation seems on first principles like the most flexible
>> > thing, if Any is used, does that mean the metadata _must_ be a
>> > protobuf?
>> >
>>
>>
>> If Any is used, we could still expose a bytes-based API, but it would
>> have
>> some more wrapping. (We could put a ByteString in Any.) Then the question
>> would just be how to expose this (would be easier in Java, harder in
>> C++).
>>
>>
>>
>> > > David
>> > >
>> > > On 10/21/19, Antoine Pitrou <anto...@python.org> wrote:
>> > > >
>> > > > Can one of you explain what is being proposed in non-protobuf
>> > > > terms?
>> > > > Knowledge of protobuf shouldn't be required to use Flight.
>> > > >
>> > > > Regards
>> > > >
>> > > > Antoine.
>> > > >
>> > > >
>> > > > Le 21/10/2019 à 15:46, David Li a écrit :
>> > > >> Oneof doesn't actually change the wire encoding; it would just be
>> > > >> application-level logic. (The official guide doesn't even mention
>> > > >> it
>> > > >> in the encoding docs; I found
>> > > >>
>> > https://stackoverflow.com/questions/52226409/how-protobuf-encodes-oneof-message-construct
>> > > >> as well.)
>> > > >>
>> > > >> If I follow you, Jacques, then you are proposing essentially
>> > > >> inlining
>> > > >> the definition of Any, e.g.
>> > > >>
>> > > >> message FlightMessage {
>> > > >>   oneof message {
>> > > >>     FlightData data = 1;
>> > > >>     FlightAny metadata = 2;
>> > > >>   }
>> > > >> }
>> > > >>
>> > > >> message FlightAny {
>> > > >>   string type = 1;
>> > > >>   bytes data = 2;
>> > > >> }
>> > > >>
>> > > >> Is this correct?
>> > > >>
>> > > >> It might be nice to consider the wrapper message for DoGet/DoPut
>> > > >> as
>> > > >> well, but at that point, I'd rather we be consistent with all of
>> > > >> them,
>> > > >> rather than have one of the three methods do its own thing.
>> > > >>
>> > > >> Thanks,
>> > > >> David
>> > > >>
>> > > >> On 10/20/19, Jacques Nadeau <jacq...@apache.org> wrote:
>> > > >>> I think we could probably expose the oneof behavior without
>> > > >>> exposing
>> > the
>> > > >>> protobuf functions. On the any... hmm. I guess we could expose as
>> > > >>> two
>> > > >>> fields: type and data. Then users could use it for whatever but
>> > > >>> if
>> > > >>> people
>> > > >>> wanted to treat it as any, it would work. (Basically a user could
>> > > >>> use
>> > > >>> any
>> > > >>> with it easily but they could also use any other mechanism). At
>> > least in
>> > > >>> java, the any concepts are pretty simple/diy. Are other language
>> > > >>> bindings
>> > > >>> less diy?
>> > > >>>
>> > > >>> I'm *not* hardcore against the empty FlightData + metadata but it
>> > just
>> > > >>> seemed a bit janky.
>> > > >>>
>> > > >>> Thinking about the control message/wrapper object thing, I wonder
>> > > >>> if
>> > we
>> > > >>> should redefine DoPut and DoGet to have the same property if we
>> > think it
>> > > >>> is
>> > > >>> a good idea...
>> > > >>>
>> > > >>> On Wed, Oct 16, 2019 at 5:13 PM David Li <li.david...@gmail.com>
>> > wrote:
>> > > >>>
>> > > >>>> I was definitely considering having control messages without
>> > > >>>> data,
>> > and
>> > > >>>> I thought that could be encoded by a FlightData with only
>> > app_metadata
>> > > >>>> set. I think I understand your position now: FlightData should
>> > always
>> > > >>>> carry (some) data (with optional metadata)?
>> > > >>>>
>> > > >>>> That makes sense to me, and is consistent with the documentation
>> > > >>>> on
>> > > >>>> FlightData in the Protobuf file. I was worried about having a
>> > > >>>> redundant metadata field, but oneof prevents that from
>> > > >>>> happening,
>> > and
>> > > >>>> overall having a clear separation between data and control
>> > > >>>> messages
>> > is
>> > > >>>> cleaner.
>> > > >>>>
>> > > >>>> As for using Protobuf's Any: so far, we've refrained from
>> > > >>>> exposing
>> > > >>>> Protobuf by using bytes, would we want to change that now?
>> > > >>>>
>> > > >>>> Best,
>> > > >>>> David
>> > > >>>>
>> > > >>>> On 10/16/19, Jacques Nadeau <jacq...@apache.org> wrote:
>> > > >>>>> Hey David,
>> > > >>>>>
>> > > >>>>> RE: Async: I was trying to match the pattern we use for
>> > > >>>>> doget/doput
>> > > >>>>> for
>> > > >>>>> async. Yes, more thinking java given java grpc's async always
>> > pattern.
>> > > >>>>>
>> > > >>>>> On the comment around the FlightData, I think it is overloading
>> > > >>>>> the
>> > > >>>> message
>> > > >>>>> to use metadata for this. If I want to send a control message
>> > > >>>> independently
>> > > >>>>> of the data message, I would have to define something like an
>> > > >>>>> empty
>> > > >>>> flight
>> > > >>>>> data message that has custom metadata. Why not support a
>> > > >>>>> container
>> > > >>>>> object
>> > > >>>>> with a oneof{FlightData, Any} in it instead so users can add
>> > > >>>>> more
>> > data
>> > > >>>>> as
>> > > >>>>> desired. The default impl could be a noop for the Any messages.
>> > > >>>>>
>> > > >>>>> On Tue, Oct 15, 2019 at 6:50 PM David Li
>> > > >>>>> <li.david...@gmail.com>
>> > > >>>>> wrote:
>> > > >>>>>
>> > > >>>>>> Hi Jacques,
>> > > >>>>>>
>> > > >>>>>> Thanks for the comments.
>> > > >>>>>>
>> > > >>>>>> - I do agree DoExchange is a better name!
>> > > >>>>>> - FlightData already has metadata fields as a result of prior
>> > > >>>>>> proposals, so I don't think we need a new message to carry
>> > > >>>>>> that
>> > kind
>> > > >>>>>> of information.
>> > > >>>>>> - I like the suggestion of an async handler to handle incoming
>> > > >>>>>> messages as the fundamental API; it would actually be quite
>> > natural
>> > > >>>>>> to
>> > > >>>>>> implement in Flight/Java. I will note that it's not possible
>> > > >>>>>> in
>> > > >>>>>> C++/Python without spawning a thread, though. (In essence,
>> > gRPC-Java
>> > > >>>>>> is async-always and gRPC-C++ is sync-always.) There are
>> > experimental
>> > > >>>>>> C++ APIs that would let us do something similar to Java, but
>> > > >>>>>> those
>> > > >>>>>> are
>> > > >>>>>> only in relatively recent gRPC versions and are still under
>> > > >>>>>> development (contrary to the interceptor APIs which have been
>> > around
>> > > >>>>>> for quite a while).
>> > > >>>>>>
>> > > >>>>>> Thanks,
>> > > >>>>>> David
>> > > >>>>>>
>> > > >>>>>> On 10/15/19, Jacques Nadeau <jacq...@apache.org> wrote:
>> > > >>>>>>> I like it. Added some comments to the doc. Might worth
>> > > >>>>>>> discussion
>> > > >>>>>>> here
>> > > >>>>>>> depending on your thoughts.
>> > > >>>>>>>
>> > > >>>>>>> On Tue, Oct 15, 2019 at 7:11 AM David Li
>> > > >>>>>>> <li.david...@gmail.com>
>> > > >>>> wrote:
>> > > >>>>>>>
>> > > >>>>>>>> Hey Ryan,
>> > > >>>>>>>>
>> > > >>>>>>>> Thanks for the comments.
>> > > >>>>>>>>
>> > > >>>>>>>> Concrete example: I've edited the doc to provide a Python
>> > strawman.
>> > > >>>>>>>>
>> > > >>>>>>>> Sync vs async: while I don't touch on it, you could
>> > > >>>>>>>> interleave
>> > > >>>> uploads
>> > > >>>>>>>> and downloads if you were so inclined. Right now,
>> > > >>>>>>>> synchronous
>> > APIs
>> > > >>>>>>>> make this error-prone, e.g. if both client and server wait
>> > > >>>>>>>> for
>> > each
>> > > >>>>>>>> other due to an application logic bug. (gRPC doesn't give us
>> > > >>>>>>>> the
>> > > >>>>>>>> ability to have per-read timeouts, only an overall timeout.)
>> > > >>>>>>>> As
>> > an
>> > > >>>>>>>> example of this happening with DoPut, see ARROW-6063:
>> > > >>>>>>>> https://issues.apache.org/jira/browse/ARROW-6063
>> > > >>>>>>>>
>> > > >>>>>>>> This is mostly tangential though, eventually we will want to
>> > design
>> > > >>>>>>>> asynchronous APIs for Flight as a whole. A bidirectional
>> > > >>>>>>>> stream
>> > > >>>>>>>> like
>> > > >>>>>>>> this (and like DoPut) just makes these pitfalls easier to
>> > > >>>>>>>> run
>> > into.
>> > > >>>>>>>>
>> > > >>>>>>>> Using DoPut+DoGet: I discussed this in the proposal, but the
>> > main
>> > > >>>>>>>> concern is that depending on how you deploy, two separate
>> > > >>>>>>>> calls
>> > > >>>>>>>> could
>> > > >>>>>>>> get routed to different instances. Additionally, gRPC has
>> > > >>>>>>>> some
>> > > >>>>>>>> reconnection behaviors; if the server goes away in between
>> > > >>>>>>>> the
>> > two
>> > > >>>>>>>> calls, but it then restarts or there is another instance
>> > available,
>> > > >>>>>>>> the client will happily reconnect to the new server without
>> > > >>>>>>>> warning.
>> > > >>>>>>>>
>> > > >>>>>>>> Thanks,
>> > > >>>>>>>> David
>> > > >>>>>>>>
>> > > >>>>>>>> On 10/15/19, Ryan Murray <rym...@dremio.com> wrote:
>> > > >>>>>>>>> Hey David,
>> > > >>>>>>>>>
>> > > >>>>>>>>> I think this proposal makes a lot of sense. I like it and
>> > > >>>>>>>>> the
>> > > >>>>>>>>> possibility
>> > > >>>>>>>>> of remote compute via arrow buffers. One thing that would
>> > > >>>>>>>>> help
>> > me
>> > > >>>>>> would
>> > > >>>>>>>> be
>> > > >>>>>>>>> a concrete example of the API in a real life use case.
>> > > >>>>>>>>> Also,
>> > what
>> > > >>>>>> would
>> > > >>>>>>>> the
>> > > >>>>>>>>> client experience be in terms of sync vs asyc? Would the
>> > > >>>>>>>>> client
>> > > >>>>>>>>> block
>> > > >>>>>>>> till
>> > > >>>>>>>>> the bidirectional call return ie c = flight.vector_mult(a,
>> > > >>>>>>>>> b)
>> > or
>> > > >>>>>>>>> would
>> > > >>>>>>>> the
>> > > >>>>>>>>> client wait to be signaled that computation was done. If
>> > > >>>>>>>>> the
>> > > >>>>>>>>> later
>> > > >>>>>>>>> how
>> > > >>>>>>>>> is
>> > > >>>>>>>>> that different from a DoPut then DoGet? I suppose that this
>> > could
>> > > >>>> be
>> > > >>>>>>>>> implemented without extending the RPC interface but rather
>> > > >>>>>>>>> by a
>> > > >>>>>>>>> function/util?
>> > > >>>>>>>>>
>> > > >>>>>>>>>
>> > > >>>>>>>>> Best,
>> > > >>>>>>>>>
>> > > >>>>>>>>> Ryan
>> > > >>>>>>>>>
>> > > >>>>>>>>> On Sun, Oct 13, 2019 at 9:24 PM David Li <
>> > li.david...@gmail.com>
>> > > >>>>>> wrote:
>> > > >>>>>>>>>
>> > > >>>>>>>>>> Hi all,
>> > > >>>>>>>>>>
>> > > >>>>>>>>>> We've been using Flight quite successfully so far, but we
>> > > >>>>>>>>>> have
>> > > >>>>>>>>>> identified a new use case on the horizon: being able to
>> > > >>>>>>>>>> both
>> > > >>>>>>>>>> send
>> > > >>>>>>>>>> and
>> > > >>>>>>>>>> retrieve Arrow data within a single RPC call. To that end,
>> > I've
>> > > >>>>>>>>>> written up a proposal for a new RPC method:
>> > > >>>>>>>>>>
>> > > >>>>>>>>>>
>> > > >>>>>>>>
>> > > >>>>>>
>> > > >>>>
>> > https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing
>> > > >>>>>>>>>>
>> > > >>>>>>>>>> Please let me know if you can't view or comment on the
>> > document.
>> > > >>>>>>>>>> I'd
>> > > >>>>>>>>>> appreciate any feedback; I think this is a relatively
>> > > >>>>>>>>>> straightforward
>> > > >>>>>>>>>> addition - it is essentially "DoPutThenGet".
>> > > >>>>>>>>>>
>> > > >>>>>>>>>> This is a format change and would require a vote. I've
>> > > >>>>>>>>>> decided
>> > > >>>>>>>>>> to
>> > > >>>>>>>>>> table the other format change I had proposed (on DoPut), as
>> > > >>>>>>>>>> it
>> > > >>>>>> doesn't
>> > > >>>>>>>>>> functionally change Flight, just the interpretation of the
>> > > >>>>>>>>>> semantics.
>> > > >>>>>>>>>>
>> > > >>>>>>>>>> Thanks,
>> > > >>>>>>>>>> David
>> > > >>>>>>>>>>
>> > > >>>>>>>>>
>> > > >>>>>>>>>
>> > > >>>>>>>>> --
>> > > >>>>>>>>>
>> > > >>>>>>>>> Ryan Murray  | Principal Consulting Engineer
>> > > >>>>>>>>>
>> > > >>>>>>>>> +447540852009 | rym...@dremio.com
>> > > >>>>>>>>>
>> > > >>>>>>>>> <https://www.dremio.com/>
>> > > >>>>>>>>> Check out our GitHub <https://www.github.com/dremio>, join
>> > > >>>>>>>>> our
>> > > >>>>>>>>> community
>> > > >>>>>>>>> site <https://community.dremio.com/> & Download Dremio
>> > > >>>>>>>>> <https://www.dremio.com/download>
>> > > >>>>>>>>>
>> > > >>>>>>>>
>> > > >>>>>>>
>> > > >>>>>>
>> > > >>>>>
>> > > >>>>
>> > > >>>
>> > > >
>> >
>

Reply via email to