Hi Wes, Responses inline:
On Sat, Oct 26, 2019, 13:46 Wes McKinney <wesmck...@gmail.com> wrote: > On Mon, Oct 21, 2019 at 7:40 PM David Li <li.david...@gmail.com> wrote: > > > > The question is whether to repurpose the existing FlightData > > structure, and allow for the metadata field to be filled in and data > > fields to be blank (as a control message), or to wrap the FlightData > > structure in another structure that explicitly distinguishes between > > control and data messages. > > I'm not super against having metadata-only FlightData with empty body. > One question to consider is what changes (if any) would need to be > made to public APIs in either scenario. > We could leave DoGet/DoPut as-is for now, and allow empty data messages in the future. This would be a breaking change, but wouldn't change the wire format. I think the APIs could be changed backwards compatibly, though. > > The other question is how to handle the metadata fields. So far, we've > > used bytestring fields for application-defined data. This is workable > > if you want to use Protobuf to define the contents of those fields, > > but requires you to pack/unpack your Protobuf into/from the bytestring > > field. If we instead used the Protobuf Any field, a dynamically typed > > field, this would be more convenient, but then we'd be exposing > > Protobuf types. We could alternatively use a combination of a type > > field and a bytestring field, mimicking what the Protobuf Any type > > looks like on the wire. I'm not sure this is actually cleaner in any > > of the language APIs, though. > > Leaving the deserialization of the app metadata to the particular > Flight implementation seems on first principles like the most flexible > thing, if Any is used, does that mean the metadata _must_ be a > protobuf? > If Any is used, we could still expose a bytes-based API, but it would have some more wrapping. (We could put a ByteString in Any.) Then the question would just be how to expose this (would be easier in Java, harder in C++). > > David > > > > On 10/21/19, Antoine Pitrou <anto...@python.org> wrote: > > > > > > Can one of you explain what is being proposed in non-protobuf terms? > > > Knowledge of protobuf shouldn't be required to use Flight. > > > > > > Regards > > > > > > Antoine. > > > > > > > > > Le 21/10/2019 à 15:46, David Li a écrit : > > >> Oneof doesn't actually change the wire encoding; it would just be > > >> application-level logic. (The official guide doesn't even mention it > > >> in the encoding docs; I found > > >> > https://stackoverflow.com/questions/52226409/how-protobuf-encodes-oneof-message-construct > > >> as well.) > > >> > > >> If I follow you, Jacques, then you are proposing essentially inlining > > >> the definition of Any, e.g. > > >> > > >> message FlightMessage { > > >> oneof message { > > >> FlightData data = 1; > > >> FlightAny metadata = 2; > > >> } > > >> } > > >> > > >> message FlightAny { > > >> string type = 1; > > >> bytes data = 2; > > >> } > > >> > > >> Is this correct? > > >> > > >> It might be nice to consider the wrapper message for DoGet/DoPut as > > >> well, but at that point, I'd rather we be consistent with all of them, > > >> rather than have one of the three methods do its own thing. > > >> > > >> Thanks, > > >> David > > >> > > >> On 10/20/19, Jacques Nadeau <jacq...@apache.org> wrote: > > >>> I think we could probably expose the oneof behavior without exposing > the > > >>> protobuf functions. On the any... hmm. I guess we could expose as two > > >>> fields: type and data. Then users could use it for whatever but if > > >>> people > > >>> wanted to treat it as any, it would work. (Basically a user could use > > >>> any > > >>> with it easily but they could also use any other mechanism). At > least in > > >>> java, the any concepts are pretty simple/diy. Are other language > > >>> bindings > > >>> less diy? > > >>> > > >>> I'm *not* hardcore against the empty FlightData + metadata but it > just > > >>> seemed a bit janky. > > >>> > > >>> Thinking about the control message/wrapper object thing, I wonder if > we > > >>> should redefine DoPut and DoGet to have the same property if we > think it > > >>> is > > >>> a good idea... > > >>> > > >>> On Wed, Oct 16, 2019 at 5:13 PM David Li <li.david...@gmail.com> > wrote: > > >>> > > >>>> I was definitely considering having control messages without data, > and > > >>>> I thought that could be encoded by a FlightData with only > app_metadata > > >>>> set. I think I understand your position now: FlightData should > always > > >>>> carry (some) data (with optional metadata)? > > >>>> > > >>>> That makes sense to me, and is consistent with the documentation on > > >>>> FlightData in the Protobuf file. I was worried about having a > > >>>> redundant metadata field, but oneof prevents that from happening, > and > > >>>> overall having a clear separation between data and control messages > is > > >>>> cleaner. > > >>>> > > >>>> As for using Protobuf's Any: so far, we've refrained from exposing > > >>>> Protobuf by using bytes, would we want to change that now? > > >>>> > > >>>> Best, > > >>>> David > > >>>> > > >>>> On 10/16/19, Jacques Nadeau <jacq...@apache.org> wrote: > > >>>>> Hey David, > > >>>>> > > >>>>> RE: Async: I was trying to match the pattern we use for doget/doput > > >>>>> for > > >>>>> async. Yes, more thinking java given java grpc's async always > pattern. > > >>>>> > > >>>>> On the comment around the FlightData, I think it is overloading the > > >>>> message > > >>>>> to use metadata for this. If I want to send a control message > > >>>> independently > > >>>>> of the data message, I would have to define something like an empty > > >>>> flight > > >>>>> data message that has custom metadata. Why not support a container > > >>>>> object > > >>>>> with a oneof{FlightData, Any} in it instead so users can add more > data > > >>>>> as > > >>>>> desired. The default impl could be a noop for the Any messages. > > >>>>> > > >>>>> On Tue, Oct 15, 2019 at 6:50 PM David Li <li.david...@gmail.com> > > >>>>> wrote: > > >>>>> > > >>>>>> Hi Jacques, > > >>>>>> > > >>>>>> Thanks for the comments. > > >>>>>> > > >>>>>> - I do agree DoExchange is a better name! > > >>>>>> - FlightData already has metadata fields as a result of prior > > >>>>>> proposals, so I don't think we need a new message to carry that > kind > > >>>>>> of information. > > >>>>>> - I like the suggestion of an async handler to handle incoming > > >>>>>> messages as the fundamental API; it would actually be quite > natural > > >>>>>> to > > >>>>>> implement in Flight/Java. I will note that it's not possible in > > >>>>>> C++/Python without spawning a thread, though. (In essence, > gRPC-Java > > >>>>>> is async-always and gRPC-C++ is sync-always.) There are > experimental > > >>>>>> C++ APIs that would let us do something similar to Java, but those > > >>>>>> are > > >>>>>> only in relatively recent gRPC versions and are still under > > >>>>>> development (contrary to the interceptor APIs which have been > around > > >>>>>> for quite a while). > > >>>>>> > > >>>>>> Thanks, > > >>>>>> David > > >>>>>> > > >>>>>> On 10/15/19, Jacques Nadeau <jacq...@apache.org> wrote: > > >>>>>>> I like it. Added some comments to the doc. Might worth discussion > > >>>>>>> here > > >>>>>>> depending on your thoughts. > > >>>>>>> > > >>>>>>> On Tue, Oct 15, 2019 at 7:11 AM David Li <li.david...@gmail.com> > > >>>> wrote: > > >>>>>>> > > >>>>>>>> Hey Ryan, > > >>>>>>>> > > >>>>>>>> Thanks for the comments. > > >>>>>>>> > > >>>>>>>> Concrete example: I've edited the doc to provide a Python > strawman. > > >>>>>>>> > > >>>>>>>> Sync vs async: while I don't touch on it, you could interleave > > >>>> uploads > > >>>>>>>> and downloads if you were so inclined. Right now, synchronous > APIs > > >>>>>>>> make this error-prone, e.g. if both client and server wait for > each > > >>>>>>>> other due to an application logic bug. (gRPC doesn't give us the > > >>>>>>>> ability to have per-read timeouts, only an overall timeout.) As > an > > >>>>>>>> example of this happening with DoPut, see ARROW-6063: > > >>>>>>>> https://issues.apache.org/jira/browse/ARROW-6063 > > >>>>>>>> > > >>>>>>>> This is mostly tangential though, eventually we will want to > design > > >>>>>>>> asynchronous APIs for Flight as a whole. A bidirectional stream > > >>>>>>>> like > > >>>>>>>> this (and like DoPut) just makes these pitfalls easier to run > into. > > >>>>>>>> > > >>>>>>>> Using DoPut+DoGet: I discussed this in the proposal, but the > main > > >>>>>>>> concern is that depending on how you deploy, two separate calls > > >>>>>>>> could > > >>>>>>>> get routed to different instances. Additionally, gRPC has some > > >>>>>>>> reconnection behaviors; if the server goes away in between the > two > > >>>>>>>> calls, but it then restarts or there is another instance > available, > > >>>>>>>> the client will happily reconnect to the new server without > > >>>>>>>> warning. > > >>>>>>>> > > >>>>>>>> Thanks, > > >>>>>>>> David > > >>>>>>>> > > >>>>>>>> On 10/15/19, Ryan Murray <rym...@dremio.com> wrote: > > >>>>>>>>> Hey David, > > >>>>>>>>> > > >>>>>>>>> I think this proposal makes a lot of sense. I like it and the > > >>>>>>>>> possibility > > >>>>>>>>> of remote compute via arrow buffers. One thing that would help > me > > >>>>>> would > > >>>>>>>> be > > >>>>>>>>> a concrete example of the API in a real life use case. Also, > what > > >>>>>> would > > >>>>>>>> the > > >>>>>>>>> client experience be in terms of sync vs asyc? Would the client > > >>>>>>>>> block > > >>>>>>>> till > > >>>>>>>>> the bidirectional call return ie c = flight.vector_mult(a, b) > or > > >>>>>>>>> would > > >>>>>>>> the > > >>>>>>>>> client wait to be signaled that computation was done. If the > > >>>>>>>>> later > > >>>>>>>>> how > > >>>>>>>>> is > > >>>>>>>>> that different from a DoPut then DoGet? I suppose that this > could > > >>>> be > > >>>>>>>>> implemented without extending the RPC interface but rather by a > > >>>>>>>>> function/util? > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> Best, > > >>>>>>>>> > > >>>>>>>>> Ryan > > >>>>>>>>> > > >>>>>>>>> On Sun, Oct 13, 2019 at 9:24 PM David Li < > li.david...@gmail.com> > > >>>>>> wrote: > > >>>>>>>>> > > >>>>>>>>>> Hi all, > > >>>>>>>>>> > > >>>>>>>>>> We've been using Flight quite successfully so far, but we have > > >>>>>>>>>> identified a new use case on the horizon: being able to both > > >>>>>>>>>> send > > >>>>>>>>>> and > > >>>>>>>>>> retrieve Arrow data within a single RPC call. To that end, > I've > > >>>>>>>>>> written up a proposal for a new RPC method: > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>> > > >>>>>> > > >>>> > https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing > > >>>>>>>>>> > > >>>>>>>>>> Please let me know if you can't view or comment on the > document. > > >>>>>>>>>> I'd > > >>>>>>>>>> appreciate any feedback; I think this is a relatively > > >>>>>>>>>> straightforward > > >>>>>>>>>> addition - it is essentially "DoPutThenGet". > > >>>>>>>>>> > > >>>>>>>>>> This is a format change and would require a vote. I've decided > > >>>>>>>>>> to > > >>>>>>>>>> table the other format change I had proposed (on DoPut), as it > > >>>>>> doesn't > > >>>>>>>>>> functionally change Flight, just the interpretation of the > > >>>>>>>>>> semantics. > > >>>>>>>>>> > > >>>>>>>>>> Thanks, > > >>>>>>>>>> David > > >>>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> -- > > >>>>>>>>> > > >>>>>>>>> Ryan Murray | Principal Consulting Engineer > > >>>>>>>>> > > >>>>>>>>> +447540852009 | rym...@dremio.com > > >>>>>>>>> > > >>>>>>>>> <https://www.dremio.com/> > > >>>>>>>>> Check out our GitHub <https://www.github.com/dremio>, join our > > >>>>>>>>> community > > >>>>>>>>> site <https://community.dremio.com/> & Download Dremio > > >>>>>>>>> <https://www.dremio.com/download> > > >>>>>>>>> > > >>>>>>>> > > >>>>>>> > > >>>>>> > > >>>>> > > >>>> > > >>> > > > >