I've been extremely backlogged, I will update the proposal when I get a chance and reply here when done.
Best, David On 11/7/19, Wes McKinney <wesmck...@gmail.com> wrote: > Bumping this discussion since a couple of weeks have passed. It seems > there are still some questions here, could we summarize what are the > alternatives along with any public API implications so we can try to > render a decision? > > On Sat, Oct 26, 2019 at 7:19 PM David Li <li.david...@gmail.com> wrote: >> >> Hi Wes, >> >> Responses inline: >> >> On Sat, Oct 26, 2019, 13:46 Wes McKinney <wesmck...@gmail.com> wrote: >> >> > On Mon, Oct 21, 2019 at 7:40 PM David Li <li.david...@gmail.com> wrote: >> > > >> > > The question is whether to repurpose the existing FlightData >> > > structure, and allow for the metadata field to be filled in and data >> > > fields to be blank (as a control message), or to wrap the FlightData >> > > structure in another structure that explicitly distinguishes between >> > > control and data messages. >> > >> > I'm not super against having metadata-only FlightData with empty body. >> > One question to consider is what changes (if any) would need to be >> > made to public APIs in either scenario. >> > >> >> We could leave DoGet/DoPut as-is for now, and allow empty data messages >> in >> the future. This would be a breaking change, but wouldn't change the wire >> format. I think the APIs could be changed backwards compatibly, though. >> >> >> >> > > The other question is how to handle the metadata fields. So far, >> > > we've >> > > used bytestring fields for application-defined data. This is workable >> > > if you want to use Protobuf to define the contents of those fields, >> > > but requires you to pack/unpack your Protobuf into/from the >> > > bytestring >> > > field. If we instead used the Protobuf Any field, a dynamically typed >> > > field, this would be more convenient, but then we'd be exposing >> > > Protobuf types. We could alternatively use a combination of a type >> > > field and a bytestring field, mimicking what the Protobuf Any type >> > > looks like on the wire. I'm not sure this is actually cleaner in any >> > > of the language APIs, though. >> > >> > Leaving the deserialization of the app metadata to the particular >> > Flight implementation seems on first principles like the most flexible >> > thing, if Any is used, does that mean the metadata _must_ be a >> > protobuf? >> > >> >> >> If Any is used, we could still expose a bytes-based API, but it would >> have >> some more wrapping. (We could put a ByteString in Any.) Then the question >> would just be how to expose this (would be easier in Java, harder in >> C++). >> >> >> >> > > David >> > > >> > > On 10/21/19, Antoine Pitrou <anto...@python.org> wrote: >> > > > >> > > > Can one of you explain what is being proposed in non-protobuf >> > > > terms? >> > > > Knowledge of protobuf shouldn't be required to use Flight. >> > > > >> > > > Regards >> > > > >> > > > Antoine. >> > > > >> > > > >> > > > Le 21/10/2019 à 15:46, David Li a écrit : >> > > >> Oneof doesn't actually change the wire encoding; it would just be >> > > >> application-level logic. (The official guide doesn't even mention >> > > >> it >> > > >> in the encoding docs; I found >> > > >> >> > https://stackoverflow.com/questions/52226409/how-protobuf-encodes-oneof-message-construct >> > > >> as well.) >> > > >> >> > > >> If I follow you, Jacques, then you are proposing essentially >> > > >> inlining >> > > >> the definition of Any, e.g. >> > > >> >> > > >> message FlightMessage { >> > > >> oneof message { >> > > >> FlightData data = 1; >> > > >> FlightAny metadata = 2; >> > > >> } >> > > >> } >> > > >> >> > > >> message FlightAny { >> > > >> string type = 1; >> > > >> bytes data = 2; >> > > >> } >> > > >> >> > > >> Is this correct? >> > > >> >> > > >> It might be nice to consider the wrapper message for DoGet/DoPut >> > > >> as >> > > >> well, but at that point, I'd rather we be consistent with all of >> > > >> them, >> > > >> rather than have one of the three methods do its own thing. >> > > >> >> > > >> Thanks, >> > > >> David >> > > >> >> > > >> On 10/20/19, Jacques Nadeau <jacq...@apache.org> wrote: >> > > >>> I think we could probably expose the oneof behavior without >> > > >>> exposing >> > the >> > > >>> protobuf functions. On the any... hmm. I guess we could expose as >> > > >>> two >> > > >>> fields: type and data. Then users could use it for whatever but >> > > >>> if >> > > >>> people >> > > >>> wanted to treat it as any, it would work. (Basically a user could >> > > >>> use >> > > >>> any >> > > >>> with it easily but they could also use any other mechanism). At >> > least in >> > > >>> java, the any concepts are pretty simple/diy. Are other language >> > > >>> bindings >> > > >>> less diy? >> > > >>> >> > > >>> I'm *not* hardcore against the empty FlightData + metadata but it >> > just >> > > >>> seemed a bit janky. >> > > >>> >> > > >>> Thinking about the control message/wrapper object thing, I wonder >> > > >>> if >> > we >> > > >>> should redefine DoPut and DoGet to have the same property if we >> > think it >> > > >>> is >> > > >>> a good idea... >> > > >>> >> > > >>> On Wed, Oct 16, 2019 at 5:13 PM David Li <li.david...@gmail.com> >> > wrote: >> > > >>> >> > > >>>> I was definitely considering having control messages without >> > > >>>> data, >> > and >> > > >>>> I thought that could be encoded by a FlightData with only >> > app_metadata >> > > >>>> set. I think I understand your position now: FlightData should >> > always >> > > >>>> carry (some) data (with optional metadata)? >> > > >>>> >> > > >>>> That makes sense to me, and is consistent with the documentation >> > > >>>> on >> > > >>>> FlightData in the Protobuf file. I was worried about having a >> > > >>>> redundant metadata field, but oneof prevents that from >> > > >>>> happening, >> > and >> > > >>>> overall having a clear separation between data and control >> > > >>>> messages >> > is >> > > >>>> cleaner. >> > > >>>> >> > > >>>> As for using Protobuf's Any: so far, we've refrained from >> > > >>>> exposing >> > > >>>> Protobuf by using bytes, would we want to change that now? >> > > >>>> >> > > >>>> Best, >> > > >>>> David >> > > >>>> >> > > >>>> On 10/16/19, Jacques Nadeau <jacq...@apache.org> wrote: >> > > >>>>> Hey David, >> > > >>>>> >> > > >>>>> RE: Async: I was trying to match the pattern we use for >> > > >>>>> doget/doput >> > > >>>>> for >> > > >>>>> async. Yes, more thinking java given java grpc's async always >> > pattern. >> > > >>>>> >> > > >>>>> On the comment around the FlightData, I think it is overloading >> > > >>>>> the >> > > >>>> message >> > > >>>>> to use metadata for this. If I want to send a control message >> > > >>>> independently >> > > >>>>> of the data message, I would have to define something like an >> > > >>>>> empty >> > > >>>> flight >> > > >>>>> data message that has custom metadata. Why not support a >> > > >>>>> container >> > > >>>>> object >> > > >>>>> with a oneof{FlightData, Any} in it instead so users can add >> > > >>>>> more >> > data >> > > >>>>> as >> > > >>>>> desired. The default impl could be a noop for the Any messages. >> > > >>>>> >> > > >>>>> On Tue, Oct 15, 2019 at 6:50 PM David Li >> > > >>>>> <li.david...@gmail.com> >> > > >>>>> wrote: >> > > >>>>> >> > > >>>>>> Hi Jacques, >> > > >>>>>> >> > > >>>>>> Thanks for the comments. >> > > >>>>>> >> > > >>>>>> - I do agree DoExchange is a better name! >> > > >>>>>> - FlightData already has metadata fields as a result of prior >> > > >>>>>> proposals, so I don't think we need a new message to carry >> > > >>>>>> that >> > kind >> > > >>>>>> of information. >> > > >>>>>> - I like the suggestion of an async handler to handle incoming >> > > >>>>>> messages as the fundamental API; it would actually be quite >> > natural >> > > >>>>>> to >> > > >>>>>> implement in Flight/Java. I will note that it's not possible >> > > >>>>>> in >> > > >>>>>> C++/Python without spawning a thread, though. (In essence, >> > gRPC-Java >> > > >>>>>> is async-always and gRPC-C++ is sync-always.) There are >> > experimental >> > > >>>>>> C++ APIs that would let us do something similar to Java, but >> > > >>>>>> those >> > > >>>>>> are >> > > >>>>>> only in relatively recent gRPC versions and are still under >> > > >>>>>> development (contrary to the interceptor APIs which have been >> > around >> > > >>>>>> for quite a while). >> > > >>>>>> >> > > >>>>>> Thanks, >> > > >>>>>> David >> > > >>>>>> >> > > >>>>>> On 10/15/19, Jacques Nadeau <jacq...@apache.org> wrote: >> > > >>>>>>> I like it. Added some comments to the doc. Might worth >> > > >>>>>>> discussion >> > > >>>>>>> here >> > > >>>>>>> depending on your thoughts. >> > > >>>>>>> >> > > >>>>>>> On Tue, Oct 15, 2019 at 7:11 AM David Li >> > > >>>>>>> <li.david...@gmail.com> >> > > >>>> wrote: >> > > >>>>>>> >> > > >>>>>>>> Hey Ryan, >> > > >>>>>>>> >> > > >>>>>>>> Thanks for the comments. >> > > >>>>>>>> >> > > >>>>>>>> Concrete example: I've edited the doc to provide a Python >> > strawman. >> > > >>>>>>>> >> > > >>>>>>>> Sync vs async: while I don't touch on it, you could >> > > >>>>>>>> interleave >> > > >>>> uploads >> > > >>>>>>>> and downloads if you were so inclined. Right now, >> > > >>>>>>>> synchronous >> > APIs >> > > >>>>>>>> make this error-prone, e.g. if both client and server wait >> > > >>>>>>>> for >> > each >> > > >>>>>>>> other due to an application logic bug. (gRPC doesn't give us >> > > >>>>>>>> the >> > > >>>>>>>> ability to have per-read timeouts, only an overall timeout.) >> > > >>>>>>>> As >> > an >> > > >>>>>>>> example of this happening with DoPut, see ARROW-6063: >> > > >>>>>>>> https://issues.apache.org/jira/browse/ARROW-6063 >> > > >>>>>>>> >> > > >>>>>>>> This is mostly tangential though, eventually we will want to >> > design >> > > >>>>>>>> asynchronous APIs for Flight as a whole. A bidirectional >> > > >>>>>>>> stream >> > > >>>>>>>> like >> > > >>>>>>>> this (and like DoPut) just makes these pitfalls easier to >> > > >>>>>>>> run >> > into. >> > > >>>>>>>> >> > > >>>>>>>> Using DoPut+DoGet: I discussed this in the proposal, but the >> > main >> > > >>>>>>>> concern is that depending on how you deploy, two separate >> > > >>>>>>>> calls >> > > >>>>>>>> could >> > > >>>>>>>> get routed to different instances. Additionally, gRPC has >> > > >>>>>>>> some >> > > >>>>>>>> reconnection behaviors; if the server goes away in between >> > > >>>>>>>> the >> > two >> > > >>>>>>>> calls, but it then restarts or there is another instance >> > available, >> > > >>>>>>>> the client will happily reconnect to the new server without >> > > >>>>>>>> warning. >> > > >>>>>>>> >> > > >>>>>>>> Thanks, >> > > >>>>>>>> David >> > > >>>>>>>> >> > > >>>>>>>> On 10/15/19, Ryan Murray <rym...@dremio.com> wrote: >> > > >>>>>>>>> Hey David, >> > > >>>>>>>>> >> > > >>>>>>>>> I think this proposal makes a lot of sense. I like it and >> > > >>>>>>>>> the >> > > >>>>>>>>> possibility >> > > >>>>>>>>> of remote compute via arrow buffers. One thing that would >> > > >>>>>>>>> help >> > me >> > > >>>>>> would >> > > >>>>>>>> be >> > > >>>>>>>>> a concrete example of the API in a real life use case. >> > > >>>>>>>>> Also, >> > what >> > > >>>>>> would >> > > >>>>>>>> the >> > > >>>>>>>>> client experience be in terms of sync vs asyc? Would the >> > > >>>>>>>>> client >> > > >>>>>>>>> block >> > > >>>>>>>> till >> > > >>>>>>>>> the bidirectional call return ie c = flight.vector_mult(a, >> > > >>>>>>>>> b) >> > or >> > > >>>>>>>>> would >> > > >>>>>>>> the >> > > >>>>>>>>> client wait to be signaled that computation was done. If >> > > >>>>>>>>> the >> > > >>>>>>>>> later >> > > >>>>>>>>> how >> > > >>>>>>>>> is >> > > >>>>>>>>> that different from a DoPut then DoGet? I suppose that this >> > could >> > > >>>> be >> > > >>>>>>>>> implemented without extending the RPC interface but rather >> > > >>>>>>>>> by a >> > > >>>>>>>>> function/util? >> > > >>>>>>>>> >> > > >>>>>>>>> >> > > >>>>>>>>> Best, >> > > >>>>>>>>> >> > > >>>>>>>>> Ryan >> > > >>>>>>>>> >> > > >>>>>>>>> On Sun, Oct 13, 2019 at 9:24 PM David Li < >> > li.david...@gmail.com> >> > > >>>>>> wrote: >> > > >>>>>>>>> >> > > >>>>>>>>>> Hi all, >> > > >>>>>>>>>> >> > > >>>>>>>>>> We've been using Flight quite successfully so far, but we >> > > >>>>>>>>>> have >> > > >>>>>>>>>> identified a new use case on the horizon: being able to >> > > >>>>>>>>>> both >> > > >>>>>>>>>> send >> > > >>>>>>>>>> and >> > > >>>>>>>>>> retrieve Arrow data within a single RPC call. To that end, >> > I've >> > > >>>>>>>>>> written up a proposal for a new RPC method: >> > > >>>>>>>>>> >> > > >>>>>>>>>> >> > > >>>>>>>> >> > > >>>>>> >> > > >>>> >> > https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing >> > > >>>>>>>>>> >> > > >>>>>>>>>> Please let me know if you can't view or comment on the >> > document. >> > > >>>>>>>>>> I'd >> > > >>>>>>>>>> appreciate any feedback; I think this is a relatively >> > > >>>>>>>>>> straightforward >> > > >>>>>>>>>> addition - it is essentially "DoPutThenGet". >> > > >>>>>>>>>> >> > > >>>>>>>>>> This is a format change and would require a vote. I've >> > > >>>>>>>>>> decided >> > > >>>>>>>>>> to >> > > >>>>>>>>>> table the other format change I had proposed (on DoPut), as >> > > >>>>>>>>>> it >> > > >>>>>> doesn't >> > > >>>>>>>>>> functionally change Flight, just the interpretation of the >> > > >>>>>>>>>> semantics. >> > > >>>>>>>>>> >> > > >>>>>>>>>> Thanks, >> > > >>>>>>>>>> David >> > > >>>>>>>>>> >> > > >>>>>>>>> >> > > >>>>>>>>> >> > > >>>>>>>>> -- >> > > >>>>>>>>> >> > > >>>>>>>>> Ryan Murray | Principal Consulting Engineer >> > > >>>>>>>>> >> > > >>>>>>>>> +447540852009 | rym...@dremio.com >> > > >>>>>>>>> >> > > >>>>>>>>> <https://www.dremio.com/> >> > > >>>>>>>>> Check out our GitHub <https://www.github.com/dremio>, join >> > > >>>>>>>>> our >> > > >>>>>>>>> community >> > > >>>>>>>>> site <https://community.dremio.com/> & Download Dremio >> > > >>>>>>>>> <https://www.dremio.com/download> >> > > >>>>>>>>> >> > > >>>>>>>> >> > > >>>>>>> >> > > >>>>>> >> > > >>>>> >> > > >>>> >> > > >>> >> > > > >> > >