Following up here, I've submitted a draft implementation for C++: https://github.com/apache/arrow/pull/6656
The core functionality is there, but there are still gaps that I need to
fill in. Compared to the draft spec, the client also sends a
FlightDescriptor to begin with, though it's currently not exposed. This
provides consistency with DoGet/DoPut, which also send an initial message
to describe the stream to the server.

Andy, I hope this helps clarify whether it meets your needs.

Best,
David

On 2/25/20, David Li <li.david...@gmail.com> wrote:
> Hey Andy,
>
> I've been rather busy unfortunately. I had started on an
> implementation in C++ to provide as part of this discussion, but it's
> not complete. I'm hoping to have more done in March.
>
> Best,
> David
>
> On 2/25/20, Andy Grove <andygrov...@gmail.com> wrote:
> I was wondering if there had been any momentum on this (the
> BiDirectional RPC design)?
>
> I'm interested in this for the use case of Apache Spark sending a
> stream of data to another process to invoke custom code and then
> receive a stream back with the transformed data.
>
> Thanks,
>
> Andy.
>
> On Fri, Dec 13, 2019 at 12:12 PM Jacques Nadeau <jacq...@apache.org> wrote:
> I support moving forward with the current proposal.
>
> On Thu, Dec 12, 2019 at 12:20 PM David Li <li.david...@gmail.com> wrote:
> Just following up here again, any other thoughts?
>
> I think we do have justifications for potentially separate streams in
> a call, but that's more of an orthogonal question - it doesn't need to
> be addressed here. I do agree that it very much complicates things.
>
> Thanks,
> David
>
> On 11/29/19, Wes McKinney <wesmck...@gmail.com> wrote:
> I would generally agree with this.
> Note that you have the possibility to use unions-of-structs to send
> record batches with different schemas in the same stream, though with
> some added complexity on each side.
>
> On Thu, Nov 28, 2019 at 10:37 AM Jacques Nadeau <jacq...@apache.org> wrote:
> I'd vote for explicitly not supported. We should keep our primitives
> narrow.
>
> On Wed, Nov 27, 2019, 1:17 PM David Li <li.david...@gmail.com> wrote:
> Thanks for the feedback.
>
> I do think if we had explicitly embraced gRPC from the beginning,
> there are a lot of places where things could be made more ergonomic,
> including with the metadata fields. But it would also have locked us
> out of potential future transports.
>
> On another note: I hesitate to put too much into this method, but we
> are looking at use cases where potentially, a client may want to
> upload multiple distinct datasets (with differing schemas). (This is a
> little tentative, and I can get more details...) Right now, each
> logical stream in Flight must have a single, consistent schema; would
> it make sense to look at ways to relax this, or declare this
> explicitly out of scope (and require multiple calls and coordination
> with the deployment topology) in order to accomplish this?
>
> Best,
> David
>
> On 11/27/19, Jacques Nadeau <jacq...@apache.org> wrote:
> Fair enough. I'm okay with the bytes approach and the proposal looks
> good to me.
>
> On Fri, Nov 8, 2019 at 11:37 AM David Li <li.david...@gmail.com> wrote:
> I've updated the proposal.
>
> On the subject of Protobuf Any vs bytes, and how to handle
> errors/metadata, I still think using bytes is preferable:
> - It doesn't require (conditionally) exposing or wrapping Protobuf
>   types,
> - We wouldn't be able to practically expose the Protobuf field to C++
>   users without causing build pains,
> - We can't let Python users take advantage of the Protobuf field
>   without somehow being compatible with the Protobuf wheels (by
>   linking to the same version, and doing magic to turn the C++
>   Protobufs into the Python ones),
> - All our other application-defined fields are already bytes.
>
> Applications that want structure can encode JSON or Protobuf Any into
> the bytes field themselves, much as you can already do for Ticket,
> commands in FlightDescriptors, and application metadata in
> DoGet/DoPut. I don't think this is (much) less efficient than using
> Any directly, since Any itself is a bytes field with a tag, and must
> invoke the Protobuf deserializer again to read the actual message.
>
> If we decide on using bytes, then I don't think it makes sense to
> define a new message with a oneof either, since it would be redundant.
>
> Thanks,
> David
>
> On 11/7/19, David Li <li.david...@gmail.com> wrote:
> I've been extremely backlogged, I will update the proposal when I get
> a chance and reply here when done.
>
> Best,
> David
>
> On 11/7/19, Wes McKinney <wesmck...@gmail.com> wrote:
> Bumping this discussion since a couple of weeks have passed. It seems
> there are still some questions here, could we summarize what are the
> alternatives along with any public API implications so we can try to
> render a decision?
>
> On Sat, Oct 26, 2019 at 7:19 PM David Li <li.david...@gmail.com> wrote:
> Hi Wes,
>
> Responses inline:
>
> On Sat, Oct 26, 2019, 13:46 Wes McKinney <wesmck...@gmail.com> wrote:
> > On Mon, Oct 21, 2019 at 7:40 PM David Li <li.david...@gmail.com> wrote:
> > > The question is whether to repurpose the existing FlightData
> > > structure, and allow for the metadata field to be filled in and
> > > data fields to be blank (as a control message), or to wrap the
> > > FlightData structure in another structure that explicitly
> > > distinguishes between control and data messages.
> >
> > I'm not super against having metadata-only FlightData with empty
> > body. One question to consider is what changes (if any) would need
> > to be made to public APIs in either scenario.
>
> We could leave DoGet/DoPut as-is for now, and allow empty data
> messages in the future. This would be a breaking change, but wouldn't
> change the wire format. I think the APIs could be changed backwards
> compatibly, though.
>
> > > The other question is how to handle the metadata fields. So far,
> > > we've used bytestring fields for application-defined data. This is
> > > workable if you want to use Protobuf to define the contents of
> > > those fields, but requires you to pack/unpack your Protobuf
> > > into/from the bytestring field. If we instead used the Protobuf
> > > Any field, a dynamically typed field, this would be more
> > > convenient, but then we'd be exposing Protobuf types. We could
> > > alternatively use a combination of a type field and a bytestring
> > > field, mimicking what the Protobuf Any type looks like on the
> > > wire. I'm not sure this is actually cleaner in any of the language
> > > APIs, though.
> >
> > Leaving the deserialization of the app metadata to the particular
> > Flight implementation seems on first principles like the most
> > flexible thing. If Any is used, does that mean the metadata _must_
> > be a protobuf?
>
> If Any is used, we could still expose a bytes-based API, but it would
> have some more wrapping. (We could put a ByteString in Any.) Then the
> question would just be how to expose this (would be easier in Java,
> harder in C++).
>
> > > David
>
> On 10/21/19, Antoine Pitrou <anto...@python.org> wrote:
> Can one of you explain what is being proposed in non-protobuf terms?
> Knowledge of protobuf shouldn't be required to use Flight.
>
> Regards
>
> Antoine.
>
> On 21/10/2019 at 15:46, David Li wrote:
> Oneof doesn't actually change the wire encoding; it would just be
> application-level logic.
> (The official guide doesn't even mention it in the encoding docs; I
> found
> https://stackoverflow.com/questions/52226409/how-protobuf-encodes-oneof-message-construct
> as well.)
>
> If I follow you, Jacques, then you are proposing essentially inlining
> the definition of Any, e.g.
>
> message FlightMessage {
>   oneof message {
>     FlightData data = 1;
>     FlightAny metadata = 2;
>   }
> }
>
> message FlightAny {
>   string type = 1;
>   bytes data = 2;
> }
>
> Is this correct?
>
> It might be nice to consider the wrapper message for DoGet/DoPut as
> well, but at that point, I'd rather we be consistent with all of them,
> rather than have one of the three methods do its own thing.
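
The `type` + `data` pairing in the FlightAny definition above can be illustrated in plain Python. This is only a rough sketch and not part of Flight's API: the `FlightAny` dataclass, `pack_json`, `unpack`, the `DECODERS` table, and the `application/json` type tag are all hypothetical names chosen for illustration. It shows how an application could dispatch on the type tag while still being free to ignore it and treat the payload as opaque bytes.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class FlightAny:
    """Hypothetical stand-in for the FlightAny message: a type tag plus opaque bytes."""
    type: str
    data: bytes


def pack_json(type_name: str, payload: dict) -> FlightAny:
    """Serialize an application-defined payload as JSON into the bytes field."""
    return FlightAny(type=type_name, data=json.dumps(payload).encode("utf-8"))


# Assumed registry mapping type tags to decoders; applications would add their own.
DECODERS: Dict[str, Callable[[bytes], object]] = {
    "application/json": lambda raw: json.loads(raw.decode("utf-8")),
}


def unpack(message: FlightAny) -> object:
    """Decode the payload if the type tag is known, else return the raw bytes."""
    decoder = DECODERS.get(message.type)
    if decoder is None:
        return message.data
    return decoder(message.data)


if __name__ == "__main__":
    control = pack_json("application/json", {"op": "cancel"})
    print(unpack(control))
    print(unpack(FlightAny(type="x-unknown", data=b"\x00\x01")))
```

Returning the raw bytes for unknown tags mirrors the "users could use it for whatever" idea: nothing in the envelope forces a particular serialization.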
>
> Thanks,
> David
>
> On 10/20/19, Jacques Nadeau <jacq...@apache.org> wrote:
> I think we could probably expose the oneof behavior without exposing
> the protobuf functions. On the any... hmm. I guess we could expose as
> two fields: type and data. Then users could use it for whatever but if
> people wanted to treat it as any, it would work. (Basically a user
> could use any with it easily but they could also use any other
> mechanism). At least in java, the any concepts are pretty simple/diy.
> Are other language bindings less diy?
>
> I'm *not* hardcore against the empty FlightData + metadata but it just
> seemed a bit janky.
>
> Thinking about the control message/wrapper object thing, I wonder if
> we should redefine DoPut and DoGet to have the same property if we
> think it is a good idea...
>
> On Wed, Oct 16, 2019 at 5:13 PM David Li <li.david...@gmail.com> wrote:
> I was definitely considering having control messages without data, and
> I thought that could be encoded by a FlightData with only app_metadata
> set. I think I understand your position now: FlightData should always
> carry (some) data (with optional metadata)?
>
> That makes sense to me, and is consistent with the documentation on
> FlightData in the Protobuf file. I was worried about having a
> redundant metadata field, but oneof prevents that from happening, and
> overall having a clear separation between data and control messages is
> cleaner.
>
> As for using Protobuf's Any: so far, we've refrained from exposing
> Protobuf by using bytes, would we want to change that now?
>
> Best,
> David
>
> On 10/16/19, Jacques Nadeau <jacq...@apache.org> wrote:
> Hey David,
>
> RE: Async: I was trying to match the pattern we use for doget/doput
> for async. Yes, more thinking java given java grpc's async always
> pattern.
>
> On the comment around the FlightData, I think it is overloading the
> message to use metadata for this. If I want to send a control message
> independently of the data message, I would have to define something
> like an empty flight data message that has custom metadata.
> Why not support a container object with a oneof{FlightData, Any} in it
> instead, so users can add more data as desired? The default impl could
> be a noop for the Any messages.
>
> On Tue, Oct 15, 2019 at 6:50 PM David Li <li.david...@gmail.com> wrote:
> Hi Jacques,
>
> Thanks for the comments.
>
> - I do agree DoExchange is a better name!
> - FlightData already has metadata fields as a result of prior
>   proposals, so I don't think we need a new message to carry that kind
>   of information.
> - I like the suggestion of an async handler to handle incoming
>   messages as the fundamental API; it would actually be quite natural
>   to implement in Flight/Java. I will note that it's not possible in
>   C++/Python without spawning a thread, though.
>   (In essence, gRPC-Java is async-always and gRPC-C++ is sync-always.)
>   There are experimental C++ APIs that would let us do something
>   similar to Java, but those are only in relatively recent gRPC
>   versions and are still under development (contrary to the
>   interceptor APIs which have been around for quite a while).
>
> Thanks,
> David
>
> On 10/15/19, Jacques Nadeau <jacq...@apache.org> wrote:
> I like it. Added some comments to the doc. Might be worth discussion
> here depending on your thoughts.
>
> On Tue, Oct 15, 2019 at 7:11 AM David Li <li.david...@gmail.com> wrote:
> Hey Ryan,
>
> Thanks for the comments.
>
> Concrete example: I've edited the doc to provide a Python strawman.
>
> Sync vs async: while I don't touch on it, you could interleave uploads
> and downloads if you were so inclined. Right now, synchronous APIs
> make this error-prone, e.g. if both client and server wait for each
> other due to an application logic bug. (gRPC doesn't give us the
> ability to have per-read timeouts, only an overall timeout.) As an
> example of this happening with DoPut, see ARROW-6063:
> https://issues.apache.org/jira/browse/ARROW-6063
>
> This is mostly tangential though, eventually we will want to design
> asynchronous APIs for Flight as a whole. A bidirectional stream like
> this (and like DoPut) just makes these pitfalls easier to run into.
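
The pitfall described above can be modeled without Flight or gRPC at all. The following is a toy sketch under stated assumptions: two bounded `queue.Queue` objects stand in for flow-controlled streams, and `server`, `exchange`, and `SENTINEL` are hypothetical names, not Flight APIs. If the client wrote every value before reading any response, both sides would block once the small buffers fill; uploading from a background thread while the main thread drains responses avoids that.

```python
import queue
import threading

SENTINEL = None  # marks end of stream in this toy model; values must not be None


def server(requests: queue.Queue, responses: queue.Queue) -> None:
    """Toy server: doubles each value and sends it back as soon as it arrives."""
    while True:
        item = requests.get()
        if item is SENTINEL:
            responses.put(SENTINEL)
            return
        responses.put(item * 2)


def exchange(values, buffer_size=2):
    """Interleave uploads and downloads over two bounded queues."""
    requests = queue.Queue(maxsize=buffer_size)
    responses = queue.Queue(maxsize=buffer_size)
    threading.Thread(target=server, args=(requests, responses), daemon=True).start()

    def upload():
        # Runs concurrently with the read loop below, so a full buffer on
        # either side is always drained by the other thread.
        for value in values:
            requests.put(value)
        requests.put(SENTINEL)

    writer = threading.Thread(target=upload, daemon=True)
    writer.start()

    results = []
    while True:
        item = responses.get()
        if item is SENTINEL:
            break
        results.append(item)
    writer.join()
    return results


if __name__ == "__main__":
    print(exchange(list(range(10))))
```

Calling `upload()` inline before the read loop, with more values than the buffers can hold, reproduces the mutual wait: the server blocks on a full response queue while the client blocks on a full request queue.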
>
> Using DoPut+DoGet: I discussed this in the proposal, but the main
> concern is that depending on how you deploy, two separate calls could
> get routed to different instances. Additionally, gRPC has some
> reconnection behaviors; if the server goes away in between the two
> calls, but it then restarts or there is another instance available,
> the client will happily reconnect to the new server without warning.
>
> Thanks,
> David
>
> On 10/15/19, Ryan Murray <rym...@dremio.com> wrote:
> Hey David,
>
> I think this proposal makes a lot of sense. I like it and the
> possibility of remote compute via arrow buffers.
> One thing that would help me would be a concrete example of the API in
> a real life use case. Also, what would the client experience be in
> terms of sync vs async? Would the client block till the bidirectional
> call returns, i.e. c = flight.vector_mult(a, b), or would the client
> wait to be signaled that computation was done? If the latter, how is
> that different from a DoPut then DoGet? I suppose that this could be
> implemented without extending the RPC interface but rather by a
> function/util?
>
> Best,
>
> Ryan
>
> On Sun, Oct 13, 2019 at 9:24 PM David Li <li.david...@gmail.com> wrote:
> Hi all,
>
> We've been using Flight quite successfully so far, but we have
> identified a new use case on the horizon: being able to both send and
> retrieve Arrow data within a single RPC call. To that end, I've
> written up a proposal for a new RPC method:
>
> https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing
>
> Please let me know if you can't view or comment on the document.
> I'd appreciate any feedback; I think this is a relatively
> straightforward addition - it is essentially "DoPutThenGet".
>
> This is a format change and would require a vote. I've decided to
> table the other format change I had proposed (on DoPut), as it doesn't
> functionally change Flight, just the interpretation of the semantics.
>
> Thanks,
> David
>
> --
> Ryan Murray | Principal Consulting Engineer
> +447540852009 | rym...@dremio.com
> <https://www.dremio.com/>
> Check out our GitHub <https://www.github.com/dremio>, join our
> community site <https://community.dremio.com/> & Download Dremio
> <https://www.dremio.com/download>