Hey Wes,

Thanks for the review. I've broken out the format change into this PR:
https://github.com/apache/arrow/pull/6686

Best,
David

On 3/22/20, Wes McKinney <wesmck...@gmail.com> wrote:
> hi David,
>
> I did a preliminary review and things look to be on the right track
> there. What do you think about breaking out the protocol changes (and
> adding appropriate comments) so we can have a vote on that in
> relatively short order?
>
> - Wes
>
> On Wed, Mar 18, 2020 at 9:06 AM David Li <li.david...@gmail.com> wrote:
>>
>> Following up here, I've submitted a draft implementation for C++:
>> https://github.com/apache/arrow/pull/6656
>>
>> The core functionality is there, but there are still holes that I need
>> to fill. Compared to the draft spec, the client also sends a
>> FlightDescriptor to begin with, though it's currently not exposed.
>> This provides consistency with DoGet/DoPut which also send a message
>> to begin with to describe the stream to the server.
>>
>> Andy, I hope this helps clarify whether it meets your needs.
>>
>> Best,
>> David
>>
>> On 2/25/20, David Li <li.david...@gmail.com> wrote:
>> > Hey Andy,
>> >
>> > I've been rather busy unfortunately. I had started on an
>> > implementation in C++ to provide as part of this discussion, but it's
>> > not complete. I'm hoping to have more done in March.
>> >
>> > Best,
>> > David
>> >
>> > On 2/25/20, Andy Grove <andygrov...@gmail.com> wrote:
>> >> I was wondering if there had been any momentum on this (the
>> >> BiDirectional RPC design)?
>> >>
>> >> I'm interested in this for the use case of Apache Spark sending a
>> >> stream of data to another process to invoke custom code and then
>> >> receive a stream back with the transformed data.
>> >>
>> >> Thanks,
>> >>
>> >> Andy.
>> >>
>> >> On Fri, Dec 13, 2019 at 12:12 PM Jacques Nadeau <jacq...@apache.org>
>> >> wrote:
>> >>
>> >>> I support moving forward with the current proposal.
>> >>>
>> >>> On Thu, Dec 12, 2019 at 12:20 PM David Li <li.david...@gmail.com>
>> >>> wrote:
>> >>>
>> >>> > Just following up here again, any other thoughts?
>> >>> >
>> >>> > I think we do have justifications for potentially separate streams
>> >>> > in a call, but that's more of an orthogonal question - it doesn't
>> >>> > need to be addressed here. I do agree that it very much
>> >>> > complicates things.
>> >>> >
>> >>> > Thanks,
>> >>> > David
>> >>> >
>> >>> > On 11/29/19, Wes McKinney <wesmck...@gmail.com> wrote:
>> >>> > > I would generally agree with this. Note that you have the
>> >>> > > possibility to use unions-of-structs to send record batches
>> >>> > > with different schemas in the same stream, though with some
>> >>> > > added complexity on each side
>> >>> > >
>> >>> > > On Thu, Nov 28, 2019 at 10:37 AM Jacques Nadeau
>> >>> > > <jacq...@apache.org> wrote:
>> >>> > >>
>> >>> > >> I'd vote for explicitly not supported. We should keep our
>> >>> > >> primitives narrow.
>> >>> > >>
>> >>> > >> On Wed, Nov 27, 2019, 1:17 PM David Li <li.david...@gmail.com>
>> >>> > >> wrote:
>> >>> > >>
>> >>> > >> > Thanks for the feedback.
>> >>> > >> >
>> >>> > >> > I do think if we had explicitly embraced gRPC from the
>> >>> > >> > beginning, there are a lot of places where things could be
>> >>> > >> > made more ergonomic, including with the metadata fields. But
>> >>> > >> > it would also have locked us out of potential future
>> >>> > >> > transports.
>> >>> > >> >
>> >>> > >> > On another note: I hesitate to put too much into this method,
>> >>> > >> > but we are looking at use cases where potentially, a client
>> >>> > >> > may want to upload multiple distinct datasets (with differing
>> >>> > >> > schemas). (This is a little tentative, and I can get more
>> >>> > >> > details...)
>> >>> > >> > Right now, each logical stream in Flight must have a single,
>> >>> > >> > consistent schema; would it make sense to look at ways to
>> >>> > >> > relax this, or declare this explicitly out of scope (and
>> >>> > >> > require multiple calls and coordination with the deployment
>> >>> > >> > topology) in order to accomplish this?
>> >>> > >> >
>> >>> > >> > Best,
>> >>> > >> > David
>> >>> > >> >
>> >>> > >> > On 11/27/19, Jacques Nadeau <jacq...@apache.org> wrote:
>> >>> > >> > > Fair enough. I'm okay with the bytes approach and the
>> >>> > >> > > proposal looks good to me.
>> >>> > >> > >
>> >>> > >> > > On Fri, Nov 8, 2019 at 11:37 AM David Li
>> >>> > >> > > <li.david...@gmail.com> wrote:
>> >>> > >> > >
>> >>> > >> > >> I've updated the proposal.
>> >>> > >> > >>
>> >>> > >> > >> On the subject of Protobuf Any vs bytes, and how to handle
>> >>> > >> > >> errors/metadata, I still think using bytes is preferable:
>> >>> > >> > >> - It doesn't require (conditionally) exposing or wrapping
>> >>> > >> > >>   Protobuf types,
>> >>> > >> > >> - We wouldn't be able to practically expose the Protobuf
>> >>> > >> > >>   field to C++ users without causing build pains,
>> >>> > >> > >> - We can't let Python users take advantage of the Protobuf
>> >>> > >> > >>   field without somehow being compatible with the Protobuf
>> >>> > >> > >>   wheels (by linking to the same version, and doing magic
>> >>> > >> > >>   to turn the C++ Protobufs into the Python ones),
>> >>> > >> > >> - All our other application-defined fields are already
>> >>> > >> > >>   bytes.
>> >>> > >> > >>
>> >>> > >> > >> Applications that want structure can encode JSON or
>> >>> > >> > >> Protobuf Any into the bytes field themselves, much as you
>> >>> > >> > >> can already do for Ticket, commands in FlightDescriptors,
>> >>> > >> > >> and application metadata in DoGet/DoPut. I don't think this
>> >>> > >> > >> is (much) less efficient than using Any directly, since Any
>> >>> > >> > >> itself is a bytes field with a tag, and must invoke the
>> >>> > >> > >> Protobuf deserializer again to read the actual message.
>> >>> > >> > >>
>> >>> > >> > >> If we decide on using bytes, then I don't think it makes
>> >>> > >> > >> sense to define a new message with a oneof either, since it
>> >>> > >> > >> would be redundant.
>> >>> > >> > >>
>> >>> > >> > >> Thanks,
>> >>> > >> > >> David
>> >>> > >> > >>
>> >>> > >> > >> On 11/7/19, David Li <li.david...@gmail.com> wrote:
>> >>> > >> > >> > I've been extremely backlogged, I will update the
>> >>> > >> > >> > proposal when I get a chance and reply here when done.
>> >>> > >> > >> >
>> >>> > >> > >> > Best,
>> >>> > >> > >> > David
>> >>> > >> > >> >
>> >>> > >> > >> > On 11/7/19, Wes McKinney <wesmck...@gmail.com> wrote:
>> >>> > >> > >> >> Bumping this discussion since a couple of weeks have
>> >>> > >> > >> >> passed. It seems there are still some questions here,
>> >>> > >> > >> >> could we summarize what are the alternatives along with
>> >>> > >> > >> >> any public API implications so we can try to render a
>> >>> > >> > >> >> decision?
>> >>> > >> > >> >>
>> >>> > >> > >> >> On Sat, Oct 26, 2019 at 7:19 PM David Li
>> >>> > >> > >> >> <li.david...@gmail.com> wrote:
>> >>> > >> > >> >>>
>> >>> > >> > >> >>> Hi Wes,
>> >>> > >> > >> >>>
>> >>> > >> > >> >>> Responses inline:
>> >>> > >> > >> >>>
>> >>> > >> > >> >>> On Sat, Oct 26, 2019, 13:46 Wes McKinney
>> >>> > >> > >> >>> <wesmck...@gmail.com> wrote:
>> >>> > >> > >> >>>
>> >>> > >> > >> >>> > On Mon, Oct 21, 2019 at 7:40 PM David Li
>> >>> > >> > >> >>> > <li.david...@gmail.com> wrote:
>> >>> > >> > >> >>> > >
>> >>> > >> > >> >>> > > The question is whether to repurpose the existing
>> >>> > >> > >> >>> > > FlightData structure, and allow for the metadata
>> >>> > >> > >> >>> > > field to be filled in and data fields to be blank
>> >>> > >> > >> >>> > > (as a control message), or to wrap the FlightData
>> >>> > >> > >> >>> > > structure in another structure that explicitly
>> >>> > >> > >> >>> > > distinguishes between control and data messages.
>> >>> > >> > >> >>> >
>> >>> > >> > >> >>> > I'm not super against having metadata-only FlightData
>> >>> > >> > >> >>> > with empty body. One question to consider is what
>> >>> > >> > >> >>> > changes (if any) would need to be made to public APIs
>> >>> > >> > >> >>> > in either scenario.
>> >>> > >> > >> >>> >
>> >>> > >> > >> >>>
>> >>> > >> > >> >>> We could leave DoGet/DoPut as-is for now, and allow
>> >>> > >> > >> >>> empty data messages in the future. This would be a
>> >>> > >> > >> >>> breaking change, but wouldn't change the wire format.
>> >>> > >> > >> >>> I think the APIs could be changed backwards compatibly,
>> >>> > >> > >> >>> though.
>> >>> > >> > >> >>>
>> >>> > >> > >> >>> > > The other question is how to handle the metadata
>> >>> > >> > >> >>> > > fields. So far, we've used bytestring fields for
>> >>> > >> > >> >>> > > application-defined data. This is workable if you
>> >>> > >> > >> >>> > > want to use Protobuf to define the contents of
>> >>> > >> > >> >>> > > those fields, but requires you to pack/unpack your
>> >>> > >> > >> >>> > > Protobuf into/from the bytestring field. If we
>> >>> > >> > >> >>> > > instead used the Protobuf Any field, a dynamically
>> >>> > >> > >> >>> > > typed field, this would be more convenient, but
>> >>> > >> > >> >>> > > then we'd be exposing Protobuf types. We could
>> >>> > >> > >> >>> > > alternatively use a combination of a type field and
>> >>> > >> > >> >>> > > a bytestring field, mimicking what the Protobuf Any
>> >>> > >> > >> >>> > > type looks like on the wire. I'm not sure this is
>> >>> > >> > >> >>> > > actually cleaner in any of the language APIs,
>> >>> > >> > >> >>> > > though.
>> >>> > >> > >> >>> >
>> >>> > >> > >> >>> > Leaving the deserialization of the app metadata to
>> >>> > >> > >> >>> > the particular Flight implementation seems on first
>> >>> > >> > >> >>> > principles like the most flexible thing, if Any is
>> >>> > >> > >> >>> > used, does that mean the metadata _must_ be a
>> >>> > >> > >> >>> > protobuf?
>> >>> > >> > >> >>> >
>> >>> > >> > >> >>>
>> >>> > >> > >> >>> If Any is used, we could still expose a bytes-based
>> >>> > >> > >> >>> API, but it would have some more wrapping. (We could
>> >>> > >> > >> >>> put a ByteString in Any.) Then the question would just
>> >>> > >> > >> >>> be how to expose this (would be easier in Java, harder
>> >>> > >> > >> >>> in C++).
>> >>> > >> > >> >>>
>> >>> > >> > >> >>> > > David
>> >>> > >> > >> >>> > >
>> >>> > >> > >> >>> > > On 10/21/19, Antoine Pitrou <anto...@python.org>
>> >>> > >> > >> >>> > > wrote:
>> >>> > >> > >> >>> > > >
>> >>> > >> > >> >>> > > > Can one of you explain what is being proposed in
>> >>> > >> > >> >>> > > > non-protobuf terms? Knowledge of protobuf
>> >>> > >> > >> >>> > > > shouldn't be required to use Flight.
>> >>> > >> > >> >>> > > >
>> >>> > >> > >> >>> > > > Regards
>> >>> > >> > >> >>> > > >
>> >>> > >> > >> >>> > > > Antoine.
>> >>> > >> > >> >>> > > >
>> >>> > >> > >> >>> > > >
>> >>> > >> > >> >>> > > > On 21/10/2019 at 15:46, David Li wrote:
>> >>> > >> > >> >>> > > >> Oneof doesn't actually change the wire encoding;
>> >>> > >> > >> >>> > > >> it would just be application-level logic. (The
>> >>> > >> > >> >>> > > >> official guide doesn't even mention it in the
>> >>> > >> > >> >>> > > >> encoding docs; I found
>> >>> > >> > >> >>> > > >> https://stackoverflow.com/questions/52226409/how-protobuf-encodes-oneof-message-construct
>> >>> > >> > >> >>> > > >> as well.)
>> >>> > >> > >> >>> > > >>
>> >>> > >> > >> >>> > > >> If I follow you, Jacques, then you are proposing
>> >>> > >> > >> >>> > > >> essentially inlining the definition of Any, e.g.
>> >>> > >> > >> >>> > > >>
>> >>> > >> > >> >>> > > >> message FlightMessage {
>> >>> > >> > >> >>> > > >>   oneof message {
>> >>> > >> > >> >>> > > >>     FlightData data = 1;
>> >>> > >> > >> >>> > > >>     FlightAny metadata = 2;
>> >>> > >> > >> >>> > > >>   }
>> >>> > >> > >> >>> > > >> }
>> >>> > >> > >> >>> > > >>
>> >>> > >> > >> >>> > > >> message FlightAny {
>> >>> > >> > >> >>> > > >>   string type = 1;
>> >>> > >> > >> >>> > > >>   bytes data = 2;
>> >>> > >> > >> >>> > > >> }
>> >>> > >> > >> >>> > > >>
>> >>> > >> > >> >>> > > >> Is this correct?
>> >>> > >> > >> >>> > > >>
>> >>> > >> > >> >>> > > >> It might be nice to consider the wrapper message
>> >>> > >> > >> >>> > > >> for DoGet/DoPut as well, but at that point, I'd
>> >>> > >> > >> >>> > > >> rather we be consistent with all of them, rather
>> >>> > >> > >> >>> > > >> than have one of the three methods do its own
>> >>> > >> > >> >>> > > >> thing.
>> >>> > >> > >> >>> > > >>
>> >>> > >> > >> >>> > > >> Thanks,
>> >>> > >> > >> >>> > > >> David
>> >>> > >> > >> >>> > > >>
>> >>> > >> > >> >>> > > >> On 10/20/19, Jacques Nadeau <jacq...@apache.org>
>> >>> > >> > >> >>> > > >> wrote:
>> >>> > >> > >> >>> > > >>> I think we could probably expose the oneof
>> >>> > >> > >> >>> > > >>> behavior without exposing the protobuf
>> >>> > >> > >> >>> > > >>> functions. On the any... hmm. I guess we could
>> >>> > >> > >> >>> > > >>> expose as two fields: type and data. Then users
>> >>> > >> > >> >>> > > >>> could use it for whatever but if people wanted
>> >>> > >> > >> >>> > > >>> to treat it as any, it would work. (Basically a
>> >>> > >> > >> >>> > > >>> user could use any with it easily but they
>> >>> > >> > >> >>> > > >>> could also use any other mechanism).
>> >>> > >> > >> >>> > > >>> At least in java, the any concepts are pretty
>> >>> > >> > >> >>> > > >>> simple/diy. Are other language bindings less
>> >>> > >> > >> >>> > > >>> diy?
>> >>> > >> > >> >>> > > >>>
>> >>> > >> > >> >>> > > >>> I'm *not* hardcore against the empty FlightData
>> >>> > >> > >> >>> > > >>> + metadata but it just seemed a bit janky.
>> >>> > >> > >> >>> > > >>>
>> >>> > >> > >> >>> > > >>> Thinking about the control message/wrapper
>> >>> > >> > >> >>> > > >>> object thing, I wonder if we should redefine
>> >>> > >> > >> >>> > > >>> DoPut and DoGet to have the same property if we
>> >>> > >> > >> >>> > > >>> think it is a good idea...
>> >>> > >> > >> >>> > > >>>
>> >>> > >> > >> >>> > > >>> On Wed, Oct 16, 2019 at 5:13 PM David Li
>> >>> > >> > >> >>> > > >>> <li.david...@gmail.com> wrote:
>> >>> > >> > >> >>> > > >>>
>> >>> > >> > >> >>> > > >>>> I was definitely considering having control
>> >>> > >> > >> >>> > > >>>> messages without data, and I thought that
>> >>> > >> > >> >>> > > >>>> could be encoded by a FlightData with only
>> >>> > >> > >> >>> > > >>>> app_metadata set.
>> >>> > >> > >> >>> > > >>>> I think I understand your position now:
>> >>> > >> > >> >>> > > >>>> FlightData should always carry (some) data
>> >>> > >> > >> >>> > > >>>> (with optional metadata)?
>> >>> > >> > >> >>> > > >>>>
>> >>> > >> > >> >>> > > >>>> That makes sense to me, and is consistent with
>> >>> > >> > >> >>> > > >>>> the documentation on FlightData in the
>> >>> > >> > >> >>> > > >>>> Protobuf file. I was worried about having a
>> >>> > >> > >> >>> > > >>>> redundant metadata field, but oneof prevents
>> >>> > >> > >> >>> > > >>>> that from happening, and overall having a
>> >>> > >> > >> >>> > > >>>> clear separation between data and control
>> >>> > >> > >> >>> > > >>>> messages is cleaner.
>> >>> > >> > >> >>> > > >>>>
>> >>> > >> > >> >>> > > >>>> As for using Protobuf's Any: so far, we've
>> >>> > >> > >> >>> > > >>>> refrained from exposing Protobuf by using
>> >>> > >> > >> >>> > > >>>> bytes, would we want to change that now?
>> >>> > >> > >> >>> > > >>>>
>> >>> > >> > >> >>> > > >>>> Best,
>> >>> > >> > >> >>> > > >>>> David
>> >>> > >> > >> >>> > > >>>>
>> >>> > >> > >> >>> > > >>>> On 10/16/19, Jacques Nadeau
>> >>> > >> > >> >>> > > >>>> <jacq...@apache.org> wrote:
>> >>> > >> > >> >>> > > >>>>> Hey David,
>> >>> > >> > >> >>> > > >>>>>
>> >>> > >> > >> >>> > > >>>>> RE: Async: I was trying to match the pattern
>> >>> > >> > >> >>> > > >>>>> we use for doget/doput for async. Yes, more
>> >>> > >> > >> >>> > > >>>>> thinking java given java grpc's async always
>> >>> > >> > >> >>> > > >>>>> pattern.
>> >>> > >> > >> >>> > > >>>>>
>> >>> > >> > >> >>> > > >>>>> On the comment around the FlightData, I think
>> >>> > >> > >> >>> > > >>>>> it is overloading the message to use metadata
>> >>> > >> > >> >>> > > >>>>> for this. If I want to send a control message
>> >>> > >> > >> >>> > > >>>>> independently of the data message, I would
>> >>> > >> > >> >>> > > >>>>> have to define something like an empty flight
>> >>> > >> > >> >>> > > >>>>> data message that has custom metadata.
>> >>> > >> > >> >>> > > >>>>> Why not support a container object with a
>> >>> > >> > >> >>> > > >>>>> oneof{FlightData, Any} in it instead so users
>> >>> > >> > >> >>> > > >>>>> can add more data as desired. The default
>> >>> > >> > >> >>> > > >>>>> impl could be a noop for the Any messages.
>> >>> > >> > >> >>> > > >>>>>
>> >>> > >> > >> >>> > > >>>>> On Tue, Oct 15, 2019 at 6:50 PM David Li
>> >>> > >> > >> >>> > > >>>>> <li.david...@gmail.com> wrote:
>> >>> > >> > >> >>> > > >>>>>
>> >>> > >> > >> >>> > > >>>>>> Hi Jacques,
>> >>> > >> > >> >>> > > >>>>>>
>> >>> > >> > >> >>> > > >>>>>> Thanks for the comments.
>> >>> > >> > >> >>> > > >>>>>>
>> >>> > >> > >> >>> > > >>>>>> - I do agree DoExchange is a better name!
>> >>> > >> > >> >>> > > >>>>>> - FlightData already has metadata fields as
>> >>> > >> > >> >>> > > >>>>>>   a result of prior proposals, so I don't
>> >>> > >> > >> >>> > > >>>>>>   think we need a new message to carry that
>> >>> > >> > >> >>> > > >>>>>>   kind of information.
>> >>> > >> > >> >>> > > >>>>>> - I like the suggestion of an async handler
>> >>> > >> > >> >>> > > >>>>>>   to handle incoming messages as the
>> >>> > >> > >> >>> > > >>>>>>   fundamental API; it would actually be
>> >>> > >> > >> >>> > > >>>>>>   quite natural to implement in Flight/Java.
>> >>> > >> > >> >>> > > >>>>>>   I will note that it's not possible in
>> >>> > >> > >> >>> > > >>>>>>   C++/Python without spawning a thread,
>> >>> > >> > >> >>> > > >>>>>>   though. (In essence, gRPC-Java is
>> >>> > >> > >> >>> > > >>>>>>   async-always and gRPC-C++ is sync-always.)
>> >>> > >> > >> >>> > > >>>>>>   There are experimental C++ APIs that would
>> >>> > >> > >> >>> > > >>>>>>   let us do something similar to Java, but
>> >>> > >> > >> >>> > > >>>>>>   those are only in relatively recent gRPC
>> >>> > >> > >> >>> > > >>>>>>   versions and are still under development
>> >>> > >> > >> >>> > > >>>>>>   (contrary to the interceptor APIs which
>> >>> > >> > >> >>> > > >>>>>>   have been around for quite a while).
>> >>> > >> > >> >>> > > >>>>>>
>> >>> > >> > >> >>> > > >>>>>> Thanks,
>> >>> > >> > >> >>> > > >>>>>> David
>> >>> > >> > >> >>> > > >>>>>>
>> >>> > >> > >> >>> > > >>>>>> On 10/15/19, Jacques Nadeau
>> >>> > >> > >> >>> > > >>>>>> <jacq...@apache.org> wrote:
>> >>> > >> > >> >>> > > >>>>>>> I like it. Added some comments to the doc.
>> >>> > >> > >> >>> > > >>>>>>> Might be worth discussing here depending on
>> >>> > >> > >> >>> > > >>>>>>> your thoughts.
>> >>> > >> > >> >>> > > >>>>>>>
>> >>> > >> > >> >>> > > >>>>>>> On Tue, Oct 15, 2019 at 7:11 AM David Li
>> >>> > >> > >> >>> > > >>>>>>> <li.david...@gmail.com> wrote:
>> >>> > >> > >> >>> > > >>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>> Hey Ryan,
>> >>> > >> > >> >>> > > >>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>> Thanks for the comments.
>> >>> > >> > >> >>> > > >>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>> Concrete example: I've edited the doc to
>> >>> > >> > >> >>> > > >>>>>>>> provide a Python strawman.
>> >>> > >> > >> >>> > > >>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>> Sync vs async: while I don't touch on it,
>> >>> > >> > >> >>> > > >>>>>>>> you could interleave uploads and downloads
>> >>> > >> > >> >>> > > >>>>>>>> if you were so inclined. Right now,
>> >>> > >> > >> >>> > > >>>>>>>> synchronous APIs make this error-prone,
>> >>> > >> > >> >>> > > >>>>>>>> e.g. if both client and server wait for
>> >>> > >> > >> >>> > > >>>>>>>> each other due to an application logic
>> >>> > >> > >> >>> > > >>>>>>>> bug. (gRPC doesn't give us the ability to
>> >>> > >> > >> >>> > > >>>>>>>> have per-read timeouts, only an overall
>> >>> > >> > >> >>> > > >>>>>>>> timeout.)
>> >>> > >> > >> >>> > > >>>>>>>> As an example of this happening with
>> >>> > >> > >> >>> > > >>>>>>>> DoPut, see ARROW-6063:
>> >>> > >> > >> >>> > > >>>>>>>> https://issues.apache.org/jira/browse/ARROW-6063
>> >>> > >> > >> >>> > > >>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>> This is mostly tangential though,
>> >>> > >> > >> >>> > > >>>>>>>> eventually we will want to design
>> >>> > >> > >> >>> > > >>>>>>>> asynchronous APIs for Flight as a whole. A
>> >>> > >> > >> >>> > > >>>>>>>> bidirectional stream like this (and like
>> >>> > >> > >> >>> > > >>>>>>>> DoPut) just makes these pitfalls easier to
>> >>> > >> > >> >>> > > >>>>>>>> run into.
>> >>> > >> > >> >>> > > >>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>> Using DoPut+DoGet: I discussed this in the
>> >>> > >> > >> >>> > > >>>>>>>> proposal, but the main concern is that
>> >>> > >> > >> >>> > > >>>>>>>> depending on how you deploy, two separate
>> >>> > >> > >> >>> > > >>>>>>>> calls could get routed to different
>> >>> > >> > >> >>> > > >>>>>>>> instances.
>> >>> > >> > >> >>> > > >>>>>>>> Additionally, gRPC has some reconnection
>> >>> > >> > >> >>> > > >>>>>>>> behaviors; if the server goes away in
>> >>> > >> > >> >>> > > >>>>>>>> between the two calls, but it then
>> >>> > >> > >> >>> > > >>>>>>>> restarts or there is another instance
>> >>> > >> > >> >>> > > >>>>>>>> available, the client will happily
>> >>> > >> > >> >>> > > >>>>>>>> reconnect to the new server without
>> >>> > >> > >> >>> > > >>>>>>>> warning.
>> >>> > >> > >> >>> > > >>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>> Thanks,
>> >>> > >> > >> >>> > > >>>>>>>> David
>> >>> > >> > >> >>> > > >>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>> On 10/15/19, Ryan Murray
>> >>> > >> > >> >>> > > >>>>>>>> <rym...@dremio.com> wrote:
>> >>> > >> > >> >>> > > >>>>>>>>> Hey David,
>> >>> > >> > >> >>> > > >>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>> I think this proposal makes a lot of
>> >>> > >> > >> >>> > > >>>>>>>>> sense. I like it and the possibility of
>> >>> > >> > >> >>> > > >>>>>>>>> remote compute via arrow buffers. One
>> >>> > >> > >> >>> > > >>>>>>>>> thing that would help me would be a
>> >>> > >> > >> >>> > > >>>>>>>>> concrete example of the API in a real
>> >>> > >> > >> >>> > > >>>>>>>>> life use case.
>> >>> > >> > >> >>> > > >>>>>>>>> Also, what would the client experience be
>> >>> > >> > >> >>> > > >>>>>>>>> in terms of sync vs async? Would the
>> >>> > >> > >> >>> > > >>>>>>>>> client block till the bidirectional call
>> >>> > >> > >> >>> > > >>>>>>>>> returns, i.e. c = flight.vector_mult(a,
>> >>> > >> > >> >>> > > >>>>>>>>> b), or would the client wait to be
>> >>> > >> > >> >>> > > >>>>>>>>> signaled that computation was done? If
>> >>> > >> > >> >>> > > >>>>>>>>> the latter, how is that different from a
>> >>> > >> > >> >>> > > >>>>>>>>> DoPut then DoGet? I suppose that this
>> >>> > >> > >> >>> > > >>>>>>>>> could be implemented without extending
>> >>> > >> > >> >>> > > >>>>>>>>> the RPC interface but rather by a
>> >>> > >> > >> >>> > > >>>>>>>>> function/util?
>> >>> > >> > >> >>> > > >>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>> Best,
>> >>> > >> > >> >>> > > >>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>> Ryan
>> >>> > >> > >> >>> > > >>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>> On Sun, Oct 13, 2019 at 9:24 PM David Li
>> >>> > >> > >> >>> > > >>>>>>>>> <li.david...@gmail.com> wrote:
>> >>> > >> > >> >>> > > >>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>>> Hi all,
>> >>> > >> > >> >>> > > >>>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>>> We've been using Flight quite
>> >>> > >> > >> >>> > > >>>>>>>>>> successfully so far, but we have
>> >>> > >> > >> >>> > > >>>>>>>>>> identified a new use case on the
>> >>> > >> > >> >>> > > >>>>>>>>>> horizon: being able to both send and
>> >>> > >> > >> >>> > > >>>>>>>>>> retrieve Arrow data within a single RPC
>> >>> > >> > >> >>> > > >>>>>>>>>> call.
>> >>> > >> > >> >>> > > >>>>>>>>>> To that end, I've written up a proposal for a new RPC method:
>> >>> > >> > >> >>> > > >>>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>>> https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing
>> >>> > >> > >> >>> > > >>>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>>> Please let me know if you can't view or comment on the document.
>> >>> > >> > >> >>> > > >>>>>>>>>> I'd appreciate any feedback; I think this is a relatively
>> >>> > >> > >> >>> > > >>>>>>>>>> straightforward addition - it is essentially "DoPutThenGet".
>> >>> > >> > >> >>> > > >>>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>>> This is a format change and would require a vote. I've decided to
>> >>> > >> > >> >>> > > >>>>>>>>>> table the other format change I had proposed (on DoPut), as it
>> >>> > >> > >> >>> > > >>>>>>>>>> doesn't functionally change Flight, just the interpretation of the
>> >>> > >> > >> >>> > > >>>>>>>>>> semantics.
>> >>> > >> > >> >>> > > >>>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>>> Thanks,
>> >>> > >> > >> >>> > > >>>>>>>>>> David
>> >>> > >> > >> >>> > > >>>>>>>>>
>> >>> > >> > >> >>> > > >>>>>>>>> --
>> >>> > >> > >> >>> > > >>>>>>>>> Ryan Murray | Principal Consulting Engineer
>> >>> > >> > >> >>> > > >>>>>>>>> +447540852009 | rym...@dremio.com
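
To make Ryan's question concrete, here is a minimal sketch contrasting a client-side "DoPut then DoGet" helper with a single bidirectional call. Everything in it is a hypothetical in-memory stand-in (ToyServer, put_then_get, the do_exchange name, plain lists in place of Arrow record batches); it is not the Flight API and not the design in the proposal document, only an illustration of one way the two approaches could differ.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Batch = List[float]  # hypothetical stand-in for an Arrow record batch


class ToyServer:
    """In-memory stand-in for a Flight-like service (not the real API)."""

    def __init__(self, transform: Callable[[Batch], Batch]) -> None:
        self._transform = transform
        self.stored: Dict[str, List[Batch]] = {}

    def do_put(self, key: str, batches: Iterable[Batch]) -> None:
        # Upload: the server keeps the whole dataset until it is fetched.
        self.stored[key] = list(batches)

    def do_get(self, key: str) -> Iterator[Batch]:
        # Download: results are computed from the copy stored by do_put.
        for batch in self.stored.pop(key):
            yield self._transform(batch)

    def do_exchange(self, batches: Iterable[Batch]) -> Iterator[Batch]:
        # Single call: each batch is transformed as it arrives, so the
        # server holds no state between calls.
        for batch in batches:
            yield self._transform(batch)


def put_then_get(server: ToyServer, key: str, batches: Iterable[Batch]) -> List[Batch]:
    """Client-side util in the spirit of Ryan's suggestion: two calls, blocking."""
    server.do_put(key, batches)
    return list(server.do_get(key))


def double(batch: Batch) -> Batch:
    return [2 * x for x in batch]


server = ToyServer(double)
data = [[1.0, 2.0], [3.0]]

two_calls = put_then_get(server, "job-1", data)
one_call = list(server.do_exchange(iter(data)))
print(two_calls, one_call)
```

In this toy version both paths return the same batches and both block the caller. The difference is that the two-call helper needs a key and server-side storage between the calls, while the single call streams results back as input arrives.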