Fair enough. I'm okay with the bytes approach and the proposal looks good to me.
On Fri, Nov 8, 2019 at 11:37 AM David Li <li.david...@gmail.com> wrote: > I've updated the proposal. > > On the subject of Protobuf Any vs bytes, and how to handle > errors/metadata, I still think using bytes is preferable: > - It doesn't require (conditionally) exposing or wrapping Protobuf types, > - We wouldn't be able to practically expose the Protobuf field to C++ > users without causing build pains, > - We can't let Python users take advantage of the Protobuf field > without somehow being compatible with the Protobuf wheels (by linking > to the same version, and doing magic to turn the C++ Protobufs into > the Python ones), > - All our other application-defined fields are already bytes. > > Applications that want structure can encode JSON or Protobuf Any into > the bytes field themselves, much as you can already do for Ticket, > commands in FlightDescriptors, and application metadata in > DoGet/DoPut. I don't think this is (much) less efficient than using > Any directly, since Any itself is a bytes field with a tag, and must > invoke the Protobuf deserializer again to read the actual message. > > If we decide on using bytes, then I don't think it makes sense to > define a new message with a oneof either, since it would be redundant. > > Thanks, > David > > On 11/7/19, David Li <li.david...@gmail.com> wrote: > > I've been extremely backlogged, I will update the proposal when I get > > a chance and reply here when done. > > > > Best, > > David > > > > On 11/7/19, Wes McKinney <wesmck...@gmail.com> wrote: > >> Bumping this discussion since a couple of weeks have passed. It seems > >> there are still some questions here, could we summarize what are the > >> alternatives along with any public API implications so we can try to > >> render a decision? > >> > >> On Sat, Oct 26, 2019 at 7:19 PM David Li <li.david...@gmail.com> wrote: > >>> > >>> Hi Wes, > >>> > >>> Responses inline: > >>> > >>> On Sat, Oct 26, 2019, 13:46 Wes McKinney <wesmck...@gmail.com> wrote: > >>> > >>> > On Mon, Oct 21, 2019 at 7:40 PM David Li <li.david...@gmail.com> > >>> > wrote: > >>> > > > >>> > > The question is whether to repurpose the existing FlightData > >>> > > structure, and allow for the metadata field to be filled in and > data > >>> > > fields to be blank (as a control message), or to wrap the > FlightData > >>> > > structure in another structure that explicitly distinguishes > between > >>> > > control and data messages. > >>> > > >>> > I'm not super against having metadata-only FlightData with empty > body. > >>> > One question to consider is what changes (if any) would need to be > >>> > made to public APIs in either scenario. > >>> > > >>> > >>> We could leave DoGet/DoPut as-is for now, and allow empty data messages > >>> in > >>> the future. This would be a breaking change, but wouldn't change the > >>> wire > >>> format. I think the APIs could be changed backwards compatibly, though. > >>> > >>> > >>> > >>> > > The other question is how to handle the metadata fields. So far, > >>> > > we've > >>> > > used bytestring fields for application-defined data. This is > >>> > > workable > >>> > > if you want to use Protobuf to define the contents of those fields, > >>> > > but requires you to pack/unpack your Protobuf into/from the > >>> > > bytestring > >>> > > field. If we instead used the Protobuf Any field, a dynamically > >>> > > typed > >>> > > field, this would be more convenient, but then we'd be exposing > >>> > > Protobuf types. We could alternatively use a combination of a type > >>> > > field and a bytestring field, mimicking what the Protobuf Any type > >>> > > looks like on the wire. I'm not sure this is actually cleaner in > any > >>> > > of the language APIs, though. > >>> > > >>> > Leaving the deserialization of the app metadata to the particular > >>> > Flight implementation seems on first principles like the most > flexible > >>> > thing, if Any is used, does that mean the metadata _must_ be a > >>> > protobuf? > >>> > > >>> > >>> > >>> If Any is used, we could still expose a bytes-based API, but it would > >>> have > >>> some more wrapping. (We could put a ByteString in Any.) Then the > >>> question > >>> would just be how to expose this (would be easier in Java, harder in > >>> C++). > >>> > >>> > >>> > >>> > > David > >>> > > > >>> > > On 10/21/19, Antoine Pitrou <anto...@python.org> wrote: > >>> > > > > >>> > > > Can one of you explain what is being proposed in non-protobuf > >>> > > > terms? > >>> > > > Knowledge of protobuf shouldn't be required to use Flight. > >>> > > > > >>> > > > Regards > >>> > > > > >>> > > > Antoine. > >>> > > > > >>> > > > > >>> > > > Le 21/10/2019 à 15:46, David Li a écrit : > >>> > > >> Oneof doesn't actually change the wire encoding; it would just > be > >>> > > >> application-level logic. (The official guide doesn't even > mention > >>> > > >> it > >>> > > >> in the encoding docs; I found > >>> > > >> > >>> > > https://stackoverflow.com/questions/52226409/how-protobuf-encodes-oneof-message-construct > >>> > > >> as well.) > >>> > > >> > >>> > > >> If I follow you, Jacques, then you are proposing essentially > >>> > > >> inlining > >>> > > >> the definition of Any, e.g. > >>> > > >> > >>> > > >> message FlightMessage { > >>> > > >> oneof message { > >>> > > >> FlightData data = 1; > >>> > > >> FlightAny metadata = 2; > >>> > > >> } > >>> > > >> } > >>> > > >> > >>> > > >> message FlightAny { > >>> > > >> string type = 1; > >>> > > >> bytes data = 2; > >>> > > >> } > >>> > > >> > >>> > > >> Is this correct? > >>> > > >> > >>> > > >> It might be nice to consider the wrapper message for DoGet/DoPut > >>> > > >> as > >>> > > >> well, but at that point, I'd rather we be consistent with all of > >>> > > >> them, > >>> > > >> rather than have one of the three methods do its own thing. > >>> > > >> > >>> > > >> Thanks, > >>> > > >> David > >>> > > >> > >>> > > >> On 10/20/19, Jacques Nadeau <jacq...@apache.org> wrote: > >>> > > >>> I think we could probably expose the oneof behavior without > >>> > > >>> exposing > >>> > the > >>> > > >>> protobuf functions. On the any... hmm. I guess we could expose > >>> > > >>> as > >>> > > >>> two > >>> > > >>> fields: type and data. Then users could use it for whatever but > >>> > > >>> if > >>> > > >>> people > >>> > > >>> wanted to treat it as any, it would work. (Basically a user > >>> > > >>> could > >>> > > >>> use > >>> > > >>> any > >>> > > >>> with it easily but they could also use any other mechanism). At > >>> > least in > >>> > > >>> java, the any concepts are pretty simple/diy. Are other > language > >>> > > >>> bindings > >>> > > >>> less diy? > >>> > > >>> > >>> > > >>> I'm *not* hardcore against the empty FlightData + metadata but > >>> > > >>> it > >>> > just > >>> > > >>> seemed a bit janky. > >>> > > >>> > >>> > > >>> Thinking about the control message/wrapper object thing, I > >>> > > >>> wonder > >>> > > >>> if > >>> > we > >>> > > >>> should redefine DoPut and DoGet to have the same property if we > >>> > think it > >>> > > >>> is > >>> > > >>> a good idea... > >>> > > >>> > >>> > > >>> On Wed, Oct 16, 2019 at 5:13 PM David Li < > li.david...@gmail.com> > >>> > wrote: > >>> > > >>> > >>> > > >>>> I was definitely considering having control messages without > >>> > > >>>> data, > >>> > and > >>> > > >>>> I thought that could be encoded by a FlightData with only > >>> > app_metadata > >>> > > >>>> set. I think I understand your position now: FlightData should > >>> > always > >>> > > >>>> carry (some) data (with optional metadata)? > >>> > > >>>> > >>> > > >>>> That makes sense to me, and is consistent with the > >>> > > >>>> documentation > >>> > > >>>> on > >>> > > >>>> FlightData in the Protobuf file. I was worried about having a > >>> > > >>>> redundant metadata field, but oneof prevents that from > >>> > > >>>> happening, > >>> > and > >>> > > >>>> overall having a clear separation between data and control > >>> > > >>>> messages > >>> > is > >>> > > >>>> cleaner. > >>> > > >>>> > >>> > > >>>> As for using Protobuf's Any: so far, we've refrained from > >>> > > >>>> exposing > >>> > > >>>> Protobuf by using bytes, would we want to change that now? > >>> > > >>>> > >>> > > >>>> Best, > >>> > > >>>> David > >>> > > >>>> > >>> > > >>>> On 10/16/19, Jacques Nadeau <jacq...@apache.org> wrote: > >>> > > >>>>> Hey David, > >>> > > >>>>> > >>> > > >>>>> RE: Async: I was trying to match the pattern we use for > >>> > > >>>>> doget/doput > >>> > > >>>>> for > >>> > > >>>>> async. Yes, more thinking java given java grpc's async always > >>> > pattern. > >>> > > >>>>> > >>> > > >>>>> On the comment around the FlightData, I think it is > >>> > > >>>>> overloading > >>> > > >>>>> the > >>> > > >>>> message > >>> > > >>>>> to use metadata for this. If I want to send a control message > >>> > > >>>> independently > >>> > > >>>>> of the data message, I would have to define something like an > >>> > > >>>>> empty > >>> > > >>>> flight > >>> > > >>>>> data message that has custom metadata. Why not support a > >>> > > >>>>> container > >>> > > >>>>> object > >>> > > >>>>> with a oneof{FlightData, Any} in it instead so users can add > >>> > > >>>>> more > >>> > data > >>> > > >>>>> as > >>> > > >>>>> desired. The default impl could be a noop for the Any > >>> > > >>>>> messages. > >>> > > >>>>> > >>> > > >>>>> On Tue, Oct 15, 2019 at 6:50 PM David Li > >>> > > >>>>> <li.david...@gmail.com> > >>> > > >>>>> wrote: > >>> > > >>>>> > >>> > > >>>>>> Hi Jacques, > >>> > > >>>>>> > >>> > > >>>>>> Thanks for the comments. > >>> > > >>>>>> > >>> > > >>>>>> - I do agree DoExchange is a better name! > >>> > > >>>>>> - FlightData already has metadata fields as a result of > prior > >>> > > >>>>>> proposals, so I don't think we need a new message to carry > >>> > > >>>>>> that > >>> > kind > >>> > > >>>>>> of information. > >>> > > >>>>>> - I like the suggestion of an async handler to handle > >>> > > >>>>>> incoming > >>> > > >>>>>> messages as the fundamental API; it would actually be quite > >>> > natural > >>> > > >>>>>> to > >>> > > >>>>>> implement in Flight/Java. I will note that it's not possible > >>> > > >>>>>> in > >>> > > >>>>>> C++/Python without spawning a thread, though. (In essence, > >>> > gRPC-Java > >>> > > >>>>>> is async-always and gRPC-C++ is sync-always.) There are > >>> > experimental > >>> > > >>>>>> C++ APIs that would let us do something similar to Java, but > >>> > > >>>>>> those > >>> > > >>>>>> are > >>> > > >>>>>> only in relatively recent gRPC versions and are still under > >>> > > >>>>>> development (contrary to the interceptor APIs which have > been > >>> > around > >>> > > >>>>>> for quite a while). > >>> > > >>>>>> > >>> > > >>>>>> Thanks, > >>> > > >>>>>> David > >>> > > >>>>>> > >>> > > >>>>>> On 10/15/19, Jacques Nadeau <jacq...@apache.org> wrote: > >>> > > >>>>>>> I like it. Added some comments to the doc. Might worth > >>> > > >>>>>>> discussion > >>> > > >>>>>>> here > >>> > > >>>>>>> depending on your thoughts. > >>> > > >>>>>>> > >>> > > >>>>>>> On Tue, Oct 15, 2019 at 7:11 AM David Li > >>> > > >>>>>>> <li.david...@gmail.com> > >>> > > >>>> wrote: > >>> > > >>>>>>> > >>> > > >>>>>>>> Hey Ryan, > >>> > > >>>>>>>> > >>> > > >>>>>>>> Thanks for the comments. > >>> > > >>>>>>>> > >>> > > >>>>>>>> Concrete example: I've edited the doc to provide a Python > >>> > strawman. > >>> > > >>>>>>>> > >>> > > >>>>>>>> Sync vs async: while I don't touch on it, you could > >>> > > >>>>>>>> interleave > >>> > > >>>> uploads > >>> > > >>>>>>>> and downloads if you were so inclined. Right now, > >>> > > >>>>>>>> synchronous > >>> > APIs > >>> > > >>>>>>>> make this error-prone, e.g. if both client and server wait > >>> > > >>>>>>>> for > >>> > each > >>> > > >>>>>>>> other due to an application logic bug. (gRPC doesn't give > >>> > > >>>>>>>> us > >>> > > >>>>>>>> the > >>> > > >>>>>>>> ability to have per-read timeouts, only an overall > >>> > > >>>>>>>> timeout.) > >>> > > >>>>>>>> As > >>> > an > >>> > > >>>>>>>> example of this happening with DoPut, see ARROW-6063: > >>> > > >>>>>>>> https://issues.apache.org/jira/browse/ARROW-6063 > >>> > > >>>>>>>> > >>> > > >>>>>>>> This is mostly tangential though, eventually we will want > >>> > > >>>>>>>> to > >>> > design > >>> > > >>>>>>>> asynchronous APIs for Flight as a whole. A bidirectional > >>> > > >>>>>>>> stream > >>> > > >>>>>>>> like > >>> > > >>>>>>>> this (and like DoPut) just makes these pitfalls easier to > >>> > > >>>>>>>> run > >>> > into. > >>> > > >>>>>>>> > >>> > > >>>>>>>> Using DoPut+DoGet: I discussed this in the proposal, but > >>> > > >>>>>>>> the > >>> > main > >>> > > >>>>>>>> concern is that depending on how you deploy, two separate > >>> > > >>>>>>>> calls > >>> > > >>>>>>>> could > >>> > > >>>>>>>> get routed to different instances. Additionally, gRPC has > >>> > > >>>>>>>> some > >>> > > >>>>>>>> reconnection behaviors; if the server goes away in between > >>> > > >>>>>>>> the > >>> > two > >>> > > >>>>>>>> calls, but it then restarts or there is another instance > >>> > available, > >>> > > >>>>>>>> the client will happily reconnect to the new server > without > >>> > > >>>>>>>> warning. > >>> > > >>>>>>>> > >>> > > >>>>>>>> Thanks, > >>> > > >>>>>>>> David > >>> > > >>>>>>>> > >>> > > >>>>>>>> On 10/15/19, Ryan Murray <rym...@dremio.com> wrote: > >>> > > >>>>>>>>> Hey David, > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> I think this proposal makes a lot of sense. I like it and > >>> > > >>>>>>>>> the > >>> > > >>>>>>>>> possibility > >>> > > >>>>>>>>> of remote compute via arrow buffers. One thing that would > >>> > > >>>>>>>>> help > >>> > me > >>> > > >>>>>> would > >>> > > >>>>>>>> be > >>> > > >>>>>>>>> a concrete example of the API in a real life use case. > >>> > > >>>>>>>>> Also, > >>> > what > >>> > > >>>>>> would > >>> > > >>>>>>>> the > >>> > > >>>>>>>>> client experience be in terms of sync vs asyc? Would the > >>> > > >>>>>>>>> client > >>> > > >>>>>>>>> block > >>> > > >>>>>>>> till > >>> > > >>>>>>>>> the bidirectional call return ie c = > flight.vector_mult(a, > >>> > > >>>>>>>>> b) > >>> > or > >>> > > >>>>>>>>> would > >>> > > >>>>>>>> the > >>> > > >>>>>>>>> client wait to be signaled that computation was done. If > >>> > > >>>>>>>>> the > >>> > > >>>>>>>>> later > >>> > > >>>>>>>>> how > >>> > > >>>>>>>>> is > >>> > > >>>>>>>>> that different from a DoPut then DoGet? I suppose that > >>> > > >>>>>>>>> this > >>> > could > >>> > > >>>> be > >>> > > >>>>>>>>> implemented without extending the RPC interface but > rather > >>> > > >>>>>>>>> by a > >>> > > >>>>>>>>> function/util? > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> Best, > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> Ryan > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> On Sun, Oct 13, 2019 at 9:24 PM David Li < > >>> > li.david...@gmail.com> > >>> > > >>>>>> wrote: > >>> > > >>>>>>>>> > >>> > > >>>>>>>>>> Hi all, > >>> > > >>>>>>>>>> > >>> > > >>>>>>>>>> We've been using Flight quite successfully so far, but > we > >>> > > >>>>>>>>>> have > >>> > > >>>>>>>>>> identified a new use case on the horizon: being able to > >>> > > >>>>>>>>>> both > >>> > > >>>>>>>>>> send > >>> > > >>>>>>>>>> and > >>> > > >>>>>>>>>> retrieve Arrow data within a single RPC call. To that > >>> > > >>>>>>>>>> end, > >>> > I've > >>> > > >>>>>>>>>> written up a proposal for a new RPC method: > >>> > > >>>>>>>>>> > >>> > > >>>>>>>>>> > >>> > > >>>>>>>> > >>> > > >>>>>> > >>> > > >>>> > >>> > > https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing > >>> > > >>>>>>>>>> > >>> > > >>>>>>>>>> Please let me know if you can't view or comment on the > >>> > document. > >>> > > >>>>>>>>>> I'd > >>> > > >>>>>>>>>> appreciate any feedback; I think this is a relatively > >>> > > >>>>>>>>>> straightforward > >>> > > >>>>>>>>>> addition - it is essentially "DoPutThenGet". > >>> > > >>>>>>>>>> > >>> > > >>>>>>>>>> This is a format change and would require a vote. I've > >>> > > >>>>>>>>>> decided > >>> > > >>>>>>>>>> to > >>> > > >>>>>>>>>> table the other format change I had proposed (on DoPut), > >>> > > >>>>>>>>>> as > >>> > > >>>>>>>>>> it > >>> > > >>>>>> doesn't > >>> > > >>>>>>>>>> functionally change Flight, just the interpretation of > >>> > > >>>>>>>>>> the > >>> > > >>>>>>>>>> semantics. > >>> > > >>>>>>>>>> > >>> > > >>>>>>>>>> Thanks, > >>> > > >>>>>>>>>> David > >>> > > >>>>>>>>>> > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> -- > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> Ryan Murray | Principal Consulting Engineer > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> +447540852009 | rym...@dremio.com > >>> > > >>>>>>>>> > >>> > > >>>>>>>>> <https://www.dremio.com/> > >>> > > >>>>>>>>> Check out our GitHub <https://www.github.com/dremio>, > join > >>> > > >>>>>>>>> our > >>> > > >>>>>>>>> community > >>> > > >>>>>>>>> site <https://community.dremio.com/> & Download Dremio > >>> > > >>>>>>>>> <https://www.dremio.com/download> > >>> > > >>>>>>>>> > >>> > > >>>>>>>> > >>> > > >>>>>>> > >>> > > >>>>>> > >>> > > >>>>> > >>> > > >>>> > >>> > > >>> > >>> > > > > >>> > > >> > > >