Thanks for the feedback.

I do think if we had explicitly embraced gRPC from the beginning,
there are a lot of places where things could be made more ergonomic,
including with the metadata fields. But it would also have locked out
us of potential future transports.

On another note: I hesitate to put too much into this method, but we
are looking at use cases where potentially, a client may want to
upload multiple distinct datasets (with differing schemas). (This is a
little tentative, and I can get more details...) Right now, each
logical stream in Flight must have a single, consistent schema; would
it make sense to look at ways to relax this, or declare this
explicitly out of scope (and require multiple calls and coordination
with the deployment topology) in order to accomplish this?

Best,
David

On 11/27/19, Jacques Nadeau <jacq...@apache.org> wrote:
> Fair enough. I'm okay with the bytes approach and the proposal looks good
> to me.
>
> On Fri, Nov 8, 2019 at 11:37 AM David Li <li.david...@gmail.com> wrote:
>
>> I've updated the proposal.
>>
>> On the subject of Protobuf Any vs bytes, and how to handle
>> errors/metadata, I still think using bytes is preferable:
>> - It doesn't require (conditionally) exposing or wrapping Protobuf types,
>> - We wouldn't be able to practically expose the Protobuf field to C++
>> users without causing build pains,
>> - We can't let Python users take advantage of the Protobuf field
>> without somehow being compatible with the Protobuf wheels (by linking
>> to the same version, and doing magic to turn the C++ Protobufs into
>> the Python ones),
>> - All our other application-defined fields are already bytes.
>>
>> Applications that want structure can encode JSON or Protobuf Any into
>> the bytes field themselves, much as you can already do for Ticket,
>> commands in FlightDescriptors, and application metadata in
>> DoGet/DoPut. I don't think this is (much) less efficient than using
>> Any directly, since Any itself is a bytes field with a tag, and must
>> invoke the Protobuf deserializer again to read the actual message.
>>
>> If we decide on using bytes, then I don't think it makes sense to
>> define a new message with a oneof either, since it would be redundant.
>>
>> Thanks,
>> David
>>
>> On 11/7/19, David Li <li.david...@gmail.com> wrote:
>> > I've been extremely backlogged, I will update the proposal when I get
>> > a chance and reply here when done.
>> >
>> > Best,
>> > David
>> >
>> > On 11/7/19, Wes McKinney <wesmck...@gmail.com> wrote:
>> >> Bumping this discussion since a couple of weeks have passed. It seems
>> >> there are still some questions here, could we summarize what are the
>> >> alternatives along with any public API implications so we can try to
>> >> render a decision?
>> >>
>> >> On Sat, Oct 26, 2019 at 7:19 PM David Li <li.david...@gmail.com>
>> >> wrote:
>> >>>
>> >>> Hi Wes,
>> >>>
>> >>> Responses inline:
>> >>>
>> >>> On Sat, Oct 26, 2019, 13:46 Wes McKinney <wesmck...@gmail.com> wrote:
>> >>>
>> >>> > On Mon, Oct 21, 2019 at 7:40 PM David Li <li.david...@gmail.com>
>> >>> > wrote:
>> >>> > >
>> >>> > > The question is whether to repurpose the existing FlightData
>> >>> > > structure, and allow for the metadata field to be filled in and
>> data
>> >>> > > fields to be blank (as a control message), or to wrap the
>> FlightData
>> >>> > > structure in another structure that explicitly distinguishes
>> between
>> >>> > > control and data messages.
>> >>> >
>> >>> > I'm not super against having metadata-only FlightData with empty
>> body.
>> >>> > One question to consider is what changes (if any) would need to be
>> >>> > made to public APIs in either scenario.
>> >>> >
>> >>>
>> >>> We could leave DoGet/DoPut as-is for now, and allow empty data
>> >>> messages
>> >>> in
>> >>> the future. This would be a breaking change, but wouldn't change the
>> >>> wire
>> >>> format. I think the APIs could be changed backwards compatibly,
>> >>> though.
>> >>>
>> >>>
>> >>>
>> >>> > > The other question is how to handle the metadata fields. So far,
>> >>> > > we've
>> >>> > > used bytestring fields for application-defined data. This is
>> >>> > > workable
>> >>> > > if you want to use Protobuf to define the contents of those
>> >>> > > fields,
>> >>> > > but requires you to pack/unpack your Protobuf into/from the
>> >>> > > bytestring
>> >>> > > field. If we instead used the Protobuf Any field, a dynamically
>> >>> > > typed
>> >>> > > field, this would be more convenient, but then we'd be exposing
>> >>> > > Protobuf types. We could alternatively use a combination of a
>> >>> > > type
>> >>> > > field and a bytestring field, mimicking what the Protobuf Any
>> >>> > > type
>> >>> > > looks like on the wire. I'm not sure this is actually cleaner in
>> any
>> >>> > > of the language APIs, though.
>> >>> >
>> >>> > Leaving the deserialization of the app metadata to the particular
>> >>> > Flight implementation seems on first principles like the most
>> flexible
>> >>> > thing, if Any is used, does that mean the metadata _must_ be a
>> >>> > protobuf?
>> >>> >
>> >>>
>> >>>
>> >>> If Any is used, we could still expose a bytes-based API, but it would
>> >>> have
>> >>> some more wrapping. (We could put a ByteString in Any.) Then the
>> >>> question
>> >>> would just be how to expose this (would be easier in Java, harder in
>> >>> C++).
>> >>>
>> >>>
>> >>>
>> >>> > > David
>> >>> > >
>> >>> > > On 10/21/19, Antoine Pitrou <anto...@python.org> wrote:
>> >>> > > >
>> >>> > > > Can one of you explain what is being proposed in non-protobuf
>> >>> > > > terms?
>> >>> > > > Knowledge of protobuf shouldn't be required to use Flight.
>> >>> > > >
>> >>> > > > Regards
>> >>> > > >
>> >>> > > > Antoine.
>> >>> > > >
>> >>> > > >
>> >>> > > > Le 21/10/2019 à 15:46, David Li a écrit :
>> >>> > > >> Oneof doesn't actually change the wire encoding; it would just
>> be
>> >>> > > >> application-level logic. (The official guide doesn't even
>> mention
>> >>> > > >> it
>> >>> > > >> in the encoding docs; I found
>> >>> > > >>
>> >>> >
>> https://stackoverflow.com/questions/52226409/how-protobuf-encodes-oneof-message-construct
>> >>> > > >> as well.)
>> >>> > > >>
>> >>> > > >> If I follow you, Jacques, then you are proposing essentially
>> >>> > > >> inlining
>> >>> > > >> the definition of Any, e.g.
>> >>> > > >>
>> >>> > > >> message FlightMessage {
>> >>> > > >>   oneof message {
>> >>> > > >>     FlightData data = 1;
>> >>> > > >>     FlightAny metadata = 2;
>> >>> > > >>   }
>> >>> > > >> }
>> >>> > > >>
>> >>> > > >> message FlightAny {
>> >>> > > >>   string type = 1;
>> >>> > > >>   bytes data = 2;
>> >>> > > >> }
>> >>> > > >>
>> >>> > > >> Is this correct?
>> >>> > > >>
>> >>> > > >> It might be nice to consider the wrapper message for
>> >>> > > >> DoGet/DoPut
>> >>> > > >> as
>> >>> > > >> well, but at that point, I'd rather we be consistent with all
>> >>> > > >> of
>> >>> > > >> them,
>> >>> > > >> rather than have one of the three methods do its own thing.
>> >>> > > >>
>> >>> > > >> Thanks,
>> >>> > > >> David
>> >>> > > >>
>> >>> > > >> On 10/20/19, Jacques Nadeau <jacq...@apache.org> wrote:
>> >>> > > >>> I think we could probably expose the oneof behavior without
>> >>> > > >>> exposing
>> >>> > the
>> >>> > > >>> protobuf functions. On the any... hmm. I guess we could
>> >>> > > >>> expose
>> >>> > > >>> as
>> >>> > > >>> two
>> >>> > > >>> fields: type and data. Then users could use it for whatever
>> >>> > > >>> but
>> >>> > > >>> if
>> >>> > > >>> people
>> >>> > > >>> wanted to treat it as any, it would work. (Basically a user
>> >>> > > >>> could
>> >>> > > >>> use
>> >>> > > >>> any
>> >>> > > >>> with it easily but they could also use any other mechanism).
>> >>> > > >>> At
>> >>> > least in
>> >>> > > >>> java, the any concepts are pretty simple/diy. Are other
>> language
>> >>> > > >>> bindings
>> >>> > > >>> less diy?
>> >>> > > >>>
>> >>> > > >>> I'm *not* hardcore against the empty FlightData + metadata
>> >>> > > >>> but
>> >>> > > >>> it
>> >>> > just
>> >>> > > >>> seemed a bit janky.
>> >>> > > >>>
>> >>> > > >>> Thinking about the control message/wrapper object thing, I
>> >>> > > >>> wonder
>> >>> > > >>> if
>> >>> > we
>> >>> > > >>> should redefine DoPut and DoGet to have the same property if
>> >>> > > >>> we
>> >>> > think it
>> >>> > > >>> is
>> >>> > > >>> a good idea...
>> >>> > > >>>
>> >>> > > >>> On Wed, Oct 16, 2019 at 5:13 PM David Li <
>> li.david...@gmail.com>
>> >>> > wrote:
>> >>> > > >>>
>> >>> > > >>>> I was definitely considering having control messages without
>> >>> > > >>>> data,
>> >>> > and
>> >>> > > >>>> I thought that could be encoded by a FlightData with only
>> >>> > app_metadata
>> >>> > > >>>> set. I think I understand your position now: FlightData
>> >>> > > >>>> should
>> >>> > always
>> >>> > > >>>> carry (some) data (with optional metadata)?
>> >>> > > >>>>
>> >>> > > >>>> That makes sense to me, and is consistent with the
>> >>> > > >>>> documentation
>> >>> > > >>>> on
>> >>> > > >>>> FlightData in the Protobuf file. I was worried about having
>> >>> > > >>>> a
>> >>> > > >>>> redundant metadata field, but oneof prevents that from
>> >>> > > >>>> happening,
>> >>> > and
>> >>> > > >>>> overall having a clear separation between data and control
>> >>> > > >>>> messages
>> >>> > is
>> >>> > > >>>> cleaner.
>> >>> > > >>>>
>> >>> > > >>>> As for using Protobuf's Any: so far, we've refrained from
>> >>> > > >>>> exposing
>> >>> > > >>>> Protobuf by using bytes, would we want to change that now?
>> >>> > > >>>>
>> >>> > > >>>> Best,
>> >>> > > >>>> David
>> >>> > > >>>>
>> >>> > > >>>> On 10/16/19, Jacques Nadeau <jacq...@apache.org> wrote:
>> >>> > > >>>>> Hey David,
>> >>> > > >>>>>
>> >>> > > >>>>> RE: Async: I was trying to match the pattern we use for
>> >>> > > >>>>> doget/doput
>> >>> > > >>>>> for
>> >>> > > >>>>> async. Yes, more thinking java given java grpc's async
>> >>> > > >>>>> always
>> >>> > pattern.
>> >>> > > >>>>>
>> >>> > > >>>>> On the comment around the FlightData, I think it is
>> >>> > > >>>>> overloading
>> >>> > > >>>>> the
>> >>> > > >>>> message
>> >>> > > >>>>> to use metadata for this. If I want to send a control
>> >>> > > >>>>> message
>> >>> > > >>>> independently
>> >>> > > >>>>> of the data message, I would have to define something like
>> >>> > > >>>>> an
>> >>> > > >>>>> empty
>> >>> > > >>>> flight
>> >>> > > >>>>> data message that has custom metadata. Why not support a
>> >>> > > >>>>> container
>> >>> > > >>>>> object
>> >>> > > >>>>> with a oneof{FlightData, Any} in it instead so users can
>> >>> > > >>>>> add
>> >>> > > >>>>> more
>> >>> > data
>> >>> > > >>>>> as
>> >>> > > >>>>> desired. The default impl could be a noop for the Any
>> >>> > > >>>>> messages.
>> >>> > > >>>>>
>> >>> > > >>>>> On Tue, Oct 15, 2019 at 6:50 PM David Li
>> >>> > > >>>>> <li.david...@gmail.com>
>> >>> > > >>>>> wrote:
>> >>> > > >>>>>
>> >>> > > >>>>>> Hi Jacques,
>> >>> > > >>>>>>
>> >>> > > >>>>>> Thanks for the comments.
>> >>> > > >>>>>>
>> >>> > > >>>>>> - I do agree DoExchange is a better name!
>> >>> > > >>>>>> - FlightData already has metadata fields as a result of
>> prior
>> >>> > > >>>>>> proposals, so I don't think we need a new message to carry
>> >>> > > >>>>>> that
>> >>> > kind
>> >>> > > >>>>>> of information.
>> >>> > > >>>>>> - I like the suggestion of an async handler to handle
>> >>> > > >>>>>> incoming
>> >>> > > >>>>>> messages as the fundamental API; it would actually be
>> >>> > > >>>>>> quite
>> >>> > natural
>> >>> > > >>>>>> to
>> >>> > > >>>>>> implement in Flight/Java. I will note that it's not
>> >>> > > >>>>>> possible
>> >>> > > >>>>>> in
>> >>> > > >>>>>> C++/Python without spawning a thread, though. (In essence,
>> >>> > gRPC-Java
>> >>> > > >>>>>> is async-always and gRPC-C++ is sync-always.) There are
>> >>> > experimental
>> >>> > > >>>>>> C++ APIs that would let us do something similar to Java,
>> >>> > > >>>>>> but
>> >>> > > >>>>>> those
>> >>> > > >>>>>> are
>> >>> > > >>>>>> only in relatively recent gRPC versions and are still
>> >>> > > >>>>>> under
>> >>> > > >>>>>> development (contrary to the interceptor APIs which have
>> been
>> >>> > around
>> >>> > > >>>>>> for quite a while).
>> >>> > > >>>>>>
>> >>> > > >>>>>> Thanks,
>> >>> > > >>>>>> David
>> >>> > > >>>>>>
>> >>> > > >>>>>> On 10/15/19, Jacques Nadeau <jacq...@apache.org> wrote:
>> >>> > > >>>>>>> I like it. Added some comments to the doc. Might worth
>> >>> > > >>>>>>> discussion
>> >>> > > >>>>>>> here
>> >>> > > >>>>>>> depending on your thoughts.
>> >>> > > >>>>>>>
>> >>> > > >>>>>>> On Tue, Oct 15, 2019 at 7:11 AM David Li
>> >>> > > >>>>>>> <li.david...@gmail.com>
>> >>> > > >>>> wrote:
>> >>> > > >>>>>>>
>> >>> > > >>>>>>>> Hey Ryan,
>> >>> > > >>>>>>>>
>> >>> > > >>>>>>>> Thanks for the comments.
>> >>> > > >>>>>>>>
>> >>> > > >>>>>>>> Concrete example: I've edited the doc to provide a
>> >>> > > >>>>>>>> Python
>> >>> > strawman.
>> >>> > > >>>>>>>>
>> >>> > > >>>>>>>> Sync vs async: while I don't touch on it, you could
>> >>> > > >>>>>>>> interleave
>> >>> > > >>>> uploads
>> >>> > > >>>>>>>> and downloads if you were so inclined. Right now,
>> >>> > > >>>>>>>> synchronous
>> >>> > APIs
>> >>> > > >>>>>>>> make this error-prone, e.g. if both client and server
>> >>> > > >>>>>>>> wait
>> >>> > > >>>>>>>> for
>> >>> > each
>> >>> > > >>>>>>>> other due to an application logic bug. (gRPC doesn't
>> >>> > > >>>>>>>> give
>> >>> > > >>>>>>>> us
>> >>> > > >>>>>>>> the
>> >>> > > >>>>>>>> ability to have per-read timeouts, only an overall
>> >>> > > >>>>>>>> timeout.)
>> >>> > > >>>>>>>> As
>> >>> > an
>> >>> > > >>>>>>>> example of this happening with DoPut, see ARROW-6063:
>> >>> > > >>>>>>>> https://issues.apache.org/jira/browse/ARROW-6063
>> >>> > > >>>>>>>>
>> >>> > > >>>>>>>> This is mostly tangential though, eventually we will
>> >>> > > >>>>>>>> want
>> >>> > > >>>>>>>> to
>> >>> > design
>> >>> > > >>>>>>>> asynchronous APIs for Flight as a whole. A bidirectional
>> >>> > > >>>>>>>> stream
>> >>> > > >>>>>>>> like
>> >>> > > >>>>>>>> this (and like DoPut) just makes these pitfalls easier
>> >>> > > >>>>>>>> to
>> >>> > > >>>>>>>> run
>> >>> > into.
>> >>> > > >>>>>>>>
>> >>> > > >>>>>>>> Using DoPut+DoGet: I discussed this in the proposal, but
>> >>> > > >>>>>>>> the
>> >>> > main
>> >>> > > >>>>>>>> concern is that depending on how you deploy, two
>> >>> > > >>>>>>>> separate
>> >>> > > >>>>>>>> calls
>> >>> > > >>>>>>>> could
>> >>> > > >>>>>>>> get routed to different instances. Additionally, gRPC
>> >>> > > >>>>>>>> has
>> >>> > > >>>>>>>> some
>> >>> > > >>>>>>>> reconnection behaviors; if the server goes away in
>> >>> > > >>>>>>>> between
>> >>> > > >>>>>>>> the
>> >>> > two
>> >>> > > >>>>>>>> calls, but it then restarts or there is another instance
>> >>> > available,
>> >>> > > >>>>>>>> the client will happily reconnect to the new server
>> without
>> >>> > > >>>>>>>> warning.
>> >>> > > >>>>>>>>
>> >>> > > >>>>>>>> Thanks,
>> >>> > > >>>>>>>> David
>> >>> > > >>>>>>>>
>> >>> > > >>>>>>>> On 10/15/19, Ryan Murray <rym...@dremio.com> wrote:
>> >>> > > >>>>>>>>> Hey David,
>> >>> > > >>>>>>>>>
>> >>> > > >>>>>>>>> I think this proposal makes a lot of sense. I like it
>> >>> > > >>>>>>>>> and
>> >>> > > >>>>>>>>> the
>> >>> > > >>>>>>>>> possibility
>> >>> > > >>>>>>>>> of remote compute via arrow buffers. One thing that
>> >>> > > >>>>>>>>> would
>> >>> > > >>>>>>>>> help
>> >>> > me
>> >>> > > >>>>>> would
>> >>> > > >>>>>>>> be
>> >>> > > >>>>>>>>> a concrete example of the API in a real life use case.
>> >>> > > >>>>>>>>> Also,
>> >>> > what
>> >>> > > >>>>>> would
>> >>> > > >>>>>>>> the
>> >>> > > >>>>>>>>> client experience be in terms of sync vs asyc? Would
>> >>> > > >>>>>>>>> the
>> >>> > > >>>>>>>>> client
>> >>> > > >>>>>>>>> block
>> >>> > > >>>>>>>> till
>> >>> > > >>>>>>>>> the bidirectional call return ie c =
>> flight.vector_mult(a,
>> >>> > > >>>>>>>>> b)
>> >>> > or
>> >>> > > >>>>>>>>> would
>> >>> > > >>>>>>>> the
>> >>> > > >>>>>>>>> client wait to be signaled that computation was done.
>> >>> > > >>>>>>>>> If
>> >>> > > >>>>>>>>> the
>> >>> > > >>>>>>>>> later
>> >>> > > >>>>>>>>> how
>> >>> > > >>>>>>>>> is
>> >>> > > >>>>>>>>> that different from a DoPut then DoGet? I suppose that
>> >>> > > >>>>>>>>> this
>> >>> > could
>> >>> > > >>>> be
>> >>> > > >>>>>>>>> implemented without extending the RPC interface but
>> rather
>> >>> > > >>>>>>>>> by a
>> >>> > > >>>>>>>>> function/util?
>> >>> > > >>>>>>>>>
>> >>> > > >>>>>>>>>
>> >>> > > >>>>>>>>> Best,
>> >>> > > >>>>>>>>>
>> >>> > > >>>>>>>>> Ryan
>> >>> > > >>>>>>>>>
>> >>> > > >>>>>>>>> On Sun, Oct 13, 2019 at 9:24 PM David Li <
>> >>> > li.david...@gmail.com>
>> >>> > > >>>>>> wrote:
>> >>> > > >>>>>>>>>
>> >>> > > >>>>>>>>>> Hi all,
>> >>> > > >>>>>>>>>>
>> >>> > > >>>>>>>>>> We've been using Flight quite successfully so far, but
>> we
>> >>> > > >>>>>>>>>> have
>> >>> > > >>>>>>>>>> identified a new use case on the horizon: being able
>> >>> > > >>>>>>>>>> to
>> >>> > > >>>>>>>>>> both
>> >>> > > >>>>>>>>>> send
>> >>> > > >>>>>>>>>> and
>> >>> > > >>>>>>>>>> retrieve Arrow data within a single RPC call. To that
>> >>> > > >>>>>>>>>> end,
>> >>> > I've
>> >>> > > >>>>>>>>>> written up a proposal for a new RPC method:
>> >>> > > >>>>>>>>>>
>> >>> > > >>>>>>>>>>
>> >>> > > >>>>>>>>
>> >>> > > >>>>>>
>> >>> > > >>>>
>> >>> >
>> https://docs.google.com/document/d/1Hh-3Z0hK5PxyEYFxwVxp77jens3yAgC_cpp0TGW-dcw/edit?usp=sharing
>> >>> > > >>>>>>>>>>
>> >>> > > >>>>>>>>>> Please let me know if you can't view or comment on the
>> >>> > document.
>> >>> > > >>>>>>>>>> I'd
>> >>> > > >>>>>>>>>> appreciate any feedback; I think this is a relatively
>> >>> > > >>>>>>>>>> straightforward
>> >>> > > >>>>>>>>>> addition - it is essentially "DoPutThenGet".
>> >>> > > >>>>>>>>>>
>> >>> > > >>>>>>>>>> This is a format change and would require a vote. I've
>> >>> > > >>>>>>>>>> decided
>> >>> > > >>>>>>>>>> to
>> >>> > > >>>>>>>>>> table the other format change I had proposed (on
>> >>> > > >>>>>>>>>> DoPut),
>> >>> > > >>>>>>>>>> as
>> >>> > > >>>>>>>>>> it
>> >>> > > >>>>>> doesn't
>> >>> > > >>>>>>>>>> functionally change Flight, just the interpretation of
>> >>> > > >>>>>>>>>> the
>> >>> > > >>>>>>>>>> semantics.
>> >>> > > >>>>>>>>>>
>> >>> > > >>>>>>>>>> Thanks,
>> >>> > > >>>>>>>>>> David
>> >>> > > >>>>>>>>>>
>> >>> > > >>>>>>>>>
>> >>> > > >>>>>>>>>
>> >>> > > >>>>>>>>> --
>> >>> > > >>>>>>>>>
>> >>> > > >>>>>>>>> Ryan Murray  | Principal Consulting Engineer
>> >>> > > >>>>>>>>>
>> >>> > > >>>>>>>>> +447540852009 | rym...@dremio.com
>> >>> > > >>>>>>>>>
>> >>> > > >>>>>>>>> <https://www.dremio.com/>
>> >>> > > >>>>>>>>> Check out our GitHub <https://www.github.com/dremio>,
>> join
>> >>> > > >>>>>>>>> our
>> >>> > > >>>>>>>>> community
>> >>> > > >>>>>>>>> site <https://community.dremio.com/> & Download Dremio
>> >>> > > >>>>>>>>> <https://www.dremio.com/download>
>> >>> > > >>>>>>>>>
>> >>> > > >>>>>>>>
>> >>> > > >>>>>>>
>> >>> > > >>>>>>
>> >>> > > >>>>>
>> >>> > > >>>>
>> >>> > > >>>
>> >>> > > >
>> >>> >
>> >>
>> >
>>
>

Reply via email to