Thanks for chiming in - I've replied in the doc. Scoping it to just schema 
evolution would be preferable, but I'm not sure if Gosh's usecase requires more 
flexibility than that or not.

Again, though, given that 1) gRPC recycles a connection, so repeated calls 
aren't necessarily expensive and 2) encoding tricks like union-of-structs, any 
solution needs to be weighed against those/we should make sure to document why 
they aren't sufficient. (For instance, 1) is hampered by the use of L7 load 
balancers and/or client-side load balancing policies in gRPC and assumes 
statefulness which is undesirable in general. There's also the eventual desire 
to have a transport besides gRPC someday.)

-David

On Wed, Jun 23, 2021, at 16:24, Nate Bauernfeind wrote:
> Thanks for writing this up! I added a few general comments, but have a
> question on the approach because it's not quite what I was expecting.
> 
> I am slightly concerned that the proposal looks more like support for
> "multiplexing" IPC streams into a single RPC stream rather than support for
> a changing Schema of an otherwise consistently logical stream. gRPC already
> does a good job decoupling RPC streams from one another. I feel that
> throwing that idea into the IPC stream increases client-library
> implementation cost by quite a lot.
> 
> Why is it not good enough to replace the Schema when we see a duplicate?
> This is undoubtedly less work across all client implementations.
> 
> The benefit I see is that you might have two schemas that you swap between
> frequently then you can indicate with a single integer. If that's what you
> want to support I would rather think of them as `schema_id` instead of
> `stream_id` and not give this impression that multiplexing is a goal. As
> you have proposed, it seems that the "done writing for a stream" needs a
> callback notifying the user receiving the stream that a logical subset of
> the flight is complete. Alternatively, if they aren't independent streams
> (to the end-user), we could tell the Arrow layer that a particular schema
> is no longer needed without also needing to communicate further downstream.
> 
> On Wed, Jun 23, 2021 at 1:39 PM David Li <lidav...@apache.org> wrote:
> 
> > Ah to be clear, the API is indeed inconsistent - DoExchange was added some
> > time later (and by its nature returning a FlightDataStream would not have
> > been possible, since it's meant to be able to interleave reading/writing).
> > But really, DoGet is indeed the odd one out in the C++ API and it may be
> > worth correcting. You could also perhaps imagine making a FlightDataStream
> > implementation that accepts a closure and provides it a fake writer, if the
> > API mismatch is hard to work with...
> >
> > That said: this has some benefits, e.g. for a Python service that returns
> > a Table, that means data can be fed into gRPC entirely in C++ without
> > having to bounce into Python for each chunk.
> >
> > Best,
> > David
> >
> > On Wed, Jun 23, 2021, at 15:33, Gosh Arzumanyan wrote:
> > > Hi David,
> > >
> > > Got you. In fact I was looking at this more from the point of view of
> > consistency of the API in terms of "inputs" and thought DoExchange is kind
> > of a DoGet+ so might make sense to have the same classes being utilized in
> > both places. But again, I might be missing something and I get the point
> > about breaking change.
> > >
> > > Cheers,
> > > Gosh
> > >
> > > On Wed, Jun 23, 2021 at 2:58 PM David Li <lidav...@apache.org> wrote:
> > >> __
> > >> It's mostly a quirk of implementation (and just for clarification,
> > they're all nearly identical on the format/protocol level).
> > >>
> > >> DoGet is conceptualized as your application returning a readable stream
> > of batches, instead of your application imperatively writing batches to the
> > client. (This is different than how Flight is implemented in Java.) You
> > would normally not implement FlightDataStream - you would return a
> > RecordBatchStream.
> > >>
> > >> DoGet could not have FlightMessageWriter as a return type as that
> > wouldn't make sense, but it could accept an instance of that as a parameter
> > instead, much like DoExchange. That would be a breaking change.
> > >>
> > >> Best,
> > >> David
> > >>
> > >> On Wed, Jun 23, 2021, at 08:47, Gosh Arzumanyan wrote:
> > >>> Hi David,
> > >>>
> > >>> Going through the ArrowFlight API: got confused a bit on DoGet and
> > >>> DoPut/DoExachange apis: the former one expects FlightDataStream which
> > talks
> > >>> in already serialized message terms while the latter to
> > >>> accept FlightMessageReader/Writer which expect the user to pass in
> > >>> RecordBatches etc. Is there any reason why the DoGet can't have
> > >>> FlightMessageWriter as a return type?
> > >>>
> > >>> Cheers,
> > >>> Gosh
> > >>>
> > >>> On Mon, Jun 21, 2021 at 9:47 PM Gosh Arzumanyan <gosh...@gmail.com>
> > wrote:
> > >>>
> > >>> > Thanks David!
> > >>> >
> > >>> > I also responded/added more suggestions/questions to the doc. I
> > think it
> > >>> > makes sense to have two sections: one purely protocol oriented and
> > second
> > >>> > API oriented(examples in c++ or in any other language should make
> > the idea
> > >>> > easier to digest).
> > >>> >
> > >>> > Thanks for the reference too!
> > >>> >
> > >>> > Cheers,
> > >>> > Gosh
> > >>> >
> > >>> > On Mon, Jun 21, 2021 at 4:41 PM David Li <lidav...@apache.org>
> > wrote:
> > >>> >
> > >>> >> Thanks! I've left some initial comments/suggestions to expand it in
> > terms
> > >>> >> of the format definitions and not the C++ APIs.
> > >>> >>
> > >>> >> I'll also note something like this was proposed a long time ago -
> > there's
> > >>> >> not very much discussion about it there but for reference:
> > >>> >>
> > https://lists.apache.org/thread.html/0e5ba78c48cdd0e357f3a4a6d8affd31767c34376b62c001910823af%40%3Cdev.arrow.apache.org%3E
> > >>> >> (or see the thread '[Discuss][FlightRPC] Extensions to Flight:
> > >>> >> "DoBidirectional"' from 2019-2020). It might be good to address why
> > the
> > >>> >> proposed workaround there (union-of-structs) is insufficient for
> > the use
> > >>> >> cases here (and in FlightSQL).
> > >>> >>
> > >>> >> -David
> > >>> >>
> > >>> >> On Mon, Jun 21, 2021, at 08:22, Gosh Arzumanyan wrote:
> > >>> >> > Ah sorry, comments should work now.
> > >>> >> >
> > >>> >> > Cheers,
> > >>> >> > Gosh
> > >>> >> >
> > >>> >> > On Mon., 21 Jun. 2021, 14:18 David Li, <lidav...@apache.org
> > <mailto:
> > >>> >> lidavidm%40apache.org>> wrote:
> > >>> >> >
> > >>> >> > > Thanks! Will give it a look.
> > >>> >> > >
> > >>> >> > > Would you mind opening it up for comments?
> > >>> >> > >
> > >>> >> > > -David
> > >>> >> > >
> > >>> >> > > On 2021/06/21 11:56:24, Gosh Arzumanyan <gosh...@gmail.com
> > <mailto:
> > >>> >> gosharz%40gmail.com>> wrote:
> > >>> >> > > > Hi folks,
> > >>> >> > > >
> > >>> >> > > > Started putting some thoughts together here:
> > >>> >> > > >
> > >>> >> > >
> > >>> >>
> > https://docs.google.com/document/d/1dIOpKNYwsd9sdChsRBAx37BiJXl_7enpwWkH76n1tOI/edit?usp=sharing
> > >>> >> > > > Any feedback is welcome!
> > >>> >> > > >
> > >>> >> > > > Cheers,
> > >>> >> > > > Gosh
> > >>> >> > > >
> > >>> >> > >
> > >>> >> >
> > >>> >>
> > >>> >
> > >>>
> > >>
> >
> 
> 
> --
> 

Reply via email to