>
> 1. It seems like renaming stream_id to schema_id and delegating "logical
> stream" distinction to app_metadata mitigates the "multiplexing" point
> while at the same time it gives enough flexibility to address both Nate's
> and our use cases.


I don't think this is the case.  It seems that having no additional fields
added and then sending a new schema when necessary combined with Union of
Structs would solve most use-cases.  The main downside could be potential
performance implications if the schema is changing frequently.  Gosh, could
you address why this wouldn't be sufficient (either here or on the doc).

Thanks,
-Micah

On Fri, Jun 25, 2021 at 5:30 AM Gosh Arzumanyan <gosh...@gmail.com> wrote:

> Hi guys,
>
> Thanks for sharing your insights/concerns! I also left some comments based
> on the discussion we had. Briefly:
>
> 1. It seems like renaming stream_id to schema_id and delegating "logical
> stream" distinction to app_metadata mitigates the "multiplexing" point
> while at the same time it gives enough flexibility to address both Nate's
> and our use cases.
> 2. To David's point about other transports: in fact currently we are using
> other transports(aside from gRPC) so we don't wanna depend on only gRPC
> features.
>
> Cheers,
> Gosh
>
> On Wed, Jun 23, 2021 at 10:40 PM David Li <lidav...@apache.org> wrote:
>
> > Thanks for chiming in - I've replied in the doc. Scoping it to just
> schema
> > evolution would be preferable, but I'm not sure if Gosh's usecase
> requires
> > more flexibility than that or not.
> >
> > Again, though, given that 1) gRPC recycles a connection, so repeated
> calls
> > aren't necessarily expensive and 2) encoding tricks like
> union-of-structs,
> > any solution needs to be weighed against those/we should make sure to
> > document why they aren't sufficient. (For instance, 1) is hampered by the
> > use of L7 load balancers and/or client-side load balancing policies in
> gRPC
> > and assumes statefulness which is undesirable in general. There's also
> the
> > eventual desire to have a transport besides gRPC someday.)
> >
> > -David
> >
> > On Wed, Jun 23, 2021, at 16:24, Nate Bauernfeind wrote:
> >
> > Thanks for writing this up! I added a few general comments, but have a
> > question on the approach because it's not quite what I was expecting.
> >
> > I am slightly concerned that the proposal looks more like support for
> > "multiplexing" IPC streams into a single RPC stream rather than support
> for
> > a changing Schema of an otherwise consistently logical stream. gRPC
> already
> > does a good job decoupling RPC streams from one another. I feel that
> > throwing that idea into the IPC stream increases client-library
> > implementation cost by quite a lot.
> >
> > Why is it not good enough to replace the Schema when we see a duplicate?
> > This is undoubtedly less work across all client implementations.
> >
> > The benefit I see is that you might have two schemas that you swap
> between
> > frequently then you can indicate with a single integer. If that's what
> you
> > want to support I would rather think of them as `schema_id` instead of
> > `stream_id` and not give this impression that multiplexing is a goal. As
> > you have proposed, it seems that the "done writing for a stream" needs a
> > callback notifying the user receiving the stream that a logical subset of
> > the flight is complete. Alternatively, if they aren't independent streams
> > (to the end-user), we could tell the Arrow layer that a particular schema
> > is no longer needed without also needing to communicate further
> downstream.
> >
> > On Wed, Jun 23, 2021 at 1:39 PM David Li <lidav...@apache.org> wrote:
> >
> > > Ah to be clear, the API is indeed inconsistent - DoExchange was added
> > some
> > > time later (and by its nature returning a FlightDataStream would not
> have
> > > been possible, since it's meant to be able to interleave
> > reading/writing).
> > > But really, DoGet is indeed the odd one out in the C++ API and it may
> be
> > > worth correcting. You could also perhaps imagine making a
> > FlightDataStream
> > > implementation that accepts a closure and provides it a fake writer, if
> > the
> > > API mismatch is hard to work with...
> > >
> > > That said: this has some benefits, e.g. for a Python service that
> returns
> > > a Table, that means data can be fed into gRPC entirely in C++ without
> > > having to bounce into Python for each chunk.
> > >
> > > Best,
> > > David
> > >
> > > On Wed, Jun 23, 2021, at 15:33, Gosh Arzumanyan wrote:
> > > > Hi David,
> > > >
> > > > Got you. In fact I was looking at this more from the point of view of
> > > consistency of the API in terms of "inputs" and thought DoExchange is
> > kind
> > > of a DoGet+ so might make sense to have the same classes being utilized
> > in
> > > both places. But again, I might be missing something and I get the
> point
> > > about breaking change.
> > > >
> > > > Cheers,
> > > > Gosh
> > > >
> > > > On Wed, Jun 23, 2021 at 2:58 PM David Li <lidav...@apache.org>
> wrote:
> > > >> __
> > > >> It's mostly a quirk of implementation (and just for clarification,
> > > they're all nearly identical on the format/protocol level).
> > > >>
> > > >> DoGet is conceptualized as your application returning a readable
> > stream
> > > of batches, instead of your application imperatively writing batches to
> > the
> > > client. (This is different than how Flight is implemented in Java.) You
> > > would normally not implement FlightDataStream - you would return a
> > > RecordBatchStream.
> > > >>
> > > >> DoGet could not have FlightMessageWriter as a return type as that
> > > wouldn't make sense, but it could accept an instance of that as a
> > parameter
> > > instead, much like DoExchange. That would be a breaking change.
> > > >>
> > > >> Best,
> > > >> David
> > > >>
> > > >> On Wed, Jun 23, 2021, at 08:47, Gosh Arzumanyan wrote:
> > > >>> Hi David,
> > > >>>
> > > >>> Going through the ArrowFlight API: got confused a bit on DoGet and
> > > >>> DoPut/DoExachange apis: the former one expects FlightDataStream
> which
> > > talks
> > > >>> in already serialized message terms while the latter to
> > > >>> accept FlightMessageReader/Writer which expect the user to pass in
> > > >>> RecordBatches etc. Is there any reason why the DoGet can't have
> > > >>> FlightMessageWriter as a return type?
> > > >>>
> > > >>> Cheers,
> > > >>> Gosh
> > > >>>
> > > >>> On Mon, Jun 21, 2021 at 9:47 PM Gosh Arzumanyan <gosh...@gmail.com
> >
> > > wrote:
> > > >>>
> > > >>> > Thanks David!
> > > >>> >
> > > >>> > I also responded/added more suggestions/questions to the doc. I
> > > think it
> > > >>> > makes sense to have two sections: one purely protocol oriented
> and
> > > second
> > > >>> > API oriented(examples in c++ or in any other language should make
> > > the idea
> > > >>> > easier to digest).
> > > >>> >
> > > >>> > Thanks for the reference too!
> > > >>> >
> > > >>> > Cheers,
> > > >>> > Gosh
> > > >>> >
> > > >>> > On Mon, Jun 21, 2021 at 4:41 PM David Li <lidav...@apache.org>
> > > wrote:
> > > >>> >
> > > >>> >> Thanks! I've left some initial comments/suggestions to expand it
> > in
> > > terms
> > > >>> >> of the format definitions and not the C++ APIs.
> > > >>> >>
> > > >>> >> I'll also note something like this was proposed a long time ago
> -
> > > there's
> > > >>> >> not very much discussion about it there but for reference:
> > > >>> >>
> > >
> >
> https://lists.apache.org/thread.html/0e5ba78c48cdd0e357f3a4a6d8affd31767c34376b62c001910823af%40%3Cdev.arrow.apache.org%3E
> > > >>> >> (or see the thread '[Discuss][FlightRPC] Extensions to Flight:
> > > >>> >> "DoBidirectional"' from 2019-2020). It might be good to address
> > why
> > > the
> > > >>> >> proposed workaround there (union-of-structs) is insufficient for
> > > the use
> > > >>> >> cases here (and in FlightSQL).
> > > >>> >>
> > > >>> >> -David
> > > >>> >>
> > > >>> >> On Mon, Jun 21, 2021, at 08:22, Gosh Arzumanyan wrote:
> > > >>> >> > Ah sorry, comments should work now.
> > > >>> >> >
> > > >>> >> > Cheers,
> > > >>> >> > Gosh
> > > >>> >> >
> > > >>> >> > On Mon., 21 Jun. 2021, 14:18 David Li, <lidav...@apache.org
> > > <mailto:
> > > >>> >> lidavidm%40apache.org>> wrote:
> > > >>> >> >
> > > >>> >> > > Thanks! Will give it a look.
> > > >>> >> > >
> > > >>> >> > > Would you mind opening it up for comments?
> > > >>> >> > >
> > > >>> >> > > -David
> > > >>> >> > >
> > > >>> >> > > On 2021/06/21 11:56:24, Gosh Arzumanyan <gosh...@gmail.com
> > > <mailto:
> > > >>> >> gosharz%40gmail.com>> wrote:
> > > >>> >> > > > Hi folks,
> > > >>> >> > > >
> > > >>> >> > > > Started putting some thoughts together here:
> > > >>> >> > > >
> > > >>> >> > >
> > > >>> >>
> > >
> >
> https://docs.google.com/document/d/1dIOpKNYwsd9sdChsRBAx37BiJXl_7enpwWkH76n1tOI/edit?usp=sharing
> > > >>> >> > > > Any feedback is welcome!
> > > >>> >> > > >
> > > >>> >> > > > Cheers,
> > > >>> >> > > > Gosh
> > > >>> >> > > >
> > > >>> >> > >
> > > >>> >> >
> > > >>> >>
> > > >>> >
> > > >>>
> > > >>
> > >
> >
> >
> > --
> >
> >
> >
>

Reply via email to