Hi Micah,

Sure, let me do it here:

   1. In our case we do expect relatively frequent changes in the schema of
   the batch being sent out. I don't see that pattern changing in the mid term
   for a good reason. However long term maybe it will be possible to leverage
   separate RPC calls. I left some description in the comments thread here
   
<https://docs.google.com/document/d/1dIOpKNYwsd9sdChsRBAx37BiJXl_7enpwWkH76n1tOI/edit?disco=AAAAMkkJ9xQ>
   .
   2. Regarding the union of structs trick: while it is a good workaround
   for most of the cases as of now there are also some aspects which deserve
   consideration:
      1. There is a  space overhead
      2. There is an additional glue code to make it work
      3. Considering that this solution is bubbling up frequently in
      similar contexts, seems like users might benefit if it  was "natively"
      supported and they could just focus on populating the schemas they really
      need to.
   3. Re complexity of "one schema at a time" vs "schema id based": i think
   they are not much different, right? In fact the second one is more of an
   optimization to the first one which is beneficial to us. Anyways even with
   the first approach you need to add some logic of schema change. Adding the
   schema identification is not that complex at first glance.

Cheers,
Gosh

On Fri, Jun 25, 2021 at 6:25 PM Micah Kornfield <emkornfi...@gmail.com>
wrote:

> >
> > 1. It seems like renaming stream_id to schema_id and delegating "logical
> > stream" distinction to app_metadata mitigates the "multiplexing" point
> > while at the same time it gives enough flexibility to address both Nate's
> > and our use cases.
>
>
> I don't think this is the case.  It seems that having no additional fields
> added and then sending a new schema when necessary combined with Union of
> Structs would solve most use-cases.  The main downside could be potential
> performance implications if the schema is changing frequently.  Gosh, could
> you address why this wouldn't be sufficient (either here or on the doc).
>
> Thanks,
> -Micah
>
> On Fri, Jun 25, 2021 at 5:30 AM Gosh Arzumanyan <gosh...@gmail.com> wrote:
>
> > Hi guys,
> >
> > Thanks for sharing your insights/concerns! I also left some comments
> based
> > on the discussion we had. Briefly:
> >
> > 1. It seems like renaming stream_id to schema_id and delegating "logical
> > stream" distinction to app_metadata mitigates the "multiplexing" point
> > while at the same time it gives enough flexibility to address both Nate's
> > and our use cases.
> > 2. To David's point about other transports: in fact currently we are
> using
> > other transports(aside from gRPC) so we don't wanna depend on only gRPC
> > features.
> >
> > Cheers,
> > Gosh
> >
> > On Wed, Jun 23, 2021 at 10:40 PM David Li <lidav...@apache.org> wrote:
> >
> > > Thanks for chiming in - I've replied in the doc. Scoping it to just
> > schema
> > > evolution would be preferable, but I'm not sure if Gosh's usecase
> > requires
> > > more flexibility than that or not.
> > >
> > > Again, though, given that 1) gRPC recycles a connection, so repeated
> > calls
> > > aren't necessarily expensive and 2) encoding tricks like
> > union-of-structs,
> > > any solution needs to be weighed against those/we should make sure to
> > > document why they aren't sufficient. (For instance, 1) is hampered by
> the
> > > use of L7 load balancers and/or client-side load balancing policies in
> > gRPC
> > > and assumes statefulness which is undesirable in general. There's also
> > the
> > > eventual desire to have a transport besides gRPC someday.)
> > >
> > > -David
> > >
> > > On Wed, Jun 23, 2021, at 16:24, Nate Bauernfeind wrote:
> > >
> > > Thanks for writing this up! I added a few general comments, but have a
> > > question on the approach because it's not quite what I was expecting.
> > >
> > > I am slightly concerned that the proposal looks more like support for
> > > "multiplexing" IPC streams into a single RPC stream rather than support
> > for
> > > a changing Schema of an otherwise consistently logical stream. gRPC
> > already
> > > does a good job decoupling RPC streams from one another. I feel that
> > > throwing that idea into the IPC stream increases client-library
> > > implementation cost by quite a lot.
> > >
> > > Why is it not good enough to replace the Schema when we see a
> duplicate?
> > > This is undoubtedly less work across all client implementations.
> > >
> > > The benefit I see is that you might have two schemas that you swap
> > between
> > > frequently then you can indicate with a single integer. If that's what
> > you
> > > want to support I would rather think of them as `schema_id` instead of
> > > `stream_id` and not give this impression that multiplexing is a goal.
> As
> > > you have proposed, it seems that the "done writing for a stream" needs
> a
> > > callback notifying the user receiving the stream that a logical subset
> of
> > > the flight is complete. Alternatively, if they aren't independent
> streams
> > > (to the end-user), we could tell the Arrow layer that a particular
> schema
> > > is no longer needed without also needing to communicate further
> > downstream.
> > >
> > > On Wed, Jun 23, 2021 at 1:39 PM David Li <lidav...@apache.org> wrote:
> > >
> > > > Ah to be clear, the API is indeed inconsistent - DoExchange was added
> > > some
> > > > time later (and by its nature returning a FlightDataStream would not
> > have
> > > > been possible, since it's meant to be able to interleave
> > > reading/writing).
> > > > But really, DoGet is indeed the odd one out in the C++ API and it may
> > be
> > > > worth correcting. You could also perhaps imagine making a
> > > FlightDataStream
> > > > implementation that accepts a closure and provides it a fake writer,
> if
> > > the
> > > > API mismatch is hard to work with...
> > > >
> > > > That said: this has some benefits, e.g. for a Python service that
> > returns
> > > > a Table, that means data can be fed into gRPC entirely in C++ without
> > > > having to bounce into Python for each chunk.
> > > >
> > > > Best,
> > > > David
> > > >
> > > > On Wed, Jun 23, 2021, at 15:33, Gosh Arzumanyan wrote:
> > > > > Hi David,
> > > > >
> > > > > Got you. In fact I was looking at this more from the point of view
> of
> > > > consistency of the API in terms of "inputs" and thought DoExchange is
> > > kind
> > > > of a DoGet+ so might make sense to have the same classes being
> utilized
> > > in
> > > > both places. But again, I might be missing something and I get the
> > point
> > > > about breaking change.
> > > > >
> > > > > Cheers,
> > > > > Gosh
> > > > >
> > > > > On Wed, Jun 23, 2021 at 2:58 PM David Li <lidav...@apache.org>
> > wrote:
> > > > >> __
> > > > >> It's mostly a quirk of implementation (and just for clarification,
> > > > they're all nearly identical on the format/protocol level).
> > > > >>
> > > > >> DoGet is conceptualized as your application returning a readable
> > > stream
> > > > of batches, instead of your application imperatively writing batches
> to
> > > the
> > > > client. (This is different than how Flight is implemented in Java.)
> You
> > > > would normally not implement FlightDataStream - you would return a
> > > > RecordBatchStream.
> > > > >>
> > > > >> DoGet could not have FlightMessageWriter as a return type as that
> > > > wouldn't make sense, but it could accept an instance of that as a
> > > parameter
> > > > instead, much like DoExchange. That would be a breaking change.
> > > > >>
> > > > >> Best,
> > > > >> David
> > > > >>
> > > > >> On Wed, Jun 23, 2021, at 08:47, Gosh Arzumanyan wrote:
> > > > >>> Hi David,
> > > > >>>
> > > > >>> Going through the ArrowFlight API: got confused a bit on DoGet
> and
> > > > >>> DoPut/DoExachange apis: the former one expects FlightDataStream
> > which
> > > > talks
> > > > >>> in already serialized message terms while the latter to
> > > > >>> accept FlightMessageReader/Writer which expect the user to pass
> in
> > > > >>> RecordBatches etc. Is there any reason why the DoGet can't have
> > > > >>> FlightMessageWriter as a return type?
> > > > >>>
> > > > >>> Cheers,
> > > > >>> Gosh
> > > > >>>
> > > > >>> On Mon, Jun 21, 2021 at 9:47 PM Gosh Arzumanyan <
> gosh...@gmail.com
> > >
> > > > wrote:
> > > > >>>
> > > > >>> > Thanks David!
> > > > >>> >
> > > > >>> > I also responded/added more suggestions/questions to the doc. I
> > > > think it
> > > > >>> > makes sense to have two sections: one purely protocol oriented
> > and
> > > > second
> > > > >>> > API oriented(examples in c++ or in any other language should
> make
> > > > the idea
> > > > >>> > easier to digest).
> > > > >>> >
> > > > >>> > Thanks for the reference too!
> > > > >>> >
> > > > >>> > Cheers,
> > > > >>> > Gosh
> > > > >>> >
> > > > >>> > On Mon, Jun 21, 2021 at 4:41 PM David Li <lidav...@apache.org>
> > > > wrote:
> > > > >>> >
> > > > >>> >> Thanks! I've left some initial comments/suggestions to expand
> it
> > > in
> > > > terms
> > > > >>> >> of the format definitions and not the C++ APIs.
> > > > >>> >>
> > > > >>> >> I'll also note something like this was proposed a long time
> ago
> > -
> > > > there's
> > > > >>> >> not very much discussion about it there but for reference:
> > > > >>> >>
> > > >
> > >
> >
> https://lists.apache.org/thread.html/0e5ba78c48cdd0e357f3a4a6d8affd31767c34376b62c001910823af%40%3Cdev.arrow.apache.org%3E
> > > > >>> >> (or see the thread '[Discuss][FlightRPC] Extensions to Flight:
> > > > >>> >> "DoBidirectional"' from 2019-2020). It might be good to
> address
> > > why
> > > > the
> > > > >>> >> proposed workaround there (union-of-structs) is insufficient
> for
> > > > the use
> > > > >>> >> cases here (and in FlightSQL).
> > > > >>> >>
> > > > >>> >> -David
> > > > >>> >>
> > > > >>> >> On Mon, Jun 21, 2021, at 08:22, Gosh Arzumanyan wrote:
> > > > >>> >> > Ah sorry, comments should work now.
> > > > >>> >> >
> > > > >>> >> > Cheers,
> > > > >>> >> > Gosh
> > > > >>> >> >
> > > > >>> >> > On Mon., 21 Jun. 2021, 14:18 David Li, <lidav...@apache.org
> > > > <mailto:
> > > > >>> >> lidavidm%40apache.org>> wrote:
> > > > >>> >> >
> > > > >>> >> > > Thanks! Will give it a look.
> > > > >>> >> > >
> > > > >>> >> > > Would you mind opening it up for comments?
> > > > >>> >> > >
> > > > >>> >> > > -David
> > > > >>> >> > >
> > > > >>> >> > > On 2021/06/21 11:56:24, Gosh Arzumanyan <
> gosh...@gmail.com
> > > > <mailto:
> > > > >>> >> gosharz%40gmail.com>> wrote:
> > > > >>> >> > > > Hi folks,
> > > > >>> >> > > >
> > > > >>> >> > > > Started putting some thoughts together here:
> > > > >>> >> > > >
> > > > >>> >> > >
> > > > >>> >>
> > > >
> > >
> >
> https://docs.google.com/document/d/1dIOpKNYwsd9sdChsRBAx37BiJXl_7enpwWkH76n1tOI/edit?usp=sharing
> > > > >>> >> > > > Any feedback is welcome!
> > > > >>> >> > > >
> > > > >>> >> > > > Cheers,
> > > > >>> >> > > > Gosh
> > > > >>> >> > > >
> > > > >>> >> > >
> > > > >>> >> >
> > > > >>> >>
> > > > >>> >
> > > > >>>
> > > > >>
> > > >
> > >
> > >
> > > --
> > >
> > >
> > >
> >
>

Reply via email to