>
> The operation flow would be like this, or what would it look like?
> Client ---> GetFlightInfo (query/update operation in payload) ---> Server
> ---> Results (non-streamed)


This is roughly the flow I was imagining if the server chooses to send back
inlined data.

-Micah

On Tue, Mar 8, 2022 at 11:27 AM Gavin Ray <ray.gavi...@gmail.com> wrote:

> Thank you for doing this, left a few questions on the GH issue
>
> I would adopt this proposal as soon as it makes it into nightlies
> (or possibly earlier if it's just a matter of regenerating the proto
> definitions)
>
> The operation flow would be like this, or what would it look like?
>
> Client ---> GetFlightInfo (query/update operation in payload) ---> Server
> ---> Results (non-streamed)
>
>
>
>
> On Tue, Mar 8, 2022 at 2:04 PM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
>
>> Some people have already left comments on
>> https://github.com/apache/arrow/pull/12571  More eyes on it would be
>> appreciated.  If there aren't more comments, I'll try to start
>> implementing
>> this feature in Flight next week, and hopefully have a vote after it is
>> supported in Java and C++/Python.
>>
>>
>> Thanks,
>> Micah
>>
>> On Fri, Mar 4, 2022 at 10:54 PM Micah Kornfield <emkornfi...@gmail.com>
>> wrote:
>>
>> > I put together straw-man proposal in PR [1] for the Flight changes.
>> > Ultimately, it seemed based on the use-cases discussed inlining the
>> data on
>> > the Ticket made the most sense.  This might be overly complex (I'm not
>> sure
>> > how I feel about a enum indicating partial vs full results) but welcome
>> > feedback.  Once we get consensus on this proposal, I can add changes to
>> > Flight SQL and try to provide reference implementations.
>> >
>> > [1] https://github.com/apache/arrow/pull/12571
>> >
>> > On Tue, Mar 1, 2022 at 10:51 PM Micah Kornfield <emkornfi...@gmail.com>
>> > wrote:
>> >
>> >> Would it make sense to make this part of DoGet since it
>> >>> still would be returning a record batch
>> >>
>> >> I would lean against this. I think in many cases the client doesn't
>> know
>> >> the size of the data that it expects.  Leaving the flexibility on the
>> >> server side to send back inlined data when it thinks it makes sense,
>> or a
>> >> bunch of tickets when there is in fact a lot of data seems like the
>> best
>> >> option here.
>> >>
>> >> For cases like previewing data, you usually just want to get a small
>> >>> amount
>> >>> of data quickly.
>> >>
>> >> This is interesting and might be an additional use case.  If we did
>> >> decide to extend FlightInfo we might also want a way of annotating
>> inlined
>> >> data with its corresponding ticket.  That way even for large results,
>> you
>> >> could still send back a small preview if desired.
>> >>
>> >> After considering it a little bit I think I'm sold that inlined data
>> >> should not replace a ticket.  So in my mind the open question is
>> whether
>> >> the client needs to actively opt-in to inlined data.  The scenarios I
>> could
>> >> come with where inlined data isn't useful are:
>> >> 1.  The client is an old client and isn't aware inline data might be
>> >> returned.  In this case the main cost is of extra data on the wire and
>> >> storing it as unknown fields [1].
>> >> 2.  The client is a new client but still doesn't want to get inline
>> data
>> >> (it might want to distribute all consumption to other processes).  Same
>> >> cost is paid as option 1.
>> >>
>> >> Are there other scenarios?  If servers choose reasonable limits on what
>> >> data to inline, the extra complexity of negotiating with the client in
>> this
>> >> case might not be worth the benefits.
>> >>
>> >> Cheers,
>> >> Micah
>> >>
>> >>
>> >> [1]
>> https://developers.google.com/protocol-buffers/docs/proto3#unknowns
>> >>
>> >> On Tue, Mar 1, 2022 at 10:01 PM Bryan Cutler <cutl...@gmail.com>
>> wrote:
>> >>
>> >>> I think this would be a useful feature and be nice to have in Flight
>> >>> core.
>> >>> For cases like previewing data, you usually just want to get a small
>> >>> amount
>> >>> of data quickly. Would it make sense to make this part of DoGet since
>> it
>> >>> still would be returning a record batch? Perhaps a Ticket could be
>> made
>> >>> to
>> >>> have an optional FlightDescriptor that would serve as an all-in-one
>> shot?
>> >>>
>> >>> On Tue, Mar 1, 2022 at 8:44 AM David Li <lidav...@apache.org> wrote:
>> >>>
>> >>> > I agree with something along Antoine's proposal, though: maybe we
>> >>> should
>> >>> > be more structured with the flags (akin to what Micah mentioned with
>> >>> the
>> >>> > Feature enum).
>> >>> >
>> >>> > Also, the flag could be embedded into the Flight SQL messages
>> instead.
>> >>> (So
>> >>> > in effect, Flight would only add the capability to return data with
>> >>> > FlightInfo, and it's up to applications, like Flight SQL, to decide
>> how
>> >>> > they want to take advantage of that.)
>> >>> >
>> >>> > I think having a completely separate method and return type and
>> having
>> >>> to
>> >>> > poll for it beforehand somewhat defeats the purpose of having
>> it/would
>> >>> be
>> >>> > much harder of a transition.
>> >>> >
>> >>> > Also: it should be `repeated FlightInfo inline_data` right? In case
>> we
>> >>> > also need dictionary batches?
>> >>> >
>> >>> > On Tue, Mar 1, 2022, at 11:39, Antoine Pitrou wrote:
>> >>> > > Can we just add the following field to the FlightDescriptor
>> message:
>> >>> > >
>> >>> > >   bool accept_inline_data = 4;
>> >>> > >
>> >>> > > and this one to the FlightInfo message:
>> >>> > >
>> >>> > >   FlightData inline_data = 100;
>> >>> > >
>> >>> > > Then new clients can `accept_inline_data` to true (the default
>> being
>> >>> > > false if omitted) to signal servers that they can put the data if
>> >>> > > `inline_data` if deemed small enough.
>> >>> > >
>> >>> > > (the `accept_inline_data` field could also be used to the Criteria
>> >>> > > message)
>> >>> > >
>> >>> > >
>> >>> > > Alternatively, if the FlightDescriptor expansion looks a bit dirty
>> >>> > > (FlightDescriptor being used in other contexts where
>> >>> > > `accept_inline_data` makes no sense), we can instead define a new
>> >>> > > method:
>> >>> > >
>> >>> > >   rpc GetFlightInfoEx(GetFlightInfoRequest) returns (FlightInfo)
>> {}
>> >>> > >
>> >>> > > with:
>> >>> > >
>> >>> > > message GetFlightInfoRequest {
>> >>> > >   FlightDescriptor flight_descriptor = 1;
>> >>> > >   bool accept_inline_data = 2;
>> >>> > > }
>> >>> > >
>> >>> > > Regards
>> >>> > >
>> >>> > > Antoine.
>> >>> > >
>> >>> > >
>> >>> > > On Mon, 28 Feb 2022 11:29:12 -0800
>> >>> > > James Duong <jam...@bitquilltech.com.INVALID> wrote:
>> >>> > >> This seems reasonable, however we need to account for existing
>> >>> Flight
>> >>> > >> clients that were written before this.
>> >>> > >>
>> >>> > >> It seems like the server will need to still handle the ticket
>> >>> returned
>> >>> > for
>> >>> > >> getStream() for clients that are unaware of the small result
>> >>> > optimization.
>> >>> > >>
>> >>> > >> On Mon, Feb 28, 2022 at 11:26 AM David Li <lidav...@apache.org>
>> >>> wrote:
>> >>> > >>
>> >>> > >> > Ah, that makes more sense, that would be a reasonable
>> extension to
>> >>> > Flight
>> >>> > >> > overall. (While we're at it, I think it would help to have an
>> >>> > app_metadata
>> >>> > >> > field in FlightInfo as well.)
>> >>> > >> >
>> >>> > >> > On Mon, Feb 28, 2022, at 14:24, Micah Kornfield wrote:
>> >>> > >> > >>
>> >>> > >> > >> But it seems reasonable to add a one-shot query path using
>> >>> DoGet.
>> >>> > >> > >
>> >>> > >> > >
>> >>> > >> > > I was thinking more of adding a bytes field to FlightInfo
>> that
>> >>> > could
>> >>> > >> > store
>> >>> > >> > > arrow data.  That way GetFlightInfo would be the only RPC
>> >>> necessary
>> >>> > for
>> >>> > >> > > small results when executing a CMD.  The client doesn't
>> >>> necessarily
>> >>> > know
>> >>> > >> > > whether a query will return large or small results.
>> >>> > >> > >
>> >>> > >> > > On Mon, Feb 28, 2022 at 11:04 AM David Li <
>> lidav...@apache.org>
>> >>> > wrote:
>> >>> > >> > >
>> >>> > >> > >> I think the focus was on large result sets (though I don't
>> >>> recall
>> >>> > this
>> >>> > >> > >> being discussed before) and supporting multi-node setups
>> (hence
>> >>> > >> > >> GetFlightInfo/DoGet are separated). But it seems reasonable
>> to
>> >>> add
>> >>> > a
>> >>> > >> > >> one-shot query path using DoGet.
>> >>> > >> > >>
>> >>> > >> > >> On Mon, Feb 28, 2022, at 13:32, Adam Lippai wrote:
>> >>> > >> > >> > I saw the same. A small, stateless query ability would be
>> >>> nice
>> >>> > >> > >> (connection
>> >>> > >> > >> > open, initialization, query in one message, the resultset
>> in
>> >>> > the
>> >>> > >> > response
>> >>> > >> > >> > in one message)
>> >>> > >> > >> >
>> >>> > >> > >> > On Mon, Feb 28, 2022, 13:12 Micah Kornfield <
>> >>> > emkornfi...@gmail.com>
>> >>> > >> > >> wrote:
>> >>> > >> > >> >
>> >>> > >> > >> >> I'm rereviewing the Flight SQL interfaces, and I'm not
>> sure
>> >>> if
>> >>> > I'm
>> >>> > >> > >> missing
>> >>> > >> > >> >> it but is there any optimization for small results?  My
>> >>> concern
>> >>> > is
>> >>> > >> > that
>> >>> > >> > >> the
>> >>> > >> > >> >> overhead of the RPCs for the DoGet after executing the
>> query
>> >>> > could
>> >>> > >> > add
>> >>> > >> > >> >> non-trivial latency for smaller results.
>> >>> > >> > >> >>
>> >>> > >> > >> >> Has anybody else thought about this/investigated it?  Am
>> I
>> >>> > >> > understanding
>> >>> > >> > >> >> this correctly?
>> >>> > >> > >> >>
>> >>> > >> > >> >> Thanks,
>> >>> > >> > >> >> Micah
>> >>> > >> > >> >>
>> >>> > >> > >>
>> >>> > >> >
>> >>> > >>
>> >>> > >>
>> >>> >
>> >>>
>> >>
>>
>

Reply via email to