+1

In <421fbc7b-f441-4b0a-8626-a8d2dfff0...@app.fastmail.com>
  "[VOTE] Flight RPC: add 'fallback' URI scheme" on Tue, 27 Feb 2024 09:01:36 
-0500,
  "David Li" <lidav...@apache.org> wrote:

> I would like to propose a 'reuse connection' URI scheme for Flight RPC. This 
> proposal was previously discussed at [1]. A candidate implementation for C++, 
> Java, and Go is at [2].
> 
> The vote will be open for at least 72 hours.
> 
> [ ] +1 
> [ ] +0
> [ ] -1 Do not accept this proposal because...
> 
> [1]: https://lists.apache.org/thread/pc9fs0hf8t5ylj9os00r9vg8d2xv2npz
> [2]: https://github.com/apache/arrow/pull/40084
> 
> On Tue, Feb 20, 2024, at 14:14, David Li wrote:
>> Thanks for the comments - I've updated the implementation [1] and added 
>> Go + integration tests. If this all checks out I'd like to start a vote 
>> soon.
>>
>> [1]: https://github.com/apache/arrow/pull/40084
>>
>> On Fri, Feb 16, 2024, at 13:43, Andrew Lamb wrote:
>>> Thank you -- I think the usecase is great, but agree with the other
>>> reviewers that the name may be confusing. I left some notes on the ticket
>>>
>>> Andrew
>>>
>>> On Wed, Feb 14, 2024 at 3:52 PM David Li <lidav...@apache.org> wrote:
>>>
>>>> I've put up a candidate implementation sans integration test [1].
>>>>
>>>> Some caveats:
>>>> - java.net.URI doesn't accept 'scheme://', only 'scheme:/' or 'scheme://?'
>>>> (yes, an empty query string pacifies it). I've chosen the latter since the
>>>> former is technically a URI with a non-empty path, but neither is ideal.
>>>> - I've changed the scheme to 'arrow-flight-reuse-connection' to be more
>>>> faithful to the intended use than 'fallback'.
>>>>
>>>> [1]: https://github.com/apache/arrow/pull/40084
>>>>
>>>> On Tue, Feb 13, 2024, at 13:01, Jean-Baptiste Onofré wrote:
>>>> > Hi David,
>>>> >
>>>> > It's reasonable. I think we can start with your initial proposal (it
>>>> > sounds fine to me) and we can always improve step by step.
>>>> >
>>>> > Thanks !
>>>> > Regards
>>>> > JB
>>>> >
>>>> > On Tue, Feb 13, 2024 at 4:53 PM David Li <lidav...@apache.org> wrote:
>>>> >>
>>>> >> I'm going to keep the proposal as-is then. It can be extended if this
>>>> >> use case comes up.
>>>> >>
>>>> >> I'll start work on candidate implementations now.
>>>> >>
>>>> >> On Tue, Feb 13, 2024, at 03:22, Antoine Pitrou wrote:
>>>> >> > I think the original proposal is sufficient.
>>>> >> >
>>>> >> > Also, it is not obvious to me how one would switch from e.g. grpc+tls
>>>> >> > to http without an explicit server location (unless both Flight servers
>>>> >> > are hosted under the same port?). So the "+" proposal seems a bit weird.
>>>> >> >
>>>> >> >
>>>> >> > On 12/02/2024 at 23:39, David Li wrote:
>>>> >> >> The idea is that the client would reuse the existing connection, in
>>>> >> >> which case the protocol and such are implicit. (If the client doesn't
>>>> >> >> have a connection anymore, it can't use the fallback anyways.)
>>>> >> >>
>>>> >> >> I suppose this has the advantage that you could "fall back" to a
>>>> >> >> known hostname with a different protocol, but I'm not sure that always
>>>> >> >> applies anyways. (Correct me if I'm wrong Matt, but as I recall, UCX
>>>> >> >> addresses aren't hostnames but rather opaque byte blobs, for instance.)
>>>> >> >>
>>>> >> >> If we do prefer this, to avoid overloading the hostname, there's
>>>> >> >> also the informal convention of using + in the scheme, so it could be
>>>> >> >> arrow-flight-fallback+grpc+tls://, arrow-flight-fallback+http://, etc.
>>>> >> >>
>>>> >> >> On Mon, Feb 12, 2024, at 17:03, Joel Lubinitsky wrote:
>>>> >> >>> Thanks for clarifying.
>>>> >> >>>
>>>> >> >>> Given the relationship between these two proposals, would it also be
>>>> >> >>> necessary to distinguish the scheme (or schemes) supported by the
>>>> >> >>> originating Flight RPC service?
>>>> >> >>>
>>>> >> >>> If that is the case, it may be preferred to use the "host" portion
>>>> >> >>> of the URI rather than the "scheme" to denote the location of the
>>>> >> >>> data. In this scenario, the host "0.0.0.0" could be used. This IP
>>>> >> >>> address is defined in IETF RFC1122 [1] as "This host on this network",
>>>> >> >>> which seems most consistent with the intended use-case. There are some
>>>> >> >>> caveats to this usage but in my experience it's not uncommon for
>>>> >> >>> protocols to extend the definition of this address in their own usage.
>>>> >> >>>
>>>> >> >>> A benefit of this convention is that the scheme remains available in
>>>> >> >>> the URI to specify the transport available. For example, the following
>>>> >> >>> list of locations may be included in the response:
>>>> >> >>>
>>>> >> >>> ["grpc://0.0.0.0", "ucx://0.0.0.0", "grpc://1.2.3.4", <other_locations>...]
>>>> >> >>>
>>>> >> >>> This would indicate that grpc and ucx transport is available from the
>>>> >> >>> current service, grpc is available at 1.2.3.4, and possibly more
>>>> >> >>> combinations of scheme/host.
>>>> >> >>>
>>>> >> >>> [1] https://datatracker.ietf.org/doc/html/rfc1122#section-3.2.1.3
>>>> >> >>>
>>>> >> >>> On Mon, Feb 12, 2024 at 2:53 PM David Li <lidav...@apache.org> wrote:
>>>> >> >>>
>>>> >> >>>> Ah, while I was thinking of it as useful for a fallback, I'm not
>>>> >> >>>> specifying it that way.  Better ideas for names would be appreciated.
>>>> >> >>>>
>>>> >> >>>> The actual precedence has never been specified. All endpoints are
>>>> >> >>>> equivalent, so clients may use what is "best". For instance, with Matt
>>>> >> >>>> Topol's concurrent proposal, a GPU-enabled client may preferentially
>>>> >> >>>> try UCX endpoints while other clients may choose to ignore them
>>>> >> >>>> entirely (e.g. because they don't have UCX installed).
>>>> >> >>>>
>>>> >> >>>> In practice the ADBC/JDBC drivers just scan the list left to right and
>>>> >> >>>> try each endpoint in turn for lack of a better heuristic.
>>>> >> >>>>
>>>> >> >>>> On Mon, Feb 12, 2024, at 14:28, Joel Lubinitsky wrote:
>>>> >> >>>>> Thanks for proposing this David.
>>>> >> >>>>>
>>>> >> >>>>> I think the ability to include the Flight RPC service itself in the
>>>> >> >>>>> list of endpoints from which data can be fetched is a helpful addition.
>>>> >> >>>>>
>>>> >> >>>>> The current choice of name for the URI (arrow-flight-fallback://) seems
>>>> >> >>>>> to imply that there is an order of precedence that should be considered
>>>> >> >>>>> in the list of URIs. Specifically, as a developer receiving the list of
>>>> >> >>>>> locations I might assume that I should try fetching from other locations
>>>> >> >>>>> first. If those do not succeed, I may try the original service as a
>>>> >> >>>>> fallback.
>>>> >> >>>>>
>>>> >> >>>>> Are these the intended semantics? If so, is there a way to include the
>>>> >> >>>>> original service in the list of locations without the implied
>>>> >> >>>>> precedence?
>>>> >> >>>>>
>>>> >> >>>>> Thanks,
>>>> >> >>>>> Joel
>>>> >> >>>>>
>>>> >> >>>>> On Mon, Feb 12, 2024 at 11:52 James Duong
>>>> >> >>>>> <james.du...@improving.com.invalid> wrote:
>>>> >> >>>>>
>>>> >> >>>>>> This seems like a good idea, and also improves consistency with
>>>> >> >>>>>> clients that erroneously assumed that the service endpoint was always
>>>> >> >>>>>> in the list of endpoints.
>>>> >> >>>>>>
>>>> >> >>>>>> From: Antoine Pitrou <anto...@python.org>
>>>> >> >>>>>> Date: Monday, February 12, 2024 at 6:05 AM
>>>> >> >>>>>> To: dev@arrow.apache.org <dev@arrow.apache.org>
>>>> >> >>>>>> Subject: Re: [DISCUSS] Flight RPC: add 'fallback' URI scheme
>>>> >> >>>>>>
>>>> >> >>>>>> Hello,
>>>> >> >>>>>>
>>>> >> >>>>>> This looks fine to me.
>>>> >> >>>>>>
>>>> >> >>>>>> Regards
>>>> >> >>>>>>
>>>> >> >>>>>> Antoine.
>>>> >> >>>>>>
>>>> >> >>>>>>
>>>> >> >>>>>> On 12/02/2024 at 14:46, David Li wrote:
>>>> >> >>>>>>> Hello,
>>>> >> >>>>>>>
>>>> >> >>>>>>> I'd like to propose a slight update to Flight RPC to make Flight SQL
>>>> >> >>>>>>> work better in different deployment scenarios.  Comments on the doc
>>>> >> >>>>>>> would be appreciated:
>>>> >> >>>>>>>
>>>> >> >>>>>>> https://docs.google.com/document/d/1g9M9FmsZhkewlT1mLibuceQO8ugI0-fqumVAXKFjVGg/edit?usp=sharing
>>>> >> >>>>>>>
>>>> >> >>>>>>> The gist is that FlightEndpoint allows specifying either (1) a list of
>>>> >> >>>>>>> concrete URIs to fetch data from or (2) no URIs, meaning to fetch from
>>>> >> >>>>>>> the Flight RPC service itself; but it would be useful to combine both
>>>> >> >>>>>>> behaviors (try these concrete URIs and fall back to the Flight RPC
>>>> >> >>>>>>> service itself) without requiring the service to know its own public
>>>> >> >>>>>>> address.
>>>> >> >>>>>>>
>>>> >> >>>>>>> Best,
>>>> >> >>>>>>> David
>>>> >> >>>>>>
>>>> >> >>>>
>>>>
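
For readers of the archive, the client-side behavior discussed in this
thread can be sketched roughly as follows. This is only an illustration in
Python, not the candidate implementation from the pull request:
pick_location and its return values are hypothetical names, and the
left-to-right scan simply mirrors the heuristic David describes for the
ADBC/JDBC drivers. The scheme name is the one David settled on in the
thread.

```python
from urllib.parse import urlsplit

# Scheme name taken from the thread; everything else here is hypothetical.
REUSE_SCHEME = "arrow-flight-reuse-connection"


def pick_location(locations, supported_schemes):
    """Scan locations left to right and return the first usable choice.

    Returns ("reuse", None) when data should be fetched over the connection
    the client already has, ("connect", uri) for a concrete URI the client
    can dial, or None if nothing in the list is usable.
    """
    if not locations:
        # No URIs at all: fetch from the Flight RPC service itself.
        return ("reuse", None)
    for uri in locations:
        scheme = urlsplit(uri).scheme
        if scheme == REUSE_SCHEME:
            # The service listed itself without knowing its public address.
            return ("reuse", None)
        if scheme in supported_schemes:
            return ("connect", uri)
        # Unsupported transport (e.g. UCX not installed): skip it.
    return None
```

A client that only speaks gRPC would call something like
pick_location(endpoint_locations, {"grpc", "grpc+tls"}) and skip any UCX
entries. A client with better knowledge of its environment is free to
choose differently, since the thread notes that precedence among
locations has never been specified.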
