+1

In <421fbc7b-f441-4b0a-8626-a8d2dfff0...@app.fastmail.com> "[VOTE] Flight RPC: add 'fallback' URI scheme" on Tue, 27 Feb 2024 09:01:36 -0500, "David Li" <lidav...@apache.org> wrote:
> I would like to propose a 'reuse connection' URI scheme for Flight RPC. This
> proposal was previously discussed at [1]. A candidate implementation for C++,
> Java, and Go is at [2].
>
> The vote will be open for at least 72 hours.
>
> [ ] +1
> [ ] +0
> [ ] -1 Do not accept this proposal because...
>
> [1]: https://lists.apache.org/thread/pc9fs0hf8t5ylj9os00r9vg8d2xv2npz
> [2]: https://github.com/apache/arrow/pull/40084
>
> On Tue, Feb 20, 2024, at 14:14, David Li wrote:
>> Thanks for the comments - I've updated the implementation [1] and added
>> Go + integration tests. If this all checks out I'd like to start a vote
>> soon.
>>
>> [1]: https://github.com/apache/arrow/pull/40084
>>
>> On Fri, Feb 16, 2024, at 13:43, Andrew Lamb wrote:
>>> Thank you -- I think the usecase is great, but agree with the other
>>> reviewers that the name may be confusing. I left some notes on the ticket
>>>
>>> Andrew
>>>
>>> On Wed, Feb 14, 2024 at 3:52 PM David Li <lidav...@apache.org> wrote:
>>>
>>>> I've put up a candidate implementation sans integration test [1].
>>>>
>>>> Some caveats:
>>>> - java.net.URI doesn't accept 'scheme://', only 'scheme:/' or 'scheme://?'
>>>>   (yes, an empty query string pacifies it). I've chosen the latter since the
>>>>   former is technically a URI with a non-empty path but neither are ideal.
>>>> - I've changed the scheme to 'arrow-flight-reuse-connection' to be more
>>>>   faithful to the intended use than 'fallback'.
>>>>
>>>> [1]: https://github.com/apache/arrow/pull/40084
>>>>
>>>> On Tue, Feb 13, 2024, at 13:01, Jean-Baptiste Onofré wrote:
>>>> > Hi David,
>>>> >
>>>> > It's reasonable. I think we can start with your initial proposal (it
>>>> > sounds fine to me) and we can always improve step by step.
>>>> >
>>>> > Thanks !
>>>> > Regards
>>>> > JB
>>>> >
>>>> > On Tue, Feb 13, 2024 at 4:53 PM David Li <lidav...@apache.org> wrote:
>>>> >>
>>>> >> I'm going to keep the proposal as-is then. It can be extended if this
>>>> >> use case comes up.
>>>> >>
>>>> >> I'll start work on candidate implementations now.
>>>> >>
>>>> >> On Tue, Feb 13, 2024, at 03:22, Antoine Pitrou wrote:
>>>> >> > I think the original proposal is sufficient.
>>>> >> >
>>>> >> > Also, it is not obvious to me how one would switch from e.g. grpc+tls to
>>>> >> > http without an explicit server location (unless both Flight servers are
>>>> >> > hosted under the same port?). So the "+" proposal seems a bit weird.
>>>> >> >
>>>> >> >
>>>> >> > On 12 Feb 2024 at 23:39, David Li wrote:
>>>> >> >> The idea is that the client would reuse the existing connection, in
>>>> >> >> which case the protocol and such are implicit. (If the client doesn't
>>>> >> >> have a connection anymore, it can't use the fallback anyways.)
>>>> >> >>
>>>> >> >> I suppose this has the advantage that you could "fall back" to a known
>>>> >> >> hostname with a different protocol, but I'm not sure that always applies
>>>> >> >> anyways. (Correct me if I'm wrong Matt, but as I recall, UCX addresses
>>>> >> >> aren't hostnames but rather opaque byte blobs, for instance.)
>>>> >> >>
>>>> >> >> If we do prefer this, to avoid overloading the hostname, there's also
>>>> >> >> the informal convention of using + in the scheme, so it could be
>>>> >> >> arrow-flight-fallback+grpc+tls://, arrow-flight-fallback+http://, etc.
>>>> >> >>
>>>> >> >> On Mon, Feb 12, 2024, at 17:03, Joel Lubinitsky wrote:
>>>> >> >>> Thanks for clarifying.
>>>> >> >>>
>>>> >> >>> Given the relationship between these two proposals, would it also be
>>>> >> >>> necessary to distinguish the scheme (or schemes) supported by the
>>>> >> >>> originating Flight RPC service?
>>>> >> >>>
>>>> >> >>> If that is the case, it may be preferred to use the "host" portion of the
>>>> >> >>> URI rather than the "scheme" to denote the location of the data. In this
>>>> >> >>> scenario, the host "0.0.0.0" could be used.
>>>> >> >>> This IP address is defined in
>>>> >> >>> IETF RFC1122 [1] as "This host on this network", which seems most
>>>> >> >>> consistent with the intended use-case. There are some caveats to this usage
>>>> >> >>> but in my experience it's not uncommon for protocols to extend the
>>>> >> >>> definition of this address in their own usage.
>>>> >> >>>
>>>> >> >>> A benefit of this convention is that the scheme remains available in the
>>>> >> >>> URI to specify the transport available. For example, the following list of
>>>> >> >>> locations may be included in the response:
>>>> >> >>>
>>>> >> >>> ["grpc://0.0.0.0", "ucx://0.0.0.0", "grpc://1.2.3.4", <other_locations>...]
>>>> >> >>>
>>>> >> >>> This would indicate that grpc and ucx transport is available from the
>>>> >> >>> current service, grpc is available at 1.2.3.4, and possibly more
>>>> >> >>> combinations of scheme/host.
>>>> >> >>>
>>>> >> >>> [1] https://datatracker.ietf.org/doc/html/rfc1122#section-3.2.1.3
>>>> >> >>>
>>>> >> >>> On Mon, Feb 12, 2024 at 2:53 PM David Li <lidav...@apache.org> wrote:
>>>> >> >>>
>>>> >> >>>> Ah, while I was thinking of it as useful for a fallback, I'm not
>>>> >> >>>> specifying it that way. Better ideas for names would be appreciated.
>>>> >> >>>>
>>>> >> >>>> The actual precedence has never been specified. All endpoints are
>>>> >> >>>> equivalent, so clients may use what is "best". For instance, with Matt
>>>> >> >>>> Topol's concurrent proposal, a GPU-enabled client may preferentially try
>>>> >> >>>> UCX endpoints while other clients may choose to ignore them entirely (e.g.
>>>> >> >>>> because they don't have UCX installed).
>>>> >> >>>>
>>>> >> >>>> In practice the ADBC/JDBC drivers just scan the list left to right and try
>>>> >> >>>> each endpoint in turn for lack of a better heuristic.
>>>> >> >>>>
>>>> >> >>>> On Mon, Feb 12, 2024, at 14:28, Joel Lubinitsky wrote:
>>>> >> >>>>> Thanks for proposing this David.
>>>> >> >>>>>
>>>> >> >>>>> I think the ability to include the Flight RPC service itself in the list of
>>>> >> >>>>> endpoints from which data can be fetched is a helpful addition.
>>>> >> >>>>>
>>>> >> >>>>> The current choice of name for the URI (arrow-flight-fallback://) seems to
>>>> >> >>>>> imply that there is an order of precedence that should be considered in the
>>>> >> >>>>> list of URI’s. Specifically, as a developer receiving the list of locations
>>>> >> >>>>> I might assume that I should try fetching from other locations first. If
>>>> >> >>>>> those do not succeed, I may try the original service as a fallback.
>>>> >> >>>>>
>>>> >> >>>>> Are these the intended semantics? If so, is there a way to include the
>>>> >> >>>>> original service in the list of locations without the implied precedence?
>>>> >> >>>>>
>>>> >> >>>>> Thanks,
>>>> >> >>>>> Joel
>>>> >> >>>>>
>>>> >> >>>>> On Mon, Feb 12, 2024 at 11:52 James Duong <james.du...@improving.com.invalid>
>>>> >> >>>>> wrote:
>>>> >> >>>>>
>>>> >> >>>>>> This seems like a good idea, and also improves consistency with clients
>>>> >> >>>>>> that erroneously assumed that the service endpoint was always in the list
>>>> >> >>>>>> of endpoints.
>>>> >> >>>>>>
>>>> >> >>>>>> From: Antoine Pitrou <anto...@python.org>
>>>> >> >>>>>> Date: Monday, February 12, 2024 at 6:05 AM
>>>> >> >>>>>> To: dev@arrow.apache.org <dev@arrow.apache.org>
>>>> >> >>>>>> Subject: Re: [DISCUSS] Flight RPC: add 'fallback' URI scheme
>>>> >> >>>>>>
>>>> >> >>>>>> Hello,
>>>> >> >>>>>>
>>>> >> >>>>>> This looks fine to me.
>>>> >> >>>>>>
>>>> >> >>>>>> Regards
>>>> >> >>>>>>
>>>> >> >>>>>> Antoine.
>>>> >> >>>>>>
>>>> >> >>>>>>
>>>> >> >>>>>> On 12 Feb 2024 at 14:46, David Li wrote:
>>>> >> >>>>>>> Hello,
>>>> >> >>>>>>>
>>>> >> >>>>>>> I'd like to propose a slight update to Flight RPC to make Flight SQL
>>>> >> >>>>>>> work better in different deployment scenarios. Comments on the doc
>>>> >> >>>>>>> would be appreciated:
>>>> >> >>>>>>>
>>>> >> >>>>>>> https://docs.google.com/document/d/1g9M9FmsZhkewlT1mLibuceQO8ugI0-fqumVAXKFjVGg/edit?usp=sharing
>>>> >> >>>>>>>
>>>> >> >>>>>>> The gist is that FlightEndpoint allows specifying either (1) a list of
>>>> >> >>>>>>> concrete URIs to fetch data from or (2) no URIs, meaning to fetch from
>>>> >> >>>>>>> the Flight RPC service itself; but it would be useful to combine both
>>>> >> >>>>>>> behaviors (try these concrete URIs and fall back to the Flight RPC
>>>> >> >>>>>>> service itself) without requiring the service to know its own public
>>>> >> >>>>>>> address.
>>>> >> >>>>>>>
>>>> >> >>>>>>> Best,
>>>> >> >>>>>>> David
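
For readers skimming the thread, the location semantics discussed above (no URIs, concrete URIs, and the 'arrow-flight-reuse-connection' scheme) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the candidate implementation from the pull request: the function name `plan_attempts`, the use of `None` to mean "reuse the existing connection", and the example addresses are all hypothetical. Only the scheme name and the left-to-right scan come from the messages above.

```python
from urllib.parse import urlsplit

# Scheme name as given in the thread; every other name here is hypothetical.
REUSE_SCHEME = "arrow-flight-reuse-connection"


def plan_attempts(locations, supported_schemes):
    """Return the locations to try, in order.

    None stands for "fetch over the connection the client already has".
    """
    if not locations:
        # No URIs at all: fetch from the Flight RPC service itself.
        return [None]
    attempts = []
    # Scan left to right, as the thread says the ADBC/JDBC drivers do.
    for uri in locations:
        scheme = urlsplit(uri).scheme
        if scheme == REUSE_SCHEME:
            attempts.append(None)
        elif scheme in supported_schemes:
            attempts.append(uri)
        # Transports the client cannot use (e.g. UCX when it is not
        # installed) are skipped.
    return attempts


if __name__ == "__main__":
    # Example addresses are made up for illustration.
    locations = [
        "ucx://10.0.0.1:1234",
        "grpc://1.2.3.4:8815",
        "arrow-flight-reuse-connection://?",
    ]
    print(plan_attempts(locations, {"grpc", "grpc+tls"}))
```

Since the thread notes that precedence among endpoints has never been specified, a real client is free to reorder the result (for example, preferring UCX when it is available); the sketch simply keeps list order for lack of a better heuristic.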