I've put up a candidate implementation sans integration test [1].

Some caveats:
- java.net.URI doesn't accept 'scheme://', only 'scheme:/' or 'scheme://?'
(yes, an empty query string pacifies it). I've chosen the latter since the
former is technically a URI with a non-empty path, but neither is ideal.
- I've changed the scheme to 'arrow-flight-reuse-connection' to be more 
faithful to the intended use than 'fallback'.

[1]: https://github.com/apache/arrow/pull/40084
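For reference, here is a minimal Java sketch of the java.net.URI behavior described in the first caveat. The class and helper names (ReuseConnectionUri, parses) are illustrative only; the three spellings are the ones discussed above.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Illustrative check of which spellings of the special location java.net.URI accepts.
final class ReuseConnectionUri {
    // Scheme name as chosen in the message above.
    static final String SCHEME = "arrow-flight-reuse-connection";

    /** Returns true if java.net.URI parses the string without throwing. */
    static boolean parses(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) throws URISyntaxException {
        // "scheme://" is rejected: "//" must be followed by an authority,
        // a path, a query, or a fragment.
        System.out.println(parses(SCHEME + "://")); // prints false

        // "scheme:/" parses, but as a URI whose path is "/".
        System.out.println(new URI(SCHEME + ":/").getPath()); // prints /

        // "scheme://?" parses with an empty path and an empty query string.
        URI u = new URI(SCHEME + "://?");
        System.out.println(u.getPath().isEmpty() + " " + u.getQuery().isEmpty()); // prints true true
    }
}
```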

On Tue, Feb 13, 2024, at 13:01, Jean-Baptiste Onofré wrote:
> Hi David,
>
> It's reasonable. I think we can start with your initial proposal (it
> sounds fine to me) and we can always improve step by step.
>
> Thanks!
> Regards
> JB
>
> On Tue, Feb 13, 2024 at 4:53 PM David Li <lidav...@apache.org> wrote:
>>
>> I'm going to keep the proposal as-is then. It can be extended if this use 
>> case comes up.
>>
>> I'll start work on candidate implementations now.
>>
>> On Tue, Feb 13, 2024, at 03:22, Antoine Pitrou wrote:
>> > I think the original proposal is sufficient.
>> >
>> > Also, it is not obvious to me how one would switch from e.g. grpc+tls to
>> > http without an explicit server location (unless both Flight servers are
>> > hosted under the same port?). So the "+" proposal seems a bit weird.
>> >
>> >
>> >> > On 12/02/2024 at 23:39, David Li wrote:
>> >> >> The idea is that the client would reuse the existing connection, in which
>> >> >> case the protocol and such are implicit. (If the client doesn't have a
>> >> >> connection anymore, it can't use the fallback anyway.)
>> >> >>
>> >> >> I suppose this has the advantage that you could "fall back" to a known
>> >> >> hostname with a different protocol, but I'm not sure that always applies
>> >> >> anyway. (Correct me if I'm wrong, Matt, but as I recall, UCX addresses
>> >> >> aren't hostnames but rather opaque byte blobs, for instance.)
>> >>
>> >> If we do prefer this, to avoid overloading the hostname, there's also the 
>> >> informal convention of using + in the scheme, so it could be 
>> >> arrow-flight-fallback+grpc+tls://, arrow-flight-fallback+http://, etc.
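A rough sketch of how a client could read the "+"-suffixed scheme floated here, assuming the fallback marker is always the leading "+"-separated component. CompoundScheme and fallbackTransport are hypothetical names, not Arrow APIs, and the thread ultimately kept the original proposal instead. The examples end in "://?" because java.net.URI rejects a bare "scheme://".

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Optional;

// Hypothetical helper for the "+"-in-scheme convention (e.g. arrow-flight-fallback+grpc+tls://).
// Not part of any Arrow API.
final class CompoundScheme {
    static final String PREFIX = "arrow-flight-fallback+";

    /**
     * If the location's scheme starts with the fallback prefix, returns the
     * remaining transport part (e.g. "grpc+tls"); otherwise returns empty.
     */
    static Optional<String> fallbackTransport(String location) {
        try {
            String scheme = new URI(location).getScheme();
            if (scheme != null && scheme.startsWith(PREFIX) && scheme.length() > PREFIX.length()) {
                return Optional.of(scheme.substring(PREFIX.length()));
            }
            return Optional.empty();
        } catch (URISyntaxException e) {
            return Optional.empty(); // not a parseable URI at all
        }
    }

    public static void main(String[] args) {
        System.out.println(fallbackTransport("arrow-flight-fallback+grpc+tls://?")); // Optional[grpc+tls]
        System.out.println(fallbackTransport("arrow-flight-fallback+http://?"));     // Optional[http]
        System.out.println(fallbackTransport("grpc+tls://example.com:443"));        // Optional.empty
    }
}
```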
>> >>
>> >> On Mon, Feb 12, 2024, at 17:03, Joel Lubinitsky wrote:
>> >>> Thanks for clarifying.
>> >>>
>> >>> Given the relationship between these two proposals, would it also be
>> >>> necessary to distinguish the scheme (or schemes) supported by the
>> >>> originating Flight RPC service?
>> >>>
>> >>> If that is the case, it may be preferred to use the "host" portion of the
>> >>> URI rather than the "scheme" to denote the location of the data. In this
>> >>> scenario, the host "0.0.0.0" could be used. This IP address is defined in
>> >>> IETF RFC1122 [1] as "This host on this network", which seems most
>> >> >>> consistent with the intended use-case. There are some caveats to this
>> >> >>> usage but in my experience it's not uncommon for protocols to extend the
>> >> >>> definition of this address in their own usage.
>> >>>
>> >>> A benefit of this convention is that the scheme remains available in the
>> >> >>> URI to specify the transport available. For example, the following list
>> >> >>> of locations may be included in the response:
>> >>>
>> >>> ["grpc://0.0.0.0", "ucx://0.0.0.0", "grpc://1.2.3.4", 
>> >>> <other_locations>...]
>> >>>
>> >> >>> This would indicate that grpc and ucx transports are available from the
>> >> >>> current service, grpc is available at 1.2.3.4, and possibly more
>> >> >>> combinations of scheme/host.
>> >>>
>> >>> [1] https://datatracker.ietf.org/doc/html/rfc1122#section-3.2.1.3
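A minimal sketch of how a client might apply the 0.0.0.0 convention proposed above, assuming a location whose host is exactly 0.0.0.0 refers to the service that returned it. ThisHostLocations and transportsOnThisService are hypothetical names; this convention was not what the thread settled on.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "0.0.0.0 means this service" convention.
final class ThisHostLocations {
    // RFC 1122 "this host on this network" address, reused here as a marker.
    static final String THIS_HOST = "0.0.0.0";

    /** Returns the schemes the current service offers, in list order. */
    static List<String> transportsOnThisService(List<String> locations) {
        List<String> schemes = new ArrayList<>();
        for (String location : locations) {
            try {
                URI uri = new URI(location);
                if (THIS_HOST.equals(uri.getHost())) {
                    schemes.add(uri.getScheme());
                }
            } catch (URISyntaxException e) {
                // Skip locations this client cannot parse.
            }
        }
        return schemes;
    }

    public static void main(String[] args) {
        List<String> locations = List.of("grpc://0.0.0.0", "ucx://0.0.0.0", "grpc://1.2.3.4");
        System.out.println(transportsOnThisService(locations)); // [grpc, ucx]
    }
}
```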
>> >>>
>> >>> On Mon, Feb 12, 2024 at 2:53 PM David Li <lidav...@apache.org> wrote:
>> >>>
>> >>>> Ah, while I was thinking of it as useful for a fallback, I'm not
>> >>>> specifying it that way.  Better ideas for names would be appreciated.
>> >>>>
>> >>>> The actual precedence has never been specified. All endpoints are
>> >>>> equivalent, so clients may use what is "best". For instance, with Matt
>> >>>> Topol's concurrent proposal, a GPU-enabled client may preferentially try
>> >> >>>> UCX endpoints while other clients may choose to ignore them entirely
>> >> >>>> (e.g. because they don't have UCX installed).
>> >>>>
>> >> >>>> In practice the ADBC/JDBC drivers just scan the list left to right and
>> >> >>>> try each endpoint in turn for lack of a better heuristic.
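A minimal sketch of that left-to-right heuristic, including the "ignore transports the client can't use" behavior from the UCX example. The Fetcher interface is a hypothetical stand-in for a real Flight client call, and the addresses are made up.

```java
import java.net.URI;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Sketch: skip unsupported schemes, then try each remaining location in order
// until one succeeds. Not the actual ADBC/JDBC driver code.
final class LeftToRight {
    /** Hypothetical stand-in for fetching data from one location. */
    interface Fetcher {
        String fetch(URI location) throws Exception;
    }

    static Optional<String> firstSuccessful(List<URI> locations, Set<String> supportedSchemes, Fetcher fetcher) {
        for (URI location : locations) {
            if (!supportedSchemes.contains(location.getScheme())) {
                continue; // e.g. a client without UCX ignores ucx:// locations
            }
            try {
                return Optional.of(fetcher.fetch(location));
            } catch (Exception e) {
                // This location failed; fall through to the next one.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<URI> locations = List.of(
                URI.create("ucx://10.0.0.5"),
                URI.create("grpc://1.2.3.4"),
                URI.create("grpc://5.6.7.8"));
        // Simulated fetcher in which the first gRPC server is unreachable.
        Fetcher fetcher = loc -> {
            if (loc.getHost().equals("1.2.3.4")) throw new Exception("unreachable");
            return "data from " + loc;
        };
        System.out.println(firstSuccessful(locations, Set.of("grpc", "grpc+tls"), fetcher));
        // Optional[data from grpc://5.6.7.8]
    }
}
```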
>> >>>>
>> >>>> On Mon, Feb 12, 2024, at 14:28, Joel Lubinitsky wrote:
>> >> >>>>> Thanks for proposing this, David.
>> >> >>>>>
>> >> >>>>> I think the ability to include the Flight RPC service itself in the list
>> >> >>>>> of endpoints from which data can be fetched is a helpful addition.
>> >> >>>>>
>> >> >>>>> The current choice of name for the URI (arrow-flight-fallback://) seems
>> >> >>>>> to imply that there is an order of precedence that should be considered
>> >> >>>>> in the list of URIs. Specifically, as a developer receiving the list of
>> >> >>>>> locations I might assume that I should try fetching from other locations
>> >> >>>>> first. If those do not succeed, I may try the original service as a
>> >> >>>>> fallback.
>> >> >>>>>
>> >> >>>>> Are these the intended semantics? If so, is there a way to include the
>> >> >>>>> original service in the list of locations without the implied
>> >> >>>>> precedence?
>> >>>>>
>> >>>>> Thanks,
>> >>>>> Joel
>> >>>>>
>> >> >>>>> On Mon, Feb 12, 2024 at 11:52 James Duong <james.du...@improving.com.invalid>
>> >> >>>>> wrote:
>> >> >>>>>
>> >> >>>>>> This seems like a good idea, and also improves consistency with
>> >> >>>>>> clients that erroneously assumed that the service endpoint was always
>> >> >>>>>> in the list of endpoints.
>> >>>>>>
>> >>>>>> From: Antoine Pitrou <anto...@python.org>
>> >>>>>> Date: Monday, February 12, 2024 at 6:05 AM
>> >>>>>> To: dev@arrow.apache.org <dev@arrow.apache.org>
>> >>>>>> Subject: Re: [DISCUSS] Flight RPC: add 'fallback' URI scheme
>> >>>>>>
>> >>>>>> Hello,
>> >>>>>>
>> >>>>>> This looks fine to me.
>> >>>>>>
>> >>>>>> Regards
>> >>>>>>
>> >>>>>> Antoine.
>> >>>>>>
>> >>>>>>
>> >> >>>>>> On 12/02/2024 at 14:46, David Li wrote:
>> >> >>>>>>> Hello,
>> >> >>>>>>>
>> >> >>>>>>> I'd like to propose a slight update to Flight RPC to make Flight SQL
>> >> >>>>>>> work better in different deployment scenarios. Comments on the doc
>> >> >>>>>>> would be appreciated:
>> >> >>>>>>>
>> >> >>>>>>> https://docs.google.com/document/d/1g9M9FmsZhkewlT1mLibuceQO8ugI0-fqumVAXKFjVGg/edit?usp=sharing
>> >> >>>>>>>
>> >> >>>>>>> The gist is that FlightEndpoint allows specifying either (1) a list of
>> >> >>>>>>> concrete URIs to fetch data from or (2) no URIs, meaning to fetch from
>> >> >>>>>>> the Flight RPC service itself; but it would be useful to combine both
>> >> >>>>>>> behaviors (try these concrete URIs and fall back to the Flight RPC
>> >> >>>>>>> service itself) without requiring the service to know its own public
>> >> >>>>>>> address.
>> >>>>>>>
>> >>>>>>> Best,
>> >>>>>>> David
>> >>>>>>
>> >>>>
