On Thu, Jun 10, 2021 at 8:21 AM Tobias S. Josefowitz <[email protected]> wrote:
>
> On Wed, Jun 9, 2021 at 10:49 PM Stephen R. van den Berg <[email protected]> wrote:
> >
> > I'd say, the way forward should be changing the connecting functions to
> > iterate through the IP addresses and find the first that actually connects.
> >
> > >Needn't revert completely. If it's decided that the preference should
> > >continue to be A, then it's just a matter of reversing the order of
> > >returned values - a simple matter of switching two numbers.
> >
> > That would be an acceptable workaround until the connection code is fixed,
> > I think.
>
> Maybe I am the one missing something here, but even if "the connection
> code" were to learn to try to establish connections to all IPv4+IPv6
> addresses resolved for a hostname, it could not do that on top of
> Protocols.DNS.async_host_to_ip, as that returns a single result.

True, but if you call Protocols.HTTP.get_url(), the way that it does
its DNS lookups is an implementation detail.

> Changing it to result in something other than a single result will
> break every single user of Protocols.DNS.async_host_to_ip.
>
> As Protocols.DNS.async_host_to_ip is thus constrained to providing a
> single response, changing it to respond with an IPv6 IP preferred at
> this stage of IPv6 deployment is breaking more than it fixes. I would
> assume there are still much more IPv4-only than IPv6 hosts in the
> world, and the amount of IPv6-only hosts will be severely limited - at
> least some 6to4 translation should normally be available. Responding
> with IPv6 addresses first, esp. without applying any heuristics
> whatsoever to determine if a system might be IPv6-enabled, can thus
> only be seen as a severe regression.
>
> Responding with an IPv6-address if no IPv4-address can be resolved
> might not be a regression, but is neither terribly consistent or
> elegant, so it might be best to not do that either.
>
> All in all, I think the solution - beyond restoring useful and
> expected behaviour of Protocols.DNS.async_host_to_ip - would be to
> introduce a more suitable API - if necessary - and to start using that
> instead.

Okay. Here's a proposal:

1) Have a Protocols.DNS.prefer_ipv6() that chooses whether IPv6 is
preferred over IPv4 for simple lookups. Calling prefer_ipv6(1) will
result in the current behaviour, calling prefer_ipv6(0) will return to
previous behaviour. This just changes the order of responses, nothing
else.

2) Create host_to_ips() which returns (or in the case of a promise,
yields) an array with all of the results. By definition, host_to_ip()
== host_to_ips()[0] or "".

3) Change Protocols.HTTP.Query() to use async_host_to_ips(). This will
change its hostname_cache to carry arrays instead of strings - is that
used externally? I found one reference in Protocols.HTTP.Session and
one in Web.Crawler, but they're just synchronizing caches, and they
don't care what's actually stored in it.

4) Possibly change Protocols.HTTP.dns_lookup to use Protocols.DNS for
consistency?? Currently that one uses gethostbyname, so there's a
notable behavioural difference between sync_request and thread_request
(which use dns_lookup and gethostbyname) and async_request (which uses
dns_lookup_async and Protocols.DNS). Otherwise, just tweak dns_lookup
to store an array in the cache.

5) Add another phase to asynchronous HTTP connection after obtaining
DNS results: it maintains an array of IPs and attempts to connect to
them sequentially. The success callback will be the same, the failure
callback will attempt the next IP.

I'm part-way inclined to start with #4, since there's a weird
inconsistency there already. If you mix and match
sync_request/thread_request and async_request, the behaviour may
depend on which one populates the cache.

Is this worth working on?

ChrisA
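
To make points 1, 2 and 5 of the proposal above a bit more concrete,
here is a rough sketch of the intended semantics. It is written in
Python rather than Pike purely for illustration, and everything in it
is an assumption: order_addresses, connect_first and the injected
try_connect callback are hypothetical names, host_to_ip here is a
stand-in that takes an already-resolved list, and none of this is
existing Protocols.DNS or Protocols.HTTP code. The addresses are
documentation-range placeholders.

```python
import ipaddress
from typing import Callable, Iterable, List, Optional


def order_addresses(addrs: Iterable[str], prefer_ipv6: bool) -> List[str]:
    """Point 1: only the order changes; the preferred family comes first.

    sorted() is stable, so addresses keep their relative order
    within each address family.
    """
    preferred = 6 if prefer_ipv6 else 4
    return sorted(addrs, key=lambda a: ipaddress.ip_address(a).version != preferred)


def host_to_ip(ips: List[str]) -> str:
    """Point 2: the single-result lookup is the first list entry, or ""."""
    return ips[0] if ips else ""


def connect_first(ips: Iterable[str],
                  try_connect: Callable[[str], None]) -> Optional[str]:
    """Point 5: try each address in turn; a failure moves on to the next.

    try_connect is a hypothetical callback that raises OSError on failure.
    Returns the first address that connected, or None if all failed.
    """
    for ip in ips:
        try:
            try_connect(ip)
        except OSError:
            continue
        return ip
    return None
```

With a try_connect that fails for every IPv6 address, calling
connect_first(order_addresses(addrs, True), try_connect) still ends up
on an IPv4 address, which is the whole point of step 5: the preference
order then only decides what gets tried first, not what gets used.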
