> I'm pretty sure c-ares already sends the query to the next server in
> parallel; the default timeout just isn't where you expect. If you set it
> lower, it will start a second request at the point the timeout is hit, but
> if the first request then responds, it will still use that response if the
> next server on the list hasn't responded yet... It's been a while since I
> looked at the code, but that's what I recall.

Nope. C-ares iterates over name servers sequentially and waits until the DNS
timeout expires before switching to the next name server in the list.
This matches the expected resolv.conf behavior on Linux, which prescribes that
the resolver try name servers sequentially.
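To make the timing concrete, here is a toy Python model of the sequential behavior described above. It is not c-ares code: each server is reduced to a hypothetical response time in seconds (None for a server that never answers), and retries and backoff are ignored.

```python
from typing import Optional, Sequence, Tuple


def sequential_resolve(
    response_times: Sequence[Optional[float]],
    per_server_timeout: float,
) -> Tuple[Optional[int], float]:
    """Try servers strictly one after another (toy model, not c-ares code).

    response_times[i] is how long server i takes to answer, or None if it
    never answers. Returns (index of the answering server or None, seconds
    elapsed until the answer or until the list is exhausted).
    """
    elapsed = 0.0
    for i, rtt in enumerate(response_times):
        if rtt is not None and rtt <= per_server_timeout:
            # This server answers before its timeout expires.
            return i, elapsed + rtt
        # Dead or too slow: wait out the full timeout, then move on.
        elapsed += per_server_timeout
    return None, elapsed
```

For example, `sequential_resolve([None, None, 0.5], 5.0)` gives `(2, 10.5)`: the working third server is reached only after two full 5s timeouts.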

Of the resolv.conf options that affect server selection, c-ares honors only
“rotate”, which lets a query start from a server other than the first one in
the list.
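A sketch of what rotation does and does not change, assuming a round-robin start index (the behavior resolv.conf(5) describes for glibc; c-ares' exact rotation policy may differ). `query_order` is a hypothetical helper: only the starting point moves, and the walk through the list stays sequential.

```python
from typing import List, Sequence


def query_order(servers: Sequence[str], start: int) -> List[str]:
    """Order in which servers are tried when a query starts at index `start`.

    Without "rotate" every query uses start=0. With a round-robin rotate
    (an assumption here), the caller bumps `start` for each new query; the
    walk itself is still sequential and wraps around the end of the list.
    """
    n = len(servers)
    return [servers[(start + k) % n] for k in range(n)]
```

So `query_order(["a", "b", "c"], 1)` is `["b", "c", "a"]`, but a query that happens to start at a dead server still waits out that server's full timeout first.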

While the sequential approach makes sense in general, it doesn’t work well when
the list contains bad name servers (in either dual-stack or single-stack
setups) and fast name resolution is critical. It also makes the overall DNS
timeout non-deterministic, because it depends on the number of bad servers in
the list.
So, for such cases we need either internal sorting that puts good servers on
top, or some kind of parallel approach.

Thanks,
Dmitry Karpov

From: Brad House <b...@brad-house.com>
Sent: Wednesday, January 19, 2022 4:24 PM
To: c-ares discussions <c-ares@lists.haxx.se>
Cc: Dmitry Karpov <dkar...@roku.com>
Subject: Re: Feature request for parallel queries for name servers from 
different protocol families (IPv4 vs IPv6)

I'm pretty sure c-ares already sends the query to the next server in parallel;
the default timeout just isn't where you expect. If you set it lower, it will
start a second request at the point the timeout is hit, but if the first
request then responds, it will still use that response if the next server on
the list hasn't responded yet... It's been a while since I looked at the code,
but that's what I recall. What c-ares does NOT have is an overall query
timeout... that has been requested previously, but it doesn't currently exist
(though I agree it should). The retry logic once it hits the end of the
nameserver list is a bit weird, so from what I recall, predicting when a query
will return a failed result is basically impossible. So this seems to be
converging on what I originally suggested, except now it also sounds like
adding the ability to set an overall query timeout.
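A toy model of the behavior recalled here (which is disputed at the top of the thread), plus the missing overall query timeout. It is not taken from the c-ares source; `overlapping_resolve` and its parameters are hypothetical. Server i is queried at `i * per_server_timeout` unless an answer has already arrived, and earlier queries keep listening, so a late answer from the first server still wins.

```python
from typing import Optional, Sequence, Tuple


def overlapping_resolve(
    response_times: Sequence[Optional[float]],
    per_server_timeout: float,
    overall_timeout: Optional[float] = None,
) -> Tuple[Optional[int], Optional[float]]:
    """Toy model: server i is queried at i * per_server_timeout unless an
    answer already arrived, and earlier queries keep listening, so a late
    answer from an earlier server still counts. `overall_timeout`, if given,
    caps the whole lookup. Returns (server index, answer time) or (None, None).
    """
    best_idx: Optional[int] = None
    best_time: Optional[float] = None
    for i, rtt in enumerate(response_times):
        sent_at = i * per_server_timeout
        if best_time is not None and sent_at >= best_time:
            break  # an answer arrived before this query would have been sent
        if rtt is not None and (best_time is None or sent_at + rtt < best_time):
            best_idx, best_time = i, sent_at + rtt
    if best_time is None:
        return None, None
    if overall_timeout is not None and best_time > overall_timeout:
        return None, None  # the overall deadline expired first
    return best_idx, best_time
```

`overlapping_resolve([2.0, None], 1.0)` returns `(0, 2.0)`: the first server's late answer is used even though the second query already went out at 1s.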

On 1/19/22 7:04 PM, Dmitry Karpov via c-ares wrote:
> Again, there's a reason happy eyeballs doesn't just hammer all endpoints
> returned from getaddrinfo() simultaneously; I'd think the same reasoning
> applies to DNS servers ... be kind ... start a second query after a short
> delay if we haven't received a response yet (e.g. 200ms).
> It doesn't make sense to hammer more than one DNS server if they're all
> responsive; you just doubled the network load for DNS for no reason.


Very true! But in my parallel approach, I didn’t mean to start all the parallel
queries simultaneously.
I didn’t nail down the details, but such an approach should obviously be
similar to Happy Eyeballs, even for single-stack setups.

So the parallel queries should be started with small delays between them, like
the 200ms in Happy Eyeballs, but the whole name resolution should be controlled
by one constant, deterministic timeout (e.g. 5s) that doesn’t depend on the
number of name servers in the list, as it currently does in c-ares.
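A minimal asyncio sketch of this proposal, assuming each server lookup can be wrapped as an awaitable. `staggered_query` and `fake_server` are hypothetical names, not c-ares or libcurl APIs. A new query starts every `stagger` seconds, or immediately when a server fails (as Happy Eyeballs does for connection attempts), and the whole lookup is bounded by `deadline` no matter how many servers are configured.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional, Sequence, Tuple

# A query is a zero-argument callable that sends one lookup to one server.
Query = Callable[[], Awaitable[str]]


async def staggered_query(
    queries: Sequence[Query], stagger: float = 0.2, deadline: float = 5.0
) -> Optional[Tuple[int, str]]:
    """Happy-Eyeballs-style race over name servers (hypothetical helper).

    Returns (server index, answer), or None if every server failed or the
    overall deadline passed.
    """
    loop = asyncio.get_running_loop()
    end = loop.time() + deadline
    in_flight: Dict[asyncio.Future, int] = {}
    next_idx = 0
    try:
        while True:
            if next_idx < len(queries):
                # Start the next server's query.
                in_flight[asyncio.ensure_future(queries[next_idx]())] = next_idx
                next_idx += 1
            remaining = end - loop.time()
            if not in_flight or remaining <= 0:
                return None  # every server failed, or the deadline passed
            # While servers remain, wait at most one stagger interval.
            timeout = min(stagger, remaining) if next_idx < len(queries) else remaining
            done, _ = await asyncio.wait(
                in_flight, timeout=timeout, return_when=asyncio.FIRST_COMPLETED
            )
            winner: Optional[Tuple[int, str]] = None
            for task in done:
                idx = in_flight.pop(task)
                if task.cancelled() or task.exception() is not None:
                    continue  # this server failed; the next one starts now
                if winner is None or idx < winner[0]:
                    winner = (idx, task.result())
            if winner is not None:
                return winner
    finally:
        for task in in_flight:
            task.cancel()  # stop waiting on slower servers


async def fake_server(delay: float, answer: Optional[str]) -> str:
    """Stand-in for one server: answers after `delay` s, or fails if answer is None."""
    await asyncio.sleep(delay)
    if answer is None:
        raise ConnectionError("no usable response")
    return answer
```

With a dead first server and a second one that answers in 50ms, `asyncio.run(staggered_query([lambda: fake_server(30.0, None), lambda: fake_server(0.05, "192.0.2.10")]))` returns `(1, '192.0.2.10')` after roughly 250ms instead of after a full 5s per-server timeout.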
In my use cases (c-ares with libcurl), I see different name resolution
timeouts (5s, 15s, …) depending on the number of bad name servers in the list,
which causes some of my time-critical services to fail.
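A back-of-the-envelope model of these numbers, assuming each bad server ahead of the first working one costs exactly one full per-server timeout $T$ (retries ignored), with $k$ bad servers, servers indexed $i = 0, 1, \dots$ in list order, a stagger $\Delta$ and an overall deadline $D$ for the parallel variant. With $T = 5\,\text{s}$ the sequential case is consistent with the 5s and 15s figures for one and three bad servers; the parallel case is bounded by $D$ regardless of the list length.

```latex
\begin{aligned}
T_{\text{seq}}(k) &= k\,T + \mathrm{RTT}_{\text{good}}
  &&\Rightarrow\; T_{\text{seq}}(1) \approx 5\,\text{s},\; T_{\text{seq}}(3) \approx 15\,\text{s} \quad (T = 5\,\text{s}) \\
T_{\text{par}} &= \min_{i \in \text{answering}} \bigl(i\,\Delta + \mathrm{RTT}_i\bigr)
  &&\text{(the lookup fails if } T_{\text{par}} > D\text{)}
\end{aligned}
```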


And we can’t just use 200ms as the per-server DNS timeout and iterate name
servers sequentially, because there are high-latency satellite links with large
RTTs that need 2s or sometimes more for name resolution.
That’s why the parallel approach (with delays between parallel queries) seems
to me a better solution for bad name servers than the sequential one.
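A toy comparison of the two strategies on a satellite-like link, using hypothetical helpers that ignore retries and backoff. With a 200ms per-server timeout, a sequential resolver that abandons each server at its timeout never sees a 2s answer, while a staggered resolver that keeps listening gets it at 2s.

```python
from typing import Optional, Sequence


def abandon_on_timeout(rtts: Sequence[Optional[float]], timeout: float) -> Optional[float]:
    """Sequential: a server's answer is ignored once its timeout has expired."""
    elapsed = 0.0
    for rtt in rtts:
        if rtt is not None and rtt <= timeout:
            return elapsed + rtt
        elapsed += timeout
    return None


def staggered_keep_listening(
    rtts: Sequence[Optional[float]], stagger: float, deadline: float
) -> Optional[float]:
    """Staggered: server i is queried at i * stagger and every query keeps
    listening until `deadline`. Queries that would start after the first
    answer cannot beat it, so taking the minimum over all of them is safe."""
    answers = [
        i * stagger + rtt
        for i, rtt in enumerate(rtts)
        if rtt is not None and i * stagger < deadline
    ]
    best = min(answers, default=None)
    return best if best is not None and best <= deadline else None
```

`abandon_on_timeout([2.0, 2.0], 0.2)` returns `None`, while `staggered_keep_listening([2.0, 2.0], 0.2, 5.0)` returns `2.0`.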

But as I said, any improvements in this area would be very welcome c-ares
extensions, especially if they help libcurl with c-ares, which a lot of people
use, handle bad name servers better.

Thanks,
Dmitry Karpov


From: Brad House <b...@brad-house.com>
Sent: Wednesday, January 19, 2022 2:37 PM
To: c-ares discussions <c-ares@lists.haxx.se>
Cc: Dmitry Karpov <dkar...@roku.com>
Subject: Re: Feature request for parallel queries for name servers from 
different protocol families (IPv4 vs IPv6)

I guess it always depends on the design of whatever is using c-ares. In my own
use cases, I have a single ares_channel running on an event loop and enqueue my
lookups there ... so it keeps state. Nothing with thread-local storage or
anything, just dispatching to that event loop for any DNS queries that need to
be performed. The single ares_channel can handle multiple simultaneous DNS
queries.
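A sketch of that design in Python, with asyncio standing in for the event loop and an async `resolve` callable standing in for lookups on the single shared channel. `DnsDispatcher` is hypothetical; application threads call `lookup()`, which hands the query to the loop thread, so resolver state lives in exactly one place and needs no thread-local copies.

```python
import asyncio
import threading
from typing import Any, Callable, Coroutine


class DnsDispatcher:
    """One event-loop thread owns the resolver state; any thread can submit.

    `resolve` stands in for lookups on the single shared channel; the class
    and its methods are hypothetical, not a c-ares API.
    """

    def __init__(self, resolve: Callable[[str], Coroutine[Any, Any, str]]) -> None:
        self._resolve = resolve
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def lookup(self, name: str, timeout: float = 5.0) -> str:
        # Hand the lookup to the loop thread and block this caller until done.
        future = asyncio.run_coroutine_threadsafe(self._resolve(name), self._loop)
        return future.result(timeout)

    def close(self) -> None:
        # Stop the loop from its own thread, then release its resources.
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()
```

Many threads can call `lookup()` at once; the loop thread interleaves their queries, just as one channel handles several simultaneous lookups.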

Also, with the proposed feedback loop, if a DNS server is no longer reachable,
the list will be re-sorted for any future requests, so only a single request
would be impacted (ok, well, whatever number of requests came in before the
timeout or error occurred).
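A sketch of such a feedback loop, assuming the simplest possible metric (consecutive failures per server); `ServerRanker` is hypothetical and not part of c-ares. Only the requests already in flight when a server dies pay its timeout; later ones find it at the bottom of the list.

```python
from typing import Dict, List, Sequence


class ServerRanker:
    """Hypothetical feedback loop: servers that recently failed sink down.

    Consecutive failures are counted per server; a success resets the count.
    sorted() is stable, so servers with equal counts keep their configured
    (e.g. resolv.conf) order.
    """

    def __init__(self, servers: Sequence[str]) -> None:
        self._servers = list(servers)
        self._failures: Dict[str, int] = {s: 0 for s in self._servers}

    def record(self, server: str, ok: bool) -> None:
        # Called after each query with its outcome (answer vs. timeout/error).
        self._failures[server] = 0 if ok else self._failures[server] + 1

    def order(self) -> List[str]:
        return sorted(self._servers, key=lambda s: self._failures[s])
```

After one timeout from `2001:db8::1`, `order()` moves it behind the other configured servers until it answers again.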

Again, there's a reason happy eyeballs doesn't just hammer all endpoints
returned from getaddrinfo() simultaneously; I'd think the same reasoning applies
to DNS servers ... be kind ... start a second query after a short delay if we
haven't received a response yet (e.g. 200ms). It doesn't make sense to hammer
more than one DNS server if they're all responsive; you just doubled the
network load for DNS for no reason.
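To illustrate the load argument with a toy model (`queries_sent` is a hypothetical helper): with all servers responsive and a 200ms stagger, only one query goes out, while starting everything at once (stagger 0) sends one query per server.

```python
from typing import Optional, Sequence


def queries_sent(rtts: Sequence[Optional[float]], stagger: float) -> int:
    """How many servers get a query before the first answer arrives, when
    server i is queried at i * stagger (toy model; rtt None = no answer)."""
    first_answer: Optional[float] = None
    sent = 0
    for i, rtt in enumerate(rtts):
        start = i * stagger
        if first_answer is not None and start >= first_answer:
            break  # already answered: this query is never sent
        sent += 1
        if rtt is not None:
            arrival = start + rtt
            if first_answer is None or arrival < first_answer:
                first_answer = arrival
    return sent
```

With two servers that both answer in 30ms, `queries_sent([0.03, 0.03], 0.2)` is 1 and `queries_sent([0.03, 0.03], 0.0)` is 2, i.e. the simultaneous variant doubles the DNS traffic.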

On 1/19/22 5:25 PM, Dmitry Karpov via c-ares wrote:

>  I wasn't suggesting this be outside of c-ares, I was talking about 
> implementing this inside of c-ares as a simpler alternative to your proposal.

OK, I got it now. :)
Pre-sorting name servers based on reachability from previous queries and/or
protocol family may help in some cases, but the sequential approach, even with
sorting, will still have some issues that the parallel approach solves more
efficiently.

For example, the first query, made before anything is sorted, may cause
critical connection timeouts that abort some applications. And storing the name
server “reachability metrics” that the servers are sorted on will require either
thread-local storage (so each thread has to go through the same “name server
discovery” procedure as the other app threads using c-ares) or globally
accessible metrics data with properly synchronized read/write access, as
multi-threaded apps need.
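A sketch of the second option (globally accessible metrics with synchronized access), using a single lock; `SharedServerMetrics` is hypothetical, and a reader/writer lock would serve equally well. Every thread reads and writes the same counters, so no thread has to rediscover which servers are dead.

```python
import threading
from typing import Dict


class SharedServerMetrics:
    """Process-wide name server reachability metrics behind one lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._failures: Dict[str, int] = {}

    def record_failure(self, server: str) -> None:
        with self._lock:
            self._failures[server] = self._failures.get(server, 0) + 1

    def record_success(self, server: str) -> None:
        with self._lock:
            self._failures[server] = 0

    def snapshot(self) -> Dict[str, int]:
        # Copy under the lock so readers never see a half-applied update.
        with self._lock:
            return dict(self._failures)
```

The cost is lock traffic on every query, which is the trade-off against thread-local copies described above.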

Also, if run-time conditions change after the previous query, the list may no
longer be sorted correctly for the current conditions, so a suboptimal or even
bad server may be tried first, increasing name resolution time.

The parallel approach, on the other hand, will provide the fastest name
resolution regardless of previous queries, so it doesn’t need to store any name
server metrics or pre-process the name server list from the OS.

But I agree that implementing the parallel approach may not be easy, and any
improvement in this area would be a very welcome extension anyway.
So, if you think an updated sequential approach with smart sorting is much
easier to implement than the parallel one, then hopefully we can get it in the
next c-ares updates.

Thanks,
Dmitry Karpov


From: Brad House <b...@brad-house.com>
Sent: Wednesday, January 19, 2022 12:10 PM
To: c-ares discussions <c-ares@lists.haxx.se>
Cc: Dmitry Karpov <dkar...@roku.com>
Subject: Re: Feature request for parallel queries for name servers from 
different protocol families (IPv4 vs IPv6)

Commenting below ...
On 1/19/22 2:51 PM, Dmitry Karpov via c-ares wrote:
> In fact, happy eyeballs itself doesn't always make parallel connection
> attempts; it's an implementation-defined delay before also attempting the
> next address in the list.

In the case of Happy Eyeballs, the delay between IPv4 and IPv6 connection
attempts is constant and typically relatively short: 200-300ms.
But non-functional IPv6 name servers in the server list can add dynamic delays
to connection establishment, and these can be very large.

By default, c-ares uses a 5s timeout per name server, so it may take 5s or more
(if several IPv6 name servers are in the list) before Happy Eyeballs even gets
to the connection stage, which is much longer than the expected 200-300ms.

It would be assumed that, as part of this patch set, this timer would be reduced.

> It would be much easier to stay closer to happy eyeballs and just sort the
> DNS server list using prior result success/fail (even upfront sorting using
> some algorithm to interleave IPv6/IPv4 in a pattern would help, maybe using
> logic such as RFC6724 sec 2.1, as we do in ares_getaddrinfo for returned
> addresses, but applied to the nameservers themselves).
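A sketch of the simplest form of upfront interleaving, in the spirit of the address-family interleaving in RFC 8305 section 4 rather than the full RFC 6724 policy table (which also weighs precedence, scope and so on). `interleave_families` is hypothetical, and the addresses are documentation examples.

```python
import ipaddress
from itertools import zip_longest
from typing import List, Sequence


def interleave_families(servers: Sequence[str]) -> List[str]:
    """Alternate IPv6 and IPv4 name servers, starting with the family of the
    first configured server and keeping the configured order within each
    family (hypothetical helper, not the RFC 6724 policy table)."""
    if not servers:
        return []
    v6 = [s for s in servers if ipaddress.ip_address(s).version == 6]
    v4 = [s for s in servers if ipaddress.ip_address(s).version == 4]
    first, second = (v6, v4) if ipaddress.ip_address(servers[0]).version == 6 else (v4, v6)
    ordered: List[str] = []
    for a, b in zip_longest(first, second):
        if a is not None:
            ordered.append(a)
        if b is not None:
            ordered.append(b)
    return ordered
```

With two unreachable IPv6 servers configured ahead of an IPv4 one, the IPv4 server moves up to second place, so at most one dead IPv6 server is tried before it.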

Yes, of course, a c-ares client can implement some kind of name server
sorting/filtering logic outside of c-ares and just pass a list of “good” name
servers to c-ares, but in that case it has to be more involved in the name
resolution business than would be desirable.

I wasn't suggesting this be outside of c-ares, I was talking about implementing 
this inside of c-ares as a simpler alternative to your proposal.

-Brad

-- 
c-ares mailing list
c-ares@lists.haxx.se
https://lists.haxx.se/listinfo/c-ares
