Hi,

A couple of recent changes forced by the security fixes and general hardening 
made BIND 9 go in two different directions:

1. Limiting the number of outgoing queries: max-recursion-queries (single query 
without restarts in resolution), max-query-count (total number per single 
client query), max-query-restarts (how many CNAME/DNAME restarts), and 
max-recursion-depth (maximum levels of recursion). These are described in more
detail in the ARM.

2. Sending more outgoing queries: validating nameserver queries (ADB),
ignoring extra records in the incoming DNS messages, and asking for these 
explicitly (some types of GLUE are now ignored). BIND 9 was designed from the 
very beginning to fill up the caches as quickly as possible; cache memory still 
has lower latency than network even nowadays.

The upshot is that a recursive server with a cold cache might return
SERVFAIL on the first try for some names because of TLDs referring to other
TLDs, long CDN CNAME chains jumping from domain to domain, etc.

One of the most recent examples that were given on the mailing list was 
teams.microsoft.com:

Asking for this name on a cold cache BIND 9.20 server ends with

$ grep -c "sending packet to" named.run
114

On 9.21 (future 9.22):

$ grep -c "sending packet from" named.run
110

As you can see, there are more than 100 outgoing DNS queries for a single name 
queried, and often this leads to a SERVFAIL. Verisign's Transitive Trust 
Checker can be used to visualize this: 
https://trans-trust.verisignlabs.com/?z=teams.microsoft.com

Another example that recently circulated was a reverse name:
https://trans-trust.verisignlabs.com/?z=195.5.90.45.in-addr.arpa

$ grep -c "sending packet to" named.run
166
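
The grep patterns above differ between the two branches ("sending packet to"
vs. "sending packet from"), so when comparing runs it may help to count both
wordings with one helper. The following is only a sketch: the function names
are made up for illustration, and it assumes that each matching line in
named.run corresponds to exactly one outgoing query.

```python
import re

# Assumption: either wording quoted in this mail marks one outgoing query.
PATTERN = re.compile(r"sending packet (to|from)\b")


def count_outgoing(lines):
    """Count lines that contain one of the two 'sending packet' wordings."""
    return sum(1 for line in lines if PATTERN.search(line))


def count_in_file(path="named.run"):
    """Count matching lines in a log file, like `grep -c` would."""
    with open(path, errors="replace") as log:
        return count_outgoing(log)
```

Like grep -c, this counts matching lines rather than occurrences, so calling
count_in_file("named.run") should agree with the grep commands above for
whichever wording the log uses.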

Now, there is a merge request in preparation that reduces the number of
outgoing queries by delaying the fan-out across the nameservers. Instead of
sending A and AAAA queries for each nameserver in the set, it sends one and
waits 100 ms for the response. If no response is received, it continues
with the next server, and so on until all nameservers have been tried. For
2 nameservers, this might incur a 100 ms delay if the first one does not
respond (or is slow). For 13 nameservers, we suddenly get a 1200 ms delay
if all but the last one are unresponsive.
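
The delay figures above follow from a simple rule: each nameserver that fails
to answer within the 100 ms window adds 100 ms before the next one is tried.
The sketch below only models that arithmetic; it is not BIND 9 code, and the
function and constant names are hypothetical.

```python
# Hypothetical model of the timed fallback described above; not BIND 9 code.
STEP_MS = 100  # wait per nameserver before moving on (value from the text)


def fallback_delay_ms(nameservers: int, first_responsive: int) -> int:
    """Extra delay before the first responsive nameserver is queried.

    first_responsive is the 1-based position of the first server that
    answers within the wait window.
    """
    if not 1 <= first_responsive <= nameservers:
        raise ValueError("first_responsive must be within 1..nameservers")
    return (first_responsive - 1) * STEP_MS


# Worst case from the text: only the last nameserver responds.
for n in (2, 13):
    print(n, fallback_delay_ms(n, n))
# prints:
# 2 100
# 13 1200
```

When the first nameserver answers in time, the model gives no extra delay at
all, which is the case the patch is betting on.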

This is a big change to the way the resolver operates, and thus we would like 
to gather some real-world data from people willing to run their resolvers with 
this patch.

However, there are a couple of requirements. In particular, you must:
1. know how to patch and compile named from source (and perhaps do that
more than once).
2. be willing to communicate about this on the GitLab merge request 
(https://gitlab.isc.org/isc-projects/bind9/-/merge_requests/11205), new updates 
will be posted there.
3. know how to compile BIND 9 with debug symbols and either keep them inside
the binaries or use the detached symbols. Either is fine, but a possible
coredump that shows just "???" instead of symbols is mostly unusable.
4. be willing to share test cases, both where it helped and where it didn't.
5. not get angry if named crashes, doesn't work, etc.

The whole MR is still mostly a work in progress; the extra system tests that
would exercise the timed fallbacks are still missing. That is also why we
are looking for extra testers who might provide us with real-world
examples of what is currently broken.

Now, how does the patched version improve things?

- teams.microsoft.com

$ grep -c "sending packet to" named.run
79

- 195.5.90.45.in-addr.arpa

$ grep -c "sending packet to" named.run
45

Much better, right?
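
To put numbers on that, the relative reduction can be worked out from the
counts in this mail. This is just arithmetic on the figures above; it assumes
the 9.20 count of 114 is the baseline for teams.microsoft.com, and the
function name is made up for illustration.

```python
def reduction_percent(before: int, after: int) -> float:
    """Percentage of outgoing queries saved, rounded to one decimal place."""
    return round(100 * (before - after) / before, 1)


# (unpatched, patched) counts quoted in this mail; the 114 baseline for
# teams.microsoft.com is an assumption (the 9.21 run showed 110).
counts = {
    "teams.microsoft.com": (114, 79),
    "195.5.90.45.in-addr.arpa": (166, 45),
}

for name, (before, after) in counts.items():
    print(f"{name}: {before} -> {after} "
          f"({reduction_percent(before, after)}% fewer)")
# prints:
# teams.microsoft.com: 114 -> 79 (30.7% fewer)
# 195.5.90.45.in-addr.arpa: 166 -> 45 (72.9% fewer)
```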

If you have read this far and are still interested in testing this, the
latest tarball is always available in the latest pipeline, in the
tarball-create job in the "precheck" stage. I've also copied the latest one
into the latest comment in the MR itself:
https://gitlab.isc.org/isc-projects/bind9/-/merge_requests/11205#note_611712

Disclaimer: This work might be a bust or it could hit a dead end.

Thanks,
--
Ondřej Surý (He/Him)
[email protected]

My working hours and your working hours may be different. Please do not feel 
obligated to reply outside your normal working hours.
