[dns-operations] OpenDNS, Google, Nominet - New delegation update failure mode

Doug Barton Thu, 02 Apr 2020 13:06:06 -0700

Howdy,

I redelegated shopdisney.co.uk this morning. I can see that all of the Nominet authorities are returning the correct new NS set, however I have a number of reports of resolution failures. There are resolvers from OpenDNS, Google, Virgin, O2, and others that are not finding any name servers at all, and refusing to re-query. This is causing address record resolution failures for users behind those resolvers.

What is odd to me is that earlier this week we cross-pollinated the old and new zone files with both the old and new sets of name servers. I have seen situations in the past where cutting cleanly from one set of name servers to a completely different set has caused problems, so we take this extra step of updating the zones so that no matter what point in the process we're at the resolving name servers will always have at least one good set to query. It's always worked for me in the past.
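The reasoning behind the cross-pollination step can be sketched as a small check. This is only an illustrative model, not a description of how any particular resolver behaves: it assumes a resolver may hold either the parent's or the child's NS set, and that a set cached during one stage can survive into the next stage but no further. The names Stage, stranded_sets, and the example.* host names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

NSSet = FrozenSet[str]


@dataclass(frozen=True)
class Stage:
    name: str
    parent_ns: NSSet   # NS set published in the parent delegation
    child_ns: NSSet    # NS set published at the zone apex
    answering: NSSet   # servers that actually serve the zone at this stage


def stranded_sets(stages: List[Stage]) -> List[Tuple[str, NSSet]]:
    """Return (stage name, NS set) pairs where a resolver could be holding
    an NS set in which no server answers for the zone."""
    problems = []
    cached = set()  # NS sets a resolver may still hold from the previous stage
    for stage in stages:
        candidates = cached | {stage.parent_ns, stage.child_ns}
        for ns_set in candidates:
            if not (ns_set & stage.answering):
                problems.append((stage.name, ns_set))
        # Assumption: caches only survive one stage (TTLs expire in between).
        cached = {stage.parent_ns, stage.child_ns}
    return problems


OLD = frozenset({"ns1.old.example", "ns2.old.example"})
NEW = frozenset({"ns1.new.example", "ns2.new.example"})
BOTH = OLD | NEW

# Both zone files list both NS sets before the parent is changed.
cross_pollinated = [
    Stage("prepare", parent_ns=OLD, child_ns=BOTH, answering=BOTH),
    Stage("redelegate", parent_ns=NEW, child_ns=BOTH, answering=BOTH),
    Stage("cleanup", parent_ns=NEW, child_ns=NEW, answering=NEW),
]

# Cutting cleanly from one set to a completely different one.
clean_cut = [
    Stage("before", parent_ns=OLD, child_ns=OLD, answering=OLD),
    Stage("cutover", parent_ns=NEW, child_ns=NEW, answering=NEW),
]

print(stranded_sets(cross_pollinated))
print(stranded_sets(clean_cut))
```

Under these assumptions the cross-pollinated plan leaves no stage where a cached NS set is entirely dead, while the clean cut strands any resolver still holding the old set. The model says nothing about why a resolver would end up with no NS records at all.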

What's even more strange is that we also did shopdisney.it this morning, having done the same preparation, and it's solid as a rock. It's only the CO.UK name that is failing. When querying OpenDNS or Google directly I get the same result when it fails:


dig @8.8.4.4 shopdisney.co.uk ns
; <<>> DiG 9.10.6 <<>> @8.8.4.4 shopdisney.co.uk ns
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 1587
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;shopdisney.co.uk.              IN      NS

;; Query time: 501 msec
;; SERVER: 8.8.4.4#53(8.8.4.4)
;; WHEN: Thu Apr 02 12:28:46 PDT 2020
;; MSG SIZE  rcvd: 45

The flags are the same for the OpenDNS servers.
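When comparing responses from many resolvers, it can help to reduce each dig header to its status, flags, and section counts. The following is a minimal sketch that parses dig's text output of the form shown above; the function names summarize_dig and is_hard_failure are hypothetical, and the regular expressions assume the header layout printed by the dig version in this message.

```python
import re

# Matches e.g. "status: SERVFAIL, id: 1587" in the ->>HEADER<<- line.
STATUS_RE = re.compile(r"status:\s*(?P<status>[A-Z]+),\s*id:\s*(?P<id>\d+)")

# Matches the flags/counts line. The EDNS line also contains "flags:",
# but it is not followed by "QUERY:", so it does not match here.
COUNTS_RE = re.compile(
    r"flags:\s*(?P<flags>[a-z ]*);\s*"
    r"QUERY:\s*(?P<query>\d+),\s*"
    r"ANSWER:\s*(?P<answer>\d+),\s*"
    r"AUTHORITY:\s*(?P<authority>\d+),\s*"
    r"ADDITIONAL:\s*(?P<additional>\d+)"
)


def summarize_dig(text: str) -> dict:
    """Extract status, flags, and section counts from dig text output."""
    status = STATUS_RE.search(text)
    counts = COUNTS_RE.search(text)
    if status is None or counts is None:
        raise ValueError("no dig response header found")
    return {
        "status": status.group("status"),
        "flags": counts.group("flags").split(),
        "answer": int(counts.group("answer")),
        "authority": int(counts.group("authority")),
        "additional": int(counts.group("additional")),
    }


def is_hard_failure(summary: dict) -> bool:
    """True when the resolver returned SERVFAIL with an empty answer section."""
    return summary["status"] == "SERVFAIL" and summary["answer"] == 0


SAMPLE = """\
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 1587
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
"""

summary = summarize_dig(SAMPLE)
print(summary["status"], summary["flags"], is_hard_failure(summary))
```

Feeding it the saved output from each resolver gives a quick way to see which ones return SERVFAIL and which return the new NS set.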

Has anyone seen this happen before? I've seen plenty of cases where resolvers have hung onto the old NS set for too long (following the parent TTL instead of the child), which is why I have been adding both sets of name servers to both zones in advance of the redelegation. But I have literally never seen a case where a resolver not only has no NS records, but also will not re-query.

My first thought was that Nominet withdrew the delegation for a short period, and the resolvers have a negative cache entry, but when doing the UAT this morning I happened to catch the exact point at which they changed. In serial number 1308977661 they had the old NS set, and in 1308977662 they had the new one. So that doesn't seem to be the problem.

If anyone from OpenDNS and/or Google can take a look at a resolver that is failing for shopdisney.co.uk and tell me what's in the logs I would deeply appreciate it. Since I can't figure out what happened, I'm not sure how to mitigate it for the next change.

In the past I've taken the intermediate step of also updating the parent delegation to include both NS sets, which I plan to do for the next set of updates just to be on the safe side, but given this fun new failure mode it's not clear to me that even doing that will insulate us.


Any thoughts/help/advice welcome,

Doug
_______________________________________________
dns-operations mailing list
[email protected]
https://lists.dns-oarc.net/mailman/listinfo/dns-operations
