Hello,

I've had a hard time identifying the source of an intermittent name resolution failure for a customer. The cause is a DNS spec violation: the authoritative server returns an RRset with multiple CNAME records:

dig @ns-29-b.gandi.net CNAME lb.qual.flash-global.net

; <<>> DiG 9.18.2-1+ubuntu20.04.1+isc+3-Ubuntu <<>> @ns-29-b.gandi.net CNAME lb.qual.flash-global.net
; (2 servers found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 42945
;; flags: qr aa rd; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;lb.qual.flash-global.net. IN CNAME

;; ANSWER SECTION:
lb.qual.flash-global.net. 10800 IN CNAME lb1.qual.flash-global.net.
lb.qual.flash-global.net. 10800 IN CNAME lb2.qual.flash-global.net.

;; Query time: 10 msec
;; SERVER: 213.167.230.30#53(ns-29-b.gandi.net) (UDP)
;; WHEN: Fri May 13 15:03:00 CEST 2022
;; MSG SIZE rcvd: 89

If I try the resolution via my BIND (9.18.2) resolver with a cold cache, it properly returns SERVFAIL:

dig @172.29.0.36 +dnssec +cd CNAME lb.qual.flash-global.net

; <<>> DiG 9.18.2-1+ubuntu20.04.1+isc+3-Ubuntu <<>> @172.29.0.36 +dnssec +cd CNAME lb.qual.flash-global.net
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 24053
;; flags: qr rd ra cd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1232
; COOKIE: 23ac9b539bf16ad001000000627e57c0b7d630e657322232 (good)
;; QUESTION SECTION:
;lb.qual.flash-global.net. IN CNAME

;; Query time: 30 msec
;; SERVER: 172.29.0.36#53(172.29.0.36) (UDP)
;; WHEN: Fri May 13 15:06:09 CEST 2022
;; MSG SIZE rcvd: 81

because the authoritative answer is correctly identified as invalid:

named[147998]: FORMERR resolving 'lb.qual.flash-global.net/CNAME/IN': 213.167.230.30#53
named[147998]: FORMERR resolving 'lb.qual.flash-global.net/CNAME/IN': 217.70.187.161#53
named[147998]: FORMERR resolving 'lb.qual.flash-global.net/CNAME/IN': 173.246.100.82#53

Google DNS returns the same.

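As a side note, the violation itself is easy to spot mechanically: a name may carry at most one CNAME record (see RFC 2181, section 10.1). The following is only a minimal sketch in Python, assuming dig-style presentation lines of the form "owner TTL class type rdata"; the function name find_multi_cname and the SAMPLE text are hypothetical helpers for illustration, not part of BIND or dig.

```python
from collections import defaultdict


def find_multi_cname(dig_output: str) -> dict[str, list[str]]:
    """Return owner names that carry more than one distinct CNAME target."""
    targets = defaultdict(list)
    for line in dig_output.splitlines():
        line = line.strip()
        # Skip blank lines and dig comment/header lines (they start with ';').
        if not line or line.startswith(";"):
            continue
        fields = line.split()
        # Assumed record layout: owner TTL class type rdata...
        # RRSIG records covering a CNAME have type RRSIG and are ignored here.
        if len(fields) < 5 or fields[3].upper() != "CNAME":
            continue
        owner, target = fields[0].lower(), fields[4].lower()
        if target not in targets[owner]:
            targets[owner].append(target)
    # Keep only the owners that violate the single-CNAME rule.
    return {owner: t for owner, t in targets.items() if len(t) > 1}


# Answer section copied from the authoritative response shown above.
SAMPLE = """\
;; ANSWER SECTION:
lb.qual.flash-global.net. 10800 IN CNAME lb1.qual.flash-global.net.
lb.qual.flash-global.net. 10800 IN CNAME lb2.qual.flash-global.net.
"""

if __name__ == "__main__":
    for owner, cnames in find_multi_cname(SAMPLE).items():
        print(f"{owner} has {len(cnames)} CNAME targets: {', '.join(cnames)}")
```

Piping the output of a dig query against each authoritative server into such a check would flag the broken name without depending on what a resolver happens to have cached.
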
If I send an A query, I get an answer, which is unexpected in my opinion:

dig @172.29.0.36 +dnssec +cd A lb.qual.flash-global.net

; <<>> DiG 9.18.2-1+ubuntu20.04.1+isc+3-Ubuntu <<>> @172.29.0.36 +dnssec +cd A lb.qual.flash-global.net
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 26546
;; flags: qr rd ra cd; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1232
; COOKIE: b5755aa921e65a4401000000627e58a481dbcf3655737b6b (good)
;; QUESTION SECTION:
;lb.qual.flash-global.net. IN A

;; ANSWER SECTION:
lb.qual.flash-global.net. 10800 IN CNAME lb1.qual.flash-global.net.
lb.qual.flash-global.net. 10800 IN RRSIG CNAME 13 4 10800 20220526000000 20220505000000 57605 flash-global.net. NVDmeCSKkx998LRnmiB6hWz4PdZJ5WPG6CCrDTSP587pLUxxoxeNlCmJ l8l0p8/l8o+ZmZr1EXqxUA1FXpGbGw==
lb1.qual.flash-global.net. 600 IN A 51.68.158.37
lb1.qual.flash-global.net. 600 IN RRSIG A 13 4 600 20220526000000 20220505000000 57605 flash-global.net. G1YUaDtWVGxj5NbA18crQ912tW/VWra49wi3U1EeRio9kId+2mwo7Vuj GH8adlvvjQyps7IBtj9gYVmbewN+GQ==

;; Query time: 30 msec
;; SERVER: 172.29.0.36#53(172.29.0.36) (UDP)
;; WHEN: Fri May 13 15:09:57 CEST 2022
;; MSG SIZE rcvd: 339

Google DNS does the same, BUT on my side the cache is now polluted: a new CNAME query gives me

dig @172.29.0.36 +dnssec +cd CNAME lb.qual.flash-global.net

; <<>> DiG 9.18.2-1+ubuntu20.04.1+isc+3-Ubuntu <<>> @172.29.0.36 +dnssec +cd CNAME lb.qual.flash-global.net
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 42637
;; flags: qr rd ra cd; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1232
; COOKIE: ea748ef065e32df101000000627e59947b2e1424679d72f2 (good)
;; QUESTION SECTION:
;lb.qual.flash-global.net. IN CNAME

;; ANSWER SECTION:
lb.qual.flash-global.net. 10560 IN CNAME lb1.qual.flash-global.net.
lb.qual.flash-global.net. 10560 IN RRSIG CNAME 13 4 10800 20220526000000 20220505000000 57605 flash-global.net. NVDmeCSKkx998LRnmiB6hWz4PdZJ5WPG6CCrDTSP587pLUxxoxeNlCmJ l8l0p8/l8o+ZmZr1EXqxUA1FXpGbGw==

;; Query time: 20 msec
;; SERVER: 172.29.0.36#53(172.29.0.36) (UDP)
;; WHEN: Fri May 13 15:13:56 CEST 2022
;; MSG SIZE rcvd: 211

until I issue an rndc flush command.

This cache pollution is bad and does not seem to happen on the Google side (but there are many DNS servers behind 8.8.8.8). I would have expected a SERVFAIL/FORMERR for the A query as well. While I could understand a conservative approach on Google's side, I don't buy it for BIND, and I would expect a configuration directive to reject such answers.

If this behavior (the A case) is expected for BIND, I think the cache pollution is not, and it should be fixed. Am I wrong?

Whether Gandi should stop allowing several CNAME records to be declared on a single name, and how to contact them to get this fixed, is more a question for dns-operations.

Emmanuel.