Florian CROUZAT wrote on 2011-08-31:
> Lyle Giese wrote on 2011-08-31:
>
>> On 8/31/2011 8:40 AM, Florian CROUZAT wrote:
>>> Florian CROUZAT wrote on 2011-08-25:
>>>
>>>> Hi list,
>>>>
>>>> On a few domains (we'll consider only one domain for this example) I
>>>> sometimes encounter (seemingly random) ServFails while resolving
>>>> domain names. A client (192.168.147.2) asks my caching server
>>>> (192.168.151.100) to resolve a target (www.leclercdrive.fr).
>>>>
>>>> Here are the relevant logs:
>>>>
>>>> Aug 24 17:14:19 ns named[24929]: 24-Aug-2011 17:14:19.377 queries: info: client 192.168.147.2#34502: view internal: query: www.leclercdrive.fr IN A +
>>>> Aug 24 17:14:19 ns named[24929]: 24-Aug-2011 17:14:19.380 queries: info: client 192.168.147.2#34502: view internal: query: www.leclercdrive.fr IN A +
>>>> Aug 24 17:14:19 ns named[24929]: 24-Aug-2011 17:14:19.382 queries: info: client 192.168.147.2#34502: view internal: query: www.leclercdrive.fr IN A +
>>>>
>>>> A tcpdump on the local side of the NS server shows the A request and
>>>> the instant ServFail. A tcpdump on the external side of the NS server
>>>> shows no traffic at all in this case, meaning it fails internally and
>>>> doesn't even try to forward the A request to the Internet.
>>>>
>>>> 17:14:19.377608 IP 192.168.147.2.34502 > 192.168.151.100.53: 26340+ A? www.leclercdrive.fr. (37)
>>>> 17:14:19.378845 IP 192.168.151.100.53 > 192.168.147.2.34502: 26340 ServFail 0/0/0 (37)
>>>> 17:14:19.380607 IP 192.168.147.2.34502 > 192.168.151.100.53: 52628+ A? www.leclercdrive.fr. (37)
>>>> 17:14:19.381383 IP 192.168.151.100.53 > 192.168.147.2.34502: 52628 ServFail 0/0/0 (37)
>>>> 17:14:19.382605 IP 192.168.147.2.34502 > 192.168.151.100.53: 58933+ A? www.leclercdrive.fr. (37)
>>>> 17:14:19.383406 IP 192.168.151.100.53 > 192.168.147.2.34502: 58933 ServFail 0/0/0 (37)
>>>>
>>>> A few minutes earlier or later, it worked just fine, see:
>>>>
>>>> 17:15:58.736177 IP 192.168.147.2.34502 > 192.168.151.100.53: 49610+ A? www.leclercdrive.fr. (37)
>>>> 17:15:58.784470 IP 192.168.151.100.53 > 192.168.147.2.34502: 49610 3/3/6 CNAME[|domain]
>>>>
>>>> The TTL of the www.leclercdrive.fr entry is 300, which seems short
>>>> to me. Maybe the ServFail happens when a request is handled at the
>>>> exact moment the TTL reaches zero and the cache entry is flushed?
>>>> I tried flushing the cache using rndc, but the first request after
>>>> that worked just fine (of course...).
>>>>
>>>> Any ideas/hints are welcome.
>>>>
>>>> The DNS server runs 1:9.5.1.dfsg.P3-1+lenny1
>>>> cat /etc/debian_version => 5.0.4
>>>> (I have no control over the version of the tools)
>>>
>>> I found in my logfiles a few other domains where the ServFails happen;
>>> their respective TTLs all differ, from 300 s to 86400 s. I still
>>> have no idea how to resolve this issue and, as far as I have
>>> investigated, I haven't been able to identify a pattern in those
>>> ServFails. I'm not even sure the TTL is involved, since I saw two
>>> ServFails separated in time by less than the TTL value of the entry...
>>>
>>> Florian
>>>
>>
>> The authoritative name servers for leclercdrive.fr are a.dns.gandi.net,
>> b.dns.gandi.net and c.dns.gandi.net. I don't know how big gandi.net
>> is, but traceroutes to those servers end up going through Level3 in
>> Baltimore, MD from here. They did have a hurricane go through there
>> and I would not be surprised if traffic levels have been a bit high for
>> the last few days.
>>
>> Lyle
>
> Well, it's a French registrar, my servers are in France and my clients
> are French too, so from here the traceroute is pretty neat. Anyway, my
> problem isn't (apparently) related to Gandi, or even to
> www.leclercdrive.fr, since the ServFails happen internally and
> instantly in my BIND, which doesn't even try to forward the A request.
>
> Florian
Apparently, even if I don't understand why, the problem seems to be that the name servers ({a,b,c}.dns.gandi.net) of leclercdrive.fr and of the other domains that ServFail have AAAA records, and my caching server has IPv6 enabled while my network doesn't route or handle IPv6. All I had to do to get rid of those ServFails was to add "-4" to BIND's startup options (Debian: /etc/default/bind9, OPTIONS=).

Still, I don't really understand whether or not this is a bug in BIND that only shows up when your interface has a link-local IPv6 address, the remote name servers have AAAA records, and your network doesn't handle IPv6. The fix I applied works, but I'm not satisfied with it. Any clarification is of course welcome.

Greetings,
Florian
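The diagnosis above hinges on the caching server having IPv6 enabled but no usable IPv6 route. As a minimal sketch, assuming a Linux-like kernel, one way to check that condition from the server itself is to ask the kernel to route a UDP socket toward an address in the 2001:db8::/32 documentation prefix: connect() on a UDP socket sends no packet, it only performs a route lookup. The probe address and the function name `has_ipv6_route` are illustrative choices, not part of BIND.

```python
import socket

# Hypothetical probe address from 2001:db8::/32, the IPv6 documentation prefix.
# Nothing is ever sent to it; we only ask the kernel whether a route exists.
PROBE_ADDR = "2001:db8::1"


def has_ipv6_route(addr=PROBE_ADDR, port=53):
    """Return True if the kernel can route to a global IPv6 address.

    connect() on a UDP socket transmits nothing: it only selects a route and
    a source address. With nothing but a link-local (fe80::/10) address and
    no IPv6 default route, it typically fails with ENETUNREACH.
    """
    if not socket.has_ipv6:
        return False  # Python built without IPv6 support
    try:
        with socket.socket(socket.AF_INET6, socket.SOCK_DGRAM) as s:
            s.connect((addr, port))
            return True
    except OSError:
        return False  # IPv6 disabled in the kernel, or no route to the probe


if __name__ == "__main__":
    if has_ipv6_route():
        print("IPv6 route present: named can try the name servers' IPv6 addresses")
    else:
        print("No usable IPv6 route: starting named with -4 avoids IPv6 entirely")
```

This only tests whether a route exists, not whether IPv6 packets actually reach the Internet; a host with an IPv6 default route that is blackholed upstream would still report True.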
_______________________________________________
Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe from this list

bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users