On 8/31/2011 8:40 AM, Florian CROUZAT wrote:
Florian CROUZAT wrote on 2011-08-25:

Hi list,

On a few domains (we'll consider only one domain for this example) I
sometimes encounter (seemingly random) ServFails while resolving domain
names. A client (192.168.147.2) asks my caching server (192.168.151.100)
to resolve a target (www.leclercdrive.fr).

Here are the relevant logs:

Aug 24 17:14:19 ns named[24929]: 24-Aug-2011 17:14:19.377 queries: info: client 192.168.147.2#34502: view internal: query: www.leclercdrive.fr IN A +
Aug 24 17:14:19 ns named[24929]: 24-Aug-2011 17:14:19.380 queries: info: client 192.168.147.2#34502: view internal: query: www.leclercdrive.fr IN A +
Aug 24 17:14:19 ns named[24929]: 24-Aug-2011 17:14:19.382 queries: info: client 192.168.147.2#34502: view internal: query: www.leclercdrive.fr IN A +


A tcpdump on the local side of the NS server shows the A request and the
instant ServFail. A tcpdump on the external side of the NS server shows
no traffic at all in this case, meaning it fails internally and doesn't
even try to forward the A request to the Internet.

17:14:19.377608 IP 192.168.147.2.34502 > 192.168.151.100.53: 26340+ A? www.leclercdrive.fr. (37)
17:14:19.378845 IP 192.168.151.100.53 > 192.168.147.2.34502: 26340 ServFail 0/0/0 (37)
17:14:19.380607 IP 192.168.147.2.34502 > 192.168.151.100.53: 52628+ A? www.leclercdrive.fr. (37)
17:14:19.381383 IP 192.168.151.100.53 > 192.168.147.2.34502: 52628 ServFail 0/0/0 (37)
17:14:19.382605 IP 192.168.147.2.34502 > 192.168.151.100.53: 58933+ A? www.leclercdrive.fr. (37)
17:14:19.383406 IP 192.168.151.100.53 > 192.168.147.2.34502: 58933 ServFail 0/0/0 (37)

A few minutes earlier or later, it worked just fine; see:

17:15:58.736177 IP 192.168.147.2.34502 > 192.168.151.100.53: 49610+ A? www.leclercdrive.fr. (37)
17:15:58.784470 IP 192.168.151.100.53 > 192.168.147.2.34502: 49610 3/3/6 CNAME[|domain]
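To put a number on "instant", one can pair each query ID with its response and compute the delay. The sketch below is a hypothetical Python helper (not a tcpdump or BIND feature); it assumes one packet per line in tcpdump's default text output, and the packets are the ones quoted in this message.

```python
import re
from datetime import datetime

# Packets from the captures in this message, one per line.
# Hypothetical helper: the regex assumes tcpdump's default one-line text format.
CAPTURE = """\
17:14:19.377608 IP 192.168.147.2.34502 > 192.168.151.100.53: 26340+ A? www.leclercdrive.fr. (37)
17:14:19.378845 IP 192.168.151.100.53 > 192.168.147.2.34502: 26340 ServFail 0/0/0 (37)
17:14:19.380607 IP 192.168.147.2.34502 > 192.168.151.100.53: 52628+ A? www.leclercdrive.fr. (37)
17:14:19.381383 IP 192.168.151.100.53 > 192.168.147.2.34502: 52628 ServFail 0/0/0 (37)
17:14:19.382605 IP 192.168.147.2.34502 > 192.168.151.100.53: 58933+ A? www.leclercdrive.fr. (37)
17:14:19.383406 IP 192.168.151.100.53 > 192.168.147.2.34502: 58933 ServFail 0/0/0 (37)
17:15:58.736177 IP 192.168.147.2.34502 > 192.168.151.100.53: 49610+ A? www.leclercdrive.fr. (37)
17:15:58.784470 IP 192.168.151.100.53 > 192.168.147.2.34502: 49610 3/3/6 CNAME[|domain]
"""

PACKET = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d{6}) IP (?P<src>\S+) > (?P<dst>\S+): "
    r"(?P<qid>\d+)(?P<rest>.*)$"
)


def response_delays(capture: str) -> dict[int, tuple[float, str]]:
    """Map each DNS query ID to (delay in ms, first word of the response)."""
    sent: dict[int, datetime] = {}
    delays: dict[int, tuple[float, str]] = {}
    for line in capture.splitlines():
        m = PACKET.match(line.strip())
        if m is None:
            continue  # not a packet line
        ts = datetime.strptime(m["ts"], "%H:%M:%S.%f")
        qid = int(m["qid"])
        if m["dst"].endswith(".53"):
            sent[qid] = ts  # sent to port 53: a client query
        elif qid in sent:
            # sent from port 53: the answer, e.g. "ServFail" or "3/3/6"
            ms = (ts - sent.pop(qid)).total_seconds() * 1000
            delays[qid] = (round(ms, 3), m["rest"].split()[0])
    return delays


if __name__ == "__main__":
    for qid, (ms, status) in response_delays(CAPTURE).items():
        print(f"id {qid}: {status} after {ms} ms")
```

On these packets it reports 1.237, 0.776 and 0.801 ms for the three ServFails and 48.293 ms for the successful lookup, which is consistent with named answering the failing queries from local state rather than after an upstream round trip.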

The TTL of the www.leclercdrive.fr entry is 300 seconds, which seems short
to me. Maybe the ServFail happens when a request is processed at the exact
moment the TTL reaches zero and the cache entry is being flushed? I
tried flushing the cache using rndc, but the first request after that
worked just fine (of course...).

Any ideas/hints are welcome.

The DNS server runs 1:9.5.1.dfsg.P3-1+lenny1
cat /etc/debian_version =>  5.0.4
(I have no control over the version of the tools.)



I found in my log files a few other domains where the ServFails happen; their
respective TTLs are all different, ranging from 300 to 86400 seconds.
I still have no idea at all how to resolve this issue, and as far as I have
investigated, I haven't been able to identify a pattern in those ServFails.
I'm not even sure the TTL is involved, since I saw two ServFails separated in
time by less than the TTL value of the entry...

Florian


The authoritative name servers for leclercdrive.fr are a.dns.gandi.net, b.dns.gandi.net and c.dns.gandi.net. I don't know how big gandi.net is, but from here, traceroutes to those servers end up going through Level3 in Baltimore, MD. They did have a hurricane go through there, and I would not be surprised if traffic levels have been a bit high for the last few days.

Lyle
_______________________________________________
Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe 
from this list

bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users
