Package: bind9
Version: 1:9.5.1.dfsg.P1-1
Severity: important

Since we upgraded our bind9 name servers from Etch to Lenny we are
experiencing occasional hangs. While all requests for authoritative zones
are still answered correctly, we can't seem to get replies for recursive
queries. All we get is NXDOMAIN until we restart the bind process through
its init.d script. Every tenth or so request is answered properly, but the
next request fails again with NXDOMAIN. So the successful response from the
root servers doesn't seem to get served from the internal cache either.

In that situation our log file fills up with:

Mar 4 07:21:45 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:21:56 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:22:07 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:22:57 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:22:58 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:22:59 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:23:03 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:23:09 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:23:32 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:23:36 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:23:37 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:23:43 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:26:23 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:26:34 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:31:35 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:32:33 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:32:35 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:32:45 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:33:47 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:33:51 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:33:54 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:33:54 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:33:56 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:33:56 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:33:58 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:33:58 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:33:58 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:34:00 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:34:03 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:34:12 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:34:12 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:34:13 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:34:13 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found
Mar 4 07:34:13 pns named[7077]: general: checkhints: unable to get root NS rrset from cache: not found

We traced (tshark) what's happening on the network and it seems like bind9
isn't even sending out requests to the internet if we send it a recursive
query from inside/LAN. Instead it instantly replies with NXDOMAIN.

This situation is happening every few days and requires a bind restart or
else our clients can't run recursive queries any more (which apparently
isn't making them happy).

Our name server serves nearly 500 authoritative zones and is used as a
forwarder for the internal/LAN clients. "rndc status" shows:

==========================================
version: 9.5.1-P1
number of zones: 511
debug level: 0
xfers running: 0
xfers deferred: 0
soa queries in progress: 0
query logging is OFF
recursive clients: 6/0/1000
tcp clients: 0/100
server is up and running
==========================================

Our named.conf* looks basically like this:

==========================================
options {
        directory "/var/cache/bind";
        max-cache-size 10m;
        datasize unlimited;
        stacksize unlimited;
        coresize default;
        auth-nxdomain no;
        check-names master ignore;
        transfers-per-ns 50;
        transfers-in 20;
        cleaning-interval 0;
        transfer-format one-answer;
        notify yes;
        allow-recursion { internal; };
        allow-query { internal; };
        allow-transfer { foo; bar; };
        also-notify { x.x.x.x; x.x.x.x; };
};

logging {
        category "lame-servers" { null; };
        channel default_syslog {
                syslog daemon;
                print-category yes;
        };
        category "default" { "default_syslog"; "default_debug"; };
};

key "rndc-key" {
        algorithm hmac-md5;
        secret "1D0NtG1V3Y0U0UrS3Cr3tK3Y==";
};

controls {
        inet 127.0.0.1 port 953 allow { 127.0.0.1; } keys { "rndc-key"; };
};
==========================================

The only other log message that was worrying us is:

Mar 4 07:18:03 pns named[7077]: general: max open files (1024) is smaller than max sockets (4096)

Let us know if there is anything else we can debug in case of that
situation.

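In case it helps with narrowing down when the problem starts, here is a minimal sketch for summarising the checkhints messages quoted above. It is not something we run in production. The function names `checkhints_per_minute` and `first_failure` are made up for illustration, and the regular expression assumes the syslog line format shown in this report.

```python
import re
from collections import Counter

# Assumed line format, taken from the log excerpt in this report:
# "Mar 4 07:21:45 pns named[7077]: general: checkhints: unable to get
#  root NS rrset from cache: not found"
CHECKHINTS = re.compile(
    r"^(?P<mon>\w{3})\s+(?P<day>\d{1,2})\s+"
    r"(?P<hh>\d{2}):(?P<mm>\d{2}):(?P<ss>\d{2})\s+\S+\s+"
    r"named\[\d+\]:.*checkhints: unable to get root NS rrset from cache"
)


def checkhints_per_minute(lines):
    """Count checkhints failures per (month, day, "HH:MM") bucket."""
    counts = Counter()
    for line in lines:
        m = CHECKHINTS.match(line)
        if m:
            bucket = (m["mon"], int(m["day"]), f'{m["hh"]}:{m["mm"]}')
            counts[bucket] += 1
    return counts


def first_failure(lines):
    """Return the first matching log line (the likely onset), or None."""
    for line in lines:
        if CHECKHINTS.match(line):
            return line.rstrip("\n")
    return None
```

Feeding it the lines of the syslog file that receives the daemon facility (the exact path depends on the local syslog setup) gives the first failing timestamp and a per-minute count, which could then be lined up against other named messages from just before the onset.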
Cheers
 Christoph