A few thoughts:
* You can check for dropped packets on the receive path with:

  # netstat -u -s

  A high count of "packet receive errors" can indicate that the receive buffer is overflowing. This is fixable by tuning the network stack, as Mike Mitchell suggests.

* You can check for dropped packets on the send path by looking for "error sending response: unset" in the named logs. This is similarly fixable with sysctl tuning. We changed the following:

  net.core.rmem_max = 16777216
  net.core.rmem_default = 16777216
  net.core.wmem_max = 16777216
  net.core.wmem_default = 16777216
  net.core.netdev_max_backlog = 5000
  net.unix.max_dgram_qlen = 100

* Try watching your incoming UDP packet buffers at tight intervals while top is running:

  # watch -n 0.1 'cat /proc/net/udp | grep ":0035 00000000:0000 "'

  OR

  # watch -n 0.1 'cat /proc/net/udp | grep -v "00000000:00000000 00:00000000 00000000"'

  # top -d 0.1 -p $PID_OF_NAMED   # where $PID_OF_NAMED is the named pid

Does named's unresponsiveness coincide with the UDP rx_queue filling up and named's CPU usage dropping to 0%?

Mathew Eis
Northern Arizona University
Information Technology Services
mathew....@nau.edu
(928) 523-2960

-----Original Message-----
From: Michael Brunnbauer <bru...@netestate.de>
Date: Friday, April 1, 2016 at 9:29 AM
To: Mathew Eis <mathew....@nau.edu>
Cc: "bind-users@lists.isc.org" <bind-users@lists.isc.org>, <d...@dotat.at>
Subject: Re: Recursive bind becomes unresponsive with high load

>
>Hello Mathew,
>
>On Fri, Apr 01, 2016 at 04:01:04PM +0000, Mathew Ian Eis wrote:
>> What OS are you running your BIND server on? Is it virtualized?
>
>Linux Kernel 3.4.111 with glibc 2.22, 32bit, not virtualized. No distribution -
>everything was compiled by hand.
>
>> Is it fully unresponsive, or could it be simply taking longer to respond
>> than your client timeout?
>
>Assuming that bind would report dropped queries, I guess it is the latter.
>
>Regarding the suggestion made by Tony Finch about too many TCP connections
>in the TIME_WAIT status: That would have been a good explanation. But I do not
>see more than 200 TCP connections in TIME_WAIT status when the problem occurs
>and not more than 5000 TCP/UDP connections with port 53.
>
>cu,
>brunni
>
>--
>++ Michael Brunnbauer
>++ netEstate GmbH
>++ Geisenhausener Straße 11a
>++ 81379 München
>++ Tel +49 89 32 19 77 80
>++ Fax +49 89 32 19 77 89
>++ E-Mail bru...@netestate.de
>++ http://www.netestate.de/
>++
>++ Registered office: München, HRB No. 142452 (Handelsregister B München)
>++ VAT ID: DE221033342
>++ Managing directors: Michael Brunnbauer, Franz Brunnbauer
>++ Authorized signatory (Prokurist): Dipl. Kfm. (Univ.) Markus Hendel
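If the grep patterns in the /proc/net/udp suggestion above are hard to read at 0.1 s intervals, the same check can be done by a small script that decodes the hex fields. This is only a rough sketch: the script name udp53.sh and the function udp_port_report are made up for illustration. It assumes the Linux /proc/net/udp column layout, with tx_queue:rx_queue in field 5 and a per-socket drops counter in field 13, and a little-endian host such as x86 when decoding the address. Note that rx_queue is the number of bytes of receive-buffer memory in use, not a packet count.

```bash
#!/usr/bin/env bash
# udp53.sh (hypothetical name): one snapshot of the UDP sockets bound to a
# local port (default 53), decoded from /proc/net/udp-formatted stdin.
# Assumptions: tx_queue:rx_queue is field 5 and drops is field 13, and the
# host is little-endian (e.g. x86) for the IPv4 address decode.

udp_port_report() {
  local port="${1:-53}"
  awk -v port="$port" '
    # Portable hex-to-decimal conversion (strtonum exists only in gawk).
    function hex2dec(h,    i, n) {
      n = 0
      h = toupper(h)
      for (i = 1; i <= length(h); i++)
        n = n * 16 + index("0123456789ABCDEF", substr(h, i, 1)) - 1
      return n
    }
    # The kernel prints the IPv4 address as a raw 32-bit word, so on a
    # little-endian host its bytes appear in reverse order.
    function ip4(h) {
      return sprintf("%d.%d.%d.%d",
                     hex2dec(substr(h, 7, 2)), hex2dec(substr(h, 5, 2)),
                     hex2dec(substr(h, 3, 2)), hex2dec(substr(h, 1, 2)))
    }
    NR > 1 {
      split($2, la, ":")                  # local_address is ADDR:PORT in hex
      if (hex2dec(la[2]) != port + 0) next
      split($5, q, ":")                   # tx_queue:rx_queue, hex byte counts
      drops = ($13 == "") ? "n/a" : $13   # column missing on very old kernels
      printf "%s:%d rx_queue=%d drops=%s\n", ip4(la[1]), hex2dec(la[2]), hex2dec(q[2]), drops
    }'
}

# When executed directly, print one snapshot of the live table.
if [[ "${BASH_SOURCE[0]}" == "$0" && -r /proc/net/udp ]]; then
  udp_port_report "${1:-53}" < /proc/net/udp
fi
```

It could be run as "watch -n 0.1 ./udp53.sh" next to the top command. A drops value that keeps climbing on named's sockets while rx_queue sits near its ceiling would be consistent with the receive buffer overflow described in the first point.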
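The "netstat -u -s" counters from the first point are cumulative since boot, so a single reading says little. What matters is whether they grow while named is struggling. As far as I know, netstat takes these numbers from /proc/net/snmp, where "packet receive errors" is InErrors, "receive buffer errors" is RcvbufErrors and "send buffer errors" is SndbufErrors. A rough sketch that prints how much those three grow over an interval follows; the script name udp_drops.sh and the functions udp_counters and udp_delta are hypothetical.

```bash
#!/usr/bin/env bash
# udp_drops.sh (hypothetical name): growth of the kernel UDP error counters
# over an interval. Input uses the /proc/net/snmp format, where the Udp:
# section is a line of counter names followed by a line of values.

# Print "Name Value" pairs for the Udp: section of snmp-formatted stdin.
udp_counters() {
  awk '$1 == "Udp:" {
         if (!seen) { for (i = 2; i <= NF; i++) name[i] = $i; seen = 1 }
         else       { for (i = 2; i <= NF; i++) print name[i], $i }
       }'
}

# Given two udp_counters outputs (before, after), print after minus before.
udp_delta() {
  awk 'NR == FNR { before[$1] = $2; next }
       ($1 in before) { print $1, $2 - before[$1] }' \
    <(printf '%s\n' "$1") <(printf '%s\n' "$2")
}

if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  if [[ $# -eq 1 && -r /proc/net/snmp ]]; then
    before="$(udp_counters < /proc/net/snmp)"
    sleep "$1"
    after="$(udp_counters < /proc/net/snmp)"
    # Show only the error counters that netstat -u -s reports as drops.
    udp_delta "$before" "$after" | grep -E '^(InErrors|RcvbufErrors|SndbufErrors) '
  else
    echo "usage: $0 SECONDS  (Linux only, reads /proc/net/snmp)" >&2
  fi
fi
```

For example, "./udp_drops.sh 10" while the problem is happening. Growth in RcvbufErrors would support the receive-buffer theory, while growth in SndbufErrors would be more in line with the send-path errors in the named logs.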