Re: Bind 9.7.0-P1 socket: file descriptor exceeds limit / assertion failure
At Thu, 29 Apr 2010 14:53:44 -0700,
Dale Kiefling wrote:

> We have a Bind 9.7.0-P1 instance that is throwing the following errors:
> 21-Apr-2010 16:59:00.173 general: error: socket: file descriptor exceeds limit (1024/1024)

The fact that the FD limit is 1024 suggests your named uses select instead of epoll. As far as I know, Linux kernel 2.6 should support epoll, so your named may have been built with --disable-epoll. What's the result of named -V?

> $ uname -a
> Linux ha1.example.com 2.6.18-128.1.10.el5PAE #1 SMP Thu May 7 11:14:31 EDT 2009 i686 athlon i386 GNU/Linux

For a busy recursive server that could consume more than 1024 open sockets, select won't work well anyway. Even if you increase the FD limit, it's quite likely that the server will hit other scalability issues. So, if your named was built with --disable-epoll, I'd suggest you rebuild it with epoll enabled (which should be the default on your Linux system) and try again.

In any case, the assertion failure should be a bug, but right now I have no idea how it happened.

---
JINMEI, Tatuya
Internet Systems Consortium, Inc.
___
bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users
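The check suggested in the message above can be scripted. This is a minimal sketch, assuming that `named -V` prints the configure arguments the binary was built with; the function name `built_without_epoll` is hypothetical and not part of BIND.

```shell
# Hypothetical helper: reads build information on stdin and succeeds
# (exit status 0) only if the --disable-epoll configure flag appears in it.
built_without_epoll() {
    grep -q -e '--disable-epoll'
}

# Intended use on the server (assumes named is on PATH):
#   named -V 2>&1 | built_without_epoll && echo "built with --disable-epoll"
```

A missing flag does not by itself prove that epoll is in use, since configure may also fall back to select when it fails to detect epoll, so treat the result as a hint only.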
Re: Bind 9.7.0-P1 socket: file descriptor exceeds limit / assertion failure
Dale:
Sorry I emailed you directly. I'm sending my response to the group.

Dale:
The limits.conf file only sets the hard and soft limits when you log in. Once you log out, the open file limit goes back to its default value. Read the man page for limits.conf. The issue below has caught all of us.

*Excerpt is below*

    In general, individual limits have priority over group limits, so if you impose no limits for admin group, but one of the members in this group have a limits line, the user will have its limits set according to this line.

    Also, please note that all limit settings are set per login. They are not global, nor are they permanent; existing only for the duration of the session.

On Fri, Apr 30, 2010 at 7:32 PM, Dale Kiefling wrote:

> Hey Ezra,
> Thanks for the reply.
>
> ulimit -Hn and ulimit -Sn report 8192.
>
> Wasn't sure if limits.conf would help or not.
>
> Dale
>
> On Apr 30, 2010, at 4:18 PM, Ezra Taylor wrote:
>
>> Dale:
>> The limits.conf file is not going to solve your problem. Read
>> the man page for initscript and inittab.
>>
>> On Thu, Apr 29, 2010 at 5:53 PM, Dale Kiefling wrote:
>>
>>> We have a Bind 9.7.0-P1 instance that is throwing the following errors:
>>> [...]
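Because limits.conf is applied per login session, the value an interactive shell reports with `ulimit -n` can differ from the limit a daemon started at boot is actually running with. A minimal sketch for reading the limit of a running process follows, assuming the kernel exposes /proc/<pid>/limits (older kernels may not) and labels the row "Max open files"; the function name `open_file_limits` is hypothetical.

```shell
# Hypothetical helper: reads the contents of a /proc/<pid>/limits file on
# stdin and prints the soft and hard "Max open files" values.
open_file_limits() {
    awk '/^Max open files/ { print "soft=" $4, "hard=" $5 }'
}

# Intended use (assumes pidof is available and a single named is running):
#   open_file_limits < "/proc/$(pidof named)/limits"
```

Comparing this output with `ulimit -Sn` and `ulimit -Hn` from a login shell shows whether the daemon inherited the limits.conf settings or not.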
Re: Bind 9.7.0-P1 socket: file descriptor exceeds limit / assertion failure
Hey Ezra,
Thanks for the reply.

ulimit -Hn and ulimit -Sn report 8192.

Wasn't sure if limits.conf would help or not.

Dale

On Apr 30, 2010, at 4:18 PM, Ezra Taylor wrote:

> Dale:
> The limits.conf file is not going to solve your problem. Read the man page for initscript and inittab.
>
> On Thu, Apr 29, 2010 at 5:53 PM, Dale Kiefling wrote:
>
>> We have a Bind 9.7.0-P1 instance that is throwing the following errors:
>> [...]
Re: Bind 9.7.0-P1 socket: file descriptor exceeds limit / assertion failure
Dale:
The limits.conf file is not going to solve your problem. Read the man page for initscript and inittab.

On Thu, Apr 29, 2010 at 5:53 PM, Dale Kiefling wrote:

> We have a Bind 9.7.0-P1 instance that is throwing the following errors:
> [...]
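A daemon started at boot does not pass through a login session, so one common approach is to raise the limit in the script that launches it. The following is a minimal sketch of that idea, not necessarily what the initscript and inittab man pages describe for this system; `run_with_nofile` is a hypothetical name, and the named path and arguments in the usage comment are assumptions.

```shell
# Hypothetical wrapper: set the open-file limit inside a subshell, then
# replace that subshell with the given command so it inherits the limit.
run_with_nofile() {
    limit=$1
    shift
    (
        ulimit -n "$limit" || exit 1
        exec "$@"
    )
}

# Intended use from a start-up script run as root, for example:
#   run_with_nofile 8192 /usr/sbin/named -u named
```

The subshell keeps the changed limit from leaking into the calling script. Raising the value above the current hard limit normally requires root, which a boot-time script has.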
Bind 9.7.0-P1 socket: file descriptor exceeds limit / assertion failure
We have a Bind 9.7.0-P1 instance that is throwing the following errors:

21-Apr-2010 16:59:00.173 general: error: socket: file descriptor exceeds limit (1024/1024)
21-Apr-2010 17:00:00.122 general: error: socket: file descriptor exceeds limit (1024/1024)
21-Apr-2010 17:00:00.123 general: error: socket: file descriptor exceeds limit (1024/1024)

When we try to increase the socket value we are seeing assertion failures.

Restarted named with the option -S 8192:

Apr 26 19:20:54 ha1 named[3891]: socket.c:2781: INSIST(!sock->pending_recv) failed, back trace
Apr 26 19:20:54 ha1 named[3891]: #0 0x806525b in ??
Apr 26 19:20:54 ha1 named[3891]: #1 0x7b4b57 in ??
Apr 26 19:20:54 ha1 named[3891]: #2 0x7dfc03 in ??
Apr 26 19:20:54 ha1 named[3891]: #3 0x7e16f9 in ??
Apr 26 19:20:54 ha1 named[3891]: #4 0x7e1979 in ??
Apr 26 19:20:54 ha1 named[3891]: #5 0x7e1be7 in ??
Apr 26 19:20:54 ha1 named[3891]: #6 0x61a49b in ??
Apr 26 19:20:54 ha1 named[3891]: #7 0x6fd42e in ??
Apr 26 19:20:54 ha1 named[3891]: exiting (due to assertion failure)

Any advice given the info provided below? Let me know if I can provide more info.

Dale

$ dig +short version.bind chaos txt
"9.7.0-P1"

$ uname -a
Linux ha1.example.com 2.6.18-128.1.10.el5PAE #1 SMP Thu May 7 11:14:31 EDT 2009 i686 athlon i386 GNU/Linux

$ cat /etc/redhat-release
CentOS release 5.3 (Final)

$ cat /etc/security/limits.conf
* hard nofile 8192
* soft nofile 8192
ntp - memlock 32768

cat named.conf
...
options {
    directory "/var/opt/named";
    pid-file "/etc/named.pid";
    notify yes;
    also-notify {
    };
    recursion yes;
    allow-query { any; };
    //edns-udp-size 512;
};
...

ulimit -a reports:
open files (-n) 8192

recent rndc stats:
+++ Statistics Dump +++ (1271794427)
++ Incoming Requests ++
  108267159 QUERY
  313 NOTIFY
++ Incoming Queries ++
  91731351 A
  314215 NS
  10840 SOA
  2704323 PTR
  4367570 MX
  81 TXT
  325 X25
  9135705
  1072 SRV
  6 IXFR
  1453 AXFR
  218 ANY
++ Outgoing Queries ++
[View: default]
  3077427 A
  5991 NS
  2113 SOA
  44931 PTR
  7552045 MX
  53 TXT
  41 X25
  3218008
  426 SRV
  18 ANY
[View: _bind]
[View: _meta]
++ Name Server Statistics ++
  108267472 IPv4 requests received
  3342 requests with EDNS(0) received
  5600 TCP requests received
  108051102 responses sent
  4972 truncated responses sent
  3342 responses with EDNS(0) sent
  98180939 queries resulted in successful answer
  101089523 queries resulted in authoritative answer
  5075782 queries resulted in non authoritative answer
  7 queries resulted in referral answer
  3987640 queries resulted in nxrrset
  1885481 queries resulted in SERVFAIL
  3996719 queries resulted in NXDOMAIN
  5660199 queries caused recursion
  207266 duplicate queries received
  7610 queries dropped
  1456 requested transfers completed
++ Zone Maintenance Statistics ++
  9833 IPv4 notifies sent
  301 IPv4 notifies received
  268 notifies rejected
  315214 IPv4 SOA queries sent
  6 IPv4 AXFR requested
  23 IPv4 IXFR requested
  29 transfer requests succeeded
++ Resolver Statistics ++
[Common]
  570 mismatch responses received
  151245 failures in opening query sockets
[View: default]
  13714283 IPv4 queries sent
  186770 IPv6 queries sent
  10815900 IPv4 responses received
  31 IPv6 responses received
  123548 NXDOMAIN received
  955379 SERVFAIL received
  33013 FORMERR received
  806336 other errors received
  382773 EDNS(0) query failures
  442 truncated responses received
  751147 lame delegations received
  4759160 query retries
  3103740 query timeouts
  546721 IPv4 NS address fetches
  1168510 IPv6 NS address fetches
  80562 IPv4 NS address fetch failed
  1158909 IPv6 NS address fetch failed
  1527841 queries with RTT < 10ms
  4509306 queries with RTT 10-100ms
  3619163 queries with RTT 100-500ms
  518078 queries with RTT 500-800ms
  493598 queries with RTT 800-1600ms
  147945 queries with RTT > 1600ms
[View: _bind]
[View: _meta]
++ Cache DB RRsets ++
[View: default