Re: File Descriptor limit and malfunction bind

2010-01-05 Thread Imri Zvik
On Sunday 03 January 2010 16:36:06 Ram Akuka wrote:
 I have a high-load DNS server running BIND 9.4.3 on RH.
 Yesterday we experienced a problem with BIND (named froze), and
 when looking at the logs I saw the following error:
 named error: socket: file descriptor exceeds limit (4096/4096)
 I checked my OS file descriptor limit using ulimit -n, and it is 1024.
 Where does the number 4096 come from?

If I'm not mistaken, you should either recompile with a higher value for 
ISC_SOCKET_MAXSOCKETS or restart named with the -S maxsockets argument.
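
The 4096 in the error differs from the 1024 reported by ulimit -n, which suggests it is named's own socket cap (the one the suggestion above adjusts) rather than the OS limit. As a rough way to see the OS side of the picture, here is a minimal Python sketch, not part of BIND, that prints the soft and hard descriptor limits of the process it runs in. The function name describe_nofile_limits is made up for illustration.

```python
import resource


def describe_nofile_limits():
    """Return the (soft, hard) open-file limits of this process as strings."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    def fmt(value):
        # RLIM_INFINITY is the sentinel the OS uses for "no limit".
        return "unlimited" if value == resource.RLIM_INFINITY else str(value)

    return fmt(soft), fmt(hard)


if __name__ == "__main__":
    soft, hard = describe_nofile_limits()
    print(f"nofile soft={soft} hard={hard}")
```

Run from the same shell that starts named, this shows the limits named inherits at startup. It does not show any limit named sets for itself afterwards.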

___
bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users

[BUG] bind crash in statschannel.c

2010-01-05 Thread Marinescu Paul dan

bind (9.6.1-P2) dies when one tries to retrieve statistics via HTTP from the 
statistics-channel feature if an underlying call to libxml fails (returns a NULL 
pointer) at statschannel.c:720: writer = xmlNewTextWriterDoc(doc, 0);

gdb stack trace attached

Paul

GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type show copying
and show warranty for details.
This GDB was configured as i486-linux-gnu...
Reading symbols from /lib/i686/cmov/libcrypto.so.0.9.8...done.
Loaded symbols for /lib/i686/cmov/libcrypto.so.0.9.8
Reading symbols from /usr/lib/libxml2.so.2...done.
Loaded symbols for /usr/lib/libxml2.so.2
Reading symbols from /lib/tls/i686/cmov/libc.so.6...Reading symbols from 
/usr/lib/debug/lib/tls/i686/cmov/libc-2.9.so...done.
done.
Loaded symbols for /lib/tls/i686/cmov/libc.so.6
Reading symbols from /lib/tls/i686/cmov/librt.so.1...Reading symbols from 
/usr/lib/debug/lib/tls/i686/cmov/librt-2.9.so...done.
done.
Loaded symbols for /lib/tls/i686/cmov/librt.so.1
Reading symbols from /lib/tls/i686/cmov/libdl.so.2...Reading symbols from 
/usr/lib/debug/lib/tls/i686/cmov/libdl-2.9.so...done.
done.
Loaded symbols for /lib/tls/i686/cmov/libdl.so.2
Reading symbols from /usr/lib/libelf.so.1...done.
Loaded symbols for /usr/lib/libelf.so.1
Reading symbols from /usr/lib/libstdc++.so.6...done.
Loaded symbols for /usr/lib/libstdc++.so.6
Reading symbols from /lib/tls/i686/cmov/libm.so.6...Reading symbols from 
/usr/lib/debug/lib/tls/i686/cmov/libm-2.9.so...done.
done.
Loaded symbols for /lib/tls/i686/cmov/libm.so.6
Reading symbols from /lib/libgcc_s.so.1...done.
Loaded symbols for /lib/libgcc_s.so.1
Reading symbols from /lib/ld-linux.so.2...Reading symbols from 
/usr/lib/debug/lib/ld-2.9.so...done.
done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libz.so.1...done.
Loaded symbols for /lib/libz.so.1
Reading symbols from /lib/tls/i686/cmov/libpthread.so.0...Reading symbols from 
/usr/lib/debug/lib/tls/i686/cmov/libpthread-2.9.so...done.
done.
Loaded symbols for /lib/tls/i686/cmov/libpthread.so.0
Core was generated by `/home/paul/testing/bind-9.6.1-P2/bin/named/named -m 
record,size,mctx -c'.
Program terminated with signal 6, Aborted.
[New process 17359]
#0  0xb800e430 in __kernel_vsyscall ()
(gdb) bt full
#0  0xb800e430 in __kernel_vsyscall ()
No symbol table info available.
#1  0xb7c046d0 in *__GI_raise (sig=6) at 
../nptl/sysdeps/unix/sysv/linux/raise.c:64
resultvar = value optimized out
pid = -1210875916
selftid = 17359
#2  0xb7c06098 in *__GI_abort () at abort.c:88
act = {__sigaction_handler = {sa_handler = 0xb802bff4, sa_sigaction = 
0xb802bff4}, sa_mask = {__val = {200, 158227552, 
  3084089632, 0, 3218258208, 3218258196, 3084095808, 3218258136, 
3083128550, 3218258272, 3087189616, 134520972, 
  3218258120, 0, 0, 3084765040, 150, 3084249268, 3084764937, 3085393908, 
83125478, 3085393908, 3084256000, 3085400992, 
  3218258216, 3084255320, 158227560, 3084256000, 0, 4294967295, 3085364480, 
3218258288}}, sa_flags = -1076709096, 
  sa_restorer = 0xbfd2b9b8}
sigs = {__val = {32, 0 repeats 31 times}}
#3  0x0805b5ed in assertion_failed (file=0x81e5954 statschannel.c, line=721, 
type=isc_assertiontype_insist, 
cond=0x81e59d0 xmlrc = 0) at ./main.c:161
No locals.
#4  0x08075e45 in generatexml (server=0xb7a2b018, buflen=0xbfd2be8c, 
buf=0xbfd2be90) at statschannel.c:721
boottime = 2009-12-16T19:01:48Z
nowstr = 2009-12-16T19:02:00Z
now = {seconds = 1260990120, nanoseconds = 623889000}
writer = (xmlTextWriterPtr) 0x0
doc = value optimized out
xmlrc = 0
view = value optimized out
dumparg = {type = 3087190056, arg = 0x30313032, ncounters = 
-1210875916, counterindices = 0xbfd2bef4, 
  countervalues = 0xbfd2bf08}
cachestats = value optimized out
nsstat_values = {953482756112, 13229877028469080104, 154755098322, 
673625797440831776, 13232763590089375780, 
  13246071614975508696, 679868415230579008, 584259502379892770, 956563732520, 
673625797440831528, 146165163730, 
  673625797440831760, 13232763418290683938, 584273395522386984, 9492013999826, 
13822318495170585830, 585299528710621346, 
  13822319047924267048, 13822318787302077208, 13232577669628191974, 
13822319426018655812, 579440689730993240, 
  13822319391658917444, 966419115570774080, 13259015910889536648, 
966419115570753984, 13259015910889536664, 
  577768293754909726, 13251703746116836712, 13822110218024942864, 
577761572135492968, 13259371310710489700, 20261403844, 
  13259020261691407636, 13259015910889536728, 577768293754909726}
resstat_values = {13259301299658162176, 13822110218161667296, 
13251704123403581107, 60205065653, 13239934711087615248, 
  13242563351467441680, 13822319181205520160, 

Re: File Descriptor limit and malfunction bind

2010-01-05 Thread Kevin Darcy

Shumon Huque wrote:

On Mon, Jan 04, 2010 at 01:43:52PM -0500, Kevin Darcy wrote:
  
named seems to use, by default, the OS hard limit on file descriptors, 
even though the ARM says "The default is unlimited." When it starts 
up as superuser, in theory it should be able to set both the hard and 
soft limit to infinity, but it doesn't appear to be doing that, at 
least it doesn't on Solaris.



This is not my experience on Solaris 10. According to the code, if
undefined in the config file, it's raising them to RLIM_INFINITY 
(lib/isc/unix/resource.c), and that's what I observe on my servers:


$ plimit `pgrep named`
23385:  /usr/local/sbin/named
   resource              current         maximum
  time(seconds)         unlimited       unlimited
  file(blocks)          unlimited       unlimited
  data(kbytes)          unlimited       unlimited
  stack(kbytes)         unlimited       unlimited
  coredump(blocks)      unlimited       unlimited
  nofiles(descriptors)  unlimited       unlimited
  vmemory(kbytes)       unlimited       unlimited

The invoking environment had nofiles settings of 256 (soft) and
65536 (hard) respectively, which appear to be the OS defaults.

  
I was accidentally running a very old version of BIND on my test box 
(9.3.2), even though I was quoting the documentation from a later version.


You are correct: since at least 9.4.3-P4 (the lowest-numbered supported 
version of BIND), named seems to raise the limit to infinity, as the 
documentation states.
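
For readers who want to see the soft/hard distinction discussed above in action, here is a minimal Python sketch, an illustration only and not BIND's resource.c logic. It raises the current process's soft descriptor limit to its hard limit, which needs no special privilege. Raising the hard limit itself generally requires superuser rights, which is why named can only do so when started as root. The function name raise_nofile_soft_limit is hypothetical.

```python
import resource


def raise_nofile_soft_limit():
    """Raise the soft open-file limit to the hard limit.

    Returns (old_soft, new_soft) as reported by getrlimit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != hard:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
        except (ValueError, OSError):
            # Some platforms refuse certain values (for example an
            # unlimited soft limit); leave the limit unchanged then.
            pass
    new_soft, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, new_soft


if __name__ == "__main__":
    old, new = raise_nofile_soft_limit()
    print(f"soft nofile limit: {old} -> {new}")
```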


Sorry for the confusion.

- Kevin



Re: [BUG] bind crash in statschannel.c

2010-01-05 Thread Evan Hunt
 bind (9.6.1-P2) dies when one tries to retrieve statistics via HTTP from
 the statistics-channel feature if an underlying call to libxml fails
 (returns a NULL pointer) at statschannel.c:720: writer =
 xmlNewTextWriterDoc(doc, 0);

Thank you, we'll look into it.  Please note, though, bug reports should be
sent to bind9-b...@isc.org, not bind-users.

-- 
Evan Hunt -- e...@isc.org
Internet Systems Consortium, Inc.


9.4.3 oddities

2010-01-05 Thread Imri Zvik
Hi,

We've recently upgraded our caching servers to 9.4.3-P4/P3 (two of them running 
9.4.3-P4 and two running 9.4.3-P3). A few days ago I noticed something 
strange: when the server is loaded, some queries randomly fail (SERVFAIL). 
It seems that only queries for which the answer is NOT cached are affected.
I've verified with host/dig and tcpdump that there is no network issue (no 
unanswered packets). Digging deeper, I found that the problem appears when 
the number of sockets used by named approaches roughly 1024 (checked with 
netstat/lsof). The weirdest part is that if I run rndc reconfig, named is 
suddenly able to use more than 1024 sockets (I've seen it using roughly 
4000-5000 sockets), and the problem goes away for about an hour.
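
As an alternative to netstat/lsof for watching the socket count, here is a minimal Python sketch that counts a process's socket descriptors by reading /proc. It assumes a Linux-style /proc filesystem, so it would apply to the RHEL hosts but not to FreeBSD as-is, and it needs permission to read the target process's fd directory (same user or root). The function name count_sockets is hypothetical.

```python
import os


def count_sockets(pid):
    """Count the open descriptors of `pid` that are sockets (Linux /proc)."""
    fd_dir = f"/proc/{pid}/fd"
    count = 0
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir and readlink.
            continue
        # Socket descriptors appear as links of the form "socket:[inode]".
        if target.startswith("socket:"):
            count += 1
    return count


if __name__ == "__main__":
    print(count_sockets(os.getpid()))
```

Calling it periodically with named's PID would show whether the count stalls near 1024 before the SERVFAILs begin.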

If I downgrade to 3.4.2-P2, the problem goes away.

I used the following command to reproduce the problem:
for i in {1..10}; do dig mx www.cnn.com @localhost |grep status |grep -v NOERROR; done
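
The same reproduction can be scripted so that the response codes are tallied instead of eyeballed. This is a minimal Python sketch under a few assumptions: dig is on the PATH, its header line contains "status: <RCODE>" as usual, and the helper names (extract_status, tally_statuses, run_dig) are made up for illustration.

```python
import re
import subprocess
from collections import Counter

# dig prints a header such as:
#   ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 12345
STATUS_RE = re.compile(r"status: (\w+)")


def extract_status(dig_output):
    """Return the RCODE from dig's header line, or None if absent."""
    match = STATUS_RE.search(dig_output)
    return match.group(1) if match else None


def tally_statuses(outputs):
    """Count how often each status occurs across several dig outputs."""
    return Counter(extract_status(text) for text in outputs)


def run_dig(name="www.cnn.com", rtype="mx", server="localhost"):
    """Run one dig query and return its standard output as text."""
    result = subprocess.run(
        ["dig", rtype, name, f"@{server}"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


if __name__ == "__main__":
    counts = tally_statuses(run_dig() for _ in range(10))
    print(dict(counts))
```

A query that times out produces no status line, so it is counted under None rather than under SERVFAIL.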

My servers are running RHEL 5.4 (2.6.18-164.9.1.el5) and FreeBSD 7.0 (the 
problem is seen on both), and they are split across two unrelated networks 
in two separate physical locations.

I've compiled bind from the vanilla ISC sources using the following configure 
command:

./configure --enable-threads --enable-largefile --prefix=/usr/local

I've also tried the following (after raising the OS limits, of course), because 
I was seeing the "general: error: socket: file descriptor exceeds limit 
(4096/4096)" error a couple of days ago:

STD_CDEFINES=-DISC_SOCKET_FDSETSIZE=1048576 ./configure --enable-threads --enable-largefile --prefix=/usr/local

My best guess is that the problem is related to the recent move to epoll...

Any ideas on how I should proceed from here? 