Subject: nscd freezes when used with libnss-ldap on busy server
Package: nscd
Version: N/A; reported 2004-05-10
Severity: critical
Justification: breaks the whole system

IMHO what I describe here is a bug in nscd when it is used with libnss-ldap. I have seen some old bug reports for Debian and Red Hat describing similar problems, along with suggestions not to use nscd and libnss-ldap together.

We have a mail server (postfix 2.1.0) running Debian Woody (all security updates applied), kernel 2.4.26 (we had the same problem with kernel 2.4.25). We use libnss-ldap with a local slapd server (a replica of our primary LDAP server) for users' accounts, so:

  test: ~$head -3 /etc/nsswitch.conf
  passwd: ldap files
  group: ldap files
  shadow: ldap files

and

  test: ~$cat /etc/libnss-ldap.conf
  host 127.0.0.1
  base ....
  ldap_version 3

The problem is that, with nscd used for password caching (default configuration), everything works fine most of the time, but the machine sometimes hangs. Tests on another test server have shown that it can happen (randomly) when postfix has to deliver mail to aliases with many (more than 100) local users.

The server then hangs (no connection is possible, even locally) but, if we were already logged in, we can see the following behaviour: everything is fine except name resolution. There is no response from ls -l, ps -ef or any other program that needs account information.

  test: ~$strace ls -l
  fstat64(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(136, 1), ...}) = 0
  old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40014000
  write(1, "total 12\n", 9total 12
  ) = 9
  socket(PF_UNIX, SOCK_STREAM, 0) = 3
  connect(3, {sin_family=AF_UNIX, path="/var/run/.nscd_socket"}, 110) = 0
  write(3, "\2\0\0\0\1\0\0\0\2\0\0\0", 12) = 12
  write(3, "0\0", 2) = 2
  read(3,

and nothing until Ctrl+C.

With the options -b -l -n, lsof still works, and

  lsof -b -l -n | grep .nscd_socket | wc -l

gives 121 open files, and

  test: ~$cat /proc/sys/fs/file-nr
  4931 2562 52425

so the number of open files should not be the problem.

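The blocked read() on the nscd socket shown above suggests a simple way to detect this state without freezing the shell. What follows is only a minimal sketch, assuming a `timeout` utility with GNU coreutils semantics (exit status 124 when the deadline expires) is installed; `probe` is a hypothetical helper name and was not run on the affected server:

```shell
# Run a command under a deadline and classify the result.
# Assumption: `timeout` behaves like GNU coreutils timeout,
# i.e. it exits with status 124 when the deadline expires.
probe() {
    limit=$1
    shift
    # Discard the command's own output; only the status matters here.
    timeout "$limit" "$@" >/dev/null 2>&1
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "HUNG after ${limit}s: $*"
    elif [ "$status" -eq 0 ]; then
        echo "OK: $*"
    else
        echo "FAILED ($status): $*"
    fi
    return "$status"
}
```

For example, `probe 5 getent passwd root` should print a HUNG line if the account lookup blocks for more than five seconds, instead of waiting for Ctrl+C.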
The per-process descriptor counts from

  for i in `pgrep nscd` ; do ls /proc/$i/fd/ | wc -l ; done

or

  for i in `pgrep slapd` ; do ls /proc/$i/fd/ | wc -l ; done

do not show too many files either, so this is not related to http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=246057.

The only way to recover the machine is to kill nscd or slapd. So far, the only way we have found to keep the server stable is NOT to use nscd.

I think this bug could also be considered a security problem, since it may lead to a local DoS.

Ciao and thanks.
Pietro

-- System Information
Debian Release: 3.0
Architecture: i386
Kernel: Linux test 2.4.26 #3 SMP Tue Apr 27 15:53:14 CEST 2004 i686 unknown
Locale: LANG=POSIX, LC_CTYPE=POSIX