Hello, I see a very strange behaviour with exim on one x86_64 SLES10 Linux system. At the moment a daemon (exim -q15m -bd or for tests exim -q15s) does not start the periodic queue runners. Analysing the logs shows that this misfeature started after updating to http://download.opensuse.org/repositories/server:/mail/SLE_10/x86_64/exim-4.67-14.1.x86_64.rpm and restarting the exim daemon afterwards (12 days ago). The queue runners were started correctly with the previous also 4.67 based package. Release does not seem to be the cause anyway, as I tried with different binary packages (including 4.51 from openSUSE 10.1 distribution) and manually built from streight exim.org sources (4.67, 4.66) - it's always the same on this running system. I cannot/don't want to reboot it at the moment. Uptime is much longer than the start of this problem, as is last glibc update.
Now some debugging data. problem-systemi:~ # exim -d -q15s Exim version 4.67 uid=0 gid=0 pid=23684 D=fbb95cfd Berkeley DB: Sleepycat Software: Berkeley DB 4.3.29: (June 16, 2006) Support for: crypteq iconv() IPv6 OpenSSL move_frozen_messages Content_Scanning Old_Demime [...] originator: uid=0 gid=0 login=root name=root 23721 changed uid/gid: running as a daemon 23721 uid=8 gid=12 pid=23721 23721 auxiliary group list: 12 23721 LOG: MAIN 23721 exim 4.67 daemon started: pid=23721, -q15s, not listening for SMTP 23721 set_process_info: 23721 daemon: -q15s, not listening 23721 daemon running with uid=8 gid=12 euid=8 egid=12 23721 SIGALRM received 23722 Starting queue-runner: pid 23722 23722 exec /usr/sbin/exim -q 23721 1 queue-runner process running 23721 child 23722 ended: status=0x0 23721 0 queue-runner processes now running and that's it there - nothing more happens ok-system:~ # exim -d -q15s Exim version 4.67 uid=0 gid=0 pid=13291 D=fbb95cfd Berkeley DB: Sleepycat Software: Berkeley DB 4.3.29: (June 16, 2006) Support for: crypteq iconv() IPv6 OpenSSL move_frozen_messages Content_Scanning Old_Demime [...] originator: uid=0 gid=0 login=root name=root 13311 changed uid/gid: running as a daemon 13311 uid=8 gid=12 pid=13311 13311 auxiliary group list: 12 13311 LOG: MAIN 13311 exim 4.67 daemon started: pid=13311, -q15s, not listening for SMTP 13311 set_process_info: 13311 daemon: -q15s, not listening 13311 daemon running with uid=8 gid=12 euid=8 egid=12 13311 SIGALRM received 13312 Starting queue-runner: pid 13312 13312 exec /usr/sbin/exim -q 13311 1 queue-runner process running 13311 child 13312 ended: status=0x0 13311 0 queue-runner processes now running 13311 SIGALRM received 13313 Starting queue-runner: pid 13313 13313 exec /usr/sbin/exim -q 13311 1 queue-runner process running 13311 child 13313 ended: status=0x0 13311 0 queue-runner processes now running last 6 lines repeated every 15 seconds as it should be. Output of "strace exim -d -q15s" (for around 40s) starting with the first or only output of "SIGALARM received" (which is simulated in daemon.c) shows the underlying difference: problematic-system: write(2, "24249 daemon running with uid=8 "..., 5424249 daemon running with uid=8 gid=12 euid=8 egid=12) = 54 write(2, "24249 SIGALRM received\n", 2324249 SIGALRM received) = 23 clone(24250 Starting queue-runner: pid 24250 24250 exec /usr/sbin/exim -q child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2b206d4102f0) = 24250 write(2, "24249 1 queue-runner process run"..., 3724249 1 queue-runner process running) = 37 alarm(15) = 0 select(0, NULL, NULL, NULL, {15, 0}) = ? ERESTARTNOHAND (To be restarted) --- SIGCHLD (Child exited) @ 0 (0) --- rt_sigaction(SIGCHLD, {SIG_DFL}, NULL, 8) = 0 rt_sigreturn(0x11) = -1 EINTR (Interrupted system call) wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 24250 write(2, "24249 child 24250 ended: status="..., 3624249 child 24250 ended: status=0x0 ) = 36 write(2, "24249 0 queue-runner processes n"..., 4324249 0 queue-runner processes now running ) = 43 wait4(-1, 0x7fff3eaa12f4, WNOHANG, NULL) = -1 ECHILD (No child processes) rt_sigaction(SIGCHLD, {0x555555576310, [], SA_RESTORER, 0x2b206cfdfbf0}, NULL, 8) = 0 select(0, NULL, NULL, NULL, {15, 0}) = 0 (Timeout) wait4(-1, 0x7fff3eaa12f4, WNOHANG, NULL) = -1 ECHILD (No child processes) select(0, NULL, NULL, NULL, {15, 0}) = 0 (Timeout) wait4(-1, 0x7fff3eaa12f4, WNOHANG, NULL) = -1 ECHILD (No child processes) select(0, NULL, NULL, NULL, {15, 0} <unfinished ...> ok-system: write(2, "13379 daemon running with uid=8 "..., 5413379 daemon running with uid=8 gid=12 euid=8 egid=12) = 54 write(2, "13379 SIGALRM received\n", 2313379 SIGALRM received) = 23 clone(13380 Starting queue-runner: pid 13380 13380 exec /usr/sbin/exim -q child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2b2f73a3c2f0) = 13380 write(2, "13379 1 queue-runner process run"..., 3713379 1 queue-runner process running) = 37 alarm(15) = 0 select(0, NULL, NULL, NULL, {15, 0}) = ? ERESTARTNOHAND (To be restarted) --- SIGCHLD (Child exited) @ 0 (0) --- rt_sigaction(SIGCHLD, {SIG_DFL}, NULL, 8) = 0 rt_sigreturn(0x11) = -1 EINTR (Interrupted system call) wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG, NULL) = 13380 write(2, "13379 child 13380 ended: status="..., 3613379 child 13380 ended: status=0x0) = 36 write(2, "13379 0 queue-runner processes n"..., 4313379 0 queue-runner processes now running) = 43 wait4(-1, 0x7fff38477f24, WNOHANG, NULL) = -1 ECHILD (No child processes) rt_sigaction(SIGCHLD, {0x555555576310, [], SA_RESTORER, 0x2b2f73609c10}, NULL, 8) = 0 select(0, NULL, NULL, NULL, {15, 0}) = ? ERESTARTNOHAND (To be restarted) --- SIGALRM (Alarm clock) @ 0 (0) --- rt_sigaction(SIGALRM, {0x555555585ca0, [], SA_RESTORER, 0x2b2f73609c10}, NULL, 8) = 0 rt_sigreturn(0xe) = -1 EINTR (Interrupted system call) wait4(-1, 0x7fff38477f24, WNOHANG, NULL) = -1 ECHILD (No child processes) write(2, "13379 SIGALRM received\n", 2313379 SIGALRM received) = 23 clone(13381 Starting queue-runner: pid 13381 13381 exec /usr/sbin/exim -q child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2b2f73a3c2f0) = 13381 write(2, "13379 1 queue-runner process run"..., 3713379 1 queue-runner process running) = 37 alarm(15) = 0 select(0, NULL, NULL, NULL, {15, 0}) = ? ERESTARTNOHAND (To be restarted) ... repeating... Any idea why SIGALRM is not properly delivered on the problematic system? Thanks, Lutz -- Lutz Preßler <[EMAIL PROTECTED]> http://www.SerNet.DE/ SerNet Service Network GmbH, Bahnhofsallee 1b, D-37081 Göttingen Tel.: +49-551-370000-2, FAX: +49-551-370000-9 AG Göttingen, HRB 2816, GF: Dr. Johannes Loxen -- ## List details at http://lists.exim.org/mailman/listinfo/exim-users ## Exim details at http://www.exim.org/ ## Please use the Wiki with this list - http://wiki.exim.org/