Thank you for your reply.

On Wed, Jul 07, 2010 at 02:11:40PM -0700, Phil Pennock wrote:
> On 2010-07-07 at 09:34 +0200, Matthias Foerste wrote:
> > recently we had a problem with an exim installation at one of our
> > customers. Remote connections would just time out, while local
> > connections still worked. We didn't have much time to investigate,
> > restarted the service and could connect again.
> >
> > Afterwards we noticed some old exim processes (children of init) just
> > sitting there for days.
>
> If it's a child of init, then it got reparented, perhaps when you killed
> the old Exim?
>

That's quite possible. Currently we don't have any of these processes
running anymore, though, because I think I may have found the problem.

Yesterday I got around to getting a backtrace from one of these processes:

(gdb) bt
#0  0x00007f3c06e5a02e in __lll_lock_wait_private () from /lib/libc.so.6
#1  0x00007f3c06e0ebad in _L_lock_1593 () from /lib/libc.so.6
#2  0x00007f3c06e0e976 in __tz_convert () from /lib/libc.so.6
--> #3  0x0000000000466e9d in tod_stamp (type=1) at tod.c:81
#4  0x00000000004379c0 in log_write (selector=0, flags=8,
    format=0x4b2801 "%s") at log.c:737
#5  0x000000000041dffc in usr1_handler (sig=<value optimized out>)
    at exim.c:158
#6  <signal handler called>
#7  0x00007f3c06e4b4d7 in munmap () from /lib/libc.so.6
#8  0x00007f3c06df0fb2 in _IO_setb_internal () from /lib/libc.so.6
#9  0x00007f3c06defbb5 in _IO_new_file_close_it () from /lib/libc.so.6
#10 0x00007f3c06de2e30 in fclose@@GLIBC_2.2.5 () from /lib/libc.so.6
#11 0x00007f3c06e0fd7c in __tzfile_read () from /lib/libc.so.6
#12 0x00007f3c06e0e79e in tzset_internal () from /lib/libc.so.6
#13 0x00007f3c06e0e997 in __tz_convert () from /lib/libc.so.6
--> #14 0x0000000000466e9d in tod_stamp (type=1) at tod.c:81
#15 0x00000000004148a9 in post_process_one (addr=0x6eaeb8, result=4096,
    logflags=0, driver_type=-1, logchar=892219441) at deliver.c:700
#16 0x00000000004199b6 in deliver_message (id=0x7fffffffddba "1OVnQj-0003wD-9q",
    forced=<value optimized out>, give_up=<value optimized out>)
    at deliver.c:2531
#17 0x000000000042234e in main (argc=3, cargv=0x7fffffffd538) at exim.c:3972
(gdb) quit

It looks like __tz_convert() takes a lock. The first call (frame #13)
acquires it, but then the signal handler interrupts, calls tod_stamp()
again, and the second __tz_convert() (frame #2) waits forever for a lock
that the interrupted code can never release.

Someone ran 'watch -n 10 exiwhat' inside a screen session, and apparently
exiwhat caught the exim listener while it was in tod_stamp() about once
per week.

> What does "exiwhat" say?  This is a tool supplied with Exim which should
> ask all the Exim processes what they're currently doing.
>
> -Phil
>
> --
> ## List details at http://lists.exim.org/mailman/listinfo/exim-users
> ## Exim details at http://www.exim.org/
> ## Please use the Wiki with this list - http://wiki.exim.org/

--
Matthias Förste
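
To illustrate the deadlock described above: localtime() and its relatives
are not on the POSIX list of async-signal-safe functions, so calling them
(directly or via a logging routine) from a signal handler can block on a
lock that the interrupted code already holds. One common way around this is
to let the handler only set a flag and do the timestamping later from
normal context. The following is a rough sketch of that pattern, not Exim's
actual code or its eventual fix; the names status_requested,
usr1_flag_handler, install_usr1_flag_handler and service_status_request
are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L

#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical flag: set by the handler, consumed from normal context.
   volatile sig_atomic_t is the type the C standard allows a handler to write. */
static volatile sig_atomic_t status_requested = 0;

/* The handler only records that SIGUSR1 arrived. It calls no time
   conversion, stdio or allocation functions, so it cannot wait on a
   libc lock held by the code it interrupted. */
static void usr1_flag_handler(int sig)
{
    (void)sig;
    status_requested = 1;
}

/* Install the flag-setting handler for SIGUSR1.
   Returns 0 on success, -1 on failure (as sigaction() does). */
int install_usr1_flag_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = usr1_flag_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Call this from normal (non-signal) context, for example once per
   iteration of the main loop. If a request is pending, clear it and
   write "<timestamp> <what>" into buf.
   Returns 1 if a line was written, 0 if nothing was pending. */
int service_status_request(char *buf, size_t len, const char *what)
{
    time_t now;
    struct tm tmbuf;
    char stamp[32];

    if (!status_requested)
        return 0;
    status_requested = 0;

    /* Safe here: we are outside the handler, so the timezone lock
       cannot already be held further down our own call stack. */
    now = time(NULL);
    if (localtime_r(&now, &tmbuf) == NULL ||
        strftime(stamp, sizeof stamp, "%Y-%m-%d %H:%M:%S", &tmbuf) == 0)
        strcpy(stamp, "(time unavailable)");

    snprintf(buf, len, "%s %s", stamp, what);
    return 1;
}
```

A process would call install_usr1_flag_handler() once at startup and then
service_status_request() at points where it is safe to log. The cost is
that the status line is written slightly later than the signal arrives,
and a process stuck in a long blocking call will not answer until it
returns to such a point.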
