signaled to death by 6
Hi fellows,

I am installing Cyrus imapd 2.2.6 on RHEL. It is compiled against the distribution-provided Cyrus SASL 2.1.15 rpm, and was configured with:

    /configure --enable-murder --with-auth=unix \
        --with-openssl=/usr/include/openssl --enable-gssapi=/usr/kerberos \
        --without-snmp

I get various compile errors unless I use "--enable-gssapi" and "--without-snmp". I ran mkimap as the cyrus user without any problem.

When I start Cyrus imapd, I get the following errors in the log:

    Jul 20 16:42:32 mupdate-dev master[18239]: about to exec /usr/cyrus/bin/ctl_cyrusdb
    Jul 20 16:42:32 mupdate-dev master[18240]: about to exec /usr/cyrus/bin/mupdate
    Jul 20 16:42:32 mupdate-dev master[18234]: process 18239 exited, signaled to death by 6
    Jul 20 16:42:32 mupdate-dev master[18234]: process 18240 exited, signaled to death by 6
    Jul 20 16:42:32 mupdate-dev master[18234]: service mupdate pid 18240 in READY state: terminated abnormally
    Jul 20 16:42:32 mupdate-dev master[18241]: about to exec /usr/cyrus/bin/mupdate
    Jul 20 16:42:32 mupdate-dev master[18234]: process 18241 exited, signaled to death by 6
    Jul 20 16:42:32 mupdate-dev master[18234]: service mupdate pid 18241 in READY state: terminated abnormally
    Jul 20 16:42:32 mupdate-dev master[18242]: about to exec /usr/cyrus/bin/mupdate
    Jul 20 16:42:32 mupdate-dev master[18234]: process 18242 exited, signaled to death by 6

It continues like that until I stop the service.

Any idea what could be causing the signal 6? From what I can gather, it is raised by abort(), but how can I figure out which assertion is failing?

Thanks for your insights!
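One way to narrow this down before reaching for a debugger is to tally which binary each abort belongs to. The following is only a sketch, written against the log format quoted in the message above: the helper name summarize_crashes and both regular expressions are hypothetical and not part of Cyrus. It assumes that the pid in "master[PID]: about to exec" is the pid later reported in "process PID exited", which is what the excerpt shows.

```python
import re
import signal
from collections import Counter

# Patterns derived from the log excerpt above (an assumption about the format).
EXEC_RE = re.compile(r"master\[(\d+)\]: about to exec (\S+)")
DEATH_RE = re.compile(r"process (\d+) exited, signaled to death by (\d+)")


def summarize_crashes(lines):
    """Count (binary, signal name) pairs for processes killed by a signal."""
    binary_by_pid = {}
    crashes = Counter()
    for line in lines:
        m = EXEC_RE.search(line)
        if m:
            # The forked child logs this line under its own pid before exec.
            binary_by_pid[m.group(1)] = m.group(2)
            continue
        m = DEATH_RE.search(line)
        if m:
            pid, signum = m.group(1), int(m.group(2))
            try:
                name = signal.Signals(signum).name
            except ValueError:
                name = f"signal {signum}"
            crashes[(binary_by_pid.get(pid, "unknown"), name)] += 1
    return crashes


SAMPLE = """\
Jul 20 16:42:32 mupdate-dev master[18239]: about to exec /usr/cyrus/bin/ctl_cyrusdb
Jul 20 16:42:32 mupdate-dev master[18240]: about to exec /usr/cyrus/bin/mupdate
Jul 20 16:42:32 mupdate-dev master[18234]: process 18239 exited, signaled to death by 6
Jul 20 16:42:32 mupdate-dev master[18234]: process 18240 exited, signaled to death by 6
""".splitlines()

if __name__ == "__main__":
    for (binary, signame), count in summarize_crashes(SAMPLE).items():
        print(f"{binary}: {count} x {signame}")
```

On the excerpt, the first process to abort is ctl_cyrusdb rather than mupdate, so the failure does not appear to be specific to the mupdate service.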
Re: Occasional "signaled to death by 6" followed by increasing numbers of hung processes
Date: Mon, 05 Mar 2001 11:32:01 -0500
From: Irelann Kerry Anderson <[EMAIL PROTECTED]>

> We recently converted our main mail server (30,000+ users) from cyrus-1.6 to cyrus-2.0.12; we had converted a smaller server (6,000+ users) to 2.0.9 some time earlier. We had tried 2.0.9 on this larger server, but that version has severe performance problems with that many mailboxes.
>
> Things looked pretty good initially, but after a few days it stopped responding to POP and IMAP requests. An lsof and a ps showed hundreds of lmtpd processes, and the number was increasing. About that time we could get no response at all from the machine and were forced to reboot before we could gather more information.
>
> This has happened 4 more times since, at intervals of 1 to 4 days (always during off hours, although that may not be significant). One of these times I was able to get in and send a TERM signal to the master process; everything shut down fine and worked fine when I restarted the master process. From this it appears that when a process is aborted in this fashion, some resource remains locked, causing all new processes (lmtpd, imapd and pop) to hang.

This is consistent with a lock being held in the Berkeley db environment when a process crashes.

> On examining the logs, I found that each of these incidents was immediately preceded by the message:
>
>     "signaled to death by 6"
>
> 4 times the process in question was imapd, once it was lmtpd.

Signal 6 on my Linux system is SIGABRT, which is usually caused by an assert() failing or an abort() call. This should always dump core. Since imapd does chdir(), it could be dumping core in some user's mailbox; I'd run a

    find /var/spool/imap -type f -name core

to track down the core files and find out what's causing them, if they exist (I'm sure you'll have some with that many users).

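To illustrate the point about abort() and signal 6, here is a minimal sketch that has nothing to do with Cyrus itself: it starts a throwaway child that calls abort() and reports the signal the parent observes, which is the same information the master logs as "signaled to death by N". The function names are hypothetical, and the core size limit is set to 0 in the child only so that the demonstration leaves no core file behind. It assumes a POSIX system.

```python
import resource
import signal
import subprocess
import sys


def _no_core():
    # Runs in the child before exec: keep the demo from leaving a core file.
    resource.setrlimit(resource.RLIMIT_CORE, (0, 0))


def run_aborting_child():
    """Start a child that calls abort() and return the signal that killed it."""
    proc = subprocess.run(
        [sys.executable, "-c", "import os; os.abort()"],
        preexec_fn=_no_core,
        stderr=subprocess.DEVNULL,
    )
    # subprocess reports death by signal N as returncode -N.
    return -proc.returncode if proc.returncode < 0 else None


if __name__ == "__main__":
    signum = run_aborting_child()
    if signum is None:
        print("child exited normally")
    else:
        print(f"child signaled to death by {signum} ({signal.Signals(signum).name})")
```

A failing assert() in C ends in the same abort() call, so the two cases look identical from the parent's side; only a core file or the process's own log output distinguishes them.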
> There was no core file produced. I've since changed the startup script to cd into a directory writable by cyrus and removed the "ulimit -c 0" from the startup script, but I've not yet gotten a core file to look at.

I'm surprised the lmtpd didn't dump core into that directory.

> In the meantime, I'm posting this to the list on the off chance someone else has seen and debugged this problem.
>
> The mail server is a dual Pentium III 500 with 1GB RAM and 100GB hardware RAID, running RedHat 7.0 with all current updates applied except the kernel, which is kernel-smp-2.2.16-22.

Since with this many users you may be somewhat desperate, I'll mention that it's possible to run Cyrus v2 using the flat file /var/imap/mailboxes.db instead of the Berkeley db-ized /var/imap/mailboxes.db. Doing this conversion may solve the symptom but not the problem, and will also cause your CREATE/RENAME/etc. performance to be approximately what it is with v1.6.

If you can't debug this, we can talk about how to make this change.

Larry
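The two preconditions mentioned above (a nonzero core size limit and a working directory the process can write to) can be checked from the environment the daemon is started in. This is only a sketch: core_dump_blockers is a hypothetical helper, it covers just these two conditions, and it is only meaningful when run as the same user and from the same startup environment as the Cyrus master.

```python
import os
import resource


def core_dump_blockers(directory):
    """Return reasons a process started in `directory` may leave no core file."""
    reasons = []
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    if soft == 0:
        reasons.append("core size limit is 0 (as after 'ulimit -c 0')")
    if not os.path.isdir(directory):
        reasons.append(f"{directory} is not a directory")
    elif not os.access(directory, os.W_OK | os.X_OK):
        # os.access checks permissions against the real uid.
        reasons.append(f"{directory} is not writable by uid {os.getuid()}")
    return reasons


if __name__ == "__main__":
    # The current directory stands in for whatever the startup script cds into.
    problems = core_dump_blockers(os.getcwd())
    if problems:
        for reason in problems:
            print("possible blocker:", reason)
    else:
        print("no blockers found among the conditions checked")
```

An empty result does not guarantee a core file will appear; it only rules out the two causes discussed in this thread.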
Occasional "signaled to death by 6" followed by increasing numbers of hung processes
We recently converted our main mail server (30,000+ users) from cyrus-1.6 to cyrus-2.0.12; we had converted a smaller server (6,000+ users) to 2.0.9 some time earlier. We had tried 2.0.9 on this larger server, but that version has severe performance problems with that many mailboxes.

Things looked pretty good initially, but after a few days it stopped responding to POP and IMAP requests. An lsof and a ps showed hundreds of lmtpd processes, and the number was increasing. About that time we could get no response at all from the machine and were forced to reboot before we could gather more information.

This has happened 4 more times since, at intervals of 1 to 4 days (always during off hours, although that may not be significant). One of these times I was able to get in and send a TERM signal to the master process; everything shut down fine and worked fine when I restarted the master process. From this it appears that when a process is aborted in this fashion, some resource remains locked, causing all new processes (lmtpd, imapd and pop) to hang.

On examining the logs, I found that each of these incidents was immediately preceded by the message:

    "signaled to death by 6"

4 times the process in question was imapd, once it was lmtpd.

There was no core file produced. I've since changed the startup script to cd into a directory writable by cyrus and removed the "ulimit -c 0" from the startup script, but I've not yet gotten a core file to look at.

In the meantime, I'm posting this to the list on the off chance someone else has seen and debugged this problem.

The mail server is a dual Pentium III 500 with 1GB RAM and 100GB hardware RAID, running RedHat 7.0 with all current updates applied except the kernel, which is kernel-smp-2.2.16-22.

--
Irelann Kerry Anderson    phone:(207)581-3508
Systems Group             internet [EMAIL PROTECTED]
UNET (formerly CAPS) Technology Services
University of Maine System