Date: Mon, 05 Mar 2001 11:32:01 -0500
   From: Irelann Kerry Anderson <[EMAIL PROTECTED]>

   We recently converted our main mail server (30,000+ users) from
   cyrus-1.6 to cyrus-2.0.12.  We had converted a smaller server
   (6000+ users) to 2.0.9 some time earlier.  We had also tried 2.0.9
   on this larger server, but that version has severe performance
   problems with that many mailboxes.

   Things looked pretty good initially, but after a few days, it
   stopped responding to POP and IMAP requests.  lsof and ps showed
   hundreds of lmtpd processes, and the number was growing.  About
   that time we could get no response at all from the machine and were
   forced to reboot before we could gather more information.

   This has happened 4 more times since, at intervals of 1 to 4 days
   (always during off hours, although that may not be significant).
   One of these times I was able to get in and send a TERM signal to
   the master process.  Everything shut down cleanly, and things
   worked fine once I restarted the master process.  From this it
   appears that when a process is aborted in this fashion, some
   resource remains locked, causing all new processes (lmtpd, imapd
   and pop) to hang.

This is consistent with a lock being held in the Berkeley db
environment when a process crashes.

   On examining the logs, I found that each of these incidents was
   immediately preceded by the message:

   "signaled to death by 6"

   Four times the process in question was imapd; once it was lmtpd.
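
To count how often this has happened, the log can be searched for that
message.  This is only a sketch: the helper name is made up, and the
log path in the usage line is an assumption (use wherever syslog sends
your Cyrus messages).

```shell
# Hypothetical helper: count lines containing the abort message in a log file.
count_aborts() {
    log_file="$1"
    # grep -c prints the number of matching lines (0 if there are none).
    grep -c 'signaled to death by 6' "$log_file"
}

# Example usage (the path is an assumption):
#   count_aborts /var/log/messages
```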

Signal 6 on my Linux system is SIGABRT, which is usually caused by an
assert() failing or an abort() call.  This should always dump core.
Since imapd does chdir(), it could be dumping core in some user's
mailbox; I'd run a 

find /var/spool/imap -type f -name core

to track down the core files and find out what's causing them if they
exist (I'm sure you'll have some with that many users).
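
A rough sketch of both checks follows.  The helper name is made up, and
the 'core.*' pattern is an assumption for systems configured to append
the process id to core file names; drop it if it matches real mail
files on your system.

```shell
# Print the name of signal 6 (ABRT on Linux).
kill -l 6

# Hypothetical helper: list core files under a spool directory.
# Defaults to the spool path used in the find command above.
find_cores() {
    spool_dir="${1:-/var/spool/imap}"
    find "$spool_dir" -type f \( -name core -o -name 'core.*' \) -print
}

# Example usage:
#   find_cores /var/spool/imap
```

Running file(1) on anything this turns up should report which program
dumped it, which helps separate imapd cores from lmtpd ones.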

   There was no core file produced.  I've since changed the startup
   script to cd into a directory writable by cyrus and removed the
   "ulimit -c 0" from it, but I've not yet gotten a core file to
   look at.

I'm surprised the lmtpd didn't dump core into that directory.
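
Before waiting for the next crash, it may be worth confirming that a
core file can be written at all.  A minimal sketch, meant to be run as
the cyrus user from the same environment as the startup script; the
function name and the example directory are hypothetical.

```shell
# Hypothetical helper: report whether a core file could be written to a directory.
check_core_setup() {
    dir="$1"
    # The directory must exist and be writable by the current user.
    if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
        echo "$dir is not a writable directory"
        return 1
    fi
    # A core size limit of 0 suppresses core files entirely.
    limit=$(ulimit -c)
    if [ "$limit" = "0" ]; then
        echo "core size limit is 0; core dumps are disabled"
        return 1
    fi
    echo "ok: core limit is $limit, $dir is writable"
}

# Example usage (the directory is an assumption):
#   check_core_setup /var/imap/cores
```

Note that the limit is inherited from the shell that starts the master
process, so the check only means something when run from that same
environment.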

   In the meantime, I'm posting this to the list on the off chance
   someone else has seen and debugged this problem.

   The mail server is a dual Pentium III 500 with 1GB RAM and 100GB
   hardware RAID, running RedHat 7.0 with all current updates applied
   except the kernel, which is kernel-smp-2.2.16-22.

Since you may be somewhat desperate with this many users, I'll mention
that it's possible to run Cyrus v2 with /var/imap/mailboxes.db stored
as a flat file instead of as a Berkeley db file.

Doing this conversion may solve the symptom but not the problem, and
will also cause your CREATE/RENAME/etc. performance to be
approximately what it is with v1.6.  If you can't debug this, we can
talk about how to make this change.

Larry

