We have a bunch of servers in a murder, and I recently took a look in /var/lib/imap/cores and found core files for lmtpd/lmtpproxyd all over the place. These seem to be generated in waves, and can occur on the front-end IMAP proxy servers, the incoming e-mail servers, and the mailbox servers, often all at the same time: perhaps 10 cores in 20 minutes, then nothing for a month or two. The system keeps running well enough that we have not noticed any problems.

I suspect this is load related, but am not sure. Perhaps something goes wrong on the mupdate server, and the wheels fall off downstream for a while?

Anyhow, I tracked it down to handle->conn being NULL in mupdate_noop(), in imap/mupdate-client.c (patch below). We are running 2.4.17, but this patch is against the current HEAD of git://git.cyrusimap.org/cyrus-imapd/ . It stops the crash, but beyond that I have no idea whether this is a good thing or whether it just hides a bigger problem; I don't know the code well enough.

The call to the crashing mupdate_noop is coming from here, in lmtpd.c:

    /* get a connection to the mupdate server */
    r = 0;
    if (mhandle) {
        /* we have one already, test it */
        r = mupdate_noop(mhandle, mupdate_ignore_cb, NULL);
        if (r) {

I guess the test is failing when it crashes?
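
To make the failure mode concrete, here is a minimal standalone sketch of what I think is going on. It is not the Cyrus code: demo_handle, demo_conn, demo_noop and the DEMO_* codes are hypothetical stand-ins I made up for illustration. The assumption is that the handle itself is still valid but its conn pointer has become NULL, so writing to handle->conn->out dereferences NULL unless something checks first.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration only, NOT the real Cyrus types. */
enum { DEMO_OK = 0, DEMO_BADPARAM = -1 };

struct demo_conn {
    FILE *out;                /* where protocol output is written */
};

struct demo_handle {
    struct demo_conn *conn;   /* assumed to be NULL once the connection is gone */
    unsigned tagn;            /* next command tag number */
};

/* Send a NOOP, but refuse cleanly if there is no connection to write to. */
int demo_noop(struct demo_handle *handle)
{
    if (!handle) {
        return DEMO_BADPARAM;
    }

    /* Without this guard, the fprintf below would dereference NULL. */
    if (!handle->conn) {
        fprintf(stderr, "%s: no handle->conn\n", __func__);
        return DEMO_BADPARAM;
    }

    fprintf(handle->conn->out, "X%u NOOP\r\n", handle->tagn++);
    return DEMO_OK;
}
```

With the guard in place, a caller that passes a handle whose conn is NULL gets an error code back instead of a segfault, and can then go down whatever reconnect path it already has for a failed test. Whether that is the right recovery in lmtpd is exactly the part I am unsure about.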

Hopefully this is useful - please reply if anyone needs more info.

g
diff --git a/imap/mupdate-client.c b/imap/mupdate-client.c
index 22e0c97..1449688 100644
--- a/imap/mupdate-client.c
+++ b/imap/mupdate-client.c
@@ -520,6 +520,11 @@ EXPORTED int mupdate_noop(mupdate_handle *handle, mupdate_callback callback,
         return MUPDATE_BADPARAM;
     }
 
+    if (!handle->conn) {
+        syslog(LOG_ERR, "%s: no handle->conn", __func__);
+        return MUPDATE_BADPARAM;
+    }
+
     prot_printf(handle->conn->out,
                 "X%u NOOP\r\n", handle->tagn++);
 
