We have a bunch of servers in a murder, and I recently took a look in
/var/lib/imap/cores and found core files for lmtpd/lmtpproxyd all over
the place. These seem to be generated in waves, and can occur on the
front end imap proxy servers, the incoming e-mail servers, and the
mailbox servers -- often all at the same time, perhaps 10 cores in 20
minutes ... then nothing for a month or two. The system keeps running
well enough that we have not noticed problems.
I suspect this is load related, but am not sure. Perhaps something goes
wrong on the mupdate server, and the wheels fall off downstream for a while?
Anyhow, I tracked it down to handle->conn being NULL in mupdate_noop(),
in imap/mupdate-client.c (patch below). We are running 2.4.17, but the
patch is against the current HEAD of git://git.cyrusimap.org/cyrus-imapd/ .
It stops the crash, but beyond that I have no idea whether this is a good
fix or whether it just hides a bigger problem; I don't know the code well
enough.
The call to the crashing mupdate_noop is coming from here, in lmtpd.c:
    /* get a connection to the mupdate server */
    r = 0;
    if (mhandle) {
        /* we have one already, test it */
        r = mupdate_noop(mhandle, mupdate_ignore_cb, NULL);
        if (r) {
I guess the test is failing when it crashes?
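To make the failure mode concrete, here is a minimal sketch of what I think is happening: the caller only checks that the handle itself is non-NULL, so a handle whose conn is NULL passes that check and is then dereferenced inside the noop call. The demo_* names, the struct layout and the return codes are hypothetical stand-ins, not the real Cyrus types. Writing to a FILE* stands in for prot_printf, and stderr stands in for syslog.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration only; not the real Cyrus types. */
enum { DEMO_OK = 0, DEMO_BADPARAM = 1 };

struct demo_conn {
    FILE *out;                /* stands in for the protocol output stream */
};

struct demo_handle {
    struct demo_conn *conn;   /* assumed to be NULL when the connection is gone */
    unsigned tagn;            /* command tag counter */
};

/* Same shape as the patched function: validate before dereferencing. */
int demo_noop(struct demo_handle *handle)
{
    if (!handle) {
        return DEMO_BADPARAM;
    }

    /* Without this check, handle->conn->out below dereferences NULL. */
    if (!handle->conn) {
        fprintf(stderr, "%s: no handle->conn\n", __func__);
        return DEMO_BADPARAM;
    }

    fprintf(handle->conn->out, "X%u NOOP\r\n", handle->tagn++);
    return DEMO_OK;
}
```

With the guard in place, a caller that passes a handle with a NULL conn gets an error code back and can take its normal failure path instead of dumping core. It does not explain why conn was NULL in the first place.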
Hopefully this is useful - please reply if anyone needs more info.
g
diff --git a/imap/mupdate-client.c b/imap/mupdate-client.c
index 22e0c97..1449688 100644
--- a/imap/mupdate-client.c
+++ b/imap/mupdate-client.c
@@ -520,6 +520,11 @@ EXPORTED int mupdate_noop(mupdate_handle *handle, mupdate_callback callback,
         return MUPDATE_BADPARAM;
     }
+    if (!handle->conn) {
+        syslog(LOG_ERR, "%s: no handle->conn", __func__);
+        return MUPDATE_BADPARAM;
+    }
+
     prot_printf(handle->conn->out,
                 "X%u NOOP\r\n", handle->tagn++);