I've tried reproducing this by running long-running auth queries in the SQL and KILLing them on the server, by restarting the mysql service, and by setting max auth workers to 1 and running 2 sessions at the same time (with long-running auth queries), but to no effect. There must be something else going on here. I saw it in particular when exim on our frontend servers had queued a large number of messages and suddenly released them all at once, hence the auth-worker hypothesis, although the log messages do not support this. I'll try to see if I can trigger this manually, although we have done some massively parallel testing previously and not seen this.
Mark
________________________________________
From: Timo Sirainen [t...@iki.fi]
Sent: 26 January 2012 12:31
To: Mark Zealey
Cc: dovecot@dovecot.org
Subject: Re: [Dovecot] auth-worker temporary failures causing lmtp 500 rejection

On 26.1.2012, at 12.14, Mark Zealey wrote:

> I'm using dovecot 2.0.16 with a mysql user database. From time to time when
> we have a big influx of messages (perhaps more than 30 concurrent rcpt to:<>
> sessions at the same time so no auth-workers free?) or when we have a
> transient issue connecting to the database server, we see the message:
>
> Jan 25 16:38:23 mailbox dovecot: auth-worker: sql(f...@bar.com,1.2.3.4):
> Unknown user

This happens only when the SQL query doesn't return any rows, but does return success.

> and the lmtp process returns:
>
> 550 5.1.1 <f...@bar.com> User doesn't exist: f...@bar.com
>
> This would be correct for a permanent error where the user doesn't exist in
> our database, however it seems to be doing this on transient errors too. Is
> this an issue with the code or perhaps some setting I have missed?

The problem is that temporary errors are returning "unknown user". Can you reproduce this somehow? Like if you stop MySQL it always returns that "Unknown user"?
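
For reference, the distinction discussed above (a query that succeeds but returns no rows, versus a query that fails outright) can be sketched roughly as follows. This is only an illustration in Python, not Dovecot's actual code: `LookupResult`, `lookup_user`, `lmtp_reply` and the `run_query` callable are hypothetical names, and the wording of the 250 and 451 replies is an assumption. Only the 550 5.1.1 text is taken from the thread.

```python
from enum import Enum
from typing import Callable, Sequence


class LookupResult(Enum):
    FOUND = "found"
    UNKNOWN_USER = "unknown_user"  # query succeeded but returned zero rows
    TEMP_FAIL = "temp_fail"        # query itself failed (DB down, killed, timeout)


def lookup_user(run_query: Callable[[str], Sequence[tuple]], user: str) -> LookupResult:
    """Classify a user lookup. `run_query` is a hypothetical stand-in for the SQL call."""
    try:
        rows = run_query(user)
    except Exception:
        # Assumption: any error raised by the query is transient.
        # A real implementation would catch the driver's specific error class.
        return LookupResult.TEMP_FAIL
    return LookupResult.FOUND if rows else LookupResult.UNKNOWN_USER


def lmtp_reply(result: LookupResult, user: str) -> str:
    """Map a lookup result to an LMTP RCPT TO reply line."""
    if result is LookupResult.FOUND:
        return f"250 2.1.5 <{user}> OK"  # assumed wording
    if result is LookupResult.UNKNOWN_USER:
        # Permanent failure: the sender should not retry.
        return f"550 5.1.1 <{user}> User doesn't exist: {user}"
    # Temporary failure: a 4xx reply lets the sending MTA queue and retry.
    return f"451 4.3.0 <{user}> Temporary internal error"  # assumed wording


def broken_query(user: str) -> Sequence[tuple]:
    """Simulates a lost database connection."""
    raise ConnectionError("lost connection to database")


if __name__ == "__main__":
    print(lmtp_reply(lookup_user(lambda u: [], "nobody@example.com"), "nobody@example.com"))
    print(lmtp_reply(lookup_user(broken_query, "someone@example.com"), "someone@example.com"))
```

The behaviour reported in this thread would correspond to the failure path ending up in `UNKNOWN_USER` instead of `TEMP_FAIL`, so the sender gets a permanent 550 rather than a retryable 4xx.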