On 26/01/2012 14:37, Mark Zealey wrote:
I've tried reproducing this by running long auth queries in SQL and KILLing
them on the server, by restarting the MySQL service, and by setting max auth
workers to 1 and running 2 sessions at the same time (with long-running auth
queries), but to no effect. There must be something else going on here. I saw
it in particular when exim on our frontend servers had queued a large number
of messages and suddenly released them all at once, hence the auth-worker
hypothesis, although the log messages do not support it. I'll try to see if I
can trigger this manually, although we have previously done some massively
parallel testing and not seen it.


Could it be a *timeout* rather than a lack of worker processes? The theory would be that disk I/O starvation makes other processes slow to respond, so the worker is *alive* but doesn't return a response quickly enough, which in turn causes the "unknown user" message?
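
To make the theory concrete, here is a rough Python sketch of how a slow but healthy worker can look like a failed lookup to an impatient caller. It is only an illustration of the idea: slow_lookup and authenticate are made-up names, and nothing here reflects how dovecot actually implements its auth workers or whether it really maps a timeout to "unknown user".

```python
import concurrent.futures
import time


def slow_lookup(user, delay):
    """Hypothetical stand-in for an auth worker whose query is stalled by disk I/O."""
    time.sleep(delay)
    return {"user": user, "found": True}


def authenticate(user, delay, timeout):
    """Return 'ok' if the worker answers within `timeout` seconds, else 'unknown user'."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(slow_lookup, user, delay)
    try:
        future.result(timeout=timeout)
        return "ok"
    except concurrent.futures.TimeoutError:
        # The worker is still alive and will finish, but the caller has given up.
        return "unknown user"
    finally:
        # Do not block on the slow worker; let it finish in the background.
        pool.shutdown(wait=False)
```

If something like this is happening, killing queries or restarting mysql would not reproduce it, but slowing the box down under heavy disk load might.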

You could try a different disk I/O scheduler, or ionice, to limit the effect of these big bursts of disk activity on other processes?
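
As a starting point, something like the Python sketch below could show which scheduler a disk is using and build an ionice command for the MTA. The helper names and the default device "sda" are my own assumptions. The sysfs line lists the available schedulers with the active one in square brackets, and as far as I know ionice priorities are only honoured by some schedulers (CFQ, for example), so check that first.

```python
from pathlib import Path


def active_scheduler(text):
    """Return the scheduler shown in [brackets] in a sysfs scheduler line, or None."""
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            return name[1:-1]
    return None


def read_scheduler(device="sda"):
    """Read the active scheduler for `device` (the device name is an assumption)."""
    path = Path(f"/sys/block/{device}/queue/scheduler")
    return active_scheduler(path.read_text())


def ionice_command(pid, io_class=2, level=7):
    """Build an ionice command: class 2 is best-effort, level 7 is its lowest priority."""
    return ["ionice", "-c", str(io_class), "-n", str(level), "-p", str(pid)]
```

The list from ionice_command can be passed to subprocess.run as root, using the pid of the exim process you want to deprioritise.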

(Most MTAs, such as postfix and qmail, do a lot of fsyncs. This causes a lot of I/O activity and could easily starve other processes on the same box?)


Good luck

Ed W
