RE: Problems with qmail-remote hanging
> This problem's been reported before. If your OS says that an fd is readable via select(), then the read() should not block. As you observe though, the read is blocking so your OS is probably not telling the truth when it returns from the select(). The archives have plenty of discussion on this and the simplest solution is to put a large-value alarm() handler in qmail-remote. No one as yet seems to be able to narrow down which OSes do this and under what circumstances.

Mark,

Thanks for the reply.

I only seem to experience the problem with large mail-outs. One possibility is that because of the way qmail works, there's a significant chance that we will be making a large number of simultaneous connections to some servers. It's possible that this is causing a connection to be blackholed somewhere ... that doesn't explain why select/read are failing to agree, though. Perhaps select thinks the connection is closed, but read doesn't.

Setting an alarm is a nasty hack in my opinion, but I have to admit that it's something I considered. A slightly neater solution might be to use the SO_KEEPALIVE socket option - if it works (and there isn't a good reason not to use it) that is.

What would be better is finding out why this happens, of course.

Thanks,
Richard

P.S. If anyone is keeping track, Linux 2.2.19, concurrencyremote set to 200
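For reference, a minimal sketch of the SO_KEEPALIVE idea mentioned above. The helper name enable_keepalive() is hypothetical and is not part of qmail; it only shows the setsockopt() call that would be applied to the SMTP socket after it is created. Note that keepalive only probes a connection after it has been idle for a while (roughly two hours by default on Linux), so at best it would bound how long a dead connection lingers, and it would not help if the peer is alive but silent.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper, not qmail code: ask the kernel to send TCP
 * keepalive probes on an existing socket so that a dead peer or a
 * blackholed path is eventually reported as an error on the socket.
 * Returns 0 on success, -1 on error with errno set. */
int enable_keepalive(int fd)
{
    int on = 1;

    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on);
}
```

Whether this actually unsticks the hung read() described in this thread is untested here; it is a sketch of the option, not a confirmed fix.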
Re: Problems with qmail-remote hanging
> Setting an alarm is a nasty hack in my opinion, but I have to admit that it's something I considered.

Well, the qmail-remote connection is well and truly wedged once it's in this state, and if the select() timed out as it's meant to, qmail-remote would exit with a delivery failure indication, so it's not that bad a hack. It's also very easy to code - just a single alarm() call at the top of main().

> A slightly neater solution might be to use the SO_KEEPALIVE socket option - if it works (and there isn't a good reason not to use it) that is.

It'll be interesting to hear if this works.

> What would be better is finding out why this happens, of course.

Indeed. Does Linux offer tools/syscalls that would tell you why the select worked, but the read failed?

> P.S. If anyone is keeping track, Linux 2.2.19, concurrencyremote set to 200

I hesitate to say this, but Linux kernels seem to predominate in this regard, though that may just be because qmail is running on more Linux systems out there than on other Unixen.

Regards.
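The alarm() approach described above might look roughly like the following. This is a sketch under assumptions: arm_watchdog() and WATCHDOG_SECONDS are made-up names, the one-hour value is arbitrary, and it assumes nothing else in the program relies on alarm() or SIGALRM.

```c
#include <signal.h>
#include <unistd.h>

/* Assumed value, not taken from the thread: it should be comfortably
 * longer than any legitimate delivery attempt could take. */
#define WATCHDOG_SECONDS 3600

/* Hypothetical helper, not qmail code. Restores the default SIGALRM
 * action (terminate the process) and schedules the signal to arrive
 * after the given number of seconds. Returns the seconds that were
 * left on any previously scheduled alarm, or 0 if there was none. */
unsigned int arm_watchdog(unsigned int seconds)
{
    signal(SIGALRM, SIG_DFL);
    return alarm(seconds);
}
```

Calling arm_watchdog(WATCHDOG_SECONDS) as the first statement of main() would mean a process stuck in read() is killed by SIGALRM once the timer expires, instead of lingering indefinitely. How the parent process reports a delivery that ends this way is not covered by this sketch.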
Re: Problems with qmail-remote hanging
> I've been running qmail on a number of platforms quite happily for a while - until now I've had no problems at all. However, I am now experiencing a problem with qmail-remote hanging.
>
> The problem I see is with qmail-remote failing to terminate when a connection times out. If left alone, the number of stuck processes will slowly climb; after about a month I had about 25 such processes. The network connections remain in the ESTABLISHED state.
>
> Looking at the process list right now, I have one stuck:
>
>   # ps -ef | grep qmail-remote
>   qmailr   12278   662  0 13:13 ?      00:00:00 qmail-remote xx.co.uk xx
>   qmailr   19876   662  0 16:09 ?      00:00:00 qmail-remote xx.com
>   root     19912 19489  0 16:10 pts/0  00:00:00 grep qmail-remote
>
>   # strace -p 12278
>   read(3, <unfinished ...>
>
> ... all socket read()s in qmail-remote should be protected by a select and therefore should not block as this one is doing now. After recompiling with debugging and symbols, I get ...

Exactly.

This problem's been reported before. If your OS says that an fd is readable via select(), then the read() should not block. As you observe though, the read is blocking so your OS is probably not telling the truth when it returns from the select(). The archives have plenty of discussion on this and the simplest solution is to put a large-value alarm() handler in qmail-remote. No one as yet seems to be able to narrow down which OSes do this and under what circumstances.

Regards.
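To illustrate the select()/read() disagreement discussed above: the Linux select(2) manual page notes that select() may report a socket as readable while a subsequent read() still blocks, and suggests O_NONBLOCK for descriptors that must not block. The sketch below follows that suggestion. guarded_read() is a hypothetical helper, not qmail's own timeout code, and whether this matches the cause of the hang in this thread is an assumption.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not qmail code. Puts fd into non-blocking
 * mode (and leaves it that way), waits up to timeout_secs for it to
 * become readable, then reads. If select() was wrong about
 * readability, read() fails with EAGAIN/EWOULDBLOCK instead of
 * blocking forever. fd must be smaller than FD_SETSIZE.
 * Returns bytes read, 0 on EOF, or -1 with errno set
 * (ETIMEDOUT if nothing became readable in time). */
ssize_t guarded_read(int fd, void *buf, size_t len, int timeout_secs)
{
    fd_set rfds;
    struct timeval tv;
    int flags;
    int r;

    flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_secs;
    tv.tv_usec = 0;

    r = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (r == -1)
        return -1;
    if (r == 0) {
        errno = ETIMEDOUT;
        return -1;
    }

    return read(fd, buf, len);
}
```

A caller would treat EAGAIN like a timeout and either retry or give up on the delivery attempt, so a false "readable" report costs one extra loop iteration rather than a wedged process.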