RE: Problems with qmail-remote hanging

2001-07-31 Thread Richard Underwood

 This problem's been reported before. If your OS says that an fd is
 readable via select(), then the read() should not block.
 
 As you observe though, the read is blocking so your OS is probably not
 telling the truth when it returns from the select().
 
 The archives have plenty of discussion on this and the simplest
 solution is to put a large-value alarm() handler in qmail-remote. No
 one as yet seems to be able to narrow down which OSes do this and
 under what circumstances.

Mark,

Thanks for the reply. I only seem to experience the problem with
large mail-outs. One possibility is that because of the way qmail works,
there's a significant chance that we will be making a large number of
simultaneous connections to some servers.

It's possible that this is causing a connection to be blackholed
somewhere ... that doesn't explain why select/read are failing to agree,
though. Perhaps select thinks the connection is closed, but read doesn't.

Setting an alarm is a nasty hack in my opinion, but I have to admit
that it's something I considered. A slightly neater solution might be to use
the SO_KEEPALIVE socket option - if it works (and there isn't a good reason
not to use it) that is.
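
As a rough illustration of the SO_KEEPALIVE idea, here is a minimal sketch in C. The helper names enable_keepalive() and keepalive_enabled() are hypothetical and are not part of qmail; only setsockopt() and getsockopt() with SOL_SOCKET/SO_KEEPALIVE are standard calls.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper (not from qmail): ask the kernel to send TCP
   keepalive probes on an existing socket. Returns 0 on success and
   -1 on error, like setsockopt() itself. */
int enable_keepalive(int fd)
{
  int on = 1;
  return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on);
}

/* Hypothetical helper: report whether keepalive is currently enabled.
   Returns 1 if enabled, 0 if not, -1 on error. */
int keepalive_enabled(int fd)
{
  int val = 0;
  socklen_t len = sizeof val;

  if (getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &val, &len) == -1)
    return -1;
  return val != 0;
}
```

Keepalive only helps if the peer has really gone away, and probing starts only after the connection has been idle for a kernel-defined period (typically about two hours by default on Linux), so a wedged process could still linger for a long time before the read fails.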

What would be better is finding out why this happens, of course.

Thanks,

Richard

P.S. If anyone is keeping track, Linux 2.2.19, concurrencyremote set to 200



Re: Problems with qmail-remote hanging

2001-07-31 Thread MarkD

   Setting an alarm is a nasty hack in my opinion, but I have to admit
 that it's something I considered.

Well, the qmail-remote connection is well and truly wedged once it's
in this state and if the select() timed out as it's meant to,
qmail-remote would exit with a delivery failure indication, so it's
not that bad a hack. It's also very easy to code - just a single
alarm() call at the top of main().
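
A minimal sketch of that kind of watchdog, in C. This is not the actual qmail-remote patch: arm_watchdog() and watchdog_kills_blocked_read() are hypothetical names, and the one-second value in the demonstration is only there to keep it short. A real patch would use a large value, longer than any legitimate delivery could take.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: schedule SIGALRM after `seconds`. With the
   default disposition SIGALRM terminates the process, so no handler
   is required. Returns the seconds left on any earlier alarm. */
unsigned int arm_watchdog(unsigned int seconds)
{
  signal(SIGALRM, SIG_DFL);
  return alarm(seconds);
}

/* Demonstration only: fork a child that arms a 1-second watchdog and
   then blocks in read() on a pipe that never receives data.
   Returns 1 if the child was killed by SIGALRM, 0 if it ended some
   other way, -1 on a setup error. */
int watchdog_kills_blocked_read(void)
{
  int p[2];
  int status;
  pid_t pid;
  char c;

  if (pipe(p) == -1) return -1;
  pid = fork();
  if (pid == -1) return -1;

  if (pid == 0) {
    arm_watchdog(1);
    /* The write end is still open, so this read never sees EOF. */
    if (read(p[0], &c, 1) == -1) _exit(1);
    _exit(0);
  }

  if (waitpid(pid, &status, 0) == -1) return -1;
  close(p[0]);
  close(p[1]);
  return WIFSIGNALED(status) && WTERMSIG(status) == SIGALRM;
}
```

In this sketch the process simply dies from the signal rather than reporting a result; whether that is acceptable depends on how the parent treats a child that exits abnormally.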

 A slightly neater solution might be to use
 the SO_KEEPALIVE socket option - if it works (and there isn't a good reason
 not to use it) that is.

It'll be interesting to hear if this works.

   What would be better is finding out why this happens, of course.

Indeed. Does Linux offer tools/syscalls that would tell you why the
select worked, but the read failed?

 P.S. If anyone is keeping track, Linux 2.2.19, concurrencyremote set to 200

I hesitate to say this, but Linux kernels seem to predominate in these
reports, though that may just be because qmail is running on more Linux
systems out there than on other Unixen.


Regards.



Re: Problems with qmail-remote hanging

2001-07-30 Thread MarkD

   I've been running qmail on a number of platforms quite happily for a
 while - until now I've had no problems at all. However, I am now
 experiencing a problem with qmail-remote hanging.

   The problem I see is with qmail-remote failing to terminate when a
 connection times out. If left alone, the number of stuck processes will
 slowly climb; after about a month I had about 25 such processes. The network
 connections remain in the ESTABLISHED state.
 
   Looking at the process list right now, I have one stuck:
 
 # ps -ef | grep qmail-remote
 qmailr   12278   662  0 13:13 ?        00:00:00 qmail-remote xx.co.uk xx
 qmailr   19876   662  0 16:09 ?        00:00:00 qmail-remote xx.com
 root     19912 19489  0 16:10 pts/0    00:00:00 grep qmail-remote
 
 # strace -p 12278
 read(3,  <unfinished ...>
 
   ... all socket read()s in qmail-remote should be protected by a
 select and therefore should not block as this one is doing now. After
 recompiling with debugging and symbols, I get ...

Exactly.

This problem's been reported before. If your OS says that an fd is
readable via select(), then the read() should not block.

As you observe though, the read is blocking so your OS is probably not
telling the truth when it returns from the select().

The archives have plenty of discussion on this and the simplest
solution is to put a large-value alarm() handler in qmail-remote. No
one as yet seems to be able to narrow down which OSes do this and
under what circumstances.
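
For reference, here is a minimal sketch in C of a select-guarded read. It is not qmail's own code: guarded_read() is a hypothetical helper, and the O_NONBLOCK step is an extra guard that this thread does not propose. The Linux select(2) manual page notes that select() can report a socket as readable while a later read() still blocks, and suggests O_NONBLOCK for descriptors that must not block.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: wait up to `seconds` for fd to become readable,
   then read from it. The descriptor is switched to non-blocking mode
   first, so if select() wrongly reports it readable, read() fails with
   EAGAIN instead of wedging the process.
   Returns the byte count, 0 on EOF, or -1 with errno set
   (ETIMEDOUT when select() times out). */
ssize_t guarded_read(int fd, char *buf, size_t len, int seconds)
{
  fd_set rfds;
  struct timeval tv;
  int flags;
  int r;

  flags = fcntl(fd, F_GETFL, 0);
  if (flags == -1) return -1;
  /* Note: this also makes later writes on fd non-blocking. */
  if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) return -1;

  tv.tv_sec = seconds;
  tv.tv_usec = 0;
  FD_ZERO(&rfds);
  FD_SET(fd, &rfds);

  r = select(fd + 1, &rfds, (fd_set *) 0, (fd_set *) 0, &tv);
  if (r == -1) return -1;
  if (r == 0) {
    errno = ETIMEDOUT;
    return -1;
  }

  return read(fd, buf, len);
}
```

A caller that gets -1 with errno set to EAGAIN can go back to select() with whatever time remains, which keeps the timeout in force even when the kernel's readiness report turns out to be wrong.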


Regards.