> I just looked at the server I had problems with -- 15 hung 
> qmail-remotes :(
> 
        Not good! I peaked at 26 before I noticed.

> How did you test this patch?
> Are you saying that you were able to reliably reproduce the problem?
> I could never do this... If so, how?
> 
        I tested the patch by running it on the live server for three days.
Before that I was seeing, on average, one or two processes get stuck per day,
and none has stuck since. The problems generally started during large
mailings, which happen daily on this server.

        I couldn't repeat the problem, but it happened reliably enough for
me to believe that it has now been stopped.

        The patch itself should not affect the running of the program in any
way except dropping dead connections.
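For illustration only, here is a minimal sketch of the general idea of
"dropping dead connections", assuming it is done with a select() timeout
before each read. This is not the actual patch. timed_read and its
timeout_sec parameter are hypothetical names. The function waits for data,
and if none arrives within the timeout it gives up instead of blocking
forever on a connection that something in the path may have silently dropped.

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: read from fd, but give up after timeout_sec seconds
 * of silence rather than blocking forever on a dead connection.
 * Returns bytes read, 0 on EOF, or -1 with errno set (ETIMEDOUT on timeout). */
ssize_t timed_read(int fd, void *buf, size_t len, long timeout_sec)
{
    for (;;) {
        fd_set rfds;
        struct timeval tv;

        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);

        /* select() may modify tv, so set it fresh on every attempt */
        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;

        int r = select(fd + 1, &rfds, NULL, NULL, &tv);
        if (r == -1) {
            if (errno == EINTR)
                continue; /* interrupted by a signal: wait again */
            return -1;
        }
        if (r == 0) {
            errno = ETIMEDOUT; /* nothing arrived in time: treat as dead */
            return -1;
        }
        /* fd is readable: read() will not block here */
        return read(fd, buf, len);
    }
}
```

Retrying after EINTR restarts the full timeout, which is a simplification. A
real implementation might track the remaining time instead.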

> There is a lot of mystery in this: most (but not all) reports
> had connections hung to outblaze.com.
> Most (but not all) servers ran Linux.
> 
> It's weird...
> 
        It is. I didn't spot a pattern in the remote hosts, but then I
didn't try to. I suspect it has something to do with stateful firewalls
dropping a session after a period of inactivity, though that doesn't explain
why the code is affected by it at all.

        My other suspicion is that my server will sometimes try a couple
of dozen connections to the same remote host at once. (This is an issue in
itself!) A firewall in the path could be mistaking this for a DoS attempt
and responding oddly, which in turn triggers a bug in the select() handling.

        I'll let you know if the problems re-appear.

                Richard
