Just an FYI for the group: I've been running this patch for several days now, my stuck processes are completely gone, and there is no sign of any problems arising from the changes.  Has anyone else had a chance to look over the changes I made yet?  The load on my boxes has also dropped dramatically.  Each box generally had 4-6 "stuck" processes eating up CPU time talking down empty sockets (TLS on some, remote disconnected on others), which gave me load averages between 3 and 19 per box.  Load averages are now between 0.04 and 0.19, with each box processing roughly 6-12 connections per second.

--
Thanks,
Ed McLain

-----Original Message-----
From: Ed McLain
Sent: Tuesday, September 23, 2008 2:31 PM
To: qpsmtpd@perl.org
Subject: RE: high CPU on "lost" processes with forkserver

Does anyone have any problems with the patch to fix this bug?  Basically, when TcpServer's run method is called it is passed a copy of the $client IO::Socket, and $client->connected is called in the respond method (for TcpServer and PreFork) to verify that the socket is still open.  I've been testing it for a while and haven't seen any issues, and I no longer have any stuck processes either.  I'm not a Perl monger, though, so I just want to make sure I'm not doing anything insane.  Any and all input is greatly welcome.
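A minimal sketch of the kind of guard described above (hypothetical code, not the actual patch; the respond signature here is simplified): check $client->connected before writing, and bail out if the socket is gone.

```perl
use strict;
use warnings;
use Socket;
use IO::Socket::UNIX;

# Hypothetical sketch: before writing a response, check that the
# client socket is still open and give up quietly if it is not.
sub respond {
    my ($client, $code, @messages) = @_;
    # ->connected returns the peer address while the socket is usable,
    # undef once it has been torn down.
    return 0 unless $client && $client->connected;
    for my $msg (@messages) {
        print {$client} "$code $msg\r\n" or return 0;
    }
    return 1;
}

# Demo on a local socketpair standing in for a real SMTP client.
my ($server_side, $client_side) =
    IO::Socket::UNIX->socketpair(AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

my $ok    = respond($server_side, 250, 'OK');   # socket open: write succeeds
$server_side->close;
my $after = respond($server_side, 250, 'OK');   # socket closed: refused
print "open=$ok closed=$after\n";
```

Note that ->connected only tells you the local handle is still attached; a write can still fail afterwards, which is why the print return value is checked as well.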

--
Thanks,
Ed McLain

-----Original Message-----
From: Ed McLain
Sent: Monday, September 22, 2008 10:51 AM
To: Jose Luis Martinez; qpsmtpd@perl.org
Subject: RE: high CPU on "lost" processes with forkserver

Anything new on a fix for this bug?  I seem to have quite a few connections 
hitting this these days.

--
Thanks,
Ed McLain


-----Original Message-----
From: Jose Luis Martinez [mailto:[EMAIL PROTECTED]
Sent: Tuesday, April 29, 2008 11:42 AM
To: qpsmtpd@perl.org
Subject: Re: high CPU on "lost" processes with forkserver

Peter J. Holzer wrote:
> On 2008-04-25 21:24:17 +0200, Jose Luis Martinez wrote:
>> Peter J. Holzer wrote:
>> You caught it!!! It did the trick!
>>
> As I wrote previously, my guess is that both the mysql library and the
> tls library catch SIGPIPE but don't call the previously installed
> signal handler. So only one of them gets called (whichever is
> registered last) and the other one loses.
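A small self-contained illustration of the chaining Peter describes (hypothetical code; neither library actually does this). The second handler saves the previously installed one and calls it, so both consumers see the signal:

```perl
use strict;
use warnings;

my @calls;

# First consumer installs a SIGPIPE handler (say, the TLS layer).
$SIG{PIPE} = sub { push @calls, 'tls' };

# Second consumer (say, the mysql library) installs its own handler,
# but chains to the previously installed one instead of clobbering it.
my $prev = $SIG{PIPE};
$SIG{PIPE} = sub {
    push @calls, 'mysql';
    $prev->(@_) if ref $prev eq 'CODE';
};

kill 'PIPE', $$;                   # deliver SIGPIPE to ourselves
select undef, undef, undef, 0.1;   # let Perl's deferred signals dispatch

print "@calls\n";                  # both handlers fired
```

Without the `$prev->(@_)` line only the last-registered handler runs, which is exactly the losing scenario above.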

So rather than patching the qp core in the respond method (Matt Sergeant had commented: "But I removed it because then alarm() features VERY heavily in the performance profiling as an expensive system call."), I chose to work around DBD::mysql to make it behave:

   # Save whatever SIGPIPE handler is currently installed...
   my $sighandle = $SIG{'PIPE'};

   my $dbh = DBI->connect('DBI:mysql:database=xxx;host=localhost;port=3306',
       'xxx', 'xxx')
       or $self->log(LOGDEBUG, 'Could not connect ' . DBI->errstr());

   # ...and restore it afterwards, since DBD::mysql replaces it on connect.
   $SIG{'PIPE'} = $sighandle;

It seems DBD::mysql uses SIGPIPE to reconnect to MySQL in case the connection is lost.  Goodbye, feature!

It looks like Apache and DBD::mysql have, or have had, the same problem, judging from this post:
http://mail-archives.apache.org/mod_mbox/httpd-dev/199903.mbox/[EMAIL PROTECTED]

> No, but there are at least two layers below that: The PerlIO layer and
> the TLS layer. Either one could retry an unsuccessful write if the
> actual cause of the error was lost.

I'll try to contact the author of the TLS layer so that, instead of depending on the signal, he can perhaps rely on the return value of the writes (EPIPE) to bail out.  That seems like a more stable solution, since external modules then cannot influence you.
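A sketch of that approach (illustrative only, not the TLS layer's code): ignore SIGPIPE and detect the dead peer from the write's return value. On a UNIX-domain socketpair the first write after the peer closes fails with EPIPE immediately, which makes it easy to demonstrate:

```perl
use strict;
use warnings;
use Errno qw(EPIPE);
use Socket;
use IO::Socket::UNIX;

# With SIGPIPE ignored, a write to a dead peer returns undef with
# $! == EPIPE instead of killing the process.
local $SIG{PIPE} = 'IGNORE';

my ($reader, $writer) =
    IO::Socket::UNIX->socketpair(AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
close $reader;                 # simulate the remote end disconnecting

my $wrote     = syswrite($writer, "220 hello\r\n");
my $peer_gone = (!defined $wrote && $! == EPIPE);
print $peer_gone ? "peer gone, tearing down\n" : "write ok\n";
```

The appeal is exactly what's argued above: the error is observed at the point of the write, so no other module's signal-handler games can hide it.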

Thanks for all the help and comments.

Jose Luis Martinez
[EMAIL PROTECTED]
CAPSiDE
