Just an FYI for the group: I've been running this patch for several days now, my stuck processes are completely gone, and there is no sign of any problems arising from the changes. Has anyone else had a chance to look over the changes I made yet? Also, the load on my boxes has dropped dramatically. Each of my boxes generally had 4-6 "stuck" processes eating up CPU time talking down empty sockets (TLS on some, remote disconnected on others), which would give me load averages on each box between 3 and 19. Load averages on each box are now between 0.04 and 0.19, with each box processing roughly 6-12 connections per second.
--
Thanks,
Ed McLain

-----Original Message-----
From: Ed McLain
Sent: Tuesday, September 23, 2008 2:31 PM
To: qpsmtpd@perl.org
Subject: RE: high CPU on "lost" processes with forkserver

Does anyone have any problems with the patch to fix this bug? Basically, when TcpServer's run method is called it is passed a copy of the $client IO::Socket, and $client->connected is called in the respond method (for TcpServer and PreFork) to verify that the socket is still open. I've been testing it for a while and haven't seen any issues, and I don't have any stuck processes anymore either. I'm not a Perl monger, though, so I just want to make sure I'm not doing anything insane. Any and all input is greatly welcome.

--
Thanks,
Ed McLain

-----Original Message-----
From: Ed McLain
Sent: Monday, September 22, 2008 10:51 AM
To: Jose Luis Martinez; qpsmtpd@perl.org
Subject: RE: high CPU on "lost" processes with forkserver

Anything new on a fix for this bug? I seem to have quite a few connections hitting it these days.

--
Thanks,
Ed McLain

-----Original Message-----
From: Jose Luis Martinez [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, April 29, 2008 11:42 AM
To: qpsmtpd@perl.org
Subject: Re: high CPU on "lost" processes with forkserver

Peter J. Holzer wrote:
> On 2008-04-25 21:24:17 +0200, Jose Luis Martinez wrote:
>> Peter J. Holzer wrote:
>> You caught it!!! It did the trick!
>>
> As I wrote previously, my guess is that both the mysql library and the
> tls library catch SIGPIPE but don't call the previously installed
> signal handler. So only one of them gets called (whichever is
> registered last) and the other one loses.

So, before patching the qpsmtpd core in the respond method (Matt Sergeant commented: "But I removed it because then alarm() features VERY heavily in the performance profiling as an expensive system call."), I chose to work around DBD::mysql to make it behave:

my $sighandle = $SIG{'PIPE'};
my $dbh = DBI->connect('DBI:mysql:database=xxx;host=localhost;port=3306',
                       'xxx', 'xxx')
    or $self->log(LOGDEBUG, 'Could not connect ' . DBI->errstr());
$SIG{'PIPE'} = $sighandle;

It seems DBD::mysql uses SIGPIPE to reconnect to MySQL in case the connection is lost. Goodbye, feature!

It looks like Apache & DBD::mysql have, or have had, the same problem. Found this post about it:
http://mail-archives.apache.org/mod_mbox/httpd-dev/199903.mbox/[EMAIL PROTECTED]

> No, but there are at least two layers below that: The PerlIO layer and
> the TLS layer. Either one could retry an unsuccessful write if the
> actual cause of the error was lost.

I'll try to contact the author of the TLS layer so that, instead of depending on the signal, maybe he can depend on the return value of the writes (EPIPE) to abort. (That seems like a more stable solution; that way external modules cannot influence you.)

Thanks for all the help and comments.

Jose Luis Martinez
[EMAIL PROTECTED]
CAPSiDE
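
To make the return-value idea above concrete: a sketch along these lines checks syswrite's result and errno rather than relying on SIGPIPE. It is only a sketch, not IO::Socket::SSL or qpsmtpd code, and the function name is made up.

use strict;
use warnings;
use Errno qw(EPIPE ECONNRESET);

# Sketch: write all of $data to $sock, ignoring SIGPIPE and checking
# errno instead. Returns 1 on success, 0 if the peer has gone away,
# and dies on any other write error.
sub write_or_give_up {
    my ($sock, $data) = @_;
    local $SIG{PIPE} = 'IGNORE';    # look at the error code, not the signal
    my $off = 0;
    while ($off < length $data) {
        my $n = syswrite($sock, $data, length($data) - $off, $off);
        unless (defined $n) {
            # Remote side is gone: stop retrying and report it cleanly.
            return 0 if $! == EPIPE || $! == ECONNRESET;
            die "write failed: $!";
        }
        $off += $n;
    }
    return 1;
}

A caller would then close the connection and abort the transaction when this returns 0, rather than looping on a socket that can never be written again.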
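
For reference, a rough sketch of the $client->connected check from Ed's patch described earlier in the thread. This is illustrative only: the real respond() in Qpsmtpd::TcpServer differs in detail, and the _client_socket slot name is an assumption about where run() might stash the socket it is handed.

use Qpsmtpd::Constants;    # for LOGERROR

sub respond {
    my ($self, $code, @messages) = @_;
    my $sock = $self->{_client_socket};    # assumed: saved by run()
    # If the peer has already disconnected, bail out instead of writing
    # down a dead socket (which is what left forked children spinning).
    if ($sock && !$sock->connected) {
        $self->log(LOGERROR, "client disconnected, dropping response");
        return 0;
    }
    while (my $msg = shift @messages) {
        my $line = $code . (@messages ? "-" : " ") . $msg;
        print "$line\r\n"
          or do { $self->log(LOGERROR, "Could not print [$line]: $!"); return 0; };
    }
    return 1;
}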