On 24-Apr-08, at 5:52 PM, Brian Szymanski wrote:
Ask Bjørn Hansen wrote:

On Apr 24, 2008, at 11:02 AM, Charlie Brady wrote:

Ask said "Yeah, this is a pretty bad bug" in March 2007, but I
haven't seen anyone looking to fix it.

We must be in pretty good shape when billions (or whatever) of email
transactions are processed every day and nobody is bothered enough by
possibly our worst known bug to come up with a patch.  :-)

 - ask


We have a workaround to kill off any process that's been alive more than
5 minutes or so.

I'm anxious to get rid of it, though, and fix things the right way, since
our mail server is struggling to keep up (only partially a result of
this). Any advice on where to start to tackle this one? And, just to be
clear that we're talking about the same bug: this exists in .3x as well,
yes?

I think the core used to do something like this:

Index: lib/Qpsmtpd.pm
===================================================================
--- lib/Qpsmtpd.pm      (revision 876)
+++ lib/Qpsmtpd.pm      (working copy)
@@ -390,7 +390,10 @@
     if ($hooks->{$hook}) {
         my @r;
         for my $code (@{$hooks->{$hook}}) {
+            $SIG{ALRM} = sub { die "Alarm" };
+            my $prev = alarm(10); # should be long enough for anyone!
             eval { (@r) = $code->{code}->($self, $self->transaction, @_); };
+            alarm($prev);
             $@ and warn("FATAL PLUGIN ERROR: ", $@) and next;
             if ($r[0] == YIELD) {
                 die "YIELD not valid from $hook hook";
@@ -419,7 +422,10 @@
     #warn("Got sampler called: ${hook}_$code->{name}\n");
     $self->varlog(LOGDEBUG, $hook, $code->{name});
     my $tran = $self->transaction;
+    $SIG{ALRM} = sub { die "Alarm" };
+    my $prev = alarm(10); # should be long enough for anyone!
     eval { (@r) = $code->{code}->($self, $tran, @$args); };
+    alarm($prev);
     $@ and $self->log(LOGCRIT, "FATAL PLUGIN ERROR: ", $@) and next;

     !defined $r[0]


But I removed it because alarm() then featured VERY heavily in the performance profiling as an expensive system call.

A better option might be to have the parent process watch for long running children and terminate them.
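
As a rough illustration of that idea, here is a minimal sketch, assuming a forking parent that records the start time of each child it forks. The names %started, overdue_children, reap_children and watchdog_pass are hypothetical and are not part of qpsmtpd; only core Perl and POSIX calls are used.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

# Hypothetical bookkeeping: pid => epoch time at which the parent forked it.
# The parent would set $started{$pid} = time right after a successful fork().
my %started;

# Return (sorted) the pids that have been running longer than $max_age seconds.
sub overdue_children {
    my ($started, $now, $max_age) = @_;
    return sort { $a <=> $b }
           grep { $now - $started->{$_} > $max_age } keys %$started;
}

# Collect children that have already exited and forget them,
# so that a finished (and possibly reused) pid is never signalled.
sub reap_children {
    my ($started) = @_;
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        delete $started->{$pid};
    }
    return;
}

# One watchdog pass: reap, then send TERM to anything over the limit.
# Returns the list of pids that were signalled.
sub watchdog_pass {
    my ($started, $max_age) = @_;
    reap_children($started);
    my @overdue = overdue_children($started, time, $max_age);
    kill 'TERM', @overdue if @overdue;
    return @overdue;
}

# Possible use from the parent's accept loop (the 300 second limit is an assumption):
#   my @killed = watchdog_pass(\%started, 300);
#   warn "terminated long-running children: @killed\n" if @killed;
```

The parent only pays for a time() call and a non-blocking waitpid() per pass, so the per-hook alarm() calls go away. A child that ignores TERM stays in the table and is signalled again on the next pass; escalating to KILL after a grace period would be a natural extension.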

Matt.
