> >>>>> "MD" == Mark Dedlow <[EMAIL PROTECTED]> writes:
> 
>   MD> Event on my Redhat 8.0 linux system loses (doesn't get?) SIGCHLD's
>   MD> that arrive too quickly.  I have a test script that works
>   MD> correctly on Solaris, but not on Linux, so I'm guessing it's OS
>   MD> specific?  Is this a known issue?
> 
> some OS's merge multiple duplicate signals into one delivery. you should
> always do all the possible work you can when you get such a signal. that
> means when you get a SIGCHLD, you try to reap all child procs until you
> get no more. use a non-blocking waitpid option and a loop. i bet you are
> reaping one child per signal you get.
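The advice above is to reap every exited child each time the signal arrives, using a non-blocking waitpid in a loop. Here is a minimal sketch of that pattern, under some assumptions. It uses a plain $SIG{CHLD} handler and the core POSIX module rather than Event, so it runs without extra modules. The child count and variable names are hypothetical. Inside an Event signal callback, the same while loop would replace a single waitpid call.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

my $children = 5;    # hypothetical number of children to fork
my $reaped   = 0;    # children actually collected by waitpid
my $signals  = 0;    # times the handler ran (may be fewer than $children)

# Reap every exited child on each delivery, not just one per signal:
# several SIGCHLDs arriving while one is already pending collapse into one.
$SIG{CHLD} = sub {
    $signals++;
    # waitpid returns 0 when children remain but none has exited yet,
    # and -1 when there are no children left, so stop on either.
    $reaped++ while waitpid(-1, WNOHANG) > 0;
};

for (1 .. $children) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    exit 0 if $pid == 0;    # child: exit immediately
}

# Parent: sleep until everything is reaped; a SIGCHLD typically cuts
# each sleep short, and the 1-second timeout bounds any missed wakeup.
sleep 1 while $reaped < $children;

printf "forked %d, reaped %d, handler ran %d time(s)\n",
    $children, $reaped, $signals;
```

The number to trust is $reaped, not the handler count. The loop stops when waitpid reports nothing left to collect. Any child that exits after that point generates a fresh SIGCHLD, so no child is left behind.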

I am counting $e->hits in my signal watcher, i.e. I do understand
that with rapidly arriving signals, more than one signal could be
received per watcher callback.  I am also using a non-blocking waitpid,
although I think the _arrival_ of signals is independent of waitpid'ing
on them.  In other words, I shouldn't need to waitpid at all, no?

Here's my test script:

--------------------------------------------------
use Event qw(loop unloop);
use Proc::Fork;
use POSIX ':sys_wait_h';
use Proc::WaitStat qw(waitstat);

$c = Event->signal(signal => 'CHLD', cb => \&reaper);

$launched = 0;
$reaped = 0;

# fork n processes 
for (1..$ARGV[0]) {
    child { sleep 2; exit; } 
    parent { $launched++; }
    error {};
}

# report total number of forked processes
printf STDERR "forked %d children\n", $launched;

my $ret = loop();

sub reaper {
        my $e = shift;
        $reaped += $e->hits;
        my $pid = waitpid(-1,&WNOHANG);
        printf STDERR "reaped %d on this callabck, %d total\n", $e->hits, $reaped;
}
--------------------------------------------------

When I run this on a Solaris system, I typically see:

  forked 20 children
  reaped 1 on this callabck, 1 total
  reaped 1 on this callabck, 2 total
  reaped 1 on this callabck, 3 total
  reaped 4 on this callabck, 7 total
  reaped 1 on this callabck, 8 total
  reaped 2 on this callabck, 10 total
  reaped 1 on this callabck, 11 total
  reaped 1 on this callabck, 12 total
  reaped 1 on this callabck, 13 total
  reaped 1 on this callabck, 14 total
  reaped 1 on this callabck, 15 total
  reaped 1 on this callabck, 16 total
  reaped 1 on this callabck, 17 total
  reaped 1 on this callabck, 18 total
  reaped 1 on this callabck, 19 total 
  reaped 1 on this callabck, 20 total

...which is what I expect. Some callbacks saw multiple hits, but the
total _always_ equals the total number of forked children (no signals lost).

But on Linux, I typically see:

  forked 20 children
  reaped 1 on this callabck, 1 total
  reaped 1 on this callabck, 2 total
  reaped 1 on this callabck, 3 total
  reaped 1 on this callabck, 4 total

and that's it.  I lost 16 out of 20 signals!
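The "lost" signals were most likely merged rather than lost. A standard signal such as SIGCHLD is not queued. If several children exit while one SIGCHLD is already pending, the process gets a single delivery, and $e->hits can only count deliveries, not children. The sketch below makes this visible, assuming a Linux-like system and using only core Perl. It blocks SIGCHLD with POSIX::sigprocmask, lets a few hypothetical children exit, then unblocks. The handler typically runs once, yet a waitpid loop inside it still reaps every child. The 2-second pause is an assumption that the children have exited by then, not a guarantee.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG SIGCHLD SIG_BLOCK SIG_UNBLOCK);

my $children = 5;    # hypothetical number of children
my $reaped   = 0;    # children collected by waitpid
my $signals  = 0;    # handler invocations (deliveries, like Event's hits)

$SIG{CHLD} = sub {
    $signals++;
    # One delivery may stand for many exited children: drain them all.
    $reaped++ while waitpid(-1, WNOHANG) > 0;
};

# Hold SIGCHLD pending while the children exit.
my $set = POSIX::SigSet->new(SIGCHLD);
POSIX::sigprocmask(SIG_BLOCK, $set) or die "sigprocmask: $!";

for (1 .. $children) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    exit 0 if $pid == 0;    # child: exit at once
}

sleep 2;    # assumption: every child has exited by now

# Unblock: the pending SIGCHLDs are typically delivered as one signal.
POSIX::sigprocmask(SIG_UNBLOCK, $set) or die "sigprocmask: $!";

# Safety net in case a child exited after the pause above.
sleep 1 while $reaped < $children;

printf "forked %d, reaped %d, handler ran %d time(s)\n",
    $children, $reaped, $signals;
```

If the handler count comes out lower than the child count while $reaped still matches, that is the same merging the Linux output above shows. It also explains why counting hits undercounts children.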

> never assume signal behavior is anything. it is such a poorly defined and
> implemented API.

Are you suggesting I should waitpid in a loop, even though $e->hits
suggests there's only one signal in the queue?

Mark

