> >>>>> "MD" == Mark Dedlow <[EMAIL PROTECTED]> writes:
>
> MD> Event on my Redhat 8.0 linux system loses (doesn't get?) SIGCHLD's
> MD> that arrive too quickly. I have a test script that works
> MD> correctly on Solaris, but not on Linux, so I'm guessing it's OS
> MD> specific? Is this a known issue?
>
> some OS's merge multiple duplicate signals into one delivery. you should
> always do all the possible work you can when you get such a signal. that
> means when you get a SIGCHLD, you try to reap all child procs until you
> get no more. use a non-blocking waitpid option and a loop. i bet you are
> reaping one child per signal you get.
I am counting $e->hits in my signal watcher, i.e. I do understand
that with rapidly arriving signals, more than one signal can be
received per watcher callback. I am also using non-blocking waitpid,
although I think the _arrival_ of signals is independent of waitpid'ing
on them. In other words, I shouldn't need to waitpid at all, no?
Here's my test script:
--------------------------------------------------
use Event qw(loop unloop);
use Proc::Fork;
use POSIX ':sys_wait_h';
use Proc::WaitStat qw(waitstat);

# watch for SIGCHLD and count deliveries in reaper()
$c = Event->signal(signal => 'CHLD', cb => \&reaper );
$launched = 0;
$reaped = 0;

# fork n processes
for (1..$ARGV[0]) {
    child  { sleep 2; exit; }
    parent { $launched++; }
    error  {};
}

# report total number of forked processes
printf STDERR "forked %d children\n", $launched;
my $ret = loop();

sub reaper {
    my $e = shift;
    # count signals reported by Event, not children actually waited on
    $reaped += $e->hits;
    # one non-blocking waitpid per callback
    my $pid = waitpid(-1, WNOHANG);
    printf STDERR "reaped %d on this callabck, %d total\n", $e->hits, $reaped;
}
--------------------------------------------------
When I run this on a Solaris system, I typically see:
forked 20 children
reaped 1 on this callabck, 1 total
reaped 1 on this callabck, 2 total
reaped 1 on this callabck, 3 total
reaped 4 on this callabck, 7 total
reaped 1 on this callabck, 8 total
reaped 2 on this callabck, 10 total
reaped 1 on this callabck, 11 total
reaped 1 on this callabck, 12 total
reaped 1 on this callabck, 13 total
reaped 1 on this callabck, 14 total
reaped 1 on this callabck, 15 total
reaped 1 on this callabck, 16 total
reaped 1 on this callabck, 17 total
reaped 1 on this callabck, 18 total
reaped 1 on this callabck, 19 total
reaped 1 on this callabck, 20 total
...which is what I expect. Some callbacks saw multiple hits, but the
total _always_ equals the total number of forked children (no signals lost).
But on Linux, I typically see:
forked 20 children
reaped 1 on this callabck, 1 total
reaped 1 on this callabck, 2 total
reaped 1 on this callabck, 3 total
reaped 1 on this callabck, 4 total
and that's it. I lost 16 out of 20 signals!
> never assume signal behavior is anything. it is such a poorly defined and
> implemented API.
Are you suggesting I should waitpid in a loop, even though $e->hits
suggests there's only one signal in the queue?
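
For reference, here is a minimal sketch of the reap-until-empty loop
described in the quoted advice. It counts children by what waitpid
actually returns rather than by signals seen, since standard signals
such as SIGCHLD are not queued and several pending ones can collapse
into a single delivery. The helper name reap_all is made up for
illustration, and the sketch uses plain fork and polling instead of
Event and Proc::Fork so that it runs with core modules only; inside an
Event callback, the body of reap_all would presumably replace the
single waitpid call. It has not been tried against Event on Red Hat 8.0.

```perl
use strict;
use warnings;
use POSIX ':sys_wait_h';

# Hypothetical helper: collect every child that has already exited.
# waitpid(-1, WNOHANG) returns a pid > 0 for each reaped child,
# 0 if children exist but none has exited yet, and -1 if none remain.
sub reap_all {
    my $count = 0;
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        $count++;
    }
    return $count;
}

# Fork a few short-lived children.
my $launched = 0;
for (1 .. 5) {
    my $pid = fork();
    next unless defined $pid;   # fork failed; skip this one
    exit 0 if $pid == 0;        # child: exit immediately
    $launched++;                # parent: count the child
}

# Poll until every launched child has been accounted for.
my $reaped = 0;
while ($reaped < $launched) {
    $reaped += reap_all();
    select(undef, undef, undef, 0.05);  # sleep 50 ms between polls
}

printf STDERR "forked %d children, reaped %d\n", $launched, $reaped;
```

With this approach the total comes out right even if only one SIGCHLD
is delivered for several exits, because each call drains everything
that is waiting.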
Mark