On Thu, Aug 06, 2020 at 03:23:08AM +0200, Michał Mirosław wrote:
> On Thu, Aug 06, 2020 at 03:10:55AM +0200, Michał Mirosław wrote:
> > On Thu, Aug 06, 2020 at 02:39:42AM +0200, Michał Mirosław wrote:
> > > On Thu, Aug 06, 2020 at 03:16:35AM +0300, Peter Pentchev wrote:
> > > > On Thu, Aug 06, 2020 at 12:48:10AM +0200, Michał Mirosław wrote:
> > > > > On Thu, Aug 06, 2020 at 12:29:36AM +0300, Peter Pentchev wrote:
> > > > > > On Wed, Aug 05, 2020 at 10:52:31PM +0200, Michał Mirosław wrote:
> > > > > [...]
> > > > > > > Using print-debugging, I see that it stops at the wait_for_child
> > > > > > > line just after printing the version. It seems that something is
> > > > > > > reaping the child before the main thread has a chance to wait for it.
> > > > > >
> > > > > > OK, so the only thing that comes to my mind now is that you may be
> > > > > > hitting a crazy, crazy race between register_child() and
> > > > > > child_reaper(), and I say "a crazy, crazy race", because the test
> > > > > > has to (apparently reproducibly) receive the CHLD signal exactly
> > > > > > between the check and the creation in register_child()'s first
> > > > > > "$children{...} //= ...cv" statement.
> > > > >
> > > > > Well, there is nothing that prevents SIGCHLD from arriving between
> > > > > fork() and register_child(). You could test this with more confidence
> > > > > (though not 100% reliably) by putting 'exit 1' right at the start of
> > > > > the ($pid == 0) branch.
> > > >
> > > > Nah, the problem is not just "between fork() and register_child()".
> > > > It really must arrive at a very specific moment in time, because
> > > > the //= operations for setting $children{$pid}{cv} try to make sure that
> > > > a new value is not set (that is, a new condition variable is not
> > > > created) if there already is such an element in the hash. So the race
> > > > is indeed between the //= in register_child() and the //= in
> > > > child_reaper(); that is, child_reaper() must be invoked (SIGCHLD must
> > > > arrive) *during* the execution of the //= in register_child().
> > > >
> > > > Unless I'm missing something, which is not at all out of the question :)
> > >
> > > The assignment seems not to be at fault (see the last strace). I don't
> > > know Perl's internals well enough to say whether this statement can be
> > > visibly interrupted by a signal handler (I would guess not by a Perl
> > > handler, though). There are two wait4() calls even before child_reaper
> > > has a chance to run.
> >
> > Another data point: this happens only with anyevent + libev and not with
> > anyevent + libevent. The first is preferred and installed by default with
> > libanyevent-perl, though.
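The failure mode described in the quoted messages (something reaping the child before the main code gets to wait for it) can be reproduced in isolation with core Perl. This is a minimal sketch under stated assumptions, not the test suite's code: the catch-all SIGCHLD handler below is a hypothetical stand-in for whatever handler an event-loop backend may install, and %reaped is an illustrative name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(:sys_wait_h);    # for WNOHANG

# Hypothetical stand-in for a SIGCHLD handler installed by someone else
# (e.g. an event-loop backend): it reaps *every* exited child.
my %reaped;
$SIG{CHLD} = sub {
    while ((my $kid = waitpid(-1, WNOHANG)) > 0) {
        $reaped{$kid} = $?;    # record the raw wait status
    }
};

my $pid = fork() // die "fork: $!";
if ($pid == 0) {
    exit 1;    # child exits immediately, as in the 'exit 1' experiment
}

# Let the handler run; sleep() returns early when a signal arrives.
sleep 1 until exists $reaped{$pid};

# The main code now waits for "its" child, but it has already been reaped,
# so waitpid() fails (returns -1, with $! set to ECHILD).
my $ret = waitpid($pid, 0);
printf "waitpid(%d) returned %d; exit code seen by the handler: %d\n",
    $pid, $ret, $reaped{$pid} >> 8;
```

The point of the sketch is that two independent reapers compete for the same exit status and only one can win; with AnyEvent, one way to avoid that competition is to let the event loop do the reaping through an AnyEvent->child watcher instead of calling waitpid() separately.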
Aaaaand this is why I could not reproduce it until now - I've always
(well, okay, ever since it was introduced, I'm a bit older than that)
had apt *not* automatically install recommended packages... And here
I thought I was going crazy... thanks, now it's, mm, let's say, easier
to reproduce!

> AnyEvent's doc [1] mentions that the framework installs (or just might?) its
> own SIGCHLD handler. Maybe there are just too many handlers for SIGCHLD?

Aaaaaaand this is why I should never be let near a keyboard... So how
many years have I been doing Perl programming now?... and I managed to
forget about AnyEvent installing its own SIGCHLD handler? Great. Just great.

Thanks an *awful* lot for your perseverance, your analysis, and basically
doing my own debugging work for me! Expect another patch soon.

G'luck,
Peter

-- 
Peter Pentchev  r...@ringlet.net r...@debian.org p...@storpool.com
PGP key:        http://people.FreeBSD.org/~roam/roam.key.asc
Key fingerprint 2EE7 A7A5 17FC 124C F115 C354 651E EFB0 2527 DF13