Simon Marlow wrote:
> > The fine points of Unix signal semantics have always been somewhat
> > mysterious to me.  However, after digging around in man pages
> > for a while,
> > I have a theory as to what's going "wrong"...
>
> Yes, your diagnosis looks very plausible.
>
> The right way, I believe, to handle this in your signal handler is to
> call getAnyProcessStatus repeatedly until it doesn't return any more
> children (not forgetting to use the non-blocking version, ie. the first
> arg should be False).  Does that help?

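For reference, the loop suggested in the quoted text might look roughly like the sketch below. It is only an illustration, not the code from my program: it assumes the present-day `unix` package (`System.Posix.Process`) rather than the Posix library of the time, the names `pollChild` and `reapAll` are made up, and it assumes that the "no child processes" error (ECHILD) surfaces as a does-not-exist IOError.

```haskell
import Control.Concurrent (threadDelay)
import Control.Exception (throwIO, try)
import System.IO.Error (isDoesNotExistError)
import System.Posix.Process (ProcessStatus, forkProcess, getAnyProcessStatus)
import System.Posix.Types (ProcessID)

-- | Poll once for a terminated child without blocking (first arg False).
-- Assumption: when the process has no children at all, the underlying
-- wait call fails with ECHILD and that shows up as a does-not-exist
-- IOError; it is mapped to Nothing here. Other errors are rethrown.
pollChild :: IO (Maybe (ProcessID, ProcessStatus))
pollChild = do
  r <- try (getAnyProcessStatus False False)
  case r of
    Right found -> return found
    Left e
      | isDoesNotExistError e -> return Nothing
      | otherwise             -> throwIO e

-- | Reap every child that has already terminated, oldest result first.
-- Stops as soon as a poll reports nothing to collect.
reapAll :: IO [(ProcessID, ProcessStatus)]
reapAll = go []
  where
    go acc = do
      r <- pollChild
      case r of
        Just child -> go (child : acc)
        Nothing    -> return (reverse acc)

main :: IO ()
main = do
  -- Start two children that exit straight away.
  _ <- forkProcess (return ())
  _ <- forkProcess (return ())
  threadDelay 500000          -- crude wait so both have time to exit
  reaped <- reapAll
  putStrLn ("reaped " ++ show (length reaped) ++ " children")
```

In a real program `reapAll` would be called from the sigCHLD handler rather than from `main`; the point of looping is that a single signal delivery may stand for more than one terminated child.
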
I had thought of having the signal handler reap as many terminated child processes as possible, but had been concerned about a possible race condition. After you suggested that approach, I thought some more and decided that no race problem should exist.

So I've implemented multiple reaping and it does help. I no longer have any tests hang as before. (Note that I still do see the occasional "EVACUATED object entered!" error.)

However, the implementation turned out to be surprisingly complex. The first issue I confronted is that the get*ProcessStatus routines return an error rather than "nothing" if there is no candidate child process. (The GHC routines simply reflect the system call semantics.) This required me to maintain a count of child processes so I could avoid trying to reap nonexistent children. Fortunately, adding the counting to my monad was not difficult.

But having the signal handler avoid subsequent reaping was insufficient. I apparently had instances of the signal handler for which there were no children to reap. What I figure must be happening is something like the following. A sigCHLD signal comes in. A signal handler instance is created. But before it is run, sigCHLD is unblocked. A second sigCHLD signal comes in. A second signal handler instance is created. One of the signal handler instances runs and reaps both terminated children. The second signal handler instance runs and finds nothing to reap.

I bullet-proofed my logic, so that the signal handler conditions even its first getAnyProcessStatus call on a nonzero child count. (By the way, I had to be careful to lock access to the child count appropriately.) The result now seems to work properly.

Two questions:

1. Though I'm now immune to seeing "too many" sigCHLD signals, I still rely on seeing "enough" of them. Can you think of any way that a signal could go unnoticed in my scheme described above?

2. Is my supposition true, that sigCHLD is unblocked *before* invoking the signal handler?
If so, I think this subtle but important semantic difference between GHC RTS and POSIX signal handling should be documented. Are there other differences?

Dean

_______________________________________________
Glasgow-haskell-bugs mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs