On Sun, Mar 15, 2020 at 03:36:39PM +0700, Robert Elz wrote:
> The NetBSD shell - and I suspect many others, perhaps all others -
> waits for any terminated children (reaps them from the kernel) more or
> less as soon as they exit, then remembers the info in the internal jobs
> table for later reporting of status via "wait $pid" or "jobs" (or just
> an interactive prompt) at the appropriate time.
> This has the advantage that the kernel's process table has zombie
> processes removed quickly, and isn't cluttered with trash lying around
> because some script is running lots of background processes without
> waiting for any of them - the only cost is (or seems to be) some memory
> in the shell's jobs table (which the standard allows us to bound, if we
> desire).
>
> However, I have been pondering a somewhat weird case (or more
> correctly, possibility, as I have never actually seen it happen).
>
> Consider
>	bg-process-1 & PID1=$!
>	long-running-monster-fg-process
>	bg-process-2 & PID2=$!
>
> "long-running-monster-fg-process" is something like a complete system
> build, including lots of add-on utilities (imagine, gnome and all that
> goes with it, and kde, and all associated with that ...) - it doesn't
> really matter, except that there are lots of processes being run.
> It is irrelevant whether that is lots of children from the current
> shell, or whether it is a script (or "make" or something) that simply
> takes a long time to complete.
>
> In this case, and with the shell strategy above, it is possible that
> PID1 and PID2 contain the same value.
>
> In that case, if both background processes have exited, and the script
> then does
>	wait $PID1
> what are we supposed to do? How are we to distinguish that from
>	wait $PID2
> ?
>
> Does anyone know of a shell that correctly handles this now?

About six years ago I committed something to FreeBSD sh that fixed the
storage of the exit status of the second process:
https://svnweb.freebsd.org/base?view=revision&revision=263453

The commit message also included a (slow) test:

  exit 7 & p1=$!; until exit 8 & p2=$!; [ "$p1" = "$p2" ]; do wait "$p2"; done; sleep 0.1; wait %1; echo $?

I guess I did not fix the problem of $! not being a unique identifier,
because that seemed very hard to solve for a very improbable situation.
> The only solutions I can see are to:
>
> Only ever use waitpid() with an explicit pid for the particular
> process of which we actually want the exit status, and leave all
> other completed processes as zombies until they are wanted (any of
> the other newer wait*() sys calls with similar functionality would
> do as well of course).
>
> This would mean that while a pipeline is running, we would be unable
> to report the status of earlier completed elements of the pipe while
> the final (rightmost) process is still yet to complete, which would be
> annoying (but not actually fatal to anything).
>
> It would also mean that there would be no way to retain the
>	wait -p PID -n $PID1 $PID2 ...
> command option that the NetBSD shell has, which waits for any one of
> the specified jobs to finish (any that has already finished, in which
> case there is no actual wait and a random one of the completed jobs is
> selected), or the next of them which happens to finish, if none were
> already done. (The "-p PID" option names a variable in which the ID of
> the job that finished is placed - the same as the arg string if there
> is one; with no pid args, the pid of the job (what $! was when the job
> started). The exit status of the wait command is the status of that
> job.) That relies on being able to wait for any child to exit.
>
> Or:
>
> We always use wait*() with the WNOWAIT flag when waiting for any
> random child to complete, and then wait() again (without WNOWAIT, but
> with the explicit pid) when we want to clean up the jobs table entry
> for that job.
>
> The problem with this (aside from WNOWAIT in the standard only
> applying to waitid() - in practice I suspect that all of the wait*()
> sys calls that take a flags arg implement the same set of flags -
> certainly NetBSD does) is that I see no way to prevent that child
> process being returned again and again every time we do an anonymous
> wait*() system call.
> That is, I see no way to wait for something not previously ever waited
> upon, which is what we would need here - the kernel would need a bunch
> more mechanism, and a new WXXXXX flag would be required. NetBSD has
> WNOZOMBIE ("Ignore zombies") which only waits for some running process
> to change status - but that's no use; we want to get status from
> processes that have already exited (ie: zombies) if there are any -
> just only once.

I agree that WNOWAIT is not helpful here because the same child process
may be returned over and over. What is necessary here is a notification
mechanism for process termination that is not a wait*() function. Such
a mechanism does not seem to exist in POSIX but exists on various
operating systems:

* fully queued SIGCHLD with siginfo (works on FreeBSD but not Linux)
* kqueue with EVFILT_PROC (various BSD systems)
* proc connector (Linux)
* whatever pwait(1) uses (Solaris and related systems)

By the way, most of these mechanisms also allow waiting for an
unrelated process to terminate.

> Of course, both of these "solutions" mean keeping zombies in the
> kernel process table - that's the point, as that prevents the kernel
> from re-using the process ID.

Yes, although keeping them is only necessary for the most recent
background process, or for processes for which $! has been referenced.

> Or:
>
> Every time the shell forks, before running any of the subshell code,
> it could check whether the PID it was assigned is a PID that is still
> "live" in the jobs table, and if so, it simply exits without doing
> anything. Simultaneously the parent is doing the same check using the
> new child's PID. Since the two are simply forks() of the parent, the
> data structures they see are identical - both child and parent will
> answer that check the same way. When the check reports "still in use"
> the child simply exits (as mentioned); the parent simply does a
> waitpid(PID, ...)
> to clean up that child (without ever having entered it into any data
> structures) and then forks again, and the whole process repeats.
>
> This is the solution I see with most promise, but it relies upon the
> kernel not simply assigning the same pid over and over again (even if
> there happens to be only one available unused pid to assign). To deal
> with this the parent shell would need something like a counter of
> attempts, and if we fail to get a new pid after a few attempts, give
> up and signal a fork error.

Reuse can be prevented by delaying the waitpid() on the unwanted
duplicates until a process with a unique PID has been created or a
limit has been reached. In the general case this information can be
stored as a flag in the previous job structure, so it does not allocate
unbounded memory in userspace.

> This looks kind of cumbersome and ugly to me - even though I don't
> currently see any other plausible solution to this that meets our
> goals. I'd love to hear from anyone who has (or can even imagine,
> regardless of whether it is currently implemented anywhere) a better
> solution for this issue. Or, if for some reason I am not
> understanding, this isn't even a potential (certainly it is extremely
> unlikely) problem, then why.
>
> ps: note that we don't currently have a problem with the kernel
> assigning the pid of a previously exited process which is still alive
> in the jobs table; the shell can cope with that - the issue only
> arises when that pid is communicated to the script, and then used by
> the script. A similar problem would arise if the script attempted
>	kill $PID1
> after bg-process-1 has finished (without the script realising that),
> which then ends up signalling $PID2 (the same thing) which is still
> running. Of course, a similar problem can happen here without PID2
> being involved - with the script simply signalling some unintended
> process.
> The only way of avoiding that would be to keep the zombies until the
> script has been made aware that the process is completed, after which
> it is simply a script bug if it tries to kill a process it knows is
> already complete.

A possible fix would be to add a magic variable that returns the most
recent background job's identifier in %<number> form, somewhat like $!
in that referencing it causes the job to be remembered. Scripts would
need to use the new variable instead of $!.

-- 
Jilles Tjoelker