The NetBSD shell (and I suspect many others, perhaps all others) waits
for any terminated children (reaps them from the kernel) more or less as
soon as they exit - then remembers the info in its internal jobs table
for later reporting of status via "wait $pid" or "jobs" (or just an
interactive prompt) at the appropriate time.

This has the advantage that the kernel's process table has zombie
processes removed quickly, and isn't cluttered with trash lying around
because some script is running lots of background processes without waiting
for any of them - the only cost is (or seems to be) some memory in the
shell's jobs table (which the standard allows us to bound, if we desire).
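
A rough sketch of that strategy in C, for illustration only.  The jobs
table layout and the names start_job(), reap_children() and job_status()
are invented here, they are not taken from the NetBSD sh sources; only
fork(), waitpid() and the W* macros are real.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAXJOBS 64            /* hypothetical bound on remembered jobs */

struct job {                  /* hypothetical jobs-table entry */
    pid_t pid;                /* the value the script saw in $! */
    int   done;               /* set once the child has been reaped */
    int   status;             /* raw status as returned by waitpid() */
};

static struct job jobs[MAXJOBS];
static int njobs;

/* Start a background child that just exits with `code`; remember its pid. */
pid_t start_job(int code)
{
    pid_t pid;

    if (njobs >= MAXJOBS)
        return -1;
    pid = fork();
    if (pid == 0)
        _exit(code);          /* child: stand-in for a real command */
    if (pid > 0) {
        jobs[njobs].pid = pid;
        jobs[njobs].done = 0;
        jobs[njobs].status = 0;
        njobs++;
    }
    return pid;
}

/* Reap every child that has already exited, without blocking.
 * The kernel forgets each one here; only the jobs table remembers it.
 * Returns the number of children reaped. */
int reap_children(void)
{
    int status, i, n = 0;
    pid_t pid;

    while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
        for (i = 0; i < njobs; i++) {
            if (jobs[i].pid == pid && !jobs[i].done) {
                jobs[i].done = 1;
                jobs[i].status = status;
                break;
            }
        }
        n++;
    }
    return n;
}

/* What "wait $pid" would report later: exit status, or -1 if not known. */
int job_status(pid_t pid)
{
    int i;

    for (i = 0; i < njobs; i++) {
        if (jobs[i].pid == pid && jobs[i].done) {
            if (WIFEXITED(jobs[i].status))
                return WEXITSTATUS(jobs[i].status);
            return 128 + WTERMSIG(jobs[i].status);
        }
    }
    return -1;
}
```

Once reap_children() has collected a child, a direct waitpid() on that
pid fails (ECHILD): the kernel no longer knows the process, and the pid
is available for reuse.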

However, I have been pondering a somewhat weird case (or more
correctly, possibility, as I have never actually seen it happen).

Consider

        bg-process-1 & PID1=$!
        long-running-monster-fg-process
        bg-process-2 & PID2=$!

"long-running-monster-fg-process" is something like a complete system
build, including lots of add-on utilities (imagine, gnome and all that
goes with it, and kde, and all associated with that ...) - it doesn't
really matter, except that there are lots of processes being run.
It is irrelevant whether that is lots of children of the current
shell, or whether that is a script (or "make" or something) that simply
takes a long time to complete.

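In this case, and with the shell strategy above, it is possible that
PID1 and PID2 contain the same value: bg-process-1 was reaped as soon as
it exited, so the kernel is free to hand that pid out again (typically
once enough other processes have been created for pid allocation to
wrap around).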

In that case, if both background processes have exited, and
the script then does

        wait $PID1

what are we supposed to do?   How are we to distinguish that from

        wait $PID2

?
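
To make the ambiguity concrete, here is a minimal sketch in C of a
lookup keyed on the pid alone.  The struct jobent layout and the
function find_finished() are hypothetical, made up for this
illustration.

```c
#include <stddef.h>
#include <sys/types.h>

struct jobent {               /* hypothetical jobs-table entry */
    pid_t pid;                /* the value the script saw in $! */
    int   done;               /* child already reaped */
    int   exit_status;        /* status saved when it was reaped */
};

/* Look up a finished job by pid alone, which is all "wait $pid" supplies. */
const struct jobent *find_finished(const struct jobent *tab, size_t n,
                                   pid_t pid)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (tab[i].pid == pid && tab[i].done)
            return &tab[i];   /* first match wins; a later duplicate is hidden */
    return NULL;
}
```

With two finished entries that both carry pid 1234, one saved with
status 0 and the other with status 3, every lookup of 1234 returns the
first entry, so one of the two statuses can never be reported.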

Does anyone know of a shell that correctly handles this now?


The only solutions I can see are to:

Only ever use waitpid() with an explicit pid for the particular
process of which we actually want the exit status, and leave all
other completed processes as zombies until they are wanted (any of
the other newer wait*() sys calls with similar functionality would
do as well of course).
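
A minimal sketch of that approach in C, assuming nothing beyond fork()
and waitpid(); the helper names spawn_exit() and wait_for() are
invented for the example.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits at once with `code` (stand-in for a real job). */
pid_t spawn_exit(int code)
{
    pid_t pid = fork();

    if (pid == 0)
        _exit(code);
    return pid;
}

/* Collect exactly one child, by pid.  Any other child that has exited
 * stays a zombie, so the kernel cannot hand its pid out again.
 * Returns the exit status, 128+signal, or -1 if `pid` is not our child. */
int wait_for(pid_t pid)
{
    int status;
    pid_t r;

    do {
        r = waitpid(pid, &status, 0);
    } while (r == -1 && errno == EINTR);
    if (r != pid)
        return -1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    return 128 + WTERMSIG(status);
}
```

If two children are started and the second is waited for first, the
first remains a zombie with its status intact until wait_for() is
called with its pid; a second wait_for() on the same pid then fails.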

This would mean that while a pipeline is running, we would be unable
to report status of earlier completed elements of the pipe when the
final (rightmost) process is still yet to complete, which would be annoying
(but not actually fatal to anything).

It would also mean that there would be no way to retain the

        wait -p PID -n $PID1 $PID2 ...

command option that the NetBSD shell has, which waits for any one of the
specified jobs to finish (any that was already finished, in which case
there is no actual wait and a random one of the completed jobs is selected)
or the next of them which happens to finish, if none were already done.
(The "-p PID" option names a variable in which the ID of the job that
finished is placed: the same as the arg string if there is one, or, with
no pid args, the pid of the job (what $! was when the job started).  The
exit status of the wait command is the status of that job.)   That relies
on being able to wait for any child to exit.


Or:

We always use wait*() with the WNOWAIT flag when waiting for any random
child to complete, and then wait*() again (without WNOWAIT, but with the
explicit pid) when we want to clean up the jobs table entry for that job.

The problem with this (aside from WNOWAIT in the standard only applying
to waitid() - in practice I suspect that all of the wait*() sys calls
that take a flags arg implement the same set of flags - certainly NetBSD
does) is that I see no way to prevent that child process being returned
again and again every time we do an anonymous wait*() system call.  That
is, I see no way to wait for something not previously ever waited upon,
which is what we would need here - the kernel would need a bunch more
mechanism, and a new WXXXXX flag would be required.    NetBSD has
WNOZOMBIE ("Ignore zombies") which only waits for some running process
to change status - but that's no use, we want to get status from processes
that have already exited (ie: zombies) if there are any - just only once.
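
The repeated-return behaviour can be seen with a small C sketch using
the standard waitid() with WNOWAIT.  The helper names spawn_exit(),
peek_exited() and reap() are invented for the example, and the feature
test macro is only there to request the POSIX declarations.

```c
#define _XOPEN_SOURCE 700     /* ask for waitid() and siginfo_t */
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits at once with `code` (stand-in for a real job). */
pid_t spawn_exit(int code)
{
    pid_t pid = fork();

    if (pid == 0)
        _exit(code);
    return pid;
}

/* Block until some child has exited and report it, but leave it a zombie.
 * Returns the child's pid, or -1 on error.  For a normal exit, *code is
 * the exit status; for a child killed by a signal it is the signal number. */
pid_t peek_exited(int *code)
{
    siginfo_t info;
    int r;

    do {
        r = waitid(P_ALL, 0, &info, WEXITED | WNOWAIT);
    } while (r == -1 && errno == EINTR);
    if (r == -1)
        return -1;            /* typically ECHILD: no children at all */
    if (code != NULL)
        *code = info.si_status;
    return info.si_pid;
}

/* Finally remove the zombie, by explicit pid and without WNOWAIT. */
int reap(pid_t pid)
{
    pid_t r;

    do {
        r = waitpid(pid, NULL, 0);
    } while (r == -1 && errno == EINTR);
    return r == pid ? 0 : -1;
}
```

With a single exited child, two consecutive peek_exited() calls report
that same child both times; only reap() makes it go away.  With several
zombies present there is no way here to ask for "one not yet seen".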


Of course, both of these "solutions" mean keeping zombies in the kernel
process table - that's the point, as that prevents the kernel from
re-using the process ID.


Or:

Every time the shell forks, before running any of the subshell code,
it could check whether the PID it was assigned is a PID that is still "live"
in the jobs table, and if so, it simply exits without doing anything.
Simultaneously the parent is doing the same check using the new child's
PID.   Since the two are simply copies of the same shell, made by fork(),
the data structures they see are identical - both child and parent will
answer that check the same way.   When the check reports "still in use"
the child simply exits (as mentioned), and the parent simply does a
waitpid(PID, ...) to clean up that child (without ever having entered it
into any data structs) and then forks again, and the whole process repeats.

This is the solution I see with most promise, but relies upon the kernel
not simply assigning the same pid over and over again (even if there happens
to only be one available unused pid to assign).   To deal with this the
parent shell would need something like a counter of attempts, and if we
fail to get a new pid after a few attempts, give up, and signal a fork error.
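
A sketch of that retry loop in C, under stated assumptions: the
callback type pid_in_use_fn stands for whatever jobs-table lookup the
shell really has, fork_fresh_pid() is an invented name, and
never_in_use() and always_in_use() are trivial stand-ins that exist
only so the function can be tried out.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical jobs-table query: nonzero if `pid` is still remembered. */
typedef int (*pid_in_use_fn)(pid_t pid);

/* Trivial stand-ins for that query, only for trying the function out. */
int never_in_use(pid_t pid)  { (void)pid; return 0; }
int always_in_use(pid_t pid) { (void)pid; return 1; }

/* fork(), but refuse a pid that the jobs table still remembers.
 * Returns 0 in the child, the child's pid in the parent, or -1 on error
 * (errno is EAGAIN when every attempt produced a pid still in use). */
pid_t fork_fresh_pid(pid_in_use_fn in_use, int max_tries)
{
    int attempt;
    pid_t pid;

    for (attempt = 0; attempt < max_tries; attempt++) {
        pid = fork();
        if (pid == -1)
            return -1;
        if (pid == 0) {
            /* Child: same table contents as the parent, same answer. */
            if (in_use(getpid()))
                _exit(0);
            return 0;
        }
        if (!in_use(pid))
            return pid;
        /* Parent: discard the unwanted child and try again. */
        while (waitpid(pid, NULL, 0) == -1 && errno == EINTR)
            continue;
    }
    errno = EAGAIN;
    return -1;
}
```

The attempt limit is what turns "the kernel keeps giving us the same
pid" into an ordinary fork failure instead of an endless loop.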

This looks kind of cumbersome and ugly to me - even though I don't
currently see any other plausible solution to this, that meets our goals.


I'd love to hear from anyone who has (or can even imagine, regardless of
whether it is currently implemented anywhere) a better solution for this
issue.   Or, if for some reason that I am not seeing this isn't even a
potential problem (certainly it is an extremely unlikely one), then I'd
like to hear why.

kre

ps: note that we don't currently have a problem with the kernel assigning
the pid of a previously exited process which is still alive in the
jobs table; the shell can cope with that - the issue only arises when that
pid is communicated to the script, and then used by the script.   A similar
problem would be if the script attempted

        kill $PID1

after   bg-process-1  has finished (without the script realising that)
which then ends up signalling $PID2 (the same value), which is still
running.   Of course, a similar problem can happen here, without PID2
being involved - with the script simply signalling some unintended process.
The only way of avoiding that would be to keep the zombies until the
script has been made aware that the process is completed, after which it
is simply a script bug if it tries to kill a process it knows is already
complete.
