Re: When can shells remove "known" process IDs from the list?

Robert Elz via austin-group-l at The Open Group Fri, 29 Apr 2022 11:41:38 -0700

    Date:        Fri, 29 Apr 2022 15:39:23 +0100
    From:        "Geoff Clare via austin-group-l at The Open Group" 
<austin-group-l@opengroup.org>
    Message-ID:  <20220429143923.GA22521@localhost>


Sorry, been too busy to participate here much recently, will catch up
someday soon (I hope).

  | However, today it threw a last curve ball when I was working on an
  | update to the description of set -b ...

How many shells actually implement that?

  | This conflicts with 2.9.3.1 Asynchronous Lists which says that IDs
  | remain known until:
  |
  |  1. The command terminates and the application waits for the process ID.
  |
  |  2. Another asynchronous list is invoked before "$!" (corresponding to
  |     the previous asynchronous list) is expanded in the current execution
  |     environment.

Does anyone implement that bit (#2) at all?  In a non-interactive shell it
might almost be possible, but in an interactive shell, if the job has been
removed from the list (whether or not $! has been referenced - usually it
will not have been), what is the shell supposed to do if the job stops?
Further, users (even in scripts) are allowed to use %, %-, %1 etc. to refer
to jobs; $! isn't the only way to reference one ("wait %2" should work).
I'd suggest that #2 should simply be removed.
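
To make the objection to #2 concrete, here is a minimal sketch for a
non-interactive shell.  The first asynchronous list is started and "$!" is
never expanded before the second one is started, so under item 2 the shell
would be permitted to forget the first job, yet the script still refers to it
as %1.  The variable names are made up for illustration, and the sketch
assumes a fresh shell (so the first job is job 1) that accepts job IDs in
wait without job control being enabled.

```sh
# First asynchronous list: "$!" is deliberately never expanded for it.
(sleep 1; exit 3) &

# Second asynchronous list, started before "$!" was expanded for the first.
(sleep 1; exit 4) &
second_pid=$!

# The first job is still referenced, by job ID rather than by process ID.
wait %1
first_status=$?

wait "$second_pid"
second_status=$?

printf 'job 1: %s\njob 2: %s\n' "$first_status" "$second_status"
```

Had the shell discarded the first job under item 2, the "wait %1" could only
fail with 127 (unknown job) instead of returning that job's exit status.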

But do note that the definition of the jobs command says:

        When jobs reports the termination status of a job, the shell shall
        remove its process ID from the list of those ``known in the current
        shell execution environment''; see Section 2.9.3.1 (on page 2338).

(quote from I8 Draft 2.1 -- but that text has been there forever, or seemingly).

So that's another way that an entry is removed, and this one is "shall remove",
whereas "remain known until" puts a minimum on how long the job is supposed
to remain known, but doesn't actually require removal.   For #2 that's obvious:
shells aren't required to make that optimisation (that's some academic view of
what was thought should be possible - but isn't in practice).   But for #1, if
the job isn't removed (when wait happens) then it could still be there, again,
and again, forever - even if the system uses the same pid later (days, weeks,
months later perhaps) for another job started by the same shell.

There is currently no protection of any kind against that pid reuse.   A shell
could do WNOWAIT waits, so zombies remain in the process table even though the
shell has already collected the exit status, but that's difficult to actually
code correctly, especially given the definition of how SIGCHLD works, which
as best I can tell is the only thing that would make using WNOWAIT even
conceivable.   Without that, when the shell acts like I believe most, or all,
do, and cleans up zombies ASAP, just keeping the job in its jobs table, marked
terminated, with the status ready to give back when requested, the kernel is
free to assign the reclaimed pid to any new process it likes, whenever it likes.
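
A minimal sketch of the behaviour described above, with made-up variable
names.  The child has exited (and, in a shell that cleans up zombies ASAP,
has been reaped) well before the script asks for its status, so the value
returned by the first wait comes from the shell's own job table.  What the
second wait returns depends on whether the shell removed the entry after
the first one, so no particular result is assumed for it here.

```sh
# Start a short-lived asynchronous list and remember its process ID.
(exit 7) &
pid=$!

# Give the child ample time to terminate before its status is requested.
sleep 1

# First wait: the shell hands back the status it has kept for this pid.
wait "$pid"
first=$?

# Second wait for the same pid: the result differs between shells, depending
# on whether the entry was removed after the first wait.
wait "$pid" 2>/dev/null
second=$?

printf 'first wait:  %s\nsecond wait: %s\n' "$first" "$second"
```

A shell that removes the entry after the first wait reports 127 (unknown
process ID) for the second; one that keeps the entry can report the saved
status again.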

  | My initial reaction to this was that the above quote from set -b is
  | likely a left-over from before the decision to disallow the historical
  | remove-before-prompting behaviour was made.

I doubt that -b is particularly relevant to this, other than that it provides
an alternate time at which termination status of a process can be shown.

  | However, then I spotted that the text from wait, which seems to be an
  | attempt to justify that decision, first says it was historical
  | behaviour for *interactive* shells but then talks about the problems
  | it could cause for *scripts*.  So it seems to me that the
  | justification does not stand up to scrutiny.

The justification doesn't, but for scripts I don't recall there ever
really being an issue - the removal happens when the status of jobs which
have changed status is reported just before PS1 is written, and
non-interactive shells (scripts) don't do that.

On the other hand, users of interactive shells are not in the habit of
issuing wait commands (or even jobs commands, without some reason to do so).
They expect to be told when a background job has finished (unless -b is both
implemented and set, that might require causing new prompts to appear from
time to time) and simply expect that when a job has been reported as done,
it is done, and no longer exists.

  | It also appears that dash still implements remove-before-prompting.

Does anyone not?

  | B. Allow remove-before-prompting. This would mean changing 2.9.3.1 to
  | add a third list item (for interactive shells only) and deleting the
  | above quoted text from the wait page.

This is necessary; we would be making use of the shell too difficult for
interactive users otherwise.   But there is no particular need for an
"interactive only" here: scripts can (though usually don't) use the jobs
command as well (it is a convenient way to get rid of any jobs from the
table that have finished, without knowing what they are, and without
potentially hanging waiting for something still running).
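
A minimal sketch of that use of jobs in a script, with made-up variable
names.  One job finishes quickly and one is still running when jobs is
called; the call returns immediately either way.  Whether the finished
job's process ID is still known afterwards varies between shells, so the
sketch prints that status without assuming a value.

```sh
# One job that finishes almost immediately, one that keeps running.
(exit 3) &
quick=$!
sleep 5 &
slow=$!

# Let the quick job terminate.
sleep 1

# Report changed jobs without blocking; a shell that follows the "shall
# remove" text discards the finished job here.
jobs >/dev/null

# The still-running job is unaffected and can be signalled and waited for.
kill "$slow"
wait "$slow" 2>/dev/null
slow_status=$?

# Result varies: 127 if the entry was removed, otherwise the saved status.
wait "$quick" 2>/dev/null
quick_status=$?

printf 'slow:  %s\nquick: %s\n' "$slow_status" "$quick_status"
```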

Note that the jobs command (in Rationale, so not normative) also
says:

        In an early proposal, a -n option was included to ``Display the
        status of jobs that have changed, exited, or stopped since the last
        status report''. It was removed because the shell always writes
        any changed status of jobs before each prompt.

where what is relevant here is the final sentence.   I don't recall where that
is actually stated to happen, but I think it is something like "as if by the
jobs command", which would then bring in the "shall remove" requirement for
jobs which have been reported as finished.   This only happens in
interactive shells (though I suppose a script could do "set -b"), but there's
no need to specifically mention this in 2.9.3.1.

While you're considering all of this, you might want to also consider what
is intended to happen if a script does

        trap '' CHLD

and how that is supposed to interact with maintenance of the jobs command,
the wait command, and all else related.
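
A minimal sketch of the kind of script in question, with a made-up variable
name.  It only poses the question: the outcome depends on whether the shell
lets the trap change the real disposition of SIGCHLD, so no particular
result is assumed.

```sh
# Ask for SIGCHLD to be ignored.
trap '' CHLD

# Start an asynchronous list and try to collect its status as usual.
(exit 5) &
pid=$!
wait "$pid" 2>/dev/null
status=$?

# If the trap has no effect on how the shell tracks its children, this is
# the child's exit status; if SIGCHLD really is ignored it may not be.
printf 'wait returned: %s\n' "$status"

# Restore the default action for the rest of the script.
trap - CHLD
```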

FWIW, while we allow that (or anything else the shell wants to do with SIGCHLD
traps) nothing the user does has any impact at all upon the disposition of
SIGCHLD signals, and regardless of what the standard says, if SIGCHLD is
ignored on shell startup, it isn't for very long after (nothing works properly
in a shell if SIGCHLD is ignored).

And last, also in this area, is the question of stopped jobs and the wait
command, and how those two are intended to interact.

kre
