Date: Fri, 29 Apr 2022 15:39:23 +0100
From: "Geoff Clare via austin-group-l at The Open Group" <austin-group-l@opengroup.org>
Message-ID: <20220429143923.GA22521@localhost>

Sorry, been too busy to participate here much recently, will catch up
someday soon (I hope).

| However, today it threw a last curve ball when I was working on an
| update to the description of set -b ...

How many shells actually implement that?

| This conflicts with 2.9.3.1 Asynchronous Lists which says that IDs
| remain known until:
|
| 1. The command terminates and the application waits for the process ID.
|
| 2. Another asynchronous list is invoked before "$!" (corresponding to
|    the previous asynchronous list) is expanded in the current execution
|    environment.

Does anyone implement that bit (#2) at all?

In a non-interactive shell it might almost be possible, but in an
interactive shell, if the job isn't in the list (whether $! has been
referenced or not - usually it will not have been) because it has been
removed, what is the shell supposed to do if the job stops?

Further, users (even in scripts) are allowed to use % %- %1 etc to refer
to jobs, $! isn't the only way to reference one ("wait %2" should work).

I'd suggest that #2 should simply be removed.

But do note that the definition of the jobs command says:

    When jobs reports the termination status of a job, the shell shall
    remove its process ID from the list of those ``known in the current
    shell execution environment''; see Section 2.9.3.1 (on page 2338).

(quote from I8 Draft 2.1 -- but that text has been there forever, or
seemingly).

So that's another way that an entry is removed, and this one is "shall
remove" whereas "remain known until" puts a minimum on how long the job
is supposed to remain known, but doesn't actually require removal.
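
As a minimal illustration of item 1 in the quoted list (a sketch only, not taken from the standard or from any particular shell's documentation; the variable names pid and status are purely illustrative), a script can start an asynchronous list, let it terminate, and still collect its status later because the process ID remains known until it is waited for:

```shell
# Start an asynchronous list that exits at once with a recognisable status.
(exit 3) &
pid=$!

# Give the child time to terminate before asking about it.
sleep 1

# The process ID is still "known", so wait reports the saved exit status.
wait "$pid"
status=$?
echo "status of $pid: $status"
```

In a non-interactive shell this should report status 3. What a second "wait $pid" returns after this point is the open question discussed here, so it is deliberately not shown.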

For #2 that's obvious, shells aren't required to make that optimisation
(that's some academic view of what was thought should be possible - but
isn't in practice). But for #1, if the job isn't removed (when wait
happens) then it could still be there, again, and again, forever - even
if the system uses the same pid later (days, weeks, months later perhaps)
for another job started by the same shell. There is no protection of any
kind against that currently. A shell could do WNOWAIT waits so zombies
remain in the process table even though the shell has already collected
the exit status, but that's difficult to actually code correctly,
especially given the definition of how SIGCHLD works, which as best I
can tell would have to be used, as the only thing that would make it
even conceivable to use WNOWAIT.

Without that, when the shell acts like I believe most, or all, do, and
cleans up zombies ASAP, just keeping the job in its jobs table, marked
terminated, with the status ready to give back when requested, the
kernel is free to assign the reclaimed pid to any new process it likes,
whenever it likes.

| My initial reaction to this was that the above quote from set -b is
| likely a left-over from before the decision to disallow the historical
| remove-before-prompting behaviour was made.

I doubt that -b is particularly relevant to this, other than that it
provides an alternate time at which termination status of a process can
be shown.

| However, then I spotted that the text from wait, which seems to be an
| attempt to justify that decision, first says it was historical
| behaviour for *interactive* shells but then talks about the problems
| it could cause for *scripts*. So it seems to me that the
| justification does not stand up to scrutiny.

The justification doesn't, but for scripts I don't recall there ever
really being an issue - the removal happens when the status of jobs
which have changed status is reported just before PS1 is written, and
non-interactive shells (scripts) don't do that.

On the other hand, users of interactive shells are not in the habit of
issuing wait commands (or even jobs commands, without some reason to do
so). They expect to be told when a background job has finished (without
-b being both implemented and set, that might require causing new
prompts to appear from time to time) and simply expect that when a job
has been reported as done, it is done, and no longer exists.

| It also appears that dash still implements remove-before-prompting.

Does anyone not?

| B. Allow remove-before-prompting. This would mean changing 2.9.3.1 to
| add a third list item (for interactive shells only) and deleting the
| above quoted text from the wait page.

This is necessary, we would be making use of the shell too difficult for
interactive users otherwise. But there is no particular need for an
"interactive only" here, scripts can (though usually don't) use the jobs
command as well (it is a convenient way to get rid of any jobs from the
table that have finished, without knowing what they are, and without
potentially hanging waiting for something still running).

Note that the jobs command (in Rationale, so not normative) also says:

    In an early proposal, a -n option was included to ``Display the
    status of jobs that have changed, exited, or stopped since the last
    status report''. It was removed because the shell always writes any
    changed status of jobs before each prompt.

where what is relevant here is the final sentence. I don't recall where
that is actually stated to happen, but I think it is something like "as
if by the jobs command", which then would be requiring "shall remove"
jobs which have been reported as finished from the jobs table.
This only happens in interactive shells (though I suppose a script could
do "set -b"), but there's no need to specifically mention this in
2.9.3.1.

While you're considering all of this, you might want to also consider
what is intended to happen if a script does

    trap '' CHLD

and how that is supposed to interact with maintenance of the jobs
command, the wait command, and all else related.

FWIW, while we allow that (or anything else the shell wants to do with
SIGCHLD traps), nothing the user does has any impact at all upon the
disposition of SIGCHLD signals, and regardless of what the standard
says, if SIGCHLD is ignored on shell startup, it isn't for very long
after (nothing works properly in a shell if SIGCHLD is ignored).

And last, also in this area, is the question of stopped jobs and the
wait command, and how those two are intended to interact.

kre