On 1/31/24 2:35 PM, Robert Elz wrote:
| Not quite. `new' in this sense is the opposite of `anything in the past'
| as Dale described it -- already notified and removed from the jobs list.

I guess the part about bash that I am not understanding here is how the "already notified" works. To me there are just two ways for that: either the user has done a "wait" which has collected that pid already (without -n and with no pid args, or with pid args where one of those is the pid in question, or with -n where the pid in question was the one whose status was returned), or the user/script ran the jobs command (or jobs -l) and the job in question was shown as completed. Is there some other way?
Notification after a job terminates due to a signal in a non-interactive shell -- that runs the equivalent of `jobs'. As it turns out, this was the problem with Steven Pelley's original report. I fixed one issue, but that kind of notification will leave jobs marked as notified and eligible to be removed from the jobs list.
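A minimal sketch of the situation described here, assuming it is run as a non-interactive script: a background job is killed by a signal and its status is then collected by pid. The variable names are illustrative only, and the sketch does not show the internal jobs-list bookkeeping.

```shell
#!/usr/bin/env bash
# Start a background job and terminate it with SIGTERM.
sleep 30 &
pid=$!
kill -TERM "$pid"

# Collect the status by pid. A process killed by a signal reports
# 128 + the signal number, so SIGTERM (15) gives 143.
status=0
wait "$pid" || status=$?
echo "pid $pid finished with status $status"
```

Bash may also print a "Terminated" line on stderr for the killed job; that message is the kind of notification being discussed.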
| Half the problem here is that bash aggressively marks dead jobs as being
| notified in non-interactive shells without job control enabled, and moves
| them out of the jobs table.

That might be more than half the problem, it might be the entire problem.
It seems to be in this case. It's a good thing it's limited to processes that terminate due to signals; a bad thing that processes terminating due to signals was the entire subject of the original report.
| but if you
| do, or if you use wait -n with pid/job arguments (which you've presumably
| saved yourself) you're going to need slightly different semantics than we
| have now to answer that reliably. And that will probably need a new option.

That's a pity, particularly since the current semantics don't seem to be useful in general.
Shoehorning pid/job arguments into the previous behavior, which only dealt with running jobs, resulted in the current semantics. I should probably have made `wait -n' with pid arguments look at terminated and notified processes, but I didn't change the `running job' semantics. Hindsight.
Since the sole issue provoking that seems to be the wait over and over policy,
It's not a policy, per se, it's behavior that has historically worked that way.
rather than "wait once, and remove completely"
POSIX semantics.
perhaps rather than a new, but different, -n like option, a better idea would be an "only once" option (ie: if the option (-r (remove) or -c (cleanup) or -o (once only)) is set, then when the wait with that option returns status, or waits until termination without returning status (in the not -n case, with no pid args, or many pid args), then the processes are completely deleted from everywhere in the shell).
Or you could use posix mode with the recent change, already in devel, since POSIX requires this behavior (but see below).
Using that option would make a changed -n safe to use in loops. If you do that, also add an option (maybe the upper case version of whatever is selected for that one, or just some other letter) to mean "don't wait" (kind of like wait(2) WNOWAIT) - which in default bash would just be a no-op (except in posix mode, apparently - whereas the -[cor] option would be a no-op in posix mode).
You're not the only one to suggest some new option(s). Only one really matters for this discussion.
If you were to do that, other shells could add the same (except in probably all of them, -[cor] would always be the default, and the other one would be the one which changes behaviour).
That's always hit or miss.
| > The one change that should be made is
| > to allow wait -n to collect processes/jobs that have already terminated.
|
| Yes, that's one of the things we're talking about. I don't have any problem
| with it, but should it take a new option to change those semantics?

Good, though I think some more thought should go into that. In another thread you said (paraphrasing) correctly, that scripts should not be relying upon bugs, and the current wait -n behaviour is a bug - that it might have been intentionally coded that way doesn't make it any less so.
Trust me, there are people on the other side of that question.
It isn't as if it was ever documented to work the way it does, or everyone would have known about it already.
You mean the behavior of `wait -n' with pid arguments, I presume. The problem with your statement is that people do know about it. The question, as above, is whether or not to avoid changing the behavior because they do.

There are two things that we could change:

1. wait -n needs to get access to the list of terminated pids (the ones that
   satisfy POSIX's "CHILD_MAX processes known in the current shell
   environment"), like wait without -n does. This can happen via a wait
   option, a shell option, or a change in behavior controlled by the
   compatibility level.

2. Some option to implement the posix-mode semantics of removing a pid from
   this list of "known processes" that has finer granularity than
   `set -o posix'. This can happen in the same way(s).
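For contrast with item 1, here is a hedged sketch of what plain `wait` with saved pids already allows: each child's status can be collected after the child has terminated. It is an illustration of existing behavior, not of any proposed option, and the variable names are made up for the example.

```shell
#!/usr/bin/env bash
# Launch three children with distinct exit codes and remember their pids.
pids=""
for code in 0 1 2; do
    ( exit "$code" ) &
    pids="$pids $!"
done

# Give every child time to terminate before any wait is issued.
sleep 1

# wait <pid> still reports the saved status of an already-terminated child.
results=""
for pid in $pids; do
    rc=0
    wait "$pid" || rc=$?
    results="${results:+$results }$rc"
done
echo "$results"
```

The statuses come back in launch order because the loop waits on each saved pid explicitly instead of asking for "the next job to finish".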
message was unclear about what "more like wait without -n" meant.
#1 above.
| Yeah, but we're talking about bash here. It doesn't really matter what
| the Bourne shell did; there are likely plenty of scripts that assume
| the historical bash behavior.

Really? Why? What's the point of collecting the status twice?
Who can say? But is it reasonable for wait to return a status for a pid that terminated due to a signal and displayed a status message? Or, since `jobs' lets you see the status but not capture or do anything with it, is it reasonable to allow wait to collect the status of those, too?

So you have this second list, which you need anyway to keep track of the last CHILD_MAX exit statuses. Back in 2005 I didn't want to use that much storage in the jobs list to save all these exited process statuses, and I didn't want to spend time traversing a huge jobs list to add a new one. Let's just say it was a less capable device world. Hell, the script in the original bug report that resulted in this took 1-2 hours to run. The information is there if you need it, but saving it doesn't slow normal operation down.

And bash only lazily removes pids from that second list (hash table, really), when you exceed CHILD_MAX (or the RLIMIT_NPROC limit, or the max upper bound), so you can wait for them more than once. I'm sure there are scripts that take advantage of it for some reason I can't think of.
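A small sketch of waiting for the same pid twice, assuming default (non-posix) mode. The second result is printed rather than asserted, because it depends on the mode and bash version in use.

```shell
#!/usr/bin/env bash
# A child that exits with a recognizable status.
( exit 3 ) &
pid=$!

# First wait collects the status normally.
first=0
wait "$pid" || first=$?

# Second wait on the same pid. In default mode the saved status is
# expected to still be available; under POSIX semantics the pid would
# no longer be known to the shell.
second=0
wait "$pid" || second=$?

echo "first=$first second=$second"
```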
Maybe a better discussion, and potential change, would be about whatever, other than the use of the wait or jobs commands, can result in a job moving out of the jobs list. If there were nothing other than those (and jobs list overflow or similar), then we'd be fine and, it seems to me now, no change to the -n operation would be needed.
See above.
| That hasn't actually been true with bash running in default mode for a
| very long time now. Bash has allowed multiple waits for the same pid for
| many years, whether or not you or I think it's a good idea or the correct
| semantics. Even if it was an accident of the implementation, and maybe you
| could say it was, we are stuck with it.

Which is why I suggested an option (just above) to turn that misfeature off. Even better perhaps might be a bash shopt.
See #2 above.

Chet
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/