OK, I see that in POSIX mode a trap on SIGCHLD will cause wait to unblock. We are still maintaining a counter of running jobs, though, so it seems to me that there could be a race condition in the following line:

    trap '((j--))' CHLD

If two processes quit in rapid succession and one trap gets preempted in the middle of ((j--)), then the count may be off. Is this possible?

I tried to test whether or not traps are mutually exclusive with the following code and got some more interesting warnings. The final count appears to suggest that there is indeed a race condition going on here, but I am unsure what "bad value in trap_list" means.

    #!/bin/bash

    count=0

    # usleep (not a POSIX utility) sleeps for $RANDOM microseconds (0-32767)
    function dummy {
        usleep $RANDOM
    }

    set -m
    # expect one increment per background job, i.e. 1000 in total
    trap ': $(( ++count ))' CHLD

    for i in {1..1000}
    do
        dummy $i &
    done

    wait
    echo $count

    $ ./trap_race
    ./trap_race: warning: run_pending_traps: bad value in trap_list[17]: 0x4536e0
    ./trap_race: warning: run_pending_traps: bad value in trap_list[17]: 0x4536e0
    ./trap_race: warning: run_pending_traps: bad value in trap_list[17]: 0x4536e0
    ./trap_race: warning: run_pending_traps: bad value in trap_list[17]: 0x4536e0
    ./trap_race: warning: run_pending_traps: bad value in trap_list[17]: 0x4536e0
    ./trap_race: warning: run_pending_traps: bad value in trap_list[17]: 0x4536e0
    ./trap_race: warning: run_pending_traps: bad value in trap_list[17]: 0x4536e0
    ./trap_race: warning: run_pending_traps: bad value in trap_list[17]: 0x4536e0
    ./trap_race: warning: run_pending_traps: bad value in trap_list[17]: 0x4536e0
    ./trap_race: warning: run_pending_traps: bad value in trap_list[17]: 0x4536e0
    983

Thanks,
-------
Elliott Forney

On Mon, Nov 5, 2012 at 5:11 PM, Dan Douglas <orm...@gmail.com> wrote:
> Hi Elliott. The behavior of wait differs depending upon whether you are in
> POSIX mode.
> Try this script, which I think does essentially what you're after
> (also here: https://gist.github.com/3911059 ):
>
>     #!/usr/bin/env bash
>
>     ${BASH_VERSION+shopt -s lastpipe extglob}
>
>     if [[ -v .sh.version ]]; then
>         builtin getconf
>         function BASHPID.get {
>             read -r .sh.value _ </proc/self/stat
>         }
>     fi
>
>     function f {
>         printf '%d: sleeping %d sec\n' "${@:1:2}" >&2
>         sleep "$2"
>
>         printf '%d: returning %d\n' "$1" "$3" >&2
>         return "$3"
>     }
>
>     function main {
>         typeset -i n= j= maxj=$(getconf _NPROCESSORS_ONLN)
>
>         set -m
>         trap '((j--))' CHLD
>
>         while ((n++<30)); do
>             f "$BASHPID" $(((RANDOM%5)+1)) $((RANDOM%2)) &
>             ((++j >= maxj)) && POSIXLY_CORRECT= wait
>         done
>
>         echo 'finished, waiting for remaining jobs...' >&2
>         wait
>     }
>
>     main "$@"
>     echo
>
>     # vim: set fenc=utf-8 ff=unix ts=4 sts=4 sw=4 ft=sh nowrap et:
>
> The remaining issues are making it work in other shells (Bash in non-POSIX
> mode agrees with ksh, but ksh doesn't agree with POSIX), and also I can't
> think of a reasonable way to retrieve the exit statuses. The status of "wait"
> is rather useless here. Otherwise I think this is the best approach, using
> SIGCHLD and relying upon the POSIX wait behavior. See here:
> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_11
>
> An issue to be aware of is that the trap will fire when any child exits
> including command/process substitutions or pipelines etc. If any are located
> within the main loop then monitor mode needs to be toggled off around them.
> --
> Dan Douglas
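For comparison, here is a minimal sketch that does not come from either message above. It sidesteps both open questions by dropping the trap-maintained counter entirely. Before starting each job it recounts running jobs with `jobs -rp`. At the end it collects each child's exit status with `wait "$pid"`, which relies on bash remembering the statuses of finished background children (POSIX only requires a shell to remember up to {CHILD_MAX} of them). The `worker` function, the limit of 4 jobs and the 0.1 s polling interval are hypothetical choices for illustration. Fractional arguments to `sleep` are a GNU/BSD extension, not POSIX.

```shell
#!/usr/bin/env bash
# Sketch: throttle background jobs without a CHLD trap or shared counter.
# Hypothetical pieces: worker(), max_jobs=4 and the 0.1 s poll interval.
# Assumes a sleep(1) that accepts fractional seconds (GNU/BSD, not POSIX).

# Hypothetical stand-in for real work: sleep $1 seconds, exit with $2.
worker() {
    sleep "$1"
    return "$2"
}

max_jobs=4
pids=()

for i in {1..10}; do
    # Recount running jobs from the shell's own job table on every pass,
    # so no trap has to decrement anything.
    while (( $(jobs -rp | wc -l) >= max_jobs )); do
        sleep 0.1
    done
    worker "0.$((RANDOM % 5))" "$((i % 2))" &
    pids+=("$!")
done

# "wait PID" returns that child's exit status even if it has already
# finished, because bash remembers the statuses of its background children.
failed=0
for pid in "${pids[@]}"; do
    if ! wait "$pid"; then
        (( ++failed ))
    fi
done

echo "failed: $failed"
```

Run as is, it should print `failed: 5`, since the odd-numbered jobs return 1. The trade-off is that this version polls, so a free slot can sit idle for roughly one polling interval. The SIGCHLD-plus-POSIX-wait approach quoted above wakes up as soon as a child exits. This sketch also does not answer whether the trap counter itself can race; it only avoids needing one.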