On Wed, Dec 05, 2018 at 04:56:21PM -0500, Jeff King wrote: > Could we just kill them all? > > I guess it's a little tricky, because $! is going to give us the pid of > each subshell. We actually want to kill the test sub-process. That takes > a few contortions, but the output is nice (every sub-job immediately > says "ABORTED ...", and the final process does not exit until the whole > tree is done):
It's trickier than that: - Nowadays our test lib translates SIGINT to exit, but not TERM (or HUP, in case a user decides to close the terminal window), which means that if the test runs any daemons in the background, then those won't be killed when the stress test is aborted. This is easy enough to address, and I've been meaning to do this in an other patch series anyway. - With the (implied) '--verbose-log' the process killed in the background subshell's SIGTERM handler is not the actual test process, because '--verbose-log' runs the test again in a piped subshell to save its output: (GIT_TEST_TEE_STARTED=done ${TEST_SHELL_PATH} "$0" "$@" 2>&1; echo $? >"$TEST_RESULTS_BASE.exit") | tee -a "$GIT_TEST_TEE_OUTPUT_FILE" That 'kill' kills the "outer" shell process. And though I get "ABORTED..." immediately and I get back my prompt right away, the commands involved in the above construct are still running in the background: $ ./t3903-stash.sh --stress=1 ^CABORTED 0.0 $ ps a |grep t3903 1280 pts/17 S 0:00 /bin/sh ./t3903-stash.sh --stress=1 1281 pts/17 S 0:00 tee -a <...>/test-results/t3903-stash.stress-0.out 1282 pts/17 S 0:00 /bin/sh ./t3903-stash.sh --stress=1 4173 pts/17 S+ 0:00 grep t3903 I see this behavior with all shells I tried. I haven't yet started to think it through why this happens :) Not implying '--verbose-log' but redirecting the test script's output, like you did in your 'stress' script, seems to work in dash, ksh, and ks93, but not in Bash v4.3 or later, where, for whatever reason, the subshell get SIGINT before the SIGTERM would arrive. While we could easily handle SIGINT in the subshell with the same signal handler as SIGTERM, it bothers me that something fundamental is wrong here. Furthermore, then we should perhaps forbid '--stress' in combination with '--verbose-log' or '--tee'. > diff --git a/t/test-lib.sh b/t/test-lib.sh > index 9b7f687396..357dead3ff 100644 > --- a/t/test-lib.sh > +++ b/t/test-lib.sh > @@ -98,8 +98,9 @@ done,*) > mkdir -p "$(dirname "$TEST_RESULTS_BASE")" > stressfail="$TEST_RESULTS_BASE.stress-failed" > rm -f "$stressfail" > - trap 'echo aborted >"$stressfail"' TERM INT HUP > + trap 'echo aborted >"$stressfail"; kill $job_pids 2>/dev/null; wait' > TERM INT HUP > > + job_pids= > job_nr=0 > while test $job_nr -lt "$job_count" > do > @@ -108,10 +109,13 @@ done,*) > GIT_TEST_STRESS_JOB_NR=$job_nr > export GIT_TEST_STRESS_STARTED GIT_TEST_STRESS_JOB_NR > > + trap 'kill $test_pid 2>/dev/null' TERM > + > cnt=0 > while ! test -e "$stressfail" > do > - if $TEST_SHELL_PATH "$0" "$@" >/dev/null 2>&1 > + $TEST_SHELL_PATH "$0" "$@" >/dev/null 2>&1 & > test_pid=$! > + if wait > then > printf >&2 "OK %2d.%d\n" > $GIT_TEST_STRESS_JOB_NR $cnt > elif test -f "$stressfail" && > @@ -124,16 +128,11 @@ done,*) > fi > cnt=$(($cnt + 1)) > done > - ) & > + ) & job_pids="$job_pids $!" > job_nr=$(($job_nr + 1)) > done > > - job_nr=0 > - while test $job_nr -lt "$job_count" > - do > - wait > - job_nr=$(($job_nr + 1)) > - done > + wait > > if test -f "$stressfail" && test "$(cat "$stressfail")" != "aborted" > then >