On Wed, Dec 05, 2018 at 04:56:21PM -0500, Jeff King wrote:
> Could we just kill them all?
> 
> I guess it's a little tricky, because $! is going to give us the pid of
> each subshell. We actually want to kill the test sub-process. That takes
> a few contortions, but the output is nice (every sub-job immediately
> says "ABORTED ...", and the final process does not exit until the whole
> tree is done):

It's trickier than that:

  - Nowadays our test lib translates SIGINT to exit, but not TERM (or
    HUP, in case a user decides to close the terminal window), which
    means that if the test runs any daemons in the background, then
    those won't be killed when the stress test is aborted.

    This is easy enough to address, and I've been meaning to do this
    in an other patch series anyway.

  - With the (implied) '--verbose-log' the process killed in the
    background subshell's SIGTERM handler is not the actual test
    process, because '--verbose-log' runs the test again in a piped
    subshell to save its output:
    
      (GIT_TEST_TEE_STARTED=done ${TEST_SHELL_PATH} "$0" "$@" 2>&1;
       echo $? >"$TEST_RESULTS_BASE.exit") | tee -a "$GIT_TEST_TEE_OUTPUT_FILE"

    That 'kill' kills the "outer" shell process.
    And though I get "ABORTED..." immediately and I get back my
    prompt right away, the commands involved in the above construct
    are still running in the background:

      $ ./t3903-stash.sh --stress=1
      ^CABORTED  0.0
      $ ps a |grep t3903
      1280 pts/17   S      0:00 /bin/sh ./t3903-stash.sh --stress=1
      1281 pts/17   S      0:00 tee -a 
<...>/test-results/t3903-stash.stress-0.out
      1282 pts/17   S      0:00 /bin/sh ./t3903-stash.sh --stress=1
      4173 pts/17   S+     0:00 grep t3903

    I see this behavior with all shells I tried.
    I haven't yet started to think it through why this happens :)

    Not implying '--verbose-log' but redirecting the test script's
    output, like you did in your 'stress' script, seems to work in
    dash, ksh, and ks93, but not in Bash v4.3 or later, where, for
    whatever reason, the subshell get SIGINT before the SIGTERM would
    arrive.
    While we could easily handle SIGINT in the subshell with the same
    signal handler as SIGTERM, it bothers me that something
    fundamental is wrong here.
    Furthermore, then we should perhaps forbid '--stress' in
    combination with '--verbose-log' or '--tee'.
    

> diff --git a/t/test-lib.sh b/t/test-lib.sh
> index 9b7f687396..357dead3ff 100644
> --- a/t/test-lib.sh
> +++ b/t/test-lib.sh
> @@ -98,8 +98,9 @@ done,*)
>       mkdir -p "$(dirname "$TEST_RESULTS_BASE")"
>       stressfail="$TEST_RESULTS_BASE.stress-failed"
>       rm -f "$stressfail"
> -     trap 'echo aborted >"$stressfail"' TERM INT HUP
> +     trap 'echo aborted >"$stressfail"; kill $job_pids 2>/dev/null; wait' 
> TERM INT HUP
>  
> +     job_pids=
>       job_nr=0
>       while test $job_nr -lt "$job_count"
>       do
> @@ -108,10 +109,13 @@ done,*)
>                       GIT_TEST_STRESS_JOB_NR=$job_nr
>                       export GIT_TEST_STRESS_STARTED GIT_TEST_STRESS_JOB_NR
>  
> +                     trap 'kill $test_pid 2>/dev/null' TERM
> +
>                       cnt=0
>                       while ! test -e "$stressfail"
>                       do
> -                             if $TEST_SHELL_PATH "$0" "$@" >/dev/null 2>&1
> +                             $TEST_SHELL_PATH "$0" "$@" >/dev/null 2>&1 & 
> test_pid=$!
> +                             if wait
>                               then
>                                       printf >&2 "OK   %2d.%d\n" 
> $GIT_TEST_STRESS_JOB_NR $cnt
>                               elif test -f "$stressfail" &&
> @@ -124,16 +128,11 @@ done,*)
>                               fi
>                               cnt=$(($cnt + 1))
>                       done
> -             ) &
> +             ) & job_pids="$job_pids $!"
>               job_nr=$(($job_nr + 1))
>       done
>  
> -     job_nr=0
> -     while test $job_nr -lt "$job_count"
> -     do
> -             wait
> -             job_nr=$(($job_nr + 1))
> -     done
> +     wait
>  
>       if test -f "$stressfail" && test "$(cat "$stressfail")" != "aborted"
>       then
> 

Reply via email to