https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100203
Bug ID: 100203
Summary: Dejagnu timeouts don't work
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: testsuite
Assignee: unassigned at gcc dot gnu.org
Reporter: jakub at gcc dot gnu.org
Target Milestone: ---

On i686-linux, the libstdc++ 29_atomics/atomic_float/wait_notify.cc testcase is miscompiled and hangs. That is tracked elsewhere, this PR is about the make check getting stuck forever when it times out.

If I
  cd i686-pc-linux-gnu/libstdc++-v3/testsuite
  make check RUNTESTFLAGS='-v -v -v conformance.exp=wait_notify.cc'
then I see
...
spawn -ignore SIGHUP /home/jakub/src/gcc/obj11/./gcc/xg++ -shared-libgcc -B/home/jakub/src/gcc/obj11/./gcc -nostdinc++ -L/home/jakub/src/gcc/obj11/i686-pc-linux-gnu/libstdc++-v3/src -L/home/jakub/src/gcc/obj11/i686-pc-linux-gnu/libstdc++-v3/src/.libs -L/home/jakub/src/gcc/obj11/i686-pc-linux-gnu/libstdc++-v3/libsupc++/.libs -B/usr/local/i686-pc-linux-gnu/bin/ -B/usr/local/i686-pc-linux-gnu/lib/ -isystem /usr/local/i686-pc-linux-gnu/include -isystem /usr/local/i686-pc-linux-gnu/sys-include -fchecking=1 -B/home/jakub/src/gcc/obj11/i686-pc-linux-gnu/./libstdc++-v3/src/.libs -fmessage-length=0 -fno-show-column -ffunction-sections -fdata-sections -g -O2 -DLOCALEDIR="."
-nostdinc++ -I/home/jakub/src/gcc/obj11/i686-pc-linux-gnu/libstdc++-v3/include/i686-pc-linux-gnu -I/home/jakub/src/gcc/obj11/i686-pc-linux-gnu/libstdc++-v3/include -I/home/jakub/src/gcc/libstdc++-v3/libsupc++ -I/home/jakub/src/gcc/libstdc++-v3/include/backward -I/home/jakub/src/gcc/libstdc++-v3/testsuite/util /home/jakub/src/gcc/libstdc++-v3/testsuite/29_atomics/atomic_float/wait_notify.cc -std=gnu++2a -pthread -fdiagnostics-plain-output ./libtestc++.a -Wl,--gc-sections -L/home/jakub/src/gcc/obj11/i686-pc-linux-gnu/libstdc++-v3/src/filesystem/.libs -lm -o ./wait_notify.exe
pid is 1403276 -1403276
pid is -1
waitres is 1403276 exp7 0 0
output is status 0
calling is_remote host
board_info build name
getting tucnak name
board_info host name
getting tucnak name
board is host, host is local
Checking pattern "sparc-*-sunos*" with i686-pc-linux-gnu
Checking pattern "alpha*-*-*" with i686-pc-linux-gnu
Checking pattern "hppa*-*-hpux*" with i686-pc-linux-gnu
Checking pattern "sparc-*-sunos*" with i686-pc-linux-gnu
Checking pattern "alpha*-*-*" with i686-pc-linux-gnu
Checking pattern "hppa*-*-hpux*" with i686-pc-linux-gnu
board_info target name
getting unix name
calling is_remote target
board_info build name
getting tucnak name
board_info host name
getting tucnak name
calling is_remote unix
board_info build name
getting tucnak name
board_info host name
getting tucnak name
board is unix, not remote
board_info target exists is_simulator
board_info unix exists name
board_info unix name
getting unix name
board_info unix exists name
board_info unix exists protocol
board_info unix protocol
getting unix protocol
call_remote load unix ./wait_notify.exe {} {}
board_info unix file_transfer
getting unix file_transfer
board_info unix connect
getting unix connect
call_remote calling unix_load
loading to unix
calling is_remote unix
board_info build name
getting tucnak name
board_info host name
getting tucnak name
board is unix, not remote
Setting LD_LIBRARY_PATH to
:/home/jakub/src/gcc/obj11/gcc:/home/jakub/src/gcc/obj11/i686-pc-linux-gnu/./libstdc++-v3/../libatomic/.libs:/home/jakub/src/gcc/obj11/i686-pc-linux-gnu/./libstdc++-v3/../libgomp/.libs:/home/jakub/src/gcc/obj11/i686-pc-linux-gnu/./libstdc++-v3/src/.libs::/home/jakub/src/gcc/obj11/gcc:/home/jakub/src/gcc/obj11/i686-pc-linux-gnu/./libstdc++-v3/../libatomic/.libs:/home/jakub/src/gcc/obj11/i686-pc-linux-gnu/./libstdc++-v3/../libgomp/.libs:/home/jakub/src/gcc/obj11/i686-pc-linux-gnu/./libstdc++-v3/src/.libs
Execution timeout is: 300
calling is_remote unix
board_info build name
getting tucnak name
board_info host name
getting tucnak name
board is unix, not remote
remote_spawn is local
board_info unix exists name
board_info unix name
getting unix name
spawning command ./wait_notify.exe
spawn [open ...]
setting board_info(unix,fileid) to exp13
board_info unix exists name
board_info unix name
getting unix name
board_info unix exists name
board_info unix exists protocol
board_info unix protocol
getting unix protocol
call_remote wait unix 300
board_info unix file_transfer
getting unix file_transfer
board_info unix connect
getting unix connect
call_remote calling standard_wait
board_info target exists gcc,timeout
board_info target exists gcc,timeout
board_info unix fileid
getting unix fileid
==== WARNING: program timed out.
board_info unix exists name
board_info unix name
getting unix name
board_info unix exists name
board_info unix exists protocol
board_info unix protocol
getting unix protocol
call_remote close unix
board_info unix connect
getting unix connect
call_remote calling standard_close
board_info unix exists fileid
board_info unix fileid
getting unix fileid
Closing the remote shell exp13
board_info unix exists fileid_origid
board_info unix fileid_origid
getting unix fileid_origid
doing kill, pid is 1403285 1403286
pid is 1403285 1403286

Now, 1403285 process is the wait_notify.exe that is stuck and 1403286 is a cat process that dejagnu?
seems to pipe the output of the process through for some reason.

dejagnu remote.exp seems to run
  sh -c "exec > /dev/null 2>&1 && (kill -2 -1403285 1403286 || kill -2 1403285 1403286)"
and
  sh -c "exec > /dev/null 2>&1 && sleep 5 && (kill -15 -1403285 1403286 || kill -15 1403285 1403286) && sleep 5 && (kill -9 -1403285 1403286 || kill -9 1403285 1403286) && sleep 5"

The problem, I think, is that $pid contains more than one pid. If I run the kill command manually and without stderr redirection, I get:
  kill -2 -1403285 1403286; echo $?
  sh: kill: (-1403285) - No such process
  0
and similarly for -15 or -9. Afterwards both processes are still there:
  1403285 pts/23 S+ 0:00 ./wait_notify.exe
  1403286 pts/23 Z+ 0:00 [cat] <defunct>

While the kill man page says that when multiple processes are specified and there is just partial success, 64 should be returned rather than 0, that is not what is happening for me. Because the first kill exits with 0, the || fallback that would signal 1403285 directly never runs.

So, I wonder if
    if { $pid > 0 } {
        # Tcl has no kill primitive, so we have to execute an external
        # command in order to kill the process.
        verbose "doing kill, pid is $pid"
        # Prepend "-" to generate the "process group ID" needed by
        # kill.
        set pgid "-$pid"
        # Send SIGINT to give the program a better chance to interrupt
        # whatever it might be doing and react to stdin closing.
        # eg, in case of GDB, this should get it back to the prompt.
        exec sh -c "exec > /dev/null 2>&1 && (kill -2 $pgid || kill -2 $pid)"

        # If the program doesn't exit gracefully when stdin closes,
        # we'll need to kill it. But only do this after 'wait'ing a
        # bit, to avoid killing the wrong process in case of a
        # PID-reuse race. The extra sleep at the end is there to give
        # time to kill $exec_pid without having _that_ be subject to a
        # PID reuse race.
        set secs 5
        set sh_cmd "exec > /dev/null 2>&1"
        append sh_cmd " && sleep $secs && (kill -15 $pgid || kill -15 $pid)"
        append sh_cmd " && sleep $secs && (kill -9 $pgid || kill -9 $pid)"
        append sh_cmd " && sleep $secs"
        set exec_pid [exec sh -c "$sh_cmd" &]
    }
shouldn't be changed, so that if $pid contains more than one number, instead of doing one (kill -SIGNUM $pgid || kill -SIGNUM $pid) it will do a separate kill -SIGNUM -$pid || kill -SIGNUM $pid for each of the pids in the list.
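
For illustration, the per-pid behaviour suggested above could look roughly like this in plain sh. This is only a sketch under assumptions: kill_each is a hypothetical helper name, not something that exists in DejaGnu's remote.exp, and it uses kill -s with signal names and a -- separator (chosen here for portability) instead of the numeric -2/-15/-9 form in the original command.

```shell
#!/bin/sh
# Hypothetical helper, not part of DejaGnu: signal every pid in the list
# on its own, so one failing target cannot mask the others.
# Usage: kill_each SIGNAME PID...
kill_each() {
    sig=$1
    shift
    rc=0
    for pid in "$@"; do
        # Try the process group first ("-PID"), then fall back to the
        # single process, mirroring (kill -SIG $pgid || kill -SIG $pid).
        kill -s "$sig" -- "-$pid" 2>/dev/null ||
            kill -s "$sig" "$pid" 2>/dev/null ||
            rc=1
    done
    # Non-zero if at least one pid could not be signalled either way.
    return "$rc"
}
```

With the pids from the log above, the first step would then be kill_each INT 1403285 1403286, followed after the sleeps by the same call with TERM and KILL. Each pid gets its own fallback, so a missing process group for 1403285 no longer prevents the plain kill of 1403285.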