Re: Script command under NetBSD-current
On Wed, 16 Jun 2021, Taylor R Campbell wrote:

> This indicates a bug in the script(1) program -- it calls exit() in a
> signal handler, but exit() is not async-signal-safe.

Ah nuts... I should've noticed that.

> The corresponding membar_enter in _rtld_shared_enter at the beginning
> doesn't make sense and should be removed.
>
> In particular, the order generally needs to be something like:
>
> mumblefrotz_enter:
>     atomic_r/m/w(lock stuff);
>     membar_enter();
>
> body of critical section;
>
> mumblefrotz_exit:
>     membar_exit();
>     atomic_r/m/w(lock stuff);
>
> Putting another membar_enter _before_ the atomic_r/m/w(lock stuff) in
> mumblefrotz_enter doesn't really do anything.

Thanks for this. I'll keep it in mind.

-RVP
Re: Script command under NetBSD-current
> Date: Mon, 14 Jun 2021 20:01:49 +
> From: RVP
>
> #0 0x7f7fbfa0ab3a in ___lwp_park60 () from /usr/libexec/ld.elf_so
> #1 0x7f7fbfa0265d in _rtld_exclusive_enter () from /usr/libexec/ld.elf_so
> #2 0x7f7fbfa03125 in _rtld_exit () from /usr/libexec/ld.elf_so
> #3 0x79097fb6bb1f in __cxa_finalize () from /usr/lib/libc.so.12
> #4 0x79097fb6b73d in exit () from /usr/lib/libc.so.12
> #5 0x01401771 in done ()
> #6 0x01401853 in finish ()
> #7

This indicates a bug in the script(1) program -- it calls exit() in a
signal handler, but exit() is not async-signal-safe.

script(1) should be changed to use only async-signal-safe functions in
its signal handlers -- e.g., by just setting a flag in the signal
handler and either handling EINTR after every blocking syscall or
running with signals masked except during a pselect/pollts loop.

I don't know why it's different in netbsd-9 and current, but it was
broken in netbsd-9 before, and there were some changes to some of the
logic which could trigger race conditions differently in current.

> Date: Tue, 15 Jun 2021 08:15:12 +
> From: RVP
>
> The small patch below fixes it for me.
>
> --- START PATCH ---
> --- libexec/ld.elf_so.orig/rtld.c 2020-09-22 00:41:27.0 +
> +++ libexec/ld.elf_so/rtld.c 2021-06-15 08:11:34.301709238 +
> @@ -1750,6 +1750,8 @@
>  	sigdelset(&blockmask, SIGTRAP); /* Allow the debugger */
>  	sigprocmask(SIG_BLOCK, &blockmask, mask);
>
> +	membar_enter();

This may change some timing with the effect of rejiggering a race
condition, but it doesn't meaningfully affect the semantics, and
certainly won't prevent a deadlock from calling exit in the signal
handler if it interrupts lazy symbol binding.

The corresponding membar_enter in _rtld_shared_enter at the beginning
doesn't make sense and should be removed.

In particular, the order generally needs to be something like:

mumblefrotz_enter:
    atomic_r/m/w(lock stuff);
    membar_enter();

body of critical section;

mumblefrotz_exit:
    membar_exit();
    atomic_r/m/w(lock stuff);

Putting another membar_enter _before_ the atomic_r/m/w(lock stuff) in
mumblefrotz_enter doesn't really do anything.
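For illustration, the suggestion above for script(1) (set a flag in the
signal handler, and keep signals masked except during a pselect loop)
might look roughly like the sketch below. This is not script(1)'s actual
code: got_signal, on_signal, install_flag_handler and wait_for_input are
hypothetical names, and the real program would need to apply the same
idea to each of its handlers.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical flag: the only thing the handler ever touches. */
static volatile sig_atomic_t got_signal = 0;

/* Async-signal-safe handler: no exit(), no stdio, just a store. */
static void on_signal(int signo)
{
    (void)signo;
    got_signal = 1;
}

/*
 * Install the handler for signo and block signo in the normal signal
 * mask.  The previous mask is saved in *unblocked so that pselect()
 * can restore it temporarily.  Returns 0 on success, -1 on error.
 */
int install_flag_handler(int signo, sigset_t *unblocked)
{
    struct sigaction sa;
    sigset_t block;

    sa.sa_handler = on_signal;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, NULL) == -1)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, signo);
    return sigprocmask(SIG_BLOCK, &block, unblocked);
}

/*
 * Wait until fd is readable or the flagged signal has arrived.
 * Returns 1 if fd is readable, 0 if the signal was seen, -1 on error.
 * The signal can only be delivered inside pselect(), so checking the
 * flag at the top of the loop cannot race with the handler.
 */
int wait_for_input(int fd, const sigset_t *unblocked)
{
    fd_set rfds;
    int n;

    assert(unblocked != NULL);
    for (;;) {
        if (got_signal)
            return 0;   /* caller cleans up and exits normally */
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        n = pselect(fd + 1, &rfds, NULL, NULL, NULL, unblocked);
        if (n > 0)
            return 1;
        if (n == -1 && errno != EINTR)
            return -1;
        /* EINTR: the handler ran; loop around and test the flag. */
    }
}
```

With this shape, the main loop calls exit() itself once wait_for_input
returns 0, so exit() never runs in signal context and cannot interrupt
lazy symbol binding while the rtld lock is held.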
Re: Script command under NetBSD-current
On Tuesday 15 Jun 2021 08:15:12 RVP wrote:
> On Mon, 14 Jun 2021, Dave Tyson wrote:
> > NetBSD cruncher2.anduin.org.uk 9.99.83 NetBSD 9.99.83 (GENERIC) #2: Tue
> > Jun 8 19:42:49 GMT 2021
> > r...@cruncher2.anduin.org.uk:/usr/obj/sys/arch/amd64/compile/GENERIC amd64
> >
> > If you issue a script command, run a process which takes a little while
> > and then an exit command to close the script file and return to a command
> > prompt. viz:
> >
> > cd /usr/pkgsrc
> > script /tmp/ll
> > cvs update -dP
> > exit
> >
> > what happens is you don't get a command prompt - it's like the script
> > command freezes and doesn't return. Issuing a ps shows the script
> > processes:
>
> The small patch below fixes it for me.
>
> --- START PATCH ---
> --- libexec/ld.elf_so.orig/rtld.c 2020-09-22 00:41:27.0 +
> +++ libexec/ld.elf_so/rtld.c 2021-06-15 08:11:34.301709238 +
> @@ -1750,6 +1750,8 @@
>  	sigdelset(&blockmask, SIGTRAP); /* Allow the debugger */
>  	sigprocmask(SIG_BLOCK, &blockmask, mask);
>
> +	membar_enter();
> +
>  	for (;;) {
>  		if (atomic_cas_uint(&_rtld_mutex, 0, locked_value) == 0) {
>  			membar_enter();
> --- END PATCH ---
>
> -RVP

Thanks for the fast response and fix. I can confirm the patch fixes the
problem. Could this be committed, please?

Cheers,
Dave
Re: Script command under NetBSD-current
On Mon, 14 Jun 2021, Dave Tyson wrote:

> NetBSD cruncher2.anduin.org.uk 9.99.83 NetBSD 9.99.83 (GENERIC) #2: Tue
> Jun 8 19:42:49 GMT 2021
> r...@cruncher2.anduin.org.uk:/usr/obj/sys/arch/amd64/compile/GENERIC amd64
>
> If you issue a script command, run a process which takes a little while
> and then an exit command to close the script file and return to a command
> prompt. viz:
>
> cd /usr/pkgsrc
> script /tmp/ll
> cvs update -dP
> exit
>
> what happens is you don't get a command prompt - it's like the script
> command freezes and doesn't return. Issuing a ps shows the script
> processes:

The small patch below fixes it for me.

--- START PATCH ---
--- libexec/ld.elf_so.orig/rtld.c 2020-09-22 00:41:27.0 +
+++ libexec/ld.elf_so/rtld.c 2021-06-15 08:11:34.301709238 +
@@ -1750,6 +1750,8 @@
 	sigdelset(&blockmask, SIGTRAP); /* Allow the debugger */
 	sigprocmask(SIG_BLOCK, &blockmask, mask);

+	membar_enter();
+
 	for (;;) {
 		if (atomic_cas_uint(&_rtld_mutex, 0, locked_value) == 0) {
 			membar_enter();
--- END PATCH ---

-RVP
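For comparison with the patch above, here is a minimal sketch of the
conventional ordering for a compare-and-swap lock: the barrier comes
after the atomic operation on entry and before it on exit. It uses
portable C11 atomics as stand-ins for NetBSD's atomic_cas_uint,
membar_enter and membar_exit, and the fences are only an approximation
of those primitives' semantics. The names demo_mutex, demo_try_enter,
demo_enter and demo_exit are hypothetical, and the real rtld code parks
the thread rather than spinning.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical lock word: 0 means unlocked, nonzero means held. */
static atomic_uint demo_mutex;

/* One acquisition attempt.  Returns 1 on success, 0 if already held. */
int demo_try_enter(unsigned locked_value)
{
    unsigned expected = 0;

    /* 1. Atomic read/modify/write on the lock word. */
    if (!atomic_compare_exchange_strong_explicit(&demo_mutex, &expected,
            locked_value, memory_order_relaxed, memory_order_relaxed))
        return 0;

    /* 2. Barrier AFTER the atomic op (the role membar_enter plays). */
    atomic_thread_fence(memory_order_acquire);
    return 1;
}

/* Spin until the lock is acquired.  A real implementation would sleep. */
void demo_enter(unsigned locked_value)
{
    while (!demo_try_enter(locked_value))
        continue;
}

void demo_exit(void)
{
    /* Releasing a lock that is not held is a caller bug. */
    assert(atomic_load_explicit(&demo_mutex, memory_order_relaxed) != 0);

    /* 1. Barrier BEFORE the atomic op (the role membar_exit plays). */
    atomic_thread_fence(memory_order_release);

    /* 2. Atomic store that makes the lock available again. */
    atomic_store_explicit(&demo_mutex, 0, memory_order_relaxed);
}
```

A fence placed before the compare-and-swap in demo_try_enter would not
order anything the critical section depends on, since the lock has not
been taken at that point.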
Re: Script command under NetBSD-current
On Mon, 14 Jun 2021, RVP wrote:

> The parent process is reading from the child:
>
> $ gdb -p 2655
> Attaching to process 2655
> Reading symbols from /usr/bin/script...
> (No debugging symbols found in /usr/bin/script)
> Reading symbols from /usr/lib/libutil.so.7...
> (No debugging symbols found in /usr/lib/libutil.so.7)
> Reading symbols from /usr/lib/libc.so.12...
> (No debugging symbols found in /usr/lib/libc.so.12)
> Reading symbols from /usr/libexec/ld.elf_so...
> (No debugging symbols found in /usr/libexec/ld.elf_so)
> [Switching to LWP 2655 of process 2655]
> 0x79097fa4499a in read () from /usr/lib/libc.so.12
> (gdb) bt
> #0 0x79097fa4499a in read () from /usr/lib/libc.so.12
> #1 0x01402012 in main ()
> (gdb) quit
> A debugging session is active.
>
> Inferior 1 [process 2655] will be detached.
>
> Quit anyway? (y or n) y
> Detaching from program: /usr/bin/script, process 2655
> [Inferior 1 (process 2655) detached]
> $
>
> You can kill it normally and then the child also exits.

Incorrect. The child process _does not_ exit--it hangs around in
___lwp_park60().

-RVP
Script command under NetBSD-current
I thought I noticed an issue with the script command under
NetBSD-current a few weeks ago, and I recently stumbled on it again.
This is under:

NetBSD cruncher2.anduin.org.uk 9.99.83 NetBSD 9.99.83 (GENERIC) #2: Tue
Jun 8 19:42:49 GMT 2021
r...@cruncher2.anduin.org.uk:/usr/obj/sys/arch/amd64/compile/GENERIC amd64

If you issue a script command, run a process which takes a little while
and then an exit command to close the script file and return to a
command prompt. viz:

cd /usr/pkgsrc
script /tmp/ll
cvs update -dP
exit

what happens is you don't get a command prompt - it's like the script
command freezes and doesn't return. Issuing a ps shows the script
processes:

USER  PID %CPU %MEM   VSZ  RSS TTY   STAT STARTED    TIME COMMAND
root 1250  0.0  0.0 17812 1516 pts/0 I+   5:40PM  0:00.00 script
root 1252  0.0  0.0 17924 1304 pts/0 I+   5:40PM  0:00.32 script

UID  PID PPID CPU PRI NI   VSZ  RSS WCHAN  STAT TTY      TIME COMMAND
  0 1250 1115   0  85  0 17812 1516 ttyraw I+   pts/0 0:00.00 script /tmp/ll
  0 1252 1250   0  43  0 17924 1304 parked I+   pts/0 0:00.32 script /tmp/ll

Both appear to be idle. Killing the parent process returns a command
prompt.

Anyone else seeing this?

Dave