Re: Script command under NetBSD-current

2021-06-16 Thread RVP

On Wed, 16 Jun 2021, Taylor R Campbell wrote:

> This indicates a bug in the script(1) program -- it calls exit() in a
> signal handler, but exit() is not async-signal-safe.

Ah nuts... I should've noticed that.

> The corresponding membar_enter in _rtld_shared_enter at the beginning
> doesn't make sense and should be removed.  In particular, the order
> generally needs to be something like:
>
> mumblefrotz_enter:
> atomic_r/m/w(lock stuff);
> membar_enter();
>
> body of critical section;
>
> mumblefrotz_exit:
> membar_exit();
> atomic_r/m/w(lock stuff);
>
> Putting another membar_enter _before_ the atomic_r/m/w(lock stuff) in
> mumblefrotz_enter doesn't really do anything.

Thanks for this. I'll keep it in mind.

-RVP


Re: Script command under NetBSD-current

2021-06-16 Thread Taylor R Campbell
> Date: Mon, 14 Jun 2021 20:01:49 +
> From: RVP 
> 
> #0  0x7f7fbfa0ab3a in ___lwp_park60 () from /usr/libexec/ld.elf_so
> #1 0x7f7fbfa0265d in _rtld_exclusive_enter () from /usr/libexec/ld.elf_so
> #2  0x7f7fbfa03125 in _rtld_exit () from /usr/libexec/ld.elf_so
> #3  0x79097fb6bb1f in __cxa_finalize () from /usr/lib/libc.so.12
> #4  0x79097fb6b73d in exit () from /usr/lib/libc.so.12
> #5  0x01401771 in done ()
> #6  0x01401853 in finish ()
> #7  

This indicates a bug in the script(1) program -- it calls exit() in a
signal handler, but exit() is not async-signal-safe.

script(1) should be changed to use only async-signal-safe functions in
its signal handlers -- e.g., by just setting a flag in the signal
handler and either handling EINTR after every blocking syscall or
running with signals masked except during the pselect/pollts loop.
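
For illustration, a minimal C sketch of the second approach (the handler
only sets a flag; SIGCHLD stays blocked except while sleeping in
pselect). This is not script(1)'s actual code: child_done, on_sigchld
and wait_for_child_sketch are hypothetical names, it assumes a single
child process, and it uses POSIX pselect() rather than NetBSD's pollts().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical flag; writing a volatile sig_atomic_t is async-signal-safe. */
static volatile sig_atomic_t child_done = 0;

/* The handler only records the event: no exit(), no stdio, no locks. */
static void on_sigchld(int sig)
{
    (void)sig;
    child_done = 1;
}

/*
 * Hypothetical helper: fork a child that exits at once, then wait for it
 * with SIGCHLD blocked everywhere except inside pselect().  The child is
 * reaped, and its exit status returned, from normal (non-handler) context.
 */
static int wait_for_child_sketch(void)
{
    struct sigaction sa;
    sigset_t block, orig, waitmask;
    int status, result = -1;
    pid_t pid;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, NULL) == -1)
        return -1;

    /* Block SIGCHLD first so it cannot arrive between the flag test
     * and the call to pselect(). */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &block, &orig) == -1)
        return -1;
    waitmask = orig;
    sigdelset(&waitmask, SIGCHLD);   /* unblocked only during pselect() */

    pid = fork();
    if (pid == 0)
        _exit(7);                    /* child: _exit() is async-signal-safe */
    if (pid == -1)
        goto out;

    while (!child_done) {
        /* Returns -1/EINTR once the handler has run. */
        if (pselect(0, NULL, NULL, NULL, NULL, &waitmask) == -1 &&
            errno != EINTR)
            break;
    }

    /* Ordinary context: any cleanup, even exit(), would be safe here. */
    if (waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        result = WEXITSTATUS(status);
out:
    sigprocmask(SIG_SETMASK, &orig, NULL);
    return result;
}
```

Called from main(), wait_for_child_sketch() should return 7, the child's
exit status. The point of the design is that the real work happens after
pselect() returns, outside the handler.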

I don't know why it's different in netbsd-9 and current, but it was
broken in netbsd-9 before, and there were some changes to some of the
logic which could trigger race conditions differently in current.


> Date: Tue, 15 Jun 2021 08:15:12 +
> From: RVP 
> 
> The small patch below fixes it for me.
> 
> --- START PATCH ---
> --- libexec/ld.elf_so.orig/rtld.c 2020-09-22 00:41:27.0 +
> +++ libexec/ld.elf_so/rtld.c  2021-06-15 08:11:34.301709238 +
> @@ -1750,6 +1750,8 @@
>   sigdelset(&blockmask, SIGTRAP); /* Allow the debugger */
>   sigprocmask(SIG_BLOCK, &blockmask, mask);
> 
> + membar_enter();

This may change some timing with the effect of rejiggering a race
condition, but it doesn't meaningfully affect the semantics, and
certainly won't prevent a deadlock from calling exit in the signal
handler if it interrupts lazy symbol binding.
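
To see why a barrier cannot help here: if the signal arrives while the
interrupted code already holds the rtld lock (e.g. during lazy binding),
the handler runs on that same thread, so the lock holder can never get to
release it. The C sketch below models this with a hypothetical one-word
lock (fake_rtld_lock and the other names are illustrative, not
ld.elf_so's lock code) and uses a trylock so that it reports the
deadlock instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>

/* Hypothetical stand-in for a non-recursive lock such as rtld's. */
static atomic_uint fake_rtld_lock = 0;
static volatile sig_atomic_t handler_would_block = 0;

static int fake_trylock(void)
{
    unsigned int expected = 0;
    return atomic_compare_exchange_strong(&fake_rtld_lock, &expected, 1u);
}

static void fake_unlock(void)
{
    atomic_store(&fake_rtld_lock, 0u);
}

/* Plays the role of exit() -> _rtld_exit() -> _rtld_exclusive_enter(). */
static void on_signal(int sig)
{
    (void)sig;
    if (fake_trylock())
        fake_unlock();
    else
        handler_would_block = 1;  /* a blocking enter() would wait forever */
}

/* Take the lock (as lazy binding would), then get interrupted. */
static int interrupted_while_locked(void)
{
    struct sigaction sa;
    sigset_t unblock;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);
    sigemptyset(&unblock);
    sigaddset(&unblock, SIGUSR1);
    sigprocmask(SIG_UNBLOCK, &unblock, NULL);

    fake_trylock();     /* holder: the code that is about to be interrupted */
    raise(SIGUSR1);     /* handler runs on this same thread before returning */
    fake_unlock();      /* reachable only because the handler used trylock */
    return handler_would_block;
}
```

interrupted_while_locked() returns 1: the handler finds the lock held by
the very code it interrupted. No memory barrier changes that, because the
problem is reentrancy on one thread, not ordering between CPUs.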

The corresponding membar_enter in _rtld_shared_enter at the beginning
doesn't make sense and should be removed.  In particular, the order
generally needs to be something like:

mumblefrotz_enter:
atomic_r/m/w(lock stuff);
membar_enter();

body of critical section;

mumblefrotz_exit:
membar_exit();
atomic_r/m/w(lock stuff);

Putting another membar_enter _before_ the atomic_r/m/w(lock stuff) in
mumblefrotz_enter doesn't really do anything.
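
A rough C11 analogue of that ordering, as a sketch only: it uses
<stdatomic.h> fences in place of NetBSD's membar_enter()/membar_exit(),
treating them approximately as acquire and release barriers, and
mumblefrotz_lock, shared_counter and bump() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical lock word: 0 = free, 1 = held. */
static atomic_uint mumblefrotz_lock = 0;
static int shared_counter = 0;   /* protected by mumblefrotz_lock */

static void mumblefrotz_enter(void)
{
    unsigned int expected;

    /* 1. Atomic r/m/w on the lock word ... */
    do {
        expected = 0;
    } while (!atomic_compare_exchange_weak_explicit(&mumblefrotz_lock,
        &expected, 1u, memory_order_relaxed, memory_order_relaxed));

    /* 2. ... then the barrier, so the critical section cannot be
     * reordered before the acquisition.  A fence placed before the
     * CAS instead would not keep the critical section inside the lock. */
    atomic_thread_fence(memory_order_acquire);
}

static void mumblefrotz_exit(void)
{
    /* 1. Barrier first, so the critical section cannot be reordered
     * after the release ... */
    atomic_thread_fence(memory_order_release);

    /* 2. ... then the atomic store that frees the lock. */
    atomic_store_explicit(&mumblefrotz_lock, 0u, memory_order_relaxed);
}

static int bump(void)
{
    int v;

    mumblefrotz_enter();
    v = ++shared_counter;        /* body of critical section */
    mumblefrotz_exit();
    return v;
}
```

Two successive calls to bump() return 1 and then 2, leaving the lock
word at 0.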


Re: Script command under NetBSD-current

2021-06-15 Thread Dave Tyson
On Tuesday 15 Jun 2021 08:15:12 RVP wrote:
> On Mon, 14 Jun 2021, Dave Tyson wrote:
> > NetBSD cruncher2.anduin.org.uk 9.99.83 NetBSD 9.99.83 (GENERIC) #2: Tue
> > Jun  8 19:42:49 GMT 2021
> > r...@cruncher2.anduin.org.uk:/usr/obj/sys/arch/amd64/compile/GENERIC amd64
> > 
> > If you issue a script command, run a process which takes a little while and
> > then an exit command to close the script file and return to a command
> > prompt. viz:
> > 
> > cd /usr/pkgsrc
> > script /tmp/ll
> > cvs update -dP
> > exit
> > 
> > what happens is you don't get a command prompt - it's like the script
> > command freezes and doesn't return. Issuing a ps shows the script processes:
> The small patch below fixes it for me.
> 
> --- START PATCH ---
> --- libexec/ld.elf_so.orig/rtld.c 2020-09-22 00:41:27.0 +
> +++ libexec/ld.elf_so/rtld.c  2021-06-15 08:11:34.301709238 +
> @@ -1750,6 +1750,8 @@
>   sigdelset(&blockmask, SIGTRAP); /* Allow the debugger */
>   sigprocmask(SIG_BLOCK, &blockmask, mask);
> 
> + membar_enter();
> +
>   for (;;) {
>   if (atomic_cas_uint(&_rtld_mutex, 0, locked_value) == 0) {
>   membar_enter();
> --- END PATCH ---
> 
> -RVP

Thanks for the fast response and fix. I can confirm the patch fixes the 
problem. Please can this be committed?

Cheers,
Dave 


Re: Script command under NetBSD-current

2021-06-15 Thread RVP

On Mon, 14 Jun 2021, Dave Tyson wrote:

> NetBSD cruncher2.anduin.org.uk 9.99.83 NetBSD 9.99.83 (GENERIC) #2: Tue Jun  8
> 19:42:49 GMT 2021
> r...@cruncher2.anduin.org.uk:/usr/obj/sys/arch/amd64/compile/GENERIC amd64
>
> If you issue a script command, run a process which takes a little while and
> then an exit command to close the script file and return to a command prompt.
> viz:
>
> cd /usr/pkgsrc
> script /tmp/ll
> cvs update -dP
> exit
>
> what happens is you don't get a command prompt - it's like the script command
> freezes and doesn't return. Issuing a ps shows the script processes:

The small patch below fixes it for me.

--- START PATCH ---
--- libexec/ld.elf_so.orig/rtld.c   2020-09-22 00:41:27.0 +
+++ libexec/ld.elf_so/rtld.c  2021-06-15 08:11:34.301709238 +
@@ -1750,6 +1750,8 @@
 	sigdelset(&blockmask, SIGTRAP);	/* Allow the debugger */
 	sigprocmask(SIG_BLOCK, &blockmask, mask);
 
+	membar_enter();
+
 	for (;;) {
 		if (atomic_cas_uint(&_rtld_mutex, 0, locked_value) == 0) {
 			membar_enter();
--- END PATCH ---

-RVP


Re: Script command under NetBSD-current

2021-06-14 Thread RVP

On Mon, 14 Jun 2021, RVP wrote:

> The parent process is reading from the child:
> $ gdb -p 2655
> Attaching to process 2655
> Reading symbols from /usr/bin/script...
> (No debugging symbols found in /usr/bin/script)
> Reading symbols from /usr/lib/libutil.so.7...
> (No debugging symbols found in /usr/lib/libutil.so.7)
> Reading symbols from /usr/lib/libc.so.12...
> (No debugging symbols found in /usr/lib/libc.so.12)
> Reading symbols from /usr/libexec/ld.elf_so...
> (No debugging symbols found in /usr/libexec/ld.elf_so)
> [Switching to LWP 2655 of process 2655]
> 0x79097fa4499a in read () from /usr/lib/libc.so.12
> (gdb) bt
> #0  0x79097fa4499a in read () from /usr/lib/libc.so.12
> #1  0x01402012 in main ()
> (gdb) quit
> A debugging session is active.
>
>    Inferior 1 [process 2655] will be detached.
>
> Quit anyway? (y or n) y
> Detaching from program: /usr/bin/script, process 2655
> [Inferior 1 (process 2655) detached]
> $
>
> You can kill it normally and then the child also exits.

Incorrect. The child process _does not_ exit--it hangs around
in ___lwp_park60().

-RVP


Script command under NetBSD-current

2021-06-14 Thread Dave Tyson
I thought I noticed an issue with the script command under NetBSD-current a 
few weeks ago, but recently stumbled on it again.

This is under:

NetBSD cruncher2.anduin.org.uk 9.99.83 NetBSD 9.99.83 (GENERIC) #2: Tue Jun  8 
19:42:49 GMT 2021  
r...@cruncher2.anduin.org.uk:/usr/obj/sys/arch/amd64/compile/GENERIC amd64

If you issue a script command, run a process which takes a little while and 
then an exit command to close the script file and return to a command prompt. 
viz: 

cd /usr/pkgsrc
script /tmp/ll
cvs update -dP
exit

what happens is you don't get a command prompt - it's like the script command 
freezes and doesn't return. Issuing a ps shows the script processes:

USER  PID %CPU %MEM   VSZ  RSS TTY   STAT STARTED    TIME COMMAND UID  PID PPID CPU PRI NI   VSZ  RSS WCHAN  STAT TTY      TIME COMMAND
root 1250  0.0  0.0 17812 1516 pts/0 I+    5:40PM 0:00.00 script    0 1250 1115   0  85  0 17812 1516 ttyraw I+   pts/0 0:00.00 script /tmp/ll
root 1252  0.0  0.0 17924 1304 pts/0 I+    5:40PM 0:00.32 script    0 1252 1250   0  43  0 17924 1304 parked I+   pts/0 0:00.32 script /tmp/ll

Both appear to be idle. Killing the parent process returns a command prompt.

Anyone else seeing this?

Dave