On Sun, 2015-03-01 at 20:14 -0500, Chet Ramey wrote:
> On 2/27/15 12:10 PM, Dave Anderson wrote:
> > 
> > This issue was first reported with respect to the crash utility,
> > which is an interactive program that uses the readline library.  
> > 
> > The problem occurs only if the crash utility is run from within 
> > an executable bash script, i.e., like so:
> > 
> >   $ cat doit
> >   crash
> >   $
> > 
> > If crash is invoked as above, the crash utility does its initialization
> > and eventually calls readline().  Then, if CTRL-z is entered, the parent 
> > bash shell itself is blocked, but the crash utility spins at 100% cpu usage.
> > Debugging it shows that the crash utility is stuck spinning in the readline
> > library's _set_tty_settings() function, where the tcsetattr() call repeatedly
> > fails with EINTR while _rl_caught_signal contains SIGTTOU.  
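A retry-on-EINTR loop around tcsetattr() never terminates if every attempt is interrupted. The C sketch below illustrates one possible mitigation, capping the number of retries; it is not readline's code. retry_eintr_bounded(), fake_op() and always_eintr() are hypothetical names, and the fakes simulate tcsetattr() failures instead of touching a real terminal. Giving up trades the spin for leaving the terminal settings unrestored.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical signature for an operation such as a tcsetattr() call:
 * returns 0 on success, or -1 with errno set on failure. */
typedef int (*tty_op_fn)(void *ctx);

/* Retry op while it fails with EINTR, but give up after max_tries
 * attempts instead of looping forever.  Returns 0 on success and -1 on
 * failure; errno is left as set by the last failed attempt. */
int retry_eintr_bounded(tty_op_fn op, void *ctx, int max_tries)
{
    assert(max_tries > 0);
    for (int i = 0; i < max_tries; i++) {
        if (op(ctx) == 0)
            return 0;
        if (errno != EINTR)
            return -1;          /* a real error: retrying will not help */
    }
    return -1;                  /* still interrupted: give up */
}

/* Fake op (hypothetical): fails with EINTR *ctx more times, then succeeds. */
int fake_op(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        errno = EINTR;
        return -1;
    }
    return 0;
}

/* Fake op (hypothetical): always fails with EINTR, like a tcsetattr()
 * that is interrupted by SIGTTOU on every attempt. */
int always_eintr(void *ctx)
{
    (void)ctx;
    errno = EINTR;
    return -1;
}
```

For example, with int n = 2, retry_eintr_bounded(fake_op, &n, 5) succeeds on the third attempt and returns 0, while retry_eintr_bounded(always_eintr, NULL, 5) returns -1 after five attempts instead of spinning.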
> > 
> > But taking the crash utility out of the picture, I can reproduce it with
> > readline-6.3.tar.gz: I simply build it with "configure; make", then go into
> > the examples subdirectory and enter "make".  If I then put the simple "rl"
> > command in a script file and do the same thing, this happens:
> > 
> >   $ cat doit
> >   ./rl
> >   $ ./doit
> >   readline$ ^Z
> >   [1]+  Stopped                 ./doit
> >   $
> >   $ top
> >   top - 12:02:33 up 23:12,  5 users,  load average: 0.37, 0.09, 0.04
> >   Tasks: 159 total,   2 running, 154 sleeping,   3 stopped,   0 zombie
> >   Cpu(s):  3.4%us, 21.6%sy,  0.0%ni, 75.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> >   Mem:   3917056k total,  3709052k used,   208004k free,    88732k buffers
> >   Swap:  4063228k total,        0k used,  4063228k free,  3049316k cached
> >   
> >     PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> >   12336 root      20   0  100m 1016  788 R 100.0  0.0   1:12.13 rl
> >       1 root      20   0 19356 1532 1216 S  0.0  0.0   0:02.89 init
> >       2 root      20   0     0    0    0 S  0.0  0.0   0:00.02 kthreadd
> >       3 root      RT   0     0    0    0 S  0.0  0.0   0:00.19 migration/0
> >   ...
> > 
> >    
> > If I attach gdb to the rl process above, it shows the same ultimate trace as
> > the spinning crash utility does:
> > 
> >   # gdb -p 12336
> >   GNU gdb (GDB) Red Hat Enterprise Linux (7.2-75.el6)
> >   Copyright (C) 2010 Free Software Foundation, Inc.
> >   License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> >   This is free software: you are free to change and redistribute it.
> >   There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> >   and "show warranty" for details.
> >   This GDB was configured as "x86_64-redhat-linux-gnu".
> >   For bug reporting instructions, please see:
> >   <http://www.gnu.org/software/gdb/bugs/>.
> >   Attaching to process 12336
> >   Reading symbols from /root/readline-6.3/examples/rl...done.
> >   Reading symbols from /lib64/libtinfo.so.5...Reading symbols from /usr/lib/debug/lib64/libtinfo.so.5.7.debug...done.
> >   done.
> >   Loaded symbols for /lib64/libtinfo.so.5
> >   Reading symbols from /lib64/libc.so.6...Reading symbols from /usr/lib/debug/lib64/libc-2.12.so.debug...done.
> >   done.
> >   Loaded symbols for /lib64/libc.so.6
> >   Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib/debug/lib64/ld-2.12.so.debug...done.
> >   done.
> >   Loaded symbols for /lib64/ld-linux-x86-64.so.2
> >   0x00000033f52dff48 in tcsetattr (fd=0, optional_actions=<value optimized out>, termios_p=0x62cb60)
> >       at ../sysdeps/unix/sysv/linux/tcsetattr.c:84
> >   84          retval = INLINE_SYSCALL (ioctl, 3, fd, cmd, &k_termios);
> >   (gdb) bt
> >   #0  0x00000033f52dff48 in tcsetattr (fd=0, optional_actions=<value optimized out>, termios_p=0x62cb60)
> >       at ../sysdeps/unix/sysv/linux/tcsetattr.c:84
> >   #1  0x0000000000406f8d in _set_tty_settings (tty=0, tiop=0x62cb60) at rltty.c:476
> >   #2  0x0000000000406fe3 in set_tty_settings (tty=<value optimized out>, tiop=<value optimized out>) at rltty.c:490
> >   #3  0x00000000004072d0 in rl_deprep_terminal () at rltty.c:688
> >   #4  0x0000000000413352 in rl_cleanup_after_signal () at signals.c:536
> >   #5  0x0000000000413731 in _rl_handle_signal (sig=20) at signals.c:232
> >   #6  0x00000000004137e5 in _rl_signal_handler (sig=<value optimized out>) at signals.c:155
> >   #7  0x0000000000415575 in rl_getc (stream=0x33f558e6c0) at input.c:480
> >   #8  0x0000000000415a60 in rl_read_key () at input.c:462
> >   #9  0x000000000040340d in readline_internal_char () at readline.c:564
> >   #10 0x00000000004037d3 in readline_internal_charloop (prompt=<value optimized out>) at readline.c:629
> >   #11 readline_internal (prompt=<value optimized out>) at readline.c:643
> >   #12 readline (prompt=<value optimized out>) at readline.c:369
> >   #13 0x00000000004025b6 in main (argc=1, argv=0x7fff0af285d8) at rl.c:149
> >   (gdb) 
> >   (gdb) c
> >   Continuing.
> >   
> >   Program received signal SIGTTOU, Stopped (tty output).
> >   0x00000033f52dff48 in tcsetattr (fd=0, optional_actions=<value optimized out>, termios_p=0x62cb60)
> >       at ../sysdeps/unix/sysv/linux/tcsetattr.c:84
> >   84          retval = INLINE_SYSCALL (ioctl, 3, fd, cmd, &k_termios);
> >   (gdb) c
> >   Continuing.
> >   
> >   Program received signal SIGTTOU, Stopped (tty output).
> >   0x00000033f52dff48 in tcsetattr (fd=0, optional_actions=<value optimized out>, termios_p=0x62cb60)
> >       at ../sysdeps/unix/sysv/linux/tcsetattr.c:84
> >   84          retval = INLINE_SYSCALL (ioctl, 3, fd, cmd, &k_termios);
> >   (gdb) c
> >   Continuing.
> >   
> >   Program received signal SIGTTOU, Stopped (tty output).
> >   0x00000033f52dff48 in tcsetattr (fd=0, optional_actions=<value optimized out>, termios_p=0x62cb60)
> >       at ../sysdeps/unix/sysv/linux/tcsetattr.c:84
> >   84          retval = INLINE_SYSCALL (ioctl, 3, fd, cmd, &k_termios);
> >   (gdb) 
> >   
> > I have tried this with several different kernel versions (on RHEL6, Fedora,
> > and RHEL7), all with the same result.  The examples above are on a RHEL6
> > 2.6.32-537.el6.x86_64 kernel.
> 
> Here's what's happening: the application calling readline (crash) gets the
> SIGTSTP while readline has control.  The readline signal handler catches
> it and tries to clean up readline's state, including restoring the terminal
> attributes.  Unfortunately, by this time, the kernel has marked the entire
> process group as a background pgrp, and disallows writing to the terminal
> (the TOSTOP setting doesn't matter in this case).  The attempt to restore
> the terminal settings generates the SIGTTOU you see.  The SIGTTOU causes
> readline to follow the signal handling code path, and the same thing
> happens again and again.
> 

I agree that this is what is happening: the signal handler loops
pathologically, trying the same operation and getting the same result.
Can you explain why the problem is intermittent?  I can reproduce it, but
not every time.


> There are a couple of things that can be done here.  The first is removing
> the shell from the equation.  Changing `rl' to `exec rl' seems to eliminate
> the problem behavior.  (The shell doesn't matter: running it from dash
> does the same thing with and without the `exec'.)  That makes me wonder
> whether the difference is whether or not the process using readline is the
> process group leader, but I can't figure out why that would make the
> difference.
> 
> Obviously, preventing readline from trying to restore the terminal settings
> will solve this problem, but that's a little drastic: any program using
> readline will then leave the terminal settings modified on SIGTSTP, which
> will cause havoc for users of shells that don't restore the terminal
> settings when a process stops or terminates due to a signal.
> 
> I will have to look at some other things.  Any ideas are welcome.
> 

I don't know much about this code, but from a high-level view of the
software's behavior, it does seem like a signal handler problem.  I'm not
sure what it would take to fix it without introducing undesired side
effects, such as the ones you describe.



