On Sun, 2015-03-01 at 20:14 -0500, Chet Ramey wrote:
> On 2/27/15 12:10 PM, Dave Anderson wrote:
> >
> > This issue was first reported with respect to the crash utility,
> > which is an interactive program that uses the readline library.
> >
> > The problem occurs only if the crash utility is run from within
> > an executable bash script, i.e., like so:
> >
> > $ cat doit
> > crash
> > $
> >
> > If crash is invoked as above, the crash utility does its initialization
> > and eventually calls readline(). Then, if CTRL-z is entered, the parent
> > bash shell itself is blocked, but the crash utility spins at 100% cpu usage.
> > Debugging it shows that the crash utility is stuck spinning in the readline
> > library's _set_tty_settings() function, where the tcsetattr() call repeatedly
> > fails with an EINTR, where _rl_caught_signal contains SIGTTOU.
> >
> > But taking the crash utility out of the picture, I can reproduce it with
> > readline-6.3.tar.gz, where I simply build it with "configure; make", then
> > go into the examples subdirectory, and enter "make". If I then put the
> > simple "rl" command in a script file, and do the same thing, this happens:
> >
> > $ cat doit
> > ./rl
> > $ ./doit
> > readline$ ^Z
> > [1]+ Stopped ./doit
> > $
> > $ top
> > top - 12:02:33 up 23:12, 5 users, load average: 0.37, 0.09, 0.04
> > Tasks: 159 total, 2 running, 154 sleeping, 3 stopped, 0 zombie
> > Cpu(s): 3.4%us, 21.6%sy, 0.0%ni, 75.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> > Mem: 3917056k total, 3709052k used, 208004k free, 88732k buffers
> > Swap: 4063228k total, 0k used, 4063228k free, 3049316k cached
> >
> >   PID USER PR NI  VIRT  RES  SHR S  %CPU %MEM   TIME+ COMMAND
> > 12336 root 20  0  100m 1016  788 R 100.0  0.0 1:12.13 rl
> >     1 root 20  0 19356 1532 1216 S   0.0  0.0 0:02.89 init
> >     2 root 20  0     0    0    0 S   0.0  0.0 0:00.02 kthreadd
> >     3 root RT  0     0    0    0 S   0.0  0.0 0:00.19 migration/0
> > ...
> >
> >
> > If I attach gdb to the rl process above, it shows the same ultimate trace as
> > the spinning crash utility does:
> >
> > # gdb -p 12336
> > GNU gdb (GDB) Red Hat Enterprise Linux (7.2-75.el6)
> > Copyright (C) 2010 Free Software Foundation, Inc.
> > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> > This is free software: you are free to change and redistribute it.
> > There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> > and "show warranty" for details.
> > This GDB was configured as "x86_64-redhat-linux-gnu".
> > For bug reporting instructions, please see:
> > <http://www.gnu.org/software/gdb/bugs/>.
> > Attaching to process 12336
> > Reading symbols from /root/readline-6.3/examples/rl...done.
> > Reading symbols from /lib64/libtinfo.so.5...Reading symbols from /usr/lib/debug/lib64/libtinfo.so.5.7.debug...done.
> > done.
> > Loaded symbols for /lib64/libtinfo.so.5
> > Reading symbols from /lib64/libc.so.6...Reading symbols from /usr/lib/debug/lib64/libc-2.12.so.debug...done.
> > done.
> > Loaded symbols for /lib64/libc.so.6
> > Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib/debug/lib64/ld-2.12.so.debug...done.
> > done.
> > Loaded symbols for /lib64/ld-linux-x86-64.so.2
> > 0x00000033f52dff48 in tcsetattr (fd=0, optional_actions=<value optimized out>, termios_p=0x62cb60)
> >     at ../sysdeps/unix/sysv/linux/tcsetattr.c:84
> > 84 retval = INLINE_SYSCALL (ioctl, 3, fd, cmd, &k_termios);
> > (gdb) bt
> > #0  0x00000033f52dff48 in tcsetattr (fd=0, optional_actions=<value optimized out>, termios_p=0x62cb60)
> >     at ../sysdeps/unix/sysv/linux/tcsetattr.c:84
> > #1  0x0000000000406f8d in _set_tty_settings (tty=0, tiop=0x62cb60) at rltty.c:476
> > #2  0x0000000000406fe3 in set_tty_settings (tty=<value optimized out>, tiop=<value optimized out>) at rltty.c:490
> > #3  0x00000000004072d0 in rl_deprep_terminal () at rltty.c:688
> > #4  0x0000000000413352 in rl_cleanup_after_signal () at signals.c:536
> > #5  0x0000000000413731 in _rl_handle_signal (sig=20) at signals.c:232
> > #6  0x00000000004137e5 in _rl_signal_handler (sig=<value optimized out>) at signals.c:155
> > #7  0x0000000000415575 in rl_getc (stream=0x33f558e6c0) at input.c:480
> > #8  0x0000000000415a60 in rl_read_key () at input.c:462
> > #9  0x000000000040340d in readline_internal_char () at readline.c:564
> > #10 0x00000000004037d3 in readline_internal_charloop (prompt=<value optimized out>) at readline.c:629
> > #11 readline_internal (prompt=<value optimized out>) at readline.c:643
> > #12 readline (prompt=<value optimized out>) at readline.c:369
> > #13 0x00000000004025b6 in main (argc=1, argv=0x7fff0af285d8) at rl.c:149
> > (gdb)
> > (gdb) c
> > Continuing.
> >
> > Program received signal SIGTTOU, Stopped (tty output).
> > 0x00000033f52dff48 in tcsetattr (fd=0, optional_actions=<value optimized out>, termios_p=0x62cb60)
> >     at ../sysdeps/unix/sysv/linux/tcsetattr.c:84
> > 84 retval = INLINE_SYSCALL (ioctl, 3, fd, cmd, &k_termios);
> > (gdb) c
> > Continuing.
> >
> > Program received signal SIGTTOU, Stopped (tty output).
> > 0x00000033f52dff48 in tcsetattr (fd=0, optional_actions=<value optimized out>, termios_p=0x62cb60)
> >     at ../sysdeps/unix/sysv/linux/tcsetattr.c:84
> > 84 retval = INLINE_SYSCALL (ioctl, 3, fd, cmd, &k_termios);
> > (gdb) c
> > Continuing.
> >
> > Program received signal SIGTTOU, Stopped (tty output).
> > 0x00000033f52dff48 in tcsetattr (fd=0, optional_actions=<value optimized out>, termios_p=0x62cb60)
> >     at ../sysdeps/unix/sysv/linux/tcsetattr.c:84
> > 84 retval = INLINE_SYSCALL (ioctl, 3, fd, cmd, &k_termios);
> > (gdb)
> >
> > I have tried this with several different kernel versions, RHEL6, Fedora, RHEL7,
> > all with the same result. The examples above are on a RHEL6 2.6.32-537.el6.x86_64
> > kernel.
>
> Here's what's happening: the application calling readline (crash) gets the
> SIGTSTP while readline has control. The readline signal handler catches
> it and tries to clean up readline's state, including restoring the terminal
> attributes. Unfortunately, by this time, the kernel has marked the entire
> process group as a background pgrp, and disallows writing to the terminal
> (the TOSTOP setting doesn't matter in this case). The attempt to restore
> the terminal settings generates the SIGTTOU you see. The SIGTTOU causes
> readline to follow the signal handling code path, and the same thing
> happens again and again.
>
I agree this is what is happening: the signal handler loops around
pathologically, trying the same operation and getting the same result.
Can you explain why the problem is intermittent? I can reproduce it, but
not always.

> There are a couple of things that can be done here. The first is removing
> the shell from the equation. Changing `rl' to `exec rl' seems to eliminate
> the problem behavior. (The shell doesn't matter: running it from dash
> does the same thing with and without the `exec'.) That makes me wonder
> whether the difference is whether or not the process using readline is the
> process group leader, but I can't figure out why that would make the
> difference.
>
> Obviously, preventing readline from trying to restore the terminal settings
> will solve this problem, but that's a little drastic: any program using
> readline will then leave the terminal settings modified on SIGTSTP, which
> will cause havoc for users of shells who don't restore the terminal
> settings when a process stops or terminates due to a signal.
>
> I will have to look at some other things. Any ideas are welcome.
>

I don't know much about this code, but from a high-level view of the
software's behavior it does seem like a signal handler problem. I am not
sure, though, what it would take to fix it without introducing the
undesired side effects you describe, or others.