On Sun, 11 Aug 2002, David Xu wrote:

> following is patch for su, I can type "suspend" and stop $$ without the
> problem you described, I have tested it under tcsh and bash, all works
> for me.
>
> --- su.c	Mon Aug 12 13:08:01 2002
> +++ su.c.new	Mon Aug 12 13:16:14 2002
> @@ -329,10 +329,13 @@
>  		default:
>  			while ((ret_pid = waitpid(child_pid, &statusp, WUNTRACED)) != -1) {
>  				if (WIFSTOPPED(statusp)) {
> -					child_pgrp = tcgetpgrp(1);
>  					kill(getpid(), SIGSTOP);
> -					tcsetpgrp(1, child_pgrp);
> -					kill(child_pid, SIGCONT);
> +					child_pgrp = getpgid(child_pid);
> +					if (tcgetpgrp(1) == getpgrp())
> +					{
> +						tcsetpgrp(1, child_pgrp);
> +						kill(child_pid, SIGCONT);
> +					}
>  					statusp = 1;
>  					continue;
>  				}

Explanation of this patch:

(1) su has shot itself in the foot using PAM.  Normally the parent shell
    waits for children and handles them when they stop.  The extra
    process for PAM is now in between the parent shell and the su shell,
    so the parent shell can't do this directly.  The above code attempts
    to relay some aspects of job control back to the parent shell.  It
    is not clear that it can do this properly without duplicating lots
    of shell-specific job control, but I think it can do this in
    principle.

    There are related problems for propagation of SIGHUP to indirect
    descendants of login shells when the shell exits.  Here at least
    there is an intermediate process that can relay the signals if
    necessary.  I think propagation of SIGHUP is automatic if the
    intermediate process doesn't exit first and it doesn't change its
    job control stuff too much, so the SIGHUP problem doesn't affect
    PAMmed applications.

(2) To relay SIGSTOP, the intermediate su just needs to stop itself.  To
    relay SIGCONT, the intermediate su needs to switch to enough of its
    child's job control environment before starting the child.
    Switching only fd 1's process group seems to be sufficient, but it
    is not easy to determine even that and the broken version got it
    wrong.  The child's environment is very shell-dependent.  Some of
    the following may depend on the initial shell being bash:

    (a) sh, csh and bash start a new process group (equal to their
        pid).  zsh stays in the process group of the intermediate su
        process.

    (b) "kill -STOP $$ ... fg" worked in most (all?) cases because fd
        1's pgrp is still the child's pgid when the child is killed in
        that way.  For zsh the child's pgid is the same as the
        intermediate shell's so the pgrps can't be different, and for
        the other shells I think the pgrp hasn't been changed because
        the child can't control it (SIGSTOP is uncatchable) and the
        kernel doesn't.
        Later, switching fd 1's pgrp back to the child's pgid works
        except possibly for zsh because it is correct and different.

    (c) "suspend ... fg" failed for several reasons.  First, something
        (presumably the child) sets fd 1's pgrp to the intermediate
        su's pgid, so tcgetpgrp(1) gives a wrong pgrp for restoring
        later.  The patch fixes this by not getting the pgrp in this
        way.  It uses getpgid(child_pid) instead.  I think this works
        for at least normal shells.

        Second, when the pgrp is restored, something (presumably the
        shell above the intermediate su, or the kernel) has already
        switched fd 1's pgrp to the child's pgid instead of to the
        intermediate su's pgid (despite the intermediate su's being
        correct at SIGSTOP time for suspend but not for kill -STOP).
        Setting fd 1's pgrp to the value that it already has is then
        fatal for reasons that I don't completely understand yet.  The
        patch avoids the problem by not doing apparently-null
        tcsetpgrp()'s.  Sending the SIGCONT seems to have no effect in
        this case, so I think the shell above the su's has already
        started both the child su and the intermediate one and this
        isn't a problem until the su's get in each other's way.
        Putting printfs in the above code seems to make the problem
        easier to debug by ensuring that they get in each other's
        way :-).

Bruce
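
PS: a rough sketch of the sort of printfs meant above, for anyone who wants to watch the pgrps change. The helper names should_restore() and dump_job_control() are hypothetical and are not in su.c. The comparison is the one the patch makes inline, pulled out into a function so that it can be looked at separately. Output goes to stderr on the assumption that fd 1 may be redirected.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper mirroring the guard in the quoted patch: hand the
 * terminal back to the child only while fd 1's foreground pgrp is still
 * our own.  If something else has already switched it, the caller should
 * skip both tcsetpgrp() and the SIGCONT.
 */
int
should_restore(pid_t tty_pgrp, pid_t own_pgrp)
{
	return (tty_pgrp != (pid_t)-1 && tty_pgrp == own_pgrp);
}

/*
 * Print the three values that the explanation keeps comparing and return
 * the decision that should_restore() would make right now.
 */
int
dump_job_control(const char *when, pid_t child_pid)
{
	pid_t tty_pgrp, own_pgrp, child_pgrp;

	assert(when != NULL);
	tty_pgrp = tcgetpgrp(1);	/* -1 if fd 1 is not our controlling tty */
	own_pgrp = getpgrp();
	child_pgrp = getpgid(child_pid);	/* -1 if child_pid is gone */

	fprintf(stderr, "%s: fd 1 pgrp %ld, own pgrp %ld, child pgid %ld\n",
	    when, (long)tty_pgrp, (long)own_pgrp, (long)child_pgrp);
	return (should_restore(tty_pgrp, own_pgrp));
}
```

Calling dump_job_control("before stop", child_pid) just before the kill(getpid(), SIGSTOP) and again just after it returns should show which of cases (b) and (c) applies, though as noted the extra output may itself change the timing.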