On Sun, 11 Aug 2002, David Xu wrote:

> following is patch for su, I can type "suspend" and stop $$ without the
> problem you described, I have tested it under tcsh and bash, all works
> for me.
>
> --- su.c      Mon Aug 12 13:08:01 2002
> +++ su.c.new  Mon Aug 12 13:16:14 2002
> @@ -329,10 +329,13 @@
>       default:
>               while ((ret_pid = waitpid(child_pid, &statusp, WUNTRACED)) != -1) {
>                       if (WIFSTOPPED(statusp)) {
> -                             child_pgrp = tcgetpgrp(1);
>                               kill(getpid(), SIGSTOP);
> -                             tcsetpgrp(1, child_pgrp);
> -                             kill(child_pid, SIGCONT);
> +                             child_pgrp = getpgid(child_pid);
> +                             if (tcgetpgrp(1) == getpgrp())
> +                             {
> +                                     tcsetpgrp(1, child_pgrp);
> +                                     kill(child_pid, SIGCONT);
> +                             }
>                               statusp = 1;
>                               continue;
>                       }

Explanation of this patch:

(1) su has shot itself in the foot using PAM.  Normally the parent shell
    waits for children and handles them when they stop.  The extra process
    for PAM is now in between the parent shell and the su shell, so the
    parent shell can't do this directly.  The above code attempts to
    relay some aspects of job control back to the parent shell.  It is
    not clear that it can do this properly without duplicating lots of
    shell specific job control, but I think it can do this in principle.

    There are related problems for propagation of SIGHUP to indirect
    descendants of login shells when the shell exits.  Here there is
    at least an intermediate process that can relay the signals
    if necessary.  I think propagation of SIGHUP is automatic if the
    intermediate process doesn't exit first and it doesn't change its
    job control stuff too much, so the SIGHUP problem doesn't affect
    PAMmed applications.

(2) To relay SIGSTOP, the intermediate su just needs to stop itself.  To
    relay SIGCONT, the intermediate su needs to switch to enough of
    its child's job control environment before starting the child.
    Switching only fd 1's process group seems to be sufficient, but
    it is not easy to determine even that and the broken version got
    it wrong.

    The child's environment is very shell-dependent.  Some of the following
    may depend on the initial shell being bash:
    (a) sh, csh and bash start a new process group (equal to their pid).
        zsh stays in the process group of the intermediate su process.
    (b) "kill -STOP $$ ... fg" worked in most (all?) cases because
        fd 1's pgrp is still the child's pgid when the child is killed
        in that way.  For zsh the child's pgid is the same as the
        intermediate shell's so the pgrps can't be different, and for
        the other shells I think the pgrp hasn't been changed because
        the child can't control it (SIGSTOP is uncatchable) and the
        kernel doesn't.  Later, switching fd 1's pgrp back to the
        child's pgid works except possibly for zsh because it is correct
        and different.

    (c) "suspend ... fg" failed for several reasons.  First, something
        (presumably the child) sets fd 1's pgrp to the intermediate
        su's pgid, so tcgetpgrp(1) gives a wrong pgrp for restoring
        later.  The patch fixes this by not getting the pgrp in this
        way.  It uses getpgid(child_pid) instead.  I think this works
        for at least normal shells.  Second, when the pgrp is restored,
        something (presumably the shell above the intermediate su, or
        the kernel) has already switched fd 1's pgrp to the child's pgid
        instead of to the intermediate su's pgid (despite the intermediate
        su's being correct at SIGSTOP time for suspend but not for
        kill -STOP).  Setting fd 1's pgrp to the value that it already
        has is then fatal for reasons that I don't completely understand
        yet.  The patch avoids the problem by not doing apparently-null
        tcsetpgrp()'s.  Sending the SIGCONT seems to have no effect in
        this case, so I think the shell above the su's has already started
        both the child su and the intermediate one and this isn't a
        problem until the su's get in each other's way.  Putting printfs
        in the above code seems to make the problem easier to debug by
        ensuring that they get in each other's way :-).
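
The two fixes described in (c) can be pulled out of the wait loop and
sketched on their own.  This is only an illustration, not code from su.c:
the helper names should_hand_back_tty() and relay_continue() and the
tty_fd parameter are made up here (the patch hard-codes fd 1), and the
return-value convention is an assumption of the sketch.

```c
#define _XOPEN_SOURCE 700	/* for getpgid() on some systems */

#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: the patch's "tcgetpgrp(1) == getpgrp()" test with
 * both values passed in.  True means the terminal still belongs to the
 * intermediate process, so handing it to the child is not a null change.
 */
bool
should_hand_back_tty(pid_t tty_pgrp, pid_t own_pgrp)
{
	return (tty_pgrp == own_pgrp);
}

/*
 * Hypothetical helper: what the patched branch does once the
 * intermediate process is running again.  Returns 1 if the child was
 * continued, 0 if the terminal was left alone, -1 on error.
 */
int
relay_continue(int tty_fd, pid_t child_pid)
{
	pid_t child_pgrp;

	assert(child_pid > 0);

	/* Ask for the child's pgid directly instead of trusting the tty. */
	child_pgrp = getpgid(child_pid);
	if (child_pgrp == (pid_t)-1)
		return (-1);

	/* Something else already moved the terminal: do nothing. */
	if (!should_hand_back_tty(tcgetpgrp(tty_fd), getpgrp()))
		return (0);

	if (tcsetpgrp(tty_fd, child_pgrp) == -1)
		return (-1);
	if (kill(child_pid, SIGCONT) == -1)
		return (-1);
	return (1);
}
```

In the loop above, a call like relay_continue(1, child_pid) would sit
where the new lines of the patch are, right after kill(getpid(), SIGSTOP)
returns.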

Bruce

