On Wed, Jul 17, 2019 at 01:21:00PM -0400, Joel Fernandes wrote:
> From: Suren Baghdasaryan <sur...@google.com>
> 
> There is a race between reading task->exit_state in pidfd_poll and writing
> it after do_notify_parent calls do_notify_pidfd. Expected sequence of
> events is:
> 
> CPU 0                            CPU 1
> ------------------------------------------------
> exit_notify
>   do_notify_parent
>     do_notify_pidfd
>   tsk->exit_state = EXIT_DEAD
>                                   pidfd_poll
>                                      if (tsk->exit_state)
> 
> However nothing prevents the following sequence:
> 
> CPU 0                            CPU 1
> ------------------------------------------------
> exit_notify
>   do_notify_parent
>     do_notify_pidfd
>                                    pidfd_poll
>                                       if (tsk->exit_state)
>   tsk->exit_state = EXIT_DEAD
> 
> This causes a polling task to wait forever, since poll blocks because
> exit_state is 0 and the waiting task is not notified again. A stress
> test continuously doing pidfd poll and process exits uncovered this bug,
> and the below patch fixes it.
> 
> To fix this, we set tsk->exit_state before calling do_notify_pidfd.
> 
> Cc: kernel-t...@android.com
> Signed-off-by: Suren Baghdasaryan <sur...@google.com>
> Signed-off-by: Joel Fernandes (Google) <j...@joelfernandes.org>

That means in such a situation other users will see EXIT_ZOMBIE where
they didn't see that before until after the parent failed to get
notified.

That's a rather subtle internal change. I was worried about
__ptrace_detach() since it explicitly checks for EXIT_ZOMBIE but it
seems to me that this is fine since we hold write_lock_irq(&tasklist_lock);
at the point when we do set p->exit_signal.

Acked-by: Christian Brauner <christ...@brauner.io>

Once Oleg confirms that I'm right not to worty I'll pick this up.

Thanks!
Christian

> 
> ---
>  kernel/exit.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/exit.c b/kernel/exit.c
> index a75b6a7f458a..740ceacb4b76 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -720,6 +720,7 @@ static void exit_notify(struct task_struct *tsk, int 
> group_dead)
>       if (group_dead)
>               kill_orphaned_pgrp(tsk->group_leader, NULL);
>  
> +     tsk->exit_state = EXIT_ZOMBIE;
>       if (unlikely(tsk->ptrace)) {
>               int sig = thread_group_leader(tsk) &&
>                               thread_group_empty(tsk) &&
> @@ -1156,10 +1157,11 @@ static int wait_task_zombie(struct wait_opts *wo, 
> struct task_struct *p)
>               ptrace_unlink(p);
>  
>               /* If parent wants a zombie, don't release it now */
> -             state = EXIT_ZOMBIE;
> +             p->exit_state = EXIT_ZOMBIE;
>               if (do_notify_parent(p, p->exit_signal))
> -                     state = EXIT_DEAD;
> -             p->exit_state = state;
> +                     p->exit_state = EXIT_DEAD;
> +
> +             state = p->exit_state;
>               write_unlock_irq(&tasklist_lock);
>       }
>       if (state == EXIT_DEAD)
> -- 
> 2.22.0.657.g960e92d24f-goog
> 

Reply via email to