On Sat, Sep 14, 2019 at 07:33:34AM -0500, Eric W. Biederman wrote:
> 
> Add a count of the number of rcu users (currently 1) of the task
> struct so that we can later add the scheduler case and get rid of the
> very subtle task_rcu_dereference, and just use rcu_dereference.
> 
> As suggested by Oleg have the count overlap rcu_head so that no
> additional space in task_struct is required.
> 
> Inspired-by: Linus Torvalds <[email protected]>
> Inspired-by: Oleg Nesterov <[email protected]>
> Signed-off-by: "Eric W. Biederman" <[email protected]>
> ---
>  include/linux/sched.h      | 5 ++++-
>  include/linux/sched/task.h | 1 +
>  kernel/exit.c              | 7 ++++++-
>  kernel/fork.c              | 7 +++----
>  4 files changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 9f51932bd543..99a4518b9b17 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1142,7 +1142,10 @@ struct task_struct {
>  
>       struct tlbflush_unmap_batch     tlb_ubc;
>  
> -     struct rcu_head                 rcu;
> +     union {
> +             refcount_t              rcu_users;
> +             struct rcu_head         rcu;
> +     };
>  
>       /* Cache last used pipe for splice(): */
>       struct pipe_inode_info          *splice_pipe;
> diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
> index 0497091e40c1..4c44c37236b2 100644
> --- a/include/linux/sched/task.h
> +++ b/include/linux/sched/task.h
> @@ -116,6 +116,7 @@ static inline void put_task_struct(struct task_struct *t)
>  }
>  
>  struct task_struct *task_rcu_dereference(struct task_struct **ptask);
> +void put_task_struct_rcu_user(struct task_struct *task);
>  
>  #ifdef CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT
>  extern int arch_task_struct_size __read_mostly;
> diff --git a/kernel/exit.c b/kernel/exit.c
> index 5b4a5dcce8f8..2e259286f4e7 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -182,6 +182,11 @@ static void delayed_put_task_struct(struct rcu_head *rhp)
>       put_task_struct(tsk);
>  }
>  
> +void put_task_struct_rcu_user(struct task_struct *task)
> +{
> +     if (refcount_dec_and_test(&task->rcu_users))
> +             call_rcu(&task->rcu, delayed_put_task_struct);

We "instantly" transition from the union holding ->rcu_users to holding
->rcu, so there needs to be some mechanism that has previously made sure
that nothing is going to attempt to use ->rcu_users on this task.

We cannot rely solely on something like atomic_add_unless(), because
call_rcu() will likely leave the ->rcu side of the union (and thus the
bits ->rcu_users aliases) non-zero.

> +}
>  
>  void release_task(struct task_struct *p)
>  {
> @@ -222,7 +227,7 @@ void release_task(struct task_struct *p)
>  
>       write_unlock_irq(&tasklist_lock);
>       release_thread(p);
> -     call_rcu(&p->rcu, delayed_put_task_struct);
> +     put_task_struct_rcu_user(p);

This, along with the initialization of ->rcu_users to two
below gives some hope, at least assuming that release_task() is invoked
after no one can possibly try to increment ->rcu_users.  And in v5.2,
release_task() is invoked from these places:

o       de_thread().  On this one, I must defer to Oleg, Peter, and crew.
        It might be that the list removals while write-holding the
        tasklist_lock do the trick, but that assumes that this lock is
        involved in acquiring a reference.

o       find_child_reaper().  This is invoked via exit_notify() from
        do_exit(), just after the call to exit_tasks_rcu_start().
        This is OK from a Tasks RCU perspective because it is using
        separate synchronization.  Something earlier must prevent
        new ->rcu_users references.

o       wait_task_zombie().  Here is hoping that the check for
        EXIT_DEAD is helpful here.

I am not seeing how this would be safe, but then again this is only the
first patch.  Plus there is much about this use case that I do not know.
OK, on to the other patches...

                                                        Thanx, Paul

>  
>       p = leader;
>       if (unlikely(zap_leader))
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 2852d0e76ea3..9f04741d5c70 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -900,10 +900,9 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
>       if (orig->cpus_ptr == &orig->cpus_mask)
>               tsk->cpus_ptr = &tsk->cpus_mask;
>  
> -     /*
> -      * One for us, one for whoever does the "release_task()" (usually
> -      * parent)
> -      */
> +     /* One for the user space visible state that goes away when reaped. */
> +     refcount_set(&tsk->rcu_users, 1);
> +     /* One for the rcu users, and one for the scheduler */
>       refcount_set(&tsk->usage, 2);
>  #ifdef CONFIG_BLK_DEV_IO_TRACE
>       tsk->btrace_seq = 0;
> -- 
> 2.21.0.dirty
> 
