Re: [PATCH 1/3] task: Add a count of task rcu users
On Wed, Sep 04, 2019 at 11:20:03AM -0700, Linus Torvalds wrote:
> On Wed, Sep 4, 2019 at 9:33 AM Frederic Weisbecker wrote:
> >
> > I thought the point of these rcu_users was to be able to do:
> >
> >     rcu_read_lock()
> >     p = rcu_dereference(task)
> >     if (!refcount_inc_not_zero(p->rcu_users)) {
>
> No. Because of the shared state, you can't do that from RCU context.
>
> But you *could* increase rcu_users from within the process context
> itself (as long as you do it before the exit path, ie during normal
> system call execution), or possibly while holding the tasklist_lock
> and verifying that the task hasn't died yet.
>
> I'm not sure there is any sensible case for doing that, though. It
> would have to have a similar pattern to the runqueue use, where you
> add a new RCU lookup point for the task. I'm sure something like that
> _could_ exist, I just can't think of any right now.

Yeah indeed, in fact I just hadn't read the patches entirely so I missed
the point. I see now that the purpose of rcu_users is to postpone the
RCU-delayed put_task_struct() until we are done with both the final
schedule out and release_task().

Sorry for the noise :o)
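For readers following the series, here is a minimal sketch of the access
pattern this is building towards. It is my illustration, not code from the
series: it assumes the later patches that give the scheduler its own
rcu_users reference and publish rq->curr with rcu_assign_pointer(), and
struct rq is internal to kernel/sched/, so a real user would live there.

    /*
     * Sketch only: once the scheduler holds an rcu_users reference until
     * after the task's final schedule out, the task_struct cannot be
     * RCU-freed while rq->curr still points at it, so a plain
     * rcu_dereference() under rcu_read_lock() is enough -- no
     * task_rcu_dereference() and no refcount games.
     */
    static void peek_remote_curr(struct rq *rq)
    {
            struct task_struct *p;

            rcu_read_lock();
            p = rcu_dereference(rq->curr);
            if (p)
                    pr_info("remote curr: comm=%s pid=%d\n",
                            p->comm, task_pid_nr(p));
            rcu_read_unlock();
    }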
Re: [PATCH 1/3] task: Add a count of task rcu users
On Wed, Sep 4, 2019 at 9:33 AM Frederic Weisbecker wrote:
>
> I thought the point of these rcu_users was to be able to do:
>
>     rcu_read_lock()
>     p = rcu_dereference(task)
>     if (!refcount_inc_not_zero(p->rcu_users)) {

No. Because of the shared state, you can't do that from RCU context.

But you *could* increase rcu_users from within the process context
itself (as long as you do it before the exit path, ie during normal
system call execution), or possibly while holding the tasklist_lock
and verifying that the task hasn't died yet.

I'm not sure there is any sensible case for doing that, though. It
would have to have a similar pattern to the runqueue use, where you
add a new RCU lookup point for the task. I'm sure something like that
_could_ exist, I just can't think of any right now.

                 Linus
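For what it's worth, the tasklist_lock variant Linus mentions might look
roughly like the sketch below. This is not anything proposed in the series
and the helper name is made up; it only illustrates the pattern.

    #include <linux/refcount.h>
    #include <linux/sched/signal.h>     /* tasklist_lock, pid_alive() */
    #include <linux/sched/task.h>

    /*
     * Sketch only: take an extra rcu_users reference on a task that is
     * known not to have been reaped yet.  Holding tasklist_lock for
     * reading excludes release_task()'s write-locked section, and
     * pid_alive() tells us __unhash_process() has not run, so the
     * "reaped" rcu_users reference is still held and bumping the count
     * is safe here.
     */
    static bool tryget_task_rcu_user(struct task_struct *p)
    {
            bool got = false;

            read_lock(&tasklist_lock);
            if (pid_alive(p))
                    got = refcount_inc_not_zero(&p->rcu_users);
            read_unlock(&tasklist_lock);

            return got;     /* drop later with put_task_struct_rcu_user(p) */
    }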
Re: [PATCH 1/3] task: Add a count of task rcu users
On Wed, Sep 04, 2019 at 05:32:46PM +0200, Oleg Nesterov wrote:
> On 09/04, Frederic Weisbecker wrote:
> >
> > So what happens if, say:
> >
> >          CPU 1                                CPU 2
> >          -----                                -----
> >     rcu_read_lock()
> >     p = rcu_dereference(rq->task)
> >     if (refcount_inc_not_zero(p->rcu_users)) {
> >         .
> >                                          release_task() {
> >                                              put_task_struct_rcu_user() {
> >                                                  call_rcu() {
> >                                                      queue rcu_head
>
> in this particular case call_rcu() won't be called, so

Yeah I confused myself in finding the scenario but like you say below,
release_task() can simply happen between the p = rcu_dereference() and
the refcount_inc to break things.

I thought the point of these rcu_users was to be able to do:

    rcu_read_lock()
    p = rcu_dereference(task)
    if (!refcount_inc_not_zero(p->rcu_users)) {
        rcu_read_unlock();
        return -1;
    }
    rcu_read_unlock();

    //do stuff with task

    put_task_struct_rcu_user(p);

Thanks.

> >                                                  }
> >                                              }
> >                                          }
> >     put_task_struct_rcu_user(); //here rcu_users has been overwritten
>
> rcu_users won't be overwritten.
>
> But nobody should try to increment ->rcu_users,
>
>     rcu_read_lock();
>     p = rcu_dereference(rq->task);
>     refcount_inc_not_zero(p->rcu_users);
>
> is already wrong because both release_task/last-schedule can happen in
> between, before refcount_inc_not_zero().
>
> Oleg.
Re: [PATCH 1/3] task: Add a count of task rcu users
On 09/04, Frederic Weisbecker wrote:
>
> So what happens if, say:
>
>          CPU 1                                CPU 2
>          -----                                -----
>     rcu_read_lock()
>     p = rcu_dereference(rq->task)
>     if (refcount_inc_not_zero(p->rcu_users)) {
>         .
>                                          release_task() {
>                                              put_task_struct_rcu_user() {
>                                                  call_rcu() {
>                                                      queue rcu_head

in this particular case call_rcu() won't be called, so

>                                                  }
>                                              }
>                                          }
>     put_task_struct_rcu_user(); //here rcu_users has been overwritten

rcu_users won't be overwritten.

But nobody should try to increment ->rcu_users,

    rcu_read_lock();
    p = rcu_dereference(rq->task);
    refcount_inc_not_zero(p->rcu_users);

is already wrong because both release_task/last-schedule can happen in
between, before refcount_inc_not_zero().

Oleg.
Re: [PATCH 1/3] task: Add a count of task rcu users
On Mon, Sep 02, 2019 at 11:51:34PM -0500, Eric W. Biederman wrote:
>
> Add a count of the number of rcu users (currently 1) of the task
> struct so that we can later add the scheduler case and get rid of the
> very subtle task_rcu_dereference, and just use rcu_dereference.
>
> As suggested by Oleg have the count overlap rcu_head so that no
> additional space in task_struct is required.
>
> Inspired-by: Linus Torvalds
> Inspired-by: Oleg Nesterov
> Signed-off-by: "Eric W. Biederman"
> ---
>  include/linux/sched.h      | 5 ++++-
>  include/linux/sched/task.h | 1 +
>  kernel/exit.c              | 7 ++++++-
>  kernel/fork.c              | 7 +++----
>  4 files changed, 14 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 9f51932bd543..99a4518b9b17 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1142,7 +1142,10 @@ struct task_struct {
>
>  	struct tlbflush_unmap_batch	tlb_ubc;
>
> -	struct rcu_head			rcu;
> +	union {
> +		refcount_t		rcu_users;
> +		struct rcu_head		rcu;

So what happens if, say:

         CPU 1                                CPU 2
         -----                                -----
    rcu_read_lock()
    p = rcu_dereference(rq->task)
    if (refcount_inc_not_zero(p->rcu_users)) {
        .
                                         release_task() {
                                             put_task_struct_rcu_user() {
                                                 call_rcu() {
                                                     queue rcu_head
                                                 }
                                             }
                                         }
    put_task_struct_rcu_user(); //here rcu_users has been overwritten

Thanks.
Re: [PATCH 1/3] task: Add a count of task rcu users
On 09/02, Eric W. Biederman wrote:
>
> @@ -900,10 +900,9 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
>
> -	/*
> -	 * One for us, one for whoever does the "release_task()" (usually
> -	 * parent)
> -	 */
> +	/* One for the user space visible state that goes away when reaped. */
> +	refcount_set(&tsk->rcu_users, 1);

forgot to mention... This is minor, but we don't really need to
initialize child->rcu_users in dup_task_struct(), we can just add

	.rcu_users = REFCOUNT_INIT(2),

into init_task's initializer. Until we have a
refcount_inc(task->rcu_users) user, of course.

Oleg.
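A sketch of what that suggestion might look like (my illustration, not a
posted patch; the REFCOUNT_INIT(2) presumably anticipates the scheduler
reference added later in the series, with this patch alone the value would
be 1):

    /*
     * init/init_task.c -- sketch of Oleg's suggestion.  dup_task_struct()
     * begins with a structure copy of the parent (via
     * arch_dup_task_struct()), so seeding the count once here is enough
     * for every child and the per-fork refcount_set(&tsk->rcu_users, ...)
     * can be dropped.  REFCOUNT_INIT comes from <linux/refcount.h>.
     */
    struct task_struct init_task = {
            /* ...existing init_task initializers unchanged... */
            .rcu_users      = REFCOUNT_INIT(2),
    };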
[PATCH 1/3] task: Add a count of task rcu users
Add a count of the number of rcu users (currently 1) of the task
struct so that we can later add the scheduler case and get rid of the
very subtle task_rcu_dereference, and just use rcu_dereference.

As suggested by Oleg have the count overlap rcu_head so that no
additional space in task_struct is required.

Inspired-by: Linus Torvalds
Inspired-by: Oleg Nesterov
Signed-off-by: "Eric W. Biederman"
---
 include/linux/sched.h      | 5 ++++-
 include/linux/sched/task.h | 1 +
 kernel/exit.c              | 7 ++++++-
 kernel/fork.c              | 7 +++----
 4 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 9f51932bd543..99a4518b9b17 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1142,7 +1142,10 @@ struct task_struct {
 
 	struct tlbflush_unmap_batch	tlb_ubc;
 
-	struct rcu_head			rcu;
+	union {
+		refcount_t		rcu_users;
+		struct rcu_head		rcu;
+	};
 
 	/* Cache last used pipe for splice(): */
 	struct pipe_inode_info		*splice_pipe;
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index 0497091e40c1..4c44c37236b2 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -116,6 +116,7 @@ static inline void put_task_struct(struct task_struct *t)
 }
 
 struct task_struct *task_rcu_dereference(struct task_struct **ptask);
+void put_task_struct_rcu_user(struct task_struct *task);
 
 #ifdef CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT
 extern int arch_task_struct_size __read_mostly;
diff --git a/kernel/exit.c b/kernel/exit.c
index 5b4a5dcce8f8..2e259286f4e7 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -182,6 +182,11 @@ static void delayed_put_task_struct(struct rcu_head *rhp)
 	put_task_struct(tsk);
 }
 
+void put_task_struct_rcu_user(struct task_struct *task)
+{
+	if (refcount_dec_and_test(&task->rcu_users))
+		call_rcu(&task->rcu, delayed_put_task_struct);
+}
 
 void release_task(struct task_struct *p)
 {
@@ -222,7 +227,7 @@ void release_task(struct task_struct *p)
 
 	write_unlock_irq(&tasklist_lock);
 	release_thread(p);
-	call_rcu(&p->rcu, delayed_put_task_struct);
+	put_task_struct_rcu_user(p);
 
 	p = leader;
 	if (unlikely(zap_leader))
diff --git a/kernel/fork.c b/kernel/fork.c
index 2852d0e76ea3..9f04741d5c70 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -900,10 +900,9 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
 	if (orig->cpus_ptr == &orig->cpus_mask)
 		tsk->cpus_ptr = &tsk->cpus_mask;
 
-	/*
-	 * One for us, one for whoever does the "release_task()" (usually
-	 * parent)
-	 */
+	/* One for the user space visible state that goes away when reaped. */
+	refcount_set(&tsk->rcu_users, 1);
+	/* One for the rcu users, and one for the scheduler */
 	refcount_set(&tsk->usage, 2);
 #ifdef CONFIG_BLK_DEV_IO_TRACE
 	tsk->btrace_seq = 0;
-- 
2.21.0.dirty
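A rough sketch of how the "scheduler case" mentioned in the changelog might
plug into the new helper later in the series. This is my guess at the shape
of the follow-up, not the follow-up patch itself, and the function name is
hypothetical.

    #include <linux/sched/task.h>       /* put_task_struct_rcu_user() */

    /*
     * Sketch only: once the scheduler also holds an rcu_users reference
     * (initial count 2 instead of 1), the final context switch away from
     * a dead task drops that reference through the same helper.  Only the
     * last of the two drops -- this one or release_task()'s -- queues
     * delayed_put_task_struct(), so the RCU grace period cannot begin
     * before both the reaping and the last schedule out are done.
     */
    static void finish_dead_task_sketch(struct task_struct *prev)
    {
            /* called after 'prev' has been switched away from for the last time */
            put_task_struct_rcu_user(prev);
    }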