On Sun, May 10, 2026 at 01:41:27PM +0000, Alice Ryhl wrote: > On Fri, May 08, 2026 at 11:38:18PM +0200, Andrea Righi wrote: > > Hi Alice, > > > > On Fri, May 08, 2026 at 02:02:45PM +0000, Alice Ryhl wrote: > > > The sched/task.h header file currently exposes a tryget_task_struct() > > > function, but it is very risky to use it: If the last refcount of the > > > task is dropped using put_task_struct_many(), then the task is freed > > > right away without an RCU grace period. > > > > > > This means that if the kernel contains a code path anywhere such that > > > the last refcount of a task may be dropped with put_task_struct_many(), > > > and it also contains a code path anywhere that tries to stash a task > > > pointer under rcu and use tryget_task_struct() on it, then if they ever > > > execute on the same 'struct task_struct', it results in a > > > use-after-free. > > > > > > The above applies even if the RCU user drops its own task reference with > > > put_task_struct(), because if that is not the last reference, then it's > > > possible for another thread to invoke put_task_struct_many() and free > > > the task less than a grace period after the RCU user called > > > put_task_struct(). > > > > > > There does not appear to be an actual problem in the kernel tree right > > > now because there are no in-tree users of put_task_struct_many() where > > > refcount_sub_and_test() might return 'true'. Io-uring invokes the > > > function from task work while the task is still running, so it will not > > > decrement it all the way to zero. (Note that if I'm wrong about this, > > > then it's probably possible to trigger UAF by combining this codepath in > > > io-uring with the tryget_task_struct() call in sched-ext.) > > > > > > However, the current situation is fragile and error-prone. > > > - If you look at put_task_struct_many() in isolation, it looks like it > > > would be okay to call it in a situation where refcount_sub_and_test() > > > might return 'true'. > > > - Similarly, if you look at tryget_task_struct(), you would assume that > > > you are allowed to call this method for a grace period after 'users' > > > hitting zero. (If not, why does it exist?) > > > But if two different kernel developers anywhere in the kernel make these > > > conflicting assumptions at any point in the future, then the combination > > > of their code may lead to a use-after-free if there is any way for them > > > to interact via the same 'struct task_struct'. > > > > > > Thus, as a defensive measure, we should either make > > > put_task_struct_many() use call_rcu(), or we should delete > > > tryget_task_struct(). This patch suggests the former because it does not > > > change anything for any callers that exist today. (As argued previously, > > > the body of the 'if' statement is dead code in the kernel today.) > > > > > > The comment in put_task_struct() is also updated so that nobody changes > > > its implementation to only use call_rcu() under PREEMPT_RT in the > > > future. The current comment suggests that would be a legal change, but > > > it is similarly incompatible with anyone using tryget_task_struct(). > > > > > > Signed-off-by: Alice Ryhl <[email protected]> > > > --- > > > Including sched-ext and io-uring in the cc list as they are the only > > > users of tryget_task_struct() and put_task_struct_many() respectively. > > > > For sched_ext I think we should be already protected by scx_tasks_lock. > > > > From kernel/sched/core.c: > > > > finish_task_switch(): > > if (prev_state == TASK_DEAD) { > > prev->sched_class->task_dead(prev); > > sched_ext_dead(prev); > > cgroup_task_dead(prev); > > put_task_stack(prev); > > ... > > put_task_struct_rcu_user(prev); > > } > > > > And sched_ext_dead() in kernel/sched/ext.c: > > > > scoped_guard(raw_spinlock_irqsave, &scx_tasks_lock) { > > list_del_init(&p->scx.tasks_node); > > ... > > } > > > > Now on the sched_ext iter side: > > > > scx_task_iter_start(); /* takes scx_tasks_lock */ > > while ((p = scx_task_iter_next_locked())) > > if (!tryget_task_struct(p)) /* still under scx_tasks_lock > > */ > > ... > > > > So, the locking gives us the invariant: while the iter holds scx_tasks_lock > > and > > observes p on the list, sched_ext_dead(p) cannot have completed. > > Correct my if I'm wrong, but this sounds like you don't need the tryget > variant. The 'users' counter is guaranteed be non-zero for one grace > period after put_task_struct_rcu_user(prev).
Correct, I think we can just get rid of tryget and use get_task_struct(). I'll run some stress tests with this change. -Andrea

