On Tue, Oct 31, 2017 at 09:36:44AM +0100, Peter Zijlstra wrote:
> On Mon, Oct 30, 2017 at 12:44:00PM -0700, syzbot wrote:
> > WARNING: CPU: 1 PID: 24353 at kernel/futex.c:818 get_pi_state+0x15b/0x190
> > kernel/futex.c:818
>
> >  exit_pi_state_list+0x556/0x7a0 kernel/futex.c:932
> >  mm_release+0x46d/0x590 kernel/fork.c:1191
> >  exit_mm kernel/exit.c:499 [inline]
> >  do_exit+0x481/0x1b00 kernel/exit.c:852
> >  SYSC_exit kernel/exit.c:937 [inline]
> >  SyS_exit+0x22/0x30 kernel/exit.c:935
> >  entry_SYSCALL_64_fastpath+0x1f/0xbe
>
> Argh, I definitely messed that up. Let me have a prod..

The below appears to cure the problem; I could (fairly quickly)
reproduce the issue once I hacked up the repro.c to not bother with
tunnels.

With the below patch, the reproducer has been running for a fairly long
time now without issue.

This should fix both that WARN and the UAF report; both were related
problems.

---
Subject: futex: Fix more put_pi_state() vs exit_pi_state_list() races

Dmitry (through syzbot) reported being able to trigger the WARN in
get_pi_state() and a use-after-free on
raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock).

Both are due to this race:

  exit_pi_state_list()                          put_pi_state()

  lock(&curr->pi_lock)
  while() {
        pi_state = list_first_entry(head);
        hb = hash_futex(&pi_state->key);
        unlock(&curr->pi_lock);

                                                dec_and_test(&pi_state->refcount);

        lock(&hb->lock)
        lock(&pi_state->pi_mutex.wait_lock)     // uaf if pi_state free'd
        lock(&curr->pi_lock);

        ....

        unlock(&curr->pi_lock);
        get_pi_state();                         // WARN; refcount==0

The problem is we take the reference count too late, and don't allow it
being 0. Fix it by using inc_not_zero() and simply retrying the loop
when we fail to get a refcount. In that case put_pi_state() should
remove the entry from the list.

Cc: Gratian Crisan <gratian.cri...@ni.com>
Reported-by: Dmitry Vyukov <dvyu...@google.com>
Fixes: c74aef2d06a9 ("futex: Fix pi_state->owner serialization")
Signed-off-by: Peter Zijlstra (Intel) <pet...@infradead.org>
---
 futex.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 0518a0bfc746..ca5bb9cba5cf 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -903,11 +903,27 @@ void exit_pi_state_list(struct task_struct *curr)
 	 */
 	raw_spin_lock_irq(&curr->pi_lock);
 	while (!list_empty(head)) {
-
 		next = head->next;
 		pi_state = list_entry(next, struct futex_pi_state, list);
 		key = pi_state->key;
 		hb = hash_futex(&key);
+
+		/*
+		 * We can race against put_pi_state() removing itself from the
+		 * list (a waiter going away). put_pi_state() will first
+		 * decrement the reference count and then modify the list, so
+		 * its possible to see the list entry but fail this reference
+		 * acquire.
+		 *
+		 * In that case; drop the locks to let put_pi_state() make
+		 * progress and retry the loop.
+		 */
+		if (!atomic_inc_not_zero(&pi_state->refcount)) {
+			raw_spin_unlock_irq(&curr->pi_lock);
+			cpu_relax();
+			raw_spin_lock_irq(&curr->pi_lock);
+			continue;
+		}
 		raw_spin_unlock_irq(&curr->pi_lock);

 		spin_lock(&hb->lock);
@@ -918,8 +934,10 @@ void exit_pi_state_list(struct task_struct *curr)
 		 * task still owns the PI-state:
 		 */
 		if (head->next != next) {
+			/* retain curr->pi_lock for the loop invariant */
 			raw_spin_unlock(&pi_state->pi_mutex.wait_lock);
 			spin_unlock(&hb->lock);
+			put_pi_state(pi_state);
 			continue;
 		}

@@ -927,9 +945,8 @@ void exit_pi_state_list(struct task_struct *curr)
 		WARN_ON(list_empty(&pi_state->list));
 		list_del_init(&pi_state->list);
 		pi_state->owner = NULL;
-		raw_spin_unlock(&curr->pi_lock);

-		get_pi_state(pi_state);
+		raw_spin_unlock(&curr->pi_lock);
 		raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
 		spin_unlock(&hb->lock);
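
For anyone following along who hasn't met the inc_not_zero() idiom the
changelog relies on, here is a rough userspace sketch using C11 atomics.
It is not the kernel's atomic_inc_not_zero(), and struct obj,
ref_get_not_zero() and ref_put() are made-up names for illustration
only. It just shows why a lookup that finds an object whose count has
already hit zero must back off instead of taking a reference.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the refcount in struct futex_pi_state. */
struct obj {
	atomic_int refcount;
};

/*
 * Take a reference only if the count is still non-zero.
 * Returns false when the last reference is already gone, i.e. the
 * object is on its way to being freed and must not be touched.
 */
static bool ref_get_not_zero(struct obj *o)
{
	int old = atomic_load(&o->refcount);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&o->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when the caller dropped the last one. */
static bool ref_put(struct obj *o)
{
	int old = atomic_fetch_sub(&o->refcount, 1);

	assert(old > 0);	/* dropping below zero would be a bug */
	return old == 1;
}
```

In the patch the false return corresponds to the "drop curr->pi_lock,
cpu_relax(), retry" path: the other side has already won the
dec_and_test() and only needs the lock to unlink the entry.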