The runqueue locks are special in that the owner changes over a context
switch. To ensure that this is accounted for in CONFIG_DEBUG_SPINLOCK
builds, finish_lock_switch updates rq->lock.owner while the lock is
held.

However, this happens *after* prev->on_cpu is cleared, which allows prev
to be scheduled on another CPU. If prev then attempts to acquire the
same rq lock, before the updated rq->lock.owner is made visible, it will
see itself as the owner.
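
Roughly, the problematic interleaving with the old ordering looks like
this (simplified sketch; the wakeup path shown is illustrative):

  CPU0 (switching prev -> next)           CPU1
  __schedule()
    raw_spin_lock(&rq->lock)
      /* rq->lock.owner == prev */
    context_switch() to next
    finish_lock_switch(rq, prev)
      smp_store_release(&prev->on_cpu, 0)
                                          prev is woken and scheduled here
                                          __task_rq_lock(rq)
                                            do_raw_spin_lock(&rq->lock)
                                              sees .owner == prev == current
                                              => "spinlock recursion"
      rq->lock.owner = current  /* too late */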

This can happen in virtual environments, where a vCPU can be preempted
for an arbitrarily long period between clearing prev->on_cpu and
updating rq->lock.owner, as has been observed on arm64 and x86 under
KVM, e.g.

BUG: spinlock recursion on CPU#1, syz-executor1/1521
 lock: 0xffff80001a764080, .magic: dead4ead, .owner: syz-executor1/1521, .owner_cpu: 0
CPU: 1 PID: 1521 Comm: syz-executor1 Not tainted 4.15.0 #3
Hardware name: linux,dummy-virt (DT)
Call trace:
 dump_backtrace+0x0/0x330 arch/arm64/kernel/time.c:52
 show_stack+0x20/0x30 arch/arm64/kernel/traps.c:151
 __dump_stack lib/dump_stack.c:17 [inline]
 dump_stack+0xd0/0x120 lib/dump_stack.c:53
 spin_dump+0x150/0x1f0 kernel/locking/spinlock_debug.c:67
 spin_bug kernel/locking/spinlock_debug.c:75 [inline]
 debug_spin_lock_before kernel/locking/spinlock_debug.c:84 [inline]
 do_raw_spin_lock+0x1e4/0x250 kernel/locking/spinlock_debug.c:112
 __raw_spin_lock include/linux/spinlock_api_smp.h:143 [inline]
 _raw_spin_lock+0x44/0x50 kernel/locking/spinlock.c:144
 __task_rq_lock+0xc0/0x288 kernel/sched/core.c:103
 wake_up_new_task+0x384/0x798 kernel/sched/core.c:2465
 _do_fork+0x1b4/0xa78 kernel/fork.c:2069
 SYSC_clone kernel/fork.c:2154 [inline]
 SyS_clone+0x48/0x60 kernel/fork.c:2132
 el0_svc_naked+0x20/0x24

Let's avoid this by updating rq->lock.owner before clearing prev->on_cpu.
As the latter is a release, it ensures that the new owner is visible to
prev if it is scheduled on another CPU.
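
For illustration only, the guarantee being relied on can be demonstrated
with a userspace analogy (C11 atomics standing in for smp_store_release()
and the acquire on ->on_cpu in the wakeup path; the names owner and
on_cpu are just stand-ins for rq->lock.owner and prev->on_cpu):

  #include <pthread.h>
  #include <stdatomic.h>
  #include <stdio.h>

  static int owner;              /* stands in for rq->lock.owner */
  static atomic_int on_cpu = 1;  /* stands in for prev->on_cpu   */

  /* finish_lock_switch(): update the owner, then release on_cpu */
  static void *switcher(void *arg)
  {
          owner = 42;
          atomic_store_explicit(&on_cpu, 0, memory_order_release);
          return NULL;
  }

  /* wakeup path: acquire-load on_cpu until it drops, then read owner */
  static void *waker(void *arg)
  {
          while (atomic_load_explicit(&on_cpu, memory_order_acquire))
                  ;
          /* always prints 42: the plain write to owner made before the
             release store is visible after the acquire load observes it */
          printf("owner = %d\n", owner);
          return NULL;
  }

  int main(void)
  {
          pthread_t a, b;

          pthread_create(&a, NULL, switcher, NULL);
          pthread_create(&b, NULL, waker, NULL);
          pthread_join(a, NULL);
          pthread_join(b, NULL);
          return 0;
  }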

Signed-off-by: Mark Rutland <mark.rutl...@arm.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Ingo Molnar <mi...@kernel.org>
---
 kernel/sched/sched.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index b19552a212de..4f0d2e3701c3 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1342,6 +1342,10 @@ static inline void prepare_lock_switch(struct rq *rq, struct task_struct *next)
 
 static inline void finish_lock_switch(struct rq *rq, struct task_struct *prev)
 {
+#ifdef CONFIG_DEBUG_SPINLOCK
+       /* this is a valid case when another task releases the spinlock */
+       rq->lock.owner = current;
+#endif
 #ifdef CONFIG_SMP
        /*
         * After ->on_cpu is cleared, the task can be moved to a different CPU.
@@ -1355,10 +1359,6 @@ static inline void finish_lock_switch(struct rq *rq, struct task_struct *prev)
         */
        smp_store_release(&prev->on_cpu, 0);
 #endif
-#ifdef CONFIG_DEBUG_SPINLOCK
-       /* this is a valid case when another task releases the spinlock */
-       rq->lock.owner = current;
-#endif
        /*
         * If we are tracking spinlock dependencies then we have to
         * fix up the runqueue lock - which gets 'carried over' from
-- 
2.11.0
