Peter pointed out in this patch https://patchwork.kernel.org/patch/9771921/ that the spinning-lock used at __schedule() should be RCsc to ensure visibility of writes prior to __schedule when the task is to be migrated to another CPU.
And this is emphasized at the comment of the newly introduced smp_mb__after_spinlock(), * This barrier must provide two things: * * - it must guarantee a STORE before the spin_lock() is ordered against a * LOAD after it, see the comments at its two usage sites. * * - it must ensure the critical section is RCsc. * * The latter is important for cases where we observe values written by other * CPUs in spin-loops, without barriers, while being subject to scheduling. * * CPU0 CPU1 CPU2 * * for (;;) { * if (READ_ONCE(X)) * break; * } * X=1 * <sched-out> * <sched-in> * r = X; * * without transitivity it could be that CPU1 observes X!=0 breaks the loop, * we get migrated and CPU2 sees X==0. which is used at, __schedule(bool preempt) { ... rq_lock(rq, &rf); smp_mb__after_spinlock(); ... } . If I didn't miss something, I found this kind of visibility is __not__ necessarily depends on the spinning-lock at __schedule being RCsc. In fact, as for runnable task A, the migration would be, CPU0 CPU1 CPU2 <ACCESS before schedule out A> lock(rq0) schedule out A unock(rq0) lock(rq0) remove A from rq0 unlock(rq0) lock(rq2) add A into rq2 unlock(rq2) lock(rq2) schedule in A unlock(rq2) <ACCESS after schedule in A> <ACCESS before schedule out A> happens-before unlock(rq0) happends-before lock(rq0) happends-before unlock(rq2) happens-before lock(rq2) happens-before <ACCESS after schedule in A> And for stopped tasks, CPU0 CPU1 CPU2 <ACCESS before schedule out A> lock(rq0) schedule out A remove A from rq0 store-release(A->on_cpu) unock(rq0) load_acquire(A->on_cpu) set_task_cpu(A, 2) lock(rq2) add A into rq2 unlock(rq2) lock(rq2) schedule in A unlock(rq2) <ACCESS after schedule in A> <ACCESS before schedule out A> happens-before store-release(A->on_cpu) happens-before load_acquire(A->on_cpu) happens-before unlock(rq2) happens-before lock(rq2) happens-before <ACCESS after schedule in A> So, I think the only requirement to smp_mb__after_spinlock is to guarantee a STORE before the spin_lock() is ordered against a LOAD after it. So we could remove the RCsc requirement to allow more efficient implementation. Did I miss something or this RCsc requirement does not really matter?