From: Alexander Sverdlin <[email protected]>

Ensure writes are pushed out of the core's write buffer so that waiting
code on other cores does not spin longer than necessary.

6 threads running a tight spinlock loop, competing for the same lock on
6 cores on MIPS/Octeon, do 1000000 iterations...

before the patch in: 4.3 sec
after the patch in:  1.2 sec

Same 6-core Octeon machine:
sysbench --test=mutex --num-threads=64 --memory-scope=local run

w/o patch:  1.53s
with patch: 1.28s

This would also allow removing the smp_wmb() in
arch/arm/include/asm/mcs_spinlock.h (was it actually addressing the same
issue?).

Finally, our internal, quite diverse test suite covering different
IPC/network aspects didn't detect any regressions on ARM/ARM64/x86_64.

Signed-off-by: Alexander Sverdlin <[email protected]>
---
 kernel/locking/mcs_spinlock.h | 5 +++++
 kernel/locking/qspinlock.c    | 6 ++++++
 2 files changed, 11 insertions(+)

diff --git a/kernel/locking/mcs_spinlock.h b/kernel/locking/mcs_spinlock.h
index 5e10153..10e497a 100644
--- a/kernel/locking/mcs_spinlock.h
+++ b/kernel/locking/mcs_spinlock.h
@@ -89,6 +89,11 @@ void mcs_spin_lock(struct mcs_spinlock **lock, struct mcs_spinlock *node)
 		return;
 	}
 	WRITE_ONCE(prev->next, node);
+	/*
+	 * This is necessary to make sure that the corresponding "while" in the
+	 * mcs_spin_unlock() doesn't loop forever
+	 */
+	smp_wmb();
 
 	/* Wait until the lock holder passes the lock down. */
 	arch_mcs_spin_lock_contended(&node->locked);
diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index cbff6ba..577fe01 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -469,6 +469,12 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 
 		/* Link @node into the waitqueue. */
 		WRITE_ONCE(prev->next, node);
+		/*
+		 * This is necessary to make sure that the corresponding
+		 * smp_cond_load_relaxed() below (running on another core)
+		 * doesn't spin forever.
+		 */
+		smp_wmb();
 
 		pv_wait_node(node, prev);
 		arch_mcs_spin_lock_contended(&node->locked);
--
2.10.2
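
[Editorial note, not part of the patch] For readers unfamiliar with the MCS
handoff the barrier protects, below is a minimal, hypothetical user-space
sketch of the same pattern. It uses C11 atomics and pthreads; the release
fence is only an assumed stand-in for the kernel's smp_wmb(), and the names
(waiter, unlocker, prev_node, self_node) are invented for illustration, not
taken from the kernel sources.

/*
 * Stand-alone sketch (not kernel code): one thread publishes its queue node
 * and spins on its own flag; the other thread polls for the published link
 * and then hands the "lock" down.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

struct node {
	_Atomic int locked;		/* waiter spins on this flag */
	_Atomic(struct node *) next;	/* link published to the predecessor */
};

static struct node prev_node, self_node;

/* Waiter side: roughly mcs_spin_lock() after xchg() found a predecessor. */
static void *waiter(void *arg)
{
	(void)arg;

	/* Publish ourselves, analogous to WRITE_ONCE(prev->next, node). */
	atomic_store_explicit(&prev_node.next, &self_node, memory_order_relaxed);

	/*
	 * The patch's smp_wmb() corresponds to a store fence here: push the
	 * ->next store out before we start spinning, so the unlocker on
	 * another core does not poll ->next longer than necessary.
	 */
	atomic_thread_fence(memory_order_release);

	/* Like arch_mcs_spin_lock_contended(): spin until handed the lock. */
	while (!atomic_load_explicit(&self_node.locked, memory_order_acquire))
		;
	return NULL;
}

/* Unlocker side: roughly mcs_spin_unlock() waiting for ->next to appear. */
static void *unlocker(void *arg)
{
	struct node *next;

	(void)arg;

	/* The "while" the patch's comment refers to: wait for the link. */
	while (!(next = atomic_load_explicit(&prev_node.next, memory_order_acquire)))
		;

	/* Hand the lock down to the waiter. */
	atomic_store_explicit(&next->locked, 1, memory_order_release);
	return NULL;
}

int main(void)
{
	pthread_t w, u;

	pthread_create(&w, NULL, waiter, NULL);
	pthread_create(&u, NULL, unlocker, NULL);
	pthread_join(w, NULL);
	pthread_join(u, NULL);
	puts("handoff completed");
	return 0;
}

Built with e.g. "cc -pthread sketch.c", the two threads complete the handoff;
the sketch only shows where the store fence sits relative to the
WRITE_ONCE()-style publication and the two spin loops, it makes no claim
about the kernel's actual barrier semantics beyond what the patch states.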

