On Wed, Jan 27, 2021 at 09:01:08PM +0100, Alexander A Sverdlin wrote:
> From: Alexander Sverdlin <[email protected]>
> 
> Ensure writes are pushed out of the core write buffer to prevent waiting
> code on other cores from spinning longer than necessary.
> 
> 6 threads running a tight spinlock loop competing for the same lock
> on 6 cores on MIPS/Octeon do 1000000 iterations...
> 
> before the patch in: 4.3 sec
> after the patch in:  1.2 sec
If you only have 6 cores, I'm not sure qspinlock makes any sense...

> Same 6-core Octeon machine:
> sysbench --test=mutex --num-threads=64 --memory-scope=local run
> 
> w/o patch:  1.53s
> with patch: 1.28s
> 
> This will also allow us to remove the smp_wmb() in
> arch/arm/include/asm/mcs_spinlock.h (was it actually addressing the
> same issue?).
> 
> Finally, our internal, quite diverse test suite covering different
> IPC/network aspects didn't detect any regressions on ARM/ARM64/x86_64.
> 
> Signed-off-by: Alexander Sverdlin <[email protected]>
> ---
>  kernel/locking/mcs_spinlock.h | 5 +++++
>  kernel/locking/qspinlock.c    | 6 ++++++
>  2 files changed, 11 insertions(+)
> 
> diff --git a/kernel/locking/mcs_spinlock.h b/kernel/locking/mcs_spinlock.h
> index 5e10153..10e497a 100644
> --- a/kernel/locking/mcs_spinlock.h
> +++ b/kernel/locking/mcs_spinlock.h
> @@ -89,6 +89,11 @@ void mcs_spin_lock(struct mcs_spinlock **lock, struct mcs_spinlock *node)
>  		return;
>  	}
>  	WRITE_ONCE(prev->next, node);
> +	/*
> +	 * This is necessary to make sure that the corresponding "while" in the
> +	 * mcs_spin_unlock() doesn't loop forever
> +	 */
> +	smp_wmb();

If it loops forever, that's broken hardware design; store buffers need to
drain. I don't think we should add unconditional barriers to bodge this.

>  	/* Wait until the lock holder passes the lock down. */
>  	arch_mcs_spin_lock_contended(&node->locked);
> diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
> index cbff6ba..577fe01 100644
> --- a/kernel/locking/qspinlock.c
> +++ b/kernel/locking/qspinlock.c
> @@ -469,6 +469,12 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
> 
>  	/* Link @node into the waitqueue. */
>  	WRITE_ONCE(prev->next, node);
> +	/*
> +	 * This is necessary to make sure that the corresponding
> +	 * smp_cond_load_relaxed() below (running on another core)
> +	 * doesn't spin forever.
> +	 */
> +	smp_wmb();

Likewise.

Will
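
For readers following along, the pattern in dispute looks roughly like the
minimal user-space MCS lock sketched below, with C11 atomics standing in
for the kernel's WRITE_ONCE()/smp_*() primitives. The atomics mapping and
all names here are illustrative assumptions, not the kernel code: the
comments mark the successor's ->next store that the patch chases with
smp_wmb(), and the unlock-side spin that the barrier is meant to shorten.

	#include <stdatomic.h>
	#include <stdbool.h>
	#include <stddef.h>

	struct mcs_node {
		_Atomic(struct mcs_node *) next;
		atomic_bool locked;
	};

	static _Atomic(struct mcs_node *) tail;

	static void mcs_lock(struct mcs_node *node)
	{
		struct mcs_node *prev;

		atomic_store_explicit(&node->next, NULL, memory_order_relaxed);
		atomic_store_explicit(&node->locked, false, memory_order_relaxed);

		/* Swing the tail to ourselves, publishing the initialised node. */
		prev = atomic_exchange_explicit(&tail, node, memory_order_acq_rel);
		if (!prev)
			return;		/* uncontended: lock acquired */

		/*
		 * Link behind the predecessor. This is the store the patch
		 * follows with smp_wmb(): it is already ordered well enough
		 * for correctness, so a barrier can only make it visible
		 * sooner, not make the algorithm correct.
		 */
		atomic_store_explicit(&prev->next, node, memory_order_release);

		/* Wait until the lock holder passes the lock down. */
		while (!atomic_load_explicit(&node->locked, memory_order_acquire))
			;
	}

	static void mcs_unlock(struct mcs_node *node)
	{
		struct mcs_node *next =
			atomic_load_explicit(&node->next, memory_order_acquire);

		if (!next) {
			struct mcs_node *expected = node;

			/* No visible successor: try to reset the tail and leave. */
			if (atomic_compare_exchange_strong_explicit(&tail,
					&expected, NULL, memory_order_acq_rel,
					memory_order_acquire))
				return;

			/*
			 * A successor is mid-enqueue; spin until its ->next
			 * store drains out of the store buffer. This is the
			 * "while" the patch comment worries about: it
			 * terminates regardless, the barrier would only
			 * bound how long it spins.
			 */
			while (!(next = atomic_load_explicit(&node->next,
							     memory_order_acquire)))
				;
		}

		/* Hand the lock to the successor. */
		atomic_store_explicit(&next->locked, true, memory_order_release);
	}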

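And a tiny, hypothetical test program for Will's point that store buffers
drain on their own: the reader below terminates even though the writer
issues no explicit barrier after its store, so a barrier in that position
can only change how soon the flag is seen, never whether it is seen.
(Build with -pthread; this demonstrates the principle in C11's memory
model, it is not a proof about any particular microarchitecture.)

	#include <pthread.h>
	#include <stdatomic.h>
	#include <stdio.h>

	static atomic_int flag;

	static void *spinner(void *arg)
	{
		(void)arg;
		/* Spins on a relaxed load; still terminates once the store drains. */
		while (!atomic_load_explicit(&flag, memory_order_relaxed))
			;
		return NULL;
	}

	int main(void)
	{
		pthread_t t;

		pthread_create(&t, NULL, spinner, NULL);
		/* Plain relaxed store, no smp_wmb()-style fence after it. */
		atomic_store_explicit(&flag, 1, memory_order_relaxed);
		pthread_join(t, NULL);
		puts("spinner observed the store without an explicit barrier");
		return 0;
	}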
