On 5/26/2016 10:19 AM, Peter Zijlstra wrote:
> --- a/arch/tile/lib/spinlock_32.c
> +++ b/arch/tile/lib/spinlock_32.c
> @@ -72,10 +72,14 @@ void arch_spin_unlock_wait(arch_spinlock
>  	if (next == curr)
>  		return;
>  
> +	smp_rmb();
> +
>  	/* Wait until the current locker has released the lock. */
>  	do {
>  		delay_backoff(iterations++);
>  	} while (READ_ONCE(lock->current_ticket) == curr);
> +
> +	smp_acquire__after_ctrl_dep();
>  }
>  EXPORT_SYMBOL(arch_spin_unlock_wait);
>  
> --- a/arch/tile/lib/spinlock_64.c
> +++ b/arch/tile/lib/spinlock_64.c
> @@ -72,10 +72,14 @@ void arch_spin_unlock_wait(arch_spinlock
>  	if (arch_spin_next(val) == curr)
>  		return;
>  
> +	smp_rmb();
> +
>  	/* Wait until the current locker has released the lock. */
>  	do {
>  		delay_backoff(iterations++);
>  	} while (arch_spin_current(READ_ONCE(lock->lock)) == curr);
> +
> +	smp_acquire__after_ctrl_dep();
>  }
>  EXPORT_SYMBOL(arch_spin_unlock_wait);
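I would simply drop the two smp_rmb() calls. The 32-bit routine would then end up looking roughly like this (a sketch from memory rather than a compile-tested patch; the next_ticket field and the iterations local are paraphrased, the rest is as in the hunk above):

void arch_spin_unlock_wait(arch_spinlock_t *lock)
{
	u32 iterations = 0;
	int curr = READ_ONCE(lock->current_ticket);
	int next = READ_ONCE(lock->next_ticket);

	/* Return immediately if unlocked. */
	if (next == curr)
		return;

	/* Wait until the current locker has released the lock. */
	do {
		delay_backoff(iterations++);
	} while (READ_ONCE(lock->current_ticket) == curr);

	/*
	 * The loop exits on a load/compare of current_ticket; pairing
	 * that control dependency with this barrier provides the
	 * ACQUIRE ordering the patch is after, with no smp_rmb() needed.
	 */
	smp_acquire__after_ctrl_dep();
}
EXPORT_SYMBOL(arch_spin_unlock_wait);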
The smp_rmb()s are unnecessary for tile. We READ_ONCE() next/curr from the lock and compare them, so we know those loads have completed; there is no microarchitectural speculation going on, so that's that. The next load of the lock is then also done with READ_ONCE() from inside the wait loop, so our load/load ordering is guaranteed.

With that change,

Acked-by: Chris Metcalf <[email protected]> [for tile]

--
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com

