On Wed, Jan 27, 2021 at 09:36:22PM +0100, Alexander A Sverdlin wrote:
> From: Alexander Sverdlin <alexander.sverd...@nokia.com>
> 
> On Octeon, smp_mb() translates to SYNC, while wmb+rmb translates to
> SYNCW only. This yields around 10% better performance in tight
> uncontended spinlock loops.
> 
> Refer to commit 500c2e1fdbcc ("MIPS: Optimize spinlocks.") and the link
> below.
> 
> On 6-core Octeon machine:
> sysbench --test=mutex --num-threads=64 --memory-scope=local run
> 
> w/o patch:    1.60s
> with patch:   1.51s
> 
> Link: https://lore.kernel.org/lkml/5644d08d.4080...@caviumnetworks.com/
> Signed-off-by: Alexander Sverdlin <alexander.sverd...@nokia.com>
> ---
>  arch/mips/include/asm/barrier.h | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
> index 49ff172..24c3f2c 100644
> --- a/arch/mips/include/asm/barrier.h
> +++ b/arch/mips/include/asm/barrier.h
> @@ -113,6 +113,15 @@ static inline void wmb(void)
>                                           ".set arch=octeon\n\t"      \
>                                           "syncw\n\t"                 \
>                                           ".set pop" : : : "memory")
> +
> +#define __smp_store_release(p, v)                                    \
> +do {                                                                 \
> +     compiletime_assert_atomic_type(*p);                             \
> +     __smp_wmb();                                                    \
> +     __smp_rmb();                                                    \
> +     WRITE_ONCE(*p, v);                                              \
> +} while (0)

This is wrong in general: smp_rmb() only provides order between two
loads, while smp_store_release() must order preceding accesses against a
store.

If this is correct for all MIPS, this needs a giant comment on exactly
how that smp_rmb() makes sense here.
