A link from ibm.com states: "Ensures that all instructions preceding the call to __lwsync complete before any subsequent store instructions can be executed on the processor that executed the function. Also, it ensures that all load instructions preceding the call to __lwsync complete before any subsequent load instructions can be executed on the processor that executed the function. This allows you to synchronize between multiple processors with minimal performance impact, as __lwsync does not wait for confirmation from each processor."
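As a rough user-space illustration of the two orderings quoted above (store-store and load-load), here is a minimal message-passing sketch using portable C11 fences. It is not kernel code and not part of this patch: the names payload, ready, writer, reader and run_message_passing are hypothetical, and it assumes a compiler that lowers release/acquire fences to lwsync on PowerPC, which GCC and Clang typically do.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical names for illustration only; not part of this patch. */
static int payload;       /* plain data published by the writer */
static atomic_int ready;  /* flag telling the reader the data is valid */

static void *writer(void *arg)
{
	(void)arg;
	payload = 42;                                   /* first store */
	/* Order the store above before the store below (the wmb role). */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&ready, 1, memory_order_relaxed);
	return NULL;
}

static void *reader(void *arg)
{
	int *out = arg;

	/* First load: spin until the flag is observed. */
	while (!atomic_load_explicit(&ready, memory_order_relaxed))
		;
	/* Order the load above before the load below (the rmb role). */
	atomic_thread_fence(memory_order_acquire);
	*out = payload;                                 /* second load */
	return NULL;
}

/* Runs one writer and one reader; returns the payload the reader saw. */
int run_message_passing(void)
{
	pthread_t w, r;
	int seen = 0;

	payload = 0;
	atomic_store(&ready, 0);
	pthread_create(&r, NULL, reader, &seen);
	pthread_create(&w, NULL, writer, NULL);
	pthread_join(w, NULL);
	pthread_join(r, NULL);
	return seen;
}
```

Once the reader has seen the flag set, the two fences together guarantee it also sees the payload store. That covers only the store-store and load-load cases from the quoted text. A store followed by a load to a different location is not ordered by these fences.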
That's why smp_rmb() and smp_wmb() are defined to lwsync.
But this same understanding applies to parallel pipeline
execution on each PowerPC processor.
So, use the lwsync instruction for rmb() and wmb() on the PPC
architectures that support it.

Signed-off-by: Kautuk Consul <kcon...@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/barrier.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
index b95b666f0374..e088dacc0ee8 100644
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -36,8 +36,15 @@
  * heavy-weight sync, so smp_wmb() can be a lighter-weight eieio.
  */
 #define __mb() __asm__ __volatile__ ("sync" : : : "memory")
+
+/* The sub-arch has lwsync. */
+#if defined(CONFIG_PPC64) || defined(CONFIG_PPC_E500MC)
+#define __rmb() __asm__ __volatile__ ("lwsync" : : : "memory")
+#define __wmb() __asm__ __volatile__ ("lwsync" : : : "memory")
+#else
 #define __rmb() __asm__ __volatile__ ("sync" : : : "memory")
 #define __wmb() __asm__ __volatile__ ("sync" : : : "memory")
+#endif
 
 /* The sub-arch has lwsync */
 #if defined(CONFIG_PPC64) || defined(CONFIG_PPC_E500MC)
-- 
2.31.1