In a microbenchmark on my G5, using an lwsync, isync sequence for smp_mb is 5 times faster than using sync, even though it takes more instructions.
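
A minimal userspace sketch of how such a microbenchmark could be structured follows. It is not the benchmark behind the 5x figure above. The names barrier_sync, barrier_lwsync_isync, DEFINE_BENCH and now_ns, and the store/load placed around the barrier, are assumptions for illustration. The "cr0" clobber is added in this sketch because cmpw writes CR field 0. On non-PowerPC hosts both variants fall back to a generic full fence so the file still builds, and the timings are then not meaningful for this comparison.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

#if defined(__powerpc64__)
/* Existing full barrier. */
static inline void barrier_sync(void)
{
	__asm__ __volatile__ ("sync" : : : "memory");
}

/* Sequence from the patch; "cr0" clobber added here (assumption). */
static inline void barrier_lwsync_isync(void)
{
	__asm__ __volatile__ (
		"1:	lwsync			\n"
		"	cmpw	0,%%r0,%%r0	\n"
		"	bne-	1b		\n"
		"	isync			\n"
		: : : "cr0", "memory");
}
#else
/* Not PowerPC: generic full fence, only so that the sketch compiles. */
static inline void barrier_sync(void)
{
	__atomic_thread_fence(__ATOMIC_SEQ_CST);
}

static inline void barrier_lwsync_isync(void)
{
	__atomic_thread_fence(__ATOMIC_SEQ_CST);
}
#endif

/* Monotonic clock in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Volatile so the compiler keeps the store and the load in the loop. */
static volatile unsigned long shared_a, shared_b;

/* Generates a function that times `iters` store/barrier/load rounds. */
#define DEFINE_BENCH(name, barrier)					\
static uint64_t name(unsigned long iters)				\
{									\
	uint64_t start = now_ns();					\
	for (unsigned long i = 0; i < iters; i++) {			\
		shared_a = i;		/* store before the barrier */	\
		barrier();						\
		(void)shared_b;		/* load after the barrier */	\
	}								\
	return now_ns() - start;					\
}

DEFINE_BENCH(bench_sync, barrier_sync)
DEFINE_BENCH(bench_lwsync_isync, barrier_lwsync_isync)
```

Calling bench_sync(N) and bench_lwsync_isync(N) with the same N and dividing the two results gives the relative cost per barrier on the machine under test.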

Running tbench with 4 clients on my 4-core G5 (20 runs) gives the following:

unpatched AVG=920.33 STD=2.36
patched   AVG=921.27 STD=2.77

So not a big improvement here; the difference is smaller than the standard deviation and could well be in the noise. But other workloads or systems might see a bigger win, and the patch may be interesting or could be improved, so I'm asking for comments.

---
Index: linux-2.6/arch/powerpc/include/asm/system.h
===================================================================
--- linux-2.6.orig/arch/powerpc/include/asm/system.h	2009-02-20 01:51:24.000000000 +1100
+++ linux-2.6/arch/powerpc/include/asm/system.h	2009-02-20 02:09:41.000000000 +1100
@@ -52,7 +52,16 @@
 # define SMPWMB eieio
 #endif
 
+#ifdef __powerpc64__
+#define smp_mb()	__asm__ __volatile__ (			\
+	"1:	lwsync			\n"		\
+	"	cmpw	0,%%r0,%%r0	\n"		\
+	"	bne-	1b		\n"		\
+	"	isync			\n"		\
+	: : : "memory")
+#else
 #define smp_mb()	mb()
+#endif
 #define smp_rmb()	__asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory")
 #define smp_wmb()	__asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory")
 #define smp_read_barrier_depends()	read_barrier_depends()
_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev