In a microbenchmark on my G5, using an lwsync, isync sequence for smp_mb is
5 times faster than using sync, even though it takes more instructions.

Running tbench with 4 clients on my 4-core G5 (20 runs each) gives the
following:

unpatched AVG=920.33 STD=2.36
  patched AVG=921.27 STD=2.77

So there is no big improvement here: the difference is about 0.1%, which is
smaller than one standard deviation, so it could be in the noise. But other
workloads or systems might see a bigger win, and the patch may be interesting
or could be improved, so I'm asking for comments.

---
Index: linux-2.6/arch/powerpc/include/asm/system.h
===================================================================
--- linux-2.6.orig/arch/powerpc/include/asm/system.h    2009-02-20 01:51:24.000000000 +1100
+++ linux-2.6/arch/powerpc/include/asm/system.h 2009-02-20 02:09:41.000000000 +1100
@@ -52,7 +52,16 @@
 #    define SMPWMB      eieio
 #endif
 
+#ifdef __powerpc64__
+#define smp_mb()       __asm__ __volatile__ (                              \
+                                       "1:     lwsync                  \n" \
+                                       "       cmpw    0,%%r0,%%r0     \n" \
+                                       "       bne-    1b              \n" \
+                                       "       isync                   \n" \
+                                       : : : "memory")
+#else
 #define smp_mb()       mb()
+#endif
 #define smp_rmb()      __asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory")
 #define smp_wmb()      __asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory")
 #define smp_read_barrier_depends()     read_barrier_depends()