Re: [patch] powerpc: implement optimised mutex fastpaths

2008-11-05 Thread Paul Mackerras
Nick Piggin writes:

 On Sun, Oct 12, 2008 at 07:47:32AM +0200, Nick Piggin wrote:
  
  Implement a more efficient mutex fastpath for powerpc, making use of acquire
  and release barrier semantics. This takes the mutex lock+unlock benchmark
  from 203 to 173 cycles on a G5.
  
  +static inline int
  +__mutex_fastpath_trylock(atomic_t *count, int (*fail_fn)(atomic_t *))
  +{
  +	if (likely(__mutex_cmpxchg_lock(count, 1, 0) == 1))
  +		return 1;
 
 Oops, I must have sent the wrong version. This needs a return 0 here.

Are you going to send the right version?  (or did you send it and I
missed it?)

Also I note that the comment you added just before that routine says:

+ * Change the count from 1 to a value lower than 1, and return 0 (failure)
+ * if it wasn't 1 originally, or return 1 (success) otherwise. This function
+ * MUST leave the value lower than 1 even when the 1 assertion wasn't true.

yet just doing a __mutex_cmpxchg_lock isn't going to do that.  So
please fix either the comment or the code (or both :).

Paul.


Re: [patch] powerpc: implement optimised mutex fastpaths

2008-11-05 Thread Nick Piggin
On Thu, Nov 06, 2008 at 03:09:08PM +1100, Paul Mackerras wrote:
 Nick Piggin writes:
 
  On Sun, Oct 12, 2008 at 07:47:32AM +0200, Nick Piggin wrote:
   
   Implement a more efficient mutex fastpath for powerpc, making use of acquire
   and release barrier semantics. This takes the mutex lock+unlock benchmark
   from 203 to 173 cycles on a G5.
   
   +static inline int
   +__mutex_fastpath_trylock(atomic_t *count, int (*fail_fn)(atomic_t *))
   +{
   +	if (likely(__mutex_cmpxchg_lock(count, 1, 0) == 1))
   +		return 1;
  
  Oops, I must have sent the wrong version. This needs a return 0 here.
 
 Are you going to send the right version?  (or did you send it and I
 missed it?)

Hmm, I thought I did but perhaps not. Will do...


 Also I note that the comment you added just before that routine says:
 
 + * Change the count from 1 to a value lower than 1, and return 0 (failure)
 + * if it wasn't 1 originally, or return 1 (success) otherwise. This function
 + * MUST leave the value lower than 1 even when the 1 assertion wasn't true.
 
 yet just doing a __mutex_cmpxchg_lock isn't going to do that.  So
 please fix either the comment or the code (or both :).

OK.


Re: [patch] powerpc: implement optimised mutex fastpaths

2008-10-14 Thread Nick Piggin
On Mon, Oct 13, 2008 at 11:20:20AM -0500, Scott Wood wrote:
 On Mon, Oct 13, 2008 at 11:15:47AM -0500, Scott Wood wrote:
  On Sun, Oct 12, 2008 at 07:47:32AM +0200, Nick Piggin wrote:
   +static inline int __mutex_cmpxchg_lock(atomic_t *v, int old, int new)
   +{
   +	int t;
   +
   +	__asm__ __volatile__ (
   +"1:	lwarx	%0,0,%1		# mutex trylock\n\
   +	cmpw	0,%0,%2\n\
   +	bne-	2f\n"
   +	PPC405_ERR77(0,%1)
   +"	stwcx.	%3,0,%1\n\
   +	bne-	1b"
   +	ISYNC_ON_SMP
   +	"\n\
   +2:"
   +	: "=&r" (t)
   +	: "r" (&v->counter), "r" (old), "r" (new)
   +	: "cc", "memory");
  
  This will break if the compiler picks r0 for v->counter.  Use "b" as the
  constraint, or better yet do lwarx %0,%y1, with a "Z" (v->counter)
  constraint.
 
 Sorry, had a morning brain-fart there -- you're not using the base
 register.

OK no problem. Do you think it looks OK? Thanks for reviewing...


Re: [patch] powerpc: implement optimised mutex fastpaths

2008-10-13 Thread Scott Wood
On Sun, Oct 12, 2008 at 07:47:32AM +0200, Nick Piggin wrote:
 +static inline int __mutex_cmpxchg_lock(atomic_t *v, int old, int new)
 +{
 +	int t;
 +
 +	__asm__ __volatile__ (
 +"1:	lwarx	%0,0,%1		# mutex trylock\n\
 +	cmpw	0,%0,%2\n\
 +	bne-	2f\n"
 +	PPC405_ERR77(0,%1)
 +"	stwcx.	%3,0,%1\n\
 +	bne-	1b"
 +	ISYNC_ON_SMP
 +	"\n\
 +2:"
 +	: "=&r" (t)
 +	: "r" (&v->counter), "r" (old), "r" (new)
 +	: "cc", "memory");

This will break if the compiler picks r0 for v->counter.  Use "b" as the
constraint, or better yet do lwarx %0,%y1, with a "Z" (v->counter)
constraint.
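
Roughly, the "Z"/%y form might look like this (untested sketch; the
PPC405_ERR77 and ISYNC_ON_SMP macros from the patch are omitted here
for brevity):

static inline int __mutex_cmpxchg_lock(atomic_t *v, int old, int new)
{
	int t;

	/* "Z" hands gcc the memory operand itself, and %y1 expands it
	 * to the "RA,RB" form that lwarx/stwcx. expect, so the address
	 * registers are chosen by the compiler rather than passed in
	 * via a bare "r" constraint. */
	__asm__ __volatile__ (
"1:	lwarx	%0,%y1		# mutex trylock\n\
	cmpw	0,%0,%2\n\
	bne-	2f\n\
	stwcx.	%3,%y1\n\
	bne-	1b\n\
2:"
	: "=&r" (t)
	: "Z" (v->counter), "r" (old), "r" (new)
	: "cc", "memory");

	return t;
}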

-Scott


Re: [patch] powerpc: implement optimised mutex fastpaths

2008-10-13 Thread Scott Wood
On Mon, Oct 13, 2008 at 11:15:47AM -0500, Scott Wood wrote:
 On Sun, Oct 12, 2008 at 07:47:32AM +0200, Nick Piggin wrote:
  +static inline int __mutex_cmpxchg_lock(atomic_t *v, int old, int new)
  +{
  +	int t;
  +
  +	__asm__ __volatile__ (
  +"1:	lwarx	%0,0,%1		# mutex trylock\n\
  +	cmpw	0,%0,%2\n\
  +	bne-	2f\n"
  +	PPC405_ERR77(0,%1)
  +"	stwcx.	%3,0,%1\n\
  +	bne-	1b"
  +	ISYNC_ON_SMP
  +	"\n\
  +2:"
  +	: "=&r" (t)
  +	: "r" (&v->counter), "r" (old), "r" (new)
  +	: "cc", "memory");
 
 This will break if the compiler picks r0 for v->counter.  Use "b" as the
 constraint, or better yet do lwarx %0,%y1, with a "Z" (v->counter)
 constraint.

Sorry, had a morning brain-fart there -- you're not using the base
register.

-Scott


[patch] powerpc: implement optimised mutex fastpaths

2008-10-12 Thread Nick Piggin

Implement a more efficient mutex fastpath for powerpc, making use of acquire
and release barrier semantics. This takes the mutex lock+unlock benchmark
from 203 to 173 cycles on a G5.

Signed-off-by: Nick Piggin [EMAIL PROTECTED]
---
Index: linux-2.6/arch/powerpc/include/asm/mutex.h
===================================================================
--- linux-2.6.orig/arch/powerpc/include/asm/mutex.h
+++ linux-2.6/arch/powerpc/include/asm/mutex.h
@@ -1,9 +1,136 @@
 /*
- * Pull in the generic implementation for the mutex fastpath.
+ * Optimised mutex implementation of include/asm-generic/mutex-dec.h algorithm
+ */
+#ifndef _ASM_POWERPC_MUTEX_H
+#define _ASM_POWERPC_MUTEX_H
+
+static inline int __mutex_cmpxchg_lock(atomic_t *v, int old, int new)
+{
+	int t;
+
+	__asm__ __volatile__ (
+"1:	lwarx	%0,0,%1		# mutex trylock\n\
+	cmpw	0,%0,%2\n\
+	bne-	2f\n"
+	PPC405_ERR77(0,%1)
+"	stwcx.	%3,0,%1\n\
+	bne-	1b"
+	ISYNC_ON_SMP
+	"\n\
+2:"
+	: "=&r" (t)
+	: "r" (&v->counter), "r" (old), "r" (new)
+	: "cc", "memory");
+
+	return t;
+}
+
+static inline int __mutex_dec_return_lock(atomic_t *v)
+{
+	int t;
+
+	__asm__ __volatile__(
+"1:	lwarx	%0,0,%1		# mutex lock\n\
+	addic	%0,%0,-1\n"
+	PPC405_ERR77(0,%1)
+"	stwcx.	%0,0,%1\n\
+	bne-	1b"
+	ISYNC_ON_SMP
+	: "=&r" (t)
+	: "r" (&v->counter)
+	: "cc", "memory");
+
+	return t;
+}
+
+static inline int __mutex_inc_return_unlock(atomic_t *v)
+{
+	int t;
+
+	__asm__ __volatile__(
+	LWSYNC_ON_SMP
+"1:	lwarx	%0,0,%1		# mutex unlock\n\
+	addic	%0,%0,1\n"
+	PPC405_ERR77(0,%1)
+"	stwcx.	%0,0,%1\n\
+	bne-	1b"
+	: "=&r" (t)
+	: "r" (&v->counter)
+	: "cc", "memory");
+
+	return t;
+}
+
+/**
+ *  __mutex_fastpath_lock - try to take the lock by moving the count
+ *  from 1 to a 0 value
+ *  @count: pointer of type atomic_t
+ *  @fail_fn: function to call if the original value was not 1
+ *
+ * Change the count from 1 to a value lower than 1, and call fail_fn if
+ * it wasn't 1 originally. This function MUST leave the value lower than
+ * 1 even when the 1 assertion wasn't true.
+ */
+static inline void
+__mutex_fastpath_lock(atomic_t *count, void (*fail_fn)(atomic_t *))
+{
+	if (unlikely(__mutex_dec_return_lock(count) < 0))
+		fail_fn(count);
+}
+
+/**
+ *  __mutex_fastpath_lock_retval - try to take the lock by moving the count
+ * from 1 to a 0 value
+ *  @count: pointer of type atomic_t
+ *  @fail_fn: function to call if the original value was not 1
+ *
+ * Change the count from 1 to a value lower than 1, and call fail_fn if
+ * it wasn't 1 originally. This function returns 0 if the fastpath succeeds,
+ * or anything the slow path function returns.
+ */
+static inline int
+__mutex_fastpath_lock_retval(atomic_t *count, int (*fail_fn)(atomic_t *))
+{
+	if (unlikely(__mutex_dec_return_lock(count) < 0))
+		return fail_fn(count);
+	return 0;
+}
+
+/**
+ *  __mutex_fastpath_unlock - try to promote the count from 0 to 1
+ *  @count: pointer of type atomic_t
+ *  @fail_fn: function to call if the original value was not 0
+ *
+ * Try to promote the count from 0 to 1. If it wasn't 0, call fail_fn.
+ * In the failure case, this function is allowed to either set the value to
+ * 1, or to set it to a value lower than 1.
+ */
+static inline void
+__mutex_fastpath_unlock(atomic_t *count, void (*fail_fn)(atomic_t *))
+{
+	if (unlikely(__mutex_inc_return_unlock(count) <= 0))
+		fail_fn(count);
+}
+
+#define __mutex_slowpath_needs_to_unlock() 1
+
+/**
+ * __mutex_fastpath_trylock - try to acquire the mutex, without waiting
+ *
+ *  @count: pointer of type atomic_t
+ *  @fail_fn: fallback function
  *
- * TODO: implement optimized primitives instead, or leave the generic
- * implementation in place, or pick the atomic_xchg() based generic
- * implementation. (see asm-generic/mutex-xchg.h for details)
+ * Change the count from 1 to a value lower than 1, and return 0 (failure)
+ * if it wasn't 1 originally, or return 1 (success) otherwise. This function
+ * MUST leave the value lower than 1 even when the 1 assertion wasn't true.
+ * Additionally, if the value was < 0 originally, this function must not leave
+ * it to 0 on failure.
  */
+static inline int
+__mutex_fastpath_trylock(atomic_t *count, int (*fail_fn)(atomic_t *))
+{
+	if (likely(__mutex_cmpxchg_lock(count, 1, 0) == 1))
+		return 1;
+}
 
-#include <asm-generic/mutex-dec.h>
+#endif
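
For context, a rough sketch of how the generic code in kernel/mutex.c
is expected to invoke these fastpath hooks (the slowpath names are the
generic kernel ones; debug and scheduling details elided):

void mutex_lock(struct mutex *lock)
{
	might_sleep();
	/* One atomic decrement with acquire semantics; only falls
	 * back to the out-of-line slowpath if the count was not 1. */
	__mutex_fastpath_lock(&lock->count, __mutex_lock_slowpath);
}

void mutex_unlock(struct mutex *lock)
{
	/* One atomic increment with release semantics; the slowpath
	 * runs to wake waiters when the new count is still <= 0. */
	__mutex_fastpath_unlock(&lock->count, __mutex_unlock_slowpath);
}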


Re: [patch] powerpc: implement optimised mutex fastpaths

2008-10-12 Thread Nick Piggin
On Sun, Oct 12, 2008 at 07:47:32AM +0200, Nick Piggin wrote:
 
 Implement a more efficient mutex fastpath for powerpc, making use of acquire
 and release barrier semantics. This takes the mutex lock+unlock benchmark
 from 203 to 173 cycles on a G5.
 
 +static inline int
 +__mutex_fastpath_trylock(atomic_t *count, int (*fail_fn)(atomic_t *))
 +{
 +	if (likely(__mutex_cmpxchg_lock(count, 1, 0) == 1))
 +		return 1;

Oops, I must have sent the wrong version. This needs a return 0 here.

 +}
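
That is, with the missing failure return added, the routine should read:

static inline int
__mutex_fastpath_trylock(atomic_t *count, int (*fail_fn)(atomic_t *))
{
	if (likely(__mutex_cmpxchg_lock(count, 1, 0) == 1))
		return 1;
	return 0;	/* cmpxchg failed: the mutex was not free */
}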
