Re: [patch] mutex: optimise generic mutex implementations

2008-10-23 Thread Nick Piggin
On Thu, Oct 23, 2008 at 03:43:58PM +1100, Benjamin Herrenschmidt wrote:
 On Wed, 2008-10-22 at 17:59 +0200, Ingo Molnar wrote:
  * Nick Piggin [EMAIL PROTECTED] wrote:
  
   Speed up generic mutex implementations.
   
   - atomic operations which both modify the variable and return something imply
     full smp memory barriers before and after the memory operations involved
     (failing atomic_cmpxchg, atomic_add_unless, etc don't imply a barrier because
     they don't modify the target). See Documentation/atomic_ops.txt.
     So remove extra barriers and branches.

   - All architectures support atomic_cmpxchg. This has no relation to
     __HAVE_ARCH_CMPXCHG. We can just take the atomic_cmpxchg path unconditionally

   This reduces a simple single threaded fastpath lock+unlock test from 590 cycles
   to 203 cycles on a ppc970 system.
   
   Signed-off-by: Nick Piggin [EMAIL PROTECTED]
  
  no objections here. Let's merge these two patches via the ppc tree, so
  that it gets testing on real hardware as well?
  
  Acked-by: Ingo Molnar [EMAIL PROTECTED]
 
 All right, but in that case it will be after -rc1 unless I manage to sneak
 something in tomorrow before Linus closes the merge window.
 
 I can't get an update today.

Fine with me.

Thanks,
Nick
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [patch] mutex: optimise generic mutex implementations

2008-10-22 Thread Ingo Molnar

* Nick Piggin [EMAIL PROTECTED] wrote:

 Speed up generic mutex implementations.
 
 - atomic operations which both modify the variable and return something imply
   full smp memory barriers before and after the memory operations involved
   (failing atomic_cmpxchg, atomic_add_unless, etc don't imply a barrier because
   they don't modify the target). See Documentation/atomic_ops.txt.
   So remove extra barriers and branches.

 - All architectures support atomic_cmpxchg. This has no relation to
   __HAVE_ARCH_CMPXCHG. We can just take the atomic_cmpxchg path unconditionally

 This reduces a simple single threaded fastpath lock+unlock test from 590 cycles
 to 203 cycles on a ppc970 system.
 
 Signed-off-by: Nick Piggin [EMAIL PROTECTED]

no objections here. Let's merge these two patches via the ppc tree, so
that it gets testing on real hardware as well?

Acked-by: Ingo Molnar [EMAIL PROTECTED]

Ingo


Re: [patch] mutex: optimise generic mutex implementations

2008-10-22 Thread David Howells
Nick Piggin [EMAIL PROTECTED] wrote:

 Speed up generic mutex implementations.
 
 - atomic operations which both modify the variable and return something imply
   full smp memory barriers before and after the memory operations involved
   (failing atomic_cmpxchg, atomic_add_unless, etc don't imply a barrier because
   they don't modify the target). See Documentation/atomic_ops.txt.
   So remove extra barriers and branches.

 - All architectures support atomic_cmpxchg. This has no relation to
   __HAVE_ARCH_CMPXCHG. We can just take the atomic_cmpxchg path unconditionally

 This reduces a simple single threaded fastpath lock+unlock test from 590 cycles
 to 203 cycles on a ppc970 system.
 
 Signed-off-by: Nick Piggin [EMAIL PROTECTED]

This seems to work on FRV which uses the mutex-dec generic algorithm, though
you have to take that with a pinch of salt as I don't have SMP hardware for
it.

Acked-by: David Howells [EMAIL PROTECTED]


Re: [patch] mutex: optimise generic mutex implementations

2008-10-22 Thread Benjamin Herrenschmidt
On Wed, 2008-10-22 at 17:59 +0200, Ingo Molnar wrote:
 * Nick Piggin [EMAIL PROTECTED] wrote:
 
  Speed up generic mutex implementations.
  
  - atomic operations which both modify the variable and return something imply
    full smp memory barriers before and after the memory operations involved
    (failing atomic_cmpxchg, atomic_add_unless, etc don't imply a barrier because
    they don't modify the target). See Documentation/atomic_ops.txt.
    So remove extra barriers and branches.

  - All architectures support atomic_cmpxchg. This has no relation to
    __HAVE_ARCH_CMPXCHG. We can just take the atomic_cmpxchg path unconditionally

  This reduces a simple single threaded fastpath lock+unlock test from 590 cycles
  to 203 cycles on a ppc970 system.
  
  Signed-off-by: Nick Piggin [EMAIL PROTECTED]
 
 no objections here. Let's merge these two patches via the ppc tree, so
 that it gets testing on real hardware as well?
 
 Acked-by: Ingo Molnar [EMAIL PROTECTED]

All right, but in that case it will be after -rc1 unless I manage to sneak
something in tomorrow before Linus closes the merge window.

I can't get an update today.

Cheers,
Ben.



Re: [patch] mutex: optimise generic mutex implementations

2008-10-14 Thread Benjamin Herrenschmidt
On Sun, 2008-10-12 at 07:46 +0200, Nick Piggin wrote:
 Speed up generic mutex implementations.
 
 - atomic operations which both modify the variable and return something imply
   full smp memory barriers before and after the memory operations involved
   (failing atomic_cmpxchg, atomic_add_unless, etc don't imply a barrier because
   they don't modify the target). See Documentation/atomic_ops.txt.
   So remove extra barriers and branches.

 - All architectures support atomic_cmpxchg. This has no relation to
   __HAVE_ARCH_CMPXCHG. We can just take the atomic_cmpxchg path unconditionally

 This reduces a simple single threaded fastpath lock+unlock test from 590 cycles
 to 203 cycles on a ppc970 system.
 
 Signed-off-by: Nick Piggin [EMAIL PROTECTED]

Looks ok.

Cheers,
Ben.




[patch] mutex: optimise generic mutex implementations

2008-10-11 Thread Nick Piggin
Speed up generic mutex implementations.

- atomic operations which both modify the variable and return something imply
  full smp memory barriers before and after the memory operations involved
  (failing atomic_cmpxchg, atomic_add_unless, etc don't imply a barrier because
  they don't modify the target). See Documentation/atomic_ops.txt.
  So remove extra barriers and branches.
  
- All architectures support atomic_cmpxchg. This has no relation to
  __HAVE_ARCH_CMPXCHG. We can just take the atomic_cmpxchg path unconditionally
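
The barrier argument in the two points above can be illustrated outside the
kernel. This is a minimal userspace sketch, not the kernel code: it assumes C11
<stdatomic.h>, whose default seq_cst read-modify-write operations stand in
(only approximately, since the kernel memory model is not C11's) for
value-returning atomics such as atomic_dec_return. The type sketch_mutex, the
sketch_* functions and the slowpath counter are hypothetical names. The count
encoding follows mutex-dec.h: 1 unlocked, 0 locked, negative locked with
possible waiters.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a mutex whose state lives in one atomic
 * count: 1 = unlocked, 0 = locked, negative = locked, maybe waiters. */
typedef struct {
	atomic_int count;
} sketch_mutex;

/* Hypothetical slowpath stub: the kernel's fail_fn would queue and
 * sleep (lock) or wake a waiter (unlock); here we only count calls. */
static int sketch_slowpath_calls;

static void sketch_slowpath(sketch_mutex *m)
{
	(void)m;
	sketch_slowpath_calls++;
}

/* Decrement-based lock fastpath (cf. mutex-dec.h). atomic_fetch_sub is
 * a seq_cst read-modify-write, so no separate fence follows it. */
static void sketch_lock(sketch_mutex *m)
{
	/* fetch_sub returns the old value; old - 1 is what
	 * atomic_dec_return() would return. */
	if (atomic_fetch_sub(&m->count, 1) - 1 < 0)
		sketch_slowpath(m);
}

/* Unlock fastpath: no leading fence, the RMW itself orders the
 * critical section before the release. */
static void sketch_unlock(sketch_mutex *m)
{
	int old = atomic_fetch_add(&m->count, 1);

	assert(old <= 0); /* caller must hold the mutex */
	if (old + 1 <= 0)
		sketch_slowpath(m);
}

/* cmpxchg-style trylock, taken unconditionally: returns 1 if acquired.
 * A failed compare-exchange leaves count untouched; like the kernel
 * code, this path relies on no ordering when it fails. */
static int sketch_trylock(sketch_mutex *m)
{
	int expected = 1;

	return atomic_compare_exchange_strong(&m->count, &expected, 0);
}
```

Starting from a count of 1, sketch_lock() takes the fastpath and leaves 0, a
following sketch_trylock() fails, and after sketch_unlock() a sketch_trylock()
succeeds, all without entering the slowpath stub.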

This reduces a simple single threaded fastpath lock+unlock test from 590 cycles
to 203 cycles on a ppc970 system.
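
The test behind those numbers is not part of the patch. A comparable
single-threaded measurement could look like the hypothetical harness below: it
runs uncontended lock+unlock pairs on a decrement-style count, optionally adding
two seq_cst fences where the old code had its smp_mb() calls, and reports
wall-clock nanoseconds per pair rather than cycles. It assumes C11 atomics and
timespec_get(); the name bench_lock_unlock is illustrative, no figures are
implied, and results will vary with CPU and compiler.

```c
#include <assert.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical harness: average wall-clock ns per uncontended
 * lock+unlock pair. With extra_fences set, two seq_cst fences model
 * the smp_mb() calls the patch removes (one after the lock RMW, one
 * before the unlock RMW). TIME_UTC is wall-clock time, not a cycle
 * counter, so treat the result as a rough comparison only. */
static double bench_lock_unlock(atomic_int *count, long iters, int extra_fences)
{
	struct timespec t0, t1;

	assert(iters > 0);
	timespec_get(&t0, TIME_UTC);
	for (long i = 0; i < iters; i++) {
		atomic_fetch_sub(count, 1);                        /* lock fastpath */
		if (extra_fences) {
			atomic_thread_fence(memory_order_seq_cst); /* old post-lock smp_mb() */
			atomic_thread_fence(memory_order_seq_cst); /* old pre-unlock smp_mb() */
		}
		atomic_fetch_add(count, 1);                        /* unlock fastpath */
	}
	timespec_get(&t1, TIME_UTC);

	double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9 +
		    (double)(t1.tv_nsec - t0.tv_nsec);
	return ns / (double)iters;
}
```

Calling it once with extra_fences set and once without, on the same counter,
gives a rough before/after comparison; the counter ends where it started.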

Signed-off-by: Nick Piggin [EMAIL PROTECTED]
---
Index: linux-2.6/include/asm-generic/mutex-dec.h
===
--- linux-2.6.orig/include/asm-generic/mutex-dec.h
+++ linux-2.6/include/asm-generic/mutex-dec.h
@@ -22,8 +22,6 @@ __mutex_fastpath_lock(atomic_t *count, v
 {
 	if (unlikely(atomic_dec_return(count) < 0))
 		fail_fn(count);
-	else
-		smp_mb();
 }
 
 /**
@@ -41,10 +39,7 @@ __mutex_fastpath_lock_retval(atomic_t *c
 {
 	if (unlikely(atomic_dec_return(count) < 0))
 		return fail_fn(count);
-	else {
-		smp_mb();
-		return 0;
-	}
+	return 0;
 }
 
 /**
@@ -63,7 +58,6 @@ __mutex_fastpath_lock_retval(atomic_t *c
 static inline void
 __mutex_fastpath_unlock(atomic_t *count, void (*fail_fn)(atomic_t *))
 {
-	smp_mb();
 	if (unlikely(atomic_inc_return(count) <= 0))
 		fail_fn(count);
 }
@@ -98,15 +92,9 @@ __mutex_fastpath_trylock(atomic_t *count
 	 * just as efficient (and simpler) as a 'destructive' probing of
 	 * the mutex state would be.
 	 */
-#ifdef __HAVE_ARCH_CMPXCHG
-	if (likely(atomic_cmpxchg(count, 1, 0) == 1)) {
-		smp_mb();
+	if (likely(atomic_cmpxchg(count, 1, 0) == 1))
 		return 1;
-	}
 	return 0;
-#else
-	return fail_fn(count);
-#endif
 }
 
 #endif
Index: linux-2.6/include/asm-generic/mutex-xchg.h
===
--- linux-2.6.orig/include/asm-generic/mutex-xchg.h
+++ linux-2.6/include/asm-generic/mutex-xchg.h
@@ -27,8 +27,6 @@ __mutex_fastpath_lock(atomic_t *count, v
 {
 	if (unlikely(atomic_xchg(count, 0) != 1))
 		fail_fn(count);
-	else
-		smp_mb();
 }
 
 /**
@@ -46,10 +44,7 @@ __mutex_fastpath_lock_retval(atomic_t *c
 {
 	if (unlikely(atomic_xchg(count, 0) != 1))
 		return fail_fn(count);
-	else {
-		smp_mb();
-		return 0;
-	}
+	return 0;
 }
 
 /**
@@ -67,7 +62,6 @@ __mutex_fastpath_lock_retval(atomic_t *c
 static inline void
 __mutex_fastpath_unlock(atomic_t *count, void (*fail_fn)(atomic_t *))
 {
-	smp_mb();
 	if (unlikely(atomic_xchg(count, 1) != 0))
 		fail_fn(count);
 }
@@ -110,7 +104,6 @@ __mutex_fastpath_trylock(atomic_t *count
 		if (prev < 0)
 			prev = 0;
 	}
-	smp_mb();
 
 	return prev;
 }
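
The xchg-based variant changed by the second half of the patch can be sketched
the same way. Again this is a hypothetical userspace model, not the kernel
source: xchg_mutex, the xchg_* functions and the slowpath counter are
illustrative names, and C11 atomic_exchange (seq_cst) stands in for
atomic_xchg. The trylock mirrors the logic visible in the last hunk: if the
exchange finds a negative (contended) count it puts that value back, and a 1
returned by the second exchange means the owner released the lock in between,
so the caller now holds it.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model of the mutex-xchg.h count:
 * 1 = unlocked, 0 = locked, negative = locked, maybe contended. */
typedef struct {
	atomic_int count;
} xchg_mutex;

static int xchg_slowpath_calls; /* hypothetical stub standing in for fail_fn */

static void xchg_lock(xchg_mutex *m)
{
	/* seq_cst exchange: no trailing fence on success */
	if (atomic_exchange(&m->count, 0) != 1)
		xchg_slowpath_calls++;
}

static void xchg_unlock(xchg_mutex *m)
{
	/* no leading fence: the exchange orders the critical section */
	int old = atomic_exchange(&m->count, 1);

	assert(old != 1); /* caller must hold the mutex */
	if (old != 0)
		xchg_slowpath_calls++;
}

/* Returns 1 if the mutex was acquired, 0 otherwise. */
static int xchg_trylock(xchg_mutex *m)
{
	int prev = atomic_exchange(&m->count, 0);

	if (prev < 0) {
		/* We overwrote a contended marker: restore it. Getting 1
		 * back means the owner unlocked meanwhile and we now hold it. */
		prev = atomic_exchange(&m->count, prev);
		if (prev < 0)
			prev = 0;
	}
	return prev;
}
```

Restoring prev rather than leaving 0 in place keeps the contended marker, so the
eventual unlock still sees a non-zero old value and takes the slowpath to wake
any waiter.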