Re: Use atomics in libgomp mutex

2011-11-29 Thread Alan Modra
On Mon, Nov 28, 2011 at 05:33:17PM -0800, Richard Henderson wrote:
 On 11/27/2011 02:57 PM, Alan Modra wrote:
  This is the mutex part.  Depends on
  http://gcc.gnu.org/ml/gcc-patches/2011-11/msg02454.html for MEMMODEL_*
  values.
  
  * config/linux/mutex.h: Use atomic rather than sync builtins.
  * config/linux/mutex.c: Likewise.  Comment.  Use -1 for waiting state.
  * config/linux/omp-lock.h: Comment fix.
  * config/linux/arm/mutex.h: Delete.
  * config/linux/powerpc/mutex.h: Delete.
  * config/linux/ia64/mutex.h: Delete.
  * config/linux/mips/mutex.h: Delete.
 
 Looks good modulo _4/_n and using the success return of 
 __atomic_compare_exchange.

Did you see my comment about the OpenMP spec requirement that critical,
ordered and parallel regions have a flush on entry and on exit?  So
should I commit a version with

static inline void
gomp_mutex_lock (gomp_mutex_t *mutex)
{
  int oldval = 0;
  /* FIXME: This should just be MEMMODEL_ACQUIRE, MEMMODEL_RELAXED
 but GOMP_critical_start and other functions rely on the lock
 acquisition doing a flush.  See OpenMP API version 3.1 section
 2.8.6 flush Construct.  */
  if (__builtin_expect (!__atomic_compare_exchange_n (mutex, oldval, 1, false,
  MEMMODEL_ACQ_REL,
  MEMMODEL_ACQUIRE),
0))
gomp_mutex_lock_slow (mutex, oldval);
}

or as posted in
http://gcc.gnu.org/ml/gcc-patches/2011-11/msg02455.html?  Perhaps with
the following as well

* ordered.c (gomp_ordered_sync): Add MEMMODEL_RELEASE fence.
* critical.c (GOMP_critical_start): Likewise.

I think we're OK on parallel regions by virtue of gomp_barrier_wait,
but I may not know what I'm talking about.  This stuff fries my
brains.

Index: ordered.c
===
--- ordered.c   (revision 181770)
+++ ordered.c   (working copy)
@@ -199,6 +199,9 @@ gomp_ordered_sync (void)
   if (team == NULL || team-nthreads == 1)
 return;
 
+  /* There is an implicit flush on entry to an ordered region. */
+  __atomic_thread_fence (MEMMODEL_RELEASE);
+
   /* ??? I believe it to be safe to access this data without taking the
  ws-lock.  The only presumed race condition is with the previous
  thread on the queue incrementing ordered_cur such that it points
Index: critical.c
===
--- critical.c  (revision 181770)
+++ critical.c  (working copy)
@@ -33,6 +33,8 @@ static gomp_mutex_t default_lock;
 void
 GOMP_critical_start (void)
 {
+  /* There is an implicit flush on entry to a critical region. */
+  __atomic_thread_fence (MEMMODEL_RELEASE);
   gomp_mutex_lock (default_lock);
 }
 

-- 
Alan Modra
Australia Development Lab, IBM


Re: Use atomics in libgomp mutex

2011-11-29 Thread Richard Henderson
On 11/29/2011 04:27 AM, Alan Modra wrote:
 Did you see my comment about the OpenMP spec requirement that critical,
 ordered and parallel regions have a flush on entry and on exit?  So
 should I commit a version with

No, I missed that.  Too many branches to this thread.  ;-)

   /* FIXME: This should just be MEMMODEL_ACQUIRE, MEMMODEL_RELAXED
  but GOMP_critical_start and other functions rely on the lock
  acquisition doing a flush.  See OpenMP API version 3.1 section
  2.8.6 flush Construct.  */
   if (__builtin_expect (!__atomic_compare_exchange_n (mutex, oldval, 1, 
 false,
 MEMMODEL_ACQ_REL,
 MEMMODEL_ACQUIRE),
   0))

No, I think we should simply put other barriers elsewhere.

 +  /* There is an implicit flush on entry to an ordered region. */
 +  __atomic_thread_fence (MEMMODEL_RELEASE);
 +
/* ??? I believe it to be safe to access this data without taking the
   ws-lock.  The only presumed race condition is with the previous
   thread on the queue incrementing ordered_cur such that it points

How about MEMMODEL_ACQ_REL, after the comment about the lock, which would have 
done the flush?  Given that we don't have a lock, and no ACQ barrier, perhaps 
the full barrier makes more sense?

 +  /* There is an implicit flush on entry to a critical region. */
 +  __atomic_thread_fence (MEMMODEL_RELEASE);
gomp_mutex_lock (default_lock);

And, yes, this together with the ACQUIRE barrier inside the lock makes a full 
barrier.  (Which also reminds me that for gcc 4.8 we should expose these 
barriers at both gimple and rtl levels and optimize them.  We'll get two 
sequential lwsync-like insns here on any target where the barrier isn't 
included in the lock insn directly.)

As for parallels, I agree the barrier implementation should flush all data.

And I've found the menu for next week: 
http://www.masterchef.com.au/fried-brains-with-bacon-crumble-and-apple-vinaigrette.htm


r~


Re: Use atomics in libgomp mutex

2011-11-29 Thread Alan Modra
On Tue, Nov 29, 2011 at 08:37:07AM -0800, Richard Henderson wrote:
 How about MEMMODEL_ACQ_REL

Right.  Committed with that change revision 181832.

-- 
Alan Modra
Australia Development Lab, IBM


Re: Use atomics in libgomp mutex

2011-11-28 Thread Jakub Jelinek
On Mon, Nov 28, 2011 at 10:37:09AM +1030, Alan Modra wrote:
 On Mon, Nov 28, 2011 at 09:27:20AM +1030, Alan Modra wrote:
  This is the mutex part.  Depends on
  http://gcc.gnu.org/ml/gcc-patches/2011-11/msg02454.html for MEMMODEL_*
  values.
 
 Arrgh, I posted the wrong patch.  I know it needs work elsewhere in
 libgomp to comply with the OMP spec, which says of flush:
 
 14 A flush region without a list is implied at the following locations:
 15 • During a barrier region.
 16 • At entry to and exit from parallel, critical, and ordered regions.
 [snip]
 
 Yet COMP_critical_start simply calls gomp_mutex_lock, relying on the
 lock acquisition to do the flush.  This won't necessarily happen with
 MEMMODEL_ACQUIRE.
 
 This bit in gomp_mutex_lock
  +  __atomic_compare_exchange_4 (mutex, oldval, 1, false,
  +  MEMMODEL_ACQUIRE, MEMMODEL_RELAXED);
 should be
   __atomic_compare_exchange_4 (mutex, oldval, 1, false,
  MEMMODEL_ACQ_REL, MEMMODEL_ACQ_REL);
 
 and once you do that it's hard to justify the patch for stage3 as
 fixing a bug (unnecessary sync instructions emitted by __sync
 builtins).

If it is just GOMP_critical_start, couldn't it use an extra barrier
before resp. after it calls mutex_lock?

Jakub


Re: Use atomics in libgomp mutex

2011-11-28 Thread Alan Modra
On Mon, Nov 28, 2011 at 09:15:02AM +0100, Jakub Jelinek wrote:
 On Mon, Nov 28, 2011 at 10:37:09AM +1030, Alan Modra wrote:
  14 A flush region without a list is implied at the following locations:
  15 • During a barrier region.
  16 • At entry to and exit from parallel, critical, and ordered regions.
  [snip]

 If it is just GOMP_critical_start, couldn't it use an extra barrier
 before resp. after it calls mutex_lock?

I believe gomp_ordered_sync needs a barrier, and I'm unsure about
GOMP_parallel_start but I think one is needed there too.  Yes, putting
the barriers where they are actually needed is the best solution, but
I'm far from an OpenMP expert so am unsure whether these are the only
places requiring a barrier.

-- 
Alan Modra
Australia Development Lab, IBM


Re: Use atomics in libgomp mutex

2011-11-28 Thread Richard Henderson
On 11/27/2011 02:57 PM, Alan Modra wrote:
 This is the mutex part.  Depends on
 http://gcc.gnu.org/ml/gcc-patches/2011-11/msg02454.html for MEMMODEL_*
 values.
 
   * config/linux/mutex.h: Use atomic rather than sync builtins.
   * config/linux/mutex.c: Likewise.  Comment.  Use -1 for waiting state.
   * config/linux/omp-lock.h: Comment fix.
   * config/linux/arm/mutex.h: Delete.
   * config/linux/powerpc/mutex.h: Delete.
   * config/linux/ia64/mutex.h: Delete.
   * config/linux/mips/mutex.h: Delete.

Looks good modulo _4/_n and using the success return of 
__atomic_compare_exchange.


r~


Use atomics in libgomp mutex

2011-11-27 Thread Alan Modra
This is the mutex part.  Depends on
http://gcc.gnu.org/ml/gcc-patches/2011-11/msg02454.html for MEMMODEL_*
values.

* config/linux/mutex.h: Use atomic rather than sync builtins.
* config/linux/mutex.c: Likewise.  Comment.  Use -1 for waiting state.
* config/linux/omp-lock.h: Comment fix.
* config/linux/arm/mutex.h: Delete.
* config/linux/powerpc/mutex.h: Delete.
* config/linux/ia64/mutex.h: Delete.
* config/linux/mips/mutex.h: Delete.

Index: libgomp/config/linux/mutex.h
===
--- libgomp/config/linux/mutex.h(revision 181718)
+++ libgomp/config/linux/mutex.h(working copy)
@@ -33,39 +33,35 @@ typedef int gomp_mutex_t;
 
 #define GOMP_MUTEX_INIT_0 1
 
-static inline void gomp_mutex_init (gomp_mutex_t *mutex)
+extern void gomp_mutex_lock_slow (gomp_mutex_t *mutex, int);
+extern void gomp_mutex_unlock_slow (gomp_mutex_t *mutex);
+
+static inline void
+gomp_mutex_init (gomp_mutex_t *mutex)
 {
   *mutex = 0;
 }
 
-extern void gomp_mutex_lock_slow (gomp_mutex_t *mutex, int);
-static inline void gomp_mutex_lock (gomp_mutex_t *mutex)
+static inline void
+gomp_mutex_destroy (gomp_mutex_t *mutex)
 {
-  int oldval = __sync_val_compare_and_swap (mutex, 0, 1);
-  if (__builtin_expect (oldval, 0))
-gomp_mutex_lock_slow (mutex, oldval);
 }
 
-extern void gomp_mutex_unlock_slow (gomp_mutex_t *mutex);
-static inline void gomp_mutex_unlock (gomp_mutex_t *mutex)
+static inline void
+gomp_mutex_lock (gomp_mutex_t *mutex)
 {
-  /* Warning: By definition __sync_lock_test_and_set() does not have
- proper memory barrier semantics for a mutex unlock operation.
- However, this default implementation is written assuming that it
- does, which is true for some targets.
-
- Targets that require additional memory barriers before
- __sync_lock_test_and_set to achieve the release semantics of
- mutex unlock, are encouraged to include
- config/linux/ia64/mutex.h in a target specific mutex.h instead
- of using this file.  */
-  int val = __sync_lock_test_and_set (mutex, 0);
-  if (__builtin_expect (val  1, 0))
-gomp_mutex_unlock_slow (mutex);
+  int oldval = 0;
+  __atomic_compare_exchange_4 (mutex, oldval, 1, false,
+  MEMMODEL_ACQUIRE, MEMMODEL_RELAXED);
+  if (__builtin_expect (oldval != 0, 0))
+gomp_mutex_lock_slow (mutex, oldval);
 }
 
-static inline void gomp_mutex_destroy (gomp_mutex_t *mutex)
+static inline void
+gomp_mutex_unlock (gomp_mutex_t *mutex)
 {
+  int wait = __atomic_exchange_4 (mutex, 0, MEMMODEL_RELEASE);
+  if (__builtin_expect (wait  0, 0))
+gomp_mutex_unlock_slow (mutex);
 }
-
 #endif /* GOMP_MUTEX_H */
Index: libgomp/config/linux/mutex.c
===
--- libgomp/config/linux/mutex.c(revision 181718)
+++ libgomp/config/linux/mutex.c(working copy)
@@ -34,25 +34,33 @@ long int gomp_futex_wait = FUTEX_WAIT | 
 void
 gomp_mutex_lock_slow (gomp_mutex_t *mutex, int oldval)
 {
+  /* First loop spins a while.  */
   while (oldval == 1)
 {
   if (do_spin (mutex, 1))
{
- oldval = __sync_lock_test_and_set (mutex, 2);
+ /* Spin timeout, nothing changed.  Set waiting flag.  */
+ oldval = __atomic_exchange_4 (mutex, -1, MEMMODEL_ACQUIRE);
  if (oldval == 0)
return;
- futex_wait (mutex, 2);
+ futex_wait (mutex, -1);
  break;
}
   else
{
- oldval = __sync_val_compare_and_swap (mutex, 0, 1);
+ /* Something changed.  If now unlocked, we're good to go.  */
+ oldval = 0;
+ __atomic_compare_exchange_4 (mutex, oldval, 1, false,
+  MEMMODEL_ACQUIRE, MEMMODEL_RELAXED);
  if (oldval == 0)
return;
}
 }
-  while ((oldval = __sync_lock_test_and_set (mutex, 2)))
-do_wait (mutex, 2);
+
+  /* Second loop waits until mutex is unlocked.  We always exit this
+ loop with wait flag set, so next unlock will awaken a thread.  */
+  while ((oldval = __atomic_exchange_4 (mutex, -1, MEMMODEL_ACQUIRE)))
+do_wait (mutex, -1);
 }
 
 void
Index: libgomp/config/linux/omp-lock.h
===
--- libgomp/config/linux/omp-lock.h (revision 181718)
+++ libgomp/config/linux/omp-lock.h (working copy)
@@ -3,8 +3,8 @@
structures without polluting the namespace.
 
When using the Linux futex primitive, non-recursive locks require
-   only one int.  Recursive locks require we identify the owning task
-   and so require one int and a pointer.  */
+   one int.  Recursive locks require we identify the owning task
+   and so require in addition one int and a pointer.  */
 
 typedef int omp_lock_t;
 typedef struct { int lock, count; void *owner; } omp_nest_lock_t;
Index: libgomp/config/linux/arm/mutex.h

Re: Use atomics in libgomp mutex

2011-11-27 Thread Alan Modra
On Mon, Nov 28, 2011 at 09:27:20AM +1030, Alan Modra wrote:
 This is the mutex part.  Depends on
 http://gcc.gnu.org/ml/gcc-patches/2011-11/msg02454.html for MEMMODEL_*
 values.

Arrgh, I posted the wrong patch.  I know it needs work elsewhere in
libgomp to comply with the OMP spec, which says of flush:

14 A flush region without a list is implied at the following locations:
15 • During a barrier region.
16 • At entry to and exit from parallel, critical, and ordered regions.
[snip]

Yet COMP_critical_start simply calls gomp_mutex_lock, relying on the
lock acquisition to do the flush.  This won't necessarily happen with
MEMMODEL_ACQUIRE.

This bit in gomp_mutex_lock
 +  __atomic_compare_exchange_4 (mutex, oldval, 1, false,
 +MEMMODEL_ACQUIRE, MEMMODEL_RELAXED);
should be
  __atomic_compare_exchange_4 (mutex, oldval, 1, false,
   MEMMODEL_ACQ_REL, MEMMODEL_ACQ_REL);

and once you do that it's hard to justify the patch for stage3 as
fixing a bug (unnecessary sync instructions emitted by __sync
builtins).

-- 
Alan Modra
Australia Development Lab, IBM