On Mon, Nov 28, 2011 at 04:00:10PM -0800, Richard Henderson wrote:
> On 11/27/2011 02:53 PM, Alan Modra wrote:
> > +enum memmodel
> > +{
> > +  MEMMODEL_RELAXED = 0,
> > +  MEMMODEL_CONSUME = 1,
> > +  MEMMODEL_ACQUIRE = 2,
> > +  MEMMODEL_RELEASE = 3,
> > +  MEMMODEL_ACQ_REL = 4,
> > +  MEMMODEL_SEQ_CST = 5,
> > +  MEMMODEL_LAST = 6
> > +};
>
> This should probably go to libgomp.h.
I wondered about that.

> >  /* This is a Linux specific implementation of a semaphore synchronization
> >     mechanism for libgomp.  This type is private to the library.  This
> > +   counting semaphore implementation uses atomic instructions and the
> > +   futex syscall, and a single 32-bit int to store semaphore state.
> > +   The low 31 bits are the count, the top bit is a flag set when some
> > +   threads may be waiting.  */
>
> I think we'll get better code generation on a number of targets if we make
> the low bit the waiting bit, and the higher bits the count.  This way we do
> (count & 1) and (count + 2) instead of larger constants needing to be
> generated.  Not changed below...

Funny, that's the way I wrote this code at first, then went for the
wait bit as the sign bit because you can test > 0 in one place where
you want "not waiting and count non-zero".

> > +static inline void
> > +gomp_sem_wait (gomp_sem_t *sem)
> >  {
> > +  int count = *sem;
> > +
> > +  while ((count & 0x7fffffff) != 0)
> > +    {
> > +      int oldval = count;
> > +      __atomic_compare_exchange_4 (sem, &oldval, count - 1,
> > +				   false, MEMMODEL_ACQUIRE, MEMMODEL_RELAXED);
> > +      if (__builtin_expect (oldval == count, 1))
> > +	return;
> > +      count = oldval;
> > +    }
>
> I'd really prefer not to hard-code any sizes at all.  Let's change the
> explicit _4 to _n everywhere.

OK.

> The loop above doesn't actually make sense to me.  If the compare-and-swap
> succeeds, then oldval == count - 1.  Which doesn't seem to be what you're
> testing at all.

Huh?  If it succeeds, oldval == count and we return.

> Perhaps this entire function is better written as
>
>   {
>     int count = *sem;
>     do
>       {
>         if (__builtin_expect (count & 0x80000000u, 0))
>           {
>             gomp_sem_wait_slow (sem, count);
>             return;
>           }
>       }
>     while (!__atomic_compare_exchange_n (sem, &count, count - 1, true,
>                                          MEMMODEL_ACQUIRE, MEMMODEL_RELAXED));
>   }

No, we don't need the slow path if we have *sem & 0x7fffffff non-zero.
> > +gomp_sem_post (gomp_sem_t *sem)
> >  {
> > +  int count = *sem;
> > +
> > +  while (1)
> > +    {
> > +      int oldval = count;
> > +      __atomic_compare_exchange_4 (sem, &oldval, (count + 1) & 0x7fffffff,
> > +				   false, MEMMODEL_RELEASE, MEMMODEL_RELAXED);
> > +      if (__builtin_expect (oldval == count, 1))
> > +	break;
>
> Again, this equality doesn't make sense.  Further, you aren't testing for the
> high bit set inside the loop, nor are you preserving the high bit.

See above about the equality.  We don't want to preserve the wait bit
here.  Otherwise we'd never go back to the uncontended state, which
answers

> ...  All that said, I don't see how we can ever clear the wait bit
> once it's set?

this question.

> Are we better off splitting the 32-bit value into two 16-bit fields for
> value+waiters?  Or similarly splitting a 64-bit value?  That way we can at
> least update both fields atomically, and we ought never lose a waiter.

Other splits of the field are certainly possible, but of course they
restrict the max count, and you'd need the fancy futex op_wait.  Some
targets don't have 64-bit atomics, so we can't really go that way.
Yes, if I did keep track of the number of waiters we'd save one
unnecessary futex_wake call, but I'm quite confident no waiters will
be lost just using a flag.

--
Alan Modra
Australia Development Lab, IBM