On Mon, Nov 28, 2011 at 04:00:10PM -0800, Richard Henderson wrote:
> On 11/27/2011 02:53 PM, Alan Modra wrote:
> > +enum memmodel
> > +{
> > +  MEMMODEL_RELAXED = 0,
> > +  MEMMODEL_CONSUME = 1,
> > +  MEMMODEL_ACQUIRE = 2,
> > +  MEMMODEL_RELEASE = 3,
> > +  MEMMODEL_ACQ_REL = 4,
> > +  MEMMODEL_SEQ_CST = 5,
> > +  MEMMODEL_LAST = 6
> > +};
>
> This should probably go to libgomp.h.
I wondered about that.

> >  /* This is a Linux specific implementation of a semaphore synchronization
> >     mechanism for libgomp.  This type is private to the library.  This
> > +   counting semaphore implementation uses atomic instructions and the
> > +   futex syscall, and a single 32-bit int to store semaphore state.
> > +   The low 31 bits are the count, the top bit is a flag set when some
> > +   threads may be waiting.  */
>
> I think we'll get better code generation on a number of targets if we make
> the low bit the waiting bit, and the higher bits the count.  This way we do
> (count & 1) and (count + 2) instead of larger constants needing to be
> generated.  Not changed below...

Funny, that's the way I wrote this code at first, then went for the
wait bit as the sign bit because you can test > 0 in one place where
you want "not waiting and count non-zero".

> > +static inline void
> > +gomp_sem_wait (gomp_sem_t *sem)
> >  {
> > +  int count = *sem;
> > +
> > +  while ((count & 0x7fffffff) != 0)
> > +    {
> > +      int oldval = count;
> > +      __atomic_compare_exchange_4 (sem, &oldval, count - 1,
> > +				   false, MEMMODEL_ACQUIRE, MEMMODEL_RELAXED);
> > +      if (__builtin_expect (oldval == count, 1))
> > +	return;
> > +      count = oldval;
> > +    }
>
> I'd really prefer not to hard-code any sizes at all.  Let's change the
> explicit _4 to _n everywhere.

OK.

> The loop above doesn't actually make sense to me.  If the compare-and-swap
> succeeds, then oldval == count - 1.  Which doesn't seem to be what you're
> testing at all.

Huh?  If it succeeds, oldval == count and we return.

> Perhaps this entire function is better written as
>
>   {
>     int count = *sem;
>     do
>       {
>         if (__builtin_expect (count & 0x80000000u, 0))
>           {
>             gomp_sem_wait_slow (sem, count);
>             return;
>           }
>       }
>     while (!__atomic_compare_exchange_n (sem, &count, count - 1, true,
>                                          MEMMODEL_ACQUIRE, MEMMODEL_RELAXED));
>   }

No, we don't need the slow path if we have *sem & 0x7fffffff non-zero.
> > +gomp_sem_post (gomp_sem_t *sem)
> >  {
> > +  int count = *sem;
> > +
> > +  while (1)
> > +    {
> > +      int oldval = count;
> > +      __atomic_compare_exchange_4 (sem, &oldval, (count + 1) & 0x7fffffff,
> > +				   false, MEMMODEL_RELEASE, MEMMODEL_RELAXED);
> > +      if (__builtin_expect (oldval == count, 1))
> > +	break;
>
> Again, this equality doesn't make sense.  Further, you aren't testing for the
> high bit set inside the loop, nor are you preserving the high bit.

See above about the equality.  We don't want to preserve the wait bit
here.  Otherwise we'd never go back to the uncontended state, which
answers

> ...  All that said, I don't see how we can ever clear the wait bit
> once it's set?

this question.

> Are we better off splitting the 32-bit value into two 16-bit fields for
> value+waiters?  Or similarly splitting a 64-bit value?  That way we can at
> least update both fields atomically, and we ought never lose a waiter.

Other splits of the field are certainly possible, but of course they
restrict the max count, and you'd need the fancy futex op_wait.  Some
targets don't have 64-bit atomics, so we can't really go that way.
Yes, if I did keep track of the number of waiters we'd save one
unnecessary futex_wake call, but I'm quite confident no waiters will
be lost just using a flag.

--
Alan Modra
Australia Development Lab, IBM