On 05/31/2011 06:38 AM, Jakub Jelinek wrote:

Aldy was just too excited about working on memory model I think :-)

I've been looking at this, and I propose we go this way:

http://gcc.gnu.org/wiki/Atomic/GCCMM/CodeGen

Please feel free to criticize, comment on, or ask for
clarification.  I usually miss something I meant to get across.

I think the addition of new __sync_* builtins for the different models
is preferable and would be generally more usable, even for users other
than C++ atomics. On some targets any atomic insn will act as a full
barrier, while on others each model could generate different insns or
code sequences.  For OpenMP atomics, having a "none" mode (in addition
to full/acq/rel) would be useful; I think #pragma omp atomic doesn't
impose any ordering on memory accesses other than on the memory being
atomically read/written/changed.  I haven't read the C++0x standard in
enough detail to know why it has 6 memory order modes instead of just 4,
but if 6 are really needed (or even just 4), having new builtins with
just one extra constant argument that specifies the memory ordering mode
would be best.
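For illustration, the "one builtin per operation plus one constant memory-order argument" shape is what GCC 4.7 and later (and Clang) expose as the __atomic_* builtins, with the six orders spelled __ATOMIC_RELAXED through __ATOMIC_SEQ_CST. The sketch below assumes one of those compilers; counter, bump_relaxed, bump_acq_rel, bump_seq_cst, publish and observe are hypothetical names chosen only for the example.

```c
/* Sketch only: assumes a compiler providing the __atomic_* builtins
   (GCC 4.7+ or Clang).  All names below are hypothetical. */
static int counter;

/* Same builtin, same operation; only the ordering argument changes.
   __atomic_fetch_add returns the value held before the add. */
int bump_relaxed(void) { return __atomic_fetch_add(&counter, 1, __ATOMIC_RELAXED); }
int bump_acq_rel(void) { return __atomic_fetch_add(&counter, 1, __ATOMIC_ACQ_REL); }
int bump_seq_cst(void) { return __atomic_fetch_add(&counter, 1, __ATOMIC_SEQ_CST); }

/* Stores and loads take the order that makes sense for them. */
void publish(int v) { __atomic_store_n(&counter, v, __ATOMIC_RELEASE); }
int observe(void)   { return __atomic_load_n(&counter, __ATOMIC_ACQUIRE); }
```

How cheaply each order maps onto a given target is left to the backend; on a target where every atomic insn is already a full barrier, all three bump_* variants could reasonably produce the same code.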


I'm not sure if you are agreeing or not, or how much :-)

There are still only the basic modes of relaxed, consume, release/acquire, and seq-cst, so there are 4 modes. C++ gives you two more by separating release and acquire for loads and stores: loads use 'acquire' mode, stores use 'release'. I guess it allows for slightly finer control over instructions that can be loads and/or stores. For instance, it looks like the optimal PowerPC sequence for cmpxchg is slightly more efficient when it's just an acquire or just a release rather than an acquire/release (and all 3 sequences are slightly different).
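A small example of where the split matters: taking a lock only needs acquire semantics on the cmpxchg, and releasing it only needs a release store, so neither side has to pay for a full acquire/release. This is a minimal sketch assuming the GCC/Clang __atomic builtins; lock_word, try_lock and unlock are hypothetical names, and whether a given target (PowerPC or otherwise) actually emits a cheaper sequence is up to its backend.

```c
#include <stdbool.h>

/* Hypothetical lock word: 0 = free, 1 = held. */
static int lock_word;

/* Acquire-only cmpxchg: later accesses can't move above a successful lock.
   On failure nothing was acquired, so relaxed is enough there. */
bool try_lock(void)
{
    int expected = 0;
    return __atomic_compare_exchange_n(&lock_word, &expected, 1, false,
                                       __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
}

/* Release-only store: earlier accesses can't move below the unlock. */
void unlock(void)
{
    __atomic_store_n(&lock_word, 0, __ATOMIC_RELEASE);
}
```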

The table is more or less complete (a store can't have an 'acquire' mode, for instance), and I presume that a consumer which doesn't break release/acquire down into its component parts would treat the 'release' version of the store as 'release/acquire' mode.
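The two points above can be sketched as a pair of helpers: one encoding the C11/C++11 rule that a store may not be consume, acquire or acq_rel and a load may not be release or acq_rel, and one collapsing the six orders into the 4 basic modes for a consumer that treats acquire and release alike. Everything here (op_kind, basic_mode, order_valid, to_basic_mode) is hypothetical and only assumes the predefined __ATOMIC_* constants from GCC/Clang.

```c
#include <stdbool.h>

/* Hypothetical classification of atomic operations. */
enum op_kind { OP_LOAD, OP_STORE, OP_RMW };

/* Is ORDER legal for an operation of KIND?  Follows the C11/C++11 rules;
   read-modify-write operations accept all six orders. */
bool order_valid(enum op_kind kind, int order)
{
    switch (kind) {
    case OP_LOAD:
        return order != __ATOMIC_RELEASE && order != __ATOMIC_ACQ_REL;
    case OP_STORE:
        return order == __ATOMIC_RELAXED || order == __ATOMIC_RELEASE
            || order == __ATOMIC_SEQ_CST;
    case OP_RMW:
        return true;
    }
    return false;
}

/* Hypothetical 4-mode view: acquire, release and acq_rel all fold into
   one combined mode, so a 'release' store is handled as release/acquire. */
enum basic_mode { BASIC_RELAXED, BASIC_CONSUME, BASIC_ACQ_REL, BASIC_SEQ_CST };

enum basic_mode to_basic_mode(int order)
{
    switch (order) {
    case __ATOMIC_RELAXED: return BASIC_RELAXED;
    case __ATOMIC_CONSUME: return BASIC_CONSUME;
    case __ATOMIC_ACQUIRE:
    case __ATOMIC_RELEASE:
    case __ATOMIC_ACQ_REL: return BASIC_ACQ_REL;
    default:               return BASIC_SEQ_CST;
    }
}
```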

I presume a single builtin with a parameter is the most efficient way to build them, but that's just an implementation detail. Presumably each builtin in the table would accept each of its possible modes as a valid parameter. The one thing I care about is that I would like to see the relaxed version be 'just an insn' rather than a builtin, if that's possible. My understanding is that relaxed (as far as C++ is concerned) has no synchronization at all, so you can treat it like a normal operation as far as optimization goes. That seems to be the same for OpenMP; it's just an atomic operation. So it would be preferable if we could avoid having a builtin in the optimizers for that, which is why I left it out of the table. If all the atomic operations are already builtins, well, then I guess it doesn't matter :-P

It would be nice to say something like emit_atomic_fetch_add (memory_order): if it's relaxed, emit the atomic fetch_add insn (or builtin, if that's what it is), and if it's something else, emit the appropriate builtin. That would make bits/libstdc++v2/atomic_2.h even easier too.
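A library-level analogue of that dispatch, for a caller that only has the order as a run-time value, might look like the sketch below: switch once on the order so every builtin call sees a compile-time constant, with the relaxed case being nothing more than the atomic add itself. It assumes the GCC/Clang __atomic builtins; fetch_add_dispatch is a hypothetical name, and promoting consume to acquire is a conservative choice made for this sketch only.

```c
/* Hypothetical dispatcher: ORDER is a run-time value, so give each
   builtin call a constant order rather than passing ORDER through. */
int fetch_add_dispatch(int *p, int val, int order)
{
    switch (order) {
    case __ATOMIC_RELAXED:
        /* No ordering constraints: just the atomic read-modify-write. */
        return __atomic_fetch_add(p, val, __ATOMIC_RELAXED);
    case __ATOMIC_CONSUME:   /* conservatively promoted to acquire here */
    case __ATOMIC_ACQUIRE:
        return __atomic_fetch_add(p, val, __ATOMIC_ACQUIRE);
    case __ATOMIC_RELEASE:
        return __atomic_fetch_add(p, val, __ATOMIC_RELEASE);
    case __ATOMIC_ACQ_REL:
        return __atomic_fetch_add(p, val, __ATOMIC_ACQ_REL);
    default:                 /* seq_cst, or anything unrecognized */
        return __atomic_fetch_add(p, val, __ATOMIC_SEQ_CST);
    }
}
```

Switching up front keeps the constant-order requirement in one place; a compiler that cannot see a constant order may otherwise have to assume the strongest one.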

I think maybe we are more or less saying the same thing? :-)

Andrew
