Re: __sync_swap* with acq/rel/full memory barrier semantics

Andrew MacLeod Tue, 31 May 2011 06:12:28 -0700

On 05/31/2011 06:38 AM, Jakub Jelinek wrote:

Aldy was just too excited about working on memory model I think :-)

I've been looking at this, and I propose we go this way:

http://gcc.gnu.org/wiki/Atomic/GCCMM/CodeGen

Please feel free to criticize, comment on, or ask for
clarification.  I usually miss something I meant to get across.

I think the addition of new __sync_* builtins for the different models
is preferable and would be generally more usable even for other users than
C++ atomics. On some targets any atomic insn will act as a full barrier,
while on others it could generate different insns or code sequences that
way.  For OpenMP atomics having a none (in addition to full/acq/rel)
would be useful, I think #pragma omp atomic doesn't impose any ordering
on memory accesses other than the memory being atomically
read/written/changed.  Haven't read the C++0x standard in detail why
it has 6 memory order modes instead of just 4, but if really 6 are needed
(even for 4 probably), having new builtins with just one constant extra
argument which says the memory ordering mode would be best.

I'm not sure if you are agreeing or not, or how much :-)

There are still only the basics of relaxed, consume, release/acquire, and seq-cst, so there are 4 modes. C++ gives you two more by separating release and acquire for loads and stores, loads using 'acquire' mode, stores using 'release'. I guess it allows for slightly finer control over instructions that can be loads and/or stores. It looks like the optimal powerpc sequence for cmpxchg is slightly more efficient when it's just an acquire or just a release rather than an acquire/release, for instance (and all 3 sequences are slightly different).
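To make the three cmpxchg flavours concrete, here is a rough sketch using the C++11 std::atomic interface. The helper names (try_lock, unlock_cas, bump_if) and the 0/1 flag convention are made up for illustration; nothing here is taken from the wiki page, and which instruction sequence a target actually emits for each mode is not shown.

```cpp
#include <atomic>

// Hypothetical spin flag: 0 = free, 1 = held.

// Taking the flag only needs acquire ordering on success.
inline bool try_lock(std::atomic<int>& flag) {
    int expected = 0;
    return flag.compare_exchange_strong(expected, 1,
                                        std::memory_order_acquire,
                                        std::memory_order_relaxed);
}

// Handing the flag back only needs release ordering on success.
inline bool unlock_cas(std::atomic<int>& flag) {
    int expected = 1;
    return flag.compare_exchange_strong(expected, 0,
                                        std::memory_order_release,
                                        std::memory_order_relaxed);
}

// A read-modify-write that must both observe earlier writes and
// publish its own asks for the combined acquire/release mode.
inline bool bump_if(std::atomic<int>& value, int from, int to) {
    return value.compare_exchange_strong(from, to,
                                         std::memory_order_acq_rel,
                                         std::memory_order_acquire);
}
```

The same operation appears three times with three different orderings, which is the case where a target such as powerpc could pick a cheaper sequence for the acquire-only or release-only form.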

The table is more or less complete, i.e., a store can't have an 'acquire' mode, and I presume that a consumer which doesn't break release-acquire down into component parts would use that 'release' version of the store as 'release/acquire' mode.

I presume a single builtin with a parameter is the most efficient way to build them, but that's just an implementation detail. Presumably you have each builtin in the table with each of those possible modes as a valid parameter. The one thing I would care about is that I would like to see the relaxed version be 'just an insn' rather than a builtin, if that's possible... My understanding is that relaxed (as far as C++) has no synchronization at all, so you can treat it like a normal operation as far as optimization goes. That seems the same for OpenMP. It's just that it's an atomic operation. So it would be preferable if we can avoid a builtin in the optimizers for that. That's why I left it out of the table. If all the atomic operations are already builtins, well, then I guess it doesn't matter :-P
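As a small illustration of what 'relaxed' means at the source level, here is a sketch of a statistics counter in C++11. The function names record_hit and read_hits are hypothetical. The update is indivisible, but it requests no ordering with respect to any other memory access.

```cpp
#include <atomic>

// Atomic increment with no synchronization: other loads and stores
// are not ordered against it.
inline void record_hit(std::atomic<long>& counter) {
    counter.fetch_add(1, std::memory_order_relaxed);
}

// Relaxed read of the counter; it never sees a torn value, but it
// says nothing about the state of any other memory.
inline long read_hits(const std::atomic<long>& counter) {
    return counter.load(std::memory_order_relaxed);
}
```

This is the case where no barrier is wanted around the operation, only atomicity of the operation itself.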

It would be nice to say something like emit_atomic_fetch_add(memory_order) and if it's relaxed, emit the atomic fetch_add insn (or builtin if that's what it is), and if it's something else, emit the appropriate builtin. That would make bits/libstdc++v2/atomic_2.h even easier too.
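Roughly, the dispatch I have in mind looks like the toy model below. This is not GCC code: it only returns a label describing what would be emitted, and the label strings are invented for illustration. Only the name emit_atomic_fetch_add comes from the paragraph above.

```cpp
#include <atomic>
#include <string>

// Toy model of the decision: relaxed maps to the plain atomic insn,
// every other mode maps to a builtin carrying that mode.
// The returned strings are hypothetical labels, not real GCC names.
inline std::string emit_atomic_fetch_add(std::memory_order order) {
    switch (order) {
    case std::memory_order_relaxed:
        return "atomic fetch_add insn";
    case std::memory_order_consume:
        return "fetch_add builtin (consume)";
    case std::memory_order_acquire:
        return "fetch_add builtin (acquire)";
    case std::memory_order_release:
        return "fetch_add builtin (release)";
    case std::memory_order_acq_rel:
        return "fetch_add builtin (acq_rel)";
    case std::memory_order_seq_cst:
        return "fetch_add builtin (seq_cst)";
    }
    // Unknown value: fall back to the strongest ordering.
    return "fetch_add builtin (seq_cst)";
}
```

With something of this shape, the library header would only have to pass the memory order straight through.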


I think maybe we are more or less saying the same thing? :-)

Andrew

Re: __sync_swap* with acq/rel/full memory barrier semantics
