On 05/31/2011 06:38 AM, Jakub Jelinek wrote:
Aldy was just too excited about working on memory model I think :-)
I've been looking at this, and I propose we go this way:
http://gcc.gnu.org/wiki/Atomic/GCCMM/CodeGen
Please feel free to criticize, comment on, or ask for
clarification. I usually miss something I meant to get across.
I think the addition of new __sync_* builtins for the different models
is preferable and would be generally more usable even for other users than
C++ atomics. On some targets any atomic insn will act as a full barrier,
while on others it could generate different insns or code sequences that
way. For OpenMP atomics having a none (in addition to full/acq/rel)
would be useful, I think #pragma omp atomic doesn't impose any ordering
on memory accesses other than the memory being atomically
read/written/changed. Haven't read the C++0x standard in detail why
it has 6 memory order modes instead of just 4, but if really 6 are needed
(even for 4 probably), having new builtins with just one constant extra
argument which says the memory ordering mode would be best.
I'm not sure if you are agreeing or not, or how much :-)
There are still only the basics: relaxed, consume, release/acquire, and
seq-cst, so there are 4 modes. C++ gives you two more by separating
release and acquire for loads and stores: loads use 'acquire' mode,
stores use 'release'. I guess it allows slightly finer control
over instructions that can be loads and/or stores. It looks like the
optimal powerpc sequence for cmpxchg is slightly more efficient when it's
just an acquire or just a release rather than an acquire/release, for
instance (and all 3 sequences are slightly different).
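
As a rough illustration of the load/store split described above, here is a
minimal C++ sketch using std::atomic. The names ready, payload, producer,
consumer and claim are hypothetical and exist only for this example; it is
not taken from the wiki page or from GCC.

```cpp
#include <atomic>

// Hypothetical names, for illustration only.
std::atomic<int> ready{0};
int payload = 0;

// A store can only be relaxed, release, or seq_cst.
void producer() {
    payload = 42;                                // ordinary store
    ready.store(1, std::memory_order_release);   // publishes payload
}

// A load can only be relaxed, consume, acquire, or seq_cst.
int consumer() {
    while (ready.load(std::memory_order_acquire) == 0) {
        // spin until the release store becomes visible
    }
    return payload;  // ordered after the acquire load
}

// A compare-exchange is both a load and a store, so it may also be acq_rel.
bool claim(std::atomic<int>& flag) {
    int expected = 0;
    return flag.compare_exchange_strong(expected, 1,
                                        std::memory_order_acq_rel,
                                        std::memory_order_acquire);
}
```

The six C++ values are memory_order_relaxed, consume, acquire, release,
acq_rel and seq_cst; only read-modify-write operations such as the
compare-exchange accept all of them.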
The table is more or less complete, i.e., a store can't have an
'acquire' mode, and I presume that a consumer which doesn't break
release-acquire down into component parts would use that 'release'
version of the store as 'release/acquire' mode.
I presume a single builtin with a parameter is the most efficient way to
build them, but that's just an implementation detail. Presumably you have
each builtin in the table with each of those possible modes as a valid
parameter. The one thing I would care about is that I would like to see
the relaxed version be 'just an insn' rather than a builtin, if that's
possible... My understanding is that relaxed (as far as C++ goes) has no
synchronization at all, so you can treat it like a normal
operation as far as optimization goes. That seems to be the same for
OpenMP. It's just that it's an atomic operation. So it would be preferable
if we can avoid a builtin in the optimizers for that. That's why I left it
out of the table. If all the atomic operations are already builtins, well,
then I guess it doesn't matter :-P
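
To make the "atomic but no synchronization" point concrete, here is a small
sketch at the C++ source level. The counter and function names are
hypothetical, and this shows only the language-level semantics, not how GCC
represents the operation internally.

```cpp
#include <atomic>

// Hypothetical counter, for illustration only.
std::atomic<long> hits{0};

// Relaxed: the increment itself is indivisible, but it imposes no ordering
// on the loads and stores around it.
long record_hit() {
    return hits.fetch_add(1, std::memory_order_relaxed);  // returns old value
}

// Reading the counter back is likewise just an atomic load with no ordering.
long total_hits() {
    return hits.load(std::memory_order_relaxed);
}
```

Calls to record_hit() from several threads never lose an increment, yet
nothing here orders other memory accesses relative to the counter.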
It would be nice to say something like emit_atomic_fetch_add
(memory_order) and, if it's relaxed, emit the atomic fetch_add insn (or
builtin if that's what it is), and if it's something else, emit the
appropriate builtin. That would make bits/libstdc++v2/atomic_2.h even
easier too.
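
A toy model of that dispatch, purely to show the shape of the idea. The
Emitted enum and this emit_atomic_fetch_add are hypothetical stand-ins
written for this example; they are not GCC's real expander interface.

```cpp
#include <atomic>

// Hypothetical description of what the expander would produce.
enum class Emitted {
    PlainAtomicInsn,  // just the atomic fetch_add instruction, no barriers
    OrderedBuiltin    // the builtin carrying the requested memory order
};

// Assumption: relaxed needs no ordering, so it can take the plain-insn path;
// every other mode goes through the builtin for that mode.
Emitted emit_atomic_fetch_add(std::memory_order order) {
    if (order == std::memory_order_relaxed) {
        return Emitted::PlainAtomicInsn;
    }
    return Emitted::OrderedBuiltin;
}
```

With a single entry point like this, a library header would only have to
forward the memory order it was given instead of choosing a sequence itself.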
I think maybe we are more or less saying the same thing? :-)
Andrew