On 16/11/15 10:18, Claudiu Zissulescu wrote:
+/* Expand code to perform a 8 or 16-bit compare and swap by doing
+ 32-bit compare and swap on the word containing the byte or
+ half-word. The difference between a weak and a strong CAS is that
+ the weak version may simply fail. The strong version relays on two
Typo: relays -> relies
More importantly, your use of barriers makes no sense to me.
Memory models other than MEMMODEL_RELAXED impose two requirements
on the compiler:
- For systems without hardware memory coherency (e.g. multiple caches
  with software synchronisation), emit any instructions necessary to
  achieve coherency for those objects that the access / memory model
  requires.
- Prevent code movement by compiler optimizations. This is where,
  hardware-independently, the memory model makes / could make a
  difference in how many restrictions are placed on the optimizers.
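To illustrate the two requirements at the C source level, here is a
minimal sketch using the C11 fences. It is not taken from the patch; the
names payload, ready and the two functions are hypothetical.
atomic_signal_fence only restricts code movement by the compiler, whereas
atomic_thread_fence may additionally emit whatever instruction the target
needs:

```c
#include <stdatomic.h>

static int payload;       /* ordinary data (hypothetical name) */
static atomic_int ready;  /* flag that is set after the data is written */

/* Second requirement only: the compiler must not sink the store to
   payload below the fence.  Implementations typically emit no
   instruction for a signal fence.  */
void publish_compiler_only(int v)
{
    payload = v;
    atomic_signal_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Both requirements: same restriction on the compiler, plus whatever
   fence / sync instruction the target needs, if any.  */
void publish_with_fence(int v)
{
    payload = v;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}
```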
Because of PR middle-end/59448, we currently promote MEMMODEL_CONSUME to
MEMMODEL_ACQUIRE, which is a shame, really, because otherwise we could
just rely on ordinary dependencies to prevent reordering after a cache
flush/invalidation at the atomic operation.
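For reference, this is the kind of source-level code where a consume
load would be enough, sketched with C11 atomics (the struct and function
names are hypothetical). The read of n->value depends on the loaded
pointer, so an ordinary data dependency orders it after the load:

```c
#include <stdatomic.h>
#include <stddef.h>

struct node { int value; };

static _Atomic(struct node *) head;  /* hypothetical shared pointer */

/* Writer side: make the node visible with release semantics.  */
void publish_node(struct node *n)
{
    atomic_store_explicit(&head, n, memory_order_release);
}

/* Reader side: n->value carries a dependency on the consume load.
   With consume promoted to acquire, the compiler treats this load
   as an acquire operation instead.  */
int read_value(void)
{
    struct node *n = atomic_load_explicit(&head, memory_order_consume);
    return n != NULL ? n->value : -1;
}
```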
Now, assuming we have multiple cores with software-synchronized caches:
A MEMMODEL_SEQ_CST / MEMMODEL_RELEASE operation requires a cache flush
(unless you have a write-through cache in the first place), so that all
values that have been written into the local cache become visible in
main memory. Also, any writes that are delayed due to out-of-order
operation or a write buffer must be flushed to main memory.
A MEMMODEL_SEQ_CST / MEMMODEL_ACQUIRE operation requires a cache
invalidation, preceded by a cache flush to avoid losing data, so that
values written by the releasing thread to main memory will be seen by
the current thread.
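The release / acquire pairing described above corresponds to the usual
publish pattern in C11. A minimal sketch, with hypothetical names: the
release store is the point where the flush would happen on a
software-coherent system, and the acquire load is the point where the
invalidation would happen.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int data;          /* ordinary, non-atomic data */
static atomic_bool flag;  /* set once data is valid */

/* Releasing side: everything written before the store to flag must be
   visible to a thread that later observes flag == true.  */
void producer(int v)
{
    data = v;
    atomic_store_explicit(&flag, true, memory_order_release);
}

/* Acquiring side: reads after the load of flag must not be satisfied
   from stale copies that predate the release.  */
bool consumer(int *out)
{
    if (!atomic_load_explicit(&flag, memory_order_acquire))
        return false;
    *out = data;
    return true;
}
```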
The patterns that represent the hardware cache / synchronisation
operations may also double as memory barriers for the compiler.
If you don't need hardware cache / synchronization operations (either
because you have hardware coherency, or you have only a single cache
system for all cores / the only core in the system),
you still need memory barriers for the compiler.
AFAICT, you use a hardware synchronisation instruction for
MEMMODEL_SEQ_CST, and compiler memory barriers for all other memory
models (except MEMMODEL_RELAXED). That makes no sense; either the
platform needs explicit instructions for memory coherency, or it
doesn't.
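To make the point concrete, here is a small stand-alone sketch of a
consistent scheme. It is not GCC code: the enums, the target_needs_sync
parameter and the function names are all hypothetical. Whether a barrier
is needed depends on the memory model; what kind of barrier it is
depends only on the platform.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the memory models and barrier kinds.  */
enum model { RELAXED, CONSUME, ACQUIRE, RELEASE, ACQ_REL, SEQ_CST };
enum barrier { BARRIER_NONE, BARRIER_COMPILER, BARRIER_HARDWARE };

/* The kind of barrier is a property of the target, not of the model.  */
static enum barrier kind(bool needed, bool target_needs_sync)
{
    if (!needed)
        return BARRIER_NONE;
    return target_needs_sync ? BARRIER_HARDWARE : BARRIER_COMPILER;
}

/* Barrier before the atomic operation (release side).  */
enum barrier barrier_before(enum model m, bool target_needs_sync)
{
    return kind(m == RELEASE || m == ACQ_REL || m == SEQ_CST,
                target_needs_sync);
}

/* Barrier after the atomic operation (acquire side); CONSUME is
   treated like ACQUIRE here.  */
enum barrier barrier_after(enum model m, bool target_needs_sync)
{
    return kind(m == CONSUME || m == ACQUIRE || m == ACQ_REL
                || m == SEQ_CST, target_needs_sync);
}
```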
On the other hand, your memory barriers are more restrictive than they
need to be.
To tell the compiler that it must not sink a write below
MEMMODEL_SEQ_CST / MEMMODEL_RELEASE operations, it is sufficient to
display a USE of an unspecified memory location. This is also true when
you have a cache flush: it is sufficient to show the compiler that this
cache flush may read anything.
(Well, actually, for our purposes it'd be OK to make it so that
thread-local variables, spill slots and variables that satisfy an escape
analysis are considered independent.)
The USE of the unspecified memory has to be tied to the atomic
operation, of course. This could be done by making it part of the
instruction pattern itself, or by having the atomic operation USE
something (e.g. a fake hard register) that is 'set' by the memory
barrier / sync / cache flush pattern.
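A rough source-level analogy of the difference between a full barrier
and a mere USE, written with GNU C inline asm (supported by GCC and
Clang). The names buf, with_clobber and with_use are hypothetical, and
the analogy is narrower than the RTL case: the "m" input only tells the
compiler that one specific object is read, not unspecified memory.

```c
static struct { int word[4]; } buf;  /* hypothetical shared buffer */

/* Full compiler barrier: the asm is assumed to read and write all of
   memory, so buf must be stored before it and reloaded after it.  */
int with_clobber(void)
{
    buf.word[0] = 1;
    __asm__ __volatile__("" : : : "memory");
    return buf.word[0];
}

/* Weaker form: the asm is only shown to read buf.  The store may not
   sink below the asm, but the compiler is still free to reuse the
   value it already knows for the read that follows.  */
int with_use(void)
{
    buf.word[0] = 2;
    __asm__ __volatile__("" : : "m" (buf));
    return buf.word[0];
}
```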