On 16/11/15 10:18, Claudiu Zissulescu wrote:
+/* Expand code to perform a 8 or 16-bit compare and swap by doing
+   32-bit compare and swap on the word containing the byte or
+   half-word.  The difference between a weak and a strong CAS is that
+   the weak version may simply fail.  The strong version relays on two

Typo: relays -> relies
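For reference, the technique the quoted comment describes can be sketched in portable C with GCC's __atomic builtins. This is a minimal sketch, not the patch's ARC expansion. It assumes a little-endian target and a naturally aligned 8- or 16-bit object inside an accessible 4-byte word. cas_subword_strong is a hypothetical name. It implements the strong variant; the retry loop is exactly what a weak variant may omit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Strong 8- or 16-bit compare-and-swap built from a 32-bit CAS on the
   aligned word containing the sub-word object.  Assumes little-endian
   byte order and a naturally aligned object.  */
static bool
cas_subword_strong (void *ptr, unsigned size, uint32_t expected,
                    uint32_t desired)
{
  uintptr_t addr = (uintptr_t) ptr;
  assert ((size == 1 || size == 2) && (addr & (size - 1)) == 0);

  uint32_t *word = (uint32_t *) (addr & ~(uintptr_t) 3);
  unsigned shift = (unsigned) (addr & 3) * 8;   /* little-endian only */
  uint32_t lane = size == 1 ? 0xffu : 0xffffu;
  uint32_t mask = lane << shift;
  expected &= lane;
  desired &= lane;

  uint32_t old = __atomic_load_n (word, __ATOMIC_RELAXED);
  for (;;)
    {
      /* The sub-word itself differs: a genuine CAS failure.  */
      if (((old & mask) >> shift) != expected)
        return false;
      uint32_t repl = (old & ~mask) | (desired << shift);
      /* On failure, OLD is updated with the word's current contents.  */
      if (__atomic_compare_exchange_n (word, &old, repl, false,
                                       __ATOMIC_SEQ_CST, __ATOMIC_RELAXED))
        return true;
      /* The word changed.  If only the neighbouring bytes did, a weak
         CAS could simply report failure here; the strong version
         re-checks the sub-word and retries.  */
    }
}
```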

More importantly, your use of barriers makes no sense to me.
Memory models other than MEMMODEL_RELAXED impose two requirements
on the compiler:
- For systems without hardware memory coherency (e.g. multiple caches with software synchronisation), emit any instructions necessary to achieve coherency for those objects
  that the access / memory model requires.
- Prevent code movement by compiler optimizations. This is where, independently of the hardware, the memory model makes / could make a difference in how many restrictions are placed on the
  optimizers.
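At the source level, these two requirements are what the __atomic builtins have to honour. Below is a minimal sketch of the usual release/acquire message-passing pattern. The names writer, reader, payload and ready are illustrative only. In real use, the two functions would run on different threads; here they are plain functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Shared state for the illustrative writer/reader pair.  */
static int payload;
static int ready;

/* Release store: the write to PAYLOAD must not be moved after the store
   to READY, neither by the compiler nor (on weakly ordered or
   software-coherent hardware) by the machine.  */
static void
writer (int value)
{
  payload = value;
  __atomic_store_n (&ready, 1, __ATOMIC_RELEASE);
}

/* Acquire load: once READY is seen as 1, the read of PAYLOAD must not be
   hoisted above it, so it observes the value stored before the release.  */
static bool
reader (int *out)
{
  if (!__atomic_load_n (&ready, __ATOMIC_ACQUIRE))
    return false;
  *out = payload;
  return true;
}
```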

Because of PR middle-end/59448, we currently promote MEMMODEL_CONSUME to MEMMODEL_ACQUIRE, which is a shame, really, because otherwise we could just rely on ordinary dependencies to prevent
reordering after a cache flush/invalidation at the atomic operation.
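For illustration, here is a sketch of the pointer-publication pattern that MEMMODEL_CONSUME is meant to make cheap. The reader's access through the loaded pointer carries an address dependency on the atomic load. The names msg, publish and read_published are hypothetical. With current GCC, the consume load is compiled as an acquire load.

```c
#include <assert.h>

struct msg { int value; };

static struct msg storage;
static struct msg *published;   /* NULL until publish () runs */

static void
publish (int value)
{
  storage.value = value;
  __atomic_store_n (&published, &storage, __ATOMIC_RELEASE);
}

/* The load of P->value depends on the address returned by the consume
   load, which is the ordering MEMMODEL_CONSUME relies on.  GCC currently
   treats __ATOMIC_CONSUME as __ATOMIC_ACQUIRE, so the generated code may
   contain a stronger barrier than the dependency strictly needs.  */
static int
read_published (void)
{
  struct msg *p = __atomic_load_n (&published, __ATOMIC_CONSUME);
  return p ? p->value : -1;
}
```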

Now, assuming we have multiple cores with software-synchronized caches:

A MEMMODEL_SEQ / MEMMODEL_RELEASE operation requires a cache flush (unless you have a write-through cache in the first place) so that all values written into the local cache become visible in main memory. Also, any writes that are delayed by out-of-order execution or
a write buffer must be flushed to main memory.

A MEMMODEL_SEQ / MEMMODEL_ACQUIRE operation requires a cache invalidation (preceded by a cache flush to avoid losing data) so that values written to main memory by the releasing thread
will be seen by the current thread.
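To make the flush / invalidate requirements concrete, here is a toy simulation of two cores with private write-back caches and no hardware coherency. It models neither real hardware nor GCC code. The helpers load, store, cache_flush, cache_invalidate, release_store and acquire_load are hypothetical. The release helper flushes before publishing, and the acquire helper flushes and then invalidates before reading.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model: one cache "line" per memory word, two cores.  */
#define WORDS 4
#define CORES 2

static int main_memory[WORDS];

struct cache
{
  int data[WORDS];
  bool valid[WORDS];
  bool dirty[WORDS];
};
static struct cache caches[CORES];

static int
load (int core, int addr)
{
  struct cache *c = &caches[core];
  if (!c->valid[addr])          /* miss: fetch from main memory */
    {
      c->data[addr] = main_memory[addr];
      c->valid[addr] = true;
      c->dirty[addr] = false;
    }
  return c->data[addr];
}

static void
store (int core, int addr, int value)
{
  struct cache *c = &caches[core];
  c->data[addr] = value;        /* write-back: memory not updated yet */
  c->valid[addr] = true;
  c->dirty[addr] = true;
}

/* Write every dirty line back to main memory.  */
static void
cache_flush (int core)
{
  struct cache *c = &caches[core];
  for (int i = 0; i < WORDS; i++)
    if (c->valid[i] && c->dirty[i])
      {
        main_memory[i] = c->data[i];
        c->dirty[i] = false;
      }
}

/* Drop all lines so that later loads refetch from main memory.  */
static void
cache_invalidate (int core)
{
  memset (caches[core].valid, 0, sizeof caches[core].valid);
}

/* Release: make all earlier stores visible, then write the flag through.  */
static void
release_store (int core, int addr, int value)
{
  cache_flush (core);
  store (core, addr, value);
  cache_flush (core);
}

/* Acquire: flush first so no dirty data is lost, then invalidate so the
   flag and everything read after it come from main memory.  */
static int
acquire_load (int core, int addr)
{
  cache_flush (core);
  cache_invalidate (core);
  return load (core, addr);
}
```

In the scenario below, address 0 holds data and address 1 holds a flag. Core 1 keeps reading its stale cached copy of the data until it performs the acquire.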

The patterns that represent the hardware cache / synchronisation operations may also double as
memory barriers for the compiler.

If you don't need hardware cache / synchronization operations (either because you have hardware coherency, or you have only a single cache system for all cores / the only core in the system),
you still need memory barriers for the compiler.
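On such systems, a compiler barrier alone is enough. The following is a minimal GCC C sketch. It assumes a single core, for example communication between a thread and an interrupt or signal handler on that core. COMPILER_BARRIER, publish_single_core and read_single_core are hypothetical names. Both the empty asm with a "memory" clobber and __atomic_signal_fence constrain the compiler and typically emit no fence instruction.

```c
#include <assert.h>

static int data;
static int flag;

/* Compiler-only barrier: GCC may not move memory accesses across it or
   keep memory values cached in registers across it, but no hardware
   fence is emitted.  */
#define COMPILER_BARRIER() __asm__ __volatile__ ("" ::: "memory")

static void
publish_single_core (int value)
{
  data = value;
  COMPILER_BARRIER ();                      /* DATA store stays first */
  __atomic_store_n (&flag, 1, __ATOMIC_RELAXED);
}

static int
read_single_core (void)
{
  if (!__atomic_load_n (&flag, __ATOMIC_RELAXED))
    return -1;
  __atomic_signal_fence (__ATOMIC_ACQUIRE); /* DATA load stays after */
  return data;
}
```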

AFAICT, you use hardware synchronisation instructions for MEMMODEL_SEQ, and compiler memory barriers for all other memory models (except MEMMODEL_RELAXED). That makes no sense; either the platform
needs explicit instructions for memory coherency, or it doesn't.
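In other words, whether hardware operations are needed should be a property of the target that applies uniformly to every non-relaxed model. The model then only selects which of those operations to use. The following hypothetical sketch shows that decision. The enum and function names are illustrative and are not GCC's. Consume is treated as acquire, as GCC currently does.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for a memory-model enum (not GCC's definition).  */
enum memmodel_sketch
{
  MM_RELAXED, MM_CONSUME, MM_ACQUIRE, MM_RELEASE, MM_ACQ_REL, MM_SEQ_CST
};

enum
{
  NEED_COMPILER_BARRIER = 1,
  NEED_CACHE_FLUSH = 2,
  NEED_CACHE_INVALIDATE = 4
};

/* SW_COHERENT is a hypothetical target property: true if caches must be
   kept coherent by explicit instructions.  */
static unsigned
barrier_for (enum memmodel_sketch model, bool sw_coherent)
{
  if (model == MM_RELAXED)
    return 0;
  unsigned ops = NEED_COMPILER_BARRIER;   /* every non-relaxed model */
  if (!sw_coherent)
    return ops;

  bool releases = model == MM_RELEASE || model == MM_ACQ_REL
                  || model == MM_SEQ_CST;
  bool acquires = model == MM_CONSUME || model == MM_ACQUIRE
                  || model == MM_ACQ_REL || model == MM_SEQ_CST;
  if (releases)
    ops |= NEED_CACHE_FLUSH;
  if (acquires)   /* flush first so dirty data is not lost */
    ops |= NEED_CACHE_FLUSH | NEED_CACHE_INVALIDATE;
  return ops;
}
```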

On the other hand, your memory barriers are more restrictive than they need to be. To tell the compiler that it must not sink a write below a MEMMODEL_SEQ / MEMMODEL_RELEASE operation, it is sufficient to show a USE of an unspecified memory location. This is also true when you have a cache flush: it is sufficient to show the compiler that the cache flush may read anything. (Well, actually, for our purposes it'd be OK to treat thread-local variables, spill slots and variables that pass an escape analysis as independent.) The USE of the unspecified memory has to be tied to the atomic operation, of course. This could be done by making it part of the instruction pattern itself, or by having the atomic operation USE something (e.g. a fake hard register) that is 'set' by the memory barrier / sync / cache flush pattern.
