Re: [PATCH] [ARC] Add support for atomic memory built-in.

Joern Wolfgang Rennecke Thu, 03 Dec 2015 17:09:43 -0800


On 16/11/15 10:18, Claudiu Zissulescu wrote:

+/* Expand code to perform a 8 or 16-bit compare and swap by doing
+   32-bit compare and swap on the word containing the byte or
+   half-word.  The difference between a weak and a strong CAS is that
+   the weak version may simply fail.  The strong version relays on two

Typo: relays -> relies
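
For readers following the thread, the technique the quoted comment describes can be sketched at the C level with GCC's `__atomic` builtins. This is only an illustration under stated assumptions, not the RTL expander from the patch: the helper name `cas_u8_via_u32` is hypothetical, and the byte is assumed to live inside a valid, 4-byte-aligned 32-bit object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not from the patch): strong 8-bit compare-and-swap
   built from a 32-bit CAS on the aligned word that contains the byte.
   Assumes the containing 32-bit word is a valid, 4-byte-aligned object.  */
static bool
cas_u8_via_u32 (uint8_t *ptr, uint8_t *expected, uint8_t desired)
{
  assert (ptr && expected);

  uintptr_t addr = (uintptr_t) ptr;
  uint32_t *word = (uint32_t *) (addr & ~(uintptr_t) 3);
  unsigned int byte = (unsigned int) (addr & 3);
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
  unsigned int shift = (3 - byte) * 8;
#else
  unsigned int shift = byte * 8;
#endif
  uint32_t mask = (uint32_t) 0xff << shift;
  uint32_t old = __atomic_load_n (word, __ATOMIC_SEQ_CST);

  for (;;)
    {
      uint8_t cur = (uint8_t) ((old & mask) >> shift);
      if (cur != *expected)
        {
          /* The byte itself differs: a genuine failure.  */
          *expected = cur;
          return false;
        }

      uint32_t repl = (old & ~mask) | ((uint32_t) desired << shift);
      /* A weak word-sized CAS; it can fail spuriously or because a
         neighbouring byte changed, so the strong version loops.  */
      if (__atomic_compare_exchange_n (word, &old, repl, true,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
        return true;
      /* 'old' now holds the current word contents; re-check and retry.  */
    }
}
```

The loop is what separates the strong form from the weak one: a failed word-sized CAS is retried unless the targeted byte itself no longer matches.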

More importantly, your use of barriers makes no sense to me.
Memory models other than MEMMODEL_RELAXED impose two requirements
on the compiler:

- For systems without hardware memory coherency (e.g. multiple caches
  with software synchronisation), emit any instructions necessary to
  achieve coherency for those objects that the access / memory model
  requires.

- Prevent code movement by compiler optimizations. This is where,
  hardware-independently, the memory model makes / could make a
  difference in how many restrictions are placed on the optimizers.

Because of PR middle-end/59448, we currently promote MEMMODEL_CONSUME
to MEMMODEL_ACQUIRE, which is a shame, really, because otherwise we
could just rely on ordinary dependencies to prevent reordering after a
cache flush/invalidation at the atomic operation.

Now, assuming we have multiple cores with software-synchronized caches:

A MEMMODEL_SEQ_CST / MEMMODEL_RELEASE operation requires a cache flush
(unless you have a write-through cache in the first place), so that all
values that have been written into the local cache become visible in
main memory. Also, any writes that are delayed due to out-of-order
operation or a write buffer must be flushed to main memory.

A MEMMODEL_SEQ_CST / MEMMODEL_ACQUIRE operation requires a cache
invalidation, preceded by a cache flush to avoid losing data, so that
values written by the releasing thread to main memory will be seen by
the current thread.
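
As a source-level illustration of the release / acquire pairing described above, here is a minimal C11 sketch. The `mailbox` type and its functions are hypothetical names introduced for this example only. On a target with software-synchronised caches, the release store marks where a flush would be needed and the acquire load marks where an invalidation would be needed; on a hardware-coherent target the same code only constrains ordering.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical one-slot mailbox: 'payload' is ordinary data, 'ready'
   is the atomic flag that publishes it.  */
struct mailbox
{
  int payload;
  atomic_bool ready;
};

static void
mailbox_init (struct mailbox *m)
{
  assert (m);
  m->payload = 0;
  atomic_init (&m->ready, false);
}

/* Releasing side: the payload write must not sink below the flag store,
   and must be visible to any thread that later acquires the flag.  */
static void
mailbox_publish (struct mailbox *m, int value)
{
  m->payload = value;
  atomic_store_explicit (&m->ready, true, memory_order_release);
}

/* Acquiring side: the payload read must not be hoisted above the flag
   load, and must observe what the releasing thread wrote.  */
static bool
mailbox_try_consume (struct mailbox *m, int *out)
{
  if (!atomic_load_explicit (&m->ready, memory_order_acquire))
    return false;
  *out = m->payload;
  return true;
}
```

In real use, `mailbox_publish` and `mailbox_try_consume` would run on different threads; the sketch leaves thread creation out.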

The patterns that represent the hardware cache / synchronisation
operations may also double as memory barriers for the compiler.

If you don't need hardware cache / synchronization operations (either
because you have hardware coherency, or you have only a single cache
system for all cores / the only core in the system), you still need
memory barriers for the compiler.
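
A compiler-only barrier of this kind can be sketched in C11 with `atomic_signal_fence`, which restricts compiler reordering without itself emitting a hardware synchronisation instruction. The variable and function names below are hypothetical, and the sketch says nothing about whether a given platform additionally needs a hardware instruction; that is the separate question discussed above.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical shared state for the illustration.  */
static int shared_data;
static atomic_int shared_flag;

static void
store_then_flag (int value)
{
  shared_data = value;
  /* Compiler-only release fence: the store to shared_data may not be
     sunk below this point by the optimizers.  No hardware
     synchronisation instruction is requested here.  */
  atomic_signal_fence (memory_order_release);
  atomic_store_explicit (&shared_flag, 1, memory_order_relaxed);
}

/* Single-threaded smoke check of the sketch above.  */
static int
smoke_check (void)
{
  store_then_flag (7);
  assert (atomic_load_explicit (&shared_flag, memory_order_relaxed) == 1);
  return shared_data;
}
```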

AFAICT, you use a hardware synchronisation instruction for
MEMMODEL_SEQ_CST, and compiler memory barriers for all other memory
models (except MEMMODEL_RELAXED). That makes no sense; either the
platform needs explicit instructions for memory coherency, or it
doesn't.

On the other hand, your memory barriers are more restrictive than they
need to be. To tell the compiler that it must not sink a write below
MEMMODEL_SEQ_CST / MEMMODEL_RELEASE operations, it is sufficient to
display a USE of an unspecified memory location. This is also true when
you have a cache flush: it is sufficient to show the compiler that this
cache flush may read anything. (Well, actually, for our purposes it'd
be OK to make it so that thread-local variables, spill slots and
variables that satisfy an escape analysis are considered independent.)

The USE of the unspecified memory has to be tied to the atomic
operation, of course. This could be by making it part of the
instruction pattern itself, or by having the atomic operation USE
something (e.g. a fake hard register) that is 'set' by the memory
barrier / sync / cache flush pattern.
