100% agree! In the kernel people ignore this, but only by splattering
memory clobbers everywhere, to defeat any compiler reordering or
collapsing. But Gil's approach is much better, and works with languages
that have a memory model.
On 06/21/2016 08:37 PM, Gil Tene wrote:
The main tip I give people who try to understand fences, barriers,
and memory ordering rules is to completely (and very intentionally)
ignore and avoid thinking about any CPU's internal details (like
buffers or queues or flushes or such). The simple rule to remember is
this: before a CPU ever sees the instructions you think you are
running, a compiler (static, JIT, whatever) will have a chance to
completely shuffle those instructions around and give them to the CPU
in any order it sees fit. So what a CPU actually does with barriers
has no functional/correctness implications, as it is the compiler's
job to make the CPU do what it needs to. Barriers/fences/memory models
are first and foremost compiler concerns, and have both correctness
and (potentially profound) performance implications. The CPU
concerns are only secondary, and only affect performance (never
correctness). This is critical to understand and get your head around
before trying to reason about what fences/barriers/etc. *mean* to
program code.
For example, the sort of code mutation we are talking about is not
just a simple change in order. It is also a combination of reordering
with other optimizations. E.g. a repeated load from the same location
in a loop can (and will) float "up above" the loop, execute only once
(instead of once per loop iteration) and cache the loaded result in a
register, unless barriers/fences/ordering rules prevent it from doing
so. Similarly redundant stores to the same location can be folded into
a single store, unless rules prevent it.
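As a concrete illustration (my own hypothetical code, not from the thread), here is roughly what that hoisting transformation looks like in Java, with a plain (non-volatile) field:

```java
// Illustrative sketch: what a compiler may do to a repeated load of a
// plain (non-volatile) field inside a loop, absent any fences/ordering rules.
class Hoisting {
    boolean done;              // plain field: no ordering guarantees

    // What you wrote:
    int spinAsWritten() {
        int iterations = 0;
        while (!done) {        // looks like a fresh load on each iteration
            iterations++;
        }
        return iterations;
    }

    // What the JIT is free to emit: the load floats up above the loop,
    // executes once, and the result is cached in a register, so the loop
    // may never observe another thread's later write to done.
    int spinAsCompiled() {
        int iterations = 0;
        boolean cached = done; // single load, result kept in a register
        while (!cached) {
            iterations++;
        }
        return iterations;
    }
}
```

Marking `done` volatile (or using an explicit fence) is what forbids the second form.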
With that in mind, you can start looking at the various barriers and
fence semantics available in various languages (that actually define
them) and in various ad-hoc tooling (like e.g. libraries and
conventions used in C).
I like to use the simplistic LoadLoad, LoadStore, StoreStore,
StoreLoad mental model as starting point for thinking about barriers,
as it can be used to model most (not all) barrier semantics. Those are
simple to understand:
- LoadStore means "all loads that precede this fence will appear (from
the point of view of any other thread) to execute before the next
store appears to execute."
- LoadLoad means "all loads that precede this fence will appear (from
the point of view of any other thread) to execute before the next load
appears to execute."
etc. etc.
The name tells you what the rule is, clear and simple. Nothing more is
promised, and nothing less.
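To make the naming concrete, here is a sketch of my own (using the Java 9+ VarHandle static fence methods, not anything from this thread) of the classic message-passing pattern, where a StoreStore fence on the writer side pairs with a LoadLoad fence on the reader side:

```java
import java.lang.invoke.VarHandle;

class MessagePassing {
    int payload;      // plain (non-volatile) fields; the fences supply the ordering
    boolean ready;

    void produce() {
        payload = 42;
        VarHandle.storeStoreFence(); // StoreStore: the payload store can't
                                     // sink below the ready store
        ready = true;
    }

    Integer consume() {
        if (!ready) return null;     // not published yet
        VarHandle.loadLoadFence();   // LoadLoad: the payload load can't
                                     // float above the ready load
        return payload;
    }
}
```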
I like to think of Acquire and Release (which are somewhat more
restrictive) in terms of "what would a lock need to do?" (even if no
lock is involved). Thinking in terms of "in a critical section
protected by a lock, what sort of code motion would the
acquire/release fences associated with the lock acquire and release
action allow?" allows me to answer questions about them without
looking up a definition or thinking about the more abstract notions
that define them. E.g. A lock acquire can allow preceding loads and
stores that (in the program) appear "before" the lock acquire to
"float forward past the acquire and into the locked region". But it
cannot allow loads and stores that (in the program) appear "after" the
lock acquire to "float backwards past the acquire and out of the
locked region". So an acquire fence needs to enforce those rules.
Similarly, release can allow loads and stores that follow the release
to "float back past the release and into the locked region", but
cannot allow loads and stores that precede it to "float forward past
the release and out of the locked region".
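Annotated as code (an illustrative sketch of those rules, using standalone VarHandle.acquireFence()/releaseFence() in place of an actual lock):

```java
import java.lang.invoke.VarHandle;

class RoachMotel {
    int before, inside, after;

    void run() {
        before = 1;                // MAY float forward past the acquire,
                                   // into the "locked region"
        VarHandle.acquireFence();  // like a lock acquire: nothing below it
                                   // may float backwards up out of the region
        inside = 2;                // stays between the two fences
        VarHandle.releaseFence();  // like a lock release: nothing above it
                                   // may float forward down out of the region
        after = 3;                 // MAY float backwards past the release,
                                   // into the "locked region"
    }
}
```

Operations can move into the region from either side but never out of it, which is why this is sometimes called the "roach motel" model.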
Since an acquire doesn't (necessarily) involve a load or a store,
Acquire doesn't quite equate to LoadLoad|LoadStore. It can be
equivalent when acquiring the lock involves a load followed by
LoadLoad|LoadStore, but there are lock implementations where that
doesn't actually happen, and there are uses of Acquire fences with no
locking or loads involved. Similarly, since a release doesn't
(necessarily) involve a load or a store, it is not quite equivalent to
LoadStore|StoreStore. (But it can be when releasing involves
LoadStore|StoreStore followed by a store.)
HTH
On Sunday, June 12, 2016 at 4:28:14 PM UTC-7, Dain Ironfoot wrote:
Hi Guys,
I am attempting to understand memory barriers at a level useful
for Java lock-free programmers.
This level, I feel, is somewhere between learning about volatile
from Java books and learning about the workings of Store/Load
buffers from an x86 manual.
I spent some time reading a bunch of blogs/cookbooks and have come
up with the summary shown below.
LFENCE
====================================================
Name            : LFENCE / Load Barrier / Acquire Fence
Barriers        : LoadLoad + LoadStore
Details         : Given the sequence {Load1, LFENCE, Load2, Store1},
                  the barrier ensures that Load1 can't be moved south,
                  and Load2 and Store1 can't be moved north of the
                  barrier. Note that Load2 and Store1 can still be
                  reordered with each other.
Buffer Effect   : Causes the contents of the LoadBuffer (pending loads)
                  to be processed for that CPU. This makes program state
                  exposed from other CPUs visible to this CPU before
                  Load2 and Store1 are executed.
Cost on x86     : Either very cheap or a no-op.
Java instruction: Reading a volatile variable, Unsafe.loadFence()
SFENCE
====================================================
Name            : SFENCE / Store Barrier / Release Fence
Barriers        : StoreStore + LoadStore
Details         : Given the sequence {Load1, Store1, SFENCE, Store2,
                  Load2}, the barrier ensures that Load1 and Store1
                  can't be moved south, and Store2 can't be moved north
                  of the barrier. Note that Load1 and Store1 can still
                  be reordered with each other, AND Load2 can be moved
                  north of the barrier.
Buffer Effect   : Causes the contents of the StoreBuffer to be flushed
                  to cache for the CPU on which it is issued. This makes
                  program state visible to other CPUs before Store2 is
                  executed.
Cost on x86     : Either very cheap or a no-op.
Java instruction: lazySet(), Unsafe.storeFence(), Unsafe.putOrdered*()
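For example (my own sketch, not part of the original summary), a release-style publish with lazySet() looks like:

```java
import java.util.concurrent.atomic.AtomicLong;

class ReleasePublish {
    long payload;                              // plain field, published below
    final AtomicLong seq = new AtomicLong(0);

    void publish(long v) {
        payload = v;     // must become visible no later than the seq store...
        seq.lazySet(1);  // ...because lazySet() is an ordered (release-style)
                         // store: prior loads and stores can't be moved
                         // south of it
    }
}
```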
MFENCE
====================================================
Name            : MFENCE / Full Barrier / Fence
Barriers        : StoreLoad
Details         : Obtains the effects of the other three barriers, so
                  it can serve as a general-purpose barrier. Given the
                  sequence {Load1, Store1, MFENCE, Store2, Load2}, the
                  barrier ensures that Load1 and Store1 can't be moved
                  south, and Store2 and Load2 can't be moved north of
                  the barrier. Note that Load1 and Store1 can still be
                  reordered with each other, AND Store2 and Load2 can
                  still be reordered with each other.
Buffer Effect   : Causes the contents of the LoadBuffer (pending loads)
                  to be processed for that CPU, AND causes the contents
                  of the StoreBuffer to be flushed to cache for the CPU
                  on which it is issued.
Cost on x86     : The most expensive kind.
Java instruction: Writing to a volatile, Unsafe.fullFence(), using locks
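The StoreLoad case is the one no cheaper fence covers; a sketch of my own (Dekker-style mutual exclusion, using VarHandle.fullFence() rather than Unsafe, which is an assumption on my part):

```java
import java.lang.invoke.VarHandle;

class DekkerFlag {
    int flagA, flagB;  // plain fields for illustration; a second thread
                       // would run the mirror image with the flags swapped

    boolean tryEnterA() {
        flagA = 1;              // announce intent...
        VarHandle.fullFence();  // StoreLoad: the flagA store must be visible
                                // before the flagB load executes; without it,
                                // both threads could read 0 and both enter
        return flagB == 0;      // ...then check the other thread's flag
    }
}
```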
My questions are:
a) Is my understanding correct, or have I missed something critical?
b) If both SFENCE and MFENCE drain the StoreBuffer (invalidate the
cacheline and wait for acks from other CPUs), why is one a no-op
and the other a very expensive op?
Thanks
** Edited to make the info easier to read.
--
You received this message because you are subscribed to the Google
Groups "mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to [email protected]
<mailto:[email protected]>.
For more options, visit https://groups.google.com/d/optout.