Re: Avoiding expensive memory barriers

Gil Tene Wed, 21 Mar 2018 14:29:01 -0700

Daniel, if what you originally meant is "You don't need memory barriers 
that incur any costs on x86 to implement an SPSC queue.", instead of the 
stated "You don't need memory barriers to implement an SPSC queue for 
x86.", you and Avi may be on the same page. I think.

Barriers are barriers no matter what level you express them at. Arguably, 
all source-code-stated barriers are "compiler barriers". And from an 
expression point of view, most barriers are "language memory model 
barriers". This is true even in C or MASM, when calling 
implementation-specific things that establish a required (or, more 
commonly, an I-sure-hope-the-compiler-listens-to-me-and-doesn't-mess-with-this) 
ordering.

At the language/source-code level, some barriers are explicit and some are 
implicit (e.g. in Java a read from a volatile field carries implicit barrier 
semantics, while VarHandle.loadLoadFence() is an explicit barrier). The 
compiler (or macro assembler, etc.) will honor the language-level barrier 
semantics in its own choices about ordering and potential elimination of 
code (before any CPU instructions actually get emitted), and it may 
translate some of those semantic barriers into machine instructions that 
enforce them (explicitly or implicitly).

The x86 memory model includes all sorts of implicit, "free" barriers: x86 
loads and stores generally imply load-load, load-store, and store-store 
ordering, but not store-load ordering. Only specific x86 instructions (such 
as mfence, or lock-prefixed instructions) establish a store-load ordering.
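
To make the explicit/implicit distinction concrete in C++ terms (the 
language used elsewhere in this thread), here is a minimal sketch. The 
names payload, ready, publish, try_consume, compiler_only_fence and 
full_fence are made up for illustration, and the comments about what gets 
emitted describe typical compiler behavior on x86, not a guarantee.

```cpp
#include <atomic>
#include <cassert>

int payload = 0;                 // plain (non-atomic) data; hypothetical name
std::atomic<bool> ready{false};  // flag used to publish payload

// Release store: a language-level barrier. The compiler may not sink the
// write to payload below the store to ready. On x86 this typically
// compiles to two plain mov instructions, with no fence instruction.
void publish(int v) {
    payload = v;
    ready.store(true, std::memory_order_release);
}

// Acquire load: the compiler may not hoist the read of payload above the
// load of ready. On x86 this is typically a plain mov as well.
bool try_consume(int& out) {
    if (!ready.load(std::memory_order_acquire)) return false;
    out = payload;
    return true;
}

// Compiler-only barrier: constrains compiler reordering around this point
// but emits no CPU instruction.
void compiler_only_fence() {
    std::atomic_signal_fence(std::memory_order_seq_cst);
}

// Full fence: also demands store-load ordering, so on x86 the compiler has
// to emit a real instruction (typically mfence or a locked operation).
void full_fence() {
    std::atomic_thread_fence(std::memory_order_seq_cst);
}
```

All four functions express barriers at the source level; only full_fence 
is expected to cost an instruction on x86.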

On x86, compilers don't need to emit barrier instructions in order to 
maintain load-load, load-store, or store-store ordering, or to enforce 
acquire fences (loadLoad and loadStore combined) and release fences 
(loadStore and storeStore combined). But the compilers themselves still 
need to know which fences/barriers they must respect in their own 
code-jumbling. E.g. without a source-level barrier requiring store-store 
ordering, the compiler may freely reorder any two stores in your code and 
emit them in that new order, which can easily wreck the correctness of 
otherwise very-fast-on-x86 SPSC queue code, even on x86.
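
As an illustration of where those source-level barriers sit in SPSC queue 
code, here is a minimal sketch of a bounded ring buffer along the lines 
Daniel describes (slot write, then a release store to the producer index, 
paired with an acquire load on the consumer side). SpscQueue, its members 
and run_demo are hypothetical names, not code from the article or from 
Seastar. It uses only acquire and release orderings and is not meant to 
settle the store-load question.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical bounded SPSC ring buffer; Capacity must be a power of two.
template <typename T, std::size_t Capacity>
class SpscQueue {
    static_assert((Capacity & (Capacity - 1)) == 0, "power of two required");
    T slots_[Capacity];
    std::atomic<std::size_t> producer_idx_{0};  // written only by the producer
    std::atomic<std::size_t> consumer_idx_{0};  // written only by the consumer

public:
    bool try_push(const T& v) {
        const std::size_t p = producer_idx_.load(std::memory_order_relaxed);
        // Acquire pairs with the consumer's release store: the slot we are
        // about to overwrite has already been read.
        const std::size_t c = consumer_idx_.load(std::memory_order_acquire);
        if (p - c == Capacity) return false;  // full
        slots_[p & (Capacity - 1)] = v;
        // Release: the slot write above may not be reordered after this
        // store, neither by the compiler nor by the CPU.
        producer_idx_.store(p + 1, std::memory_order_release);
        return true;
    }

    bool try_pop(T& out) {
        const std::size_t c = consumer_idx_.load(std::memory_order_relaxed);
        // Acquire pairs with the producer's release store: the slot
        // contents are visible before we read them.
        const std::size_t p = producer_idx_.load(std::memory_order_acquire);
        if (p == c) return false;  // empty
        out = slots_[c & (Capacity - 1)];
        consumer_idx_.store(c + 1, std::memory_order_release);
        return true;
    }
};

// Demo: a producer thread pushes 1..count; the calling thread pops them,
// checks FIFO order, and returns the sum.
long run_demo(int count) {
    SpscQueue<int, 8> q;
    std::thread producer([&] {
        for (int i = 1; i <= count; ++i)
            while (!q.try_push(i)) std::this_thread::yield();
    });
    long sum = 0;
    for (int expected = 1; expected <= count; ++expected) {
        int v = 0;
        while (!q.try_pop(v)) std::this_thread::yield();
        assert(v == expected);
        sum += v;
    }
    producer.join();
    return sum;
}
```

If the two release stores were weakened to std::memory_order_relaxed, the 
compiler would be allowed to move the slot write past the index update, 
which is exactly the kind of breakage described above, even though x86 
hardware would not reorder the two stores itself.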

None of this opines on whether or not SPSC can be done without a store-load 
barrier so that on x86 no barrier instructions would be needed. I don't 
know if it can.

On Monday, March 19, 2018 at 7:41:41 AM UTC-7, Daniel Eloff wrote:
>
> We're getting a little confused on the terminology. That's a compiler 
> barrier, as it prevents the compiler from reordering certain instructions 
> beyond it (I don't think relaxed prevents any reordering, but release and 
> acquire do.) I know you understand this stuff given your background, I just 
> want to clarify the terminology for the sake of the discussion.
>
> The original post and article discuss real memory barriers like mfence. 
> These prevent the CPU from reordering loads and stores, which should be 
> unnecessary for SPSC queues on x86 because the architecture already gives 
> strong enough ordering guarantees in this case.
>
>
> On Mon, Mar 19, 2018, 1:19 AM Avi Kivity <a...@scylladb.com> wrote:
>
>> The release write is a memory barrier. It's not an SFENCE or another 
>> fancy instruction, but it is a memory barrier from the application writer's 
>> point of view.
>>
>>
>> The C++ code
>>
>>
>>     x.store(5, std::memory_order_relaxed)
>>
>> has two effects on x86:
>>
>>   1. generate a write to x that is a single instruction (e.g. mov $5, x)
>>   2. prevent preceding writes from being reordered by the compiler (they 
>> are implicitly ordered by the processor on x86).
>>
>>
>>
>> On 03/18/2018 08:16 PM, Dan Eloff wrote:
>>
>> You don't need memory barriers to implement an SPSC queue for x86. You 
>> can do a relaxed store to the queue followed by a release write to 
>> producer_idx. As long as consumer begins with an acquire load from 
>> producer_idx it is guaranteed to see all stores to the queue memory before 
>> producer_idx, according to the happens before ordering. There are no memory 
>> barriers on x86 for acquire/release semantics.
>>
>> The release/acquire semantics have no meaning when used with different 
>> memory locations, but if used on producer_idx when synchronizing the 
>> consumer, and consumer_idx when synchronizing the producer, it should work.
>>
>>
>>
>> On Thu, Feb 15, 2018 at 8:29 AM, Avi Kivity <a...@scylladb.com> wrote:
>>
>>> Ever see mfence (aka full memory barrier, or std::memory_order_seq_cst) 
>>> taking the top row in a profile? Here's the complicated story of how we 
>>> took it down:
>>>
>>>
>>> https://www.scylladb.com/2018/02/15/memory-barriers-seastar-linux/
>>>
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "mechanical-sympathy" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to mechanical-sympathy+unsubscr...@googlegroups.com 
>>> <javascript:>.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>

