For Hotspot on x86, release/acquire vs volatile can make a difference.

Imagine you have:

A=10
r1=B

So we have a store to A and a load of B.

On x86, every store is a release store and every load is an acquire load.

On x86, a store can be reordered with a later load due to store
buffering, so A=10 and r1=B could be reordered.

If A and B were volatile, this reordering would not be allowed. The
reason is that a program without data races may only have sequentially
consistent executions, and for an execution to be sequentially consistent,
it needs to have the same effect as if all threads performed their
operations in program order (so no reordering).

To prevent the store and load from being reordered, a [StoreLoad] barrier
needs to be inserted (e.g. in the form of an MFENCE or a LOCK-prefixed
instruction):

A=10
[StoreLoad]
r1=B

This [StoreLoad] effectively stalls the execution of the load (r1=B) until
the store (A=10) in the store buffer has been drained to the coherent
cache, and this can take some time: there could be many queued stores in
the store buffer, all waiting for their cache line to be returned in the
right state.

Without the [StoreLoad], the load can be performed while the store is
still in the store buffer.
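A hedged sketch of how the two modes look with VarHandles (JDK 9+); the class and field names are illustrative, and the comments describe typical Hotspot codegen on x86 rather than anything the spec guarantees:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Illustrative comparison of release/acquire vs volatile access modes.
class Modes {
    int a, b;
    static final VarHandle A, B;

    static {
        try {
            MethodHandles.Lookup l = MethodHandles.lookup();
            A = l.findVarHandle(Modes.class, "a", int.class);
            B = l.findVarHandle(Modes.class, "b", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    int releaseAcquire() {
        A.setRelease(this, 10);          // on x86: plain MOV, no StoreLoad barrier
        return (int) B.getAcquire(this); // may run while the store sits in the store buffer
    }

    int volatileMode() {
        A.setVolatile(this, 10);          // on x86: MOV plus StoreLoad (LOCK-prefixed op or MFENCE)
        return (int) B.getVolatile(this); // cannot be reordered with the preceding store
    }
}
```

In a single thread both methods produce the same result; the modes differ only in which outcomes another thread is allowed to observe, and in the cost of the barrier.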





On Mon, Feb 17, 2025 at 9:34 AM Daniel Marques <[email protected]>
wrote:

> Thanks for the response.  I hope you don't mind a few follow ups:
>
> Is there a "for dummies" which describes the difference between
> Release/Acquire vs Volatile?  For Hotspot and x86-64, are there actual
> differences in implementation, and measurable performance using
> Release/Acquire vs Volatile?
>
> Again, thanks in advance.
>
> Dan
>
>
> On Wed, Feb 12, 2025 at 2:05 PM Peter Veentjer <[email protected]>
> wrote:
>
>> Yes, it is the same.
>>
>> You could even go for:
>>
>> class ExampleTwo {
>>
>>      void threadOne() {
>>           dataBuffer.putInt(valueOffset, 100)
>>           Unsafe.putIntRelease(null,  dataBufferAddr + readyOffset, 1)
>>      }
>>
>>      void threadTwo() {
>>           while ( Unsafe.getIntAcquire(null,  dataBufferAddr  +
>> readyOffset)  == 0)
>>                 ;
>>           assert  dataBuffer.getInt(valueOffset)  == 100;
>>      }
>> }
>>
>>
>> On Wed, Feb 12, 2025 at 5:55 PM Daniel Marques <[email protected]>
>> wrote:
>>
>>> I'm very new to both offheap allocations and the JMM, etc., so forgive
>>> the perhaps naive question.
>>>
>>> The introduction material to the JMM typically presents the following
>>> example of a correct program, assuming the two methods are executed
>>> concurrently in different threads.
>>>
>>> class ExampleOne {
>>>      volatile int ready;
>>>      int value;
>>>
>>>      void threadOne() {
>>>           value = 100;
>>>           ready = 1;
>>>      }
>>>
>>>      void threadTwo() {
>>>           while (ready == 0)
>>>                 ;
>>>           assert value == 100;
>>>      }
>>> }
>>>
>>> Is the following semantically equivalent, but now the two methods could
>>> be run in different processes, or are there any additional operations
>>> necessary to 'coordinate' between two processes sharing memory (assuming
>>> jdk >= 9)?
>>>
>>> class ExampleTwo {
>>>       MappedByteBuffer dataBuffer;
>>>       long   dataBufferAddr;
>>>       int valueOffset = 0;
>>>       int readyOffset = 4;
>>>
>>>       ExampleTwo() {
>>>             File file = new File("foo.dat");
>>>             FileChannel fc = new ...
>>>             dataBuffer = fc.map(READ_WRITE, 0, 2 * Integer.BYTES)
>>>             dataBufferAddr = Unsafe.magic(databuffer) // I'm actually
>>> using Agrona's UnsafeBuffer to do all the magic for me
>>>       }
>>>
>>>      void threadOne() {
>>>           dataBuffer.putInt(valueOffset, 100)
>>>           Unsafe.putIntVolatile(null,  dataBufferAddr + readyOffset, 1)
>>>      }
>>>
>>>      void threadTwo() {
>>>           while ( Unsafe.getIntVolatile(null,  dataBufferAddr  +
>>> readyOffset)  == 0)
>>>                 ;
>>>           assert  dataBuffer.getInt(valueOffset)  == 100;
>>>      }
>>> }
>>>
>>> Thanks in advance,
>>>
>>> Dan
>>>
>>> On Tue, Feb 11, 2025 at 6:34 AM Peter Veentjer <[email protected]>
>>> wrote:
>>>
>>>> Thanks a lot for your answer and for the confirmation that my
>>>> understanding is correct.
>>>>
>>>> On Wed, Feb 5, 2025 at 12:30 PM Aleksey Shipilev <
>>>> [email protected]> wrote:
>>>>
>>>>> On 2/3/25 12:06, Peter Veentjer wrote:
>>>>> > Imagine the following code:
>>>>> >
>>>>> > ... lots of writes to the buffer
>>>>> > buffer.putInt(a_offset,a_value)  (1)
>>>>> > buffer.putRelease(b_offset,b_value) (2)
>>>>> > releaseFence() (3)
>>>>> > buffer.putInt(c_offset,c_value) (4)
>>>>> >
>>>>> > Buffer is a chunk of memory that is shared with another process and
>>>>> the writes need to be seen in
>>>>> > order. So when 'b' is seen, 'a' should be seen. And when 'c' is
>>>>> seen, 'b' should be seen. There is
>>>>> > no other synchronization.
>>>>> >
>>>>> > All offsets are guaranteed to be naturally aligned. All the putInts
>>>>> are plain puts (using Unsafe).
>>>>> >
>>>>> > The putRelease (2) will ensure that 'a' is seen before 'b', and it
>>>>> will ensure atomicity and
>>>>> > visibility of 'b' (so the appropriate compiler and memory fences
>>>>> where needed).
>>>>> >
>>>>> > The releaseFence (3) will ensure that b is seen before c.
>>>>>
>>>>> Looks to me this fence can be replaced with releasing store of "c":
>>>>>
>>>>>   buffer.putInt(a_offset,a_value)
>>>>>   buffer.putRelease(b_offset,b_value)
>>>>>   buffer.putRelease(c_offset,c_value)
>>>>>
>>>>> My preference is almost always to avoid the explicit fences if you can
>>>>> control the memory ordering
>>>>> of the actual accesses. Using putRelease instead of explicit fence
>>>>> also forces you think about the
>>>>> symmetries: should all loads of "c" be performed with getAcquire to
>>>>> match the putRelease?
>>>>>
>>>>> > My question is about (4). Since it is a plain store, the compiler
>>>>> can do a ton of trickery including
>>>>> > the delay of visibility of (4). Is my understanding correct and is
>>>>> there anything else that could go
>>>>> > wrong?
>>>>>
>>>>> The common wisdom is indeed "let's put non-plain memory access mode,
>>>>> so the access is hopefully more
>>>>> prompt", but I have not seen any of these effects thoroughly
>>>>> quantified beyond "let's forbid the
>>>>> compiler to yank our access out of the loop". Maybe I have not looked
>>>>> hard enough.
>>>>>
>>>>> I suspect the delays introduced by compiler moving code around in
>>>>> sequential code streams is on the
>>>>> scale where it does not matter all that much for end-to-end latency.
>>>>> The only (?) place where code
>>>>> movement impact could be multiplied to a macro-effect is when the
>>>>> memory ops shift in/out/around the
>>>>> loops. I would not be overly concerned about latency impact of
>>>>> reordering within the short straight
>>>>> code stream.
>>>>>
>>>>> You can try to measure it with producer-consumer / ping-pong style
>>>>> benchmarks: put more memory ops
>>>>> around (4), turn on instruction scheduler randomizers (-XX:+StressLCM
>>>>> should be useful here, maybe
>>>>> -XX:+StressGCM), see if there is an impact. I suspect the effect is
>>>>> too fine-grained to be
>>>>> accurately measured with direct timing measurements, so you'll need to
>>>>> get creative how to measure
>>>>> "promptness".
>>>>>
>>>>> > What would be the lowest memory access mode that would resolve this
>>>>> problem? My guess is that the
>>>>> > last putInt, should be a putIntOpaque.
>>>>>
>>>>> Yes, in current Hotspot, opaque would effectively pin the access in
>>>>> place, so it would be exposed to
>>>>> hardware in the order closer to original source code order. Then it is
>>>>> up to hardware to see when to
>>>>> perform the store. But as I said above, I'll be surprised if it
>>>>> actually matters.
>>>>>
>>>>> Thanks,
>>>>> -Aleksey
>>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "mechanical-sympathy" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To view this discussion, visit
>>>> https://groups.google.com/d/msgid/mechanical-sympathy/CAGuAWdAsWprk9BK46iJdZ_w1wPBcM4OCkDgCLTAP98B4VCPscw%40mail.gmail.com
>>>> <https://groups.google.com/d/msgid/mechanical-sympathy/CAGuAWdAsWprk9BK46iJdZ_w1wPBcM4OCkDgCLTAP98B4VCPscw%40mail.gmail.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>
