And due to the out-of-order execution (OoOE) of modern microarchitectures, with acquire/release the load can even be executed before the store, since there is no RAW (read-after-write) dependency between them.
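For readers who want to experiment with the difference being discussed, the ExampleOne handshake from the thread below can be written with the public VarHandle API (JDK 9+) instead of Unsafe; VarHandles expose both the volatile and the release/acquire access modes being compared. This is a minimal standalone sketch, not code from the thread, and the class and field names are made up for illustration:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Sketch: ExampleOne's flag handshake using VarHandle access modes.
// setVolatile/getVolatile would give sequential consistency; the weaker
// setRelease/getAcquire pair shown here is still sufficient for this
// one-way publication pattern.
class HandshakeSketch {
    static final VarHandle READY;
    static {
        try {
            READY = MethodHandles.lookup()
                    .findVarHandle(HandshakeSketch.class, "ready", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    int value; // plain field, published via the release store below
    int ready; // flag, accessed only through the VarHandle

    int run() {
        Thread writer = new Thread(() -> {
            value = 100;                // plain store
            READY.setRelease(this, 1);  // release store: the plain store cannot move past it
        });
        writer.start();
        while ((int) READY.getAcquire(this) == 0) {
            Thread.onSpinWait();        // acquire load pairs with the release store
        }
        return value;                   // 100 is guaranteed visible here by release/acquire pairing
    }
}
```

On x86 both the release store and the acquire load compile to plain MOVs; swapping in setVolatile/getVolatile would add the [StoreLoad] barrier (e.g. a LOCK-prefixed instruction) on the writer side, which is exactly the cost difference Peter describes below.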
On Mon, Feb 17, 2025 at 10:10 AM Peter Veentjer <[email protected]> wrote:

> For HotSpot and x86, release/acquire vs volatile can make a difference.
>
> Imagine you have:
>
> A = 10
> r1 = B
>
> So we have a store to A and a load of B.
>
> On x86, every store is a release store and every load is an acquire load.
>
> On x86, a store can be reordered with a later load due to store buffers,
> so A = 10 and r1 = B could be reordered.
>
> If A and B were volatile, this reordering would not be allowed. The reason
> is that a program without data races may only have sequentially consistent
> executions, and for an execution to be sequentially consistent it needs to
> have the same effect as if all the threads ran their operations in program
> order (so no reordering).
>
> To prevent the store and the load from being reordered, a [StoreLoad]
> barrier needs to be inserted (e.g. in the form of an MFENCE or a
> LOCK-prefixed instruction):
>
> A = 10
> [StoreLoad]
> r1 = B
>
> This [StoreLoad] effectively stalls the execution of the load (r1 = B)
> until the store (A = 10) in the store buffer has been drained to the
> coherent cache, and this can take some time: there could be many queued
> stores in the store buffer, all waiting for their cache line to be
> returned in the right state.
>
> Without the [StoreLoad], the load can be performed while the store is
> still in the store buffer.
>
> On Mon, Feb 17, 2025 at 9:34 AM Daniel Marques <[email protected]> wrote:
>
>> Thanks for the response. I hope you don't mind a few follow-ups:
>>
>> Is there a "for dummies" resource which describes the difference between
>> release/acquire and volatile? For HotSpot and x86-64, are there actual
>> differences in implementation, and measurable performance differences,
>> between release/acquire and volatile?
>>
>> Again, thanks in advance.
>>
>> Dan
>>
>> On Wed, Feb 12, 2025 at 2:05 PM Peter Veentjer <[email protected]> wrote:
>>
>>> Yes, it is the same.
>>>
>>> You could even go for:
>>>
>>> class ExampleTwo {
>>>
>>>     void threadOne() {
>>>         dataBuffer.putInt(valueOffset, 100);
>>>         Unsafe.putIntRelease(null, dataBufferAddr + readyOffset, 1);
>>>     }
>>>
>>>     void threadTwo() {
>>>         while (Unsafe.getIntAcquire(null, dataBufferAddr + readyOffset) == 0)
>>>             ;
>>>         assert dataBuffer.getInt(valueOffset) == 100;
>>>     }
>>> }
>>>
>>> On Wed, Feb 12, 2025 at 5:55 PM Daniel Marques <[email protected]> wrote:
>>>
>>>> I'm very new to both off-heap allocations and the JMM, etc., so forgive
>>>> the perhaps naive question.
>>>>
>>>> The introductory material on the JMM typically presents the following
>>>> example of a correct program, assuming the two methods are executed
>>>> concurrently in different threads.
>>>>
>>>> class ExampleOne {
>>>>     volatile int ready;
>>>>     int value;
>>>>
>>>>     void threadOne() {
>>>>         value = 100;
>>>>         ready = 1;
>>>>     }
>>>>
>>>>     void threadTwo() {
>>>>         while (ready == 0)
>>>>             ;
>>>>         assert value == 100;
>>>>     }
>>>> }
>>>>
>>>> Is the following semantically equivalent, but with the two methods now
>>>> run in different processes, or are there additional operations necessary
>>>> to 'coordinate' between two processes sharing memory (assuming JDK >= 9)?
>>>>
>>>> class ExampleTwo {
>>>>     MappedByteBuffer dataBuffer;
>>>>     long dataBufferAddr;
>>>>     int valueOffset = 0;
>>>>     int readyOffset = 4;
>>>>
>>>>     ExampleTwo() {
>>>>         File file = new File("foo.dat");
>>>>         FileChannel fc = new ...
>>>>         dataBuffer = fc.map(READ_WRITE, 0, 2 * Integer.BYTES);
>>>>         dataBufferAddr = Unsafe.magic(dataBuffer); // I'm actually using
>>>>         // Agrona's UnsafeBuffer to do all the magic for me
>>>>     }
>>>>
>>>>     void threadOne() {
>>>>         dataBuffer.putInt(valueOffset, 100);
>>>>         Unsafe.putIntVolatile(null, dataBufferAddr + readyOffset, 1);
>>>>     }
>>>>
>>>>     void threadTwo() {
>>>>         while (Unsafe.getIntVolatile(null, dataBufferAddr + readyOffset) == 0)
>>>>             ;
>>>>         assert dataBuffer.getInt(valueOffset) == 100;
>>>>     }
>>>> }
>>>>
>>>> Thanks in advance,
>>>>
>>>> Dan
>>>>
>>>> On Tue, Feb 11, 2025 at 6:34 AM Peter Veentjer <[email protected]> wrote:
>>>>
>>>>> Thanks a lot for your answer and for the confirmation that my
>>>>> understanding is correct.
>>>>>
>>>>> On Wed, Feb 5, 2025 at 12:30 PM Aleksey Shipilev <[email protected]> wrote:
>>>>>
>>>>>> On 2/3/25 12:06, Peter Veentjer wrote:
>>>>>> > Imagine the following code:
>>>>>> >
>>>>>> > ... lots of writes to the buffer
>>>>>> > buffer.putInt(a_offset, a_value)     (1)
>>>>>> > buffer.putRelease(b_offset, b_value) (2)
>>>>>> > releaseFence()                       (3)
>>>>>> > buffer.putInt(c_offset, c_value)     (4)
>>>>>> >
>>>>>> > Buffer is a chunk of memory that is shared with another process, and
>>>>>> > the writes need to be seen in order: when 'b' is seen, 'a' should be
>>>>>> > seen, and when 'c' is seen, 'b' should be seen. There is no other
>>>>>> > synchronization.
>>>>>> >
>>>>>> > All offsets are guaranteed to be naturally aligned. All the putInts
>>>>>> > are plain puts (using Unsafe).
>>>>>> >
>>>>>> > The putRelease (2) will ensure that 'a' is seen before 'b', and it
>>>>>> > will ensure atomicity and visibility of 'b' (with the appropriate
>>>>>> > compiler and memory fences where needed).
>>>>>> >
>>>>>> > The releaseFence (3) will ensure that 'b' is seen before 'c'.
>>>>>>
>>>>>> Looks to me like this fence can be replaced with a releasing store of 'c':
>>>>>>
>>>>>> buffer.putInt(a_offset, a_value)
>>>>>> buffer.putRelease(b_offset, b_value)
>>>>>> buffer.putRelease(c_offset, c_value)
>>>>>>
>>>>>> My preference is almost always to avoid explicit fences if you can
>>>>>> control the memory ordering of the actual accesses. Using putRelease
>>>>>> instead of an explicit fence also forces you to think about the
>>>>>> symmetries: should all loads of 'c' be performed with getAcquire to
>>>>>> match the putRelease?
>>>>>>
>>>>>> > My question is about (4). Since it is a plain store, the compiler can
>>>>>> > do a ton of trickery, including delaying the visibility of (4). Is my
>>>>>> > understanding correct, and is there anything else that could go wrong?
>>>>>>
>>>>>> The common wisdom is indeed "let's put a non-plain memory access mode
>>>>>> there, so the access is hopefully more prompt", but I have not seen any
>>>>>> of these effects thoroughly quantified beyond "let's forbid the
>>>>>> compiler to yank our access out of the loop". Maybe I have not looked
>>>>>> hard enough.
>>>>>>
>>>>>> I suspect the delays introduced by the compiler moving code around in
>>>>>> sequential code streams are on a scale where they do not matter all
>>>>>> that much for end-to-end latency. The only (?) place where code-movement
>>>>>> impact could be multiplied into a macro-effect is when the memory ops
>>>>>> shift in/out/around loops. I would not be overly concerned about the
>>>>>> latency impact of reordering within a short straight code stream.
>>>>>>
>>>>>> You can try to measure it with producer-consumer / ping-pong style
>>>>>> benchmarks: put more memory ops around (4), turn on the instruction
>>>>>> scheduler randomizers (-XX:+StressLCM should be useful here, maybe
>>>>>> -XX:+StressGCM), and see if there is an impact. I suspect the effect is
>>>>>> too fine-grained to be accurately measured with direct timing
>>>>>> measurements, so you'll need to get creative about how to measure
>>>>>> "promptness".
>>>>>>
>>>>>> > What would be the lowest memory access mode that would resolve this
>>>>>> > problem? My guess is that the last putInt should be a putIntOpaque.
>>>>>>
>>>>>> Yes, in current HotSpot, opaque would effectively pin the access in
>>>>>> place, so it would be exposed to hardware in an order closer to the
>>>>>> original source-code order. Then it is up to the hardware to decide
>>>>>> when to perform the store. But as I said above, I'll be surprised if it
>>>>>> actually matters.
>>>>>>
>>>>>> Thanks,
>>>>>> -Aleksey
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "mechanical-sympathy" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> To view this discussion, visit
>>>>> https://groups.google.com/d/msgid/mechanical-sympathy/CAGuAWdAsWprk9BK46iJdZ_w1wPBcM4OCkDgCLTAP98B4VCPscw%40mail.gmail.com
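Aleksey's suggestion of replacing the explicit releaseFence with a releasing store of 'c' can also be sketched with the public VarHandle API instead of Unsafe. This is a hypothetical standalone version (buffer size, offsets, and values are invented), using a direct ByteBuffer realigned so the int accesses support the non-plain modes:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch: the a/b/c ordered-writes chain with a releasing store of 'c'
// replacing the releaseFence(), as suggested in the reply above.
class OrderedWritesSketch {
    // View of the buffer as ints; the index coordinate is a byte offset.
    static final VarHandle INTS =
            MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.nativeOrder());

    static int[] demo() {
        // Direct buffer, realigned so int accesses at offsets 0, 4, 8
        // satisfy the alignment requirement of the non-plain access modes.
        ByteBuffer buffer = ByteBuffer.allocateDirect(16).alignedSlice(4);

        buffer.putInt(0, 1);            // 'a': plain store
        INTS.setRelease(buffer, 4, 2);  // 'b': release store, orders 'a' before 'b'
        INTS.setRelease(buffer, 8, 3);  // 'c': release store instead of releaseFence();
                                        // orders 'b' before 'c' and keeps the symmetry
                                        // with getAcquire on the reader side

        return new int[] {
            buffer.getInt(0),
            (int) INTS.getAcquire(buffer, 4),
            (int) INTS.getAcquire(buffer, 8),
        };
    }
}
```

Note the symmetry point from the reply: a reader in the other process would load 'b' and 'c' with getAcquire to pair with these release stores.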
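As a footnote to the thread: since JDK 9 the pattern in ExampleTwo can be expressed without Unsafe (or Agrona) via MethodHandles.byteBufferViewVarHandle, which supports volatile and release/acquire access on direct (and hence mapped) buffers at aligned offsets. The sketch below runs both sides in one process purely for illustration; the file name and class name are invented. Note also that the JMM formally governs threads within one JVM, so the cross-process case additionally relies on both processes sharing a coherent mapping of the same file, which is the crux of Daniel's question:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch: ExampleTwo's ready-flag handshake over a memory-mapped file,
// using the public VarHandle view API instead of Unsafe.
class SharedMemorySketch {
    // View of the mapped buffer as ints; the index coordinate is a byte offset.
    // A MappedByteBuffer is direct and page-aligned, so offsets 0 and 4 are
    // sufficiently aligned for the non-plain int access modes.
    static final VarHandle INTS =
            MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.nativeOrder());
    static final int VALUE_OFFSET = 0;
    static final int READY_OFFSET = 4;

    static int demo() {
        try {
            Path file = Files.createTempFile("foo", ".dat");
            try (FileChannel fc = FileChannel.open(file,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                MappedByteBuffer buf =
                        fc.map(FileChannel.MapMode.READ_WRITE, 0, 2 * Integer.BYTES);

                // Producer side: plain payload store, then release store of the flag.
                buf.putInt(VALUE_OFFSET, 100);
                INTS.setRelease(buf, READY_OFFSET, 1);

                // Consumer side (in another process this would spin on the flag):
                while ((int) INTS.getAcquire(buf, READY_OFFSET) == 0) {
                    Thread.onSpinWait();
                }
                return buf.getInt(VALUE_OFFSET);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```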
