Thanks for the response. I hope you don't mind a few follow-ups: Is there a "for dummies" resource that describes the difference between release/acquire and volatile? For HotSpot on x86-64, are there actual differences in the implementation, and measurable performance differences, between release/acquire and volatile accesses?
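[A sketch of the contrast, not an authoritative answer: the class name `Flags` and its methods are made up for illustration, but the VarHandle access modes are the real JDK 9+ API, and the code-generation notes reflect HotSpot's typical lowering on x86-64.]

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// On x86-64 under HotSpot, a release store typically compiles to a plain MOV
// (the hardware's TSO model already provides release semantics), while a
// volatile (sequentially consistent) store additionally emits a full fence
// (e.g. LOCK ADD on the stack) so it cannot be reordered with *subsequent*
// loads. Release/acquire permits that store->load reordering; volatile does
// not. On the load side, both are typically a plain MOV on x86-64.
class Flags {
    int ready;

    static final VarHandle READY;
    static {
        try {
            READY = MethodHandles.lookup()
                    .findVarHandle(Flags.class, "ready", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    void publishRelease()  { READY.setRelease(this, 1); }   // plain MOV on x86-64
    void publishVolatile() { READY.setVolatile(this, 1); }  // MOV + full fence

    int readAcquire()  { return (int) READY.getAcquire(this); }
    int readVolatile() { return (int) READY.getVolatile(this); }
}
```

So for a one-writer/one-spinning-reader flag like the examples below, release/acquire is sufficient, and the measurable x86-64 cost difference is the fence on the volatile store side; volatile's extra strength (sequential consistency) only matters for Dekker-style idioms where a store must be visible before a later load.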
Again, thanks in advance.

Dan

On Wed, Feb 12, 2025 at 2:05 PM Peter Veentjer <[email protected]> wrote:

> Yes, it is the same.
>
> You could even go for:
>
> class ExampleTwo {
>
>     void threadOne() {
>         dataBuffer.putInt(valueOffset, 100);
>         Unsafe.putIntRelease(null, dataBufferAddr + readyOffset, 1);
>     }
>
>     void threadTwo() {
>         while (Unsafe.getIntAcquire(null, dataBufferAddr + readyOffset) == 0)
>             ;
>         assert dataBuffer.getInt(valueOffset) == 100;
>     }
> }
>
> On Wed, Feb 12, 2025 at 5:55 PM Daniel Marques <[email protected]> wrote:
>
>> I'm very new to both off-heap allocations and the JMM, etc., so forgive
>> the perhaps naive question.
>>
>> The introductory material on the JMM typically presents the following
>> example of a correct program, assuming the two methods are executed
>> concurrently in different threads.
>>
>> class ExampleOne {
>>     volatile int ready;
>>     int value;
>>
>>     void threadOne() {
>>         value = 100;
>>         ready = 1;
>>     }
>>
>>     void threadTwo() {
>>         while (ready == 0)
>>             ;
>>         assert value == 100;
>>     }
>> }
>>
>> Is the following semantically equivalent, but now with the two methods
>> run in different processes? Or are additional operations necessary to
>> 'coordinate' between two processes sharing memory (assuming jdk >= 9)?
>>
>> class ExampleTwo {
>>     MappedByteBuffer dataBuffer;
>>     long dataBufferAddr;
>>     int valueOffset = 0;
>>     int readyOffset = 4;
>>
>>     ExampleTwo() {
>>         File file = new File("foo.dat");
>>         FileChannel fc = new ...
>>         dataBuffer = fc.map(READ_WRITE, 0, 2 * Integer.BYTES);
>>         dataBufferAddr = Unsafe.magic(databuffer); // I'm actually using
>>                  // Agrona's UnsafeBuffer to do all the magic for me
>>     }
>>
>>     void threadOne() {
>>         dataBuffer.putInt(valueOffset, 100);
>>         Unsafe.putIntVolatile(null, dataBufferAddr + readyOffset, 1);
>>     }
>>
>>     void threadTwo() {
>>         while (Unsafe.getIntVolatile(null, dataBufferAddr + readyOffset) == 0)
>>             ;
>>         assert dataBuffer.getInt(valueOffset) == 100;
>>     }
>> }
>>
>> Thanks in advance,
>>
>> Dan
>>
>> On Tue, Feb 11, 2025 at 6:34 AM Peter Veentjer <[email protected]> wrote:
>>
>>> Thanks a lot for your answer and for the confirmation that my
>>> understanding is correct.
>>>
>>> On Wed, Feb 5, 2025 at 12:30 PM Aleksey Shipilev <[email protected]> wrote:
>>>
>>>> On 2/3/25 12:06, Peter Veentjer wrote:
>>>> > Imagine the following code:
>>>> >
>>>> >     ... lots of writes to the buffer
>>>> >     buffer.putInt(a_offset, a_value);      (1)
>>>> >     buffer.putRelease(b_offset, b_value);  (2)
>>>> >     releaseFence();                        (3)
>>>> >     buffer.putInt(c_offset, c_value);      (4)
>>>> >
>>>> > Buffer is a chunk of memory that is shared with another process, and
>>>> > the writes need to be seen in order. So when 'b' is seen, 'a' should
>>>> > be seen. And when 'c' is seen, 'b' should be seen. There is no other
>>>> > synchronization.
>>>> >
>>>> > All offsets are guaranteed to be naturally aligned. All the putInts
>>>> > are plain puts (using Unsafe).
>>>> >
>>>> > The putRelease (2) will ensure that 'a' is seen before 'b', and it
>>>> > will ensure atomicity and visibility of 'b' (so the appropriate
>>>> > compiler and memory fences where needed).
>>>> >
>>>> > The releaseFence (3) will ensure that 'b' is seen before 'c'.
>>>>
>>>> Looks to me this fence can be replaced with a releasing store of 'c':
>>>>
>>>>     buffer.putInt(a_offset, a_value);
>>>>     buffer.putRelease(b_offset, b_value);
>>>>     buffer.putRelease(c_offset, c_value);
>>>>
>>>> My preference is almost always to avoid the explicit fences if you can
>>>> control the memory ordering of the actual accesses. Using putRelease
>>>> instead of an explicit fence also forces you to think about the
>>>> symmetries: should all loads of 'c' be performed with getAcquire to
>>>> match the putRelease?
>>>>
>>>> > My question is about (4). Since it is a plain store, the compiler can
>>>> > do a ton of trickery, including delaying the visibility of (4). Is my
>>>> > understanding correct, and is there anything else that could go wrong?
>>>>
>>>> The common wisdom is indeed "let's use a non-plain memory access mode,
>>>> so the access is hopefully more prompt", but I have not seen any of
>>>> these effects thoroughly quantified beyond "let's forbid the compiler
>>>> to yank our access out of the loop". Maybe I have not looked hard enough.
>>>>
>>>> I suspect the delays introduced by the compiler moving code around in
>>>> sequential code streams are on a scale where they do not matter all
>>>> that much for end-to-end latency. The only (?) place where code
>>>> movement could be multiplied into a macro-effect is when memory ops
>>>> shift in/out/around loops. I would not be overly concerned about the
>>>> latency impact of reordering within a short straight code stream.
>>>>
>>>> You can try to measure it with producer-consumer / ping-pong style
>>>> benchmarks: put more memory ops around (4), turn on the instruction
>>>> scheduler randomizers (-XX:+StressLCM should be useful here, maybe
>>>> -XX:+StressGCM), and see if there is an impact.
>>>> I suspect the effect is too fine-grained to be accurately measured
>>>> with direct timing measurements, so you'll need to get creative about
>>>> how to measure "promptness".
>>>>
>>>> > What would be the lowest memory access mode that would resolve this
>>>> > problem? My guess is that the last putInt should be a putIntOpaque.
>>>>
>>>> Yes, in current Hotspot, opaque would effectively pin the access in
>>>> place, so it would be exposed to hardware in an order closer to the
>>>> original source code order. Then it is up to the hardware to decide
>>>> when to perform the store. But as I said above, I'll be surprised if
>>>> it actually matters.
>>>>
>>>> Thanks,
>>>> -Aleksey
>>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "mechanical-sympathy" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To view this discussion, visit
>>> https://groups.google.com/d/msgid/mechanical-sympathy/CAGuAWdAsWprk9BK46iJdZ_w1wPBcM4OCkDgCLTAP98B4VCPscw%40mail.gmail.com
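[A footnote to the ExampleTwo discussion above: since JDK 9 the same release/acquire publication over a mapped buffer can be written without Unsafe or raw addresses, using a byte-buffer view VarHandle. This is only a sketch of the single-class pattern — the class name and constructor are invented for illustration, the buffer would really come from `fc.map(...)`, and it assumes the flag offset is 4-byte aligned, which the atomic access modes require. Whether this suffices across processes is exactly the question Peter answered above.]

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Same flag-publication pattern as ExampleTwo, expressed with the standard
// VarHandle view over a ByteBuffer instead of Unsafe plus a raw address.
class ExampleTwoVarHandle {
    // A VarHandle that views the buffer's contents as ints, indexed by
    // byte offset; acquire/release modes require aligned offsets.
    static final VarHandle INT =
        MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.nativeOrder());

    final ByteBuffer dataBuffer;  // in the real code, a MappedByteBuffer
    final int valueOffset = 0;
    final int readyOffset = 4;

    ExampleTwoVarHandle(ByteBuffer buffer) {
        this.dataBuffer = buffer;
    }

    void threadOne() {
        dataBuffer.putInt(valueOffset, 100);         // plain store of the payload
        INT.setRelease(dataBuffer, readyOffset, 1);  // releasing store of the flag
    }

    void threadTwo() {
        while ((int) INT.getAcquire(dataBuffer, readyOffset) == 0)
            ;                                        // spin until the flag is seen
        assert dataBuffer.getInt(valueOffset) == 100; // payload now visible
    }
}
```

The view VarHandle also answers the earlier follow-up question in API terms: swapping `setRelease`/`getAcquire` for `setVolatile`/`getVolatile` on the same handle is the entire difference between the two versions of ExampleTwo.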
