I'm very new to both off-heap allocation and the JMM, so forgive the
perhaps naive question.
Introductory material on the JMM typically presents the following
example of a correct program, assuming the two methods are executed
concurrently in different threads.
class ExampleOne {
    volatile int ready;
    int value;

    void threadOne() {
        value = 100;
        ready = 1;
    }

    void threadTwo() {
        while (ready == 0)
            ;
        assert value == 100;
    }
}
Is the following semantically equivalent when the two methods may be
run in different processes, or are additional operations necessary to
'coordinate' between two processes sharing memory (assuming JDK >= 9)?
class ExampleTwo {
    MappedByteBuffer dataBuffer;
    long dataBufferAddr;
    int valueOffset = 0;
    int readyOffset = 4;

    ExampleTwo() {
        File file = new File("foo.dat");
        FileChannel fc = new ...
        dataBuffer = fc.map(READ_WRITE, 0, 2 * Integer.BYTES);
        dataBufferAddr = Unsafe.magic(dataBuffer); // I'm actually using
            // Agrona's UnsafeBuffer to do all the magic for me
    }

    void threadOne() {
        dataBuffer.putInt(valueOffset, 100);
        Unsafe.putIntVolatile(null, dataBufferAddr + readyOffset, 1);
    }

    void threadTwo() {
        while (Unsafe.getIntVolatile(null, dataBufferAddr + readyOffset) == 0)
            ;
        assert dataBuffer.getInt(valueOffset) == 100;
    }
}
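For what it's worth, since JDK 9 the same volatile access modes can be expressed without Unsafe through a byte-buffer view VarHandle. A minimal, self-contained sketch of that approach follows; a direct buffer stands in for the mapped file so it runs as-is, and all names here are my own:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ExampleTwoVarHandle {
    // A view VarHandle provides volatile/acquire/release access modes over a
    // ByteBuffer (including a MappedByteBuffer) without resorting to Unsafe.
    static final VarHandle INT = MethodHandles.byteBufferViewVarHandle(
            int[].class, ByteOrder.nativeOrder());

    static final int VALUE_OFFSET = 0;
    static final int READY_OFFSET = 4;

    final ByteBuffer buffer;

    ExampleTwoVarHandle(ByteBuffer buffer) {
        this.buffer = buffer;
    }

    void threadOne() {
        buffer.putInt(VALUE_OFFSET, 100);          // plain store
        INT.setVolatile(buffer, READY_OFFSET, 1);  // volatile store; orders the plain store before it
    }

    boolean ready() {
        return (int) INT.getVolatile(buffer, READY_OFFSET) != 0;
    }

    public static void main(String[] args) {
        // Direct buffer in place of fc.map(...) so the sketch is self-contained.
        ExampleTwoVarHandle ex = new ExampleTwoVarHandle(ByteBuffer.allocateDirect(8));
        ex.threadOne();
        if (!ex.ready() || ex.buffer.getInt(VALUE_OFFSET) != 100)
            throw new AssertionError();
        System.out.println("ok");
    }
}
```

The same VarHandle works unchanged against the MappedByteBuffer returned by fc.map(...), provided the offsets remain naturally aligned.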
Thanks in advance,
Dan
On Tue, Feb 11, 2025 at 6:34 AM Peter Veentjer <[email protected]>
wrote:
> Thanks a lot for your answer and for the confirmation that my
> understanding is correct.
>
> On Wed, Feb 5, 2025 at 12:30 PM Aleksey Shipilev <
> [email protected]> wrote:
>
>> On 2/3/25 12:06, Peter Veentjer wrote:
>> > Imagine the following code:
>> >
>> > ... lots of writes to the buffer
>> > buffer.putInt(a_offset,a_value) (1)
>> > buffer.putRelease(b_offset,b_value) (2)
>> > releaseFence() (3)
>> > buffer.putInt(c_offset,c_value) (4)
>> >
>> > Buffer is a chunk of memory that is shared with another process and the
>> writes need to be seen in
>> > order. So when 'b' is seen, 'a' should be seen. And when 'c' is seen,
>> 'b' should be seen. There is
>> > no other synchronization.
>> >
>> > All offsets are guaranteed to be naturally aligned. All the putInts are
>> plain puts (using Unsafe).
>> >
>> > The putRelease (2) will ensure that 'a' is seen before 'b', and it will
>> ensure atomicity and
>> > visibility of 'b' (so the appropriate compiler and memory fences where
>> needed).
>> >
>> > The releaseFence (3) will ensure that b is seen before c.
>>
>> Looks to me this fence can be replaced with releasing store of "c":
>>
>> buffer.putInt(a_offset,a_value)
>> buffer.putRelease(b_offset,b_value)
>> buffer.putRelease(c_offset,c_value)
>>
>> My preference is almost always to avoid the explicit fences if you can
>> control the memory ordering
>> of the actual accesses. Using putRelease instead of explicit fence also
>> forces you think about the
>> symmetries: should all loads of "c" be performed with getAcquire to match
>> the putRelease?
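A sketch of that fence-free variant with matched release/acquire pairs, written with byte-buffer view VarHandles rather than Unsafe; the offsets and names are illustrative, not from the original code:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ReleaseChain {
    static final VarHandle INT = MethodHandles.byteBufferViewVarHandle(
            int[].class, ByteOrder.nativeOrder());

    // Hypothetical, naturally aligned offsets for 'a', 'b', 'c'.
    static final int A = 0, B = 4, C = 8;

    static void producer(ByteBuffer buf) {
        buf.putInt(A, 1);           // plain store
        INT.setRelease(buf, B, 2);  // 'a' is visible before 'b'
        INT.setRelease(buf, C, 3);  // 'b' is visible before 'c' -- no explicit fence
    }

    static int[] consumer(ByteBuffer buf) {
        // The symmetric acquire loads: observing 'c' implies 'b' and 'a' are visible.
        int c = (int) INT.getAcquire(buf, C);
        int b = (int) INT.getAcquire(buf, B);
        int a = buf.getInt(A);
        return new int[] {a, b, c};
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocateDirect(12);
        producer(buf);
        System.out.println(java.util.Arrays.toString(consumer(buf)));
    }
}
```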
>>
>> > My question is about (4). Since it is a plain store, the compiler can
>> do a ton of trickery including
>> > the delay of visibility of (4). Is my understanding correct and is
>> there anything else that could go
>> > wrong?
>>
>> The common wisdom is indeed "let's put non-plain memory access mode, so
>> the access is hopefully more
>> prompt", but I have not seen any of these effects thoroughly quantified
>> beyond "let's forbid the
>> compiler to yank our access out of the loop". Maybe I have not looked
>> hard enough.
>>
>> I suspect the delays introduced by the compiler moving code around in
>> sequential code streams are on a scale where they do not matter all that
>> much for end-to-end latency. The
>> only (?) place where code
>> movement impact could be multiplied to a macro-effect is when the memory
>> ops shift in/out/around the
>> loops. I would not be overly concerned about latency impact of reordering
>> within the short straight
>> code stream.
>>
>> You can try to measure it with producer-consumer / ping-pong style
>> benchmarks: put more memory ops
>> around (4), turn on instruction scheduler randomizers (-XX:+StressLCM
>> should be useful here, maybe
>> -XX:+StressGCM), see if there is an impact. I suspect the effect is too
>> fine-grained to be
>> accurately measured with direct timing measurements, so you'll need to
>> get creative how to measure
>> "promptness".
>>
>> > What would be the lowest memory access mode that would resolve this
>> problem? My guess is that the
>> > last putInt, should be a putIntOpaque.
>>
>> Yes, in current Hotspot, opaque would effectively pin the access in
>> place, so it would be exposed to
>> hardware in the order closer to original source code order. Then it is up
>> to hardware to see when to
>> perform the store. But as I said above, I'll be surprised if it actually
>> matters.
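For concreteness, a sketch of what opaque mode looks like with byte-buffer view VarHandles (names and offsets are illustrative): the opaque read cannot be hoisted out of the loop by the compiler, while avoiding the fences a volatile access would carry.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class OpaquePoll {
    static final VarHandle INT = MethodHandles.byteBufferViewVarHandle(
            int[].class, ByteOrder.nativeOrder());

    // Opaque mode keeps the access in place and coherent per-variable,
    // without the ordering fences of volatile.
    static void publish(ByteBuffer buf, int offset, int v) {
        INT.setOpaque(buf, offset, v);
    }

    static void awaitNonZero(ByteBuffer buf, int offset) {
        // An opaque read is not eliminated or hoisted out of the loop,
        // so the spin eventually observes the writer's store.
        while ((int) INT.getOpaque(buf, offset) == 0)
            Thread.onSpinWait();
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocateDirect(4);
        publish(buf, 0, 7);
        awaitNonZero(buf, 0); // returns immediately: the store was already published
        System.out.println("ok");
    }
}
```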
>>
>> Thanks,
>> -Aleksey
>>
--
You received this message because you are subscribed to the Google Groups
"mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion, visit
https://groups.google.com/d/msgid/mechanical-sympathy/CAO%3DkmEbpAvVtsnjCQn%2BUShRPa%2B8uJgCgGj9OvOcUTzs9gh%2BXOQ%40mail.gmail.com.