On Mon, Dec 1, 2014 at 1:51 PM, Hans Boehm <bo...@acm.org> wrote:
> Needless to say, I would clearly also like to see a simple correspondence.
>
> But this does raise the interesting question of whether put/get and
> store(..., memory_order_relaxed)/load(memory_order_relaxed) are intended to
> have similar semantics. I would guess not, in that the former don't satisfy
> coherence; accesses to the same variable can be reordered as for normal
> variable accesses, while the C++11/C11 variants do provide those guarantees.
> On most, but not all, architectures that's entirely a compiler issue; the
> hardware claims to provide that guarantee.
>
> This affects, for example, whether a variable that is only ever incremented
> by one thread can appear to another thread to decrease in value. Or if a
> reference set to a non-null value exactly once can appear to change back to
> null after appearing non-null. In my opinion, it makes sense to always
> provide coherence for atomics, since the overhead is small, and so are the
> odds of getting code relying on non-coherent racing accesses correct. But
> for ordinary variables whose accesses are not intended to race the
> trade-offs are very different.

It would be nice to pretend that ordinary Java loads and stores map perfectly to C11 relaxed loads and stores. That mapping fits well with the lack of undefined behavior for data races in Java. But in addition to the coherence gap you describe, it also fails because Java longs and doubles are not guaranteed to be accessed atomically.

I have no intuition as to whether always requiring per-variable sequential consistency (that is, coherence) would be a performance problem. Introducing an explicit relaxed memory order mode in Java, where the distinction between ordinary and relaxed accesses is smaller than in C11/C++11, would be confusing.

Despite all that, it would be clean, consistent and seemingly straightforward to simply add all of the C/C++ atomic loads, stores and fences to sun.misc.Unsafe (with the possible exception of consume, which is still under a cloud). If that works out for JDK-internal code, we can add them to a public API. Providing the full set will also help with interoperability with C code that runs in another thread and accesses the same direct buffer.
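
To make the proposed set of operations concrete, here is a rough sketch of what per-mode loads, stores and fences could look like from Java. It is only an illustration under stated assumptions: it uses java.lang.invoke.VarHandle (available in Java 9 and later) rather than sun.misc.Unsafe, the class and method names (AccessModes, storeRelaxed, and so on) are made up for this example, and treating "opaque" as the counterpart of memory_order_relaxed is an approximation, not an exact equivalence. There is deliberately no consume variant.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical illustration class; none of these names come from a real API.
final class AccessModes {
    // A plain (non-volatile) long: ordinary reads and writes of this field
    // are not guaranteed to be atomic or coherent.
    private long value;

    private static final VarHandle VALUE;
    static {
        try {
            VALUE = MethodHandles.lookup()
                    .findVarHandle(AccessModes.class, "value", long.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Approximately memory_order_relaxed: atomic (even for long) and
    // coherent per variable, but unordered with respect to other variables.
    void storeRelaxed(long v) { VALUE.setOpaque(this, v); }
    long loadRelaxed()        { return (long) VALUE.getOpaque(this); }

    // Approximately memory_order_release / memory_order_acquire.
    void storeRelease(long v) { VALUE.setRelease(this, v); }
    long loadAcquire()        { return (long) VALUE.getAcquire(this); }

    // Approximately memory_order_seq_cst (the semantics of a volatile field).
    void storeSeqCst(long v)  { VALUE.setVolatile(this, v); }
    long loadSeqCst()         { return (long) VALUE.getVolatile(this); }

    // Stand-alone fences, roughly atomic_thread_fence with the matching order.
    static void fenceAcquire() { VarHandle.acquireFence(); }
    static void fenceRelease() { VarHandle.releaseFence(); }
    static void fenceSeqCst()  { VarHandle.fullFence(); }

    public static void main(String[] args) {
        AccessModes m = new AccessModes();
        m.storeRelaxed(1L << 40);   // a value that needs more than 32 bits
        System.out.println(m.loadRelaxed() == (1L << 40));
        m.storeRelease(7L);
        System.out.println(m.loadAcquire() == 7L);
    }
}
```

A single-threaded run like the main method above only checks that the calls round-trip a value; it says nothing about ordering. The distinctions between the modes only become observable when another thread (or C code sharing the memory) races on the same location.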