Torvald, thank you for your input, but I think this discussion is getting a little pointless. There is nothing else I can add, since the GCC folks are reluctant to make this change anyway.

In my opinion, there is no compelling reason against such an implementation (it is perfectly consistent with the standard; support for read-only memory is not guaranteed for atomic_load anyway). Even the binary compatibility concern that was mentioned is unlikely to be an issue if this is implemented as I described. And finally, this is something that can actually be useful in practice (at least as far as I can judge from my experience).

By the way, this issue has already been raised multiple times over the last couple of years by different people who actually use it in various real projects (the bugs were eventually closed as 'INVALID'). All of the described challenges are purely technical and can easily be resolved. Moreover, clang/llvm chose this implementation, and it seems very logical and non-confusing to me. It certainly makes sense to expose hardware capabilities through standard interfaces whenever possible.
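To make the read-only-memory point concrete, here is a minimal sketch of a 16-byte atomic load built on a double-width compare-and-swap. It is only an illustration under stated assumptions, not GCC's or clang's actual code: the names `u128_t`, `cas128` and `load128` are hypothetical, the x86-64 path assumes a CPU with CMPXCHG16B, and other targets get a plain global-spinlock stand-in so that the example still builds.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 16-byte value; name and layout are illustrative only. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} __attribute__((aligned(16))) u128_t;

#if defined(__x86_64__)
/* Double-width CAS via LOCK CMPXCHG16B (assumes the CPU supports it).
 * On failure, *expected is overwritten with the value found in *ptr. */
static inline bool cas128(u128_t *ptr, u128_t *expected, u128_t desired)
{
    bool ok;
    __asm__ __volatile__(
        "lock cmpxchg16b %1\n\t"
        "sete %0"
        : "=q"(ok), "+m"(*ptr), "+a"(expected->lo), "+d"(expected->hi)
        : "b"(desired.lo), "c"(desired.hi)
        : "cc", "memory");
    return ok;
}
#else
/* Stand-in for other targets: a single global spinlock (not lock-free). */
#include <stdatomic.h>
static atomic_flag cas128_lock = ATOMIC_FLAG_INIT;

static inline bool cas128(u128_t *ptr, u128_t *expected, u128_t desired)
{
    while (atomic_flag_test_and_set_explicit(&cas128_lock, memory_order_acquire)) {
        /* spin until the lock is free */
    }
    bool ok = (ptr->lo == expected->lo) && (ptr->hi == expected->hi);
    if (ok)
        *ptr = desired;
    else
        *expected = *ptr;
    atomic_flag_clear_explicit(&cas128_lock, memory_order_release);
    return ok;
}
#endif

/* Atomic 16-byte load expressed as a CAS. If *ptr happens to equal the
 * guess, the same value is stored back; otherwise the CAS fails and the
 * current value is copied into the guess. On x86-64 the instruction needs
 * write access either way, so this load faults on read-only memory. */
static inline u128_t load128(u128_t *ptr)
{
    u128_t guess = {0, 0};
    cas128(ptr, &guess, guess);
    return guess;
}
```

A caller would use `load128(&shared)` to read a consistent snapshot and `cas128(&shared, &old, new_value)` in a retry loop to update it. The cost is the one raised in this thread: such a load is a read-modify-write, so it is not cheaper than a store.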
For my projects, I will simply fall back to my own implementation using inline assembly (at least for now) because, unfortunately, it is the only thing that is guaranteed to work outside of clang/llvm in the foreseeable future. (By the way, the __sync functions have some limitations and do not look like an attractive option either.)

On Tuesday, February 27, 2018 11:21 AM, Torvald Riegel <trie...@redhat.com> wrote:

On Tue, 2018-02-27 at 13:16 +0000, Ruslan Nikolaev via gcc wrote:
> > 3) Torvald pointed out further considerations such as users expecting
> > lock-free atomic loads to be faster than stores.
>
> Is it even true? Is it faster to use some global lock (implemented through
> RMW) than a single RMW operation? If you use this global lock, you will not
> get loads faster than stores.

If GCC declares a type as lock-free, atomic loads on this type will be
natively supported through some sort of load instruction. That means
they are faster than stores under concurrent accesses, in particular
when there are concurrent atomic loads (for all major HW we care about).
If there is no natively supported atomic load, GCC will not declare the
type to be lock-free. Nobody made a statement about the performance of
locks vs. RMWs.
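Which widths a given compiler declares lock-free can be checked directly. The following is a small sketch using the GCC/Clang builtin `__atomic_always_lock_free`, which is evaluated at compile time. The type `pair128_t` and the two helper functions are hypothetical names for this example, and the answer for the 16-byte case depends on the compiler, its version and the target flags, so no particular result is claimed here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical double-width type, e.g. a pointer plus a version tag. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} pair128_t;

/* True if the compiler guarantees lock-free atomics for pointer-sized
 * objects. The second argument 0 means "assume typical alignment". */
static inline bool word_always_lock_free(void)
{
    return __atomic_always_lock_free(sizeof(void *), 0);
}

/* Same query for the 16-byte type. A compiler that does not declare the
 * type lock-free returns false here, even on hardware with a
 * double-width CAS. */
static inline bool dword_always_lock_free(void)
{
    return __atomic_always_lock_free(sizeof(pair128_t), 0);
}
```

Printing both results with the compilers and flags of interest shows how each one classifies 16-byte atomics on the same machine.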