Torvald, I definitely do not want to insist on this design choice, but it makes sense to at least seriously consider it given the concerns I described, especially because the IFUNC in libatomic already redirects to cmpxchg16b, so the library call just adds extra cost and indirection. Quite frankly, I do not even see any serious problem here with respect to binary compatibility. Even if cmpxchg16b was not used on some platforms outside Linux, old binaries will go to libatomic, which can now be updated to simply use cmpxchg16b. (Even statically linked binaries should not be an issue: they will not have any direct interaction with newer binaries.)
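To make the point about cmpxchg16b-based loads concrete, here is a minimal sketch (not taken from libatomic) of what a load built on compare-and-swap amounts to. The name load_via_cas is hypothetical, and a 64-bit object stands in for the 16-byte one so that the example builds without -mcx16 or -latomic; I am assuming the 16-byte case follows the same pattern.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a libatomic function): read *obj using only
 * a compare-and-swap, the way a CAS-only wide load has to.
 * A 64-bit object is used as a stand-in for the 16-byte case.
 */
static uint64_t load_via_cas(_Atomic uint64_t *obj)
{
    uint64_t expected = 0;

    assert(obj != NULL);
    /*
     * If *obj == 0, the CAS succeeds and stores 0 back, so the value
     * is unchanged. Otherwise the CAS fails and copies the current
     * value of *obj into 'expected'. Either way 'expected' now holds
     * the value that was in *obj.
     */
    atomic_compare_exchange_strong(obj, &expected, 0);
    return expected;
}
```

Note that the parameter cannot be a pointer to const: the compare-and-swap needs write access to the memory even when it only "reads", which is why an object that ends up in .rodata faults on x86-64.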
> Not getting the performance usually associated with atomic loads can be
> a big problem for code that tries to be portable.

I do not think it is a common use case anyway. How often is atomic_load used on double-width objects? If a programmer needs some guarantees and does not care about lock-freedom, why not use a regular lock here? This way nothing magical happens. Otherwise, he may hit unexpected issues in places like signal handlers (which are hard to debug, since the program will hang only once in a while). With cmpxchg16b, the failure is at least more or less reproducible: if you try to use it on read-only memory, you will immediately get a segfault.

> I think I now remember why we "didn't fix" libatomic: There might be
> compiled code out there that does use the wide CAS, so changing
> libatomic from the status quo to using its internal locks could break
> programs.

Well, the redirection to cmpxchg16b already happens on Linux with glibc, so nothing will break there. For other architectures, it would be good to implement the same, so that consistent behavior is observed everywhere.

> No, they only said that it doesn't need to be a concern for the
> standard. Implementations have to pay attention to more things, so it
> is a concern for implementation.

Yes, but the only problem I see is that the object is currently placed in .rodata when const is used. It is easy to resolve: just do not place it there for _Atomic objects larger than 8 bytes. Then also clarify that a programmer cannot safely cast some arbitrary object that may be placed in .rodata in order to use it with atomic_load. This needs to be addressed anyway, as the provided example already segfaults on x86-64 Linux even with redirection to libatomic.

> It's not "visible" in the abstract machine under some setting of the
> as-if rule. But it is definitely visible in an implementation in which
> the effects of read-only memory are visible (see my example of mapping
> memory from another process read-only so as to read data from that
> process).
True, but it is not defined for read-only memory anyway, and no assumptions can be made in portable code.

--
Ruslan