Torvald, I definitely do not want to insist on this design choice, but it makes 
sense to at least seriously consider it given the concerns I described, 
especially because the IFUNC in libatomic already redirects to cmpxchg16b, so 
going through libatomic just adds extra cost and indirection. Quite frankly, I 
do not even see any serious problem here with respect to binary compatibility. 
Even if cmpxchg16b was not used on some platforms outside Linux, old binaries 
will go to libatomic, which can now be updated to simply use cmpxchg16b. (Even 
statically linked binaries should not be an issue: they will not have any 
direct interaction with newer binaries.)
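To make the read-only memory concern concrete: when the only double-width atomic primitive is a compare-and-swap such as cmpxchg16b, a "load" has to be a CAS whose new value equals its expected value. The sketch below assumes C11 <stdatomic.h> and uses a hypothetical helper `cas_load`. It is shown at 8 bytes so that it builds without libatomic. It illustrates the pattern and is not libatomic's actual code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: reads *obj using only compare-and-swap, the same
 * pattern a cmpxchg16b-based 16-byte load relies on (shown at 8 bytes). */
static uint64_t cas_load(_Atomic uint64_t *obj)
{
    uint64_t expected = 0;
    /* If *obj == 0, the CAS succeeds and stores 0 back (a real store).
     * Otherwise it fails and copies the current value into expected.
     * On x86-64, LOCK CMPXCHG issues a write cycle in both cases, so the
     * memory must be writable even though its value never changes. */
    atomic_compare_exchange_strong(obj, &expected, 0);
    return expected;
}
```

This is why such a load faults on a page mapped read-only, while a plain load of the same object would not.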

> Not getting the performance usually associated with atomic loads can be
> a big problem for code that tries to be portable.

I do not think it is a common use case anyway. How often is atomic_load used on 
double-width objects? If a programmer needs some guarantees and does not care 
about lock-freedom, why not use a regular lock here? This way nothing magical 
happens. Otherwise, they may hit unexpected issues in places like signal 
handlers (which are hard to debug, since the program hangs only once in a 
while). With cmpxchg16b, the failure is at least more or less reproducible: if 
you try to use it on read-only memory, you immediately get a segfault.
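A regular lock for a double-width value could look like the following minimal sketch. It assumes POSIX threads. The type `locked_pair`, the macro `LOCKED_PAIR_INIT`, and the functions `pair_store`/`pair_load` are hypothetical names for illustration only.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical double-width value guarded by an ordinary mutex instead of
 * relying on _Atomic for a 16-byte object. */
struct locked_pair {
    pthread_mutex_t lock;
    uint64_t lo, hi;
};

#define LOCKED_PAIR_INIT { PTHREAD_MUTEX_INITIALIZER, 0, 0 }

/* Writes both halves under the lock, so readers never see a torn value. */
static void pair_store(struct locked_pair *p, uint64_t lo, uint64_t hi)
{
    pthread_mutex_lock(&p->lock);
    p->lo = lo;
    p->hi = hi;
    pthread_mutex_unlock(&p->lock);
}

/* Reads both halves consistently. pthread_mutex_lock is not
 * async-signal-safe, so this must not be called from a signal handler:
 * that is exactly the hang the lock makes explicit rather than hidden. */
static void pair_load(struct locked_pair *p, uint64_t *lo, uint64_t *hi)
{
    pthread_mutex_lock(&p->lock);
    *lo = p->lo;
    *hi = p->hi;
    pthread_mutex_unlock(&p->lock);
}
```

The point of the design is visibility: the locking is in the programmer's own code, so its restrictions are obvious instead of being hidden behind atomic_load.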

> I think I now remember why we "didn't fix" libatomic: There might be
> compiled code out there that does use the wide CAS, so changing
> libatomic from the status quo to using its internal locks could break
> programs.

Well, this already happens on Linux with glibc, and nothing breaks there. For 
other architectures, it would be good to implement the same, so that 
consistent behavior is observed everywhere.


> No, they only said that it doesn't need to be a concern for the
> standard.  Implementations have to pay attention to more things, so it
> is a concern for implementation.

Yes, but the only problem I see is that such an object is currently placed in 
.rodata when it is declared const. This is easy to resolve: just do not place 
_Atomic objects larger than 8 bytes there. Then also clarify that a programmer 
cannot safely cast an arbitrary object that may reside in .rodata and pass it 
to atomic_load. This needs to be addressed anyway, as the provided example 
already segfaults on x86-64 Linux, even with the redirection to libatomic.

> It's not "visible" in the abstract machine under some setting of the
> as-if rule.  But it is definitely visible in an implementation in which
> the effects of read-only memory are visible (see my example of mapping
> memory from another process read-only so as to read data from that
> process).

True, but the behavior for read-only memory is not defined anyway, so portable 
code cannot make any assumptions about it.

-- Ruslan