On 4/27/23 10:22, Patrick O'Neill wrote:
This patchset aims to make the RISCV atomics implementation stronger
than the recommended mapping present in table A.6 of the ISA manual.
https://github.com/riscv/riscv-isa-manual/blob/c7cf84547b3aefacab5463add1734c1602b67a49/src/memory.tex#L1083-L1157

Context
---------
GCC defined RISC-V mappings [1] before the Memory Model task group
finalized their work and provided the ISA Manual Table A.6/A.7 mappings[2].

For at least a year now, we've known that the mappings were different,
but it wasn't clear if these unique mappings had correctness issues.

Andrea Parri found an issue with the GCC mappings, showing that
atomic_compare_exchange_weak_explicit(-,-,-,release,relaxed) mappings do
not enforce release ordering guarantees. (Meaning the GCC mappings have
a correctness issue).
   https://inbox.sourceware.org/gcc-patches/Y1GbJuhcBFpPGJQ0@andrea/
Right.  I recall this discussion, but thanks for the back reference.


Why not A.6?
---------
We can update our mappings now, so the obvious choice would be to
implement Table A.6 (what LLVM implements/ISA manual recommends).

The reason why that isn't the best path forward for GCC is due to a
proposal by Hans Boehm to add L{d|w|b|h}.aq/rl and S{d|w|b|h}.aq/rl.

For context, there is discussion about fast-tracking the addition of
these instructions. The RISCV architectural review committee supports
adopting a "new and common atomics ABI for gcc and LLVM toochains ...
that assumes the addition of the preceding instructions”. That common
ABI is likely to be A.7.
   https://lists.riscv.org/g/tech-privileged/message/1284

Transitioning from A.6 to A.7 will cause an ABI break. We can hedge
against that risk by emitting a conservative fence after SEQ_CST stores
to make the mapping compatible with both A.6 and A.7.
So I like that we can have compatible sequences across A.6 and A.7. Of course the concern is performance ;-)



What does a mapping compatible with both A.6 & A.7 look like?
---------
It is exactly the same as Table A.6, but SEQ_CST stores have a trailing
fence rw,rw. It's strictly stronger than Table A.6.
Right. So my worry here is silicon that is either already available or coming online shortly. Those implementations simply aren't going to be able to use the A.7 mapping, so they pay a penalty. Does it make sense to have the compatibility fences conditional?




Benchmark Interpretation
--------
As expected, out of order machines are significantly faster with the
REL_STORE mappings. Unexpectedly, the in-order machines are
significantly slower with REL_STORE rather than REL_STORE_FENCE.
Yea, that's a bit of a surprise.


Most machines in the wild are expected to use Table A.7 once the
instructions are introduced.
Incurring this added cost now will make it easier for compiled RISC-V
binaries to transition to the A.7 memory model mapping.

The performance benefits of moving to A.7 can be more clearly seen using
an almost-all-load microbenchmark (included on page 3 of Hans’
proposal). The code for that microbenchmark is attached below [5].
   
https://lists.riscv.org/g/tech-unprivileged/attachment/382/0/load-acquire110422.pdf
   https://lists.riscv.org/g/tech-unprivileged/topic/92916241
Yea. I'm not questioning the value of the new instructions that are on the horizon, just the value of trying to make everything A.7 compatible.



Conformance test cases notes
--------
The conformance tests in this patch are a good sanity check but do not
guarantee exactly following Table A.6. It checks that the right
instructions are emitted (ex. fence rw,r) but not the order of those
instructions.
Note there is a way to check ordering as well. You might look at the check-function-bodies approach. I think there are some recent examples in the gcc risc-v specific tests.



LLVM mapping notes
--------
LLVM emits corresponding fences for atomic_signal_fence instructions.
This seems to be an oversight since AFAIK atomic_signal_fence acts as a
compiler directive. GCC does not emit any fences for atomic_signal_fence
instructions.
This starts to touch on a larger concern. Specifically I'd really like the two compilers to be compatible in terms of the code they generate for the various atomics.

What I worry about is code being written (by design or accident) that is dependent on the particular behavior of one compiler and then if that code gets built with the other compiler, and we end up different behavior. Worse yet, if/when this happens, it's likely to be tough to expose, reproduce & debug.

Do you have any sense of where Clang/LLVM is going to go WRT providing an A.6 mapping that is compatible with A.7 by using the additional fences?


Jeff

Reply via email to