On 4/28/23 09:29, Palmer Dabbelt wrote:
On Fri, 28 Apr 2023 09:14:00 PDT (-0700), jeffreya...@gmail.com wrote:


On 4/27/23 10:22, Patrick O'Neill wrote:
This patchset aims to make the RISCV atomics implementation stronger
than the recommended mapping present in table A.6 of the ISA manual.
https://github.com/riscv/riscv-isa-manual/blob/c7cf84547b3aefacab5463add1734c1602b67a49/src/memory.tex#L1083-L1157

Context
---------
GCC defined RISC-V mappings [1] before the Memory Model task group
finalized their work and provided the ISA Manual Table A.6/A.7 mappings[2].

For at least a year now, we've known that the mappings were different,
but it wasn't clear if these unique mappings had correctness issues.

Andrea Parri found an issue with the GCC mappings, showing that
atomic_compare_exchange_weak_explicit(-,-,-,release,relaxed) mappings do
not enforce release ordering guarantees. (Meaning the GCC mappings have
a correctness issue).
https://inbox.sourceware.org/gcc-patches/Y1GbJuhcBFpPGJQ0@andrea/
Right.  I recall this discussion, but thanks for the back reference.

Yep, and it's an important one: that's why we're calling the change a bug fix and dropping the current GCC mappings.  If we didn't have the bug we'd be talking about an ABI break, and since the GCC mappings predate the ISA mappings we'd likely need an additional compatibility mode.

So I guess we're lucky that we have a concurrency bug.  I think it's the first time I've said that ;)

Why not A.6?
---------
We can update our mappings now, so the obvious choice would be to
implement Table A.6 (what LLVM implements/ISA manual recommends).

The reason why that isn't the best path forward for GCC is due to a
proposal by Hans Boehm to add L{d|w|b|h}.aq/rl and S{d|w|b|h}.aq/rl.

For context, there is discussion about fast-tracking the addition of
these instructions. The RISCV architectural review committee supports
adopting a "new and common atomics ABI for gcc and LLVM toochains ...
that assumes the addition of the preceding instructions”. That common
ABI is likely to be A.7.
   https://lists.riscv.org/g/tech-privileged/message/1284

Transitioning from A.6 to A.7 will cause an ABI break. We can hedge
against that risk by emitting a conservative fence after SEQ_CST stores
to make the mapping compatible with both A.6 and A.7.
So I like that we can have compatible sequences across A.6 and A.7.  Of
course the concern is performance ;-)



What does a mapping compatible with both A.6 & A.7 look like?
---------
It is exactly the same as Table A.6, but SEQ_CST stores have a trailing
fence rw,rw. It's strictly stronger than Table A.6.
Right.  So my worry here is silicon that is either already available or
coming online shortly.   Those implementations simply aren't going to be
able to use the A.7 mapping, so they pay a penalty.  Does it make sense
to have the compatibility fences conditional?

IIRC this was discussed somewhere in some thread, but I think there's really three ABIs that could be implemented here (ignoring the current GCC mappings as they're broken):

* ABI compatible with the current mappings in the ISA manual (A.6).   This will presumably perform best on extant hardware, given that it's  what the words in the PDF say to do. * ABI compatible with the proposed mappings for the ISA manual (A.7).   This may perform better on new hardware. * ABI compatible with both A.6 and A.7.  This is likely slow on both new  and old hardware, but allows cross-linking.  If there's no performance  issues this would be the only mode we need, but that seems unlikely.

IMO those should be encoded somewhere in the ELF.  I'd just do it as two bits in the header, but last time I proposed header bits the psABI folks wanted to do something different.  I don't think where we encode this matters all that much, but if we're doing to treat these as real long-term ABIs we should have some way to encode that.

There's also the orthogonal axis of whether we use the new instructions.  Those aren't in specs yet so I think we can hold off on them for a bit, but they're the whole point of doing the ABI break so we should at least think them over.  I think we're OK because we've just split out the ABI from the ISA here, but I'm not sure if I'm missing something.

Now that I wrote that, though, I remember talking to Patrick about it and we drew a bunch of stuff on the whiteboard and then got confused.  So sorry if I'm just out of the loop here...
This looks up-to-date with how I understand it.


Benchmark Interpretation
--------
As expected, out of order machines are significantly faster with the
REL_STORE mappings. Unexpectedly, the in-order machines are
significantly slower with REL_STORE rather than REL_STORE_FENCE.
Yea, that's a bit of a surprise.


Most machines in the wild are expected to use Table A.7 once the
instructions are introduced.
Incurring this added cost now will make it easier for compiled RISC-V
binaries to transition to the A.7 memory model mapping.

The performance benefits of moving to A.7 can be more clearly seen using
an almost-all-load microbenchmark (included on page 3 of Hans’
proposal). The code for that microbenchmark is attached below [5].
https://lists.riscv.org/g/tech-unprivileged/attachment/382/0/load-acquire110422.pdf
   https://lists.riscv.org/g/tech-unprivileged/topic/92916241
Yea.  I'm not questioning the value of the new instructions that are on
the horizon, just the value of trying to make everything A.7 compatible.



Conformance test cases notes
--------
The conformance tests in this patch are a good sanity check but do not
guarantee exactly following Table A.6. It checks that the right
instructions are emitted (ex. fence rw,r) but not the order of those
instructions.
Note there is a way to check ordering as well.  You might look at the
check-function-bodies approach.  I think there are some recent examples
in the gcc risc-v specific tests.
I'll check that out - thank you!

LLVM mapping notes
--------
LLVM emits corresponding fences for atomic_signal_fence instructions.
This seems to be an oversight since AFAIK atomic_signal_fence acts as a
compiler directive. GCC does not emit any fences for atomic_signal_fence
instructions.
This starts to touch on a larger concern.  Specifically I'd really like
the two compilers to be compatible in terms of the code they generate
for the various atomics.

What I worry about is code being written (by design or accident) that is
dependent on the particular behavior of one compiler and then if that
code gets built with the other compiler, and we end up different
behavior.  Worse yet, if/when this happens, it's likely to be tough to
expose, reproduce & debug.
Agreed.

I'll open an issue with LLVM and see what they have to say about this
particular behavior. Ideally we'd have perfectly compatible compilers
(for atomic ops) by the end of this :)

AFAICT GCC hasn't ever been emitting fences for these instructions.
(& This behavior isn't touched by the patchset).

Do you have any sense of where Clang/LLVM is going to go WRT providing
an A.6 mapping that is compatible with A.7 by using the additional fences?
I don't - Based on how LLVM initially waited for A.6, I'd speculate that
they would wait for an official compatibility mapping to be added the
PSABI doc or ISA manual before implementing it.

I imagine there will be demand for an ABI-compatible mapping so the ABI
break has a transition plan, but the timeline for LLVM isn't clear to me.

Patrick

Jeff

Reply via email to