On 4/18/23 08:28, Patrick O'Neill wrote:
RISC-V has no support for subword atomic operations; code currently
generates libatomic library calls.
This patch changes the default behavior to inline subword atomic calls
(using the same logic as the existing library call).
Behavior can be specified using the -minline-atomics and
-mno-inline-atomics command line flags.
gcc/libgcc/config/riscv/atomic.c has the same logic implemented in asm.
This will need to stay for backwards compatibility and the
-mno-inline-atomics flag.
2023-04-18 Patrick O'Neill <patr...@rivosinc.com>
PR target/104338
* riscv-protos.h: Add helper function stubs.
* riscv.cc: Add helper functions for subword masking.
* riscv.opt: Add command-line flag.
* sync.md: Add masking logic and inline asm for fetch_and_op,
fetch_and_nand, CAS, and exchange ops.
* invoke.texi: Add blurb regarding command-line flag.
* inline-atomics-1.c: New test.
* inline-atomics-2.c: Likewise.
* inline-atomics-3.c: Likewise.
* inline-atomics-4.c: Likewise.
* inline-atomics-5.c: Likewise.
* inline-atomics-6.c: Likewise.
* inline-atomics-7.c: Likewise.
* inline-atomics-8.c: Likewise.
* atomic.c: Add reference to duplicate logic.
So for others who may be interested. The motivation here is that for a
sub-word atomic we currently have to explicitly link in libatomic or we
get undefined symbols.
This is particularly problematical for the distros because we're one of
the few (only?) architectures supported by the distros that require
linking in libatomic for these cases. THe distros don't want to adjust
each affected packages and be stuck carrying that change forward or
negotiating with all the relevant upstreams. The distros might tackle
this problem by porting this patch into their compiler tree which has
its own set of problems with long term maintenance.
The net is from a usability standpoint it's best if we get this problem
addressed and backported to our gcc-13 RISC-V coordination branch.
We had held this up pending resolution of some other issues in the
atomics space. In retrospect that might have been a mistake.
So with that background... Here we go...
+/* Helper function for extracting a subword from memory. */
+
+void
+riscv_subword_address (rtx mem, rtx *aligned_mem, rtx *shift, rtx *mask,
+ rtx *not_mask)
So I'd expand on that comment. The idea is we would like someone
working in the backend to be able to read the function comment and have
a reasonable sense of what the function does as well as the inputs and
return value. So perhaps something like this:
/* Given memory reference MEM, expand code to compute the aligned
memory address, shift and mask values and store them into
*ALIGNED_MEM, *SHIFT, *MASK and *NOT_MASK. */
Or something like that.
+{
+ /* Align the memory addess to a word. */
s/addess/address/
+ rtx addr = force_reg (Pmode, XEXP (mem, 0));
+
+ rtx aligned_addr = gen_reg_rtx (Pmode);
+ emit_move_insn (aligned_addr, gen_rtx_AND (Pmode, addr,
+ gen_int_mode (-4, Pmode)));
So rather than -4 as a magic number, GET_MODE_MASK would be better.
That may result in needing to rewrap this code. I'd bring the
gen_rtx_AND down on a new line, aligned with aligned_addr.
Presumably using SImode is intentional here rather than wanting to use
word_mode which would be SImode for rv32 and DImode for rv64? I'm going
to work based on that assumption, but if it isn't there's more work to
do to generalize this code.
+
+ /* Calculate the shift amount. */
+ emit_move_insn (*shift, gen_rtx_AND (SImode, gen_lowpart (SImode, addr),
+ gen_int_mode (3, SImode)));
+ emit_move_insn (*shift, gen_rtx_ASHIFT (SImode, *shift,
+ gen_int_mode(3, SImode)));
Formatting nit. space after before open paren in the gen_int_mode call
on that last line above. This minor goof shows up in various places,
please review the patch as a whole looking for similar nits.
+
+ /* Calculate the mask. */
+ int unshifted_mask;
+ if (GET_MODE (mem) == QImode)
+ unshifted_mask = 0xFF;
+ else
+ unshifted_mask = 0xFFFF;
Can you just use GET_MASK_MODE here which should simplify this to
something like
unshifted_mask = GET_MODE_MASK (GET_MODE (mem));
+
+(define_expand "atomic_fetch_nand<mode>"
+ [(set (match_operand:SHORT 0 "register_operand" "=&r")
+ (match_operand:SHORT 1 "memory_operand" "+A"))
+ (set (match_dup 1)
+ (unspec_volatile:SHORT
+ [(not:SHORT (and:SHORT (match_dup 1)
+ (match_operand:SHORT 2 "reg_or_0_operand"
"rJ")))
+ (match_operand:SI 3 "const_int_operand")] ;; model
+ UNSPEC_SYNC_OLD_OP_SUBWORD))]
+ "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
Just a note, constraints aren't necessary for a define_expand. They
don't hurt anything though. They do document expectations, but then you
have to maintain them over time. I'm OK leaving them, mostly wanted to
make sure you're aware they aren't strictly necessary for a define_expand.
+
+(define_expand "atomic_fetch_<atomic_optab><mode>"
+ [(set (match_operand:SHORT 0 "register_operand" "=&r") ;; old
value at mem
+ (match_operand:SHORT 1 "memory_operand" "+A")) ;;
mem location
+ (set (match_dup 1)
+ (unspec_volatile:SHORT
+ [(any_atomic:SHORT (match_dup 1)
+ (match_operand:SHORT 2 "reg_or_0_operand" "rJ")) ;; value
for op
+ (match_operand:SI 3 "const_int_operand")] ;; model
+ UNSPEC_SYNC_OLD_OP_SUBWORD))]
+ "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+{
+ /* We have no QImode/HImode atomics, so form a mask, then use
+ subword_atomic_fetch_strong_<mode> to implement a LR/SC version of the
+ operation. */
+
+ /* Logic duplicated in gcc/libgcc/config/riscv/atomic.c for use when inlining
+ is disabled */
Makes me wonder if we should expose builtins that could be used by
atomic.c rather than having the logic open coded in two places. That
could be a follow-up in my opinion rather than a blocker for this patch.
+(define_expand "atomic_compare_and_swap<mode>"
+ [(match_operand:SI 0 "register_operand" "") ;; bool output
+ (match_operand:SHORT 1 "register_operand" "") ;; val output
+ (match_operand:SHORT 2 "memory_operand" "") ;; memory
+ (match_operand:SHORT 3 "reg_or_0_operand" "") ;; expected value
+ (match_operand:SHORT 4 "reg_or_0_operand" "") ;; desired value
+ (match_operand:SI 5 "const_int_operand" "") ;; is_weak
+ (match_operand:SI 6 "const_int_operand" "") ;; mod_s
+ (match_operand:SI 7 "const_int_operand" "")] ;; mod_f
+ "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
I nearly suggested you use this form for the earlier expanders where the
RTL in the expander is ultimately thrown away because we generate all
the RTL we need in the C fragment and finish with a "DONE" tag.
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-1.c
b/gcc/testsuite/gcc.target/riscv/inline-atomics-1.c
new file mode 100644
index 00000000000..5c5623d9b2f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mno-inline-atomics" } */
+/* { dg-message "note: '__sync_fetch_and_nand' changed semantics in GCC 4.4"
"fetch_and_nand" { target *-*-* } 0 } */
+/* { dg-final { scan-assembler "\tcall\t__sync_fetch_and_add_1" } } */
+/* { dg-final { scan-assembler "\tcall\t__sync_fetch_and_nand_1" } } */
+/* { dg-final { scan-assembler "\tcall\t__sync_bool_compare_and_swap_1" } } */
+
+char foo;
+char bar;
+char baz;
+
+int
+main ()
+{
+ __sync_fetch_and_add(&foo, 1);
+ __sync_fetch_and_nand(&bar, 1);
+ __sync_bool_compare_and_swap (&baz, 1, 2);
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-2.c
b/gcc/testsuite/gcc.target/riscv/inline-atomics-2.c
new file mode 100644
index 00000000000..fdce7a5d71f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-2.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* Verify that subword atomics do not generate calls. */
+/* { dg-options "-minline-atomics" } */
+/* { dg-message "note: '__sync_fetch_and_nand' changed semantics in GCC 4.4"
"fetch_and_nand" { target *-*-* } 0 } */
+/* { dg-final { scan-assembler-not "\tcall\t__sync_fetch_and_add_1" } } */
+/* { dg-final { scan-assembler-not "\tcall\t__sync_fetch_and_nand_1" } } */
+/* { dg-final { scan-assembler-not "\tcall\t__sync_bool_compare_and_swap_1" }
} */
Note that you can #include another test file. When you do that the
dg-directives in the #included file are ignored. Meaning that you can
share test code, but use different dg-directives. So for inline-2.c,
you can have the dg-* directives above, followed by:
#include "inline-atomics-1.c"
Overall it looks pretty close to ready.
Jeff