On 4/18/23 08:28, Patrick O'Neill wrote:
RISC-V has no support for subword atomic operations; code currently
generates libatomic library calls.

This patch changes the default behavior to inline subword atomic calls
(using the same logic as the existing library call).
Behavior can be specified using the -minline-atomics and
-mno-inline-atomics command line flags.

gcc/libgcc/config/riscv/atomic.c has the same logic implemented in asm.
This will need to stay for backwards compatibility and the
-mno-inline-atomics flag.

2023-04-18 Patrick O'Neill <patr...@rivosinc.com>

        PR target/104338
        * riscv-protos.h: Add helper function stubs.
        * riscv.cc: Add helper functions for subword masking.
        * riscv.opt: Add command-line flag.
        * sync.md: Add masking logic and inline asm for fetch_and_op,
        fetch_and_nand, CAS, and exchange ops.
        * invoke.texi: Add blurb regarding command-line flag.
        * inline-atomics-1.c: New test.
        * inline-atomics-2.c: Likewise.
        * inline-atomics-3.c: Likewise.
        * inline-atomics-4.c: Likewise.
        * inline-atomics-5.c: Likewise.
        * inline-atomics-6.c: Likewise.
        * inline-atomics-7.c: Likewise.
        * inline-atomics-8.c: Likewise.
        * atomic.c: Add reference to duplicate logic.
So for others who may be interested. The motivation here is that for a sub-word atomic we currently have to explicitly link in libatomic or we get undefined symbols.

This is particularly problematical for the distros because we're one of the few (only?) architectures supported by the distros that require linking in libatomic for these cases. THe distros don't want to adjust each affected packages and be stuck carrying that change forward or negotiating with all the relevant upstreams. The distros might tackle this problem by porting this patch into their compiler tree which has its own set of problems with long term maintenance.

The net is from a usability standpoint it's best if we get this problem addressed and backported to our gcc-13 RISC-V coordination branch.

We had held this up pending resolution of some other issues in the atomics space. In retrospect that might have been a mistake.

So with that background...  Here we go...

+/* Helper function for extracting a subword from memory. */
+
+void
+riscv_subword_address (rtx mem, rtx *aligned_mem, rtx *shift, rtx *mask,
+                      rtx *not_mask)
So I'd expand on that comment. The idea is we would like someone working in the backend to be able to read the function comment and have a reasonable sense of what the function does as well as the inputs and return value. So perhaps something like this:

/* Given memory reference MEM, expand code to compute the aligned
   memory address, shift and mask values and store them into
   *ALIGNED_MEM, *SHIFT, *MASK and *NOT_MASK.  */

Or something like that.


+{
+  /* Align the memory addess to a word.  */
s/addess/address/


+  rtx addr = force_reg (Pmode, XEXP (mem, 0));
+
+  rtx aligned_addr = gen_reg_rtx (Pmode);
+  emit_move_insn (aligned_addr,  gen_rtx_AND (Pmode, addr,
+                                             gen_int_mode (-4, Pmode)));
So rather than -4 as a magic number, GET_MODE_MASK would be better. That may result in needing to rewrap this code. I'd bring the gen_rtx_AND down on a new line, aligned with aligned_addr.

Presumably using SImode is intentional here rather than wanting to use word_mode which would be SImode for rv32 and DImode for rv64? I'm going to work based on that assumption, but if it isn't there's more work to do to generalize this code.



+
+  /* Calculate the shift amount.  */
+  emit_move_insn (*shift, gen_rtx_AND (SImode, gen_lowpart (SImode, addr),
+                                      gen_int_mode (3, SImode)));
+  emit_move_insn (*shift, gen_rtx_ASHIFT (SImode, *shift,
+                                         gen_int_mode(3, SImode)));


Formatting nit. space after before open paren in the gen_int_mode call on that last line above. This minor goof shows up in various places, please review the patch as a whole looking for similar nits.



+
+  /* Calculate the mask.  */
+  int unshifted_mask;
+  if (GET_MODE (mem) == QImode)
+    unshifted_mask = 0xFF;
+  else
+    unshifted_mask = 0xFFFF;
Can you just use GET_MASK_MODE here which should simplify this to something like

unshifted_mask = GET_MODE_MASK (GET_MODE (mem));








+
+(define_expand "atomic_fetch_nand<mode>"
+  [(set (match_operand:SHORT 0 "register_operand" "=&r")
+       (match_operand:SHORT 1 "memory_operand" "+A"))
+   (set (match_dup 1)
+       (unspec_volatile:SHORT
+         [(not:SHORT (and:SHORT (match_dup 1)
+                                (match_operand:SHORT 2 "reg_or_0_operand" 
"rJ")))
+          (match_operand:SI 3 "const_int_operand")] ;; model
+        UNSPEC_SYNC_OLD_OP_SUBWORD))]
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
Just a note, constraints aren't necessary for a define_expand. They don't hurt anything though. They do document expectations, but then you have to maintain them over time. I'm OK leaving them, mostly wanted to make sure you're aware they aren't strictly necessary for a define_expand.






+
+(define_expand "atomic_fetch_<atomic_optab><mode>"
+  [(set (match_operand:SHORT 0 "register_operand" "=&r")             ;; old 
value at mem
+       (match_operand:SHORT 1 "memory_operand" "+A"))                    ;; 
mem location
+   (set (match_dup 1)
+       (unspec_volatile:SHORT
+         [(any_atomic:SHORT (match_dup 1)
+                    (match_operand:SHORT 2 "reg_or_0_operand" "rJ")) ;; value 
for op
+          (match_operand:SI 3 "const_int_operand")]                ;; model
+        UNSPEC_SYNC_OLD_OP_SUBWORD))]
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
+{
+  /* We have no QImode/HImode atomics, so form a mask, then use
+     subword_atomic_fetch_strong_<mode> to implement a LR/SC version of the
+     operation. */
+
+  /* Logic duplicated in gcc/libgcc/config/riscv/atomic.c for use when inlining
+     is disabled */
Makes me wonder if we should expose builtins that could be used by atomic.c rather than having the logic open coded in two places. That could be a follow-up in my opinion rather than a blocker for this patch.


+(define_expand "atomic_compare_and_swap<mode>"
+  [(match_operand:SI 0 "register_operand" "")    ;; bool output
+   (match_operand:SHORT 1 "register_operand" "") ;; val output
+   (match_operand:SHORT 2 "memory_operand" "")   ;; memory
+   (match_operand:SHORT 3 "reg_or_0_operand" "") ;; expected value
+   (match_operand:SHORT 4 "reg_or_0_operand" "") ;; desired value
+   (match_operand:SI 5 "const_int_operand" "")   ;; is_weak
+   (match_operand:SI 6 "const_int_operand" "")   ;; mod_s
+   (match_operand:SI 7 "const_int_operand" "")]  ;; mod_f
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
I nearly suggested you use this form for the earlier expanders where the RTL in the expander is ultimately thrown away because we generate all the RTL we need in the C fragment and finish with a "DONE" tag.


diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-1.c 
b/gcc/testsuite/gcc.target/riscv/inline-atomics-1.c
new file mode 100644
index 00000000000..5c5623d9b2f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-mno-inline-atomics" } */
+/* { dg-message "note: '__sync_fetch_and_nand' changed semantics in GCC 4.4" 
"fetch_and_nand" { target *-*-* } 0 } */
+/* { dg-final { scan-assembler "\tcall\t__sync_fetch_and_add_1" } } */
+/* { dg-final { scan-assembler "\tcall\t__sync_fetch_and_nand_1" } } */
+/* { dg-final { scan-assembler "\tcall\t__sync_bool_compare_and_swap_1" } } */
+
+char foo;
+char bar;
+char baz;
+
+int
+main ()
+{
+  __sync_fetch_and_add(&foo, 1);
+  __sync_fetch_and_nand(&bar, 1);
+  __sync_bool_compare_and_swap (&baz, 1, 2);
+}
diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-2.c 
b/gcc/testsuite/gcc.target/riscv/inline-atomics-2.c
new file mode 100644
index 00000000000..fdce7a5d71f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-2.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* Verify that subword atomics do not generate calls.  */
+/* { dg-options "-minline-atomics" } */
+/* { dg-message "note: '__sync_fetch_and_nand' changed semantics in GCC 4.4" 
"fetch_and_nand" { target *-*-* } 0 } */
+/* { dg-final { scan-assembler-not "\tcall\t__sync_fetch_and_add_1" } } */
+/* { dg-final { scan-assembler-not "\tcall\t__sync_fetch_and_nand_1" } } */
+/* { dg-final { scan-assembler-not "\tcall\t__sync_bool_compare_and_swap_1" } 
} */
Note that you can #include another test file. When you do that the dg-directives in the #included file are ignored. Meaning that you can share test code, but use different dg-directives. So for inline-2.c, you can have the dg-* directives above, followed by:

#include "inline-atomics-1.c"

Overall it looks pretty close to ready.

Jeff

Reply via email to