Re: [PATCH v5] RISCV: Inline subword atomic ops
On 4/18/23 14:48, Patrick O'Neill wrote: On 4/18/23 09:59, Jeff Law wrote: On 4/18/23 08:28, Patrick O'Neill wrote:
...
+  rtx addr = force_reg (Pmode, XEXP (mem, 0));
+
+  rtx aligned_addr = gen_reg_rtx (Pmode);
+  emit_move_insn (aligned_addr, gen_rtx_AND (Pmode, addr,
+                                             gen_int_mode (-4, Pmode)));

So rather than -4 as a magic number, GET_MODE_MASK would be better.  That may result in needing to rewrap this code.  I'd bring the gen_rtx_AND down on a new line, aligned with aligned_addr.

IIUC GET_MODE_MASK generates masks like 0xFF for QI (for example).  It doesn't have the granularity to generate 0x3 (which we could bitwise-NOT to get -4).  I searched the GCC internals docs but couldn't find a function that generates address-alignment masks.

Yea, yea.  Big "duh" on my side.

Presumably using SImode is intentional here rather than wanting to use word_mode, which would be SImode for rv32 and DImode for rv64?  I'm going to work based on that assumption, but if it isn't there's more work to do to generalize this code.

It's been a year, but IIRC it was just simpler to implement (and to me it didn't make sense to use 64 bits for a subword op).  Is there a benefit to using 64-bit instructions when computing subwords?

Given that rv64 should have 32-bit loads/stores, I don't offhand see any advantage.

+
+(define_expand "atomic_fetch_nand<mode>"
+  [(set (match_operand:SHORT 0 "register_operand" "=&r")
+        (match_operand:SHORT 1 "memory_operand" "+A"))
+   (set (match_dup 1)
+        (unspec_volatile:SHORT
+          [(not:SHORT (and:SHORT (match_dup 1)
+                                 (match_operand:SHORT 2 "reg_or_0_operand" "rJ")))
+           (match_operand:SI 3 "const_int_operand")] ;; model
+         UNSPEC_SYNC_OLD_OP_SUBWORD))]
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"

Just a note: constraints aren't necessary for a define_expand.  They don't hurt anything, though.  They do document expectations, but then you have to maintain them over time.  I'm OK leaving them; I mostly wanted to make sure you're aware they aren't strictly necessary for a define_expand.
I wasn't aware, thanks for pointing it out!  You're referring to the "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC" (not the register constraints), right?

I was referring to the register constraints like "=&r".  They're ignored on a define_expand; a define_expand generates RTL that will be matched later by a define_insn.  The "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC" is usually referred to as the insn condition.

Thanks for reviewing!

NP.  Looking forward to v6, which I expect will be ready for inclusion.

jeff
Re: [PATCH v5] RISCV: Inline subword atomic ops
On 4/18/23 09:59, Jeff Law wrote: On 4/18/23 08:28, Patrick O'Neill wrote:
...
+  rtx addr = force_reg (Pmode, XEXP (mem, 0));
+
+  rtx aligned_addr = gen_reg_rtx (Pmode);
+  emit_move_insn (aligned_addr, gen_rtx_AND (Pmode, addr,
+                                             gen_int_mode (-4, Pmode)));

So rather than -4 as a magic number, GET_MODE_MASK would be better.  That may result in needing to rewrap this code.  I'd bring the gen_rtx_AND down on a new line, aligned with aligned_addr.

IIUC GET_MODE_MASK generates masks like 0xFF for QI (for example).  It doesn't have the granularity to generate 0x3 (which we could bitwise-NOT to get -4).  I searched the GCC internals docs but couldn't find a function that generates address-alignment masks.

Presumably using SImode is intentional here rather than wanting to use word_mode, which would be SImode for rv32 and DImode for rv64?  I'm going to work based on that assumption, but if it isn't there's more work to do to generalize this code.

It's been a year, but IIRC it was just simpler to implement (and to me it didn't make sense to use 64 bits for a subword op).  Is there a benefit to using 64-bit instructions when computing subwords?

+
+(define_expand "atomic_fetch_nand<mode>"
+  [(set (match_operand:SHORT 0 "register_operand" "=&r")
+        (match_operand:SHORT 1 "memory_operand" "+A"))
+   (set (match_dup 1)
+        (unspec_volatile:SHORT
+          [(not:SHORT (and:SHORT (match_dup 1)
+                                 (match_operand:SHORT 2 "reg_or_0_operand" "rJ")))
+           (match_operand:SI 3 "const_int_operand")] ;; model
+         UNSPEC_SYNC_OLD_OP_SUBWORD))]
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"

Just a note: constraints aren't necessary for a define_expand.  They don't hurt anything, though.  They do document expectations, but then you have to maintain them over time.  I'm OK leaving them; I mostly wanted to make sure you're aware they aren't strictly necessary for a define_expand.

I wasn't aware, thanks for pointing it out!  You're referring to the "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC" (not the register constraints), right?
...
Thanks for reviewing!

Patrick
Re: [PATCH v5] RISCV: Inline subword atomic ops
On 4/18/23 08:28, Patrick O'Neill wrote:

RISC-V has no support for subword atomic operations; code currently generates libatomic library calls.  This patch changes the default behavior to inline subword atomic calls (using the same logic as the existing library call).  Behavior can be specified using the -minline-atomics and -mno-inline-atomics command line flags.

gcc/libgcc/config/riscv/atomic.c has the same logic implemented in asm.  This will need to stay for backwards compatibility and the -mno-inline-atomics flag.

2023-04-18  Patrick O'Neill

	PR target/104338
	* riscv-protos.h: Add helper function stubs.
	* riscv.cc: Add helper functions for subword masking.
	* riscv.opt: Add command-line flag.
	* sync.md: Add masking logic and inline asm for fetch_and_op, fetch_and_nand, CAS, and exchange ops.
	* invoke.texi: Add blurb regarding command-line flag.
	* inline-atomics-1.c: New test.
	* inline-atomics-2.c: Likewise.
	* inline-atomics-3.c: Likewise.
	* inline-atomics-4.c: Likewise.
	* inline-atomics-5.c: Likewise.
	* inline-atomics-6.c: Likewise.
	* inline-atomics-7.c: Likewise.
	* inline-atomics-8.c: Likewise.
	* atomic.c: Add reference to duplicate logic.

So for others who may be interested: the motivation here is that for a sub-word atomic we currently have to explicitly link in libatomic or we get undefined symbols.  This is particularly problematic for the distros because we're one of the few (only?) architectures supported by the distros that requires linking in libatomic for these cases.  The distros don't want to adjust each affected package and be stuck carrying that change forward or negotiating with all the relevant upstreams.  The distros might tackle this problem by porting this patch into their compiler tree, which has its own set of problems with long-term maintenance.

The net is that from a usability standpoint it's best if we get this problem addressed and backported to our gcc-13 RISC-V coordination branch.
We had held this up pending resolution of some other issues in the atomics space.  In retrospect that might have been a mistake.  So with that background... here we go...

+/* Helper function for extracting a subword from memory.  */
+
+void
+riscv_subword_address (rtx mem, rtx *aligned_mem, rtx *shift, rtx *mask,
+                       rtx *not_mask)

So I'd expand on that comment.  The idea is we would like someone working in the backend to be able to read the function comment and have a reasonable sense of what the function does as well as the inputs and return value.  So perhaps something like this:

/* Given memory reference MEM, expand code to compute the aligned
   memory address, shift and mask values and store them into
   *ALIGNED_MEM, *SHIFT, *MASK and *NOT_MASK.  */

Or something like that.

+{
+  /* Align the memory addess to a word.  */

s/addess/address/

+  rtx addr = force_reg (Pmode, XEXP (mem, 0));
+
+  rtx aligned_addr = gen_reg_rtx (Pmode);
+  emit_move_insn (aligned_addr, gen_rtx_AND (Pmode, addr,
+                                             gen_int_mode (-4, Pmode)));

So rather than -4 as a magic number, GET_MODE_MASK would be better.  That may result in needing to rewrap this code.  I'd bring the gen_rtx_AND down on a new line, aligned with aligned_addr.

Presumably using SImode is intentional here rather than wanting to use word_mode, which would be SImode for rv32 and DImode for rv64?  I'm going to work based on that assumption, but if it isn't there's more work to do to generalize this code.

+
+  /* Calculate the shift amount.  */
+  emit_move_insn (*shift, gen_rtx_AND (SImode, gen_lowpart (SImode, addr),
+                                       gen_int_mode (3, SImode)));
+  emit_move_insn (*shift, gen_rtx_ASHIFT (SImode, *shift,
+                                          gen_int_mode(3, SImode)));

Formatting nit: space before the open paren in the gen_int_mode call on that last line above.  This minor goof shows up in various places; please review the patch as a whole looking for similar nits.

+
+  /* Calculate the mask.  */
+  int unshifted_mask;
+  if (GET_MODE (mem) == QImode)
+    unshifted_mask = 0xFF;
+  else
+    unshifted_mask = 0xFFFF;

Can you just use GET_MODE_MASK here, which should simplify this to something like

  unshifted_mask = GET_MODE_MASK (GET_MODE (mem));

+
+(define_expand "atomic_fetch_nand<mode>"
+  [(set (match_operand:SHORT 0 "register_operand" "=&r")
+        (match_operand:SHORT 1 "memory_operand" "+A"))
+   (set (match_dup 1)
+        (unspec_volatile:SHORT
+          [(not:SHORT (and:SHORT (match_dup 1)
+                                 (match_operand:SHORT 2 "reg_or_0_operand" "rJ")))
+           (match_operand:SI 3 "const_int_operand")] ;; model
+         UNSPEC_SYNC_OLD_OP_SUBWORD))]
+  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"

Just a note: constraints aren't necessary for a define_expand.
Re: [PATCH v5] RISCV: Inline subword atomic ops
On Apr 18 2023, Patrick O'Neill wrote:

> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index a38547f53e5..9c3e91d2fee 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -1227,6 +1227,7 @@ See RS/6000 and PowerPC Options.
>  -mstack-protector-guard=@var{guard} -mstack-protector-guard-reg=@var{reg}
>  -mstack-protector-guard-offset=@var{offset}
>  -mcsr-check -mno-csr-check}
> +-minline-atomics -mno-inline-atomics

The options need to be inside @gccoptlist.

> @@ -29006,6 +29007,13 @@ Do or don't use smaller but slower prologue and epilogue code that uses
>  library function calls.  The default is to use fast inline prologues and
>  epilogues.
>
> +@item -minline-atomics
> +@itemx -mno-inline-atomics
> +@opindex minline-atomics

@opindex should precede @item.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."
[PATCH v5] RISCV: Inline subword atomic ops
RISC-V has no support for subword atomic operations; code currently generates libatomic library calls.  This patch changes the default behavior to inline subword atomic calls (using the same logic as the existing library call).  Behavior can be specified using the -minline-atomics and -mno-inline-atomics command line flags.

gcc/libgcc/config/riscv/atomic.c has the same logic implemented in asm.  This will need to stay for backwards compatibility and the -mno-inline-atomics flag.

2023-04-18  Patrick O'Neill

	PR target/104338
	* riscv-protos.h: Add helper function stubs.
	* riscv.cc: Add helper functions for subword masking.
	* riscv.opt: Add command-line flag.
	* sync.md: Add masking logic and inline asm for fetch_and_op, fetch_and_nand, CAS, and exchange ops.
	* invoke.texi: Add blurb regarding command-line flag.
	* inline-atomics-1.c: New test.
	* inline-atomics-2.c: Likewise.
	* inline-atomics-3.c: Likewise.
	* inline-atomics-4.c: Likewise.
	* inline-atomics-5.c: Likewise.
	* inline-atomics-6.c: Likewise.
	* inline-atomics-7.c: Likewise.
	* inline-atomics-8.c: Likewise.
	* atomic.c: Add reference to duplicate logic.

Signed-off-by: Patrick O'Neill
Signed-off-by: Palmer Dabbelt
---
v4: https://inbox.sourceware.org/gcc-patches/20220821215823.18207-1-pal...@rivosinc.com/
Rebased v4 and addressed Kito Cheng's comments.  No new failures on trunk.
---
The mapping implemented here matches libatomic.  That mapping changes if "Implement ISA Manual Table A.6 Mappings" is merged.  Depending on which patch is merged first, I will update the other to make sure the correct mapping is emitted.
https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615748.html
---
 gcc/config/riscv/riscv-protos.h           |   2 +
 gcc/config/riscv/riscv.cc                 |  50 ++
 gcc/config/riscv/riscv.opt                |   4 +
 gcc/config/riscv/sync.md                  | 314 ++
 gcc/doc/invoke.texi                       |   8 +
 .../gcc.target/riscv/inline-atomics-1.c   |  18 +
 .../gcc.target/riscv/inline-atomics-2.c   |  19 +
 .../gcc.target/riscv/inline-atomics-3.c   | 569 ++
 .../gcc.target/riscv/inline-atomics-4.c   | 566 +
 .../gcc.target/riscv/inline-atomics-5.c   |  87 +++
 .../gcc.target/riscv/inline-atomics-6.c   |  87 +++
 .../gcc.target/riscv/inline-atomics-7.c   |  69 +++
 .../gcc.target/riscv/inline-atomics-8.c   |  69 +++
 libgcc/config/riscv/atomic.c              |   2 +
 14 files changed, 1864 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/inline-atomics-8.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 5244e8dcbf0..02b33e02020 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -79,6 +79,8 @@ extern void riscv_reinit (void);
 extern poly_uint64 riscv_regmode_natural_size (machine_mode);
 extern bool riscv_v_ext_vector_mode_p (machine_mode);
 extern bool riscv_shamt_matches_mask_p (int, HOST_WIDE_INT);
+extern void riscv_subword_address (rtx, rtx *, rtx *, rtx *, rtx *);
+extern void riscv_lshift_subword (machine_mode, rtx, rtx, rtx *);
 
 /* Routines implemented in riscv-c.cc.  */
 void riscv_cpu_cpp_builtins (cpp_reader *);

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index e4937d1af25..fa0247be22f 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -7143,6 +7143,56 @@ riscv_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
     & ~zeroed_hardregs);
 }
 
+/* Helper function for extracting a subword from memory.  */
+
+void
+riscv_subword_address (rtx mem, rtx *aligned_mem, rtx *shift, rtx *mask,
+                       rtx *not_mask)
+{
+  /* Align the memory addess to a word.  */
+  rtx addr = force_reg (Pmode, XEXP (mem, 0));
+
+  rtx aligned_addr = gen_reg_rtx (Pmode);
+  emit_move_insn (aligned_addr, gen_rtx_AND (Pmode, addr,
+                                             gen_int_mode (-4, Pmode)));
+
+  *aligned_mem = change_address (mem, SImode, aligned_addr);
+
+  /* Calculate the shift amount.  */
+  emit_move_insn (*shift, gen_rtx_AND (SImode, gen_lowpart (SImode, addr),
+                                       gen_int_mode (3, SImode)));
+  emit_move_insn (*shift, gen_rtx_ASHIFT (SImode, *shift,
+