[gcc r15-2390] [target/116104] Fix test guarding UINTVAL to extract shift count

2024-07-29 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:5ab9a351247a551c47b0ab9d8e8b907223e7faf6

commit r15-2390-g5ab9a351247a551c47b0ab9d8e8b907223e7faf6
Author: Jeff Law 
Date:   Mon Jul 29 16:17:25 2024 -0600

[target/116104] Fix test guarding UINTVAL to extract shift count

Minor oversight in the ext-dce bits.  If the shift count is a constant 
vector,
then we shouldn't be extracting values with [U]INTVAL.  We guarded that test
with CONSTANT_P, when it should have been CONSTANT_INT_P.

Shows up on gcn, but I wouldn't be terribly surprised if it could be 
triggered
elsewhere.

Verified the testcase compiles on gcn.  Haven't done a libgcc build for gcn
though.  Also verified x86 bootstraps and regression tests cleanly.

Pushing to the trunk.

PR target/116104
gcc/
* ext-dce.cc (carry_backpropagate): Fix test guarding UINTVAL
extraction of shift count.

Diff:
---
 gcc/ext-dce.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/ext-dce.cc b/gcc/ext-dce.cc
index 14f163a01d63..f7b0eb114180 100644
--- a/gcc/ext-dce.cc
+++ b/gcc/ext-dce.cc
@@ -493,7 +493,7 @@ carry_backpropagate (unsigned HOST_WIDE_INT mask, enum 
rtx_code code, rtx x)
 /* We propagate for the shifted operand, but not the shift
count.  The count is handled specially.  */
 case ASHIFT:
-  if (CONSTANT_P (XEXP (x, 1))
+  if (CONST_INT_P (XEXP (x, 1))
  && known_lt (UINTVAL (XEXP (x, 1)), GET_MODE_BITSIZE (mode)))
return (HOST_WIDE_INT)mask >> INTVAL (XEXP (x, 1));
   return (2ULL << floor_log2 (mask)) - 1;


[gcc r15-2352] [RISC-V][target/116085] Fix rv64 minmax extension avoidance splitter

2024-07-26 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:6e5aae47e3b910f9af6983f744d7a3e2dcecba1d

commit r15-2352-g6e5aae47e3b910f9af6983f744d7a3e2dcecba1d
Author: Jeff Law 
Date:   Fri Jul 26 17:30:08 2024 -0600

[RISC-V][target/116085] Fix rv64 minmax extension avoidance splitter

A patch introduced a pattern to avoid unnecessary extensions when doing a
min/max operation where one of the values is a 32 bit positive constant.

> (define_insn_and_split "*minmax"
>   [(set (match_operand:DI 0 "register_operand" "=r")
> (sign_extend:DI
>   (subreg:SI
> (bitmanip_minmax:DI (zero_extend:DI (match_operand:SI 1 
"register_operand" "r"))
> (match_operand:DI 2 
"immediate_operand" "i"))
>0)))
>(clobber (match_scratch:DI 3 "="))
>(clobber (match_scratch:DI 4 "="))]
>   "TARGET_64BIT && TARGET_ZBB && sext_hwi (INTVAL (operands[2]), 32) >= 0"
>   "#"
>   "&& reload_completed"
>   [(set (match_dup 3) (sign_extend:DI (match_dup 1)))
>(set (match_dup 4) (match_dup 2))
>(set (match_dup 0) (:DI (match_dup 3) (match_dup 4)))]

Lots going on in here.  The key is the nonconstant value is zero extended 
from
SI to DI in the original RTL and we know the constant value is unchanged if 
we
were to sign extend it from 32 to 64 bits.

We change the extension of the nonconstant operand from zero to sign 
extension.
I'm pretty confident the goal there is take advantage of the fact that SI
values are kept sign extended and will often be optimized away.

The problem occurs when the nonconstant operand has the SI sign bit set.  
As an
example:

smax (0x800, 0x7)  resulting in 0x8000

The split RTL will generate
smax (sign_extend (0x8000), 0x7))

smax (0x8000, 0x7) resulting in 0x7

Opps.

We really needed to change the opcode to umax for this transformation to 
work.
That's easy enough.  But there's further improvements we can make.

First the pattern is a define_and_split with a post-reload split condition. 
 It
would be better implemented as a 4->3 define_split so that the costing model
just works.  Second, if operands[1] is a suitably promoted subreg, then we 
can
elide the sign extension when we generate the split code, so often it'll be 
a
4->2 split, again with the cost model working with no adjustments needed.

Tested on rv32 and rv64 in my tester.  I'll wait for the pre-commit tester 
to
spin it as well.

PR target/116085
gcc/
* config/riscv/bitmanip.md (minmax extension avoidance splitter):
Rewrite as a simpler define_split.  Adjust the opcode appropriately.
Avoid emitting sign extension if it's clearly not needed.
* config/riscv/iterators.md (minmax_optab): Rename to uminmax_optab
and map everything to unsigned variants.

gcc/testsuite/
* gcc.target/riscv/pr116085.c: New test.

Diff:
---
 gcc/config/riscv/bitmanip.md  | 38 +++
 gcc/config/riscv/iterators.md |  9 
 gcc/testsuite/gcc.target/riscv/pr116085.c | 29 +++
 3 files changed, 58 insertions(+), 18 deletions(-)

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 9fc5215d6e35..b19295cd9424 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -549,23 +549,33 @@
 
 ;; Optimize the common case of a SImode min/max against a constant
 ;; that is safe both for sign- and zero-extension.
-(define_insn_and_split "*minmax"
-  [(set (match_operand:DI 0 "register_operand" "=r")
+(define_split
+  [(set (match_operand:DI 0 "register_operand")
(sign_extend:DI
  (subreg:SI
-   (bitmanip_minmax:DI (zero_extend:DI (match_operand:SI 1 
"register_operand" "r"))
-   (match_operand:DI 2 
"immediate_operand" "i"))
-  0)))
-   (clobber (match_scratch:DI 3 "="))
-   (clobber (match_scratch:DI 4 "="))]
+   (bitmanip_minmax:DI (zero_extend:DI
+ (match_operand:SI 1 "register_operand"))
+   (match_operand:DI 2 "immediate_operand")) 0)))
+   (clobber (match_operand:DI 3 "register_operand"))
+   (clobber (match_operand:DI 4 "register_operand"))]
   "TARGET_64BIT && TARGET_ZBB && sext_hwi (INTVAL (operands[2]), 32) >= 0"
-  "#"
-  "&& reload_completed"
-  [(set (match_dup 3) (sign_extend:DI (match_dup 1)))
-   (set (match_dup 4) (match_dup 2))
-   (set (match_dup 0) (:DI (match_dup 3) (match_dup 4)))]
-  ""
-  [(set_attr "type" "bitmanip")])
+  [(set (match_dup 0) (:DI (match_dup 4) (match_dup 3)))]
+  "
+{
+  /* Load the constant into a register.  */
+  emit_move_insn (operands[3], operands[2]);
+
+  /* If operands[1] is a sign 

[gcc r15-2321] [PR rtl-optimization/116039] Fix life computation for promoted subregs

2024-07-25 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:34fb0feca71f763b2fbe832548749666d34a4a76

commit r15-2321-g34fb0feca71f763b2fbe832548749666d34a4a76
Author: Jeff Law 
Date:   Thu Jul 25 12:32:28 2024 -0600

[PR rtl-optimization/116039] Fix life computation for promoted subregs

So this turned out to be a neat little test and while the fuzzer found it on
RISC-V, I wouldn't be surprised if the underlying issue is also the root 
cause
of the loongarch issue with ext-dce.

The key issue is that if we have something like

(set (dest) (any_extend (subreg (source

If the subreg object is marked with SUBREG_PROMOTED and the sign/unsigned 
state
matches the any_extend opcode, then combine (and I guess anything using
simplify-rtx) may simplify that to

(set (dest) (source))

That implies that bits outside the mode of the subreg are actually live and
valid.  This needs to be accounted for during liveness computation.

We have to be careful here though. If we're too conservative about setting
additional bits live, then we'll inhibit the desired optimization in the
coremark examples.  To do a good job we need to know the extension opcode.

I'm extremely unhappy with how the use handling works in ext-dce.  It mixes
different conceptual steps and has horribly complex control flow.  It only
handles a subset of the unary/binary opcodes, etc etc.  It's just damn mess.
It's going to need some more noodling around.

In the mean time this is a bit hacky in that it depends on non-obvious 
behavior
to know it can get the extension opcode, but I don't want to leave the 
trunk in
a broken state while I figure out the refactoring problem.

Bootstrapped and regression tested on x86 and tested on the crosses.  
Pushing to the trunk.

PR rtl-optimization/116039
gcc/
* ext-dce.cc (ext_dce_process_uses): Add some comments about 
concerns
with current code.  Mark additional bit groups as live when we have
an extension of a suitably promoted subreg.

gcc/testsuite
* gcc.dg/torture/pr116039.c: New test.

Diff:
---
 gcc/ext-dce.cc  | 43 -
 gcc/testsuite/gcc.dg/torture/pr116039.c | 20 +++
 2 files changed, 57 insertions(+), 6 deletions(-)

diff --git a/gcc/ext-dce.cc b/gcc/ext-dce.cc
index c94d1fc34145..14f163a01d63 100644
--- a/gcc/ext-dce.cc
+++ b/gcc/ext-dce.cc
@@ -667,6 +667,12 @@ ext_dce_process_uses (rtx_insn *insn, rtx obj,
  if (modify && !skipped_dest && (dst_mask & ~src_mask) == 0)
ext_dce_try_optimize_insn (insn, x);
 
+ /* Stripping the extension here just seems wrong on multiple
+levels.  It's source side handling, so it seems like it
+belongs in the loop below.  Stripping here also makes it
+harder than necessary to properly handle live bit groups
+for (ANY_EXTEND (SUBREG)) where the SUBREG has
+SUBREG_PROMOTED state.  */
  dst_mask &= src_mask;
  src = XEXP (src, 0);
  code = GET_CODE (src);
@@ -674,8 +680,8 @@ ext_dce_process_uses (rtx_insn *insn, rtx obj,
 
  /* Optimization is done at this point.  We just want to make
 sure everything that should get marked as live is marked
-from here onward.  */
-
+from here onward.  Shouldn't the backpropagate step happen
+before optimization?  */
  dst_mask = carry_backpropagate (dst_mask, code, src);
 
  /* We will handle the other operand of a binary operator
@@ -688,7 +694,11 @@ ext_dce_process_uses (rtx_insn *insn, rtx obj,
  /* We're inside a SET and want to process the source operands
 making things live.  Breaking from this loop will cause
 the iterator to work on sub-rtxs, so it is safe to break
-if we see something we don't know how to handle.  */
+if we see something we don't know how to handle.
+
+This code is just hokey as it really just handles trivial
+unary and binary cases.  Otherwise the loop exits and we
+continue iterating on sub-rtxs, but outside the set context.  
*/
  unsigned HOST_WIDE_INT save_mask = dst_mask;
  for (;;)
{
@@ -704,10 +714,26 @@ ext_dce_process_uses (rtx_insn *insn, rtx obj,
y = XEXP (y, 0);
  else if (SUBREG_P (y) && SUBREG_BYTE (y).is_constant ())
{
- /* For anything but (subreg (reg)), break the inner loop
-and process normally (conservatively).  */
- if (!REG_P (SUBREG_REG (y)))
+ /* We really want to 

[gcc r15-2316] [committed] Trivial testcase adjustment

2024-07-25 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:2dd45655db47362153756261881413b368582597

commit r15-2316-g2dd45655db47362153756261881413b368582597
Author: Jeff Law 
Date:   Thu Jul 25 08:42:04 2024 -0600

[committed] Trivial testcase adjustment

I made pr116037.c dependent on int32 just based on the constants used 
without
noting the int128 vector type.  Naturally on targets that don't support 
int128
the test fails.  Fixed by changing the target selector from int32 to int128.

Pushed to the trunk.

gcc/testsuite
* gcc.dg/torture/pr116037.c: Fix target selector.

Diff:
---
 gcc/testsuite/gcc.dg/torture/pr116037.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/torture/pr116037.c 
b/gcc/testsuite/gcc.dg/torture/pr116037.c
index cb34ba4e5d46..86ab50de4b2f 100644
--- a/gcc/testsuite/gcc.dg/torture/pr116037.c
+++ b/gcc/testsuite/gcc.dg/torture/pr116037.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-require-effective-target int32 } */
+/* { dg-require-effective-target int128 } */
 /* { dg-additional-options "-Wno-psabi" } */
 
 typedef __attribute__((__vector_size__ (64))) unsigned char VC;


Re: [RISC-V] Combining vfmv and .vv instructions into a .vf instruction

2024-07-24 Thread Jeff Law via Gcc




On 7/24/24 11:25 AM, Artemiy Volkov wrote:

Hi Juzhe, Demin, Jeff,

This email is intended to continue the discussion started in
https://marc.info/?l=gcc-patches=170927452922009=2 about combining vfmv.v.f
and vfmxx.vv instructions into the scalar-vector form vfmxx.vf.

There was a mention on that thread of the potential usefulness of the 
late-combine
pass (added last month in
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=792f97b44ffc5e6a967292b3747fd835e99396e7)
in making this transformation. However, when I tried it out with my testcase at
https://godbolt.org/z/o8oPzo7qY, I found it unable to handle these complex
post-split1 patterns for broadcast and vfmacc:

(insn 129 128 130 3 (set (reg:RVVM4SF 168 [ _61 ])
 (if_then_else:RVVM4SF (unspec:RVVMF8BI [
 (const_vector:RVVMF8BI [
 (const_int 1 [0x1]) repeated x16
 ])
 (const_int 16 [0x10])
 (const_int 2 [0x2]) repeated x2
 (const_int 0 [0])
 (reg:SI 66 vl)
 (reg:SI 67 vtype)
 ] UNSPEC_VPREDICATE)
 (vec_duplicate:RVVM4SF (mem:SF (reg:SI 143 [ ivtmp.21 ]) [1 
MEM[(float *)_145]+0 S4 A32]))
 (unspec:RVVM4SF [
 (reg:SI 0 zero)
 ] UNSPEC_VUNDEF))) "/app/example.c":19:53 4019 
{*pred_broadcastrvvm4sf_zvfh}
  (nil))
[ ... ]
(insn 131 130 34 3 (set (reg:RVVM4SF 139 [ D__lsm.10 ])
 (if_then_else:RVVM4SF (unspec:RVVMF8BI [
 (const_vector:RVVMF8BI [
 (const_int 1 [0x1]) repeated x16
 ])
 (const_int 16 [0x10])
 (const_int 2 [0x2]) repeated x2
 (const_int 0 [0])
 (const_int 7 [0x7])
 (reg:SI 66 vl)
 (reg:SI 67 vtype)
 (reg:SI 69 frm)
 ] UNSPEC_VPREDICATE)
 (plus:RVVM4SF (mult:RVVM4SF (reg/v:RVVM4SF 135 [ row ])
 (reg:RVVM4SF 168 [ _61 ]))
 (reg:RVVM4SF 139 [ D__lsm.10 ]))
 (unspec:RVVM4SF [
 (reg:SI 0 zero)
 ] UNSPEC_VUNDEF))) "/app/example.c":19:36 15007 
{*pred_mul_addrvvm4sf_undef}
  (nil))

I'm no expert on this, but what's stopping us from adding some vector-scalar
split patterns alongside vector-vector ones in autovec.md to fix this? For
instance, the addition of fma4_scalar insn_and_split like this:

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index d5793ac..bf54d71 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1229,2 +1229,22 @@

+(define_insn_and_split "fma4_scalar"
+  [(set (match_operand:V_VLSF 0 "register_operand")
+(plus:V_VLSF
+ (mult:V_VLSF
+   (vec_duplicate:V_VLSF (match_operand:SF 1 
"direct_broadcast_operand"))
+   (match_operand:V_VLSF 2 "register_operand"))
+ (match_operand:V_VLSF 3 "register_operand")))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+rtx ops[] = {operands[0], operands[1], operands[2], operands[3],
+ operands[0]};
+riscv_vector::emit_vlmax_insn (code_for_pred_mul_scalar (PLUS, mode),
+  riscv_vector::TERNARY_OP_FRM_DYN, ops);
+DONE;
+  }
+  [(set_attr "type" "vector")])
+
  ;; -

does lead to vfmacc.vf instructions being emitted instead of vfmacc.vv's for the
testcase linked above.

What do you think about this approach to implement this optimization? Am I
missing anything important? Maybe split1 is too early to determine the final
instruction format (.vf vs .vv) and we should strive to recombine during
late-combine2?

Also, is there anyone working on this optimization at the present moment?
Before jumping straight to a new combiner pattern (especially a 
define_insn_and_split), I would want to have a clearer understanding of 
the code before/after instruction combination as well as before/after 
register allocation and reloading.


When I was looking at the results of late-combine it did seem to fairly 
consistently work well for generating .vx and .vf forms rather than .vv, 
so odds are something simple is missing somewhere.




I'm not aware of anyone working on this, except perhaps Demin.  So 
there's ample room to work without stepping on anyone's toes.


jeff




[gcc r15-2275] [rtl-optimization/116037] Explicitly track if a destination was skipped in ext-dce

2024-07-24 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:679086172b84be18c55fdbb9cda7e97806e7c083

commit r15-2275-g679086172b84be18c55fdbb9cda7e97806e7c083
Author: Jeff Law 
Date:   Wed Jul 24 11:16:26 2024 -0600

[rtl-optimization/116037] Explicitly track if a destination was skipped in 
ext-dce

So this has been in the hopper since the first bugs were reported against
ext-dce.  It'd been holding off committing as I was finding other issues in
terms of correctness of live computations.  There's still problems in that
space, but I think it's time to push this chunk forward.  I'm marking it as
116037, but it may impact other bugs.

This patch starts explicitly tracking if set processing skipped a 
destination,
which can happen for wide modes (TI+), vectors, certain subregs, etc.  This 
is
computed during ext_dce_set_processing.

During use processing we use that flag to determine reliably if we need to 
make
the inputs fully live and to avoid even trying to eliminate an extension if 
we
skipped output processing.

While testing this I found that a recent change to fix cases where we had 
two
subreg input operands mucked up the code to make things like a shift/rotate
count fully live.  So that goof has been fixed.

Bootstrapped and regression tested on x86.  Most, but not all, of these 
changes
have also been tested on the crosses.  Pushing to the trunk.

I'm not including it in this patch but I'm poking at converting this code to
use note_uses/note_stores to make it more maintainable.  The SUBREG and
STRICT_LOW_PART handling of note_stores is problematical, but I think it's
solvable.  I haven't tried a conversion to note_uses yet.

PR rtl-optimization/116037
gcc/
* ext-dce.cc (ext_dce_process_sets): Note if we ever skip a dest
and return that info explicitly.
(ext_dce_process_uses): If a set was skipped, then consider all bits
in every input as live.  Do not try to optimize away an extension if
we skipped processing a destination in the same insn.  Restore code
to make shift/rotate count fully live.
(ext_dce_process_bb): Handle API changes for ext_dce_process_sets.

gcc/testsuite/
* gcc.dg/torture/pr116037.c: New test

Diff:
---
 gcc/ext-dce.cc  | 42 ++---
 gcc/testsuite/gcc.dg/torture/pr116037.c | 36 
 2 files changed, 69 insertions(+), 9 deletions(-)

diff --git a/gcc/ext-dce.cc b/gcc/ext-dce.cc
index c56dfb505b88..c94d1fc34145 100644
--- a/gcc/ext-dce.cc
+++ b/gcc/ext-dce.cc
@@ -181,9 +181,11 @@ safe_for_live_propagation (rtx_code code)
within an object) are set by INSN, the more aggressive the
optimization phase during use handling will be.  */
 
-static void
+static bool
 ext_dce_process_sets (rtx_insn *insn, rtx obj, bitmap live_tmp)
 {
+  bool skipped_dest = false;
+
   subrtx_iterator::array_type array;
   FOR_EACH_SUBRTX (iter, array, obj, NONCONST)
 {
@@ -210,6 +212,7 @@ ext_dce_process_sets (rtx_insn *insn, rtx obj, bitmap 
live_tmp)
  /* Skip the subrtxs of this destination.  There is
 little value in iterating into the subobjects, so
 just skip them for a bit of efficiency.  */
+ skipped_dest = true;
  iter.skip_subrtxes ();
  continue;
}
@@ -241,6 +244,7 @@ ext_dce_process_sets (rtx_insn *insn, rtx obj, bitmap 
live_tmp)
  /* Skip the subrtxs of the STRICT_LOW_PART.  We can't
 process them because it'll set objects as no longer
 live when they are in fact still live.  */
+ skipped_dest = true;
  iter.skip_subrtxes ();
  continue;
}
@@ -291,6 +295,7 @@ ext_dce_process_sets (rtx_insn *insn, rtx obj, bitmap 
live_tmp)
  if (!is_a  (GET_MODE (SUBREG_REG (x)), 
_mode)
  || GET_MODE_BITSIZE (outer_mode) > 64)
{
+ skipped_dest = true;
  iter.skip_subrtxes ();
  continue;
}
@@ -318,6 +323,7 @@ ext_dce_process_sets (rtx_insn *insn, rtx obj, bitmap 
live_tmp)
 remain the same.  Thus we can not continue here, we must
 either figure out what part of the destination is modified
 or skip the sub-rtxs.  */
+ skipped_dest = true;
  iter.skip_subrtxes ();
  continue;
}
@@ -370,9 +376,11 @@ ext_dce_process_sets (rtx_insn *insn, rtx obj, bitmap 
live_tmp)
   else if (GET_CODE (x) == COND_EXEC)
{
  /* This isn't ideal, but may not be so bad in practice.  */
+ skipped_dest = true;
  iter.skip_subrtxes ();
}
 }
+  return skipped_dest;
 }
 
 /* INSN has a sign/zero extended 

[gcc r15-2240] [PR rtl-optimization/115877][6/n] Add testcase from pr115877

2024-07-23 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:f9a60d575f02822852aa22513c636be38f9c63ea

commit r15-2240-gf9a60d575f02822852aa22513c636be38f9c63ea
Author: Jeff Law 
Date:   Tue Jul 23 19:11:04 2024 -0600

[PR rtl-optimization/115877][6/n] Add testcase from pr115877

This just adds the testcase from pr115877.  It's working now on the trunk.  
I'm
not done with cleanups/bugfixing, but there's no reason to not have the
testcase installed at this point.

PR rtl-optimization/115877
gcc/testsuite
* gcc.dg/torture/pr115877.c: New test.

Diff:
---
 gcc/testsuite/gcc.dg/torture/pr115877.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/torture/pr115877.c 
b/gcc/testsuite/gcc.dg/torture/pr115877.c
new file mode 100644
index ..432b1280b177
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr115877.c
@@ -0,0 +1,20 @@
+/* { dg-do run { target int128 } } */
+
+char a[16];
+unsigned short u;
+
+__int128
+foo (int i)
+{
+  i -= (unsigned short) ~u;
+  a[(unsigned short) i] = 1;
+  return i;
+}
+
+int
+main ()
+{
+  __int128 x = foo (0);
+  if (x != -0x)
+__builtin_abort();
+}


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] [5/n][PR rtl-optimization/115877] Fix handling of input/output operands

2024-07-23 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:99d15ac27522519377f7019cf6e5cb67b1497458

commit 99d15ac27522519377f7019cf6e5cb67b1497458
Author: Jeff Law 
Date:   Mon Jul 22 21:48:28 2024 -0600

[5/n][PR rtl-optimization/115877] Fix handling of input/output operands

So in this patch we're correcting a failure to mark objects live in 
scenarios
like

(set (dest) (plus (dest) (src))

When handling set pseudos, we transfer the liveness information from LIVENOW
into LIVE_TMP.  LIVE_TMP is subsequently used to narrow what bit groups are
live for the inputs.

The first time we process the block we may not have DEST in the LIVENOW set 
(it
may be live across the loop, but not live after the loop).  Thus we can 
totally
miss making certain objects live, resulting in incorrect code.

The fix is pretty simple.  If LIVE_TMP is empty, then we should go ahead and
mark all the bit groups for the set object in LIVE_TMP.  This also removes 
an
invalid gcc_assert on the state of the liveness bitmaps.

This showed up on pru, rl78 and/or msp430 in the testsuite.  So no new test.

Bootstrapped and regression tested on x86_64 and also run through my tester 
on
all the cross platforms.

Pushing to the trunk.

PR rtl-optimization/115877
gcc/
* ext-dce.cc (ext_dce_process_sets): Reasonably handle input/output
operands.
(ext_dce_rd_transfer_n): Drop bogus assertion.

(cherry picked from commit ad642d2c950657539777ea436b787e7fff4ec09e)

Diff:
---
 gcc/ext-dce.cc | 31 ++-
 1 file changed, 26 insertions(+), 5 deletions(-)

diff --git a/gcc/ext-dce.cc b/gcc/ext-dce.cc
index 7f0a6d725f1e..43d2447acb5d 100644
--- a/gcc/ext-dce.cc
+++ b/gcc/ext-dce.cc
@@ -245,13 +245,25 @@ ext_dce_process_sets (rtx_insn *insn, rtx obj, bitmap 
live_tmp)
  continue;
}
 
- /* Transfer all the LIVENOW bits for X into LIVE_TMP.  */
+ /* LIVE_TMP contains the set groups that are live-out and set in
+this insn.  It is used to narrow the groups live-in for the
+inputs of this insn.
+
+The simple thing to do is mark all the groups as live, but
+that will significantly inhibit optimization.
+
+We also need to be careful in the case where we have an in-out
+operand.  If we're not careful we'd clear LIVE_TMP
+incorrectly.  */
  HOST_WIDE_INT rn = REGNO (SUBREG_REG (x));
  int limit = group_limit (SUBREG_REG (x));
  for (HOST_WIDE_INT i = 4 * rn; i < 4 * rn + limit; i++)
if (bitmap_bit_p (livenow, i))
  bitmap_set_bit (live_tmp, i);
 
+ if (bitmap_empty_p (live_tmp))
+   make_reg_live (live_tmp, rn);
+
  /* The mode of the SUBREG tells us how many bits we can
 clear.  */
  machine_mode mode = GET_MODE (x);
@@ -316,14 +328,25 @@ ext_dce_process_sets (rtx_insn *insn, rtx obj, bitmap 
live_tmp)
  /* Now handle the actual object that was changed.  */
  if (REG_P (x))
{
- /* Transfer the appropriate bits from LIVENOW into
-LIVE_TMP.  */
+ /* LIVE_TMP contains the set groups that are live-out and set in
+this insn.  It is used to narrow the groups live-in for the
+inputs of this insn.
+
+The simple thing to do is mark all the groups as live, but
+that will significantly inhibit optimization.
+
+We also need to be careful in the case where we have an in-out
+operand.  If we're not careful we'd clear LIVE_TMP
+incorrectly.  */
  HOST_WIDE_INT rn = REGNO (x);
  int limit = group_limit (x);
  for (HOST_WIDE_INT i = 4 * rn; i < 4 * rn + limit; i++)
if (bitmap_bit_p (livenow, i))
  bitmap_set_bit (live_tmp, i);
 
+ if (bitmap_empty_p (live_tmp))
+   make_reg_live (live_tmp, rn);
+
  /* Now clear the bits known written by this instruction.
 Note that BIT need not be a power of two, consider a
 ZERO_EXTRACT destination.  */
@@ -935,8 +958,6 @@ ext_dce_rd_transfer_n (int bb_index)
  the generic dataflow code that something changed.  */
   if (!bitmap_equal_p ([bb_index], livenow))
 {
-  gcc_assert (!bitmap_intersect_compl_p ([bb_index], livenow));
-
   bitmap_copy ([bb_index], livenow);
   return true;
 }


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] [4/n][PR rtl-optimization/115877] Correct SUBREG handling in a destination

2024-07-23 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:d4f5e86b8cf0666aefd9c1f10188274af147df46

commit d4f5e86b8cf0666aefd9c1f10188274af147df46
Author: Jeff Law 
Date:   Mon Jul 22 10:11:57 2024 -0600

[4/n][PR rtl-optimization/115877] Correct SUBREG handling in a destination

If we encounter something during SET handling that we can not handle, the 
safe
thing to do is to ignore the destination and continue the loop.

We've actually been trying to do slightly better with SUBREG destinations by
iterating into SUBREG_REG.  It turns out that wasn't working as expected.

The problem is once we "continue" we lose the state that we were inside the 
SET
and thus we ended up ignoring the destination completely rather than 
tracking
the SUBREG_REG object.  This could be fixed by restarting SET processing, 
but I
just don't see this as all that important to handle.  So rather than leave 
the
code as-is, not working per design, I'm twiddling it to use the common 'skip
subrtxs and continue' idiom used elsewhere.

This is a prerequisite for another patch in this series.  Specifically I 
have a
patch that explicitly tracks if we skipped a destination rather than trying 
to
imply it from the state of LIVE_TMP.  So this is probably NFC right now, but
that's a short-lived NFC.

Bootstrapped and regression tested on x86 and also run as part of a larger 
kit
on the crosses in my tester.

PR rtl-optimization/115877
gcc/
* ext-dce.cc (ext_dce_process_sets): More correctly handle SUBREG
destinations.

(cherry picked from commit ab7c0aed52054976d0b5e12c52e82239d4277b98)

Diff:
---
 gcc/ext-dce.cc | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/gcc/ext-dce.cc b/gcc/ext-dce.cc
index d1a31e1819e2..7f0a6d725f1e 100644
--- a/gcc/ext-dce.cc
+++ b/gcc/ext-dce.cc
@@ -270,11 +270,18 @@ ext_dce_process_sets (rtx_insn *insn, rtx obj, bitmap 
live_tmp)
= GET_MODE_MASK (GET_MODE_INNER (GET_MODE (x)));
  if (SUBREG_P (x))
{
- /* If we have a SUBREG that is too wide, just continue the loop
-and let the iterator go down into SUBREG_REG.  */
+ /* If we have a SUBREG destination that is too wide, just
+skip the destination rather than continuing this iterator.
+While continuing would be better, we'd need to strip the
+subreg and restart within the SET processing rather than
+the top of the loop which just complicates the flow even
+more.  */
  if (!is_a  (GET_MODE (SUBREG_REG (x)), 
_mode)
  || GET_MODE_BITSIZE (outer_mode) > 64)
-   continue;
+   {
+ iter.skip_subrtxes ();
+ continue;
+   }
 
  /* We can safely strip a paradoxical subreg.  The inner mode will
 be narrower than the outer mode.  We'll clear fewer bits in


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] [NFC][PR rtl-optimization/115877] Avoid setting irrelevant bit groups as live in ext-dce

2024-07-23 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:316f9617dcf040ccad190d853ee7f94b2f9caace

commit 316f9617dcf040ccad190d853ee7f94b2f9caace
Author: Jeff Law 
Date:   Mon Jul 22 08:45:10 2024 -0600

[NFC][PR rtl-optimization/115877] Avoid setting irrelevant bit groups as 
live in ext-dce

Another patch to refine liveness computations.  This should be NFC and is
designed to help debugging.

In simplest terms the patch avoids setting bit groups outside the size of a
pseudo as live.  Consider a HImode pseudo, bits 16..63 for such a pseudo 
don't
really have meaning, yet we often set bit groups related to bits 16.63 on in
the liveness bitmaps.

This makes debugging harder than it needs to be by simply having larger 
bitmaps
to verify when walking through the code in a debugger.

This has been bootstrapped and regression tested on x86_64.  It's also been
tested on the crosses in my tester without regressions.

Pushing to the trunk,

PR rtl-optimization/115877
gcc/
* ext-dce.cc (group_limit): New function.
(mark_reg_live): Likewise.
(ext_dce_process_sets): Use new functions.
(ext_dce_process_uses): Likewise.
(ext_dce_init): Likewise.

(cherry picked from commit 88d16194d0c8a6bdc2896c8944bfbf3e6038c9d2)

Diff:
---
 gcc/ext-dce.cc | 64 +++---
 1 file changed, 57 insertions(+), 7 deletions(-)

diff --git a/gcc/ext-dce.cc b/gcc/ext-dce.cc
index 59bcc4572d57..d1a31e1819e2 100644
--- a/gcc/ext-dce.cc
+++ b/gcc/ext-dce.cc
@@ -48,6 +48,57 @@ static bool modify;
bit 16..31
bit 32..BITS_PER_WORD-1  */
 
+/* For the given REG, return the number of bit groups implied by the
+   size of the REG's mode, up to a maximum of 4 (number of bit groups
+   tracked by this pass).
+
+   For partial integer and variable sized modes also return 4.  This
+   could possibly be refined for something like PSI mode, but it
+   does not seem worth the effort.  */
+
+static int
+group_limit (const_rtx reg)
+{
+  machine_mode mode = GET_MODE (reg);
+
+  if (!GET_MODE_BITSIZE (mode).is_constant ())
+return 4;
+
+  int size = GET_MODE_SIZE (mode).to_constant ();
+
+  size = exact_log2 (size);
+
+  if (size < 0)
+return 4;
+
+  size++;
+  return (size > 4 ? 4 : size);
+}
+
+/* Make all bit groups live for REGNO in bitmap BMAP.  For hard regs,
+   we assume all groups are live.  For a pseudo we consider the size
+   of the pseudo to avoid creating unnecessarily live chunks of data.  */
+
+static void
+make_reg_live (bitmap bmap, int regno)
+{
+  int limit;
+
+  /* For pseudos we can use the mode to limit how many bit groups
+ are marked as live since a pseudo only has one mode.  Hard
+ registers have to be handled more conservatively.  */
+  if (regno > FIRST_PSEUDO_REGISTER)
+{
+  rtx reg = regno_reg_rtx[regno];
+  limit = group_limit (reg);
+}
+  else
+limit = 4;
+
+  for (int i = 0; i < limit; i++)
+bitmap_set_bit (bmap, regno * 4 + i);
+}
+
 /* Note this pass could be used to narrow memory loads too.  It's
not clear if that's profitable or not in general.  */
 
@@ -196,7 +247,8 @@ ext_dce_process_sets (rtx_insn *insn, rtx obj, bitmap 
live_tmp)
 
  /* Transfer all the LIVENOW bits for X into LIVE_TMP.  */
  HOST_WIDE_INT rn = REGNO (SUBREG_REG (x));
- for (HOST_WIDE_INT i = 4 * rn; i < 4 * rn + 4; i++)
+ int limit = group_limit (SUBREG_REG (x));
+ for (HOST_WIDE_INT i = 4 * rn; i < 4 * rn + limit; i++)
if (bitmap_bit_p (livenow, i))
  bitmap_set_bit (live_tmp, i);
 
@@ -260,7 +312,8 @@ ext_dce_process_sets (rtx_insn *insn, rtx obj, bitmap 
live_tmp)
  /* Transfer the appropriate bits from LIVENOW into
 LIVE_TMP.  */
  HOST_WIDE_INT rn = REGNO (x);
- for (HOST_WIDE_INT i = 4 * rn; i < 4 * rn + 4; i++)
+ int limit = group_limit (x);
+ for (HOST_WIDE_INT i = 4 * rn; i < 4 * rn + limit; i++)
if (bitmap_bit_p (livenow, i))
  bitmap_set_bit (live_tmp, i);
 
@@ -692,7 +745,7 @@ ext_dce_process_uses (rtx_insn *insn, rtx obj, bitmap 
live_tmp)
   /* If we have a register reference that is not otherwise handled,
 just assume all the chunks are live.  */
   else if (REG_P (x))
-   bitmap_set_range (livenow, REGNO (x) * 4, 4);
+   bitmap_set_range (livenow, REGNO (x) * 4, group_limit (x));
 }
 }
 
@@ -819,10 +872,7 @@ ext_dce_init (void)
   unsigned i;
   bitmap_iterator bi;
   EXECUTE_IF_SET_IN_BITMAP (refs, 0, i, bi)
-{
-  for (int j = 0; j < 4; j++)
-   bitmap_set_bit ([EXIT_BLOCK], i * 4 + j);
-}
+make_reg_live ([EXIT_BLOCK], i);
 
   livenow = BITMAP_ALLOC (NULL);
   all_blocks = BITMAP_ALLOC (NULL);


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Implement the .SAT_TRUNC for scalar

2024-07-23 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:0ee41b02916d9c4c68ae6dbaa364cefcd79bb7da

commit 0ee41b02916d9c4c68ae6dbaa364cefcd79bb7da
Author: Pan Li 
Date:   Mon Jul 1 16:36:35 2024 +0800

RISC-V: Implement the .SAT_TRUNC for scalar

This patch would like to implement the simple .SAT_TRUNC pattern
in the riscv backend. Aka:

Form 1:
  #define DEF_SAT_U_TRUC_FMT_1(NT, WT) \
  NT __attribute__((noinline)) \
  sat_u_truc_##WT##_to_##NT##_fmt_1 (WT x) \
  {\
bool overflow = x > (WT)(NT)(-1);  \
return ((NT)x) | (NT)-overflow;\
  }

DEF_SAT_U_TRUC_FMT_1(uint32_t, uint64_t)

Before this patch:
__attribute__((noinline))
uint8_t sat_u_truc_uint16_t_to_uint8_t_fmt_1 (uint16_t x)
{
  _Bool overflow;
  unsigned char _1;
  unsigned char _2;
  unsigned char _3;
  uint8_t _6;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  overflow_5 = x_4(D) > 255;
  _1 = (unsigned char) x_4(D);
  _2 = (unsigned char) overflow_5;
  _3 = -_2;
  _6 = _1 | _3;
  return _6;
;;succ:   EXIT

}

After this patch:
__attribute__((noinline))
uint8_t sat_u_truc_uint16_t_to_uint8_t_fmt_1 (uint16_t x)
{
  uint8_t _6;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  _6 = .SAT_TRUNC (x_4(D)); [tail call]
  return _6;
;;succ:   EXIT

}

The below tests suites are passed for this patch
1. The rv64gcv fully regression test.
2. The rv64gcv build with glibc

gcc/ChangeLog:

* config/riscv/iterators.md (ANYI_DOUBLE_TRUNC): Add new iterator
for int double truncation.
(ANYI_DOUBLE_TRUNCATED): Add new attr for int double truncation.
(anyi_double_truncated): Ditto but for lowercase.
* config/riscv/riscv-protos.h (riscv_expand_ustrunc): Add new
func decl for expanding ustrunc
* config/riscv/riscv.cc (riscv_expand_ustrunc): Add new func
impl to expand ustrunc.
* config/riscv/riscv.md (ustrunc2): 
Impl
the new pattern ustrunc2 for int.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add test helper macro.
* gcc.target/riscv/sat_arith_data.h: New test.
* gcc.target/riscv/sat_u_trunc-1.c: New test.
* gcc.target/riscv/sat_u_trunc-2.c: New test.
* gcc.target/riscv/sat_u_trunc-3.c: New test.
* gcc.target/riscv/sat_u_trunc-run-1.c: New test.
* gcc.target/riscv/sat_u_trunc-run-2.c: New test.
* gcc.target/riscv/sat_u_trunc-run-3.c: New test.
* gcc.target/riscv/scalar_sat_unary.h: New test.

Signed-off-by: Pan Li 
(cherry picked from commit 5d2115b850df63b0ecdf56efb720ad848e7afe21)

Diff:
---
 gcc/config/riscv/iterators.md  | 10 
 gcc/config/riscv/riscv-protos.h|  1 +
 gcc/config/riscv/riscv.cc  | 40 
 gcc/config/riscv/riscv.md  | 10 
 gcc/testsuite/gcc.target/riscv/sat_arith.h | 16 +++
 gcc/testsuite/gcc.target/riscv/sat_arith_data.h| 56 ++
 gcc/testsuite/gcc.target/riscv/sat_u_trunc-1.c | 17 +++
 gcc/testsuite/gcc.target/riscv/sat_u_trunc-2.c | 20 
 gcc/testsuite/gcc.target/riscv/sat_u_trunc-3.c | 19 
 gcc/testsuite/gcc.target/riscv/sat_u_trunc-run-1.c | 16 +++
 gcc/testsuite/gcc.target/riscv/sat_u_trunc-run-2.c | 16 +++
 gcc/testsuite/gcc.target/riscv/sat_u_trunc-run-3.c | 16 +++
 gcc/testsuite/gcc.target/riscv/scalar_sat_unary.h  | 22 +
 13 files changed, 259 insertions(+)

diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index d61ed53a8b1b..734da041f0cb 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -65,6 +65,16 @@
 ;; Iterator for hardware-supported integer modes.
 (define_mode_iterator ANYI [QI HI SI (DI "TARGET_64BIT")])
 
+(define_mode_iterator ANYI_DOUBLE_TRUNC [HI SI (DI "TARGET_64BIT")])
+
+(define_mode_attr ANYI_DOUBLE_TRUNCATED [
+  (HI "QI") (SI "HI") (DI "SI")
+])
+
+(define_mode_attr anyi_double_truncated [
+  (HI "qi") (SI "hi") (DI "si")
+])
+
 ;; Iterator for hardware-supported floating-point modes.
 (define_mode_iterator ANYF [(SF "TARGET_HARD_FLOAT || TARGET_ZFINX")
(DF "TARGET_DOUBLE_FLOAT || TARGET_ZDINX")
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 7c0ea1b445b1..ce5e38d3dbbf 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -135,6 +135,7 @@ riscv_zcmp_valid_stack_adj_bytes_p (HOST_WIDE_INT, int);
 extern void riscv_legitimize_poly_move (machine_mode, rtx, rtx, rtx);
 extern void riscv_expand_usadd (rtx, rtx, 

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] [PR rtl-optimization/115877] Fix livein computation for ext-dce

2024-07-23 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:3e82d5753917b74abcbb9212465eec8d89fef824

commit 3e82d5753917b74abcbb9212465eec8d89fef824
Author: Jeff Law 
Date:   Sun Jul 21 07:36:37 2024 -0600

[PR rtl-optimization/115877] Fix livein computation for ext-dce

So I'm not yet sure how I'm going to break everything down, but this is easy
enough to break out as 1/N of ext-dce fixes/improvements.

When handling uses in an insn, we first determine what bits are set in the
destination which is represented in DST_MASK.  Then we use that to refine 
what
bits are live in the source operands.

In the source operand handling section we *modify* DST_MASK if the source
operand is a SUBREG (ugh!).  So if the first operand is a SUBREG, then we 
can
incorrectly compute which bit groups are live in the second operand, 
especially
if it is a SUBREG as well.

This was seen when testing a larger set of patches on the rl78 port
(builtin-arith-overflow-p-7 & pr71631 execution failures), so no new test 
for
this bugfix.

Run through my tester (in conjunction with other ext-dce changes) on the
various cross targets.  Run individually through a bootstrap and regression
test cycle on x86_64 as well.

Pushing to the trunk.

PR rtl-optimization/115877
gcc/
* ext-dce.cc (ext_dce_process_uses): Restore the value of DST_MASK
for reach operand.

(cherry picked from commit 91e468b72dafc9dcd5dcf7915f1d0ef172264d53)

Diff:
---
 gcc/ext-dce.cc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/ext-dce.cc b/gcc/ext-dce.cc
index 7270de2a3bfe..d431f8ac12d4 100644
--- a/gcc/ext-dce.cc
+++ b/gcc/ext-dce.cc
@@ -591,8 +591,10 @@ ext_dce_process_uses (rtx_insn *insn, rtx obj, bitmap 
live_tmp)
 making things live.  Breaking from this loop will cause
 the iterator to work on sub-rtxs, so it is safe to break
 if we see something we don't know how to handle.  */
+ unsigned HOST_WIDE_INT save_mask = dst_mask;
  for (;;)
{
+ dst_mask = save_mask;
  /* Strip an outer paradoxical subreg.  The bits outside
 the inner mode are don't cares.  So we can just strip
 and process the inner object.  */


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Rearrange the test helper files for vector .SAT_*

2024-07-23 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:97d90509f1d6b8189d6492d51383c06239c57bbe

commit 97d90509f1d6b8189d6492d51383c06239c57bbe
Author: Pan Li 
Date:   Sat Jul 20 10:43:44 2024 +0800

RISC-V: Rearrange the test helper files for vector .SAT_*

Rearrange the test help header files,  as well as align the name
conventions.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_binary.h: Move to...
* gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vvv_run.h: 
...here.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_scalar.h: Move 
to...
* gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vvx_run.h: 
...here.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vx.h: Move to...
* gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vx_run.h: 
...here.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-1.c: Adjust
the include file names.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-10.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-11.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-12.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-13.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-14.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-15.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-16.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-17.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-18.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-19.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-20.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-21.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-22.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-23.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-24.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-25.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-26.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-27.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-28.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-29.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-30.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-31.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-32.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-9.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-10.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-11.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-12.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-13.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-14.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-15.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-16.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-17.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-18.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-19.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-20.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-21.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-22.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-23.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-24.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-25.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-26.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-27.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-run-28.c: Ditto.
   

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] [PR rtl-optimization/115877][2/n] Improve liveness computation for constant initialization

2024-07-23 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:faadb6b663ea7bda38ebfcd1ed772882cdc725da

commit faadb6b663ea7bda38ebfcd1ed772882cdc725da
Author: Jeff Law 
Date:   Sun Jul 21 08:41:28 2024 -0600

[PR rtl-optimization/115877][2/n] Improve liveness computation for constant 
initialization

While debugging pr115877, I noticed we were failing to remove the 
destination
register from LIVENOW bitmap when it was set to a constant value.  ie  (set
(dest) (const_int)).  This was a trivial oversight in
safe_for_live_propagation.

I don't have an example of this affecting code generation, but it certainly
could.  More importantly, by making LIVENOW more accurate it's easier to 
debug
when LIVENOW differs from expectations.

As with the prior patch this has been tested as part of a larger patchset 
with
the crosses as well as individually on x86_64.

Pushing to the trunk,

PR rtl-optimization/115877
gcc/
* ext-dce.cc (safe_for_live_propagation): Handle RTX_CONST_OBJ.

(cherry picked from commit 9d8ef2711dfecd093077aef6123d9e93ea23454e)

Diff:
---
 gcc/ext-dce.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/ext-dce.cc b/gcc/ext-dce.cc
index d431f8ac12d4..59bcc4572d57 100644
--- a/gcc/ext-dce.cc
+++ b/gcc/ext-dce.cc
@@ -69,6 +69,7 @@ safe_for_live_propagation (rtx_code code)
   switch (GET_RTX_CLASS (code))
 {
   case RTX_OBJ:
+  case RTX_CONST_OBJ:
return true;
 
   case RTX_COMPARE:


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Fix testcase missing arch attribute

2024-07-23 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:bdb2115f7ee854a6daecf6079274700321f1a2b5

commit bdb2115f7ee854a6daecf6079274700321f1a2b5
Author: Edwin Lu 
Date:   Tue Jul 16 17:43:45 2024 -0700

RISC-V: Fix testcase missing arch attribute

The C + F extention implies the zcf extension on rv32. Add missing zcf
extension for the rv32 target.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/target-attr-16.c: Update expected assembly

Signed-off-by: Edwin Lu 
(cherry picked from commit 5bb01e91d40c34e8f8230b142f7ebff3d6aa88d1)

Diff:
---
 gcc/testsuite/gcc.target/riscv/target-attr-16.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/riscv/target-attr-16.c 
b/gcc/testsuite/gcc.target/riscv/target-attr-16.c
index 1c7badccdeee..c6b626d0c6ce 100644
--- a/gcc/testsuite/gcc.target/riscv/target-attr-16.c
+++ b/gcc/testsuite/gcc.target/riscv/target-attr-16.c
@@ -24,5 +24,5 @@ void bar (void)
 {
 }
 
-/* { dg-final { scan-assembler-times ".option arch, 
rv32i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_zaamo1p0_zalrsc1p0_zca1p0_zcd1p0_zba1p0_zbb1p0"
 4 { target { rv32 } } } } */
+/* { dg-final { scan-assembler-times ".option arch, 
rv32i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_zaamo1p0_zalrsc1p0_zca1p0_zcd1p0_zcf1p0_zba1p0_zbb1p0"
 4 { target { rv32 } } } } */
 /* { dg-final { scan-assembler-times ".option arch, 
rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_zaamo1p0_zalrsc1p0_zca1p0_zcd1p0_zba1p0_zbb1p0"
 4 { target { rv64 } } } } */


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] Add debug counter for ext_dce

2024-07-23 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:b33c9eebd9581a86d56e3cdba2bef96fda1727f4

commit b33c9eebd9581a86d56e3cdba2bef96fda1727f4
Author: Andrew Pinski 
Date:   Tue Jul 16 09:53:20 2024 -0700

Add debug counter for ext_dce

Like r15-1610-gb6215065a5b143 (which adds one for late_combine),
adding one for ext_dce is useful to debug some issues with this pass.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* dbgcnt.def (ext_dce): New debug counter.
* ext-dce.cc (ext_dce_try_optimize_insn): Reject the insn
if the debug counter says so.
(ext_dce): Rename to ...
(ext_dce_execute): This.
(pass_ext_dce::execute): Update for the name of ext_dce.

Signed-off-by: Andrew Pinski 
(cherry picked from commit 7c3287f3613210d4f98c8095bc739bea6582bfbb)

Diff:
---
 gcc/dbgcnt.def |  1 +
 gcc/ext-dce.cc | 16 +---
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/gcc/dbgcnt.def b/gcc/dbgcnt.def
index ed9f062eac2c..4e7aaeae2da5 100644
--- a/gcc/dbgcnt.def
+++ b/gcc/dbgcnt.def
@@ -162,6 +162,7 @@ DEBUG_COUNTER (dom_unreachable_edges)
 DEBUG_COUNTER (dse)
 DEBUG_COUNTER (dse1)
 DEBUG_COUNTER (dse2)
+DEBUG_COUNTER (ext_dce)
 DEBUG_COUNTER (form_fma)
 DEBUG_COUNTER (gcse2_delete)
 DEBUG_COUNTER (gimple_unroll)
diff --git a/gcc/ext-dce.cc b/gcc/ext-dce.cc
index 7ecb99fef81d..7270de2a3bfe 100644
--- a/gcc/ext-dce.cc
+++ b/gcc/ext-dce.cc
@@ -33,6 +33,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "rtl-iter.h"
 #include "df.h"
 #include "print-rtl.h"
+#include "dbgcnt.h"
 
 /* These should probably move into a C++ class.  */
 static vec livein;
@@ -312,6 +313,15 @@ ext_dce_try_optimize_insn (rtx_insn *insn, rtx set)
   print_rtl_single (dump_file, SET_SRC (set));
 }
 
+  /* We decided to turn do the optimization but allow it to be rejected for
+ bisection purposes.  */
+  if (!dbg_cnt (::ext_dce))
+{
+  if (dump_file)
+   fprintf (dump_file, "Rejected due to debug counter.\n");
+  return;
+}
+
   new_pattern = simplify_gen_subreg (GET_MODE (src), inner,
 GET_MODE (inner), 0);
   /* simplify_gen_subreg may fail in which case NEW_PATTERN will be NULL.
@@ -881,8 +891,8 @@ static bool ext_dce_rd_confluence_n (edge) { return true; }
are never read.  Turn such extensions into SUBREGs instead which
can often be propagated away.  */
 
-static void
-ext_dce (void)
+void
+ext_dce_execute (void)
 {
   df_analyze ();
   ext_dce_init ();
@@ -929,7 +939,7 @@ public:
   virtual bool gate (function *) { return flag_ext_dce && optimize > 0; }
   virtual unsigned int execute (function *)
 {
-  ext_dce ();
+  ext_dce_execute ();
   return 0;
 }


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] Revert "RISC-V: Attribute parser: Use alloca() instead of new + std::unique_ptr"

2024-07-23 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:20fe3e21e824daeb20679a24d3de78969c17710c

commit 20fe3e21e824daeb20679a24d3de78969c17710c
Author: Christoph Müllner 
Date:   Mon Jul 15 23:42:39 2024 +0200

Revert "RISC-V: Attribute parser: Use alloca() instead of new + 
std::unique_ptr"

This reverts commit 5040c273484d7123a40a99cdeb434cecbd17a2e9.

(cherry picked from commit eb0c163aada970b8351067b17121f013fc58dbc9)

Diff:
---
 gcc/config/riscv/riscv-target-attr.cc | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/riscv-target-attr.cc 
b/gcc/config/riscv/riscv-target-attr.cc
index 57235c9c0a7e..1645a6692177 100644
--- a/gcc/config/riscv/riscv-target-attr.cc
+++ b/gcc/config/riscv/riscv-target-attr.cc
@@ -101,7 +101,8 @@ riscv_target_attr_parser::parse_arch (const char *str)
 {
   /* Parsing the extension list like "+[,+]*".  */
   size_t len = strlen (str);
-  char *str_to_check = (char *) alloca (len + 1);
+  std::unique_ptr buf (new char[len+1]);
+  char *str_to_check = buf.get ();
   strcpy (str_to_check, str);
   const char *token = strtok_r (str_to_check, ",", _to_check);
   const char *local_arch_str = global_options.x_riscv_arch_string;
@@ -253,7 +254,8 @@ riscv_process_one_target_attr (char *arg_str,
   return false;
 }
 
-  char *str_to_check = (char *) alloca (len + 1);
+  std::unique_ptr buf (new char[len+1]);
+  char *str_to_check = buf.get();
   strcpy (str_to_check, arg_str);
 
   char *arg = strchr (str_to_check, '=');
@@ -339,7 +341,8 @@ riscv_process_target_attr (tree args, location_t loc)
   return false;
 }
 
-  char *str_to_check = (char *) alloca (len + 1);
+  std::unique_ptr buf (new char[len+1]);
+  char *str_to_check = buf.get ();
   strcpy (str_to_check, TREE_STRING_POINTER (args));
 
   /* Used to catch empty spaces between semi-colons i.e.


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] Fix liveness computation for shift/rotate counts in ext-dce

2024-07-23 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:fa543ce46c2e205f2813e13fd9d4df65e8544b87

commit fa543ce46c2e205f2813e13fd9d4df65e8544b87
Author: Jeff Law 
Date:   Mon Jul 15 18:15:33 2024 -0600

Fix liveness computation for shift/rotate counts in ext-dce

So as I've noted before I believe the control flow in ext-dce.cc is horribly
messy.  While investigating a fix for 115877 I came across another problem
related to control flow handling.

Specifically, if we have an binary op which implies the 2nd operand is fully
live, then we'd actually fail to mark that operand as live.

We essentially broke out of the loop which was supposed to be safe.  But Y 
was
a REG and if Y is a REG or CONST_INT we skip sub-rtxs and thus failed to
process that operand (the shift count) at all.

Rather than muck around with control flow, we can just set all the bits as 
live
in DST_MASK and let normal processing continue.  With all the bits live IN
DST_MASK all the bits implied by the mode of the argument will also be live.

No testcase.

Bootstrapped and regression tested on x86.  Pushing to the trunk.

gcc/
* ext-dce.cc (ext_dce_process_uses): Simplify control flow and fix
liveness computation for shift/rotate counts.

(cherry picked from commit b31b8af807f5459674b0b310cb62a5bc81b676e7)

Diff:
---
 gcc/ext-dce.cc | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/ext-dce.cc b/gcc/ext-dce.cc
index 91789d283fcd..7ecb99fef81d 100644
--- a/gcc/ext-dce.cc
+++ b/gcc/ext-dce.cc
@@ -632,10 +632,11 @@ ext_dce_process_uses (rtx_insn *insn, rtx obj, bitmap 
live_tmp)
  else if (!CONSTANT_P (y))
break;
 
- /* We might have (ashift (const_int 1) (reg...)) */
- /* XXX share this logic with code below.  */
+ /* We might have (ashift (const_int 1) (reg...))
+By setting dst_mask we can continue iterating on the
+the next operand and it will be considered fully live.  */
  if (binop_implies_op2_fully_live (GET_CODE (src)))
-   break;
+   dst_mask = -1;
 
  /* If this was anything but a binary operand, break the inner
 loop.  This is conservatively correct as it will cause the


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Rewrite target attribute handling

2024-07-23 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:fa716b37c5c85663b8f73d725a2d4020116e2a77

commit fa716b37c5c85663b8f73d725a2d4020116e2a77
Author: Christoph Müllner 
Date:   Sat Jun 22 21:59:04 2024 +0200

RISC-V: Rewrite target attribute handling

The target-arch attribute handling in RISC-V is only a few months old,
but already saw a rewrite (9941f0295a14), which addressed an important
issue.  This rewrite introduced a hash table in the backend, which is
used to keep track of target-arch attributes of all functions.
The index of this hash table is the pointer to the function declaration
object (fndecl).  However, objects like these don't have the lifetime
that is assumed here, which resulted in observing two fndecl objects
with the same address for different objects (triggering the assertion
in riscv_func_target_put() -- see also PR115562).

This patch removes the hash table approach in favor of storing target
specific options using the DECL_FUNCTION_SPECIFIC_TARGET() macro, which
is also used by other backends and is specifically designed for this
purpose (https://gcc.gnu.org/onlinedocs/gccint/Function-Properties.html).

To have an accessible field in the target options, we need to
adjust riscv.opt and introduce the field riscv_arch_string
(for the already existing option '-march=').

Using this macro allows to remove much code from riscv-common.cc, which
controls access to the objects 'func_target_table' and 
'current_subset_list'.

One thing to mention is, that we had two subset lists:
current_subset_list and cmdline_subset_list, with the latter being
introduced recently for target attribute handling.
This patch reduces them back to one (cmdline_subset_list) which
contains the list of extensions that have been enabled by the command
line arguments.

Note that the patch keeps the existing behavior of rejecting
duplications of extensions when added via the '+' operator in a function
target attribute.  E.g. "-march=rv64gc_zbb" and "arch=+zbb" will trigger
an error (see pr115554.c).  However, at the same time this patch breaks
the acceptance of adding implied extensions, which causes the following
six regressions (with the error "extension 'EXT' appear more than one 
time"):
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-39.c
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-42.c
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-43.c
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-44.c
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-45.c
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-46.c

New tests were added to document the behavior and to ensure it won't
regress.  This patch did not show any regressions for rv32/rv64
and fixes the ICEs from PR115554 and PR115562.

PR target/115554
PR target/115562

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (struct 
riscv_func_target_info):
Remove.
(struct riscv_func_target_hasher): Likewise.
(riscv_func_decl_hash): Likewise.
(riscv_func_target_hasher::hash): Likewise.
(riscv_func_target_hasher::equal): Likewise.
(riscv_current_subset_list): Likewise.
(riscv_cmdline_subset_list): Remove obsolete space.
(riscv_func_target_table_lazy_init): Remove.
(riscv_func_target_get): Likewise.
(riscv_func_target_put): Likewise.
(riscv_func_target_remove_and_destory): Likewise.
(riscv_arch_str): Generate from cmdline_subset_list.
(riscv_set_arch_by_subset_list): Don't set current_subset_list.
(riscv_parse_arch_string): Remove current_subset_list.
* config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins):
Get subset list via riscv_cmdline_subset_list().
* config/riscv/riscv-subset.h (riscv_current_subset_list):
Remove prototype.
(riscv_func_target_get): Likewise.
(riscv_func_target_put): Likewise.
(riscv_func_target_remove_and_destory): Likewise.
* config/riscv/riscv-target-attr.cc 
(riscv_target_attr_parser::parse_arch):
Build base arch string from existing target options, if any.
(riscv_target_attr_parser::update_settings): Store new arch
string in target options.
(riscv_process_one_target_attr): Whitespace fix.
(riscv_process_target_attr): Drop opts argument.
(riscv_option_valid_attribute_p): Properly save, change and restore
target options.
* config/riscv/riscv.cc (get_arch_str): New function.
(riscv_declare_function_name): Get arch string for option-arch
directive from function's target options.

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Allow adding enabled extension via target arch attributes

2024-07-23 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:4157d59413a5f35808603e06de06cfb388811a65

commit 4157d59413a5f35808603e06de06cfb388811a65
Author: Christoph Müllner 
Date:   Sat Jul 6 17:03:18 2024 +0200

RISC-V: Allow adding enabled extension via target arch attributes

The set of enabled extensions can be extended via target arch function
attributes by listing each extension with a '+' prefix and a comma as
list separator.  E.g.:
  __attribute__((target("arch=+zba,+zbb"))) void foo();

The programmer intends to ensure that one or more extensions
are enabled when building the code.  This is independent of the arch
string that is passed at build time via the -march= option.

Therefore, it is reasonable to allow enabling extensions via target arch
attributes, which have already been enabled via the -march= string.

The subset list code already supports such duplication for implied
extensions.  This patch adds an interface so the subset list
parser can be switched into a mode where duplication is allowed.

This commit fixes the following regressed test cases:
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-39.c
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-42.c
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-43.c
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-44.c
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-45.c
* gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-46.c

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (riscv_subset_list::add):
Allow adding enabled extension if m_allow_adding_dup is set.
* config/riscv/riscv-subset.h: Add m_allow_adding_dup and setter.
* config/riscv/riscv-target-attr.cc 
(riscv_target_attr_parser::parse_arch):
Allow adding enabled extensions.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr115554.c: Change expected fail to expected 
pass.
* gcc.target/riscv/target-attr-16.c: New test.

Signed-off-by: Christoph Müllner 
(cherry picked from commit 61c21a719e205f70bd046c6a0275d1a3fd6341a4)

Diff:
---
 gcc/common/config/riscv/riscv-common.cc | 17 +--
 gcc/config/riscv/riscv-subset.h |  5 +
 gcc/config/riscv/riscv-target-attr.cc   |  3 +++
 gcc/testsuite/gcc.target/riscv/pr115554.c   |  2 --
 gcc/testsuite/gcc.target/riscv/target-attr-16.c | 28 +
 5 files changed, 47 insertions(+), 8 deletions(-)

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 8e9beb6801f9..682826c0e344 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -702,12 +702,17 @@ riscv_subset_list::add (const char *subset, int 
major_version,
  ext->minor_version = minor_version;
}
   else
-   error_at (
- m_loc,
- "%<-march=%s%>: extension %qs appear more than one time",
- m_arch,
- subset);
-
+   {
+ /* The extension is already in the list.  */
+ if (!m_allow_adding_dup
+ || ext->major_version != major_version
+ || ext->minor_version != minor_version)
+   error_at (
+ m_loc,
+ "%<-march=%s%>: extension %qs appear more than one time",
+ m_arch,
+ subset);
+   }
   return;
 }
   else if (strlen (subset) == 1 && !standard_extensions_p (subset))
diff --git a/gcc/config/riscv/riscv-subset.h b/gcc/config/riscv/riscv-subset.h
index 279716feab57..dace4de65753 100644
--- a/gcc/config/riscv/riscv-subset.h
+++ b/gcc/config/riscv/riscv-subset.h
@@ -65,6 +65,9 @@ private:
   /* Number of subsets. */
   unsigned m_subset_num;
 
+  /* Allow adding the same extension more than once.  */
+  bool m_allow_adding_dup;
+
   riscv_subset_list (const char *, location_t);
 
   const char *parsing_subset_version (const char *, const char *, unsigned *,
@@ -109,6 +112,8 @@ public:
 
   void set_loc (location_t);
 
+  void set_allow_adding_dup (bool v) { m_allow_adding_dup = v; }
+
   void finalize ();
 };
 
diff --git a/gcc/config/riscv/riscv-target-attr.cc 
b/gcc/config/riscv/riscv-target-attr.cc
index 317806143949..57235c9c0a7e 100644
--- a/gcc/config/riscv/riscv-target-attr.cc
+++ b/gcc/config/riscv/riscv-target-attr.cc
@@ -109,6 +109,8 @@ riscv_target_attr_parser::parse_arch (const char *str)
  ? riscv_subset_list::parse (local_arch_str, m_loc)
  : riscv_cmdline_subset_list ()->clone ();
   m_subset_list->set_loc (m_loc);
+  m_subset_list->set_allow_adding_dup (true);
+
   while (token)
{
  if (token[0] != '+')
@@ -134,6 +136,7 @@ riscv_target_attr_parser::parse_arch (const char *str)
  token = strtok_r (NULL, ",", _to_check);
}
 
+

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Attribute parser: Use alloca() instead of new + std::unique_ptr

2024-07-23 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:64b4b5211aa664e224e3cd722ab5aa11f278aa68

commit 64b4b5211aa664e224e3cd722ab5aa11f278aa68
Author: Christoph Müllner 
Date:   Fri Jul 5 04:48:15 2024 +0200

RISC-V: Attribute parser: Use alloca() instead of new + std::unique_ptr

Allocating an object on the heap with new, wrapping it in a
std::unique_ptr and finally getting the buffer via buf.get()
is a correct way to allocate a buffer that is automatically
freed on return.  However, a simple invocation of alloca()
does the same with less overhead.

gcc/ChangeLog:

* config/riscv/riscv-target-attr.cc 
(riscv_target_attr_parser::parse_arch):
Replace new + std::unique_ptr by alloca().
(riscv_process_one_target_attr): Likewise.
(riscv_process_target_attr): Likewise.

Signed-off-by: Christoph Müllner 
(cherry picked from commit 5040c273484d7123a40a99cdeb434cecbd17a2e9)

Diff:
---
 gcc/config/riscv/riscv-target-attr.cc | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/gcc/config/riscv/riscv-target-attr.cc 
b/gcc/config/riscv/riscv-target-attr.cc
index 0bbe7df25d19..3d7753f64574 100644
--- a/gcc/config/riscv/riscv-target-attr.cc
+++ b/gcc/config/riscv/riscv-target-attr.cc
@@ -109,8 +109,7 @@ riscv_target_attr_parser::parse_arch (const char *str)
 {
   /* Parsing the extension list like "+[,+]*".  */
   size_t len = strlen (str);
-  std::unique_ptr buf (new char[len+1]);
-  char *str_to_check = buf.get ();
+  char *str_to_check = (char *) alloca (len + 1);
   strcpy (str_to_check, str);
   const char *token = strtok_r (str_to_check, ",", _to_check);
   m_subset_list = riscv_cmdline_subset_list ()->clone ();
@@ -247,8 +246,7 @@ riscv_process_one_target_attr (char *arg_str,
   return false;
 }
 
-  std::unique_ptr buf (new char[len+1]);
-  char *str_to_check = buf.get();
+  char *str_to_check = (char *) alloca (len + 1);
   strcpy (str_to_check, arg_str);
 
   char *arg = strchr (str_to_check, '=');
@@ -334,8 +332,7 @@ riscv_process_target_attr (tree fndecl, tree args, 
location_t loc,
   return false;
 }
 
-  std::unique_ptr buf (new char[len+1]);
-  char *str_to_check = buf.get ();
+  char *str_to_check = (char *) alloca (len + 1);
   strcpy (str_to_check, TREE_STRING_POINTER (args));
 
   /* Used to catch empty spaces between semi-colons i.e.


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Implement locality for __builtin_prefetch

2024-07-23 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:e8be9b1c419d1b59a6f5fe5f166c43bfea27ec0e

commit e8be9b1c419d1b59a6f5fe5f166c43bfea27ec0e
Author: Monk Chiang 
Date:   Thu Jul 6 14:05:17 2023 +0800

RISC-V: Implement locality for __builtin_prefetch

The patch add the Zihintntl instructions in the prefetch pattern.
Zicbop has prefetch instructions. Zihintntl has NTL instructions.
Insert NTL instructions before prefetch instruction, if target
has Zihintntl extension.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_print_operand): Add 'L' letter
to print zihintntl instructions string.
* config/riscv/riscv.md (prefetch): Add zihintntl instructions.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/prefetch-zicbop.c: New test.
* gcc.target/riscv/prefetch-zihintntl.c: New test.

(cherry picked from commit bf26413fc4081dfd18b915580b35bdb71481327e)

Diff:
---
 gcc/config/riscv/riscv.cc  | 22 ++
 gcc/config/riscv/riscv.md  | 10 +++---
 gcc/testsuite/gcc.target/riscv/prefetch-zicbop.c   | 20 
 .../gcc.target/riscv/prefetch-zihintntl.c  | 20 
 4 files changed, 69 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index d4553aacee96..9bedefa74c35 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -6488,6 +6488,7 @@ riscv_asm_output_opcode (FILE *asm_out_file, const char 
*p)
'A' Print the atomic operation suffix for memory model OP.
'I' Print the LR suffix for memory model OP.
'J' Print the SC suffix for memory model OP.
+   'L' Print a non-temporal locality hints instruction.
'z' Print x0 if OP is zero, otherwise print OP normally.
'i' Print i if the operand is not a register.
'S' Print shift-index of single-bit mask OP.
@@ -6682,6 +6683,27 @@ riscv_print_operand (FILE *file, rtx op, int letter)
   break;
 }
 
+case 'L':
+  {
+   const char *ntl_hint = NULL;
+   switch (INTVAL (op))
+ {
+ case 0:
+   ntl_hint = "ntl.all";
+   break;
+ case 1:
+   ntl_hint = "ntl.pall";
+   break;
+ case 2:
+   ntl_hint = "ntl.p1";
+   break;
+ }
+
+  if (ntl_hint)
+   asm_fprintf (file, "%s\n\t", ntl_hint);
+  break;
+  }
+
 case 'i':
   if (code != REG)
 fputs ("i", file);
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 379015c60de8..46c46039c33a 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -4113,12 +4113,16 @@
 {
   switch (INTVAL (operands[1]))
   {
-case 0: return "prefetch.r\t%a0";
-case 1: return "prefetch.w\t%a0";
+case 0: return TARGET_ZIHINTNTL ? "%L2prefetch.r\t%a0" : "prefetch.r\t%a0";
+case 1: return TARGET_ZIHINTNTL ? "%L2prefetch.w\t%a0" : "prefetch.w\t%a0";
 default: gcc_unreachable ();
   }
 }
-  [(set_attr "type" "store")])
+  [(set_attr "type" "store")
+   (set (attr "length") (if_then_else (and (match_test "TARGET_ZIHINTNTL")
+  (match_test "IN_RANGE (INTVAL 
(operands[2]), 0, 2)"))
+ (const_string "8")
+ (const_string "4")))])
 
 (define_insn "riscv_prefetchi_"
   [(unspec_volatile:X [(match_operand:X 0 "address_operand" "r")
diff --git a/gcc/testsuite/gcc.target/riscv/prefetch-zicbop.c 
b/gcc/testsuite/gcc.target/riscv/prefetch-zicbop.c
new file mode 100644
index ..0faa120f1f79
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/prefetch-zicbop.c
@@ -0,0 +1,20 @@
+/* { dg-do compile target { { rv64-*-*}}} */
+/* { dg-options "-march=rv64gc_zicbop -mabi=lp64" } */
+
+void foo (char *p)
+{
+  __builtin_prefetch (p, 0, 0);
+  __builtin_prefetch (p, 0, 1);
+  __builtin_prefetch (p, 0, 2);
+  __builtin_prefetch (p, 0, 3);
+  __builtin_prefetch (p, 1, 0);
+  __builtin_prefetch (p, 1, 1);
+  __builtin_prefetch (p, 1, 2);
+  __builtin_prefetch (p, 1, 3);
+}
+
+/* { dg-final { scan-assembler-not "ntl.all\t" } } */
+/* { dg-final { scan-assembler-not "ntl.pall\t" } } */
+/* { dg-final { scan-assembler-not "ntl.p1\t" } } */
+/* { dg-final { scan-assembler-times "prefetch.r" 4 } } */
+/* { dg-final { scan-assembler-times "prefetch.w" 4 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/prefetch-zihintntl.c 
b/gcc/testsuite/gcc.target/riscv/prefetch-zihintntl.c
new file mode 100644
index ..78a3afe68333
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/prefetch-zihintntl.c
@@ -0,0 +1,20 @@
+/* { dg-do compile target { { rv64-*-*}}} */
+/* { dg-options "-march=rv64gc_zicbop_zihintntl -mabi=lp64" } */
+
+void foo (char *p)
+{
+  __builtin_prefetch (p, 0, 0);
+  __builtin_prefetch (p, 0, 1);
+  __builtin_prefetch (p, 0, 2);
+  __builtin_prefetch (p, 0, 3);
+  __builtin_prefetch (p, 1, 0);

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Fix testcase for vector .SAT_SUB in zip benchmark

2024-07-23 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:21814631a11523712913a1fdec4055176aa89e28

commit 21814631a11523712913a1fdec4055176aa89e28
Author: Edwin Lu 
Date:   Fri Jul 12 11:31:16 2024 -0700

RISC-V: Fix testcase for vector .SAT_SUB in zip benchmark

The following testcase was not properly testing anything due to an
uninitialized variable. As a result, the loop was not iterating through
the testing data, but instead on undefined values which could cause an
unexpected abort.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vx.h:
initialize variable

Signed-off-by: Edwin Lu 
(cherry picked from commit 4306f76192bc7ab71c5997a7e2c95320505029ab)

Diff:
---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vx.h | 1 +
 1 file changed, 1 insertion(+)

diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vx.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vx.h
index d238c6392def..309d63377d53 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vx.h
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vx.h
@@ -9,6 +9,7 @@ main ()
 
   for (i = 0; i < sizeof (DATA) / sizeof (DATA[0]); i++)
 {
+  d = DATA[i];
   RUN_BINARY_VX ([N], d.b, N);
 
   for (k = 0; k < N; k++)


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Add md files for vector BFloat16

2024-07-23 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:c2b212fbed411183ca108323d674fcd62028c851

commit c2b212fbed411183ca108323d674fcd62028c851
Author: Feng Wang 
Date:   Tue Jun 18 06:13:35 2024 +

RISC-V: Add md files for vector BFloat16

V3: Add Bfloat16 vector insn in generic-vector-ooo.md
v2: Rebase
Accroding to the BFloat16 spec, some vector iterators and new pattern
are added in md files.

Signed-off-by: Feng Wang 
gcc/ChangeLog:

* config/riscv/generic-vector-ooo.md: Add def_insn_reservation for 
vector BFloat16.
* config/riscv/riscv.md: Add new insn name for vector BFloat16.
* config/riscv/vector-iterators.md: Add some iterators for vector 
BFloat16.
* config/riscv/vector.md: Add some attribute for vector BFloat16.
* config/riscv/vector-bfloat16.md: New file. Add insn pattern 
vector BFloat16.

(cherry picked from commit 9f521632dd9ce71ce28ff1da9c161f76bc20fe3e)

Diff:
---
 gcc/config/riscv/generic-vector-ooo.md |   4 +-
 gcc/config/riscv/riscv.md  |  13 ++-
 gcc/config/riscv/vector-bfloat16.md| 135 ++
 gcc/config/riscv/vector-iterators.md   | 169 -
 gcc/config/riscv/vector.md | 103 +---
 5 files changed, 407 insertions(+), 17 deletions(-)

diff --git a/gcc/config/riscv/generic-vector-ooo.md 
b/gcc/config/riscv/generic-vector-ooo.md
index 5e933c838418..efe6bc41e864 100644
--- a/gcc/config/riscv/generic-vector-ooo.md
+++ b/gcc/config/riscv/generic-vector-ooo.md
@@ -53,7 +53,7 @@
 (define_insn_reservation "vec_fcmp" 3
   (eq_attr "type" "vfrecp,vfminmax,vfcmp,vfsgnj,vfclass,vfcvtitof,\
vfcvtftoi,vfwcvtitof,vfwcvtftoi,vfwcvtftof,vfncvtitof,\
-   vfncvtftoi,vfncvtftof")
+   vfncvtftoi,vfncvtftof,vfncvtbf16,vfwcvtbf16")
   "vxu_ooo_issue,vxu_ooo_alu")
 
 ;; Vector integer multiplication.
@@ -69,7 +69,7 @@
 
 ;; Vector float multiplication and FMA.
 (define_insn_reservation "vec_fmul" 6
-  (eq_attr "type" "vfmul,vfwmul,vfmuladd,vfwmuladd")
+  (eq_attr "type" "vfmul,vfwmul,vfmuladd,vfwmuladd,vfwmaccbf16")
   "vxu_ooo_issue,vxu_ooo_alu")
 
 ;; Vector crypto, assumed to be a generic operation for now.
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 5dee837a5878..379015c60de8 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -200,6 +200,7 @@
   RVVMF64BI,RVVMF32BI,RVVMF16BI,RVVMF8BI,RVVMF4BI,RVVMF2BI,RVVM1BI,
   RVVM8QI,RVVM4QI,RVVM2QI,RVVM1QI,RVVMF2QI,RVVMF4QI,RVVMF8QI,
   RVVM8HI,RVVM4HI,RVVM2HI,RVVM1HI,RVVMF2HI,RVVMF4HI,
+  RVVM8BF,RVVM4BF,RVVM2BF,RVVM1BF,RVVMF2BF,RVVMF4BF,
   RVVM8HF,RVVM4HF,RVVM2HF,RVVM1HF,RVVMF2HF,RVVMF4HF,
   RVVM8SI,RVVM4SI,RVVM2SI,RVVM1SI,RVVMF2SI,
   RVVM8SF,RVVM4SF,RVVM2SF,RVVM1SF,RVVMF2SF,
@@ -219,6 +220,11 @@
   RVVM2x4HI,RVVM1x4HI,RVVMF2x4HI,RVVMF4x4HI,
   RVVM2x3HI,RVVM1x3HI,RVVMF2x3HI,RVVMF4x3HI,
   RVVM4x2HI,RVVM2x2HI,RVVM1x2HI,RVVMF2x2HI,RVVMF4x2HI,
+  RVVM1x8BF,RVVMF2x8BF,RVVMF4x8BF,RVVM1x7BF,RVVMF2x7BF,
+  RVVMF4x7BF,RVVM1x6BF,RVVMF2x6BF,RVVMF4x6BF,RVVM1x5BF,
+  RVVMF2x5BF,RVVMF4x5BF,RVVM2x4BF,RVVM1x4BF,RVVMF2x4BF,
+  RVVMF4x4BF,RVVM2x3BF,RVVM1x3BF,RVVMF2x3BF,RVVMF4x3BF,
+  RVVM4x2BF,RVVM2x2BF,RVVM1x2BF,RVVMF2x2BF,RVVMF4x2BF,
   RVVM1x8HF,RVVMF2x8HF,RVVMF4x8HF,RVVM1x7HF,RVVMF2x7HF,
   RVVMF4x7HF,RVVM1x6HF,RVVMF2x6HF,RVVMF4x6HF,RVVM1x5HF,
   RVVMF2x5HF,RVVMF4x5HF,RVVM2x4HF,RVVM1x4HF,RVVMF2x4HF,
@@ -462,6 +468,10 @@
 ;; vsm4rcrypto vector SM4 Rounds instructions
 ;; vsm3me   crypto vector SM3 Message Expansion instructions
 ;; vsm3ccrypto vector SM3 Compression instructions
+;; 18.Vector BF16 instrctions
+;; vfncvtbf16  vector narrowing single floating-point to brain floating-point 
instruction
+;; vfwcvtbf16  vector widening brain floating-point to single floating-point 
instruction
+;; vfwmaccbf16  vector BF16 widening multiply-accumulate
 (define_attr "type"
   "unknown,branch,jump,jalr,ret,call,load,fpload,store,fpstore,
mtc,mfc,const,arith,logical,shift,slt,imul,idiv,move,fmove,fadd,fmul,
@@ -483,7 +493,7 @@
vslideup,vslidedown,vislide1up,vislide1down,vfslide1up,vfslide1down,

vgather,vcompress,vmov,vector,vandn,vbrev,vbrev8,vrev8,vclz,vctz,vcpop,vrol,vror,vwsll,

vclmul,vclmulh,vghsh,vgmul,vaesef,vaesem,vaesdf,vaesdm,vaeskf1,vaeskf2,vaesz,
-   vsha2ms,vsha2ch,vsha2cl,vsm4k,vsm4r,vsm3me,vsm3c"
+   
vsha2ms,vsha2ch,vsha2cl,vsm4k,vsm4r,vsm3me,vsm3c,vfncvtbf16,vfwcvtbf16,vfwmaccbf16"
   (cond [(eq_attr "got" "load") (const_string "load")
 
 ;; If a doubleword move uses these expensive instructions,
@@ -4373,6 +4383,7 @@
 (include "generic-ooo.md")
 (include "vector.md")
 (include "vector-crypto.md")
+(include "vector-bfloat16.md")
 (include "zicond.md")
 (include "sfb.md")
 (include "zc.md")
diff --git a/gcc/config/riscv/vector-bfloat16.md 
b/gcc/config/riscv/vector-bfloat16.md
new file mode 100644
index ..562aa8ee5ed7
--- /dev/null
+++ 

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Add Zvfbfmin and Zvfbfwma intrinsic

2024-07-23 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:958e43c1baa3d40fcbb206bb8469c7782e044e7a

commit 958e43c1baa3d40fcbb206bb8469c7782e044e7a
Author: Feng Wang 
Date:   Mon Jun 17 01:59:57 2024 +

RISC-V: Add Zvfbfmin and Zvfbfwma intrinsic

v3: Modify warning message in riscv.cc
v2: Rebase
Accroding to the intrinsic doc, the 'Zvfbfmin' and 'Zvfbfwma' intrinsic
functions are added by this patch.

Signed-off-by: Feng Wang 
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc (class vfncvtbf16_f):
Add 'Zvfbfmin' intrinsic in bases.
(class vfwcvtbf16_f): Ditto.
(class vfwmaccbf16): Add 'Zvfbfwma' intrinsic in bases.
(BASE): Add BASE macro for 'Zvfbfmin' and 'Zvfbfwma'.
* config/riscv/riscv-vector-builtins-bases.h: Add declaration for 
'Zvfbfmin' and 'Zvfbfwma'.
* config/riscv/riscv-vector-builtins-functions.def 
(REQUIRED_EXTENSIONS):
Add builtins def for 'Zvfbfmin' and 'Zvfbfwma'.
(vfncvtbf16_f): Ditto.
(vfncvtbf16_f_frm): Ditto.
(vfwcvtbf16_f): Ditto.
(vfwmaccbf16): Ditto.
(vfwmaccbf16_frm): Ditto.
* config/riscv/riscv-vector-builtins-shapes.cc (supports_vectype_p):
Add vector intrinsic build judgment for BFloat16.
(build_all): Ditto.
(BASE_NAME_MAX_LEN): Adjust max length.
* config/riscv/riscv-vector-builtins-types.def (DEF_RVV_F32_OPS):
Add new operand type for BFloat16.
(vfloat32mf2_t): Ditto.
(vfloat32m1_t): Ditto.
(vfloat32m2_t): Ditto.
(vfloat32m4_t): Ditto.
(vfloat32m8_t): Ditto.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_F32_OPS): Ditto.
(validate_instance_type_required_extensions):
Add required_ext checking for 'Zvfbfmin' and 'Zvfbfwma'.
* config/riscv/riscv-vector-builtins.h (enum required_ext):
Add required_ext declaration for 'Zvfbfmin' and 'Zvfbfwma'.
(reqired_ext_to_isa_name): Ditto.
(required_extensions_specified): Ditto.
(struct function_group_info): Add match case for 'Zvfbfmin' and 
'Zvfbfwma'.
* config/riscv/riscv.cc (riscv_validate_vector_type):
Add required_ext checking for 'Zvfbfmin' and 'Zvfbfwma'.

(cherry picked from commit 281f021ed4fbf9c2336048e34b6b40c6f7119baa)

Diff:
---
 gcc/config/riscv/riscv-vector-builtins-bases.cc| 69 ++
 gcc/config/riscv/riscv-vector-builtins-bases.h |  7 +++
 .../riscv/riscv-vector-builtins-functions.def  | 15 +
 gcc/config/riscv/riscv-vector-builtins-shapes.cc   | 31 +-
 gcc/config/riscv/riscv-vector-builtins-types.def   | 13 
 gcc/config/riscv/riscv-vector-builtins.cc  | 67 +
 gcc/config/riscv/riscv-vector-builtins.h   | 34 +++
 gcc/config/riscv/riscv.cc  | 13 ++--
 8 files changed, 232 insertions(+), 17 deletions(-)

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 6483faba39c4..193392fbcc2a 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -2417,6 +2417,60 @@ public:
   }
 };
 
+/* Implements vfncvtbf16_f. */
+template 
+class vfncvtbf16_f : public function_base
+{
+public:
+  bool has_rounding_mode_operand_p () const override
+  {
+return FRM_OP == HAS_FRM;
+  }
+
+  bool may_require_frm_p () const override { return true; }
+
+  rtx expand (function_expander ) const override
+  {
+return e.use_exact_insn (code_for_pred_trunc_to_bf16 (e.vector_mode ()));
+  }
+};
+
+/* Implements vfwcvtbf16_f. */
+class vfwcvtbf16_f : public function_base
+{
+public:
+  rtx expand (function_expander ) const override
+  {
+return e.use_exact_insn (code_for_pred_extend_bf16_to (e.vector_mode ()));
+  }
+};
+
+/* Implements vfwmaccbf16. */
+template 
+class vfwmaccbf16 : public function_base
+{
+public:
+  bool has_rounding_mode_operand_p () const override
+  {
+return FRM_OP == HAS_FRM;
+  }
+
+  bool may_require_frm_p () const override { return true; }
+
+  bool has_merge_operand_p () const override { return false; }
+
+  rtx expand (function_expander ) const override
+  {
+if (e.op_info->op == OP_TYPE_vf)
+  return e.use_widen_ternop_insn (
+   code_for_pred_widen_bf16_mul_scalar (e.vector_mode ()));
+if (e.op_info->op == OP_TYPE_vv)
+  return e.use_widen_ternop_insn (
+   code_for_pred_widen_bf16_mul (e.vector_mode ()));
+gcc_unreachable ();
+  }
+};
+
 static CONSTEXPR const vsetvl vsetvl_obj;
 static CONSTEXPR const vsetvl vsetvlmax_obj;
 static CONSTEXPR const loadstore vle_obj;
@@ -2734,6 +2788,14 @@ static CONSTEXPR const crypto_vv   
vsm4r_obj;
 static CONSTEXPR const vsm3me vsm3me_obj;
 static CONSTEXPR const vaeskf2_vsm3c   

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Add vector type of BFloat16 format

2024-07-23 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:a5ca595e63500d080294f81eada2b25e320dd572

commit a5ca595e63500d080294f81eada2b25e320dd572
Author: Feng Wang 
Date:   Thu Jun 13 00:32:14 2024 +

RISC-V: Add vector type of BFloat16 format

v3: Rebase
v2: Rebase
The vector type of BFloat16 format is added in this patch,
subsequent extensions to zvfbfmin and zvfwma need to be based
on this patch.

Signed-off-by: Feng Wang 
gcc/ChangeLog:

* config/riscv/genrvv-type-indexer.cc (bfloat16_type):
Generate bf16 vector_type and scalar_type in DEF_RVV_TYPE_INDEX.
(bfloat16_wide_type): Ditto.
(same_ratio_eew_bf16_type): Ditto.
(main): Ditto.
* config/riscv/riscv-modes.def (ADJUST_BYTESIZE):
Add vector type for BFloat16.
(RVV_WHOLE_MODES): Add vector type for BFloat16.
(RVV_FRACT_MODE): Ditto.
(RVV_NF4_MODES): Ditto.
(RVV_NF8_MODES): Ditto.
(RVV_NF2_MODES): Ditto.
* config/riscv/riscv-vector-builtins-types.def (vbfloat16mf4_t):
Add builtin vector type for BFloat16.
(vbfloat16mf2_t): Add builtin vector type for BFloat16.
(vbfloat16m1_t): Ditto.
(vbfloat16m2_t): Ditto.
(vbfloat16m4_t): Ditto.
(vbfloat16m8_t): Ditto.
(vbfloat16mf4x2_t): Ditto.
(vbfloat16mf4x3_t): Ditto.
(vbfloat16mf4x4_t): Ditto.
(vbfloat16mf4x5_t): Ditto.
(vbfloat16mf4x6_t): Ditto.
(vbfloat16mf4x7_t): Ditto.
(vbfloat16mf4x8_t): Ditto.
(vbfloat16mf2x2_t): Ditto.
(vbfloat16mf2x3_t): Ditto.
(vbfloat16mf2x4_t): Ditto.
(vbfloat16mf2x5_t): Ditto.
(vbfloat16mf2x6_t): Ditto.
(vbfloat16mf2x7_t): Ditto.
(vbfloat16mf2x8_t): Ditto.
(vbfloat16m1x2_t): Ditto.
(vbfloat16m1x3_t): Ditto.
(vbfloat16m1x4_t): Ditto.
(vbfloat16m1x5_t): Ditto.
(vbfloat16m1x6_t): Ditto.
(vbfloat16m1x7_t): Ditto.
(vbfloat16m1x8_t): Ditto.
(vbfloat16m2x2_t): Ditto.
(vbfloat16m2x3_t): Ditto.
(vbfloat16m2x4_t): Ditto.
(vbfloat16m4x2_t): Ditto.
* config/riscv/riscv-vector-builtins.cc (check_required_extensions):
Add required_ext checking for BFloat16.
* config/riscv/riscv-vector-builtins.def (vbfloat16mf4_t):
Add vector_type for BFloat16 in builtins.def.
(vbfloat16mf4x2_t): Ditto.
(vbfloat16mf4x3_t): Ditto.
(vbfloat16mf4x4_t): Ditto.
(vbfloat16mf4x5_t): Ditto.
(vbfloat16mf4x6_t): Ditto.
(vbfloat16mf4x7_t): Ditto.
(vbfloat16mf4x8_t): Ditto.
(vbfloat16mf2_t): Ditto.
(vbfloat16mf2x2_t): Ditto.
(vbfloat16mf2x3_t): Ditto.
(vbfloat16mf2x4_t): Ditto.
(vbfloat16mf2x5_t): Ditto.
(vbfloat16mf2x6_t): Ditto.
(vbfloat16mf2x7_t): Ditto.
(vbfloat16mf2x8_t): Ditto.
(vbfloat16m1_t): Ditto.
(vbfloat16m1x2_t): Ditto.
(vbfloat16m1x3_t): Ditto.
(vbfloat16m1x4_t): Ditto.
(vbfloat16m1x5_t): Ditto.
(vbfloat16m1x6_t): Ditto.
(vbfloat16m1x7_t): Ditto.
(vbfloat16m1x8_t): Ditto.
(vbfloat16m2_t): Ditto.
(vbfloat16m2x2_t): Ditto.
(vbfloat16m2x3_t): Ditto.
(vbfloat16m2x4_t): Ditto.
(vbfloat16m4_t): Ditto.
(vbfloat16m4x2_t): Ditto.
(vbfloat16m8_t): Ditto.
(double_trunc_bfloat_scalar): Add scalar_type def for BFloat16.
(double_trunc_bfloat_vector): Add vector_type def for BFloat16.
* config/riscv/riscv-vector-builtins.h (RVV_REQUIRE_ELEN_BF_16):
Add required defination of BFloat16 ext.
* config/riscv/riscv-vector-switch.def (ENTRY):
Add vector_type information for BFloat16.
(TUPLE_ENTRY): Add tuple vector_type information for BFloat16.

(cherry picked from commit 666f167bec09d1234e6496c86b566fe1a71f61f0)

Diff:
---
 gcc/config/riscv/genrvv-type-indexer.cc  | 115 +++
 gcc/config/riscv/riscv-modes.def |  30 +-
 gcc/config/riscv/riscv-vector-builtins-types.def |  50 ++
 gcc/config/riscv/riscv-vector-builtins.cc|   7 +-
 gcc/config/riscv/riscv-vector-builtins.def   |  55 ++-
 gcc/config/riscv/riscv-vector-builtins.h |   1 +
 gcc/config/riscv/riscv-vector-switch.def |  36 +++
 7 files changed, 291 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/genrvv-type-indexer.cc 
b/gcc/config/riscv/genrvv-type-indexer.cc
index 27cbd14982c1..8626ddeaaa8b 100644
--- a/gcc/config/riscv/genrvv-type-indexer.cc
+++ 

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] [PR rtl-optimization/115876] Fix one of two ubsan reported issues in new ext-dce.cc code

2024-07-23 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:ead3454f089bc864e448b1bf6ace6b445eca3152

commit ead3454f089bc864e448b1bf6ace6b445eca3152
Author: Jeff Law 
Date:   Fri Jul 12 13:11:33 2024 -0600

[PR rtl-optimization/115876] Fix one of two ubsan reported issues in new 
ext-dce.cc code

David Binderman did a bootstrap build with ubsan enabled which triggered a 
few
errors in the new ext-dce.cc code.  This fixes the trivial case of shifting
negative values.

Bootstrapped and regression tested on x86.

Pushing to the trunk.

gcc/
PR rtl-optimization/115876
* ext-dce.cc (carry_backpropagate): Make mask and mmask unsigned.

(cherry picked from commit a6f551d079de1d151b272bcdd3d42316857c9d4e)

Diff:
---
 gcc/ext-dce.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/ext-dce.cc b/gcc/ext-dce.cc
index adc9084df57d..91789d283fcd 100644
--- a/gcc/ext-dce.cc
+++ b/gcc/ext-dce.cc
@@ -374,13 +374,13 @@ binop_implies_op2_fully_live (rtx_code code)
exclusively pertain to the first operand.  */
 
 HOST_WIDE_INT
-carry_backpropagate (HOST_WIDE_INT mask, enum rtx_code code, rtx x)
+carry_backpropagate (unsigned HOST_WIDE_INT mask, enum rtx_code code, rtx x)
 {
   if (mask == 0)
 return 0;
 
   enum machine_mode mode = GET_MODE_INNER (GET_MODE (x));
-  HOST_WIDE_INT mmask = GET_MODE_MASK (mode);
+  unsigned HOST_WIDE_INT mmask = GET_MODE_MASK (mode);
   switch (code)
 {
 case PLUS:


[gcc r15-2212] [5/n][PR rtl-optimization/115877] Fix handling of input/output operands

2024-07-22 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:ad642d2c950657539777ea436b787e7fff4ec09e

commit r15-2212-gad642d2c950657539777ea436b787e7fff4ec09e
Author: Jeff Law 
Date:   Mon Jul 22 21:48:28 2024 -0600

[5/n][PR rtl-optimization/115877] Fix handling of input/output operands

So in this patch we're correcting a failure to mark objects live in 
scenarios
like

(set (dest) (plus (dest) (src))

When handling set pseudos, we transfer the liveness information from LIVENOW
into LIVE_TMP.  LIVE_TMP is subsequently used to narrow what bit groups are
live for the inputs.

The first time we process the block we may not have DEST in the LIVENOW set 
(it
may be live across the loop, but not live after the loop).  Thus we can 
totally
miss making certain objects live, resulting in incorrect code.

The fix is pretty simple.  If LIVE_TMP is empty, then we should go ahead and
mark all the bit groups for the set object in LIVE_TMP.  This also removes 
an
invalid gcc_assert on the state of the liveness bitmaps.

This showed up on pru, rl78 and/or msp430 in the testsuite.  So no new test.

Bootstrapped and regression tested on x86_64 and also run through my tester 
on
all the cross platforms.

Pushing to the trunk.

PR rtl-optimization/115877
gcc/
* ext-dce.cc (ext_dce_process_sets): Reasonably handle input/output
operands.
(ext_dce_rd_transfer_n): Drop bogus assertion.

Diff:
---
 gcc/ext-dce.cc | 31 ++-
 1 file changed, 26 insertions(+), 5 deletions(-)

diff --git a/gcc/ext-dce.cc b/gcc/ext-dce.cc
index 21feabd9ce31..c56dfb505b88 100644
--- a/gcc/ext-dce.cc
+++ b/gcc/ext-dce.cc
@@ -245,13 +245,25 @@ ext_dce_process_sets (rtx_insn *insn, rtx obj, bitmap 
live_tmp)
  continue;
}
 
- /* Transfer all the LIVENOW bits for X into LIVE_TMP.  */
+ /* LIVE_TMP contains the set groups that are live-out and set in
+this insn.  It is used to narrow the groups live-in for the
+inputs of this insn.
+
+The simple thing to do is mark all the groups as live, but
+that will significantly inhibit optimization.
+
+We also need to be careful in the case where we have an in-out
+operand.  If we're not careful we'd clear LIVE_TMP
+incorrectly.  */
  HOST_WIDE_INT rn = REGNO (SUBREG_REG (x));
  int limit = group_limit (SUBREG_REG (x));
  for (HOST_WIDE_INT i = 4 * rn; i < 4 * rn + limit; i++)
if (bitmap_bit_p (livenow, i))
  bitmap_set_bit (live_tmp, i);
 
+ if (bitmap_empty_p (live_tmp))
+   make_reg_live (live_tmp, rn);
+
  /* The mode of the SUBREG tells us how many bits we can
 clear.  */
  machine_mode mode = GET_MODE (x);
@@ -316,14 +328,25 @@ ext_dce_process_sets (rtx_insn *insn, rtx obj, bitmap 
live_tmp)
  /* Now handle the actual object that was changed.  */
  if (REG_P (x))
{
- /* Transfer the appropriate bits from LIVENOW into
-LIVE_TMP.  */
+ /* LIVE_TMP contains the set groups that are live-out and set in
+this insn.  It is used to narrow the groups live-in for the
+inputs of this insn.
+
+The simple thing to do is mark all the groups as live, but
+that will significantly inhibit optimization.
+
+We also need to be careful in the case where we have an in-out
+operand.  If we're not careful we'd clear LIVE_TMP
+incorrectly.  */
  HOST_WIDE_INT rn = REGNO (x);
  int limit = group_limit (x);
  for (HOST_WIDE_INT i = 4 * rn; i < 4 * rn + limit; i++)
if (bitmap_bit_p (livenow, i))
  bitmap_set_bit (live_tmp, i);
 
+ if (bitmap_empty_p (live_tmp))
+   make_reg_live (live_tmp, rn);
+
  /* Now clear the bits known written by this instruction.
 Note that BIT need not be a power of two, consider a
 ZERO_EXTRACT destination.  */
@@ -935,8 +958,6 @@ ext_dce_rd_transfer_n (int bb_index)
  the generic dataflow code that something changed.  */
   if (!bitmap_equal_p ([bb_index], livenow))
 {
-  gcc_assert (!bitmap_intersect_compl_p ([bb_index], livenow));
-
   bitmap_copy ([bb_index], livenow);
   return true;
 }


[gcc r15-2203] [4/n][PR rtl-optimization/115877] Correct SUBREG handling in a destination

2024-07-22 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:ab7c0aed52054976d0b5e12c52e82239d4277b98

commit r15-2203-gab7c0aed52054976d0b5e12c52e82239d4277b98
Author: Jeff Law 
Date:   Mon Jul 22 10:11:57 2024 -0600

[4/n][PR rtl-optimization/115877] Correct SUBREG handling in a destination

If we encounter something during SET handling that we can not handle, the 
safe
thing to do is to ignore the destination and continue the loop.

We've actually been trying to do slightly better with SUBREG destinations by
iterating into SUBREG_REG.  It turns out that wasn't working as expected.

The problem is once we "continue" we lose the state that we were inside the 
SET
and thus we ended up ignoring the destination completely rather than 
tracking
the SUBREG_REG object.  This could be fixed by restarting SET processing, 
but I
just don't see this as all that important to handle.  So rather than leave 
the
code as-is, not working per design, I'm twiddling it to use the common 'skip
subrtxs and continue' idiom used elsewhere.

This is a prerequisite for another patch in this series.  Specifically I 
have a
patch that explicitly tracks if we skipped a destination rather than trying 
to
imply it from the state of LIVE_TMP.  So this is probably NFC right now, but
that's a short-lived NFC.

Bootstrapped and regression tested on x86 and also run as part of a larger 
kit
on the crosses in my tester.

PR rtl-optimization/115877
gcc/
* ext-dce.cc (ext_dce_process_sets): More correctly handle SUBREG
destinations.

Diff:
---
 gcc/ext-dce.cc | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/gcc/ext-dce.cc b/gcc/ext-dce.cc
index 44f64e2d18cf..21feabd9ce31 100644
--- a/gcc/ext-dce.cc
+++ b/gcc/ext-dce.cc
@@ -270,11 +270,18 @@ ext_dce_process_sets (rtx_insn *insn, rtx obj, bitmap 
live_tmp)
= GET_MODE_MASK (GET_MODE_INNER (GET_MODE (x)));
  if (SUBREG_P (x))
{
- /* If we have a SUBREG that is too wide, just continue the loop
-and let the iterator go down into SUBREG_REG.  */
+ /* If we have a SUBREG destination that is too wide, just
+skip the destination rather than continuing this iterator.
+While continuing would be better, we'd need to strip the
+subreg and restart within the SET processing rather than
+the top of the loop which just complicates the flow even
+more.  */
  if (!is_a  (GET_MODE (SUBREG_REG (x)), 
_mode)
  || GET_MODE_BITSIZE (outer_mode) > 64)
-   continue;
+   {
+ iter.skip_subrtxes ();
+ continue;
+   }
 
  /* We can safely strip a paradoxical subreg.  The inner mode will
 be narrower than the outer mode.  We'll clear fewer bits in


[gcc r15-2196] [NFC][PR rtl-optimization/115877] Avoid setting irrelevant bit groups as live in ext-dce

2024-07-22 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:88d16194d0c8a6bdc2896c8944bfbf3e6038c9d2

commit r15-2196-g88d16194d0c8a6bdc2896c8944bfbf3e6038c9d2
Author: Jeff Law 
Date:   Mon Jul 22 08:45:10 2024 -0600

[NFC][PR rtl-optimization/115877] Avoid setting irrelevant bit groups as 
live in ext-dce

Another patch to refine liveness computations.  This should be NFC and is
designed to help debugging.

In simplest terms the patch avoids setting bit groups outside the size of a
pseudo as live.  Consider a HImode pseudo, bits 16..63 for such a pseudo 
don't
really have meaning, yet we often set bit groups related to bits 16.63 on in
the liveness bitmaps.

This makes debugging harder than it needs to be by simply having larger 
bitmaps
to verify when walking through the code in a debugger.

This has been bootstrapped and regression tested on x86_64.  It's also been
tested on the crosses in my tester without regressions.

Pushing to the trunk,

PR rtl-optimization/115877
gcc/
* ext-dce.cc (group_limit): New function.
(mark_reg_live): Likewise.
(ext_dce_process_sets): Use new functions.
(ext_dce_process_uses): Likewise.
(ext_dce_init): Likewise.

Diff:
---
 gcc/ext-dce.cc | 64 +++---
 1 file changed, 57 insertions(+), 7 deletions(-)

diff --git a/gcc/ext-dce.cc b/gcc/ext-dce.cc
index cbecfc53dba7..44f64e2d18cf 100644
--- a/gcc/ext-dce.cc
+++ b/gcc/ext-dce.cc
@@ -48,6 +48,57 @@ static bool modify;
bit 16..31
bit 32..BITS_PER_WORD-1  */
 
+/* For the given REG, return the number of bit groups implied by the
+   size of the REG's mode, up to a maximum of 4 (number of bit groups
+   tracked by this pass).
+
+   For partial integer and variable sized modes also return 4.  This
+   could possibly be refined for something like PSI mode, but it
+   does not seem worth the effort.  */
+
+static int
+group_limit (const_rtx reg)
+{
+  machine_mode mode = GET_MODE (reg);
+
+  if (!GET_MODE_BITSIZE (mode).is_constant ())
+return 4;
+
+  int size = GET_MODE_SIZE (mode).to_constant ();
+
+  size = exact_log2 (size);
+
+  if (size < 0)
+return 4;
+
+  size++;
+  return (size > 4 ? 4 : size);
+}
+
+/* Make all bit groups live for REGNO in bitmap BMAP.  For hard regs,
+   we assume all groups are live.  For a pseudo we consider the size
+   of the pseudo to avoid creating unnecessarily live chunks of data.  */
+
+static void
+make_reg_live (bitmap bmap, int regno)
+{
+  int limit;
+
+  /* For pseudos we can use the mode to limit how many bit groups
+ are marked as live since a pseudo only has one mode.  Hard
+ registers have to be handled more conservatively.  */
+  if (regno > FIRST_PSEUDO_REGISTER)
+{
+  rtx reg = regno_reg_rtx[regno];
+  limit = group_limit (reg);
+}
+  else
+limit = 4;
+
+  for (int i = 0; i < limit; i++)
+bitmap_set_bit (bmap, regno * 4 + i);
+}
+
 /* Note this pass could be used to narrow memory loads too.  It's
not clear if that's profitable or not in general.  */
 
@@ -196,7 +247,8 @@ ext_dce_process_sets (rtx_insn *insn, rtx obj, bitmap 
live_tmp)
 
  /* Transfer all the LIVENOW bits for X into LIVE_TMP.  */
  HOST_WIDE_INT rn = REGNO (SUBREG_REG (x));
- for (HOST_WIDE_INT i = 4 * rn; i < 4 * rn + 4; i++)
+ int limit = group_limit (SUBREG_REG (x));
+ for (HOST_WIDE_INT i = 4 * rn; i < 4 * rn + limit; i++)
if (bitmap_bit_p (livenow, i))
  bitmap_set_bit (live_tmp, i);
 
@@ -260,7 +312,8 @@ ext_dce_process_sets (rtx_insn *insn, rtx obj, bitmap 
live_tmp)
  /* Transfer the appropriate bits from LIVENOW into
 LIVE_TMP.  */
  HOST_WIDE_INT rn = REGNO (x);
- for (HOST_WIDE_INT i = 4 * rn; i < 4 * rn + 4; i++)
+ int limit = group_limit (x);
+ for (HOST_WIDE_INT i = 4 * rn; i < 4 * rn + limit; i++)
if (bitmap_bit_p (livenow, i))
  bitmap_set_bit (live_tmp, i);
 
@@ -692,7 +745,7 @@ ext_dce_process_uses (rtx_insn *insn, rtx obj, bitmap 
live_tmp)
   /* If we have a register reference that is not otherwise handled,
 just assume all the chunks are live.  */
   else if (REG_P (x))
-   bitmap_set_range (livenow, REGNO (x) * 4, 4);
+   bitmap_set_range (livenow, REGNO (x) * 4, group_limit (x));
 }
 }
 
@@ -819,10 +872,7 @@ ext_dce_init (void)
   unsigned i;
   bitmap_iterator bi;
   EXECUTE_IF_SET_IN_BITMAP (refs, 0, i, bi)
-{
-  for (int j = 0; j < 4; j++)
-   bitmap_set_bit ([EXIT_BLOCK], i * 4 + j);
-}
+make_reg_live ([EXIT_BLOCK], i);
 
   livenow = BITMAP_ALLOC (NULL);
   all_blocks = BITMAP_ALLOC (NULL);


[gcc r15-2186] [PR rtl-optimization/115877][2/n] Improve liveness computation for constant initialization

2024-07-21 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:9d8ef2711dfecd093077aef6123d9e93ea23454e

commit r15-2186-g9d8ef2711dfecd093077aef6123d9e93ea23454e
Author: Jeff Law 
Date:   Sun Jul 21 08:41:28 2024 -0600

[PR rtl-optimization/115877][2/n] Improve liveness computation for constant 
initialization

While debugging pr115877, I noticed we were failing to remove the 
destination
register from LIVENOW bitmap when it was set to a constant value.  ie  (set
(dest) (const_int)).  This was a trivial oversight in
safe_for_live_propagation.

I don't have an example of this affecting code generation, but it certainly
could.  More importantly, by making LIVENOW more accurate it's easier to 
debug
when LIVENOW differs from expectations.

As with the prior patch this has been tested as part of a larger patchset 
with
the crosses as well as individually on x86_64.

Pushing to the trunk,

PR rtl-optimization/115877
gcc/
* ext-dce.cc (safe_for_live_propagation): Handle RTX_CONST_OBJ.

Diff:
---
 gcc/ext-dce.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/ext-dce.cc b/gcc/ext-dce.cc
index b4450e42ed16..cbecfc53dba7 100644
--- a/gcc/ext-dce.cc
+++ b/gcc/ext-dce.cc
@@ -69,6 +69,7 @@ safe_for_live_propagation (rtx_code code)
   switch (GET_RTX_CLASS (code))
 {
   case RTX_OBJ:
+  case RTX_CONST_OBJ:
return true;
 
   case RTX_COMPARE:


[gcc r15-2185] [PR rtl-optimization/115877] Fix livein computation for ext-dce

2024-07-21 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:91e468b72dafc9dcd5dcf7915f1d0ef172264d53

commit r15-2185-g91e468b72dafc9dcd5dcf7915f1d0ef172264d53
Author: Jeff Law 
Date:   Sun Jul 21 07:36:37 2024 -0600

[PR rtl-optimization/115877] Fix livein computation for ext-dce

So I'm not yet sure how I'm going to break everything down, but this is easy
enough to break out as 1/N of ext-dce fixes/improvements.

When handling uses in an insn, we first determine what bits are set in the
destination which is represented in DST_MASK.  Then we use that to refine 
what
bits are live in the source operands.

In the source operand handling section we *modify* DST_MASK if the source
operand is a SUBREG (ugh!).  So if the first operand is a SUBREG, then we 
can
incorrectly compute which bit groups are live in the second operand, 
especially
if it is a SUBREG as well.

This was seen when testing a larger set of patches on the rl78 port
(builtin-arith-overflow-p-7 & pr71631 execution failures), so no new test 
for
this bugfix.

Run through my tester (in conjunction with other ext-dce changes) on the
various cross targets.  Run individually through a bootstrap and regression
test cycle on x86_64 as well.

Pushing to the trunk.

PR rtl-optimization/115877
gcc/
* ext-dce.cc (ext_dce_process_uses): Restore the value of DST_MASK
for reach operand.

Diff:
---
 gcc/ext-dce.cc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/ext-dce.cc b/gcc/ext-dce.cc
index 6d4b8858ec63..b4450e42ed16 100644
--- a/gcc/ext-dce.cc
+++ b/gcc/ext-dce.cc
@@ -591,8 +591,10 @@ ext_dce_process_uses (rtx_insn *insn, rtx obj, bitmap 
live_tmp)
 making things live.  Breaking from this loop will cause
 the iterator to work on sub-rtxs, so it is safe to break
 if we see something we don't know how to handle.  */
+ unsigned HOST_WIDE_INT save_mask = dst_mask;
  for (;;)
{
+ dst_mask = save_mask;
  /* Strip an outer paradoxical subreg.  The bits outside
 the inner mode are don't cares.  So we can just strip
 and process the inner object.  */


Re: Insn combine trying (ior:HI (clobber:HI (const_int 0)))

2024-07-16 Thread Jeff Law via Gcc




On 7/16/24 4:29 AM, Georg-Johann Lay wrote:



Am 15.07.24 um 19:53 schrieb Richard Sandiford:

Georg-Johann Lay  writes:

In a test case I see insn combine trying to match such
expressions, which do not make any sense to me, like:

Trying 2 -> 7:
  2: r45:HI=r48:HI
    REG_DEAD r48:HI
  7: {r47:HI=r45:HI|r46:PSI#0;clobber scratch;}
    REG_DEAD r46:PSI
    REG_DEAD r45:HI
Failed to match this instruction:
(parallel [
  (set (reg:HI 47 [ _4 ])
  (ior:HI (clobber:HI (const_int 0 [0]))
  (reg:HI 48)))
  (clobber (scratch:QI))
  ])

and many other occasions like that.

Is this just insn combine doing its business?

Or should this be some sensible RTL instead?

Seen on target avr with v14 and trunk,
attached test case and dump compiled with


(clobber:M (const_int 0)) is combine's way of representing
"something went wrong here".  And yeah, recog acts as an error
detection mechanism in these cases.  In other words, the idea
is that recog should eventually fail on nonsense rtl like that,
so earlier code doesn't need to check explicitly.

Richard


Hi Richard,

I never saw this before; and the point is that in some functions,
combine is almost exclusively producing these rtxes with clobber
const_int, but almost no meaningful combinations.  Like in:
Perhaps, but the convention of using a (clobber (const_int 0)) for cases 
where combine gets "lost" has been around for decades.



Jeff


[gcc r15-2049] Fix liveness computation for shift/rotate counts in ext-dce

2024-07-15 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:b31b8af807f5459674b0b310cb62a5bc81b676e7

commit r15-2049-gb31b8af807f5459674b0b310cb62a5bc81b676e7
Author: Jeff Law 
Date:   Mon Jul 15 18:15:33 2024 -0600

Fix liveness computation for shift/rotate counts in ext-dce

So as I've noted before I believe the control flow in ext-dce.cc is horribly
messy.  While investigating a fix for 115877 I came across another problem
related to control flow handling.

Specifically, if we have an binary op which implies the 2nd operand is fully
live, then we'd actually fail to mark that operand as live.

We essentially broke out of the loop which was supposed to be safe.  But Y 
was
a REG and if Y is a REG or CONST_INT we skip sub-rtxs and thus failed to
process that operand (the shift count) at all.

Rather than muck around with control flow, we can just set all the bits as 
live
in DST_MASK and let normal processing continue.  With all the bits live IN
DST_MASK all the bits implied by the mode of the argument will also be live.

No testcase.

Bootstrapped and regression tested on x86.  Pushing to the trunk.

gcc/
* ext-dce.cc (ext_dce_process_uses): Simplify control flow and fix
liveness computation for shift/rotate counts.

Diff:
---
 gcc/ext-dce.cc | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/ext-dce.cc b/gcc/ext-dce.cc
index 2869a389c3aa..6c961feee635 100644
--- a/gcc/ext-dce.cc
+++ b/gcc/ext-dce.cc
@@ -632,10 +632,11 @@ ext_dce_process_uses (rtx_insn *insn, rtx obj, bitmap 
live_tmp)
  else if (!CONSTANT_P (y))
break;
 
- /* We might have (ashift (const_int 1) (reg...)) */
- /* XXX share this logic with code below.  */
+ /* We might have (ashift (const_int 1) (reg...))
+By setting dst_mask we can continue iterating on the
+the next operand and it will be considered fully live.  */
  if (binop_implies_op2_fully_live (GET_CODE (src)))
-   break;
+   dst_mask = -1;
 
  /* If this was anything but a binary operand, break the inner
 loop.  This is conservatively correct as it will cause the


[gcc r15-2048] Fix sign/carry bit handling in ext-dce.

2024-07-15 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:94b21f13763638f64e83e7f9959c7f1523b9eaed

commit r15-2048-g94b21f13763638f64e83e7f9959c7f1523b9eaed
Author: Jeff Law 
Date:   Mon Jul 15 16:57:44 2024 -0600

Fix sign/carry bit handling in ext-dce.

My change to fix a ubsan issue broke handling propagation of the carry/sign 
bit
down through a right shift.  Thanks to Andreas for the analysis and proposed
fix and Sergei for the testcase.

PR rtl-optimization/115876
PR rtl-optimization/115916
gcc/
* ext-dce.cc (carry_backpropagate): Make return type unsigned as 
well.
Cast to signed for right shift to preserve sign bit.

gcc/testsuite/

* g++.dg/torture/pr115916.C: New test.

Co-author: Andreas Schwab 
Co-author: Sergei Trofimovich 

Diff:
---
 gcc/ext-dce.cc  |  4 +-
 gcc/testsuite/g++.dg/torture/pr115916.C | 90 +
 2 files changed, 92 insertions(+), 2 deletions(-)

diff --git a/gcc/ext-dce.cc b/gcc/ext-dce.cc
index 91789d283fcd..2869a389c3aa 100644
--- a/gcc/ext-dce.cc
+++ b/gcc/ext-dce.cc
@@ -373,7 +373,7 @@ binop_implies_op2_fully_live (rtx_code code)
binop_implies_op2_fully_live (e.g. shifts), the computed mask may
exclusively pertain to the first operand.  */
 
-HOST_WIDE_INT
+unsigned HOST_WIDE_INT
 carry_backpropagate (unsigned HOST_WIDE_INT mask, enum rtx_code code, rtx x)
 {
   if (mask == 0)
@@ -393,7 +393,7 @@ carry_backpropagate (unsigned HOST_WIDE_INT mask, enum 
rtx_code code, rtx x)
 case ASHIFT:
   if (CONSTANT_P (XEXP (x, 1))
  && known_lt (UINTVAL (XEXP (x, 1)), GET_MODE_BITSIZE (mode)))
-   return mask >> INTVAL (XEXP (x, 1));
+   return (HOST_WIDE_INT)mask >> INTVAL (XEXP (x, 1));
   return (2ULL << floor_log2 (mask)) - 1;
 
 /* We propagate for the shifted operand, but not the shift
diff --git a/gcc/testsuite/g++.dg/torture/pr115916.C 
b/gcc/testsuite/g++.dg/torture/pr115916.C
new file mode 100644
index ..3d788678eaa3
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/pr115916.C
@@ -0,0 +1,90 @@
+/* { dg-do run } */
+
+#include 
+#include 
+
+struct ve {
+ve() = default;
+ve(const ve&) = default;
+ve& operator=(const ve&) = default;
+
+// note that the code usually uses the first half of this array
+uint8_t raw[16] = {};
+};
+
+static ve First8_(void) {
+ve m;
+__builtin_memset(m.raw, 0xff, 8);
+return m;
+}
+
+static ve And_(ve a, ve b) {
+ve au;
+__builtin_memcpy(au.raw, a.raw, 16);
+for (size_t i = 0; i < 8; ++i) {
+au.raw[i] &= b.raw[i];
+}
+return au;
+}
+
+__attribute__((noipa, optimize(0)))
+static void vec_assert(ve a) {
+if (a.raw[6] != 0x06 && a.raw[6] != 0x07)
+__builtin_trap();
+}
+
+static ve Reverse4_(ve v) {
+ve ret;
+for (size_t i = 0; i < 8; i += 4) {
+ret.raw[i + 0] = v.raw[i + 3];
+ret.raw[i + 1] = v.raw[i + 2];
+ret.raw[i + 2] = v.raw[i + 1];
+ret.raw[i + 3] = v.raw[i + 0];
+}
+return ret;
+}
+
+static ve DupEven_(ve v) {
+for (size_t i = 0; i < 8; i += 2) {
+v.raw[i + 1] = v.raw[i];
+}
+return v;
+}
+
+template 
+ve Per4LaneBlockShuffle_(ve v) {
+if (b) {
+return Reverse4_(v);
+} else {
+return DupEven_(v);
+}
+}
+
+template 
+static inline __attribute__((always_inline)) void 
DoTestPer4LaneBlkShuffle(const ve v) {
+ve actual = Per4LaneBlockShuffle_(v);
+const auto valid_lanes_mask = First8_();
+ve actual_masked = And_(valid_lanes_mask, actual);
+vec_assert(actual_masked);
+}
+
+static void DoTestPer4LaneBlkShuffles(const ve v) {
+alignas(128) uint8_t src_lanes[8];
+__builtin_memcpy(src_lanes, v.raw, 8);
+// need both, hm
+DoTestPer4LaneBlkShuffle(v);
+DoTestPer4LaneBlkShuffle(v);
+}
+
+__attribute__((noipa, optimize(0)))
+static void bug(void) {
+   uint8_t iv[8] = {1,2,3,4,5,6,7,8};
+   ve v;
+   __builtin_memcpy(v.raw, iv, 8);
+   DoTestPer4LaneBlkShuffles(v);
+}
+
+int main(void) {
+bug();
+}
+


[gcc r15-2011] [PR rtl-optimization/115876] Fix one of two ubsan reported issues in new ext-dce.cc code

2024-07-12 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:a6f551d079de1d151b272bcdd3d42316857c9d4e

commit r15-2011-ga6f551d079de1d151b272bcdd3d42316857c9d4e
Author: Jeff Law 
Date:   Fri Jul 12 13:11:33 2024 -0600

[PR rtl-optimization/115876] Fix one of two ubsan reported issues in new 
ext-dce.cc code

David Binderman did a bootstrap build with ubsan enabled which triggered a 
few
errors in the new ext-dce.cc code.  This fixes the trivial case of shifting
negative values.

Bootstrapped and regression tested on x86.

Pushing to the trunk.

gcc/
PR rtl-optimization/115876
* ext-dce.cc (carry_backpropagate): Make mask and mmask unsigned.

Diff:
---
 gcc/ext-dce.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/ext-dce.cc b/gcc/ext-dce.cc
index adc9084df57d..91789d283fcd 100644
--- a/gcc/ext-dce.cc
+++ b/gcc/ext-dce.cc
@@ -374,13 +374,13 @@ binop_implies_op2_fully_live (rtx_code code)
exclusively pertain to the first operand.  */
 
 HOST_WIDE_INT
-carry_backpropagate (HOST_WIDE_INT mask, enum rtx_code code, rtx x)
+carry_backpropagate (unsigned HOST_WIDE_INT mask, enum rtx_code code, rtx x)
 {
   if (mask == 0)
 return 0;
 
   enum machine_mode mode = GET_MODE_INNER (GET_MODE (x));
-  HOST_WIDE_INT mmask = GET_MODE_MASK (mode);
+  unsigned HOST_WIDE_INT mmask = GET_MODE_MASK (mode);
   switch (code)
 {
 case PLUS:


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] [RISC-V] Avoid unnecessary sign extension after memcmp

2024-07-12 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:67116d59607d1897629deb3b95f7ccb9d4d875cb

commit 67116d59607d1897629deb3b95f7ccb9d4d875cb
Author: Jeff Law 
Date:   Fri Jul 12 07:53:41 2024 -0600

[RISC-V] Avoid unnecessary sign extension after memcmp

Similar to the str[n]cmp work, this adjusts the block compare expansion to 
do
its work in X mode with an appropriate lowpart extraction of the results at 
the
end of the sequence.

This has gone through my tester on rv32 and rv64, but that's it. Waiting on
pre-commit testing before moving forward.

gcc/

* config/riscv/riscv-string.cc 
(emit_memcmp_scalar_load_and_compare):
Set RESULT directly rather than using a temporary.
(emit_memcmp_scalar_result_calculation): Similarly.
(riscv_expand_block_compare_scalar): Use CONST0_RTX rather than
generating new RTL.
* config/riscv/riscv.md (cmpmemsi): Pass an X mode temporary to the
expansion routines.  If necessary extract low part of the word to 
store
in final result location.

(cherry picked from commit ae829a27785307232e4db0df6a30ca275941b613)

Diff:
---
 gcc/config/riscv/riscv-string.cc | 15 ++-
 gcc/config/riscv/riscv.md| 14 --
 2 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 4736228e6f14..80d22e87d571 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -663,9 +663,7 @@ emit_memcmp_scalar_load_and_compare (rtx result, rtx src1, 
rtx src2,
   /* Fast-path for a single byte.  */
   if (cmp_bytes == 1)
{
- rtx tmp = gen_reg_rtx (Xmode);
- do_sub3 (tmp, data1, data2);
- emit_insn (gen_movsi (result, gen_lowpart (SImode, tmp)));
+ do_sub3 (result, data1, data2);
  emit_jump_insn (gen_jump (final_label));
  emit_barrier (); /* No fall-through.  */
  return;
@@ -702,12 +700,11 @@ emit_memcmp_scalar_result_calculation (rtx result, rtx 
data1, rtx data2)
   /* Get bytes in big-endian order and compare as words.  */
   do_bswap2 (data1, data1);
   do_bswap2 (data2, data2);
+
   /* Synthesize (data1 >= data2) ? 1 : -1 in a branchless sequence.  */
-  rtx tmp = gen_reg_rtx (Xmode);
-  emit_insn (gen_slt_3 (LTU, Xmode, Xmode, tmp, data1, data2));
-  do_neg2 (tmp, tmp);
-  do_ior3 (tmp, tmp, const1_rtx);
-  emit_insn (gen_movsi (result, gen_lowpart (SImode, tmp)));
+  emit_insn (gen_slt_3 (LTU, Xmode, Xmode, result, data1, data2));
+  do_neg2 (result, result);
+  do_ior3 (result, result, const1_rtx);
 }
 
 /* Expand memcmp using scalar instructions (incl. Zbb).
@@ -773,7 +770,7 @@ riscv_expand_block_compare_scalar (rtx result, rtx src1, 
rtx src2, rtx nbytes)
   data1, data2,
   diff_label, final_label);
 
-  emit_insn (gen_rtx_SET (result, gen_rtx_CONST_INT (SImode, 0)));
+  emit_move_insn (result, CONST0_RTX (GET_MODE (result)));
   emit_jump_insn (gen_jump (final_label));
   emit_barrier (); /* No fall-through.  */
 
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 2e2379dfca4f..5dee837a5878 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2675,9 +2675,19 @@
   operands[2], operands[3]))
 DONE;
 
-  if (riscv_expand_block_compare (operands[0], operands[1], operands[2],
+  rtx temp = gen_reg_rtx (word_mode);
+  if (riscv_expand_block_compare (temp, operands[1], operands[2],
   operands[3]))
-DONE;
+{
+  if (TARGET_64BIT)
+   {
+ temp = gen_lowpart (SImode, temp);
+ SUBREG_PROMOTED_VAR_P (temp) = 1;
+ SUBREG_PROMOTED_SET (temp, SRP_SIGNED);
+   }
+  emit_move_insn (operands[0], temp);
+  DONE;
+}
   else
 FAIL;
 })


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: NO_WARNING preferred else value for RVV

2024-07-12 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:6322f7af2b8d41d5a298af66b5200d0c6ac79191

commit 6322f7af2b8d41d5a298af66b5200d0c6ac79191
Author: YunQiang Su 
Date:   Thu Jul 11 20:43:54 2024 +0800

RISC-V: NO_WARNING preferred else value for RVV

PR target/115840.

In riscv_preferred_else_value, we create an uninitialized tmp var
for else value, instead of the 0 (as default_preferred_else_value)
or the pre-exists VAR (as aarch64 does), so that we can use agnostic
policy.

The problem is that `warn_uninit` will emit a warning:
  '({anonymous})' may be used uninitialized

Let's mark this tmp var as NO_WARNING.

This problem is found when I try to build glibc with V extension.

gcc

PR target/115840
* config/riscv/riscv.cc(riscv_preferred_else_value): Mark
tmp_var as NO_WARNING.

gcc/testsuite
* gcc.dg/vect/pr115840.c: New testcase.

(cherry picked from commit c6f38e5e6d900b8ed6a4f5c126d3197946cad4dd)

Diff:
---
 gcc/config/riscv/riscv.cc|  6 +-
 gcc/testsuite/gcc.dg/vect/pr115840.c | 11 +++
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 8a8826deac17..07539b62 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -11432,7 +11432,11 @@ riscv_preferred_else_value (unsigned ifn, tree 
vectype, unsigned int nops,
tree *ops)
 {
   if (riscv_v_ext_mode_p (TYPE_MODE (vectype)))
-return get_or_create_ssa_default_def (cfun, create_tmp_var (vectype));
+{
+  tree tmp_var = create_tmp_var (vectype);
+  TREE_NO_WARNING (tmp_var) = 1;
+  return get_or_create_ssa_default_def (cfun, tmp_var);
+}
 
   return default_preferred_else_value (ifn, vectype, nops, ops);
 }
diff --git a/gcc/testsuite/gcc.dg/vect/pr115840.c 
b/gcc/testsuite/gcc.dg/vect/pr115840.c
new file mode 100644
index ..09dc9e4eb7c2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr115840.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-Wall -Werror" } */
+
+double loads[16];
+
+void
+foo (double loadavg[], int count)
+{
+  for (int i = 0; i < count; i++)
+loadavg[i] = loads[i] / 1.5;
+}


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Disable misaligned vector access in hook riscv_slow_unaligned_access[PR115862]

2024-07-12 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:d39d88ed512ae6affbd2f5128b1e55d92d8f5c0a

commit d39d88ed512ae6affbd2f5128b1e55d92d8f5c0a
Author: xuli 
Date:   Thu Jul 11 04:29:11 2024 +

RISC-V: Disable misaligned vector access in hook 
riscv_slow_unaligned_access[PR115862]

The reason is that in the following code, icode = movmisalignv8si has
already been rejected by TARGET_VECTOR_MISALIGN_SUPPORTED, but it is
allowed by targetm.slow_unaligned_access,which is contradictory.

(((icode = optab_handler (movmisalign_optab, mode))
   != CODE_FOR_nothing)
  || targetm.slow_unaligned_access (mode, align))

misaligned vector access should be enabled by -mno-vector-strict-align 
option.

PR target/115862

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_slow_unaligned_access): Disable 
vector misalign.

Signed-off-by: Li Xu 
(cherry picked from commit 63d7d5998e3768f6e3703c29e8774e8b54af108c)

Diff:
---
 gcc/config/riscv/riscv.cc  |  5 ++-
 gcc/testsuite/gcc.target/riscv/rvv/base/pr115862.c | 52 ++
 2 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index ce73c18f88f8..8a8826deac17 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -10269,9 +10269,10 @@ riscv_cannot_copy_insn_p (rtx_insn *insn)
 /* Implement TARGET_SLOW_UNALIGNED_ACCESS.  */
 
 static bool
-riscv_slow_unaligned_access (machine_mode, unsigned int)
+riscv_slow_unaligned_access (machine_mode mode, unsigned int)
 {
-  return riscv_slow_unaligned_access_p;
+  return VECTOR_MODE_P (mode) ? TARGET_VECTOR_MISALIGN_SUPPORTED
+ : riscv_slow_unaligned_access_p;
 }
 
 static bool
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr115862.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115862.c
new file mode 100644
index ..3cbc3c3a0ea4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115862.c
@@ -0,0 +1,52 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=rv64gcv_zvl512b -mabi=lp64d" } */
+
+struct mallinfo2
+{
+  int arena;
+  int ordblks;
+  int smblks;
+  int hblks;
+  int hblkhd;
+  int usmblks;
+  int fsmblks;
+  int uordblks;
+  int fordblks;
+  int keepcost;
+};
+
+struct mallinfo
+{
+  int arena;
+  int ordblks;
+  int smblks;
+  int hblks;
+  int hblkhd;
+  int usmblks;
+  int fsmblks;
+  int uordblks;
+  int fordblks;
+  int keepcost;
+};
+
+struct mallinfo
+__libc_mallinfo (void)
+{
+  struct mallinfo m;
+  struct mallinfo2 m2;
+
+  m.arena = m2.arena;
+  m.ordblks = m2.ordblks;
+  m.smblks = m2.smblks;
+  m.hblks = m2.hblks;
+  m.hblkhd = m2.hblkhd;
+  m.usmblks = m2.usmblks;
+  m.fsmblks = m2.fsmblks;
+  m.uordblks = m2.uordblks;
+  m.fordblks = m2.fordblks;
+  m.keepcost = m2.keepcost;
+
+  return m;
+}
+
+/* { dg-final { scan-assembler {vle32\.v} } } */


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Add SiFive extensions, xsfvcp and xsfcease

2024-07-12 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:68a2ba4a346992f1399abb410c5fb788aa9bca25

commit 68a2ba4a346992f1399abb410c5fb788aa9bca25
Author: Kito Cheng 
Date:   Tue Jul 9 15:50:57 2024 +0800

RISC-V: Add SiFive extensions, xsfvcp and xsfcease

We have already upstreamed these extensions into binutils, and now we need 
GCC
to recognize these extensions and pass them to binutils as well. We also 
plan
to upstream intrinsics in the near future. :)

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (riscv_implied_info): Add 
xsfvcp.
(riscv_ext_version_table): Add xsfvcp, xsfcease.
(riscv_ext_flag_table): Ditto.
* config/riscv/riscv.opt (riscv_sifive_subext): New.
(XSFVCP): New.
(XSFCEASE): New.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/predef-sf-1.c: New.
* gcc.target/riscv/predef-sf-2.c: New.

(cherry picked from commit 3ea47ea1fcab95fd1b80acc724fdbb27fc436985)

Diff:
---
 gcc/common/config/riscv/riscv-common.cc  |  8 
 gcc/config/riscv/riscv.opt   |  7 +++
 gcc/testsuite/gcc.target/riscv/predef-sf-1.c | 19 +++
 gcc/testsuite/gcc.target/riscv/predef-sf-2.c | 14 ++
 4 files changed, 48 insertions(+)

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 3c4178c19c99..d883efa7a3ab 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -216,6 +216,8 @@ static const riscv_implied_info_t riscv_implied_info[] =
   {"ssstateen", "zicsr"},
   {"sstc", "zicsr"},
 
+  {"xsfvcp", "zve32x"},
+
   {NULL, NULL}
 };
 
@@ -415,6 +417,9 @@ static const struct riscv_ext_version 
riscv_ext_version_table[] =
 
   {"xventanacondops", ISA_SPEC_CLASS_NONE, 1, 0},
 
+  {"xsfvcp",   ISA_SPEC_CLASS_NONE, 1, 0},
+  {"xsfcease", ISA_SPEC_CLASS_NONE, 1, 0},
+
   /* Terminate the list.  */
   {NULL, ISA_SPEC_CLASS_NONE, 0, 0}
 };
@@ -1822,6 +1827,9 @@ static const riscv_ext_flag_table_t 
riscv_ext_flag_table[] =
 
   {"xventanacondops", _options::x_riscv_xventana_subext, 
MASK_XVENTANACONDOPS},
 
+  {"xsfvcp",   _options::x_riscv_sifive_subext, MASK_XSFVCP},
+  {"xsfcease", _options::x_riscv_sifive_subext, MASK_XSFCEASE},
+
   {NULL, NULL, 0}
 };
 
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 32a0dda58439..a1d70b636382 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -507,6 +507,13 @@ int riscv_xventana_subext
 
 Mask(XVENTANACONDOPS) Var(riscv_xventana_subext)
 
+TargetVariable
+int riscv_sifive_subext
+
+Mask(XSFVCP) Var(riscv_sifive_subext)
+
+Mask(XSFCEASE) Var(riscv_sifive_subext)
+
 Enum
 Name(isa_spec_class) Type(enum riscv_isa_spec_class)
 Supported ISA specs (for use with the -misa-spec= option):
diff --git a/gcc/testsuite/gcc.target/riscv/predef-sf-1.c 
b/gcc/testsuite/gcc.target/riscv/predef-sf-1.c
new file mode 100644
index ..d6c07e7d9207
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/predef-sf-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64g_xsfvcp -mabi=lp64" } */
+
+int main () {
+#if !defined(__riscv)
+#error "__riscv"
+#endif
+
+#if !defined(__riscv_zve32x)
+#error "__riscv_zve32x"
+#endif
+
+
+#if !defined(__riscv_xsfvcp)
+#error "__riscv_xsfvcp"
+#endif
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/predef-sf-2.c 
b/gcc/testsuite/gcc.target/riscv/predef-sf-2.c
new file mode 100644
index ..dcb746bcd260
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/predef-sf-2.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64g_xsfcease -mabi=lp64" } */
+
+int main () {
+#if !defined(__riscv)
+#error "__riscv"
+#endif
+
+#if !defined(__riscv_xsfcease)
+#error "__riscv_xsfvcp"
+#endif
+
+  return 0;
+}


[gcc r15-2006] [RISC-V] Avoid unnecessary sign extension after memcmp

2024-07-12 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:ae829a27785307232e4db0df6a30ca275941b613

commit r15-2006-gae829a27785307232e4db0df6a30ca275941b613
Author: Jeff Law 
Date:   Fri Jul 12 07:53:41 2024 -0600

[RISC-V] Avoid unnecessary sign extension after memcmp

Similar to the str[n]cmp work, this adjusts the block compare expansion to 
do
its work in X mode with an appropriate lowpart extraction of the results at 
the
end of the sequence.

This has gone through my tester on rv32 and rv64, but that's it. Waiting on
pre-commit testing before moving forward.

gcc/

* config/riscv/riscv-string.cc 
(emit_memcmp_scalar_load_and_compare):
Set RESULT directly rather than using a temporary.
(emit_memcmp_scalar_result_calculation): Similarly.
(riscv_expand_block_compare_scalar): Use CONST0_RTX rather than
generating new RTL.
* config/riscv/riscv.md (cmpmemsi): Pass an X mode temporary to the
expansion routines.  If necessary extract low part of the word to 
store
in final result location.

Diff:
---
 gcc/config/riscv/riscv-string.cc | 15 ++-
 gcc/config/riscv/riscv.md| 14 --
 2 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 4736228e6f14..80d22e87d571 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -663,9 +663,7 @@ emit_memcmp_scalar_load_and_compare (rtx result, rtx src1, 
rtx src2,
   /* Fast-path for a single byte.  */
   if (cmp_bytes == 1)
{
- rtx tmp = gen_reg_rtx (Xmode);
- do_sub3 (tmp, data1, data2);
- emit_insn (gen_movsi (result, gen_lowpart (SImode, tmp)));
+ do_sub3 (result, data1, data2);
  emit_jump_insn (gen_jump (final_label));
  emit_barrier (); /* No fall-through.  */
  return;
@@ -702,12 +700,11 @@ emit_memcmp_scalar_result_calculation (rtx result, rtx 
data1, rtx data2)
   /* Get bytes in big-endian order and compare as words.  */
   do_bswap2 (data1, data1);
   do_bswap2 (data2, data2);
+
   /* Synthesize (data1 >= data2) ? 1 : -1 in a branchless sequence.  */
-  rtx tmp = gen_reg_rtx (Xmode);
-  emit_insn (gen_slt_3 (LTU, Xmode, Xmode, tmp, data1, data2));
-  do_neg2 (tmp, tmp);
-  do_ior3 (tmp, tmp, const1_rtx);
-  emit_insn (gen_movsi (result, gen_lowpart (SImode, tmp)));
+  emit_insn (gen_slt_3 (LTU, Xmode, Xmode, result, data1, data2));
+  do_neg2 (result, result);
+  do_ior3 (result, result, const1_rtx);
 }
 
 /* Expand memcmp using scalar instructions (incl. Zbb).
@@ -773,7 +770,7 @@ riscv_expand_block_compare_scalar (rtx result, rtx src1, 
rtx src2, rtx nbytes)
   data1, data2,
   diff_label, final_label);
 
-  emit_insn (gen_rtx_SET (result, gen_rtx_CONST_INT (SImode, 0)));
+  emit_move_insn (result, CONST0_RTX (GET_MODE (result)));
   emit_jump_insn (gen_jump (final_label));
   emit_barrier (); /* No fall-through.  */
 
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 2e2379dfca4f..5dee837a5878 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2675,9 +2675,19 @@
   operands[2], operands[3]))
 DONE;
 
-  if (riscv_expand_block_compare (operands[0], operands[1], operands[2],
+  rtx temp = gen_reg_rtx (word_mode);
+  if (riscv_expand_block_compare (temp, operands[1], operands[2],
   operands[3]))
-DONE;
+{
+  if (TARGET_64BIT)
+   {
+ temp = gen_lowpart (SImode, temp);
+ SUBREG_PROMOTED_VAR_P (temp) = 1;
+ SUBREG_PROMOTED_SET (temp, SRP_SIGNED);
+   }
+  emit_move_insn (operands[0], temp);
+  DONE;
+}
   else
 FAIL;
 })


[gcc r15-1989] [committed] Fix m68k bootstrap segfault with late-combine

2024-07-11 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:a91c51c187a78e4164bc4039ebdb543848e379d2

commit r15-1989-ga91c51c187a78e4164bc4039ebdb543848e379d2
Author: Jeff Law 
Date:   Thu Jul 11 21:37:34 2024 -0600

[committed] Fix m68k bootstrap segfault with late-combine

So the m68k port has failed to bootstrap since the introduction of
late-combine.  My suspicion has been this is a backend problem.  Sure enough
after bisecting things down (thank goodness for the debug counter!) I'm 
happy
to report m68k (after this patch) has moved into its stage3 build for the 
first
time in a month.

Basically late-combine propagated an address calculation to its use points,
generating this insn (dwarf2out.c, I forget what function):

> (insn 653 652 655 (parallel [
> (set (mem/j:DI (plus:SI (plus:SI (reg/f:SI 9 %a1 [orig:64 _67 
] [64])
> (reg:SI 0 %d0 [321]))
> (const_int 20 [0x14])) [0 
slot_204->dw_attr_val.v.val_unsigned+0 S8 A16])
> (sign_extend:DI (mem/c:SI (plus:SI (reg/f:SI 14 %a6)
> (const_int -28 [0xffe4])) [870 
%sfp+-28 S4 A16])))
> (clobber (reg:SI 0 %d0))
> ]) "../../../gcc/gcc/dwarf2out.cc":24961:23 93 {extendsidi2}
>  (expr_list:REG_DEAD (reg/f:SI 9 %a1 [orig:64 _67 ] [64])
> (expr_list:REG_DEAD (reg:SI 0 %d0 [321])
> (expr_list:REG_UNUSED (reg:SI 0 %d0)
> (nil)
Note how the output uses d0 in the address calculation and the clobber uses 
d0.

It matches this insn in the md file:

> (define_insn "extendsidi2"
>   [(set (match_operand:DI 0 "nonimmediate_operand" "=d,o,o,<")
> (sign_extend:DI
>  (match_operand:SI 1 "nonimmediate_src_operand" "rm,rm,r,rm")))
>(clobber (match_scratch:SI 2 "=X,,,"))]
>   ""
> {
>   if (which_alternative == 0)
> /* Handle alternative 0.  */
> {
>   if (TARGET_68020 || TARGET_COLDFIRE)
> return "move%.l %1,%R0\;smi %0\;extb%.l %0";
>   else
> return "move%.l %1,%R0\;smi %0\;ext%.w %0\;ext%.l %0";
> }
>
>   /* Handle alternatives 1, 2 and 3.  We don't need to adjust address by 4
>  in alternative 3 because autodecrement will do that for us.  */
>   operands[3] = adjust_address (operands[0], SImode,
> which_alternative == 3 ? 0 : 4);
>   operands[0] = adjust_address (operands[0], SImode, 0);
>
>   if (TARGET_68020 || TARGET_COLDFIRE)
> return "move%.l %1,%3\;smi %2\;extb%.l %2\;move%.l %2,%0";
>   else
> return "move%.l %1,%3\;smi %2\;ext%.w %2\;ext%.l %2\;move%.l %2,%0";
> }
>   [(set_attr "ok_for_coldfire" "yes,no,yes,yes")])
Note the smi/ext instruction pair in the case for alternatives 1..3.  Those
clobber the scratch register before we're done consuming inputs.  The 
scratch
register really needs to be marked as an earlyclobber.

That fixes the bootstrap problem, but a cursory review of m68k.md is not
encouraging.  I will not be surprised at all if there's more of this kind of
problem lurking.

But happy to at least have m68k bootstrapping again.   It's failing the
comparison test, but definitely progress.

* config/m68k/m68k.md (extendsidi2): Add missing early clobbers.

Diff:
---
 gcc/config/m68k/m68k.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/m68k/m68k.md b/gcc/config/m68k/m68k.md
index 037978db40c0..e5c252888448 100644
--- a/gcc/config/m68k/m68k.md
+++ b/gcc/config/m68k/m68k.md
@@ -1887,7 +1887,7 @@
   [(set (match_operand:DI 0 "nonimmediate_operand" "=d,o,o,<")
(sign_extend:DI
 (match_operand:SI 1 "nonimmediate_src_operand" "rm,rm,r,rm")))
-   (clobber (match_scratch:SI 2 "=X,d,d,d"))]
+   (clobber (match_scratch:SI 2 "=X,,,"))]
   ""
 {
   if (which_alternative == 0)


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] [to-be-committed, RISC-V] Eliminate unnecessary sign extension after inlined str[n]cmp

2024-07-11 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:e0bf7a4734598402720ed5b86285705c1791e457

commit e0bf7a4734598402720ed5b86285705c1791e457
Author: Jeff Law 
Date:   Thu Jul 11 12:05:56 2024 -0600

[to-be-committed,RISC-V] Eliminate unnecessary sign extension after inlined 
str[n]cmp

This patch eliminates an unnecessary sign extension for scalar inlined
string comparisons on rv64.

Conceptually this is pretty simple.  Prove all the paths which "return"
a value from the inlined string comparison already have sign extended
values.

FINAL_LABEL is the point after the calculation of the return value.  So
if we have a jump to FINAL_LABEL, we must have a properly extended
result value at that point.

Second we're going to arrange in the .md part of the expander to use an
X mode temporary for the result.  After computing the result we will (if
necessary) extract the low part of the result using a SUBREG tagged with
the appropriate SUBREG_PROMOTED_* bits.

So with that background.

We find a jump to FINAL_LABEL in emit_strcmp_scalar_compare_byte.  Since
we know the result is X mode, we can just emit the subtraction of the
two chars in X mode and we'll have a properly sign extended result.

There's 4 jumps to final_label in emit_strcmp_scalar.

The first is just returning zero and needs trivial simplification to not
force the result into SImode.

The second is after calling strcmp in the library.  The ABI mandates
that value is sign extended, so there's nothing to do for that case.

The 3rd occurs after a call to
emit_strcmp_scalar_result_calculation_nonul.  If we dive into that
routine it needs simplificationq similar to what we did in
emit_strcmp_scalar_compare_byte

The 4th occurs after a call to emit_strcmp_scalar_result_calculation
which again needs trivial adjustment like we've done in the other routines.

Finally, at the end of expand_strcmp, just store the X mode result
sitting in SUB to RESULT.

The net of all that is we know every path has its result properly
extended to X mode.  Standard redundant extension removal will take care
of the rest.

We've been running this within Ventana for about 6 months, so naturally
it's been through various QA cycles, dhrystone, spec2017, etc.  It's
also been through a build/test cycle in my tester.  Waiting on results
from the pre-commit testing before moving forward.

gcc/
* config/riscv/riscv-string.cc
(emit_strcmp_scalar_compare_byte): Set RESULT directly rather
than using a new temporary.
(emit_strcmp_scalar_result_calculation_nonul): Likewise.
(emit_strcmp_scalar_result_calculation): Likewise.
(riscv_expand_strcmp_scalar): Use CONST0_RTX rather than
generating a new node.
(expand_strcmp): Copy directly from SUB to RESULT.
* config/riscv/riscv.md (cmpstrnsi, cmpstrsi): Pass an X
mode temporary to the expansion routines.  If necessary
extract low part of the word to store in final result location.

(cherry picked from commit 74d8accaf88f83bfcab1150bf9be5140e7ac0e94)

Diff:
---
 gcc/config/riscv/riscv-string.cc | 15 +--
 gcc/config/riscv/riscv.md| 28 
 2 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 257a514d2901..4736228e6f14 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -140,9 +140,7 @@ static void
 emit_strcmp_scalar_compare_byte (rtx result, rtx data1, rtx data2,
 rtx final_label)
 {
-  rtx tmp = gen_reg_rtx (Xmode);
-  do_sub3 (tmp, data1, data2);
-  emit_insn (gen_movsi (result, gen_lowpart (SImode, tmp)));
+  do_sub3 (result, data1, data2);
   emit_jump_insn (gen_jump (final_label));
   emit_barrier (); /* No fall-through.  */
 }
@@ -310,8 +308,7 @@ emit_strcmp_scalar_result_calculation_nonul (rtx result, 
rtx data1, rtx data2)
   rtx tmp = gen_reg_rtx (Xmode);
   emit_insn (gen_slt_3 (LTU, Xmode, Xmode, tmp, data1, data2));
   do_neg2 (tmp, tmp);
-  do_ior3 (tmp, tmp, const1_rtx);
-  emit_insn (gen_movsi (result, gen_lowpart (SImode, tmp)));
+  do_ior3 (result, tmp, const1_rtx);
 }
 
 /* strcmp-result calculation.
@@ -367,9 +364,7 @@ emit_strcmp_scalar_result_calculation (rtx result, rtx 
data1, rtx data2,
   unsigned int shiftr = (xlen - 1) * BITS_PER_UNIT;
   do_lshr3 (data1, data1, GEN_INT (shiftr));
   do_lshr3 (data2, data2, GEN_INT (shiftr));
-  rtx tmp = gen_reg_rtx (Xmode);
-  do_sub3 (tmp, data1, data2);
-  emit_insn (gen_movsi (result, gen_lowpart (SImode, tmp)));
+  do_sub3 (result, data1, data2);
 }
 
 /* Expand str(n)cmp using Zbb/TheadBb instructions.
@@ -444,7 +439,7 @@ riscv_expand_strcmp_scalar (rtx result, rtx src1, 

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Add testcases for vector .SAT_SUB in zip benchmark

2024-07-11 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:1726acdf1f7a3c3129a08fa571d750c5d09f8176

commit 1726acdf1f7a3c3129a08fa571d750c5d09f8176
Author: Pan Li 
Date:   Thu Jul 11 15:54:32 2024 +0800

RISC-V: Add testcases for vector .SAT_SUB in zip benchmark

This patch would like to add the test cases for the vector .SAT_SUB in
the zip benchmark.  Aka:

Form in zip benchmark:
  #define DEF_VEC_SAT_U_SUB_ZIP(T1, T2) \
  void __attribute__((noinline))\
  vec_sat_u_sub_##T1##_##T2##_fmt_zip (T1 *x, T2 b, unsigned limit) \
  { \
T2 a;   \
T1 *p = x;  \
do {\
  a = *--p; \
  *p = (T1)(a >= b ? a - b : 0);\
} while (--limit);  \
  }

DEF_VEC_SAT_U_SUB_ZIP(uint8_t, uint16_t)

vec_sat_u_sub_uint16_t_uint32_t_fmt_zip:
  ...
  vsetvli   a4,zero,e32,m1,ta,ma
  vmv.v.x   v6,a1
  vsetvli   zero,zero,e16,mf2,ta,ma
  vid.v v2
  lia4,-1
  vnclipu.wiv6,v6,0   // .SAT_TRUNC
.L3:
  vle16.v   v3,0(a3)
  vrsub.vx  v5,v2,a6
  mva7,a4
  addw  a4,a4,t3
  vrgather.vv   v1,v3,v5
  vssubu.vv v1,v1,v6  // .SAT_SUB
  vrgather.vv   v3,v1,v5
  vse16.v   v3,0(a3)
  sub   a3,a3,t1
  bgtu  t4,a4,.L3

Passed the rv64gcv tests.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h: Add test
helper macros.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_data.h: Add test
data for .SAT_SUB in zip benchmark.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vx.h: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_zip-run.c: New 
test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_zip.c: New test.

Signed-off-by: Pan Li 
(cherry picked from commit b3c686416e88bf135def0e72d316713af01445a1)

Diff:
---
 .../riscv/rvv/autovec/binop/vec_sat_arith.h| 18 +
 .../riscv/rvv/autovec/binop/vec_sat_binary_vx.h| 22 ++
 .../riscv/rvv/autovec/binop/vec_sat_data.h | 81 ++
 .../rvv/autovec/binop/vec_sat_u_sub_zip-run.c  | 16 +
 .../riscv/rvv/autovec/binop/vec_sat_u_sub_zip.c| 18 +
 5 files changed, 155 insertions(+)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
index 10459807b2c4..416a1e49a47b 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
@@ -322,6 +322,19 @@ vec_sat_u_sub_##T##_fmt_10 (T *out, T *op_1, T *op_2, 
unsigned limit) \
 } \
 }
 
+#define DEF_VEC_SAT_U_SUB_ZIP(T1, T2) \
+void __attribute__((noinline))\
+vec_sat_u_sub_##T1##_##T2##_fmt_zip (T1 *x, T2 b, unsigned limit) \
+{ \
+  T2 a;   \
+  T1 *p = x;  \
+  do {\
+a = *--p; \
+*p = (T1)(a >= b ? a - b : 0);\
+  } while (--limit);  \
+}
+#define DEF_VEC_SAT_U_SUB_ZIP_WRAP(T1, T2) DEF_VEC_SAT_U_SUB_ZIP(T1, T2)
+
 #define RUN_VEC_SAT_U_SUB_FMT_1(T, out, op_1, op_2, N) \
   vec_sat_u_sub_##T##_fmt_1(out, op_1, op_2, N)
 
@@ -352,6 +365,11 @@ vec_sat_u_sub_##T##_fmt_10 (T *out, T *op_1, T *op_2, 
unsigned limit) \
 #define RUN_VEC_SAT_U_SUB_FMT_10(T, out, op_1, op_2, N) \
   vec_sat_u_sub_##T##_fmt_10(out, op_1, op_2, N)
 
+#define RUN_VEC_SAT_U_SUB_FMT_ZIP(T1, T2, x, b, N) \
+  vec_sat_u_sub_##T1##_##T2##_fmt_zip(x, b, N)
+#define RUN_VEC_SAT_U_SUB_FMT_ZIP_WRAP(T1, T2, x, b, N) \
+  RUN_VEC_SAT_U_SUB_FMT_ZIP(T1, T2, x, b, N) \
+
 
/**/
 /* Saturation Sub Truncated (Unsigned and Signed) 
*/
 
/**/
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vx.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vx.h
new file mode 100644
index 

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: c implies zca, and conditionally zcf & zcd

2024-07-11 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:b3586773824f09bf2bf16ce398d2f988428c0821

commit b3586773824f09bf2bf16ce398d2f988428c0821
Author: Fei Gao 
Date:   Wed Jul 10 10:12:02 2024 +

RISC-V: c implies zca, and conditionally zcf & zcd

According to Zc-1.0.4-3.pdf from

https://github.com/riscvarchive/riscv-code-size-reduction/releases/tag/v1.0.4-3
The rule is that:
- C always implies Zca
- C+F implies Zcf (RV32 only)
- C+D implies Zcd

Signed-off-by: Fei Gao 
gcc/ChangeLog:

* common/config/riscv/riscv-common.cc:
c implies zca, and conditionally zcf & zcd.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/attribute-15.c: adapt TC.
* gcc.target/riscv/attribute-16.c: likewise.
* gcc.target/riscv/attribute-17.c: likewise.
* gcc.target/riscv/attribute-18.c: likewise.
* gcc.target/riscv/pr110696.c: likewise.
* gcc.target/riscv/rvv/base/abi-callee-saved-1-zcmp.c: likewise.
* gcc.target/riscv/rvv/base/abi-callee-saved-2-zcmp.c: likewise.
* gcc.target/riscv/rvv/base/pr114352-1.c: likewise.
* gcc.target/riscv/rvv/base/pr114352-3.c: likewise.
* gcc.target/riscv/arch-39.c: New test.
* gcc.target/riscv/arch-40.c: New test.

(cherry picked from commit 36e5e409190e595638cec053ea034d20d5c74d6b)

Diff:
---
 gcc/common/config/riscv/riscv-common.cc  | 12 
 gcc/testsuite/gcc.target/riscv/arch-39.c |  7 +++
 gcc/testsuite/gcc.target/riscv/arch-40.c |  7 +++
 gcc/testsuite/gcc.target/riscv/attribute-15.c|  2 +-
 gcc/testsuite/gcc.target/riscv/attribute-16.c|  2 +-
 gcc/testsuite/gcc.target/riscv/attribute-17.c|  2 +-
 gcc/testsuite/gcc.target/riscv/attribute-18.c|  2 +-
 gcc/testsuite/gcc.target/riscv/pr110696.c|  2 +-
 .../gcc.target/riscv/rvv/base/abi-callee-saved-1-zcmp.c  |  2 +-
 .../gcc.target/riscv/rvv/base/abi-callee-saved-2-zcmp.c  |  2 +-
 gcc/testsuite/gcc.target/riscv/rvv/base/pr114352-1.c |  4 ++--
 gcc/testsuite/gcc.target/riscv/rvv/base/pr114352-3.c |  8 
 12 files changed, 39 insertions(+), 13 deletions(-)

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index b0a16f5bd30f..3c4178c19c99 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -82,6 +82,18 @@ static const riscv_implied_info_t riscv_implied_info[] =
   {"a", "zaamo"},
   {"a", "zalrsc"},
 
+  {"c", "zca"},
+  {"c", "zcf",
+   [] (const riscv_subset_list *subset_list) -> bool
+   {
+ return subset_list->xlen () == 32 && subset_list->lookup ("f");
+   }},
+  {"c", "zcd",
+   [] (const riscv_subset_list *subset_list) -> bool
+   {
+ return subset_list->lookup ("d");
+   }},
+
   {"zabha", "zaamo"},
 
   {"b", "zba"},
diff --git a/gcc/testsuite/gcc.target/riscv/arch-39.c 
b/gcc/testsuite/gcc.target/riscv/arch-39.c
new file mode 100644
index ..beeb81e44c50
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/arch-39.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64idc_zcmt -mabi=lp64d" } */
+int
+foo ()
+{}
+
+/* { dg-error "zcd conflicts with zcmt" "" { target *-*-* } 0 } */
diff --git a/gcc/testsuite/gcc.target/riscv/arch-40.c 
b/gcc/testsuite/gcc.target/riscv/arch-40.c
new file mode 100644
index ..eaefaf1d0d75
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/arch-40.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64idc_zcmp -mabi=lp64d" } */
+int
+foo ()
+{}
+
+/* { dg-error "zcd conflicts with zcmp" "" { target *-*-* } 0 } */
diff --git a/gcc/testsuite/gcc.target/riscv/attribute-15.c 
b/gcc/testsuite/gcc.target/riscv/attribute-15.c
index a2e394b6489b..ac6caaecd4f7 100644
--- a/gcc/testsuite/gcc.target/riscv/attribute-15.c
+++ b/gcc/testsuite/gcc.target/riscv/attribute-15.c
@@ -3,4 +3,4 @@
 int foo()
 {
 }
-/* { dg-final { scan-assembler ".attribute arch, 
\"rv32i2p0_m2p0_a2p0_f2p0_d2p0_c2p0_zaamo1p0_zalrsc1p0\"" } } */
+/* { dg-final { scan-assembler ".attribute arch, 
\"rv32i2p0_m2p0_a2p0_f2p0_d2p0_c2p0_zaamo1p0_zalrsc1p0_zca1p0_zcd1p0_zcf1p0\"" 
} } */
diff --git a/gcc/testsuite/gcc.target/riscv/attribute-16.c 
b/gcc/testsuite/gcc.target/riscv/attribute-16.c
index d2b18160cb5d..539e426ca976 100644
--- a/gcc/testsuite/gcc.target/riscv/attribute-16.c
+++ b/gcc/testsuite/gcc.target/riscv/attribute-16.c
@@ -3,4 +3,4 @@
 int foo()
 {
 }
-/* { dg-final { scan-assembler ".attribute arch, 
\"rv32i2p1_m2p0_a2p0_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_zaamo1p0_zalrsc1p0\"" 
} } */
+/* { dg-final { scan-assembler ".attribute arch, 
\"rv32i2p1_m2p0_a2p0_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_zaamo1p0_zalrsc1p0_zca1p0_zcd1p0_zcf1p0\""
 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/attribute-17.c 

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Update testsuite to use b

2024-07-11 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:6b32e85e97ef5f5654bb070172659f967c5ba50f

commit 6b32e85e97ef5f5654bb070172659f967c5ba50f
Author: Edwin Lu 
Date:   Wed Jul 3 17:17:27 2024 -0700

RISC-V: Update testsuite to use b

Update all instances of zba_zbb_zbs in the testsuite to use b instead

gcc/testsuite/ChangeLog:

* g++.target/riscv/redundant-bitmap-1.C: Use gcb instead of
zba_zbb_zbs
* g++.target/riscv/redundant-bitmap-2.C: Ditto
* g++.target/riscv/redundant-bitmap-3.C: Ditto
* g++.target/riscv/redundant-bitmap-4.C: Ditto
* gcc.target/riscv/shift-add-1.c: Ditto
* gcc.target/riscv/shift-add-2.c: Ditto
* gcc.target/riscv/synthesis-1.c: Ditto
* gcc.target/riscv/synthesis-2.c: Ditto
* gcc.target/riscv/synthesis-3.c: Ditto
* gcc.target/riscv/synthesis-4.c: Ditto
* gcc.target/riscv/synthesis-5.c: Ditto
* gcc.target/riscv/synthesis-6.c: Ditto
* gcc.target/riscv/synthesis-7.c: Ditto
* gcc.target/riscv/synthesis-8.c: Ditto
* gcc.target/riscv/zba_zbs_and-1.c: Ditto
* gcc.target/riscv/zbs-zext-3.c: Ditto
* lib/target-supports.exp: Add b to riscv_get_arch

Signed-off-by: Edwin Lu 
(cherry picked from commit 04df2a924bba38c271bfe4ed0e94af1877413818)

Diff:
---
 gcc/testsuite/g++.target/riscv/redundant-bitmap-1.C | 2 +-
 gcc/testsuite/g++.target/riscv/redundant-bitmap-2.C | 2 +-
 gcc/testsuite/g++.target/riscv/redundant-bitmap-3.C | 2 +-
 gcc/testsuite/g++.target/riscv/redundant-bitmap-4.C | 2 +-
 gcc/testsuite/gcc.target/riscv/shift-add-1.c| 2 +-
 gcc/testsuite/gcc.target/riscv/shift-add-2.c| 2 +-
 gcc/testsuite/gcc.target/riscv/synthesis-1.c| 2 +-
 gcc/testsuite/gcc.target/riscv/synthesis-2.c| 2 +-
 gcc/testsuite/gcc.target/riscv/synthesis-3.c| 2 +-
 gcc/testsuite/gcc.target/riscv/synthesis-4.c| 2 +-
 gcc/testsuite/gcc.target/riscv/synthesis-5.c| 2 +-
 gcc/testsuite/gcc.target/riscv/synthesis-6.c| 2 +-
 gcc/testsuite/gcc.target/riscv/synthesis-7.c| 2 +-
 gcc/testsuite/gcc.target/riscv/synthesis-8.c| 2 +-
 gcc/testsuite/gcc.target/riscv/zba_zbs_and-1.c  | 2 +-
 gcc/testsuite/gcc.target/riscv/zbs-zext-3.c | 4 ++--
 gcc/testsuite/lib/target-supports.exp   | 2 +-
 17 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/gcc/testsuite/g++.target/riscv/redundant-bitmap-1.C 
b/gcc/testsuite/g++.target/riscv/redundant-bitmap-1.C
index 37066f10eeae..62bb2ab7b67d 100644
--- a/gcc/testsuite/g++.target/riscv/redundant-bitmap-1.C
+++ b/gcc/testsuite/g++.target/riscv/redundant-bitmap-1.C
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -march=rv64gc_zba_zbb_zbs -mabi=lp64" } */
+/* { dg-options "-O2 -march=rv64gcb -mabi=lp64" } */
 
 void setBit(char , int b) {
 char c = 0x1UL << b;
diff --git a/gcc/testsuite/g++.target/riscv/redundant-bitmap-2.C 
b/gcc/testsuite/g++.target/riscv/redundant-bitmap-2.C
index 86acaba298fc..52204daecd11 100644
--- a/gcc/testsuite/g++.target/riscv/redundant-bitmap-2.C
+++ b/gcc/testsuite/g++.target/riscv/redundant-bitmap-2.C
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -march=rv64gc_zba_zbb_zbs -mabi=lp64" } */
+/* { dg-options "-O2 -march=rv64gcb -mabi=lp64" } */
 
 void setBit(char , int b) {
 char c = 0x1UL << b;
diff --git a/gcc/testsuite/g++.target/riscv/redundant-bitmap-3.C 
b/gcc/testsuite/g++.target/riscv/redundant-bitmap-3.C
index 16bd7c1785e7..6745220f2f41 100644
--- a/gcc/testsuite/g++.target/riscv/redundant-bitmap-3.C
+++ b/gcc/testsuite/g++.target/riscv/redundant-bitmap-3.C
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -march=rv64gc_zba_zbb_zbs -mabi=lp64" } */
+/* { dg-options "-O2 -march=rv64gcb -mabi=lp64" } */
 
 void setBit(char , int b) {
 char c = 0x1UL << b;
diff --git a/gcc/testsuite/g++.target/riscv/redundant-bitmap-4.C 
b/gcc/testsuite/g++.target/riscv/redundant-bitmap-4.C
index f664ee01a016..5e351fe457e9 100644
--- a/gcc/testsuite/g++.target/riscv/redundant-bitmap-4.C
+++ b/gcc/testsuite/g++.target/riscv/redundant-bitmap-4.C
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -march=rv64gc_zba_zbb_zbs -mabi=lp64" } */
+/* { dg-options "-O2 -march=rv64gcb -mabi=lp64" } */
 
 void setBit(char , int b) {
 char c = 0x1UL << b;
diff --git a/gcc/testsuite/gcc.target/riscv/shift-add-1.c 
b/gcc/testsuite/gcc.target/riscv/shift-add-1.c
index d98875c32716..db84a51a2227 100644
--- a/gcc/testsuite/gcc.target/riscv/shift-add-1.c
+++ b/gcc/testsuite/gcc.target/riscv/shift-add-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv64gc_zba_zbb_zbs -mabi=lp64" } */
+/* { dg-options "-march=rv64gcb -mabi=lp64" } */
 /* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */
 
 int composeFromSurrogate(const unsigned short high) {
diff --git 

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Add support for B standard extension

2024-07-11 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:ac013ac6c759c6398db35fcc76265300375f5b37

commit ac013ac6c759c6398db35fcc76265300375f5b37
Author: Edwin Lu 
Date:   Wed Jul 10 09:44:48 2024 -0700

RISC-V: Add support for B standard extension

This patch adds support for recognizing the B standard extension to be the
collection of Zba, Zbb, Zbs extensions for consistency and conciseness
across toolchains

https://github.com/riscv/riscv-b/tags

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Add imply rules for B 
extension
* config/riscv/arch-canonicalize: Ditto

Signed-off-by: Edwin Lu 
(cherry picked from commit 2a90c41a131080e5fdd2b5554fcdba5c654cb93f)

Diff:
---
 gcc/common/config/riscv/riscv-common.cc | 7 +++
 gcc/config/riscv/arch-canonicalize  | 1 +
 2 files changed, 8 insertions(+)

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index dab2e7679653..b0a16f5bd30f 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -84,6 +84,10 @@ static const riscv_implied_info_t riscv_implied_info[] =
 
   {"zabha", "zaamo"},
 
+  {"b", "zba"},
+  {"b", "zbb"},
+  {"b", "zbs"},
+
   {"zdinx", "zfinx"},
   {"zfinx", "zicsr"},
   {"zdinx", "zicsr"},
@@ -245,6 +249,8 @@ static const struct riscv_ext_version 
riscv_ext_version_table[] =
   {"c", ISA_SPEC_CLASS_20190608, 2, 0},
   {"c", ISA_SPEC_CLASS_2P2,  2, 0},
 
+  {"b",   ISA_SPEC_CLASS_NONE, 1, 0},
+
   {"h",   ISA_SPEC_CLASS_NONE, 1, 0},
 
   {"v",   ISA_SPEC_CLASS_NONE, 1, 0},
@@ -405,6 +411,7 @@ static const struct riscv_ext_version 
riscv_ext_version_table[] =
 static const struct riscv_ext_version riscv_combine_info[] =
 {
   {"a", ISA_SPEC_CLASS_20191213, 2, 1},
+  {"b",  ISA_SPEC_CLASS_NONE, 1, 0},
   {"zk",  ISA_SPEC_CLASS_NONE, 1, 0},
   {"zkn",  ISA_SPEC_CLASS_NONE, 1, 0},
   {"zks",  ISA_SPEC_CLASS_NONE, 1, 0},
diff --git a/gcc/config/riscv/arch-canonicalize 
b/gcc/config/riscv/arch-canonicalize
index 35a7fe4455a6..2ea514dd9869 100755
--- a/gcc/config/riscv/arch-canonicalize
+++ b/gcc/config/riscv/arch-canonicalize
@@ -45,6 +45,7 @@ IMPLIED_EXT = {
   "zabha" : ["zaamo"],
 
   "f" : ["zicsr"],
+  "b" : ["zba", "zbb", "zbs"],
   "zdinx" : ["zfinx", "zicsr"],
   "zfinx" : ["zicsr"],
   "zhinx" : ["zhinxmin", "zfinx", "zicsr"],


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: fix zcmp popretz [PR113715]

2024-07-11 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:4649bea8adb5b29d566dd4c63f68e56a877ebefc

commit 4649bea8adb5b29d566dd4c63f68e56a877ebefc
Author: Fei Gao 
Date:   Tue Jul 9 10:00:29 2024 +

RISC-V: fix zcmp popretz [PR113715]

No functional changes compared with V1, just spaces to table conversion
in testcases to pass check-function-bodies.

V1 passed regression locally but suprisingly failed in pre-commit CI, after
picking the patch from patchwork, I realize table got coverted to spaces
before sending the patch.

Root cause:

https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=b27d323a368033f0b37e93c57a57a35fd9997864
Commit above tries in targetm.gen_epilogue () to detect if
there's li a0,0 insn at the end of insn chain, if so, cm.popret
is replaced by cm.popretz and li a0,0 insn is deleted.

Insertion of the generated epilogue sequence
into the insn chain doesn't happen at this moment.
If later shrink-wrap decides NOT to insert the epilogue sequence at the end
of insn chain, then the li a0,0 insn has already been mistakeny removed.

Fix this issue by removing generation of cm.popretz in epilogue,
leaving the assignment to a0 and use insn with cm.popret.

That's likely going to result in some kind of code size regression,
but not a correctness regression.

Optimization can be done in future.

Signed-off-by: Fei Gao 

gcc/ChangeLog:
PR target/113715

* config/riscv/riscv.cc (riscv_zcmp_can_use_popretz): Removed.
(riscv_gen_multi_pop_insn): Remove generation of cm.popretz.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rv32e_zcmp.c: Adapt TC.
* gcc.target/riscv/rv32i_zcmp.c: Likewise.

(cherry picked from commit 7a345d0314f8cf0f15ca3664b1e4430d65764570)

Diff:
---
 gcc/config/riscv/riscv.cc   | 53 -
 gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c |  3 +-
 gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c |  3 +-
 3 files changed, 4 insertions(+), 55 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 4acd643fd8d3..ce73c18f88f8 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -8167,52 +8167,6 @@ riscv_adjust_libcall_cfi_epilogue ()
   return dwarf;
 }
 
-/* return true if popretz pattern can be matched.
-   set (reg 10 a0) (const_int 0)
-   use (reg 10 a0)
-   NOTE_INSN_EPILOGUE_BEG  */
-static rtx_insn *
-riscv_zcmp_can_use_popretz (void)
-{
-  rtx_insn *insn = NULL, *use = NULL, *clear = NULL;
-
-  /* sequence stack for NOTE_INSN_EPILOGUE_BEG*/
-  struct sequence_stack *outer_seq = get_current_sequence ()->next;
-  if (!outer_seq)
-return NULL;
-  insn = outer_seq->first;
-  if (!insn || !NOTE_P (insn) || NOTE_KIND (insn) != NOTE_INSN_EPILOGUE_BEG)
-return NULL;
-
-  /* sequence stack for the insn before NOTE_INSN_EPILOGUE_BEG*/
-  outer_seq = outer_seq->next;
-  if (outer_seq)
-insn = outer_seq->last;
-
-  /* skip notes  */
-  while (insn && NOTE_P (insn))
-{
-  insn = PREV_INSN (insn);
-}
-  use = insn;
-
-  /* match use (reg 10 a0)  */
-  if (use == NULL || !INSN_P (use) || GET_CODE (PATTERN (use)) != USE
-  || !REG_P (XEXP (PATTERN (use), 0))
-  || REGNO (XEXP (PATTERN (use), 0)) != A0_REGNUM)
-return NULL;
-
-  /* match set (reg 10 a0) (const_int 0 [0])  */
-  clear = PREV_INSN (use);
-  if (clear != NULL && INSN_P (clear) && GET_CODE (PATTERN (clear)) == SET
-  && REG_P (SET_DEST (PATTERN (clear)))
-  && REGNO (SET_DEST (PATTERN (clear))) == A0_REGNUM
-  && SET_SRC (PATTERN (clear)) == const0_rtx)
-return clear;
-
-  return NULL;
-}
-
 static void
 riscv_gen_multi_pop_insn (bool use_multi_pop_normal, unsigned mask,
  unsigned multipop_size)
@@ -8223,13 +8177,6 @@ riscv_gen_multi_pop_insn (bool use_multi_pop_normal, 
unsigned mask,
   if (!use_multi_pop_normal)
 insn = emit_insn (
   riscv_gen_multi_push_pop_insn (POP_IDX, multipop_size, regs_count));
-  else if (rtx_insn *clear_a0_insn = riscv_zcmp_can_use_popretz ())
-{
-  delete_insn (NEXT_INSN (clear_a0_insn));
-  delete_insn (clear_a0_insn);
-  insn = emit_jump_insn (
-   riscv_gen_multi_push_pop_insn (POPRETZ_IDX, multipop_size, regs_count));
-}
   else
 insn = emit_jump_insn (
   riscv_gen_multi_push_pop_insn (POPRET_IDX, multipop_size, regs_count));
diff --git a/gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c 
b/gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c
index 50e443573ad9..0af4d7199f68 100644
--- a/gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c
+++ b/gcc/testsuite/gcc.target/riscv/rv32e_zcmp.c
@@ -259,7 +259,8 @@ foo (void)
 **test_popretz:
 ** cm.push {ra}, -16
 ** callf1
-** cm.popretz  {ra}, 16
+** li  a0,0
+** cm.popret   {ra}, 16
 */
 long
 test_popretz ()
diff --git a/gcc/testsuite/gcc.target/riscv/rv32i_zcmp.c 

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Fix comment/naming in attribute parsing code

2024-07-11 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:16d1d5b7db194fa7e0373bf78fccabdf8ef0d009

commit 16d1d5b7db194fa7e0373bf78fccabdf8ef0d009
Author: Christoph Müllner 
Date:   Fri Jul 5 04:58:07 2024 +0200

RISC-V: Fix comment/naming in attribute parsing code

Function target attributes have to be separated by semi-colons.
Let's fix the comment and variable naming to better explain what
the code does.

gcc/ChangeLog:

* config/riscv/riscv-target-attr.cc (riscv_process_target_attr):
Fix comments and variable names.

Signed-off-by: Christoph Müllner 
(cherry picked from commit 5ef0b7d2048a7142174ee3e8e021fc1a9c3e3334)

Diff:
---
 gcc/config/riscv/riscv-target-attr.cc | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/config/riscv/riscv-target-attr.cc 
b/gcc/config/riscv/riscv-target-attr.cc
index 19eb7b06d548..0bbe7df25d19 100644
--- a/gcc/config/riscv/riscv-target-attr.cc
+++ b/gcc/config/riscv/riscv-target-attr.cc
@@ -338,11 +338,11 @@ riscv_process_target_attr (tree fndecl, tree args, 
location_t loc,
   char *str_to_check = buf.get ();
   strcpy (str_to_check, TREE_STRING_POINTER (args));
 
-  /* Used to catch empty spaces between commas i.e.
+  /* Used to catch empty spaces between semi-colons i.e.
  attribute ((target ("attr1;;attr2"))).  */
-  unsigned int num_commas = num_occurences_in_str (';', str_to_check);
+  unsigned int num_semicolons = num_occurences_in_str (';', str_to_check);
 
-  /* Handle multiple target attributes separated by ','.  */
+  /* Handle multiple target attributes separated by ';'.  */
   char *token = strtok_r (str_to_check, ";", _to_check);
 
   riscv_target_attr_parser attr_parser (loc);
@@ -354,7 +354,7 @@ riscv_process_target_attr (tree fndecl, tree args, 
location_t loc,
   token = strtok_r (NULL, ";", _to_check);
 }
 
-  if (num_attrs != num_commas + 1)
+  if (num_attrs != num_semicolons + 1)
 {
   error_at (loc, "malformed % attribute",
TREE_STRING_POINTER (args));


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Deduplicate arch subset list processing

2024-07-11 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:6b9226f62dd3a5425815a0b1ddf64215548b2669

commit 6b9226f62dd3a5425815a0b1ddf64215548b2669
Author: Christoph Müllner 
Date:   Fri Jul 5 01:09:46 2024 +0200

RISC-V: Deduplicate arch subset list processing

We have a code duplication in riscv_set_arch_by_subset_list() and
riscv_parse_arch_string(), where the latter function parses an ISA string
into a subset_list before doing the same as the former function.

riscv_parse_arch_string() is used to process command line options and
riscv_set_arch_by_subset_list() processes target attributes.
So, it is obvious that both functions should do the same.
Let's deduplicate the code to enforce this.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc 
(riscv_set_arch_by_subset_list):
Fix overlong line.
(riscv_parse_arch_string): Replace duplicated code by a call to
riscv_set_arch_by_subset_list.

Signed-off-by: Christoph Müllner 
(cherry picked from commit 85fa334fbcaa8e4b98ab197a8c9410dde87f0ae3)

Diff:
---
 gcc/common/config/riscv/riscv-common.cc | 32 ++--
 1 file changed, 6 insertions(+), 26 deletions(-)

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index b9bda3e110a2..dab2e7679653 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -1826,7 +1826,8 @@ riscv_set_arch_by_subset_list (riscv_subset_list 
*subset_list,
   else if (subset_list->xlen () == 64)
opts->x_target_flags |= MASK_64BIT;
 
-  for (arch_ext_flag_tab = _ext_flag_table[0]; 
arch_ext_flag_tab->ext;
+  for (arch_ext_flag_tab = _ext_flag_table[0];
+  arch_ext_flag_tab->ext;
   ++arch_ext_flag_tab)
{
  if (subset_list->lookup (arch_ext_flag_tab->ext))
@@ -1850,30 +1851,6 @@ riscv_parse_arch_string (const char *isa,
   if (!subset_list)
 return;
 
-  if (opts)
-{
-  const riscv_ext_flag_table_t *arch_ext_flag_tab;
-  /* Clean up target flags before we set.  */
-  for (arch_ext_flag_tab = _ext_flag_table[0];
-  arch_ext_flag_tab->ext;
-  ++arch_ext_flag_tab)
-   opts->*arch_ext_flag_tab->var_ref &= ~arch_ext_flag_tab->mask;
-
-  if (subset_list->xlen () == 32)
-   opts->x_target_flags &= ~MASK_64BIT;
-  else if (subset_list->xlen () == 64)
-   opts->x_target_flags |= MASK_64BIT;
-
-
-  for (arch_ext_flag_tab = _ext_flag_table[0];
-  arch_ext_flag_tab->ext;
-  ++arch_ext_flag_tab)
-   {
- if (subset_list->lookup (arch_ext_flag_tab->ext))
-   opts->*arch_ext_flag_tab->var_ref |= arch_ext_flag_tab->mask;
-   }
-}
-
   /* Avoid double delete if current_subset_list equals cmdline_subset_list.  */
   if (current_subset_list && current_subset_list != cmdline_subset_list)
 delete current_subset_list;
@@ -1881,7 +1858,10 @@ riscv_parse_arch_string (const char *isa,
   if (cmdline_subset_list)
 delete cmdline_subset_list;
 
-  current_subset_list = cmdline_subset_list = subset_list;
+  cmdline_subset_list = subset_list;
+  /* current_subset_list is set in the call below.  */
+
+  riscv_set_arch_by_subset_list (subset_list, opts);
 }
 
 /* Return the riscv_cpu_info entry for CPU, NULL if not found.  */


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: testsuite: Properly gate LTO tests

2024-07-11 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:6f195492e9798607b475c14934b0b77cf258a2a6

commit 6f195492e9798607b475c14934b0b77cf258a2a6
Author: Christoph Müllner 
Date:   Fri Jul 5 09:53:34 2024 +0200

RISC-V: testsuite: Properly gate LTO tests

There are two test cases with the following skip directive:
  dg-skip-if "" { *-*-* } { "-flto -fno-fat-lto-objects" }
This reads as: skip if both '-flto' and '-fno-fat-lto-objects'
are present.  This is not the case if only '-flto' is present.

Since both tests depend on instruction sequences (one does
check-function-bodies the other tests for an assembler error
message), they won't work reliably with fat LTO objects.

Let's change the skip line to gate the test on '-flto'
to avoid failing tests like this:

FAIL: gcc.target/riscv/interrupt-misaligned.c   -O2 -flto   
check-function-bodies interrupt
FAIL: gcc.target/riscv/interrupt-misaligned.c   -O2 -flto 
-flto-partition=none   check-function-bodies interrupt
FAIL: gcc.target/riscv/pr93202.c   -O2 -flto   (test for errors, line 10)
FAIL: gcc.target/riscv/pr93202.c   -O2 -flto   (test for errors, line 9)
FAIL: gcc.target/riscv/pr93202.c   -O2 -flto -flto-partition=none   (test 
for errors, line 10)
FAIL: gcc.target/riscv/pr93202.c   -O2 -flto -flto-partition=none   (test 
for errors, line 9)

gcc/testsuite/ChangeLog:

* gcc.target/riscv/interrupt-misaligned.c: Remove
"-fno-fat-lto-objects" from skip condition.
* gcc.target/riscv/pr93202.c: Likewise.

Signed-off-by: Christoph Müllner 
(cherry picked from commit 0717d50fc4ff983b79093bdef43b04e4584cc3cd)

Diff:
---
 gcc/testsuite/gcc.target/riscv/interrupt-misaligned.c | 2 +-
 gcc/testsuite/gcc.target/riscv/pr93202.c  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/interrupt-misaligned.c 
b/gcc/testsuite/gcc.target/riscv/interrupt-misaligned.c
index b5f8e6c2bbef..912f180e4d65 100644
--- a/gcc/testsuite/gcc.target/riscv/interrupt-misaligned.c
+++ b/gcc/testsuite/gcc.target/riscv/interrupt-misaligned.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -march=rv64gc -mabi=lp64d -fno-schedule-insns 
-fno-schedule-insns2" } */
-/* { dg-skip-if "" { *-*-* } { "-flto -fno-fat-lto-objects" } } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } } */
 
 /*  Make sure no stack offset are misaligned.
 **  interrupt:
diff --git a/gcc/testsuite/gcc.target/riscv/pr93202.c 
b/gcc/testsuite/gcc.target/riscv/pr93202.c
index 5501191ea52c..5de003fac421 100644
--- a/gcc/testsuite/gcc.target/riscv/pr93202.c
+++ b/gcc/testsuite/gcc.target/riscv/pr93202.c
@@ -1,7 +1,7 @@
 /* PR inline-asm/93202 */
 /* { dg-do compile { target fpic } } */
 /* { dg-options "-fpic" } */
-/* { dg-skip-if "" { *-*-* } { "-flto -fno-fat-lto-objects" } } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } } */
 
 void
 foo (void)


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Add testcases for unsigned vector .SAT_ADD IMM form 2

2024-07-11 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:f3efae407d19273bf91c16e588ff63a80f0baf26

commit f3efae407d19273bf91c16e588ff63a80f0baf26
Author: Pan Li 
Date:   Mon Jul 8 21:58:59 2024 +0800

RISC-V: Add testcases for unsigned vector .SAT_ADD IMM form 2

After the middle-end supported the vector mode of .SAT_ADD,  add more
testcases to ensure the correctness of RISC-V backend for form 2.  Aka:

Form 2:
  #define DEF_VEC_SAT_U_ADD_IMM_FMT_2(T, IMM)  \
  T __attribute__((noinline))  \
  vec_sat_u_add_imm##IMM##_##T##_fmt_2 (T *out, T *in, unsigned limit) \
  {\
unsigned i;\
for (i = 0; i < limit; i++)\
  out[i] = (T)(in[i] + IMM) < in[i] ? -1 : (in[i] + IMM);  \
  }

DEF_VEC_SAT_U_ADD_IMM_FMT_2 (uint64_t, 9)

Passed the fully rv64gcv regression tests.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h: Add help
test macro.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-5.c: New 
test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-6.c: New 
test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-7.c: New 
test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-8.c: New 
test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-run-5.c: New 
test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-run-6.c: New 
test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-run-7.c: New 
test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-run-8.c: New 
test.

Signed-off-by: Pan Li 
(cherry picked from commit ecde8d50bea3573194f21277666f83463cbbe9c9)

Diff:
---
 .../riscv/rvv/autovec/binop/vec_sat_arith.h| 17 +
 .../riscv/rvv/autovec/binop/vec_sat_u_add_imm-5.c  | 14 +++
 .../riscv/rvv/autovec/binop/vec_sat_u_add_imm-6.c  | 14 +++
 .../riscv/rvv/autovec/binop/vec_sat_u_add_imm-7.c  | 14 +++
 .../riscv/rvv/autovec/binop/vec_sat_u_add_imm-8.c  | 14 +++
 .../rvv/autovec/binop/vec_sat_u_add_imm-run-5.c| 28 ++
 .../rvv/autovec/binop/vec_sat_u_add_imm-run-6.c| 28 ++
 .../rvv/autovec/binop/vec_sat_u_add_imm-run-7.c| 28 ++
 .../rvv/autovec/binop/vec_sat_u_add_imm-run-8.c| 28 ++
 9 files changed, 185 insertions(+)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
index 3733c8fd2c15..10459807b2c4 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
@@ -158,12 +158,29 @@ vec_sat_u_add_imm##IMM##_##T##_fmt_1 (T *out, T *in, 
unsigned limit) \
 #define DEF_VEC_SAT_U_ADD_IMM_FMT_1_WRAP(T, IMM) \
   DEF_VEC_SAT_U_ADD_IMM_FMT_1(T, IMM)
 
+#define DEF_VEC_SAT_U_ADD_IMM_FMT_2(T, IMM)  \
+T __attribute__((noinline))  \
+vec_sat_u_add_imm##IMM##_##T##_fmt_2 (T *out, T *in, unsigned limit) \
+{\
+  unsigned i;\
+  for (i = 0; i < limit; i++)\
+out[i] = (T)(in[i] + IMM) < in[i] ? -1 : (in[i] + IMM);  \
+}
+#define DEF_VEC_SAT_U_ADD_IMM_FMT_2_WRAP(T, IMM) \
+  DEF_VEC_SAT_U_ADD_IMM_FMT_2(T, IMM)
+
 #define RUN_VEC_SAT_U_ADD_IMM_FMT_1(T, out, op_1, expect, IMM, N) \
   vec_sat_u_add_imm##IMM##_##T##_fmt_1(out, op_1, N); \
   VALIDATE_RESULT (out, expect, N)
 #define RUN_VEC_SAT_U_ADD_IMM_FMT_1_WRAP(T, out, op_1, expect, IMM, N) \
   RUN_VEC_SAT_U_ADD_IMM_FMT_1(T, out, op_1, expect, IMM, N)
 
+#define RUN_VEC_SAT_U_ADD_IMM_FMT_2(T, out, op_1, expect, IMM, N) \
+  vec_sat_u_add_imm##IMM##_##T##_fmt_2(out, op_1, N); \
+  VALIDATE_RESULT (out, expect, N)
+#define RUN_VEC_SAT_U_ADD_IMM_FMT_2_WRAP(T, out, op_1, expect, IMM, N) \
+  RUN_VEC_SAT_U_ADD_IMM_FMT_2(T, out, op_1, expect, IMM, N)
+
 
/**/
 /* Saturation Sub (Unsigned and Signed)   
*/
 
/**/
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-5.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-5.c
new file mode 100644
index ..d25fdcf78f38
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-5.c
@@ -0,0 +1,14 

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] [to-be-committed][RISC-V][V3] DCE analysis for extension elimination

2024-07-11 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:d678469ed4f98cee0180ff6a9961920404e82a14

commit d678469ed4f98cee0180ff6a9961920404e82a14
Author: Jeff Law 
Date:   Mon Jul 8 17:06:55 2024 -0600

[to-be-committed][RISC-V][V3] DCE analysis for extension elimination

The pre-commit testing showed that making ext-dce only active at -O2 and 
above
would require minor edits to the tests.  In some cases we had specified -O1 
in
the test or specified no optimization level at all. Those need to be bumped 
to
-O2.   In one test we had one set of dg-options overriding another.

The other approach that could have been taken would be to drop the -On
argument, add an explicit -fext-dce and add dg-skip-if options.  I 
originally
thought that was going to be way to go, but the dg-skip-if aspect was going 
to
get ugly as things like interaction between unrolling, peeling and -ftracer
would have to be accounted for and would likely need semi-regular 
adjustment.

Changes since V2:
  Testsuite changes to deal with pass only being enabled at -O2 or
  higher.

--

Changes since V1:

  Check flag_ext_dce before running the new pass.  I'd forgotten that
  I had removed that part of the gate to facilitate more testing.
  Turn flag_ext_dce on at -O2 and above.
  Adjust one of the riscv tests to explicitly avoid vectors
  Adjust a few aarch64 tests
In tbz_2.c we remove an unnecessary extension which causes us to use
"x" registers instead of "w" registers.

In the pred_clobber tests we also remove an extension and that
ultimately causes a reg->reg copy to change locations.

--

This was actually ack'd late in the gcc-14 cycle, but I chose not to 
integrate
it given how late we were in the cycle.

The basic idea here is to track liveness of subobjects within a word and if 
we
find an extension where the bits set aren't actually used, then we convert 
the
extension into a subreg.  The subreg typically simplifies away.

I've seen this help a few routines in coremark, fix one bug in the testsuite
(pr111384) and fix a couple internally reported bugs in Ventana.

The original idea and code were from Joern; Jivan and I hacked it into 
usable
shape.  I've had this in my tester for ~8 months, so it's been through more
build/test cycles than I care to contemplate and nearly every architecture 
we
support.

But just in case, I'm going to wait for it to spin through the pre-commit CI
tester.  I'll find my old ChangeLog before committing.

gcc/
* Makefile.in (OBJS): Add ext-dce.o
* common.opt (ext-dce): Document new option.
* df-scan.cc (df_get_ext_block_use_set): Delete prototype and
make extern.
* df.h (df_get_exit_block_use_set): Prototype.
* ext-dce.cc: New file/pass.
* opts.cc (default_options_table): Handle ext-dce at -O2 or higher.
* passes.def: Add ext-dce before combine.
* tree-pass.h (make_pass_ext_dce): Prototype.

gcc/testsuite
* gcc.target/aarch64/sve/pred_clobber_1.c: Update expected output.
* gcc.target/aarch64/sve/pred_clobber_2.c: Likewise.
* gcc.target/aarch64/sve/pred_clobber_3.c: Likewise.
* gcc.target/aarch64/tbz_2.c: Likewise.
* gcc.target/riscv/core_bench_list.c: New test.
* gcc.target/riscv/core_init_matrix.c: New test.
* gcc.target/riscv/core_list_init.c: New test.
* gcc.target/riscv/matrix_add_const.c: New test.
* gcc.target/riscv/mem-extend.c: New test.
* gcc.target/riscv/pr111384.c: New test.

Co-authored-by: Jivan Hakobyan 
Co-authored-by: Joern Rennecke 

(cherry picked from commit 98914f9eba5f19d3eb93fbce8726b5264631cba0)

Diff:
---
 gcc/Makefile.in   |   1 +
 gcc/common.opt|   4 +
 gcc/df-scan.cc|   3 +-
 gcc/df.h  |   1 +
 gcc/ext-dce.cc| 943 ++
 gcc/opts.cc   |   1 +
 gcc/passes.def|   1 +
 gcc/testsuite/gcc.target/aarch64/tbz_2.c  |   6 +-
 gcc/testsuite/gcc.target/riscv/core_bench_list.c  |  15 +
 gcc/testsuite/gcc.target/riscv/core_init_matrix.c |  17 +
 gcc/testsuite/gcc.target/riscv/core_list_init.c   |  18 +
 gcc/testsuite/gcc.target/riscv/matrix_add_const.c |  13 +
 gcc/testsuite/gcc.target/riscv/mem-extend.c   |  14 +
 gcc/testsuite/gcc.target/riscv/pr111384.c |  11 +
 gcc/tree-pass.h   |   1 +
 15 files changed, 1044 insertions(+), 5 deletions(-)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Add testcases for unsigned vector .SAT_ADD IMM form 1

2024-07-11 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:79aca63c26b3eee221c26ee62803fa19e5996502

commit 79aca63c26b3eee221c26ee62803fa19e5996502
Author: Pan Li 
Date:   Mon Jul 8 20:31:31 2024 +0800

RISC-V: Add testcases for unsigned vector .SAT_ADD IMM form 1

After the middle-end supported the vector mode of .SAT_ADD,  add more
testcases to ensure the correctness of RISC-V backend for form 1.  Aka:

Form 1:
  #define DEF_VEC_SAT_U_ADD_IMM_FMT_1(T, IMM)  \
  T __attribute__((noinline))  \
  vec_sat_u_add_imm##IMM##_##T##_fmt_1 (T *out, T *in, unsigned limit) \
  {\
unsigned i;\
for (i = 0; i < limit; i++)\
  out[i] = (T)(in[i] + IMM) >= in[i] ? (in[i] + IMM) : -1; \
  }

DEF_VEC_SAT_U_ADD_IMM_FMT_1 (uint64_t, 9)

Passed the fully rv64gcv regression tests.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h: Add help
test macro.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_data.h: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-1.c: New 
test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-2.c: New 
test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-3.c: New 
test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-4.c: New 
test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-run-1.c: New 
test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-run-2.c: New 
test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-run-3.c: New 
test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-run-4.c: New 
test.

Signed-off-by: Pan Li 
(cherry picked from commit 35b1096896a94a90d787f5ef402ba009dd4f0393)

Diff:
---
 .../riscv/rvv/autovec/binop/vec_sat_arith.h|  25 ++
 .../riscv/rvv/autovec/binop/vec_sat_data.h | 256 +
 .../riscv/rvv/autovec/binop/vec_sat_u_add_imm-1.c  |  14 ++
 .../riscv/rvv/autovec/binop/vec_sat_u_add_imm-2.c  |  14 ++
 .../riscv/rvv/autovec/binop/vec_sat_u_add_imm-3.c  |  14 ++
 .../riscv/rvv/autovec/binop/vec_sat_u_add_imm-4.c  |  14 ++
 .../rvv/autovec/binop/vec_sat_u_add_imm-run-1.c|  28 +++
 .../rvv/autovec/binop/vec_sat_u_add_imm-run-2.c|  28 +++
 .../rvv/autovec/binop/vec_sat_u_add_imm-run-3.c|  28 +++
 .../rvv/autovec/binop/vec_sat_u_add_imm-run-4.c|  28 +++
 10 files changed, 449 insertions(+)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
index b55a589e019a..3733c8fd2c15 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
@@ -4,6 +4,14 @@
 #include 
 #include 
 
+#define VALIDATE_RESULT(out, expect, N)  \
+  do \
+{\
+  for (unsigned i = 0; i < N; i++)   \
+if (out[i] != expect[i]) __builtin_abort (); \
+}\
+  while (false)
+
 
/**/
 /* Saturation Add (unsigned and signed)   
*/
 
/**/
@@ -139,6 +147,23 @@ vec_sat_u_add_##T##_fmt_8 (T *out, T *op_1, T *op_2, 
unsigned limit) \
 #define RUN_VEC_SAT_U_ADD_FMT_8(T, out, op_1, op_2, N) \
   vec_sat_u_add_##T##_fmt_8(out, op_1, op_2, N)
 
+#define DEF_VEC_SAT_U_ADD_IMM_FMT_1(T, IMM)  \
+T __attribute__((noinline))  \
+vec_sat_u_add_imm##IMM##_##T##_fmt_1 (T *out, T *in, unsigned limit) \
+{\
+  unsigned i;\
+  for (i = 0; i < limit; i++)\
+out[i] = (T)(in[i] + IMM) >= in[i] ? (in[i] + IMM) : -1; \
+}
+#define DEF_VEC_SAT_U_ADD_IMM_FMT_1_WRAP(T, IMM) \
+  DEF_VEC_SAT_U_ADD_IMM_FMT_1(T, IMM)
+
+#define RUN_VEC_SAT_U_ADD_IMM_FMT_1(T, out, op_1, expect, IMM, N) \
+  vec_sat_u_add_imm##IMM##_##T##_fmt_1(out, op_1, N); \
+  VALIDATE_RESULT (out, expect, N)
+#define RUN_VEC_SAT_U_ADD_IMM_FMT_1_WRAP(T, out, op_1, expect, IMM, N) \
+  RUN_VEC_SAT_U_ADD_IMM_FMT_1(T, out, op_1, expect, IMM, N)
+
 
/**/
 /* Saturation Sub (Unsigned and Signed)   
*/
 

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Implement .SAT_TRUNC for vector unsigned int

2024-07-11 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:cbee085f82008a05b335754ea8b9764c3a78202c

commit cbee085f82008a05b335754ea8b9764c3a78202c
Author: Pan Li 
Date:   Fri Jul 5 09:02:47 2024 +0800

RISC-V: Implement .SAT_TRUNC for vector unsigned int

This patch would like to implement the .SAT_TRUNC for the RISC-V
backend.  With the help of the RVV Vector Narrowing Fixed-Point
Clip Instructions.  The below SEW(S) are supported:

* e64 => e32
* e64 => e16
* e64 => e8
* e32 => e16
* e32 => e8
* e16 => e8

Take below example to see the changes to asm.
Form 1:
  #define DEF_VEC_SAT_U_TRUNC_FMT_1(NT, WT) \
  void __attribute__((noinline))\
  vec_sat_u_trunc_##NT##_##WT##_fmt_1 (NT *out, WT *in, unsigned limit) \
  { \
unsigned i; \
for (i = 0; i < limit; i++) \
  { \
WT x = in[i];   \
bool overflow = x > (WT)(NT)(-1);   \
out[i] = ((NT)x) | (NT)-overflow;   \
  } \
  }

DEF_VEC_SAT_U_TRUNC_FMT_1 (uint32_t, uint64_t)

Before this patch:
.L3:
  vsetvli  a5,a2,e64,m1,ta,ma
  vle64.v  v1,0(a1)
  vmsgtu.vvv0,v1,v2
  vsetvli  zero,zero,e32,mf2,ta,ma
  vncvt.x.x.w  v1,v1
  vmerge.vim   v1,v1,-1,v0
  vse32.v  v1,0(a0)
  slli a4,a5,3
  add  a1,a1,a4
  slli a4,a5,2
  add  a0,a0,a4
  sub  a2,a2,a5
  bne  a2,zero,.L3

After this patch:
.L3:
  vsetvli  a5,a2,e32,mf2,ta,ma
  vle64.v  v1,0(a1)
  vnclipu.wi   v1,v1,0
  vse32.v  v1,0(a0)
  slli a4,a5,3
  add  a1,a1,a4
  slli a4,a5,2
  add  a0,a0,a4
  sub  a2,a2,a5
  bne  a2,zero,.L3

Passed the rv64gcv fully regression tests.

gcc/ChangeLog:

* config/riscv/autovec.md (ustrunc2): Add
new pattern for double truncation.
(ustrunc2): Ditto but for quad truncation.
(ustrunc2): Ditto but for oct truncation.
* config/riscv/riscv-protos.h (expand_vec_double_ustrunc): Add
new func decl to expand double vec ustrunc.
(expand_vec_quad_ustrunc): Ditto but for quad.
(expand_vec_oct_ustrunc): Ditto but for oct.
* config/riscv/riscv-v.cc (expand_vec_double_ustrunc): Add new
func impl to expand vector double ustrunc.
(expand_vec_quad_ustrunc): Ditto but for quad.
(expand_vec_oct_ustrunc): Ditto but for oct.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h: Add helper
test macros.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_data.h: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-3.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-4.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-5.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-6.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-1.c: New 
test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-2.c: New 
test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-3.c: New 
test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-4.c: New 
test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-5.c: New 
test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-6.c: New 
test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_unary_vv_run.h: New 
test.

Signed-off-by: Pan Li 
(cherry picked from commit dafd63d7c5cddce1e00803606e742d75927b1a1e)

Diff:
---
 gcc/config/riscv/autovec.md|  35 ++
 gcc/config/riscv/riscv-protos.h|   4 +
 gcc/config/riscv/riscv-v.cc|  46 +++
 .../riscv/rvv/autovec/binop/vec_sat_arith.h|  22 ++
 .../riscv/rvv/autovec/unop/vec_sat_data.h  | 394 +
 .../riscv/rvv/autovec/unop/vec_sat_u_trunc-1.c |  19 +
 .../riscv/rvv/autovec/unop/vec_sat_u_trunc-2.c |  21 ++
 .../riscv/rvv/autovec/unop/vec_sat_u_trunc-3.c |  23 ++
 .../riscv/rvv/autovec/unop/vec_sat_u_trunc-4.c 

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] [RISC-V] add implied extension repeatly until stable

2024-07-11 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:fe7b8250ee5190f31a622d99e394861f0688feef

commit fe7b8250ee5190f31a622d99e394861f0688feef
Author: Fei Gao 
Date:   Fri Jul 5 09:56:30 2024 +

[RISC-V] add implied extension repeatly until stable

Call handle_implied_ext repeatly until there's no
new subset added into the subset list.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc 
(riscv_subset_list::riscv_subset_list):
init m_subset_num to 0.
(riscv_subset_list::add): increase m_subset_num once a subset added.
(riscv_subset_list::finalize): call handle_implied_ext repeatly
until no change in m_subset_num.
* config/riscv/riscv-subset.h: add m_subset_num member.

Signed-off-by: Fei Gao 
(cherry picked from commit 682731d11f9c02b24358d1af1e2bf6fca0221ee7)

Diff:
---
 gcc/common/config/riscv/riscv-common.cc | 14 +++---
 gcc/config/riscv/riscv-subset.h |  3 +++
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 16bdb3fd2259..b9bda3e110a2 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -569,7 +569,8 @@ riscv_subset_t::riscv_subset_t ()
 }
 
 riscv_subset_list::riscv_subset_list (const char *arch, location_t loc)
-  : m_arch (arch), m_loc (loc), m_head (NULL), m_tail (NULL), m_xlen (0)
+  : m_arch (arch), m_loc (loc), m_head (NULL), m_tail (NULL), m_xlen (0),
+m_subset_num (0)
 {
 }
 
@@ -815,6 +816,7 @@ riscv_subset_list::add (const char *subset, int 
major_version,
   return;
 }
 
+  m_subset_num++;
   riscv_subset_t *s = new riscv_subset_t ();
   riscv_subset_t *itr;
 
@@ -1597,9 +1599,15 @@ void
 riscv_subset_list::finalize ()
 {
   riscv_subset_t *subset;
+  unsigned pre_subset_num;
 
-  for (subset = m_head; subset != NULL; subset = subset->next)
-handle_implied_ext (subset->name.c_str ());
+  do
+{
+  pre_subset_num = m_subset_num;
+  for (subset = m_head; subset != NULL; subset = subset->next)
+   handle_implied_ext (subset->name.c_str ());
+}
+  while (pre_subset_num != m_subset_num);
 
   gcc_assert (check_implied_ext ());
 
diff --git a/gcc/config/riscv/riscv-subset.h b/gcc/config/riscv/riscv-subset.h
index fe7f54d8bc57..7dc196a20074 100644
--- a/gcc/config/riscv/riscv-subset.h
+++ b/gcc/config/riscv/riscv-subset.h
@@ -62,6 +62,9 @@ private:
   /* X-len of m_arch. */
   unsigned m_xlen;
 
+  /* Number of subsets. */
+  unsigned m_subset_num;
+
   riscv_subset_list (const char *, location_t);
 
   const char *parsing_subset_version (const char *, const char *, unsigned *,


[gcc r15-1976] [to-be-committed, RISC-V] Eliminate unnecessary sign extension after inlined str[n]cmp

2024-07-11 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:74d8accaf88f83bfcab1150bf9be5140e7ac0e94

commit r15-1976-g74d8accaf88f83bfcab1150bf9be5140e7ac0e94
Author: Jeff Law 
Date:   Thu Jul 11 12:05:56 2024 -0600

[to-be-committed,RISC-V] Eliminate unnecessary sign extension after inlined 
str[n]cmp

This patch eliminates an unnecessary sign extension for scalar inlined
string comparisons on rv64.

Conceptually this is pretty simple.  Prove all the paths which "return"
a value from the inlined string comparison already have sign extended
values.

FINAL_LABEL is the point after the calculation of the return value.  So
if we have a jump to FINAL_LABEL, we must have a properly extended
result value at that point.

Second we're going to arrange in the .md part of the expander to use an
X mode temporary for the result.  After computing the result we will (if
necessary) extract the low part of the result using a SUBREG tagged with
the appropriate SUBREG_PROMOTED_* bits.

So with that background.

We find a jump to FINAL_LABEL in emit_strcmp_scalar_compare_byte.  Since
we know the result is X mode, we can just emit the subtraction of the
two chars in X mode and we'll have a properly sign extended result.

There's 4 jumps to final_label in emit_strcmp_scalar.

The first is just returning zero and needs trivial simplification to not
force the result into SImode.

The second is after calling strcmp in the library.  The ABI mandates
that value is sign extended, so there's nothing to do for that case.

The 3rd occurs after a call to
emit_strcmp_scalar_result_calculation_nonul.  If we dive into that
routine it needs simplificationq similar to what we did in
emit_strcmp_scalar_compare_byte

The 4th occurs after a call to emit_strcmp_scalar_result_calculation
which again needs trivial adjustment like we've done in the other routines.

Finally, at the end of expand_strcmp, just store the X mode result
sitting in SUB to RESULT.

The net of all that is we know every path has its result properly
extended to X mode.  Standard redundant extension removal will take care
of the rest.

We've been running this within Ventana for about 6 months, so naturally
it's been through various QA cycles, dhrystone, spec2017, etc.  It's
also been through a build/test cycle in my tester.  Waiting on results
from the pre-commit testing before moving forward.

gcc/
* config/riscv/riscv-string.cc
(emit_strcmp_scalar_compare_byte): Set RESULT directly rather
than using a new temporary.
(emit_strcmp_scalar_result_calculation_nonul): Likewise.
(emit_strcmp_scalar_result_calculation): Likewise.
(riscv_expand_strcmp_scalar): Use CONST0_RTX rather than
generating a new node.
(expand_strcmp): Copy directly from SUB to RESULT.
* config/riscv/riscv.md (cmpstrnsi, cmpstrsi): Pass an X
mode temporary to the expansion routines.  If necessary
extract low part of the word to store in final result location.

Diff:
---
 gcc/config/riscv/riscv-string.cc | 15 +--
 gcc/config/riscv/riscv.md| 28 
 2 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 257a514d2901..4736228e6f14 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -140,9 +140,7 @@ static void
 emit_strcmp_scalar_compare_byte (rtx result, rtx data1, rtx data2,
 rtx final_label)
 {
-  rtx tmp = gen_reg_rtx (Xmode);
-  do_sub3 (tmp, data1, data2);
-  emit_insn (gen_movsi (result, gen_lowpart (SImode, tmp)));
+  do_sub3 (result, data1, data2);
   emit_jump_insn (gen_jump (final_label));
   emit_barrier (); /* No fall-through.  */
 }
@@ -310,8 +308,7 @@ emit_strcmp_scalar_result_calculation_nonul (rtx result, 
rtx data1, rtx data2)
   rtx tmp = gen_reg_rtx (Xmode);
   emit_insn (gen_slt_3 (LTU, Xmode, Xmode, tmp, data1, data2));
   do_neg2 (tmp, tmp);
-  do_ior3 (tmp, tmp, const1_rtx);
-  emit_insn (gen_movsi (result, gen_lowpart (SImode, tmp)));
+  do_ior3 (result, tmp, const1_rtx);
 }
 
 /* strcmp-result calculation.
@@ -367,9 +364,7 @@ emit_strcmp_scalar_result_calculation (rtx result, rtx 
data1, rtx data2,
   unsigned int shiftr = (xlen - 1) * BITS_PER_UNIT;
   do_lshr3 (data1, data1, GEN_INT (shiftr));
   do_lshr3 (data2, data2, GEN_INT (shiftr));
-  rtx tmp = gen_reg_rtx (Xmode);
-  do_sub3 (tmp, data1, data2);
-  emit_insn (gen_movsi (result, gen_lowpart (SImode, tmp)));
+  do_sub3 (result, data1, data2);
 }
 
 /* Expand str(n)cmp using Zbb/TheadBb instructions.
@@ -444,7 +439,7 @@ riscv_expand_strcmp_scalar (rtx result, rtx src1, rtx src2,
   /* All compared and everything was equal.  */
   if 

[gcc r15-1901] [to-be-committed][RISC-V][V3] DCE analysis for extension elimination

2024-07-08 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:98914f9eba5f19d3eb93fbce8726b5264631cba0

commit r15-1901-g98914f9eba5f19d3eb93fbce8726b5264631cba0
Author: Jeff Law 
Date:   Mon Jul 8 17:06:55 2024 -0600

[to-be-committed][RISC-V][V3] DCE analysis for extension elimination

The pre-commit testing showed that making ext-dce only active at -O2 and 
above
would require minor edits to the tests.  In some cases we had specified -O1 
in
the test or specified no optimization level at all. Those need to be bumped 
to
-O2.   In one test we had one set of dg-options overriding another.

The other approach that could have been taken would be to drop the -On
argument, add an explicit -fext-dce and add dg-skip-if options.  I 
originally
thought that was going to be way to go, but the dg-skip-if aspect was going 
to
get ugly as things like interaction between unrolling, peeling and -ftracer
would have to be accounted for and would likely need semi-regular 
adjustment.

Changes since V2:
  Testsuite changes to deal with pass only being enabled at -O2 or
  higher.

--

Changes since V1:

  Check flag_ext_dce before running the new pass.  I'd forgotten that
  I had removed that part of the gate to facilitate more testing.
  Turn flag_ext_dce on at -O2 and above.
  Adjust one of the riscv tests to explicitly avoid vectors
  Adjust a few aarch64 tests
In tbz_2.c we remove an unnecessary extension which causes us to use
"x" registers instead of "w" registers.

In the pred_clobber tests we also remove an extension and that
ultimately causes a reg->reg copy to change locations.

--

This was actually ack'd late in the gcc-14 cycle, but I chose not to 
integrate
it given how late we were in the cycle.

The basic idea here is to track liveness of subobjects within a word and if 
we
find an extension where the bits set aren't actually used, then we convert 
the
extension into a subreg.  The subreg typically simplifies away.

I've seen this help a few routines in coremark, fix one bug in the testsuite
(pr111384) and fix a couple internally reported bugs in Ventana.

The original idea and code were from Joern; Jivan and I hacked it into 
usable
shape.  I've had this in my tester for ~8 months, so it's been through more
build/test cycles than I care to contemplate and nearly every architecture 
we
support.

But just in case, I'm going to wait for it to spin through the pre-commit CI
tester.  I'll find my old ChangeLog before committing.

gcc/
* Makefile.in (OBJS): Add ext-dce.o
* common.opt (ext-dce): Document new option.
* df-scan.cc (df_get_ext_block_use_set): Delete prototype and
make extern.
* df.h (df_get_exit_block_use_set): Prototype.
* ext-dce.cc: New file/pass.
* opts.cc (default_options_table): Handle ext-dce at -O2 or higher.
* passes.def: Add ext-dce before combine.
* tree-pass.h (make_pass_ext_dce): Prototype.

gcc/testsuite
* gcc.target/aarch64/sve/pred_clobber_1.c: Update expected output.
* gcc.target/aarch64/sve/pred_clobber_2.c: Likewise.
* gcc.target/aarch64/sve/pred_clobber_3.c: Likewise.
* gcc.target/aarch64/tbz_2.c: Likewise.
* gcc.target/riscv/core_bench_list.c: New test.
* gcc.target/riscv/core_init_matrix.c: New test.
* gcc.target/riscv/core_list_init.c: New test.
* gcc.target/riscv/matrix_add_const.c: New test.
* gcc.target/riscv/mem-extend.c: New test.
* gcc.target/riscv/pr111384.c: New test.

Co-authored-by: Jivan Hakobyan 
Co-authored-by: Joern Rennecke 

Diff:
---
 gcc/Makefile.in|   1 +
 gcc/common.opt |   4 +
 gcc/df-scan.cc |   3 +-
 gcc/df.h   |   1 +
 gcc/ext-dce.cc | 943 +
 gcc/opts.cc|   1 +
 gcc/passes.def |   1 +
 .../gcc.target/aarch64/sve/pred_clobber_1.c|   1 +
 .../gcc.target/aarch64/sve/pred_clobber_2.c|   1 +
 .../gcc.target/aarch64/sve/pred_clobber_3.c|   1 +
 gcc/testsuite/gcc.target/aarch64/tbz_2.c   |   6 +-
 gcc/testsuite/gcc.target/riscv/core_bench_list.c   |  15 +
 gcc/testsuite/gcc.target/riscv/core_init_matrix.c  |  17 +
 gcc/testsuite/gcc.target/riscv/core_list_init.c|  18 +
 gcc/testsuite/gcc.target/riscv/matrix_add_const.c  |  13 +
 gcc/testsuite/gcc.target/riscv/mem-extend.c|  14 +
 gcc/testsuite/gcc.target/riscv/pr111384.c  |  11 +
 gcc/tree-pass.h|   1 +
 

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] [to-be-committed][v3][RISC-V] Handle bit manipulation of SImode values

2024-07-07 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:576d8c0e57bc6894ab1215bfd45406b8adcfd2dd

commit 576d8c0e57bc6894ab1215bfd45406b8adcfd2dd
Author: Jeff Law 
Date:   Sat Jul 6 12:57:59 2024 -0600

[to-be-committed][v3][RISC-V] Handle bit manipulation of SImode values

Last patch in this round of bitmanip work...  At least I think I'm going to
pause here and switch gears to other projects that need attention 

This patch introduces the ability to generate bitmanip instructions for rv64
when operating on SI objects when we know something about the range of the 
bit
position (due to masking of the position).

I've got note that the (7-pos % 8) bit position form was discovered by RAU 
in
500.perl.  I took that and expanded it to the simple (pos & mask) form as 
well
as covering bset, binv and bclr.

As far as the implementation is concerned

This turns the recently added define_splits into define_insn_and_split
constructs.  This allows combine to "see" enough RTL to realize a sign
extension is unnecessary.  Otherwise we get undesirable sign extensions for 
the
new testcases.

Second it adds new patterns for the logical operations.  Two patterns for
IOR/XOR and two patterns for AND.

I think a key concept to keep in mind is that once we determine a Zbs 
operation
is safe to perform on a SI value, we can rewrite the RTL in 64bit form.  If 
we
were ever to try and use range information at expand time for this stuff 
(and
we probably should investigate that), that's the path I'd suggest.

This is notably cleaner than my original implementation which actually kept 
the
more complex RTL form through final and emitted 2/3 instructions (mask the 
bit
position, then the bset/bclr/binv).

Tested in my tester, but waiting for pre-commit CI to report back before 
taking
further action.

gcc/
* config/riscv/bitmanip.md (bset splitters): Turn into 
define_and_splits.
Don't depend on combine splitting the "andn with constant" form.
(bset, binv, bclr with masked bit position): New patterns.

gcc/testsuite
* gcc.target/riscv/binv-for-simode-1.c: New test.
* gcc.target/riscv/bset-for-simode-1.c: New test.
* gcc.target/riscv/bclr-for-simode-1.c: New test.

(cherry picked from commit 273f16a125c4fab664683376ae04a9a31e7d6a22)

Diff:
---
 gcc/config/riscv/bitmanip.md   | 135 ++---
 gcc/testsuite/gcc.target/riscv/bclr-for-simode-1.c |  25 
 gcc/testsuite/gcc.target/riscv/binv-for-simode-1.c |  24 
 gcc/testsuite/gcc.target/riscv/bset-for-simode-1.c |  24 
 4 files changed, 192 insertions(+), 16 deletions(-)

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 3eedabffca0..f403ba8dbba 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -615,37 +615,140 @@
 ;; shift constant.  With the limited range we know the SImode sign
 ;; bit is never set, thus we can treat this as zero extending and
 ;; generate the bsetdi_2 pattern.
-(define_split
-  [(set (match_operand:DI 0 "register_operand")
+(define_insn_and_split ""
+  [(set (match_operand:DI 0 "register_operand" "=r")
(any_extend:DI
 (ashift:SI (const_int 1)
(subreg:QI
- (and:DI (not:DI (match_operand:DI 1 "register_operand"))
+ (and:DI (not:DI (match_operand:DI 1 "register_operand" 
"r"))
  (match_operand 2 "const_int_operand")) 0
-   (clobber (match_operand:DI 3 "register_operand"))]
+   (clobber (match_scratch:X 3 "="))]
   "TARGET_64BIT
&& TARGET_ZBS
&& (TARGET_ZBB || TARGET_ZBKB)
&& (INTVAL (operands[2]) & 0x1f) != 0x1f"
-   [(set (match_dup 0) (and:DI (not:DI (match_dup 1)) (match_dup 2)))
-(set (match_dup 0) (zero_extend:DI (ashift:SI
-  (const_int 1)
-  (subreg:QI (match_dup 0) 0])
+  "#"
+  "&& reload_completed"
+   [(set (match_dup 3) (match_dup 2))
+(set (match_dup 3) (and:DI (not:DI (match_dup 1)) (match_dup 3)))
+(set (match_dup 0) (zero_extend:DI
+(ashift:SI (const_int 1) (match_dup 4]
+  { operands[4] = gen_lowpart (QImode, operands[3]); }
+  [(set_attr "type" "bitmanip")])
 
-(define_split
-  [(set (match_operand:DI 0 "register_operand")
-   (any_extend:DI
+(define_insn_and_split ""
+  [(set (match_operand:DI 0 "register_operand" "=r")
+(any_extend:DI
 (ashift:SI (const_int 1)
(subreg:QI
- (and:DI (match_operand:DI 1 "register_operand")
+ (and:DI (match_operand:DI 1 "register_operand" "r")
  (match_operand 2 "const_int_operand")) 0]
   "TARGET_64BIT
&& TARGET_ZBS
&& (INTVAL (operands[2]) & 0x1f) != 0x1f"
-   [(set 

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: fix internal error on global variable-length array

2024-07-07 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:a3d53feb6ab6b71a11e220a800d8f5a013e7590c

commit a3d53feb6ab6b71a11e220a800d8f5a013e7590c
Author: Eric Botcazou 
Date:   Sat Jul 6 11:56:19 2024 +0200

RISC-V: fix internal error on global variable-length array

This is an ICE in the RISC-V back-end calling tree_to_uhwi on the DECL_SIZE
of a global variable-length array.

gcc/
PR target/115591
* config/riscv/riscv.cc (riscv_valid_lo_sum_p): Add missing test on
tree_fits_uhwi_p before calling tree_to_uhwi.

gcc/testsuite/
* gnat.dg/array41.ads, gnat.dg/array41.adb: New test.

(cherry picked from commit 8bc5561c43b195e1638e5acace8b41b3f7512be3)

Diff:
---
 gcc/config/riscv/riscv.cc |  4 +++-
 gcc/testsuite/gnat.dg/array41.adb | 37 +
 gcc/testsuite/gnat.dg/array41.ads |  5 +
 3 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index cca7ffde33a..4acd643fd8d 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -1702,7 +1702,9 @@ riscv_valid_lo_sum_p (enum riscv_symbol_type sym_type, 
machine_mode mode,
   align = (SYMBOL_REF_DECL (x)
   ? DECL_ALIGN (SYMBOL_REF_DECL (x))
   : 1);
-  size = (SYMBOL_REF_DECL (x) && DECL_SIZE (SYMBOL_REF_DECL (x))
+  size = (SYMBOL_REF_DECL (x)
+ && DECL_SIZE (SYMBOL_REF_DECL (x))
+ && tree_fits_uhwi_p (DECL_SIZE (SYMBOL_REF_DECL (x)))
  ? tree_to_uhwi (DECL_SIZE (SYMBOL_REF_DECL (x)))
  : 2*BITS_PER_WORD);
 }
diff --git a/gcc/testsuite/gnat.dg/array41.adb 
b/gcc/testsuite/gnat.dg/array41.adb
new file mode 100644
index 000..d0d5a69eeaf
--- /dev/null
+++ b/gcc/testsuite/gnat.dg/array41.adb
@@ -0,0 +1,37 @@
+-- { dg-do compile }
+
+with System.Storage_Elements;
+
+package body Array41 is
+
+   procedure Program_Initialization
+   with
+ Export,
+ Convention => Ada,
+ External_Name => "program_initialization";
+
+   procedure Program_Initialization is
+  use System.Storage_Elements;
+
+  Sdata : Storage_Element
+with Import, Convention => Asm, External_Name => "_sdata";
+  Edata : Storage_Element
+with Import, Convention => Asm, External_Name => "_edata";
+
+  Data_Size : constant Storage_Offset := Edata'Address - Sdata'Address;
+
+  --  Index from 1 so as to avoid subtracting 1 from the size
+  Data_In_Flash : constant Storage_Array (1 .. Data_Size)
+with Import, Convention => Asm, External_Name => "_sidata";
+
+  Data_In_Sram : Storage_Array (1 .. Data_Size)
+with Volatile, Import, Convention => Asm, External_Name => "_sdata";
+
+   begin
+  --  Copy rw data from flash to ram
+  for J in Data_In_Flash'Range loop
+ Data_In_Sram (J) := Data_In_Flash (J);
+  end loop;
+   end Program_Initialization;
+
+end Array41;
diff --git a/gcc/testsuite/gnat.dg/array41.ads 
b/gcc/testsuite/gnat.dg/array41.ads
new file mode 100644
index 000..50cde3cd819
--- /dev/null
+++ b/gcc/testsuite/gnat.dg/array41.ads
@@ -0,0 +1,5 @@
+package Array41 is
+
+  pragma Elaborate_Body;
+
+end Array41;


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Use tu policy for first-element vec_set [PR115725].

2024-07-07 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:30267a9ae0d49265e80d7d6748cd0851eb4fe1ad

commit 30267a9ae0d49265e80d7d6748cd0851eb4fe1ad
Author: Robin Dapp 
Date:   Mon Jul 1 13:37:17 2024 +0200

RISC-V: Use tu policy for first-element vec_set [PR115725].

This patch changes the tail policy for vmv.s.x from ta to tu.
By default the bug does not show up with qemu because qemu's
current vmv.s.x implementation always uses the tail-undisturbed
policy.  With a local qemu version that overwrites the tail
with ones when the tail-agnostic policy is specified, the bug
shows.

gcc/ChangeLog:

* config/riscv/autovec.md: Add TU policy.
* config/riscv/riscv-protos.h (enum insn_type): Define
SCALAR_MOVE_MERGED_OP_TU.

gcc/testsuite/ChangeLog:

PR target/115725

* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-1.c: Adjust
test expectation.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-4.c: Ditto.

(cherry picked from commit acc3b703c05debc6276451f9daae5d0ffc797eac)

Diff:
---
 gcc/config/riscv/autovec.md  |  3 ++-
 gcc/config/riscv/riscv-protos.h  |  4 
 .../gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-1.c   | 12 
 .../gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-2.c   | 12 
 .../gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-3.c   | 12 
 .../gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-4.c   | 12 
 6 files changed, 22 insertions(+), 33 deletions(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 66d70f678a6..0fb6316a2cf 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1341,7 +1341,8 @@
 {
   rtx ops[] = {operands[0], operands[0], operands[1]};
   riscv_vector::emit_nonvlmax_insn (code_for_pred_broadcast (mode),
-   riscv_vector::SCALAR_MOVE_MERGED_OP, 
ops, CONST1_RTX (Pmode));
+   riscv_vector::SCALAR_MOVE_MERGED_OP_TU,
+   ops, CONST1_RTX (Pmode));
 }
   else
 {
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index a8b76173fa0..abf6e34b5cc 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -524,6 +524,10 @@ enum insn_type : unsigned int
   SCALAR_MOVE_MERGED_OP = HAS_DEST_P | HAS_MASK_P | USE_ONE_TRUE_MASK_P
  | HAS_MERGE_P | TDEFAULT_POLICY_P | MDEFAULT_POLICY_P
  | UNARY_OP_P,
+
+  SCALAR_MOVE_MERGED_OP_TU = HAS_DEST_P | HAS_MASK_P | USE_ONE_TRUE_MASK_P
+ | HAS_MERGE_P | TU_POLICY_P | MDEFAULT_POLICY_P
+ | UNARY_OP_P,
 };
 
 enum vlmul_type
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-1.c
index ecb160933d6..99b0f625c83 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-1.c
@@ -64,14 +64,10 @@ typedef double vnx2df __attribute__((vector_size (16)));
 TEST_ALL1 (VEC_SET)
 TEST_ALL_VAR1 (VEC_SET_VAR1)
 
-/* { dg-final { scan-assembler-times 
{vset[i]*vli\s+[a-z0-9,]+,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
-/* { dg-final { scan-assembler-times 
{vset[i]*vli\s+[a-z0-9,]+,\s*e8,\s*m1,\s*tu,\s*ma} 5 } } */
-/* { dg-final { scan-assembler-times 
{vset[i]*vli\s+[a-z0-9,]+,\s*e16,\s*m1,\s*ta,\s*ma} 2 } } */
-/* { dg-final { scan-assembler-times 
{vset[i]*vli\s+[a-z0-9,]+,\s*e16,\s*m1,\s*tu,\s*ma} 6 } } */
-/* { dg-final { scan-assembler-times 
{vset[i]*vli\s+[a-z0-9,]+,\s*e32,\s*m1,\s*ta,\s*ma} 2 } } */
-/* { dg-final { scan-assembler-times 
{vset[i]*vli\s+[a-z0-9,]+,\s*e32,\s*m1,\s*tu,\s*ma} 6 } } */
-/* { dg-final { scan-assembler-times 
{vset[i]*vli\s+[a-z0-9,]+,\s*e64,\s*m1,\s*ta,\s*ma} 2 } } */
-/* { dg-final { scan-assembler-times 
{vset[i]*vli\s+[a-z0-9,]+,\s*e64,\s*m1,\s*tu,\s*ma} 4 } } */
+/* { dg-final { scan-assembler-times 
{vset[i]*vli\s+[a-z0-9,]+,\s*e8,\s*m1,\s*tu,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times 
{vset[i]*vli\s+[a-z0-9,]+,\s*e16,\s*m1,\s*tu,\s*ma} 8 } } */
+/* { dg-final { scan-assembler-times 
{vset[i]*vli\s+[a-z0-9,]+,\s*e32,\s*m1,\s*tu,\s*ma} 8 } } */
+/* { dg-final { scan-assembler-times 
{vset[i]*vli\s+[a-z0-9,]+,\s*e64,\s*m1,\s*tu,\s*ma} 6 } } */
 
 /* { dg-final { scan-assembler-times {\tvmv.v.x} 13 } } */
 /* { dg-final { scan-assembler-times {\tvfmv.v.f} 8 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-2.c
index 194abff77cc..64a40308eb1 100644
--- 

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] [committed][RISC-V] Fix test expectations after recent late-combine changes

2024-07-07 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:a2b380b973fc48b5b2f8c7ebff775ab5aab5a5cb

commit a2b380b973fc48b5b2f8c7ebff775ab5aab5a5cb
Author: Jeff Law 
Date:   Thu Jul 4 09:25:20 2024 -0600

[committed][RISC-V] Fix test expectations after recent late-combine changes

With the recent DCE related adjustment to late-combine the 
rvv/base/vcreate.c
test no longer has those undesirable vmvNr statements.

It's a bit unclear why this wasn't written as a scan-assembler-not and 
xfailed
given the comment says we don't want to see vmvNr insructions.  I must have
missed that during review.

This patch adjusts the test to expect no vmvNr statements and if they're 
ever
re-introduced, we'll get a nice unexpected failure.

gcc/testsuite
* gcc.target/riscv/rvv/base/vcreate.c: Update expected output.

(cherry picked from commit b611f3969249967d7f098c6adfcf5f701192a2d0)

Diff:
---
 gcc/testsuite/gcc.target/riscv/rvv/base/vcreate.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/vcreate.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/vcreate.c
index 01006de7c81..1c7c154637e 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/vcreate.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/vcreate.c
@@ -256,6 +256,6 @@ test_vcreate_v_i64m2x4 (vint64m2_t v0, vint64m2_t v1, 
vint64m2_t v2,
 }
 
 // Ideally with O3, should find 0 instances of any vmvnr.v PR113913
-/* { dg-final { scan-assembler-times {vmv1r.v\s+v[0-9]+,\s*v[0-9]+} 72 } } */
-/* { dg-final { scan-assembler-times {vmv2r.v\s+v[0-9]+,\s*v[0-9]+} 36 } } */
-/* { dg-final { scan-assembler-times {vmv4r.v\s+v[0-9]+,\s*v[0-9]+} 16 } } */
+/* { dg-final { scan-assembler-not {vmv1r.v\s+v[0-9]+,\s*v[0-9]+} } } */
+/* { dg-final { scan-assembler-not {vmv2r.v\s+v[0-9]+,\s*v[0-9]+} } } */
+/* { dg-final { scan-assembler-not {vmv4r.v\s+v[0-9]+,\s*v[0-9]+} } } */


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Describe -march behavior for dependent extensions

2024-07-07 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:316bda6c8e290853cfca060f38b2f0f4281d65f9

commit 316bda6c8e290853cfca060f38b2f0f4281d65f9
Author: Palmer Dabbelt 
Date:   Tue Jul 2 18:20:39 2024 -0700

RISC-V: Describe -march behavior for dependent extensions

gcc/ChangeLog:

* doc/invoke.texi: Describe -march behavior for dependent 
extensions on
RISC-V.

(cherry picked from commit 70f6bc39c4b0e147a816ad1dad583f944616c367)

Diff:
---
 gcc/doc/invoke.texi | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 023ee575b86..9486758d463 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -30927,6 +30927,10 @@ If both @option{-march} and @option{-mcpu=} are not 
specified, the default for
 this argument is system dependent, users who want a specific architecture
 extensions should specify one explicitly.
 
+When the RISC-V specifications define an extension as depending on other
+extensions, GCC will implicitly add the dependent extensions to the enabled
+extension set if they weren't added explicitly.
+
 @opindex mcpu
 @item -mcpu=@var{processor-string}
 Use architecture of and optimize the output for the given processor, specified


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Add support for Zabha extension

2024-07-07 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:de6abc8953cdba01e27be7427afe7834a68c583d

commit de6abc8953cdba01e27be7427afe7834a68c583d
Author: Gianluca Guida 
Date:   Tue Jul 2 18:05:14 2024 -0700

RISC-V: Add support for Zabha extension

The Zabha extension adds support for subword Zaamo ops.

Extension: https://github.com/riscv/riscv-zabha.git
Ratification: https://jira.riscv.org/browse/RVS-1685

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_subset_list::to_string): Skip zabha when not supported by
the assembler.
* config.in: Regenerate.
* config/riscv/arch-canonicalize: Make zabha imply zaamo.
* config/riscv/iterators.md (amobh): Add iterator for amo
byte/halfword.
* config/riscv/riscv.opt: Add zabha.
* config/riscv/sync.md (atomic_): Add
subword atomic op pattern.
(zabha_atomic_fetch_): Add subword
atomic_fetch op pattern.
(lrsc_atomic_fetch_): Prefer zabha over lrsc
for subword atomic ops.
(zabha_atomic_exchange): Add subword atomic exchange
pattern.
(lrsc_atomic_exchange): Prefer zabha over lrsc for subword
atomic exchange ops.
* configure: Regenerate.
* configure.ac: Add zabha assembler check.
* doc/sourcebuild.texi: Add zabha documentation.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Add zabha testsuite infra support.
* gcc.target/riscv/amo/inline-atomics-1.c: Remove zabha to continue 
to
test the lr/sc subword patterns.
* gcc.target/riscv/amo/inline-atomics-2.c: Ditto.
* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-acq-rel.c: 
Ditto.
* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-acquire.c: 
Ditto.
* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-relaxed.c: 
Ditto.
* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-release.c: 
Ditto.
* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-seq-cst.c: 
Ditto.
* gcc.target/riscv/amo/zalrsc-ztso-subword-amo-add-char-acq-rel.c: 
Ditto.
* gcc.target/riscv/amo/zalrsc-ztso-subword-amo-add-char-acquire.c: 
Ditto.
* gcc.target/riscv/amo/zalrsc-ztso-subword-amo-add-char-relaxed.c: 
Ditto.
* gcc.target/riscv/amo/zalrsc-ztso-subword-amo-add-char-release.c: 
Ditto.
* gcc.target/riscv/amo/zalrsc-ztso-subword-amo-add-char-seq-cst.c: 
Ditto.
* gcc.target/riscv/amo/zabha-all-amo-ops-char-run.c: New test.
* gcc.target/riscv/amo/zabha-all-amo-ops-short-run.c: New test.
* gcc.target/riscv/amo/zabha-rvwmo-all-amo-ops-char.c: New test.
* gcc.target/riscv/amo/zabha-rvwmo-all-amo-ops-short.c: New test.
* gcc.target/riscv/amo/zabha-rvwmo-amo-add-char.c: New test.
* gcc.target/riscv/amo/zabha-rvwmo-amo-add-short.c: New test.
* gcc.target/riscv/amo/zabha-ztso-amo-add-char.c: New test.
* gcc.target/riscv/amo/zabha-ztso-amo-add-short.c: New test.

Co-Authored-By: Patrick O'Neill 
Signed-Off-By: Gianluca Guida 
Tested-by: Andrea Parri 
(cherry picked from commit 7b2b2e3d660edc8ef3a8cfbdfc2b0fd499459601)

Diff:
---
 gcc/common/config/riscv/riscv-common.cc| 12 
 gcc/config.in  |  6 ++
 gcc/config/riscv/arch-canonicalize |  3 +
 gcc/config/riscv/iterators.md  |  3 +
 gcc/config/riscv/riscv.opt |  2 +
 gcc/config/riscv/sync.md   | 81 +-
 gcc/configure  | 31 +
 gcc/configure.ac   |  5 ++
 gcc/doc/sourcebuild.texi   | 12 +++-
 .../gcc.target/riscv/amo/inline-atomics-1.c|  1 +
 .../gcc.target/riscv/amo/inline-atomics-2.c|  1 +
 .../riscv/amo/zabha-all-amo-ops-char-run.c |  5 ++
 .../riscv/amo/zabha-all-amo-ops-short-run.c|  5 ++
 .../riscv/amo/zabha-rvwmo-all-amo-ops-char.c   | 23 ++
 .../riscv/amo/zabha-rvwmo-all-amo-ops-short.c  | 23 ++
 .../riscv/amo/zabha-rvwmo-amo-add-char.c   | 57 +++
 .../riscv/amo/zabha-rvwmo-amo-add-short.c  | 57 +++
 .../gcc.target/riscv/amo/zabha-ztso-amo-add-char.c | 57 +++
 .../riscv/amo/zabha-ztso-amo-add-short.c   | 57 +++
 .../zalrsc-rvwmo-subword-amo-add-char-acq-rel.c|  1 +
 .../zalrsc-rvwmo-subword-amo-add-char-acquire.c|  1 +
 .../zalrsc-rvwmo-subword-amo-add-char-relaxed.c|  1 +
 .../zalrsc-rvwmo-subword-amo-add-char-release.c|  1 +
 .../zalrsc-rvwmo-subword-amo-add-char-seq-cst.c|  1 +
 .../amo/zalrsc-ztso-subword-amo-add-char-acq-rel.c |  1 +
 

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Fix asm check failure for truncated after SAT_SUB

2024-07-07 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:b810058b87e322cf66e0130f6cfb04f08764b701

commit b810058b87e322cf66e0130f6cfb04f08764b701
Author: Pan Li 
Date:   Wed Jul 3 13:17:16 2024 +0800

RISC-V: Fix asm check failure for truncated after SAT_SUB

It seems that the asm check is incorrect for truncated after SAT_SUB,
we should take the vx check for vssubu instead of vv check.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-1.c:
Update vssubu check from vv to vx.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-2.c:
Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-3.c:
Ditto.

Signed-off-by: Pan Li 
(cherry picked from commit ab3e3d2f0564c2eb0640de3f4d0a50e1fcc8c318)

Diff:
---
 .../gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-1.c  | 2 +-
 .../gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-2.c  | 2 +-
 .../gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-3.c  | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-1.c
index dd9e3999a29..1e380657d74 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-1.c
@@ -11,7 +11,7 @@
 ** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e16,\s*m1,\s*ta,\s*ma
 ** ...
 ** vle16\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
-** vssubu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** vssubu\.vx\s+v[0-9]+,\s*v[0-9]+,\s*[atx][0-9]+
 ** vsetvli\s+zero,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma
 ** vncvt\.x\.x\.w\s+v[0-9]+,\s*v[0-9]+
 ** ...
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-2.c
index 738d1465a01..d7b8931f0ec 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-2.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-2.c
@@ -11,7 +11,7 @@
 ** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e32,\s*m1,\s*ta,\s*ma
 ** ...
 ** vle32\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
-** vssubu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** vssubu\.vx\s+v[0-9]+,\s*v[0-9]+,\s*[atx][0-9]+
 ** vsetvli\s+zero,\s*zero,\s*e16,\s*mf2,\s*ta,\s*ma
 ** vncvt\.x\.x\.w\s+v[0-9]+,\s*v[0-9]+
 ** ...
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-3.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-3.c
index b008b21cf0c..edf42a1f776 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-3.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-3.c
@@ -11,7 +11,7 @@
 ** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e64,\s*m1,\s*ta,\s*ma
 ** ...
 ** vle64\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
-** vssubu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** vssubu\.vx\s+v[0-9]+,\s*v[0-9]+,\s*[atx][0-9]+
 ** vsetvli\s+zero,\s*zero,\s*e32,\s*mf2,\s*ta,\s*ma
 ** vncvt\.x\.x\.w\s+v[0-9]+,\s*v[0-9]+
 ** ...


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Bugfix vfmv insn honor zvfhmin for FP16 SEW [PR115763]

2024-07-07 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:7013a4562f74ba14cd14f4e57cc318225752c762

commit 7013a4562f74ba14cd14f4e57cc318225752c762
Author: Pan Li 
Date:   Wed Jul 3 22:06:48 2024 +0800

RISC-V: Bugfix vfmv insn honor zvfhmin for FP16 SEW [PR115763]

According to the ISA,  the zvfhmin sub extension should only contain
convertion insn.  Thus,  the vfmv insn acts on FP16 should not be
present when only the zvfhmin option is given.

This patch would like to fix it by split the pred_broadcast define_insn
into zvfhmin and zvfh part.  Given below example:

void test (_Float16 *dest, _Float16 bias) {
  dest[0] = bias;
  dest[1] = bias;
}

when compile with -march=rv64gcv_zfh_zvfhmin

Before this patch:
test:
  vsetivlizero,2,e16,mf4,ta,ma
  vfmv.v.fv1,fa0 // should not leverage vfmv for zvfhmin
  vse16.v v1,0(a0)
  ret

After this patch:
test:
  addi sp,sp,-16
  fsh  fa0,14(sp)
  addi a5,sp,14
  vsetivli zero,2,e16,mf4,ta,ma
  vlse16.v v1,0(a5),zero
  vse16.v  v1,0(a0)
  addi sp,sp,16
  jr   ra

PR target/115763

gcc/ChangeLog:

* config/riscv/vector.md (*pred_broadcast): Split into
zvfh and zvfhmin part.
(*pred_broadcast_zvfh): New define_insn for zvfh part.
(*pred_broadcast_zvfhmin): Ditto but for zvfhmin.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/scalar_move-5.c: Adjust asm check.
* gcc.target/riscv/rvv/base/scalar_move-6.c: Ditto.
* gcc.target/riscv/rvv/base/scalar_move-7.c: Ditto.
* gcc.target/riscv/rvv/base/scalar_move-8.c: Ditto.
* gcc.target/riscv/rvv/base/pr115763-1.c: New test.
* gcc.target/riscv/rvv/base/pr115763-2.c: New test.

Signed-off-by: Pan Li 
(cherry picked from commit de9254e224eb3d89303cb9b3ba50b4c479c55f7c)

Diff:
---
 gcc/config/riscv/vector.md | 49 +++---
 .../gcc.target/riscv/rvv/base/pr115763-1.c |  9 
 .../gcc.target/riscv/rvv/base/pr115763-2.c | 10 +
 .../gcc.target/riscv/rvv/base/scalar_move-5.c  |  4 +-
 .../gcc.target/riscv/rvv/base/scalar_move-6.c  |  6 +--
 .../gcc.target/riscv/rvv/base/scalar_move-7.c  |  6 +--
 .../gcc.target/riscv/rvv/base/scalar_move-8.c  |  6 +--
 7 files changed, 64 insertions(+), 26 deletions(-)

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index fe18ee5b5f7..d9474262d54 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -2080,31 +2080,50 @@
   [(set_attr "type" "vimov,vimov,vlds,vlds,vlds,vlds,vimovxv,vimovxv")
(set_attr "mode" "")])
 
-(define_insn "*pred_broadcast"
-  [(set (match_operand:V_VLSF_ZVFHMIN 0 "register_operand" "=vr, vr, 
vr, vr, vr, vr, vr, vr")
-   (if_then_else:V_VLSF_ZVFHMIN
+(define_insn "*pred_broadcast_zvfh"
+  [(set (match_operand:V_VLSF0 "register_operand"  "=vr,  vr,  
vr,  vr")
+   (if_then_else:V_VLSF
  (unspec:
-   [(match_operand: 1 "vector_broadcast_mask_operand" "Wc1,Wc1, 
vm, vm,Wc1,Wc1,Wb1,Wb1")
-(match_operand 4 "vector_length_operand"  " rK, rK, 
rK, rK, rK, rK, rK, rK")
-(match_operand 5 "const_int_operand"  "  i,  i,  
i,  i,  i,  i,  i,  i")
-(match_operand 6 "const_int_operand"  "  i,  i,  
i,  i,  i,  i,  i,  i")
-(match_operand 7 "const_int_operand"  "  i,  i,  
i,  i,  i,  i,  i,  i")
+   [(match_operand: 1 "vector_broadcast_mask_operand" "Wc1, Wc1, 
Wb1, Wb1")
+(match_operand  4 "vector_length_operand" " rK,  rK,  
rK,  rK")
+(match_operand  5 "const_int_operand" "  i,   i,   
i,   i")
+(match_operand  6 "const_int_operand" "  i,   i,   
i,   i")
+(match_operand  7 "const_int_operand" "  i,   i,   
i,   i")
 (reg:SI VL_REGNUM)
 (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
- (vec_duplicate:V_VLSF_ZVFHMIN
-   (match_operand: 3 "direct_broadcast_operand"   " f,  
f,Wdm,Wdm,Wdm,Wdm,  f,  f"))
- (match_operand:V_VLSF_ZVFHMIN 2 "vector_merge_operand""vu,  0, 
vu,  0, vu,  0, vu,  0")))]
+ (vec_duplicate:V_VLSF
+   (match_operand: 3 "direct_broadcast_operand"  "  f,   f,   
f,   f"))
+ (match_operand:V_VLSF  2 "vector_merge_operand"  " vu,   0,  
vu,   0")))]
   "TARGET_VECTOR"
   "@
vfmv.v.f\t%0,%3
vfmv.v.f\t%0,%3
+   vfmv.s.f\t%0,%3
+   vfmv.s.f\t%0,%3"
+  [(set_attr "type" "vfmov,vfmov,vfmovfv,vfmovfv")
+   (set_attr "mode" "")])
+
+(define_insn "*pred_broadcast_zvfhmin"
+  [(set (match_operand:V_VLSF_ZVFHMIN   0 "register_operand"  
"=vr,  vr,  vr,  vr")
+   

[gcc r15-1874] [to-be-committed][v3][RISC-V] Handle bit manipulation of SImode values

2024-07-06 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:273f16a125c4fab664683376ae04a9a31e7d6a22

commit r15-1874-g273f16a125c4fab664683376ae04a9a31e7d6a22
Author: Jeff Law 
Date:   Sat Jul 6 12:57:59 2024 -0600

[to-be-committed][v3][RISC-V] Handle bit manipulation of SImode values

Last patch in this round of bitmanip work...  At least I think I'm going to
pause here and switch gears to other projects that need attention 

This patch introduces the ability to generate bitmanip instructions for rv64
when operating on SI objects when we know something about the range of the 
bit
position (due to masking of the position).

I've got note that the (7-pos % 8) bit position form was discovered by RAU 
in
500.perl.  I took that and expanded it to the simple (pos & mask) form as 
well
as covering bset, binv and bclr.

As far as the implementation is concerned

This turns the recently added define_splits into define_insn_and_split
constructs.  This allows combine to "see" enough RTL to realize a sign
extension is unnecessary.  Otherwise we get undesirable sign extensions for 
the
new testcases.

Second it adds new patterns for the logical operations.  Two patterns for
IOR/XOR and two patterns for AND.

I think a key concept to keep in mind is that once we determine a Zbs 
operation
is safe to perform on a SI value, we can rewrite the RTL in 64bit form.  If 
we
were ever to try and use range information at expand time for this stuff 
(and
we probably should investigate that), that's the path I'd suggest.

This is notably cleaner than my original implementation which actually kept 
the
more complex RTL form through final and emitted 2/3 instructions (mask the 
bit
position, then the bset/bclr/binv).

Tested in my tester, but waiting for pre-commit CI to report back before 
taking
further action.

gcc/
* config/riscv/bitmanip.md (bset splitters): Turn into 
define_and_splits.
Don't depend on combine splitting the "andn with constant" form.
(bset, binv, bclr with masked bit position): New patterns.

gcc/testsuite
* gcc.target/riscv/binv-for-simode-1.c: New test.
* gcc.target/riscv/bset-for-simode-1.c: New test.
* gcc.target/riscv/bclr-for-simode-1.c: New test.

Diff:
---
 gcc/config/riscv/bitmanip.md   | 135 ++---
 gcc/testsuite/gcc.target/riscv/bclr-for-simode-1.c |  25 
 gcc/testsuite/gcc.target/riscv/binv-for-simode-1.c |  24 
 gcc/testsuite/gcc.target/riscv/bset-for-simode-1.c |  24 
 4 files changed, 192 insertions(+), 16 deletions(-)

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 3eedabffca0..f403ba8dbba 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -615,37 +615,140 @@
 ;; shift constant.  With the limited range we know the SImode sign
 ;; bit is never set, thus we can treat this as zero extending and
 ;; generate the bsetdi_2 pattern.
-(define_split
-  [(set (match_operand:DI 0 "register_operand")
+(define_insn_and_split ""
+  [(set (match_operand:DI 0 "register_operand" "=r")
(any_extend:DI
 (ashift:SI (const_int 1)
(subreg:QI
- (and:DI (not:DI (match_operand:DI 1 "register_operand"))
+ (and:DI (not:DI (match_operand:DI 1 "register_operand" 
"r"))
  (match_operand 2 "const_int_operand")) 0
-   (clobber (match_operand:DI 3 "register_operand"))]
+   (clobber (match_scratch:X 3 "="))]
   "TARGET_64BIT
&& TARGET_ZBS
&& (TARGET_ZBB || TARGET_ZBKB)
&& (INTVAL (operands[2]) & 0x1f) != 0x1f"
-   [(set (match_dup 0) (and:DI (not:DI (match_dup 1)) (match_dup 2)))
-(set (match_dup 0) (zero_extend:DI (ashift:SI
-  (const_int 1)
-  (subreg:QI (match_dup 0) 0])
+  "#"
+  "&& reload_completed"
+   [(set (match_dup 3) (match_dup 2))
+(set (match_dup 3) (and:DI (not:DI (match_dup 1)) (match_dup 3)))
+(set (match_dup 0) (zero_extend:DI
+(ashift:SI (const_int 1) (match_dup 4]
+  { operands[4] = gen_lowpart (QImode, operands[3]); }
+  [(set_attr "type" "bitmanip")])
 
-(define_split
-  [(set (match_operand:DI 0 "register_operand")
-   (any_extend:DI
+(define_insn_and_split ""
+  [(set (match_operand:DI 0 "register_operand" "=r")
+(any_extend:DI
 (ashift:SI (const_int 1)
(subreg:QI
- (and:DI (match_operand:DI 1 "register_operand")
+ (and:DI (match_operand:DI 1 "register_operand" "r")
  (match_operand 2 "const_int_operand")) 0]
   "TARGET_64BIT
&& TARGET_ZBS
&& (INTVAL (operands[2]) & 0x1f) != 0x1f"
-   [(set (match_dup 0) (and:DI (match_dup 1) (match_dup 2)))
-(set 

[gcc r15-1872] [committed] Fix various sh define_insn_and_split predicates

2024-07-06 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:cb9badea8be5396afe90f4c497e9f333cce1cb3f

commit r15-1872-gcb9badea8be5396afe90f4c497e9f333cce1cb3f
Author: Jeff Law 
Date:   Sat Jul 6 06:35:54 2024 -0600

[committed] Fix various sh define_insn_and_split predicates

The sh4-linux-gnu port has failed to bootstrap since the introduction of 
late
combine due to failures to split certain insns.

This is caused by incorrect predicates in various define_insn_and_split
patterns.  Essentially the insn's predicate is something like "TARGET_SH1".
The split predicate is "&& can_create_pseudos_p ()".  So these patterns will
match post-reload, but be un-splittable.  So at assembly output time, we get
the failure as the output template is "#".

This patch fixes the most obvious & egregious cases by bringing the split
condition into the insn's predicate and leaving "&& 1" as the split 
condition.
That's enough to get sh4-linux-gnu bootstrapping again and I'm hoping it 
does
the same for sh4eb-linux-gnu.

Pushing to the trunk.

gcc/
* config/sh/sh.md (adddi3): Only allow matching when we can
still create new pseudos.
(subdi3, *rotcl, *rotcr, *rotcr_neg_t, negdi2): Likewise.
(abs2, negabs2, negdi_cond): Likewise.
(*swapbisi2_and_shl8, *swapbhisi2, *movsi_index_disp_load): 
Likewise.
(*movhi_index_disp_load, *movindex_disp_store): Likewise.
(*mov_t_msb_neg, *negt_msb, clipu_one): Likewise.

Diff:
---
 gcc/config/sh/sh.md | 100 ++--
 1 file changed, 50 insertions(+), 50 deletions(-)

diff --git a/gcc/config/sh/sh.md b/gcc/config/sh/sh.md
index 9491b49e55b..3e978254ab0 100644
--- a/gcc/config/sh/sh.md
+++ b/gcc/config/sh/sh.md
@@ -1542,9 +1542,9 @@
(plus:DI (match_operand:DI 1 "arith_reg_operand")
 (match_operand:DI 2 "arith_reg_operand")))
(clobber (reg:SI T_REG))]
-  "TARGET_SH1"
+  "TARGET_SH1 && can_create_pseudo_p ()"
   "#"
-  "&& can_create_pseudo_p ()"
+  "&& 1"
   [(const_int 0)]
 {
   emit_insn (gen_clrt ());
@@ -1934,9 +1934,9 @@
(minus:DI (match_operand:DI 1 "arith_reg_operand")
  (match_operand:DI 2 "arith_reg_operand")))
(clobber (reg:SI T_REG))]
-  "TARGET_SH1"
+  "TARGET_SH1 && can_create_pseudo_p ()"
   "#"
-  "&& can_create_pseudo_p ()"
+  "&& 1"
   [(const_int 0)]
 {
   emit_insn (gen_clrt ());
@@ -3174,9 +3174,9 @@
(and:SI (match_operand:SI 3 "arith_reg_or_t_reg_operand")
(const_int 1
(clobber (reg:SI T_REG))]
-  "TARGET_SH1"
+  "TARGET_SH1 && can_create_pseudo_p ()"
   "#"
-  "&& can_create_pseudo_p ()"
+  "&& 1"
   [(const_int 0)]
 {
   gcc_assert (INTVAL (operands[2]) > 0);
@@ -3259,9 +3259,9 @@
   (match_operand:SI 2 "const_int_operand"))
(match_operand 3 "treg_set_expr")))
(clobber (reg:SI T_REG))]
-  "TARGET_SH1"
+  "TARGET_SH1 && can_create_pseudo_p ()"
   "#"
-  "&& can_create_pseudo_p ()"
+  "&& 1"
   [(parallel [(set (match_dup 0)
   (ior:SI (ashift:SI (match_dup 1) (match_dup 2))
   (and:SI (match_dup 3) (const_int 1
@@ -3278,9 +3278,9 @@
(ashift:SI (match_operand:SI 2 "arith_reg_operand")
   (match_operand:SI 3 "const_int_operand"
(clobber (reg:SI T_REG))]
-  "TARGET_SH1"
+  "TARGET_SH1 && can_create_pseudo_p ()"
   "#"
-  "&& can_create_pseudo_p ()"
+  "&& 1"
   [(parallel [(set (match_dup 0)
   (ior:SI (ashift:SI (match_dup 2) (match_dup 3))
   (and:SI (match_dup 1) (const_int 1
@@ -3293,9 +3293,9 @@
(lshiftrt:SI (match_operand:SI 3 "arith_reg_operand")
 (const_int 31
(clobber (reg:SI T_REG))]
-  "TARGET_SH1"
+  "TARGET_SH1 && can_create_pseudo_p ()"
   "#"
-  "&& can_create_pseudo_p ()"
+  "&& 1"
   [(parallel [(set (match_dup 0)
   (ior:SI (ashift:SI (match_dup 1) (match_dup 2))
   (and:SI (reg:SI T_REG) (const_int 1
@@ -3312,9 +3312,9 @@
(ashift:SI (match_operand:SI 1 "arith_reg_operand")
   (match_operand:SI 2 "const_int_operand"
(clobber (reg:SI T_REG))]
-  "TARGET_SH1"
+  "TARGET_SH1 && can_create_pseudo_p ()"
   "#"
-  "&& can_create_pseudo_p ()"
+  "&& 1"
   [(parallel [(set (match_dup 0)
   (ior:SI (ashift:SI (match_dup 1) (match_dup 2))
   (and:SI (reg:SI T_REG) (const_int 1
@@ -3332,9 +3332,9 @@
 (const_int 1)
 (match_operand 4 "const_int_operand"
(clobber (reg:SI T_REG))]
-  "TARGET_SH1"
+  "TARGET_SH1 && can_create_pseudo_p ()"
   "#"
-  "&& can_create_pseudo_p ()"
+  "&& 1"
   [(parallel [(set (match_dup 0)
   (ior:SI (ashift:SI (match_dup 1) 

[gcc r15-1844] [committed][RISC-V] Fix test expectations after recent late-combine changes

2024-07-04 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:b611f3969249967d7f098c6adfcf5f701192a2d0

commit r15-1844-gb611f3969249967d7f098c6adfcf5f701192a2d0
Author: Jeff Law 
Date:   Thu Jul 4 09:25:20 2024 -0600

[committed][RISC-V] Fix test expectations after recent late-combine changes

With the recent DCE related adjustment to late-combine the 
rvv/base/vcreate.c
test no longer has those undesirable vmvNr statements.

It's a bit unclear why this wasn't written as a scan-assembler-not and 
xfailed
given the comment says we don't want to see vmvNr insructions.  I must have
missed that during review.

This patch adjusts the test to expect no vmvNr statements and if they're 
ever
re-introduced, we'll get a nice unexpected failure.

gcc/testsuite
* gcc.target/riscv/rvv/base/vcreate.c: Update expected output.

Diff:
---
 gcc/testsuite/gcc.target/riscv/rvv/base/vcreate.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/vcreate.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/vcreate.c
index 01006de7c81..1c7c154637e 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/vcreate.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/vcreate.c
@@ -256,6 +256,6 @@ test_vcreate_v_i64m2x4 (vint64m2_t v0, vint64m2_t v1, 
vint64m2_t v2,
 }
 
 // Ideally with O3, should find 0 instances of any vmvnr.v PR113913
-/* { dg-final { scan-assembler-times {vmv1r.v\s+v[0-9]+,\s*v[0-9]+} 72 } } */
-/* { dg-final { scan-assembler-times {vmv2r.v\s+v[0-9]+,\s*v[0-9]+} 36 } } */
-/* { dg-final { scan-assembler-times {vmv4r.v\s+v[0-9]+,\s*v[0-9]+} 16 } } */
+/* { dg-final { scan-assembler-not {vmv1r.v\s+v[0-9]+,\s*v[0-9]+} } } */
+/* { dg-final { scan-assembler-not {vmv2r.v\s+v[0-9]+,\s*v[0-9]+} } } */
+/* { dg-final { scan-assembler-not {vmv4r.v\s+v[0-9]+,\s*v[0-9]+} } } */


[gcc r15-1834] [committed] Fix newlib build failure with rx as well as several dozen testsuite failures

2024-07-03 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:759f4abe1220a8202b8389f9b756c35b6c9c439d

commit r15-1834-g759f4abe1220a8202b8389f9b756c35b6c9c439d
Author: Jeff Law 
Date:   Wed Jul 3 21:11:07 2024 -0600

[committed] Fix newlib build failure with rx as well as several dozen 
testsuite failures

The rx port has been failing to build newlib for a bit over a week.  I can't
remember if it was the late-combine work or the IRA costing twiddle, 
regardless
the real bug is in the rx backend.

Basically dwarf2cfi is blowing up because of inconsistent state caused by 
the
failure to mark a stack adjustment as frame related.  This instance in the
epilogue looks like a simple goof.

With the port building again, the testsuite would run and it showed a 
number of
regressions, again related to CFI handling.  The common thread was a 
failure to
mark a copy from FP to SP in the prologue as frame related.  The change 
which
introduced this bug as supposed to just be changing promotions of vector 
types.
It's unclear if Nick included the hunk accidentally or just goof'd on the
logic.  Regardless it looks quite incorrect.

Reverting that hunk fixes the regressions *and* fixes 94 pre-existing 
failures.

The net is rx-elf is regression free and has moved forward in terms of its
testsuite status.

Pushing to the trunk momentarily.

gcc/

* config/rx/rx.cc (rx_expand_prologue): Mark the copy from FP to SP
as frame related.
(rx_expand_epilogue): Mark the stack pointer adjustment as frame
related.

Diff:
---
 gcc/config/rx/rx.cc | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/gcc/config/rx/rx.cc b/gcc/config/rx/rx.cc
index 8048cc98708..c84e1398aad 100644
--- a/gcc/config/rx/rx.cc
+++ b/gcc/config/rx/rx.cc
@@ -1845,8 +1845,7 @@ rx_expand_prologue (void)
gen_safe_add (stack_pointer_rtx, stack_pointer_rtx,
  GEN_INT (- (HOST_WIDE_INT) frame_size), true);
   else
-   gen_safe_add (stack_pointer_rtx, frame_pointer_rtx, NULL_RTX,
- false /* False because the epilogue will use the FP not 
the SP.  */);
+   gen_safe_add (stack_pointer_rtx, frame_pointer_rtx, NULL_RTX, true);
 }
 }
 
@@ -2119,7 +2118,7 @@ rx_expand_epilogue (bool is_sibcall)
   /* Cannot use the special instructions - deconstruct by hand.  */
   if (total_size)
gen_safe_add (stack_pointer_rtx, stack_pointer_rtx,
- GEN_INT (total_size), false);
+ GEN_INT (total_size), true);
 
   if (MUST_SAVE_ACC_REGISTER)
{


[gcc r15-1828] [committed] Fix previously latent bug in reorg affecting cris port

2024-07-03 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:e5f73853ae78d4e9ae434c707a12da1494459b24

commit r15-1828-ge5f73853ae78d4e9ae434c707a12da1494459b24
Author: Jeff Law 
Date:   Wed Jul 3 12:47:31 2024 -0600

[committed] Fix previously latent bug in reorg affecting cris port

The late-combine patch has triggered a previously latent bug in reorg.

Basically we have a sequence like this in the middle of reorg before we 
start
relaxing delay slots (cris-elf, gcc.dg/torture/pr98289.c)

> (insn 67 49 18 (sequence [
> (jump_insn 50 49 52 (set (pc)
> (if_then_else (ne (reg:CC 19 ccr)
> (const_int 0 [0]))
> (label_ref:SI 30)
> (pc))) "j.c":10:6 discrim 1 282 {*bnecc}
>  (expr_list:REG_DEAD (reg:CC 19 ccr)
> (int_list:REG_BR_PROB 7 (nil)))
>  -> 30)
> (insn/f 52 50 18 (set (mem:SI (reg/f:SI 14 sp) [1  S4 A8])
> (reg:SI 16 srp)) 37 {*mov_tomemsi}
>  (nil))
> ]) "j.c":10:6 discrim 1 -1
>  (nil))
>
> (note 18 67 54 [bb 3] NOTE_INSN_BASIC_BLOCK)
>
> (note 54 18 55 NOTE_INSN_EPILOGUE_BEG)
>
> (jump_insn 55 54 56 (return) "j.c":14:1 228 {*return_expanded}
>  (nil)
>  -> return)
>
> (barrier 56 55 43)
>
> (note 43 56 65 [bb 4] NOTE_INSN_BASIC_BLOCK)
>
> (note 65 43 30 NOTE_INSN_SWITCH_TEXT_SECTIONS)
>
> (code_label 30 65 8 5 6 (nil) [1 uses])
>
> (note 8 30 61 [bb 5] NOTE_INSN_BASIC_BLOCK)

So at a high level the things to note are that insn 50 conditionally jumps
around insn 55.  Second there's a SWITCH_TEXT_SECTIONS note between insn 50 
and
the target label for insn 50 (code_label 30).

reorg sees the conditional jump around the unconditional jump/return and 
will
invert the jump and retarget the original jump to an appropriate location.  
In
this case generating:

> (insn 67 49 18 (sequence [
> (jump_insn 50 49 52 (set (pc)
> (if_then_else (eq (reg:CC 19 ccr)
> (const_int 0 [0]))
> (label_ref:SI 68)
> (pc))) "j.c":10:6 discrim 1 281 {*beqcc}
>  (expr_list:REG_DEAD (reg:CC 19 ccr)
> (int_list:REG_BR_PROB 1073741831 (nil)))
>  -> 68)
> (insn/s/f 52 50 18 (set (mem:SI (reg/f:SI 14 sp) [1  S4 A8])
> (reg:SI 16 srp)) 37 {*mov_tomemsi}
>  (nil))
> ]) "j.c":10:6 discrim 1 -1
>  (nil))
>
> (note 18 67 54 [bb 3] NOTE_INSN_BASIC_BLOCK)
>
> (note 54 18 43 NOTE_INSN_EPILOGUE_BEG)
>
> (note 43 54 65 [bb 4] NOTE_INSN_BASIC_BLOCK)
>
> (note 65 43 8 NOTE_INSN_SWITCH_TEXT_SECTIONS)
>
> (note 8 65 61 [bb 5] NOTE_INSN_BASIC_BLOCK)
[ ... ]
Where the new target of the jump is a return statement later in the IL.

Note that we now have a SWITCH_TEXT_SECTIONS note that is not immediately
preceded by a BARRIER.  That triggers an assertion in the dwarf2 code.  
Removal
of the BARRIER is inherent in this optimization.

The fix is simple, we avoid this optimization when there's a
SWITCH_TEXT_SECTIONS note between the conditional jump insn and its target.
Thankfully we already have a routine to test for this in reorg, so we just 
need
to call it appropriately.  The other approach would be to drop the note 
which I
considered and discarded.

We don't have great coverage for delay slot targets.  I've tested arc, cris,
fr30, frv, h8, iq2000, microblaze, or1k, sh3  visium in my tester as crosses
without new regressions, fixing one regression along the way.   Bootstrap &
regression testing on sh4 and hppa will take considerably longer.

gcc/

* reorg.cc (relax_delay_slots): Do not optimize a conditional
jump around an unconditional jump/return in the presence of
a text section switch.

Diff:
---
 gcc/reorg.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/reorg.cc b/gcc/reorg.cc
index 99228a22c69..633099ca765 100644
--- a/gcc/reorg.cc
+++ b/gcc/reorg.cc
@@ -3409,7 +3409,8 @@ relax_delay_slots (rtx_insn *first)
  && next && simplejump_or_return_p (next)
  && (next_active_insn (as_a (target_label))
  == next_active_insn (next))
- && no_labels_between_p (insn, next))
+ && no_labels_between_p (insn, next)
+ && !switch_text_sections_between_p (insn, next_active_insn (next)))
{
  rtx label = JUMP_LABEL (next);
  rtx old_label = JUMP_LABEL (delay_jump_insn);


[gcc r15-1823] [PATCH] ARC: Update gcc.target/arc/pr9001184797.c test

2024-07-03 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:c41eb4c702ed04993a475d5910c190af1ff66720

commit r15-1823-gc41eb4c702ed04993a475d5910c190af1ff66720
Author: Luis Silva 
Date:   Wed Jul 3 09:41:05 2024 -0600

[PATCH] ARC: Update gcc.target/arc/pr9001184797.c test

... to comply with new standards due to stricter analysis in
the latest GCC versions.

gcc/testsuite/ChangeLog:

* gcc.target/arc/pr9001184797.c: Fix compiler warnings.

Diff:
---
 gcc/testsuite/gcc.target/arc/pr9001184797.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/arc/pr9001184797.c 
b/gcc/testsuite/gcc.target/arc/pr9001184797.c
index e76c6769042..6c5de5fe729 100644
--- a/gcc/testsuite/gcc.target/arc/pr9001184797.c
+++ b/gcc/testsuite/gcc.target/arc/pr9001184797.c
@@ -4,13 +4,15 @@
 
 /* This test studies the use of anchors and tls symbols. */
 
+extern int h();
+
 struct a b;
 struct a {
   long c;
   long d
 } e() {
   static __thread struct a f;
-  static __thread g;
+  static __thread int g;
   g = 5;
   h();
   if (f.c)


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Add testcases for unsigned scalar .SAT_ADD IMM form 4

2024-07-02 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:f97a257c3f83ea9490139800793b2352f74eb275

commit f97a257c3f83ea9490139800793b2352f74eb275
Author: Pan Li 
Date:   Sun Jun 30 16:48:19 2024 +0800

RISC-V: Add testcases for unsigned scalar .SAT_ADD IMM form 4

This patch would like to add test cases for the unsigned scalar
.SAT_ADD IMM form 4.  Aka:

Form 4:
  #define DEF_SAT_U_ADD_IMM_FMT_4(T)\
  T __attribute__((noinline))   \
  sat_u_add_imm_##T##_fmt_4 (T x)   \
  { \
T ret;  \
return __builtin_add_overflow (x, 9, ) == 0 ? ret : -1; \
  }

DEF_SAT_U_ADD_IMM_FMT_4(uint64_t)

The below test is passed for this patch.
* The rv64gcv regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add helper test macro.
* gcc.target/riscv/sat_u_add_imm-13.c: New test.
* gcc.target/riscv/sat_u_add_imm-14.c: New test.
* gcc.target/riscv/sat_u_add_imm-15.c: New test.
* gcc.target/riscv/sat_u_add_imm-16.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-13.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-14.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-15.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-16.c: New test.

Signed-off-by: Pan Li 
(cherry picked from commit 7a65ab6b5f38d3018ffd456f278a9fd885487a27)

Diff:
---
 gcc/testsuite/gcc.target/riscv/sat_arith.h | 11 ++
 gcc/testsuite/gcc.target/riscv/sat_u_add_imm-13.c  | 19 +
 gcc/testsuite/gcc.target/riscv/sat_u_add_imm-14.c  | 21 ++
 gcc/testsuite/gcc.target/riscv/sat_u_add_imm-15.c  | 18 +
 gcc/testsuite/gcc.target/riscv/sat_u_add_imm-16.c  | 17 
 .../gcc.target/riscv/sat_u_add_imm-run-13.c| 46 ++
 .../gcc.target/riscv/sat_u_add_imm-run-14.c| 46 ++
 .../gcc.target/riscv/sat_u_add_imm-run-15.c| 46 ++
 .../gcc.target/riscv/sat_u_add_imm-run-16.c| 46 ++
 9 files changed, 270 insertions(+)

diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index 83b294db476..75442c94dc1 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -82,6 +82,14 @@ sat_u_add_imm##IMM##_##T##_fmt_3 (T x) \
   return __builtin_add_overflow (x, IMM, ) ? -1 : ret; \
 }
 
+#define DEF_SAT_U_ADD_IMM_FMT_4(T, IMM) \
+T __attribute__((noinline)) \
+sat_u_add_imm##IMM##_##T##_fmt_4 (T x)  \
+{   \
+  T ret;\
+  return __builtin_add_overflow (x, IMM, ) == 0 ? ret : -1; \
+}
+
 #define RUN_SAT_U_ADD_IMM_FMT_1(T, x, IMM, expect) \
   if (sat_u_add_imm##IMM##_##T##_fmt_1(x) != expect) __builtin_abort ()
 
@@ -91,6 +99,9 @@ sat_u_add_imm##IMM##_##T##_fmt_3 (T x) \
 #define RUN_SAT_U_ADD_IMM_FMT_3(T, x, IMM, expect) \
   if (sat_u_add_imm##IMM##_##T##_fmt_3(x) != expect) __builtin_abort ()
 
+#define RUN_SAT_U_ADD_IMM_FMT_4(T, x, IMM, expect) \
+  if (sat_u_add_imm##IMM##_##T##_fmt_4(x) != expect) __builtin_abort ()
+
 
/**/
 /* Saturation Sub (Unsigned and Signed)   
*/
 
/**/
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add_imm-13.c 
b/gcc/testsuite/gcc.target/riscv/sat_u_add_imm-13.c
new file mode 100644
index 000..a3b2679233c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add_imm-13.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details 
-fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_add_imm9_uint8_t_fmt_4:
+** addi\s+[atx][0-9]+,\s*a0,\s*9
+** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*0xff
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** andi\s+a0,\s*a0,\s*0xff
+** ret
+*/
+DEF_SAT_U_ADD_IMM_FMT_4(uint8_t, 9)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add_imm-14.c 
b/gcc/testsuite/gcc.target/riscv/sat_u_add_imm-14.c
new file mode 100644
index 000..968534b74da
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add_imm-14.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { 

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Add testcases for unsigned scalar .SAT_ADD IMM form 3

2024-07-02 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:1ee1caef0b826e95dc96e4a751d86981bdcd8ee2

commit 1ee1caef0b826e95dc96e4a751d86981bdcd8ee2
Author: Pan Li 
Date:   Sun Jun 30 16:41:16 2024 +0800

RISC-V: Add testcases for unsigned scalar .SAT_ADD IMM form 3

This patch would like to add test cases for the unsigned scalar
.SAT_ADD IMM form 3.  Aka:

Form 3:
  #define DEF_SAT_U_ADD_IMM_FMT_3(T)   \
  T __attribute__((noinline))  \
  sat_u_add_imm_##T##_fmt_3 (T x)  \
  {\
T ret; \
return __builtin_add_overflow (x, 8, ) ? -1 : ret; \
  }

DEF_SAT_U_ADD_IMM_FMT_3(uint64_t)

The below test is passed for this patch.
* The rv64gcv regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add helper test macro.
* gcc.target/riscv/sat_u_add_imm-10.c: New test.
* gcc.target/riscv/sat_u_add_imm-11.c: New test.
* gcc.target/riscv/sat_u_add_imm-12.c: New test.
* gcc.target/riscv/sat_u_add_imm-9.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-10.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-11.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-12.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-9.c: New test.

Signed-off-by: Pan Li 
(cherry picked from commit 6d98e88f61f9b2e6864775ce390e9ce0a1359624)

Diff:
---
 gcc/testsuite/gcc.target/riscv/sat_arith.h | 11 ++
 gcc/testsuite/gcc.target/riscv/sat_u_add_imm-10.c  | 21 ++
 gcc/testsuite/gcc.target/riscv/sat_u_add_imm-11.c  | 18 +
 gcc/testsuite/gcc.target/riscv/sat_u_add_imm-12.c  | 17 
 gcc/testsuite/gcc.target/riscv/sat_u_add_imm-9.c   | 19 +
 .../gcc.target/riscv/sat_u_add_imm-run-10.c| 46 ++
 .../gcc.target/riscv/sat_u_add_imm-run-11.c| 46 ++
 .../gcc.target/riscv/sat_u_add_imm-run-12.c| 46 ++
 .../gcc.target/riscv/sat_u_add_imm-run-9.c | 46 ++
 9 files changed, 270 insertions(+)

diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index d94f0fd602c..83b294db476 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -74,12 +74,23 @@ sat_u_add_imm##IMM##_##T##_fmt_2 (T x)  \
   return (T)(x + IMM) < x ? -1 : (x + IMM); \
 }
 
+#define DEF_SAT_U_ADD_IMM_FMT_3(T, IMM)\
+T __attribute__((noinline))\
+sat_u_add_imm##IMM##_##T##_fmt_3 (T x) \
+{  \
+  T ret;   \
+  return __builtin_add_overflow (x, IMM, ) ? -1 : ret; \
+}
+
 #define RUN_SAT_U_ADD_IMM_FMT_1(T, x, IMM, expect) \
   if (sat_u_add_imm##IMM##_##T##_fmt_1(x) != expect) __builtin_abort ()
 
 #define RUN_SAT_U_ADD_IMM_FMT_2(T, x, IMM, expect) \
   if (sat_u_add_imm##IMM##_##T##_fmt_2(x) != expect) __builtin_abort ()
 
+#define RUN_SAT_U_ADD_IMM_FMT_3(T, x, IMM, expect) \
+  if (sat_u_add_imm##IMM##_##T##_fmt_3(x) != expect) __builtin_abort ()
+
 
/**/
 /* Saturation Sub (Unsigned and Signed)   
*/
 
/**/
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add_imm-10.c 
b/gcc/testsuite/gcc.target/riscv/sat_u_add_imm-10.c
new file mode 100644
index 000..24cdd267cca
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add_imm-10.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details 
-fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_add_imm3_uint16_t_fmt_3:
+** addi\s+[atx][0-9]+,\s*a0,\s*3
+** slli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
+** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** slli\s+a0,\s*a0,\s*48
+** srli\s+a0,\s*a0,\s*48
+** ret
+*/
+DEF_SAT_U_ADD_IMM_FMT_3(uint16_t, 3)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add_imm-11.c 
b/gcc/testsuite/gcc.target/riscv/sat_u_add_imm-11.c
new file mode 100644
index 000..f30e2405a0d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add_imm-11.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details 
-fno-schedule-insns 

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Add testcases for unsigned scalar .SAT_ADD IMM form 2

2024-07-02 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:0d154f4ef5020b85879e1013fa49ac6d1a7af62e

commit 0d154f4ef5020b85879e1013fa49ac6d1a7af62e
Author: Pan Li 
Date:   Sun Jun 30 16:14:38 2024 +0800

RISC-V: Add testcases for unsigned scalar .SAT_ADD IMM form 2

This patch would like to add test cases for the unsigned scalar
.SAT_ADD IMM form 2.  Aka:

Form 2:
  #define DEF_SAT_U_ADD_IMM_FMT_2(T)  \
  T __attribute__((noinline)) \
  sat_u_add_imm_##T##_fmt_1 (T x) \
  {   \
return (T)(x + 9) < x ? -1 : (x + 9); \
  }

DEF_SAT_U_ADD_IMM_FMT_2(uint64_t)

The below test is passed for this patch.
* The rv64gcv regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add helper test macro.
* gcc.target/riscv/sat_u_add_imm-5.c: New test.
* gcc.target/riscv/sat_u_add_imm-6.c: New test.
* gcc.target/riscv/sat_u_add_imm-7.c: New test.
* gcc.target/riscv/sat_u_add_imm-8.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-5.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-6.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-7.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-8.c: New test.

Signed-off-by: Pan Li 
(cherry picked from commit bff0d025aff8efaa5d991fcd13dd9876b115dc94)

Diff:
---
 gcc/testsuite/gcc.target/riscv/sat_arith.h | 10 +
 gcc/testsuite/gcc.target/riscv/sat_u_add_imm-5.c   | 19 +
 gcc/testsuite/gcc.target/riscv/sat_u_add_imm-6.c   | 21 ++
 gcc/testsuite/gcc.target/riscv/sat_u_add_imm-7.c   | 18 +
 gcc/testsuite/gcc.target/riscv/sat_u_add_imm-8.c   | 17 
 .../gcc.target/riscv/sat_u_add_imm-run-5.c | 46 ++
 .../gcc.target/riscv/sat_u_add_imm-run-6.c | 46 ++
 .../gcc.target/riscv/sat_u_add_imm-run-7.c | 46 ++
 .../gcc.target/riscv/sat_u_add_imm-run-8.c | 46 ++
 9 files changed, 269 insertions(+)

diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index 4ec4ec36cc1..d94f0fd602c 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -67,9 +67,19 @@ sat_u_add_imm##IMM##_##T##_fmt_1 (T x)   \
   return (T)(x + IMM) >= x ? (x + IMM) : -1; \
 }
 
+#define DEF_SAT_U_ADD_IMM_FMT_2(T, IMM) \
+T __attribute__((noinline)) \
+sat_u_add_imm##IMM##_##T##_fmt_2 (T x)  \
+{   \
+  return (T)(x + IMM) < x ? -1 : (x + IMM); \
+}
+
 #define RUN_SAT_U_ADD_IMM_FMT_1(T, x, IMM, expect) \
   if (sat_u_add_imm##IMM##_##T##_fmt_1(x) != expect) __builtin_abort ()
 
+#define RUN_SAT_U_ADD_IMM_FMT_2(T, x, IMM, expect) \
+  if (sat_u_add_imm##IMM##_##T##_fmt_2(x) != expect) __builtin_abort ()
+
 
/**/
 /* Saturation Sub (Unsigned and Signed)   
*/
 
/**/
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add_imm-5.c 
b/gcc/testsuite/gcc.target/riscv/sat_u_add_imm-5.c
new file mode 100644
index 000..19b502db6c9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add_imm-5.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details 
-fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_add_imm9_uint8_t_fmt_2:
+** addi\s+[atx][0-9]+,\s*a0,\s*9
+** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*0xff
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** andi\s+a0,\s*a0,\s*0xff
+** ret
+*/
+DEF_SAT_U_ADD_IMM_FMT_2(uint8_t, 9)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add_imm-6.c 
b/gcc/testsuite/gcc.target/riscv/sat_u_add_imm-6.c
new file mode 100644
index 000..0317370b67e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add_imm-6.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details 
-fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_add_imm3_uint16_t_fmt_2:
+** addi\s+[atx][0-9]+,\s*a0,\s*3
+** slli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
+** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** slli\s+a0,\s*a0,\s*48
+** srli\s+a0,\s*a0,\s*48
+** ret
+*/

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Add testcases for unsigned scalar .SAT_ADD IMM form 1

2024-07-02 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:170e8b651c138b37c5d86569a5acf909e137142c

commit 170e8b651c138b37c5d86569a5acf909e137142c
Author: Pan Li 
Date:   Sun Jun 30 16:03:41 2024 +0800

RISC-V: Add testcases for unsigned scalar .SAT_ADD IMM form 1

This patch would like to add test cases for the unsigned scalar
.SAT_ADD IMM form 1.  Aka:

Form 1:
  #define DEF_SAT_U_ADD_IMM_FMT_1(T)   \
  T __attribute__((noinline))  \
  sat_u_add_imm_##T##_fmt_1 (T x)  \
  {\
return (T)(x + 9) >= x ? (x + 9) : -1; \
  }

DEF_SAT_U_ADD_IMM_FMT_1(uint64_t)

The below test is passed for this patch.
* The rv64gcv regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Add helper test macro.
* gcc.target/riscv/sat_u_add_imm-1.c: New test.
* gcc.target/riscv/sat_u_add_imm-2.c: New test.
* gcc.target/riscv/sat_u_add_imm-3.c: New test.
* gcc.target/riscv/sat_u_add_imm-4.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-1.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-2.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-3.c: New test.
* gcc.target/riscv/sat_u_add_imm-run-4.c: New test.

Signed-off-by: Pan Li 
(cherry picked from commit ed213b384fdca9375c3ec53c2a0eae134fb98612)

Diff:
---
 gcc/testsuite/gcc.target/riscv/sat_arith.h | 10 +
 gcc/testsuite/gcc.target/riscv/sat_u_add_imm-1.c   | 19 +
 gcc/testsuite/gcc.target/riscv/sat_u_add_imm-2.c   | 21 ++
 gcc/testsuite/gcc.target/riscv/sat_u_add_imm-3.c   | 18 +
 gcc/testsuite/gcc.target/riscv/sat_u_add_imm-4.c   | 17 
 .../gcc.target/riscv/sat_u_add_imm-run-1.c | 46 ++
 .../gcc.target/riscv/sat_u_add_imm-run-2.c | 46 ++
 .../gcc.target/riscv/sat_u_add_imm-run-3.c | 46 ++
 .../gcc.target/riscv/sat_u_add_imm-run-4.c | 46 ++
 9 files changed, 269 insertions(+)

diff --git a/gcc/testsuite/gcc.target/riscv/sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/sat_arith.h
index 0c2e44af718..4ec4ec36cc1 100644
--- a/gcc/testsuite/gcc.target/riscv/sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/sat_arith.h
@@ -60,6 +60,16 @@ sat_u_add_##T##_fmt_6 (T x, T y)\
 #define RUN_SAT_U_ADD_FMT_5(T, x, y) sat_u_add_##T##_fmt_5(x, y)
 #define RUN_SAT_U_ADD_FMT_6(T, x, y) sat_u_add_##T##_fmt_6(x, y)
 
+#define DEF_SAT_U_ADD_IMM_FMT_1(T, IMM)  \
+T __attribute__((noinline))  \
+sat_u_add_imm##IMM##_##T##_fmt_1 (T x)   \
+{\
+  return (T)(x + IMM) >= x ? (x + IMM) : -1; \
+}
+
+#define RUN_SAT_U_ADD_IMM_FMT_1(T, x, IMM, expect) \
+  if (sat_u_add_imm##IMM##_##T##_fmt_1(x) != expect) __builtin_abort ()
+
 
/**/
 /* Saturation Sub (Unsigned and Signed)   
*/
 
/**/
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add_imm-1.c 
b/gcc/testsuite/gcc.target/riscv/sat_u_add_imm-1.c
new file mode 100644
index 000..14e9b7595a8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add_imm-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details 
-fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_add_imm9_uint8_t_fmt_1:
+** addi\s+[atx][0-9]+,\s*a0,\s*9
+** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*0xff
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** andi\s+a0,\s*a0,\s*0xff
+** ret
+*/
+DEF_SAT_U_ADD_IMM_FMT_1(uint8_t, 9)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_ADD " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_add_imm-2.c 
b/gcc/testsuite/gcc.target/riscv/sat_u_add_imm-2.c
new file mode 100644
index 000..c1a3c6ff21d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_add_imm-2.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details 
-fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_add_imm3_uint16_t_fmt_1:
+** addi\s+[atx][0-9]+,\s*a0,\s*3
+** slli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
+** srli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*48
+** sltu\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** neg\s+[atx][0-9]+,\s*[atx][0-9]+
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** slli\s+a0,\s*a0,\s*48
+** srli\s+a0,\s*a0,\s*48
+** ret
+*/
+DEF_SAT_U_ADD_IMM_FMT_1(uint16_t, 3)
+
+/* { dg-final { 

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] [to-be-committed, RISC-V, V4] movmem for RISCV with V extension

2024-07-02 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:5bcf397a0c5a4a23e89af43fec6bd53c1faea3ee

commit 5bcf397a0c5a4a23e89af43fec6bd53c1faea3ee
Author: Sergei Lewis 
Date:   Sat Jun 29 14:34:31 2024 -0600

[to-be-committed,RISC-V,V4] movmem for RISCV with V extension

I hadn't updated my repo on the host where I handle email, so it picked
up the older version of this patch without the testsuite fix.  So, V4
with the testsuite option for lmul fixed.

--

And Sergei's movmem patch.  Just trivial testsuite adjustment for an
option name change and a whitespace fix from me.

I've spun this in my tester for rv32 and rv64.  I'll wait for pre-commit
CI before taking further action.

Just a reminder, this patch is designed to handle the case where we can
issue a single vector load/store which avoids all the complexities of
determining which direction to copy.

--

gcc/ChangeLog

* config/riscv/riscv.md (movmem): New expander.

gcc/testsuite/ChangeLog

PR target/112109
* gcc.target/riscv/rvv/base/movmem-1.c: New test

(cherry picked from commit 42946aa9b3228262e413481a3193bda85c20ef4b)

Diff:
---
 gcc/config/riscv/riscv.md  | 22 
 gcc/testsuite/gcc.target/riscv/rvv/base/movmem-1.c | 60 ++
 2 files changed, 82 insertions(+)

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index ff37125e3f2..c0c960353eb 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2723,6 +2723,28 @@
 FAIL;
 })
 
+;; Inlining general memmove is a pessimisation: we can't avoid having to decide
+;; which direction to go at runtime, which is costly in instruction count
+;; however for situations where the entire move fits in one vector operation
+;; we can do all reads before doing any writes so we don't have to worry
+;; so generate the inline vector code in such situations
+;; nb. prefer scalar path for tiny memmoves.
+(define_expand "movmem"
+  [(parallel [(set (match_operand:BLK 0 "general_operand")
+   (match_operand:BLK 1 "general_operand"))
+(use (match_operand:P 2 "const_int_operand"))
+(use (match_operand:SI 3 "const_int_operand"))])]
+  "TARGET_VECTOR"
+{
+  if ((INTVAL (operands[2]) >= TARGET_MIN_VLEN / 8)
+   && (INTVAL (operands[2]) <= TARGET_MIN_VLEN)
+   && riscv_vector::expand_block_move (operands[0], operands[1],
+operands[2]))
+DONE;
+  else
+FAIL;
+})
+
 ;; Expand in-line code to clear the instruction cache between operand[0] and
 ;; operand[1].
 (define_expand "clear_cache"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/movmem-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/movmem-1.c
new file mode 100644
index 000..d9d4a70a392
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/movmem-1.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-add-options riscv_v } */
+/* { dg-additional-options "-O3 -mrvv-max-lmul=dynamic" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#define MIN_VECTOR_BYTES (__riscv_v_min_vlen / 8)
+
+/* Tiny memmoves should not be vectorised.
+** f1:
+**  li\s+a2,\d+
+**  tail\s+memmove
+*/
+char *
+f1 (char *a, char const *b)
+{
+  return __builtin_memmove (a, b, MIN_VECTOR_BYTES - 1);
+}
+
+/* Vectorise+inline minimum vector register width with LMUL=1
+** f2:
+**  (
+**  vsetivli\s+zero,16,e8,m1,ta,ma
+**  |
+**  li\s+[ta][0-7],\d+
+**  vsetvli\s+zero,[ta][0-7],e8,m1,ta,ma
+**  )
+**  vle8\.v\s+v\d+,0\(a1\)
+**  vse8\.v\s+v\d+,0\(a0\)
+**  ret
+*/
+char *
+f2 (char *a, char const *b)
+{
+  return __builtin_memmove (a, b, MIN_VECTOR_BYTES);
+}
+
+/* Vectorise+inline up to LMUL=8
+** f3:
+**  li\s+[ta][0-7],\d+
+**  vsetvli\s+zero,[ta][0-7],e8,m8,ta,ma
+**  vle8\.v\s+v\d+,0\(a1\)
+**  vse8\.v\s+v\d+,0\(a0\)
+**  ret
+*/
+char *
+f3 (char *a, char const *b)
+{
+  return __builtin_memmove (a, b, MIN_VECTOR_BYTES * 8);
+}
+
+/* Don't vectorise if the move is too large for one operation
+** f4:
+**  li\s+a2,\d+
+**  tail\s+memmove
+*/
+char *
+f4 (char *a, char const *b)
+{
+  return __builtin_memmove (a, b, MIN_VECTOR_BYTES * 8 + 1);
+}


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Add testcases for vector truncate after .SAT_SUB

2024-07-02 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:86b900f6761bb4265d3d30b900c366d1817e0f7d

commit 86b900f6761bb4265d3d30b900c366d1817e0f7d
Author: Pan Li 
Date:   Mon Jun 24 22:25:57 2024 +0800

RISC-V: Add testcases for vector truncate after .SAT_SUB

This patch would like to add the test cases of the vector truncate after
.SAT_SUB.  Aka:

  #define DEF_VEC_SAT_U_SUB_TRUNC_FMT_1(OUT_T, IN_T)   \
  void __attribute__((noinline))   \
  vec_sat_u_sub_trunc_##OUT_T##_fmt_1 (OUT_T *out, IN_T *op_1, IN_T y, \
 unsigned limit) \
  {\
unsigned i;\
for (i = 0; i < limit; i++)\
  {\
IN_T x = op_1[i];  \
out[i] = (OUT_T)(x >= y ? x - y : 0);  \
  }\
  }

The below 3 cases are included.

DEF_VEC_SAT_U_SUB_TRUNC_FMT_1(uint8_t, uint16_t)
DEF_VEC_SAT_U_SUB_TRUNC_FMT_1(uint16_t, uint32_t)
DEF_VEC_SAT_U_SUB_TRUNC_FMT_1(uint32_t, uint64_t)

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h: Add helper
test macros.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_scalar.h: New 
test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-1.c: New 
test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-2.c: New 
test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-3.c: New 
test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-run-1.c: 
New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-run-2.c: 
New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-run-3.c: 
New test.

Signed-off-by: Pan Li 
(cherry picked from commit b55798c0fc5cb02512b58502961d8425fb60588f)

Diff:
---
 .../riscv/rvv/autovec/binop/vec_sat_arith.h| 19 ++
 .../rvv/autovec/binop/vec_sat_binary_scalar.h  | 27 
 .../rvv/autovec/binop/vec_sat_u_sub_trunc-1.c  | 21 ++
 .../rvv/autovec/binop/vec_sat_u_sub_trunc-2.c  | 21 ++
 .../rvv/autovec/binop/vec_sat_u_sub_trunc-3.c  | 21 ++
 .../rvv/autovec/binop/vec_sat_u_sub_trunc-run-1.c  | 74 ++
 .../rvv/autovec/binop/vec_sat_u_sub_trunc-run-2.c  | 74 ++
 .../rvv/autovec/binop/vec_sat_u_sub_trunc-run-3.c  | 74 ++
 8 files changed, 331 insertions(+)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
index d5c81fbe5a9..a3116033fb3 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h
@@ -310,4 +310,23 @@ vec_sat_u_sub_##T##_fmt_10 (T *out, T *op_1, T *op_2, 
unsigned limit) \
 #define RUN_VEC_SAT_U_SUB_FMT_10(T, out, op_1, op_2, N) \
   vec_sat_u_sub_##T##_fmt_10(out, op_1, op_2, N)
 
+/**/
+/* Saturation Sub Truncated (Unsigned and Signed) 
*/
+/**/
+#define DEF_VEC_SAT_U_SUB_TRUNC_FMT_1(OUT_T, IN_T)   \
+void __attribute__((noinline))   \
+vec_sat_u_sub_trunc_##OUT_T##_fmt_1 (OUT_T *out, IN_T *op_1, IN_T y, \
+unsigned limit) \
+{\
+  unsigned i;\
+  for (i = 0; i < limit; i++)\
+{\
+  IN_T x = op_1[i];  \
+  out[i] = (OUT_T)(x >= y ? x - y : 0);  \
+}\
+}
+
+#define RUN_VEC_SAT_U_SUB_TRUNC_FMT_1(OUT_T, IN_T, out, op_1, y, N) \
+  vec_sat_u_sub_trunc_##OUT_T##_fmt_1(out, op_1, y, N)
+
 #endif
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_scalar.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_scalar.h
new file mode 100644
index 000..c79b180054e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_scalar.h
@@ -0,0 +1,27 @@
+#ifndef HAVE_DEFINED_VEC_SAT_BINARY_SCALAR
+#define HAVE_DEFINED_VEC_SAT_BINARY_SCALAR
+
+int
+main ()

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Update testcase comments to point to PSABI rather than Table A.6

2024-07-02 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:09b41e93787a07e6dc3b6db35dea35d84b7481b1

commit 09b41e93787a07e6dc3b6db35dea35d84b7481b1
Author: Patrick O'Neill 
Date:   Tue Jun 25 14:14:18 2024 -0700

RISC-V: Update testcase comments to point to PSABI rather than Table A.6

Table A.6 was originally the source of truth for the recommended mappings.
Point to the PSABI doc since the memory model mappings have been moved 
there.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/amo/a-rvwmo-fence.c: Replace A.6 reference with 
PSABI.
* gcc.target/riscv/amo/a-rvwmo-load-acquire.c: Ditto.
* gcc.target/riscv/amo/a-rvwmo-load-relaxed.c: Ditto.
* gcc.target/riscv/amo/a-rvwmo-load-seq-cst.c: Ditto.
* gcc.target/riscv/amo/a-rvwmo-store-compat-seq-cst.c: Ditto.
* gcc.target/riscv/amo/a-rvwmo-store-relaxed.c: Ditto.
* gcc.target/riscv/amo/a-rvwmo-store-release.c: Ditto.
* gcc.target/riscv/amo/a-ztso-fence.c: Ditto.
* gcc.target/riscv/amo/a-ztso-load-acquire.c: Ditto.
* gcc.target/riscv/amo/a-ztso-load-relaxed.c: Ditto.
* gcc.target/riscv/amo/a-ztso-load-seq-cst.c: Ditto.
* gcc.target/riscv/amo/a-ztso-store-compat-seq-cst.c: Ditto.
* gcc.target/riscv/amo/a-ztso-store-relaxed.c: Ditto.
* gcc.target/riscv/amo/a-ztso-store-release.c: Ditto.
* gcc.target/riscv/amo/zaamo-rvwmo-amo-add-int.c: Ditto.
* gcc.target/riscv/amo/zaamo-ztso-amo-add-int.c: Ditto.
* 
gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-acquire-release.c: Ditto.
* gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-acquire.c: 
Ditto.
* gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-consume.c: 
Ditto.
* gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-relaxed.c: 
Ditto.
* gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-release.c: 
Ditto.
* 
gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-seq-cst-relaxed.c: Ditto.
* gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-seq-cst.c: 
Ditto.
* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-acq-rel.c: 
Ditto.
* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-acquire.c: 
Ditto.
* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-relaxed.c: 
Ditto.
* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-release.c: 
Ditto.
* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-seq-cst.c: 
Ditto.
* 
gcc.target/riscv/amo/zalrsc-ztso-compare-exchange-int-acquire-release.c: Ditto.
* gcc.target/riscv/amo/zalrsc-ztso-compare-exchange-int-acquire.c: 
Ditto.
* gcc.target/riscv/amo/zalrsc-ztso-compare-exchange-int-consume.c: 
Ditto.
* gcc.target/riscv/amo/zalrsc-ztso-compare-exchange-int-relaxed.c: 
Ditto.
* gcc.target/riscv/amo/zalrsc-ztso-compare-exchange-int-release.c: 
Ditto.
* 
gcc.target/riscv/amo/zalrsc-ztso-compare-exchange-int-seq-cst-relaxed.c: Ditto.
* gcc.target/riscv/amo/zalrsc-ztso-compare-exchange-int-seq-cst.c: 
Ditto.
* gcc.target/riscv/amo/zalrsc-ztso-subword-amo-add-char-acq-rel.c: 
Ditto.
* gcc.target/riscv/amo/zalrsc-ztso-subword-amo-add-char-acquire.c: 
Ditto.
* gcc.target/riscv/amo/zalrsc-ztso-subword-amo-add-char-relaxed.c: 
Ditto.
* gcc.target/riscv/amo/zalrsc-ztso-subword-amo-add-char-release.c: 
Ditto.
* gcc.target/riscv/amo/zalrsc-ztso-subword-amo-add-char-seq-cst.c: 
Ditto.

Signed-off-by: Patrick O'Neill 
(cherry picked from commit 86a3dbeb6c6a36f8cf97c66cef83c9bc3ad82027)

Diff:
---
 gcc/testsuite/gcc.target/riscv/amo/a-rvwmo-fence.c | 2 +-
 gcc/testsuite/gcc.target/riscv/amo/a-rvwmo-load-acquire.c  | 2 +-
 gcc/testsuite/gcc.target/riscv/amo/a-rvwmo-load-relaxed.c  | 2 +-
 gcc/testsuite/gcc.target/riscv/amo/a-rvwmo-load-seq-cst.c  | 2 +-
 gcc/testsuite/gcc.target/riscv/amo/a-rvwmo-store-compat-seq-cst.c  | 3 ++-
 gcc/testsuite/gcc.target/riscv/amo/a-rvwmo-store-relaxed.c | 2 +-
 gcc/testsuite/gcc.target/riscv/amo/a-rvwmo-store-release.c | 2 +-
 gcc/testsuite/gcc.target/riscv/amo/a-ztso-fence.c  | 2 +-
 gcc/testsuite/gcc.target/riscv/amo/a-ztso-load-acquire.c   | 2 +-
 gcc/testsuite/gcc.target/riscv/amo/a-ztso-load-relaxed.c   | 2 +-
 gcc/testsuite/gcc.target/riscv/amo/a-ztso-load-seq-cst.c   | 2 +-
 gcc/testsuite/gcc.target/riscv/amo/a-ztso-store-compat-seq-cst.c   | 3 ++-
 gcc/testsuite/gcc.target/riscv/amo/a-ztso-store-relaxed.c  | 2 +-
 gcc/testsuite/gcc.target/riscv/amo/a-ztso-store-release.c  | 2 +-
 gcc/testsuite/gcc.target/riscv/amo/zaamo-rvwmo-amo-add-int.c   | 2 +-
 

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Consolidate amo testcase variants

2024-07-02 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:001dc95c8c47f6ff0ddac5c962f5049473bdb9ef

commit 001dc95c8c47f6ff0ddac5c962f5049473bdb9ef
Author: Patrick O'Neill 
Date:   Tue Jun 25 14:14:17 2024 -0700

RISC-V: Consolidate amo testcase variants

Many riscv/amo/ testcases use check-function-bodies. These testcases can be
consolidated with related testcases (memory ordering variants) without 
affecting
the assertions.

Give functions descriptive names so testsuite failures are obvious from the
'FAIL:' line.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/amo/amo-table-a-6-amo-add-1.c: Removed.
* gcc.target/riscv/amo/amo-table-a-6-amo-add-2.c: Removed.
* gcc.target/riscv/amo/amo-table-a-6-amo-add-3.c: Removed.
* gcc.target/riscv/amo/amo-table-a-6-amo-add-4.c: Removed.
* gcc.target/riscv/amo/amo-table-a-6-amo-add-5.c: Removed.
* gcc.target/riscv/amo/amo-table-a-6-fence-1.c: Removed.
* gcc.target/riscv/amo/amo-table-a-6-fence-2.c: Removed.
* gcc.target/riscv/amo/amo-table-a-6-fence-3.c: Removed.
* gcc.target/riscv/amo/amo-table-a-6-fence-4.c: Removed.
* gcc.target/riscv/amo/amo-table-a-6-fence-5.c: Removed.
* gcc.target/riscv/amo/amo-table-ztso-amo-add-1.c: Removed.
* gcc.target/riscv/amo/amo-table-ztso-amo-add-2.c: Removed.
* gcc.target/riscv/amo/amo-table-ztso-amo-add-3.c: Removed.
* gcc.target/riscv/amo/amo-table-ztso-amo-add-4.c: Removed.
* gcc.target/riscv/amo/amo-table-ztso-amo-add-5.c: Removed.
* gcc.target/riscv/amo/amo-table-ztso-fence-1.c: Removed.
* gcc.target/riscv/amo/amo-table-ztso-fence-2.c: Removed.
* gcc.target/riscv/amo/amo-table-ztso-fence-3.c: Removed.
* gcc.target/riscv/amo/amo-table-ztso-fence-4.c: Removed.
* gcc.target/riscv/amo/amo-table-ztso-fence-5.c: Removed.
* gcc.target/riscv/amo/amo-zalrsc-amo-add-1.c: Removed.
* gcc.target/riscv/amo/amo-zalrsc-amo-add-2.c: Removed.
* gcc.target/riscv/amo/amo-zalrsc-amo-add-3.c: Removed.
* gcc.target/riscv/amo/amo-zalrsc-amo-add-4.c: Removed.
* gcc.target/riscv/amo/amo-zalrsc-amo-add-5.c: Removed.
* gcc.target/riscv/amo/a-rvwmo-fence.c: New test.
* gcc.target/riscv/amo/a-ztso-fence.c: New test.
* gcc.target/riscv/amo/zaamo-rvwmo-amo-add-int.c: New test.
* gcc.target/riscv/amo/zaamo-ztso-amo-add-int.c: New test.
* gcc.target/riscv/amo/zalrsc-rvwmo-amo-add-int.c: New test.
* gcc.target/riscv/amo/zalrsc-ztso-amo-add-int.c: New test.

Signed-off-by: Patrick O'Neill 
(cherry picked from commit aa89e86f70ac65e2d51f33ac45849d05a4f30524)

Diff:
---
 gcc/testsuite/gcc.target/riscv/amo/a-rvwmo-fence.c | 56 
 gcc/testsuite/gcc.target/riscv/amo/a-ztso-fence.c  | 52 +++
 .../gcc.target/riscv/amo/amo-table-a-6-amo-add-1.c | 17 -
 .../gcc.target/riscv/amo/amo-table-a-6-amo-add-2.c | 17 -
 .../gcc.target/riscv/amo/amo-table-a-6-amo-add-3.c | 17 -
 .../gcc.target/riscv/amo/amo-table-a-6-amo-add-4.c | 17 -
 .../gcc.target/riscv/amo/amo-table-a-6-amo-add-5.c | 17 -
 .../gcc.target/riscv/amo/amo-table-a-6-fence-1.c   | 15 -
 .../gcc.target/riscv/amo/amo-table-a-6-fence-2.c   | 16 -
 .../gcc.target/riscv/amo/amo-table-a-6-fence-3.c   | 16 -
 .../gcc.target/riscv/amo/amo-table-a-6-fence-4.c   | 16 -
 .../gcc.target/riscv/amo/amo-table-a-6-fence-5.c   | 16 -
 .../riscv/amo/amo-table-ztso-amo-add-1.c   | 17 -
 .../riscv/amo/amo-table-ztso-amo-add-2.c   | 17 -
 .../riscv/amo/amo-table-ztso-amo-add-3.c   | 17 -
 .../riscv/amo/amo-table-ztso-amo-add-4.c   | 17 -
 .../riscv/amo/amo-table-ztso-amo-add-5.c   | 17 -
 .../gcc.target/riscv/amo/amo-table-ztso-fence-1.c  | 15 -
 .../gcc.target/riscv/amo/amo-table-ztso-fence-2.c  | 15 -
 .../gcc.target/riscv/amo/amo-table-ztso-fence-3.c  | 15 -
 .../gcc.target/riscv/amo/amo-table-ztso-fence-4.c  | 15 -
 .../gcc.target/riscv/amo/amo-table-ztso-fence-5.c  | 16 -
 .../gcc.target/riscv/amo/amo-zalrsc-amo-add-1.c| 22 --
 .../gcc.target/riscv/amo/amo-zalrsc-amo-add-2.c| 22 --
 .../gcc.target/riscv/amo/amo-zalrsc-amo-add-3.c| 22 --
 .../gcc.target/riscv/amo/amo-zalrsc-amo-add-4.c| 22 --
 .../gcc.target/riscv/amo/amo-zalrsc-amo-add-5.c| 22 --
 .../gcc.target/riscv/amo/zaamo-rvwmo-amo-add-int.c | 57 
 .../gcc.target/riscv/amo/zaamo-ztso-amo-add-int.c  | 57 
 .../riscv/amo/zalrsc-rvwmo-amo-add-int.c   | 78 ++
 .../gcc.target/riscv/amo/zalrsc-ztso-amo-add-int.c | 78 ++
 31 files changed, 378 insertions(+), 435 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/amo/a-rvwmo-fence.c 

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Rename amo testcases

2024-07-02 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:0fe00385dd91ba9483bca688f5950390b70b0acd

commit 0fe00385dd91ba9483bca688f5950390b70b0acd
Author: Patrick O'Neill 
Date:   Tue Jun 25 14:14:16 2024 -0700

RISC-V: Rename amo testcases

Rename riscv/amo/ testcases to follow a '{ext}-{model}-{name}-{memory 
order}.c'
naming convention.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/amo/amo-table-a-6-load-2.c: Move to...
* gcc.target/riscv/amo/a-rvwmo-load-acquire.c: ...here.
* gcc.target/riscv/amo/amo-table-a-6-load-1.c: Move to...
* gcc.target/riscv/amo/a-rvwmo-load-relaxed.c: ...here.
* gcc.target/riscv/amo/amo-table-a-6-load-3.c: Move to...
* gcc.target/riscv/amo/a-rvwmo-load-seq-cst.c: ...here.
* gcc.target/riscv/amo/amo-table-a-6-store-compat-3.c: Move to...
* gcc.target/riscv/amo/a-rvwmo-store-compat-seq-cst.c: ...here.
* gcc.target/riscv/amo/amo-table-a-6-store-1.c: Move to...
* gcc.target/riscv/amo/a-rvwmo-store-relaxed.c: ...here.
* gcc.target/riscv/amo/amo-table-a-6-store-2.c: Move to...
* gcc.target/riscv/amo/a-rvwmo-store-release.c: ...here.
* gcc.target/riscv/amo/amo-table-ztso-load-2.c: Move to...
* gcc.target/riscv/amo/a-ztso-load-acquire.c: ...here.
* gcc.target/riscv/amo/amo-table-ztso-load-1.c: Move to...
* gcc.target/riscv/amo/a-ztso-load-relaxed.c: ...here.
* gcc.target/riscv/amo/amo-table-ztso-load-3.c: Move to...
* gcc.target/riscv/amo/a-ztso-load-seq-cst.c: ...here.
* gcc.target/riscv/amo/amo-table-ztso-store-3.c: Move to...
* gcc.target/riscv/amo/a-ztso-store-compat-seq-cst.c: ...here.
* gcc.target/riscv/amo/amo-table-ztso-store-1.c: Move to...
* gcc.target/riscv/amo/a-ztso-store-relaxed.c: ...here.
* gcc.target/riscv/amo/amo-table-ztso-store-2.c: Move to...
* gcc.target/riscv/amo/a-ztso-store-release.c: ...here.
* gcc.target/riscv/amo/amo-zaamo-preferred-over-zalrsc.c: Move to...
* gcc.target/riscv/amo/zaamo-preferred-over-zalrsc.c: ...here.
* gcc.target/riscv/amo/amo-table-a-6-compare-exchange-6.c: Move 
to...
* 
gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-acquire-release.c: 
...here.
* gcc.target/riscv/amo/amo-table-a-6-compare-exchange-3.c: Move 
to...
* gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-acquire.c: 
...here.
* gcc.target/riscv/amo/amo-table-a-6-compare-exchange-2.c: Move 
to...
* gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-consume.c: 
...here.
* gcc.target/riscv/amo/amo-table-a-6-compare-exchange-1.c: Move 
to...
* gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-relaxed.c: 
...here.
* gcc.target/riscv/amo/amo-table-a-6-compare-exchange-4.c: Move 
to...
* gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-release.c: 
...here.
* gcc.target/riscv/amo/amo-table-a-6-compare-exchange-7.c: Move 
to...
* 
gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-seq-cst-relaxed.c: 
...here.
* gcc.target/riscv/amo/amo-table-a-6-compare-exchange-5.c: Move 
to...
* gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-seq-cst.c: 
...here.
* gcc.target/riscv/amo/amo-table-a-6-subword-amo-add-4.c: Move to...
* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-acq-rel.c: 
...here.
* gcc.target/riscv/amo/amo-table-a-6-subword-amo-add-2.c: Move to...
* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-acquire.c: 
...here.
* gcc.target/riscv/amo/amo-table-a-6-subword-amo-add-1.c: Move to...
* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-relaxed.c: 
...here.
* gcc.target/riscv/amo/amo-table-a-6-subword-amo-add-3.c: Move to...
* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-release.c: 
...here.
* gcc.target/riscv/amo/amo-table-a-6-subword-amo-add-5.c: Move to...
* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-seq-cst.c: 
...here.
* gcc.target/riscv/amo/amo-table-ztso-compare-exchange-6.c: Move 
to...
* 
gcc.target/riscv/amo/zalrsc-ztso-compare-exchange-int-acquire-release.c: 
...here.
* gcc.target/riscv/amo/amo-table-ztso-compare-exchange-3.c: Move 
to...
* gcc.target/riscv/amo/zalrsc-ztso-compare-exchange-int-acquire.c: 
...here.
* gcc.target/riscv/amo/amo-table-ztso-compare-exchange-2.c: Move 
to...
* gcc.target/riscv/amo/zalrsc-ztso-compare-exchange-int-consume.c: 
...here.
* gcc.target/riscv/amo/amo-table-ztso-compare-exchange-1.c: Move 
to...
* gcc.target/riscv/amo/zalrsc-ztso-compare-exchange-int-relaxed.c: 
...here.
   

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] [committed][RISC-V] Fix expected output for thead store pair test

2024-07-02 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:69251774e403bc643ed862c13ca54eb37e6cabb2

commit 69251774e403bc643ed862c13ca54eb37e6cabb2
Author: Jeff Law 
Date:   Wed Jun 26 06:59:26 2024 -0600

[committed][RISC-V] Fix expected output for thead store pair test

Surya's patch to IRA has improved the code we generate for one of the thead
store pair tests for both rv32 and rv64.  This patch adjusts the 
expectations
of that test.

I've verified that the test now passes on rv32 and rv64 in my tester.  
Pushing
to the trunk.

gcc/testsuite
* gcc.target/riscv/xtheadmempair-3.c: Update expected output.

(cherry picked from commit 03a3dffa43145f80548d32b266b9b87be07b52ee)

Diff:
---
 gcc/testsuite/gcc.target/riscv/xtheadmempair-3.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/xtheadmempair-3.c 
b/gcc/testsuite/gcc.target/riscv/xtheadmempair-3.c
index 5dec702819a..99a6ae7f4d7 100644
--- a/gcc/testsuite/gcc.target/riscv/xtheadmempair-3.c
+++ b/gcc/testsuite/gcc.target/riscv/xtheadmempair-3.c
@@ -17,13 +17,11 @@ void bar (xlen_t, xlen_t, xlen_t, xlen_t, xlen_t, xlen_t, 
xlen_t, xlen_t);
 void baz (xlen_t a, xlen_t b, xlen_t c, xlen_t d, xlen_t e, xlen_t f, xlen_t 
g, xlen_t h)
 {
   foo (a, b, c, d, e, f, g, h);
-  /* RV64: We don't use 0(sp), therefore we can only get 3 mempairs.  */
-  /* RV32: We don't use 0(sp)-8(sp), therefore we can only get 2 mempairs.  */
   bar (a, b, c, d, e, f, g, h);
 }
 
-/* { dg-final { scan-assembler-times "th.ldd\t" 3 { target { rv64 } } } } */
-/* { dg-final { scan-assembler-times "th.sdd\t" 3 { target { rv64 } } } } */
+/* { dg-final { scan-assembler-times "th.ldd\t" 4 { target { rv64 } } } } */
+/* { dg-final { scan-assembler-times "th.sdd\t" 4 { target { rv64 } } } } */
 
-/* { dg-final { scan-assembler-times "th.lwd\t" 2 { target { rv32 } } } } */
-/* { dg-final { scan-assembler-times "th.swd\t" 2 { target { rv32 } } } } */
+/* { dg-final { scan-assembler-times "th.lwd\t" 4 { target { rv32 } } } } */
+/* { dg-final { scan-assembler-times "th.swd\t" 4 { target { rv32 } } } } */


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] [PATCH v2 3/3] RISC-V: cmpmem for RISCV with V extension

2024-07-02 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:6a975b9f6e819f4c0532593f15ff15ccaf3a616e

commit 6a975b9f6e819f4c0532593f15ff15ccaf3a616e
Author: Sergei Lewis 
Date:   Tue Jun 25 15:26:14 2024 -0600

[PATCH v2 3/3] RISC-V: cmpmem for RISCV with V extension

So this is the cmpmem patch from Sergei, updated for the trunk.

Updates included adjusting the existing cmpmemsi expander to
conditionally try expansion via vector.  And a minor testsuite
adjustment to turn off vector expansion in one test that is primarily
focused on vset optimization and ensuring we don't have extras.

I've spun this in my tester successfully and just want to see a clean
run through precommit CI before moving forward.

Jeff
gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_vector::expand_vec_cmpmem): New
function declaration.
* config/riscv/riscv-string.cc (riscv_vector::expand_vec_cmpmem): 
New
function.
* config/riscv/riscv.md (cmpmemsi): Try 
riscv_vector::expand_vec_cmpmem
for constant lengths.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/cmpmem-1.c: New codegen tests
* gcc.target/riscv/rvv/base/cmpmem-2.c: New execution tests
* gcc.target/riscv/rvv/base/cmpmem-3.c: New codegen tests
* gcc.target/riscv/rvv/base/cmpmem-4.c: New codegen tests
* gcc.target/riscv/rvv/autovec/vls/misalign-1.c: Turn off vector 
mem* and
str* handling.

(cherry picked from commit b1e828dd9694294de1ec71e319d32a6b30b087d8)

Diff:
---
 gcc/config/riscv/riscv-protos.h|   1 +
 gcc/config/riscv/riscv-string.cc   | 100 +
 gcc/config/riscv/riscv.md  |   7 +-
 .../gcc.target/riscv/rvv/autovec/vls/misalign-1.c  |   2 +-
 gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-1.c |  88 ++
 gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-2.c |  74 +++
 gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-3.c |  45 ++
 gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-4.c |  62 +
 8 files changed, 377 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index a3380d4250d..a8b76173fa0 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -679,6 +679,7 @@ void expand_rawmemchr (machine_mode, rtx, rtx, rtx, bool = 
false);
 bool expand_strcmp (rtx, rtx, rtx, rtx, unsigned HOST_WIDE_INT, bool);
 void emit_vec_extract (rtx, rtx, rtx);
 bool expand_vec_setmem (rtx, rtx, rtx);
+bool expand_vec_cmpmem (rtx, rtx, rtx, rtx);
 
 /* Rounding mode bitfield for fixed point VXRM.  */
 enum fixed_point_rounding_mode
diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 1ddebdcee3f..257a514d290 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -1605,4 +1605,104 @@ expand_vec_setmem (rtx dst_in, rtx length_in, rtx 
fill_value_in)
   return true;
 }
 
+/* Used by cmpmemsi in riscv.md.  */
+
+bool
+expand_vec_cmpmem (rtx result_out, rtx blk_a_in, rtx blk_b_in, rtx length_in)
+{
+  HOST_WIDE_INT lmul;
+  /* Check we are able and allowed to vectorise this operation;
+ bail if not.  */
+  if (!check_vectorise_memory_operation (length_in, lmul))
+return false;
+
+  /* Strategy:
+ load entire blocks at a and b into vector regs
+ generate mask of bytes that differ
+ find first set bit in mask
+ find offset of first set bit in mask, use 0 if none set
+ result is ((char*)a[offset] - (char*)b[offset])
+   */
+
+  machine_mode vmode
+  = riscv_vector::get_vector_mode (QImode, BYTES_PER_RISCV_VECTOR * lmul)
+ .require ();
+  rtx blk_a_addr = copy_addr_to_reg (XEXP (blk_a_in, 0));
+  rtx blk_a = change_address (blk_a_in, vmode, blk_a_addr);
+  rtx blk_b_addr = copy_addr_to_reg (XEXP (blk_b_in, 0));
+  rtx blk_b = change_address (blk_b_in, vmode, blk_b_addr);
+
+  rtx vec_a = gen_reg_rtx (vmode);
+  rtx vec_b = gen_reg_rtx (vmode);
+
+  machine_mode mask_mode = get_mask_mode (vmode);
+  rtx mask = gen_reg_rtx (mask_mode);
+  rtx mismatch_ofs = gen_reg_rtx (Pmode);
+
+  rtx ne = gen_rtx_NE (mask_mode, vec_a, vec_b);
+  rtx vmsops[] = { mask, ne, vec_a, vec_b };
+  rtx vfops[] = { mismatch_ofs, mask };
+
+  /* If the length is exactly vlmax for the selected mode, do that.
+ Otherwise, use a predicated store.  */
+
+  if (known_eq (GET_MODE_SIZE (vmode), INTVAL (length_in)))
+{
+  emit_move_insn (vec_a, blk_a);
+  emit_move_insn (vec_b, blk_b);
+  emit_vlmax_insn (code_for_pred_cmp (vmode), riscv_vector::COMPARE_OP,
+  vmsops);
+
+  emit_vlmax_insn (code_for_pred_ffs (mask_mode, Pmode),
+  riscv_vector::CPOP_OP, vfops);
+}
+  else
+{
+  if (!satisfies_constraint_K (length_in))
+ length_in = 

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] ira: Scale save/restore costs of callee save registers with block frequency

2024-07-02 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:6113813e19c6364bfc83fe22b218838b11ce3f38

commit 6113813e19c6364bfc83fe22b218838b11ce3f38
Author: Surya Kumari Jangala 
Date:   Tue Jun 25 08:37:49 2024 -0500

ira: Scale save/restore costs of callee save registers with block frequency

In assign_hard_reg(), when computing the costs of the hard registers, the
cost of saving/restoring a callee-save hard register in prolog/epilog is
taken into consideration. However, this cost is not scaled with the entry
block frequency. Without scaling, the cost of saving/restoring is quite
small and this can result in a callee-save register being chosen by
assign_hard_reg() even though there are free caller-save registers
available. Assigning a callee save register to a pseudo that is live
in the entire function and across a call will cause shrink wrap to fail.

2024-06-25  Surya Kumari Jangala  

gcc/
PR rtl-optimization/111673
* ira-color.cc (assign_hard_reg): Scale save/restore costs of
callee save registers with block frequency.

gcc/testsuite/
PR rtl-optimization/111673
* gcc.target/powerpc/pr111673.c: New test.

(cherry picked from commit 3b9b8d6cfdf59337f4b7ce10ce92a98044b2657b)

Diff:
---
 gcc/ira-color.cc|  4 +++-
 gcc/testsuite/gcc.target/powerpc/pr111673.c | 17 +
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
index b9ae32d1b4d..ca32a23a0c9 100644
--- a/gcc/ira-color.cc
+++ b/gcc/ira-color.cc
@@ -2178,7 +2178,9 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
add_cost = ((ira_memory_move_cost[mode][rclass][0]
 + ira_memory_move_cost[mode][rclass][1])
* saved_nregs / hard_regno_nregs (hard_regno,
- mode) - 1);
+ mode) - 1)
+  * (optimize_size ? 1 :
+ REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
cost += add_cost;
full_cost += add_cost;
  }
diff --git a/gcc/testsuite/gcc.target/powerpc/pr111673.c 
b/gcc/testsuite/gcc.target/powerpc/pr111673.c
new file mode 100644
index 000..e0c0f85460a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr111673.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-O2 -fdump-rtl-pro_and_epilogue" } */
+
+/* Verify there is an early return without the prolog and shrink-wrap
+   the function. */
+
+int f (int);
+int
+advance (int dz)
+{
+  if (dz > 0)
+return (dz + dz) * dz;
+  else
+return dz * f (dz);
+}
+
+/* { dg-final { scan-rtl-dump-times "Performing shrink-wrapping" 1 
"pro_and_epilogue" } } */


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] [committed][RISC-V] Fix some of the testsuite fallout from late-combine patch

2024-07-02 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:d46ded6dc58a9d2705ddb37b9a4a33117d58c622

commit d46ded6dc58a9d2705ddb37b9a4a33117d58c622
Author: Jeff Law 
Date:   Mon Jun 24 23:22:21 2024 -0600

[committed][RISC-V] Fix some of the testsuite fallout from late-combine 
patch

This fixes most, but not all of the testsuite fallout from the late-combine
patch.  Specifically in the vector space we're often able to eliminate a
broadcast of an scalar element across a vector.  That eliminates the vsetvl
related to the broadcast, but more importantly from the testsuite 
standpoint it
turns .vv forms into .vf or .vx forms.

There were two paths we could have taken here.  One to accept .v*, ignoring 
the
actual register operands.  Or to create new matches for the .vx and .vf
variants.  I selected the latter as I'd like us to know if the code to avoid
the broadcast regresses.

I'm pushing this through now so that we've got cleaner results and to 
prevent
duplicate work.  I've got patch for the rest of the testsuite fallout, but I
want to think about them a bit.

gcc/testsuite
* gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv-nofm.c: Adjust
expected test output after late-combine changes.
* gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv-nofm.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv-nofm.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv64gcv-nofm.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv64gcv.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vrem-rv32gcv.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vrem-rv64gcv.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vmul-rv32gcv-nofm.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vmul-rv64gcv-nofm.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv-nofm.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vsub-rv64gcv-nofm.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_copysign-rv32gcv.c: 
Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_copysign-rv64gcv.c: 
Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-1.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-2.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-3.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-4.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-1.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-3.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-4.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-5.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-6.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-1.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-2.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-3.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-4.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh-1.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh-2.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh-3.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh-4.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-1.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-2.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-3.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-4.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh-1.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh-2.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh-3.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh-4.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-1.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-3.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-4.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-5.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-6.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-1.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-2.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-3.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-4.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-5.c: Likewise.

(cherry picked from commit 

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] [PATCH v2 2/3] RISC-V: setmem for RISCV with V extension

2024-07-02 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:46289ca0be63cc7a395d01e2bcbf4f2a572249b1

commit 46289ca0be63cc7a395d01e2bcbf4f2a572249b1
Author: Sergei Lewis 
Date:   Mon Jun 24 14:20:14 2024 -0600

[PATCH v2 2/3] RISC-V: setmem for RISCV with V extension

This is primarily Sergei's work, my contributions were limited to
merging his expander with the one that's on the trunk, allowing
non-constant value and trivial testsuite adjustments due to option renaming.

I'm doing setmem first because it's the easiest.  The others will follow
soon enough.

I've tested this in my system, waiting on pre-commit CI to render its
verdict before moving forward.

gcc/ChangeLog

* config/riscv/riscv-protos.h (riscv_vector::expand_vec_setmem): New
function declaration.

* config/riscv/riscv-string.cc (riscv_vector::expand_vec_setmem): 
New
function: this generates an inline vectorised memory set, if and 
only if
we know the entire operation can be performed in a single vector 
store.

* config/riscv/riscv.md (setmem): Try 
riscv_vector::expand_vec_setmem
for constant lengths.  Do not require operand 2 to be a constant.

gcc/testsuite/ChangeLog

* gcc.target/riscv/rvv/base/setmem-1.c: New tests
* gcc.target/riscv/rvv/base/setmem-2.c: New tests
* gcc.target/riscv/rvv/base/setmem-3.c: New tests

(cherry picked from commit a424318d32103dde827e8507fa27d24d33407ec9)

Diff:
---
 gcc/config/riscv/riscv-protos.h|   1 +
 gcc/config/riscv/riscv-string.cc   |  89 ++
 gcc/config/riscv/riscv.md  |   9 +-
 gcc/testsuite/gcc.target/riscv/rvv/base/setmem-1.c | 103 +
 gcc/testsuite/gcc.target/riscv/rvv/base/setmem-2.c |  51 ++
 gcc/testsuite/gcc.target/riscv/rvv/base/setmem-3.c |  69 ++
 6 files changed, 320 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index d6473d0cd85..a3380d4250d 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -678,6 +678,7 @@ void expand_popcount (rtx *);
 void expand_rawmemchr (machine_mode, rtx, rtx, rtx, bool = false);
 bool expand_strcmp (rtx, rtx, rtx, rtx, unsigned HOST_WIDE_INT, bool);
 void emit_vec_extract (rtx, rtx, rtx);
+bool expand_vec_setmem (rtx, rtx, rtx);
 
 /* Rounding mode bitfield for fixed point VXRM.  */
 enum fixed_point_rounding_mode
diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 4702001bd9b..1ddebdcee3f 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -1516,4 +1516,93 @@ expand_strcmp (rtx result, rtx src1, rtx src2, rtx 
nbytes,
   return true;
 }
 
+/* Check we are permitted to vectorise a memory operation.
+   If so, return true and populate lmul_out.
+   Otherwise, return false and leave lmul_out unchanged.  */
+static bool
+check_vectorise_memory_operation (rtx length_in, HOST_WIDE_INT _out)
+{
+  /* If we either can't or have been asked not to vectorise, respect this.  */
+  if (!TARGET_VECTOR)
+return false;
+  if (!(stringop_strategy & STRATEGY_VECTOR))
+return false;
+
+  /* If we can't reason about the length, don't vectorise.  */
+  if (!CONST_INT_P (length_in))
+return false;
+
+  HOST_WIDE_INT length = INTVAL (length_in);
+
+  /* If it's tiny, default operation is likely better; maybe worth
+ considering fractional lmul in the future as well.  */
+  if (length < (TARGET_MIN_VLEN / 8))
+return false;
+
+  /* If we've been asked to use a specific LMUL,
+ check the operation fits and do that.  */
+  if (rvv_max_lmul != RVV_DYNAMIC)
+{
+  lmul_out = TARGET_MAX_LMUL;
+  return (length <= ((TARGET_MAX_LMUL * TARGET_MIN_VLEN) / 8));
+}
+
+  /* Find smallest lmul large enough for entire op.  */
+  HOST_WIDE_INT lmul = 1;
+  while ((lmul <= 8) && (length > ((lmul * TARGET_MIN_VLEN) / 8)))
+{
+  lmul <<= 1;
+}
+
+  if (lmul > 8)
+return false;
+
+  lmul_out = lmul;
+  return true;
+}
+
+/* Used by setmemdi in riscv.md.  */
+bool
+expand_vec_setmem (rtx dst_in, rtx length_in, rtx fill_value_in)
+{
+  HOST_WIDE_INT lmul;
+  /* Check we are able and allowed to vectorise this operation;
+ bail if not.  */
+  if (!check_vectorise_memory_operation (length_in, lmul))
+return false;
+
+  machine_mode vmode
+  = riscv_vector::get_vector_mode (QImode, BYTES_PER_RISCV_VECTOR * lmul)
+   .require ();
+  rtx dst_addr = copy_addr_to_reg (XEXP (dst_in, 0));
+  rtx dst = change_address (dst_in, vmode, dst_addr);
+
+  rtx fill_value = gen_reg_rtx (vmode);
+  rtx broadcast_ops[] = { fill_value, fill_value_in };
+
+  /* If the length is exactly vlmax for the selected mode, do that.
+ Otherwise, use a predicated store.  */
+  if (known_eq (GET_MODE_SIZE (vmode), 

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Add dg-remove-option for z* extensions

2024-07-02 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:bf3011d4d1833a268a65800d5b6b5e6be800fda4

commit bf3011d4d1833a268a65800d5b6b5e6be800fda4
Author: Patrick O'Neill 
Date:   Mon Jun 24 12:06:15 2024 -0700

RISC-V: Add dg-remove-option for z* extensions

This introduces testsuite support infra for removing extensions.
Since z* extensions don't have ordering requirements the logic for
adding/removing those extensions has also been consolidated.

This fixes RVWMO compile testcases failing on Ztso targets by removing
the extension from the -march string.

gcc/ChangeLog:

* doc/sourcebuild.texi (dg-remove-option): Add documentation.
(dg-add-option): Add documentation for riscv_{a,zaamo,zalrsc,ztso}

gcc/testsuite/ChangeLog:

* gcc.target/riscv/amo/amo-table-a-6-amo-add-1.c: Add 
dg-remove-options
for ztso.
* gcc.target/riscv/amo/amo-table-a-6-amo-add-2.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-amo-add-3.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-amo-add-4.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-amo-add-5.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-compare-exchange-1.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-compare-exchange-2.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-compare-exchange-3.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-compare-exchange-4.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-compare-exchange-5.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-compare-exchange-6.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-compare-exchange-7.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-fence-1.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-fence-2.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-fence-3.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-fence-4.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-fence-5.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-load-1.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-load-2.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-load-3.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-store-1.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-store-2.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-store-compat-3.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-subword-amo-add-1.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-subword-amo-add-2.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-subword-amo-add-3.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-subword-amo-add-4.c: Ditto.
* gcc.target/riscv/amo/amo-table-a-6-subword-amo-add-5.c: Ditto.
* gcc.target/riscv/amo/amo-zalrsc-amo-add-1.c: Replace manually
specified -march string with dg-add/remove-options directives.
* gcc.target/riscv/amo/amo-zalrsc-amo-add-2.c: Ditto.
* gcc.target/riscv/amo/amo-zalrsc-amo-add-3.c: Ditto.
* gcc.target/riscv/amo/amo-zalrsc-amo-add-4.c: Ditto.
* gcc.target/riscv/amo/amo-zalrsc-amo-add-5.c: Ditto.
* lib/target-supports-dg.exp: Add dg-remove-options.
* lib/target-supports.exp: Add dg-remove-options and consolidate z*
extension add/remove-option code.

Signed-off-by: Patrick O'Neill 
(cherry picked from commit 580c37f1ef7db8e7a398184eb8f5d7555124d30a)

Diff:
---
 gcc/doc/sourcebuild.texi   |  43 +
 .../gcc.target/riscv/amo/amo-table-a-6-amo-add-1.c |   1 +
 .../gcc.target/riscv/amo/amo-table-a-6-amo-add-2.c |   1 +
 .../gcc.target/riscv/amo/amo-table-a-6-amo-add-3.c |   1 +
 .../gcc.target/riscv/amo/amo-table-a-6-amo-add-4.c |   1 +
 .../gcc.target/riscv/amo/amo-table-a-6-amo-add-5.c |   1 +
 .../riscv/amo/amo-table-a-6-compare-exchange-1.c   |   1 +
 .../riscv/amo/amo-table-a-6-compare-exchange-2.c   |   1 +
 .../riscv/amo/amo-table-a-6-compare-exchange-3.c   |   1 +
 .../riscv/amo/amo-table-a-6-compare-exchange-4.c   |   1 +
 .../riscv/amo/amo-table-a-6-compare-exchange-5.c   |   1 +
 .../riscv/amo/amo-table-a-6-compare-exchange-6.c   |   1 +
 .../riscv/amo/amo-table-a-6-compare-exchange-7.c   |   1 +
 .../gcc.target/riscv/amo/amo-table-a-6-fence-1.c   |   1 +
 .../gcc.target/riscv/amo/amo-table-a-6-fence-2.c   |   1 +
 .../gcc.target/riscv/amo/amo-table-a-6-fence-3.c   |   1 +
 .../gcc.target/riscv/amo/amo-table-a-6-fence-4.c   |   1 +
 .../gcc.target/riscv/amo/amo-table-a-6-fence-5.c   |   1 +
 .../gcc.target/riscv/amo/amo-table-a-6-load-1.c|   1 +
 .../gcc.target/riscv/amo/amo-table-a-6-load-2.c|   1 +
 .../gcc.target/riscv/amo/amo-table-a-6-load-3.c|   1 +
 .../gcc.target/riscv/amo/amo-table-a-6-store-1.c   |   1 +
 .../gcc.target/riscv/amo/amo-table-a-6-store-2.c   |   1 +
 .../riscv/amo/amo-table-a-6-store-compat-3.c   

[gcc r15-1723] [to-be-committed, RISC-V, V4] movmem for RISCV with V extension

2024-06-29 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:42946aa9b3228262e413481a3193bda85c20ef4b

commit r15-1723-g42946aa9b3228262e413481a3193bda85c20ef4b
Author: Sergei Lewis 
Date:   Sat Jun 29 14:34:31 2024 -0600

[to-be-committed,RISC-V,V4] movmem for RISCV with V extension

I hadn't updated my repo on the host where I handle email, so it picked
up the older version of this patch without the testsuite fix.  So, V4
with the testsuite option for lmul fixed.

--

And Sergei's movmem patch.  Just trivial testsuite adjustment for an
option name change and a whitespace fix from me.

I've spun this in my tester for rv32 and rv64.  I'll wait for pre-commit
CI before taking further action.

Just a reminder, this patch is designed to handle the case where we can
issue a single vector load/store which avoids all the complexities of
determining which direction to copy.

--

gcc/ChangeLog

* config/riscv/riscv.md (movmem): New expander.

gcc/testsuite/ChangeLog

PR target/112109
* gcc.target/riscv/rvv/base/movmem-1.c: New test

Diff:
---
 gcc/config/riscv/riscv.md  | 22 
 gcc/testsuite/gcc.target/riscv/rvv/base/movmem-1.c | 60 ++
 2 files changed, 82 insertions(+)

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index ff37125e3f2..c0c960353eb 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2723,6 +2723,28 @@
 FAIL;
 })
 
+;; Inlining general memmove is a pessimisation: we can't avoid having to decide
+;; which direction to go at runtime, which is costly in instruction count
+;; however for situations where the entire move fits in one vector operation
+;; we can do all reads before doing any writes so we don't have to worry
+;; so generate the inline vector code in such situations
+;; nb. prefer scalar path for tiny memmoves.
+(define_expand "movmem"
+  [(parallel [(set (match_operand:BLK 0 "general_operand")
+   (match_operand:BLK 1 "general_operand"))
+(use (match_operand:P 2 "const_int_operand"))
+(use (match_operand:SI 3 "const_int_operand"))])]
+  "TARGET_VECTOR"
+{
+  if ((INTVAL (operands[2]) >= TARGET_MIN_VLEN / 8)
+   && (INTVAL (operands[2]) <= TARGET_MIN_VLEN)
+   && riscv_vector::expand_block_move (operands[0], operands[1],
+operands[2]))
+DONE;
+  else
+FAIL;
+})
+
 ;; Expand in-line code to clear the instruction cache between operand[0] and
 ;; operand[1].
 (define_expand "clear_cache"
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/movmem-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/movmem-1.c
new file mode 100644
index 000..d9d4a70a392
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/movmem-1.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-add-options riscv_v } */
+/* { dg-additional-options "-O3 -mrvv-max-lmul=dynamic" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#define MIN_VECTOR_BYTES (__riscv_v_min_vlen / 8)
+
+/* Tiny memmoves should not be vectorised.
+** f1:
+**  li\s+a2,\d+
+**  tail\s+memmove
+*/
+char *
+f1 (char *a, char const *b)
+{
+  return __builtin_memmove (a, b, MIN_VECTOR_BYTES - 1);
+}
+
+/* Vectorise+inline minimum vector register width with LMUL=1
+** f2:
+**  (
+**  vsetivli\s+zero,16,e8,m1,ta,ma
+**  |
+**  li\s+[ta][0-7],\d+
+**  vsetvli\s+zero,[ta][0-7],e8,m1,ta,ma
+**  )
+**  vle8\.v\s+v\d+,0\(a1\)
+**  vse8\.v\s+v\d+,0\(a0\)
+**  ret
+*/
+char *
+f2 (char *a, char const *b)
+{
+  return __builtin_memmove (a, b, MIN_VECTOR_BYTES);
+}
+
+/* Vectorise+inline up to LMUL=8
+** f3:
+**  li\s+[ta][0-7],\d+
+**  vsetvli\s+zero,[ta][0-7],e8,m8,ta,ma
+**  vle8\.v\s+v\d+,0\(a1\)
+**  vse8\.v\s+v\d+,0\(a0\)
+**  ret
+*/
+char *
+f3 (char *a, char const *b)
+{
+  return __builtin_memmove (a, b, MIN_VECTOR_BYTES * 8);
+}
+
+/* Don't vectorise if the move is too large for one operation
+** f4:
+**  li\s+a2,\d+
+**  tail\s+memmove
+*/
+char *
+f4 (char *a, char const *b)
+{
+  return __builtin_memmove (a, b, MIN_VECTOR_BYTES * 8 + 1);
+}


[gcc r15-1719] [committed] Fix mcore-elf regression after recent IRA change

2024-06-28 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:9fbbad9b6c6e7fa7eaf37552173f5b8b2958976b

commit r15-1719-g9fbbad9b6c6e7fa7eaf37552173f5b8b2958976b
Author: Jeff Law 
Date:   Fri Jun 28 18:36:50 2024 -0600

[committed] Fix mcore-elf regression after recent IRA change

So the recent IRA change exposed a bug in the mcore backend.

The mcore has a special instruction (xtrb3) which can zero extend a GPR into
R1.  It's useful because zextb requires a matching source/destination.
Unfortunately xtrb3 modifies CC.

The IRA changes twiddle register allocation such that we want to use xtrb3.
Unfortunately CC is live at the point where we want to use xtrb3 and 
clobbering
CC causes the test to fail.

Exposing the clobber in the expander and insn seems like the best path 
forward.
We could also drop the xtrb3 alternative, but that seems like it would hurt
codegen more than exposing the clobber.

The bitfield extraction patterns using xtrb look problematic as well, but I
didn't try to fix those.

This fixes the builtn-arith-overflow regressions and appears to fix
20010122-1.c as a side effect.

gcc/
* config/mcore/mcore.md  (zero_extendqihi2): Clobber CC in expander
and matching insn.
(zero_extendqisi2): Likewise.

Diff:
---
 gcc/config/mcore/mcore.md | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/gcc/config/mcore/mcore.md b/gcc/config/mcore/mcore.md
index d416ce24a97..432b89520d7 100644
--- a/gcc/config/mcore/mcore.md
+++ b/gcc/config/mcore/mcore.md
@@ -1057,15 +1057,17 @@
   [(set_attr "type" "load")])
 
 (define_expand "zero_extendqisi2"
-  [(set (match_operand:SI 0 "mcore_arith_reg_operand" "")
-   (zero_extend:SI (match_operand:QI 1 "general_operand" "")))]
+  [(parallel [(set (match_operand:SI 0 "mcore_arith_reg_operand" "")
+ (zero_extend:SI (match_operand:QI 1 "general_operand" "")))
+ (clobber (reg:CC 17))])]
   ""
   "") 
 
 ;; RBE: XXX: we don't recognize that the xtrb3 kills the CC register.
 (define_insn ""
   [(set (match_operand:SI 0 "mcore_arith_reg_operand" "=r,b,r")
-   (zero_extend:SI (match_operand:QI 1 "general_operand" "0,r,m")))]
+   (zero_extend:SI (match_operand:QI 1 "general_operand" "0,r,m")))
+   (clobber (reg:CC 17))]
   ""
   "@
zextb   %0
@@ -1091,15 +1093,17 @@
   [(set_attr "type" "load")])
 
 (define_expand "zero_extendqihi2"
-  [(set (match_operand:HI 0 "mcore_arith_reg_operand" "")
-   (zero_extend:HI (match_operand:QI 1 "general_operand" "")))]
+  [(parallel [(set (match_operand:HI 0 "mcore_arith_reg_operand" "")
+  (zero_extend:HI (match_operand:QI 1 "general_operand" "")))
+ (clobber (reg:CC 17))])]
   ""
   "") 
 
 ;; RBE: XXX: we don't recognize that the xtrb3 kills the CC register.
 (define_insn ""
   [(set (match_operand:HI 0 "mcore_arith_reg_operand" "=r,b,r")
-   (zero_extend:HI (match_operand:QI 1 "general_operand" "0,r,m")))]
+   (zero_extend:HI (match_operand:QI 1 "general_operand" "0,r,m")))
+   (clobber (reg:CC 17))]
   ""
   "@
zextb   %0


[gcc r15-1655] [committed] Remove compromised sh test

2024-06-26 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:47b68cda2c4afe32e84c5f18da0196c39e5e0edf

commit r15-1655-g47b68cda2c4afe32e84c5f18da0196c39e5e0edf
Author: Jeff Law 
Date:   Wed Jun 26 07:20:29 2024 -0600

[committed] Remove compromised sh test

Surya's recent patch to IRA improves the code for sh/pr54602-1.c slightly.
Specifically it's able to eliminate a save/restore in the prologue/epilogue 
and
a bit of register shuffling.

As a result there literally aren't any insns that can be used to fill the 
delay
slot of the return, so a nop gets emitted and the test fails.

Given there literally aren't any insns to move into the delay slot, the best
course of action is to just drop the test.

gcc/testsuite
* gcc.target/sh/pr54602-1.c: Delete test.

Diff:
---
 gcc/testsuite/gcc.target/sh/pr54602-1.c | 14 --
 1 file changed, 14 deletions(-)

diff --git a/gcc/testsuite/gcc.target/sh/pr54602-1.c 
b/gcc/testsuite/gcc.target/sh/pr54602-1.c
deleted file mode 100644
index e7fb2a9a642..000
--- a/gcc/testsuite/gcc.target/sh/pr54602-1.c
+++ /dev/null
@@ -1,14 +0,0 @@
-/* Verify that the delay slot is stuffed with register pop insns for normal
-   (i.e. not interrupt handler) function returns.  If everything goes as
-   expected we won't see any nop insns.  */
-/* { dg-do compile }  */
-/* { dg-options "-O1" } */
-/* { dg-final { scan-assembler-not "nop" } } */
-
-int test00 (int a, int b);
-
-int
-test01 (int a, int b, int c, int d)
-{
-  return test00 (a, b) + c;
-}


[gcc r15-1654] [committed][RISC-V] Fix expected output for thead store pair test

2024-06-26 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:03a3dffa43145f80548d32b266b9b87be07b52ee

commit r15-1654-g03a3dffa43145f80548d32b266b9b87be07b52ee
Author: Jeff Law 
Date:   Wed Jun 26 06:59:26 2024 -0600

[committed][RISC-V] Fix expected output for thead store pair test

Surya's patch to IRA has improved the code we generate for one of the thead
store pair tests for both rv32 and rv64.  This patch adjusts the 
expectations
of that test.

I've verified that the test now passes on rv32 and rv64 in my tester.  
Pushing
to the trunk.

gcc/testsuite
* gcc.target/riscv/xtheadmempair-3.c: Update expected output.

Diff:
---
 gcc/testsuite/gcc.target/riscv/xtheadmempair-3.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/xtheadmempair-3.c 
b/gcc/testsuite/gcc.target/riscv/xtheadmempair-3.c
index 5dec702819a..99a6ae7f4d7 100644
--- a/gcc/testsuite/gcc.target/riscv/xtheadmempair-3.c
+++ b/gcc/testsuite/gcc.target/riscv/xtheadmempair-3.c
@@ -17,13 +17,11 @@ void bar (xlen_t, xlen_t, xlen_t, xlen_t, xlen_t, xlen_t, 
xlen_t, xlen_t);
 void baz (xlen_t a, xlen_t b, xlen_t c, xlen_t d, xlen_t e, xlen_t f, xlen_t 
g, xlen_t h)
 {
   foo (a, b, c, d, e, f, g, h);
-  /* RV64: We don't use 0(sp), therefore we can only get 3 mempairs.  */
-  /* RV32: We don't use 0(sp)-8(sp), therefore we can only get 2 mempairs.  */
   bar (a, b, c, d, e, f, g, h);
 }
 
-/* { dg-final { scan-assembler-times "th.ldd\t" 3 { target { rv64 } } } } */
-/* { dg-final { scan-assembler-times "th.sdd\t" 3 { target { rv64 } } } } */
+/* { dg-final { scan-assembler-times "th.ldd\t" 4 { target { rv64 } } } } */
+/* { dg-final { scan-assembler-times "th.sdd\t" 4 { target { rv64 } } } } */
 
-/* { dg-final { scan-assembler-times "th.lwd\t" 2 { target { rv32 } } } } */
-/* { dg-final { scan-assembler-times "th.swd\t" 2 { target { rv32 } } } } */
+/* { dg-final { scan-assembler-times "th.lwd\t" 4 { target { rv32 } } } } */
+/* { dg-final { scan-assembler-times "th.swd\t" 4 { target { rv32 } } } } */


Re: Straw poll on shifts with out of range operands

2024-06-25 Thread Jeff Law via Gcc




On 6/25/24 8:44 PM, Andrew Pinski via Gcc wrote:

I am in the middle of improving the isolation path pass for shifts
with out of range operands.
There are 3 options we could do really:
1) isolate the path to __builtin_unreachable
2) isolate the path to __builtin_trap
 This is what is currently done for null pointer and divide by zero
3) isolate the path and turn the shift into zero constant
This happens currently for explicit use in both match (in many
cases) and VRP for others.

All 3 are not hard to implement.
This comes up in the context of https://gcc.gnu.org/PR115636 where the
original requestor thinks it should be #3 but I suspect they didn't
realize it was undefined behavior then.
2 seems the best for user experience.
1 seems like the best for optimizations.
3 is more in line with how other parts of the compiler handle it.

So the question I have is which one should we go for? (is there
another option I missed besides not do anything)
There was a time when we were thinking about having a knob that would 
allow one to select which of the 3 approaches makes the most sense given 
the expected execution environment.


While I prefer #2, some have (reasonably) argued that it's not 
appropriate behavior for code in a library.


I'm not a fan of #1 because it allows unpredictable code execution. 
Essentially you just fall into whatever code was after the bogus shift 
in the executable and hope for the best.  One day this is going to bite 
us hard from a security standpoint.


#3 isn't great, but it's not terrible either.

Jeff


[gcc r15-1637] [PATCH 11/11] Handle subroutine types in CodeView

2024-06-25 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:01f8b1002147c08f206a058a6d1f7bfb006aa324

commit r15-1637-g01f8b1002147c08f206a058a6d1f7bfb006aa324
Author: Mark Harmstone 
Date:   Tue Jun 25 20:24:22 2024 -0600

[PATCH 11/11] Handle subroutine types in CodeView

gcc/
* dwarf2codeview.cc (struct codeview_custom_type): Add lf_procedure
and lf_arglist to union.
(write_lf_procedure, write_lf_arglist): New functions.
(write_custom_types): Call write_lf_procedure and write_lf_arglist.
(get_type_num_subroutine_type): New function.
(get_type_num): Handle DW_TAG_subroutine_type DIEs.
* dwarf2codeview.h (LF_PROCEDURE, LF_ARGLIST): Define.

Diff:
---
 gcc/dwarf2codeview.cc | 238 ++
 gcc/dwarf2codeview.h  |   2 +
 2 files changed, 240 insertions(+)

diff --git a/gcc/dwarf2codeview.cc b/gcc/dwarf2codeview.cc
index 06267639169..e8ed3713480 100644
--- a/gcc/dwarf2codeview.cc
+++ b/gcc/dwarf2codeview.cc
@@ -262,6 +262,19 @@ struct codeview_custom_type
   uint8_t length;
   uint8_t position;
 } lf_bitfield;
+struct
+{
+  uint32_t return_type;
+  uint8_t calling_convention;
+  uint8_t attributes;
+  uint16_t num_parameters;
+  uint32_t arglist;
+} lf_procedure;
+struct
+{
+  uint32_t num_entries;
+  uint32_t *args;
+} lf_arglist;
   };
 };
 
@@ -1623,6 +1636,102 @@ write_lf_bitfield (codeview_custom_type *t)
   asm_fprintf (asm_out_file, "%LLcv_type%x_end:\n", t->num);
 }
 
+/* Write an LF_PROCEDURE type.  Function pointers are implemented as pointers
+   to one of these.  */
+
+static void
+write_lf_procedure (codeview_custom_type *t)
+{
+  /* This is lf_procedure in binutils and lfProc in Microsoft's cvinfo.h:
+
+struct lf_procedure
+{
+  uint16_t size;
+  uint16_t kind;
+  uint32_t return_type;
+  uint8_t calling_convention;
+  uint8_t attributes;
+  uint16_t num_parameters;
+  uint32_t arglist;
+} ATTRIBUTE_PACKED;
+  */
+
+  fputs (integer_asm_op (2, false), asm_out_file);
+  asm_fprintf (asm_out_file, "%LLcv_type%x_end - %LLcv_type%x_start\n",
+  t->num, t->num);
+
+  asm_fprintf (asm_out_file, "%LLcv_type%x_start:\n", t->num);
+
+  fputs (integer_asm_op (2, false), asm_out_file);
+  fprint_whex (asm_out_file, t->kind);
+  putc ('\n', asm_out_file);
+
+  fputs (integer_asm_op (4, false), asm_out_file);
+  fprint_whex (asm_out_file, t->lf_procedure.return_type);
+  putc ('\n', asm_out_file);
+
+  fputs (integer_asm_op (1, false), asm_out_file);
+  fprint_whex (asm_out_file, t->lf_procedure.calling_convention);
+  putc ('\n', asm_out_file);
+
+  fputs (integer_asm_op (1, false), asm_out_file);
+  fprint_whex (asm_out_file, t->lf_procedure.attributes);
+  putc ('\n', asm_out_file);
+
+  fputs (integer_asm_op (2, false), asm_out_file);
+  fprint_whex (asm_out_file, t->lf_procedure.num_parameters);
+  putc ('\n', asm_out_file);
+
+  fputs (integer_asm_op (4, false), asm_out_file);
+  fprint_whex (asm_out_file, t->lf_procedure.arglist);
+  putc ('\n', asm_out_file);
+
+  asm_fprintf (asm_out_file, "%LLcv_type%x_end:\n", t->num);
+}
+
+/* Write an LF_ARGLIST type.  This is just a list of other types.  LF_PROCEDURE
+   entries point to one of these.  */
+
+static void
+write_lf_arglist (codeview_custom_type *t)
+{
+  /* This is lf_arglist in binutils and lfArgList in Microsoft's cvinfo.h:
+
+struct lf_arglist
+{
+  uint16_t size;
+  uint16_t kind;
+  uint32_t num_entries;
+  uint32_t args[];
+} ATTRIBUTE_PACKED;
+  */
+
+  fputs (integer_asm_op (2, false), asm_out_file);
+  asm_fprintf (asm_out_file, "%LLcv_type%x_end - %LLcv_type%x_start\n",
+  t->num, t->num);
+
+  asm_fprintf (asm_out_file, "%LLcv_type%x_start:\n", t->num);
+
+  fputs (integer_asm_op (2, false), asm_out_file);
+  fprint_whex (asm_out_file, t->kind);
+  putc ('\n', asm_out_file);
+
+  fputs (integer_asm_op (4, false), asm_out_file);
+  fprint_whex (asm_out_file, t->lf_arglist.num_entries);
+  putc ('\n', asm_out_file);
+
+  for (uint32_t i = 0; i < t->lf_arglist.num_entries; i++)
+{
+  fputs (integer_asm_op (4, false), asm_out_file);
+  fprint_whex (asm_out_file, t->lf_arglist.args[i]);
+  putc ('\n', asm_out_file);
+}
+
+  free (t->lf_arglist.args);
+
+  asm_fprintf (asm_out_file, "%LLcv_type%x_end:\n", t->num);
+}
+
 /* Write the .debug$T section, which contains all of our custom type
definitions.  */
 
@@ -1673,6 +1782,14 @@ write_custom_types (void)
case LF_BITFIELD:
  write_lf_bitfield (custom_types);
  break;
+
+   case LF_PROCEDURE:
+ write_lf_procedure (custom_types);
+ break;
+
+   case LF_ARGLIST:
+ write_lf_arglist (custom_types);
+ break;
}
 
   free (custom_types);
@@ -2488,6 +2605,123 @@ get_type_num_struct (dw_die_ref type, bool in_struct, 
bool *is_fwd_ref)
   return 

[gcc r15-1636] [PATCH 10/11] Handle bitfields for CodeView

2024-06-25 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:009b3290f2b2a41409530cbc8119343d6344b50e

commit r15-1636-g009b3290f2b2a41409530cbc8119343d6344b50e
Author: Mark Harmstone 
Date:   Tue Jun 25 20:20:26 2024 -0600

[PATCH 10/11] Handle bitfields for CodeView

Translates structure members with DW_AT_data_bit_offset set in DWARF
into LF_BITFIELD symbols.

gcc/
* dwarf2codeview.cc (struct codeview_custom_type): Add lf_bitfield 
to
union.
(write_lf_bitfield): New function.
(write_custom_types): Call write_lf_bitfield.
(create_bitfield): New function.
(get_type_num_struct): Handle bitfields.
* dwarf2codeview.h (LF_BITFIELD): Define.

Diff:
---
 gcc/dwarf2codeview.cc | 89 +--
 gcc/dwarf2codeview.h  |  1 +
 2 files changed, 88 insertions(+), 2 deletions(-)

diff --git a/gcc/dwarf2codeview.cc b/gcc/dwarf2codeview.cc
index 3f1ce5577fc..06267639169 100644
--- a/gcc/dwarf2codeview.cc
+++ b/gcc/dwarf2codeview.cc
@@ -256,6 +256,12 @@ struct codeview_custom_type
   uint32_t index_type;
   codeview_integer length_in_bytes;
 } lf_array;
+struct
+{
+  uint32_t base_type;
+  uint8_t length;
+  uint8_t position;
+} lf_bitfield;
   };
 };
 
@@ -1573,6 +1579,50 @@ write_lf_array (codeview_custom_type *t)
   asm_fprintf (asm_out_file, "%LLcv_type%x_end:\n", t->num);
 }
 
+/* Write an LF_BITFIELD type.  */
+
+static void
+write_lf_bitfield (codeview_custom_type *t)
+{
+  /* This is lf_bitfield in binutils and lfBitfield in Microsoft's cvinfo.h:
+
+struct lf_bitfield
+{
+  uint16_t size;
+  uint16_t kind;
+  uint32_t base_type;
+  uint8_t length;
+  uint8_t position;
+} ATTRIBUTE_PACKED;
+  */
+
+  fputs (integer_asm_op (2, false), asm_out_file);
+  asm_fprintf (asm_out_file, "%LLcv_type%x_end - %LLcv_type%x_start\n",
+  t->num, t->num);
+
+  asm_fprintf (asm_out_file, "%LLcv_type%x_start:\n", t->num);
+
+  fputs (integer_asm_op (2, false), asm_out_file);
+  fprint_whex (asm_out_file, t->kind);
+  putc ('\n', asm_out_file);
+
+  fputs (integer_asm_op (4, false), asm_out_file);
+  fprint_whex (asm_out_file, t->lf_bitfield.base_type);
+  putc ('\n', asm_out_file);
+
+  fputs (integer_asm_op (1, false), asm_out_file);
+  fprint_whex (asm_out_file, t->lf_bitfield.length);
+  putc ('\n', asm_out_file);
+
+  fputs (integer_asm_op (1, false), asm_out_file);
+  fprint_whex (asm_out_file, t->lf_bitfield.position);
+  putc ('\n', asm_out_file);
+
+  write_cv_padding (2);
+
+  asm_fprintf (asm_out_file, "%LLcv_type%x_end:\n", t->num);
+}
+
 /* Write the .debug$T section, which contains all of our custom type
definitions.  */
 
@@ -1619,6 +1669,10 @@ write_custom_types (void)
case LF_ARRAY:
  write_lf_array (custom_types);
  break;
+
+   case LF_BITFIELD:
+ write_lf_bitfield (custom_types);
+ break;
}
 
   free (custom_types);
@@ -2199,6 +2253,33 @@ add_struct_forward_def (dw_die_ref type)
   return ct->num;
 }
 
+/* Add an LF_BITFIELD type, returning its number.  DWARF represents bitfields
+   as members in a struct with a DW_AT_data_bit_offset attribute, whereas in
+   CodeView they're a distinct type.  */
+
+static uint32_t
+create_bitfield (dw_die_ref c)
+{
+  codeview_custom_type *ct;
+  uint32_t base_type;
+
+  base_type = get_type_num (get_AT_ref (c, DW_AT_type), true, false);
+  if (base_type == 0)
+return 0;
+
+  ct = (codeview_custom_type *) xmalloc (sizeof (codeview_custom_type));
+
+  ct->next = NULL;
+  ct->kind = LF_BITFIELD;
+  ct->lf_bitfield.base_type = base_type;
+  ct->lf_bitfield.length = get_AT_unsigned (c, DW_AT_bit_size);
+  ct->lf_bitfield.position = get_AT_unsigned (c, DW_AT_data_bit_offset);
+
+  add_custom_type (ct);
+
+  return ct->num;
+}
+
 /* Process a DW_TAG_structure_type, DW_TAG_class_type, or DW_TAG_union_type
DIE, add an LF_FIELDLIST and an LF_STRUCTURE / LF_CLASS / LF_UNION type,
and return the number of the latter.  */
@@ -2279,8 +2360,12 @@ get_type_num_struct (dw_die_ref type, bool in_struct, 
bool *is_fwd_ref)
  break;
}
 
- el->lf_member.type = get_type_num (get_AT_ref (c, DW_AT_type), true,
-   false);
+ if (get_AT (c, DW_AT_data_bit_offset))
+   el->lf_member.type = create_bitfield (c);
+ else
+   el->lf_member.type = get_type_num (get_AT_ref (c, DW_AT_type),
+  true, false);
+
  el->lf_member.offset.neg = false;
  el->lf_member.offset.num = get_AT_unsigned (c,
  
DW_AT_data_member_location);
diff --git a/gcc/dwarf2codeview.h b/gcc/dwarf2codeview.h
index 70eed6bf2aa..70eae554b80 100644
--- a/gcc/dwarf2codeview.h
+++ b/gcc/dwarf2codeview.h
@@ -64,6 +64,7 @@ along with GCC; see the file COPYING3.  If not see
 

[gcc r15-1630] [PATCH 09/11] Handle arrays for CodeView

2024-06-25 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:3e64a687a303dd550df072676c853d79076de37f

commit r15-1630-g3e64a687a303dd550df072676c853d79076de37f
Author: Mark Harmstone 
Date:   Tue Jun 25 17:31:24 2024 -0600

[PATCH 09/11] Handle arrays for CodeView

Translates DW_TAG_array_type DIEs into LF_ARRAY symbols.

gcc/
* dwarf2codeview.cc (struct codeview_custom_type): Add lf_array to
union.
(write_lf_array): New function.
(write_custom_types): Call write_lf_array.
(get_type_num_array_type): New function.
(get_type_num): Handle DW_TAG_array_type DIEs.
* dwarf2codeview.h (LF_ARRAY): Define.

Diff:
---
 gcc/dwarf2codeview.cc | 179 ++
 gcc/dwarf2codeview.h  |   1 +
 2 files changed, 180 insertions(+)

diff --git a/gcc/dwarf2codeview.cc b/gcc/dwarf2codeview.cc
index 9e3b64522b2..3f1ce5577fc 100644
--- a/gcc/dwarf2codeview.cc
+++ b/gcc/dwarf2codeview.cc
@@ -250,6 +250,12 @@ struct codeview_custom_type
   codeview_integer length;
   char *name;
 } lf_structure;
+struct
+{
+  uint32_t element_type;
+  uint32_t index_type;
+  codeview_integer length_in_bytes;
+} lf_array;
   };
 };
 
@@ -1520,6 +1526,53 @@ write_lf_union (codeview_custom_type *t)
   asm_fprintf (asm_out_file, "%LLcv_type%x_end:\n", t->num);
 }
 
+/* Write an LF_ARRAY type.  */
+
+static void
+write_lf_array (codeview_custom_type *t)
+{
+  size_t leaf_len;
+
+  /* This is lf_array in binutils and lfArray in Microsoft's cvinfo.h:
+
+struct lf_array
+{
+  uint16_t size;
+  uint16_t kind;
+  uint32_t element_type;
+  uint32_t index_type;
+  uint16_t length_in_bytes;
+  char name[];
+} ATTRIBUTE_PACKED;
+  */
+
+  fputs (integer_asm_op (2, false), asm_out_file);
+  asm_fprintf (asm_out_file, "%LLcv_type%x_end - %LLcv_type%x_start\n",
+  t->num, t->num);
+
+  asm_fprintf (asm_out_file, "%LLcv_type%x_start:\n", t->num);
+
+  fputs (integer_asm_op (2, false), asm_out_file);
+  fprint_whex (asm_out_file, t->kind);
+  putc ('\n', asm_out_file);
+
+  fputs (integer_asm_op (4, false), asm_out_file);
+  fprint_whex (asm_out_file, t->lf_array.element_type);
+  putc ('\n', asm_out_file);
+
+  fputs (integer_asm_op (4, false), asm_out_file);
+  fprint_whex (asm_out_file, t->lf_array.index_type);
+  putc ('\n', asm_out_file);
+
+  leaf_len = 13 + write_cv_integer (>lf_array.length_in_bytes);
+
+  ASM_OUTPUT_ASCII (asm_out_file, "", 1);
+
+  write_cv_padding (4 - (leaf_len % 4));
+
+  asm_fprintf (asm_out_file, "%LLcv_type%x_end:\n", t->num);
+}
+
 /* Write the .debug$T section, which contains all of our custom type
definitions.  */
 
@@ -1562,6 +1615,10 @@ write_custom_types (void)
case LF_UNION:
  write_lf_union (custom_types);
  break;
+
+   case LF_ARRAY:
+ write_lf_array (custom_types);
+ break;
}
 
   free (custom_types);
@@ -2346,6 +2403,124 @@ get_type_num_struct (dw_die_ref type, bool in_struct, 
bool *is_fwd_ref)
   return ct->num;
 }
 
+/* Process a DW_TAG_array_type DIE, adding an LF_ARRAY type and returning its
+   number.  */
+
+static uint32_t
+get_type_num_array_type (dw_die_ref type, bool in_struct)
+{
+  dw_die_ref base_type, t, first_child, c, *dimension_arr;
+  uint64_t size = 0;
+  unsigned int dimensions, i;
+  uint32_t element_type;
+
+  base_type = get_AT_ref (type, DW_AT_type);
+  if (!base_type)
+return 0;
+
+  /* We need to know the size of our base type.  Loop through until we find
+ it.  */
+  t = base_type;
+  while (t && size == 0)
+{
+  switch (dw_get_die_tag (t))
+   {
+   case DW_TAG_const_type:
+   case DW_TAG_volatile_type:
+   case DW_TAG_typedef:
+   case DW_TAG_enumeration_type:
+ t = get_AT_ref (t, DW_AT_type);
+ break;
+
+   case DW_TAG_base_type:
+   case DW_TAG_structure_type:
+   case DW_TAG_class_type:
+   case DW_TAG_union_type:
+   case DW_TAG_pointer_type:
+ size = get_AT_unsigned (t, DW_AT_byte_size);
+ break;
+
+   default:
+ return 0;
+   }
+}
+
+  if (size == 0)
+return 0;
+
+  first_child = dw_get_die_child (type);
+  if (!first_child)
+return 0;
+
+  element_type = get_type_num (base_type, in_struct, false);
+  if (element_type == 0)
+return 0;
+
+  /* Create an array of our DW_TAG_subrange_type children, in reverse order.
+ We have to do this because unlike DWARF CodeView doesn't have
+ multidimensional arrays, so instead we do arrays of arrays.  */
+
+  dimensions = 0;
+  c = first_child;
+  do
+{
+  c = dw_get_die_sib (c);
+  if (dw_get_die_tag (c) != DW_TAG_subrange_type)
+   continue;
+
+  dimensions++;
+}
+  while (c != first_child);
+
+  if (dimensions == 0)
+return 0;
+
+  dimension_arr = (dw_die_ref *) xmalloc (sizeof (dw_die_ref) * dimensions);
+
+  c = first_child;
+  i = 0;
+  do
+

[gcc r15-1629] [PATCH 08/11] Handle unions for CodeView.

2024-06-25 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:0a5f559fb369ac9568ff57835928ac3ce3517be7

commit r15-1629-g0a5f559fb369ac9568ff57835928ac3ce3517be7
Author: Mark Harmstone 
Date:   Tue Jun 25 17:28:49 2024 -0600

[PATCH 08/11] Handle unions for CodeView.

Translates DW_TAG_union_type DIEs into LF_UNION symbols.

gcc/
* dwarf2codeview.cc (write_lf_union): New function.
(write_custom_types): Call write_lf_union.
(add_struct_forward_def): Handle DW_TAG_union_type DIEs.
(get_type_num_struct): Handle unions.
(get_type_num): Handle DW_TAG_union_type DIEs.
* dwarf2codeview.h (LF_UNION): Define.

Diff:
---
 gcc/dwarf2codeview.cc | 91 +++
 gcc/dwarf2codeview.h  |  1 +
 2 files changed, 86 insertions(+), 6 deletions(-)

diff --git a/gcc/dwarf2codeview.cc b/gcc/dwarf2codeview.cc
index 9c6614f6297..9e3b64522b2 100644
--- a/gcc/dwarf2codeview.cc
+++ b/gcc/dwarf2codeview.cc
@@ -1454,6 +1454,72 @@ write_lf_structure (codeview_custom_type *t)
   asm_fprintf (asm_out_file, "%LLcv_type%x_end:\n", t->num);
 }
 
+/* Write an LF_UNION type.  */
+
+static void
+write_lf_union (codeview_custom_type *t)
+{
+  size_t name_len, leaf_len;
+
+  /* This is lf_union in binutils and lfUnion in Microsoft's cvinfo.h:
+
+struct lf_union
+{
+  uint16_t size;
+  uint16_t kind;
+  uint16_t num_members;
+  uint16_t properties;
+  uint32_t field_list;
+  uint16_t length;
+  char name[];
+} ATTRIBUTE_PACKED;
+  */
+
+  fputs (integer_asm_op (2, false), asm_out_file);
+  asm_fprintf (asm_out_file, "%LLcv_type%x_end - %LLcv_type%x_start\n",
+  t->num, t->num);
+
+  asm_fprintf (asm_out_file, "%LLcv_type%x_start:\n", t->num);
+
+  fputs (integer_asm_op (2, false), asm_out_file);
+  fprint_whex (asm_out_file, t->kind);
+  putc ('\n', asm_out_file);
+
+  fputs (integer_asm_op (2, false), asm_out_file);
+  fprint_whex (asm_out_file, t->lf_structure.num_members);
+  putc ('\n', asm_out_file);
+
+  fputs (integer_asm_op (2, false), asm_out_file);
+  fprint_whex (asm_out_file, t->lf_structure.properties);
+  putc ('\n', asm_out_file);
+
+  fputs (integer_asm_op (4, false), asm_out_file);
+  fprint_whex (asm_out_file, t->lf_structure.field_list);
+  putc ('\n', asm_out_file);
+
+  leaf_len = 12 + write_cv_integer (>lf_structure.length);
+
+  if (t->lf_structure.name)
+{
+  name_len = strlen (t->lf_structure.name) + 1;
+  ASM_OUTPUT_ASCII (asm_out_file, t->lf_structure.name, name_len);
+}
+  else
+{
+  static const char unnamed_struct[] = "";
+
+  name_len = sizeof (unnamed_struct);
+  ASM_OUTPUT_ASCII (asm_out_file, unnamed_struct, name_len);
+}
+
+  leaf_len += name_len;
+  write_cv_padding (4 - (leaf_len % 4));
+
+  free (t->lf_structure.name);
+
+  asm_fprintf (asm_out_file, "%LLcv_type%x_end:\n", t->num);
+}
+
 /* Write the .debug$T section, which contains all of our custom type
definitions.  */
 
@@ -1492,6 +1558,10 @@ write_custom_types (void)
case LF_CLASS:
  write_lf_structure (custom_types);
  break;
+
+   case LF_UNION:
+ write_lf_union (custom_types);
+ break;
}
 
   free (custom_types);
@@ -2026,7 +2096,7 @@ flush_deferred_types (void)
   last_deferred_type = NULL;
 }
 
-/* Add a forward definition for a struct or class.  */
+/* Add a forward definition for a struct, class, or union.  */
 
 static uint32_t
 add_struct_forward_def (dw_die_ref type)
@@ -2047,6 +2117,10 @@ add_struct_forward_def (dw_die_ref type)
   ct->kind = LF_STRUCTURE;
   break;
 
+case DW_TAG_union_type:
+  ct->kind = LF_UNION;
+  break;
+
 default:
   break;
 }
@@ -2068,9 +2142,9 @@ add_struct_forward_def (dw_die_ref type)
   return ct->num;
 }
 
-/* Process a DW_TAG_structure_type or DW_TAG_class_type DIE, add an
-   LF_FIELDLIST and an LF_STRUCTURE / LF_CLASS type, and return the number of
-   the latter.  */
+/* Process a DW_TAG_structure_type, DW_TAG_class_type, or DW_TAG_union_type
+   DIE, add an LF_FIELDLIST and an LF_STRUCTURE / LF_CLASS / LF_UNION type,
+   and return the number of the latter.  */
 
 static uint32_t
 get_type_num_struct (dw_die_ref type, bool in_struct, bool *is_fwd_ref)
@@ -2227,8 +2301,8 @@ get_type_num_struct (dw_die_ref type, bool in_struct, 
bool *is_fwd_ref)
   ct = ct2;
 }
 
-  /* Now add an LF_STRUCTURE / LF_CLASS, pointing to the LF_FIELDLIST we just
- added.  */
+  /* Now add an LF_STRUCTURE / LF_CLASS / LF_UNION, pointing to the
+ LF_FIELDLIST we just added.  */
 
   ct = (codeview_custom_type *) xmalloc (sizeof (codeview_custom_type));
 
@@ -2244,6 +2318,10 @@ get_type_num_struct (dw_die_ref type, bool in_struct, 
bool *is_fwd_ref)
   ct->kind = LF_STRUCTURE;
   break;
 
+case DW_TAG_union_type:
+  ct->kind = LF_UNION;
+  break;
+
 default:
   break;
 }
@@ -2325,6 +2403,7 @@ get_type_num (dw_die_ref 

[gcc r15-1624] [PATCH v2 3/3] RISC-V: cmpmem for RISCV with V extension

2024-06-25 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:b1e828dd9694294de1ec71e319d32a6b30b087d8

commit r15-1624-gb1e828dd9694294de1ec71e319d32a6b30b087d8
Author: Sergei Lewis 
Date:   Tue Jun 25 15:26:14 2024 -0600

[PATCH v2 3/3] RISC-V: cmpmem for RISCV with V extension

So this is the cmpmem patch from Sergei, updated for the trunk.

Updates included adjusting the existing cmpmemsi expander to
conditionally try expansion via vector.  And a minor testsuite
adjustment to turn off vector expansion in one test that is primarily
focused on vset optimization and ensuring we don't have extras.

I've spun this in my tester successfully and just want to see a clean
run through precommit CI before moving forward.

Jeff
gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_vector::expand_vec_cmpmem): New
function declaration.
* config/riscv/riscv-string.cc (riscv_vector::expand_vec_cmpmem): 
New
function.
* config/riscv/riscv.md (cmpmemsi): Try 
riscv_vector::expand_vec_cmpmem
for constant lengths.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/cmpmem-1.c: New codegen tests
* gcc.target/riscv/rvv/base/cmpmem-2.c: New execution tests
* gcc.target/riscv/rvv/base/cmpmem-3.c: New codegen tests
* gcc.target/riscv/rvv/base/cmpmem-4.c: New codegen tests
* gcc.target/riscv/rvv/autovec/vls/misalign-1.c: Turn off vector 
mem* and
str* handling.

Diff:
---
 gcc/config/riscv/riscv-protos.h|   1 +
 gcc/config/riscv/riscv-string.cc   | 100 +
 gcc/config/riscv/riscv.md  |   7 +-
 .../gcc.target/riscv/rvv/autovec/vls/misalign-1.c  |   2 +-
 gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-1.c |  88 ++
 gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-2.c |  74 +++
 gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-3.c |  45 ++
 gcc/testsuite/gcc.target/riscv/rvv/base/cmpmem-4.c |  62 +
 8 files changed, 377 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index a3380d4250d..a8b76173fa0 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -679,6 +679,7 @@ void expand_rawmemchr (machine_mode, rtx, rtx, rtx, bool = 
false);
 bool expand_strcmp (rtx, rtx, rtx, rtx, unsigned HOST_WIDE_INT, bool);
 void emit_vec_extract (rtx, rtx, rtx);
 bool expand_vec_setmem (rtx, rtx, rtx);
+bool expand_vec_cmpmem (rtx, rtx, rtx, rtx);
 
 /* Rounding mode bitfield for fixed point VXRM.  */
 enum fixed_point_rounding_mode
diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 1ddebdcee3f..257a514d290 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -1605,4 +1605,104 @@ expand_vec_setmem (rtx dst_in, rtx length_in, rtx 
fill_value_in)
   return true;
 }
 
+/* Used by cmpmemsi in riscv.md.  */
+
+bool
+expand_vec_cmpmem (rtx result_out, rtx blk_a_in, rtx blk_b_in, rtx length_in)
+{
+  HOST_WIDE_INT lmul;
+  /* Check we are able and allowed to vectorise this operation;
+ bail if not.  */
+  if (!check_vectorise_memory_operation (length_in, lmul))
+return false;
+
+  /* Strategy:
+ load entire blocks at a and b into vector regs
+ generate mask of bytes that differ
+ find first set bit in mask
+ find offset of first set bit in mask, use 0 if none set
+ result is ((char*)a[offset] - (char*)b[offset])
+   */
+
+  machine_mode vmode
+  = riscv_vector::get_vector_mode (QImode, BYTES_PER_RISCV_VECTOR * lmul)
+ .require ();
+  rtx blk_a_addr = copy_addr_to_reg (XEXP (blk_a_in, 0));
+  rtx blk_a = change_address (blk_a_in, vmode, blk_a_addr);
+  rtx blk_b_addr = copy_addr_to_reg (XEXP (blk_b_in, 0));
+  rtx blk_b = change_address (blk_b_in, vmode, blk_b_addr);
+
+  rtx vec_a = gen_reg_rtx (vmode);
+  rtx vec_b = gen_reg_rtx (vmode);
+
+  machine_mode mask_mode = get_mask_mode (vmode);
+  rtx mask = gen_reg_rtx (mask_mode);
+  rtx mismatch_ofs = gen_reg_rtx (Pmode);
+
+  rtx ne = gen_rtx_NE (mask_mode, vec_a, vec_b);
+  rtx vmsops[] = { mask, ne, vec_a, vec_b };
+  rtx vfops[] = { mismatch_ofs, mask };
+
+  /* If the length is exactly vlmax for the selected mode, do that.
+ Otherwise, use a predicated store.  */
+
+  if (known_eq (GET_MODE_SIZE (vmode), INTVAL (length_in)))
+{
+  emit_move_insn (vec_a, blk_a);
+  emit_move_insn (vec_b, blk_b);
+  emit_vlmax_insn (code_for_pred_cmp (vmode), riscv_vector::COMPARE_OP,
+  vmsops);
+
+  emit_vlmax_insn (code_for_pred_ffs (mask_mode, Pmode),
+  riscv_vector::CPOP_OP, vfops);
+}
+  else
+{
+  if (!satisfies_constraint_K (length_in))
+ length_in = force_reg (Pmode, length_in);
+
+  rtx memmask = CONSTM1_RTX 

[gcc r15-1617] [committed] Fix fr30-elf newlib build failure with late-combine

2024-06-25 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:7c28228cda274484b78611ea4c5cbe4ce08f512e

commit r15-1617-g7c28228cda274484b78611ea4c5cbe4ce08f512e
Author: Jeff Law 
Date:   Tue Jun 25 11:22:01 2024 -0600

[committed] Fix fr30-elf newlib build failure with late-combine

So the late combine work has exposed a latent bug in the fr30 port.

The fr30 "call" instruction is pc-relative with a *very* limited range, 12 
bits
to be precise.

With such a limited range its hard to see how we could ever consistently 
use it
in the compiler, with the possible exception of self-recursion.  Even for a
call to a locally binding function -ffunction-sections and linker placement 
of
functions may separate the caller/callee.  Code generation seemed to be 
using
indirect forms pretty consistently, though the RTL would allow direct calls.

With late-combine some of those indirects would be optimized into direct 
calls.
This naturally led to out of range scenarios.

With the fr30 port slated for removal unless it gets updated to use LRA and 
the
fundamental problems using direct calls, I took the shortest path to keep
things working -- namely forcing all calls to be indirect.

Tested in my tester with no regressions (and fixes the newlib build failure
with late-combine enabled).Pushed to the trunk.

gcc/
* config/fr30/constraints.md (Q): Remove unused constraint.
* config/fr30/predicates.md (call_operand): Remove unused predicate.
* config/fr30/fr30.md (call, vall_value): Turn into expanders and
force the call address into a register.
(*call, *call_value): Adjust to only allow indirect calls.  Adjust
output template accordingly.

Diff:
---
 gcc/config/fr30/constraints.md |  6 --
 gcc/config/fr30/fr30.md| 26 --
 gcc/config/fr30/predicates.md  | 10 --
 3 files changed, 20 insertions(+), 22 deletions(-)

diff --git a/gcc/config/fr30/constraints.md b/gcc/config/fr30/constraints.md
index e4e2be1bfd9..1beee7cc3a2 100644
--- a/gcc/config/fr30/constraints.md
+++ b/gcc/config/fr30/constraints.md
@@ -63,9 +63,3 @@
   "An integer in the range -256 to 255."
   (and (match_code "const_int")
(match_test "IN_RANGE (ival, -256, 255)")))
-
-;; Extra constraints.
-(define_constraint "Q"
-  "@internal"
-  (and (match_code "mem")
-   (match_code "symbol_ref" "0")))
diff --git a/gcc/config/fr30/fr30.md b/gcc/config/fr30/fr30.md
index ecde60b455d..04f6d909054 100644
--- a/gcc/config/fr30/fr30.md
+++ b/gcc/config/fr30/fr30.md
@@ -1079,12 +1079,19 @@
 ;; `SImode', except it is normally a `const_int'); operand 2 is the number of
 ;; registers used as operands.
 
-(define_insn "call"
-  [(call (match_operand 0 "call_operand" "Qm")
+(define_expand "call"
+  [(parallel [(call (mem:QI (match_operand:SI  0 "register_operand" "r"))
+(match_operand 1 "" "g"))
+  (clobber (reg:SI 17))])]
+  ""
+  " { operands[0] = force_reg (SImode, XEXP (operands[0], 0)); } ")
+
+(define_insn "*call"
+  [(call (mem:QI (match_operand:SI 0 "register_operand" "r"))
 (match_operand 1 "" "g"))
(clobber (reg:SI 17))]
   ""
-  "call%#\\t%0"
+  "call%#\\t@%0"
   [(set_attr "delay_type" "delayed")]
 )
 
@@ -1094,14 +1101,21 @@
 ;; increased by one).
 
 ;; Subroutines that return `BLKmode' objects use the `call' insn.
+(define_expand "call_value"
+  [(parallel [(set (match_operand 0 "register_operand"  "=r")
+   (call (mem:QI (match_operand:SI 1 "register_operand" "r"))
+ (match_operand 2 "" "g")))
+  (clobber (reg:SI 17))])]
+  ""
+  " { operands[1] = force_reg (SImode, XEXP (operands[1], 0)); } ")
 
-(define_insn "call_value"
+(define_insn "*call_value"
   [(set (match_operand 0 "register_operand"  "=r")
-   (call (match_operand 1 "call_operand" "Qm")
+   (call (mem:QI (match_operand:SI 1 "register_operand" "r"))
  (match_operand 2 "" "g")))
(clobber (reg:SI 17))]
   ""
-  "call%#\\t%1"
+  "call%#\\t@%1"
   [(set_attr "delay_type" "delayed")]
 )
 
diff --git a/gcc/config/fr30/predicates.md b/gcc/config/fr30/predicates.md
index f67c6097430..2f87a4c81b2 100644
--- a/gcc/config/fr30/predicates.md
+++ b/gcc/config/fr30/predicates.md
@@ -51,16 +51,6 @@
  && REGNO (op) <= 7);
 })
 
-;; Returns true if OP is suitable for use in a CALL insn.
-
-(define_predicate "call_operand"
-  (match_code "mem")
-{
-  return (GET_CODE (op) == MEM
- && (GET_CODE (XEXP (op, 0)) == SYMBOL_REF
- || GET_CODE (XEXP (op, 0)) == REG));
-})
-
 ;; Returns TRUE if OP is a valid operand of a DImode operation.
 
 (define_predicate "di_operand"


[gcc r15-1596] [PATCH 07/11] Handle structs and classes for CodeView

2024-06-24 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:4f86d2a5c0246a90d8d20fb325572a97f3ce7080

commit r15-1596-g4f86d2a5c0246a90d8d20fb325572a97f3ce7080
Author: Mark Harmstone 
Date:   Mon Jun 24 23:39:46 2024 -0600

[PATCH 07/11] Handle structs and classes for CodeView

Translates DW_TAG_structure_type DIEs into LF_STRUCTURE symbols, and
DW_TAG_class_type DIEs into LF_CLASS symbols.

gcc/
* dwarf2codeview.cc (struct codeview_type): Add is_fwd_ref member.
(struct codeview_subtype): Add lf_member to union.
(struct codeview_custom_type): Add lf_structure to union.
(struct codeview_deferred_type): New structure.
(deferred_types, last_deferred_type): New variables.
(get_type_num): Add new args to prototype.
(write_lf_fieldlist): Handle LF_MEMBER subtypes.
(write_lf_structure): New function.
(write_custom_types): Call write_lf_structure.
(get_type_num_pointer_type): Add in_struct argument.
(get_type_num_const_type): Likewise.
(get_type_num_volatile_type): Likewise.
(add_enum_forward_def): Fix get_type_num call.
(get_type_num_enumeration_type): Add in-struct argument.
(add_deferred_type, flush_deferred_types): New functions.
(add_struct_forward_def, get_type_num_struct): Likewise.
(get_type_num): Handle self-referential structs.
(add_variable): Fix get_type_num call.
(codeview_debug_early_finish): Call flush_deferred_types.
* dwarf2codeview.h (LF_CLASS, LF_STRUCTURE, LF_MEMBER): Define.

Diff:
---
 gcc/dwarf2codeview.cc | 513 +++---
 gcc/dwarf2codeview.h  |   3 +
 2 files changed, 493 insertions(+), 23 deletions(-)

diff --git a/gcc/dwarf2codeview.cc b/gcc/dwarf2codeview.cc
index 475a53573e9..9c6614f6297 100644
--- a/gcc/dwarf2codeview.cc
+++ b/gcc/dwarf2codeview.cc
@@ -158,6 +158,7 @@ struct codeview_type
 {
   dw_die_ref die;
   uint32_t num;
+  bool is_fwd_ref;
 };
 
 struct die_hasher : free_ptr_hash 
@@ -197,6 +198,13 @@ struct codeview_subtype
 {
   uint32_t type_num;
 } lf_index;
+struct
+{
+  uint16_t attributes;
+  uint32_t type;
+  codeview_integer offset;
+  char *name;
+} lf_member;
   };
 };
 
@@ -232,9 +240,25 @@ struct codeview_custom_type
   uint32_t fieldlist;
   char *name;
 } lf_enum;
+struct
+{
+  uint16_t num_members;
+  uint16_t properties;
+  uint32_t field_list;
+  uint32_t derived_from;
+  uint32_t vshape;
+  codeview_integer length;
+  char *name;
+} lf_structure;
   };
 };
 
+struct codeview_deferred_type
+{
+  struct codeview_deferred_type *next;
+  dw_die_ref type;
+};
+
 static unsigned int line_label_num;
 static unsigned int func_label_num;
 static unsigned int sym_label_num;
@@ -249,8 +273,9 @@ static uint32_t last_file_id;
 static codeview_symbol *sym, *last_sym;
 static hash_table *types_htab;
 static codeview_custom_type *custom_types, *last_custom_type;
+static codeview_deferred_type *deferred_types, *last_deferred_type;
 
-static uint32_t get_type_num (dw_die_ref type);
+static uint32_t get_type_num (dw_die_ref type, bool in_struct, bool 
no_fwd_ref);
 
 /* Record new line number against the current function.  */
 
@@ -1217,6 +1242,51 @@ write_lf_fieldlist (codeview_custom_type *t)
  free (v->lf_enumerate.name);
  break;
 
+   case LF_MEMBER:
+ /* This is lf_member in binutils and lfMember in Microsoft's
+cvinfo.h:
+
+   struct lf_member
+   {
+ uint16_t kind;
+ uint16_t attributes;
+ uint32_t type;
+ uint16_t offset;
+ char name[];
+   } ATTRIBUTE_PACKED;
+ */
+
+ fputs (integer_asm_op (2, false), asm_out_file);
+ fprint_whex (asm_out_file, LF_MEMBER);
+ putc ('\n', asm_out_file);
+
+ fputs (integer_asm_op (2, false), asm_out_file);
+ fprint_whex (asm_out_file, v->lf_member.attributes);
+ putc ('\n', asm_out_file);
+
+ fputs (integer_asm_op (4, false), asm_out_file);
+ fprint_whex (asm_out_file, v->lf_member.type);
+ putc ('\n', asm_out_file);
+
+ leaf_len = 8 + write_cv_integer (>lf_member.offset);
+
+ if (v->lf_member.name)
+   {
+ name_len = strlen (v->lf_member.name) + 1;
+ ASM_OUTPUT_ASCII (asm_out_file, v->lf_member.name, name_len);
+   }
+ else
+   {
+ name_len = 1;
+ ASM_OUTPUT_ASCII (asm_out_file, "", name_len);
+   }
+
+ leaf_len += name_len;
+ write_cv_padding (4 - (leaf_len % 4));
+
+ free (v->lf_member.name);
+ break;
+
case LF_INDEX:
  /* This is lf_index in binutils and lfIndex in Microsoft's cvinfo.h:
 
@@ -1308,6 +1378,82 @@ write_lf_enum 

[gcc r15-1595] [committed][RISC-V] Fix some of the testsuite fallout from late-combine patch

2024-06-24 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:41ff74aa581ed38d04c46e6c8839eab48e1b63de

commit r15-1595-g41ff74aa581ed38d04c46e6c8839eab48e1b63de
Author: Jeff Law 
Date:   Mon Jun 24 23:22:21 2024 -0600

[committed][RISC-V] Fix some of the testsuite fallout from late-combine 
patch

This fixes most, but not all of the testsuite fallout from the late-combine
patch.  Specifically in the vector space we're often able to eliminate a
broadcast of an scalar element across a vector.  That eliminates the vsetvl
related to the broadcast, but more importantly from the testsuite 
standpoint it
turns .vv forms into .vf or .vx forms.

There were two paths we could have taken here.  One to accept .v*, ignoring 
the
actual register operands.  Or to create new matches for the .vx and .vf
variants.  I selected the latter as I'd like us to know if the code to avoid
the broadcast regresses.

I'm pushing this through now so that we've got cleaner results and to 
prevent
duplicate work.  I've got patch for the rest of the testsuite fallout, but I
want to think about them a bit.

gcc/testsuite
* gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv-nofm.c: Adjust
expected test output after late-combine changes.
* gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv-nofm.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv-nofm.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv64gcv-nofm.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv64gcv.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vrem-rv32gcv.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vrem-rv64gcv.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vmul-rv32gcv-nofm.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vmul-rv64gcv-nofm.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv-nofm.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vsub-rv64gcv-nofm.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_copysign-rv32gcv.c: 
Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_copysign-rv64gcv.c: 
Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-1.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-2.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-3.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-4.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-1.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-3.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-4.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-5.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-6.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-1.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-2.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-3.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-4.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh-1.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh-2.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh-3.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh-4.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-1.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-2.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-3.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-4.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh-1.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh-2.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh-3.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh-4.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-1.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-3.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-4.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-5.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-6.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-1.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-2.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-3.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-4.c: Likewise.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-5.c: Likewise.

Diff:
---
 

  1   2   3   4   5   6   7   8   9   10   >