Re: [PATCH] Fortran: fix associate with assumed-length character array [PR115700]

2024-07-03 Thread Andre Vehreschild
Hi Harald,

that patch looks fine to me. I'd say ok for mainline. Give it some time there
and then backport it.

Thanks for the patch.

Regards,
Andre

On Tue, 2 Jul 2024 21:44:01 +0200
Harald Anlauf  wrote:

> Dear all,
>
> the attached patch addresses an effectively bogus warning about
> uninitialized temporary string lengths of associate selectors.
> The primary reason is that the array descriptor for a character
> array is created before the corresponding string length is set.
> Moving the setting of the string length temporary to the beginning
> of the block solves the issue.
>
> The patch does not solve the case for the target containing
> substring references.  This needs to be addressed separately.
> (So far I could not find a solution that does not regress.)
>
> Regtested on x86_64-pc-linux-gnu.  OK for mainline?
>
> As the PR is marked as a regression, is it also OK for backporting?
>
> Thanks,
> Harald
>


--
Andre Vehreschild * Email: vehre ad gmx dot de


[PATCH] tree-optimization/115764 - testcase for BB SLP issue

2024-07-03 Thread Richard Biener
The following adds a testcase for a CSE issue with BB SLP two-operator
handling, exposed when we make those nodes CSE-aware by providing
SLP_TREE_SCALAR_STMTS for them.  This was reduced from 526.blender_r.
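For reference, a minimal sketch of the two-operator shape involved (illustrative names, not the reduced testcase): adjacent lanes apply '+' and '-' to the same operands, which BB SLP models as a single two-operator node.

```c
#include <assert.h>

/* Minimal two-operator shape: lane 0 uses PLUS, lane 1 uses MINUS on the
   same operands.  Illustrative only; the committed testcase is reduced
   from 526.blender_r.  */
static void
two_op (float *restrict r, float a, float b)
{
  r[0] = a + b;   /* lane 0 */
  r[1] = a - b;   /* lane 1 */
}
```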

Tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/115764
* gcc.dg/vect/bb-slp-76.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-76.c | 30 +++
 1 file changed, 30 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-76.c

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-76.c 
b/gcc/testsuite/gcc.dg/vect/bb-slp-76.c
new file mode 100644
index 000..b3b6a58e7c7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-76.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-ffast-math" } */
+
+typedef struct {
+  float xmin, xmax;
+} rctf;
+int U_0;
+float BLI_rctf_size_x_rct_1, view_zoomdrag_apply_dx;
+void *view_zoomdrag_apply_op_0;
+float RNA_float_get();
+typedef struct {
+  rctf cur;
+} View2D;
+typedef struct {
+  View2D v2d;
+} v2dViewZoomData;
+void view_zoomdrag_apply() {
+  v2dViewZoomData *vzd = view_zoomdrag_apply_op_0;
+  View2D *v2d = &vzd->v2d;
+  view_zoomdrag_apply_dx = RNA_float_get();
+  if (U_0) {
+    float mval_fac = BLI_rctf_size_x_rct_1, mval_faci = mval_fac,
+          ofs = mval_faci * view_zoomdrag_apply_dx;
+    v2d->cur.xmin += ofs + view_zoomdrag_apply_dx;
+    v2d->cur.xmax += ofs - view_zoomdrag_apply_dx;
+  } else {
+    v2d->cur.xmin += view_zoomdrag_apply_dx;
+    v2d->cur.xmax -= view_zoomdrag_apply_dx;
+  }
+}
-- 
2.35.3


[PATCH][v2] Handle NULL stmt in SLP_TREE_SCALAR_STMTS

2024-07-03 Thread Richard Biener
The following starts to handle NULL elements in SLP_TREE_SCALAR_STMTS
with the first candidate being the two-operator nodes where some
lanes are do-not-care and also do not have a scalar stmt computing
the result.  I originally added SLP_TREE_SCALAR_STMTS to two-operator
nodes but this exposes PR115764, so I've split that out.

I have a patch that uses NULL elements for loads from groups with gaps,
where we currently get around not doing that by having a load permutation.
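For illustration, a hedged example of the kind of gapped load group this refers to (names are made up): the store group a[0..1] consumes b[0] and b[2], so b[1] is a gap in the load group — a candidate for a do-not-care (NULL) lane instead of a load permutation.

```c
#include <assert.h>

/* Hypothetical gapped load group: a[0..1] consumes b[0] and b[2], so
   b[1] is a gap -- its lane has no scalar stmt computing a result.  */
static void
gapped_load (double *restrict a, const double *restrict b)
{
  a[0] = b[0];
  a[1] = b[2];
}
```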

I'm currently re-bootstrapping and testing this; it passed multiple
testing rounds before (with the two-operator change).

Richard.

* tree-vect-slp.cc (bst_traits::hash): Handle NULL elements
in SLP_TREE_SCALAR_STMTS.
(vect_print_slp_tree): Likewise.
(vect_mark_slp_stmts): Likewise.
(vect_mark_slp_stmts_relevant): Likewise.
(vect_find_last_scalar_stmt_in_slp): Likewise.
(vect_bb_slp_mark_live_stmts): Likewise.
(vect_slp_prune_covered_roots): Likewise.
(vect_bb_partition_graph_r): Likewise.
(vect_remove_slp_scalar_calls): Likewise.
(vect_slp_gather_vectorized_scalar_stmts): Likewise.
(vect_bb_slp_scalar_cost): Likewise.
(vect_contains_pattern_stmt_p): Likewise.
(vect_slp_convert_to_external): Likewise.
(vect_find_first_scalar_stmt_in_slp): Likewise.
(vect_optimize_slp_pass::remove_redundant_permutations): Likewise.
(vect_slp_analyze_node_operations_1): Likewise.
(vect_schedule_slp_node): Likewise.
* tree-vect-stmts.cc (can_vectorize_live_stmts): Likewise.
(vectorizable_shift): Likewise.
* tree-vect-data-refs.cc (vect_slp_analyze_load_dependences):
Handle NULL elements in SLP_TREE_SCALAR_STMTS.
---
 gcc/tree-vect-data-refs.cc |  2 +
 gcc/tree-vect-slp.cc   | 76 +++---
 gcc/tree-vect-stmts.cc | 22 ++-
 3 files changed, 61 insertions(+), 39 deletions(-)

diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index 959e127c385..39fd887a96b 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -1041,6 +1041,8 @@ vect_slp_analyze_load_dependences (vec_info *vinfo, 
slp_tree node,
 
   for (unsigned k = 0; k < SLP_TREE_SCALAR_STMTS (node).length (); ++k)
 {
+  if (! SLP_TREE_SCALAR_STMTS (node)[k])
+   continue;
   stmt_vec_info access_info
= vect_orig_stmt (SLP_TREE_SCALAR_STMTS (node)[k]);
   if (access_info == first_access_info)
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index b060161c021..7a9aa86f517 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -356,7 +356,7 @@ vect_contains_pattern_stmt_p (vec stmts)
   stmt_vec_info stmt_info;
   unsigned int i;
   FOR_EACH_VEC_ELT (stmts, i, stmt_info)
-if (is_pattern_stmt_p (stmt_info))
+if (stmt_info && is_pattern_stmt_p (stmt_info))
   return true;
   return false;
 }
@@ -1592,7 +1592,7 @@ bst_traits::hash (value_type x)
 {
   inchash::hash h;
   for (unsigned i = 0; i < x.length (); ++i)
-h.add_int (gimple_uid (x[i]->stmt));
+h.add_int (x[i] ? gimple_uid (x[i]->stmt) : -1);
   return h.end ();
 }
 inline bool
@@ -2801,9 +2801,12 @@ vect_print_slp_tree (dump_flags_t dump_kind, 
dump_location_t loc,
 }
   if (SLP_TREE_SCALAR_STMTS (node).exists ())
 FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info)
-  dump_printf_loc (metadata, user_loc, "\t%sstmt %u %G",
-  STMT_VINFO_LIVE_P (stmt_info) ? "[l] " : "",
-  i, stmt_info->stmt);
+  if (stmt_info)
+   dump_printf_loc (metadata, user_loc, "\t%sstmt %u %G",
+STMT_VINFO_LIVE_P (stmt_info) ? "[l] " : "",
+i, stmt_info->stmt);
+  else
+   dump_printf_loc (metadata, user_loc, "\tstmt %u ---\n", i);
   else
 {
   dump_printf_loc (metadata, user_loc, "\t{ ");
@@ -2944,7 +2947,8 @@ vect_mark_slp_stmts (slp_tree node, hash_set 
&visited)
 return;
 
   FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info)
-STMT_SLP_TYPE (stmt_info) = pure_slp;
+if (stmt_info)
+  STMT_SLP_TYPE (stmt_info) = pure_slp;
 
   FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
 if (child)
@@ -2974,11 +2978,12 @@ vect_mark_slp_stmts_relevant (slp_tree node, 
hash_set &visited)
 return;
 
   FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info)
-{
-  gcc_assert (!STMT_VINFO_RELEVANT (stmt_info)
-  || STMT_VINFO_RELEVANT (stmt_info) == vect_used_in_scope);
-  STMT_VINFO_RELEVANT (stmt_info) = vect_used_in_scope;
-}
+if (stmt_info)
+  {
+   gcc_assert (!STMT_VINFO_RELEVANT (stmt_info)
+   || STMT_VINFO_RELEVANT (stmt_info) == vect_used_in_scope);
+   STMT_VINFO_RELEVANT (stmt_info) = vect_used_in_scope;
+  }
 
   FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
 if (child)
@@ -3029,10 +3034,11 @@ vect_find_last_scalar_stmt_in_slp

Re: [PATCH v1] RISC-V: Fix asm check failure for truncated after SAT_SUB

2024-07-03 Thread juzhe.zh...@rivai.ai
LGTM



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2024-07-03 13:22
To: gcc-patches
CC: juzhe.zhong; kito.cheng; jeffreyalaw; rdapp.gcc; Pan Li
Subject: [PATCH v1] RISC-V: Fix asm check failure for truncated after SAT_SUB
From: Pan Li 
 
It seems that the asm check is incorrect for the truncation after SAT_SUB;
we should use the vx check for vssubu instead of the vv check.
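As a hedged sketch of the scalar pattern involved (illustrative names; the actual testcases operate on arrays): an unsigned saturating subtract whose result is truncated to a narrower type. When the subtrahend is a loop-invariant scalar, the vectorizer can broadcast it via the .vx form of vssubu instead of materializing a whole vector for the .vv form.

```c
#include <assert.h>
#include <stdint.h>

/* Truncated unsigned saturating subtract: the pattern matched as SAT_SUB
   followed by a narrowing conversion.  Illustrative sketch only.  */
static uint8_t
sat_sub_trunc (uint16_t x, uint16_t y)
{
  /* (x - y) when x >= y, else 0, then truncated to 8 bits.  */
  return (uint8_t) ((x - y) & (uint16_t) -(x >= y));
}
```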
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-1.c:
Update vssubu check from vv to vx.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-2.c:
Ditto.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-3.c:
Ditto.
 
Signed-off-by: Pan Li 
---
.../gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-1.c  | 2 +-
.../gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-2.c  | 2 +-
.../gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-3.c  | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)
 
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-1.c
index dd9e3999a29..1e380657d74 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-1.c
@@ -11,7 +11,7 @@
** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e16,\s*m1,\s*ta,\s*ma
** ...
** vle16\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
-** vssubu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** vssubu\.vx\s+v[0-9]+,\s*v[0-9]+,\s*[atx][0-9]+
** vsetvli\s+zero,\s*zero,\s*e8,\s*mf2,\s*ta,\s*ma
** vncvt\.x\.x\.w\s+v[0-9]+,\s*v[0-9]+
** ...
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-2.c
index 738d1465a01..d7b8931f0ec 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-2.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-2.c
@@ -11,7 +11,7 @@
** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e32,\s*m1,\s*ta,\s*ma
** ...
** vle32\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
-** vssubu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** vssubu\.vx\s+v[0-9]+,\s*v[0-9]+,\s*[atx][0-9]+
** vsetvli\s+zero,\s*zero,\s*e16,\s*mf2,\s*ta,\s*ma
** vncvt\.x\.x\.w\s+v[0-9]+,\s*v[0-9]+
** ...
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-3.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-3.c
index b008b21cf0c..edf42a1f776 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-3.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-3.c
@@ -11,7 +11,7 @@
** vsetvli\s+[atx][0-9]+,\s*[atx][0-9]+,\s*e64,\s*m1,\s*ta,\s*ma
** ...
** vle64\.v\s+v[0-9]+,\s*0\([atx][0-9]+\)
-** vssubu\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** vssubu\.vx\s+v[0-9]+,\s*v[0-9]+,\s*[atx][0-9]+
** vsetvli\s+zero,\s*zero,\s*e32,\s*mf2,\s*ta,\s*ma
** vncvt\.x\.x\.w\s+v[0-9]+,\s*v[0-9]+
** ...
-- 
2.34.1
 
 


[PATCH][committed] Move runtime check into a separate function and guard it with target ("no-avx")

2024-07-03 Thread liuhongt
The patch avoids a SIGILL on non-AVX512 machines caused by a kmovd
instruction being generated in the dynamic check.

Committed as an obvious fix.
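The pattern, sketched in a hedged, self-contained form (using GCC's <cpuid.h>; the bit position is AVX512F in CPUID leaf 7, EBX): the detection routine is compiled with target("no-avx") so no AVX-encoded instructions such as kmovd can leak into code that runs before the check succeeds.

```c
#include <assert.h>
#include <cpuid.h>

/* Runtime AVX512F detection kept in a function compiled without AVX, so
   the check itself cannot fault on older machines.  Sketch only; the
   real avx512-check.h also verifies OSXSAVE/XGETBV state.  */
__attribute__ ((noipa, target ("no-avx")))
static int
avx512f_supported_p (void)
{
  unsigned int eax, ebx, ecx, edx;
  if (!__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx))
    return 0;
  return (ebx >> 16) & 1;  /* AVX512F feature bit.  */
}
```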

gcc/testsuite/ChangeLog:

PR target/115748
* gcc.target/i386/avx512-check.h: Move runtime check into a
separate function and guard it with target ("no-avx").
---
 gcc/testsuite/gcc.target/i386/avx512-check.h | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/avx512-check.h 
b/gcc/testsuite/gcc.target/i386/avx512-check.h
index 0ad9064f637..71858a33dac 100644
--- a/gcc/testsuite/gcc.target/i386/avx512-check.h
+++ b/gcc/testsuite/gcc.target/i386/avx512-check.h
@@ -34,8 +34,9 @@ check_osxsave (void)
   return (ecx & bit_OSXSAVE) != 0;
 }
 
+__attribute__((noipa,target("no-avx")))
 int
-main ()
+avx512_runtime_support_p ()
 {
   unsigned int eax, ebx, ecx, edx;
 
@@ -100,6 +101,17 @@ main ()
   && (edx & bit_AVX512VP2INTERSECT)
 #endif
   && avx512f_os_support ())
+{
+  return 1;
+}
+
+  return 0;
+}
+
+int
+main ()
+{
+  if (avx512_runtime_support_p ())
 {
   DO_TEST ();
 #ifdef DEBUG
-- 
2.31.1



RE: [PATCH v1] RISC-V: Fix asm check failure for truncated after SAT_SUB

2024-07-03 Thread Li, Pan2
Committed, thanks Juzhe.

Pan

From: juzhe.zh...@rivai.ai 
Sent: Wednesday, July 3, 2024 3:24 PM
To: Li, Pan2 ; gcc-patches 
Cc: kito.cheng ; jeffreyalaw ; 
Robin Dapp ; Li, Pan2 
Subject: Re: [PATCH v1] RISC-V: Fix asm check failure for truncated after 
SAT_SUB

LGTM


juzhe.zh...@rivai.ai





[patch,avr,applied] Fix PR98762 partial clobber in movqi output

2024-07-03 Thread Georg-Johann Lay

The movqi output for Reduced Tiny had a wrong condition
and restore statement for the base register in the case
where the destination overlaps the base register.

Applied as obvious.

Johann

--

AVR: target/98762 - Handle partial clobber in movqi output.

PR target/98762
gcc/
* config/avr/avr.cc (avr_out_movqi_r_mr_reg_disp_tiny): Properly
restore the base register when it is partially clobbered.
gcc/testsuite/
* gcc.target/avr/torture/pr98762.c: New test.

diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc
index a110af62cd5..f048bf5fd41 100644
--- a/gcc/config/avr/avr.cc
+++ b/gcc/config/avr/avr.cc
@@ -4838,13 +4838,30 @@ avr_out_movqi_r_mr_reg_disp_tiny (rtx_insn *insn, rtx op[], int *plen)
   rtx dest = op[0];
   rtx src = op[1];
   rtx x = XEXP (src, 0);
+  rtx base = XEXP (x, 0);
 
-  avr_asm_len (TINY_ADIW (%I1, %J1, %o1) CR_TAB
-	   "ld %0,%b1" , op, plen, -3);
+  if (plen)
+*plen = 0;
 
-  if (!reg_overlap_mentioned_p (dest, XEXP (x, 0))
-  && !reg_unused_after (insn, XEXP (x, 0)))
-avr_asm_len (TINY_SBIW (%I1, %J1, %o1), op, plen, 2);
+  if (!reg_overlap_mentioned_p (dest, base))
+{
+  avr_asm_len (TINY_ADIW (%I1, %J1, %o1) CR_TAB
+		   "ld %0,%b1", op, plen, 3);
+  if (!reg_unused_after (insn, base))
+	avr_asm_len (TINY_SBIW (%I1, %J1, %o1), op, plen, 2);
+}
+  else
+{
+  // PR98762: The base register overlaps dest and is only partly clobbered.
+  rtx base2 = all_regs_rtx[1 ^ REGNO (dest)];
+
+  if (!reg_unused_after (insn, base2))
+	avr_asm_len ("mov __tmp_reg__,%0" , &base2, plen, 1);
+  avr_asm_len (TINY_ADIW (%I1, %J1, %o1) CR_TAB
+		   "ld %0,%b1", op, plen, 3);
+  if (!reg_unused_after (insn, base2))
+	avr_asm_len ("mov %0,__tmp_reg__" , &base2, plen, 1);
+}
 
   return "";
 }
diff --git a/gcc/testsuite/gcc.target/avr/torture/pr98762.c b/gcc/testsuite/gcc.target/avr/torture/pr98762.c
new file mode 100644
index 000..c3ba7da69a8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/avr/torture/pr98762.c
@@ -0,0 +1,19 @@
+/* { dg-do run } */
+/* { dg-additional-options "-std=c99" } */
+
+long long acc = 0x1122334455667788;
+
+__attribute__((noinline,noclone))
+void addhi (short a)
+{
+  acc += (long long) a << 32;
+}
+
+int main (void)
+{
+  addhi (0x0304);
+  if (acc != 0x1122364855667788)
+__builtin_abort();
+
+  return 0;
+}


Re: [RFC/RFA] [PATCH 06/12] aarch64: Implement new expander for efficient CRC computation

2024-07-03 Thread Mariam Arutunian
On Sat, Jun 8, 2024 at 3:41 PM Richard Sandiford 
wrote:

> Mariam Arutunian  writes:
> > This patch introduces two new expanders for the aarch64 backend,
> > dedicated to generate optimized code for CRC computations.
> > The new expanders are designed to leverage specific hardware capabilities
> > to achieve faster CRC calculations,
> > particularly using the pmul or crc32 instructions when supported by the
> > target architecture.
>
> Thanks for porting this to aarch64!
>
> > Expander 1: Bit-Forward CRC (crc4)
> > For targets that support pmul instruction (TARGET_AES),
> > the expander will generate code that uses the pmul (crypto_pmulldi)
> > instruction for CRC computation.
> >
> > Expander 2: Bit-Reversed CRC (crc_rev4)
> > The expander first checks if the target supports the CRC32 instruction
> set
> > (TARGET_CRC32)
> > and the polynomial in use is 0x1EDC6F41 (iSCSI). If the conditions are
> met,
> > it emits calls to the corresponding crc32 instruction (crc32b, crc32h,
> > crc32w, or crc32x depending on the data size).
> > If the target does not support crc32 but supports pmul, it then uses the
> > pmul (crypto_pmulldi) instruction for bit-reversed CRC computation.
> >
> > Otherwise table-based CRC is generated.
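For reference, a bit-at-a-time model of the reversed CRC that the crc32 instructions compute for the iSCSI polynomial 0x1EDC6F41 (reflected constant 0x82F63B78). This is the scalar semantics being accelerated, not the expander's output.

```c
#include <assert.h>
#include <stdint.h>

/* Bit-reflected CRC32C (iSCSI, poly 0x1EDC6F41; reflected 0x82F63B78).
   Reference sketch of the computation the crc32c* instructions perform
   a byte/half/word at a time.  */
static uint32_t
crc32c_byte (uint32_t crc, uint8_t data)
{
  crc ^= data;
  for (int i = 0; i < 8; i++)
    crc = (crc >> 1) ^ (0x82F63B78u & -(crc & 1));
  return crc;
}
```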
> >
> >   gcc/config/aarch64/
> >
> > * aarch64-protos.h (aarch64_expand_crc_using_clmul): New extern
> > function declaration.
> > (aarch64_expand_reversed_crc_using_clmul):  Likewise.
> > * aarch64.cc (aarch64_expand_crc_using_clmul): New function.
> > (aarch64_expand_reversed_crc_using_clmul):  Likewise.
> > * aarch64.md (UNSPEC_CRC, UNSPEC_CRC_REV):  New unspecs.
> > (crc_rev4): New expander for reversed CRC.
> > (crc4): New expander for bit-forward CRC.
> > * iterators.md (crc_data_type): New mode attribute.
> >
> >   gcc/testsuite/gcc.target/aarch64/
> >
> > * crc-1-pmul.c: Likewise.
> > * crc-10-pmul.c: Likewise.
> > * crc-12-pmul.c: Likewise.
> > * crc-13-pmul.c: Likewise.
> > * crc-14-pmul.c: Likewise.
> > * crc-17-pmul.c: Likewise.
> > * crc-18-pmul.c: Likewise.
> > * crc-21-pmul.c: Likewise.
> > * crc-22-pmul.c: Likewise.
> > * crc-23-pmul.c: Likewise.
> > * crc-4-pmul.c: Likewise.
> > * crc-5-pmul.c: Likewise.
> > * crc-6-pmul.c: Likewise.
> > * crc-7-pmul.c: Likewise.
> > * crc-8-pmul.c: Likewise.
> > * crc-9-pmul.c: Likewise.
> > * crc-CCIT-data16-pmul.c: Likewise.
> > * crc-CCIT-data8-pmul.c: Likewise.
> > * crc-coremark-16bitdata-pmul.c: Likewise.
> > * crc-crc32-data16.c: New test.
> > * crc-crc32-data32.c: Likewise.
> > * crc-crc32-data8.c: Likewise.
> >
> > Signed-off-by: Mariam Arutunian 
> > diff --git a/gcc/config/aarch64/aarch64-protos.h
> b/gcc/config/aarch64/aarch64-protos.h
> > index 1d3f94c813e..167e1140f0d 100644
> > --- a/gcc/config/aarch64/aarch64-protos.h
> > +++ b/gcc/config/aarch64/aarch64-protos.h
> > @@ -1117,5 +1117,8 @@ extern void mingw_pe_encode_section_info (tree,
> rtx, int);
> >
> >  bool aarch64_optimize_mode_switching (aarch64_mode_entity);
> >  void aarch64_restore_za (rtx);
> > +void aarch64_expand_crc_using_clmul (rtx *);
> > +void aarch64_expand_reversed_crc_using_clmul (rtx *);
> > +
> >
> >  #endif /* GCC_AARCH64_PROTOS_H */
> > diff --git a/gcc/config/aarch64/aarch64.cc
> b/gcc/config/aarch64/aarch64.cc
> > index ee12d8897a8..05cd0296d38 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -30265,6 +30265,135 @@ aarch64_retrieve_sysreg (const char *regname,
> bool write_p, bool is128op)
> >return sysreg->encoding;
> >  }
> >
> > +/* Generate assembly to calculate CRC
> > +   using carry-less multiplication instruction.
> > +   OPERANDS[1] is input CRC,
> > +   OPERANDS[2] is data (message),
> > +   OPERANDS[3] is the polynomial without the leading 1.  */
> > +
> > +void
> > +aarch64_expand_crc_using_clmul (rtx *operands)
>
> This should probably be pmul rather than clmul.
>
> +{
> > +  /* Check and keep arguments.  */
> > +  gcc_assert (!CONST_INT_P (operands[0]));
> > +  gcc_assert (CONST_INT_P (operands[3]));
> > +  rtx crc = operands[1];
> > +  rtx data = operands[2];
> > +  rtx polynomial = operands[3];
> > +
> > +  unsigned HOST_WIDE_INT
> > +  crc_size = GET_MODE_BITSIZE (GET_MODE (operands[0])).to_constant
> ();
> > +  gcc_assert (crc_size <= 32);
> > +  unsigned HOST_WIDE_INT
> > +  data_size = GET_MODE_BITSIZE (GET_MODE (data)).to_constant ();
>
> We could instead make the interface:
>
> void
> aarch64_expand_crc_using_pmul (scalar_mode crc_mode, scalar_mode data_mode,
>rtx *operands)
>
> so that the lines above don't need the to_constant.  This should "just
> work" on the .md file side, since the modes being passed are naturally
> scalar_mode.
>
> I think it'd be worth asserting also that data_size <= crc_size.
> (Although we could handle any MAX (data_size, crc_size) <= 32
> with some adjustment.)
>
> > +
> > +  /* Calculate the quot

Re: [PATCH] diagnostics: Follow DECL_ABSTRACT_ORIGIN links in lhd_decl_printable_name [PR102061]

2024-07-03 Thread Richard Biener
On Wed, Jul 3, 2024 at 8:18 AM Peter Damianov  wrote:
>
> Currently, if a warning references a cloned function, the name of the cloned
> function will be emitted in the "In function 'xyz'" part of the diagnostic,
> which users aren't supposed to see. This patch follows the 
> DECL_ABSTRACT_ORIGIN
> links until encountering the original function.
>
> gcc/ChangeLog:
> PR diagnostics/102061
> * langhooks.cc (lhd_decl_printable_name): Follow DECL_ABSTRACT_ORIGIN
> links to the source.
>
> Signed-off-by: Peter Damianov 
> ---
>
> I would add a testcase but I'm not familiar with that process, and would need
> some help. I also did not bootstrap or test this patch, I'm posting to see if
> the CI will do it for me.
>
> I used "while" because I'm not sure if there can be clones of clones or not.
> The second check is because I see comments elsewhere that say:
> "DECL_ABSTRACT_ORIGIN can point to itself", so I want to avoid a potential
> infinite loop.
>
>  gcc/langhooks.cc | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/gcc/langhooks.cc b/gcc/langhooks.cc
> index 61f2b676256..89a89b74535 100644
> --- a/gcc/langhooks.cc
> +++ b/gcc/langhooks.cc
> @@ -223,6 +223,8 @@ lhd_get_alias_set (tree ARG_UNUSED (t))
>  const char *
>  lhd_decl_printable_name (tree decl, int ARG_UNUSED (verbosity))
>  {
> +  while (DECL_ABSTRACT_ORIGIN(decl) && DECL_ABSTRACT_ORIGIN(decl) != decl)
> +decl = DECL_ABSTRACT_ORIGIN(decl);

DECL_ABSTRACT_ORIGIN is maintained to point to the original function;
there's no need to "iterate" here.  You should be able to do

 decl = DECL_ORIGIN (decl);

>gcc_assert (decl && DECL_NAME (decl));
>return IDENTIFIER_POINTER (DECL_NAME (decl));
>  }
> --
> 2.39.2
>
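As an illustration of the suggestion above, a hedged sketch using a hypothetical decl structure (not GCC's real tree types): a DECL_ORIGIN-style lookup is a single dereference that falls back to the decl itself when no abstract origin is recorded, so no loop is needed and a self-pointing origin is harmless.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a declaration node; GCC's trees differ.  */
struct decl
{
  struct decl *abstract_origin;  /* NULL, self, or the original decl.  */
  const char *name;
};

/* Mimics the DECL_ORIGIN idiom: one dereference, defaulting to the decl
   itself when there is no abstract origin.  */
static struct decl *
decl_origin (struct decl *d)
{
  return d->abstract_origin ? d->abstract_origin : d;
}
```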


[PATCH] vect: Fix ICE caused by missing check for TREE_CODE == SSA_NAME

2024-07-03 Thread Hu, Lin1
Hi, all

I forgot to check whether the tree's code is SSA_NAME. This is now fixed.

Bootstrapped and regtested on {x86-64, aarch64}-linux-gnu.  OK for trunk?

BRs,
Lin

2024-07-03  Hu, Lin1 
Andrew Pinski 

gcc/ChangeLog:

PR tree-optimization/115753
* tree-vect-stmts.cc (supportable_indirect_convert_operation): Add
TYPE_CODE check before SSA_NAME_RANGE_INFO.

gcc/testsuite/ChangeLog:

PR tree-optimization/115753
* gcc.dg/vect/pr115753-1.c: New test.
* gcc.dg/vect/pr115753-2.c: Ditto.
* gcc.dg/vect/pr115753-3.c: Ditto.
---
 gcc/testsuite/gcc.dg/vect/pr115753-1.c | 12 
 gcc/testsuite/gcc.dg/vect/pr115753-2.c | 20 
 gcc/testsuite/gcc.dg/vect/pr115753-3.c | 15 +++
 gcc/tree-vect-stmts.cc |  2 +-
 4 files changed, 48 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr115753-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr115753-2.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr115753-3.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr115753-1.c 
b/gcc/testsuite/gcc.dg/vect/pr115753-1.c
new file mode 100644
index 000..2c1b6e5df63
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr115753-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -frounding-math" } */
+/* { dg-add-options float16  } */
+/* { dg-require-effective-target float16  } */
+
+void f(_Complex _Float16*);
+void
+foo1 (_Complex _Float16 *d)
+{
+_Complex _Float16 cf = 3967 + 3791 * 1i;
+f(&cf);
+}
diff --git a/gcc/testsuite/gcc.dg/vect/pr115753-2.c 
b/gcc/testsuite/gcc.dg/vect/pr115753-2.c
new file mode 100644
index 000..ceacada2a76
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr115753-2.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -frounding-math" } */
+/* { dg-add-options float16  } */
+/* { dg-require-effective-target float16  } */
+
+void f(_Float16*);
+void
+foo1 ()
+{
+  int t0 = 3967;
+  int t1 = 3969;
+  int t2 = 3971;
+  int t3 = 3973;
+  _Float16 tt[4];
+  tt[0] = t0;
+  tt[1] = t1;
+  tt[2] = t2;
+  tt[3] = t3;
+  f(&tt[0]);
+}
diff --git a/gcc/testsuite/gcc.dg/vect/pr115753-3.c 
b/gcc/testsuite/gcc.dg/vect/pr115753-3.c
new file mode 100644
index 000..8e95445897c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr115753-3.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -frounding-math" } */
+
+void f(float*);
+void
+foo1 ()
+{
+  long long t0 = __LONG_LONG_MAX__;
+  long long t1 = __LONG_LONG_MAX__ - 1;
+  float tt[2];
+  tt[0] = t0;
+  tt[1] = t1;
+  f(&tt[0]);
+}
+
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 20b84515446..b4f346ee6ab 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -14678,7 +14678,7 @@ supportable_indirect_convert_operation (code_helper 
code,
 In the future, if it is supported, changes may need to be made
 to this part, such as checking the RANGE of each element
 in the vector.  */
- if (!SSA_NAME_RANGE_INFO (op0)
+ if ((TREE_CODE (op0) == SSA_NAME && !SSA_NAME_RANGE_INFO (op0))
  || !vect_get_range_info (op0, &op_min_value, &op_max_value))
break;
 
-- 
2.31.1



Re: [Patch, rtl-optimization]: Loop unroll factor based on register pressure

2024-07-03 Thread Richard Biener
On Sun, Jun 30, 2024 at 4:15 AM Ajit Agarwal  wrote:
>
> Hello All:
>
> This patch determines the unroll factor based on loop register pressure.
>
> The unroll factor is the maximum number of available registers in the
> loop divided by the number of live registers.
>
> As the number of available registers increases, the unroll factor
> increases; it decreases as the number of live registers increases.

Unrolling as implemented does not increase register lifetime unless
-fsplit-ivs-in-unroller or -fvariable-expansion-in-unroller.  But I do not
see you looking at those transforms at all.

Richard.

> Loop unrolling is based on the loop variables that determine the unroll
> factor.  These are the variables that increase register pressure; the
> calculation takes advantage of the existing register pressure
> computation.
>
> The number of available registers is the number of hard registers
> available for each register class minus the loop's maximum register
> pressure for that class.
>
> Bootstrapped and regtested on powerpc64-linux-gnu.
>
> Thanks & Regards
> Ajit
>
>
> rtl-optimization: Loop unroll factor based on register pressure
>
> The unroll factor is calculated based on loop register pressure.
>
> It is the maximum number of available registers in the loop divided
> by the number of live registers.
>
> As the number of available registers increases, the unroll factor
> increases; it decreases as the number of live registers increases.
>
> Loop unrolling is based on the loop variables that determine the unroll
> factor.  These are the variables that increase register pressure; the
> calculation takes advantage of the existing register pressure
> computation.
>
> The number of available registers is the number of hard registers
> available for each register class minus the loop's maximum register
> pressure for that class.
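The described computation, as a hedged arithmetic sketch (function and parameter names are illustrative, not from the patch): available registers per class are hard registers minus the loop's maximum pressure, and the unroll factor is that quantity divided by the live count, clamped to at least 1.

```c
#include <assert.h>

/* Illustrative model of the described formula: available = hard regs in
   the class minus the loop's max pressure; factor = available / live
   count, never below 1.  Not the patch's actual implementation.  */
static unsigned
estimate_unroll_factor (unsigned hard_regs, unsigned max_pressure,
                        unsigned n_live)
{
  unsigned avail = hard_regs > max_pressure ? hard_regs - max_pressure : 0;
  unsigned factor = n_live ? avail / n_live : avail;
  return factor ? factor : 1;
}
```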
>
> 2024-06-29  Ajit Kumar Agarwal  
>
> gcc/ChangeLog:
>
> * loop-unroll.cc: Add calculation of register pressure of
> the loop and use of that to calculate unroll factor.
> ---
>  gcc/loop-unroll.cc | 331 -
>  1 file changed, 328 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/loop-unroll.cc b/gcc/loop-unroll.cc
> index bfdfe6c2bb7..6936ba7afb9 100644
> --- a/gcc/loop-unroll.cc
> +++ b/gcc/loop-unroll.cc
> @@ -35,6 +35,11 @@ along with GCC; see the file COPYING3.  If not see
>  #include "dojump.h"
>  #include "expr.h"
>  #include "dumpfile.h"
> +#include "regs.h"
> +#include "ira.h"
> +#include "rtl-iter.h"
> +#include "regset.h"
> +#include "df.h"
>
>  /* This pass performs loop unrolling.  We only perform this
> optimization on innermost loops (with single exception) because
> @@ -65,6 +70,38 @@ along with GCC; see the file COPYING3.  If not see
> showed that this choice may affect performance in order of several %.
> */
>
> +class loop_data
> +{
> +public:
> +  class loop *outermost_exit;  /* The outermost exit of the loop.  */
> +  bool has_call;   /* True if the loop contains a call.  */
> +  /* Maximal register pressure inside loop for given register class
> + (defined only for the pressure classes).  */
> +  int max_reg_pressure[N_REG_CLASSES];
> +  /* Loop regs referenced and live pseudo-registers.  */
> +  bitmap_head regs_ref;
> +  bitmap_head regs_live;
> +};
> +
> +#define LOOP_DATA(LOOP) ((class loop_data *) (LOOP)->aux)
> +
> +/* Record all regs that are set in any one insn.  Communication from
> +   mark_reg_{store,clobber} and global_conflicts.  Asm can refer to
> +   all hard-registers.  */
> +static rtx regs_set[(FIRST_PSEUDO_REGISTER > MAX_RECOG_OPERANDS
> +? FIRST_PSEUDO_REGISTER : MAX_RECOG_OPERANDS) * 2];
> +/* Number of regs stored in the previous array.  */
> +static int n_regs_set;
> +
> +/* Currently processed loop.  */
> +static class loop *curr_loop;
> +
> +/* Registers currently living.  */
> +static bitmap_head curr_regs_live;
> +
> +/* Current reg pressure for each pressure class.  */
> +static int curr_reg_pressure[N_REG_CLASSES];
> +
>  /* Information about induction variables to split.  */
>
>  struct iv_to_split
> @@ -272,11 +309,262 @@ decide_unrolling (int flags)
>  }
>  }
>
> +/* Return pressure class and number of needed hard registers (through
> +   *NREGS) of register REGNO.  */
> +static enum reg_class
> +get_regno_pressure_class (int regno, int *nregs)
> +{
> +  if (regno >= FIRST_PSEUDO_REGISTER)
> +{
> +  enum reg_class pressure_class;
> +  pressure_class = reg_allocno_class (regno);
> +  pressure_class = ira_pressure_class_translate[pressure_class];
> +  *nregs
> +   = ira_reg_class_max_nregs[pressure_class][PSEUDO_REGNO_MODE (regno)];
> +  return pressure_class;
> +}
> +  else if (! TEST_HARD_REG_BIT (ira_no_alloc_regs, regno)
> +  && ! TEST_HARD_REG_BIT (eliminable_regset, regno))
> +{
> +  *nregs = 1;
> +  return ira_pressure_class_translate[REGNO_REG_CLASS (regno)];
> +}
> +  else
> +{
> +  *nregs = 0;
> +  return NO_REGS;
> +}

Re: [PATCH] c++: Fix ICE locating 'this' for (not matching) template member function [PR115364]

2024-07-03 Thread Simon Martin
On 29 Jun 2024, at 0:00, Patrick Palka wrote:

> On Fri, 28 Jun 2024, Simon Martin wrote:
>
>> We currently ICE when emitting the error message for this invalid 
>> code:
>>
>> === cut here ===
>> struct foo {
>>   template <int> void not_const() {}
>> };
>> void fn(const foo& obj) {
>>   obj.not_const<5>();
>> }
>> === cut here ===
>>
>> The problem is that get_fndecl_argument_location assumes that it has 
>> a
>> FUNCTION_DECL in its hands to find the location of the bad argument. 
>> It might
>> however have a TEMPLATE_DECL if there's a single candidate that 
>> cannot be
>> instantiated, like here.
>>
>> This patch simply defaults to using the FNDECL's location in this 
>> case, which
>> fixes this PR.
>>
>> Successfully tested on x86_64-pc-linux-gnu.
>>
>>  PR c++/115364
>>
>> gcc/cp/ChangeLog:
>>
>>  * call.cc (get_fndecl_argument_location): Use FNDECL's location for
>>  TEMPLATE_DECLs.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * g++.dg/overload/template7.C: New test.
>>
>> ---
>>  gcc/cp/call.cc| 4 
>>  gcc/testsuite/g++.dg/overload/template7.C | 9 +
>>  2 files changed, 13 insertions(+)
>>  create mode 100644 gcc/testsuite/g++.dg/overload/template7.C
>>
>> diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
>> index 7bbc1fb0c78..d5ff2311e63 100644
>> --- a/gcc/cp/call.cc
>> +++ b/gcc/cp/call.cc
>> @@ -8347,6 +8347,10 @@ get_fndecl_argument_location (tree fndecl, int 
>> argnum)
>>if (DECL_ARTIFICIAL (fndecl))
>>  return DECL_SOURCE_LOCATION (fndecl);
>>
>> +  /* Use FNDECL's location for TEMPLATE_DECLs.  */
>> +  if (TREE_CODE (fndecl) == TEMPLATE_DECL)
>> +return DECL_SOURCE_LOCATION (fndecl);
>> +
>
> For TEMPLATE_DECL fndecl, it'd be more natural to return the
> corresponding argument location of its DECL_TEMPLATE_RESULT (which
> should be a FUNCTION_DECL).  The STRIP_TEMPLATE macro would be
> convenient to use here.
>
>
> It seems this doesn't fix the regression completely however because
> in GCC 11 the code was rejected with a "permerror" (which can be
> downgraded to a warning with -fpermissive):
>
>   115364.C: In function ‘void fn(const foo&)’:
>   115364.C:5:43: error: passing ‘const foo’ as ‘this’ argument 
> discards qualifiers [-fpermissive]
>   5 | void fn(const foo& obj) { obj.not_const<5>(); }
> |   ^~
>   115364.C:3:24: note:   in call to ‘void foo::not_const() [with int 
> <anonymous> = 5]’
>   3 | template <int> void not_const() {}
> |^
>
> and we now reject with an ordinary error:
>
>   115364.C: In function ‘void fn(const foo&)’:
>   115364.C:5:27: error: cannot convert ‘const foo*’ to ‘foo*’
>   5 | void fn(const foo& obj) { obj.not_const<5>(); }
> |   ^~~
> |   |
> |   const foo*
>   115364.C:3:24: note:   initializing argument 'this' of 
> ‘template<int <anonymous> > void foo::not_const()’
>   3 | template <int> void not_const() {}
> |^
>
> To restore the error into a permerror, we need to figure out why we're
> unexpectedly hitting this code path with a TEMPLATE_DECL, and why it's
> necessary that the member function needs to take no arguments.  It 
> turns
> out I looked into this and submitted a patch for PR106760 (of which 
> this
> PR115364 is a dup) last year:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620514.html
>
> The patch was approved, but I lost track of it and never pushed it :/
> I'm going to go ahead and push that fix shortly, sorry for not doing 
> so
> earlier.  Thanks for looking into this issue!
Sounds good, thanks!

>
>>int i;
>>tree param;
>>
>> diff --git a/gcc/testsuite/g++.dg/overload/template7.C 
>> b/gcc/testsuite/g++.dg/overload/template7.C
>> new file mode 100644
>> index 000..67191c4ff62
>> --- /dev/null
>> +++ b/gcc/testsuite/g++.dg/overload/template7.C
>> @@ -0,0 +1,9 @@
>> +// PR c++/115364
>> +// { dg-do compile }
>> +
>> +struct foo {
>> +  template <int> void not_const() {} // { dg-note "initializing" }
>> +};
>> +void fn(const foo& obj) {
>> +  obj.not_const<5>(); // { dg-error "cannot convert" }
>> +}
>> -- 
>> 2.44.0
>>
>>
>>
>>



Re: [PATCH] rs6000: ROP - Emit hashst and hashchk insns on Power8 and later [PR114759]

2024-07-03 Thread Kewen.Lin
Hi Peter,

on 2024/6/20 05:14, Peter Bergner wrote:
> We currently only emit the ROP-protect hash* insns for Power10, where the
> insns were added to the architecture.  We want to emit them for earlier
> cpus (where they operate as NOPs), so that if those older binaries are
> ever executed on a Power10, then they'll be protected from ROP attacks.
> Binutils accepts hashst and hashchk back to Power8, so change GCC to emit
> them for Power8 and later.  This matches clang's behavior.
> 
> This patch is independent of the ROP shrink-wrap fix submitted earlier.
> This passed bootstrap and regtesting on powerpc64le-linux with no regressions.
> Ok for trunk?  
> 
> Peter
> 
> 
> 
> 2024-06-19  Peter Bergner  
> 
> gcc/
>   PR target/114759
>   * config/rs6000/rs6000-logue.cc (rs6000_stack_info): Use TARGET_POWER8.
>   (rs6000_emit_prologue): Likewise.
>   * config/rs6000/rs6000.md (hashchk): Likewise.
>   (hashst): Likewise.
>   Fix whitespace.
> 
> gcc/testsuite/
>   PR target/114759
>   * gcc.target/powerpc/pr114759-2.c: New test.
>   * lib/target-supports.exp (rop_ok): Use
>   check_effective_target_has_arch_pwr8.
> ---
>  gcc/config/rs6000/rs6000-logue.cc |  6 +++---
>  gcc/config/rs6000/rs6000.md   |  6 +++---
>  gcc/testsuite/gcc.target/powerpc/pr114759-2.c | 17 +
>  gcc/testsuite/lib/target-supports.exp |  2 +-
>  4 files changed, 24 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr114759-2.c
> 
> diff --git a/gcc/config/rs6000/rs6000-logue.cc 
> b/gcc/config/rs6000/rs6000-logue.cc
> index c384e48e378..bd363b625a4 100644
> --- a/gcc/config/rs6000/rs6000-logue.cc
> +++ b/gcc/config/rs6000/rs6000-logue.cc
> @@ -716,7 +716,7 @@ rs6000_stack_info (void)
>info->calls_p = (!crtl->is_leaf || cfun->machine->ra_needs_full_frame);
>info->rop_hash_size = 0;
>  
> -  if (TARGET_POWER10
> +  if (TARGET_POWER8
>&& info->calls_p
>&& DEFAULT_ABI == ABI_ELFv2
>&& rs6000_rop_protect)

Nit: I noticed that this is the only place to change
info->rop_hash_size to non-zero, and ...

> @@ -3277,7 +3277,7 @@ rs6000_emit_prologue (void)
>/* NOTE: The hashst isn't needed if we're going to do a sibcall,
>   but there's no way to know that here.  Harmless except for
>   performance, of course.  */
> -  if (TARGET_POWER10 && rs6000_rop_protect && info->rop_hash_size != 0)
> +  if (TARGET_POWER8 && rs6000_rop_protect && info->rop_hash_size != 0)

... this condition and ...

>  {
>gcc_assert (DEFAULT_ABI == ABI_ELFv2);
>rtx stack_ptr = gen_rtx_REG (Pmode, STACK_POINTER_REGNUM);
> @@ -5056,7 +5056,7 @@ rs6000_emit_epilogue (enum epilogue_type epilogue_type)
>  
>/* The ROP hash check must occur after the stack pointer is restored
>   (since the hash involves r1), and is not performed for a sibcall.  */
> -  if (TARGET_POWER10
> +  if (TARGET_POWER8
>&& rs6000_rop_protect
>&& info->rop_hash_size != 0

... here, both check info->rop_hash_size isn't zero, I think we can drop these
two TARGET_POWER10 (TARGET_POWER8) and rs6000_rop_protect checks?  Instead just
update the inner gcc_assert (now checking DEFAULT_ABI == ABI_ELFv2) by extra
checks on TARGET_POWER8 && rs6000_rop_protect?

The other looks good to me, ok for trunk with this nit tweaked (if you agree
with it and re-tested well), thanks!

BR,
Kewen


>&& epilogue_type != EPILOGUE_TYPE_SIBCALL)
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index a5d20594789..694076e311f 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -15808,9 +15808,9 @@ (define_insn "*cmpeqb_internal"
>  
>  (define_insn "hashst"
>[(set (match_operand:DI 0 "simple_offsettable_mem_operand" "=m")
> -(unspec_volatile:DI [(match_operand:DI 1 "int_reg_operand" "r")]
> + (unspec_volatile:DI [(match_operand:DI 1 "int_reg_operand" "r")]
>   UNSPEC_HASHST))]
> -  "TARGET_POWER10 && rs6000_rop_protect"
> +  "TARGET_POWER8 && rs6000_rop_protect"
>  {
>static char templ[32];
>const char *p = rs6000_privileged ? "p" : "";
> @@ -15823,7 +15823,7 @@ (define_insn "hashchk"
>[(unspec_volatile [(match_operand:DI 0 "int_reg_operand" "r")
>(match_operand:DI 1 "simple_offsettable_mem_operand" "m")]
>   UNSPEC_HASHCHK)]
> -  "TARGET_POWER10 && rs6000_rop_protect"
> +  "TARGET_POWER8 && rs6000_rop_protect"
>  {
>static char templ[32];
>const char *p = rs6000_privileged ? "p" : "";
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr114759-2.c 
> b/gcc/testsuite/gcc.target/powerpc/pr114759-2.c
> new file mode 100644
> index 000..3881ebd416e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr114759-2.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power8 -mrop-protect" } */
> +/* { dg-require-effective-target r

Re: [PATCH v1] Vect: Distribute truncation into .SAT_SUB operands

2024-07-03 Thread Richard Biener
On Sun, Jun 30, 2024 at 5:13 AM  wrote:
>
> From: Pan Li 
>
> To get better vectorized code of .SAT_SUB,  we would like to avoid the
> truncated operation for the assignment.  For example, as below.
>
> unsigned int _1;
> unsigned int _2;
> _9 = (unsigned short int).SAT_SUB (_1, _2);
>
> If we make sure that the _1 is in the range of unsigned short int.  Such
> as a def similar to:
>
> _1 = (unsigned short int)_4;
>
> Then we can distribute the truncation operation to:
>
> _3 = MIN_EXPR (_2, 65535);
> _9 = .SAT_SUB ((unsigned short int)_1, (unsigned short int)_3);
>
> Let's take RISC-V vector as example to tell the changes.  For below
> sample code:
>
> __attribute__((noinline))
> void test (uint16_t *x, unsigned b, unsigned n)
> {
>   unsigned a = 0;
>   uint16_t *p = x;
>
>   do {
> a = *--p;
> *p = (uint16_t)(a >= b ? a - b : 0);
>   } while (--n);
> }
>
> Before this patch:
>   ...
>   .L3:
>   vle16.v   v1,0(a3)
>   vrsub.vx  v5,v2,t1
>   mvt3,a4
>   addw  a4,a4,t5
>   vrgather.vv   v3,v1,v5
>   vsetvli   zero,zero,e32,m1,ta,ma
>   vzext.vf2 v1,v3
>   vssubu.vx v1,v1,a1
>   vsetvli   zero,zero,e16,mf2,ta,ma
>   vncvt.x.x.w   v1,v1
>   vrgather.vv   v3,v1,v5
>   vse16.v   v3,0(a3)
>   sub   a3,a3,t4
>   bgtu  t6,a4,.L3
>   ...
>
> After this patch:
> test:
>   ...
>   .L3:
>   vle16.v   v3,0(a3)
>   vrsub.vx  v5,v2,a6
>   mva7,a4
>   addw  a4,a4,t3
>   vrgather.vv   v1,v3,v5
>   vssubu.vv v1,v1,v6
>   vrgather.vv   v3,v1,v5
>   vse16.v   v3,0(a3)
>   sub   a3,a3,t1
>   bgtu  t4,a4,.L3
>   ...
>
> The below test suites are passed for this patch:
> 1. The rv64gcv fully regression tests.
> 2. The rv64gcv build with glibc.
> 3. The x86 bootstrap tests.
> 4. The x86 fully regression tests.
>
> gcc/ChangeLog:
>
> * tree-vect-patterns.cc (vect_recog_sat_sub_pattern_distribute):
> Add new func impl to perform the truncation distribution.
> (vect_recog_sat_sub_pattern): Perform above optimize before
> generate .SAT_SUB call.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/tree-vect-patterns.cc | 73 +++
>  1 file changed, 73 insertions(+)
>
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 519d15f2a43..7329ecec2c4 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -4565,6 +4565,77 @@ vect_recog_sat_add_pattern (vec_info *vinfo, 
> stmt_vec_info stmt_vinfo,
>return NULL;
>  }
>
> +/*
> + * Try to distribute the truncation for .SAT_SUB pattern,  mostly occurs in
> + * the benchmark zip.  Aka:
> + *
> + *   unsigned int _1;
> + *   unsigned int _2;
> + *   _9 = (unsigned short int).SAT_SUB (_1, _2);
> + *
> + *   if _1 is known to be in the range of unsigned short int.  For example
> + *   there is a def _1 = (unsigned short int)_4.  Then we can distribute the
> + *   truncation to:
> + *
> + *   _3 = MIN (65535, _2);
> + *   _9 = .SAT_SUB ((unsigned short int)_1, (unsigned short int)_3);
> + *
> + *   Then,  we can get better vectorized code and avoid the unnecessary narrowing
> + *   stmt during vectorization.
> + */
> +static void
> +vect_recog_sat_sub_pattern_distribute (vec_info *vinfo,
> +  stmt_vec_info stmt_vinfo,
> +  gimple *stmt, tree lhs, tree *ops)
> +{
> +  tree otype = TREE_TYPE (lhs);
> +  tree itype = TREE_TYPE (ops[0]);
> +
> +  if (types_compatible_p (otype, itype))
> +return;
> +
> +  unsigned itype_prec = TYPE_PRECISION (itype);
> +  unsigned otype_prec = TYPE_PRECISION (otype);
> +
> +  if (otype_prec >= itype_prec)
> +return;
> +
> +  int_range_max r;
> +  gimple_ranger granger;
> +
> +  if (granger.range_of_expr (r, ops[0], stmt) && !r.undefined_p ())
> +{
> +  wide_int bound = r.upper_bound ();
> +  wide_int otype_max = wi::mask (otype_prec, /* negate */false, 
> itype_prec);
> +
> +  if (bound != otype_max)

Isn't bound < otype_max OK as well?

Given your example I wonder if you instead want to use
vect_look_through_possible_promotion?  Because ...

> +   return;
> +
> +  tree v_otype = get_vectype_for_scalar_type (vinfo, otype);
> +  tree v_itype = get_vectype_for_scalar_type (vinfo, itype);
> +
> +  /* 1. Build truncated op_0  */
> +  tree op_0_out = vect_recog_temp_ssa_var (otype, NULL);
> +  gimple *op_0_cast = gimple_build_assign (op_0_out, NOP_EXPR, ops[0]);
> +  append_pattern_def_seq (vinfo, stmt_vinfo, op_0_cast, v_otype);

.. if you do it like this the widened op is still there and vectorized
and the whole
point is to make it possible to use a smaller vectorization factor?

> +  /* 2. Build MIN_EXPR (op_1, 65536)  */
> +  tree max = wide_int_to_tree (itype, otype_max);
> +  tree op_1_in = vect_recog_temp_ssa_var (itype, NULL);
> +  gimple *op_1_min = gimple_build_assign (op_1_in, MIN_EXPR, ops[1], 
> max);

I think you want to check that the target supports vecto

Re: [PATCH v1] Match: Allow more types truncation for .SAT_TRUNC

2024-07-03 Thread Richard Biener
On Tue, Jul 2, 2024 at 3:38 AM  wrote:
>
> From: Pan Li 
>
> The .SAT_TRUNC has the input and output types,  aka cvt from
> itype to otype and the sizeof (otype) < sizeof (itype).  The
> previous patch only allows the sizeof (otype) == sizeof (itype) / 2.
> But actually we have 1/4 and 1/8 truncation.
>
> This patch would like to support more types truncation when
> sizeof (otype) < sizeof (itype).  The below truncation will be
> covered.
>
> * uint64_t => uint8_t
> * uint64_t => uint16_t
> * uint64_t => uint32_t
> * uint32_t => uint8_t
> * uint32_t => uint16_t
> * uint16_t => uint8_t
>
> The below test suites are passed for this patch:
> 1. The rv64gcv fully regression tests.
> 2. The rv64gcv build with glibc.
> 3. The x86 bootstrap tests.
> 4. The x86 fully regression tests.

OK.

> gcc/ChangeLog:
>
> * match.pd: Allow any otype is less than itype truncation.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 7fff7b5f9fe..f708f4622bd 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3239,16 +3239,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (match (unsigned_integer_sat_trunc @0)
>   (bit_ior:c (negate (convert (gt @0 INTEGER_CST@1)))
> (convert @0))
> - (with {
> + (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> +  && TYPE_UNSIGNED (TREE_TYPE (@0)))
> + (with
> +  {
> unsigned itype_precision = TYPE_PRECISION (TREE_TYPE (@0));
> unsigned otype_precision = TYPE_PRECISION (type);
> -   wide_int trunc_max = wi::mask (itype_precision / 2, false, 
> itype_precision);
> +   wide_int trunc_max = wi::mask (otype_precision, false, itype_precision);
> wide_int int_cst = wi::to_wide (@1, itype_precision);
>}
> -  (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> -   && TYPE_UNSIGNED (TREE_TYPE (@0))
> -   && otype_precision < itype_precision
> -   && wi::eq_p (trunc_max, int_cst)
> +  (if (otype_precision < itype_precision && wi::eq_p (trunc_max, 
> int_cst))
>
>  /* x >  y  &&  x != XXX_MIN  -->  x > y
> x >  y  &&  x == XXX_MIN  -->  false . */
> --
> 2.34.1
>


Re: [PATCH v2] Vect: Support IFN SAT_TRUNC for unsigned vector int

2024-07-03 Thread Richard Biener
On Wed, Jul 3, 2024 at 3:33 AM  wrote:
>
> From: Pan Li 
>
> This patch would like to support the .SAT_TRUNC for the unsigned
> vector int.  Given we have below example code:
>
> Form 1
>   #define VEC_DEF_SAT_U_TRUC_FMT_1(NT, WT) \
>   void __attribute__((noinline))   \
>   vec_sat_u_truc_##WT##_to_##NT##_fmt_1 (NT *x, WT *y, unsigned limit) \
>   {\
> for (unsigned i = 0; i < limit; i++)   \
>   {\
> bool overflow = y[i] > (WT)(NT)(-1);   \
> x[i] = ((NT)y[i]) | (NT)-overflow; \
>   }\
>   }
>
> VEC_DEF_SAT_U_TRUC_FMT_1 (uint32_t, uint64_t)
>
> Before this patch:
> void vec_sat_u_truc_uint64_t_to_uint32_t_fmt_1 (uint32_t * x, uint64_t * y, 
> unsigned int limit)
> {
>   ...
>   _51 = .SELECT_VL (ivtmp_49, POLY_INT_CST [2, 2]);
>   ivtmp_35 = _51 * 8;
>   vect__4.7_32 = .MASK_LEN_LOAD (vectp_y.5_34, 64B, { -1, ... }, _51, 0);
>   mask_overflow_16.8_30 = vect__4.7_32 > { 4294967295, ... };
>   vect__5.9_29 = (vector([2,2]) unsigned int) vect__4.7_32;
>   vect__10.13_20 = .VCOND_MASK (mask_overflow_16.8_30, { 4294967295, ... }, 
> vect__5.9_29);
>   ivtmp_12 = _51 * 4;
>   .MASK_LEN_STORE (vectp_x.14_11, 32B, { -1, ... }, _51, 0, vect__10.13_20);
>   vectp_y.5_33 = vectp_y.5_34 + ivtmp_35;
>   vectp_x.14_46 = vectp_x.14_11 + ivtmp_12;
>   ivtmp_50 = ivtmp_49 - _51;
>   if (ivtmp_50 != 0)
>   ...
> }
>
> After this patch:
> void vec_sat_u_truc_uint64_t_to_uint32_t_fmt_1 (uint32_t * x, uint64_t * y, 
> unsigned int limit)
> {
>   ...
>   _12 = .SELECT_VL (ivtmp_21, POLY_INT_CST [2, 2]);
>   ivtmp_34 = _12 * 8;
>   vect__4.7_31 = .MASK_LEN_LOAD (vectp_y.5_33, 64B, { -1, ... }, _12, 0);
>   vect_patt_40.8_30 = .SAT_TRUNC (vect__4.7_31); // << .SAT_TRUNC
>   ivtmp_29 = _12 * 4;
>   .MASK_LEN_STORE (vectp_x.9_28, 32B, { -1, ... }, _12, 0, vect_patt_40.8_30);
>   vectp_y.5_32 = vectp_y.5_33 + ivtmp_34;
>   vectp_x.9_27 = vectp_x.9_28 + ivtmp_29;
>   ivtmp_20 = ivtmp_21 - _12;
>   if (ivtmp_20 != 0)
>   ...
> }
>
> The below test suites are passed for this patch
> * The x86 bootstrap test.
> * The x86 fully regression test.
> * The rv64gcv fully regression tests.

OK.

Thanks,
Richard.

> gcc/ChangeLog:
>
> * tree-vect-patterns.cc (gimple_unsigned_integer_sat_trunc): Add
> new decl generated by match.
> (vect_recog_sat_trunc_pattern): Add new func impl to recog the
> .SAT_TRUNC pattern.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/tree-vect-patterns.cc | 54 +++
>  1 file changed, 54 insertions(+)
>
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 519d15f2a43..86e893a1c43 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -4489,6 +4489,7 @@ vect_recog_mult_pattern (vec_info *vinfo,
>
>  extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
>  extern bool gimple_unsigned_integer_sat_sub (tree, tree*, tree (*)(tree));
> +extern bool gimple_unsigned_integer_sat_trunc (tree, tree*, tree (*)(tree));
>
>  static gimple *
>  vect_recog_build_binary_gimple_stmt (vec_info *vinfo, stmt_vec_info 
> stmt_info,
> @@ -4603,6 +4604,58 @@ vect_recog_sat_sub_pattern (vec_info *vinfo, 
> stmt_vec_info stmt_vinfo,
>return NULL;
>  }
>
> +/*
> + * Try to detect saturation truncation pattern (SAT_TRUNC), aka below gimple:
> + *   overflow_5 = x_4(D) > 4294967295;
> + *   _1 = (unsigned int) x_4(D);
> + *   _2 = (unsigned int) overflow_5;
> + *   _3 = -_2;
> + *   _6 = _1 | _3;
> + *
> + * And then simplified to
> + *   _6 = .SAT_TRUNC (x_4(D));
> + */
> +
> +static gimple *
> +vect_recog_sat_trunc_pattern (vec_info *vinfo, stmt_vec_info stmt_vinfo,
> + tree *type_out)
> +{
> +  gimple *last_stmt = STMT_VINFO_STMT (stmt_vinfo);
> +
> +  if (!is_gimple_assign (last_stmt))
> +return NULL;
> +
> +  tree ops[1];
> +  tree lhs = gimple_assign_lhs (last_stmt);
> +
> +  if (gimple_unsigned_integer_sat_trunc (lhs, ops, NULL))
> +{
> +  tree itype = TREE_TYPE (ops[0]);
> +  tree otype = TREE_TYPE (lhs);
> +  tree v_itype = get_vectype_for_scalar_type (vinfo, itype);
> +  tree v_otype = get_vectype_for_scalar_type (vinfo, otype);
> +  internal_fn fn = IFN_SAT_TRUNC;
> +
> +  if (v_itype != NULL_TREE && v_otype != NULL_TREE
> +   && direct_internal_fn_supported_p (fn, tree_pair (v_otype, v_itype),
> +  OPTIMIZE_FOR_BOTH))
> +   {
> + gcall *call = gimple_build_call_internal (fn, 1, ops[0]);
> + tree out_ssa = vect_recog_temp_ssa_var (otype, NULL);
> +
> + gimple_call_set_lhs (call, out_ssa);
> + gimple_call_se

Re: [PATCH] vect: Fix ICE caused by missing check for TREE_CODE == SSA_NAME

2024-07-03 Thread Richard Biener
On Wed, 3 Jul 2024, Hu, Lin1 wrote:

> Hi, all
> 
> I forgot to check if the tree's code is SSA_NAME. Have modified.
> 
> Bootstrapped and regtested on {x86-64, aarch64}-linux-gnu, OK for trunk?

OK.

Thanks,
Richard.

> BRs,
> Lin
> 
> 2024-07-03  Hu, Lin1 
>   Andrew Pinski 
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/115753
>   * tree-vect-stmts.cc (supportable_indirect_convert_operation): Add
>   TYPE_CODE check before SSA_NAME_RANGE_INFO.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/115753
>   * gcc.dg/vect/pr115753-1.c: New test.
>   * gcc.dg/vect/pr115753-2.c: Ditto.
>   * gcc.dg/vect/pr115753-3.c: Ditto.
> ---
>  gcc/testsuite/gcc.dg/vect/pr115753-1.c | 12 
>  gcc/testsuite/gcc.dg/vect/pr115753-2.c | 20 
>  gcc/testsuite/gcc.dg/vect/pr115753-3.c | 15 +++
>  gcc/tree-vect-stmts.cc |  2 +-
>  4 files changed, 48 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/pr115753-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/vect/pr115753-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/vect/pr115753-3.c
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/pr115753-1.c 
> b/gcc/testsuite/gcc.dg/vect/pr115753-1.c
> new file mode 100644
> index 000..2c1b6e5df63
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr115753-1.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -frounding-math" } */
> +/* { dg-add-options float16  } */
> +/* { dg-require-effective-target float16  } */
> +
> +void f(_Complex _Float16*);
> +void
> +foo1 (_Complex _Float16 *d)
> +{
> +_Complex _Float16 cf = 3967 + 3791 * 1i;
> +f(&cf);
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/pr115753-2.c 
> b/gcc/testsuite/gcc.dg/vect/pr115753-2.c
> new file mode 100644
> index 000..ceacada2a76
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr115753-2.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -frounding-math" } */
> +/* { dg-add-options float16  } */
> +/* { dg-require-effective-target float16  } */
> +
> +void f(_Float16*);
> +void
> +foo1 ()
> +{
> +  int t0 = 3967;
> +  int t1 = 3969;
> +  int t2 = 3971;
> +  int t3 = 3973;
> +  _Float16 tt[4];
> +  tt[0] = t0;
> +  tt[1] = t1;
> +  tt[2] = t2;
> +  tt[3] = t3;
> +  f(&tt[0]);
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/pr115753-3.c 
> b/gcc/testsuite/gcc.dg/vect/pr115753-3.c
> new file mode 100644
> index 000..8e95445897c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr115753-3.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -frounding-math" } */
> +
> +void f(float*);
> +void
> +foo1 ()
> +{
> +  long long t0 = __LONG_LONG_MAX__;
> +  long long t1 = __LONG_LONG_MAX__ - 1;
> +  float tt[2];
> +  tt[0] = t0;
> +  tt[1] = t1;
> +  f(&tt[0]);
> +}
> +
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 20b84515446..b4f346ee6ab 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -14678,7 +14678,7 @@ supportable_indirect_convert_operation (code_helper 
> code,
>In the future, if it is supported, changes may need to be made
>to this part, such as checking the RANGE of each element
>in the vector.  */
> -   if (!SSA_NAME_RANGE_INFO (op0)
> +   if ((TREE_CODE (op0) == SSA_NAME && !SSA_NAME_RANGE_INFO (op0))
> || !vect_get_range_info (op0, &op_min_value, &op_max_value))
>   break;
>  
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH-1v4, rs6000] Implement optab_isinf for SFDF and IEEE128

2024-07-03 Thread Kewen.Lin
Hi Haochen,

on 2024/6/27 09:41, HAO CHEN GUI wrote:
> Hi,
>   This patch implemented optab_isinf for SFDF and IEEE128 by test
> data class instructions.
> 
>   Compared with previous version, the main change is to define
> and use the constant mask for test data class insns.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652593.html
> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no
> regressions. Is it OK for trunk?
> 
> Thanks
> Gui Haochen
> 
> ChangeLog
> rs6000: Implement optab_isinf for SFDF and IEEE128
> 
> gcc/
>   PR target/97786
>   * config/rs6000/rs6000.md (ISNAN, ISINF, ISZERO, ISDENORMAL): Define.
>   * config/rs6000/vsx.md (isinf<mode>2 for SFDF): New expand.
>   (isinf<mode>2 for IEEE128): New expand.
> 
> gcc/testsuite/
>   PR target/97786
>   * gcc.target/powerpc/pr97786-1.c: New test.
>   * gcc.target/powerpc/pr97786-2.c: New test.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index ac5651d7420..e84e6b08f03 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -53,6 +53,17 @@ (define_constants
> (FRAME_POINTER_REGNUM 110)
>])
> 
> +;;
> +;; Test data class mask
> +;;
> +
> +(define_constants
> +  [(ISNAN0x40)
> +   (ISINF0x30)
> +   (ISZERO   0xC)
> +   (ISDENORMAL   0x3)

Nit: Maybe it's better to add prefix on test data class, such
as: TEST_DATA_CLASS_NAN or DATA_CLASS_NAN.

And DATA_CLASS_INF can be separated as DATA_CLASS_POS_INF 0x20
and DATA_CLASS_NEG_INF 0x10, similar separating for DENORM.
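A small sketch of the mask split this assumes, taking the usual DCMX bit
assignments (NaN, +/-Inf, +/-Zero, +/-Denormal from high bit to low) as given;
the names are hypothetical, not from the patch:

```python
# Hypothetical names for the seven DCMX test-data-class bits.
DATA_CLASS_NAN        = 0x40
DATA_CLASS_POS_INF    = 0x20
DATA_CLASS_NEG_INF    = 0x10
DATA_CLASS_POS_ZERO   = 0x08
DATA_CLASS_NEG_ZERO   = 0x04
DATA_CLASS_POS_DENORM = 0x02
DATA_CLASS_NEG_DENORM = 0x01

# The combined masks match the constants in the patch above.
assert DATA_CLASS_POS_INF | DATA_CLASS_NEG_INF == 0x30        # ISINF
assert DATA_CLASS_POS_ZERO | DATA_CLASS_NEG_ZERO == 0x0C      # ISZERO
assert DATA_CLASS_POS_DENORM | DATA_CLASS_NEG_DENORM == 0x03  # ISDENORMAL
```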

> +  ])
> +
>  ;;
>  ;; UNSPEC usage
>  ;;
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index f135fa079bd..67615bae8c0 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -5313,6 +5313,24 @@ (define_expand "xststdc<sd>p"
>operands[4] = CONST0_RTX (SImode);
>  })
> 
> +(define_expand "isinf<mode>2"
> +  [(use (match_operand:SI 0 "gpc_reg_operand"))
> +   (use (match_operand:SFDF 1 "vsx_register_operand"))]
> +  "TARGET_HARD_FLOAT && TARGET_P9_VECTOR"
> +{
> +  emit_insn (gen_xststdc<sd>p (operands[0], operands[1], GEN_INT (ISINF)));
> +  DONE;
> +})
> +
> +(define_expand "isinf<mode>2"
> +  [(use (match_operand:SI 0 "gpc_reg_operand"))
> +   (use (match_operand:IEEE128 1 "vsx_register_operand"))]

QP insns are special, only altivec regs can be used, so 
s/vsx_register_operand/altivec_register_operand/

Also applied to the other two patches for isnormal and isfinite.

And as discussed offline, let's merge these patterns with mode attribute. :)

> +  "TARGET_HARD_FLOAT && TARGET_P9_VECTOR"
> +{
> +  emit_insn (gen_xststdcqp_<mode> (operands[0], operands[1], GEN_INT 
> (ISINF)));
> +  DONE;
> +})
> +
>  ;; The VSX Scalar Test Negative Quad-Precision
> +(define_expand "xststdcnegqp_<mode>"
>[(set (match_dup 2)
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-1.c 
> b/gcc/testsuite/gcc.target/powerpc/pr97786-1.c
> new file mode 100644
> index 000..c1c4f64ee8b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-1.c
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_vsx } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power9" } */

Nit: Not necessary, but it's preferred to put dg-options line before the line 
for powerpc_vsx
as powerpc_vsx considers current_compiler_flags but dg-options line isn't 
processed if it's put
behind.  Also applied for the other test cases.

BR,
Kewen

> +
> +int test1 (double x)
> +{
> +  return __builtin_isinf (x);
> +}
> +
> +int test2 (float x)
> +{
> +  return __builtin_isinf (x);
> +}
> +
> +int test3 (float x)
> +{
> +  return __builtin_isinff (x);
> +}
> +
> +/* { dg-final { scan-assembler-not {\mfcmp} } } */
> +/* { dg-final { scan-assembler-times {\mxststdcsp\M} 2 } } */
> +/* { dg-final { scan-assembler-times {\mxststdcdp\M} 1 } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr97786-2.c 
> b/gcc/testsuite/gcc.target/powerpc/pr97786-2.c
> new file mode 100644
> index 000..ed305e8572e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-2.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target ppc_float128_hw } */
> +/* { dg-require-effective-target powerpc_vsx } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power9 -mabi=ieeelongdouble -Wno-psabi" } 
> */
> +
> +int test1 (long double x)
> +{
> +  return __builtin_isinf (x);
> +}
> +
> +int test2 (long double x)
> +{
> +  return __builtin_isinfl (x);
> +}
> +
> +/* { dg-final { scan-assembler-not {\mxscmpuqp\M} } } */
> +/* { dg-final { scan-assembler-times {\mxststdcqp\M} 2 } } */



Re: [PATCH] rs6000, update vec_ld, vec_lde, vec_st and vec_ste, documentation

2024-07-03 Thread Kewen.Lin
Hi Carl,

on 2024/6/27 01:05, Carl Love wrote:
> GCC maintainers:
> 
> The following patch updates the user documentation for the vec_ld, vec_lde, 
> vec_st and vec_ste built-ins to make it clearer that there are data alignment 
> requirements for these built-ins.  If the data alignment requirements are not 
> followed, the data loaded or stored by these built-ins will be wrong.
> 
> Please let me know if this patch is acceptable for mainline.  Thanks.
> 
>   Carl 
> 
> 
> rs6000, update vec_ld, vec_lde, vec_st and vec_ste documentation
> 
> Use of the vec_ld and vec_st built-ins require that the data be 16-byte
> aligned to work properly.  Add some additional text to the existing
> documentation to make this clearer to the user.
> 
> Similarly, the vec_lde and vec_ste built-ins also have data alignment
> requirements based on the size of the vector element.  Update the
> documentation to make this clear to the user.
> 
> gcc/ChangeLog:
>   * doc/extend.texi: Add clarification for the use of the vec_ld
>   vec_st, vec_lde and vec_ste built-ins.
> ---
>  gcc/doc/extend.texi | 15 +++
>  1 file changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index ee3644a5264..55faded17b9 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -22644,10 +22644,17 @@ vector unsigned char vec_xxsldi (vector unsigned 
> char,
>  @end smallexample
>  
>  Note that the @samp{vec_ld} and @samp{vec_st} built-in functions always
> -generate the AltiVec @samp{LVX} and @samp{STVX} instructions even
> -if the VSX instruction set is available.  The @samp{vec_vsx_ld} and
> -@samp{vec_vsx_st} built-in functions always generate the VSX @samp{LXVD2X},
> -@samp{LXVW4X}, @samp{STXVD2X}, and @samp{STXVW4X} instructions.
> +generate the AltiVec @samp{LVX}, and @samp{STVX} instructions.  The

This change removed "even if the VSX instruction set is available.", I think 
it's
not intentional?  vec_ld and vec_st are well defined in PVIPR, this paragraph is
not to document them IMHO.  Since we document vec_vsx_ld and vec_vsx_st here, it
aims to note the difference between these two pairs.  But I'm not opposed to adding
more words to emphasize the special masking off; I prefer to use the same words as
PVIPR: "ignoring the four low-order bits of the calculated address".  And IMHO we
should not say "it requires the data to be 16-byte aligned to work properly" in
case the users are aware of this behavior well, have some non-16-byte-aligned
data, and expect it to behave like that; it's arguable to call that "not working
properly".

> +instructions mask off the lower 4 bits of the effective address thus 
> requiring
> +the data to be 16-byte aligned to work properly.  The @samp{vec_lde} and
> +@samp{vec_ste} built-in functions operate on vectors of bytes, short integer,
> +integer, and float.  The corresponding AltiVec instructions @samp{LVEBX},
> +@samp{LVEHX}, @samp{LVEWX}, @samp{STVEBX}, @samp{STVEHX}, @samp{STVEWX} mask
> +off the lower bits of the effective address based on the size of the data.
> +Thus the data must be aligned to the size of the vector element to work
> +properly.  The @samp{vec_vsx_ld} and @samp{vec_vsx_st} built-in functions
> +always generate the VSX @samp{LXVD2X}, @samp{LXVW4X}, @samp{STXVD2X}, and
> +@samp{STXVW4X} instructions.

As above, there was a reason to mention vec_ld and vec_st here, but IMHO not
one for vec_lde and vec_ste, so let's not mention vec_lde and vec_ste here;
users should read the description in the PVIPR instead (which is the
recommended reference).

BR,
Kewen

>  
>  @node PowerPC AltiVec Built-in Functions Available on ISA 2.07
>  @subsubsection PowerPC AltiVec Built-in Functions Available on ISA 2.07



[PATCH] RISC-V: Use tu policy for first-element vec_set [PR115725].

2024-07-03 Thread Robin Dapp
Hi,

this patch changes the tail policy for vmv.s.x from ta to tu.
By default the bug does not show up with qemu because qemu's
current vmv.s.x implementation always uses the tail-undisturbed
policy.  With a local qemu version that overwrites the tail
with ones when the tail-agnostic policy is specified, the bug
shows.

Regtested on rv64gcv_zvfh.

OK for trunk and GCC 14 backport?

Regards
 Robin

gcc/ChangeLog:

* config/riscv/autovec.md: Add TU policy.
* config/riscv/riscv-protos.h (enum insn_type): Define
SCALAR_MOVE_MERGED_OP_TU.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-1.c: Adjust
test expectation.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-4.c: Ditto.
---
 gcc/config/riscv/autovec.md  |  3 ++-
 gcc/config/riscv/riscv-protos.h  |  4 
 .../riscv/rvv/autovec/vls-vlmax/vec_set-1.c  | 12 
 .../riscv/rvv/autovec/vls-vlmax/vec_set-2.c  | 12 
 .../riscv/rvv/autovec/vls-vlmax/vec_set-3.c  | 12 
 .../riscv/rvv/autovec/vls-vlmax/vec_set-4.c  | 12 
 6 files changed, 22 insertions(+), 33 deletions(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 1af50a46c4c..aa7dd526804 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1341,7 +1341,8 @@ (define_expand "vec_set"
 {
   rtx ops[] = {operands[0], operands[0], operands[1]};
   riscv_vector::emit_nonvlmax_insn (code_for_pred_broadcast (mode),
-   riscv_vector::SCALAR_MOVE_MERGED_OP, 
ops, CONST1_RTX (Pmode));
+   riscv_vector::SCALAR_MOVE_MERGED_OP_TU,
+   ops, CONST1_RTX (Pmode));
 }
   else
 {
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 39b723a590b..064aa082742 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -524,6 +524,10 @@ enum insn_type : unsigned int
   SCALAR_MOVE_MERGED_OP = HAS_DEST_P | HAS_MASK_P | USE_ONE_TRUE_MASK_P
  | HAS_MERGE_P | TDEFAULT_POLICY_P | MDEFAULT_POLICY_P
  | UNARY_OP_P,
+
+  SCALAR_MOVE_MERGED_OP_TU = HAS_DEST_P | HAS_MASK_P | USE_ONE_TRUE_MASK_P
+ | HAS_MERGE_P | TU_POLICY_P | MDEFAULT_POLICY_P
+ | UNARY_OP_P,
 };
 
 enum vlmul_type
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-1.c
index ecb160933d6..99b0f625c83 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-1.c
@@ -64,14 +64,10 @@ typedef double vnx2df __attribute__((vector_size (16)));
 TEST_ALL1 (VEC_SET)
 TEST_ALL_VAR1 (VEC_SET_VAR1)
 
-/* { dg-final { scan-assembler-times 
{vset[i]*vli\s+[a-z0-9,]+,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
-/* { dg-final { scan-assembler-times 
{vset[i]*vli\s+[a-z0-9,]+,\s*e8,\s*m1,\s*tu,\s*ma} 5 } } */
-/* { dg-final { scan-assembler-times 
{vset[i]*vli\s+[a-z0-9,]+,\s*e16,\s*m1,\s*ta,\s*ma} 2 } } */
-/* { dg-final { scan-assembler-times 
{vset[i]*vli\s+[a-z0-9,]+,\s*e16,\s*m1,\s*tu,\s*ma} 6 } } */
-/* { dg-final { scan-assembler-times 
{vset[i]*vli\s+[a-z0-9,]+,\s*e32,\s*m1,\s*ta,\s*ma} 2 } } */
-/* { dg-final { scan-assembler-times 
{vset[i]*vli\s+[a-z0-9,]+,\s*e32,\s*m1,\s*tu,\s*ma} 6 } } */
-/* { dg-final { scan-assembler-times 
{vset[i]*vli\s+[a-z0-9,]+,\s*e64,\s*m1,\s*ta,\s*ma} 2 } } */
-/* { dg-final { scan-assembler-times 
{vset[i]*vli\s+[a-z0-9,]+,\s*e64,\s*m1,\s*tu,\s*ma} 4 } } */
+/* { dg-final { scan-assembler-times 
{vset[i]*vli\s+[a-z0-9,]+,\s*e8,\s*m1,\s*tu,\s*ma} 6 } } */
+/* { dg-final { scan-assembler-times 
{vset[i]*vli\s+[a-z0-9,]+,\s*e16,\s*m1,\s*tu,\s*ma} 8 } } */
+/* { dg-final { scan-assembler-times 
{vset[i]*vli\s+[a-z0-9,]+,\s*e32,\s*m1,\s*tu,\s*ma} 8 } } */
+/* { dg-final { scan-assembler-times 
{vset[i]*vli\s+[a-z0-9,]+,\s*e64,\s*m1,\s*tu,\s*ma} 6 } } */
 
 /* { dg-final { scan-assembler-times {\tvmv.v.x} 13 } } */
 /* { dg-final { scan-assembler-times {\tvfmv.v.f} 8 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-2.c
index 194abff77cc..64a40308eb1 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-2.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-2.c
@@ -76,14 +76,10 @@ typedef double vnx4df __attribute__((vector_size (32)));
 TEST_ALL2 (VEC_SET)
 TEST_ALL_VAR2 (VEC_SET_VAR2)
 
-/* { dg-final { scan-assembler-times 
{vset[i]*vli\s+[a-z0-9,]+,\s*e8,\s*m2,\s*ta,\s*ma} 1 } } */
-/* { dg-final {

Re: [PATCH] RISC-V: Use tu policy for first-element vec_set [PR115725].

2024-07-03 Thread juzhe.zh...@rivai.ai
LGTM



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2024-07-03 17:39
To: gcc-patches
CC: rdapp.gcc; palmer; Kito Cheng; juzhe.zh...@rivai.ai; jeffreyalaw; Li, Pan2
Subject: [PATCH] RISC-V: Use tu policy for first-element vec_set [PR115725].

[PATCH 1/2] aarch64: PR target/115457 Implement missing __ARM_FEATURE_BF16 macro

2024-07-03 Thread Kyrylo Tkachov
Hi all,

The ACLE asks the user to test for __ARM_FEATURE_BF16 before using the
arm_bf16.h header, but GCC doesn't set this up.
LLVM does, so this is an inconsistency between the compilers.

This patch enables that macro for TARGET_BF16_FP.
Bootstrapped and tested on aarch64-none-linux-gnu.
Pushing to trunk.
I think it makes sense to backport this to branches that support the
arm_bf16.h header.
I’ll prepare patches for those separately.

Thanks,
Kyrill

gcc/

PR target/115457
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins):
Define __ARM_FEATURE_BF16 for TARGET_BF16_FP.

gcc/testsuite/

PR target/115457
* gcc.target/aarch64/acle/bf16_feature.c: New test.

Signed-off-by: Kyrylo Tkachov 


0001-aarch64-PR-target-115457-Implement-missing-__ARM_FEA.patch
Description: 0001-aarch64-PR-target-115457-Implement-missing-__ARM_FEA.patch


[PATCH 2/2] aarch64: PR target/115475 Implement missing __ARM_FEATURE_SVE_BF16 macro

2024-07-03 Thread Kyrylo Tkachov
Hi all,

The ACLE requires __ARM_FEATURE_SVE_BF16 to be defined when SVE, BF16, and
the associated intrinsics are available.
GCC does support the required intrinsics for TARGET_SVE_BF16 so define
this macro too.

Bootstrapped and tested on aarch64-none-linux-gnu.
Pushing to trunk.
I think it makes sense to backport this to branches that support the SVE BF16
intrinsics.
I’ll prepare patches for those separately.

Thanks,
Kyrill

gcc/

PR target/115475
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins):
Define __ARM_FEATURE_SVE_BF16 for TARGET_SVE_BF16.

gcc/testsuite/

PR target/115475
* gcc.target/aarch64/acle/bf16_sve_feature.c: New test.

Signed-off-by: Kyrylo Tkachov 



0002-aarch64-PR-target-115475-Implement-missing-__ARM_FEA.patch
Description: 0002-aarch64-PR-target-115475-Implement-missing-__ARM_FEA.patch


[RFC/PATCH] libgcc: sh: Use soft-fp for non-hosted SH3/SH4

2024-07-03 Thread Sébastien Michelland
libgcc's fp-bit.c is quite slow and most modern/developed architectures
have switched to using the soft-fp library. This patch does so for
free-standing/unknown-OS SH3/SH4 builds, using soft-fp's default parameters
for the most part, most notably no exceptions.

A quick run of Whetstone (built with OpenLibm) on an SH4 machine shows
about x3 speedup (~320 -> 1050 Kwhets/s).

I'm sending this as RFC because I'm quite unsure about testing. I built
the compiler and ran the benchmark, but I don't know if GCC has a test
for soft-fp correctness and whether I can run that in my non-hosted
environment. Any advice?

Cheers,
Sébastien

libgcc/ChangeLog:

* config.host: Use soft-fp library for non-hosted SH3/SH4
instead of fdpbit.
* config/sh/sfp-machine.h: New.

Signed-off-by: Sébastien Michelland 
---
 libgcc/config.host | 10 +++-
 libgcc/config/sh/sfp-machine.h | 83 ++
 2 files changed, 92 insertions(+), 1 deletion(-)
 create mode 100644 libgcc/config/sh/sfp-machine.h

diff --git a/libgcc/config.host b/libgcc/config.host
index 9fae51d4c..fee3bf0c0 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -1399,7 +1399,15 @@ s390x-ibm-tpf*)
md_unwind_header=s390/tpf-unwind.h
;;
 sh-*-elf* | sh[12346l]*-*-elf*)
-   tmake_file="$tmake_file sh/t-sh t-crtstuff-pic t-fdpbit"
+   tmake_file="$tmake_file sh/t-sh t-crtstuff-pic"
+   case ${host} in
+   sh[34]*-*-elf*)
+   tmake_file="${tmake_file} t-softfp-sfdf t-softfp"
+   ;;
+   *)
+   tmake_file="${tmake_file} t-fdpbit"
+   ;;
+   esac
extra_parts="$extra_parts crt1.o crti.o crtn.o crtbeginS.o crtendS.o \
libic_invalidate_array_4-100.a \
libic_invalidate_array_4-200.a \
diff --git a/libgcc/config/sh/sfp-machine.h b/libgcc/config/sh/sfp-machine.h
new file mode 100644
index 0..c1aa428c0
--- /dev/null
+++ b/libgcc/config/sh/sfp-machine.h
@@ -0,0 +1,83 @@
+/* Software floating-point machine description for SuperH.
+
+   Copyright (C) 2016-2024 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+.  */
+
+#define _FP_W_TYPE_SIZE 32
+#define _FP_W_TYPE  unsigned long
+#define _FP_WS_TYPE signed long
+#define _FP_I_TYPE  long
+
+#define _FP_MUL_MEAT_S(R,X,Y)   \
+  _FP_MUL_MEAT_1_wide(_FP_WFRACBITS_S,R,X,Y,umul_ppmm)
+#define _FP_MUL_MEAT_D(R,X,Y)   \
+  _FP_MUL_MEAT_2_wide(_FP_WFRACBITS_D,R,X,Y,umul_ppmm)
+#define _FP_MUL_MEAT_Q(R,X,Y)   \
+  _FP_MUL_MEAT_4_wide(_FP_WFRACBITS_Q,R,X,Y,umul_ppmm)
+
+#define _FP_DIV_MEAT_S(R,X,Y)   _FP_DIV_MEAT_1_udiv_norm(S,R,X,Y)
+#define _FP_DIV_MEAT_D(R,X,Y)   _FP_DIV_MEAT_2_udiv(D,R,X,Y)
+#define _FP_DIV_MEAT_Q(R,X,Y)   _FP_DIV_MEAT_4_udiv(Q,R,X,Y)
+
+#define _FP_NANFRAC_B   _FP_QNANBIT_B
+#define _FP_NANFRAC_H   _FP_QNANBIT_H
+#define _FP_NANFRAC_S   _FP_QNANBIT_S
+#define _FP_NANFRAC_D   _FP_QNANBIT_D, 0
+#define _FP_NANFRAC_Q   _FP_QNANBIT_Q, 0, 0, 0
+
+/* The type of the result of a floating point comparison.  This must
+   match __libgcc_cmp_return__ in GCC for the target.  */
+typedef int __gcc_CMPtype __attribute__ ((mode (__libgcc_cmp_return__)));
+#define CMPtype __gcc_CMPtype
+
+#define _FP_NANSIGN_B   0
+#define _FP_NANSIGN_H   0
+#define _FP_NANSIGN_S   0
+#define _FP_NANSIGN_D   0
+#define _FP_NANSIGN_Q   0
+
+#define _FP_KEEPNANFRACP 0
+#define _FP_QNANNEGATEDP 0
+
+#define _FP_CHOOSENAN(fs, wc, R, X, Y, OP)  \
+  do {  \
+R##_s = _FP_NANSIGN_##fs;   \
+_FP_FRAC_SET_##wc(R,_FP_NANFRAC_##fs);  \
+R##_c = FP_CLS_NAN; \
+  } while (0)
+
+#define _FP_TININESS_AFTER_ROUNDING 1
+
+#define __LITTLE_ENDIAN 1234
+#define __BIG_ENDIAN4321
+
+#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+#define __BYTE_ORDER __BIG_ENDIAN
+#else
+#define __BYTE_ORDER __LITTLE_ENDIAN
+#endif
+
+/* Define ALIASNAME as a strong alias for NAME.  */
+# define strong_alias(name, aliasname) _strong_alias(name, alias

[PATCH v1 1/2] Aarch64: Add test for non-commutative SIMD intrinsic

2024-07-03 Thread alfie.richards

This adds a test for non-commutative SIMD NEON intrinsics.  Specifically,
addp is non-commutative and its current big-endian implementation has a bug.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vector_intrinsics_asm.c: New test.
---
 .../aarch64/vector_intrinsics_asm.c   | 371 ++
 1 file changed, 371 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vector_intrinsics_asm.c

diff --git a/gcc/testsuite/gcc.target/aarch64/vector_intrinsics_asm.c b/gcc/testsuite/gcc.target/aarch64/vector_intrinsics_asm.c
new file mode 100644
index 000..b7d5620abab
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vector_intrinsics_asm.c
@@ -0,0 +1,371 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" "" { xfail be } } } */
+
+#include "arm_neon.h"
+
+// SIGNED VADD INTRINSICS
+
+/*
+**test_vadd_s8:
+**	addp	v0\.8b, v0\.8b, v1\.8b
+**	ret
+*/
+int8x8_t test_vadd_s8(int8x8_t v1, int8x8_t v2) {
+ int8x8_t v3 = vpadd_s8(v1, v2);
+ return v3;
+}
+
+/*
+**test_vadd_s16:
+**addp	v0\.4h, v0\.4h, v1\.4h
+**ret
+*/
+int16x4_t test_vadd_s16(int16x4_t v1, int16x4_t v2) {
+ int16x4_t v3 = vpadd_s16(v1, v2);
+ return v3;
+}
+
+/*
+**test_vadd_s32:
+**	addp	v0\.2s, v0\.2s, v1\.2s
+**	ret
+*/
+int32x2_t test_vadd_s32(int32x2_t v1, int32x2_t v2) {
+ int32x2_t v3 = vpadd_s32(v1, v2);
+ return v3;
+}
+
+/*
+**test_vaddq_s8:
+**...
+**	addp	v0\.16b, v0\.16b, v1\.16b
+**	ret
+*/
+int8x16_t test_vaddq_s8(int8x16_t v1, int8x16_t v2) {
+ int8x16_t v3 = vpaddq_s8(v1, v2);
+ return v3;
+}
+
+/*
+**test_vaddq_s16:
+**...
+**	addp	v0\.8h, v0\.8h, v1\.8h
+**	ret
+*/
+int16x8_t test_vaddq_s16(int16x8_t v1, int16x8_t v2) {
+ int16x8_t v3 = vpaddq_s16(v1, v2);
+ return v3;
+}
+
+/*
+**test_vaddq_s32:
+**...
+**	addp	v0\.4s, v0\.4s, v1\.4s
+**	ret
+*/
+int32x4_t test_vaddq_s32(int32x4_t v1, int32x4_t v2) {
+ int32x4_t v3 = vpaddq_s32(v1, v2);
+ return v3;
+}
+
+/*
+**test_vaddq_s64:
+**...
+**	addp	v0\.2d, v0\.2d, v1\.2d
+**	ret
+*/
+int64x2_t test_vaddq_s64(int64x2_t v1, int64x2_t v2) {
+ int64x2_t v3 = vpaddq_s64(v1, v2);
+ return v3;
+}
+
+/*
+**test_vaddd_s64:
+**...
+**	addp	(d[0-9]+), v0\.2d
+**	fmov	x0, \1
+**	ret
+*/
+int64_t test_vaddd_s64(int64x2_t v1) {
+ int64_t v2 = vpaddd_s64(v1);
+ return v2;
+}
+
+/*
+**test_vaddl_s8:
+**...
+**	saddlp	v0\.4h, v0\.8b
+**	ret
+*/
+int16x4_t test_vaddl_s8(int8x8_t v1) {
+ int16x4_t v2 = vpaddl_s8(v1);
+ return v2;
+}
+
+/*
+**test_vaddlq_s8:
+**...
+**	saddlp	v0\.8h, v0\.16b
+**	ret
+*/
+int16x8_t test_vaddlq_s8(int8x16_t v1) {
+ int16x8_t v2 = vpaddlq_s8(v1);
+ return v2;
+}
+/*
+**test_vaddl_s16:
+**...
+**	saddlp	v0\.2s, v0\.4h
+**	ret
+*/
+int32x2_t test_vaddl_s16(int16x4_t v1) {
+ int32x2_t v2 = vpaddl_s16(v1);
+ return v2;
+}
+
+/*
+**test_vaddlq_s16:
+**...
+**	saddlp	v0\.4s, v0\.8h
+**	ret
+*/
+int32x4_t test_vaddlq_s16(int16x8_t v1) {
+ int32x4_t v2 = vpaddlq_s16(v1);
+ return v2;
+}
+
+/*
+**test_vaddl_s32:
+**...
+**	saddlp	v0\.1d, v0\.2s
+**	ret
+*/
+int64x1_t test_vaddl_s32(int32x2_t v1) {
+ int64x1_t v2 = vpaddl_s32(v1);
+ return v2;
+}
+
+/*
+**test_vaddlq_s32:
+**...
+**	saddlp	v0\.2d, v0\.4s
+**	ret
+*/
+int64x2_t test_vaddlq_s32(int32x4_t v1) {
+ int64x2_t v2 = vpaddlq_s32(v1);
+ return v2;
+}
+
+// UNSIGNED VADD INTRINSICS
+
+/*
+**test_vadd_u8:
+**...
+**	addp	v0\.8b, v0\.8b, v1\.8b
+**	ret
+*/
+uint8x8_t test_vadd_u8(uint8x8_t v1, uint8x8_t v2) {
+ uint8x8_t v3 = vpadd_u8(v1, v2);
+ return v3;
+}
+
+/*
+**test_vadd_u16:
+**...
+**	addp	v0\.4h, v0\.4h, v1\.4h
+**	ret
+*/
+uint16x4_t test_vadd_u16(uint16x4_t v1, uint16x4_t v2) {
+ uint16x4_t v3 = vpadd_u16(v1, v2);
+ return v3;
+}
+
+/*
+**test_vadd_u32:
+**...
+**	addp	v0\.2s, v0\.2s, v1\.2s
+**	ret
+*/
+uint32x2_t test_vadd_u32(uint32x2_t v1, uint32x2_t v2) {
+ uint32x2_t v3 = vpadd_u32(v1, v2);
+ return v3;
+}
+
+/*
+**test_vaddq_u8:
+**...
+**	addp	v0\.16b, v0\.16b, v1\.16b
+**	ret
+*/
+uint8x16_t test_vaddq_u8(uint8x16_t v1, uint8x16_t v2) {
+ uint8x16_t v3 = vpaddq_u8(v1, v2);
+ return v3;
+}
+
+/*
+**test_vaddq_u16:
+**...
+**	addp	v0\.8h, v0\.8h, v1\.8h
+**	ret
+*/
+uint16x8_t test_vaddq_u16(uint16x8_t v1, uint16x8_t v2) {
+ uint16x8_t v3 = vpaddq_u16(v1, v2);
+ return v3;
+}
+
+/*
+**test_vaddq_u32:
+**...
+**	addp	v0\.4s, v0\.4s, v1\.4s
+**	ret
+*/
+uint32x4_t test_vaddq_u32(uint32x4_t v1, uint32x4_t v2) {
+ uint32x4_t v3 = vpaddq_u32(v1, v2);
+ return v3;
+}
+
+/*
+**test_vaddq_u64:
+**...
+**	addp	v0\.2d, v0\.2d, v1\.2d
+**	ret
+*/
+uint64x2_t test_vaddq_u64(uint64x2_t v1, uint64x2_t v2) {
+ uint64x2_t v3 = vpaddq_u64(v1, v2);
+ return v3;
+}
+
+/*
+**test_vaddd_u64:
+**...
+**	addp	(d[0-9]+), v0\.2d
+**	fmov	x0, \1
+**	ret
+*/
+uint64_t test_vaddd_u64(uint64x2_t v1) {
+ uint64_t v2 = vpaddd_u64(v1);
+ return v2;
+}
+
+/*
+**test_vaddl_u8:
+**...
+**	uaddlp	v0\.4h, v0\.8b
+**	ret
+*/
+uint16x4_t test_vaddl_u8(uint8x8_t v1) {
+ uint16x4_t v2 = vpaddl_u8(v1);
+ return v2;
+}
+
+/*
+**test_vaddlq_u8:
+**...
+**	uadd

[PATCH v1 0/2] Aarch64: addp NEON big-endian fix [PR114890]

2024-07-03 Thread alfie.richards
From: Alfie Richards 

Hi All,

This fixes a case where the operands for the addp NEON intrinsic were 
erroneously swapped.

Regtested on aarch64-unknown-linux-gnu

Ok for master and GCC14.2?

Kind regards,
Alfie Richards

Alfie Richards (2):
  Aarch64: Add test for non-commutative SIMD intrinsic
  Aarch64, bugfix: Fix NEON bigendian addp intrinsic [PR114890]

 gcc/config/aarch64/aarch64-simd.md|   2 -
 .../aarch64/vector_intrinsics_asm.c   | 371 ++
 2 files changed, 371 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vector_intrinsics_asm.c

-- 
2.34.1



[PATCH v1 2/2] Aarch64, bugfix: Fix NEON bigendian addp intrinsic [PR114890]

2024-07-03 Thread alfie.richards

This change removes code that erroneously swapped the operands in big-endian
mode.  It also fixes the related test accordingly.

gcc/ChangeLog:

PR target/114890
* config/aarch64/aarch64-simd.md: Remove bigendian operand swap.

gcc/testsuite/ChangeLog:

PR target/114890
* gcc.target/aarch64/vector_intrinsics_asm.c: Remove xfail.
---
 gcc/config/aarch64/aarch64-simd.md   | 2 --
 gcc/testsuite/gcc.target/aarch64/vector_intrinsics_asm.c | 2 +-
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 868f4486218..095ef3228cc 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -7341,8 +7341,6 @@ (define_expand "aarch64_addp"
   nunits /= 2;
 rtx par_even = aarch64_gen_stepped_int_parallel (nunits, 0, 2);
 rtx par_odd = aarch64_gen_stepped_int_parallel (nunits, 1, 2);
-if (BYTES_BIG_ENDIAN)
-  std::swap (operands[1], operands[2]);
 emit_insn (gen_aarch64_addp_insn (operands[0], operands[1],
 	operands[2], par_even, par_odd));
 DONE;
diff --git a/gcc/testsuite/gcc.target/aarch64/vector_intrinsics_asm.c b/gcc/testsuite/gcc.target/aarch64/vector_intrinsics_asm.c
index b7d5620abab..e3dcd0830c8 100644
--- a/gcc/testsuite/gcc.target/aarch64/vector_intrinsics_asm.c
+++ b/gcc/testsuite/gcc.target/aarch64/vector_intrinsics_asm.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O2" } */
-/* { dg-final { check-function-bodies "**" "" "" { xfail be } } } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
 
 #include "arm_neon.h"
 


Re: [PATCH 1/2] aarch64: PR target/115457 Implement missing __ARM_FEATURE_BF16 macro

2024-07-03 Thread Kyrylo Tkachov


> On 3 Jul 2024, at 11:59, Kyrylo Tkachov  wrote:
> 
> External email: Use caution opening links or attachments
> 
> 
> Hi all,
> 
> The ACLE asks the user to test for __ARM_FEATURE_BF16 before using the
>  header but GCC doesn't set this up.
> LLVM does, so this is an inconsistency between the compilers.
> 
> This patch enables that macro for TARGET_BF16_FP.
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Pushing to trunk.
> I think it makes sense to back port this to branches that support the 
> arm_bf16.h header.
> I’ll prepare patches for those separately.
> 
> Thanks,
> Kyrill
> 
> gcc/
> 
>PR target/115457
>* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins):
>Define __ARM_FEATURE_BF16 for TARGET_BF16_FP.

This should be aarch64-c.cc of course.  I've fixed that up on commit and in
the second patch too.
Thanks,
Kyrill

> 
> gcc/testsuite/
> 
>PR target/115457
>* gcc.target/aarch64/acle/bf16_feature.c: New test.
> 
> Signed-off-by: Kyrylo Tkachov 
> <0001-aarch64-PR-target-115457-Implement-missing-__ARM_FEA.patch>



Re: [PATCH v1 0/2] Aarch64: addp NEON big-endian fix [PR114890]

2024-07-03 Thread Kyrylo Tkachov
Hi Alfie,

> On 3 Jul 2024, at 12:10, alfie.richa...@arm.com wrote:
> 
> External email: Use caution opening links or attachments
> 
> 
> From: Alfie Richards 
> 
> Hi All,
> 
> This fixes a case where the operands for the addp NEON intrinsic were 
> erroneously swapped.
> 
> Regtested on aarch64-unknown-linux-gnu
> 
> Ok for master and GCC14.2?

This is okay, though I would have done it as a single commit rather than 
splitting the test case separately and then updating it to remove the xfail.
But it’s okay as is anyway.
Do you need someone to commit it for you?
Thanks,
Kyrill

> 
> Kind regards,
> Alfie Richards
> 
> Alfie Richards (2):
>  Aarch64: Add test for non-commutative SIMD intrinsic
>  Aarch64, bugfix: Fix NEON bigendian addp intrinsic [PR114890]
> 
> gcc/config/aarch64/aarch64-simd.md|   2 -
> .../aarch64/vector_intrinsics_asm.c   | 371 ++
> 2 files changed, 371 insertions(+), 2 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/aarch64/vector_intrinsics_asm.c
> 
> --
> 2.34.1
> 



[PATCH v2] [FYI] Deduplicate explicitly-sized types (was: Re: [FYI] map packed field type to unpacked for debug info)

2024-07-03 Thread Alexandre Oliva
On Jun 13, 2024, Alexandre Oliva  wrote:

> I'll be back with an improved version.

When make_type_from_size is called with a biased type, for an entity
that isn't explicitly biased, we may refrain from reusing the given
type because it doesn't seem to match, and then proceed to create an
exact copy of that type.

Compute earlier the biased status of the expected type, early enough
for the suitability check of the given type.  Modify for_biased
instead of biased_p, so that biased_p remains with the given type's
status for the comparison.

Avoid creating unnecessary copies of types in make_type_from_size, by
caching and reusing previously-created identical types, similarly to
the caching of packable types.

While at that, fix two vaguely related issues:

- TYPE_DEBUG_TYPE's storage is shared with other sorts of references
to types, so it shouldn't be accessed unless
TYPE_CAN_HAVE_DEBUG_TYPE_P holds.

- When we choose the narrower/packed variant of a type as the main
debug info type, we fail to output its name if we fail to follow debug
type for the TYPE_NAME decl type in modified_type_die.

Regstrapped on x86_64-linux-gnu.  Pre-approved by Eric.


for  gcc/ada/ChangeLog

* gcc-interface/misc.cc (gnat_get_array_descr_info): Only follow
TYPE_DEBUG_TYPE if TYPE_CAN_HAVE_DEBUG_TYPE_P.
* gcc-interface/utils.cc (sized_type_hash): New struct.
(sized_type_hasher): New struct.
(sized_type_hash_table): New variable.
(init_gnat_utils): Allocate it.
(destroy_gnat_utils): Release it.
(sized_type_hasher::equal): New.
(hash_sized_type): New.
(canonicalize_sized_type): New.
(make_type_from_size): Use it to cache packed variants.  Fix
type reuse by combining biased_p and for_biased earlier.  Hold
the combination in for_biased, adjusting later uses.

for  gcc/ChangeLog

* dwarf2out.cc (modified_type_die): Follow name's debug type.

for  gcc/testsuite/ChangeLog

* gnat.dg/bias1.adb: Count occurrences of -7.*DW_AT_GNU_bias.
---
 gcc/ada/gcc-interface/misc.cc   |3 +
 gcc/ada/gcc-interface/utils.cc  |  116 +--
 gcc/dwarf2out.cc|7 ++
 gcc/testsuite/gnat.dg/bias1.adb |3 +
 4 files changed, 120 insertions(+), 9 deletions(-)

diff --git a/gcc/ada/gcc-interface/misc.cc b/gcc/ada/gcc-interface/misc.cc
index 4f6f6774fe702..f77629ce70bf6 100644
--- a/gcc/ada/gcc-interface/misc.cc
+++ b/gcc/ada/gcc-interface/misc.cc
@@ -967,7 +967,8 @@ gnat_get_array_descr_info (const_tree const_type,
 
   while (true)
{
- if (TYPE_DEBUG_TYPE (source_element_type))
+ if (TYPE_CAN_HAVE_DEBUG_TYPE_P (source_element_type)
+ && TYPE_DEBUG_TYPE (source_element_type))
source_element_type = TYPE_DEBUG_TYPE (source_element_type);
  else if (TYPE_IS_PADDING_P (source_element_type))
source_element_type
diff --git a/gcc/ada/gcc-interface/utils.cc b/gcc/ada/gcc-interface/utils.cc
index 0eb9af8d4a2d5..66e3192ea4fbd 100644
--- a/gcc/ada/gcc-interface/utils.cc
+++ b/gcc/ada/gcc-interface/utils.cc
@@ -364,6 +364,26 @@ struct pad_type_hasher : ggc_cache_ptr_hash
 
 static GTY ((cache)) hash_table *pad_type_hash_table;
 
+struct GTY((for_user)) sized_type_hash
+{
+  hashval_t hash;
+  tree type;
+};
+
+struct sized_type_hasher : ggc_cache_ptr_hash
+{
+  static inline hashval_t hash (sized_type_hash *t) { return t->hash; }
+  static bool equal (sized_type_hash *a, sized_type_hash *b);
+
+  static int
+  keep_cache_entry (sized_type_hash *&t)
+  {
+return ggc_marked_p (t->type);
+  }
+};
+
+static GTY ((cache)) hash_table *sized_type_hash_table;
+
 static tree merge_sizes (tree, tree, tree, bool, bool);
 static tree fold_bit_position (const_tree);
 static tree compute_related_constant (tree, tree);
@@ -421,6 +441,9 @@ init_gnat_utils (void)
 
   /* Initialize the hash table of padded types.  */
   pad_type_hash_table = hash_table::create_ggc (512);
+
+  /* Initialize the hash table of sized types.  */
+  sized_type_hash_table = hash_table::create_ggc (512);
 }
 
 /* Destroy data structures of the utils.cc module.  */
@@ -443,6 +466,10 @@ destroy_gnat_utils (void)
   /* Destroy the hash table of padded types.  */
   pad_type_hash_table->empty ();
   pad_type_hash_table = NULL;
+
+  /* Destroy the hash table of sized types.  */
+  sized_type_hash_table->empty ();
+  sized_type_hash_table = NULL;
 }
 
 /* GNAT_ENTITY is a GNAT tree node for an entity.  Associate GNU_DECL, a GCC
@@ -1350,6 +1377,79 @@ type_unsigned_for_rm (tree type)
   return false;
 }
 
+/* Return true iff the sized types are equivalent.  */
+
+bool
+sized_type_hasher::equal (sized_type_hash *t1, sized_type_hash *t2)
+{
+  tree type1, type2;
+
+  if (t1->hash != t2->hash)
+return false;
+
+  type1 = t1->type;
+  type2 = t2->type;
+
+  /* We consider sized types equivalent if they have the same name,
+ size, alignment, RM size, and biasing.  The r

[PATCH, FYI] [debug] Avoid dropping bits from num/den in fixed-point types

2024-07-03 Thread Alexandre Oliva


We used to use an unsigned 128-bit type to hold the numerator and
denominator used to represent the delta of a fixed-point type in debug
information, but there are cases in which that was not enough, and
more significant bits silently overflowed and got omitted from debug
information.

Introduce a mode in which UI_to_gnu selects a wide-enough unsigned
type, and use that to convert numerator and denominator.  While at
that, avoid exceeding the maximum precision for wide ints, and for
available int modes, when selecting a type to represent very wide
constants, falling back to 0/0 for unrepresentable fractions.

Regstrapped on x86_64-linux-gnu.  Pre-approved by Eric.  I'm checking it in.


for  gcc/ada/ChangeLog

* gcc-interface/cuintp.cc (UI_To_gnu): Add mode that selects a
wide enough unsigned type.  Fail if the constant exceeds the
representable numbers.
* gcc-interface/decl.cc (gnat_to_gnu_entity): Use it for
numerator and denominator of fixed-point types.  In case of
failure, fall back to an indeterminate fraction.
---
 gcc/ada/gcc-interface/cuintp.cc |   66 +--
 gcc/ada/gcc-interface/decl.cc   |   19 +--
 2 files changed, 64 insertions(+), 21 deletions(-)

diff --git a/gcc/ada/gcc-interface/cuintp.cc b/gcc/ada/gcc-interface/cuintp.cc
index cdf6c0197500d..1903c5af0f19a 100644
--- a/gcc/ada/gcc-interface/cuintp.cc
+++ b/gcc/ada/gcc-interface/cuintp.cc
@@ -35,6 +35,7 @@
 #include "tree.h"
 #include "inchash.h"
 #include "fold-const.h"
+#include "stor-layout.h"
 
 #include "ada.h"
 #include "types.h"
@@ -67,7 +68,9 @@ build_cst_from_int (tree type, HOST_WIDE_INT low)
 /* Similar to UI_To_Int, but return a GCC INTEGER_CST or REAL_CST node,
depending on whether TYPE is an integral or real type.  Overflow is tested
by the constant-folding used to build the node.  TYPE is the GCC type of
-   the resulting node.  */
+   the resulting node.  If TYPE is NULL, an unsigned integer type wide enough
+   to hold the entire constant is selected, and if no such type exists,
+   return NULL_TREE.  */
 
 tree
 UI_To_gnu (Uint Input, tree type)
@@ -77,8 +80,10 @@ UI_To_gnu (Uint Input, tree type)
  any such possible value for intermediate computations and then rely on a
  conversion back to TYPE to perform the bias adjustment when need be.  */
   tree comp_type
-= TREE_CODE (type) == INTEGER_TYPE && TYPE_BIASED_REPRESENTATION_P (type)
-  ? get_base_type (type) : type;
+= (!type ? gnat_type_for_size (32, 1)
+   : (TREE_CODE (type) == INTEGER_TYPE
+ && TYPE_BIASED_REPRESENTATION_P (type))
+   ? get_base_type (type) : type);
   tree gnu_ret;
 
   if (Input <= Uint_Direct_Last)
@@ -88,9 +93,14 @@ UI_To_gnu (Uint Input, tree type)
   Int Idx = (*Uints_Ptr)[Input - Uint_Table_Start].Loc;
   Pos Length = (*Uints_Ptr)[Input - Uint_Table_Start].Length;
   Int First = (*Udigits_Ptr)[Idx];
+  tree_code code = First < 0 ? MINUS_EXPR : PLUS_EXPR;
   tree gnu_base;
 
   gcc_assert (Length > 0);
+  /* The extension of unsigned types we use to try to fit the
+constant only works if we're dealing with nonnegative
+constants, but that's what we expect when !TYPE.  */
+  gcc_assert (type || First >= 0);
 
   /* The computations we perform below always require a type at least as
 large as an integer not to overflow.  FP types are always fine, but
@@ -103,22 +113,44 @@ UI_To_gnu (Uint Input, tree type)
   gnu_base = build_cst_from_int (comp_type, Base);
 
   gnu_ret = build_cst_from_int (comp_type, First);
-  if (First < 0)
-   for (Idx++, Length--; Length; Idx++, Length--)
- gnu_ret = fold_build2 (MINUS_EXPR, comp_type,
-fold_build2 (MULT_EXPR, comp_type,
- gnu_ret, gnu_base),
-build_cst_from_int (comp_type,
-(*Udigits_Ptr)[Idx]));
-  else
-   for (Idx++, Length--; Length; Idx++, Length--)
- gnu_ret = fold_build2 (PLUS_EXPR, comp_type,
-fold_build2 (MULT_EXPR, comp_type,
- gnu_ret, gnu_base),
-build_cst_from_int (comp_type,
-(*Udigits_Ptr)[Idx]));
+  for (Idx++, Length--; Length; Idx++, Length--)
+   for (;;)
+ {
+   tree elt, scaled, next_ret;
+   elt = build_cst_from_int (comp_type, (*Udigits_Ptr)[Idx]);
+   /* We want to detect overflows with an unsigned type when
+  TYPE is not given, but int_const_binop doesn't work for
+  e.g. floating-point TYPEs.  */
+   if (!type)
+ {
+   scaled = int_const_binop (MULT_EXPR, gnu_ret, gnu_base, -1);
+   next_ret = int_const_binop (code, scaled, elt,

Re: [PATCH v1 0/2] Aarch64: addp NEON big-endian fix [PR114890]

2024-07-03 Thread Alfie Richards
Hi Kyrill,

Okay, noted for the future!
Yes, I'd be happy for someone to commit this.

Kind regards,
Alfie

Sent from Outlook for iOS

From: Kyrylo Tkachov 
Sent: Wednesday, July 3, 2024 11:23:37 AM
To: Alfie Richards 
Cc: gcc-patches@gcc.gnu.org 
Subject: Re: [PATCH v1 0/2] Aarch64: addp NEON big-endian fix [PR114890]

Hi Alfie,

> On 3 Jul 2024, at 12:10, alfie.richa...@arm.com wrote:
>
> External email: Use caution opening links or attachments
>
>
> From: Alfie Richards 
>
> Hi All,
>
> This fixes a case where the operands for the addp NEON intrinsic were 
> erroneously swapped.
>
> Regtested on aarch64-unknown-linux-gnu
>
> Ok for master and GCC14.2?

This is okay, though I would have done it as a single commit rather than 
splitting the test case separately and then updating it to remove the xfail.
But it’s okay as is anyway.
Do you need someone to commit it for you?
Thanks,
Kyrill

>
> Kind regards,
> Alfie Richards
>
> Alfie Richards (2):
>  Aarch64: Add test for non-commutative SIMD intrinsic
>  Aarch64, bugfix: Fix NEON bigendian addp intrinsic [PR114890]
>
> gcc/config/aarch64/aarch64-simd.md|   2 -
> .../aarch64/vector_intrinsics_asm.c   | 371 ++
> 2 files changed, 371 insertions(+), 2 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/aarch64/vector_intrinsics_asm.c
>
> --
> 2.34.1
>



Re: [Fortran, Patch, PR 96992, V3] Fix Class arrays of different ranks are rejected as storage association argument

2024-07-03 Thread Andre Vehreschild
Hi Harald,

I am sorry for the long delay, but fixing the negative stride led from one
issue to the next. I finally got a version that does not regress. Please have a
look.

This patch has two parts:
1. The runtime library part in pr96992_3p1.patch and
2. the compiler changes in pr96992_3p2.patch.

My branch also carries the two patches from Paul for pr59104 and pr102689,
which might lead to small shifts when applying the patches.

NOTE, this patch adds internal packing and unpacking of class arrays similar to
the regular pack and unpack. I think this is necessary, because the regular
un-/pack does not use the vptr's _copy routine for moving data and therefore
may produce bugs.

The un-/pack_class routines are so far only used for converting a derived type
array to a class array. Extending their use when a UN-/PACK() is applied on a
class array is still to be done (as part of another PR).

Regtests fine on x86_64-pc-linux-gnu / Fedora 39.

Regards,
Andre

PS: @Paul I was able to trace my test failures with -Ox for x ∈ { 2, 3, s }
to initialization order, i.e. a member was set only after it was read.

On Wed, 19 Jun 2024 21:17:23 +0200
Harald Anlauf  wrote:

> Hi Andre,
>
> Am 19.06.24 um 09:07 schrieb Andre Vehreschild:
> > Hi Harald,
> >
> > thank you for the investigation and useful tips. I had to figure what went
> > wrong here, but I now figured, that the array needs repacking when a
> > negative stride is used (or at least a call to that routine, which then
> > fixes "stuff"). I have added it, freeing the memory allocated potentially
> > by pack, and also updated the testcase to include the negative stride.
>
> hmmm, the pack does not always get generated:
>
> module foo_mod
>implicit none
>type foo
>   integer :: i
>end type foo
> contains
>subroutine d1(x,n)
>  integer, intent(in) :: n
>  integer :: i
>  class (foo), intent(out) :: x(n)
>  select type(x)
>  class is(foo)
> x(:)%i = (/ (42 + i, i = 1, n ) /)
>  class default
> stop 1
>  end select
>end subroutine d1
>subroutine d2(x,n)
>  integer, intent(in) :: n
>  integer :: i
>  class (foo), intent(in) :: x(n,n,n)
>  select type (x)
>  class is (foo)
> print *,"d2:  ", x%i
> if ( any( x%i /= reshape((/ (42 + i, i = 1, n ** 3 ) /), [n, n,
> n] ))) stop 2
>  class default
> stop 3
>  end select
>end subroutine d2
>
>subroutine d3(x,n)
>  integer, intent(in) :: n
>  integer :: i
>  class (foo), intent(inout) :: x(n)
>  select type (x)
>  class is (foo)
> print *,"d3_1:", x%i
> x%i = -x%i   ! Simply negate elements
> print *,"d3_2:", x%i
>  class default
> stop 33
>  end select
>end subroutine d3
> end module foo_mod
> program main
>use foo_mod
>implicit none
>type (foo), dimension(:), allocatable :: f
>integer :: n, k, m
>n = 2
>allocate (f(n*n*n))
>! Original testcase:
>call d1(f,n*n*n)
>print *, "d1->:", f%i
>call d2(f,n)
>! Ensure that array f is ok:
>print *, "d2->:", f%i
>
>! The following shows that no appropriate internal pack is generated:
>call d1(f,n*n*n)
>print *, "d1->:", f%i
>m = n*n*n
>k = 3
>print *, "->d3:", f(1:m:k)%i
>call d3(f(1:m:k),1+(m-1)/k)
>print *, "d3->:", f(1:m:k)%i
>print *, "full:", f%i
>deallocate (f)
> end program main
>
>
> After the second version of your patch this prints:
>
>   d1->:  43  44  45  46  47
>  48  49  50
>   d2:43  44  45  46  47
>  48  49  50
>   d2->:  43  44  45  46  47
>  48  49  50
>   d1->:  43  44  45  46  47
>  48  49  50
>   ->d3:  43  46  49
>   d3_1:  43  44  45
>   d3_2: -43 -44 -45
>   d3->: -43  46  49
>   full: -43 -44 -45  46  47
>  48  49  50
>
> While the print properly handles f(1:m:k)%i, passing it as
> actual argument to subroutine d3 does not do pack/unpack.
>
> Can you have another look?
>
> Thanks,
> Harald
>
>
> > Regtests fine on x86_64-pc-linux-gnu/Fedora 39. Ok for mainline?
> >
> > Regards,
> > Andre
> >
> > On Sun, 16 Jun 2024 23:27:46 +0200
> > Harald Anlauf  wrote:
> >
> > << snipped for brevity >>>
> > --
> > Andre Vehreschild * Email: vehre ad gmx dot de
>


--
Andre Vehreschild * Email: vehre ad gmx dot de
From d429783a8b5c9dc9b6004ea8bc89247d1da63127 Mon Sep 17 00:00:00 2001
From: Andre Vehreschild 
Date: Fri, 28 Jun 2024 08:31:29 +0200
Subject: [PATCH 1/2] libgfortran: Add internal un/pack_class runtime
 functions.

Packing class arrays was done using t

Re: [PATCH] [i386] restore recompute to override opts after change [PR113719]

2024-07-03 Thread Alexandre Oliva
On Jun 27, 2024, Hongtao Liu  wrote:

> LGTM, thanks.

> On Thu, Jun 13, 2024 at 3:32 PM Alexandre Oliva  wrote:

>> for  gcc/ChangeLog
>> 
>> PR target/113719
>> * config/i386/i386-options.cc
>> (ix86_override_options_after_change_1): Add opts and opts_set
>> parms, operate on them, after factoring out of...
>> (ix86_override_options_after_change): ... this.  Restore calls
>> of ix86_default_align and ix86_recompute_optlev_based_flags.
>> (ix86_option_override_internal): Call the factored-out bits.

Thanks, I've finally put it in.

-- 
Alexandre Oliva, happy hacker    https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive


Re: [PATCH] RISC-V: Use tu policy for first-element vec_set [PR115725].

2024-07-03 Thread Kito Cheng
Ok for trunk and gcc 14

juzhe.zh...@rivai.ai wrote on Wed, 3 Jul 2024, 17:43:

> LGTM
>
> --
> juzhe.zh...@rivai.ai
>
>
> *From:* Robin Dapp 
> *Date:* 2024-07-03 17:39
> *To:* gcc-patches 
> *CC:* rdapp.gcc ; palmer ; Kito
> Cheng ; juzhe.zh...@rivai.ai; jeffreyalaw
> ; Li, Pan2 
> *Subject:* [PATCH] RISC-V: Use tu policy for first-element vec_set
> [PR115725].
> Hi,
>
> this patch changes the tail policy for vmv.s.x from ta to tu.
> By default the bug does not show up with qemu because qemu's
> current vmv.s.x implementation always uses the tail-undisturbed
> policy.  With a local qemu version that overwrites the tail
> with ones when the tail-agnostic policy is specified, the bug
> shows.
>
> Regtested on rv64gcv_zvfh.
>
> OK for trunk and GCC 14 backport?
>
> Regards
> Robin
>
> gcc/ChangeLog:
>
> * config/riscv/autovec.md: Add TU policy.
> * config/riscv/riscv-protos.h (enum insn_type): Define
> SCALAR_MOVE_MERGED_OP_TU.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-1.c: Adjust
> test expectation.
> * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-2.c: Ditto.
> * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-3.c: Ditto.
> * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-4.c: Ditto.
> ---
> gcc/config/riscv/autovec.md  |  3 ++-
> gcc/config/riscv/riscv-protos.h  |  4 
> .../riscv/rvv/autovec/vls-vlmax/vec_set-1.c  | 12 
> .../riscv/rvv/autovec/vls-vlmax/vec_set-2.c  | 12 
> .../riscv/rvv/autovec/vls-vlmax/vec_set-3.c  | 12 
> .../riscv/rvv/autovec/vls-vlmax/vec_set-4.c  | 12 
> 6 files changed, 22 insertions(+), 33 deletions(-)
>
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index 1af50a46c4c..aa7dd526804 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -1341,7 +1341,8 @@ (define_expand "vec_set<mode>"
>  {
>rtx ops[] = {operands[0], operands[0], operands[1]};
>riscv_vector::emit_nonvlmax_insn (code_for_pred_broadcast
> (<MODE>mode),
> - riscv_vector::SCALAR_MOVE_MERGED_OP, ops, CONST1_RTX (Pmode));
> + riscv_vector::SCALAR_MOVE_MERGED_OP_TU,
> + ops, CONST1_RTX (Pmode));
>  }
>else
>  {
> diff --git a/gcc/config/riscv/riscv-protos.h
> b/gcc/config/riscv/riscv-protos.h
> index 39b723a590b..064aa082742 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -524,6 +524,10 @@ enum insn_type : unsigned int
>SCALAR_MOVE_MERGED_OP = HAS_DEST_P | HAS_MASK_P | USE_ONE_TRUE_MASK_P
>   | HAS_MERGE_P | TDEFAULT_POLICY_P | MDEFAULT_POLICY_P
>   | UNARY_OP_P,
> +
> +  SCALAR_MOVE_MERGED_OP_TU = HAS_DEST_P | HAS_MASK_P | USE_ONE_TRUE_MASK_P
> +   | HAS_MERGE_P | TU_POLICY_P | MDEFAULT_POLICY_P
> +   | UNARY_OP_P,
> };
> enum vlmul_type
> diff --git
> a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-1.c
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-1.c
> index ecb160933d6..99b0f625c83 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-1.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-1.c
> @@ -64,14 +64,10 @@ typedef double vnx2df __attribute__((vector_size
> (16)));
> TEST_ALL1 (VEC_SET)
> TEST_ALL_VAR1 (VEC_SET_VAR1)
> -/* { dg-final { scan-assembler-times
> {vset[i]*vli\s+[a-z0-9,]+,\s*e8,\s*m1,\s*ta,\s*ma} 1 } } */
> -/* { dg-final { scan-assembler-times
> {vset[i]*vli\s+[a-z0-9,]+,\s*e8,\s*m1,\s*tu,\s*ma} 5 } } */
> -/* { dg-final { scan-assembler-times
> {vset[i]*vli\s+[a-z0-9,]+,\s*e16,\s*m1,\s*ta,\s*ma} 2 } } */
> -/* { dg-final { scan-assembler-times
> {vset[i]*vli\s+[a-z0-9,]+,\s*e16,\s*m1,\s*tu,\s*ma} 6 } } */
> -/* { dg-final { scan-assembler-times
> {vset[i]*vli\s+[a-z0-9,]+,\s*e32,\s*m1,\s*ta,\s*ma} 2 } } */
> -/* { dg-final { scan-assembler-times
> {vset[i]*vli\s+[a-z0-9,]+,\s*e32,\s*m1,\s*tu,\s*ma} 6 } } */
> -/* { dg-final { scan-assembler-times
> {vset[i]*vli\s+[a-z0-9,]+,\s*e64,\s*m1,\s*ta,\s*ma} 2 } } */
> -/* { dg-final { scan-assembler-times
> {vset[i]*vli\s+[a-z0-9,]+,\s*e64,\s*m1,\s*tu,\s*ma} 4 } } */
> +/* { dg-final { scan-assembler-times
> {vset[i]*vli\s+[a-z0-9,]+,\s*e8,\s*m1,\s*tu,\s*ma} 6 } } */
> +/* { dg-final { scan-assembler-times
> {vset[i]*vli\s+[a-z0-9,]+,\s*e16,\s*m1,\s*tu,\s*ma} 8 } } */
> +/* { dg-final { scan-assembler-times
> {vset[i]*vli\s+[a-z0-9,]+,\s*e32,\s*m1,\s*tu,\s*ma} 8 } } */
> +/* { dg-final { scan-assembler-times
> {vset[i]*vli\s+[a-z0-9,]+,\s*e64,\s*m1,\s*tu,\s*ma} 6 } } */
> /* { dg-final { scan-assembler-times {\tvmv.v.x} 13 } } */
> /* { dg-final { scan-assembler-times {\tvfmv.v.f} 8 } } */
> diff --git
> a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-2.c
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-2.c
> index 194abff77cc..64a40308eb1 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-2.c
> +++ b/gcc/testsuite/gcc

Re: [PATCH] match.pd: Fold x/sqrt(x) to sqrt(x)

2024-07-03 Thread Jennifer Schmitz


> On 2 Jul 2024, at 15:01, Richard Biener  wrote:
> 
> External email: Use caution opening links or attachments
> 
> 
> On Tue, 2 Jul 2024, Jennifer Schmitz wrote:
> 
>> This patch adds a pattern in match.pd folding x/sqrt(x) to sqrt(x) for 
>> -funsafe-math-optimizations. Test cases were added for double, float, and 
>> long double.
>> 
>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
>> Ok for mainline?
> 
> You put that into /* Fold comparisons against built-in math functions.  */
> but it fits more the group of simplifications around
> 
> /* Simplification of math builtins.  These rules must all be optimizations
>   as well as IL simplifications.  If there is a possibility that the new
>   form could be a pessimization, the rule should go in the
> canonicalization
> ...
> 
> where we already have sqrt related foldings.  Please put it there.
Done, thank you. I have bootstrapped and tested the patch again for the new 
location.
> 
> Otherwise this looks OK.
> 
> Richard.
> 
>> Signed-off-by: Jennifer Schmitz 
>> 
>> gcc/
>> 
>>  * match.pd: Fold x/sqrt(x) to sqrt(x).
>> 
>> gcc/testsuite/
>> 
>>  * gcc.dg/tree-ssa/sqrt_div.c: New test.
>> 
> 
> --
> Richard Biener 
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


fold_sqrt_v2.patch
Description: Binary data


smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH] middle-end/115426 - wrong gimplification of "rm" asm output operand

2024-07-03 Thread Jakub Jelinek
On Tue, Jun 11, 2024 at 01:20:50PM +0200, Richard Biener wrote:
> When the operand is gimplified to an extract of a register or a
> register we have to disallow memory as we otherwise fail to
> gimplify it properly.  Instead of
> 
>   __asm__("" : "=rm" __imag <r>);
> 
> we want
> 
>   __asm__("" : "=rm" D.2772);
>   _1 = REALPART_EXPR <r>;
>   r = COMPLEX_EXPR <_1, D.2772>;
> 
> otherwise SSA rewrite will fail and generate wrong code with 'r'
> left bare in the asm output.
> 
> Bootstrap and regtest in progress on x86_64-unknown-linux-gnu.
> 
> I've made the testcase hopefully generic enough (the bug used =X
> which I'm not sure is portable - I've used _Complex int so 'r'
> has a chance to work).

> --- a/gcc/gimplify.cc
> +++ b/gcc/gimplify.cc
> @@ -7044,6 +7044,22 @@ gimplify_asm_expr (tree *expr_p, gimple_seq *pre_p, 
> gimple_seq *post_p)
> ret = tret;
>   }
>  
> +  /* If the gimplified operand is a register we do not allow memory.  */
> +  if (allows_mem
> +   && (is_gimple_reg (TREE_VALUE (link))
> +   || (handled_component_p (TREE_VALUE (link))
> +   && is_gimple_reg (TREE_OPERAND (TREE_VALUE (link), 0)
> + {
> +   if (allows_reg)
> + allows_mem = 0;
> +   else
> + {
> +   error ("impossible constraint in %<asm%>");
> +   error ("non-memory output %d must stay in memory", i);
> +   return GS_ERROR;

Does this else part ever trigger or could it be just gcc_assert (allows_reg)?
E.g. C FE build_asm_expr has
  /* If the operand is going to end up in memory,
 mark it addressable.  */
  if (!allows_reg && !c_mark_addressable (output))
Or C++ FE finish_asm_stmt:
  /* If the operand is going to end up in memory,
 mark it addressable.  */
  if (!allows_reg && !cxx_mark_addressable (*op))

> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr115426.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-options "-std=gnu11" } */
> +
> +_Complex int fcs()
> +{
> +  _Complex int r;

It would be useful to initialize r or at least __real__ r
before the asm as we return it whole and I think the bug
should trigger with that too.

> +  __asm__("" : "=rm" (__imag__ r));
> +  return r;
> +}

Also, it would be nice to cover the "=m" case in another
function to make sure that still works.

Jakub



[PATCH] Remove redundant vector permute dump

2024-07-03 Thread Richard Biener
The following removes redundant dumping in vect permute vectorization.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

* tree-vect-slp.cc (vectorizable_slp_permutation_1): Remove
redundant dump.
---
 gcc/tree-vect-slp.cc | 10 --
 1 file changed, 10 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 5e4423c05b1..c62b0b5cf88 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -9702,16 +9702,6 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, 
gimple_stmt_iterator *gsi,
 }
 
   gcc_assert (perm.length () == SLP_TREE_LANES (node));
-  if (dump_p)
-{
-  dump_printf_loc (MSG_NOTE, vect_location,
-  "vectorizing permutation");
-  for (unsigned i = 0; i < perm.length (); ++i)
-   dump_printf (MSG_NOTE, " op%u[%u]", perm[i].first, perm[i].second);
-  if (repeating_p)
-   dump_printf (MSG_NOTE, " (repeat %d)\n", SLP_TREE_LANES (node));
-  dump_printf (MSG_NOTE, "\n");
-}
 
   /* REPEATING_P is true if every output vector is guaranteed to use the
  same permute vector.  We can handle that case for both variable-length
-- 
2.35.3


Re: [PATCH] middle-end/115426 - wrong gimplification of "rm" asm output operand

2024-07-03 Thread Richard Biener
On Wed, 3 Jul 2024, Jakub Jelinek wrote:

> On Tue, Jun 11, 2024 at 01:20:50PM +0200, Richard Biener wrote:
> > When the operand is gimplified to an extract of a register or a
> > register we have to disallow memory as we otherwise fail to
> > gimplify it properly.  Instead of
> > 
> >   __asm__("" : "=rm" __imag <r>);
> > 
> > we want
> > 
> >   __asm__("" : "=rm" D.2772);
> >   _1 = REALPART_EXPR <r>;
> >   r = COMPLEX_EXPR <_1, D.2772>;
> > 
> > otherwise SSA rewrite will fail and generate wrong code with 'r'
> > left bare in the asm output.
> > 
> > Bootstrap and regtest in progress on x86_64-unknown-linux-gnu.
> > 
> > I've made the testcase hopefully generic enough (the bug used =X
> > which I'm not sure is portable - I've used _Complex int so 'r'
> > has a chance to work).
> 
> > --- a/gcc/gimplify.cc
> > +++ b/gcc/gimplify.cc
> > @@ -7044,6 +7044,22 @@ gimplify_asm_expr (tree *expr_p, gimple_seq *pre_p, 
> > gimple_seq *post_p)
> >   ret = tret;
> > }
> >  
> > +  /* If the gimplified operand is a register we do not allow memory.  
> > */
> > +  if (allows_mem
> > + && (is_gimple_reg (TREE_VALUE (link))
> > + || (handled_component_p (TREE_VALUE (link))
> > + && is_gimple_reg (TREE_OPERAND (TREE_VALUE (link), 0)
> > +   {
> > + if (allows_reg)
> > +   allows_mem = 0;
> > + else
> > +   {
> > + error ("impossible constraint in %<asm%>");
> > + error ("non-memory output %d must stay in memory", i);
> > + return GS_ERROR;
> 
> Does this else part ever trigger or could it be just gcc_assert (allows_reg)?

I've added this just for completeness, so I think I can make it an
assert if we consider this invalid GENERIC.

> E.g. C FE build_asm_expr has
> /* If the operand is going to end up in memory,
>mark it addressable.  */
> if (!allows_reg && !c_mark_addressable (output))
> Or C++ FE finish_asm_stmt:
>   /* If the operand is going to end up in memory,
>  mark it addressable.  */
>   if (!allows_reg && !cxx_mark_addressable (*op))
> 
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/pr115426.c
> > @@ -0,0 +1,9 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-std=gnu11" } */
> > +
> > +_Complex int fcs()
> > +{
> > +  _Complex int r;
> 
> It would be useful to initialize r or at least __real__ r
> before the asm as we return it whole and I think the bug
> should trigger with that too.

The bug also triggers for

_Complex int fcs (_Complex int r)
{
  __asm__("" : "=rm" (__imag__ r));
  return r;
}


> > +  __asm__("" : "=rm" (__imag__ r));
> > +  return r;
> > +}
> 
> Also, it would be nice to cover also the "=m" case in another
> function to make sure that still works.

Done.

I'm re-testing the following.

Richard.

>From 40023cac83562a1451aba550533d042fec1c144e Mon Sep 17 00:00:00 2001
From: Richard Biener 
Date: Tue, 11 Jun 2024 13:11:08 +0200
Subject: [PATCH] middle-end/115426 - wrong gimplification of "rm" asm output
 operand
To: gcc-patches@gcc.gnu.org

When the operand is gimplified to an extract of a register or a
register we have to disallow memory as we otherwise fail to
gimplify it properly.  Instead of

  __asm__("" : "=rm" __imag <r>);

we want

  __asm__("" : "=rm" D.2772);
  _1 = REALPART_EXPR <r>;
  r = COMPLEX_EXPR <_1, D.2772>;

otherwise SSA rewrite will fail and generate wrong code with 'r'
left bare in the asm output.

PR middle-end/115426
* gimplify.cc (gimplify_asm_expr): Handle "rm" output
constraint gimplified to a register (operation).

* gcc.dg/pr115426.c: New testcase.
---
 gcc/gimplify.cc | 10 ++
 gcc/testsuite/gcc.dg/pr115426.c | 14 ++
 2 files changed, 24 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pr115426.c

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 04875e8df29..13189ddee99 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -7044,6 +7044,16 @@ gimplify_asm_expr (tree *expr_p, gimple_seq *pre_p, 
gimple_seq *post_p)
  ret = tret;
}
 
+  /* If the gimplified operand is a register we do not allow memory.  */
+  if (allows_mem
+ && (is_gimple_reg (TREE_VALUE (link))
+ || (handled_component_p (TREE_VALUE (link))
+ && is_gimple_reg (TREE_OPERAND (TREE_VALUE (link), 0)
+   {
+ gcc_assert (allows_reg);
+ allows_mem = 0;
+   }
+
   /* If the constraint does not allow memory make sure we gimplify
  it to a register if it is not already but its base is.  This
 happens for complex and vector components.  */
diff --git a/gcc/testsuite/gcc.dg/pr115426.c b/gcc/testsuite/gcc.dg/pr115426.c
new file mode 100644
index 000..02bfc3f21fa
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr115426.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-std=gnu11" } */
+
+_Complex int fcs (_Complex int r)
+{
+  __asm__("" : "=rm" (__ima

Re: [PATCH] match.pd: Fold x/sqrt(x) to sqrt(x)

2024-07-03 Thread Richard Biener
On Wed, 3 Jul 2024, Jennifer Schmitz wrote:

> 

OK.

Thanks,
Richard.

> > On 2 Jul 2024, at 15:01, Richard Biener  wrote:
> > 
> > External email: Use caution opening links or attachments
> > 
> > 
> > On Tue, 2 Jul 2024, Jennifer Schmitz wrote:
> > 
> >> This patch adds a pattern in match.pd folding x/sqrt(x) to sqrt(x) for 
> >> -funsafe-math-optimizations. Test cases were added for double, float, and 
> >> long double.
> >> 
> >> The patch was bootstrapped and regtested on aarch64-linux-gnu, no 
> >> regression.
> >> Ok for mainline?
> > 
> > You put that into /* Fold comparisons against built-in math functions.  */
> > but it fits more the group of simplifications around
> > 
> > /* Simplification of math builtins.  These rules must all be optimizations
> >   as well as IL simplifications.  If there is a possibility that the new
> >   form could be a pessimization, the rule should go in the
> > canonicalization
> > ...
> > 
> > where we already have sqrt related foldings.  Please put it there.
> Done, thank you. I have bootstrapped and tested the patch again for the new 
> location.
> > 
> > Otherwise this looks OK.
> > 
> > Richard.
> > 
> >> Signed-off-by: Jennifer Schmitz 
> >> 
> >> gcc/
> >> 
> >>  * match.pd: Fold x/sqrt(x) to sqrt(x).
> >> 
> >> gcc/testsuite/
> >> 
> >>  * gcc.dg/tree-ssa/sqrt_div.c: New test.
> >> 
> > 
> > --
> > Richard Biener 
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] middle-end/115426 - wrong gimplification of "rm" asm output operand

2024-07-03 Thread Jakub Jelinek
On Wed, Jul 03, 2024 at 02:21:25PM +0200, Richard Biener wrote:
> I'm re-testing the following.
> 
> Richard.
> 
> >From 40023cac83562a1451aba550533d042fec1c144e Mon Sep 17 00:00:00 2001
> From: Richard Biener 
> Date: Tue, 11 Jun 2024 13:11:08 +0200
> Subject: [PATCH] middle-end/115426 - wrong gimplification of "rm" asm output
>  operand
> To: gcc-patches@gcc.gnu.org
> 
> When the operand is gimplified to an extract of a register or a
> register we have to disallow memory as we otherwise fail to
> gimplify it properly.  Instead of
> 
>   __asm__("" : "=rm" __imag <r>);
> 
> we want
> 
>   __asm__("" : "=rm" D.2772);
>   _1 = REALPART_EXPR <r>;
>   r = COMPLEX_EXPR <_1, D.2772>;
> 
> otherwise SSA rewrite will fail and generate wrong code with 'r'
> left bare in the asm output.
> 
>   PR middle-end/115426
>   * gimplify.cc (gimplify_asm_expr): Handle "rm" output
>   constraint gimplified to a register (operation).
> 
>   * gcc.dg/pr115426.c: New testcase.

LGTM, thanks.

Jakub



Re: [PATCH] middle-end/115426 - wrong gimplification of "rm" asm output operand

2024-07-03 Thread Jakub Jelinek
On Wed, Jul 03, 2024 at 02:24:19PM +0200, Jakub Jelinek wrote:
> > >From 40023cac83562a1451aba550533d042fec1c144e Mon Sep 17 00:00:00 2001
> > From: Richard Biener 
> > Date: Tue, 11 Jun 2024 13:11:08 +0200
> > Subject: [PATCH] middle-end/115426 - wrong gimplification of "rm" asm output
> >  operand
> > To: gcc-patches@gcc.gnu.org
> > 
> > When the operand is gimplified to an extract of a register or a
> > register we have to disallow memory as we otherwise fail to
> > gimplify it properly.  Instead of
> > 
> >   __asm__("" : "=rm" __imag <r>);
> > 
> > we want
> > 
> >   __asm__("" : "=rm" D.2772);
> >   _1 = REALPART_EXPR <r>;
> >   r = COMPLEX_EXPR <_1, D.2772>;
> > 
> > otherwise SSA rewrite will fail and generate wrong code with 'r'
> > left bare in the asm output.
> > 
> > PR middle-end/115426
> > * gimplify.cc (gimplify_asm_expr): Handle "rm" output
> > constraint gimplified to a register (operation).
> > 
> > * gcc.dg/pr115426.c: New testcase.
> 
> LGTM, thanks.

Actually, I wonder if that assert won't trigger on something invalid like
"=i" (__imag__ r)
which previously got rejected only during expansion.

So maybe instead of the assert just do your new handling just for allows_reg
&& allows_mem and leave the rest as before.

Jakub



Re: [PATCH] middle-end/115426 - wrong gimplification of "rm" asm output operand

2024-07-03 Thread Richard Biener
On Wed, 3 Jul 2024, Jakub Jelinek wrote:

> On Wed, Jul 03, 2024 at 02:24:19PM +0200, Jakub Jelinek wrote:
> > > >From 40023cac83562a1451aba550533d042fec1c144e Mon Sep 17 00:00:00 2001
> > > From: Richard Biener 
> > > Date: Tue, 11 Jun 2024 13:11:08 +0200
> > > Subject: [PATCH] middle-end/115426 - wrong gimplification of "rm" asm 
> > > output
> > >  operand
> > > To: gcc-patches@gcc.gnu.org
> > > 
> > > When the operand is gimplified to an extract of a register or a
> > > register we have to disallow memory as we otherwise fail to
> > > gimplify it properly.  Instead of
> > > 
> > >   __asm__("" : "=rm" __imag <r>);
> > > 
> > > we want
> > > 
> > >   __asm__("" : "=rm" D.2772);
> > >   _1 = REALPART_EXPR <r>;
> > >   r = COMPLEX_EXPR <_1, D.2772>;
> > > 
> > > otherwise SSA rewrite will fail and generate wrong code with 'r'
> > > left bare in the asm output.
> > > 
> > >   PR middle-end/115426
> > >   * gimplify.cc (gimplify_asm_expr): Handle "rm" output
> > >   constraint gimplified to a register (operation).
> > > 
> > >   * gcc.dg/pr115426.c: New testcase.
> > 
> > LGTM, thanks.
> 
> Actually, wonder if that assert won't trigger on something invalid like
> "=i" (__imag__ r)
> which previously got rejected only during expansion.

This case works - the path only triggers when allows_mem so I think it
should be safe.

> So maybe instead of the assert just do your new handling just for allows_reg
> && allows_mem and leave the rest as before.

But "=mi" would be a valid constraint (even if a literal immediate
would never be OK there)?

But yeah, && allows_reg would make it obviously safe.  I'll adjust
and re-fire the testing.

Richard.

>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] middle-end/115426 - wrong gimplification of "rm" asm output operand

2024-07-03 Thread Jakub Jelinek
On Wed, Jul 03, 2024 at 02:37:12PM +0200, Richard Biener wrote:
> > So maybe instead of the assert just do your new handling just for allows_reg
> > && allows_mem and leave the rest as before.
> 
> But "=mi" would be a valid constraint (even if a literal immediate
> would be never OK there)?

I'd say so; the compiler can choose, and while "i" won't ever work there,
"m" can.  But then the FEs would try to make it addressable because it is
!allows_reg.  Still, I think it is better to just guard and not assert.

> But yeah, && allows_reg would make it obviously safe.  I'll adjust
> and re-fire the testing.

Thanks.

Jakub



Re: [PATCH] match.pd: Fold x/sqrt(x) to sqrt(x)

2024-07-03 Thread Kyrylo Tkachov


> On 3 Jul 2024, at 14:22, Richard Biener  wrote:
> 
> External email: Use caution opening links or attachments
> 
> 
> On Wed, 3 Jul 2024, Jennifer Schmitz wrote:
> 
>> 
> 
> OK.

I’ve pushed the patch on Jennifer’s behalf with 
8dc5ad3ce8d4d2cd6cc2b7516d282395502fdf7d .
One thing I noticed is that the patch had DOS-style line endings (which render
as ^M in many Linux editors and break common tools like patch -p1).
You may want to have a look at your editor settings to ensure that it uses
Unix line endings.
In vim that is something like "set ff=unix".
I've fixed them up in the patch before committing in this case.

Welcome to the GCC community!
Thanks,
Kyrill


> 
> Thanks,
> Richard.
> 
>>> On 2 Jul 2024, at 15:01, Richard Biener  wrote:
>>> 
>>> On Tue, 2 Jul 2024, Jennifer Schmitz wrote:
>>> 
 This patch adds a pattern in match.pd folding x/sqrt(x) to sqrt(x) for 
 -funsafe-math-optimizations. Test cases were added for double, float, and 
 long double.
 
 The patch was bootstrapped and regtested on aarch64-linux-gnu, no 
 regression.
 Ok for mainline?
>>> 
>>> You put that into /* Fold comparisons against built-in math functions.  */
>>> but it fits more the group of simplifications around
>>> 
>>> /* Simplification of math builtins.  These rules must all be optimizations
>>>  as well as IL simplifications.  If there is a possibility that the new
>>>  form could be a pessimization, the rule should go in the
>>> canonicalization
>>> ...
>>> 
>>> where we already have sqrt related foldings.  Please put it there.
>> Done, thank you. I have bootstrapped and tested the patch again for the new 
>> location.
>>> 
>>> Otherwise this looks OK.
>>> 
>>> Richard.
>>> 
 Signed-off-by: Jennifer Schmitz 
 
 gcc/
 
 * match.pd: Fold x/sqrt(x) to sqrt(x).
 
 gcc/testsuite/
 
 * gcc.dg/tree-ssa/sqrt_div.c: New test.
 
>>> 
>>> --
>>> Richard Biener 
>>> SUSE Software Solutions Germany GmbH,
>>> Frankenstrasse 146, 90461 Nuernberg, Germany;
>>> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
>> 
> 
> --
> Richard Biener 
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)




[PATCH 0/5][v2] SLP load and store permute lowering

2024-07-03 Thread Richard Biener


This completes (functionality-wise) the work of bringing non-SLP
"interleaving" group vectorization to SLP.  I've chosen to keep
[1/5] exactly the same as posted last time, though that might confuse
things since [2/5] largely refactors it.

The series reflects how development worked out: [2/5] completes
[1/5] by handling multi-level interleaving steps
and implementing a crude fallback for the case where SLP groups
lanes in a way not compatible with an interleaving scheme
(we fail to split the SLP into multiple single-lane SLP instances
when the permute isn't supported).

[3/5] then goes on to handle gaps; with the prerequisite of handling
NULL in SLP_TREE_SCALAR_STMTS done (pushed this morning), it's
quite trivial to add.

[4/5] adds support for group-size of three on the load side

[5/5] adds support for group-size of three (and power-of-two multiples of 
that) on the store side.

I've bootstrapped and tested the series (and individual steps)
on x86_64-unknown-linux-gnu.  I've also built SPEC CPU 2017 with [4/5].

As far as merging goes, I will start with [5/5], which is quite
independent.  I'll squash the rest; I'm as yet undecided on how
and whether to populate the bst_map (I think the SLP pattern
matching expects the nodes to be there).

With the change we will still have load permutations in some cases.

Code generation with SLP is sometimes worse than without, but that
happens only when SLP discovery has some multi-lane sub-graphs, for
the all-single-lane case it should be good.

Comments welcome, especially on which parts (permutes) are unlikely
to work for VLA, so I can put comments in the code.

Thanks,
Richard.


[PATCH 1/5] lower SLP load permutation to interleaving

2024-07-03 Thread Richard Biener
The following emulates classical interleaving for SLP load permutes
that we are unlikely to handle natively.  This is to handle cases
where interleaving (or load/store-lanes) is the optimal choice for
vectorizing even when we are doing that within SLP.  An example
would be

void foo (int * __restrict a, int * b)
{
  for (int i = 0; i < 16; ++i)
{
  a[4*i + 0] = b[4*i + 0] * 3;
  a[4*i + 1] = b[4*i + 1] + 3;
  a[4*i + 2] = (b[4*i + 2] * 3 + 3);
  a[4*i + 3] = b[4*i + 3] * 3;
}
}

where currently the SLP store is merging four single-lane SLP
sub-graphs but none of the loads in it can be code-generated
with V4SImode vectors and a VF of four as the permutes would need
three vectors.

The patch introduces a lowering phase after SLP discovery but
before SLP pattern recognition or permute optimization that
analyzes all loads from the same dataref group and creates an
interleaving scheme starting from an unpermuted load.

What can be handled is quite restrictive, matching only a subset
of the non-SLP interleaving cases (the power-of-two group size
ones, and of those only cases without gaps).  The non-SLP interleaving
vectorization can additionally handle sizes 3 and 5 - but I am not
sure if it's possible to do that in a VL agnostic way.  It
should be still possible to set up the SLP graph in a way that
a load-lane could be matched from SLP pattern recognition.

As said, gaps are currently not handled - for SLP we have a
representational issue in that SLP_TREE_SCALAR_STMTS for "gap lanes"
would need to be filled in some way (even if we just push NULL).

The patch misses multi-level even/odd handling as well as CSEing
intermediate generated permutes.  Both are quite straightforward
to add, but perhaps there's a better or more general strategy
for lowering?  The main goal of the patch is to avoid falling
back to non-SLP for cases the interleaving code handles.

* tree-vect-slp.cc (vllp_cmp): New function.
(vect_lower_load_permutations): Likewise.
(vect_analyze_slp): Call it.
---
 gcc/tree-vect-slp.cc | 287 +++
 1 file changed, 287 insertions(+)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index b88bba44760..1d5c4d99549 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -3889,6 +3889,287 @@ vect_analyze_slp_instance (vec_info *vinfo,
   return res;
 }
 
+/* qsort comparator ordering SLP load nodes.  */
+
+static int
+vllp_cmp (const void *a_, const void *b_)
+{
+  const slp_tree a = *(const slp_tree *)a_;
+  const slp_tree b = *(const slp_tree *)b_;
+  stmt_vec_info a0 = SLP_TREE_SCALAR_STMTS (a)[0];
+  stmt_vec_info b0 = SLP_TREE_SCALAR_STMTS (b)[0];
+  if (STMT_VINFO_GROUPED_ACCESS (a0)
+  && STMT_VINFO_GROUPED_ACCESS (b0)
+  && DR_GROUP_FIRST_ELEMENT (a0) == DR_GROUP_FIRST_ELEMENT (b0))
+{
+  /* Same group, order after lanes used.  */
+  if (SLP_TREE_LANES (a) < SLP_TREE_LANES (b))
+   return 1;
+  else if (SLP_TREE_LANES (a) > SLP_TREE_LANES (b))
+   return -1;
+  else
+   {
+ /* Try to order loads using the same lanes together, breaking
+the tie with the lane number that first differs.  */
+ if (!SLP_TREE_LOAD_PERMUTATION (a).exists ()
+ && !SLP_TREE_LOAD_PERMUTATION (b).exists ())
+   return 0;
+ else if (SLP_TREE_LOAD_PERMUTATION (a).exists ()
+  && !SLP_TREE_LOAD_PERMUTATION (b).exists ())
+   return 1;
+ else if (!SLP_TREE_LOAD_PERMUTATION (a).exists ()
+  && SLP_TREE_LOAD_PERMUTATION (b).exists ())
+   return -1;
+ else
+   {
+ for (unsigned i = 0; i < SLP_TREE_LANES (a); ++i)
+   if (SLP_TREE_LOAD_PERMUTATION (a)[i]
+   != SLP_TREE_LOAD_PERMUTATION (b)[i])
+ {
+   /* In-order lane first, that's what the above case for
+  no permutation does.  */
+   if (SLP_TREE_LOAD_PERMUTATION (a)[i] == i)
+ return -1;
+   else if (SLP_TREE_LOAD_PERMUTATION (b)[i] == i)
+ return 1;
+   else if (SLP_TREE_LOAD_PERMUTATION (a)[i]
+< SLP_TREE_LOAD_PERMUTATION (b)[i])
+ return -1;
+   else
+ return 1;
+ }
+ return 0;
+   }
+   }
+}
+  else /* Different groups or non-groups.  */
+{
+  /* Order groups as their first element to keep them together.  */
+  if (STMT_VINFO_GROUPED_ACCESS (a0))
+   a0 = DR_GROUP_FIRST_ELEMENT (a0);
+  if (STMT_VINFO_GROUPED_ACCESS (b0))
+   b0 = DR_GROUP_FIRST_ELEMENT (b0);
+  if (a0 == b0)
+   return 0;
+  /* Tie using UID.  */
+  else if (gimple_uid (STMT_VINFO_STMT (a0))
+  < gimple_uid (STMT_VINFO_STMT (b0)))
+   return -1;
+  else
+   {
+ gcc_assert (gimple_ui

RE: [PATCH v1] Match: Allow more types truncation for .SAT_TRUNC

2024-07-03 Thread Li, Pan2
> OK.

Committed, thanks Richard.

Pan

-Original Message-
From: Richard Biener  
Sent: Wednesday, July 3, 2024 5:04 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Match: Allow more types truncation for .SAT_TRUNC

On Tue, Jul 2, 2024 at 3:38 AM  wrote:
>
> From: Pan Li 
>
> The .SAT_TRUNC has input and output types, i.e. a conversion from
> itype to otype with sizeof (otype) < sizeof (itype).  The
> previous patch only allowed sizeof (otype) == sizeof (itype) / 2,
> but we actually have 1/4 and 1/8 truncations as well.
>
> This patch would like to support more truncation types when
> sizeof (otype) < sizeof (itype).  The truncations below will be
> covered.
>
> * uint64_t => uint8_t
> * uint64_t => uint16_t
> * uint64_t => uint32_t
> * uint32_t => uint8_t
> * uint32_t => uint16_t
> * uint16_t => uint8_t
>
> The below test suites are passed for this patch:
> 1. The rv64gcv fully regression tests.
> 2. The rv64gcv build with glibc.
> 3. The x86 bootstrap tests.
> 4. The x86 fully regression tests.

OK.

> gcc/ChangeLog:
>
> * match.pd: Allow any otype is less than itype truncation.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 7fff7b5f9fe..f708f4622bd 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3239,16 +3239,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (match (unsigned_integer_sat_trunc @0)
>   (bit_ior:c (negate (convert (gt @0 INTEGER_CST@1)))
> (convert @0))
> - (with {
> + (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> +  && TYPE_UNSIGNED (TREE_TYPE (@0)))
> + (with
> +  {
> unsigned itype_precision = TYPE_PRECISION (TREE_TYPE (@0));
> unsigned otype_precision = TYPE_PRECISION (type);
> -   wide_int trunc_max = wi::mask (itype_precision / 2, false, 
> itype_precision);
> +   wide_int trunc_max = wi::mask (otype_precision, false, itype_precision);
> wide_int int_cst = wi::to_wide (@1, itype_precision);
>}
> -  (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> -   && TYPE_UNSIGNED (TREE_TYPE (@0))
> -   && otype_precision < itype_precision
> -   && wi::eq_p (trunc_max, int_cst)
> +  (if (otype_precision < itype_precision && wi::eq_p (trunc_max, 
> int_cst))
>
>  /* x >  y  &&  x != XXX_MIN  -->  x > y
> x >  y  &&  x == XXX_MIN  -->  false . */
> --
> 2.34.1
>


[PATCH 2/5] extend SLP load permutation lowering

2024-07-03 Thread Richard Biener
The following extends the SLP load permutation lowering from single-level
interleaving to handle the case where we need multiple levels, and adds
fallback handling for when the required permutes do not match
interleaving but would need three vectors to implement (and thus
fail).  With the change all single-lane SLP instances should be
supported with interleaving (with similar constraints as for non-SLP,
but not implementing the identical 3 and 5 element group special
cases).

It also handles

void foo (int * __restrict a, int * b)
{
  for (int i = 0; i < 16; ++i)
{
  a[8*i + 0] = b[8*i + 0] * 3;
  a[8*i + 1] = b[8*i + 1] + 3;
  a[8*i + 2] = (b[8*i + 2] * 3 + 3);
  a[8*i + 3] = b[8*i + 3] * 3;
  a[8*i + 4] = b[8*i + 4] * 3;
  a[8*i + 5] = b[8*i + 5] + 3;
  a[8*i + 6] = (b[8*i + 6] * 3 + 3);
  a[8*i + 7] = b[8*i + 7] * 3;
}
}

albeit with 58 instead of 48 permutes.  The non-interleaving fallback
needs more work.

Next up is supporting gaps.

* tree-vect-slp.cc (vect_lower_load_permutations): Support
multi-level interleaving.  Support non-even/odd permutes.

* gcc.dg/vect/slp-11a.c: Expect SLP.
* gcc.dg/vect/slp-12a.c: Likewise.
---
 gcc/testsuite/gcc.dg/vect/slp-11a.c |   2 +-
 gcc/testsuite/gcc.dg/vect/slp-12a.c |   2 +-
 gcc/tree-vect-slp.cc| 199 +---
 3 files changed, 122 insertions(+), 81 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/slp-11a.c 
b/gcc/testsuite/gcc.dg/vect/slp-11a.c
index fcb7cf6c7a2..2efa1796757 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-11a.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-11a.c
@@ -72,4 +72,4 @@ int main (void)
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { 
vect_strided8 && vect_int_mult } } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { 
! { vect_strided8 && vect_int_mult } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" } 
} */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" } 
} */
diff --git a/gcc/testsuite/gcc.dg/vect/slp-12a.c 
b/gcc/testsuite/gcc.dg/vect/slp-12a.c
index 2f98dc9da0b..fedf27b69d2 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-12a.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-12a.c
@@ -80,5 +80,5 @@ int main (void)
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { 
vect_strided8 && vect_int_mult } } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { 
! { vect_strided8 && vect_int_mult } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { 
target { { vect_strided8 && {! vect_load_lanes } } && vect_int_mult } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { 
target { { vect_strided8 && {! vect_load_lanes } } && vect_int_mult } } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { 
target { ! { vect_strided8 && vect_int_mult } } } } } */
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 1d5c4d99549..6f3822af950 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -3993,7 +3993,9 @@ vect_lower_load_permutations (loop_vec_info loop_vinfo,
return;
   group_lanes++;
 }
-  /* Only a power-of-two number of lanes matches interleaving.  */
+  /* Only a power-of-two number of lanes matches interleaving with N levels.
+ ???  An even number of lanes could be reduced to 1<= group_lanes / 2)
continue;
 
-  /* Lower by reducing the group to half its size using an
-interleaving scheme.  For this try to compute whether all
-elements needed for this load are in even or odd elements of
-an even/odd decomposition with N consecutive elements.
-Thus { e, e, o, o, e, e, o, o } woud be an even/odd decomposition
-with N == 2.  */
-  unsigned even = (1 << ceil_log2 (DR_GROUP_SIZE (first))) - 1;
-  unsigned odd = even;
-  for (unsigned l : SLP_TREE_LOAD_PERMUTATION (load))
-   {
- even &= ~l;
- odd &= l;
-   }
-  /* Give up when this doesn't match up with an interleaving scheme.  */
-  if (!even && !odd)
-   continue;
-
   /* First build (and possibly re-use) a load node for the
 unpermuted group.  */
   vec stmts;
@@ -4047,66 +4038,105 @@ vect_lower_load_permutations (loop_vec_info loop_vinfo,
final_perm.quick_push
  (std::make_pair (0, SLP_TREE_LOAD_PERMUTATION (load)[i]));
 
-  /* Now build an even or odd extraction from the unpermuted load.  */
-  lane_permutation_t perm;
-  perm.create (group_lanes / 2);
-  unsigned level;
-  if (even
- && ((level = 1 << ctz_hwi (even)), true)
- && group_lanes % (2 * level) == 0)
-   {
- /* { 0, 1, ... 4, 5 ..., } */
- unsigned level = 1 << ctz_hwi (even);
- for (unsigned i = 0; i < group_lanes / 2 / level; ++i)
- 

[PATCH 3/5] Handle gaps in SLP load permutation lowering

2024-07-03 Thread Richard Biener
The following adds handling of gaps by representing them with NULL
entries in SLP_TREE_SCALAR_STMTS for the unpermuted load node.

The SLP discovery changes could be elided if we manually build the
load node instead.

* tree-vect-slp.cc (vect_build_slp_tree_1): Handle NULL stmt.
(vect_build_slp_tree_2): Likewise.  Release load permutation
when there's a NULL in SLP_TREE_SCALAR_STMTS and assert there's
no actual permutation in that case.
(vect_lower_load_permutations): Handle gaps in loads.

* gcc.dg/vect/slp-51.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/slp-51.c | 17 +++
 gcc/tree-vect-slp.cc   | 49 ++
 2 files changed, 47 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/slp-51.c

diff --git a/gcc/testsuite/gcc.dg/vect/slp-51.c 
b/gcc/testsuite/gcc.dg/vect/slp-51.c
new file mode 100644
index 000..91ae763be30
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/slp-51.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+
+void foo (int * __restrict x, int *y)
+{
+  x = __builtin_assume_aligned (x, __BIGGEST_ALIGNMENT__);
+  y = __builtin_assume_aligned (y, __BIGGEST_ALIGNMENT__);
+  for (int i = 0; i < 1024; ++i)
+{
+  x[4*i+0] = y[4*i+0];
+  x[4*i+1] = y[4*i+2] * 2;
+  x[4*i+2] = y[4*i+0] + 3;
+  x[4*i+3] = y[4*i+2] * 2 - 5;
+}
+}
+
+/* Check we can handle SLP with gaps and an interleaving scheme.  */
+/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" { target { 
vect_int && vect_int_mult } } } } */
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 6f3822af950..fdefee90e92 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -1080,10 +1080,15 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char 
*swap,
   stmt_vec_info stmt_info;
   FOR_EACH_VEC_ELT (stmts, i, stmt_info)
 {
-  gimple *stmt = stmt_info->stmt;
   swap[i] = 0;
   matches[i] = false;
+  if (!stmt_info)
+   {
+ matches[i] = true;
+ continue;
+   }
 
+  gimple *stmt = stmt_info->stmt;
   if (dump_enabled_p ())
dump_printf_loc (MSG_NOTE, vect_location, "Build SLP for %G", stmt);
 
@@ -1984,10 +1989,16 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
  stmt_vec_info first_stmt_info
= DR_GROUP_FIRST_ELEMENT (SLP_TREE_SCALAR_STMTS (node)[0]);
  bool any_permute = false;
+ bool any_null = false;
  FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), j, load_info)
{
  int load_place;
- if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
+ if (! load_info)
+   {
+ load_place = j;
+ any_null = true;
+   }
+ else if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
load_place = vect_get_place_in_interleaving_chain
(load_info, first_stmt_info);
  else
@@ -1996,6 +2007,11 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
  any_permute |= load_place != j;
  load_permutation.quick_push (load_place);
}
+ if (any_null)
+   {
+ gcc_assert (!any_permute);
+ load_permutation.release ();
+   }
 
  if (gcall *stmt = dyn_cast  (stmt_info->stmt))
{
@@ -3978,24 +3994,11 @@ vect_lower_load_permutations (loop_vec_info loop_vinfo,
   stmt_vec_info first
 = DR_GROUP_FIRST_ELEMENT (SLP_TREE_SCALAR_STMTS (loads[0])[0]);
 
-  /* ???  In principle we have to consider a gap up to the next full
- vector, but we have to actually represent a scalar stmt for the
- gaps value so delay handling this.  The same is true for
- inbetween gaps which the load places in the load-permutation
- represent.  It's probably not worth trying an intermediate packing
- to vectors without gap even if that might handle some more cases.
- Instead get the gap case correct in some way.  */
-  unsigned group_lanes = 0;
-  for (stmt_vec_info s = first; s; s = DR_GROUP_NEXT_ELEMENT (s))
-{
-  if ((s == first && DR_GROUP_GAP (s) != 0)
- || (s != first && DR_GROUP_GAP (s) != 1))
-   return;
-  group_lanes++;
-}
   /* Only a power-of-two number of lanes matches interleaving with N levels.
+ The non-SLP path also supports DR_GROUP_SIZE == 3.
  ???  An even number of lanes could be reduced to 1< stmts;
   stmts.create (group_lanes);
   for (stmt_vec_info s = first; s; s = DR_GROUP_NEXT_ELEMENT (s))
-   stmts.quick_push (s);
+   {
+ if (s != first)
+   for (unsigned i = 1; i < DR_GROUP_GAP (s); ++i)
+ stmts.quick_push (NULL);
+ stmts.quick_push (s);
+   }
+  for (unsigned i = 0; i < DR_GROUP_GAP (first); ++i)
+   stmts.quick_push (NULL);
   poly_uint64 max_nunits;
   bool *matches = XALLOCAVEC (bool, group_lanes);
   unsigned li

[PATCH 5/5] RISC-V: Support group size of three in SLP store permute lowering

2024-07-03 Thread Richard Biener
The following implements the group-size three scheme from
vect_permute_store_chain in SLP grouped store permute lowering
and extends it to power-of-two multiples of group size three.

The scheme goes from vectors A, B and C to
{ A[0], B[0], C[0], A[1], B[1], C[1], ... } by first producing
{ A[0], B[0], X, A[1], B[1], X, ... } (with X a don't-care, here chosen
to be A[n]) and then permuting C[n] into the appropriate places.

The extension replaces vector elements with a power-of-two
number of lanes, so you get pairwise interleaving
until the final three-input permutes happen.

The last permute step could be seen as extending C to { C[0], C[0],
C[0], ... } and then performing a blend.

VLA archs will want to use store-lanes here, I guess; I'm not sure
whether the three-vector interleave operation is also available with
a register source and destination and thus usable for a shuffle.

* tree-vect-slp.cc (vect_build_slp_instance): Special case
three input permute with the same number of lanes in store
permute lowering.

* gcc.dg/vect/slp-53.c: New testcase.
* gcc.dg/vect/slp-54.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/slp-53.c | 15 +++
 gcc/testsuite/gcc.dg/vect/slp-54.c | 18 +
 gcc/tree-vect-slp.cc   | 65 +-
 3 files changed, 97 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/slp-53.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/slp-54.c

diff --git a/gcc/testsuite/gcc.dg/vect/slp-53.c 
b/gcc/testsuite/gcc.dg/vect/slp-53.c
new file mode 100644
index 000..d00ca236958
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/slp-53.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+
+void foo (int * __restrict x, int *y)
+{
+  x = __builtin_assume_aligned (x, __BIGGEST_ALIGNMENT__);
+  y = __builtin_assume_aligned (y, __BIGGEST_ALIGNMENT__);
+  for (int i = 0; i < 1024; ++i)
+{
+  x[3*i+0] = y[2*i+0] * 7 + 5;
+  x[3*i+1] = y[2*i+1] * 2;
+  x[3*i+2] = y[2*i+0] + 3;
+}
+}
+
+/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" { target { 
vect_int && vect_int_mult } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/slp-54.c 
b/gcc/testsuite/gcc.dg/vect/slp-54.c
new file mode 100644
index 000..57268ab50b7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/slp-54.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+
+void foo (int * __restrict x, int *y)
+{
+  x = __builtin_assume_aligned (x, __BIGGEST_ALIGNMENT__);
+  y = __builtin_assume_aligned (y, __BIGGEST_ALIGNMENT__);
+  for (int i = 0; i < 1024; ++i)
+{
+  x[6*i+0] = y[4*i+0] * 7 + 5;
+  x[6*i+1] = y[4*i+1] * 2;
+  x[6*i+2] = y[4*i+2] + 3;
+  x[6*i+3] = y[4*i+3] * 7 + 5;
+  x[6*i+4] = y[4*i+0] * 2;
+  x[6*i+5] = y[4*i+3] + 3;
+}
+}
+
+/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" { target { 
vect_int && vect_int_mult } } } } */
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index c62b0b5cf88..fd4b4574a2f 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -3722,6 +3722,69 @@ vect_build_slp_instance (vec_info *vinfo,
 when the number of lanes is even.  */
  while (SLP_TREE_CHILDREN (perm).length () > 2)
{
+ /* When we have three equal sized groups left the pairwise
+reduction does not result in a scheme that avoids using
+three vectors.  Instead merge the first two groups
+to the final size with do-not-care elements (chosen
+from the first group) and then merge with the third.
+  { A0, B0,  x, A1, B1,  x, ... }
+   -> { A0, B0, C0, A1, B1, C1, ... }
+This handles group size of three (and at least
+power-of-two multiples of that).  */
+ if (SLP_TREE_CHILDREN (perm).length () == 3
+ && (SLP_TREE_LANES (SLP_TREE_CHILDREN (perm)[0])
+ == SLP_TREE_LANES (SLP_TREE_CHILDREN (perm)[1]))
+ && (SLP_TREE_LANES (SLP_TREE_CHILDREN (perm)[0])
+ == SLP_TREE_LANES (SLP_TREE_CHILDREN (perm)[2])))
+   {
+ int ai = 0;
+ int bi = 1;
+ slp_tree a = SLP_TREE_CHILDREN (perm)[ai];
+ slp_tree b = SLP_TREE_CHILDREN (perm)[bi];
+ unsigned n = SLP_TREE_LANES (perm);
+
+ slp_tree permab
+   = vect_create_new_slp_node (2, VEC_PERM_EXPR);
+ SLP_TREE_LANES (permab) = n;
+ SLP_TREE_LANE_PERMUTATION (permab).create (n);
+ SLP_TREE_VECTYPE (permab) = SLP_TREE_VECTYPE (perm);
+ /* ???  Should be NULL but that's not expected.  */
+ SLP_TREE_REPRESENTATIVE (permab)
+   = SLP_TREE_REPRESE

[PATCH 4/5] Support group-size of three in SLP load permutation lowering

2024-07-03 Thread Richard Biener
The following adds support for group-size three in SLP load permutation
lowering to match the non-SLP capabilities.  This is done by using
the non-interleaving fallback code which then creates at VF == 4 from
{ { a0, b0, c0 }, { a1, b1, c1 }, { a2, b2, c2 }, { a3, b3, c3 } }
the intermediate vectors { c0, c0, c1, c1 } and { c2, c2, c3, c3 }
to produce { c0, c1, c2, c3 }.

This turns out to be more effective than the scheme implemented
for non-SLP on SSE, only slightly worse on AVX512, and somewhat
worse on AVX2.  It seems to me that this would extend to
other non-power-of-two group-sizes as well (though the patch does not).
Optimal schemes are likely difficult to lay out in VF agnostic form.

I'll note that while the lowering assumes even/odd extract is
generally available for all vector element sizes (which is probably
a good assumption), it doesn't in any way constrain the other
permutes it generates based on target availability.  Again difficult
to do in a VF agnostic way (but at least currently the vector type
is fixed).

I'll also note that the SLP store side merges lanes in a way
producing three-vector permutes for store group-size of three, so
the testcase uses a store group-size of four.

* tree-vect-slp.cc (vect_lower_load_permutations): Support
group-size of three.

* gcc.dg/vect/slp-52.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/slp-52.c | 14 
 gcc/tree-vect-slp.cc   | 35 +-
 2 files changed, 34 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/slp-52.c

diff --git a/gcc/testsuite/gcc.dg/vect/slp-52.c 
b/gcc/testsuite/gcc.dg/vect/slp-52.c
new file mode 100644
index 000..ba49f0046e2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/slp-52.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+
+void foo (int * __restrict x, int *y)
+{
+  for (int i = 0; i < 1024; ++i)
+{
+  x[4*i+0] = y[3*i+0];
+  x[4*i+1] = y[3*i+1] * 2;
+  x[4*i+2] = y[3*i+2] + 3;
+  x[4*i+3] = y[3*i+2] * 2 - 5;
+}
+}
+
+/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" { target { 
vect_int && vect_int_mult } } } } */
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index fdefee90e92..c62b0b5cf88 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -3718,7 +3718,8 @@ vect_build_slp_instance (vec_info *vinfo,
 with the least number of lanes to one and then repeat until
 we end up with two inputs.  That scheme makes sure we end
 up with permutes satisfying the restriction of requiring at
-most two vector inputs to produce a single vector output.  */
+most two vector inputs to produce a single vector output
+when the number of lanes is even.  */
  while (SLP_TREE_CHILDREN (perm).length () > 2)
{
  /* Pick the two nodes with the least number of lanes,
@@ -3995,11 +3996,10 @@ vect_lower_load_permutations (loop_vec_info loop_vinfo,
 = DR_GROUP_FIRST_ELEMENT (SLP_TREE_SCALAR_STMTS (loads[0])[0]);
 
   /* Only a power-of-two number of lanes matches interleaving with N levels.
- The non-SLP path also supports DR_GROUP_SIZE == 3.
  ???  An even number of lanes could be reduced to 1<= group_lanes / 2)
+  if (SLP_TREE_LANES (load) >= (group_lanes + 1) / 2)
continue;
 
   /* First build (and possibly re-use) a load node for the
@@ -4052,7 +4052,7 @@ vect_lower_load_permutations (loop_vec_info loop_vinfo,
   while (1)
{
  unsigned group_lanes = SLP_TREE_LANES (l0);
- if (SLP_TREE_LANES (load) >= group_lanes / 2)
+ if (SLP_TREE_LANES (load) >= (group_lanes + 1) / 2)
break;
 
  /* Try to lower by reducing the group to half its size using an
@@ -4062,19 +4062,24 @@ vect_lower_load_permutations (loop_vec_info loop_vinfo,
 Thus { e, e, o, o, e, e, o, o } woud be an even/odd decomposition
 with N == 2.  */
  /* ???  Only an even number of lanes can be handed this way, but the
-fallback below could work for any number.  */
- gcc_assert ((group_lanes & 1) == 0);
- unsigned even = (1 << ceil_log2 (group_lanes)) - 1;
- unsigned odd = even;
- for (auto l : final_perm)
+fallback below could work for any number.  We have to make sure
+to round up in that case.  */
+ gcc_assert ((group_lanes & 1) == 0 || group_lanes == 3);
+ unsigned even = 0, odd = 0;
+ if ((group_lanes & 1) == 0)
{
- even &= ~l.second;
- odd &= l.second;
+ even = (1 << ceil_log2 (group_lanes)) - 1;
+ odd = even;
+ for (auto l : final_perm)
+   {
+ even &= ~l.second;
+ odd &= l.second;
+   }
}
 
  /* Now build an even or odd extraction fro

RE: [PATCH v2] Vect: Support IFN SAT_TRUNC for unsigned vector int

2024-07-03 Thread Li, Pan2
> OK.

Committed, thanks Richard.

Pan

-Original Message-
From: Richard Biener  
Sent: Wednesday, July 3, 2024 5:06 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v2] Vect: Support IFN SAT_TRUNC for unsigned vector int

On Wed, Jul 3, 2024 at 3:33 AM  wrote:
>
> From: Pan Li 
>
> This patch would like to support the .SAT_TRUNC for the unsigned
> vector int.  Given we have below example code:
>
> Form 1
>   #define VEC_DEF_SAT_U_TRUC_FMT_1(NT, WT) \
>   void __attribute__((noinline))   \
>   vec_sat_u_truc_##WT##_to_##NT##_fmt_1 (NT *x, WT *y, unsigned limit) \
>   {\
> for (unsigned i = 0; i < limit; i++)   \
>   {\
> bool overflow = y[i] > (WT)(NT)(-1);   \
> x[i] = ((NT)y[i]) | (NT)-overflow; \
>   }\
>   }
>
> VEC_DEF_SAT_U_TRUC_FMT_1 (uint32_t, uint64_t)
>
> Before this patch:
> void vec_sat_u_truc_uint64_t_to_uint32_t_fmt_1 (uint32_t * x, uint64_t * y, 
> unsigned int limit)
> {
>   ...
>   _51 = .SELECT_VL (ivtmp_49, POLY_INT_CST [2, 2]);
>   ivtmp_35 = _51 * 8;
>   vect__4.7_32 = .MASK_LEN_LOAD (vectp_y.5_34, 64B, { -1, ... }, _51, 0);
>   mask_overflow_16.8_30 = vect__4.7_32 > { 4294967295, ... };
>   vect__5.9_29 = (vector([2,2]) unsigned int) vect__4.7_32;
>   vect__10.13_20 = .VCOND_MASK (mask_overflow_16.8_30, { 4294967295, ... }, 
> vect__5.9_29);
>   ivtmp_12 = _51 * 4;
>   .MASK_LEN_STORE (vectp_x.14_11, 32B, { -1, ... }, _51, 0, vect__10.13_20);
>   vectp_y.5_33 = vectp_y.5_34 + ivtmp_35;
>   vectp_x.14_46 = vectp_x.14_11 + ivtmp_12;
>   ivtmp_50 = ivtmp_49 - _51;
>   if (ivtmp_50 != 0)
>   ...
> }
>
> After this patch:
> void vec_sat_u_truc_uint64_t_to_uint32_t_fmt_1 (uint32_t * x, uint64_t * y, 
> unsigned int limit)
> {
>   ...
>   _12 = .SELECT_VL (ivtmp_21, POLY_INT_CST [2, 2]);
>   ivtmp_34 = _12 * 8;
>   vect__4.7_31 = .MASK_LEN_LOAD (vectp_y.5_33, 64B, { -1, ... }, _12, 0);
>   vect_patt_40.8_30 = .SAT_TRUNC (vect__4.7_31); // << .SAT_TRUNC
>   ivtmp_29 = _12 * 4;
>   .MASK_LEN_STORE (vectp_x.9_28, 32B, { -1, ... }, _12, 0, vect_patt_40.8_30);
>   vectp_y.5_32 = vectp_y.5_33 + ivtmp_34;
>   vectp_x.9_27 = vectp_x.9_28 + ivtmp_29;
>   ivtmp_20 = ivtmp_21 - _12;
>   if (ivtmp_20 != 0)
>   ...
> }
>
> The below test suites passed for this patch:
> * The x86 bootstrap test.
> * The x86 fully regression test.
> * The rv64gcv fully regression tests.

OK.

Thanks,
Richard.

> gcc/ChangeLog:
>
> * tree-vect-patterns.cc (gimple_unsigned_integer_sat_trunc): Add
> new decl generated by match.
> (vect_recog_sat_trunc_pattern): Add new func impl to recog the
> .SAT_TRUNC pattern.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/tree-vect-patterns.cc | 54 +++
>  1 file changed, 54 insertions(+)
>
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 519d15f2a43..86e893a1c43 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -4489,6 +4489,7 @@ vect_recog_mult_pattern (vec_info *vinfo,
>
>  extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
>  extern bool gimple_unsigned_integer_sat_sub (tree, tree*, tree (*)(tree));
> +extern bool gimple_unsigned_integer_sat_trunc (tree, tree*, tree (*)(tree));
>
>  static gimple *
>  vect_recog_build_binary_gimple_stmt (vec_info *vinfo, stmt_vec_info 
> stmt_info,
> @@ -4603,6 +4604,58 @@ vect_recog_sat_sub_pattern (vec_info *vinfo, 
> stmt_vec_info stmt_vinfo,
>return NULL;
>  }
>
> +/*
> + * Try to detect saturation truncation pattern (SAT_TRUNC), aka below gimple:
> + *   overflow_5 = x_4(D) > 4294967295;
> + *   _1 = (unsigned int) x_4(D);
> + *   _2 = (unsigned int) overflow_5;
> + *   _3 = -_2;
> + *   _6 = _1 | _3;
> + *
> + * And then simplified to
> + *   _6 = .SAT_TRUNC (x_4(D));
> + */
> +
> +static gimple *
> +vect_recog_sat_trunc_pattern (vec_info *vinfo, stmt_vec_info stmt_vinfo,
> + tree *type_out)
> +{
> +  gimple *last_stmt = STMT_VINFO_STMT (stmt_vinfo);
> +
> +  if (!is_gimple_assign (last_stmt))
> +return NULL;
> +
> +  tree ops[1];
> +  tree lhs = gimple_assign_lhs (last_stmt);
> +
> +  if (gimple_unsigned_integer_sat_trunc (lhs, ops, NULL))
> +{
> +  tree itype = TREE_TYPE (ops[0]);
> +  tree otype = TREE_TYPE (lhs);
> +  tree v_itype = get_vectype_for_scalar_type (vinfo, itype);
> +  tree v_otype = get_vectype_for_scalar_type (vinfo, otype);
> +  internal_fn fn = IFN_SAT_TRUNC;
> +
> +  if (v_itype != NULL_TREE && v_otype != NULL_TREE
> +   

RE: [PATCH v1] Vect: Distribute truncation into .SAT_SUB operands

2024-07-03 Thread Li, Pan2
Thanks Richard for comments.

> Isn't bound < otype_max OK as well?

Yes, less than or equal is OK as well.

> Given your example I wonder if you instead want to use
> vect_look_through_possible_promotion?  Because ...

> .. if you do it like this the widened op is still there and vectorized
> and the whole
> point is to make it possible to use a smaller vectorization factor?

Got it, will give vect_look_through_possible_promotion a try.

> I think you want to check that the target supports vectorizing
> MIN_EXPR in this type.

Sure.

Pan

-Original Message-
From: Richard Biener  
Sent: Wednesday, July 3, 2024 5:03 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] Vect: Distribute truncation into .SAT_SUB operands

On Sun, Jun 30, 2024 at 5:13 AM  wrote:
>
> From: Pan Li 
>
> To get better vectorized code for .SAT_SUB,  we would like to avoid the
> truncation operation on the assignment.  For example, as below.
>
> unsigned int _1;
> unsigned int _2;
> _9 = (unsigned short int).SAT_SUB (_1, _2);
>
> If we can make sure that _1 is in the range of unsigned short int, such
> as via a def similar to:
>
> _1 = (unsigned short int)_4;
>
> Then we can distribute the truncation operation to:
>
> _3 = MIN_EXPR (_2, 65535);
> _9 = .SAT_SUB ((unsigned short int)_1, (unsigned short int)_3);
>
> Let's take RISC-V vector as example to tell the changes.  For below
> sample code:
>
> __attribute__((noinline))
> void test (uint16_t *x, unsigned b, unsigned n)
> {
>   unsigned a = 0;
>   uint16_t *p = x;
>
>   do {
> a = *--p;
> *p = (uint16_t)(a >= b ? a - b : 0);
>   } while (--n);
> }
>
> Before this patch:
>   ...
>   .L3:
>   vle16.v   v1,0(a3)
>   vrsub.vx  v5,v2,t1
>   mvt3,a4
>   addw  a4,a4,t5
>   vrgather.vv   v3,v1,v5
>   vsetvli   zero,zero,e32,m1,ta,ma
>   vzext.vf2 v1,v3
>   vssubu.vx v1,v1,a1
>   vsetvli   zero,zero,e16,mf2,ta,ma
>   vncvt.x.x.w   v1,v1
>   vrgather.vv   v3,v1,v5
>   vse16.v   v3,0(a3)
>   sub   a3,a3,t4
>   bgtu  t6,a4,.L3
>   ...
>
> After this patch:
> test:
>   ...
>   .L3:
>   vle16.v   v3,0(a3)
>   vrsub.vx  v5,v2,a6
>   mva7,a4
>   addw  a4,a4,t3
>   vrgather.vv   v1,v3,v5
>   vssubu.vv v1,v1,v6
>   vrgather.vv   v3,v1,v5
>   vse16.v   v3,0(a3)
>   sub   a3,a3,t1
>   bgtu  t4,a4,.L3
>   ...
>
> The below test suites passed for this patch:
> 1. The rv64gcv fully regression tests.
> 2. The rv64gcv build with glibc.
> 3. The x86 bootstrap tests.
> 4. The x86 fully regression tests.
>
> gcc/ChangeLog:
>
> * tree-vect-patterns.cc (vect_recog_sat_sub_pattern_distribute):
> Add new func impl to perform the truncation distribution.
> (vect_recog_sat_sub_pattern): Perform above optimize before
> generate .SAT_SUB call.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/tree-vect-patterns.cc | 73 +++
>  1 file changed, 73 insertions(+)
>
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 519d15f2a43..7329ecec2c4 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -4565,6 +4565,77 @@ vect_recog_sat_add_pattern (vec_info *vinfo, 
> stmt_vec_info stmt_vinfo,
>return NULL;
>  }
>
> +/*
> + * Try to distribute the truncation for .SAT_SUB pattern,  mostly occurs in
> + * the benchmark zip.  Aka:
> + *
> + *   unsigned int _1;
> + *   unsigned int _2;
> + *   _9 = (unsigned short int).SAT_SUB (_1, _2);
> + *
> + *   if _1 is known to be in the range of unsigned short int.  For example
> + *   there is a def _1 = (unsigned short int)_4.  Then we can distribute the
> + *   truncation to:
> + *
> + *   _3 = MIN (65535, _2);
> + *   _9 = .SAT_SUB ((unsigned short int)_1, (unsigned short int)_3);
> + *
> + *   Then,  we can get better vectorized code and avoid the unnecessary narrowing
> + *   stmt during vectorization.
> + */
> +static void
> +vect_recog_sat_sub_pattern_distribute (vec_info *vinfo,
> +  stmt_vec_info stmt_vinfo,
> +  gimple *stmt, tree lhs, tree *ops)
> +{
> +  tree otype = TREE_TYPE (lhs);
> +  tree itype = TREE_TYPE (ops[0]);
> +
> +  if (types_compatible_p (otype, itype))
> +return;
> +
> +  unsigned itype_prec = TYPE_PRECISION (itype);
> +  unsigned otype_prec = TYPE_PRECISION (otype);
> +
> +  if (otype_prec >= itype_prec)
> +return;
> +
> +  int_range_max r;
> +  gimple_ranger granger;
> +
> +  if (granger.range_of_expr (r, ops[0], stmt) && !r.undefined_p ())
> +{
> +  wide_int bound = r.upper_bound ();
> +  wide_int otype_max = wi::mask (otype_prec, /* negate */false, 
> itype_prec);
> +
> +  if (bound != otype_max)

Isn't bound < otype_max OK as well?

Given your example I wonder if you instead want to use
vect_look_through_possible_promotion?  Because ...

> +   re

[MAINTAINERS] Update my email address

2024-07-03 Thread Prathamesh Kulkarni
Pushing to trunk.

Signed-off-by: Prathamesh Kulkarni  

Thanks,
Prathamesh


Re: [PATCH][committed] Move runtime check into a separate function and guard it with target ("no-avx")

2024-07-03 Thread Richard Biener
On Wed, Jul 3, 2024 at 9:25 AM liuhongt  wrote:
>
> The patch can avoid SIGILL on non-AVX512 machine due to kmovd is
> generated in dynamic check.
>
> Committed as an obvious fix.

Hmm, now all avx512 tests SIGILL when testing with -m32:

Dump of assembler code for function __get_cpuid_count:
=> 0x08049500 <+0>: kmovd  %eax,%k2
   0x08049504 <+4>: kmovd  %edx,%k1
   0x08049508 <+8>: pushf
   0x08049509 <+9>: pushf
   0x0804950a <+10>:pop%eax
   0x0804950b <+11>:mov%eax,%edx

looks like __get_cpuid_count is no longer inlined but AVX512 is in
effect for it.

Maybe use #pragma GCC target around the includes instead?
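A sketch of that suggestion (hypothetical and untested here, not a committed fix): compile the cpuid helpers with AVX disabled so no AVX512 instructions can leak into them even when they are not inlined.

```c
/* Hypothetical sketch for avx512-check.h: wrap the includes so
   __get_cpuid_count and friends are compiled without AVX.  */
#pragma GCC push_options
#pragma GCC target ("no-avx")
#include <cpuid.h>
#pragma GCC pop_options
```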

> gcc/testsuite/ChangeLog:
>
> PR target/115748
> * gcc.target/i386/avx512-check.h: Move runtime check into a
> separate function and guard it with target ("no-avx").
> ---
>  gcc/testsuite/gcc.target/i386/avx512-check.h | 14 +-
>  1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/gcc.target/i386/avx512-check.h 
> b/gcc/testsuite/gcc.target/i386/avx512-check.h
> index 0ad9064f637..71858a33dac 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512-check.h
> +++ b/gcc/testsuite/gcc.target/i386/avx512-check.h
> @@ -34,8 +34,9 @@ check_osxsave (void)
>return (ecx & bit_OSXSAVE) != 0;
>  }
>
> +__attribute__((noipa,target("no-avx")))
>  int
> -main ()
> +avx512_runtime_support_p ()
>  {
>unsigned int eax, ebx, ecx, edx;
>
> @@ -100,6 +101,17 @@ main ()
>&& (edx & bit_AVX512VP2INTERSECT)
>  #endif
>&& avx512f_os_support ())
> +{
> +  return 1;
> +}
> +
> +  return 0;
> +}
> +
> +int
> +main ()
> +{
> +  if (avx512_runtime_support_p ())
>  {
>DO_TEST ();
>  #ifdef DEBUG
> --
> 2.31.1
>


RE: [MAINTAINERS] Update my email address

2024-07-03 Thread Prathamesh Kulkarni
Sorry, forgot to attach diff.


-Original Message-
From: Prathamesh Kulkarni  
Sent: Wednesday, July 3, 2024 7:04 PM
To: gcc-patches@gcc.gnu.org
Subject: [MAINTAINERS] Update my email address


Pushing to trunk.

Signed-off-by: Prathamesh Kulkarni  

Thanks,
Prathamesh
[MAINTAINERS] Update my email address.

* MAINTAINERS: Update my email address and add myself to DCO.

Signed-off-by: Prathamesh Kulkarni  

diff --git a/MAINTAINERS b/MAINTAINERS
index 41319595bb5..2218f81194f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -512,7 +512,7 @@ Matt Kraai  

 Jan Kratochvil 
 Matthias Kretz 
 Louis Krupp
-Prathamesh Kulkarni
+Prathamesh Kulkarni
 Venkataramanan Kumar   
 Doug Kwan  
 Aaron W. LaFramboise   
@@ -792,3 +792,4 @@ Jonathan Wakely 

 Alexander Westbrooks   
 Chung-Ju Wu
 Pengxuan Zheng 
+Prathamesh Kulkarni


[PATCH v2] diagnostics: Follow DECL_ORIGIN in lhd_decl_printable_name [PR102061]

2024-07-03 Thread Peter Damianov
Currently, if a warning references a cloned function, the name of the
clone will be emitted in the "In function 'xyz'" part of the diagnostic;
that internal clone name is not something users are supposed to see.
This patch follows the DECL_ORIGIN link to get the name of the original
function.

gcc/ChangeLog:
PR diagnostics/102061
* langhooks.cc (lhd_decl_printable_name): Follow DECL_ORIGIN
link

Signed-off-by: Peter Damianov 
---
v2: use DECL_ORIGIN instead of DECL_ABSTRACT_ORIGIN and remove loop

 gcc/langhooks.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/langhooks.cc b/gcc/langhooks.cc
index 61f2b676256..943b8345a95 100644
--- a/gcc/langhooks.cc
+++ b/gcc/langhooks.cc
@@ -223,6 +223,7 @@ lhd_get_alias_set (tree ARG_UNUSED (t))
 const char *
 lhd_decl_printable_name (tree decl, int ARG_UNUSED (verbosity))
 {
+  decl = DECL_ORIGIN(decl);
   gcc_assert (decl && DECL_NAME (decl));
   return IDENTIFIER_POINTER (DECL_NAME (decl));
 }
-- 
2.39.2



[PATCH v1] RISC-V: Bugfix vfmv insn honor zvfhmin for FP16 SEW [PR115763]

2024-07-03 Thread pan2 . li
From: Pan Li 

According to the ISA,  the zvfhmin sub extension should only contain
conversion insns.  Thus,  the vfmv insn acting on FP16 should not be
present when only the zvfhmin option is given.

This patch fixes that by splitting the pred_broadcast define_insn
into a zvfhmin part and a zvfh part.  Given the example below:

void test (_Float16 *dest, _Float16 bias) {
  dest[0] = bias;
  dest[1] = bias;
}

when compile with -march=rv64gcv_zfh_zvfhmin

Before this patch:
test:
  vsetivli zero,2,e16,mf4,ta,ma
  vfmv.v.f v1,fa0 // should not leverage vfmv for zvfhmin
  vse16.v v1,0(a0)
  ret

After this patch:
test:
  addi sp,sp,-16
  fsh  fa0,14(sp)
  addi a5,sp,14
  vsetivli zero,2,e16,mf4,ta,ma
  vlse16.v v1,0(a5),zero
  vse16.v  v1,0(a0)
  addi sp,sp,16
  jr   ra

PR target/115763

gcc/ChangeLog:

* config/riscv/vector.md (*pred_broadcast): Split into
zvfh and zvfhmin part.
(*pred_broadcast_zvfh): New define_insn for zvfh part.
(*pred_broadcast_zvfhmin): Ditto but for zvfhmin.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/scalar_move-5.c: Adjust asm check.
* gcc.target/riscv/rvv/base/scalar_move-6.c: Ditto.
* gcc.target/riscv/rvv/base/scalar_move-7.c: Ditto.
* gcc.target/riscv/rvv/base/scalar_move-8.c: Ditto.
* gcc.target/riscv/rvv/base/pr115763-1.c: New test.
* gcc.target/riscv/rvv/base/pr115763-2.c: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/vector.md| 49 +--
 .../gcc.target/riscv/rvv/base/pr115763-1.c|  9 
 .../gcc.target/riscv/rvv/base/pr115763-2.c| 10 
 .../gcc.target/riscv/rvv/base/scalar_move-5.c |  4 +-
 .../gcc.target/riscv/rvv/base/scalar_move-6.c |  6 +--
 .../gcc.target/riscv/rvv/base/scalar_move-7.c |  6 +--
 .../gcc.target/riscv/rvv/base/scalar_move-8.c |  6 +--
 7 files changed, 64 insertions(+), 26 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr115763-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr115763-2.c

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index fe18ee5b5f7..d9474262d54 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -2080,31 +2080,50 @@ (define_insn_and_split "*pred_broadcast"
   [(set_attr "type" "vimov,vimov,vlds,vlds,vlds,vlds,vimovxv,vimovxv")
(set_attr "mode" "")])
 
-(define_insn "*pred_broadcast"
-  [(set (match_operand:V_VLSF_ZVFHMIN 0 "register_operand" "=vr, vr, 
vr, vr, vr, vr, vr, vr")
-   (if_then_else:V_VLSF_ZVFHMIN
+(define_insn "*pred_broadcast_zvfh"
+  [(set (match_operand:V_VLSF0 "register_operand"  "=vr,  vr,  
vr,  vr")
+   (if_then_else:V_VLSF
  (unspec:
-   [(match_operand: 1 "vector_broadcast_mask_operand" "Wc1,Wc1, 
vm, vm,Wc1,Wc1,Wb1,Wb1")
-(match_operand 4 "vector_length_operand"  " rK, rK, 
rK, rK, rK, rK, rK, rK")
-(match_operand 5 "const_int_operand"  "  i,  i,  
i,  i,  i,  i,  i,  i")
-(match_operand 6 "const_int_operand"  "  i,  i,  
i,  i,  i,  i,  i,  i")
-(match_operand 7 "const_int_operand"  "  i,  i,  
i,  i,  i,  i,  i,  i")
+   [(match_operand: 1 "vector_broadcast_mask_operand" "Wc1, Wc1, 
Wb1, Wb1")
+(match_operand  4 "vector_length_operand" " rK,  rK,  
rK,  rK")
+(match_operand  5 "const_int_operand" "  i,   i,   
i,   i")
+(match_operand  6 "const_int_operand" "  i,   i,   
i,   i")
+(match_operand  7 "const_int_operand" "  i,   i,   
i,   i")
 (reg:SI VL_REGNUM)
 (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
- (vec_duplicate:V_VLSF_ZVFHMIN
-   (match_operand: 3 "direct_broadcast_operand"   " f,  
f,Wdm,Wdm,Wdm,Wdm,  f,  f"))
- (match_operand:V_VLSF_ZVFHMIN 2 "vector_merge_operand""vu,  0, 
vu,  0, vu,  0, vu,  0")))]
+ (vec_duplicate:V_VLSF
+   (match_operand: 3 "direct_broadcast_operand"  "  f,   f,   
f,   f"))
+ (match_operand:V_VLSF  2 "vector_merge_operand"  " vu,   0,  
vu,   0")))]
   "TARGET_VECTOR"
   "@
vfmv.v.f\t%0,%3
vfmv.v.f\t%0,%3
+   vfmv.s.f\t%0,%3
+   vfmv.s.f\t%0,%3"
+  [(set_attr "type" "vfmov,vfmov,vfmovfv,vfmovfv")
+   (set_attr "mode" "")])
+
+(define_insn "*pred_broadcast_zvfhmin"
+  [(set (match_operand:V_VLSF_ZVFHMIN   0 "register_operand"  
"=vr,  vr,  vr,  vr")
+   (if_then_else:V_VLSF_ZVFHMIN
+ (unspec:
+   [(match_operand:1 "vector_broadcast_mask_operand" " vm, 
 vm, Wc1, Wc1")
+(match_operand 4 "vector_length_operand" " rK, 
 rK,  rK,  rK")
+(match_operand 5 "const_int_operand" "  i, 
  i,   i,   i")
+(match_operand 6 "const

Re: [PATCH v1] RISC-V: Bugfix vfmv insn honor zvfhmin for FP16 SEW [PR115763]

2024-07-03 Thread 钟居哲
LGTM.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2024-07-03 22:17
To: gcc-patches
CC: juzhe.zhong; kito.cheng; jeffreyalaw; rdapp.gcc; Pan Li
Subject: [PATCH v1] RISC-V: Bugfix vfmv insn honor zvfhmin for FP16 SEW 
[PR115763]
From: Pan Li 
 
According to the ISA,  the zvfhmin sub extension should only contain
conversion insns.  Thus,  the vfmv insn acting on FP16 should not be
present when only the zvfhmin option is given.

This patch fixes that by splitting the pred_broadcast define_insn
into a zvfhmin part and a zvfh part.  Given the example below:
 
void test (_Float16 *dest, _Float16 bias) {
  dest[0] = bias;
  dest[1] = bias;
}
 
when compile with -march=rv64gcv_zfh_zvfhmin
 
Before this patch:
test:
  vsetivli zero,2,e16,mf4,ta,ma
  vfmv.v.f v1,fa0 // should not leverage vfmv for zvfhmin
  vse16.v v1,0(a0)
  ret
 
After this patch:
test:
  addi sp,sp,-16
  fsh  fa0,14(sp)
  addi a5,sp,14
  vsetivli zero,2,e16,mf4,ta,ma
  vlse16.v v1,0(a5),zero
  vse16.v  v1,0(a0)
  addi sp,sp,16
  jr   ra
 
PR target/115763
 
gcc/ChangeLog:
 
* config/riscv/vector.md (*pred_broadcast): Split into
zvfh and zvfhmin part.
(*pred_broadcast_zvfh): New define_insn for zvfh part.
(*pred_broadcast_zvfhmin): Ditto but for zvfhmin.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/scalar_move-5.c: Adjust asm check.
* gcc.target/riscv/rvv/base/scalar_move-6.c: Ditto.
* gcc.target/riscv/rvv/base/scalar_move-7.c: Ditto.
* gcc.target/riscv/rvv/base/scalar_move-8.c: Ditto.
* gcc.target/riscv/rvv/base/pr115763-1.c: New test.
* gcc.target/riscv/rvv/base/pr115763-2.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/vector.md| 49 +--
.../gcc.target/riscv/rvv/base/pr115763-1.c|  9 
.../gcc.target/riscv/rvv/base/pr115763-2.c| 10 
.../gcc.target/riscv/rvv/base/scalar_move-5.c |  4 +-
.../gcc.target/riscv/rvv/base/scalar_move-6.c |  6 +--
.../gcc.target/riscv/rvv/base/scalar_move-7.c |  6 +--
.../gcc.target/riscv/rvv/base/scalar_move-8.c |  6 +--
7 files changed, 64 insertions(+), 26 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr115763-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr115763-2.c
 
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index fe18ee5b5f7..d9474262d54 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -2080,31 +2080,50 @@ (define_insn_and_split "*pred_broadcast"
   [(set_attr "type" "vimov,vimov,vlds,vlds,vlds,vlds,vimovxv,vimovxv")
(set_attr "mode" "")])
-(define_insn "*pred_broadcast"
-  [(set (match_operand:V_VLSF_ZVFHMIN 0 "register_operand" "=vr, vr, 
vr, vr, vr, vr, vr, vr")
- (if_then_else:V_VLSF_ZVFHMIN
+(define_insn "*pred_broadcast_zvfh"
+  [(set (match_operand:V_VLSF0 "register_operand"  "=vr,  vr,  
vr,  vr")
+ (if_then_else:V_VLSF
  (unspec:
- [(match_operand: 1 "vector_broadcast_mask_operand" "Wc1,Wc1, vm, 
vm,Wc1,Wc1,Wb1,Wb1")
-  (match_operand 4 "vector_length_operand"  " rK, rK, rK, rK, 
rK, rK, rK, rK")
-  (match_operand 5 "const_int_operand"  "  i,  i,  i,  i,  
i,  i,  i,  i")
-  (match_operand 6 "const_int_operand"  "  i,  i,  i,  i,  
i,  i,  i,  i")
-  (match_operand 7 "const_int_operand"  "  i,  i,  i,  i,  
i,  i,  i,  i")
+ [(match_operand: 1 "vector_broadcast_mask_operand" "Wc1, Wc1, Wb1, 
Wb1")
+  (match_operand  4 "vector_length_operand" " rK,  rK,  rK,  
rK")
+  (match_operand  5 "const_int_operand" "  i,   i,   i,   
i")
+  (match_operand  6 "const_int_operand" "  i,   i,   i,   
i")
+  (match_operand  7 "const_int_operand" "  i,   i,   i,   
i")
 (reg:SI VL_REGNUM)
 (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
-   (vec_duplicate:V_VLSF_ZVFHMIN
- (match_operand: 3 "direct_broadcast_operand"   " f,  
f,Wdm,Wdm,Wdm,Wdm,  f,  f"))
-   (match_operand:V_VLSF_ZVFHMIN 2 "vector_merge_operand""vu,  0, vu,  0, 
vu,  0, vu,  0")))]
+   (vec_duplicate:V_VLSF
+ (match_operand: 3 "direct_broadcast_operand"  "  f,   f,   f,   
f"))
+   (match_operand:V_VLSF  2 "vector_merge_operand"  " vu,   0,  vu,   
0")))]
   "TARGET_VECTOR"
   "@
vfmv.v.f\t%0,%3
vfmv.v.f\t%0,%3
+   vfmv.s.f\t%0,%3
+   vfmv.s.f\t%0,%3"
+  [(set_attr "type" "vfmov,vfmov,vfmovfv,vfmovfv")
+   (set_attr "mode" "")])
+
+(define_insn "*pred_broadcast_zvfhmin"
+  [(set (match_operand:V_VLSF_ZVFHMIN   0 "register_operand"  
"=vr,  vr,  vr,  vr")
+ (if_then_else:V_VLSF_ZVFHMIN
+   (unspec:
+ [(match_operand:1 "vector_broadcast_mask_operand" " vm,  vm, 
Wc1, Wc1")
+  (match_operand 4 "vector_length_operand" " rK,  rK,  
rK,  rK")
+  (match_operand 5 "const_int_operand" "  i,   i,  
 i,   i")
+  (match_operand 6 "const_int_operand" 

Re: [RFC][PATCH v1] Provide more contexts for -Warray-bounds warning messages

2024-07-03 Thread Qing Zhao


> On Jul 2, 2024, at 18:02, David Malcolm  wrote:
> 
> On Tue, 2024-07-02 at 16:17 +, Qing Zhao wrote:
>> due to code duplication from jump threading [PR109071]
>> Control this with a new option -fdiagnostic-try-to-explain-harder.
> 
> The name -fdiagnostic-try-to-explain-harder seems a little too "cute"
> to me, but I can't think of a better name.
Me either -:)
> 
> Various comments inline below...  I'm sorry I didn't take a close look
> at the copy history implementation; I'm hoping Richi will dig into that
> part of the patch.
> 
>> 
>> This patch has been tested with -fdiagnostic-try-to-expain-harder on
>> by
>> default to bootstrap gcc and regression testing on both x86 and
>> aarch64,
>> resolved all bootstrap issues and regression testing issues. 
>> 
>> I need some help in the following two items:
>> 1. suggestions on better documentation wordings for the new option.
>> 2. checking on the new data structures copy_history and the
>>memory management for it.
>>    In the beginning, I tried to use GGC for it, but met quite a few
>>    issues (possibly because of the elimination of the gimple and the
>>    containing basic block), so I gave up on the GGC scheme and instead
>>    used an obstack, manually cleaning up and deleting the obstack at
>>    the end of the compilation.
>> 
>> The following gives more details on the patch.
>> 
>> Thanks a lot!
>> 
>> Qing
>> 
>> $ cat t.c
>> extern void warn(void);
>> static inline void assign(int val, int *regs, int *index)
>> {
>>   if (*index >= 4)
>> warn();
>>   *regs = val;
>> }
>> struct nums {int vals[4];};
>> 
>> void sparx5_set (int *ptr, struct nums *sg, int index)
>> {
>>   int *val = &sg->vals[index];
>> 
>>   assign(0,ptr, &index);
>>   assign(*val, ptr, &index);
>> }
>> 
>> $ gcc -Wall -O2  -c -o t.o t.c
>> t.c: In function ‘sparx5_set’:
>> t.c:12:23: warning: array subscript 4 is above array bounds of
>> ‘int[4]’ [-Warray-bounds=]
>>12 |   int *val = &sg->vals[index];
>>   |   ^~~
>> t.c:8:18: note: while referencing ‘vals’
>> 8 | struct nums {int vals[4];};
>>   |  ^~~~
>> 
>> In the above, Although the warning is correct in theory, the warning
>> message
>> itself is confusing to the end-user since there is information that
>> cannot
>> be connected to the source code directly.
>> 
>> It will be a nice improvement to add more information in the warning
>> message
>> to report where such index value come from.
>> 
>> In order to achieve this, we add a new data structure copy_history to
>> record
>> the condition and the transformation that triggered the code
>> duplication.
>> Whenever there is a code duplication due to some specific
>> transformations,
>> such as jump threading, loop switching, etc, a copy_history structure
>> is
>> created and attached to the duplicated gimple statement.
>> 
>> During array out-of-bound checking or other warning checking, the
>> copy_history
>> that was attached to the gimple statement is used to form a sequence
>> of
>> diagnostic events that are added to the corresponding rich location
>> to be used
>> to report the warning message.
>> 
>> This behavior is controled by the new option -fdiagnostic-try-to-
>> explain-harder
>> which is off by default.
>> 
>> With this change, by adding -fdiagnostic-try-to-explain-harder,
>> the warning message for the above testing case is now:
>> 
>> t.c: In function ‘sparx5_set’:
>> t.c:12:23: warning: array subscript 4 is above array bounds of
>> ‘int[4]’ [-Warray-bounds=]
>>12 |   int *val = &sg->vals[index];
>>   |   ^~~
>>   event 1
>> |
>> |4 |   if (*index >= 4)
>> |  |  ^
>> |  |  |
>> |  |  (1) when the condition is evaluated to true
>> |
>> t.c:8:18: note: while referencing ‘vals’
>> 8 | struct nums {int vals[4];};
>>   |  ^~~~
> 
> BTW I notice in the above example you have the extra vertical line
> below "event 1" here:
> 
>  event 1
>|
>|4 |   if (*index >= 4)
>|  |  ^
>|  |  |
>|  |  (1) when the condition is evaluated to true
>|
> 
> whereas in the testcase it looks like you're expecting the simpler:
> 
>  event 1
>   4 |   if (*index >= 4)
> |  ^
> |  |
> |  (1) when the condition is evaluated to true
> 
> which is due to r15-533-g3cd267446755ab, so I think the example text is
> slightly outdated.

Yes, you are right, I updated the format in the testing case but forgot to 
update it in
the comment of my patch. 
I will update the comment of the patch too.
> 
> I wonder if the wording of the event could be improved to better
> explain to the user what the optimizer is "thinking".
> 
> Perhaps it could be two events:
> 
> (1) when specializing the code for both branches...
> (2) ...and considering the 'true' branch...
> 
> or something like that?  (I'm not sure)
I can explain the event in more details as you

[PATCH] ARC: Update gcc.target/arc/pr9001184797.c test

2024-07-03 Thread Luis Silva
... to comply with the stricter analysis in recent GCC versions:
implicit int and implicit function declarations now trigger warnings.

gcc/testsuite/ChangeLog:

* gcc.target/arc/pr9001184797.c: Fix compiler warnings.
---
 gcc/testsuite/ChangeLog | 4 
 gcc/testsuite/gcc.target/arc/pr9001184797.c | 4 +++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 9aeec32f9e6..bd825881b75 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,7 @@
+2024-07-03  Luis Silva  
+
+   * gcc.target/arc/pr9001184797.c: (Fix compiler warnings)
+
 2024-07-02  Pengxuan Zheng  
 
PR target/113859
diff --git a/gcc/testsuite/gcc.target/arc/pr9001184797.c 
b/gcc/testsuite/gcc.target/arc/pr9001184797.c
index e76c6769042..6c5de5fe729 100644
--- a/gcc/testsuite/gcc.target/arc/pr9001184797.c
+++ b/gcc/testsuite/gcc.target/arc/pr9001184797.c
@@ -4,13 +4,15 @@
 
 /* This test studies the use of anchors and tls symbols. */
 
+extern int h();
+
 struct a b;
 struct a {
   long c;
   long d
 } e() {
   static __thread struct a f;
-  static __thread g;
+  static __thread int g;
   g = 5;
   h();
   if (f.c)
-- 
2.37.1



Re: [PATCH v1] RISC-V: Bugfix vfmv insn honor zvfhmin for FP16 SEW [PR115763]

2024-07-03 Thread Kito Cheng
LGTM, and OK for GCC 14 as well.
By the way, one idea: the value could actually be passed via a GPR,
i.e. fpr->gpr and then vmv.v.x, but that is not a blocking comment for
this patch.

钟居哲 wrote on Wed, 3 Jul 2024 at 22:18:

> LGTM.
>
> --
> juzhe.zh...@rivai.ai
>
>
> *From:* pan2.li 
> *Date:* 2024-07-03 22:17
> *To:* gcc-patches 
> *CC:* juzhe.zhong ; kito.cheng
> ; jeffreyalaw ; rdapp.gcc
> ; Pan Li 
> *Subject:* [PATCH v1] RISC-V: Bugfix vfmv insn honor zvfhmin for FP16 SEW
> [PR115763]
> From: Pan Li 
>
> According to the ISA,  the zvfhmin sub extension should only contain
> conversion insns.  Thus,  the vfmv insn acting on FP16 should not be
> present when only the zvfhmin option is given.
>
> This patch fixes that by splitting the pred_broadcast define_insn
> into a zvfhmin part and a zvfh part.  Given the example below:
>
> void test (_Float16 *dest, _Float16 bias) {
>   dest[0] = bias;
>   dest[1] = bias;
> }
>
> when compile with -march=rv64gcv_zfh_zvfhmin
>
> Before this patch:
> test:
>   vsetivli zero,2,e16,mf4,ta,ma
>   vfmv.v.f v1,fa0 // should not leverage vfmv for zvfhmin
>   vse16.v v1,0(a0)
>   ret
>
> After this patch:
> test:
>   addi sp,sp,-16
>   fsh  fa0,14(sp)
>   addi a5,sp,14
>   vsetivli zero,2,e16,mf4,ta,ma
>   vlse16.v v1,0(a5),zero
>   vse16.v  v1,0(a0)
>   addi sp,sp,16
>   jr   ra
>
> PR target/115763
>
> gcc/ChangeLog:
>
> * config/riscv/vector.md (*pred_broadcast): Split into
> zvfh and zvfhmin part.
> (*pred_broadcast_zvfh): New define_insn for zvfh part.
> (*pred_broadcast_zvfhmin): Ditto but for zvfhmin.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/scalar_move-5.c: Adjust asm check.
> * gcc.target/riscv/rvv/base/scalar_move-6.c: Ditto.
> * gcc.target/riscv/rvv/base/scalar_move-7.c: Ditto.
> * gcc.target/riscv/rvv/base/scalar_move-8.c: Ditto.
> * gcc.target/riscv/rvv/base/pr115763-1.c: New test.
> * gcc.target/riscv/rvv/base/pr115763-2.c: New test.
>
> Signed-off-by: Pan Li 
> ---
> gcc/config/riscv/vector.md| 49 +--
> .../gcc.target/riscv/rvv/base/pr115763-1.c|  9 
> .../gcc.target/riscv/rvv/base/pr115763-2.c| 10 
> .../gcc.target/riscv/rvv/base/scalar_move-5.c |  4 +-
> .../gcc.target/riscv/rvv/base/scalar_move-6.c |  6 +--
> .../gcc.target/riscv/rvv/base/scalar_move-7.c |  6 +--
> .../gcc.target/riscv/rvv/base/scalar_move-8.c |  6 +--
> 7 files changed, 64 insertions(+), 26 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr115763-1.c
> create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr115763-2.c
>
> diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
> index fe18ee5b5f7..d9474262d54 100644
> --- a/gcc/config/riscv/vector.md
> +++ b/gcc/config/riscv/vector.md
> @@ -2080,31 +2080,50 @@ (define_insn_and_split "*pred_broadcast"
>[(set_attr "type" "vimov,vimov,vlds,vlds,vlds,vlds,vimovxv,vimovxv")
> (set_attr "mode" "")])
> -(define_insn "*pred_broadcast"
> -  [(set (match_operand:V_VLSF_ZVFHMIN 0 "register_operand" "=vr,
> vr, vr, vr, vr, vr, vr, vr")
> - (if_then_else:V_VLSF_ZVFHMIN
> +(define_insn "*pred_broadcast_zvfh"
> +  [(set (match_operand:V_VLSF0 "register_operand"  "=vr,
> vr,  vr,  vr")
> + (if_then_else:V_VLSF
>   (unspec:
> - [(match_operand: 1 "vector_broadcast_mask_operand" "Wc1,Wc1, vm,
> vm,Wc1,Wc1,Wb1,Wb1")
> -  (match_operand 4 "vector_length_operand"  " rK, rK, rK,
> rK, rK, rK, rK, rK")
> -  (match_operand 5 "const_int_operand"  "  i,  i,
> i,  i,  i,  i,  i,  i")
> -  (match_operand 6 "const_int_operand"  "  i,  i,
> i,  i,  i,  i,  i,  i")
> -  (match_operand 7 "const_int_operand"  "  i,  i,
> i,  i,  i,  i,  i,  i")
> + [(match_operand: 1 "vector_broadcast_mask_operand" "Wc1, Wc1,
> Wb1, Wb1")
> +  (match_operand  4 "vector_length_operand" " rK,  rK,
> rK,  rK")
> +  (match_operand  5 "const_int_operand" "  i,   i,
> i,   i")
> +  (match_operand  6 "const_int_operand" "  i,   i,
> i,   i")
> +  (match_operand  7 "const_int_operand" "  i,   i,
> i,   i")
>  (reg:SI VL_REGNUM)
>  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
> -   (vec_duplicate:V_VLSF_ZVFHMIN
> - (match_operand: 3 "direct_broadcast_operand"   " f,
> f,Wdm,Wdm,Wdm,Wdm,  f,  f"))
> -   (match_operand:V_VLSF_ZVFHMIN 2 "vector_merge_operand""vu,  0,
> vu,  0, vu,  0, vu,  0")))]
> +   (vec_duplicate:V_VLSF
> + (match_operand: 3 "direct_broadcast_operand"  "  f,   f,
> f,   f"))
> +   (match_operand:V_VLSF  2 "vector_merge_operand"  " vu,   0,
> vu,   0")))]
>"TARGET_VECTOR"
>"@
> vfmv.v.f\t%0,%3
> vfmv.v.f\t%0,%3
> +   vfmv.s.f\t%0,%3
> +   vfmv.s.f\t%0,%3"
> +  [(set_attr "type" "vfmov,vfmov,vfmovfv,vfmovfv")
> +   (set_attr "mode" "")])
> +
> +(define_insn "*pred_broadcast_zvfhmin"
> +  [(set (match_operand:V_

[PATCH] c++, libstdc++: Implement C++26 P2747R2 - constexpr placement new [PR115744]

2024-07-03 Thread Jakub Jelinek
Hi!

With the PR115754 fix in, constexpr placement new mostly just works,
so this patch just adds the constexpr keyword to the placement new operators
in , adds FTMs and testsuite coverage.

There is one accepts-invalid though, the
new (p + 1) int[]{2, 3};  // error (in this paper)
case from the paper.  Can we handle that incrementally?
The problem with that is I think calling operator new now that it is
constexpr should be fine even in that case in constant expressions, so
int *p = std::allocator{}.allocate(3);
int *q = operator new[] (sizeof (int) * 2, p + 1);
should be ok, so it can't be easily the placement new operator call
itself on whose constexpr evaluation we try something special, it should
be on the new expression, but constexpr.cc actually sees only
<<< Unknown tree: expr_stmt
  (void) (TARGET_EXPR (b) + 4>>, TARGET_EXPR )>,   int * D.2643;
  <<< Unknown tree: expr_stmt
(void) (D.2643 = (int *) D.2642) >>>;
and that is just fine by the preexisting constexpr evaluation rules.

Should build_new_1 emit some extra cast for the array cases with placement
new in maybe_constexpr_fn (current_function_decl) that the existing P2738
code would catch?

Anyway, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-07-03  Jakub Jelinek  

PR c++/115744
gcc/c-family/
* c-cppbuiltin.cc (c_cpp_builtins): Change __cpp_constexpr
from 202306L to 202406L for C++26.
gcc/testsuite/
* g++.dg/cpp2a/construct_at.h (operator new, operator new[]):
Use constexpr instead of inline if __cpp_constexpr >= 202406L.
* g++.dg/cpp26/constexpr-new1.C: New test.
* g++.dg/cpp26/constexpr-new2.C: New test.
* g++.dg/cpp26/constexpr-new3.C: New test.
* g++.dg/cpp26/feat-cxx26.C (__cpp_constexpr): Adjust expected
value.
libstdc++-v3/
* libsupc++/new (__glibcxx_want_constexpr_new): Define before
including bits/version.h.
(_GLIBCXX_PLACEMENT_CONSTEXPR): Define.
(operator new, operator new[]): Use it for placement new instead
of inline.
* include/bits/version.def (constexpr_new): New FTM.
* include/bits/version.h: Regenerate.

--- gcc/c-family/c-cppbuiltin.cc.jj 2024-07-02 22:06:21.343875948 +0200
+++ gcc/c-family/c-cppbuiltin.cc2024-07-03 10:18:00.311324004 +0200
@@ -1091,7 +1091,7 @@ c_cpp_builtins (cpp_reader *pfile)
   if (cxx_dialect > cxx23)
{
  /* Set feature test macros for C++26.  */
- cpp_define (pfile, "__cpp_constexpr=202306L");
+ cpp_define (pfile, "__cpp_constexpr=202406L");
  cpp_define (pfile, "__cpp_static_assert=202306L");
  cpp_define (pfile, "__cpp_placeholder_variables=202306L");
  cpp_define (pfile, "__cpp_structured_bindings=202403L");
--- gcc/testsuite/g++.dg/cpp2a/construct_at.h.jj2024-07-02 
22:06:22.138865784 +0200
+++ gcc/testsuite/g++.dg/cpp2a/construct_at.h   2024-07-03 10:18:00.312323991 
+0200
@@ -58,5 +58,18 @@ namespace std
   { l->~T (); }
 }
 
-inline void *operator new (std::size_t, void *p) noexcept
+#if __cpp_constexpr >= 202406L
+constexpr
+#else
+inline
+#endif
+void *operator new (std::size_t, void *p) noexcept
+{ return p; }
+
+#if __cpp_constexpr >= 202406L
+constexpr
+#else
+inline
+#endif
+void *operator new[] (std::size_t, void *p) noexcept
 { return p; }
--- gcc/testsuite/g++.dg/cpp26/constexpr-new1.C.jj  2024-07-03 
10:18:00.312323991 +0200
+++ gcc/testsuite/g++.dg/cpp26/constexpr-new1.C 2024-07-03 10:18:00.312323991 
+0200
@@ -0,0 +1,66 @@
+// C++26 P2747R2 - constexpr placement new
+// { dg-do compile { target c++26 } }
+
+#include "../cpp2a/construct_at.h"
+
+struct S {
+  constexpr S () : a (42), b (43) {}
+  constexpr S (int c, int d) : a (c), b (d) {}
+  int a, b;
+};
+struct T {
+  int a, b;
+};
+
+constexpr bool
+foo ()
+{
+  std::allocator<int> a;
+  auto b = a.allocate (3);
+  ::new (b) int ();
+  ::new (b + 1) int (1);
+  ::new (b + 2) int {2};
+  if (b[0] != 0 || b[1] != 1 || b[2] != 2)
+return false;
+  a.deallocate (b, 3);
+  std::allocator<S> c;
+  auto d = c.allocate (4);
+  ::new (d) S;
+  ::new (d + 1) S ();
+  ::new (d + 2) S (7, 8);
+  ::new (d + 3) S { 9, 10 };
+  if (d[0].a != 42 || d[0].b != 43
+  || d[1].a != 42 || d[1].b != 43
+  || d[2].a != 7 || d[2].b != 8
+  || d[3].a != 9 || d[3].b != 10)
+return false;
+  d[0].~S ();
+  d[1].~S ();
+  d[2].~S ();
+  d[3].~S ();
+  c.deallocate (d, 4);
+  std::allocator<T> e;
+  auto f = e.allocate (3);
+  ::new (f) T ();
+  ::new (f + 1) T (7, 8);
+  ::new (f + 2) T { .a = 9, .b = 10 };
+  if (f[0].a != 0 || f[0].b != 0
+  || f[1].a != 7 || f[1].b != 8
+  || f[2].a != 9 || f[2].b != 10)
+return false;
+  f[0].~T ();
+  f[1].~T ();
+  f[2].~T ();
+  e.deallocate (f, 3);
+  auto g = a.allocate (3);
+  new (g) int[] {1, 2, 3};
+  if (g[0] != 1 || g[1] != 2 || g[2] != 3)
+return false;
+  new (g) int[] {4, 5};
+  if (g[0] != 4 || g[1] != 5)
+return false;
+  a.dea

[PATCH] c++: Implement C++26 CWG2819 - Allow cv void * null pointer value conversion to object types in constant expressions

2024-07-03 Thread Jakub Jelinek
Hi!

The following patch implements CWG2819 (which wasn't a DR because
it changes behavior of C++26 only).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-07-03  Jakub Jelinek  

* constexpr.cc (cxx_eval_constant_expression): CWG2819 - Allow
cv void * null pointer value conversion to object types in constant
expressions.

* g++.dg/cpp26/constexpr-voidptr3.C: New test.
* g++.dg/cpp0x/constexpr-cast2.C: Adjust expected diagnostics for
C++26.
* g++.dg/cpp0x/constexpr-cast4.C: Likewise.

--- gcc/cp/constexpr.cc.jj  2024-07-02 22:09:52.493176541 +0200
+++ gcc/cp/constexpr.cc 2024-07-03 12:46:57.255025849 +0200
@@ -8157,10 +8157,13 @@ cxx_eval_constant_expression (const cons
|| DECL_NAME (decl) == heap_vec_uninit_identifier))
  /* OK */;
/* P2738 (C++26): a conversion from a prvalue P of type "pointer to
-  cv void" to a pointer-to-object type T unless P points to an
-  object whose type is similar to T.  */
+  cv void" to a pointer-to-object type T unless P is a null
+  pointer value or points to an object whose type is similar to
+  T.  */
else if (cxx_dialect > cxx23)
  {
+   if (integer_zerop (sop))
+ return build_int_cst (type, 0);
r = cxx_fold_indirect_ref (ctx, loc, TREE_TYPE (type), sop);
if (r)
  {
--- gcc/testsuite/g++.dg/cpp26/constexpr-voidptr3.C.jj  2024-07-03 
12:35:56.301762995 +0200
+++ gcc/testsuite/g++.dg/cpp26/constexpr-voidptr3.C 2024-07-03 
12:35:51.577825446 +0200
@@ -0,0 +1,13 @@
+// CWG 2819 - Cast from null pointer value in a constant expression
+// { dg-do compile { target c++26 } }
+
+struct S { int s; };
+
+constexpr S *
+foo ()
+{
+  void *p = nullptr;
+  return static_cast<S *> (p);
+}
+
+static_assert (foo () == nullptr);
--- gcc/testsuite/g++.dg/cpp0x/constexpr-cast2.C.jj 2023-07-17 
09:07:42.104283529 +0200
+++ gcc/testsuite/g++.dg/cpp0x/constexpr-cast2.C2024-07-03 
16:22:33.294916937 +0200
@@ -5,9 +5,9 @@
 static int i;
 constexpr void *vp0 = nullptr;
 constexpr void *vpi = &i;
-constexpr int *p1 = (int *) vp0; // { dg-error "cast from .void\\*. is not 
allowed" }
+constexpr int *p1 = (int *) vp0; // { dg-error "cast from .void\\*. is not 
allowed" "" { target c++23_down } }
 constexpr int *p2 = (int *) vpi; // { dg-error "cast from .void\\*. is not 
allowed" "" { target c++23_down } }
-constexpr int *p3 = static_cast<int*>(vp0); // { dg-error "cast from 
.void\\*. is not allowed" }
+constexpr int *p3 = static_cast<int*>(vp0); // { dg-error "cast from 
.void\\*. is not allowed" "" { target c++23_down } }
 constexpr int *p4 = static_cast<int*>(vpi); // { dg-error "cast from 
.void\\*. is not allowed" "" { target c++23_down } }
 constexpr void *p5 = vp0;
 constexpr void *p6 = vpi;
--- gcc/testsuite/g++.dg/cpp0x/constexpr-cast4.C.jj 2023-11-02 
07:39:18.679201173 +0100
+++ gcc/testsuite/g++.dg/cpp0x/constexpr-cast4.C2024-07-03 
16:23:29.424197809 +0200
@@ -8,4 +8,3 @@ constexpr float* pf = static_cast
 constexpr void* vnp = nullptr;
 
 constexpr int* pi2 = static_cast<int*>(vnp);  // { dg-error "cast from 
.void\\*. is not allowed" "" { target c++23_down } }
-// { dg-error "cast from .void\\*. is not allowed in a constant expression 
because .vnp. does not point to an object" "" { target c++26 } .-1 }

Jakub



RE: [PATCH v1] RISC-V: Bugfix vfmv insn honor zvfhmin for FP16 SEW [PR115763]

2024-07-03 Thread Li, Pan2
Committed, thanks Juzhe and Kito. Let's wait for a while before backporting to 14.

I suspect there may be similar cases for other insns; I will double-check and
fix those first.

Pan

From: Kito Cheng 
Sent: Wednesday, July 3, 2024 10:32 PM
To: juzhe.zh...@rivai.ai
Cc: Li, Pan2 ; gcc-patches@gcc.gnu.org; 
jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1] RISC-V: Bugfix vfmv insn honor zvfhmin for FP16 SEW 
[PR115763]


LGTM and ok for gcc 14 as well,
btw, an idea is that it actually could be passed via GPR, I mean fpr->gpr and
then vmv.v.x, but that's not a blocking comment for this patch.

Juzhe Zhong <juzhe.zh...@rivai.ai> wrote on Wed, 3 Jul 2024 at 22:18:
LGTM.


juzhe.zh...@rivai.ai

From: pan2.li
Date: 2024-07-03 22:17
To: gcc-patches
CC: juzhe.zhong; 
kito.cheng; 
jeffreyalaw; 
rdapp.gcc; Pan Li
Subject: [PATCH v1] RISC-V: Bugfix vfmv insn honor zvfhmin for FP16 SEW 
[PR115763]
From: Pan Li mailto:pan2...@intel.com>>

According to the ISA, the zvfhmin sub-extension should only contain
conversion insns.  Thus, the vfmv insn acting on FP16 should not be
present when only the zvfhmin option is given.

This patch fixes this by splitting the pred_broadcast define_insn
into a zvfhmin and a zvfh part.  Given the example below:

void test (_Float16 *dest, _Float16 bias) {
  dest[0] = bias;
  dest[1] = bias;
}

when compiled with -march=rv64gcv_zfh_zvfhmin

Before this patch:
test:
  vsetivli zero,2,e16,mf4,ta,ma
  vfmv.v.f v1,fa0 // should not leverage vfmv for zvfhmin
  vse16.v v1,0(a0)
  ret

After this patch:
test:
  addi sp,sp,-16
  fsh  fa0,14(sp)
  addi a5,sp,14
  vsetivli zero,2,e16,mf4,ta,ma
  vlse16.v v1,0(a5),zero
  vse16.v  v1,0(a0)
  addi sp,sp,16
  jr   ra

PR target/115763

gcc/ChangeLog:

* config/riscv/vector.md (*pred_broadcast<mode>): Split into
zvfh and zvfhmin part.
(*pred_broadcast<mode>_zvfh): New define_insn for zvfh part.
(*pred_broadcast<mode>_zvfhmin): Ditto but for zvfhmin.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/scalar_move-5.c: Adjust asm check.
* gcc.target/riscv/rvv/base/scalar_move-6.c: Ditto.
* gcc.target/riscv/rvv/base/scalar_move-7.c: Ditto.
* gcc.target/riscv/rvv/base/scalar_move-8.c: Ditto.
* gcc.target/riscv/rvv/base/pr115763-1.c: New test.
* gcc.target/riscv/rvv/base/pr115763-2.c: New test.

Signed-off-by: Pan Li mailto:pan2...@intel.com>>
---
gcc/config/riscv/vector.md| 49 +--
.../gcc.target/riscv/rvv/base/pr115763-1.c|  9 
.../gcc.target/riscv/rvv/base/pr115763-2.c| 10 
.../gcc.target/riscv/rvv/base/scalar_move-5.c |  4 +-
.../gcc.target/riscv/rvv/base/scalar_move-6.c |  6 +--
.../gcc.target/riscv/rvv/base/scalar_move-7.c |  6 +--
.../gcc.target/riscv/rvv/base/scalar_move-8.c |  6 +--
7 files changed, 64 insertions(+), 26 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr115763-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr115763-2.c

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index fe18ee5b5f7..d9474262d54 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -2080,31 +2080,50 @@ (define_insn_and_split "*pred_broadcast<mode>"
   [(set_attr "type" "vimov,vimov,vlds,vlds,vlds,vlds,vimovxv,vimovxv")
(set_attr "mode" "<MODE>")])
-(define_insn "*pred_broadcast<mode>"
-  [(set (match_operand:V_VLSF_ZVFHMIN 0 "register_operand" "=vr, vr, 
vr, vr, vr, vr, vr, vr")
- (if_then_else:V_VLSF_ZVFHMIN
+(define_insn "*pred_broadcast<mode>_zvfh"
+  [(set (match_operand:V_VLSF 0 "register_operand"  "=vr,  vr,  
vr,  vr")
+ (if_then_else:V_VLSF
  (unspec:<VM>
- [(match_operand:<VM> 1 "vector_broadcast_mask_operand" "Wc1,Wc1, vm, 
vm,Wc1,Wc1,Wb1,Wb1")
-  (match_operand 4 "vector_length_operand"  " rK, rK, rK, rK, 
rK, rK, rK, rK")
-  (match_operand 5 "const_int_operand"  "  i,  i,  i,  i,  
i,  i,  i,  i")
-  (match_operand 6 "const_int_operand"  "  i,  i,  i,  i,  
i,  i,  i,  i")
-  (match_operand 7 "const_int_operand"  "  i,  i,  i,  i,  
i,  i,  i,  i")
+ [(match_operand:<VM> 1 "vector_broadcast_mask_operand" "Wc1, Wc1, Wb1, 
Wb1")
+  (match_operand  4 "vector_length_operand" " rK,  rK,  rK,  
rK")
+  (match_operand  5 "const_int_operand" "  i,   i,   i,   
i")
+  (match_operand  6 "const_int_operand" "  i,   i,   i,   
i")
+  (match_operand  7 "const_int_operand" "  i,   i,   i,   
i")
 (reg:SI VL_REGNUM)
 (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
-   (vec_duplicate:V_VLSF_ZVFHMIN
- (match_operand:<VEL> 3 "direct_broadcast_operand"   " f,  
f,Wdm,Wdm,Wdm,Wdm,  f,  f"))
-   (match_operand:V_VLSF_ZVFHMIN 2 "vector_

Re: [PATCH] c++, libstdc++: Implement C++26 P2747R2 - constexpr placement new [PR115744]

2024-07-03 Thread Jonathan Wakely
On Wed, 3 Jul 2024 at 15:37, Jakub Jelinek  wrote:
>
> Hi!
>
> With the PR115754 fix in, constexpr placement new mostly just works,
> so this patch just adds constexpr keyword to the placement new operators
> in <new>, adds FTMs and testsuite coverage.
>
> There is one accepts-invalid though, the
> new (p + 1) int[]{2, 3};  // error (in this paper)
> case from the paper.  Can we handle that incrementally?
> The problem with that is I think calling operator new now that it is
> constexpr should be fine even in that case in constant expressions, so
> int *p = std::allocator<int>{}.allocate(3);
> int *q = operator new[] (sizeof (int) * 2, p + 1);
> should be ok, so it can't be easily the placement new operator call
> itself on whose constexpr evaluation we try something special, it should
> be on the new expression, but constexpr.cc actually sees only
> <<< Unknown tree: expr_stmt
>   (void) (TARGET_EXPR  VIEW_CONVERT_EXPR(b) + 4>>, TARGET_EXPR  NON_LVALUE_EXPR )>,   int * D.2643;
>   <<< Unknown tree: expr_stmt
> (void) (D.2643 = (int *) D.2642) >>>;
> and that is just fine by the preexisting constexpr evaluation rules.
>
> Should build_new_1 emit some extra cast for the array cases with placement
> new in maybe_constexpr_fn (current_function_decl) that the existing P2738
> code would catch?
>
> Anyway, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

I have a mild preference for #undef _GLIBCXX_PLACEMENT_CONSTEXPR after
we're finished using it, but the libstdc++ parts are OK either way.


>
> 2024-07-03  Jakub Jelinek  
>
> PR c++/115744
> gcc/c-family/
> * c-cppbuiltin.cc (c_cpp_builtins): Change __cpp_constexpr
> from 202306L to 202406L for C++26.
> gcc/testsuite/
> * g++.dg/cpp2a/construct_at.h (operator new, operator new[]):
> Use constexpr instead of inline if __cpp_constexpr >= 202406L.
> * g++.dg/cpp26/constexpr-new1.C: New test.
> * g++.dg/cpp26/constexpr-new2.C: New test.
> * g++.dg/cpp26/constexpr-new3.C: New test.
> * g++.dg/cpp26/feat-cxx26.C (__cpp_constexpr): Adjust expected
> value.
> libstdc++-v3/
> * libsupc++/new (__glibcxx_want_constexpr_new): Define before
> including bits/version.h.
> (_GLIBCXX_PLACEMENT_CONSTEXPR): Define.
> (operator new, operator new[]): Use it for placement new instead
> of inline.
> * include/bits/version.def (constexpr_new): New FTM.
> * include/bits/version.h: Regenerate.
>
> --- gcc/c-family/c-cppbuiltin.cc.jj 2024-07-02 22:06:21.343875948 +0200
> +++ gcc/c-family/c-cppbuiltin.cc2024-07-03 10:18:00.311324004 +0200
> @@ -1091,7 +1091,7 @@ c_cpp_builtins (cpp_reader *pfile)
>if (cxx_dialect > cxx23)
> {
>   /* Set feature test macros for C++26.  */
> - cpp_define (pfile, "__cpp_constexpr=202306L");
> + cpp_define (pfile, "__cpp_constexpr=202406L");
>   cpp_define (pfile, "__cpp_static_assert=202306L");
>   cpp_define (pfile, "__cpp_placeholder_variables=202306L");
>   cpp_define (pfile, "__cpp_structured_bindings=202403L");
> --- gcc/testsuite/g++.dg/cpp2a/construct_at.h.jj2024-07-02 
> 22:06:22.138865784 +0200
> +++ gcc/testsuite/g++.dg/cpp2a/construct_at.h   2024-07-03 10:18:00.312323991 
> +0200
> @@ -58,5 +58,18 @@ namespace std
>{ l->~T (); }
>  }
>
> -inline void *operator new (std::size_t, void *p) noexcept
> +#if __cpp_constexpr >= 202406L
> +constexpr
> +#else
> +inline
> +#endif
> +void *operator new (std::size_t, void *p) noexcept
> +{ return p; }
> +
> +#if __cpp_constexpr >= 202406L
> +constexpr
> +#else
> +inline
> +#endif
> +void *operator new[] (std::size_t, void *p) noexcept
>  { return p; }
> --- gcc/testsuite/g++.dg/cpp26/constexpr-new1.C.jj  2024-07-03 
> 10:18:00.312323991 +0200
> +++ gcc/testsuite/g++.dg/cpp26/constexpr-new1.C 2024-07-03 10:18:00.312323991 
> +0200
> @@ -0,0 +1,66 @@
> +// C++26 P2747R2 - constexpr placement new
> +// { dg-do compile { target c++26 } }
> +
> +#include "../cpp2a/construct_at.h"
> +
> +struct S {
> +  constexpr S () : a (42), b (43) {}
> +  constexpr S (int c, int d) : a (c), b (d) {}
> +  int a, b;
> +};
> +struct T {
> +  int a, b;
> +};
> +
> +constexpr bool
> +foo ()
> +{
> +  std::allocator<int> a;
> +  auto b = a.allocate (3);
> +  ::new (b) int ();
> +  ::new (b + 1) int (1);
> +  ::new (b + 2) int {2};
> +  if (b[0] != 0 || b[1] != 1 || b[2] != 2)
> +return false;
> +  a.deallocate (b, 3);
> +  std::allocator<S> c;
> +  auto d = c.allocate (4);
> +  ::new (d) S;
> +  ::new (d + 1) S ();
> +  ::new (d + 2) S (7, 8);
> +  ::new (d + 3) S { 9, 10 };
> +  if (d[0].a != 42 || d[0].b != 43
> +  || d[1].a != 42 || d[1].b != 43
> +  || d[2].a != 7 || d[2].b != 8
> +  || d[3].a != 9 || d[3].b != 10)
> +return false;
> +  d[0].~S ();
> +  d[1].~S ();
> +  d[2].~S ();
> +  d[3].~S ();
> +  c.deallocate (d, 4);
> +  std::allocator<T> e;
> +  auto f = e.allocate (3

Re: [PATCH] rs6000: ROP - Emit hashst and hashchk insns on Power8 and later [PR114759]

2024-07-03 Thread Peter Bergner
On 7/3/24 4:01 AM, Kewen.Lin wrote:
>> -  if (TARGET_POWER10
>> +  if (TARGET_POWER8
>>&& info->calls_p
>>&& DEFAULT_ABI == ABI_ELFv2
>>&& rs6000_rop_protect)
> 
> Nit: I noticed that this is the only place to change
> info->rop_hash_size to non-zero, and ...
> 
>> @@ -3277,7 +3277,7 @@ rs6000_emit_prologue (void)
>>/* NOTE: The hashst isn't needed if we're going to do a sibcall,
>>   but there's no way to know that here.  Harmless except for
>>   performance, of course.  */
>> -  if (TARGET_POWER10 && rs6000_rop_protect && info->rop_hash_size != 0)
>> +  if (TARGET_POWER8 && rs6000_rop_protect && info->rop_hash_size != 0)
> 
> ... this condition and ...
> 
>>  {
>>gcc_assert (DEFAULT_ABI == ABI_ELFv2);
>>rtx stack_ptr = gen_rtx_REG (Pmode, STACK_POINTER_REGNUM);
>> @@ -5056,7 +5056,7 @@ rs6000_emit_epilogue (enum epilogue_type epilogue_type)
>>  
>>/* The ROP hash check must occur after the stack pointer is restored
>>   (since the hash involves r1), and is not performed for a sibcall.  */
>> -  if (TARGET_POWER10
>> +  if (TARGET_POWER8
>>&& rs6000_rop_protect
>>&& info->rop_hash_size != 0
> 
> ... here, both check info->rop_hash_size isn't zero, I think we can drop these
> two TARGET_POWER10 (TARGET_POWER8) and rs6000_rop_protect checks?  Instead 
> just
> update the inner gcc_assert (now checking DEFAULT_ABI == ABI_ELFv2) by extra
> checkings on TARGET_POWER8 && rs6000_rop_protect?
> 
> The other looks good to me, ok for trunk with this nit tweaked (if you agree
> with it and re-tested well), thanks!

I agree with you, because the next patch I haven't submitted yet (waiting
on this to get in) makes that simplification as part of adding the earlier
checking of invalid options. :-)  The follow-on patch will not only remove
the TARGET_* and the 2nd/3rd rs6000_rop_protect usage, but will also remove
the test and asserts of ELFv2...because we've already verified valid option
usage earlier in the normal options handling code.

Therefore, I'd like to keep this patch as simple as possible and limited to
the TARGET_POWER10 -> TARGET_POWER8 change and the cleanup of those tests is
coming in the next patch...which has already been tested.

Peter




Re: [PATCH v2] RISC-V: Implement the .SAT_TRUNC for scalar

2024-07-03 Thread Jeff Law




On 7/2/24 7:16 PM, Li, Pan2 wrote:

Thanks Jeff for comments.


Why are you using Pmode?  Pmode is for pointers.  This stuff looks like
basic integer ops, so I don't see why Pmode is appropriate.


The incoming operand may be HI/QI/SImode, so we need to promote the mode.
So should we take Xmode there?  Will update in v2.
I would expect that QI/HI shouldn't be happening in practice due to the 
definition of WORD_REGISTER_OPERATIONS.


For rv32 I would expect to just see SI.  For rv64 we're likely to see 
both SI and DI and I would expect that you can just use GET_MODE (src) 
to get that input mode -- unless the input is a constant.


Note that since you're ultimately generating an IOR, if you've got an SI 
input on rv64, then you're going to need to either extend the input or 
wrap it in a suitable widening subreg.


If we allow constants, then we probably need further adjustments.


Jeff

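As background for the IOR-based expansion discussed above, the scalar .SAT_TRUNC semantics can be sketched in plain C. The function name below is made up for illustration and is not the actual middle-end interface:

```c
#include <stdint.h>

/* Minimal sketch of .SAT_TRUNC for u64 -> u32: a value that does not
   fit the narrower type saturates to all-ones.  The branchless form is
   built around an IOR: if x exceeds UINT32_MAX, OR in all-ones before
   truncating.  */
static inline uint32_t
sat_trunc_u64_u32 (uint64_t x)
{
  uint64_t overflow = -(uint64_t) (x > UINT32_MAX); /* 0 or all-ones */
  return (uint32_t) (x | overflow);
}
```

On rv64 the comparison and IOR happen in Xmode (DImode), which is why an SImode input needs extending or a widening subreg, as noted above.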

[PING*2][PATCH v2] rs6000: ROP - Do not disable shrink-wrapping for leaf functions [PR114759]

2024-07-03 Thread Peter Bergner
Ping * 2.   [Message-ID: <1e003d78-3b2e-4263-830a-7c00a3e9d...@linux.ibm.com>]

Segher, this resolves the issues you mentioned in your review.

Peter



On 6/18/24 5:59 PM, Peter Bergner wrote:
> Updated patch.  This passed bootstrap and regtesting on powerpc64le-linux
> with no regressions.  Ok for trunk?
>
> Changes from v1:
> 1. Moved the disabling of shrink-wrapping to rs6000_emit_prologue
>and beefed up comment.  Used a more accurate test.
> 2. Added comment to the test case on why rop_ok is needed.
>
> Peter
>
>
> rs6000: ROP - Do not disable shrink-wrapping for leaf functions [PR114759]
>
> Only disable shrink-wrapping when using -mrop-protect when we know we
> will be emitting the ROP-protect hash instructions (ie, non-leaf functions).
>
> 2024-06-17  Peter Bergner  
>
> gcc/
>   PR target/114759
>   * config/rs6000/rs6000.cc (rs6000_override_options_after_change): Move
>   the disabling of shrink-wrapping from here
>   * config/rs6000/rs6000-logue.cc (rs6000_emit_prologue): ...to here.
>
> gcc/testsuite/
>   PR target/114759
>   * gcc.target/powerpc/pr114759-1.c: New test.
> ---
>  gcc/config/rs6000/rs6000-logue.cc |  5 +
>  gcc/config/rs6000/rs6000.cc   |  4 
>  gcc/testsuite/gcc.target/powerpc/pr114759-1.c | 16 
>  3 files changed, 21 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr114759-1.c
>
> diff --git a/gcc/config/rs6000/rs6000-logue.cc 
> b/gcc/config/rs6000/rs6000-logue.cc
> index 193e2122c0f..c384e48e378 100644
> --- a/gcc/config/rs6000/rs6000-logue.cc
> +++ b/gcc/config/rs6000/rs6000-logue.cc
> @@ -3018,6 +3018,11 @@ rs6000_emit_prologue (void)
> && (lookup_attribute ("no_split_stack",
>   DECL_ATTRIBUTES 
> (cfun->decl))
> == NULL));
> +  /* If we are inserting ROP-protect hash instructions, disable shrink-wrap
> + until the bug where the hashst insn is emitted in the wrong location
> + is fixed.  See PR101324 for details.  */
> +  if (info->rop_hash_size)
> +flag_shrink_wrap = 0;
>  
>frame_pointer_needed_indeed
>  = frame_pointer_needed && df_regs_ever_live_p 
> (HARD_FRAME_POINTER_REGNUM);
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index e4dc629ddcc..fd6e013c346 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -3427,10 +3427,6 @@ rs6000_override_options_after_change (void)
>  }
>else if (!OPTION_SET_P (flag_cunroll_grow_size))
>  flag_cunroll_grow_size = flag_peel_loops || optimize >= 3;
> -
> -  /* If we are inserting ROP-protect instructions, disable shrink wrap.  */
> -  if (rs6000_rop_protect)
> -flag_shrink_wrap = 0;
>  }
>  
>  #ifdef TARGET_USES_LINUX64_OPT
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr114759-1.c 
> b/gcc/testsuite/gcc.target/powerpc/pr114759-1.c
> new file mode 100644
> index 000..579e08e920f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr114759-1.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power10 -mrop-protect 
> -fdump-rtl-pro_and_epilogue" } */
> +/* { dg-require-effective-target rop_ok } Only enable on supported ABIs. */
> +
> +/* Verify we still attempt shrink-wrapping when using -mrop-protect
> +   and there are no function calls.  */
> +
> +long
> +foo (long arg)
> +{
> +  if (arg)
> +asm ("" ::: "r20");
> +  return 0;
> +}
> +
> +/* { dg-final { scan-rtl-dump-times "Performing shrink-wrapping" 1 
> "pro_and_epilogue" } } */



Re: [PATCH 1/3] libstdc++: Use RAII in <bits/stl_uninitialized.h>

2024-07-03 Thread Jonathan Wakely
On Thu, 27 Jun 2024 at 11:52, Jonathan Wakely wrote:
>
> This refactoring to use RAII doesn't seem to make any difference in
> benchmarks, although the generated code for some std::vector operations
> seems to be slightly larger. Maybe it will be faster (or slower) in some
> cases I didn't test?
>
> I think I like the change anyway - any other opinions on whether it's an
> improvement?

Any thoughts before I push this? Better? Worse? Needs more cowbell?


> Tested x86_64-linux.
>
> -- >8 --
>
> This adds an _UninitDestroyGuard class template, similar to
> ranges::_DestroyGuard used in <bits/ranges_uninitialized.h>. This allows
> us to remove all the try-catch blocks and rethrows, because any required
> cleanup gets done in the guard destructor.
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/stl_uninitialized.h (_UninitDestroyGuard): New
> class template and partial specialization.
> (__do_uninit_copy, __do_uninit_fill, __do_uninit_fill_n)
> (__uninitialized_copy_a, __uninitialized_fill_a)
> (__uninitialized_fill_n_a, __uninitialized_copy_move)
> (__uninitialized_move_copy, __uninitialized_fill_move)
> (__uninitialized_move_fill, __uninitialized_default_1)
> (__uninitialized_default_n_a, __uninitialized_default_novalue_1)
> (__uninitialized_default_novalue_n_1, __uninitialized_copy_n)
> (__uninitialized_copy_n_pair): Use it.
> ---
>  libstdc++-v3/include/bits/stl_uninitialized.h | 365 --
>  1 file changed, 156 insertions(+), 209 deletions(-)
>
> diff --git a/libstdc++-v3/include/bits/stl_uninitialized.h 
> b/libstdc++-v3/include/bits/stl_uninitialized.h
> index 3c405d8fbe8..a9965f26269 100644
> --- a/libstdc++-v3/include/bits/stl_uninitialized.h
> +++ b/libstdc++-v3/include/bits/stl_uninitialized.h
> @@ -107,24 +107,70 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  __is_trivial(T) && __is_assignable(T&, U)
>  #endif
>
> +  template
> +struct _UninitDestroyGuard
> +{
> +  _GLIBCXX20_CONSTEXPR
> +  explicit
> +  _UninitDestroyGuard(_ForwardIterator& __first, _Alloc& __a)
> +  : _M_first(__first), _M_cur(__builtin_addressof(__first)), 
> _M_alloc(__a)
> +  { }
> +
> +  _GLIBCXX20_CONSTEXPR
> +  ~_UninitDestroyGuard()
> +  {
> +   if (__builtin_expect(_M_cur != 0, 0))
> + std::_Destroy(_M_first, *_M_cur, _M_alloc);
> +  }
> +
> +  _GLIBCXX20_CONSTEXPR
> +  void release() { _M_cur = 0; }
> +
> +private:
> +  _ForwardIterator const _M_first;
> +  _ForwardIterator* _M_cur;
> +  _Alloc& _M_alloc;
> +
> +  _UninitDestroyGuard(const _UninitDestroyGuard&);
> +};
> +
> +  template
> +struct _UninitDestroyGuard<_ForwardIterator, void>
> +{
> +  _GLIBCXX20_CONSTEXPR
> +  explicit
> +  _UninitDestroyGuard(_ForwardIterator& __first)
> +  : _M_first(__first), _M_cur(__builtin_addressof(__first))
> +  { }
> +
> +  _GLIBCXX20_CONSTEXPR
> +  ~_UninitDestroyGuard()
> +  {
> +   if (__builtin_expect(_M_cur != 0, 0))
> + std::_Destroy(_M_first, *_M_cur);
> +  }
> +
> +  _GLIBCXX20_CONSTEXPR
> +  void release() { _M_cur = 0; }
> +
> +  _ForwardIterator const _M_first;
> +  _ForwardIterator* _M_cur;
> +
> +private:
> +  _UninitDestroyGuard(const _UninitDestroyGuard&);
> +};
> +
>template
>  _GLIBCXX20_CONSTEXPR
>  _ForwardIterator
>  __do_uninit_copy(_InputIterator __first, _InputIterator __last,
>  _ForwardIterator __result)
>  {
> -  _ForwardIterator __cur = __result;
> -  __try
> -   {
> - for (; __first != __last; ++__first, (void)++__cur)
> -   std::_Construct(std::__addressof(*__cur), *__first);
> - return __cur;
> -   }
> -  __catch(...)
> -   {
> - std::_Destroy(__result, __cur);
> - __throw_exception_again;
> -   }
> +  _UninitDestroyGuard<_ForwardIterator> __guard(__result);
> +  for (; __first != __last; ++__first, (void)++__result)
> +   std::_Construct(std::__addressof(*__result), *__first);
> +  __guard.release();
> +  return __result;
>  }
>
>template
> @@ -192,17 +238,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  __do_uninit_fill(_ForwardIterator __first, _ForwardIterator __last,
>  const _Tp& __x)
>  {
> -  _ForwardIterator __cur = __first;
> -  __try
> -   {
> - for (; __cur != __last; ++__cur)
> -   std::_Construct(std::__addressof(*__cur), __x);
> -   }
> -  __catch(...)
> -   {
> - std::_Destroy(__first, __cur);
> - __throw_exception_again;
> -   }
> +  _UninitDestroyGuard<_ForwardIterator> __guard(__first);
> +  for (; __first != __last; ++__first)
> +   std::_Construct(std::__addressof(*__first), __x);
> +  __guard.release();
>  }
>
>template
> @@ -260,18 +299,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  _ForwardIterator
>  __do_unini

Re: [PATCH] c++: Implement C++26 CWG2819 - Allow cv void * null pointer value conversion to object types in constant expressions

2024-07-03 Thread Jason Merrill

On 7/3/24 10:39 AM, Jakub Jelinek wrote:

Hi!

The following patch implements CWG2819 (which wasn't a DR because
it changes behavior of C++26 only).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-07-03  Jakub Jelinek  

* constexpr.cc (cxx_eval_constant_expression): CWG2819 - Allow
cv void * null pointer value conversion to object types in constant
expressions.

* g++.dg/cpp26/constexpr-voidptr3.C: New test.
* g++.dg/cpp0x/constexpr-cast2.C: Adjust expected diagnostics for
C++26.
* g++.dg/cpp0x/constexpr-cast4.C: Likewise.

--- gcc/cp/constexpr.cc.jj  2024-07-02 22:09:52.493176541 +0200
+++ gcc/cp/constexpr.cc 2024-07-03 12:46:57.255025849 +0200
@@ -8157,10 +8157,13 @@ cxx_eval_constant_expression (const cons
|| DECL_NAME (decl) == heap_vec_uninit_identifier))
  /* OK */;
/* P2738 (C++26): a conversion from a prvalue P of type "pointer to
-  cv void" to a pointer-to-object type T unless P points to an
-  object whose type is similar to T.  */
+  cv void" to a pointer-to-object type T unless P is a null
+  pointer value or points to an object whose type is similar to
+  T.  */
else if (cxx_dialect > cxx23)
  {
+   if (integer_zerop (sop))
+ return build_int_cst (type, 0);


This patch should also remove the integer_zerop diagnostic lower in the 
function, which becomes dead code with this change.



r = cxx_fold_indirect_ref (ctx, loc, TREE_TYPE (type), sop);
if (r)
  {
--- gcc/testsuite/g++.dg/cpp26/constexpr-voidptr3.C.jj  2024-07-03 
12:35:56.301762995 +0200
+++ gcc/testsuite/g++.dg/cpp26/constexpr-voidptr3.C 2024-07-03 
12:35:51.577825446 +0200
@@ -0,0 +1,13 @@
+// CWG 2819 - Cast from null pointer value in a constant expression
+// { dg-do compile { target c++26 } }
+
+struct S { int s; };
+
+constexpr S *
+foo ()
+{
+  void *p = nullptr;
+  return static_cast<S *> (p);
+}
+
+static_assert (foo () == nullptr);
--- gcc/testsuite/g++.dg/cpp0x/constexpr-cast2.C.jj 2023-07-17 
09:07:42.104283529 +0200
+++ gcc/testsuite/g++.dg/cpp0x/constexpr-cast2.C2024-07-03 
16:22:33.294916937 +0200
@@ -5,9 +5,9 @@
  static int i;
  constexpr void *vp0 = nullptr;
  constexpr void *vpi = &i;
-constexpr int *p1 = (int *) vp0; // { dg-error "cast from .void\\*. is not 
allowed" }
+constexpr int *p1 = (int *) vp0; // { dg-error "cast from .void\\*. is not allowed" 
"" { target c++23_down } }
  constexpr int *p2 = (int *) vpi; // { dg-error "cast from .void\\*. is not allowed" 
"" { target c++23_down } }
-constexpr int *p3 = static_cast<int*>(vp0); // { dg-error "cast from .void\\*. is 
not allowed" }
+constexpr int *p3 = static_cast<int*>(vp0); // { dg-error "cast from .void\\*. is not 
allowed" "" { target c++23_down } }
  constexpr int *p4 = static_cast<int*>(vpi); // { dg-error "cast from .void\\*. is not 
allowed" "" { target c++23_down } }
  constexpr void *p5 = vp0;
  constexpr void *p6 = vpi;
--- gcc/testsuite/g++.dg/cpp0x/constexpr-cast4.C.jj 2023-11-02 
07:39:18.679201173 +0100
+++ gcc/testsuite/g++.dg/cpp0x/constexpr-cast4.C2024-07-03 
16:23:29.424197809 +0200
@@ -8,4 +8,3 @@ constexpr float* pf = static_cast  
  constexpr void* vnp = nullptr;

  constexpr int* pi2 = static_cast<int*>(vnp);  // { dg-error "cast from .void\\*. is not 
allowed" "" { target c++23_down } }
-// { dg-error "cast from .void\\*. is not allowed in a constant expression because .vnp. does 
not point to an object" "" { target c++26 } .-1 }

Jakub





Re: [PATCH 1/3] libstdc++: Use RAII in <bits/stl_uninitialized.h>

2024-07-03 Thread Ville Voutilainen
On Wed, 3 Jul 2024 at 18:33, Jonathan Wakely  wrote:
>
> On Thu, 27 Jun 2024 at 11:52, Jonathan Wakely wrote:
> >
> > This refactoring to use RAII doesn't seem to make any difference in
> > benchmarks, although the generated code for some std::vector operations
> > seems to be slightly larger. Maybe it will be faster (or slower) in some
> > cases I didn't test?
> >
> > I think I like the change anyway - any other opinions on whether it's an
> > improvement?
>
> Any thoughts before I push this? Better? Worse? Needs more cowbell?

I think the patch is an improvement. Push it.


Re: [PATCH] c++, libstdc++: Implement C++26 P2747R2 - constexpr placement new [PR115744]

2024-07-03 Thread Jason Merrill

On 7/3/24 10:37 AM, Jakub Jelinek wrote:

+#if __cpp_lib_constexpr_new >= 202406L
+# define _GLIBCXX_PLACEMENT_CONSTEXPR constexpr
+#else
+# define _GLIBCXX_PLACEMENT_CONSTEXPR inline
+#endif


I'm a bit curious why you want constexpr *or* inline rather than leaving 
the inline keyword on the declaration and maybe adding constexpr.  The 
effect should be the same either way, so just wondering.


Jason



Re: [PATCH] ARC: Update gcc.target/arc/pr9001184797.c test

2024-07-03 Thread Jeff Law




On 7/3/24 8:24 AM, Luis Silva wrote:

... to comply with new standards due to stricter analysis in
the latest GCC versions.

gcc/testsuite/ChangeLog:

* gcc.target/arc/pr9001184797.c: (Fix compiler warnings)

I fixed the ChangeLog entry and pushed this to the trunk for you.

I guess we shouldn't be surprised that some of the target tests slipped 
through the cracks.


Thanks!
Jeff


Re: [PATCH] c++, libstdc++: Implement C++26 P2747R2 - constexpr placement new [PR115744]

2024-07-03 Thread Jakub Jelinek
On Wed, Jul 03, 2024 at 11:41:58AM -0400, Jason Merrill wrote:
> On 7/3/24 10:37 AM, Jakub Jelinek wrote:
> > +#if __cpp_lib_constexpr_new >= 202406L
> > +# define _GLIBCXX_PLACEMENT_CONSTEXPR constexpr
> > +#else
> > +# define _GLIBCXX_PLACEMENT_CONSTEXPR inline
> > +#endif
> 
> I'm a bit curious why you want constexpr *or* inline rather than leaving the
> inline keyword on the declaration and maybe adding constexpr.  The effect
> should be the same either way, so just wondering.

Just that the inline is then redundant.
But I'll do whatever Jonathan wants (already added #undef of the macro after
uses).

Jakub



[PATCH] c++, v2: Implement C++26 CWG2819 - Allow cv void * null pointer value conversion to object types in constant expressions

2024-07-03 Thread Jakub Jelinek
On Wed, Jul 03, 2024 at 11:35:26AM -0400, Jason Merrill wrote:
> This patch should also remove the integer_zerop diagnostic lower in the
> function, which becomes dead code with this change.

So like this?
Passed quick testing, ok if it passes full bootstrap/regtest?

2024-07-03  Jakub Jelinek  

* constexpr.cc (cxx_eval_constant_expression): CWG2819 - Allow
cv void * null pointer value conversion to object types in constant
expressions.

* g++.dg/cpp26/constexpr-voidptr3.C: New test.
* g++.dg/cpp0x/constexpr-cast2.C: Adjust expected diagnostics for
C++26.
* g++.dg/cpp0x/constexpr-cast4.C: Likewise.

--- gcc/cp/constexpr.cc.jj  2024-07-02 22:09:52.493176541 +0200
+++ gcc/cp/constexpr.cc 2024-07-03 17:39:44.849460994 +0200
@@ -8157,10 +8157,13 @@ cxx_eval_constant_expression (const cons
|| DECL_NAME (decl) == heap_vec_uninit_identifier))
  /* OK */;
/* P2738 (C++26): a conversion from a prvalue P of type "pointer to
-  cv void" to a pointer-to-object type T unless P points to an
-  object whose type is similar to T.  */
+  cv void" to a pointer-to-object type T unless P is a null
+  pointer value or points to an object whose type is similar to
+  T.  */
else if (cxx_dialect > cxx23)
  {
+   if (integer_zerop (sop))
+ return build_int_cst (type, 0);
r = cxx_fold_indirect_ref (ctx, loc, TREE_TYPE (type), sop);
if (r)
  {
@@ -8169,26 +8172,16 @@ cxx_eval_constant_expression (const cons
  }
if (!ctx->quiet)
  {
-   if (TREE_CODE (sop) == ADDR_EXPR)
- {
-   auto_diagnostic_group d;
-   error_at (loc, "cast from %qT is not allowed in a "
- "constant expression because "
- "pointed-to type %qT is not similar to %qT",
- TREE_TYPE (op), TREE_TYPE (TREE_TYPE (sop)),
- TREE_TYPE (type));
-   tree obj = build_fold_indirect_ref (sop);
-   inform (DECL_SOURCE_LOCATION (obj),
-   "pointed-to object declared here");
- }
-   else
- {
-   gcc_assert (integer_zerop (sop));
-   error_at (loc, "cast from %qT is not allowed in a "
- "constant expression because "
- "%qE does not point to an object",
- TREE_TYPE (op), oldop);
- }
+   gcc_assert (TREE_CODE (sop) == ADDR_EXPR);
+   auto_diagnostic_group d;
+   error_at (loc, "cast from %qT is not allowed in a "
+ "constant expression because "
+ "pointed-to type %qT is not similar to %qT",
+ TREE_TYPE (op), TREE_TYPE (TREE_TYPE (sop)),
+ TREE_TYPE (type));
+   tree obj = build_fold_indirect_ref (sop);
+   inform (DECL_SOURCE_LOCATION (obj),
+   "pointed-to object declared here");
  }
*non_constant_p = true;
return t;
--- gcc/testsuite/g++.dg/cpp26/constexpr-voidptr3.C.jj  2024-07-03 
12:35:56.301762995 +0200
+++ gcc/testsuite/g++.dg/cpp26/constexpr-voidptr3.C 2024-07-03 
12:35:51.577825446 +0200
@@ -0,0 +1,13 @@
+// CWG 2819 - Cast from null pointer value in a constant expression
+// { dg-do compile { target c++26 } }
+
+struct S { int s; };
+
+constexpr S *
+foo ()
+{
+  void *p = nullptr;
+  return static_cast (p);
+}
+
+static_assert (foo () == nullptr);
--- gcc/testsuite/g++.dg/cpp0x/constexpr-cast2.C.jj 2023-07-17 
09:07:42.104283529 +0200
+++ gcc/testsuite/g++.dg/cpp0x/constexpr-cast2.C2024-07-03 
16:22:33.294916937 +0200
@@ -5,9 +5,9 @@
 static int i;
 constexpr void *vp0 = nullptr;
 constexpr void *vpi = &i;
-constexpr int *p1 = (int *) vp0; // { dg-error "cast from .void\\*. is not 
allowed" }
+constexpr int *p1 = (int *) vp0; // { dg-error "cast from .void\\*. is not 
allowed" "" { target c++23_down } }
 constexpr int *p2 = (int *) vpi; // { dg-error "cast from .void\\*. is not 
allowed" "" { target c++23_down } }
-constexpr int *p3 = static_cast(vp0); // { dg-error "cast from 
.void\\*. is not allowed" }
+constexpr int *p3 = static_cast(vp0); // { dg-error "cast from 
.void\\*. is not allowed" "" { target c++23_down } }
 constexpr int *p4 = static_cast(vpi); // { dg-error "cast from 
.void\\*. is not allowed" "" { target c++23_down } }
 constexpr void *p5 = vp0;
 constexpr void *p6 = vpi;
--- gcc/testsuite/g++.dg/cp

Re: [RFC/PATCH] libgcc: sh: Use soft-fp for non-hosted SH3/SH4

2024-07-03 Thread Jeff Law




On 7/3/24 3:59 AM, Sébastien Michelland wrote:

libgcc's fp-bit.c is quite slow and most modern/developed architectures
have switched to using the soft-fp library. This patch does so for
free-standing/unknown-OS SH3/SH4 builds, using soft-fp's default parameters
for the most part, most notably no exceptions.

A quick run of Whetstone (built with OpenLibm) on an SH4 machine shows
about x3 speedup (~320 -> 1050 Kwhets/s).

I'm sending this as RFC because I'm quite unsure about testing. I built
the compiler and ran the benchmark, but I don't know if GCC has a test
for soft-fp correctness and whether I can run that in my non-hosted
environment. Any advice?

Cheers,
Sébastien

libgcc/ChangeLog:

 * config.host: Use soft-fp library for non-hosted SH3/SH4
	instead of fp-bit.
 * config/sh/sfp-machine.h: New.
I'd really like to hear from Oleg on this, though given we're using the 
soft-fp library on other targets it seems reasonable at a high level.


As far as testing, the GCC testsuite has some FP components which would 
implicitly test soft fp on any target that doesn't have hardware 
floating point.




Jeff


[Committed] RISC-V: Add support for Zabha extension

2024-07-03 Thread Patrick O'Neill

Committed w/ fixup to changelog to add missing:
* lib/target-supports.exp: Add zabha testsuite infra support.

Patrick

On 7/2/24 18:05, Patrick O'Neill wrote:

From: Gianluca Guida 

The Zabha extension adds support for subword Zaamo ops.

Extension: https://github.com/riscv/riscv-zabha.git
Ratification: https://jira.riscv.org/browse/RVS-1685

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_subset_list::to_string): Skip zabha when not supported by
the assembler.
* config.in: Regenerate.
* config/riscv/arch-canonicalize: Make zabha imply zaamo.
* config/riscv/iterators.md (amobh): Add iterator for amo
byte/halfword.
* config/riscv/riscv.opt: Add zabha.
* config/riscv/sync.md (atomic_): Add
subword atomic op pattern.
(zabha_atomic_fetch_): Add subword
atomic_fetch op pattern.
(lrsc_atomic_fetch_): Prefer zabha over lrsc
for subword atomic ops.
(zabha_atomic_exchange): Add subword atomic exchange
pattern.
(lrsc_atomic_exchange): Prefer zabha over lrsc for subword
atomic exchange ops.
* configure: Regenerate.
* configure.ac: Add zabha assembler check.
* doc/sourcebuild.texi: Add zabha documentation.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/amo/inline-atomics-1.c: Remove zabha to continue to
test the lr/sc subword patterns.
* gcc.target/riscv/amo/inline-atomics-2.c: Ditto.
* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-acq-rel.c: 
Ditto.
* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-acquire.c: 
Ditto.
* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-relaxed.c: 
Ditto.
* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-release.c: 
Ditto.
* gcc.target/riscv/amo/zalrsc-rvwmo-subword-amo-add-char-seq-cst.c: 
Ditto.
* gcc.target/riscv/amo/zalrsc-ztso-subword-amo-add-char-acq-rel.c: 
Ditto.
* gcc.target/riscv/amo/zalrsc-ztso-subword-amo-add-char-acquire.c: 
Ditto.
* gcc.target/riscv/amo/zalrsc-ztso-subword-amo-add-char-relaxed.c: 
Ditto.
* gcc.target/riscv/amo/zalrsc-ztso-subword-amo-add-char-release.c: 
Ditto.
* gcc.target/riscv/amo/zalrsc-ztso-subword-amo-add-char-seq-cst.c: 
Ditto.
* gcc.target/riscv/amo/zabha-all-amo-ops-char-run.c: New test.
* gcc.target/riscv/amo/zabha-all-amo-ops-short-run.c: New test.
* gcc.target/riscv/amo/zabha-rvwmo-all-amo-ops-char.c: New test.
* gcc.target/riscv/amo/zabha-rvwmo-all-amo-ops-short.c: New test.
* gcc.target/riscv/amo/zabha-rvwmo-amo-add-char.c: New test.
* gcc.target/riscv/amo/zabha-rvwmo-amo-add-short.c: New test.
* gcc.target/riscv/amo/zabha-ztso-amo-add-char.c: New test.
* gcc.target/riscv/amo/zabha-ztso-amo-add-short.c: New test.

Co-Authored-By: Patrick O'Neill 
Signed-Off-By: Gianluca Guida 
Tested-by: Andrea Parri 
---
v2 ChangeLog:
Rebase to resolve conflict with testsuite cleanup. Regenerate gcc/testsuite 
ChangeLog.
Add Signed-Off-By that Gianluca gave.
Ok'd by Jeff Law here: 
https://inbox.sourceware.org/gcc-patches/fae68675-519f-4d80-b0fb-dfd5d8a22...@gmail.com/
I'll let it sit on the lists overnight and commit in the morning tomorrow (PST 
timezone).
---
  gcc/common/config/riscv/riscv-common.cc   | 12 +++
  gcc/config.in |  6 ++
  gcc/config/riscv/arch-canonicalize|  3 +
  gcc/config/riscv/iterators.md |  3 +
  gcc/config/riscv/riscv.opt|  2 +
  gcc/config/riscv/sync.md  | 81 ++-
  gcc/configure | 31 +++
  gcc/configure.ac  |  5 ++
  gcc/doc/sourcebuild.texi  | 12 ++-
  .../gcc.target/riscv/amo/inline-atomics-1.c   |  1 +
  .../gcc.target/riscv/amo/inline-atomics-2.c   |  1 +
  .../riscv/amo/zabha-all-amo-ops-char-run.c|  5 ++
  .../riscv/amo/zabha-all-amo-ops-short-run.c   |  5 ++
  .../riscv/amo/zabha-rvwmo-all-amo-ops-char.c  | 23 ++
  .../riscv/amo/zabha-rvwmo-all-amo-ops-short.c | 23 ++
  .../riscv/amo/zabha-rvwmo-amo-add-char.c  | 57 +
  .../riscv/amo/zabha-rvwmo-amo-add-short.c | 57 +
  .../riscv/amo/zabha-ztso-amo-add-char.c   | 57 +
  .../riscv/amo/zabha-ztso-amo-add-short.c  | 57 +
  ...alrsc-rvwmo-subword-amo-add-char-acq-rel.c |  1 +
  ...alrsc-rvwmo-subword-amo-add-char-acquire.c |  1 +
  ...alrsc-rvwmo-subword-amo-add-char-relaxed.c |  1 +
  ...alrsc-rvwmo-subword-amo-add-char-release.c |  1 +
  ...alrsc-rvwmo-subword-amo-add-char-seq-cst.c |  1 +
  ...zalrsc-ztso-subword-amo-add-char-acq-rel.c |  1 +
  ...zalrsc-ztso-subword-amo-add-char-acquire.c |  1 +
  ...zalrsc-ztso-subword-amo-add-char-relaxed.c |  1 +
  ...zalrsc-ztso-subword-amo-add-char-release.c |  1 +
  ...zalrsc-ztso-subword-

[Committed] RISC-V: Describe -march behavior for dependent extensions

2024-07-03 Thread Patrick O'Neill

Committed.

Patrick

On 7/2/24 18:29, Kito Cheng wrote:
LGTM, BTW, based on the discussion[1], my understanding is: depend 
 == require  == imply  for the RISC-V ISA spec.
[1] 
https://github.com/riscv/riscv-v-spec/issues/723#issuecomment-922153867


On Wed, Jul 3, 2024 at 9:21 AM Patrick O'Neill  
wrote:


From: Palmer Dabbelt 

gcc/ChangeLog:

        * doc/invoke.texi: Describe -march behavior for dependent
extensions on
        RISC-V.
---
Ok'd by Jeff Law here:

https://inbox.sourceware.org/gcc-patches/fae68675-519f-4d80-b0fb-dfd5d8a22...@gmail.com/
I'll let it sit on the lists overnight and commit in the morning
tomorrow (PST timezone).
---
 gcc/doc/invoke.texi | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 68ebd79d676..1181ee2de14 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -31063,6 +31063,10 @@ If both @option{-march} and
@option{-mcpu=} are not specified, the default for
 this argument is system dependent, users who want a specific
architecture
 extensions should specify one explicitly.

+When the RISC-V specifications define an extension as depending
on other
+extensions, GCC will implicitly add the dependent extensions to
the enabled
+extension set if they weren't added explicitly.
+
 @opindex mcpu
 @item -mcpu=@var{processor-string}
 Use architecture of and optimize the output for the given
processor, specified
--
2.43.2


Re: [PATCH] c++, libstdc++: Implement C++26 P2747R2 - constexpr placement new [PR115744]

2024-07-03 Thread Jonathan Wakely
On Wed, 3 Jul 2024 at 16:51, Jakub Jelinek wrote:
>
> On Wed, Jul 03, 2024 at 11:41:58AM -0400, Jason Merrill wrote:
> > On 7/3/24 10:37 AM, Jakub Jelinek wrote:
> > > +#if __cpp_lib_constexpr_new >= 202406L
> > > +# define _GLIBCXX_PLACEMENT_CONSTEXPR constexpr
> > > +#else
> > > +# define _GLIBCXX_PLACEMENT_CONSTEXPR inline
> > > +#endif
> >
> > I'm a bit curious why you want constexpr *or* inline rather than leaving the
> > inline keyword on the declaration and maybe adding constexpr.  The effect
> > should be the same either way, so just wondering.
>
> Just that the inline is then redundant.
> But I'll do whatever Jonathan wants (already added #undef of the macro after
> uses).

I have a mild preference (again :-) for what Jakub's patch does. Those
declarations are getting more and more verbose, so if we don't have
the 'inline' there (because it's part of the macro) then that seems a
little less cluttered.



Re: [PATCH] c++, v2: Implement C++26 CWG2819 - Allow cv void * null pointer value conversion to object types in constant expressions

2024-07-03 Thread Jason Merrill

On 7/3/24 11:56 AM, Jakub Jelinek wrote:

On Wed, Jul 03, 2024 at 11:35:26AM -0400, Jason Merrill wrote:

This patch should also remove the integer_zerop diagnostic lower in the
function, which becomes dead code with this change.


So like this?
Passed quick testing, ok if it passes full bootstrap/regtest?


OK.


2024-07-03  Jakub Jelinek  

* constexpr.cc (cxx_eval_constant_expression): CWG2819 - Allow
cv void * null pointer value conversion to object types in constant
expressions.

* g++.dg/cpp26/constexpr-voidptr3.C: New test.
* g++.dg/cpp0x/constexpr-cast2.C: Adjust expected diagnostics for
C++26.
* g++.dg/cpp0x/constexpr-cast4.C: Likewise.

--- gcc/cp/constexpr.cc.jj  2024-07-02 22:09:52.493176541 +0200
+++ gcc/cp/constexpr.cc 2024-07-03 17:39:44.849460994 +0200
@@ -8157,10 +8157,13 @@ cxx_eval_constant_expression (const cons
|| DECL_NAME (decl) == heap_vec_uninit_identifier))
  /* OK */;
/* P2738 (C++26): a conversion from a prvalue P of type "pointer to
-  cv void" to a pointer-to-object type T unless P points to an
-  object whose type is similar to T.  */
+  cv void" to a pointer-to-object type T unless P is a null
+  pointer value or points to an object whose type is similar to
+  T.  */
else if (cxx_dialect > cxx23)
  {
+   if (integer_zerop (sop))
+ return build_int_cst (type, 0);
r = cxx_fold_indirect_ref (ctx, loc, TREE_TYPE (type), sop);
if (r)
  {
@@ -8169,26 +8172,16 @@ cxx_eval_constant_expression (const cons
  }
if (!ctx->quiet)
  {
-   if (TREE_CODE (sop) == ADDR_EXPR)
- {
-   auto_diagnostic_group d;
-   error_at (loc, "cast from %qT is not allowed in a "
- "constant expression because "
- "pointed-to type %qT is not similar to %qT",
- TREE_TYPE (op), TREE_TYPE (TREE_TYPE (sop)),
- TREE_TYPE (type));
-   tree obj = build_fold_indirect_ref (sop);
-   inform (DECL_SOURCE_LOCATION (obj),
-   "pointed-to object declared here");
- }
-   else
- {
-   gcc_assert (integer_zerop (sop));
-   error_at (loc, "cast from %qT is not allowed in a "
- "constant expression because "
- "%qE does not point to an object",
- TREE_TYPE (op), oldop);
- }
+   gcc_assert (TREE_CODE (sop) == ADDR_EXPR);
+   auto_diagnostic_group d;
+   error_at (loc, "cast from %qT is not allowed in a "
+ "constant expression because "
+ "pointed-to type %qT is not similar to %qT",
+ TREE_TYPE (op), TREE_TYPE (TREE_TYPE (sop)),
+ TREE_TYPE (type));
+   tree obj = build_fold_indirect_ref (sop);
+   inform (DECL_SOURCE_LOCATION (obj),
+   "pointed-to object declared here");
  }
*non_constant_p = true;
return t;
--- gcc/testsuite/g++.dg/cpp26/constexpr-voidptr3.C.jj  2024-07-03 
12:35:56.301762995 +0200
+++ gcc/testsuite/g++.dg/cpp26/constexpr-voidptr3.C 2024-07-03 
12:35:51.577825446 +0200
@@ -0,0 +1,13 @@
+// CWG 2819 - Cast from null pointer value in a constant expression
+// { dg-do compile { target c++26 } }
+
+struct S { int s; };
+
+constexpr S *
+foo ()
+{
+  void *p = nullptr;
+  return static_cast (p);
+}
+
+static_assert (foo () == nullptr);
--- gcc/testsuite/g++.dg/cpp0x/constexpr-cast2.C.jj 2023-07-17 
09:07:42.104283529 +0200
+++ gcc/testsuite/g++.dg/cpp0x/constexpr-cast2.C2024-07-03 
16:22:33.294916937 +0200
@@ -5,9 +5,9 @@
  static int i;
  constexpr void *vp0 = nullptr;
  constexpr void *vpi = &i;
-constexpr int *p1 = (int *) vp0; // { dg-error "cast from .void\\*. is not 
allowed" }
+constexpr int *p1 = (int *) vp0; // { dg-error "cast from .void\\*. is not allowed" 
"" { target c++23_down } }
  constexpr int *p2 = (int *) vpi; // { dg-error "cast from .void\\*. is not allowed" 
"" { target c++23_down } }
-constexpr int *p3 = static_cast(vp0); // { dg-error "cast from .void\\*. is 
not allowed" }
+constexpr int *p3 = static_cast(vp0); // { dg-error "cast from .void\\*. is not 
allowed" "" { target c++23_down } }
  constexpr int *p4 = static_cast(vpi); // { dg-error "cast from .void\\*. is not 
allowed" "" { target c++23_down } }
  constexpr void *p5 = vp0;

Re: [PATCH] c++: array new with value-initialization [PR115645]

2024-07-03 Thread Jason Merrill

On 7/2/24 4:43 PM, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/branches?


OK.


-- >8 --
This extends the r11-5179 fix which doesn't work with multidimensional
arrays.  In particular,

   struct S {
 explicit S() { }
   };
   auto p = new S[1][1]();

should not say "converting to S from initializer list would use
explicit constructor" because there's no {}.  However, since we
went into the block where we create a {}, we got confused.  We
should not have gotten there but we did because array_p was true.

This patch refines the check once more.

PR c++/115645

gcc/cp/ChangeLog:

* init.cc (build_new): Don't do any deduction for arrays with
bounds if it's value-initialized.

gcc/testsuite/ChangeLog:

* g++.dg/expr/anew7.C: New test.
---
  gcc/cp/init.cc| 12 
  gcc/testsuite/g++.dg/expr/anew7.C | 13 +
  2 files changed, 21 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/expr/anew7.C

diff --git a/gcc/cp/init.cc b/gcc/cp/init.cc
index 826a31c4a84..e9561c146d7 100644
--- a/gcc/cp/init.cc
+++ b/gcc/cp/init.cc
@@ -4005,10 +4005,14 @@ build_new (location_t loc, vec 
**placement, tree type,
/* P1009: Array size deduction in new-expressions.  */
const bool array_p = TREE_CODE (type) == ARRAY_TYPE;
if (*init
-  /* If ARRAY_P, we have to deduce the array bound.  For C++20 paren-init,
-we have to process the parenthesized-list.  But don't do it for (),
-which is value-initialization, and INIT should stay empty.  */
-  && (array_p || (cxx_dialect >= cxx20 && nelts && !(*init)->is_empty (
+  /* If the array didn't specify its bound, we have to deduce it.  */
+  && ((array_p && !TYPE_DOMAIN (type))
+ /* For C++20 array with parenthesized-init, we have to process
+the parenthesized-list.  But don't do it for (), which is
+value-initialization, and INIT should stay empty.  */
+ || (cxx_dialect >= cxx20
+ && (array_p || nelts)
+ && !(*init)->is_empty (
  {
/* This means we have 'new T[]()'.  */
if ((*init)->is_empty ())
diff --git a/gcc/testsuite/g++.dg/expr/anew7.C 
b/gcc/testsuite/g++.dg/expr/anew7.C
new file mode 100644
index 000..ead5536e109
--- /dev/null
+++ b/gcc/testsuite/g++.dg/expr/anew7.C
@@ -0,0 +1,13 @@
+// PR c++/115645
+// { dg-do compile { target c++11 } }
+
+struct S {
+  explicit S() { }
+};
+
+auto p = new S[1][1]();
+auto q = new S[1][1]{}; // { dg-error "explicit" }
+auto r = new S[1]();
+auto s = new S[1]{}; // { dg-error "explicit" }
+auto t = new S[1][1][1]();
+auto u = new S[1][1][1]{}; // { dg-error "explicit" }

base-commit: 1250540a98e0a1dfa4d7834672d88d8543ea70b1




Re: [RFC/PATCH] libgcc: sh: Use soft-fp for non-hosted SH3/SH4

2024-07-03 Thread Sébastien Michelland

On 2024-07-03 17:59, Jeff Law wrote:

On 7/3/24 3:59 AM, Sébastien Michelland wrote:

libgcc's fp-bit.c is quite slow and most modern/developed architectures
have switched to using the soft-fp library. This patch does so for
free-standing/unknown-OS SH3/SH4 builds, using soft-fp's default 
parameters

for the most part, most notably no exceptions.

A quick run of Whetstone (built with OpenLibm) on an SH4 machine shows
about x3 speedup (~320 -> 1050 Kwhets/s).

I'm sending this as RFC because I'm quite unsure about testing. I built
the compiler and ran the benchmark, but I don't know if GCC has a test
for soft-fp correctness and whether I can run that in my non-hosted
environment. Any advice?

Cheers,
Sébastien

libgcc/ChangeLog:

 * config.host: Use soft-fp library for non-hosted SH3/SH4
 instead of fp-bit.
 * config/sh/sfp-machine.h: New.
I'd really like to hear from Oleg on this, though given we're using the 
soft-fp library on other targets it seems reasonable at a high level.


As far as testing, the GCC testsuite has some FP components which would 
implicitly test soft fp on any target that doesn't have hardware 
floating point.


Thank you. I went this route, following the guide [1] and the 
instructions for cross-compiling [2] before hitting "Newlib does not 
support CPU sh3eb" which I should have seen coming.


There are plenty of random ports lying around but just grabbing one 
doesn't feel right (and I don't have a canonical one to go to as I 
usually run a custom libc for... mostly bad reasons).


Deferring maybe again to the few SH users... how do you usually do it?

Sébastien

[1] https://gcc.gnu.org/install/test.html
[2] https://gcc.gnu.org/simtest-howto.html


Re: [PATCH][c++ frontend]: check for missing condition for novector [PR115623]

2024-07-03 Thread Jason Merrill

On 6/27/24 11:25 AM, Tamar Christina wrote:

-Original Message-
From: Jason Merrill 
Sent: Tuesday, June 25, 2024 10:24 PM
To: Tamar Christina 
Cc: gcc-patches@gcc.gnu.org; nd ; nat...@acm.org
Subject: Re: [PATCH][c++ frontend]: check for missing condition for novector
[PR115623]

On 6/25/24 12:52, Tamar Christina wrote:

The 06/25/2024 17:10, Jason Merrill wrote:

On 6/25/24 04:01, Tamar Christina wrote:

Hi All,

It looks like I forgot to check in the C++ frontend if a condition exist for the
loop being adorned with novector.  This causes a segfault because cond isn't
expected to be null.

This fixes it by issuing the same kind of diagnostics we issue for the other
pragmas.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master? and backport to GCC-14?


Hmm, I'm not sure we want to error in this case; it's pointless, but
indeed we aren't going to vectorize a loop that always loops.  I'd think
we should treat it the same as an explicit "true" condition.  And
perhaps the same for unroll/ivdep.

Does the C front-end treat the null condition different from a constant
true condition?



No, in the C front-end we error for ivdep and unroll, but for novector we 
explicitly
suppress it by checking for novector && cond && cond != error_mark_node

instead of

just novector && cond != error_mark_node in the use site.

Do you want to handle it that way to be consistent?


Please.



How about this version:

This fixes it by ignoring the pragma when there's no loop condition,
the same way we do in the C frontend.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master? and backport to GCC-14?


OK.


Thanks,
Tamar

gcc/cp/ChangeLog:

PR c++/115623
* semantics.cc (finish_for_cond): Add check for C++ cond.

gcc/testsuite/ChangeLog:

PR c++/115623
* g++.dg/vect/vect-novector-pragma_2.cc: New test.

-- inline copy of patch --

diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 
08f5f245e7d11a76b975bb04c0075ded1b3ca8ba..4e1374c98130247eb10e3fe7571fec00834e9c05
 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -1501,7 +1501,7 @@ finish_for_cond (tree cond, tree for_stmt, bool ivdep, 
tree unroll,
  build_int_cst (integer_type_node,
 annot_expr_unroll_kind),
  unroll);
-  if (novector && cond != error_mark_node)
+  if (novector && cond && cond != error_mark_node)
  FOR_COND (for_stmt) = build3 (ANNOTATE_EXPR,
  TREE_TYPE (FOR_COND (for_stmt)),
  FOR_COND (for_stmt),
diff --git a/gcc/testsuite/g++.dg/vect/vect-novector-pragma_2.cc 
b/gcc/testsuite/g++.dg/vect/vect-novector-pragma_2.cc
new file mode 100644
index 
..d2a8eee8d71610281b4e34a694576b6783f0
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/vect-novector-pragma_2.cc
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+
+void f (char *a, int i)
+{
+#pragma GCC novector
+  for (;;i++)
+a[i] *= 2;
+}
+


Jason






Re: [PATCH] rs6000, update vec_ld, vec_lde, vec_st and vec_ste, documentation

2024-07-03 Thread Carl Love



On 7/3/24 2:36 AM, Kewen.Lin wrote:

Hi Carl,

on 2024/6/27 01:05, Carl Love wrote:

GCC maintainers:

The following patch updates the user documentation for the vec_ld, vec_lde, 
vec_st and vec_ste built-ins to make it clearer that there are data alignment 
requirements for these built-ins.  If the data alignment requirements are not 
followed, the data loaded or stored by these built-ins will be wrong.

Please let me know if this patch is acceptable for mainline.  Thanks.

   Carl


rs6000, update vec_ld, vec_lde, vec_st and vec_ste documentation

Use of the vec_ld and vec_st built-ins requires that the data be 16-byte
aligned to work properly.  Add some additional text to the existing
documentation to make this clearer to the user.

Similarly, the vec_lde and vec_ste built-ins also have data alignment
requirements based on the size of the vector element.  Update the
documentation to make this clear to the user.

gcc/ChangeLog:
* doc/extend.texi: Add clarification for the use of the vec_ld
vec_st, vec_lde and vec_ste built-ins.
---
  gcc/doc/extend.texi | 15 +++
  1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index ee3644a5264..55faded17b9 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -22644,10 +22644,17 @@ vector unsigned char vec_xxsldi (vector unsigned char,
  @end smallexample
  
  Note that the @samp{vec_ld} and @samp{vec_st} built-in functions always

-generate the AltiVec @samp{LVX} and @samp{STVX} instructions even
-if the VSX instruction set is available.  The @samp{vec_vsx_ld} and
-@samp{vec_vsx_st} built-in functions always generate the VSX @samp{LXVD2X},
-@samp{LXVW4X}, @samp{STXVD2X}, and @samp{STXVW4X} instructions.
+generate the AltiVec @samp{LVX}, and @samp{STVX} instructions.  The

This change removed "even if the VSX instruction set is available.", I think 
it's
not intentional?  vec_ld and vec_st are well defined in PVIPR, this paragraph is
not to document them IMHO.  Since we document vec_vsx_ld and vec_vsx_st here, it
aims to note the difference between these two pairs.  But I'm not opposed to adding
more words to emphasize the special masking off; I prefer to use the same words
as PVIPR: "ignoring the four low-order bits of the calculated address".  And IMHO we
should not say "it requires the data to be 16-byte aligned to work properly", in
case the users are well aware of this behavior, have some non-16-byte-aligned
data, and expect it to behave like that; it's arguable to define "it" as not working
properly.


Yea, probably should have left "even if the VSX instruction set is 
available."


I was looking to make it clear that if the data is not 16-byte aligned 
you may not get the expected data loaded/stored.


So how about the following instead:

   Note that the @samp{vec_ld} and @samp{vec_st} built-in functions always
   generate the AltiVec @samp{LVX}, and @samp{STVX} instructions even
   if the VSX
   instruction set is available. The instructions mask off the lower
   4-bits of
   the calculated address. The use of these instructions on data that
   is not
   16-byte aligned may result in unexpected bytes being loaded or stored.


+instructions mask off the lower 4 bits of the effective address thus requiring
+the data to be 16-byte aligned to work properly.  The @samp{vec_lde} and
+@samp{vec_ste} built-in functions operate on vectors of bytes, short integer,
+integer, and float.  The corresponding AltiVec instructions @samp{LVEBX},
+@samp{LVEHX}, @samp{LVEWX}, @samp{STVEBX}, @samp{STVEHX}, @samp{STVEWX} mask
+off the lower bits of the effective address based on the size of the data.
+Thus the data must be aligned to the size of the vector element to work
+properly.  The @samp{vec_vsx_ld} and @samp{vec_vsx_st} built-in functions
+always generate the VSX @samp{LXVD2X}, @samp{LXVW4X}, @samp{STXVD2X}, and
+@samp{STXVW4X} instructions.

As above, there was a reason to mention vec_ld and vec_st here, but not one for
vec_lde and vec_ste IMHO, so let's not mention vec_lde and vec_ste here and 
users
should read the description in PVIPR instead (it's more recommended).


The goal of mentioning the vec_lde and vec_ste built-ins was to give the 
user a pointer to built-ins that will work as expected on unaligned 
data.  It will probably save them a lot of time and frustration if they 
are given a hint of what built-ins they should look at.  So, how about 
the following:


   See the PVIPR description of the vec_lde and vec_ste for loading and
   storing
   data that is not 16-byte aligned.

   Carl


Re: [PATCH v5] c++: fix constained auto deduction in templ spec scopes [PR114915]

2024-07-03 Thread Patrick Palka
On Fri, 14 Jun 2024, Seyed Sajad Kahani wrote:

> When deducing auto for `adc_return_type`, `adc_variable_type`, and
> `adc_decomp_type` contexts (at the usage time), we try to resolve the 
> outermost
> template arguments to be used for satisfaction. This is done by one of the
> following, depending on the scope:
> 
> 1. Checking the `DECL_TEMPLATE_INFO` of the current function scope and
> extracting DECL_TI_ARGS from it for function scope deductions (pt.cc:31236).
> 2. Checking the `DECL_TEMPLATE_INFO` of the declaration (alongside with other
> conditions) for non-function scope variable declaration deductions
> (decl.cc:8527).
> 
> Then, we do not retrieve the deeper layers of the template arguments; instead,
> we fill the missing levels with dummy levels (pt.cc:31260).
> 
> The problem (that is shown in PR114915) is that we do not consider the case
> where the deduction happens in a template specialization scope. In this case,
> the type is not dependent on the outermost template arguments (which are
> the specialization arguments). Yet, we still resolve the outermost template
> arguments, and then the number of layers in the template arguments exceeds the
> number of levels in the type. This causes the missing levels to be negative.
> This leads to the rejection of valid code and to ICEs (such as segfaults) in
> release mode.  In debug mode, it manifests as an assertion failure
> (when creating a tree_vec with a negative size).
> 
> This patch resolves PR114915 by replacing the logic that fills in the
> missing levels in do_auto_deduction in cp/pt.cc.
> The new approach now trims targs if the depth of targs is deeper than desired
> (this will only happen in specific contexts), and still fills targs with empty
> layers if it has fewer depths than expected.

LGTM

> 
>   PR c++/114915
> 
> gcc/cp/ChangeLog:
> 
>   * pt.cc (do_auto_deduction): Handle excess outer template
>   arguments during constrained auto satisfaction.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/cpp2a/concepts-placeholder14.C: New test.
>   * g++.dg/cpp2a/concepts-placeholder15.C: New test.
>   * g++.dg/cpp2a/concepts-placeholder16.C: New test.
> ---
>  gcc/cp/pt.cc  | 20 ---
>  .../g++.dg/cpp2a/concepts-placeholder14.C | 19 +++
>  .../g++.dg/cpp2a/concepts-placeholder15.C | 26 +++
>  .../g++.dg/cpp2a/concepts-placeholder16.C | 33 +++
>  4 files changed, 94 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-placeholder14.C
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-placeholder15.C
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-placeholder16.C
> 
> diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> index 32640f8e9..2206d9ffe 100644
> --- a/gcc/cp/pt.cc
> +++ b/gcc/cp/pt.cc
> @@ -31253,6 +31253,19 @@ do_auto_deduction (tree type, tree init, tree 
> auto_node,
>   full_targs = add_outermost_template_args (tmpl, full_targs);
>full_targs = add_to_template_args (full_targs, targs);
>  
> +  int want = TEMPLATE_TYPE_ORIG_LEVEL (auto_node);
> +  int have = TMPL_ARGS_DEPTH (full_targs);
> +
> +  if (want < have)
> + {
> +   /* If a constrained auto is declared in an explicit specialization. */
> +   gcc_assert (context == adc_variable_type || context == adc_return_type
> +   || context == adc_decomp_type);
> +   tree trimmed_full_args 
> + = get_innermost_template_args (full_targs, want);
> +   full_targs = trimmed_full_args;
> + }
> +
>/* HACK: Compensate for callers not always communicating all levels of
>outer template arguments by filling in the outermost missing levels
>with dummy levels before checking satisfaction.  We'll still crash
> @@ -31260,11 +31273,10 @@ do_auto_deduction (tree type, tree init, tree 
> auto_node,
>these missing levels, but this hack otherwise allows us to handle a
>large subset of possible constraints (including all non-dependent
>constraints).  */
> -  if (int missing_levels = (TEMPLATE_TYPE_ORIG_LEVEL (auto_node)
> - - TMPL_ARGS_DEPTH (full_targs)))
> +  if (want > have)
>   {
> -   tree dummy_levels = make_tree_vec (missing_levels);
> -   for (int i = 0; i < missing_levels; ++i)
> +   tree dummy_levels = make_tree_vec (want - have);
> +   for (int i = 0; i < want - have; ++i)
>   TREE_VEC_ELT (dummy_levels, i) = make_tree_vec (0);
> full_targs = add_to_template_args (dummy_levels, full_targs);
>   }
> diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-placeholder14.C 
> b/gcc/testsuite/g++.dg/cpp2a/concepts-placeholder14.C
> new file mode 100644
> index 0..fcdbd7608
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp2a/concepts-placeholder14.C
> @@ -0,0 +1,19 @@
> +// PR c++/114915
> +// { dg-do compile { target c++20 } }
> +
> +te

[PATCH] RISC-V: Add basic support for the Zacas extension

2024-07-03 Thread Patrick O'Neill
From: Gianluca Guida 

This patch adds support for amocas.{b|h|w|d}. Support for amocas.q
(64/128 bit cas for rv32/64) will be added in a future patch.

Extension: https://github.com/riscv/riscv-zacas
Ratification: https://jira.riscv.org/browse/RVS-680

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_subset_list::to_string): Skip zacas when not supported by
the assembler.
* config.in: Regenerate.
* config/riscv/arch-canonicalize: Make zacas imply zaamo.
* config/riscv/riscv.opt: Add zacas.
* config/riscv/sync.md (zacas_atomic_cas_value): New pattern.
(atomic_compare_and_swap): Use new pattern for compare-and-swap 
ops.
* configure: Regenerate.
* configure.ac: Regenerate.
* doc/sourcebuild.texi: Add Zacas documentation.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Add zacas testsuite infra support.
* 
gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-acquire-release.c:
Remove zacas to continue to test the lr/sc pairs.
* gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-acquire.c: 
Ditto.
* gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-consume.c: 
Ditto.
* gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-relaxed.c: 
Ditto.
* gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-release.c: 
Ditto.
* 
gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-seq-cst-relaxed.c: Ditto.
* gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-seq-cst.c: 
Ditto.
* 
gcc.target/riscv/amo/zalrsc-ztso-compare-exchange-int-acquire-release.c: Ditto.
* gcc.target/riscv/amo/zalrsc-ztso-compare-exchange-int-acquire.c: 
Ditto.
* gcc.target/riscv/amo/zalrsc-ztso-compare-exchange-int-consume.c: 
Ditto.
* gcc.target/riscv/amo/zalrsc-ztso-compare-exchange-int-relaxed.c: 
Ditto.
* gcc.target/riscv/amo/zalrsc-ztso-compare-exchange-int-release.c: 
Ditto.
* 
gcc.target/riscv/amo/zalrsc-ztso-compare-exchange-int-seq-cst-relaxed.c: Ditto.
* gcc.target/riscv/amo/zalrsc-ztso-compare-exchange-int-seq-cst.c: 
Ditto.
* gcc.target/riscv/amo/zabha-zacas-preferred-over-zalrsc.c: New test.
* gcc.target/riscv/amo/zacas-char-requires-zabha.c: New test.
* gcc.target/riscv/amo/zacas-char-requires-zacas.c: New test.
* gcc.target/riscv/amo/zacas-rvwmo-compare-exchange-char-acq-rel.c: New 
test.
* gcc.target/riscv/amo/zacas-rvwmo-compare-exchange-char-acquire.c: New 
test.
* gcc.target/riscv/amo/zacas-rvwmo-compare-exchange-char-relaxed.c: New 
test.
* gcc.target/riscv/amo/zacas-rvwmo-compare-exchange-char-release.c: New 
test.
* gcc.target/riscv/amo/zacas-rvwmo-compare-exchange-char-seq-cst.c: New 
test.
* gcc.target/riscv/amo/zacas-rvwmo-compare-exchange-int-acq-rel.c: New 
test.
* gcc.target/riscv/amo/zacas-rvwmo-compare-exchange-int-acquire.c: New 
test.
* gcc.target/riscv/amo/zacas-rvwmo-compare-exchange-int-relaxed.c: New 
test.
* gcc.target/riscv/amo/zacas-rvwmo-compare-exchange-int-release.c: New 
test.
* gcc.target/riscv/amo/zacas-rvwmo-compare-exchange-int-seq-cst.c: New 
test.
* gcc.target/riscv/amo/zacas-rvwmo-compare-exchange-short-acq-rel.c: 
New test.
* gcc.target/riscv/amo/zacas-rvwmo-compare-exchange-short-acquire.c: 
New test.
* gcc.target/riscv/amo/zacas-rvwmo-compare-exchange-short-relaxed.c: 
New test.
* gcc.target/riscv/amo/zacas-rvwmo-compare-exchange-short-release.c: 
New test.
* gcc.target/riscv/amo/zacas-rvwmo-compare-exchange-short-seq-cst.c: 
New test.
* gcc.target/riscv/amo/zacas-ztso-compare-exchange-char-seq-cst.c: New 
test.
* gcc.target/riscv/amo/zacas-ztso-compare-exchange-char.c: New test.
* gcc.target/riscv/amo/zacas-ztso-compare-exchange-int-seq-cst.c: New 
test.
* gcc.target/riscv/amo/zacas-ztso-compare-exchange-int.c: New test.
* gcc.target/riscv/amo/zacas-ztso-compare-exchange-short-seq-cst.c: New 
test.
* gcc.target/riscv/amo/zacas-ztso-compare-exchange-short.c: New test.

Co-authored-by: Patrick O'Neill 
---
Gianluca Guida created the initial patch. Rebased and added more
testcases/docs/etc.

Tested using amo.exp with rv64gc_zalrsc and rv64gc_ztso.
Relying on precommit for full testing.
---
 gcc/common/config/riscv/riscv-common.cc   | 11 +++
 gcc/config.in |  6 ++
 gcc/config/riscv/arch-canonicalize|  1 +
 gcc/config/riscv/riscv.opt|  2 +
 gcc/config/riscv/sync.md  | 69 ---
 gcc/configure | 31 +
 gcc/configure.ac  |  5 ++
 gcc/doc/sourcebuild.texi  | 10 +++
 .../amo/zabha-zacas-preferred-over-zalrsc.c   | 16 +
 .../riscv/amo/zacas-char-requires-zabha.c | 17 +
 ...

[committed] Fix previously latent bug in reorg affecting cris port

2024-07-03 Thread Jeff Law


The late-combine patch has triggered a previously latent bug in reorg.

Basically we have a sequence like this in the middle of reorg before we 
start relaxing delay slots (cris-elf, gcc.dg/torture/pr98289.c)



(insn 67 49 18 (sequence [
(jump_insn 50 49 52 (set (pc)
(if_then_else (ne (reg:CC 19 ccr)
(const_int 0 [0]))
(label_ref:SI 30)
(pc))) "j.c":10:6 discrim 1 282 {*bnecc}
 (expr_list:REG_DEAD (reg:CC 19 ccr)
(int_list:REG_BR_PROB 7 (nil)))
 -> 30)
(insn/f 52 50 18 (set (mem:SI (reg/f:SI 14 sp) [1  S4 A8])
(reg:SI 16 srp)) 37 {*mov_tomemsi}
 (nil))
]) "j.c":10:6 discrim 1 -1
 (nil))

(note 18 67 54 [bb 3] NOTE_INSN_BASIC_BLOCK)

(note 54 18 55 NOTE_INSN_EPILOGUE_BEG)

(jump_insn 55 54 56 (return) "j.c":14:1 228 {*return_expanded}
 (nil)
 -> return)

(barrier 56 55 43)

(note 43 56 65 [bb 4] NOTE_INSN_BASIC_BLOCK)

(note 65 43 30 NOTE_INSN_SWITCH_TEXT_SECTIONS)

(code_label 30 65 8 5 6 (nil) [1 uses])

(note 8 30 61 [bb 5] NOTE_INSN_BASIC_BLOCK)


So at a high level the things to note are that insn 50 conditionally 
jumps around insn 55.  Second there's a SWITCH_TEXT_SECTIONS note 
between insn 50 and the target label for insn 50 (code_label 30).


reorg sees the conditional jump around the unconditional jump/return and 
will invert the jump and retarget the original jump to an appropriate 
location.  In this case generating:



(insn 67 49 18 (sequence [
(jump_insn 50 49 52 (set (pc)
(if_then_else (eq (reg:CC 19 ccr)
(const_int 0 [0]))
(label_ref:SI 68)
(pc))) "j.c":10:6 discrim 1 281 {*beqcc}
 (expr_list:REG_DEAD (reg:CC 19 ccr)
(int_list:REG_BR_PROB 1073741831 (nil)))
 -> 68)
(insn/s/f 52 50 18 (set (mem:SI (reg/f:SI 14 sp) [1  S4 A8])
(reg:SI 16 srp)) 37 {*mov_tomemsi}
 (nil))
]) "j.c":10:6 discrim 1 -1
 (nil))

(note 18 67 54 [bb 3] NOTE_INSN_BASIC_BLOCK)

(note 54 18 43 NOTE_INSN_EPILOGUE_BEG)

(note 43 54 65 [bb 4] NOTE_INSN_BASIC_BLOCK)

(note 65 43 8 NOTE_INSN_SWITCH_TEXT_SECTIONS)

(note 8 65 61 [bb 5] NOTE_INSN_BASIC_BLOCK)

[ ... ]
Where the new target of the jump is a return statement later in the IL.


Note that we now have a SWITCH_TEXT_SECTIONS note that is not 
immediately preceded by a BARRIER.  That triggers an assertion in the 
dwarf2 code.  Removal of the BARRIER is inherent in this optimization.


The fix is simple, we avoid this optimization when there's a 
SWITCH_TEXT_SECTIONS note between the conditional jump insn and its 
target.  Thankfully we already have a routine to test for this in reorg, 
so we just need to call it appropriately.  The other approach would be 
to drop the note which I considered and discarded.


We don't have great coverage for delay slot targets.  I've tested arc, 
cris, fr30, frv, h8, iq2000, microblaze, or1k, sh3 and visium in my tester 
as crosses without new regressions, fixing one regression along the way. 
Bootstrap & regression testing on sh4 and hppa will take considerably 
longer.


Pushing to the trunk momentarily.

Jeff


gcc/

* reorg.cc (relax_delay_slots): Do not optimize a conditional
jump around an unconditional jump/return in the presence of
a text section switch.

diff --git a/gcc/reorg.cc b/gcc/reorg.cc
index 99228a22c69..633099ca765 100644
--- a/gcc/reorg.cc
+++ b/gcc/reorg.cc
@@ -3409,7 +3409,8 @@ relax_delay_slots (rtx_insn *first)
  && next && simplejump_or_return_p (next)
  && (next_active_insn (as_a (target_label))
  == next_active_insn (next))
- && no_labels_between_p (insn, next))
+ && no_labels_between_p (insn, next)
+ && !switch_text_sections_between_p (insn, next_active_insn (next)))
{
  rtx label = JUMP_LABEL (next);
  rtx old_label = JUMP_LABEL (delay_jump_insn);


Re: [PATCH][v2] Handle NULL stmt in SLP_TREE_SCALAR_STMTS

2024-07-03 Thread Richard Sandiford
Richard Biener  writes:
> The following starts to handle NULL elements in SLP_TREE_SCALAR_STMTS
> with the first candidate being the two-operator nodes where some
> lanes are do-not-care and also do not have a scalar stmt computing
> the result.  I originally added SLP_TREE_SCALAR_STMTS to two-operator
> nodes but this exposes PR115764, so I've split that out.
>
> I have a patch use NULL elements for loads from groups with gaps
> where we get around not doing that by having a load permutation.
>
> I'm currently re-bootstrapping and testing this, it passed multiple
> testing rounds before (with the two-operator change).

Ah, so this might answer a question I was going to ask later,
when I actually had time to do something with the answer.

For SVE:

short *s;

s[0] += 1;
s[2] += 2;
s[4] += 3;
s[6] += 4;

(contrived example) could use something like (little-endian):

index   z1.s, #1, #1
ptrue   p0.s, vl4
ldrh    z0.h, p0/z, [s]
add z0.h, z0.h, z1.h
strh    z0.h, p0, [s]

where the .h predicate is 1,0,1,0,1,0,1,0,0s...  The question was going
to be how we should represent that in SLP, but I guess the answer is
simply to create an 8 (or 7?) element SLP tree and fill in the blanks
with nulls?

Thanks,
Richard

>
> Richard.
>
>   * tree-vect-slp.cc (bst_traits::hash): Handle NULL elements
>   in SLP_TREE_SCALAR_STMTS.
>   (vect_print_slp_tree): Likewise.
>   (vect_mark_slp_stmts): Likewise.
>   (vect_mark_slp_stmts_relevant): Likewise.
>   (vect_find_last_scalar_stmt_in_slp): Likewise.
>   (vect_bb_slp_mark_live_stmts): Likewise.
>   (vect_slp_prune_covered_roots): Likewise.
>   (vect_bb_partition_graph_r): Likewise.
>   (vect_remove_slp_scalar_calls): Likewise.
>   (vect_slp_gather_vectorized_scalar_stmts): Likewise.
>   (vect_bb_slp_scalar_cost): Likewise.
>   (vect_contains_pattern_stmt_p): Likewise.
>   (vect_slp_convert_to_external): Likewise.
>   (vect_find_first_scalar_stmt_in_slp): Likewise.
>   (vect_optimize_slp_pass::remove_redundant_permutations): Likewise.
>   (vect_slp_analyze_node_operations_1): Likewise.
>   (vect_schedule_slp_node): Likewise.
>   * tree-vect-stmts.cc (can_vectorize_live_stmts): Likewise.
>   (vectorizable_shift): Likewise.
>   * tree-vect-data-refs.cc (vect_slp_analyze_load_dependences):
>   Handle NULL elements in SLP_TREE_SCALAR_STMTS.
> ---
>  gcc/tree-vect-data-refs.cc |  2 +
>  gcc/tree-vect-slp.cc   | 76 +++---
>  gcc/tree-vect-stmts.cc | 22 ++-
>  3 files changed, 61 insertions(+), 39 deletions(-)
>
> diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> index 959e127c385..39fd887a96b 100644
> --- a/gcc/tree-vect-data-refs.cc
> +++ b/gcc/tree-vect-data-refs.cc
> @@ -1041,6 +1041,8 @@ vect_slp_analyze_load_dependences (vec_info *vinfo, 
> slp_tree node,
>  
>for (unsigned k = 0; k < SLP_TREE_SCALAR_STMTS (node).length (); ++k)
>  {
> +  if (! SLP_TREE_SCALAR_STMTS (node)[k])
> + continue;
>stmt_vec_info access_info
>   = vect_orig_stmt (SLP_TREE_SCALAR_STMTS (node)[k]);
>if (access_info == first_access_info)
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index b060161c021..7a9aa86f517 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -356,7 +356,7 @@ vect_contains_pattern_stmt_p (vec<stmt_vec_info> stmts)
>stmt_vec_info stmt_info;
>unsigned int i;
>FOR_EACH_VEC_ELT (stmts, i, stmt_info)
> -if (is_pattern_stmt_p (stmt_info))
> +if (stmt_info && is_pattern_stmt_p (stmt_info))
>return true;
>return false;
>  }
> @@ -1592,7 +1592,7 @@ bst_traits::hash (value_type x)
>  {
>inchash::hash h;
>for (unsigned i = 0; i < x.length (); ++i)
> -h.add_int (gimple_uid (x[i]->stmt));
> +h.add_int (x[i] ? gimple_uid (x[i]->stmt) : -1);
>return h.end ();
>  }
>  inline bool
> @@ -2801,9 +2801,12 @@ vect_print_slp_tree (dump_flags_t dump_kind, 
> dump_location_t loc,
>  }
>if (SLP_TREE_SCALAR_STMTS (node).exists ())
>  FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info)
> -  dump_printf_loc (metadata, user_loc, "\t%sstmt %u %G",
> -STMT_VINFO_LIVE_P (stmt_info) ? "[l] " : "",
> -i, stmt_info->stmt);
> +  if (stmt_info)
> + dump_printf_loc (metadata, user_loc, "\t%sstmt %u %G",
> +  STMT_VINFO_LIVE_P (stmt_info) ? "[l] " : "",
> +  i, stmt_info->stmt);
> +  else
> + dump_printf_loc (metadata, user_loc, "\tstmt %u ---\n", i);
>else
>  {
>dump_printf_loc (metadata, user_loc, "\t{ ");
> @@ -2944,7 +2947,8 @@ vect_mark_slp_stmts (slp_tree node, hash_set<slp_tree> 
> &visited)
>  return;
>  
>FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info)
> -STMT_SLP_TYPE (stmt_info) = pure_slp;
> +if (stmt_info)
> +

Re: [PATCH] [i386] restore recompute to override opts after change [PR113719]

2024-07-03 Thread Rainer Orth
Hi Alexandre,

> On Jun 27, 2024, Hongtao Liu  wrote:
>
>> LGTM, thanks.
>
>> On Thu, Jun 13, 2024 at 3:32 PM Alexandre Oliva  wrote:
>
>>> for  gcc/ChangeLog
>>> 
>>> PR target/113719
>>> * config/i386/i386-options.cc
>>> (ix86_override_options_after_change_1): Add opts and opts_set
>>> parms, operate on them, after factoring out of...
>>> (ix86_override_options_after_change): ... this.  Restore calls
>>> of ix86_default_align and ix86_recompute_optlev_based_flags.
>>> (ix86_option_override_internal): Call the factored-out bits.
>
> Thanks, I've finally put it in.

unfortunately this patch caused two regressions on Solaris/x86:

FAIL: gcc.dg/ipa/iinline-attr.c scan-ipa-dump inline "hooray[^n]*inline 
copy in test"

both 32 and 64-bit.  Solaris/x86 does default to -fno-omit-frame-pointer.

This failure was fixed before by

commit 499d00127d39ba894b0f7216d73660b380bdc325
Author: Hongyu Wang 
Date:   Wed May 15 11:24:34 2024 +0800

i386: Fix ix86_option override after change [PR 113719]

Obviously the two patches interact badly.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] RISC-V: use fclass insns to implement isfinite and isnormal builtins

2024-07-03 Thread Xi Ruoyao
On Fri, 2024-06-28 at 17:53 -0700, Vineet Gupta wrote:
> I was also hoping to get __builtin_inf() done but unforutnately it
> requires little more rtl foo/bar to implement a tri-modal return.

Hmm, do we really need to care about the sign?  The generic __builtin_isinf
does not care about the sign anyway: https://godbolt.org/z/bnnGf3a38 and the
standards only require a non-zero return value if the input is infinite
(positive or negative).

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH v2] diagnostics: Follow DECL_ORIGIN in lhd_decl_printable_name [PR102061]

2024-07-03 Thread Peter0x44

3 Jul 2024 3:10:14 pm Peter Damianov :

Currently, if a warning references a cloned function, the name of the
cloned function will be emitted in the "In function 'xyz'" part of the
diagnostic, which users aren't supposed to see.  This patch follows the
DECL_ORIGIN link to get the name of the original function.

gcc/ChangeLog:
    PR diagnostics/102061
    * langhooks.cc (lhd_decl_printable_name): Follow DECL_ORIGIN
    link

Signed-off-by: Peter Damianov 
---
v2: use DECL_ORIGIN instead of DECL_ABSTRACT_ORIGIN and remove loop

gcc/langhooks.cc | 1 +
1 file changed, 1 insertion(+)

diff --git a/gcc/langhooks.cc b/gcc/langhooks.cc
index 61f2b676256..943b8345a95 100644
--- a/gcc/langhooks.cc
+++ b/gcc/langhooks.cc
@@ -223,6 +223,7 @@ lhd_get_alias_set (tree ARG_UNUSED (t))
const char *
lhd_decl_printable_name (tree decl, int ARG_UNUSED (verbosity))
{
+  decl = DECL_ORIGIN (decl);
   gcc_assert (decl && DECL_NAME (decl));
   return IDENTIFIER_POINTER (DECL_NAME (decl));
}
--
2.39.2

This fails many tests. I will have to examine that some other time.


Re: [patch,avr] PR87376: Disable -ftree-ter

2024-07-03 Thread Georg-Johann Lay




Am 02.07.24 um 15:48 schrieb Richard Biener:

On Tue, Jul 2, 2024 at 3:43 PM Georg-Johann Lay  wrote:


Hi Jeff,

This is a patch to get correct code out of 64-bit
loads from address-space __memx.

The AVR address-spaces may require that move insns issue
calls to library support functions, a fact that -ftree-ter
doesn't account for.  tree-ssa-ter.cc then replaces an
expression across such a library call, resulting in wrong code.

This patch disables that pass per default on avr, as there is no
more fine grained way to avoid malicious optimizations.
The pass can still be re-enabled by means of explicit -ftree-ter.

Ok to apply?


I think this requires more details on what goes wrong - I assume
it's not stmt reordering that effectively happens but recursive
expand_expr on SSA defs when those invoke libcalls?  In that
case this would point to a deeper issue.


The difference is that with TER, we get a hard reg in .expand
for a movdi from 24-bit address-space __memx.

Such moves require library calls, which in turn require
specific hard registers.  As the avr backend has no movdi, the
movdi gets expanded as 8 * movqi, and that does not work
when the target registers are hard regs, as some of them
are clobbered by the libcalls.

Moreover, even with TER, the code is no more efficient than
without it, so it's not clear what the point is in propagating
hard regs into expander operands.  Later passes like fwprop1 and
combine can do that, too.

Requiring libcalls in a mov insn is quite special indeed,
and there is no way to tell that to TER.  TER itself does not
optimize code involving libcalls, so it knows they are fragile.


So - if the wrongness is already apparent in the RTL expansion
pass dump can you quote the respective pieces and explain why?


It expands a 64-bit move from __memx address-space to registers
R18...R25.  This is broken into 8 QI moves to these regs, but
the movqi requires a libcall in some situations, which pass their
arguments in R21...R25.  Hence the libcalls clobber some of the
destination regs.

It would already help when TER would not propagate hard-regs into
expander operands.

Johann


As an alternative, the option could be disabled permanently in
avr.cc::avr_option_override().

Johann

--

AVR: middle-end/87376 - Use -fno-tree-ter per default.

Temporary expression replacement might replace expressions across
library calls, for example with move insn from address-space __memx
like in PR87376.  -ftree-ter has no way where the backend could hook
in to avoid only problematic replacements, thus kick it out altogether.

 PR middle-end/87376
gcc/
 * common/config/avr/avr-common.cc (avr_option_optimization_table)
 : Set to 0.
gcc/testsuite/
 * gcc.target/avr/torture/pr87376-memx.c: New test.


Re: [patch,avr] PR87376: Disable -ftree-ter

2024-07-03 Thread Jeff Law




On 7/3/24 1:26 PM, Georg-Johann Lay wrote:



Am 02.07.24 um 15:48 schrieb Richard Biener:

On Tue, Jul 2, 2024 at 3:43 PM Georg-Johann Lay  wrote:


Hi Jeff,

This is a patch to get correct code out of 64-bit
loads from address-space __memx.

The AVR address-spaces may require that move insns issue
calls to library support functions, a fact that -ftree-ter
doesn't account for.  tree-ssa-ter.cc then replaces an
expression across such a library call, resulting in wrong code.

This patch disables that pass per default on avr, as there is no
more fine grained way to avoid malicious optimizations.
The pass can still be re-enabled by means of explicit -ftree-ter.

Ok to apply?


I think this requires more details on what goes wrong - I assume
it's not stmt reordering that effectively happens but recursive
expand_expr on SSA defs when those invoke libcalls?  In that
case this would point to a deeper issue.


The difference is that with TER, we get a hard reg in .expand
for a movdi from 24-bit address-space __memx.

Such moves require library calls, which in turn require
specific hard registers.  As avr backend has no movdi, the
moddi gets expanded as 8 * movqi, and that does not work
when the target registers are hard regs, as some of them
are clobbered by the libcalls.
But this is something that's handled for other targets.  I don't 
remember all the details, but there's generic code to handle this situation.


Jeff


Re: [patch,avr] PR87376: Disable -ftree-ter

2024-07-03 Thread Georg-Johann Lay




Am 03.07.24 um 21:39 schrieb Jeff Law:



On 7/3/24 1:26 PM, Georg-Johann Lay wrote:



Am 02.07.24 um 15:48 schrieb Richard Biener:

On Tue, Jul 2, 2024 at 3:43 PM Georg-Johann Lay  wrote:


Hi Jeff,

This is a patch to get correct code out of 64-bit
loads from address-space __memx.

The AVR address-spaces may require that move insns issue
calls to library support functions, a fact that -ftree-ter
doesn't account for.  tree-ssa-ter.cc then replaces an
expression across such a library call, resulting in wrong code.

This patch disables that pass per default on avr, as there is no
more fine grained way to avoid malicious optimizations.
The pass can still be re-enabled by means of explicit -ftree-ter.

Ok to apply?


I think this requires more details on what goes wrong - I assume
it's not stmt reordering that effectively happens but recursive
expand_expr on SSA defs when those invoke libcalls?  In that
case this would point to a deeper issue.


The difference is that with TER, we get a hard reg in .expand
for a movdi from 24-bit address-space __memx.

Such moves require library calls, which in turn require
specific hard registers.  As the avr backend has no movdi, the
movdi gets expanded as 8 * movqi, and that does not work
when the target registers are hard regs, as some of them
are clobbered by the libcalls.
But this is something that's handled for other targets.  I don't 
remember all the details, but there's generic code to handle this 
situation.


Jeff


A libcall in a move insn? How would the middle-end know that?

Johann



[patch,avr] Implement PR90616: Improve adding symbols that are 256-byte aligned

2024-07-03 Thread Georg-Johann Lay

Address computation (usually an add) with symbols that are aligned
to 256 bytes does not require adding the lo8() part, as it is zero.

This patch adds a new combine insn that performs a widening add
from QImode plus such a symbol.  The case when such an aligned
symbol is added to a reg that's already in HImode can be handled
in the addhi3 asm printer.

Ok to apply?

Johann

--

AVR: target/90616 - Improve adding constants that are 0 mod 256.

This patch introduces a new insn that works as an insn combine
pattern for (plus:HI (zero_extend:HI (reg:QI)) (const_0mod256_operand:HI)),
which requires at most 2 instructions.  When the input register operand
is already in HImode, the addhi3 printer only adds the hi8 part when
it sees a SYMBOL_REF or CONST aligned to at least 256 bytes.
(The CONST_INT case was already handled.)

gcc/
PR target/90616
* config/avr/predicates.md (const_0mod256_operand): New predicate.
* config/avr/constraints.md (Cp8): New constraint.
* config/avr/avr.md (*aligned_add_symbol): New insn.
* config/avr/avr.cc (avr_out_plus_symbol) [HImode]:
When op2 is a multiple of 256, there is no need to add / subtract
the lo8 part.
(avr_rtx_costs_1) [PLUS && HImode]: Return expected costs for
new insn *aligned_add_symbol as it applies.

diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc
index f048bf5fd41..014588dd6a7 100644
--- a/gcc/config/avr/avr.cc
+++ b/gcc/config/avr/avr.cc
@@ -9343,6 +9343,12 @@ avr_out_plus_symbol (rtx *xop, enum rtx_code code, int *plen)
 
   gcc_assert (mode == HImode || mode == PSImode);
 
+  if (mode == HImode
+  && const_0mod256_operand (xop[2], HImode))
+return avr_asm_len (PLUS == code
+			? "subi %B0,hi8(-(%2))"
+			: "subi %B0,hi8(%2)", xop, plen, -1);
+
   avr_asm_len (PLUS == code
 	   ? "subi %A0,lo8(-(%2))" CR_TAB "sbci %B0,hi8(-(%2))"
 	   : "subi %A0,lo8(%2)"CR_TAB "sbci %B0,hi8(%2)",
@@ -12615,6 +12621,14 @@ avr_rtx_costs_1 (rtx x, machine_mode mode, int outer_code,
 	  *total = COSTS_N_INSNS (3);
 	  return true;
 	}
+  // *aligned_add_symbol
+  if (mode == HImode
+	  && GET_CODE (XEXP (x, 0)) == ZERO_EXTEND
+	  && const_0mod256_operand (XEXP (x, 1), HImode))
+	{
+	  *total = COSTS_N_INSNS (1.5);
+	  return true;
+	}
 
   if (GET_CODE (XEXP (x, 0)) == ZERO_EXTEND
 	  && REG_P (XEXP (x, 1)))
diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md
index dabf4c0fc5a..72ea1292576 100644
--- a/gcc/config/avr/avr.md
+++ b/gcc/config/avr/avr.md
@@ -10077,6 +10077,23 @@ (define_expand "isinf2"
 FAIL;
   })
 
+
+;; PR90616: Adding symbols that are aligned to 256 bytes can
+;; save up to two instructions.
+(define_insn "*aligned_add_symbol"
+  [(set (match_operand:HI 0 "register_operand" "=d")
+(plus:HI (zero_extend:HI (match_operand:QI 1 "register_operand" "r"))
 (match_operand:HI 2 "const_0mod256_operand" "Cp8")))]
+  ""
+  {
+return REGNO (operands[0]) == REGNO (operands[1])
+  ? "ldi %B0,hi8(%2)"
+  : "mov %A0,%1\;ldi %B0,hi8(%2)";
+  }
+  [(set (attr "length")
+(symbol_ref ("2 - (REGNO (operands[0]) == REGNO (operands[1]))")))])
+
+
 
 ;; Fixed-point instructions
 (include "avr-fixed.md")
diff --git a/gcc/config/avr/constraints.md b/gcc/config/avr/constraints.md
index b4e5525d197..35448614aa7 100644
--- a/gcc/config/avr/constraints.md
+++ b/gcc/config/avr/constraints.md
@@ -253,6 +253,11 @@ (define_constraint "Cn8"
   (and (match_code "const_int")
(match_test "IN_RANGE (ival, -255, -1)")))
 
+(define_constraint "Cp8"
+  "A constant integer or symbolic operand that is at least .p2align 8."
+  (and (match_code "const_int,symbol_ref,const")
+   (match_test "const_0mod256_operand (op, HImode)")))
+
 ;; CONST_FIXED is no element of 'n' so cook our own.
 ;; "i" or "s" would match but because the insn uses iterators that cover
 ;; INT_MODE, "i" or "s" is not always possible.
diff --git a/gcc/config/avr/predicates.md b/gcc/config/avr/predicates.md
index 12013660ed1..5b49481ff0f 100644
--- a/gcc/config/avr/predicates.md
+++ b/gcc/config/avr/predicates.md
@@ -171,6 +171,20 @@ (define_predicate "reg_or_0_operand"
 (define_predicate "symbol_ref_operand"
   (match_code "symbol_ref"))
 
+;; Returns true when OP is a SYMBOL_REF, CONST or CONST_INT that is
+;; a multiple of 256, i.e. lo8(OP) = 0.
+(define_predicate "const_0mod256_operand"
+  (ior (and (match_code "symbol_ref")
+(match_test "SYMBOL_REF_DECL (op)
+ && DECL_P (SYMBOL_REF_DECL (op))
+ && DECL_ALIGN (SYMBOL_REF_DECL (op)) >= 8 * 256"))
+   (and (match_code "const")
+(match_test "GET_CODE (XEXP (op, 0)) == PLUS")
+(match_test "const_0mod256_operand (XEXP (XEXP (op, 0), 0), HImode)")
+(match_test "const_0mod256_operand (XEXP (XEXP (op, 0), 1), HImode)"))
+   (and (match_code "const_int")
+(match_test "INTVAL

Re: [PATCH v2] c++: remove Concepts TS code

2024-07-03 Thread Jason Merrill

On 6/14/24 12:56 PM, Marek Polacek wrote:

On Mon, Jun 10, 2024 at 10:23:37PM -0400, Jason Merrill wrote:

On 6/10/24 11:13, Marek Polacek wrote:

On Mon, Jun 10, 2024 at 10:22:11AM -0400, Patrick Palka wrote:

On Fri, 7 Jun 2024, Marek Polacek wrote:

@@ -3940,9 +3936,6 @@ find_parameter_packs_r (tree *tp, int *walk_subtrees, 
void* data)
 parameter pack (14.6.3), or the type-specifier-seq of a type-id that
 is a pack expansion, the invented template parameter is a template
 parameter pack.  */


This comment should be removed too I think.


Removed in my local tree.

-  if (flag_concepts_ts && ppd->type_pack_expansion_p && is_auto (t)


(BTW this seems to be the only actual user of type_pack_expansion_p so we
can in turn remove that field too.)


Oh neat.  I can do that as a follow-up, unless y'all think it should be
part of this patch.  Thanks,


It probably makes sense for it to be part of this patch.


OK, done.


One exception I'm aware of is template-introductions, as in:

   template
   concept C = true;

   C{T} void foo ();

where we warn by default, but accept the code, and my patch does not
remove the support just yet.


I think let's go ahead and remove it as well.


Done as well.  I was able to remove quite a lot of functions.


+// ??? This used to be a link test with Concepts TS, but now we
+// get:
+// undefined reference to `_Z2f5ITk1C1XEvT_Q1DIS1_E'
+// undefined reference to `_Z2f6ITk1C1XEvT_Q1DIS1_E'
+// so it's a compile test only.


That means the test is failing, and we shouldn't in general change tests to
stop testing the thing that fails; better to xfail.

In this case, however, C++20 doesn't establish the equivalence that it's
testing; that's another thing that wasn't adopted from the Concepts TS.

Note that this area is in question currently; see CWG2802.  But I think the
equivalence is unlikely to return.

So let's move main() to the bottom of the test and test for the ambiguity
errors that we get because they aren't equivalent.


Thanks, done.


--- a/gcc/testsuite/g++.dg/concepts/pr67595.C
+++ /dev/null
@@ -1,14 +0,0 @@
-// { dg-do compile { target c++17_only } }
-// { dg-options "-fconcepts-ts" }
-
-template <class X> concept bool allocatable = requires{{new X}->X *; };
-template <class X> concept bool semiregular = allocatable<X>;
-template <class X> concept bool readable = requires{requires semiregular<X>;};
-template <class X> int weak_input_iterator = requires{{0}->readable;};
-template <class X> bool input_iterator{weak_input_iterator<X>}; // { dg-prune-output "narrowing conversion" }
-template <class X> bool forward_iterator{input_iterator<X>};
-template <class X> bool bidirectional_iterator{forward_iterator<X>};
-template <class X>
-concept bool random_access_iterator{bidirectional_iterator<X>}; // { dg-error "constant" }
-void fn1(random_access_iterator);
-int main() { fn1(0); }  // { dg-error "" }


Why remove this test?  The main issue I see is that {new X}->X* needs to
change to {new X}->convertible_to (or same_as)


Adjusted as suggested.


+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-requires5.C
@@ -1,7 +1,5 @@
  // { dg-do compile { target c++20 } }
-// { dg-additional-options "-fconcepts-ts -fconcepts-diagnostics-depth=2" }
-
-// Test conversion requirements (not in C++20)


This one could get the same adjustment instead of adding dg-errors.  Or
perhaps the error could suggest that adjustment, and this testcase could
check that?


Adjusted as well.  I don't think I understand what's going on here very
well.  There was:

   concept C = requires(T x) { { x.fn() } -> S1<T>; };

and with my patch:

   concept C = requires(T x) { { x.fn() } -> same_as<S1<T>>; };

so we're checking that the result of x.fn() is the same type as S1<T>.
Why doesn't plain "S1<T>" work?


See the discussion of this change in wg21.link/p1452 .


Anyway, here's v2:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?



-Some constructs that were allowed by the earlier C++ Extensions for
-Concepts Technical Specification, ISO 19217 (2015), but didn't make it
-into the standard, can additionally be enabled by
-@option{-fconcepts-ts}.  The option @option{-fconcepts-ts} was deprecated
-in GCC 14 and may be removed in GCC 15; users are expected to convert
-their code to C++20 concepts.


I'd probably keep the documentation for now, modified to say that it was 
removed in GCC 15.



 // Check equivalence of short- and longhand declarations.


Please add a comment that they are not equivalent in C++20.


+++ b/gcc/testsuite/g++.dg/concepts/fn-concept3.C
+static_assert(noexcept(C3<int>()), "function concept should be treated as if noexcept(true) specified");


Let's update the message as well.

OK with those adjustments, thanks.

Jason



Re: [PATCH] RISC-V: Add basic support for the Zacas extension

2024-07-03 Thread Patrick O'Neill

Regarding the amocas.q follow-up patch:

I'm having trouble with matching any TImode compare-and-swap patterns. 
Here's the RTL I'm trying:


(define_mode_iterator SUPERGPR [SI DI TI])

(define_insn "zacas_atomic_cas_value<mode>"
  [(set (match_operand:SUPERGPR 0 "register_operand" "=&r")
	(match_operand:SUPERGPR 1 "memory_operand" "+A"))
   (set (match_dup 1)
	(unspec_volatile:SUPERGPR
	  [(match_operand:SUPERGPR 2 "reg_or_0_operand" "rJ")
	   (match_operand:SUPERGPR 3 "reg_or_0_operand" "rJ")
	   (match_operand:SI 4 "const_int_operand")  ;; mod_s
	   (match_operand:SI 5 "const_int_operand")] ;; mod_f
	 UNSPEC_COMPARE_AND_SWAP))]
  "TARGET_ZACAS"
  {
    return "amocassupergpr<MODE>";
  }
  [(set_attr "type" "atomic")
   (set (attr "length") (const_int 4))])

(define_expand "atomic_compare_and_swap<mode>"
  [(match_operand:SI 0 "register_operand" "")   ;; bool output
   (match_operand:SUPERGPR 1 "register_operand" "")  ;; val output
   (match_operand:SUPERGPR 2 "memory_operand" "");; memory
   (match_operand:SUPERGPR 3 "reg_or_0_operand" "")  ;; expected value
   (match_operand:SUPERGPR 4 "reg_or_0_operand" "")  ;; desired value
   (match_operand:SI 5 "const_int_operand" "")  ;; is_weak
   (match_operand:SI 6 "const_int_operand" "")  ;; mod_s
   (match_operand:SI 7 "const_int_operand" "")] ;; mod_f
  "TARGET_ZACAS"
{
  emit_insn (gen_zacas_atomic_cas_value<mode> (operands[1], operands[2],
					       operands[3], operands[4],
					       operands[6], operands[7]));
  DONE;
})

and here's my testcase:

void atomic_compare_exchange_long_long_seq_cst (__int128 *bar, __int128 *baz, __int128 qux)
{
  __atomic_compare_exchange_n(bar, baz, qux, 1, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}

void atomic_compare_exchange_int_seq_cst (int *bar, int *baz, int qux)
{
  __atomic_compare_exchange_n(bar, baz, qux, 1, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}

void atomic_compare_exchange_long_seq_cst (long *bar, long *baz, long qux)
{
  __atomic_compare_exchange_n(bar, baz, qux, 1, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}

and generated asm:

atomic_compare_exchange_long_long_seq_cst:
li  a5,5
addisp,sp,-16
mv  a4,a5
sd  ra,8(sp)
call__atomic_compare_exchange_16
ld  ra,8(sp)
addisp,sp,16
jr  ra
.size   atomic_compare_exchange_long_long_seq_cst, .-atomic_compare_exchange_long_long_seq_cst
.align  1
.globl  atomic_compare_exchange_int_seq_cst
.type   atomic_compare_exchange_int_seq_cst, @function
atomic_compare_exchange_int_seq_cst:
lw  a4,0(a1)
amocassupergprSI
sw  a5,0(a1)
ret
.size   atomic_compare_exchange_int_seq_cst, .-atomic_compare_exchange_int_seq_cst
.align  1
.globl  atomic_compare_exchange_long_seq_cst
.type   atomic_compare_exchange_long_seq_cst, @function
atomic_compare_exchange_long_seq_cst:
ld  a4,0(a1)
amocassupergprDI
sd  a5,0(a1)
ret
.size   atomic_compare_exchange_long_seq_cst, .-atomic_compare_exchange_long_seq_cst
.ident  "GCC: (GNU) 15.0.0 20240627 (experimental)"
.section.note.GNU-stack,"",@progbits

The SI/DI patterns match fine but TI generates a call.

I've also tried doing things similar to:

(define_expand "mulditi3"

where only the define_expand has TImode operands and the define_insn 
uses placeholder SI operands.

No luck with that approach either.

I'd appreciate any guidance here - otherwise I'll keep trying to make 
sense of what's happening in insn-recog.cc ;)


Thanks,
Patrick

On 7/3/24 11:16, Patrick O'Neill wrote:

From: Gianluca Guida

This patch adds support for amocas.{b|h|w|d}. Support for amocas.q
(64/128 bit cas for rv32/64) will be added in a future patch.

Extension:https://github.com/riscv/riscv-zacas
Ratification:https://jira.riscv.org/browse/RVS-680

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_subset_list::to_string): Skip zacas when not supported by
the assembler.
* config.in: Regenerate.
* config/riscv/arch-canonicalize: Make zacas imply zaamo.
* config/riscv/riscv.opt: Add zacas.
* config/riscv/sync.md (zacas_atomic_cas_value<mode>): New pattern.
(atomic_compare_and_swap<mode>): Use new pattern for compare-and-swap ops.
* configure: Regenerate.
* configure.ac: Regenerate.
* doc/sourcebuild.texi: Add Zacas documentation.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Add zacas testsuite infra support.
* gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-acquire-release.c:
Remove zacas to continue to test the lr/sc pairs.
* gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-acquire.c: Ditto.
* gcc.target/riscv/amo/zalrsc-rvwmo-compare-exchange-int-consume.c: Ditto.
* gcc.target/riscv/amo

[OG14] Fortran/OpenMP: Support mapping of DT with allocatable components: disable 'generate_callback_wrapper' for nvptx target (was: [Patch][Stage 1] Fortran/OpenMP: Support mapping of DT with allocat

2024-07-03 Thread Thomas Schwinge
Hi Tobias!

I've compared test results for nvptx target for GCC 14 vs. the new OG14,
and ran into a number of unexpected regressions: thousands of compilation
PASS -> FAIL in the Fortran testsuite.  The few that I looked at were all
like:

ptxas /tmp/ccAMr7D9.o, line 63; error   : Illegal operand type to 
instruction 'st'
ptxas /tmp/ccAMr7D9.o, line 63; error   : Unknown symbol '%stack'
ptxas fatal   : Ptx assembly aborted due to errors
nvptx-as: ptxas returned 255 exit status
compiler exited with status 1

Comparing '-fdump-tree-all' for 'gfortran.dg/pr37287-1.f90' (randomly
picked) for GCC 14 vs. OG14, already in 'pr37287-1.f90.005t.original' we
see:

--- [GCC 14]/pr37287-1.f90.005t.original  2024-07-03 12:45:08.369948469 
+0200
+++ [OG14]/pr37287-1.f90.005t.original   2024-07-03 12:44:57.770072298 
+0200
@@ -1,3 +1,21 @@
+__attribute__((fn spec (". r r r r ")))
+integer(kind=8) __callback___iso_c_binding_C_ptr (integer(kind=8) 
(*) (void *, void * & restrict, integer(kind=2), void (*) (void)) 
cb, void * token, void * this_ptr, integer(kind=2) flag)
+{
+  integer(kind=8) result;
+  void * * scalar;
+
+  result = 0;
+  if (flag == 1)
+{
+  result = cb (token, &this_ptr, 64, 3, 0B);
+  return result;
+}
+  L$1:;
+  scalar = (void * *) this_ptr;
+  return result;
+}
+
+
 __attribute__((fn spec (". . . ")))
 void __copy___iso_c_binding_C_ptr (void * & restrict src, void * & 
restrict dst)
 {

(In addition to the whole function '__callback___iso_c_binding_C_ptr',
also note that the 'L$1:' label and 'scalar' variable are dead here; but
that's likely unrelated to the issue at hand?)

This points to OG14 commit 92c3af3d4f82351c7133b6ee90e213a8a5a485db
"Fortran/OpenMP: Support mapping of DT with allocatable components":

On 2022-03-01T16:34:18+0100, Tobias Burnus  wrote:
> this patch adds support for mapping something like
>type t
>  type(t2), allocatable :: a, b(:)
>  integer, allocatable :: c, c(:)
>end type t
>type(t), allocatable :: var, var2(:,:)
>
>!$omp target enter data map(var, var2)
>
> which does a deep walk of the components at runtime.
>
> [...]
>
> Issues: None known, but I am sure with experimenting,
> more can be found - [...]

Due to a number of other commits (at least textually) depending on this
one, this commit isn't easy to revert on OG14.

But: if I disable it for nvptx target as per the attached
"Fortran/OpenMP: Support mapping of DT with allocatable components: disable 
'generate_callback_wrapper' for nvptx target",
then we're back to good -- all GCC 14 vs. OG14 regressions resolved for
nvptx target.

By the way: it's possible that we've had the same misbehavior also on
OG13 and earlier, but just nobody ever tested that for nvptx target.

Note that also outside of OG14 (that is, in GCC 14 as well as GCC trunk),
we have a number of instances of:

ptxas /tmp/ccAMr7D9.o, line 63; error   : Illegal operand type to 
instruction 'st'
ptxas /tmp/ccAMr7D9.o, line 63; error   : Unknown symbol '%stack'

... all over the Fortran test suite (only).  My current theory therefore
is that there is some latent issue, which is just greatly exacerbated by
OG14 commit 92c3af3d4f82351c7133b6ee90e213a8a5a485db
"Fortran/OpenMP: Support mapping of DT with allocatable components" (or
some related change).

This could be the Fortran front end generating incorrect GIMPLE, or the
middle end or (more likely?) nvptx back end not correctly handling
something that only comes into existence via the Fortran front end.

Anyway: until we understand the underlying issue, OK to push the attached
"Fortran/OpenMP: Support mapping of DT with allocatable components: disable 
'generate_callback_wrapper' for nvptx target"
to devel/omp/gcc-14 branch?


Regards
 Thomas


>From 3fb9e4cabea736ace66ee197be1b13a978af10ac Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 3 Jul 2024 22:09:39 +0200
Subject: [PATCH] Fortran/OpenMP: Support mapping of DT with allocatable
 components: disable 'generate_callback_wrapper' for nvptx target

This is, obviously, not the final fix for this issue.

	gcc/fortran/
	* class.cc (generate_callback_wrapper) [GCC_NVPTX_H]: Disable.
---
 gcc/fortran/class.cc | 25 +
 1 file changed, 25 insertions(+)

diff --git a/gcc/fortran/class.cc b/gcc/fortran/class.cc
index 15aacd98fd8..2c062204e5a 100644
--- a/gcc/fortran/class.cc
+++ b/gcc/fortran/class.cc
@@ -64,6 +64,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gfortran.h"
 #include "constructor.h"
 #include "target-memory.h"
+#include "tm.h" //TODO
 
 /* Inserts a derived type component reference in a data reference chain.
 TS: base type of the ref chain so far, in which we will pick the component
@@ -2420,6 +2421,30 @@ generate_callback_wrapper (gfc_symbol *vtab, gfc_symbol *derived,
 			   gfc_namespace *ns, const char *tname,
 			   gfc_compon
