date:20231017

[PATCH] vect: Cost adjacent vector loads/stores together [PR111784]

2023-10-17 Thread Kewen.Lin

Hi,

Following the comments in [1][2], this patch changes how some
adjacent vector loads/stores are costed: instead of costing them
one by one, it costs them together, once, with their total count.

It helps to fix the regression PR111784 exposed on aarch64,
since aarch64-specific costing can make different decisions
depending on the costing scheme (costing with the total count
vs. costing one by one).  Based on a reduced test case from
PR111784, accounting for vec_num alone is already enough to fix
the regression.  However, vector loads/stores across ncopies are
adjacent accesses too, so they are handled in the same way.
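To illustrate why the two costing schemes can disagree, here is a minimal C sketch built around a hypothetical target cost function, `toy_store_cost`. It assumes that two adjacent vector stores can be merged into one store-pair instruction. This is not aarch64's actual cost model; it only shows that a hook which sees the total count can return something other than "count times the single-store cost". `cost_one_by_one` mimics the old loop (one call with a count of 1 per store), and `cost_together` mimics accumulating `n_adjacent_stores` and calling the hook once.

```c
/* Hypothetical cost of one store-pair instruction; not a real target value.  */
#define STORE_PAIR_COST 1u

/* Hypothetical target hook: cost COUNT adjacent vector stores at once.
   It assumes every two adjacent stores can be merged into a single
   store-pair instruction, so the result depends on the total count.  */
static unsigned
toy_store_cost (unsigned count)
{
  return (count + 1) / 2 * STORE_PAIR_COST;
}

/* Old scheme: query the hook once per store, always with COUNT == 1.  */
static unsigned
cost_one_by_one (unsigned n)
{
  unsigned total = 0;
  for (unsigned i = 0; i < n; i++)
    total += toy_store_cost (1);
  return total;
}

/* New scheme: accumulate the number of adjacent stores first (like
   n_adjacent_stores in the patch), then query the hook once.  */
static unsigned
cost_together (unsigned n)
{
  return n > 0 ? toy_store_cost (n) : 0;
}
```

For instance, `cost_one_by_one (4)` returns 4 while `cost_together (4)` returns 2. With a count of 1, both schemes agree, which is why the difference only shows up once several adjacent accesses are involved.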

BTW, to keep things simple, this patch leaves the costing of
dr_explicit_realign and dr_explicit_realign_optimized unchanged.
Changing the costing scheme could affect them, because one of
their costs depends on targetm.vectorize.builtin_mask_for_load
and is charged once per call.  IIUC, these two
dr_alignment_support kinds are mainly used for older Power
processors, which only have 16-byte-aligned vector loads/stores
and no unaligned vector loads/stores.

Bootstrapped and regtested on x86_64-redhat-linux,
aarch64-linux-gnu, powerpc64-linux-gnu P{7,8,9}
and powerpc64le-linux-gnu P{8,9,10}.

Is it ok for trunk?

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-September/630742.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2023-September/630744.html

BR,
Kewen
-
gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_store): Adjust costing way for
adjacent vector stores, by costing them with the total number
rather than costing them one by one.
(vectorizable_load): Adjust costing way for adjacent vector
loads, by costing them with the total number rather than costing
them one by one.
---
 gcc/tree-vect-stmts.cc | 137 -
 1 file changed, 95 insertions(+), 42 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index b3a56498595..af134ff2bf7 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -8681,6 +8681,9 @@ vectorizable_store (vec_info *vinfo,
   alias_off = build_int_cst (ref_type, 0);
   stmt_vec_info next_stmt_info = first_stmt_info;
   auto_vec vec_oprnds (ncopies);
+  /* For costing some adjacent vector stores, we'd like to cost with
+the total number of them once instead of cost each one by one. */
+  unsigned int n_adjacent_stores = 0;
   for (g = 0; g < group_size; g++)
{
  running_off = offvar;
@@ -8738,10 +8741,7 @@ vectorizable_store (vec_info *vinfo,
 store to avoid ICE like 110776.  */
  if (VECTOR_TYPE_P (ltype)
  && known_ne (TYPE_VECTOR_SUBPARTS (ltype), 1U))
-   vect_get_store_cost (vinfo, stmt_info, 1,
-alignment_support_scheme,
-misalignment, &inside_cost,
-cost_vec);
+   n_adjacent_stores++;
  else
inside_cost
  += record_stmt_cost (cost_vec, 1, scalar_store,
@@ -8798,11 +8798,18 @@ vectorizable_store (vec_info *vinfo,
break;
}

-  if (costing_p && dump_enabled_p ())
-   dump_printf_loc (MSG_NOTE, vect_location,
-"vect_model_store_cost: inside_cost = %d, "
-"prologue_cost = %d .\n",
-inside_cost, prologue_cost);
+  if (costing_p)
+   {
+ if (n_adjacent_stores > 0)
+   vect_get_store_cost (vinfo, stmt_info, n_adjacent_stores,
+alignment_support_scheme, misalignment,
+&inside_cost, cost_vec);
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_NOTE, vect_location,
+"vect_model_store_cost: inside_cost = %d, "
+"prologue_cost = %d .\n",
+inside_cost, prologue_cost);
+   }

   return true;
 }
@@ -8909,6 +8916,9 @@ vectorizable_store (vec_info *vinfo,
 {
   gcc_assert (!slp && grouped_store);
   unsigned inside_cost = 0, prologue_cost = 0;
+  /* For costing some adjacent vector stores, we'd like to cost with
+the total number of them once instead of cost each one by one. */
+  unsigned int n_adjacent_stores = 0;
   for (j = 0; j < ncopies; j++)
{
  gimple *new_stmt;
@@ -8974,10 +8984,7 @@ vectorizable_store (vec_info *vinfo,

  if (costing_p)
{
- for (i = 0; i < vec_num; i++)
-   vect_get_store_cost (vinfo, stmt_info, 1,
-alignment_support_scheme, misalignment,
-&inside_cost, cost_vec);
+ n_adjacent_stores += vec_num;
  continue;
}

@@ -9067,11 +9074,18 @@

[PATCH] RISC-V: Optimize consecutive permutation index pattern by vrgather.vi/vx

2023-10-17 Thread Juzhe-Zhong

This patch optimizes the following permutation, whose index follows a consecutive pattern:

typedef char vnx16i __attribute__ ((vector_size (16)));

#define MASK_16 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15

vnx16i __attribute__ ((noinline, noclone))
test_1 (vnx16i x, vnx16i y)
{
  return __builtin_shufflevector (x, y, MASK_16);
}

Before this patch:

lui         a5,%hi(.LC0)
addi        a5,a5,%lo(.LC0)
vsetivli    zero,16,e8,m1,ta,ma
vle8.v      v3,0(a5)
vle8.v      v2,0(a1)
vrgather.vv v1,v2,v3
vse8.v      v1,0(a0)
ret

After this patch:

vsetivli    zero,16,e8,mf8,ta,ma
vle8.v      v2,0(a1)
vsetivli    zero,4,e32,mf2,ta,ma
vrgather.vi v1,v2,3
vsetivli    zero,16,e8,mf8,ta,ma
vse8.v      v1,0(a0)
ret

Overall, this removes one instruction, and the one removed is a vector
load, which is much more expensive than toggling VL.

Also, using vrgather.vi saves one vector register, since the index
vector no longer has to be loaded into a register.
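The recognition idea can be sketched in C as follows. `consecutive_broadcast_p` is a hypothetical helper, not the patch's shuffle_consecutive_patterns. It assumes a little-endian element layout (as on RISC-V), so that wide element j consists of narrow elements j*k .. j*k+k-1. The helper checks that the mask repeats one aligned run of k consecutive indices, with k and n/k both powers of two. If so, the shuffle is a broadcast of a single wide element, which is what lets one vrgather.vi at a wider SEW replace the index-vector load. The alignment rule mirrors the patch comment: <4,5,6,7,4,5,6,7> qualifies, <2,3,4,5,2,3,4,5> does not.

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true if X is a nonzero power of two.  */
static bool
is_pow2 (size_t x)
{
  return x != 0 && (x & (x - 1)) == 0;
}

/* Hypothetical helper, not the patch's shuffle_consecutive_patterns.
   Return true if PERM[0..N-1] repeats one aligned run of consecutive
   indices.  On success, *RUN is the run length (narrow elements per
   wide element) and *WIDE_IDX is the wide element to broadcast.
   Indices may select from either input, i.e. lie in [0, 2*N).  */
static bool
consecutive_broadcast_p (const unsigned *perm, size_t n,
			 size_t *run, unsigned *wide_idx)
{
  /* Length K of the leading run perm[0], perm[0] + 1, ...  */
  size_t k = 1;
  while (k < n && perm[k] == perm[k - 1] + 1)
    k++;

  /* Need a real run shorter than the vector; both the run length and
     the number of runs must be powers of two.  */
  if (k == 1 || k == n || n % k != 0 || !is_pow2 (k) || !is_pow2 (n / k))
    return false;

  /* The run must start on a K-aligned index so that it covers exactly
     one wide element (and therefore never straddles the two inputs).  */
  if (perm[0] % k != 0)
    return false;

  /* Every following chunk must repeat the leading run.  */
  for (size_t i = k; i < n; i++)
    if (perm[i] != perm[i % k])
      return false;

  *run = k;
  *wide_idx = perm[0] / (unsigned) k;
  return true;
}
```

Applied to MASK_16 from the test case, the helper reports a run of 4 and wide index 3. This matches the `vrgather.vi v1,v2,3` issued under e32 in the generated code shown for this patch.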

gcc/ChangeLog:

* config/riscv/riscv-v.cc (shuffle_consecutive_patterns): New function.
(expand_vec_perm_const_1): Add consecutive pattern recognition.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/def.h: Add new test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/consecutive-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/consecutive-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/consecutive-3.c: New test.

---
 gcc/config/riscv/riscv-v.cc   | 85 +
 .../rvv/autovec/vls-vlmax/consecutive-1.c | 21 +
 .../rvv/autovec/vls-vlmax/consecutive-2.c | 45 +
 .../rvv/autovec/vls-vlmax/consecutive_run-1.c | 27 ++
 .../rvv/autovec/vls-vlmax/consecutive_run-2.c | 51 ++
 .../riscv/rvv/autovec/vls/consecutive-1.c | 94 +++
 .../riscv/rvv/autovec/vls/consecutive-2.c | 68 ++
 .../riscv/rvv/autovec/vls/consecutive-3.c | 68 ++
 .../gcc.target/riscv/rvv/autovec/vls/def.h|  6 ++
 9 files changed, 465 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive_run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive_run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/consecutive-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/consecutive-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/consecutive-3.c

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 21d86c3f917..895c11d13fc 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -2822,6 +2822,89 @@ shuffle_merge_patterns (struct expand_vec_perm_d *d)
   return true;
 }
 
+/* Recognize the consecutive index that we can use a single
+   vrgather.v[x|i] to shuffle the vectors.
+
+   e.g. short[8] = VEC_PERM_EXPR 
+   Use SEW = 32, index = 1 vrgather.vi to get the result.  */
+static bool
+shuffle_consecutive_patterns (struct expand_vec_perm_d *d)
+{
+  machine_mode vmode = d->vmode;
+  scalar_mode smode = GET_MODE_INNER (vmode);
+  poly_int64 vec_len = d->perm.length ();
+  HOST_WIDE_INT elt;
+
+  if (!vec_len.is_constant () || !d->perm[0].is_constant ())
+return false;
+  int vlen = vec_len.to_constant ();
+
+  /* Compute the last element index of consecutive pattern from the leading
+ consecutive elements.  */
+  int last_consecutive_idx = -1;
+  int consecutive_num = -1;
+  for (int i = 1; i < vlen; i++)
+{
+  if (maybe_ne (d->perm[i], d->perm[i - 1] + 1))
+   break;
+  last_consecutive_idx = i;
+  consecutive_num = last_consecutive_idx + 1;
+}
+
+  int new_vlen = vlen / consecutive_num;
+  if (last_consecutive_idx < 0 || consecutive_num == vlen
+  || !pow2p_hwi (consecutive_num) || !pow2p_hwi (new_vlen))
+return false;
+  /* VEC_PERM <..., (index, index + 1, ... index + consecutive_num - 1)>.
+ All elements of index, index + 1, ... index + consecutive_num - 1 should
+ locate at the same vector.  */
+  if (maybe_ge (d->perm[0], vec_len)
+  != maybe_ge (d->perm[last_consecutive_idx], vec_len))
+return false;
+  /* If a vector has 8 elements.  We allow optimizations on consecutive
+ patterns e.g. <0, 1, 2, 3, 0, 1, 2, 3> or <4, 5, 6, 7, 4, 5, 6, 7>.
+ Other patterns like <2, 3, 4, 5, 2, 3, 4, 5> are not feasible patterns
+ to be optimized.

Re: [PATCH V2 14/14] RISC-V: P14: Adjust and add testcases

2023-10-17 Thread juzhe.zh...@rivai.ai

OK



juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-10-17 19:35
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding
Subject: [PATCH V2 14/14] RISC-V: P14: Adjust and add testcases
This sub-patch adjusts some testcases and adds some bugfix
testcases.
 
PR target/111037
PR target/111234
PR target/111725
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/scalar_move-1.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/avl_single-23.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/avl_single-46.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/avl_single-89.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/avl_single-95.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/imm_bb_prop-1.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/pr109743-2.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/pr109773-1.c: Adjust.
* gcc.target/riscv/rvv/base/pr111037-1.c: Moved to...
* gcc.target/riscv/rvv/vsetvl/pr111037-1.c: ...here.
* gcc.target/riscv/rvv/base/pr111037-2.c: Moved to...
* gcc.target/riscv/rvv/vsetvl/pr111037-2.c: ...here.
* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-12.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-3.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/vsetvl-13.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/vsetvl-18.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/vsetvl-23.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/avl_single-104.c: New test.
* gcc.target/riscv/rvv/vsetvl/avl_single-105.c: New test.
* gcc.target/riscv/rvv/vsetvl/pr111037-3.c: New test.
* gcc.target/riscv/rvv/vsetvl/pr111037-4.c: New test.
 
---
.../gcc.target/riscv/rvv/base/scalar_move-1.c |  2 +-
.../riscv/rvv/vsetvl/avl_single-104.c | 35 +++
.../riscv/rvv/vsetvl/avl_single-105.c | 23 
.../riscv/rvv/vsetvl/avl_single-23.c  |  7 ++--
.../riscv/rvv/vsetvl/avl_single-46.c  |  3 +-
.../riscv/rvv/vsetvl/avl_single-89.c  |  8 ++---
.../riscv/rvv/vsetvl/avl_single-95.c  |  2 +-
.../riscv/rvv/vsetvl/imm_bb_prop-1.c  |  7 ++--
.../gcc.target/riscv/rvv/vsetvl/pr109743-2.c  |  2 +-
.../gcc.target/riscv/rvv/vsetvl/pr109773-1.c  |  2 +-
.../riscv/rvv/{base => vsetvl}/pr111037-1.c   |  0
.../riscv/rvv/{base => vsetvl}/pr111037-2.c   |  0
.../gcc.target/riscv/rvv/vsetvl/pr111037-3.c  | 16 +
.../gcc.target/riscv/rvv/vsetvl/pr111037-4.c  | 16 +
.../riscv/rvv/vsetvl/vlmax_back_prop-25.c | 10 +++---
.../riscv/rvv/vsetvl/vlmax_back_prop-26.c | 10 +++---
.../riscv/rvv/vsetvl/vlmax_conflict-12.c  |  1 -
.../riscv/rvv/vsetvl/vlmax_conflict-3.c   |  2 +-
.../gcc.target/riscv/rvv/vsetvl/vsetvl-13.c   |  4 +--
.../gcc.target/riscv/rvv/vsetvl/vsetvl-18.c   |  4 ++-
.../gcc.target/riscv/rvv/vsetvl/vsetvl-23.c   |  2 +-
21 files changed, 125 insertions(+), 31 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-104.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-105.c
rename gcc/testsuite/gcc.target/riscv/rvv/{base => vsetvl}/pr111037-1.c (100%)
rename gcc/testsuite/gcc.target/riscv/rvv/{base => vsetvl}/pr111037-2.c (100%)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111037-3.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111037-4.c
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-1.c b/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-1.c
index 18349132a88..c833d8989e9 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/scalar_move-1.c
@@ -46,8 +46,8 @@ int32_t foo3 (int32_t *base, size_t vl)
** vl1re32\.v\tv[0-9]+,0\([a-x0-9]+\)
** vsetvli\tzero,[a-x0-9]+,e32,m1,t[au],m[au]
** vadd.vv\tv[0-9]+,\s*v[0-9]+,\s*v[0-9]+
-** vsetvli\tzero,[a-x0-9]+,e32,m2,t[au],m[au]
** vmv.x.s\t[a-x0-9]+,\s*v[0-9]+
+** vsetvli\tzero,[a-x0-9]+,e32,m2,t[au],m[au]
** vmv.v.x\tv[0-9]+,\s*[a-x0-9]+
** vmv.x.s\t[a-x0-9]+,\s*v[0-9]+
** ret
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-104.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-104.c
new file mode 100644
index 000..fb3577dcb98
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/avl_single-104.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 -fno-schedule-insns -fno-schedule-insns2 -fno-tree-vectorize" } */
+
+#include "riscv_vector.h"
+
+void
+foo (int cond, int vl, int *in, int *out, int n)
+{
+  if (cond > 30)
+{
+  vint32m1_t v = __riscv_vle32_v_i32m1 ((int32_t *) in, vl);
+  __riscv_vse32_v_i32m1 ((int32_t *) out, v, vl);
+}
+  else if (cond < 10)
+{
+  vint8mf4_t v = __riscv_vle8_v_i8mf4 ((int8_t *) in, vl);
+  v = __riscv_vle8_v_i8mf4_tu (v, (int8_t *) in + 10, vl);
+  __riscv_vse8_v_i8mf4 ((int8_t *) out, v, vl);
+}
+  else
+{
+  vl = vl * 2;
+}
+
+  for (int i = 0; i < n; i += 1)
+{
+  vint16mf2_t

Re: [PATCH V2 13/14] RISC-V: P13: Reorganize functions used to modify RTL

2023-10-17 Thread juzhe.zh...@rivai.ai

OK



juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-10-17 19:34
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding
Subject: [PATCH V2 13/14] RISC-V: P13: Reorganize functions used to modify RTL
This sub-patch reorganizes the functions used to modify RTL.
 
gcc/ChangeLog:
 
* config/riscv/riscv-vsetvl.cc (has_no_uses): Moved.
(validate_change_or_fail): Moved.
(gen_vsetvl_pat): Removed.
(emit_vsetvl_insn): Removed.
(eliminate_insn): Removed.
(change_insn): Removed.
(change_vsetvl_insn): New.
(pre_vsetvl::emit_vsetvl): New.
(pre_vsetvl::remove_avl_operand): Adjust.
(pre_vsetvl::remove_unused_dest_operand): Adjust.
(pass_vsetvl::simple_vsetvl): Adjust.
 
---
gcc/config/riscv/riscv-vsetvl.cc | 443 ---
1 file changed, 176 insertions(+), 267 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index d91b0272d9f..78816cbee15 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -680,6 +680,30 @@ get_bb_index (unsigned expr_id, unsigned num_bb)
   return expr_id % num_bb;
}
 
+/* Return true if the SET result is not used by any instructions.  */
+static bool
+has_no_uses (basic_block cfg_bb, rtx_insn *rinsn, int regno)
+{
+  if (bitmap_bit_p (df_get_live_out (cfg_bb), regno))
+return false;
+
+  rtx_insn *iter;
+  for (iter = NEXT_INSN (rinsn); iter && iter != NEXT_INSN (BB_END (cfg_bb));
+   iter = NEXT_INSN (iter))
+if (df_find_use (iter, regno_reg_rtx[regno]))
+  return false;
+
+  return true;
+}
+
+/* Change insn and Assert the change always happens.  */
+static void
+validate_change_or_fail (rtx object, rtx *loc, rtx new_rtx, bool in_group)
+{
+  bool change_p = validate_change (object, loc, new_rtx, in_group);
+  gcc_assert (change_p);
+}
+
/* This flags indicates the minimum demand of the vl and vtype values by the
RVV instruction. For example, DEMAND_RATIO_P indicates that this RVV
instruction only needs the SEW/LMUL ratio to remain the same, and does not
@@ -1126,6 +1150,28 @@ public:
   }
   }
 
+  /* Returns the corresponding vsetvl rtx pat.  */
+  rtx get_vsetvl_pat (bool ignore_vl = false) const
+  {
+rtx avl = get_avl ();
+/* if optimization == 0 and the instruction is vmv.x.s/vfmv.f.s,
+   set the value of avl to (const_int 0) so that VSETVL PASS will
+   insert vsetvl correctly.*/
+if (!get_avl ())
+  avl = GEN_INT (0);
+rtx sew = gen_int_mode (get_sew (), Pmode);
+rtx vlmul = gen_int_mode (get_vlmul (), Pmode);
+rtx ta = gen_int_mode (get_ta (), Pmode);
+rtx ma = gen_int_mode (get_ma (), Pmode);
+
+if (change_vtype_only_p ())
+  return gen_vsetvl_vtype_change_only (sew, vlmul, ta, ma);
+else if (has_reg_vl () && !ignore_vl)
+  return gen_vsetvl (Pmode, get_vl (), avl, sew, vlmul, ta, ma);
+else
+  return gen_vsetvl_discard_result (Pmode, avl, sew, vlmul, ta, ma);
+  }
+
   bool operator== (const vsetvl_info &other) const
   {
 gcc_assert (!uninit_p () && !other.uninit_p ()
@@ -1938,199 +1984,6 @@ public:
   }
};
 
-/* Emit vsetvl instruction.  */
-static rtx
-gen_vsetvl_pat (enum vsetvl_type insn_type, const vsetvl_info &info, rtx vl)
-{
-  rtx avl = info.get_avl ();
-  /* if optimization == 0 and the instruction is vmv.x.s/vfmv.f.s,
- set the value of avl to (const_int 0) so that VSETVL PASS will
- insert vsetvl correctly.*/
-  if (!info.get_avl ())
-avl = GEN_INT (0);
-  rtx sew = gen_int_mode (info.get_sew (), Pmode);
-  rtx vlmul = gen_int_mode (info.get_vlmul (), Pmode);
-  rtx ta = gen_int_mode (info.get_ta (), Pmode);
-  rtx ma = gen_int_mode (info.get_ma (), Pmode);
-
-  if (insn_type == VSETVL_NORMAL)
-{
-  gcc_assert (vl != NULL_RTX);
-  return gen_vsetvl (Pmode, vl, avl, sew, vlmul, ta, ma);
-}
-  else if (insn_type == VSETVL_VTYPE_CHANGE_ONLY)
-return gen_vsetvl_vtype_change_only (sew, vlmul, ta, ma);
-  else
-return gen_vsetvl_discard_result (Pmode, avl, sew, vlmul, ta, ma);
-}
-
-static rtx
-gen_vsetvl_pat (rtx_insn *rinsn, const vsetvl_info &info, rtx vl = NULL_RTX)
-{
-  rtx new_pat;
-  vsetvl_info new_info = info;
-  /* For vmv.x.s, use 0 for avl.  */
-  if (!info.get_avl ())
-{
-  new_info.set_avl (const0_rtx);
-  new_info.set_avl_def (nullptr);
-}
-
-  if (vl)
-new_pat = gen_vsetvl_pat (VSETVL_NORMAL, new_info, vl);
-  else
-{
-  if (vsetvl_insn_p (rinsn) && !info.change_vtype_only_p ())
- new_pat = gen_vsetvl_pat (VSETVL_NORMAL, new_info, get_vl (rinsn));
-  else if (info.change_vtype_only_p ()
-|| INSN_CODE (rinsn) == CODE_FOR_vsetvl_vtype_change_only)
- new_pat = gen_vsetvl_pat (VSETVL_VTYPE_CHANGE_ONLY, new_info, NULL_RTX);
-  else
- new_pat = gen_vsetvl_pat (VSETVL_DISCARD_RESULT, new_info, NULL_RTX);
-}
-  return new_pat;
-}
-
-static void
-emit_vsetvl_insn (enum vsetvl_type insn_type, enum emit_type emit_type,
-   const vsetvl_info &info, rtx vl, rtx_insn *rinsn)
-{

Re: [PATCH V2 12/14] RISC-V: P12: Delete riscv-vsetvl.h

2023-10-17 Thread juzhe.zh...@rivai.ai

OK



juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-10-17 19:34
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding
Subject: [PATCH V2 12/14] RISC-V: P12: Delete riscv-vsetvl.h
This sub-patch deletes the unused header file riscv-vsetvl.h,
since we no longer need to export any functions.
 
gcc/ChangeLog:
 
* config/riscv/t-riscv: Removed riscv-vsetvl.h
* config/riscv/riscv-vsetvl.h: Removed.
 
---
gcc/config/riscv/riscv-vsetvl.h | 59 -
gcc/config/riscv/t-riscv|  2 +-
2 files changed, 1 insertion(+), 60 deletions(-)
delete mode 100644 gcc/config/riscv/riscv-vsetvl.h
 
diff --git a/gcc/config/riscv/riscv-vsetvl.h b/gcc/config/riscv/riscv-vsetvl.h
deleted file mode 100644
index 16c84e0684b..000
--- a/gcc/config/riscv/riscv-vsetvl.h
+++ /dev/null
@@ -1,59 +0,0 @@
-/* VSETVL pass header for RISC-V 'V' Extension for GNU compiler.
-   Copyright (C) 2022-2023 Free Software Foundation, Inc.
-   Contributed by Juzhe Zhong (juzhe.zh...@rivai.ai), RiVAI Technologies Ltd.
-
-This file is part of GCC.
-
-GCC is free software; you can redistribute it and/or modify
-it under the terms of the GNU General Public License as published by
-the Free Software Foundation; either version 3, or(at your option)
-any later version.
-
-GCC is distributed in the hope that it will be useful,
-but WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-GNU General Public License for more details.
-
-You should have received a copy of the GNU General Public License
-along with GCC; see the file COPYING3.  If not see
-.  */
-
-#ifndef GCC_RISCV_VSETVL_H
-#define GCC_RISCV_VSETVL_H
-
-namespace riscv_vector {
-
-/* Classification of vsetvl instruction.  */
-enum vsetvl_type
-{
-  VSETVL_NORMAL,
-  VSETVL_VTYPE_CHANGE_ONLY,
-  VSETVL_DISCARD_RESULT,
-  NUM_VSETVL_TYPE
-};
-
-enum emit_type
-{
-  /* emit_insn directly.  */
-  EMIT_DIRECT,
-  EMIT_BEFORE,
-  EMIT_AFTER,
-};
-
-enum def_type
-{
-  REAL_SET = 1 << 0,
-  PHI_SET = 1 << 1,
-  BB_HEAD_SET = 1 << 2,
-  BB_END_SET = 1 << 3,
-  /* ??? TODO: In RTL_SSA framework, we have REAL_SET,
- PHI_SET, BB_HEAD_SET, BB_END_SET and
- CLOBBER_DEF def_info types. Currently,
- we conservatively do not optimize clobber
- def since we don't see the case that we
- need to optimize it.  */
-  CLOBBER_DEF = 1 << 4
-};
-
-} // namespace riscv_vector
-#endif
diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
index f137e1f17ef..dd17056fe82 100644
--- a/gcc/config/riscv/t-riscv
+++ b/gcc/config/riscv/t-riscv
@@ -64,7 +64,7 @@ riscv-vsetvl.o: $(srcdir)/config/riscv/riscv-vsetvl.cc \
   $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(REGS_H) \
   $(TARGET_H) tree-pass.h df.h rtl-ssa.h cfgcleanup.h insn-config.h \
   insn-attr.h insn-opinit.h tm-constrs.h cfgrtl.h cfganal.h lcm.h \
-  predict.h profile-count.h $(srcdir)/config/riscv/riscv-vsetvl.h \
+  predict.h profile-count.h \
   $(srcdir)/config/riscv/riscv-vsetvl.def
$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
$(srcdir)/config/riscv/riscv-vsetvl.cc
--
2.36.3

Re: [PATCH V2 09/14] RISC-V: P9: Cleanup post optimize phase

2023-10-17 Thread juzhe.zh...@rivai.ai

LGTM.



juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-10-17 19:34
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding
Subject: [PATCH V2 09/14] RISC-V: P9: Cleanup post optimize phase
This sub-patch deletes part of the post-optimization code (which is now
implemented in the main phase) and moves the remaining cleanup code into
the pre_vsetvl class.
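The remaining cleanup (remove_unused_dest_operand) rewrites a vsetvl whose VL result is never read into its discard-result form, and the "never read" test is has_no_uses. The C sketch below shows that test on a deliberately simplified, hypothetical block model: `struct toy_insn` and `toy_has_no_uses` are illustrative names, not GCC's rtx_insn/df API, and registers are limited to 0..63 so a use set fits in a `uint64_t`. Like the original, the check scans to the end of the block and does not stop at a redefinition, so it is conservative. The patch also skips VLMAX AVLs; presumably this is because, per the RVV spec, `vsetvli` with rs1 = x0 requests VLMAX only when rd != x0, so dropping the destination would change the meaning.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy instruction: at most one defined register and a
   bitmask of read registers (registers 0..63 only).  */
struct toy_insn
{
  int def_regno;  /* -1 if the insn defines nothing.  */
  uint64_t uses;  /* Bit R is set if register R is read.  */
};

/* Sketch of the has_no_uses idea: the value of REGNO set by the insn at
   index POS is unused if REGNO is not live out of the block and no later
   insn in the same block reads REGNO.  */
static bool
toy_has_no_uses (const struct toy_insn *bb, size_t n_insns, size_t pos,
		 int regno, uint64_t live_out)
{
  uint64_t bit = UINT64_C (1) << regno;

  /* Live out of the block: some successor may read it.  */
  if (live_out & bit)
    return false;

  /* Any later read inside the block counts as a use.  */
  for (size_t i = pos + 1; i < n_insns; i++)
    if (bb[i].uses & bit)
      return false;

  return true;
}
```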
 
gcc/ChangeLog:
 
* config/riscv/riscv-vsetvl.cc (pre_vsetvl::cleaup): New.
(pre_vsetvl::remove_avl_operand): New.
(pre_vsetvl::remove_unused_dest_operand): New.
(pass_vsetvl::get_vsetvl_at_end): Removed.
(local_avl_compatible_p): Removed.
(pass_vsetvl::local_eliminate_vsetvl_insn): Removed.
(get_first_vsetvl_before_rvv_insns): Removed.
(pass_vsetvl::global_eliminate_vsetvl_insn): Removed.
(pass_vsetvl::ssa_post_optimization): Removed.
(has_no_uses): Removed.
(pass_vsetvl::df_post_optimization): Removed.
(pass_vsetvl::init): Removed.
(pass_vsetvl::done): Removed.
(pass_vsetvl::lazy_vsetvl): Removed.
 
---
gcc/config/riscv/riscv-vsetvl.cc | 675 ---
1 file changed, 76 insertions(+), 599 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 5d84d290e9e..ac636623b3f 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -3791,6 +3791,82 @@ pre_vsetvl::emit_vsetvl ()
 commit_edge_insertions ();
}
 
+void
+pre_vsetvl::cleaup ()
+{
+  remove_avl_operand ();
+  remove_unused_dest_operand ();
+}
+
+void
+pre_vsetvl::remove_avl_operand ()
+{
+  for (const bb_info *bb : crtl->ssa->bbs ())
+for (insn_info *insn : bb->real_nondebug_insns ())
+  {
+ rtx_insn *rinsn = insn->rtl ();
+ /* Erase the AVL operand from the instruction.  */
+ if (!has_vl_op (rinsn) || !REG_P (get_vl (rinsn)))
+   continue;
+ rtx avl = get_vl (rinsn);
+ if (count_regno_occurrences (rinsn, REGNO (avl)) == 1)
+   {
+ /* Get the list of uses for the new instruction.  */
+ auto attempt = crtl->ssa->new_change_attempt ();
+ insn_change change (insn);
+ /* Remove the use of the substituted value.  */
+ access_array_builder uses_builder (attempt);
+ uses_builder.reserve (insn->num_uses () - 1);
+ for (use_info *use : insn->uses ())
+   if (use != find_access (insn->uses (), REGNO (avl)))
+ uses_builder.quick_push (use);
+ use_array new_uses = use_array (uses_builder.finish ());
+ change.new_uses = new_uses;
+ change.move_range = insn->ebb ()->insn_range ();
+ rtx pat;
+ if (fault_first_load_p (rinsn))
+   pat = simplify_replace_rtx (PATTERN (rinsn), avl, const0_rtx);
+ else
+   {
+ rtx set = single_set (rinsn);
+ rtx src = simplify_replace_rtx (SET_SRC (set), avl, const0_rtx);
+ pat = gen_rtx_SET (SET_DEST (set), src);
+   }
+ bool ok = change_insn (crtl->ssa, change, insn, pat);
+ gcc_assert (ok);
+   }
+  }
+}
+
+void
+pre_vsetvl::remove_unused_dest_operand ()
+{
+  df_analyze ();
+  hash_set to_delete;
+  basic_block cfg_bb;
+  rtx_insn *rinsn;
+  FOR_ALL_BB_FN (cfg_bb, cfun)
+{
+  FOR_BB_INSNS (cfg_bb, rinsn)
+ {
+   if (NONDEBUG_INSN_P (rinsn) && vsetvl_insn_p (rinsn))
+ {
+   rtx vl = get_vl (rinsn);
+   vsetvl_info info = vsetvl_info (rinsn);
+   if (has_no_uses (cfg_bb, rinsn, REGNO (vl)))
+ {
+   if (!info.has_vlmax_avl ())
+ {
+   rtx new_pat = gen_vsetvl_pat (VSETVL_DISCARD_RESULT, info,
+ NULL_RTX);
+   validate_change_or_fail (rinsn, &PATTERN (rinsn), new_pat,
+false);
+ }
+ }
+ }
+ }
+}
+}
 
const pass_data pass_data_vsetvl = {
   RTL_PASS, /* type */
@@ -3923,602 +3999,3 @@ make_pass_vsetvl (gcc::context *ctxt)
{
   return new pass_vsetvl (ctxt);
}
-
-/* Some instruction can not be accessed in RTL_SSA when we don't re-init
-   the new RTL_SSA framework but it is definetely at the END of the block.
-
-  Here we optimize the VSETVL is hoisted by LCM:
-
-   Before LCM:
- bb 1:
-   vsetvli a5,a2,e32,m1,ta,mu
- bb 2:
-   vsetvli zero,a5,e32,m1,ta,mu
-   ...
-
-   After LCM:
- bb 1:
-   vsetvli a5,a2,e32,m1,ta,mu
-   LCM INSERTED: vsetvli zero,a5,e32,m1,ta,mu --> eliminate
- bb 2:
-   ...
-   */
-rtx_insn *
-pass_vsetvl::get_vsetvl_at_end (const bb_info *bb, vector_insn_info *dem) const
-{
-  rtx_insn *end_vsetvl = BB_END (bb->cfg_bb ());
-  if (end_vsetvl && NONDEBUG_INSN_P (end_vsetvl))
-{
-  if (JUMP_P (end_vsetvl))
- end_vsetvl = PREV_INSN (end_vsetvl);
-
-  if (NONDEBUG_INSN_P (end_vsetvl)
-   && vsetvl_discard_result_insn_p (end_vsetvl))
- {
-   /* Only handle single succ. here, multiple succ. is much
-  more complicated.  */
-   if (single_succ_p (bb->cfg_bb ()))
- {
-   edge e = single_succ_edge (bb->cfg_bb ());
-   *dem = get_block_info (e->dest).local_dem;
-   return end_vsetvl;
- }
- }
-}
-  return nullptr;
-}
-
-/* This predicator should only used within same basic block.  */
-static bool
-local_avl_compatible_p (rtx avl1, rtx avl2)
-{
-  if

Re: [PATCH V2 08/14] RISC-V: P8: Unified insert and delete of vsetvl insn into Phase 4

2023-10-17 Thread juzhe.zh...@rivai.ai

LGTM.



juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-10-17 19:34
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding
Subject: [PATCH V2 08/14] RISC-V: P8: Unified insert and delete of vsetvl insn into Phase 4
This sub-patch moves the code that modifies RTL from pass_vsetvl
into the pre_vsetvl class.
 
gcc/ChangeLog:
 
* config/riscv/riscv-vsetvl.cc (pre_vsetvl::emit_vsetvl): New.
(pass_vsetvl::can_refine_vsetvl_p): Removed.
(pass_vsetvl::refine_vsetvls): Removed.
(pass_vsetvl::cleanup_vsetvls): Removed.
(pass_vsetvl::commit_vsetvls): Removed.
 
---
gcc/config/riscv/riscv-vsetvl.cc | 389 +++
1 file changed, 134 insertions(+), 255 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index a112895a283..5d84d290e9e 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -3658,6 +3658,140 @@ pre_vsetvl::pre_global_vsetvl_info ()
 }
}
 
+void
+pre_vsetvl::emit_vsetvl ()
+{
+  bool need_commit = false;
+
+  for (const bb_info *bb : crtl->ssa->bbs ())
+{
+  for (const auto &curr_info : get_block_info (bb).infos)
+ {
+   insn_info *insn = curr_info.get_insn ();
+   if (curr_info.ignore_p ())
+ {
+   if (vsetvl_insn_p (insn->rtl ()))
+ eliminate_insn (insn->rtl ());
+   continue;
+ }
+   else if (curr_info.valid_p ())
+ {
+   if (vsetvl_insn_p (insn->rtl ()))
+ {
+   const vsetvl_info temp = vsetvl_info (insn);
+   if (!(curr_info == temp))
+ {
+   if (dump_file)
+ {
+   fprintf (dump_file, "\n  Change vsetvl info from: ");
+   temp.dump (dump_file, "");
+   fprintf (dump_file, "  to: ");
+   curr_info.dump (dump_file, "");
+ }
+   change_vsetvl_insn (insn, curr_info);
+ }
+ }
+   else
+ {
+   if (dump_file)
+ {
+   fprintf (dump_file,
+"\n  Insert vsetvl info before insn %d: ",
+insn->uid ());
+   curr_info.dump (dump_file, "");
+ }
+   insert_vsetvl (EMIT_BEFORE, insn->rtl (), curr_info);
+ }
+ }
+ }
+}
+
+  for (const vsetvl_info &item : delete_list)
+{
+  gcc_assert (vsetvl_insn_p (item.get_insn ()->rtl ()));
+  eliminate_insn (item.get_insn ()->rtl ());
+}
+
+  /* Insert vsetvl as LCM suggest. */
+  for (int ed = 0; ed < NUM_EDGES (edges); ed++)
+{
+  edge eg = INDEX_EDGE (edges, ed);
+  sbitmap i = insert[ed];
+  if (bitmap_count_bits (i) < 1)
+ continue;
+
+  if (bitmap_count_bits (i) > 1)
+ /* For code with infinite loop (e.g. pr61634.c), The data flow is
+completely wrong.  */
+ continue;
+
+  gcc_assert (bitmap_count_bits (i) == 1);
+  unsigned expr_index = bitmap_first_set_bit (i);
+  const vsetvl_info &info = *exprs[expr_index];
+  gcc_assert (info.valid_p ());
+  if (dump_file)
+ {
+   fprintf (dump_file,
+"\n  Insert vsetvl info at edge(bb %u -> bb %u): ",
+eg->src->index, eg->dest->index);
+   info.dump (dump_file, "");
+ }
+  rtl_profile_for_edge (eg);
+  start_sequence ();
+
+  insn_info *insn = info.get_insn ();
+  insert_vsetvl (EMIT_DIRECT, insn->rtl (), info);
+  rtx_insn *rinsn = get_insns ();
+  end_sequence ();
+  default_rtl_profile ();
+
+  /* We should not get an abnormal edge here.  */
+  gcc_assert (!(eg->flags & EDGE_ABNORMAL));
+  need_commit = true;
+  insert_insn_on_edge (rinsn, eg);
+}
+
+  /* Insert vsetvl info that was not deleted after lift up.  */
+  for (const bb_info *bb : crtl->ssa->bbs ())
+{
+  const vsetvl_block_info &block_info = get_block_info (bb);
+  if (!block_info.has_info ())
+ continue;
+
+  const vsetvl_info &footer_info = block_info.get_footer_info ();
+  insn_info *insn = footer_info.get_insn ();
+
+  if (footer_info.ignore_p ())
+ continue;
+
+  edge eg;
+  edge_iterator eg_iterator;
+  FOR_EACH_EDGE (eg, eg_iterator, bb->cfg_bb ()->succs)
+ {
+   gcc_assert (!(eg->flags & EDGE_ABNORMAL));
+   if (dump_file)
+ {
+   fprintf (
+ dump_file,
+ "\n  Insert missed vsetvl info at edge(bb %u -> bb %u): ",
+ eg->src->index, eg->dest->index);
+   footer_info.dump (dump_file, "");
+ }
+   start_sequence ();
+   insert_vsetvl (EMIT_DIRECT, insn->rtl (), footer_info);
+   rtx_insn *rinsn = get_insns ();
+   end_sequence ();
+   default_rtl_profile ();
+   insert_insn_on_edge (rinsn, eg);
+   need_commit = true;
+ }
+}
+
+  if (need_commit)
+commit_edge_insertions ();
+}
+
+
const pass_data pass_data_vsetvl = {
   RTL_PASS, /* type */
   "vsetvl", /* name */
@@ -3790,261 +3924,6 @@ make_pass_vsetvl (gcc::context *ctxt)
   return new pass_vsetvl (ctxt);
}
 
-
-/* Return true if VSETVL in the block can be refined as vsetvl zero,zero.  */
-bool
-pass_vsetvl::can_refine_vsetvl_p (const basic_block cfg_bb,
-   const vector_insn_info ) const
-{
-  if (!m_vector_manager->all_same_ratio_p (
- m_vector_manager->vector_avin[cfg_bb->index]))
-return false;
-
-  if

Re: [PATCH V2 07/14] RISC-V: P7: Move earliest fuse and lcm code to pre_vsetvl class

2023-10-17 Thread juzhe.zh...@rivai.ai

LGTM.



juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-10-17 19:34
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding
Subject: [PATCH V2 07/14] RISC-V: P7: Move earliest fuse and lcm code to pre_vsetvl class
This patch moves the code for phases 2 and 3 from pass_vsetvl to the
pre_vsetvl class.
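Phase 2 (earliest_fuse_vsetvl_info) calls compute_earliest on the LCM bitmaps to find the earliest edges where a vsetvl expression could be placed. As a minimal sketch, the function below evaluates the classic lazy-code-motion EARLIEST equation for a single edge. It uses 32-bit masks as a hypothetical stand-in for GCC's sbitmaps (one bit per vsetvl expression). The formula follows our reading of compute_earliest in GCC's lcm.cc: ANTIN(succ) on edges out of ENTRY, the empty set on edges into EXIT, and otherwise ANTIN(succ) & ~AVOUT(pred) & (KILL(pred) | ~ANTOUT(pred)).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an sbitmap: one bit per expression,
   at most 32 expressions.  */
typedef uint32_t exprset;

/* EARLIEST for the CFG edge PRED -> SUCC.  An expression is placed
   earliest on this edge if it is anticipated at SUCC, not already
   available out of PRED, and either killed in PRED or not anticipated
   out of PRED (so it could not be hoisted further up).  */
static exprset
earliest_on_edge (bool pred_is_entry, bool succ_is_exit,
		  exprset antin_succ, exprset avout_pred,
		  exprset antout_pred, exprset kill_pred)
{
  if (pred_is_entry)
    return antin_succ;
  if (succ_is_exit)
    return 0;
  return antin_succ & ~avout_pred & (kill_pred | ~antout_pred);
}
```

Working with whole masks evaluates the equation for every expression at once, which is also why GCC keeps one bitmap per block or edge rather than per expression.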
 
gcc/ChangeLog:
 
* config/riscv/riscv-vsetvl.cc (pre_vsetvl::earliest_fuse_vsetvl_info): New.
(pre_vsetvl::pre_global_vsetvl_info): New.
(pass_vsetvl::prune_expressions): Removed.
(pass_vsetvl::compute_local_properties): Removed.
(pass_vsetvl::earliest_fusion): Removed.
(pass_vsetvl::vsetvl_fusion): Removed.
(pass_vsetvl::pre_vsetvl): Removed.
(pass_vsetvl::compute_probabilities): Removed.
 
---
gcc/config/riscv/riscv-vsetvl.cc | 829 +++
1 file changed, 398 insertions(+), 431 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index b1269e8cf4f..a112895a283 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -3260,6 +3260,404 @@ pre_vsetvl::fuse_local_vsetvl_info ()
 }
}
 
+bool
+pre_vsetvl::earliest_fuse_vsetvl_info ()
+{
+  compute_avl_def_data ();
+  compute_vsetvl_def_data ();
+  compute_vsetvl_lcm_data ();
+
+  unsigned num_exprs = exprs.length ();
+  struct edge_list *edges = create_edge_list ();
+  unsigned num_edges = NUM_EDGES (edges);
+  sbitmap *antin
+= sbitmap_vector_alloc (last_basic_block_for_fn (cfun), num_exprs);
+  sbitmap *antout
+= sbitmap_vector_alloc (last_basic_block_for_fn (cfun), num_exprs);
+
+  sbitmap *earliest = sbitmap_vector_alloc (num_edges, num_exprs);
+
+  compute_available (avloc, kill, avout, avin);
+  compute_antinout_edge (antloc, transp, antin, antout);
+  compute_earliest (edges, num_exprs, antin, antout, avout, kill, earliest);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+{
+  fprintf (dump_file, "\n  Compute LCM earliest insert data:\n\n");
+  fprintf (dump_file, "Expression List (%u):\n", num_exprs);
+  for (unsigned i = 0; i < num_exprs; i++)
+ {
+   const auto &info = *exprs[i];
+   fprintf (dump_file, "  Expr[%u]: ", i);
+   info.dump (dump_file, "");
+ }
+  fprintf (dump_file, "\nbitmap data:\n");
+  for (const bb_info *bb : crtl->ssa->bbs ())
+ {
+   unsigned int i = bb->index ();
+   fprintf (dump_file, "  BB %u:\n", i);
+   fprintf (dump_file, "avloc: ");
+   dump_bitmap_file (dump_file, avloc[i]);
+   fprintf (dump_file, "kill: ");
+   dump_bitmap_file (dump_file, kill[i]);
+   fprintf (dump_file, "antloc: ");
+   dump_bitmap_file (dump_file, antloc[i]);
+   fprintf (dump_file, "transp: ");
+   dump_bitmap_file (dump_file, transp[i]);
+
+   fprintf (dump_file, "avin: ");
+   dump_bitmap_file (dump_file, avin[i]);
+   fprintf (dump_file, "avout: ");
+   dump_bitmap_file (dump_file, avout[i]);
+   fprintf (dump_file, "antin: ");
+   dump_bitmap_file (dump_file, antin[i]);
+   fprintf (dump_file, "antout: ");
+   dump_bitmap_file (dump_file, antout[i]);
+ }
+  fprintf (dump_file, "\n");
+  fprintf (dump_file, "  earliest:\n");
+  for (unsigned ed = 0; ed < num_edges; ed++)
+ {
+   edge eg = INDEX_EDGE (edges, ed);
+
+   if (bitmap_empty_p (earliest[ed]))
+ continue;
+   fprintf (dump_file, "Edge(bb %u -> bb %u): ", eg->src->index,
+eg->dest->index);
+   dump_bitmap_file (dump_file, earliest[ed]);
+ }
+  fprintf (dump_file, "\n");
+}
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+{
+  fprintf (dump_file, "Fused global info result:\n");
+}
+
+  bool changed = false;
+  for (unsigned ed = 0; ed < num_edges; ed++)
+{
+  sbitmap e = earliest[ed];
+  if (bitmap_empty_p (e))
+ continue;
+
+  unsigned int expr_index;
+  sbitmap_iterator sbi;
+  EXECUTE_IF_SET_IN_BITMAP (e, 0, expr_index, sbi)
+ {
+   vsetvl_info &curr_info = *exprs[expr_index];
+   if (!curr_info.valid_p ())
+ continue;
+
+   edge eg = INDEX_EDGE (edges, ed);
+   if (eg->probability == profile_probability::never ())
+ continue;
+   if (eg->src == ENTRY_BLOCK_PTR_FOR_FN (cfun)
+   || eg->dest == EXIT_BLOCK_PTR_FOR_FN (cfun))
+ continue;
+
+   vsetvl_block_info &src_block_info = get_block_info (eg->src);
+   vsetvl_block_info &dest_block_info = get_block_info (eg->dest);
+
+   if (src_block_info.probability
+   == profile_probability::uninitialized ())
+ continue;
+
+   if (src_block_info.empty_p ())
+ {
+   vsetvl_info new_curr_info = curr_info;
+   new_curr_info.set_bb (crtl->ssa->bb (eg->dest));
+   bool has_compatible_p = false;
+   unsigned int def_expr_index;
+   sbitmap_iterator sbi2;
+   EXECUTE_IF_SET_IN_BITMAP (
+ vsetvl_def_in[new_curr_info.get_bb ()->index ()], 0,
+ def_expr_index, sbi2)
+ {
+   vsetvl_info &prev_info = *vsetvl_def_exprs[def_expr_index];
+   if (!prev_info.valid_p ())
+ continue;
+   if

Re: [PATCH V2 06/14] RISC-V: P6: Add computing reaching definition data flow

2023-10-17 Thread juzhe.zh...@rivai.ai

Copy and paste the original comments:

-/* Compute the local properties of each recorded expression.
-
-   Local properties are those that are defined by the block, irrespective of
-   other blocks.
-
-   An expression is transparent in a block if its operands are not modified
-   in the block.
-
-   An expression is computed (locally available) in a block if it is computed
-   at least once and expression would contain the same value if the
-   computation was moved to the end of the block.
-
-   An expression is locally anticipatable in a block if it is computed at
-   least once and expression would contain the same value if the computation
-   was moved to the beginning of the block.  */
-void
-pass_vsetvl::compute_local_properties (void)
-{
-  /* -  If T is locally available at the end of a block, then T' must be
-   available at the end of the same block. Since some optimization has
-   occurred earlier, T' might not be locally available, however, it must
-   have been previously computed on all paths. As a formula, T at AVLOC(B)
-   implies that T' at AVOUT(B).
-   An "available occurrence" is one that is the last occurrence in the
-   basic block and the operands are not modified by following statements in
-   the basic block [including this insn].
-
- -  If T is locally anticipated at the beginning of a block, then either
-   T', is locally anticipated or it is already available from previous
-   blocks. As a formula, this means that T at ANTLOC(B) implies that T' at
-   ANTLOC(B) at AVIN(B).
-   An "anticipatable occurrence" is one that is the first occurrence in the
-   basic block, the operands are not modified in the basic block prior
-   to the occurrence and the output is not used between the start of
-   the block and the occurrence.  */
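The three local properties defined in the comments above (transparent, locally available, locally anticipatable) can be illustrated on a toy block model. This is a hypothetical sketch: Insn, RegSet, LocalProps and compute_local_props are illustrative names, not GCC data structures, and GCC computes these properties as sbitmap vectors over all expressions at once rather than one expression at a time.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy model, not GCC's data structures: registers are small
// integers and every expression reads a fixed set of operand registers.
constexpr std::size_t kNumRegs = 8;
using RegSet = std::bitset<kNumRegs>;

struct Insn
{
  int expr;     // Index of the expression computed here, or -1 for none.
  RegSet defs;  // Registers written by this instruction.
};

struct LocalProps
{
  bool transp;  // No operand of the expression is modified in the block.
  bool antloc;  // Computed before any operand is modified (movable to start).
  bool avloc;   // Computed, and no operand modified afterwards (movable to end).
};

// Compute the local properties of expression EXPR, whose operands are
// OPERANDS, for the instructions of one basic block in program order.
LocalProps
compute_local_props (const std::vector<Insn> &block, int expr,
                     const RegSet &operands)
{
  LocalProps p{true, false, false};
  bool operands_modified = false;
  for (const Insn &insn : block)
    {
      if (insn.expr == expr)
        {
          if (!operands_modified)
            p.antloc = true;
          p.avloc = true;  // Tentative: cleared again by a later operand write.
        }
      // A write to an operand (even by the computing insn itself) kills
      // availability at the block end and transparency.
      if ((insn.defs & operands).any ())
        {
          operands_modified = true;
          p.transp = false;
          p.avloc = false;
        }
    }
  return p;
}
```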



juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-10-17 19:34
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding
Subject: [PATCH V2 06/14] RISC-V: P6: Add computing reaching definition data flow
This sub-patch adds some helper functions for computing reaching-definition
data, plus three functions that compute this data for different objects.
These three functions are used by phases 2 and 3.
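The forward, worklist-based problem that compute_reaching_defintion solves can be sketched in a few lines. This is a minimal sketch under stated assumptions: block 0 stands in for ENTRY and has its OUT set seeded (mirroring how the patch injects an "unknown" info at ENTRY through bitmap_union_of_preds_with_entry), blocks are queued in index order rather than reverse postorder, and Cfg, DefSet and compute_reaching_defs are hypothetical names. GCC's version works on sbitmap vectors and uses the aux field of basic blocks to track worklist membership.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

constexpr std::size_t kNumDefs = 16;
using DefSet = std::bitset<kNumDefs>;

// Hypothetical CFG: block 0 is ENTRY; edges are given in both directions.
struct Cfg
{
  std::vector<std::vector<int>> preds;
  std::vector<std::vector<int>> succs;
};

// IN(b)  = union of OUT(p) over all predecessors p (ENTRY included)
// OUT(b) = GEN(b) | (IN(b) & ~KILL(b))
void
compute_reaching_defs (const Cfg &cfg, const std::vector<DefSet> &gen,
                       const std::vector<DefSet> &kill,
                       std::vector<DefSet> &in, std::vector<DefSet> &out)
{
  const std::size_t n = cfg.succs.size ();
  in.assign (n, DefSet ());
  out.assign (n, DefSet ());
  out[0] = gen[0];  // ENTRY: seeded with its special definitions.

  // Every non-ENTRY block starts on the worklist exactly once.
  std::deque<int> worklist;
  std::vector<bool> queued (n, false);
  for (std::size_t b = 1; b < n; ++b)
    {
      worklist.push_back (static_cast<int> (b));
      queued[b] = true;
    }

  while (!worklist.empty ())
    {
      int b = worklist.front ();
      worklist.pop_front ();
      queued[b] = false;

      DefSet new_in;
      for (int p : cfg.preds[b])
        new_in |= out[p];
      in[b] = new_in;

      DefSet new_out = gen[b] | (in[b] & ~kill[b]);
      if (new_out != out[b])
        {
          // OUT changed, so successors must be (re)visited.
          out[b] = new_out;
          for (int s : cfg.succs[b])
            if (s != 0 && !queued[s])
              {
                worklist.push_back (s);
                queued[s] = true;
              }
        }
    }
}
```

Because the transfer function is monotone and OUT sets only grow from the empty set, the loop terminates once no OUT set changes.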
 
gcc/ChangeLog:
 
* config/riscv/riscv-vsetvl.cc (bitmap_union_of_preds_with_entry): New.
(compute_reaching_defintion): New.
(pre_vsetvl::compute_avl_def_data): New.
(pre_vsetvl::compute_vsetvl_def_data): New.
(pre_vsetvl::compute_vsetvl_lcm_data): New.
 
---
gcc/config/riscv/riscv-vsetvl.cc | 468 +++
1 file changed, 468 insertions(+)
 
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 33bdcec04d8..b1269e8cf4f 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -103,6 +103,121 @@ along with GCC; see the file COPYING3.  If not see
using namespace rtl_ssa;
using namespace riscv_vector;
 
+/* Set the bitmap DST to the union of SRC of predecessors of
+   basic block B.
+   It's a bit different from bitmap_union_of_preds in cfganal.cc. This function
+   takes into account the case where a predecessor is the ENTRY basic block.
+   The main reason for this difference is to make it easier to insert some
+   special value into the ENTRY basic block, for example a vsetvl_info with a
+   status of UNKNOWN.  */
+static void
+bitmap_union_of_preds_with_entry (sbitmap dst, sbitmap *src, basic_block b)
+{
+  unsigned int set_size = dst->size;
+  edge e;
+  unsigned ix;
+
+  for (ix = 0; ix < EDGE_COUNT (b->preds); ix++)
+{
+  e = EDGE_PRED (b, ix);
+  bitmap_copy (dst, src[e->src->index]);
+  break;
+}
+
+  if (ix == EDGE_COUNT (b->preds))
+bitmap_clear (dst);
+  else
+for (ix++; ix < EDGE_COUNT (b->preds); ix++)
+  {
+ unsigned int i;
+ SBITMAP_ELT_TYPE *p, *r;
+
+ e = EDGE_PRED (b, ix);
+ p = src[e->src->index]->elms;
+ r = dst->elms;
+ for (i = 0; i < set_size; i++)
+   *r++ |= *p++;
+  }
+}
+
+/* Compute the reaching definition IN and OUT sets based on the GEN and KILL
+   information in each basic block.
+   This function is modeled on the compute_available implementation in lcm.cc.  */
+static void
+compute_reaching_defintion (sbitmap *gen, sbitmap *kill, sbitmap *in,
+ sbitmap *out)
+{
+  edge e;
+  basic_block *worklist, *qin, *qout, *qend, bb;
+  unsigned int qlen;
+  edge_iterator ei;
+
+  /* Allocate a worklist array/queue.  Entries are only added to the
+ list if they were not already on the list.  So the size is
+ bounded by the number of basic blocks.  */
+  qin = qout = worklist
+= XNEWVEC (basic_block, n_basic_blocks_for_fn (cfun) - NUM_FIXED_BLOCKS);
+
+  /* Put every block on the worklist; this is necessary because of the
+ optimistic initialization of AVOUT above.  Use reverse postorder
+ to make the forward dataflow problem require fewer iterations.  */
+  int *rpo = XNEWVEC (int, n_basic_blocks_for_fn (cfun) - NUM_FIXED_BLOCKS);
+  int n =

Re: [PATCH V2 06/14] RISC-V: P6: Add computing reaching definition data flow

2023-10-17 Thread juzhe.zh...@rivai.ai


compute_vsetvl_lcm_data -> compute_lcm_local_properties




juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-10-17 19:34
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding
Subject: [PATCH V2 06/14] RISC-V: P6: Add computing reaching definition data flow
This sub-patch adds some helper functions for computing reaching-definition
data, plus three functions that compute this data for different objects.
These three functions are used by phases 2 and 3.
 
gcc/ChangeLog:
 
* config/riscv/riscv-vsetvl.cc (bitmap_union_of_preds_with_entry): New.
(compute_reaching_defintion): New.
(pre_vsetvl::compute_avl_def_data): New.
(pre_vsetvl::compute_vsetvl_def_data): New.
(pre_vsetvl::compute_vsetvl_lcm_data): New.
 
---
gcc/config/riscv/riscv-vsetvl.cc | 468 +++
1 file changed, 468 insertions(+)
 
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 33bdcec04d8..b1269e8cf4f 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -103,6 +103,121 @@ along with GCC; see the file COPYING3.  If not see
using namespace rtl_ssa;
using namespace riscv_vector;
 
+/* Set the bitmap DST to the union of SRC of predecessors of
+   basic block B.
+   It's a bit different from bitmap_union_of_preds in cfganal.cc. This function
+   takes into account the case where a predecessor is the ENTRY basic block.
+   The main reason for this difference is to make it easier to insert some
+   special value into the ENTRY basic block, for example a vsetvl_info with a
+   status of UNKNOWN.  */
+static void
+bitmap_union_of_preds_with_entry (sbitmap dst, sbitmap *src, basic_block b)
+{
+  unsigned int set_size = dst->size;
+  edge e;
+  unsigned ix;
+
+  for (ix = 0; ix < EDGE_COUNT (b->preds); ix++)
+{
+  e = EDGE_PRED (b, ix);
+  bitmap_copy (dst, src[e->src->index]);
+  break;
+}
+
+  if (ix == EDGE_COUNT (b->preds))
+bitmap_clear (dst);
+  else
+for (ix++; ix < EDGE_COUNT (b->preds); ix++)
+  {
+ unsigned int i;
+ SBITMAP_ELT_TYPE *p, *r;
+
+ e = EDGE_PRED (b, ix);
+ p = src[e->src->index]->elms;
+ r = dst->elms;
+ for (i = 0; i < set_size; i++)
+   *r++ |= *p++;
+  }
+}
+
+/* Compute the reaching definition IN and OUT sets based on the GEN and KILL
+   information in each basic block.
+   This function is modeled on the compute_available implementation in lcm.cc.  */
+static void
+compute_reaching_defintion (sbitmap *gen, sbitmap *kill, sbitmap *in,
+ sbitmap *out)
+{
+  edge e;
+  basic_block *worklist, *qin, *qout, *qend, bb;
+  unsigned int qlen;
+  edge_iterator ei;
+
+  /* Allocate a worklist array/queue.  Entries are only added to the
+ list if they were not already on the list.  So the size is
+ bounded by the number of basic blocks.  */
+  qin = qout = worklist
+= XNEWVEC (basic_block, n_basic_blocks_for_fn (cfun) - NUM_FIXED_BLOCKS);
+
+  /* Put every block on the worklist; this is necessary because of the
+ optimistic initialization of AVOUT above.  Use reverse postorder
+ to make the forward dataflow problem require fewer iterations.  */
+  int *rpo = XNEWVEC (int, n_basic_blocks_for_fn (cfun) - NUM_FIXED_BLOCKS);
+  int n = pre_and_rev_post_order_compute_fn (cfun, NULL, rpo, false);
+  for (int i = 0; i < n; ++i)
+{
+  bb = BASIC_BLOCK_FOR_FN (cfun, rpo[i]);
+  *qin++ = bb;
+  bb->aux = bb;
+}
+  free (rpo);
+
+  qin = worklist;
+  qend = &worklist[n_basic_blocks_for_fn (cfun) - NUM_FIXED_BLOCKS];
+  qlen = n_basic_blocks_for_fn (cfun) - NUM_FIXED_BLOCKS;
+
+  /* Mark blocks which are successors of the entry block so that we
+ can easily identify them below.  */
+  FOR_EACH_EDGE (e, ei, ENTRY_BLOCK_PTR_FOR_FN (cfun)->succs)
+e->dest->aux = ENTRY_BLOCK_PTR_FOR_FN (cfun);
+
+  /* Iterate until the worklist is empty.  */
+  while (qlen)
+{
+  /* Take the first entry off the worklist.  */
+  bb = *qout++;
+  qlen--;
+
+  if (qout >= qend)
+ qout = worklist;
+
+  /* Do not clear the aux field for blocks which are successors of the
+ ENTRY block.  That way we never add them to the worklist again.  */
+  if (bb->aux != ENTRY_BLOCK_PTR_FOR_FN (cfun))
+ bb->aux = NULL;
+
+  bitmap_union_of_preds_with_entry (in[bb->index], out, bb);
+
+  if (bitmap_ior_and_compl (out[bb->index], gen[bb->index], in[bb->index],
+ kill[bb->index]))
+ /* If the out state of this block changed, then we need
+to add the successors of this block to the worklist
+if they are not already on the worklist.  */
+ FOR_EACH_EDGE (e, ei, bb->succs)
+   if (!e->dest->aux && e->dest != EXIT_BLOCK_PTR_FOR_FN (cfun))
+ {
+   *qin++ = e->dest;
+   e->dest->aux = e;
+   qlen++;
+
+   if (qin >= qend)
+ qin = worklist;
+ }
+}
+
+  clear_aux_for_edges ();
+  clear_aux_for_blocks ();
+  free (worklist);
+}
+
static CONSTEXPR const unsigned ALL_SEW[] = {8, 16, 32, 64};
static CONSTEXPR const vlmul_type ALL_LMUL[]
   = {LMUL_1, LMUL_2, LMUL_4,

Re: [PATCH V2 05/14] RISC-V: P5: combine phase 1 and 2

2023-10-17 Thread juzhe.zh...@rivai.ai

LGTM on the local analysis algorithm.



juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-10-17 19:34
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding
Subject: [PATCH V2 05/14] RISC-V: P5: combine phase 1 and 2
This sub-patch combines phases 1 and 2 to use the new demand system and
delays the insertion of vsetvl insns until phase 4.
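The local fusion rule in fuse_local_vsetvl_info (drop a later info that the previous one already satisfies, merge a compatible one into it, otherwise start a new info) can be illustrated with a deliberately simplified demand model. Everything in this sketch is hypothetical: Info carries only an AVL and an SEW (0 meaning "any SEW"), unknown infos are ignored, and available_with/compatible_with are stand-ins for the much richer checks in demand_system.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, heavily simplified vsetvl demand. The real vsetvl_info
// also tracks LMUL, ratio, tail/mask policies and unknown states.
struct Info
{
  int avl;
  int sew;  // 0 means "any SEW is acceptable".
};

// PREV already provides everything CURR demands.
bool
available_with (const Info &prev, const Info &curr)
{
  return prev.avl == curr.avl && (curr.sew == 0 || curr.sew == prev.sew);
}

// PREV can be strengthened so that it also satisfies CURR.
bool
compatible_with (const Info &prev, const Info &curr)
{
  return prev.avl == curr.avl && prev.sew == 0;
}

// Fuse the infos of one block, in program order; return the infos that
// still need a vsetvl.
std::vector<Info>
fuse_local (const std::vector<Info> &infos)
{
  std::vector<Info> result;
  if (infos.empty ())
    return result;
  Info prev = infos[0];
  for (std::size_t i = 1; i < infos.size (); ++i)
    {
      const Info &curr = infos[i];
      if (available_with (prev, curr))
        continue;             // CURR is redundant.
      if (compatible_with (prev, curr))
        prev.sew = curr.sew;  // Merge CURR's demand into PREV.
      else
        {
          result.push_back (prev);  // Incompatible: PREV is final.
          prev = curr;
        }
    }
  result.push_back (prev);
  return result;
}
```

For the sequence {4, any}, {4, 32}, {4, 32}, {8, 16}, the first two infos merge, the third is dropped as redundant, and the AVL change forces a second vsetvl.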
 
gcc/ChangeLog:
 
* config/riscv/riscv-vsetvl.cc (pre_vsetvl::fuse_local_vsetvl_info): New.
(pass_vsetvl::compute_local_backward_infos): Removed.
(pass_vsetvl::need_vsetvl): Removed.
(pass_vsetvl::transfer_before): Removed.
(pass_vsetvl::transfer_after): Removed.
(pass_vsetvl::emit_local_forward_vsetvls): Removed.
 
---
gcc/config/riscv/riscv-vsetvl.cc | 269 ++-
1 file changed, 123 insertions(+), 146 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 3f07fde782f..33bdcec04d8 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2669,6 +2669,129 @@ public:
   }
};
 
+void
+pre_vsetvl::fuse_local_vsetvl_info ()
+{
+  reg_def_loc
+= sbitmap_vector_alloc (last_basic_block_for_fn (cfun), GP_REG_LAST + 1);
+  bitmap_vector_clear (reg_def_loc, last_basic_block_for_fn (cfun));
+  bitmap_ones (reg_def_loc[ENTRY_BLOCK_PTR_FOR_FN (cfun)->index]);
+
+  for (bb_info *bb : crtl->ssa->bbs ())
+{
+  auto &block_info = get_block_info (bb);
+  block_info.m_bb = bb;
+  if (dump_file && (dump_flags & TDF_DETAILS))
+ {
+   fprintf (dump_file, "  Try fuse basic block %d\n", bb->index ());
+ }
+  auto_vec<vsetvl_info> infos;
+  for (insn_info *insn : bb->real_nondebug_insns ())
+ {
+   vsetvl_info curr_info = vsetvl_info (insn);
+   if (curr_info.valid_p () || curr_info.unknown_p ())
+ infos.safe_push (curr_info);
+
+   /* Collecting GP registers modified by the current bb.  */
+   if (insn->is_real ())
+ for (def_info *def : insn->defs ())
+   if (def->is_reg () && GP_REG_P (def->regno ()))
+ bitmap_set_bit (reg_def_loc[bb->index ()], def->regno ());
+ }
+
+  vsetvl_info prev_info = vsetvl_info ();
+  prev_info.set_empty ();
+  for (auto &curr_info : infos)
+ {
+   if (prev_info.empty_p ())
+ prev_info = curr_info;
+   else if ((curr_info.unknown_p () && prev_info.valid_p ())
+|| (curr_info.valid_p () && prev_info.unknown_p ()))
+ {
+   block_info.infos.safe_push (prev_info);
+   prev_info = curr_info;
+ }
+   else if (curr_info.valid_p () && prev_info.valid_p ())
+ {
+   if (dem.available_with (prev_info, curr_info))
+ {
+   if (dump_file && (dump_flags & TDF_DETAILS))
+ {
+   fprintf (dump_file,
+"Ignore curr info since prev info "
+"available with it:\n");
+   fprintf (dump_file, "  prev_info: ");
+   prev_info.dump (dump_file, "");
+   fprintf (dump_file, "  curr_info: ");
+   curr_info.dump (dump_file, "");
+   fprintf (dump_file, "\n");
+ }
+   if (!curr_info.use_by_non_rvv_insn_p ()
+   && vsetvl_insn_p (curr_info.get_insn ()->rtl ()))
+ delete_list.safe_push (curr_info);
+
+   if (curr_info.get_read_vl_insn ())
+ prev_info.set_read_vl_insn (curr_info.get_read_vl_insn ());
+ }
+   else if (dem.compatible_with (prev_info, curr_info))
+ {
+   if (dump_file && (dump_flags & TDF_DETAILS))
+ {
+   fprintf (dump_file, "Fuse curr info since prev info "
+   "compatible with it:\n");
+   fprintf (dump_file, "  prev_info: ");
+   prev_info.dump (dump_file, "");
+   fprintf (dump_file, "  curr_info: ");
+   curr_info.dump (dump_file, "");
+ }
+   dem.merge_with (prev_info, curr_info);
+   if (curr_info.get_read_vl_insn ())
+ prev_info.set_read_vl_insn (curr_info.get_read_vl_insn ());
+   if (dump_file && (dump_flags & TDF_DETAILS))
+ {
+   fprintf (dump_file, "  prev_info after fused: ");
+   prev_info.dump (dump_file, "");
+   fprintf (dump_file, "\n");
+ }
+ }
+   else
+ {
+   if (dump_file && (dump_flags & TDF_DETAILS))
+ {
+   fprintf (dump_file,
+"Cannot fuse uncompatible infos:\n");
+   fprintf (dump_file, "  prev_info: ");
+   prev_info.dump (dump_file, "   ");
+   fprintf (dump_file, "  curr_info: ");
+   curr_info.dump (dump_file, "   ");
+ }
+   block_info.infos.safe_push (prev_info);
+   prev_info = curr_info;
+ }
+ }
+ }
+
+  if (prev_info.valid_p () || prev_info.unknown_p ())
+ block_info.infos.safe_push (prev_info);
+}
+
+  avl_regs = sbitmap_alloc (GP_REG_LAST + 1);
+  bitmap_clear (avl_regs);
+  for (const bb_info *bb : crtl->ssa->bbs ())
+{
+  vsetvl_block_info &block_info = get_block_info (bb);
+  if (block_info.empty_p ())
+ continue;
+
+  vsetvl_info &header_info = block_info.get_header_info ();
+  if (header_info.valid_p () && header_info.has_reg_avl ())
+ {
+   gcc_assert (GP_REG_P (REGNO (header_info.get_avl ())));
+

Re: [PATCH V2 11/14] RISC-V: P11: Adjust vector_block_info to vsetvl_block_info class

2023-10-17 Thread juzhe.zh...@rivai.ai

+  const vsetvl_info &get_header_info () const
+  {
+gcc_assert (!empty_p ());
+return infos.is_empty () ? m_info : infos[0];
+  }

Rename it to get_entry_info (to be consistent with the naming in
mode-switching, which also uses LCM).

+  const vsetvl_info &get_footer_info () const
+  {
+gcc_assert (!empty_p ());
+return infos.is_empty () ? m_info : infos[infos.length () - 1];
+  }

Rename it to get_exit_info (to be consistent with the naming in
mode-switching, which also uses LCM).



juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-10-17 19:34
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding
Subject: [PATCH V2 11/14] RISC-V: P11: Adjust vector_block_info to vsetvl_block_info class
This sub-patch adjusts the vector_block_info code and renames it to
vsetvl_block_info.
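The header/footer accessors of vsetvl_block_info fall back to the block-level m_info when the list of local infos is empty; otherwise they return the first and last local info. The hypothetical, int-based stand-in below illustrates that contract using the get_entry_info/get_exit_info names suggested in review. It is not the class from the patch; here 0 plays the role of an empty info.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for vsetvl_block_info: INFOS holds the local
// infos of a block in program order; M_INFO is a single block-level info
// that is only used while INFOS is empty.
struct BlockInfo
{
  std::vector<int> infos;
  int m_info = 0;  // 0 plays the role of "empty".

  bool has_info () const { return m_info != 0; }
  bool empty_p () const { return infos.empty () && !has_info (); }

  // The info demanded when control enters the block.
  const int &get_entry_info () const
  {
    assert (!empty_p ());
    return infos.empty () ? m_info : infos.front ();
  }

  // The info still in effect when control leaves the block.
  const int &get_exit_info () const
  {
    assert (!empty_p ());
    return infos.empty () ? m_info : infos.back ();
  }
};
```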
 
gcc/ChangeLog:
 
* config/riscv/riscv-vsetvl.cc (class vsetvl_block_info): New.
* config/riscv/riscv-vsetvl.h (struct vector_block_info): Removed.
 
---
gcc/config/riscv/riscv-vsetvl.cc | 55 +++-
gcc/config/riscv/riscv-vsetvl.h  | 14 
2 files changed, 54 insertions(+), 15 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index b5ed1ea774a..d91b0272d9f 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -85,7 +85,6 @@ along with GCC; see the file COPYING3.  If not see
#include "predict.h"
#include "profile-count.h"
#include "gcse.h"
-#include "riscv-vsetvl.h"
 
using namespace rtl_ssa;
using namespace riscv_vector;
@@ -1218,6 +1217,60 @@ public:
   }
};
 
+class vsetvl_block_info
+{
+public:
+  /* The static execute probability of the demand info.  */
+  profile_probability probability;
+
+  auto_vec<vsetvl_info> infos;
+  vsetvl_info m_info;
+  bb_info *m_bb;
+
+  bool full_available;
+
+  vsetvl_block_info () : m_bb (nullptr), full_available (false)
+  {
+infos.safe_grow_cleared (0);
+m_info.set_empty ();
+  }
+  vsetvl_block_info (const vsetvl_block_info &other)
+: probability (other.probability), infos (other.infos.copy ()),
+  m_info (other.m_info), m_bb (other.m_bb)
+  {}
+
+  vsetvl_info &get_header_info ()
+  {
+gcc_assert (!empty_p ());
+return infos.is_empty () ? m_info : infos[0];
+  }
+  vsetvl_info &get_footer_info ()
+  {
+gcc_assert (!empty_p ());
+return infos.is_empty () ? m_info : infos[infos.length () - 1];
+  }
+  const vsetvl_info &get_header_info () const
+  {
+gcc_assert (!empty_p ());
+return infos.is_empty () ? m_info : infos[0];
+  }
+  const vsetvl_info &get_footer_info () const
+  {
+gcc_assert (!empty_p ());
+return infos.is_empty () ? m_info : infos[infos.length () - 1];
+  }
+
+  bool empty_p () const { return infos.is_empty () && !has_info (); }
+  bool has_info () const { return !m_info.empty_p (); }
+  void set_info (const vsetvl_info &info)
+  {
+gcc_assert (infos.is_empty ());
+m_info = info;
+m_info.set_bb (m_bb);
+  }
+  void set_empty_info () { m_info.set_empty (); }
+};
+
class demand_system
{
private:
diff --git a/gcc/config/riscv/riscv-vsetvl.h b/gcc/config/riscv/riscv-vsetvl.h
index 96e36403af7..16c84e0684b 100644
--- a/gcc/config/riscv/riscv-vsetvl.h
+++ b/gcc/config/riscv/riscv-vsetvl.h
@@ -55,19 +55,5 @@ enum def_type
   CLOBBER_DEF = 1 << 4
};
 
-struct vector_block_info
-{
-  /* The local_dem vector insn_info of the block.  */
-  vector_insn_info local_dem;
-
-  /* The reaching_out vector insn_info of the block.  */
-  vector_insn_info reaching_out;
-
-  /* The static execute probability of the demand info.  */
-  profile_probability probability;
-
-  vector_block_info () = default;
-};
-
} // namespace riscv_vector
#endif
--
2.36.3

Re: [PATCH V2 04/14] RISC-V: P4: move method from pass_vsetvl to pre_vsetvl

2023-10-17 Thread juzhe.zh...@rivai.ai

LGTM for this patch.



juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-10-17 19:34
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding
Subject: [PATCH V2 04/14] RISC-V: P4: move method from pass_vsetvl to pre_vsetvl
This sub-patch moves the methods that optimize vsetvl infos into
the pre_vsetvl class.
 
gcc/ChangeLog:
 
* config/riscv/riscv-vsetvl.cc (pass_vsetvl::get_vector_info): Removed.
(pass_vsetvl::get_block_info): Removed.
(pass_vsetvl::update_vector_info): Removed.
(pass_vsetvl::update_block_info): Removed.
(pass_vsetvl::simple_vsetvl): Removed.
(pass_vsetvl::lazy_vsetvl): Removed.
(pass_vsetvl::execute): Removed.
(make_pass_vsetvl): Removed.
 
---
gcc/config/riscv/riscv-vsetvl.cc | 228 ---
1 file changed, 87 insertions(+), 141 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index c219ad178bb..3f07fde782f 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2684,54 +2684,8 @@ const pass_data pass_data_vsetvl = {
class pass_vsetvl : public rtl_opt_pass
{
private:
-  vector_infos_manager *m_vector_manager;
-
-  const vector_insn_info &get_vector_info (const rtx_insn *) const;
-  const vector_insn_info &get_vector_info (const insn_info *) const;
-  const vector_block_info &get_block_info (const basic_block) const;
-  const vector_block_info &get_block_info (const bb_info *) const;
-  vector_block_info &get_block_info (const basic_block);
-  vector_block_info &get_block_info (const bb_info *);
-  void update_vector_info (const insn_info *, const vector_insn_info &);
-  void update_block_info (int, profile_probability, const vector_insn_info &);
-
-  void simple_vsetvl (void) const;
-  void lazy_vsetvl (void);
-
-  /* Phase 1.  */
-  void compute_local_backward_infos (const bb_info *);
-
-  /* Phase 2.  */
-  bool need_vsetvl (const vector_insn_info &, const vector_insn_info &) const;
-  void transfer_before (vector_insn_info &, insn_info *) const;
-  void transfer_after (vector_insn_info &, insn_info *) const;
-  void emit_local_forward_vsetvls (const bb_info *);
-
-  /* Phase 3.  */
-  bool earliest_fusion (void);
-  void vsetvl_fusion (void);
-
-  /* Phase 4.  */
-  void prune_expressions (void);
-  void compute_local_properties (void);
-  bool can_refine_vsetvl_p (const basic_block, const vector_insn_info &) const;
-  void refine_vsetvls (void) const;
-  void cleanup_vsetvls (void);
-  bool commit_vsetvls (void);
-  void pre_vsetvl (void);
-
-  /* Phase 5.  */
-  rtx_insn *get_vsetvl_at_end (const bb_info *, vector_insn_info *) const;
-  void local_eliminate_vsetvl_insn (const bb_info *) const;
-  bool global_eliminate_vsetvl_insn (const bb_info *) const;
-  void ssa_post_optimization (void) const;
-
-  /* Phase 6.  */
-  void df_post_optimization (void) const;
-
-  void init (void);
-  void done (void);
-  void compute_probabilities (void);
+  void simple_vsetvl ();
+  void lazy_vsetvl ();
 
public:
   pass_vsetvl (gcc::context *ctxt) : rtl_opt_pass (pass_data_vsetvl, ctxt) {}
@@ -2741,69 +2695,11 @@ public:
   virtual unsigned int execute (function *) final override;
}; // class pass_vsetvl
 
-const vector_insn_info &
-pass_vsetvl::get_vector_info (const rtx_insn *i) const
-{
-  return m_vector_manager->vector_insn_infos[INSN_UID (i)];
-}
-
-const vector_insn_info &
-pass_vsetvl::get_vector_info (const insn_info *i) const
-{
-  return m_vector_manager->vector_insn_infos[i->uid ()];
-}
-
-const vector_block_info &
-pass_vsetvl::get_block_info (const basic_block bb) const
-{
-  return m_vector_manager->vector_block_infos[bb->index];
-}
-
-const vector_block_info &
-pass_vsetvl::get_block_info (const bb_info *bb) const
-{
-  return m_vector_manager->vector_block_infos[bb->index ()];
-}
-
-vector_block_info &
-pass_vsetvl::get_block_info (const basic_block bb)
-{
-  return m_vector_manager->vector_block_infos[bb->index];
-}
-
-vector_block_info &
-pass_vsetvl::get_block_info (const bb_info *bb)
-{
-  return m_vector_manager->vector_block_infos[bb->index ()];
-}
-
-void
-pass_vsetvl::update_vector_info (const insn_info *i,
- const vector_insn_info &new_info)
-{
-  m_vector_manager->vector_insn_infos[i->uid ()] = new_info;
-}
-
void
-pass_vsetvl::update_block_info (int index, profile_probability prob,
- const vector_insn_info &new_info)
-{
-  m_vector_manager->vector_block_infos[index].probability = prob;
-  if (m_vector_manager->vector_block_infos[index].local_dem
-  == m_vector_manager->vector_block_infos[index].reaching_out)
-m_vector_manager->vector_block_infos[index].local_dem = new_info;
-  m_vector_manager->vector_block_infos[index].reaching_out = new_info;
-}
-
-/* Simple m_vsetvl_insert vsetvl for optimize == 0.  */
-void
-pass_vsetvl::simple_vsetvl (void) const
+pass_vsetvl::simple_vsetvl ()
{
   if (dump_file)
-fprintf (dump_file,
-  "\nEntering Simple VSETVL PASS and Handling %d basic blocks for "
-  "function:%s\n",
-  n_basic_blocks_for_fn (cfun),

Re: [PATCH V2 03/14] RISC-V: P3: Refactor vector_infos_manager

2023-10-17 Thread juzhe.zh...@rivai.ai

+  demand_system dem;
+  auto_vec vector_block_infos;
+
+  /* data for avl reaching definition.  */
+  sbitmap avl_regs;
+  sbitmap *avl_def_in;
+  sbitmap *avl_def_out;
+  sbitmap *reg_def_loc;
+
+  /* data for vsetvl info reaching definition.  */
+  vsetvl_info unknow_info;
+  auto_vec<vsetvl_info *> vsetvl_def_exprs;
+  sbitmap *vsetvl_def_in;
+  sbitmap *vsetvl_def_out;
+
+  /* data for lcm */
+  auto_vec<vsetvl_info *> exprs;
+  sbitmap *avloc;
+  sbitmap *avin;
+  sbitmap *avout;
+  sbitmap *kill;
+  sbitmap *antloc;
+  sbitmap *transp;
+  sbitmap *insert;
+  sbitmap *del;
+  struct edge_list *edges;
+
+  auto_vec<vsetvl_info> delete_list;

Please add the "m_" prefix to all of them.

earliest_fusion_worthwhile_p -> successors_probability_equal_p

calculate_dominance_info (CDI_POST_DOMINATORS); ---> remove
free_dominance_info (CDI_POST_DOMINATORS); ---> remove



juzhe.zh...@rivai.ai
 
From: Lehua Ding
Date: 2023-10-17 19:34
To: gcc-patches
CC: juzhe.zhong; kito.cheng; rdapp.gcc; palmer; jeffreyalaw; lehua.ding
Subject: [PATCH V2 03/14] RISC-V: P3: Refactor vector_infos_manager
This sub-patch refactors vector_infos_manager into a pre_vsetvl class,
which is responsible for the entire lazy vsetvl job. There is no need
for a separate vsetvl infos manager, because the vsetvl infos are
modified by the optimization code itself.
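The vector_infos_manager::create_expr and get_expr_id helpers removed by this patch deduplicate expressions by linear search, so that each distinct info gets a stable index that can serve as its bit position in the LCM bitmaps (avloc, kill, antloc, ...). A hypothetical sketch of that idea follows; ExprTable and intern are illustrative names, and unlike the original, which stores pointers, this version stores copies of the values.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical expression table: each distinct value gets a stable index,
// suitable for use as a bit position in per-block dataflow bitmaps.
template <typename T>
class ExprTable
{
public:
  // Return the index of INFO, appending it first if it is not present.
  std::size_t intern (const T &info)
  {
    for (std::size_t i = 0; i < m_exprs.size (); ++i)
      if (m_exprs[i] == info)
        return i;
    m_exprs.push_back (info);
    return m_exprs.size () - 1;
  }

  std::size_t size () const { return m_exprs.size (); }

private:
  std::vector<T> m_exprs;
};
```

Linear search keeps the sketch simple; it is quadratic in the number of distinct infos, which is presumably acceptable when a function has few distinct vsetvl configurations.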
 
gcc/ChangeLog:
 
* config/riscv/riscv-vsetvl.cc (vector_infos_manager::vector_infos_manager): 
Removed.
(class pre_vsetvl): New class.
(vector_infos_manager::create_expr): Removed.
(vector_infos_manager::get_expr_id): Removed.
(vector_infos_manager::all_same_ratio_p): Removed.
(vector_infos_manager::all_avail_in_compatible_p): Removed.
(vector_infos_manager::all_same_avl_p): Removed.
(vector_infos_manager::expr_set_num): Removed.
(vector_infos_manager::release): Removed.
(vector_infos_manager::create_bitmap_vectors): Removed.
(vector_infos_manager::free_bitmap_vectors): Removed.
(vector_infos_manager::dump): Removed.
* config/riscv/riscv-vsetvl.h (class vector_infos_manager): Removed.
 
---
gcc/config/riscv/riscv-vsetvl.cc | 632 +--
gcc/config/riscv/riscv-vsetvl.h  |  75 
2 files changed, 257 insertions(+), 450 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index be40b6fdf4c..c219ad178bb 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2390,402 +2390,284 @@ public:
   }
};
 
-vector_infos_manager::vector_infos_manager ()
+class pre_vsetvl
{
-  vector_edge_list = nullptr;
-  vector_kill = nullptr;
-  vector_del = nullptr;
-  vector_insert = nullptr;
-  vector_antic = nullptr;
-  vector_transp = nullptr;
-  vector_comp = nullptr;
-  vector_avin = nullptr;
-  vector_avout = nullptr;
-  vector_antin = nullptr;
-  vector_antout = nullptr;
-  vector_earliest = nullptr;
-  vector_insn_infos.safe_grow_cleared (get_max_uid ());
-  vector_block_infos.safe_grow_cleared (last_basic_block_for_fn (cfun));
-  if (!optimize)
-{
-  basic_block cfg_bb;
-  rtx_insn *rinsn;
-  FOR_ALL_BB_FN (cfg_bb, cfun)
- {
-   vector_block_infos[cfg_bb->index].local_dem = vector_insn_info ();
-   vector_block_infos[cfg_bb->index].reaching_out = vector_insn_info ();
-   FOR_BB_INSNS (cfg_bb, rinsn)
- vector_insn_infos[INSN_UID (rinsn)].parse_insn (rinsn);
- }
-}
-  else
-{
-  for (const bb_info *bb : crtl->ssa->bbs ())
- {
-   vector_block_infos[bb->index ()].local_dem = vector_insn_info ();
-   vector_block_infos[bb->index ()].reaching_out = vector_insn_info ();
-   for (insn_info *insn : bb->real_insns ())
- vector_insn_infos[insn->uid ()].parse_insn (insn);
-   vector_block_infos[bb->index ()].probability = profile_probability ();
- }
-}
-}
-
-void
-vector_infos_manager::create_expr (vector_insn_info &info)
-{
-  for (size_t i = 0; i < vector_exprs.length (); i++)
-if (*vector_exprs[i] == info)
-  return;
-  vector_exprs.safe_push (&info);
-}
-
-size_t
-vector_infos_manager::get_expr_id (const vector_insn_info &info) const
-{
-  for (size_t i = 0; i < vector_exprs.length (); i++)
-if (*vector_exprs[i] == info)
-  return i;
-  gcc_unreachable ();
-}
-
-auto_vec<size_t>
-vector_infos_manager::get_all_available_exprs (
-  const vector_insn_info &info) const
-{
-  auto_vec<size_t> available_list;
-  for (size_t i = 0; i < vector_exprs.length (); i++)
-if (info.available_p (*vector_exprs[i]))
-  available_list.safe_push (i);
-  return available_list;
-}
-
-bool
-vector_infos_manager::all_same_ratio_p (sbitmap bitdata) const
-{
-  if (bitmap_empty_p (bitdata))
-return false;
-
-  int ratio = -1;
-  unsigned int bb_index;
-  sbitmap_iterator sbi;
-
-  EXECUTE_IF_SET_IN_BITMAP (bitdata, 0, bb_index, sbi)
-{
-  if (ratio == -1)
- ratio = vector_exprs[bb_index]->get_ratio ();
-  else if (vector_exprs[bb_index]->get_ratio () != ratio)
- return false;
-}
-  return true;
-}
-
-/* Return TRUE if the incoming vector configuration state
-   to CFG_BB is compatible with the vector configuration
-   state in CFG_BB, FALSE

Re: [PATCH V2 00/14] Refactor and cleanup vsetvl pass

2023-10-17 Thread Lehua Ding


Hi Patrick,

Thanks a lot for reporting these failures; this is very important. I'll look into the causes, since my previous run with these parameters
(-march=gcv_zvfh_zfh + -mcmodel=medany + spike) did not encounter these failures.


On 2023/10/18 4:25, Patrick O'Neill wrote:

Hi Lehua!

I ran the GCC testsuite on QEMU for rv32gcv and rv64gcv, before and after
applying your patches on top of 305034e3 [1].


Baseline
    = Summary of gcc testsuite =
                             | # of unexpected case / # of unique unexpected case
                             |          gcc |          g++ |     gfortran |
     rv32gcv/ ilp32d/ medlow |  208 /    78 |   29 /    17 |   71 /    24 |
     rv64gcv/  lp64d/ medlow |  101 /    54 |   13 /     4 |   33 /    13 |

After applying patch series:
    = Summary of gcc testsuite =
                             | # of unexpected case / # of unique unexpected case
                             |          gcc |          g++ |     gfortran |
     rv32gcv/ ilp32d/ medlow |  256 /    96 |   29 /    17 |   69 /    23 |
     rv64gcv/  lp64d/ medlow |  152 /    74 |   13 /     4 |   31 /    12 |

I'm seeing:
20 new unique gcc failures on rv64gcv [2]
18 new unique gcc failures on rv32gcv [3]

Thanks,
Patrick

[1] Build commands:
git clone https://github.com/patrick-rivos/riscv-gnu-toolchain.git
cd riscv-gnu-toolchain
git submodule update --init gcc
cd gcc
git checkout 305034e3
cd ..
mkdir build
cd build
../configure --prefix=$(pwd) --with-multilib-generator="rv64gcv-lp64d--;rv32gcv-ilp32d--"

make report-linux -j32

Note: If you'd prefer to use upstream riscv-gnu-toolchain, I'm fairly sure
you can do:

mkdir build-64
cd build-64
../configure --prefix=$(pwd) --with-arch=rv64gcv --with-abi=lp64d
cd ..
mkdir build-32
cd build-32
../configure --prefix=$(pwd) --with-arch=rv32gcv --with-abi=ilp32d

This will create two build folders; run make report-linux in each of them.

[2] rv64gcv New failures:
FAIL: gcc.dg/vect/slp-7.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/slp-7.c execution test
FAIL: gcc.target/riscv/zero-scratch-regs-2.c   -O3 -g scan-assembler-not \\mvsetvli
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-34.c   -O1 scan-assembler-times vsetvli 1
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-34.c   -O2 scan-assembler-times vsetvli 1
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-34.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler-times vsetvli 1
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-34.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times vsetvli 1
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-34.c   -Os scan-assembler-times vsetvli 1
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-37.c   -O1 scan-assembler-times vsetvli 3
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-37.c   -O2 scan-assembler-times vsetvli 3
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-37.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler-times vsetvli 3
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-37.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times vsetvli 3
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-37.c   -Os scan-assembler-times vsetvli 3
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-38.c   -O1 scan-assembler-times vsetvli 4
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-38.c   -O2 scan-assembler-times vsetvli 4
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-38.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler-times vsetvli 4
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-38.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times vsetvli 4
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-47.c   -O1 scan-assembler-times vsetvli 2
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-47.c   -O2 scan-assembler-times vsetvli 2
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-47.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler-times vsetvli 2
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-47.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times vsetvli 2
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-47.c   -Os scan-assembler-times vsetvli 2
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-48.c   -O1 scan-assembler-times vsetvli 2
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-48.c   -O2 scan-assembler-times vsetvli 2
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-48.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler-times vsetvli 2
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-48.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times vsetvli 2
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-49.c   -O1 scan-assembler-times vsetvli 2
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-49.c   -O2 scan-assembler-times vsetvli 2
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-49.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler-times vsetvli 2

[r14-4629 Regression] FAIL: gcc.dg/vect/vect-simd-clone-18f.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 2 on Linux/x86_64

2023-10-17 Thread Jiang, Haochen

On Linux/x86_64,

3179ad72f67f31824c444ef30ef171ad7495d274 is the first bad commit
commit 3179ad72f67f31824c444ef30ef171ad7495d274
Author: Richard Biener rguent...@suse.de
Date:   Fri Oct 13 12:32:51 2023 +0200

OMP SIMD inbranch call vectorization for AVX512 style masks

caused

FAIL: gcc.dg/vect/vect-simd-clone-16b.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 3
FAIL: gcc.dg/vect/vect-simd-clone-16.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 2
FAIL: gcc.dg/vect/vect-simd-clone-16e.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 3
FAIL: gcc.dg/vect/vect-simd-clone-16f.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 2
FAIL: gcc.dg/vect/vect-simd-clone-17b.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 3
FAIL: gcc.dg/vect/vect-simd-clone-17.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 2
FAIL: gcc.dg/vect/vect-simd-clone-17e.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 3
FAIL: gcc.dg/vect/vect-simd-clone-17f.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 2
FAIL: gcc.dg/vect/vect-simd-clone-18b.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 3
FAIL: gcc.dg/vect/vect-simd-clone-18.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 2
FAIL: gcc.dg/vect/vect-simd-clone-18e.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 3
FAIL: gcc.dg/vect/vect-simd-clone-18f.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 2

with GCC configured with

../../gcc/configure --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-4629/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-16b.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-16b.c --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-16.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-16.c --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-16e.c --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-16f.c --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-17b.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-17b.c --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-17.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-17.c --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-17e.c --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-17f.c --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-18b.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-18b.c --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-18.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-18.c --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-18e.c --target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-18f.c --target_board='unix{-m64\ -march=cascadelake}'"

(If you run into problems related to cascadelake, disabling AVX512F on the
command line might work around them.)
(However, please make sure that there are no potential problems with AVX512.)

Re: [PATCH] LoongArch: Use fcmp.caf.s instead of movgr2cf for zeroing a fcc

2023-10-17 Thread chenglulu




On 2023/10/17 10:24 PM, WANG Xuerui wrote:


On 10/17/23 22:06, Xi Ruoyao wrote:

During the review of a LLVM change [1], on LA464 we found that zeroing

"an" LLVM change (because the word LLVM is pronounced letter-by-letter)

a fcc with fcmp.caf.s is much faster than a movgr2cf from $r0.

Similarly, "an" fcc


[1]: https://github.com/llvm/llvm-project/pull/69300

gcc/ChangeLog:

* config/loongarch/loongarch.md (movfcc): Use fcmp.caf.s for
zeroing a fcc.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?


Ok!

Thanks.



  gcc/config/loongarch/loongarch.md | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md

index 68897799505..743e75907a6 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -2151,7 +2151,7 @@ (define_insn "movfcc"
    [(set (match_operand:FCC 0 "register_operand" "=z")
  (const_int 0))]
    ""
-  "movgr2cf\t%0,$r0")
+  "fcmp.caf.s\t%0,$f0,$f0")
    ;; Conditional move instructions.

Trivial enough, so this LGTM apart from the grammatical nits. (Whoever
pushes this patch could simply amend it themselves, so there may be
no need for a v2.) Thanks!

[PATCH v1] RISC-V: Remove the type size restriction of vectorizer

2023-10-17 Thread pan2 . li

From: Pan Li 

vectorizable_call has a restriction on the size of the data types:
DF to DI is allowed, but SF to DI is not. You may see the message below
when trying to vectorize a function call such as lrintf.

void
test_lrintf (long *out, float *in, unsigned count)
{
  for (unsigned i = 0; i < count; i++)
    out[i] = __builtin_lrintf (in[i]);
}

lrintf.c:5:26: missed: couldn't vectorize loop
lrintf.c:5:26: missed: not vectorized: unsupported data-type

As a result, standard name patterns such as lrintmn2 cannot be used when
the data types differ in size, e.g. SF => DI. This patch removes this data
type size check to unblock standard names like lrintmn2.
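To make the size mismatch concrete, here is a minimal C++ sketch, assuming an LP64 target where long is 64 bits and float is 32 bits. test_lrintf_ref is a portable scalar stand-in for the loop above (using std::lrint instead of __builtin_lrintf), and vector_bits is a hypothetical helper, not a GCC function. With the same number of lanes, the float input vector is half as wide as the long output vector, which is the kind of mismatch the removed TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out) check rejected.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Portable scalar stand-in for the loop in the patch description.
// std::lrint rounds with the current rounding mode, which defaults to
// round-to-nearest (ties to even), as __builtin_lrintf does.
void
test_lrintf_ref (long *out, const float *in, unsigned count)
{
  for (unsigned i = 0; i < count; i++)
    out[i] = std::lrint (in[i]);  // float -> long conversion
}

// Hypothetical helper: width in bits of a vector holding LANES elements
// of ELEM_BYTES bytes each.
constexpr std::size_t
vector_bits (std::size_t lanes, std::size_t elem_bytes)
{
  return lanes * elem_bytes * 8;
}
```

With 4 lanes, the input vector is vector_bits (4, 4) = 128 bits and the output vector is vector_bits (4, 8) = 256 bits. On an ILP32 target such as rv32gcv/ilp32d, long and float are both 32 bits, so this particular pair of vector types would have equal sizes.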

The x86 bootstrap and regression tests have already passed.

gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_call): Remove data size
check.

Signed-off-by: Pan Li 
---
 gcc/tree-vect-stmts.cc | 13 -
 1 file changed, 13 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index b3a56498595..326e000a71d 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -3529,19 +3529,6 @@ vectorizable_call (vec_info *vinfo,
 
   return false;
 }
-  /* FORNOW: we don't yet support mixtures of vector sizes for calls,
- just mixtures of nunits.  E.g. DI->SI versions of __builtin_ctz*
- are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed
- by a pack of the two vectors into an SI vector.  We would need
- separate code to handle direct VnDI->VnSI IFN_CTZs.  */
-  if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out))
-{
-  if (dump_enabled_p ())
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"mismatched vector sizes %T and %T\n",
-vectype_in, vectype_out);
-  return false;
-}
 
   if (VECTOR_BOOLEAN_TYPE_P (vectype_out)
   != VECTOR_BOOLEAN_TYPE_P (vectype_in))
-- 
2.34.1

Re: [PATCH] RISC-V: Enable more tests for dynamic LMUL and bug fix[PR111832]

2023-10-17 Thread juzhe.zh...@rivai.ai

Committed.



juzhe.zh...@rivai.ai
 
From: Juzhe-Zhong
Date: 2023-10-17 15:30
To: gcc-patches
CC: kito.cheng; kito.cheng; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH] RISC-V: Enable more tests for dynamic LMUL and bug fix[PR111832]
Previously, Robin mentioned that dynamic LMUL causes an ICE in SPEC:

https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629992.html

which is caused by an assertion failure.

When more tests in rvv.exp are enabled with dynamic LMUL, the issue can be
reproduced; it is tracked as PR111832: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111832

This patch enables more tests in rvv.exp and fixes the bug.
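As we read the diff below, the old code in compute_local_live_ranges kept one per-statement mode that each qualifying lhs or operand overwrote, and only that last value was compared against biggest_mode after the operand loop; the fix folds every mode into biggest_mode through get_biggest_mode. The following C++ sketch models that difference with plain bit sizes instead of machine_mode. The Stmt struct and both functions are hypothetical illustrations of the selection logic only, not GCC code.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a gimple statement: bit sizes of the modes
// of its lhs (0 = no qualifying lhs) and of its qualifying operands.
struct Stmt
{
  unsigned lhs_bits;
  std::vector<unsigned> operand_bits;
};

// Old behaviour: each qualifying lhs/operand overwrites MODE, and only
// the last value is compared against the running maximum.
unsigned
biggest_bits_old (const std::vector<Stmt> &stmts, unsigned biggest)
{
  for (const Stmt &s : stmts)
    {
      unsigned mode = biggest;
      if (s.lhs_bits != 0)
        mode = s.lhs_bits;
      for (unsigned bits : s.operand_bits)
        mode = bits;
      if (mode > biggest)
        biggest = mode;
    }
  return biggest;
}

// New behaviour: every mode seen is folded into the maximum,
// mirroring the get_biggest_mode calls in the patch.
unsigned
biggest_bits_new (const std::vector<Stmt> &stmts, unsigned biggest)
{
  for (const Stmt &s : stmts)
    {
      if (s.lhs_bits != 0)
        biggest = std::max (biggest, s.lhs_bits);
      for (unsigned bits : s.operand_bits)
        biggest = std::max (biggest, bits);
    }
  return biggest;
}
```

For a statement with a 128-bit lhs and a single 64-bit operand, starting from 32, the old logic ends at 64 while the new one ends at 128.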
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-costs.cc (get_biggest_mode): New function.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/rvv.exp: Enable more dynamic tests.
 
---
gcc/config/riscv/riscv-vector-costs.cc | 19 +--
gcc/testsuite/gcc.target/riscv/rvv/rvv.exp | 10 --
2 files changed, 21 insertions(+), 8 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-vector-costs.cc b/gcc/config/riscv/riscv-vector-costs.cc
index 33061efb1d0..af87388a1e4 100644
--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -154,6 +154,14 @@ compute_local_program_points (
 }
}
+static machine_mode
+get_biggest_mode (machine_mode mode1, machine_mode mode2)
+{
+  unsigned int mode1_size = GET_MODE_BITSIZE (mode1).to_constant ();
+  unsigned int mode2_size = GET_MODE_BITSIZE (mode2).to_constant ();
+  return mode1_size >= mode2_size ? mode1 : mode2;
+}
+
/* Compute local live ranges of each vectorized variable.
Note that we only compute local live ranges (within a block) since
local live ranges information is accurate enough for us to determine
@@ -201,12 +209,12 @@ compute_local_live_ranges (
{
  unsigned int point = program_point.point;
  gimple *stmt = program_point.stmt;
-   machine_mode mode = biggest_mode;
  tree lhs = gimple_get_lhs (stmt);
  if (lhs != NULL_TREE && is_gimple_reg (lhs)
  && !POINTER_TYPE_P (TREE_TYPE (lhs)))
{
-   mode = TYPE_MODE (TREE_TYPE (lhs));
+   biggest_mode = get_biggest_mode (biggest_mode,
+TYPE_MODE (TREE_TYPE (lhs)));
  bool existed_p = false;
  pair &live_range
= live_ranges->get_or_insert (lhs, &existed_p);
@@ -225,7 +233,9 @@ compute_local_live_ranges (
 the future.  */
  if (is_gimple_val (var) && !POINTER_TYPE_P (TREE_TYPE (var)))
{
-   mode = TYPE_MODE (TREE_TYPE (var));
+   biggest_mode
+ = get_biggest_mode (biggest_mode,
+ TYPE_MODE (TREE_TYPE (var)));
  bool existed_p = false;
  pair &live_range
= live_ranges->get_or_insert (var, &existed_p);
@@ -238,9 +248,6 @@ compute_local_live_ranges (
live_range = pair (0, point);
}
}
-   if (GET_MODE_SIZE (mode).to_constant ()
-   > GET_MODE_SIZE (biggest_mode).to_constant ())
- biggest_mode = mode;
}
  if (dump_enabled_p ())
for (hash_map::iterator iter = live_ranges->begin ();
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
index ff76e17d0e6..674ba0d72b4 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
+++ b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
@@ -58,10 +58,12 @@ set AUTOVEC_TEST_OPTS [list \
   {-ftree-vectorize -O3 --param riscv-autovec-lmul=m2} \
   {-ftree-vectorize -O3 --param riscv-autovec-lmul=m4} \
   {-ftree-vectorize -O3 --param riscv-autovec-lmul=m8} \
+  {-ftree-vectorize -O3 --param riscv-autovec-lmul=dynamic} \
   {-ftree-vectorize -O2 --param riscv-autovec-lmul=m1} \
   {-ftree-vectorize -O2 --param riscv-autovec-lmul=m2} \
   {-ftree-vectorize -O2 --param riscv-autovec-lmul=m4} \
-  {-ftree-vectorize -O2 --param riscv-autovec-lmul=m8} ]
+  {-ftree-vectorize -O2 --param riscv-autovec-lmul=m8} \
+  {-ftree-vectorize -O2 --param riscv-autovec-lmul=dynamic} ]
foreach op $AUTOVEC_TEST_OPTS {
   dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/partial/*.\[cS\]]] \
 "" "$op"
@@ -104,18 +106,22 @@ set AUTOVEC_TEST_OPTS [list \
   {-ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m2 -fno-vect-cost-model -ffast-math} \
   {-ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m4 -fno-vect-cost-model -ffast-math} \
   {-ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m8 -fno-vect-cost-model -ffast-math} \
+  {-ftree-vectorize -O3 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=dynamic -ffast-math} \
   {-ftree-vectorize -O2 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m1 -fno-vect-cost-model -ffast-math} \
   {-ftree-vectorize -O2 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m2 -fno-vect-cost-model -ffast-math} \
   {-ftree-vectorize -O2 --param riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m4 -fno-vect-cost-model -ffast-math} \
   {-ftree-vectorize -O2

Re: [PATCH 0/3] Add Intel new cpu archs

2023-10-17 Thread Hongtao Liu

On Mon, Oct 16, 2023 at 2:25 PM Haochen Jiang  wrote:
>
> Hi all,
>
> The patches aim to add new cpu archs Clear Water Forest and
> Panther Lake. Here comes the documentation:
>
> https://cdrdv2.intel.com/v1/dl/getContent/671368
>
> Also in the patches, I refactored how we detect cpu according to features
> and added m_CORE_ATOM.
>
> Regtested on x86_64-pc-linux-gnu. Ok for trunk?
Ok, please also update https://gcc.gnu.org/gcc-14/changes.html with
your patches and USER_MSR.
>
> Thx,
> Haochen
>
>



--
BR,
Hongtao

[PATCH] libstdc++: testsuite: Enhance codecvt_unicode with tests for length()

2023-10-17 Thread Dimitrij Mijoski

We can test codecvt::length() with the same data that we use to test
codecvt::in(). For each call of in(), we add another call to length().
Some additional small cosmetic changes are also applied.
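For reference, codecvt::length (state, from, from_end, max) returns how many external characters (here UTF-8 bytes) from [from, from_end) would be consumed to produce at most max internal characters. Below is a minimal C++ sketch using the std::codecvt<char32_t, char, std::mbstate_t> facet of the classic locale (UTF-8 to UTF-32); utf8_length is a hypothetical helper, not part of codecvt_unicode.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <locale>

// Hypothetical helper: number of UTF-8 bytes in [from, from_end) that
// codecvt::length() reports as needed for at most MAX UTF-32 code points.
int
utf8_length (const char *from, const char *from_end, std::size_t max)
{
  using cvt_type = std::codecvt<char32_t, char, std::mbstate_t>;
  const cvt_type &cvt = std::use_facet<cvt_type> (std::locale::classic ());
  std::mbstate_t state{};  // fresh state, like "state = {};" in the patch
  return cvt.length (state, from, from_end, max);
}
```

For the input "a" U+00E9 U+20AC (1 + 2 + 3 bytes), asking for 2 code points should consume 3 bytes. This specialization is deprecated in C++20 in favour of the char8_t one, but it is still provided, so the sketch compiles (possibly with a deprecation warning).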

libstdc++-v3/ChangeLog:

* testsuite/22_locale/codecvt/codecvt_unicode.h: Test length()
---
 .../22_locale/codecvt/codecvt_unicode.h   | 103 +++---
 1 file changed, 90 insertions(+), 13 deletions(-)

diff --git a/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode.h b/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode.h
index d3ae42fac..b3c257ec2 100644
--- a/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode.h
+++ b/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode.h
@@ -17,7 +17,6 @@
 
 #include 
 #include 
-#include 
 #include 
 
 struct test_offsets_ok
@@ -79,6 +78,10 @@ utf8_to_utf32_in_ok (const std::codecvt )
   VERIFY (char_traits::compare (out, exp, t.out_size) == 0);
   if (t.out_size < array_size (out))
VERIFY (out[t.out_size] == 0);
+
+  state = {};
+  auto len = cvt.length (state, in, in + t.in_size, t.out_size);
+  VERIFY (len == t.in_size);
 }
 
   for (auto t : offsets)
@@ -99,6 +102,10 @@ utf8_to_utf32_in_ok (const std::codecvt )
   VERIFY (char_traits::compare (out, exp, t.out_size) == 0);
   if (t.out_size < array_size (out))
VERIFY (out[t.out_size] == 0);
+
+  state = {};
+  auto len = cvt.length (state, in, in + t.in_size, array_size (out));
+  VERIFY (len == t.in_size);
 }
 }
 
@@ -163,6 +170,10 @@ utf8_to_utf32_in_partial (const std::codecvt )
  == 0);
   if (t.expected_out_next < array_size (out))
VERIFY (out[t.expected_out_next] == 0);
+
+  state = {};
+  auto len = cvt.length (state, in, in + t.in_size, t.out_size);
+  VERIFY (len == t.expected_in_next);
 }
 }
 
@@ -303,6 +314,10 @@ utf8_to_utf32_in_error (const std::codecvt )
   if (t.expected_out_next < array_size (out))
VERIFY (out[t.expected_out_next] == 0);
 
+  state = {};
+  auto len = cvt.length (state, in, in + t.in_size, t.out_size);
+  VERIFY (len == t.expected_in_next);
+
   in[t.replace_pos] = old_char;
 }
 }
@@ -334,7 +349,7 @@ utf32_to_utf8_out_ok (const std::codecvt )
   VERIFY (char_traits::length (in) == 4);
   VERIFY (char_traits::length (exp) == 10);
 
-  const test_offsets_ok offsets[] = {{0, 0}, {1, 1}, {2, 3}, {3, 6}, {4, 10}};
+  test_offsets_ok offsets[] = {{0, 0}, {1, 1}, {2, 3}, {3, 6}, {4, 10}};
   for (auto t : offsets)
 {
   ExternT out[array_size (exp) - 1] = {};
@@ -374,7 +389,7 @@ utf32_to_utf8_out_partial (const std::codecvt )
   VERIFY (char_traits::length (in) == 4);
   VERIFY (char_traits::length (exp) == 10);
 
-  const test_offsets_partial offsets[] = {
+  test_offsets_partial offsets[] = {
 {1, 0, 0, 0}, // no space for first CP
 
 {2, 1, 1, 1}, // no space for second CP
@@ -528,6 +543,10 @@ utf8_to_utf16_in_ok (const std::codecvt )
   VERIFY (char_traits::compare (out, exp, t.out_size) == 0);
   if (t.out_size < array_size (out))
VERIFY (out[t.out_size] == 0);
+
+  state = {};
+  auto len = cvt.length (state, in, in + t.in_size, t.out_size);
+  VERIFY (len == t.in_size);
 }
 
   for (auto t : offsets)
@@ -548,6 +567,10 @@ utf8_to_utf16_in_ok (const std::codecvt )
   VERIFY (char_traits::compare (out, exp, t.out_size) == 0);
   if (t.out_size < array_size (out))
VERIFY (out[t.out_size] == 0);
+
+  state = {};
+  auto len = cvt.length (state, in, in + t.in_size, array_size (out));
+  VERIFY (len == t.in_size);
 }
 }
 
@@ -617,6 +640,10 @@ utf8_to_utf16_in_partial (const std::codecvt )
  == 0);
   if (t.expected_out_next < array_size (out))
VERIFY (out[t.expected_out_next] == 0);
+
+  state = {};
+  auto len = cvt.length (state, in, in + t.in_size, t.out_size);
+  VERIFY (len == t.expected_in_next);
 }
 }
 
@@ -757,6 +784,10 @@ utf8_to_utf16_in_error (const std::codecvt )
   if (t.expected_out_next < array_size (out))
VERIFY (out[t.expected_out_next] == 0);
 
+  state = {};
+  auto len = cvt.length (state, in, in + t.in_size, t.out_size);
+  VERIFY (len == t.expected_in_next);
+
   in[t.replace_pos] = old_char;
 }
 }
@@ -788,7 +819,7 @@ utf16_to_utf8_out_ok (const std::codecvt )
   VERIFY (char_traits::length (in) == 5);
   VERIFY (char_traits::length (exp) == 10);
 
-  const test_offsets_ok offsets[] = {{0, 0}, {1, 1}, {2, 3}, {3, 6}, {5, 10}};
+  test_offsets_ok offsets[] = {{0, 0}, {1, 1}, {2, 3}, {3, 6}, {5, 10}};
   for (auto t : offsets)
 {
   ExternT out[array_size (exp) - 1] = {};
@@ -828,7 +859,7 @@ utf16_to_utf8_out_partial (const std::codecvt )
   VERIFY (char_traits::length (in) == 5);
   VERIFY (char_traits::length (exp) == 10);
 
-  const test_offsets_partial offsets[] = {
+  test_offsets_partial offsets[] = {
 {1, 0, 0, 0}, // no space for first CP

[PATCH 2/2] aarch64: Put LR save slot first in more cases

2023-10-17 Thread Richard Sandiford

Now that the prologue and epilogue code iterates over saved
registers in offset order, we can put the LR save slot first
without compromising LDP/STP formation.

This isn't worthwhile when shadow call stacks are enabled, since the
first two registers are also push/pop candidates, and LR cannot be
popped when shadow call stacks are enabled.  (LR is instead loaded
first and compared against the shadow stack's value.)

But otherwise, it seems better to put the LR save slot first,
to reduce unnecessary variation with the layout for stack clash
protection.
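A simplified C++ model of the save-slot ordering described above, assuming only GPRs are saved, that the branch shown just above the changed line in the diff is the frame-chain case (x29 and x30 first), and that the other callee-saved GPRs keep register-number order. gpr_save_order is a hypothetical function, not GCC's aarch64_layout_frame; it only reproduces the x30/x19 ordering the updated tests expect.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the GPR save order after this patch.
// OTHERS are the saved callee-saved GPRs other than x29/x30, in
// register-number order.
std::vector<std::string>
gpr_save_order (bool frame_chain, bool stack_clash, bool scs_enabled,
                bool lr_saved, const std::vector<std::string> &others)
{
  std::vector<std::string> order;
  bool lr_placed = false;
  if (frame_chain)
    {
      // Frame chain: x29 and x30 come first.
      order = {"x29", "x30"};
      lr_placed = true;
    }
  else if ((stack_clash || !scs_enabled) && lr_saved)
    {
      // New condition: LR first unless shadow call stacks are enabled
      // (stack clash protection still puts it first).
      order.push_back ("x30");
      lr_placed = true;
    }
  order.insert (order.end (), others.begin (), others.end ());
  if (lr_saved && !lr_placed)
    order.push_back ("x30");  // x30 sorts after x19-x28 by number
  return order;
}
```

Without stack clash protection or shadow call stacks, a function saving x19 and LR now gets x30 first, matching the "stp x30, x19" patterns in the updated tests; with shadow call stacks enabled, the old x19, x30 order is kept.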

Tested on aarch64-linux-gnu & pushed.

Richard


gcc/
* config/aarch64/aarch64.cc (aarch64_layout_frame): Don't make
the position of the LR save slot dependent on stack clash
protection unless shadow call stacks are enabled.

gcc/testsuite/
* gcc.target/aarch64/test_frame_2.c: Expect x30 to come before x19.
* gcc.target/aarch64/test_frame_4.c: Likewise.
* gcc.target/aarch64/test_frame_7.c: Likewise.
* gcc.target/aarch64/test_frame_10.c: Likewise.
---
 gcc/config/aarch64/aarch64.cc| 2 +-
 gcc/testsuite/gcc.target/aarch64/test_frame_10.c | 4 ++--
 gcc/testsuite/gcc.target/aarch64/test_frame_2.c  | 4 ++--
 gcc/testsuite/gcc.target/aarch64/test_frame_4.c  | 4 ++--
 gcc/testsuite/gcc.target/aarch64/test_frame_7.c  | 4 ++--
 5 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index e8b5dfe4d58..62b1ae0652f 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -8694,7 +8694,7 @@ aarch64_layout_frame (void)
   allocate_gpr_slot (R29_REGNUM);
   allocate_gpr_slot (R30_REGNUM);
 }
-  else if (flag_stack_clash_protection
+  else if ((flag_stack_clash_protection || !frame.is_scs_enabled)
   && known_eq (frame.reg_offset[R30_REGNUM], SLOT_REQUIRED))
 /* Put the LR save slot first, since it makes a good choice of probe
for stack clash purposes.  The idea is that the link register usually
diff --git a/gcc/testsuite/gcc.target/aarch64/test_frame_10.c b/gcc/testsuite/gcc.target/aarch64/test_frame_10.c
index c19505082fa..c54ab2d0ccb 100644
--- a/gcc/testsuite/gcc.target/aarch64/test_frame_10.c
+++ b/gcc/testsuite/gcc.target/aarch64/test_frame_10.c
@@ -14,6 +14,6 @@
 t_frame_pattern_outgoing (test10, 480, "x19", 24, a[8], a[9], a[10])
 t_frame_run (test10)
 
-/* { dg-final { scan-assembler-times "stp\tx19, x30, \\\[sp, \[0-9\]+\\\]" 1 } } */
-/* { dg-final { scan-assembler "ldp\tx19, x30, \\\[sp, \[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler-times "stp\tx30, x19, \\\[sp, \[0-9\]+\\\]" 1 } } */
+/* { dg-final { scan-assembler "ldp\tx30, x19, \\\[sp, \[0-9\]+\\\]" } } */
 
diff --git a/gcc/testsuite/gcc.target/aarch64/test_frame_2.c b/gcc/testsuite/gcc.target/aarch64/test_frame_2.c
index 7e5df84cf5f..0d715314cb8 100644
--- a/gcc/testsuite/gcc.target/aarch64/test_frame_2.c
+++ b/gcc/testsuite/gcc.target/aarch64/test_frame_2.c
@@ -14,6 +14,6 @@ t_frame_pattern (test2, 200, "x19")
 t_frame_run (test2)
 
 
-/* { dg-final { scan-assembler-times "stp\tx19, x30, \\\[sp, -\[0-9\]+\\\]!" 1 } } */
-/* { dg-final { scan-assembler "ldp\tx19, x30, \\\[sp\\\], \[0-9\]+" } } */
+/* { dg-final { scan-assembler-times "stp\tx30, x19, \\\[sp, -\[0-9\]+\\\]!" 1 } } */
+/* { dg-final { scan-assembler "ldp\tx30, x19, \\\[sp\\\], \[0-9\]+" } } */
 
diff --git a/gcc/testsuite/gcc.target/aarch64/test_frame_4.c b/gcc/testsuite/gcc.target/aarch64/test_frame_4.c
index ed13487a094..b41229c42f4 100644
--- a/gcc/testsuite/gcc.target/aarch64/test_frame_4.c
+++ b/gcc/testsuite/gcc.target/aarch64/test_frame_4.c
@@ -13,6 +13,6 @@
 t_frame_pattern (test4, 400, "x19")
 t_frame_run (test4)
 
-/* { dg-final { scan-assembler-times "stp\tx19, x30, \\\[sp, -\[0-9\]+\\\]!" 1 } } */
-/* { dg-final { scan-assembler "ldp\tx19, x30, \\\[sp\\\], \[0-9\]+" } } */
+/* { dg-final { scan-assembler-times "stp\tx30, x19, \\\[sp, -\[0-9\]+\\\]!" 1 } } */
+/* { dg-final { scan-assembler "ldp\tx30, x19, \\\[sp\\\], \[0-9\]+" } } */
 
diff --git a/gcc/testsuite/gcc.target/aarch64/test_frame_7.c b/gcc/testsuite/gcc.target/aarch64/test_frame_7.c
index 96452794956..5702656a5da 100644
--- a/gcc/testsuite/gcc.target/aarch64/test_frame_7.c
+++ b/gcc/testsuite/gcc.target/aarch64/test_frame_7.c
@@ -13,6 +13,6 @@
 t_frame_pattern (test7, 700, "x19")
 t_frame_run (test7)
 
-/* { dg-final { scan-assembler-times "stp\tx19, x30, \\\[sp]" 1 } } */
-/* { dg-final { scan-assembler "ldp\tx19, x30, \\\[sp\\\]" } } */
+/* { dg-final { scan-assembler-times "stp\tx30, x19, \\\[sp]" 1 } } */
+/* { dg-final { scan-assembler "ldp\tx30, x19, \\\[sp\\\]" } } */
 
-- 
2.25.1

[PATCH 1/2] aarch64: Use vecs to store register save order

2023-10-17 Thread Richard Sandiford

aarch64_save/restore_callee_saves looped over registers in register
number order.  This in turn meant that we could only use LDP and STP
for registers that were consecutive both number-wise and
offset-wise (after unsaved registers are excluded).

This patch instead builds lists of the registers that we've decided to
save, in offset order.  We can then form LDP/STP pairs regardless of
register number order, which in turn means that we can put the LR save
slot first without losing LDP/STP opportunities.
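The pairing idea can be sketched as follows, assuming 8-byte GPR save slots and that a pair may be formed whenever two consecutive entries in the offset-ordered list occupy adjacent slots. SavedReg and form_pairs are hypothetical names; the real patch works on frame.saved_gprs and the related vectors and has further constraints (write-back candidates, shadow call stacks, and so on) that are omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical saved-register record: register number and frame offset.
struct SavedReg
{
  unsigned regno;
  long offset;
};

// Walk registers in offset order and group each register with the next
// one when their slots are adjacent, regardless of register numbers.
// Each inner vector is one STP/LDP pair (2 regs) or a single STR/LDR.
std::vector<std::vector<unsigned>>
form_pairs (const std::vector<SavedReg> &in_offset_order, long slot_size = 8)
{
  std::vector<std::vector<unsigned>> groups;
  for (std::size_t i = 0; i < in_offset_order.size (); )
    {
      const SavedReg &a = in_offset_order[i];
      if (i + 1 < in_offset_order.size ()
          && in_offset_order[i + 1].offset == a.offset + slot_size)
        {
          groups.push_back ({a.regno, in_offset_order[i + 1].regno});
          i += 2;
        }
      else
        {
          groups.push_back ({a.regno});
          i += 1;
        }
    }
  return groups;
}
```

With LR in the first slot followed by x19 and x20, this yields the pair (x30, x19) and a single x20; walking in register-number order instead would never see x30 next to x19, which is the limitation the patch removes.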

Tested on aarch64-linux-gnu & pushed.

Richard


gcc/
* config/aarch64/aarch64.h (aarch64_frame): Add vectors that
store the lists of saved GPRs, FPRs and predicate registers.
* config/aarch64/aarch64.cc (aarch64_layout_frame): Initialize
the lists of saved registers.  Use them to choose push candidates.
Invalidate pop candidates if we're not going to do a pop.
(aarch64_next_callee_save): Delete.
(aarch64_save_callee_saves): Take a list of registers,
rather than a range.  Make !skip_wb select only write-back
candidates.
(aarch64_expand_prologue): Update calls accordingly.
(aarch64_restore_callee_saves): Take a list of registers,
rather than a range.  Always skip pop candidates.  Also skip
LR if shadow call stacks are enabled.
(aarch64_expand_epilogue): Update calls accordingly.

gcc/testsuite/
* gcc.target/aarch64/sve/pcs/stack_clash_2.c: Expect restores
to happen in offset order.
* gcc.target/aarch64/sve/pcs/stack_clash_2_128.c: Likewise.
* gcc.target/aarch64/sve/pcs/stack_clash_2_256.c: Likewise.
* gcc.target/aarch64/sve/pcs/stack_clash_2_512.c: Likewise.
* gcc.target/aarch64/sve/pcs/stack_clash_2_1024.c: Likewise.
* gcc.target/aarch64/sve/pcs/stack_clash_2_2048.c: Likewise.
---
 gcc/config/aarch64/aarch64.cc | 203 +-
 gcc/config/aarch64/aarch64.h  |   9 +-
 .../aarch64/sve/pcs/stack_clash_2.c   |   6 +-
 .../aarch64/sve/pcs/stack_clash_2_1024.c  |   6 +-
 .../aarch64/sve/pcs/stack_clash_2_128.c   |   6 +-
 .../aarch64/sve/pcs/stack_clash_2_2048.c  |   6 +-
 .../aarch64/sve/pcs/stack_clash_2_256.c   |   6 +-
 .../aarch64/sve/pcs/stack_clash_2_512.c   |   6 +-
 8 files changed, 128 insertions(+), 120 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 9fbfc548a89..e8b5dfe4d58 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -8527,13 +8527,17 @@ aarch64_save_regs_above_locals_p ()
 static void
 aarch64_layout_frame (void)
 {
-  int regno, last_fp_reg = INVALID_REGNUM;
+  unsigned regno, last_fp_reg = INVALID_REGNUM;
   machine_mode vector_save_mode = aarch64_reg_save_mode (V8_REGNUM);
   poly_int64 vector_save_size = GET_MODE_SIZE (vector_save_mode);
   bool frame_related_fp_reg_p = false;
   aarch64_frame &frame = cfun->machine->frame;
   poly_int64 top_of_locals = -1;
 
+  vec_safe_truncate (frame.saved_gprs, 0);
+  vec_safe_truncate (frame.saved_fprs, 0);
+  vec_safe_truncate (frame.saved_prs, 0);
+
   frame.emit_frame_chain = aarch64_needs_frame_chain ();
 
   /* Adjust the outgoing arguments size if required.  Keep it in sync with what
@@ -8618,6 +8622,7 @@ aarch64_layout_frame (void)
   for (regno = P0_REGNUM; regno <= P15_REGNUM; regno++)
 if (known_eq (frame.reg_offset[regno], SLOT_REQUIRED))
   {
+   vec_safe_push (frame.saved_prs, regno);
if (frame.sve_save_and_probe == INVALID_REGNUM)
  frame.sve_save_and_probe = regno;
frame.reg_offset[regno] = offset;
@@ -8639,7 +8644,7 @@ aarch64_layout_frame (void)
 If we don't have any vector registers to save, and we know how
 big the predicate save area is, we can just round it up to the
 next 16-byte boundary.  */
-  if (last_fp_reg == (int) INVALID_REGNUM && offset.is_constant ())
+  if (last_fp_reg == INVALID_REGNUM && offset.is_constant ())
offset = aligned_upper_bound (offset, STACK_BOUNDARY / BITS_PER_UNIT);
   else
{
@@ -8653,10 +8658,11 @@ aarch64_layout_frame (void)
 }
 
   /* If we need to save any SVE vector registers, add them next.  */
-  if (last_fp_reg != (int) INVALID_REGNUM && crtl->abi->id () == ARM_PCS_SVE)
+  if (last_fp_reg != INVALID_REGNUM && crtl->abi->id () == ARM_PCS_SVE)
 for (regno = V0_REGNUM; regno <= V31_REGNUM; regno++)
   if (known_eq (frame.reg_offset[regno], SLOT_REQUIRED))
{
+ vec_safe_push (frame.saved_fprs, regno);
  if (frame.sve_save_and_probe == INVALID_REGNUM)
frame.sve_save_and_probe = regno;
  frame.reg_offset[regno] = offset;
@@ -8677,13 +8683,8 @@ aarch64_layout_frame (void)
 
   auto allocate_gpr_slot = [&](unsigned int regno)
 {
-  if (frame.hard_fp_save_and_probe == INVALID_REGNUM)
-   frame.hard_fp_save_and_probe = regno;
+  vec_safe_push

Re: [PATCH] c++: accepts-invalid with =delete("") [PR111840]

2023-10-17 Thread Jason Merrill


On 10/17/23 17:38, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK.


-- >8 --
r6-2367 added a DECL_INITIAL check to cp_parser_simple_declaration
so that we don't emit multiple errors in g++.dg/parse/error57.C.
But that means we don't diagnose

   int f1() = delete("george_crumb");

anymore, because fn decls often have error_mark_node in their
DECL_INITIAL.  (The code may be allowed one day via https://wg21.link/P2573R0.)

I was hoping I could use cp_parser_error_occurred but that would
regress error57.C.

PR c++/111840

gcc/cp/ChangeLog:

* parser.cc (cp_parser_simple_declaration): Do cp_parser_error
for FUNCTION_DECLs.

gcc/testsuite/ChangeLog:

* g++.dg/parse/error65.C: New test.
---
  gcc/cp/parser.cc | 14 +++---
  gcc/testsuite/g++.dg/parse/error65.C | 10 ++
  2 files changed, 17 insertions(+), 7 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/parse/error65.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 59b9852895e..57b62fb7363 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -15669,6 +15669,7 @@ cp_parser_simple_declaration (cp_parser* parser,
maybe_range_for_decl,
_loc,
_result);
+  const bool fndecl_p = TREE_CODE (decl) == FUNCTION_DECL;
/* If an error occurred while parsing tentatively, exit quickly.
 (That usually happens when in the body of a function; each
 statement is treated as a declaration-statement until proven
@@ -15682,16 +15683,13 @@ cp_parser_simple_declaration (cp_parser* parser,
 init-declarator, they shall all form declarations of
 variables.  */
  if (auto_function_declaration == NULL_TREE)
-   auto_function_declaration
- = TREE_CODE (decl) == FUNCTION_DECL ? decl : error_mark_node;
- else if (TREE_CODE (decl) == FUNCTION_DECL
-  || auto_function_declaration != error_mark_node)
+   auto_function_declaration = fndecl_p ? decl : error_mark_node;
+ else if (fndecl_p || auto_function_declaration != error_mark_node)
{
  error_at (decl_specifiers.locations[ds_type_spec],
"non-variable %qD in declaration with more than one "
"declarator with placeholder type",
-   TREE_CODE (decl) == FUNCTION_DECL
-   ? decl : auto_function_declaration);
+   fndecl_p ? decl : auto_function_declaration);
  auto_function_declaration = error_mark_node;
}
}
@@ -15763,7 +15761,9 @@ cp_parser_simple_declaration (cp_parser* parser,
  /* If we have already issued an error message we don't need
 to issue another one.  */
  if ((decl != error_mark_node
-  && DECL_INITIAL (decl) != error_mark_node)
+  /* grokfndecl sets DECL_INITIAL to error_mark_node for
+ functions.  */
+  && (fndecl_p || DECL_INITIAL (decl) != error_mark_node))
  || cp_parser_uncommitted_to_tentative_parse_p (parser))
cp_parser_error (parser, "expected %<,%> or %<;%>");
  /* Skip tokens until we reach the end of the statement.  */
diff --git a/gcc/testsuite/g++.dg/parse/error65.C b/gcc/testsuite/g++.dg/parse/error65.C
new file mode 100644
index 000..d9e0a4bfbcb
--- /dev/null
+++ b/gcc/testsuite/g++.dg/parse/error65.C
@@ -0,0 +1,10 @@
+// PR c++/111840
+// { dg-do compile { target c++11 } }
+
+// NB: =delete("reason") may be allowed via P2573.
+int f1() = delete("should have a reason"); // { dg-error "expected" }
+int f2() = delete[""]; // { dg-error "expected" }
+int f3() = delete{""}; // { dg-error "expected" }
+int f4() = delete""; // { dg-error "expected" }
+int f5() = delete[{'a'""; // { dg-error "expected" }
+int i = f5();

base-commit: bac21b7ea62bd3a7911e01cf803d6bf6516fbf7b

[PATCH] c++: accepts-invalid with =delete("") [PR111840]

2023-10-17 Thread Marek Polacek

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
r6-2367 added a DECL_INITIAL check to cp_parser_simple_declaration
so that we don't emit multiple errors in g++.dg/parse/error57.C.
But that means we don't diagnose

  int f1() = delete("george_crumb");

anymore, because fn decls often have error_mark_node in their
DECL_INITIAL.  (The code may be allowed one day via https://wg21.link/P2573R0.)

I was hoping I could use cp_parser_error_occurred but that would
regress error57.C.
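The before/after behaviour of the final parser.cc hunk can be modelled with plain booleans. This is a hypothetical sketch, not GCC code: `DeclState`, `diagnose_before` and `diagnose_after` are invented names, and each field stands in for one test in the real condition (`decl != error_mark_node`, `fndecl_p`, `DECL_INITIAL (decl) != error_mark_node`, `cp_parser_uncommitted_to_tentative_parse_p`).

```cpp
#include <cassert>

// Hypothetical model of the parser state consulted by the final hunk of
// cp_parser_simple_declaration; these are not GCC types.
struct DeclState {
  bool decl_is_error;     // decl == error_mark_node
  bool is_function;       // fndecl_p: TREE_CODE (decl) == FUNCTION_DECL
  bool initial_is_error;  // DECL_INITIAL (decl) == error_mark_node
  bool uncommitted;       // cp_parser_uncommitted_to_tentative_parse_p (parser)
};

// Before the patch: an error_mark_node DECL_INITIAL was taken to mean that
// an error had already been issued, so no "expected ',' or ';'" was given.
bool diagnose_before(const DeclState& s) {
  return (!s.decl_is_error && !s.initial_is_error) || s.uncommitted;
}

// After the patch: FUNCTION_DECLs skip the DECL_INITIAL test, since
// grokfndecl sets DECL_INITIAL to error_mark_node for functions.
bool diagnose_after(const DeclState& s) {
  return (!s.decl_is_error && (s.is_function || !s.initial_is_error))
         || s.uncommitted;
}
```

For `int f1() = delete("george_crumb");` the decl is a FUNCTION_DECL whose DECL_INITIAL is error_mark_node, so, assuming a committed parse, the old condition suppresses the error and the new one emits it. For non-function decls `fndecl_p` is false and the condition reduces to the old one.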

PR c++/111840

gcc/cp/ChangeLog:

* parser.cc (cp_parser_simple_declaration): Do cp_parser_error
for FUNCTION_DECLs.

gcc/testsuite/ChangeLog:

* g++.dg/parse/error65.C: New test.
---
 gcc/cp/parser.cc | 14 +++---
 gcc/testsuite/g++.dg/parse/error65.C | 10 ++
 2 files changed, 17 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/parse/error65.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 59b9852895e..57b62fb7363 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -15669,6 +15669,7 @@ cp_parser_simple_declaration (cp_parser* parser,
maybe_range_for_decl,
_loc,
_result);
+  const bool fndecl_p = TREE_CODE (decl) == FUNCTION_DECL;
   /* If an error occurred while parsing tentatively, exit quickly.
 (That usually happens when in the body of a function; each
 statement is treated as a declaration-statement until proven
@@ -15682,16 +15683,13 @@ cp_parser_simple_declaration (cp_parser* parser,
 init-declarator, they shall all form declarations of
 variables.  */
  if (auto_function_declaration == NULL_TREE)
-   auto_function_declaration
- = TREE_CODE (decl) == FUNCTION_DECL ? decl : error_mark_node;
- else if (TREE_CODE (decl) == FUNCTION_DECL
-  || auto_function_declaration != error_mark_node)
+   auto_function_declaration = fndecl_p ? decl : error_mark_node;
+ else if (fndecl_p || auto_function_declaration != error_mark_node)
{
  error_at (decl_specifiers.locations[ds_type_spec],
"non-variable %qD in declaration with more than one "
"declarator with placeholder type",
-   TREE_CODE (decl) == FUNCTION_DECL
-   ? decl : auto_function_declaration);
+   fndecl_p ? decl : auto_function_declaration);
  auto_function_declaration = error_mark_node;
}
}
@@ -15763,7 +15761,9 @@ cp_parser_simple_declaration (cp_parser* parser,
  /* If we have already issued an error message we don't need
 to issue another one.  */
  if ((decl != error_mark_node
-  && DECL_INITIAL (decl) != error_mark_node)
+  /* grokfndecl sets DECL_INITIAL to error_mark_node for
+ functions.  */
+  && (fndecl_p || DECL_INITIAL (decl) != error_mark_node))
  || cp_parser_uncommitted_to_tentative_parse_p (parser))
cp_parser_error (parser, "expected %<,%> or %<;%>");
  /* Skip tokens until we reach the end of the statement.  */
diff --git a/gcc/testsuite/g++.dg/parse/error65.C b/gcc/testsuite/g++.dg/parse/error65.C
new file mode 100644
index 000..d9e0a4bfbcb
--- /dev/null
+++ b/gcc/testsuite/g++.dg/parse/error65.C
@@ -0,0 +1,10 @@
+// PR c++/111840
+// { dg-do compile { target c++11 } }
+
+// NB: =delete("reason") may be allowed via P2573.
+int f1() = delete("should have a reason"); // { dg-error "expected" }
+int f2() = delete[""]; // { dg-error "expected" }
+int f3() = delete{""}; // { dg-error "expected" }
+int f4() = delete""; // { dg-error "expected" }
+int f5() = delete[{'a'""; // { dg-error "expected" }
+int i = f5();

base-commit: bac21b7ea62bd3a7911e01cf803d6bf6516fbf7b
-- 
2.41.0

[pushed] c++: mangling tweaks

2023-10-17 Thread Jason Merrill

Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

Most of this is introducing the abi_check function to reduce the verbosity
of most places that check -fabi-version.
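A minimal model of the helper, assuming simplified semantics: `selected_abi`, `compat_abi` and the single-comparison `abi_version_crosses` below are hypothetical stand-ins for `-fabi-version` and for GCC's `abi_version_at_least` / `abi_warn_or_compat_version_crosses` macros, which consult more than one flag. The point is only the shape of the refactoring: record that a warning may be needed, then return whether the newer mangling applies.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for the -fabi-version machinery.
int selected_abi = 18;          // ABI version used for mangling
int compat_abi = 17;            // version whose mangling we warn about
bool need_abi_warning = false;  // models G.need_abi_warning

bool abi_version_at_least(int ver) { return selected_abi >= ver; }

// True when the two versions sit on different sides of VER, i.e. the
// mangled name would differ between them.
bool abi_version_crosses(int ver) {
  return (selected_abi >= ver) != (compat_abi >= ver);
}

// The consolidated check: note a possible ABI warning, then report
// whether the version-VER mangling applies.
bool abi_check(int ver) {
  if (abi_version_crosses(ver))
    need_abi_warning = true;
  return abi_version_at_least(ver);
}
```

A call site such as `if (!abi_check (18)) return;` then replaces the three-line check/set/test sequence removed in the diff.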

The start_mangling change is to avoid needing to zero-initialize additional
members of the mangling globals, though I'm not actually adding any.

The comment documents existing semantics.

gcc/cp/ChangeLog:

* mangle.cc (abi_check): New.
(write_prefix, write_unqualified_name, write_discriminator)
(write_type, write_member_name, write_expression)
(write_template_arg, write_template_param): Use it.
(start_mangling): Assign from {}.
* cp-tree.h: Update comment.
---
 gcc/cp/cp-tree.h |  6 ++--
 gcc/cp/mangle.cc | 85 +---
 2 files changed, 34 insertions(+), 57 deletions(-)

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index efcd2de54e5..1d7df62961e 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -4969,12 +4969,14 @@ get_vec_init_expr (tree t)
 /* The DECL_TEMPLATE_PARMS are a list.  The TREE_PURPOSE of each node
is a INT_CST whose TREE_INT_CST_LOW indicates the level of the
template parameters, with 1 being the outermost set of template
-   parameters.  The TREE_VALUE is a vector, whose elements are the
+   parameters.  The TREE_TYPE is TEMPLATE_PARMS_CONSTRAINTS.
+   The TREE_VALUE is a vector, whose elements are the
template parameters at each level.  Each element in the vector is a
TREE_LIST, whose TREE_VALUE is a PARM_DECL (if the parameter is a
non-type parameter), or a TYPE_DECL (if the parameter is a type
parameter) or a TEMPLATE_DECL (if the parameter is a template
-   parameter).  The TREE_PURPOSE is the default value, if any.  The
+   parameter).  The TREE_PURPOSE is the default value, if any.
+   The TREE_TYPE is TEMPLATE_PARM_CONSTRAINTS.  The
TEMPLATE_PARM_INDEX for the parameter is available as the
DECL_INITIAL (for a PARM_DECL) or as the TREE_TYPE (for a
TYPE_DECL).
diff --git a/gcc/cp/mangle.cc b/gcc/cp/mangle.cc
index d079f724910..afa68da871c 100644
--- a/gcc/cp/mangle.cc
+++ b/gcc/cp/mangle.cc
@@ -275,6 +275,17 @@ static tree mangle_special_for_type (const tree, const char *);
 #define write_unsigned_number(NUMBER)  \
   write_number ((NUMBER), /*unsigned_p=*/1, 10)
 
+/* Check for -fabi-version dependent mangling and also set the need_abi_warning
+   flag as appropriate.  */
+
+static bool
+abi_check (int ver)
+{
+  if (abi_warn_or_compat_version_crosses (ver))
+    G.need_abi_warning = true;
+  return abi_version_at_least (ver);
+}
+
 /* If DECL is a template instance (including the uninstantiated template
itself), return its TEMPLATE_INFO.  Otherwise return NULL.  */
 
@@ -1267,9 +1278,7 @@ write_prefix (const tree node)
  /* Before ABI 18, we did not count these as substitution
 candidates.  This leads to incorrect demanglings (and
 ABI divergence to other compilers).  */
- if (abi_warn_or_compat_version_crosses (18))
-   G.need_abi_warning = true;
- if (!abi_version_at_least (18))
+ if (!abi_check (18))
return;
}
 }
@@ -1542,9 +1551,7 @@ write_unqualified_name (tree decl)
   && any_abi_below (11))
 if (tree mtags = missing_abi_tags (decl))
   {
-   if (abi_warn_or_compat_version_crosses (11))
- G.need_abi_warning = true;
-   if (!abi_version_at_least (11))
+   if (!abi_check (11))
  tags = chainon (mtags, tags);
   }
   write_abi_tags (tags);
@@ -2094,9 +2101,7 @@ write_discriminator (const int discriminator)
   write_char ('_');
   if (discriminator - 1 >= 10)
{
- if (abi_warn_or_compat_version_crosses (11))
-   G.need_abi_warning = 1;
- if (abi_version_at_least (11))
+ if (abi_check (11))
write_char ('_');
}
   write_unsigned_number (discriminator - 1);
@@ -2425,9 +2430,7 @@ write_type (tree type)
 
  if (etype && !type_uses_auto (etype))
{
- if (abi_warn_or_compat_version_crosses (5))
-   G.need_abi_warning = 1;
- if (!abi_version_at_least (5))
+ if (!abi_check (5))
{
  write_type (etype);
  return;
@@ -2448,10 +2451,8 @@ write_type (tree type)
 
case NULLPTR_TYPE:
  write_string ("Dn");
- if (abi_version_at_least (7))
+ if (abi_check (7))
++is_builtin_type;
- if (abi_warn_or_compat_version_crosses (7))
-   G.need_abi_warning = 1;
  break;
 
case TYPEOF_TYPE:
@@ -2935,10 +2936,8 @@ write_member_name (tree member)
 {
   if (IDENTIFIER_ANY_OP_P (member))
{
- if (abi_version_at_least (11))
+ if (abi_check (11))

Re: [PATCH v3 1/2] c++: Initial support for P0847R7 (Deducing This) [PR102609]

2023-10-17 Thread Jason Merrill


On 9/25/23 21:56, waffl3x wrote:


On the plus side, I took my time to figure out how best to pass down
information about whether a param is an xobj param. My initial
impression on what you were suggesting was to push another node on the
front of the list, but I stared at it for a few hours and didn't think
it would work out.


I was thinking to set a TREE_LANG_FLAG on the TREE_LIST node.


However, eventually I realized that the purpose
member is free for xobj params as it is illegal for them to have
default arguments.


Hmm, is it?  I see that clang thinks so, but I don't know where they get 
that idea from.  The grammar certainly allows it:



attribute-specifier-seq_opt decl-specifier-seq declarator = initializer-clause


and I don't see anything else that prohibits it.

Jason

Re: [PATCH 11/11] aarch64: Add new load/store pair fusion pass.

2023-10-17 Thread Andrew Pinski

On Tue, Oct 17, 2023 at 1:52 PM Alex Coplan  wrote:
>
> This adds a new aarch64-specific RTL-SSA pass dedicated to forming load
> and store pairs (LDPs and STPs).
>
> As a motivating example for the kind of thing this improves, take the
> following testcase:
>
> extern double c[20];
>
> double f(double x)
> {
>   double y = x*x;
>   y += c[16];
>   y += c[17];
>   y += c[18];
>   y += c[19];
>   return y;
> }
>
> for which we currently generate (at -O2):
>
> f:
> adrp    x0, c
> add     x0, x0, :lo12:c
> ldp     d31, d29, [x0, 128]
> ldr     d30, [x0, 144]
> fmadd   d0, d0, d0, d31
> ldr     d31, [x0, 152]
> fadd    d0, d0, d29
> fadd    d0, d0, d30
> fadd    d0, d0, d31
> ret
>
> but with the pass, we generate:
>
> f:
> .LFB0:
> adrp    x0, c
> add     x0, x0, :lo12:c
> ldp     d31, d29, [x0, 128]
> fmadd   d0, d0, d0, d31
> ldp     d30, d31, [x0, 144]
> fadd    d0, d0, d29
> fadd    d0, d0, d30
> fadd    d0, d0, d31
> ret
>
> The pass is local (only considers a BB at a time).  In theory, it should
> be possible to extend it to run over EBBs, at least in the case of pure
> (MEM_READONLY_P) loads, but this is left for future work.
>
> The pass works by identifying two kinds of bases: tree decls obtained
> via MEM_EXPR, and RTL register bases in the form of RTL-SSA def_infos.
> If a candidate memory access has a MEM_EXPR base, then we track it via
> this base, and otherwise if it is of a simple reg + offset form, we track
> it via the RTL-SSA def_info for the register.
>
> For each BB, for a given kind of base, we build up a hash table mapping
> the base to an access_group.  The access_group data structure holds a
> list of accesses at each offset relative to the same base.  It uses a
> splay tree to support efficient insertion (while walking the bb), and
> the nodes are chained using a linked list to support efficient
> iteration (while doing the transformation).
>
> For each base, we then iterate over the access_group to identify
> adjacent accesses, and try to form load/store pairs for those insns that
> access adjacent memory.
>
> The pass is currently run twice, both before and after register
> allocation.  The first copy of the pass is run late in the pre-RA RTL
> pipeline, immediately after sched1, since it was found that sched1 was
> increasing register pressure when the pass was run before.  The second
> copy of the pass runs immediately before peephole2, so as to get any
> opportunities that the existing ldp/stp peepholes can handle.
>
> There are some cases that we punt on before RA, e.g.
> accesses relative to eliminable regs (such as the soft frame pointer).
> We do this since we can't know the elimination offset before RA, and we
> want to avoid the RA reloading the offset (due to being out of ldp/stp
> immediate range) as this can generate worse code.
>
> The post-RA copy of the pass is there to pick up the crumbs that were
> left behind / things we punted on in the pre-RA pass.  Among other
> things, it's needed to handle accesses relative to the stack pointer
> (see the previous patch in the series for an example).  It can also
> handle code that didn't exist at the time the pre-RA pass was run (spill
> code, prologue/epilogue code).
>
> The following table shows the effect of the passes on code size in
> SPEC CPU 2017 with -Os -flto=auto -mcpu=neoverse-v1:
>
> +-----------------+-------------+--------------+---------+
> | Benchmark       | Pre-RA pass | Post-RA pass | Overall |
> +-----------------+-------------+--------------+---------+
> | 541.leela_r     | 0.04%       | -0.03%       | 0.01%   |
> | 502.gcc_r       | -0.07%      | -0.02%       | -0.09%  |
> | 510.parest_r    | -0.06%      | -0.04%       | -0.10%  |
> | 505.mcf_r       | -0.12%      | 0.00%        | -0.12%  |
> | 500.perlbench_r | -0.12%      | -0.02%       | -0.15%  |
> | 520.omnetpp_r   | -0.13%      | -0.03%       | -0.16%  |
> | 538.imagick_r   | -0.17%      | -0.02%       | -0.19%  |
> | 525.x264_r      | -0.17%      | -0.02%       | -0.19%  |
> | 544.nab_r       | -0.22%      | -0.01%       | -0.23%  |
> | 557.xz_r        | -0.27%      | -0.01%       | -0.28%  |
> | 507.cactuBSSN_r | -0.26%      | -0.03%       | -0.29%  |
> | 526.blender_r   | -0.37%      | -0.02%       | -0.38%  |
> | 523.xalancbmk_r | -0.41%      | -0.01%       | -0.42%  |
> | 531.deepsjeng_r | -0.41%      | -0.05%       | -0.46%  |
> | 511.povray_r    | -0.60%      | -0.05%       | -0.65%  |
> | 548.exchange2_r | -0.55%      | -0.32%       | -0.86%  |
> | 527.cam4_r      | -0.82%      | -0.16%       | -0.98%  |
> | 503.bwaves_r    | -0.63%      | -0.41%       | -1.04%  |
> | 521.wrf_r       | -1.04%      | -0.06%       | -1.10%  |
> | 549.fotonik3d_r | -0.91%      | -0.35%       | -1.26%  |
> | 554.roms_r      | -1.20%      | -0.20%       | -1.40%  |
> | 519.lbm_r       | -1.91%      | 0.00%        |

Re: [PATCH v3] c++: Fix compile-time-hog in cp_fold_immediate_r [PR111660]

2023-10-17 Thread Marek Polacek

On Tue, Oct 17, 2023 at 04:49:52PM -0400, Jason Merrill wrote:
> On 10/16/23 20:39, Marek Polacek wrote:
> > On Sat, Oct 14, 2023 at 01:13:22AM -0400, Jason Merrill wrote:
> > > On 10/13/23 14:53, Marek Polacek wrote:
> > > > On Thu, Oct 12, 2023 at 09:41:43PM -0400, Jason Merrill wrote:
> > > > > On 10/12/23 17:04, Marek Polacek wrote:
> > > > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > > > > > 
> > > > > > -- >8 --
> > > > > > My recent patch introducing cp_fold_immediate_r caused exponential
> > > > > > compile time with nested COND_EXPRs.  The problem is that the COND_EXPR
> > > > > > case recursively walks the arms of a COND_EXPR, but after processing
> > > > > > both arms it doesn't end the walk; it proceeds to walk the
> > > > > > sub-expressions of the outermost COND_EXPR, triggering again walking
> > > > > > the arms of the nested COND_EXPR, and so on.  This patch brings the
> > > > > > compile time down to about 0m0.033s.
> > > > > > 
> > > > > > I've added some debug prints to make sure that the rest of cp_fold_r
> > > > > > is still performed as before.
> > > > > > 
> > > > > >PR c++/111660
> > > > > > 
> > > > > > gcc/cp/ChangeLog:
> > > > > > 
> > > > > >* cp-gimplify.cc (cp_fold_immediate_r) : Return
> > > > > >integer_zero_node instead of break;.
> > > > > >(cp_fold_immediate): Return true if cp_fold_immediate_r returned
> > > > > >error_mark_node.
> > > > > > 
> > > > > > gcc/testsuite/ChangeLog:
> > > > > > 
> > > > > >* g++.dg/cpp0x/hog1.C: New test.
> > > > > > ---
> > > > > > gcc/cp/cp-gimplify.cc |  9 ++--
> > > > > > gcc/testsuite/g++.dg/cpp0x/hog1.C | 77 +++
> > > > > > 2 files changed, 82 insertions(+), 4 deletions(-)
> > > > > > create mode 100644 gcc/testsuite/g++.dg/cpp0x/hog1.C
> > > > > > 
> > > > > > diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
> > > > > > index bdf6e5f98ff..ca622ca169a 100644
> > > > > > --- a/gcc/cp/cp-gimplify.cc
> > > > > > +++ b/gcc/cp/cp-gimplify.cc
> > > > > > @@ -1063,16 +1063,16 @@ cp_fold_immediate_r (tree *stmt_p, int *walk_subtrees, void *data_)
> > > > > > break;
> > > > > >   if (TREE_OPERAND (stmt, 1)
> > > > > >   && cp_walk_tree (&TREE_OPERAND (stmt, 1), cp_fold_immediate_r, data,
> > > > > > -  nullptr))
> > > > > > +  nullptr) == error_mark_node)
> > > > > > return error_mark_node;
> > > > > >   if (TREE_OPERAND (stmt, 2)
> > > > > >   && cp_walk_tree (&TREE_OPERAND (stmt, 2), cp_fold_immediate_r, data,
> > > > > > -  nullptr))
> > > > > > +  nullptr) == error_mark_node)
> > > > > > return error_mark_node;
> > > > > >   /* We're done here.  Don't clear *walk_subtrees here though: we're called
> > > > > >  from cp_fold_r and we must let it recurse on the expression with
> > > > > >  cp_fold.  */
> > > > > > -  break;
> > > > > > +  return integer_zero_node;
> > > > > 
> > > > > I'm concerned this will end up missing something like
> > > > > 
> > > > > 1 ? 1 : ((1 ? 1 : 1), immediate())
> > > > > 
> > > > > as the integer_zero_node from the inner ?: will prevent walk_tree from
> > > > > looking any farther.
> > > > 
> > > > You are right.  The line above works as expected, but
> > > > 
> > > > 1 ? 1 : ((1 ? 1 : id (42)), id (i));
> > > > 
> > > > shows the problem (when the expression isn't used as an initializer).
> > > > 
> > > > > Maybe we want to handle COND_EXPR in cp_fold_r instead of here?
> > > > 
> > > > I hope this version is better.
> > > > 
> > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > > > 
> > > > -- >8 --
> > > > My recent patch introducing cp_fold_immediate_r caused exponential
> > > > compile time with nested COND_EXPRs.  The problem is that the COND_EXPR
> > > > case recursively walks the arms of a COND_EXPR, but after processing
> > > > both arms it doesn't end the walk; it proceeds to walk the
> > > > sub-expressions of the outermost COND_EXPR, triggering again walking
> > > > the arms of the nested COND_EXPR, and so on.  This patch brings the
> > > > compile time down to about 0m0.033s.
> > > 
> > > Is this number still accurate for this version?
> > 
> > It is.  I ran time(1) a few more times and the results were 0m0.033s-0m0.035s.
> > That said, ...
> > 
> > > This change seems algorithmically better than the current code, but still
> > > problematic: if we have nested COND_EXPR A/B/C/D/E, it looks like we will
> > > end up cp_fold_immediate_r walking the arms of E five times, once for each
> > > COND_EXPR.
> > 
> > ...this is accurate.  I should have addressed the redundant folding in v2
> > even though the compilation

Re: [PATCH v3 1/2] c++: Initial support for P0847R7 (Deducing This) [PR102609]

2023-10-17 Thread Jason Merrill


On 9/25/23 21:56, waffl3x wrote:


Also, just a quick update on my copyright assignment, I have sent an
e-mail to the FSF and haven't gotten a response yet. From what I was
reading, I am confident that it's my preferred option going forward
though. Hopefully they get back to me soon.


Any progress on this, or do I need to coax the process along?  :)

Jason

[PATCH 11/11] aarch64: Add new load/store pair fusion pass.

2023-10-17 Thread Alex Coplan

This adds a new aarch64-specific RTL-SSA pass dedicated to forming load
and store pairs (LDPs and STPs).

As a motivating example for the kind of thing this improves, take the
following testcase:

extern double c[20];

double f(double x)
{
  double y = x*x;
  y += c[16];
  y += c[17];
  y += c[18];
  y += c[19];
  return y;
}

for which we currently generate (at -O2):

f:
adrp    x0, c
add     x0, x0, :lo12:c
ldp     d31, d29, [x0, 128]
ldr     d30, [x0, 144]
fmadd   d0, d0, d0, d31
ldr     d31, [x0, 152]
fadd    d0, d0, d29
fadd    d0, d0, d30
fadd    d0, d0, d31
ret

but with the pass, we generate:

f:
.LFB0:
adrp    x0, c
add     x0, x0, :lo12:c
ldp     d31, d29, [x0, 128]
fmadd   d0, d0, d0, d31
ldp     d30, d31, [x0, 144]
fadd    d0, d0, d29
fadd    d0, d0, d30
fadd    d0, d0, d31
ret

The pass is local (only considers a BB at a time).  In theory, it should
be possible to extend it to run over EBBs, at least in the case of pure
(MEM_READONLY_P) loads, but this is left for future work.

The pass works by identifying two kinds of bases: tree decls obtained
via MEM_EXPR, and RTL register bases in the form of RTL-SSA def_infos.
If a candidate memory access has a MEM_EXPR base, then we track it via
this base, and otherwise if it is of a simple reg + offset form, we track
it via the RTL-SSA def_info for the register.

For each BB, for a given kind of base, we build up a hash table mapping
the base to an access_group.  The access_group data structure holds a
list of accesses at each offset relative to the same base.  It uses a
splay tree to support efficient insertion (while walking the bb), and
the nodes are chained using a linked list to support efficient
iteration (while doing the transformation).

For each base, we then iterate over the access_group to identify
adjacent accesses, and try to form load/store pairs for those insns that
access adjacent memory.
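The grouping and pairing steps can be sketched as below, under heavy simplification: bases are plain strings rather than MEM_EXPR decls or RTL-SSA def_infos, a `std::map` keyed by offset stands in for the splay tree plus linked list, and `Access` and `find_pairs` are hypothetical names. Two accesses count as adjacent when they have the same size and the second starts where the first ends; the real pass must additionally check dependencies, registers and operand constraints, which this ignores.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One candidate memory access (hypothetical simplification of an insn).
struct Access {
  std::string base;     // stands in for a MEM_EXPR decl or RTL-SSA def_info
  std::int64_t offset;  // byte offset from the base
  std::int64_t size;    // access size in bytes
  int insn_id;          // identifies the instruction
};

// Groups the accesses of one basic block by base, orders each group by
// offset, and greedily pairs neighbours that touch adjacent memory.
std::vector<std::pair<int, int>> find_pairs(const std::vector<Access>& bb) {
  // base -> (offset -> first access seen at that offset)
  std::map<std::string, std::map<std::int64_t, Access>> groups;
  for (const Access& a : bb)
    groups[a.base].emplace(a.offset, a);

  std::vector<std::pair<int, int>> pairs;
  for (const auto& entry : groups) {
    const auto& group = entry.second;
    auto it = group.begin();
    while (it != group.end()) {
      auto next = std::next(it);
      if (next != group.end()
          && next->second.size == it->second.size
          && next->first == it->first + it->second.size) {
        pairs.emplace_back(it->second.insn_id, next->second.insn_id);
        it = std::next(next);  // both accesses are used by this pair
      } else {
        it = next;
      }
    }
  }
  return pairs;
}
```

With the `c[16]`..`c[19]` loads from the testcase (8-byte doubles at offsets 128, 136, 144 and 152 from `c`), the sketch pairs 128/136 and 144/152, matching the two `ldp` instructions in the improved output.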

The pass is currently run twice, both before and after register
allocation.  The first copy of the pass is run late in the pre-RA RTL
pipeline, immediately after sched1, since it was found that sched1 was
increasing register pressure when the pass was run before.  The second
copy of the pass runs immediately before peephole2, so as to get any
opportunities that the existing ldp/stp peepholes can handle.

There are some cases that we punt on before RA, e.g.
accesses relative to eliminable regs (such as the soft frame pointer).
We do this since we can't know the elimination offset before RA, and we
want to avoid the RA reloading the offset (due to being out of ldp/stp
immediate range) as this can generate worse code.

The post-RA copy of the pass is there to pick up the crumbs that were
left behind / things we punted on in the pre-RA pass.  Among other
things, it's needed to handle accesses relative to the stack pointer
(see the previous patch in the series for an example).  It can also
handle code that didn't exist at the time the pre-RA pass was run (spill
code, prologue/epilogue code).

The following table shows the effect of the passes on code size in
SPEC CPU 2017 with -Os -flto=auto -mcpu=neoverse-v1:

+-----------------+-------------+--------------+---------+
| Benchmark       | Pre-RA pass | Post-RA pass | Overall |
+-----------------+-------------+--------------+---------+
| 541.leela_r     | 0.04%       | -0.03%       | 0.01%   |
| 502.gcc_r       | -0.07%      | -0.02%       | -0.09%  |
| 510.parest_r    | -0.06%      | -0.04%       | -0.10%  |
| 505.mcf_r       | -0.12%      | 0.00%        | -0.12%  |
| 500.perlbench_r | -0.12%      | -0.02%       | -0.15%  |
| 520.omnetpp_r   | -0.13%      | -0.03%       | -0.16%  |
| 538.imagick_r   | -0.17%      | -0.02%       | -0.19%  |
| 525.x264_r      | -0.17%      | -0.02%       | -0.19%  |
| 544.nab_r       | -0.22%      | -0.01%       | -0.23%  |
| 557.xz_r        | -0.27%      | -0.01%       | -0.28%  |
| 507.cactuBSSN_r | -0.26%      | -0.03%       | -0.29%  |
| 526.blender_r   | -0.37%      | -0.02%       | -0.38%  |
| 523.xalancbmk_r | -0.41%      | -0.01%       | -0.42%  |
| 531.deepsjeng_r | -0.41%      | -0.05%       | -0.46%  |
| 511.povray_r    | -0.60%      | -0.05%       | -0.65%  |
| 548.exchange2_r | -0.55%      | -0.32%       | -0.86%  |
| 527.cam4_r      | -0.82%      | -0.16%       | -0.98%  |
| 503.bwaves_r    | -0.63%      | -0.41%       | -1.04%  |
| 521.wrf_r       | -1.04%      | -0.06%       | -1.10%  |
| 549.fotonik3d_r | -0.91%      | -0.35%       | -1.26%  |
| 554.roms_r      | -1.20%      | -0.20%       | -1.40%  |
| 519.lbm_r       | -1.91%      | 0.00%        | -1.91%  |
| 508.namd_r      | -2.40%      | -0.07%       | -2.47%  |
+-----------------+-------------+--------------+---------+
| SPEC CPU 2017   | -0.51%      | -0.05%       | -0.56%  |
+-----------------+-------------+--------------+---------+

Performance-wise, with

[PATCH 10/11] aarch64: Generalise TFmode load/store pair patterns

2023-10-17 Thread Alex Coplan

This patch generalises the TFmode load/store pair patterns to TImode and
TDmode.  This brings them in line with the DXmode patterns, and uses the
same technique with separate mode iterators (TX and TX2) to allow for
distinct modes in each arm of the load/store pair.

For example, in combination with the post-RA load/store pair fusion pass
in the following patch, this improves the codegen for the following
varargs testcase involving TImode stores:

void g(void *);
int foo(int x, ...)
{
  __builtin_va_list ap;
  __builtin_va_start (ap, x);
  g(&ap);
  __builtin_va_end (ap);
}

from:

foo:
.LFB0:
stp x29, x30, [sp, -240]!
.LCFI0:
mov w9, -56
mov w8, -128
mov x29, sp
add x10, sp, 176
stp x1, x2, [sp, 184]
add x1, sp, 240
add x0, sp, 16
stp x1, x1, [sp, 16]
str x10, [sp, 32]
stp w9, w8, [sp, 40]
str q0, [sp, 48]
str q1, [sp, 64]
str q2, [sp, 80]
str q3, [sp, 96]
str q4, [sp, 112]
str q5, [sp, 128]
str q6, [sp, 144]
str q7, [sp, 160]
stp x3, x4, [sp, 200]
stp x5, x6, [sp, 216]
str x7, [sp, 232]
bl  g
ldp x29, x30, [sp], 240
.LCFI1:
ret

to:

foo:
.LFB0:
stp x29, x30, [sp, -240]!
.LCFI0:
mov w9, -56
mov w8, -128
mov x29, sp
add x10, sp, 176
stp x1, x2, [sp, 184]
add x1, sp, 240
add x0, sp, 16
stp x1, x1, [sp, 16]
str x10, [sp, 32]
stp w9, w8, [sp, 40]
stp q0, q1, [sp, 48]
stp q2, q3, [sp, 80]
stp q4, q5, [sp, 112]
stp q6, q7, [sp, 144]
stp x3, x4, [sp, 200]
stp x5, x6, [sp, 216]
str x7, [sp, 232]
bl  g
ldp x29, x30, [sp], 240
.LCFI1:
ret

Note that this patch isn't needed if we only use the mode
canonicalization approach in the new ldp fusion pass (since we
canonicalize T{I,F,D}mode to V16QImode), but we seem to get slightly
better performance with mode canonicalization disabled (see
--param=aarch64-ldp-canonicalize-modes in the following patch).
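To make the TX/TX2 technique concrete: because the two arms use separate iterators, the generator produces one pattern per (TX, TX2) combination rather than only same-mode pairs. The sketch below uses hypothetical helper names, and its suffix strings are illustrative only, since the real pattern-name template is not shown here. It enumerates the combinations and models the insn condition as "second address = first address + mode size"; TI, TF and TD are all 16 bytes wide, so the offset is 16 whichever combination is matched.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Modes listed in both the TX and TX2 iterators, with their sizes in bytes.
struct Mode {
  std::string name;
  int bytes;
};
const std::vector<Mode> kTXModes = {{"ti", 16}, {"tf", 16}, {"td", 16}};

// Two independent iterators expand to every (first arm, second arm)
// combination.  The suffix strings are illustrative only.
std::vector<std::string> pattern_suffixes() {
  std::vector<std::string> out;
  for (const Mode& first : kTXModes)
    for (const Mode& second : kTXModes)
      out.push_back(first.name + second.name);
  return out;
}

// Models the insn condition: the second address must equal the first
// address plus the size of the first arm's mode.
bool second_slot_adjacent(long first_offset, long second_offset,
                          const Mode& first) {
  return second_offset == first_offset + first.bytes;
}
```

For instance, the `q0`/`q1` stores at `[sp, 48]` and `[sp, 64]` in the listing above satisfy this condition, which is what lets them become `stp q0, q1, [sp, 48]`.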

Bootstrapped/regtested as a series on aarch64-linux-gnu, OK for trunk?

gcc/ChangeLog:

* config/aarch64/aarch64.md (load_pair_dw_tftf): Rename to ...
(load_pair_dw_): ... this.
(store_pair_dw_tftf): Rename to ...
(store_pair_dw_): ... this.
* config/aarch64/iterators.md (TX2): New.
---
 gcc/config/aarch64/aarch64.md   | 22 +++---
 gcc/config/aarch64/iterators.md |  3 +++
 2 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 32c7adc8928..e6af09c2e8b 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1757,16 +1757,16 @@ (define_insn "load_pair_dw_"
   }
 )
 
-(define_insn "load_pair_dw_tftf"
-  [(set (match_operand:TF 0 "register_operand" "=w")
-	(match_operand:TF 1 "aarch64_mem_pair_operand" "Ump"))
-   (set (match_operand:TF 2 "register_operand" "=w")
-	(match_operand:TF 3 "memory_operand" "m"))]
+(define_insn "load_pair_dw_"
+  [(set (match_operand:TX 0 "register_operand" "=w")
+	(match_operand:TX 1 "aarch64_mem_pair_operand" "Ump"))
+   (set (match_operand:TX2 2 "register_operand" "=w")
+	(match_operand:TX2 3 "memory_operand" "m"))]
"TARGET_SIMD
 && rtx_equal_p (XEXP (operands[3], 0),
 		plus_constant (Pmode,
    XEXP (operands[1], 0),
-   GET_MODE_SIZE (TFmode)))"
+   GET_MODE_SIZE (mode)))"
   "ldp\\t%q0, %q2, %z1"
   [(set_attr "type" "neon_ldp_q")
(set_attr "fp" "yes")]
@@ -1805,11 +1805,11 @@ (define_insn "store_pair_dw_"
   }
 )
 
-(define_insn "store_pair_dw_tftf"
-  [(set (match_operand:TF 0 "aarch64_mem_pair_operand" "=Ump")
-	(match_operand:TF 1 "register_operand" "w"))
-   (set (match_operand:TF 2 "memory_operand" "=m")
-	(match_operand:TF 3 "register_operand" "w"))]
+(define_insn "store_pair_dw_"
+  [(set (match_operand:TX 0 "aarch64_mem_pair_operand" "=Ump")
+	(match_operand:TX 1 "register_operand" "w"))
+   (set (match_operand:TX2 2 "memory_operand" "=m")
+	(match_operand:TX2 3 "register_operand" "w"))]
"TARGET_SIMD &&
 rtx_equal_p (XEXP (operands[2], 0),
 		 plus_constant (Pmode,
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 2451d8c2cd8..f9e2210095e 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -319,6 +319,9 @@ (define_mode_iterator VS [V2SI V4SI])
 
 (define_mode_iterator TX [TI TF TD])
 
+;; Duplicate of the above
+(define_mode_iterator TX2 [TI TF TD])
+
 (define_mode_iterator VTX [TI TF TD V16QI V8HI V4SI V2DI V8HF V4SF V2DF V8BF])
 
 ;; Advanced SIMD opaque structure modes.

Re: [PATCH v3] c++: Fix compile-time-hog in cp_fold_immediate_r [PR111660]

2023-10-17 Thread Jason Merrill


On 10/16/23 20:39, Marek Polacek wrote:

On Sat, Oct 14, 2023 at 01:13:22AM -0400, Jason Merrill wrote:

On 10/13/23 14:53, Marek Polacek wrote:

On Thu, Oct 12, 2023 at 09:41:43PM -0400, Jason Merrill wrote:

On 10/12/23 17:04, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
My recent patch introducing cp_fold_immediate_r caused exponential
compile time with nested COND_EXPRs.  The problem is that the COND_EXPR
case recursively walks the arms of a COND_EXPR, but after processing
both arms it doesn't end the walk; it proceeds to walk the
sub-expressions of the outermost COND_EXPR, triggering again walking
the arms of the nested COND_EXPR, and so on.  This patch brings the
compile time down to about 0m0.033s.

I've added some debug prints to make sure that the rest of cp_fold_r
is still performed as before.
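The blow-up described above can be reproduced with a toy model of a walk_tree-style walker. `Node`, `walk` and `count_visits` are hypothetical names, not GCC's tree API. In the buggy mode the callback walks a conditional's arms itself and then lets the generic walk descend into them again; in the fixed mode it handles each operand once and clears `walk_subtrees`, which is the direction Jason suggests later in this thread. For a chain of d nested conditionals this sketch makes 5*2^d - 4 visits in the buggy mode versus 3d + 1 (one per node) in the fixed mode.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical toy node: either a leaf or a conditional with three
// operands (condition, "then" arm, "else" arm).  Not GCC's tree type.
struct Node {
  bool is_cond = false;
  std::vector<std::unique_ptr<Node>> ops;
};

// Build DEPTH nested conditionals: c ? leaf : (c ? leaf : (...)).
std::unique_ptr<Node> make_nested(int depth) {
  auto n = std::make_unique<Node>();
  if (depth == 0)
    return n;                                  // a leaf
  n->is_cond = true;
  n->ops.push_back(std::make_unique<Node>());  // op0: condition
  n->ops.push_back(std::make_unique<Node>());  // op1: "then" arm
  n->ops.push_back(make_nested(depth - 1));    // op2: nested "else" arm
  return n;
}

// walk_tree-style walk: "visit" NODE (the callback), then descend into
// its operands unless the callback cleared walk_subtrees.
void walk(const Node &node, bool clear_subtrees, long &visits) {
  ++visits;
  bool walk_subtrees = true;
  if (node.is_cond) {
    if (clear_subtrees) {
      // Fixed: handle every operand here exactly once, then stop the
      // generic walk from descending again.
      for (const auto &op : node.ops)
        walk(*op, clear_subtrees, visits);
      walk_subtrees = false;
    } else {
      // Buggy: walk the arms here, then fall through ("break;"), so the
      // generic walk below walks the same arms a second time.
      walk(*node.ops[1], clear_subtrees, visits);
      walk(*node.ops[2], clear_subtrees, visits);
    }
  }
  if (walk_subtrees)
    for (const auto &op : node.ops)
      walk(*op, clear_subtrees, visits);
}

long count_visits(int depth, bool clear_subtrees) {
  long visits = 0;
  walk(*make_nested(depth), clear_subtrees, visits);
  return visits;
}
```

For example, `count_visits (20, false)` makes 5242876 visits while `count_visits (20, true)` makes 61, one per node.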

   PR c++/111660

gcc/cp/ChangeLog:

   * cp-gimplify.cc (cp_fold_immediate_r) : Return
   integer_zero_node instead of break;.
   (cp_fold_immediate): Return true if cp_fold_immediate_r returned
   error_mark_node.

gcc/testsuite/ChangeLog:

   * g++.dg/cpp0x/hog1.C: New test.
---
gcc/cp/cp-gimplify.cc |  9 ++--
gcc/testsuite/g++.dg/cpp0x/hog1.C | 77 +++
2 files changed, 82 insertions(+), 4 deletions(-)
create mode 100644 gcc/testsuite/g++.dg/cpp0x/hog1.C

diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index bdf6e5f98ff..ca622ca169a 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -1063,16 +1063,16 @@ cp_fold_immediate_r (tree *stmt_p, int *walk_subtrees, void *data_)
break;
  if (TREE_OPERAND (stmt, 1)
  && cp_walk_tree (&TREE_OPERAND (stmt, 1), cp_fold_immediate_r, data,
-  nullptr))
+  nullptr) == error_mark_node)
return error_mark_node;
  if (TREE_OPERAND (stmt, 2)
  && cp_walk_tree (&TREE_OPERAND (stmt, 2), cp_fold_immediate_r, data,
-  nullptr))
+  nullptr) == error_mark_node)
return error_mark_node;
  /* We're done here.  Don't clear *walk_subtrees here though: we're called
 from cp_fold_r and we must let it recurse on the expression with
 cp_fold.  */
-  break;
+  return integer_zero_node;


I'm concerned this will end up missing something like

1 ? 1 : ((1 ? 1 : 1), immediate())

as the integer_zero_node from the inner ?: will prevent walk_tree from
looking any farther.
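Jason's concern rests on walk_tree's contract that a non-null return from the callback ends the whole walk. The toy model below (hypothetical `Expr`, `walk` and `fold_immediate`, with the strings "stop" and "error" standing in for integer_zero_node and error_mark_node) shows how a "stop" returned by the inner ?: hides the sibling `immediate()` call from the outer walk. It models the mechanism only; as the reply below notes, in the real front end it is a slightly different expression that exposes the problem.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy expression: a label plus operands.  For "?:" the
// operands are condition, "then" arm and "else" arm.
struct Expr {
  std::string label;      // e.g. "?:", ",", "1", "immediate()"
  std::vector<Expr> ops;
};

using Callback = const char *(*)(const Expr &);

// Toy walker following walk_tree's contract: as soon as the callback
// returns non-null, the walk stops and propagates that value.
const char *walk(const Expr &e, Callback cb) {
  if (const char *r = cb(e))
    return r;
  for (const Expr &op : e.ops)
    if (const char *r = walk(op, cb))
      return r;
  return nullptr;
}

// Callback shaped like the v1 patch: for a conditional, walk the arms
// here, report "error" if an immediate call was found, and otherwise
// return a non-null "stop" (the integer_zero_node stand-in).
const char *fold_immediate(const Expr &e) {
  if (e.label == "immediate()")
    return "error";
  if (e.label == "?:") {
    for (std::size_t i = 1; i < e.ops.size(); ++i) {
      const char *r = walk(e.ops[i], fold_immediate);
      if (r && std::string(r) == "error")
        return "error";
    }
    return "stop";
  }
  return nullptr;
}

// 1 ? 1 : ((1 ? 1 : 1), immediate())
Expr nested_example() {
  Expr one{"1", {}};
  Expr inner{"?:", {one, one, one}};
  Expr comma{",", {inner, Expr{"immediate()", {}}}};
  return Expr{"?:", {one, one, comma}};
}

// 1 ? 1 : (1, immediate())
Expr flat_example() {
  Expr one{"1", {}};
  Expr comma{",", {one, Expr{"immediate()", {}}}};
  return Expr{"?:", {one, one, comma}};
}
```

In this model the flat expression reports "error", but the nested one only reports "stop": the inner ?: ends the walk of the comma expression before `immediate()` is visited.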


You are right.  The line above works as expected, but

1 ? 1 : ((1 ? 1 : id (42)), id (i));

shows the problem (when the expression isn't used as an initializer).


Maybe we want to handle COND_EXPR in cp_fold_r instead of here?


I hope this version is better.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
My recent patch introducing cp_fold_immediate_r caused exponential
compile time with nested COND_EXPRs.  The problem is that the COND_EXPR
case recursively walks the arms of a COND_EXPR, but after processing
both arms it doesn't end the walk; it proceeds to walk the
sub-expressions of the outermost COND_EXPR, triggering again walking
the arms of the nested COND_EXPR, and so on.  This patch brings the
compile time down to about 0m0.033s.


Is this number still accurate for this version?


It is.  I ran time(1) a few more times and the results were 0m0.033s - 0m0.035s.
That said, ...


This change seems algorithmically better than the current code, but still
problematic: if we have nested COND_EXPR A/B/C/D/E, it looks like we will
end up cp_fold_immediate_r walking the arms of E five times, once for each
COND_EXPR.


...this is accurate.  I should have addressed the redundant folding in v2
even though the compilation is pretty much immediate.
  

What I was thinking by handling COND_EXPR in cp_fold_r was to cp_fold_r walk
its subtrees (or cp_fold_immediate_r if it's clear from op0 that the branch
isn't taken) so we can clear *walk_subtrees and we don't fold_imm walk a
node more than once.


I agree I should do better here.  How's this, then?  I've added
debug_generic_expr to cp_fold_immediate_r to see if it gets the same
expr multiple times and it doesn't seem to.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
My recent patch introducing cp_fold_immediate_r caused exponential
compile time with nested COND_EXPRs.  The problem is that the COND_EXPR
case recursively walks the arms of a COND_EXPR, but after processing
both arms it doesn't end the walk; it proceeds to walk the
sub-expressions of the outermost COND_EXPR, triggering again walking
the arms of the nested COND_EXPR, and so on.  This patch brings the
compile time down to about 0m0.030s.

The ff_fold_immediate flag is unused after this patch but since I'm
using it in the P2564 patch, I'm not removing it now.  Maybe at_eof
can be used instead and then we can remove

[PATCH 09/11] aarch64, testsuite: Fix up pr71727.c

2023-10-17 Thread Alex Coplan

The test is trying to check that we don't use q-register stores with
-mstrict-align, so actually check specifically for that.

This is a prerequisite to avoid regressing:

scan-assembler-not "add\tx0, x0, :"

with the upcoming ldp fusion pass, as we change where the ldps are
formed such that a register is used rather than a symbolic (lo_sum)
address for the first load.

Bootstrapped/regtested as a series on aarch64-linux-gnu, OK for trunk?

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/pr71727.c: Adjust scan-assembler-not to
make sure we don't have q-register stores with -mstrict-align.
---
 gcc/testsuite/gcc.target/aarch64/pr71727.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/aarch64/pr71727.c b/gcc/testsuite/gcc.target/aarch64/pr71727.c
index 41fa72bc67e..226258a76fe 100644
--- a/gcc/testsuite/gcc.target/aarch64/pr71727.c
+++ b/gcc/testsuite/gcc.target/aarch64/pr71727.c
@@ -30,4 +30,4 @@ _start (void)
 }
 
 /* { dg-final { scan-assembler-times "mov\tx" 5 {target lp64} } } */
-/* { dg-final { scan-assembler-not "add\tx0, x0, :" {target lp64} } } */
+/* { dg-final { scan-assembler-not {st[rp]\tq[0-9]+} {target lp64} } } */
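The new pattern only fires on an `str` or `stp` whose first operand is a q register. As a quick sanity check, std::regex can stand in for DejaGnu here. This is an assumption: DejaGnu evaluates the pattern as a Tcl regular expression, but for this simple pattern the two grammars agree. `has_q_store` is a hypothetical helper.

```cpp
#include <cassert>
#include <regex>
#include <string>

// The new scan-assembler-not pattern from pr71727.c, evaluated with
// std::regex's default ECMAScript grammar.  "\t" in the pattern matches
// the tab between mnemonic and operands.
bool has_q_store(const std::string &assembly) {
  static const std::regex q_store(R"(st[rp]\tq[0-9]+)");
  return std::regex_search(assembly, q_store);
}
```

Q-register stores such as `str q0, [x0]` and `stp q0, q1, [x0]` match. X-register stores, q-register loads, and the old `add x0, x0, :lo12:...` form do not.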

[PATCH 08/11] aarch64, testsuite: Tweak sve/pcs/args_9.c to allow stps

2023-10-17 Thread Alex Coplan

With the new ldp/stp pass enabled, there is a change in the codegen for
this test as follows:

add x8, sp, 16
ptrue   p3.h, mul3
str p3, [x8]
-   str x8, [sp, 8]
-   str x9, [sp]
+   stp x9, x8, [sp]
ptrue   p3.d, vl8
ptrue   p2.s, vl7
ptrue   p1.h, vl6

i.e. we now form an stp that we were missing previously. This patch
adjusts the scan-assembler such that it should pass whether or not
we form the stp.

Bootstrapped/regtested as a series on aarch64-linux-gnu, OK for trunk?

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/pcs/args_9.c: Adjust scan-assemblers to
allow for stp.
---
 gcc/testsuite/gcc.target/aarch64/sve/pcs/args_9.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/args_9.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/args_9.c
index ad9affadf02..942a44ab448 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/args_9.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/args_9.c
@@ -45,5 +45,5 @@ caller (int64_t *x0, int16_t *x1, svbool_t p0)
   return svcntp_b8 (res, res);
 }
 
-/* { dg-final { scan-assembler {\tptrue\t(p[0-9]+)\.b, mul3\n\tstr\t\1, \[(x[0-9]+)\]\n.*\tstr\t\2, \[sp\]\n} } } */
-/* { dg-final { scan-assembler {\tptrue\t(p[0-9]+)\.h, mul3\n\tstr\t\1, \[(x[0-9]+)\]\n.*\tstr\t\2, \[sp, 8\]\n} } } */
+/* { dg-final { scan-assembler {\tptrue\t(p[0-9]+)\.b, mul3\n\tstr\t\1, \[(x[0-9]+)\]\n.*\t(?:str\t\2, \[sp\]|stp\t\2, x[0-9]+, \[sp\])\n} } } */
+/* { dg-final { scan-assembler {\tptrue\t(p[0-9]+)\.h, mul3\n\tstr\t\1, \[(x[0-9]+)\]\n.*\t(?:str\t\2, \[sp, 8\]|stp\tx[0-9]+, \2, \[sp\])\n} } } */
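The updated .h pattern accepts either the old `str xN, [sp, 8]` or an `stp` that stores the same register (the `\2` backreference) in the second, sp + 8, slot. The sketch below checks this against the two codegen sequences quoted above. It uses std::regex as a stand-in for DejaGnu's Tcl regexps, and `matches_h_pattern` is a hypothetical helper.

```cpp
#include <cassert>
#include <regex>
#include <string>

// The updated ".h" pattern from sve/pcs/args_9.c.  Capture 1 is the
// predicate register, capture 2 the x register holding its address.
// Note: in Tcl regexps "." also matches a newline, while in std::regex's
// ECMAScript grammar it does not; the samples below put the store line
// directly after the "str pN" line, so ".*" only needs to match empty.
bool matches_h_pattern(const std::string &assembly) {
  static const std::regex pattern(
      R"re(\tptrue\t(p[0-9]+)\.h, mul3\n\tstr\t\1, \[(x[0-9]+)\]\n.*\t(?:str\t\2, \[sp, 8\]|stp\tx[0-9]+, \2, \[sp\])\n)re");
  return std::regex_search(assembly, pattern);
}
```

Both the old `str x8, [sp, 8]` sequence and the new `stp x9, x8, [sp]` sequence match. An `stp` with x8 in the first slot does not, because x8 then lands at [sp] rather than [sp, 8].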

[PATCH 07/11] aarch64, testsuite: Prevent stp in lr_free_1.c

2023-10-17 Thread Alex Coplan

The test is looking for individual stores which could be merged
into stp instructions.  The test currently passes -fno-schedule-fusion
-fno-peephole2, presumably to prevent these stores from being turned
into stps, but this is no longer sufficient with the new ldp/stp fusion
pass.

As such, we add --param=aarch64-stp-policy=never to prevent stps being
formed.

Bootstrapped/regtested as a series on aarch64-linux-gnu, OK for trunk?

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/lr_free_1.c: Add
--param=aarch64-stp-policy=never to dg-options.
---
 gcc/testsuite/gcc.target/aarch64/lr_free_1.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/aarch64/lr_free_1.c b/gcc/testsuite/gcc.target/aarch64/lr_free_1.c
index 50dcf04e697..9949061096e 100644
--- a/gcc/testsuite/gcc.target/aarch64/lr_free_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/lr_free_1.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-fno-inline -O2 -fomit-frame-pointer -ffixed-x2 -ffixed-x3 -ffixed-x4 -ffixed-x5 -ffixed-x6 -ffixed-x7 -ffixed-x8 -ffixed-x9 -ffixed-x10 -ffixed-x11 -ffixed-x12 -ffixed-x13 -ffixed-x14 -ffixed-x15 -ffixed-x16 -ffixed-x17 -ffixed-x18 -ffixed-x19 -ffixed-x20 -ffixed-x21 -ffixed-x22 -ffixed-x23 -ffixed-x24 -ffixed-x25 -ffixed-x26 -ffixed-x27 -ffixed-28 -ffixed-29 --save-temps -mgeneral-regs-only -fno-ipa-cp -fno-schedule-fusion -fno-peephole2" } */
+/* { dg-options "-fno-inline -O2 -fomit-frame-pointer -ffixed-x2 -ffixed-x3 -ffixed-x4 -ffixed-x5 -ffixed-x6 -ffixed-x7 -ffixed-x8 -ffixed-x9 -ffixed-x10 -ffixed-x11 -ffixed-x12 -ffixed-x13 -ffixed-x14 -ffixed-x15 -ffixed-x16 -ffixed-x17 -ffixed-x18 -ffixed-x19 -ffixed-x20 -ffixed-x21 -ffixed-x22 -ffixed-x23 -ffixed-x24 -ffixed-x25 -ffixed-x26 -ffixed-x27 -ffixed-28 -ffixed-29 --save-temps -mgeneral-regs-only -fno-ipa-cp -fno-schedule-fusion -fno-peephole2 --param=aarch64-stp-policy=never" } */
 
 extern void abort ();

[PATCH 06/11] haifa-sched: Allow for NOTE_INSN_DELETED at start of epilogue

2023-10-17 Thread Alex Coplan

haifa-sched.cc:remove_notes asserts that it lands on a real (non-note)
insn after advancing past NOTE_INSN_EPILOGUE_BEG, but with the upcoming
post-RA aarch64 load pair pass enabled, we can land on
NOTE_INSN_DELETED.

This patch adjusts remove_notes to remove these if they occur at the
start of the epilogue instead of asserting.

Bootstrapped/regtested as a series on aarch64-linux-gnu, OK for trunk?

gcc/ChangeLog:

* haifa-sched.cc (remove_notes): Allow for NOTE_INSN_DELETED at
the start of the epilogue, remove these.
---
 gcc/haifa-sched.cc | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/gcc/haifa-sched.cc b/gcc/haifa-sched.cc
index 8e8add709b3..9f45528fbe9 100644
--- a/gcc/haifa-sched.cc
+++ b/gcc/haifa-sched.cc
@@ -4249,6 +4249,17 @@ remove_notes (rtx_insn *head, rtx_insn *tail)
 		  && NOTE_KIND (next) == NOTE_INSN_BASIC_BLOCK
 		  && next != next_tail)
 		next = NEXT_INSN (next);
+
+	  /* Skip over any NOTE_INSN_DELETED at the start of the epilogue.
+	   */
+	  while (NOTE_P (next)
+		 && NOTE_KIND (next) == NOTE_INSN_DELETED)
+		{
+		  auto tmp = NEXT_INSN (next);
+		  delete_insn (next);
+		  next = tmp;
+		}
+
 	  gcc_assert (INSN_P (next));
 	  add_reg_note (next, REG_SAVE_NOTE,
 			GEN_INT (NOTE_INSN_EPILOGUE_BEG));
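The loop added to remove_notes can be pictured on a simplified insn stream. This sketch uses a std::list of hypothetical `Kind` values instead of rtx_insn chains, and it erases notes instead of calling delete_insn, so it shows the control flow only.

```cpp
#include <cassert>
#include <list>

// Hypothetical, much-simplified insn kinds; real GCC insns are rtx_insn
// objects linked by NEXT_INSN/PREV_INSN.
enum class Kind { Insn, NoteBasicBlock, NoteDeleted };

// Starting at NEXT (just after the epilogue-begin note), step past basic
// block notes, then remove any deleted-insn notes, and return the first
// real insn, which must exist (the gcc_assert in the patch).
std::list<Kind>::iterator
skip_to_epilogue_insn(std::list<Kind> &insns, std::list<Kind>::iterator next) {
  while (next != insns.end() && *next == Kind::NoteBasicBlock)
    ++next;
  while (next != insns.end() && *next == Kind::NoteDeleted)
    next = insns.erase(next);  // erase returns the following element
  assert(next != insns.end() && *next == Kind::Insn);
  return next;
}
```

Without the second loop, a deleted-insn note left behind by the new pass would be what the assertion sees.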

[PATCH 05/11] rtl-ssa: Support for inserting new insns

2023-10-17 Thread Alex Coplan

The upcoming aarch64 load pair pass needs to form store pairs, and can
re-order stores over loads when alias analysis determines this is safe.
In the case that both mem defs have uses in the RTL-SSA IR, and both
stores require re-ordering over their uses, we represent that as
(tentative) deletion of the original store insns and creation of a new
insn, to prevent requiring repeated re-parenting of uses during the
pass.  We then update all mem uses that require re-parenting in one go
at the end of the pass.

To support this, RTL-SSA needs to handle inserting new insns (rather
than just changing existing ones), so this patch adds support for that.

New insns (and new accesses) are temporaries, allocated above a temporary
obstack_watermark, such that the user can easily back out of a change without
awkward bookkeeping.

Bootstrapped/regtested as a series on aarch64-linux-gnu, OK for trunk?

gcc/ChangeLog:

* rtl-ssa/accesses.cc (function_info::create_set): New.
* rtl-ssa/accesses.h (access_info::is_temporary): New.
* rtl-ssa/changes.cc (move_insn): Handle new (temporary) insns.
(function_info::finalize_new_accesses): Handle new/temporary
user-created accesses.
(function_info::apply_changes_to_insn): Ensure m_is_temp flag
on new insns gets cleared.
(function_info::change_insns): Handle new/temporary insns.
(function_info::create_insn): New.
* rtl-ssa/changes.h (class insn_change): Make function_info a
friend class.
* rtl-ssa/functions.h (function_info): Declare new entry points:
create_set, create_insn.  Declare new change_alloc helper.
* rtl-ssa/insns.cc (insn_info::print_full): Identify temporary insns in
dump.
* rtl-ssa/insns.h (insn_info): Add new m_is_temp flag and accompanying
is_temporary accessor.
* rtl-ssa/internals.inl (insn_info::insn_info): Initialize m_is_temp to
false.
* rtl-ssa/member-fns.inl (function_info::change_alloc): New.
* rtl-ssa/movement.h (restrict_movement_for_defs_ignoring): Add
handling for temporary defs.
---
 gcc/rtl-ssa/accesses.cc| 10 ++
 gcc/rtl-ssa/accesses.h |  4 +++
 gcc/rtl-ssa/changes.cc | 73 +++---
 gcc/rtl-ssa/changes.h  |  2 ++
 gcc/rtl-ssa/functions.h| 14 
 gcc/rtl-ssa/insns.cc   |  5 +++
 gcc/rtl-ssa/insns.h|  7 +++-
 gcc/rtl-ssa/internals.inl  |  1 +
 gcc/rtl-ssa/member-fns.inl | 12 +++
 gcc/rtl-ssa/movement.h |  8 -
 10 files changed, 122 insertions(+), 14 deletions(-)

diff --git a/gcc/rtl-ssa/accesses.cc b/gcc/rtl-ssa/accesses.cc
index 774ab9d99ee..edf8b75f4d6 100644
--- a/gcc/rtl-ssa/accesses.cc
+++ b/gcc/rtl-ssa/accesses.cc
@@ -1398,6 +1398,16 @@ function_info::make_uses_available (obstack_watermark &watermark,
   return use_array (new_uses, num_uses);
 }
 
+set_info *
+function_info::create_set (obstack_watermark &watermark,
+			   insn_info *insn,
+			   resource_info resource)
+{
+  auto set = change_alloc (watermark, insn, resource);
+  set->m_is_temp = true;
+  return set;
+}
+
 // Return true if ACCESS1 can represent ACCESS2 and if ACCESS2 can
 // represent ACCESS1.
 static bool
diff --git a/gcc/rtl-ssa/accesses.h b/gcc/rtl-ssa/accesses.h
index fce31d46717..7e7a90ece97 100644
--- a/gcc/rtl-ssa/accesses.h
+++ b/gcc/rtl-ssa/accesses.h
@@ -204,6 +204,10 @@ public:
   // in the main instruction pattern.
   bool only_occurs_in_notes () const { return m_only_occurs_in_notes; }
 
+  // Return true if this is a temporary access, e.g. one created for
+  // an insn that is about to be inserted.
+  bool is_temporary () const { return m_is_temp; }
+
 protected:
   access_info (resource_info, access_kind);
 
diff --git a/gcc/rtl-ssa/changes.cc b/gcc/rtl-ssa/changes.cc
index 523ad60d7d8..b11a88e0919 100644
--- a/gcc/rtl-ssa/changes.cc
+++ b/gcc/rtl-ssa/changes.cc
@@ -345,14 +345,20 @@ move_insn (insn_change &change, insn_info *after)
   // At the moment we don't support moving instructions between EBBs,
   // but this would be worth adding if it's useful.
   insn_info *insn = change.insn ();
-  gcc_assert (after->ebb () == insn->ebb ());
+
   bb_info *bb = after->bb ();
   basic_block cfg_bb = bb->cfg_bb ();
 
-  if (insn->bb () != bb)
-// Force DF to mark the old block as dirty.
-df_insn_delete (rtl);
-  ::remove_insn (rtl);
+  if (!insn->is_temporary ())
+{
+  gcc_assert (after->ebb () == insn->ebb ());
+
+  if (insn->bb () != bb)
+	// Force DF to mark the old block as dirty.
+	df_insn_delete (rtl);
+  ::remove_insn (rtl);
+}
+
   ::add_insn_after (rtl, after_rtl, cfg_bb);
 }
 
@@ -390,10 +396,15 @@ function_info::finalize_new_accesses (insn_change &change, insn_info *pos)
 	gcc_assert (def);
 	if (def->m_is_temp)
 	  {
-	// At present, the only temporary instruction definitions we
-	// create are clobbers, such as those added during recog.
-	gcc_assert (is_a (def));
-	def = allocate (change.insn
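The idea of allocating new insns and accesses above a watermark, so that an abandoned change needs no manual cleanup, can be sketched with a toy RAII arena. `Arena`, `Watermark` and `TempInsn` are hypothetical and do not reflect GCC's obstack_watermark interface. Here `keep()` also clears the temporary flag, loosely mirroring how apply_changes_to_insn clears m_is_temp on committed insns.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a new insn created during a tentative change.
struct TempInsn {
  std::string name;
  bool is_temporary = true;
};

// Toy arena: objects created after a Watermark is taken are discarded
// when the Watermark goes out of scope, unless keep() was called.
class Arena {
public:
  void create(std::string name) { items_.push_back(TempInsn{std::move(name)}); }
  std::size_t size() const { return items_.size(); }
  const TempInsn &at(std::size_t i) const { return items_.at(i); }

  class Watermark {
  public:
    explicit Watermark(Arena &arena) : arena_(arena), mark_(arena.size()) {}
    Watermark(const Watermark &) = delete;
    Watermark &operator=(const Watermark &) = delete;
    ~Watermark() {
      if (!kept_)
        arena_.items_.resize(mark_);  // back out everything above the mark
    }
    // Commit: keep the new objects and mark them as no longer temporary.
    void keep() {
      kept_ = true;
      for (std::size_t i = mark_; i < arena_.items_.size(); ++i)
        arena_.items_[i].is_temporary = false;
    }

  private:
    Arena &arena_;
    std::size_t mark_;
    bool kept_ = false;
  };

private:
  std::vector<TempInsn> items_;
};
```

Abandoning a change is then just letting the watermark go out of scope; no per-object bookkeeping is needed.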

[PATCH 04/11] rtl-ssa: Support inferring uses of mem in change_insns

2023-10-17 Thread Alex Coplan

Currently, rtl_ssa::change_insns requires all new uses and defs to be
specified explicitly.  This turns out to be rather inconvenient for
forming load pairs in the new aarch64 load pair pass, as the pass has to
determine which mem def the final load pair consumes, and then obtain or
create a suitable use (i.e. significant bookkeeping, just to keep the
RTL-SSA IR consistent).  It turns out to be much more convenient to
allow change_insns to infer which def is consumed and create a suitable
use of mem itself.  This patch does that.

Bootstrapped/regtested as a series on aarch64-linux-gnu, OK for trunk?

gcc/ChangeLog:

* rtl-ssa/changes.cc (function_info::finalize_new_accesses): Add new
parameter to give final insn position, infer use of mem if it isn't
specified explicitly.
(function_info::change_insns): Pass down final insn position to
finalize_new_accesses.
* rtl-ssa/functions.h: Add parameter to finalize_new_accesses.
---
 gcc/rtl-ssa/changes.cc  | 31 ---
 gcc/rtl-ssa/functions.h |  2 +-
 2 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/gcc/rtl-ssa/changes.cc b/gcc/rtl-ssa/changes.cc
index c48ddd2463c..523ad60d7d8 100644
--- a/gcc/rtl-ssa/changes.cc
+++ b/gcc/rtl-ssa/changes.cc
@@ -370,8 +370,11 @@ update_insn_in_place (insn_change &change)
 // Finalize the new list of definitions and uses in CHANGE, removing
 // any uses and definitions that are no longer needed, and converting
 // pending clobbers into actual definitions.
+//
+// POS gives the final position of INSN, which hasn't yet been moved into
+// place.
 void
-function_info::finalize_new_accesses (insn_change &change)
+function_info::finalize_new_accesses (insn_change &change, insn_info *pos)
 {
   insn_info *insn = change.insn ();
 
@@ -462,13 +465,34 @@ function_info::finalize_new_accesses (insn_change &change)
   // Add (possibly temporary) uses to m_temp_uses for each resource.
   // If there are multiple references to the same resource, aggregate
   // information in the modes and flags.
+  use_info *mem_use = nullptr;
   for (rtx_obj_reference ref : properties.refs ())
 if (ref.is_read ())
   {
 	unsigned int regno = ref.regno;
 	machine_mode mode = ref.is_reg () ? ref.mode : BLKmode;
 	use_info *use = find_access (unshared_uses, ref.regno);
-	gcc_assert (use);
+	if (!use)
+	  {
+	// For now, we only support inferring uses of mem.
+	gcc_assert (regno == MEM_REGNO);
+
+	if (mem_use)
+	  {
+		mem_use->record_reference (ref, false);
+		continue;
+	  }
+
+	resource_info resource { mode, regno };
+	auto def = find_def (resource, pos).prev_def (pos);
+	auto set = safe_dyn_cast  (def);
+	gcc_assert (set);
+	mem_use = allocate (insn, resource, set);
+	mem_use->record_reference (ref, true);
+	m_temp_uses.safe_push (mem_use);
+	continue;
+	  }
+
 	if (use->m_has_been_superceded)
 	  {
 	// This is the first reference to the resource.
@@ -656,7 +680,8 @@ function_info::change_insns (array_slice changes)
 
 	  // Finalize the new list of accesses for the change.  Don't install
 	  // them yet, so that we still have access to the old lists below.
-	  finalize_new_accesses (change);
+	  finalize_new_accesses (change,
+ placeholder ? placeholder : insn);
 	}
   placeholders[i] = placeholder;
 }
diff --git a/gcc/rtl-ssa/functions.h b/gcc/rtl-ssa/functions.h
index d7da9774213..73690a0e63b 100644
--- a/gcc/rtl-ssa/functions.h
+++ b/gcc/rtl-ssa/functions.h
@@ -265,7 +265,7 @@ private:
 
   insn_info *add_placeholder_after (insn_info *);
   void possibly_queue_changes (insn_change &);
-  void finalize_new_accesses (insn_change &);
+  void finalize_new_accesses (insn_change &, insn_info *);
   void apply_changes_to_insn (insn_change &);
 
   void init_function_data ();
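finalize_new_accesses now needs POS because an inferred mem use must consume the closest memory definition before the insn's final position, not its current one. The sketch below makes that concrete under one assumption: memory definitions are represented only by sorted integer program points, whereas the real code uses find_def and prev_def on RTL-SSA accesses. `mem_def_before` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <optional>
#include <vector>

// MEM_DEF_POINTS holds the sorted program points of insns that define
// memory.  A load placed at POS reads from the last definition strictly
// before POS.  If there is none, return nullopt (the patch instead
// asserts that a suitable set was found).
std::optional<int> mem_def_before(const std::vector<int> &mem_def_points,
                                  int pos) {
  // First definition at or after POS; the one before it is the answer.
  auto it = std::lower_bound(mem_def_points.begin(), mem_def_points.end(), pos);
  if (it == mem_def_points.begin())
    return std::nullopt;
  return *std::prev(it);
}
```

With definitions at points 2, 5 and 9, a load that ends up at point 6 consumes the definition at 5, but the same load moved to point 10 would consume the one at 9. The inferred use therefore depends on the final position.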

[PATCH 03/11] rtl-ssa: Add entry point to allow re-parenting uses

2023-10-17 Thread Alex Coplan

This is needed by the upcoming aarch64 load pair pass, as it can
re-order stores (when alias analysis determines this is safe) and thus
change which mem def a given use consumes (in the RTL-SSA view, there is
no alias disambiguation of memory).

Bootstrapped/regtested as a series on aarch64-linux-gnu, OK for trunk?

gcc/ChangeLog:

* rtl-ssa/accesses.cc (function_info::reparent_use): New.
* rtl-ssa/functions.h (function_info): Declare new member
function reparent_use.
---
 gcc/rtl-ssa/accesses.cc | 8 
 gcc/rtl-ssa/functions.h | 3 +++
 2 files changed, 11 insertions(+)

diff --git a/gcc/rtl-ssa/accesses.cc b/gcc/rtl-ssa/accesses.cc
index f12b5f4dd77..774ab9d99ee 100644
--- a/gcc/rtl-ssa/accesses.cc
+++ b/gcc/rtl-ssa/accesses.cc
@@ -1239,6 +1239,14 @@ function_info::add_use (use_info *use)
 insert_use_before (use, neighbor->value ());
 }
 
+void
+function_info::reparent_use (use_info *use, set_info *new_def)
+{
+  remove_use (use);
+  use->set_def (new_def);
+  add_use (use);
+}
+
 // If USE has a known definition, remove USE from that definition's list
 // of uses.  Also remove if it from the associated splay tree, if any.
 void
diff --git a/gcc/rtl-ssa/functions.h b/gcc/rtl-ssa/functions.h
index 8b53b264064..d7da9774213 100644
--- a/gcc/rtl-ssa/functions.h
+++ b/gcc/rtl-ssa/functions.h
@@ -159,6 +159,9 @@ public:
   // Like change_insns, but for a single change CHANGE.
   void change_insn (insn_change &change);
 
+  // Given a use USE, re-parent it to get its def from NEW_DEF.
+  void reparent_use (use_info *use, set_info *new_def);
+
   // If the changes that have been made to instructions require updates
   // to the CFG, perform those updates now.  Return true if something changed.
   // If it did:

[PATCH 02/11] rtl-ssa: Add drop_memory_access helper

2023-10-17 Thread Alex Coplan

Add a helper routine to access-utils.h which removes the memory access
from an access_array, if it has one.

Bootstrapped/regtested as a series on aarch64-linux-gnu, OK for trunk?

gcc/ChangeLog:

* rtl-ssa/access-utils.h (drop_memory_access): New.
---
 gcc/rtl-ssa/access-utils.h | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/gcc/rtl-ssa/access-utils.h b/gcc/rtl-ssa/access-utils.h
index fb2c2d3..0c108b18bb8 100644
--- a/gcc/rtl-ssa/access-utils.h
+++ b/gcc/rtl-ssa/access-utils.h
@@ -51,6 +51,17 @@ memory_access (T accesses) -> decltype (accesses[0])
   return nullptr;
 }
 
+template<typename T>
+inline T
+drop_memory_access (T accesses)
+{
+  if (!memory_access (accesses))
+return accesses;
+
+  access_array arr (accesses);
+  return T (arr.begin (), accesses.size () - 1);
+}
+
 // If sorted array ACCESSES includes a reference to REGNO, return the
 // access, otherwise return null.
 template
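drop_memory_access relies on the memory access, if present, being the last element of a sorted access array, as the `accesses.size () - 1` in the patch shows. The sketch below works over plain sorted register numbers. The hypothetical `kMemRegno` stands in for the memory pseudo-register and is chosen so that it sorts last.

```cpp
#include <cassert>
#include <climits>
#include <vector>

// Hypothetical number for "memory"; the only property relied on here is
// that it sorts after every real register number.
constexpr unsigned int kMemRegno = UINT_MAX;

bool has_memory_access(const std::vector<unsigned int> &sorted_regnos) {
  return !sorted_regnos.empty() && sorted_regnos.back() == kMemRegno;
}

// Same shape as drop_memory_access: if there is a memory access, it is
// the last element, so the result is everything except the last entry.
std::vector<unsigned int>
drop_memory_access(const std::vector<unsigned int> &sorted_regnos) {
  if (!has_memory_access(sorted_regnos))
    return sorted_regnos;
  return std::vector<unsigned int>(sorted_regnos.begin(),
                                   sorted_regnos.end() - 1);
}
```

Arrays without a memory access come back unchanged.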

[PATCH 01/11] rtl-ssa: Fix bug in function_info::add_insn_after

2023-10-17 Thread Alex Coplan

In the case that !insn->is_debug_insn () && next->is_debug_insn (), this
function was missing an update of the prev pointer on the first nondebug
insn following the sequence of debug insns starting at next.

This can lead to corruption of the insn chain, in that we end up with:

  insn->next_any_insn ()->prev_any_insn () != insn

in this case.  This patch fixes that.

Bootstrapped/regtested as a series on aarch64-linux-gnu, OK for trunk?

gcc/ChangeLog:

* rtl-ssa/insns.cc (function_info::add_insn_after): Ensure we
update the prev pointer on the following nondebug insn in the
case that !insn->is_debug_insn () && next->is_debug_insn ().
---
 gcc/rtl-ssa/insns.cc | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/gcc/rtl-ssa/insns.cc b/gcc/rtl-ssa/insns.cc
index a0c2fec2b70..f970375d906 100644
--- a/gcc/rtl-ssa/insns.cc
+++ b/gcc/rtl-ssa/insns.cc
@@ -291,9 +291,17 @@ function_info::add_insn_after (insn_info *insn, insn_info *after)
 	  first->set_last_debug_insn (insn);
 	}
   else // !insn->is_debug_insn () && next->is_debug_insn ()
-	// At present we don't (need to) support inserting a nondebug
-	// instruction between two existing debug instructions.
-	gcc_assert (!after->is_debug_insn ());
+	{
+	  // At present we don't (need to) support inserting a nondebug
+	  // instruction between two existing debug instructions.
+	  gcc_assert (!after->is_debug_insn ());
+
+	  // Find the next nondebug insn and update its previous pointer
+	  // to point to INSN.
+	  auto next_nondebug = next->last_debug_insn ()->next_any_insn ();
+	  gcc_checking_assert (!next_nondebug->is_debug_insn ());
+	  next_nondebug->set_prev_sametype_insn (insn);
+	}
 
   // If AFTER and NEXT are separated by at least two points, we can
   // use a unique point number for INSN.  Otherwise INSN will have
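The invariant being restored is that, after a nondebug insn is inserted in front of a run of debug insns, the next nondebug insn's back-link points at the new insn. The sketch below uses a hypothetical `Insn` with only a next-any link and a prev-nondebug link. rtl-ssa's insn_info encodes these links differently.

```cpp
#include <cassert>

// Hypothetical, simplified insn: a link to the next insn of any kind and,
// for nondebug insns, a link to the previous nondebug insn.
struct Insn {
  bool is_debug = false;
  Insn *next_any = nullptr;
  Insn *prev_nondebug = nullptr;
};

// Insert nondebug INSN directly after nondebug AFTER.
void insert_nondebug_after(Insn *insn, Insn *after) {
  assert(!insn->is_debug && !after->is_debug);
  insn->next_any = after->next_any;
  after->next_any = insn;
  insn->prev_nondebug = after;
  // Skip any run of debug insns that now follows INSN and redirect the
  // back-link of the first nondebug insn after it.  For the
  // debug-run case, this is the update the patch adds.
  Insn *next_nondebug = insn->next_any;
  while (next_nondebug && next_nondebug->is_debug)
    next_nondebug = next_nondebug->next_any;
  if (next_nondebug)
    next_nondebug->prev_nondebug = insn;
}
```

Without the final update, B's back-link in the test below would still point at A even though X now sits between them.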

[PATCH 00/11] aarch64: Add new load/store pair fusion pass

2023-10-17 Thread Alex Coplan

Hi,

This patch series adds a new aarch64-specific RTL-SSA pass for forming load and
store pairs (LDPs and STPs).  See the cover letter on patch 11/11 for more
details on the pass itself.

Patch 1/11 fixes a latent bug in RTL-SSA.  Patches 2-5 add features to RTL-SSA
that are needed by the pass.  Patch 6/11 fixes a latent bug in haifa-sched.cc
that is exposed by the pass.  Patches 7-9 adjust the aarch64 testsuite to
account for the new codegen.  Patch 10/11 extends the TFmode load/store pattern
to TImode and TDmode.  Finally, patch 11/11 adds the new pass.

Bootstrapped/regtested as a series on aarch64-linux-gnu, OK for trunk?

Thanks,
Alex

Alex Coplan (11):
  rtl-ssa: Fix bug in function_info::add_insn_after
  rtl-ssa: Add drop_memory_access helper
  rtl-ssa: Add entry point to allow re-parenting uses
  rtl-ssa: Support inferring uses of mem in change_insns
  rtl-ssa: Support for inserting new insns
  haifa-sched: Allow for NOTE_INSN_DELETED at start of epilogue
  aarch64, testsuite: Prevent stp in lr_free_1.c
  aarch64, testsuite: Tweak sve/pcs/args_9.c to allow stps
  aarch64, testsuite: Fix up pr71727.c
  aarch64: Generalise TFmode load/store pair patterns
  aarch64: Add new load/store pair fusion pass.

 gcc/config.gcc|4 +-
 gcc/config/aarch64/aarch64-ldp-fusion.cc  | 2378 +
 gcc/config/aarch64/aarch64-passes.def |2 +
 gcc/config/aarch64/aarch64-protos.h   |1 +
 gcc/config/aarch64/aarch64.md |   22 +-
 gcc/config/aarch64/aarch64.opt|   20 +
 gcc/config/aarch64/iterators.md   |3 +
 gcc/config/aarch64/t-aarch64  |7 +
 gcc/haifa-sched.cc|   11 +
 gcc/rtl-ssa/access-utils.h|   11 +
 gcc/rtl-ssa/accesses.cc   |   18 +
 gcc/rtl-ssa/accesses.h|4 +
 gcc/rtl-ssa/changes.cc|  104 +-
 gcc/rtl-ssa/changes.h |2 +
 gcc/rtl-ssa/functions.h   |   19 +-
 gcc/rtl-ssa/insns.cc  |   19 +-
 gcc/rtl-ssa/insns.h   |7 +-
 gcc/rtl-ssa/internals.inl |1 +
 gcc/rtl-ssa/member-fns.inl|   12 +
 gcc/rtl-ssa/movement.h|8 +-
 gcc/testsuite/gcc.target/aarch64/lr_free_1.c  |2 +-
 gcc/testsuite/gcc.target/aarch64/pr71727.c|2 +-
 .../gcc.target/aarch64/sve/pcs/args_9.c   |4 +-
 23 files changed, 2623 insertions(+), 38 deletions(-)
 create mode 100644 gcc/config/aarch64/aarch64-ldp-fusion.cc

Re: [PATCH] c++: Add missing auto_diagnostic_groups to constexpr.cc

2023-10-17 Thread Jason Merrill


On 10/17/23 12:34, Marek Polacek wrote:

On Tue, Oct 17, 2023 at 09:35:21PM +1100, Nathaniel Shead wrote:

Marek pointed out in another patch of mine [1] that I was missing an
auto_diagnostic_group to correctly associate informative notes with
their errors in structured error outputs. This patch goes through
constexpr.cc to correct this in other locations which seem to have the
same issue.


Thanks for the patch.  I went through all of them and they all seem correct.

So, LGTM, but can't approve.


Thanks, I pushed it with a Reviewed-By of Marek.

Jason

Re: [PATCH V2 00/14] Refactor and cleanup vsetvl pass

2023-10-17 Thread Patrick O'Neill


Hi Lehua!

I ran the gcc testsuite on qemu before/after applying your patches to 
305034e3 rv32/64gcv [1].


Baseline
                = Summary of gcc testsuite =
                            | # of unexpected case / # of unique unexpected case
                            |          gcc |          g++ |     gfortran |
    rv32gcv/ ilp32d/ medlow |   208 /   78 |    29 /   17 |    71 /   24 |
    rv64gcv/  lp64d/ medlow |   101 /   54 |    13 /    4 |    33 /   13 |

After applying patch series:
                = Summary of gcc testsuite =
                            | # of unexpected case / # of unique unexpected case
                            |          gcc |          g++ |     gfortran |
    rv32gcv/ ilp32d/ medlow |   256 /   96 |    29 /   17 |    69 /   23 |
    rv64gcv/  lp64d/ medlow |   152 /   74 |    13 /    4 |    31 /   12 |

I'm seeing:
20 new unique gcc failures on rv64gcv [2]
18 new unique gcc failures on rv32gcv [3]

Thanks,
Patrick

[1] Build commands:
git clone https://github.com/patrick-rivos/riscv-gnu-toolchain.git
cd riscv-gnu-toolchain
git submodule update --init gcc
cd gcc
git checkout 305034e3
cd ..
mkdir build
cd build
../configure --prefix=$(pwd) --with-multilib-generator="rv64gcv-lp64d--;rv32gcv-ilp32d--"

make report-linux -j32

Note: If you'd prefer to use upstream riscv-gnu-toolchain, I'm pretty 
sure you can do

mkdir build-64
cd build-64
../configure --prefix=$(pwd) --with-arch=rv64gcv --with-abi=lp64d
cd ..
mkdir build-32
cd build-32
../configure --prefix=$(pwd) --with-arch=rv32gcv --with-abi=ilp32d
This'll make 2 folders, so run make report-linux in each of them.

[2] rv64gcv New failures:
FAIL: gcc.dg/vect/slp-7.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/slp-7.c execution test
FAIL: gcc.target/riscv/zero-scratch-regs-2.c   -O3 -g scan-assembler-not \\mvsetvli
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-34.c   -O1 scan-assembler-times vsetvli 1
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-34.c   -O2 scan-assembler-times vsetvli 1
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-34.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler-times vsetvli 1
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-34.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times vsetvli 1
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-34.c   -Os scan-assembler-times vsetvli 1
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-37.c   -O1 scan-assembler-times vsetvli 3
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-37.c   -O2 scan-assembler-times vsetvli 3
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-37.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler-times vsetvli 3
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-37.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times vsetvli 3
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-37.c   -Os scan-assembler-times vsetvli 3
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-38.c   -O1 scan-assembler-times vsetvli 4
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-38.c   -O2 scan-assembler-times vsetvli 4
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-38.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler-times vsetvli 4
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-38.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times vsetvli 4
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-47.c   -O1 scan-assembler-times vsetvli 2
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-47.c   -O2 scan-assembler-times vsetvli 2
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-47.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler-times vsetvli 2
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-47.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times vsetvli 2
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-47.c   -Os scan-assembler-times vsetvli 2
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-48.c   -O1 scan-assembler-times vsetvli 2
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-48.c   -O2 scan-assembler-times vsetvli 2
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-48.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler-times vsetvli 2
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-48.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times vsetvli 2
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-49.c   -O1 scan-assembler-times vsetvli 2
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-49.c   -O2 scan-assembler-times vsetvli 2
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-49.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler-times vsetvli 2
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-49.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler-times vsetvli 2
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-49.c   -Os scan-assembler-times vsetvli 2
FAIL: gcc.target/riscv/rvv/vsetvl/avl_single-65.c   -O2

[COMMITTED] RISC-V/testsuite/pr111466.c: update test and expected output

2023-10-17 Thread Vineet Gupta

Update the test to potentially generate two SEXT.W instructions: one for
incoming function arg, other for function return.

But after commit 8eb9cdd14218
("expr: don't clear SUBREG_PROMOTED_VAR_P flag for a promoted subreg")
the test is not supposed to generate either of them, so fix the expected
assembler output, which was erroneously introduced by the commit above.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr111466.c (foo2): Change return to unsigned
int as that will potentially generate two SEXT.W instructions.
dg-final: Change to scan-assembler-not SEXT.W.

Signed-off-by: Vineet Gupta 
---
 gcc/testsuite/gcc.target/riscv/pr111466.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/pr111466.c b/gcc/testsuite/gcc.target/riscv/pr111466.c
index 007792466a51..3348d593813d 100644
--- a/gcc/testsuite/gcc.target/riscv/pr111466.c
+++ b/gcc/testsuite/gcc.target/riscv/pr111466.c
@@ -4,7 +4,7 @@
 /* { dg-options "-march=rv64gc_zba_zbs -mabi=lp64" } */
 /* { dg-skip-if "" { *-*-* } { "-O0" } } */
 
-int foo2(int unused, int n, unsigned y, unsigned delta){
+unsigned int foo2(int unused, int n, unsigned y, unsigned delta){
   int s = 0;
   unsigned int x = 0;
   for (;x

PING Re: [PATCH v2 RFA] diagnostic: add permerror variants with opt

2023-10-17 Thread Jason Merrill


Ping?

On 10/3/23 17:09, Jason Merrill wrote:

This revision changes from using DK_PEDWARN for permerror-with-option to using
DK_PERMERROR.

Tested x86_64-pc-linux-gnu.  OK for trunk?

-- 8< --

In the discussion of promoting some pedwarns to be errors by default, rather
than moving them all into -fpermissive, it seems to me to make sense to support
DK_PERMERROR with an option flag.  This approach still works with
-fpermissive, but users can also use -Wno-error=narrowing to downgrade
that specific diagnostic rather than everything affected by -fpermissive.

So, for diagnostics that we want to make errors by default we can just
change the pedwarn call to permerror.

The tests check desired behavior for such a permerror in a system header
with various flags.  The patch preserves the existing permerror behavior of
ignoring -w and system headers by default, but respecting them when
downgraded to a warning by -fpermissive.

This seems similar to, but a bit better than, the approach of forcing
-pedantic-errors that I previously used for -Wnarrowing: specifically,
-w by itself is now not enough to silence the -Wnarrowing
error (integer-pack2.C).

gcc/ChangeLog:

* doc/invoke.texi: Move -fpermissive to Warning Options.
* diagnostic.cc (update_effective_level_from_pragmas): Remove
redundant system header check.
(diagnostic_report_diagnostic): Move down syshdr/-w check.
(diagnostic_impl): Handle DK_PERMERROR with an option number.
(permerror): Add new overloads.
* diagnostic-core.h (permerror): Declare them.

gcc/cp/ChangeLog:

* typeck2.cc (check_narrowing): Use permerror.

gcc/testsuite/ChangeLog:

* g++.dg/ext/integer-pack2.C: Add -fpermissive.
* g++.dg/diagnostic/sys-narrow.h: New test.
* g++.dg/diagnostic/sys-narrow1.C: New test.
* g++.dg/diagnostic/sys-narrow1a.C: New test.
* g++.dg/diagnostic/sys-narrow1b.C: New test.
* g++.dg/diagnostic/sys-narrow1c.C: New test.
* g++.dg/diagnostic/sys-narrow1d.C: New test.
* g++.dg/diagnostic/sys-narrow1e.C: New test.
* g++.dg/diagnostic/sys-narrow1f.C: New test.
* g++.dg/diagnostic/sys-narrow1g.C: New test.
* g++.dg/diagnostic/sys-narrow1h.C: New test.
* g++.dg/diagnostic/sys-narrow1i.C: New test.
---
  gcc/doc/invoke.texi   | 22 +++---
  gcc/diagnostic-core.h |  3 +
  gcc/testsuite/g++.dg/diagnostic/sys-narrow.h  |  2 +
  gcc/cp/typeck2.cc | 10 +--
  gcc/diagnostic.cc | 67 ---
  gcc/testsuite/g++.dg/diagnostic/sys-narrow1.C |  4 ++
  .../g++.dg/diagnostic/sys-narrow1a.C  |  5 ++
  .../g++.dg/diagnostic/sys-narrow1b.C  |  5 ++
  .../g++.dg/diagnostic/sys-narrow1c.C  |  5 ++
  .../g++.dg/diagnostic/sys-narrow1d.C  |  5 ++
  .../g++.dg/diagnostic/sys-narrow1e.C  |  5 ++
  .../g++.dg/diagnostic/sys-narrow1f.C  |  5 ++
  .../g++.dg/diagnostic/sys-narrow1g.C  |  5 ++
  .../g++.dg/diagnostic/sys-narrow1h.C  |  6 ++
  .../g++.dg/diagnostic/sys-narrow1i.C  |  6 ++
  gcc/testsuite/g++.dg/ext/integer-pack2.C  |  2 +-
  16 files changed, 117 insertions(+), 40 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/diagnostic/sys-narrow.h
  create mode 100644 gcc/testsuite/g++.dg/diagnostic/sys-narrow1.C
  create mode 100644 gcc/testsuite/g++.dg/diagnostic/sys-narrow1a.C
  create mode 100644 gcc/testsuite/g++.dg/diagnostic/sys-narrow1b.C
  create mode 100644 gcc/testsuite/g++.dg/diagnostic/sys-narrow1c.C
  create mode 100644 gcc/testsuite/g++.dg/diagnostic/sys-narrow1d.C
  create mode 100644 gcc/testsuite/g++.dg/diagnostic/sys-narrow1e.C
  create mode 100644 gcc/testsuite/g++.dg/diagnostic/sys-narrow1f.C
  create mode 100644 gcc/testsuite/g++.dg/diagnostic/sys-narrow1g.C
  create mode 100644 gcc/testsuite/g++.dg/diagnostic/sys-narrow1h.C
  create mode 100644 gcc/testsuite/g++.dg/diagnostic/sys-narrow1i.C

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 4085fc90907..6b6506a75b2 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -231,7 +231,7 @@ in the following sections.
  -fnew-inheriting-ctors
  -fnew-ttp-matching
  -fno-nonansi-builtins  -fnothrow-opt  -fno-operator-names
--fno-optional-diags  -fpermissive
+-fno-optional-diags
  -fno-pretty-templates
  -fno-rtti  -fsized-deallocation
  -ftemplate-backtrace-limit=@var{n}
@@ -323,7 +323,7 @@ Objective-C and Objective-C++ Dialects}.
  @item Warning Options
  @xref{Warning Options,,Options to Request or Suppress Warnings}.
  @gccoptlist{-fsyntax-only  -fmax-errors=@var{n}  -Wpedantic
--pedantic-errors
+-pedantic-errors -fpermissive
  -w  -Wextra  -Wall  -Wabi=@var{n}
  -Waddress  -Wno-address-of-packed-member  -Waggregate-return
  -Walloc-size-larger-than=@var{byte-size}  -Walloc-zero
@@ -3494,12 +3494,6 @@ Disable diagnostics that the standard says a

Re: [PATCH v2] RISC-V/testsuite/pr111466.c: update test and expected output

2023-10-17 Thread Jeff Law





On 10/17/23 12:51, Vineet Gupta wrote:

Update the test to potentially generate two SEXT.W instructions: one for
incoming function arg, other for function return.

But after commit 8eb9cdd14218
("expr: don't clear SUBREG_PROMOTED_VAR_P flag for a promoted subreg")
the test is not supposed to generate either of them, so fix the expected
assembler output, which was erroneously introduced by the commit above.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr111466.c (foo2): Change return to unsigned
int as that will potentially generate two SEXT.W instructions.
dg-final: Change to scan-assembler-not SEXT.W.

Oh yes, I should have remembered that update.  Thanks for taking care of it.

jeff

[x86 PATCH] PR target/110511: Fix reg allocation for widening multiplications.

2023-10-17 Thread Roger Sayle


This patch contains clean-ups of the widening multiplication patterns in
i386.md, and provides variants of the existing highpart multiplication
peephole2 transformations (which tidy up register allocation after
reload), thereby fixing PR target/110511, a superfluous move
instruction.

For the new test case, compiled on x86_64 with -O2.

Before:
mulx64: movabsq $-7046029254386353131, %rcx
        movq    %rcx, %rax
        mulq    %rdi
        xorq    %rdx, %rax
        ret

After:
mulx64: movabsq $-7046029254386353131, %rax
        mulq    %rdi
        xorq    %rdx, %rax
        ret

The clean-ups are (i) that operand 1 is consistently made register_operand
and operand 2 becomes nonimmediate_operand, so that predicates match the
constraints, (ii) the representation of the BMI2 mulx instruction is
updated to use the new umul_highpart RTX, and (iii) because operands
0 and 1 have different modes in widening multiplications, "a" is a more
appropriate constraint than "0" (which avoids spills/reloads containing
SUBREGs).  The new peephole2 transformations are based upon those at
around line 9951 of i386.md, that begins with the comment
;; Highpart multiplication peephole2s to tweak register allocation.
;; mov imm,%rdx; mov %rdi,%rax; imulq %rdx  ->  mov imm,%rax; imulq %rdi


This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2023-10-17  Roger Sayle  

gcc/ChangeLog
PR target/110511
* config/i386/i386.md (mul3): Make operands 1 and
2 take "register_operand" and "nonimmediate_operand" respectively.
(mulqihi3): Likewise.
(*bmi2_umul3_1): Operand 2 needs to be register_operand
matching the %d constraint.  Use umul_highpart RTX to represent
the highpart multiplication.
(*umul3_1): Operand 1 should use register_operand
predicate, and "a" rather than "0", as operands 0 and 1 have
different modes.
(define_split): For mul to mulx conversion, use the new
umul_highpart RTX representation.
(*mul3_1):  Operand 1 should be register_operand
and the constraint %a as operands 0 and 1 have different modes.
(*mulqihi3_1): Operand 1 should be register_operand matching
the constraint %0.
(define_peephole2): Provide widening multiplication variants
of the peephole2s that tweak highpart multiplication register
allocation.

gcc/testsuite/ChangeLog
PR target/110511
* gcc.target/i386/pr110511.c: New test case.


Thanks in advance,
Roger

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 2a60df5..22f18c2 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -9710,33 +9710,29 @@
   [(parallel [(set (match_operand: 0 "register_operand")
   (mult:
 (any_extend:
-  (match_operand:DWIH 1 "nonimmediate_operand"))
+  (match_operand:DWIH 1 "register_operand"))
 (any_extend:
-  (match_operand:DWIH 2 "register_operand"
+  (match_operand:DWIH 2 "nonimmediate_operand"
  (clobber (reg:CC FLAGS_REG))])])
 
 (define_expand "mulqihi3"
   [(parallel [(set (match_operand:HI 0 "register_operand")
   (mult:HI
 (any_extend:HI
-  (match_operand:QI 1 "nonimmediate_operand"))
+  (match_operand:QI 1 "register_operand"))
 (any_extend:HI
-  (match_operand:QI 2 "register_operand"
+  (match_operand:QI 2 "nonimmediate_operand"
  (clobber (reg:CC FLAGS_REG))])]
   "TARGET_QIMODE_MATH")
 
 (define_insn "*bmi2_umul3_1"
   [(set (match_operand:DWIH 0 "register_operand" "=r")
(mult:DWIH
- (match_operand:DWIH 2 "nonimmediate_operand" "%d")
+ (match_operand:DWIH 2 "register_operand" "%d")
  (match_operand:DWIH 3 "nonimmediate_operand" "rm")))
(set (match_operand:DWIH 1 "register_operand" "=r")
-   (truncate:DWIH
- (lshiftrt:
-   (mult: (zero_extend: (match_dup 2))
-   (zero_extend: (match_dup 3)))
-   (match_operand:QI 4 "const_int_operand"]
-  "TARGET_BMI2 && INTVAL (operands[4]) ==  * BITS_PER_UNIT
+   (umul_highpart:DWIH (match_dup 2) (match_dup 3)))]
+  "TARGET_BMI2
&& !(MEM_P (operands[2]) && MEM_P (operands[3]))"
   "mulx\t{%3, %0, %1|%1, %0, %3}"
   [(set_attr "type" "imulx")
@@ -9747,7 +9743,7 @@
   [(set (match_operand: 0 "register_operand" "=r,A")
(mult:
  (zero_extend:
-   (match_operand:DWIH 1 "nonimmediate_operand" "%d,0"))
+   (match_operand:DWIH 1 "register_operand" "%d,a"))
  (zero_extend:
(match_operand:DWIH 2 "nonimmediate_operand" "rm,rm"
(clobber (reg:CC

Re: PR111648: Fix wrong code-gen due to incorrect VEC_PERM_EXPR folding

2023-10-17 Thread Prathamesh Kulkarni

On Tue, 17 Oct 2023 at 02:40, Richard Sandiford
 wrote:
>
> Prathamesh Kulkarni  writes:
> > On Wed, 11 Oct 2023 at 16:57, Prathamesh Kulkarni
> >  wrote:
> >>
> >> On Wed, 11 Oct 2023 at 16:42, Prathamesh Kulkarni
> >>  wrote:
> >> >
> >> > On Mon, 9 Oct 2023 at 17:05, Richard Sandiford
> >> >  wrote:
> >> > >
> >> > > Prathamesh Kulkarni  writes:
> >> > > > Hi,
> >> > > > The attached patch attempts to fix PR111648.
> >> > > > As mentioned in the PR, the issue is that when a1 is a multiple of the vector
> >> > > > length, we end up creating the following encoding in the result:
> >> > > > { base_elem, arg[0], arg[1], ... } (assuming S = 1),
> >> > > > where arg is the chosen input vector. This is incorrect, since the
> >> > > > encoding originally in arg would be: { arg[0], arg[1], arg[2], ... }
> >> > > >
> >> > > > For the test-case mentioned in PR, vectorizer pass creates
> >> > > > VEC_PERM_EXPR where:
> >> > > > arg0: { -16, -9, -10, -11 }
> >> > > > arg1: { -12, -5, -6, -7 }
> >> > > > sel = { 3, 4, 5, 6 }
> >> > > >
> >> > > > arg0, arg1 and sel are encoded with npatterns = 1 and nelts_per_pattern = 3.
> >> > > > Since a1 = 4 and arg_len = 4, it ended up creating the result with
> >> > > > the following encoding:
> >> > > > res = { arg0[3], arg1[0], arg1[1] } // npatterns = 1, nelts_per_pattern = 3
> >> > > >     = { -11, -12, -5 }
> >> > > >
> >> > > > So for res[3], it used S = (-5) - (-12) = 7,
> >> > > > and hence computed it as -5 + 7 = 2,
> >> > > > instead of selecting arg1[2], i.e., -6.
> >> > > >
> >> > > > The patch tweaks valid_mask_for_fold_vec_perm_cst_p to punt (return
> >> > > > false) if a1 is a multiple of the vector length, so that a1 ... ae
> >> > > > select elements only from the stepped part of the pattern of the
> >> > > > input vector.
> >> > > >
> >> > > > Since the vectors are VLS, fold_vec_perm_cst then sets:
> >> > > > res_npatterns = res_nelts
> >> > > > res_nelts_per_pattern  = 1
> >> > > > which seems to fix the issue by encoding all the elements.
> >> > > >
> >> > > > The patch resulted in Case 4 and Case 5 failing from test_nunits_min_2 because
> >> > > > they used sel = { 0, 0, 1, ... } and { len, 0, 1, ... } respectively,
> >> > > > which used a1 = 0, and thus selected arg1[0].
> >> > > >
> >> > > > I removed Case 4 because it was already covered in test_nunits_min_4,
> >> > > > and moved Case 5 to test_nunits_min_4, with sel = { len, 1, 2, ... }
> >> > > > and added a new Case 9 to test for this issue.
> >> > > >
> >> > > > Passes bootstrap+test on aarch64-linux-gnu with and without SVE,
> >> > > > and on x86_64-linux-gnu.
> >> > > > Does the patch look OK ?
> >> > > >
> >> > > > Thanks,
> >> > > > Prathamesh
> >> > > >
> >> > > > [PR111648] Fix wrong code-gen due to incorrect VEC_PERM_EXPR folding.
> >> > > >
> >> > > > gcc/ChangeLog:
> >> > > >   PR tree-optimization/111648
> >> > > >   * fold-const.cc (valid_mask_for_fold_vec_perm_cst_p): Punt if a1
> >> > > >   is a multiple of vector length.
> >> > > >   (test_nunits_min_2): Remove Case 4 and move Case 5 to ...
> >> > > >   (test_nunits_min_4): ... here and rename case numbers. Also add
> >> > > >   Case 9.
> >> > > >
> >> > > > gcc/testsuite/ChangeLog:
> >> > > >   PR tree-optimization/111648
> >> > > >   * gcc.dg/vect/pr111648.c: New test.
> >> > > >
> >> > > >
> >> > > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> >> > > > index 4f8561509ff..c5f421d6b76 100644
> >> > > > --- a/gcc/fold-const.cc
> >> > > > +++ b/gcc/fold-const.cc
> >> > > > @@ -10682,8 +10682,8 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
> >> > > > return false;
> >> > > >   }
> >> > > >
> >> > > > -  /* Ensure that the stepped sequence always selects from the same
> >> > > > -  input pattern.  */
> >> > > > +  /* Ensure that the stepped sequence always selects from the stepped
> >> > > > +  part of same input pattern.  */
> >> > > >unsigned arg_npatterns
> >> > > >   = ((q1 & 1) == 0) ? VECTOR_CST_NPATTERNS (arg0)
> >> > > > : VECTOR_CST_NPATTERNS (arg1);
> >> > > > @@ -10694,6 +10694,20 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
> >> > > >   *reason = "step is not multiple of npatterns";
> >> > > > return false;
> >> > > >   }
> >> > > > +
> >> > > > +  /* If a1 is a multiple of len, it will select base element of input
> >> > > > +  vector resulting in following encoding:
> >> > > > +  { base_elem, arg[0], arg[1], ... } where arg is the chosen input
> >> > > > +  vector. This encoding is not originally present in arg, since it's
> >> > > > +  defined as:
> >> > > > +  { arg[0], arg[1], arg[2], ... }.  */
> >> > > > +
> >> > > > +  if (multiple_p (a1, arg_len))
> >> > > > + {
> >> > > > +   if (reason)
> >> > > > + *reason

[PATCH v2] RISC-V/testsuite/pr111466.c: update test and expected output

2023-10-17 Thread Vineet Gupta

Update the test to potentially generate two SEXT.W instructions: one for
incoming function arg, other for function return.

But after commit 8eb9cdd14218
("expr: don't clear SUBREG_PROMOTED_VAR_P flag for a promoted subreg")
the test is not supposed to generate either of them, so fix the expected
assembler output, which was erroneously introduced by the commit above.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr111466.c (foo2): Change return to unsigned
int as that will potentially generate two SEXT.W instructions.
dg-final: Change to scan-assembler-not SEXT.W.

Signed-off-by: Vineet Gupta 
---
Changes since v1:
  - Changed function return to be unsigned int
---
 gcc/testsuite/gcc.target/riscv/pr111466.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/pr111466.c b/gcc/testsuite/gcc.target/riscv/pr111466.c
index 007792466a51..3348d593813d 100644
--- a/gcc/testsuite/gcc.target/riscv/pr111466.c
+++ b/gcc/testsuite/gcc.target/riscv/pr111466.c
@@ -4,7 +4,7 @@
 /* { dg-options "-march=rv64gc_zba_zbs -mabi=lp64" } */
 /* { dg-skip-if "" { *-*-* } { "-O0" } } */
 
-int foo2(int unused, int n, unsigned y, unsigned delta){
+unsigned int foo2(int unused, int n, unsigned y, unsigned delta){
   int s = 0;
   unsigned int x = 0;
   for (;x

[PATCH] RISC-V/testsuite/pr111466.c: fix expected output to not detect SEXT.W

2023-10-17 Thread Vineet Gupta

gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr111466.c: Change to scan-assembler-not
to not detect sext.w.

Signed-off-by: Vineet Gupta 
---
 gcc/testsuite/gcc.target/riscv/pr111466.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/riscv/pr111466.c b/gcc/testsuite/gcc.target/riscv/pr111466.c
index 007792466a51..01e20235f7fe 100644
--- a/gcc/testsuite/gcc.target/riscv/pr111466.c
+++ b/gcc/testsuite/gcc.target/riscv/pr111466.c
@@ -12,4 +12,4 @@ int foo2(int unused, int n, unsigned y, unsigned delta){
   return s;
 }
 
-/* { dg-final { scan-assembler "\msext\M" } } */
+/* { dg-final { scan-assembler-not "\msext\M" } } */
-- 
2.34.1

Re: [RFC] expr: don't clear SUBREG_PROMOTED_VAR_P flag for a promoted subreg [target/111466]

2023-10-17 Thread Vineet Gupta





On 10/16/23 21:07, Jeff Law wrote:



On 9/28/23 15:43, Vineet Gupta wrote:

RISC-V suffers from extraneous sign extensions, despite/given the ABI
guarantee that 32-bit quantities are sign-extended into 64-bit 
registers,

meaning incoming SI function args need not be explicitly sign extended
(so do SI return values as most ALU insns implicitly sign-extend too.)

[...]
---
  gcc/expr.cc   |  7 ---
  gcc/testsuite/gcc.target/riscv/pr111466.c | 15 +++
  2 files changed, 15 insertions(+), 7 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/riscv/pr111466.c
I created a ChangeLog and pushed this after a final 
bootstrap/comparison run on x86_64.


Awesome.

As I've noted before, this has been running across the various targets 
in my tester for quite a while with no issues. Additionally, Robin and
I have dug into various paths through expand_expr_real_2 and we're
reasonably confident this is safe (about as much as we can be given 
the lack of information about the original patch).


My strong suspicion is that Michael M. made this code obsolete when he 
last revamped the gimple/ssa->RTL expansion path.


Thanks for your patience Vineet.  It's been a long road.


All the thanks to you for verifying this across targets and deep analysis.
There was a little snafu on my part in the test for which I'll post a fixup.


Jivan is diving into Joern's work.  It shows significant promise, 
though we are seeing some very weird behavior on perlbench.


That's great to hear. I was away from sign extension work for much of last
week, but I'm back on it now. The previous example I was chasing
(gcc.c-torture/compile/20040401-1.c) turned out to be a dead end, as it
has explicit casts and such, so it's not an ideal case. I'm sifting through
the logs looking for better tests; there are a ton of them, so I'm sure
there's a bunch more we can do at expand time to eliminate extensions
early, which, as you mentioned, is better in general: there is less to
"undo" in later passes.


Thx,
-Vineet

Re: [patch] fortran/intrinsic.texi: Improve SIGNAL intrinsic entry

2023-10-17 Thread Harald Anlauf


Hi Tobias,

On 10/17/23 19:36, Tobias Burnus wrote:

Hi Harald,

On 17.10.23 19:02, Harald Anlauf wrote:

your latest patch - which you already pushed - removes the
intrinsic declaration of signal.


Only to 'signal' or also to 'sleep'? I have now added both in the attached
patch.


you are right: both should be declared as intrinsic.

Thanks,
Harald


(Not yet committed.)

Tobias
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 
80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: 
Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; 
Registergericht München, HRB 106955

RE: [x86 PATCH] PR 106245: Split (x<<31)>>31 as -(x&1) in i386.md

2023-10-17 Thread Roger Sayle



Hi Uros,
Thanks for the speedy review.

> From: Uros Bizjak 
> Sent: 17 October 2023 17:38
> 
> On Tue, Oct 17, 2023 at 3:08 PM Roger Sayle 
> wrote:
> >
> >
> > This patch is the backend piece of a solution to PRs 101955 and
> > 106245, that adds a define_insn_and_split to the i386 backend, to
> > perform sign extension of a single (least significant) bit using AND $1 
> > then NEG.
> >
> > Previously, (x<<31)>>31 would be generated as
> >
> > sall    $31, %eax   // 3 bytes
> > sarl    $31, %eax   // 3 bytes
> >
> > with this patch the backend now generates:
> >
> > andl    $1, %eax    // 3 bytes
> > negl    %eax        // 2 bytes
> >
> > Not only is this smaller in size, but microbenchmarking confirms that
> > it's a performance win on both Intel and AMD; Intel sees only a 2%
> > improvement (perhaps just a size effect), but AMD sees a 7% win.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with no new failures.  Ok for mainline?
> >
> >
> > 2023-10-17  Roger Sayle  
> >
> > gcc/ChangeLog
> > PR middle-end/101955
> > PR tree-optimization/106245
> > * config/i386/i386.md (*extv_1_0): New define_insn_and_split.
> >
> > gcc/testsuite/ChangeLog
> > PR middle-end/101955
> > PR tree-optimization/106245
> > * gcc.target/i386/pr106245-2.c: New test case.
> > * gcc.target/i386/pr106245-3.c: New 32-bit test case.
> > * gcc.target/i386/pr106245-4.c: New 64-bit test case.
> > * gcc.target/i386/pr106245-5.c: Likewise.
> 
> +;; Split sign-extension of single least significant bit as and x,$1;neg x
> +(define_insn_and_split "*extv_1_0"
> +  [(set (match_operand:SWI48 0 "register_operand" "=r")
> + (sign_extract:SWI48 (match_operand:SWI48 1 "register_operand" "0")
> +(const_int 1)
> +(const_int 0)))
> +   (clobber (reg:CC FLAGS_REG))]
> +  ""
> +  "#"
> +  "&& 1"
> 
> No need to use "&&" for an empty insn constraint. Just use "reload_completed" in
> this case.
> 
> +  [(parallel [(set (match_dup 0) (and:SWI48 (match_dup 1) (const_int 1)))
> +  (clobber (reg:CC FLAGS_REG))])
> +   (parallel [(set (match_dup 0) (neg:SWI48 (match_dup 0)))
> +  (clobber (reg:CC FLAGS_REG))])])
> 
> Did you intend to split this after reload? If this is the case, then reload_completed
> is missing.

Because this splitter requires neither the allocation of a new pseudo nor a
hard register assignment (i.e. it can be run before or after reload), it's
written to split "whenever".  If you'd prefer it to only split after
reload, I agree a "reload_completed" can be added (alternatively, adding
"ix86_pre_reload_split ()" would also work).

I now see from "*load_tp_" that "" is perhaps preferred over "&& 1"
in these cases.  Please let me know which you prefer.

Cheers,
Roger

Re: [patch] fortran/intrinsic.texi: Improve SIGNAL intrinsic entry

2023-10-17 Thread Tobias Burnus


Hi Harald,

On 17.10.23 19:02, Harald Anlauf wrote:

your latest patch - which you already pushed - removes the
intrinsic declaration of signal.


Only to 'signal' or also to 'sleep'? I have now added both in the attached
patch.

(Not yet committed.)

Tobias
From bad72a07e572e4e0ac5ae9c25b9a234d98b1258f Mon Sep 17 00:00:00 2001
From: Tobias Burnus 
Date: Tue, 17 Oct 2023 19:35:18 +0200
Subject: [PATCH] fortran/intrinsic.texi: Add 'intrinsic' to SIGNAL example

gcc/fortran/ChangeLog:

	* intrinsic.texi (signal): Add 'intrinsic :: signal, sleep' to
	the example to make it safer.
---
 gcc/fortran/intrinsic.texi | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/fortran/intrinsic.texi b/gcc/fortran/intrinsic.texi
index 0db557d5a38..d1407186aea 100644
--- a/gcc/fortran/intrinsic.texi
+++ b/gcc/fortran/intrinsic.texi
@@ -13214,6 +13214,7 @@ contains
 end module
 program test_signal
   use m_handler
+  intrinsic :: signal, sleep
   call signal (12, handler_print)  ! 12 = SIGUSR2 (on some systems)
   call signal (10, 1)  ! 10 = SIGUSR1 and 1 = SIG_IGN (on some systems)
 
-- 
2.34.1

Re: [PATCH v22 02/31] c-family, c++: Look up built-in traits via identifier node

2023-10-17 Thread Patrick Palka

On Tue, 17 Oct 2023, Ken Matsui wrote:

> Since RID_MAX soon reaches 255 and all built-in traits are used approximately
> once in a C++ translation unit, this patch removes all RID values for built-in
> traits and uses the identifier node to look up the specific trait.  Rather
> than holding traits as keywords, we set all trait identifiers as cik_trait,
> which is a new cp_identifier_kind.  As cik_reserved_for_udlit was unused and
> cp_identifier_kind is 3 bits, we replaced the unused field with the new
> cik_trait.  Also, the later patch handles a subsequent token to the built-in
> identifier so that we accept the use of non-function-like built-in trait
> identifiers.
> 
> gcc/c-family/ChangeLog:
> 
>   * c-common.cc (c_common_reswords): Remove all mappings of
>   built-in traits.
>   * c-common.h (enum rid): Remove all RID values for built-in traits.
> 
> gcc/cp/ChangeLog:
> 
>   * cp-objcp-common.cc (names_builtin_p): Remove all RID value
>   cases for built-in traits.  Check for built-in traits via
>   the new cik_trait kind.
>   * cp-tree.h (enum cp_trait_kind): Set its underlying type to
>   addr_space_t.
>   (struct cp_trait): New struct to hold trait information.
>   (cp_traits): New array to hold a mapping to all traits.
>   (num_cp_traits): New variable to hold the size of cp_traits.
>   (cik_reserved_for_udlit): Rename to ...
>   (cik_trait): ... this.
>   (IDENTIFIER_ANY_OP_P): Exclude cik_trait.
>   (IDENTIFIER_TRAIT_P): New macro to detect cik_trait.
>   * lex.cc (init_cp_traits): New function to set cik_trait and
>   IDENTIFIER_CP_INDEX for all built-in trait identifiers.
>   (cxx_init): Call init_cp_traits function.
>   * parser.cc (cp_traits): Define its values, declared in cp-tree.h.
>   (num_cp_traits): Define its value, declared in cp-tree.h.
>   (cp_lexer_lookup_trait): New function to look up a
>   built-in trait by IDENTIFIER_CP_INDEX.
>   (cp_lexer_lookup_trait_expr): Likewise, look up an
>   expression-yielding built-in trait.
>   (cp_lexer_lookup_trait_type): Likewise, look up a type-yielding
>   built-in trait.
>   (cp_keyword_starts_decl_specifier_p): Remove all RID value cases
>   for built-in traits.
>   (cp_lexer_next_token_is_decl_specifier_keyword): Handle
>   type-yielding built-in traits.
>   (cp_parser_primary_expression): Remove all RID value cases for
>   built-in traits.  Handle expression-yielding built-in traits.
>   (cp_parser_trait): Handle cp_trait instead of enum rid.
>   (cp_parser_simple_type_specifier): Remove all RID value cases
>   for built-in traits.  Handle type-yielding built-in traits.
> 
> Co-authored-by: Patrick Palka 
> Signed-off-by: Ken Matsui 
> ---
>  gcc/c-family/c-common.cc  |   7 ---
>  gcc/c-family/c-common.h   |   5 --
>  gcc/cp/cp-objcp-common.cc |   8 +--
>  gcc/cp/cp-tree.h  |  33 ---
>  gcc/cp/lex.cc |  21 +++
>  gcc/cp/parser.cc  | 120 +-
>  6 files changed, 129 insertions(+), 65 deletions(-)
> 
> diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
> index f044db5b797..21fd333ef57 100644
> --- a/gcc/c-family/c-common.cc
> +++ b/gcc/c-family/c-common.cc
> @@ -508,13 +508,6 @@ const struct c_common_resword c_common_reswords[] =
>{ "wchar_t",   RID_WCHAR,  D_CXXONLY },
>{ "while", RID_WHILE,  0 },
>  
> -#define DEFTRAIT(TCC, CODE, NAME, ARITY) \
> -  { NAME,RID_##CODE, D_CXXONLY },
> -#include "cp/cp-trait.def"
> -#undef DEFTRAIT
> -  /* An alias for __is_same.  */
> -  { "__is_same_as",  RID_IS_SAME,D_CXXONLY },
> -
>/* C++ transactional memory.  */
>{ "synchronized",  RID_SYNCHRONIZED, D_CXX_OBJC | D_TRANSMEM },
>{ "atomic_noexcept",   RID_ATOMIC_NOEXCEPT, D_CXXONLY | D_TRANSMEM },
> diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
> index 1fdba7ef3ea..051a442e0f4 100644
> --- a/gcc/c-family/c-common.h
> +++ b/gcc/c-family/c-common.h
> @@ -168,11 +168,6 @@ enum rid
>RID_BUILTIN_LAUNDER,
>RID_BUILTIN_BIT_CAST,
>  
> -#define DEFTRAIT(TCC, CODE, NAME, ARITY) \
> -  RID_##CODE,
> -#include "cp/cp-trait.def"
> -#undef DEFTRAIT
> -
>/* C++11 */
>RID_CONSTEXPR, RID_DECLTYPE, RID_NOEXCEPT, RID_NULLPTR, RID_STATIC_ASSERT,
>  
> diff --git a/gcc/cp/cp-objcp-common.cc b/gcc/cp/cp-objcp-common.cc
> index 93b027b80ce..b1adacfec07 100644
> --- a/gcc/cp/cp-objcp-common.cc
> +++ b/gcc/cp/cp-objcp-common.cc
> @@ -421,6 +421,10 @@ names_builtin_p (const char *name)
>   }
>  }
>  
> +  /* Check for built-in traits.  */
> +  if (IDENTIFIER_TRAIT_P (id))
> +    return true;
> +
>/* Also detect common reserved C++ words that aren't strictly built-in
>   functions.  */
>switch (C_RID_CODE (id))
> @@ -434,10 +438,6 @@ names_builtin_p (const char *name)
>  case RID_BUILTIN_ASSOC_BARRIER:
>  case

Re: [patch] fortran/intrinsic.texi: Improve SIGNAL intrinsic entry

2023-10-17 Thread Harald Anlauf


Tobias,

your latest patch - which you already pushed - removes the
intrinsic declaration of signal.

This can confuse users and lead to undesired results when
the code is compiled, e.g., with -std=f2018, because

  call signal (10, 1)  ! 10 = SIGUSR1 and 1 = SIG_IGN (on some systems)

could be mapped to the wrong external instead of the
libgfortran function _gfortran_signal_sub_int etc., or you
could get other compile-time errors with the example code.

I strongly recommend restoring the intrinsic declaration.

Cheers,
Harald

Am 17.10.23 um 09:47 schrieb Tobias Burnus:

Hi Harald,

On 16.10.23 20:31, Harald Anlauf wrote:

Hi Tobias,

Am 16.10.23 um 19:11 schrieb Tobias Burnus:

OK for mainline?


I think the patch qualifies as obvious.

While at it, you might consider removing the comment a few lines below
the place you are changing,

@c TODO: What should the interface of the handler be?  Does it take arguments?

and enhance the given example by e.g.:


Updated version attached – I will commit it later today, unless anyone
has follow-up suggestions before.

Thanks for the suggestions,

Tobias
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 
80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: 
Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; 
Registergericht München, HRB 106955

Re: [x86 PATCH] PR 106245: Split (x<<31)>>31 as -(x&1) in i386.md

2023-10-17 Thread Uros Bizjak

On Tue, Oct 17, 2023 at 3:08 PM Roger Sayle  wrote:
>
>
> This patch is the backend piece of a solution to PRs 101955 and 106245,
> that adds a define_insn_and_split to the i386 backend, to perform sign
> extension of a single (least significant) bit using AND $1 then NEG.
>
> Previously, (x<<31)>>31 would be generated as
>
> sall    $31, %eax   // 3 bytes
> sarl    $31, %eax   // 3 bytes
>
> with this patch the backend now generates:
>
> andl    $1, %eax    // 3 bytes
> negl    %eax        // 2 bytes
>
> Not only is this smaller in size, but microbenchmarking confirms
> that it's a performance win on both Intel and AMD; Intel sees only a
> 2% improvement (perhaps just a size effect), but AMD sees a 7% win.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2023-10-17  Roger Sayle  
>
> gcc/ChangeLog
> PR middle-end/101955
> PR tree-optimization/106245
> * config/i386/i386.md (*extv_1_0): New define_insn_and_split.
>
> gcc/testsuite/ChangeLog
> PR middle-end/101955
> PR tree-optimization/106245
> * gcc.target/i386/pr106245-2.c: New test case.
> * gcc.target/i386/pr106245-3.c: New 32-bit test case.
> * gcc.target/i386/pr106245-4.c: New 64-bit test case.
> * gcc.target/i386/pr106245-5.c: Likewise.
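
The identity the split relies on, (x<<31)>>31 == -(x&1) for 32-bit values, can be checked in plain C. This is a minimal sketch for illustration, not part of the patch: the function names are hypothetical, and the shift form assumes two's-complement int32_t with an arithmetic right shift, which is how GCC defines the implementation-defined conversion and shift involved.

```c
#include <assert.h>
#include <stdint.h>

/* Shift form: move bit 0 into the sign bit, then shift it back
   arithmetically.  Shifting as uint32_t avoids signed-overflow UB; the
   conversion back to int32_t and the right shift of a negative value are
   implementation-defined, and GCC defines them as modulo reduction and
   sign-extending shift respectively.  */
static int32_t
extv_shift (int32_t x)
{
  return (int32_t) ((uint32_t) x << 31) >> 31;
}

/* AND/NEG form produced by the proposed split: bit 0 of 0 gives 0,
   bit 0 of 1 gives -1.  */
static int32_t
extv_and_neg (int32_t x)
{
  return -(x & 1);
}

/* Assert that both forms agree for every x in [lo, hi].  */
static void
check_equivalent (int32_t lo, int32_t hi)
{
  for (int64_t x = lo; x <= hi; x++)
    assert (extv_shift ((int32_t) x) == extv_and_neg ((int32_t) x));
}
```

Since x & 1 is either 0 or 1, negating it gives 0 or -1, which is exactly what the shift pair produces; the AND/NEG pair just gets there in fewer bytes.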

+;; Split sign-extension of single least significant bit as and x,$1;neg x
+(define_insn_and_split "*extv_1_0"
+  [(set (match_operand:SWI48 0 "register_operand" "=r")
+        (sign_extract:SWI48 (match_operand:SWI48 1 "register_operand" "0")
+                            (const_int 1)
+                            (const_int 0)))
+   (clobber (reg:CC FLAGS_REG))]
+  ""
+  "#"
+  "&& 1"

No need to use "&&" for an empty insn constraint. Just use
"reload_completed" in this case.

+  [(parallel [(set (match_dup 0) (and:SWI48 (match_dup 1) (const_int 1)))
+              (clobber (reg:CC FLAGS_REG))])
+   (parallel [(set (match_dup 0) (neg:SWI48 (match_dup 0)))
+              (clobber (reg:CC FLAGS_REG))])])

Did you intend to split this after reload? If this is the case, then
reload_completed is missing.

Uros.

Re: [PATCH] c++: Add missing auto_diagnostic_groups to constexpr.cc

2023-10-17 Thread Marek Polacek

On Tue, Oct 17, 2023 at 09:35:21PM +1100, Nathaniel Shead wrote:
> Marek pointed out in another patch of mine [1] that I was missing an
> auto_diagnostic_group to correctly associate informative notes with
> their errors in structured error outputs. This patch goes through
> constexpr.cc to correct this in other locations which seem to have the
> same issue.

Thanks for the patch.  I went through all of them and they all seem correct.

So, LGTM, but can't approve.
 
> [1]: https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632653.html
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu.
> 
> -- >8 --
> 
> gcc/cp/ChangeLog:
> 
>   * constexpr.cc (cxx_eval_dynamic_cast_fn): Add missing
>   auto_diagnostic_group.
>   (cxx_eval_call_expression): Likewise.
>   (diag_array_subscript): Likewise.
>   (outside_lifetime_error): Likewise.
>   (potential_constant_expression_1): Likewise.
> 
> Signed-off-by: Nathaniel Shead 
> ---
>  gcc/cp/constexpr.cc | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
> index dde4fec4a44..7c8f2cc189d 100644
> --- a/gcc/cp/constexpr.cc
> +++ b/gcc/cp/constexpr.cc
> @@ -2437,6 +2437,7 @@ cxx_eval_dynamic_cast_fn (const constexpr_ctx *ctx, tree call,
> {
>   if (!ctx->quiet)
> {
> + auto_diagnostic_group d;
>   error_at (loc, "reference % failed");
>   inform (loc, "dynamic type %qT of its operand does "
>   "not have a base class of type %qT",
> @@ -2492,6 +2493,7 @@ cxx_eval_dynamic_cast_fn (const constexpr_ctx *ctx, tree call,
>   {
> if (!ctx->quiet)
>   {
> +   auto_diagnostic_group d;
> error_at (loc, "reference % failed");
> inform (loc, "static type %qT of its operand is a "
> "non-public base class of dynamic type %qT",
> @@ -2524,6 +2526,7 @@ cxx_eval_dynamic_cast_fn (const constexpr_ctx *ctx, tree call,
>   {
> if (!ctx->quiet)
>   {
> +   auto_diagnostic_group d;
> error_at (loc, "reference % failed");
> inform (loc, "static type %qT of its operand is a non-public"
> " base class of dynamic type %qT", objtype, mdtype);
> @@ -2545,6 +2548,7 @@ cxx_eval_dynamic_cast_fn (const constexpr_ctx *ctx, tree call,
>   {
> if (!ctx->quiet)
>   {
> +   auto_diagnostic_group d;
> error_at (loc, "reference % failed");
> if (b_kind == bk_ambig)
>   inform (loc, "%qT is an ambiguous base class of dynamic "
> @@ -2822,6 +2826,7 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
>   {
> if (!ctx->quiet)
>   {
> +   auto_diagnostic_group d;
> error_at (loc, "array deallocation of object "
>"allocated with non-array "
>"allocation");
> @@ -2844,6 +2849,7 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
>   {
> if (!ctx->quiet)
>   {
> +   auto_diagnostic_group d;
> error_at (loc, "non-array deallocation of "
>"object allocated with array "
>"allocation");
> @@ -4193,6 +4199,7 @@ diag_array_subscript (location_t loc, const constexpr_ctx *ctx, tree array, tree
>STRIP_ANY_LOCATION_WRAPPER (array);
>if (DECL_P (array))
>   {
> +   auto_diagnostic_group d;
> if (TYPE_DOMAIN (arraytype))
>   error_at (loc, "array subscript value %qE is outside the bounds "
> "of array %qD of type %qT", sidx, array, arraytype);
> @@ -5838,6 +5845,7 @@ cxx_eval_indirect_ref (const constexpr_ctx *ctx, tree t,
>  static void
>  outside_lifetime_error (location_t loc, tree r)
>  {
> +  auto_diagnostic_group d;
>if (DECL_NAME (r) == heap_deleted_identifier)
>  {
>/* Provide a more accurate message for deleted variables.  */
> @@ -9460,6 +9468,7 @@ potential_constant_expression_1 (tree t, bool want_rval, bool strict, bool now,
> if (flags & tf_error)
>   {
> tree cap = DECL_CAPTURED_VARIABLE (t);
> +   auto_diagnostic_group d;
> if (constexpr_error (input_location, fundef_p,
>  "lambda capture of %qE is not a "
>  "constant expression", cap)
> -- 
> 2.42.0
> 

Marek

Re: [PATCH] gimple-match: Do not try UNCOND optimization with COND_LEN.

2023-10-17 Thread Richard Sandiford

Robin Dapp  writes:
> Thank you for the explanation.
>
> So, assuming I added an IFN_VCOND_MASK and IFN_VCOND_MASK_LEN along
> with the respective helper and expand functions, what would be the
> way forward?

IMO it'd be worth starting with the _LEN form only.

> Generate an IFN_VCOND_MASK(_LEN) here instead of a VEC_COND_EXPR?
> How would I make sure all of match.pd's vec_cond optimizations
> applied to it as well?

I think the most important ones are:

/* Simplify:

 a = a1 op a2
 r = c ? a : b;

   to:

 r = c ? a1 op a2 : b;

   if the target can do it in one go.  This makes the operation conditional
   on c, so could drop potentially-trapping arithmetic, but that's a valid
   simplification if the result of the operation isn't needed.

   Avoid speculatively generating a stand-alone vector comparison
   on targets that might not support them.  Any target implementing
   conditional internal functions must support the same comparisons
   inside and outside a VEC_COND_EXPR.  */
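
As a rough illustration of the rule quoted above, the C sketch below models vectors as fixed arrays and checks lane by lane that "a = a1 op a2; r = c ? a : b" gives the same result as the single conditional operation. NLANES, the vec4/mask4 types and all function names are hypothetical; vec_cond and cond_add only mirror the per-lane meaning of VEC_COND_EXPR and IFN_COND_ADD, not GCC's internal representation.

```c
#include <assert.h>
#include <stdbool.h>

#define NLANES 4 /* hypothetical fixed vector length */

typedef struct { int v[NLANES]; } vec4;
typedef struct { bool v[NLANES]; } mask4;

/* Per-lane model of VEC_COND_EXPR <c, a, b>: c ? a : b.  */
static vec4
vec_cond (mask4 c, vec4 a, vec4 b)
{
  vec4 r;
  for (int i = 0; i < NLANES; i++)
    r.v[i] = c.v[i] ? a.v[i] : b.v[i];
  return r;
}

/* Unconditional "a = a1 op a2" (here op is +), computed on every lane.  */
static vec4
vec_add (vec4 a1, vec4 a2)
{
  vec4 r;
  for (int i = 0; i < NLANES; i++)
    r.v[i] = a1.v[i] + a2.v[i];
  return r;
}

/* Per-lane model of IFN_COND_ADD (c, a1, a2, b): "r = c ? a1 op a2 : b"
   in one operation; inactive lanes are never added.  */
static vec4
cond_add (mask4 c, vec4 a1, vec4 a2, vec4 b)
{
  vec4 r;
  for (int i = 0; i < NLANES; i++)
    r.v[i] = c.v[i] ? a1.v[i] + a2.v[i] : b.v[i];
  return r;
}

/* Assert that the rewrite preserves every lane.  */
static void
check_cond_add_rule (mask4 c, vec4 a1, vec4 a2, vec4 b)
{
  vec4 before = vec_cond (c, vec_add (a1, a2), b);
  vec4 after = cond_add (c, a1, a2, b);
  for (int i = 0; i < NLANES; i++)
    assert (before.v[i] == after.v[i]);
}
```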

It would be nice if there was some match.pd syntax that automatically
extended these rules to IFN_VCOND_MASK_LEN, but I don't know how easy
that would be, due to the extra two parameters.

Perhaps that itself could be done in gimple-match-exports.cc, in a similar
way to the current conditional stuff.  That is:

- for IFN_VCOND_MASK_LEN, try folding as a VEC_COND_EXPR and then "adding
  the length back"

- for IFN_COND_LEN_FOO, try folding as an IFN_COND_FOO and then
  "adding the length back"

Not sure how important the second one is.

Thanks,
Richard

> Right now AFAIK IFN_VCOND_MASK only gets created in isel and
> everything is just a VEC_COND before.  But that does not provide
> length masking so is not the way to go?
>
> Thanks.
>
> Regards
>  Robin

Re: [PATCH] gimple-match: Do not try UNCOND optimization with COND_LEN.

2023-10-17 Thread Richard Sandiford

Richard Biener  writes:
> On Mon, Oct 16, 2023 at 11:59 PM Richard Sandiford
>  wrote:
>>
>> Robin Dapp  writes:
>> >> Why are the contents of this if statement wrong for COND_LEN?
>> >> If the "else" value doesn't matter, then the masked form can use
>> >> the "then" value for all elements.  I would have expected the same
>> >> thing to be true of COND_LEN.
>> >
>> > Right, that one was overly pessimistic.  Removed.
>> >
>> >> But isn't the test whether res_op->code itself is an internal_function?
>> >> In other words, shouldn't it just be:
>> >>
>> >>   if (internal_fn_p (res_op->code)
>> >>       && internal_fn_len_index (as_internal_fn (res_op->code)) != -1)
>> >>     return true;
>> >>
>> >> maybe_resimplify_conditional_op should already have converted to an
>> >> internal function where possible, and if combined_fn (res_op->code)
>> >> does any extra conversion on the fly, that conversion won't be reflected
>> >> in res_op.
>> >
>> > I went through some of our test cases and believe most of the problems
>> > are due to situations like the following:
>> >
>> > In vect-cond-arith-2.c we have (on riscv)
>> >   vect_neg_xi_14.4_23 = -vect_xi_13.3_22;
>> >   vect_res_2.5_24 = .COND_LEN_ADD ({ -1, ... }, vect_res_1.0_17, vect_neg_xi_14.4_23, vect_res_1.0_17, _29, 0);
>> >
>> > On aarch64 this is a situation that matches the VEC_COND_EXPR
>> > simplification that I disabled with this patch.  We valueized
>> > to _26 = vect_res_1.0_17 - vect_xi_13.3_22 and then create
>> > vect_res_2.5_24 = VEC_COND_EXPR ;
>> > This is later re-assembled into a COND_SUB.
>> >
>> > As we have two masks or COND_LEN we cannot use a VEC_COND_EXPR to
>> > achieve the same thing.  Would it be possible to create a COND_OP
>> > directly instead, though?  I tried the following (not very polished
>> > obviously):
>> >
>> > -  new_op.set_op (VEC_COND_EXPR, res_op->type,
>> > -res_op->cond.cond, res_op->ops[0],
>> > -res_op->cond.else_value);
>> > -  *res_op = new_op;
>> > -  return gimple_resimplify3 (seq, res_op, valueize);
>> > +  if (!res_op->cond.len)
>> > +   {
>> > + new_op.set_op (VEC_COND_EXPR, res_op->type,
>> > +res_op->cond.cond, res_op->ops[0],
>> > +res_op->cond.else_value);
>> > + *res_op = new_op;
>> > + return gimple_resimplify3 (seq, res_op, valueize);
>> > +   }
>> > +  else if (seq && *seq && is_gimple_assign (*seq))
>> > +   {
>> > + new_op.code = gimple_assign_rhs_code (*seq);
>> > + new_op.type = res_op->type;
>> > + new_op.num_ops = gimple_num_ops (*seq) - 1;
>> > + new_op.ops[0] = gimple_assign_rhs1 (*seq);
>> > + if (new_op.num_ops > 1)
>> > +   new_op.ops[1] = gimple_assign_rhs2 (*seq);
>> > + if (new_op.num_ops > 2)
>> > +   new_op.ops[2] = gimple_assign_rhs3 (*seq);
>> > +
>> > + new_op.cond = res_op->cond;
>> > +
>> > + gimple_match_op bla2;
>> > + if (convert_conditional_op (_op, ))
>> > +   {
>> > + *res_op = bla2;
>> > + // SEQ should now be dead.
>> > + return true;
>> > +   }
>> > +   }
>> >
>> > This would make the other hunk (check whether it was a LEN
>> > and try to recreate it) redundant I hope.
>> >
>> > I don't know enough about valueization, whether it's always
>> > safe to do that and other implications.  On riscv this seems
>> > to work, though and the other backends never go through the LEN
>> > path.  If, however, this is a feasible direction it could also
>> > be done for the non-LEN targets?
>>
>> I don't know much about valueisation either :)  But it does feel
>> like we're working around the lack of a LEN form of COND_EXPR.
>> In other words, it seems odd that we can do:
>>
>>   IFN_COND_LEN_ADD (mask, a, 0, b, len, bias)
>>
>> but we can't do:
>>
>>   IFN_COND_LEN (mask, a, b, len, bias)
>>
>> There seems to be no way of applying a length without also finding an
>> operation to perform.
>
> Indeed .. maybe - _maybe_ we want to scrap VEC_COND_EXPR for
> IFN_COND{,_LEN} to be more consistent here?

Yeah, sounds like it could be worthwhile.  But I suppose we still need
VEC_COND_EXPR itself because it's a generic front-end operation that
needs to be lowered.  So it might be worth starting with an ifn for the
LEN form and seeing whether the non-LEN form should switch over.
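
To make the IFN_COND_LEN point concrete, the sketch below models the per-lane behaviour in C and checks that a COND_LEN_ADD with a zero addend already acts as a length-aware select. It assumes, as for the cond_len_* optabs, that a lane is active when its mask bit is set and its index is below len + bias, and that every other lane takes the else value. NLANES, the types and the function names are hypothetical; cond_len_select stands in for the IFN_COND_LEN form that, per the discussion, cannot currently be expressed.

```c
#include <assert.h>
#include <stdbool.h>

#define NLANES 4 /* hypothetical fixed vector length */

typedef struct { int v[NLANES]; } vec4;
typedef struct { bool v[NLANES]; } mask4;

/* Assumed activity rule: mask bit set and i < len + bias.  */
static bool
lane_active (mask4 m, int i, int len, int bias)
{
  return m.v[i] && i < len + bias;
}

/* Per-lane model of IFN_COND_LEN_ADD (m, a, b, e, len, bias).  */
static vec4
cond_len_add (mask4 m, vec4 a, vec4 b, vec4 e, int len, int bias)
{
  vec4 r;
  for (int i = 0; i < NLANES; i++)
    r.v[i] = lane_active (m, i, len, bias) ? a.v[i] + b.v[i] : e.v[i];
  return r;
}

/* Hypothetical IFN_COND_LEN (m, a, e, len, bias): a plain select that
   also honours the length.  */
static vec4
cond_len_select (mask4 m, vec4 a, vec4 e, int len, int bias)
{
  vec4 r;
  for (int i = 0; i < NLANES; i++)
    r.v[i] = lane_active (m, i, len, bias) ? a.v[i] : e.v[i];
  return r;
}

/* Assert that COND_LEN_ADD (m, a, 0, e, len, bias) equals the select.  */
static void
check_add_zero_is_select (mask4 m, vec4 a, vec4 e, int len, int bias)
{
  vec4 zero = { { 0 } };
  vec4 x = cond_len_add (m, a, zero, e, len, bias);
  vec4 y = cond_len_select (m, a, e, len, bias);
  for (int i = 0; i < NLANES; i++)
    assert (x.v[i] == y.v[i]);
}
```

The dummy "+ 0" is the only way the model can attach a length to a select, which is the asymmetry described above.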

Thanks,
Richard

Re: [PATCH] gimple-match: Do not try UNCOND optimization with COND_LEN.

2023-10-17 Thread Robin Dapp

Thank you for the explanation.

So, assuming I added an IFN_VCOND_MASK and IFN_VCOND_MASK_LEN along
with the respective helper and expand functions, what would be the
way forward?

Generate an IFN_VCOND_MASK(_LEN) here instead of a VEC_COND_EXPR?
How would I make sure all of match.pd's vec_cond optimizations
applied to it as well?
Right now AFAIK IFN_VCOND_MASK only gets created in isel and
everything is just a VEC_COND before.  But that does not provide
length masking so is not the way to go?

Thanks.

Regards
 Robin

Re: [PATCH] openmp: Add support for the 'indirect' clause in C/C++

2023-10-17 Thread Tobias Burnus


On 17.10.23 15:34, Jakub Jelinek wrote:

On Tue, Oct 17, 2023 at 03:12:46PM +0200, Tobias Burnus wrote:

C++11 (and C23) attributes do not seem to be properly handled:

[[omp::decl (declare target,indirect(1))]]
int foo(void) { return 5; }
[[omp::decl (declare target indirect)]]
int bar(void) { return 8; }

Isn't that correct?


No, it isn't. Following your argument below, the following restriction is violated:

"If the directive has a clause, it must contain at least one enter
clause, link clause, or local clause."

(OpenMP before 5.2 had 'to' instead of 'enter' and 5.2 had 'enter' with
'to' as alias.)

And the directive above has none of those clauses. The other permitted form is:

"If the extended-list argument is specified, no clauses may be specified."

But that cannot be the case as the "indirect" clause has been specified.

* * *


Declare target directive has the forms
declare target (list)
declare target
declare target clauses
The first form is essentially equivalent to declare target enter (list),
the second to begin declare target with no clauses.
Now,
[[omp::decl (declare target)]] int v;
matches the first form and so enter (v) clause is implied.
But say
[[omp::decl (declare target, device_type (any))]] int v;
is the third type and so nothing is implied, so it is equivalent to
int v;


I have to admit that I failed to read 'omp::decl()' in the spec before
trying it. The TR11 spec states:

"A declarative directive that is declaration-associated may
alternatively be expressed as an attribute specifier where
directive-attr is decl( directive-specification )."

"A declarative directive with an association of none that accepts a
variable list or extended list as a directive argument or clause
argument may alternatively be expressed with an attribute specifier that
also uses the decl attribute, applies to variable and/or function
declarations, and omits the variable list or extended list argument. The
effect is as if the omitted list argument is the list of declared
variables and/or functions to which the attribute specifier applies."

('declare target' is a bit odd as it either accepts a variable list or
supports a form where a clause accepts a variable list, which is confusing.)


#pragma omp declare target device_type (any)
Don't remember if that is supposed to be an error or just not do anything
because there is no enter or to or link clause.

That's invalid per the restriction above - well, except for the
delimited form, which permits 'indirect' and 'device_type' as only
clauses. But that does not apply here as the association with
'omp::decl' must be none (i.e. 'delimited' is not permitted).

So, I think if you want to make foo indirect, the above would have to be:
[[omp::decl (declare target,enter,indirect(1))]]
int foo(void) { return 5; }


I concur - and admit that I missed the 'enter'.

Still, there is an issue as the restriction is not checked for.

Same with:

#pragma omp declare target indirect

(invalid but accepted) while

#pragma omp declare target device_type(any) indirect

fails with: "error: directive with only ‘device_type’ clause". That error
message is no longer completely accurate, as I would count 'indirect' as
another clause.

Tobias


Re: [PATCH] LoongArch: Use fcmp.caf.s instead of movgr2cf for zeroing a fcc

2023-10-17 Thread WANG Xuerui




On 10/17/23 22:06, Xi Ruoyao wrote:

During the review of a LLVM change [1], on LA464 we found that zeroing

"an" LLVM change (because the word LLVM is pronounced letter-by-letter)

a fcc with fcmp.caf.s is much faster than a movgr2cf from $r0.

Similarly, "an" fcc


[1]: https://github.com/llvm/llvm-project/pull/69300

gcc/ChangeLog:

* config/loongarch/loongarch.md (movfcc): Use fcmp.caf.s for
zeroing a fcc.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

  gcc/config/loongarch/loongarch.md | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index 68897799505..743e75907a6 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -2151,7 +2151,7 @@ (define_insn "movfcc"
[(set (match_operand:FCC 0 "register_operand" "=z")
(const_int 0))]
""
-  "movgr2cf\t%0,$r0")
+  "fcmp.caf.s\t%0,$f0,$f0")
  
  ;; Conditional move instructions.
  
Trivial enough, so this LGTM apart from the grammatical nits. (Whoever
pushes this patch could simply amend it themselves, so maybe there's no
need for a v2.) Thanks!

[PATCH] LoongArch: Use fcmp.caf.s instead of movgr2cf for zeroing a fcc

2023-10-17 Thread Xi Ruoyao

During the review of an LLVM change [1], on LA464 we found that zeroing
an fcc with fcmp.caf.s is much faster than a movgr2cf from $r0.

[1]: https://github.com/llvm/llvm-project/pull/69300

gcc/ChangeLog:

* config/loongarch/loongarch.md (movfcc): Use fcmp.caf.s for
zeroing an fcc.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

 gcc/config/loongarch/loongarch.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index 68897799505..743e75907a6 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -2151,7 +2151,7 @@ (define_insn "movfcc"
   [(set (match_operand:FCC 0 "register_operand" "=z")
(const_int 0))]
   ""
-  "movgr2cf\t%0,$r0")
+  "fcmp.caf.s\t%0,$f0,$f0")
 
 ;; Conditional move instructions.
 
-- 
2.42.0

Re: Check that passes do not forget to define profile

2023-10-17 Thread Jan Hubicka

> So OK to commit this?
> 
> This patch makes sure the profile_count information is initialized for the
> new bb created in move_sese_region_to_fn.
> 
> gcc/ChangeLog:
> 
>   * tree-cfg.cc (move_sese_region_to_fn): Initialize profile_count for
>   new basic block.
> 
> Bootstrapped and regression tested on aarch64-unknown-linux-gnu and
> x86_64-pc-linux-gnu.

This is OK,
thanks!
Honza
> 
> On 04/10/2023 12:02, Jan Hubicka wrote:
> > > Hi Honza,
> > > 
> > > My current patch set for AArch64 VLA omp codegen started failing on
> > > gcc.dg/gomp/pr87898.c after this. I traced it back to
> > > 'move_sese_region_to_fn' in tree-cfg.cc not setting count for the bb
> > > created.
> > > 
> > > I was able to 'fix' it locally by setting the count of the new bb to the
> > > accumulation of e->count () of all the entry_edges (if initialized). I'm
> > > however not even close to certain that's the right approach; patch
> > > attached for illustration.
> > > 
> > > Kind regards,
> > > Andre
> > > diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
> > > index ffab7518b1568b58e610e26feb9e3cab18ddb3c2..32fc47ae683164bf8fac477fbe6e2c998382e754 100644
> > > --- a/gcc/tree-cfg.cc
> > > +++ b/gcc/tree-cfg.cc
> > > @@ -8160,11 +8160,15 @@ move_sese_region_to_fn (struct function *dest_cfun, basic_block entry_bb,
> > > bb = create_empty_bb (entry_pred[0]);
> > > if (current_loops)
> > >   add_bb_to_loop (bb, loop);
> > > +  profile_count count = profile_count::zero ();
> > > for (i = 0; i < num_entry_edges; i++)
> > >   {
> > > e = make_edge (entry_pred[i], bb, entry_flag[i]);
> > > e->probability = entry_prob[i];
> > > +  if (e->count ().initialized_p ())
> > > + count += e->count ();
> > >   }
> > > +  bb->count = count;
> > 
> > This looks generally right - if you create a BB you need to set its
> > count, and unless it has a self-loop that is the sum of the counts of
> > incoming edges.
> > 
> > However the initialized_p check should be unnecessary: if one of entry
> > edges to BB is uninitialized, the + operation will make bb count
> > uninitialized too, which is OK.
> > 
> > Honza
> > > for (i = 0; i < num_exit_edges; i++)
> > >   {
> > 

> diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
> index ffab7518b1568b58e610e26feb9e3cab18ddb3c2..ffeb20b717aead756844c5f48c2cc23f5e9f14a6 100644
> --- a/gcc/tree-cfg.cc
> +++ b/gcc/tree-cfg.cc
> @@ -8160,11 +8160,14 @@ move_sese_region_to_fn (struct function *dest_cfun, basic_block entry_bb,
>bb = create_empty_bb (entry_pred[0]);
>if (current_loops)
>  add_bb_to_loop (bb, loop);
> +  profile_count count = profile_count::zero ();
>for (i = 0; i < num_entry_edges; i++)
>  {
>e = make_edge (entry_pred[i], bb, entry_flag[i]);
>e->probability = entry_prob[i];
> +  count += e->count ();
>  }
> +  bb->count = count;
>  
>for (i = 0; i < num_exit_edges; i++)
>  {
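
Honza's remark that the initialized_p guard is unnecessary can be illustrated with a toy model in C. This is a hypothetical simplification, not GCC's profile_count: it drops profile quality, scaling and the real class interface, and keeps only the rule described in the review, that adding an uninitialized count yields an uninitialized result. Summing the entry-edge counts without a guard then leaves the new block's count uninitialized whenever some entry edge's count is.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy stand-in for a profile count: a value plus a flag.  */
typedef struct
{
  uint64_t value;
  bool initialized;
} toy_count;

static toy_count
toy_zero (void)
{
  toy_count c = { 0, true };
  return c;
}

static toy_count
toy_uninitialized (void)
{
  toy_count c = { 0, false };
  return c;
}

static toy_count
toy_from (uint64_t v)
{
  toy_count c = { v, true };
  return c;
}

/* Addition propagates the uninitialized state, as described in the review.  */
static toy_count
toy_add (toy_count a, toy_count b)
{
  if (!a.initialized || !b.initialized)
    return toy_uninitialized ();
  return toy_from (a.value + b.value);
}

/* Mirrors the revised loop: start from zero and add every entry edge's
   count, with no initialized_p guard.  */
static toy_count
sum_entry_counts (const toy_count *edges, int n)
{
  assert (n >= 0);
  toy_count count = toy_zero ();
  for (int i = 0; i < n; i++)
    count = toy_add (count, edges[i]);
  return count;
}
```

With the guard removed, a block fed by an edge of unknown count simply ends up with an unknown count, which is the outcome the review calls acceptable.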

Re: [PATCH] gimple-match: Do not try UNCOND optimization with COND_LEN.

2023-10-17 Thread Richard Sandiford

Robin Dapp  writes:
>>> I don't know much about valueisation either :)  But it does feel
>>> like we're working around the lack of a LEN form of COND_EXPR.
>>> In other words, it seems odd that we can do:
>>>
>>>   IFN_COND_LEN_ADD (mask, a, 0, b, len, bias)
>>>
>>> but we can't do:
>>>
>>>   IFN_COND_LEN (mask, a, b, len, bias)
>>>
>>> There seems to be no way of applying a length without also finding an
>>> operation to perform.
>> 
>> Indeed .. maybe - _maybe_ we want to scrap VEC_COND_EXPR for
>> IFN_COND{,_LEN} to be more consistent here?
>
> So, yes we could define IFN_COND_LEN (or VCOND_MASK_LEN) but I'd
> assume that there would be a whole lot of follow-up things to
> consider.
>
> I'm wondering if we really gain something from the round-trip
> via VEC_COND_EXPR when we eventually create a COND_(LEN_)_OP anyway?

The main purpose of the VEC_COND_EXPR isn't as an intermediate step,
but as an end in its own right.  E.g. it allows:

  IFN_COND_ADD (mask, cst1, cst2, else)

to be folded to:

  VEC_COND_EXPR 

This is especially useful when vectorisation has the effect of completely
unrolling a loop.

The VEC_COND_EXPR is only used if the equivalent unconditional rule
folds to a gimple value.

> Sure, if the target doesn't have the particular operation we would
> want a VEC_COND_EXPR.  Same if SEQ is somehow more complicated.
>
> So the IFN_COND(_LEN) =? VCOND_MASK(_LEN) discussion notwithstanding,
> couldn't what I naively proposed be helpful as well?

I don't think it's independently useful, since the fold that it's
attempting is one that match.pd should be able to do.  match.pd can
also do it in a more general way, since it isn't restricted to looking
at the current sequence.

> Or do we
> potentially lose optimizations during the time where e.g. a
>  _foo = a BINOP b
>  VEC_COND_EXPR (cond, foo, else)
> has not yet been converted into a
>  COND_OP?

Yeah, it would miss out on that too.  

> We already create COND_OPs for the other paths
> (via convert_conditional_op) so why not for this one?  Or am I missing
> some interdependence with SEQ?

The purpose of this code is to see what happens if we apply the
usual folds for unconditional ops to the corresponding conditional forms.
E.g. for IFN_COND_ADD (mask, a, b, c) it sees what a + b would fold to,
then tries to reapply the VEC_COND_EXPR (mask, ..., c) to the result.

If a + b folds to a gimple value, we can fold to a VEC_COND_EXPR
involving that gimple value, as discussed above.  This could happen
if a + b folds to a constant, or for things like a + 0 -> a.

If instead a + b folds to a new operation (say a + b' or a - b'),
we need to construct the equivalent conditional form of that operation,
with the same mask and else values.  This is a correctness issue rather
than an optimisation.  As the comment in:

  /* Otherwise try rewriting the operation as an IFN_COND_* call.
 Again, this isn't a simplification in itself, since it's what
 RES_OP already described.  */
  if (convert_conditional_op (res_op, _op))
*res_op = new_op;

says, it's just reconstituting what RES_OP describes in gimple form.
If that isn't possible then the simplification must fail.

In some cases we could, as a follow-on, try to make an a' op b' fold
result fall back to an unconditional a' op b' followed by a VEC_COND_EXPR.
But we don't do that currently.  It isn't safe in all cases, since
IFN_COND_ADD only adds active elements, whereas an unconditional a' op b'
would operate on all elements.  I also don't know of any specific example
where this would be useful on SVE.
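
Why an unconditional a' op b' followed by a select is not always a valid fallback can be sketched in C with integer division, an illustrative choice rather than an example from the thread: a conditional division only evaluates active lanes, whereas the unconditional form also evaluates inactive lanes, whose divisors may be zero. NLANES, the types and the function names are hypothetical; cond_div only mirrors the per-lane meaning of IFN_COND_DIV.

```c
#include <assert.h>
#include <stdbool.h>

#define NLANES 4 /* hypothetical fixed vector length */

typedef struct { int v[NLANES]; } vec4;
typedef struct { bool v[NLANES]; } mask4;

/* Per-lane model of IFN_COND_DIV (m, a, b, e): only active lanes divide,
   so a zero divisor in an inactive lane is never evaluated.  */
static vec4
cond_div (mask4 m, vec4 a, vec4 b, vec4 e)
{
  vec4 r;
  for (int i = 0; i < NLANES; i++)
    {
      if (m.v[i])
        {
          assert (b.v[i] != 0); /* active lanes must not divide by zero */
          r.v[i] = a.v[i] / b.v[i];
        }
      else
        r.v[i] = e.v[i];
    }
  return r;
}

/* An unconditional a / b followed by a VEC_COND_EXPR-style select would
   divide in every lane, so it is only safe if no lane, active or not,
   has a zero divisor.  (INT_MIN / -1 is ignored for brevity.)  */
static bool
uncond_div_then_select_is_safe (vec4 b)
{
  for (int i = 0; i < NLANES; i++)
    if (b.v[i] == 0)
      return false;
  return true;
}
```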

Thanks,
Richard

>
> FWIW I did a full bootstrap and testsuite run on the usual architectures
> showing no changes with the attached patch.
>
> Regards
>  Robin
>
> Subject: [PATCH] gimple-match: Create COND_OP directly if possible.
>
> This patch converts simplified sequences into conditional operations
> instead of VEC_COND_EXPRs if the target supports them.
> This helps for len-masked targets which cannot directly use a
> VEC_COND_EXPR in the presence of length masking.
>
> gcc/ChangeLog:
>
>   * gimple-match-exports.cc (directly_supported_p): Define.
>   (maybe_resimplify_conditional_op): Create COND_OP directly.
>   * gimple-match.h (gimple_match_cond::gimple_match_cond):
>   Initialize length and bias.
> ---
>  gcc/gimple-match-exports.cc | 40 -
>  gcc/gimple-match.h  |  7 +--
>  2 files changed, 36 insertions(+), 11 deletions(-)
>
> diff --git a/gcc/gimple-match-exports.cc b/gcc/gimple-match-exports.cc
> index b36027b0bad..ba3bd1450db 100644
> --- a/gcc/gimple-match-exports.cc
> +++ b/gcc/gimple-match-exports.cc
> @@ -98,6 +98,8 @@ static bool gimple_resimplify5 (gimple_seq *, gimple_match_op *, tree (*)(tree))
>  static bool gimple_resimplify6 (gimple_seq *, gimple_match_op *, tree (*)(tree));
>  static bool gimple_resimplify7 (gimple_seq *, gimple_match_op *, tree (*)(tree));
>  
> +bool directly_supported_p

Re: [PATCH] openmp: Add support for the 'indirect' clause in C/C++

2023-10-17 Thread Jakub Jelinek

On Tue, Oct 17, 2023 at 03:12:46PM +0200, Tobias Burnus wrote:
> C++11 (and C23) attributes do not seem to be properly handled:
> 
> [[omp::decl (declare target,indirect(1))]]
> int foo(void) { return 5; }
> [[omp::decl (declare target indirect)]]
> int bar(void) { return 8; }

Isn't that correct?
Declare target directive has the forms
declare target (list)
declare target
declare target clauses
The first form is essentially equivalent to declare target enter (list),
the second to begin declare target with no clauses.
Now,
[[omp::decl (declare target)]] int v;
matches the first form and so enter (v) clause is implied.
But say
[[omp::decl (declare target, device_type (any))]] int v;
is the third type and so nothing is implied, so it is equivalent to
int v;
#pragma omp declare target device_type (any)
Don't remember if that is supposed to be an error or just not do anything
because there is no enter or to or link clause.
So, I think if you want to make foo indirect, the above would have to be:
[[omp::decl (declare target,enter,indirect(1))]]
int foo(void) { return 5; }
[[omp::decl (declare target indirect enter)]]
int bar(void) { return 8; }
or so (or 'to' instead of 'enter'; I guess that is either deprecated or
removed, but we should support it anyway).

Jakub

Re: [PATCH v8] tree-ssa-sink: Improve code sinking pass

2023-10-17 Thread Ajit Agarwal

Hello Richard:

The review comments below are incorporated in version 10 of the patch.
Please review and let me know if it's okay for trunk.


Thanks & Regards
Ajit

On 17/10/23 2:47 pm, Richard Biener wrote:
> On Tue, Oct 17, 2023 at 10:53 AM Ajit Agarwal  wrote:
>>
>> Hello Richard:
>>
>> On 17/10/23 2:03 pm, Richard Biener wrote:
>>> On Thu, Oct 12, 2023 at 10:42 AM Ajit Agarwal  
>>> wrote:

 This patch improves the code sinking pass to sink statements before calls to
 reduce register pressure.
 Review comments are incorporated. Synced and modified with the latest trunk
 sources.

 For example :

 void bar();
 int j;
 void foo(int a, int b, int c, int d, int e, int f)
 {
   int l;
   l = a + b + c + d +e + f;
   if (a != 5)
 {
   bar();
   j = l;
 }
 }

 Code Sinking does the following:

 void bar();
 int j;
 void foo(int a, int b, int c, int d, int e, int f)
 {
   int l;

   if (a != 5)
 {
   l = a + b + c + d +e + f;
   bar();
   j = l;
 }
 }

 Bootstrapped regtested on powerpc64-linux-gnu.

 Thanks & Regards
 Ajit

 tree-ssa-sink: Improve code sinking pass

 Currently, code sinking will sink code after function calls.  This increases
 register pressure for callee-saved registers.  The following patch improves
 code sinking by placing the sunk code before calls in the use block or in
 the immediate dominator of the use blocks.
>>>
>>> The patch no longer does what the description above says.
>> Why do you think so? Please let me know.
> 
> You talk about calls above but the patch doesn't do anything about calls.  You
> also don't do anything about register pressure; rather, the effect of your
> changes is to move some stmts by a smaller "distance", whatever effect that has.
> 
>>>
>>> More comments below.
>>>
 2023-10-12  Ajit Kumar Agarwal  

 gcc/ChangeLog:

 PR tree-optimization/81953
 * tree-ssa-sink.cc (statement_sink_location): Move statements before
 calls.
 (select_best_block): Add heuristics to select the best blocks in the
 immediate post dominator.

 gcc/testsuite/ChangeLog:

 PR tree-optimization/81953
 * gcc.dg/tree-ssa/ssa-sink-21.c: New test.
 * gcc.dg/tree-ssa/ssa-sink-22.c: New test.
 ---
  gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c | 15 
  gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c | 19 ++
  gcc/tree-ssa-sink.cc| 39 -
  3 files changed, 56 insertions(+), 17 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c

 diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
 new file mode 100644
 index 000..d3b79ca5803
 --- /dev/null
 +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
 @@ -0,0 +1,15 @@
 +/* { dg-do compile } */
 +/* { dg-options "-O2 -fdump-tree-sink-stats" } */
 +void bar();
 +int j;
 +void foo(int a, int b, int c, int d, int e, int f)
 +{
 +  int l;
 +  l = a + b + c + d +e + f;
 +  if (a != 5)
 +{
 +  bar();
 +  j = l;
 +}
 +}
 +/* { dg-final { scan-tree-dump {l_12\s+=\s+_4\s+\+\s+f_11\(D\);\n\s+bar\s+\(\)} sink1 } } */
 diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
 new file mode 100644
 index 000..84e7938c54f
 --- /dev/null
 +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
 @@ -0,0 +1,19 @@
 +/* { dg-do compile } */
 +/* { dg-options "-O2 -fdump-tree-sink-stats" } */
 +void bar();
 +int j, x;
 +void foo(int a, int b, int c, int d, int e, int f)
 +{
 +  int l;
 +  l = a + b + c + d +e + f;
 +  if (a != 5)
 +{
 +  bar();
 +  if (b != 3)
 +x = 3;
 +  else
 +x = 5;
 +  j = l;
 +}
 +}
 +/* { dg-final { scan-tree-dump {l_13\s+=\s+_4\s+\+\s+f_12\(D\);\n\s+bar\s+\(\)} sink1 } } */
 diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
 index a360c5cdd6e..95298bc8402 100644
 --- a/gcc/tree-ssa-sink.cc
 +++ b/gcc/tree-ssa-sink.cc
 @@ -174,7 +174,8 @@ nearest_common_dominator_of_uses (def_operand_p def_p, bool *debug_stmts)

  /* Given EARLY_BB and LATE_BB, two blocks in a path through the dominator
 tree, return the best basic block between them (inclusive) to place
 -   statements.
 +   statements. The best basic block should be an immediate dominator of
 +   best basic block if the use stmt is after the

[PATCH v10] tree-ssa-sink: Improve code sinking pass

2023-10-17 Thread Ajit Agarwal

Currently, code sinking sinks code to the use points within loops of the same
nesting depth. The following patch improves code sinking by placing the sunk
code in the immediate dominator with the same loop nest depth.

Review comments are incorporated.

For example:

void bar();
int j;
void foo(int a, int b, int c, int d, int e, int f)
{
  int l;
  l = a + b + c + d + e + f;
  if (a != 5)
    {
      bar();
      j = l;
    }
}

Code sinking transforms this into:

void bar();
int j;
void foo(int a, int b, int c, int d, int e, int f)
{
  int l;

  if (a != 5)
    {
      l = a + b + c + d + e + f;
      bar();
      j = l;
    }
}
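A minimal C++ sketch of why the move is safe in this example: l depends only
on by-value parameters, so bar() cannot change it, and computing it inside the
branch leaves j unchanged on every path. foo_before, foo_after and bar_calls
are hypothetical names, and bar() is a stub assumed not to touch l's inputs.

```cpp
#include <cassert>

int j;          // global written by foo, as in the example above
int bar_calls;  // hypothetical counter so the stub has a visible side effect

void bar() { ++bar_calls; }  // stub: cannot see a..f, which are passed by value

// Original placement: l is computed on every path, even when a == 5.
void foo_before(int a, int b, int c, int d, int e, int f) {
  int l = a + b + c + d + e + f;
  if (a != 5) {
    bar();
    j = l;
  }
}

// Sunk placement: l is computed only on the path that uses it,
// right before the call, in the same block as its use.
void foo_after(int a, int b, int c, int d, int e, int f) {
  if (a != 5) {
    int l = a + b + c + d + e + f;
    bar();
    j = l;
  }
}
```

On the a == 5 path the sunk version skips the additions entirely, which is the
point of the transformation.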

Bootstrapped and regtested on powerpc64-linux-gnu.

Thanks & Regards
Ajit


tree-ssa-sink: Improve code sinking pass

Currently, code sinking sinks code to the use points within loops of the same
nesting depth. The following patch improves code sinking by placing the sunk
code in the immediate dominator with the same loop nest depth.

2023-10-17  Ajit Kumar Agarwal  

gcc/ChangeLog:

PR tree-optimization/81953
* tree-ssa-sink.cc (statement_sink_location): Move statements with
same loop nest depth.
(select_best_block): Add heuristics to select the best blocks in the
immediate dominator.

gcc/testsuite/ChangeLog:

PR tree-optimization/81953
* gcc.dg/tree-ssa/ssa-sink-21.c: New test.
* gcc.dg/tree-ssa/ssa-sink-22.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c | 15 +++
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c | 19 +++
 gcc/tree-ssa-sink.cc| 16 +++-
 3 files changed, 45 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
new file mode 100644
index 000..d3b79ca5803
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sink-stats" } */
+void bar();
+int j;
+void foo(int a, int b, int c, int d, int e, int f)
+{
+  int l;
+  l = a + b + c + d +e + f;
+  if (a != 5)
+{
+  bar();
+  j = l;
+}
+}
+/* { dg-final { scan-tree-dump {l_12\s+=\s+_4\s+\+\s+f_11\(D\);\n\s+bar\s+\(\)} sink1 } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
new file mode 100644
index 000..84e7938c54f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sink-stats" } */
+void bar();
+int j, x;
+void foo(int a, int b, int c, int d, int e, int f)
+{
+  int l;
+  l = a + b + c + d +e + f;
+  if (a != 5)
+{
+  bar();
+  if (b != 3)
+x = 3;
+  else
+x = 5;
+  j = l;
+}
+}
+/* { dg-final { scan-tree-dump {l_13\s+=\s+_4\s+\+\s+f_12\(D\);\n\s+bar\s+\(\)} sink1 } } */
diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
index a360c5cdd6e..d96df0d81e9 100644
--- a/gcc/tree-ssa-sink.cc
+++ b/gcc/tree-ssa-sink.cc
@@ -176,6 +176,9 @@ nearest_common_dominator_of_uses (def_operand_p def_p, bool *debug_stmts)
tree, return the best basic block between them (inclusive) to place
statements.
 
+   The best basic block should be an immediate dominator of
+   best basic block if we've moved to same loop nest.
+
We want the most control dependent block in the shallowest loop nest.
 
If the resulting block is in a shallower loop nest, then use it.  Else
@@ -204,11 +207,16 @@ select_best_block (basic_block early_bb,
   if (bb_loop_depth (temp_bb) < bb_loop_depth (best_bb))
best_bb = temp_bb;
 
+  /* If we've moved into same loop nest, then that becomes
+our best block.  */
+  if (!gimple_vuse (stmt)
+ && bb_loop_depth (temp_bb) == bb_loop_depth (best_bb))
+best_bb = temp_bb;
+
   /* Walk up the dominator tree, hopefully we'll find a shallower
 loop nest.  */
   temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
 }
-
   /* Placing a statement before a setjmp-like function would be invalid
  (it cannot be reevaluated when execution follows an abnormal edge).
  If we selected a block with abnormal predecessors, just punt.  */
@@ -430,6 +438,7 @@ statement_sink_location (gimple *stmt, basic_block frombb,
continue;
  break;
}
+
   use = USE_STMT (one_use);
 
   if (gimple_code (use) != GIMPLE_PHI)
@@ -439,10 +448,7 @@ statement_sink_location (gimple *stmt, basic_block frombb,
  if (sinkbb == frombb)
return false;
 
- if (sinkbb == gimple_bb (use))
-   *togsi = gsi_for_stmt (use);
- else
-   *togsi = gsi_after_labels (sinkbb);
+ *togsi = gsi_after_labels (sinkbb);
 
  return true;
}
-- 
2.39.3

Re: [PATCH] openmp: Add support for the 'indirect' clause in C/C++

2023-10-17 Thread Tobias Burnus


Hi Kwok, hi Jakub, hi all,

some first comments based on both playing around and reading the
patch - and some generic comments to any patch reader.

In general, the patch looks good. I just observe:

* There is an issue with [[omp::decl(...)]]

* () - there is a C/C++ inconsistency in
  what is expected; it possibly affects more such conditions

* Missed optimization for the host?

* Bunch of minor comments


On 08.10.23 15:13, Kwok Cheung Yeung wrote:


This patch adds support for the 'indirect' clause in the 'declare
target' directive in C/C++ (Fortran to follow) and adds the necessary
infrastructure to support indirect calls in target regions. This allows
one to pass in pointers to functions that have been declared as indirect
from the host to the target, then invoked via the passed-in pointer on
the target device.
[...]
The C++ support is currently limited to normal indirect calls - virtual
calls on objects do not currently work. I believe the main issue is that
the vtables are not currently copied across to the target. I have added
some handling for OBJ_TYPE_REF to prevent the compiler from ICEing when
it encounters a virtual call, but without the vtable this cannot work
properly.
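A hedged C++ sketch of the usage pattern described above, assuming a compiler
that accepts the 'indirect' clause (the feature under review here). answer,
other and call_on_target are hypothetical names. Without -fopenmp the pragmas
are ignored and the call simply runs on the host, which is also what host
fallback does.

```cpp
#include <cassert>

// Device functions that may be invoked through a pointer: 'indirect' asks
// the implementation to translate a host function address into the
// matching device address when the pointer is called on the target.
#pragma omp begin declare target indirect
int answer() { return 42; }
int other() { return 7; }
#pragma omp end declare target

using fn_t = int (*)();

// FP holds a host address; as a scalar it is firstprivate by default in
// the target region and is called there.
int call_on_target(fn_t fp) {
  int result = 0;
#pragma omp target map(from : result)
  result = fp();
  return result;
}
```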


Side remark: Fortran polymorphic variables are similar. For them also
a vtable needs to be copied.

(For vtables, see also comment to 'libgomp.texi' far below.)

* * *

C++11 (and C23) attributes do not seem to be properly handled:

[[omp::decl (declare target,indirect(1))]]
int foo(void) { return 5; }
[[omp::decl (declare target indirect)]]
int bar(void) { return 8; }
[[omp::directive (begin declare target,indirect)]];
int baz(void) { return 11; }
[[omp::directive (end declare target)]];

While for the last one ("baz") I get:

__attribute__((omp declare target, omp declare target block, omp declare target indirect))

the first two (foo and bar) do not get any attribute at all; if I remove the
"indirect", I do get "__attribute__((omp declare target))". Hence, the
omp::decl support seems to work only partially.

NOTE: C23 omp:: attribute support is still WIP and not yet in mainline.
Recent draft: https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633007.html



The following works, but there is no testcase for either syntax:

int bar(void) { return 8; }
[[omp::directive(declare target to(bar) , indirect(1))]];
int baz(void) { return 11; }
[[omp::directive ( declare target indirect enter(baz))]];

int bar(void) { return 8; }
#pragma omp declare target to(bar) , indirect(1)
int baz(void) { return 11; }
#pragma omp declare target indirect enter(baz)

(There is one for #pragma + 'to' in gomp/declare-target-indirect-2.c, however.)

Side remark: OpenMP 5.2 renamed 'to' to 'enter' (keeping 'to' as a deprecated
alias); hence, I also use 'enter' above. The current testcases for indirect use
'enter'. (Not that it should make a difference, as both 'to' and 'enter' work.)


The following seems to work fine, but I think we do not have
a testcase for it ('bar' has no indirect, foo and baz have it):

#pragma omp begin declare target indirect(1)
int foo(void) { return 5; }
#pragma omp begin declare target indirect(0)
int bar(void) { return 8; }
int baz(void) { return 11; }
#pragma omp declare target indirect enter(baz)
#pragma omp end declare target
#pragma omp end declare target

* * *

Possibly affecting other logical flags as well, but I do notice that
gcc but not g++ accepts the following:

#pragma omp begin declare target indirect("abs")
#pragma omp begin declare target indirect(5.5)

g++ shows: error: expected constant integer expression

OpenMP requires 'constant boolean' expr (OpenMP 5.1) or
'expression of logical type','constant' (OpenMP 5.2), where for the latter it
has:

"The OpenMP *logical type* supports logical variables and expressions in any
base language.
"[C / C++] Any OpenMP logical expression is a scalar expression. This document
uses true as a generic term for a non-zero integer value and false as a generic
term for an integer value of zero."

I am not quite sure what to expect here; in terms of C++, [conv.bool] surely
permits those prvalues ("Boolean conversions").  For C, I don't find the
wording in the standard, but 'if("abc")' and 'if (5.5)' are accepted.
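For reference, a small C++ sketch of the base-language rule mentioned above:
[conv.bool] converts any arithmetic or pointer prvalue to bool, with zero or
null giving false. Whether OpenMP's "non-zero integer value" wording should
admit such operands is the open question; this only shows what C++ itself
does. The as_logical helpers are hypothetical.

```cpp
#include <cassert>

// Boolean conversion of an arithmetic prvalue: non-zero -> true.
constexpr bool as_logical(double d) { return d; }

// Boolean conversion of a pointer prvalue: non-null -> true.
// A string literal decays to a non-null pointer, so "abs" -> true.
inline bool as_logical(const char *p) { return p; }

static_assert(as_logical(5.5), "5.5 converts to true");
static_assert(!as_logical(0.0), "0.0 converts to false");
```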

* * *

I notice that the {__builtin_,}GOMP_target_map_indirect_ptr call is inserted
quite late, i.e. in omp-offload.cc.  A dump and also looking at the *.s files
show that
  __builtin_GOMP_target_map_indirect_ptr / call GOMP_target_map_indirect_ptr
show up not only for the device but also for the host-fallback code.

I think the latter is not required, as a host pointer can be called directly
on the host, and device -> host pointers, as in
  omp target device(ancestor:1)
do not need to be supported.

Namely, the current glossary (here the git version, but OpenMP 5.2 is very
similar); note the "other than the host device":

"indirect device invocation - An indirect call to the device version of a
procedure on a device other than the host device, through a

[x86 PATCH] PR 106245: Split (x<<31)>>31 as -(x&1) in i386.md

2023-10-17 Thread Roger Sayle


This patch is the backend piece of a solution to PRs 101955 and 106245,
that adds a define_insn_and_split to the i386 backend, to perform sign
extension of a single (least significant) bit using AND $1 then NEG.

Previously, (x<<31)>>31 would be generated as

sall    $31, %eax   // 3 bytes
sarl    $31, %eax   // 3 bytes

with this patch the backend now generates:

andl    $1, %eax    // 3 bytes
negl    %eax        // 2 bytes

Not only is this smaller in size, but microbenchmarking confirms
that it's a performance win on both Intel and AMD; Intel sees only a
2% improvement (perhaps just a size effect), but AMD sees a 7% win.
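A small C++ check of the identity the split relies on: for a 32-bit value,
moving bit 0 into the sign bit and arithmetic-shifting it back (sall/sarl)
gives the same result as and $1 followed by neg. The function names are
hypothetical; the shift version does the left shift on the unsigned type to
avoid signed-overflow UB, and assumes a wrapping conversion back to int32_t
and an arithmetic right shift (guaranteed since C++20, and GCC's behaviour
before that).

```cpp
#include <cassert>
#include <cstdint>

// Old sequence (sall $31; sarl $31): shift bit 0 into the sign bit, then
// arithmetic-shift it back across the whole word.
std::int32_t sext_bit0_shifts(std::int32_t x) {
  return static_cast<std::int32_t>(static_cast<std::uint32_t>(x) << 31) >> 31;
}

// New sequence (andl $1; negl): isolate bit 0, then negate 0/1 into 0/-1.
std::int32_t sext_bit0_and_neg(std::int32_t x) {
  return -(x & 1);
}
```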

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2023-10-17  Roger Sayle  

gcc/ChangeLog
PR middle-end/101955
PR tree-optimization/106245
* config/i386/i386.md (*extv_1_0): New define_insn_and_split.

gcc/testsuite/ChangeLog
PR middle-end/101955
PR tree-optimization/106245
* gcc.target/i386/pr106245-2.c: New test case.
* gcc.target/i386/pr106245-3.c: New 32-bit test case.
* gcc.target/i386/pr106245-4.c: New 64-bit test case.
* gcc.target/i386/pr106245-5.c: Likewise.


Thanks in advance,
Roger
--

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 2a60df5..b7309be0 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -3414,6 +3414,21 @@
   [(set_attr "type" "imovx")
(set_attr "mode" "SI")])
 
+;; Split sign-extension of single least significant bit as and x,$1;neg x
+(define_insn_and_split "*extv_1_0"
+  [(set (match_operand:SWI48 0 "register_operand" "=r")
+   (sign_extract:SWI48 (match_operand:SWI48 1 "register_operand" "0")
+   (const_int 1)
+   (const_int 0)))
+   (clobber (reg:CC FLAGS_REG))]
+  ""
+  "#"
+  "&& 1"
+  [(parallel [(set (match_dup 0) (and:SWI48 (match_dup 1) (const_int 1)))
+ (clobber (reg:CC FLAGS_REG))])
+   (parallel [(set (match_dup 0) (neg:SWI48 (match_dup 0)))
+ (clobber (reg:CC FLAGS_REG))])])
+
 (define_expand "extzv"
   [(set (match_operand:SWI248 0 "register_operand")
(zero_extract:SWI248 (match_operand:SWI248 1 "register_operand")
diff --git a/gcc/testsuite/gcc.target/i386/pr106245-2.c b/gcc/testsuite/gcc.target/i386/pr106245-2.c
new file mode 100644
index 000..47b0d27
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr106245-2.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int f(int a)
+{
+return (a << 31) >> 31;
+}
+
+/* { dg-final { scan-assembler "andl" } } */
+/* { dg-final { scan-assembler "negl" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr106245-3.c b/gcc/testsuite/gcc.target/i386/pr106245-3.c
new file mode 100644
index 000..4ec6342
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr106245-3.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target ia32 } } */
+/* { dg-options "-O2" } */
+
+long long f(long long a)
+{
+return (a << 63) >> 63;
+}
+
+/* { dg-final { scan-assembler "andl" } } */
+/* { dg-final { scan-assembler "negl" } } */
+/* { dg-final { scan-assembler "cltd" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr106245-4.c b/gcc/testsuite/gcc.target/i386/pr106245-4.c
new file mode 100644
index 000..ef77ee5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr106245-4.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2" } */
+
+long long f(long long a)
+{
+return (a << 63) >> 63;
+}
+
+/* { dg-final { scan-assembler "andl" } } */
+/* { dg-final { scan-assembler "negq" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr106245-5.c b/gcc/testsuite/gcc.target/i386/pr106245-5.c
new file mode 100644
index 000..0351866
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr106245-5.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2" } */
+
+__int128 f(__int128 a)
+{
+  return (a << 127) >> 127;
+}
+
+/* { dg-final { scan-assembler "andl" } } */
+/* { dg-final { scan-assembler "negq" } } */
+/* { dg-final { scan-assembler "cqto" } } */

[PATCH] tree-optimization/111846 - put simd-clone-info into SLP tree

2023-10-17 Thread Richard Biener

The following avoids bogusly reusing the simd-clone-info we
currently hang off stmt_info from two different SLP contexts where
a different number of lanes should have led to a different best
simd clone.
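A simplified, hypothetical C++ model of the problem (none of these types or
functions are GCC's): the best clone depends on the number of lanes, so
caching the choice on the statement lets a second SLP context with a
different lane count reuse a stale answer, while caching it per SLP node does
not. The selection rule in best_simdlen is an assumption for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical rule: the widest available simdlen that divides LANES.
int best_simdlen(int lanes, const std::vector<int> &available) {
  int best = 0;
  for (int s : available)
    if (lanes % s == 0 && s > best)
      best = s;
  return best;
}

struct StmtInfo {
  std::vector<int> simd_clone_info;  // shared by every SLP node using the stmt
};

struct SlpNode {
  StmtInfo *stmt;
  int lanes;
  std::vector<int> simd_clone_info;  // private to this SLP context
};

// Old scheme: the first analysis wins for every context of the statement.
int choose_cached_on_stmt(SlpNode &node, const std::vector<int> &available) {
  if (node.stmt->simd_clone_info.empty())
    node.stmt->simd_clone_info.push_back(best_simdlen(node.lanes, available));
  return node.stmt->simd_clone_info[0];
}

// New scheme: each SLP node records its own choice.
int choose_cached_on_node(SlpNode &node, const std::vector<int> &available) {
  if (node.simd_clone_info.empty())
    node.simd_clone_info.push_back(best_simdlen(node.lanes, available));
  return node.simd_clone_info[0];
}
```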

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/111846
* tree-vectorizer.h (_slp_tree::simd_clone_info): Add.
(SLP_TREE_SIMD_CLONE_INFO): New.
* tree-vect-slp.cc (_slp_tree::_slp_tree): Initialize
SLP_TREE_SIMD_CLONE_INFO.
(_slp_tree::~_slp_tree): Release it.
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Use
SLP_TREE_SIMD_CLONE_INFO or STMT_VINFO_SIMD_CLONE_INFO
depending on whether we're doing SLP.

* gcc.dg/vect/pr111846.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr111846.c | 12 ++
 gcc/tree-vect-slp.cc |  2 ++
 gcc/tree-vect-stmts.cc   | 35 +---
 gcc/tree-vectorizer.h|  6 +
 4 files changed, 36 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr111846.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr111846.c b/gcc/testsuite/gcc.dg/vect/pr111846.c
new file mode 100644
index 000..d283882f261
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr111846.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -ffast-math" } */
+/* { dg-additional-options "-mavx2" { target { x86_64-*-* i?86-*-* } } } */
+
+extern __attribute__((__simd__)) float powf(float, float);
+float gv[0][10];
+float eq_set_bands_real_adj[0];
+void eq_set_bands_real() {
+  for (int c = 0; c < 10; c++)
+for (int i = 0; i < 10; i++)
+  gv[c][i] = powf(0, eq_set_bands_real_adj[i]) - 1;
+}
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index af8f5031bd2..d081999a763 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -117,6 +117,7 @@ _slp_tree::_slp_tree ()
   SLP_TREE_CHILDREN (this) = vNULL;
   SLP_TREE_LOAD_PERMUTATION (this) = vNULL;
   SLP_TREE_LANE_PERMUTATION (this) = vNULL;
+  SLP_TREE_SIMD_CLONE_INFO (this) = vNULL;
   SLP_TREE_DEF_TYPE (this) = vect_uninitialized_def;
   SLP_TREE_CODE (this) = ERROR_MARK;
   SLP_TREE_VECTYPE (this) = NULL_TREE;
@@ -143,6 +144,7 @@ _slp_tree::~_slp_tree ()
   SLP_TREE_VEC_DEFS (this).release ();
   SLP_TREE_LOAD_PERMUTATION (this).release ();
   SLP_TREE_LANE_PERMUTATION (this).release ();
+  SLP_TREE_SIMD_CLONE_INFO (this).release ();
   if (this->failed)
 free (failed);
 }
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index b3a56498595..9bb43e98f56 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -4215,6 +4215,8 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
   if (nargs == 0)
 return false;
 
+  vec& simd_clone_info = (slp_node ? SLP_TREE_SIMD_CLONE_INFO (slp_node)
+   : STMT_VINFO_SIMD_CLONE_INFO (stmt_info));
   arginfo.reserve (nargs, true);
   auto_vec slp_op;
   slp_op.safe_grow_cleared (nargs);
@@ -4256,25 +4258,22 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
gcc_assert (thisarginfo.vectype != NULL_TREE);
 
   /* For linear arguments, the analyze phase should have saved
-the base and step in STMT_VINFO_SIMD_CLONE_INFO.  */
-  if (i * 3 + 4 <= STMT_VINFO_SIMD_CLONE_INFO (stmt_info).length ()
- && STMT_VINFO_SIMD_CLONE_INFO (stmt_info)[i * 3 + 2])
+the base and step in {STMT_VINFO,SLP_TREE}_SIMD_CLONE_INFO.  */
+  if (i * 3 + 4 <= simd_clone_info.length ()
+ && simd_clone_info[i * 3 + 2])
{
  gcc_assert (vec_stmt);
- thisarginfo.linear_step
-   = tree_to_shwi (STMT_VINFO_SIMD_CLONE_INFO (stmt_info)[i * 3 + 2]);
- thisarginfo.op
-   = STMT_VINFO_SIMD_CLONE_INFO (stmt_info)[i * 3 + 1];
+ thisarginfo.linear_step = tree_to_shwi (simd_clone_info[i * 3 + 2]);
+ thisarginfo.op = simd_clone_info[i * 3 + 1];
  thisarginfo.simd_lane_linear
-   = (STMT_VINFO_SIMD_CLONE_INFO (stmt_info)[i * 3 + 3]
-  == boolean_true_node);
+   = (simd_clone_info[i * 3 + 3] == boolean_true_node);
  /* If loop has been peeled for alignment, we need to adjust it.  */
  tree n1 = LOOP_VINFO_NITERS_UNCHANGED (loop_vinfo);
  tree n2 = LOOP_VINFO_NITERS (loop_vinfo);
  if (n1 != n2 && !thisarginfo.simd_lane_linear)
{
  tree bias = fold_build2 (MINUS_EXPR, TREE_TYPE (n1), n1, n2);
- tree step = STMT_VINFO_SIMD_CLONE_INFO (stmt_info)[i * 3 + 2];
+ tree step = simd_clone_info[i * 3 + 2];
  tree opt = TREE_TYPE (thisarginfo.op);
  bias = fold_convert (TREE_TYPE (step), bias);
  bias = fold_build2 (MULT_EXPR, TREE_TYPE (step), bias, step);
@@ -4328,8 +4327,8 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
   unsigned group_size = slp_node ?

Re: [PATCH] wide-int-print: Don't print large numbers hexadecimally for print_dec{,s,u}

2023-10-17 Thread Richard Biener

On Tue, 17 Oct 2023, Jakub Jelinek wrote:

> Hi!
> 
> The following patch implements printing of wide_int/widest_int numbers
> decimally when asked for that using print_dec{,s,u}, even if they have
> precision larger than 64 and get_len () above 1 (right now we printed
> them hexadecimally and even negative numbers as huge positive hexadecimal).
> 
> In order to avoid the expensive division/modulo by 10^19 twice, once to
> estimate how many will be needed and another to actually print it, the
> patch prints the 19 digit chunks in reverse order (from least significant
> to most significant) and then reorders those with linear complexity to form
> the right printed number.
> Tested with printing both 256 and 320 bit numbers (first as an example
> of even number of 19 digit chunks plus one shorter above it, the second
> as an example of odd number of 19 digit chunks plus one shorter above it).
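A minimal C++ sketch of the chunking idea for a 128-bit value. It assumes
GCC/Clang's unsigned __int128 in place of wide_int, and collects the chunks in
a small array instead of reordering them in place in the output buffer as the
patch does; print_decu_sketch is a hypothetical name. Each division by 10^19
is done only once; chunks come out least significant first and are emitted in
reverse, with all but the leading one zero-padded to 19 digits.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

using u128 = unsigned __int128;  // GCC/Clang extension, stands in for wide_int

// 10^19 is the largest power of ten that fits in 64 bits.
std::string print_decu_sketch(u128 v) {
  const std::uint64_t ten19 = 10000000000000000000ULL;
  std::uint64_t chunks[3];  // 2^128 - 1 has 39 digits: at most 3 chunks
  int n = 0;
  // One division/modulo per chunk, least significant chunk first.
  do {
    chunks[n++] = static_cast<std::uint64_t>(v % ten19);
    v /= ten19;
  } while (v != 0);

  // Emit in reverse: the leading chunk without padding, every following
  // chunk zero-padded to exactly 19 digits.
  char buf[20];
  std::snprintf(buf, sizeof buf, "%llu",
                static_cast<unsigned long long>(chunks[n - 1]));
  std::string out = buf;
  for (int i = n - 2; i >= 0; --i) {
    std::snprintf(buf, sizeof buf, "%019llu",
                  static_cast<unsigned long long>(chunks[i]));
    out += buf;
  }
  return out;
}
```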
> 
> The l * HOST_BITS_PER_WIDE_INT / 3 + 3 estimation, thinking about it now,
> is one byte too much (one byte for -, one for '\0') and too conservative,
> so we could go with l * HOST_BITS_PER_WIDE_INT / 3 + 2 as well, or e.g.
> l * HOST_BITS_PER_WIDE_INT * 10 / 33 + 3 as even less conservative
> estimation (though more expensive to compute in inline code).
> But that l * HOST_BITS_PER_WIDE_INT / 4 + 4; is likely one byte too much
> as well, 2 bytes for 0x, one byte for '\0' and where does the 4th one come
> from?  Of course all of these assuming HOST_BITS_PER_WIDE_INT is a multiple
> of 64...
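A short check of the tighter "/ 3 + 2" bound, writing n = l * HOST_BITS_PER_WIDE_INT
for the number of bits (a multiple of 64 under the assumption stated above):

```latex
\mathrm{digits}(2^n - 1) = \lfloor n \log_{10} 2 \rfloor + 1 \le 0.30103\,n + 1,
\qquad
\left\lfloor \tfrac{n}{3} \right\rfloor \ge \tfrac{n}{3} - \tfrac{2}{3}.

n\left(\tfrac{1}{3} - \log_{10} 2\right) \approx 0.0323\,n \ge \tfrac{5}{3}
\quad\text{for } n \ge 52
\;\Longrightarrow\;
\mathrm{digits}(2^n - 1) \le \left\lfloor \tfrac{n}{3} \right\rfloor .
```

So for every n >= 64, floor(n/3) + 2 bytes hold the digits plus a '-' and the
'\0'; for n = 128 that is 39 digits against floor(128/3) = 42.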
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2023-10-17  Jakub Jelinek  
> 
>   * wide-int-print.h (print_dec_buf_size): For length, divide number
>   of bits by 3 and add 3 instead of division by 4 and adding 4.
>   * wide-int-print.cc (print_decs): Remove superfluous ()s.  Don't call
>   print_hex, instead call print_decu on either negated value after
>   printing - or on wi itself.
>   (print_decu): Don't call print_hex, instead print even large numbers
>   decimally.
>   (pp_wide_int_large): Assume len from print_dec_buf_size is big enough
>   even if it returns false.
>   * pretty-print.h (pp_wide_int): Use print_dec_buf_size to check if
>   pp_wide_int_large should be used.
>   * tree-pretty-print.cc (dump_generic_node): Use print_hex_buf_size
>   to compute needed buffer size.
> 
> --- gcc/wide-int-print.h.jj   2023-10-15 23:04:06.195422820 +0200
> +++ gcc/wide-int-print.h  2023-10-16 10:14:41.327401697 +0200
> @@ -42,7 +42,7 @@ print_dec_buf_size (const wide_int_ref &
>unsigned int l = wi.get_len ();
>if ((l != 1 || sgn == UNSIGNED) && wi::neg_p (wi))
>  l = WIDE_INT_MAX_HWIS (wi.get_precision ());
> -  l = l * HOST_BITS_PER_WIDE_INT / 4 + 4;
> +  l = l * HOST_BITS_PER_WIDE_INT / 3 + 3;
>*len = l;
>return UNLIKELY (l > WIDE_INT_PRINT_BUFFER_SIZE);
>  }
> --- gcc/wide-int-print.cc.jj  2023-10-15 23:04:06.195422820 +0200
> +++ gcc/wide-int-print.cc 2023-10-16 11:20:30.662174735 +0200
> @@ -49,14 +49,12 @@ print_dec (const wide_int_ref , FILE
>  }
>  
>  
> -/* Try to print the signed self in decimal to BUF if the number fits
> -   in a HWI.  Other print in hex.  */
> +/* Try to print the signed self in decimal to BUF.  */
>  
>  void
>  print_decs (const wide_int_ref , char *buf)
>  {
> -  if ((wi.get_precision () <= HOST_BITS_PER_WIDE_INT)
> -  || (wi.get_len () == 1))
> +  if (wi.get_precision () <= HOST_BITS_PER_WIDE_INT || wi.get_len () == 1)
>  {
>if (wi::neg_p (wi))
>   sprintf (buf, "-" HOST_WIDE_INT_PRINT_UNSIGNED,
> @@ -64,12 +62,17 @@ print_decs (const wide_int_ref , char
>else
>   sprintf (buf, HOST_WIDE_INT_PRINT_DEC, wi.to_shwi ());
>  }
> +  else if (wi::neg_p (wi))
> +{
> +  widest2_int w = widest2_int::from (wi, SIGNED);
> +  *buf = '-';
> +  print_decu (-w, buf + 1);
> +}
>else
> -print_hex (wi, buf);
> +print_decu (wi, buf);
>  }
>  
> -/* Try to print the signed self in decimal to FILE if the number fits
> -   in a HWI.  Other print in hex.  */
> +/* Try to print the signed self in decimal to FILE.  */
>  
>  void
>  print_decs (const wide_int_ref , FILE *file)
> @@ -82,8 +85,7 @@ print_decs (const wide_int_ref , FILE
>fputs (p, file);
>  }
>  
> -/* Try to print the unsigned self in decimal to BUF if the number fits
> -   in a HWI.  Other print in hex.  */
> +/* Try to print the unsigned self in decimal to BUF.  */
>  
>  void
>  print_decu (const wide_int_ref , char *buf)
> @@ -92,11 +94,37 @@ print_decu (const wide_int_ref , char
>|| (wi.get_len () == 1 && !wi::neg_p (wi)))
>  sprintf (buf, HOST_WIDE_INT_PRINT_UNSIGNED, wi.to_uhwi ());
>else
> -print_hex (wi, buf);
> +{
> +  widest2_int w = widest2_int::from (wi, UNSIGNED), r;
> +  widest2_int ten19 = HOST_WIDE_INT_UC (1000);
> +  char buf2[20], next1[19], next2[19];
> +  size_t l, c

[PATCH v22 31/31] libstdc++: Optimize std::is_pointer compilation performance

2023-10-17 Thread Ken Matsui

This patch optimizes the compilation performance of std::is_pointer
by dispatching to the new __is_pointer built-in trait.
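For reference, a C++ sketch of the standard behaviour that both the built-in
and the fallback specializations have to reproduce, using only std::is_pointer:
true for object and function pointers, including cv-qualified pointer types;
false for references to pointers, arrays, pointers to members and
std::nullptr_t. The struct S is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

struct S { int m; void f(); };

// cv-qualifiers on the pointer itself do not matter.
static_assert(std::is_pointer<int *>::value, "");
static_assert(std::is_pointer<int *const volatile>::value, "");
static_assert(std::is_pointer<void (*)()>::value, "");

// None of these is a pointer type.
static_assert(!std::is_pointer<int *&>::value, "");
static_assert(!std::is_pointer<int[3]>::value, "");
static_assert(!std::is_pointer<int S::*>::value, "");
static_assert(!std::is_pointer<void (S::*)()>::value, "");
static_assert(!std::is_pointer<std::nullptr_t>::value, "");
```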

libstdc++-v3/ChangeLog:

* include/bits/cpp_type_traits.h (__is_pointer): Use __is_pointer
built-in trait.
* include/std/type_traits (is_pointer): Likewise. Optimize its
implementation.
(is_pointer_v): Likewise.

Co-authored-by: Jonathan Wakely 
Signed-off-by: Ken Matsui 
---
 libstdc++-v3/include/bits/cpp_type_traits.h |  8 
 libstdc++-v3/include/std/type_traits| 44 +
 2 files changed, 44 insertions(+), 8 deletions(-)

diff --git a/libstdc++-v3/include/bits/cpp_type_traits.h b/libstdc++-v3/include/bits/cpp_type_traits.h
index 4312f32a4e0..246f2cc0b17 100644
--- a/libstdc++-v3/include/bits/cpp_type_traits.h
+++ b/libstdc++-v3/include/bits/cpp_type_traits.h
@@ -363,6 +363,13 @@ __INT_N(__GLIBCXX_TYPE_INT_N_3)
   //
   // Pointer types
   //
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_pointer)
+  template
+struct __is_pointer : __truth_type<__is_pointer(_Tp)>
+{
+  enum { __value = __is_pointer(_Tp) };
+};
+#else
   template
 struct __is_pointer
 {
@@ -376,6 +383,7 @@ __INT_N(__GLIBCXX_TYPE_INT_N_3)
   enum { __value = 1 };
   typedef __true_type __type;
 };
+#endif
 
   //
   // An arithmetic type is an integer type or a floating point type
diff --git a/libstdc++-v3/include/std/type_traits b/libstdc++-v3/include/std/type_traits
index 50210297121..acd117cfa73 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -542,19 +542,33 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 : public true_type { };
 #endif
 
-  template
-struct __is_pointer_helper
+  /// is_pointer
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_pointer)
+  template
+struct is_pointer
+: public __bool_constant<__is_pointer(_Tp)>
+{ };
+#else
+  template
+struct is_pointer
 : public false_type { };
 
   template
-struct __is_pointer_helper<_Tp*>
+struct is_pointer<_Tp*>
 : public true_type { };
 
-  /// is_pointer
   template
-struct is_pointer
-: public __is_pointer_helper<__remove_cv_t<_Tp>>::type
-{ };
+struct is_pointer<_Tp* const>
+: public true_type { };
+
+  template
+struct is_pointer<_Tp* volatile>
+: public true_type { };
+
+  template
+struct is_pointer<_Tp* const volatile>
+: public true_type { };
+#endif
 
   /// is_lvalue_reference
   template
@@ -3252,8 +3266,22 @@ template 
   inline constexpr bool is_array_v<_Tp[_Num]> = true;
 #endif
 
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_pointer)
+template 
+  inline constexpr bool is_pointer_v = __is_pointer(_Tp);
+#else
 template 
-  inline constexpr bool is_pointer_v = is_pointer<_Tp>::value;
+  inline constexpr bool is_pointer_v = false;
+template 
+  inline constexpr bool is_pointer_v<_Tp*> = true;
+template 
+  inline constexpr bool is_pointer_v<_Tp* const> = true;
+template 
+  inline constexpr bool is_pointer_v<_Tp* volatile> = true;
+template 
+  inline constexpr bool is_pointer_v<_Tp* const volatile> = true;
+#endif
+
 template 
   inline constexpr bool is_lvalue_reference_v = false;
 template 
-- 
2.42.0

[PATCH v22 11/31] libstdc++: Optimize std::is_unbounded_array compilation performance

2023-10-17 Thread Ken Matsui

This patch optimizes the compilation performance of std::is_unbounded_array
by dispatching to the new __is_unbounded_array built-in trait.
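A C++14 sketch of why the single partial specialization in the #else fallback
is enough; is_unbounded_array_sketch_v is a hypothetical name with the same
shape as the fallback. cv-qualifiers on an array type apply to its elements,
so 'const int[]' still matches T[] with T = const int.

```cpp
#include <cassert>

template <typename T>
constexpr bool is_unbounded_array_sketch_v = false;

template <typename T>
constexpr bool is_unbounded_array_sketch_v<T[]> = true;

static_assert(is_unbounded_array_sketch_v<const int[]>, "");  // T = const int
static_assert(is_unbounded_array_sketch_v<int[][3]>, "");     // T = int[3]
static_assert(!is_unbounded_array_sketch_v<int[3]>, "");      // known bound
static_assert(!is_unbounded_array_sketch_v<int *>, "");
```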

libstdc++-v3/ChangeLog:

* include/std/type_traits (is_unbounded_array_v): Use
__is_unbounded_array built-in trait.

Signed-off-by: Ken Matsui 
---
 libstdc++-v3/include/std/type_traits | 5 +
 1 file changed, 5 insertions(+)

diff --git a/libstdc++-v3/include/std/type_traits b/libstdc++-v3/include/std/type_traits
index 4e8165e5af5..cb3d9e238fa 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -3541,11 +3541,16 @@ template
   /// True for a type that is an array of unknown bound.
   /// @ingroup variable_templates
   /// @since C++20
+# if _GLIBCXX_USE_BUILTIN_TRAIT(__is_unbounded_array)
+  template
+inline constexpr bool is_unbounded_array_v = __is_unbounded_array(_Tp);
+# else
   template
 inline constexpr bool is_unbounded_array_v = false;
 
   template
 inline constexpr bool is_unbounded_array_v<_Tp[]> = true;
+# endif
 
   /// True for a type that is an array of known bound.
   /// @since C++20
-- 
2.42.0

[PATCH v22 28/31] c++: Implement __remove_pointer built-in trait

2023-10-17 Thread Ken Matsui

This patch implements a built-in trait for std::remove_pointer.
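For reference, a C++ sketch of the std::remove_pointer semantics the new
built-in mirrors (the testcase below checks the same cases against
__remove_pointer): one top-level pointer is removed together with its own
cv-qualifiers, the pointee's qualifiers are kept, and references and arrays
are returned unchanged. The alias rp is hypothetical.

```cpp
#include <cassert>
#include <type_traits>

template <typename T>
using rp = typename std::remove_pointer<T>::type;

// One level of pointer is removed; the pointer's own cv-qualifiers go
// with it, the pointee's qualifiers stay.
static_assert(std::is_same<rp<int *const volatile>, int>::value, "");
static_assert(std::is_same<rp<const int *>, const int>::value, "");
static_assert(std::is_same<rp<int **>, int *>::value, "");

// References and arrays are not pointers and are returned unchanged.
static_assert(std::is_same<rp<int *&>, int *&>::value, "");
static_assert(std::is_same<rp<int[3]>, int[3]>::value, "");
```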

gcc/cp/ChangeLog:

* cp-trait.def: Define __remove_pointer.
* semantics.cc (finish_trait_type): Handle CPTK_REMOVE_POINTER.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of __remove_pointer.
* g++.dg/ext/remove_pointer.C: New test.

Signed-off-by: Ken Matsui 
---
 gcc/cp/cp-trait.def   |  1 +
 gcc/cp/semantics.cc   |  5 +++
 gcc/testsuite/g++.dg/ext/has-builtin-1.C  |  3 ++
 gcc/testsuite/g++.dg/ext/remove_pointer.C | 51 +++
 4 files changed, 60 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/remove_pointer.C

diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
index 191a86307fc..1f405f61861 100644
--- a/gcc/cp/cp-trait.def
+++ b/gcc/cp/cp-trait.def
@@ -98,6 +98,7 @@ DEFTRAIT_EXPR (REF_CONSTRUCTS_FROM_TEMPORARY, "__reference_constructs_from_tempo
 DEFTRAIT_EXPR (REF_CONVERTS_FROM_TEMPORARY, "__reference_converts_from_temporary", 2)
 DEFTRAIT_TYPE (REMOVE_CV, "__remove_cv", 1)
 DEFTRAIT_TYPE (REMOVE_CVREF, "__remove_cvref", 1)
+DEFTRAIT_TYPE (REMOVE_POINTER, "__remove_pointer", 1)
 DEFTRAIT_TYPE (REMOVE_REFERENCE, "__remove_reference", 1)
 DEFTRAIT_TYPE (TYPE_PACK_ELEMENT, "__type_pack_element", -1)
 DEFTRAIT_TYPE (UNDERLYING_TYPE, "__underlying_type", 1)
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index e3f71ff5902..45584e9045f 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -12494,6 +12494,11 @@ finish_trait_type (cp_trait_kind kind, tree type1, tree type2,
type1 = TREE_TYPE (type1);
   return cv_unqualified (type1);
 
+case CPTK_REMOVE_POINTER:
+  if (TYPE_PTR_P (type1))
+type1 = TREE_TYPE (type1);
+  return type1;
+
 case CPTK_REMOVE_REFERENCE:
   if (TYPE_REF_P (type1))
type1 = TREE_TYPE (type1);
diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
index 163be1d710b..719902d3f1a 100644
--- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
+++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
@@ -176,6 +176,9 @@
 #if !__has_builtin (__remove_cvref)
 # error "__has_builtin (__remove_cvref) failed"
 #endif
+#if !__has_builtin (__remove_pointer)
+# error "__has_builtin (__remove_pointer) failed"
+#endif
 #if !__has_builtin (__remove_reference)
 # error "__has_builtin (__remove_reference) failed"
 #endif
diff --git a/gcc/testsuite/g++.dg/ext/remove_pointer.C b/gcc/testsuite/g++.dg/ext/remove_pointer.C
new file mode 100644
index 000..7b13db93950
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/remove_pointer.C
@@ -0,0 +1,51 @@
+// { dg-do compile { target c++11 } }
+
+#define SA(X) static_assert((X),#X)
+
+SA(__is_same(__remove_pointer(int), int));
+SA(__is_same(__remove_pointer(int*), int));
+SA(__is_same(__remove_pointer(int**), int*));
+
+SA(__is_same(__remove_pointer(const int*), const int));
+SA(__is_same(__remove_pointer(const int**), const int*));
+SA(__is_same(__remove_pointer(int* const), int));
+SA(__is_same(__remove_pointer(int** const), int*));
+SA(__is_same(__remove_pointer(int* const* const), int* const));
+
+SA(__is_same(__remove_pointer(volatile int*), volatile int));
+SA(__is_same(__remove_pointer(volatile int**), volatile int*));
+SA(__is_same(__remove_pointer(int* volatile), int));
+SA(__is_same(__remove_pointer(int** volatile), int*));
+SA(__is_same(__remove_pointer(int* volatile* volatile), int* volatile));
+
+SA(__is_same(__remove_pointer(const volatile int*), const volatile int));
+SA(__is_same(__remove_pointer(const volatile int**), const volatile int*));
+SA(__is_same(__remove_pointer(const int* volatile), const int));
+SA(__is_same(__remove_pointer(volatile int* const), volatile int));
+SA(__is_same(__remove_pointer(int* const volatile), int));
+SA(__is_same(__remove_pointer(const int** volatile), const int*));
+SA(__is_same(__remove_pointer(volatile int** const), volatile int*));
+SA(__is_same(__remove_pointer(int** const volatile), int*));
+SA(__is_same(__remove_pointer(int* const* const volatile), int* const));
+SA(__is_same(__remove_pointer(int* volatile* const volatile), int* volatile));
+SA(__is_same(__remove_pointer(int* const volatile* const volatile), int* const volatile));
+
+SA(__is_same(__remove_pointer(int&), int&));
+SA(__is_same(__remove_pointer(const int&), const int&));
+SA(__is_same(__remove_pointer(volatile int&), volatile int&));
+SA(__is_same(__remove_pointer(const volatile int&), const volatile int&));
+
+SA(__is_same(__remove_pointer(int&&), int&&));
+SA(__is_same(__remove_pointer(const int&&), const int&&));
+SA(__is_same(__remove_pointer(volatile int&&), volatile int&&));
+SA(__is_same(__remove_pointer(const volatile int&&), const volatile int&&));
+
+SA(__is_same(__remove_pointer(int[3]), int[3]));
+SA(__is_same(__remove_pointer(const int[3]), const int[3]));
+SA(__is_same(__remove_pointer(volatile int[3]), volatile int[3]));
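
The expectations in remove_pointer.C follow the rules of std::remove_pointer: exactly one pointer level is stripped, cv-qualifiers on the pointer itself are dropped while those on the pointee are kept, and references and arrays pass through unchanged. A library-only sketch of those rules (the name `remove_pointer_sketch` is hypothetical, not part of libstdc++) needs one partial specialization per cv-qualified pointer form, which is the kind of template machinery a built-in avoids:

```cpp
#include <type_traits>

// Hypothetical library-only equivalent of __remove_pointer, for illustration.
// Primary template: non-pointer types (including references and arrays)
// are returned unchanged.
template <typename T>
struct remove_pointer_sketch { using type = T; };

// One specialization per cv-qualification of the pointer itself: the
// qualifiers on the pointer are discarded, those on the pointee are kept.
template <typename T>
struct remove_pointer_sketch<T*> { using type = T; };
template <typename T>
struct remove_pointer_sketch<T* const> { using type = T; };
template <typename T>
struct remove_pointer_sketch<T* volatile> { using type = T; };
template <typename T>
struct remove_pointer_sketch<T* const volatile> { using type = T; };

template <typename T>
using remove_pointer_sketch_t = typename remove_pointer_sketch<T>::type;

// A few of the cases exercised by remove_pointer.C.
static_assert(std::is_same<remove_pointer_sketch_t<int**>, int*>::value, "");
static_assert(std::is_same<remove_pointer_sketch_t<int* const>, int>::value, "");
static_assert(std::is_same<remove_pointer_sketch_t<const int* volatile>, const int>::value, "");
static_assert(std::is_same<remove_pointer_sketch_t<int&>, int&>::value, "");
static_assert(std::is_same<remove_pointer_sketch_t<int[3]>, int[3]>::value, "");
```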

[PATCH v22 22/31] c++: Implement __is_reference built-in trait

2023-10-17 Thread Ken Matsui

This patch implements a built-in trait for std::is_reference.
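
`__is_reference` answers the same question as std::is_reference: true for lvalue and rvalue references (including references to functions such as `int(&)(int)`), false for everything else. A minimal library-only sketch with a hypothetical name shows the two partial specializations that the built-in makes unnecessary:

```cpp
#include <type_traits>

// Hypothetical library-only counterpart of __is_reference.
template <typename T> struct is_reference_sketch : std::false_type {};
// Lvalue references, including references to functions such as int(&)(int).
template <typename T> struct is_reference_sketch<T&> : std::true_type {};
// Rvalue references, including int(&&)(int).
template <typename T> struct is_reference_sketch<T&&> : std::true_type {};

static_assert(is_reference_sketch<int&>::value, "");
static_assert(is_reference_sketch<int(&&)(int)>::value, "");
static_assert(!is_reference_sketch<int*>::value, "");
static_assert(!is_reference_sketch<int(int)>::value, "");
```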

gcc/cp/ChangeLog:

* cp-trait.def: Define __is_reference.
* constraint.cc (diagnose_trait_expr): Handle CPTK_IS_REFERENCE.
* semantics.cc (trait_expr_value): Likewise.
(finish_trait_expr): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of __is_reference.
* g++.dg/ext/is_reference.C: New test.

Signed-off-by: Ken Matsui 
---
 gcc/cp/constraint.cc |  3 +++
 gcc/cp/cp-trait.def  |  1 +
 gcc/cp/semantics.cc  |  4 +++
 gcc/testsuite/g++.dg/ext/has-builtin-1.C |  3 +++
 gcc/testsuite/g++.dg/ext/is_reference.C  | 34 
 5 files changed, 45 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/is_reference.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 9db3a60943e..e05d4fa4d20 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -3788,6 +3788,9 @@ diagnose_trait_expr (tree expr, tree args)
 case CPTK_IS_POLYMORPHIC:
   inform (loc, "  %qT is not a polymorphic type", t1);
   break;
+case CPTK_IS_REFERENCE:
+  inform (loc, "  %qT is not a reference", t1);
+  break;
 case CPTK_IS_SAME:
   inform (loc, "  %qT is not the same as %qT", t1, t2);
   break;
diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
index 11fd70b3964..e867d9c4c47 100644
--- a/gcc/cp/cp-trait.def
+++ b/gcc/cp/cp-trait.def
@@ -81,6 +81,7 @@ DEFTRAIT_EXPR (IS_NOTHROW_CONVERTIBLE, "__is_nothrow_convertible", 2)
 DEFTRAIT_EXPR (IS_POINTER_INTERCONVERTIBLE_BASE_OF, "__is_pointer_interconvertible_base_of", 2)
 DEFTRAIT_EXPR (IS_POD, "__is_pod", 1)
 DEFTRAIT_EXPR (IS_POLYMORPHIC, "__is_polymorphic", 1)
+DEFTRAIT_EXPR (IS_REFERENCE, "__is_reference", 1)
 DEFTRAIT_EXPR (IS_SAME, "__is_same", 2)
 DEFTRAIT_EXPR (IS_SCOPED_ENUM, "__is_scoped_enum", 1)
 DEFTRAIT_EXPR (IS_STD_LAYOUT, "__is_standard_layout", 1)
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index c7e6396370d..cd17cd176cb 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -12211,6 +12211,9 @@ trait_expr_value (cp_trait_kind kind, tree type1, tree type2)
 case CPTK_IS_POLYMORPHIC:
   return CLASS_TYPE_P (type1) && TYPE_POLYMORPHIC_P (type1);
 
+case CPTK_IS_REFERENCE:
+  return type_code1 == REFERENCE_TYPE;
+
 case CPTK_IS_SAME:
   return same_type_p (type1, type2);
 
@@ -12405,6 +12408,7 @@ finish_trait_expr (location_t loc, cp_trait_kind kind, tree type1, tree type2)
 case CPTK_IS_MEMBER_FUNCTION_POINTER:
 case CPTK_IS_MEMBER_OBJECT_POINTER:
 case CPTK_IS_MEMBER_POINTER:
+case CPTK_IS_REFERENCE:
 case CPTK_IS_SAME:
 case CPTK_IS_SCOPED_ENUM:
 case CPTK_IS_UNBOUNDED_ARRAY:
diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
index 8d9cdc528cd..e112d317657 100644
--- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
+++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
@@ -122,6 +122,9 @@
 #if !__has_builtin (__is_polymorphic)
 # error "__has_builtin (__is_polymorphic) failed"
 #endif
+#if !__has_builtin (__is_reference)
+# error "__has_builtin (__is_reference) failed"
+#endif
 #if !__has_builtin (__is_same)
 # error "__has_builtin (__is_same) failed"
 #endif
diff --git a/gcc/testsuite/g++.dg/ext/is_reference.C b/gcc/testsuite/g++.dg/ext/is_reference.C
new file mode 100644
index 000..b5ce4db7afd
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/is_reference.C
@@ -0,0 +1,34 @@
+// { dg-do compile { target c++11 } }
+
+#include 
+
+using namespace __gnu_test;
+
+#define SA(X) static_assert((X),#X)
+#define SA_TEST_CATEGORY(TRAIT, TYPE, EXPECT)  \
+  SA(TRAIT(TYPE) == EXPECT);   \
+  SA(TRAIT(const TYPE) == EXPECT); \
+  SA(TRAIT(volatile TYPE) == EXPECT);  \
+  SA(TRAIT(const volatile TYPE) == EXPECT)
+
+// Positive tests.
+SA_TEST_CATEGORY(__is_reference, int&, true);
+SA_TEST_CATEGORY(__is_reference, ClassType&, true);
+SA(__is_reference(int(&)(int)));
+SA_TEST_CATEGORY(__is_reference, int&&, true);
+SA_TEST_CATEGORY(__is_reference, ClassType&&, true);
+SA(__is_reference(int(&&)(int)));
+SA_TEST_CATEGORY(__is_reference, IncompleteClass&, true);
+
+// Negative tests
+SA_TEST_CATEGORY(__is_reference, void, false);
+SA_TEST_CATEGORY(__is_reference, int*, false);
+SA_TEST_CATEGORY(__is_reference, int[3], false);
+SA(!__is_reference(int(int)));
+SA(!__is_reference(int(*const)(int)));
+SA(!__is_reference(int(*volatile)(int)));
+SA(!__is_reference(int(*const volatile)(int)));
+
+// Sanity check.
+SA_TEST_CATEGORY(__is_reference, ClassType, false);
+SA_TEST_CATEGORY(__is_reference, IncompleteClass, false);
-- 
2.42.0

[PATCH v22 17/31] libstdc++: Optimize std::is_member_pointer compilation performance

2023-10-17 Thread Ken Matsui

This patch optimizes the compilation performance of std::is_member_pointer
by dispatching to the new __is_member_pointer built-in trait.
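
The pattern used throughout this series is to use the built-in when the compiler provides it and keep the old template otherwise. Inside libstdc++ the `_GLIBCXX_USE_BUILTIN_TRAIT` macro makes that choice; outside it, the same idea can be sketched with `__has_builtin`. The macro `SKETCH_HAS_BUILTIN` and the trait names below are hypothetical:

```cpp
#include <type_traits>

// Hypothetical portability macro: __has_builtin exists in recent GCC and
// Clang; without it, assume no built-in is available.
#ifdef __has_builtin
#  define SKETCH_HAS_BUILTIN(x) __has_builtin(x)
#else
#  define SKETCH_HAS_BUILTIN(x) 0
#endif

#if SKETCH_HAS_BUILTIN(__is_member_pointer)
// Fast path: a single built-in query, no helper instantiations.
template <typename T>
struct is_member_pointer_sketch
  : std::integral_constant<bool, __is_member_pointer(T)> {};
#else
// Fallback: strip top-level cv, then match the pointer-to-member form.
template <typename T>
struct is_member_pointer_helper : std::false_type {};
template <typename T, typename C>
struct is_member_pointer_helper<T C::*> : std::true_type {};

template <typename T>
struct is_member_pointer_sketch
  : is_member_pointer_helper<typename std::remove_cv<T>::type> {};
#endif

struct Holder {};

static_assert(is_member_pointer_sketch<int Holder::*>::value, "");
static_assert(is_member_pointer_sketch<void (Holder::*)()>::value, "");
static_assert(!is_member_pointer_sketch<int*>::value, "");
```

Either branch yields the same answers. The built-in branch just saves the compiler from instantiating `remove_cv` and the helper for every queried type.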

libstdc++-v3/ChangeLog:

* include/std/type_traits (is_member_pointer): Use __is_member_pointer
built-in trait.
(is_member_pointer_v): Likewise.

Signed-off-by: Ken Matsui 
---
 libstdc++-v3/include/std/type_traits | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/type_traits b/libstdc++-v3/include/std/type_traits
index 7fd29d8d9f2..d7f89cf7c06 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -716,6 +716,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 struct is_compound
 : public __not_>::type { };
 
+  /// is_member_pointer
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_member_pointer)
+  template
+struct is_member_pointer
+: public __bool_constant<__is_member_pointer(_Tp)>
+{ };
+#else
   /// @cond undocumented
   template
 struct __is_member_pointer_helper
@@ -726,11 +733,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 : public true_type { };
   /// @endcond
 
-  /// is_member_pointer
   template
 struct is_member_pointer
 : public __is_member_pointer_helper<__remove_cv_t<_Tp>>::type
 { };
+#endif
 
   template
 struct is_same;
@@ -3242,8 +3249,14 @@ template 
   inline constexpr bool is_scalar_v = is_scalar<_Tp>::value;
 template 
   inline constexpr bool is_compound_v = is_compound<_Tp>::value;
+
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_member_pointer)
+template 
+  inline constexpr bool is_member_pointer_v = __is_member_pointer(_Tp);
+#else
 template 
   inline constexpr bool is_member_pointer_v = is_member_pointer<_Tp>::value;
+#endif
 
 #if _GLIBCXX_USE_BUILTIN_TRAIT(__is_const)
 template 
-- 
2.42.0

[PATCH v22 20/31] c++: Implement __is_member_object_pointer built-in trait

2023-10-17 Thread Ken Matsui

This patch implements a built-in trait for std::is_member_object_pointer.
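
In the front end the new trait is computed as `TYPE_PTRMEM_P (type1) && !TYPE_PTRMEMFUNC_P (type1)`, that is, a pointer to member that is not a pointer to member function. The same decomposition holds at the library level and can be checked against std::is_member_object_pointer. The variable template and `ClassLike` below are hypothetical stand-ins:

```cpp
#include <type_traits>

// Hypothetical: "member object pointer" = member pointer minus member
// function pointer, mirroring TYPE_PTRMEM_P && !TYPE_PTRMEMFUNC_P.
template <typename T>
constexpr bool is_member_object_pointer_sketch_v =
    std::is_member_pointer<T>::value &&
    !std::is_member_function_pointer<T>::value;

struct ClassLike {};  // stands in for the testsuite's ClassType

static_assert(is_member_object_pointer_sketch_v<int ClassLike::*>, "");
static_assert(!is_member_object_pointer_sketch_v<int (ClassLike::*)(int)>, "");
static_assert(is_member_object_pointer_sketch_v<int ClassLike::*> ==
              std::is_member_object_pointer<int ClassLike::*>::value, "");
```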

gcc/cp/ChangeLog:

* cp-trait.def: Define __is_member_object_pointer.
* constraint.cc (diagnose_trait_expr): Handle
CPTK_IS_MEMBER_OBJECT_POINTER.
* semantics.cc (trait_expr_value): Likewise.
(finish_trait_expr): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of
__is_member_object_pointer.
* g++.dg/ext/is_member_object_pointer.C: New test.

Signed-off-by: Ken Matsui 
---
 gcc/cp/constraint.cc  |  3 ++
 gcc/cp/cp-trait.def   |  1 +
 gcc/cp/semantics.cc   |  4 +++
 gcc/testsuite/g++.dg/ext/has-builtin-1.C  |  3 ++
 .../g++.dg/ext/is_member_object_pointer.C | 30 +++
 5 files changed, 41 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/is_member_object_pointer.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index dde83533382..9db3a60943e 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -3760,6 +3760,9 @@ diagnose_trait_expr (tree expr, tree args)
 case CPTK_IS_MEMBER_FUNCTION_POINTER:
   inform (loc, "  %qT is not a member function pointer", t1);
   break;
+case CPTK_IS_MEMBER_OBJECT_POINTER:
+  inform (loc, "  %qT is not a member object pointer", t1);
+  break;
 case CPTK_IS_MEMBER_POINTER:
   inform (loc, "  %qT is not a member pointer", t1);
   break;
diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
index 897b96630f2..11fd70b3964 100644
--- a/gcc/cp/cp-trait.def
+++ b/gcc/cp/cp-trait.def
@@ -73,6 +73,7 @@ DEFTRAIT_EXPR (IS_FINAL, "__is_final", 1)
 DEFTRAIT_EXPR (IS_LAYOUT_COMPATIBLE, "__is_layout_compatible", 2)
 DEFTRAIT_EXPR (IS_LITERAL_TYPE, "__is_literal_type", 1)
 DEFTRAIT_EXPR (IS_MEMBER_FUNCTION_POINTER, "__is_member_function_pointer", 1)
+DEFTRAIT_EXPR (IS_MEMBER_OBJECT_POINTER, "__is_member_object_pointer", 1)
 DEFTRAIT_EXPR (IS_MEMBER_POINTER, "__is_member_pointer", 1)
 DEFTRAIT_EXPR (IS_NOTHROW_ASSIGNABLE, "__is_nothrow_assignable", 2)
 DEFTRAIT_EXPR (IS_NOTHROW_CONSTRUCTIBLE, "__is_nothrow_constructible", -1)
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 59aaa256232..c7e6396370d 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -12187,6 +12187,9 @@ trait_expr_value (cp_trait_kind kind, tree type1, tree type2)
 case CPTK_IS_MEMBER_FUNCTION_POINTER:
   return TYPE_PTRMEMFUNC_P (type1);
 
+case CPTK_IS_MEMBER_OBJECT_POINTER:
+  return TYPE_PTRMEM_P (type1) && !TYPE_PTRMEMFUNC_P (type1);
+
 case CPTK_IS_MEMBER_POINTER:
   return TYPE_PTRMEM_P (type1);
 
@@ -12400,6 +12403,7 @@ finish_trait_expr (location_t loc, cp_trait_kind kind, tree type1, tree type2)
 case CPTK_IS_CONST:
 case CPTK_IS_ENUM:
 case CPTK_IS_MEMBER_FUNCTION_POINTER:
+case CPTK_IS_MEMBER_OBJECT_POINTER:
 case CPTK_IS_MEMBER_POINTER:
 case CPTK_IS_SAME:
 case CPTK_IS_SCOPED_ENUM:
diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
index 0dfe957474b..8d9cdc528cd 100644
--- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
+++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
@@ -98,6 +98,9 @@
 #if !__has_builtin (__is_member_function_pointer)
 # error "__has_builtin (__is_member_function_pointer) failed"
 #endif
+#if !__has_builtin (__is_member_object_pointer)
+# error "__has_builtin (__is_member_object_pointer) failed"
+#endif
 #if !__has_builtin (__is_member_pointer)
 # error "__has_builtin (__is_member_pointer) failed"
 #endif
diff --git a/gcc/testsuite/g++.dg/ext/is_member_object_pointer.C b/gcc/testsuite/g++.dg/ext/is_member_object_pointer.C
new file mode 100644
index 000..835e48c8f8e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/is_member_object_pointer.C
@@ -0,0 +1,30 @@
+// { dg-do compile { target c++11 } }
+
+#include 
+
+using namespace __gnu_test;
+
+#define SA(X) static_assert((X),#X)
+
+#define SA_TEST_NON_VOLATILE(TRAIT, TYPE, EXPECT)  \
+  SA(TRAIT(TYPE) == EXPECT);   \
+  SA(TRAIT(const TYPE) == EXPECT)
+
+#define SA_TEST_CATEGORY(TRAIT, TYPE, EXPECT)  \
+  SA(TRAIT(TYPE) == EXPECT);   \
+  SA(TRAIT(const TYPE) == EXPECT); \
+  SA(TRAIT(volatile TYPE) == EXPECT);  \
+  SA(TRAIT(const volatile TYPE) == EXPECT)
+
+// Positive tests.
+SA_TEST_CATEGORY(__is_member_object_pointer, int (ClassType::*), true);
+SA_TEST_CATEGORY(__is_member_object_pointer, ClassType (ClassType::*), true);
+
+// Negative tests.
+SA_TEST_NON_VOLATILE(__is_member_object_pointer, int (ClassType::*) (int), false);
+SA_TEST_NON_VOLATILE(__is_member_object_pointer, int (ClassType::*) (float, ...), false);
+SA_TEST_NON_VOLATILE(__is_member_object_pointer, ClassType (ClassType::*) (ClassType), false);
+SA_TEST_NON_VOLATILE(__is_member_object_pointer, float (ClassType::*)

[PATCH v22 15/31] libstdc++: Optimize std::is_scoped_enum compilation performance

2023-10-17 Thread Ken Matsui

This patch optimizes the compilation performance of std::is_scoped_enum
by dispatching to the new __is_scoped_enum built-in trait.
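
Without the built-in, a scoped enumeration has to be recognised indirectly: it is an enumeration that does not implicitly convert to its underlying type. The following C++17 sketch shows one way to express that test. It forms std::underlying_type only for enumerations. The names are hypothetical, and this is not necessarily how the libstdc++ fallback is written:

```cpp
#include <type_traits>

// Primary template: non-enumeration types are never scoped enums.
template <typename T, bool = std::is_enum<T>::value>
struct is_scoped_enum_sketch : std::false_type {};

// For enumerations: an unscoped enum converts implicitly to its underlying
// type, a scoped enum (enum class / enum struct) does not.
template <typename T>
struct is_scoped_enum_sketch<T, true>
  : std::bool_constant<
        !std::is_convertible<T, std::underlying_type_t<T>>::value> {};

enum Unscoped { u0 };
enum class Scoped { s0 };

static_assert(is_scoped_enum_sketch<Scoped>::value, "");
static_assert(!is_scoped_enum_sketch<Unscoped>::value, "");
static_assert(!is_scoped_enum_sketch<int>::value, "");
```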

libstdc++-v3/ChangeLog:

* include/std/type_traits (is_scoped_enum): Use
__is_scoped_enum built-in trait.
(is_scoped_enum_v): Likewise.

Signed-off-by: Ken Matsui 
---
 libstdc++-v3/include/std/type_traits | 12 
 1 file changed, 12 insertions(+)

diff --git a/libstdc++-v3/include/std/type_traits b/libstdc++-v3/include/std/type_traits
index d306073a797..7fd29d8d9f2 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -3633,6 +3633,12 @@ template
   /// True if the type is a scoped enumeration type.
   /// @since C++23
 
+# if _GLIBCXX_USE_BUILTIN_TRAIT(__is_scoped_enum)
+  template
+struct is_scoped_enum
+: bool_constant<__is_scoped_enum(_Tp)>
+{ };
+# else
   template
 struct is_scoped_enum
 : false_type
@@ -3644,11 +3650,17 @@ template
 struct is_scoped_enum<_Tp>
 : bool_constant
 { };
+# endif
 
   /// @ingroup variable_templates
   /// @since C++23
+# if _GLIBCXX_USE_BUILTIN_TRAIT(__is_scoped_enum)
+  template
+inline constexpr bool is_scoped_enum_v = __is_scoped_enum(_Tp);
+# else
   template
 inline constexpr bool is_scoped_enum_v = is_scoped_enum<_Tp>::value;
+# endif
 #endif
 
 #ifdef __cpp_lib_reference_from_temporary // C++ >= 23 && ref_{converts,constructs}_from_temp
-- 
2.42.0

[PATCH v22 25/31] libstdc++: Optimize std::is_function compilation performance

2023-10-17 Thread Ken Matsui

This patch optimizes the compilation performance of std::is_function
by dispatching to the new __is_function built-in trait.
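
The ChangeLog entry also optimizes the fallback is_function_v. One library-only technique that avoids class-template instantiation relies on a language rule. Cv-qualifiers added on top of a function type or a reference type, for example through a template parameter, are ignored. So `const T` is non-const exactly for those two categories, and references can then be excluded with partial specializations. The sketch below assumes the fallback works along these lines; the name is hypothetical:

```cpp
#include <type_traits>

// Hypothetical: for most T, `const T` is const-qualified.  For function
// types and reference types the added const is ignored, so
// is_const_v<const T> is false only for those two categories.
template <typename T>
inline constexpr bool is_function_sketch_v = !std::is_const_v<const T>;

// References would otherwise be misclassified as functions.
template <typename T>
inline constexpr bool is_function_sketch_v<T&> = false;
template <typename T>
inline constexpr bool is_function_sketch_v<T&&> = false;

static_assert(is_function_sketch_v<int(int)>);
static_assert(is_function_sketch_v<void() const>);  // "abominable" function type
static_assert(!is_function_sketch_v<int>);
static_assert(!is_function_sketch_v<int(&)(int)>);  // reference to function
static_assert(!is_function_sketch_v<void>);         // const void is const
```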

libstdc++-v3/ChangeLog:

* include/std/type_traits (is_function): Use __is_function built-in
trait.
(is_function_v): Likewise. Optimize its implementation.

Signed-off-by: Ken Matsui 
---
 libstdc++-v3/include/std/type_traits | 19 ++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/type_traits b/libstdc++-v3/include/std/type_traits
index 36ad9814047..bd57488824b 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -637,6 +637,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { };
 
   /// is_function
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_function)
+  template
+struct is_function
+: public __bool_constant<__is_function(_Tp)>
+{ };
+#else
   template
 struct is_function
 : public __bool_constant::value> { };
@@ -648,6 +654,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
 struct is_function<_Tp&&>
 : public false_type { };
+#endif
 
 #ifdef __cpp_lib_is_null_pointer // C++ >= 11
   /// is_null_pointer (LWG 2247).
@@ -3269,8 +3276,18 @@ template 
   inline constexpr bool is_union_v = __is_union(_Tp);
 template 
   inline constexpr bool is_class_v = __is_class(_Tp);
+
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_function)
 template 
-  inline constexpr bool is_function_v = is_function<_Tp>::value;
+  inline constexpr bool is_function_v = __is_function(_Tp);
+#else
+template 
+  inline constexpr bool is_function_v = !is_const_v;
+template 
+  inline constexpr bool is_function_v<_Tp&> = false;
+template 
+  inline constexpr bool is_function_v<_Tp&&> = false;
+#endif
 
 #if _GLIBCXX_USE_BUILTIN_TRAIT(__is_reference)
 template 
-- 
2.42.0

[PATCH v22 04/31] c++: Implement __is_const built-in trait

2023-10-17 Thread Ken Matsui

This patch implements a built-in trait for std::is_const.
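
`CP_TYPE_CONST_P` looks only at the top-level qualification of the type, which matches std::is_const. A pointer to const is not itself const, and a reference is never const-qualified. A library-only sketch with a hypothetical name makes those cases explicit:

```cpp
#include <type_traits>

// Hypothetical library-only counterpart of __is_const: the partial
// specialization only matches a const qualifier at the top level.
template <typename T> struct is_const_sketch : std::false_type {};
template <typename T> struct is_const_sketch<const T> : std::true_type {};

static_assert(is_const_sketch<const int>::value, "");
static_assert(is_const_sketch<int* const>::value, "");   // const pointer
static_assert(!is_const_sketch<const int*>::value, "");  // pointer to const
static_assert(!is_const_sketch<const int&>::value, "");  // references are never const
static_assert(!is_const_sketch<volatile int>::value, "");
```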

gcc/cp/ChangeLog:

* cp-trait.def: Define __is_const.
* constraint.cc (diagnose_trait_expr): Handle CPTK_IS_CONST.
* semantics.cc (trait_expr_value): Likewise.
(finish_trait_expr): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of __is_const.
* g++.dg/ext/is_const.C: New test.

Signed-off-by: Ken Matsui 
---
 gcc/cp/constraint.cc |  3 +++
 gcc/cp/cp-trait.def  |  1 +
 gcc/cp/semantics.cc  |  4 
 gcc/testsuite/g++.dg/ext/has-builtin-1.C |  3 +++
 gcc/testsuite/g++.dg/ext/is_const.C  | 19 +++
 5 files changed, 30 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/is_const.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 41fe2812ac4..41d9eef7227 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -3724,6 +3724,9 @@ diagnose_trait_expr (tree expr, tree args)
 case CPTK_IS_CLASS:
   inform (loc, "  %qT is not a class", t1);
   break;
+case CPTK_IS_CONST:
+  inform (loc, "  %qT is not a const type", t1);
+  break;
 case CPTK_IS_CONSTRUCTIBLE:
   if (!t2)
 inform (loc, "  %qT is not default constructible", t1);
diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
index 0e48e64b8dd..9e4e6d798a0 100644
--- a/gcc/cp/cp-trait.def
+++ b/gcc/cp/cp-trait.def
@@ -62,6 +62,7 @@ DEFTRAIT_EXPR (IS_AGGREGATE, "__is_aggregate", 1)
 DEFTRAIT_EXPR (IS_ASSIGNABLE, "__is_assignable", 2)
 DEFTRAIT_EXPR (IS_BASE_OF, "__is_base_of", 2)
 DEFTRAIT_EXPR (IS_CLASS, "__is_class", 1)
+DEFTRAIT_EXPR (IS_CONST, "__is_const", 1)
 DEFTRAIT_EXPR (IS_CONSTRUCTIBLE, "__is_constructible", -1)
 DEFTRAIT_EXPR (IS_CONVERTIBLE, "__is_convertible", 2)
 DEFTRAIT_EXPR (IS_EMPTY, "__is_empty", 1)
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 144cb440fa3..7fbcfd7ccad 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -12154,6 +12154,9 @@ trait_expr_value (cp_trait_kind kind, tree type1, tree type2)
 case CPTK_IS_CLASS:
   return NON_UNION_CLASS_TYPE_P (type1);
 
+case CPTK_IS_CONST:
+  return CP_TYPE_CONST_P (type1);
+
 case CPTK_IS_CONSTRUCTIBLE:
   return is_xible (INIT_EXPR, type1, type2);
 
@@ -12371,6 +12374,7 @@ finish_trait_expr (location_t loc, cp_trait_kind kind, tree type1, tree type2)
   break;
 
 case CPTK_IS_CLASS:
+case CPTK_IS_CONST:
 case CPTK_IS_ENUM:
 case CPTK_IS_SAME:
 case CPTK_IS_UNION:
diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
index 2223f08a628..e6e481b13c5 100644
--- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
+++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
@@ -65,6 +65,9 @@
 #if !__has_builtin (__is_class)
 # error "__has_builtin (__is_class) failed"
 #endif
+#if !__has_builtin (__is_const)
+# error "__has_builtin (__is_const) failed"
+#endif
 #if !__has_builtin (__is_constructible)
 # error "__has_builtin (__is_constructible) failed"
 #endif
diff --git a/gcc/testsuite/g++.dg/ext/is_const.C b/gcc/testsuite/g++.dg/ext/is_const.C
new file mode 100644
index 000..8f2d7c2fce9
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/is_const.C
@@ -0,0 +1,19 @@
+// { dg-do compile { target c++11 } }
+
+#include 
+
+using namespace __gnu_test;
+
+#define SA(X) static_assert((X),#X)
+
+// Positive tests.
+SA(__is_const(const int));
+SA(__is_const(const volatile int));
+SA(__is_const(cClassType));
+SA(__is_const(cvClassType));
+
+// Negative tests.
+SA(!__is_const(int));
+SA(!__is_const(volatile int));
+SA(!__is_const(ClassType));
+SA(!__is_const(vClassType));
-- 
2.42.0

Re: [PATCH] gimple-match: Do not try UNCOND optimization with COND_LEN.

2023-10-17 Thread Robin Dapp

>> I don't know much about valueisation either :)  But it does feel
>> like we're working around the lack of a LEN form of COND_EXPR.
>> In other words, it seems odd that we can do:
>>
>>   IFN_COND_LEN_ADD (mask, a, 0, b, len, bias)
>>
>> but we can't do:
>>
>>   IFN_COND_LEN (mask, a, b, len, bias)
>>
>> There seems to be no way of applying a length without also finding an
>> operation to perform.
> 
> Indeed .. maybe - _maybe_ we want to scrap VEC_COND_EXPR for
> IFN_COND{,_LEN} to be more consistent here?

So, yes we could define IFN_COND_LEN (or VCOND_MASK_LEN) but I'd
assume that there would be a whole lot of follow-up things to
consider.

I'm wondering if we really gain something from the round-trip
via VEC_COND_EXPR when we eventually create a COND_(LEN_)_OP anyway?
Sure, if the target doesn't have the particular operation, we would
want a VEC_COND_EXPR.  Same if SEQ is somehow more complicated.

So the IFN_COND(_LEN) =? VCOND_MASK(_LEN) discussion notwithstanding,
couldn't what I naively proposed be helpful as well?  Or do we
potentially lose optimizations during the time where e.g. a
 _foo = a BINOP b
 VEC_COND_EXPR (cond, _foo, else)
has not yet been converted into a
 COND_OP?
We already create COND_OPs for the other paths
(via convert_conditional_op) so why not for this one?  Or am I missing
some interdependence with SEQ?
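
To make the operand roles in `IFN_COND_LEN_ADD (mask, a, 0, b, len, bias)` concrete, here is a scalar C++ model. It assumes that element i is active when i < len + bias and mask[i] is set, and that every inactive element takes the else value. The function names are hypothetical and this is not GCC code. With a zero second operand the conditional add reduces to the length-aware select that has no IFN of its own:

```cpp
#include <array>
#include <cstddef>

// Hypothetical scalar model of a length-masked conditional add.
// Element i is active iff i < len + bias and mask[i]; active elements get
// op1[i] + op2[i], all other elements get else_v[i].
template <typename T, std::size_t N>
std::array<T, N> cond_len_add_model(const std::array<bool, N>& mask,
                                    const std::array<T, N>& op1,
                                    const std::array<T, N>& op2,
                                    const std::array<T, N>& else_v,
                                    long len, long bias) {
  std::array<T, N> out{};
  for (std::size_t i = 0; i < N; ++i) {
    bool active = static_cast<long>(i) < len + bias && mask[i];
    out[i] = active ? op1[i] + op2[i] : else_v[i];
  }
  return out;
}

// With op2 == 0 the add becomes a length-aware select: active ? a : else.
template <typename T, std::size_t N>
std::array<T, N> cond_len_select_model(const std::array<bool, N>& mask,
                                       const std::array<T, N>& a,
                                       const std::array<T, N>& else_v,
                                       long len, long bias) {
  return cond_len_add_model(mask, a, std::array<T, N>{}, else_v, len, bias);
}
```

For example, with mask {1, 0, 1, 1}, a {1, 2, 3, 4}, else {9, 9, 9, 9}, len 3 and bias 0, the select yields {1, 9, 3, 9}. Element 1 is masked off and element 3 lies past the length.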

FWIW I did a full bootstrap and testsuite run on the usual architectures
showing no changes with the attached patch.

Regards
 Robin

Subject: [PATCH] gimple-match: Create COND_OP directly if possible.

This patch converts simplified sequences into conditional operations
instead of VEC_COND_EXPRs if the target supports them.
This helps for len-masked targets which cannot directly use a
VEC_COND_EXPR in the presence of length masking.

gcc/ChangeLog:

* gimple-match-exports.cc (directly_supported_p): Define.
(maybe_resimplify_conditional_op): Create COND_OP directly.
* gimple-match.h (gimple_match_cond::gimple_match_cond):
Initialize length and bias.
---
 gcc/gimple-match-exports.cc | 40 -
 gcc/gimple-match.h  |  7 +--
 2 files changed, 36 insertions(+), 11 deletions(-)

diff --git a/gcc/gimple-match-exports.cc b/gcc/gimple-match-exports.cc
index b36027b0bad..ba3bd1450db 100644
--- a/gcc/gimple-match-exports.cc
+++ b/gcc/gimple-match-exports.cc
@@ -98,6 +98,8 @@ static bool gimple_resimplify5 (gimple_seq *, gimple_match_op *, tree (*)(tree))
 static bool gimple_resimplify6 (gimple_seq *, gimple_match_op *, tree (*)(tree));
 static bool gimple_resimplify7 (gimple_seq *, gimple_match_op *, tree (*)(tree));
 
+bool directly_supported_p (code_helper, tree, optab_subtype);
+
 /* Match and simplify the toplevel valueized operation THIS.
Replaces THIS with a simplified and/or canonicalized result and
returns whether any change was made.  */
@@ -299,22 +301,42 @@ maybe_resimplify_conditional_op (gimple_seq *seq, gimple_match_op *res_op,
}
 }
 
-  /* If the "then" value is a gimple value and the "else" value matters,
- create a VEC_COND_EXPR between them, then see if it can be further
- simplified.  */
+  /* If the condition represents MASK ? THEN : ELSE, where THEN is a gimple
+ value and ELSE matters, create a VEC_COND_EXPR between them, then see
+ if it can be further simplified.
+ For COND_LEN masking, try to create a COND_LEN_OP directly in case
+ SEQ contains a supportable operation. */
   gimple_match_op new_op;
   if (res_op->cond.else_value
   && VECTOR_TYPE_P (res_op->type)
   && gimple_simplified_result_is_gimple_val (res_op))
 {
-  new_op.set_op (VEC_COND_EXPR, res_op->type,
-res_op->cond.cond, res_op->ops[0],
-res_op->cond.else_value);
-  *res_op = new_op;
-  return gimple_resimplify3 (seq, res_op, valueize);
+  /* If a previous simplification was pushed to SEQ
+and we can convert it to a COND_OP directly, do so
+in order to save a round-trip via VEC_COND_EXPR -> COND_OP.  */
+  if (seq && *seq && is_gimple_assign (*seq)
+ && directly_supported_p (gimple_assign_rhs_code (*seq), res_op->type,
+  optab_scalar))
+   {
+ res_op->code = gimple_assign_rhs_code (*seq);
+ res_op->num_ops = gimple_num_ops (*seq) - 1;
+ res_op->ops[0] = gimple_assign_rhs1 (*seq);
+ if (res_op->num_ops > 1)
+   res_op->ops[1] = gimple_assign_rhs2 (*seq);
+ if (res_op->num_ops > 2)
+   res_op->ops[2] = gimple_assign_rhs3 (*seq);
+   }
+  else if (!res_op->cond.len)
+   {
+ new_op.set_op (VEC_COND_EXPR, res_op->type,
+res_op->cond.cond, res_op->ops[0],
+res_op->cond.else_value);
+ *res_op = new_op;
+ return gimple_resimplify3 (seq, res_op, valueize);
+   }
 }
 
-  /* Otherwise try rewriting the

[PATCH v22 13/31] libstdc++: Optimize std::is_bounded_array compilation performance

2023-10-17 Thread Ken Matsui

This patch optimizes the compilation performance of std::is_bounded_array
by dispatching to the new __is_bounded_array built-in trait.
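
std::is_bounded_array_v is true only for arrays with a known bound. The fallback therefore needs just one partial specialization on `T[N]`, and `T[]` and non-array types fall through to the primary template. A C++17 sketch with a hypothetical name:

```cpp
#include <cstddef>

// Hypothetical library-only counterpart of is_bounded_array_v.
template <typename T>
inline constexpr bool is_bounded_array_sketch_v = false;

// Only arrays of known bound match; int[] does not.
template <typename T, std::size_t N>
inline constexpr bool is_bounded_array_sketch_v<T[N]> = true;

static_assert(is_bounded_array_sketch_v<int[3]>);
static_assert(is_bounded_array_sketch_v<int[2][3]>);
static_assert(!is_bounded_array_sketch_v<int[]>);
static_assert(!is_bounded_array_sketch_v<int*>);
```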

libstdc++-v3/ChangeLog:

* include/std/type_traits (is_bounded_array_v): Use __is_bounded_array
built-in trait.

Signed-off-by: Ken Matsui 
---
 libstdc++-v3/include/std/type_traits | 5 +
 1 file changed, 5 insertions(+)

diff --git a/libstdc++-v3/include/std/type_traits b/libstdc++-v3/include/std/type_traits
index cb3d9e238fa..d306073a797 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -3532,11 +3532,16 @@ template
   /// True for a type that is an array of known bound.
   /// @ingroup variable_templates
   /// @since C++20
+# if _GLIBCXX_USE_BUILTIN_TRAIT(__is_bounded_array)
+  template
+inline constexpr bool is_bounded_array_v = __is_bounded_array(_Tp);
+# else
   template
 inline constexpr bool is_bounded_array_v = false;
 
   template
 inline constexpr bool is_bounded_array_v<_Tp[_Size]> = true;
+# endif
 
   /// True for a type that is an array of unknown bound.
   /// @ingroup variable_templates
-- 
2.42.0

[PATCH v22 09/31] libstdc++: Optimize std::is_array compilation performance

2023-10-17 Thread Ken Matsui

This patch optimizes the compilation performance of std::is_array
by dispatching to the new __is_array built-in trait.
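
The fallback is_array_v in the diff has one specialization per array form (`_Tp[]` and `_Tp[_Num]`). A cv-qualified array such as `const int[3]` still matches `T[N]`, because the qualifier belongs to the element type. The sketch below uses a hypothetical name and can be checked against std::is_array:

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical library-only counterpart of __is_array.
template <typename T> struct is_array_sketch : std::false_type {};
template <typename T> struct is_array_sketch<T[]> : std::true_type {};  // unknown bound
template <typename T, std::size_t N>
struct is_array_sketch<T[N]> : std::true_type {};                      // known bound

// cv-qualifiers on an array apply to its elements, so T deduces as const int.
static_assert(is_array_sketch<const int[3]>::value, "");
static_assert(is_array_sketch<int[]>::value, "");
static_assert(!is_array_sketch<int*>::value, "");
static_assert(!is_array_sketch<int(&)[3]>::value, "");
```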

libstdc++-v3/ChangeLog:

* include/std/type_traits (is_array): Use __is_array built-in trait.
(is_array_v): Likewise.

Signed-off-by: Ken Matsui 
---
 libstdc++-v3/include/std/type_traits | 12 
 1 file changed, 12 insertions(+)

diff --git a/libstdc++-v3/include/std/type_traits b/libstdc++-v3/include/std/type_traits
index c01f65df22b..4e8165e5af5 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -523,6 +523,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { };
 
   /// is_array
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_array)
+  template
+struct is_array
+: public __bool_constant<__is_array(_Tp)>
+{ };
+#else
   template
 struct is_array
 : public false_type { };
@@ -534,6 +540,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
 struct is_array<_Tp[]>
 : public true_type { };
+#endif
 
   template
 struct __is_pointer_helper
@@ -3183,12 +3190,17 @@ template 
 template 
   inline constexpr bool is_floating_point_v = is_floating_point<_Tp>::value;
 
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_array)
+template 
+  inline constexpr bool is_array_v = __is_array(_Tp);
+#else
 template 
   inline constexpr bool is_array_v = false;
 template 
   inline constexpr bool is_array_v<_Tp[]> = true;
 template 
   inline constexpr bool is_array_v<_Tp[_Num]> = true;
+#endif
 
 template 
   inline constexpr bool is_pointer_v = is_pointer<_Tp>::value;
-- 
2.42.0

[PATCH v22 18/31] c++: Implement __is_member_function_pointer built-in trait

2023-10-17 Thread Ken Matsui

This patch implements a built-in trait for std::is_member_function_pointer.
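
`TYPE_PTRMEMFUNC_P` answers what a library implementation has to compute by matching `T C::*` and asking whether `T` is a function type. This also covers const- and ref-qualified member functions, since `int(int) const` is itself a function type. A sketch of that matching, with hypothetical names:

```cpp
#include <type_traits>

// Hypothetical library-only counterpart of __is_member_function_pointer.
template <typename T>
struct is_mfp_helper : std::false_type {};

// A pointer to member whose member type is a function type; this matches
// int (C::*)(int), int (C::*)(int) const, variadic members, etc.
template <typename T, typename C>
struct is_mfp_helper<T C::*> : std::is_function<T> {};

// Top-level cv on the pointer itself is ignored.
template <typename T>
struct is_member_function_pointer_sketch
  : is_mfp_helper<typename std::remove_cv<T>::type> {};

struct ClassLike {};  // stands in for the testsuite's ClassType

static_assert(is_member_function_pointer_sketch<int (ClassLike::*)(int) const>::value, "");
static_assert(is_member_function_pointer_sketch<int (ClassLike::*)(float, ...)>::value, "");
static_assert(!is_member_function_pointer_sketch<int ClassLike::*>::value, "");
```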

gcc/cp/ChangeLog:

* cp-trait.def: Define __is_member_function_pointer.
* constraint.cc (diagnose_trait_expr): Handle
CPTK_IS_MEMBER_FUNCTION_POINTER.
* semantics.cc (trait_expr_value): Likewise.
(finish_trait_expr): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of
__is_member_function_pointer.
* g++.dg/ext/is_member_function_pointer.C: New test.

Signed-off-by: Ken Matsui 
---
 gcc/cp/constraint.cc  |  3 ++
 gcc/cp/cp-trait.def   |  1 +
 gcc/cp/semantics.cc   |  4 +++
 gcc/testsuite/g++.dg/ext/has-builtin-1.C  |  3 ++
 .../g++.dg/ext/is_member_function_pointer.C   | 31 +++
 5 files changed, 42 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/is_member_function_pointer.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index a969a069db4..dde83533382 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -3757,6 +3757,9 @@ diagnose_trait_expr (tree expr, tree args)
 case CPTK_IS_LITERAL_TYPE:
   inform (loc, "  %qT is not a literal type", t1);
   break;
+case CPTK_IS_MEMBER_FUNCTION_POINTER:
+  inform (loc, "  %qT is not a member function pointer", t1);
+  break;
 case CPTK_IS_MEMBER_POINTER:
   inform (loc, "  %qT is not a member pointer", t1);
   break;
diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
index 26087da3bdf..897b96630f2 100644
--- a/gcc/cp/cp-trait.def
+++ b/gcc/cp/cp-trait.def
@@ -72,6 +72,7 @@ DEFTRAIT_EXPR (IS_ENUM, "__is_enum", 1)
 DEFTRAIT_EXPR (IS_FINAL, "__is_final", 1)
 DEFTRAIT_EXPR (IS_LAYOUT_COMPATIBLE, "__is_layout_compatible", 2)
 DEFTRAIT_EXPR (IS_LITERAL_TYPE, "__is_literal_type", 1)
+DEFTRAIT_EXPR (IS_MEMBER_FUNCTION_POINTER, "__is_member_function_pointer", 1)
 DEFTRAIT_EXPR (IS_MEMBER_POINTER, "__is_member_pointer", 1)
 DEFTRAIT_EXPR (IS_NOTHROW_ASSIGNABLE, "__is_nothrow_assignable", 2)
 DEFTRAIT_EXPR (IS_NOTHROW_CONSTRUCTIBLE, "__is_nothrow_constructible", -1)
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 5ca05dde75d..59aaa256232 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -12184,6 +12184,9 @@ trait_expr_value (cp_trait_kind kind, tree type1, tree type2)
 case CPTK_IS_LITERAL_TYPE:
   return literal_type_p (type1);
 
+case CPTK_IS_MEMBER_FUNCTION_POINTER:
+  return TYPE_PTRMEMFUNC_P (type1);
+
 case CPTK_IS_MEMBER_POINTER:
   return TYPE_PTRMEM_P (type1);
 
@@ -12396,6 +12399,7 @@ finish_trait_expr (location_t loc, cp_trait_kind kind, tree type1, tree type2)
 case CPTK_IS_CLASS:
 case CPTK_IS_CONST:
 case CPTK_IS_ENUM:
+case CPTK_IS_MEMBER_FUNCTION_POINTER:
 case CPTK_IS_MEMBER_POINTER:
 case CPTK_IS_SAME:
 case CPTK_IS_SCOPED_ENUM:
diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
index 994873f14e9..0dfe957474b 100644
--- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
+++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
@@ -95,6 +95,9 @@
 #if !__has_builtin (__is_literal_type)
 # error "__has_builtin (__is_literal_type) failed"
 #endif
+#if !__has_builtin (__is_member_function_pointer)
+# error "__has_builtin (__is_member_function_pointer) failed"
+#endif
 #if !__has_builtin (__is_member_pointer)
 # error "__has_builtin (__is_member_pointer) failed"
 #endif
diff --git a/gcc/testsuite/g++.dg/ext/is_member_function_pointer.C b/gcc/testsuite/g++.dg/ext/is_member_function_pointer.C
new file mode 100644
index 000..555123e8f07
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/is_member_function_pointer.C
@@ -0,0 +1,31 @@
+// { dg-do compile { target c++11 } }
+
+#include 
+
+using namespace __gnu_test;
+
+#define SA(X) static_assert((X),#X)
+
+#define SA_TEST_FN(TRAIT, TYPE, EXPECT)\
+  SA(TRAIT(TYPE) == EXPECT);   \
+  SA(TRAIT(const TYPE) == EXPECT);
+
+#define SA_TEST_CATEGORY(TRAIT, TYPE, EXPECT)  \
+  SA(TRAIT(TYPE) == EXPECT);   \
+  SA(TRAIT(const TYPE) == EXPECT); \
+  SA(TRAIT(volatile TYPE) == EXPECT);  \
+  SA(TRAIT(const volatile TYPE) == EXPECT)
+
+// Positive tests.
+SA_TEST_FN(__is_member_function_pointer, int (ClassType::*) (int), true);
+SA_TEST_FN(__is_member_function_pointer, int (ClassType::*) (int) const, true);
+SA_TEST_FN(__is_member_function_pointer, int (ClassType::*) (float, ...), true);
+SA_TEST_FN(__is_member_function_pointer, ClassType (ClassType::*) (ClassType), true);
+SA_TEST_FN(__is_member_function_pointer, float (ClassType::*) (int, float, int[], int&), true);
+
+// Negative tests.
+SA_TEST_CATEGORY(__is_member_function_pointer, int (ClassType::*), false);
+SA_TEST_CATEGORY(__is_member_function_pointer, ClassType (ClassType::*), false);
+
+// Sanity check.

[PATCH v21 18/30] libstdc++: Optimize std::is_member_function_pointer compilation performance

2023-10-17 Thread Ken Matsui

This patch optimizes the compilation performance of
std::is_member_function_pointer by dispatching to the new
__is_member_function_pointer built-in trait.
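
For the `_v` variable template the patch evaluates the built-in directly instead of going through `is_member_function_pointer<_Tp>::value`, so no class template is instantiated at all. The sketch below shows the same split outside libstdc++. The macro and variable names are hypothetical. `__has_builtin` is used when the compiler provides it; otherwise the fallback applies:

```cpp
#include <type_traits>

#ifdef __has_builtin
#  define SKETCH_HAS_BUILTIN(x) __has_builtin(x)
#else
#  define SKETCH_HAS_BUILTIN(x) 0
#endif

#if SKETCH_HAS_BUILTIN(__is_member_function_pointer)
// Built-in available: a single front-end query per T.
template <typename T>
inline constexpr bool is_mfp_sketch_v = __is_member_function_pointer(T);
#else
// Fallback: defer to the class template (more instantiations per T).
template <typename T>
inline constexpr bool is_mfp_sketch_v = std::is_member_function_pointer<T>::value;
#endif

struct Widget {};

static_assert(is_mfp_sketch_v<void (Widget::*)()>);
static_assert(!is_mfp_sketch_v<int Widget::*>);
```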

libstdc++-v3/ChangeLog:

* include/std/type_traits (is_member_function_pointer): Use
__is_member_function_pointer built-in trait.
(is_member_function_pointer_v): Likewise.

Signed-off-by: Ken Matsui 
---
 libstdc++-v3/include/std/type_traits | 16 
 1 file changed, 16 insertions(+)

diff --git a/libstdc++-v3/include/std/type_traits b/libstdc++-v3/include/std/type_traits
index d7f89cf7c06..e1b10240dc2 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -588,6 +588,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 : public __is_member_object_pointer_helper<__remove_cv_t<_Tp>>::type
 { };
 
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_member_function_pointer)
+  /// is_member_function_pointer
+  template
+struct is_member_function_pointer
+: public __bool_constant<__is_member_function_pointer(_Tp)>
+{ };
+#else
   template
 struct __is_member_function_pointer_helper
 : public false_type { };
@@ -601,6 +608,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 struct is_member_function_pointer
 : public __is_member_function_pointer_helper<__remove_cv_t<_Tp>>::type
 { };
+#endif
 
   /// is_enum
   template
@@ -3222,9 +3230,17 @@ template 
 template 
   inline constexpr bool is_member_object_pointer_v =
 is_member_object_pointer<_Tp>::value;
+
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_member_function_pointer)
+template 
+  inline constexpr bool is_member_function_pointer_v =
+__is_member_function_pointer(_Tp);
+#else
 template 
   inline constexpr bool is_member_function_pointer_v =
 is_member_function_pointer<_Tp>::value;
+#endif
+
 template 
   inline constexpr bool is_enum_v = __is_enum(_Tp);
 template 
-- 
2.42.0

[PATCH v22 26/31] c++: Implement __is_object built-in trait

2023-10-17 Ken Matsui

This patch implements a built-in trait for std::is_object.
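In `trait_expr_value` below, `__is_object` is computed as "not a function, reference, or void type". A minimal sketch of the same predicate expressed with standard traits and cross-checked against `std::is_object`; `is_object_sketch` is a hypothetical name and not part of the patch.

```cpp
#include <type_traits>

// Object types are all types except function, reference and (cv) void
// types, mirroring the three type-code checks in trait_expr_value.
template <typename T>
struct is_object_sketch
  : std::integral_constant<bool, !(std::is_function<T>::value
                                   || std::is_reference<T>::value
                                   || std::is_void<T>::value)>
{ };

struct ClassLike { };

static_assert(is_object_sketch<int>::value, "scalars are objects");
static_assert(is_object_sketch<const ClassLike>::value, "cv class types are objects");
static_assert(is_object_sketch<int[3]>::value, "arrays are objects");
static_assert(is_object_sketch<int (*)(int)>::value, "function pointers are objects");
static_assert(!is_object_sketch<int (int)>::value, "function types are not");
static_assert(!is_object_sketch<int&>::value, "references are not");
static_assert(!is_object_sketch<const void>::value, "cv void is not");
```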

gcc/cp/ChangeLog:

* cp-trait.def: Define __is_object.
* constraint.cc (diagnose_trait_expr): Handle CPTK_IS_OBJECT.
* semantics.cc (trait_expr_value): Likewise.
(finish_trait_expr): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of __is_object.
* g++.dg/ext/is_object.C: New test.

Signed-off-by: Ken Matsui 
---
 gcc/cp/constraint.cc |  3 +++
 gcc/cp/cp-trait.def  |  1 +
 gcc/cp/semantics.cc  |  6 +
 gcc/testsuite/g++.dg/ext/has-builtin-1.C |  3 +++
 gcc/testsuite/g++.dg/ext/is_object.C | 29 
 5 files changed, 42 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/is_object.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index c394657d6b9..444dbaacd78 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -3781,6 +3781,9 @@ diagnose_trait_expr (tree expr, tree args)
 case CPTK_IS_NOTHROW_CONVERTIBLE:
  inform (loc, "  %qT is not nothrow convertible from %qE", t2, t1);
   break;
+case CPTK_IS_OBJECT:
+  inform (loc, "  %qT is not an object type", t1);
+  break;
 case CPTK_IS_POINTER_INTERCONVERTIBLE_BASE_OF:
   inform (loc, "  %qT is not pointer-interconvertible base of %qT",
  t1, t2);
diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
index fa79bc0c68c..191a86307fc 100644
--- a/gcc/cp/cp-trait.def
+++ b/gcc/cp/cp-trait.def
@@ -79,6 +79,7 @@ DEFTRAIT_EXPR (IS_MEMBER_POINTER, "__is_member_pointer", 1)
 DEFTRAIT_EXPR (IS_NOTHROW_ASSIGNABLE, "__is_nothrow_assignable", 2)
 DEFTRAIT_EXPR (IS_NOTHROW_CONSTRUCTIBLE, "__is_nothrow_constructible", -1)
 DEFTRAIT_EXPR (IS_NOTHROW_CONVERTIBLE, "__is_nothrow_convertible", 2)
+DEFTRAIT_EXPR (IS_OBJECT, "__is_object", 1)
 DEFTRAIT_EXPR (IS_POINTER_INTERCONVERTIBLE_BASE_OF, "__is_pointer_interconvertible_base_of", 2)
 DEFTRAIT_EXPR (IS_POD, "__is_pod", 1)
 DEFTRAIT_EXPR (IS_POLYMORPHIC, "__is_polymorphic", 1)
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 8118d3104c7..e3f71ff5902 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -12205,6 +12205,11 @@ trait_expr_value (cp_trait_kind kind, tree type1, tree type2)
 case CPTK_IS_NOTHROW_CONVERTIBLE:
   return is_nothrow_convertible (type1, type2);
 
+case CPTK_IS_OBJECT:
+  return (type_code1 != FUNCTION_TYPE
+ && type_code1 != REFERENCE_TYPE
+ && type_code1 != VOID_TYPE);
+
 case CPTK_IS_POINTER_INTERCONVERTIBLE_BASE_OF:
   return pointer_interconvertible_base_of_p (type1, type2);
 
@@ -12412,6 +12417,7 @@ finish_trait_expr (location_t loc, cp_trait_kind kind, tree type1, tree type2)
 case CPTK_IS_MEMBER_FUNCTION_POINTER:
 case CPTK_IS_MEMBER_OBJECT_POINTER:
 case CPTK_IS_MEMBER_POINTER:
+case CPTK_IS_OBJECT:
 case CPTK_IS_REFERENCE:
 case CPTK_IS_SAME:
 case CPTK_IS_SCOPED_ENUM:
diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
index 4d3947572a4..163be1d710b 100644
--- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
+++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
@@ -116,6 +116,9 @@
 #if !__has_builtin (__is_nothrow_convertible)
 # error "__has_builtin (__is_nothrow_convertible) failed"
 #endif
+#if !__has_builtin (__is_object)
+# error "__has_builtin (__is_object) failed"
+#endif
 #if !__has_builtin (__is_pointer_interconvertible_base_of)
 # error "__has_builtin (__is_pointer_interconvertible_base_of) failed"
 #endif
diff --git a/gcc/testsuite/g++.dg/ext/is_object.C b/gcc/testsuite/g++.dg/ext/is_object.C
new file mode 100644
index 000..5c759a5ef69
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/is_object.C
@@ -0,0 +1,29 @@
+// { dg-do compile { target c++11 } }
+
+#include 
+
+using namespace __gnu_test;
+
+#define SA(X) static_assert((X),#X)
+
+#define SA_TEST_NON_VOLATILE(TRAIT, TYPE, EXPECT)  \
+  SA(TRAIT(TYPE) == EXPECT);   \
+  SA(TRAIT(const TYPE) == EXPECT)
+
+#define SA_TEST_CATEGORY(TRAIT, TYPE, EXPECT)  \
+  SA(TRAIT(TYPE) == EXPECT);   \
+  SA(TRAIT(const TYPE) == EXPECT); \
+  SA(TRAIT(volatile TYPE) == EXPECT);  \
+  SA(TRAIT(const volatile TYPE) == EXPECT)
+
+SA_TEST_NON_VOLATILE(__is_object, int (int), false);
+SA_TEST_NON_VOLATILE(__is_object, ClassType (ClassType), false);
+SA_TEST_NON_VOLATILE(__is_object,
+float (int, float, int[], int&), false);
+SA_TEST_CATEGORY(__is_object, int&, false);
+SA_TEST_CATEGORY(__is_object, ClassType&, false);
+SA_TEST_NON_VOLATILE(__is_object, int(&)(int), false);
+SA_TEST_CATEGORY(__is_object, void, false);
+
+// Sanity check.
+SA_TEST_CATEGORY(__is_object, ClassType, true);
-- 
2.42.0

[PATCH v22 30/31] c++: Implement __is_pointer built-in trait

2023-10-17 Ken Matsui

This patch implements a built-in trait for std::is_pointer.
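For comparison with the single `TYPE_PTR_P` check, here is a sketch of the classic class-template formulation a library can use without the built-in: strip top-level cv-qualifiers, then match `T*` by partial specialization. `is_pointer_impl` and `is_pointer_sketch` are hypothetical names. Pointers to members do not match `T*`, consistent with the patch testing `TYPE_PTR_P` rather than `TYPE_PTRMEM_P`.

```cpp
#include <cstddef>
#include <type_traits>

// Primary template: not a pointer.
template <typename T>
struct is_pointer_impl : std::false_type { };

// Partial specialization: T* for any T, including cv-qualified pointees.
template <typename T>
struct is_pointer_impl<T*> : std::true_type { };

// Top-level cv on the pointer itself (e.g. int* const) is stripped first.
template <typename T>
struct is_pointer_sketch : is_pointer_impl<typename std::remove_cv<T>::type> { };

struct C { int m; };

static_assert(is_pointer_sketch<int* const volatile>::value, "cv-qualified pointer");
static_assert(is_pointer_sketch<int (*)(int)>::value, "function pointer");
static_assert(!is_pointer_sketch<int&>::value, "reference");
static_assert(!is_pointer_sketch<int[3]>::value, "array");
static_assert(!is_pointer_sketch<int C::*>::value, "pointer to member");
static_assert(!is_pointer_sketch<std::nullptr_t>::value, "nullptr_t");
```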

gcc/cp/ChangeLog:

* cp-trait.def: Define __is_pointer.
* constraint.cc (diagnose_trait_expr): Handle CPTK_IS_POINTER.
* semantics.cc (trait_expr_value): Likewise.
(finish_trait_expr): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of __is_pointer.
* g++.dg/ext/is_pointer.C: New test.

Signed-off-by: Ken Matsui 
---
 gcc/cp/constraint.cc |  3 ++
 gcc/cp/cp-trait.def  |  1 +
 gcc/cp/semantics.cc  |  4 ++
 gcc/testsuite/g++.dg/ext/has-builtin-1.C |  3 ++
 gcc/testsuite/g++.dg/ext/is_pointer.C| 51 
 5 files changed, 62 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/is_pointer.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 444dbaacd78..9fce36e12d1 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -3791,6 +3791,9 @@ diagnose_trait_expr (tree expr, tree args)
 case CPTK_IS_POD:
   inform (loc, "  %qT is not a POD type", t1);
   break;
+case CPTK_IS_POINTER:
+  inform (loc, "  %qT is not a pointer", t1);
+  break;
 case CPTK_IS_POLYMORPHIC:
   inform (loc, "  %qT is not a polymorphic type", t1);
   break;
diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
index 1f405f61861..05514a51c21 100644
--- a/gcc/cp/cp-trait.def
+++ b/gcc/cp/cp-trait.def
@@ -82,6 +82,7 @@ DEFTRAIT_EXPR (IS_NOTHROW_CONVERTIBLE, "__is_nothrow_convertible", 2)
 DEFTRAIT_EXPR (IS_OBJECT, "__is_object", 1)
 DEFTRAIT_EXPR (IS_POINTER_INTERCONVERTIBLE_BASE_OF, "__is_pointer_interconvertible_base_of", 2)
 DEFTRAIT_EXPR (IS_POD, "__is_pod", 1)
+DEFTRAIT_EXPR (IS_POINTER, "__is_pointer", 1)
 DEFTRAIT_EXPR (IS_POLYMORPHIC, "__is_polymorphic", 1)
 DEFTRAIT_EXPR (IS_REFERENCE, "__is_reference", 1)
 DEFTRAIT_EXPR (IS_SAME, "__is_same", 2)
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 45584e9045f..7cccbae5287 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -12216,6 +12216,9 @@ trait_expr_value (cp_trait_kind kind, tree type1, tree type2)
 case CPTK_IS_POD:
   return pod_type_p (type1);
 
+case CPTK_IS_POINTER:
+  return TYPE_PTR_P (type1);
+
 case CPTK_IS_POLYMORPHIC:
   return CLASS_TYPE_P (type1) && TYPE_POLYMORPHIC_P (type1);
 
@@ -12418,6 +12421,7 @@ finish_trait_expr (location_t loc, cp_trait_kind kind, tree type1, tree type2)
 case CPTK_IS_MEMBER_OBJECT_POINTER:
 case CPTK_IS_MEMBER_POINTER:
 case CPTK_IS_OBJECT:
+case CPTK_IS_POINTER:
 case CPTK_IS_REFERENCE:
 case CPTK_IS_SAME:
 case CPTK_IS_SCOPED_ENUM:
diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
index 719902d3f1a..b1430e9bd8b 100644
--- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
+++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
@@ -125,6 +125,9 @@
 #if !__has_builtin (__is_pod)
 # error "__has_builtin (__is_pod) failed"
 #endif
+#if !__has_builtin (__is_pointer)
+# error "__has_builtin (__is_pointer) failed"
+#endif
 #if !__has_builtin (__is_polymorphic)
 # error "__has_builtin (__is_polymorphic) failed"
 #endif
diff --git a/gcc/testsuite/g++.dg/ext/is_pointer.C b/gcc/testsuite/g++.dg/ext/is_pointer.C
new file mode 100644
index 000..d6e39565950
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/is_pointer.C
@@ -0,0 +1,51 @@
+// { dg-do compile { target c++11 } }
+
+#define SA(X) static_assert((X),#X)
+
+SA(!__is_pointer(int));
+SA(__is_pointer(int*));
+SA(__is_pointer(int**));
+
+SA(__is_pointer(const int*));
+SA(__is_pointer(const int**));
+SA(__is_pointer(int* const));
+SA(__is_pointer(int** const));
+SA(__is_pointer(int* const* const));
+
+SA(__is_pointer(volatile int*));
+SA(__is_pointer(volatile int**));
+SA(__is_pointer(int* volatile));
+SA(__is_pointer(int** volatile));
+SA(__is_pointer(int* volatile* volatile));
+
+SA(__is_pointer(const volatile int*));
+SA(__is_pointer(const volatile int**));
+SA(__is_pointer(const int* volatile));
+SA(__is_pointer(volatile int* const));
+SA(__is_pointer(int* const volatile));
+SA(__is_pointer(const int** volatile));
+SA(__is_pointer(volatile int** const));
+SA(__is_pointer(int** const volatile));
+SA(__is_pointer(int* const* const volatile));
+SA(__is_pointer(int* volatile* const volatile));
+SA(__is_pointer(int* const volatile* const volatile));
+
+SA(!__is_pointer(int&));
+SA(!__is_pointer(const int&));
+SA(!__is_pointer(volatile int&));
+SA(!__is_pointer(const volatile int&));
+
+SA(!__is_pointer(int&&));
+SA(!__is_pointer(const int&&));
+SA(!__is_pointer(volatile int&&));
+SA(!__is_pointer(const volatile int&&));
+
+SA(!__is_pointer(int[3]));
+SA(!__is_pointer(const int[3]));
+SA(!__is_pointer(volatile int[3]));
+SA(!__is_pointer(const volatile int[3]));
+
+SA(!__is_pointer(int(int)));
+SA(__is_pointer(int(*const)(int)));
+SA(__is_pointer(int(*volatile)(int)));
+SA(__is_pointer(int(*const

[PATCH v21 23/30] c++: Implement __is_function built-in trait

2023-10-17 Ken Matsui

This patch implements a built-in trait for std::is_function.
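To illustrate what the built-in replaces, here is one well-known library-only way to detect function types (named `is_function_sketch` here, a hypothetical name). It relies on the rule that cv-qualifiers added on top of a function type are ignored, so `const T` stays non-const exactly when `T` is a function type. References behave the same way, so they are excluded by separate specializations. The built-in instead performs a single `FUNCTION_TYPE` check.

```cpp
#include <type_traits>

// const added on top of a function type (or a reference) is ignored,
// so is_const<const T> is false only for those kinds of types.
template <typename T>
struct is_function_sketch
  : std::integral_constant<bool, !std::is_const<const T>::value> { };

// References are also unaffected by const, so rule them out explicitly.
template <typename T>
struct is_function_sketch<T&> : std::false_type { };

template <typename T>
struct is_function_sketch<T&&> : std::false_type { };

struct Callable { void operator()(); };

static_assert(is_function_sketch<int (int)>::value, "plain function type");
static_assert(is_function_sketch<int (int, ...)>::value, "variadic function type");
static_assert(is_function_sketch<void () const>::value, "cv-qualified function type");
static_assert(!is_function_sketch<int (*)(int)>::value, "pointer to function");
static_assert(!is_function_sketch<int (&)(int)>::value, "reference to function");
static_assert(!is_function_sketch<const void>::value, "void");
static_assert(!is_function_sketch<Callable>::value, "class with operator()");
```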

gcc/cp/ChangeLog:

* cp-trait.def: Define __is_function.
* constraint.cc (diagnose_trait_expr): Handle CPTK_IS_FUNCTION.
* semantics.cc (trait_expr_value): Likewise.
(finish_trait_expr): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of __is_function.
* g++.dg/ext/is_function.C: New test.

Signed-off-by: Ken Matsui 
---
 gcc/cp/constraint.cc |  3 ++
 gcc/cp/cp-trait.def  |  1 +
 gcc/cp/semantics.cc  |  4 ++
 gcc/testsuite/g++.dg/ext/has-builtin-1.C |  3 ++
 gcc/testsuite/g++.dg/ext/is_function.C   | 58 
 5 files changed, 69 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/is_function.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index e05d4fa4d20..c394657d6b9 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -3751,6 +3751,9 @@ diagnose_trait_expr (tree expr, tree args)
 case CPTK_IS_FINAL:
   inform (loc, "  %qT is not a final class", t1);
   break;
+case CPTK_IS_FUNCTION:
+  inform (loc, "  %qT is not a function", t1);
+  break;
 case CPTK_IS_LAYOUT_COMPATIBLE:
   inform (loc, "  %qT is not layout compatible with %qT", t1, t2);
   break;
diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
index e867d9c4c47..fa79bc0c68c 100644
--- a/gcc/cp/cp-trait.def
+++ b/gcc/cp/cp-trait.def
@@ -70,6 +70,7 @@ DEFTRAIT_EXPR (IS_CONVERTIBLE, "__is_convertible", 2)
 DEFTRAIT_EXPR (IS_EMPTY, "__is_empty", 1)
 DEFTRAIT_EXPR (IS_ENUM, "__is_enum", 1)
 DEFTRAIT_EXPR (IS_FINAL, "__is_final", 1)
+DEFTRAIT_EXPR (IS_FUNCTION, "__is_function", 1)
 DEFTRAIT_EXPR (IS_LAYOUT_COMPATIBLE, "__is_layout_compatible", 2)
 DEFTRAIT_EXPR (IS_LITERAL_TYPE, "__is_literal_type", 1)
 DEFTRAIT_EXPR (IS_MEMBER_FUNCTION_POINTER, "__is_member_function_pointer", 1)
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index cd17cd176cb..8118d3104c7 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -12178,6 +12178,9 @@ trait_expr_value (cp_trait_kind kind, tree type1, tree type2)
 case CPTK_IS_FINAL:
   return CLASS_TYPE_P (type1) && CLASSTYPE_FINAL (type1);
 
+case CPTK_IS_FUNCTION:
+  return type_code1 == FUNCTION_TYPE;
+
 case CPTK_IS_LAYOUT_COMPATIBLE:
   return layout_compatible_type_p (type1, type2);
 
@@ -12405,6 +12408,7 @@ finish_trait_expr (location_t loc, cp_trait_kind kind, tree type1, tree type2)
 case CPTK_IS_CLASS:
 case CPTK_IS_CONST:
 case CPTK_IS_ENUM:
+case CPTK_IS_FUNCTION:
 case CPTK_IS_MEMBER_FUNCTION_POINTER:
 case CPTK_IS_MEMBER_OBJECT_POINTER:
 case CPTK_IS_MEMBER_POINTER:
diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
index e112d317657..4d3947572a4 100644
--- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
+++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
@@ -89,6 +89,9 @@
 #if !__has_builtin (__is_final)
 # error "__has_builtin (__is_final) failed"
 #endif
+#if !__has_builtin (__is_function)
+# error "__has_builtin (__is_function) failed"
+#endif
 #if !__has_builtin (__is_layout_compatible)
 # error "__has_builtin (__is_layout_compatible) failed"
 #endif
diff --git a/gcc/testsuite/g++.dg/ext/is_function.C b/gcc/testsuite/g++.dg/ext/is_function.C
new file mode 100644
index 000..2e1594b12ad
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/is_function.C
@@ -0,0 +1,58 @@
+// { dg-do compile { target c++11 } }
+
+#include 
+
+using namespace __gnu_test;
+
+#define SA(X) static_assert((X),#X)
+#define SA_TEST_CATEGORY(TRAIT, TYPE, EXPECT)  \
+  SA(TRAIT(TYPE) == EXPECT);   \
+  SA(TRAIT(const TYPE) == EXPECT); \
+  SA(TRAIT(volatile TYPE) == EXPECT);  \
+  SA(TRAIT(const volatile TYPE) == EXPECT)
+
+struct A
+{ void fn(); };
+
+template
+struct AHolder { };
+
+template
+struct AHolder
+{ using type = U; };
+
+// Positive tests.
+SA(__is_function(int (int)));
+SA(__is_function(ClassType (ClassType)));
+SA(__is_function(float (int, float, int[], int&)));
+SA(__is_function(int (int, ...)));
+SA(__is_function(bool (ClassType) const));
+SA(__is_function(AHolder::type));
+
+void fn();
+SA(__is_function(decltype(fn)));
+
+// Negative tests.
+SA_TEST_CATEGORY(__is_function, int, false);
+SA_TEST_CATEGORY(__is_function, int*, false);
+SA_TEST_CATEGORY(__is_function, int&, false);
+SA_TEST_CATEGORY(__is_function, void, false);
+SA_TEST_CATEGORY(__is_function, void*, false);
+SA_TEST_CATEGORY(__is_function, void**, false);
+SA_TEST_CATEGORY(__is_function, std::nullptr_t, false);
+
+SA_TEST_CATEGORY(__is_function, AbstractClass, false);
+SA(!__is_function(int(&)(int)));
+SA(!__is_function(int(*)(int)));
+
+SA_TEST_CATEGORY(__is_function, A, false);
+SA_TEST_CATEGORY(__is_function, decltype(::fn), false);
+
+struct FnCallOverload
+{ void operator()(); };

[PATCH v22 16/31] c++: Implement __is_member_pointer built-in trait

2023-10-17 Ken Matsui

This patch implements a built-in trait for std::is_member_pointer.
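`TYPE_PTRMEM_P` is true for both pointers to data members and pointers to member functions. A minimal library-level sketch of the same classification, using a single `T C::*` partial specialization that matches both kinds; `is_member_pointer_impl` and `is_member_pointer_sketch` are hypothetical names.

```cpp
#include <type_traits>

template <typename T>
struct is_member_pointer_impl : std::false_type { };

// T C::* matches pointers to data members (T is an object type) and
// pointers to member functions (T is a function type) alike.
template <typename T, typename C>
struct is_member_pointer_impl<T C::*> : std::true_type { };

// Strip top-level cv so that e.g. int S::* const is still recognized.
template <typename T>
struct is_member_pointer_sketch
  : is_member_pointer_impl<typename std::remove_cv<T>::type> { };

struct S { int data; int fn(float) const; };

static_assert(is_member_pointer_sketch<int S::*>::value, "data member pointer");
static_assert(is_member_pointer_sketch<int (S::*)(float) const>::value, "member function pointer");
static_assert(is_member_pointer_sketch<int S::* const volatile>::value, "cv member pointer");
static_assert(!is_member_pointer_sketch<int*>::value, "ordinary pointer");
static_assert(!is_member_pointer_sketch<S>::value, "class type");
```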

gcc/cp/ChangeLog:

* cp-trait.def: Define __is_member_pointer.
* constraint.cc (diagnose_trait_expr): Handle CPTK_IS_MEMBER_POINTER.
* semantics.cc (trait_expr_value): Likewise.
(finish_trait_expr): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of __is_member_pointer.
* g++.dg/ext/is_member_pointer.C: New test.

Signed-off-by: Ken Matsui 
---
 gcc/cp/constraint.cc |  3 ++
 gcc/cp/cp-trait.def  |  1 +
 gcc/cp/semantics.cc  |  4 +++
 gcc/testsuite/g++.dg/ext/has-builtin-1.C |  3 ++
 gcc/testsuite/g++.dg/ext/is_member_pointer.C | 30 
 5 files changed, 41 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/is_member_pointer.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 2b46e3afa97..a969a069db4 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -3757,6 +3757,9 @@ diagnose_trait_expr (tree expr, tree args)
 case CPTK_IS_LITERAL_TYPE:
   inform (loc, "  %qT is not a literal type", t1);
   break;
+case CPTK_IS_MEMBER_POINTER:
+  inform (loc, "  %qT is not a member pointer", t1);
+  break;
 case CPTK_IS_NOTHROW_ASSIGNABLE:
   inform (loc, "  %qT is not nothrow assignable from %qT", t1, t2);
   break;
diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
index e0e3fe1d23f..26087da3bdf 100644
--- a/gcc/cp/cp-trait.def
+++ b/gcc/cp/cp-trait.def
@@ -72,6 +72,7 @@ DEFTRAIT_EXPR (IS_ENUM, "__is_enum", 1)
 DEFTRAIT_EXPR (IS_FINAL, "__is_final", 1)
 DEFTRAIT_EXPR (IS_LAYOUT_COMPATIBLE, "__is_layout_compatible", 2)
 DEFTRAIT_EXPR (IS_LITERAL_TYPE, "__is_literal_type", 1)
+DEFTRAIT_EXPR (IS_MEMBER_POINTER, "__is_member_pointer", 1)
 DEFTRAIT_EXPR (IS_NOTHROW_ASSIGNABLE, "__is_nothrow_assignable", 2)
 DEFTRAIT_EXPR (IS_NOTHROW_CONSTRUCTIBLE, "__is_nothrow_constructible", -1)
 DEFTRAIT_EXPR (IS_NOTHROW_CONVERTIBLE, "__is_nothrow_convertible", 2)
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 9f0e468f489..5ca05dde75d 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -12184,6 +12184,9 @@ trait_expr_value (cp_trait_kind kind, tree type1, tree type2)
 case CPTK_IS_LITERAL_TYPE:
   return literal_type_p (type1);
 
+case CPTK_IS_MEMBER_POINTER:
+  return TYPE_PTRMEM_P (type1);
+
 case CPTK_IS_NOTHROW_ASSIGNABLE:
   return is_nothrow_xible (MODIFY_EXPR, type1, type2);
 
@@ -12393,6 +12396,7 @@ finish_trait_expr (location_t loc, cp_trait_kind kind, tree type1, tree type2)
 case CPTK_IS_CLASS:
 case CPTK_IS_CONST:
 case CPTK_IS_ENUM:
+case CPTK_IS_MEMBER_POINTER:
 case CPTK_IS_SAME:
 case CPTK_IS_SCOPED_ENUM:
 case CPTK_IS_UNBOUNDED_ARRAY:
diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
index ba97beea3c3..994873f14e9 100644
--- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
+++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
@@ -95,6 +95,9 @@
 #if !__has_builtin (__is_literal_type)
 # error "__has_builtin (__is_literal_type) failed"
 #endif
+#if !__has_builtin (__is_member_pointer)
+# error "__has_builtin (__is_member_pointer) failed"
+#endif
 #if !__has_builtin (__is_nothrow_assignable)
 # error "__has_builtin (__is_nothrow_assignable) failed"
 #endif
diff --git a/gcc/testsuite/g++.dg/ext/is_member_pointer.C b/gcc/testsuite/g++.dg/ext/is_member_pointer.C
new file mode 100644
index 000..7ee2e3ab90c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/is_member_pointer.C
@@ -0,0 +1,30 @@
+// { dg-do compile { target c++11 } }
+
+#include 
+
+using namespace __gnu_test;
+
+#define SA(X) static_assert((X),#X)
+
+#define SA_TEST_NON_VOLATILE(TRAIT, TYPE, EXPECT)  \
+  SA(TRAIT(TYPE) == EXPECT);   \
+  SA(TRAIT(const TYPE) == EXPECT)
+
+#define SA_TEST_CATEGORY(TRAIT, TYPE, EXPECT)  \
+  SA(TRAIT(TYPE) == EXPECT);   \
+  SA(TRAIT(const TYPE) == EXPECT); \
+  SA(TRAIT(volatile TYPE) == EXPECT);  \
+  SA(TRAIT(const volatile TYPE) == EXPECT)
+
+SA_TEST_CATEGORY(__is_member_pointer, int (ClassType::*), true);
+SA_TEST_CATEGORY(__is_member_pointer, ClassType (ClassType::*), true);
+
+SA_TEST_NON_VOLATILE(__is_member_pointer, int (ClassType::*)(int), true);
+SA_TEST_NON_VOLATILE(__is_member_pointer, int (ClassType::*)(int) const, true);
+SA_TEST_NON_VOLATILE(__is_member_pointer, int (ClassType::*)(float, ...), true);
+SA_TEST_NON_VOLATILE(__is_member_pointer, ClassType (ClassType::*)(ClassType), true);
+SA_TEST_NON_VOLATILE(__is_member_pointer,
+float (ClassType::*)(int, float, int[], int&), true);
+
+// Sanity check.
+SA_TEST_CATEGORY(__is_member_pointer, ClassType, false);
-- 
2.42.0

[PATCH v22 24/31] c++: Implement __is_function built-in trait

2023-10-17 Ken Matsui

This patch implements a built-in trait for std::is_function.

gcc/cp/ChangeLog:

* cp-trait.def: Define __is_function.
* constraint.cc (diagnose_trait_expr): Handle CPTK_IS_FUNCTION.
* semantics.cc (trait_expr_value): Likewise.
(finish_trait_expr): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of __is_function.
* g++.dg/ext/is_function.C: New test.

Signed-off-by: Ken Matsui 
---
 gcc/cp/constraint.cc |  3 ++
 gcc/cp/cp-trait.def  |  1 +
 gcc/cp/semantics.cc  |  4 ++
 gcc/testsuite/g++.dg/ext/has-builtin-1.C |  3 ++
 gcc/testsuite/g++.dg/ext/is_function.C   | 58 
 5 files changed, 69 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/is_function.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index e05d4fa4d20..c394657d6b9 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -3751,6 +3751,9 @@ diagnose_trait_expr (tree expr, tree args)
 case CPTK_IS_FINAL:
   inform (loc, "  %qT is not a final class", t1);
   break;
+case CPTK_IS_FUNCTION:
+  inform (loc, "  %qT is not a function", t1);
+  break;
 case CPTK_IS_LAYOUT_COMPATIBLE:
   inform (loc, "  %qT is not layout compatible with %qT", t1, t2);
   break;
diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
index e867d9c4c47..fa79bc0c68c 100644
--- a/gcc/cp/cp-trait.def
+++ b/gcc/cp/cp-trait.def
@@ -70,6 +70,7 @@ DEFTRAIT_EXPR (IS_CONVERTIBLE, "__is_convertible", 2)
 DEFTRAIT_EXPR (IS_EMPTY, "__is_empty", 1)
 DEFTRAIT_EXPR (IS_ENUM, "__is_enum", 1)
 DEFTRAIT_EXPR (IS_FINAL, "__is_final", 1)
+DEFTRAIT_EXPR (IS_FUNCTION, "__is_function", 1)
 DEFTRAIT_EXPR (IS_LAYOUT_COMPATIBLE, "__is_layout_compatible", 2)
 DEFTRAIT_EXPR (IS_LITERAL_TYPE, "__is_literal_type", 1)
 DEFTRAIT_EXPR (IS_MEMBER_FUNCTION_POINTER, "__is_member_function_pointer", 1)
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index cd17cd176cb..8118d3104c7 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -12178,6 +12178,9 @@ trait_expr_value (cp_trait_kind kind, tree type1, tree type2)
 case CPTK_IS_FINAL:
   return CLASS_TYPE_P (type1) && CLASSTYPE_FINAL (type1);
 
+case CPTK_IS_FUNCTION:
+  return type_code1 == FUNCTION_TYPE;
+
 case CPTK_IS_LAYOUT_COMPATIBLE:
   return layout_compatible_type_p (type1, type2);
 
@@ -12405,6 +12408,7 @@ finish_trait_expr (location_t loc, cp_trait_kind kind, tree type1, tree type2)
 case CPTK_IS_CLASS:
 case CPTK_IS_CONST:
 case CPTK_IS_ENUM:
+case CPTK_IS_FUNCTION:
 case CPTK_IS_MEMBER_FUNCTION_POINTER:
 case CPTK_IS_MEMBER_OBJECT_POINTER:
 case CPTK_IS_MEMBER_POINTER:
diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
index e112d317657..4d3947572a4 100644
--- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
+++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
@@ -89,6 +89,9 @@
 #if !__has_builtin (__is_final)
 # error "__has_builtin (__is_final) failed"
 #endif
+#if !__has_builtin (__is_function)
+# error "__has_builtin (__is_function) failed"
+#endif
 #if !__has_builtin (__is_layout_compatible)
 # error "__has_builtin (__is_layout_compatible) failed"
 #endif
diff --git a/gcc/testsuite/g++.dg/ext/is_function.C b/gcc/testsuite/g++.dg/ext/is_function.C
new file mode 100644
index 000..2e1594b12ad
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/is_function.C
@@ -0,0 +1,58 @@
+// { dg-do compile { target c++11 } }
+
+#include 
+
+using namespace __gnu_test;
+
+#define SA(X) static_assert((X),#X)
+#define SA_TEST_CATEGORY(TRAIT, TYPE, EXPECT)  \
+  SA(TRAIT(TYPE) == EXPECT);   \
+  SA(TRAIT(const TYPE) == EXPECT); \
+  SA(TRAIT(volatile TYPE) == EXPECT);  \
+  SA(TRAIT(const volatile TYPE) == EXPECT)
+
+struct A
+{ void fn(); };
+
+template
+struct AHolder { };
+
+template
+struct AHolder
+{ using type = U; };
+
+// Positive tests.
+SA(__is_function(int (int)));
+SA(__is_function(ClassType (ClassType)));
+SA(__is_function(float (int, float, int[], int&)));
+SA(__is_function(int (int, ...)));
+SA(__is_function(bool (ClassType) const));
+SA(__is_function(AHolder::type));
+
+void fn();
+SA(__is_function(decltype(fn)));
+
+// Negative tests.
+SA_TEST_CATEGORY(__is_function, int, false);
+SA_TEST_CATEGORY(__is_function, int*, false);
+SA_TEST_CATEGORY(__is_function, int&, false);
+SA_TEST_CATEGORY(__is_function, void, false);
+SA_TEST_CATEGORY(__is_function, void*, false);
+SA_TEST_CATEGORY(__is_function, void**, false);
+SA_TEST_CATEGORY(__is_function, std::nullptr_t, false);
+
+SA_TEST_CATEGORY(__is_function, AbstractClass, false);
+SA(!__is_function(int(&)(int)));
+SA(!__is_function(int(*)(int)));
+
+SA_TEST_CATEGORY(__is_function, A, false);
+SA_TEST_CATEGORY(__is_function, decltype(::fn), false);
+
+struct FnCallOverload
+{ void operator()(); };

[PATCH v21 16/30] libstdc++: Optimize std::is_member_pointer compilation performance

2023-10-17 Ken Matsui

This patch optimizes the compilation performance of std::is_member_pointer
by dispatching to the new __is_member_pointer built-in trait.
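The variable-template half of this change benefits most: with the built-in, `is_member_pointer_v` no longer instantiates any class template at all. A minimal sketch of that shape (C++14 or later), using a portable `__has_builtin` probe as an assumed stand-in for libstdc++'s internal `_GLIBCXX_USE_BUILTIN_TRAIT`; `is_member_pointer_sketch_v` and `SKETCH_HAVE_MEMPTR_BUILTIN` are hypothetical names.

```cpp
#include <type_traits>

// Only evaluate __has_builtin(...) where the operator itself exists.
#if defined(__has_builtin)
#  if __has_builtin(__is_member_pointer)
#    define SKETCH_HAVE_MEMPTR_BUILTIN 1
#  endif
#endif

// Hypothetical variable template mirroring is_member_pointer_v in the patch.
template <typename T>
constexpr bool is_member_pointer_sketch_v =
#ifdef SKETCH_HAVE_MEMPTR_BUILTIN
  __is_member_pointer(T);            // direct intrinsic, no class template
#else
  std::is_member_pointer<T>::value;  // instantiates the library trait
#endif

struct P { int x; void f(); };

static_assert(is_member_pointer_sketch_v<int P::*>, "data member pointer");
static_assert(is_member_pointer_sketch_v<void (P::*)()>, "member function pointer");
static_assert(!is_member_pointer_sketch_v<int*>, "ordinary pointer");
```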

libstdc++-v3/ChangeLog:

* include/std/type_traits (is_member_pointer): Use __is_member_pointer
built-in trait.
(is_member_pointer_v): Likewise.

Signed-off-by: Ken Matsui 
---
 libstdc++-v3/include/std/type_traits | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/type_traits b/libstdc++-v3/include/std/type_traits
index 7fd29d8d9f2..d7f89cf7c06 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -716,6 +716,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 struct is_compound
 : public __not_>::type { };
 
+  /// is_member_pointer
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_member_pointer)
+  template
+struct is_member_pointer
+: public __bool_constant<__is_member_pointer(_Tp)>
+{ };
+#else
   /// @cond undocumented
   template
 struct __is_member_pointer_helper
@@ -726,11 +733,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 : public true_type { };
   /// @endcond
 
-  /// is_member_pointer
   template
 struct is_member_pointer
 : public __is_member_pointer_helper<__remove_cv_t<_Tp>>::type
 { };
+#endif
 
   template
 struct is_same;
@@ -3242,8 +3249,14 @@ template 
   inline constexpr bool is_scalar_v = is_scalar<_Tp>::value;
 template 
   inline constexpr bool is_compound_v = is_compound<_Tp>::value;
+
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_member_pointer)
+template 
+  inline constexpr bool is_member_pointer_v = __is_member_pointer(_Tp);
+#else
 template 
   inline constexpr bool is_member_pointer_v = is_member_pointer<_Tp>::value;
+#endif
 
 #if _GLIBCXX_USE_BUILTIN_TRAIT(__is_const)
 template 
-- 
2.42.0

[PATCH v22 19/31] libstdc++: Optimize std::is_member_function_pointer compilation performance

2023-10-17 Ken Matsui

This patch optimizes the compilation performance of
std::is_member_function_pointer by dispatching to the new
__is_member_function_pointer built-in trait.

libstdc++-v3/ChangeLog:

* include/std/type_traits (is_member_function_pointer): Use
__is_member_function_pointer built-in trait.
(is_member_function_pointer_v): Likewise.

Signed-off-by: Ken Matsui 
---
 libstdc++-v3/include/std/type_traits | 16 
 1 file changed, 16 insertions(+)

diff --git a/libstdc++-v3/include/std/type_traits b/libstdc++-v3/include/std/type_traits
index d7f89cf7c06..e1b10240dc2 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -588,6 +588,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 : public __is_member_object_pointer_helper<__remove_cv_t<_Tp>>::type
 { };
 
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_member_function_pointer)
+  /// is_member_function_pointer
+  template
+struct is_member_function_pointer
+: public __bool_constant<__is_member_function_pointer(_Tp)>
+{ };
+#else
   template
 struct __is_member_function_pointer_helper
 : public false_type { };
@@ -601,6 +608,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 struct is_member_function_pointer
 : public __is_member_function_pointer_helper<__remove_cv_t<_Tp>>::type
 { };
+#endif
 
   /// is_enum
   template
@@ -3222,9 +3230,17 @@ template 
 template 
   inline constexpr bool is_member_object_pointer_v =
 is_member_object_pointer<_Tp>::value;
+
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_member_function_pointer)
+template 
+  inline constexpr bool is_member_function_pointer_v =
+__is_member_function_pointer(_Tp);
+#else
 template 
   inline constexpr bool is_member_function_pointer_v =
 is_member_function_pointer<_Tp>::value;
+#endif
+
 template 
   inline constexpr bool is_enum_v = __is_enum(_Tp);
 template 
-- 
2.42.0

[PATCH v22 14/31] c++: Implement __is_scoped_enum built-in trait

2023-10-17 Ken Matsui

This patch implements a built-in trait for std::is_scoped_enum.
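`std::is_scoped_enum` is a C++23 trait; the built-in reduces it to `SCOPED_ENUM_P`. As a rough library-only approximation (an illustration, not how libstdc++ necessarily implements it), one can use the fact that an unscoped enumeration converts implicitly to its underlying type while an `enum class` does not. The bool default parameter keeps `std::underlying_type` from being applied to non-enum types. `is_scoped_enum_sketch` is a hypothetical name, and incomplete enum types are out of scope for this sketch.

```cpp
#include <type_traits>

// Non-enum types are never scoped enums.
template <typename T, bool = std::is_enum<T>::value>
struct is_scoped_enum_sketch : std::false_type { };

// For enums: unscoped enums convert implicitly to their underlying type,
// scoped ones (enum class / enum struct) do not.
template <typename T>
struct is_scoped_enum_sketch<T, true>
  : std::integral_constant<bool,
      !std::is_convertible<T, typename std::underlying_type<
                                typename std::remove_cv<T>::type>::type>::value>
{ };

enum class E { e1, e2 };
enum class Ec : char { e1, e2 };
enum U { u1, u2 };
enum F : int { f1, f2 };

static_assert(is_scoped_enum_sketch<E>::value, "scoped enum");
static_assert(is_scoped_enum_sketch<const Ec>::value, "cv scoped enum, fixed type");
static_assert(!is_scoped_enum_sketch<U>::value, "unscoped enum");
static_assert(!is_scoped_enum_sketch<F>::value, "unscoped enum, fixed type");
static_assert(!is_scoped_enum_sketch<int>::value, "not an enum");
```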

gcc/cp/ChangeLog:

* cp-trait.def: Define __is_scoped_enum.
* constraint.cc (diagnose_trait_expr): Handle CPTK_IS_SCOPED_ENUM.
* semantics.cc (trait_expr_value): Likewise.
(finish_trait_expr): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of __is_scoped_enum.
* g++.dg/ext/is_scoped_enum.C: New test.

Signed-off-by: Ken Matsui 
---
 gcc/cp/constraint.cc  |  3 +
 gcc/cp/cp-trait.def   |  1 +
 gcc/cp/semantics.cc   |  4 ++
 gcc/testsuite/g++.dg/ext/has-builtin-1.C  |  3 +
 gcc/testsuite/g++.dg/ext/is_scoped_enum.C | 67 +++
 5 files changed, 78 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/is_scoped_enum.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 71a40558881..2b46e3afa97 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -3782,6 +3782,9 @@ diagnose_trait_expr (tree expr, tree args)
 case CPTK_IS_SAME:
   inform (loc, "  %qT is not the same as %qT", t1, t2);
   break;
+case CPTK_IS_SCOPED_ENUM:
+  inform (loc, "  %qT is not a scoped enum", t1);
+  break;
 case CPTK_IS_STD_LAYOUT:
   inform (loc, "  %qT is not an standard layout type", t1);
   break;
diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
index 6d6dff7a4c3..e0e3fe1d23f 100644
--- a/gcc/cp/cp-trait.def
+++ b/gcc/cp/cp-trait.def
@@ -79,6 +79,7 @@ DEFTRAIT_EXPR (IS_POINTER_INTERCONVERTIBLE_BASE_OF, "__is_pointer_interconvertib
 DEFTRAIT_EXPR (IS_POD, "__is_pod", 1)
 DEFTRAIT_EXPR (IS_POLYMORPHIC, "__is_polymorphic", 1)
 DEFTRAIT_EXPR (IS_SAME, "__is_same", 2)
+DEFTRAIT_EXPR (IS_SCOPED_ENUM, "__is_scoped_enum", 1)
 DEFTRAIT_EXPR (IS_STD_LAYOUT, "__is_standard_layout", 1)
 DEFTRAIT_EXPR (IS_TRIVIAL, "__is_trivial", 1)
 DEFTRAIT_EXPR (IS_TRIVIALLY_ASSIGNABLE, "__is_trivially_assignable", 2)
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index aab35c9e5ba..9f0e468f489 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -12205,6 +12205,9 @@ trait_expr_value (cp_trait_kind kind, tree type1, tree type2)
 case CPTK_IS_SAME:
   return same_type_p (type1, type2);
 
+case CPTK_IS_SCOPED_ENUM:
+  return SCOPED_ENUM_P (type1);
+
 case CPTK_IS_STD_LAYOUT:
   return std_layout_type_p (type1);
 
@@ -12391,6 +12394,7 @@ finish_trait_expr (location_t loc, cp_trait_kind kind, tree type1, tree type2)
 case CPTK_IS_CONST:
 case CPTK_IS_ENUM:
 case CPTK_IS_SAME:
+case CPTK_IS_SCOPED_ENUM:
 case CPTK_IS_UNBOUNDED_ARRAY:
 case CPTK_IS_UNION:
 case CPTK_IS_VOLATILE:
diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
index 4142da518b1..ba97beea3c3 100644
--- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
+++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
@@ -119,6 +119,9 @@
 #if !__has_builtin (__is_same_as)
 # error "__has_builtin (__is_same_as) failed"
 #endif
+#if !__has_builtin (__is_scoped_enum)
+# error "__has_builtin (__is_scoped_enum) failed"
+#endif
 #if !__has_builtin (__is_standard_layout)
 # error "__has_builtin (__is_standard_layout) failed"
 #endif
diff --git a/gcc/testsuite/g++.dg/ext/is_scoped_enum.C b/gcc/testsuite/g++.dg/ext/is_scoped_enum.C
new file mode 100644
index 000..a563b6ee67d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/is_scoped_enum.C
@@ -0,0 +1,67 @@
+// { dg-do compile { target c++11 } }
+
+#include 
+
+using namespace __gnu_test;
+
+#define SA(X) static_assert((X),#X)
+
+#define SA_TEST_FN(TRAIT, TYPE, EXPECT)\
+  SA(TRAIT(TYPE) == EXPECT);   \
+  SA(TRAIT(const TYPE) == EXPECT);
+
+#define SA_TEST_CATEGORY(TRAIT, TYPE, EXPECT)  \
+  SA(TRAIT(TYPE) == EXPECT);   \
+  SA(TRAIT(const TYPE) == EXPECT); \
+  SA(TRAIT(volatile TYPE) == EXPECT);  \
+  SA(TRAIT(const volatile TYPE) == EXPECT)
+
+enum class E { e1, e2 };
+SA_TEST_CATEGORY(__is_scoped_enum, E, true);
+enum class Ec : char { e1, e2 };
+SA_TEST_CATEGORY(__is_scoped_enum, Ec, true);
+
+// negative tests
+enum U { u1, u2 };
+SA_TEST_CATEGORY(__is_scoped_enum, U, false);
+enum F : int { f1, f2 };
+SA_TEST_CATEGORY(__is_scoped_enum, F, false);
+struct S;
+SA_TEST_CATEGORY(__is_scoped_enum, S, false);
+struct S { };
+SA_TEST_CATEGORY(__is_scoped_enum, S, false);
+
+SA_TEST_CATEGORY(__is_scoped_enum, int, false);
+SA_TEST_CATEGORY(__is_scoped_enum, int[], false);
+SA_TEST_CATEGORY(__is_scoped_enum, int[2], false);
+SA_TEST_CATEGORY(__is_scoped_enum, int[][2], false);
+SA_TEST_CATEGORY(__is_scoped_enum, int[2][3], false);
+SA_TEST_CATEGORY(__is_scoped_enum, int*, false);
+SA_TEST_CATEGORY(__is_scoped_enum, int&, false);
+SA_TEST_CATEGORY(__is_scoped_enum, int*&, false);
+SA_TEST_FN(__is_scoped_enum, int(), false);
+SA_TEST_FN(__is_scoped_enum, int(*)(), false);

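The trait_expr_value hunk above lowers `__is_scoped_enum` to GCC's internal `SCOPED_ENUM_P` predicate. For comparison, here is a minimal sketch, assuming C++11 and only standard `<type_traits>` facilities, of a common library-only emulation: an enumeration is scoped exactly when it is not implicitly convertible to its underlying type. The names `my_is_scoped_enum` and `my_is_scoped_enum_impl` are hypothetical and are not part of GCC or libstdc++.

```cpp
#include <type_traits>

// Hypothetical helper, for illustration only.
// Primary template: a type that is not an enumeration is never a scoped enum.
template <typename T, bool = std::is_enum<T>::value>
struct my_is_scoped_enum_impl : std::false_type {};

// Enumerations: an unscoped enum converts implicitly to its underlying
// type, while a scoped enum ("enum class" / "enum struct") never does.
template <typename T>
struct my_is_scoped_enum_impl<T, true>
    : std::integral_constant<
          bool,
          !std::is_convertible<
              T, typename std::underlying_type<T>::type>::value> {};

// Strip cv-qualifiers first so const/volatile enums are classified like
// their unqualified type, matching the SA_TEST_CATEGORY expectations.
template <typename T>
struct my_is_scoped_enum
    : my_is_scoped_enum_impl<typename std::remove_cv<T>::type> {};
```

The built-in avoids the two extra trait instantiations this sketch needs per query, which is the usual motivation for adding such built-ins.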
[PATCH v21 17/30] c++: Implement __is_member_function_pointer built-in trait

2023-10-17 Thread Ken Matsui

This patch implements a built-in trait for std::is_member_function_pointer.
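In the semantics.cc hunk below, the built-in is lowered to GCC's internal `TYPE_PTRMEMFUNC_P`. As a point of comparison, this is a minimal sketch of one common library-style way to compute the same answer in portable C++11 with partial specialization: a pointer to member `M C::*` is a member function pointer exactly when `M` is a function type (including cv-qualified function types such as `int(int) const`). The names `mfp_helper` and `my_is_member_function_pointer` are hypothetical.

```cpp
#include <type_traits>

// Hypothetical helper, for illustration only: non-member-pointer types
// fall through to the primary template and yield false.
template <typename T>
struct mfp_helper : std::false_type {};

// Any pointer to member "M C::*" matches here; it names a member
// function exactly when M is a function type.
template <typename M, typename C>
struct mfp_helper<M C::*> : std::is_function<M> {};

// Remove top-level cv-qualifiers so "int (C::* const)(int)" also counts.
template <typename T>
struct my_is_member_function_pointer
    : mfp_helper<typename std::remove_cv<T>::type> {};
```

Pointers to data members such as `int ClassType::*` match the specialization but produce false because `int` is not a function type, mirroring the negative tests in the patch.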

gcc/cp/ChangeLog:

* cp-trait.def: Define __is_member_function_pointer.
* constraint.cc (diagnose_trait_expr): Handle
CPTK_IS_MEMBER_FUNCTION_POINTER.
* semantics.cc (trait_expr_value): Likewise.
(finish_trait_expr): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of
__is_member_function_pointer.
* g++.dg/ext/is_member_function_pointer.C: New test.

Signed-off-by: Ken Matsui 
---
 gcc/cp/constraint.cc  |  3 ++
 gcc/cp/cp-trait.def   |  1 +
 gcc/cp/semantics.cc   |  4 +++
 gcc/testsuite/g++.dg/ext/has-builtin-1.C  |  3 ++
 .../g++.dg/ext/is_member_function_pointer.C   | 31 +++
 5 files changed, 42 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/is_member_function_pointer.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index a969a069db4..dde83533382 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -3757,6 +3757,9 @@ diagnose_trait_expr (tree expr, tree args)
 case CPTK_IS_LITERAL_TYPE:
   inform (loc, "  %qT is not a literal type", t1);
   break;
+case CPTK_IS_MEMBER_FUNCTION_POINTER:
+  inform (loc, "  %qT is not a member function pointer", t1);
+  break;
 case CPTK_IS_MEMBER_POINTER:
   inform (loc, "  %qT is not a member pointer", t1);
   break;
diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
index 26087da3bdf..897b96630f2 100644
--- a/gcc/cp/cp-trait.def
+++ b/gcc/cp/cp-trait.def
@@ -72,6 +72,7 @@ DEFTRAIT_EXPR (IS_ENUM, "__is_enum", 1)
 DEFTRAIT_EXPR (IS_FINAL, "__is_final", 1)
 DEFTRAIT_EXPR (IS_LAYOUT_COMPATIBLE, "__is_layout_compatible", 2)
 DEFTRAIT_EXPR (IS_LITERAL_TYPE, "__is_literal_type", 1)
+DEFTRAIT_EXPR (IS_MEMBER_FUNCTION_POINTER, "__is_member_function_pointer", 1)
 DEFTRAIT_EXPR (IS_MEMBER_POINTER, "__is_member_pointer", 1)
 DEFTRAIT_EXPR (IS_NOTHROW_ASSIGNABLE, "__is_nothrow_assignable", 2)
 DEFTRAIT_EXPR (IS_NOTHROW_CONSTRUCTIBLE, "__is_nothrow_constructible", -1)
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 5ca05dde75d..59aaa256232 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -12184,6 +12184,9 @@ trait_expr_value (cp_trait_kind kind, tree type1, tree type2)
 case CPTK_IS_LITERAL_TYPE:
   return literal_type_p (type1);
 
+case CPTK_IS_MEMBER_FUNCTION_POINTER:
+  return TYPE_PTRMEMFUNC_P (type1);
+
 case CPTK_IS_MEMBER_POINTER:
   return TYPE_PTRMEM_P (type1);
 
@@ -12396,6 +12399,7 @@ finish_trait_expr (location_t loc, cp_trait_kind kind, tree type1, tree type2)
 case CPTK_IS_CLASS:
 case CPTK_IS_CONST:
 case CPTK_IS_ENUM:
+case CPTK_IS_MEMBER_FUNCTION_POINTER:
 case CPTK_IS_MEMBER_POINTER:
 case CPTK_IS_SAME:
 case CPTK_IS_SCOPED_ENUM:
diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
index 994873f14e9..0dfe957474b 100644
--- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
+++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
@@ -95,6 +95,9 @@
 #if !__has_builtin (__is_literal_type)
 # error "__has_builtin (__is_literal_type) failed"
 #endif
+#if !__has_builtin (__is_member_function_pointer)
+# error "__has_builtin (__is_member_function_pointer) failed"
+#endif
 #if !__has_builtin (__is_member_pointer)
 # error "__has_builtin (__is_member_pointer) failed"
 #endif
diff --git a/gcc/testsuite/g++.dg/ext/is_member_function_pointer.C b/gcc/testsuite/g++.dg/ext/is_member_function_pointer.C
new file mode 100644
index 000..555123e8f07
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/is_member_function_pointer.C
@@ -0,0 +1,31 @@
+// { dg-do compile { target c++11 } }
+
+#include 
+
+using namespace __gnu_test;
+
+#define SA(X) static_assert((X),#X)
+
+#define SA_TEST_FN(TRAIT, TYPE, EXPECT)\
+  SA(TRAIT(TYPE) == EXPECT);   \
+  SA(TRAIT(const TYPE) == EXPECT);
+
+#define SA_TEST_CATEGORY(TRAIT, TYPE, EXPECT)  \
+  SA(TRAIT(TYPE) == EXPECT);   \
+  SA(TRAIT(const TYPE) == EXPECT); \
+  SA(TRAIT(volatile TYPE) == EXPECT);  \
+  SA(TRAIT(const volatile TYPE) == EXPECT)
+
+// Positive tests.
+SA_TEST_FN(__is_member_function_pointer, int (ClassType::*) (int), true);
+SA_TEST_FN(__is_member_function_pointer, int (ClassType::*) (int) const, true);
+SA_TEST_FN(__is_member_function_pointer, int (ClassType::*) (float, ...), true);
+SA_TEST_FN(__is_member_function_pointer, ClassType (ClassType::*) (ClassType), true);
+SA_TEST_FN(__is_member_function_pointer, float (ClassType::*) (int, float, int[], int&), true);
+
+// Negative tests.
+SA_TEST_CATEGORY(__is_member_function_pointer, int (ClassType::*), false);
+SA_TEST_CATEGORY(__is_member_function_pointer, ClassType (ClassType::*), false);
+
+// Sanity check.

[PATCH v22 10/31] c++: Implement __is_unbounded_array built-in trait

2023-10-17 Thread Ken Matsui

This patch implements a built-in trait for std::is_unbounded_array.
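Per the semantics.cc hunk below, the built-in forwards to GCC's `array_of_unknown_bound_p`. A minimal sketch of the equivalent library-only formulation, assuming C++11: an unbounded array type is exactly one that matches the partial specialization `T[]`. Because cv-qualifiers on an array type apply to its element type, `const int[]` still matches with `T = const int`, so no `remove_cv` step is needed. `my_is_unbounded_array` is a hypothetical name; the standard `std::is_unbounded_array` itself only arrived in C++20.

```cpp
#include <type_traits>

// Hypothetical trait, for illustration only.
// Primary template: bounded arrays, pointers, and non-arrays are false.
template <typename T>
struct my_is_unbounded_array : std::false_type {};

// Arrays of unknown bound, e.g. int[] or int[][3] (T = int[3]).
template <typename T>
struct my_is_unbounded_array<T[]> : std::true_type {};
```

Note that only the outermost bound matters: `int[2][3]` is false while `int[][3]` is true, exactly as the new is_unbounded_array.C test expects.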

gcc/cp/ChangeLog:

* cp-trait.def: Define __is_unbounded_array.
* constraint.cc (diagnose_trait_expr): Handle CPTK_IS_UNBOUNDED_ARRAY.
* semantics.cc (trait_expr_value): Likewise.
(finish_trait_expr): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of __is_unbounded_array.
* g++.dg/ext/is_unbounded_array.C: New test.

Signed-off-by: Ken Matsui 
---
 gcc/cp/constraint.cc  |  3 ++
 gcc/cp/cp-trait.def   |  1 +
 gcc/cp/semantics.cc   |  4 ++
 gcc/testsuite/g++.dg/ext/has-builtin-1.C  |  3 ++
 gcc/testsuite/g++.dg/ext/is_unbounded_array.C | 37 +++
 5 files changed, 48 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/is_unbounded_array.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index b9f89fe178c..292b941e6a0 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -3797,6 +3797,9 @@ diagnose_trait_expr (tree expr, tree args)
 case CPTK_IS_TRIVIALLY_COPYABLE:
   inform (loc, "  %qT is not trivially copyable", t1);
   break;
+case CPTK_IS_UNBOUNDED_ARRAY:
+  inform (loc, "  %qT is not an unbounded array", t1);
+  break;
 case CPTK_IS_UNION:
   inform (loc, "  %qT is not a union", t1);
   break;
diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
index 99bc05360b9..4e02f68e4a9 100644
--- a/gcc/cp/cp-trait.def
+++ b/gcc/cp/cp-trait.def
@@ -83,6 +83,7 @@ DEFTRAIT_EXPR (IS_TRIVIAL, "__is_trivial", 1)
 DEFTRAIT_EXPR (IS_TRIVIALLY_ASSIGNABLE, "__is_trivially_assignable", 2)
 DEFTRAIT_EXPR (IS_TRIVIALLY_CONSTRUCTIBLE, "__is_trivially_constructible", -1)
 DEFTRAIT_EXPR (IS_TRIVIALLY_COPYABLE, "__is_trivially_copyable", 1)
+DEFTRAIT_EXPR (IS_UNBOUNDED_ARRAY, "__is_unbounded_array", 1)
 DEFTRAIT_EXPR (IS_UNION, "__is_union", 1)
 DEFTRAIT_EXPR (IS_VOLATILE, "__is_volatile", 1)
 DEFTRAIT_EXPR (REF_CONSTRUCTS_FROM_TEMPORARY, "__reference_constructs_from_temporary", 2)
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 8d5874d6ab0..bd73323e6db 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -12217,6 +12217,9 @@ trait_expr_value (cp_trait_kind kind, tree type1, tree type2)
 case CPTK_IS_TRIVIALLY_COPYABLE:
   return trivially_copyable_p (type1);
 
+case CPTK_IS_UNBOUNDED_ARRAY:
+  return array_of_unknown_bound_p (type1);
+
 case CPTK_IS_UNION:
   return type_code1 == UNION_TYPE;
 
@@ -12384,6 +12387,7 @@ finish_trait_expr (location_t loc, cp_trait_kind kind, tree type1, tree type2)
 case CPTK_IS_CONST:
 case CPTK_IS_ENUM:
 case CPTK_IS_SAME:
+case CPTK_IS_UNBOUNDED_ARRAY:
 case CPTK_IS_UNION:
 case CPTK_IS_VOLATILE:
   break;
diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
index 645cabe088e..90997210c12 100644
--- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
+++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
@@ -131,6 +131,9 @@
 #if !__has_builtin (__is_trivially_copyable)
 # error "__has_builtin (__is_trivially_copyable) failed"
 #endif
+#if !__has_builtin (__is_unbounded_array)
+# error "__has_builtin (__is_unbounded_array) failed"
+#endif
 #if !__has_builtin (__is_union)
 # error "__has_builtin (__is_union) failed"
 #endif
diff --git a/gcc/testsuite/g++.dg/ext/is_unbounded_array.C b/gcc/testsuite/g++.dg/ext/is_unbounded_array.C
new file mode 100644
index 000..1307d24f5a5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/is_unbounded_array.C
@@ -0,0 +1,37 @@
+// { dg-do compile { target c++11 } }
+
+#include 
+
+using namespace __gnu_test;
+
+#define SA(X) static_assert((X),#X)
+
+#define SA_TEST_CATEGORY(TRAIT, TYPE, EXPECT)  \
+  SA(TRAIT(TYPE) == EXPECT);   \
+  SA(TRAIT(const TYPE) == EXPECT); \
+  SA(TRAIT(volatile TYPE) == EXPECT);  \
+  SA(TRAIT(const volatile TYPE) == EXPECT)
+
+SA_TEST_CATEGORY(__is_unbounded_array, int[2], false);
+SA_TEST_CATEGORY(__is_unbounded_array, int[], true);
+SA_TEST_CATEGORY(__is_unbounded_array, int[2][3], false);
+SA_TEST_CATEGORY(__is_unbounded_array, int[][3], true);
+SA_TEST_CATEGORY(__is_unbounded_array, float*[2], false);
+SA_TEST_CATEGORY(__is_unbounded_array, float*[], true);
+SA_TEST_CATEGORY(__is_unbounded_array, float*[2][3], false);
+SA_TEST_CATEGORY(__is_unbounded_array, float*[][3], true);
+SA_TEST_CATEGORY(__is_unbounded_array, ClassType[2], false);
+SA_TEST_CATEGORY(__is_unbounded_array, ClassType[], true);
+SA_TEST_CATEGORY(__is_unbounded_array, ClassType[2][3], false);
+SA_TEST_CATEGORY(__is_unbounded_array, ClassType[][3], true);
+SA_TEST_CATEGORY(__is_unbounded_array, IncompleteClass[2][3], false);
+SA_TEST_CATEGORY(__is_unbounded_array, IncompleteClass[][3], true);
+SA_TEST_CATEGORY(__is_unbounded_array, int(*)[2], false);

[PATCH v22 23/31] libstdc++: Optimize std::is_reference compilation performance

2023-10-17 Thread Ken Matsui

This patch optimizes the compilation performance of std::is_reference
by dispatching to the new __is_reference built-in trait.
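The diff below follows libstdc++'s pattern: use the built-in when `_GLIBCXX_USE_BUILTIN_TRAIT` reports it, and otherwise keep the three partial specializations. That macro is internal to libstdc++, so this minimal sketch substitutes the public `__has_builtin` check (available in GCC 10 and later and in Clang; treat support on other compilers as an assumption). `SKETCH_HAVE_IS_REFERENCE` and `sketch_is_reference` are hypothetical names. Both branches give the same answers; the patch prefers the built-in branch for its lower compile-time cost.

```cpp
#include <type_traits>

// Hypothetical feature macro, for illustration only. The check is nested
// because "defined(__has_builtin) && __has_builtin(x)" does not preprocess
// on compilers that lack __has_builtin.
#if defined(__has_builtin)
#  if __has_builtin(__is_reference)
#    define SKETCH_HAVE_IS_REFERENCE 1
#  endif
#endif

#ifdef SKETCH_HAVE_IS_REFERENCE
// Built-in path: a single instantiation answers the query directly.
template <typename T>
struct sketch_is_reference
    : std::integral_constant<bool, __is_reference(T)> {};
#else
// Fallback path: the classic partial specializations for T& and T&&.
template <typename T>
struct sketch_is_reference : std::false_type {};
template <typename T>
struct sketch_is_reference<T&> : std::true_type {};
template <typename T>
struct sketch_is_reference<T&&> : std::true_type {};
#endif
```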

libstdc++-v3/ChangeLog:

* include/std/type_traits (is_reference): Use __is_reference built-in
trait.
(is_reference_v): Likewise.

Signed-off-by: Ken Matsui 
---
 libstdc++-v3/include/std/type_traits | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/libstdc++-v3/include/std/type_traits b/libstdc++-v3/include/std/type_traits
index 792213ebfe8..36ad9814047 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -682,6 +682,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // Composite type categories.
 
   /// is_reference
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_reference)
+  template
+struct is_reference
+: public __bool_constant<__is_reference(_Tp)>
+{ };
+#else
   template
 struct is_reference
 : public false_type
@@ -696,6 +702,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 struct is_reference<_Tp&&>
 : public true_type
 { };
+#endif
 
   /// is_arithmetic
   template
@@ -3264,12 +3271,19 @@ template 
   inline constexpr bool is_class_v = __is_class(_Tp);
 template 
   inline constexpr bool is_function_v = is_function<_Tp>::value;
+
+#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_reference)
+template 
+  inline constexpr bool is_reference_v = __is_reference(_Tp);
+#else
 template 
   inline constexpr bool is_reference_v = false;
 template 
   inline constexpr bool is_reference_v<_Tp&> = true;
 template 
   inline constexpr bool is_reference_v<_Tp&&> = true;
+#endif
+
 template 
   inline constexpr bool is_arithmetic_v = is_arithmetic<_Tp>::value;
 template 
-- 
2.42.0
