Re: [PATCH] MATCH [PR19832]: Optimize some `(a != b) ? a OP b : c`

2023-08-31 Thread Richard Biener via Gcc-patches
On Thu, Aug 31, 2023 at 7:25 PM Andrew Pinski via Gcc-patches
 wrote:
>
> This patch adds the following match patterns to optimize these:
>  /* (a != b) ? (a - b) : 0 -> (a - b) */
>  /* (a != b) ? (a ^ b) : 0 -> (a ^ b) */
>  /* (a != b) ? (a & b) : a -> (a & b) */
>  /* (a != b) ? (a | b) : a -> (a | b) */
>  /* (a != b) ? min(a,b) : a -> min(a,b) */
>  /* (a != b) ? max(a,b) : a -> max(a,b) */
>  /* (a != b) ? (a * b) : (a * a) -> (a * b) */
>  /* (a != b) ? (a + b) : (a + a) -> (a + b) */
>  /* (a != b) ? (a + b) : (2 * a) -> (a + b) */
> Note currently only integer types (including vector types)
> are handled. Floating point types can be added later on.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu.

OK.
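
For readers who want to convince themselves of the identities listed in
the patch description, here is a minimal brute-force sketch. It is not
part of the patch or its testsuite: the helper names (umin, umax,
identities_hold, check_identities) are made up for illustration, and it
uses unsigned arithmetic so that wraparound is well defined.

```c
#include <assert.h>
#include <limits.h>

static unsigned umin (unsigned a, unsigned b) { return a < b ? a : b; }
static unsigned umax (unsigned a, unsigned b) { return a > b ? a : b; }

/* Return 1 if, for this pair, every conditional form from the patch
   description equals the plain operation it is simplified to.  */
static int
identities_hold (unsigned a, unsigned b)
{
  return ((a != b ? a - b : 0) == a - b
          && (a != b ? a ^ b : 0) == (a ^ b)
          && (a != b ? a & b : a) == (a & b)
          && (a != b ? a | b : a) == (a | b)
          && (a != b ? umin (a, b) : a) == umin (a, b)
          && (a != b ? umax (a, b) : a) == umax (a, b)
          && (a != b ? a * b : a * a) == a * b
          && (a != b ? a + b : a + a) == a + b
          && (a != b ? a + b : 2 * a) == a + b);
}

/* Check all pairs (base + i, base + j) for i, j < n.  */
int
check_identities (unsigned base, unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    for (unsigned j = 0; j < n; j++)
      if (!identities_hold (base + i, base + j))
        return 0;
  return 1;
}
```

Calling check_identities (0, 64) covers small values, and
check_identities (UINT_MAX - 31, 64) also exercises pairs that wrap
around. In each case the false arm is simply the operation evaluated at
a == b, which is why the select can be dropped.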

> The first pattern still shows up in GCC in cse.c's preferable
> function, which was the original motivation for this patch.
>
> PR tree-optimization/19832
>
> gcc/ChangeLog:
>
> * match.pd: Add pattern to optimize
> `(a != b) ? a OP b : c`.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/opt/vectcond-1.C: New test.
> * gcc.dg/tree-ssa/phi-opt-same-1.c: New test.
> ---
>  gcc/match.pd  | 31 ++
>  gcc/testsuite/g++.dg/opt/vectcond-1.C | 57 ++
>  .../gcc.dg/tree-ssa/phi-opt-same-1.c  | 60 +++
>  3 files changed, 148 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/opt/vectcond-1.C
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-same-1.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index c01362ee359..487a7e38719 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -5261,6 +5261,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> (convert @c0
>  #endif
>
> +(for cnd (cond vec_cond)
> + /* (a != b) ? (a - b) : 0 -> (a - b) */
> + (simplify
> +  (cnd (ne:c @0 @1) (minus@2 @0 @1) integer_zerop)
> +  @2)
> + /* (a != b) ? (a ^ b) : 0 -> (a ^ b) */
> + (simplify
> +  (cnd (ne:c @0 @1) (bit_xor:c@2 @0 @1) integer_zerop)
> +  @2)
> + /* (a != b) ? (a & b) : a -> (a & b) */
> + /* (a != b) ? (a | b) : a -> (a | b) */
> + /* (a != b) ? min(a,b) : a -> min(a,b) */
> + /* (a != b) ? max(a,b) : a -> max(a,b) */
> + (for op (bit_and bit_ior min max)
> +  (simplify
> +   (cnd (ne:c @0 @1) (op:c@2 @0 @1) @0)
> +   @2))
> + /* (a != b) ? (a * b) : (a * a) -> (a * b) */
> + /* (a != b) ? (a + b) : (a + a) -> (a + b) */
> + (for op (mult plus)
> +  (simplify
> +   (cnd (ne:c @0 @1) (op@2 @0 @1) (op @0 @0))
> +   (if (ANY_INTEGRAL_TYPE_P (type))
> +@2)))
> + /* (a != b) ? (a + b) : (2 * a) -> (a + b) */
> + (simplify
> +  (cnd (ne:c @0 @1) (plus@2 @0 @1) (mult @0 uniform_integer_cst_p@3))
> +  (if (wi::to_wide (uniform_integer_cst_p (@3)) == 2)
> +   @2))
> +)
> +
>  /* These was part of minmax phiopt.  */
>  /* Optimize (a CMP b) ? minmax : minmax
> to minmax, c> */
> diff --git a/gcc/testsuite/g++.dg/opt/vectcond-1.C b/gcc/testsuite/g++.dg/opt/vectcond-1.C
> new file mode 100644
> index 000..3877ad11414
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/opt/vectcond-1.C
> @@ -0,0 +1,57 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-ccp1 -fdump-tree-optimized" } */
> +/* This is the vector version of these optimizations. */
> +/* PR tree-optimization/19832 */
> +
> +#define vector __attribute__((vector_size(sizeof(unsigned)*2)))
> +
> +static inline vector int max_(vector int a, vector int b)
> +{
> +   return (a > b)? a : b;
> +}
> +static inline vector int min_(vector int a, vector int b)
> +{
> +  return (a < b) ? a : b;
> +}
> +
> +vector int f_minus(vector int a, vector int b)
> +{
> +  return (a != b) ? a - b : (a - a);
> +}
> +vector int f_xor(vector int a, vector int b)
> +{
> +  return (a != b) ? a ^ b : (a ^ a);
> +}
> +
> +vector int f_ior(vector int a, vector int b)
> +{
> +  return (a != b) ? a | b : (a | a);
> +}
> +vector int f_and(vector int a, vector int b)
> +{
> +  return (a != b) ? a & b : (a & a);
> +}
> +vector int f_max(vector int a, vector int b)
> +{
> +  return (a != b) ? max_(a, b) : max_(a, a);
> +}
> +vector int f_min(vector int a, vector int b)
> +{
> +  return (a != b) ? min_(a, b) : min_(a, a);
> +}
> +vector int f_mult(vector int a, vector int b)
> +{
> +  return (a != b) ? a * b : (a * a);
> +}
> +vector int f_plus(vector int a, vector int b)
> +{
> +  return (a != b) ? a + b : (a + a);
> +}
> +vector int f_plus_alt(vector int a, vector int b)
> +{
> +  return (a != b) ? a + b : (a * 2);
> +}
> +
> +/* All of the above function's VEC_COND_EXPR should have been optimized away. */
> +/* { dg-final { scan-tree-dump-not "VEC_COND_EXPR " "ccp1" } } */
> +/* { dg-final { scan-tree-dump-not "VEC_COND_EXPR " "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-same-1.c b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-same-1.c
> new file mode 100644
> index 000..24e757b9b9f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-same-1.c
> @@ -0,0 +1,60 @@
> +/* { dg-do compile

Re: [RFC] gimple ssa: SCCP - A new PHI optimization pass

2023-08-31 Thread Richard Biener via Gcc-patches
On Thu, 31 Aug 2023, Andrew Pinski wrote:

> On Thu, Aug 31, 2023 at 5:15 AM Richard Biener via Gcc-patches
>  wrote:
> >
> > On Thu, 31 Aug 2023, Filip Kastl wrote:
> >
> > > > The most obvious places would be right after SSA construction and 
> > > > before RTL expansion.
> > > > Can you provide measurements for those positions?
> > >
> > > The algorithm should only remove PHIs that break SSA form minimality.
> > > Since GCC's SSA construction already produces minimal SSA form, the
> > > algorithm isn't expected to remove any PHIs if run right after the
> > > construction. I even measured it and indeed -- no PHIs got removed
> > > (except for 502.gcc_r, where the algorithm managed to remove exactly
> > > 1 PHI, which is weird).
> > >
> > > I tried putting the pass before pass_expand. There aren't a lot of PHIs to
> > > remove at that point, but there still are some.
> >
> > That's interesting.  Your placement at
> >
> >   NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);
> >   NEXT_PASS (pass_phiopt, true /* early_p */);
> > + NEXT_PASS (pass_sccp);
> >
> > and
> >
> >NEXT_PASS (pass_tsan);
> >NEXT_PASS (pass_dse, true /* use DR analysis */);
> >NEXT_PASS (pass_dce);
> > +  NEXT_PASS (pass_sccp);
> >
> > isn't immediately after the "best" existing pass we have to
> > remove dead PHIs which is pass_cd_dce.  phiopt might leave
> > dead PHIs around and the second instance runs long after the
> > last CD-DCE.
> 
> Actually the last phiopt is run before last pass_cd_dce:

I meant the second instance of pass_sccp, not phiopt.

Richard.
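
As background for the "PHIs that break SSA form minimality" point quoted
above, the simplest such case is a PHI whose arguments are all either the
PHI's own result or one single other name. The sketch below detects only
that single-PHI case on a made-up toy representation (struct toy_phi,
NO_VALUE and redundant_phi_value are hypothetical names); it is not taken
from the proposed pass and does not use GCC's data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy IR: SSA names are small non-negative integers, and a
   PHI is its result name plus a list of argument names.  */
struct toy_phi
{
  int result;
  const int *args;
  size_t nargs;
};

#define NO_VALUE (-1)

/* If every argument of PHI is either the PHI's own result or one single
   other name, return that name: the PHI is redundant and its uses could
   be replaced by it.  Otherwise (including a PHI with only
   self-references) return NO_VALUE.  */
int
redundant_phi_value (const struct toy_phi *phi)
{
  int same = NO_VALUE;
  for (size_t i = 0; i < phi->nargs; i++)
    {
      int arg = phi->args[i];
      if (arg == phi->result || arg == same)
        continue;
      if (same != NO_VALUE)
        return NO_VALUE;   /* Two distinct outside values: PHI is needed.  */
      same = arg;
    }
  return same;
}
```

For example, a PHI with result 7 and arguments {5, 7, 5} is redundant
and stands for name 5, while one with arguments {5, 6} is not.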


[PATCH 4/4] RISC-V: Add conditional autovec convert(INT<->FP) patterns

2023-08-31 Thread Lehua Ding
gcc/ChangeLog:

* config/riscv/autovec-opt.md (*cond_):
New combine pattern.
(*cond_): Ditto.
(*cond_): Ditto.
(*cond_): Ditto.
(*cond_): Ditto.
(*cond_2): Ditto.
* config/riscv/autovec.md (2): Adjust.
(2): Adjust.
(2): Adjust.
(2): Adjust.
(2): Adjust.
(2): Adjust.
* config/riscv/riscv-v.cc (needs_fp_rounding): Add INT->FP extend.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-1.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-2.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv32-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv32-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv64-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv64-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-1.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-2.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float_run-2.c: New test.

---
 gcc/config/riscv/autovec-opt.md   | 120 ++
 gcc/config/riscv/autovec.md   |  42 --
 gcc/config/riscv/riscv-v.cc   |   5 +-
 .../autovec/cond/cond_convert_float2int-1.h   |  51 
 .../autovec/cond/cond_convert_float2int-2.h   |  50 
 .../cond/cond_convert_float2int-rv32-1.c  |  15 +++
 .../cond/cond_convert_float2int-rv32-2.c  |  15 +++
 .../cond/cond_convert_float2int-rv64-1.c  |  15 +++
 .../cond/cond_convert_float2int-rv64-2.c  |  15 +++
 .../cond/cond_convert_float2int_run-1.c   |  32 +
 .../cond/cond_convert_float2int_run-2.c   |  31 +
 .../autovec/cond/cond_convert_int2float-1.h   |  45 +++
 .../autovec/cond/cond_convert_int2float-2.h   |  44 +++
 .../cond/cond_convert_int2float-rv32-1.c  |  13 ++
 .../cond/cond_convert_int2float-rv32-2.c  |  13 ++
 .../cond/cond_convert_int2float-rv64-1.c  |  13 ++
 .../cond/cond_convert_int2float-rv64-2.c  |  13 ++
 .../cond/cond_convert_int2float_run-1.c   |  32 +
 .../cond/cond_convert_int2float_run-2.c   |  31 +
 19 files changed, 582 insertions(+), 13 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-1.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-2.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv32-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv32-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv64-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv64-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int_run-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int_run-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-1.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-2.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float_run-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float_run-2.c

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index ef468bb9df7..1ca5ce97193 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -863,3 +863,123 @@
   riscv_vector::expand_cond_len_unop (icode, ops);
   DONE;
 })
+
+;; Combine convert(FP->INT) + vcond_mask
+(define_insn_and_split "*cond_"
+  [(set (m

[PATCH 3/4] RISC-V: Add conditional autovec convert(FP<->FP) patterns

2023-08-31 Thread Lehua Ding
gcc/ChangeLog:

* config/riscv/autovec-opt.md (*cond_extend):
New combine pattern.
(*cond_trunc): Ditto.
* config/riscv/autovec.md: Adjust.
* config/riscv/riscv-v.cc (needs_fp_rounding): Add FP extend.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-1.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-2.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv32-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv32-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv64-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv64-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float_run-2.c: New test.

---
 gcc/config/riscv/autovec-opt.md   | 39 +++
 gcc/config/riscv/autovec.md   | 13 ++-
 gcc/config/riscv/riscv-v.cc   |  4 +-
 .../autovec/cond/cond_convert_float2float-1.h | 29 ++
 .../autovec/cond/cond_convert_float2float-2.h | 28 +
 .../cond/cond_convert_float2float-rv32-1.c|  9 +
 .../cond/cond_convert_float2float-rv32-2.c|  9 +
 .../cond/cond_convert_float2float-rv64-1.c|  9 +
 .../cond/cond_convert_float2float-rv64-2.c|  9 +
 .../cond/cond_convert_float2float_run-1.c | 31 +++
 .../cond/cond_convert_float2float_run-2.c | 30 ++
 11 files changed, 199 insertions(+), 11 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-1.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-2.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv32-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv32-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv64-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv64-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float_run-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float_run-2.c

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index 6796239d82d..ef468bb9df7 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -824,3 +824,42 @@
   riscv_vector::expand_cond_len_unop (icode, ops);
   DONE;
 })
+
+;; Combine FP sign_extend/zero_extend(vf2) and vcond_mask
+(define_insn_and_split "*cond_extend"
+  [(set (match_operand:VWEXTF_ZVFHMIN 0 "register_operand")
+(if_then_else:VWEXTF_ZVFHMIN
+  (match_operand: 1 "register_operand")
+  (float_extend:VWEXTF_ZVFHMIN (match_operand: 2 "register_operand"))
+  (match_operand:VWEXTF_ZVFHMIN 3 "register_operand")))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  insn_code icode = code_for_pred_extend (mode);
+  rtx ops[] = {operands[0], operands[1], operands[2], operands[3],
+   gen_int_mode (GET_MODE_NUNITS (mode), Pmode)};
+  riscv_vector::expand_cond_len_unop (icode, ops);
+  DONE;
+})
+
+;; Combine FP trunc(vf2) + vcond_mask
+(define_insn_and_split "*cond_trunc"
+  [(set (match_operand: 0 "register_operand")
+(if_then_else:
+  (match_operand: 1 "register_operand")
+  (float_truncate:
+(match_operand:VWEXTF_ZVFHMIN 2 "register_operand"))
+  (match_operand: 3 "register_operand")))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  insn_code icode = code_for_pred_trunc (mode);
+  rtx ops[] = {operands[0], operands[1], operands[2], operands[3],
+   gen_int_mode (GET_MODE_NUNITS (mode), Pmode)};
+  riscv_vector::expand_cond_len_unop (icode, ops);
+  DONE;
+})
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 4859805b8f7..a4ac688e373 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -742,13 +742,8 @@
   "TARGET_VECTOR && (TARGET_ZVFHMIN || TARGET_ZVFH)"
 {
   rtx dblw = gen_reg_rtx (mode);
-  insn_code icode = code_for_pred_extend (mode);
-  rtx ops1[] = {dblw, operands[1]};
-  riscv_vector::emit_vlmax_insn (icode, riscv_vector::UNARY_OP, ops1);
-
-  icode = code_for_pred_extend (mode);
-  rtx ops2[] = {operands[0], dblw};
-  riscv_vector::emit_vlmax_insn (icode, riscv_vector::UNARY_OP, ops2);
+  emit_insn (gen_extend2 (dblw, operands[1]));
+  emit_insn (gen_extend2 (operands[0], dblw));
   DONE;
 })
 
@@ -791,9 +786,7 @@
   insn_code icode = code_for_pred_rod_trunc (mode);
   risc

[PATCH 2/4] RISC-V: Add conditional autovec convert(INT<->INT) patterns

2023-08-31 Thread Lehua Ding
gcc/ChangeLog:

* config/riscv/autovec-opt.md (*cond_):
New combine pattern.
(*cond_): Ditto.
(*cond_): Ditto.
(*cond_trunc): Ditto.
* config/riscv/autovec.md (2): Adjust.
(2): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/narrow-3.c: Adjust.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-1.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-2.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int_run-2.c: New test.

---
 gcc/config/riscv/autovec-opt.md   | 77 +++
 gcc/config/riscv/autovec.md   | 37 -
 .../riscv/rvv/autovec/binop/narrow-3.c|  2 +-
 .../rvv/autovec/cond/cond_convert_int2int-1.h | 47 +++
 .../rvv/autovec/cond/cond_convert_int2int-2.h | 46 +++
 .../cond/cond_convert_int2int-rv32-1.c| 17 
 .../cond/cond_convert_int2int-rv32-2.c| 16 
 .../cond/cond_convert_int2int-rv64-1.c| 16 
 .../cond/cond_convert_int2int-rv64-2.c| 16 
 .../autovec/cond/cond_convert_int2int_run-1.c | 31 
 .../autovec/cond/cond_convert_int2int_run-2.c | 30 
 11 files changed, 311 insertions(+), 24 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-1.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-2.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int_run-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int_run-2.c

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index 92590776c3e..6796239d82d 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -747,3 +747,80 @@
   riscv_vector::BINARY_OP, operands);
   DONE;
 })
+
+;; Combine sign_extend/zero_extend(vf2) and vcond_mask
+(define_insn_and_split "*cond_"
+  [(set (match_operand:VWEXTI 0 "register_operand")
+(if_then_else:VWEXTI
+  (match_operand: 1 "register_operand")
+  (any_extend:VWEXTI (match_operand: 2 "register_operand"))
+  (match_operand:VWEXTI 3 "register_operand")))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  insn_code icode = code_for_pred_vf2 (, mode);
+  rtx ops[] = {operands[0], operands[1], operands[2], operands[3],
+   gen_int_mode (GET_MODE_NUNITS (mode), Pmode)};
+  riscv_vector::expand_cond_len_unop (icode, ops);
+  DONE;
+})
+
+;; Combine sign_extend/zero_extend(vf4) and vcond_mask
+(define_insn_and_split "*cond_"
+  [(set (match_operand:VQEXTI 0 "register_operand")
+(if_then_else:VQEXTI
+  (match_operand: 1 "register_operand")
+  (any_extend:VQEXTI (match_operand: 2 "register_operand"))
+  (match_operand:VQEXTI 3 "register_operand")))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  insn_code icode = code_for_pred_vf4 (, mode);
+  rtx ops[] = {operands[0], operands[1], operands[2], operands[3],
+   gen_int_mode (GET_MODE_NUNITS (mode), Pmode)};
+  riscv_vector::expand_cond_len_unop (icode, ops);
+  DONE;
+})
+
+;; Combine sign_extend/zero_extend(vf8) and vcond_mask
+(define_insn_and_split "*cond_"
+  [(set (match_operand:VOEXTI 0 "register_operand")
+(if_then_else:VOEXTI
+  (match_operand: 1 "register_operand")
+  (any_extend:VOEXTI (match_operand: 2 "register_operand"))
+  (match_operand:VOEXTI 3 "register_operand")))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  insn_code icode = code_for_pred_vf8 (, mode);
+  rtx ops[] = {operands[0], operands[1], operands[2], operands[3],
+   gen_int_mode (GET_MODE_NUNITS (mode), Pmode)};
+  riscv_vector::expand_cond_len_unop (icode, ops);
+  DONE;
+})
+
+;; Combine trunc(vf2) + vcond_mask
+(define_insn_and_split "*cond_trunc"
+  [(set (match_operand: 0 "register_operand")
+(if_then_else:
+

[PATCH 0/4] Add conditional autovec convert patterns

2023-08-31 Thread Lehua Ding
Hi,

These patches support combining convert_op + vcond_mask into a convert_op with
a mask operand. The method is to keep the vector convert patterns simple (by
changing define_expand to define_insn_and_split) until the combine pass, and to
introduce corresponding patterns that match the result of the combine.
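
As an illustration of the source shape this targets, here is a minimal
sketch of a loop with a conversion whose result is selected under a
predicate. The function name select_widened and its parameters are
hypothetical and are not copied from the new test files; whether GCC
vectorizes it into a single masked convert depends on the target options
and is not verified here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example loop: a widening INT->INT conversion whose result
   is only selected where PRED is nonzero; otherwise the element from B
   is kept.  After vectorization this is a convert feeding a select.  */
void
select_widened (int64_t *restrict r, const int32_t *restrict a,
                const int64_t *restrict b, const int32_t *restrict pred,
                int n)
{
  for (int i = 0; i < n; i++)
    r[i] = pred[i] ? (int64_t) a[i] : b[i];
}
```

Replacing the cast with an FP<->FP or INT<->FP conversion gives the
corresponding shapes for the other patches in the series.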

Best,
Lehua

Lehua Ding (4):
  RISC-V: Adjust expand_cond_len_{unary,binop,op} api
  RISC-V: Add conditional autovec convert(INT<->INT) patterns
  RISC-V: Add conditional autovec convert(FP<->FP) patterns
  RISC-V: Add conditional autovec convert(INT<->FP) patterns

 gcc/config/riscv/autovec-opt.md   | 236 ++
 gcc/config/riscv/autovec.md   | 110 
 gcc/config/riscv/riscv-protos.h   |   4 +-
 gcc/config/riscv/riscv-v.cc   |  39 +--
 .../riscv/rvv/autovec/binop/narrow-3.c|   2 +-
 .../autovec/cond/cond_convert_float2float-1.h |  29 +++
 .../autovec/cond/cond_convert_float2float-2.h |  28 +++
 .../cond/cond_convert_float2float-rv32-1.c|   9 +
 .../cond/cond_convert_float2float-rv32-2.c|   9 +
 .../cond/cond_convert_float2float-rv64-1.c|   9 +
 .../cond/cond_convert_float2float-rv64-2.c|   9 +
 .../cond/cond_convert_float2float_run-1.c |  31 +++
 .../cond/cond_convert_float2float_run-2.c |  30 +++
 .../autovec/cond/cond_convert_float2int-1.h   |  51 
 .../autovec/cond/cond_convert_float2int-2.h   |  50 
 .../cond/cond_convert_float2int-rv32-1.c  |  15 ++
 .../cond/cond_convert_float2int-rv32-2.c  |  15 ++
 .../cond/cond_convert_float2int-rv64-1.c  |  15 ++
 .../cond/cond_convert_float2int-rv64-2.c  |  15 ++
 .../cond/cond_convert_float2int_run-1.c   |  32 +++
 .../cond/cond_convert_float2int_run-2.c   |  31 +++
 .../autovec/cond/cond_convert_int2float-1.h   |  45 
 .../autovec/cond/cond_convert_int2float-2.h   |  44 
 .../cond/cond_convert_int2float-rv32-1.c  |  13 +
 .../cond/cond_convert_int2float-rv32-2.c  |  13 +
 .../cond/cond_convert_int2float-rv64-1.c  |  13 +
 .../cond/cond_convert_int2float-rv64-2.c  |  13 +
 .../cond/cond_convert_int2float_run-1.c   |  32 +++
 .../cond/cond_convert_int2float_run-2.c   |  31 +++
 .../rvv/autovec/cond/cond_convert_int2int-1.h |  47 
 .../rvv/autovec/cond/cond_convert_int2int-2.h |  46 
 .../cond/cond_convert_int2int-rv32-1.c|  17 ++
 .../cond/cond_convert_int2int-rv32-2.c|  16 ++
 .../cond/cond_convert_int2int-rv64-1.c|  16 ++
 .../cond/cond_convert_int2int-rv64-2.c|  16 ++
 .../autovec/cond/cond_convert_int2int_run-1.c |  31 +++
 .../autovec/cond/cond_convert_int2int_run-2.c |  30 +++
 37 files changed, 1124 insertions(+), 68 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-1.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-2.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv32-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv32-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv64-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-rv64-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float_run-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float_run-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-1.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-2.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv32-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv32-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv64-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv64-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int_run-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int_run-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-1.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-2.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/au

[PATCH 1/4] RISC-V: Adjust expand_cond_len_{unary,binop,op} api

2023-08-31 Thread Lehua Ding
This patch changes expand_cond_len_{unary,binop}'s argument `rtx_code code`
to `unsigned icode` and uses the icode directly to determine whether the
rounding_mode operand is required.

gcc/ChangeLog:

* config/riscv/autovec.md: Adjust.
* config/riscv/riscv-protos.h (expand_cond_len_unop): Ditto.
(expand_cond_len_binop): Ditto.
* config/riscv/riscv-v.cc (needs_fp_rounding): Ditto.
(expand_cond_len_op): Ditto.
(expand_cond_len_unop): Ditto.
(expand_cond_len_binop): Ditto.
(expand_cond_len_ternop): Ditto.

---
 gcc/config/riscv/autovec.md | 18 +++--
 gcc/config/riscv/riscv-protos.h |  4 ++--
 gcc/config/riscv/riscv-v.cc | 34 +++--
 3 files changed, 34 insertions(+), 22 deletions(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index ebe1b10aa12..006e174ebd5 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1551,7 +1551,8 @@
(match_operand 5 "const_0_operand")]
   "TARGET_VECTOR"
 {
-  riscv_vector::expand_cond_len_unop (, operands);
+  insn_code icode = code_for_pred (, mode);
+  riscv_vector::expand_cond_len_unop (icode, operands);
   DONE;
 })
 
@@ -1588,7 +1589,8 @@
(match_operand 5 "const_0_operand")]
   "TARGET_VECTOR"
 {
-  riscv_vector::expand_cond_len_unop (, operands);
+  insn_code icode = code_for_pred (, mode);
+  riscv_vector::expand_cond_len_unop (icode, operands);
   DONE;
 })
 
@@ -1627,7 +1629,8 @@
(match_operand 6 "const_0_operand")]
   "TARGET_VECTOR"
 {
-  riscv_vector::expand_cond_len_binop (, operands);
+  insn_code icode = code_for_pred (, mode);
+  riscv_vector::expand_cond_len_binop (icode, operands);
   DONE;
 })
 
@@ -1667,7 +1670,8 @@
(match_operand 6 "const_0_operand")]
   "TARGET_VECTOR"
 {
-  riscv_vector::expand_cond_len_binop (, operands);
+  insn_code icode = code_for_pred (, mode);
+  riscv_vector::expand_cond_len_binop (icode, operands);
   DONE;
 })
 
@@ -1707,7 +1711,8 @@
(match_operand 6 "const_0_operand")]
   "TARGET_VECTOR"
 {
-  riscv_vector::expand_cond_len_binop (, operands);
+  insn_code icode = code_for_pred (, mode);
+  riscv_vector::expand_cond_len_binop (icode, operands);
   DONE;
 })
 
@@ -1745,7 +1750,8 @@
(match_operand 6 "const_0_operand")]
   "TARGET_VECTOR"
 {
-  riscv_vector::expand_cond_len_binop (, operands);
+  insn_code icode = code_for_pred (, mode);
+  riscv_vector::expand_cond_len_binop (icode, operands);
   DONE;
 })
 
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index e145ee6c69b..dd7aa360ec5 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -426,8 +426,8 @@ bool neg_simm5_p (rtx);
 bool has_vi_variant_p (rtx_code, rtx);
 void expand_vec_cmp (rtx, rtx_code, rtx, rtx);
 bool expand_vec_cmp_float (rtx, rtx_code, rtx, rtx, bool);
-void expand_cond_len_unop (rtx_code, rtx *);
-void expand_cond_len_binop (rtx_code, rtx *);
+void expand_cond_len_unop (unsigned, rtx *);
+void expand_cond_len_binop (unsigned, rtx *);
 void expand_reduction (rtx_code, rtx *, rtx,
   reduction_type = reduction_type::UNORDERED);
 #endif
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 6228ff3d92e..89ac4743f40 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -245,6 +245,12 @@ public:
   always Pmode.  */
if (mode == VOIDmode)
  mode = Pmode;
+   else
+ /* Early assertion ensures same mode since maybe_legitimize_operand
+will check this.  */
+ gcc_assert (GET_MODE (ops[opno]) == VOIDmode
+ || GET_MODE (ops[opno]) == mode);
+
add_input_operand (ops[opno], mode);
   }
 
@@ -291,6 +297,7 @@ public:
 if (m_insn_flags & FRM_DYN_P)
   add_rounding_mode_operand (FRM_DYN);
 
+gcc_assert (insn_data[(int) icode].n_operands == m_opno);
 expand (icode, any_mem_p);
   }
 
@@ -2951,17 +2958,20 @@ expand_load_store (rtx *ops, bool is_load)
 
 /* Return true if the operation is the floating-point operation need FRM.  */
 static bool
-needs_fp_rounding (rtx_code code, machine_mode mode)
+needs_fp_rounding (unsigned icode, machine_mode mode)
 {
   if (!FLOAT_MODE_P (mode))
 return false;
-  return code != SMIN && code != SMAX && code != NEG && code != ABS;
+
+  return icode != maybe_code_for_pred (SMIN, mode)
+&& icode != maybe_code_for_pred (SMAX, mode)
+&& icode != maybe_code_for_pred (NEG, mode)
+&& icode != maybe_code_for_pred (ABS, mode);
 }
 
 /* Subroutine to expand COND_LEN_* patterns.  */
 static void
-expand_cond_len_op (rtx_code code, unsigned icode, insn_flags op_type, rtx *ops,
-   rtx len)
+expand_cond_len_op (unsigned icode, insn_flags op_type, rtx *ops, rtx len)
 {
   rtx dest = ops[0];
   rtx mask = ops[1];
@@ -2980,7 +2990,7 @@ expand_cond_len_op (rtx_code code, unsigned icode, insn_flags op_type, r

[PATCH v1] RISC-V: Support FP ADD/SUB/MUL/DIV autovec for VLS mode

2023-08-31 Thread Pan Li via Gcc-patches
From: Pan Li 

This patch allows VLS mode autovec for the floating-point binary
operations ADD/SUB/MUL/DIV.

Given the code example below:

void test (float *out, float *in1, float *in2)
{
  for (int i = 0; i < 128; i++)
    out[i] = in1[i] + in2[i];
}

Before this patch:
test:
  csrr a4,vlenb
  slli a4,a4,1
  li   a5,128
  bleu a5,a4,.L38
  mv   a5,a4
.L38:
  vsetvli  zero,a5,e32,m8,ta,ma
  vle32.v  v16,0(a1)
  vsetvli  a4,zero,e32,m8,ta,ma
  vmv.v.i  v8,0
  vsetvli  zero,a5,e32,m8,tu,ma
  vle32.v  v24,0(a2)
  vfadd.vv v8,v24,v16
  vse32.v  v8,0(a0)
  ret

After this patch:
test:
  li   a5,128
  vsetvli  zero,a5,e32,m1,ta,ma
  vle32.v  v1,0(a2)
  vle32.v  v2,0(a1)
  vfadd.vv v1,v1,v2
  vse32.v  v1,0(a0)
  ret

Please note this patch also fixes the execution failures of the vect test
cases below.

* vect-alias-check-10.c
* vect-alias-check-11.c
* vect-alias-check-12.c
* vect-alias-check-14.c

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/autovec-vls.md (3): New pattern for
vls floating-point autovec.
* config/riscv/vector-iterators.md: New iterator for
floating-point V and VLS.
* config/riscv/vector.md: Add VLS to floating-point binop.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/def.h:
* gcc.target/riscv/rvv/autovec/vls/floating-point-add-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-add-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-add-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-div-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-div-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-div-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-mul-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-mul-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-mul-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-sub-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-sub-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/floating-point-sub-3.c: New test.
---
 gcc/config/riscv/autovec-vls.md   | 24 ++
 gcc/config/riscv/vector-iterators.md  | 80 +++
 gcc/config/riscv/vector.md| 12 +--
 .../gcc.target/riscv/rvv/autovec/vls/def.h|  8 ++
 .../rvv/autovec/vls/floating-point-add-1.c| 43 ++
 .../rvv/autovec/vls/floating-point-add-2.c| 43 ++
 .../rvv/autovec/vls/floating-point-add-3.c| 43 ++
 .../rvv/autovec/vls/floating-point-div-1.c| 43 ++
 .../rvv/autovec/vls/floating-point-div-2.c| 43 ++
 .../rvv/autovec/vls/floating-point-div-3.c| 43 ++
 .../rvv/autovec/vls/floating-point-mul-1.c| 43 ++
 .../rvv/autovec/vls/floating-point-mul-2.c| 43 ++
 .../rvv/autovec/vls/floating-point-mul-3.c| 43 ++
 .../rvv/autovec/vls/floating-point-sub-1.c| 43 ++
 .../rvv/autovec/vls/floating-point-sub-2.c| 43 ++
 .../rvv/autovec/vls/floating-point-sub-3.c| 43 ++
 16 files changed, 634 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/floating-point-add-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/floating-point-add-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/floating-point-add-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/floating-point-div-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/floating-point-div-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/floating-point-div-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/floating-point-mul-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/floating-point-mul-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/floating-point-mul-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/floating-point-sub-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/floating-point-sub-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/floating-point-sub-3.c

diff --git a/gcc/config/riscv/autovec-vls.md b/gcc/config/riscv/autovec-vls.md
index 503ad691b9a..4ca640c11e2 100644
--- a/gcc/config/riscv/autovec-vls.md
+++ b/gcc/config/riscv/autovec-vls.md
@@ -208,6 +208,30 @@ (define_insn_and_split "3"
 [(set_attr "type" "vector")]
 )
 
+;; -
+;;  [FP] Binary operations
+;; -
+;; Includes:
+;; - vfadd.vv/vfsub.vv/vfmul.vv/vfdiv.vv
+;; - vfadd.vf/vfsub.vf/vfmul.vf/vfdiv.vf
+;; -
+(d

[PATCH v6] LoongArch:Implement 128-bit floating point functions in gcc.

2023-08-31 Thread chenxiaolong
Brief version history of patch set:

v1 -> v2:
   Adjust the formatting of the implementations of the functions with the
"q" suffix to follow the GNU coding standards.

v2 -> v3:

   1. On the LoongArch architecture, following the behavior of the 64-bit
functions, modify the underlying implementation of the __builtin_{nanq, nansq}
functions in libgcc.

   2. Modify the instruction templates to use instructions such as
"bstrins.d" to implement the 128-bit __builtin_{fabsq, copysignq} functions
instead of calling into libgcc, so as to make better use of the machine's
performance.

v3 -> v4:

   1. v1, v2, and v3 all implemented the 128-bit floating-point functions
with the "q" suffix, but that is an older approach. v4 abandons it completely
and instead associates the "q"-suffixed 128-bit floating-point functions with
the "f128" functions that already exist in GCC.

   2. Modify the code so that both the "__float128" and "_Float128" types are
supported by GCC.

   3. Associating the "q"-suffixed functions with the "f128" functions allows
the two spellings to produce the same effect, for example
__builtin_{huge_{valq/valf128},{infq/inff128},{nanq/nanf128},{nansq/nansf128},
{fabsq/fabsf128}}.

   4. For the __builtin_copysignq function, do not call the new "f128"
implementation; instead use "bstrins" and other instructions in the machine
description file to implement it, which reduces the number of assembly
instructions and gives the best optimization result.

v4 -> v5:

   Removed the v4 implementation of the __builtin_fabsf128() function added
to LoongArch.md.

v5 -> v6:

   1. Modify the test cases in the math-float-128.c file.

   2. Removed the v5 implementation of the __builtin_copysignf128() function
added to LoongArch.md.

During implementation, float128_type_node is bound to the type name "__float128"
so that the compiler can correctly identify the types of the functions. The
"q" suffix is associated with the "f128" functions, which lets GCC accept
either spelling in user code for functions such as
__builtin_{huge_valq, infq, fabsq, copysignq, nanq, nansq}.
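
The sign-bit approach mentioned in the version history (implementing fabs
and copysign with bit-field instructions rather than a libgcc call, as tried
in earlier versions) can be sketched in portable C++. This is only an
illustration under assumptions: Binary128Bits, fabs_bits and copysign_bits
are hypothetical names, and the struct models an IEEE binary128 value as two
64-bit halves with the sign in the top bit of the high half. It is not code
from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a binary128 value: the sign bit is bit 63 of `hi`.
struct Binary128Bits
{
  std::uint64_t lo;
  std::uint64_t hi;
};

const std::uint64_t kSignMask = std::uint64_t{1} << 63;

// fabs: clear the sign bit and leave every other bit untouched.
inline Binary128Bits fabs_bits (Binary128Bits x)
{
  x.hi &= ~kSignMask;
  return x;
}

// copysign: keep the magnitude of x and insert the sign bit of y.
inline Binary128Bits copysign_bits (Binary128Bits x, Binary128Bits y)
{
  x.hi = (x.hi & ~kSignMask) | (y.hi & kSignMask);
  return x;
}
```

Both operations touch only one bit of the high half, which is why a single
bit-field insert can stand in for a library call.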

gcc/ChangeLog:

* config/loongarch/loongarch-builtins.cc (loongarch_init_builtins):
Associate the __float128 type to float128_type_node so that it can
be recognized by the compiler.
* config/loongarch/loongarch-c.cc (loongarch_cpu_cpp_builtins):
Add the flag "FLOAT128_TYPE" to gcc and associate a function
with the suffix "q" to "f128".
* doc/extend.texi: Added support for 128-bit floating-point functions on
the LoongArch architecture.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/math-float-128.c: New test.
---
 gcc/config/loongarch/loongarch-builtins.cc|  5 ++
 gcc/config/loongarch/loongarch-c.cc   | 11 +++
 gcc/doc/extend.texi   | 20 -
 .../gcc.target/loongarch/math-float-128.c | 81 +++
 4 files changed, 114 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/math-float-128.c

diff --git a/gcc/config/loongarch/loongarch-builtins.cc b/gcc/config/loongarch/loongarch-builtins.cc
index b929f224dfa..58b612bf445 100644
--- a/gcc/config/loongarch/loongarch-builtins.cc
+++ b/gcc/config/loongarch/loongarch-builtins.cc
@@ -256,6 +256,11 @@ loongarch_init_builtins (void)
   unsigned int i;
   tree type;
 
+  /* Register the type float128_type_node as a built-in type and
+ give it an alias "__float128".  */
+  (*lang_hooks.types.register_builtin_type) (float128_type_node,
+   "__float128");
+
   /* Iterate through all of the bdesc arrays, initializing all of the
  builtin functions.  */
   for (i = 0; i < ARRAY_SIZE (loongarch_builtins); i++)
diff --git a/gcc/config/loongarch/loongarch-c.cc b/gcc/config/loongarch/loongarch-c.cc
index 67911b78f28..6ffbf748316 100644
--- a/gcc/config/loongarch/loongarch-c.cc
+++ b/gcc/config/loongarch/loongarch-c.cc
@@ -99,6 +99,17 @@ loongarch_cpu_cpp_builtins (cpp_reader *pfile)
   else
 builtin_define ("__loongarch_frlen=0");
 
+  /* Add support for FLOAT128_TYPE on the LoongArch architecture.  */
+  builtin_define ("__FLOAT128_TYPE__");
+
+  /* Map the old _Float128 'q' builtins into the new 'f128' builtins.  */
+  builtin_define ("__builtin_fabsq=__builtin_fabsf128");
+  builtin_define ("__builtin_copysignq=__builtin_copysignf128");
+  builtin_define ("__builtin_nanq=__builtin_nanf128");
+  builtin_define ("__builtin_nansq=__builtin_nansf128");
+  builtin_define ("__builtin_infq=__builtin_inff128");
+  builtin_define ("__builtin_huge_valq=__builtin_huge_valf128");
+
   /* Native Data Sizes.  */
   builtin_define_with_int_value ("_LOONGARCH_SZINT", INT_TYPE_SIZE);
   builtin_define_with_int_value ("_LOONGARCH_SZLONG", LONG_TYPE_SIZE

Re: [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]

2023-08-31 Thread Hans-Peter Nilsson via Gcc-patches
(Looks like this was committed as r14-3580-g597b9ec69bca8a)

> Cc: g...@gcc.gnu.org, gcc-patches@gcc.gnu.org, Eric Feng 
> From: Eric Feng via Gcc 

> gcc/testsuite/ChangeLog:
>   PR analyzer/107646
>   * gcc.dg/plugin/analyzer_cpython_plugin.c: Implements reference count
>   * checking for PyObjects.
>   * gcc.dg/plugin/cpython-plugin-test-2.c: Moved to...
>   * gcc.dg/plugin/cpython-plugin-test-PyList_Append.c: ...here (and
>   * added more tests).
>   * gcc.dg/plugin/cpython-plugin-test-1.c: Moved to...
>   * gcc.dg/plugin/cpython-plugin-test-no-plugin.c: ...here (and added
>   * more tests).
>   * gcc.dg/plugin/plugin.exp: New tests.
>   * gcc.dg/plugin/cpython-plugin-test-PyList_New.c: New test.
>   * gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c: New test.
>   * gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c: New test.

It seems this was more or less a rewrite, but that said,
it's generally preferable to always *add* tests, never *modify* them.

>  .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 376 +-

^^^ Ouch!  Was it not within reason to keep that test as it
was, and just add another test?

Anyway, the test after rewrite fails, and for some targets
like cris-elf and apparently m68k-linux, yields an error.
I see a PR was already opened.

Also, mostly for future reference, several files in the
patch miss a final newline, as seen by a "\ No newline at
end of file"-marker.

I think I found the problem; a mismatch in the default C++
language standard between host-gcc and target-gcc.

(It's actually *not* as simple as "auto var = typeofvar()"
not being recognized in C++11 --or else there'd be an error
for the hash_set declaration too, which I just changed for
consistency-- but it's close enough for me.)
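
For background on the language rule in play here: before C++17,
"auto x = T ();" formally requires an accessible, non-deleted copy or move
constructor even when the copy is elided, whereas "T x;" only needs the
default constructor. The sketch below illustrates that general rule with a
hypothetical non-copyable Table type. It is an assumption that this rule is
related to the failure; as noted above, the real cause is not quite this
simple.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical container-like type that can be neither copied nor moved.
class Table
{
public:
  Table () : m_count (0) {}
  Table (const Table &) = delete;
  Table &operator= (const Table &) = delete;

  void add () { ++m_count; }
  int count () const { return m_count; }

private:
  int m_count;
};

static_assert (!std::is_copy_constructible<Table>::value,
	       "Table is intentionally non-copyable");

inline int count_after_two_adds ()
{
  // auto t = Table ();  // ill-formed in C++11/14: needs the deleted copy ctor
  Table t;               // accepted by every standard: default construction
  t.add ();
  t.add ();
  return t.count ();
}
```

Declaring the variable directly sidesteps the question of which standard the
compiler defaults to.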

With this, retesting plugin.exp for cris-elf works.

Ok to commit?

-- >8 --
From: Hans-Peter Nilsson 
Date: Fri, 1 Sep 2023 04:36:03 +0200
Subject: [PATCH] testsuite: Fix analyzer_cpython_plugin.c declarations, PR testsuite/111264

Also, add missing newline at end of file.

PR testsuite/111264
* gcc.dg/plugin/analyzer_cpython_plugin.c: Make declarations
C++11-compatible.
---
 gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
index 7af520436549..bf1982e79c37 100644
--- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
+++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
@@ -477,8 +477,8 @@ pyobj_refcnt_checker (const region_model *model,
   if (!ctxt)
 return;
 
-  auto region_to_refcnt = hash_map ();
-  auto seen_regions = hash_set ();
+  hash_map region_to_refcnt;
+  hash_set seen_regions;
 
   count_pyobj_references (model, region_to_refcnt, retval, seen_regions);
   check_refcnts (model, old_model, retval, ctxt, region_to_refcnt);
@@ -561,7 +561,7 @@ public:
 if (!ctxt)
   return;
 region_model *model = cd.get_model ();
-auto region_to_refcnt = hash_map ();
+hash_map region_to_refcnt;
 count_all_references(model, region_to_refcnt);
 dump_refcnt_info(region_to_refcnt, model, ctxt);
   }
@@ -1330,4 +1330,4 @@ plugin_init (struct plugin_name_args *plugin_info,
   sorry_no_analyzer ();
 #endif
   return 0;
-}
\ No newline at end of file
+}
-- 
2.30.2

brgds, H-P


[PATCH] MATCH: `(nop_convert)-a` into -(nop_convert)a if the negate is single use and a is known not to be signed min value

2023-08-31 Thread Andrew Pinski via Gcc-patches
This pushes the conversion further down the chain, which allows more
conversions to be optimized away in many cases.
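
A small sketch of the identity behind this transformation: negating in the
unsigned type and then converting to int gives the same value as converting
first and negating the int, provided the converted value is not INT_MIN
(where the signed negation would overflow). The helper names are
hypothetical, and to_signed spells out the two's-complement conversion by
hand so that the sketch does not rely on implementation-defined behavior.

```cpp
#include <cassert>
#include <climits>

// Two's-complement reinterpretation of an unsigned value as int.
inline int to_signed (unsigned u)
{
  return u <= static_cast<unsigned> (INT_MAX)
	   ? static_cast<int> (u)
	   : -static_cast<int> (~u) - 1;
}

// Shape before the transformation: (int) -t, negate in the unsigned type.
inline int negate_then_convert (unsigned t)
{
  return to_signed (-t);
}

// Shape after the transformation: -(int) t, negate in the signed type.
// Only valid when to_signed (t) != INT_MIN, otherwise the negate overflows.
inline int convert_then_negate (unsigned t)
{
  return -to_signed (t);
}
```

The first form is well defined for every input, which is why the pattern has
to prove the absence of the signed minimum before switching to the second.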

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/107765
PR tree-optimization/107137

gcc/ChangeLog:

* match.pd (`(nop_convert)-a`): New pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/neg-cast-1.c: New test.
* gcc.dg/tree-ssa/neg-cast-2.c: New test.
* gcc.dg/tree-ssa/neg-cast-3.c: New test.
---
 gcc/match.pd   | 31 ++
 gcc/testsuite/gcc.dg/tree-ssa/neg-cast-1.c | 17 
 gcc/testsuite/gcc.dg/tree-ssa/neg-cast-2.c | 20 ++
 gcc/testsuite/gcc.dg/tree-ssa/neg-cast-3.c | 15 +++
 4 files changed, 83 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/neg-cast-1.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/neg-cast-2.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/neg-cast-3.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 487a7e38719..3cff9b03d92 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -959,6 +959,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 #endif

 
+/* (nop_cast)-var -> -(nop_cast)(var)
+   if -var is known to not overflow; that is does not include
+   the signed integer MIN. */
+(simplify
+ (convert (negate:s @0))
+ (if (INTEGRAL_TYPE_P (type)
+  && tree_nop_conversion_p (type, TREE_TYPE (@0)))
+  (with {
+/* If the top is not set, there is no overflow happening. */
+bool contains_signed_min = !wi::ges_p (tree_nonzero_bits (@0), 0);
+#if GIMPLE
+int_range_max vr;
+if (contains_signed_min
+&& TREE_CODE (@0) == SSA_NAME
+   && get_range_query (cfun)->range_of_expr (vr, @0)
+   && !vr.undefined_p ())
+  {
+tree stype = signed_type_for (type);
+   auto minvalue = wi::min_value (stype);
+   int_range_max valid_range (TREE_TYPE (@0), minvalue, minvalue);
+   vr.intersect (valid_range);
+   /* If the range does not include min value,
+  then we can do this change around. */
+   if (vr.undefined_p ())
+ contains_signed_min = false;
+  }
+#endif
+   }
+   (if (!contains_signed_min)
+(negate (convert @0))
+
 (for op (negate abs)
  /* Simplify cos(-x) and cos(|x|) -> cos(x).  Similarly for cosh.  */
  (for coss (COS COSH)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/neg-cast-1.c b/gcc/testsuite/gcc.dg/tree-ssa/neg-cast-1.c
new file mode 100644
index 000..7ddf40aca29
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/neg-cast-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+/* PR tree-optimization/107765 */
+
+#include <limits.h>
+
+int a(int input)
+{
+if (input == INT_MIN) __builtin_unreachable();
+unsigned t = input;
+int tt =  -t;
+return tt == -input;
+}
+
+/* Should be able to optimize this down to just `return 1;` during evrp. */
+/* { dg-final { scan-tree-dump "return 1;" "evrp" } } */
+/* { dg-final { scan-tree-dump-not " - " "evrp" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/neg-cast-2.c b/gcc/testsuite/gcc.dg/tree-ssa/neg-cast-2.c
new file mode 100644
index 000..ce49079e235
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/neg-cast-2.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-fre3 -fdump-tree-optimized" } */
+/* part of PR tree-optimization/108397 */
+
+long long
+foo (unsigned char o)
+{
+  unsigned long long t1 = -(long long) (o == 0);
+  unsigned long long t2 = -(long long) (t1 != 0);
+  unsigned long long t3 = -(long long) (t1 <= t2);
+  return t3;
+}
+
+/* Should be able to optimize this down to just `return -1;` during fre3. */
+/* { dg-final { scan-tree-dump "return -1;" "fre3" } } */
+/* FRE does not remove all dead statements */
+/* { dg-final { scan-tree-dump-not " - " "fre3" { xfail *-*-* } } } */
+
+/* { dg-final { scan-tree-dump "return -1;" "optimized" } } */
+/* { dg-final { scan-tree-dump-not " - " "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/neg-cast-3.c b/gcc/testsuite/gcc.dg/tree-ssa/neg-cast-3.c
new file mode 100644
index 000..a26a6051bda
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/neg-cast-3.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-forwprop2 -fdump-tree-optimized" } */
+/* PR tree-optimization/107137 */
+
+unsigned f(_Bool a)
+{
+  int t = a;
+  t = -t;
+  return t;
+}
+
+/* There should be no cast to int at all. */
+/* Forwprop2 does not remove all of the statements. */
+/* { dg-final { scan-tree-dump-not "\\\(int\\\)" "forwprop2" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-not "\\\(int\\\)" "optimized" } } */
-- 
2.31.1



Re: [PING][PATCH] LoongArch: initial ada support on linux

2023-08-31 Thread Yujie Yang
On Thu, Aug 31, 2023 at 03:09:52PM +0200, Marc Poulhiès wrote:
> 
> Yang Yujie  writes:
> 
> Hello Yujie,
> 
> > gcc/ChangeLog:
> >
> > * ada/Makefile.rtl: Add LoongArch support.
> > * ada/libgnarl/s-linux__loongarch.ads: New.
> > * ada/libgnat/system-linux-loongarch.ads: New.
> > * config/loongarch/loongarch.h: mark normalized options
> > passed from driver to gnat1 as explicit for multilib.
> > ---
> >  gcc/ada/Makefile.rtl   |  49 +++
> >  gcc/ada/libgnarl/s-linux__loongarch.ads| 134 +++
> >  gcc/ada/libgnat/system-linux-loongarch.ads | 145 +
> 
> The Ada part of the patch looks correct, thanks.
> 
> >  gcc/config/loongarch/loongarch.h   |   4 +-
> >  4 files changed, 330 insertions(+), 2 deletions(-)
> > diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h
> > index f8167875646..9887a7ac630 100644
> > --- a/gcc/config/loongarch/loongarch.h
> > +++ b/gcc/config/loongarch/loongarch.h
> > @@ -83,9 +83,9 @@ along with GCC; see the file COPYING3.  If not see
> >  /* CC1_SPEC is the set of arguments to pass to the compiler proper.  */
> >
> >  #undef CC1_SPEC
> > -#define CC1_SPEC "\
> > +#define CC1_SPEC "%{,ada:-gnatea} %{m*} \
> >  %{G*} \
> > -%(subtarget_cc1_spec)"
> > +%(subtarget_cc1_spec) %{,ada:-gnatez}"
> >
> >  /* Preprocessor specs.  */
> 
> This is outside of ada/ (so I don't have a say on it), but I'm curious
> about why you need to use -gnatea/-gnatez here?
> 
> Thanks,
> Marc

Hi Marc,

Thank you for the review!

We added -gnatea and -gnatez to CC1_SPEC for correct multilib handling,
and I believe this is currently specific to LoongArch.

LoongArch relies on the GCC driver (via self_specs rules) to generate a
canonicalized tuple of parameters that identifies the current target (ISA/ABI)
configuration, including the "-mabi=" option that corresponds to the selected
multilib variant.  Even if "-mabi=" itself is not given explicitly to gcc, it
may be passed to the compiler proper with a value other than the default ABI.

For GNAT on LoongArch, it is necessary that -mabi= generated by driver
self-specs gets stored in the .ali file, otherwise the linker might
hit the wrong multilib variant by assuming the default ABI.  Using
-gnatea/-gnatez can mark the driver-generated "-mabi=" as "explicit",
so it is sure to be found in "A"-records of the generated *.ali file.

Currently, gnatmake only marks user-specified options as explicit with
-gnatea and -gnatez, but not others [gcc/ada/make.adb].  So I think it's
necessary to have these marks around our driver-canonicalized %{m*} tuple
as well.

(Not sure if we should also mark non-multilib-related options other than
"-mabi=" as explicit, but it doesn't seem to do any harm.)

Sincerely,
Yujie



Re: Re: [PATCH 1/2] allow targets to check shrink-wrap-separate enabled or not

2023-08-31 Thread Fei Gao


On 2023-08-29 09:46  Jeff Law  wrote:
>
>
>
>On 8/28/23 19:28, Fei Gao wrote:
>> On 2023-08-29 06:54  Jeff Law  wrote:
>>>
>>>
>>>
>>> On 8/28/23 01:47, Fei Gao wrote:
 no functional changes but allow targets to check shrink-wrap-separate 
 enabled or not.

      gcc/ChangeLog:

    * shrink-wrap.cc (try_shrink_wrapping_separate):call
      use_shrink_wrapping_separate.
    (use_shrink_wrapping_separate): wrap the condition
      check in use_shrink_wrapping_separate.
    * shrink-wrap.h (use_shrink_wrapping_separate): add to extern
>>> So as I mentioned earlier today in the older thread, can we use
>>> override_options to do this?
>>>
>>> If we look at aarch64_override_options we have this:
>>>
>>>     /* The pass to insert speculation tracking runs before
>>>    shrink-wrapping and the latter does not know how to update the
>>>    tracking status.  So disable it in this case.  */
>>>     if (aarch64_track_speculation)
>>>   flag_shrink_wrap = 0;
>>>
>>> We kind of want this instead
>>>
>>>     if (flag_shrink_wrap)
>>>   {
>>>     turn off whatever target bits enable the cm.push/cm.pop insns
>>>   }
>>>
>>>
>>> This does imply that we have a distinct target flag to enable/disable
>>> those instructions.  But that seems like a good thing to have anyway.
>> I'm afraid we cannot simply resolve the conflict based on
>> flag_shrink_wrap/flag_shrink_wrap_separate alone, as they're set to true from
>> -O1 onwards, which means zcmp is almost always disabled unless
>> -fno-shrink-wrap/-fno-shrink-wrap-separate are explicitly given.
>Yea, but I would generally expect that if someone is really concerned
>about code size, they're probably using -Os which (hopefully) would not
>have shrink-wrapping enabled.
>
>>
>> So after discussion with Kito, we would like to turn on zcmp for -Os and
>> shrink-wrap-separate for speed-preferring optimization levels.
>> use_shrink_wrapping_separate in this patch provides the opportunity for this
>> check. No new hook is needed.
>Seems reasonable to me if Kito is OK with it. 

Thanks Jeff and Kito for the discussion.
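
For readers following along, the policy agreed above can be sketched as a
small decision function. This is only an illustration: CodegenOptions,
PrologueStrategy and choose_prologue_strategy are hypothetical names, not
GCC interfaces, and the fallback to zcmp when separate shrink-wrapping is
switched off is an assumption rather than something settled in this thread.

```cpp
#include <cassert>

// Hypothetical summary of the options that matter for this decision.
struct CodegenOptions
{
  bool optimize_for_size;     // e.g. -Os
  bool has_zcmp;              // target provides cm.push/cm.pop
  bool shrink_wrap_separate;  // separate shrink-wrapping requested
};

enum class PrologueStrategy
{
  ZcmpPushPop,
  SeparateShrinkWrap,
  Default
};

inline PrologueStrategy
choose_prologue_strategy (const CodegenOptions &opts)
{
  // Size first: prefer the compact push/pop instructions under -Os.
  if (opts.has_zcmp && opts.optimize_for_size)
    return PrologueStrategy::ZcmpPushPop;
  // Speed first: separate shrink-wrapping wins when it is enabled.
  if (opts.shrink_wrap_separate)
    return PrologueStrategy::SeparateShrinkWrap;
  // Assumed fallback: use zcmp if present, otherwise the plain prologue.
  return opts.has_zcmp ? PrologueStrategy::ZcmpPushPop
		       : PrologueStrategy::Default;
}
```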

Could you please review the new series at your convenience?
https://patchwork.sourceware.org/project/gcc/list/?series=24065

BR, 
Fei

>
>jeff

Re: [PATCH v2 1/4] LoongArch: improved target configuration interface

2023-08-31 Thread Yujie Yang
On Thu, Aug 31, 2023 at 05:56:26PM +, Joseph Myers wrote:
> On Thu, 31 Aug 2023, Yujie Yang wrote:
> 
> > -If none of such suffix is present, the configured value of
> > -@option{--with-multilib-default} can be used as a common default suffix
> > -for all library ABI variants.  Otherwise, the default build option
> > -@code{-march=abi-default} is applied when building the variants without
> > -a suffix.
> > +If no such suffix is present for a given multilib variant, the
> > +configured value of @code{--with-multilib-default} is appended as a default
> > +suffix.  If @code{--with-multilib-default} is not given, the default build
> > +option @code{-march=abi-default} is applied when building the variants
> > +without a suffix.
> 
> @option is appropriate for --with-multilib-default and other configure 
> options; it shouldn't be changed to @code.
> 

Thanks! Fixed in the v3 patchset.



Re: [PATCH] analyzer: Add support of placement new and improved operator new [PR105948,PR94355]

2023-08-31 Thread David Malcolm via Gcc-patches
On Fri, 2023-09-01 at 00:04 +0200, priour...@gmail.com wrote:


> Hi, 
> 
> Successfully regstrapped off trunk 7f2ed06ddc825e8a4e0edfd1d66b5156e6dc1d34
> on x86_64-linux-gnu.
> 
> Is it OK for trunk ?

Hi Benjamin.

Thanks for the patch.  It's OK as-is, but it doesn't cover every
case...

[...snip...]

> diff --git a/gcc/analyzer/call-details.cc b/gcc/analyzer/call-details.cc
> index 66fb0fe871e..8d60e928b15 100644
> --- a/gcc/analyzer/call-details.cc
> +++ b/gcc/analyzer/call-details.cc
> @@ -295,6 +295,17 @@ call_details::get_arg_svalue (unsigned idx) const
>return m_model->get_rvalue (arg, m_ctxt);
>  }
>  
> +/* If argument IDX's svalue at the callsite is a region_svalue,
> +   return the region it points to.
> +   Otherwise return NULL.  */
> +
> +const region *
> +call_details::maybe_get_arg_region (unsigned idx) const
> +{
> +  const svalue *sval = get_arg_svalue (idx);
> +  return sval->maybe_get_region ();
> +}
> +

Is this the correct thing to be doing?  It's used in the following...

[...snip...]

> diff --git a/gcc/analyzer/kf-lang-cp.cc b/gcc/analyzer/kf-lang-cp.cc
> index 393b4f25e79..4450892dfa2 100644
> --- a/gcc/analyzer/kf-lang-cp.cc
> +++ b/gcc/analyzer/kf-lang-cp.cc

[...snip...]

> @@ -54,28 +90,75 @@ public:
>  region_model *model = cd.get_model ();
>  region_model_manager *mgr = cd.get_manager ();
>  const svalue *size_sval = cd.get_arg_svalue (0);
> -const region *new_reg
> -  = model->get_or_create_region_for_heap_alloc (size_sval, cd.get_ctxt ());
> -if (cd.get_lhs_type ())
> +region_model_context *ctxt = cd.get_ctxt ();
> +const gcall *call = cd.get_call_stmt ();
> +
> +/* If the call was actually a placement new, check that accessing
> +   the buffer lhs is placed into does not result in out-of-bounds.  */
> +if (is_placement_new_p (call))
>{
> - const svalue *ptr_sval
> -   = mgr->get_ptr_svalue (cd.get_lhs_type (), new_reg);
> - cd.maybe_set_lhs (ptr_sval);
> + const region *ptr_reg = cd.maybe_get_arg_region (1);
> + if (ptr_reg && cd.get_lhs_type ())
> +   {

...which will only fire if arg 1 is a region_svalue.  This won't
trigger if you have e.g. a binop_svalue for pointer arithmetic.

What happens e.g. for this one-off-the-end bug:

  void *p = malloc (4);
  if (!p)
    return;
  int32_t *i = ::new (p + 1) int32_t;
  *i = 42;

So maybe call_details::maybe_get_arg_region should instead be:

/* Return the region that argument IDX points to.  */

const region *
call_details::deref_ptr_arg (unsigned idx) const
{
  const svalue *ptr_sval = get_arg_svalue (idx);
  return m_model->deref_rvalue (ptr_sval, get_arg_tree (idx), m_ctxt);
}

(caveat: I didn't test this)

> + const region *base_reg = ptr_reg->get_base_region ();
> + const svalue *num_bytes_sval = cd.get_arg_svalue (0);
> + const region *sized_new_reg
> + = mgr->get_sized_region (base_reg,
> +  cd.get_lhs_type (),
> +  num_bytes_sval);

Why do you use the base_reg here, rather than just ptr_reg?

In the example above, the *(p + 1) has base region
heap_allocated_region, but the ptr_reg is one byte higher; hence
check_region_for_write of 4 bytes ought to detect a problem with
writing 4 bytes to *(p + 1), but wouldn't complain about the write to
*p.

...assuming that I'm reading this code correctly.
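
To make the concern concrete, here is a minimal model of the bounds check,
with hypothetical names that are not part of the analyzer. A write of 4 bytes
into a 4-byte allocation fits at offset 0 (the base region) but not at
offset 1 (the pointer actually passed to placement new), so checking against
the base region would miss the overrun.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model: does a write of `size` bytes starting at byte
// `offset` stay inside an allocation of `capacity` bytes?
inline bool write_in_bounds (std::size_t capacity, std::size_t offset,
			     std::size_t size)
{
  // Written to avoid overflow in offset + size.
  return offset <= capacity && size <= capacity - offset;
}
```

With capacity 4 and size 4, the check passes for offset 0 and fails for
offset 1, which matches the one-off-the-end example above.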

> + model->check_region_for_write (sized_new_reg,
> +nullptr,
> +ctxt);
> + const svalue *ptr_sval
> +   = mgr->get_ptr_svalue (cd.get_lhs_type (), sized_new_reg);
> + cd.maybe_set_lhs (ptr_sval);
> +   }
> +  }

[...snip...]

The patch is OK for trunk as is; but please can you look into the
above.

If the above is a problem, you can either do another version of the
patch, or do it as a followup patch (whichever you're more comfortable
with, but it might be best to get the patch into trunk as-is, given
that the GSoC period is nearly over).

Thanks
Dave



[PATCH] Add Types to Un-Typed Pic Instructions:

2023-08-31 Thread Edwin Lu
Related Discussion:
https://inbox.sourceware.org/gcc-patches/12fb5088-3f28-0a69-de1e-f387371a5...@gmail.com/

This patch updates the pic instructions to ensure that no insn is left
without a type attribute.

Tested for regressions using rv32/64 multilib with newlib/linux. 

gcc/ChangeLog:

* config/riscv/pic.md: Update types

Signed-off-by: Edwin Lu 
---
 gcc/config/riscv/pic.md | 30 --
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/gcc/config/riscv/pic.md b/gcc/config/riscv/pic.md
index da636e31619..cfaa670caf0 100644
--- a/gcc/config/riscv/pic.md
+++ b/gcc/config/riscv/pic.md
@@ -27,21 +27,24 @@ (define_insn "*local_pic_load"
(mem:ANYI (match_operand 1 "absolute_symbolic_operand" "")))]
   "USE_LOAD_ADDRESS_MACRO (operands[1])"
   "\t%0,%1"
-  [(set (attr "length") (const_int 8))])
+  [(set_attr "type" "load")
+   (set (attr "length") (const_int 8))])
 
 (define_insn "*local_pic_load_s"
   [(set (match_operand:SUPERQI 0 "register_operand" "=r")
(sign_extend:SUPERQI (mem:SUBX (match_operand 1 
"absolute_symbolic_operand" ""]
   "USE_LOAD_ADDRESS_MACRO (operands[1])"
   "\t%0,%1"
-  [(set (attr "length") (const_int 8))])
+  [(set_attr "type" "load")
+   (set (attr "length") (const_int 8))])
 
 (define_insn "*local_pic_load_u"
   [(set (match_operand:SUPERQI 0 "register_operand" "=r")
(zero_extend:SUPERQI (mem:SUBX (match_operand 1 
"absolute_symbolic_operand" ""]
   "USE_LOAD_ADDRESS_MACRO (operands[1])"
   "u\t%0,%1"
-  [(set (attr "length") (const_int 8))])
+  [(set_attr "type" "load")
+   (set (attr "length") (const_int 8))])
 
 ;; We can support ANYLSF loads into X register if there is no double support
 ;; or if the target is 64-bit.
@@ -55,7 +58,8 @@ (define_insn "*local_pic_load"
   "@
\t%0,%1,%2
\t%0,%1"
-  [(set (attr "length") (const_int 8))])
+  [(set_attr "type" "fpload")
+   (set (attr "length") (const_int 8))])
 
 ;; ??? For a 32-bit target with double float, a DF load into a X reg isn't
 ;; supported.  ld is not valid in that case.  Punt for now.  Maybe add a split
@@ -68,14 +72,16 @@ (define_insn "*local_pic_load_32d"
   "TARGET_HARD_FLOAT && USE_LOAD_ADDRESS_MACRO (operands[1])
&& (TARGET_DOUBLE_FLOAT && !TARGET_64BIT)"
   "\t%0,%1,%2"
-  [(set (attr "length") (const_int 8))])
+  [(set_attr "type" "fpload")
+   (set (attr "length") (const_int 8))])
 
 (define_insn "*local_pic_load_sf"
   [(set (match_operand:SOFTF 0 "register_operand" "=r")
(mem:SOFTF (match_operand 1 "absolute_symbolic_operand" "")))]
   "!TARGET_HARD_FLOAT && USE_LOAD_ADDRESS_MACRO (operands[1])"
   "\t%0,%1"
-  [(set (attr "length") (const_int 8))])
+  [(set_attr "type" "fpload")
+   (set (attr "length") (const_int 8))])
 
 ;; Simplify PIC stores to static variables.
 ;; These should go away once we figure out how to emit auipc discretely.
@@ -86,7 +92,8 @@ (define_insn "*local_pic_store"
(clobber (match_scratch:P 2 "=&r"))]
   "USE_LOAD_ADDRESS_MACRO (operands[0])"
   "\t%z1,%0,%2"
-  [(set (attr "length") (const_int 8))])
+  [(set_attr "type" "store")
+   (set (attr "length") (const_int 8))])
 
 (define_insn "*local_pic_store"
   [(set (mem:ANYLSF (match_operand 0 "absolute_symbolic_operand" ""))
@@ -97,7 +104,8 @@ (define_insn "*local_pic_store"
   "@
\t%1,%0,%2
\t%1,%0,%2"
-  [(set (attr "length") (const_int 8))])
+  [(set_attr "type" "fpstore")
+   (set (attr "length") (const_int 8))])
 
 ;; ??? For a 32-bit target with double float, a DF store from a X reg isn't
 ;; supported.  sd is not valid in that case.  Punt for now.  Maybe add a split
@@ -110,7 +118,8 @@ (define_insn "*local_pic_store_32d"
   "TARGET_HARD_FLOAT && USE_LOAD_ADDRESS_MACRO (operands[1])
&& (TARGET_DOUBLE_FLOAT && !TARGET_64BIT)"
   "\t%1,%0,%2"
-  [(set (attr "length") (const_int 8))])
+  [(set_attr "type" "fpstore")
+   (set (attr "length") (const_int 8))])
 
 (define_insn "*local_pic_store_sf"
   [(set (mem:SOFTF (match_operand 0 "absolute_symbolic_operand" ""))
@@ -118,4 +127,5 @@ (define_insn "*local_pic_store_sf"
(clobber (match_scratch:P 2 "=&r"))]
   "!TARGET_HARD_FLOAT && USE_LOAD_ADDRESS_MACRO (operands[0])"
   "\t%1,%0,%2"
-  [(set (attr "length") (const_int 8))])
+  [(set_attr "type" "fpstore")
+   (set (attr "length") (const_int 8))])
-- 
2.34.1



[PATCH] RISC-V: Add dynamic LMUL compile option

2023-08-31 Thread Juzhe-Zhong
We are going to support dynamic LMUL.
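
A hedged sketch of how the new option value behaves at the enum level. The
enumerator names follow the patch below, but parse_lmul is a hypothetical
helper written for illustration and is not how GCC's option machinery is
implemented. Because RVV_DYNAMIC is defined as an alias of RVV_M8, both
spellings produce the same numeric value.

```cpp
#include <cassert>
#include <cstring>

enum riscv_autovec_lmul_enum
{
  RVV_M1 = 1,
  RVV_M2 = 2,
  RVV_M4 = 4,
  RVV_M8 = 8,
  RVV_DYNAMIC = RVV_M8  // alias: cost comparison starts from LMUL 8
};

// Hypothetical parser mapping an option string to its enum value.
inline bool parse_lmul (const char *text, riscv_autovec_lmul_enum *out)
{
  static const struct
  {
    const char *name;
    riscv_autovec_lmul_enum value;
  } table[] = {
    {"m1", RVV_M1}, {"m2", RVV_M2}, {"m4", RVV_M4},
    {"m8", RVV_M8}, {"dynamic", RVV_DYNAMIC},
  };

  for (const auto &entry : table)
    if (std::strcmp (text, entry.name) == 0)
      {
	*out = entry.value;
	return true;
      }
  return false;
}
```

Code that only inspects the resulting value sees "dynamic" and "m8" as the
same setting.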

gcc/ChangeLog:

* config/riscv/riscv-opts.h (enum riscv_autovec_lmul_enum): Add dynamic 
enum.
* config/riscv/riscv.opt: Add dynamic compile option.

---
 gcc/config/riscv/riscv-opts.h | 4 +++-
 gcc/config/riscv/riscv.opt| 3 +++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 5ed69abd214..79e0f12e388 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -79,7 +79,9 @@ enum riscv_autovec_lmul_enum {
   RVV_M1 = 1,
   RVV_M2 = 2,
   RVV_M4 = 4,
-  RVV_M8 = 8
+  RVV_M8 = 8,
+  /* For dynamic LMUL, we compare COST start with LMUL8.  */
+  RVV_DYNAMIC = RVV_M8
 };
 
 enum riscv_multilib_select_kind {
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index d2407c3c502..eca0dda4dd5 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -311,6 +311,9 @@ Enum(riscv_autovec_lmul) String(m4) Value(RVV_M4)
 EnumValue
 Enum(riscv_autovec_lmul) String(m8) Value(RVV_M8)
 
+EnumValue
+Enum(riscv_autovec_lmul) String(dynamic) Value(RVV_DYNAMIC)
+
 -param=riscv-autovec-lmul=
 Target RejectNegative Joined Enum(riscv_autovec_lmul) Var(riscv_autovec_lmul) 
Init(RVV_M1)
 -param=riscv-autovec-lmul= Set the RVV LMUL of auto-vectorization 
in the RISC-V port.
-- 
2.36.3



[PATCH] RISC-V: Enable VECT_COMPARE_COSTS by default

2023-08-31 Thread Juzhe-Zhong
since we have added COST framework, we by default enable VECT_COMPARE_COSTS.

Also, add 16/32/64 to provide more choices for COST comparison.

This patch doesn't change any behavior in the current testsuite since we are
using the default COST model.
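
To illustrate what the longer factor list provides, here is a minimal standalone sketch. It is not the riscv-v.cc code: the helper `candidate_sizes` and its byte-size parameter are hypothetical, and only the factor table is taken from the patch. It divides a full vector size by each factor and keeps the results that divide evenly.

```cpp
#include <cstdint>
#include <vector>

// Factors taken from the patched rvv_factors[] table.
static const int rvv_factors[] = {1, 2, 4, 8, 16, 32, 64};

// Hypothetical helper (not part of GCC): for a full vector size in bytes,
// return full_size / factor for every factor that divides it exactly.
std::vector<std::uint64_t> candidate_sizes(std::uint64_t full_size_bytes) {
  std::vector<std::uint64_t> sizes;
  for (int factor : rvv_factors) {
    const std::uint64_t f = static_cast<std::uint64_t>(factor);
    if (full_size_bytes % f == 0 && full_size_bytes / f != 0)
      sizes.push_back(full_size_bytes / f);
  }
  return sizes;
}
```

Under these assumptions, a 64-byte vector gives seven candidates with the extended table, where the old {1, 2, 4, 8} table would give four, so the cost comparison has more modes to choose from.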

gcc/ChangeLog:

* config/riscv/riscv-v.cc (autovectorize_vector_modes): Enable 
VECT_COMPARE_COSTS by default.

---
 gcc/config/riscv/riscv-v.cc | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 6228ff3d92e..c8ad96f44d5 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -2173,7 +2173,7 @@ autovectorize_vector_modes (vector_modes *modes, bool)
   full vectors for wider elements.
 - full_size / 8:
   Try using 64-bit containers for all element types.  */
-  static const int rvv_factors[] = {1, 2, 4, 8};
+  static const int rvv_factors[] = {1, 2, 4, 8, 16, 32, 64};
   for (unsigned int i = 0; i < sizeof (rvv_factors) / sizeof (int); i++)
{
  poly_uint64 units;
@@ -2183,12 +2183,8 @@ autovectorize_vector_modes (vector_modes *modes, bool)
modes->safe_push (mode);
}
 }
-  unsigned int flag = 0;
   if (TARGET_VECTOR_VLS)
 {
-  /* Enable VECT_COMPARE_COSTS between VLA modes VLS modes for scalable
-auto-vectorization.  */
-  flag |= VECT_COMPARE_COSTS;
   /* Push all VLSmodes according to TARGET_MIN_VLEN.  */
   unsigned int i = 0;
   unsigned int base_size = TARGET_MIN_VLEN * riscv_autovec_lmul / 8;
@@ -2201,7 +2197,8 @@ autovectorize_vector_modes (vector_modes *modes, bool)
  size = base_size / (1U << i);
}
 }
-  return flag;
+  /* Enable LOOP_VINFO comparison in COST model.  */
+  return VECT_COMPARE_COSTS;
 }
 
 /* If the given VECTOR_MODE is an RVV mode,  first get the largest number
-- 
2.36.3



[PATCH] analyzer: Add support of placement new and improved operator new [PR105948, PR94355]

2023-08-31 Thread Benjamin Priour via Gcc-patches
From: benjamin priour 

Hi, 

Successfully regstrapped off trunk 7f2ed06ddc825e8a4e0edfd1d66b5156e6dc1d34
on x86_64-linux-gnu.

Is it OK for trunk ?

Thanks,
Benjamin.

Patch below.
---

Fixed the spurious possibly-NULL warning that always accompanied throwing
operator new, even though it never returns NULL.
Now operator new is correctly recognized as possibly returning NULL
if and only if it is non-throwing or exceptions have been disabled.
Different standard signatures of operator new are now properly
recognized.

Added support of placement new, so that it is now properly recognized,
and a 'heap_allocated' region is no longer created for it.
Placement new size is also checked and a 'Wanalyzer-allocation-size'
is emitted when relevant, as well as always a 'Wanalyzer-out-of-bounds'.

'operator new' non-throwing variants are detected by checking the types
of the parameters.
Indeed, in a call to new (std::nothrow) () the chosen overload
has signature 'operator new (size_t, const std::nothrow_t&)', where the second
parameter is a reference. In a placement new, the second parameter will
always be a void pointer.

Prior to this patch, some buffers first allocated with 'new', then deleted
and thereafter used would result in a 'Wanalyzer-use-after-free'
warning. However the wording was "use after 'free'" instead of the
expected "use after 'delete'".
This patch fixes this by introducing a new kind of poisoned value,
namely POISON_KIND_DELETED.

Due to how the analyzer sees calls to non-throwing variants of
operator new, dereferencing a pointer freshly allocated in this fashion
caused both a 'Wanalyzer-use-of-uninitialized-value' and a
'Wanalyzer-null-dereference' to be emitted, while only the latter was
relevant. As a result, 'null-dereference' now supersedes
'use-of-uninitialized'.

Signed-off-by: benjamin priour 

gcc/analyzer/ChangeLog:

PR analyzer/105948
PR analyzer/94355
* analyzer.h (is_placement_new_p): New declaration.
* call-details.cc
(call_details::maybe_get_arg_region): New function.
Returns the region of the argument at given index if possible.
* call-details.h: Declaration of the above function.
* kf-lang-cp.cc (is_placement_new_p): Returns true if the gcall
is recognized as a placement new.
(kf_operator_delete::impl_call_post): Unbinding a region and its
descendents now poisons with POISON_KIND_DELETED.
(register_known_functions_lang_cp): Known function "operator
delete" is now registered only once independently of its number of
arguments.
* region-model.cc (region_model::eval_condition): Now
recursively calls itself if any of the operands is wrapped in a
cast.
* sm-malloc.cc (malloc_state_machine::on_stmt):
Add placement new recognition.
* svalue.cc (poison_kind_to_str): Wording for the new PK.
* svalue.h (enum poison_kind): Add value POISON_KIND_DELETED.

gcc/testsuite/ChangeLog:

PR analyzer/105948
PR analyzer/94355
* g++.dg/analyzer/out-of-bounds-placement-new.C: Added a directive.
* g++.dg/analyzer/placement-new.C: Added tests.
* g++.dg/analyzer/new-2.C: New test.
* g++.dg/analyzer/noexcept-new.C: New test.
* g++.dg/analyzer/placement-new-size.C: New test.
---
 gcc/analyzer/analyzer.h   |   1 +
 gcc/analyzer/call-details.cc  |  11 ++
 gcc/analyzer/call-details.h   |   1 +
 gcc/analyzer/kf-lang-cp.cc| 117 +++---
 gcc/analyzer/region-model.cc  |  36 ++
 gcc/analyzer/sm-malloc.cc |  37 --
 gcc/analyzer/svalue.cc|   2 +
 gcc/analyzer/svalue.h |   3 +
 gcc/testsuite/g++.dg/analyzer/new-2.C |  70 +++
 gcc/testsuite/g++.dg/analyzer/noexcept-new.C  |  48 +++
 .../analyzer/out-of-bounds-placement-new.C|   2 +-
 .../g++.dg/analyzer/placement-new-size.C  |  27 
 gcc/testsuite/g++.dg/analyzer/placement-new.C |  90 +-
 13 files changed, 417 insertions(+), 28 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/analyzer/new-2.C
 create mode 100644 gcc/testsuite/g++.dg/analyzer/noexcept-new.C
 create mode 100644 gcc/testsuite/g++.dg/analyzer/placement-new-size.C

diff --git a/gcc/analyzer/analyzer.h b/gcc/analyzer/analyzer.h
index 9b351b5ed56..208b85026fc 100644
--- a/gcc/analyzer/analyzer.h
+++ b/gcc/analyzer/analyzer.h
@@ -423,6 +423,7 @@ extern bool is_std_named_call_p (const_tree fndecl, const 
char *funcname,
 const gcall *call, unsigned int num_args);
 extern bool is_setjmp_call_p (const gcall *call);
 extern bool is_longjmp_call_p (const gcall *call);
+extern bool is_placement_new_p (const gcall *call);
 
 extern const char *get_user_facing_name (const gcall *call);
 
diff --git a/gcc/analyzer/call-details.cc b/gcc/analyzer/call-details.cc
index 66fb0fe

Re: [RFC PATCH] c++: Diagnose [basic.scope.block]/2 violations even for block externs [PR52953]

2023-08-31 Thread Jason Merrill via Gcc-patches

On 8/31/23 04:08, Jakub Jelinek wrote:

Hi!

C++17 had in [basic.block.scope]/2
"A parameter name shall not be redeclared in the outermost block of the function
definition nor in the outermost block of any handler associated with a
function-try-block."
and in [basic.block.scope]/4 similar rule for selection/iteration
statements.  My reading of that is that it applied even for block local
externs in all those spots, while they declare something at namespace scope,
the redeclaration happens in that outermost block etc. and introduces names
into that.
Those wordings seemed to have been moved somewhere else in C++20, but what's
worse, they were moved back and completely rewritten in
P1787R6: Declarations and where to find them
which has been applied as a DR (but admittedly, we don't claim yet to
implement that).
The current wording at https://eel.is/c++draft/basic.scope#block-2
and https://eel.is/c++draft/basic.scope#scope-2.10 seems to imply at least
to me that it doesn't apply to extern block local decls because their
target scope is the namespace scope and [basic.scope.block]/2 says
"and whose target scope is the block scope"...
Now, it is unclear if that is actually the intent or not.


Yes, I suspect that should be

If a declaration that is not a name-independent declaration and
[-whose target scope is-]{+that binds a name in+} the
block scope S of a


which seems to also be needed to prohibit the already-diagnosed

void f(int i) { union { int i; }; }
void g(int i) { enum { i }; }

I've suggested this to Core.


There seems to be quite large implementation divergence on this as well.

Unpatched g++ e.g. on the redeclaration-5.C testcase diagnoses just
lines 55,58,67,70 (i.e. where the previous declaration is in for's
condition).

clang++ trunk diagnoses just lines 8 and 27, i.e. redeclaration in the
function body vs. parameter both in normal fn and lambda (but not e.g.
function-try-block and others, including ctors, but it diagnoses those
for non-extern decls).

ICC 19 diagnoses lines 8,32,38,41,45,52,55,58,61,64,67,70,76.

And MSVC trunk diagnoses 8,27,32,38,41,45,48,52,55,58,67,70,76,87,100,137
although the last 4 are just warnings.

g++ with the patch diagnoses
8,15,27,32,38,41,45,48,52,55,58,61,64,67,70,76,87,100,121,137
as the dg-error directives test.

So, I'm not really sure what to do.  Intuitively the patch seems right
because even block externs redeclare stuff and change meaning of the
identifiers and void foo () { int i; extern int i (int); } is rejected
by all compilers.


I think this direction makes sense, though we might pedwarn on these 
rather than error to reduce possible breakage.



2023-08-31  Jakub Jelinek  

PR c++/52953
* name-lookup.cc (check_local_shadow): Defer punting on
DECL_EXTERNAL (decl) from the start of function to right before
the -Wshadow* checks.


Don't we want to consider externs for the -Wshadow* checks as well?

Jason



Re: [RFC] gimple ssa: SCCP - A new PHI optimization pass

2023-08-31 Thread Andrew Pinski via Gcc-patches
On Thu, Aug 31, 2023 at 5:15 AM Richard Biener via Gcc-patches
 wrote:
>
> On Thu, 31 Aug 2023, Filip Kastl wrote:
>
> > > The most obvious places would be right after SSA construction and before 
> > > RTL expansion.
> > > Can you provide measurements for those positions?
> >
> > The algorithm should only remove PHIs that break SSA form minimality. Since
> > GCC's SSA construction already produces minimal SSA form, the algorithm isn't
> > expected to remove any PHIs if run right after the construction. I even
> > measured it and indeed -- no PHIs got removed (except for 502.gcc_r, where the
> > algorithm managed to remove exactly 1 PHI, which is weird).
> >
> > I tried putting the pass before pass_expand. There aren't a lot of PHIs to
> > remove at that point, but there still are some.
>
> That's interesting.  Your placement at
>
>   NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);
>   NEXT_PASS (pass_phiopt, true /* early_p */);
> + NEXT_PASS (pass_sccp);
>
> and
>
>NEXT_PASS (pass_tsan);
>NEXT_PASS (pass_dse, true /* use DR analysis */);
>NEXT_PASS (pass_dce);
> +  NEXT_PASS (pass_sccp);
>
> isn't immediately after the "best" existing pass we have to
> remove dead PHIs which is pass_cd_dce.  phiopt might leave
> dead PHIs around and the second instance runs long after the
> last CD-DCE.

Actually the last phiopt is run before last pass_cd_dce:
  NEXT_PASS (pass_dce, true /* update_address_taken_p */);
  /* After late DCE we rewrite no longer addressed locals into SSA
 form if possible.  */
  NEXT_PASS (pass_forwprop);
  NEXT_PASS (pass_sink_code, true /* unsplit edges */);
  NEXT_PASS (pass_phiopt, false /* early_p */);
  NEXT_PASS (pass_fold_builtins);
  NEXT_PASS (pass_optimize_widening_mul);
  NEXT_PASS (pass_store_merging);
  /* If DCE is not run before checking for uninitialized uses,
 we may get false warnings (e.g., testsuite/gcc.dg/uninit-5.c).
 However, this also causes us to misdiagnose cases that should be
 real warnings (e.g., testsuite/gcc.dg/pr18501.c).  */
  NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);

Thanks,
Andrew Pinski


>
> So I wonder if your pass just detects unnecessary PHIs we'd have
> removed by other means and what survives until RTL expansion is
> what we should count?



>
> Can you adjust your original early placement to right after
> the cd-dce pass and for the late placement turn the dce pass
> before it into cd-dce and re-do your measurements?
>
> > 500.perlbench_r
> > Started with 43111
> > Ended with 42942
> > Removed PHI % .39201131961680313700
> >
> > 502.gcc_r
> > Started with 141392
> > Ended with 140455
> > Removed PHI % .66269661649881181400
> >
> > 505.mcf_r
> > Started with 482
> > Ended with 478
> > Removed PHI % .82987551867219917100
> >
> > 523.xalancbmk_r
> > Started with 136040
> > Ended with 135629
> > Removed PHI % .30211702440458688700
> >
> > 531.deepsjeng_r
> > Started with 2150
> > Ended with 2148
> > Removed PHI % .09302325581395348900
> >
> > 541.leela_r
> > Started with 4664
> > Ended with 4650
> > Removed PHI % .30017152658662092700
> >
> > 557.xz_r
> > Started with 43
> > Ended with 43
> > Removed PHI % 0
> >
> > > Can the pass somehow be used as part of propagations like during value 
> > > numbering?
> >
> > I don't think that the pass could be used as a part of different optimizations
> > since it works on the whole CFG (except for copy propagation as I noted in the
> > RFC). I'm adding Honza into Cc. He'll have more insight into this.
> >
> > > Could the new file be called gimple-ssa-sccp.cc or something similar?
> >
> > Certainly. Though I'm not sure, but wouldn't tree-ssa-sccp.cc be more
> > appropriate?
> >
> > I'm thinking about naming the pass 'scc-copy' and the file
> > 'tree-ssa-scc-copy.cc'.
> >
> > > Removing some PHIs is nice, but it would be also interesting to know what
> > > are the effects on generated code size and/or performance.
> > > And also if it has any effects on debug information coverage.
> >
> > Regarding performance: I ran some benchmarks on a Zen3 machine with -O3 with
> > and without the new pass. *I got ~2% speedup for 505.mcf_r and 541.leela_r.
> > Here are the full results. What do you think? Should I run more benchmarks? Or
> > benchmark multiple times? Or run the benchmarks on different machines?*
> >
> > 500.perlbench_r
> > Without SCCP: 244.151807s
> > With SCCP: 242.448438s
> > -0.7025695913124297%
> >
> > 502.gcc_r
> > Without SCCP: 211.029606s
> > With SCCP: 211.614523s
> > +0.27640683243653763%
> >
> > 505.mcf_r
> > Without SCCP: 298.782621s
> > With SCCP: 291.671468s
> > -2.438069465197046%
> >
> > 523.xalancbmk_r
> > Without SCCP: 189.940639s
> > With SCCP: 189.876261s
> > -0.03390523894928332%
> >
> > 531.deepsjeng_r
> > Without SCCP: 250.63648s
> > With SCCP: 250.988624s
> > +0.1403027732444051%
> >
> > 541.leela_r
> > Withou

[PATCH] Fortran: runtime bounds-checking in presence of array constructors [PR31059]

2023-08-31 Thread Harald Anlauf via Gcc-patches
Dear all,

gfortran's array bounds-checking code does a mostly reasonable
job for array sections in expressions and assignments, but
forgot the case that (rank-1) expressions can involve array
constructors, which have a shape ;-)

The attached patch walks over the loops generated by the
scalarizer, checks for the presence of a constructor, and
takes the first shape found as reference.  (If several
constructors are present, discrepancies in their shape
seem to be already detected at compile time).
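
The selection rule described above can be modelled outside the compiler. The following is a standalone sketch only, not gfortran code: `Term` and `reference_sizes` are hypothetical stand-ins for the scalarizer chain and the size[] array. It walks the terms, takes the first constructor with a known shape, and fills in only the extents that have not been set yet.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for one scalarizer term of an expression.
struct Term {
  bool is_constructor;      // true for an array constructor such as [4,5,6]
  std::vector<long> shape;  // known extents; empty if the shape is unknown
};

// Returns the per-dimension reference sizes: unset by default, filled from
// the first constructor whose shape is known for all `rank` dimensions.
std::vector<std::optional<long>> reference_sizes(const std::vector<Term> &terms,
                                                 std::size_t rank) {
  std::vector<std::optional<long>> size(rank);
  for (const Term &t : terms) {
    if (t.is_constructor && t.shape.size() == rank) {
      for (std::size_t n = 0; n < rank; ++n)
        if (!size[n])
          size[n] = t.shape[n];
      break;  // only the first constructor found serves as reference
    }
  }
  return size;
}
```

In this model, an assignment like ivec(:) = [4,5,6] would get a reference extent of 3 from the constructor, against which the extent of the section on the left-hand side can then be compared at run time.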

For more details on what will be caught now see testcase.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald

From 944a35909e8eeb79c92e398ae3f27e94708584e6 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Thu, 31 Aug 2023 22:19:58 +0200
Subject: [PATCH] Fortran: runtime bounds-checking in presence of array
 constructors [PR31059]

gcc/fortran/ChangeLog:

	PR fortran/31059
	* trans-array.cc (gfc_conv_ss_startstride): For array bounds checking,
	consider also array constructors in expressions, and use their shape.

gcc/testsuite/ChangeLog:

	PR fortran/31059
	* gfortran.dg/bounds_check_fail_5.f90: New test.
---
 gcc/fortran/trans-array.cc| 23 
 .../gfortran.dg/bounds_check_fail_5.f90   | 26 +++
 2 files changed, 49 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/bounds_check_fail_5.f90

diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
index 90a7d4e9aef..6ca58e98547 100644
--- a/gcc/fortran/trans-array.cc
+++ b/gcc/fortran/trans-array.cc
@@ -4740,6 +4740,29 @@ done:
   for (n = 0; n < loop->dimen; n++)
 	size[n] = NULL_TREE;

+  /* If there is a constructor involved, derive size[] from its shape.  */
+  for (ss = loop->ss; ss != gfc_ss_terminator; ss = ss->loop_chain)
+	{
+	  gfc_ss_info *ss_info;
+
+	  ss_info = ss->info;
+	  info = &ss_info->data.array;
+
+	  if (ss_info->type == GFC_SS_CONSTRUCTOR && info->shape)
+	{
+	  for (n = 0; n < loop->dimen; n++)
+		{
+		  if (size[n] == NULL)
+		{
+		  gcc_assert (info->shape[n]);
+		  size[n] = gfc_conv_mpz_to_tree (info->shape[n],
+		  gfc_index_integer_kind);
+		}
+		}
+	  break;
+	}
+	}
+
   for (ss = loop->ss; ss != gfc_ss_terminator; ss = ss->loop_chain)
 	{
 	  stmtblock_t inner;
diff --git a/gcc/testsuite/gfortran.dg/bounds_check_fail_5.f90 b/gcc/testsuite/gfortran.dg/bounds_check_fail_5.f90
new file mode 100644
index 000..436cc96621d
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/bounds_check_fail_5.f90
@@ -0,0 +1,26 @@
+! { dg-do run }
+! { dg-additional-options "-fcheck=bounds -g -fdump-tree-original" }
+! { dg-output "At line 13 .*" }
+! { dg-shouldfail "Array bound mismatch for dimension 1 of array 'ivec' (2/3)" }
+!
+! PR fortran/31059 - runtime bounds-checking in presence of array constructors
+
+program p
+  integer  :: jvec(3) = [1,2,3]
+  integer, allocatable :: ivec(:), kvec(:), lvec(:), mvec(:), nvec(:)
+  ivec= [1,2]   ! (re)allocation
+  kvec= [4,5,6] ! (re)allocation
+  ivec(:) = [4,5,6] ! runtime error (->dump)
+  ! not reached ...
+  print *, jvec + [1,2,3] ! OK & no check generated
+  print *, [4,5,6] + jvec ! OK & no check generated
+  print *, lvec + [1,2,3] ! check generated (->dump)
+  print *, [4,5,6] + mvec ! check generated (->dump)
+  nvec(:) = jvec  ! check generated (->dump)
+end
+
+! { dg-final { scan-tree-dump-times "Array bound mismatch " 4 "original" } }
+! { dg-final { scan-tree-dump-times "Array bound mismatch .*ivec" 1 "original" } }
+! { dg-final { scan-tree-dump-times "Array bound mismatch .*lvec" 1 "original" } }
+! { dg-final { scan-tree-dump-times "Array bound mismatch .*mvec" 1 "original" } }
+! { dg-final { scan-tree-dump-times "Array bound mismatch .*nvec" 1 "original" } }
--
2.35.3



Re: [PATCH] c++: Diagnose [basic.scope.block]/2 violations even in compound-stmt of function-try-block [PR52953]

2023-08-31 Thread Jason Merrill via Gcc-patches

On 8/31/23 03:20, Jakub Jelinek wrote:

Hi!

As the following testcase shows, while check_local_shadow diagnoses most of
the [basic.scope.block]/2 violations, it doesn't diagnose when parameter's
name is redeclared inside of the compound-stmt of a function-try-block.

In that case there is an extra scope (sk_try, whose parent is an artificial
sk_block, then for FUNCTION_NEEDS_BODY_BLOCK another sk_block, and only then
sk_function_param).

The following patch fixes that.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-08-31  Jakub Jelinek  

PR c++/52953
* cp-tree.h (struct language_function): Add x_in_function_try_block
member.


How about adding a flag to cp_binding_level instead?  Maybe to mark the 
artificial sk_block level as such, which we could use for both this case 
and the FUNCTION_NEEDS_BODY_BLOCK cases.


Jason



Re: [PATCH] libstdc++: Use GLIBCXX_CHECK_LINKER_FEATURES for cross-builds (PR111238)

2023-08-31 Thread Jonathan Wakely via Gcc-patches
On Thu, 31 Aug 2023 at 18:42, Jonathan Wakely  wrote:
>
> On Thu, 31 Aug 2023 at 16:26, Christophe Lyon
>  wrote:
> >
> > As discussed in PR104167 (comments #8 and below), and PR111238, using
> > -Wl,-gc-sections in the libstdc++ testsuite for arm-eabi
> > (cross-toolchain) avoids link failures for a few tests:
> >
> > 27_io/filesystem/path/108636.cc
>
> I think this one probably just needs { dg-require-filesystem-ts "" }
> because there's no point testing that we can link to the
> std::filesystem definitions if some of those definitions are unusable
> on the target.
>
> // { dg-additional-options "-Wl,--gc-sections" { target gc_sections } }
>
> For the rest of them, does the attached patch help?

I've tested the patch for an arm-eabi cross compiler, and it fixes the
linker errors.

It doesn't change the fact that almost any use of the std::filesystem
APIs will hit the linker errors and so require users to link with
--gc-sections (or provide stubs for the missing functions) but that's
for users to deal with (if anybody using newlib targets is even making
use of those std::filesystem APIs anyway). With the patch to tzdb.cc
we don't need to change how libstdc++ is tested for the arm-eabi cross
target.

> If arm-eabi
> doesn't define _GLIBCXX_HAVE_READLINK then there's no point even
> trying to call filesystem::read_symlink. We can avoid a useless
> dependency on it by reusing the same preprocessor condition that
> filesystem::read_symlink uses.
>
> > std/time/clock/gps/1.cc
> > std/time/clock/gps/io.cc
> > std/time/clock/tai/1.cc
> > std/time/clock/tai/io.cc
> > std/time/clock/utc/1.cc
> > std/time/clock/utc/io.cc
> > std/time/clock/utc/leap_second_info.cc
> > std/time/exceptions.cc
> > std/time/format.cc
> > std/time/time_zone/get_info_local.cc
> > std/time/time_zone/get_info_sys.cc
> > std/time/tzdb/1.cc
> > std/time/tzdb/leap_seconds.cc
> > std/time/tzdb_list/1.cc
> > std/time/zoned_time/1.cc
> > std/time/zoned_time/custom.cc
> > std/time/zoned_time/io.cc
> > std/time/zoned_traits.cc
> >
> > This patch achieves this by calling GLIBCXX_CHECK_LINKER_FEATURES in
> > cross-build cases, like we already do for native builds. We keep not
> > doing so in Canadian-cross builds.
> >
> > However, this would hide the fact that libstdc++ somehow forces the
> > user to use -Wl,-gc-sections to avoid undefined references to chdir,
> > mkdir, chmod, pathconf, ... so maybe it's better to keep the status
> > quo and not apply this patch?
>
> I'm undecided about this for now, but let's wait for HP's cris-elf
> testing anyway.



Re: [PATCH] c++, v3: Fix up mangling of function/block scope static structured bindings and emit abi tags [PR111069]

2023-08-31 Thread Jason Merrill via Gcc-patches

On 8/31/23 15:14, Jakub Jelinek wrote:

On Thu, Aug 31, 2023 at 01:11:57PM -0400, Jason Merrill wrote:

2023-08-28  Jakub Jelinek  

PR c++/111069
gcc/
* common.opt (fabi-version=): Document version 19.
* doc/invoke.texi (-fabi-version=): Likewise.
gcc/c-family/
* c-opts.cc (c_common_post_options): Change latest_abi_version to 19.
gcc/cp/
* cp-tree.h (determine_local_discriminator): Add NAME argument with
NULL_TREE default.
(struct cp_decomp): New type.


Maybe cp_finish_decomp should take this as well?  And tsubst_decomp_names,
and various other functions with decomp_first_name/decomp_cnt parms?


Ok, done below.


+  if (tree tags = get_abi_tags (decl))
+{
+  /* We didn't emit ABI tags for structured bindings before ABI 19.  */
+  if (!G.need_abi_warning
+  && abi_warn_or_compat_version_crosses (19))
+   G.need_abi_warning = 1;


In general we should probably only warn about mangling changes if
TREE_PUBLIC (decl).


I have done that but I think it ought to be unnecessary, because
check_abi_tags starts with
   if (!TREE_PUBLIC (decl))
 /* No need to worry about things local to this TU.  */
 return NULL_TREE;

Here is an updated patch, so far just tested with
make check-g++ GXX_TESTSUITE_STDS=98,11,14,17,20,2b,2c 
RUNTESTFLAGS="dg.exp='*decomp*'"
ok for trunk if it passes full bootstrap/regtest?


OK.


2023-08-31  Jakub Jelinek  

PR c++/111069
gcc/
* common.opt (fabi-version=): Document version 19.
* doc/invoke.texi (-fabi-version=): Likewise.
gcc/c-family/
* c-opts.cc (c_common_post_options): Change latest_abi_version to 19.
gcc/cp/
* cp-tree.h (determine_local_discriminator): Add NAME argument with
NULL_TREE default.
(struct cp_decomp): New type.
(cp_finish_decl): Add DECOMP argument defaulted to nullptr.
(cp_maybe_mangle_decomp): Remove declaration.
(cp_finish_decomp): Add cp_decomp * argument, remove tree and unsigned
args.
(cp_convert_range_for): Likewise.
* decl.cc (determine_local_discriminator): Add NAME argument, use it
if non-NULL, otherwise compute it the old way.
(maybe_commonize_var): Don't return early for structured bindings.
(cp_finish_decl): Add DECOMP argument, if non-NULL, call
cp_maybe_mangle_decomp.
(cp_maybe_mangle_decomp): Make it static with a forward declaration.
Call determine_local_discriminator.  Replace FIRST and COUNT arguments
with DECOMP argument.
(cp_finish_decomp): Replace FIRST and COUNT arguments with DECOMP
argument.
* mangle.cc (find_decomp_unqualified_name): Remove.
(write_unqualified_name): Don't call find_decomp_unqualified_name.
(mangle_decomp): Handle mangling of static function/block scope
structured bindings.  Don't call decl_mangling_context twice.  Call
check_abi_tags, call write_abi_tags for abi version >= 19 and emit
-Wabi warnings if needed.
(write_guarded_var_name): Handle structured bindings.
(mangle_ref_init_variable): Use write_guarded_var_name.
* parser.cc (cp_parser_range_for): Adjust do_range_for_auto_deduction
and cp_convert_range_for callers.
(do_range_for_auto_deduction): Replace DECOMP_FIRST_NAME and
DECOMP_CNT arguments with DECOMP.  Adjust cp_finish_decomp caller.
(cp_convert_range_for): Replace DECOMP_FIRST_NAME and
DECOMP_CNT arguments with DECOMP.  Don't call cp_maybe_mangle_decomp,
adjust cp_finish_decl and cp_finish_decomp callers.
(cp_parser_decomposition_declaration): Don't call
cp_maybe_mangle_decomp, adjust cp_finish_decl and cp_finish_decomp
callers.
(cp_convert_omp_range_for): Adjust do_range_for_auto_deduction
and cp_finish_decomp callers.
(cp_finish_omp_range_for): Don't call cp_maybe_mangle_decomp,
adjust cp_finish_decl and cp_finish_decomp callers.
* pt.cc (tsubst_omp_for_iterator): Adjust tsubst_decomp_names
caller.
(tsubst_decomp_names): Replace FIRST and CNT arguments with DECOMP.
(tsubst_expr): Don't call cp_maybe_mangle_decomp, adjust
tsubst_decomp_names, cp_finish_decl, cp_finish_decomp and
cp_convert_range_for callers.
gcc/testsuite/
* g++.dg/cpp2a/decomp8.C: New test.
* g++.dg/cpp2a/decomp9.C: New test.
* g++.dg/abi/macro0.C: Expect __GXX_ABI_VERSION 1019 rather than
1018.

--- gcc/common.opt.jj   2023-08-28 13:55:55.670370386 +0200
+++ gcc/common.opt  2023-08-31 19:53:31.186280641 +0200
@@ -1010,6 +1010,9 @@ Driver Undocumented
  ; 18: Corrects errors in mangling of lambdas with additional context.
  ; Default in G++ 13.
  ;
+; 19: Emits ABI tags if needed in structured binding mangled names.
+; Default in G++ 14.
+;
  ; Additional positive integers will be assigned as new versions of
  ; the ABI

Re: [PATCH] libstdc++: Use GLIBCXX_CHECK_LINKER_FEATURES for cross-builds (PR111238)

2023-08-31 Thread Hans-Peter Nilsson via Gcc-patches
> From: Hans-Peter Nilsson 
> Date: Thu, 31 Aug 2023 19:05:19 +0200

> > Date: Thu, 31 Aug 2023 17:25:45 +0200
> > From: Christophe Lyon via Gcc-patches 

> > However, this would hide the fact that libstdc++ somehow forces the
> > user to use -Wl,-gc-sections to avoid undefined references to chdir,
> > mkdir, chmod, pathconf, ... so maybe it's better to keep the status
> > quo and not apply this patch?

I agree with the sentiment, but maybe --gc-sections should
instead be passed by default for arm-eabi when linking, with
a way to opt out; as for cris-elf per below.

> Datapoint: no failures for cris-elf in the listed tests -
> but it instead passes --gc-sections if -O2 or -O3 is seen
> for linking; see cris/cris.h.  It's been like that forever,
> modulo a patch in 2002 not passing it if "-r" is seen.
> 
> Incidentally, I've been sort-of investigating a recent-ish
> commit to newlib (8/8) that added a stub for getpriority,
> which was apparently added due to testsuite breakage for
> libstdc++ and arm-eabi, but that instead broke testsuite
> results for *other* targets, as warning at link-time.  Film
> at 11.
> 
> > 2023-08-31  Christophe Lyon  
> > 
> > libstdc++-v3/ChangeLog:
> > 
> > PR libstdc++/111238
> > * configure: Regenerate.
> > * configure.ac: Call GLIBCXX_CHECK_LINKER_FEATURES in cross,
> > non-Canadian builds.
> 
> On this actual patch, I can't say yay or nay though (but
> leaning towards yay), but I'll test for cris-elf.  Would you
> mind holding off committing for a day or two?

No regressions for cris-elf with this patch.  Still, on the one
hand I'm not wild about libstdc++ overriding the target this
way, and on the other hand, I'm likely to
suggest something similar (adding options) to "improve"
GCC_TRY_COMPILE_OR_LINK (more targets actually linking).

brgds, H-P


[PATCH] c++, v3: Fix up mangling of function/block scope static structured bindings and emit abi tags [PR111069]

2023-08-31 Thread Jakub Jelinek via Gcc-patches
On Thu, Aug 31, 2023 at 01:11:57PM -0400, Jason Merrill wrote:
> > 2023-08-28  Jakub Jelinek  
> > 
> > PR c++/111069
> > gcc/
> > * common.opt (fabi-version=): Document version 19.
> > * doc/invoke.texi (-fabi-version=): Likewise.
> > gcc/c-family/
> > * c-opts.cc (c_common_post_options): Change latest_abi_version to 19.
> > gcc/cp/
> > * cp-tree.h (determine_local_discriminator): Add NAME argument with
> > NULL_TREE default.
> > (struct cp_decomp): New type.
> 
> Maybe cp_finish_decomp should take this as well?  And tsubst_decomp_names,
> and various other functions with decomp_first_name/decomp_cnt parms?

Ok, done below.

> > +  if (tree tags = get_abi_tags (decl))
> > +{
> > +  /* We didn't emit ABI tags for structured bindings before ABI 19.  */
> > +  if (!G.need_abi_warning
> > +  && abi_warn_or_compat_version_crosses (19))
> > +   G.need_abi_warning = 1;
> 
> In general we should probably only warn about mangling changes if
> TREE_PUBLIC (decl).

I have done that but I think it ought to be unnecessary, because
check_abi_tags starts with
  if (!TREE_PUBLIC (decl))
/* No need to worry about things local to this TU.  */
return NULL_TREE;

Here is an updated patch, so far just tested with
make check-g++ GXX_TESTSUITE_STDS=98,11,14,17,20,2b,2c 
RUNTESTFLAGS="dg.exp='*decomp*'"
ok for trunk if it passes full bootstrap/regtest?

2023-08-31  Jakub Jelinek  

PR c++/111069
gcc/
* common.opt (fabi-version=): Document version 19.
* doc/invoke.texi (-fabi-version=): Likewise.
gcc/c-family/
* c-opts.cc (c_common_post_options): Change latest_abi_version to 19.
gcc/cp/
* cp-tree.h (determine_local_discriminator): Add NAME argument with
NULL_TREE default.
(struct cp_decomp): New type.
(cp_finish_decl): Add DECOMP argument defaulted to nullptr.
(cp_maybe_mangle_decomp): Remove declaration.
(cp_finish_decomp): Add cp_decomp * argument, remove tree and unsigned
args.
(cp_convert_range_for): Likewise.
* decl.cc (determine_local_discriminator): Add NAME argument, use it
if non-NULL, otherwise compute it the old way.
(maybe_commonize_var): Don't return early for structured bindings.
(cp_finish_decl): Add DECOMP argument, if non-NULL, call
cp_maybe_mangle_decomp.
(cp_maybe_mangle_decomp): Make it static with a forward declaration.
Call determine_local_discriminator.  Replace FIRST and COUNT arguments
with DECOMP argument.
(cp_finish_decomp): Replace FIRST and COUNT arguments with DECOMP
argument.
* mangle.cc (find_decomp_unqualified_name): Remove.
(write_unqualified_name): Don't call find_decomp_unqualified_name.
(mangle_decomp): Handle mangling of static function/block scope
structured bindings.  Don't call decl_mangling_context twice.  Call
check_abi_tags, call write_abi_tags for abi version >= 19 and emit
-Wabi warnings if needed.
(write_guarded_var_name): Handle structured bindings.
(mangle_ref_init_variable): Use write_guarded_var_name.
* parser.cc (cp_parser_range_for): Adjust do_range_for_auto_deduction
and cp_convert_range_for callers.
(do_range_for_auto_deduction): Replace DECOMP_FIRST_NAME and
DECOMP_CNT arguments with DECOMP.  Adjust cp_finish_decomp caller.
(cp_convert_range_for): Replace DECOMP_FIRST_NAME and
DECOMP_CNT arguments with DECOMP.  Don't call cp_maybe_mangle_decomp,
adjust cp_finish_decl and cp_finish_decomp callers.
(cp_parser_decomposition_declaration): Don't call
cp_maybe_mangle_decomp, adjust cp_finish_decl and cp_finish_decomp
callers.
(cp_convert_omp_range_for): Adjust do_range_for_auto_deduction
and cp_finish_decomp callers.
(cp_finish_omp_range_for): Don't call cp_maybe_mangle_decomp,
adjust cp_finish_decl and cp_finish_decomp callers.
* pt.cc (tsubst_omp_for_iterator): Adjust tsubst_decomp_names
caller.
(tsubst_decomp_names): Replace FIRST and CNT arguments with DECOMP.
(tsubst_expr): Don't call cp_maybe_mangle_decomp, adjust
tsubst_decomp_names, cp_finish_decl, cp_finish_decomp and
cp_convert_range_for callers.
gcc/testsuite/
* g++.dg/cpp2a/decomp8.C: New test.
* g++.dg/cpp2a/decomp9.C: New test.
* g++.dg/abi/macro0.C: Expect __GXX_ABI_VERSION 1019 rather than
1018.

--- gcc/common.opt.jj   2023-08-28 13:55:55.670370386 +0200
+++ gcc/common.opt  2023-08-31 19:53:31.186280641 +0200
@@ -1010,6 +1010,9 @@ Driver Undocumented
 ; 18: Corrects errors in mangling of lambdas with additional context.
 ; Default in G++ 13.
 ;
+; 19: Emits ABI tags if needed in structured binding mangled names.
+; Default in G++ 14.
+;
 ; Additional positive integers will be assigned as new versions of
 ; the ABI become th

Re: [PATCH] RISC-V: zicond: remove bogus opt2 pattern

2023-08-31 Thread Vineet Gupta




On 8/31/23 06:51, Jeff Law wrote:



On 8/30/23 15:57, Vineet Gupta wrote:

This was tripping up gcc.c-torture/execute/pr60003.c at -O1 since the
pattern semantics can't be expressed by zicond instructions.

This involves test code snippet:

   if (a == 0)
return 0;
   else
return x;
 }

which is equivalent to:  "x = (a != 0) ? x : a"

Isn't it

x = (a == 0) ? 0 : x

Which seems like it ought to fit zicond just fine.


Logically they are equivalent, but 



If we take yours;

x = (a != 0) ? x : a

And simplify with the known value of a on the false arm we get:

x = (a != 0 ) ? x : 0;

Which is equivalent to

x = (a == 0) ? 0 : x;

So ISTM this does fit zicond just fine.


I could very well be mistaken, but define_insn is a pattern match and 
opt2 has *ne* so the expression has to be in != form and thus needs to 
work with that condition. No ?



and matches define_insn "*czero.nez..opt2"

| (insn 41 20 38 3 (set (reg/v:DI 136 [ x ])
|    (if_then_else:DI (ne (reg/v:DI 134 [ a ])
|    (const_int 0 [0]))
|    (reg/v:DI 136 [ x ])
|    (reg/v:DI 134 [ a ]))) {*czero.nez.didi.opt2}

The corresponding asm pattern generates
 czero.nez x, x, a   ; %0, %2, %1
implying
 "x = (a != 0) ? 0 : a"

I get this from the RTL pattern:

x = (a != 0) ? x : a
x = (a != 0) ? x : 0


This is the issue, for ne, czero.nez can only express
x = (a != 0) ? 0 : x



I think you got the arms reversed.


What I meant was czero.nez as specified in RTL pattern would generate x 
= (a != 0) ? 0 : a, whereas pattern's desired semantics is (a != 0) ? x : 0
And that is a problem because after all equivalents/simplifications, a 
ternary operation's middle operand has to be zero to map to czero*, but 
it doesn't for the opt2 RTL semantics.


I've sat on this for 2 days, trying to convince myself I was wrong, but 
as it stands, it was generating wrong code in the test which is fixed 
after the patch.
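
To make the disagreement above concrete, here is a minimal sketch that models the two Zicond instructions as plain functions. It assumes the operand order "czero.xxx rd, rs1, rs2" and the semantics given in the Zicond specification; the function names (czero_eqz, czero_nez, rtl_wanted, check_range) are hypothetical and are not taken from the GCC sources.

```cpp
#include <cassert>
#include <cstdint>

// Assumed Zicond semantics for "czero.xxx rd, rs1, rs2":
//   czero.eqz: rd = (rs2 == 0) ? 0 : rs1
//   czero.nez: rd = (rs2 != 0) ? 0 : rs1
inline std::int64_t czero_eqz(std::int64_t rs1, std::int64_t rs2) {
  return rs2 == 0 ? 0 : rs1;
}
inline std::int64_t czero_nez(std::int64_t rs1, std::int64_t rs2) {
  return rs2 != 0 ? 0 : rs1;
}

// What the RTL in insn 41 asks for: x = (a != 0) ? x : a.
inline std::int64_t rtl_wanted(std::int64_t a, std::int64_t x) {
  return a != 0 ? x : a;
}

// Hypothetical helper: compares both instructions against the RTL
// semantics for every (a, x) pair in a small range.
inline bool check_range() {
  for (std::int64_t a = -3; a <= 3; ++a)
    for (std::int64_t x = -3; x <= 3; ++x) {
      // "czero.eqz x, x, a" always matches the RTL semantics.
      if (czero_eqz(x, a) != rtl_wanted(a, x)) return false;
      // "czero.nez x, x, a" matches only when x happens to be 0.
      bool same = czero_nez(x, a) == rtl_wanted(a, x);
      if (same != (x == 0)) return false;
    }
  return true;
}
```

Under these assumptions both views in the thread hold at once: the RTL expression itself can be expressed with Zicond (via czero.eqz), but the czero.nez that the opt2 pattern emits computes a different value whenever x is nonzero.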


Thx,
-Vineet


Re: [PATCH v2 1/4] LoongArch: improved target configuration interface

2023-08-31 Thread Joseph Myers
On Thu, 31 Aug 2023, Yujie Yang wrote:

> -If none of such suffix is present, the configured value of
> -@option{--with-multilib-default} can be used as a common default suffix
> -for all library ABI variants.  Otherwise, the default build option
> -@code{-march=abi-default} is applied when building the variants without
> -a suffix.
> +If no such suffix is present for a given multilib variant, the
> +configured value of @code{--with-multilib-default} is appended as a default
> +suffix.  If @code{--with-multilib-default} is not given, the default build
> +option @code{-march=abi-default} is applied when building the variants
> +without a suffix.

@option is appropriate for --with-multilib-default and other configure 
options; it shouldn't be changed to @code.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] libstdc++: Use GLIBCXX_CHECK_LINKER_FEATURES for cross-builds (PR111238)

2023-08-31 Thread Jonathan Wakely via Gcc-patches
On Thu, 31 Aug 2023, 18:43 Jonathan Wakely via Libstdc++, <
libstd...@gcc.gnu.org> wrote:

> On Thu, 31 Aug 2023 at 16:26, Christophe Lyon
>  wrote:
> >
> > As discussed in PR104167 (comments #8 and below), and PR111238, using
> > -Wl,-gc-sections in the libstdc++ testsuite for arm-eabi
> > (cross-toolchain) avoids link failures for a few tests:
> >
> > 27_io/filesystem/path/108636.cc
>
> I think this one probably just needs { dg-require-filesystem-ts "" }
> because there's no point testing that we can link to the
> std::filesystem definitions if some of those definitions are unusable
> on the target.
>
> // { dg-additional-options "-Wl,--gc-sections" { target gc_sections } }
>

Oops, ignore this line! I was going to suggest that we could try
adding this line, but I think it's better to use dg-require for the
108636.cc test, and make the ones below just work.



> For the rest of them, does the attached patch help? If arm-eabi
> doesn't define _GLIBCXX_HAVE_READLINK then there's no point even
> trying to call filesystem::read_symlink. We can avoid a useless
> dependency on it by reusing the same preprocessor condition that
> filesystem::read_symlink uses.
>
> > std/time/clock/gps/1.cc
> > std/time/clock/gps/io.cc
> > std/time/clock/tai/1.cc
> > std/time/clock/tai/io.cc
> > std/time/clock/utc/1.cc
> > std/time/clock/utc/io.cc
> > std/time/clock/utc/leap_second_info.cc
> > std/time/exceptions.cc
> > std/time/format.cc
> > std/time/time_zone/get_info_local.cc
> > std/time/time_zone/get_info_sys.cc
> > std/time/tzdb/1.cc
> > std/time/tzdb/leap_seconds.cc
> > std/time/tzdb_list/1.cc
> > std/time/zoned_time/1.cc
> > std/time/zoned_time/custom.cc
> > std/time/zoned_time/io.cc
> > std/time/zoned_traits.cc
> >
> > This patch achieves this by calling GLIBCXX_CHECK_LINKER_FEATURES in
> > cross-build cases, like we already do for native builds. We keep not
> > doing so in Canadian-cross builds.
> >
> > However, this would hide the fact that libstdc++ somehow forces the
> > user to use -Wl,-gc-sections to avoid undefined references to chdir,
> > mkdir, chmod, pathconf, ... so maybe it's better to keep the status
> > quo and not apply this patch?
>
> I'm undecided about this for now, but let's wait for HP's cris-elf
> testing anyway.
>


Re: [PATCH] libstdc++: Use GLIBCXX_CHECK_LINKER_FEATURES for cross-builds (PR111238)

2023-08-31 Thread Jonathan Wakely via Gcc-patches
On Thu, 31 Aug 2023 at 16:26, Christophe Lyon
 wrote:
>
> As discussed in PR104167 (comments #8 and below), and PR111238, using
> -Wl,-gc-sections in the libstdc++ testsuite for arm-eabi
> (cross-toolchain) avoids link failures for a few tests:
>
> 27_io/filesystem/path/108636.cc

I think this one probably just needs { dg-require-filesystem-ts "" }
because there's no point testing that we can link to the
std::filesystem definitions if some of those definitions are unusable
on the target.

// { dg-additional-options "-Wl,--gc-sections" { target gc_sections } }

For the rest of them, does the attached patch help? If arm-eabi
doesn't define _GLIBCXX_HAVE_READLINK then there's no point even
trying to call filesystem::read_symlink. We can avoid a useless
dependency on it by reusing the same preprocessor condition that
filesystem::read_symlink uses.
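
As a rough illustration of the idea (not the attached patch itself), the guard could look like the sketch below. The helper name symlink_target_or_empty is hypothetical; the two _GLIBCXX_HAVE_* macros are the libstdc++ configure macros named in the patch, and on other standard libraries they are simply undefined, so the fallback branch is taken.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>

// Hypothetical helper, not the libstdc++ code: returns the target of a
// symlink, or an empty string if the link cannot be read or the target
// has no readlink support.
inline std::string symlink_target_or_empty(const std::string& link) {
#if defined(_GLIBCXX_HAVE_READLINK) && defined(_GLIBCXX_HAVE_SYS_STAT_H)
  // Only reference read_symlink when it has a chance of working.
  std::error_code ec;
  auto target = std::filesystem::read_symlink(link, ec);
  if (!ec)
    return target.string();
#else
  (void)link;  // no readlink on this target: nothing to call
#endif
  return {};
}
```

In user code this only changes which branch is compiled. Inside a static libstdc++.a the same guard is what avoids pulling in the object that defines read_symlink, and with it the references to the OS functions that object needs.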

> std/time/clock/gps/1.cc
> std/time/clock/gps/io.cc
> std/time/clock/tai/1.cc
> std/time/clock/tai/io.cc
> std/time/clock/utc/1.cc
> std/time/clock/utc/io.cc
> std/time/clock/utc/leap_second_info.cc
> std/time/exceptions.cc
> std/time/format.cc
> std/time/time_zone/get_info_local.cc
> std/time/time_zone/get_info_sys.cc
> std/time/tzdb/1.cc
> std/time/tzdb/leap_seconds.cc
> std/time/tzdb_list/1.cc
> std/time/zoned_time/1.cc
> std/time/zoned_time/custom.cc
> std/time/zoned_time/io.cc
> std/time/zoned_traits.cc
>
> This patch achieves this by calling GLIBCXX_CHECK_LINKER_FEATURES in
> cross-build cases, like we already do for native builds. We keep not
> doing so in Canadian-cross builds.
>
> However, this would hide the fact that libstdc++ somehow forces the
> user to use -Wl,-gc-sections to avoid undefined references to chdir,
> mkdir, chmod, pathconf, ... so maybe it's better to keep the status
> quo and not apply this patch?

I'm undecided about this for now, but let's wait for HP's cris-elf
testing anyway.
commit eea73ea3bdd44a8f7d8c0f54b15bfba9058f6ce8
Author: Jonathan Wakely 
Date:   Thu Aug 31 18:31:32 2023

libstdc++: Avoid useless dependency on read_symlink from tzdb

chrono::tzdb::current_zone uses filesystem::read_symlink, which creates
a dependency on the fs_ops.o object in libstdc++.a, which then creates
dependencies on several OS functions if --gc-sections isn't used.

In the cases where that causes linker failures, we probably don't have
readlink anyway, so the filesystem::read_symlink call will always fail.
Repeat the preprocessor conditions for filesystem::read_symlink in the
body of chrono::tzdb::current_zone so that we don't create the
dependency on fs_ops.o if it's not even going to be able to read the
symlink.

libstdc++-v3/ChangeLog:

* src/c++20/tzdb.cc (tzdb::current_zone): Check configure macros
for POSIX readlink before using filesystem::read_symlink.

diff --git a/libstdc++-v3/src/c++20/tzdb.cc b/libstdc++-v3/src/c++20/tzdb.cc
index 0fcbf6a4824..24044bb60f8 100644
--- a/libstdc++-v3/src/c++20/tzdb.cc
+++ b/libstdc++-v3/src/c++20/tzdb.cc
@@ -1635,6 +1635,9 @@ namespace std::chrono
 // TODO cache this function's result?
 
 #ifndef _AIX
+// Repeat the preprocessor condition used by filesystem::read_symlink,
+// to avoid a dependency on src/c++17/fs_ops.o if it won't work anyway.
+#if defined(_GLIBCXX_HAVE_READLINK) && defined(_GLIBCXX_HAVE_SYS_STAT_H)
 error_code ec;
 // This should be a symlink to e.g. /usr/share/zoneinfo/Europe/London
 auto path = filesystem::read_symlink("/etc/localtime", ec);
@@ -1653,6 +1656,7 @@ namespace std::chrono
  return tz;
  }
   }
+#endif
 // Otherwise, look for a file naming the time zone.
 string_view files[] {
   "/etc/timezone",// Debian derivates


[PATCH] RISC-V Add Types to Un-Typed Thead Instructions:

2023-08-31 Thread Edwin Lu
Related Discussion:
https://inbox.sourceware.org/gcc-patches/12fb5088-3f28-0a69-de1e-f387371a5...@gmail.com/

This patch updates the THEAD instructions to ensure that no insn is left
without a type attribute. 

Tested for regressions using rv32/64 multilib for linux/newlib. 

gcc/Changelog:

* config/riscv/thead.md: Update types

Signed-off-by: Edwin Lu 
---
 gcc/config/riscv/thead.md | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/gcc/config/riscv/thead.md b/gcc/config/riscv/thead.md
index 29f98dec3a8..6e63cb946e4 100644
--- a/gcc/config/riscv/thead.md
+++ b/gcc/config/riscv/thead.md
@@ -169,6 +169,7 @@ (define_insn "th_fmv_hw_w_x"
   "!TARGET_64BIT && TARGET_XTHEADFMV"
   "fmv.w.x\t%0,%2\n\tth.fmv.hw.x\t%0,%1"
   [(set_attr "move_type" "move")
+   (set_attr "type" "fmove")
(set_attr "mode" "DF")])
 
 (define_insn "th_fmv_x_w"
@@ -178,6 +179,7 @@ (define_insn "th_fmv_x_w"
   "!TARGET_64BIT && TARGET_XTHEADFMV"
   "fmv.x.w\t%0,%1"
   [(set_attr "move_type" "move")
+   (set_attr "type" "fmove")
(set_attr "mode" "DF")])
 
 (define_insn "th_fmv_x_hw"
@@ -187,6 +189,7 @@ (define_insn "th_fmv_x_hw"
   "!TARGET_64BIT && TARGET_XTHEADFMV"
   "th.fmv.x.hw\t%0,%1"
   [(set_attr "move_type" "move")
+   (set_attr "type" "fmove")
(set_attr "mode" "DF")])
 
 ;; XTheadMac
@@ -322,6 +325,7 @@ (define_insn "*th_mempair_load_2"
&& th_mempair_operands_p (operands, true, mode)"
   { return th_mempair_output_move (operands, true, mode, UNKNOWN); }
   [(set_attr "move_type" "load")
+   (set_attr "type" "load")
(set_attr "mode" "")])
 
 ;; MEMPAIR store 64/32 bit
@@ -334,6 +338,7 @@ (define_insn "*th_mempair_store_2"
&& th_mempair_operands_p (operands, false, mode)"
   { return th_mempair_output_move (operands, false, mode, UNKNOWN); }
   [(set_attr "move_type" "store")
+   (set_attr "type" "store")
(set_attr "mode" "")])
 
 ;; MEMPAIR load DI extended signed SI
@@ -346,6 +351,7 @@ (define_insn "*th_mempair_load_extendsidi2"
&& th_mempair_operands_p (operands, true, SImode)"
   { return th_mempair_output_move (operands, true, SImode, SIGN_EXTEND); }
   [(set_attr "move_type" "load")
+   (set_attr "type" "load")
(set_attr "mode" "DI")
(set_attr "length" "8")])
 
@@ -359,6 +365,7 @@ (define_insn "*th_mempair_load_zero_extendsidi2"
&& th_mempair_operands_p (operands, true, SImode)"
   { return th_mempair_output_move (operands, true, SImode, ZERO_EXTEND); }
   [(set_attr "move_type" "load")
+   (set_attr "type" "load")
(set_attr "mode" "DI")
(set_attr "length" "8")])
 
-- 
2.34.1



[PATCH] RISC-V: Add Types to Un-Typed Risc-v Instructions:

2023-08-31 Thread Edwin Lu
Related Discussion:
https://inbox.sourceware.org/gcc-patches/12fb5088-3f28-0a69-de1e-f387371a5...@gmail.com/

This patch updates the riscv instructions to ensure that no insn is left
without a type attribute. Added new types: "trap" (self explanatory) and "cbo" 
(for cache related instructions)

Tested for regressions using rv32/64 multilib for linux/newlib. Also tested
rv32/64 gcv for linux.

gcc/Changelog:

* config/riscv/riscv.md: Update/Add types

Signed-off-by: Edwin Lu 
---
 gcc/config/riscv/riscv.md | 112 --
 1 file changed, 82 insertions(+), 30 deletions(-)

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 4041875e0e3..d80b6938f84 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -303,12 +303,14 @@ (define_attr "ext_enabled" "no,yes"
 ;; auipc   integer addition to PC
 ;; sfb_alu  SFB ALU instruction
 ;; nop no operation
+;; trap     trap instruction
 ;; ghost   an instruction that produces no real code
 ;; bitmanip bit manipulation instructions
 ;; clmul    clmul, clmulh, clmulr
 ;; rotate   rotation instructions
 ;; atomic   atomic instructions
 ;; condmove conditional moves
+;; cbo      cache block instructions
 ;; crypto cryptography instructions
 ;; Classification of RVV instructions which will be added to each RVV .md 
pattern and used by scheduler.
 ;; rdvlenb vector byte length vlenb csrr read
@@ -417,9 +419,9 @@ (define_attr "ext_enabled" "no,yes"
 (define_attr "type"
   "unknown,branch,jump,call,load,fpload,store,fpstore,
mtc,mfc,const,arith,logical,shift,slt,imul,idiv,move,fmove,fadd,fmul,
-   fmadd,fdiv,fcmp,fcvt,fsqrt,multi,auipc,sfb_alu,nop,ghost,bitmanip,rotate,
-   clmul,min,max,minu,maxu,clz,ctz,cpop,
-   atomic,condmove,crypto,rdvlenb,rdvl,wrvxrm,wrfrm,rdfrm,vsetvl,
+   fmadd,fdiv,fcmp,fcvt,fsqrt,multi,auipc,sfb_alu,nop,trap,ghost,bitmanip,
+   rotate,clmul,min,max,minu,maxu,clz,ctz,cpop,
+   atomic,condmove,cbo,crypto,rdvlenb,rdvl,wrvxrm,wrfrm,rdfrm,vsetvl,
vlde,vste,vldm,vstm,vlds,vsts,
vldux,vldox,vstux,vstox,vldff,vldr,vstr,

vlsegde,vssegte,vlsegds,vssegts,vlsegdux,vlsegdox,vssegtux,vssegtox,vlsegdff,
@@ -1652,6 +1654,7 @@ (define_insn_and_split "*zero_extendsidi2_internal"
(lshiftrt:DI (match_dup 0) (const_int 32)))]
   { operands[1] = gen_lowpart (DImode, operands[1]); }
   [(set_attr "move_type" "shift_shift,load")
+   (set_attr "type" "load")
(set_attr "mode" "DI")])
 
 (define_expand "zero_extendhi2"
@@ -1680,6 +1683,7 @@ (define_insn_and_split "*zero_extendhi2"
 operands[2] = GEN_INT(GET_MODE_BITSIZE(mode) - 16);
   }
   [(set_attr "move_type" "shift_shift,load")
+   (set_attr "type" "load")
(set_attr "mode" "")])
 
 (define_insn "zero_extendqi2"
@@ -1691,6 +1695,7 @@ (define_insn "zero_extendqi2"
andi\t%0,%1,0xff
lbu\t%0,%1"
   [(set_attr "move_type" "andi,load")
+   (set_attr "type" "multi")
(set_attr "mode" "")])
 
 ;;
@@ -1709,6 +1714,7 @@ (define_insn "extendsidi2"
sext.w\t%0,%1
lw\t%0,%1"
   [(set_attr "move_type" "move,load")
+   (set_attr "type" "multi")
(set_attr "mode" "DI")])
 
 (define_expand "extend2"
@@ -1736,6 +1742,7 @@ (define_insn_and_split 
"*extend2"
 - GET_MODE_BITSIZE (mode));
 }
   [(set_attr "move_type" "shift_shift,load")
+   (set_attr "type" "load")
(set_attr "mode" "SI")])
 
 (define_insn "extendhfsf2"
@@ -1784,6 +1791,7 @@ (define_insn "*movhf_hardfloat"
|| reg_or_0_operand (operands[1], HFmode))"
   { return riscv_output_move (operands[0], operands[1]); }
   [(set_attr "move_type" 
"fmove,fmove,mtc,fpload,fpstore,store,mtc,mfc,move,load,store")
+   (set_attr "type" "fmove")
(set_attr "mode" "HF")])
 
 (define_insn "*movhf_softfloat"
@@ -1794,6 +1802,7 @@ (define_insn "*movhf_softfloat"
|| reg_or_0_operand (operands[1], HFmode))"
   { return riscv_output_move (operands[0], operands[1]); }
   [(set_attr "move_type" "fmove,move,load,store,mtc,mfc")
+   (set_attr "type" "fmove")
(set_attr "mode" "HF")])
 
 ;;
@@ -1888,6 +1897,7 @@ (define_insn "got_load"
   ""
   "la\t%0,%1"
[(set_attr "got" "load")
+(set_attr "type" "load")
 (set_attr "mode" "")])
 
 (define_insn "tls_add_tp_le"
@@ -1910,6 +1920,7 @@ (define_insn "got_load_tls_gd"
   ""
   "la.tls.gd\t%0,%1"
   [(set_attr "got" "load")
+   (set_attr "type" "load")
(set_attr "mode" "")])
 
 (define_insn "got_load_tls_ie"
@@ -1920,6 +1931,7 @@ (define_insn "got_load_tls_ie"
   ""
   "la.tls.ie\t%0,%1"
   [(set_attr "got" "load")
+   (set_attr "type" "load")
(set_attr "mode" "")])
 
 (define_insn "auipc"
@@ -1989,7 +2001,8 @@ (define_insn_and_split "*mvconst_internal"
   riscv_move_integer (operands[0], operands[0], INTVAL (operands[1]),
   mode);
   DONE;
-})
+}
+[(set_attr "type" "move")])
 
 ;; 64-bit integer moves
 
@@ -2011,6 +2024,7 @@ (define_insn "*movdi_32bit"
   { return riscv_output_move (operands[0], operands[1]

[PATCH] MATCH [PR19832]: Optimize some `(a != b) ? a OP b : c`

2023-08-31 Thread Andrew Pinski via Gcc-patches
This patch adds the following match patterns to optimize these:
 /* (a != b) ? (a - b) : 0 -> (a - b) */
 /* (a != b) ? (a ^ b) : 0 -> (a ^ b) */
 /* (a != b) ? (a & b) : a -> (a & b) */
 /* (a != b) ? (a | b) : a -> (a | b) */
 /* (a != b) ? min(a,b) : a -> min(a,b) */
 /* (a != b) ? max(a,b) : a -> max(a,b) */
 /* (a != b) ? (a * b) : (a * a) -> (a * b) */
 /* (a != b) ? (a + b) : (a + a) -> (a + b) */
 /* (a != b) ? (a + b) : (2 * a) -> (a + b) */
Note that currently only integer types (including vector types)
are handled. Floating point types can be added later on.
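
The identities behind these patterns can be checked by brute force for plain int operands. The following is only a sketch for the reader, not part of the patch; the helper names identities_hold and all_identities_hold are made up for illustration, and the range is kept small so that no signed overflow occurs.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: for one (a, b) pair, checks that each conditional
// from the list above equals its unconditional right-hand side.
inline bool identities_hold(int a, int b) {
  bool ne = (a != b);
  return (ne ? a - b : 0) == (a - b)
      && (ne ? (a ^ b) : 0) == (a ^ b)
      && (ne ? (a & b) : a) == (a & b)
      && (ne ? (a | b) : a) == (a | b)
      && (ne ? std::min(a, b) : a) == std::min(a, b)
      && (ne ? std::max(a, b) : a) == std::max(a, b)
      && (ne ? a * b : a * a) == a * b
      && (ne ? a + b : a + a) == a + b
      && (ne ? a + b : 2 * a) == a + b;
}

// Exhaustive check over a small range of operand values.
inline bool all_identities_hold() {
  for (int a = -64; a <= 64; ++a)
    for (int b = -64; b <= 64; ++b)
      if (!identities_hold(a, b)) return false;
  return true;
}
```

In each case the false arm is just the true arm evaluated at a == b, which is why the condition can be dropped.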

OK? Bootstrapped and tested on x86_64-linux-gnu.

The first pattern still shows up in GCC, in cse.c's preferable
function, which was the original motivation for this patch.

PR tree-optimization/19832

gcc/ChangeLog:

* match.pd: Add pattern to optimize
`(a != b) ? a OP b : c`.

gcc/testsuite/ChangeLog:

* g++.dg/opt/vectcond-1.C: New test.
* gcc.dg/tree-ssa/phi-opt-same-1.c: New test.
---
 gcc/match.pd  | 31 ++
 gcc/testsuite/g++.dg/opt/vectcond-1.C | 57 ++
 .../gcc.dg/tree-ssa/phi-opt-same-1.c  | 60 +++
 3 files changed, 148 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/opt/vectcond-1.C
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-same-1.c

diff --git a/gcc/match.pd b/gcc/match.pd
index c01362ee359..487a7e38719 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -5261,6 +5261,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(convert @c0
 #endif
 
+(for cnd (cond vec_cond)
+ /* (a != b) ? (a - b) : 0 -> (a - b) */
+ (simplify
+  (cnd (ne:c @0 @1) (minus@2 @0 @1) integer_zerop)
+  @2)
+ /* (a != b) ? (a ^ b) : 0 -> (a ^ b) */
+ (simplify
+  (cnd (ne:c @0 @1) (bit_xor:c@2 @0 @1) integer_zerop)
+  @2)
+ /* (a != b) ? (a & b) : a -> (a & b) */
+ /* (a != b) ? (a | b) : a -> (a | b) */
+ /* (a != b) ? min(a,b) : a -> min(a,b) */
+ /* (a != b) ? max(a,b) : a -> max(a,b) */
+ (for op (bit_and bit_ior min max)
+  (simplify
+   (cnd (ne:c @0 @1) (op:c@2 @0 @1) @0)
+   @2))
+ /* (a != b) ? (a * b) : (a * a) -> (a * b) */
+ /* (a != b) ? (a + b) : (a + a) -> (a + b) */
+ (for op (mult plus)
+  (simplify
+   (cnd (ne:c @0 @1) (op@2 @0 @1) (op @0 @0))
+   (if (ANY_INTEGRAL_TYPE_P (type))
+@2)))
+ /* (a != b) ? (a + b) : (2 * a) -> (a + b) */
+ (simplify
+  (cnd (ne:c @0 @1) (plus@2 @0 @1) (mult @0 uniform_integer_cst_p@3))
+  (if (wi::to_wide (uniform_integer_cst_p (@3)) == 2)
+   @2))
+)
+
 /* These was part of minmax phiopt.  */
 /* Optimize (a CMP b) ? minmax<a, c> : minmax<b, c>
    to minmax<minmax<a, b>, c> */
diff --git a/gcc/testsuite/g++.dg/opt/vectcond-1.C b/gcc/testsuite/g++.dg/opt/vectcond-1.C
new file mode 100644
index 000..3877ad11414
--- /dev/null
+++ b/gcc/testsuite/g++.dg/opt/vectcond-1.C
@@ -0,0 +1,57 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ccp1 -fdump-tree-optimized" } */
+/* This is the vector version of these optimizations. */
+/* PR tree-optimization/19832 */
+
+#define vector __attribute__((vector_size(sizeof(unsigned)*2)))
+
+static inline vector int max_(vector int a, vector int b)
+{
+   return (a > b)? a : b;
+}
+static inline vector int min_(vector int a, vector int b)
+{
+  return (a < b) ? a : b;
+}
+
+vector int f_minus(vector int a, vector int b)
+{
+  return (a != b) ? a - b : (a - a);
+}
+vector int f_xor(vector int a, vector int b)
+{
+  return (a != b) ? a ^ b : (a ^ a);
+}
+
+vector int f_ior(vector int a, vector int b)
+{
+  return (a != b) ? a | b : (a | a);
+}
+vector int f_and(vector int a, vector int b)
+{
+  return (a != b) ? a & b : (a & a);
+}
+vector int f_max(vector int a, vector int b)
+{
+  return (a != b) ? max_(a, b) : max_(a, a);
+}
+vector int f_min(vector int a, vector int b)
+{
+  return (a != b) ? min_(a, b) : min_(a, a);
+}
+vector int f_mult(vector int a, vector int b)
+{
+  return (a != b) ? a * b : (a * a);
+}
+vector int f_plus(vector int a, vector int b)
+{
+  return (a != b) ? a + b : (a + a);
+}
+vector int f_plus_alt(vector int a, vector int b)
+{
+  return (a != b) ? a + b : (a * 2);
+}
+
+/* All of the above function's VEC_COND_EXPR should have been optimized away. */
+/* { dg-final { scan-tree-dump-not "VEC_COND_EXPR " "ccp1" } } */
+/* { dg-final { scan-tree-dump-not "VEC_COND_EXPR " "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-same-1.c b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-same-1.c
new file mode 100644
index 000..24e757b9b9f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-same-1.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-phiopt1 -fdump-tree-optimized" } */
+/* PR tree-optimization/19832 */
+
+static inline int max_(int a, int b)
+{
+  if (a > b) return a;
+  return b;
+}
+static inline int min_(int a, int b)
+{
+  if (a < b) return a;
+  return b;
+}
+
+int f_minus(int a, int b)
+{
+  if (a != b) return a - b;
+  return a - a;
+}
+int f_xor(int 

Re: [PATCH] c++, v2: Fix up mangling of function/block scope static structured bindings and emit abi tags [PR111069]

2023-08-31 Thread Jason Merrill via Gcc-patches

On 8/28/23 09:58, Jakub Jelinek wrote:

Hi!

On Thu, Aug 24, 2023 at 06:39:10PM +0200, Jakub Jelinek via Gcc-patches wrote:

Maybe do this in mangle_decomp, based on the actual mangling in process
instead of this pseudo-mangling?


Not sure that is possible, for 2 reasons:
1) determine_local_discriminator otherwise works on DECL_NAME, not mangled
names, so if one uses (albeit implementation reserved)
_ZZN1N3fooI1TB3bazEEivEDC1h1iEB6foobar and similar identifiers, they
could clash with the counting of the structured bindings


I guess, but those names are reserved so that we don't need to worry 
about that.



2) seems the local discriminator counting shouldn't take into account
details like abi tags, e.g. if I have:


Right, you'd need to use the partial mangled name before the ABI tags, 
e.g. with get_identifier_with_length (obstack_base, obstack_object_size).


But the way you have it is fine too.


The following updated patch handles everything except it leaves for the
above 2 reasons the determination of local discriminator where it was.
I had to add a new (defaulted) argument to cp_finish_decl and do
cp_maybe_mangle_decomp from there, so that it is after e.g. auto type
deduction and maybe_commonize_var (which had to be changed as well) and
spots in cp_finish_decl where we need or might need mangled names already.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

There is one difference between g++ with this patch and clang++,
g++ uses
_ZZ3barI1TB3quxEivEDC1o1pEB3qux
while clang++ uses
_ZZ3barI1TB3quxEivEDC1o1pE
but from what I can see, such a difference is there also when just using
normal local decls:
struct [[gnu::abi_tag ("foobar")]] S { int i; };
struct [[gnu::abi_tag ("qux")]] T { int i; S j; int k; };

inline int
foo ()
{
   static S c;
   static T d;
   return ++c.i + ++d.i;
}

template <typename>
inline int
bar ()
{
   static S c;
   static T d;
   return ++c.i + ++d.i;
}

int (*p) () = &foo;
int (*q) () = &bar<T>;
where both compilers mangle c in foo as:
_ZZ3foovE1cB6foobar
and d in there as
_ZZ3foovE1dB3qux
and similarly both compilers mangle c in bar as
_ZZ3barI1TB3quxEivE1cB6foobar
but g++ mangles d in bar as
_ZZ3barI1TB3quxEivE1dB3qux
while clang++ mangles it as just
_ZZ3barI1TB3quxEivE1d
No idea what is right or wrong according to Itanium mangling.


I think g++ is right.

This has to do with the ABI "If part of a declaration's type is not 
represented in the mangling, i.e. the type of a variable or a return 
type that is not represented in the mangling of a function, any ABI tags 
on that type (or components of a compound type) that are not also 
present in a mangled part of the type are applied to the name of the 
declaration."


Here the type of bar::d is not mangled, so the tags of T are applied to 
the name d.


I guess clang interpreted the above as "any ABI tags on that type that 
are not already present in the mangled name...", which would also be a 
reasonable rule but is not the actual rule in the ABI.
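
For readers who want to try the structured-binding case this patch is about, a minimal sketch follows. It is not one of the new tests; the type is a simplified variant of T from the example above and the function name next_sum is hypothetical. Static structured bindings are a C++20 feature, and [[gnu::abi_tag]] is a GNU extension.

```cpp
#include <cassert>

// A type carrying an ABI tag, simplified from the example in the thread.
struct [[gnu::abi_tag("qux")]] T { int i; int k; };

// Function-scope static structured binding.  The type of the hidden
// variable behind [o, p] is T, which is not otherwise part of the
// function's mangled name, so the rule quoted above would apply T's
// tags to the binding's name.
inline int next_sum() {
  static auto [o, p] = T{1, 2};
  return ++o + p;  // o persists across calls because the binding is static
}
```

Inspecting the symbol for the binding (for example with nm) before and after the patch, or with -fabi-version=18 versus 19, should show whether the tag is emitted.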



2023-08-28  Jakub Jelinek  

PR c++/111069
gcc/
* common.opt (fabi-version=): Document version 19.
* doc/invoke.texi (-fabi-version=): Likewise.
gcc/c-family/
* c-opts.cc (c_common_post_options): Change latest_abi_version to 19.
gcc/cp/
* cp-tree.h (determine_local_discriminator): Add NAME argument with
NULL_TREE default.
(struct cp_decomp): New type.


Maybe cp_finish_decomp should take this as well?  And 
tsubst_decomp_names, and various other functions with 
decomp_first_name/decomp_cnt parms?



+  if (tree tags = get_abi_tags (decl))
+{
+  /* We didn't emit ABI tags for structured bindings before ABI 19.  */
+  if (!G.need_abi_warning
+  && abi_warn_or_compat_version_crosses (19))
+   G.need_abi_warning = 1;


In general we should probably only warn about mangling changes if 
TREE_PUBLIC (decl).


Jason



Re: [PATCH] libstdc++: Use GLIBCXX_CHECK_LINKER_FEATURES for cross-builds (PR111238)

2023-08-31 Thread Hans-Peter Nilsson via Gcc-patches
> Date: Thu, 31 Aug 2023 17:25:45 +0200
> From: Christophe Lyon via Gcc-patches 

> As discussed in PR104167 (comments #8 and below), and PR111238, using
> -Wl,-gc-sections in the libstdc++ testsuite for arm-eabi
> (cross-toolchain) avoids link failures for a few tests:
> 
> 27_io/filesystem/path/108636.cc
> std/time/clock/gps/1.cc
> std/time/clock/gps/io.cc
> std/time/clock/tai/1.cc
> std/time/clock/tai/io.cc
> std/time/clock/utc/1.cc
> std/time/clock/utc/io.cc
> std/time/clock/utc/leap_second_info.cc
> std/time/exceptions.cc
> std/time/format.cc
> std/time/time_zone/get_info_local.cc
> std/time/time_zone/get_info_sys.cc
> std/time/tzdb/1.cc
> std/time/tzdb/leap_seconds.cc
> std/time/tzdb_list/1.cc
> std/time/zoned_time/1.cc
> std/time/zoned_time/custom.cc
> std/time/zoned_time/io.cc
> std/time/zoned_traits.cc
> 
> This patch achieves this by calling GLIBCXX_CHECK_LINKER_FEATURES in
> cross-build cases, like we already do for native builds. We keep not
> doing so in Canadian-cross builds.
> 
> However, this would hide the fact that libstdc++ somehow forces the
> user to use -Wl,-gc-sections to avoid undefined references to chdir,
> mkdir, chmod, pathconf, ... so maybe it's better to keep the status
> quo and not apply this patch?

Datapoint: no failures for cris-elf in the listed tests -
but it instead passes --gc-sections if -O2 or -O3 is seen
for linking; see cris/cris.h.  It's been like that forever,
modulo a patch in 2002 not passing it if "-r" is seen.

Incidentally, I've been sort-of investigating a recent-ish
commit to newlib (8/8) that added a stub for getpriority,
which was apparently added due to testsuite breakage for
libstdc++ and arm-eabi, but that instead broke testsuite
results for *other* targets, as warning at link-time.  Film
at 11.

> 2023-08-31  Christophe Lyon  
> 
> libstdc++-v3/ChangeLog:
> 
> PR libstdc++/111238
> * configure: Regenerate.
> * configure.ac: Call GLIBCXX_CHECK_LINKER_FEATURES in cross,
> non-Canadian builds.

On this actual patch, I can't say yay or nay though (but
leaning towards yay), but I'll test for cris-elf.  Would you
mind holding off committing for a day or two?

brgds, H-P


Re: [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]

2023-08-31 Thread David Malcolm via Gcc-patches
On Wed, 2023-08-30 at 18:15 -0400, Eric Feng wrote:
> On Tue, Aug 29, 2023 at 5:14 PM David Malcolm 
> wrote:
> > 
> > On Tue, 2023-08-29 at 13:28 -0400, Eric Feng wrote:
> > > Additionally, by using the old model and the pointer per your
> > > suggestion,
> > > we are able to find the representative tree and emit a more
> > > accurate
> > > diagnostic!
> > > 
> > > rc3.c:23:10: warning: expected ‘item’ to have reference count:
> > > ‘1’
> > > but ob_refcnt field is: ‘2’
> > >    23 |   return list;
> > >   |  ^~~~
> > >   ‘create_py_object’: events 1-4
> > >     |
> > >     |    4 |   PyObject* item = PyLong_FromLong(3);
> > >     |  |    ^~
> > >     |  |    |
> > >     |  |    (1) when ‘PyLong_FromLong’
> > > succeeds
> > >     |    5 |   PyObject* list = PyList_New(1);
> > >     |  |    ~
> > >     |  |    |
> > >     |  |    (2) when ‘PyList_New’ succeeds
> > >     |..
> > >     |   14 |   PyList_Append(list, item);
> > >     |  |   ~
> > >     |  |   |
> > >     |  |   (3) when ‘PyList_Append’ succeeds, moving buffer
> > >     |..
> > >     |   23 |   return list;
> > >     |  |  
> > >     |  |  |
> > >     |  |  (4) here
> > >     |
> > 
> > Excellent, that's a big improvement.
> > 
> > > 
> > > If a representative tree is not found, I decided we should just
> > > bail
> > > out
> > > of emitting a diagnostic for now, to avoid confusing the user on
> > > what
> > > the problem is.
> > 
> > Fair enough.
> > 
> > > 
> > > I've attached the patch for this (on top of the previous one)
> > > below.
> > > If
> > > it also looks good, I can merge it with the last patch and push
> > > it in
> > > at
> > > the same time.
> > 
> > I don't mind either way, but please can you update the tests so
> > that we
> > have some automated test coverage that the correct name is being
> > printed in the warning.
> > 
> > Thanks
> > Dave
> > 
> 
> Sorry — forgot to hit 'reply all' in the previous e-mail. Resending
> to
> preserve our chain on the list:
> 
> ---
> 
> Thanks; pushed to trunk with nits fixed:
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=597b9ec69bca8acb7a3d65641c0a730de8b27ed4
> .

Thanks; looks good.

Do you want to add this to the GCC 14 part of the "History" section on
the wiki page:
  https://gcc.gnu.org/wiki/StaticAnalyzer
or should I?

> 
> Incidentally, I updated my formatting settings in VSCode, which I've
> previously mentioned in passing. In case anyone is interested:
> 
> "C_Cpp.clang_format_style": "{ BasedOnStyle: GNU, UseTab: Always,
> TabWidth: 8, IndentWidth: 2, BinPackParameters: false,
> AlignAfterOpenBracket: Align,
> AllowAllParametersOfDeclarationOnNextLine: true }",
> 
> This fixes some issues with the indent width and also ensures
> function
> parameters of appropriate length are aligned properly and on a new
> line each (like the rest of the analyzer code).

Thanks
Dave




Re: [PATCH] Darwin: homogenize spelling of macOS

2023-08-31 Thread FX Coudert via Gcc-patches
Hi,

Thanks Sandra and Iain.
Patch pushed.

FX


Re: [PATCH] Darwin: homogenize spelling of macOS

2023-08-31 Thread Sandra Loosemore

On 8/31/23 05:27, Iain Sandoe wrote:
> Hi FX,
>
> +Sandra
>
>> On 31 Aug 2023, at 12:13, FX Coudert  wrote:
>>
>> This patch homogenizes to some extent the use of “Mac OS X” or “OS X” or “Mac
>> OS” in the gcc/ folder to “macOS”, which is the modern way of writing it. It is
>> not a global replacement though, and each use was audited.
>>
>> - When referring to specific versions that used the “OS X” or “Mac OS” as their
>> name, it was kept.
>> - All uses referring to powerpc*-apple-darwin* were kept as-is, because those
>> versions all predate the change to “macOS”.
>> - I did not touch Ada or D
>> - I did not touch testsuite comments
>>
>> Tested by building on x86_64-apple-darwin, and generating the docs.
>> OK to push?
>
> I think this is useful for user (or configurer)-facing documentation and help
> strings.
>
> Being picky, there is one change where the reference is to 10.9 and earlier
> which are all Mac OS X (but that’s in a code comment so no need to change it).
>
> OK from the Darwin perspective (for the code changes),
> please wait for any comments from Sandra on the documentation changes.


I can't claim any particular knowledge of macOS or its correct historical 
naming, so I'm happy to defer to experts on that.  I did look over the patch 
and didn't spot anything that looked scary, at least.


-Sandra


[PATCH] libstdc++: Use GLIBCXX_CHECK_LINKER_FEATURES for cross-builds (PR111238)

2023-08-31 Thread Christophe Lyon via Gcc-patches
As discussed in PR104167 (comments #8 and below), and PR111238, using
-Wl,-gc-sections in the libstdc++ testsuite for arm-eabi
(cross-toolchain) avoids link failures for a few tests:

27_io/filesystem/path/108636.cc
std/time/clock/gps/1.cc
std/time/clock/gps/io.cc
std/time/clock/tai/1.cc
std/time/clock/tai/io.cc
std/time/clock/utc/1.cc
std/time/clock/utc/io.cc
std/time/clock/utc/leap_second_info.cc
std/time/exceptions.cc
std/time/format.cc
std/time/time_zone/get_info_local.cc
std/time/time_zone/get_info_sys.cc
std/time/tzdb/1.cc
std/time/tzdb/leap_seconds.cc
std/time/tzdb_list/1.cc
std/time/zoned_time/1.cc
std/time/zoned_time/custom.cc
std/time/zoned_time/io.cc
std/time/zoned_traits.cc

This patch achieves this by calling GLIBCXX_CHECK_LINKER_FEATURES in
cross-build cases, like we already do for native builds. We still do
not do so in Canadian-cross builds.

However, this would hide the fact that libstdc++ somehow forces the
user to use -Wl,-gc-sections to avoid undefined references to chdir,
mkdir, chmod, pathconf, ... so maybe it's better to keep the status
quo and not apply this patch?

2023-08-31  Christophe Lyon  

libstdc++-v3/ChangeLog:

PR libstdc++/111238
* configure: Regenerate.
* configure.ac: Call GLIBCXX_CHECK_LINKER_FEATURES in cross,
non-Canadian builds.
From 026b173107f19d4a1bf4e1cd05befa97c65d01f4 Mon Sep 17 00:00:00 2001
From: Christophe Lyon 
Date: Thu, 31 Aug 2023 13:50:16 +
Subject: [PATCH] libstdc++: Use GLIBCXX_CHECK_LINKER_FEATURES for cross-builds
 (PR111238)

As discussed in PR104167 (comments #8 and below), and PR111238, using
-Wl,-gc-sections in the libstdc++ testsuite for arm-eabi
(cross-toolchain) avoids link failures for a few tests:

27_io/filesystem/path/108636.cc
std/time/clock/gps/1.cc
std/time/clock/gps/io.cc
std/time/clock/tai/1.cc
std/time/clock/tai/io.cc
std/time/clock/utc/1.cc
std/time/clock/utc/io.cc
std/time/clock/utc/leap_second_info.cc
std/time/exceptions.cc
std/time/format.cc
std/time/time_zone/get_info_local.cc
std/time/time_zone/get_info_sys.cc
std/time/tzdb/1.cc
std/time/tzdb/leap_seconds.cc
std/time/tzdb_list/1.cc
std/time/zoned_time/1.cc
std/time/zoned_time/custom.cc
std/time/zoned_time/io.cc
std/time/zoned_traits.cc

This patch achieves this by calling GLIBCXX_CHECK_LINKER_FEATURES in
cross-build cases, like we already do for native builds. We still do
not do so in Canadian-cross builds.

However, this would hide the fact that libstdc++ somehow forces the
user to use -Wl,-gc-sections to avoid undefined references to chdir,
mkdir, chmod, pathconf, ... so maybe it's better to keep the status
quo and not apply this patch?

2023-08-31  Christophe Lyon  

libstdc++-v3/ChangeLog:

	PR libstdc++/111238
	* configure: Regenerate.
	* configure.ac: Call GLIBCXX_CHECK_LINKER_FEATURES in cross,
	non-Canadian builds.
---
 libstdc++-v3/configure| 154 ++
 libstdc++-v3/configure.ac |   4 +
 2 files changed, 158 insertions(+)

diff --git a/libstdc++-v3/configure b/libstdc++-v3/configure
index c4da56c3042..948dab4f9a0 100755
--- a/libstdc++-v3/configure
+++ b/libstdc++-v3/configure
@@ -29823,6 +29823,160 @@ else
 CANADIAN=yes
   else
 CANADIAN=no
+  fi
+
+  if test $CANADIAN = no; then
+
+  # If we're not using GNU ld, then there's no point in even trying these
+  # tests.  Check for that first.  We should have already tested for gld
+  # by now (in libtool), but require it now just to be safe...
+  test -z "$SECTION_LDFLAGS" && SECTION_LDFLAGS=''
+  test -z "$OPT_LDFLAGS" && OPT_LDFLAGS=''
+
+
+
+  # The name set by libtool depends on the version of libtool.  Shame on us
+  # for depending on an impl detail, but c'est la vie.  Older versions used
+  # ac_cv_prog_gnu_ld, but now it's lt_cv_prog_gnu_ld, and is copied back on
+  # top of with_gnu_ld (which is also set by --with-gnu-ld, so that actually
+  # makes sense).  We'll test with_gnu_ld everywhere else, so if that isn't
+  # set (hence we're using an older libtool), then set it.
+  if test x${with_gnu_ld+set} != xset; then
+if test x${ac_cv_prog_gnu_ld+set} != xset; then
+  # We got through "ac_require(ac_prog_ld)" and still not set?  Huh?
+  with_gnu_ld=no
+else
+  with_gnu_ld=$ac_cv_prog_gnu_ld
+fi
+  fi
+
+  # Start by getting the version number.  I think the libtool test already
+  # does some of this, but throws away the result.
+  glibcxx_ld_is_gold=no
+  glibcxx_ld_is_mold=no
+  if test x"$with_gnu_ld" = x"yes"; then
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for ld version" >&5
+$as_echo_n "checking for ld version... " >&6; }
+
+if $LD --version 2>/dev/null | grep 'GNU gold' >/dev/null 2>&1; then
+  glibcxx_ld_is_gold=yes
+elif $LD --version 2>/dev/null | grep 'mold' >/dev/null 2>&1; then
+  glibcxx_ld_is_mold=yes
+fi
+ldver=`$LD --version 2>/dev/null |
+	   sed -e 's/[. ][0-9]\{8\}$//;s/.* \([^ ]\{1,\}\)$/\1/; q'`
+
+glibcxx_gnu_ld_version=`echo $ldver | \
+	   $AWK 

[PATCH] lra: Avoid unfolded plus-0

2023-08-31 Thread Richard Sandiford via Gcc-patches
While backporting another patch to an earlier release, I hit a
situation in which lra_eliminate_regs_1 would eliminate an address to:

(plus (reg:P R) (const_int 0))

This address compared not-equal to plain:

(reg:P R)

which caused an ICE in a later peephole2.  (The ICE showed up in
gfortran.fortran-torture/compile/pr80464.f90 on the branch but seems
to be latent on trunk.)

These unfolded PLUSes shouldn't occur in the insn stream, and later code
in the same function tried to avoid them.

Tested on aarch64-linux-gnu so far, but I'll test on x86_64-linux-gnu too.
Does this look OK?

There are probably other instances of the same thing elsewhere,
but it seemed safer to stick to the one that caused the issue.

Thanks,
Richard


gcc/
* lra-eliminations.cc (lra_eliminate_regs_1): Use simplify_gen_binary
rather than gen_rtx_PLUS.
---
 gcc/lra-eliminations.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/lra-eliminations.cc b/gcc/lra-eliminations.cc
index df613cdda76..4daaff1a124 100644
--- a/gcc/lra-eliminations.cc
+++ b/gcc/lra-eliminations.cc
@@ -406,7 +406,7 @@ lra_eliminate_regs_1 (rtx_insn *insn, rtx x, machine_mode mem_mode,
elimination_fp2sp_occured_p = true;
 
  if (! update_p && ! full_p)
-   return gen_rtx_PLUS (Pmode, to, XEXP (x, 1));
+   return simplify_gen_binary (PLUS, Pmode, to, XEXP (x, 1));
 
  if (maybe_ne (update_sp_offset, 0))
offset = ep->to_rtx == stack_pointer_rtx ? update_sp_offset : 0;
-- 
2.25.1
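
To illustrate why an unfolded plus-0 is a problem, here is a minimal sketch using a toy expression type. The names Expr, make_reg, raw_plus and folded_plus are hypothetical and are not GCC's RTL API. raw_plus only mimics the spirit of gen_rtx_PLUS (always build the node), and folded_plus mimics the spirit of simplify_gen_binary (fold where possible, otherwise build the node).

```cpp
// Toy stand-in for RTL expressions; none of these names exist in GCC.
enum class Code { Reg, Plus };

struct Expr {
  Code code;    // kind of expression
  int regno;    // register number (for Plus, the register operand)
  long addend;  // constant operand, meaningful only for Plus
};

// Structural comparison: two expressions are equal only if they have
// the same shape and the same operands.
constexpr bool same_expr(const Expr &a, const Expr &b) {
  return a.code == b.code && a.regno == b.regno && a.addend == b.addend;
}

constexpr Expr make_reg(int regno) { return {Code::Reg, regno, 0}; }

// Always builds a Plus node, even when the addend is zero.
constexpr Expr raw_plus(int regno, long addend) {
  return {Code::Plus, regno, addend};
}

// Folds "reg + 0" to the bare register before building a node.
constexpr Expr folded_plus(int regno, long addend) {
  return addend == 0 ? make_reg(regno) : raw_plus(regno, addend);
}

// The unfolded form compares not-equal to the plain register ...
static_assert(!same_expr(raw_plus(3, 0), make_reg(3)),
              "raw plus-0 is structurally different from the register");
// ... while the folded form compares equal, and nonzero addends are kept.
static_assert(same_expr(folded_plus(3, 0), make_reg(3)),
              "folded plus-0 is the register itself");
static_assert(same_expr(folded_plus(3, 8), raw_plus(3, 8)),
              "a nonzero addend still produces a Plus node");
```

Any later pass that relies on structural equality, as the peephole2 in the report did, sees the two forms as different unless the constructor folds.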



[PATCH] aarch64: Fix return register handling in untyped_call

2023-08-31 Thread Richard Sandiford via Gcc-patches
While working on another patch, I hit a problem with the aarch64
expansion of untyped_call.  The expander emits the usual:

  (set (mem ...) (reg resN))

instructions to store the result registers to memory, but it didn't
say in RTL where those resN results came from.  This eventually led
to a failure of gcc.dg/torture/stackalign/builtin-return-2.c,
via regrename.

This patch turns the untyped call from a plain call to a call_value,
to represent that the call returns (or might return) a useful value.
The patch also uses a PARALLEL return rtx to represent all the possible
return registers.

Tested on aarch64-linux-gnu & pushed.

Richard


gcc/
* config/aarch64/aarch64.md (untyped_call): Emit a call_value
rather than a call.  List each possible destination register
in the call pattern.
---
 gcc/config/aarch64/aarch64.md | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 01cf989641f..6f7827bd8c9 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1170,9 +1170,27 @@ (define_expand "untyped_call"
 {
   int i;
 
+  /* Generate a PARALLEL that contains all of the register results.
+ The offsets are somewhat arbitrary, since we don't know the
+ actual return type.  The main thing we need to avoid is having
+ overlapping byte ranges, since those might give the impression
+ that two registers are known to have data in common.  */
+  rtvec rets = rtvec_alloc (XVECLEN (operands[2], 0));
+  poly_int64 offset = 0;
+  for (i = 0; i < XVECLEN (operands[2], 0); i++)
+{
+  rtx reg = SET_SRC (XVECEXP (operands[2], 0, i));
+  gcc_assert (REG_P (reg));
+  rtx offset_rtx = gen_int_mode (offset, Pmode);
+  rtx piece = gen_rtx_EXPR_LIST (VOIDmode, reg, offset_rtx);
+  RTVEC_ELT (rets, i) = piece;
+  offset += GET_MODE_SIZE (GET_MODE (reg));
+}
+  rtx ret = gen_rtx_PARALLEL (VOIDmode, rets);
+
   /* Untyped calls always use the default ABI.  It's only possible to use
  ABI variants if we know the type of the target function.  */
-  emit_call_insn (gen_call (operands[0], const0_rtx, const0_rtx));
+  emit_call_insn (gen_call_value (ret, operands[0], const0_rtx, const0_rtx));
 
   for (i = 0; i < XVECLEN (operands[2], 0); i++)
 {
-- 
2.25.1
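
The offset bookkeeping in the new loop can be sketched outside GCC as follows. This is only an illustration under stated assumptions: Piece, layout and non_overlapping are hypothetical names, plain std::size_t stands in for poly_int64, and the register sizes are made up. It shows that accumulating each piece's size into a running offset gives byte ranges that never overlap, which is the property the patch comment asks for.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for one (register, offset) entry of the PARALLEL.
struct Piece {
  std::size_t offset;  // byte offset assigned to this register
  std::size_t size;    // size of the register's mode in bytes
};

// Assign each register a byte range that starts where the previous one ended.
template <std::size_t N>
constexpr std::array<Piece, N> layout(const std::array<std::size_t, N> &sizes) {
  std::array<Piece, N> out{};
  std::size_t offset = 0;
  for (std::size_t i = 0; i < N; ++i) {
    out[i].offset = offset;
    out[i].size = sizes[i];
    offset += sizes[i];
  }
  return out;
}

// True if no piece starts before the previous piece has ended.
template <std::size_t N>
constexpr bool non_overlapping(const std::array<Piece, N> &pieces) {
  for (std::size_t i = 1; i < N; ++i)
    if (pieces[i].offset < pieces[i - 1].offset + pieces[i - 1].size)
      return false;
  return true;
}

// Made-up sizes: an 8-byte, a 16-byte and another 8-byte register.
constexpr std::array<std::size_t, 3> example_sizes{8, 16, 8};
constexpr auto example_pieces = layout(example_sizes);

static_assert(example_pieces[0].offset == 0, "first piece starts at 0");
static_assert(example_pieces[1].offset == 8, "second piece follows the first");
static_assert(example_pieces[2].offset == 24, "third piece follows the second");
static_assert(non_overlapping(example_pieces), "byte ranges do not overlap");
```

The sketch needs C++17 because it writes to a std::array inside a constexpr function.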



Re: [PATCH v2 3/3] libstdc++: Optimize is_fundamental performance by __is_arithmetic built-in

2023-08-31 Thread Ken Matsui via Gcc-patches
On Thu, Aug 31, 2023 at 6:57 AM Ken Matsui  wrote:
>
> On Tue, Aug 8, 2023 at 1:19 PM Jonathan Wakely  wrote:
> >
> >
> >
> > On Tue, 18 Jul 2023 at 07:25, Ken Matsui via Libstdc++ 
> >  wrote:
> >>
> >> Hi,
> >>
> >> I took a benchmark for this.
> >>
> >> https://github.com/ken-matsui/gcc-benches/blob/main/is_fundamental-disjunction.md#mon-jul-17-105937-pm-pdt-2023
> >>
> >> template<typename _Tp>
> >> struct is_fundamental
> >> : public std::bool_constant<__is_arithmetic(_Tp)
> >> || std::is_void<_Tp>::value
> >> || std::is_null_pointer<_Tp>::value>
> >> { };
> >>
> >> is faster than:
> >>
> >> template<typename _Tp>
> >> struct is_fundamental
> >> : public std::bool_constant<__is_arithmetic(_Tp)
> >> || std::disjunction<std::is_void<_Tp>,
> >> std::is_null_pointer<_Tp>
> >> >::value>
> >> { };
> >>
> >> Time: -32.2871%
> >> Peak Memory: -18.5071%
> >> Total Memory: -20.1991%
> >
> >
> > But what about the fallback implementation of is_fundamental where we don't 
> > have the __is_arithmetic built-in?
>
> That fallback implementation would be this:
> https://github.com/ken-matsui/gsoc23/blob/967e20770599f2a8925c9794669111faef11beb7/is_fundamental.cc#L11-L15.
>
> The is_fundamental-disjunction.cc benchmark used the USE_BUILTIN
> macro, but in this benchmark, I used it to just switch two different
> implementations that use the __is_arithmetic built-in.
>
> > -: public __or_<is_arithmetic<_Tp>, is_void<_Tp>,
> > -  is_null_pointer<_Tp>>::type
> > +: public __bool_constant<is_arithmetic<_Tp>::value
> > + || is_void<_Tp>::value
> > + || is_null_pointer<_Tp>::value>
> >
> > Here the use of __or_ means that for is_fundamental we don't 
> > instantiate is_void and is_null_pointer. Isn't that still 
> > worthwhile?
> >
> Let me take a benchmark with __or_ later! We may see a difference.
>
Here is the benchmark result with __or_:

https://github.com/ken-matsui/gsoc23/blob/main/is_fundamental-disjunction.md#thu-aug-31-075127-am-pdt-2023

Time: -23.3935%
Peak Memory: -10.2915%
Total Memory: -14.4165%

Considering the following was with disjunction, __or_ is faster than
disjunction, but still just || seems much faster.

Time: -32.2871%
Peak Memory: -18.5071%
Total Memory: -20.1991%
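
For readers who want to see the two shapes being compared side by side, here is a minimal standalone sketch. It makes several assumptions: std::is_arithmetic stands in for the proposed __is_arithmetic built-in (which may not be available in a given compiler), std::disjunction stands in for the internal __or_, and the names arithmetic_v, is_fundamental_or and is_fundamental_disj are hypothetical. It shows only that both forms compute the same result. It says nothing about compile-time cost, which is what the benchmark measures.

```cpp
#include <cstddef>
#include <type_traits>

// Stand-in for the proposed __is_arithmetic(T) built-in (assumption).
template <typename T>
constexpr bool arithmetic_v = std::is_arithmetic<T>::value;

// Variant 1: plain ||. Naming ::value on each trait instantiates
// is_void<T> and is_null_pointer<T> even when the first operand is true.
template <typename T>
struct is_fundamental_or
  : std::bool_constant<arithmetic_v<T>
                       || std::is_void<T>::value
                       || std::is_null_pointer<T>::value>
{ };

// Variant 2: a short-circuiting disjunction over the last two traits.
// It can skip instantiating is_null_pointer<T> when is_void<T> is true,
// at the price of instantiating the disjunction template itself.
template <typename T>
struct is_fundamental_disj
  : std::bool_constant<arithmetic_v<T>
                       || std::disjunction<std::is_void<T>,
                                           std::is_null_pointer<T>>::value>
{ };

// Both variants agree on representative types.
static_assert(is_fundamental_or<int>::value, "int is fundamental");
static_assert(is_fundamental_or<void>::value, "void is fundamental");
static_assert(is_fundamental_or<std::nullptr_t>::value,
              "nullptr_t is fundamental");
static_assert(!is_fundamental_or<int *>::value, "pointers are not");
static_assert(is_fundamental_disj<int>::value
              && is_fundamental_disj<void>::value
              && is_fundamental_disj<std::nullptr_t>::value
              && !is_fundamental_disj<int *>::value,
              "the disjunction form gives the same answers");
```

The sketch needs C++17 for std::bool_constant and std::disjunction.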

> >
> >
> >
> >>
> >>
> >> Sincerely,
> >> Ken Matsui
> >>
> >> On Sun, Jul 16, 2023 at 9:49 PM Ken Matsui  
> >> wrote:
> >> >
> >> > On Sun, Jul 16, 2023 at 5:41 AM François Dumont  
> >> > wrote:
> >> > >
> >> > >
> >> > > On 15/07/2023 06:55, Ken Matsui via Libstdc++ wrote:
> >> > > > This patch optimizes the performance of the is_fundamental trait by
> >> > > > dispatching to the new __is_arithmetic built-in trait.
> >> > > >
> >> > > > libstdc++-v3/ChangeLog:
> >> > > >
> >> > > >   * include/std/type_traits (is_fundamental_v): Use 
> >> > > > __is_arithmetic
> >> > > >   built-in trait.
> >> > > >   (is_fundamental): Likewise. Optimize the original 
> >> > > > implementation.
> >> > > >
> >> > > > Signed-off-by: Ken Matsui 
> >> > > > ---
> >> > > >   libstdc++-v3/include/std/type_traits | 21 +
> >> > > >   1 file changed, 17 insertions(+), 4 deletions(-)
> >> > > >
> >> > > > diff --git a/libstdc++-v3/include/std/type_traits 
> >> > > > b/libstdc++-v3/include/std/type_traits
> >> > > > index 7ebbe04c77b..cf24de2fcac 100644
> >> > > > --- a/libstdc++-v3/include/std/type_traits
> >> > > > +++ b/libstdc++-v3/include/std/type_traits
> >> > > > @@ -668,11 +668,21 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >> > > >   #endif
> >> > > >
> >> > > > /// is_fundamental
> >> > > > +#if __has_builtin(__is_arithmetic)
> >> > > > +  template<typename _Tp>
> >> > > > +struct is_fundamental
> >> > > > +: public __bool_constant<__is_arithmetic(_Tp)
> >> > > > + || is_void<_Tp>::value
> >> > > > + || is_null_pointer<_Tp>::value>
> >> > > > +{ };
> >> > >
> >> > > What about doing this?
> >> > >
> >> > >template<typename _Tp>
> >> > >  struct is_fundamental
> >> > >  : public __bool_constant<__is_arithmetic(_Tp)
> >> > >   || __or_<is_void<_Tp>,
> >> > >   is_null_pointer<_Tp>>::value>
> >> > >  { };
> >> > >
> >> > > Based on your benches it seems that builtin __is_arithmetic is much
> >> > > better than std::is_arithmetic. But __or_ could still avoid
> >> > > instantiation of is_null_pointer.
> >> > >
> >> > Let me take a benchmark for this later.
> >>


Re: [pushed] analyzer: fix ICE in text art strings support

2023-08-31 Thread Prathamesh Kulkarni via Gcc-patches
On Wed, 30 Aug 2023 at 19:20, David Malcolm  wrote:
>
> On Wed, 2023-08-30 at 11:52 +0530, Prathamesh Kulkarni wrote:
> > On Wed, 30 Aug 2023 at 04:21, David Malcolm 
> > wrote:
> > >
> > > On Tue, 2023-08-29 at 11:01 +0530, Prathamesh Kulkarni wrote:
> > > > On Fri, 25 Aug 2023 at 18:15, David Malcolm via Gcc-patches
> > > >  wrote:
> > > > >
> > > > > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> > > > > Pushed to trunk as r14-3481-g99a3fcb8ff0bf2.
> > > > Hi David,
> > > > It seems the new tests FAIL on arm for LTO bootstrap config:
> > > > https://ci.linaro.org/job/tcwg_bootstrap_check--master-arm-check_bootstrap_lto-build/263/artifact/artifacts/06-check_regression/fails.sum/*view*/
> > >
> > > Sorry about this.
> > >
> > > Looking at e.g. the console.log.xz, I just see the status of the
> > > failing tests.
> > >
> > > Is there an easy way to get at the stderr from the tests without
> > > rerunning this?
> > >
> > > Otherwise, I'd appreciate help with reproducing this.
> > Hi David,
> > I have attached make check log for the failing tests.
> > To reproduce, I configured and built gcc with following options on
> > armv8 machine:
> > ../gcc/configure --enable-languages=c,c++,fortran --with-float=hard
> > --with-fpu=neon-fp-armv8 --with-mode=thumb --with-arch=armv8-a
> > --disable-werror --with-build-config=bootstrap-lto
> > make -j$(nproc)
>
> Thanks.
>
> Looks a lot like PR analyzer/110483, which I'm working on now (sorry!)
>
> What's the endianness of the host?
Little endian. It was built natively (host == target) on
armv8l-unknown-linux-gnueabihf.
>
>
> Specifically, the pertinent part of the log is:
>
> FAIL: gcc.dg/analyzer/out-of-bounds-diagram-17.c (test for excess errors)
> Excess errors:
>┌─┬─┬┬┬┐┌─┬─┬─┐
>│ [1] │ [1] │[1] │[1] │[1] ││ [1] │ [1] │ [1] │
>├─┼─┼┼┼┤├─┼─┼─┤
>│ ' ' │ 'w' │'o' │'r' │'l' ││ 'd' │ '!' │ NUL │
>├─┴─┴┴┴┴┴─┴─┴─┤
>│  string literal (type: 'char[8]')   │
>└─┘
>   │ ││││  │ │ │
>   │ ││││  │ │ │
>   v vvvv  v v v
>   ┌─┬┬┐┌─┐
>   │ [0] │  ...   │[9] ││ │
>   ├─┴┴┤│after valid range│
>   │ 'buf' (type: 'char[10]')  ││ │
>   └───┘└─┘
>   ├─┬─┤├┬┤
> │   │
>   ╭─┴╮╭─┴─╮
>   │capacity: 10 bytes││overflow of 3 bytes│
>   ╰──╯╰───╯
>
> where the issue seems to be all those [1], which are meant to be index
> [0], [1], [2], etc.
Oh OK, thanks for the clarification!

Thanks,
Prathamesh
>
>
> Dave


RE: [PATCH v1] RISC-V: Support rounding mode for VFMSAC/VFMSUB autovec

2023-08-31 Thread Li, Pan2 via Gcc-patches
Committed, thanks Kito.

Pan

-Original Message-
From: Kito Cheng  
Sent: Thursday, August 31, 2023 9:09 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang 

Subject: Re: [PATCH v1] RISC-V: Support rounding mode for VFMSAC/VFMSUB autovec

LGTM

On Thu, Aug 24, 2023 at 3:13 PM Pan Li via Gcc-patches
 wrote:
>
> From: Pan Li 
>
> There will be a case like below for intrinsic and autovec combination.
>
> vfadd RTZ   <- intrinsic static rounding
> vfmsub  <- autovec/autovec-opt
>
> The autovec generated vfmsub should take DYN mode, and the
> frm must be restored before the vfmsub insn. This patch
> would like to fix this issue by:
>
> * Add the frm operand to the autovec/autovec-opt pattern.
> * Set the frm_mode attr to DYN.
>
> Thus, the frm flow when combining autovec and intrinsic should be:
>
> +
> | frrm  a5
> | ...
> | fsrmi 4
> | vfadd   <- intrinsic static rounding.
> | ...
> | fsrm  a5
> | vfmsub  <- autovec/autovec-opt
> | ...
> +
>
> Signed-off-by: Pan Li 
>
> gcc/ChangeLog:
>
> * config/riscv/autovec-opt.md: Add FRM_REGNUM to vfmsac/vfmsub
> * config/riscv/autovec.md: Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/float-point-frm-autovec-2.c: New test.
> ---
>  gcc/config/riscv/autovec-opt.md   | 36 
>  gcc/config/riscv/autovec.md   | 30 ---
>  .../rvv/base/float-point-frm-autovec-2.c  | 88 +++
>  3 files changed, 127 insertions(+), 27 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-autovec-2.c
>
> diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
> index 4b07e80ad95..732a51edacd 100644
> --- a/gcc/config/riscv/autovec-opt.md
> +++ b/gcc/config/riscv/autovec-opt.md
> @@ -583,13 +583,15 @@ (define_insn_and_split "*single_widen_fnma"
>  ;; vect__13.182_33 = .FMS (vect__11.180_35, vect__8.176_40, vect__4.172_45);
>  (define_insn_and_split "*double_widen_fms"
>[(set (match_operand:VWEXTF 0 "register_operand")
> -   (fma:VWEXTF
> - (float_extend:VWEXTF
> -   (match_operand: 2 "register_operand"))
> - (float_extend:VWEXTF
> -   (match_operand: 3 "register_operand"))
> - (neg:VWEXTF
> -   (match_operand:VWEXTF 1 "register_operand"]
> +   (unspec:VWEXTF
> + [(fma:VWEXTF
> +   (float_extend:VWEXTF
> + (match_operand: 2 "register_operand"))
> +   (float_extend:VWEXTF
> + (match_operand: 3 "register_operand"))
> +   (neg:VWEXTF
> + (match_operand:VWEXTF 1 "register_operand")))
> +  (reg:SI FRM_REGNUM)] UNSPEC_VFFMA))]
>"TARGET_VECTOR && can_create_pseudo_p ()"
>"#"
>"&& 1"
> @@ -600,17 +602,20 @@ (define_insn_and_split "*double_widen_fms"
>  DONE;
>}
>[(set_attr "type" "vfwmuladd")
> -   (set_attr "mode" "")])
> +   (set_attr "mode" "")
> +   (set (attr "frm_mode") (symbol_ref "riscv_vector::FRM_DYN"))])
>
>  ;; This helps to match ext + fms.
>  (define_insn_and_split "*single_widen_fms"
>[(set (match_operand:VWEXTF 0 "register_operand")
> -   (fma:VWEXTF
> - (float_extend:VWEXTF
> -   (match_operand: 2 "register_operand"))
> - (match_operand:VWEXTF 3 "register_operand")
> - (neg:VWEXTF
> -   (match_operand:VWEXTF 1 "register_operand"]
> +   (unspec:VWEXTF
> + [(fma:VWEXTF
> +   (float_extend:VWEXTF
> + (match_operand: 2 "register_operand"))
> +   (match_operand:VWEXTF 3 "register_operand")
> +   (neg:VWEXTF
> + (match_operand:VWEXTF 1 "register_operand")))
> +  (reg:SI FRM_REGNUM)] UNSPEC_VFFMA))]
>"TARGET_VECTOR && can_create_pseudo_p ()"
>"#"
>"&& 1"
> @@ -627,7 +632,8 @@ (define_insn_and_split "*single_widen_fms"
>  DONE;
>}
>[(set_attr "type" "vfwmuladd")
> -   (set_attr "mode" "")])
> +   (set_attr "mode" "")
> +   (set (attr "frm_mode") (symbol_ref "riscv_vector::FRM_DYN"))])
>
>  ;; -
>  ;;  [FP] VFWNMACC
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index 4894986d2a5..d9f1a10eb66 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -1218,24 +1218,29 @@ (define_insn_and_split "*fnma"
>  (define_expand "fms4"
>[(parallel
>  [(set (match_operand:VF 0 "register_operand")
> - (fma:VF
> -   (match_operand:VF 1 "register_operand")
> -   (match_operand:VF 2 "register_operand")
> -   (neg:VF
> - (match_operand:VF 3 "register_operand"
> + (unspec:VF
> +   [(fma:VF
> + (match_operand:VF 1 "register_operand")
> + (match_operand:VF 2 "register_operand")
> + (neg:VF
> +   (match_operand:VF 3 "register_operand"))

Re: [PATCH v2 3/3] libstdc++: Optimize is_fundamental performance by __is_arithmetic built-in

2023-08-31 Thread Ken Matsui via Gcc-patches
On Tue, Aug 8, 2023 at 1:19 PM Jonathan Wakely  wrote:
>
>
>
> On Tue, 18 Jul 2023 at 07:25, Ken Matsui via Libstdc++ 
>  wrote:
>>
>> Hi,
>>
>> I took a benchmark for this.
>>
>> https://github.com/ken-matsui/gcc-benches/blob/main/is_fundamental-disjunction.md#mon-jul-17-105937-pm-pdt-2023
>>
>> template<typename _Tp>
>> struct is_fundamental
>> : public std::bool_constant<__is_arithmetic(_Tp)
>> || std::is_void<_Tp>::value
>> || std::is_null_pointer<_Tp>::value>
>> { };
>>
>> is faster than:
>>
>> template<typename _Tp>
>> struct is_fundamental
>> : public std::bool_constant<__is_arithmetic(_Tp)
>> || std::disjunction<std::is_void<_Tp>,
>> std::is_null_pointer<_Tp>
>> >::value>
>> { };
>>
>> Time: -32.2871%
>> Peak Memory: -18.5071%
>> Total Memory: -20.1991%
>
>
> But what about the fallback implementation of is_fundamental where we don't 
> have the __is_arithmetic built-in?

That fallback implementation would be this:
https://github.com/ken-matsui/gsoc23/blob/967e20770599f2a8925c9794669111faef11beb7/is_fundamental.cc#L11-L15.

The is_fundamental-disjunction.cc benchmark used the USE_BUILTIN
macro, but in this benchmark, I used it to just switch two different
implementations that use the __is_arithmetic built-in.

> -: public __or_<is_arithmetic<_Tp>, is_void<_Tp>,
> -  is_null_pointer<_Tp>>::type
> +: public __bool_constant<is_arithmetic<_Tp>::value
> + || is_void<_Tp>::value
> + || is_null_pointer<_Tp>::value>
>
> Here the use of __or_ means that for is_fundamental we don't instantiate 
> is_void and is_null_pointer. Isn't that still worthwhile?
>
Let me take a benchmark with __or_ later! We may see a difference.

>
>
>
>>
>>
>> Sincerely,
>> Ken Matsui
>>
>> On Sun, Jul 16, 2023 at 9:49 PM Ken Matsui  wrote:
>> >
>> > On Sun, Jul 16, 2023 at 5:41 AM François Dumont  
>> > wrote:
>> > >
>> > >
>> > > On 15/07/2023 06:55, Ken Matsui via Libstdc++ wrote:
>> > > > This patch optimizes the performance of the is_fundamental trait by
>> > > > dispatching to the new __is_arithmetic built-in trait.
>> > > >
>> > > > libstdc++-v3/ChangeLog:
>> > > >
>> > > >   * include/std/type_traits (is_fundamental_v): Use __is_arithmetic
>> > > >   built-in trait.
>> > > >   (is_fundamental): Likewise. Optimize the original implementation.
>> > > >
>> > > > Signed-off-by: Ken Matsui 
>> > > > ---
>> > > >   libstdc++-v3/include/std/type_traits | 21 +
>> > > >   1 file changed, 17 insertions(+), 4 deletions(-)
>> > > >
>> > > > diff --git a/libstdc++-v3/include/std/type_traits 
>> > > > b/libstdc++-v3/include/std/type_traits
>> > > > index 7ebbe04c77b..cf24de2fcac 100644
>> > > > --- a/libstdc++-v3/include/std/type_traits
>> > > > +++ b/libstdc++-v3/include/std/type_traits
>> > > > @@ -668,11 +668,21 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>> > > >   #endif
>> > > >
>> > > > /// is_fundamental
>> > > > +#if __has_builtin(__is_arithmetic)
>> > > > +  template<typename _Tp>
>> > > > +struct is_fundamental
>> > > > +: public __bool_constant<__is_arithmetic(_Tp)
>> > > > + || is_void<_Tp>::value
>> > > > + || is_null_pointer<_Tp>::value>
>> > > > +{ };
>> > >
>> > > What about doing this?
>> > >
>> > >template<typename _Tp>
>> > >  struct is_fundamental
>> > >  : public __bool_constant<__is_arithmetic(_Tp)
>> > >   || __or_<is_void<_Tp>,
>> > >   is_null_pointer<_Tp>>::value>
>> > >  { };
>> > >
>> > > Based on your benches it seems that builtin __is_arithmetic is much
>> > > better than std::is_arithmetic. But __or_ could still avoid
>> > > instantiation of is_null_pointer.
>> > >
>> > Let me take a benchmark for this later.
>>


Re: [PATCH] RISC-V: zicond: remove bogus opt2 pattern

2023-08-31 Thread Jeff Law via Gcc-patches




On 8/30/23 15:57, Vineet Gupta wrote:
> This was tripping up gcc.c-torture/execute/pr60003.c at -O1 since the
> pattern semantics can't be expressed by zicond instructions.
>
> This involves test code snippet:
>
>    if (a == 0)
>      return 0;
>    else
>      return x;
>  }
>
> which is equivalent to:  "x = (a != 0) ? x : a"

Isn't it

x = (a == 0) ? 0 : x

Which seems like it ought to fit zicond just fine.

If we take yours;

x = (a != 0) ? x : a

And simplify with the known value of a on the false arm we get:

x = (a != 0) ? x : 0;

Which is equivalent to

x = (a == 0) ? 0 : x;

So ISTM this does fit zicond just fine.

> and matches define_insn "*czero.nez..opt2"
>
> | (insn 41 20 38 3 (set (reg/v:DI 136 [ x ])
> |(if_then_else:DI (ne (reg/v:DI 134 [ a ])
> |(const_int 0 [0]))
> |(reg/v:DI 136 [ x ])
> |(reg/v:DI 134 [ a ]))) {*czero.nez.didi.opt2}
>
> The corresponding asm pattern generates
>  czero.nez x, x, a   ; %0, %2, %1
> implying
>  "x = (a != 0) ? 0 : a"

I get this from the RTL pattern:

x = (a != 0) ? x : a
x = (a != 0) ? x : 0

I think you got the arms reversed.

> which is not what the pattern semantics are.
>
> Essentially "(a != 0) ? x : a" cannot be expressed with CZERO.nez

Agreed, but I think you goof'd earlier :-)


Jeff
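
The chain of equivalences in this reply can be checked mechanically. The sketch below is only an illustration: czero_eqz and czero_nez are hypothetical helper functions that model the two Zicond instructions as I read the ISA specification (rd is zero when the condition on rs2 holds, otherwise rs1), and they are not taken from the GCC sources. It confirms that the source snippet, the RTL if_then_else and the simplified form all agree, and that they match czero.eqz rather than czero.nez.

```cpp
#include <cstdint>

// Assumed instruction semantics (my reading of the Zicond spec):
//   czero.eqz rd, rs1, rs2  ->  rd = (rs2 == 0) ? 0 : rs1
//   czero.nez rd, rs1, rs2  ->  rd = (rs2 != 0) ? 0 : rs1
constexpr std::int64_t czero_eqz(std::int64_t rs1, std::int64_t rs2) {
  return rs2 == 0 ? 0 : rs1;
}
constexpr std::int64_t czero_nez(std::int64_t rs1, std::int64_t rs2) {
  return rs2 != 0 ? 0 : rs1;
}

// The snippet from the test case.
constexpr std::int64_t source_form(std::int64_t a, std::int64_t x) {
  if (a == 0)
    return 0;
  else
    return x;
}

// The RTL if_then_else from the dump: (a != 0) ? x : a.
constexpr std::int64_t rtl_form(std::int64_t a, std::int64_t x) {
  return a != 0 ? x : a;
}

// The simplified form: (a == 0) ? 0 : x.
constexpr std::int64_t simplified_form(std::int64_t a, std::int64_t x) {
  return a == 0 ? 0 : x;
}

// Compare all forms over a small grid of values that includes zero.
constexpr bool all_forms_agree() {
  for (std::int64_t a = -2; a <= 2; ++a)
    for (std::int64_t x = -2; x <= 2; ++x) {
      const std::int64_t expected = source_form(a, x);
      if (rtl_form(a, x) != expected) return false;
      if (simplified_form(a, x) != expected) return false;
      if (czero_eqz(x, a) != expected) return false;
    }
  return true;
}

static_assert(all_forms_agree(),
              "source, RTL and simplified forms all match czero.eqz x, x, a");
// czero.nez on the same operands computes something else.
static_assert(czero_nez(7, 1) == 0 && rtl_form(1, 7) == 7,
              "czero.nez does not implement (a != 0) ? x : a");
```

The sketch needs C++14 or later because the constexpr functions contain loops and if statements.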


Re: [PATCH v2 3/3] libstdc++: Optimize is_fundamental performance by __is_arithmetic built-in

2023-08-31 Thread Ken Matsui via Gcc-patches
On Tue, Aug 8, 2023 at 1:14 PM Jonathan Wakely  wrote:
>
>
>
> On Tue, 18 Jul 2023 at 07:28, Ken Matsui via Libstdc++ 
>  wrote:
>>
>> I will eventually work on disjunction to somehow optimize, but in the
>> meantime, this might be a better implementation. Of course, my
>> benchmark could be wrong.
>
>
> You should use __or_ internally in libstdc++ code, not std::disjunction.
>
> Patrick already optimized both of those, and __or_ is slightly faster 
> (because it doesn't have to conform to the full requirements of 
> std::disjunction).
>
I will! Thank you!

> A compiler built-in for __or_ / __disjunction might perform better. But 
> eventually if we're going to have built-ins for all of __is_arithmetic, 
> __is_void, and __is_null_pointer, then we would want simply:
>
> __is_arithmetic(T) || __is_void(T) || __is_null_pointer(T)
>
> and so we wouldn't need to avoid instantiating any class templates at all.
>
I think we are not going to define built-ins for is_void and
is_null_pointer as their type trait implementations are already
optimal.

>
>>
>>
>> On Mon, Jul 17, 2023 at 11:24 PM Ken Matsui  
>> wrote:
>> >
>> > Hi,
>> >
>> > I took a benchmark for this.
>> >
>> > https://github.com/ken-matsui/gcc-benches/blob/main/is_fundamental-disjunction.md#mon-jul-17-105937-pm-pdt-2023
>> >
>> > template<typename _Tp>
>> > struct is_fundamental
>> > : public std::bool_constant<__is_arithmetic(_Tp)
>> > || std::is_void<_Tp>::value
>> > || std::is_null_pointer<_Tp>::value>
>> > { };
>> >
>> > is faster than:
>> >
>> > template<typename _Tp>
>> > struct is_fundamental
>> > : public std::bool_constant<__is_arithmetic(_Tp)
>> > || std::disjunction<std::is_void<_Tp>,
>> > std::is_null_pointer<_Tp>
>> > >::value>
>> > { };
>> >
>> > Time: -32.2871%
>> > Peak Memory: -18.5071%
>> > Total Memory: -20.1991%
>> >
>> > Sincerely,
>> > Ken Matsui
>> >
>> > On Sun, Jul 16, 2023 at 9:49 PM Ken Matsui  
>> > wrote:
>> > >
>> > > On Sun, Jul 16, 2023 at 5:41 AM François Dumont  
>> > > wrote:
>> > > >
>> > > >
>> > > > On 15/07/2023 06:55, Ken Matsui via Libstdc++ wrote:
>> > > > > This patch optimizes the performance of the is_fundamental trait by
>> > > > > dispatching to the new __is_arithmetic built-in trait.
>> > > > >
>> > > > > libstdc++-v3/ChangeLog:
>> > > > >
>> > > > >   * include/std/type_traits (is_fundamental_v): Use 
>> > > > > __is_arithmetic
>> > > > >   built-in trait.
>> > > > >   (is_fundamental): Likewise. Optimize the original 
>> > > > > implementation.
>> > > > >
>> > > > > Signed-off-by: Ken Matsui 
>> > > > > ---
>> > > > >   libstdc++-v3/include/std/type_traits | 21 +
>> > > > >   1 file changed, 17 insertions(+), 4 deletions(-)
>> > > > >
>> > > > > diff --git a/libstdc++-v3/include/std/type_traits 
>> > > > > b/libstdc++-v3/include/std/type_traits
>> > > > > index 7ebbe04c77b..cf24de2fcac 100644
>> > > > > --- a/libstdc++-v3/include/std/type_traits
>> > > > > +++ b/libstdc++-v3/include/std/type_traits
>> > > > > @@ -668,11 +668,21 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>> > > > >   #endif
>> > > > >
>> > > > > /// is_fundamental
>> > > > > +#if __has_builtin(__is_arithmetic)
>> > > > > +  template<typename _Tp>
>> > > > > +struct is_fundamental
>> > > > > +: public __bool_constant<__is_arithmetic(_Tp)
>> > > > > + || is_void<_Tp>::value
>> > > > > + || is_null_pointer<_Tp>::value>
>> > > > > +{ };
>> > > >
>> > > > What about doing this?
>> > > >
>> > > >template<typename _Tp>
>> > > >  struct is_fundamental
>> > > >  : public __bool_constant<__is_arithmetic(_Tp)
>> > > >   || __or_<is_void<_Tp>,
>> > > >   is_null_pointer<_Tp>>::value>
>> > > >  { };
>> > > >
>> > > > Based on your benches it seems that builtin __is_arithmetic is much
>> > > > better than std::is_arithmetic. But __or_ could still avoid
>> > > > instantiation of is_null_pointer.
>> > > >
>> > > Let me take a benchmark for this later.
>>


Re: [PING][PATCH] LoongArch: initial ada support on linux

2023-08-31 Thread Marc Poulhiès via Gcc-patches


Yang Yujie  writes:

Hello Yujie,

> gcc/ChangeLog:
>
>   * ada/Makefile.rtl: Add LoongArch support.
>   * ada/libgnarl/s-linux__loongarch.ads: New.
>   * ada/libgnat/system-linux-loongarch.ads: New.
>   * config/loongarch/loongarch.h: mark normalized options
>   passed from driver to gnat1 as explicit for multilib.
> ---
>  gcc/ada/Makefile.rtl   |  49 +++
>  gcc/ada/libgnarl/s-linux__loongarch.ads| 134 +++
>  gcc/ada/libgnat/system-linux-loongarch.ads | 145 +

The Ada part of the patch looks correct, thanks.

>  gcc/config/loongarch/loongarch.h   |   4 +-
>  4 files changed, 330 insertions(+), 2 deletions(-)
> diff --git a/gcc/config/loongarch/loongarch.h 
> b/gcc/config/loongarch/loongarch.h
> index f8167875646..9887a7ac630 100644
> --- a/gcc/config/loongarch/loongarch.h
> +++ b/gcc/config/loongarch/loongarch.h
> @@ -83,9 +83,9 @@ along with GCC; see the file COPYING3.  If not see
>  /* CC1_SPEC is the set of arguments to pass to the compiler proper.  */
>
>  #undef CC1_SPEC
> -#define CC1_SPEC "\
> +#define CC1_SPEC "%{,ada:-gnatea} %{m*} \
>  %{G*} \
> -%(subtarget_cc1_spec)"
> +%(subtarget_cc1_spec) %{,ada:-gnatez}"
>
>  /* Preprocessor specs.  */

This is outside of ada/ (so I don't have a say on it), but I'm curious
about why you need to use -gnatea/-gnatez here?

Thanks,
Marc


RE: [PATCH v1] RISC-V: Support rounding mode for VFMADD/VFMACC autovec

2023-08-31 Thread Li, Pan2 via Gcc-patches
Committed, thanks Kito.

Pan

-Original Message-
From: Kito Cheng  
Sent: Thursday, August 31, 2023 9:10 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang 

Subject: Re: [PATCH v1] RISC-V: Support rounding mode for VFMADD/VFMACC autovec

LGTM

On Thu, Aug 24, 2023 at 12:49 PM Pan Li via Gcc-patches
 wrote:
>
> From: Pan Li 
>
> There will be a case like below for intrinsic and autovec combination
>
> vfadd RTZ   <- intrinsic static rounding
> vfmadd  <- autovec/autovec-opt
>
> The autovec generated vfmadd should take DYN mode, and the
> frm must be restored before the vfmadd insn. This patch
> would like to fix this issue by:
>
> * Add the frm operand to the vfmadd/vfmacc autovec/autovec-opt pattern.
> * Set the frm_mode attr to DYN.
>
> Thus, the frm flow when combining autovec and intrinsic code should be:
>
> +
> | frrm  a5
> | ...
> | fsrmi 4
> | vfadd   <- intrinsic static rounding.
> | ...
> | fsrm  a5
> | vfmadd  <- autovec/autovec-opt
> | ...
> +
>
> However, we leverage unspec instead of use to consume the FRM register
> because there are some restrictions from the combine pass. Some code
> paths of try_combine may require XVECLEN(pat, 0) == 2 for
> recog_for_combine, and adding a new use would make XVECLEN(pat, 0) == 3
> and cause the vfwmacc optimization to fail, for example in the
> tests widen-complicate-5.c and widen-8.c.
>
> Finally, there will be other fma cases and they will be covered in
> the underlying patches.
>
> Signed-off-by: Pan Li 
> Co-Authored-By: Ju-Zhe Zhong 
>
> gcc/ChangeLog:
>
> * config/riscv/autovec-opt.md: Add FRM_REGNUM to vfmadd/vfmacc.
> * config/riscv/autovec.md: Ditto.
> * config/riscv/vector-iterators.md: Add UNSPEC_VFFMA.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/float-point-frm-autovec-1.c: New test.
> ---
>  gcc/config/riscv/autovec-opt.md   | 32 ---
>  gcc/config/riscv/autovec.md   | 26 +++---
>  gcc/config/riscv/vector-iterators.md  |  2 +
>  .../rvv/base/float-point-frm-autovec-1.c  | 88 +++
>  4 files changed, 125 insertions(+), 23 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-autovec-1.c
>
> diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
> index 99b609a99d9..4b07e80ad95 100644
> --- a/gcc/config/riscv/autovec-opt.md
> +++ b/gcc/config/riscv/autovec-opt.md
> @@ -459,12 +459,14 @@ (define_insn_and_split "*pred_single_widen_mul"
>  ;; vect__13.182_33 = .FMA (vect__11.180_35, vect__8.176_40, vect__4.172_45);
>  (define_insn_and_split "*double_widen_fma"
>[(set (match_operand:VWEXTF 0 "register_operand")
> -   (fma:VWEXTF
> - (float_extend:VWEXTF
> -   (match_operand: 2 "register_operand"))
> - (float_extend:VWEXTF
> -   (match_operand: 3 "register_operand"))
> - (match_operand:VWEXTF 1 "register_operand")))]
> +   (unspec:VWEXTF
> + [(fma:VWEXTF
> +   (float_extend:VWEXTF
> + (match_operand: 2 "register_operand"))
> +   (float_extend:VWEXTF
> + (match_operand: 3 "register_operand"))
> +   (match_operand:VWEXTF 1 "register_operand"))
> +  (reg:SI FRM_REGNUM)] UNSPEC_VFFMA))]
>"TARGET_VECTOR && can_create_pseudo_p ()"
>"#"
>"&& 1"
> @@ -475,16 +477,19 @@ (define_insn_and_split "*double_widen_fma"
>  DONE;
>}
>[(set_attr "type" "vfwmuladd")
> -   (set_attr "mode" "")])
> +   (set_attr "mode" "")
> +   (set (attr "frm_mode") (symbol_ref "riscv_vector::FRM_DYN"))])
>
>  ;; This helps to match ext + fma.
>  (define_insn_and_split "*single_widen_fma"
>[(set (match_operand:VWEXTF 0 "register_operand")
> -   (fma:VWEXTF
> - (float_extend:VWEXTF
> -   (match_operand: 2 "register_operand"))
> - (match_operand:VWEXTF 3 "register_operand")
> - (match_operand:VWEXTF 1 "register_operand")))]
> +   (unspec:VWEXTF
> + [(fma:VWEXTF
> +   (float_extend:VWEXTF
> + (match_operand: 2 "register_operand"))
> +   (match_operand:VWEXTF 3 "register_operand")
> +   (match_operand:VWEXTF 1 "register_operand"))
> +  (reg:SI FRM_REGNUM)] UNSPEC_VFFMA))]
>"TARGET_VECTOR && can_create_pseudo_p ()"
>"#"
>"&& 1"
> @@ -501,7 +506,8 @@ (define_insn_and_split "*single_widen_fma"
>  DONE;
>}
>[(set_attr "type" "vfwmuladd")
> -   (set_attr "mode" "")])
> +   (set_attr "mode" "")
> +   (set (attr "frm_mode") (symbol_ref "riscv_vector::FRM_DYN"))])
>
>  ;; -
>  ;;  [FP] VFWNMSAC
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index acca4c22b90..4894986d2a5 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -1126,22 +1126,27 @@ (define_insn_and_split

Re: [PATCH v1] RISC-V: Support rounding mode for VFMADD/VFMACC autovec

2023-08-31 Thread Kito Cheng via Gcc-patches
LGTM

On Thu, Aug 24, 2023 at 12:49 PM Pan Li via Gcc-patches
 wrote:
>
> From: Pan Li 
>
> There will be a case like below for intrinsic and autovec combination
>
> vfadd RTZ   <- intrinsic static rounding
> vfmadd  <- autovec/autovec-opt
>
> The autovec generated vfmadd should take DYN mode, and the
> frm must be restored before the vfmadd insn. This patch
> would like to fix this issue by:
>
> * Add the frm operand to the vfmadd/vfmacc autovec/autovec-opt pattern.
> * Set the frm_mode attr to DYN.
>
> Thus, the frm flow when combining autovec and intrinsic code should be:
>
> +
> | frrm  a5
> | ...
> | fsrmi 4
> | vfadd   <- intrinsic static rounding.
> | ...
> | fsrm  a5
> | vfmadd  <- autovec/autovec-opt
> | ...
> +
>
> However, we leverage unspec instead of use to consume the FRM register
> because there are some restrictions from the combine pass. Some code
> paths of try_combine may require XVECLEN(pat, 0) == 2 for
> recog_for_combine, and adding a new use would make XVECLEN(pat, 0) == 3
> and cause the vfwmacc optimization to fail, for example in the
> tests widen-complicate-5.c and widen-8.c.
>
> Finally, there will be other fma cases and they will be covered in
> the underlying patches.
>
> Signed-off-by: Pan Li 
> Co-Authored-By: Ju-Zhe Zhong 
>
> gcc/ChangeLog:
>
> * config/riscv/autovec-opt.md: Add FRM_REGNUM to vfmadd/vfmacc.
> * config/riscv/autovec.md: Ditto.
> * config/riscv/vector-iterators.md: Add UNSPEC_VFFMA.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/float-point-frm-autovec-1.c: New test.
> ---
>  gcc/config/riscv/autovec-opt.md   | 32 ---
>  gcc/config/riscv/autovec.md   | 26 +++---
>  gcc/config/riscv/vector-iterators.md  |  2 +
>  .../rvv/base/float-point-frm-autovec-1.c  | 88 +++
>  4 files changed, 125 insertions(+), 23 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-autovec-1.c
>
> diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
> index 99b609a99d9..4b07e80ad95 100644
> --- a/gcc/config/riscv/autovec-opt.md
> +++ b/gcc/config/riscv/autovec-opt.md
> @@ -459,12 +459,14 @@ (define_insn_and_split "*pred_single_widen_mul"
>  ;; vect__13.182_33 = .FMA (vect__11.180_35, vect__8.176_40, vect__4.172_45);
>  (define_insn_and_split "*double_widen_fma"
>[(set (match_operand:VWEXTF 0 "register_operand")
> -   (fma:VWEXTF
> - (float_extend:VWEXTF
> -   (match_operand: 2 "register_operand"))
> - (float_extend:VWEXTF
> -   (match_operand: 3 "register_operand"))
> - (match_operand:VWEXTF 1 "register_operand")))]
> +   (unspec:VWEXTF
> + [(fma:VWEXTF
> +   (float_extend:VWEXTF
> + (match_operand: 2 "register_operand"))
> +   (float_extend:VWEXTF
> + (match_operand: 3 "register_operand"))
> +   (match_operand:VWEXTF 1 "register_operand"))
> +  (reg:SI FRM_REGNUM)] UNSPEC_VFFMA))]
>"TARGET_VECTOR && can_create_pseudo_p ()"
>"#"
>"&& 1"
> @@ -475,16 +477,19 @@ (define_insn_and_split "*double_widen_fma"
>  DONE;
>}
>[(set_attr "type" "vfwmuladd")
> -   (set_attr "mode" "")])
> +   (set_attr "mode" "")
> +   (set (attr "frm_mode") (symbol_ref "riscv_vector::FRM_DYN"))])
>
>  ;; This helps to match ext + fma.
>  (define_insn_and_split "*single_widen_fma"
>[(set (match_operand:VWEXTF 0 "register_operand")
> -   (fma:VWEXTF
> - (float_extend:VWEXTF
> -   (match_operand: 2 "register_operand"))
> - (match_operand:VWEXTF 3 "register_operand")
> - (match_operand:VWEXTF 1 "register_operand")))]
> +   (unspec:VWEXTF
> + [(fma:VWEXTF
> +   (float_extend:VWEXTF
> + (match_operand: 2 "register_operand"))
> +   (match_operand:VWEXTF 3 "register_operand")
> +   (match_operand:VWEXTF 1 "register_operand"))
> +  (reg:SI FRM_REGNUM)] UNSPEC_VFFMA))]
>"TARGET_VECTOR && can_create_pseudo_p ()"
>"#"
>"&& 1"
> @@ -501,7 +506,8 @@ (define_insn_and_split "*single_widen_fma"
>  DONE;
>}
>[(set_attr "type" "vfwmuladd")
> -   (set_attr "mode" "")])
> +   (set_attr "mode" "")
> +   (set (attr "frm_mode") (symbol_ref "riscv_vector::FRM_DYN"))])
>
>  ;; -
>  ;;  [FP] VFWNMSAC
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index acca4c22b90..4894986d2a5 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -1126,22 +1126,27 @@ (define_insn_and_split "*fnma"
>  (define_expand "fma4"
>[(parallel
>  [(set (match_operand:VF 0 "register_operand")
> - (fma:VF
> -   (match_operand:VF 1 "register_operand")
> -   (match_operand:VF 2 "register_operand")
> -   (match_operand:VF 3 "register_oper

Re: [PATCH v1] RISC-V: Support rounding mode for VFMSAC/VFMSUB autovec

2023-08-31 Thread Kito Cheng via Gcc-patches
LGTM

On Thu, Aug 24, 2023 at 3:13 PM Pan Li via Gcc-patches
 wrote:
>
> From: Pan Li 
>
> There will be a case like below for intrinsic and autovec combination.
>
> vfadd RTZ   <- intrinsic static rounding
> vfmsub  <- autovec/autovec-opt
>
> The autovec generated vfmsub should take DYN mode, and the
> frm must be restored before the vfmsub insn. This patch
> would like to fix this issue by:
>
> * Add the frm operand to the autovec/autovec-opt pattern.
> * Set the frm_mode attr to DYN.
>
> Thus, the frm flow when combining autovec and intrinsic code should be:
>
> +
> | frrm  a5
> | ...
> | fsrmi 4
> | vfadd   <- intrinsic static rounding.
> | ...
> | fsrm  a5
> | vfmsub  <- autovec/autovec-opt
> | ...
> +
>
> Signed-off-by: Pan Li 
>
> gcc/ChangeLog:
>
> * config/riscv/autovec-opt.md: Add FRM_REGNUM to vfmsac/vfmsub
> * config/riscv/autovec.md: Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/float-point-frm-autovec-2.c: New test.
> ---
>  gcc/config/riscv/autovec-opt.md   | 36 
>  gcc/config/riscv/autovec.md   | 30 ---
>  .../rvv/base/float-point-frm-autovec-2.c  | 88 +++
>  3 files changed, 127 insertions(+), 27 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-autovec-2.c
>
> diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
> index 4b07e80ad95..732a51edacd 100644
> --- a/gcc/config/riscv/autovec-opt.md
> +++ b/gcc/config/riscv/autovec-opt.md
> @@ -583,13 +583,15 @@ (define_insn_and_split "*single_widen_fnma"
>  ;; vect__13.182_33 = .FMS (vect__11.180_35, vect__8.176_40, vect__4.172_45);
>  (define_insn_and_split "*double_widen_fms"
>[(set (match_operand:VWEXTF 0 "register_operand")
> -   (fma:VWEXTF
> - (float_extend:VWEXTF
> -   (match_operand: 2 "register_operand"))
> - (float_extend:VWEXTF
> -   (match_operand: 3 "register_operand"))
> - (neg:VWEXTF
> -   (match_operand:VWEXTF 1 "register_operand"))))]
> +   (unspec:VWEXTF
> + [(fma:VWEXTF
> +   (float_extend:VWEXTF
> + (match_operand: 2 "register_operand"))
> +   (float_extend:VWEXTF
> + (match_operand: 3 "register_operand"))
> +   (neg:VWEXTF
> + (match_operand:VWEXTF 1 "register_operand")))
> +  (reg:SI FRM_REGNUM)] UNSPEC_VFFMA))]
>"TARGET_VECTOR && can_create_pseudo_p ()"
>"#"
>"&& 1"
> @@ -600,17 +602,20 @@ (define_insn_and_split "*double_widen_fms"
>  DONE;
>}
>[(set_attr "type" "vfwmuladd")
> -   (set_attr "mode" "")])
> +   (set_attr "mode" "")
> +   (set (attr "frm_mode") (symbol_ref "riscv_vector::FRM_DYN"))])
>
>  ;; This helps to match ext + fms.
>  (define_insn_and_split "*single_widen_fms"
>[(set (match_operand:VWEXTF 0 "register_operand")
> -   (fma:VWEXTF
> - (float_extend:VWEXTF
> -   (match_operand: 2 "register_operand"))
> - (match_operand:VWEXTF 3 "register_operand")
> - (neg:VWEXTF
> -   (match_operand:VWEXTF 1 "register_operand"))))]
> +   (unspec:VWEXTF
> + [(fma:VWEXTF
> +   (float_extend:VWEXTF
> + (match_operand: 2 "register_operand"))
> +   (match_operand:VWEXTF 3 "register_operand")
> +   (neg:VWEXTF
> + (match_operand:VWEXTF 1 "register_operand")))
> +  (reg:SI FRM_REGNUM)] UNSPEC_VFFMA))]
>"TARGET_VECTOR && can_create_pseudo_p ()"
>"#"
>"&& 1"
> @@ -627,7 +632,8 @@ (define_insn_and_split "*single_widen_fms"
>  DONE;
>}
>[(set_attr "type" "vfwmuladd")
> -   (set_attr "mode" "")])
> +   (set_attr "mode" "")
> +   (set (attr "frm_mode") (symbol_ref "riscv_vector::FRM_DYN"))])
>
>  ;; -
>  ;;  [FP] VFWNMACC
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index 4894986d2a5..d9f1a10eb66 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -1218,24 +1218,29 @@ (define_insn_and_split "*fnma"
>  (define_expand "fms4"
>[(parallel
>  [(set (match_operand:VF 0 "register_operand")
> - (fma:VF
> -   (match_operand:VF 1 "register_operand")
> -   (match_operand:VF 2 "register_operand")
> -   (neg:VF
> - (match_operand:VF 3 "register_operand"))))
> + (unspec:VF
> +   [(fma:VF
> + (match_operand:VF 1 "register_operand")
> + (match_operand:VF 2 "register_operand")
> + (neg:VF
> +   (match_operand:VF 3 "register_operand")))
> +(reg:SI FRM_REGNUM)] UNSPEC_VFFMA))
>   (clobber (match_dup 4))])]
>"TARGET_VECTOR"
>{
>  operands[4] = gen_reg_rtx (Pmode);
> -  })
> +  }
> +  [(set (attr "frm_mode") (symbol_ref "riscv_vector::FRM_DYN"))])
>
>  (define_insn_and_split "*fms"
> 

[PATCH v3 2/4] LoongArch: define preprocessing macros "__loongarch_{arch, tune}"

2023-08-31 Thread Yang Yujie
These macros are exported according to the LoongArch Toolchain Conventions[1]
as replacements for the obsolete "_LOONGARCH_{ARCH,TUNE}" macros.
They expand to strings naming the actual architecture
and microarchitecture of the target.

[1] currently released at
https://github.com/loongson/LoongArch-Documentation/blob/main/docs/LoongArch-toolchain-conventions-EN.adoc

gcc/ChangeLog:

* config/loongarch/loongarch-c.cc: Export macros
"__loongarch_{arch,tune}" in the preprocessor.
---
 gcc/config/loongarch/loongarch-c.cc | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/config/loongarch/loongarch-c.cc 
b/gcc/config/loongarch/loongarch-c.cc
index 7e3b57ff9b1..ec047e3822a 100644
--- a/gcc/config/loongarch/loongarch-c.cc
+++ b/gcc/config/loongarch/loongarch-c.cc
@@ -64,6 +64,9 @@ loongarch_cpu_cpp_builtins (cpp_reader *pfile)
   LARCH_CPP_SET_PROCESSOR ("_LOONGARCH_ARCH", la_target.cpu_arch);
   LARCH_CPP_SET_PROCESSOR ("_LOONGARCH_TUNE", la_target.cpu_tune);
 
+  LARCH_CPP_SET_PROCESSOR ("__loongarch_arch", la_target.cpu_arch);
+  LARCH_CPP_SET_PROCESSOR ("__loongarch_tune", la_target.cpu_tune);
+
   /* Base architecture / ABI.  */
   if (TARGET_64BIT)
 {
-- 
2.36.0
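
For user code, the macros added by this patch could be consumed roughly
as follows.  This is a minimal sketch, assuming the macros expand to
string literals as the commit message states; the wrapper names
TARGET_ARCH_NAME, TARGET_TUNE_NAME, target_arch_name and
target_tune_name are hypothetical and not part of GCC.

```c
#include <stddef.h>

/* Prefer the new lowercase macros and fall back to the obsolete
   spellings, so the code builds with older and newer compilers.  */
#if defined(__loongarch_arch)
# define TARGET_ARCH_NAME __loongarch_arch
#elif defined(_LOONGARCH_ARCH)
# define TARGET_ARCH_NAME _LOONGARCH_ARCH
#else
# define TARGET_ARCH_NAME NULL   /* not a LoongArch build */
#endif

#if defined(__loongarch_tune)
# define TARGET_TUNE_NAME __loongarch_tune
#elif defined(_LOONGARCH_TUNE)
# define TARGET_TUNE_NAME _LOONGARCH_TUNE
#else
# define TARGET_TUNE_NAME NULL   /* not a LoongArch build */
#endif

/* Return the -march name baked in at compile time, or NULL.  */
const char *target_arch_name(void)
{
  return TARGET_ARCH_NAME;
}

/* Return the -mtune name baked in at compile time, or NULL.  */
const char *target_tune_name(void)
{
  return TARGET_TUNE_NAME;
}
```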



[PATCH v3 0/4] LoongArch: target configuration interface update

2023-08-31 Thread Yang Yujie
This is an update of
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628772.html

Changes since the last patchset:

1. Fix texinfo format of the install.texi document.
2. Add documentation for --with-strict-align-lib.

v1 -> v2:
1. Add new configure option --with-strict-align-lib to control
   whether -mstrict-align should be used when building libraries.
   This facilitates building toolchains targeting both LA264
   (Loongson 2k1000la) and non-LA264 cores.

2. Define preprocessing macros __loongarch_sx / __loongarch_asx
   / __loongarch_simd_width that indicate the enabled SIMD
   extensions.

3. Keep the current non-symmetric multidir layout, but do not build
   duplicate multilib variants with the same ABI option.  Make
   --with-abi= obsolete to ensure a consistent directory layout.
   (ABI type of the "toplevel" libraries can be inferred from the
target triplet)

4. Using "-mno-lasx" does not cause a fallback to "-msimd=none" as
   long as the -march= architecture or the default --with-simd=
   setting has LSX support.
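
One way to read item 4 of the v1 -> v2 list is sketched below.  This is
an interpretation of the description only, not the driver code; the enum
and the function apply_no_lasx are hypothetical names, and "base" stands
for the SIMD level implied by -march= or by the --with-simd= default.

```c
/* SIMD levels ordered so that a larger value includes the smaller ones. */
enum simd_level { SIMD_NONE = 0, SIMD_LSX = 1, SIMD_LASX = 2 };

/* Effect of "-mno-lasx" on a base SIMD level: drop LASX, but keep LSX
   when the base level already provides it.  */
enum simd_level apply_no_lasx(enum simd_level base)
{
  return base >= SIMD_LSX ? SIMD_LSX : SIMD_NONE;
}
```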

Yang Yujie (4):
  LoongArch: improved target configuration interface
  LoongArch: define preprocessing macros "__loongarch_{arch,tune}"
  LoongArch: add new configure option --with-strict-align-lib
  LoongArch: support loongarch*-elf target

 config-ml.in  |  10 +
 gcc/config.gcc| 408 ++
 gcc/config/loongarch/elf.h|  52 +++
 .../loongarch/genopts/loongarch-strings   |   8 +-
 gcc/config/loongarch/genopts/loongarch.opt.in |  62 ++-
 gcc/config/loongarch/la464.md |  32 +-
 gcc/config/loongarch/loongarch-c.cc   |  22 +-
 gcc/config/loongarch/loongarch-cpu.cc | 263 ++-
 gcc/config/loongarch/loongarch-cpu.h  |   3 +-
 gcc/config/loongarch/loongarch-def.c  |  67 +--
 gcc/config/loongarch/loongarch-def.h  |  57 +--
 gcc/config/loongarch/loongarch-driver.cc  | 208 +
 gcc/config/loongarch/loongarch-driver.h   |  40 +-
 gcc/config/loongarch/loongarch-opts.cc| 372 +++-
 gcc/config/loongarch/loongarch-opts.h |  59 +--
 gcc/config/loongarch/loongarch-str.h  |   7 +-
 gcc/config/loongarch/loongarch.cc |  87 ++--
 gcc/config/loongarch/loongarch.opt|  60 ++-
 gcc/config/loongarch/t-linux  |  32 +-
 gcc/doc/install.texi  |  56 ++-
 gcc/doc/invoke.texi   |  32 +-
 libgcc/config.host|   9 +-
 22 files changed, 1261 insertions(+), 685 deletions(-)
 create mode 100644 gcc/config/loongarch/elf.h

-- 
2.36.0



[PATCH v3 1/4] LoongArch: improved target configuration interface

2023-08-31 Thread Yang Yujie
The configure script and the GCC driver are updated so that
it is easier to customize and control GCC builds for targeting
different LoongArch implementations.

* Make --with-abi obsolete, since it could yield different default ABIs
  for the same target triplet, which is undesirable.  The default ABI
  is now decided purely by the target triplet.

* Support options for LoongArch SIMD extensions:
  new configure options --with-simd={none,lsx,lasx};
  new compiler option -msimd={none,lsx,lasx};
  new driver options -m[no]-l[a]sx.

* Enforce the priority of configuration paths (for ={fpu,tune,simd}):
  -m > -march-implied > --with- > --with-arch-implied.

* Allow the user to control the compiler options used when building
  GCC libraries for each multilib variant via --with-multilib-list
  and --with-multilib-default.  This could become more useful when
  we have 32-bit support later.

  Example 1: the following configure option
--with-multilib-list=lp64d/la464/mno-strict-align/msimd=lsx,lp64s/mfpu=32
  | || |
-mabi=ABI  -march=ARCH  a list of other options
  (mandatory)  (optional) (optional)

 builds two sets of libraries:
 1. lp64d/base ABI (built with "-march=la464 -mno-strict-align -msimd=lsx")
 2. lp64s/base ABI (built with "-march=abi-default -mfpu=32")

  Example 2: the following 3 configure options

--with-arch=loongarch64
--with-multilib-list=lp64d,lp64f,lp64s/la464
--with-multilib-default=fixed/mno-strict-align/mfpu=64
 ||   |
-march=ARCH   a list of other options
 (optional)(optional)

is equivalent to (in terms of building libraries):

--with-multilib-list=\
lp64d/loongarch64/mno-strict-align/mfpu=64,\
lp64f/loongarch64/mno-strict-align/mfpu=64,\
lp64s/la464

  Note:
1. the GCC driver and the compiler proper do not support
   "-march=fixed".  "fixed" as it appears here acts as a placeholder for
   "use whatever ARCH in --with-arch=ARCH" (or the default value
   of --with-arch=ARCH if --with-arch is not explicitly configured).

2. if the ARCH part is omitted, "-march=abi-default"
   is used for building all library variants, which
   practically means enabling the minimal ISA features
   that can support the given ABI.
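
The ABI[/ARCH][/OPTION...] element syntax described above can be
sketched as a small parser.  This is an illustration under stated
assumptions, not the logic in config.gcc: the struct and function names
are hypothetical, and treating a second component that starts with 'm'
as an option rather than an ARCH is an assumption made for this sketch.

```c
#include <string.h>

#define MAX_OPTS 8

struct multilib_elem {
  char abi[32];             /* mandatory, e.g. "lp64d" */
  char arch[32];            /* optional, defaults to "abi-default" */
  char opts[MAX_OPTS][32];  /* remaining options, without the leading '-' */
  int nopts;
};

/* Parse one element of --with-multilib-list.  Returns 0 on success and
   -1 on malformed or oversized input.  Note that strtok skips empty
   components, which a real configure script may prefer to reject.  */
int parse_multilib_elem(const char *elem, struct multilib_elem *out)
{
  char buf[256];
  char *tok;
  int index = 0;

  if (strlen(elem) >= sizeof buf)
    return -1;
  strcpy(buf, elem);
  memset(out, 0, sizeof *out);
  strcpy(out->arch, "abi-default");   /* used when ARCH is omitted */

  for (tok = strtok(buf, "/"); tok != NULL; tok = strtok(NULL, "/"), index++) {
    if (strlen(tok) >= sizeof out->abi)
      return -1;
    if (index == 0) {
      strcpy(out->abi, tok);
    } else if (index == 1 && tok[0] != 'm') {
      /* Assumption: option names start with 'm', ARCH names do not.  */
      strcpy(out->arch, tok);
    } else {
      if (out->nopts == MAX_OPTS)
        return -1;
      strcpy(out->opts[out->nopts++], tok);
    }
  }
  return index > 0 ? 0 : -1;
}
```

With this reading, "lp64s/mfpu=32" from Example 1 yields the ABI lp64s,
the default ARCH abi-default, and the single option mfpu=32.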

ChangeLog:

* config-ml.in: Do not build the multilib library variant
that duplicates the toplevel one.

gcc/ChangeLog:

* config.gcc: Make --with-abi= obsolete, decide the default ABI
with target triplet.  Allow specifying multilib library build
options with --with-multilib-list and --with-multilib-default.
* config/loongarch/t-linux: Likewise.
* config/loongarch/genopts/loongarch-strings: Likewise.
* config/loongarch/loongarch-str.h: Likewise.
* doc/install.texi: Likewise.
* config/loongarch/genopts/loongarch.opt.in: Introduce
-m[no-]l[a]sx options.  Only process -m*-float and
-m[no-]l[a]sx in the GCC driver.
* config/loongarch/loongarch.opt: Likewise.
* config/loongarch/la464.md: Likewise.
* config/loongarch/loongarch-c.cc: Likewise.
* config/loongarch/loongarch-cpu.cc: Likewise.
* config/loongarch/loongarch-cpu.h: Likewise.
* config/loongarch/loongarch-def.c: Likewise.
* config/loongarch/loongarch-def.h: Likewise.
* config/loongarch/loongarch-driver.cc: Likewise.
* config/loongarch/loongarch-driver.h: Likewise.
* config/loongarch/loongarch-opts.cc: Likewise.
* config/loongarch/loongarch-opts.h: Likewise.
* config/loongarch/loongarch.cc: Likewise.
* doc/invoke.texi: Likewise.
---
 config-ml.in  |  10 +
 gcc/config.gcc| 379 ++
 .../loongarch/genopts/loongarch-strings   |   8 +-
 gcc/config/loongarch/genopts/loongarch.opt.in |  62 +--
 gcc/config/loongarch/la464.md |  32 +-
 gcc/config/loongarch/loongarch-c.cc   |  19 +-
 gcc/config/loongarch/loongarch-cpu.cc | 263 +++-
 gcc/config/loongarch/loongarch-cpu.h  |   3 +-
 gcc/config/loongarch/loongarch-def.c  |  67 ++--
 gcc/config/loongarch/loongarch-def.h  |  57 +--
 gcc/config/loongarch/loongarch-driver.cc  | 208 +-
 gcc/config/loongarch/loongarch-driver.h   |  40 +-
 gcc/config/loongarch/loongarch-opts.cc| 372 -
 gcc/config/loongarch/loongarch-opts.h |  59 +--
 gcc/config/loongarch/loongarch-str.h  |   7 +-
 gcc/config/loongarch/loongarch.cc |  87 ++--
 gcc/config/loongarch/loongarch.opt|  60 ++-
 gcc/config/loongarch/t-linux  |  32 +-
 gcc/doc/install.texi  |  52 ++-
 gcc/doc/invoke.texi 

[PATCH v3 4/4] LoongArch: support loongarch*-elf target

2023-08-31 Thread Yang Yujie
gcc/ChangeLog:

* config.gcc: add loongarch*-elf target.
* config/loongarch/elf.h: New file.
Link against newlib by default.

libgcc/ChangeLog:

* config.host: add loongarch*-elf target.
---
 gcc/config.gcc | 15 ++-
 gcc/config/loongarch/elf.h | 52 ++
 libgcc/config.host |  9 +--
 3 files changed, 73 insertions(+), 3 deletions(-)
 create mode 100644 gcc/config/loongarch/elf.h

diff --git a/gcc/config.gcc b/gcc/config.gcc
index ed70fa63268..b77e1fd5278 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -2491,6 +2491,18 @@ loongarch*-*-linux*)
gcc_cv_initfini_array=yes
;;
 
+loongarch*-*-elf*)
+   tm_file="elfos.h newlib-stdint.h ${tm_file}"
+   tm_file="${tm_file} loongarch/elf.h loongarch/linux.h"
+   tmake_file="${tmake_file} loongarch/t-linux"
+   gnu_ld=yes
+   gas=yes
+
+   # For .init_array support.  The configure script cannot always
+   # automatically detect that GAS supports it, yet we require it.
+   gcc_cv_initfini_array=yes
+   ;;
+
 mips*-*-netbsd*)   # NetBSD/mips, either endian.
target_cpu_default="MASK_ABICALLS"
tm_file="elfos.h ${tm_file} mips/elf.h ${nbsd_tm_file} mips/netbsd.h"
@@ -4932,8 +4944,9 @@ case "${target}" in
esac
 
case ${target} in
- *-linux-gnu*)  triplet_os="linux-gnu";;
+ *-linux-gnu*) triplet_os="linux-gnu";;
  *-linux-musl*) triplet_os="linux-musl";;
+ *-elf*) triplet_os="elf";;
  *)
  echo "Unsupported target ${target}." 1>&2
  exit 1
diff --git a/gcc/config/loongarch/elf.h b/gcc/config/loongarch/elf.h
new file mode 100644
index 00000000000..6f84222e4e1
--- /dev/null
+++ b/gcc/config/loongarch/elf.h
@@ -0,0 +1,52 @@
+/* Definitions for LoongArch ELF-based systems.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* Define the size of the wide character type.  */
+#undef WCHAR_TYPE
+#define WCHAR_TYPE "int"
+
+#undef WCHAR_TYPE_SIZE
+#define WCHAR_TYPE_SIZE 32
+
+
+/* GNU-specific SPEC definitions.  */
+#define GNU_USER_LINK_EMULATION "elf" ABI_GRLEN_SPEC "loongarch"
+
+#undef GNU_USER_TARGET_LINK_SPEC
+#define GNU_USER_TARGET_LINK_SPEC \
+  "%{shared} -m " GNU_USER_LINK_EMULATION
+
+
+/* Link against Newlib libraries, because the ELF backend assumes Newlib.
+   Handle the circular dependence between libc and libgloss.  */
+#undef  LIB_SPEC
+#define LIB_SPEC "--start-group -lc %{!specs=nosys.specs:-lgloss} --end-group"
+
+#undef LINK_SPEC
+#define LINK_SPEC GNU_USER_TARGET_LINK_SPEC
+
+#undef  STARTFILE_SPEC
+#define STARTFILE_SPEC "crt0%O%s crtbegin%O%s"
+
+#undef  ENDFILE_SPEC
+#define ENDFILE_SPEC "crtend%O%s"
+
+#undef SUBTARGET_CC1_SPEC
+#define SUBTARGET_CC1_SPEC "%{profile:-p}"
+
diff --git a/libgcc/config.host b/libgcc/config.host
index c94d69d84b7..6a112a07b14 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -136,7 +136,7 @@ hppa*-*-*)
 lm32*-*-*)
cpu_type=lm32
;;
-loongarch*-*-*)
+loongarch*-*)
cpu_type=loongarch
tmake_file="loongarch/t-loongarch"
if test "${libgcc_cv_loongarch_hard_float}" = yes; then
@@ -944,7 +944,7 @@ lm32-*-uclinux*)
 extra_parts="$extra_parts crtbegin.o crtendS.o crtbeginT.o"
 tmake_file="lm32/t-lm32 lm32/t-uclinux t-libgcc-pic t-softfp-sfdf 
t-softfp"
;;
-loongarch*-*-linux*)
+loongarch*-linux*)
extra_parts="$extra_parts crtfastmath.o"
tmake_file="${tmake_file} t-crtfm loongarch/t-crtstuff"
case ${host} in
@@ -954,6 +954,11 @@ loongarch*-*-linux*)
esac
md_unwind_header=loongarch/linux-unwind.h
;;
+loongarch*-elf*)
+   extra_parts="$extra_parts crtfastmath.o"
+   tmake_file="${tmake_file} t-crtfm loongarch/t-crtstuff"
+   tmake_file="${tmake_file} t-slibgcc-libgcc"
+   ;;
 m32r-*-elf*)
tmake_file="$tmake_file m32r/t-m32r t-fdpbit"
extra_parts="$extra_parts crtinit.o crtfini.o"
-- 
2.36.0



[PATCH v3 3/4] LoongArch: add new configure option --with-strict-align-lib

2023-08-31 Thread Yang Yujie
LoongArch processors may not support memory accesses without natural
alignments.  Building libraries with -mstrict-align may help with
toolchain binary compatibility and performance on these implementations
(e.g. Loongson 2K1000LA).

No significant performance degradation is observed on current mainstream
LoongArch processors when the option is enabled.
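
On the source side, code that must run on such implementations usually
avoids dereferencing misaligned pointers in the first place.  The sketch
below shows the common memcpy idiom for this; it is an illustration only
and not part of the patch, the helper names load_u32 and store_u32 are
hypothetical, and how the compiler lowers the copies depends on the
target and on options such as -mstrict-align.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from an address with no alignment guarantee.
   memcpy leaves the choice of access width to the compiler.  */
uint32_t load_u32(const unsigned char *p)
{
  uint32_t v;
  memcpy(&v, p, sizeof v);
  return v;
}

/* Write a 32-bit value to an address with no alignment guarantee.  */
void store_u32(unsigned char *p, uint32_t v)
{
  memcpy(p, &v, sizeof v);
}
```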

gcc/ChangeLog:

* config.gcc: use -mstrict-align for building libraries
if --with-strict-align-lib is given.
* doc/install.texi: likewise.
---
 gcc/config.gcc   | 16 +++-
 gcc/doc/install.texi |  4 
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 4fae672a3b7..ed70fa63268 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4892,7 +4892,7 @@ case "${target}" in
;;
 
loongarch*-*)
-   supported_defaults="abi arch tune fpu simd multilib-default"
+   supported_defaults="abi arch tune fpu simd multilib-default 
strict-align-lib"
 
# Local variables
unset \
@@ -5089,6 +5089,17 @@ case "${target}" in
;;
esac
 
+   # Build libraries with -mstrict-align if 
--with-strict-align-lib is given.
+   case ${with_strict_align_lib} in
+   yes) strict_align_opt="/mstrict-align" ;;
+   ""|no)  ;;
+   *)
+   echo "Unknown option: 
--with-strict-align-lib=${with_strict_align_lib}" 1>&2
+   exit 1
+   ;;
+   esac
+
+
# Handle --with-multilib-default
if echo "${with_multilib_default}" \
| grep -E -e '[[:space:]]' -e '//' -e '/$' -e '^/' > /dev/null 
2>&1; then
@@ -5250,6 +5261,9 @@ case "${target}" in
;;
esac
 
+   # Use mstrict-align for building libraries if 
--with-strict-align-lib is given.
+   
loongarch_multilib_list_make="${loongarch_multilib_list_make}${strict_align_opt}"
+
# Check for repeated configuration of the same multilib 
variant.
if echo "${elem_abi_base}/${elem_abi_ext}" \
 | grep -E "^(${all_abis%|})$" >/dev/null 2>&1; then
diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index bbc58cb60ca..31f2234640f 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -1416,6 +1416,10 @@ Multiple @var{option}s may appear consecutively while 
@var{arch} may only
 appear in the beginning or be omitted (which means @option{-march=abi-default}
 is applied when building the libraries).
 
+@item --with-strict-align-lib
+On LoongArch targets, build all enabled multilibs with @option{-mstrict-align}
+(Not enabled by default).
+
 @item --with-multilib-generator=@var{config}
 Specify what multilibs to build.  @var{config} is a semicolon separated list of
 values, possibly consisting of a single value.  Currently only implemented
-- 
2.36.0



Re: [PATCH] middle-end/111253 - partly revert r11-6508-gabb1b6058c09a7

2023-08-31 Thread Jakub Jelinek via Gcc-patches
On Thu, Aug 31, 2023 at 12:37:59PM +, Richard Biener via Gcc-patches wrote:
> The following keeps dumping SSA def stmt RHS during diagnostic
> reporting only for gimple_assign_single_p defs which means
> memory loads.  This avoids diagnostics containing PHI nodes
> like
> 
>   warning: 'realloc' called on pointer '*_42 = PHI  lcs.19_48(30)>.t_mem_caches' with nonzero offset 40
> 
> instead getting back the previous behavior:
> 
>   warning: 'realloc' called on pointer '*.t_mem_caches' with nonzero 
> offset 40
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?
> 
> Thanks,
> Richard.
> 
>   PR middle-end/111253
> gcc/c-family/
>   * c-pretty-print.cc (c_pretty_printer::primary_expression):
>   Only dump gimple_assign_single_p SSA def RHS.
> 
>   * gcc.dg/Wfree-nonheap-object-7.c: New testcase.

Ok.

Jakub



RE: [PATCH] RISC-V: Add Vector cost model framework for RVV

2023-08-31 Thread Li, Pan2 via Gcc-patches
Committed, thanks Kito.

Pan

-----Original Message-----
From: Gcc-patches  On Behalf 
Of Kito Cheng via Gcc-patches
Sent: Thursday, August 31, 2023 8:39 PM
To: Robin Dapp 
Cc: gcc-patches@gcc.gnu.org; kito.ch...@gmail.com; Juzhe-Zhong 

Subject: Re: [PATCH] RISC-V: Add Vector cost model framework for RVV

LGTM, Awesome!! It seems a sign of the next big move for RISC-V vectorization!

On Thu, Aug 31, 2023 at 8:36 PM Robin Dapp  wrote:
>
> OK.  As it doesn't do anything and we'll be needing it anyway no harm
> in adding it.
>
> Regards
>  Robin


Re: [PATCH] RISC-V: Add Vector cost model framework for RVV

2023-08-31 Thread Kito Cheng via Gcc-patches
LGTM, Awesome!! It seems a sign of the next big move for RISC-V vectorization!

On Thu, Aug 31, 2023 at 8:36 PM Robin Dapp  wrote:
>
> OK.  As it doesn't do anything and we'll be needing it anyway no harm
> in adding it.
>
> Regards
>  Robin


[PATCH] middle-end/111253 - partly revert r11-6508-gabb1b6058c09a7

2023-08-31 Thread Richard Biener via Gcc-patches
The following keeps dumping SSA def stmt RHS during diagnostic
reporting only for gimple_assign_single_p defs which means
memory loads.  This avoids diagnostics containing PHI nodes
like

  warning: 'realloc' called on pointer '*_42 = PHI .t_mem_caches' with nonzero offset 40

instead getting back the previous behavior:

  warning: 'realloc' called on pointer '*.t_mem_caches' with nonzero 
offset 40

Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

Thanks,
Richard.

PR middle-end/111253
gcc/c-family/
* c-pretty-print.cc (c_pretty_printer::primary_expression):
Only dump gimple_assign_single_p SSA def RHS.

* gcc.dg/Wfree-nonheap-object-7.c: New testcase.
---
 gcc/c-family/c-pretty-print.cc|  7 -
 gcc/testsuite/gcc.dg/Wfree-nonheap-object-7.c | 26 +++
 2 files changed, 32 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/Wfree-nonheap-object-7.c

diff --git a/gcc/c-family/c-pretty-print.cc b/gcc/c-family/c-pretty-print.cc
index 7536a7c471f..679aa766fe0 100644
--- a/gcc/c-family/c-pretty-print.cc
+++ b/gcc/c-family/c-pretty-print.cc
@@ -33,6 +33,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "langhooks.h"
 #include "options.h"
 #include "internal-fn.h"
+#include "function.h"
+#include "basic-block.h"
+#include "gimple.h"
 
 /* The pretty-printer code is primarily designed to closely follow
(GNU) C and C++ grammars.  That is to be contrasted with spaghetti
@@ -1380,12 +1383,14 @@ c_pretty_printer::primary_expression (tree e)
  else
primary_expression (var);
}
-  else
+  else if (gimple_assign_single_p (SSA_NAME_DEF_STMT (e)))
{
  /* Print only the right side of the GIMPLE assignment.  */
  gimple *def_stmt = SSA_NAME_DEF_STMT (e);
  pp_gimple_stmt_1 (this, def_stmt, 0, TDF_RHS_ONLY);
}
+  else
+   expression (e);
   break;
 
 default:
diff --git a/gcc/testsuite/gcc.dg/Wfree-nonheap-object-7.c b/gcc/testsuite/gcc.dg/Wfree-nonheap-object-7.c
new file mode 100644
index 000..6116bfa4d8e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/Wfree-nonheap-object-7.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -Wfree-nonheap-object" } */
+
+struct local_caches *get_local_caches_lcs;
+void *calloc(long, long);
+void *realloc();
+
+struct local_caches {
+  int *t_mem_caches;
+};
+
+struct local_caches *get_local_caches() {
+  if (get_local_caches_lcs)
+return get_local_caches_lcs;
+  get_local_caches_lcs = calloc(1, 0);
+  return get_local_caches_lcs;
+}
+
+void libtrace_ocache_free() {
+  struct local_caches lcs = *get_local_caches(), __trans_tmp_1 = lcs;
+  {
+struct local_caches *lcs = &__trans_tmp_1;
+lcs->t_mem_caches += 10;
+__trans_tmp_1.t_mem_caches = realloc(__trans_tmp_1.t_mem_caches, sizeof(int)); // { dg-warning "called on pointer (?:(?!PHI).)*nonzero offset" }
+  }
+}
-- 
2.35.3


Re: [PATCH] RISC-V: Add Vector cost model framework for RVV

2023-08-31 Thread Robin Dapp via Gcc-patches
OK.  As it doesn't do anything and we'll be needing it anyway no harm
in adding it.

Regards
 Robin


Re: [PATCH v2] libstdc++: Define _GLIBCXX_HAS_BUILTIN_TRAIT

2023-08-31 Thread Ken Matsui via Gcc-patches
On Tue, Aug 8, 2023 at 1:23 PM Jonathan Wakely  wrote:
>
>
>
> On Wed, 19 Jul 2023 at 20:33, Ken Matsui via Libstdc++ wrote:
>>
>> This patch defines _GLIBCXX_HAS_BUILTIN_TRAIT macro, which will be used
>> as a flag to toggle the use of built-in traits in the type_traits header
>> through _GLIBCXX_NO_BUILTIN_TRAITS macro, without needing to modify the
>> source code.
>>
>> libstdc++-v3/ChangeLog:
>>
>> * include/bits/c++config (_GLIBCXX_HAS_BUILTIN_TRAIT): Define.
>> (_GLIBCXX_HAS_BUILTIN): Keep defined.
>
>
> I think this would be a little better as:
>
> * include/bits/c++config (_GLIBCXX_HAS_BUILTIN): Do not undef.
> (_GLIBCXX_HAS_BUILTIN_TRAIT): Define.
>
> OK for trunk with that change, thanks.
>
Thank you for your review! Patrick and I were discussing the naming
conventions for the macros _GLIBCXX_HAS_BUILTIN_TRAIT and
_GLIBCXX_NO_BUILTIN_TRAITS. It was brought to our attention that these
names might be ambiguous, as there are implementations that have
corresponding built-ins but do not have a fallback. Therefore, we
believe that using _GLIBCXX_USE_BUILTIN_TRAIT instead of
_GLIBCXX_HAS_BUILTIN_TRAIT would be more appropriate. Similarly, we
think that _GLIBCXX_AVOID_BUILTIN_TRAITS would be a better choice than
_GLIBCXX_NO_BUILTIN_TRAITS, as the latter implies that there are no
built-ins, when in fact defining it is meant to express that the use of
built-ins should be avoided. Could you please
let me know your thoughts on these updated names?
>
>>
>>
>> Signed-off-by: Ken Matsui 
>> ---
>>  libstdc++-v3/include/bits/c++config | 10 +-
>>  1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config
>> index dd47f274d5f..984985d6fff 100644
>> --- a/libstdc++-v3/include/bits/c++config
>> +++ b/libstdc++-v3/include/bits/c++config
>> @@ -854,7 +854,15 @@ namespace __gnu_cxx
>>  # define _GLIBCXX_HAVE_BUILTIN_LAUNDER 1
>>  #endif
>>
>> -#undef _GLIBCXX_HAS_BUILTIN
>> +// Returns 1 if _GLIBCXX_NO_BUILTIN_TRAITS is not defined and the compiler
>> +// has a corresponding built-in type trait, 0 otherwise.
>> +// _GLIBCXX_NO_BUILTIN_TRAITS can be defined to disable the use of built-in
>> +// traits.
>> +#ifndef _GLIBCXX_NO_BUILTIN_TRAITS
>> +# define _GLIBCXX_HAS_BUILTIN_TRAIT(BT) _GLIBCXX_HAS_BUILTIN(BT)
>> +#else
>> +# define _GLIBCXX_HAS_BUILTIN_TRAIT(BT) 0
>> +#endif
>>
>>  // Mark code that should be ignored by the compiler, but seen by Doxygen.
>>  #define _GLIBCXX_DOXYGEN_ONLY(X)
>> --
>> 2.41.0
>>
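
The toggle pattern in the quoted hunk can be sketched outside libstdc++ as
follows. This is only an illustration, not the library's code: the DEMO_*
macro names and popcount32 are hypothetical, and __builtin_popcount stands in
for a type-trait built-in because plain C has no type traits. It assumes a
compiler that either provides __has_builtin or leaves it undefined.

```c
#include <assert.h>

/* Hypothetical stand-in for _GLIBCXX_HAS_BUILTIN: ask the compiler whether
   a built-in exists, or answer 0 if the query itself is unavailable.  */
#ifdef __has_builtin
# define DEMO_HAS_BUILTIN(B) __has_builtin(B)
#else
# define DEMO_HAS_BUILTIN(B) 0
#endif

/* Hypothetical stand-in for _GLIBCXX_HAS_BUILTIN_TRAIT: same answer as
   above unless the user defined DEMO_NO_BUILTIN_TRAITS, in which case the
   built-in is never used and the fallback is always selected.  */
#ifndef DEMO_NO_BUILTIN_TRAITS
# define DEMO_HAS_BUILTIN_TRAIT(BT) DEMO_HAS_BUILTIN(BT)
#else
# define DEMO_HAS_BUILTIN_TRAIT(BT) 0
#endif

/* Count set bits, using the built-in when allowed and a portable loop
   otherwise.  Both paths return the same value.  */
int popcount32(unsigned x)
{
#if DEMO_HAS_BUILTIN_TRAIT(__builtin_popcount)
  return __builtin_popcount(x);
#else
  int n = 0;
  for (; x != 0; x &= x - 1)  /* clear the lowest set bit each round */
    n++;
  return n;
#endif
}
```

Compiling with -DDEMO_NO_BUILTIN_TRAITS forces the fallback path without
touching the source, which is the property the patch description asks for.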


Re: [PATCH V3 2/3] RISC-V: Part-2: Save/Restore vector registers which need to be preversed

2023-08-31 Thread Lehua Ding

Sorry about that. I have rebased and sent a V4 patch, thanks.

On 2023/8/31 17:50, Kito Cheng via Gcc-patches wrote:

Could you rebase the patch again? It seems to conflict with the zcmt
changes which I committed in the past few days...

On Wed, Aug 30, 2023 at 9:54 AM Lehua Ding  wrote:


Functions which follow the vector calling convention variant have
callee-saved vector registers, but functions which follow the standard calling
convention do not. We need to distinguish which convention the callee follows so
that we can tell GCC exactly which vector registers the callee will clobber. So I
encode the callee's calling convention information into the call rtx patterns, as
AArch64 does. The old operands 2 and 3 of the call patterns, which were copied
from the MIPS target, are useless according to my analysis and are removed.

gcc/ChangeLog:

 * config/riscv/riscv-sr.cc (riscv_remove_unneeded_save_restore_calls): 
Pass riscv_cc.
 * config/riscv/riscv.cc (struct riscv_frame_info): Add new fields.
 (riscv_frame_info::reset): Reset new fields.
 (riscv_call_tls_get_addr): Pass riscv_cc.
 (riscv_function_arg): Return riscv_cc for call pattern.
 (riscv_insn_callee_abi): Implement TARGET_INSN_CALLEE_ABI.
 (riscv_save_reg_p): Add vector callee-saved check.
 (riscv_save_libcall_count): Add vector save area.
 (riscv_compute_frame_info): Ditto.
 (riscv_restore_reg): Update for type change.
 (riscv_for_each_saved_v_reg): New function save vector registers.
 (riscv_first_stack_step): Handle function with vector callee-saved 
registers.
 (riscv_expand_prologue): Ditto.
 (riscv_expand_epilogue): Ditto.
 (riscv_output_mi_thunk): Pass riscv_cc.
 (TARGET_INSN_CALLEE_ABI): Implement TARGET_INSN_CALLEE_ABI.
 * config/riscv/riscv.md: Add CALLEE_CC operand for call pattern.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/rvv/base/abi-callee-saved-1-fixed-1.c: New test.
 * gcc.target/riscv/rvv/base/abi-callee-saved-1-fixed-2.c: New test.
 * gcc.target/riscv/rvv/base/abi-callee-saved-1.c: New test.
 * gcc.target/riscv/rvv/base/abi-callee-saved-2.c: New test.
---
  gcc/config/riscv/riscv-sr.cc  |  12 +-
  gcc/config/riscv/riscv.cc | 222 +++---
  gcc/config/riscv/riscv.md |  43 +++-
  .../rvv/base/abi-callee-saved-1-fixed-1.c |  85 +++
  .../rvv/base/abi-callee-saved-1-fixed-2.c |  85 +++
  .../riscv/rvv/base/abi-callee-saved-1.c   |  87 +++
  .../riscv/rvv/base/abi-callee-saved-2.c   | 117 +
  7 files changed, 606 insertions(+), 45 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-callee-saved-1-fixed-1.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-callee-saved-1-fixed-2.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-callee-saved-1.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-callee-saved-2.c

diff --git a/gcc/config/riscv/riscv-sr.cc b/gcc/config/riscv/riscv-sr.cc
index 7248f04d68f..e6e17685df5 100644
--- a/gcc/config/riscv/riscv-sr.cc
+++ b/gcc/config/riscv/riscv-sr.cc
@@ -447,12 +447,18 @@ riscv_remove_unneeded_save_restore_calls (void)
&& !SIBCALL_REG_P (REGNO (target)))
  return;

+  /* Extract RISCV CC from the UNSPEC rtx.  */
+  rtx unspec = XVECEXP (callpat, 0, 1);
+  gcc_assert (GET_CODE (unspec) == UNSPEC
+ && XINT (unspec, 1) == UNSPEC_CALLEE_CC);
+  riscv_cc cc = (riscv_cc) INTVAL (XVECEXP (unspec, 0, 0));
rtx sibcall = NULL;
if (set_target != NULL)
-sibcall
-  = gen_sibcall_value_internal (set_target, target, const0_rtx);
+sibcall = gen_sibcall_value_internal (set_target, target, const0_rtx,
+ gen_int_mode (cc, SImode));
else
-sibcall = gen_sibcall_internal (target, const0_rtx);
+sibcall
+  = gen_sibcall_internal (target, const0_rtx, gen_int_mode (cc, SImode));

rtx_insn *before_call = PREV_INSN (call);
remove_insn (call);
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index aa6b46d7611..09c9e09e83a 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -108,6 +108,9 @@ struct GTY(())  riscv_frame_info {
/* Likewise FPR X.  */
unsigned int fmask;

+  /* Likewise for vector registers.  */
+  unsigned int vmask;
+
/* How much the GPR save/restore routines adjust sp (or 0 if unused).  */
unsigned save_libcall_adjustment;

@@ -115,6 +118,10 @@ struct GTY(())  riscv_frame_info {
poly_int64 gp_sp_offset;
poly_int64 fp_sp_offset;

+  /* Top and bottom offsets of vector save areas from frame bottom.  */
+  poly_int64 v_sp_offset_top;
+  poly_int64 v_sp_offset_bottom;
+
/* Offset of virtual frame pointer from stack pointer/frame bottom */
poly_int64 frame_pointer_offset;

@@ -265,7 +272,7 @@ unsigned riscv_stack_boundary;
  /* If non-zero, this is an o

[PATCH V4 3/3] RISC-V: Part-3: Output .variant_cc directive for vector function

2023-08-31 Thread Lehua Ding
Functions which follow the vector calling convention variant need to be annotated
with the .variant_cc directive according to the RISC-V Assembly Programmer's
Manual[1] and the RISC-V ELF Specification[2].

[1] https://github.com/riscv-non-isa/riscv-asm-manual/blob/master/riscv-asm.md#pseudo-ops
[2] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-elf.adoc#dynamic-linking
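
As a rough illustration of the intended assembly output, the sketch below
formats a function header the way riscv_declare_function_name in this patch
orders it: the .variant_cc annotation first, then the type directive, then the
label. The helper format_function_header is hypothetical, and the exact text of
the .type line is an assumption about typical ELF output rather than something
taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the header lines for function NAME into BUF.
   When VECTOR_CC is true the symbol is first annotated with .variant_cc,
   mirroring the order used by riscv_declare_function_name in the patch.
   Returns what snprintf returns (the length that was needed).  */
int format_function_header(char *buf, size_t size, const char *name,
                           bool vector_cc)
{
  assert(buf != NULL && name != NULL && strlen(name) > 0);
  if (vector_cc)
    return snprintf(buf, size,
                    "\t.variant_cc\t%s\n\t.type\t%s, @function\n%s:\n",
                    name, name, name);
  /* Standard calling convention: no annotation is emitted.  */
  return snprintf(buf, size, "\t.type\t%s, @function\n%s:\n", name, name);
}
```

The patch applies the same annotation to aliases and to external declarations,
so every symbol that refers to a vector-convention function carries it.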

gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_declare_function_name): Add protos.
(riscv_asm_output_alias): Ditto.
(riscv_asm_output_external): Ditto.
* config/riscv/riscv.cc (riscv_asm_output_variant_cc):
Output .variant_cc directive for vector function.
(riscv_declare_function_name): Ditto.
(riscv_asm_output_alias): Ditto.
(riscv_asm_output_external): Ditto.
* config/riscv/riscv.h (ASM_DECLARE_FUNCTION_NAME):
Implement ASM_DECLARE_FUNCTION_NAME.
(ASM_OUTPUT_DEF_FROM_DECLS): Implement ASM_OUTPUT_DEF_FROM_DECLS.
(ASM_OUTPUT_EXTERNAL): Implement ASM_OUTPUT_EXTERNAL.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/abi-call-variant_cc.c: New test.

---
 gcc/config/riscv/riscv-protos.h   |  3 ++
 gcc/config/riscv/riscv.cc | 48 +++
 gcc/config/riscv/riscv.h  | 15 ++
 .../riscv/rvv/base/abi-call-variant_cc.c  | 39 +++
 4 files changed, 105 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-call-variant_cc.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 3761ff3b86b..5853808694f 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -102,6 +102,9 @@ extern bool riscv_split_64bit_move_p (rtx, rtx);
 extern void riscv_split_doubleword_move (rtx, rtx);
 extern const char *riscv_output_move (rtx, rtx);
 extern const char *riscv_output_return ();
+extern void riscv_declare_function_name (FILE *, const char *, tree);
+extern void riscv_asm_output_alias (FILE *, const tree, const tree);
+extern void riscv_asm_output_external (FILE *, const tree, const char *);
 extern bool
 riscv_zcmp_valid_stack_adj_bytes_p (HOST_WIDE_INT, int);
 
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 0605298fcb3..5b9b504d1bc 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -7708,6 +7708,54 @@ riscv_emit_attribute ()
riscv_stack_boundary / 8);
 }
 
+/* Output .variant_cc for function symbol which follows vector calling
+   convention.  */
+
+static void
+riscv_asm_output_variant_cc (FILE *stream, const tree decl, const char *name)
+{
+  if (TREE_CODE (decl) == FUNCTION_DECL)
+{
+  riscv_cc cc = (riscv_cc) fndecl_abi (decl).id ();
+  if (cc == RISCV_CC_V)
+   {
+ fprintf (stream, "\t.variant_cc\t");
+ assemble_name (stream, name);
+ fprintf (stream, "\n");
+   }
+}
+}
+
+/* Implement ASM_DECLARE_FUNCTION_NAME.  */
+
+void
+riscv_declare_function_name (FILE *stream, const char *name, tree fndecl)
+{
+  riscv_asm_output_variant_cc (stream, fndecl, name);
+  ASM_OUTPUT_TYPE_DIRECTIVE (stream, name, "function");
+  ASM_OUTPUT_LABEL (stream, name);
+}
+
+/* Implement ASM_OUTPUT_DEF_FROM_DECLS.  */
+
+void
+riscv_asm_output_alias (FILE *stream, const tree decl, const tree target)
+{
+  const char *name = XSTR (XEXP (DECL_RTL (decl), 0), 0);
+  const char *value = IDENTIFIER_POINTER (target);
+  riscv_asm_output_variant_cc (stream, decl, name);
+  ASM_OUTPUT_DEF (stream, name, value);
+}
+
+/* Implement ASM_OUTPUT_EXTERNAL.  */
+
+void
+riscv_asm_output_external (FILE *stream, tree decl, const char *name)
+{
+  default_elf_asm_output_external (stream, decl, name);
+  riscv_asm_output_variant_cc (stream, decl, name);
+}
+
 /* Implement TARGET_ASM_FILE_START.  */
 
 static void
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 222aeec2b24..22a419a44e4 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -1046,6 +1046,21 @@ while (0)
 
 #define ASM_COMMENT_START "#"
 
+/* Add output .variant_cc directive for specific function definition.  */
+#undef ASM_DECLARE_FUNCTION_NAME
+#define ASM_DECLARE_FUNCTION_NAME(STR, NAME, DECL) \
+  riscv_declare_function_name (STR, NAME, DECL)
+
+/* Add output .variant_cc directive for specific alias definition.  */
+#undef ASM_OUTPUT_DEF_FROM_DECLS
+#define ASM_OUTPUT_DEF_FROM_DECLS(STR, DECL, TARGET) \
+  riscv_asm_output_alias (STR, DECL, TARGET)
+
+/* Add output .variant_cc directive for specific extern function.  */
+#undef ASM_OUTPUT_EXTERNAL
+#define ASM_OUTPUT_EXTERNAL(STR, DECL, NAME) \
+  riscv_asm_output_external (STR, DECL, NAME)
+
 #undef SIZE_TYPE
 #define SIZE_TYPE (POINTER_SIZE == 64 ? "long unsigned int" : "unsigned int")
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/abi-call-variant_cc.c

[PATCH V4 2/3] RISC-V: Part-2: Save/Restore vector registers which need to be preversed

2023-08-31 Thread Lehua Ding
Functions which follow the vector calling convention variant have
callee-saved vector registers, but functions which follow the standard calling
convention do not. We need to distinguish which convention the callee follows so
that we can tell GCC exactly which vector registers the callee will clobber. So I
encode the callee's calling convention information into the call rtx patterns, as
AArch64 does. The old operands 2 and 3 of the call patterns, which were copied
from the MIPS target, are useless according to my analysis and are removed.

gcc/ChangeLog:

* config/riscv/riscv-sr.cc (riscv_remove_unneeded_save_restore_calls): 
Pass riscv_cc.
* config/riscv/riscv.cc (struct riscv_frame_info): Add new fields.
(riscv_frame_info::reset): Reset new fields.
(riscv_call_tls_get_addr): Pass riscv_cc.
(riscv_function_arg): Return riscv_cc for call pattern.
(riscv_insn_callee_abi): Implement TARGET_INSN_CALLEE_ABI.
(riscv_save_reg_p): Add vector callee-saved check.
(riscv_stack_align): Add vector save area comment.
(riscv_compute_frame_info): Ditto.
(riscv_restore_reg): Update for type change.
(riscv_for_each_saved_v_reg): New function save vector registers.
(riscv_first_stack_step): Handle function with vector callee-saved 
registers.
(riscv_expand_prologue): Ditto.
(riscv_expand_epilogue): Ditto.
(riscv_output_mi_thunk): Pass riscv_cc.
(TARGET_INSN_CALLEE_ABI): Implement TARGET_INSN_CALLEE_ABI.
* config/riscv/riscv.md: Add CALLEE_CC operand for call pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/abi-callee-saved-1-fixed-1.c: New test.
* gcc.target/riscv/rvv/base/abi-callee-saved-1-fixed-2.c: New test.
* gcc.target/riscv/rvv/base/abi-callee-saved-1-save-restore.c: New test.
* gcc.target/riscv/rvv/base/abi-callee-saved-1-zcmp.c: New test.
* gcc.target/riscv/rvv/base/abi-callee-saved-1.c: New test.
* gcc.target/riscv/rvv/base/abi-callee-saved-2-save-restore.c: New test.
* gcc.target/riscv/rvv/base/abi-callee-saved-2-zcmp.c: New test.
* gcc.target/riscv/rvv/base/abi-callee-saved-2.c: New test.

---
 gcc/config/riscv/riscv-sr.cc  |  12 +-
 gcc/config/riscv/riscv.cc | 194 --
 gcc/config/riscv/riscv.md |  43 ++--
 .../rvv/base/abi-callee-saved-1-fixed-1.c |  86 
 .../rvv/base/abi-callee-saved-1-fixed-2.c |  86 
 .../base/abi-callee-saved-1-save-restore.c|  85 
 .../riscv/rvv/base/abi-callee-saved-1-zcmp.c  |  85 
 .../riscv/rvv/base/abi-callee-saved-1.c   |  88 
 .../base/abi-callee-saved-2-save-restore.c| 108 ++
 .../riscv/rvv/base/abi-callee-saved-2-zcmp.c  | 107 ++
 .../riscv/rvv/base/abi-callee-saved-2.c   | 117 +++
 11 files changed, 979 insertions(+), 32 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-callee-saved-1-fixed-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-callee-saved-1-fixed-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-callee-saved-1-save-restore.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-callee-saved-1-zcmp.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-callee-saved-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-callee-saved-2-save-restore.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-callee-saved-2-zcmp.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-callee-saved-2.c

diff --git a/gcc/config/riscv/riscv-sr.cc b/gcc/config/riscv/riscv-sr.cc
index 7248f04d68f..e6e17685df5 100644
--- a/gcc/config/riscv/riscv-sr.cc
+++ b/gcc/config/riscv/riscv-sr.cc
@@ -447,12 +447,18 @@ riscv_remove_unneeded_save_restore_calls (void)
   && !SIBCALL_REG_P (REGNO (target)))
 return;
 
+  /* Extract RISCV CC from the UNSPEC rtx.  */
+  rtx unspec = XVECEXP (callpat, 0, 1);
+  gcc_assert (GET_CODE (unspec) == UNSPEC
+ && XINT (unspec, 1) == UNSPEC_CALLEE_CC);
+  riscv_cc cc = (riscv_cc) INTVAL (XVECEXP (unspec, 0, 0));
   rtx sibcall = NULL;
   if (set_target != NULL)
-sibcall
-  = gen_sibcall_value_internal (set_target, target, const0_rtx);
+sibcall = gen_sibcall_value_internal (set_target, target, const0_rtx,
+ gen_int_mode (cc, SImode));
   else
-sibcall = gen_sibcall_internal (target, const0_rtx);
+sibcall
+  = gen_sibcall_internal (target, const0_rtx, gen_int_mode (cc, SImode));
 
   rtx_insn *before_call = PREV_INSN (call);
   remove_insn (call);
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index a317afd1c15..0605298fcb3 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -109,6 +109,9 @@ struct GTY(())  riscv_frame_info {
   /* Likewise FPR X.  */
   unsigned int fmask;
 
+  /* Likewi

[PATCH V4 1/3] RISC-V: Part-1: Select suitable vector registers for vector type args and returns

2023-08-31 Thread Lehua Ding
I quote the vector register calling convention rules from the proposal[1]
directly here:

v0 is used to pass the first vector mask argument to a function, and to return
vector mask result from a function. v8-v23 are used to pass vector data
arguments, vector tuple arguments and the rest vector mask arguments to a
function, and to return vector data and vector tuple results from a function.

Each vector data type and vector tuple type has an LMUL attribute that
indicates a vector register group. The value of LMUL indicates the number of
vector registers in the vector register group and requires the first vector
register number in the vector register group must be a multiple of it. For
example, the LMUL of `vint64m8_t` is 8, so v8-v15 vector register group can be
allocated to this type, but v9-v16 can not because the v9 register number is
not a multiple of 8. If LMUL is less than 1, it is treated as 1. If it is a
vector mask type, its LMUL is 1.

Each vector tuple type also has an NFIELDS attribute that indicates how many
vector register groups the type contains. Thus a vector tuple type needs to
take up LMUL×NFIELDS registers.

The rules for passing vector arguments are as follows:

1. For the first vector mask argument, use v0 to pass it. The argument has now
been allocated.

2. For vector data arguments or rest vector mask arguments, starting from the
v8 register, if a vector register group between v8-v23 that has not been
allocated can be found and the first register number is a multiple of LMUL,
then allocate this vector register group to the argument and mark these
registers as allocated. Otherwise, pass it by reference. The argument has now
been allocated.

3. For vector tuple arguments, starting from the v8 register, if NFIELDS
consecutive vector register groups between v8-v23 that have not been allocated
can be found and the first register number is a multiple of LMUL, then allocate
these vector register groups to the argument and mark these registers as
allocated. Otherwise, pass it by reference. The argument has now been allocated.

NOTE: It should be stressed that the search for the appropriate vector register
groups starts at v8 each time and does not start at the next register after the
registers are allocated for the previous vector argument. Therefore, it is
possible that the vector register number allocated to a vector argument can be
less than the vector register number allocated to previous vector arguments.
For example, for the function
`void foo (vint32m1_t a, vint32m2_t b, vint32m1_t c)`, according to the rules
of allocation, v8 will be allocated to `a`, v10-v11 will be allocated to `b`
and v9 will be allocated to `c`. This approach allows more vector registers to
be allocated to arguments in some cases.

Vector values are returned in the same manner as the first named argument of
the same type would be passed.

[1] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/389

gcc/ChangeLog:

* config/riscv/riscv-protos.h (builtin_type_p): New function for 
checking vector type.
* config/riscv/riscv-vector-builtins.cc (builtin_type_p): Ditto.
* config/riscv/riscv.cc (struct riscv_arg_info): New fields.
(riscv_init_cumulative_args): Setup variant_cc field.
(riscv_vector_type_p): New function for checking vector type.
(riscv_hard_regno_nregs): Hoist declare.
(riscv_get_vector_arg): Subroutine of riscv_get_arg_info.
(riscv_get_arg_info): Support vector cc.
(riscv_function_arg_advance): Update cum.
(riscv_pass_by_reference): Handle vector args.
(riscv_v_abi): New function return vector abi.
(riscv_return_value_is_vector_type_p): New function for checking vector 
returns.
(riscv_arguments_is_vector_type_p): New function for checking vector 
arguments.
(riscv_fntype_abi): Implement TARGET_FNTYPE_ABI.
(TARGET_FNTYPE_ABI): Implement TARGET_FNTYPE_ABI.
* config/riscv/riscv.h (GCC_RISCV_H): Define macros for vector abi.
(MAX_ARGS_IN_VECTOR_REGISTERS): Ditto.
(MAX_ARGS_IN_MASK_REGISTERS): Ditto.
(V_ARG_FIRST): Ditto.
(V_ARG_LAST): Ditto.
(enum riscv_cc): Define all RISCV_CC variants.
* config/riscv/riscv.opt: Add --param=riscv-vector-abi.
---
 gcc/config/riscv/riscv-protos.h   |   1 +
 gcc/config/riscv/riscv-vector-builtins.cc |  10 +
 gcc/config/riscv/riscv.cc | 235 ++--
 gcc/config/riscv/riscv.h  |  25 ++
 gcc/config/riscv/riscv.opt|   5 +
 .../riscv/rvv/base/abi-call-args-1-run.c  | 127 +
 .../riscv/rvv/base/abi-call-args-1.c  | 197 +
 .../riscv/rvv/base/abi-call-args-2-run.c  |  34 +++
 .../riscv/rvv/base/abi-call-args-2.c  |  27 ++
 .../riscv/rvv/base/abi-call-args-3-run.c  | 260 ++
 .../riscv/rvv/base/abi-call-args-3.c  | 116 
 .../riscv/rvv/base/abi-call-a

[PATCH V4 0/3] RISC-V: Add an experimental vector calling convention

2023-08-31 Thread Lehua Ding
V4 change: Rebasing.

Hi RISC-V folks,

This patch series implements the proposed RISC-V vector calling convention[1];
the feature can be enabled with the `--param=riscv-vector-abi` option. Currently,
all vector type arguments and return values are passed by reference. With this
series, these arguments and return values can be passed in vector registers.
Currently only the vector types defined in the RISC-V Vector Extension Intrinsic
Document[2] are supported. GNU-ext vector types are unsupported for now since the
corresponding proposal has not been presented.

The proposal introduces a new calling convention variant; functions which follow
this variant need to follow the vector register convention below.

| Name    | ABI Mnemonic | Meaning                | Preserved across calls?
===========================================================================
| v0      |              | Argument register      | No
| v1-v7   |              | Callee-saved registers | Yes
| v8-v23  |              | Argument registers     | No
| v24-v31 |              | Callee-saved registers | Yes
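
The table can be read as a simple predicate on the register number. The sketch
below is only a restatement of the table; vreg_is_callee_saved is a
hypothetical helper and not a function from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: true if vector register v<regno> is preserved across
   calls under the vector calling convention variant described above.  */
bool vreg_is_callee_saved(int regno)
{
  assert(regno >= 0 && regno <= 31);
  /* v1-v7 and v24-v31 are callee-saved; v0 and v8-v23 carry arguments and
     results and are therefore not preserved.  */
  return (regno >= 1 && regno <= 7) || (regno >= 24 && regno <= 31);
}
```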

If a function follows this vector calling convention, then the function symbol
must be annotated with the .variant_cc directive[3] (used to indicate that it is a
calling convention variant).

The implementation is split into three parts, each of which corresponds to a
sub-patch.

- Part-1: Select suitable vector registers for vector type arguments and return
  values according to the proposal.
- Part-2: Allocate frame area for callee-saved vector registers and save/restore
  them in prologue and epilogue.
- Part-3: Generate .variant_cc directive for vector function in assembly code.

[1] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/389
[2] https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md#type-system
[3] https://github.com/riscv-non-isa/riscv-asm-manual/blob/master/riscv-asm.md#pseudo-ops

Best,
Lehua

Lehua Ding (3):
  RISC-V: Part-1: Select suitable vector registers for vector type args
and returns
  RISC-V: Part-2: Save/Restore vector registers which need to be
preversed
  RISC-V: Part-3: Output .variant_cc directive for vector function

 gcc/config/riscv/riscv-protos.h   |   4 +
 gcc/config/riscv/riscv-sr.cc  |  12 +-
 gcc/config/riscv/riscv-vector-builtins.cc |  10 +
 gcc/config/riscv/riscv.cc | 477 --
 gcc/config/riscv/riscv.h  |  40 ++
 gcc/config/riscv/riscv.md |  43 +-
 gcc/config/riscv/riscv.opt|   5 +
 .../riscv/rvv/base/abi-call-args-1-run.c  | 127 +
 .../riscv/rvv/base/abi-call-args-1.c  | 197 
 .../riscv/rvv/base/abi-call-args-2-run.c  |  34 ++
 .../riscv/rvv/base/abi-call-args-2.c  |  27 +
 .../riscv/rvv/base/abi-call-args-3-run.c  | 260 ++
 .../riscv/rvv/base/abi-call-args-3.c  | 116 +
 .../riscv/rvv/base/abi-call-args-4-run.c  | 145 ++
 .../riscv/rvv/base/abi-call-args-4.c  | 111 
 .../riscv/rvv/base/abi-call-error-1.c |  11 +
 .../riscv/rvv/base/abi-call-return-run.c  | 127 +
 .../riscv/rvv/base/abi-call-return.c  | 197 
 .../riscv/rvv/base/abi-call-variant_cc.c  |  39 ++
 .../rvv/base/abi-callee-saved-1-fixed-1.c |  86 
 .../rvv/base/abi-callee-saved-1-fixed-2.c |  86 
 .../base/abi-callee-saved-1-save-restore.c|  85 
 .../riscv/rvv/base/abi-callee-saved-1-zcmp.c  |  85 
 .../riscv/rvv/base/abi-callee-saved-1.c   |  88 
 .../base/abi-callee-saved-2-save-restore.c| 108 
 .../riscv/rvv/base/abi-callee-saved-2-zcmp.c  | 107 
 .../riscv/rvv/base/abi-callee-saved-2.c   | 117 +
 27 files changed, 2695 insertions(+), 49 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-call-args-1-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-call-args-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-call-args-2-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-call-args-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-call-args-3-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-call-args-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-call-args-4-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-call-args-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-call-error-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-call-return-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-call-return.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-call-variant_cc.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-callee-saved-1-fixed-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-callee-saved-1-fixed-2.c
 create m

Re: [PATCH] [tree-optimization/110279] swap operands in reassoc to reduce cross backedge FMA

2023-08-31 Thread Richard Biener via Gcc-patches
On Wed, Aug 30, 2023 at 11:33 AM Di Zhao OS
 wrote:
>
> Hello Richard,
>
> > -Original Message-
> > From: Richard Biener 
> > Sent: Tuesday, August 29, 2023 7:11 PM
> > To: Di Zhao OS 
> > Cc: Jeff Law ; Martin Jambor ; gcc-
> > patc...@gcc.gnu.org
> > Subject: Re: [PATCH] [tree-optimization/110279] swap operands in reassoc to
> > reduce cross backedge FMA
> >
> > On Tue, Aug 29, 2023 at 10:59 AM Di Zhao OS
> >  wrote:
> > >
> > > Hi,
> > >
> > > > -Original Message-
> > > > From: Richard Biener 
> > > > Sent: Tuesday, August 29, 2023 4:09 PM
> > > > To: Di Zhao OS 
> > > > Cc: Jeff Law ; Martin Jambor ;
> > gcc-
> > > > patc...@gcc.gnu.org
> > > > Subject: Re: [PATCH] [tree-optimization/110279] swap operands in reassoc
> > to
> > > > reduce cross backedge FMA
> > > >
> > > > On Tue, Aug 29, 2023 at 9:49 AM Di Zhao OS
> > > >  wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > > -Original Message-
> > > > > > From: Richard Biener 
> > > > > > Sent: Tuesday, August 29, 2023 3:41 PM
> > > > > > To: Jeff Law ; Martin Jambor 
> > > > > > 
> > > > > > Cc: Di Zhao OS ; gcc-
> > patc...@gcc.gnu.org
> > > > > > Subject: Re: [PATCH] [tree-optimization/110279] swap operands in
> > reassoc
> > > > to
> > > > > > reduce cross backedge FMA
> > > > > >
> > > > > > On Tue, Aug 29, 2023 at 1:23 AM Jeff Law via Gcc-patches
> > > > > >  wrote:
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On 8/28/23 02:17, Di Zhao OS via Gcc-patches wrote:
> > > > > > > > This patch tries to fix the 2% regression in 510.parest_r on
> > > > > > > > ampere1 in the tracker. (Previous discussion is here:
> > > > > > > > https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624893.html)
> > > > > > > >
> > > > > > > > 1. Add testcases for the problem. For an op list in the form of
> > > > > > > > "acc = a * b + c * d + acc", currently reassociation doesn't
> > > > > > > > > swap the operands so that more FMAs can be generated.
> > > > > > > > After widening_mul the result looks like:
> > > > > > > >
> > > > > > > > _1 = .FMA(a, b, acc_0);
> > > > > > > > acc_1 = .FMA(c, d, _1);
> > > > > > > >
> > > > > > > > While previously (before the "Handle FMA friendly..." patch),
> > > > > > > > widening_mul's result was like:
> > > > > > > >
> > > > > > > > _1 = a * b;
> > > > > > > > _2 = .FMA (c, d, _1);
> > > > > > > > acc_1 = acc_0 + _2;
> > > > > >
> > > > > > How can we execute the multiply and the FMA in parallel?  They
> > > > > > depend on each other.  Or is it the uarch can handle dependence
> > > > > > on the add operand but only when it is with a multiplication and
> > > > > > not a FMA in some better ways?  (I'd doubt so much complexity)
> > > > > >
> > > > > > Can you explain in more detail how the uarch executes one vs. the
> > > > > > other case?
> > >
> > > Here's my understanding after consulting our hardware team. For the
> > > second case, the uarch of some out-of-order processors can calculate
> > > "_2" of several iterations at the same time, since there's no dependency
> > > among different iterations. While for the first case the next iteration
> > > has to wait for the current iteration to finish, so that "acc_0"'s value is
> > > known. I assume it is also the case in some i386 processors, since I
> > > saw the patch "Deferring FMA transformations in tight loops" also
> > > changed corresponding files.
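
The two shapes discussed in the quoted text can be written out at source level
as below. This is a sketch of the dependency structure only: the function
names are hypothetical, and whether a compiler actually forms FMAs from these
statements depends on the target and on floating-point contraction settings.

```c
#include <assert.h>
#include <stddef.h>

/* First case: both multiply-adds sit on the loop-carried chain through acc,
   so the next iteration cannot start either of them until acc is known.  */
double dot2_chained(const double *a, const double *b,
                    const double *c, const double *d, size_t n)
{
  assert(a && b && c && d);
  double acc = 0.0;
  for (size_t i = 0; i < n; i++) {
    double t = a[i] * b[i] + acc; /* _1 = .FMA (a, b, acc_0) */
    acc = c[i] * d[i] + t;        /* acc_1 = .FMA (c, d, _1) */
  }
  return acc;
}

/* Second case: the products are combined without touching acc, so only the
   final addition is on the loop-carried chain and the rest can overlap
   across iterations on an out-of-order core.  */
double dot2_split(const double *a, const double *b,
                  const double *c, const double *d, size_t n)
{
  assert(a && b && c && d);
  double acc = 0.0;
  for (size_t i = 0; i < n; i++) {
    double t1 = a[i] * b[i];      /* _1 = a * b */
    double t2 = c[i] * d[i] + t1; /* _2 = .FMA (c, d, _1) */
    acc = acc + t2;               /* acc_1 = acc_0 + _2 */
  }
  return acc;
}
```

For inputs whose products and sums are exactly representable, both functions
return the same value; in general the two associations can differ in rounding,
which is why reassociation of this kind is tied to the relevant
floating-point options.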
> >
> > That should be true for all kind of operations, no?  Thus it means
> > reassoc should in general associate cross-iteration accumulation
> Yes I think both are true.
>
> > last?  Historically we associated those first because that's how the
> > vectorizer liked to see them, but I think that's no longer necessary.
> >
> > It should be achievable by properly biasing the operand during
> > rank computation (don't we already do that?).
>
> The issue is related to the following code (handling cases with
> three operands left):
>   /* When there are three operands left, we want
>      to make sure the ones that get the double
>      binary op are chosen wisely.  */
>   int len = ops.length ();
>   if (len >= 3 && !has_fma)
>     swap_ops_for_binary_stmt (ops, len - 3);
>
>   new_lhs = rewrite_expr_tree (stmt, rhs_code, 0, ops,
>                                powi_result != NULL
>                                || negate_result,
>                                len != orig_len);
>
> Originally (before the "Handle FMA friendly..." patch), for the
> tiny example, the 2 multiplications were placed first by
> swap_ops_for_binary_stmt and rewrite_expr_tree, according to
> ranks. Currently, to preserve more FMAs,
> swap_ops_for_binary_stmt isn't called, so the result is
> MULT_EXPRs and PLUS_EXPRs interleaved with each other (which
> is mostly fine if these are not in such tight loops).
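
The two association orders under discussion can be sketched in plain C.
This is only an illustration: the function names are made up, and whether
a compiler contracts the multiply-adds into FMAs depends on target and
flags. The second function has the shape of the quoted widening_mul
result, where only the final add depends on the previous iteration.

```c
#include <stddef.h>

/* Order 1: both multiply-adds sit on the accumulator chain, so the
   next iteration cannot start its first add until acc is known.  */
double
sum_chained (const double *a, const double *b,
             const double *c, const double *d, size_t n)
{
  double acc = 0.0;
  for (size_t i = 0; i < n; i++)
    acc = (acc + a[i] * b[i]) + c[i] * d[i];
  return acc;
}

/* Order 2: the two products are combined first; only the last add
   is loop-carried, so products of later iterations can overlap.  */
double
sum_accumulate_last (const double *a, const double *b,
                     const double *c, const double *d, size_t n)
{
  double acc = 0.0;
  for (size_t i = 0; i < n; i++)
    {
      double t = a[i] * b[i] + c[i] * d[i];
      acc = acc + t;
    }
  return acc;
}
```

For inputs that are exactly representable (small integers, say) both
functions return the same value; in general floating-point reassociation
can change rounding, which is why reassoc only does this when permitted.
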
>
> What this patch tries to do can be summarized as: when cross
> backedge dependency is detected (and the uarch doesn't like it),
> better fallback to t

[PATCH] RISC-V: Add Vector cost model framework for RVV

2023-08-31 Thread Juzhe-Zhong
Hi, currently RVV vectorization only supports picking LMUL according to the
compile option --param=riscv-autovec-lmul=, which is not ideal.

The compiler should be able to pick the optimal LMUL/vectorization factor to
vectorize the loop according to the loop_vec_info and SSA-based register
pressure analysis.

I have found that the current GCC cost model provides a way to
choose the LMUL/vectorization factor by adjusting the cost.

This patch just adds the minimum cost model framework, which still
applies the default cost model (no vector code changes from before).

All regression tests passed with no differences.
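
To make the idea of "choosing LMUL by adjusting the cost" concrete, here
is a toy sketch in C. It is not part of this patch and not how GCC
computes costs: lmul_cost, pick_lmul, the spill penalty and the cost
formula are all hypothetical. The only hard fact used is that RVV has 32
vector registers, so a value occupying LMUL registers limits how many
values can be live at once.

```c
#include <stddef.h>

static const unsigned num_vector_regs = 32; /* v0..v31 */
static const unsigned spill_penalty = 4;    /* hypothetical weight */

/* Hypothetical cost of the loop at a given LMUL: a larger LMUL needs
   fewer iterations, but each live value occupies LMUL registers and
   anything beyond the register file is charged as a spill.  */
static unsigned
lmul_cost (unsigned lmul, unsigned live_values, unsigned body_cost)
{
  unsigned regs_needed = lmul * live_values;
  unsigned cost = body_cost * 8 / lmul;
  if (regs_needed > num_vector_regs)
    cost += (regs_needed - num_vector_regs) * spill_penalty;
  return cost;
}

/* Return the candidate LMUL with the lowest hypothetical cost.  */
unsigned
pick_lmul (unsigned live_values, unsigned body_cost)
{
  static const unsigned candidates[] = { 1, 2, 4, 8 };
  unsigned best = candidates[0];
  unsigned best_cost = lmul_cost (best, live_values, body_cost);
  for (size_t i = 1; i < sizeof candidates / sizeof candidates[0]; i++)
    {
      unsigned cost = lmul_cost (candidates[i], live_values, body_cost);
      if (cost < best_cost)
        {
          best = candidates[i];
          best_cost = cost;
        }
    }
  return best;
}
```

With few live values the largest LMUL wins; as register pressure grows
the spill term pushes the choice toward a smaller LMUL.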

gcc/ChangeLog:

* config.gcc: Add vector cost model framework for RVV.
* config/riscv/riscv.cc (riscv_vectorize_create_costs): Ditto.
(TARGET_VECTORIZE_CREATE_COSTS): Ditto.
* config/riscv/t-riscv: Ditto.
* config/riscv/riscv-vector-costs.cc: New file.
* config/riscv/riscv-vector-costs.h: New file.

---
 gcc/config.gcc |  2 +-
 gcc/config/riscv/riscv-vector-costs.cc | 66 ++
 gcc/config/riscv/riscv-vector-costs.h  | 44 +
 gcc/config/riscv/riscv.cc  | 15 ++
 gcc/config/riscv/t-riscv   |  8 
 5 files changed, 134 insertions(+), 1 deletion(-)
 create mode 100644 gcc/config/riscv/riscv-vector-costs.cc
 create mode 100644 gcc/config/riscv/riscv-vector-costs.h

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 415e0e1ebc5..0ba1a7f494c 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -530,7 +530,7 @@ pru-*-*)
;;
 riscv*)
cpu_type=riscv
-   extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o riscv-selftests.o riscv-v.o riscv-vsetvl.o"
+   extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o riscv-selftests.o riscv-v.o riscv-vsetvl.o riscv-vector-costs.o"
extra_objs="${extra_objs} riscv-vector-builtins.o riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
extra_objs="${extra_objs} thead.o"
d_target_objs="riscv-d.o"
diff --git a/gcc/config/riscv/riscv-vector-costs.cc b/gcc/config/riscv/riscv-vector-costs.cc
new file mode 100644
index 000..1a5e13d5eb3
--- /dev/null
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -0,0 +1,66 @@
+/* Cost model implementation for RISC-V 'V' Extension for GNU compiler.
+   Copyright (C) 2023-2023 Free Software Foundation, Inc.
+   Contributed by Juzhe Zhong (juzhe.zh...@rivai.ai), RiVAI Technologies Ltd.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#define IN_TARGET_CODE 1
+
+#define INCLUDE_STRING
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "target.h"
+#include "function.h"
+#include "tree.h"
+#include "basic-block.h"
+#include "rtl.h"
+#include "gimple.h"
+#include "targhooks.h"
+#include "cfgloop.h"
+#include "fold-const.h"
+#include "tm_p.h"
+#include "tree-vectorizer.h"
+
+/* This file should be included last.  */
+#include "riscv-vector-costs.h"
+
+namespace riscv_vector {
+
+costs::costs (vec_info *vinfo, bool costing_for_scalar)
+  : vector_costs (vinfo, costing_for_scalar)
+{}
+
+unsigned
+costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
+ stmt_vec_info stmt_info, slp_tree, tree vectype,
+ int misalign, vect_cost_model_location where)
+{
+  /* TODO: Use default STMT cost model.
+  We will support more accurate STMT cost model later.  */
+  int stmt_cost = default_builtin_vectorization_cost (kind, vectype, misalign);
+  return record_stmt_cost (stmt_info, where, count * stmt_cost);
+}
+
+void
+costs::finish_cost (const vector_costs *scalar_costs)
+{
+  vector_costs::finish_cost (scalar_costs);
+}
+
+} // namespace riscv_vector
diff --git a/gcc/config/riscv/riscv-vector-costs.h b/gcc/config/riscv/riscv-vector-costs.h
new file mode 100644
index 000..57b1be01048
--- /dev/null
+++ b/gcc/config/riscv/riscv-vector-costs.h
@@ -0,0 +1,44 @@
+/* Cost model declaration of RISC-V 'V' Extension for GNU compiler.
+   Copyright (C) 2023-2023 Free Software Foundation, Inc.
+   Contributed by Juzhe Zhong (juzhe.zh...@rivai.ai), RiVAI Technologies Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software F

Re: [PATCH] MATCH: extend min_value/max_value match to vectors

2023-08-31 Thread Richard Biener via Gcc-patches
On Thu, Aug 31, 2023 at 12:27 AM Andrew Pinski via Gcc-patches
 wrote:
>
> This simple patch extends the min_value/max_value match to vector integer 
> types.
> Using uniform_integer_cst_p makes this easy.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>
> The testcases pr110915-*.c are the same as pr88784-*.c except using vector
> types instead.

OK.
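
The per-lane identities that the pr110915-*.c testcases exercise can be
checked exhaustively on a narrow type. The sketch below is only an
illustration (count_mismatches is a made-up name) and uses 8-bit scalars
instead of vectors; the patch makes the same min/max reasoning apply when
the constant is a uniform vector.

```c
#include <limits.h>

/* Count the (x, y) pairs for which one of the identities fails:
     (x > y) && (x != MIN)  ==  (x > y)
     (x < y) && (x != MAX)  ==  (x < y)
   checked for both unsigned and signed 8-bit values.  */
int
count_mismatches (void)
{
  int bad = 0;
  for (int xi = 0; xi <= UCHAR_MAX; xi++)
    for (int yi = 0; yi <= UCHAR_MAX; yi++)
      {
        unsigned char ux = (unsigned char) xi;
        unsigned char uy = (unsigned char) yi;
        signed char sx = (signed char) (xi + SCHAR_MIN);
        signed char sy = (signed char) (yi + SCHAR_MIN);

        bad += ((ux > uy) && (ux != 0)) != (ux > uy);
        bad += ((ux < uy) && (ux != UCHAR_MAX)) != (ux < uy);
        bad += ((sx > sy) && (sx != SCHAR_MIN)) != (sx > sy);
        bad += ((sx < sy) && (sx != SCHAR_MAX)) != (sx < sy);
      }
  return bad;
}
```

The extra comparison is redundant because a value strictly greater than
something cannot be the minimum, and a value strictly less than something
cannot be the maximum.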

> PR tree-optimization/110915
>
> gcc/ChangeLog:
>
> * match.pd (min_value, max_value): Extend to vector constants.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/pr110915-1.c: New test.
> * gcc.dg/pr110915-10.c: New test.
> * gcc.dg/pr110915-11.c: New test.
> * gcc.dg/pr110915-12.c: New test.
> * gcc.dg/pr110915-2.c: New test.
> * gcc.dg/pr110915-3.c: New test.
> * gcc.dg/pr110915-4.c: New test.
> * gcc.dg/pr110915-5.c: New test.
> * gcc.dg/pr110915-6.c: New test.
> * gcc.dg/pr110915-7.c: New test.
> * gcc.dg/pr110915-8.c: New test.
> * gcc.dg/pr110915-9.c: New test.
> ---
>  gcc/match.pd   | 24 ++
>  gcc/testsuite/gcc.dg/pr110915-1.c  | 31 
>  gcc/testsuite/gcc.dg/pr110915-10.c | 33 ++
>  gcc/testsuite/gcc.dg/pr110915-11.c | 31 
>  gcc/testsuite/gcc.dg/pr110915-12.c | 31 
>  gcc/testsuite/gcc.dg/pr110915-2.c  | 31 
>  gcc/testsuite/gcc.dg/pr110915-3.c  | 33 ++
>  gcc/testsuite/gcc.dg/pr110915-4.c  | 33 ++
>  gcc/testsuite/gcc.dg/pr110915-5.c  | 32 +
>  gcc/testsuite/gcc.dg/pr110915-6.c  | 32 +
>  gcc/testsuite/gcc.dg/pr110915-7.c  | 32 +
>  gcc/testsuite/gcc.dg/pr110915-8.c  | 32 +
>  gcc/testsuite/gcc.dg/pr110915-9.c  | 33 ++
>  13 files changed, 400 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr110915-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/pr110915-10.c
>  create mode 100644 gcc/testsuite/gcc.dg/pr110915-11.c
>  create mode 100644 gcc/testsuite/gcc.dg/pr110915-12.c
>  create mode 100644 gcc/testsuite/gcc.dg/pr110915-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/pr110915-3.c
>  create mode 100644 gcc/testsuite/gcc.dg/pr110915-4.c
>  create mode 100644 gcc/testsuite/gcc.dg/pr110915-5.c
>  create mode 100644 gcc/testsuite/gcc.dg/pr110915-6.c
>  create mode 100644 gcc/testsuite/gcc.dg/pr110915-7.c
>  create mode 100644 gcc/testsuite/gcc.dg/pr110915-8.c
>  create mode 100644 gcc/testsuite/gcc.dg/pr110915-9.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 6a7edde5736..c01362ee359 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -2750,16 +2750,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>& (bitpos / BITS_PER_UNIT))); }
>
>  (match min_value
> - INTEGER_CST
> - (if ((INTEGRAL_TYPE_P (type)
> -   || POINTER_TYPE_P(type))
> -  && wi::eq_p (wi::to_wide (t), wi::min_value (type)
> + uniform_integer_cst_p
> + (with {
> +   tree int_cst = uniform_integer_cst_p (t);
> +   tree inner_type = TREE_TYPE (int_cst);
> +  }
> +  (if ((INTEGRAL_TYPE_P (inner_type)
> +|| POINTER_TYPE_P (inner_type))
> +   && wi::eq_p (wi::to_wide (int_cst), wi::min_value (inner_type))
>
>  (match max_value
> - INTEGER_CST
> - (if ((INTEGRAL_TYPE_P (type)
> -   || POINTER_TYPE_P(type))
> -  && wi::eq_p (wi::to_wide (t), wi::max_value (type)
> + uniform_integer_cst_p
> + (with {
> +   tree int_cst = uniform_integer_cst_p (t);
> +   tree itype = TREE_TYPE (int_cst);
> +  }
> + (if ((INTEGRAL_TYPE_P (itype)
> +   || POINTER_TYPE_P (itype))
> +  && wi::eq_p (wi::to_wide (int_cst), wi::max_value (itype))
>
>  /* x >  y  &&  x != XXX_MIN  -->  x > y
> x >  y  &&  x == XXX_MIN  -->  false . */
> diff --git a/gcc/testsuite/gcc.dg/pr110915-1.c b/gcc/testsuite/gcc.dg/pr110915-1.c
> new file mode 100644
> index 000..2e1e871b9a0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr110915-1.c
> @@ -0,0 +1,31 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-ifcombine" } */
> +#define vector __attribute__((vector_size(sizeof(unsigned)*2)))
> +
> +#include <limits.h>
> +
> +vector signed and1(vector unsigned x, vector unsigned y)
> +{
> +  /* (x > y) & (x != 0)  --> x > y */
> +  return (x > y) & (x != 0);
> +}
> +
> +vector signed and2(vector unsigned x, vector unsigned y)
> +{
> +  /* (x < y) & (x != UINT_MAX)  --> x < y */
> +  return (x < y) & (x != UINT_MAX);
> +}
> +
> +vector signed and3(vector signed x, vector signed y)
> +{
> +  /* (x > y) & (x != INT_MIN)  --> x > y */
> +  return (x > y) & (x != INT_MIN);
> +}
> +
> +vector signed and4(vector signed x, vector signed y)
> +{
> +  /* (x < y) & (x != INT_MAX)  --> x < y */
> +  return (x < y) & (x != INT_

Re: [RFC] gimple ssa: SCCP - A new PHI optimization pass

2023-08-31 Thread Richard Biener via Gcc-patches
On Thu, 31 Aug 2023, Filip Kastl wrote:

> > The most obvious places would be right after SSA construction and before 
> > RTL expansion.
> > Can you provide measurements for those positions?
> 
> The algorithm should only remove PHIs that break SSA form minimality. Since
> GCC's SSA construction already produces minimal SSA form, the algorithm isn't
> expected to remove any PHIs if run right after the construction. I even
> measured it and indeed -- no PHIs got removed (except for 502.gcc_r, where the
> algorithm managed to remove exactly 1 PHI, which is weird). 
> 
> I tried putting the pass before pass_expand. There aren't a lot of PHIs to
> remove at that point, but there still are some.

That's interesting.  Your placement at

  NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);
  NEXT_PASS (pass_phiopt, true /* early_p */);
+ NEXT_PASS (pass_sccp);

and

   NEXT_PASS (pass_tsan);
   NEXT_PASS (pass_dse, true /* use DR analysis */);
   NEXT_PASS (pass_dce);
+  NEXT_PASS (pass_sccp);

isn't immediately after the "best" existing pass we have to
remove dead PHIs which is pass_cd_dce.  phiopt might leave
dead PHIs around and the second instance runs long after the
last CD-DCE.

So I wonder if your pass just detects unnecessary PHIs we'd have
removed by other means and what survives until RTL expansion is
what we should count?

Can you adjust your original early placement to right after
the cd-dce pass and for the late placement turn the dce pass
before it into cd-dce and re-do your measurements?

> 500.perlbench_r
> Started with 43111
> Ended with 42942
> Removed PHI % .39201131961680313700
> 
> 502.gcc_r
> Started with 141392
> Ended with 140455
> Removed PHI % .66269661649881181400
> 
> 505.mcf_r
> Started with 482
> Ended with 478
> Removed PHI % .82987551867219917100
> 
> 523.xalancbmk_r
> Started with 136040
> Ended with 135629
> Removed PHI % .30211702440458688700
> 
> 531.deepsjeng_r
> Started with 2150
> Ended with 2148
> Removed PHI % .09302325581395348900
> 
> 541.leela_r
> Started with 4664
> Ended with 4650
> Removed PHI % .30017152658662092700
> 
> 557.xz_r
> Started with 43
> Ended with 43
> Removed PHI % 0
> 
> > Can the pass somehow be used as part of propagations like during value 
> > numbering?
> 
> I don't think that the pass could be used as a part of different optimizations
> since it works on the whole CFG (except for copy propagation as I noted in the
> RFC). I'm adding Honza into Cc. He'll have more insight into this.
> 
> > Could the new file be called gimple-ssa-sccp.cc or something similar?
> 
> Certainly. Though I'm not sure, but wouldn't tree-ssa-sccp.cc be more
> appropriate?
> 
> I'm thinking about naming the pass 'scc-copy' and the file
> 'tree-ssa-scc-copy.cc'.
> 
> > Removing some PHIs is nice, but it would be also interesting to know what
> > are the effects on generated code size and/or performance.
> > And also if it has any effects on debug information coverage.
> 
> Regarding performance: I ran some benchmarks on a Zen3 machine with -O3 with
> and without the new pass. *I got ~2% speedup for 505.mcf_r and 541.leela_r.
> Here are the full results. What do you think? Should I run more benchmarks? Or
> benchmark multiple times? Or run the benchmarks on different machines?*
> 
> 500.perlbench_r
> Without SCCP: 244.151807s
> With SCCP: 242.448438s
> -0.7025695913124297%
> 
> 502.gcc_r
> Without SCCP: 211.029606s
> With SCCP: 211.614523s
> +0.27640683243653763%
> 
> 505.mcf_r
> Without SCCP: 298.782621s
> With SCCP: 291.671468s
> -2.438069465197046%
> 
> 523.xalancbmk_r
> Without SCCP: 189.940639s
> With SCCP: 189.876261s
> -0.03390523894928332%
> 
> 531.deepsjeng_r
> Without SCCP: 250.63648s
> With SCCP: 250.988624s
> +0.1403027732444051%
> 
> 541.leela_r
> Without SCCP: 346.066278s
> With SCCP: 339.692987s
> -1.8761915152519792%
> 
> Regarding size: The pass doesn't seem to significantly reduce or increase the
> size of the result binary. The differences were at most ~0.1%.
> 
> Regarding debug info coverage: I didn't notice any additional guality 
> testcases
> failing after I applied the patch. *Is there any other way how I should check
> debug info coverage?*
> 
> 
> Filip K
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [RFC] gimple ssa: SCCP - A new PHI optimization pass

2023-08-31 Thread Jakub Jelinek via Gcc-patches
On Thu, Aug 31, 2023 at 01:26:37PM +0200, Filip Kastl wrote:
> Regarding debug info coverage: I didn't notice any additional guality 
> testcases
> failing after I applied the patch. *Is there any other way how I should check
> debug info coverage?*

I usually use https://github.com/pmachata/dwlocstat
for that, typically on cc1plus from the last stage of a gcc bootstrap.
Though of course, if one tree is unpatched and one patched, that results in
different code not just because the optimization did something, but because
the source is different.  So, for such purposes, after one of the
2 bootstraps I usually apply (or revert) the patch, make a copy of the
cc1plus binary and run make -jN cc1plus in the last stage directory (only
there, without attempting to rebootstrap).

Jakub



[PATCH] rs6000: unnecessary clear after vctzlsbb in vec_first_match_or_eos_index

2023-08-31 Thread Ajit Agarwal via Gcc-patches
This patch removes zero extension from vctzlsbb as it already zero extends.
Bootstrapped and regtested on powerpc64-linux-gnu.

Thanks & Regards
Ajit

rs6000: unnecessary clear after vctzlsbb in vec_first_match_or_eos_index

For the rs6000 target we don't need zero_extend after vctzlsbb, as vctzlsbb
already zero extends.
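
A rough C model of why the extension is redundant, under an assumed
reading of the instruction: vctzlsbb counts, starting from the last byte
element, how many consecutive bytes have a least-significant bit of zero,
and writes that count zero-extended to the target GPR. The function name
and the array layout (index 0 is the first element in ISA numbering) are
illustrative only.

```c
#include <stdint.h>

/* Model of the assumed vctzlsbb semantics: the result is always in
   the range 0..16, so all upper bits of the 64-bit result are zero.  */
uint64_t
vctzlsbb_model (const uint8_t bytes[16])
{
  uint64_t count = 0;
  while (count < 16 && (bytes[15 - count] & 1) == 0)
    count++;
  return count;
}
```

Since the count never exceeds 16, a following zero extension from 32 to
64 bits (the rldicl the new testcase scans for) cannot change the value.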

2023-08-31  Ajit Kumar Agarwal  

gcc/ChangeLog:

* config/rs6000/vsx.md: Add new pattern.

gcc/testsuite/ChangeLog:

* g++.target/powerpc/altivec-19.C: New testcase.
---
 gcc/config/rs6000/vsx.md  | 17 ++---
 gcc/testsuite/g++.target/powerpc/altivec-19.C | 11 +++
 2 files changed, 25 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/altivec-19.C

diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 19abfeb565a..09d21a6d00a 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -5846,11 +5846,22 @@
   [(set_attr "type" "vecsimple")])
 
 ;; Vector Count Trailing Zero Least-Significant Bits Byte
-(define_insn "vctzlsbb_"
-  [(set (match_operand:SI 0 "register_operand" "=r")
+(define_insn "vctzlsbbzext_"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (zero_extend:DI
(unspec:SI
 [(match_operand:VSX_EXTRACT_I 1 "altivec_register_operand" "v")]
-UNSPEC_VCTZLSBB))]
+UNSPEC_VCTZLSBB)))]
+  "TARGET_P9_VECTOR"
+  "vctzlsbb %0,%1"
+  [(set_attr "type" "vecsimple")])
+
+;; Vector Count Trailing Zero Least-Significant Bits Byte
+(define_insn "vctzlsbb_"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+(unspec:SI
+ [(match_operand:VSX_EXTRACT_I 1 "altivec_register_operand" "v")]
+ UNSPEC_VCTZLSBB))]
   "TARGET_P9_VECTOR"
   "vctzlsbb %0,%1"
   [(set_attr "type" "vecsimple")])
diff --git a/gcc/testsuite/g++.target/powerpc/altivec-19.C b/gcc/testsuite/g++.target/powerpc/altivec-19.C
new file mode 100644
index 000..2d630b2fc1f
--- /dev/null
+++ b/gcc/testsuite/g++.target/powerpc/altivec-19.C
@@ -0,0 +1,11 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mcpu=power9 -O2 " } */ 
+
+#include <altivec.h>
+
+unsigned int foo (vector unsigned char a, vector unsigned char b) {
+  return vec_first_match_or_eos_index (a, b);
+}
+/* { dg-final { scan-assembler-not "rldicl" } } */
-- 
2.39.3




Re: [PATCH] Darwin: homogenize spelling of macOS

2023-08-31 Thread Iain Sandoe
Hi FX,

+Sandra

> On 31 Aug 2023, at 12:13, FX Coudert  wrote:
> 
> This patch homogenizes to some extent the use of “Mac OS X” or “OS X” or “Mac 
> OS” in the gcc/ folder to “macOS”, which is the modern way of writing it. It 
> is not a global replacement though, and each use was audited.
> 
> - When referring to specific versions that used the “OS X” or “Mac OS” as 
> their name, it was kept.
> - All uses referring to powerpc*-apple-darwin* were kept as-is, because those 
> versions all predate the change to “macOS”.
> - I did not touch Ada or D
> - I did not touch testsuite comments
> 
> Tested by building on x86_64-apple-darwin, and generating the docs.
> OK to push?

I think this is useful for user (or configurer)-facing documentation and help 
strings.

Being picky, there is one change where the reference is to 10.9 and earlier 
which are all Mac OS X (but that’s in a code comment so no need to change it).

OK from the Darwin perspective (for the code changes),
please wait for any comments from Sandra on the documentation changes.

thanks
Iain

> 
> FX
> 
> 
> <0001-Darwin-homogenize-spelling-of-macOS.patch>



Re: [RFC] gimple ssa: SCCP - A new PHI optimization pass

2023-08-31 Thread Filip Kastl
> The most obvious places would be right after SSA construction and before RTL 
> expansion.
> Can you provide measurements for those positions?

The algorithm should only remove PHIs that break SSA form minimality. Since
GCC's SSA construction already produces minimal SSA form, the algorithm isn't
expected to remove any PHIs if run right after the construction. I even
measured it and indeed -- no PHIs got removed (except for 502.gcc_r, where the
algorithm managed to remove exactly 1 PHI, which is weird). 

I tried putting the pass before pass_expand. There aren't a lot of PHIs to
remove at that point, but there still are some.

500.perlbench_r
Started with 43111
Ended with 42942
Removed PHI % .39201131961680313700

502.gcc_r
Started with 141392
Ended with 140455
Removed PHI % .66269661649881181400

505.mcf_r
Started with 482
Ended with 478
Removed PHI % .82987551867219917100

523.xalancbmk_r
Started with 136040
Ended with 135629
Removed PHI % .30211702440458688700

531.deepsjeng_r
Started with 2150
Ended with 2148
Removed PHI % .09302325581395348900

541.leela_r
Started with 4664
Ended with 4650
Removed PHI % .30017152658662092700

557.xz_r
Started with 43
Ended with 43
Removed PHI % 0
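
For reference, the "Removed PHI %" figures above follow from the two
counts as (started - ended) / started * 100. A small C helper (the name
is made up for this note) that reproduces them:

```c
/* Percentage of PHIs removed, relative to the starting count.  */
double
removed_phi_percent (long started, long ended)
{
  return 100.0 * (double) (started - ended) / (double) started;
}
```

For 500.perlbench_r this is 169 removed out of 43111, about 0.392%; for
505.mcf_r it is 4 out of 482, about 0.830%.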

> Can the pass somehow be used as part of propagations like during value 
> numbering?

I don't think that the pass could be used as a part of different optimizations
since it works on the whole CFG (except for copy propagation as I noted in the
RFC). I'm adding Honza into Cc. He'll have more insight into this.

> Could the new file be called gimple-ssa-sccp.cc or something similar?

Certainly, though I'm not sure: wouldn't tree-ssa-sccp.cc be more
appropriate?

I'm thinking about naming the pass 'scc-copy' and the file
'tree-ssa-scc-copy.cc'.

> Removing some PHIs is nice, but it would be also interesting to know what
> are the effects on generated code size and/or performance.
> And also if it has any effects on debug information coverage.

Regarding performance: I ran some benchmarks on a Zen3 machine with -O3 with
and without the new pass. *I got ~2% speedup for 505.mcf_r and 541.leela_r.
Here are the full results. What do you think? Should I run more benchmarks? Or
benchmark multiple times? Or run the benchmarks on different machines?*

500.perlbench_r
Without SCCP: 244.151807s
With SCCP: 242.448438s
-0.7025695913124297%

502.gcc_r
Without SCCP: 211.029606s
With SCCP: 211.614523s
+0.27640683243653763%

505.mcf_r
Without SCCP: 298.782621s
With SCCP: 291.671468s
-2.438069465197046%

523.xalancbmk_r
Without SCCP: 189.940639s
With SCCP: 189.876261s
-0.03390523894928332%

531.deepsjeng_r
Without SCCP: 250.63648s
With SCCP: 250.988624s
+0.1403027732444051%

541.leela_r
Without SCCP: 346.066278s
With SCCP: 339.692987s
-1.8761915152519792%
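
A note on how the percentages above appear to be computed: they are
reproduced by taking the difference relative to the "With SCCP" time,
not the baseline. The helper below is only a sketch for checking the
arithmetic (the name is made up):

```c
/* Runtime change in percent, relative to the time measured with the
   pass enabled.  Negative means the run with the pass was faster.  */
double
runtime_change_percent (double without_s, double with_s)
{
  return 100.0 * (with_s - without_s) / with_s;
}
```

Measured against the baseline instead, the 505.mcf_r improvement would
be about 2.38% rather than 2.44%; the conclusion (~2%) is the same
either way.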

Regarding size: The pass doesn't seem to significantly reduce or increase the
size of the result binary. The differences were at most ~0.1%.

Regarding debug info coverage: I didn't notice any additional guality testcases
failing after I applied the patch. *Is there any other way how I should check
debug info coverage?*


Filip K


Re: [PATCH] testsuite/vect: Make match patterns more accurate.

2023-08-31 Thread Richard Biener via Gcc-patches
On Thu, 31 Aug 2023, Robin Dapp wrote:

> Hi,
> 
> on some targets we fail to vectorize with the first type the vectorizer
> tries but succeed with the second.  This patch changes several regex
> patterns to reflect that behavior.
> 
> Before we would look for a single occurrence of e.g.
> "vect_recog_dot_prod_pattern" but would possibly find two (one for each
> attempted mode).  The new pattern tries to match sequences where we
> first have a "vect_recog_dot_prod_pattern" and a "succeeded" afterwards
> while making sure there is no "failed" or "Re-trying" in between.
> 
> I realized we already only do scan-tree-dump instead of
> scan-tree-dump-times in some related testcases, probably for the same
> reason but I didn't touch them for now.
> 
> Testsuite unchanged on x86, aarch64 and Power10.

LGTM.

Thanks for discovering the required TCL regex magic.

Richard.
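
For readers unfamiliar with the "(?:(?!failed)(?!Re-trying).)*" idiom,
here is a hand-rolled C approximation of what the new patterns count. It
does not use Tcl's regex engine; count_clean_matches is a made-up helper
that mimics the greedy, non-overlapping matching with strstr.

```c
#include <string.h>

/* Return whichever of A and B comes first; NULL means "not found".  */
static const char *
earliest (const char *a, const char *b)
{
  if (a == NULL)
    return b;
  if (b == NULL)
    return a;
  return a < b ? a : b;
}

/* Count occurrences of START that are followed by "succeeded" with
   neither "failed" nor "Re-trying" in between.  */
int
count_clean_matches (const char *dump, const char *start)
{
  int count = 0;
  const char *p = dump;
  while ((p = strstr (p, start)) != NULL)
    {
      p += strlen (start);
      /* The match may not run past the first forbidden token.  */
      const char *limit = earliest (strstr (p, "failed"),
                                    strstr (p, "Re-trying"));
      /* Greedy: take the last "succeeded" before that limit.  */
      const char *last = NULL;
      for (const char *s = strstr (p, "succeeded");
           s != NULL && (limit == NULL || s < limit);
           s = strstr (s + 1, "succeeded"))
        last = s;
      if (last != NULL)
        {
          count++;
          p = last + strlen ("succeeded");
        }
    }
  return count;
}
```

A dump where the first mode is detected but fails and the second mode
succeeds yields two plain "detected" hits but only one clean match,
which is the behaviour the adjusted testcases rely on.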

> Regards
>  Robin
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/vect-reduc-dot-s16a.c: Adjust regex pattern.
>   * gcc.dg/vect/vect-reduc-dot-s8a.c: Ditto.
>   * gcc.dg/vect/vect-reduc-dot-s8b.c: Ditto.
>   * gcc.dg/vect/vect-reduc-dot-u16a.c: Ditto.
>   * gcc.dg/vect/vect-reduc-dot-u16b.c: Ditto.
>   * gcc.dg/vect/vect-reduc-dot-u8a.c: Ditto.
>   * gcc.dg/vect/vect-reduc-dot-u8b.c: Ditto.
>   * gcc.dg/vect/vect-reduc-pattern-1a.c: Ditto.
>   * gcc.dg/vect/vect-reduc-pattern-1b-big-array.c: Ditto.
>   * gcc.dg/vect/vect-reduc-pattern-1c-big-array.c: Ditto.
>   * gcc.dg/vect/vect-reduc-pattern-2a.c: Ditto.
>   * gcc.dg/vect/vect-reduc-pattern-2b-big-array.c: Ditto.
>   * gcc.dg/vect/wrapv-vect-reduc-dot-s8b.c: Ditto.
> ---
>  gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s16a.c | 2 +-
>  gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8a.c  | 4 ++--
>  gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8b.c  | 4 ++--
>  gcc/testsuite/gcc.dg/vect/vect-reduc-dot-u16a.c | 5 +++--
>  gcc/testsuite/gcc.dg/vect/vect-reduc-dot-u16b.c | 2 +-
>  gcc/testsuite/gcc.dg/vect/vect-reduc-dot-u8a.c  | 2 +-
>  gcc/testsuite/gcc.dg/vect/vect-reduc-dot-u8b.c  | 2 +-
>  gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-1a.c   | 2 +-
>  gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-1b-big-array.c | 2 +-
>  gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-1c-big-array.c | 2 +-
>  gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-2a.c   | 2 +-
>  gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-2b-big-array.c | 2 +-
>  gcc/testsuite/gcc.dg/vect/wrapv-vect-reduc-dot-s8b.c| 4 ++--
>  13 files changed, 18 insertions(+), 17 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s16a.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s16a.c
> index ffbc9706901..d826828e3d6 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s16a.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s16a.c
> @@ -51,7 +51,7 @@ main (void)
>return 0;
>  }
>  
> -/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: detected" 1 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: detected(?:(?!failed)(?!Re-trying).)*succeeded" 1 "vect" } } */
>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_sdot_hi } } } */
>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_widen_mult_hi_to_si } } } */
>  
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8a.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8a.c
> index 05e343ad782..4e1e0b234f4 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8a.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8a.c
> @@ -55,8 +55,8 @@ int main (void)
>return 0;
>  }
>  
> -/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: detected" 1 "vect" } } */
> -/* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected" 1 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: detected(?:(?!failed)(?!Re-trying).)*succeeded" 1 "vect" } } */
> +/* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected(?:(?!failed)(?!Re-trying).)*succeeded" 1 "vect" } } */
>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_sdot_qi } } } */
>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_widen_mult_qi_to_hi && vect_widen_sum_hi_to_si } } } } */
>  
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8b.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8b.c
> index 82c648cc73c..cb88ad5b639 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8b.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8b.c
> @@ -53,8 +53,8 @@ int main (void)
>return 0;
>  }
>  
> -/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: detected" 1 "vect" { xfail *-*-* } } } */
> -/* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: 
> det

[PATCH v1] LoongArch: Optimize fixed-point and floating-point conversion operations.

2023-08-31 Thread Lulu Cheng
Before this optimization, converting a fixed-point value read from memory to
floating point first loaded the value into a fixed-point register and then
moved it to a floating-point register for the conversion. With this
optimization, the fixed-point value is loaded directly into the
floating-point register for type conversion.

eg:
extern int a;
float
test(void)
{
  return (float)a;
}

Assembly code before optimization:
pcalau12i   $r12,%got_pc_hi20(a)
ld.d        $r12,$r12,%got_pc_lo12(a)
ldptr.w     $r12,$r12,0
movgr2fr.w  $f0,$r12
ffint.s.w   $f0,$f0

Optimized assembly code:
pcalau12i   $r12,%got_pc_hi20(a)
ld.d        $r12,$r12,%got_pc_lo12(a)
fld.s       $f0,$r12,0
ffint.s.w   $f0,$f0

gcc/ChangeLog:

* config/loongarch/loongarch.md: Allows fixed-point values to be loaded
from memory into floating-point registers.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/float-load.c: New test.
---
 gcc/config/loongarch/loongarch.md   |  4 ++--
 gcc/testsuite/gcc.target/loongarch/float-load.c | 11 +++
 2 files changed, 13 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/float-load.c

diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index a6fb2b935fa..eb580697915 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -1828,8 +1828,8 @@ (define_expand "movsi"
 })
 
 (define_insn_and_split "*movsi_internal"
-  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,r,r,w,*f,*f,*r,*m,*r,*z")
-   (match_operand:SI 1 "move_operand" "r,Yd,w,rJ,*r*J,*m,*f,*f,*z,*r"))]
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,r,r,w,*f,f,*r,*m,*r,*z")
+   (match_operand:SI 1 "move_operand" "r,Yd,w,rJ,*r*J,m,*f,*f,*z,*r"))]
   "(register_operand (operands[0], SImode)
 || reg_or_0_operand (operands[1], SImode))"
   { return loongarch_output_move (operands[0], operands[1]); }
diff --git a/gcc/testsuite/gcc.target/loongarch/float-load.c b/gcc/testsuite/gcc.target/loongarch/float-load.c
new file mode 100644
index 000..c757a795e21
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/float-load.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler "fld\\.s" } } */
+
+extern int a;
+float
+test (void)
+{
+  return (float)a;
+}
+
-- 
2.31.1



[PATCH] Darwin: homogenize spelling of macOS

2023-08-31 Thread FX Coudert via Gcc-patches
This patch homogenizes to some extent the use of “Mac OS X” or “OS X” or “Mac 
OS” in the gcc/ folder to “macOS”, which is the modern way of writing it. It is 
not a global replacement though, and each use was audited.

- When referring to specific versions that used the “OS X” or “Mac OS” as their 
name, it was kept.
- All uses referring to powerpc*-apple-darwin* were kept as-is, because those 
versions all predate the change to “macOS”.
- I did not touch Ada or D
- I did not touch testsuite comments

Tested by building on x86_64-apple-darwin, and generating the docs.
OK to push?

FX




0001-Darwin-homogenize-spelling-of-macOS.patch
Description: Binary data


Re: [PATCH] RISC-V: Change vsetvl tail and mask policy to default policy

2023-08-31 Thread Lehua Ding

Committed, thanks Kito.

On 2023/8/31 17:13, Kito Cheng via Gcc-patches wrote:

LGTM

On Thu, Aug 31, 2023 at 5:07 PM Lehua Ding  wrote:


This patch changes the vsetvl policy from a fixed policy to the default policy
(returned by get_prefer_mask_policy and get_prefer_tail_policy). The "any"
policy is now returned, which allows a later change to agnostic or
undisturbed. In the future, users may be able to control the default
policy, such as keeping agnostic, through compiler options.

gcc/ChangeLog:

 * config/riscv/riscv-protos.h (IS_AGNOSTIC): Move to here.
 * config/riscv/riscv-v.cc (gen_no_side_effects_vsetvl_rtx):
 Change to default policy.
 * config/riscv/riscv-vector-builtins-bases.cc: Change to default 
policy.
 * config/riscv/riscv-vsetvl.h (IS_AGNOSTIC): Delete.
 * config/riscv/riscv.cc (riscv_print_operand): Use IS_AGNOSTIC to test.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/rvv/base/binop_vx_constraint-171.c: Adjust.
 * gcc.target/riscv/rvv/base/binop_vx_constraint-173.c: Adjust.
 * gcc.target/riscv/rvv/vsetvl/vsetvl-24.c: New test.

---
  gcc/config/riscv/riscv-protos.h   |  3 +++
  gcc/config/riscv/riscv-v.cc   |  4 +++-
  gcc/config/riscv/riscv-vector-builtins-bases.cc   |  8 
  gcc/config/riscv/riscv-vsetvl.h   |  2 --
  gcc/config/riscv/riscv.cc |  3 +--
  .../riscv/rvv/base/binop_vx_constraint-171.c  |  4 ++--
  .../riscv/rvv/base/binop_vx_constraint-173.c  |  4 ++--
  gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-24.c | 11 +++
  8 files changed, 26 insertions(+), 13 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-24.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 92e30a10f3c..e145ee6c69b 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -406,6 +406,9 @@ enum mask_policy
MASK_ANY = 2,
  };

+/* Return true if VALUE is agnostic or any policy.  */
+#define IS_AGNOSTIC(VALUE) (bool) (VALUE & 0x1 || (VALUE >> 1 & 0x1))
+
  enum class reduction_type
  {
UNORDERED,
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 427700192a3..6228ff3d92e 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1672,9 +1672,11 @@ static rtx
  gen_no_side_effects_vsetvl_rtx (machine_mode vmode, rtx vl, rtx avl)
  {
unsigned int sew = get_sew (vmode);
+  rtx tail_policy = gen_int_mode (get_prefer_tail_policy (), Pmode);
+  rtx mask_policy = gen_int_mode (get_prefer_mask_policy (), Pmode);
return gen_vsetvl_no_side_effects (Pmode, vl, avl, gen_int_mode (sew, Pmode),
  gen_int_mode (get_vlmul (vmode), Pmode),
-const0_rtx, const0_rtx);
+tail_policy, mask_policy);
  }

  /* GET VL * 2 rtx.  */
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 54582ee130c..8e679f72392 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -139,11 +139,11 @@ public:
  /* LMUL.  */
  e.add_input_operand (Pmode, gen_int_mode (get_vlmul (mode), Pmode));

-/* TA.  */
-e.add_input_operand (Pmode, gen_int_mode (1, Pmode));
+/* TAIL_ANY.  */
+e.add_input_operand (Pmode, gen_int_mode (get_prefer_tail_policy (), Pmode));

-/* MU.  */
-e.add_input_operand (Pmode, gen_int_mode (0, Pmode));
+/* MASK_ANY.  */
+e.add_input_operand (Pmode, gen_int_mode (get_prefer_mask_policy (), Pmode));
  return e.generate_insn (code_for_vsetvl_no_side_effects (Pmode));
}
  };
diff --git a/gcc/config/riscv/riscv-vsetvl.h b/gcc/config/riscv/riscv-vsetvl.h
index 2a315e45f31..53549abfac5 100644
--- a/gcc/config/riscv/riscv-vsetvl.h
+++ b/gcc/config/riscv/riscv-vsetvl.h
@@ -21,8 +21,6 @@ along with GCC; see the file COPYING3.  If not see
  #ifndef GCC_RISCV_VSETVL_H
  #define GCC_RISCV_VSETVL_H

-#define IS_AGNOSTIC(VALUE) (bool) (VALUE & 0x1 || (VALUE >> 1 & 0x1))
-
  namespace riscv_vector {

  /* Classification of vsetvl instruction.  */
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index d84fa2311fa..8bca8075713 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -5246,8 +5246,7 @@ riscv_print_operand (FILE *file, rtx op, int letter)
 else if (code == CONST_INT)
   {
 /* Tail && Mask policy.  */
-   bool agnostic_p = UINTVAL (op) & 0x1;
-   asm_fprintf (file, "%s", agnostic_p ? "a" : "u");
+   asm_fprintf (file, "%s", IS_AGNOSTIC (UINTVAL (op)) ? "a" : "u");
   }
 else
   output_operand_lossage ("invalid vector constant");
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/binop_vx_constraint-171.c 
b/gcc/testsui

[PATCH] Fix gcc.dg/tree-ssa/forwprop-42.c

2023-08-31 Thread Richard Biener via Gcc-patches
The testcase requires hardware support for V2DImode vectors because
otherwise we do not rewrite inserts via BIT_FIELD_REF to
BIT_INSERT_EXPR.  There's no effective target for this so the
following makes the testcase x86 specific, requiring and enabling SSE2.

Pushed.

* gcc.dg/tree-ssa/forwprop-42.c: Move ...
* gcc.target/i386/pr111228.c: ... here.  Enable SSE2.
---
 .../tree-ssa/forwprop-42.c => gcc.target/i386/pr111228.c}  | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
 rename gcc/testsuite/{gcc.dg/tree-ssa/forwprop-42.c => gcc.target/i386/pr111228.c} (76%)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-42.c b/gcc/testsuite/gcc.target/i386/pr111228.c
similarity index 76%
rename from gcc/testsuite/gcc.dg/tree-ssa/forwprop-42.c
rename to gcc/testsuite/gcc.target/i386/pr111228.c
index 257a05d3ec8..f0c3f9b77bf 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-42.c
+++ b/gcc/testsuite/gcc.target/i386/pr111228.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O -fdump-tree-cddce1" } */
+/* { dg-additional-options "-msse2" { target sse2 } } */
 
 typedef __UINT64_TYPE__ v2di __attribute__((vector_size(16)));
 
@@ -14,4 +15,4 @@ void test (v2di *v)
   g = res;
 }
 
-/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR <\[^>\]*, { 0, 3 }>" 1 "cddce1" } } */
+/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR <\[^>\]*, { 0, 3 }>" 1 "cddce1" { target sse2 } } } */
-- 
2.35.3


Re: [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class

2023-08-31 Thread Uros Bizjak via Gcc-patches
On Thu, Aug 31, 2023 at 10:20 AM Hongyu Wang  wrote:
>
> From: Kong Lingling 
>
> Current reload infrastructure does not support selective base_reg_class
> for backend insn. Add insn argument to base_reg_class for
> lra/reload usage.

I don't think this is the correct approach. Ideally, a memory
constraint should somehow encode its BASE/INDEX register class.
Instead of passing "insn", simply a different constraint could be used
in the constraint string of the relevant insn.

Uros.
>
> gcc/ChangeLog:
>
> * addresses.h (base_reg_class):  Add insn argument.
> Pass to MODE_CODE_BASE_REG_CLASS.
> (regno_ok_for_base_p_1): Add insn argument.
> Pass to REGNO_MODE_CODE_OK_FOR_BASE_P.
> (regno_ok_for_base_p): Add insn argument and pass to ok_for_base_p_1.
> * config/avr/avr.h (MODE_CODE_BASE_REG_CLASS): Add insn argument.
> (REGNO_MODE_CODE_OK_FOR_BASE_P): Ditto.
> * config/gcn/gcn.h (MODE_CODE_BASE_REG_CLASS): Ditto.
> (REGNO_MODE_CODE_OK_FOR_BASE_P): Ditto.
> * config/rl78/rl78.h (REGNO_MODE_CODE_OK_FOR_BASE_P): Ditto.
> (MODE_CODE_BASE_REG_CLASS): Ditto.
> * doc/tm.texi: Add insn argument for MODE_CODE_BASE_REG_CLASS
> and REGNO_MODE_CODE_OK_FOR_BASE_P.
> * doc/tm.texi.in: Ditto.
> * lra-constraints.cc (process_address_1): Pass insn to
> base_reg_class.
> (curr_insn_transform): Ditto.
> * reload.cc (find_reloads): Ditto.
> (find_reloads_address): Ditto.
> (find_reloads_address_1): Ditto.
> (find_reloads_subreg_address): Ditto.
> * reload1.cc (maybe_fix_stack_asms): Ditto.
> ---
>  gcc/addresses.h| 15 +--
>  gcc/config/avr/avr.h   |  5 +++--
>  gcc/config/gcn/gcn.h   |  4 ++--
>  gcc/config/rl78/rl78.h |  6 --
>  gcc/doc/tm.texi|  8 ++--
>  gcc/doc/tm.texi.in |  8 ++--
>  gcc/lra-constraints.cc | 15 +--
>  gcc/reload.cc  | 30 ++
>  gcc/reload1.cc |  2 +-
>  9 files changed, 58 insertions(+), 35 deletions(-)
>
> diff --git a/gcc/addresses.h b/gcc/addresses.h
> index 3519c241c6d..08b100cfe6d 100644
> --- a/gcc/addresses.h
> +++ b/gcc/addresses.h
> @@ -28,11 +28,12 @@ inline enum reg_class
>  base_reg_class (machine_mode mode ATTRIBUTE_UNUSED,
> addr_space_t as ATTRIBUTE_UNUSED,
> enum rtx_code outer_code ATTRIBUTE_UNUSED,
> -   enum rtx_code index_code ATTRIBUTE_UNUSED)
> +   enum rtx_code index_code ATTRIBUTE_UNUSED,
> +   rtx_insn *insn ATTRIBUTE_UNUSED = NULL)
>  {
>  #ifdef MODE_CODE_BASE_REG_CLASS
>return MODE_CODE_BASE_REG_CLASS (MACRO_MODE (mode), as, outer_code,
> -  index_code);
> +  index_code, insn);
>  #else
>  #ifdef MODE_BASE_REG_REG_CLASS
>if (index_code == REG)
> @@ -56,11 +57,12 @@ ok_for_base_p_1 (unsigned regno ATTRIBUTE_UNUSED,
>  machine_mode mode ATTRIBUTE_UNUSED,
>  addr_space_t as ATTRIBUTE_UNUSED,
>  enum rtx_code outer_code ATTRIBUTE_UNUSED,
> -enum rtx_code index_code ATTRIBUTE_UNUSED)
> +enum rtx_code index_code ATTRIBUTE_UNUSED,
> +rtx_insn* insn ATTRIBUTE_UNUSED = NULL)
>  {
>  #ifdef REGNO_MODE_CODE_OK_FOR_BASE_P
>return REGNO_MODE_CODE_OK_FOR_BASE_P (regno, MACRO_MODE (mode), as,
> -   outer_code, index_code);
> +   outer_code, index_code, insn);
>  #else
>  #ifdef REGNO_MODE_OK_FOR_REG_BASE_P
>if (index_code == REG)
> @@ -79,12 +81,13 @@ ok_for_base_p_1 (unsigned regno ATTRIBUTE_UNUSED,
>
>  inline bool
>  regno_ok_for_base_p (unsigned regno, machine_mode mode, addr_space_t as,
> -enum rtx_code outer_code, enum rtx_code index_code)
> +enum rtx_code outer_code, enum rtx_code index_code,
> +rtx_insn* insn = NULL)
>  {
>if (regno >= FIRST_PSEUDO_REGISTER && reg_renumber[regno] >= 0)
>  regno = reg_renumber[regno];
>
> -  return ok_for_base_p_1 (regno, mode, as, outer_code, index_code);
> +  return ok_for_base_p_1 (regno, mode, as, outer_code, index_code, insn);
>  }
>
>  #endif /* GCC_ADDRESSES_H */
> diff --git a/gcc/config/avr/avr.h b/gcc/config/avr/avr.h
> index 8e7e00db13b..1d090fe0838 100644
> --- a/gcc/config/avr/avr.h
> +++ b/gcc/config/avr/avr.h
> @@ -280,12 +280,13 @@ enum reg_class {
>
>  #define REGNO_REG_CLASS(R) avr_regno_reg_class(R)
>
> -#define MODE_CODE_BASE_REG_CLASS(mode, as, outer_code, index_code)   \
> +#define MODE_CODE_BASE_REG_CLASS(mode, as, outer_code, index_code, insn)   \
>avr_mode_code_base_reg_class (mode, as, outer_code, index_code)
>
>  #define INDEX_REG_CLASS NO_REGS
>
> -#define REGNO_MODE_CODE_OK_FOR_BASE_P(num, mode, as, outer_code, index_code) \
> +#define REGNO_MODE_CODE_OK_FOR_BASE_P(num, mod

Re: [PATCH v6 0/4] Add Loongson SX/ASX instruction support to LoongArch target.

2023-08-31 Thread Chenghui Pan
Thanks for the testing work! I will continue to look for and resolve
subtle issues too (such as by using the compiler to build some large
projects). I'm also curious about the partly saved register problem and
will spend some time learning about and investigating it in the future.

On Thu, 2023-08-31 at 17:41 +0800, Xi Ruoyao wrote:
> On Thu, 2023-08-31 at 17:08 +0800, Chenghui Pan wrote:
> > This is an update of:
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628303.html
> > 
> > Changes since last version of patch set:
> > - "dg-skip-if"-related changes of the g++.dg/torture/vshuf*
> >   testcases are reverted.
> >   (Replaced by __builtin_shuffle fix)
> > - Add fix of __builtin_shuffle() for Loongson SX/ASX (Implemented by
> >   adding vand/xvand insn in front of shuffle operation). There's no
> >   significant performance impact in current state.
> 
> I think it's the correct fix, thanks!
> 
> I'm still unsure about the "partly saved register" issue (I'll need to
> resolve similar issues for "ILP32 ABI on loongarch64") but it seems GCC
> just doesn't attempt to preserve any vectors in registers across a
> function call.
> 
> After the patches are committed I (and Xuerui, maybe) will perform a full
> system rebuild with LASX enabled to see if there are subtle issues.  IMO
> we still have plenty of time to fix them (if there are any) before the
> GCC 14 release.
> 
> > - Rebased on the top of Yang Yujie's latest target configuration
> >   interface patch set
> >   (https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628772.html).
> > 
> > Brief history of patch set:
> > v1 -> v2:
> > - Reduce usage of "unspec" in RTL template.
> > - Append Support of ADDR_REG_REG in LSX and LASX.
> > - Constraint docs are appended in gcc/doc/md.texi and comment
> >   block.
> > - Codes related to vecarg are removed.
> > - Testsuite of LSX and LASX is added in v2. (Because of the size
> >   limitation of the mailing list, these patches are not shown)
> > - Adjust the loongarch_expand_vector_init() function to reduce
> >   instruction output amount.
> > - Some minor implementation changes of RTL templates.
> > 
> > v2 -> v3:
> > - Revert vabsd/xvabsd RTL templates to unspec impl.
> > - Resolve warning in gcc/config/loongarch/loongarch.cc when
> >   bootstrapping with
> >   BOOT_CFLAGS="-O2 -ftree-vectorize -fno-vect-cost-model -mlasx".
> > - Remove redundant definitions in lasxintrin.h.
> > - Refine commit info.
> > 
> > v3 -> v4:
> > - Code simplification.
> > - Testsuite patches are split from this patch set again and will be
> >   submitted independently in the future.
> > 
> > v4 -> v5:
> > - Regression test fix (pr54346.c)
> > - Combine vilvh/xvilvh insn's RTL template impl.
> > - Add dg-skip-if for loongarch*-*-* in vshuf test inside
> > g++.dg/torture
> >   (reverted in this version)
> > 
> > Lulu Cheng (4):
> >   LoongArch: Add Loongson SX base instruction support.
> >   LoongArch: Add Loongson SX directive builtin function support.
> >   LoongArch: Add Loongson ASX base instruction support.
> >   LoongArch: Add Loongson ASX directive builtin function support.
> > 
> >  gcc/config.gcc    |    2 +-
> >  gcc/config/loongarch/constraints.md   |  131 +-
> >  gcc/config/loongarch/genopts/loongarch.opt.in |    4 +
> >  gcc/config/loongarch/lasx.md  | 5104
> >  gcc/config/loongarch/lasxintrin.h | 5338 +
> >  gcc/config/loongarch/loongarch-builtins.cc    | 2686 -
> >  gcc/config/loongarch/loongarch-ftypes.def |  666 +-
> >  gcc/config/loongarch/loongarch-modes.def  |   39 +
> >  gcc/config/loongarch/loongarch-protos.h   |   35 +
> >  gcc/config/loongarch/loongarch.cc | 4751 ++-
> >  gcc/config/loongarch/loongarch.h  |  117 +-
> >  gcc/config/loongarch/loongarch.md |   56 +-
> >  gcc/config/loongarch/loongarch.opt    |    4 +
> >  gcc/config/loongarch/lsx.md   | 4467 ++
> >  gcc/config/loongarch/lsxintrin.h  | 5181
> >  gcc/config/loongarch/predicates.md    |  333 +-
> >  gcc/doc/md.texi   |   11 +
> >  17 files changed, 28645 insertions(+), 280 deletions(-)
> >  create mode 100644 gcc/config/loongarch/lasx.md
> >  create mode 100644 gcc/config/loongarch/lasxintrin.h
> >  create mode 100644 gcc/config/loongarch/lsx.md
> >  create mode 100644 gcc/config/loongarch/lsxintrin.h
> > 
> 



Re: [PATCH 09/13] [APX EGPR] Handle legacy insn that only support GPR16 (1/5)

2023-08-31 Thread Uros Bizjak via Gcc-patches
On Thu, Aug 31, 2023 at 10:20 AM Hongyu Wang  wrote:
>
> From: Kong Lingling 
>
> These legacy insn in opcode map0/1 only support GPR16,
> and do not have vex/evex counterpart, directly adjust constraints and
> add gpr32 attr to patterns.
>
> insn list:
> 1. xsave/xsave64, xrstor/xrstor64
> 2. xsaves/xsaves64, xrstors/xrstors64
> 3. xsavec/xsavec64
> 4. xsaveopt/xsaveopt64
> 5. fxsave64/fxrstor64

IMO, instructions should be handled with a reversed approach. Add the "h"
constraint (and a memory constraint that can handle EGPR) to
instructions that CAN use EGPR (together with a relevant "enabled"
attribute). We have had the same approach with the "x" to "v" transition
with SSE registers. If we "forgot" to add "v" to the instruction, it
still worked, but not to its full potential w.r.t. available registers.

Uros.
>
> gcc/ChangeLog:
>
> * config/i386/i386.md (): Set attr gpr32 0 and constraint
> Bt.
> (_rex64): Likewise.
> (_rex64): Likewise.
> (64): Likewise.
> (fxsave64): Likewise.
> (fxrstor64): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> * lib/target-supports.exp: Add apxf check.
> * gcc.target/i386/apx-legacy-insn-check-norex2.c: New test.
> * gcc.target/i386/apx-legacy-insn-check-norex2-asm.c: New assembler 
> test.
> ---
>  gcc/config/i386/i386.md   | 18 +++
>  .../i386/apx-legacy-insn-check-norex2-asm.c   |  5 
>  .../i386/apx-legacy-insn-check-norex2.c   | 30 +++
>  gcc/testsuite/lib/target-supports.exp | 10 +++
>  4 files changed, 57 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2-asm.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
>
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index b9eaea78f00..83ad01b43c1 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -25626,11 +25626,12 @@ (define_insn "fxsave"
>  (symbol_ref "ix86_attr_length_address_default (insn) + 3"))])
>
>  (define_insn "fxsave64"
> -  [(set (match_operand:BLK 0 "memory_operand" "=m")
> +  [(set (match_operand:BLK 0 "memory_operand" "=Bt")
> (unspec_volatile:BLK [(const_int 0)] UNSPECV_FXSAVE64))]
>"TARGET_64BIT && TARGET_FXSR"
>"fxsave64\t%0"
>[(set_attr "type" "other")
> +   (set_attr "gpr32" "0")
> (set_attr "memory" "store")
> (set (attr "length")
>  (symbol_ref "ix86_attr_length_address_default (insn) + 4"))])
> @@ -25646,11 +25647,12 @@ (define_insn "fxrstor"
>  (symbol_ref "ix86_attr_length_address_default (insn) + 3"))])
>
>  (define_insn "fxrstor64"
> -  [(unspec_volatile [(match_operand:BLK 0 "memory_operand" "m")]
> +  [(unspec_volatile [(match_operand:BLK 0 "memory_operand" "Bt")]
> UNSPECV_FXRSTOR64)]
>"TARGET_64BIT && TARGET_FXSR"
>"fxrstor64\t%0"
>[(set_attr "type" "other")
> +   (set_attr "gpr32" "0")
> (set_attr "memory" "load")
> (set (attr "length")
>  (symbol_ref "ix86_attr_length_address_default (insn) + 4"))])
> @@ -25704,7 +25706,7 @@ (define_insn ""
>  (symbol_ref "ix86_attr_length_address_default (insn) + 3"))])
>
>  (define_insn "_rex64"
> -  [(set (match_operand:BLK 0 "memory_operand" "=m")
> +  [(set (match_operand:BLK 0 "memory_operand" "=Bt")
> (unspec_volatile:BLK
>  [(match_operand:SI 1 "register_operand" "a")
>   (match_operand:SI 2 "register_operand" "d")]
> @@ -25713,11 +25715,12 @@ (define_insn "_rex64"
>"\t%0"
>[(set_attr "type" "other")
> (set_attr "memory" "store")
> +   (set_attr "gpr32" "0")
> (set (attr "length")
>  (symbol_ref "ix86_attr_length_address_default (insn) + 3"))])
>
>  (define_insn ""
> -  [(set (match_operand:BLK 0 "memory_operand" "=m")
> +  [(set (match_operand:BLK 0 "memory_operand" "=Bt")
> (unspec_volatile:BLK
>  [(match_operand:SI 1 "register_operand" "a")
>   (match_operand:SI 2 "register_operand" "d")]
> @@ -25726,6 +25729,7 @@ (define_insn ""
>"\t%0"
>[(set_attr "type" "other")
> (set_attr "memory" "store")
> +   (set_attr "gpr32" "0")
> (set (attr "length")
>  (symbol_ref "ix86_attr_length_address_default (insn) + 4"))])
>
> @@ -25743,7 +25747,7 @@ (define_insn ""
>
>  (define_insn "_rex64"
> [(unspec_volatile:BLK
> - [(match_operand:BLK 0 "memory_operand" "m")
> + [(match_operand:BLK 0 "memory_operand" "Bt")
>(match_operand:SI 1 "register_operand" "a")
>(match_operand:SI 2 "register_operand" "d")]
>   ANY_XRSTOR)]
> @@ -25751,12 +25755,13 @@ (define_insn "_rex64"
>"\t%0"
>[(set_attr "type" "other")
> (set_attr "memory" "load")
> +   (set_attr "gpr32" "0")
> (set (attr "length")
>  (symbol_ref "ix86_attr_length_address_default (insn) + 3"))])
>
>  (define_insn "64"
> [(unspec_volatile:BLK
> - [(match_operand:BLK 0 "memory_operan

[PATCH] testsuite/vect: Make match patterns more accurate.

2023-08-31 Thread Robin Dapp via Gcc-patches
Hi,

on some targets we fail to vectorize with the first type the vectorizer
tries but succeed with the second.  This patch changes several regex
patterns to reflect that behavior.

Before we would look for a single occurrence of e.g.
"vect_recog_dot_prod_pattern" but would possibly find two (one for each
attempted mode).  The new pattern tries to match sequences where we
first have a "vect_recog_dot_prod_pattern" and a "succeeded" afterwards
while making sure there is no "failed" or "Re-trying" in between.
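
The matching idea described above can be tried outside the testsuite with a
small sketch. It counts "detected ... succeeded" runs that contain neither
"failed" nor "Re-trying" in between, using the same lookahead construct as the
new patterns. The helper name and the sample dump text are made up for
illustration, and [\s\S] replaces the "." of the testsuite pattern because "."
in std::regex does not match newlines, while the pattern presumably has to
span several dump lines.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper, not part of the testsuite: count occurrences of
// "vect_recog_dot_prod_pattern: detected" that are followed by "succeeded"
// with no "failed" or "Re-trying" in between.
inline std::size_t
count_successful_detections (const std::string &dump)
{
  // (?!failed)(?!Re-trying) is checked before every consumed character,
  // so the match cannot run across a failed attempt or a retry marker.
  static const std::regex re (
    "vect_recog_dot_prod_pattern: detected"
    "(?:(?!failed)(?!Re-trying)[\\s\\S])*succeeded");
  auto first = std::sregex_iterator (dump.begin (), dump.end (), re);
  auto last = std::sregex_iterator ();
  return static_cast<std::size_t> (std::distance (first, last));
}
```

For a dump in which the first mode reports the pattern and then fails, and the
second mode reports it again and succeeds, this counts one match, while a plain
search for "vect_recog_dot_prod_pattern: detected" would count two.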

I realized we already only do scan-tree-dump instead of
scan-tree-dump-times in some related testcases, probably for the same
reason but I didn't touch them for now.

Testsuite unchanged on x86, aarch64 and Power10.

Regards
 Robin

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-reduc-dot-s16a.c: Adjust regex pattern.
* gcc.dg/vect/vect-reduc-dot-s8a.c: Ditto.
* gcc.dg/vect/vect-reduc-dot-s8b.c: Ditto.
* gcc.dg/vect/vect-reduc-dot-u16a.c: Ditto.
* gcc.dg/vect/vect-reduc-dot-u16b.c: Ditto.
* gcc.dg/vect/vect-reduc-dot-u8a.c: Ditto.
* gcc.dg/vect/vect-reduc-dot-u8b.c: Ditto.
* gcc.dg/vect/vect-reduc-pattern-1a.c: Ditto.
* gcc.dg/vect/vect-reduc-pattern-1b-big-array.c: Ditto.
* gcc.dg/vect/vect-reduc-pattern-1c-big-array.c: Ditto.
* gcc.dg/vect/vect-reduc-pattern-2a.c: Ditto.
* gcc.dg/vect/vect-reduc-pattern-2b-big-array.c: Ditto.
* gcc.dg/vect/wrapv-vect-reduc-dot-s8b.c: Ditto.
---
 gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s16a.c | 2 +-
 gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8a.c  | 4 ++--
 gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8b.c  | 4 ++--
 gcc/testsuite/gcc.dg/vect/vect-reduc-dot-u16a.c | 5 +++--
 gcc/testsuite/gcc.dg/vect/vect-reduc-dot-u16b.c | 2 +-
 gcc/testsuite/gcc.dg/vect/vect-reduc-dot-u8a.c  | 2 +-
 gcc/testsuite/gcc.dg/vect/vect-reduc-dot-u8b.c  | 2 +-
 gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-1a.c   | 2 +-
 gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-1b-big-array.c | 2 +-
 gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-1c-big-array.c | 2 +-
 gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-2a.c   | 2 +-
 gcc/testsuite/gcc.dg/vect/vect-reduc-pattern-2b-big-array.c | 2 +-
 gcc/testsuite/gcc.dg/vect/wrapv-vect-reduc-dot-s8b.c| 4 ++--
 13 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s16a.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s16a.c
index ffbc9706901..d826828e3d6 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s16a.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s16a.c
@@ -51,7 +51,7 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: detected" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: detected(?:(?!failed)(?!Re-trying).)*succeeded" 1 "vect" } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_sdot_hi } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_widen_mult_hi_to_si } } } */
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8a.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8a.c
index 05e343ad782..4e1e0b234f4 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8a.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8a.c
@@ -55,8 +55,8 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: detected" 1 "vect" } } */
-/* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: detected(?:(?!failed)(?!Re-trying).)*succeeded" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected(?:(?!failed)(?!Re-trying).)*succeeded" 1 "vect" } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_sdot_qi } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_widen_mult_qi_to_hi && vect_widen_sum_hi_to_si } } } } */
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8b.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8b.c
index 82c648cc73c..cb88ad5b639 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8b.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-dot-s8b.c
@@ -53,8 +53,8 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: detected" 1 "vect" { xfail *-*-* } } } */
-/* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "vect_recog_dot_prod_pattern: detected(?:(?!failed)(?!Re-trying).)*succeeded" 1 "vect" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected(?:(?!failed)(?!Re-trying).)*succeeded" 1 

Re: [PATCH 06/13] [APX EGPR] Map reg/mem constraints in inline asm to non-EGPR constraint.

2023-08-31 Thread Uros Bizjak via Gcc-patches
On Thu, Aug 31, 2023 at 11:18 AM Jakub Jelinek via Gcc-patches
 wrote:
>
> On Thu, Aug 31, 2023 at 04:20:17PM +0800, Hongyu Wang via Gcc-patches wrote:
> > From: Kong Lingling 
> >
> > In inline asm, we do not know if the insn can use EGPR, so disable EGPR
> > usage by default from mapping the common reg/mem constraint to non-EGPR
> > constraints. Use a flag mapx-inline-asm-use-gpr32 to enable EGPR usage
> > for inline asm.
> >
> > gcc/ChangeLog:
> >
> >   * config/i386/i386.cc (INCLUDE_STRING): Add include for
> >   ix86_md_asm_adjust.
> >   (ix86_md_asm_adjust): When APX EGPR enabled without specifying the
> >   target option, map reg/mem constraints to non-EGPR constraints.
> >   * config/i386/i386.opt: Add option mapx-inline-asm-use-gpr32.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/i386/apx-inline-gpr-norex2.c: New test.
> > ---
> >  gcc/config/i386/i386.cc   |  44 +++
> >  gcc/config/i386/i386.opt  |   5 +
> >  .../gcc.target/i386/apx-inline-gpr-norex2.c   | 107 ++
> >  3 files changed, 156 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/apx-inline-gpr-norex2.c
> >
> > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > index d26d9ab0d9d..9460ebbfda4 100644
> > --- a/gcc/config/i386/i386.cc
> > +++ b/gcc/config/i386/i386.cc
> > @@ -17,6 +17,7 @@ You should have received a copy of the GNU General Public License
> >  along with GCC; see the file COPYING3.  If not see
> >  .  */
> >
> > +#define INCLUDE_STRING
> >  #define IN_TARGET_CODE 1
> >
> >  #include "config.h"
> > @@ -23077,6 +23078,49 @@ ix86_md_asm_adjust (vec &outputs, vec & /*inputs*/,
> >bool saw_asm_flag = false;
> >
> >start_sequence ();
> > +  /* TODO: Here we just mapped the general r/m constraints to non-EGPR
> > +   constraints, will eventually map all the usable constraints in the 
> > future. */
>
> I think there should be some constraint which explicitly has all the 32
> GPRs, like there is one for just all 16 GPRs (h), so that regardless of
> -mapx-inline-asm-use-gpr32 one can be explicit what the inline asm wants.
>
> Also, what about the "g" constraint?  Shouldn't there be another for "g"
> without r16..r31?  What about the various other memory
> constraints ("<", "o", ...)?

I think we should leave all existing constraints as they are, so "r"
covers only GPR16, and "m" and "o" use only GPR16. We can then
introduce "h" for instructions that have the ability to handle EGPR.
This would be somewhat similar to the SSE -> AVX512F transition, where
we still have "x" for SSE16 and "v" was introduced as a separate
register class for EVEX SSE registers. This way, asm will stay
compatible when "r", "m", "o" and "g" are used. The new memory
constraint "Bt" should allow the new registers, and should be added to
the constraint string as a separate constraint, conditionally
enabled by the relevant "isa" (AKA "enabled") attribute.

Uros.

> > +  if (TARGET_APX_EGPR && !ix86_apx_inline_asm_use_gpr32)
> > +{
> > +  /* Map "r" constraint in inline asm to "h" that disallows r16-r31
> > +  and replace only r, exclude Br and Yr.  */
> > +  for (unsigned i = 0; i < constraints.length (); i++)
> > + {
> > +   std::string *s = new std::string (constraints[i]);
>
> Doesn't this leak memory (all the time)?
> I must say I don't really understand why you need to use std::string here,
> but certainly it shouldn't leak.
>
> > +   size_t pos = s->find ('r');
> > +   while (pos != std::string::npos)
> > + {
> > +   if (pos > 0
> > +   && (s->at (pos - 1) == 'Y' || s->at (pos - 1) == 'B'))
> > + pos = s->find ('r', pos + 1);
> > +   else
> > + {
> > +   s->replace (pos, 1, "h");
> > +   constraints[i] = (const char*) s->c_str ();
>
> Formatting (space before *).  The usual way for constraints is ggc_strdup on
> some string in a buffer.  Also, one could have several copies of r (or m,
> memory (doesn't that appear just in clobbers?  And that doesn't look like
> something that should be replaced), Bm), e.g. in various alternatives.  So, you
> need to change them all, not just the first hit.  "r,r,r,m" and the like.
> Normally, one would simply walk the constraint string, parsing the special
> letters (+, =, & etc.) and single letter constraints and 2 letter
> constraints using CONSTRAINT_LEN macro (tons of examples in GCC sources).
> Either do it in 2 passes, first one counts how long constraint string one
> will need after the adjustments (and whether to adjust something at all),
> then if needed XALLOCAVEC it and adjust in there, or say use an
> auto_vec for it.
>
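
The review points above (rewrite every hit, skip two-letter constraints, do
not leak) can be illustrated with a standalone sketch. It is not GCC code:
map_r_to_h is a hypothetical helper, and it assumes that 'B' and 'Y' always
introduce a two-letter constraint, where real code would consult
CONSTRAINT_LEN. It is only meant for operand constraint strings, not for
clobbers such as "memory". The string is returned by value, so nothing is
leaked; inside GCC the result would still have to be copied into long-lived
storage, for example with ggc_strdup as suggested above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: rewrite every stand-alone 'r' in an operand
// constraint string to 'h', leaving two-letter constraints such as
// "Br" and "Yr" untouched.
inline std::string
map_r_to_h (const std::string &constraint)
{
  std::string out;
  out.reserve (constraint.size ());
  for (std::size_t i = 0; i < constraint.size (); ++i)
    {
      char c = constraint[i];
      if ((c == 'B' || c == 'Y') && i + 1 < constraint.size ())
        {
          // Assumed two-letter constraint: copy both letters verbatim.
          out += c;
          out += constraint[++i];
        }
      else
        // Modifiers (+, =, &), commas and other letters pass through.
        out += (c == 'r' ? 'h' : c);
    }
  return out;
}
```

With this walk, a multi-alternative string such as "r,r,r,m" has all three
register alternatives rewritten instead of only the first one.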
> > +   break;
> > + }
> > + }
> > + }
> > +  /* Also map "m/memory/Bm" constraint that may use GPR32, replace them with
> > +  "Bt/Bt/BT".  */
> > +  for (unsign

Re: [PATCH] RISC-V: Refactor and clean emit_{vlmax,nonvlmax}_xxx functions

2023-08-31 Thread Lehua Ding

Hi Robin,

Thanks for these comments.

On 2023/8/31 17:16, Robin Dapp wrote:

Hi Lehua,

thanks, this definitely goes in the direction of what I had in mind and
simplifies a lot of the redundant emit_... so it's good to have it.

I was too slow for a detailed response :)  So just some high-level comments.

One thing I noticed is the overloading of "MASK_OP",  we use it as
"operation on masks" i.e. an insn as well as "mask policy".  IMHO we could
get rid of UNARY_MASK_OP and BINARY_MASK_OP and just decide whether to
add a mask policy depending on if all operands are masks (the same way we
did before).


I think I should change the macro name here from __MASK_OP to
__NORMAL_OP_MASK. MASK_OP means this is an insn that operates on mask
operands. I lifted it up here because I want to simplify the emit_insn method.



Related, and seeing that the MASK in UNARY_MASK_OP is somewhat redundant,
I feel we still mix concerns a bit.  For example it is not obvious, from
the name at least, why a WIDEN_TERNARY_OP does not have a merge operand
and the decision making seems very "enum centered" now :D


As said above, UNARY_MASK_OP means it is an operator on mask
operands that itself doesn't have a mask operand. If the operation depends on
a mask, then the name should be UNARY_MASK_OP_TAMA.


For the WIDEN_TERNARY_OP, this is because of a design problem of the
vwmacc pattern (as below); maybe it can be unified with wmacc and then
WIDEN_TERNARY_OP is not needed.


(define_insn "@pred_widen_mul_plus"
  [(set (match_operand:VWEXTI 0 "register_operand")
(if_then_else:VWEXTI
  (unspec:
[(match_operand: 1 "vector_mask_operand")
 (match_operand 5 "vector_length_operand")
 (match_operand 6 "const_int_operand")
 (match_operand 7 "const_int_operand")
 (match_operand 8 "const_int_operand")
 (reg:SI VL_REGNUM)
 (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
  (plus:VWEXTI
(mult:VWEXTI
  (any_extend:VWEXTI
(match_operand: 3 "register_operand"))
  (any_extend:VWEXTI
(match_operand: 4 "register_operand")))
(match_operand:VWEXTI 2 "register_operand"))
  (match_dup 2)))]
  "TARGET_VECTOR"
  "vwmacc.vv\t%0,%3,%4%p1"
  [(set_attr "type" "viwmuladd")
   (set_attr "mode" "")])



> In general we use the NULLARY, UNARY, BINARY, TERNARY prefixes just
> to determine the number of sources which doesn't seem really necessary
> because a user of e.g. NEG will already know that there only is one
> source - he already specified it and currently needs to, redundantly,
> say UNARY again.


From the caller's point of view, they do know how many sources there are, but
the emit_insn method needs to use this flag (NULLARY_OP_P, UNARY_OP_P) to
determine how many sources to add.


Oh, you mean we can get the number of operands from
insn_data[icode].n_operands? In this case, we don't really need to
distinguish between NULLARY, UNARY, etc.



> If we split off the destination and sources from mask, merge and the rest
> we could ditch them altogether.
>
> What about
>   emit_(non)vlmax_insn (icode, *operands (just dest and sources),
>                         mask, merge, tail/mask policy, frm)
>
> with mask defaulting to NULL and merge defaulting to VUNDEF?  So ideally,
> and in the easy case, the call would just degenerate to
>   emit_..._insn (icode, operands).
>
> I realize this will cause some complications on the "other side" but with
> the enum in place it should still be doable?


That is to say, when the user needs to emit a COND_NEG, would they need to
call it like this?

  emit_vlmax_insn (pred_neg, ops, mask, vundef(), ta, ma)

Is there too much focus on operands that don't need to be passed in,
like the merge, tail and mask policy operands?


Maybe I don't understand enough here, can you explain it in some more 
detail?


--
Best,
Lehua



Re: [PATCH V3 2/3] RISC-V: Part-2: Save/Restore vector registers which need to be preversed

2023-08-31 Thread Kito Cheng via Gcc-patches
Could you rebase the patch again? It seems to have some conflicts with zcmt,
which I committed in the past few days...

On Wed, Aug 30, 2023 at 9:54 AM Lehua Ding  wrote:
>
> Functions which follow the vector calling convention variant have
> callee-saved vector registers, but functions which follow the standard calling
> convention don't. We need to distinguish which convention the callee follows
> so that we can tell GCC exactly which vector registers the callee will
> clobber. So I encode the callee's calling convention information into the
> call's rtx pattern, like AArch64. The old operands 2 and 3 of the call
> pattern, which were copied from the MIPS target, are useless and are removed
> according to my analysis.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-sr.cc 
> (riscv_remove_unneeded_save_restore_calls): Pass riscv_cc.
> * config/riscv/riscv.cc (struct riscv_frame_info): Add new fields.
> (riscv_frame_info::reset): Reset new fields.
> (riscv_call_tls_get_addr): Pass riscv_cc.
> (riscv_function_arg): Return riscv_cc for call pattern.
> (riscv_insn_callee_abi): Implement TARGET_INSN_CALLEE_ABI.
> (riscv_save_reg_p): Add vector callee-saved check.
> (riscv_save_libcall_count): Add vector save area.
> (riscv_compute_frame_info): Ditto.
> (riscv_restore_reg): Update for type change.
> (riscv_for_each_saved_v_reg): New function to save vector registers.
> (riscv_first_stack_step): Handle function with vector callee-saved
> registers.
> (riscv_expand_prologue): Ditto.
> (riscv_expand_epilogue): Ditto.
> (riscv_output_mi_thunk): Pass riscv_cc.
> (TARGET_INSN_CALLEE_ABI): Implement TARGET_INSN_CALLEE_ABI.
> * config/riscv/riscv.md: Add CALLEE_CC operand for call pattern.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/abi-callee-saved-1-fixed-1.c: New test.
> * gcc.target/riscv/rvv/base/abi-callee-saved-1-fixed-2.c: New test.
> * gcc.target/riscv/rvv/base/abi-callee-saved-1.c: New test.
> * gcc.target/riscv/rvv/base/abi-callee-saved-2.c: New test.
> ---
>  gcc/config/riscv/riscv-sr.cc  |  12 +-
>  gcc/config/riscv/riscv.cc | 222 +++---
>  gcc/config/riscv/riscv.md |  43 +++-
>  .../rvv/base/abi-callee-saved-1-fixed-1.c |  85 +++
>  .../rvv/base/abi-callee-saved-1-fixed-2.c |  85 +++
>  .../riscv/rvv/base/abi-callee-saved-1.c   |  87 +++
>  .../riscv/rvv/base/abi-callee-saved-2.c   | 117 +
>  7 files changed, 606 insertions(+), 45 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-callee-saved-1-fixed-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-callee-saved-1-fixed-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-callee-saved-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/abi-callee-saved-2.c
>
> diff --git a/gcc/config/riscv/riscv-sr.cc b/gcc/config/riscv/riscv-sr.cc
> index 7248f04d68f..e6e17685df5 100644
> --- a/gcc/config/riscv/riscv-sr.cc
> +++ b/gcc/config/riscv/riscv-sr.cc
> @@ -447,12 +447,18 @@ riscv_remove_unneeded_save_restore_calls (void)
>&& !SIBCALL_REG_P (REGNO (target)))
>  return;
>
> +  /* Extract RISCV CC from the UNSPEC rtx.  */
> +  rtx unspec = XVECEXP (callpat, 0, 1);
> +  gcc_assert (GET_CODE (unspec) == UNSPEC
> + && XINT (unspec, 1) == UNSPEC_CALLEE_CC);
> +  riscv_cc cc = (riscv_cc) INTVAL (XVECEXP (unspec, 0, 0));
>rtx sibcall = NULL;
>if (set_target != NULL)
> -sibcall
> -  = gen_sibcall_value_internal (set_target, target, const0_rtx);
> +sibcall = gen_sibcall_value_internal (set_target, target, const0_rtx,
> + gen_int_mode (cc, SImode));
>else
> -sibcall = gen_sibcall_internal (target, const0_rtx);
> +sibcall
> +  = gen_sibcall_internal (target, const0_rtx, gen_int_mode (cc, SImode));
>
>rtx_insn *before_call = PREV_INSN (call);
>remove_insn (call);
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index aa6b46d7611..09c9e09e83a 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -108,6 +108,9 @@ struct GTY(())  riscv_frame_info {
>/* Likewise FPR X.  */
>unsigned int fmask;
>
> +  /* Likewise for vector registers.  */
> +  unsigned int vmask;
> +
>/* How much the GPR save/restore routines adjust sp (or 0 if unused).  */
>unsigned save_libcall_adjustment;
>
> @@ -115,6 +118,10 @@ struct GTY(())  riscv_frame_info {
>poly_int64 gp_sp_offset;
>poly_int64 fp_sp_offset;
>
> +  /* Top and bottom offsets of vector save areas from frame bottom.  */
> +  poly_int64 v_sp_offset_top;
> +  poly_int64 v_sp_offset_bottom;
> +
>/* Offset of virtual frame pointer from stack pointer/frame bottom */
>poly_int64 frame_pointer_offset;
>
> @@ -265,7 +272,7 @@ unsigned r

Re: [PATCH 08/13] [APX EGPR] Handle GPR16 only vector move insns

2023-08-31 Thread Jakub Jelinek via Gcc-patches
On Thu, Aug 31, 2023 at 04:20:19PM +0800, Hongyu Wang via Gcc-patches wrote:
> For vector move insns like vmovdqa/vmovdqu, their evex counterparts
> require explicit suffix 64/32/16/8. The usage of these instructions
> is prohibited under AVX10_1 or AVX512F, so for AVX2+APX_F we select
> vmovaps/vmovups for vector load/store insns that contain EGPR.

Why not make it dependent on AVX512VL?
I.e. if egpr_p && TARGET_AVX512VL, still use vmovdqu16 or vmovdqa16
and the like, and only if !evex_reg_p && egpr_p && !TARGET_AVX512VL
fall back to what you're doing?
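
A minimal sketch of the selection order suggested above, written as a
standalone function. The name pick_vector_move and its boolean parameters are
hypothetical stand-ins (this is not ix86_get_ssemov), the "16" element suffix
is fixed only for illustration, and every case the suggestion does not mention
is collapsed into the plain mnemonic:

```cpp
#include <string>

// Hypothetical helper, not GCC code.  The flags mirror the conditions named
// in the suggestion: evex_reg_p, egpr_p and TARGET_AVX512VL.
std::string pick_vector_move(bool aligned, bool evex_reg_p, bool egpr_p,
                             bool avx512vl)
{
  // With AVX512VL available, keep the suffixed integer moves even with EGPR.
  if (egpr_p && avx512vl)
    return aligned ? "vmovdqa16" : "vmovdqu16";

  // Only without AVX512VL (and without an evex register) fall back to the
  // vmovaps/vmovups choice made by the patch.
  if (!evex_reg_p && egpr_p && !avx512vl)
    return aligned ? "vmovaps" : "vmovups";

  // Cases outside the suggestion: left as the unsuffixed mnemonic here.
  return aligned ? "vmovdqa" : "vmovdqu";
}
```

The point of the ordering is that the vmovaps/vmovups fallback is reached only
when the suffixed forms are unavailable.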
> 
> gcc/ChangeLog:
> 
>   * config/i386/i386.cc (ix86_get_ssemov): Check if egpr is used,
>   adjust mnemonic for vmovdqu/vmovdqa.
>   * config/i386/sse.md 
> (*_vinsert_0):
>   Check if egpr is used, adjust mnemonic for vmovdqu/vmovdqa.
>   (avx_vec_concat): Likewise, and separate alternative 0 to
>   avx_noavx512f.

Jakub



Re: [PATCH v6 0/4] Add Loongson SX/ASX instruction support to LoongArch target.

2023-08-31 Thread Xi Ruoyao via Gcc-patches
On Thu, 2023-08-31 at 17:08 +0800, Chenghui Pan wrote:
> This is an update of:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628303.html
> 
> Changes since last version of patch set:
> - "dg-skip-if"-related changes of the g++.dg/torture/vshuf* testcases are
>   reverted. (Replaced by __builtin_shuffle fix)
> - Add fix of __builtin_shuffle() for Loongson SX/ASX (implemented by adding
>   vand/xvand insn in front of shuffle operation). There's no significant
>   performance impact in current state.

I think it's the correct fix, thanks!

I'm still unsure about the "partly saved register" issue (I'll need to
resolve similar issues for "ILP32 ABI on loongarch64") but it seems GCC
just doesn't attempt to preserve any vectors in registers across function
calls.

After the patches are committed I (and Xuerui, maybe) will perform a full
system rebuild with LASX enabled to see if there are subtle issues.  IMO
we still have plenty of time to fix them (if there are any) before the GCC
14 release.

> - Rebased on the top of Yang Yujie's latest target configuration interface
>   patch set
>   (https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628772.html).
> 
> Brief history of patch set:
> v1 -> v2:
> - Reduce usage of "unspec" in RTL template.
> - Append Support of ADDR_REG_REG in LSX and LASX.
> - Constraint docs are appended in gcc/doc/md.texi and comment block.
> - Codes related to vecarg are removed.
> - Testsuite of LSX and LASX is added in v2. (Because of the size limitation of
>   mail list, these patches are not shown)
> - Adjust the loongarch_expand_vector_init() function to reduce instruction
>   output amount.
> - Some minor implementation changes of RTL templates.
> 
> v2 -> v3:
> - Revert vabsd/xvabsd RTL templates to unspec impl.
> - Resolve warning in gcc/config/loongarch/loongarch.cc when bootstrapping 
>   with BOOT_CFLAGS="-O2 -ftree-vectorize -fno-vect-cost-model -mlasx".
> - Remove redundant definitions in lasxintrin.h.
> - Refine commit info.
> 
> v3 -> v4:
> - Code simplification.
> - Testsuite patches are split from this patch set again and will be
>   submitted independently in the future.
> 
> v4 -> v5:
> - Regression test fix (pr54346.c)
> - Combine vilvh/xvilvh insn's RTL template impl.
> - Add dg-skip-if for loongarch*-*-* in vshuf test inside g++.dg/torture
>   (reverted in this version)
> 
> Lulu Cheng (4):
>   LoongArch: Add Loongson SX base instruction support.
>   LoongArch: Add Loongson SX directive builtin function support.
>   LoongArch: Add Loongson ASX base instruction support.
>   LoongArch: Add Loongson ASX directive builtin function support.
> 
>  gcc/config.gcc    |    2 +-
>  gcc/config/loongarch/constraints.md   |  131 +-
>  gcc/config/loongarch/genopts/loongarch.opt.in |    4 +
>  gcc/config/loongarch/lasx.md  | 5104 
>  gcc/config/loongarch/lasxintrin.h | 5338 +
>  gcc/config/loongarch/loongarch-builtins.cc    | 2686 -
>  gcc/config/loongarch/loongarch-ftypes.def |  666 +-
>  gcc/config/loongarch/loongarch-modes.def  |   39 +
>  gcc/config/loongarch/loongarch-protos.h   |   35 +
>  gcc/config/loongarch/loongarch.cc | 4751 ++-
>  gcc/config/loongarch/loongarch.h  |  117 +-
>  gcc/config/loongarch/loongarch.md |   56 +-
>  gcc/config/loongarch/loongarch.opt    |    4 +
>  gcc/config/loongarch/lsx.md   | 4467 ++
>  gcc/config/loongarch/lsxintrin.h  | 5181 
>  gcc/config/loongarch/predicates.md    |  333 +-
>  gcc/doc/md.texi   |   11 +
>  17 files changed, 28645 insertions(+), 280 deletions(-)
>  create mode 100644 gcc/config/loongarch/lasx.md
>  create mode 100644 gcc/config/loongarch/lasxintrin.h
>  create mode 100644 gcc/config/loongarch/lsx.md
>  create mode 100644 gcc/config/loongarch/lsxintrin.h
> 

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH 11/13] [APX EGPR] Handle legacy insns that only support GPR16 (3/5)

2023-08-31 Thread Jakub Jelinek via Gcc-patches
On Thu, Aug 31, 2023 at 11:26:26AM +0200, Richard Biener wrote:
> On Thu, Aug 31, 2023 at 10:25 AM Hongyu Wang via Gcc-patches
>  wrote:
> >
> > From: Kong Lingling 
> >
> > Disable EGPR usage for below legacy insns in opcode map2/3 that have vex
> > but no evex counterpart.
> >
> > insn list:
> > 1. phminposuw/vphminposuw
> > 2. ptest/vptest
> > 3. roundps/vroundps, roundpd/vroundpd,
> >roundss/vroundss, roundsd/vroundsd
> > 4. pcmpestri/vpcmpestri, pcmpestrm/vpcmpestrm
> > 5. pcmpistri/vpcmpistri, pcmpistrm/vpcmpistrm
> 
> How are GPRs involved in the above?  Or did I misunderstand something?

Those instructions allow memory operands, and say vptest (%r18), %xmm7
isn't supported.

Jakub



Re: [PATCH 11/13] [APX EGPR] Handle legacy insns that only support GPR16 (3/5)

2023-08-31 Thread Richard Biener via Gcc-patches
On Thu, Aug 31, 2023 at 11:26 AM Richard Biener
 wrote:
>
> On Thu, Aug 31, 2023 at 10:25 AM Hongyu Wang via Gcc-patches
>  wrote:
> >
> > From: Kong Lingling 
> >
> > Disable EGPR usage for below legacy insns in opcode map2/3 that have vex
> > but no evex counterpart.
> >
> > insn list:
> > 1. phminposuw/vphminposuw
> > 2. ptest/vptest
> > 3. roundps/vroundps, roundpd/vroundpd,
> >roundss/vroundss, roundsd/vroundsd
> > 4. pcmpestri/vpcmpestri, pcmpestrm/vpcmpestrm
> > 5. pcmpistri/vpcmpistri, pcmpistrm/vpcmpistrm
>
> How are GPRs involved in the above?  Or did I misunderstand something?

Following up myself - for the memory operand alternatives I guess.  How about
simply disabling the memory alternatives when EGPR is active?  Wouldn't
that simplify the initial patchset a lot?  Re-enabling them when
deemed important could be done as followup then?

Richard.

> > 6. aesimc/vaesimc, aeskeygenassist/vaeskeygenassist
> >
> > gcc/ChangeLog:
> >
> > * config/i386/i386-protos.h (x86_evex_reg_mentioned_p): New
> > prototype.
> > * config/i386/i386.cc (x86_evex_reg_mentioned_p): New
> > function.
> > * config/i386/i386.md (sse4_1_round2): Set attr gpr32 0
> > and constraint Bt/BM to all non-evex alternatives, adjust
> > alternative outputs if evex reg is mentioned.
> > * config/i386/sse.md (_ptest): Set attr gpr32 0
> > and constraint Bt/BM to all non-evex alternatives.
> > (ptesttf2): Likewise.
> > (_round > (sse4_1_round): Likewise.
> > (sse4_2_pcmpestri): Likewise.
> > (sse4_2_pcmpestrm): Likewise.
> > (sse4_2_pcmpestr_cconly): Likewise.
> > (sse4_2_pcmpistr): Likewise.
> > (sse4_2_pcmpistri): Likewise.
> > (sse4_2_pcmpistrm): Likewise.
> > (sse4_2_pcmpistr_cconly): Likewise.
> > (aesimc): Likewise.
> > (aeskeygenassist): Likewise.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/apx-legacy-insn-check-norex2.c: Add intrinsic
> > tests.
> > ---
> >  gcc/config/i386/i386-protos.h |  1 +
> >  gcc/config/i386/i386.cc   | 13 +++
> >  gcc/config/i386/i386.md   |  3 +-
> >  gcc/config/i386/sse.md| 93 +--
> >  .../i386/apx-legacy-insn-check-norex2.c   | 55 ++-
> >  5 files changed, 132 insertions(+), 33 deletions(-)
> >
> > diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
> > index 78eb3e0f584..bbb219e3039 100644
> > --- a/gcc/config/i386/i386-protos.h
> > +++ b/gcc/config/i386/i386-protos.h
> > @@ -65,6 +65,7 @@ extern bool extended_reg_mentioned_p (rtx);
> >  extern bool x86_extended_QIreg_mentioned_p (rtx_insn *);
> >  extern bool x86_extended_reg_mentioned_p (rtx);
> >  extern bool x86_extended_rex2reg_mentioned_p (rtx);
> > +extern bool x86_evex_reg_mentioned_p (rtx [], int);
> >  extern bool x86_maybe_negate_const_int (rtx *, machine_mode);
> >  extern machine_mode ix86_cc_mode (enum rtx_code, rtx, rtx);
> >
> > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > index f5d642948bc..ec93c5bab97 100644
> > --- a/gcc/config/i386/i386.cc
> > +++ b/gcc/config/i386/i386.cc
> > @@ -22936,6 +22936,19 @@ x86_extended_rex2reg_mentioned_p (rtx insn)
> >return false;
> >  }
> >
> > +/* Return true when rtx operands mentions register that must be encoded 
> > using
> > +   evex prefix.  */
> > +bool
> > +x86_evex_reg_mentioned_p (rtx operands[], int nops)
> > +{
> > +  int i;
> > +  for (i = 0; i < nops; i++)
> > +if (EXT_REX_SSE_REG_P (operands[i])
> > +   || x86_extended_rex2reg_mentioned_p (operands[i]))
> > +  return true;
> > +  return false;
> > +}
> > +
> >  /* If profitable, negate (without causing overflow) integer constant
> > of mode MODE at location LOC.  Return true in this case.  */
> >  bool
> > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > index 83ad01b43c1..4c305e72389 100644
> > --- a/gcc/config/i386/i386.md
> > +++ b/gcc/config/i386/i386.md
> > @@ -21603,7 +21603,7 @@ (define_expand "significand2"
> >  (define_insn "sse4_1_round2"
> >[(set (match_operand:MODEFH 0 "register_operand" "=x,x,x,v,v")
> > (unspec:MODEFH
> > - [(match_operand:MODEFH 1 "nonimmediate_operand" "0,x,m,v,m")
> > + [(match_operand:MODEFH 1 "nonimmediate_operand" "0,x,Bt,v,m")
> >(match_operand:SI 2 "const_0_to_15_operand")]
> >   UNSPEC_ROUND))]
> >"TARGET_SSE4_1"
> > @@ -21616,6 +21616,7 @@ (define_insn "sse4_1_round2"
> >[(set_attr "type" "ssecvt")
> > (set_attr "prefix_extra" "1,1,1,*,*")
> > (set_attr "length_immediate" "1")
> > +   (set_attr "gpr32" "1,1,0,1,1")
> > (set_attr "prefix" "maybe_vex,maybe_vex,maybe_vex,evex,evex")
> > (set_attr "isa" "noavx512f,noavx512f,noavx512f,avx512f,avx512f")
> > (set_attr "avx_partial_xmm_update" "false,false,true,false,true")
> > diff --git

Re: [PATCH 11/13] [APX EGPR] Handle legacy insns that only support GPR16 (3/5)

2023-08-31 Thread Richard Biener via Gcc-patches
On Thu, Aug 31, 2023 at 10:25 AM Hongyu Wang via Gcc-patches
 wrote:
>
> From: Kong Lingling 
>
> Disable EGPR usage for below legacy insns in opcode map2/3 that have vex
> but no evex counterpart.
>
> insn list:
> 1. phminposuw/vphminposuw
> 2. ptest/vptest
> 3. roundps/vroundps, roundpd/vroundpd,
>roundss/vroundss, roundsd/vroundsd
> 4. pcmpestri/vpcmpestri, pcmpestrm/vpcmpestrm
> 5. pcmpistri/vpcmpistri, pcmpistrm/vpcmpistrm

How are GPRs involved in the above?  Or did I misunderstand something?

> 6. aesimc/vaesimc, aeskeygenassist/vaeskeygenassist
>
> gcc/ChangeLog:
>
> * config/i386/i386-protos.h (x86_evex_reg_mentioned_p): New
> prototype.
> * config/i386/i386.cc (x86_evex_reg_mentioned_p): New
> function.
> * config/i386/i386.md (sse4_1_round2): Set attr gpr32 0
> and constraint Bt/BM to all non-evex alternatives, adjust
> alternative outputs if evex reg is mentioned.
> * config/i386/sse.md (_ptest): Set attr gpr32 0
> and constraint Bt/BM to all non-evex alternatives.
> (ptesttf2): Likewise.
> (_round (sse4_1_round): Likewise.
> (sse4_2_pcmpestri): Likewise.
> (sse4_2_pcmpestrm): Likewise.
> (sse4_2_pcmpestr_cconly): Likewise.
> (sse4_2_pcmpistr): Likewise.
> (sse4_2_pcmpistri): Likewise.
> (sse4_2_pcmpistrm): Likewise.
> (sse4_2_pcmpistr_cconly): Likewise.
> (aesimc): Likewise.
> (aeskeygenassist): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/apx-legacy-insn-check-norex2.c: Add intrinsic
> tests.
> ---
>  gcc/config/i386/i386-protos.h |  1 +
>  gcc/config/i386/i386.cc   | 13 +++
>  gcc/config/i386/i386.md   |  3 +-
>  gcc/config/i386/sse.md| 93 +--
>  .../i386/apx-legacy-insn-check-norex2.c   | 55 ++-
>  5 files changed, 132 insertions(+), 33 deletions(-)
>
> diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
> index 78eb3e0f584..bbb219e3039 100644
> --- a/gcc/config/i386/i386-protos.h
> +++ b/gcc/config/i386/i386-protos.h
> @@ -65,6 +65,7 @@ extern bool extended_reg_mentioned_p (rtx);
>  extern bool x86_extended_QIreg_mentioned_p (rtx_insn *);
>  extern bool x86_extended_reg_mentioned_p (rtx);
>  extern bool x86_extended_rex2reg_mentioned_p (rtx);
> +extern bool x86_evex_reg_mentioned_p (rtx [], int);
>  extern bool x86_maybe_negate_const_int (rtx *, machine_mode);
>  extern machine_mode ix86_cc_mode (enum rtx_code, rtx, rtx);
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index f5d642948bc..ec93c5bab97 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -22936,6 +22936,19 @@ x86_extended_rex2reg_mentioned_p (rtx insn)
>return false;
>  }
>
> +/* Return true when rtx operands mentions register that must be encoded using
> +   evex prefix.  */
> +bool
> +x86_evex_reg_mentioned_p (rtx operands[], int nops)
> +{
> +  int i;
> +  for (i = 0; i < nops; i++)
> +if (EXT_REX_SSE_REG_P (operands[i])
> +   || x86_extended_rex2reg_mentioned_p (operands[i]))
> +  return true;
> +  return false;
> +}
> +
>  /* If profitable, negate (without causing overflow) integer constant
> of mode MODE at location LOC.  Return true in this case.  */
>  bool
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 83ad01b43c1..4c305e72389 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -21603,7 +21603,7 @@ (define_expand "significand2"
>  (define_insn "sse4_1_round2"
>[(set (match_operand:MODEFH 0 "register_operand" "=x,x,x,v,v")
> (unspec:MODEFH
> - [(match_operand:MODEFH 1 "nonimmediate_operand" "0,x,m,v,m")
> + [(match_operand:MODEFH 1 "nonimmediate_operand" "0,x,Bt,v,m")
>(match_operand:SI 2 "const_0_to_15_operand")]
>   UNSPEC_ROUND))]
>"TARGET_SSE4_1"
> @@ -21616,6 +21616,7 @@ (define_insn "sse4_1_round2"
>[(set_attr "type" "ssecvt")
> (set_attr "prefix_extra" "1,1,1,*,*")
> (set_attr "length_immediate" "1")
> +   (set_attr "gpr32" "1,1,0,1,1")
> (set_attr "prefix" "maybe_vex,maybe_vex,maybe_vex,evex,evex")
> (set_attr "isa" "noavx512f,noavx512f,noavx512f,avx512f,avx512f")
> (set_attr "avx_partial_xmm_update" "false,false,true,false,true")
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 05963de9219..456713b991a 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -22617,11 +22617,12 @@ (define_insn "avx2_pblendd"
>
>  (define_insn "sse4_1_phminposuw"
>[(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,x")
> -   (unspec:V8HI [(match_operand:V8HI 1 "vector_operand" "YrBm,*xBm,xm")]
> +   (unspec:V8HI [(match_operand:V8HI 1 "vector_operand" "YrBT,*xBT,xBt")]
>  UNSPEC_PHMINPOSUW))]
>"TARGET_SSE4_1"
>"%vphminposuw\t{%1, %

Re: [PATCH 00/13] [RFC] Support Intel APX EGPR

2023-08-31 Thread Richard Biener via Gcc-patches
On Thu, Aug 31, 2023 at 10:22 AM Hongyu Wang via Gcc-patches
 wrote:
>
> Intel Advanced performance extension (APX) has been released in [1].
> It contains several extensions such as extended 16 general purpose registers
> (EGPRs), push2/pop2, new data destination (NDD), conditional compare
> (CCMP/CTEST) combined with suppress flags write version of common instructions
> (NF). This RFC focused on EGPR implementation in GCC.
>
> APX introduces a REX2 prefix to help represent EGPR for several legacy/SSE
> instructions. For the remaining ones, it promotes some of them using evex
> prefix for EGPR.  The main issue in APX is that not all legacy/sse/vex
> instructions support EGPR. For example, instructions in legacy opcode map2/3
> cannot use REX2 prefix since there is only 1bit in REX2 to indicate map0/1
> instructions, e.g., pinsrd. Also, for most vector extensions, EGPR is supported
> in their evex forms but not vex forms, which means the mnemonics with no evex
> forms also cannot use EGPR, e.g., vphaddw.
>
> Such limitation brings some challenge with current GCC infrastructure.
> Generally, we use constraints to guide register allocation behavior. For
> register operand, it is easy to add a new constraint to certain insn and limit
> it to legacy or REX registers. But for memory operand, if we only use
> constraint to limit base/index register choice, reload has no backoff when
> process_address allocates any egprs to base/index reg, and then any post-reload
> pass would get ICE from the constraint.

How realistic would it be to simply disable instructions not supporting EGPR?
I hope there are alternatives that would be available in actual APX
implementations?
Otherwise this design limitation doesn't shed a very positive light on
the designers ...

How sure are we that actual implementations with APX will appear (just
remembering SSE5...)?
I'm quite sure it's not going to be 2024, so would it be realistic to
postpone APX work to next stage1, targeting GCC 15 only?

> Here is what we did to address the issue:
>
> Middle-end:
> -   Add rtx_insn parameter to base_reg_class, reuse the
> MODE_CODE_BASE_REG_CLASS macro with rtx_insn parameter.
> -   Add index_reg_class like base_reg_class, calls new INSN_INDEX_REG_CLASS
> macro with rtx_insn parameter.
> -   In process_address_1, add rtx_insn parameter to call sites of
> base_reg_class, replace usage of INDEX_REG_CLASS to index_reg_class with
> rtx_insn parameter.
>
> Back-end:
> -   Extend GENERAL_REG_CLASS, INDEX_REG_CLASS and their supersets with
> corresponding regno checks for EGPRs.
> -   Add GENERAL_GPR16/INDEX_GPR16 class for old 16 GPRs.
> -   Whole component is controlled under -mapxf/TARGET_APX_EGPR. If it is
> not enabled, clear r16-r31 in accessible_reg_set.
> -   New register_constraint “h” and memory_constraint “Bt” that disallows
> EGPRs in operand.
> -   New asm_gpr32 flag option to enable/disable gpr32 for inline asm,
>   disabled by default.
> -   If asm_gpr32 is disabled, replace constraints “r” to “h”, and
> “m/memory” to “Bt”.
> -   Extra insn attribute gpr32, value 0 indicates the alternative cannot
> use EGPRs.
> -   Add target functions for base_reg_class and index_reg_class, calls a
> helper function to verify if insn can use EGPR in its memory_operand.
> -   In the helper function, the verify process works as follow:
> 1. Returns true if APX_EGPR disabled or insn is null.
> 2. If the insn is inline asm, returns asm_gpr32 flag.
> 3. Returns false for unrecognizable insn.
> 4. Save recog_data and which_alternative, extract the insn, and restore them
> before return.
> 5. Loop through all enabled alternatives, if one of the enabled alternatives
> have attr_gpr32 0, returns false, otherwise returns true.
> -   For insn alternatives that cannot use gpr32 in register_operand, use h
> constraint instead of r.
> -   For insn alternatives that cannot use gpr32 in memory operand, use Bt
> constraint instead of m, and set corresponding attr_gpr32 to 0.
> -   Split output template with %v if the sse version of mnemonic cannot use
> gpr32.
> -   For insn alternatives that cannot use gpr32 in memory operand, classify
> the isa attribute and split alternatives to noavx, avx_noavx512f and etc., so
> the helper function can properly loop through the available enabled mask.
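
The verify steps quoted above can be sketched as a small standalone function.
This is only an illustration under assumptions: InsnModel, Alternative and
insn_may_use_egpr are hypothetical names, the real helper works on rtx_insn
and recog_data, and step 4 (saving and restoring recog_data) is left out:

```cpp
#include <vector>

// Hypothetical model of one insn alternative.
struct Alternative {
  bool enabled;  // alternative is enabled for the current target
  bool gpr32;    // gpr32 attribute: false means EGPR must not be used
};

// Hypothetical model of an insn, standing in for rtx_insn.
struct InsnModel {
  bool is_inline_asm;
  bool recognized;
  std::vector<Alternative> alternatives;
};

bool insn_may_use_egpr(const InsnModel *insn, bool apx_egpr_enabled,
                       bool asm_gpr32)
{
  if (!apx_egpr_enabled || insn == nullptr)  // step 1
    return true;
  if (insn->is_inline_asm)                   // step 2
    return asm_gpr32;
  if (!insn->recognized)                     // step 3
    return false;
  // Step 5: a single enabled alternative with gpr32 == 0 vetoes EGPR.
  for (const Alternative &alt : insn->alternatives)
    if (alt.enabled && !alt.gpr32)
      return false;
  return true;
}
```

Note that a disabled alternative with gpr32 0 does not veto EGPR in this
sketch, which is why the description splits alternatives by isa attribute.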
>
> Specifically for inline asm, we currently just map “r/m/memory” constraints as
> an example. Eventually we will support entire mapping of all common constraints
> if the mapping method was accepted.
>
> Also, for vex instructions, currently we assume egpr was supported if they have
> evex counterpart, since any APX enabled machine will have AVX10 support for all
> the evex encodings. We just disabled those mnemonics that doesn’t support EGPR.
> So EGPR will be allowed under -mavx2 -mapxf for many vex mnemonics.
>
> We haven’t disabled EGPR for 3DNOW/XOP/LWP/FMA4/

Re: [PATCH 06/13] [APX EGPR] Map reg/mem constraints in inline asm to non-EGPR constraint.

2023-08-31 Thread Jakub Jelinek via Gcc-patches
On Thu, Aug 31, 2023 at 04:20:17PM +0800, Hongyu Wang via Gcc-patches wrote:
> From: Kong Lingling 
> 
> In inline asm, we do not know if the insn can use EGPR, so disable EGPR
> usage by default by mapping the common reg/mem constraints to non-EGPR
> constraints. Use a flag mapx-inline-asm-use-gpr32 to enable EGPR usage
> for inline asm.
> 
> gcc/ChangeLog:
> 
>   * config/i386/i386.cc (INCLUDE_STRING): Add include for
>   ix86_md_asm_adjust.
>   (ix86_md_asm_adjust): When APX EGPR enabled without specifying the
>   target option, map reg/mem constraints to non-EGPR constraints.
>   * config/i386/i386.opt: Add option mapx-inline-asm-use-gpr32.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/i386/apx-inline-gpr-norex2.c: New test.
> ---
>  gcc/config/i386/i386.cc   |  44 +++
>  gcc/config/i386/i386.opt  |   5 +
>  .../gcc.target/i386/apx-inline-gpr-norex2.c   | 107 ++
>  3 files changed, 156 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/apx-inline-gpr-norex2.c
> 
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index d26d9ab0d9d..9460ebbfda4 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -17,6 +17,7 @@ You should have received a copy of the GNU General Public 
> License
>  along with GCC; see the file COPYING3.  If not see
>  .  */
>  
> +#define INCLUDE_STRING
>  #define IN_TARGET_CODE 1
>  
>  #include "config.h"
> @@ -23077,6 +23078,49 @@ ix86_md_asm_adjust (vec &outputs, vec & 
> /*inputs*/,
>bool saw_asm_flag = false;
>  
>start_sequence ();
> +  /* TODO: Here we just mapped the general r/m constraints to non-EGPR
> +   constraints, will eventually map all the usable constraints in the 
> future. */

I think there should be some constraint which explicitly has all the 32
GPRs, like there is one for just all 16 GPRs (h), so that regardless of
-mapx-inline-asm-use-gpr32 one can be explicit what the inline asm wants.

Also, what about the "g" constraint?  Shouldn't there be another for "g"
without r16..r31?  What about the various other memory
constraints ("<", "o", ...)?

> +  if (TARGET_APX_EGPR && !ix86_apx_inline_asm_use_gpr32)
> +{
> +  /* Map "r" constraint in inline asm to "h" that disallows r16-r31
> +  and replace only r, exclude Br and Yr.  */
> +  for (unsigned i = 0; i < constraints.length (); i++)
> + {
> +   std::string *s = new std::string (constraints[i]);

Doesn't this leak memory (all the time)?
I must say I don't really understand why you need to use std::string here,
but certainly it shouldn't leak.

> +   size_t pos = s->find ('r');
> +   while (pos != std::string::npos)
> + {
> +   if (pos > 0
> +   && (s->at (pos - 1) == 'Y' || s->at (pos - 1) == 'B'))
> + pos = s->find ('r', pos + 1);
> +   else
> + {
> +   s->replace (pos, 1, "h");
> +   constraints[i] = (const char*) s->c_str ();

Formatting (space before *).  The usual way for constraints is ggc_strdup on
some string in a buffer.  Also, one could have several copies of r (or m,
memory (doesn't that appear just in clobbers?  And that doesn't look like
something that should be replaced), Bm), e.g. in various alternatives.  So, you
need to change them all, not just the first hit.  "r,r,r,m" and the like.
Normally, one would simply walk the constraint string, parsing the special
letters (+, =, & etc.) and single letter constraints and 2 letter
constraints using CONSTRAINT_LEN macro (tons of examples in GCC sources).
Either do it in 2 passes, first one counts how long constraint string one
will need after the adjustments (and whether to adjust something at all),
then if needed XALLOCAVEC it and adjust in there, or say use a auto_vec for
it.
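
A minimal sketch of such a walk, outside of GCC, just to show every hit in
every alternative being rewritten. Assumptions: constraint_len is a
hypothetical stand-in for CONSTRAINT_LEN that only knows the 'B' and 'Y'
two-letter prefixes mentioned in this thread, std::string is used only to keep
the sketch self-contained, and "memory" is deliberately left alone:

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for CONSTRAINT_LEN: treats only 'B' and 'Y' as the
// first letter of a two-letter constraint.
static std::size_t constraint_len(const std::string &s, std::size_t pos)
{
  bool two_letter = (s[pos] == 'B' || s[pos] == 'Y') && pos + 1 < s.size();
  return two_letter ? 2 : 1;
}

// Map "r" -> "h", "m" -> "Bt" and "Bm" -> "BT" for every occurrence, in all
// alternatives.  Modifiers (+, =, &), commas and any other constraint are
// copied unchanged, so "Br" and "Yr" are never touched.
std::string map_to_gpr16_constraints(const std::string &in)
{
  std::string out;
  out.reserve(in.size() * 2);
  for (std::size_t pos = 0; pos < in.size();)
    {
      std::size_t len = constraint_len(in, pos);
      std::string tok = in.substr(pos, len);
      if (tok == "r")
        out += "h";
      else if (tok == "m")
        out += "Bt";
      else if (tok == "Bm")
        out += "BT";
      else
        out += tok;
      pos += len;
    }
  return out;
}
```

With this shape, "r,r,r,m" becomes "h,h,h,Bt" in one pass, and nothing is
allocated that outlives the call.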

> +   break;
> + }
> + }
> + }
> +  /* Also map "m/memory/Bm" constraint that may use GPR32, replace them 
> with
> +  "Bt/Bt/BT".  */
> +  for (unsigned i = 0; i < constraints.length (); i++)
> + {
> +   std::string *s = new std::string (constraints[i]);
> +   size_t pos = s->find ("m");
> +   size_t pos2 = s->find ("memory");
> +   if (pos != std::string::npos)
> + {
> +   if (pos > 0 && (s->at (pos - 1) == 'B'))
> +   s->replace (pos - 1, 2, "BT");
> +   else if (pos2 != std::string::npos)
> +   s->replace (pos, 6, "Bt");
> +   else
> +   s->replace (pos, 1, "Bt");

Formatting, the s->replace calls are indented too much.

Jakub



Re: [PATCH 1/1] RISC-V: Imply 'Zicsr' from 'Zcmt'

2023-08-31 Thread Kito Cheng via Gcc-patches
OK, I just went through the patch list and found that this patch seems not
to be committed yet; anyway, I will mark this as committed now :)


On Thu, Aug 31, 2023 at 5:14 PM Tsukasa OI via Gcc-patches
 wrote:
>
> On 2023/08/31 18:10, Kito Cheng wrote:
> > Hi Tsukasa:
> >
> > I guess you might have done something wrong when committing this patch and
> > "RISC-V: Add stub support for existing extensions"
> >
> > https://github.com/gcc-mirror/gcc/commit/f30d6a48635b5b180e46c51138d0938d33abd942
> >
>
> It's fine.  That patch was a part of "RISC-V: Add stub support for
> existing extensions" (the only intent for subset submission was faster
> review but the bigger one was accepted earlier than I expected).
>
> Tsukasa


Re: [PATCH] RISC-V: Refactor and clean emit_{vlmax,nonvlmax}_xxx functions

2023-08-31 Thread Robin Dapp via Gcc-patches
Hi Lehua,

thanks, this definitely goes in the direction of what I had in mind and
simplifies a lot of the redundant emit_... so it's good to have it.

I was too slow for a detailed response :)  So just some high-level comments.

One thing I noticed is the overloading of "MASK_OP": we use it both for
"operation on masks", i.e. an insn, and for "mask policy".  IMHO we could
get rid of UNARY_MASK_OP and BINARY_MASK_OP and just decide whether to
add a mask policy depending on whether all operands are masks (the same
way we did before).

Related, and seeing that the MASK in UNARY_MASK_OP is somewhat redundant,
I feel we still mix concerns a bit.  For example it is not obvious, from
the name at least, why a WIDEN_TERNARY_OP does not have a merge operand
and the decision making seems very "enum centered" now :D

In general we use the NULLARY, UNARY, BINARY, TERNARY prefixes just
to determine the number of sources which doesn't seem really necessary
because a user of e.g. NEG will already know that there only is one
source - he already specified it and currently needs to, redundantly,
say UNARY again.
 
If we split off the destination and sources from mask, merge and the rest
we could ditch them altogether.

What about
 emit_(non)vlmax_insn (icode, *operands (just dest and sources),
   mask, merge, tail/mask policy, frm)

with mask defaulting to NULL and merge defaulting to VUNDEF?  So ideally,
and in the easy case, the call would just degenerate to
 emit_..._insn (icode, operands).
 
I realize this will cause some complications on the "other side" but with
the enum in place it should still be doable?
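
A rough standalone sketch of the call shape suggested above, just to make
the defaulting concrete.  Every name here (insn_request, merge_kind,
policy, emit_insn_sketch) is hypothetical and not part of riscv-v.cc, plain
ints stand in for rtx operands, and the frm argument is left out for
brevity.

```cpp
#include <cassert>
#include <vector>

// All names below are hypothetical stand-ins, not the riscv-v.cc API.
enum class merge_kind { VUNDEF, EXPLICIT };
enum class policy { DEFAULT, AGNOSTIC, UNDISTURBED };

struct insn_request
{
  int icode;
  std::vector<int> operands;  // Destination first, then the sources.
  const int *mask;            // nullptr means "no mask".
  merge_kind merge;
  policy tail_mask;
};

// Destination and sources are passed together; mask, merge and policy
// are separate trailing arguments with defaults, so the number of
// sources never has to be repeated in an enum name.
static insn_request
emit_insn_sketch (int icode, const std::vector<int> &operands,
                  const int *mask = nullptr,
                  merge_kind merge = merge_kind::VUNDEF,
                  policy tail_mask = policy::DEFAULT)
{
  return insn_request{icode, operands, mask, merge, tail_mask};
}
```

In the easy case the call is just emit_insn_sketch (icode, {dest, src}),
and a masked variant only adds the arguments that differ from the
defaults.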

No need to address this right away though, just sharing some ideas again.

Regards
 Robin



Re: [PATCH] RISC-V: Change vsetvl tail and mask policy to default policy

2023-08-31 Thread Kito Cheng via Gcc-patches
LGTM

On Thu, Aug 31, 2023 at 5:07 PM Lehua Ding  wrote:
>
> This patch changes the vsetvl policy to the default policy
> (returned by get_prefer_mask_policy and get_prefer_tail_policy) instead
> of a fixed policy. Any policy is now returned, allowing a change to agnostic
> or undisturbed. In the future, users may be able to control the default
> policy, such as keeping it agnostic, through compiler options.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-protos.h (IS_AGNOSTIC): Move to here.
> * config/riscv/riscv-v.cc (gen_no_side_effects_vsetvl_rtx):
> Change to default policy.
> * config/riscv/riscv-vector-builtins-bases.cc: Change to default policy.
> * config/riscv/riscv-vsetvl.h (IS_AGNOSTIC): Delete.
> * config/riscv/riscv.cc (riscv_print_operand): Use IS_AGNOSTIC to test.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/binop_vx_constraint-171.c: Adjust.
> * gcc.target/riscv/rvv/base/binop_vx_constraint-173.c: Adjust.
> * gcc.target/riscv/rvv/vsetvl/vsetvl-24.c: New test.
>
> ---
>  gcc/config/riscv/riscv-protos.h   |  3 +++
>  gcc/config/riscv/riscv-v.cc   |  4 +++-
>  gcc/config/riscv/riscv-vector-builtins-bases.cc   |  8 
>  gcc/config/riscv/riscv-vsetvl.h   |  2 --
>  gcc/config/riscv/riscv.cc |  3 +--
>  .../riscv/rvv/base/binop_vx_constraint-171.c  |  4 ++--
>  .../riscv/rvv/base/binop_vx_constraint-173.c  |  4 ++--
>  gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-24.c | 11 +++
>  8 files changed, 26 insertions(+), 13 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-24.c
>
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> index 92e30a10f3c..e145ee6c69b 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -406,6 +406,9 @@ enum mask_policy
>MASK_ANY = 2,
>  };
>
> +/* Return true if VALUE is agnostic or any policy.  */
> +#define IS_AGNOSTIC(VALUE) (bool) (VALUE & 0x1 || (VALUE >> 1 & 0x1))
> +
>  enum class reduction_type
>  {
>UNORDERED,
> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
> index 427700192a3..6228ff3d92e 100644
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -1672,9 +1672,11 @@ static rtx
>  gen_no_side_effects_vsetvl_rtx (machine_mode vmode, rtx vl, rtx avl)
>  {
>unsigned int sew = get_sew (vmode);
> +  rtx tail_policy = gen_int_mode (get_prefer_tail_policy (), Pmode);
> +  rtx mask_policy = gen_int_mode (get_prefer_mask_policy (), Pmode);
>return gen_vsetvl_no_side_effects (Pmode, vl, avl, gen_int_mode (sew, Pmode),
>  gen_int_mode (get_vlmul (vmode), Pmode),
> -const0_rtx, const0_rtx);
> +tail_policy, mask_policy);
>  }
>
>  /* GET VL * 2 rtx.  */
> diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc
> index 54582ee130c..8e679f72392 100644
> --- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
> +++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
> @@ -139,11 +139,11 @@ public:
>  /* LMUL.  */
>  e.add_input_operand (Pmode, gen_int_mode (get_vlmul (mode), Pmode));
>
> -/* TA.  */
> -e.add_input_operand (Pmode, gen_int_mode (1, Pmode));
> +/* TAIL_ANY.  */
> +e.add_input_operand (Pmode, gen_int_mode (get_prefer_tail_policy (), Pmode));
>
> -/* MU.  */
> -e.add_input_operand (Pmode, gen_int_mode (0, Pmode));
> +/* MASK_ANY.  */
> +e.add_input_operand (Pmode, gen_int_mode (get_prefer_mask_policy (), Pmode));
>  return e.generate_insn (code_for_vsetvl_no_side_effects (Pmode));
>}
>  };
> diff --git a/gcc/config/riscv/riscv-vsetvl.h b/gcc/config/riscv/riscv-vsetvl.h
> index 2a315e45f31..53549abfac5 100644
> --- a/gcc/config/riscv/riscv-vsetvl.h
> +++ b/gcc/config/riscv/riscv-vsetvl.h
> @@ -21,8 +21,6 @@ along with GCC; see the file COPYING3.  If not see
>  #ifndef GCC_RISCV_VSETVL_H
>  #define GCC_RISCV_VSETVL_H
>
> -#define IS_AGNOSTIC(VALUE) (bool) (VALUE & 0x1 || (VALUE >> 1 & 0x1))
> -
>  namespace riscv_vector {
>
>  /* Classification of vsetvl instruction.  */
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index d84fa2311fa..8bca8075713 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -5246,8 +5246,7 @@ riscv_print_operand (FILE *file, rtx op, int letter)
> else if (code == CONST_INT)
>   {
> /* Tail && Mask policy.  */
> -   bool agnostic_p = UINTVAL (op) & 0x1;
> -   asm_fprintf (file, "%s", agnostic_p ? "a" : "u");
> +   asm_fprintf (file, "%s", IS_AGNOSTIC (UINTVAL (op)) ? "a" : "u");
>   }
> else
>   output_operand_lossage ("invalid vector constant"
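
For readers following the IS_AGNOSTIC change in the patch above, a small
standalone sketch of what the test evaluates to.  The enum values are an
assumption: only MASK_ANY = 2 is visible in the quoted hunk, and
undisturbed = 0 and agnostic = 1 are assumed here.  The function names are
hypothetical; the macro itself is replaced by an equivalent function.

```cpp
#include <cassert>

// Assumed values; only MASK_ANY = 2 is visible in the quoted hunk.
enum mask_policy_sketch
{
  SKETCH_MASK_UNDISTURBED = 0,
  SKETCH_MASK_AGNOSTIC = 1,
  SKETCH_MASK_ANY = 2,
};

// Function form of the macro's test: true if bit 0 (agnostic) or
// bit 1 (any) is set.
constexpr bool
is_agnostic_sketch (unsigned value)
{
  return (value & 0x1) || ((value >> 1) & 0x1);
}

// Mirrors the "a"/"u" choice in the riscv_print_operand hunk.
constexpr char
policy_letter_sketch (unsigned value)
{
  return is_agnostic_sketch (value) ? 'a' : 'u';
}
```

Under these assumed values, the old check (value & 0x1) would have printed
"u" for the "any" policy, while the new check prints "a" for it.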

Re: [PATCH 1/1] RISC-V: Imply 'Zicsr' from 'Zcmt'

2023-08-31 Thread Tsukasa OI via Gcc-patches
On 2023/08/31 18:10, Kito Cheng wrote:
> Hi Tsukasa:
> 
> I guess you might have done something wrong while committing this patch and
> "RISC-V: Add stub support for existing extensions"
> 
> https://github.com/gcc-mirror/gcc/commit/f30d6a48635b5b180e46c51138d0938d33abd942
> 

It's fine.  That patch was a part of "RISC-V: Add stub support for
existing extensions" (the only intent for subset submission was faster
review but the bigger one was accepted earlier than I expected).

Tsukasa

