[PATCH v3] RISC-V: Implement .SAT_TRUNC for vector unsigned int

2024-07-07 Thread pan2 . li
From: Pan Li 

This patch implements .SAT_TRUNC for the RISC-V backend, with the
help of the RVV Vector Narrowing Fixed-Point Clip Instructions.
The following SEW conversions are supported:

* e64 => e32
* e64 => e16
* e64 => e8
* e32 => e16
* e32 => e8
* e16 => e8

Take the example below to see the changes to the asm.
Form 1:
  #define DEF_VEC_SAT_U_TRUNC_FMT_1(NT, WT) \
  void __attribute__((noinline))\
  vec_sat_u_trunc_##NT##_##WT##_fmt_1 (NT *out, WT *in, unsigned limit) \
  { \
unsigned i; \
for (i = 0; i < limit; i++) \
  { \
WT x = in[i];   \
bool overflow = x > (WT)(NT)(-1);   \
out[i] = ((NT)x) | (NT)-overflow;   \
  } \
  }

DEF_VEC_SAT_U_TRUNC_FMT_1 (uint32_t, uint64_t)
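
For reference, the arithmetic in Form 1 can be checked against a plain
clamp.  The sketch below is not part of the patch; it writes the macro body
out by hand for uint64_t => uint32_t, and the names sat_u_trunc_fmt_1,
sat_u_trunc_ref and vec_sat_u_trunc_u32_u64 are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Form 1 from the commit message, specialised by hand to
   NT = uint32_t and WT = uint64_t (illustrative name).  */
uint32_t
sat_u_trunc_fmt_1 (uint64_t x)
{
  /* (uint64_t)(uint32_t)(-1) is the largest value NT can hold.  */
  bool overflow = x > (uint64_t)(uint32_t)(-1);
  /* -overflow is 0 or -1; as uint32_t that is 0 or all ones, so the OR
     either keeps the truncated value or forces the maximum.  */
  return ((uint32_t)x) | (uint32_t)-overflow;
}

/* Straightforward clamp used only as a reference (illustrative name).  */
uint32_t
sat_u_trunc_ref (uint64_t x)
{
  return x > UINT32_MAX ? UINT32_MAX : (uint32_t)x;
}

/* The loop shape the vectorizer sees, one element per iteration.  */
void
vec_sat_u_trunc_u32_u64 (uint32_t *out, const uint64_t *in, unsigned limit)
{
  unsigned i;
  for (i = 0; i < limit; i++)
    out[i] = sat_u_trunc_fmt_1 (in[i]);
}
```

Both scalar helpers agree for every input: 0x100000000 saturates to
0xffffffff, while 0x1234 passes through unchanged.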

Before this patch:
.L3:
  vsetvli  a5,a2,e64,m1,ta,ma
  vle64.v  v1,0(a1)
  vmsgtu.vv  v0,v1,v2
  vsetvli  zero,zero,e32,mf2,ta,ma
  vncvt.x.x.w  v1,v1
  vmerge.vim   v1,v1,-1,v0
  vse32.v  v1,0(a0)
  slli a4,a5,3
  add  a1,a1,a4
  slli a4,a5,2
  add  a0,a0,a4
  sub  a2,a2,a5
  bne  a2,zero,.L3

After this patch:
.L3:
  vsetvli  a5,a2,e32,mf2,ta,ma
  vle64.v  v1,0(a1)
  vnclipu.wi   v1,v1,0
  vse32.v  v1,0(a0)
  slli a4,a5,3
  add  a1,a1,a4
  slli a4,a5,2
  add  a0,a0,a4
  sub  a2,a2,a5
  bne  a2,zero,.L3

Passed the rv64gcv full regression tests.

gcc/ChangeLog:

* config/riscv/autovec.md (ustrunc2): Add
new pattern for double truncation.
(ustrunc2): Ditto but for quad truncation.
(ustrunc2): Ditto but for oct truncation.
* config/riscv/riscv-protos.h (expand_vec_double_ustrunc): Add
new func decl to expand double vec ustrunc.
(expand_vec_quad_ustrunc): Ditto but for quad.
(expand_vec_oct_ustrunc): Ditto but for oct.
* config/riscv/riscv-v.cc (expand_vec_double_ustrunc): Add new
func impl to expand vector double ustrunc.
(expand_vec_quad_ustrunc): Ditto but for quad.
(expand_vec_oct_ustrunc): Ditto but for oct.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h: Add helper
test macros.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_data.h: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-3.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-4.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-5.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-6.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-3.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-4.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-5.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-6.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_unary_vv_run.h: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/autovec.md   |  35 ++
 gcc/config/riscv/riscv-protos.h   |   4 +
 gcc/config/riscv/riscv-v.cc   |  46 ++
 .../riscv/rvv/autovec/binop/vec_sat_arith.h   |  22 +
 .../riscv/rvv/autovec/unop/vec_sat_data.h | 394 ++
 .../rvv/autovec/unop/vec_sat_u_trunc-1.c  |  19 +
 .../rvv/autovec/unop/vec_sat_u_trunc-2.c  |  21 +
 .../rvv/autovec/unop/vec_sat_u_trunc-3.c  |  23 +
 .../rvv/autovec/unop/vec_sat_u_trunc-4.c  |  19 +
 .../rvv/autovec/unop/vec_sat_u_trunc-5.c  |  21 +
 .../rvv/autovec/unop/vec_sat_u_trunc-6.c  |  19 +
 .../rvv/autovec/unop/vec_sat_u_trunc-run-1.c  |  16 +
 .../rvv/autovec/unop/vec_sat_u_trunc-run-2.c  |  16 +
 .../rvv/autovec/unop/vec_sat_u_trunc-run-3.c  |  16 +
 .../rvv/autovec/unop/vec_sat_u_trunc-run-4.c  |  16 +
 .../rvv/autovec/unop/vec_sat_u_trunc-run-5.c  |  16 +
 .../rvv/autovec/unop/vec_sat_u_trunc-run-6.c  |  16 +
 .../rvv/autovec/unop/vec_sat_unary_vv_run.h   |  23 +
 18 files changed, 742 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_data.h
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-1.c
 create mode

Re: [PATCH v8 09/12] Delay caller error reporting for musttail

2024-07-07 Thread Richard Biener
On Sat, Jul 6, 2024 at 7:08 PM Andi Kleen  wrote:
>
> On Fri, Jul 05, 2024 at 01:45:17PM +0200, Richard Biener wrote:
> > On Sat, Jun 22, 2024 at 9:00 PM Andi Kleen  wrote:
> > >
> > > Move the error reporting for caller attributes to be
> > > after the tail call discovery, so that we can give proper
> > > error messages tagged to the calls.
> >
> > Hmm.  This all gets a bit awkward.  I realize that early checking
> > gets us less compile-time unnecessarily spent for searching for
> > a tail call - but at least for the musttail case parsing constraints
> > should put a practical limit on how far to look?
>
> All the top level checks are for obscure situations, so it's unlikely
> that it makes much difference for compile time either way.
>
> >
> > So what I wonder is whether it would be better to separate
> > searching for a (musttail) candidate separate from validation?
> >
> > We could for example invoke find_tail_calls twice, once to
> > find a musttail candidate (can there be multiple ones?) and once
> > to validate and error?  Would that make the delaying less awkward?
>
> There can be multiple musttails in a function, in theory
> one for every return.
>
> I'm not sure I see the awkward part? (other than perhaps
> the not-quite-natural accumulation of opt_tailcalls). There
> are a lot of checks before and after discovery. This just
> moves them all to be after.
>
> If the top level checks were done based on a discovered
> list you would need extra loops to walk the candidates
> later and error. It wouldn't be any simpler at least.

Thanks for clarifying.

> Overall the logic in this pass is rather convoluted and
> could deserve some cleanups and separation of concerns.
> e.g. it would be better to separate tail calls and tail
> recursion. But I'm not trying to rewrite the pass here.

Understood.  For a v9, can you squash the tree-tailcall.cc changes
please?

Thanks,
Richard.

> -Andi


Re: [PATCH v8 07/12] Enable musttail tail conversion even when not optimizing

2024-07-07 Thread Richard Biener
On Sat, Jul 6, 2024 at 6:07 PM Andi Kleen  wrote:
>
> > > +class pass_musttail : public gimple_opt_pass
> > > +{
> > > +public:
> > > +  pass_musttail (gcc::context *ctxt)
> > > +: gimple_opt_pass (pass_data_musttail, ctxt)
> > > +  {}
> > > +
> > > +  /* opt_pass methods: */
> > > > +  /* This pass is only used when not optimizing to make [[musttail]] still
> > > > + work.  */
> > > > +  bool gate (function *) final override { return !flag_optimize_sibling_calls; }
> >
> > Shouldn't this check f->has_musttail only?  That is, I would expect
> > -fno-optimize-sibling-calls to still tail-call [[musttail]]?  The comment says
> > the pass only runs when not optimizing - so maybe you wanted to do
> > return optimize == 0;?
>
> When flag_optimize_sibling_calls is set the other tailcall pass will
> take care of the musttails. It is only needed when that one doesn't run.
> So I think looking at that flag is correct.

Ah, I see.  So this pass is responsible for both -O0 and
-fno-optimize-sibling-calls.
But I'm quite sure the other pass doesn't run with -O0
-foptimize-sibling-calls, does it?

> But I should move the f->has_musttail check into the gate (done) and
> clarify the comment because it is not specific to optimizing.

Yeah, clarifying the comment would be nice.

Thanks,
Richard.

> Thanks,
> -Andi
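
As a rough illustration of the gate Andi describes above (the has_musttail
check folded in, and the pass only running when the regular tailcall pass is
disabled), the condition can be modelled as below.  This is a standalone
sketch, not the patch's code: struct demo_function and demo_musttail_gate
are hypothetical stand-ins for GCC's struct function and the pass's gate
method, and the flag is passed as a parameter rather than read from a
global.  It does not settle the open question about -O0 combined with
-foptimize-sibling-calls.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the one field of GCC's struct function that
   matters here.  */
struct demo_function
{
  bool has_musttail;
};

/* Model of the gate as described in the thread: run the dedicated
   musttail pass only if the function contains a musttail call and the
   regular sibling-call optimization is not enabled.  */
bool
demo_musttail_gate (const struct demo_function *f,
                    bool flag_optimize_sibling_calls)
{
  return f->has_musttail && !flag_optimize_sibling_calls;
}
```

Under this model a function without any musttail call never triggers the
extra pass, regardless of the flag.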


Re: [PATCH] rs6000: Remove vcond{,u} expanders

2024-07-07 Thread Richard Biener
On Mon, Jul 8, 2024 at 8:24 AM Kewen.Lin  wrote:
>
> Hi,
>
> As PR114189 shows, the middle-end will soon obsolete the vcond, vcondu
> and vcondeq optabs.  This patch removes all vcond{,u} expanders
> in the rs6000 port and makes the function
> rs6000_emit_vector_cond_expr, which was called by those
> expanders, static.
>
> Bootstrapped and regtested on powerpc64-linux-gnu P8/P9 and
> powerpc64le-linux-gnu P9 and P10.
>
> I'm going to push this later this week if there are no objections.

Thanks a lot for doing this!

Richard.

> BR,
> Kewen
> -
>
> gcc/ChangeLog:
>
> * config/rs6000/rs6000-protos.h (rs6000_emit_vector_cond_expr): 
> Remove.
> * config/rs6000/rs6000.cc (rs6000_emit_vector_cond_expr): Add static
> qualifier as it is only called by rs6000_emit_swsqrt now.
> * config/rs6000/vector.md (vcond): Remove.
> (vcond): Remove.
> (vcondv4sfv4si): Likewise.
> (vcondv4siv4sf): Likewise.
> (vcondv2dfv2di): Likewise.
> (vcondv2div2df): Likewise.
> (vcondu): Likewise.
> (vconduv4sfv4si): Likewise.
> (vconduv2dfv2di): Likewise.
> ---
>  gcc/config/rs6000/rs6000-protos.h |   1 -
>  gcc/config/rs6000/rs6000.cc   |   2 +-
>  gcc/config/rs6000/vector.md   | 160 --
>  3 files changed, 1 insertion(+), 162 deletions(-)
>
> diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
> index 09a57a806fa..b40557a8557 100644
> --- a/gcc/config/rs6000/rs6000-protos.h
> +++ b/gcc/config/rs6000/rs6000-protos.h
> @@ -126,7 +126,6 @@ extern void rs6000_emit_dot_insn (rtx dst, rtx src, int dot, rtx ccreg);
>  extern bool rs6000_emit_set_const (rtx, rtx);
>  extern bool rs6000_emit_cmove (rtx, rtx, rtx, rtx);
>  extern bool rs6000_emit_int_cmove (rtx, rtx, rtx, rtx);
> -extern int rs6000_emit_vector_cond_expr (rtx, rtx, rtx, rtx, rtx, rtx);
>  extern void rs6000_emit_minmax (rtx, enum rtx_code, rtx, rtx);
>  extern void rs6000_expand_atomic_compare_and_swap (rtx op[]);
>  extern rtx swap_endian_selector_for_mode (machine_mode mode);
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 58553ff66f4..24044f3a558 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -16145,7 +16145,7 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
> OP_FALSE are two VEC_COND_EXPR operands.  CC_OP0 and CC_OP1 are the two
> operands for the relation operation COND.  */
>
> -int
> +static int
>  rs6000_emit_vector_cond_expr (rtx dest, rtx op_true, rtx op_false,
>   rtx cond, rtx cc_op0, rtx cc_op1)
>  {
> diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
> index 59489e06839..0d3e0a24e11 100644
> --- a/gcc/config/rs6000/vector.md
> +++ b/gcc/config/rs6000/vector.md
> @@ -331,166 +331,6 @@ (define_expand "vector_copysign3"
>  })
>
>
>
> -;; Vector comparisons
> -(define_expand "vcond"
> -  [(set (match_operand:VEC_F 0 "vfloat_operand")
> -   (if_then_else:VEC_F
> -(match_operator 3 "comparison_operator"
> -[(match_operand:VEC_F 4 "vfloat_operand")
> - (match_operand:VEC_F 5 "vfloat_operand")])
> -(match_operand:VEC_F 1 "vfloat_operand")
> -(match_operand:VEC_F 2 "vfloat_operand")))]
> -  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)"
> -{
> -  if (rs6000_emit_vector_cond_expr (operands[0], operands[1], operands[2],
> -   operands[3], operands[4], operands[5]))
> -DONE;
> -  else
> -gcc_unreachable ();
> -})
> -
> -(define_expand "vcond"
> -  [(set (match_operand:VEC_I 0 "vint_operand")
> -   (if_then_else:VEC_I
> -(match_operator 3 "comparison_operator"
> -[(match_operand:VEC_I 4 "vint_operand")
> - (match_operand:VEC_I 5 "vint_operand")])
> -(match_operand:VEC_I 1 "vector_int_reg_or_same_bit")
> -(match_operand:VEC_I 2 "vector_int_reg_or_same_bit")))]
> -  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)"
> -{
> -  if (rs6000_emit_vector_cond_expr (operands[0], operands[1], operands[2],
> -   operands[3], operands[4], operands[5]))
> -DONE;
> -  else
> -gcc_unreachable ();
> -})
> -
> -(define_expand "vcondv4sfv4si"
> -  [(set (match_operand:V4SF 0 "vfloat_operand")
> -   (if_then_else:V4SF
> -(match_operator 3 "comparison_operator"
> -[(match_operand:V4SI 4 "vint_operand")
> - (match_operand:V4SI 5 "vint_operand")])
> -(match_operand:V4SF 1 "vfloat_operand")
> -(match_operand:V4SF 2 "vfloat_operand")))]
> -  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (V4SFmode)
> -   && VECTOR_UNIT_ALTIVEC_P (V4SImode)"
> -{
> -  if (rs6000_emit_vector_cond_expr (operands[0], operands[1], operands[2],
> -   operands[3], operands[4], operands[5]))
> -DONE;
> -  else
> -gcc_unreachable ();
> -})
>

[PATCH] rs6000: Remove vcond{,u} expanders

2024-07-07 Thread Kewen.Lin
Hi,

As PR114189 shows, the middle-end will soon obsolete the vcond, vcondu
and vcondeq optabs.  This patch removes all vcond{,u} expanders
in the rs6000 port and makes the function
rs6000_emit_vector_cond_expr, which was called by those
expanders, static.

Bootstrapped and regtested on powerpc64-linux-gnu P8/P9 and
powerpc64le-linux-gnu P9 and P10.

I'm going to push this later this week if there are no objections.

BR,
Kewen
-

gcc/ChangeLog:

* config/rs6000/rs6000-protos.h (rs6000_emit_vector_cond_expr): Remove.
* config/rs6000/rs6000.cc (rs6000_emit_vector_cond_expr): Add static
qualifier as it is only called by rs6000_emit_swsqrt now.
* config/rs6000/vector.md (vcond): Remove.
(vcond): Remove.
(vcondv4sfv4si): Likewise.
(vcondv4siv4sf): Likewise.
(vcondv2dfv2di): Likewise.
(vcondv2div2df): Likewise.
(vcondu): Likewise.
(vconduv4sfv4si): Likewise.
(vconduv2dfv2di): Likewise.
---
 gcc/config/rs6000/rs6000-protos.h |   1 -
 gcc/config/rs6000/rs6000.cc   |   2 +-
 gcc/config/rs6000/vector.md   | 160 --
 3 files changed, 1 insertion(+), 162 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 09a57a806fa..b40557a8557 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -126,7 +126,6 @@ extern void rs6000_emit_dot_insn (rtx dst, rtx src, int dot, rtx ccreg);
 extern bool rs6000_emit_set_const (rtx, rtx);
 extern bool rs6000_emit_cmove (rtx, rtx, rtx, rtx);
 extern bool rs6000_emit_int_cmove (rtx, rtx, rtx, rtx);
-extern int rs6000_emit_vector_cond_expr (rtx, rtx, rtx, rtx, rtx, rtx);
 extern void rs6000_emit_minmax (rtx, enum rtx_code, rtx, rtx);
 extern void rs6000_expand_atomic_compare_and_swap (rtx op[]);
 extern rtx swap_endian_selector_for_mode (machine_mode mode);
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 58553ff66f4..24044f3a558 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -16145,7 +16145,7 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
OP_FALSE are two VEC_COND_EXPR operands.  CC_OP0 and CC_OP1 are the two
operands for the relation operation COND.  */

-int
+static int
 rs6000_emit_vector_cond_expr (rtx dest, rtx op_true, rtx op_false,
  rtx cond, rtx cc_op0, rtx cc_op1)
 {
diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index 59489e06839..0d3e0a24e11 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -331,166 +331,6 @@ (define_expand "vector_copysign3"
 })



-;; Vector comparisons
-(define_expand "vcond"
-  [(set (match_operand:VEC_F 0 "vfloat_operand")
-   (if_then_else:VEC_F
-(match_operator 3 "comparison_operator"
-[(match_operand:VEC_F 4 "vfloat_operand")
- (match_operand:VEC_F 5 "vfloat_operand")])
-(match_operand:VEC_F 1 "vfloat_operand")
-(match_operand:VEC_F 2 "vfloat_operand")))]
-  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)"
-{
-  if (rs6000_emit_vector_cond_expr (operands[0], operands[1], operands[2],
-   operands[3], operands[4], operands[5]))
-DONE;
-  else
-gcc_unreachable ();
-})
-
-(define_expand "vcond"
-  [(set (match_operand:VEC_I 0 "vint_operand")
-   (if_then_else:VEC_I
-(match_operator 3 "comparison_operator"
-[(match_operand:VEC_I 4 "vint_operand")
- (match_operand:VEC_I 5 "vint_operand")])
-(match_operand:VEC_I 1 "vector_int_reg_or_same_bit")
-(match_operand:VEC_I 2 "vector_int_reg_or_same_bit")))]
-  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)"
-{
-  if (rs6000_emit_vector_cond_expr (operands[0], operands[1], operands[2],
-   operands[3], operands[4], operands[5]))
-DONE;
-  else
-gcc_unreachable ();
-})
-
-(define_expand "vcondv4sfv4si"
-  [(set (match_operand:V4SF 0 "vfloat_operand")
-   (if_then_else:V4SF
-(match_operator 3 "comparison_operator"
-[(match_operand:V4SI 4 "vint_operand")
- (match_operand:V4SI 5 "vint_operand")])
-(match_operand:V4SF 1 "vfloat_operand")
-(match_operand:V4SF 2 "vfloat_operand")))]
-  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (V4SFmode)
-   && VECTOR_UNIT_ALTIVEC_P (V4SImode)"
-{
-  if (rs6000_emit_vector_cond_expr (operands[0], operands[1], operands[2],
-   operands[3], operands[4], operands[5]))
-DONE;
-  else
-gcc_unreachable ();
-})
-
-(define_expand "vcondv4siv4sf"
-  [(set (match_operand:V4SI 0 "vint_operand")
-   (if_then_else:V4SI
-(match_operator 3 "comparison_operator"
-[(match_operand:V4SF 4 "vfloat_operand")
- (match_operand:V4SF 5 "vfloat_operand")])
-(match_operand:V4SI 1 "vint_opera

Re: Support bitwise and/andnot/abs/neg/copysign/xorsign op for V8BF/V16BF/V32BF

2024-07-07 Thread Hongtao Liu
On Thu, Jul 4, 2024 at 11:24 AM Levy Hsu  wrote:
>
> This patch extends support for BF16 vector operations in GCC, including 
> bitwise AND, ANDNOT, ABS, NEG, COPYSIGN, and XORSIGN for V8BF, V16BF, and 
> V32BF modes.
> Bootstrapped and tested on x86_64-linux-gnu. ok for trunk?
>
> gcc/ChangeLog:
>
> * config/i386/i386-expand.cc (ix86_expand_fp_absneg_operator): Add 
> VBF modes.
> (ix86_expand_copysign): Ditto.
> (ix86_expand_xorsign): Ditto.
> * config/i386/i386.cc (ix86_build_const_vector): Ditto.
> (ix86_build_signbit_mask): Ditto.
> * config/i386/sse.md: Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/avx2-bf16-vec-absneg.c: New test.
> * gcc.target/i386/avx512f-bf16-vec-absneg.c: New test.
>
> ---
>  gcc/config/i386/i386-expand.cc| 76 +++--
>  gcc/config/i386/i386.cc   |  6 ++
>  gcc/config/i386/sse.md| 37 +---
>  .../gcc.target/i386/avx2-bf16-vec-absneg.c| 85 +++
>  .../gcc.target/i386/avx512f-bf16-vec-absneg.c | 66 ++
>  5 files changed, 234 insertions(+), 36 deletions(-)
>  create mode 100755 gcc/testsuite/gcc.target/i386/avx2-bf16-vec-absneg.c
>  create mode 100755 gcc/testsuite/gcc.target/i386/avx512f-bf16-vec-absneg.c
>
> diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> index 5c29ee1353f..46d13a55e6a 100644
> --- a/gcc/config/i386/i386-expand.cc
> +++ b/gcc/config/i386/i386-expand.cc
> @@ -2175,20 +2175,28 @@ ix86_expand_fp_absneg_operator (enum rtx_code code, 
> machine_mode mode,
>machine_mode vmode = mode;
>rtvec par;
>
> -  if (vector_mode || mode == TFmode || mode == HFmode)
> -{
> -  use_sse = true;
> -  if (mode == HFmode)
> -   vmode = V8HFmode;
> -}
> -  else if (TARGET_SSE_MATH)
> -{
> -  use_sse = SSE_FLOAT_MODE_P (mode);
> -  if (mode == SFmode)
> -   vmode = V4SFmode;
> -  else if (mode == DFmode)
> -   vmode = V2DFmode;
> -}
> +  switch (mode)
> +  {
> +  case HFmode:
> +use_sse = true;
> +vmode = V8HFmode;
> +break;
> +  case BFmode:
> +use_sse = true;
> +vmode = V8BFmode;
> +break;
> +  case SFmode:
> +use_sse = TARGET_SSE_MATH;
use_sse = TARGET_SSE_MATH && TARGET_SSE;
> +vmode = V4SFmode;
> +break;
> +  case DFmode:
> +use_sse = TARGET_SSE_MATH;
use_sse = TARGET_SSE_MATH && TARGET_SSE2;
Others LGTM.
> +vmode = V2DFmode;
> +break;
> +  default:
> +use_sse = vector_mode || mode == TFmode;
> +break;
> +  }
>
>dst = operands[0];
>src = operands[1];
> @@ -2321,16 +2329,26 @@ ix86_expand_copysign (rtx operands[])
>
>mode = GET_MODE (operands[0]);
>
> -  if (mode == HFmode)
> +  switch (mode)
> +  {
> +  case HFmode:
>  vmode = V8HFmode;
> -  else if (mode == SFmode)
> +break;
> +  case BFmode:
> +vmode = V8BFmode;
> +break;
> +  case SFmode:
>  vmode = V4SFmode;
> -  else if (mode == DFmode)
> +break;
> +  case DFmode:
>  vmode = V2DFmode;
> -  else if (mode == TFmode)
> +break;
> +  case TFmode:
>  vmode = mode;
> -  else
> -gcc_unreachable ();
> +break;
> +  default:
> +gcc_unreachable();
> +  }
>
>if (rtx_equal_p (operands[1], operands[2]))
>  {
> @@ -2391,14 +2409,24 @@ ix86_expand_xorsign (rtx operands[])
>
>mode = GET_MODE (dest);
>
> -  if (mode == HFmode)
> +  switch (mode)
> +  {
> +  case HFmode:
>  vmode = V8HFmode;
> -  else if (mode == SFmode)
> +break;
> +  case BFmode:
> +vmode = V8BFmode;
> +break;
> +  case SFmode:
>  vmode = V4SFmode;
> -  else if (mode == DFmode)
> +break;
> +  case DFmode:
>  vmode = V2DFmode;
> -  else
> +break;
> +  default:
>  gcc_unreachable ();
> +break;
> +  }
>
>temp = gen_reg_rtx (vmode);
>mask = ix86_build_signbit_mask (vmode, 0, 0);
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index d4ccc24be6e..b5768a65e52 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -16353,6 +16353,9 @@ ix86_build_const_vector (machine_mode mode, bool 
> vect, rtx value)
>  case E_V8DFmode:
>  case E_V4DFmode:
>  case E_V2DFmode:
> +case E_V32BFmode:
> +case E_V16BFmode:
> +case E_V8BFmode:
>n_elt = GET_MODE_NUNITS (mode);
>v = rtvec_alloc (n_elt);
>scalar_mode = GET_MODE_INNER (mode);
> @@ -16389,6 +16392,9 @@ ix86_build_signbit_mask (machine_mode mode, bool 
> vect, bool invert)
>  case E_V8HFmode:
>  case E_V16HFmode:
>  case E_V32HFmode:
> +case E_V32BFmode:
> +case E_V16BFmode:
> +case E_V8BFmode:
>vec_mode = mode;
>imode = HImode;
>break;
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 0be2dcd8891..1703bbb4250 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -351,7 +351,9 @@
>
>  ;; 128-, 256- and 512-bit float vector modes for bitwis

Re: [PATCH v2] RISC-V: Implement .SAT_TRUNC for vector unsigned int

2024-07-07 Thread juzhe.zh...@rivai.ai
+ if (double_mode == E_VOIDmode && quad_mode == E_VOIDmode)
Why do we have VOID mode?  I still don't understand the code.


juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2024-07-08 12:48
To: gcc-patches
CC: juzhe.zhong; kito.cheng; jeffreyalaw; rdapp.gcc; Pan Li
Subject: [PATCH v2] RISC-V: Implement .SAT_TRUNC for vector unsigned int
From: Pan Li 
 
This patch implements .SAT_TRUNC for the RISC-V backend, with the
help of the RVV Vector Narrowing Fixed-Point Clip Instructions.
The following SEW conversions are supported:
 
* e64 => e32
* e64 => e16
* e64 => e8
* e32 => e16
* e32 => e8
* e16 => e8
 
Take the example below to see the changes to the asm.
Form 1:
  #define DEF_VEC_SAT_U_TRUNC_FMT_1(NT, WT) \
  void __attribute__((noinline))\
  vec_sat_u_trunc_##NT##_##WT##_fmt_1 (NT *out, WT *in, unsigned limit) \
  { \
unsigned i; \
for (i = 0; i < limit; i++) \
  { \
WT x = in[i];   \
bool overflow = x > (WT)(NT)(-1);   \
out[i] = ((NT)x) | (NT)-overflow;   \
  } \
  }
 
DEF_VEC_SAT_U_TRUNC_FMT_1 (uint32_t, uint64_t)
 
Before this patch:
.L3:
  vsetvli  a5,a2,e64,m1,ta,ma
  vle64.v  v1,0(a1)
  vmsgtu.vv  v0,v1,v2
  vsetvli  zero,zero,e32,mf2,ta,ma
  vncvt.x.x.w  v1,v1
  vmerge.vim   v1,v1,-1,v0
  vse32.v  v1,0(a0)
  slli a4,a5,3
  add  a1,a1,a4
  slli a4,a5,2
  add  a0,a0,a4
  sub  a2,a2,a5
  bne  a2,zero,.L3
 
After this patch:
.L3:
  vsetvli  a5,a2,e32,mf2,ta,ma
  vle64.v  v1,0(a1)
  vnclipu.wi   v1,v1,0
  vse32.v  v1,0(a0)
  slli a4,a5,3
  add  a1,a1,a4
  slli a4,a5,2
  add  a0,a0,a4
  sub  a2,a2,a5
  bne  a2,zero,.L3
 
Passed the rv64gcv full regression tests.
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (ustrunc2): Add
new pattern for double truncation.
(ustrunc2): Ditto but for quad truncation.
(ustrunc2): Ditto but for oct truncation.
* config/riscv/riscv-protos.h (expand_vec_ustrunc): Add new decl
to expand vec ustrunc.
* config/riscv/riscv-v.cc (expand_vec_double_ustrunc): Add new
func impl to expand vector double ustrunc.
(expand_vec_quad_ustrunc): Ditto but for quad.
(expand_vec_oct_ustrunc): Ditto but for oct.
(expand_vec_ustrunc): Add new func impl to expand vector ustrunc.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h: Add helper
test macros.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_data.h: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-3.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-4.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-5.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-6.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-3.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-4.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-5.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-6.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_unary_vv_run.h: New test.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/autovec.md   |  34 ++
gcc/config/riscv/riscv-protos.h   |   1 +
gcc/config/riscv/riscv-v.cc   |  54 +++
.../riscv/rvv/autovec/binop/vec_sat_arith.h   |  22 +
.../riscv/rvv/autovec/unop/vec_sat_data.h | 394 ++
.../rvv/autovec/unop/vec_sat_u_trunc-1.c  |  19 +
.../rvv/autovec/unop/vec_sat_u_trunc-2.c  |  21 +
.../rvv/autovec/unop/vec_sat_u_trunc-3.c  |  23 +
.../rvv/autovec/unop/vec_sat_u_trunc-4.c  |  19 +
.../rvv/autovec/unop/vec_sat_u_trunc-5.c  |  21 +
.../rvv/autovec/unop/vec_sat_u_trunc-6.c  |  19 +
.../rvv/autovec/unop/vec_sat_u_trunc-run-1.c  |  16 +
.../rvv/autovec/unop/vec_sat_u_trunc-run-2.c  |  16 +
.../rvv/autovec/unop/vec_sat_u_trunc-run-3.c  |  16 +
.../rvv/autovec/unop/vec_sat_u_trunc-run-4.c  |  16 +
.../rvv/autovec/unop/vec_sat_u_trunc-run-5.c  |  16 +
.../rvv/autovec/unop/vec_sat_u_trunc-run-6.c  |  16 +
.../rvv/autovec/unop/vec_sat_unary_vv_run.h   |  23 +
18 files changed, 746 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_data.h
create mode 100644 
gcc/testsuite/gcc.target/

Re: Re: [PATCH 0/2] fix RISC-V zcmp popretz [PR113715]

2024-07-07 Thread Fei Gao
On 2024-07-07 22:53  Jeff Law  wrote:
>
>
>
>On 6/8/24 2:36 PM, Jeff Law wrote:
>>
>>
>> On 6/5/24 8:42 PM, Fei Gao wrote:
>>
>>>> But let's back up and get a good explanation of what the problem is.
>>>> Based on patch 2/2 it looks like we have lost an assignment to the
>>>> return register.
>>>>
>>>> To someone not familiar with this code, it sounds to me like we've made
>>>> a mistake earlier and we're now defining a hook that lets us go back and
>>>> fix that earlier mistake.   I'm probably wrong, but so far that's what
>>>> it sounds like.
>>> Hi Jeff
>>>
>>> You're right. Let me rephrase patch 2/2 with more details. Search for
>>> /*feigao to locate the point I'm trying to explain.
>>>
>>> code snippets from gcc/function.cc
>>> void
>>> thread_prologue_and_epilogue_insns (void)
>>> {
>>> ...
>>>    /*feigao:
>>>          targetm.gen_epilogue () is called here to generate epilogue
>>> sequence.
>>> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=b27d323a368033f0b37e93c57a57a35fd9997864
>>> Commit above tries in targetm.gen_epilogue () to detect if
>>> there's li    a0,0 insn at the end of insn chain, if so, cm.popret
>>> is replaced by cm.popretz and li    a0,0 insn is deleted.
>> So that seems like the critical issue.  Generation of the prologue/
>> epilogue really shouldn't be changing other instructions in the
>> instruction stream.  I'm not immediately aware of another target that
>> does that, and it seems like a rather risky thing to do.
>>
>>
>> It looks like the cm.popretz's RTL exposes the assignment to a0 and
>> there's a DCE pass that runs after insertion of the prologue/epilogue.
>> So I would suggest leaving the assignment to a0 in the RTL chain and see
>> if the later DCE pass after prologue generation eliminates the redundant
>> assignment.  That seems a lot cleaner.
>So I looked at this in a bit more detail.  I'm going to explicitly
>reject this patch now.
>
>The removal of the set to a0 in riscv_gen_multi_pop_insn looks wrong on
>multiple levels.  I don't think you have enough context in that routine
>or its callers to know if it's safe, i.e. given this fragment of RTL:
>
>> (call_insn 12 11 13 3 (parallel [
>> (call (mem:SI (symbol_ref:SI ("test_1") [flags 0x41] 
>>) [0 test_1 S4 A32])
>> (const_int 0 [0]))
>> (use (unspec:SI [
>> (const_int 0 [0])
>> ] UNSPEC_CALLEE_CC))
>> (clobber (reg:SI 1 ra))
>> ]) "j.c":14:9 441 {call_internal}
>>  (expr_list:REG_CALL_DECL (symbol_ref:SI ("test_1") [flags 0x41] 
>>)
>> (nil))
>> (expr_list:SI (use (reg:SI 10 a0))
>> (nil)))
>>
>> (code_label 13 12 14 4 2 (nil) [1 uses])
>>
>> (note 14 13 19 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
>>
>> (insn 19 14 20 4 (set (reg/i:SI 10 a0)
>> (const_int 0 [0])) "j.c":18:1 276 {*movsi_internal}
>>  (nil))
>>
>> (insn 20 19 24 4 (use (reg/i:SI 10 a0)) "j.c":18:1 -1
>>  (nil))
>
>
>You delete insns 19 and 20 resulting in this:
>
>> (call_insn 12 11 13 3 (parallel [
>> (call (mem:SI (symbol_ref:SI ("test_1") [flags 0x41] 
>>) [0 test_1 S4 A32])
>> (const_int 0 [0]))
>> (use (unspec:SI [
>> (const_int 0 [0])
>> ] UNSPEC_CALLEE_CC))
>> (clobber (reg:SI 1 ra))
>> ]) "j.c":14:9 441 {call_internal}
>>  (expr_list:REG_CALL_DECL (symbol_ref:SI ("test_1") [flags 0x41] 
>>)
>> (nil))
>> (expr_list:SI (use (reg:SI 10 a0))
>> (nil)))
>>
>> (code_label 13 12 14 4 2 (nil) [1 uses])
>>
>> (note 14 13 24 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
>>
>> (note 24 14 0 NOTE_INSN_DELETED)
>
>Which is incorrect/inconsistent RTL.  And as I've noted before, it's
>conceptually wrong for the backend code to be removing insns from the
>insn chain during prologue/epilogue generation.
>
>I realize you're trying to use a hook to limit how this impacts other
>targets, but if you're making a bad decision in the RISC-V backend code,
>working around it with a target hook rather than fixing the core problem
>in the RISC-V backend just makes the whole situation worse.
>
>My suggestion is this.  Leave the assignment to a0 and use alone.  That's
>likely going to result in some kind of code size regression, but not a
>correctness regression.  Then address the code size regressions with the
>invariant that prologue/epilogue generation must not change existing
>insns on the insn chain. 

Hi Jeff

Thanks for your patience.

Got your point: never remove insns from the
insn chain during prologue/epilogue generation.

As you suggested, I will fix this issue by leaving the assignment to a0 and
the use insn in place, together with cm.popret.  Then optimize to cm.popretz
in the correct pass in the future if possible.

>jeff

[PATCH] Rename __{float, double}_u to __x86_{float, double}_u to avoid polluting the namespace.

2024-07-07 Thread liuhongt
I have a build failure on NetBSD as the namespace pollution avoidance causes
a direct hit with the system /usr/include/math.h
===

In file included from /usr/src/local/gcc/obj/gcc/include/emmintrin.h:31,
                 from /usr/src/local/gcc/obj/x86_64-unknown-netbsd10.99/libstdc++-v3/include/ext/random:45,
                 from /usr/src/local/gcc/libstdc++-v3/include/precompiled/extc++.h:65:
/usr/src/local/gcc/obj/gcc/include/xmmintrin.h:75:15: error: conflicting declaration 'typedef float __float_u'
   75 | typedef float __float_u __attribute__ ((__may_alias__, __aligned__ (1)));
      |               ^
In file included from /usr/src/local/gcc/obj/x86_64-unknown-netbsd10.99/libstdc++-v3/include/cmath:47,
                 from /usr/src/local/gcc/obj/x86_64-unknown-netbsd10.99/libstdc++-v3/include/x86_64-unknown-netbsd10.99/bits/stdc++.h:114,
                 from /usr/src/local/gcc/libstdc++-v3/include/precompiled/extc++.h:32:
/usr/src/local/gcc/obj/gcc/include-fixed/math.h:49:7: note: previous declaration as 'union __float_u'
   49 | union __float_u {

As pinski suggested in #c2, use __x86_float_u, which seems less likely to
pollute the namespace.

Bootstrapped and regtested on x86_64-pc-linux{-m32,}.
Ready to push to trunk if there are no other concerns.
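
For readers unfamiliar with why the typedef exists at all: it lets the
scalar load/store intrinsics access a float or double through a pointer that
may be misaligned and may alias other types.  Identifiers beginning with two
underscores are reserved for the implementation, which is how both the
compiler header and the system math.h came to claim __float_u.  The sketch
below shows the same access pattern under a deliberately different,
hypothetical name (demo_float_u, demo_load_float and demo_store_float are
not in the patch); it assumes a compiler that supports the GNU __may_alias__
and __aligned__ attributes.

```c
/* Hypothetical stand-in for the renamed typedef; the patch itself uses
   __x86_float_u.  aligned (1) drops the natural alignment requirement,
   may_alias exempts accesses from type-based alias analysis.  */
typedef float demo_float_u __attribute__ ((__may_alias__, __aligned__ (1)));

/* Read a float from an address with no alignment guarantee.  */
float
demo_load_float (const void *p)
{
  return *(const demo_float_u *) p;
}

/* Write a float to an address with no alignment guarantee.  */
void
demo_store_float (void *p, float v)
{
  *(demo_float_u *) p = v;
}
```

Storing 1.5f at an odd offset inside a byte buffer and loading it back
through these helpers returns 1.5f; the rename changes only the identifier,
not this behaviour.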

gcc/ChangeLog:

PR target/115796
* config/i386/emmintrin.h (__double_u): Rename to ..
(__x86_double_u): .. this.
(_mm_load_sd): Ditto.
(_mm_store_sd): Ditto.
(_mm_loadh_pd): Ditto.
(_mm_loadl_pd): Ditto.
* config/i386/xmmintrin.h (__float_u): Rename to ..
(__x86_float_u): .. this.
(_mm_load_ss): Ditto.
(_mm_store_ss): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr115796.c: New test.
---
 gcc/config/i386/emmintrin.h  | 10 +-
 gcc/config/i386/xmmintrin.h  |  6 +++---
 gcc/testsuite/gcc.target/i386/pr115796.c | 24 
 3 files changed, 32 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr115796.c

diff --git a/gcc/config/i386/emmintrin.h b/gcc/config/i386/emmintrin.h
index d58030e5c4f..a3fcd7a869c 100644
--- a/gcc/config/i386/emmintrin.h
+++ b/gcc/config/i386/emmintrin.h
@@ -56,7 +56,7 @@ typedef double __m128d __attribute__ ((__vector_size__ (16), 
__may_alias__));
 /* Unaligned version of the same types.  */
 typedef long long __m128i_u __attribute__ ((__vector_size__ (16), 
__may_alias__, __aligned__ (1)));
 typedef double __m128d_u __attribute__ ((__vector_size__ (16), __may_alias__, 
__aligned__ (1)));
-typedef double __double_u __attribute__ ((__may_alias__, __aligned__ (1)));
+typedef double __x86_double_u __attribute__ ((__may_alias__, __aligned__ (1)));
 
 /* Create a selector for use with the SHUFPD instruction.  */
 #define _MM_SHUFFLE2(fp1,fp0) \
@@ -146,7 +146,7 @@ _mm_load1_pd (double const *__P)
 extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
 _mm_load_sd (double const *__P)
 {
-  return __extension__ (__m128d) { *(__double_u *)__P, 0.0 };
+  return __extension__ (__m128d) { *(__x86_double_u *)__P, 0.0 };
 }
 
 extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
@@ -181,7 +181,7 @@ _mm_storeu_pd (double *__P, __m128d __A)
 extern __inline void __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
 _mm_store_sd (double *__P, __m128d __A)
 {
-  *(__double_u *)__P = ((__v2df)__A)[0] ;
+  *(__x86_double_u *)__P = ((__v2df)__A)[0] ;
 }
 
 extern __inline double __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
@@ -974,13 +974,13 @@ _mm_unpacklo_pd (__m128d __A, __m128d __B)
 extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
 _mm_loadh_pd (__m128d __A, double const *__B)
 {
-  return __extension__ (__m128d) { ((__v2df)__A)[0], *(__double_u*)__B };
+  return __extension__ (__m128d) { ((__v2df)__A)[0], *(__x86_double_u*)__B };
 }
 
 extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
 _mm_loadl_pd (__m128d __A, double const *__B)
 {
-  return __extension__ (__m128d) { *(__double_u*)__B, ((__v2df)__A)[1] };
+  return __extension__ (__m128d) { *(__x86_double_u*)__B, ((__v2df)__A)[1] };
 }
 
 extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
diff --git a/gcc/config/i386/xmmintrin.h b/gcc/config/i386/xmmintrin.h
index 37e5a94cf10..7f10f96d72c 100644
--- a/gcc/config/i386/xmmintrin.h
+++ b/gcc/config/i386/xmmintrin.h
@@ -72,7 +72,7 @@ typedef float __m128 __attribute__ ((__vector_size__ (16), 
__may_alias__));
 
 /* Unaligned version of the same type.  */
 typedef float __m128_u __attribute__ ((__vector_size__ (16), __may_alias__, 
__aligned__ (1)));
-typedef float __float_u __attribute__ ((__may_alias__, __aligned__ (1)));
+typedef float __x86_float_u __

Re: Re: [PATCH 2/2] [RISC-V] c implies zca, and conditionally zcf & zcd

2024-07-07 Thread Fei Gao
On 2024-07-06 22:15  Jeff Law  wrote:
>
>
>
>On 7/5/24 3:56 AM, Fei Gao wrote:
>> According to Zc-1.0.4-3.pdf from
>> https://github.com/riscvarchive/riscv-code-size-reduction/releases/tag/v1.0.4-3
>> The rule is that:
>> 1. C always implies Zca
>> 2. C+F implies Zcf (RV32 only)
>> 3. C+D implies Zcd
>>
>> gcc/ChangeLog:
>>
>>  * common/config/riscv/riscv-common.cc:
>>  c implies zca, and conditionally zcf & zcd.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * gcc.target/riscv/attribute-15.c: adapt TC.
>>  * gcc.target/riscv/attribute-18.c: likewise.
>>  * gcc.target/riscv/pr110696.c: likewise.
>>  * gcc.target/riscv/rvv/base/abi-callee-saved-1-zcmp.c: likewise.
>>  * gcc.target/riscv/rvv/base/abi-callee-saved-2-zcmp.c: likewise.
>>  * gcc.target/riscv/rvv/base/pr114352-1.c: likewise.
>>  * gcc.target/riscv/rvv/base/pr114352-3.c: likewise.
>>  * gcc.target/riscv/arch-39.c: New test.
>>  * gcc.target/riscv/arch-40.c: New test.
>This failed to apply in the RISC-V tester.  It's not clear to me why.
>
>What I would suggest is get 1/2 installed, then wait for the tester to
>pick up 1/2, then resubmit 2/2. 

Thanks for your review and advice.
Patch 1/2 has been installed and I will resend patch 2/2 in a few days
to retrigger CI.

BR
Fei

>
>Conceptually this is probably OK, but I'd like to see it run through the
>pre-commit testing before final ACK.
>
>
>Jeff
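
The three implication rules quoted above can be sketched as a small standalone function. This is only an illustration of the rule set under stated assumptions; the struct and function names are hypothetical and do not reflect how riscv-common.cc implements the implication.

```c
#include <stdbool.h>

/* Hypothetical result type: which Zc* extensions are implied.  */
struct demo_zc_implied
{
  bool zca;
  bool zcf;
  bool zcd;
};

/* Apply the quoted rules:
   1. C always implies Zca.
   2. C+F implies Zcf, on RV32 only.
   3. C+D implies Zcd.  */
static struct demo_zc_implied
demo_zc_implied_by_c (bool has_c, bool has_f, bool has_d, int xlen)
{
  struct demo_zc_implied r = { false, false, false };

  if (!has_c)
    return r;

  r.zca = true;
  r.zcf = has_f && xlen == 32;
  r.zcd = has_d;
  return r;
}
```

For example, an rv64 target with C, F and D would get Zca and Zcd but not Zcf under these rules.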

[PATCH v2] RISC-V: Implement .SAT_TRUNC for vector unsigned int

2024-07-07 Thread pan2 . li
From: Pan Li 

This patch implements .SAT_TRUNC for the RISC-V backend
with the help of the RVV Vector Narrowing Fixed-Point
Clip Instructions.  The following SEW conversions are supported:

* e64 => e32
* e64 => e16
* e64 => e8
* e32 => e16
* e32 => e8
* e16 => e8

Take the example below to see the changes to the asm.
Form 1:
  #define DEF_VEC_SAT_U_TRUNC_FMT_1(NT, WT) \
  void __attribute__((noinline)) \
  vec_sat_u_trunc_##NT##_##WT##_fmt_1 (NT *out, WT *in, unsigned limit) \
  { \
    unsigned i; \
    for (i = 0; i < limit; i++) \
      { \
        WT x = in[i]; \
        bool overflow = x > (WT)(NT)(-1); \
        out[i] = ((NT)x) | (NT)-overflow; \
      } \
  }

DEF_VEC_SAT_U_TRUNC_FMT_1 (uint32_t, uint64_t)

Before this patch:
.L3:
  vsetvli  a5,a2,e64,m1,ta,ma
  vle64.v  v1,0(a1)
  vmsgtu.vv    v0,v1,v2
  vsetvli  zero,zero,e32,mf2,ta,ma
  vncvt.x.x.w  v1,v1
  vmerge.vim   v1,v1,-1,v0
  vse32.v  v1,0(a0)
  slli a4,a5,3
  add  a1,a1,a4
  slli a4,a5,2
  add  a0,a0,a4
  sub  a2,a2,a5
  bne  a2,zero,.L3

After this patch:
.L3:
  vsetvli  a5,a2,e32,mf2,ta,ma
  vle64.v  v1,0(a1)
  vnclipu.wi   v1,v1,0
  vse32.v  v1,0(a0)
  slli a4,a5,3
  add  a1,a1,a4
  slli a4,a5,2
  add  a0,a0,a4
  sub  a2,a2,a5
  bne  a2,zero,.L3

Passed the rv64gcv full regression tests.
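
As a scalar reference for what the loop body computes, here is a minimal sketch for the uint64_t to uint32_t case. The first function mirrors the branchless form of the macro above; the second is a plain clamp that it should agree with. Both function names are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Branchless form, following DEF_VEC_SAT_U_TRUNC_FMT_1 (uint32_t, uint64_t):
   when x does not fit, OR-ing with all ones yields UINT32_MAX.  */
static uint32_t
demo_sat_u_trunc_fmt_1 (uint64_t x)
{
  bool overflow = x > (uint64_t) (uint32_t) (-1);
  return ((uint32_t) x) | (uint32_t) -overflow;
}

/* Reference form: clamp to the largest value of the narrow type.  */
static uint32_t
demo_sat_u_trunc_ref (uint64_t x)
{
  return x > UINT32_MAX ? UINT32_MAX : (uint32_t) x;
}
```

The single vnclipu.wi with a shift amount of 0 in the "after" listing performs this saturating narrowing per element, which is why the compare, narrow and merge sequence is no longer needed.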

gcc/ChangeLog:

* config/riscv/autovec.md (ustrunc2): Add
new pattern for double truncation.
(ustrunc2): Ditto but for quad truncation.
(ustrunc2): Ditto but for oct truncation.
* config/riscv/riscv-protos.h (expand_vec_ustrunc): Add new decl
to expand vec ustrunc.
* config/riscv/riscv-v.cc (expand_vec_double_ustrunc): Add new
func impl to expand vector double ustrunc.
(expand_vec_quad_ustrunc): Ditto but for quad.
(expand_vec_oct_ustrunc): Ditto but for oct.
(expand_vec_ustrunc): Add new func impl to expand vector ustrunc.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h: Add helper
test macros.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_data.h: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-3.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-4.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-5.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-6.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-3.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-4.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-5.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-6.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_unary_vv_run.h: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/autovec.md   |  34 ++
 gcc/config/riscv/riscv-protos.h   |   1 +
 gcc/config/riscv/riscv-v.cc   |  54 +++
 .../riscv/rvv/autovec/binop/vec_sat_arith.h   |  22 +
 .../riscv/rvv/autovec/unop/vec_sat_data.h | 394 ++
 .../rvv/autovec/unop/vec_sat_u_trunc-1.c  |  19 +
 .../rvv/autovec/unop/vec_sat_u_trunc-2.c  |  21 +
 .../rvv/autovec/unop/vec_sat_u_trunc-3.c  |  23 +
 .../rvv/autovec/unop/vec_sat_u_trunc-4.c  |  19 +
 .../rvv/autovec/unop/vec_sat_u_trunc-5.c  |  21 +
 .../rvv/autovec/unop/vec_sat_u_trunc-6.c  |  19 +
 .../rvv/autovec/unop/vec_sat_u_trunc-run-1.c  |  16 +
 .../rvv/autovec/unop/vec_sat_u_trunc-run-2.c  |  16 +
 .../rvv/autovec/unop/vec_sat_u_trunc-run-3.c  |  16 +
 .../rvv/autovec/unop/vec_sat_u_trunc-run-4.c  |  16 +
 .../rvv/autovec/unop/vec_sat_u_trunc-run-5.c  |  16 +
 .../rvv/autovec/unop/vec_sat_u_trunc-run-6.c  |  16 +
 .../rvv/autovec/unop/vec_sat_unary_vv_run.h   |  23 +
 18 files changed, 746 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_data.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/

[PATCH] c++/modules: Keep entity mapping info across duplicate_decls [PR99241]

2024-07-07 Thread Nathaniel Shead
Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk/14?

-- >8 --

When duplicate_decls finds a match with an existing imported
declaration, it clears DECL_LANG_SPECIFIC of the olddecl and replaces it
with the contents of newdecl; this clears DECL_MODULE_ENTITY_P causing
an ICE if the same declaration is imported again later.

This fixes the issue by ensuring that the flag is transferred to newdecl
before clearing so that it ends up on olddecl again.

For future-proofing we also do the same with DECL_MODULE_KEYED_DECLS_P,
though because we don't yet support textual redefinition merging we
can't yet test this works as intended.  I don't expect it's possible for
a new declaration already to have extra keyed decls mismatching that of
the old declaration though, so I don't do anything with 'keyed_map' at
this time.

PR c++/99241

gcc/cp/ChangeLog:

* decl.cc (duplicate_decls): Merge module entity information.

gcc/testsuite/ChangeLog:

* g++.dg/modules/pr99241_a.H: New test.
* g++.dg/modules/pr99241_b.H: New test.
* g++.dg/modules/pr99241_c.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/decl.cc   | 17 +
 gcc/testsuite/g++.dg/modules/pr99241_a.H |  3 +++
 gcc/testsuite/g++.dg/modules/pr99241_b.H |  3 +++
 gcc/testsuite/g++.dg/modules/pr99241_c.C |  5 +
 4 files changed, 28 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/modules/pr99241_a.H
 create mode 100644 gcc/testsuite/g++.dg/modules/pr99241_b.H
 create mode 100644 gcc/testsuite/g++.dg/modules/pr99241_c.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 29616100cfe..b3a770df926 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -3149,6 +3149,23 @@ duplicate_decls (tree newdecl, tree olddecl, bool 
hiding, bool was_hidden)
   if (TREE_CODE (newdecl) == FIELD_DECL)
 DECL_PACKED (olddecl) = DECL_PACKED (newdecl);
 
+  /* Merge module entity mapping information.  */
+  if (modules_p ())
+{
+  tree old_nontmpl = STRIP_TEMPLATE (olddecl);
+  if (DECL_LANG_SPECIFIC (old_nontmpl)
+ && (DECL_MODULE_ENTITY_P (old_nontmpl)
+ || DECL_MODULE_KEYED_DECLS_P (old_nontmpl)))
+   {
+ tree new_nontmpl = STRIP_TEMPLATE (newdecl);
+ retrofit_lang_decl (new_nontmpl);
+ DECL_MODULE_ENTITY_P (new_nontmpl)
+   = DECL_MODULE_ENTITY_P (old_nontmpl);
+ DECL_MODULE_KEYED_DECLS_P (new_nontmpl)
+   = DECL_MODULE_KEYED_DECLS_P (old_nontmpl);
+   }
+}
+
   /* The DECL_LANG_SPECIFIC information in OLDDECL will be replaced
  with that from NEWDECL below.  */
   if (DECL_LANG_SPECIFIC (olddecl))
diff --git a/gcc/testsuite/g++.dg/modules/pr99241_a.H 
b/gcc/testsuite/g++.dg/modules/pr99241_a.H
new file mode 100644
index 000..c7031f08eb5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr99241_a.H
@@ -0,0 +1,3 @@
+// { dg-additional-options "-fmodule-header" }
+// { dg-module-cmi {} }
+void terminate();
diff --git a/gcc/testsuite/g++.dg/modules/pr99241_b.H 
b/gcc/testsuite/g++.dg/modules/pr99241_b.H
new file mode 100644
index 000..c7031f08eb5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr99241_b.H
@@ -0,0 +1,3 @@
+// { dg-additional-options "-fmodule-header" }
+// { dg-module-cmi {} }
+void terminate();
diff --git a/gcc/testsuite/g++.dg/modules/pr99241_c.C 
b/gcc/testsuite/g++.dg/modules/pr99241_c.C
new file mode 100644
index 000..7f2b1bb43ea
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/pr99241_c.C
@@ -0,0 +1,5 @@
+// { dg-additional-options "-fmodules-ts -fno-module-lazy" }
+
+import "pr99241_a.H";
+void terminate();
+import "pr99241_b.H";
-- 
2.43.2



Re: [PATCH V2] x86: Update branch hint for Redwood Cove.

2024-07-07 Thread Hongtao Liu
On Thu, Jul 4, 2024 at 9:30 AM liuhongt  wrote:
>
> From: "H.J. Lu" 
>
> >The above reads like it would be worth splitting branch_prediction_hints
> >into branch_prediction_hints_taken and branch_prediction_hints_not_taken
> >given not-taken is the default and thus will just increase code size?
> >According to Intel® 64 and IA-32 Architectures Optimization Reference
> >Manual[1], Branch Hint is updated for Redwood Cove.
> Changed.
>
> cut from [1]-
> Starting with the Redwood Cove microarchitecture, if the predictor has
> no stored information about a branch, the branch has the Intel® SSE2
> branch taken hint (i.e., instruction prefix 3EH), When the codec
> decodes the branch, it flips the branch’s prediction from not-taken to
> taken. It then flushes the pipeline in front of it and steers this
> pipeline to fetch the taken path of the branch.
> cut end -
>
> Split tune branch_prediction_hints into branch_prediction_hints_taken
> and branch_prediction_hints_not_taken, always generate branch hint for
> conditional branches, both tunes are disabled by default.
>
> [1] 
> https://www.intel.com/content/www/us/en/content-details/821612/intel-64-and-ia-32-architectures-optimization-reference-manual-volume-1.html
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
Committed.
>
> gcc/
>
> * config/i386/i386.cc (ix86_print_operand): Always generate
> branch hint for conditional branches.
> * config/i386/i386.h (TARGET_BRANCH_PREDICTION_HINTS): Split
> into ..
> (TARGET_BRANCH_PREDICTION_HINTS_TAKEN): .. this, and ..
> (TARGET_BRANCH_PREDICTION_HINTS_NOT_TAKEN): .. this.
> * config/i386/x86-tune.def (X86_TUNE_BRANCH_PREDICTION_HINTS):
> Split into ..
> (X86_TUNE_BRANCH_PREDICTION_HINTS_TAKEN): .. this, and ..
> (X86_TUNE_BRANCH_PREDICTION_HINTS_NOT_TAKEN): .. this.
> ---
>  gcc/config/i386/i386.cc  | 29 +
>  gcc/config/i386/i386.h   |  6 --
>  gcc/config/i386/x86-tune.def | 13 +++--
>  3 files changed, 24 insertions(+), 24 deletions(-)
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 1f71ed04be6..ea9cb620f8d 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -14041,7 +14041,8 @@ ix86_print_operand (FILE *file, rtx x, int code)
>
> if (!optimize
> || optimize_function_for_size_p (cfun)
> -   || !TARGET_BRANCH_PREDICTION_HINTS)
> +   || (!TARGET_BRANCH_PREDICTION_HINTS_NOT_TAKEN
> +   && !TARGET_BRANCH_PREDICTION_HINTS_TAKEN))
>   return;
>
> x = find_reg_note (current_output_insn, REG_BR_PROB, 0);
> @@ -14050,25 +14051,13 @@ ix86_print_operand (FILE *file, rtx x, int code)
> int pred_val = profile_probability::from_reg_br_prob_note
>  (XINT (x, 0)).to_reg_br_prob_base ();
>
> -   if (pred_val < REG_BR_PROB_BASE * 45 / 100
> -   || pred_val > REG_BR_PROB_BASE * 55 / 100)
> - {
> -   bool taken = pred_val > REG_BR_PROB_BASE / 2;
> -   bool cputaken
> - = final_forward_branch_p (current_output_insn) == 0;
> -
> -   /* Emit hints only in the case default branch prediction
> -  heuristics would fail.  */
> -   if (taken != cputaken)
> - {
> -   /* We use 3e (DS) prefix for taken branches and
> -  2e (CS) prefix for not taken branches.  */
> -   if (taken)
> - fputs ("ds ; ", file);
> -   else
> - fputs ("cs ; ", file);
> - }
> - }
> +   bool taken = pred_val > REG_BR_PROB_BASE / 2;
> +   /* We use 3e (DS) prefix for taken branches and
> +  2e (CS) prefix for not taken branches.  */
> +   if (taken && TARGET_BRANCH_PREDICTION_HINTS_TAKEN)
> + fputs ("ds ; ", file);
> +   else if (!taken && TARGET_BRANCH_PREDICTION_HINTS_NOT_TAKEN)
> + fputs ("cs ; ", file);
>   }
> return;
>   }
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index 9ed225ec587..50ebed221dc 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -309,8 +309,10 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST];
>  #define TARGET_ZERO_EXTEND_WITH_AND \
> ix86_tune_features[X86_TUNE_ZERO_EXTEND_WITH_AND]
>  #define TARGET_UNROLL_STRLEN   ix86_tune_features[X86_TUNE_UNROLL_STRLEN]
> -#define TARGET_BRANCH_PREDICTION_HINTS \
> -   ix86_tune_features[X86_TUNE_BRANCH_PREDICTION_HINTS]
> +#define TARGET_BRANCH_PREDICTION_HINTS_NOT_TAKEN \
> +   ix86
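
The prefix selection in the quoted i386.cc hunk can be modelled as a small standalone function. This is a sketch, not the GCC code: the function name and the two boolean parameters are hypothetical stand-ins for the two new tune flags, and DEMO_BR_PROB_BASE is assumed to match GCC's REG_BR_PROB_BASE.

```c
#include <stdbool.h>

/* Assumption: same scale as GCC's REG_BR_PROB_BASE.  */
#define DEMO_BR_PROB_BASE 10000

/* Return the prefix text to print before a conditional branch:
   "ds ; " (3e) for a predicted-taken branch when taken hints are enabled,
   "cs ; " (2e) for a predicted-not-taken branch when not-taken hints are
   enabled, and the empty string otherwise.  */
static const char *
demo_branch_hint (int pred_val, bool hints_taken, bool hints_not_taken)
{
  bool taken = pred_val > DEMO_BR_PROB_BASE / 2;

  if (taken && hints_taken)
    return "ds ; ";
  if (!taken && hints_not_taken)
    return "cs ; ";
  return "";
}
```

Unlike the removed code, this no longer skips probabilities between 45% and 55% and no longer compares against the static forward/backward default; each direction is gated only by its own flag.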

Re: [PATCH] Fix MinGW option -mcrtdll=

2024-07-07 Thread Jonathan Yong

On 7/5/24 09:53, Pali Rohár wrote:

On Monday 24 June 2024 10:03:26 Jonathan Yong wrote:

On 6/23/24 16:40, Pali Rohár wrote:

Add missing msvcr40* and msvcrtd* cases to CPP_SPEC and
document missing _UCRT macro and msvcr71* case.

Fixes commit 453cb585f0f8673a5d69d1b420ffd4b3f53aca00.

Thanks, pushed to master branch.


Hello, thanks for quick reply.

I would like to ask one thing. I see that the mentioned commit was
branched into release gcc-14. Should this fixup commit be included
in release gcc-14 too?



Done, cherry-picked to gcc-14 branch.


[PING^0][Patch, rs6000, middle-end] v6: Add implementation for different targets for pair mem fusion

2024-07-07 Thread Ajit Agarwal
Ping! Please let me know if this is OK for trunk.

Thanks & Regards
Ajit


 Forwarded Message 
Subject: [Patch, rs6000, middle-end] v6: Add implementation for different 
targets for pair mem fusion
Date: Tue, 2 Jul 2024 14:15:02 +0530
From: Ajit Agarwal 
To: Alex Coplan , Richard Sandiford 
, Kewen.Lin , Segher 
Boessenkool , Michael Meissner 
, Peter Bergner , David Edelsohn 
, gcc-patches 

Hello All:

This version of the patch relaxes store fusion for more use cases.

Common infrastructure using generic code for pair mem fusion of different
targets.

rs6000 target-specific code implements the virtual functions defined by the generic code.

Target-specific code is added in rs6000-mem-fusion.cc.

Bootstrapped and regtested on powerpc64-linux-gnu.

Thanks & Regards
Ajit


rs6000, middle-end: Add implementation for different targets for pair mem fusion

Common infrastructure using generic code for pair mem fusion of different
targets.

rs6000 target-specific code implements the virtual functions defined by the generic code.

Target-specific code is added in rs6000-mem-fusion.cc.

2024-07-02  Ajit Kumar Agarwal  

gcc/ChangeLog:

* config/rs6000/rs6000-passes.def: New mem fusion pass
before pass_early_remat.
* pair-fusion.h: Add additional pure virtual function
required for rs6000 target implementation.
* pair-fusion.cc: Use of virtual functions for additional
virtual function added for rs6000 target.
* config/rs6000/rs6000-mem-fusion.cc: Add new pass.
Add target specific implementation for generic pure virtual
functions.
* config/rs6000/mma.md: Modify movoo machine description.
Add new machine description movoo1.
* config/rs6000/rs6000.cc: Modify rs6000_split_multireg_move
to expand movoo machine description for all constraints.
* config.gcc: Add new object file.
* config/rs6000/rs6000-protos.h: Add new prototype for mem
fusion pass.
* config/rs6000/t-rs6000: Add new rule.
* rtl-ssa/functions.h: Move out allocate function from private
to public and add get_m_temp_defs function.

gcc/testsuite/ChangeLog:

* g++.target/powerpc/mem-fusion.C: New test.
* g++.target/powerpc/mem-fusion-1.C: New test.
* gcc.target/powerpc/mma-builtin-1.c: Modify test.
---
 gcc/config.gcc|   2 +
 gcc/config/rs6000/mma.md  |  26 +-
 gcc/config/rs6000/rs6000-mem-fusion.cc| 708 ++
 gcc/config/rs6000/rs6000-passes.def   |   4 +-
 gcc/config/rs6000/rs6000-protos.h |   1 +
 gcc/config/rs6000/rs6000.cc   |  57 +-
 gcc/config/rs6000/rs6000.md   |   1 +
 gcc/config/rs6000/t-rs6000|   5 +
 gcc/pair-fusion.cc|  27 +-
 gcc/pair-fusion.h |  34 +
 gcc/rtl-ssa/functions.h   |  11 +-
 .../g++.target/powerpc/mem-fusion-1.C |  22 +
 gcc/testsuite/g++.target/powerpc/mem-fusion.C |  15 +
 .../gcc.target/powerpc/mma-builtin-1.c|   4 +-
 14 files changed, 890 insertions(+), 27 deletions(-)
 create mode 100644 gcc/config/rs6000/rs6000-mem-fusion.cc
 create mode 100644 gcc/testsuite/g++.target/powerpc/mem-fusion-1.C
 create mode 100644 gcc/testsuite/g++.target/powerpc/mem-fusion.C

diff --git a/gcc/config.gcc b/gcc/config.gcc
index bc45615741b..12f79a78177 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -524,6 +524,7 @@ powerpc*-*-*)
extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
extra_objs="${extra_objs} rs6000-builtins.o rs6000-builtin.o"
+   extra_objs="${extra_objs} rs6000-mem-fusion.o"
extra_headers="ppc-asm.h altivec.h htmintrin.h htmxlintrin.h"
extra_headers="${extra_headers} bmi2intrin.h bmiintrin.h"
extra_headers="${extra_headers} xmmintrin.h mm_malloc.h emmintrin.h"
@@ -560,6 +561,7 @@ rs6000*-*-*)
extra_options="${extra_options} g.opt fused-madd.opt 
rs6000/rs6000-tables.opt"
extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
+   extra_objs="${extra_objs} rs6000-mem-fusion.o"
target_gtfiles="$target_gtfiles 
\$(srcdir)/config/rs6000/rs6000-logue.cc 
\$(srcdir)/config/rs6000/rs6000-call.cc"
target_gtfiles="$target_gtfiles 
\$(srcdir)/config/rs6000/rs6000-pcrel-opt.cc"
;;
diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index 04e2d0066df..88413926a02 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -294,7 +294,31 @@
 
 (define_insn_and_split "*movoo"
   [(set (match_operand:OO 0 "nonimmediate_operand" "=wa,ZwO,wa")
-   (match_operand:OO 1 "input_operand" "ZwO,wa,wa"))]
+(match_operand:OO 1 "input_operand" "ZwO,wa,wa"))]
+  "TARGET_MMA
+   && (gpc_reg_opera

Re: [PATCH v1] RISC-V: Implement .SAT_TRUNC for vector unsigned int

2024-07-07 Thread juzhe.zh...@rivai.ai
+/* Expand the standard name ustrunc2 for vector mode,  we can leverage
+   the vector fixed point vector narrowing fixed-point clip directly.  */
+
+void
+expand_vec_ustrunc (rtx op_0, rtx op_1, machine_mode vec_mode,
+ machine_mode double_mode, machine_mode quad_mode)
+{
+  insn_code icode;
+  rtx double_rtx = NULL_RTX;
+  rtx quad_rtx = NULL_RTX;
+  rtx zero = CONST0_RTX (Xmode);
+  enum unspec unspec = UNSPEC_VNCLIPU;
+
+  if (double_mode != E_VOIDmode)
+double_rtx = gen_reg_rtx (double_mode);
+
+  if (quad_mode != E_VOIDmode)
+quad_rtx = gen_reg_rtx (quad_mode);
+
+  if (double_rtx != NULL_RTX && quad_rtx != NULL_RTX)
+{
+  icode = code_for_pred_narrow_clip_scalar (unspec, vec_mode);
+  emit_vec_fixed_binary_rnu (double_rtx, op_1, zero, icode, vec_mode);
+
+  icode = code_for_pred_narrow_clip_scalar (unspec, double_mode);
+  emit_vec_fixed_binary_rnu (quad_rtx, double_rtx, zero, icode, 
double_mode);
+
+  icode = code_for_pred_narrow_clip_scalar (unspec, quad_mode);
+  emit_vec_fixed_binary_rnu (op_0, quad_rtx, zero, icode, quad_mode);
+
+  return;
+}
+
+  if (double_rtx != NULL_RTX && quad_rtx == NULL_RTX)
+{
+  icode = code_for_pred_narrow_clip_scalar (unspec, vec_mode);
+  emit_vec_fixed_binary_rnu (double_rtx, op_1, zero, icode, vec_mode);
+
+  icode = code_for_pred_narrow_clip_scalar (unspec, double_mode);
+  emit_vec_fixed_binary_rnu (op_0, double_rtx, zero, icode, double_mode);
+
+  return;
+}
+
+  if (double_rtx == NULL_RTX && quad_rtx == NULL_RTX)
+{
+  icode = code_for_pred_narrow_clip_scalar (unspec, vec_mode);
+  emit_vec_fixed_binary_rnu (op_0, op_1, zero, icode, vec_mode);
+
+  return;
+}
+
+  gcc_unreachable ();
+}

This code looks odd to me. Could you restructure it in a more straightforward way?
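
One way to read the quoted function is that its three branches differ only in how many halving steps are chained (one, two or three narrowing clips). A scalar model of that chain, written as a loop, is sketched below. It is not RTL expander code and makes no claim about how the patch was eventually restructured; both function names are hypothetical.

```c
#include <stdint.h>

/* Scalar model of one unsigned narrowing clip with shift 0: saturate X to
   the maximum value of a type half as wide as BITS.  */
static uint64_t
demo_clip_half (uint64_t x, unsigned bits)
{
  uint64_t max = (UINT64_C (1) << (bits / 2)) - 1;
  return x > max ? max : x;
}

/* Narrow from FROM_BITS to TO_BITS by repeated halving.  Both widths are
   assumed to be powers of two with TO_BITS < FROM_BITS <= 64.  */
static uint64_t
demo_sat_narrow (uint64_t x, unsigned from_bits, unsigned to_bits)
{
  for (unsigned bits = from_bits; bits > to_bits; bits /= 2)
    x = demo_clip_half (x, bits);
  return x;
}
```

With this model, e64 => e8 runs three steps (64 to 32, 32 to 16, 16 to 8), e64 => e16 runs two, and e64 => e32 runs one, matching the double, quad and oct cases listed in the patch description.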


juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2024-07-05 09:23
To: gcc-patches
CC: juzhe.zhong; kito.cheng; jeffreyalaw; rdapp.gcc; Pan Li
Subject: [PATCH v1] RISC-V: Implement .SAT_TRUNC for vector unsigned int
From: Pan Li 
 
This patch implements .SAT_TRUNC for the RISC-V backend
with the help of the RVV Vector Narrowing Fixed-Point
Clip Instructions.  The following SEW conversions are supported:
 
* e64 => e32
* e64 => e16
* e64 => e8
* e32 => e16
* e32 => e8
* e16 => e8
 
Take the example below to see the changes to the asm.
Form 1:
  #define DEF_VEC_SAT_U_TRUNC_FMT_1(NT, WT) \
  void __attribute__((noinline)) \
  vec_sat_u_trunc_##NT##_##WT##_fmt_1 (NT *out, WT *in, unsigned limit) \
  { \
    unsigned i; \
    for (i = 0; i < limit; i++) \
      { \
        WT x = in[i]; \
        bool overflow = x > (WT)(NT)(-1); \
        out[i] = ((NT)x) | (NT)-overflow; \
      } \
  }
 
DEF_VEC_SAT_U_TRUNC_FMT_1 (uint32_t, uint64_t)
 
Before this patch:
.L3:
  vsetvli  a5,a2,e64,m1,ta,ma
  vle64.v  v1,0(a1)
  vmsgtu.vv    v0,v1,v2
  vsetvli  zero,zero,e32,mf2,ta,ma
  vncvt.x.x.w  v1,v1
  vmerge.vim   v1,v1,-1,v0
  vse32.v  v1,0(a0)
  slli a4,a5,3
  add  a1,a1,a4
  slli a4,a5,2
  add  a0,a0,a4
  sub  a2,a2,a5
  bne  a2,zero,.L3
 
After this patch:
.L3:
  vsetvli  a5,a2,e32,mf2,ta,ma
  vle64.v  v1,0(a1)
  vnclipu.wi   v1,v1,0
  vse32.v  v1,0(a0)
  slli a4,a5,3
  add  a1,a1,a4
  slli a4,a5,2
  add  a0,a0,a4
  sub  a2,a2,a5
  bne  a2,zero,.L3
 
Passed the rv64gcv full regression tests.
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (ustrunc2): Add
new pattern for double truncation.
(ustrunc2): Ditto but for quad truncation.
(ustrunc2): Ditto but for oct truncation.
* config/riscv/riscv-protos.h (expand_vec_ustrunc): Add new decl
to expand vec ustrunc.
* config/riscv/riscv-v.cc (emit_vec_fixed_binary_rnu): Add new
help func to emit vnclipu.
(expand_vec_ustrunc): Add new func impl to expand vector ustrunc.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h: Add helper
test macros.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_data.h: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-3.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-4.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-5.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-6.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-1.c: New test

Re: [x86 SSE PATCH] Some AVX512 ternlog expansion refinements.

2024-07-07 Thread Hongtao Liu
On Sun, Jul 7, 2024 at 5:00 PM Roger Sayle  wrote:
>
>
> Hi Hongtao,
> This should address concerns about the remaining use of force_reg.
>
@@ -25793,15 +25792,20 @@ ix86_expand_ternlog_binop (enum rtx_code code, machine_mode mode,
   if (GET_MODE (op1) != mode)
     op1 = gen_lowpart (mode, op1);

-  if (GET_CODE (op0) == CONST_VECTOR)
+  if (CONST_VECTOR_P (op0))
     op0 = validize_mem (force_const_mem (mode, op0));
-  if (GET_CODE (op1) == CONST_VECTOR)
+  if (CONST_VECTOR_P (op1))
     op1 = validize_mem (force_const_mem (mode, op1));

   if (memory_operand (op0, mode))
     {
       if (memory_operand (op1, mode))
-       op0 = force_reg (mode, op0);
+       {
+         /* We can't use force_reg (op0, mode).  */
+         rtx reg = gen_reg_rtx (mode);
+         emit_move_insn (reg, op0);
+         op0 = reg;
+       }
Shouldn't we handle bcst_mem_operand as well as memory_operand here
(a bcst_mem_operand is not a memory_operand)?
So maybe:
  if (memory_operand (op0, mode) || bcst_mem_operand (op0, mode))
    if (memory_operand (op1, mode) || bcst_mem_operand (op1, mode))
       else
         std::swap (op0, op1);
     }

Also there's force_reg in below 3 cases, are there any restrictions to
avoid bcst_mem_operand into them?
case 0x0f:  /* ~a */
case 0x33:  /* ~b */
case 0x33:  /* ~b */
..
 if (!TARGET_64BIT && !register_operand (op2, mode))
   op2 = force_reg (mode, op2);

-- 
BR,
Hongtao


Ping^3: [PATCH 0/3] Recover in-tree libiconv build support

2024-07-07 Thread Arsen Arsenović
Arsen Arsenović  writes:

> Another gentle ping on this patch series.  Could it be merged into
> trunk?

And another!  Sorry about them being sparse - I was quite busy in the
meanwhile.
-- 
Arsen Arsenović




Re: [PATCH v2 01/12] OpenMP: metadirective tree data structures and front-end interfaces

2024-07-07 Thread Sandra Loosemore

On 5/31/24 06:22, Tobias Burnus wrote:


I have to admit that I don't really see the use of metadirective_p as …

  int
-omp_context_selector_matches (tree ctx)
+omp_context_selector_matches (tree ctx, bool metadirective_p, bool 
delay_p)

...

+    if (metadirective_p && delay_p)
+  return -1;


I do see why the resolution of KIND/ARCH/ISA should be delayed – for 
both variant/metadirective as long as the code is run by the host and 
the device. Except that we could exclude, e.g., 'kind(FPGA)' early on as 
we don't support it at all.


But once the device code is split off, I don't see why we can't expand 
the DEVICE clause right away for both variant and metadirective, while 
for 'target_device' we cannot do much until runtime, except for 
excluding things like 'kind(fpga)' or excluding all 'arch' values known 
to be supported neither by the host nor by any enabled offload devices.


Thus, I see why there is a 'delay_p', but not why there is a 
'metadirective_p'.


But I might have missed something important ...


Yeah, omp_context_selector_matches() is pretty substantially revised in 
part 9 of the posted patch set -- among other things, to remove the 
metadirective_p parameter.  The current split between patches isn't 
ideal but this is such a huge patch set already (with more pieces in the 
works to support "begin declare variant") that refactoring them would be 
a lot of work and probably result in something even more challenging to 
review.  :-S





 case OMP_TRAIT_USER_CONDITION:
   if (set == OMP_TRAIT_SET_USER)
 for (tree p = OMP_TS_PROPERTIES (ts); p; p = 
TREE_CHAIN (p))

   if (OMP_TP_NAME (p) == NULL_TREE)
 {
+  /* OpenMP 5.1 allows non-constant conditions for
+ metadirectives.  */
+  if (metadirective_p
+  && !tree_fits_shwi_p (OMP_TP_VALUE (p)))
+    break;
   if (integer_zerop (OMP_TP_VALUE (p)))
 return 0;
   if (integer_nonzerop (OMP_TP_VALUE (p)))
 break;
   ret = -1;
 }


(BTW: I am happy to be enlightened as I have likely missed some fine print.)

Regarding the comment: True, but shouldn't this be handled before by 
issuing an error when such a clause is used in 'declare variant', i.e. 
only occur when metadirective_p is/can be true?


The error diagnostic is handled during parsing in the respective front 
ends (parts 4, 5, and 7 of the series).


Besides, I have to admit that I do not understand the new code. The 
current code has: constant zero → whole selector known to be false 
("return 0"); nonzero constant → keep current state, i.e. either 'true' 
(1) or don't know ('-1') and continue; otherwise (not const) → set to 
"don't know" (-1) and continue with the next item.


That seems to make also sense for metadirectives. But your patch changes 
this to keep current state if a variable. In that case, '1' is used if 
this is the only item or the previous condition is true. Or "-1" when 
the previous item is "don't know" (-1). - I think that doesn't make 
sense and it should always return -1 for a run time value.


-1 doesn't really mean "don't know".  It means "don't know YET".  For 
the purposes of omp_context_selector_matches, a dynamic selector always 
matches in the sense that they all need to be included in the list of 
replacement candidates.
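
The three-valued result discussed above can be sketched in isolation. The model below follows the behaviour of the existing code as Tobias describes it (constant zero gives 0, nonzero constant keeps the current state, non-constant gives -1), with -1 read as "don't know yet" per the reply. The struct and function are hypothetical and operate on plain values rather than trees.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one user condition property.  */
struct demo_cond
{
  bool is_constant;   /* Known at compile time?  */
  long value;         /* Only meaningful when is_constant is true.  */
};

/* Return 0 if the selector is known not to match, 1 if it is known to
   match, and -1 if the answer is not known yet (some condition is only
   available at run time).  */
static int
demo_user_conditions_match (const struct demo_cond *conds, size_t n)
{
  int ret = 1;

  for (size_t i = 0; i < n; i++)
    {
      if (conds[i].is_constant && conds[i].value == 0)
        return 0;       /* Constant false: whole selector fails.  */
      if (conds[i].is_constant)
        continue;       /* Constant true: keep the current state.  */
      ret = -1;         /* Run-time value: defer the decision.  */
    }
  return ret;
}
```

Under this reading, a result of -1 still keeps the variant in the list of replacement candidates; only a result of 0 drops it.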


Additionally, I wonder why you use tree_fits_shwi_p instead of a simple 
'TREE_CODE (OMP_TP_VALUE (p)) != INTEGER_CST'. It does not seem to 
matter here, but '(uint128_t)-1' looks like a valid condition and valid 
constant, which integer_nonzerop should handle but if the hwi is 128bit 
wide, it won't fit into a signed variable.


(As integer_nonzerop and the current code both do "break;" it won't 
change the result of the current code.)


The existing code for parsing "declare variant" context selectors 
already uses tree_fits_shwi_p.  (See e.g. c_parser_omp_context_selector 
in gcc/c/c-parser.cc.)  If there's a better idiom for checking for a 
constant I'll certainly use it, but I was trying to be consistent with 
what I thought was standard practice already.  :-S
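
The difference between "is a nonzero constant" and "fits a signed
host-wide integer" can be illustrated by analogy.  The sketch below assumes
a 64-bit signed range and uses uint64_t in place of the wider unsigned
constant from the review; fits_signed_64 and is_nonzero_constant are
hypothetical helpers, not GCC functions.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Analogy only: true if the unsigned value can be represented in a signed
// 64-bit integer.
constexpr bool fits_signed_64(std::uint64_t v)
{
  return v <= static_cast<std::uint64_t>(
                std::numeric_limits<std::int64_t>::max());
}

// Analogy only: any nonzero value counts as a "true" constant condition.
constexpr bool is_nonzero_constant(std::uint64_t v)
{
  return v != 0;
}

// An all-ones unsigned value is a perfectly good nonzero constant, yet it
// does not fit the signed range, so a "fits signed" test would reject it.
static_assert(is_nonzero_constant(std::numeric_limits<std::uint64_t>::max()));
static_assert(!fits_signed_64(std::numeric_limits<std::uint64_t>::max()));
```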



* * *

+static tree
+omp_dynamic_cond (tree ctx)
+{

...

+  /* The user condition is not dynamic if it is constant.  */
+  if (!tree_fits_shwi_p (TREE_VALUE (expr_list)))


Any reason for using tree_fits_shwi_p instead of INTEGER_CST? Here, 
(uint128_t)-1 could make a difference …


Same here.




+    /* omp_initial_device is -1, omp_invalid_device is -4; choose
+   a value that isn't otherwise defined to indicate the default
+   device.  */
+    device_num = build_int_cst (integer_type_node, -2);


Don't do this - we do it differently for 'target' and it should do the 
same. Some value usage history:


Wait, in your January review comments on an earlier version of this 
patch y

[pushed] libstdc++: Tweak two links in configuration docs

2024-07-07 Thread Gerald Pfeifer
Business as usual; pushed.

Gerald


libstdc++-v3:
* doc/xml/manual/configure.xml: Update Autobook 14 link.
Update GCC installation instructions link.
* doc/html/manual/configure.html: Regenerate.
---
 libstdc++-v3/doc/html/manual/configure.html | 4 ++--
 libstdc++-v3/doc/xml/manual/configure.xml   | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/doc/html/manual/configure.html 
b/libstdc++-v3/doc/html/manual/configure.html
index 346b5d345cd..3564b0c409f 100644
--- a/libstdc++-v3/doc/html/manual/configure.html
+++ b/libstdc++-v3/doc/html/manual/configure.html
@@ -9,7 +9,7 @@
   Here are all of the configure options specific to libstdc++.  Keep
   in mind that

-   http://sourceware.org/autobook/autobook/autobook_14.html"; 
target="_top">they
+   https://sourceware.org/autobook/autobook/autobook_14.html"; 
target="_top">they
all have opposite forms as well (enable/disable and
with/without).  The defaults are for the current
development sources, which may be different than those
@@ -82,7 +82,7 @@
(described next).
  --enable-threads=OPTIONSelect a 
threading library.  A full description is
given in the
-   general http://gcc.gnu.org/install/configure.html"; target="_top">compiler
+   general https://gcc.gnu.org/install/configure.html"; target="_top">compiler
configuration instructions. This option can change the
library ABI.
  --enable-libstdcxx-threadsEnable C++11 
threads support.  If not explicitly specified,
diff --git a/libstdc++-v3/doc/xml/manual/configure.xml 
b/libstdc++-v3/doc/xml/manual/configure.xml
index 0a477ab85e5..cd5f44458a2 100644
--- a/libstdc++-v3/doc/xml/manual/configure.xml
+++ b/libstdc++-v3/doc/xml/manual/configure.xml
@@ -24,7 +24,7 @@
   Here are all of the configure options specific to libstdc++.  Keep
   in mind that

-   http://www.w3.org/1999/xlink"; 
xlink:href="http://sourceware.org/autobook/autobook/autobook_14.html";>they
+   http://www.w3.org/1999/xlink"; 
xlink:href="https://sourceware.org/autobook/autobook/autobook_14.html";>they
all have opposite forms as well (enable/disable and
with/without).  The defaults are for the current
development sources, which may be different than those
@@ -148,7 +148,7 @@
  --enable-threads=OPTION
  Select a threading library.  A full description is
given in the
-   general http://www.w3.org/1999/xlink"; 
xlink:href="http://gcc.gnu.org/install/configure.html";>compiler
+   general http://www.w3.org/1999/xlink"; 
xlink:href="https://gcc.gnu.org/install/configure.html";>compiler
configuration instructions. This option can change the
library ABI.
  
-- 
2.45.2


[pushed] wwwdocs: readings: Update Edsger W. Dijkstra's home page

2024-07-07 Thread Gerald Pfeifer
---
 htdocs/readings.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/htdocs/readings.html b/htdocs/readings.html
index ae1b52bb..f0d44d6f 100644
--- a/htdocs/readings.html
+++ b/htdocs/readings.html
@@ -469,7 +469,7 @@ names.
 
Historical material - for your enjoyment.

- https://www.cs.utexas.edu/users/EWD/";>The writings of Edsger 
W. Dijkstra (RIP)
+ https://www.cs.utexas.edu/~EWD/";>The writings of Edsger W. 
Dijkstra (RIP)


 
-- 
2.45.2


[pushed] maintainer-scripts: Switch bug reporting URL to https

2024-07-07 Thread Gerald Pfeifer
So, this required quite some detective work to understand why

  https://gcc.gnu.org/onlinedocs/gcc/Bug-Reporting.html

still referred to http://gcc.gnu.org/bugs/ *without* https from the 
following makeinfo snippet

  Bugs should be reported to the bug database at @value{BUGURL}.

in gcc/doc/bugreport.texi when elsewhere in the tree we had updated
BUGURL a while ago.


Now it turns out I had already improved things there last year via

  commit 0c061da91a3657afdb3fac68e4595af685909a1a
  Author: Gerald Pfeifer 
  Date:   Thu Mar 16 01:20:26 2023 +0100

maintainer-scripts: Abstract BUGURL in update_web_docs_git

The URL where to report bugs is hard coded in two places; abstract that
into one variable, defined up front.

maintainer-scripts/ChangeLog:

* update_web_docs_git (BUGURL): Introduce and use throughout.


However, wouldn't it be so much better if we could import (or "import")
BUGURL from gcc/ where it is also set?

Any thoughts?
Joseph, I believe you helped put the current mechanics in place?


Gerald

PS: In any case I pushed the patch below for now.

maintainer-scripts:
* update_web_docs_git (BUGURL): Switch to https.
---
 maintainer-scripts/update_web_docs_git | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/maintainer-scripts/update_web_docs_git 
b/maintainer-scripts/update_web_docs_git
index c651e567424..0d7b6c90fe9 100755
--- a/maintainer-scripts/update_web_docs_git
+++ b/maintainer-scripts/update_web_docs_git
@@ -45,7 +45,7 @@ MANUALS="cpp
   libiberty
   porting"
 
-BUGURL="http://gcc.gnu.org/bugs/";
+BUGURL="https://gcc.gnu.org/bugs/";
 CSS=/texinfo-manuals.css
 
 WWWBASE=${WWWBASE:-"/www/gcc/htdocs"}
-- 
2.45.2


[pushed] doc: Remove dubious example around bug reporting

2024-07-07 Thread Gerald Pfeifer
Really, that's probably something from some old compilers in the 90s; no 
point in confusing people with such history as interesting as it may be.

Gerald


gcc:
* doc/bugreport.texi (Bug Criteria): Remove dubious example.
---
 gcc/doc/bugreport.texi | 5 -
 1 file changed, 5 deletions(-)

diff --git a/gcc/doc/bugreport.texi b/gcc/doc/bugreport.texi
index b7cfb5dd6ae..7a603241f77 100644
--- a/gcc/doc/bugreport.texi
+++ b/gcc/doc/bugreport.texi
@@ -50,11 +50,6 @@ However, you must double-check to make sure, because you may 
have a
 program whose behavior is undefined, which happened by chance to give
 the desired results with another C or C++ compiler.
 
-For example, in many nonoptimizing compilers, you can write @samp{x;}
-at the end of a function instead of @samp{return x;}, with the same
-results.  But the value of the function is undefined if @code{return}
-is omitted; it is not a bug when GCC produces different results.
-
 Problems often result from expressions with two increment operators,
 as in @code{f (*p++, *p++)}.  Your previous compiler might have
 interpreted that expression the way you intended; GCC might
-- 
2.45.2


[pushed] wwwdocs: news: Standardize OpenMP references

2024-07-07 Thread Gerald Pfeifer
Use OpenMP X.Y instead of OpenMP vX.Y. Use https for web links.
---
 htdocs/news.html | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/htdocs/news.html b/htdocs/news.html
index 4a104520..f13a8249 100644
--- a/htdocs/news.html
+++ b/htdocs/news.html
@@ -302,10 +302,10 @@
 standards. The code was contributed by François-Xavier Coudert of
 CNRS.
 
-OpenMP v4.0
+OpenMP 4.0
 [2014-06-30] wwwdocs:
 An implementation of the http://www.openmp.org/specifications/";>OpenMP v4.0
+href="https://www.openmp.org/specifications/";>OpenMP 4.0
 parallel programming interface for Fortran has been added and is going
 to be available in the upcoming GCC 4.9.1 release.
 
@@ -360,10 +360,10 @@
 [2013-10-16] wwwdocs:
 
 
-OpenMP v4.0
+OpenMP 4.0
 [2013-10-11] wwwdocs:
 An implementation of the http://www.openmp.org/specifications/";>OpenMP v4.0
+href="https://www.openmp.org/specifications/";>OpenMP 4.0
 parallel programming interface for so far C and C++ has been added.
 Code was contributed by Jakub Jelinek, Aldy Hernandez, Richard Henderson
 of Red Hat, Inc. and Tobias Burnus.
@@ -530,7 +530,7 @@ by Embecosm.
 [2011-10-26] wwwdocs:
 
 
-OpenMP v3.1
+OpenMP 3.1
 [2011-08-02] wwwdocs:
 An implementation of the OpenMP 3.1
 parallel programming interface for C, C++ and Fortran has been added.
-- 
2.45.2


[PATCH 1/1] config: Handle dash in library name for AC_LIB_LINKAGEFLAGS_BODY

2024-07-07 Thread Abdul Basit Ijaz
From: "Ijaz, Abdul B" 

For a library with a dash in its name, such as yaml-cpp, the
AC_LIB_LINKFLAGS_BODY macro generates a with_libname_type variable name
that contains a dash.  This results in a configure error, since dashes are
not allowed in shell variable names.

This change handles such cases: if the library name passed to
AC_LIB_HAVE_LINKFLAGS contains a dash, it is replaced with an underscore "_".

Example of an error for yaml-cpp library before the change using gcc config
scripts in gdb:
gdb/gdb/configure: line 22868: with_libyaml-cpp_type=auto: command not found

After having underscore for this variable name:

checking whether to use yaml-cpp... yes
checking for libyaml-cpp... yes
checking how to link with libyaml-cpp... -lyaml-cpp
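
As a rough illustration of the name mapping described above, the sketch
below mimics the effect of translit([$1], [./-], [___]) on a library name.
The function names are hypothetical and the code only models the string
transformation, not the m4 macro itself.

```cpp
#include <cassert>
#include <string>

// Replaces every '.', '/' or '-' with '_', as the translit call does.
std::string sanitize_lib_name(std::string name)
{
  for (char& c : name)
    if (c == '.' || c == '/' || c == '-')
      c = '_';
  return name;
}

// Builds the shell variable name that backs --with-lib<name>-type.
std::string with_type_variable(const std::string& lib)
{
  return "with_lib" + sanitize_lib_name(lib) + "_type";
}
```

For "yaml-cpp" this gives with_libyaml_cpp_type, which is a valid shell
variable name, whereas the unsanitized with_libyaml-cpp_type is not.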

config/ChangeLog:

* lib-link.m4: Handle dash in the library name for
AC_LIB_LINKFLAGS_BODY.

2024-07-03 Ijaz, Abdul B 
---
 config/lib-link.m4 | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/config/lib-link.m4 b/config/lib-link.m4
index 20e281fd323..a60a8069453 100644
--- a/config/lib-link.m4
+++ b/config/lib-link.m4
@@ -126,6 +126,7 @@ AC_DEFUN([AC_LIB_LINKFLAGS_BODY],
 [
   define([NAME],[translit([$1],[abcdefghijklmnopqrstuvwxyz./-],
[ABCDEFGHIJKLMNOPQRSTUVWXYZ___])])
+  define([Name],[translit([$1],[./-], [___])])
   dnl By default, look in $includedir and $libdir.
   use_additional=yes
   AC_LIB_WITH_FINAL_PREFIX([
@@ -152,8 +153,8 @@ AC_DEFUN([AC_LIB_LINKFLAGS_BODY],
 ])
   AC_LIB_ARG_WITH([lib$1-type],
 [  --with-lib$1-type=TYPE type of library to search for 
(auto/static/shared) ],
-  [ with_lib$1_type=$withval ], [ with_lib$1_type=auto ])
-  lib_type=`eval echo \$with_lib$1_type`
+  [ with_lib[]Name[]_type=$withval ], [ with_lib[]Name[]_type=auto ])
+  lib_type=`eval echo \$with_lib[]Name[]_type`
 
   dnl Search the library and its dependencies in $additional_libdir and
   dnl $LDFLAGS. Using breadth-first-seach.
-- 
2.34.1

Intel Deutschland GmbH
Registered Address: Am Campeon 10, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de
Managing Directors: Sean Fennelly, Jeffrey Schneiderman, Tiffany Doon Silva
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928



[PATCH 0/1] config: Handle dash in library name for AC_LIB_LINKAGEFLAGS_BODY

2024-07-07 Thread Abdul Basit Ijaz
From: "Ijaz, Abdul B" 

Hi All,

This change fixes the handling of a dash (-) in library names such as
yaml-cpp in the AC_LIB_LINKFLAGS_BODY macro in "config/lib-link.m4",
which is used by autoconf.  This is my first patch for GCC, so I am
adding this summary of the change in the 0/1 patch; details of the fix
can be seen in patch 1/1.

Running Autoconf 2.69 only adds a few new empty lines and updates some
line numbers in the regenerated configure file after this change.  There
is no significant change in that file because of this change, so I have
not committed it; this change only affects libraries that have a dash in
the name.  We need this change eventually for the GDB source code
repository, which gets its copy of these scripts from GCC.  Let me know
if any testing can be done for this change.  As already mentioned, I only
tried regenerating the configure file and did not notice any significant
change, so it seems nothing needs to be tested.  I ran the tests on an
x86_64 Ubuntu 22 setup and there are some unexpected failures, but nothing
seems related to this change.

Lastly, I want to mention that I do not have write access, so I want to
ask whether this small change can be accepted without a CLA.  The
Signed-off-by field is already added to the commit to fulfill the DCO.

Waiting for your feedback.

Thanks & Best Regards
Abdul Basit

Ijaz, Abdul B (1):
  config: Handle dash in library name for AC_LIB_LINKAGEFLAGS_BODY

 config/lib-link.m4 | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

-- 
2.34.1




Re: [PATCH] fixincludes: bypass the math_exception fix on __cplusplus

2024-07-07 Thread Gerald Pfeifer
On Mon, 10 Jun 2024, Rainer Orth wrote:
> I'd have loved to remove fixes that mention obsolete Solaris versions,
> but refrained from doing so when there was no way of knowing that no
> innocent would be harmed.

Doing so early in stage 1 (like now ;-) might be a good trade off?

If nobody reports any issue until the first release candidate of the next 
major version, that is telling.

Gerald


[pushed] wwwdocs: gcc-14: Use hyphen in compile-time

2024-07-07 Thread Gerald Pfeifer
---
 htdocs/gcc-14/changes.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index 9a1b0c8a..29958fd5 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -621,7 +621,7 @@ You may also want to check out our
   
   The -Wcase-enum
 and -Wuninit-variable-checking= options have
-been implemented to provide compile time warnings against
+been implemented to provide compile-time warnings against
 missing case clauses and uninitialized variables respectively.
   
 
-- 
2.45.2


Re: [PATCH][wwwdocs] changes.html changes for AArch64 for GCC 14.1

2024-07-07 Thread Gerald Pfeifer
On Tue, 2 Apr 2024, Kyrylo Tkachov wrote:
> Here's a writeup of the AArch64 changes to highlight in GCC 14.1. If 
> there's something you'd like to highlight feel free to comment or add a 
> patch yourself. I don't expect the list to be exhaustive.
> 
> It's been a busy release for AArch64!

Indeed. Busy in a good way. :-)

How about the following simplification around generic-armv8-a and
generic-armv9-a? I believe this version is easier to consume.

(_Not_ pushed.)

Gerald


diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index 9a1b0c8a..ca4cae0f 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -687,11 +687,11 @@ You may also want to check out our
   Microsoft Cobalt-100 (cobalt-100).
 
 Additionally, the identifiers generic,
-generic-armv8-a and generic-armv9-a are added
-as arguments to -mcpu= and -mtune= to optimize
-code generation aimed at a good blend of CPUs of a particular architecture
-version.  These tunings are also used as the default optimization targets
-when compiling with the -march=armv8-a or
+generic-armv8-a and generic-armv9-a can be
+used to optimize code generation for a good blend of CPUs of a
+particular architecture
+version.  These tunings are also used as the default optimization
+targets when compiling with the -march=armv8-a or
 -march=armv9-a options and their point releases e.g.
 -march=armv8.2-a or -march=armv9.3-a.
 


Re: [RFC WIP] RAW_DATA_CST for #embed optimization

2024-07-07 Thread Richard Biener



> Am 07.07.2024 um 17:14 schrieb Jakub Jelinek :
> 
> On Sun, Jul 07, 2024 at 09:02:57AM +0200, Richard Biener wrote:
>> I see.  I was wondering because PCH includes are not resolved.  That said,
>> it sounds like #embed is sadly defined on the preprocessor side rather
>> than in the language where it would have been easy to constrain uses to
>> those that make sense…
> 
> I think there were big discussions on this and at some stage it has been
> a builtin etc.
> 
>> Yeah, I wondered if where the raw data survives we can make it always
>> wrapped by a CONSTRUCTOR and add a RANGE_TARGET_BYTES element.  This may
>> be useful to encode large initializers more efficiently during/after
>> parsing.
> 
> We definitely should try to improve handling even large initializers which
> do not use #embed eventually, it depends on where all the large overheads
> are where to approach it.
> It could be handled in the preprocessor, say after we see 128 or how many
> CPP_NUMBERs from 0-255 alternating with CPP_COMMA, do some look ahead and
> construct a CPP_EMBED, or it could be done during parsing of initializer
> similarly after seeing certain number of initializers of a CHAR_BIT array
> use the C FE raw token lexing to watch ahead and create RAW_DATA_CST out
> of that if beneficial, etc.
> It really depends on where the biggest overhead is, whether it is in
> creation of the millions of CPP_NUMBER/CPP_COMMA tokens, or primarily
> when creating the large CONSTRUCTOR (the INTEGER_CSTs for the values should
> be shared, at most 256 of them, but the indexes are not).

There’s a very old PR about the regression for very large static initializers 
compared to the time we wrote those directly to asm_out

Richard 

>Jakub
> 


Re: [RFC/PATCH] libgcc: sh: Use soft-fp for non-hosted SH3/SH4

2024-07-07 Thread Jeff Law




On 7/6/24 6:12 PM, Oleg Endo wrote:




This is almost certainly a poorly written pattern.  I just fixed a bunch
of these, but not this one.  Essentially a recent change in the generic
parts of the compiler is exposing some bugs in the SH backend.


The patterns were written and tested to the best of our knowledge at that
time many years ago.  Nobody thought that we'll get a 2nd combine pass after
RA.  Anyway, I'll have a look at the remaining patterns.
My comment wasn't meant to disparage you or anyone.  Just to note that 
the patterns need fixing as they are incorrect.


We have certainly had cases where the model used by those patterns would 
cause problems in the past.   Hard register cprop, compare elimination 
and others could trigger the same kind of problem.  late-combine is just 
more likely to expose these kinds of latent bugs.


Jeff




Re: [RFC WIP] RAW_DATA_CST for #embed optimization

2024-07-07 Thread Jakub Jelinek
On Sun, Jul 07, 2024 at 09:02:57AM +0200, Richard Biener wrote:
> I see.  I was wondering because PCH includes are not resolved.  That said,
> it sounds like #embed is sadly defined on the preprocessor side rather
> than in the language where it would have been easy to constrain uses to
> those that make sense…

I think there were big discussions on this and at some stage it has been
a builtin etc.

> Yeah, I wondered if where the raw data survives we can make it always
> wrapped by a CONSTRUCTOR and add a RANGE_TARGET_BYTES element.  This may
> be useful to encode large initializers more efficiently during/after
> parsing.

We definitely should try to improve handling even large initializers which
do not use #embed eventually, it depends on where all the large overheads
are where to approach it.
It could be handled in the preprocessor, say after we see 128 or how many
CPP_NUMBERs from 0-255 alternating with CPP_COMMA, do some look ahead and
construct a CPP_EMBED, or it could be done during parsing of initializer
similarly after seeing certain number of initializers of a CHAR_BIT array
use the C FE raw token lexing to watch ahead and create RAW_DATA_CST out
of that if beneficial, etc.
It really depends on where the biggest overhead is, whether it is in
creation of the millions of CPP_NUMBER/CPP_COMMA tokens, or primarily
when creating the large CONSTRUCTOR (the INTEGER_CSTs for the values should
be shared, at most 256 of them, but the indexes are not).
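
To make the look-ahead idea concrete, here is a minimal sketch that scans a
token stream for runs of byte-valued numbers alternating with commas and
collapses sufficiently long runs into a single raw-data token.  The Token
types, the collapse_byte_runs name and the default threshold of 128 are
assumptions for illustration; they are not the libcpp or C front end data
structures.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <variant>
#include <vector>

struct Number { long value; };                        // stands in for CPP_NUMBER
struct Comma {};                                      // stands in for CPP_COMMA
struct RawData { std::vector<unsigned char> bytes; }; // hypothetical collapsed run
using Token = std::variant<Number, Comma, RawData>;

// True if the token is a number in the range 0-255.
static bool is_byte(const Token& t)
{
  const Number* n = std::get_if<Number>(&t);
  return n && n->value >= 0 && n->value <= 255;
}

// Collapses NUMBER (COMMA NUMBER)* runs of at least min_run byte values.
std::vector<Token> collapse_byte_runs(const std::vector<Token>& in,
                                      std::size_t min_run = 128)
{
  std::vector<Token> out;
  std::size_t i = 0;
  while (i < in.size())
    {
      if (!is_byte(in[i]))
        {
          out.push_back(in[i++]);
          continue;
        }
      // Look ahead while the pattern "comma, byte value" continues.
      std::vector<unsigned char> bytes;
      std::size_t j = i;
      bytes.push_back(static_cast<unsigned char>(std::get<Number>(in[j]).value));
      while (j + 2 < in.size()
             && std::holds_alternative<Comma>(in[j + 1])
             && is_byte(in[j + 2]))
        {
          j += 2;
          bytes.push_back(static_cast<unsigned char>(std::get<Number>(in[j]).value));
        }
      if (bytes.size() >= min_run)
        out.push_back(RawData{std::move(bytes)});   // long run: one token
      else
        out.insert(out.end(), in.begin() + i, in.begin() + j + 1); // keep as is
      i = j + 1;
    }
  return out;
}
```

Whether such a transformation pays off depends on where the overhead
actually is, as the message above notes; the sketch only shows the shape of
the scan.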

Jakub



[committed] c++: Simplify uses of LAMBDA_EXPR_EXTRA_SCOPE

2024-07-07 Thread Nathaniel Shead
On Sun, Jun 16, 2024 at 12:18:10PM +1000, Nathaniel Shead wrote:
> No functional change intended; OK for trunk?
> 

In retrospect, committing as obvious after bootstrap+regtest.

> -- >8 --
> 
> I noticed there already exists a getter to get the scope of a lambda
> from its type directly rather than needing to go via
> CLASSTYPE_LAMBDA_EXPR, we may as well use it.
> 
> gcc/cp/ChangeLog:
> 
>   * module.cc (trees_out::get_merge_kind): Use
>   LAMBDA_TYPE_EXTRA_SCOPE instead of LAMBDA_EXPR_EXTRA_SCOPE.
>   (trees_out::key_mergeable): Likewise.
> 
> Signed-off-by: Nathaniel Shead 
> ---
>  gcc/cp/module.cc | 7 ++-
>  1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
> index ea7ad0c1f29..6d6044af199 100644
> --- a/gcc/cp/module.cc
> +++ b/gcc/cp/module.cc
> @@ -10686,9 +10686,7 @@ trees_out::get_merge_kind (tree decl, depset *dep)
>  g++.dg/modules/lambda-6_a.C.  */
>   if (DECL_IMPLICIT_TYPEDEF_P (STRIP_TEMPLATE (decl))
>   && LAMBDA_TYPE_P (TREE_TYPE (decl)))
> -   if (tree scope
> -   = LAMBDA_EXPR_EXTRA_SCOPE (CLASSTYPE_LAMBDA_EXPR
> -  (TREE_TYPE (decl
> +   if (tree scope = LAMBDA_TYPE_EXTRA_SCOPE (TREE_TYPE (decl)))
>   {
> /* Lambdas attached to fields are keyed to its class.  */
> if (TREE_CODE (scope) == FIELD_DECL)
> @@ -10993,8 +10991,7 @@ trees_out::key_mergeable (int tag, merge_kind mk, 
> tree decl, tree inner,
>   case MK_keyed:
> {
>   gcc_checking_assert (LAMBDA_TYPE_P (TREE_TYPE (inner)));
> - tree scope = LAMBDA_EXPR_EXTRA_SCOPE (CLASSTYPE_LAMBDA_EXPR
> -   (TREE_TYPE (inner)));
> + tree scope = LAMBDA_TYPE_EXTRA_SCOPE (TREE_TYPE (inner));
>   gcc_checking_assert (TREE_CODE (scope) == VAR_DECL
>|| TREE_CODE (scope) == FIELD_DECL
>|| TREE_CODE (scope) == PARM_DECL
> -- 
> 2.43.2
> 


Re: [PATCH 0/2] fix RISC-V zcmp popretz [PR113715]

2024-07-07 Thread Jeff Law




On 6/8/24 2:36 PM, Jeff Law wrote:



On 6/5/24 8:42 PM, Fei Gao wrote:


But let's back up and get a good explanation of what the problem is.
Based on patch 2/2 it looks like we have lost an assignment to the
return register.

To someone not familiar with this code, it sounds to me like we've made
a mistake earlier and we're now defining a hook that lets us go back and
fix that earlier mistake.   I'm probably wrong, but so far that's what
it sounds like.

Hi Jeff

You're right. Let me rephrase patch 2/2 with more details. Search for /* 
feigao to locate the point I'm trying to explain.

code snippets from gcc/function.cc
void
thread_prologue_and_epilogue_insns (void)
{
...
   /*feigao:
         targetm.gen_epilogue () is called here to generate epilogue 
sequence.
https://gcc.gnu.org/git/? 
p=gcc.git;a=commit;h=b27d323a368033f0b37e93c57a57a35fd9997864

Commit above tries in targetm.gen_epilogue () to detect if
there's li    a0,0 insn at the end of insn chain, if so, cm.popret
is replaced by cm.popretz and li    a0,0 insn is deleted.
So that seems like the critical issue.  Generation of the prologue/ 
epilogue really shouldn't be changing other instructions in the 
instruction stream.  I'm not immediately aware of another target that 
does that, and it seems like a rather risky thing to do.



It looks like the cm.popretz's RTL exposes the assignment to a0 and 
there's a DCE pass that runs after insertion of the prologue/epilogue. 
So I would suggest leaving the assignment to a0 in the RTL chain and see 
if the later DCE pass after prologue generation eliminates the redundant 
assignment.  That seems a lot cleaner.
So I looked at this in a bit more detail.  I'm going to explicitly 
reject this patch now.


The removal of the set to a0 in riscv_gen_multi_pop_insn looks wrong on 
multiple levels.  I don't think you have enough context in that routine 
or its callers to know if it's safe, i.e. given this fragment of RTL:



(call_insn 12 11 13 3 (parallel [
(call (mem:SI (symbol_ref:SI ("test_1") [flags 0x41] ) [0 test_1 S4 A32])
(const_int 0 [0]))
(use (unspec:SI [
(const_int 0 [0])
] UNSPEC_CALLEE_CC))
(clobber (reg:SI 1 ra))
]) "j.c":14:9 441 {call_internal}
 (expr_list:REG_CALL_DECL (symbol_ref:SI ("test_1") [flags 0x41] 
)
(nil))
(expr_list:SI (use (reg:SI 10 a0))
(nil)))

(code_label 13 12 14 4 2 (nil) [1 uses])

(note 14 13 19 4 [bb 4] NOTE_INSN_BASIC_BLOCK)

(insn 19 14 20 4 (set (reg/i:SI 10 a0)
(const_int 0 [0])) "j.c":18:1 276 {*movsi_internal}
 (nil))

(insn 20 19 24 4 (use (reg/i:SI 10 a0)) "j.c":18:1 -1
 (nil))



You delete insns 19 and 20 resulting in this:


(call_insn 12 11 13 3 (parallel [
(call (mem:SI (symbol_ref:SI ("test_1") [flags 0x41] ) [0 test_1 S4 A32])
(const_int 0 [0]))
(use (unspec:SI [
(const_int 0 [0])
] UNSPEC_CALLEE_CC))
(clobber (reg:SI 1 ra))
]) "j.c":14:9 441 {call_internal}
 (expr_list:REG_CALL_DECL (symbol_ref:SI ("test_1") [flags 0x41] 
)
(nil))
(expr_list:SI (use (reg:SI 10 a0))
(nil)))

(code_label 13 12 14 4 2 (nil) [1 uses])

(note 14 13 24 4 [bb 4] NOTE_INSN_BASIC_BLOCK)

(note 24 14 0 NOTE_INSN_DELETED)


Which is incorrect/inconsistent RTL.  And as I've noted before, it's 
conceptually wrong for the backend code to be removing insns from the 
insn chain during prologue/epilogue generation.


I realize you're trying to use a hook to limit how this impacts other 
targets, but if you're making a bad decision in the RISC-V backend code, 
working around it with a target hook rather than fixing the core problem 
in the RISC-V backend just makes the whole situation worse.


My suggestion is this.  Leave the assignment to a0 and the use alone.  That's 
likely going to result in some kind of code size regression, but not a 
correctness regression.  Then address the code size regressions with the 
invariant that prologue/epilogue generation must not change existing 
insns on the insn chain.


jeff













Re: [PATCH 1/1] ada: Make the names of uninstalled cross-gnattools consistent across builds

2024-07-07 Thread Maciej W. Rozycki
On Thu, 4 Jul 2024, Arnaud Charlet wrote:

> The change is OK, thanks.

 I have committed it now, thank you for your review.

  Maciej


[PATCH] c++/modules: Conditionally start timer during lazy load [PR115165]

2024-07-07 Thread Nathaniel Shead
Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

Or should I include a testcase?  I haven't reduced one from using the
full contents of C++23  yet but I can do so if you prefer.

-- >8 --

While lazy loading, instantiation of pendings can sometimes recursively
perform name lookup and begin further lazy loading.  When using the
'-ftime-report' functionality this causes ICEs as we could start an
already-running timer for the importing.

This patch fixes the issue by using the 'timevar_cond*' API instead to
support such recursive calls.
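
The conditional start/stop pattern can be sketched with a toy timer.  The
ToyTimer class and lazy_load function below are hypothetical and only
model the idea: a conditional start reports whether the timer was already
running, and the matching conditional stop only stops a timer that this
call actually started.

```cpp
#include <cassert>

// Toy model of a non-reentrant timer.
struct ToyTimer
{
  bool running = false;
  int starts = 0;

  // Unconditional start: starting a running timer is a bug.
  void start()
  {
    assert(!running && "timer already running");
    running = true;
    ++starts;
  }

  void stop()
  {
    assert(running && "timer not running");
    running = false;
  }

  // Starts the timer unless it is already running; returns the previous state.
  bool cond_start()
  {
    if (running)
      return true;
    start();
    return false;
  }

  // Stops the timer only if the matching cond_start actually started it.
  void cond_stop(bool was_running)
  {
    if (!was_running)
      stop();
  }
};

// Recursive loader: nested calls reuse the timer started by the outermost one.
void lazy_load(ToyTimer& timer, int depth)
{
  bool was_running = timer.cond_start();
  if (depth > 0)
    lazy_load(timer, depth - 1);
  timer.cond_stop(was_running);
}
```

Calling lazy_load with any nesting depth starts the toy timer exactly once
and leaves it stopped afterwards, whereas replacing cond_start with start
would trip the assertion on the first nested call.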

PR c++/115165

gcc/cp/ChangeLog:

* module.cc (lazy_load_binding): Use 'timevar_cond*' APIs.
(lazy_load_pendings): Likewise.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/module.cc | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index dc5d046f04d..fec1b7e58df 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -19564,7 +19564,7 @@ lazy_load_binding (unsigned mod, tree ns, tree id, 
binding_slot *mslot)
 {
   int count = errorcount + warningcount;
 
-  timevar_start (TV_MODULE_IMPORT);
+  bool timer_running = timevar_cond_start (TV_MODULE_IMPORT);
 
   /* Make sure lazy loading from a template context behaves as if
  from a non-template context.  */
@@ -19594,7 +19594,7 @@ lazy_load_binding (unsigned mod, tree ns, tree id, 
binding_slot *mslot)
 
   function_depth--;
 
-  timevar_stop (TV_MODULE_IMPORT);
+  timevar_cond_stop (TV_MODULE_IMPORT, timer_running);
 
   if (!ok)
 fatal_error (input_location,
@@ -19633,7 +19633,7 @@ lazy_load_pendings (tree decl)
 
   int count = errorcount + warningcount;
 
-  timevar_start (TV_MODULE_IMPORT);
+  bool timer_running = timevar_cond_start (TV_MODULE_IMPORT);
   bool ok = !recursive_lazy ();
   if (ok)
 {
@@ -19667,7 +19667,7 @@ lazy_load_pendings (tree decl)
   function_depth--;
 }
 
-  timevar_stop (TV_MODULE_IMPORT);
+  timevar_cond_stop (TV_MODULE_IMPORT, timer_running);
 
   if (!ok)
 fatal_error (input_location, "failed to load pendings for %<%E%s%E%>",
-- 
2.43.2



[committed] libstdc++: Fix std::find for non-contiguous iterators [PR115799]

2024-07-07 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk.

-- >8 --

The r15-1857 change didn't correctly restrict the new optimization to
contiguous iterators.
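
A simplified sketch of the technique, assuming C++17: the memchr path sits
inside an 'if constexpr' branch, so for iterators that do not qualify it is
a discarded statement and is never instantiated.  find_sketch is a
hypothetical name, and the condition is deliberately narrower than the
real one in stl_algo.h (pointers to 1-byte integral types only).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <iterator>
#include <type_traits>

template<typename It, typename T>
It find_sketch(It first, It last, const T& val)
{
  using V = typename std::iterator_traits<It>::value_type;
  if constexpr (std::is_pointer_v<It> && std::is_integral_v<V>
                && sizeof(V) == 1 && std::is_integral_v<T>)
    {
      // If converting val to the 1-byte value type changes it, no element
      // can compare equal, and memchr would give a false positive.
      if (!(static_cast<V>(val) == val))
        return last;
      if (last - first > 0)
        {
          const void* p0 = first;
          if (const void* p1
                = std::memchr(p0, static_cast<unsigned char>(val),
                              static_cast<std::size_t>(last - first)))
            return first + (static_cast<const char*>(p1)
                            - static_cast<const char*>(p0));
        }
      return last;
    }
  else
    {
      // Generic path: only needs ++, * and != on the iterator.
      for (; first != last; ++first)
        if (*first == val)
          return first;
      return last;
    }
}
```

Because the pointer arithmetic lives entirely in the first branch, the
template still compiles for iterator types that do not support 'last -
first'.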

libstdc++-v3/ChangeLog:

PR libstdc++/115799
* include/bits/stl_algo.h (find): Use 'if constexpr' so that
memchr optimization is a discarded statement for non-contiguous
iterators.
* testsuite/25_algorithms/find/bytes.cc: Check with input
iterators.
---
 libstdc++-v3/include/bits/stl_algo.h  | 44 +--
 .../testsuite/25_algorithms/find/bytes.cc |  7 +++
 2 files changed, 27 insertions(+), 24 deletions(-)

diff --git a/libstdc++-v3/include/bits/stl_algo.h 
b/libstdc++-v3/include/bits/stl_algo.h
index 45c3b591326..d250b2e04d4 100644
--- a/libstdc++-v3/include/bits/stl_algo.h
+++ b/libstdc++-v3/include/bits/stl_algo.h
@@ -3849,32 +3849,28 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
 #if __cpp_if_constexpr && __glibcxx_type_trait_variable_templates
   using _ValT = typename iterator_traits<_InputIterator>::value_type;
   if constexpr (__can_use_memchr_for_find<_ValT, _Tp>)
-   {
- // If converting the value to the 1-byte value_type alters its value,
- // then it would not be found by std::find using equality comparison.
- // We need to check this here, because otherwise something like
- // memchr("a", 'a'+256, 1) would give a false positive match.
- if (!(static_cast<_ValT>(__val) == __val))
-   return __last;
- else if (!__is_constant_evaluated())
-   {
- const void* __p0 = nullptr;
- if constexpr (is_pointer_v)
-   __p0 = std::__niter_base(__first);
+   if constexpr (is_pointer_v
 #if __cpp_lib_concepts
- else if constexpr (contiguous_iterator<_InputIterator>)
-   __p0 = std::to_address(__first);
+   || contiguous_iterator<_InputIterator>
 #endif
- if (__p0)
-   {
- const int __ival = static_cast(__val);
- if (auto __n = std::distance(__first, __last); __n > 0)
-   if (auto __p1 = __builtin_memchr(__p0, __ival, __n))
- return __first + ((const char*)__p1 - (const char*)__p0);
- return __last;
-   }
-   }
-   }
+)
+ {
+   // If conversion to the 1-byte value_type alters the value,
+   // it would not be found by std::find using equality comparison.
+   // We need to check this here, because otherwise something like
+   // memchr("a", 'a'+256, 1) would give a false positive match.
+   if (!(static_cast<_ValT>(__val) == __val))
+ return __last;
+   else if (!__is_constant_evaluated())
+ {
+   const void* __p0 = std::__to_address(__first);
+   const int __ival = static_cast(__val);
+   if (auto __n = std::distance(__first, __last); __n > 0)
+ if (auto __p1 = __builtin_memchr(__p0, __ival, __n))
+   return __first + ((const char*)__p1 - (const char*)__p0);
+   return __last;
+ }
+ }
 #endif
 
   return std::__find_if(__first, __last,
diff --git a/libstdc++-v3/testsuite/25_algorithms/find/bytes.cc 
b/libstdc++-v3/testsuite/25_algorithms/find/bytes.cc
index f4ac5d4018d..e1d6c01ab21 100644
--- a/libstdc++-v3/testsuite/25_algorithms/find/bytes.cc
+++ b/libstdc++-v3/testsuite/25_algorithms/find/bytes.cc
@@ -3,6 +3,7 @@
 #include 
 #include  // std::byte
 #include 
+#include 
 
 // PR libstdc++/88545 made std::find use memchr as an optimization.
 // This test verifies that it didn't change any semantics.
@@ -113,6 +114,12 @@ test_non_characters()
 #endif
 }
 
+void
+test_pr115799c2(__gnu_test::input_iterator_wrapper i)
+{
+  (void) std::find(i, i, 'a');
+}
+
 int main()
 {
   test_char();
-- 
2.45.2



[committed] libstdc++: Fix memchr path in std::ranges::find for non-common range [PR115799]

2024-07-07 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk.

-- >8 --

The memchr optimization introduced in r15-1857 needs to advance the
start iterator instead of returning the sentinel.
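
The shape of the fix can be sketched without the ranges machinery.  In the
sketch below the end of the range is a distinct Sentinel type, so "not
found" has to be expressed as 'first + n' rather than by returning the
sentinel.  Sentinel and find_char are hypothetical names used only for
illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// A sentinel that is a different type from the iterator (const char*).
struct Sentinel { const char* end; };

inline std::ptrdiff_t operator-(Sentinel s, const char* it)
{
  return s.end - it;
}

// Returns an iterator to the first match, or an iterator equal to the end.
const char* find_char(const char* first, Sentinel last, int value)
{
  std::ptrdiff_t n = last - first;
  // Only search if the value survives conversion to the element type.
  if (static_cast<char>(value) == value)
    if (n > 0)
      if (const void* p = std::memchr(first, value,
                                      static_cast<std::size_t>(n)))
        n = static_cast<const char*>(p) - first;
  return first + n;   // advance the start iterator; never return the sentinel
}
```

When nothing matches, n keeps its initial value, so the function returns an
iterator at the end position, which is the same type as a successful
result.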

libstdc++-v3/ChangeLog:

PR libstdc++/115799
* include/bits/ranges_util.h (__find_fn): Return iterator
instead of sentinel.
* testsuite/25_algorithms/find/constrained.cc: Check non-common
contiguous sized range of char.
---
 libstdc++-v3/include/bits/ranges_util.h   | 19 +--
 .../25_algorithms/find/constrained.cc | 10 ++
 2 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/libstdc++-v3/include/bits/ranges_util.h b/libstdc++-v3/include/bits/ranges_util.h
index 186acae4f70..a1f42875b11 100644
--- a/libstdc++-v3/include/bits/ranges_util.h
+++ b/libstdc++-v3/include/bits/ranges_util.h
@@ -501,17 +501,16 @@ namespace ranges
  if constexpr (contiguous_iterator<_Iter>)
if (!is_constant_evaluated())
  {
-   if (static_cast<iter_value_t<_Iter>>(__value) != __value)
- return __last;
-
+   using _Vt = iter_value_t<_Iter>;
auto __n = __last - __first;
-   if (__n > 0)
- {
-   const int __ival = static_cast<int>(__value);
-   const void* __p0 = std::to_address(__first);
-   if (auto __p1 = __builtin_memchr(__p0, __ival, __n))
- __n = (const char*)__p1 - (const char*)__p0;
- }
+   if (static_cast<_Vt>(__value) == __value) [[likely]]
+ if (__n > 0)
+   {
+ const int __ival = static_cast<int>(__value);
+ const void* __p0 = std::to_address(__first);
+ if (auto __p1 = __builtin_memchr(__p0, __ival, __n))
+   __n = (const char*)__p1 - (const char*)__p0;
+   }
return __first + __n;
  }
 
diff --git a/libstdc++-v3/testsuite/25_algorithms/find/constrained.cc b/libstdc++-v3/testsuite/25_algorithms/find/constrained.cc
index e94751fcf89..7357a40bcc4 100644
--- a/libstdc++-v3/testsuite/25_algorithms/find/constrained.cc
+++ b/libstdc++-v3/testsuite/25_algorithms/find/constrained.cc
@@ -66,9 +66,19 @@ test02()
   static_assert(ranges::find(y, 5, &Y::j) == y+3);
 }
 
+void
+test_pr115799()
+{
+  const char str[3] = { 'a', 'b', 'c' };
+  __gnu_test::test_contiguous_sized_range r(str);
+  VERIFY(std::ranges::find(r, 'a') == std::ranges::begin(r));
+  VERIFY(std::ranges::find(r, 'a'+255) == std::ranges::end(r));
+}
+
 int
 main()
 {
   test01();
   test02();
+  test_pr115799();
 }
-- 
2.45.2



Re: [RFC] tree-if-conv: Handle nonzero masked elements [PR115336].

2024-07-07 Thread Richard Biener



> On 07.07.2024 at 11:26, Robin Dapp wrote:
> 
> 
>> 
>> Yeah, I think so.  I guess for RVV there's a choice between:
>> 
>> (1) making the insn predicate accept all else values and making
>>the insn emit an explicit blend between the loaded result
>>and the else value
>> 
>> (2) making the insn predicate only accept “undefined” (SCRATCH in
>>rtl terms)
>> 
>> (2) sounds more in keeping with Juzhe's fix for PR110751.
> 
> 
> I think (2) is the reasonable choice for the future, however:
> 
> I misinterpreted the RVV spec.  The implementation has the choice
> between filling with ones or leaving unchanged (merging).  So it's
> not really "anything" as I claimed before.
> This implies that we could defer to target_preferred_else_value and
> eventually, in the target, leave it to the uarch what to return
> (either "1" or "undefined").
> 
> On the general strategy:
> Should we (for SVE, AVX512 etc.) have a match pattern that folds
> 
>  res = MASK_LOAD (mask, ptr, {0, 0, ..., 0})
>  COND_EXPR (mask, res, {0, 0, ...})
> 
> into just MASK_LOAD then?  MASK_LOAD (..., {0, 0, ..., 0}) would not
> be emitted for RVV and we'd need to fold the COND_EXPR differently,
> maybe with a combination of match patterns similar to the ones in
> my RFC as well as a separate "pass" for mask tracking as Richi
> suggested?

I think it's less than ideal to do this only via match patterns, but I don't
think we can do without them, given that unrolling and simplification can
expose these opportunities.

Richard 

> Regards
> Robin
> 
> p.s. We have the choice of different policies for length masking and
> element masking.  I don't think the problem arose yet due to the way
> we handle length control in the vectorizer but I could imagine, at
> least in principle, the need for two separate else values for MASK_LEN.


Re: [RFC/PATCH] libgcc: sh: Use soft-fp for non-hosted SH3/SH4

2024-07-07 Thread Sébastien Michelland

Hi!


The default sh-elf configuration has no multi-libs for SH3 and SH4 variants
without FPU (from what I can see).  So it won't use soft-fp so much during
sim testing.  So please change to soft-fp for sh*, not just SH3/SH4.


Got it, done that locally, and will update patch once tested.


Here's an old proposed change to the simtest instructions to not use
combined trees:

https://gcc.gnu.org/pipermail/gcc-patches/attachments/20140815/fb38918e/attachment.bin


Thanks for the instructions. Apologies for the back-and-forth as I'm 
pretty new with this infrastructure (I usually do research stuff on LLVM).


The split-tree build goes better, still fails with GCC 15 (as expected, 
though somehow my custom toolchain did build originally) and sort of 
works with GCC 14.


The binutils/gdb repos have been merged since that attachment, and 
while I can build binutils alone with --disable-gdb, building gdb (in 
another build folder, reconfiguring from scratch) seems iffy. The global 
CFLAGS/CXXFLAGS that switch to 32-bit affect at least parts of binutils, 
resulting in a broken toolchain due to architecture mixup:


---
% sh-elf-g++ /tmp/test.cc -o /tmp/test
$PREFIX/lib/gcc/sh-elf/14.1.1/../../../../sh-elf/bin/ld: 
$PREFIX/libexec/gcc/sh-elf/14.1.1/liblto_plugin.so: error loading 
plugin: $PREFIX/libexec/gcc/sh-elf/14.1.1/liblto_plugin.so: wrong ELF 
class: ELFCLASS64

---

My first build kept GDB as 64-bit, but running the test binary in 
sh-elf-run gave a Bus error. Even with the 32-bit GDB build, ignoring 
the broken toolchain and running that old binary still gives a Bus error.


For reference, here's my latest attempt.

---
% cd ${TOP}
% git clone git://sourceware.org/git/binutils-gdb.git
% git clone https://sourceware.org/git/newlib-cygwin.git newlib
% ln -sf PATH_TO_MY_GCC_14.1 gcc

% cd ${TOP}/build-binutils
% ../binutils-gdb/configure --target=sh-elf --prefix="$PREFIX" \
  --disable-nls --disable-werror --disable-gdb

% make all && make install

% cd ${TOP}/build-gcc
% ../gcc/configure --target=sh-elf --prefix="$PREFIX" \
  --enable-languages=c,c++ --disable-nls --disable-werror --with-newlib \
  --enable-lto --enable-multilib

% make all-gcc && make install-gcc

% cd ${TOP}/build-newlib
% ../newlib/configure --target=sh-elf --prefix="$PREFIX" --enable-lto \
  --enable-multilib

% make all && make install

% cd ${TOP}/build-gcc
% rm -rf *
% ../gcc/configure --target=sh-elf --prefix="$PREFIX" \
  --enable-languages=c,c++ --disable-nls --disable-werror --with-newlib \
  --enable-lto --enable-multilib

% make all && make install

% cd ${TOP}/build-gdb
% CFLAGS="-O2 -m32 -msse -mfpmath=sse" CXXFLAGS="-O2 -m32 -msse -mfpmath=sse" \
  ../binutils-gdb/configure --target=sh-elf --prefix="$PREFIX" \
  --enable-interwork --enable-multilib --disable-nls --disable-werror

% make all && make install
---

Yeah I just defaulted to SH3/SH4 conservatively because that's the only 
hardware I have. (My main platform also happens to be one of these SH4 
without an FPU, the SH4AL-DSP.)


Oh, wow, especially rare type!


How active are the main types? Like are there still new products 
designed with these (maybe the J2)?



The patterns were written and tested to the best of our knowledge at that
time many years ago.  Nobody thought that we'll get a 2nd combine pass after
RA.  Anyway, I'll have a look at the remaining patterns.


I'd be interested to learn more about the history of the SH backend, if 
anyone wrote that up somewhere...


Thanks again,
Sébastien


Re: [RFC] tree-if-conv: Handle nonzero masked elements [PR115336].

2024-07-07 Thread Robin Dapp
> Yeah, I think so.  I guess for RVV there's a choice between:
> 
> (1) making the insn predicate accept all else values and making
> the insn emit an explicit blend between the loaded result
> and the else value
> 
> (2) making the insn predicate only accept “undefined” (SCRATCH in
> rtl terms)
> 
> (2) sounds more in keeping with Juzhe's fix for PR110751.


I think (2) is the reasonable choice for the future, however:

I misinterpreted the RVV spec.  The implementation has the choice
between filling with ones or leaving unchanged (merging).  So it's
not really "anything" as I claimed before.
This implies that we could defer to target_preferred_else_value and
eventually, in the target, leave it to the uarch what to return
(either "1" or "undefined").

On the general strategy:
Should we (for SVE, AVX512 etc.) have a match pattern that folds
 
  res = MASK_LOAD (mask, ptr, {0, 0, ..., 0})
  COND_EXPR (mask, res, {0, 0, ...})

into just MASK_LOAD then?  MASK_LOAD (..., {0, 0, ..., 0}) would not
be emitted for RVV and we'd need to fold the COND_EXPR differently,
maybe with a combination of match patterns similar to the ones in
my RFC as well as a separate "pass" for mask tracking as Richi
suggested?
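
To make the proposed fold concrete, here is a small scalar model of the
two operations.  This is only a sketch under stated assumptions: mask_load
and cond_expr are hypothetical helpers that mimic the lane-wise semantics
discussed above on fixed-size arrays; they are not GCC internals or a
match.pd pattern.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t N = 4;
using Vec = std::array<int, N>;
using Mask = std::array<bool, N>;

// Lane-wise model of MASK_LOAD (mask, ptr, else_val): active lanes are
// loaded from memory, inactive lanes take the else value.  Memory is
// only read for active lanes.
Vec mask_load(const Mask& mask, const int* ptr, const Vec& else_val)
{
  Vec res{};
  for (std::size_t i = 0; i < N; ++i)
    res[i] = mask[i] ? ptr[i] : else_val[i];
  return res;
}

// Lane-wise model of COND_EXPR (mask, a, b).
Vec cond_expr(const Mask& mask, const Vec& a, const Vec& b)
{
  Vec res{};
  for (std::size_t i = 0; i < N; ++i)
    res[i] = mask[i] ? a[i] : b[i];
  return res;
}
```

In this model, COND_EXPR (mask, MASK_LOAD (mask, ptr, zeros), zeros) equals
MASK_LOAD (mask, ptr, zeros) for every mask, so the COND_EXPR is redundant.
With an all-ones else value the two differ in the inactive lanes, which is
why the fold depends on what the load leaves in masked-off elements.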

Regards
 Robin

p.s. We have the choice of different policies for length masking and
element masking.  I don't think the problem arose yet due to the way
we handle length control in the vectorizer but I could imagine, at
least in principle, the need for two separate else values for MASK_LEN.


[PATCH] fortran: Remove useless nested end of scalarization chain handling

2024-07-07 Thread Mikael Morin
Hello,

this is another small cleanup I had lying around.
Regression-tested on x86_64-linux.  Ok for master?

-- 8< --

Remove the special handling of end of nested scalarization chains, which
advanced the chain to an element of a parent chain when the current one
was reaching its end.

That handling was superfluous as nested chains correspond to nested
scalarizations of subexpressions and the scalarizations don't extend beyond
their associated subexpression and don't use any scalarization element from
the parent expression.

No change of behaviour, as the GFC_SE struct is supposed to be in its final
state anyway when the last element from the chain has been consumed.

gcc/fortran/ChangeLog:

* trans-expr.cc (gfc_advance_se_ss_chain): Don't use an element
from the parent scalarization chain when the current chain reaches
its end.
---
 gcc/fortran/trans-expr.cc | 11 +--
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index 477c2720187..f0862db5f17 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -2052,7 +2052,6 @@ void
 gfc_advance_se_ss_chain (gfc_se * se)
 {
   gfc_se *p;
-  gfc_ss *ss;
 
   gcc_assert (se != NULL && se->ss != NULL && se->ss != gfc_ss_terminator);
 
@@ -2064,15 +2063,7 @@ gfc_advance_se_ss_chain (gfc_se * se)
   gcc_assert (p->parent == NULL || p->parent->ss == p->ss
  || p->parent->ss->nested_ss == p->ss);
 
-  /* If we were in a nested loop, the next scalarized expression can be
-on the parent ss' next pointer.  Thus we should not take the next
-pointer blindly, but rather go up one nest level as long as next
-is the end of chain.  */
-  ss = p->ss;
-  while (ss->next == gfc_ss_terminator && ss->parent != NULL)
-   ss = ss->parent;
-
-  p->ss = ss->next;
+  p->ss = p->ss->next;
 
   p = p->parent;
 }
-- 
2.43.0



[x86 SSE PATCH] Some AVX512 ternlog expansion refinements.

2024-07-07 Thread Roger Sayle

Hi Hongtao,
This should address concerns about the remaining use of force_reg.

This patch replaces the call to force_reg in ix86_expand_ternlog_binop
with gen_reg_rtx and emit_move_insn, the last place where force_reg may
be called (indirectly) from ix86_expand_ternlog.  This patch also cleans
up whitespace, consistently uses CONST_VECTOR_P instead of GET_CODE and
tweaks checks for ix86_ternlog_leaf_p (for example where vpandn may take
a memory operand).

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2024-07-07  Roger Sayle  

gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_broadcast_from_constant):
Use CONST_VECTOR_P instead of comparison against GET_CODE.
(ix86_gen_bcst_mem): Likewise.
(ix86_ternlog_leaf_p): Likewise.
(ix86_ternlog_operand_p): ix86_ternlog_leaf_p is always true for
vector_all_ones_operand.
(ix86_expand_ternlog_binop): Use CONST_VECTOR_P instead of
equality comparison against GET_CODE.  Replace call to force_reg
with gen_reg_rtx and emit_move_insn (for VEC_DUPLICATE broadcast).
Support CONST_VECTORs, by calling force_const_mem.
(ix86_expand_ternlog): Fix indentation whitespace.
Allow ix86_ternlog_leaf_p as ix86_expand_ternlog_andnot's second
operand. Use CONST_VECTOR_P instead of equality against GET_CODE.


Thanks again,
Roger
--

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index dd2c3a8..c085786 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -612,7 +612,7 @@ ix86_broadcast_from_constant (machine_mode mode, rtx op)
 return nullptr;
 
   rtx constant = get_pool_constant (XEXP (op, 0));
-  if (GET_CODE (constant) != CONST_VECTOR)
+  if (!CONST_VECTOR_P (constant))
 return nullptr;
 
   /* There could be some rtx like
@@ -622,7 +622,7 @@ ix86_broadcast_from_constant (machine_mode mode, rtx op)
 {
   constant = simplify_subreg (mode, constant, GET_MODE (constant),
  0);
-  if (constant == nullptr || GET_CODE (constant) != CONST_VECTOR)
+  if (constant == nullptr || !CONST_VECTOR_P (constant))
return nullptr;
 }
 
@@ -25532,7 +25532,7 @@ static rtx
 ix86_gen_bcst_mem (machine_mode mode, rtx x)
 {
   if (!TARGET_AVX512F
-  || GET_CODE (x) != CONST_VECTOR
+  || !CONST_VECTOR_P (x)
   || (!TARGET_AVX512VL
  && (GET_MODE_SIZE (mode) != 64 || !TARGET_EVEX512))
   || !VALID_BCST_MODE_P (GET_MODE_INNER (mode))
@@ -25722,7 +25722,7 @@ ix86_ternlog_leaf_p (rtx op, machine_mode mode)
  problems splitting instructions.  */
   return register_operand (op, mode)
 || MEM_P (op)
-|| GET_CODE (op) == CONST_VECTOR
+|| CONST_VECTOR_P (op)
 || bcst_mem_operand (op, mode);
 }
 
@@ -25772,8 +25772,7 @@ ix86_ternlog_operand_p (rtx op)
   op1 = XEXP (op, 1);
   /* Prefer pxor, or one_cmpl2.  */
   if (ix86_ternlog_leaf_p (XEXP (op, 0), mode)
- && (ix86_ternlog_leaf_p (op1, mode)
- || vector_all_ones_operand (op1, mode)))
+ && ix86_ternlog_leaf_p (XEXP (op, 1), mode))
return false;
   break;
 
@@ -25793,15 +25792,20 @@ ix86_expand_ternlog_binop (enum rtx_code code, machine_mode mode,
   if (GET_MODE (op1) != mode)
 op1 = gen_lowpart (mode, op1);
 
-  if (GET_CODE (op0) == CONST_VECTOR)
+  if (CONST_VECTOR_P (op0))
 op0 = validize_mem (force_const_mem (mode, op0));
-  if (GET_CODE (op1) == CONST_VECTOR)
+  if (CONST_VECTOR_P (op1))
 op1 = validize_mem (force_const_mem (mode, op1));
 
   if (memory_operand (op0, mode))
 {
   if (memory_operand (op1, mode))
-   op0 = force_reg (mode, op0);
+   {
+ /* We can't use force_reg (mode, op0).  */
+ rtx reg = gen_reg_rtx (mode);
+ emit_move_insn (reg, op0);
+ op0 = reg;
+   }
   else
std::swap (op0, op1);
 }
@@ -25820,6 +25824,8 @@ ix86_expand_ternlog_andnot (machine_mode mode, rtx op0, rtx op1, rtx target)
   op0 = gen_rtx_NOT (mode, op0);
   if (GET_MODE (op1) != mode)
 op1 = gen_lowpart (mode, op1);
+  if (CONST_VECTOR_P (op1))
+op1 = validize_mem (force_const_mem (mode, op1));
   emit_move_insn (target, gen_rtx_AND (mode, op0, op1));
   return target;
 }
@@ -25856,9 +25862,9 @@ ix86_expand_ternlog (machine_mode mode, rtx op0, rtx op1, rtx op2, int idx,
 {
 case 0x00:
   if ((!op0 || !side_effects_p (op0))
-  && (!op1 || !side_effects_p (op1))
-  && (!op2 || !side_effects_p (op2)))
-{
+ && (!op1 || !side_effects_p (op1))
+ && (!op2 || !side_effects_p (op2)))
+   {
  emit_move_insn (target, CONST0_RTX (mode));
  return target;
}
@@ -25867,21 +25873,21 @@ ix86_expand_ternlog (machine_mode mode, rtx op0, rtx op1, rtx op2, int

[PATCH] fortran: Move definition of variable closer to its uses

2024-07-07 Thread Mikael Morin
Hello,

I have found this small cleanup lying in a local branch.
Regression-tested on x86_64-linux, OK for master?

-- 8< --

No change of behaviour, this makes a variable easier to track.

gcc/fortran/ChangeLog:

* trans-array.cc (gfc_trans_preloop_setup): Use a separate variable
for iteration.  Use directly the value of variable I if it is known.
Move the definition of the variable to the branch where the
remaining uses are.
---
 gcc/fortran/trans-array.cc | 31 +--
 1 file changed, 17 insertions(+), 14 deletions(-)

diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
index 510f429ef8e..c34c97257a9 100644
--- a/gcc/fortran/trans-array.cc
+++ b/gcc/fortran/trans-array.cc
@@ -4294,7 +4294,6 @@ gfc_trans_preloop_setup (gfc_loopinfo * loop, int dim, int flag,
   gfc_ss *ss, *pss;
   gfc_loopinfo *ploop;
   gfc_array_ref *ar;
-  int i;
 
   /* This code will be executed before entering the scalarization loop
  for this dimension.  */
@@ -4340,19 +4339,10 @@ gfc_trans_preloop_setup (gfc_loopinfo * loop, int dim, int flag,
  pss = ss;
}
 
-  if (dim == loop->dimen - 1)
-   i = 0;
-  else
-   i = dim + 1;
-
-  /* For the time being, there is no loop reordering.  */
-  gcc_assert (i == ploop->order[i]);
-  i = ploop->order[i];
-
   if (dim == loop->dimen - 1 && loop->parent == NULL)
{
  stride = gfc_conv_array_stride (info->descriptor,
- innermost_ss (ss)->dim[i]);
+ innermost_ss (ss)->dim[0]);
 
  /* Calculate the stride of the innermost loop.  Hopefully this will
 allow the backend optimizers to do their stuff more effectively.
@@ -4364,7 +4354,7 @@ gfc_trans_preloop_setup (gfc_loopinfo * loop, int dim, int flag,
 base offset of the array.  */
  if (info->ref)
{
- for (i = 0; i < ar->dimen; i++)
+ for (int i = 0; i < ar->dimen; i++)
{
  if (ar->dimen_type[i] != DIMEN_ELEMENT)
continue;
@@ -4374,8 +4364,21 @@ gfc_trans_preloop_setup (gfc_loopinfo * loop, int dim, int flag,
}
}
   else
-   /* Add the offset for the previous loop dimension.  */
-   add_array_offset (pblock, ploop, ss, ar, pss->dim[i], i);
+   {
+ int i;
+
+ if (dim == loop->dimen - 1)
+   i = 0;
+ else
+   i = dim + 1;
+
+ /* For the time being, there is no loop reordering.  */
+ gcc_assert (i == ploop->order[i]);
+ i = ploop->order[i];
+
+ /* Add the offset for the previous loop dimension.  */
+ add_array_offset (pblock, ploop, ss, ar, pss->dim[i], i);
+   }
 
   /* Remember this offset for the second loop.  */
   if (dim == loop->temp_dim - 1 && loop->parent == NULL)
-- 
2.43.0



Re: [RFC WIP] RAW_DATA_CST for #embed optimization

2024-07-07 Thread Richard Biener



> On 06.07.2024 at 16:56, Jakub Jelinek wrote:
> 
> On Sat, Jul 06, 2024 at 02:45:45PM +0200, Richard Biener wrote:
>>> Anyway, thoughts on this before I spend too much time on it?
>> 
>> Why do we have an "element type"?  Would
>> 
>> int a[] = {
>> #embed "cc1plus"
>> };
>> 
>> be valid?
> 
> Yes, that is valid.
> The way #embed is defined for C is that it is essentially just
> as if a huge sequence of integer literals like
> 127,69,76,70,2,1,1,3,0,0,0,0,0,0,0,0,2,0,62,0,1,0,0,0,80,211,64,0,0,0,0,0,64,0,0,0,0,0,0,0,8,253,...,0
> so it can appear anywhere in the IL where the grammar allows something
> like that.  So even
> void foo (...);
> void bar ()
> {
>  foo (
> #embed "cc1plus"
>  );
>  int i = 1 + (
> #embed "cc1plus"
>  ) + 2;
> }
> etc. is valid.
> I chose to greatly simplify things by not emitting CPP_EMBED for the
> boundary numbers of the sequence because otherwise one needs to deal with
> significantly more special cases, one can have
> const unsigned char a[] = { 13 + 25 *
> #embed "cc1plus"
> / 2, 0 };
> for example, or even something expected to be used in C often like
> const unsigned char b[] = {
> [64] =
> #embed "cc1plus"
> };
> and the advantage of the inner sequence elements is we know for sure
> it is preceded by CPP_COMMA and succeeded by it too.  If we e.g. used
> CPP_EMBED even for single element sequence, that can appear anywhere
> where a CPP_NUMBER can appear in the grammar, which is basically everywhere.
> 
> Right now the patch when lexing CPP_EMBED turns it into a RAW_DATA_CST
> with integer_type_node type, that reflects that it is from the preprocessor
> a sequence of int literals, and then when parsing an initializer peels off
> bytes into it, see e.g. the c-c++-common/cpp/embed-19.c test in the
> patch where some of the sequence elements initialize some fields in a
> struct, others an unsigned char array field and others some other fields
> again.  To simplify things it only keeps around the RAW_DATA_CST in the
> initializer of ARRAY_TYPE CONSTRUCTORs if they have INTEGER_TYPE elements
> with CHAR_BIT precision, so
> int a[] = {
> #embed "cc1plus"
> };
> is peeled off into a huge sequence of INTEGER_CST CONSTRUCTOR_ELTs.
> In theory if this is something that appears often enough in real-world code
> we might use RAW_DATA_CST even for that case, basically allocate 4 times as
> big backing STRING_CST and based on target endianity and storage reverse
> extend it from one buffer to another one.  I'd prefer to do that only if
> we really see people actually want that, because it will be more work.
> 
>> I suppose #embed itself is just "embedding" the target(?)
>> representation and the file encodes that in bytes as if laid out in
>> memory?
> 
> It is designed pretty much as the values you get by fread into unsigned char
> array.
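
The "values you get by fread into an unsigned char array" model, expanded
as a comma-separated sequence of integer literals, can be sketched roughly
as follows.  This is only an illustration of the described semantics, not
how libcpp implements #embed; read_bytes and as_int_literals are
hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Read a whole file as raw bytes, as fread into an unsigned char
// array would.
std::vector<unsigned char> read_bytes(const std::string& path)
{
  std::ifstream in(path, std::ios::binary);
  return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                    std::istreambuf_iterator<char>());
}

// Render the bytes the way the preprocessor conceptually expands them:
// one decimal integer literal per byte, separated by commas.
std::string as_int_literals(const std::vector<unsigned char>& bytes)
{
  std::string out;
  for (std::size_t i = 0; i < bytes.size(); ++i)
    {
      if (i != 0)
        out += ',';
      out += std::to_string(static_cast<int>(bytes[i]));
    }
  return out;
}
```

For the first four bytes of an ELF file (0x7f 'E' 'L' 'F'),
as_int_literals yields 127,69,76,70, matching the start of the sequence
quoted earlier in this thread.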
> 
>> Does anything in the #embed spec require actually reading the contents
>> of the embedded file?  For the above a fstat() would be enough to
>> deduce the size of a[].
> 
> For regular files, fstat would be good enough, for non-regular files one
> really has to read them into memory.
> But I think there are so many cases where we actually need to read and
> inspect the values at compile time I think having it always in memory
> (as implemented in the patch set) doesn't hurt.  E.g. for the first and
> last byte of the sequence we need to read those, any time one e.g. during
> constant expression evaluation does something like:
> constexpr unsigned char a[] = {
> #embed "cc1plus"
> };
> constexpr int b = a[6832];
> etc. we really need to read the value and interpret; similarly during
> optimizations we often do that as well.  ICF hashes the data to decide
> what is the same, ...
> Sure, having it all in memory will mean > 2GB embeds in 32-bit compilers
> will be tough, but in 64-bit compilers should work just fine, while I think
> e.g. right now you can't have an initialized > 4GB array without gaps
> because CONSTRUCTOR_ELTS is a vector and that uses unsigned int length.
> 
> What I think is important that we if at all possible keep it in memory once
> and refer to the libcpp buffer holding the file, don't copy stuff over and
> over, that is one of the reasons why compiling that
> #embed "cc1plus"
> right now without the optimizations (i.e. as the
> 127,69,76,70,2,1,1,3,0,0,0,0,0,0,0,0,2,0,62,0,1,0,0,0,80,211,64,0,0,0,0,0,64,0,0,0,0,0,0,0,8,253,...,0
> 261M sequence) just eats more than 26GBs and 5 minutes (stopped it after
> that).  E.g. STRING_CST is inappropriate because it owns the data (data sits
> in its payload) and currently is only valid as the whole initializer of
> the array, not just part of it.
> 
>> When preprocessing only I suppose #embed
>> isn't "resolved", right?
> 
> The series as posted will with -E preprocess it into something like
>   118,
> # 10 "embed-10.c"
> #embed "." __gnu__::__base64__( \
> "b2lkCmZvbyAodW5zaWduZWQgY2hhciAqcCkKewp9CgppbnQKbWFpbiAoKQp7CiAgdW5zaWduZWQg"
>  \
> "Y2hhciBhW10gPSB7CiAgICAjZW1iZ