Re: [PATCH v2] RISC-V: Support RVV VFWADD rounding mode intrinsic API

2023-08-02 Thread juzhe.zh...@rivai.ai
vfwadd needs to depend on FRM???

Did you check SPIKE? I am not sure, since I think vfwadd never overflows.

Besides, did you check that the MD pattern includes a dependency on FRM_REGNUM?



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-08-02 14:35
To: gcc-patches
CC: juzhe.zhong; kito.cheng; pan2.li; yanzhang.wang
Subject: [PATCH v2] RISC-V: Support RVV VFWADD rounding mode intrinsic API
From: Pan Li 
 
Update in v2:
 
1. Add vfwalu type to frm_mode.
2. Enhance the test cases for frm.
 
Original log:
 
This patch would like to support the rounding mode API for the VFWADD
VFSUB and VFRSUB as below samples.
 
* __riscv_vfwadd_vv_f64m2_rm
* __riscv_vfwadd_vv_f64m2_rm_m
* __riscv_vfwadd_vf_f64m2_rm
* __riscv_vfwadd_vf_f64m2_rm_m
* __riscv_vfwadd_wv_f64m2_rm
* __riscv_vfwadd_wv_f64m2_rm_m
* __riscv_vfwadd_wf_f64m2_rm
* __riscv_vfwadd_wf_f64m2_rm_m
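
For illustration, a minimal usage sketch in C (not part of the patch; the
exact prototype is assumed by analogy with the other _rm intrinsics in this
series, where the rounding-mode operand comes just before vl and uses the
standard frm encoding, 0 being round-to-nearest-even):

#include <riscv_vector.h>

/* Sketch only: widening vv add of two f32m1 sources into an f64m2 result,
   with an explicit RNE rounding mode.  */
vfloat64m2_t
widen_add_rne (vfloat32m1_t op1, vfloat32m1_t op2, size_t vl)
{
  return __riscv_vfwadd_vv_f64m2_rm (op1, op2, 0, vl);
}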
 
Signed-off-by: Pan Li 
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-builtins-bases.cc
(class widen_binop_frm): New class for binop frm.
(BASE): Add vfwadd_frm.
* config/riscv/riscv-vector-builtins-bases.h: New declaration.
* config/riscv/riscv-vector-builtins-functions.def
(vfwadd_frm): New function definition.
* config/riscv/riscv-vector-builtins-shapes.cc
(BASE_NAME_MAX_LEN): New macro.
(struct alu_frm_def): Leverage new base class.
(struct build_frm_base): New build base for frm.
(struct widen_alu_frm_def): New struct for widen alu frm.
(SHAPE): Add widen_alu_frm shape.
* config/riscv/riscv-vector-builtins-shapes.h: New declaration.
* config/riscv/vector.md (frm_mode): Add vfwalu type.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/float-point-widening-add.c: New test.
---
.../riscv/riscv-vector-builtins-bases.cc  | 37 +++
.../riscv/riscv-vector-builtins-bases.h   |  1 +
.../riscv/riscv-vector-builtins-functions.def |  4 ++
.../riscv/riscv-vector-builtins-shapes.cc | 66 +++
.../riscv/riscv-vector-builtins-shapes.h  |  1 +
gcc/config/riscv/vector.md|  2 +-
.../riscv/rvv/base/float-point-widening-add.c | 66 +++
7 files changed, 164 insertions(+), 13 deletions(-)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-widening-add.c
 
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 035cafc43b3..981a4a7ede8 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -315,6 +315,41 @@ public:
   }
};
+/* Implements below instructions for frm
+   - vfwadd
+*/
+template
+class widen_binop_frm : public function_base
+{
+public:
+  bool has_rounding_mode_operand_p () const override { return true; }
+
+  rtx expand (function_expander &e) const override
+  {
+switch (e.op_info->op)
+  {
+  case OP_TYPE_vv:
+ return e.use_exact_insn (
+   code_for_pred_dual_widen (CODE, e.vector_mode ()));
+  case OP_TYPE_vf:
+ return e.use_exact_insn (
+   code_for_pred_dual_widen_scalar (CODE, e.vector_mode ()));
+  case OP_TYPE_wv:
+ if (CODE == PLUS)
+   return e.use_exact_insn (
+ code_for_pred_single_widen_add (e.vector_mode ()));
+ else
+   return e.use_exact_insn (
+ code_for_pred_single_widen_sub (e.vector_mode ()));
+  case OP_TYPE_wf:
+ return e.use_exact_insn (
+   code_for_pred_single_widen_scalar (CODE, e.vector_mode ()));
+  default:
+ gcc_unreachable ();
+  }
+  }
+};
+
/* Implements vrsub.  */
class vrsub : public function_base
{
@@ -2063,6 +2098,7 @@ static CONSTEXPR const binop_frm vfsub_frm_obj;
static CONSTEXPR const reverse_binop vfrsub_obj;
static CONSTEXPR const reverse_binop_frm vfrsub_frm_obj;
static CONSTEXPR const widen_binop vfwadd_obj;
+static CONSTEXPR const widen_binop_frm vfwadd_frm_obj;
static CONSTEXPR const widen_binop vfwsub_obj;
static CONSTEXPR const binop vfmul_obj;
static CONSTEXPR const binop vfdiv_obj;
@@ -2292,6 +2328,7 @@ BASE (vfsub_frm)
BASE (vfrsub)
BASE (vfrsub_frm)
BASE (vfwadd)
+BASE (vfwadd_frm)
BASE (vfwsub)
BASE (vfmul)
BASE (vfdiv)
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h 
b/gcc/config/riscv/riscv-vector-builtins-bases.h
index 5c6b239c274..f9e1df5fe75 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -148,6 +148,7 @@ extern const function_base *const vfsub_frm;
extern const function_base *const vfrsub;
extern const function_base *const vfrsub_frm;
extern const function_base *const vfwadd;
+extern const function_base *const vfwadd_frm;
extern const function_base *const vfwsub;
extern const function_base *const vfmul;
extern const function_base *const vfmul;
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def 
b/gcc/config/riscv/riscv-vector-builtins-functions.def
index fa1c2cef970..743205a9b97 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -304,6 +304,10 @@ DEF_RVV_FUNCTION (vfwadd, widen_alu, ful

Fix profile update after cancelled loop distribution

2023-08-02 Thread Jan Hubicka via Gcc-patches
Hi,
Loop distribution and ifcvt introduce versions of loops which may be removed
later if vectorization fails.  Ifcvt does this by temporarily breaking the
profile and producing a conditional that has two arms with 100% probability,
because we know one of the versions will be removed.

Loop distribution is trickier, since it introduces a test for alignment that
either survives to the final code if vectorization succeeds or is turned into
a constant if it fails.

Here we need to assign some reasonable probabilities for the case where
vectorization goes well, so this code adds logic to scale the profile back
when we remove the call.

This is not perfect, since we drop precise BB counts to guessed ones.  That is
not a big deal, since we do not rely much on the reliability of BB counts after
this point.  The other option would be to apply the scale only if vectorization
succeeds, which however needs a bit more work on the tree-loop-distribution
side and would need all the code in this patch, with the small change that
fold_loop_internal_call would have to know how to adjust if the conditional
stays.  I decided to go for the easier solution for now.

Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog:

* cfg.cc (scale_strictly_dominated_blocks): New function.
* cfg.h (scale_strictly_dominated_blocks): Declare.
* tree-cfg.cc (fold_loop_internal_call): Fixup CFG profile.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/pr98308.c: Check that profile is consistent.

diff --git a/gcc/cfg.cc b/gcc/cfg.cc
index 0de6d6b9e71..9eb9916f61a 100644
--- a/gcc/cfg.cc
+++ b/gcc/cfg.cc
@@ -1195,3 +1195,27 @@ get_loop_copy (class loop *loop)
   else
 return NULL;
 }
+
+/* Scales the frequencies of all basic blocks that are strictly
+   dominated by BB by NUM/DEN.  */
+
+void
+scale_strictly_dominated_blocks (basic_block bb,
+profile_count num, profile_count den)
+{
+  basic_block son;
+
+  if (!den.nonzero_p () && !(num == profile_count::zero ()))
+return;
+  auto_vec  worklist;
+  worklist.safe_push (bb);
+
+  while (!worklist.is_empty ())
+for (son = first_dom_son (CDI_DOMINATORS, worklist.pop ());
+son;
+son = next_dom_son (CDI_DOMINATORS, son))
+  {
+   son->count = son->count.apply_scale (num, den);
+   worklist.safe_push (son);
+  }
+}
diff --git a/gcc/cfg.h b/gcc/cfg.h
index 4bf4263ebfc..a0e944979c8 100644
--- a/gcc/cfg.h
+++ b/gcc/cfg.h
@@ -127,6 +127,8 @@ extern void set_bb_copy (basic_block, basic_block);
 extern basic_block get_bb_copy (basic_block);
 void set_loop_copy (class loop *, class loop *);
 class loop *get_loop_copy (class loop *);
+void scale_strictly_dominated_blocks (basic_block,
+ profile_count, profile_count);
 
 /* Generic RAII class to allocate a bit from storage of integer type T.
The allocated bit is accessible as mask with the single bit set
diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
index c65af8cc800..c158454946c 100644
--- a/gcc/tree-cfg.cc
+++ b/gcc/tree-cfg.cc
@@ -7703,6 +7703,44 @@ fold_loop_internal_call (gimple *g, tree value)
   FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
SET_USE (use_p, value);
   update_stmt (use_stmt);
+  /* If we turn conditional to constant, scale profile counts.
+We know that the conditional was created by loop distribution
+and all basic blocks dominated by the taken edge are part of
+the loop distributed.  */
+  if (gimple_code (use_stmt) == GIMPLE_COND)
+   {
+ edge true_edge, false_edge;
+ extract_true_false_edges_from_block (gimple_bb (use_stmt),
+  &true_edge, &false_edge);
+ edge taken_edge = NULL, other_edge = NULL;
+ if (gimple_cond_true_p (as_a (use_stmt)))
+   {
+ taken_edge = true_edge;
+ other_edge = false_edge;
+   }
+ else if (gimple_cond_false_p (as_a (use_stmt)))
+   {
+ taken_edge = false_edge;
+ other_edge = true_edge;
+   }
+ if (taken_edge
+ && !(taken_edge->probability == profile_probability::always ()))
+   {
+ profile_count old_count = taken_edge->count ();
+ profile_count new_count = taken_edge->src->count;
+ taken_edge->probability = profile_probability::always ();
+ other_edge->probability = profile_probability::never ();
+ /* If we have multiple predecessors, we can't use the dominance
+test.  This should not happen as the guarded code should
+start with pre-header.  */
+ gcc_assert (single_pred_edge (taken_edge->dest));
+ taken_edge->dest->count
+   = taken_edge->dest->count.apply_scale (new_count,
+  old_count);
+ scale_strictly_dominated_blocks (taken_edge->dest,
+  new_count, old_count);
+   }
+  

RE: [PATCH v2] RISC-V: Support RVV VFWADD rounding mode intrinsic API

2023-08-02 Thread Li, Pan2 via Gcc-patches
> vfwadd needs to depend on FRM???
> Did you check SPIKE? I am not sure, since I think vfwadd never overflows.

The VI_VFP_VF_LOOP_WIDE macro depends on VI_VFP_COMMON, which requires
STATE.frm->read(). AFAIK, precision loss will also result in rounding, since
floating point is discretized by design. For example, as below, a big number
plus/minus a very small number.

2 * SEW = SEW - SEW, but the real value of SEW - SEW cannot always be
represented in 2 * SEW, and then we may have a precision exception which needs
rounding.

200.09997474f (real) = 0.09997474f(0X3727C5AC) + 200.0f 
( 0X49F42400) = 200.10761449f (0X413E8480A7C6)
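
To make the rounding point concrete, here is a small host-side C sketch of my
own (an illustration, not taken from the patch): even when the sum is computed
in the double-width type, the exact result of adding a large and a tiny
single-precision value can need more mantissa bits than the wide type has, so
the widened result still depends on the active rounding mode.

#include <stdio.h>

int
main (void)
{
  float big = 1.0e30f, tiny = 1.0e-30f;
  /* The exact sum needs roughly 200 significant bits, far more than binary64
     provides, so even the double-width addition has to round; under
     round-to-nearest the tiny addend is dropped entirely.  */
  double wide = (double) big + (double) tiny;
  printf ("%d\n", wide == (double) big);   /* prints 1 */
  return 0;
}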

>Besides, did you check that the MD pattern includes a dependency on FRM_REGNUM?

Yes, (reg:SI FRM_REGNUM) is included, and the tests cover both the rm and
non-rm parts.

Pan


From: juzhe.zh...@rivai.ai 
Sent: Wednesday, August 2, 2023 3:07 PM
To: Li, Pan2 ; gcc-patches 
Cc: Kito.cheng ; Li, Pan2 ; Wang, 
Yanzhang 
Subject: Re: [PATCH v2] RISC-V: Support RVV VFWADD rounding mode intrinsic API

vfwadd needs to depend on FRM???

Did you check SPIKE ? I am not sure since I think vfwadd never overflow.

Besides, do you check the MD pattern has include dependency of FRM_REGNUM?


juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-08-02 14:35
To: gcc-patches
CC: juzhe.zhong; 
kito.cheng; pan2.li; 
yanzhang.wang
Subject: [PATCH v2] RISC-V: Support RVV VFWADD rounding mode intrinsic API
From: Pan Li mailto:pan2...@intel.com>>

Update in v2:

1. Add vfwalu type to frm_mode.
2. Enhance the test cases for frm.

Original log:

This patch would like to support the rounding mode API for the VFWADD
VFSUB and VFRSUB as below samples.

* __riscv_vfwadd_vv_f64m2_rm
* __riscv_vfwadd_vv_f64m2_rm_m
* __riscv_vfwadd_vf_f64m2_rm
* __riscv_vfwadd_vf_f64m2_rm_m
* __riscv_vfwadd_wv_f64m2_rm
* __riscv_vfwadd_wv_f64m2_rm_m
* __riscv_vfwadd_wf_f64m2_rm
* __riscv_vfwadd_wf_f64m2_rm_m

Signed-off-by: Pan Li mailto:pan2...@intel.com>>

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(class widen_binop_frm): New class for binop frm.
(BASE): Add vfwadd_frm.
* config/riscv/riscv-vector-builtins-bases.h: New declaration.
* config/riscv/riscv-vector-builtins-functions.def
(vfwadd_frm): New function definition.
* config/riscv/riscv-vector-builtins-shapes.cc
(BASE_NAME_MAX_LEN): New macro.
(struct alu_frm_def): Leverage new base class.
(struct build_frm_base): New build base for frm.
(struct widen_alu_frm_def): New struct for widen alu frm.
(SHAPE): Add widen_alu_frm shape.
* config/riscv/riscv-vector-builtins-shapes.h: New declaration.
* config/riscv/vector.md (frm_mode): Add vfwalu type.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-widening-add.c: New test.
---
.../riscv/riscv-vector-builtins-bases.cc  | 37 +++
.../riscv/riscv-vector-builtins-bases.h   |  1 +
.../riscv/riscv-vector-builtins-functions.def |  4 ++
.../riscv/riscv-vector-builtins-shapes.cc | 66 +++
.../riscv/riscv-vector-builtins-shapes.h  |  1 +
gcc/config/riscv/vector.md|  2 +-
.../riscv/rvv/base/float-point-widening-add.c | 66 +++
7 files changed, 164 insertions(+), 13 deletions(-)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-widening-add.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 035cafc43b3..981a4a7ede8 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -315,6 +315,41 @@ public:
   }
};
+/* Implements below instructions for frm
+   - vfwadd
+*/
+template
+class widen_binop_frm : public function_base
+{
+public:
+  bool has_rounding_mode_operand_p () const override { return true; }
+
+  rtx expand (function_expander &e) const override
+  {
+switch (e.op_info->op)
+  {
+  case OP_TYPE_vv:
+ return e.use_exact_insn (
+   code_for_pred_dual_widen (CODE, e.vector_mode ()));
+  case OP_TYPE_vf:
+ return e.use_exact_insn (
+   code_for_pred_dual_widen_scalar (CODE, e.vector_mode ()));
+  case OP_TYPE_wv:
+ if (CODE == PLUS)
+   return e.use_exact_insn (
+ code_for_pred_single_widen_add (e.vector_mode ()));
+ else
+   return e.use_exact_insn (
+ code_for_pred_single_widen_sub (e.vector_mode ()));
+  case OP_TYPE_wf:
+ return e.use_exact_insn (
+   code_for_pred_single_widen_scalar (CODE, e.vector_mode ()));
+  default:
+ gcc_unreachable ();
+  }
+  }
+};
+
/* Implements vrsub.  */
class vrsub : public function_base
{
@@ -2063,6 +2098,7 @@ static CONSTEXPR const binop_frm vfsub_frm_obj;
static CONSTEXPR const reverse_binop vfrsub_obj;
static CONSTEXPR const reverse_binop_frm vfrsub_frm_obj;
static CONST

Re: Re: [PATCH V2] VECT: Support CALL vectorization for COND_LEN_*

2023-08-02 Thread Richard Biener via Gcc-patches
On Mon, 31 Jul 2023, ??? wrote:

> Oh, Thanks a lot.
> I can test it in the RISC-V backend now.
> 
> But I have another question:
> >> I'm a bit confused (but also by the existing mask code), whether
> >>vect_nargs needs adjustment depends on the IFN in the IL we analyze.
> >>If if-conversion recognizes a .COND_ADD then we need to add nothing
> >>for masking (that is, ifn == cond_fn already).  In your code above
> >>you either use cond_len_fn or get_len_internal_fn (cond_fn) but
> >>isn't that the very same?!  So how come you in one case add two
> >>and in the other add four args?
> >>Please make sure to place gcc_unreachable () in each arm and check
> >>you have test coverage.  I believe that the else arm is unreachable
> >>but when you vectorize .FMA you will need to add 4 and when you
> >>vectorize .COND_FMA you will need to add two arguments (as said,
> >>no idea why we special case reduc_idx >= 0 at the moment).
> 
> Do you mean I add gcc_unreachable in else like this:
> 
>   if (len_loop_p)
> {
>   if (len_opno >= 0)
> {
>   ifn = cond_len_fn;
>   /* COND_* -> COND_LEN_* takes 2 extra arguments:LEN,BIAS.  */
>   vect_nargs += 2;
> }
>   else if (reduc_idx >= 0)
> {
>   /* FMA -> COND_LEN_FMA takes 4 extra arguments:MASK,ELSE,LEN,BIAS.  
> */
>   ifn = get_len_internal_fn (cond_fn);
>   vect_nargs += 4;

no, a gcc_unreachable () here.  That is, make sure you have test coverage
for the above two cases (to me the len_opno >= 0 case is obvious)

> }
> else
> gcc_unreachable ();
> }
> 
> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-07-31 21:58
> To: ???
> CC: richard.sandiford; gcc-patches
> Subject: Re: Re: [PATCH V2] VECT: Support CALL vectorization for COND_LEN_*
> On Mon, 31 Jul 2023, ??? wrote:
>  
> > Yeah. I have tried this case too.
> > 
> > But this case doesn't need to be vectorized as COND_FMA, am I right?
>  
> Only when you enable loop masking.  Alternatively use
>  
> double foo (double *a, double *b, double *c)
> {
>   double result = 0.0;
>   for (int i = 0; i < 1024; ++i)
> result += i & 1 ? __builtin_fma (a[i], b[i], c[i]) : 0.0;
>   return result;
> }
>  
> but then for me if-conversion produces
>  
>   iftmp.0_18 = __builtin_fma (_8, _10, _5);
>   _ifc__43 = _26 ? iftmp.0_18 : 0.0;
>  
> with -ffast-math (probably rightfully so).  I then get .FMAs
> vectorized and .COND_FMA folded.
>  
> > The thing I wonder is whether this condition:
> > 
> > if  (mask_opno >= 0 && reduc_idx >= 0)
> > 
> > or similar as len
> > if  (len_opno >= 0 && reduc_idx >= 0)
> > 
> > Whether they are redundant in vectorizable_call ?
> > 
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-07-31 21:33
> > To: juzhe.zh...@rivai.ai
> > CC: richard.sandiford; gcc-patches
> > Subject: Re: Re: [PATCH V2] VECT: Support CALL vectorization for COND_LEN_*
> > On Mon, 31 Jul 2023, juzhe.zh...@rivai.ai wrote:
> >  
> > > Hi, Richi.
> > > 
> > > >> I think you need to use fma from math.h together with -ffast-math
> > > >>to get fma.
> > > 
> > > As you said, this is one of the case I tried:
> > > https://godbolt.org/z/xMzrrv5dT 
> > > GCC failed to vectorize.
> > > 
> > > Could you help me with this?
> >  
> > double foo (double *a, double *b, double *c)
> > {
> >   double result = 0.0;
> >   for (int i = 0; i < 1024; ++i)
> > result += __builtin_fma (a[i], b[i], c[i]);
> >   return result;
> > }
> >  
> > with -mavx2 -mfma -Ofast this is vectorized on x86_64 to
> >  
> > ...
> >   vect__9.13_27 = MEM  [(double *)vectp_a.11_29];
> >   _9 = *_8;
> >   vect__10.14_26 = .FMA (vect__7.10_30, vect__9.13_27, vect__4.7_33);
> >   vect_result_17.15_25 = vect__10.14_26 + vect_result_20.4_36;
> > ...
> >  
> > but ifcvt still shows
> >  
> >   _9 = *_8;
> >   _10 = __builtin_fma (_7, _9, _4);
> >   result_17 = _10 + result_20;
> >  
> > still vectorizable_call has IFN_FMA with
> >  
> >   /* First try using an internal function.  */
> >   code_helper convert_code = MAX_TREE_CODES;
> >   if (cfn != CFN_LAST
> >   && (modifier == NONE
> >   || (modifier == NARROW
> >   && simple_integer_narrowing (vectype_out, vectype_in,
> >&convert_code
> > ifn = vectorizable_internal_function (cfn, callee, vectype_out,
> >   vectype_in);
> >  
> > from CFN_BUILT_IN_FMA
> >  
> >  
> >  
> > > Thanks.
> > > 
> > > 
> > > juzhe.zh...@rivai.ai
> > >  
> > > From: Richard Biener
> > > Date: 2023-07-31 20:00
> > > To: juzhe.zh...@rivai.ai
> > > CC: richard.sandiford; gcc-patches
> > > Subject: Re: Re: [PATCH V2] VECT: Support CALL vectorization for 
> > > COND_LEN_*
> > > On Mon, 31 Jul 2023, juzhe.zh...@rivai.ai wrote:
> > >  
> > > > Ok . Thanks Richard.
> > > > 
> > > > Could you give me a case that SVE can vectorize a reduction with FMA?
> > > > Meaning it will go into vectorize

Re: RE: [PATCH v2] RISC-V: Support RVV VFWADD rounding mode intrinsic API

2023-08-02 Thread juzhe.zh...@rivai.ai
Ok. LGTM.



juzhe.zh...@rivai.ai
 
From: Li, Pan2
Date: 2023-08-02 15:38
To: juzhe.zh...@rivai.ai; gcc-patches
CC: Kito.cheng; Wang, Yanzhang
Subject: RE: [PATCH v2] RISC-V: Support RVV VFWADD rounding mode intrinsic API
> vfwadd needs to depend on FRM???
> Did you check SPIKE? I am not sure, since I think vfwadd never overflows.
 
The VI_VFP_VF_LOOP_WIDE macro depends on VI_VFP_COMMON, which requires
STATE.frm->read(). AFAIK, precision loss will also result in rounding, since
floating point is discretized by design. For example, as below, a big number
plus/minus a very small number.

2 * SEW = SEW - SEW, but the real value of SEW - SEW cannot always be
represented in 2 * SEW, and then we may have a precision exception which needs
rounding.
 
200.09997474f (real) = 0.09997474f(0X3727C5AC) + 200.0f 
( 0X49F42400) = 200.10761449f (0X413E8480A7C6)
 
>Besides, did you check that the MD pattern includes a dependency on FRM_REGNUM?
 
Yes, (reg:SI FRM_REGNUM) is included, and the tests cover both the rm and
non-rm parts.
 
Pan
 
 
From: juzhe.zh...@rivai.ai  
Sent: Wednesday, August 2, 2023 3:07 PM
To: Li, Pan2 ; gcc-patches 
Cc: Kito.cheng ; Li, Pan2 ; Wang, 
Yanzhang 
Subject: Re: [PATCH v2] RISC-V: Support RVV VFWADD rounding mode intrinsic API
 
vfwadd needs to depend on FRM???
 
Did you check SPIKE ? I am not sure since I think vfwadd never overflow.
 
Besides, do you check the MD pattern has include dependency of FRM_REGNUM?
 


juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-08-02 14:35
To: gcc-patches
CC: juzhe.zhong; kito.cheng; pan2.li; yanzhang.wang
Subject: [PATCH v2] RISC-V: Support RVV VFWADD rounding mode intrinsic API
From: Pan Li 
 
Update in v2:
 
1. Add vfwalu type to frm_mode.
2. Enhance the test cases for frm.
 
Original log:
 
This patch would like to support the rounding mode API for the VFWADD
VFSUB and VFRSUB as below samples.
 
* __riscv_vfwadd_vv_f64m2_rm
* __riscv_vfwadd_vv_f64m2_rm_m
* __riscv_vfwadd_vf_f64m2_rm
* __riscv_vfwadd_vf_f64m2_rm_m
* __riscv_vfwadd_wv_f64m2_rm
* __riscv_vfwadd_wv_f64m2_rm_m
* __riscv_vfwadd_wf_f64m2_rm
* __riscv_vfwadd_wf_f64m2_rm_m
 
Signed-off-by: Pan Li 
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-builtins-bases.cc
(class widen_binop_frm): New class for binop frm.
(BASE): Add vfwadd_frm.
* config/riscv/riscv-vector-builtins-bases.h: New declaration.
* config/riscv/riscv-vector-builtins-functions.def
(vfwadd_frm): New function definition.
* config/riscv/riscv-vector-builtins-shapes.cc
(BASE_NAME_MAX_LEN): New macro.
(struct alu_frm_def): Leverage new base class.
(struct build_frm_base): New build base for frm.
(struct widen_alu_frm_def): New struct for widen alu frm.
(SHAPE): Add widen_alu_frm shape.
* config/riscv/riscv-vector-builtins-shapes.h: New declaration.
* config/riscv/vector.md (frm_mode): Add vfwalu type.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/float-point-widening-add.c: New test.
---
.../riscv/riscv-vector-builtins-bases.cc  | 37 +++
.../riscv/riscv-vector-builtins-bases.h   |  1 +
.../riscv/riscv-vector-builtins-functions.def |  4 ++
.../riscv/riscv-vector-builtins-shapes.cc | 66 +++
.../riscv/riscv-vector-builtins-shapes.h  |  1 +
gcc/config/riscv/vector.md|  2 +-
.../riscv/rvv/base/float-point-widening-add.c | 66 +++
7 files changed, 164 insertions(+), 13 deletions(-)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-widening-add.c
 
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 035cafc43b3..981a4a7ede8 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -315,6 +315,41 @@ public:
   }
};
+/* Implements below instructions for frm
+   - vfwadd
+*/
+template
+class widen_binop_frm : public function_base
+{
+public:
+  bool has_rounding_mode_operand_p () const override { return true; }
+
+  rtx expand (function_expander &e) const override
+  {
+switch (e.op_info->op)
+  {
+  case OP_TYPE_vv:
+ return e.use_exact_insn (
+   code_for_pred_dual_widen (CODE, e.vector_mode ()));
+  case OP_TYPE_vf:
+ return e.use_exact_insn (
+   code_for_pred_dual_widen_scalar (CODE, e.vector_mode ()));
+  case OP_TYPE_wv:
+ if (CODE == PLUS)
+   return e.use_exact_insn (
+ code_for_pred_single_widen_add (e.vector_mode ()));
+ else
+   return e.use_exact_insn (
+ code_for_pred_single_widen_sub (e.vector_mode ()));
+  case OP_TYPE_wf:
+ return e.use_exact_insn (
+   code_for_pred_single_widen_scalar (CODE, e.vector_mode ()));
+  default:
+ gcc_unreachable ();
+  }
+  }
+};
+
/* Implements vrsub.  */
class vrsub : public function_base
{
@@ -2063,6 +2098,7 @@ static CONSTEXPR const binop_frm vfsub_frm_obj;
static CONSTEXPR const reverse_binop vfrsub_obj;
static CONSTEXPR const reverse_binop_frm vfrsub_frm_obj;
static CONSTEXPR const widen_binop vfwadd_obj;
+static

[PATCH v1] RISC-V: Enhance the test case for RVV vfsub/vfrsub rounding

2023-08-02 Thread Pan Li via Gcc-patches
From: Pan Li 

This patch would like to enhance the vfsub/vfrsub rounding API tests for the
below 2 purposes.

* The non-rm API has no frm related insn generated.
* The rm API has the frm backup/restore/set insn generated.
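
As a rough sketch of what the new scan patterns below are counting (my
reading, not text from the patch): an _rm call is expected to back up the
dynamic rounding mode with frrm, switch to the requested static mode with
fsrmi, emit the vector instruction, and restore the old mode with fsrm, while
the plain API leaves frm untouched.  In C terms:

#include <riscv_vector.h>

/* Expected to expand to roughly:
     frrm  aN        back up the current frm
     fsrmi 3         set the requested rounding mode
     vfrsub.vf ...
     fsrm  aN        restore the previous frm  */
vfloat32m1_t
sketch_rm (vfloat32m1_t op1, float op2, size_t vl)
{
  return __riscv_vfrsub_vf_f32m1_rm (op1, op2, 3, vl);
}

/* Expected to expand to just the vfrsub.vf, with no frm access at all.  */
vfloat32m1_t
sketch_plain (vfloat32m1_t op1, float op2, size_t vl)
{
  return __riscv_vfrsub_vf_f32m1 (op1, op2, vl);
}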

Signed-off-by: Pan Li 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-single-rsub.c: Enhance
cases.
* gcc.target/riscv/rvv/base/float-point-single-sub.c: Ditto.
---
 .../riscv/rvv/base/float-point-single-rsub.c | 16 +++-
 .../riscv/rvv/base/float-point-single-sub.c  | 16 +++-
 2 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-rsub.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-rsub.c
index 1d770adc32c..86c56b7c6cb 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-rsub.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-rsub.c
@@ -16,4 +16,18 @@ test_vfrsub_vf_f32m1_rm_m (vbool32_t mask, vfloat32m1_t op1, 
float32_t op2,
   return __riscv_vfrsub_vf_f32m1_rm_m (mask, op1, op2, 3, vl);
 }
 
-/* { dg-final { scan-assembler-times 
{vfrsub\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[fav]+[0-9]+} 2 } } */
+vfloat32m1_t
+test_vfrsub_vf_f32m1 (vfloat32m1_t op1, float32_t op2, size_t vl) {
+  return __riscv_vfrsub_vf_f32m1 (op1, op2, vl);
+}
+
+vfloat32m1_t
+test_vfrsub_vf_f32m1_m (vbool32_t mask, vfloat32m1_t op1, float32_t op2,
+   size_t vl) {
+  return __riscv_vfrsub_vf_f32m1_m (mask, op1, op2, vl);
+}
+
+/* { dg-final { scan-assembler-times 
{vfrsub\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[fav]+[0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {frrm\s+[axs][0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {fsrm\s+[axs][0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {fsrmi\s+[01234]} 2 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-sub.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-sub.c
index 34ed03a31d9..8075dced0b9 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-sub.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-sub.c
@@ -27,4 +27,18 @@ test_vfsub_vf_f32m1_rm_m (vbool32_t mask, vfloat32m1_t op1, 
float32_t op2,
   return __riscv_vfsub_vf_f32m1_rm_m (mask, op1, op2, 3, vl);
 }
 
-/* { dg-final { scan-assembler-times 
{vfsub\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[fav]+[0-9]+} 4 } } */
+vfloat32m1_t
+test_riscv_vfsub_vv_f32m1 (vfloat32m1_t op1, vfloat32m1_t op2, size_t vl) {
+  return __riscv_vfsub_vv_f32m1 (op1, op2, vl);
+}
+
+vfloat32m1_t
+test_vfsub_vv_f32m1_m (vbool32_t mask, vfloat32m1_t op1, vfloat32m1_t op2,
+  size_t vl) {
+  return __riscv_vfsub_vv_f32m1_m (mask, op1, op2, vl);
+}
+
+/* { dg-final { scan-assembler-times 
{vfsub\.v[vf]\s+v[0-9]+,\s*v[0-9]+,\s*[fav]+[0-9]+} 6 } } */
+/* { dg-final { scan-assembler-times {frrm\s+[axs][0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {fsrm\s+[axs][0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {fsrmi\s+[01234]} 4 } } */
-- 
2.34.1



Re: [PATCH] ipa-sra: Don't consider CLOBBERS as writes preventing splitting

2023-08-02 Thread Richard Biener via Gcc-patches
On Mon, Jul 31, 2023 at 7:05 PM Martin Jambor  wrote:
>
> Hi,
>
> when IPA-SRA detects whether a parameter passed by reference is
> written to, it does not special case CLOBBERs which means it often
> bails out unnecessarily, especially when dealing with C++ destructors.
> Fixed by the obvious continue in the two relevant loops.
>
> The (slightly) more complex testcases in the PR need surprisingly more
> effort, but the simple one can easily be fixed now by this patch, and I'll
> work on the others incrementally.
>
> Bootstrapped and currently undergoing testsuite run on x86_64-linux.  OK
> if it passes too?

LGTM, btw - how are the clobbers handled during transform?

> Thanks,
>
> Martin
>
>
>
>
> gcc/ChangeLog:
>
> 2023-07-31  Martin Jambor  
>
> PR ipa/110378
> * ipa-sra.cc (isra_track_scalar_value_uses): Ignore clobbers.
> (ptr_parm_has_nonarg_uses): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> 2023-07-31  Martin Jambor  
>
> PR ipa/110378
> * g++.dg/ipa/pr110378-1.C: New test.
> ---
>  gcc/ipa-sra.cc|  6 ++--
>  gcc/testsuite/g++.dg/ipa/pr110378-1.C | 47 +++
>  2 files changed, 51 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/ipa/pr110378-1.C
>
> diff --git a/gcc/ipa-sra.cc b/gcc/ipa-sra.cc
> index c35e03b7abd..edba364f56e 100644
> --- a/gcc/ipa-sra.cc
> +++ b/gcc/ipa-sra.cc
> @@ -898,7 +898,8 @@ isra_track_scalar_value_uses (function *fun, cgraph_node 
> *node, tree name,
>
>FOR_EACH_IMM_USE_STMT (stmt, imm_iter, name)
>  {
> -  if (is_gimple_debug (stmt))
> +  if (is_gimple_debug (stmt)
> + || gimple_clobber_p (stmt))
> continue;
>
>/* TODO: We could handle at least const builtin functions like 
> arithmetic
> @@ -1056,7 +1057,8 @@ ptr_parm_has_nonarg_uses (cgraph_node *node, function 
> *fun, tree parm,
>unsigned uses_ok = 0;
>use_operand_p use_p;
>
> -  if (is_gimple_debug (stmt))
> +  if (is_gimple_debug (stmt)
> + || gimple_clobber_p (stmt))
> continue;
>
>if (gimple_assign_single_p (stmt))
> diff --git a/gcc/testsuite/g++.dg/ipa/pr110378-1.C 
> b/gcc/testsuite/g++.dg/ipa/pr110378-1.C
> new file mode 100644
> index 000..aabe326b8b2
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/ipa/pr110378-1.C
> @@ -0,0 +1,47 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-ipa-sra -fdump-tree-optimized-slim"  } */
> +
> +/* Test that even though destructors end with clobbering all of *this, it
> +   should not prevent IPA-SRA.  */
> +
> +namespace {
> +
> +  class foo
> +  {
> +  public:
> +int *a;
> +foo(int c)
> +{
> +  a = new int[c];
> +  a[0] = 4;
> +}
> +__attribute__((noinline)) ~foo();
> +int f ()
> +{
> +  return a[0] + 1;
> +}
> +  };
> +
> +  volatile int v1 = 4;
> +
> +  __attribute__((noinline)) foo::~foo()
> +  {
> +delete[] a;
> +return;
> +  }
> +
> +
> +}
> +
> +volatile int v2 = 20;
> +
> +int test (void)
> +{
> +  foo shouldnotexist(v2);
> +  v2 = shouldnotexist.f();
> +  return 0;
> +}
> +
> +
> +/* { dg-final { scan-ipa-dump "Will split parameter 0" "sra"  } } */
> +/* { dg-final { scan-tree-dump-not "shouldnotexist" "optimized" } } */
> --
> 2.41.0
>


Re: [PATCH 1/2] Move `~X & X` and `~X | X` over to use bitwise_inverted_equal_p

2023-08-02 Thread Richard Biener via Gcc-patches
On Mon, Jul 31, 2023 at 7:47 PM Andrew Pinski via Gcc-patches
 wrote:
>
> This is a simple patch to move these 2 patterns over to use
> bitwise_inverted_equal_p. It also allows us to remove 2 other patterns
> which were used on comparisons as they are now handled by
> the original pattern.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.

> gcc/ChangeLog:
>
> * match.pd (`~X & X`, `~X | X`): Move over to
> use bitwise_inverted_equal_p, removing :c as bitwise_inverted_equal_p
> handles that already.
> Remove range test simplifications to true/false as they
> are now handled by these patterns.
> ---
>  gcc/match.pd | 28 ++--
>  1 file changed, 6 insertions(+), 22 deletions(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 74f0a84f31d..7d030262698 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -1157,8 +1157,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>
>  /* Simplify ~X & X as zero.  */
>  (simplify
> - (bit_and:c (convert? @0) (convert? (bit_not @0)))
> -  { build_zero_cst (type); })
> + (bit_and (convert? @0) (convert? @1))
> + (if (bitwise_inverted_equal_p (@0, @1))
> +  { build_zero_cst (type); }))
>
>  /* PR71636: Transform x & ((1U << b) - 1) -> x & ~(~0U << b);  */
>  (simplify
> @@ -1395,8 +1396,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  /* ~x ^ x -> -1 */
>  (for op (bit_ior bit_xor)
>   (simplify
> -  (op:c (convert? @0) (convert? (bit_not @0)))
> -  (convert { build_all_ones_cst (TREE_TYPE (@0)); })))
> +  (op (convert? @0) (convert? @1))
> +  (if (bitwise_inverted_equal_p (@0, @1))
> +   (convert { build_all_ones_cst (TREE_TYPE (@0)); }
>
>  /* x ^ x -> 0 */
>  (simplify
> @@ -5994,24 +5996,6 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (bit_and:c (ordered @0 @0) (ordered:c@2 @0 @1))
>   @2)
>
> -/* Simple range test simplifications.  */
> -/* A < B || A >= B -> true.  */
> -(for test1 (lt le le le ne ge)
> - test2 (ge gt ge ne eq ne)
> - (simplify
> -  (bit_ior:c (test1 @0 @1) (test2 @0 @1))
> -  (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> -   || VECTOR_INTEGER_TYPE_P (TREE_TYPE (@0)))
> -   { constant_boolean_node (true, type); })))
> -/* A < B && A >= B -> false.  */
> -(for test1 (lt lt lt le ne eq)
> - test2 (ge gt eq gt eq gt)
> - (simplify
> -  (bit_and:c (test1 @0 @1) (test2 @0 @1))
> -  (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> -   || VECTOR_INTEGER_TYPE_P (TREE_TYPE (@0)))
> -   { constant_boolean_node (false, type); })))
> -
>  /* A & (2**N - 1) <= 2**K - 1 -> A & (2**N - 2**K) == 0
> A & (2**N - 1) >  2**K - 1 -> A & (2**N - 2**K) != 0
>
> --
> 2.31.1
>


Re: [PATCH 2/2] Slightly improve bitwise_inverted_equal_p comparisons

2023-08-02 Thread Richard Biener via Gcc-patches
On Mon, Jul 31, 2023 at 7:47 PM Andrew Pinski via Gcc-patches
 wrote:
>
> This slightly improves bitwise_inverted_equal_p
> for comparisons. Instead of just comparing the
> comparison operands, also valueize them.
> This will allow ccp and others to match the 2 comparisons
> without an extra pass happening.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu.

OK.

> gcc/ChangeLog:
>
> * gimple-match-head.cc (gimple_bitwise_inverted_equal_p): Valueize
> the comparison operands before comparing them.
> ---
>  gcc/gimple-match-head.cc | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
> index 0265e55be93..b1e96304d7c 100644
> --- a/gcc/gimple-match-head.cc
> +++ b/gcc/gimple-match-head.cc
> @@ -319,12 +319,12 @@ gimple_bitwise_inverted_equal_p (tree expr1, tree 
> expr2, tree (*valueize) (tree)
>&& TREE_CODE_CLASS (gimple_assign_rhs_code (a1)) == tcc_comparison
>&& TREE_CODE_CLASS (gimple_assign_rhs_code (a2)) == tcc_comparison)
>  {
> -  tree op10 = gimple_assign_rhs1 (a1);
> -  tree op20 = gimple_assign_rhs1 (a2);
> +  tree op10 = do_valueize (valueize, gimple_assign_rhs1 (a1));
> +  tree op20 = do_valueize (valueize, gimple_assign_rhs1 (a2));
>if (!operand_equal_p (op10, op20))
>  return false;
> -  tree op11 = gimple_assign_rhs2 (a1);
> -  tree op21 = gimple_assign_rhs2 (a2);
> +  tree op11 = do_valueize (valueize, gimple_assign_rhs2 (a1));
> +  tree op21 = do_valueize (valueize, gimple_assign_rhs2 (a2));
>if (!operand_equal_p (op11, op21))
>  return false;
>if (invert_tree_comparison (gimple_assign_rhs_code (a1),
> --
> 2.31.1
>


Re: [PATCH] PHIOPT: Mark the conditional lhs and rhs as to look at to see if DCEable

2023-08-02 Thread Richard Biener via Gcc-patches
On Mon, Jul 31, 2023 at 10:17 PM Andrew Pinski via Gcc-patches
 wrote:
>
> In some cases (usually dealing with bools only), there could be some
> statements left behind which are considered trivially dead.
> An example is:
> ```
> bool f(bool a, bool b)
> {
> if (!a && !b)
> return 0;
> if (!a && b)
> return 0;
> if (a && !b)
> return 0;
> return 1;
> }
> ```
> Where during phiopt2, the IR had:
> ```
>   _3 = ~b_7(D);
>   _4 = _3 & a_6(D);
>   _4 != 0 ? 0 : 1
> ```
> match-and-simplify would transform that into:
> ```
>   _11 = ~a_6(D);
>   _12 = b_7(D) | _11;
> ```
> But phiopt would leave around the statements defining _4 and _3.
> This helps by marking the conditional's lhs and rhs to see if they are
> trivially dead.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu.

OK.

> gcc/ChangeLog:
>
> * tree-ssa-phiopt.cc (match_simplify_replacement): Mark the cond
> statement's lhs and rhs to check if trivially dead.
> Rename inserted_exprs to exprs_maybe_dce; also move it so
> bitmap is not allocated if not needed.
> ---
>  gcc/tree-ssa-phiopt.cc | 21 -
>  1 file changed, 16 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
> index cb4e2da023d..ff36bb0119b 100644
> --- a/gcc/tree-ssa-phiopt.cc
> +++ b/gcc/tree-ssa-phiopt.cc
> @@ -767,7 +767,6 @@ match_simplify_replacement (basic_block cond_bb, 
> basic_block middle_bb,
>tree result;
>gimple *stmt_to_move = NULL;
>gimple *stmt_to_move_alt = NULL;
> -  auto_bitmap inserted_exprs;
>tree arg_true, arg_false;
>
>/* Special case A ? B : B as this will always simplify to B. */
> @@ -844,6 +843,18 @@ match_simplify_replacement (basic_block cond_bb, 
> basic_block middle_bb,
>if (!result)
>  return false;
>
> +  auto_bitmap exprs_maybe_dce;
> +
> +  /* Mark the cond statements' lhs/rhs as maybe dce.  */
> +  if (TREE_CODE (gimple_cond_lhs (stmt)) == SSA_NAME
> +  && !SSA_NAME_IS_DEFAULT_DEF (gimple_cond_lhs (stmt)))
> +bitmap_set_bit (exprs_maybe_dce,
> +   SSA_NAME_VERSION (gimple_cond_lhs (stmt)));
> +  if (TREE_CODE (gimple_cond_rhs (stmt)) == SSA_NAME
> +  && !SSA_NAME_IS_DEFAULT_DEF (gimple_cond_rhs (stmt)))
> +bitmap_set_bit (exprs_maybe_dce,
> +   SSA_NAME_VERSION (gimple_cond_rhs (stmt)));
> +
>gsi = gsi_last_bb (cond_bb);
>/* Insert the sequence generated from gimple_simplify_phiopt.  */
>if (seq)
> @@ -855,7 +866,7 @@ match_simplify_replacement (basic_block cond_bb, 
> basic_block middle_bb,
>   gimple *stmt = gsi_stmt (gsi1);
>   tree name = gimple_get_lhs (stmt);
>   if (name && TREE_CODE (name) == SSA_NAME)
> -   bitmap_set_bit (inserted_exprs, SSA_NAME_VERSION (name));
> +   bitmap_set_bit (exprs_maybe_dce, SSA_NAME_VERSION (name));
> }
>if (dump_file && (dump_flags & TDF_FOLDING))
> {
> @@ -867,10 +878,10 @@ match_simplify_replacement (basic_block cond_bb, 
> basic_block middle_bb,
>
>/* If there was a statement to move, move it to right before
>   the original conditional.  */
> -  move_stmt (stmt_to_move, &gsi, inserted_exprs);
> -  move_stmt (stmt_to_move_alt, &gsi, inserted_exprs);
> +  move_stmt (stmt_to_move, &gsi, exprs_maybe_dce);
> +  move_stmt (stmt_to_move_alt, &gsi, exprs_maybe_dce);
>
> -  replace_phi_edge_with_variable (cond_bb, e1, phi, result, inserted_exprs);
> +  replace_phi_edge_with_variable (cond_bb, e1, phi, result, exprs_maybe_dce);
>
>/* Add Statistic here even though replace_phi_edge_with_variable already
>   does it as we want to be able to count when match-simplify happens vs
> --
> 2.31.1
>


Re: [PATCH] tree-pretty-print: handle COMPONENT_REF with non-decl RHS

2023-08-02 Thread Richard Biener via Gcc-patches
On Tue, Aug 1, 2023 at 2:36 AM Patrick Palka via Gcc-patches
 wrote:
>
> Tested on x86_64-pc-linux-gnu, does this look OK for trunk?
>
> -- >8 --
>
> In the C++ front end, a COMPONENT_REF's second operand isn't always a
> decl (at least at template parse time).  This patch makes the generic
> pretty printer not ICE when printing such a COMPONENT_REF.
>
> gcc/ChangeLog:
>
> * tree-pretty-print.cc (dump_generic_node) :
> Don't call component_ref_field_offset if the RHS isn't a decl.
> ---
>  gcc/tree-pretty-print.cc | 16 +---
>  1 file changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc
> index 25d191b10fd..da8dd002a3b 100644
> --- a/gcc/tree-pretty-print.cc
> +++ b/gcc/tree-pretty-print.cc
> @@ -2482,14 +2482,16 @@ dump_generic_node (pretty_printer *pp, tree node, int 
> spc, dump_flags_t flags,
>if (op_prio (op0) < op_prio (node))
> pp_right_paren (pp);
>pp_string (pp, str);
> -  dump_generic_node (pp, TREE_OPERAND (node, 1), spc, flags, false);
> -  op0 = component_ref_field_offset (node);
> -  if (op0 && TREE_CODE (op0) != INTEGER_CST)
> -   {
> - pp_string (pp, "{off: ");
> - dump_generic_node (pp, op0, spc, flags, false);
> +  op1 = TREE_OPERAND (node, 1);
> +  dump_generic_node (pp, op1, spc, flags, false);
> +  if (DECL_P (op1))

OK if you add a comment before this test.

> +   if (tree off = component_ref_field_offset (node))
> + if (TREE_CODE (off) != INTEGER_CST)
> +   {
> + pp_string (pp, "{off: ");
> + dump_generic_node (pp, off, spc, flags, false);
>   pp_right_brace (pp);
> -   }
> +   }
>break;
>
>  case BIT_FIELD_REF:
> --
> 2.41.0.478.gee48e70a82
>


RE: RE: [PATCH v2] RISC-V: Support RVV VFWADD rounding mode intrinsic API

2023-08-02 Thread Li, Pan2 via Gcc-patches
Committed, thanks Juzhe.

Pan

From: juzhe.zh...@rivai.ai 
Sent: Wednesday, August 2, 2023 3:50 PM
To: Li, Pan2 ; gcc-patches 
Cc: Kito.cheng ; Wang, Yanzhang 
Subject: Re: RE: [PATCH v2] RISC-V: Support RVV VFWADD rounding mode intrinsic 
API

Ok. LGTM.


juzhe.zh...@rivai.ai

From: Li, Pan2
Date: 2023-08-02 15:38
To: juzhe.zh...@rivai.ai; 
gcc-patches
CC: Kito.cheng; Wang, 
Yanzhang
Subject: RE: [PATCH v2] RISC-V: Support RVV VFWADD rounding mode intrinsic API
> vfwadd needs to depend on FRM???
> Did you check SPIKE? I am not sure, since I think vfwadd never overflows.

The VI_VFP_VF_LOOP_WIDE macro depends on VI_VFP_COMMON, which requires
STATE.frm->read(). AFAIK, precision loss will also result in rounding, since
floating point is discretized by design. For example, as below, a big number
plus/minus a very small number.

2 * SEW = SEW - SEW, but the real value of SEW - SEW cannot always be
represented in 2 * SEW, and then we may have a precision exception which needs
rounding.

200.09997474f (real) = 0.09997474f(0X3727C5AC) + 200.0f 
( 0X49F42400) = 200.10761449f (0X413E8480A7C6)

>Besides, did you check that the MD pattern includes a dependency on FRM_REGNUM?

Yes, (reg:SI FRM_REGNUM) is included, and the tests cover both the rm and
non-rm parts.

Pan


From: juzhe.zh...@rivai.ai 
mailto:juzhe.zh...@rivai.ai>>
Sent: Wednesday, August 2, 2023 3:07 PM
To: Li, Pan2 mailto:pan2...@intel.com>>; gcc-patches 
mailto:gcc-patches@gcc.gnu.org>>
Cc: Kito.cheng mailto:kito.ch...@sifive.com>>; Li, Pan2 
mailto:pan2...@intel.com>>; Wang, Yanzhang 
mailto:yanzhang.w...@intel.com>>
Subject: Re: [PATCH v2] RISC-V: Support RVV VFWADD rounding mode intrinsic API

vfwadd needs to depend on FRM???

Did you check SPIKE ? I am not sure since I think vfwadd never overflow.

Besides, do you check the MD pattern has include dependency of FRM_REGNUM?


juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-08-02 14:35
To: gcc-patches
CC: juzhe.zhong; 
kito.cheng; pan2.li; 
yanzhang.wang
Subject: [PATCH v2] RISC-V: Support RVV VFWADD rounding mode intrinsic API
From: Pan Li mailto:pan2...@intel.com>>

Update in v2:

1. Add vfwalu type to frm_mode.
2. Enhance the test cases for frm.

Original log:

This patch would like to support the rounding mode API for the VFWADD
VFSUB and VFRSUB as below samples.

* __riscv_vfwadd_vv_f64m2_rm
* __riscv_vfwadd_vv_f64m2_rm_m
* __riscv_vfwadd_vf_f64m2_rm
* __riscv_vfwadd_vf_f64m2_rm_m
* __riscv_vfwadd_wv_f64m2_rm
* __riscv_vfwadd_wv_f64m2_rm_m
* __riscv_vfwadd_wf_f64m2_rm
* __riscv_vfwadd_wf_f64m2_rm_m

Signed-off-by: Pan Li mailto:pan2...@intel.com>>

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(class widen_binop_frm): New class for binop frm.
(BASE): Add vfwadd_frm.
* config/riscv/riscv-vector-builtins-bases.h: New declaration.
* config/riscv/riscv-vector-builtins-functions.def
(vfwadd_frm): New function definition.
* config/riscv/riscv-vector-builtins-shapes.cc
(BASE_NAME_MAX_LEN): New macro.
(struct alu_frm_def): Leverage new base class.
(struct build_frm_base): New build base for frm.
(struct widen_alu_frm_def): New struct for widen alu frm.
(SHAPE): Add widen_alu_frm shape.
* config/riscv/riscv-vector-builtins-shapes.h: New declaration.
* config/riscv/vector.md (frm_mode): Add vfwalu type.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-widening-add.c: New test.
---
.../riscv/riscv-vector-builtins-bases.cc  | 37 +++
.../riscv/riscv-vector-builtins-bases.h   |  1 +
.../riscv/riscv-vector-builtins-functions.def |  4 ++
.../riscv/riscv-vector-builtins-shapes.cc | 66 +++
.../riscv/riscv-vector-builtins-shapes.h  |  1 +
gcc/config/riscv/vector.md|  2 +-
.../riscv/rvv/base/float-point-widening-add.c | 66 +++
7 files changed, 164 insertions(+), 13 deletions(-)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-widening-add.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 035cafc43b3..981a4a7ede8 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -315,6 +315,41 @@ public:
   }
};
+/* Implements below instructions for frm
+   - vfwadd
+*/
+template
+class widen_binop_frm : public function_base
+{
+public:
+  bool has_rounding_mode_operand_p () const override { return true; }
+
+  rtx expand (function_expander &e) const override
+  {
+switch (e.op_info->op)

Re: [PATCH] rtl-optimization/110587 - speedup find_hard_regno_for_1

2023-08-02 Thread Richard Biener via Gcc-patches
On Mon, 31 Jul 2023, Jeff Law wrote:

> 
> 
> On 7/31/23 04:54, Richard Biener via Gcc-patches wrote:
> > On Tue, 25 Jul 2023, Richard Biener wrote:
> > 
> >> The following applies a micro-optimization to find_hard_regno_for_1,
> >> re-ordering the check so we can easily jump-thread by using an else.
> >> This reduces the time spent in this function by 15% for the testcase
> >> in the PR.
> >>
> >> Bootstrap & regtest running on x86_64-unknown-linux-gnu, OK if that
> >> passes?
> > 
> > Ping.
> > 
> >> Thanks,
> >> Richard.
> >>
> >>  PR rtl-optimization/110587
> >>  * lra-assigns.cc (find_hard_regno_for_1): Re-order checks.
> >> ---
> >>   gcc/lra-assigns.cc | 9 +
> >>   1 file changed, 5 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/gcc/lra-assigns.cc b/gcc/lra-assigns.cc
> >> index b8582dcafff..d2ebcfd5056 100644
> >> --- a/gcc/lra-assigns.cc
> >> +++ b/gcc/lra-assigns.cc
> >> @@ -522,14 +522,15 @@ find_hard_regno_for_1 (int regno, int *cost, int
> >> @@ try_only_hard_regno,
> >>   r2 != NULL;
> >>   r2 = r2->start_next)
> >>{
> >> -if (r2->regno >= lra_constraint_new_regno_start
> >> +if (live_pseudos_reg_renumber[r2->regno] < 0
> >> +&& r2->regno >= lra_constraint_new_regno_start
> >>   && lra_reg_info[r2->regno].preferred_hard_regno1 >= 0
> >> -&& live_pseudos_reg_renumber[r2->regno] < 0
> >>   && rclass_intersect_p[regno_allocno_class_array[r2->regno]])
> >> sparseset_set_bit (conflict_reload_and_inheritance_pseudos,
> >>   r2->regno);
> >> -if (live_pseudos_reg_renumber[r2->regno] >= 0
> >> -&& rclass_intersect_p[regno_allocno_class_array[r2->regno]])
> >> +else if (live_pseudos_reg_renumber[r2->regno] >= 0
> >> + && rclass_intersect_p
> >> +  [regno_allocno_class_array[r2->regno]])
> >> sparseset_set_bit (live_range_hard_reg_pseudos, r2->regno);
> My biggest concern here would be r2->regno < 0  in the new code which could
> cause an OOB array reference in the first condition of the test.
> 
> Isn't that the point of the original ordering?  Test that r2->regno is
> reasonable before using it as an array index?

Note the original code is

  if (r2->regno >= lra_constraint_new_regno_start
...
  if (live_pseudos_reg_renumber[r2->regno] >= 0
...

so we are going to access live_pseudos_reg_renumber[r2->regno]
independently of the r2->regno >= lra_constraint_new_regno_start check,
so I don't think that's the point of the original ordering.  Note that
I preserved the ordering with respect to other array accesses; the
speedup seen is because we now have the


   if (live_pseudos_reg_renumber[r2->regno] < 0
   ...
   else if (live_pseudos_reg_renumber[r2->regno] >= 0
...

structure directly exposed which helps the compiler.

I think the check on r2->regno is to decide whether to alter
conflict_reload_and_inheritance_pseudos or
live_range_hard_reg_pseudos (so it's also somewhat natural to check
that first).

Thanks,
Richard.


Re: [PATCH 1/2] Move `~X & X` and `~X | X` over to use bitwise_inverted_equal_p

2023-08-02 Thread Jakub Jelinek via Gcc-patches
On Wed, Aug 02, 2023 at 10:04:26AM +0200, Richard Biener via Gcc-patches wrote:
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -1157,8 +1157,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >
> >  /* Simplify ~X & X as zero.  */
> >  (simplify
> > - (bit_and:c (convert? @0) (convert? (bit_not @0)))
> > -  { build_zero_cst (type); })
> > + (bit_and (convert? @0) (convert? @1))
> > + (if (bitwise_inverted_equal_p (@0, @1))
> > +  { build_zero_cst (type); }))

I wonder if the above isn't incorrect.
Without the possibility of widening converts it would be ok,
but for widening conversions it is significant not just that
the bits of @0 and @1 are inverted, but also that they are either
both signed or both unsigned and so the MS bit (which is guaranteed
to be different) extends to 0s in one case and to all 1s in the other
one, so that even the upper bits are inverted.
But that isn't the case here.  Something like (untested):
long long
foo (unsigned int x)
{
  int y = x;
  y = ~y;
  return ((long long) x) & y;
}
Actually maybe for this pattern it happens to be ok, because while
the upper bits in this case might not be inverted between the extended
operands (if x has msb set), it will be 0 & 0 in the upper bits.
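
To double-check that reasoning, here is a small worked example of my own (not
part of the mail), with x having its most significant bit set:

#include <stdio.h>

int
main (void)
{
  unsigned int x = 0x80000000u;
  int y = ~(int) x;                  /* 0x7fffffff */
  long long a = (long long) x & y;   /* 0x80000000 & 0x7fffffff == 0: the
                                        bit_and fold to zero still gives the
                                        right answer even though the upper
                                        bits of the extended operands are not
                                        inverted.  */
  long long b = (long long) x ^ y;   /* 0x80000000 ^ 0x7fffffff == 0xffffffff,
                                        not -1: folding the ior/xor case to
                                        all-ones would be wrong here.  */
  printf ("%llx %llx\n", a, b);
  return 0;
}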

> >
> >  /* PR71636: Transform x & ((1U << b) - 1) -> x & ~(~0U << b);  */
> >  (simplify
> > @@ -1395,8 +1396,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >  /* ~x ^ x -> -1 */
> >  (for op (bit_ior bit_xor)
> >   (simplify
> > -  (op:c (convert? @0) (convert? (bit_not @0)))
> > -  (convert { build_all_ones_cst (TREE_TYPE (@0)); })))
> > +  (op (convert? @0) (convert? @1))
> > +  (if (bitwise_inverted_equal_p (@0, @1))
> > +   (convert { build_all_ones_cst (TREE_TYPE (@0)); }

But not here.
long long
bar (unsigned int x)
{
  int y = x;
  y = ~y;
  return ((long long) x) ^ y;
}

long long
baz (unsigned int x)
{
  int y = x;
  y = ~y;
  return y ^ ((long long) x);
}
You pick TREE_TYPE (@0), but that is a random signedness if the two
operands have different signedness.

Jakub



Re: Re: [PATCH V2] VECT: Support CALL vectorization for COND_LEN_*

2023-08-02 Thread juzhe.zh...@rivai.ai
Thanks Richard so much.

Forgive me for asking a question again :)

Is the following code correct for you?

+  if (len_loop_p)
+{
+  if (len_opno >= 0)
+   {
+ ifn = cond_len_fn;
+ /* COND_* -> COND_LEN_* takes 2 extra arguments:LEN,BIAS.  */
+ vect_nargs += 2;
+   }
+  else if (reduc_idx >= 0)
+   gcc_unreachable ();
+}

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-08-02 15:49
To: 钟居哲
CC: richard.sandiford; gcc-patches
Subject: Re: Re: [PATCH V2] VECT: Support CALL vectorization for COND_LEN_*
On Mon, 31 Jul 2023, ??? wrote:
 
> Oh, Thanks a lot.
> I can test it in the RISC-V backend now.
> 
> But I have another question:
> >> I'm a bit confused (but also by the existing mask code), whether
> >>vect_nargs needs adjustment depends on the IFN in the IL we analyze.
> >>If if-conversion recognizes a .COND_ADD then we need to add nothing
> >>for masking (that is, ifn == cond_fn already).  In your code above
> >>you either use cond_len_fn or get_len_internal_fn (cond_fn) but
> >>isn't that the very same?!  So how come you in one case add two
> >>and in the other add four args?
> >>Please make sure to place gcc_unreachable () in each arm and check
> >>you have test coverage.  I believe that the else arm is unreachable
> >>but when you vectorize .FMA you will need to add 4 and when you
> >>vectorize .COND_FMA you will need to add two arguments (as said,
> >>no idea why we special case reduc_idx >= 0 at the moment).
> 
> Do you mean I add gcc_unreachable in else like this:
> 
>   if (len_loop_p)
> {
>   if (len_opno >= 0)
> {
>   ifn = cond_len_fn;
>   /* COND_* -> COND_LEN_* takes 2 extra arguments:LEN,BIAS.  */
>   vect_nargs += 2;
> }
>   else if (reduc_idx >= 0)
> {
>   /* FMA -> COND_LEN_FMA takes 4 extra arguments:MASK,ELSE,LEN,BIAS.  
> */
>   ifn = get_len_internal_fn (cond_fn);
>   vect_nargs += 4;
 
no, a gcc_unreachable () here.  That is, make sure you have test coverage
for the above two cases (to me the len_opno >= 0 case is obvious)
 
> }
> else
> gcc_unreachable ();
> }
> 
> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-07-31 21:58
> To: ???
> CC: richard.sandiford; gcc-patches
> Subject: Re: Re: [PATCH V2] VECT: Support CALL vectorization for COND_LEN_*
> On Mon, 31 Jul 2023, ??? wrote:
>  
> > Yeah. I have tried this case too.
> > 
> > But this case doesn't need to be vectorized as COND_FMA, am I right?
>  
> Only when you enable loop masking.  Alternatively use
>  
> double foo (double *a, double *b, double *c)
> {
>   double result = 0.0;
>   for (int i = 0; i < 1024; ++i)
> result += i & 1 ? __builtin_fma (a[i], b[i], c[i]) : 0.0;
>   return result;
> }
>  
> but then for me if-conversion produces
>  
>   iftmp.0_18 = __builtin_fma (_8, _10, _5);
>   _ifc__43 = _26 ? iftmp.0_18 : 0.0;
>  
> with -ffast-math (probably rightfully so).  I then get .FMAs
> vectorized and .COND_FMA folded.
>  
> > The thing I wonder is whether this condition:
> > 
> > if  (mask_opno >= 0 && reduc_idx >= 0)
> > 
> > or similar as len
> > if  (len_opno >= 0 && reduc_idx >= 0)
> > 
> > Whether they are redundant in vectorizable_call ?
> > 
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-07-31 21:33
> > To: juzhe.zh...@rivai.ai
> > CC: richard.sandiford; gcc-patches
> > Subject: Re: Re: [PATCH V2] VECT: Support CALL vectorization for COND_LEN_*
> > On Mon, 31 Jul 2023, juzhe.zh...@rivai.ai wrote:
> >  
> > > Hi, Richi.
> > > 
> > > >> I think you need to use fma from math.h together with -ffast-math
> > > >>to get fma.
> > > 
> > > As you said, this is one of the case I tried:
> > > https://godbolt.org/z/xMzrrv5dT 
> > > GCC failed to vectorize.
> > > 
> > > Could you help me with this?
> >  
> > double foo (double *a, double *b, double *c)
> > {
> >   double result = 0.0;
> >   for (int i = 0; i < 1024; ++i)
> > result += __builtin_fma (a[i], b[i], c[i]);
> >   return result;
> > }
> >  
> > with -mavx2 -mfma -Ofast this is vectorized on x86_64 to
> >  
> > ...
> >   vect__9.13_27 = MEM  [(double *)vectp_a.11_29];
> >   _9 = *_8;
> >   vect__10.14_26 = .FMA (vect__7.10_30, vect__9.13_27, vect__4.7_33);
> >   vect_result_17.15_25 = vect__10.14_26 + vect_result_20.4_36;
> > ...
> >  
> > but ifcvt still shows
> >  
> >   _9 = *_8;
> >   _10 = __builtin_fma (_7, _9, _4);
> >   result_17 = _10 + result_20;
> >  
> > still vectorizable_call has IFN_FMA with
> >  
> >   /* First try using an internal function.  */
> >   code_helper convert_code = MAX_TREE_CODES;
> >   if (cfn != CFN_LAST
> >   && (modifier == NONE
> >   || (modifier == NARROW
> >   && simple_integer_narrowing (vectype_out, vectype_in,
> >&convert_code
> > ifn = vectorizable_internal_function (cfn, callee, vectype_out,
> >   

Re: Re: [PATCH V2] VECT: Support CALL vectorization for COND_LEN_*

2023-08-02 Thread Richard Biener via Gcc-patches
On Wed, 2 Aug 2023, juzhe.zh...@rivai.ai wrote:

> Thanks Richard so much.
> 
> Forgive me for asking a question again :)
> 
> Is the following code correct for you?

Well, I wonder what kind of testcase runs into the reduc_idx >= 0 case.
The point is I don't _know_ whether the code is correct, in fact it looked
suspicious ;)

> +  if (len_loop_p)
> +{
> +  if (len_opno >= 0)
> + {
> +   ifn = cond_len_fn;
> +   /* COND_* -> COND_LEN_* takes 2 extra arguments:LEN,BIAS.  */
> +   vect_nargs += 2;
> + }
> +  else if (reduc_idx >= 0)
> + gcc_unreachable ();
> +}
> 
> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-08-02 15:49
> To: ???
> CC: richard.sandiford; gcc-patches
> Subject: Re: Re: [PATCH V2] VECT: Support CALL vectorization for COND_LEN_*
> On Mon, 31 Jul 2023, ??? wrote:
>  
> > Oh, Thanks a lot.
> > I can test it in the RISC-V backend now.
> > 
> > But I have another question:
> > >> I'm a bit confused (but also by the existing mask code), whether
> > >>vect_nargs needs adjustment depends on the IFN in the IL we analyze.
> > >>If if-conversion recognizes a .COND_ADD then we need to add nothing
> > >>for masking (that is, ifn == cond_fn already).  In your code above
> > >>you either use cond_len_fn or get_len_internal_fn (cond_fn) but
> > >>isn't that the very same?!  So how come you in one case add two
> > >>and in the other add four args?
> > >>Please make sure to place gcc_unreachable () in each arm and check
> > >>you have test coverage.  I believe that the else arm is unreachable
> > >>but when you vectorize .FMA you will need to add 4 and when you
> > >>vectorize .COND_FMA you will need to add two arguments (as said,
> > >>no idea why we special case reduc_idx >= 0 at the moment).
> > 
> > Do you mean I add gcc_unreachable in else like this:
> > 
> >   if (len_loop_p)
> > {
> >   if (len_opno >= 0)
> > {
> >   ifn = cond_len_fn;
> >   /* COND_* -> COND_LEN_* takes 2 extra arguments:LEN,BIAS.  */
> >   vect_nargs += 2;
> > }
> >   else if (reduc_idx >= 0)
> > {
> >   /* FMA -> COND_LEN_FMA takes 4 extra 
> > arguments:MASK,ELSE,LEN,BIAS.  */
> >   ifn = get_len_internal_fn (cond_fn);
> >   vect_nargs += 4;
>  
> no, a gcc_unreachable () here.  That is, make sure you have test coverage
> for the above two cases (to me the len_opno >= 0 case is obvious)
>  
> > }
> > else
> > gcc_unreachable ();
> > }
> > 
> > Thanks.
> > 
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-07-31 21:58
> > To: ???
> > CC: richard.sandiford; gcc-patches
> > Subject: Re: Re: [PATCH V2] VECT: Support CALL vectorization for COND_LEN_*
> > On Mon, 31 Jul 2023, ??? wrote:
> >  
> > > Yeah. I have tried this case too.
> > > 
> > > But this case doesn't need to be vectorized as COND_FMA, am I right?
> >  
> > Only when you enable loop masking.  Alternatively use
> >  
> > double foo (double *a, double *b, double *c)
> > {
> >   double result = 0.0;
> >   for (int i = 0; i < 1024; ++i)
> > result += i & 1 ? __builtin_fma (a[i], b[i], c[i]) : 0.0;
> >   return result;
> > }
> >  
> > but then for me if-conversion produces
> >  
> >   iftmp.0_18 = __builtin_fma (_8, _10, _5);
> >   _ifc__43 = _26 ? iftmp.0_18 : 0.0;
> >  
> > with -ffast-math (probably rightfully so).  I then get .FMAs
> > vectorized and .COND_FMA folded.
> >  
> > > The thing I wonder is whether this condition:
> > > 
> > > if  (mask_opno >= 0 && reduc_idx >= 0)
> > > 
> > > or similar as len
> > > if  (len_opno >= 0 && reduc_idx >= 0)
> > > 
> > > Whether they are redundant in vectorizable_call ?
> > > 
> > > 
> > > juzhe.zh...@rivai.ai
> > >  
> > > From: Richard Biener
> > > Date: 2023-07-31 21:33
> > > To: juzhe.zh...@rivai.ai
> > > CC: richard.sandiford; gcc-patches
> > > Subject: Re: Re: [PATCH V2] VECT: Support CALL vectorization for 
> > > COND_LEN_*
> > > On Mon, 31 Jul 2023, juzhe.zh...@rivai.ai wrote:
> > >  
> > > > Hi, Richi.
> > > > 
> > > > >> I think you need to use fma from math.h together with -ffast-math
> > > > >>to get fma.
> > > > 
> > > > As you said, this is one of the case I tried:
> > > > https://godbolt.org/z/xMzrrv5dT 
> > > > GCC failed to vectorize.
> > > > 
> > > > Could you help me with this?
> > >  
> > > double foo (double *a, double *b, double *c)
> > > {
> > >   double result = 0.0;
> > >   for (int i = 0; i < 1024; ++i)
> > > result += __builtin_fma (a[i], b[i], c[i]);
> > >   return result;
> > > }
> > >  
> > > with -mavx2 -mfma -Ofast this is vectorized on x86_64 to
> > >  
> > > ...
> > >   vect__9.13_27 = MEM  [(double *)vectp_a.11_29];
> > >   _9 = *_8;
> > >   vect__10.14_26 = .FMA (vect__7.10_30, vect__9.13_27, vect__4.7_33);
> > >   vect_result_17.15_25 = vect__10.14_26 + vect_result_20.4_36;
> > > ...
> > >  
> > > but ifcvt still shows
> > >  
> > >   _9 = *_8;
> > >   _10 = __builtin_fma (_7, _9, _4);
> > >   result_

[PATCH] Enable tpause Exponential backoff and thread delay

2023-08-02 Thread Zhang, Jun via Gcc-patches
There are two kinds of pause bottleneck: one is in user space, the other
is in the kernel.  Tpause plus exponential backoff can reduce the loop count
in user space.  On the kernel side, because tasks start at the same time they
usually reach the critical section at the same time, which decreases
performance; starting the tasks one by one avoids this.
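
For reference, a minimal self-contained sketch of the tpause-plus-backoff idea
(this only illustrates the technique; it is not the patch's code, and the
budget handling and constants are made up):

#include <x86intrin.h>   /* _tpause and __rdtsc; build with -mwaitpkg.  */

/* Spin until *addr != val or a budget of 'count' rounds is used up,
   doubling the tpause TSC deadline each round.  Returns 0 if *addr changed.  */
static int
spin_tpause_backoff (int *addr, int val, unsigned long long count)
{
  unsigned long long wait = 1;
  while (count--)
    {
      if (__atomic_load_n (addr, __ATOMIC_RELAXED) != val)
        return 0;
      /* ctrl = 0 requests the deeper C0.2 power state; wake at the deadline.  */
      _tpause (0, __rdtsc () + wait);
      wait <<= 1;  /* exponential backoff */
    }
  return 1;
}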

include/ChangeLog:

* localfn.h: Define RUNLOCALFN.

libgomp/ChangeLog:

* config/linux/wait.h: Split the spin loop out of do_spin.
* env.c (initialize_env): Set gomp_thread_delay_count default
value.
* libgomp.h: Add gomp_thread_delay_count.
* team.c (gomp_thread_start): Use RUNLOCALFN.
* config/linux/spin.h: New header file.
* config/linux/x86/localfn.h: Implement thread delay.
* config/linux/x86/mutex.c: Implement tpause backoff.
* config/linux/x86/spin.h: New spin header file.
---
 include/localfn.h  |  6 +++
 libgomp/config/linux/spin.h| 12 ++
 libgomp/config/linux/wait.h| 11 ++---
 libgomp/config/linux/x86/localfn.h | 19 +
 libgomp/config/linux/x86/mutex.c   | 66 ++
 libgomp/config/linux/x86/spin.h|  5 +++
 libgomp/env.c  |  4 ++
 libgomp/libgomp.h  |  1 +
 libgomp/team.c |  8 ++--
 9 files changed, 121 insertions(+), 11 deletions(-)
 create mode 100644 include/localfn.h
 create mode 100644 libgomp/config/linux/spin.h
 create mode 100644 libgomp/config/linux/x86/localfn.h
 create mode 100644 libgomp/config/linux/x86/mutex.c
 create mode 100644 libgomp/config/linux/x86/spin.h

diff --git a/include/localfn.h b/include/localfn.h
new file mode 100644
index 000..998e6554aec
--- /dev/null
+++ b/include/localfn.h
@@ -0,0 +1,6 @@
+#define RUNLOCALFN(a, b, c)  \
+  do \
+{ \
+  a (b); \
+} \
+  while (0)
diff --git a/libgomp/config/linux/spin.h b/libgomp/config/linux/spin.h
new file mode 100644
index 000..ad8eba275ed
--- /dev/null
+++ b/libgomp/config/linux/spin.h
@@ -0,0 +1,12 @@
+static inline int
+do_spin_for_count (int *addr, int val, unsigned long long count)
+{
+  unsigned long long i;
+  for (i = 0; i < count; i++)
+if (__builtin_expect (__atomic_load_n (addr, MEMMODEL_RELAXED) != val, 0))
+  return 0;
+else
+  cpu_relax ();
+  return 1;
+}
+
diff --git a/libgomp/config/linux/wait.h b/libgomp/config/linux/wait.h
index 29d745f7141..17b7ef11c96 100644
--- a/libgomp/config/linux/wait.h
+++ b/libgomp/config/linux/wait.h
@@ -44,21 +44,16 @@
 extern int gomp_futex_wait, gomp_futex_wake;
 
 #include 
-
+#include 
 static inline int do_spin (int *addr, int val)
 {
-  unsigned long long i, count = gomp_spin_count_var;
+  unsigned long long count = gomp_spin_count_var;
 
   if (__builtin_expect (__atomic_load_n (&gomp_managed_threads,
  MEMMODEL_RELAXED)
 > gomp_available_cpus, 0))
 count = gomp_throttled_spin_count_var;
-  for (i = 0; i < count; i++)
-if (__builtin_expect (__atomic_load_n (addr, MEMMODEL_RELAXED) != val, 0))
-  return 0;
-else
-  cpu_relax ();
-  return 1;
+  return do_spin_for_count (addr, val, count);
 }
 
 static inline void do_wait (int *addr, int val)
diff --git a/libgomp/config/linux/x86/localfn.h 
b/libgomp/config/linux/x86/localfn.h
new file mode 100644
index 000..379aced99ee
--- /dev/null
+++ b/libgomp/config/linux/x86/localfn.h
@@ -0,0 +1,19 @@
+#ifdef __x86_64__
+static inline void
+gomp_thread_delay(unsigned int count)
+{
+  unsigned long long i;
+  for (i = 0; i < count * gomp_thread_delay_count; i++)
+__builtin_ia32_pause ();
+}
+
+#define RUNLOCALFN(a, b, c)  \
+  do \
+{ \
+  gomp_thread_delay(c); \
+  a (b); \
+} \
+  while (0)
+#else
+# include "../../../../include/localfn.h"
+#endif
diff --git a/libgomp/config/linux/x86/mutex.c b/libgomp/config/linux/x86/mutex.c
new file mode 100644
index 000..5a14efb522e
--- /dev/null
+++ b/libgomp/config/linux/x86/mutex.c
@@ -0,0 +1,66 @@
+#include "../mutex.c"
+
+#ifdef __x86_64__
+static inline int
+do_spin_for_count_generic (int *addr, int val, unsigned long long count)
+{
+  unsigned long long i;
+  for (i = 0; i < count; i++)
+if (__builtin_expect (__atomic_load_n (addr, MEMMODEL_RELAXED) != val,
+ 0))
+  return 0;
+else
+  cpu_relax ();
+  return 1;
+}
+
+#ifndef __WAITPKG__
+#pragma GCC push_options
+#pragma GCC target("waitpkg")
+#define __DISABLE_WAITPKG__
+#endif /* __WAITPKG__ */
+
+static inline unsigned long long __rdtsc(void)
+{
+  unsigned long long var;
+  unsigned int hi, lo;
+
+  __asm volatile ("rdtsc" : "=a" (lo), "=d" (hi));
+
+  var = ((unsigned long long)hi << 32) | lo;
+  return var;
+}
+
+#define PAUSE_TP 200
+static inline int
+do_spin_for_backoff_tpause (int *addr, int val, unsigned long long count)
+{
+  unsigned int ctrl = 1;
+  unsigned long long wait_time = 1;
+  unsigned long long mask = 1ULL << __builtin_ia32_bsrdi(count * PAUSE_TP);
+

Re: Re: [PATCH V2] VECT: Support CALL vectorization for COND_LEN_*

2023-08-02 Thread juzhe.zh...@rivai.ai
Yes. I also doubt whether we can run into reduc_idx >= 0.

I will add gcc_unreachable () and add fully testcase for it.

After I have fully tested in RISC-V port then send V4.

Thank you so much.



juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-08-02 16:33
To: juzhe.zh...@rivai.ai
CC: richard.sandiford; gcc-patches
Subject: Re: Re: [PATCH V2] VECT: Support CALL vectorization for COND_LEN_*
On Wed, 2 Aug 2023, juzhe.zh...@rivai.ai wrote:
 
> Thanks Richard so much.
> 
> Forgive me asking question again :)
> 
> Is this following code correct for you ?
 
Well, I wonder what kind of testcase runs into the reduc_idx >= 0 case.
The point is I don't _know_ whether the code is correct, in fact it looked
suspicious ;)
 
> +  if (len_loop_p)
> +{
> +  if (len_opno >= 0)
> + {
> +   ifn = cond_len_fn;
> +   /* COND_* -> COND_LEN_* takes 2 extra arguments:LEN,BIAS.  */
> +   vect_nargs += 2;
> + }
> +  else if (reduc_idx >= 0)
> + gcc_unreachable ();
> +}
> 
> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-08-02 15:49
> To: ???
> CC: richard.sandiford; gcc-patches
> Subject: Re: Re: [PATCH V2] VECT: Support CALL vectorization for COND_LEN_*
> On Mon, 31 Jul 2023, ??? wrote:
>  
> > Oh, Thanks a lot.
> > I can test it in RISC-V backend now.
> > 
> > But I have another questions:
> > >> I'm a bit confused (but also by the existing mask code), whether
> > >>vect_nargs needs adjustment depends on the IFN in the IL we analyze.
> > >>If if-conversion recognizes a .COND_ADD then we need to add nothing
> > >>for masking (that is, ifn == cond_fn already).  In your code above
> > >>you either use cond_len_fn or get_len_internal_fn (cond_fn) but
> > >>isn't that the very same?!  So how come you in one case add two
> > >>and in the other add four args?
> > >>Please make sure to place gcc_unreachable () in each arm and check
> > >>you have test coverage.  I believe that the else arm is unreachable
> > >>but when you vectorize .FMA you will need to add 4 and when you
> > >>vectorize .COND_FMA you will need to add two arguments (as said,
> > >>no idea why we special case reduc_idx >= 0 at the moment).
> > 
> > Do you mean I add gcc_unreachable in else like this:
> > 
> >   if (len_loop_p)
> > {
> >   if (len_opno >= 0)
> > {
> >   ifn = cond_len_fn;
> >   /* COND_* -> COND_LEN_* takes 2 extra arguments:LEN,BIAS.  */
> >   vect_nargs += 2;
> > }
> >   else if (reduc_idx >= 0)
> > {
> >   /* FMA -> COND_LEN_FMA takes 4 extra 
> > arguments:MASK,ELSE,LEN,BIAS.  */
> >   ifn = get_len_internal_fn (cond_fn);
> >   vect_nargs += 4;
>  
> no, a gcc_unreachable () here.  That is, make sure you have test coverage
> for the above two cases (to me the len_opno >= 0 case is obvious)
>  
> > }
> > else
> > gcc_unreachable ();
> > }
> > 
> > Thanks.
> > 
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-07-31 21:58
> > To: ???
> > CC: richard.sandiford; gcc-patches
> > Subject: Re: Re: [PATCH V2] VECT: Support CALL vectorization for COND_LEN_*
> > On Mon, 31 Jul 2023, ??? wrote:
> >  
> > > Yeah. I have tried this case too.
> > > 
> > > But this case doesn't need to be vectorized as COND_FMA, am I right?
> >  
> > Only when you enable loop masking.  Alternatively use
> >  
> > double foo (double *a, double *b, double *c)
> > {
> >   double result = 0.0;
> >   for (int i = 0; i < 1024; ++i)
> > result += i & 1 ? __builtin_fma (a[i], b[i], c[i]) : 0.0;
> >   return result;
> > }
> >  
> > but then for me if-conversion produces
> >  
> >   iftmp.0_18 = __builtin_fma (_8, _10, _5);
> >   _ifc__43 = _26 ? iftmp.0_18 : 0.0;
> >  
> > with -ffast-math (probably rightfully so).  I then get .FMAs
> > vectorized and .COND_FMA folded.
> >  
> > > The thing I wonder is whether this condition:
> > > 
> > > if  (mask_opno >= 0 && reduc_idx >= 0)
> > > 
> > > or similar as len
> > > if  (len_opno >= 0 && reduc_idx >= 0)
> > > 
> > > Whether they are redundant in vectorizable_call ?
> > > 
> > > 
> > > juzhe.zh...@rivai.ai
> > >  
> > > From: Richard Biener
> > > Date: 2023-07-31 21:33
> > > To: juzhe.zh...@rivai.ai
> > > CC: richard.sandiford; gcc-patches
> > > Subject: Re: Re: [PATCH V2] VECT: Support CALL vectorization for 
> > > COND_LEN_*
> > > On Mon, 31 Jul 2023, juzhe.zh...@rivai.ai wrote:
> > >  
> > > > Hi, Richi.
> > > > 
> > > > >> I think you need to use fma from math.h together with -ffast-math
> > > > >>to get fma.
> > > > 
> > > > As you said, this is one of the case I tried:
> > > > https://godbolt.org/z/xMzrrv5dT 
> > > > GCC failed to vectorize.
> > > > 
> > > > Could you help me with this?
> > >  
> > > double foo (double *a, double *b, double *c)
> > > {
> > >   double result = 0.0;
> > >   for (int i = 0; i < 1024; ++i)
> > > result += __builtin_fma (a[i], b[i], c[i]);
> > >   return result;
> > > }
> > >  
> > > with -mavx2 -mfma -Ofast this is vectori

[PATCH v3] mklog: handle Signed-off-by, minor cleanup

2023-08-02 Thread Marc Poulhiès via Gcc-patches
Consider Signed-off-by lines as part of the ending of the initial
commit message, to avoid having them end up in the middle of the log when
the ChangeLog part is injected afterwards.

This is particularly useful with:

 $ git gcc-commit-mklog --amend -s

that can be used to create the changelog and add the Signed-off-by line.

Also apply most of the shellcheck suggestions to the
prepare-commit-msg hook.

contrib/ChangeLog:

* mklog.py: Leave SOB lines after changelog.
* prepare-commit-msg: Apply most shellcheck suggestions.

Signed-off-by: Marc Poulhiès 
---
Found a small bug in the regex for comments, now fixed.

This command is used in particular during development of the frontend
for the Rust language (see r13-7099-g4b25fc15b925f8 as an example
of an SoB ending up in the middle of the commit message).

Ok for master?

 contrib/mklog.py   | 34 +-
 contrib/prepare-commit-msg | 20 ++--
 2 files changed, 39 insertions(+), 15 deletions(-)

diff --git a/contrib/mklog.py b/contrib/mklog.py
index 26230b9b4f2..496780883fb 100755
--- a/contrib/mklog.py
+++ b/contrib/mklog.py
@@ -41,7 +41,34 @@ from unidiff import PatchSet
 
 LINE_LIMIT = 100
 TAB_WIDTH = 8
-CO_AUTHORED_BY_PREFIX = 'co-authored-by: '
+
+# Initial commit:
+#   +--+
+#   | gccrs: Some title|
+#   |  | This is the "start"
+#   | This is some text explaining the commit. |
+#   | There can be several lines.  |
+#   |  |<--->
+#   | Signed-off-by: My Name  | This is the "end"
+#   +--+
+#
+# Results in:
+#   +--+
+#   | gccrs: Some title|
+#   |  |
+#   | This is some text explaining the commit. | This is the "start"
+#   | There can be several lines.  |
+#   |  |<--->
+#   | gcc/rust/ChangeLog:  |
+#   |  | This is the generated
+#   | * some_file (bla):   | ChangeLog part
+#   | (foo):   |
+#   |  |<--->
+#   | Signed-off-by: My Name  | This is the "end"
+#   +--+
+
+# this regex matches the first line of the "end" in the initial commit message
+FIRST_LINE_OF_END_RE = re.compile('(?i)^(signed-off-by:|co-authored-by:|#) ')
 
 pr_regex = re.compile(r'(\/(\/|\*)|[Cc*!])\s+(?PPR [a-z+-]+\/[0-9]+)')
 prnum_regex = re.compile(r'PR (?P[a-z+-]+)/(?P[0-9]+)')
@@ -330,10 +357,7 @@ def update_copyright(data):
 
 
 def skip_line_in_changelog(line):
-if line.lower().startswith(CO_AUTHORED_BY_PREFIX) or line.startswith('#'):
-return False
-return True
-
+return FIRST_LINE_OF_END_RE.match(line) == None
 
 if __name__ == '__main__':
 extra_args = os.getenv('GCC_MKLOG_ARGS')
diff --git a/contrib/prepare-commit-msg b/contrib/prepare-commit-msg
index 48c9dad3c6f..1e94706ba40 100755
--- a/contrib/prepare-commit-msg
+++ b/contrib/prepare-commit-msg
@@ -32,11 +32,11 @@ if ! [ -f "$COMMIT_MSG_FILE" ]; then exit 0; fi
 # Don't do anything unless requested to.
 if [ -z "$GCC_FORCE_MKLOG" ]; then exit 0; fi
 
-if [ -z "$COMMIT_SOURCE" ] || [ $COMMIT_SOURCE = template ]; then
+if [ -z "$COMMIT_SOURCE" ] || [ "$COMMIT_SOURCE" = template ]; then
 # No source or "template" means new commit.
 cmd="diff --cached"
 
-elif [ $COMMIT_SOURCE = message ]; then
+elif [ "$COMMIT_SOURCE" = message ]; then
 # "message" means -m; assume a new commit if there are any changes staged.
 if ! git diff --cached --quiet; then
cmd="diff --cached"
@@ -44,23 +44,23 @@ elif [ $COMMIT_SOURCE = message ]; then
cmd="diff --cached HEAD^"
 fi
 
-elif [ $COMMIT_SOURCE = commit ]; then
+elif [ "$COMMIT_SOURCE" = commit ]; then
 # The message of an existing commit.  If it's HEAD, assume --amend;
 # otherwise, assume a new commit with -C.
-if [ $SHA1 = HEAD ]; then
+if [ "$SHA1" = HEAD ]; then
cmd="diff --cached HEAD^"
if [ "$(git config gcc-config.mklog-hook-type)" = "smart-amend" ]; then
# Check if the existing message still describes the staged changes.
f=$(mktemp /tmp/git-commit.XX) || exit 1
-   git log -1 --pretty=email HEAD > $f
-   printf '\n---\n\n' >> $f
-   git $cmd >> $f
+   git log -1 --pretty=email HEAD > "$f"
+   printf '\n---\n\n' >> "$f"
+   git $cmd >> "$f"
if contrib/gcc-changelog/git_email.py "$f" >/dev/null 2>&1; th

RE: [PATCH 2/2][frontend]: Add novector C pragma

2023-08-02 Thread Tamar Christina via Gcc-patches
Ping.

> -Original Message-
> From: Tamar Christina 
> Sent: Wednesday, July 26, 2023 8:35 PM
> To: Tamar Christina ; gcc-patches@gcc.gnu.org
> Cc: nd ; jos...@codesourcery.com
> Subject: RE: [PATCH 2/2][frontend]: Add novector C pragma
> 
> Hi, This is a respin of the patch taking in the feedback received from the C++
> part.
> 
> Simultaneously it's also a ping 😊
> 
> 
> 
> Hi All,
> 
> FORTRAN currently has a pragma NOVECTOR for indicating that vectorization
> should not be applied to a particular loop.
> 
> ICC/ICX also has such a pragma for C and C++ called #pragma novector.
> 
> As part of this patch series I need a way to easily turn off vectorization of
> particular loops, particularly for testsuite reasons.
> 
> This patch proposes a #pragma GCC novector that does the same for C as
> gfortran does for Fortran and what ICC/ICX does for C.
> 
> I added only some basic tests here, but the next patch in the series uses 
> this in
> the testsuite in about ~800 tests.
> 
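> As a quick usage illustration (the function and loop below are made up; only
> the pragma placement matters):
> 
> void f (int *a, int *b, int n)
> {
>   #pragma GCC novector
>   for (int i = 0; i < n; i++)
>     a[i] += b[i];
> }
> 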
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/c-family/ChangeLog:
> 
>   * c-pragma.h (enum pragma_kind): Add PRAGMA_NOVECTOR.
>   * c-pragma.cc (init_pragma): Use it.
> 
> gcc/c/ChangeLog:
> 
>   * c-parser.cc (c_parser_while_statement, c_parser_do_statement,
>   c_parser_for_statement, c_parser_statement_after_labels,
>   c_parse_pragma_novector, c_parser_pragma): Wire through novector
> and
>   default to false.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/vect-novector-pragma.c: New test.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h index
> 9cc95ab3ee376628dbef2485b84e6008210fa8fc..99cf2e8bd1c05537c1984
> 70f1aaa0a5a9da4e576 100644
> --- a/gcc/c-family/c-pragma.h
> +++ b/gcc/c-family/c-pragma.h
> @@ -87,6 +87,7 @@ enum pragma_kind {
>PRAGMA_GCC_PCH_PREPROCESS,
>PRAGMA_IVDEP,
>PRAGMA_UNROLL,
> +  PRAGMA_NOVECTOR,
> 
>PRAGMA_FIRST_EXTERNAL
>  };
> diff --git a/gcc/c-family/c-pragma.cc b/gcc/c-family/c-pragma.cc index
> 0d2b333cebbed32423d5dc6fd2a3ac0ce0bf8b94..848a850b8e123ff1c6ae1e
> c4b7f8ccbd599b1a88 100644
> --- a/gcc/c-family/c-pragma.cc
> +++ b/gcc/c-family/c-pragma.cc
> @@ -1862,6 +1862,10 @@ init_pragma (void)
>  cpp_register_deferred_pragma (parse_in, "GCC", "unroll",
> PRAGMA_UNROLL,
> false, false);
> 
> +  if (!flag_preprocess_only)
> +cpp_register_deferred_pragma (parse_in, "GCC", "novector",
> PRAGMA_NOVECTOR,
> +   false, false);
> +
>  #ifdef HANDLE_PRAGMA_PACK_WITH_EXPANSION
>c_register_pragma_with_expansion (0, "pack", handle_pragma_pack);  #else
> diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc index
> 24a6eb6e4596f32c477e3f1c3f98b9792f7bc92c..74f3cbb0d61b5f4c0eb300
> 672f495dde3f1517f7 100644
> --- a/gcc/c/c-parser.cc
> +++ b/gcc/c/c-parser.cc
> @@ -1572,9 +1572,11 @@ static tree c_parser_c99_block_statement
> (c_parser *, bool *,
> location_t * = NULL);
>  static void c_parser_if_statement (c_parser *, bool *, vec *);  static 
> void
> c_parser_switch_statement (c_parser *, bool *); -static void
> c_parser_while_statement (c_parser *, bool, unsigned short, bool *); -static
> void c_parser_do_statement (c_parser *, bool, unsigned short); -static void
> c_parser_for_statement (c_parser *, bool, unsigned short, bool *);
> +static void c_parser_while_statement (c_parser *, bool, unsigned short, bool,
> +   bool *);
> +static void c_parser_do_statement (c_parser *, bool, unsigned short,
> +bool); static void c_parser_for_statement (c_parser *, bool, unsigned short,
> bool,
> + bool *);
>  static tree c_parser_asm_statement (c_parser *);  static tree
> c_parser_asm_operands (c_parser *);  static tree
> c_parser_asm_goto_operands (c_parser *); @@ -6644,13 +6646,13 @@
> c_parser_statement_after_labels (c_parser *parser, bool *if_p,
> c_parser_switch_statement (parser, if_p);
> break;
>   case RID_WHILE:
> -   c_parser_while_statement (parser, false, 0, if_p);
> +   c_parser_while_statement (parser, false, 0, false, if_p);
> break;
>   case RID_DO:
> -   c_parser_do_statement (parser, false, 0);
> +   c_parser_do_statement (parser, false, 0, false);
> break;
>   case RID_FOR:
> -   c_parser_for_statement (parser, false, 0, if_p);
> +   c_parser_for_statement (parser, false, 0, false, if_p);
> break;
>   case RID_GOTO:
> c_parser_consume_token (parser);
> @@ -7146,7 +7148,7 @@ c_parser_switch_statement (c_parser *parser, bool
> *if_p)
> 
>  static void
>  c_parser_while_statement (c_parser *parser, bool ivdep, unsigned short
> unroll,
> -   bool *if_p)
> +   bool novector, bool *if_p)
>  {
>tree block, cond, body;
>unsigned char s

[PATCH 0/3 v2] genmatch: Speed up recompilation after changes to match.pd

2023-08-02 Thread Andrzej Turko via Gcc-patches
The following reduces the number of object files that need to be rebuilt
after match.pd has been modified. Right now a change to match.pd which
adds/removes a line almost always forces recompilation of all files that
genmatch generates from it. This is because of unnecessary changes to
the generated .cc files:

1. Function names and ordering change as does the way the functions are
distributed across multiple source files.
2. Code locations from match.pd are quoted directly (including line
numbers) by logging fprintf calls.

This patch addresses those issues without changing the behaviour
of the generated code. The first one is solved by making sure that minor
changes to match.pd do not influence the order in which functions are
generated. The second one by using a lookup table with line numbers.

Now a change to a single function will trigger a rebuild of 4 object
files (the one containing the function and the one containing the lookup table,
for both gimple and generic) instead of all of them (20 by default).
For reference, this decreased the rebuild time with 48 threads from 3.5
minutes to 1.5 minutes on my machine.

V2:
* Placed the change in Makefile.in in the correct commit.
* Used a separate logging function to reduce the size of the
executable.

As for Richard Biener's remarks on executable size:

1. The previous version of the change increased the sizes of executables
by 8-12 kB.
2. The current version (with an extra indirection step) did so by
around 150 kB (I suspect that the reason may be the added
function and its overhead, in which case the actual number of extra
instructions may be smaller than that suggests).


One can choose between those variants just by taking the third commit either
from this or the previous version of the patch series.

Note for reviewers: I do not have write access.

Andrzej Turko (3):
  Support get_or_insert in ordered_hash_map
  genmatch: Reduce variability of generated code
  genmatch: Log line numbers indirectly

 gcc/Makefile.in   |  4 +-
 gcc/genmatch.cc   | 91 +--
 gcc/ordered-hash-map-tests.cc | 19 ++--
 gcc/ordered-hash-map.h| 26 ++
 4 files changed, 118 insertions(+), 22 deletions(-)

-- 
2.34.1



[PATCH 2/3 v2] genmatch: Reduce variability of generated code

2023-08-02 Thread Andrzej Turko via Gcc-patches
So far genmatch has been using an unordered map to store information about
functions to be generated. Since corresponding locations from match.pd were
used as keys in the map, even small changes to match.pd which caused
line number changes would change the order in which the functions are
generated. This would reshuffle the functions between the generated .cc files.
This way even a minimal modification to match.pd forces recompilation of all
object files originating from match.pd on rebuild.

This commit makes sure that functions are generated in the order of their
processing (in contrast to the random order based on hashes of their
locations in match.pd). This is done by replacing the unordered map with an
ordered one. This way small changes to match.pd do not cause function
renaming and reshuffling among generated source files.
Together with the subsequent change to logging fprintf calls, this
removes unnecessary changes to the files generated by genmatch, allowing
for reuse of already built object files during rebuild. The aim is to
make editing of match.pd and subsequent testing easier.

Signed-off-by: Andrzej Turko 

gcc/ChangeLog:

* genmatch.cc: Make sinfo map ordered.
* Makefile.in: Require the ordered map header for genmatch.o.
---
 gcc/Makefile.in | 4 ++--
 gcc/genmatch.cc | 3 ++-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index e99628cec07..2429128cbf2 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -3004,8 +3004,8 @@ build/genhooks.o : genhooks.cc $(TARGET_DEF) 
$(C_TARGET_DEF)  \
   $(COMMON_TARGET_DEF) $(D_TARGET_DEF) $(BCONFIG_H) $(SYSTEM_H) errors.h
 build/genmddump.o : genmddump.cc $(RTL_BASE_H) $(BCONFIG_H) $(SYSTEM_H)
\
   $(CORETYPES_H) $(GTM_H) errors.h $(READ_MD_H) $(GENSUPPORT_H)
-build/genmatch.o : genmatch.cc $(BCONFIG_H) $(SYSTEM_H) \
-  $(CORETYPES_H) errors.h $(HASH_TABLE_H) hash-map.h $(GGC_H) is-a.h \
+build/genmatch.o : genmatch.cc $(BCONFIG_H) $(SYSTEM_H) $(CORETYPES_H) \
+  errors.h $(HASH_TABLE_H) hash-map.h $(GGC_H) is-a.h ordered-hash-map.h \
   tree.def builtins.def internal-fn.def case-cfn-macros.h $(CPPLIB_H)
 build/gencfn-macros.o : gencfn-macros.cc $(BCONFIG_H) $(SYSTEM_H)  \
   $(CORETYPES_H) errors.h $(HASH_TABLE_H) hash-set.h builtins.def  \
diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
index 2302f2a7ff0..1deca505603 100644
--- a/gcc/genmatch.cc
+++ b/gcc/genmatch.cc
@@ -29,6 +29,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "hash-table.h"
 #include "hash-set.h"
 #include "is-a.h"
+#include "ordered-hash-map.h"
 
 
 /* Stubs for GGC referenced through instantiations triggered by hash-map.  */
@@ -1684,7 +1685,7 @@ struct sinfo_hashmap_traits : 
simple_hashmap_traits,
   template  static inline void remove (T &) {}
 };
 
-typedef hash_map
+typedef ordered_hash_map
   sinfo_map_t;
 
 /* Current simplifier ID we are processing during insertion into the
-- 
2.34.1



[PATCH 3/3 v2] genmatch: Log line numbers indirectly

2023-08-02 Thread Andrzej Turko via Gcc-patches
Currently fprintf calls logging to a dump file take line numbers
in the match.pd file directly as arguments.
When match.pd is edited, the line numbers of the referenced code change,
which causes changes to many fprintf calls and, thus, to many
(usually all) .cc files generated by genmatch. This forces make
to (unnecessarily) rebuild many .o files.

This change replaces those logging fprintf calls with calls to
a dedicated logging function. Because it reads the line numbers
from the lookup table, it is enough to pass a corresponding index.
Thanks to this, when match.pd changes, it is enough to rebuild
the file containing the lookup table and, of course, those
actually affected by the change.
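
To illustrate the shape of the generated code (the file name, index and line
numbers below are made up), a logging call site changes roughly like this:

/* Before: the match.pd line number is hard-coded at every call site.  */
if (UNLIKELY (debug_dump)) fprintf (dump_file, "Applying pattern %s:%d, %s:%d\n",
                                    "match.pd", 1234, __FILE__, __LINE__);

/* After: the call site only carries a stable index into __dbg_line_numbers.  */
if (UNLIKELY (debug_dump)) gimple_dump_logs ("match.pd", 57, __FILE__, __LINE__, true);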

Signed-off-by: Andrzej Turko 

gcc/ChangeLog:

* genmatch.cc: Log line numbers indirectly.
---
 gcc/genmatch.cc | 88 -
 1 file changed, 73 insertions(+), 15 deletions(-)

diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
index 1deca505603..be6c11c347f 100644
--- a/gcc/genmatch.cc
+++ b/gcc/genmatch.cc
@@ -217,9 +217,56 @@ fp_decl_done (FILE *f, const char *trailer)
 fprintf (header_file, "%s;", trailer);
 }
 
+/* Line numbers for use by indirect line directives.  */
+static vec<int> dbg_line_numbers;
+
+static void
+write_header_declarations (bool gimple, FILE *f)
+{
+  fprintf (f, "\nextern void\n%s_dump_logs (const char *file1, int line1_id, "
+ "const char *file2, int line2, bool simplify);\n",
+ gimple ? "gimple" : "generic");
+}
+
+static void
+define_dbg_line_numbers (bool gimple, FILE *f)
+{
+
+  if (dbg_line_numbers.is_empty ())
+{
+  fprintf (f, "};\n\n");
+  return;
+}
+
+  fprintf (f , "void\n%s_dump_logs (const char *file1, int line1_id,"
+   "const char *file2, int line2, bool simplify)\n{\n",
+   gimple ? "gimple" : "generic");
+
+  fprintf_indent (f, 2, "static int __dbg_line_numbers[%d] = {",
+ dbg_line_numbers.length ());
+
+  for (int i = 0; i < (int)dbg_line_numbers.length () - 1; i++)
+{
+  if (i % 20 == 0)
+   fprintf (f, "\n\t");
+
+  fprintf (f, "%d, ", dbg_line_numbers[i]);
+}
+  fprintf (f, "%d\n  };\n\n", dbg_line_numbers.last ());
+
+
+  fprintf_indent (f, 2, "fprintf (dump_file, \"%%s "
+ "%%s: __dbg_line_numbers[%%d], %%s:%%d\\n\",\n");
+  fprintf_indent (f, 10, "simplify ? \"Applying pattern\" : "
+ "\"Matching expression\", file1, line1_id, file2, line2);");
+
+  fprintf (f, "\n}\n\n");
+}
+
 static void
 output_line_directive (FILE *f, location_t location,
-  bool dumpfile = false, bool fnargs = false)
+ bool dumpfile = false, bool fnargs = false,
+ bool indirect_line_numbers = false)
 {
   const line_map_ordinary *map;
   linemap_resolve_location (line_table, location, LRK_SPELLING_LOCATION, &map);
@@ -239,7 +286,15 @@ output_line_directive (FILE *f, location_t location,
++file;
 
   if (fnargs)
-   fprintf (f, "\"%s\", %d", file, loc.line);
+  {
+  if (indirect_line_numbers)
+{
+  fprintf (f, "\"%s\", %d", file, dbg_line_numbers.length ());
+  dbg_line_numbers.safe_push (loc.line);
+}
+  else
+fprintf (f, "\"%s\", %d", file, loc.line);
+  }
   else
fprintf (f, "%s:%d", file, loc.line);
 }
@@ -3375,20 +3430,19 @@ dt_operand::gen (FILE *f, int indent, bool gimple, int 
depth)
 }
 }
 
-/* Emit a fprintf to the debug file to the file F, with the INDENT from
+/* Emit a logging call to the debug file to the file F, with the INDENT from
either the RESULT location or the S's match location if RESULT is null. */
 static void
-emit_debug_printf (FILE *f, int indent, class simplify *s, operand *result)
+emit_logging_call (FILE *f, int indent, class simplify *s, operand *result,
+ bool gimple)
 {
   fprintf_indent (f, indent, "if (UNLIKELY (debug_dump)) "
-  "fprintf (dump_file, \"%s ",
-  s->kind == simplify::SIMPLIFY
-  ? "Applying pattern" : "Matching expression");
-  fprintf (f, "%%s:%%d, %%s:%%d\\n\", ");
+  "%s_dump_logs (", gimple ? "gimple" : "generic");
   output_line_directive (f,
-result ? result->location : s->match->location, true,
-true);
-  fprintf (f, ", __FILE__, __LINE__);\n");
+   result ? result->location : s->match->location,
+   true, true, true);
+  fprintf (f, ", __FILE__, __LINE__, %s);\n",
+ s->kind == simplify::SIMPLIFY ? "true" : "false");
 }
 
 /* Generate code for the '(if ...)', '(with ..)' and actual transform
@@ -3524,7 +3578,7 @@ dt_simplify::gen_1 (FILE *f, int indent, bool gimple, 
operand *result)
   if (!result)
 {
   /* If there is no result then this is a predicate implementation.  */
-  emit_debug_printf (f, indent, s, result);
+  emit_logging_call (f, indent, s, result, gimple);
   fprin

[PATCH 1/3 v2] Support get_or_insert in ordered_hash_map

2023-08-02 Thread Andrzej Turko via Gcc-patches
The get_or_insert method is already supported by the unordered hash map.
Adding it to the ordered map enables us to replace the unordered map
with the ordered one in cases where ordering may be useful.
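
As a usage sketch (the key type and values below are purely illustrative):

typedef int_hash <int, -1, -2> int_hash_t;
ordered_hash_map <int_hash_t, const char *> m;

bool existed;
const char *&slot = m.get_or_insert (42, &existed);
if (!existed)
  slot = "first value seen for key 42";
/* Iterating over m still visits keys in insertion order.  */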

Signed-off-by: Andrzej Turko 

gcc/ChangeLog:

* ordered-hash-map.h: Add get_or_insert.
* ordered-hash-map-tests.cc: Use get_or_insert in tests.
---
 gcc/ordered-hash-map-tests.cc | 19 +++
 gcc/ordered-hash-map.h| 26 ++
 2 files changed, 41 insertions(+), 4 deletions(-)

diff --git a/gcc/ordered-hash-map-tests.cc b/gcc/ordered-hash-map-tests.cc
index 1c26bbfa979..55894c25fa0 100644
--- a/gcc/ordered-hash-map-tests.cc
+++ b/gcc/ordered-hash-map-tests.cc
@@ -58,6 +58,7 @@ static void
 test_map_of_strings_to_int ()
 {
   ordered_hash_map  m;
+  bool existed;
 
   const char *ostrich = "ostrich";
   const char *elephant = "elephant";
@@ -74,17 +75,23 @@ test_map_of_strings_to_int ()
   ASSERT_EQ (false, m.put (ostrich, 2));
   ASSERT_EQ (false, m.put (elephant, 4));
   ASSERT_EQ (false, m.put (ant, 6));
-  ASSERT_EQ (false, m.put (spider, 8));
+  existed = true;
+  int &value = m.get_or_insert (spider, &existed);
+  value = 8;
+  ASSERT_EQ (false, existed);
   ASSERT_EQ (false, m.put (millipede, 750));
   ASSERT_EQ (false, m.put (eric, 3));
 
+
   /* Verify that we can recover the stored values.  */
   ASSERT_EQ (6, m.elements ());
   ASSERT_EQ (2, *m.get (ostrich));
   ASSERT_EQ (4, *m.get (elephant));
   ASSERT_EQ (6, *m.get (ant));
   ASSERT_EQ (8, *m.get (spider));
-  ASSERT_EQ (750, *m.get (millipede));
+  existed = false;
+  ASSERT_EQ (750, m.get_or_insert (millipede, &existed));
+  ASSERT_EQ (true, existed);
   ASSERT_EQ (3, *m.get (eric));
 
   /* Verify that the order of insertion is preserved.  */
@@ -113,6 +120,7 @@ test_map_of_int_to_strings ()
 {
   const int EMPTY = -1;
   const int DELETED = -2;
+  bool existed;
  typedef int_hash <int, EMPTY, DELETED> int_hash_t;
  ordered_hash_map <int_hash_t, const char *> m;
 
@@ -131,7 +139,9 @@ test_map_of_int_to_strings ()
   ASSERT_EQ (false, m.put (2, ostrich));
   ASSERT_EQ (false, m.put (4, elephant));
   ASSERT_EQ (false, m.put (6, ant));
-  ASSERT_EQ (false, m.put (8, spider));
+  const char* &value = m.get_or_insert (8, &existed);
+  value = spider;
+  ASSERT_EQ (false, existed);
   ASSERT_EQ (false, m.put (750, millipede));
   ASSERT_EQ (false, m.put (3, eric));
 
@@ -141,7 +151,8 @@ test_map_of_int_to_strings ()
   ASSERT_EQ (*m.get (4), elephant);
   ASSERT_EQ (*m.get (6), ant);
   ASSERT_EQ (*m.get (8), spider);
-  ASSERT_EQ (*m.get (750), millipede);
+  ASSERT_EQ (m.get_or_insert (750, &existed), millipede);
+  ASSERT_EQ (existed, TRUE);
   ASSERT_EQ (*m.get (3), eric);
 
   /* Verify that the order of insertion is preserved.  */
diff --git a/gcc/ordered-hash-map.h b/gcc/ordered-hash-map.h
index 6b68cc96305..9fc875182e1 100644
--- a/gcc/ordered-hash-map.h
+++ b/gcc/ordered-hash-map.h
@@ -76,6 +76,32 @@ public:
 return m_map.get (k);
   }
 
+  /* Return a reference to the value for the passed in key, creating the entry
+if it doesn't already exist.  If existed is not NULL then it is set to
+false if the key was not previously in the map, and true otherwise.  */
+
+  Value &get_or_insert (const Key &k, bool *existed = NULL)
+  {
+bool _existed;
+Value &ret = m_map.get_or_insert (k, &_existed);
+
+if (!_existed)
+  {
+   bool key_present;
+   int &slot = m_key_index.get_or_insert (k, &key_present);
+   if (!key_present)
+ {
+   slot = m_keys.length ();
+   m_keys.safe_push (k);
+ }
+  }
+
+if (existed)
+  *existed = _existed;
+
+return ret;
+  }
+
   /* Removing a key removes it from the map, but retains the insertion
  order.  */
 
-- 
2.34.1



Re: [PATCH 2/5] [RISC-V] Generate Zicond instruction for basic semantics

2023-08-02 Thread Richard Sandiford via Gcc-patches
Jeff Law via Gcc-patches  writes:
> On 8/1/23 05:18, Richard Sandiford wrote:
>> 
>> Where were you seeing the requirement for pointer equality?  genrecog.cc
>> at least uses rtx_equal_p, and I think it has to.  E.g. some patterns
>> use (match_dup ...) to match output and input mems, and mem rtxes
>> shouldn't be shared.
> It's a general concern due to the way we handle transforming pseudos 
> into hard registers after allocation is complete.   We can end up with 
> two REG expressions that will compare equal according to rtx_equal_p, 
> but which are not pointer equal.

But isn't that OK?  I don't think there's a requirement for match_dup
pointer equality either before or after RA.  Or at least, there
shouldn't be.  If something happens to rely on pointer equality
for match_dups then I think we should fix it.

So IMO, like you said originally, match_dup would be the right way to
handle this kind of pattern.

The reason I'm interested is that AArch64 makes pretty extensive use
of match_dup for this purpose.  E.g.:

(define_insn "aarch64_abd"
  [(set (match_operand:VDQ_BHSI 0 "register_operand" "=w")
(minus:VDQ_BHSI
  (USMAX:VDQ_BHSI
(match_operand:VDQ_BHSI 1 "register_operand" "w")
(match_operand:VDQ_BHSI 2 "register_operand" "w"))
  (:VDQ_BHSI
(match_dup 1)
(match_dup 2]

So if this isn't working correctly for subregs (or for anything else),
then I'd be keen to do something about it :)

I don't want to labour the point though.

Thanks,
Richard


[PATCH v1] RISC-V: Support RVV VFWSUB rounding mode intrinsic API

2023-08-02 Thread Pan Li via Gcc-patches
From: Pan Li 

This patch would like to support the rounding mode API for VFWSUB,
as in the samples below.

* __riscv_vfwsub_vv_f64m2_rm
* __riscv_vfwsub_vv_f64m2_rm_m
* __riscv_vfwsub_vf_f64m2_rm
* __riscv_vfwsub_vf_f64m2_rm_m
* __riscv_vfwsub_wv_f64m2_rm
* __riscv_vfwsub_wv_f64m2_rm_m
* __riscv_vfwsub_wf_f64m2_rm
* __riscv_vfwsub_wf_f64m2_rm_m

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc (BASE): Add
vfwsub_frm.
* config/riscv/riscv-vector-builtins-bases.h: Add declaration.
* config/riscv/riscv-vector-builtins-functions.def (vfwsub_frm):
Add vfwsub function definitions.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-widening-sub.c: New test.
---
 .../riscv/riscv-vector-builtins-bases.cc  |  3 +
 .../riscv/riscv-vector-builtins-bases.h   |  1 +
 .../riscv/riscv-vector-builtins-functions.def |  4 ++
 .../riscv/rvv/base/float-point-widening-sub.c | 66 +++
 4 files changed, 74 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-widening-sub.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 981a4a7ede8..ddf694c771c 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -317,6 +317,7 @@ public:
 
 /* Implements below instructions for frm
- vfwadd
+   - vfwsub
 */
 template
 class widen_binop_frm : public function_base
@@ -2100,6 +2101,7 @@ static CONSTEXPR const reverse_binop_frm 
vfrsub_frm_obj;
 static CONSTEXPR const widen_binop vfwadd_obj;
 static CONSTEXPR const widen_binop_frm vfwadd_frm_obj;
 static CONSTEXPR const widen_binop vfwsub_obj;
+static CONSTEXPR const widen_binop_frm vfwsub_frm_obj;
 static CONSTEXPR const binop vfmul_obj;
 static CONSTEXPR const binop vfdiv_obj;
 static CONSTEXPR const reverse_binop vfrdiv_obj;
@@ -2330,6 +2332,7 @@ BASE (vfrsub_frm)
 BASE (vfwadd)
 BASE (vfwadd_frm)
 BASE (vfwsub)
+BASE (vfwsub_frm)
 BASE (vfmul)
 BASE (vfdiv)
 BASE (vfrdiv)
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h 
b/gcc/config/riscv/riscv-vector-builtins-bases.h
index f9e1df5fe75..5800fca0169 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -150,6 +150,7 @@ extern const function_base *const vfrsub_frm;
 extern const function_base *const vfwadd;
 extern const function_base *const vfwadd_frm;
 extern const function_base *const vfwsub;
+extern const function_base *const vfwsub_frm;
 extern const function_base *const vfmul;
 extern const function_base *const vfmul;
 extern const function_base *const vfdiv;
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def 
b/gcc/config/riscv/riscv-vector-builtins-functions.def
index 743205a9b97..58a7224fe0c 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -306,8 +306,12 @@ DEF_RVV_FUNCTION (vfwsub, widen_alu, full_preds, f_wwv_ops)
 DEF_RVV_FUNCTION (vfwsub, widen_alu, full_preds, f_wwf_ops)
 DEF_RVV_FUNCTION (vfwadd_frm, widen_alu_frm, full_preds, f_wvv_ops)
 DEF_RVV_FUNCTION (vfwadd_frm, widen_alu_frm, full_preds, f_wvf_ops)
+DEF_RVV_FUNCTION (vfwsub_frm, widen_alu_frm, full_preds, f_wvv_ops)
+DEF_RVV_FUNCTION (vfwsub_frm, widen_alu_frm, full_preds, f_wvf_ops)
 DEF_RVV_FUNCTION (vfwadd_frm, widen_alu_frm, full_preds, f_wwv_ops)
 DEF_RVV_FUNCTION (vfwadd_frm, widen_alu_frm, full_preds, f_wwf_ops)
+DEF_RVV_FUNCTION (vfwsub_frm, widen_alu_frm, full_preds, f_wwv_ops)
+DEF_RVV_FUNCTION (vfwsub_frm, widen_alu_frm, full_preds, f_wwf_ops)
 
 // 13.4. Vector Single-Width Floating-Point Multiply/Divide Instructions
 DEF_RVV_FUNCTION (vfmul, alu, full_preds, f_vvv_ops)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-widening-sub.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-widening-sub.c
new file mode 100644
index 000..4325cc510a7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-widening-sub.c
@@ -0,0 +1,66 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3 -Wno-psabi" } */
+
+#include "riscv_vector.h"
+
+typedef float float32_t;
+
+vfloat64m2_t
+test_vfwsub_vv_f32m1_rm (vfloat32m1_t op1, vfloat32m1_t op2, size_t vl) {
+  return __riscv_vfwsub_vv_f64m2_rm (op1, op2, 0, vl);
+}
+
+vfloat64m2_t
+test_vfwsub_vv_f32m1_rm_m (vbool32_t mask, vfloat32m1_t op1, vfloat32m1_t op2,
+  size_t vl) {
+  return __riscv_vfwsub_vv_f64m2_rm_m (mask, op1, op2, 1, vl);
+}
+
+vfloat64m2_t
+test_vfwsub_vf_f32m1_rm (vfloat32m1_t op1, float32_t op2, size_t vl) {
+  return __riscv_vfwsub_vf_f64m2_rm (op1, op2, 2, vl);
+}
+
+vfloat64m2_t
+test_vfwsub_vf_f32m1_rm_m (vbool32_t mask, vfloat32m1_t op1, float32_t op2,
+  size_t vl) {
+  return __riscv_vfwsub_vf_f64m2_rm_m (m

[PATCH]AArch64 update costing for MLA by invariant

2023-08-02 Thread Tamar Christina via Gcc-patches
Hi All,

When determining issue rates we currently discount non-constant MLA accumulators
for Advanced SIMD but don't do it for the latency.

This means the costs for Advanced SIMD with a constant accumulator are wrong and
result in us costing SVE and Advanced SIMD the same.  This can cause us to
vectorize with Advanced SIMD instead of SVE in some cases.

This patch adds the same discount for SVE and Scalar as we do for issue rate.

My assumption was that on issue rate we reject all scalar constants early
because we take into account the extra instruction to create the constant?
Though I'd have expected this to be in prologue costs.  For this reason I added
an extra parameter to allow me to force the check to at least look for the
multiplication.

This gives a 5% improvement in fotonik3d_r in SPECCPU 2017 on large
Neoverse cores.
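
For reference, one plausible shape of loop this targets (purely illustrative;
the function and constant are made up) is a multiply-accumulate where one
multiplication operand is a loop invariant:

void
f (double *restrict out, double *restrict a, double *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = a[i] * 3.0 + b[i];   /* FMLA with a constant multiplicand */
}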

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_multiply_add_p): Add param
allow_constants. 
(aarch64_adjust_stmt_cost): Use it.
(aarch64_vector_costs::count_ops): Likewise.
(aarch64_vector_costs::add_stmt_cost): Pass vinfo to
aarch64_adjust_stmt_cost.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
560e5431636ef46c41d56faa0c4e95be78f64b50..76b74b77b3f122a3c972557e2f83b63ba365fea9
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -16398,10 +16398,11 @@ aarch64_advsimd_ldp_stp_p (enum vect_cost_for_stmt 
kind,
or multiply-subtract sequence that might be suitable for fusing into a
single instruction.  If VEC_FLAGS is zero, analyze the operation as
a scalar one, otherwise analyze it as an operation on vectors with those
-   VEC_* flags.  */
+   VEC_* flags.  When ALLOW_CONSTANTS we'll recognize all accumulators 
including
+   constant ones.  */
 static bool
 aarch64_multiply_add_p (vec_info *vinfo, stmt_vec_info stmt_info,
-   unsigned int vec_flags)
+   unsigned int vec_flags, bool allow_constants)
 {
   gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
   if (!assign)
@@ -16410,8 +16411,9 @@ aarch64_multiply_add_p (vec_info *vinfo, stmt_vec_info 
stmt_info,
   if (code != PLUS_EXPR && code != MINUS_EXPR)
 return false;
 
-  if (CONSTANT_CLASS_P (gimple_assign_rhs1 (assign))
-  || CONSTANT_CLASS_P (gimple_assign_rhs2 (assign)))
+  if (!allow_constants
+  && (CONSTANT_CLASS_P (gimple_assign_rhs1 (assign))
+ || CONSTANT_CLASS_P (gimple_assign_rhs2 (assign
 return false;
 
   for (int i = 1; i < 3; ++i)
@@ -16429,7 +16431,7 @@ aarch64_multiply_add_p (vec_info *vinfo, stmt_vec_info 
stmt_info,
   if (!rhs_assign || gimple_assign_rhs_code (rhs_assign) != MULT_EXPR)
continue;
 
-  if (vec_flags & VEC_ADVSIMD)
+  if (!allow_constants && (vec_flags & VEC_ADVSIMD))
{
  /* Scalar and SVE code can tie the result to any FMLA input (or none,
 although that requires a MOVPRFX for SVE).  However, Advanced SIMD
@@ -16441,7 +16443,8 @@ aarch64_multiply_add_p (vec_info *vinfo, stmt_vec_info 
stmt_info,
return false;
  def_stmt_info = vinfo->lookup_def (rhs);
  if (!def_stmt_info
- || STMT_VINFO_DEF_TYPE (def_stmt_info) == vect_external_def)
+ || STMT_VINFO_DEF_TYPE (def_stmt_info) == vect_external_def
+ || STMT_VINFO_DEF_TYPE (def_stmt_info) == vect_constant_def)
return false;
}
 
@@ -16721,8 +16724,9 @@ aarch64_sve_adjust_stmt_cost (class vec_info *vinfo, 
vect_cost_for_stmt kind,
and which when vectorized would operate on vector type VECTYPE.  Add the
cost of any embedded operations.  */
 static fractional_cost
-aarch64_adjust_stmt_cost (vect_cost_for_stmt kind, stmt_vec_info stmt_info,
- tree vectype, fractional_cost stmt_cost)
+aarch64_adjust_stmt_cost (vec_info *vinfo, vect_cost_for_stmt kind,
+ stmt_vec_info stmt_info, tree vectype,
+ unsigned vec_flags, fractional_cost stmt_cost)
 {
   if (vectype)
 {
@@ -16745,6 +16749,15 @@ aarch64_adjust_stmt_cost (vect_cost_for_stmt kind, 
stmt_vec_info stmt_info,
  break;
}
 
+  gassign *assign = dyn_cast <gassign *> (STMT_VINFO_STMT (stmt_info));
+  if (assign && !vect_is_reduction (stmt_info))
+   {
+ bool simd_p = vec_flags & VEC_ADVSIMD;
+ /* For MLA we need to reduce the cost since MLA is 1 instruction.  */
+ if (aarch64_multiply_add_p (vinfo, stmt_info, vec_flags, !simd_p))
+   return 0;
+   }
+
   if (kind == vector_stmt || kind == vec_to_scalar)
if (tree cmp_type = vect_embedded_comparison_type (stmt_info))
  {
@@ -16795,7 +16808,8 @@ aarch64_vector_costs::count_ops (unsigned int count, 
vect_cost_for_stmt kind,
 }
 
   /* Assume that multiply

[PATCH]AArch64 update costing for combining vector conditionals

2023-08-02 Thread Tamar Christina via Gcc-patches
Hi All,

Boolean comparisons have different costs depending on the mode.  E.g.
a && b, when predicated, doesn't require an additional instruction: the AND is
free because the predicate of the first operation is combined into the second
one.  At the moment though we only fuse compares, so this update requires one
of the operands to be a comparison.

Scalar code also doesn't need the explicit AND because the non-if-converted
variant is a series of branches, where following the branch sequence is itself
a natural AND.

Advanced SIMD however does require an actual AND to combine the boolean values.

As such this patch discounts Scalar and SVE boolean operation latency and
throughput.

With this patch comparison-heavy code prefers SVE, as it should, especially in
cases with SVE VL == Advanced SIMD VL where previously the SVE prologue costs
would tip it towards Advanced SIMD.
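
A sketch of the comparison-heavy shape this targets (illustrative only): with
SVE the && folds into the governing predicate, while Advanced SIMD needs an
explicit AND of the two compare results.

void
f (double *restrict x, double *restrict a, double *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    if (a[i] > 0.0 && b[i] < 16.0)   /* two compares combined by a boolean && */
      x[i] = a[i] + b[i];
}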

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_bool_compound_p): New.
(aarch64_adjust_stmt_cost, aarch64_vector_costs::count_ops): Use it.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
b1bacc734b4630257b6ebf8ca7d9afeb34008c10..55963bb28be7ede08b05fb9fddb5a65f6818c63e
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -16453,6 +16453,49 @@ aarch64_multiply_add_p (vec_info *vinfo, stmt_vec_info 
stmt_info,
   return false;
 }
 
+/* Return true if STMT_INFO is the second part of a two-statement boolean AND
+   expression sequence that might be suitable for fusing into a
+   single instruction.  If VEC_FLAGS is zero, analyze the operation as
+   a scalar one, otherwise analyze it as an operation on vectors with those
+   VEC_* flags.  */
+
+static bool
+aarch64_bool_compound_p (vec_info *vinfo, stmt_vec_info stmt_info,
+unsigned int vec_flags)
+{
+  gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
+  if (!assign
+  || !STMT_VINFO_VECTYPE (stmt_info)
+  || !VECTOR_BOOLEAN_TYPE_P (STMT_VINFO_VECTYPE (stmt_info))
+  || gimple_assign_rhs_code (assign) != BIT_AND_EXPR)
+return false;
+
+  for (int i = 1; i < 3; ++i)
+{
+  tree rhs = gimple_op (assign, i);
+
+  if (TREE_CODE (rhs) != SSA_NAME)
+   continue;
+
+  stmt_vec_info def_stmt_info = vinfo->lookup_def (rhs);
+  if (!def_stmt_info
+ || STMT_VINFO_DEF_TYPE (def_stmt_info) != vect_internal_def)
+   continue;
+
+  gassign *rhs_assign = dyn_cast <gassign *> (def_stmt_info->stmt);
+  if (!rhs_assign
+ || TREE_CODE_CLASS (gimple_assign_rhs_code (rhs_assign))
+   != tcc_comparison)
+   continue;
+
+  if (vec_flags & VEC_ADVSIMD)
+   return false;
+
+  return true;
+}
+  return false;
+}
+
 /* We are considering implementing STMT_INFO using SVE.  If STMT_INFO is an
in-loop reduction that SVE supports directly, return its latency in cycles,
otherwise return zero.  SVE_COSTS specifies the latencies of the relevant
@@ -16750,11 +16793,17 @@ aarch64_adjust_stmt_cost (vec_info *vinfo, 
vect_cost_for_stmt kind,
}
 
   gassign *assign = dyn_cast <gassign *> (STMT_VINFO_STMT (stmt_info));
-  if (assign && !vect_is_reduction (stmt_info))
+  if (assign)
{
  bool simd_p = vec_flags & VEC_ADVSIMD;
  /* For MLA we need to reduce the cost since MLA is 1 instruction.  */
- if (aarch64_multiply_add_p (vinfo, stmt_info, vec_flags, !simd_p))
+ if (!vect_is_reduction (stmt_info)
+ && aarch64_multiply_add_p (vinfo, stmt_info, vec_flags, !simd_p))
+   return 0;
+
+ /* For vector boolean ANDs with a compare operand we just need
+one insn.  */
+ if (aarch64_bool_compound_p (vinfo, stmt_info, vec_flags))
return 0;
}
 
@@ -16831,6 +16880,12 @@ aarch64_vector_costs::count_ops (unsigned int count, 
vect_cost_for_stmt kind,
   && aarch64_multiply_add_p (m_vinfo, stmt_info, m_vec_flags, false))
 return;
 
+  /* Assume that bool AND with compare operands will become a single
+ operation.  */
+  if (stmt_info
+  && aarch64_bool_compound_p (m_vinfo, stmt_info, m_vec_flags))
+return;
+
   /* Count the basic operation cost associated with KIND.  */
   switch (kind)
 {





[PATCH]AArch64 Undo vec_widen_shiftl optabs [PR106346]

2023-08-02 Thread Tamar Christina via Gcc-patches
Hi All,

In GCC 11 we implemented the vectorizer optab for widening left shifts;
however, this optab is only supported for uniform shift constants.

At the moment GCC still has two loop vectorization strategies (classical loop
and SLP-based loop vectorization), and the optab is implemented as a scalar
pattern.

This means that when we apply it to a non-uniform constant inside a loop we only
find out during SLP build that the constants aren't uniform.  At this point it's
too late and we lose SLP entirely.
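
A sketch of the problematic shape (illustrative only) is a widening left shift
whose shift amounts are not all the same, so the scalar pattern's uniformity
requirement is only discovered at SLP build time:

void
f (unsigned int *restrict out, unsigned short *restrict in, int n)
{
  for (int i = 0; i < n; i += 2)
    {
      out[i]     = in[i] << 1;       /* widening shifts ...  */
      out[i + 1] = in[i + 1] << 2;   /* ... with non-uniform amounts */
    }
}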

Over the years I've tried various options, but none of them works well:

1. Dissolving patterns during SLP build (problematic, also dissolves them for
non-SLP).
2. Optionally ignoring patterns for SLP build (problematic, ends up interfering
with relevancy detection).
3. Relaxing the constraint on SLP build to allow non-constant values and
dissolving them after SLP build using an SLP pattern (problematic, ends up
breaking shift reassociation).

As a result we've concluded that for now this pattern should just be removed
and formed at the RTL level instead.

The plan is to move this to an SLP only pattern once we remove classical loop
vectorization support from GCC, at which time we can also properly support SVE's
Top and Bottom variants.

This removes the optab and reworks the RTL to recognize both the vector variant
and the intrinsics variant.  Also just simplifies all these patterns.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR target/106346
* config/aarch64/aarch64-simd.md (vec_widen_shiftl_lo_,
vec_widen_shiftl_hi_): Remove.
(aarch64_shll_internal): Renamed to...
(aarch64_shll): .. This.
(aarch64_shll2_internal): Renamed to...
(aarch64_shll2): .. This.
(aarch64_shll_n, aarch64_shll2_n): Re-use new
optabs.
* config/aarch64/constraints.md (D2, D3): New.
* config/aarch64/predicates.md (aarch64_simd_shift_imm_vec): New.

gcc/testsuite/ChangeLog:

PR target/106346
* gcc.target/aarch64/pr98772.c: Adjust assembly.
* gcc.target/aarch64/vect-widen-shift.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
d95394101470446e55f25a2397dd112239b6a54d..afd5b8632afbcddf8dad14495c3446c560eb085d
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -6387,105 +6387,66 @@ (define_insn "aarch64_qshl"
   [(set_attr "type" "neon_sat_shift_reg")]
 )
 
-(define_expand "vec_widen_shiftl_lo_"
-  [(set (match_operand: 0 "register_operand" "=w")
-   (unspec: [(match_operand:VQW 1 "register_operand" "w")
-(match_operand:SI 2
-  "aarch64_simd_shift_imm_bitsize_" "i")]
-VSHLL))]
-  "TARGET_SIMD"
-  {
-rtx p = aarch64_simd_vect_par_cnst_half (mode, , false);
-emit_insn (gen_aarch64_shll_internal (operands[0], operands[1],
-p, operands[2]));
-DONE;
-  }
-)
-
-(define_expand "vec_widen_shiftl_hi_"
-   [(set (match_operand: 0 "register_operand")
-   (unspec: [(match_operand:VQW 1 "register_operand" "w")
-(match_operand:SI 2
-  "immediate_operand" "i")]
- VSHLL))]
-   "TARGET_SIMD"
-   {
-rtx p = aarch64_simd_vect_par_cnst_half (mode, , true);
-emit_insn (gen_aarch64_shll2_internal (operands[0], operands[1],
- p, operands[2]));
-DONE;
-   }
-)
-
 ;; vshll_n
 
-(define_insn "aarch64_shll_internal"
-  [(set (match_operand: 0 "register_operand" "=w")
-   (unspec: [(vec_select:
-   (match_operand:VQW 1 "register_operand" "w")
-   (match_operand:VQW 2 "vect_par_cnst_lo_half" ""))
-(match_operand:SI 3
-  "aarch64_simd_shift_imm_bitsize_" "i")]
-VSHLL))]
+(define_insn "aarch64_shll"
+  [(set (match_operand: 0 "register_operand")
+   (ashift: (ANY_EXTEND:
+   (match_operand:VD_BHSI 1 "register_operand"))
+(match_operand: 2
+  "aarch64_simd_shift_imm_vec")))]
   "TARGET_SIMD"
-  {
-if (INTVAL (operands[3]) == GET_MODE_UNIT_BITSIZE (mode))
-  return "shll\\t%0., %1., %3";
-else
-  return "shll\\t%0., %1., %3";
+  {@ [cons: =0, 1, 2]
+ [w, w, D2] shll\t%0., %1., %I2
+ [w, w, D3] shll\t%0., %1., %I2
   }
   [(set_attr "type" "neon_shift_imm_long")]
 )
 
-(define_insn "aarch64_shll2_internal"
-  [(set (match_operand: 0 "register_operand" "=w")
-   (unspec: [(vec_select:
-   (match_operand:VQW 1 "register_operand" "w")
-   (match_operand:VQW 2 "vect_par_cnst_hi_half" ""))
-(match_operand:SI 3
-   

[PATCH][gensupport]: Don't segfault on empty attrs list

2023-08-02 Thread Tamar Christina via Gcc-patches
Hi All,

Currently we segfault when len == 0 for an attribute list.

Essentially [cons: =0, 1, 2, 3; attrs: ] segfaults, but it should be equivalent to
[cons: =0, 1, 2, 3] and [cons: =0, 1, 2, 3; attrs:].  This fixes it by just
returning early and leaving it to the validators whether this should error out
or not.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

* gensupport.cc (conlist): Support length 0 attribute.

--- inline copy of patch -- 
diff --git a/gcc/gensupport.cc b/gcc/gensupport.cc
index 
959d1d9c83cf397fcb344e8d3db0f339a967587f..5c5f1cf4781551d3db95103c19cd1b70d98f4f73
 100644
--- a/gcc/gensupport.cc
+++ b/gcc/gensupport.cc
@@ -619,6 +619,9 @@ public:
  [ns..ns + len) should equal XSTR (rtx, 0).  */
   conlist (const char *ns, unsigned int len, bool numeric)
   {
+if (len == 0)
+  return;
+
 /* Trim leading whitespaces.  */
 while (ISBLANK (*ns))
   {









Re: [PATCH]AArch64 update costing for MLA by invariant

2023-08-02 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
> Hi All,
>
> When determining issue rates we currently discount non-constant MLA 
> accumulators
> for Advanced SIMD but don't do it for the latency.
>
> This means the costs for Advanced SIMD with a constant accumulator are wrong 
> and
> results in us costing SVE and Advanced SIMD the same.  This can cause us to
> vectorize with Advanced SIMD instead of SVE in some cases.
>
> This patch adds the same discount for SVE and Scalar as we do for issue rate.
>
> My assumption was that on issue rate we reject all scalar constants early
> because we take into account the extra instruction to create the constant?
> Though I'd have expected this to be in prologue costs.  For this reason I 
> added
> an extra parameter to allow me to force the check to at least look for the
> multiplication.

I'm not sure that was it.  I wish I'd added a comment to say what
it was though :(  I suspect different parts of this function were
written at different times, hence the inconsistency.

> This gives a 5% improvement in fotonik3d_r in SPECCPU 2017 on large
> Neoverse cores.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64.cc (aarch64_multiply_add_p): Add param
>   allow_constants. 
>   (aarch64_adjust_stmt_cost): Use it.
>   (aarch64_vector_costs::count_ops): Likewise.
>   (aarch64_vector_costs::add_stmt_cost): Pass vinfo to
>   aarch64_adjust_stmt_cost.
>
> --- inline copy of patch -- 
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 
> 560e5431636ef46c41d56faa0c4e95be78f64b50..76b74b77b3f122a3c972557e2f83b63ba365fea9
>  100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -16398,10 +16398,11 @@ aarch64_advsimd_ldp_stp_p (enum vect_cost_for_stmt 
> kind,
> or multiply-subtract sequence that might be suitable for fusing into a
> single instruction.  If VEC_FLAGS is zero, analyze the operation as
> a scalar one, otherwise analyze it as an operation on vectors with those
> -   VEC_* flags.  */
> +   VEC_* flags.  When ALLOW_CONSTANTS we'll recognize all accumulators 
> including
> +   constant ones.  */
>  static bool
>  aarch64_multiply_add_p (vec_info *vinfo, stmt_vec_info stmt_info,
> - unsigned int vec_flags)
> + unsigned int vec_flags, bool allow_constants)
>  {
>gassign *assign = dyn_cast (stmt_info->stmt);
>if (!assign)
> @@ -16410,8 +16411,9 @@ aarch64_multiply_add_p (vec_info *vinfo, 
> stmt_vec_info stmt_info,
>if (code != PLUS_EXPR && code != MINUS_EXPR)
>  return false;
>  
> -  if (CONSTANT_CLASS_P (gimple_assign_rhs1 (assign))
> -  || CONSTANT_CLASS_P (gimple_assign_rhs2 (assign)))
> +  if (!allow_constants
> +  && (CONSTANT_CLASS_P (gimple_assign_rhs1 (assign))
> +   || CONSTANT_CLASS_P (gimple_assign_rhs2 (assign
>  return false;
>  
>for (int i = 1; i < 3; ++i)
> @@ -16429,7 +16431,7 @@ aarch64_multiply_add_p (vec_info *vinfo, 
> stmt_vec_info stmt_info,
>if (!rhs_assign || gimple_assign_rhs_code (rhs_assign) != MULT_EXPR)
>   continue;
>  
> -  if (vec_flags & VEC_ADVSIMD)
> +  if (!allow_constants && (vec_flags & VEC_ADVSIMD))
>   {
> /* Scalar and SVE code can tie the result to any FMLA input (or none,
>although that requires a MOVPRFX for SVE).  However, Advanced SIMD
> @@ -16441,7 +16443,8 @@ aarch64_multiply_add_p (vec_info *vinfo, 
> stmt_vec_info stmt_info,
>   return false;
> def_stmt_info = vinfo->lookup_def (rhs);
> if (!def_stmt_info
> -   || STMT_VINFO_DEF_TYPE (def_stmt_info) == vect_external_def)
> +   || STMT_VINFO_DEF_TYPE (def_stmt_info) == vect_external_def
> +   || STMT_VINFO_DEF_TYPE (def_stmt_info) == vect_constant_def)

Do you see vect_constant_defs in practice, or is this just for completeness?
I would expect any constants to appear as direct operands.  I don't mind
keeping it if it's just a belt-and-braces thing though.

But rather than add the allow_constants parameter, I think we should
just try removing:

  if (CONSTANT_CLASS_P (gimple_assign_rhs1 (assign))
  || CONSTANT_CLASS_P (gimple_assign_rhs2 (assign)))
return false;

so that the detection is the same for throughput and latency.  I think:

  if (vec_flags & VEC_ADVSIMD)
{
  /* Scalar and SVE code can tie the result to any FMLA input (or none,
 although that requires a MOVPRFX for SVE).  However, Advanced SIMD
 only supports MLA forms, so will require a move if the result
 cannot be tied to the accumulator.  The most important case in
 which this is true is when the accumulator input is invariant.  */
  rhs = gimple_op (assign, 3 - i);
  if (TREE_CODE (rhs) != SSA_NAME)
return false;
  def_stmt_info = vinfo->lookup_

Re: [PATCH v2][RFC] c-family: Implement __has_feature and __has_extension [PR60512]

2023-08-02 Thread Alex Coplan via Gcc-patches
On 26/07/2023 16:26, Jason Merrill wrote:
> On 6/28/23 06:35, Alex Coplan wrote:
> > Hi,
> > 
> > This patch implements clang's __has_feature and __has_extension in GCC.
> > This is a v2 of the original RFC posted here:
> > 
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617878.html
> > 
> > Changes since v1:
> >   - Follow the clang behaviour where -pedantic-errors means that
> > __has_extension behaves exactly like __has_feature.
> >   - We're now more conservative with reporting C++ features as extensions
> > available in C++98. For features where we issue a pedwarn in C++98
> > mode, we no longer report these as available extensions for C++98.
> >   - Switch to using a hash_map to store the features. As well as ensuring
> > lookup is constant time, this allows us to dynamically register
> > features (right now based on frontend, but later we could allow the
> > target to register additional features).
> >   - Also implement some Objective-C features, add a langhook to dispatch
> > to each frontend to allow it to register language-specific features.
> 
> Hmm, it seems questionable to use a generic langhook for something that the
> generic code doesn't care about, only the c-family front ends.  A common
> pattern in c-family is to declare a signature in c-common.h and define it
> differently for the various front-ends, i.e. in the *-lang.cc files.

Thanks. I wasn't sure if, for each frontend, there was a source file
that gets linked into exactly one frontend, but it looks like the
*-lang.cc files will do the job. I'll rework the patch to drop the
langhook and use this approach instead.

> 
> > There is an outstanding question around what to do with
> > cxx_binary_literals in the C frontend for C2x. Should we introduce a new
> > c_binary_literals feature that is a feature in C2x and an extension
> > below that, or should we just continue using the cxx_binary_literals
> > feature and mark that as a standard feature in C2x? See the comment in
> > c_feature_table in the patch.
> 
> What does clang do here?

The status quo in clang is that there is no identifier that gets
reported as a feature for this in C (even with -std=c2x).
cxx_binary_literals is reported just as an extension (even with -std=c2x).
It does seem that there should be at least one identifier which reports
this as a feature with -std=c2x, though. WDYT?
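
(For reference, user code probing for this would typically look like the sketch
below.  The fallback #define is the usual idiom for compilers that predate
__has_extension; cxx_binary_literals is the identifier discussed above, and
MASK is just an arbitrary example macro.)

#ifndef __has_extension
#  define __has_extension(x) 0
#endif

#if __has_extension (cxx_binary_literals)
#  define MASK 0b1010   /* binary literal available as an extension */
#else
#  define MASK 0xa
#endif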

> 
> > There is also some doubt over what to do with the undocumented "tls"
> > feature.  In clang this is gated on whether the target supports TLS, but
> > in clang (unlike GCC) it is a hard error to use TLS when the target
> > doesn't support it.  In GCC I believe you can always use TLS, you just
> > get emulated TLS in the case that the target doesn't support it
> > natively.  So in this patch GCC always reports having the "tls" feature.
> > Would appreciate if anyone has feedback on this aspect.
> 
> Hmm, I don't think GCC always supports TLS, given that the testsuite has a
> predicate to check for that support (and others to check for emulated or
> native support).

Hmm, I see there is a check_effective_target_tls predicate for this,
indeed. I wonder if this might be a holdover, though. I can't seem to
configure a GCC without TLS. Even if I configure with
--target=aarch64-none-elf --disable-tls, for example, I get emutls
if I compile code using thread-local variables.

Do we know of a GCC configuration where thread-local variables actually
get rejected (and hence check_effective_target_tls returns false)?

> 
> But I think it's right to report having "tls" for emulated support.
> 
> > I know Iain was concerned that it should be possible to have
> > target-specific features. Hopefully it is clear that the design in this
> > patch is more amenable in this. I think for Darwin it should be possible
> > to add a targetcm hook to register additional features (either passing
> > through a callback to allow the target code to add to the hash_map, or
> > exposing a separate langhook that the target can call to register
> > features).
> 
> The design seems a bit complicated still, with putting a callback into the
> map.  Do we need the callbacks?  Do we expect the value of __has_feature to
> change at different points in compilation?  Does that happen in clang?

This is a good point. Certainly if we were to add features that depend
on the target architecture features, then this can change mid-way
through a TU, so having this flexibility in the design does provide some
potential future-proofing.

I had a look through the existing features, and I did wonder about cases
like this:

__attribute__((no_sanitize("undefined")))
int f() {
  return __has_feature (undefined_behavior_sanitizer);
}

but of course since __has_feature is evaluated during preprocessing,
there's no way that the attribute could be taken into account here (and
indeed clang does not).

I'll drop the callbacks from the patch for now, unless you think we
should keep them for future-proofing.

> 
> > B

RE: [PATCH]AArch64 update costing for MLA by invariant

2023-08-02 Thread Tamar Christina via Gcc-patches
> Tamar Christina  writes:
> > Hi All,
> >
> > When determining issue rates we currently discount non-constant MLA
> > accumulators for Advanced SIMD but don't do it for the latency.
> >
> > This means the costs for Advanced SIMD with a constant accumulator are
> > wrong and results in us costing SVE and Advanced SIMD the same.  This
> > can cause us to vectorize with Advanced SIMD instead of SVE in some cases.
> >
> > This patch adds the same discount for SVE and Scalar as we do for issue 
> > rate.
> >
> > My assumption was that on issue rate we reject all scalar constants
> > early because we take into account the extra instruction to create the
> constant?
> > Though I'd have expected this to be in prologue costs.  For this
> > reason I added an extra parameter to allow me to force the check to at
> > least look for the multiplication.
> 
> I'm not sure that was it.  I wish I'd added a comment to say what it was
> though :(  I suspect different parts of this function were written at 
> different
> times, hence the inconsistency.
> 
> > This gives a 5% improvement in fotonik3d_r in SPECCPU 2017 on large
> > Neoverse cores.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64.cc (aarch64_multiply_add_p): Add param
> > allow_constants.
> > (aarch64_adjust_stmt_cost): Use it.
> > (aarch64_vector_costs::count_ops): Likewise.
> > (aarch64_vector_costs::add_stmt_cost): Pass vinfo to
> > aarch64_adjust_stmt_cost.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/config/aarch64/aarch64.cc
> > b/gcc/config/aarch64/aarch64.cc index
> >
> 560e5431636ef46c41d56faa0c4e95be78f64b50..76b74b77b3f122a3c9725
> 57e2f83
> > b63ba365fea9 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -16398,10 +16398,11 @@ aarch64_advsimd_ldp_stp_p (enum
> vect_cost_for_stmt kind,
> > or multiply-subtract sequence that might be suitable for fusing into a
> > single instruction.  If VEC_FLAGS is zero, analyze the operation as
> > a scalar one, otherwise analyze it as an operation on vectors with those
> > -   VEC_* flags.  */
> > +   VEC_* flags.  When ALLOW_CONSTANTS we'll recognize all accumulators
> including
> > +   constant ones.  */
> >  static bool
> >  aarch64_multiply_add_p (vec_info *vinfo, stmt_vec_info stmt_info,
> > -   unsigned int vec_flags)
> > +   unsigned int vec_flags, bool allow_constants)
> >  {
> >gassign *assign = dyn_cast (stmt_info->stmt);
> >if (!assign)
> > @@ -16410,8 +16411,9 @@ aarch64_multiply_add_p (vec_info *vinfo,
> stmt_vec_info stmt_info,
> >if (code != PLUS_EXPR && code != MINUS_EXPR)
> >  return false;
> >
> > -  if (CONSTANT_CLASS_P (gimple_assign_rhs1 (assign))
> > -  || CONSTANT_CLASS_P (gimple_assign_rhs2 (assign)))
> > +  if (!allow_constants
> > +  && (CONSTANT_CLASS_P (gimple_assign_rhs1 (assign))
> > + || CONSTANT_CLASS_P (gimple_assign_rhs2 (assign
> >  return false;
> >
> >for (int i = 1; i < 3; ++i)
> > @@ -16429,7 +16431,7 @@ aarch64_multiply_add_p (vec_info *vinfo,
> stmt_vec_info stmt_info,
> >if (!rhs_assign || gimple_assign_rhs_code (rhs_assign) != MULT_EXPR)
> > continue;
> >
> > -  if (vec_flags & VEC_ADVSIMD)
> > +  if (!allow_constants && (vec_flags & VEC_ADVSIMD))
> > {
> >   /* Scalar and SVE code can tie the result to any FMLA input (or none,
> >  although that requires a MOVPRFX for SVE).  However, Advanced
> > SIMD @@ -16441,7 +16443,8 @@ aarch64_multiply_add_p (vec_info
> *vinfo, stmt_vec_info stmt_info,
> > return false;
> >   def_stmt_info = vinfo->lookup_def (rhs);
> >   if (!def_stmt_info
> > - || STMT_VINFO_DEF_TYPE (def_stmt_info) == vect_external_def)
> > + || STMT_VINFO_DEF_TYPE (def_stmt_info) == vect_external_def
> > + || STMT_VINFO_DEF_TYPE (def_stmt_info) == vect_constant_def)
> 
> Do you see vect_constant_defs in practice, or is this just for completeness?
> I would expect any constants to appear as direct operands.  I don't mind
> keeping it if it's just a belt-and-braces thing though.

In the latency case, where I had allow_constants, the early rejection based on
the operand itself doesn't happen, so I still needed to reject the constants,
just after the multiply check instead.  While they do appear as direct
operands as well, they also have their own nodes, in particular for SLP, so the
constants are handled as a group.

But I can also check CONSTANT_CLASS_P (rhs) if that's preferable.
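
(For reference, the kind of loop this costing change is concerned with looks
roughly like the sketch below; it is not a testcase from the patch, just an
illustration of a multiply-add whose accumulator is loop-invariant.)

void
f (int *restrict out, int *restrict a, int *restrict b, int inv, int n)
{
  for (int i = 0; i < n; ++i)
    out[i] = inv + a[i] * b[i];
}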

> 
> But rather than add the allow_constants parameter, I think we should just try
> removing:
> 
>   if (CONSTANT_CLASS_P (gimple_assign_rhs1 (assign))
>   || CONSTANT_CLASS_P (gimple_assign_rhs2 (assign)))
> return false;
> 
> so that the detection is the same for throughput and latency.  I think:
> 
>   if (ve

Re: [PATCH]AArch64 update costing for combining vector conditionals

2023-08-02 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
> Hi All,
>
> boolean comparisons have different cost depending on the mode. e.g.
> a && b when predicated doesn't require an addition instruction, the AND is 
> free

Nit (for the commit msg): additional

Maybe:

  for SVE, a && b doesn't require an additional instruction when a or b
  is predicated, ...

?

> by combining the predicate of the one operation into the second one.  At the
> moment though we only fuse compares so this update requires one of the
> operands to be a comparison.
>
> Scalars also don't require this because the non-ifct variant is a series of

Typo: ifcvt

> branches where following the branch sequences themselves are natural ANDs.
>
> Advanced SIMD however does require an actual AND to combine the boolean 
> values.
>
> As such this patch discounts Scalar and SVE boolean operation latency and
> throughput.
>
> With this patch comparison heavy code prefers SVE as it should, especially in
> cases with SVE VL == Advanced SIMD VL where previously the SVE prologue costs
> would tip it towards Advanced SIMD.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64.cc (aarch64_bool_compound_p): New.
>   (aarch64_adjust_stmt_cost, aarch64_vector_costs::count_ops): Use it.
>
> --- inline copy of patch -- 
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 
> b1bacc734b4630257b6ebf8ca7d9afeb34008c10..55963bb28be7ede08b05fb9fddb5a65f6818c63e
>  100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -16453,6 +16453,49 @@ aarch64_multiply_add_p (vec_info *vinfo, 
> stmt_vec_info stmt_info,
>return false;
>  }
>  
> +/* Return true if STMT_INFO is the second part of a two-statement boolean AND
> +   expression sequence that might be suitable for fusing into a
> +   single instruction.  If VEC_FLAGS is zero, analyze the operation as
> +   a scalar one, otherwise analyze it as an operation on vectors with those
> +   VEC_* flags.  */
> +
> +static bool
> +aarch64_bool_compound_p (vec_info *vinfo, stmt_vec_info stmt_info,
> +  unsigned int vec_flags)
> +{
> +  gassign *assign = dyn_cast (stmt_info->stmt);
> +  if (!assign
> +  || !STMT_VINFO_VECTYPE (stmt_info)
> +  || !VECTOR_BOOLEAN_TYPE_P (STMT_VINFO_VECTYPE (stmt_info))
> +  || gimple_assign_rhs_code (assign) != BIT_AND_EXPR)

Very minor, sorry, but I think the condition reads more naturally
if the BIT_AND_EXPR test comes immediately after the !assign.

OK with that change, thanks.

Richard

> +return false;
> +
> +  for (int i = 1; i < 3; ++i)
> +{
> +  tree rhs = gimple_op (assign, i);
> +
> +  if (TREE_CODE (rhs) != SSA_NAME)
> + continue;
> +
> +  stmt_vec_info def_stmt_info = vinfo->lookup_def (rhs);
> +  if (!def_stmt_info
> +   || STMT_VINFO_DEF_TYPE (def_stmt_info) != vect_internal_def)
> + continue;
> +
> +  gassign *rhs_assign = dyn_cast (def_stmt_info->stmt);
> +  if (!rhs_assign
> +   || TREE_CODE_CLASS (gimple_assign_rhs_code (rhs_assign))
> + != tcc_comparison)
> + continue;
> +
> +  if (vec_flags & VEC_ADVSIMD)
> + return false;
> +
> +  return true;
> +}
> +  return false;
> +}
> +
>  /* We are considering implementing STMT_INFO using SVE.  If STMT_INFO is an
> in-loop reduction that SVE supports directly, return its latency in 
> cycles,
> otherwise return zero.  SVE_COSTS specifies the latencies of the relevant
> @@ -16750,11 +16793,17 @@ aarch64_adjust_stmt_cost (vec_info *vinfo, 
> vect_cost_for_stmt kind,
>   }
>  
>gassign *assign = dyn_cast (STMT_VINFO_STMT (stmt_info));
> -  if (assign && !vect_is_reduction (stmt_info))
> +  if (assign)
>   {
> bool simd_p = vec_flags & VEC_ADVSIMD;
> /* For MLA we need to reduce the cost since MLA is 1 instruction.  */
> -   if (aarch64_multiply_add_p (vinfo, stmt_info, vec_flags, !simd_p))
> +   if (!vect_is_reduction (stmt_info)
> +   && aarch64_multiply_add_p (vinfo, stmt_info, vec_flags, !simd_p))
> + return 0;
> +
> +   /* For vector boolean ANDs with a compare operand we just need
> +  one insn.  */
> +   if (aarch64_bool_compound_p (vinfo, stmt_info, vec_flags))
>   return 0;
>   }
>  
> @@ -16831,6 +16880,12 @@ aarch64_vector_costs::count_ops (unsigned int count, 
> vect_cost_for_stmt kind,
>&& aarch64_multiply_add_p (m_vinfo, stmt_info, m_vec_flags, false))
>  return;
>  
> +  /* Assume that bool AND with compare operands will become a single
> + operation.  */
> +  if (stmt_info
> +  && aarch64_bool_compound_p (m_vinfo, stmt_info, m_vec_flags))
> +return;
> +
>/* Count the basic operation cost associated with KIND.  */
>switch (kind)
>  {


Re: [PATCH]AArch64 update costing for MLA by invariant

2023-08-02 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
>> Tamar Christina  writes:
>> > Hi All,
>> >
>> > When determining issue rates we currently discount non-constant MLA
>> > accumulators for Advanced SIMD but don't do it for the latency.
>> >
>> > This means the costs for Advanced SIMD with a constant accumulator are
>> > wrong and results in us costing SVE and Advanced SIMD the same.  This
>> > can cause us to vectorize with Advanced SIMD instead of SVE in some cases.
>> >
>> > This patch adds the same discount for SVE and Scalar as we do for issue 
>> > rate.
>> >
>> > My assumption was that on issue rate we reject all scalar constants
>> > early because we take into account the extra instruction to create the
>> constant?
>> > Though I'd have expected this to be in prologue costs.  For this
>> > reason I added an extra parameter to allow me to force the check to at
>> > least look for the multiplication.
>> 
>> I'm not sure that was it.  I wish I'd added a comment to say what it was
>> though :(  I suspect different parts of this function were written at 
>> different
>> times, hence the inconsistency.
>> 
>> > This gives a 5% improvement in fotonik3d_r in SPECCPU 2017 on large
>> > Neoverse cores.
>> >
>> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>> >
>> > Ok for master?
>> >
>> > Thanks,
>> > Tamar
>> >
>> > gcc/ChangeLog:
>> >
>> >* config/aarch64/aarch64.cc (aarch64_multiply_add_p): Add param
>> >allow_constants.
>> >(aarch64_adjust_stmt_cost): Use it.
>> >(aarch64_vector_costs::count_ops): Likewise.
>> >(aarch64_vector_costs::add_stmt_cost): Pass vinfo to
>> >aarch64_adjust_stmt_cost.
>> >
>> > --- inline copy of patch --
>> > diff --git a/gcc/config/aarch64/aarch64.cc
>> > b/gcc/config/aarch64/aarch64.cc index
>> >
>> 560e5431636ef46c41d56faa0c4e95be78f64b50..76b74b77b3f122a3c9725
>> 57e2f83
>> > b63ba365fea9 100644
>> > --- a/gcc/config/aarch64/aarch64.cc
>> > +++ b/gcc/config/aarch64/aarch64.cc
>> > @@ -16398,10 +16398,11 @@ aarch64_advsimd_ldp_stp_p (enum
>> vect_cost_for_stmt kind,
>> > or multiply-subtract sequence that might be suitable for fusing into a
>> > single instruction.  If VEC_FLAGS is zero, analyze the operation as
>> > a scalar one, otherwise analyze it as an operation on vectors with 
>> > those
>> > -   VEC_* flags.  */
>> > +   VEC_* flags.  When ALLOW_CONSTANTS we'll recognize all accumulators
>> including
>> > +   constant ones.  */
>> >  static bool
>> >  aarch64_multiply_add_p (vec_info *vinfo, stmt_vec_info stmt_info,
>> > -  unsigned int vec_flags)
>> > +  unsigned int vec_flags, bool allow_constants)
>> >  {
>> >gassign *assign = dyn_cast (stmt_info->stmt);
>> >if (!assign)
>> > @@ -16410,8 +16411,9 @@ aarch64_multiply_add_p (vec_info *vinfo,
>> stmt_vec_info stmt_info,
>> >if (code != PLUS_EXPR && code != MINUS_EXPR)
>> >  return false;
>> >
>> > -  if (CONSTANT_CLASS_P (gimple_assign_rhs1 (assign))
>> > -  || CONSTANT_CLASS_P (gimple_assign_rhs2 (assign)))
>> > +  if (!allow_constants
>> > +  && (CONSTANT_CLASS_P (gimple_assign_rhs1 (assign))
>> > +|| CONSTANT_CLASS_P (gimple_assign_rhs2 (assign
>> >  return false;
>> >
>> >for (int i = 1; i < 3; ++i)
>> > @@ -16429,7 +16431,7 @@ aarch64_multiply_add_p (vec_info *vinfo,
>> stmt_vec_info stmt_info,
>> >if (!rhs_assign || gimple_assign_rhs_code (rhs_assign) != MULT_EXPR)
>> >continue;
>> >
>> > -  if (vec_flags & VEC_ADVSIMD)
>> > +  if (!allow_constants && (vec_flags & VEC_ADVSIMD))
>> >{
>> >  /* Scalar and SVE code can tie the result to any FMLA input (or none,
>> > although that requires a MOVPRFX for SVE).  However, Advanced
>> > SIMD @@ -16441,7 +16443,8 @@ aarch64_multiply_add_p (vec_info
>> *vinfo, stmt_vec_info stmt_info,
>> >return false;
>> >  def_stmt_info = vinfo->lookup_def (rhs);
>> >  if (!def_stmt_info
>> > -|| STMT_VINFO_DEF_TYPE (def_stmt_info) == vect_external_def)
>> > +|| STMT_VINFO_DEF_TYPE (def_stmt_info) == vect_external_def
>> > +|| STMT_VINFO_DEF_TYPE (def_stmt_info) == vect_constant_def)
>> 
>> Do you see vect_constant_defs in practice, or is this just for completeness?
>> I would expect any constants to appear as direct operands.  I don't mind
>> keeping it if it's just a belt-and-braces thing though.
>
> In the latency case, where I had allow_constants, the early rejection based on
> the operand itself doesn't happen, so I still needed to reject the constants,
> just after the multiply check instead.  While they do appear as direct
> operands as well, they also have their own nodes, in particular for SLP, so the
> constants are handled as a group.

Ah, OK, thanks.

> But I can also check CONSTANT_CLASS_P (rhs) if that's preferable.

No, what you did is more correct.  I just wasn't sure at first which case
it was handling.

Thanks,
Richard


[PATCH] Make add_phi_node_to_bb static

2023-08-02 Thread Richard Biener via Gcc-patches
The only exported PHI allocation already adds the PHI node to a block.

Bootstrapped on x86_64-unknown-linux-gnu, pushed.

* tree-phinodes.h (add_phi_node_to_bb): Remove.
* tree-phinodes.cc  (add_phi_node_to_bb): Make static.
---
 gcc/tree-phinodes.cc | 3 +--
 gcc/tree-phinodes.h  | 1 -
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/gcc/tree-phinodes.cc b/gcc/tree-phinodes.cc
index 976f3dbae10..63baec4c16a 100644
--- a/gcc/tree-phinodes.cc
+++ b/gcc/tree-phinodes.cc
@@ -315,7 +315,7 @@ reserve_phi_args_for_new_edge (basic_block bb)
 
 /* Adds PHI to BB.  */
 
-void
+static void
 add_phi_node_to_bb (gphi *phi, basic_block bb)
 {
   gimple_seq seq = phi_nodes (bb);
@@ -330,7 +330,6 @@ add_phi_node_to_bb (gphi *phi, basic_block bb)
 
   /* Associate BB to the PHI node.  */
   gimple_set_bb (phi, bb);
-
 }
 
 /* Create a new PHI node for variable VAR at basic block BB.  */
diff --git a/gcc/tree-phinodes.h b/gcc/tree-phinodes.h
index be114e317b4..99209ad3392 100644
--- a/gcc/tree-phinodes.h
+++ b/gcc/tree-phinodes.h
@@ -22,7 +22,6 @@ along with GCC; see the file COPYING3.  If not see
 
 extern void phinodes_print_statistics (void);
 extern void reserve_phi_args_for_new_edge (basic_block);
-extern void add_phi_node_to_bb (gphi *phi, basic_block bb);
 extern gphi *create_phi_node (tree, basic_block);
 extern void add_phi_arg (gphi *, tree, edge, location_t);
 extern void remove_phi_args (edge);
-- 
2.35.3


[PATCH] aarch64: SVE/NEON Bridging intrinsics

2023-08-02 Thread Richard Ball via Gcc-patches

ACLE has added intrinsics to bridge between SVE and Neon.

The NEON_SVE Bridge adds intrinsics that allow conversions between NEON and
SVE vectors.

This patch adds support to GCC for the following 3 intrinsics:
svset_neonq, svget_neonq and svdup_neonq
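
(For illustration, user code built on these would look roughly like the sketch
below.  The prototypes follow the ACLE NEON-SVE bridge specification rather
than anything quoted in this mail, so treat the exact spellings as assumptions;
the testcase in the patch exercises the real ones.)

#include <arm_neon.h>
#include <arm_sve.h>
#include <arm_neon_sve_bridge.h>

svfloat32_t
insert_low_quad (svfloat32_t acc, float32x4_t v)
{
  /* Place the 128-bit NEON vector in the low 128 bits of the SVE vector.  */
  return svset_neonq (acc, v);
}

float32x4_t
extract_low_quad (svfloat32_t x)
{
  /* Read back the low 128 bits of an SVE vector as a NEON vector.  */
  return svget_neonq (x);
}

svfloat32_t
broadcast_quad (float32x4_t v)
{
  /* Duplicate the 128-bit NEON vector across the whole SVE vector.  */
  return svdup_neonq (v);
}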

gcc/ChangeLog:

* config.gcc: Adds new header to config.
* config/aarch64/aarch64-builtins.cc (GTY): Externs aarch64_simd_types.
* config/aarch64/aarch64-c.cc (aarch64_pragma_aarch64):
 Defines pragma for arm_neon_sve_bridge.h.
* config/aarch64/aarch64-protos.h: New function.
* config/aarch64/aarch64-sve-builtins-base.h: New intrinsics.
* config/aarch64/aarch64-sve-builtins-base.cc
 (class svget_neonq_impl): New intrinsic implementation.
(class svset_neonq_impl): Likewise.
(class svdup_neonq_impl): Likewise.
(NEON_SVE_BRIDGE_FUNCTION): New intrinsics.
* config/aarch64/aarch64-sve-builtins-functions.h
 (NEON_SVE_BRIDGE_FUNCTION): Defines macro for NEON_SVE_BRIDGE 
functions.

* config/aarch64/aarch64-sve-builtins-shapes.h: New shapes.
* config/aarch64/aarch64-sve-builtins-shapes.cc
 (parse_neon_type): Parser for NEON types.
(parse_element_type): Add NEON element types.
(parse_type): Likewise.
(NEON_SVE_BRIDGE_SHAPE): Defines macro for NEON_SVE_BRIDGE shapes.
(struct get_neonq_def): Defines function shape for get_neonq.
(struct set_neonq_def): Defines function shape for set_neonq.
(struct dup_neonq_def): Defines function shape for dup_neonq.
* config/aarch64/aarch64-sve-builtins.cc (DEF_NEON_SVE_FUNCTION): 
Defines
 macro for NEON_SVE_BRIDGE functions.
(handle_arm_neon_sve_bridge_h): Handles #pragma arm_neon_sve_bridge.h.
* config/aarch64/aarch64-builtins.h: New header file to extern neon 
types.
* config/aarch64/aarch64-neon-sve-bridge-builtins.def: New intrinsics
 function def file.
* config/aarch64/arm_neon_sve_bridge.h: New header file.

gcc/testsuite/ChangeLog:

* gcc.c-torture/execute/neon-sve-bridge.c: New test.


diff --git a/gcc/config.gcc b/gcc/config.gcc
index 
d88071773c9e1280cc5f38e36e09573214323b48..ca55992200dbe58782c3dbf66906339de021ba6b 
100644

--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -334,7 +334,7 @@ m32c*-*-*)
  ;;
  aarch64*-*-*)
cpu_type=aarch64
-   extra_headers="arm_fp16.h arm_neon.h arm_bf16.h arm_acle.h arm_sve.h"
+	extra_headers="arm_fp16.h arm_neon.h arm_bf16.h arm_acle.h arm_sve.h 
arm_neon_sve_bridge.h"

c_target_objs="aarch64-c.o"
cxx_target_objs="aarch64-c.o"
d_target_objs="aarch64-d.o"
diff --git a/gcc/config/aarch64/aarch64-builtins.h 
b/gcc/config/aarch64/aarch64-builtins.h

new file mode 100644
index 
..eebde448f92c230c8f88b4da1ca8ebd9670b1536

--- /dev/null
+++ b/gcc/config/aarch64/aarch64-builtins.h
@@ -0,0 +1,86 @@
+/* Builtins' description for AArch64 SIMD architecture.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   This file is part of GCC.
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+#ifndef GCC_AARCH64_BUILTINS_H
+#define GCC_AARCH64_BUILTINS_H
+#include "tree.h"
+enum aarch64_type_qualifiers
+{
+  /* T foo.  */
+  qualifier_none = 0x0,
+  /* unsigned T foo.  */
+  qualifier_unsigned = 0x1, /* 1 << 0  */
+  /* const T foo.  */
+  qualifier_const = 0x2, /* 1 << 1  */
+  /* T *foo.  */
+  qualifier_pointer = 0x4, /* 1 << 2  */
+  /* Used when expanding arguments if an operand could
+ be an immediate.  */
+  qualifier_immediate = 0x8, /* 1 << 3  */
+  qualifier_maybe_immediate = 0x10, /* 1 << 4  */
+  /* void foo (...).  */
+  qualifier_void = 0x20, /* 1 << 5  */
+  /* 1 << 6 is now unused */
+  /* Some builtins should use the T_*mode* encoded in a simd_builtin_datum
+ rather than using the type of the operand.  */
+  qualifier_map_mode = 0x80, /* 1 << 7  */
+  /* qualifier_pointer | qualifier_map_mode  */
+  qualifier_pointer_map_mode = 0x84,
+  /* qualifier_const | qualifier_pointer | qualifier_map_mode  */
+  qualifier_const_pointer_map_mode = 0x86,
+  /* Polynomial types.  */
+  qualifier_poly = 0x100,
+  /* Lane indices - must be in range, and flipped for bigendian.  */
+  qualifier_lane_index = 0x200,
+  /* Lane 

Re: _BitInt vs. _Atomic

2023-08-02 Thread Michael Matz via Gcc-patches
Hello,

On Tue, 1 Aug 2023, Joseph Myers wrote:

> > Only because cmpxchg is defined in terms of memcpy/memcmp.  If it were 
> > defined in terms of the == operator (obviously applied recursively 
> > member-wise for structs) and simple-assignment that wouldn't be a problem.  
> 
> It also wouldn't work for floating point, where I think clearly the atomic 
> operations should consider positive and negative zero as different, and 
> should consider different DFP quantum exponents for the same real number 
> as different - but should also consider the same NaN (same payload, same 
> choice of quiet / signaling) as being the same.

That is all true.  But the current wording can't work either.  It happily
requires copying memory around between types of different representations
and sizes, it makes padding observable behaviour, and because of that it
breaks basic algebraic guarantees (after two values have compared equal
with the predicates of the algebra, they are not then in fact equal under
predicates of the same algebra).
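
(A minimal sketch of the kind of case being discussed, assuming a type whose
object representation contains padding bits, e.g. x86 long double:)

#include <stdatomic.h>

_Atomic long double val;

_Bool
try_replace (long double expected, long double desired)
{
  /* atomic_compare_exchange_strong compares object representations, as if
     by memcmp, so two operands that compare equal with == can still fail
     the exchange when their padding bits differ - the padding has become
     observable.  */
  return atomic_compare_exchange_strong (&val, &expected, desired);
}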


Ciao,
Michael.


Re: [PATCH] tree-optimization/110838 - vectorization of widened shifts

2023-08-02 Thread Richard Biener via Gcc-patches
On Tue, 1 Aug 2023, Richard Sandiford wrote:

> Richard Sandiford  writes:
> > Richard Biener via Gcc-patches  writes:
> >> The following makes sure to limit the shift operand when vectorizing
> >> (short)((int)x >> 31) via (short)x >> 31 as the out of bounds shift
> >> operand otherwise invokes undefined behavior.  When we determine
> >> whether we can demote the operand we know we at most shift in the
> >> sign bit so we can adjust the shift amount.
> >>
> >> Note this has the possibility of un-CSEing common shift operands
> >> as there's no good way to share pattern stmts between patterns.
> >> We'd have to separately pattern recognize the definition.
> >>
> >> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> >>
> >> Not sure about LSHIFT_EXPR, it probably has the same issue but
> >> the fallback optimistic zero for out-of-range shifts is at least
> >> "corrrect".  Not sure we ever try to demote rotates (probably not).
> >
> > I guess you mean "correct" for x86?  But that's just a quirk of x86.
> > IMO the behaviour is equally wrong for LSHIFT_EXPR.

I meant "correct" for the constant folding that evaluates out-of-bound
shifts as zero.

> Sorry for the multiple messages.  Wanted to get something out quickly
> because I wasn't sure how long it would take me to write this...
> 
> On rotates, for:
> 
> void
> foo (unsigned short *restrict ptr)
> {
>   for (int i = 0; i < 200; ++i)
> {
>   unsigned int x = ptr[i] & 0xff0;
>   ptr[i] = (x << 1) | (x >> 31);
> }
> }
> 
> we do get:
> 
> can narrow to unsigned:13 without loss of precision: _5 = x_12 r>> 31;
> 
> although aarch64 doesn't provide rrotate patterns, so nothing actually
> comes of it.

I think it's still correct that we only need unsigned:13 for the input,
we know other bits are zero.  But of course when actually applying
this as documented

/* Record that STMT_INFO could be changed from operating on TYPE to
   operating on a type with the precision and sign given by PRECISION
   and SIGN respectively.

the operation itself has to be altered (the above doesn't suggest
promoting/demoting the operands to TYPE is the only thing to do).

So it seems the burden is on the consumers of the information?

> I think the handling of variable shifts is flawed for other reasons.  Given:
> 
> void
> uu (unsigned short *restrict ptr1, unsigned short *restrict ptr2)
> {
>   for (int i = 0; i < 200; ++i)
> ptr1[i] = ptr1[i] >> ptr2[i];
> }
> 
> void
> us (unsigned short *restrict ptr1, short *restrict ptr2)
> {
>   for (int i = 0; i < 200; ++i)
> ptr1[i] = ptr1[i] >> ptr2[i];
> }
> 
> void
> su (short *restrict ptr1, unsigned short *restrict ptr2)
> {
>   for (int i = 0; i < 200; ++i)
> ptr1[i] = ptr1[i] >> ptr2[i];
> }
> 
> void
> ss (short *restrict ptr1, short *restrict ptr2)
> {
>   for (int i = 0; i < 200; ++i)
> ptr1[i] = ptr1[i] >> ptr2[i];
> }
> 
> we only narrow uu and ss, due to:
> 
>   /* Ignore codes that don't take uniform arguments.  */
>   if (!types_compatible_p (TREE_TYPE (op), type))
> return;

I suppose that's because we care about the shift operand at all here.
We could possibly use [0 .. precision-1] as known range for it
and only if that doesn't fit 'type' give up (and otherwise simply
ignore the input range of the shift operands here).

> in vect_determine_precisions_from_range.  Maybe we should drop
> the shift handling from there and instead rely on
> vect_determine_precisions_from_users, extending:
> 
>   if (TREE_CODE (shift) != INTEGER_CST
>   || !wi::ltu_p (wi::to_widest (shift), precision))
> return;
> 
> to handle ranges where the max is known to be < precision.
> 
> There again, if masking is enough for right shifts and right rotates,
> maybe we should keep the current handling for then (with your fix)
> and skip the types_compatible_p check for those cases.

I think it should be enough for left-shifts as well?  If we shift bits
out, as in 0x100 << 9, so that the lhs range is [0,0], the input range from
op0 will still make us use HImode.  I think we only ever get overly
conservative answers for left-shifts from this function?

Whatever works for RROTATE should also work for LROTATE.

> So:
> 
> - restrict shift handling in vect_determine_precisions_from_range to
>   RSHIFT_EXPR and RROTATE_EXPR
> 
> - remove types_compatible_p restriction for those cases
> 
> - extend vect_determine_precisions_from_users shift handling to check
>   for ranges on the shift amount
> 
> Does that sound right?

I'm not sure.   This all felt somewhat fragile when looking closer
(I was hoping you would eventually tackle it from the older
referenced bug) ... so my main idea was to perform incremental changes
where I have test coverage (as with the new wrong-code bug).

Originally I completely disabled shift support but that regressed
the over-widen testcases a lot which at least have widened shifts
by constants a lot.

x86 has vector rotates only for AMD XOP (which is d

Re: [RFC] light expander sra for parameters and returns

2023-08-02 Thread Richard Biener via Gcc-patches
On Tue, 1 Aug 2023, Jiufu Guo wrote:

> 
> Hi,
> 
> Richard Biener  writes:
> 
> > On Mon, 24 Jul 2023, Jiufu Guo wrote:
> >
> >> 
> >> Hi Martin,
> >> 
> >> Not sure about your current option about re-using the ipa-sra code
> >> in the light-expander-sra. And if anything I could input please
> >> let me know.
> >> 
> >> And I'm thinking about the difference between the expander-sra, ipa-sra
> >> and tree-sra. 1. For stmts walking, expander-sra has special behavior
> >> for return-stmt, and also a little special on assign-stmt. And phi
> >> stmts are not checked by ipa-sra/tree-sra. 2. For the access structure,
> >> I'm also thinking if we need a tree structure; it would be useful when
> >> checking overlaps, it was not used now in the expander-sra.
> >> 
> >> For ipa-sra and tree-sra, I notice that there is some similar code,
> >> but of cause there are differences. While it seems the difference
> >> is 'intended', for example: 1. when creating and accessing,
> >> 'size != max_size' is acceptable in tree-sra but not for ipa-sra.
> >> 2. 'AGGREGATE_TYPE_P' for ipa-sra is accepted for some cases, but
> >> not ok for tree-ipa.  
> >> I'm wondering if those slight difference blocks re-use the code
> >> between ipa-sra and tree-sra.
> >> 
> >> The expander-sra may be more light, for example, maybe we can use
> >> FOR_EACH_IMM_USE_STMT to check the usage of each parameter, and not
> >> need to walk all the stmts.
> >
> > What I was hoping for is shared stmt-level analysis and a shared
> > data structure for the "access"(es) a stmt performs.  Because that
> > can come up handy in multiple places.  The existing SRA data
> > structures could easily embed that subset for example if sharing
> > the whole data structure of [IPA] SRA seems too unwieldly.
> 
> Understand.
> The stmt-level analysis and "access" data structure are similar
> between ipa-sra/tree-sra and the expander-sra.
> 
> I just update the patch, this version does not change the behaviors of
> the previous version.  It is just cleaning/merging some functions only.
> The patch is attached.
> 
> This version (and tree-sra/ipa-sra) is still using the similar
> "stmt analyze" and "access struct"".  This could be extracted as
> shared code.
> I'm thinking to update the code to use the same "base_access" and
> "walk function".
> 
> >
> > With a stmt-leve API using FOR_EACH_IMM_USE_STMT would still be
> > possible (though RTL expansion pre-walks all stmts anyway).
> 
> Yeap, I also notice that "FOR_EACH_IMM_USE_STMT" is not enough.
> For struct parameters, walking stmt is needed.

I think I mentioned this before: RTL expansion already
pre-walks the whole function looking for variables it has to
expand to the stack, in discover_nonconstant_array_refs (which is
now badly named).  I'd appreciate it if the "SRA" walk would piggy-back
on that existing walk.

For RTL expansion I think a critical part is to create accesses
based on the incoming/outgoing RTL which is specified by the ABI.
As I understand it, we are optimizing the argument setup code
assigns the incoming arguments to either pseudo(s) or the stack
and thus we get to choose an optimized "mode" for that virtual
location of the incoming arguments (but we can't alter their
hardregs/stack assignment obviously).  So when we have an
incoming register pair we should create an artificial access
for the pieces those two registers represent.
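
(A hypothetical example of the situation described: on a typical 64-bit ABI a
small aggregate argument arrives in a register pair, and the point is to create
accesses for the two register-sized pieces rather than spilling the whole
object to the stack just to read its fields.)

struct pair { long lo, hi; };

long
sum (struct pair p)
{
  return p.lo + p.hi;
}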

You seem to do quite some adjustment to the parameter setup
where I was hoping we get away with simply choosing a different
mode for the virtual argument representation?

But I'm not too familiar with the innards of parameter/return
value initial RTL expansion.  I hope somebody else can chime
in here as well.

Richard.


> 
> BR,
> Jeff (Jiufu Guo)
> 
> -
> diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
> index edf292cfbe9..8c36ad5df79 100644
> --- a/gcc/cfgexpand.cc
> +++ b/gcc/cfgexpand.cc
> @@ -97,6 +97,502 @@ static bool defer_stack_allocation (tree, bool);
>  
>  static void record_alignment_for_reg_var (unsigned int);
>  
> +extern rtx
> +expand_shift (enum tree_code, machine_mode, rtx, poly_int64, rtx, int);
> +
> +/* For light SRA in expander about parameters and returns.  */
> +namespace
> +{
> +
> +struct access
> +{
> +  /* Each access on the aggregate is described by OFFSET/SIZE.  */
> +  HOST_WIDE_INT offset;
> +  HOST_WIDE_INT size;
> +
> +  bool writing;
> +
> +  /* The context expression of this access.  */
> +  tree expr;
> +
> +  /* The rtx for the access: link to incoming/returning register(s).  */
> +  rtx rtx_val;
> +};
> +
> +typedef struct access *access_p;
> +
> +/* Expr (tree) -> Scalarized value (rtx) map.  */
> +static hash_map *expr_rtx_vec;
> +
> +/* Base (tree) -> Vector (vec *) map.  */
> +static hash_map > *base_access_vec;
> +
> +/* Return true if EXPR has interesting access to the sra candidates,
> +   and created access, return false otherwise.  */
> +
> +static struct access *
> +build_access (tree expr, bool write)

[PATCH 1/2] Add virtual operand global liveness computation class

2023-08-02 Thread Richard Biener via Gcc-patches
The following adds an on-demand global liveness computation class
computing and caching the live-out virtual operand of basic blocks
and answering live-out, live-in and live-on-edge queries.  The flow
is optimized for the intended use in code sinking, which will query
live-in, and could possibly be optimized further when the originating
query is for live-out.

The code relies on up-to-date immediate dominator information and
on an unchanging virtual operand state.
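
(A minimal usage sketch; the helper name and surrounding pass are made up, but
the queries are the API added below, and patch 2/2 uses them in
statement_sink_location in essentially this way.)

/* Return true if the load STMT would still see the same virtual operand
   if it were sunk to block TO.  */
static bool
vuse_live_at_sink_p (gimple *stmt, basic_block to,
                     virtual_operand_live &vop_live)
{
  return gimple_vuse (stmt) == vop_live.get_live_in (to);
}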

Bootstrapped and tested on x86_64-unknown-linux-gnu (with 2/2).

OK?

Thanks,
Richard.

* tree-ssa-live.h (class virtual_operand_live): New.
* tree-ssa-live.cc (virtual_operand_live::init): New.
(virtual_operand_live::get_live_in): Likewise.
(virtual_operand_live::get_live_out): Likewise.
---
 gcc/tree-ssa-live.cc | 75 
 gcc/tree-ssa-live.h  | 27 
 2 files changed, 102 insertions(+)

diff --git a/gcc/tree-ssa-live.cc b/gcc/tree-ssa-live.cc
index 1be92956cc5..c9c2fdef0e3 100644
--- a/gcc/tree-ssa-live.cc
+++ b/gcc/tree-ssa-live.cc
@@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "optinfo.h"
 #include "gimple-walk.h"
 #include "cfganal.h"
+#include "tree-cfg.h"
 
 static void verify_live_on_entry (tree_live_info_p);
 
@@ -1651,3 +1652,77 @@ verify_live_on_entry (tree_live_info_p live)
 }
   gcc_assert (num <= 0);
 }
+
+
+/* Virtual operand liveness analysis data init.  */
+
+void
+virtual_operand_live::init ()
+{
+  liveout = XCNEWVEC (tree, last_basic_block_for_fn (cfun) + 1);
+  liveout[ENTRY_BLOCK] = ssa_default_def (cfun, gimple_vop (cfun));
+}
+
+/* Compute live-in of BB from cached live-out.  */
+
+tree
+virtual_operand_live::get_live_in (basic_block bb)
+{
+  /* A virtual PHI is a convenient cache for live-in.  */
+  gphi *phi = get_virtual_phi (bb);
+  if (phi)
+return gimple_phi_result (phi);
+
+  if (!liveout)
+init ();
+
+  /* Since we don't have a virtual PHI we can now pick any of the
+ incoming edges liveout value.  All returns from the function have
+ a virtual use forcing generation of virtual PHIs.  */
+  edge_iterator ei;
+  edge e;
+  FOR_EACH_EDGE (e, ei, bb->preds)
+if (liveout[e->src->index])
+  {
+   if (EDGE_PRED (bb, 0) != e)
+ liveout[EDGE_PRED (bb, 0)->src->index] = liveout[e->src->index];
+   return liveout[e->src->index];
+  }
+
+  /* Since virtuals are in SSA form at most the immediate dominator can
+ contain the definition of the live version.  Skipping to that deals
+ with CFG cycles as well.  */
+  return get_live_out (get_immediate_dominator (CDI_DOMINATORS, bb));
+}
+
+/* Compute live-out of BB.  */
+
+tree
+virtual_operand_live::get_live_out (basic_block bb)
+{
+  if (!liveout)
+init ();
+
+  if (liveout[bb->index])
+return liveout[bb->index];
+
+  tree lo = NULL_TREE;
+  for (auto gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (&gsi))
+{
+  gimple *stmt = gsi_stmt (gsi);
+  if (gimple_vdef (stmt))
+   {
+ lo = gimple_vdef (stmt);
+ break;
+   }
+  if (gimple_vuse (stmt))
+   {
+ lo = gimple_vuse (stmt);
+ break;
+   }
+}
+  if (!lo)
+lo = get_live_in (bb);
+  liveout[bb->index] = lo;
+  return lo;
+}
diff --git a/gcc/tree-ssa-live.h b/gcc/tree-ssa-live.h
index de665d6bad0..a7604448332 100644
--- a/gcc/tree-ssa-live.h
+++ b/gcc/tree-ssa-live.h
@@ -328,4 +328,31 @@ make_live_on_entry (tree_live_info_p live, basic_block bb 
, int p)
   bitmap_set_bit (live->global, p);
 }
 
+
+/* On-demand virtual operand global live analysis.  There is at most
+   a single virtual operand live at a time, the following computes and
+   caches the virtual operand live at the exit of a basic block
+   supporting related live-in and live-on-edge queries.  */
+
+class virtual_operand_live
+{
+public:
+  virtual_operand_live() : liveout (nullptr) {}
+  ~virtual_operand_live()
+  {
+if (liveout)
+  free (liveout);
+  }
+
+  tree get_live_in (basic_block bb);
+  tree get_live_out (basic_block bb);
+  tree get_live_on_edge (edge e) { return get_live_out (e->src); }
+
+private:
+  void init ();
+
+  tree *liveout;
+};
+
+
 #endif /* _TREE_SSA_LIVE_H  */
-- 
2.35.3



[PATCH 2/2] Improve sinking with unrelated defs

2023-08-02 Thread Richard Biener via Gcc-patches
statement_sink_location for loads is currently confused about
stores that are not on the paths we are sinking across.  The
following replaces the logic that tries to ensure we are not
sinking across stores: instead of walking all immediate virtual
uses and then checking whether the stores found are on the paths
we sink through, we check the live virtual operand at the
sinking location.  To obtain the live virtual operand we rely
on the new virtual_operand_live class, which provides an overall
cheaper and also more precise way to check the constraints.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Comments?

Thanks,
Richard.

* tree-ssa-sink.cc: Include tree-ssa-live.h.
(pass_sink_code::execute): Instantiate virtual_operand_live
and pass it down.
(sink_code_in_bb): Pass down virtual_operand_live.
(statement_sink_location): Get virtual_operand_live and
verify we are not sinking loads across stores by looking up
the live virtual operand at the sink location.

* gcc.dg/tree-ssa/ssa-sink-20.c: New testcase.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c | 16 +
 gcc/tree-ssa-sink.cc| 70 +
 2 files changed, 33 insertions(+), 53 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c
new file mode 100644
index 000..266ceb000a5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sink1-details" } */
+
+void bar ();
+int foo (int *p, int x)
+{
+  int res = *p;
+  if (x)
+{
+  bar ();
+  res = 1;
+}
+  return res;
+}
+
+/* { dg-final { scan-tree-dump "Sinking # VUSE" "sink1" } } */
diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
index d83d7be587d..5cf9e737e84 100644
--- a/gcc/tree-ssa-sink.cc
+++ b/gcc/tree-ssa-sink.cc
@@ -35,6 +35,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-cfg.h"
 #include "cfgloop.h"
 #include "tree-eh.h"
+#include "tree-ssa-live.h"
 
 /* TODO:
1. Sinking store only using scalar promotion (IE without moving the RHS):
@@ -263,7 +264,8 @@ select_best_block (basic_block early_bb,
 
 static bool
 statement_sink_location (gimple *stmt, basic_block frombb,
-gimple_stmt_iterator *togsi, bool *zero_uses_p)
+gimple_stmt_iterator *togsi, bool *zero_uses_p,
+virtual_operand_live &vop_live)
 {
   gimple *use;
   use_operand_p one_use = NULL_USE_OPERAND_P;
@@ -386,10 +388,7 @@ statement_sink_location (gimple *stmt, basic_block frombb,
   if (commondom == frombb)
return false;
 
-  /* If this is a load then do not sink past any stores.
-Look for virtual definitions in the path from frombb to the sink
-location computed from the real uses and if found, adjust
-that it a common dominator.  */
+  /* If this is a load then do not sink past any stores.  */
   if (gimple_vuse (stmt))
{
  /* Do not sink loads from hard registers.  */
@@ -398,51 +397,14 @@ statement_sink_location (gimple *stmt, basic_block frombb,
  && DECL_HARD_REGISTER (gimple_assign_rhs1 (stmt)))
return false;
 
- imm_use_iterator imm_iter;
- use_operand_p use_p;
- FOR_EACH_IMM_USE_FAST (use_p, imm_iter, gimple_vuse (stmt))
-   {
- gimple *use_stmt = USE_STMT (use_p);
- basic_block bb = gimple_bb (use_stmt);
- /* For PHI nodes the block we know sth about is the incoming block
-with the use.  */
- if (gimple_code (use_stmt) == GIMPLE_PHI)
-   {
- /* If the PHI defines the virtual operand, ignore it.  */
- if (gimple_phi_result (use_stmt) == gimple_vuse (stmt))
-   continue;
- /* In case the PHI node post-dominates the current insert
-location we can disregard it.  But make sure it is not
-dominating it as well as can happen in a CFG cycle.  */
- if (commondom != bb
- && !dominated_by_p (CDI_DOMINATORS, commondom, bb)
- && dominated_by_p (CDI_POST_DOMINATORS, commondom, bb)
- /* If the blocks are possibly within the same irreducible
-cycle the above check breaks down.  */
- && !((bb->flags & commondom->flags & BB_IRREDUCIBLE_LOOP)
-  && bb->loop_father == commondom->loop_father)
- && !((commondom->flags & BB_IRREDUCIBLE_LOOP)
-  && flow_loop_nested_p (commondom->loop_father,
- bb->loop_father))
- && !((bb->flags & BB_IRREDUCIBLE_LOOP)
- 

Re: Fix profile update after vectorizer peeling

2023-08-02 Thread Richard Biener via Gcc-patches
On Tue, Aug 1, 2023 at 12:15 PM Jan Hubicka via Gcc-patches
 wrote:
>
> Hi,
> This patch fixes the profile update after constant prologue peeling.  We have
> now reached 0 profile update bugs on tramp3d vectorization and also on quite
> a few testcases, so I am enabling the testsuite checks so we do not regress
> again.
>
> Bootstrapped/regtested x86_64, committed.

Note most of the profile consistency checks FAIL when testing with -m32 on
x86_64-unknown-linux-gnu ...

For example vect-11.c has

;;   basic block 4, loop depth 0, count 719407024 (estimated locally,
freq 0.6700), maybe hot
;;   Invalid sum of incoming counts 708669602 (estimated locally, freq
0.6600), should be 719407024 (estimated locally, freq 0.6700)
;;prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED)
;;pred:   3 [always (guessed)]  count:708669602 (estimated
locally, freq 0.6600) (FALSE_VALUE,EXECUTABLE)
  __asm__ __volatile__("cpuid
" : "=a" a_44, "=b" b_45, "=c" c_46, "=d" d_47 : "0" 1, "2" 0);
  _3 = d_47 & 67108864;

so it looks like it's the check_vect () function that goes wrong
everywhere but only on i?86.
The first dump with the Invalid sum is 095t.fixup_cfg3 already.

Richard.

> Honza
>
> gcc/ChangeLog:
>
> * tree-vect-loop-manip.cc (vect_do_peeling): Fix profile update after
> constant prologue peeling.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/vect/vect-1-big-array.c: Check profile consistency.
> * gcc.dg/vect/vect-1.c: Check profile consistency.
> * gcc.dg/vect/vect-10-big-array.c: Check profile consistency.
> * gcc.dg/vect/vect-10.c: Check profile consistency.
> * gcc.dg/vect/vect-100.c: Check profile consistency.
> * gcc.dg/vect/vect-103.c: Check profile consistency.
> * gcc.dg/vect/vect-104.c: Check profile consistency.
> * gcc.dg/vect/vect-105-big-array.c: Check profile consistency.
> * gcc.dg/vect/vect-105.c: Check profile consistency.
> * gcc.dg/vect/vect-106.c: Check profile consistency.
> * gcc.dg/vect/vect-107.c: Check profile consistency.
> * gcc.dg/vect/vect-108.c: Check profile consistency.
> * gcc.dg/vect/vect-109.c: Check profile consistency.
> * gcc.dg/vect/vect-11.c: Check profile consistency.
> * gcc.dg/vect/vect-110.c: Check profile consistency.
> * gcc.dg/vect/vect-112-big-array.c: Check profile consistency.
> * gcc.dg/vect/vect-112.c: Check profile consistency.
> * gcc.dg/vect/vect-113.c: Check profile consistency.
> * gcc.dg/vect/vect-114.c: Check profile consistency.
> * gcc.dg/vect/vect-115.c: Check profile consistency.
> * gcc.dg/vect/vect-116.c: Check profile consistency.
> * gcc.dg/vect/vect-117.c: Check profile consistency.
> * gcc.dg/vect/vect-118.c: Check profile consistency.
> * gcc.dg/vect/vect-119.c: Check profile consistency.
> * gcc.dg/vect/vect-11a.c: Check profile consistency.
> * gcc.dg/vect/vect-12.c: Check profile consistency.
> * gcc.dg/vect/vect-120.c: Check profile consistency.
> * gcc.dg/vect/vect-121.c: Check profile consistency.
> * gcc.dg/vect/vect-122.c: Check profile consistency.
> * gcc.dg/vect/vect-123.c: Check profile consistency.
> * gcc.dg/vect/vect-124.c: Check profile consistency.
> * gcc.dg/vect/vect-126.c: Check profile consistency.
> * gcc.dg/vect/vect-13.c: Check profile consistency.
> * gcc.dg/vect/vect-14.c: Check profile consistency.
> * gcc.dg/vect/vect-15-big-array.c: Check profile consistency.
> * gcc.dg/vect/vect-15.c: Check profile consistency.
> * gcc.dg/vect/vect-17.c: Check profile consistency.
> * gcc.dg/vect/vect-18.c: Check profile consistency.
> * gcc.dg/vect/vect-19.c: Check profile consistency.
> * gcc.dg/vect/vect-2-big-array.c: Check profile consistency.
> * gcc.dg/vect/vect-2.c: Check profile consistency.
> * gcc.dg/vect/vect-20.c: Check profile consistency.
> * gcc.dg/vect/vect-21.c: Check profile consistency.
> * gcc.dg/vect/vect-22.c: Check profile consistency.
> * gcc.dg/vect/vect-23.c: Check profile consistency.
> * gcc.dg/vect/vect-24.c: Check profile consistency.
> * gcc.dg/vect/vect-25.c: Check profile consistency.
> * gcc.dg/vect/vect-26.c: Check profile consistency.
> * gcc.dg/vect/vect-27.c: Check profile consistency.
> * gcc.dg/vect/vect-28.c: Check profile consistency.
> * gcc.dg/vect/vect-29.c: Check profile consistency.
> * gcc.dg/vect/vect-3.c: Check profile consistency.
> * gcc.dg/vect/vect-30.c: Check profile consistency.
> * gcc.dg/vect/vect-31-big-array.c: Check profile consistency.
> * gcc.dg/vect/vect-31.c: Check profile consistency.
> * gcc.dg/vect/vect-32-big-array.c: Check profile consistency.
> * gcc.dg/vect/vect-32-chars.c: Check profile 

Re: [PATCH] Improve sinking with unrelated defs

2023-08-02 Thread Richard Biener via Gcc-patches
On Mon, 31 Jul 2023, Richard Biener wrote:

> statement_sink_location for loads is currently confused about
> stores that are not on the paths we are sinking across.  The
> following avoids this by explicitly checking whether a block
> with a store is on any of those paths.  To not perform too many
> walks over the sub-part of the CFG between the original stmt
> location and the found sinking candidate we first collect all
> blocks to check and then perform a single walk from the sinking
> candidate location to the original stmt location.  We avoid enlarging
> the region by conservatively handling backedges.
> 
> The original heuristics about which store locations to ignore have been
> refactored; some can possibly be removed now.
> 
> If anybody knows a cheaper way to check whether a BB is on a path
> from block A to block B which is dominated by A I'd be happy to
> know (or if there would be a clever caching method at least - I'm
> probably going to limit the number of blocks to walk to avoid
> quadraticness).

I've replaced the whole thing with something based on virtual
operand liveness.

Richard.

> Bootstrapped and tested on x86_64-unknown-linux-gnu.  This depends
> on the previous sent RFC to limit testsuite fallout.
> 
>   * tree-ssa-sink.cc (pass_sink_code::execute): Mark backedges.
>   (statement_sink_location): Do not consider
>   stores that are not on any path from the original to the
>   destination location.
> 
>   * gcc.dg/tree-ssa/ssa-sink-20.c: New testcase.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c |  16 +++
>  gcc/tree-ssa-sink.cc| 125 
>  2 files changed, 121 insertions(+), 20 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c
> new file mode 100644
> index 000..266ceb000a5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-sink1-details" } */
> +
> +void bar ();
> +int foo (int *p, int x)
> +{
> +  int res = *p;
> +  if (x)
> +{
> +  bar ();
> +  res = 1;
> +}
> +  return res;
> +}
> +
> +/* { dg-final { scan-tree-dump "Sinking # VUSE" "sink1" } } */
> diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
> index cf0a32a954b..e996f46c864 100644
> --- a/gcc/tree-ssa-sink.cc
> +++ b/gcc/tree-ssa-sink.cc
> @@ -388,13 +388,32 @@ statement_sink_location (gimple *stmt, basic_block 
> frombb,
>  
> imm_use_iterator imm_iter;
> use_operand_p use_p;
> +   auto_bitmap bbs_to_check;
> FOR_EACH_IMM_USE_FAST (use_p, imm_iter, gimple_vuse (stmt))
>   {
> gimple *use_stmt = USE_STMT (use_p);
> basic_block bb = gimple_bb (use_stmt);
> +
> +   /* If there is no virtual definition here, continue.  */
> +   if (gimple_code (use_stmt) != GIMPLE_PHI
> +   && !gimple_vdef (use_stmt))
> + continue;
> +
> +   /* When the virtual definition is possibly within the same
> +  irreducible region as the current sinking location all
> +  bets are off.  */
> +   if (((bb->flags & commondom->flags & BB_IRREDUCIBLE_LOOP)
> +&& bb->loop_father == commondom->loop_father)
> +   || ((commondom->flags & BB_IRREDUCIBLE_LOOP)
> +   && flow_loop_nested_p (commondom->loop_father,
> +  bb->loop_father))
> +   || ((bb->flags & BB_IRREDUCIBLE_LOOP)
> +   && flow_loop_nested_p (bb->loop_father,
> +  commondom->loop_father)))
> + ;
> /* For PHI nodes the block we know sth about is the incoming block
>with the use.  */
> -   if (gimple_code (use_stmt) == GIMPLE_PHI)
> +   else if (gimple_code (use_stmt) == GIMPLE_PHI)
>   {
> /* If the PHI defines the virtual operand, ignore it.  */
> if (gimple_phi_result (use_stmt) == gimple_vuse (stmt))
> @@ -402,32 +421,97 @@ statement_sink_location (gimple *stmt, basic_block 
> frombb,
> /* In case the PHI node post-dominates the current insert
>location we can disregard it.  But make sure it is not
>dominating it as well as can happen in a CFG cycle.  */
> -   if (commondom != bb
> -   && !dominated_by_p (CDI_DOMINATORS, commondom, bb)
> -   && dominated_by_p (CDI_POST_DOMINATORS, commondom, bb)
> -   /* If the blocks are possibly within the same irreducible
> -  cycle the above check breaks down.  */
> -   && !((bb->flags & commondom->flags & BB_IRREDUCIBLE_LOOP)
> -&& bb->loop_father == commondom->loop_father

Re: [PING][PATCH] ira: update allocated_hardreg_p[] in improve_allocation() [PR110254]

2023-08-02 Thread Vladimir Makarov via Gcc-patches



On 8/1/23 01:20, Surya Kumari Jangala wrote:

Ping

Sorry for the delay with the answer.  I was on vacation.

On 21/07/23 3:43 pm, Surya Kumari Jangala via Gcc-patches wrote:

The improve_allocation() routine does not update the
allocated_hardreg_p[] array after an allocno is assigned a register.

If the register chosen in improve_allocation() is one that already has
been assigned to a conflicting allocno, then allocated_hardreg_p[]
already has the corresponding bit set to TRUE, so nothing needs to be
done.

But improve_allocation() can also choose a register that has not been
assigned to a conflicting allocno, and also has not been assigned to any
other allocno. In this case, allocated_hardreg_p[] has to be updated.

The patch is OK for me.  Thank you for finding and fixing this issue.

2023-07-21  Surya Kumari Jangala  

gcc/
PR rtl-optimization/110254
* ira-color.cc (improve_allocation): Update array


I guess you missed the next line in the changelog.  I suspect it should 
be "Update array allocated_hardreg_p."


Please, fix it before committing the patch.


---

diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
index 1fb2958bddd..5807d6d26f6 100644
--- a/gcc/ira-color.cc
+++ b/gcc/ira-color.cc
@@ -3340,6 +3340,10 @@ improve_allocation (void)
}
/* Assign the best chosen hard register to A.  */
ALLOCNO_HARD_REGNO (a) = best;
+
+  for (j = nregs - 1; j >= 0; j--)
+   allocated_hardreg_p[best + j] = true;
+
if (internal_flag_ira_verbose > 2 && ira_dump_file != NULL)
fprintf (ira_dump_file, "Assigning %d to a%dr%d\n",
 best, ALLOCNO_NUM (a), ALLOCNO_REGNO (a));




Re: [PATCH] tree-optimization/110838 - vectorization of widened shifts

2023-08-02 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Tue, 1 Aug 2023, Richard Sandiford wrote:
>
>> Richard Sandiford  writes:
>> > Richard Biener via Gcc-patches  writes:
>> >> The following makes sure to limit the shift operand when vectorizing
>> >> (short)((int)x >> 31) via (short)x >> 31 as the out of bounds shift
>> >> operand otherwise invokes undefined behavior.  When we determine
>> >> whether we can demote the operand we know we at most shift in the
>> >> sign bit so we can adjust the shift amount.
>> >>
>> >> Note this has the possibility of un-CSEing common shift operands
>> >> as there's no good way to share pattern stmts between patterns.
>> >> We'd have to separately pattern recognize the definition.
>> >>
>> >> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
>> >>
>> >> Not sure about LSHIFT_EXPR, it probably has the same issue but
>> >> the fallback optimistic zero for out-of-range shifts is at least
>> >> "corrrect".  Not sure we ever try to demote rotates (probably not).
>> >
>> > I guess you mean "correct" for x86?  But that's just a quirk of x86.
>> > IMO the behaviour is equally wrong for LSHIFT_EXPR.
>
> I meant "correct" for the constant folding that evaluates out-of-bound
> shifts as zero.
>
>> Sorry for the multiple messages.  Wanted to get something out quickly
>> because I wasn't sure how long it would take me to write this...
>> 
>> On rotates, for:
>> 
>> void
>> foo (unsigned short *restrict ptr)
>> {
>>   for (int i = 0; i < 200; ++i)
>> {
>>   unsigned int x = ptr[i] & 0xff0;
>>   ptr[i] = (x << 1) | (x >> 31);
>> }
>> }
>> 
>> we do get:
>> 
>> can narrow to unsigned:13 without loss of precision: _5 = x_12 r>> 31;
>> 
>> although aarch64 doesn't provide rrotate patterns, so nothing actually
>> comes of it.
>
> I think it's still correct that we only need unsigned:13 for the input,
> we know other bits are zero.  But of course when actually applying
> this as documented
>
> /* Record that STMT_INFO could be changed from operating on TYPE to
>operating on a type with the precision and sign given by PRECISION
>and SIGN respectively.
>
> the operation itself has to be altered (the above doesn't suggest
> promoting/demoting the operands to TYPE is the only thing to do).
>
> So it seems to be the burden is on the consumers of the information?

Yeah, textually that seems fair.  Not sure I was thinking of it in
those terms at the time though. :)

>> I think the handling of variable shifts is flawed for other reasons.  Given:
>> 
>> void
>> uu (unsigned short *restrict ptr1, unsigned short *restrict ptr2)
>> {
>>   for (int i = 0; i < 200; ++i)
>> ptr1[i] = ptr1[i] >> ptr2[i];
>> }
>> 
>> void
>> us (unsigned short *restrict ptr1, short *restrict ptr2)
>> {
>>   for (int i = 0; i < 200; ++i)
>> ptr1[i] = ptr1[i] >> ptr2[i];
>> }
>> 
>> void
>> su (short *restrict ptr1, unsigned short *restrict ptr2)
>> {
>>   for (int i = 0; i < 200; ++i)
>> ptr1[i] = ptr1[i] >> ptr2[i];
>> }
>> 
>> void
>> ss (short *restrict ptr1, short *restrict ptr2)
>> {
>>   for (int i = 0; i < 200; ++i)
>> ptr1[i] = ptr1[i] >> ptr2[i];
>> }
>> 
>> we only narrow uu and ss, due to:
>> 
>>  /* Ignore codes that don't take uniform arguments.  */
>>  if (!types_compatible_p (TREE_TYPE (op), type))
>>return;
>
> I suppose that's because we care about the shift operand at all here.
> We could possibly use [0 .. precision-1] as known range for it
> and only if that doesn't fit 'type' give up (and otherwise simply
> ignore the input range of the shift operands here).
>
>> in vect_determine_precisions_from_range.  Maybe we should drop
>> the shift handling from there and instead rely on
>> vect_determine_precisions_from_users, extending:
>> 
>>  if (TREE_CODE (shift) != INTEGER_CST
>>  || !wi::ltu_p (wi::to_widest (shift), precision))
>>return;
>> 
>> to handle ranges where the max is known to be < precision.
>> 
>> There again, if masking is enough for right shifts and right rotates,
>> maybe we should keep the current handling for then (with your fix)
>> and skip the types_compatible_p check for those cases.
>
> I think it should be enough for left-shifts as well?  If we lshift
> out like 0x100 << 9 so the lhs range is [0,0] the input range from
> op0 will still make us use HImode.  I think we only ever get overly
> conservative answers for left-shifts from this function?

But if we have:

  short x, y;
  int z = (int) x << (int) y;

and at runtime, x == 1, y == 16, (short) z should be 0 (no UB),
whereas x << y would invoke UB and x << (y & 15) would be 1.

> Whatever works for RROTATE should also work for LROTATE.

I think the same problem affects LROTATE.

>> So:
>> 
>> - restrict shift handling in vect_determine_precisions_from_range to
>>   RSHIFT_EXPR and RROTATE_EXPR
>> 
>> - remove types_compatible_p restriction for those cases
>> 
>> - extend vect_determine_precisions_from_users shift handling to check
>>   for ranges on the shift amount.

Re: [PATCH v1] RISC-V: Support RVV VFWSUB rounding mode intrinsic API

2023-08-02 Thread Kito Cheng via Gcc-patches
LGTM, thanks:)

Pan Li via Gcc-patches  於 2023年8月2日 週三 18:19 寫道:

> From: Pan Li 
>
> This patch would like to support the rounding mode API for the VFWSUB
> for the below samples.
>
> * __riscv_vfwsub_vv_f64m2_rm
> * __riscv_vfwsub_vv_f64m2_rm_m
> * __riscv_vfwsub_vf_f64m2_rm
> * __riscv_vfwsub_vf_f64m2_rm_m
> * __riscv_vfwsub_wv_f64m2_rm
> * __riscv_vfwsub_wv_f64m2_rm_m
> * __riscv_vfwsub_wf_f64m2_rm
> * __riscv_vfwsub_wf_f64m2_rm_m
>
> Signed-off-by: Pan Li 
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vector-builtins-bases.cc (BASE): Add
> vfwsub frm.
> * config/riscv/riscv-vector-builtins-bases.h: Add declaration.
> * config/riscv/riscv-vector-builtins-functions.def (vfwsub_frm):
> Add vfwsub function definitions.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/float-point-widening-sub.c: New test.
> ---
>  .../riscv/riscv-vector-builtins-bases.cc  |  3 +
>  .../riscv/riscv-vector-builtins-bases.h   |  1 +
>  .../riscv/riscv-vector-builtins-functions.def |  4 ++
>  .../riscv/rvv/base/float-point-widening-sub.c | 66 +++
>  4 files changed, 74 insertions(+)
>  create mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/base/float-point-widening-sub.c
>
> diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc
> b/gcc/config/riscv/riscv-vector-builtins-bases.cc
> index 981a4a7ede8..ddf694c771c 100644
> --- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
> +++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
> @@ -317,6 +317,7 @@ public:
>
>  /* Implements below instructions for frm
> - vfwadd
> +   - vfwsub
>  */
>  template
>  class widen_binop_frm : public function_base
> @@ -2100,6 +2101,7 @@ static CONSTEXPR const reverse_binop_frm
> vfrsub_frm_obj;
>  static CONSTEXPR const widen_binop vfwadd_obj;
>  static CONSTEXPR const widen_binop_frm vfwadd_frm_obj;
>  static CONSTEXPR const widen_binop vfwsub_obj;
> +static CONSTEXPR const widen_binop_frm vfwsub_frm_obj;
>  static CONSTEXPR const binop vfmul_obj;
>  static CONSTEXPR const binop vfdiv_obj;
>  static CONSTEXPR const reverse_binop vfrdiv_obj;
> @@ -2330,6 +2332,7 @@ BASE (vfrsub_frm)
>  BASE (vfwadd)
>  BASE (vfwadd_frm)
>  BASE (vfwsub)
> +BASE (vfwsub_frm)
>  BASE (vfmul)
>  BASE (vfdiv)
>  BASE (vfrdiv)
> diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h
> b/gcc/config/riscv/riscv-vector-builtins-bases.h
> index f9e1df5fe75..5800fca0169 100644
> --- a/gcc/config/riscv/riscv-vector-builtins-bases.h
> +++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
> @@ -150,6 +150,7 @@ extern const function_base *const vfrsub_frm;
>  extern const function_base *const vfwadd;
>  extern const function_base *const vfwadd_frm;
>  extern const function_base *const vfwsub;
> +extern const function_base *const vfwsub_frm;
>  extern const function_base *const vfmul;
>  extern const function_base *const vfmul;
>  extern const function_base *const vfdiv;
> diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def
> b/gcc/config/riscv/riscv-vector-builtins-functions.def
> index 743205a9b97..58a7224fe0c 100644
> --- a/gcc/config/riscv/riscv-vector-builtins-functions.def
> +++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
> @@ -306,8 +306,12 @@ DEF_RVV_FUNCTION (vfwsub, widen_alu, full_preds,
> f_wwv_ops)
>  DEF_RVV_FUNCTION (vfwsub, widen_alu, full_preds, f_wwf_ops)
>  DEF_RVV_FUNCTION (vfwadd_frm, widen_alu_frm, full_preds, f_wvv_ops)
>  DEF_RVV_FUNCTION (vfwadd_frm, widen_alu_frm, full_preds, f_wvf_ops)
> +DEF_RVV_FUNCTION (vfwsub_frm, widen_alu_frm, full_preds, f_wvv_ops)
> +DEF_RVV_FUNCTION (vfwsub_frm, widen_alu_frm, full_preds, f_wvf_ops)
>  DEF_RVV_FUNCTION (vfwadd_frm, widen_alu_frm, full_preds, f_wwv_ops)
>  DEF_RVV_FUNCTION (vfwadd_frm, widen_alu_frm, full_preds, f_wwf_ops)
> +DEF_RVV_FUNCTION (vfwsub_frm, widen_alu_frm, full_preds, f_wwv_ops)
> +DEF_RVV_FUNCTION (vfwsub_frm, widen_alu_frm, full_preds, f_wwf_ops)
>
>  // 13.4. Vector Single-Width Floating-Point Multiply/Divide Instructions
>  DEF_RVV_FUNCTION (vfmul, alu, full_preds, f_vvv_ops)
> diff --git
> a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-widening-sub.c
> b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-widening-sub.c
> new file mode 100644
> index 000..4325cc510a7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-widening-sub.c
> @@ -0,0 +1,66 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3 -Wno-psabi" } */
> +
> +#include "riscv_vector.h"
> +
> +typedef float float32_t;
> +
> +vfloat64m2_t
> +test_vfwsub_vv_f32m1_rm (vfloat32m1_t op1, vfloat32m1_t op2, size_t vl) {
> +  return __riscv_vfwsub_vv_f64m2_rm (op1, op2, 0, vl);
> +}
> +
> +vfloat64m2_t
> +test_vfwsub_vv_f32m1_rm_m (vbool32_t mask, vfloat32m1_t op1, vfloat32m1_t
> op2,
> +  size_t vl) {
> +  return __riscv_vfwsub_vv_f64m2_rm_m (mask, op1, op2, 1, vl);
> +}
> +
> +vfloat64m2_t
> +te

Re: [PATCH] Improve sinking with unrelated defs

2023-08-02 Thread Jeff Law via Gcc-patches




On 8/2/23 06:50, Richard Biener via Gcc-patches wrote:

On Mon, 31 Jul 2023, Richard Biener wrote:


statement_sink_location for loads is currently confused about
stores that are not on the paths we are sinking across.  The
following avoids this by explicitly checking whether a block
with a store is on any of those paths.  To not perform too many
walks over the sub-part of the CFG between the original stmt
location and the found sinking candidate we first collect all
blocks to check and then perform a single walk from the sinking
candidate location to the original stmt location.  We avoid enlarging
the region by conservatively handling backedges.

The original heuristics about which store locations to ignore have been
refactored; some can possibly be removed now.

If anybody knows a cheaper way to check whether a BB is on a path
from block A to block B which is dominated by A I'd be happy to
know (or if there would be a clever caching method at least - I'm
probably going to limit the number of blocks to walk to avoid
quadraticness).


I've replaced the whole thing with something based on virtual
operand liveness.
Good.  I was trying to form something based on lowest common ancestor 
walks in the dominator tree, but couldn't convince myself that it would 
actually solve your problem.


jeff


[PATCH] PR combine/110867 Fix narrow comparison of memory and constant

2023-08-02 Thread Stefan Schulze Frielinghaus via Gcc-patches
In certain cases a constant may not fit into the mode used to perform a
comparison.  This may be the case for sign-extended constants which are
used during an unsigned comparison as e.g. in

(set (reg:CC 100 cc)
(compare:CC (mem:SI (reg/v/f:SI 115 [ a ]) [1 *a_4(D)+0 S4 A64])
(const_int -2147483648 [0x8000])))

Fixed by ensuring that the constant fits into comparison mode.
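
For illustration only (this is not taken from the PR; whether combine sees this
exact form depends on earlier passes), a comparison of that shape can arise from
source like:

  typedef __UINT32_TYPE__ uint32_t;

  int
  ge (uint32_t *a)
  {
    return *a >= 0x80000000u;
  }

The unsigned constant 0x80000000 is canonicalized in RTL as the sign-extended
const_int -2147483648, which, taken as an unsigned HOST_WIDE_INT, no longer
fits the 32-bit comparison mode, so the narrowing optimization must not be
attempted.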

Furthermore, on some targets, e.g. sparc, the constant used in a
comparison is chopped off before combine which leads to failing test
cases (see PR 110869).  Fixed by not requiring that the source mode has
to be DImode, and excluding sparc from the last two test cases entirely
since there the constant cannot be further reduced.

According to PR 110867 and 110869 this patch resolves bootstrap problems
on armv8l and sparc.  While writing this, bootstrap+regtest are still
running on x64 and s390x.  Assuming they pass, ok for mainline?

gcc/ChangeLog:

PR combine/110867
* combine.cc (simplify_compare_const): Try the optimization only
in case the constant fits into the comparison mode.

gcc/testsuite/ChangeLog:

PR combine/110869
* gcc.dg/cmp-mem-const-1.c: Relax mode for constant.
* gcc.dg/cmp-mem-const-2.c: Relax mode for constant.
* gcc.dg/cmp-mem-const-3.c: Relax mode for constant.
* gcc.dg/cmp-mem-const-4.c: Relax mode for constant.
* gcc.dg/cmp-mem-const-5.c: Exclude sparc since here the
constant is already reduced.
* gcc.dg/cmp-mem-const-6.c: Exclude sparc since here the
constant is already reduced.
---
 gcc/combine.cc | 4 
 gcc/testsuite/gcc.dg/cmp-mem-const-1.c | 2 +-
 gcc/testsuite/gcc.dg/cmp-mem-const-2.c | 2 +-
 gcc/testsuite/gcc.dg/cmp-mem-const-3.c | 2 +-
 gcc/testsuite/gcc.dg/cmp-mem-const-4.c | 2 +-
 gcc/testsuite/gcc.dg/cmp-mem-const-5.c | 4 ++--
 gcc/testsuite/gcc.dg/cmp-mem-const-6.c | 4 ++--
 7 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/gcc/combine.cc b/gcc/combine.cc
index 0d99fa541c5..e46d202d0a7 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -11998,11 +11998,15 @@ simplify_compare_const (enum rtx_code code, 
machine_mode mode,
  x0 >= 0x40.  */
   if ((code == LEU || code == LTU || code == GEU || code == GTU)
   && is_a  (GET_MODE (op0), &int_mode)
+  && HWI_COMPUTABLE_MODE_P (int_mode)
   && MEM_P (op0)
   && !MEM_VOLATILE_P (op0)
   /* The optimization makes only sense for constants which are big enough
 so that we have a chance to chop off something at all.  */
   && (unsigned HOST_WIDE_INT) const_op > 0xff
+  /* Bail out, if the constant does not fit into INT_MODE.  */
+  && (unsigned HOST_WIDE_INT) const_op
+< ((HOST_WIDE_INT_1U << (GET_MODE_PRECISION (int_mode) - 1) << 1) - 1)
   /* Ensure that we do not overflow during normalization.  */
   && (code != GTU || (unsigned HOST_WIDE_INT) const_op < 
HOST_WIDE_INT_M1U))
 {
diff --git a/gcc/testsuite/gcc.dg/cmp-mem-const-1.c 
b/gcc/testsuite/gcc.dg/cmp-mem-const-1.c
index 263ad98af79..4f21a1ade4a 100644
--- a/gcc/testsuite/gcc.dg/cmp-mem-const-1.c
+++ b/gcc/testsuite/gcc.dg/cmp-mem-const-1.c
@@ -1,6 +1,6 @@
 /* { dg-do compile { target { lp64 } } } */
 /* { dg-options "-O1 -fdump-rtl-combine-details" } */
-/* { dg-final { scan-rtl-dump "narrow comparison from mode DI to QI" "combine" 
} } */
+/* { dg-final { scan-rtl-dump "narrow comparison from mode .I to QI" "combine" 
} } */
 
 typedef __UINT64_TYPE__ uint64_t;
 
diff --git a/gcc/testsuite/gcc.dg/cmp-mem-const-2.c 
b/gcc/testsuite/gcc.dg/cmp-mem-const-2.c
index a7cc5348295..7b722951594 100644
--- a/gcc/testsuite/gcc.dg/cmp-mem-const-2.c
+++ b/gcc/testsuite/gcc.dg/cmp-mem-const-2.c
@@ -1,6 +1,6 @@
 /* { dg-do compile { target { lp64 } } } */
 /* { dg-options "-O1 -fdump-rtl-combine-details" } */
-/* { dg-final { scan-rtl-dump "narrow comparison from mode DI to QI" "combine" 
} } */
+/* { dg-final { scan-rtl-dump "narrow comparison from mode .I to QI" "combine" 
} } */
 
 typedef __UINT64_TYPE__ uint64_t;
 
diff --git a/gcc/testsuite/gcc.dg/cmp-mem-const-3.c 
b/gcc/testsuite/gcc.dg/cmp-mem-const-3.c
index 06f80bf72d8..ed5059d3807 100644
--- a/gcc/testsuite/gcc.dg/cmp-mem-const-3.c
+++ b/gcc/testsuite/gcc.dg/cmp-mem-const-3.c
@@ -1,6 +1,6 @@
 /* { dg-do compile { target { lp64 } } } */
 /* { dg-options "-O1 -fdump-rtl-combine-details" } */
-/* { dg-final { scan-rtl-dump "narrow comparison from mode DI to HI" "combine" 
} } */
+/* { dg-final { scan-rtl-dump "narrow comparison from mode .I to HI" "combine" 
} } */
 
 typedef __UINT64_TYPE__ uint64_t;
 
diff --git a/gcc/testsuite/gcc.dg/cmp-mem-const-4.c 
b/gcc/testsuite/gcc.dg/cmp-mem-const-4.c
index 407999abf7e..23e83372bee 100644
--- a/gcc/testsuite/gcc.dg/cmp-mem-const-4.c
+++ b/gcc/testsuite/gcc.dg/cmp-mem-const-4.c
@@ -1,6 +1,6 @@
 /* { dg-do compile { target { lp64 } } } */
 /* { dg-options "-O1 -fdump-rtl-combine-details" } */
-/*

Re: One question on the source code of tree-object-size.cc

2023-08-02 Thread Qing Zhao via Gcc-patches
Okay.  The previous small example was used to show the correct behavior of __bos
for fixed arrays when the allocation size and the TYPE_SIZE are mismatched.

Now we have agreed on the correct behavior for each of the cases for the fixed array.

Since the new “counted_by” attribute is mainly a complement to the TYPE_SIZE
for the flexible array member, GCC should just use it similarly to TYPE_SIZE.

Based on the fixed array example, I came up with a small example for the
flexible array member with the “counted_by” attribute, together with the
expected correct behavior for each of the cases.  I also put detailed comments
into the example to explain the reasoning for each case (similar to the fixed
array example).

Please take a look at this example and let me know any issue you see.
With my private GCC that supports the “counted_by” attribute, all the cases pass.

Thanks.

Qing.


#include <stdio.h>
#include <stdlib.h>

struct annotated {
size_t foo;
int array[] __attribute__((counted_by (foo)));
};

#define expect(p, _v) do { \
size_t v = _v; \
if (p == v) \
__builtin_printf ("ok:  %s == %zd\n", #p, p); \
else \
{  \
  __builtin_printf ("WAT: %s == %zd (expected %zd)\n", #p, p, v); \
} \
} while (0);

#define noinline __attribute__((__noinline__))
#define SIZE_BUMP 5 

/* In general, due to type casting, the type for the pointee of a pointer
   does not say anything about the object it points to,
   so __builtin_object_size cannot directly use the type of the pointee
   to decide the size of the object the pointer points to.

   there are only two reliable ways:
   A. observed allocations  (call to the allocation functions in the routine)
   B. observed accesses (read or write access to the location of the 
 pointer points to)

   that provide information about the type/existence of an object at
   the corresponding address.

   for A, we use the "alloc_size" attribute for the corresponding allocation
   functions to determine the object size;

   For B, we use the SIZE info of the TYPE attached to the corresponding access.
   (We treat counted_by attribute as a complement to the SIZE info of the TYPE
for FAM)

   The only other way in C which ensures that a pointer actually points
   to an object of the correct type is 'static':

   void foo(struct P *p[static 1]);   

   See https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624814.html
   for more details.  */

/* in the following function, malloc allocated more space than the value
   of counted_by attribute.  Then what's the correct behavior we expect
   the __builtin_dynamic_object_size should have for each of the cases?  */ 

static struct annotated * noinline alloc_buf_more (int index)
{
  struct annotated *p;
  p = malloc(sizeof (*p) + (index + SIZE_BUMP) * sizeof (int));
  p->foo = index;

  /* When checking the observed access p->array, we have info on both
     the observed allocation and the observed access,
A. from observed allocation: (index + SIZE_BUMP) * sizeof (int)
B. from observed access: p->foo * sizeof (int)

in the above, p->foo = index.
   */
   
  /* for size in the whole object: always uses A.  */
  /* for size in the sub-object: choose the smaller of A and B.
   * Please see https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625891.html
   * for details on why.  */

  /* for MAXIMUM size in the whole object: use the allocation size 
 for the whole object.  */
  expect(__builtin_dynamic_object_size(p->array, 0), (index + SIZE_BUMP) * 
sizeof(int));

  /* for MAXIMUM size in the sub-object. use the smaller of A and B.  */ 
  expect(__builtin_dynamic_object_size(p->array, 1), (p->foo) * sizeof(int));

  /* for MINIMUM size in the whole object: use the allocation size 
 for the whole object.  */
  expect(__builtin_dynamic_object_size(p->array, 2), (index + SIZE_BUMP) * 
sizeof(int));

  /* for MINIMUM size in the sub-object: use the smaller of A and B.  */
  expect(__builtin_dynamic_object_size(p->array, 3), p->foo * sizeof(int));

  /* When checking the pointer p, we only have info on the observed allocation.
So, the object size info can only been obtained from the call to malloc.
for both MAXIMUM and MINIMUM: A = (index + SIZE_BUMP) * sizeof (int)  */ 
  expect(__builtin_dynamic_object_size(p, 1), sizeof (*p) + (index + SIZE_BUMP) 
* sizeof(int));
  expect(__builtin_dynamic_object_size(p, 0), sizeof (*p) + (index + SIZE_BUMP) 
* sizeof(int));
  expect(__builtin_dynamic_object_size(p, 3), sizeof (*p) + (index + SIZE_BUMP) 
* sizeof(int));
  expect(__builtin_dynamic_object_size(p, 2), sizeof (*p) + (index + SIZE_BUMP) 
* sizeof(int));
  return p;
}

/* in the following function, malloc allocated less space than the value
   of counted_by attribute.  Then what's the correct behavior we expect
   the __builtin_dynamic_object_size should have for each of the cases?
   NOTE: this is a user error, GCC should i

Re: [PATCH] RISC-V: Implement vector "average" autovec pattern.

2023-08-02 Thread 钟居哲
I have two concerns:

1. How do you model rounding towards -Inf (avg_floor) and towards +Inf (avg_ceil)?
2. Is it possible we could use vaadd[u] to model avg?



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-08-01 22:31
To: gcc-patches; palmer; Kito Cheng; juzhe.zh...@rivai.ai; jeffreyalaw
CC: rdapp.gcc
Subject: [PATCH] RISC-V: Implement vector "average" autovec pattern.
Hi,
 
this patch adds vector average patterns
 
op[0] = (narrow) ((wide) op[1] + (wide) op[2]) >> 1;
op[0] = (narrow) ((wide) op[1] + (wide) op[2] + 1) >> 1;
 
If there is no direct support, the vectorizer can synthesize the patterns
but, presumably due to lack of narrowing operation support, won't try
a narrowing shift.  Therefore, this patch implements the expanders instead.
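
For reference, loops of roughly the following shape (a hypothetical sketch, not
the attached vec-avg-template.h) are what these expanders target:

#include <stdint.h>
#define N 128

void
avg_floor (int8_t *restrict dst, int8_t *restrict a, int8_t *restrict b)
{
  for (int i = 0; i < N; i++)
    dst[i] = ((int16_t) a[i] + (int16_t) b[i]) >> 1;
}

void
avg_ceil (int8_t *restrict dst, int8_t *restrict a, int8_t *restrict b)
{
  for (int i = 0; i < N; i++)
    dst[i] = ((int16_t) a[i] + (int16_t) b[i] + 1) >> 1;
}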
 
A synthesized pattern results in e.g:
vsrl.vi v2,v1,1
vsrl.vi v4,v3,1
vand.vv v1,v1,v3
vadd.vv v2,v2,v4
vand.vi v1,v1,1
vadd.vv v1,v2,v1
 
With this patch we generate:
vwadd.vv v2,v4,v1
vadd.vi v2,1
vnsrl.wi v2,v2,1
 
We manage to recover (i.e. create the latter sequence) for signed types
but not for unsigned.  I figured that offering both patterns might be the
safe thing to do, but I'm open to leaving the signed one out.  In the long
term we'd want full vectorizer support for this I suppose.
 
Regards
Robin
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (avg3_floor):
Implement expander.
(avg3_ceil): Ditto.
* config/riscv/vector-iterators.md (ashiftrt): New iterator.
(ASHIFTRT): Ditto.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/vec-avg-run.c: New test.
* gcc.target/riscv/rvv/autovec/vec-avg-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/vec-avg-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/vec-avg-template.h: New test.
---
gcc/config/riscv/autovec.md   | 66 ++
gcc/config/riscv/vector-iterators.md  |  5 ++
.../riscv/rvv/autovec/vec-avg-run.c   | 85 +++
.../riscv/rvv/autovec/vec-avg-rv32gcv.c   | 10 +++
.../riscv/rvv/autovec/vec-avg-rv64gcv.c   | 10 +++
.../riscv/rvv/autovec/vec-avg-template.h  | 33 +++
6 files changed, 209 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-run.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-rv32gcv.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-rv64gcv.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vec-avg-template.h
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 7b784437c7e..23d3c2feaff 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1752,3 +1752,69 @@ (define_expand "mask_len_fold_left_plus_"
riscv_vector::reduction_type::MASK_LEN_FOLD_LEFT);
   DONE;
})
+
+;; -
+;;  [INT] Average.
+;; -
+;; Implements the following "average" patterns:
+;; floor:
+;;  op[0] = (narrow) ((wide) op[1] + (wide) op[2]) >> 1;
+;; ceil:
+;;  op[0] = (narrow) ((wide) op[1] + (wide) op[2] + 1)) >> 1;
+;; -
+
+(define_expand "avg3_floor"
+ [(set (match_operand: 0 "register_operand")
+   (truncate:
+(:VWEXTI
+ (plus:VWEXTI
+  (any_extend:VWEXTI
+   (match_operand: 1 "register_operand"))
+  (any_extend:VWEXTI
+   (match_operand: 2 "register_operand"))]
+  "TARGET_VECTOR"
+{
+  /* First emit a widening addition.  */
+  rtx tmp1 = gen_reg_rtx (mode);
+  rtx ops1[] = {tmp1, operands[1], operands[2]};
+  insn_code icode = code_for_pred_dual_widen (PLUS, , mode);
+  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops1);
+
+  /* Then a narrowing shift.  */
+  rtx ops2[] = {operands[0], tmp1, const1_rtx};
+  icode = code_for_pred_narrow_scalar (, mode);
+  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops2);
+  DONE;
+})
+
+(define_expand "avg3_ceil"
+ [(set (match_operand: 0 "register_operand")
+   (truncate:
+(:VWEXTI
+ (plus:VWEXTI
+  (plus:VWEXTI
+   (any_extend:VWEXTI
+ (match_operand: 1 "register_operand"))
+   (any_extend:VWEXTI
+ (match_operand: 2 "register_operand")))
+  (const_int 1)]
+  "TARGET_VECTOR"
+{
+  /* First emit a widening addition.  */
+  rtx tmp1 = gen_reg_rtx (mode);
+  rtx ops1[] = {tmp1, operands[1], operands[2]};
+  insn_code icode = code_for_pred_dual_widen (PLUS, , mode);
+  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops1);
+
+  /* Then add 1.  */
+  rtx tmp2 = gen_reg_rtx (mode);
+  rtx ops2[] = {tmp2, tmp1, const1_rtx};
+  icode = code_for_pred_scalar (PLUS, mode);
+  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops2);
+
+  /* Finally, a narrowing shift.  */
+  rtx ops3[] = {operands[0], tmp2, const1_rtx};
+  icode = code_for_pred_narrow_scalar (, mode);
+  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops3);
+  DONE;
+})
dif

Re: [PATCH] PR combine/110867 Fix narrow comparison of memory and constant

2023-08-02 Thread Jeff Law via Gcc-patches




On 8/2/23 07:49, Stefan Schulze Frielinghaus via Gcc-patches wrote:

In certain cases a constant may not fit into the mode used to perform a
comparison.  This may be the case for sign-extended constants which are
used during an unsigned comparison as e.g. in

(set (reg:CC 100 cc)
 (compare:CC (mem:SI (reg/v/f:SI 115 [ a ]) [1 *a_4(D)+0 S4 A64])
 (const_int -2147483648 [0x8000])))

Fixed by ensuring that the constant fits into comparison mode.

Furthermore, on some targets, e.g. sparc, the constant used in a
comparison is chopped off before combine which leads to failing test
cases (see PR 110869).  Fixed by not requiring that the source mode has
to be DImode, and excluding sparc from the last two test cases entirely
since there the constant cannot be further reduced.

According to PR 110867 and 110869 this patch resolves bootstrap problems
on armv8l and sparc.  While writing this, bootstrap+regtest are still
running on x64 and s390x.  Assuming they pass, ok for mainline?

gcc/ChangeLog:

PR combine/110867
* combine.cc (simplify_compare_const): Try the optimization only
in case the constant fits into the comparison mode.

gcc/testsuite/ChangeLog:

PR combine/110869
* gcc.dg/cmp-mem-const-1.c: Relax mode for constant.
* gcc.dg/cmp-mem-const-2.c: Relax mode for constant.
* gcc.dg/cmp-mem-const-3.c: Relax mode for constant.
* gcc.dg/cmp-mem-const-4.c: Relax mode for constant.
* gcc.dg/cmp-mem-const-5.c: Exclude sparc since here the
constant is already reduced.
* gcc.dg/cmp-mem-const-6.c: Exclude sparc since here the
constant is already reduced.

OK
jeff


Re: [PATCH v1] [RFC] Improve folding for comparisons with zero in tree-ssa-forwprop.

2023-08-02 Thread Manolis Tsamis
Hi all,

I'm pinging to discuss again if we want to move this forward for GCC14.

I did some testing again and I haven't been able to find obvious
regressions, including testing the code from PR86270 and PR70359 that
Richard mentioned.
I still believe that zero can be considered a special case even for
hardware that doesn't directly benefit from the comparison.
For example, it happens that the testcase from the commit compiles to
one instruction fewer on x86:

.LFB0:
movl(%rdi), %eax
leal1(%rax), %edx
movl%edx, (%rdi)
testl%eax, %eax
je.L4
ret
.L4:
jmpg

vs

.LFB0:
movl(%rdi), %eax
addl$1, %eax
movl%eax, (%rdi)
cmpl$1, %eax
je.L4
ret
.L4:
xorl%eax, %eax
jmpg

(The xorl is not emitted  when testl is used. LLVM uses testl but also
does xor eax, eax :) )
Although this is accidental, I believe it also showcases that zero is
a preferential value in various ways.
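
For reference, a source sketch consistent with the assembly above (hypothetical;
the actual testcase is the one from the commit) is:

void g (void);

void
f (int *p)
{
  /* With the patch the condition is evaluated as "old value of *p == 0"
     instead of "incremented value == 1".  */
  if (++*p == 1)
    g ();
}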

I'm running benchmarks comparing the effects of this change and I'm
also still looking for testcases that result in problematic
regressions.
Any feedback or other concerns about this are appreciated!

Thanks,
Manolis

On Wed, Apr 26, 2023 at 9:43 AM Richard Biener
 wrote:
>
> On Wed, Apr 26, 2023 at 4:30 AM Jeff Law  wrote:
> >
> >
> >
> > On 4/25/23 01:21, Richard Biener wrote:
> > > On Tue, Apr 25, 2023 at 1:05 AM Jeff Law  wrote
> > >>
> > >>
> > >>
> > >>
> > >> On 4/24/23 02:06, Richard Biener via Gcc-patches wrote:
> > >>> On Fri, Apr 21, 2023 at 11:01 PM Philipp Tomsich
> > >>>  wrote:
> > 
> >  Any guidance on the next steps for this patch?
> > >>>
> > >>> I think we want to perform this transform later, in particular when
> > >>> the test is a loop exit test we do not want to do it as it prevents
> > >>> coalescing of the IV on the backedge at out-of-SSA time.
> > >>>
> > >>> That means doing the transform in folding and/or before inlining
> > >>> (the test could become a loop exit test) would be a no-go.  In fact
> > >>> for SSA coalescing we'd want the reverse transform in some cases, see
> > >>> PRs 86270 and 70359.
> > >>>
> > >>> If we can reliably undo for the loop case I suppose we can do the
> > >>> canonicalization to compare against zero.  In any case please split
> > >>> up the patch (note
> > >> I've also
> > >>> hoped we could eventually get rid of that part of
> > >>> tree-ssa-forwprop.cc
> > >> in favor
> > >>> of match.pd patterns since it uses GENERIC folding :/).
> > >>>
> > >> Do we have enough information to do this at expansion time?  That would
> > >> avoid introducing the target dependencies to drive this in gimple.
> > >
> > > I think so, but there isn't any convenient place to do this I think.  I 
> > > suppose
> > > there's no hope to catch it during RTL opts?
> > Combine would be the most natural place in the RTL pipeline, but it'd be
> > a 2->2 combination which would be rejected.
> >
> > We could possibly do it as a define_insn_and_split, but the gimple->RTL
> > interface seems like a better fit to me.  If TER has done its job, we
> > should see a complex enough tree node to do the right thing.
>
> Of course we'd want to get rid of TER in favor of ISEL
>
> Richard.
>
> > jeff


Re: [PATCH]AArch64 Undo vec_widen_shiftl optabs [PR106346]

2023-08-02 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
> Hi All,
>
> In GCC 11 we implemented the vectorizer optab for widening left shifts,
> however this optab is only supported for uniform shift constants.
>
> At the moment GCC still has two loop vectorization strategies (classical loop 
> and
> SLP based loop vec) and the optab is implemented as a scalar pattern.
>
> This means that when we apply it to a non-uniform constant inside a loop we 
> only
> find out during SLP build that the constants aren't uniform.  At this point 
> it's
> too late and we lose SLP entirely.
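> 
> As an illustration (a hypothetical example, not the new vect-widen-shift.c
> testcase), a loop of this shape has non-uniform shift constants within one
> SLP group:
> 
> void
> f (unsigned int *restrict out, unsigned short *restrict in)
> {
>   for (int i = 0; i < 1024; i += 2)
>     {
>       out[i]     = (unsigned int) in[i] << 4;
>       out[i + 1] = (unsigned int) in[i + 1] << 6;
>     }
> }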
>
> Over the years I've tried various options but none of it works well:
>
> 1. Dissolving patterns during SLP build (problematic, also dissolves them for
> non-slp).
> 2. Optionally ignoring patterns for SLP build (problematic, ends up 
> interfering
> with relevancy detection).
> 3. Relaxing constraint on SLP build to allow non-constant values and dissolving
> them after SLP build using an SLP pattern.  (problematic, ends up breaking
> shift reassociation).
>
> As a result we've concluded that for now this pattern should just be removed
> and formed during RTL.
>
> The plan is to move this to an SLP only pattern once we remove classical loop
> vectorization support from GCC, at which time we can also properly support 
> SVE's
> Top and Bottom variants.
>
> This removes the optab and reworks the RTL to recognize both the vector 
> variant
> and the intrinsics variant.  Also just simplifies all these patterns.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   PR target/106346
>   * config/aarch64/aarch64-simd.md (vec_widen_shiftl_lo_,
>   vec_widen_shiftl_hi_): Remove.
>   (aarch64_shll_internal): Renamed to...
>   (aarch64_shll): .. This.
>   (aarch64_shll2_internal): Renamed to...
>   (aarch64_shll2): .. This.
>   (aarch64_shll_n, aarch64_shll2_n): Re-use new
>   optabs.
>   * config/aarch64/constraints.md (D2, D3): New.
>   * config/aarch64/predicates.md (aarch64_simd_shift_imm_vec): New.
>
> gcc/testsuite/ChangeLog:
>
>   PR target/106346
>   * gcc.target/aarch64/pr98772.c: Adjust assembly.
>   * gcc.target/aarch64/vect-widen-shift.c: New test.
>
> --- inline copy of patch -- 
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 
> d95394101470446e55f25a2397dd112239b6a54d..afd5b8632afbcddf8dad14495c3446c560eb085d
>  100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -6387,105 +6387,66 @@ (define_insn 
> "aarch64_qshl"
>[(set_attr "type" "neon_sat_shift_reg")]
>  )
>  
> -(define_expand "vec_widen_shiftl_lo_"
> -  [(set (match_operand: 0 "register_operand" "=w")
> - (unspec: [(match_operand:VQW 1 "register_operand" "w")
> -  (match_operand:SI 2
> -"aarch64_simd_shift_imm_bitsize_" "i")]
> -  VSHLL))]
> -  "TARGET_SIMD"
> -  {
> -rtx p = aarch64_simd_vect_par_cnst_half (mode, , false);
> -emit_insn (gen_aarch64_shll_internal (operands[0], 
> operands[1],
> -  p, operands[2]));
> -DONE;
> -  }
> -)
> -
> -(define_expand "vec_widen_shiftl_hi_"
> -   [(set (match_operand: 0 "register_operand")
> - (unspec: [(match_operand:VQW 1 "register_operand" "w")
> -  (match_operand:SI 2
> -"immediate_operand" "i")]
> -   VSHLL))]
> -   "TARGET_SIMD"
> -   {
> -rtx p = aarch64_simd_vect_par_cnst_half (mode, , true);
> -emit_insn (gen_aarch64_shll2_internal (operands[0], 
> operands[1],
> -   p, operands[2]));
> -DONE;
> -   }
> -)
> -
>  ;; vshll_n
>  
> -(define_insn "aarch64_shll_internal"
> -  [(set (match_operand: 0 "register_operand" "=w")
> - (unspec: [(vec_select:
> - (match_operand:VQW 1 "register_operand" "w")
> - (match_operand:VQW 2 "vect_par_cnst_lo_half" ""))
> -  (match_operand:SI 3
> -"aarch64_simd_shift_imm_bitsize_" "i")]
> -  VSHLL))]
> +(define_insn "aarch64_shll"
> +  [(set (match_operand: 0 "register_operand")
> + (ashift: (ANY_EXTEND:
> + (match_operand:VD_BHSI 1 "register_operand"))
> +  (match_operand: 2
> +"aarch64_simd_shift_imm_vec")))]

The name of this predicate seems more general than its meaning.
How about naming it aarch64_simd_shift_imm_vec_half_bitsize, to follow:

;; Predicates used by the various SIMD shift operations.  These
;; fall in to 3 categories.
;;   Shifts with a range 0-(bit_size - 1) (aarch64_simd_shift_imm)
;;   Shifts with a range 1-bit_size (aarch64_simd_shift_imm_offset)
;;   Shifts with a range 0-bit_size (aarch64_simd_shift_imm_bitsize)

Or aarch64_simd_shll_imm_v

Re: [PATCH][gensupport]: Don't segfault on empty attrs list

2023-08-02 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
> Hi All,
>
> Currently we segfault when len == 0 for an attribute list.
>
> essentially [cons: =0, 1, 2, 3; attrs: ] segfaults but should be equivalent to
> [cons: =0, 1, 2, 3] and [cons: =0, 1, 2, 3; attrs:].  This fixes it by just
> returning early and leaving it to the validators whether this should error out
> or not.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   * gensupport.cc (conlist): Support length 0 attribute.
>
> --- inline copy of patch -- 
> diff --git a/gcc/gensupport.cc b/gcc/gensupport.cc
> index 
> 959d1d9c83cf397fcb344e8d3db0f339a967587f..5c5f1cf4781551d3db95103c19cd1b70d98f4f73
>  100644
> --- a/gcc/gensupport.cc
> +++ b/gcc/gensupport.cc
> @@ -619,6 +619,9 @@ public:
>   [ns..ns + len) should equal XSTR (rtx, 0).  */
>conlist (const char *ns, unsigned int len, bool numeric)
>{
> +if (len == 0)
> +  return;
> +
>  /* Trim leading whitespaces.  */
>  while (ISBLANK (*ns))
>{

I think instead we should add some "len" guards to the while loops:

/* Trim leading whitespaces.  */
while (len > 0 && ISBLANK (*ns))
  {
ns++;
len--;
  }

...

/* Parse off any modifiers.  */
while (len > 0 && !ISALNUM (*ns))
  {
con += *(ns++);
len--;
  }

Otherwise we could crash for a string that only contains whitespace,
or that only contains non-alphanumeric characters.

OK like that if it works.

Thanks,
Richard


Re: [V1][PATCH 0/3] New attribute "element_count" to annotate bounds for C99 FAM(PR108896)

2023-08-02 Thread Qing Zhao via Gcc-patches


> On Aug 2, 2023, at 2:25 AM, Martin Uecker  wrote:
> 
> Am Dienstag, dem 01.08.2023 um 15:45 -0700 schrieb Kees Cook:
>> On Mon, Jul 31, 2023 at 08:14:42PM +, Qing Zhao wrote:
>>> /* In general, Due to type casting, the type for the pointee of a pointer
>>>   does not say anything about the object it points to,
>>>   So, __builtin_object_size can not directly use the type of the pointee
>>>   to decide the size of the object the pointer points to.
>>> 
>>>   there are only two reliable ways:
>>>   A. observed allocations  (call to the allocation functions in the routine)
>>>   B. observed accesses (read or write access to the location of the 
>>> pointer points to)
>>> 
>>>   that provide information about the type/existence of an object at
>>>   the corresponding address.
>>> 
>>>   for A, we use the "alloc_size" attribute for the corresponding allocation
>>>   functions to determine the object size;
>>> 
>>>   For B, we use the SIZE info of the TYPE attached to the corresponding 
>>> access.
>>>   (We treat counted_by attribute as a complement to the SIZE info of the 
>>> TYPE
>>>for FMA) 
>>> 
>>>   The only other way in C which ensures that a pointer actually points
>>>   to an object of the correct type is 'static':
>>> 
>>>   void foo(struct P *p[static 1]);   
>>> 
>>>   See https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624814.html
>>>   for more details.  */
>> 
>> This is a great explanation; thank you!
>> 
>> In the future I might want to have a new builtin that will allow
>> a program to query a pointer when neither A nor B have happened. But
>> for the first version of the __counted_by infrastructure, the above
>> limitations seen fine.
>> 
>> For example, maybe __builtin_counted_size(p) (which returns sizeof(*p) +
>> sizeof(*p->flex_array_member) * p->counted_by_member). Though since
>> there might be multiple flex array members, maybe this can't work. :)
> 
> We had a _Lengthof proposal for arrays (instead of sizeof/sizeof)
> and thought about how to extend this to structs with FAM. The
> problem is that it can not rely on an attribute.
> 
> With GCC's VLA in structs you could do 
> 
> struct foo { int n; char buf[n_init]; } *p = malloc(sizeof *p);
> p->n_init = n;
> 
> and get sizeof and bounds checking with UBSan
> https://godbolt.org/z/d4nneqs3P
> 
> (but also compiler bugs and other issues)

This works great!

If the bounds information for the FAM can later be integrated into the TYPE
system just like the VLA, that will be ideal; then we don't need to hack the
compiler here and there to handle the FAM specially.
> 
> 
> Also see my experimental container library, where you can do:
> 
> vec_decl(int);
> vec(int)* v = vec_alloc(int);
> 
> vec_push(&v, 1);
> vec_push(&v, 3);
> 
> auto p = &vec_array(v);
> (*p)[1] = 1; // bounds check
> 
> Here, "vec_array()" would give you a regular C array view
> of the vector contant and with correct dynamic size, so you
> can apply "sizeof" and  have bounds checking with UBSan and
> it just works (with clang / GCC without changes). 
> https://github.com/uecker/noplate

Yes, the idea of providing a type-safe library for C also looks promising. 
thanks

Qing
> 
> 
> 
> Martin



Re: [V1][PATCH 0/3] New attribute "element_count" to annotate bounds for C99 FAM(PR108896)

2023-08-02 Thread Qing Zhao via Gcc-patches


> On Aug 1, 2023, at 6:45 PM, Kees Cook  wrote:
> 
> On Mon, Jul 31, 2023 at 08:14:42PM +, Qing Zhao wrote:
>> /* In general, Due to type casting, the type for the pointee of a pointer
>>   does not say anything about the object it points to,
>>   So, __builtin_object_size can not directly use the type of the pointee
>>   to decide the size of the object the pointer points to.
>> 
>>   there are only two reliable ways:
>>   A. observed allocations  (call to the allocation functions in the routine)
>>   B. observed accesses (read or write access to the location of the 
>> pointer points to)
>> 
>>   that provide information about the type/existence of an object at
>>   the corresponding address.
>> 
>>   for A, we use the "alloc_size" attribute for the corresponding allocation
>>   functions to determine the object size;
>> 
>>   For B, we use the SIZE info of the TYPE attached to the corresponding 
>> access.
>>   (We treat counted_by attribute as a complement to the SIZE info of the TYPE
>>for FMA) 
>> 
>>   The only other way in C which ensures that a pointer actually points
>>   to an object of the correct type is 'static':
>> 
>>   void foo(struct P *p[static 1]);   
>> 
>>   See https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624814.html
>>   for more details.  */
> 
> This is a great explanation; thank you!
> 
> In the future I might want to have a new builtin that will allow
> a program to query a pointer when neither A nor B have happened. But
> for the first version of the __counted_by infrastructure, the above
> limitations seen fine.
> 
> For example, maybe __builtin_counted_size(p) (which returns sizeof(*p) +
> sizeof(*p->flex_array_member) * p->counted_by_member). Though since
> there might be multiple flex array members, maybe this can't work. :)

What do you mean by “there might be multiple flex array members”?

Do you mean the following example:

struct annotated {
size_t foo;
int array[] __attribute__((counted_by (foo)));
};

static struct annotated * noinline alloc_buf (int index)
{
  struct annotated *p;
  p = malloc(sizeof (*p) + (index) * sizeof (int));
  p->foo = index;
  return p;
}

int main ()
{
  struct annotated *p1, *p2;
  p1 = alloc_buf (10);
  p2 = alloc_buf (20);

  __builtin_counted_size(p1)???
  __builtin_counted_size(p2)???
}

Or something else?

Qing
> 
> -Kees
> 
> -- 
> Kees Cook



[PATCH] arm/aarch64: Add bti for all functions [PR106671]

2023-08-02 Thread Feng Xue OS via Gcc-patches
This patch extends the option -mbranch-protection=bti with an optional argument,
bti[+all], to force the compiler to unconditionally insert a BTI for all
functions.  A function call that is direct at compile time might be rewritten
into an indirect call through some kind of linker-generated thunk stub acting
as an invocation relay.  One instance: if a direct callee is placed far from its
caller, the direct BL {imm} instruction cannot encode the distance, so an
indirect BLR {reg} has to be used.  For this case, a BTI is required at the
beginning of the callee.

   caller() {
   bl callee
   }

=>

   caller() {
   adrp   reg, 
   add    reg, reg, #constant
   blr    reg
   }

Although the issue could be fixed with a sufficiently new version of ld, here we
provide another means for users who have to rely on an old ld or another non-ld
linker.  I also checked LLVM: by default it implements BTI just as the proposed
-mbranch-protection=bti+all.
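
For example (assuming the patch as posted), compiling for aarch64 with

  gcc -O2 -mbranch-protection=bti+all foo.c

emits a BTI C at the entry of every function, while plain
-mbranch-protection=bti keeps the current behaviour of omitting it for
functions that are only called directly.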

Feng

---
 gcc/config/aarch64/aarch64.cc| 12 +++-
 gcc/config/aarch64/aarch64.opt   |  2 +-
 gcc/config/arm/aarch-bti-insert.cc   |  3 ++-
 gcc/config/arm/aarch-common.cc   | 22 ++
 gcc/config/arm/aarch-common.h| 18 ++
 gcc/config/arm/arm.cc|  4 ++--
 gcc/config/arm/arm.opt   |  2 +-
 gcc/doc/invoke.texi  | 16 ++--
 gcc/testsuite/gcc.target/aarch64/bti-5.c | 17 +
 9 files changed, 76 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/bti-5.c

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 71215ef9fee..a404447c8d0 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -8997,7 +8997,8 @@ void aarch_bti_arch_check (void)
 bool
 aarch_bti_enabled (void)
 {
-  return (aarch_enable_bti == 1);
+  gcc_checking_assert (aarch_enable_bti != AARCH_BTI_FUNCTION_UNSET);
+  return (aarch_enable_bti != AARCH_BTI_FUNCTION_NONE);
 }
 
 /* Check if INSN is a BTI J insn.  */
@@ -18454,12 +18455,12 @@ aarch64_override_options (void)
 
   selected_tune = tune ? tune->ident : cpu->ident;
 
-  if (aarch_enable_bti == 2)
+  if (aarch_enable_bti == AARCH_BTI_FUNCTION_UNSET)
 {
 #ifdef TARGET_ENABLE_BTI
-  aarch_enable_bti = 1;
+  aarch_enable_bti = AARCH_BTI_FUNCTION;
 #else
-  aarch_enable_bti = 0;
+  aarch_enable_bti = AARCH_BTI_FUNCTION_NONE;
 #endif
 }
 
@@ -22881,7 +22882,8 @@ aarch64_print_patchable_function_entry (FILE *file,
   basic_block bb = ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb;
 
   if (!aarch_bti_enabled ()
-  || cgraph_node::get (cfun->decl)->only_called_directly_p ())
+  || (aarch_enable_bti != AARCH_BTI_FUNCTION_ALL
+ && cgraph_node::get (cfun->decl)->only_called_directly_p ()))
 {
   /* Emit the patchable_area at the beginning of the function.  */
   rtx_insn *insn = emit_insn_before (pa, BB_HEAD (bb));
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 025e52d40e5..5571f7e916d 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -37,7 +37,7 @@ TargetVariable
 aarch64_feature_flags aarch64_isa_flags = 0
 
 TargetVariable
-unsigned aarch_enable_bti = 2
+enum aarch_bti_function_type aarch_enable_bti = AARCH_BTI_FUNCTION_UNSET
 
 TargetVariable
 enum aarch_key_type aarch_ra_sign_key = AARCH_KEY_A
diff --git a/gcc/config/arm/aarch-bti-insert.cc 
b/gcc/config/arm/aarch-bti-insert.cc
index 71a77e29406..babd2490c9f 100644
--- a/gcc/config/arm/aarch-bti-insert.cc
+++ b/gcc/config/arm/aarch-bti-insert.cc
@@ -164,7 +164,8 @@ rest_of_insert_bti (void)
  functions that are already protected by Return Address Signing (PACIASP/
  PACIBSP).  For all other cases insert a BTI C at the beginning of the
  function.  */
-  if (!cgraph_node::get (cfun->decl)->only_called_directly_p ())
+  if (aarch_enable_bti == AARCH_BTI_FUNCTION_ALL
+  || !cgraph_node::get (cfun->decl)->only_called_directly_p ())
 {
   bb = ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb;
   insn = BB_HEAD (bb);
diff --git a/gcc/config/arm/aarch-common.cc b/gcc/config/arm/aarch-common.cc
index 5b96ff4c2e8..7751d40f909 100644
--- a/gcc/config/arm/aarch-common.cc
+++ b/gcc/config/arm/aarch-common.cc
@@ -666,7 +666,7 @@ static enum aarch_parse_opt_result
 aarch_handle_no_branch_protection (char* str, char* rest)
 {
   aarch_ra_sign_scope = AARCH_FUNCTION_NONE;
-  aarch_enable_bti = 0;
+  aarch_enable_bti = AARCH_BTI_FUNCTION_NONE;
   if (rest)
 {
   error ("unexpected %<%s%> after %<%s%>", rest, str);
@@ -680,7 +680,7 @@ aarch_handle_standard_branch_protection (char* str, char* 
rest)
 {
   aarch_ra_sign_scope = AARCH_FUNCTION_NON_LEAF;
   aarch_ra_sign_key = AARCH_KEY_A;
-  aarch_enable_bti = 1;
+  aarch_enable_bti = AARCH_BTI_FUNCTION;
   if (rest)
 {
   error ("unexpected %<%s%> after %<%s%>", res

[PATCH] _BitInt bit-field support [PR102989]

2023-08-02 Thread Jakub Jelinek via Gcc-patches
Hi!

On Fri, Jul 28, 2023 at 06:37:23PM +, Joseph Myers wrote:
> Yes, the type used in _Generic isn't fully specified, just the type after 
> integer promotions in contexts where those occur.

Ok.  I've removed those static_asserts from the test then, no need to test
what isn't fully specified...

> > static_assert (expr_has_type (s4.c + 1uwb, _BitInt(389)));
> > static_assert (expr_has_type (s4.d * 0wb, _BitInt(2)));
> > static_assert (expr_has_type (s6.a + 0wb, _BitInt(2)));
> > That looks to me like an LLVM bug, because
> > "The value from a bit-field of a bit-precise integer type is converted to
> > the corresponding bit-precise integer type."
> > specifies that s4.c has _BitInt(389) type after integer promotions
> > and s4.d and s6.a have _BitInt(2) type.  Now, 1uwb has unsigned _BitInt(1)
> > type and 0wb has _BitInt(2) and the common type for those in all cases is
> > I believe the type of the left operand.
> 
> Indeed, I'd expect those to pass, since in those cases integer promotions 
> (to the declared _BitInt type of the bit-field, without the bit-field 
> width) are applied.

Here is an updated patch on top of the initially posted 0/5 patch series
which implements the _BitInt bit-field support including its lowering.
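
For illustration, a hypothetical declaration (not the actual s4/s6 from the new
bitint-17.c test) exercising the promotion rule discussed above would be:

struct S
{
  _BitInt(389) c : 135;
  _BitInt(2) d : 1;
};

/* After integer promotions s.c has its declared type _BitInt(389) (the
   bit-field width is not part of the promoted type), so s.c + 1uwb has type
   _BitInt(389); likewise s.d promotes to _BitInt(2).  */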

2023-08-02  Jakub Jelinek  

PR c/102989
gcc/
* tree.h (CONSTRUCTOR_BITFIELD_P): Return true even for BLKmode
bit-fields if they have BITINT_TYPE type.
* stor-layout.cc (finish_bitfield_representative): For bit-fields
with BITINT_TYPE, prefer representatives with precisions in
multiple of limb precision.
* gimple-lower-bitint.cc (struct bitint_large_huge): Declare
handle_load method.  Add m_upwards, m_cast_conditional and
m_bitfld_load members.
(bitint_large_huge::limb_access): Use CEIL instead of normal
truncating division.
(bitint_large_huge::handle_cast): For m_upwards even if not
m_upwards_2limb compute ext only when processing the corresponding
limb rather than upfront.  Deal with handle_operand in 2 separate
branches doing bit-field loads.
(bitint_large_huge::handle_load): New method.
(bitint_large_huge::handle_stmt): Use handle_load for loads.
(bitint_large_huge::lower_mergeable_stmt): Handle stores into
bit-fields.  Set m_upwards.
(bitint_large_huge::lower_addsub_overflow): Set m_upwards.
(bitint_large_huge::lower_stmt): Clear m_upwards, m_cast_conditional
and m_bitfld_load.
(stmt_needs_operand_addr): New function.
(gimple_lower_bitint): Punt on merging loads from bit-fields and/or
stores into bit-fields with statements whose lowering doesn't or can't
handle those.
gcc/c/
* c-decl.cc (check_bitfield_type_and_width): Allow BITINT_TYPE
bit-fields.
(finish_struct): Prefer to use BITINT_TYPE for BITINT_TYPE bit-fields
if possible.
* c-typeck.cc (perform_integral_promotions): Promote BITINT_TYPE
bit-fields to their declared type.
gcc/testsuite/
* gcc.dg/bitint-17.c: New test.
* gcc.dg/torture/bitint-42.c: New test.

--- gcc/tree.h.jj   2023-07-11 15:28:54.703679523 +0200
+++ gcc/tree.h  2023-07-31 20:09:42.329267570 +0200
@@ -1259,7 +1259,9 @@ extern void omp_clause_range_check_faile
 /* True if NODE, a FIELD_DECL, is to be processed as a bitfield for
constructor output purposes.  */
 #define CONSTRUCTOR_BITFIELD_P(NODE) \
-  (DECL_BIT_FIELD (FIELD_DECL_CHECK (NODE)) && DECL_MODE (NODE) != BLKmode)
+  (DECL_BIT_FIELD (FIELD_DECL_CHECK (NODE)) \
+   && (DECL_MODE (NODE) != BLKmode \
+   || TREE_CODE (TREE_TYPE (NODE)) == BITINT_TYPE))
 
 /* True if NODE is a clobber right hand side, an expression of indeterminate
value that clobbers the LHS in a copy instruction.  We use a volatile
--- gcc/stor-layout.cc.jj   2023-07-19 19:35:10.639889885 +0200
+++ gcc/stor-layout.cc  2023-08-01 13:08:02.667892939 +0200
@@ -2148,6 +2148,22 @@ finish_bitfield_representative (tree rep
   || GET_MODE_BITSIZE (mode) > maxbitsize
   || GET_MODE_BITSIZE (mode) > MAX_FIXED_MODE_SIZE)
 {
+  if (TREE_CODE (TREE_TYPE (field)) == BITINT_TYPE)
+   {
+ struct bitint_info info;
+ unsigned prec = TYPE_PRECISION (TREE_TYPE (field));
+ gcc_assert (targetm.c.bitint_type_info (prec, &info));
+ scalar_int_mode limb_mode = as_a  (info.limb_mode);
+ unsigned lprec = GET_MODE_PRECISION (limb_mode);
+ if (prec > lprec)
+   {
+ /* For middle/large/huge _BitInt prefer bitsize being a multiple
+of limb precision.  */
+ unsigned HOST_WIDE_INT bsz = CEIL (bitsize, lprec) * lprec;
+ if (bsz <= maxbitsize)
+   bitsize = bsz;
+   }
+   }
   /* We really want a BLKmode representative only as a last resort,
  considering the member b in
   struct { int a : 7; int b : 17; in

Re: [RFC] light expander sra for parameters and returns

2023-08-02 Thread guojiufu via Gcc-patches

On 2023-08-02 20:41, Richard Biener wrote:

On Tue, 1 Aug 2023, Jiufu Guo wrote:



Hi,

Richard Biener  writes:

> On Mon, 24 Jul 2023, Jiufu Guo wrote:
>
>>
>> Hi Martin,
>>
>> Not sure about your current opinion on re-using the ipa-sra code
>> in the light-expander-sra.  If there is anything I can help with, please
>> let me know.
>>
>> And I'm thinking about the difference between the expander-sra, ipa-sra
>> and tree-sra. 1. For stmts walking, expander-sra has special behavior
>> for return-stmt, and also a little special on assign-stmt. And phi
>> stmts are not checked by ipa-sra/tree-sra. 2. For the access structure,
>> I'm also thinking if we need a tree structure; it would be useful when
>> checking overlaps, it was not used now in the expander-sra.
>>
>> For ipa-sra and tree-sra, I notice that there is some similar code,
>> but of cause there are differences. While it seems the difference
>> is 'intended', for example: 1. when creating and accessing,
>> 'size != max_size' is acceptable in tree-sra but not for ipa-sra.
>> 2. 'AGGREGATE_TYPE_P' for ipa-sra is accepted for some cases, but
>> not ok for tree-sra.
>> I'm wondering whether those slight differences block re-using the code
>> between ipa-sra and tree-sra.
>>
>> The expander-sra may be more lightweight; for example, maybe we can use
>> FOR_EACH_IMM_USE_STMT to check the usage of each parameter, and not
>> need to walk all the stmts.
>
> What I was hoping for is shared stmt-level analysis and a shared
> data structure for the "access"(es) a stmt performs.  Because that
> can come up handy in multiple places.  The existing SRA data
> structures could easily embed that subset for example if sharing
> the whole data structure of [IPA] SRA seems too unwieldly.

Understand.
The stmt-level analysis and "access" data structure are similar
between ipa-sra/tree-sra and the expander-sra.

I just updated the patch; this version does not change the behavior of
the previous version.  It only cleans up and merges some functions.

The patch is attached.

This version (like tree-sra/ipa-sra) still uses a similar
"stmt analysis" and "access struct".  This could be extracted as
shared code.
I'm thinking of updating the code to use the same "base_access" and
"walk function".

>
> With a stmt-leve API using FOR_EACH_IMM_USE_STMT would still be
> possible (though RTL expansion pre-walks all stmts anyway).

Yes, I also noticed that "FOR_EACH_IMM_USE_STMT" is not enough.
For struct parameters, walking the stmts is needed.


I think I mentioned this before, RTL expansion already
pre-walks the whole function looking for variables it has to
expand to the stack in discover_nonconstant_array_refs (which is
now badly named), I'd appreciate if the "SRA" walk would piggy-back
on that existing walk.


Yes.  I also had a look at discover_nonconstant_array_refs; it seems
this function only takes care of 'call_internal' and 'vdef' stmts for
array accesses, but SRA cares more about 'assign/call' stmts.
The only common part between these two stmt walks is the loop header.

  FOR_EACH_BB_FN (bb, cfun)
for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
  {
gimple *stmt = gsi_stmt (gsi);

So the existing walk is not reused here.
Another reason to have a new walk is that the SRA walk code may later be
shared with tree-sra/ipa-sra.



For RTL expansion I think a critical part is to create accesses
based on the incoming/outgoing RTL which is specified by the ABI.
As I understand we are optimizing the argument setup code which
assigns the incoming arguments to either pseudo(s) or the stack
and thus we get to choose an optimized "mode" for that virtual
location of the incoming arguments (but we can't alter their
hardregs/stack assignment obviously).


Yes, this is what I'm trying to do.
It is "set_scalar_rtx_for_aggregate_access", which is called after the
incoming arguments are set up; it then assigns the incoming hard registers
to pseudo(s).  Those pseudo(s) are the scalarized rtx for the argument.
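
As a hedged source-level illustration (the ABI details here are assumptions
made for the example, not part of the patch):

  /* On a typical 64-bit ABI this struct arrives in two registers;
     without expander SRA the incoming registers are spilled to a stack
     slot and each field is reloaded from memory.  */
  struct pair { double x; double y; };

  double
  sum (struct pair p)
  {
    /* With set_scalar_rtx_for_aggregate_access, p.x and p.y can be read
       directly from the pseudos that received the incoming registers,
       so no stack temporary is needed.  */
    return p.x + p.y;
  }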


 So when we have an
incoming register pair we should create an artificial access
for the pieces those two registers represent.

You seem to do quite some adjustment to the parameter setup
where I was hoping we get away with simply choosing a different
mode for the virtual argument representation?


I insert the code in the parameter setup, where the incoming registers
are computed, and assign the incoming regs to scalar pseudo(s).
(Copying the incoming registers to the stack would be optimized out by RTL
passes.  Yes, it would be better to avoid generating those copies in the
first place.)



But I'm not too familiar with the innards of parameter/return
value initial RTL expansion.  I hope somebody else can chime
in here as well.


Thanks so much for your very helpful comments!

BR,
Jeff (Jiufu Guo)



Richard.




BR,
Jeff (Jiufu Guo)

-
diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
index edf292cfbe9..8c36ad5df79 100644
--- a/gcc/cfgexpand.cc
+++ b/gcc/cfgexpand.cc
@@ -97,6 +97,502 @@

[PATCH v2] analyzer: stash values for CPython plugin [PR107646]

2023-08-02 Thread Eric Feng via Gcc-patches
Revised:
-- Fix indentation problems
-- Add more detail to Changelog
-- Add new test on handling non-CPython code case
-- Turn off debugging inform by default
-- Make on_finish_translation_unit() static
-- Remove superfluous null checks in init_py_structs()

Changes have been bootstrapped and tested against trunk on 
aarch64-unknown-linux-gnu.

---
This patch adds a hook to the end of ana::on_finish_translation_unit
which calls relevant stashing-related callbacks registered during plugin
initialization. This feature is used to stash named types and global
variables for a CPython analyzer plugin [PR107646].
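
For reference, a minimal sketch of how a plugin might use the new hook; the
callback body and names below are hypothetical, and only
register_finish_translation_unit_callback and the translation_unit vfuncs
come from this patch:

  /* Hypothetical plugin code.  */
  static void
  stash_pyobject_type (ana::logger *logger, const ana::translation_unit &tu)
  {
    (void) logger;
    tree pyobj_record = tu.lookup_type_by_id (get_identifier ("PyObject"));
    if (!pyobj_record)
      return;  /* Not compiling CPython-related code; nothing to stash.  */
    /* ... record the type for later use by the plugin ... */
  }

  /* Called from the plugin's plugin_init.  */
  static void
  register_my_callbacks ()
  {
    ana::register_finish_translation_unit_callback (stash_pyobject_type);
  }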

gcc/analyzer/ChangeLog:
PR analyzer/107646
* analyzer-language.cc (run_callbacks): New function.
(on_finish_translation_unit): New function.
* analyzer-language.h (GCC_ANALYZER_LANGUAGE_H): New include.
(class translation_unit): New vfuncs.

gcc/c/ChangeLog:
PR analyzer/107646
* c-parser.cc: New functions on stashing values for the
  analyzer.

gcc/testsuite/ChangeLog:
PR analyzer/107646
* gcc.dg/plugin/plugin.exp: Add new plugin and test.
* gcc.dg/plugin/analyzer_cpython_plugin.c: New plugin.
* gcc.dg/plugin/cpython-plugin-test-1.c: New test.

Signed-off-by: Eric Feng 
---
 gcc/analyzer/analyzer-language.cc |  22 ++
 gcc/analyzer/analyzer-language.h  |   9 +
 gcc/c/c-parser.cc |  26 ++
 .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 230 ++
 .../gcc.dg/plugin/cpython-plugin-test-1.c |   8 +
 gcc/testsuite/gcc.dg/plugin/plugin.exp|   2 +
 6 files changed, 297 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-1.c

diff --git a/gcc/analyzer/analyzer-language.cc 
b/gcc/analyzer/analyzer-language.cc
index 2c8910906ee..85400288a93 100644
--- a/gcc/analyzer/analyzer-language.cc
+++ b/gcc/analyzer/analyzer-language.cc
@@ -35,6 +35,26 @@ static GTY (()) hash_map<tree, tree>
*analyzer_stashed_constants;
 #if ENABLE_ANALYZER
 
 namespace ana {
+static vec<finish_translation_unit_callback>
+*finish_translation_unit_callbacks;
+
+void
+register_finish_translation_unit_callback (
+finish_translation_unit_callback callback)
+{
+  if (!finish_translation_unit_callbacks)
+vec_alloc (finish_translation_unit_callbacks, 1);
+  finish_translation_unit_callbacks->safe_push (callback);
+}
+
+static void
+run_callbacks (logger *logger, const translation_unit &tu)
+{
+  for (auto const &cb : finish_translation_unit_callbacks)
+{
+  cb (logger, tu);
+}
+}
 
 /* Call into TU to try to find a value for NAME.
If found, stash its value within analyzer_stashed_constants.  */
@@ -102,6 +122,8 @@ on_finish_translation_unit (const translation_unit &tu)
 the_logger.set_logger (new logger (logfile, 0, 0,
   *global_dc->printer));
   stash_named_constants (the_logger.get_logger (), tu);
+
+  run_callbacks (the_logger.get_logger (), tu);
 }
 
 /* Lookup NAME in the named constants stashed when the frontend TU finished.
diff --git a/gcc/analyzer/analyzer-language.h b/gcc/analyzer/analyzer-language.h
index 00f85aba041..8deea52d627 100644
--- a/gcc/analyzer/analyzer-language.h
+++ b/gcc/analyzer/analyzer-language.h
@@ -21,6 +21,8 @@ along with GCC; see the file COPYING3.  If not see
 #ifndef GCC_ANALYZER_LANGUAGE_H
 #define GCC_ANALYZER_LANGUAGE_H
 
+#include "analyzer/analyzer-logging.h"
+
 #if ENABLE_ANALYZER
 
 namespace ana {
@@ -35,8 +37,15 @@ class translation_unit
  have been seen).  If it is defined and an integer (e.g. either as a
  macro or enum), return the INTEGER_CST value, otherwise return NULL.  */
   virtual tree lookup_constant_by_id (tree id) const = 0;
+  virtual tree lookup_type_by_id (tree id) const = 0;
+  virtual tree lookup_global_var_by_id (tree id) const = 0;
 };
 
+typedef void (*finish_translation_unit_callback)
+   (logger *, const translation_unit &);
+void register_finish_translation_unit_callback (
+finish_translation_unit_callback callback);
+
 /* Analyzer hook for frontends to call at the end of the TU.  */
 
 void on_finish_translation_unit (const translation_unit &tu);
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index cf82b0306d1..617111b0f0a 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -1695,6 +1695,32 @@ public:
 return NULL_TREE;
   }
 
+  tree
+  lookup_type_by_id (tree id) const final override
+  {
+if (tree type_decl = lookup_name (id))
+  {
+   if (TREE_CODE (type_decl) == TYPE_DECL)
+ {
+   tree record_type = TREE_TYPE (type_decl);
+   if (TREE_CODE (record_type) == RECORD_TYPE)
+ return record_type;
+ }
+  }
+
+return NULL_TREE;
+  }
+
+  tree
+  lookup_global_var_by_id (tree id) const final override
+  {
+if (tree var_decl = lookup_name (id))
+  if (TREE_CODE (var_decl) == VAR_DECL)
+   

[PATCH] match.pd: Canonicalize (signed x << c) >> c [PR101955]

2023-08-02 Thread Drew Ross via Gcc-patches
Canonicalizes (signed x << c) >> c into a sign-extension of the lowest
precision(type) - c bits of x, if those bits have a mode precision or a
precision of 1.  Also combines this rule with (unsigned x << c) >> c ->
x & ((unsigned)-1 >> c) to avoid a duplicate pattern.  Tested successfully on
x86_64 and x86 targets.
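
A brief worked illustration of the transform (it mirrors the t4 test below;
it is not an additional testcase): for 32-bit int and c = 24, the width is
32 - 24 = 8, so the shift pair becomes a conversion through an 8-bit signed
type:

  int
  sext8 (int x)
  {
    return (x << 24) >> 24;        /* signed shifts */
  }

  /* ...is canonicalized as if it had been written:  */
  int
  sext8_canon (int x)
  {
    return (int) (signed char) x;  /* convert:stype, then convert back */
  }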

  PR middle-end/101955

gcc/ChangeLog:

  * match.pd ((signed x << c) >> c): New canonicalization.

gcc/testsuite/ChangeLog:

  * gcc.dg/pr101955.c: New test.
---
 gcc/match.pd| 20 +++
 gcc/testsuite/gcc.dg/pr101955.c | 63 +
 2 files changed, 77 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr101955.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 8543f777a28..62de97f0186 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3758,13 +3758,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
- TYPE_PRECISION (TREE_TYPE (@2)
   (bit_and (convert @0) (lshift { build_minus_one_cst (type); } @1
 
-/* Optimize (x << c) >> c into x & ((unsigned)-1 >> c) for unsigned
-   types.  */
+/* For (x << c) >> c, optimize into x & ((unsigned)-1 >> c) for
+   unsigned x OR truncate into the precision(type) - c lowest bits
+   of signed x (if they have mode precision or a precision of 1).  */
 (simplify
- (rshift (lshift @0 INTEGER_CST@1) @1)
- (if (TYPE_UNSIGNED (type)
-  && (wi::ltu_p (wi::to_wide (@1), element_precision (type
-  (bit_and @0 (rshift { build_minus_one_cst (type); } @1
+ (rshift (nop_convert? (lshift @0 INTEGER_CST@1)) @@1)
+ (if (wi::ltu_p (wi::to_wide (@1), element_precision (type)))
+  (if (TYPE_UNSIGNED (type))
+   (bit_and (convert @0) (rshift { build_minus_one_cst (type); } @1))
+   (if (INTEGRAL_TYPE_P (type))
+(with {
+  int width = element_precision (type) - tree_to_uhwi (@1);
+  tree stype = build_nonstandard_integer_type (width, 0);
+ }
+ (if (width == 1 || type_has_mode_precision_p (stype))
+  (convert (convert:stype @0
 
 /* Optimize x >> x into 0 */
 (simplify
diff --git a/gcc/testsuite/gcc.dg/pr101955.c b/gcc/testsuite/gcc.dg/pr101955.c
new file mode 100644
index 000..6a04288511f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr101955.c
@@ -0,0 +1,63 @@
+/* { dg-do compile { target int32 } } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+__attribute__((noipa)) int
+t1 (int x)
+{
+  int y = x << 31;
+  int z = y >> 31;
+  return z;
+}
+
+__attribute__((noipa)) int
+t2 (unsigned int x)
+{
+  int y = x << 31;
+  int z = y >> 31;
+  return z;
+}
+
+__attribute__((noipa)) int
+t3 (int x)
+{
+  return (x << 31) >> 31;
+}
+
+__attribute__((noipa)) int
+t4 (int x)
+{
+  return (x << 24) >> 24;
+}
+
+__attribute__((noipa)) int
+t5 (int x)
+{
+  return (x << 16) >> 16;
+}
+
+__attribute__((noipa)) long long
+t6 (long long x)
+{
+  return (x << 63) >> 63;
+}
+
+__attribute__((noipa)) long long
+t7 (long long x)
+{
+  return (x << 56) >> 56;
+}
+
+__attribute__((noipa)) long long
+t8 (long long x)
+{
+  return (x << 48) >> 48;
+}
+
+__attribute__((noipa)) long long
+t9 (long long x)
+{
+  return (x << 32) >> 32;
+}
+
+/* { dg-final { scan-tree-dump-not " >> " "optimized" } } */
+/* { dg-final { scan-tree-dump-not " << " "optimized" } } */
-- 
2.39.3



Re: [PATCH] analyzer: stash values for CPython plugin [PR107646]

2023-08-02 Thread Eric Feng via Gcc-patches
Hi Dave,

Thank you for the feedback! I've incorporated the changes and sent a
revised version of the patch.

On Tue, Aug 1, 2023 at 1:02 PM David Malcolm  wrote:
>
> On Tue, 2023-08-01 at 09:52 -0400, Eric Feng wrote:
> > Hi all,
> >
> > This patch adds a hook to the end of ana::on_finish_translation_unit
> > which calls relevant stashing-related callbacks registered during
> > plugin
> > initialization. This feature is used to stash named types and global
> > variables for a CPython analyzer plugin [PR107646].
> >
> > Bootstrapped and tested on aarch64-unknown-linux-gnu. Does it look
> > okay?
>
> Hi Eric, thanks for the patch.
>
> The patch touches the C frontend, so those parts would need approval
> from the C FE maintainers/reviewers; I've CCed them.
>
> Overall, I like the patch, but it's not ready for trunk yet; various
> comments inline below...
>
> >
> > ---
> >
> > gcc/analyzer/ChangeLog:
>
> You could add: PR analyzer/107646 to these ChangeLog entries; have a
> look at how other ChangeLog entries refer to such bugzilla entries.
>
> >
> > * analyzer-language.cc (run_callbacks): New function.
> > (on_finish_translation_unit): New function.
> > * analyzer-language.h (GCC_ANALYZER_LANGUAGE_H): New include.
> > (class translation_unit): New vfuncs.
> >
> > gcc/c/ChangeLog:
> >
> > * c-parser.cc: New functions.
>
> I think this ChangeLog entry needs more detail.
Added in revised version of the patch.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/plugin/analyzer_cpython_plugin.c: New test.
> >
> > Signed-off-by: Eric Feng 
> > ---
> >  gcc/analyzer/analyzer-language.cc |  22 ++
> >  gcc/analyzer/analyzer-language.h  |   9 +
> >  gcc/c/c-parser.cc |  26 ++
> >  .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 224
> > ++
> >  4 files changed, 281 insertions(+)
> >  create mode 100644
> > gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
> >
> > diff --git a/gcc/analyzer/analyzer-language.cc
> > b/gcc/analyzer/analyzer-language.cc
> > index 2c8910906ee..fc41b9c17b8 100644
> > --- a/gcc/analyzer/analyzer-language.cc
> > +++ b/gcc/analyzer/analyzer-language.cc
> > @@ -35,6 +35,26 @@ static GTY (()) hash_map<tree, tree>
> > *analyzer_stashed_constants;
> >  #if ENABLE_ANALYZER
> >
> >  namespace ana {
> > +static vec<finish_translation_unit_callback>
> > +*finish_translation_unit_callbacks;
> > +
> > +void
> > +register_finish_translation_unit_callback (
> > +finish_translation_unit_callback callback)
> > +{
> > +  if (!finish_translation_unit_callbacks)
> > +vec_alloc (finish_translation_unit_callbacks, 1);
> > +  finish_translation_unit_callbacks->safe_push (callback);
> > +}
> > +
> > +void
> > +run_callbacks (logger *logger, const translation_unit &tu)
>
> This function could be "static" since it's not needed outside of
> analyzer-language.cc
>
> > +{
> > +  for (auto const &cb : finish_translation_unit_callbacks)
> > +{
> > +  cb (logger, tu);
> > +}
> > +}
> >
> >  /* Call into TU to try to find a value for NAME.
> > If found, stash its value within analyzer_stashed_constants.  */
> > @@ -102,6 +122,8 @@ on_finish_translation_unit (const
> > translation_unit &tu)
> >  the_logger.set_logger (new logger (logfile, 0, 0,
> >  *global_dc->printer));
> >stash_named_constants (the_logger.get_logger (), tu);
> > +
> > +  run_callbacks (the_logger.get_logger (), tu);
> >  }
> >
> >  /* Lookup NAME in the named constants stashed when the frontend TU
> > finished.
> > diff --git a/gcc/analyzer/analyzer-language.h
> > b/gcc/analyzer/analyzer-language.h
> > index 00f85aba041..8deea52d627 100644
> > --- a/gcc/analyzer/analyzer-language.h
> > +++ b/gcc/analyzer/analyzer-language.h
> > @@ -21,6 +21,8 @@ along with GCC; see the file COPYING3.  If not see
> >  #ifndef GCC_ANALYZER_LANGUAGE_H
> >  #define GCC_ANALYZER_LANGUAGE_H
> >
> > +#include "analyzer/analyzer-logging.h"
> > +
> >  #if ENABLE_ANALYZER
> >
> >  namespace ana {
> > @@ -35,8 +37,15 @@ class translation_unit
> >   have been seen).  If it is defined and an integer (e.g. either
> > as a
> >   macro or enum), return the INTEGER_CST value, otherwise return
> > NULL.  */
> >virtual tree lookup_constant_by_id (tree id) const = 0;
> > +  virtual tree lookup_type_by_id (tree id) const = 0;
> > +  virtual tree lookup_global_var_by_id (tree id) const = 0;
> >  };
> >
> > +typedef void (*finish_translation_unit_callback)
> > +   (logger *, const translation_unit &);
> > +void register_finish_translation_unit_callback (
> > +finish_translation_unit_callback callback);
> > +
> >  /* Analyzer hook for frontends to call at the end of the TU.  */
> >
> >  void on_finish_translation_unit (const translation_unit &tu);
> > diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
> > index 80920b31f83..f0ee55e416b 100644
> > --- a/gcc/c/c-parser.cc
> > +++ b/gcc/c/c-parser.cc
> > @@ -1695,6 +1695,32 @@ public:
> >  return NULL_TREE;
> >   

Re: [C PATCH]: Add Walloc-type to warn about insufficient size in allocations

2023-08-02 Thread Qing Zhao via Gcc-patches



> On Aug 1, 2023, at 10:31 AM, Martin Uecker  wrote:
> 
> Am Dienstag, dem 01.08.2023 um 13:27 + schrieb Qing Zhao:
>> 
>>> On Aug 1, 2023, at 3:51 AM, Martin Uecker via Gcc-patches 
>>>  wrote:
>>> 
> 
> 
 Hi Martin,
 Just wondering if it'd be a good idea perhaps to warn if alloc size is
 not a multiple of TYPE_SIZE_UNIT instead of just less-than ?
 So it can catch cases like:
 int *p = malloc (sizeof (int) + 2); // probably intended malloc
 (sizeof (int) * 2)
 
 FWIW, this is caught using -fanalyzer:
 f.c: In function 'f':
 f.c:3:12: warning: allocated buffer size is not a multiple of the
 pointee's size [CWE-131] [-Wanalyzer-allocation-size]
3 |   int *p = __builtin_malloc (sizeof(int) + 2);
  |^~
 
 Thanks,
 Prathamesh
>>> 
>>> Yes, this is probably a good idea.  It might need special
>>> logic for flexible array members then...
>> 
>> Why special logic for FAM on such warning? (Not a multiple of TYPE_SIZE_UNIT 
>> for the element).
>> 
> 
> For
> 
> struct { int n; char buf[]; } *p = malloc(sizeof *p + n);
> p->n = n;
> 
> the size would not be a multiple.
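
[A hedged worked example of the point above: with 4-byte int, sizeof *p is 4,
so for n == 10 the request is 4 + 10 = 14 bytes -- a perfectly valid
flexible-array-member allocation that is nevertheless not a multiple of the
4-byte pointee size, which is why such a check would need to special-case
FAMs.]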

But n is still a multiple of sizeof (char), right?  Am I missing anything here?

Qing
> 
> Martin
> 
> 
> 
> 



Re: [PATCH 04/14] c++: use _P() defines from tree.h

2023-08-02 Thread Patrick Palka via Gcc-patches
On Thu, Jun 1, 2023 at 2:11 PM Bernhard Reutner-Fischer
 wrote:
>
> Hi David, Patrick,
>
> On Thu, 1 Jun 2023 18:33:46 +0200
> Bernhard Reutner-Fischer  wrote:
>
> > On Thu, 1 Jun 2023 11:24:06 -0400
> > Patrick Palka  wrote:
> >
> > > On Sat, May 13, 2023 at 7:26 PM Bernhard Reutner-Fischer via
> > > Gcc-patches  wrote:
> >
> > > > diff --git a/gcc/cp/tree.cc b/gcc/cp/tree.cc
> > > > index 131b212ff73..19dfb3ed782 100644
> > > > --- a/gcc/cp/tree.cc
> > > > +++ b/gcc/cp/tree.cc
> > > > @@ -1173,7 +1173,7 @@ build_cplus_array_type (tree elt_type, tree 
> > > > index_type, int dependent)
> > > >  }
> > > >
> > > >/* Avoid spurious warnings with VLAs (c++/54583).  */
> > > > -  if (TYPE_SIZE (t) && EXPR_P (TYPE_SIZE (t)))
> > > > +  if (CAN_HAVE_LOCATION_P (TYPE_SIZE (t)))
> > >
> > > Hmm, this change seems undesirable...
> >
> > mhm, yes that is misleading. I'll prepare a patch to revert this.
> > Let me have a look if there were other such CAN_HAVE_LOCATION_P changes
> > that we'd want to revert.
>
> Sorry for that!
> I'd revert the hunk above and the one in gcc-rich-location.cc
> (maybe_range_label_for_tree_type_mismatch::get_text), please see
> attached. Bootstrap running, ok for trunk if it passes?

LGTM!

>
> thanks,



Re: [PATCH 2/5] [RISC-V] Generate Zicond instruction for basic semantics

2023-08-02 Thread Jeff Law via Gcc-patches




On 8/2/23 04:05, Richard Sandiford wrote:

Jeff Law via Gcc-patches  writes:

On 8/1/23 05:18, Richard Sandiford wrote:


Where were you seeing the requirement for pointer equality?  genrecog.cc
at least uses rtx_equal_p, and I think it has to.  E.g. some patterns
use (match_dup ...) to match output and input mems, and mem rtxes
shouldn't be shared.

It's a general concern due to the way we handle transforming pseudos
into hard registers after allocation is complete.   We can end up with
two REG expressions that will compare equal according to rtx_equal_p,
but which are not pointer equal.


But isn't that OK?  I don't think there's a requirement for match_dup
pointer equality either before or after RA.  Or at least, there
shouldn't be.  If something happens to rely on pointer equality
for match_dups then I think we should fix it.





So IMO, like you said originally, match_dup would be the right way to
handle this kind of pattern.
I'd assumed that match_dup required pointer equality.  If it doesn't, 
then great, we can adjust the pattern to use match_dup.  I'm about to 
submit some bits to simplify/correct a bit of zicond.md, then I can do 
some testing with match_dup in place now that things seem to be more 
stable on the code generation correctness side.





I don't want to labour the point though.
No worries about that on my end!  I probably don't say it enough, but 
when you raise an issue, it's worth the time to make sure I understand 
your point thoroughly.


In this case I'd assumed that match_dup relied on pointer equality which 
doesn't seem to be the case.  30+ years into this codebase and I'm still 
learning new stuff!


Jeff


Re: [PATCH v2] analyzer: stash values for CPython plugin [PR107646]

2023-08-02 Thread David Malcolm via Gcc-patches
On Wed, 2023-08-02 at 12:20 -0400, Eric Feng wrote:

Hi Eric, thanks for the updated patch.

Overall, looks good to me, although I'd drop the "Exited." from the
"sorry" message (and thus from the dg-message directive), since the
compiler is not exiting, it's just the particular plugin that's giving
up (but let's not hold up the patch with a "bikeshed" discussion on the
precise wording).

If Joseph or Marek approves the C parts of the patch, this will be OK
to push to trunk.

Dave

> Revised:
> -- Fix indentation problems
> -- Add more detail to Changelog
> -- Add new test on handling non-CPython code case
> -- Turn off debugging inform by default
> -- Make on_finish_translation_unit() static
> -- Remove superfluous null checks in init_py_structs()
> 
> Changes have been bootstrapped and tested against trunk on aarch64-
> unknown-linux-gnu.
> 
> ---
> This patch adds a hook to the end of ana::on_finish_translation_unit
> which calls relevant stashing-related callbacks registered during
> plugin
> initialization. This feature is used to stash named types and global
> variables for a CPython analyzer plugin [PR107646].
> 
> gcc/analyzer/ChangeLog:
> PR analyzer/107646
>     * analyzer-language.cc (run_callbacks): New function.
>     (on_finish_translation_unit): New function.
>     * analyzer-language.h (GCC_ANALYZER_LANGUAGE_H): New include.
>     (class translation_unit): New vfuncs.
> 
> gcc/c/ChangeLog:
> PR analyzer/107646
>     * c-parser.cc: New functions on stashing values for the
>   analyzer.
> 
> gcc/testsuite/ChangeLog:
> PR analyzer/107646
>     * gcc.dg/plugin/plugin.exp: Add new plugin and test.
>     * gcc.dg/plugin/analyzer_cpython_plugin.c: New plugin.
>     * gcc.dg/plugin/cpython-plugin-test-1.c: New test.
> 
> Signed-off-by: Eric Feng 
> ---
>  gcc/analyzer/analyzer-language.cc |  22 ++
>  gcc/analyzer/analyzer-language.h  |   9 +
>  gcc/c/c-parser.cc |  26 ++
>  .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 230
> ++
>  .../gcc.dg/plugin/cpython-plugin-test-1.c |   8 +
>  gcc/testsuite/gcc.dg/plugin/plugin.exp    |   2 +
>  6 files changed, 297 insertions(+)
>  create mode 100644
> gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
>  create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-
> 1.c
> 
> diff --git a/gcc/analyzer/analyzer-language.cc
> b/gcc/analyzer/analyzer-language.cc
> index 2c8910906ee..85400288a93 100644
> --- a/gcc/analyzer/analyzer-language.cc
> +++ b/gcc/analyzer/analyzer-language.cc
> @@ -35,6 +35,26 @@ static GTY (()) hash_map<tree, tree>
> *analyzer_stashed_constants;
>  #if ENABLE_ANALYZER
>  
>  namespace ana {
> +static vec<finish_translation_unit_callback>
> +    *finish_translation_unit_callbacks;
> +
> +void
> +register_finish_translation_unit_callback (
> +    finish_translation_unit_callback callback)
> +{
> +  if (!finish_translation_unit_callbacks)
> +    vec_alloc (finish_translation_unit_callbacks, 1);
> +  finish_translation_unit_callbacks->safe_push (callback);
> +}
> +
> +static void
> +run_callbacks (logger *logger, const translation_unit &tu)
> +{
> +  for (auto const &cb : finish_translation_unit_callbacks)
> +    {
> +  cb (logger, tu);
> +    }
> +}
>  
>  /* Call into TU to try to find a value for NAME.
>     If found, stash its value within analyzer_stashed_constants.  */
> @@ -102,6 +122,8 @@ on_finish_translation_unit (const
> translation_unit &tu)
>  the_logger.set_logger (new logger (logfile, 0, 0,
>    *global_dc->printer));
>    stash_named_constants (the_logger.get_logger (), tu);
> +
> +  run_callbacks (the_logger.get_logger (), tu);
>  }
>  
>  /* Lookup NAME in the named constants stashed when the frontend TU
> finished.
> diff --git a/gcc/analyzer/analyzer-language.h
> b/gcc/analyzer/analyzer-language.h
> index 00f85aba041..8deea52d627 100644
> --- a/gcc/analyzer/analyzer-language.h
> +++ b/gcc/analyzer/analyzer-language.h
> @@ -21,6 +21,8 @@ along with GCC; see the file COPYING3.  If not see
>  #ifndef GCC_ANALYZER_LANGUAGE_H
>  #define GCC_ANALYZER_LANGUAGE_H
>  
> +#include "analyzer/analyzer-logging.h"
> +
>  #if ENABLE_ANALYZER
>  
>  namespace ana {
> @@ -35,8 +37,15 @@ class translation_unit
>   have been seen).  If it is defined and an integer (e.g. either
> as a
>   macro or enum), return the INTEGER_CST value, otherwise return
> NULL.  */
>    virtual tree lookup_constant_by_id (tree id) const = 0;
> +  virtual tree lookup_type_by_id (tree id) const = 0;
> +  virtual tree lookup_global_var_by_id (tree id) const = 0;
>  };
>  
> +typedef void (*finish_translation_unit_callback)
> +   (logger *, const translation_unit &);
> +void register_finish_translation_unit_callback (
> +    finish_translation_unit_callback callback);
> +
>  /* Analyzer hook for frontends to call at the end of the TU.  */
>  
>  void on_finish_translation_unit (const translation_unit &tu);
>

[PATCH v2 2/3] openmp, nvptx: low-lat memory access traits

2023-08-02 Thread Andrew Stubbs

The NVPTX low-latency memory is not accessible outside the team that allocates
it, and therefore should be unavailable for allocators with the access trait
"all".  This change means that the omp_low_lat_mem_alloc predefined
allocator now implicitly implies the "pteam" trait.
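
A hedged sketch of the user-visible effect (it mirrors the updated testcases
below rather than adding a new one): an explicit allocator in
omp_low_lat_mem_space now needs an access trait narrower than "all" for the
allocation to succeed on this target:

  #include <omp.h>

  void
  use_lowlat (void)
  {
    #pragma omp target
    {
      omp_alloctrait_t traits[2]
        = { { omp_atk_fallback, omp_atv_null_fb },
            { omp_atk_access,   omp_atv_pteam } };  /* team-local access */
      omp_allocator_handle_t a
        = omp_init_allocator (omp_low_lat_mem_space, 2, traits);

      int *p = (int *) omp_alloc (4 * sizeof (int), a);
      /* Requesting omp_atv_all instead would be refused for this memspace,
         so the omp_atv_null_fb fallback would yield NULL.  */
      omp_free (p, a);
      omp_destroy_allocator (a);
    }
  }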

libgomp/ChangeLog:

* allocator.c (MEMSPACE_VALIDATE): New macro.
(omp_aligned_alloc): Use MEMSPACE_VALIDATE.
(omp_aligned_calloc): Likewise.
(omp_realloc): Likewise.
* config/nvptx/allocator.c (nvptx_memspace_validate): New function.
(MEMSPACE_VALIDATE): New macro.
* testsuite/libgomp.c/omp_alloc-4.c (main): Add access trait.
* testsuite/libgomp.c/omp_alloc-6.c (main): Add access trait.
* testsuite/libgomp.c/omp_alloc-traits.c: New test.
---
 libgomp/allocator.c   | 16 +
 libgomp/config/nvptx/allocator.c  | 11 +++
 libgomp/testsuite/libgomp.c/omp_alloc-4.c |  7 +-
 libgomp/testsuite/libgomp.c/omp_alloc-6.c |  7 +-
 .../testsuite/libgomp.c/omp_alloc-traits.c| 68 +++
 5 files changed, 103 insertions(+), 6 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c/omp_alloc-traits.c

diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index fbf7b1ab061..35b8ec71480 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -56,6 +56,10 @@
 #define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE) \
   free (((void)(MEMSPACE), (void)(SIZE), (ADDR)))
 #endif
+#ifndef MEMSPACE_VALIDATE
+#define MEMSPACE_VALIDATE(MEMSPACE, ACCESS) \
+  (((void)(MEMSPACE), (void)(ACCESS), 1))
+#endif
 
 /* Map the predefined allocators to the correct memory space.
The index to this table is the omp_allocator_handle_t enum value.
@@ -507,6 +511,10 @@ retry:
   if (__builtin_add_overflow (size, new_size, &new_size))
 goto fail;
 
+  if (allocator_data
+  && !MEMSPACE_VALIDATE (allocator_data->memspace, allocator_data->access))
+goto fail;
+
   if (__builtin_expect (allocator_data
 			&& allocator_data->pool_size < ~(uintptr_t) 0, 0))
 {
@@ -817,6 +825,10 @@ retry:
   if (__builtin_add_overflow (size_temp, new_size, &new_size))
 goto fail;
 
+  if (allocator_data
+  && !MEMSPACE_VALIDATE (allocator_data->memspace, allocator_data->access))
+goto fail;
+
   if (__builtin_expect (allocator_data
 			&& allocator_data->pool_size < ~(uintptr_t) 0, 0))
 {
@@ -1063,6 +1075,10 @@ retry:
 goto fail;
   old_size = data->size;
 
+  if (allocator_data
+  && !MEMSPACE_VALIDATE (allocator_data->memspace, allocator_data->access))
+goto fail;
+
   if (__builtin_expect (allocator_data
 			&& allocator_data->pool_size < ~(uintptr_t) 0, 0))
 {
diff --git a/libgomp/config/nvptx/allocator.c b/libgomp/config/nvptx/allocator.c
index 6014fba177f..f19ac28d32a 100644
--- a/libgomp/config/nvptx/allocator.c
+++ b/libgomp/config/nvptx/allocator.c
@@ -108,6 +108,15 @@ nvptx_memspace_realloc (omp_memspace_handle_t memspace, void *addr,
 return realloc (addr, size);
 }
 
+static inline int
+nvptx_memspace_validate (omp_memspace_handle_t memspace, unsigned access)
+{
+  /* Disallow use of low-latency memory when it must be accessible by
+ all threads.  */
+  return (memspace != omp_low_lat_mem_space
+	  || access != omp_atv_all);
+}
+
 #define MEMSPACE_ALLOC(MEMSPACE, SIZE) \
   nvptx_memspace_alloc (MEMSPACE, SIZE)
 #define MEMSPACE_CALLOC(MEMSPACE, SIZE) \
@@ -116,5 +125,7 @@ nvptx_memspace_realloc (omp_memspace_handle_t memspace, void *addr,
   nvptx_memspace_realloc (MEMSPACE, ADDR, OLDSIZE, SIZE)
 #define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE) \
   nvptx_memspace_free (MEMSPACE, ADDR, SIZE)
+#define MEMSPACE_VALIDATE(MEMSPACE, ACCESS) \
+  nvptx_memspace_validate (MEMSPACE, ACCESS)
 
 #include "../../allocator.c"
diff --git a/libgomp/testsuite/libgomp.c/omp_alloc-4.c b/libgomp/testsuite/libgomp.c/omp_alloc-4.c
index 66e13c09234..9d169858151 100644
--- a/libgomp/testsuite/libgomp.c/omp_alloc-4.c
+++ b/libgomp/testsuite/libgomp.c/omp_alloc-4.c
@@ -23,10 +23,11 @@ main ()
   #pragma omp target
   {
 /* Ensure that the memory we get *is* low-latency with a null-fallback.  */
-omp_alloctrait_t traits[1]
-  = { { omp_atk_fallback, omp_atv_null_fb } };
+omp_alloctrait_t traits[2]
+  = { { omp_atk_fallback, omp_atv_null_fb },
+  { omp_atk_access, omp_atv_pteam } };
 omp_allocator_handle_t lowlat = omp_init_allocator (omp_low_lat_mem_space,
-			1, traits);
+			2, traits);
 
 int size = 4;
 
diff --git a/libgomp/testsuite/libgomp.c/omp_alloc-6.c b/libgomp/testsuite/libgomp.c/omp_alloc-6.c
index 66bf69b0455..b5f0a296998 100644
--- a/libgomp/testsuite/libgomp.c/omp_alloc-6.c
+++ b/libgomp/testsuite/libgomp.c/omp_alloc-6.c
@@ -23,10 +23,11 @@ main ()
   #pragma omp target
   {
 /* Ensure that the memory we get *is* low-latency with a null-fallback.  */
-omp_alloctrait_t traits[1]
-  = { { omp_atk_fallback, omp_atv_null_fb } };
+omp_alloctrait_t traits[2]

[PATCH v2 0/3] libgomp: OpenMP low-latency omp_alloc

2023-08-02 Thread Andrew Stubbs
This patch series is an updated and reworked version of some of the patch set
posted about a year ago (the other features will be posted soon), this
time supporting amdgcn, in addition to nvptx:

https://patchwork.sourceware.org/project/gcc/list/?series=10748&state=%2A&archive=both

The series implements device-specific allocators and adds a low-latency
allocator for both GPUs architectures.

The previous review comments have been addressed, I hope, plus a lot of
bugs have been found and fixed since the original post.  With the
addition of amdgcn I have broken out the heap implementation so both
architectures can share the code.

Andrew

Andrew Stubbs (3):
  libgomp, nvptx: low-latency memory allocator
  openmp, nvptx: low-lat memory access traits
  amdgcn, libgomp: low-latency allocator

 gcc/config/gcn/gcn-builtins.def   |   2 +
 gcc/config/gcn/gcn.cc |  16 +-
 libgomp/allocator.c   | 269 +
 libgomp/basic-allocator.c | 380 ++
 libgomp/config/gcn/allocator.c| 123 ++
 libgomp/config/gcn/libgomp-gcn.h  |   6 +
 libgomp/config/gcn/team.c |  12 +
 libgomp/config/nvptx/allocator.c  | 131 ++
 libgomp/config/nvptx/team.c   |  18 +
 libgomp/libgomp.h |   3 -
 libgomp/plugin/plugin-gcn.c   |  35 +-
 libgomp/plugin/plugin-nvptx.c |  23 +-
 libgomp/testsuite/libgomp.c/omp_alloc-1.c |  56 +++
 libgomp/testsuite/libgomp.c/omp_alloc-2.c |  64 +++
 libgomp/testsuite/libgomp.c/omp_alloc-3.c |  42 ++
 libgomp/testsuite/libgomp.c/omp_alloc-4.c | 197 +
 libgomp/testsuite/libgomp.c/omp_alloc-5.c |  63 +++
 libgomp/testsuite/libgomp.c/omp_alloc-6.c | 118 ++
 .../testsuite/libgomp.c/omp_alloc-traits.c|  68 
 19 files changed, 1528 insertions(+), 98 deletions(-)
 create mode 100644 libgomp/basic-allocator.c
 create mode 100644 libgomp/config/gcn/allocator.c
 create mode 100644 libgomp/config/nvptx/allocator.c
 create mode 100644 libgomp/testsuite/libgomp.c/omp_alloc-1.c
 create mode 100644 libgomp/testsuite/libgomp.c/omp_alloc-2.c
 create mode 100644 libgomp/testsuite/libgomp.c/omp_alloc-3.c
 create mode 100644 libgomp/testsuite/libgomp.c/omp_alloc-4.c
 create mode 100644 libgomp/testsuite/libgomp.c/omp_alloc-5.c
 create mode 100644 libgomp/testsuite/libgomp.c/omp_alloc-6.c
 create mode 100644 libgomp/testsuite/libgomp.c/omp_alloc-traits.c

-- 
2.41.0



[PATCH v2 1/3] libgomp, nvptx: low-latency memory allocator

2023-08-02 Thread Andrew Stubbs

This patch adds support for allocating low-latency ".shared" memory on
the NVPTX GPU device, via omp_low_lat_mem_space and omp_alloc.  The memory
can be allocated, reallocated, and freed using a basic but fast algorithm,
is thread safe, and the size of the low-latency heap can be configured using
the GOMP_NVPTX_LOWLAT_POOL environment variable.

The use of the PTX dynamic_smem_size feature means that the low-latency
allocator will not work with the PTX 3.1 multilib.
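
A hedged usage sketch (not taken from the new tests): the predefined
omp_low_lat_mem_alloc allocator can be used directly inside a target region,
with the pool size chosen at launch time, e.g. GOMP_NVPTX_LOWLAT_POOL=16384:

  #include <omp.h>

  void
  f (void)
  {
    #pragma omp target
    #pragma omp parallel
    {
      /* Served from the fast .shared pool when it fits; behaviour on
         exhaustion depends on the allocator's fallback traits.  */
      int *scratch = (int *) omp_alloc (32 * sizeof (int),
                                        omp_low_lat_mem_alloc);
      if (scratch)
        scratch[omp_get_thread_num () % 32] = 1;
      omp_free (scratch, omp_low_lat_mem_alloc);
    }
  }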

libgomp/ChangeLog:

* allocator.c (MEMSPACE_ALLOC): New macro.
(MEMSPACE_CALLOC): New macro.
(MEMSPACE_REALLOC): New macro.
(MEMSPACE_FREE): New macro.
(predefined_alloc_mapping): New array.
(omp_aligned_alloc): Use MEMSPACE_ALLOC.
Implement fall-backs for predefined allocators.
(omp_free): Use MEMSPACE_FREE.
(omp_calloc): Use MEMSPACE_CALLOC.
(omp_realloc): Use MEMSPACE_REALLOC, MEMSPACE_ALLOC, and MEMSPACE_FREE.
* config/nvptx/team.c (__nvptx_lowlat_pool): New asm variable.
(__nvptx_lowlat_init): New prototype.
(gomp_nvptx_main): Call __nvptx_lowlat_init.
* plugin/plugin-nvptx.c (lowlat_pool_size): New variable.
(GOMP_OFFLOAD_init_device): Read the GOMP_NVPTX_LOWLAT_POOL envvar.
(GOMP_OFFLOAD_run): Apply lowlat_pool_size.
* basic-allocator.c: New file.
* config/nvptx/allocator.c: New file.
* testsuite/libgomp.c/omp_alloc-1.c: New test.
* testsuite/libgomp.c/omp_alloc-2.c: New test.
* testsuite/libgomp.c/omp_alloc-3.c: New test.
* testsuite/libgomp.c/omp_alloc-4.c: New test.
* testsuite/libgomp.c/omp_alloc-5.c: New test.
* testsuite/libgomp.c/omp_alloc-6.c: New test.

Co-authored-by: Kwok Cheung Yeung  
Co-Authored-By: Thomas Schwinge 
---
 libgomp/allocator.c   | 253 +-
 libgomp/basic-allocator.c | 380 ++
 libgomp/config/nvptx/allocator.c  | 120 +++
 libgomp/config/nvptx/team.c   |  18 +
 libgomp/plugin/plugin-nvptx.c |  23 +-
 libgomp/testsuite/libgomp.c/omp_alloc-1.c |  56 
 libgomp/testsuite/libgomp.c/omp_alloc-2.c |  64 
 libgomp/testsuite/libgomp.c/omp_alloc-3.c |  42 +++
 libgomp/testsuite/libgomp.c/omp_alloc-4.c | 196 +++
 libgomp/testsuite/libgomp.c/omp_alloc-5.c |  63 
 libgomp/testsuite/libgomp.c/omp_alloc-6.c | 117 +++
 11 files changed, 1244 insertions(+), 88 deletions(-)
 create mode 100644 libgomp/basic-allocator.c
 create mode 100644 libgomp/config/nvptx/allocator.c
 create mode 100644 libgomp/testsuite/libgomp.c/omp_alloc-1.c
 create mode 100644 libgomp/testsuite/libgomp.c/omp_alloc-2.c
 create mode 100644 libgomp/testsuite/libgomp.c/omp_alloc-3.c
 create mode 100644 libgomp/testsuite/libgomp.c/omp_alloc-4.c
 create mode 100644 libgomp/testsuite/libgomp.c/omp_alloc-5.c
 create mode 100644 libgomp/testsuite/libgomp.c/omp_alloc-6.c

diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index 90f2dcb60d6..fbf7b1ab061 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -37,6 +37,42 @@
 
 #define omp_max_predefined_alloc omp_thread_mem_alloc
 
+/* These macros may be overridden in config//allocator.c.
+   The following definitions (ab)use comma operators to avoid unused
+   variable errors.  */
+#ifndef MEMSPACE_ALLOC
+#define MEMSPACE_ALLOC(MEMSPACE, SIZE) \
+  malloc (((void)(MEMSPACE), (SIZE)))
+#endif
+#ifndef MEMSPACE_CALLOC
+#define MEMSPACE_CALLOC(MEMSPACE, SIZE) \
+  calloc (1, (((void)(MEMSPACE), (SIZE))))
+#endif
+#ifndef MEMSPACE_REALLOC
+#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE) \
+  realloc (ADDR, (((void)(MEMSPACE), (void)(OLDSIZE), (SIZE))))
+#endif
+#ifndef MEMSPACE_FREE
+#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE) \
+  free (((void)(MEMSPACE), (void)(SIZE), (ADDR)))
+#endif
+
+/* Map the predefined allocators to the correct memory space.
+   The index to this table is the omp_allocator_handle_t enum value.
+   When the user calls omp_alloc with a predefined allocator this
+   table determines what memory they get.  */
+static const omp_memspace_handle_t predefined_alloc_mapping[] = {
+  omp_default_mem_space,   /* omp_null_allocator. */
+  omp_default_mem_space,   /* omp_default_mem_alloc. */
+  omp_large_cap_mem_space, /* omp_large_cap_mem_alloc. */
+  omp_const_mem_space, /* omp_const_mem_alloc. */
+  omp_high_bw_mem_space,   /* omp_high_bw_mem_alloc. */
+  omp_low_lat_mem_space,   /* omp_low_lat_mem_alloc. */
+  omp_low_lat_mem_space,   /* omp_cgroup_mem_alloc. */
+  omp_low_lat_mem_space,   /* omp_pteam_mem_alloc. */
+  omp_low_lat_mem_space,   /* omp_thread_mem_alloc. */
+};
+
 enum gomp_numa_memkind_kind
 {
   GOMP_MEMKIND_NONE = 0,
@@ -522,7 +558,7 @@ retry:
 	}
   else
 #endif
-	ptr = malloc (new_size);
+	ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size);
   if (ptr == NULL)
 	{
 #ifdef HAVE_SYNC_BUILTINS
@@ -554,7 +590,13 @@ retry:
 	}
   else
 #e

[PATCH v2 3/3] amdgcn, libgomp: low-latency allocator

2023-08-02 Thread Andrew Stubbs

This implements the OpenMP low-latency memory allocator for AMD GCN using the
small per-team LDS memory (Local Data Store).

Since addresses can now refer to LDS space, the "Global" address space is
no longer compatible.  This patch therefore switches the backend to use
entirely "Flat" addressing (which supports both memories).  A future patch
will re-enable "global" instructions for cases where it is known to be safe
to do so.

gcc/ChangeLog:

* config/gcn/gcn-builtins.def (DISPATCH_PTR): New built-in.
* config/gcn/gcn.cc (gcn_init_machine_status): Disable global
addressing.
(gcn_expand_builtin_1): Implement GCN_BUILTIN_DISPATCH_PTR.

libgomp/ChangeLog:

* config/gcn/libgomp-gcn.h (TEAM_ARENA_START): Move to here.
(TEAM_ARENA_FREE): Likewise.
(TEAM_ARENA_END): Likewise.
(GCN_LOWLAT_HEAP): New.
* config/gcn/team.c (LITTLEENDIAN_CPU): New, and import hsa.h.
(__gcn_lowlat_init): New prototype.
(gomp_gcn_enter_kernel): Initialize the low-latency heap.
* libgomp.h (TEAM_ARENA_START): Move to libgomp.h.
(TEAM_ARENA_FREE): Likewise.
(TEAM_ARENA_END): Likewise.
* plugin/plugin-gcn.c (lowlat_size): New variable.
(print_kernel_dispatch): Label the group_segment_size purpose.
(init_environment_variables): Read GOMP_GCN_LOWLAT_POOL.
(create_kernel_dispatch): Pass low-latency head allocation to kernel.
(run_kernel): Use shadow; don't assume values.
* testsuite/libgomp.c/omp_alloc-traits.c: Enable for amdgcn.
* config/gcn/allocator.c: New file.
---
 gcc/config/gcn/gcn-builtins.def   |   2 +
 gcc/config/gcn/gcn.cc |  16 ++-
 libgomp/config/gcn/allocator.c| 123 ++
 libgomp/config/gcn/libgomp-gcn.h  |   6 +
 libgomp/config/gcn/team.c |  12 ++
 libgomp/libgomp.h |   3 -
 libgomp/plugin/plugin-gcn.c   |  35 -
 .../testsuite/libgomp.c/omp_alloc-traits.c|   2 +-
 8 files changed, 188 insertions(+), 11 deletions(-)
 create mode 100644 libgomp/config/gcn/allocator.c

diff --git a/gcc/config/gcn/gcn-builtins.def b/gcc/config/gcn/gcn-builtins.def
index 636a8e7a1a9..471457d7c23 100644
--- a/gcc/config/gcn/gcn-builtins.def
+++ b/gcc/config/gcn/gcn-builtins.def
@@ -164,6 +164,8 @@ DEF_BUILTIN (FIRST_CALL_THIS_THREAD_P, -1, "first_call_this_thread_p", B_INSN,
 	 _A1 (GCN_BTI_BOOL), gcn_expand_builtin_1)
 DEF_BUILTIN (KERNARG_PTR, -1, "kernarg_ptr", B_INSN, _A1 (GCN_BTI_VOIDPTR),
 	 gcn_expand_builtin_1)
+DEF_BUILTIN (DISPATCH_PTR, -1, "dispatch_ptr", B_INSN, _A1 (GCN_BTI_VOIDPTR),
+	 gcn_expand_builtin_1)
 DEF_BUILTIN (GET_STACK_LIMIT, -1, "get_stack_limit", B_INSN,
 	 _A1 (GCN_BTI_VOIDPTR), gcn_expand_builtin_1)
 
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 02f4dedec42..c4bf0e6ab92 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -109,7 +109,8 @@ gcn_init_machine_status (void)
 
   f = ggc_cleared_alloc ();
 
-  if (TARGET_GCN3)
+  // FIXME: re-enable global addressing with safety for LDS-flat addresses
+  //if (TARGET_GCN3)
 f->use_flat_addressing = true;
 
   return f;
@@ -4881,6 +4882,19 @@ gcn_expand_builtin_1 (tree exp, rtx target, rtx /*subtarget */ ,
 	  }
 	return ptr;
   }
+case GCN_BUILTIN_DISPATCH_PTR:
+  {
+	rtx ptr;
+	if (cfun->machine->args.reg[DISPATCH_PTR_ARG] >= 0)
+	   ptr = gen_rtx_REG (DImode,
+			  cfun->machine->args.reg[DISPATCH_PTR_ARG]);
+	else
+	  {
+	ptr = gen_reg_rtx (DImode);
+	emit_move_insn (ptr, const0_rtx);
+	  }
+	return ptr;
+  }
 case GCN_BUILTIN_FIRST_CALL_THIS_THREAD_P:
   {
 	/* Stash a marker in the unused upper 16 bits of s[0:1] to indicate
diff --git a/libgomp/config/gcn/allocator.c b/libgomp/config/gcn/allocator.c
new file mode 100644
index 000..151086ea225
--- /dev/null
+++ b/libgomp/config/gcn/allocator.c
@@ -0,0 +1,123 @@
+/* Copyright (C) 2023 Free Software Foundation, Inc.
+
+   This file is part of the GNU Offloading and Multi Processing Library
+   (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this progra

Re: [COMMITTEDv3] tree-optimization: [PR100864] `(a&!b) | b` is not opimized to `a | b` for comparisons

2023-08-02 Thread Prathamesh Kulkarni via Gcc-patches
On Mon, 31 Jul 2023 at 22:39, Andrew Pinski via Gcc-patches
 wrote:
>
> This is a new version of the patch.
> Instead of doing the matching of inversion comparison directly inside
> match, creating a new function (bitwise_inverted_equal_p) to do it.
> It is very similar to bitwise_equal_p that was added in 
> r14-2751-g2a3556376c69a1fb
> but instead it says `expr1 == ~expr2`. A follow on patch, will
> use this function in other patterns where we try to match `@0` and `(bit_not 
> @0)`.
>
> Changed the name bitwise_not_equal_p to bitwise_inverted_equal_p.
>
> Committed as approved after a Bootstrapped and test on x86_64-linux-gnu with 
> no regressions.
Hi Andrew,
Unfortunately, this patch (committed in
2bae476b511dc441bf61da8a49cca655575e7dd6) causes
segmentation fault for pr33133.c on aarch64-linux-gnu because of
infinite recursion.

Running the test under gdb shows:
Program received signal SIGSEGV, Segmentation fault.
operand_compare::operand_equal_p (this=0x29dc680
, arg0=0xf7789a68, arg1=0xf7789f30,
flags=16) at ../../gcc/gcc/fold-const.cc:3088
3088{
(gdb) bt
#0  operand_compare::operand_equal_p (this=0x29dc680
, arg0=0xf7789a68, arg1=0xf7789f30,
flags=16) at ../../gcc/gcc/fold-const.cc:3088
#1  0x00a90394 in operand_compare::verify_hash_value
(this=this@entry=0x29dc680 ,
arg0=arg0@entry=0xf7789a68, arg1=arg1@entry=0xf7789f30,
flags=flags@entry=0, ret=ret@entry=0xfc000157)
at ../../gcc/gcc/fold-const.cc:4074
#2  0x00a9351c in operand_compare::verify_hash_value
(ret=0xfc000157, flags=0, arg1=0xf7789f30,
arg0=0xf7789a68, this=0x29dc680 ) at
../../gcc/gcc/fold-const.cc:4072
#3  operand_compare::operand_equal_p (this=this@entry=0x29dc680
, arg0=arg0@entry=0xf7789a68,
arg1=arg1@entry=0xf7789f30, flags=flags@entry=0) at
../../gcc/gcc/fold-const.cc:3090
#4  0x00a9791c in operand_equal_p
(arg0=arg0@entry=0xf7789a68, arg1=arg1@entry=0xf7789f30,
flags=flags@entry=0) at ../../gcc/gcc/fold-const.cc:4105
#5  0x01d38dd0 in gimple_bitwise_inverted_equal_p
(expr1=0xf7789a68, expr2=0xf7789f30, valueize=
0x112d698 ) at
../../gcc/gcc/gimple-match-head.cc:284
#6  0x01d38e80 in gimple_bitwise_inverted_equal_p
(expr1=0xf7789a68, expr2=0xf77d0240,
valueize=0x112d698 ) at
../../gcc/gcc/gimple-match-head.cc:296
#7  0x01d38e80 in gimple_bitwise_inverted_equal_p
(expr1=0xf7789a68, expr2=0xf7789f30,
valueize=0x112d698 ) at
../../gcc/gcc/gimple-match-head.cc:296
#8  0x01d38e80 in gimple_bitwise_inverted_equal_p
(expr1=0xf7789a68, expr2=0xf77d0240,
...

It seems to recurse cyclically with expr2=0xf7789f30 ->
expr2=0xf77d0240, eventually leading to a segfault,
while expr1=0xf7789a68 remains the same throughout the stack frames.

Thanks,
Prathamesh
>
> PR tree-optimization/100864
>
> gcc/ChangeLog:
>
> * generic-match-head.cc (bitwise_inverted_equal_p): New function.
> * gimple-match-head.cc (bitwise_inverted_equal_p): New macro.
> (gimple_bitwise_inverted_equal_p): New function.
> * match.pd ((~x | y) & x): Use bitwise_inverted_equal_p
> instead of direct matching bit_not.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/bitops-3.c: New test.
> ---
>  gcc/generic-match-head.cc| 42 ++
>  gcc/gimple-match-head.cc | 71 
>  gcc/match.pd |  5 +-
>  gcc/testsuite/gcc.dg/tree-ssa/bitops-3.c | 67 ++
>  4 files changed, 183 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bitops-3.c
>
> diff --git a/gcc/generic-match-head.cc b/gcc/generic-match-head.cc
> index a71c0727b0b..ddaf22f2179 100644
> --- a/gcc/generic-match-head.cc
> +++ b/gcc/generic-match-head.cc
> @@ -121,3 +121,45 @@ bitwise_equal_p (tree expr1, tree expr2)
>  return wi::to_wide (expr1) == wi::to_wide (expr2);
>return operand_equal_p (expr1, expr2, 0);
>  }
> +
> +/* Return true if EXPR1 and EXPR2 have the bitwise opposite value,
> +   but not necessarily same type.
> +   The types can differ through nop conversions.  */
> +
> +static inline bool
> +bitwise_inverted_equal_p (tree expr1, tree expr2)
> +{
> +  STRIP_NOPS (expr1);
> +  STRIP_NOPS (expr2);
> +  if (expr1 == expr2)
> +return false;
> +  if (!tree_nop_conversion_p (TREE_TYPE (expr1), TREE_TYPE (expr2)))
> +return false;
> +  if (TREE_CODE (expr1) == INTEGER_CST && TREE_CODE (expr2) == INTEGER_CST)
> +return wi::to_wide (expr1) == ~wi::to_wide (expr2);
> +  if (operand_equal_p (expr1, expr2, 0))
> +return false;
> +  if (TREE_CODE (expr1) == BIT_NOT_EXPR
> +  && bitwise_equal_p (TREE_OPERAND (expr1, 0), expr2))
> +return true;
> +  if (TREE_CODE (expr2) == BIT_NOT_EXPR
> +  && bitwise_equal_p (expr1, TREE_OPERAND (expr2, 0)))
> +return true;
> +  if (COMPARISON_CLASS_P (expr1)
> +  

Re: [COMMITTEDv3] tree-optimization: [PR100864] `(a&!b) | b` is not opimized to `a | b` for comparisons

2023-08-02 Thread Andrew Pinski via Gcc-patches
On Wed, Aug 2, 2023 at 10:13 AM Prathamesh Kulkarni via Gcc-patches
 wrote:
>
> On Mon, 31 Jul 2023 at 22:39, Andrew Pinski via Gcc-patches
>  wrote:
> >
> > This is a new version of the patch.
> > Instead of doing the matching of inversion comparison directly inside
> > match, creating a new function (bitwise_inverted_equal_p) to do it.
> > It is very similar to bitwise_equal_p that was added in 
> > r14-2751-g2a3556376c69a1fb
> > but instead it says `expr1 == ~expr2`. A follow on patch, will
> > use this function in other patterns where we try to match `@0` and 
> > `(bit_not @0)`.
> >
> > Changed the name bitwise_not_equal_p to bitwise_inverted_equal_p.
> >
> > Committed as approved after a Bootstrapped and test on x86_64-linux-gnu 
> > with no regressions.
> Hi Andrew,
> Unfortunately, this patch (committed in
> 2bae476b511dc441bf61da8a49cca655575e7dd6) causes
> segmentation fault for pr33133.c on aarch64-linux-gnu because of
> infinite recursion.

A similar issue is recorded as PR 110874 which I am debugging right now.

Thanks,
Andrew

>
> Running the test under gdb shows:
> Program received signal SIGSEGV, Segmentation fault.
> operand_compare::operand_equal_p (this=0x29dc680
> , arg0=0xf7789a68, arg1=0xf7789f30,
> flags=16) at ../../gcc/gcc/fold-const.cc:3088
> 3088{
> (gdb) bt
> #0  operand_compare::operand_equal_p (this=0x29dc680
> , arg0=0xf7789a68, arg1=0xf7789f30,
> flags=16) at ../../gcc/gcc/fold-const.cc:3088
> #1  0x00a90394 in operand_compare::verify_hash_value
> (this=this@entry=0x29dc680 ,
> arg0=arg0@entry=0xf7789a68, arg1=arg1@entry=0xf7789f30,
> flags=flags@entry=0, ret=ret@entry=0xfc000157)
> at ../../gcc/gcc/fold-const.cc:4074
> #2  0x00a9351c in operand_compare::verify_hash_value
> (ret=0xfc000157, flags=0, arg1=0xf7789f30,
> arg0=0xf7789a68, this=0x29dc680 ) at
> ../../gcc/gcc/fold-const.cc:4072
> #3  operand_compare::operand_equal_p (this=this@entry=0x29dc680
> , arg0=arg0@entry=0xf7789a68,
> arg1=arg1@entry=0xf7789f30, flags=flags@entry=0) at
> ../../gcc/gcc/fold-const.cc:3090
> #4  0x00a9791c in operand_equal_p
> (arg0=arg0@entry=0xf7789a68, arg1=arg1@entry=0xf7789f30,
> flags=flags@entry=0) at ../../gcc/gcc/fold-const.cc:4105
> #5  0x01d38dd0 in gimple_bitwise_inverted_equal_p
> (expr1=0xf7789a68, expr2=0xf7789f30, valueize=
> 0x112d698 ) at
> ../../gcc/gcc/gimple-match-head.cc:284
> #6  0x01d38e80 in gimple_bitwise_inverted_equal_p
> (expr1=0xf7789a68, expr2=0xf77d0240,
> valueize=0x112d698 ) at
> ../../gcc/gcc/gimple-match-head.cc:296
> #7  0x01d38e80 in gimple_bitwise_inverted_equal_p
> (expr1=0xf7789a68, expr2=0xf7789f30,
> valueize=0x112d698 ) at
> ../../gcc/gcc/gimple-match-head.cc:296
> #8  0x01d38e80 in gimple_bitwise_inverted_equal_p
> (expr1=0xf7789a68, expr2=0xf77d0240,
> ...
>
> It seems to recurse cyclically with expr2=0xf7789f30 ->
> expr2=0xf77d0240 eventually leading to segfault.
> while expr1=0xf7789a68 remains same throughout the stack frames.
>
> Thanks,
> Prathamesh
> >
> > PR tree-optimization/100864
> >
> > gcc/ChangeLog:
> >
> > * generic-match-head.cc (bitwise_inverted_equal_p): New function.
> > * gimple-match-head.cc (bitwise_inverted_equal_p): New macro.
> > (gimple_bitwise_inverted_equal_p): New function.
> > * match.pd ((~x | y) & x): Use bitwise_inverted_equal_p
> > instead of direct matching bit_not.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/tree-ssa/bitops-3.c: New test.
> > ---
> >  gcc/generic-match-head.cc| 42 ++
> >  gcc/gimple-match-head.cc | 71 
> >  gcc/match.pd |  5 +-
> >  gcc/testsuite/gcc.dg/tree-ssa/bitops-3.c | 67 ++
> >  4 files changed, 183 insertions(+), 2 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bitops-3.c
> >
> > diff --git a/gcc/generic-match-head.cc b/gcc/generic-match-head.cc
> > index a71c0727b0b..ddaf22f2179 100644
> > --- a/gcc/generic-match-head.cc
> > +++ b/gcc/generic-match-head.cc
> > @@ -121,3 +121,45 @@ bitwise_equal_p (tree expr1, tree expr2)
> >  return wi::to_wide (expr1) == wi::to_wide (expr2);
> >return operand_equal_p (expr1, expr2, 0);
> >  }
> > +
> > +/* Return true if EXPR1 and EXPR2 have the bitwise opposite value,
> > +   but not necessarily same type.
> > +   The types can differ through nop conversions.  */
> > +
> > +static inline bool
> > +bitwise_inverted_equal_p (tree expr1, tree expr2)
> > +{
> > +  STRIP_NOPS (expr1);
> > +  STRIP_NOPS (expr2);
> > +  if (expr1 == expr2)
> > +return false;
> > +  if (!tree_nop_conversion_p (TREE_TYPE (expr1), TREE_TYPE (expr2)))
> > +return false;
> > +  if (TREE_CODE (expr1) == INTEGER_CST && TREE_CODE (expr2) == INTEGER_CST)
> > +return wi::to_

Re: [PATCH v2] analyzer: stash values for CPython plugin [PR107646]

2023-08-02 Thread Marek Polacek via Gcc-patches
On Wed, Aug 02, 2023 at 12:59:28PM -0400, David Malcolm wrote:
> On Wed, 2023-08-02 at 12:20 -0400, Eric Feng wrote:
> 
> Hi Eric, thanks for the updated patch.
> 
> Overall, looks good to me, although I'd drop the "Exited." from the
> "sorry" message (and thus from the dg-message directive), since the
> compiler is not exiting, it's just the particular plugin that's giving
> up (but let's not hold up the patch with a "bikeshed" discussion on the
> precise wording).
> 
> If Joseph or Marek approves the C parts of the patch, this will be OK
> to push to trunk.

[...]

> > index cf82b0306d1..617111b0f0a 100644
> > --- a/gcc/c/c-parser.cc
> > +++ b/gcc/c/c-parser.cc
> > @@ -1695,6 +1695,32 @@ public:
> >  return NULL_TREE;
> >    }
> >  
> > +  tree
> > +  lookup_type_by_id (tree id) const final override
> > +  {
> > +    if (tree type_decl = lookup_name (id))
> > +  {
> > +   if (TREE_CODE (type_decl) == TYPE_DECL)
> > + {
> > +   tree record_type = TREE_TYPE (type_decl);
> > +   if (TREE_CODE (record_type) == RECORD_TYPE)
> > + return record_type;
> > + }
> > +  }

I'd drop this set of { }, like below.  OK with that adjusted, thanks.

> > +
> > +    return NULL_TREE;
> > +  }
> > +
> > +  tree
> > +  lookup_global_var_by_id (tree id) const final override
> > +  {
> > +    if (tree var_decl = lookup_name (id))
> > +  if (TREE_CODE (var_decl) == VAR_DECL)
> > +   return var_decl;
> > +
> > +    return NULL_TREE;
> > +  }
> > +
> >  private:
> >    /* Attempt to get an INTEGER_CST from MACRO.
> >   Only handle the simplest cases: where MACRO's definition is a

Marek



[committed][RISC-V] Fix 20010221-1.c with zicond

2023-08-02 Thread Jeff Law via Gcc-patches



So we're being a bit too aggressive with the .opt zicond patterns.



(define_insn "*czero.eqz..opt1"
  [(set (match_operand:GPR 0 "register_operand"   "=r")
(if_then_else:GPR (eq (match_operand:X 1 "register_operand" "r")
  (const_int 0))
  (match_operand:GPR 2 "register_operand" "1")
  (match_operand:GPR 3 "register_operand" "r")))]
  "(TARGET_ZICOND || 1) && rtx_equal_p (operands[1], operands[2])"
  "czero.eqz\t%0,%3,%1"
)

The RTL semantics here are op0 = (op1 == 0) ? op1 : op2.  That maps 
directly to czero.eqz, i.e., we select op1 when we know it's zero, op2
otherwise.  So this pattern is fine.
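
As a hedged source-level illustration of what this first form covers
(consistent with the description above; it is not from the testsuite):

  /* op1 is both the condition and the value selected when it is zero:
       op0 = (op1 == 0) ? op1 : op2;   i.e.   op0 = (op1 == 0) ? 0 : op2;
     which is a single czero.eqz keyed on op1.  */
  long
  sel (long cond, long val)
  {
    return cond == 0 ? cond : val;
  }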





(define_insn "*czero.eqz..opt2"
  [(set (match_operand:GPR 0 "register_operand"   "=r")
(if_then_else:GPR (eq (match_operand:X 1 "register_operand" "r")
  (const_int 0))
  (match_operand:GPR 2 "register_operand" "r")
  (match_operand:GPR 3 "register_operand" "1")))]
  "(TARGET_ZICOND || 1) && rtx_equal_p (operands[1],  operands[3])"
  "czero.nez\t%0,%2,%1"
)


The RTL semantics of this pattern are: op0 = (op1 == 0) ? op2 : op1;

That's not something that can be expressed by the zicond extension as it 
selects op1 if and only if op1 is not equal to zero.





(define_insn "*czero.nez..opt3"
  [(set (match_operand:GPR 0 "register_operand"   "=r")
(if_then_else:GPR (ne (match_operand:X 1 "register_operand" "r")
  (const_int 0))
  (match_operand:GPR 2 "register_operand" "r")
  (match_operand:GPR 3 "register_operand" "1")))]
  "(TARGET_ZICOND || 1) && rtx_equal_p (operands[1], operands[3])"
  "czero.eqz\t%0,%2,%1"
)
The RTL semantics of this pattern are op0 = (op1 != 0) ? op2 : op1. 
That maps to czero.nez.  But the output template uses czero.eqz.  Oops.



(define_insn "*czero.nez..opt4"
  [(set (match_operand:GPR 0 "register_operand"   "=r")
(if_then_else:GPR (ne (match_operand:X 1 "register_operand" "r")
  (const_int 0))
  (match_operand:GPR 2 "register_operand" "1")
  (match_operand:GPR 3 "register_operand" "r")))]
  "(TARGET_ZICOND || 1) && rtx_equal_p (operands[1], operands[2])"
  "czero.nez\t%0,%3,%1"
)
The RTL semantics of this pattern are op0 = (op1 != 0) ? op1 : op2 which 
obviously doesn't match to any zicond instruction as op1 is selected 
when it is not zero.



So two of the patterns are just totally bogus as they are not 
implementable with zicond.  They are removed.  The asm template for the 
.opt3 pattern is fixed to use czero.nez and its name is changed to .opt2.


This fixes the known issues with the zicond.md bits.  Onward to the rest 
of the expansion work :-)


Committed to the trunk,

jeff

commit 1d5bc3285e8a115538442dc2aaa34d2b509e1f6e
Author: Jeff Law 
Date:   Wed Aug 2 13:16:23 2023 -0400

[committed][RISC-V] Fix 20010221-1.c with zicond

So we're being a bit too aggressive with the .opt zicond patterns.

> (define_insn "*czero.eqz..opt1"
>   [(set (match_operand:GPR 0 "register_operand"   "=r")
> (if_then_else:GPR (eq (match_operand:X 1 "register_operand" "r")
>   (const_int 0))
>   (match_operand:GPR 2 "register_operand" "1")
>   (match_operand:GPR 3 "register_operand" "r")))]
>   "(TARGET_ZICOND || 1) && rtx_equal_p (operands[1], operands[2])"
>   "czero.eqz\t%0,%3,%1"
> )
The RTL semantics here are op0 = (op1 == 0) ? op1 : op2.  That maps
directly to czero.eqz.  ie, we select op1 when we know it's zero, op2
otherwise.  So this pattern is fine.

> (define_insn "*czero.eqz..opt2"
>   [(set (match_operand:GPR 0 "register_operand"   "=r")
> (if_then_else:GPR (eq (match_operand:X 1 "register_operand" "r")
>   (const_int 0))
>   (match_operand:GPR 2 "register_operand" "r")
>   (match_operand:GPR 3 "register_operand" "1")))]
>   "(TARGET_ZICOND || 1) && rtx_equal_p (operands[1],  operands[3])"
>   "czero.nez\t%0,%2,%1"
> )

The RTL semantics of this pattern are: op0 = (op1 == 0) ? op2 : op1;

That's not something that can be expressed by the zicond extension as it
selects op1 if and only if op1 is not equal to zero.

> (define_insn "*czero.nez..opt3"
>   [(set (match_operand:GPR 0 "register_operand"   "=r")
> (if_then_else:GPR (ne (match_operand:X 1 "register_operand" "r")
>   (const_int 0))
>   (match_operand:GPR 2 "register_operand" "r")
>   

Re: [PATCH] tree-optimization/110838 - vectorization of widened shifts

2023-08-02 Thread Richard Biener via Gcc-patches
On Wed, 2 Aug 2023, Richard Sandiford wrote:

> Richard Biener  writes:
> > On Tue, 1 Aug 2023, Richard Sandiford wrote:
> >
> >> Richard Sandiford  writes:
> >> > Richard Biener via Gcc-patches  writes:
> >> >> The following makes sure to limit the shift operand when vectorizing
> >> >> (short)((int)x >> 31) via (short)x >> 31 as the out of bounds shift
> >> >> operand otherwise invokes undefined behavior.  When we determine
> >> >> whether we can demote the operand we know we at most shift in the
> >> >> sign bit so we can adjust the shift amount.
> >> >>
> >> >> Note this has the possibility of un-CSEing common shift operands
> >> >> as there's no good way to share pattern stmts between patterns.
> >> >> We'd have to separately pattern recognize the definition.
> >> >>
> >> >> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> >> >>
> >> >> Not sure about LSHIFT_EXPR, it probably has the same issue but
> >> >> the fallback optimistic zero for out-of-range shifts is at least
> >> >> "corrrect".  Not sure we ever try to demote rotates (probably not).
> >> >
> >> > I guess you mean "correct" for x86?  But that's just a quirk of x86.
> >> > IMO the behaviour is equally wrong for LSHIFT_EXPR.
> >
> > I meant "correct" for the constant folding that evaluates out-of-bound
> > shifts as zero.
> >
> >> Sorry for the multiple messages.  Wanted to get something out quickly
> >> because I wasn't sure how long it would take me to write this...
> >> 
> >> On rotates, for:
> >> 
> >> void
> >> foo (unsigned short *restrict ptr)
> >> {
> >>   for (int i = 0; i < 200; ++i)
> >> {
> >>   unsigned int x = ptr[i] & 0xff0;
> >>   ptr[i] = (x << 1) | (x >> 31);
> >> }
> >> }
> >> 
> >> we do get:
> >> 
> >> can narrow to unsigned:13 without loss of precision: _5 = x_12 r>> 31;
> >> 
> >> although aarch64 doesn't provide rrotate patterns, so nothing actually
> >> comes of it.
> >
> > I think it's still correct that we only need unsigned:13 for the input,
> > we know other bits are zero.  But of course when actually applying
> > this as documented
> >
> > /* Record that STMT_INFO could be changed from operating on TYPE to
> >operating on a type with the precision and sign given by PRECISION
> >and SIGN respectively.
> >
> > the operation itself has to be altered (the above doesn't suggest
> > promoting/demoting the operands to TYPE is the only thing to do).
> >
> > So it seems to be the burden is on the consumers of the information?
> 
> Yeah, textually that seems fair.  Not sure I was thinking of it in
> those terms at the time though. :)
> 
> >> I think the handling of variable shifts is flawed for other reasons.  
> >> Given:
> >> 
> >> void
> >> uu (unsigned short *restrict ptr1, unsigned short *restrict ptr2)
> >> {
> >>   for (int i = 0; i < 200; ++i)
> >> ptr1[i] = ptr1[i] >> ptr2[i];
> >> }
> >> 
> >> void
> >> us (unsigned short *restrict ptr1, short *restrict ptr2)
> >> {
> >>   for (int i = 0; i < 200; ++i)
> >> ptr1[i] = ptr1[i] >> ptr2[i];
> >> }
> >> 
> >> void
> >> su (short *restrict ptr1, unsigned short *restrict ptr2)
> >> {
> >>   for (int i = 0; i < 200; ++i)
> >> ptr1[i] = ptr1[i] >> ptr2[i];
> >> }
> >> 
> >> void
> >> ss (short *restrict ptr1, short *restrict ptr2)
> >> {
> >>   for (int i = 0; i < 200; ++i)
> >> ptr1[i] = ptr1[i] >> ptr2[i];
> >> }
> >> 
> >> we only narrow uu and ss, due to:
> >> 
> >>/* Ignore codes that don't take uniform arguments.  */
> >>if (!types_compatible_p (TREE_TYPE (op), type))
> >>  return;
> >
> > I suppose that's because we care about the shift operand at all here.
> > We could possibly use [0 .. precision-1] as known range for it
> > and only if that doesn't fit 'type' give up (and otherwise simply
> > ignore the input range of the shift operands here).
> >
> >> in vect_determine_precisions_from_range.  Maybe we should drop
> >> the shift handling from there and instead rely on
> >> vect_determine_precisions_from_users, extending:
> >> 
> >>if (TREE_CODE (shift) != INTEGER_CST
> >>|| !wi::ltu_p (wi::to_widest (shift), precision))
> >>  return;
> >> 
> >> to handle ranges where the max is known to be < precision.
> >> 
> >> There again, if masking is enough for right shifts and right rotates,
> >> maybe we should keep the current handling for them (with your fix)
> >> and skip the types_compatible_p check for those cases.
> >
> > I think it should be enough for left-shifts as well?  If we lshift
> > out like 0x100 << 9 so the lhs range is [0,0] the input range from
> > op0 will still make us use HImode.  I think we only ever get overly
> > conservative answers for left-shifts from this function?
> 
> But if we have:
> 
>   short x, y;
>   int z = (int) x << (int) y;
> 
> and at runtime, x == 1, y == 16, (short) z should be 0 (no UB),
> whereas x << y would invoke UB and x << (y & 15) would be 1.

True, but we start with the range of the LHS which in this case
would be of type 'i

RE: [PATCH 2/2][frontend]: Add novector C pragma

2023-08-02 Thread Joseph Myers
On Wed, 2 Aug 2023, Tamar Christina via Gcc-patches wrote:

> Ping.
> 
> > -Original Message-
> > From: Tamar Christina 
> > Sent: Wednesday, July 26, 2023 8:35 PM
> > To: Tamar Christina ; gcc-patches@gcc.gnu.org
> > Cc: nd ; jos...@codesourcery.com
> > Subject: RE: [PATCH 2/2][frontend]: Add novector C pragma
> > 
> > Hi, This is a respin of the patch taking in the feedback received from the 
> > C++
> > part.
> > 
> > Simultaneously it's also a ping 😊

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [C PATCH]: Add Walloc-type to warn about insufficient size in allocations

2023-08-02 Thread Martin Uecker via Gcc-patches
On Wednesday, 02.08.2023 at 16:45, Qing Zhao wrote:
> 
> > On Aug 1, 2023, at 10:31 AM, Martin Uecker  wrote:
> > 
> > On Tuesday, 01.08.2023 at 13:27, Qing Zhao wrote:
> > > 
> > > > On Aug 1, 2023, at 3:51 AM, Martin Uecker via Gcc-patches 
> > > >  wrote:
> > > > 
> > 
> > 
> > > > > Hi Martin,
> > > > > Just wondering if it'd be a good idea perhaps to warn if alloc size is
> > > > > not a multiple of TYPE_SIZE_UNIT instead of just less-than ?
> > > > > So it can catch cases like:
> > > > > int *p = malloc (sizeof (int) + 2); // probably intended malloc
> > > > > (sizeof (int) * 2)
> > > > > 
> > > > > FWIW, this is caught using -fanalyzer:
> > > > > f.c: In function 'f':
> > > > > f.c:3:12: warning: allocated buffer size is not a multiple of the
> > > > > pointee's size [CWE-131] [-Wanalyzer-allocation-size]
> > > > >3 |   int *p = __builtin_malloc (sizeof(int) + 2);
> > > > >  |^~
> > > > > 
> > > > > Thanks,
> > > > > Prathamesh
> > > > 
> > > > Yes, this is probably a good idea.  It might need special
> > > > logic for flexible array members then...
> > > 
> > > Why special logic for FAM on such warning? (Not a multiple of 
> > > TYPE_SIZE_UNIT for the element).
> > > 
> > 
> > For
> > 
> > struct { int n; char buf[]; } *p = malloc(sizeof *p + n);
> > p->n = n;
> > 
> > the size would not be a multiple.
> 
> But n is still a multiple of sizeof (char), right? Do I miss anything here?

Right, for a struct with FAM we could check that it is
sizeof () plus a multiple of the element size of the FAM.
Still special logic... 
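
Concretely, a source-level sketch of mine (not from the patch) of the
kind of check that would be needed for a struct with a FAM:

/* An allocation is "well-sized" when it covers the struct itself plus a
   whole number of FAM elements.  Simplified: it ignores the case where
   the FAM starts inside the struct's trailing padding.  */
#include <stdbool.h>
#include <stddef.h>

struct s { int n; double buf[]; };   /* FAM element size is sizeof (double) */

static bool
fam_alloc_size_ok (size_t alloc_size)
{
  if (alloc_size < sizeof (struct s))
    return false;
  return (alloc_size - sizeof (struct s)) % sizeof (double) == 0;
}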

Martin


> Qing
> > 
> > Martin
> > 
> > 
> > 
> > 
> 

-- 
Univ.-Prof. Dr. rer. nat. Martin Uecker
Graz University of Technology
Institute of Biomedical Imaging




[PATCH v3] analyzer: stash values for CPython plugin [PR107646]

2023-08-02 Thread Eric Feng via Gcc-patches
Revised:
-- Remove superfluous { }
-- Reword diagnostic

---

This patch adds a hook to the end of ana::on_finish_translation_unit
which calls relevant stashing-related callbacks registered during plugin
initialization. This feature is used to stash named types and global
variables for a CPython analyzer plugin [PR107646].
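
For illustration, a plugin would hook into this roughly as follows.
Only register_finish_translation_unit_callback, the callback signature
and the lookup_* vfuncs come from this patch; the other names are made
up (hypothetical sketch, not part of the patch):

/* Plugin-side, after #include "analyzer/analyzer-language.h",
   inside namespace ana (or with ana:: qualifiers).  */
static void
my_stash_globals (logger *logger, const translation_unit &tu)
{
  /* E.g. look up and remember plugin-specific globals and types via
     tu.lookup_global_var_by_id (...) and tu.lookup_type_by_id (...).  */
}

/* ... in the plugin's initialization code: */
register_finish_translation_unit_callback (my_stash_globals);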

gcc/analyzer/ChangeLog:
PR analyzer/107646
* analyzer-language.cc (run_callbacks): New function.
(on_finish_translation_unit): New function.
* analyzer-language.h (GCC_ANALYZER_LANGUAGE_H): New include.
(class translation_unit): New vfuncs.

gcc/c/ChangeLog:
PR analyzer/107646
* c-parser.cc: New functions on stashing values for the
  analyzer.

gcc/testsuite/ChangeLog:
PR analyzer/107646
* gcc.dg/plugin/plugin.exp: Add new plugin and test.
* gcc.dg/plugin/analyzer_cpython_plugin.c: New plugin.
* gcc.dg/plugin/cpython-plugin-test-1.c: New test.

Signed-off-by: Eric Feng 
---
 gcc/analyzer/analyzer-language.cc |  22 ++
 gcc/analyzer/analyzer-language.h  |   9 +
 gcc/c/c-parser.cc |  24 ++
 .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 230 ++
 .../gcc.dg/plugin/cpython-plugin-test-1.c |   8 +
 gcc/testsuite/gcc.dg/plugin/plugin.exp|   2 +
 6 files changed, 295 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-1.c

diff --git a/gcc/analyzer/analyzer-language.cc 
b/gcc/analyzer/analyzer-language.cc
index 2c8910906ee..85400288a93 100644
--- a/gcc/analyzer/analyzer-language.cc
+++ b/gcc/analyzer/analyzer-language.cc
@@ -35,6 +35,26 @@ static GTY (()) hash_map  
*analyzer_stashed_constants;
 #if ENABLE_ANALYZER
 
 namespace ana {
+static vec
+*finish_translation_unit_callbacks;
+
+void
+register_finish_translation_unit_callback (
+finish_translation_unit_callback callback)
+{
+  if (!finish_translation_unit_callbacks)
+vec_alloc (finish_translation_unit_callbacks, 1);
+  finish_translation_unit_callbacks->safe_push (callback);
+}
+
+static void
+run_callbacks (logger *logger, const translation_unit &tu)
+{
+  for (auto const &cb : finish_translation_unit_callbacks)
+{
+  cb (logger, tu);
+}
+}
 
 /* Call into TU to try to find a value for NAME.
If found, stash its value within analyzer_stashed_constants.  */
@@ -102,6 +122,8 @@ on_finish_translation_unit (const translation_unit &tu)
 the_logger.set_logger (new logger (logfile, 0, 0,
   *global_dc->printer));
   stash_named_constants (the_logger.get_logger (), tu);
+
+  run_callbacks (the_logger.get_logger (), tu);
 }
 
 /* Lookup NAME in the named constants stashed when the frontend TU finished.
diff --git a/gcc/analyzer/analyzer-language.h b/gcc/analyzer/analyzer-language.h
index 00f85aba041..8deea52d627 100644
--- a/gcc/analyzer/analyzer-language.h
+++ b/gcc/analyzer/analyzer-language.h
@@ -21,6 +21,8 @@ along with GCC; see the file COPYING3.  If not see
 #ifndef GCC_ANALYZER_LANGUAGE_H
 #define GCC_ANALYZER_LANGUAGE_H
 
+#include "analyzer/analyzer-logging.h"
+
 #if ENABLE_ANALYZER
 
 namespace ana {
@@ -35,8 +37,15 @@ class translation_unit
  have been seen).  If it is defined and an integer (e.g. either as a
  macro or enum), return the INTEGER_CST value, otherwise return NULL.  */
   virtual tree lookup_constant_by_id (tree id) const = 0;
+  virtual tree lookup_type_by_id (tree id) const = 0;
+  virtual tree lookup_global_var_by_id (tree id) const = 0;
 };
 
+typedef void (*finish_translation_unit_callback)
+   (logger *, const translation_unit &);
+void register_finish_translation_unit_callback (
+finish_translation_unit_callback callback);
+
 /* Analyzer hook for frontends to call at the end of the TU.  */
 
 void on_finish_translation_unit (const translation_unit &tu);
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index cf82b0306d1..a3f216d90f8 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -1695,6 +1695,30 @@ public:
 return NULL_TREE;
   }
 
+  tree
+  lookup_type_by_id (tree id) const final override
+  {
+if (tree type_decl = lookup_name (id))
+   if (TREE_CODE (type_decl) == TYPE_DECL)
+ {
+   tree record_type = TREE_TYPE (type_decl);
+   if (TREE_CODE (record_type) == RECORD_TYPE)
+ return record_type;
+ }
+
+return NULL_TREE;
+  }
+
+  tree
+  lookup_global_var_by_id (tree id) const final override
+  {
+if (tree var_decl = lookup_name (id))
+  if (TREE_CODE (var_decl) == VAR_DECL)
+   return var_decl;
+
+return NULL_TREE;
+  }
+
 private:
   /* Attempt to get an INTEGER_CST from MACRO.
  Only handle the simplest cases: where MACRO's definition is a single
diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c 
b/gcc/testsuite/gcc.dg/plugin/ana

Re: [PATCH v2] analyzer: stash values for CPython plugin [PR107646]

2023-08-02 Thread Eric Feng via Gcc-patches
On Wed, Aug 2, 2023 at 1:20 PM Marek Polacek  wrote:
>
> On Wed, Aug 02, 2023 at 12:59:28PM -0400, David Malcolm wrote:
> > On Wed, 2023-08-02 at 12:20 -0400, Eric Feng wrote:
> >
> > Hi Eric, thanks for the updated patch.
> >
> > Overall, looks good to me, although I'd drop the "Exited." from the
> > "sorry" message (and thus from the dg-message directive), since the
> > compiler is not exiting, it's just the particular plugin that's giving
> > up (but let's not hold up the patch with a "bikeshed" discussion on the
> > precise wording).
> >
> > If Joseph or Marek approves the C parts of the patch, this will be OK
> > to push to trunk.
>
Sounds good. Revised.
>
> > > index cf82b0306d1..617111b0f0a 100644
> > > --- a/gcc/c/c-parser.cc
> > > +++ b/gcc/c/c-parser.cc
> > > @@ -1695,6 +1695,32 @@ public:
> > >  return NULL_TREE;
> > >}
> > >
> > > +  tree
> > > +  lookup_type_by_id (tree id) const final override
> > > +  {
> > > +if (tree type_decl = lookup_name (id))
> > > +  {
> > > +   if (TREE_CODE (type_decl) == TYPE_DECL)
> > > + {
> > > +   tree record_type = TREE_TYPE (type_decl);
> > > +   if (TREE_CODE (record_type) == RECORD_TYPE)
> > > + return record_type;
> > > + }
> > > +  }
>
> I'd drop this set of { }, like below.  OK with that adjusted, thanks.
Sounds good — fixed.
>
> > > +
> > > +return NULL_TREE;
> > > +  }
> > > +
> > > +  tree
> > > +  lookup_global_var_by_id (tree id) const final override
> > > +  {
> > > +if (tree var_decl = lookup_name (id))
> > > +  if (TREE_CODE (var_decl) == VAR_DECL)
> > > +   return var_decl;
> > > +
> > > +return NULL_TREE;
> > > +  }
> > > +
> > >  private:
> > >/* Attempt to get an INTEGER_CST from MACRO.
> > >   Only handle the simplest cases: where MACRO's definition is a
>
> Marek
>

Thank you, everyone. I've submitted a new patch with the described
changes. As I do not yet have write access, could someone please help
me commit it? Otherwise, please let me know if I should request write
access first (the GettingStarted page suggested requesting someone
commit the patch for the first few patches before requesting write
access).

Best,
Eric


Re: [PATCH] RISC-V: Implement vector "average" autovec pattern.

2023-08-02 Thread Robin Dapp via Gcc-patches
> 1. How do you model round to +Inf (avg_floor) and round to -Inf (avg_ceil) ?

That's just specified by the +1 or the lack of it in the original pattern.
Actually the IFN is just a detour because we would create perfect code
if not for the fallback.  But as there is currently no way to check for
the existence of a narrowing shift we cannot circumvent the fallback.
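
In plain C the two flavours differ only by that "+ 1"; a rough sketch
of mine (not the actual IFN expansion nor the RVV code):

static inline unsigned char
avg_floor (unsigned char a, unsigned char b)
{
  return (unsigned char) (((unsigned int) a + b) >> 1);
}

static inline unsigned char
avg_ceil (unsigned char a, unsigned char b)
{
  return (unsigned char) (((unsigned int) a + b + 1) >> 1);
}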

> 2. Is it possible we could use vaadd[u] to model avg ?
In principle yes (I first read it wrong that overflow must not happen but the
specs actually say that it does not happen).
However, we would need to set a rounding mode before vaadd or check its current
value and provide a fallback.  Off the spot I can't imagine a workaround like
two vaadds or so.

Regards
 Robin


Re: [PATCH] gcc-13/changes.html: Add and fix URL to -fstrict-flex-array option.

2023-08-02 Thread Qing Zhao via Gcc-patches
Ping.

This is a very simple patch to correct a URL address in GCC13’s changes.html.
Currently, it’s pointing to a wrong address.

Okay for committing? 

> On Jul 21, 2023, at 3:02 PM, Qing Zhao  wrote:
> 
> Hi,
> 
> In the current GCC13 release note, the URL to the option -fstrict-flex-array
> is wrong (pointing to -Wstrict-flex-array).
> This is the change to correct the URL and also add the URL in another place
> where -fstrict-flex-array is mentioned.
> 
> I have checked the resulting HTML file, works well.
> 
> Okay for committing?
> 
> thanks.
> 
> Qing
> ---
> htdocs/gcc-13/changes.html | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
> index 68e8c5cc..39b63a84 100644
> --- a/htdocs/gcc-13/changes.html
> +++ b/htdocs/gcc-13/changes.html
> @@ -46,7 +46,7 @@ You may also want to check out our
>   will no longer issue warnings for out of
>   bounds accesses to trailing struct members of one-element array type
>   anymore. Instead it diagnoses accesses to trailing arrays according to
> -  -fstrict-flex-arrays. 
> +   href="https://gcc.gnu.org/onlinedocs/gcc-13.1.0/gcc/C-Dialect-Options.html#index-fstrict-flex-arrays";>-fstrict-flex-arrays.
>  
>  href="https://gcc.gnu.org/onlinedocs/gcc-13.1.0/gcc/Static-Analyzer-Options.html";>-fanalyzer
>   is still only suitable for analyzing C code.
>   In particular, using it on C++ is unlikely to give meaningful 
> output.
> @@ -213,7 +213,7 @@ You may also want to check out our
>  flexible array member for the purpose of accessing the elements of such
>  an array. By default, all trailing arrays in aggregates are treated as
>  flexible array members. Use the new command-line option
> -  href="https://gcc.gnu.org/onlinedocs/gcc-13.1.0/gcc/Warning-Options.html#index-Wstrict-flex-arrays";>-fstrict-flex-arrays
> +  href="https://gcc.gnu.org/onlinedocs/gcc-13.1.0/gcc/C-Dialect-Options.html#index-fstrict-flex-arrays";>-fstrict-flex-arrays
>  to control which array members are treated as flexible arrays.
>  
> 
> -- 
> 2.31.1
> 



Re: [PATCH] gcc-14/changes.html: Deprecate a GCC C extension on flexible array members.

2023-08-02 Thread Qing Zhao via Gcc-patches
Ping…

thanks.

Qing

> On Jul 10, 2023, at 3:11 PM, Qing Zhao  wrote:
> 
> Hi,
> 
> This is the change for the GCC14 releaes Notes on the deprecating of a C
> extension about flexible array members.
> 
> Okay for committing?
> 
> thanks.
> 
> Qing
> 
> 
> 
> *htdocs/gcc-14/changes.html (Caveats): Add notice about deprecating a C
> extension about flexible array members.
> ---
> htdocs/gcc-14/changes.html | 10 +-
> 1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
> index 3f797642..c7f2ce4d 100644
> --- a/htdocs/gcc-14/changes.html
> +++ b/htdocs/gcc-14/changes.html
> @@ -30,7 +30,15 @@ a work-in-progress.
> 
> Caveats
> 
> -  ...
> +  C:
> +  Support for the GCC extension, a structure containing a C99 flexible 
> array
> +  member, or a union containing such a structure, is not the last field 
> of
> +  another structure, is deprecated. Refer to
> +  https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html";>
> +  Zero Length Arrays.
> +  Any code relying on this extension should be modifed to ensure that
> +  C99 flexible array members only end up at the ends of structures.
> +  
> 
> 
> 
> -- 
> 2.31.1
> 



Re: [PATCH] tree-optimization/110838 - vectorization of widened shifts

2023-08-02 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> [...]
>> >> in vect_determine_precisions_from_range.  Maybe we should drop
>> >> the shift handling from there and instead rely on
>> >> vect_determine_precisions_from_users, extending:
>> >> 
>> >>   if (TREE_CODE (shift) != INTEGER_CST
>> >>   || !wi::ltu_p (wi::to_widest (shift), precision))
>> >> return;
>> >> 
>> >> to handle ranges where the max is known to be < precision.
>> >> 
>> >> There again, if masking is enough for right shifts and right rotates,
>> >> maybe we should keep the current handling for them (with your fix)
>> >> and skip the types_compatible_p check for those cases.
>> >
>> > I think it should be enough for left-shifts as well?  If we lshift
>> > out like 0x100 << 9 so the lhs range is [0,0] the input range from
>> > op0 will still make us use HImode.  I think we only ever get overly
>> > conservative answers for left-shifts from this function?
>> 
>> But if we have:
>> 
>>   short x, y;
>>   int z = (int) x << (int) y;
>> 
>> and at runtime, x == 1, y == 16, (short) z should be 0 (no UB),
>> whereas x << y would invoke UB and x << (y & 15) would be 1.
>
> True, but we start with the range of the LHS which in this case
> would be of type 'int' and thus 1 << 16 and not zero.  You
> might call that a failure of vect_determine_precisions_from_range
> of course, since it makes it not exactly a forward propagation ...

Ah, right, sorry.  I should have done more checking.

> [...]
>> > Originally I completely disabled shift support but that regressed
>> > the over-widen testcases a lot which at least have widened shifts
>> > by constants a lot.
>> >
>> > x86 has vector rotates only for AMD XOP (which is dead) plus
>> > some for V1TImode AFAICS, but I think we pattern-match rotates
>> > to shifts, so maybe the precision stuff is interesting for the
>> > case where we match the pattern rotate sequence for widenings?
>> >
>> > So for the types_compatible_p issue something along
>> > the following?  We could also exempt the shift operand from
>> > being covered by min_precision so the consumer would have
>> > to make sure it can be represented (I think that's never going
>> > to be an issue in practice until we get 256bit integers vectorized).
>> > It will have to fixup the shift operands anyway.
>> >
>> > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
>> > index e4ab8c2d65b..cdeeaf98a47 100644
>> > --- a/gcc/tree-vect-patterns.cc
>> > +++ b/gcc/tree-vect-patterns.cc
>> > @@ -6378,16 +6378,26 @@ vect_determine_precisions_from_range 
>> > (stmt_vec_info stmt_info, gassign *stmt)
>> >   }
>> > else if (TREE_CODE (op) == SSA_NAME)
>> >   {
>> > -   /* Ignore codes that don't take uniform arguments.  */
>> > -   if (!types_compatible_p (TREE_TYPE (op), type))
>> > +   /* Ignore codes that don't take uniform arguments.  For shifts
>> > +  the shift amount is known to be in-range.  */
>> 
>> I guess it's more "we can assume that the amount is in range"?
>
> Yes.
>
>> > +   if (code == LSHIFT_EXPR
>> > +   || code == RSHIFT_EXPR
>> > +   || code == LROTATE_EXPR
>> > +   || code == RROTATE_EXPR)
>> > + {
>> > +   min_value = wi::min (min_value, 0, sign);
>> > +   max_value = wi::max (max_value, TYPE_PRECISION (type), 
>> > sign);
>> 
>> LGTM for shifts right.  Because of the above lshift thing, I think we
>> need something like:
>> 
>>   if (code == LSHIFT_EXPR || code == LROTATE_EXPR)
>> {
>>   wide_int op_min_value, op_max_value;
>>   if (!vect_get_range_info (op, &op_min_value, op_max_value))
>> return;
>> 
>>   /* We can ignore left shifts by negative amounts, which are UB.  */
>>   min_value = wi::min (min_value, 0, sign);
>> 
>>   /* Make sure the highest non-UB shift amount doesn't become UB.  */
>>   op_max_value = wi::umin (op_max_value, TYPE_PRECISION (type));
>>   auto mask = wi::mask (TYPE_PRECISION (type), false,
>>  op_max_value.to_uhwi ());
>>   max_value = wi::max (max_value, mask, sign);
>> }
>> 
>> Does that look right?
>
> As said it looks overly conservative to me?  For example with my patch
> for
>
> void foo (signed char *v, int s)
> {
>   if (s < 1 || s > 7)
> return;
>   for (int i = 0; i < 1024; ++i)
> v[i] = v[i] << s;
> }
>
> I get
>
> t.c:5:21: note:   _7 has range [0xc000, 0x3f80]
> t.c:5:21: note:   can narrow to signed:15 without loss of precision: _7 = 
> _6 << s_12(D);
> t.c:5:21: note:   only the low 15 bits of _6 are significant
> t.c:5:21: note:   _6 has range [0xff80, 0x7f]
> ...
> t.c:5:21: note:   vect_recog_over_widening_pattern: detected: _7 = _6 << 
> s_12(D);
> t.c:5:21: note:   demoting int to signed short
> t.c:5:21: note:   Splitting statement: _6 = (int) _5;
> t.c:5:21: note:   into pattern statements: patt_24 = (signed short) _5;
> t.c:5:21: note:   and: patt_23 = (int) patt

Re: [PATCH v2] analyzer: stash values for CPython plugin [PR107646]

2023-08-02 Thread David Malcolm via Gcc-patches
On Wed, 2023-08-02 at 14:46 -0400, Eric Feng wrote:
> On Wed, Aug 2, 2023 at 1:20 PM Marek Polacek 
> wrote:
> > 
> > On Wed, Aug 02, 2023 at 12:59:28PM -0400, David Malcolm wrote:
> > > On Wed, 2023-08-02 at 12:20 -0400, Eric Feng wrote:
> > > 

[Dropping Joseph and Marek from the CC]

[...snip...]

> 
> 
> Thank you, everyone. I've submitted a new patch with the described
> changes. 

Thanks.

> As I do not yet have write access, could someone please help
> me commit it?

I've pushed the v3 patch to trunk as r14-2933-gfafe2d18f791c6; you can
see it at [1], so you're now officially a GCC contributor,
congratulations!

FWIW I had to do a little whitespace fixing on the ChangeLog entries
before the server-side hooks.commit-extra-checker would pass, as they
were indented with spaces, rather than tabs, so it complained thusly:

remote: *** The following commit was rejected by your 
hooks.commit-extra-checker script (status: 1)
remote: *** commit: 0a4a2dc7dad1dfe22be0b48fe0d8c50d216c8349
remote: *** ChangeLog format failed:
remote: *** ERR: line should start with a tab: "PR analyzer/107646"
remote: *** ERR: line should start with a tab: "* analyzer-language.cc 
(run_callbacks): New function."
remote: *** ERR: line should start with a tab: "
(on_finish_translation_unit): New function."
remote: *** ERR: line should start with a tab: "* analyzer-language.h 
(GCC_ANALYZER_LANGUAGE_H): New include."
remote: *** ERR: line should start with a tab: "(class 
translation_unit): New vfuncs."
remote: *** ERR: line should start with a tab: "PR analyzer/107646"
remote: *** ERR: line should start with a tab: "* c-parser.cc: New 
functions on stashing values for the"
remote: *** ERR: line should start with a tab: "  analyzer."
remote: *** ERR: line should start with a tab: "PR analyzer/107646"
remote: *** ERR: line should start with a tab: "* 
gcc.dg/plugin/plugin.exp: Add new plugin and test."
remote: *** ERR: line should start with a tab: "* 
gcc.dg/plugin/analyzer_cpython_plugin.c: New plugin."
remote: *** ERR: line should start with a tab: "* 
gcc.dg/plugin/cpython-plugin-test-1.c: New test."
remote: *** ERR: PR 107646 in subject but not in changelog: "analyzer: stash 
values for CPython plugin [PR107646]"
remote: *** 
remote: *** Please see: https://gcc.gnu.org/codingconventions.html#ChangeLogs
remote: *** 
remote: error: hook declined to update refs/heads/master
To git+ssh://gcc.gnu.org/git/gcc.git
 ! [remote rejected] master -> master (hook declined)
error: failed to push some refs to 'git+ssh://dmalc...@gcc.gnu.org/git/gcc.git'

...but this was a trivial fix.  You can test that patches are properly
formatted by running:

  ./contrib/gcc-changelog/git_check_commit.py HEAD

locally.
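
For reference, the checker wants every ChangeLog line below the file
header to start with a hard tab rather than spaces, i.e. (illustrative
only):

	PR analyzer/107646
	* analyzer-language.cc (run_callbacks): New function.
	(on_finish_translation_unit): New function.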


>  Otherwise, please let me know if I should request write
> access first (the GettingStarted page suggested requesting someone
> commit the patch for the first few patches before requesting write
> access).

Please go ahead and request write access now; we should have done this
in the "community bonding" phase of GSoC; sorry for not catching this.

Thanks again for the patch.  How's the followup work?  Are you close to
being able to post one or more of the simpler known_function
subclasses?

Dave

[1] 
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=fafe2d18f791c6b97b49af7c84b1b5703681c3af



Re: [COMMITTEDv3] tree-optimization: [PR100864] `(a&!b) | b` is not opimized to `a | b` for comparisons

2023-08-02 Thread Andrew Pinski via Gcc-patches
On Wed, Aug 2, 2023 at 10:14 AM Andrew Pinski  wrote:
>
> On Wed, Aug 2, 2023 at 10:13 AM Prathamesh Kulkarni via Gcc-patches
>  wrote:
> >
> > On Mon, 31 Jul 2023 at 22:39, Andrew Pinski via Gcc-patches
> >  wrote:
> > >
> > > This is a new version of the patch.
> > > Instead of doing the matching of inversion comparison directly inside
> > > match, creating a new function (bitwise_inverted_equal_p) to do it.
> > > It is very similar to bitwise_equal_p that was added in 
> > > r14-2751-g2a3556376c69a1fb
> > > but instead it says `expr1 == ~expr2`. A follow on patch, will
> > > use this function in other patterns where we try to match `@0` and 
> > > `(bit_not @0)`.
> > >
> > > Changed the name bitwise_not_equal_p to bitwise_inverted_equal_p.
> > >
> > > Committed as approved after a Bootstrapped and test on x86_64-linux-gnu 
> > > with no regressions.
> > Hi Andrew,
> > Unfortunately, this patch (committed in
> > 2bae476b511dc441bf61da8a49cca655575e7dd6) causes a
> > segmentation fault for pr33133.c on aarch64-linux-gnu because of
> > infinite recursion.
>
> A similar issue is recorded as PR 110874 which I am debugging right now.

Yes the issue is the same and is solved by the same patch.

Thanks,
Andrew

>
> Thanks,
> Andrew
>
> >
> > Running the test under gdb shows:
> > Program received signal SIGSEGV, Segmentation fault.
> > operand_compare::operand_equal_p (this=0x29dc680
> > , arg0=0xf7789a68, arg1=0xf7789f30,
> > flags=16) at ../../gcc/gcc/fold-const.cc:3088
> > 3088{
> > (gdb) bt
> > #0  operand_compare::operand_equal_p (this=0x29dc680
> > , arg0=0xf7789a68, arg1=0xf7789f30,
> > flags=16) at ../../gcc/gcc/fold-const.cc:3088
> > #1  0x00a90394 in operand_compare::verify_hash_value
> > (this=this@entry=0x29dc680 ,
> > arg0=arg0@entry=0xf7789a68, arg1=arg1@entry=0xf7789f30,
> > flags=flags@entry=0, ret=ret@entry=0xfc000157)
> > at ../../gcc/gcc/fold-const.cc:4074
> > #2  0x00a9351c in operand_compare::verify_hash_value
> > (ret=0xfc000157, flags=0, arg1=0xf7789f30,
> > arg0=0xf7789a68, this=0x29dc680 ) at
> > ../../gcc/gcc/fold-const.cc:4072
> > #3  operand_compare::operand_equal_p (this=this@entry=0x29dc680
> > , arg0=arg0@entry=0xf7789a68,
> > arg1=arg1@entry=0xf7789f30, flags=flags@entry=0) at
> > ../../gcc/gcc/fold-const.cc:3090
> > #4  0x00a9791c in operand_equal_p
> > (arg0=arg0@entry=0xf7789a68, arg1=arg1@entry=0xf7789f30,
> > flags=flags@entry=0) at ../../gcc/gcc/fold-const.cc:4105
> > #5  0x01d38dd0 in gimple_bitwise_inverted_equal_p
> > (expr1=0xf7789a68, expr2=0xf7789f30, valueize=
> > 0x112d698 ) at
> > ../../gcc/gcc/gimple-match-head.cc:284
> > #6  0x01d38e80 in gimple_bitwise_inverted_equal_p
> > (expr1=0xf7789a68, expr2=0xf77d0240,
> > valueize=0x112d698 ) at
> > ../../gcc/gcc/gimple-match-head.cc:296
> > #7  0x01d38e80 in gimple_bitwise_inverted_equal_p
> > (expr1=0xf7789a68, expr2=0xf7789f30,
> > valueize=0x112d698 ) at
> > ../../gcc/gcc/gimple-match-head.cc:296
> > #8  0x01d38e80 in gimple_bitwise_inverted_equal_p
> > (expr1=0xf7789a68, expr2=0xf77d0240,
> > ...
> >
> > It seems to recurse cyclically with expr2=0xf7789f30 ->
> > expr2=0xf77d0240, eventually leading to a segfault,
> > while expr1=0xf7789a68 remains the same throughout the stack frames.
> >
> > Thanks,
> > Prathamesh
> > >
> > > PR tree-optimization/100864
> > >
> > > gcc/ChangeLog:
> > >
> > > * generic-match-head.cc (bitwise_inverted_equal_p): New function.
> > > * gimple-match-head.cc (bitwise_inverted_equal_p): New macro.
> > > (gimple_bitwise_inverted_equal_p): New function.
> > > * match.pd ((~x | y) & x): Use bitwise_inverted_equal_p
> > > instead of direct matching bit_not.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.dg/tree-ssa/bitops-3.c: New test.
> > > ---
> > >  gcc/generic-match-head.cc| 42 ++
> > >  gcc/gimple-match-head.cc | 71 
> > >  gcc/match.pd |  5 +-
> > >  gcc/testsuite/gcc.dg/tree-ssa/bitops-3.c | 67 ++
> > >  4 files changed, 183 insertions(+), 2 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bitops-3.c
> > >
> > > diff --git a/gcc/generic-match-head.cc b/gcc/generic-match-head.cc
> > > index a71c0727b0b..ddaf22f2179 100644
> > > --- a/gcc/generic-match-head.cc
> > > +++ b/gcc/generic-match-head.cc
> > > @@ -121,3 +121,45 @@ bitwise_equal_p (tree expr1, tree expr2)
> > >  return wi::to_wide (expr1) == wi::to_wide (expr2);
> > >return operand_equal_p (expr1, expr2, 0);
> > >  }
> > > +
> > > +/* Return true if EXPR1 and EXPR2 have the bitwise opposite value,
> > > +   but not necessarily same type.
> > > +   The types can differ through nop conversions.  */
> > > +
> > > +static inline bool
> > > +

Re: Re: [PATCH] RISC-V: Implement vector "average" autovec pattern.

2023-08-02 Thread 钟居哲
I just checked LLVM:
https://godbolt.org/z/nMa6qnEeT 

This patch generally is reasonable so LGTM.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-08-03 02:49
To: 钟居哲; gcc-patches; palmer; kito.cheng; Jeff Law
CC: rdapp.gcc
Subject: Re: [PATCH] RISC-V: Implement vector "average" autovec pattern.
> 1. How do you model round to +Inf (avg_floor) and round to -Inf (avg_ceil) ?
 
That's just specified by the +1 or the lack of it in the original pattern.
Actually the IFN is just a detour because we would create perfect code
if not for the fallback.  But as there is currently no way to check for
the existence of a narrowing shift we cannot circumvent the fallback.
 
> 2. Is it possible we could use vaadd[u] to model avg ?
In principle yes (I first read it wrong that overflow must not happen but the
specs actually say that it does not happen).
However, we would need to set a rounding mode before vaadd or check its current
value and provide a fallback.  Off the spot I can't imagine a workaround like
two vaadds or so.
 
Regards
Robin
 


Re: Re: [PATCH] RISC-V: Implement vector "average" autovec pattern.

2023-08-02 Thread 钟居哲
Plz put your testcases into:

# widening operation only test on LMUL < 8
set AUTOVEC_TEST_OPTS [list \
  {-ftree-vectorize -O3 --param riscv-autovec-lmul=m1} \
  {-ftree-vectorize -O3 --param riscv-autovec-lmul=m2} \
  {-ftree-vectorize -O3 --param riscv-autovec-lmul=m4} \
  {-ftree-vectorize -O2 --param riscv-autovec-lmul=m1} \
  {-ftree-vectorize -O2 --param riscv-autovec-lmul=m2} \
  {-ftree-vectorize -O2 --param riscv-autovec-lmul=m4} ]
foreach op $AUTOVEC_TEST_OPTS {
  dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/widen/*.\[cS\]]] \
"" "$op"
}

You could either simply put them into the "widen" directory or create a new
directory.
Anyway, make sure you have fully tested it with LMUL = 1/2/4.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-08-03 02:49
To: 钟居哲; gcc-patches; palmer; kito.cheng; Jeff Law
CC: rdapp.gcc
Subject: Re: [PATCH] RISC-V: Implement vector "average" autovec pattern.
> 1. How do you model round to +Inf (avg_floor) and round to -Inf (avg_ceil) ?
 
That's just specified by the +1 or the lack of it in the original pattern.
Actually the IFN is just a detour because we would create perfect code
if not for the fallback.  But as there is currently no way to check for
the existence of a narrowing shift we cannot circumvent the fallback.
 
> 2. Is it possible we could use vaadd[u] to model avg ?
In principle yes (I first read it wrong that overflow must not happen but the
specs actually say that it does not happen).
However, we would need to set a rounding mode before vaadd or check its current
value and provide a fallback.  Off the spot I can't imagine a workaround like
two vaadds or so.
 
Regards
Robin
 


Re: [PATCH 1/2] Move `~X & X` and `~X | X` over to use bitwise_inverted_equal_p

2023-08-02 Thread Andrew Pinski via Gcc-patches
On Wed, Aug 2, 2023 at 1:25 AM Jakub Jelinek via Gcc-patches
 wrote:
>
> On Wed, Aug 02, 2023 at 10:04:26AM +0200, Richard Biener via Gcc-patches 
> wrote:
> > > --- a/gcc/match.pd
> > > +++ b/gcc/match.pd
> > > @@ -1157,8 +1157,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > >
> > >  /* Simplify ~X & X as zero.  */
> > >  (simplify
> > > - (bit_and:c (convert? @0) (convert? (bit_not @0)))
> > > -  { build_zero_cst (type); })
> > > + (bit_and (convert? @0) (convert? @1))
> > > + (if (bitwise_inverted_equal_p (@0, @1))
> > > +  { build_zero_cst (type); }))
>
> I wonder if the above isn't incorrect.
> Without the possibility of widening converts it would be ok,
> but for widening conversions it is significant not just that
> the bits of @0 and @1 are inverted, but also that they are either
> both signed or both unsigned and so the MS bit (which is guaranteed
> to be different) extends to 0s in one case and to all 1s in the other
> one, so that even the upper bits are inverted.
> But that isn't the case here.  Something like (untested):
> long long
> foo (unsigned int x)
> {
>   int y = x;
>   y = ~y;
>   return ((long long) x) & y;
> }
> Actually maybe for this pattern it happens to be ok, because while
> the upper bits in this case might not be inverted between the extended
> operands (if x has msb set), it will be 0 & 0 in the upper bits.
>
> > >
> > >  /* PR71636: Transform x & ((1U << b) - 1) -> x & ~(~0U << b);  */
> > >  (simplify
> > > @@ -1395,8 +1396,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > >  /* ~x ^ x -> -1 */
> > >  (for op (bit_ior bit_xor)
> > >   (simplify
> > > -  (op:c (convert? @0) (convert? (bit_not @0)))
> > > -  (convert { build_all_ones_cst (TREE_TYPE (@0)); })))
> > > +  (op (convert? @0) (convert? @1))
> > > +  (if (bitwise_inverted_equal_p (@0, @1))
> > > +   (convert { build_all_ones_cst (TREE_TYPE (@0)); }
>
> But not here.
> long long
> bar (unsigned int x)
> {
>   int y = x;
>   y = ~y;
>   return ((long long) x) ^ y;
> }
>
> long long
> baz (unsigned int x)
> {
>   int y = x;
>   y = ~y;
>   return y ^ ((long long) x);
> }
> You pick TREE_TYPE (@0), but that is a random signedness if the two
> operands have different signedness.

Oh, you are correct.  I am testing a patch which adds a test to make
sure the types of @0 and @1 match, which brings us back to basically
what was done beforehand but still provides the benefit of using
bitwise_inverted_equal_p for the comparisons.

Thanks,
Andrew

>
> Jakub
>


[x86 PATCH] PR target/110792: Early clobber issues with rot32di2_doubleword.

2023-08-02 Thread Roger Sayle

This patch is a conservative fix for PR target/110792, a wrong-code
regression affecting doubleword rotations by BITS_PER_WORD, which
effectively swaps the highpart and lowpart words, when the source to be
rotated resides in memory. The issue is that if the register used to
hold the lowpart of the destination is mentioned in the address of
the memory operand, the current define_insn_and_split unintentionally
clobbers it before reading the highpart.

Hence, for the testcase, the incorrectly generated code looks like:

salq    $4, %rdi                // calculate address
movq    WHIRL_S+8(%rdi), %rdi   // accidentally clobber addr
movq    WHIRL_S(%rdi), %rbp     // load (wrong) lowpart

Traditionally, the textbook way to fix this would be to add an
explicit early clobber to the instruction's constraints.

 (define_insn_and_split "32di2_doubleword"
- [(set (match_operand:DI 0 "register_operand" "=r,r,r")
+ [(set (match_operand:DI 0 "register_operand" "=r,r,&r")
(any_rotate:DI (match_operand:DI 1 "nonimmediate_operand" "0,r,o")
   (const_int 32)))]

but unfortunately this currently generates significantly worse code,
due to a strange choice of reloads (effectively memcpy), which ends up
looking like:

salq    $4, %rdi                // calculate address
movdqa  WHIRL_S(%rdi), %xmm0    // load the double word in SSE reg.
movaps  %xmm0, -16(%rsp)        // store the SSE reg back to the stack
movq    -8(%rsp), %rdi          // load highpart
movq    -16(%rsp), %rbp         // load lowpart

Note that reload's "&" doesn't distinguish between the memory being
early clobbered, vs the registers used in an addressing mode being
early clobbered.

The fix proposed in this patch is to remove the third alternative, that
allowed offsetable memory as an operand, forcing reload to place the
operand into a register before the rotation.  This results in:

salq    $4, %rdi
movq    WHIRL_S(%rdi), %rax
movq    WHIRL_S+8(%rdi), %rdi
movq    %rax, %rbp

I believe there's a more advanced solution, by swapping the order of
the loads (if the first destination register is mentioned in the address),
or inserting a lea insn (if both destination registers are mentioned
in the address), but this fix is a minimal "safe" solution that
should hopefully be suitable for backporting.
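
For the record, a rough sketch (mine, not part of this patch) of what
the load-reordering variant could look like in the splitter; the
operand names are invented, only reg_mentioned_p, emit_move_insn and
XEXP are the usual RTL helpers:

/* Hypothetical splitter logic: read through the address before
   clobbering any register it mentions.  The case where both
   destination registers appear in the address would still need
   the extra lea.  */
if (reg_mentioned_p (dest_lo, XEXP (src_mem, 0)))
  {
    emit_move_insn (dest_hi, src_low_word);   /* leaves the address intact */
    emit_move_insn (dest_lo, src_high_word);  /* clobbers dest_lo last */
  }
else
  {
    emit_move_insn (dest_lo, src_high_word);
    emit_move_insn (dest_hi, src_low_word);
  }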

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2023-08-02  Roger Sayle  

gcc/ChangeLog
PR target/110792
* config/i386/i386.md (ti3): For rotations by 64 bits
place operand in a register before gen_64ti2_doubleword.
(di3): Likewise, for rotations by 32 bits, place
operand in a register before gen_32di2_doubleword.
(32di2_doubleword): Constrain operand to be in register.
(64ti2_doubleword): Likewise.

gcc/testsuite/ChangeLog
PR target/110792
* g++.target/i386/pr110792.C: New 32-bit C++ test case.
* gcc.target/i386/pr110792.c: New 64-bit C test case.


Thanks in advance,
Roger
--

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 4db210c..849e1de 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -15340,7 +15340,10 @@
 emit_insn (gen_ix86_ti3_doubleword
(operands[0], operands[1], operands[2]));
   else if (CONST_INT_P (operands[2]) && INTVAL (operands[2]) == 64)
-emit_insn (gen_64ti2_doubleword (operands[0], operands[1]));
+{
+  operands[1] = force_reg (TImode, operands[1]);
+  emit_insn (gen_64ti2_doubleword (operands[0], operands[1]));
+}
   else
 {
   rtx amount = force_reg (QImode, operands[2]);
@@ -15375,7 +15378,10 @@
 emit_insn (gen_ix86_di3_doubleword
(operands[0], operands[1], operands[2]));
   else if (CONST_INT_P (operands[2]) && INTVAL (operands[2]) == 32)
-emit_insn (gen_32di2_doubleword (operands[0], operands[1]));
+{
+  operands[1] = force_reg (DImode, operands[1]);
+  emit_insn (gen_32di2_doubleword (operands[0], operands[1]));
+}
   else
 FAIL;
 
@@ -15543,8 +15549,8 @@
 })
 
 (define_insn_and_split "32di2_doubleword"
- [(set (match_operand:DI 0 "register_operand" "=r,r,r")
-   (any_rotate:DI (match_operand:DI 1 "nonimmediate_operand" "0,r,o")
+ [(set (match_operand:DI 0 "register_operand" "=r,r")
+   (any_rotate:DI (match_operand:DI 1 "register_operand" "0,r")
   (const_int 32)))]
  "!TARGET_64BIT"
  "#"
@@ -15561,8 +15567,8 @@
 })
 
 (define_insn_and_split "64ti2_doubleword"
- [(set (match_operand:TI 0 "register_operand" "=r,r,r")
-   (any_rotate:TI (match_operand:TI 1 "nonimmediate_operand" "0,r,o")
+ [(set (match_operand:TI 0 "register_operand" "=r,r")
+   (any_rotate:TI (match_operand:TI 1 "register_operand" "0,r")
   (const_int 64)))]
 

[PATCH] MATCH: first of the value replacement moving from phiopt

2023-08-02 Thread Andrew Pinski via Gcc-patches
This moves a few simple patterns that are done in value replacement
in phiopt over to match.pd. Just the simple ones which might show up
in other code.

This allows some optimizations to happen even without depending on
sinking happening, and in some cases where phiopt is not invoked at all
(cond-1.c is an example there).
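
For instance (a small example of my own, equivalent to what the new
tests below check):

/* When a == 0 the second arm is b - 0 == b, so both arms agree and the
   conditional folds to just b - a.  */
int value_replace (int a, int b)
{
  return a == 0 ? b : b - a;
}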

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* match.pd (`a == 0 ? b : b + a`,
`a == 0 ? b : b - a`): New patterns.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/cond-1.c: New test.
* gcc.dg/tree-ssa/phi-opt-33.c: New test.
* gcc.dg/tree-ssa/phi-opt-34.c: New test.
---
 gcc/match.pd   | 14 ++
 gcc/testsuite/gcc.dg/tree-ssa/cond-1.c | 17 +
 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-33.c | 19 +++
 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-34.c | 17 +
 4 files changed, 67 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/cond-1.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-33.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-34.c

diff --git a/gcc/match.pd b/gcc/match.pd
index c62f205c13c..1ceff9691a0 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3832,6 +3832,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
&& (INTEGRAL_TYPE_P (TREE_TYPE (@0
(op (mult (convert:type @0) @2) @1
 
+/* ?: Value replacement. */
+/* a == 0 ? b : b + a  -> b + a */
+(for op (plus bit_ior bit_xor)
+ (simplify
+  (cond (eq @0 integer_zerop) @1 (op:c@2 @1 @0))
+   @2))
+/* a == 0 ? b : b - a  -> b - a */
+/* a == 0 ? b : b ptr+ a  -> b ptr+ a */
+/* a == 0 ? b : b shift/rotate a -> b shift/rotate a */
+(for op (lrotate rrotate lshift rshift minus pointer_plus)
+ (simplify
+  (cond (eq @0 integer_zerop) @1 (op@2 @1 @0))
+   @2))
+
 /* Simplifications of shift and rotates.  */
 
 (for rotate (lrotate rrotate)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cond-1.c 
b/gcc/testsuite/gcc.dg/tree-ssa/cond-1.c
new file mode 100644
index 000..478a818b206
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/cond-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-optimized-raw" } */
+
+int sub(int a, int b, int c, int d) {
+  int e = (a == 0);
+  int f = !e;
+  c = b;
+  d = b - a ;
+  return ((-e & c) | (-f & d));
+}
+
+/* In the end we end up with `(a == 0) ? (b - a) : b`
+   which then can be optimized to just `(b - a)`. */
+
+/* { dg-final { scan-tree-dump-not "cond_expr," "optimized" } } */
+/* { dg-final { scan-tree-dump-not "eq_expr," "optimized" } } */
+/* { dg-final { scan-tree-dump-times "minus_expr," 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-33.c 
b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-33.c
new file mode 100644
index 000..809ccfe1479
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-33.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* Phi-OPT should be able to optimize this without sinking being invoked. */
+/* { dg-options "-O -fdump-tree-phiopt2 -fdump-tree-optimized -fno-tree-sink" 
} */
+
+int f(int a, int b, int c) {
+  int d = a + b;
+  if (c > 5) return c;
+  if (a == 0) return b;
+  return d;
+}
+
+unsigned rot(unsigned x, int n) {
+  const int bits = __CHAR_BIT__ * __SIZEOF_INT__;
+  int t = ((x << n) | (x >> (bits - n)));
+  return (n == 0) ? x : t;
+}
+
+/* { dg-final { scan-tree-dump-times "goto" 2 "phiopt2" } } */
+/* { dg-final { scan-tree-dump-times "goto" 2 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-34.c 
b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-34.c
new file mode 100644
index 000..a90de8926c6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-34.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* Phi-OPT should be able to optimize this without sinking being invoked. */
+/* { dg-options "-O -fdump-tree-phiopt2 -fdump-tree-optimized -fno-tree-sink" 
} */
+
+char *f(char *a, __SIZE_TYPE__ b) {
+  char *d = a + b;
+  if (b == 0) return a;
+  return d;
+}
+int sub(int a, int b, int c) {
+  int d = a - b;
+  if (b == 0) return a;
+  return d;
+}
+
+/* { dg-final { scan-tree-dump-not "goto" "phiopt2" } } */
+/* { dg-final { scan-tree-dump-not "goto" "optimized" } } */
-- 
2.31.1



RE: [PATCH v1] RISC-V: Support RVV VFWSUB rounding mode intrinsic API

2023-08-02 Thread Li, Pan2 via Gcc-patches
Committed, thanks Kito.

Pan

From: Kito Cheng 
Sent: Wednesday, August 2, 2023 9:48 PM
To: Li, Pan2 
Cc: GCC Patches ; 钟居哲 ; Wang, 
Yanzhang 
Subject: Re: [PATCH v1] RISC-V: Support RVV VFWSUB rounding mode intrinsic API

LGTM, thanks:)

Pan Li via Gcc-patches <gcc-patches@gcc.gnu.org> wrote on Wednesday,
August 2, 2023 at 18:19:
From: Pan Li mailto:pan2...@intel.com>>

This patch would like to support the rounding mode API for the VFWSUB
for the below samples.

* __riscv_vfwsub_vv_f64m2_rm
* __riscv_vfwsub_vv_f64m2_rm_m
* __riscv_vfwsub_vf_f64m2_rm
* __riscv_vfwsub_vf_f64m2_rm_m
* __riscv_vfwsub_wv_f64m2_rm
* __riscv_vfwsub_wv_f64m2_rm_m
* __riscv_vfwsub_wf_f64m2_rm
* __riscv_vfwsub_wf_f64m2_rm_m

Signed-off-by: Pan Li mailto:pan2...@intel.com>>

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc (BASE): Add
vfwsub frm.
* config/riscv/riscv-vector-builtins-bases.h: Add declaration.
* config/riscv/riscv-vector-builtins-functions.def (vfwsub_frm):
Add vfwsub function definitions.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-widening-sub.c: New test.
---
 .../riscv/riscv-vector-builtins-bases.cc  |  3 +
 .../riscv/riscv-vector-builtins-bases.h   |  1 +
 .../riscv/riscv-vector-builtins-functions.def |  4 ++
 .../riscv/rvv/base/float-point-widening-sub.c | 66 +++
 4 files changed, 74 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-widening-sub.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 981a4a7ede8..ddf694c771c 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -317,6 +317,7 @@ public:

 /* Implements below instructions for frm
- vfwadd
+   - vfwsub
 */
 template
 class widen_binop_frm : public function_base
@@ -2100,6 +2101,7 @@ static CONSTEXPR const reverse_binop_frm 
vfrsub_frm_obj;
 static CONSTEXPR const widen_binop vfwadd_obj;
 static CONSTEXPR const widen_binop_frm vfwadd_frm_obj;
 static CONSTEXPR const widen_binop vfwsub_obj;
+static CONSTEXPR const widen_binop_frm vfwsub_frm_obj;
 static CONSTEXPR const binop vfmul_obj;
 static CONSTEXPR const binop vfdiv_obj;
 static CONSTEXPR const reverse_binop vfrdiv_obj;
@@ -2330,6 +2332,7 @@ BASE (vfrsub_frm)
 BASE (vfwadd)
 BASE (vfwadd_frm)
 BASE (vfwsub)
+BASE (vfwsub_frm)
 BASE (vfmul)
 BASE (vfdiv)
 BASE (vfrdiv)
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h 
b/gcc/config/riscv/riscv-vector-builtins-bases.h
index f9e1df5fe75..5800fca0169 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -150,6 +150,7 @@ extern const function_base *const vfrsub_frm;
 extern const function_base *const vfwadd;
 extern const function_base *const vfwadd_frm;
 extern const function_base *const vfwsub;
+extern const function_base *const vfwsub_frm;
 extern const function_base *const vfmul;
 extern const function_base *const vfmul;
 extern const function_base *const vfdiv;
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def 
b/gcc/config/riscv/riscv-vector-builtins-functions.def
index 743205a9b97..58a7224fe0c 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -306,8 +306,12 @@ DEF_RVV_FUNCTION (vfwsub, widen_alu, full_preds, f_wwv_ops)
 DEF_RVV_FUNCTION (vfwsub, widen_alu, full_preds, f_wwf_ops)
 DEF_RVV_FUNCTION (vfwadd_frm, widen_alu_frm, full_preds, f_wvv_ops)
 DEF_RVV_FUNCTION (vfwadd_frm, widen_alu_frm, full_preds, f_wvf_ops)
+DEF_RVV_FUNCTION (vfwsub_frm, widen_alu_frm, full_preds, f_wvv_ops)
+DEF_RVV_FUNCTION (vfwsub_frm, widen_alu_frm, full_preds, f_wvf_ops)
 DEF_RVV_FUNCTION (vfwadd_frm, widen_alu_frm, full_preds, f_wwv_ops)
 DEF_RVV_FUNCTION (vfwadd_frm, widen_alu_frm, full_preds, f_wwf_ops)
+DEF_RVV_FUNCTION (vfwsub_frm, widen_alu_frm, full_preds, f_wwv_ops)
+DEF_RVV_FUNCTION (vfwsub_frm, widen_alu_frm, full_preds, f_wwf_ops)

 // 13.4. Vector Single-Width Floating-Point Multiply/Divide Instructions
 DEF_RVV_FUNCTION (vfmul, alu, full_preds, f_vvv_ops)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-widening-sub.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-widening-sub.c
new file mode 100644
index 000..4325cc510a7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-widening-sub.c
@@ -0,0 +1,66 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3 -Wno-psabi" } */
+
+#include "riscv_vector.h"
+
+typedef float float32_t;
+
+vfloat64m2_t
+test_vfwsub_vv_f32m1_rm (vfloat32m1_t op1, vfloat32m1_t op2, size_t vl) {
+  return __riscv_vfwsub_vv_f64m2_rm (op1, op2, 0, vl);
+}
+
+vfloat64m2_t
+test_vfwsub_vv_f32m1_rm_m (vbool32_t mask, vfloat32m1_t op1, vfloat32m1_t op2,
+  size_t vl) {
+  return __risc

[PATCH v1] RISC-V: Support RVV VFMUL rounding mode intrinsic API

2023-08-02 Thread Pan Li via Gcc-patches
From: Pan Li 

This patch would like to support the rounding mode API for the VFMUL
for the below samples.

* __riscv_vfmul_vv_f32m1_rm
* __riscv_vfmul_vv_f32m1_rm_m
* __riscv_vfmul_vf_f32m1_rm
* __riscv_vfmul_vf_f32m1_rm_m

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(vfmul_frm_obj): New declaration.
(Base): Likewise.
* config/riscv/riscv-vector-builtins-bases.h: Likewise.
* config/riscv/riscv-vector-builtins-functions.def
(vfmul_frm): New function definition.
* config/riscv/vector.md: Add vfmul to frm_mode.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-single-mul.c: New test.
---
 .../riscv/riscv-vector-builtins-bases.cc  |  3 ++
 .../riscv/riscv-vector-builtins-bases.h   |  2 +-
 .../riscv/riscv-vector-builtins-functions.def |  2 +
 gcc/config/riscv/vector.md|  2 +-
 .../riscv/rvv/base/float-point-single-mul.c   | 44 +++
 5 files changed, 51 insertions(+), 2 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-mul.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index ddf694c771c..3adc11138a3 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -277,6 +277,7 @@ public:
 
 /* Implements below instructions for now.
- vfadd
+   - vfmul
 */
 template
 class binop_frm : public function_base
@@ -2103,6 +2104,7 @@ static CONSTEXPR const widen_binop_frm 
vfwadd_frm_obj;
 static CONSTEXPR const widen_binop vfwsub_obj;
 static CONSTEXPR const widen_binop_frm vfwsub_frm_obj;
 static CONSTEXPR const binop vfmul_obj;
+static CONSTEXPR const binop_frm vfmul_frm_obj;
 static CONSTEXPR const binop vfdiv_obj;
 static CONSTEXPR const reverse_binop vfrdiv_obj;
 static CONSTEXPR const widen_binop vfwmul_obj;
@@ -2334,6 +2336,7 @@ BASE (vfwadd_frm)
 BASE (vfwsub)
 BASE (vfwsub_frm)
 BASE (vfmul)
+BASE (vfmul_frm)
 BASE (vfdiv)
 BASE (vfrdiv)
 BASE (vfwmul)
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h 
b/gcc/config/riscv/riscv-vector-builtins-bases.h
index 5800fca0169..9c12a6b4e8f 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -152,7 +152,7 @@ extern const function_base *const vfwadd_frm;
 extern const function_base *const vfwsub;
 extern const function_base *const vfwsub_frm;
 extern const function_base *const vfmul;
-extern const function_base *const vfmul;
+extern const function_base *const vfmul_frm;
 extern const function_base *const vfdiv;
 extern const function_base *const vfrdiv;
 extern const function_base *const vfwmul;
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def 
b/gcc/config/riscv/riscv-vector-builtins-functions.def
index 58a7224fe0c..35a83ef239c 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -319,6 +319,8 @@ DEF_RVV_FUNCTION (vfmul, alu, full_preds, f_vvf_ops)
 DEF_RVV_FUNCTION (vfdiv, alu, full_preds, f_vvv_ops)
 DEF_RVV_FUNCTION (vfdiv, alu, full_preds, f_vvf_ops)
 DEF_RVV_FUNCTION (vfrdiv, alu, full_preds, f_vvf_ops)
+DEF_RVV_FUNCTION (vfmul_frm, alu_frm, full_preds, f_vvv_ops)
+DEF_RVV_FUNCTION (vfmul_frm, alu_frm, full_preds, f_vvf_ops)
 
 // 13.5. Vector Widening Floating-Point Multiply
 DEF_RVV_FUNCTION (vfwmul, alu, full_preds, f_wvv_ops)
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 65f36744f54..5d3e4256cd5 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -866,7 +866,7 @@ (define_attr "vxrm_mode" "rnu,rne,rdn,rod,none"
 
 ;; Defines rounding mode of an floating-point operation.
 (define_attr "frm_mode" "rne,rtz,rdn,rup,rmm,dyn,dyn_exit,dyn_call,none"
-  (cond [(eq_attr "type" "vfalu,vfwalu")
+  (cond [(eq_attr "type" "vfalu,vfwalu,vfmul")
  (cond
   [(match_test "INTVAL (operands[9]) == riscv_vector::FRM_RNE")
(const_string "rne")
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-mul.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-mul.c
new file mode 100644
index 000..e6410ea3a37
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-mul.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3 -Wno-psabi" } */
+
+#include "riscv_vector.h"
+
+typedef float float32_t;
+
+vfloat32m1_t
+test_riscv_vfmul_vv_f32m1_rm (vfloat32m1_t op1, vfloat32m1_t op2, size_t vl) {
+  return __riscv_vfmul_vv_f32m1_rm (op1, op2, 0, vl);
+}
+
+vfloat32m1_t
+test_vfmul_vv_f32m1_rm_m (vbool32_t mask, vfloat32m1_t op1, vfloat32m1_t op2,
+ size_t vl) {
+  return __riscv_vfmul_vv_f32m1_rm_m (mask, op1, op2, 1, vl);
+}
+
+vfloat32m1_t
+test_vfmul_vf_f32m1_rm (vfloat32m1_t op1, float32_t op2, size_t vl

Re: [PATCH v1] RISC-V: Support RVV VFMUL rounding mode intrinsic API

2023-08-02 Thread juzhe.zh...@rivai.ai
extern const function_base *const vfmul;
-extern const function_base *const vfmul;
+extern const function_base *const vfmul_frm;

It seems that there is a redundant declaration in the original code?
extern const function_base *const vfmul;
-extern const function_base *const vfmul;




juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-08-03 09:38
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support RVV VFMUL rounding mode intrinsic API
From: Pan Li 
 
This patch would like to support the rounding mode API for the VFMUL
for the below samples.
 
* __riscv_vfmul_vv_f32m1_rm
* __riscv_vfmul_vv_f32m1_rm_m
* __riscv_vfmul_vf_f32m1_rm
* __riscv_vfmul_vf_f32m1_rm_m
 
Signed-off-by: Pan Li 
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-builtins-bases.cc
(vfmul_frm_obj): New declaration.
(Base): Likewise.
* config/riscv/riscv-vector-builtins-bases.h: Likewise.
* config/riscv/riscv-vector-builtins-functions.def
(vfmul_frm): New function definition.
* config/riscv/vector.md: Add vfmul to frm_mode.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/float-point-single-mul.c: New test.
---
.../riscv/riscv-vector-builtins-bases.cc  |  3 ++
.../riscv/riscv-vector-builtins-bases.h   |  2 +-
.../riscv/riscv-vector-builtins-functions.def |  2 +
gcc/config/riscv/vector.md|  2 +-
.../riscv/rvv/base/float-point-single-mul.c   | 44 +++
5 files changed, 51 insertions(+), 2 deletions(-)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-mul.c
 
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index ddf694c771c..3adc11138a3 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -277,6 +277,7 @@ public:
/* Implements below instructions for now.
- vfadd
+   - vfmul
*/
template
class binop_frm : public function_base
@@ -2103,6 +2104,7 @@ static CONSTEXPR const widen_binop_frm 
vfwadd_frm_obj;
static CONSTEXPR const widen_binop vfwsub_obj;
static CONSTEXPR const widen_binop_frm vfwsub_frm_obj;
static CONSTEXPR const binop vfmul_obj;
+static CONSTEXPR const binop_frm vfmul_frm_obj;
static CONSTEXPR const binop vfdiv_obj;
static CONSTEXPR const reverse_binop vfrdiv_obj;
static CONSTEXPR const widen_binop vfwmul_obj;
@@ -2334,6 +2336,7 @@ BASE (vfwadd_frm)
BASE (vfwsub)
BASE (vfwsub_frm)
BASE (vfmul)
+BASE (vfmul_frm)
BASE (vfdiv)
BASE (vfrdiv)
BASE (vfwmul)
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h 
b/gcc/config/riscv/riscv-vector-builtins-bases.h
index 5800fca0169..9c12a6b4e8f 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -152,7 +152,7 @@ extern const function_base *const vfwadd_frm;
extern const function_base *const vfwsub;
extern const function_base *const vfwsub_frm;
extern const function_base *const vfmul;
-extern const function_base *const vfmul;
+extern const function_base *const vfmul_frm;
extern const function_base *const vfdiv;
extern const function_base *const vfrdiv;
extern const function_base *const vfwmul;
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def 
b/gcc/config/riscv/riscv-vector-builtins-functions.def
index 58a7224fe0c..35a83ef239c 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -319,6 +319,8 @@ DEF_RVV_FUNCTION (vfmul, alu, full_preds, f_vvf_ops)
DEF_RVV_FUNCTION (vfdiv, alu, full_preds, f_vvv_ops)
DEF_RVV_FUNCTION (vfdiv, alu, full_preds, f_vvf_ops)
DEF_RVV_FUNCTION (vfrdiv, alu, full_preds, f_vvf_ops)
+DEF_RVV_FUNCTION (vfmul_frm, alu_frm, full_preds, f_vvv_ops)
+DEF_RVV_FUNCTION (vfmul_frm, alu_frm, full_preds, f_vvf_ops)
// 13.5. Vector Widening Floating-Point Multiply
DEF_RVV_FUNCTION (vfwmul, alu, full_preds, f_wvv_ops)
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 65f36744f54..5d3e4256cd5 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -866,7 +866,7 @@ (define_attr "vxrm_mode" "rnu,rne,rdn,rod,none"
;; Defines rounding mode of an floating-point operation.
(define_attr "frm_mode" "rne,rtz,rdn,rup,rmm,dyn,dyn_exit,dyn_call,none"
-  (cond [(eq_attr "type" "vfalu,vfwalu")
+  (cond [(eq_attr "type" "vfalu,vfwalu,vfmul")
  (cond
   [(match_test "INTVAL (operands[9]) == riscv_vector::FRM_RNE")
(const_string "rne")
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-mul.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-mul.c
new file mode 100644
index 000..e6410ea3a37
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-mul.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3 -Wno-psabi" } */
+
+#include "riscv_vector.h"
+
+typedef float float32_t;
+
+vfloat32m1_t
+test_riscv_vfmul_vv_f32m1_rm (vfloat32m1_t op1

RE: [PATCH v1] RISC-V: Support RVV VFMUL rounding mode intrinsic API

2023-08-02 Thread Li, Pan2 via Gcc-patches
Yes, it looks like there is something I missed in the last cleanup. I will
double-check after the rounding API support is in place.

Pan

From: juzhe.zh...@rivai.ai 
Sent: Thursday, August 3, 2023 9:40 AM
To: Li, Pan2 ; gcc-patches 
Cc: Li, Pan2 ; Wang, Yanzhang ; 
kito.cheng 
Subject: Re: [PATCH v1] RISC-V: Support RVV VFMUL rounding mode intrinsic API

extern const function_base *const vfmul;
-extern const function_base *const vfmul;
+extern const function_base *const vfmul_frm;

It seems that there is a redundant declaration in the original code?
extern const function_base *const vfmul;
-extern const function_base *const vfmul;



juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-08-03 09:38
To: gcc-patches
CC: juzhe.zhong; 
pan2.li; 
yanzhang.wang; 
kito.cheng
Subject: [PATCH v1] RISC-V: Support RVV VFMUL rounding mode intrinsic API
From: Pan Li <pan2...@intel.com>

This patch would like to support the rounding mode API for the VFMUL
for the below samples.

* __riscv_vfmul_vv_f32m1_rm
* __riscv_vfmul_vv_f32m1_rm_m
* __riscv_vfmul_vf_f32m1_rm
* __riscv_vfmul_vf_f32m1_rm_m

Signed-off-by: Pan Li <pan2...@intel.com>

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(vfmul_frm_obj): New declaration.
(Base): Likewise.
* config/riscv/riscv-vector-builtins-bases.h: Likewise.
* config/riscv/riscv-vector-builtins-functions.def
(vfmul_frm): New function definition.
* config/riscv/vector.md: Add vfmul to frm_mode.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-single-mul.c: New test.
---
.../riscv/riscv-vector-builtins-bases.cc  |  3 ++
.../riscv/riscv-vector-builtins-bases.h   |  2 +-
.../riscv/riscv-vector-builtins-functions.def |  2 +
gcc/config/riscv/vector.md|  2 +-
.../riscv/rvv/base/float-point-single-mul.c   | 44 +++
5 files changed, 51 insertions(+), 2 deletions(-)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-mul.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index ddf694c771c..3adc11138a3 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -277,6 +277,7 @@ public:
/* Implements below instructions for now.
- vfadd
+   - vfmul
*/
template
class binop_frm : public function_base
@@ -2103,6 +2104,7 @@ static CONSTEXPR const widen_binop_frm 
vfwadd_frm_obj;
static CONSTEXPR const widen_binop vfwsub_obj;
static CONSTEXPR const widen_binop_frm vfwsub_frm_obj;
static CONSTEXPR const binop vfmul_obj;
+static CONSTEXPR const binop_frm vfmul_frm_obj;
static CONSTEXPR const binop vfdiv_obj;
static CONSTEXPR const reverse_binop vfrdiv_obj;
static CONSTEXPR const widen_binop vfwmul_obj;
@@ -2334,6 +2336,7 @@ BASE (vfwadd_frm)
BASE (vfwsub)
BASE (vfwsub_frm)
BASE (vfmul)
+BASE (vfmul_frm)
BASE (vfdiv)
BASE (vfrdiv)
BASE (vfwmul)
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h 
b/gcc/config/riscv/riscv-vector-builtins-bases.h
index 5800fca0169..9c12a6b4e8f 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -152,7 +152,7 @@ extern const function_base *const vfwadd_frm;
extern const function_base *const vfwsub;
extern const function_base *const vfwsub_frm;
extern const function_base *const vfmul;
-extern const function_base *const vfmul;
+extern const function_base *const vfmul_frm;
extern const function_base *const vfdiv;
extern const function_base *const vfrdiv;
extern const function_base *const vfwmul;
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def 
b/gcc/config/riscv/riscv-vector-builtins-functions.def
index 58a7224fe0c..35a83ef239c 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -319,6 +319,8 @@ DEF_RVV_FUNCTION (vfmul, alu, full_preds, f_vvf_ops)
DEF_RVV_FUNCTION (vfdiv, alu, full_preds, f_vvv_ops)
DEF_RVV_FUNCTION (vfdiv, alu, full_preds, f_vvf_ops)
DEF_RVV_FUNCTION (vfrdiv, alu, full_preds, f_vvf_ops)
+DEF_RVV_FUNCTION (vfmul_frm, alu_frm, full_preds, f_vvv_ops)
+DEF_RVV_FUNCTION (vfmul_frm, alu_frm, full_preds, f_vvf_ops)
// 13.5. Vector Widening Floating-Point Multiply
DEF_RVV_FUNCTION (vfwmul, alu, full_preds, f_wvv_ops)
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 65f36744f54..5d3e4256cd5 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -866,7 +866,7 @@ (define_attr "vxrm_mode" "rnu,rne,rdn,rod,none"
;; Defines rounding mode of an floating-point operation.
(define_attr "frm_mode" "rne,rtz,rdn,rup,rmm,dyn,dyn_exit,dyn_call,none"
-  (cond [(eq_attr "type" "vfalu,vfwalu")
+  (cond [(eq_attr "type" "vfalu,vfwalu,vfmul

Re: RE: [PATCH v1] RISC-V: Support RVV VFMUL rounding mode intrinsic API

2023-08-02 Thread juzhe.zh...@rivai.ai
Could you split it into 2 patches?

One is a cleanup patch that removes the redundant declaration.

The other is the VFMUL API support.



juzhe.zh...@rivai.ai
 
From: Li, Pan2
Date: 2023-08-03 09:44
To: juzhe.zh...@rivai.ai; gcc-patches
CC: Wang, Yanzhang; kito.cheng
Subject: RE: [PATCH v1] RISC-V: Support RVV VFMUL rounding mode intrinsic API
Yes, looks there is some I missed after the last cleanup. I will have a double 
check after rounding API support.
 
Pan
 
From: juzhe.zh...@rivai.ai  
Sent: Thursday, August 3, 2023 9:40 AM
To: Li, Pan2 ; gcc-patches 
Cc: Li, Pan2 ; Wang, Yanzhang ; 
kito.cheng 
Subject: Re: [PATCH v1] RISC-V: Support RVV VFMUL rounding mode intrinsic API
 
extern const function_base *const vfmul;
-extern const function_base *const vfmul;
+extern const function_base *const vfmul_frm;
 
It seems that there is a redundant declaration in the original code?
extern const function_base *const vfmul;
-extern const function_base *const vfmul;
 
 


juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-08-03 09:38
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support RVV VFMUL rounding mode intrinsic API
From: Pan Li 
 
This patch would like to support the rounding mode API for the VFMUL
for the below samples.
 
* __riscv_vfmul_vv_f32m1_rm
* __riscv_vfmul_vv_f32m1_rm_m
* __riscv_vfmul_vf_f32m1_rm
* __riscv_vfmul_vf_f32m1_rm_m
 
Signed-off-by: Pan Li 
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-builtins-bases.cc
(vfmul_frm_obj): New declaration.
(Base): Likewise.
* config/riscv/riscv-vector-builtins-bases.h: Likewise.
* config/riscv/riscv-vector-builtins-functions.def
(vfmul_frm): New function definition.
* config/riscv/vector.md: Add vfmul to frm_mode.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/float-point-single-mul.c: New test.
---
.../riscv/riscv-vector-builtins-bases.cc  |  3 ++
.../riscv/riscv-vector-builtins-bases.h   |  2 +-
.../riscv/riscv-vector-builtins-functions.def |  2 +
gcc/config/riscv/vector.md|  2 +-
.../riscv/rvv/base/float-point-single-mul.c   | 44 +++
5 files changed, 51 insertions(+), 2 deletions(-)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-mul.c
 
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index ddf694c771c..3adc11138a3 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -277,6 +277,7 @@ public:
/* Implements below instructions for now.
- vfadd
+   - vfmul
*/
template
class binop_frm : public function_base
@@ -2103,6 +2104,7 @@ static CONSTEXPR const widen_binop_frm 
vfwadd_frm_obj;
static CONSTEXPR const widen_binop vfwsub_obj;
static CONSTEXPR const widen_binop_frm vfwsub_frm_obj;
static CONSTEXPR const binop vfmul_obj;
+static CONSTEXPR const binop_frm vfmul_frm_obj;
static CONSTEXPR const binop vfdiv_obj;
static CONSTEXPR const reverse_binop vfrdiv_obj;
static CONSTEXPR const widen_binop vfwmul_obj;
@@ -2334,6 +2336,7 @@ BASE (vfwadd_frm)
BASE (vfwsub)
BASE (vfwsub_frm)
BASE (vfmul)
+BASE (vfmul_frm)
BASE (vfdiv)
BASE (vfrdiv)
BASE (vfwmul)
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h 
b/gcc/config/riscv/riscv-vector-builtins-bases.h
index 5800fca0169..9c12a6b4e8f 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -152,7 +152,7 @@ extern const function_base *const vfwadd_frm;
extern const function_base *const vfwsub;
extern const function_base *const vfwsub_frm;
extern const function_base *const vfmul;
-extern const function_base *const vfmul;
+extern const function_base *const vfmul_frm;
extern const function_base *const vfdiv;
extern const function_base *const vfrdiv;
extern const function_base *const vfwmul;
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def 
b/gcc/config/riscv/riscv-vector-builtins-functions.def
index 58a7224fe0c..35a83ef239c 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -319,6 +319,8 @@ DEF_RVV_FUNCTION (vfmul, alu, full_preds, f_vvf_ops)
DEF_RVV_FUNCTION (vfdiv, alu, full_preds, f_vvv_ops)
DEF_RVV_FUNCTION (vfdiv, alu, full_preds, f_vvf_ops)
DEF_RVV_FUNCTION (vfrdiv, alu, full_preds, f_vvf_ops)
+DEF_RVV_FUNCTION (vfmul_frm, alu_frm, full_preds, f_vvv_ops)
+DEF_RVV_FUNCTION (vfmul_frm, alu_frm, full_preds, f_vvf_ops)
// 13.5. Vector Widening Floating-Point Multiply
DEF_RVV_FUNCTION (vfwmul, alu, full_preds, f_wvv_ops)
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 65f36744f54..5d3e4256cd5 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -866,7 +866,7 @@ (define_attr "vxrm_mode" "rnu,rne,rdn,rod,none"
;; Defines rounding mode of an floating-point operation.
(define_attr "frm_mode" "rne,rtz,rdn,rup,rmm,dyn,dyn_exit,dyn_call,none"
-  (cond [(eq

RE: RE: [PATCH v1] RISC-V: Support RVV VFMUL rounding mode intrinsic API

2023-08-02 Thread Li, Pan2 via Gcc-patches
Sure thing, will prepare it on the double.

Pan

From: juzhe.zh...@rivai.ai 
Sent: Thursday, August 3, 2023 10:02 AM
To: Li, Pan2 ; gcc-patches 
Cc: Wang, Yanzhang ; kito.cheng 
Subject: Re: RE: [PATCH v1] RISC-V: Support RVV VFMUL rounding mode intrinsic 
API

Could you split it into 2 patches ?

one is cleanup patch which is removing the redundant declaration.

The other is support VFMUL API.


juzhe.zh...@rivai.ai

From: Li, Pan2
Date: 2023-08-03 09:44
To: juzhe.zh...@rivai.ai; 
gcc-patches
CC: Wang, Yanzhang; 
kito.cheng
Subject: RE: [PATCH v1] RISC-V: Support RVV VFMUL rounding mode intrinsic API
Yes, looks there is some I missed after the last cleanup. I will have a double 
check after rounding API support.

Pan

From: juzhe.zh...@rivai.ai <juzhe.zh...@rivai.ai>
Sent: Thursday, August 3, 2023 9:40 AM
To: Li, Pan2 <pan2...@intel.com>; gcc-patches <gcc-patches@gcc.gnu.org>
Cc: Li, Pan2 <pan2...@intel.com>; Wang, Yanzhang <yanzhang.w...@intel.com>;
kito.cheng <kito.ch...@gmail.com>
Subject: Re: [PATCH v1] RISC-V: Support RVV VFMUL rounding mode intrinsic API

extern const function_base *const vfmul;
-extern const function_base *const vfmul;
+extern const function_base *const vfmul_frm;

It seems that there is a redundant declaration in the original code?
extern const function_base *const vfmul;
-extern const function_base *const vfmul;



juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-08-03 09:38
To: gcc-patches
CC: juzhe.zhong; 
pan2.li; 
yanzhang.wang; 
kito.cheng
Subject: [PATCH v1] RISC-V: Support RVV VFMUL rounding mode intrinsic API
From: Pan Li <pan2...@intel.com>

This patch would like to support the rounding mode API for the VFMUL
for the below samples.

* __riscv_vfmul_vv_f32m1_rm
* __riscv_vfmul_vv_f32m1_rm_m
* __riscv_vfmul_vf_f32m1_rm
* __riscv_vfmul_vf_f32m1_rm_m

Signed-off-by: Pan Li <pan2...@intel.com>

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(vfmul_frm_obj): New declaration.
(Base): Likewise.
* config/riscv/riscv-vector-builtins-bases.h: Likewise.
* config/riscv/riscv-vector-builtins-functions.def
(vfmul_frm): New function definition.
* config/riscv/vector.md: Add vfmul to frm_mode.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-single-mul.c: New test.
---
.../riscv/riscv-vector-builtins-bases.cc  |  3 ++
.../riscv/riscv-vector-builtins-bases.h   |  2 +-
.../riscv/riscv-vector-builtins-functions.def |  2 +
gcc/config/riscv/vector.md|  2 +-
.../riscv/rvv/base/float-point-single-mul.c   | 44 +++
5 files changed, 51 insertions(+), 2 deletions(-)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-single-mul.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index ddf694c771c..3adc11138a3 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -277,6 +277,7 @@ public:
/* Implements below instructions for now.
- vfadd
+   - vfmul
*/
template
class binop_frm : public function_base
@@ -2103,6 +2104,7 @@ static CONSTEXPR const widen_binop_frm 
vfwadd_frm_obj;
static CONSTEXPR const widen_binop vfwsub_obj;
static CONSTEXPR const widen_binop_frm vfwsub_frm_obj;
static CONSTEXPR const binop vfmul_obj;
+static CONSTEXPR const binop_frm vfmul_frm_obj;
static CONSTEXPR const binop vfdiv_obj;
static CONSTEXPR const reverse_binop vfrdiv_obj;
static CONSTEXPR const widen_binop vfwmul_obj;
@@ -2334,6 +2336,7 @@ BASE (vfwadd_frm)
BASE (vfwsub)
BASE (vfwsub_frm)
BASE (vfmul)
+BASE (vfmul_frm)
BASE (vfdiv)
BASE (vfrdiv)
BASE (vfwmul)
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h 
b/gcc/config/riscv/riscv-vector-builtins-bases.h
index 5800fca0169..9c12a6b4e8f 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -152,7 +152,7 @@ extern const function_base *const vfwadd_frm;
extern const function_base *const vfwsub;
extern const function_base *const vfwsub_frm;
extern const function_base *const vfmul;
-extern const function_base *const vfmul;
+extern const function_base *const vfmul_frm;
extern const function_base *const vfdiv;
extern const function_base *const vfrdiv;
extern const function_base *const vfwmul;
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def 
b/gcc/config/riscv/riscv-vector-builtins-functions.def
index 58a7224fe0c..35a83ef239c 100644
--- a/gcc/co

[PATCH v1] RISC-V: Remove redundant extern declaration in function base

2023-08-02 Thread Pan Li via Gcc-patches
From: Pan Li 

This patch would like to remove the redundant declaration.

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.h: Remove
redundant declaration.
---
 gcc/config/riscv/riscv-vector-builtins-bases.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h 
b/gcc/config/riscv/riscv-vector-builtins-bases.h
index 5800fca0169..f40b022239d 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -152,7 +152,6 @@ extern const function_base *const vfwadd_frm;
 extern const function_base *const vfwsub;
 extern const function_base *const vfwsub_frm;
 extern const function_base *const vfmul;
-extern const function_base *const vfmul;
 extern const function_base *const vfdiv;
 extern const function_base *const vfrdiv;
 extern const function_base *const vfwmul;
-- 
2.34.1
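
As a side note (not part of the patch), a duplicated extern declaration like
the one removed above is accepted silently by both gcc and g++ even with
-Wall -Wextra; only -Wredundant-decls flags it, which is why the extra line
only turned up in review. A minimal stand-in illustration:

/* duplicate-decl.c -- the second declaration is redundant but legal;
   -Wredundant-decls (not enabled by -Wall/-Wextra) would warn here.  */
struct function_base;
extern const struct function_base *const vfmul;
extern const struct function_base *const vfmul;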



Re: [PATCH v1] RISC-V: Remove redundant extern declaration in function base

2023-08-02 Thread Kito Cheng via Gcc-patches
LGTM

 wrote on Thursday, 3 August 2023 at 10:11:

> From: Pan Li 
>
> This patch would like to remove the redudant declaration.
>
> Signed-off-by: Pan Li 
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vector-builtins-bases.h: Remove
> redundant declaration.
> ---
>  gcc/config/riscv/riscv-vector-builtins-bases.h | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h
> b/gcc/config/riscv/riscv-vector-builtins-bases.h
> index 5800fca0169..f40b022239d 100644
> --- a/gcc/config/riscv/riscv-vector-builtins-bases.h
> +++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
> @@ -152,7 +152,6 @@ extern const function_base *const vfwadd_frm;
>  extern const function_base *const vfwsub;
>  extern const function_base *const vfwsub_frm;
>  extern const function_base *const vfmul;
> -extern const function_base *const vfmul;
>  extern const function_base *const vfdiv;
>  extern const function_base *const vfrdiv;
>  extern const function_base *const vfwmul;
> --
> 2.34.1
>
>


RE: [PATCH v1] RISC-V: Remove redundant extern declaration in function base

2023-08-02 Thread Li, Pan2 via Gcc-patches
Committed, thanks Kito.

Pan

From: Kito Cheng 
Sent: Thursday, August 3, 2023 10:12 AM
To: Li, Pan2 
Cc: GCC Patches ; 钟居哲 ; Wang, 
Yanzhang 
Subject: Re: [PATCH v1] RISC-V: Remove redundant extern declaration in function 
base

LGTM

<pan2...@intel.com> wrote on Thursday, 3 August 2023 at 10:11:
From: Pan Li <pan2...@intel.com>

This patch would like to remove the redundant declaration.

Signed-off-by: Pan Li <pan2...@intel.com>

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.h: Remove
redundant declaration.
---
 gcc/config/riscv/riscv-vector-builtins-bases.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h 
b/gcc/config/riscv/riscv-vector-builtins-bases.h
index 5800fca0169..f40b022239d 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -152,7 +152,6 @@ extern const function_base *const vfwadd_frm;
 extern const function_base *const vfwsub;
 extern const function_base *const vfwsub_frm;
 extern const function_base *const vfmul;
-extern const function_base *const vfmul;
 extern const function_base *const vfdiv;
 extern const function_base *const vfrdiv;
 extern const function_base *const vfwmul;
--
2.34.1

