date:20230531

Re: [PATCH] Optimized "(X - N * M) / N + M" to "X / N" if valid

2023-05-31 Thread Jiufu Guo via Gcc-patches

Hi,

Jiufu Guo via Gcc-patches  writes:

> Hi,
>
> Richard Biener  writes:
>
>> On Wed, 17 May 2023, Jiufu Guo wrote:
>>
>>> Hi,
>>> 
>>> This patch tries to optimize "(X - N * M) / N + M" to "X / N".
>>
>> But if that's valid why not make the transform simpler and transform
>> (X - N * M) / N  to X / N - M instead?
>
> Great catch!
> If "N * M" is not constant, "X / N - M" would be better than
> "(X - N * M) / N".  If "N, M" are constants, "(X - N * M) / N" and
> "X / N - M" may be similar; while for this case, "X / N - M" should
> also be fine!  I would try to update accordingly. 
>
>>
>> You use the same optimize_x_minus_NM_div_N_plus_M validator for
>> the division and shift variants but the overflow rules are different,
>> so I'm not sure that's warranted.  I'd also prefer to not split out
>> the validator to a different file - iff then the appropriate file
>> is fold-const.cc, not gimple-match-head.cc (I see we're a bit
>> inconsistent here, for pure gimple matches gimple-fold.cc would
>> be another place).
>
> Thanks for pointing out this!
> For shift, I guess your concerns are: 1. the right operand may be
> negative, or greater than or equal to the type width; 2. the shifted
> value may be signed and negative.  These cases may be UB or shift
> the sign bit?  This patch assumes it is OK to do the transform.
> I will add more checks to see whether this is really OK, and I hope
> someone can point it out if it is invalid:
> "(X - N * M) >> log2(N)" ==> "(X >> log2(N)) - M".
>
> I split out the validator just because: it is shared for division and
> shift :).  And it seems gimple-match-head.cc and generic-match-head.cc,
> may be introduced for match.pd.  So, I put it into gimple-match-head.cc.
>
>>
>> Since you use range information why is the transform restricted
>> to constant M?
>
> If M is a variable, the range for "X" is varying_p. I did not find
> the method to get the bounds for "X" (or for "X - N * M") to check no
> wraps.  Any suggestions?

Oh, I may have misunderstood here.
You may mean: M could have a range too, so we can check whether
"X - N * M" has a valid range or may wrap/overflow.

BR,
Jeff (Jiufu Guo)
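
A minimal sketch of the condition discussed above, for one concrete
triple rather than for value ranges.  It assumes `int` operands and a
positive N; the names fold_is_safe, lhs and rhs are hypothetical and are
not part of the patch.  A range-based check would have to establish the
same facts for every value in the ranges of X and M.

```c
#include <stdbool.h>

/* Hypothetical helper, not from the patch: return true when
   "(x - n * m) / n + m" may be replaced by "x / n" for this triple.  */
static bool
fold_is_safe (int x, int n, int m)
{
  int nm, t;

  if (n <= 0)                               /* sketch covers positive N only */
    return false;
  if (__builtin_mul_overflow (n, m, &nm))   /* N * M overflows */
    return false;
  if (__builtin_sub_overflow (x, nm, &t))   /* X - N * M overflows */
    return false;

  /* trunc_div rounds towards zero, so X and X - N * M must not lie on
     opposite sides of zero.  */
  return !((x < 0 && t > 0) || (x > 0 && t < 0));
}

/* The two expressions being compared.  Call lhs only when
   fold_is_safe has ruled out overflow.  */
static int lhs (int x, int n, int m) { return (x - n * m) / n + m; }
static int rhs (int x, int n) { return x / n; }
```

With N = 4 and M = 2, X = 10 gives 2 on both sides, while X = 5 gives
(5 - 8) / 4 + 2 = 2 against 5 / 4 = 1, because "X - N * M" crosses zero.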

>
>
> Again, thanks for your great help!
>
> BR,
> Jeff (Jiufu Guo)
>
>>
>> Richard.
>>
>>> As per the discussions in PR108757, we know this transformation is valid
>>> only under some conditions.
>>> For C code, "/" rounds towards zero (trunc_div), and "X - N * M"
>>> may wrap/overflow/underflow.  So the transformation is valid only if
>>> "X - N * M" does not cross zero and does not wrap/overflow/underflow.
>>> 
>>> This patch also handles the case when "N" is a power of 2, where
>>> "(X - N * M) / N" is "(X - N * M) >> log2(N)".
>>> 
>>> Bootstrap & regtest pass on ppc64{,le} and x86_64.
>>> Is this ok for trunk?
>>> 
>>> BR,
>>> Jeff (Jiufu)
>>> 
>>> PR tree-optimization/108757
>>> 
>>> gcc/ChangeLog:
>>> 
>>> * gimple-match-head.cc (optimize_x_minus_NM_div_N_plus_M): New function.
>>> * match.pd ((X - N * M) / N + M): New pattern.
>>> 
>>> gcc/testsuite/ChangeLog:
>>> 
>>> * gcc.dg/pr108757-1.c: New test.
>>> * gcc.dg/pr108757-2.c: New test.
>>> * gcc.dg/pr108757.h: New test.
>>> 
>>> ---
>>>  gcc/gimple-match-head.cc  |  54 ++
>>>  gcc/match.pd  |  22 
>>>  gcc/testsuite/gcc.dg/pr108757-1.c |  17 
>>>  gcc/testsuite/gcc.dg/pr108757-2.c |  18 
>>>  gcc/testsuite/gcc.dg/pr108757.h   | 160 ++
>>>  5 files changed, 271 insertions(+)
>>>  create mode 100644 gcc/testsuite/gcc.dg/pr108757-1.c
>>>  create mode 100644 gcc/testsuite/gcc.dg/pr108757-2.c
>>>  create mode 100644 gcc/testsuite/gcc.dg/pr108757.h
>>> 
>>> diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
>>> index b08cd891a13..680a4cb2fc6 100644
>>> --- a/gcc/gimple-match-head.cc
>>> +++ b/gcc/gimple-match-head.cc
>>> @@ -224,3 +224,57 @@ optimize_successive_divisions_p (tree divisor, tree 
>>> inner_div)
>>>  }
>>>return true;
>>>  }
>>> +
>>> +/* Return true if "(X - N * M) / N + M" can be optimized into "X / N".
>>> +   Otherwise return false.
>>> +
>>> +   For unsigned,
>>> +   If sign bit of M is 0 (clz is not 0), valid range is [N*M, MAX].
>>> +   If sign bit of M is 1 (clz is 0), valid range is [0, MAX - N*(-M)].
>>> +
>>> +   For signed,
>>> +   If N*M > 0, valid range: [MIN+N*M, 0] + [N*M, MAX]
>>> +   If N*M < 0, valid range: [MIN, -(-N*M)] + [0, MAX - (-N*M)].  */
>>> +
>>> +static bool
>>> +optimize_x_minus_NM_div_N_plus_M (tree x, wide_int n, wide_int m, tree 
>>> type)
>>> +{
>>> +  wide_int max = wi::max_value (type);
>>> +  signop sgn = TYPE_SIGN (type);
>>> +  wide_int nm;
>>> +  wi::overflow_type ovf;
>>> +  if (TYPE_UNSIGNED (type) && wi::clz (m) == 0)
>>> +    nm = wi::mul (n, -m, sgn, &ovf);
>>> +  else
>>> +    nm = wi::mul (n, m, sgn, &ovf);
>>> +
>>> +  if (ovf != wi::OVF_NONE)
>>> +return false;
>>> +
>>> +  value_range vr0;
>>> +  if (!get_range_query (cfun)->range_of_expr (vr0, x) || vr0.varying_p ()
>>> +  || vr0.undefined_p ())
>>> +return

Re: [PATCH V3] VECT: Change flow of decrement IV

2023-05-31 Thread Kewen.Lin via Gcc-patches

Hi,

on 2023/6/1 13:00, juzhe.zh...@rivai.ai wrote:
> This patch is no difference from V2.

I support this patch based on the testing and SPEC2017 evaluation
results on Power (see my comments on patch v2).

> Just add PR tree-optimization/109971 as Kewen's suggested.

Thanks for adding that.  I was expecting you would add it when
committing, not really requesting a new version. :)  btw, the PR
marker(s) will trigger scripts that post some commit info (commit
link, commit log) to the specified PR(s), so people can easily find
the connections between PRs and the commits that fix or advance
them.

BR,
Kewen

> 
> Already bootstrapped and Regression on X86 no difference.
> 
> Ok for trunk ?
> --
> juzhe.zh...@rivai.ai
> 
>  
> *From:* juzhe.zhong 
> *Date:* 2023-06-01 12:36
> *To:* gcc-patches 
> *CC:* richard.sandiford ; rguenther 
> ; linkw ; Ju-Zhe Zhong 
> 
> *Subject:* [PATCH V3] VECT: Change flow of decrement IV
> From: Ju-Zhe Zhong 
>  
> Follow Richi's suggestion, I change current decrement IV flow from:
>  
> do {
>    remain -= MIN (vf, remain);
> } while (remain != 0);
>  
> into:
>  
> do {
>    old_remain = remain;
>    len = MIN (vf, remain);
>    remain -= vf;
> } while (old_remain >= vf);
>  
> to enhance SCEV.
>  
> Include fixes from kewen.
>  
>  
> This patch will need to wait for Kewen's test feedback.
>  
> Testing on X86 is on-going
>  
> Co-Authored by: Kewen Lin  
>  
>   PR tree-optimization/109971
>  
> gcc/ChangeLog:
>  
>     * tree-vect-loop-manip.cc (vect_set_loop_controls_directly): 
> Change decrement IV flow.
>     (vect_set_loop_condition_partial_vectors): Ditto.
>  
> ---
> gcc/tree-vect-loop-manip.cc | 36 +---
> 1 file changed, 25 insertions(+), 11 deletions(-)
>  
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index acf3642ceb2..3f735945e67 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -483,7 +483,7 @@ vect_set_loop_controls_directly (class loop *loop, 
> loop_vec_info loop_vinfo,
> gimple_stmt_iterator loop_cond_gsi,
> rgroup_controls *rgc, tree niters,
> tree niters_skip, bool might_wrap_p,
> - tree *iv_step)
> + tree *iv_step, tree *compare_step)
> {
>    tree compare_type = LOOP_VINFO_RGROUP_COMPARE_TYPE (loop_vinfo);
>    tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
> @@ -538,9 +538,9 @@ vect_set_loop_controls_directly (class loop *loop, 
> loop_vec_info loop_vinfo,
>    ...
>    vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0);
>    ...
> -    ivtmp_35 = ivtmp_9 - _36;
> +    ivtmp_35 = ivtmp_9 - POLY_INT_CST [4, 4];
>    ...
> -    if (ivtmp_35 != 0)
> +    if (ivtmp_9 > POLY_INT_CST [4, 4])
>  goto ; [83.33%]
>    else
>  goto ; [16.67%]
> @@ -549,13 +549,15 @@ vect_set_loop_controls_directly (class loop *loop, 
> loop_vec_info loop_vinfo,
>    tree step = rgc->controls.length () == 1 ? rgc->controls[0]
>    : make_ssa_name (iv_type);
>    /* Create decrement IV.  */
> -  create_iv (nitems_total, MINUS_EXPR, step, NULL_TREE, loop, 
> _gsi,
> - insert_after, _before_incr, _after_incr);
> +  create_iv (nitems_total, MINUS_EXPR, nitems_step, NULL_TREE, loop,
> + _gsi, insert_after, _before_incr,
> + _after_incr);
>    gimple_seq_add_stmt (header_seq, gimple_build_assign (step, 
> MIN_EXPR,
>     index_before_incr,
>     nitems_step));
>    *iv_step = step;
> -  return index_after_incr;
> +  *compare_step = nitems_step;
>

[PATCH v5 4/4] ree: Improve ree pass for rs6000 target using defined ABI interfaces.

2023-05-31 Thread Ajit Agarwal via Gcc-patches

Hello All:

This new version of patch 4 improves the ree pass for the rs6000 target
using defined ABI interfaces.
Bootstrapped and regtested on power64-linux-gnu.

Review comments incorporated.

Thanks & Regards
Ajit

Improve ree pass for rs6000 target using defined abi interfaces

For the rs6000 target we see redundant zero and sign
extensions.  This patch improves the ree pass to eliminate
such redundant zero and sign extensions using defined
ABI interfaces.
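
A minimal sketch of the kind of source that can produce such an
extension.  It is not the patch's zext-elim-3.C test, and widen_byte is
a hypothetical name.  It assumes an ABI in which the caller already
passes narrow integer arguments extended to register width, so the
zero_extend generated in the callee would be redundant.

```c
/* Hypothetical example: widening the narrow argument is expressed as
   a zero_extend in RTL.  If the ABI guarantees that the caller has
   already extended C in its argument register, that extension is a
   candidate for elimination.  */
unsigned long
widen_byte (unsigned char c)
{
  return c;
}
```

Whether the extension is actually removed depends on the target hooks
and on how the register is used afterwards.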

2023-06-01  Ajit Kumar Agarwal  

gcc/ChangeLog:

* ree.cc (combine_reaching_defs): Use of  zero_extend and sign_extend
defined abi interfaces.
(add_removable_extension): Use of defined abi interfaces for no
reaching defs.
(abi_extension_candidate_return_reg_p): New function.
(abi_extension_candidate_p): New function.
(abi_extension_candidate_argno_p): New function.
(abi_handle_regs_without_defs_p): New function.
(abi_target_promote_function_mode): New function.

gcc/testsuite/ChangeLog:

* g++.target/powerpc/zext-elim-3.C
---
 gcc/ree.cc| 199 +++---
 .../g++.target/powerpc/zext-elim-3.C  |  13 ++
 2 files changed, 183 insertions(+), 29 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-3.C

diff --git a/gcc/ree.cc b/gcc/ree.cc
index fc04249fa84..2025a7c43da 100644
--- a/gcc/ree.cc
+++ b/gcc/ree.cc
@@ -514,7 +514,8 @@ get_uses (rtx_insn *insn, rtx reg)
 if (REGNO (DF_REF_REG (def)) == REGNO (reg))
   break;
 
-  gcc_assert (def != NULL);
+  if (def == NULL)
+return NULL;
 
   ref_chain = DF_REF_CHAIN (def);
 
@@ -750,6 +751,120 @@ get_extended_src_reg (rtx src)
   return src;
 }
 
+/* Return TRUE if target mode is equal to source mode of zero_extend
+   or sign_extend otherwise false.  */
+
+static bool
+abi_target_promote_function_mode (machine_mode mode)
+{
+  int unsignedp;
+  machine_mode tgt_mode =
+    targetm.calls.promote_function_mode (NULL_TREE, mode, &unsignedp,
+                                         NULL_TREE, 1);
+
+  if (tgt_mode == mode)
+return true;
+  else
+return false;
+}
+
+/* Return TRUE if the candidate insn is a zero extend and regno is
+   a return register.  */
+
+static bool
+abi_extension_candidate_return_reg_p (rtx_insn *insn, int regno)
+{
+  rtx set = single_set (insn);
+
+  if (GET_CODE (SET_SRC (set)) !=  ZERO_EXTEND)
+return false;
+
+  if (FUNCTION_VALUE_REGNO_P (regno))
+return true;
+
+  return false;
+}
+
+/* Return TRUE if reg source operand of zero_extend is argument registers
+   and not return registers and source and destination operand are same
+   and mode of source and destination operand are not same.  */
+
+static bool
+abi_extension_candidate_p (rtx_insn *insn)
+{
+  rtx set = single_set (insn);
+
+  if (GET_CODE (SET_SRC (set)) !=  ZERO_EXTEND)
+return false;
+
+  machine_mode ext_dst_mode = GET_MODE (SET_DEST (set));
+  rtx orig_src = XEXP (SET_SRC (set),0);
+
+  bool copy_needed
+= (REGNO (SET_DEST (set)) != REGNO (XEXP (SET_SRC (set), 0)));
+
+  if (!copy_needed && ext_dst_mode != GET_MODE (orig_src)
+  && FUNCTION_ARG_REGNO_P (REGNO (orig_src))
+  && !abi_extension_candidate_return_reg_p (insn, REGNO (orig_src)))
+return true;
+
+  return false;
+}
+
+/* Return TRUE if the candidate insn is a zero extend and regno is
+   an argument register.  */
+
+static bool
+abi_extension_candidate_argno_p (rtx_code code, int regno)
+{
+  if (code !=  ZERO_EXTEND)
+return false;
+
+  if (FUNCTION_ARG_REGNO_P (regno))
+return true;
+
+  return false;
+}
+
+/* Return TRUE if the candidate insn doesn't have defs and have
+ * uses without RTX_BIN_ARITH/RTX_COMM_ARITH/RTX_UNARY rtx class.  */
+
+static bool
+abi_handle_regs_without_defs_p (rtx_insn *insn)
+{
+  if (side_effects_p (PATTERN (insn)))
+return false;
+
+  struct df_link *uses
+= get_uses (insn, SET_DEST (PATTERN (insn)));
+
+  if (!uses)
+return false;
+
+  for (df_link *use = uses; use; use = use->next)
+{
+  if (!use->ref)
+   return false;
+
+  if (BLOCK_FOR_INSN (insn)
+ != BLOCK_FOR_INSN (DF_REF_INSN (use->ref)))
+   return false;
+
+  rtx_insn *use_insn = DF_REF_INSN (use->ref);
+
+  if (GET_CODE (PATTERN (use_insn)) == SET)
+   {
+ rtx_code code = GET_CODE (SET_SRC (PATTERN (use_insn)));
+
+ if (GET_RTX_CLASS (code) == RTX_BIN_ARITH
+ || GET_RTX_CLASS (code) == RTX_COMM_ARITH
+ || GET_RTX_CLASS (code) == RTX_UNARY)
+   return false;
+   }
+ }
+  return true;
+}
+
 /* This function goes through all reaching defs of the source
of the candidate for elimination (CAND) and tries to combine
the extension with the definition instruction.  The changes
@@ -770,6 +885,11 @@ combine_reaching_defs (ext_cand *cand, const_rtx set_pat, 
ext_state *state)
 
   state->defs_list.truncate (0);
   state->copies_list.truncate (0);
+  rtx orig_src = XEXP (SET_SRC

Re: [PATCH V3] VECT: Change flow of decrement IV

2023-05-31 Thread juzhe.zh...@rivai.ai

This patch is no different from V2.
It just adds PR tree-optimization/109971 as Kewen suggested.

Already bootstrapped and regression-tested on X86 with no differences.

Ok for trunk ?


juzhe.zh...@rivai.ai
 
From: juzhe.zhong
Date: 2023-06-01 12:36
To: gcc-patches
CC: richard.sandiford; rguenther; linkw; Ju-Zhe Zhong
Subject: [PATCH V3] VECT: Change flow of decrement IV
From: Ju-Zhe Zhong 
 
Follow Richi's suggestion, I change current decrement IV flow from:
 
do {
   remain -= MIN (vf, remain);
} while (remain != 0);
 
into:
 
do {
   old_remain = remain;
   len = MIN (vf, remain);
   remain -= vf;
} while (old_remain >= vf);
 
to enhance SCEV.
 
Include fixes from kewen.
 
 
This patch will need to wait for Kewen's test feedback.
 
Testing on X86 is on-going
 
Co-Authored by: Kewen Lin  
 
  PR tree-optimization/109971
 
gcc/ChangeLog:
 
* tree-vect-loop-manip.cc (vect_set_loop_controls_directly): Change 
decrement IV flow.
(vect_set_loop_condition_partial_vectors): Ditto.
 
---
gcc/tree-vect-loop-manip.cc | 36 +---
1 file changed, 25 insertions(+), 11 deletions(-)
 
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index acf3642ceb2..3f735945e67 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -483,7 +483,7 @@ vect_set_loop_controls_directly (class loop *loop, 
loop_vec_info loop_vinfo,
gimple_stmt_iterator loop_cond_gsi,
rgroup_controls *rgc, tree niters,
tree niters_skip, bool might_wrap_p,
- tree *iv_step)
+ tree *iv_step, tree *compare_step)
{
   tree compare_type = LOOP_VINFO_RGROUP_COMPARE_TYPE (loop_vinfo);
   tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
@@ -538,9 +538,9 @@ vect_set_loop_controls_directly (class loop *loop, 
loop_vec_info loop_vinfo,
   ...
   vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0);
   ...
-ivtmp_35 = ivtmp_9 - _36;
+ivtmp_35 = ivtmp_9 - POLY_INT_CST [4, 4];
   ...
-if (ivtmp_35 != 0)
+if (ivtmp_9 > POLY_INT_CST [4, 4])
 goto ; [83.33%]
   else
 goto ; [16.67%]
@@ -549,13 +549,15 @@ vect_set_loop_controls_directly (class loop *loop, 
loop_vec_info loop_vinfo,
   tree step = rgc->controls.length () == 1 ? rgc->controls[0]
   : make_ssa_name (iv_type);
   /* Create decrement IV.  */
-  create_iv (nitems_total, MINUS_EXPR, step, NULL_TREE, loop, _gsi,
- insert_after, _before_incr, _after_incr);
+  create_iv (nitems_total, MINUS_EXPR, nitems_step, NULL_TREE, loop,
+ _gsi, insert_after, _before_incr,
+ _after_incr);
   gimple_seq_add_stmt (header_seq, gimple_build_assign (step, MIN_EXPR,
index_before_incr,
nitems_step));
   *iv_step = step;
-  return index_after_incr;
+  *compare_step = nitems_step;
+  return index_before_incr;
 }
   /* Create increment IV.  */
@@ -825,6 +827,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
  arbitrarily pick the last.  */
   tree test_ctrl = NULL_TREE;
   tree iv_step = NULL_TREE;
+  tree compare_step = NULL_TREE;
   rgroup_controls *rgc;
   rgroup_controls *iv_rgc = nullptr;
   unsigned int i;
@@ -861,7 +864,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
_seq, _seq,
loop_cond_gsi, rgc, niters,
niters_skip, might_wrap_p,
- _step);
+ _step, _step);
iv_rgc = rgc;
  }
@@ -884,10 +887,21 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
   /* Get a boolean result that tells us whether to iterate.  */
   edge exit_edge = single_exit (loop);
-  tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? EQ_EXPR : NE_EXPR;
-  tree zero_ctrl = build_zero_cst (TREE_TYPE (test_ctrl));
-  gcond *cond_stmt = gimple_build_cond (code, test_ctrl, zero_ctrl,
- NULL_TREE, NULL_TREE);
+  gcond *cond_stmt;
+  if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
+{
+  gcc_assert (compare_step);
+  tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? LE_EXPR : 
GT_EXPR;
+  cond_stmt = gimple_build_cond (code, test_ctrl, compare_step, NULL_TREE,
+  NULL_TREE);
+}
+  else
+{
+  tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? EQ_EXPR : 
NE_EXPR;
+  tree zero_ctrl = build_zero_cst (TREE_TYPE (test_ctrl));
+  cond_stmt
+ = gimple_build_cond (code, test_ctrl, zero_ctrl, NULL_TREE, NULL_TREE);
+}
   gsi_insert_before (_cond_gsi, cond_stmt, GSI_SAME_STMT);
   /* The loop iterates (NITERS - 1) / VF + 1 times.
-- 
2.36.3

Re: Re: [PATCH V2] VECT: Change flow of decrement IV

2023-05-31 Thread juzhe.zh...@rivai.ai

Thanks kewen.
I have sent the V3 patch. Could you comment on it?
I want to make sure you support that patch.

Thanks.


juzhe.zh...@rivai.ai
 
From: Kewen.Lin
Date: 2023-06-01 12:32
To: juzhe.zh...@rivai.ai
CC: richard.sandiford; rguenther; gcc-patches
Subject: Re: [PATCH V2] VECT: Change flow of decrement IV
Hi Juzhe,
 
on 2023/6/1 08:31, juzhe.zh...@rivai.ai wrote:
> Bootstrapped and Regression on X86 no surprise different.
> 
> Looking forward Kewen's test report for this patch.
> 
 
This patch can be bootstrapped and regress-tested on
powerpc64-linux-gnu P9 and powerpc64le-linux-gnu P9/P10.
 
Also SPEC2017 int/fp bmks build and run successfully
with it on powerpc64le-linux-gnu P10 (with an explicit
parameter --param=vect-partial-vector-usage=2).
 
It can fix the 510.parest_r -5% degradation, and it speed-ed up
525.x264_r +1%, 521.wrf_r +2.03%, 544.nab_r +1.27% and
549.fotonik3d_r +3.22%, but it degraded 503.bwaves_r -4%, we have
some heuristics on load and load pct. for 503.bwaves_r on Power,
I suspected it's related, by considering vect-partial-vector-usage=2
isn't default on Power and this can fix exposed failures and parest_r
degradation, I think the bwaves_r degradation should not block this.
For bwaves_r degradation, I'll have a further look later, open a PR
if it's an actual issue rather than just costing heuristics having
no effects.
 
btw, it would be better to add one PR marker line to associate
this with PR109971, something like:
 
PR tree-optimization/109971
 
Thanks!
 
BR,
Kewen
 
> Thanks.
> --
> juzhe.zh...@rivai.ai
> 
>  
> *From:* juzhe.zhong 
> *Date:* 2023-05-31 23:08
> *To:* gcc-patches 
> *CC:* richard.sandiford ; rguenther 
> ; linkw ; Ju-Zhe Zhong 
> 
> *Subject:* [PATCH V2] VECT: Change flow of decrement IV
> From: Ju-Zhe Zhong 
>  
> Follow Richi's suggestion, I change current decrement IV flow from:
>  
> do {
>remain -= MIN (vf, remain);
> } while (remain != 0);
>  
> into:
>  
> do {
>old_remain = remain;
>len = MIN (vf, remain);
>remain -= vf;
> } while (old_remain >= vf);
>  
> to enhance SCEV.
>  
> Include fixes from kewen.
>  
>  
> This patch will need to wait for Kewen's test feedback.
>  
> Testing on X86 is on-going
>  
> Co-Authored by: Kewen Lin  
>  
> gcc/ChangeLog:
>  
> * tree-vect-loop-manip.cc (vect_set_loop_controls_directly): 
> Change decrement IV flow.
> (vect_set_loop_condition_partial_vectors): Ditto.
>  
> ---
> gcc/tree-vect-loop-manip.cc | 36 +---
> 1 file changed, 25 insertions(+), 11 deletions(-)
>  
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index acf3642ceb2..3f735945e67 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -483,7 +483,7 @@ vect_set_loop_controls_directly (class loop *loop, 
> loop_vec_info loop_vinfo,
> gimple_stmt_iterator loop_cond_gsi,
> rgroup_controls *rgc, tree niters,
> tree niters_skip, bool might_wrap_p,
> - tree *iv_step)
> + tree *iv_step, tree *compare_step)
> {
>tree compare_type = LOOP_VINFO_RGROUP_COMPARE_TYPE (loop_vinfo);
>tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
> @@ -538,9 +538,9 @@ vect_set_loop_controls_directly (class loop *loop, 
> loop_vec_info loop_vinfo,
>...
>vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0);
>...
> -ivtmp_35 = ivtmp_9 - _36;
> +ivtmp_35 = ivtmp_9 - POLY_INT_CST [4, 4];
>...
> -if (ivtmp_35 != 0)
> +if (ivtmp_9 > POLY_INT_CST [4, 4])
>  goto ; [83.33%]
>else
>  goto ; [16.67%]
> @@ -549,13

[PATCH V3] VECT: Change flow of decrement IV

2023-05-31 Thread juzhe . zhong

From: Ju-Zhe Zhong 

Following Richi's suggestion, I change the current decrement IV flow from:

do {
   remain -= MIN (vf, remain);
} while (remain != 0);

into:

do {
   old_remain = remain;
   len = MIN (vf, remain);
   remain -= vf;
} while (old_remain >= vf);

to enhance SCEV.
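
A minimal sketch of the two flows as plain C, assuming an unsigned
counter and remain > 0 on entry.  The names iterate_old and iterate_new
are hypothetical.  The new flow uses "old_remain > vf" as its continue
condition, which is what the GT_EXPR in the diff builds.

```c
#include <stddef.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Old flow: the IV step is the variable length LEN.  Returns the
   number of iterations and stores the element count in *PROCESSED.  */
static size_t
iterate_old (size_t remain, size_t vf, size_t *processed)
{
  size_t iters = 0;
  *processed = 0;
  do
    {
      size_t len = MIN (vf, remain);
      *processed += len;
      remain -= len;
      iters++;
    }
  while (remain != 0);
  return iters;
}

/* New flow: the IV step is the invariant VF and the exit test looks
   at the value before the decrement.  */
static size_t
iterate_new (size_t remain, size_t vf, size_t *processed)
{
  size_t iters = 0;
  size_t old_remain;
  *processed = 0;
  do
    {
      old_remain = remain;
      size_t len = MIN (vf, remain);
      *processed += len;
      remain -= vf;   /* may wrap on the last iteration; unsigned, so defined */
      iters++;
    }
  while (old_remain > vf);
  return iters;
}
```

Both versions run (NITERS - 1) / VF + 1 times, but in the second one the
IV has a constant step, which is the form scalar evolution analysis
handles best.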

Includes fixes from Kewen.


This patch will need to wait for Kewen's test feedback.

Testing on X86 is on-going

Co-Authored by: Kewen Lin  

  PR tree-optimization/109971

gcc/ChangeLog:

* tree-vect-loop-manip.cc (vect_set_loop_controls_directly): Change 
decrement IV flow.
(vect_set_loop_condition_partial_vectors): Ditto.

---
 gcc/tree-vect-loop-manip.cc | 36 +---
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index acf3642ceb2..3f735945e67 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -483,7 +483,7 @@ vect_set_loop_controls_directly (class loop *loop, 
loop_vec_info loop_vinfo,
 gimple_stmt_iterator loop_cond_gsi,
 rgroup_controls *rgc, tree niters,
 tree niters_skip, bool might_wrap_p,
-tree *iv_step)
+tree *iv_step, tree *compare_step)
 {
   tree compare_type = LOOP_VINFO_RGROUP_COMPARE_TYPE (loop_vinfo);
   tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
@@ -538,9 +538,9 @@ vect_set_loop_controls_directly (class loop *loop, 
loop_vec_info loop_vinfo,
   ...
   vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0);
   ...
-  ivtmp_35 = ivtmp_9 - _36;
+  ivtmp_35 = ivtmp_9 - POLY_INT_CST [4, 4];
   ...
-  if (ivtmp_35 != 0)
+  if (ivtmp_9 > POLY_INT_CST [4, 4])
 goto ; [83.33%]
   else
 goto ; [16.67%]
@@ -549,13 +549,15 @@ vect_set_loop_controls_directly (class loop *loop, 
loop_vec_info loop_vinfo,
   tree step = rgc->controls.length () == 1 ? rgc->controls[0]
   : make_ssa_name (iv_type);
   /* Create decrement IV.  */
-  create_iv (nitems_total, MINUS_EXPR, step, NULL_TREE, loop, _gsi,
-insert_after, _before_incr, _after_incr);
+  create_iv (nitems_total, MINUS_EXPR, nitems_step, NULL_TREE, loop,
+_gsi, insert_after, _before_incr,
+_after_incr);
   gimple_seq_add_stmt (header_seq, gimple_build_assign (step, MIN_EXPR,
index_before_incr,
nitems_step));
   *iv_step = step;
-  return index_after_incr;
+  *compare_step = nitems_step;
+  return index_before_incr;
 }
 
   /* Create increment IV.  */
@@ -825,6 +827,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
  arbitrarily pick the last.  */
   tree test_ctrl = NULL_TREE;
   tree iv_step = NULL_TREE;
+  tree compare_step = NULL_TREE;
   rgroup_controls *rgc;
   rgroup_controls *iv_rgc = nullptr;
   unsigned int i;
@@ -861,7 +864,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
 _seq, _seq,
 loop_cond_gsi, rgc, niters,
 niters_skip, might_wrap_p,
-_step);
+_step, _step);
 
iv_rgc = rgc;
  }
@@ -884,10 +887,21 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
 
   /* Get a boolean result that tells us whether to iterate.  */
   edge exit_edge = single_exit (loop);
-  tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? EQ_EXPR : NE_EXPR;
-  tree zero_ctrl = build_zero_cst (TREE_TYPE (test_ctrl));
-  gcond *cond_stmt = gimple_build_cond (code, test_ctrl, zero_ctrl,
-   NULL_TREE, NULL_TREE);
+  gcond *cond_stmt;
+  if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
+{
+  gcc_assert (compare_step);
+  tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? LE_EXPR : 
GT_EXPR;
+  cond_stmt = gimple_build_cond (code, test_ctrl, compare_step, NULL_TREE,
+NULL_TREE);
+}
+  else
+{
+  tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? EQ_EXPR : 
NE_EXPR;
+  tree zero_ctrl = build_zero_cst (TREE_TYPE (test_ctrl));
+  cond_stmt
+   = gimple_build_cond (code, test_ctrl, zero_ctrl, NULL_TREE, NULL_TREE);
+}
   gsi_insert_before (_cond_gsi, cond_stmt, GSI_SAME_STMT);
 
   /* The loop iterates (NITERS - 1) / VF + 1 times.
-- 
2.36.3

Re: [PATCH V2] VECT: Change flow of decrement IV

2023-05-31 Thread Kewen.Lin via Gcc-patches

Hi Juzhe,

on 2023/6/1 08:31, juzhe.zh...@rivai.ai wrote:
> Bootstrapped and Regression on X86 no surprise different.
> 
> Looking forward Kewen's test report for this patch.
> 

This patch can be bootstrapped and regress-tested on
powerpc64-linux-gnu P9 and powerpc64le-linux-gnu P9/P10.

Also SPEC2017 int/fp bmks build and run successfully
with it on powerpc64le-linux-gnu P10 (with an explicit
parameter --param=vect-partial-vector-usage=2).

It fixes the 510.parest_r -5% degradation and speeds up
525.x264_r +1%, 521.wrf_r +2.03%, 544.nab_r +1.27% and
549.fotonik3d_r +3.22%, but it degrades 503.bwaves_r by 4%.  We have
some heuristics on load and load pct. for 503.bwaves_r on Power, and
I suspect they are related.  Considering that
vect-partial-vector-usage=2 isn't the default on Power, and that this
fixes the exposed failures and the parest_r degradation, I think the
bwaves_r degradation should not block this.  I'll have a further look
at the bwaves_r degradation later and open a PR if it's an actual
issue rather than just the costing heuristics having no effect.

btw, it would be better to add one PR marker line to associate
this with PR109971, something like:

PR tree-optimization/109971

Thanks!

BR,
Kewen

> Thanks.
> --
> juzhe.zh...@rivai.ai
> 
>  
> *From:* juzhe.zhong 
> *Date:* 2023-05-31 23:08
> *To:* gcc-patches 
> *CC:* richard.sandiford ; rguenther 
> ; linkw ; Ju-Zhe Zhong 
> 
> *Subject:* [PATCH V2] VECT: Change flow of decrement IV
> From: Ju-Zhe Zhong 
>  
> Follow Richi's suggestion, I change current decrement IV flow from:
>  
> do {
>    remain -= MIN (vf, remain);
> } while (remain != 0);
>  
> into:
>  
> do {
>    old_remain = remain;
>    len = MIN (vf, remain);
>    remain -= vf;
> } while (old_remain >= vf);
>  
> to enhance SCEV.
>  
> Include fixes from kewen.
>  
>  
> This patch will need to wait for Kewen's test feedback.
>  
> Testing on X86 is on-going
>  
> Co-Authored by: Kewen Lin  
>  
> gcc/ChangeLog:
>  
>     * tree-vect-loop-manip.cc (vect_set_loop_controls_directly): 
> Change decrement IV flow.
>     (vect_set_loop_condition_partial_vectors): Ditto.
>  
> ---
> gcc/tree-vect-loop-manip.cc | 36 +---
> 1 file changed, 25 insertions(+), 11 deletions(-)
>  
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index acf3642ceb2..3f735945e67 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -483,7 +483,7 @@ vect_set_loop_controls_directly (class loop *loop, 
> loop_vec_info loop_vinfo,
> gimple_stmt_iterator loop_cond_gsi,
> rgroup_controls *rgc, tree niters,
> tree niters_skip, bool might_wrap_p,
> - tree *iv_step)
> + tree *iv_step, tree *compare_step)
> {
>    tree compare_type = LOOP_VINFO_RGROUP_COMPARE_TYPE (loop_vinfo);
>    tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
> @@ -538,9 +538,9 @@ vect_set_loop_controls_directly (class loop *loop, 
> loop_vec_info loop_vinfo,
>    ...
>    vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0);
>    ...
> -    ivtmp_35 = ivtmp_9 - _36;
> +    ivtmp_35 = ivtmp_9 - POLY_INT_CST [4, 4];
>    ...
> -    if (ivtmp_35 != 0)
> +    if (ivtmp_9 > POLY_INT_CST [4, 4])
>  goto ; [83.33%]
>    else
>  goto ; [16.67%]
> @@ -549,13 +549,15 @@ vect_set_loop_controls_directly (class loop *loop, 
> loop_vec_info loop_vinfo,
>    tree step = rgc->controls.length () == 1 ? rgc->controls[0]
>    : make_ssa_name (iv_type);
>    /* Create decrement IV.  */
> -  create_iv (nitems_total, MINUS_EXPR, step, NULL_TREE,

[PATCH v5] MIPS: Add speculation_barrier support

2023-05-31 Thread YunQiang Su

speculation_barrier for MIPS needs sync+jr.hb (r2+),
so we implement __speculation_barrier in libgcc, like arm32 does.
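
A minimal sketch of user code that relies on this support, assuming the
usual route in which GCC's __builtin_speculation_safe_value falls back
to the target's speculation_barrier pattern.  SAFE_INDEX and
load_checked are hypothetical names, and the fallback branch is only
there so the sketch builds with compilers that lack the builtin.

```c
#include <stddef.h>

#ifndef __has_builtin
# define __has_builtin(x) 0
#endif

#if __has_builtin (__builtin_speculation_safe_value)
# define SAFE_INDEX(i) __builtin_speculation_safe_value (i)
#else
# define SAFE_INDEX(i) (i)   /* assumption: no hardening available */
#endif

/* Bounds-checked load whose index is forced to a safe value when the
   bounds check was mis-speculated.  Returns -1 for out-of-range IDX.  */
static int
load_checked (const int *table, size_t len, size_t idx)
{
  if (idx < len)
    return table[SAFE_INDEX (idx)];
  return -1;
}
```

On a target without any speculation barrier, GCC still compiles the
builtin but warns that incorrect speculation may not be restricted.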

gcc/ChangeLog:
* config/mips/mips-protos.h (mips_emit_speculation_barrier_function):
New prototype.
* config/mips/mips.cc (speculation_barrier_libfunc): New static
variable.
(mips_init_libfuncs): Initialize it.
(mips_emit_speculation_barrier_function): New function.
* config/mips/mips.md (speculation_barrier): Call
mips_emit_speculation_barrier_function.

libgcc/ChangeLog:
* config/mips/lib1funcs.S: New file.
define __speculation_barrier and include mips16.S.
* config/mips/t-mips: define LIB1ASMSRC as mips/lib1funcs.S.
define LIB1ASMFUNCS as _speculation_barrier.
set version info for __speculation_barrier.
* config/mips/libgcc-mips.ver: New file.
* config/mips/t-mips16: don't define LIB1ASMSRC as mips16.S
included in lib1funcs.S now.
---
 gcc/config/mips/mips-protos.h  |  2 +
 gcc/config/mips/mips.cc| 12 ++
 gcc/config/mips/mips.md| 12 ++
 libgcc/config/mips/lib1funcs.S | 65 ++
 libgcc/config/mips/libgcc-mips.ver | 21 ++
 libgcc/config/mips/t-mips  |  7 
 libgcc/config/mips/t-mips16|  3 +-
 7 files changed, 120 insertions(+), 2 deletions(-)
 create mode 100644 libgcc/config/mips/lib1funcs.S
 create mode 100644 libgcc/config/mips/libgcc-mips.ver

diff --git a/gcc/config/mips/mips-protos.h b/gcc/config/mips/mips-protos.h
index 20483469105..da7902c235b 100644
--- a/gcc/config/mips/mips-protos.h
+++ b/gcc/config/mips/mips-protos.h
@@ -388,4 +388,6 @@ extern void mips_register_frame_header_opt (void);
 extern void mips_expand_vec_cond_expr (machine_mode, machine_mode, rtx *);
 extern void mips_expand_vec_cmp_expr (rtx *);
 
+extern void mips_emit_speculation_barrier_function (void);
+
 #endif /* ! GCC_MIPS_PROTOS_H */
diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index ca491b981a3..c1d1691306e 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -13611,6 +13611,9 @@ mips_autovectorize_vector_modes (vector_modes *modes, 
bool)
   return 0;
 }
 
+
+static GTY (()) rtx speculation_barrier_libfunc;
+
 /* Implement TARGET_INIT_LIBFUNCS.  */
 
 static void
@@ -13680,6 +13683,7 @@ mips_init_libfuncs (void)
   synchronize_libfunc = init_one_libfunc ("__sync_synchronize");
   init_sync_libfuncs (UNITS_PER_WORD);
 }
+  speculation_barrier_libfunc = init_one_libfunc ("__speculation_barrier");
 }
 
 /* Build up a multi-insn sequence that loads label TARGET into $AT.  */
@@ -19092,6 +19096,14 @@ mips_avoid_hazard (rtx_insn *after, rtx_insn *insn, 
int *hilo_delay,
   }
 }
 
+/* Emit a speculation barrier.
+   JR.HB is needed, so we put speculation_barrier_libfunc in libgcc.  */
+void
+mips_emit_speculation_barrier_function ()
+{
+  emit_library_call (speculation_barrier_libfunc, LCT_NORMAL, VOIDmode);
+}
+
 /* A SEQUENCE is breakable iff the branch inside it has a compact form
and the target has compact branches.  */
 
diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index ac1d77afc7d..5d04ac566dd 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -160,6 +160,8 @@
   ;; The `.insn' pseudo-op.
   UNSPEC_INSN_PSEUDO
   UNSPEC_JRHB
+
+  VUNSPEC_SPECULATION_BARRIER
 ])
 
 (define_constants
@@ -7455,6 +7457,16 @@
   mips_expand_conditional_move (operands);
   DONE;
 })
+
+(define_expand "speculation_barrier"
+  [(unspec_volatile [(const_int 0)] VUNSPEC_SPECULATION_BARRIER)]
+  ""
+  "
+  mips_emit_speculation_barrier_function ();
+  DONE;
+  "
+)
+
 
 ;;
 ;;  
diff --git a/libgcc/config/mips/lib1funcs.S b/libgcc/config/mips/lib1funcs.S
new file mode 100644
index 000..97a3655e8ab
--- /dev/null
+++ b/libgcc/config/mips/lib1funcs.S
@@ -0,0 +1,65 @@
+/* Copyright (C) 2023 Free Software Foundation, Inc.
+
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+.  */
+
+//#include "mips16.S"
+
+#ifdef L_speculation_barrier
+
+/* MIPS16e1 has no sync/jr.hb instructions, and MIPS16e2 lacks

Re: Build-break in libstdc++-v3 at r14-1442-ge1240bda3e0bb1 for non-float128 targets

2023-05-31 Thread Hans-Peter Nilsson via Gcc-patches

> From: Jonathan Wakely 
> Date: Wed, 31 May 2023 21:06:16 +0100
> On Wed, 31 May 2023 at 16:32, Jonathan Wakely  wrote:
> > On Wed, 31 May 2023 at 16:29, Hans-Peter Nilsson via Libstdc++ <
> > libstd...@gcc.gnu.org> wrote:
> >
> >> Since I don't see a quick fix at r14-1444-g3f4853a5f00fab, I
> >> thought I'd better notify the author (I have written authors
> >> if there was more than one ;-) of suspect commits in the
> >> range r14-1425-g80ee7d02e8db48..e1240bda3e0b for the
> >> build-break at r14-1442-ge1240bda3e0bb1 for cris-elf, where
> >> I get:
> >>
> >> /x/gcc/libstdc++-v3/src/c++17/floating_from_chars.cc:1330:47: error:
> >> '_Float128' is not supported on this target
> >>  1330 | from_chars(const char* first, const char* last, _Float128& value,
> >>   |   ^
> >>
> >
> > Sorry, I'll fix or revert it today.
> >
> 
> It should be fixed at  r14-1451-ga239a35075ffd8

JFTR: confirmed at r14-1451-ga239a35075ffd8, thanks!

brgds, H-P

[r14-1452 Regression] FAIL: g++.dg/pr104547.C -std=gnu++17 scan-tree-dump-not vrp2 "_M_default_append" on Linux/x86_64

2023-05-31 Thread haochen.jiang via Gcc-patches

On Linux/x86_64,

fb409a15d9babc78fe1d9957afcbaf1102cce58f is the first bad commit
commit fb409a15d9babc78fe1d9957afcbaf1102cce58f
Author: Jonathan Wakely 
Date:   Thu May 25 09:57:46 2023 +0100

libstdc++: Express std::vector's size() <= capacity() invariant in code

caused

FAIL: g++.dg/pr104547.C  -std=gnu++14  scan-tree-dump-not vrp2 
"_M_default_append"
FAIL: g++.dg/pr104547.C  -std=gnu++17  scan-tree-dump-not vrp2 
"_M_default_append"

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-1452/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=g++.dg/pr104547.C 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=g++.dg/pr104547.C 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=g++.dg/pr104547.C 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=g++.dg/pr104547.C 
--target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email; for questions about this report, contact me
at haochen dot jiang at intel.com)

[PATCH] RISC-V: Add vwadd.wv/vwsub.wv auto-vectorization lowering optimization

2023-05-31 Thread juzhe . zhong

From: Juzhe-Zhong 

1. This patch optimizes the codegen of the following auto-vectorized code:

void foo (int32_t * __restrict a, int64_t * __restrict b, int64_t * __restrict 
c, int n)
{
for (int i = 0; i < n; i++)
  c[i] = (int64_t)a[i] + b[i];
}

Combine instruction from:

...
vsext.vf2
vadd.vv
...

into:

...
vwadd.wv
...

For a PLUS operation, GCC prefers the following RTL operand order when
combining:

(plus: (sign_extend:..)
   (reg:)

instead of

(plus: (reg:..)
   (sign_extend:)

which is different from MINUS pattern.

I split patterns of vwadd/vwsub, and add dedicated patterns for them.

2. This patch not only optimizes the case mentioned in (1) above, but also
   enhances the vwadd.vv/vwsub.vv optimization for more complicated PLUS/MINUS
   code. Consider the following code:
   
__attribute__ ((noipa)) void
vwadd_int16_t_int8_t (int16_t *__restrict dst, int16_t *__restrict dst2,
  int16_t *__restrict dst3, int8_t *__restrict a,
  int8_t *__restrict b, int8_t *__restrict a2,
  int8_t *__restrict b2, int n)
{
  for (int i = 0; i < n; i++)
{
  dst[i] = (int16_t) a[i] + (int16_t) b[i];
  dst2[i] = (int16_t) a2[i] + (int16_t) b[i];
  dst3[i] = (int16_t) a2[i] + (int16_t) a[i];
}
}

Before this patch:
...
vsetvli zero,a6,e8,mf2,ta,ma
vle8.v  v2,0(a3)
vle8.v  v1,0(a4)
vsetvli t1,zero,e16,m1,ta,ma
vsext.vf2   v3,v2
vsext.vf2   v2,v1
vadd.vv v1,v2,v3
vsetvli zero,a6,e16,m1,ta,ma
vse16.v v1,0(a0)
vle8.v  v4,0(a5)
vsetvli t1,zero,e16,m1,ta,ma
vsext.vf2   v1,v4
vadd.vv v2,v1,v2
...

After this patch:
...
vsetvli zero,a6,e8,mf2,ta,ma
vle8.v  v3,0(a4)
vle8.v  v1,0(a3)
vsetvli t4,zero,e8,mf2,ta,ma
vwadd.vv v2,v1,v3
vsetvli zero,a6,e16,m1,ta,ma
vse16.v v2,0(a0)
vle8.v  v2,0(a5)
vsetvli t4,zero,e8,mf2,ta,ma
vwadd.vv v4,v3,v2
vsetvli zero,a6,e16,m1,ta,ma
vse16.v v4,0(a1)
vsetvli t4,zero,e8,mf2,ta,ma
sub a7,a7,a6
vwadd.vv v3,v2,v1
vsetvli zero,a6,e16,m1,ta,ma
vse16.v v3,0(a2)
...

The reason current upstream GCC cannot fully optimize this code to use vwadd
is that the combine pass needs an intermediate RTL form: a pattern that extends
only one of the operands (vwadd.wv). Based on this intermediate RTL, it then
extends the other operand to generate vwadd.vv.

So the vwadd.wv/vwsub.wv patterns also help vwadd.vv/vwsub.vv code optimization.
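
As a rough per-element illustration of the two-step folding described above,
here is a scalar model. It is only a sketch, not the RVV patterns themselves:
the function names are hypothetical and merely mirror the instruction
mnemonics, and each function models a single vector element.

```cpp
#include <cstdint>

// Scalar stand-in for vsext.vf2 on one element: sign-extend int8 -> int16.
inline int16_t sext(int8_t x) { return static_cast<int16_t>(x); }

// Scalar stand-in for vadd.vv on one pair of already-widened elements.
inline int16_t vadd_vv(int16_t a, int16_t b) {
  return static_cast<int16_t>(a + b);
}

// Step 1: fold ONE extension into the add (the vwadd.wv shape).
inline int16_t vwadd_wv(int16_t wide, int8_t narrow) {
  return static_cast<int16_t>(wide + sext(narrow));
}

// Step 2: fold the remaining extension as well (the vwadd.vv shape).
inline int16_t vwadd_vv(int8_t a, int8_t b) {
  return vwadd_wv(sext(a), b);
}
```

In this model the .vv form is reached only by going through the .wv form,
which is the role the intermediate pattern plays for combine.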
 
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc: Change vwadd.wv/vwsub.wv 
intrinsic API expander
* config/riscv/vector.md 
(@pred_single_widen_): Remove it.
(@pred_single_widen_sub): New pattern.
(@pred_single_widen_add): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/widen/widen-5.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-6.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-1.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-2.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-6.c: New test.

---
 .../riscv/riscv-vector-builtins-bases.cc  |  8 +++--
 gcc/config/riscv/vector.md| 29 +---
 .../riscv/rvv/autovec/widen/widen-5.c | 27 +++
 .../riscv/rvv/autovec/widen/widen-6.c | 27 +++
 .../rvv/autovec/widen/widen-complicate-1.c| 31 +
 .../rvv/autovec/widen/widen-complicate-2.c| 31 +
 .../riscv/rvv/autovec/widen/widen_run-5.c | 34 +++
 .../riscv/rvv/autovec/widen/widen_run-6.c | 34 +++
 8 files changed, 215 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-6.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-5.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-6.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index a8113f6602b..3f92084929d 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -361,8 +361,12 @@ public:
return e.use_exact_insn (
  code_for_pred_dual_widen_scalar (CODE1, CODE2, e.vector_mode ()));
   case OP_TYPE_wv:
-   return e.use_exact_insn (
- code_for_pred_single_widen (CODE1, CODE2, e.vector_mode ()));
+

[PATCH] doc: improve docs for -pedantic{,-errors}

2023-05-31 Thread Jason Merrill via Gcc-patches

Tested by looking at the makeinfo output.  OK for trunk?

-- 8< --

Recent discussion of -Wimplicit led me to want to clarify this section of
the documentation, and mark which diagnostics other than -Wpedantic are
affected by -pedantic-errors.

gcc/ChangeLog:

* doc/invoke.texi (-Wpedantic): Improve clarity.
---
 gcc/doc/invoke.texi | 98 -
 1 file changed, 80 insertions(+), 18 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 898a88ce33e..54edf8753a2 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -3701,6 +3701,9 @@ in C++20 with a pedantic warning that can be disabled with
 Enabled by default with @option{-std=c++20} unless @option{-Wno-deprecated},
 and with @option{-std=c++23} regardless of @option{-Wno-deprecated}.
 
+This warning is upgraded to an error by @option{-pedantic-errors} in
+C++23 mode or later.
+
 @opindex Wctad-maybe-unsupported
 @opindex Wno-ctad-maybe-unsupported
 @item -Wctad-maybe-unsupported @r{(C++ and Objective-C++ only)}
@@ -5987,15 +5990,16 @@ warnings, in some cases it may also cause false 
positives.
 @item -Wpedantic
 @itemx -pedantic
 Issue all the warnings demanded by strict ISO C and ISO C++;
-reject all programs that use forbidden extensions, and some other
-programs that do not follow ISO C and ISO C++.  For ISO C, follows the
-version of the ISO C standard specified by any @option{-std} option used.
+diagnose all programs that use forbidden extensions, and some other
+programs that do not follow ISO C and ISO C++.  This follows the version
+of the ISO C or C++ standard specified by any @option{-std} option used.
 
 Valid ISO C and ISO C++ programs should compile properly with or without
 this option (though a rare few require @option{-ansi} or a
-@option{-std} option specifying the required version of ISO C)@.  However,
+@option{-std} option specifying the version of the standard)@.  However,
 without this option, certain GNU extensions and traditional C and C++
-features are supported as well.  With this option, they are rejected.
+features are supported as well.  With this option, they are diagnosed
+(or rejected with @option{-pedantic-errors}).
 
 @option{-Wpedantic} does not cause warning messages for use of the
 alternate keywords whose names begin and end with @samp{__}.  This alternate
@@ -6006,16 +6010,10 @@ Pedantic warnings are also disabled in the expression 
that follows
 these escape routes; application programs should avoid them.
 @xref{Alternate Keywords}.
 
-Some users try to use @option{-Wpedantic} to check programs for strict ISO
-C conformance.  They soon find that it does not do quite what they want:
-it finds some non-ISO practices, but not all---only those for which
-ISO C @emph{requires} a diagnostic, and some others for which
-diagnostics have been added.
-
-A feature to report any failure to conform to ISO C might be useful in
-some instances, but would require considerable additional work and would
-be quite different from @option{-Wpedantic}.  We don't have plans to
-support such a feature in the near future.
+Some warnings about non-conforming programs are controlled by options
+other than @option{-Wpedantic}; in many cases they are implied by
+@option{-Wpedantic} but can be disabled separately by their specific
+option, e.g. @option{-Wpedantic -Wno-pointer-sign}.
 
 Where the standard specified with @option{-std} represents a GNU
 extended dialect of C, such as @samp{gnu90} or @samp{gnu99}, there is a
@@ -6033,8 +6031,44 @@ Give an error whenever the @dfn{base standard} (see 
@option{-Wpedantic})
 requires a diagnostic, in some cases where there is undefined behavior
 at compile-time and in some other cases that do not prevent compilation
 of programs that are valid according to the standard. This is not
-equivalent to @option{-Werror=pedantic}, since there are errors enabled
-by this option and not enabled by the latter and vice versa.
+equivalent to @option{-Werror=pedantic}: the latter option is unlikely to be
+useful, as it only makes errors of the diagnostics that are controlled by
+@option{-Wpedantic}, whereas this option also affects required diagnostics that
+are always enabled or controlled by options other than @option{-Wpedantic}.
+
+If you want the required diagnostics that are warnings by default to
+be errors instead, but don't also want to enable the @option{-Wpedantic}
+diagnostics, you can specify @option{-pedantic-errors -Wno-pedantic}
+(or @option{-pedantic-errors -Wno-error=pedantic} to enable them but
+only as warnings).
+
+Some required diagnostics are errors by default, but can be reduced to
+warnings using @option{-fpermissive} or their specific warning option,
+e.g. @option{-Wno-error=narrowing}.
+
+Some diagnostics for non-ISO practices are controlled by specific
+warning options other than @option{-Wpedantic}, but are also made
+errors by @option{-pedantic-errors}.  For instance:
+
+@gccoptlist{
+-Wattributes @r{(for standard

[pushed] c++: make -fpermissive avoid -Werror=narrowing

2023-05-31 Thread Jason Merrill via Gcc-patches

Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

Currently we make -Wnarrowing an error by default by forcing pedantic_errors
on, but for consistency -fpermissive should prevent that.

In general I'm inclined to move away from using permerror in favor of this
kind of model, with specific flags for each diagnostic.

gcc/cp/ChangeLog:

* typeck2.cc (check_narrowing): Check flag_permissive.
---
 gcc/cp/typeck2.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/cp/typeck2.cc b/gcc/cp/typeck2.cc
index 8724877058f..1c204c8612b 100644
--- a/gcc/cp/typeck2.cc
+++ b/gcc/cp/typeck2.cc
@@ -1109,7 +1109,8 @@ check_narrowing (tree type, tree init, tsubst_flags_t 
complain,
   else if (complain & tf_error)
{
  int savederrorcount = errorcount;
- global_dc->pedantic_errors = 1;
+ if (!flag_permissive)
+   global_dc->pedantic_errors = 1;
  auto s = make_temp_override (global_dc->dc_warn_system_headers, true);
  pedwarn (loc, OPT_Wnarrowing,
   "narrowing conversion of %qE from %qH to %qI",

base-commit: 09ff83d4bc1405c9af803fb84bfc49d6001da47b
-- 
2.31.1

[PATCH] libstdc++: optimize EH phase 2

2023-05-31 Thread Jason Merrill via Gcc-patches

Tested x86_64-pc-linux-gnu, OK for trunk?

-- 8< --

In the ABI's two-phase EH model, first we walk the stack looking for a
handler, then we walk the stack running cleanups until we reach that
handler.  In the cleanup phase, we shouldn't redundantly check the handlers
along the way, e.g. when walking through g():

  void f() { throw 42; }
  void g() { try { f(); } catch (void *) { } }
  int main() { try { g(); } catch (int) { } }
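
A slightly extended variant of the example makes the two phases observable.
This is only an illustrative sketch: the Cleanup type, the trace string and
the run() wrapper are hypothetical additions, not part of the patch. The
destructor in g() runs during the cleanup phase, and the catch (void *)
handler in g() never matches the thrown int.

```cpp
#include <string>

static std::string trace;

// Gives g()'s frame a cleanup action that must run in phase 2.
struct Cleanup {
  ~Cleanup() { trace += "cleanup;"; }
};

void f() { throw 42; }

void g() {
  Cleanup c;
  try { f(); }
  catch (void *) { trace += "wrong-handler;"; }  // never matches an int
}

// Stands in for main() from the example above and returns what happened.
std::string run() {
  trace.clear();
  try { g(); }
  catch (int v) { trace += "caught " + std::to_string(v); }
  return trace;  // "cleanup;caught 42"
}
```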

libstdc++-v3/ChangeLog:

* libsupc++/eh_personality.cc (PERSONALITY_FUNCTION): Don't check
handlers in the cleanup phase.
---
 libstdc++-v3/libsupc++/eh_personality.cc | 4 
 1 file changed, 4 insertions(+)

diff --git a/libstdc++-v3/libsupc++/eh_personality.cc 
b/libstdc++-v3/libsupc++/eh_personality.cc
index 12391e563d6..cc6bc048892 100644
--- a/libstdc++-v3/libsupc++/eh_personality.cc
+++ b/libstdc++-v3/libsupc++/eh_personality.cc
@@ -592,6 +592,10 @@ PERSONALITY_FUNCTION (int version,
  // Zero filter values are cleanups.
  saw_cleanup = true;
}
+ else if (actions == _UA_CLEANUP_PHASE)
+   // We checked the handlers in the search phase; if one of them
+   // matched, actions would also have _UA_HANDLER_FRAME set.
+   ;
  else if (ar_filter > 0)
{
  // Positive filter values are handlers.

base-commit: 68816ba245afc6d0e1482bde2d15b35b925b4195
-- 
2.31.1

[PATCH V2] RISC-V: Support RVV permutation auto-vectorization

2023-05-31 Thread juzhe . zhong

From: Juzhe-Zhong 

This patch supports vector permutation for VLS only, via the vec_perm pattern.
We will implement TARGET_VECTORIZE_VEC_PERM_CONST to support VLA permutation
in the future.

Fixed the issues raised in Robin's comments.
Ok for trunk?

gcc/ChangeLog:

* config/riscv/autovec.md (vec_perm): New pattern.
* config/riscv/predicates.md (vector_perm_operand): New predicate.
* config/riscv/riscv-protos.h (enum insn_type): New enum.
(expand_vec_perm): New function.
* config/riscv/riscv-v.cc (const_vec_all_in_range_p): Ditto.
(gen_const_vector_dup): Ditto.
(emit_vlmax_gather_insn): Ditto.
(emit_vlmax_masked_gather_mu_insn): Ditto.
(expand_vec_perm): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-7.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm.h: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-7.c: New test.

---
 gcc/config/riscv/autovec.md   |  18 +++
 gcc/config/riscv/predicates.md|   4 +
 gcc/config/riscv/riscv-protos.h   |   2 +
 gcc/config/riscv/riscv-v.cc   | 153 ++
 .../riscv/rvv/autovec/vls-vlmax/perm-1.c  |  58 +++
 .../riscv/rvv/autovec/vls-vlmax/perm-2.c  |  33 
 .../riscv/rvv/autovec/vls-vlmax/perm-3.c  |  29 
 .../riscv/rvv/autovec/vls-vlmax/perm-4.c  |  58 +++
 .../riscv/rvv/autovec/vls-vlmax/perm-5.c  |  49 ++
 .../riscv/rvv/autovec/vls-vlmax/perm-6.c  |  58 +++
 .../riscv/rvv/autovec/vls-vlmax/perm-7.c  |  49 ++
 .../riscv/rvv/autovec/vls-vlmax/perm.h|  70 
 .../riscv/rvv/autovec/vls-vlmax/perm_run-1.c  | 104 
 .../riscv/rvv/autovec/vls-vlmax/perm_run-2.c  |  32 
 .../riscv/rvv/autovec/vls-vlmax/perm_run-3.c  |  20 +++
 .../riscv/rvv/autovec/vls-vlmax/perm_run-4.c  | 104 
 .../riscv/rvv/autovec/vls-vlmax/perm_run-5.c  | 137 
 .../riscv/rvv/autovec/vls-vlmax/perm_run-6.c  | 104 
 .../riscv/rvv/autovec/vls-vlmax/perm_run-7.c  | 135 
 19 files changed, 1217 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-3.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-5.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-6.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-3.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-4.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-5.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-6.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-7.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 3a1e1316732..5c3aad7ee44 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -83,6 +83,24 @@
   }
 )
 
+;; -
+;;  [INT,FP] permutation
+;; -
+;; This is the pattern permutes the vector
+;; -
+
+(define_expand "vec_perm"
+  [(match_operand:V 0 "register_operand")
+   (match_operand:V 1 "register_operand")
+   (match_operand:V 2 "register_operand")
+   (match_operand: 3

Re: [PATCH V2] VECT: Change flow of decrement IV

2023-05-31 Thread juzhe.zh...@rivai.ai

Bootstrapped and regression-tested on x86 with no unexpected differences.

Looking forward to Kewen's test report for this patch.

Thanks.


juzhe.zh...@rivai.ai
 
From: juzhe.zhong
Date: 2023-05-31 23:08
To: gcc-patches
CC: richard.sandiford; rguenther; linkw; Ju-Zhe Zhong
Subject: [PATCH V2] VECT: Change flow of decrement IV
From: Ju-Zhe Zhong 
 
Following Richi's suggestion, I changed the current decrement IV flow from:
 
do {
   remain -= MIN (vf, remain);
} while (remain != 0);
 
into:
 
do {
   old_remain = remain;
   len = MIN (vf, remain);
   remain -= vf;
} while (old_remain >= vf);
 
to enhance SCEV.
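
The two loop shapes can be compared with a small scalar model. This is only a
sketch under assumptions: the function names are hypothetical, remain is
assumed to be at least 1 on entry, and the exit test uses a strict ">" to
match the GT_EXPR built in the hunk below rather than the ">=" written above.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Old flow: the IV is decremented by the (variable) length just processed.
std::vector<uint64_t> lens_old(uint64_t remain, uint64_t vf) {
  std::vector<uint64_t> lens;
  do {
    uint64_t len = std::min(vf, remain);
    lens.push_back(len);
    remain -= len;
  } while (remain != 0);
  return lens;
}

// New flow: the IV is decremented by the invariant vf, and the exit test
// compares the value from before the decrement against vf.
std::vector<uint64_t> lens_new(uint64_t remain, uint64_t vf) {
  std::vector<uint64_t> lens;
  uint64_t old_remain;
  do {
    old_remain = remain;
    lens.push_back(std::min(vf, remain));
    remain -= vf;  // may wrap on the last iteration; unused after the exit
  } while (old_remain > vf);
  return lens;
}
```

Both functions produce the same sequence of lengths. The second form has an
invariant step, which is the property that is expected to help SCEV.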
 
Includes fixes from Kewen.
 
 
This patch will need to wait for Kewen's test feedback.
 
Testing on X86 is on-going
 
Co-Authored by: Kewen Lin  
 
gcc/ChangeLog:
 
* tree-vect-loop-manip.cc (vect_set_loop_controls_directly): Change 
decrement IV flow.
(vect_set_loop_condition_partial_vectors): Ditto.
 
---
gcc/tree-vect-loop-manip.cc | 36 +---
1 file changed, 25 insertions(+), 11 deletions(-)
 
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index acf3642ceb2..3f735945e67 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -483,7 +483,7 @@ vect_set_loop_controls_directly (class loop *loop, 
loop_vec_info loop_vinfo,
gimple_stmt_iterator loop_cond_gsi,
rgroup_controls *rgc, tree niters,
tree niters_skip, bool might_wrap_p,
- tree *iv_step)
+ tree *iv_step, tree *compare_step)
{
   tree compare_type = LOOP_VINFO_RGROUP_COMPARE_TYPE (loop_vinfo);
   tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
@@ -538,9 +538,9 @@ vect_set_loop_controls_directly (class loop *loop, 
loop_vec_info loop_vinfo,
   ...
   vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0);
   ...
-ivtmp_35 = ivtmp_9 - _36;
+ivtmp_35 = ivtmp_9 - POLY_INT_CST [4, 4];
   ...
-if (ivtmp_35 != 0)
+if (ivtmp_9 > POLY_INT_CST [4, 4])
 goto ; [83.33%]
   else
 goto ; [16.67%]
@@ -549,13 +549,15 @@ vect_set_loop_controls_directly (class loop *loop, 
loop_vec_info loop_vinfo,
   tree step = rgc->controls.length () == 1 ? rgc->controls[0]
   : make_ssa_name (iv_type);
   /* Create decrement IV.  */
-  create_iv (nitems_total, MINUS_EXPR, step, NULL_TREE, loop, _gsi,
- insert_after, _before_incr, _after_incr);
+  create_iv (nitems_total, MINUS_EXPR, nitems_step, NULL_TREE, loop,
+ _gsi, insert_after, _before_incr,
+ _after_incr);
   gimple_seq_add_stmt (header_seq, gimple_build_assign (step, MIN_EXPR,
index_before_incr,
nitems_step));
   *iv_step = step;
-  return index_after_incr;
+  *compare_step = nitems_step;
+  return index_before_incr;
 }
   /* Create increment IV.  */
@@ -825,6 +827,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
  arbitrarily pick the last.  */
   tree test_ctrl = NULL_TREE;
   tree iv_step = NULL_TREE;
+  tree compare_step = NULL_TREE;
   rgroup_controls *rgc;
   rgroup_controls *iv_rgc = nullptr;
   unsigned int i;
@@ -861,7 +864,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
_seq, _seq,
loop_cond_gsi, rgc, niters,
niters_skip, might_wrap_p,
- _step);
+ _step, _step);
iv_rgc = rgc;
  }
@@ -884,10 +887,21 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
   /* Get a boolean result that tells us whether to iterate.  */
   edge exit_edge = single_exit (loop);
-  tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? EQ_EXPR : NE_EXPR;
-  tree zero_ctrl = build_zero_cst (TREE_TYPE (test_ctrl));
-  gcond *cond_stmt = gimple_build_cond (code, test_ctrl, zero_ctrl,
- NULL_TREE, NULL_TREE);
+  gcond *cond_stmt;
+  if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
+{
+  gcc_assert (compare_step);
+  tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? LE_EXPR : 
GT_EXPR;
+  cond_stmt = gimple_build_cond (code, test_ctrl, compare_step, NULL_TREE,
+  NULL_TREE);
+}
+  else
+{
+  tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? EQ_EXPR : 
NE_EXPR;
+  tree zero_ctrl = build_zero_cst (TREE_TYPE (test_ctrl));
+  cond_stmt
+ = gimple_build_cond (code, test_ctrl, zero_ctrl, NULL_TREE, NULL_TREE);
+}
   gsi_insert_before (_cond_gsi, cond_stmt, GSI_SAME_STMT);
   /* The loop iterates (NITERS - 1) / VF + 1 times.
-- 
2.36.3

RE: [PATCH] RISC-V: Add RVV FRM enum for floating-point rounding mode intriniscs

2023-05-31 Thread Li, Pan2 via Gcc-patches

Committed now that the doc has been updated, thanks Jeff.

Pan

-----Original Message-----
From: Jeff Law  
Sent: Tuesday, May 30, 2023 1:03 AM
To: juzhe.zh...@rivai.ai; gcc-patches@gcc.gnu.org
Cc: kito.ch...@gmail.com; kito.ch...@sifive.com; pal...@dabbelt.com; 
pal...@rivosinc.com; rdapp@gmail.com; Li, Pan2 
Subject: Re: [PATCH] RISC-V: Add RVV FRM enum for floating-point rounding mode 
intriniscs



On 5/25/23 01:54, juzhe.zh...@rivai.ai wrote:
> From: Juzhe-Zhong 
> 
> gcc/ChangeLog:
> 
>  * config/riscv/riscv-vector-builtins.cc (register_frm): New function.
>  (DEF_RVV_FRM_ENUM): New macro.
>  (handle_pragma_vector): Add FRM enum
>  * config/riscv/riscv-vector-builtins.def (DEF_RVV_FRM_ENUM): New 
> macro.
>  (RNE): Ditto.
>  (RTZ): Ditto.
>  (RDN): Ditto.
>  (RUP): Ditto.
>  (RMM): Ditto.
> 
> gcc/testsuite/ChangeLog:
> 
>  * gcc.target/riscv/rvv/base/frm-1.c: New test.
OK
jeff

Re: [PATCH v4] tree-ssa-sink: Improve code sinking pass

2023-05-31 Thread Peter Bergner via Gcc-patches

This is not a review of the patch itself, but...

On 5/31/23 2:01 AM, Ajit Agarwal wrote:
> tree-ssa-sink: Improve code sinking pass
> 
> Code Sinking sinks the blocks after call.This increases register pressure
> for callee-saved registers. Improves code sinking before call in the use
> blocks or immediate dominator of use blocks.

I think the wording of your git log comment could be improved a little.
How about something like the following?

Currently, code sinking will sink code after function calls.  This increases
register pressure for callee-saved registers.  The following patch improves
code sinking by placing the sunk code before calls in the use block or in
the immediate dominator of the use blocks.
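
For readers of the log, a source-level picture of the two placements might
help. This is only a sketch with hypothetical names, not taken from the patch
or its testcases, and an optimizing compiler is free to reorder either
function; it just shows which values are live across the call in each shape.

```cpp
// Hypothetical callee standing in for any function call.
int counter = 0;
void callee() { ++counter; }

// Multiply placed after the call: both a and b stay live across callee().
int sunk_after_call(int a, int b, bool use) {
  if (use) {
    callee();
    int t = a * b;
    return t;
  }
  return 0;
}

// Multiply placed before the call: only t stays live across callee().
int sunk_before_call(int a, int b, bool use) {
  if (use) {
    int t = a * b;
    callee();
    return t;
  }
  return 0;
}
```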

> gcc/ChangeLog:
> 
>   * tree-ssa-sink.cc (statement_sink_location): Move statements before
>   calls.
>   (def_use_same_block): New function.
>   (select_best_block): Add heuristics to select the best blocks in the
>   immediate post dominator.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/tree-ssa/ssa-sink-20.c: New testcase.
>   * gcc.dg/tree-ssa/ssa-sink-21.c: New testcase.

Please don't forget to add "PR tree-optimization/81953" to both sections
of the ChangeLog entries.

Peter

Re: [PATCH] Fix PR 110042: ifcvt regression due to paradoxical subregs

2023-05-31 Thread Andrew Pinski via Gcc-patches

On Wed, May 31, 2023 at 12:29 AM Richard Biener via Gcc-patches
 wrote:
>
> On Wed, May 31, 2023 at 6:34 AM Andrew Pinski via Gcc-patches
>  wrote:
> >
> > After r14-1014-gc5df248509b489364c573e8, GCC started to emit
> > directly a zero_extract for `(t1&0x8)!=0`. This introduced
> > a small regression where ifcvt would not do the ifconversion
> > as there is now a paradoxical subreg in the dest which
> > was being rejected. Since paradoxical subreg set the whole
> > register, we can treat it as the same as a reg in the two places.
> >
> > OK? Bootstrapped and tested on x86_64-linux-gnu and aarch64-linux-gnu.
>
> OK I guess.   I vaguely remember SUBREG_PROMOTED_UNSIGNED_P
> applies to non-paradoxical subregs but I might be swapping things - maybe
> you remember better and whether that would cause any issues here?

So I looked into the history of the code in ifcvt.cc, this code was
added with r6-3071-ge65bf4e814d38c to accept more complex bb
(https://inbox.sourceware.org/gcc-patches/559fbb13.80...@arm.com/).
The thread where we start talking about subregs is located with Jeff's
email starting here:
https://inbox.sourceware.org/gcc-patches/55bbafac.5020...@redhat.com/ .

Jeff,
  I know Richard already approved this patch but could you provide a
second eye as you were involved reviewing the original code here and I
want to make sure I understood the code in a reasonable fashion?

Thanks,
Andrew

>
> Thanks,
> Richard.
>
> > gcc/ChangeLog:
> >
> > PR rtl-optimization/110042
> > * ifcvt.cc (bbs_ok_for_cmove_arith): Allow paradoxical subregs.
> > (bb_valid_for_noce_process_p): Strip the subreg for the SET_DEST.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR rtl-optimization/110042
> > * gcc.target/aarch64/csel_bfx_2.c: New test.
> > ---
> >  gcc/ifcvt.cc  | 14 ++
> >  gcc/testsuite/gcc.target/aarch64/csel_bfx_2.c | 27 +++
> >  2 files changed, 36 insertions(+), 5 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/csel_bfx_2.c
> >
> > diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
> > index 868eda93251..0b180b4568f 100644
> > --- a/gcc/ifcvt.cc
> > +++ b/gcc/ifcvt.cc
> > @@ -2022,7 +2022,7 @@ bbs_ok_for_cmove_arith (basic_block bb_a, basic_block 
> > bb_b, rtx to_rename)
> > }
> >
> >/* Make sure this is a REG and not some instance
> > -of ZERO_EXTRACT or SUBREG or other dangerous stuff.
> > +of ZERO_EXTRACT or non-paradoxical SUBREG or other dangerous stuff.
> >  If we have a memory destination then we have a pair of simple
> >  basic blocks performing an operation of the form [addr] = c ? a : 
> > b.
> >  bb_valid_for_noce_process_p will have ensured that these are
> > @@ -2030,7 +2030,8 @@ bbs_ok_for_cmove_arith (basic_block bb_a, basic_block 
> > bb_b, rtx to_rename)
> >  to be renamed.  Assert that the callers set this up properly.  */
> >if (MEM_P (SET_DEST (sset_b)))
> > gcc_assert (rtx_equal_p (SET_DEST (sset_b), to_rename));
> > -  else if (!REG_P (SET_DEST (sset_b)))
> > +  else if (!REG_P (SET_DEST (sset_b))
> > +  && !paradoxical_subreg_p (SET_DEST (sset_b)))
> > {
> >   BITMAP_FREE (bba_sets);
> >   return false;
> > @@ -3136,14 +3137,17 @@ bb_valid_for_noce_process_p (basic_block test_bb, 
> > rtx cond,
> >
> >   rtx sset = single_set (insn);
> >   gcc_assert (sset);
> > + rtx dest = SET_DEST (sset);
> > + if (SUBREG_P (dest))
> > +   dest = SUBREG_REG (dest);
> >
> >   if (contains_mem_rtx_p (SET_SRC (sset))
> > - || !REG_P (SET_DEST (sset))
> > - || reg_overlap_mentioned_p (SET_DEST (sset), cond))
> > + || !REG_P (dest)
> > + || reg_overlap_mentioned_p (dest, cond))
> > goto free_bitmap_and_fail;
> >
> >   potential_cost += pattern_cost (sset, speed_p);
> > - bitmap_set_bit (test_bb_temps, REGNO (SET_DEST (sset)));
> > + bitmap_set_bit (test_bb_temps, REGNO (dest));
> > }
> >  }
> >
> > diff --git a/gcc/testsuite/gcc.target/aarch64/csel_bfx_2.c 
> > b/gcc/testsuite/gcc.target/aarch64/csel_bfx_2.c
> > new file mode 100644
> > index 000..c3b8a6f45cc
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/csel_bfx_2.c
> > @@ -0,0 +1,27 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2" } */
> > +unsigned
> > +f1(int t, int t1)
> > +{
> > +  int tt = 0;
> > +  if(t)
> > +tt = (t1&0x8)!=0;
> > +  return tt;
> > +}
> > +struct f
> > +{
> > +  unsigned t:3;
> > +  unsigned t1:4;
> > +};
> > +unsigned
> > +f2(int t, struct f y)
> > +{
> > +  int tt = 0;
> > +  if(t)
> > +tt = y.t1;
> > +  return tt;
> > +}
> > +/* Both f1 and f2 should produce a csel and not a cbz on the argument. */
> > +/*  { dg-final { scan-assembler-times "csel\t" 2 } } */
> > +/*  { dg-final { scan-assembler-times "ubfx\t" 2 } } */
> >

[committed] libstdc++: Add separate autoconf macro for std::float_t and std::double_t [PR109818]

2023-05-31 Thread Jonathan Wakely via Gcc-patches

Tested x86_64-linux. Pushed to trunk.

-- >8 --

This should make it possible to use openlibm with djgpp (and other
targets with missing C99 <math.h> functions). The <math.h> from openlibm
provides all the functions, but not the float_t and double_t typedefs.
By separating the autoconf checks for the functions and the typedefs, we
don't disable support for all the functions just because those typedefs
are not present.
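
As a rough illustration of why the split helps (a sketch only, not the
actual libstdc++ header; the macro DEMO_HAVE_C99_FLT_EVAL_TYPES and the
namespace demo are made-up names), the typedef imports and the function
imports can be guarded independently:

```cpp
#include <math.h>
#include <cassert>

// Hypothetical stand-in for the configure result.  On a target whose
// <math.h> lacks float_t/double_t this would simply be left undefined.
#define DEMO_HAVE_C99_FLT_EVAL_TYPES 1

namespace demo
{
#ifdef DEMO_HAVE_C99_FLT_EVAL_TYPES
  // Import the typedefs only when the typedef check succeeded.
  using ::float_t;
  using ::double_t;
#endif

  // The functions are imported regardless of the typedef check.
  using ::acosh;
  using ::expm1;
}
```

With the macro undefined, demo::acosh and demo::expm1 would still be
available; only the two typedefs would be missing.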

libstdc++-v3/ChangeLog:

PR libstdc++/109818
* acinclude.m4 (GLIBCXX_ENABLE_C99): Add separate check for
float_t and double_t and define HAVE_C99_FLT_EVAL_TYPES.
* config.h.in: Regenerate.
* configure: Regenerate.
* include/c_global/cmath (float_t, double_t): Guard using new
_GLIBCXX_HAVE_C99_FLT_EVAL_TYPES macro.
---
 libstdc++-v3/acinclude.m4   | 21 ---
 libstdc++-v3/config.h.in|  4 +++
 libstdc++-v3/configure  | 41 ++---
 libstdc++-v3/include/c_global/cmath |  2 ++
 4 files changed, 62 insertions(+), 6 deletions(-)

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index 66194071b20..6ae141b8c20 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -1273,13 +1273,28 @@ AC_DEFUN([GLIBCXX_ENABLE_C99], [
 in  in namespace std for C++11.])
 fi
 
-# Check for the existence of  functions.
-AC_CACHE_CHECK([for ISO C99 function support for C++11 in ],
-glibcxx_cv_c99_math_funcs, [
+# Check for the existence of  typedefs.
+AC_CACHE_CHECK([for ISO C99 float types for C++11 in ],
+glibcxx_cv_c99_flt_eval_types, [
 AC_TRY_COMPILE([#include ],
   [// Types
typedef double_t  my_double_t;
typedef float_t   my_float_t;
+  ],
+  [glibcxx_cv_c99_flt_eval_types=yes],
+  [glibcxx_cv_c99_flt_eval_types=no])
+])
+if test x"$glibcxx_cv_c99_flt_eval_types" = x"yes"; then
+  AC_DEFINE(HAVE_C99_FLT_EVAL_TYPES, 1,
+   [Define if C99 float_t and double_t in  should be
+   imported in  in namespace std for C++11.])
+fi
+
+# Check for the existence of  functions.
+AC_CACHE_CHECK([for ISO C99 function support for C++11 in ],
+glibcxx_cv_c99_math_funcs, [
+AC_TRY_COMPILE([#include ],
+  [
// Hyperbolic
acosh(0.0);
acoshf(0.0f);
diff --git a/libstdc++-v3/config.h.in b/libstdc++-v3/config.h.in
index 4fbf5ef86b9..5a95853cbbe 100644
--- a/libstdc++-v3/config.h.in
+++ b/libstdc++-v3/config.h.in
@@ -42,6 +42,10 @@
 /* Define to 1 if you have the `at_quick_exit' function. */
 #undef HAVE_AT_QUICK_EXIT
 
+/* Define if C99 float_t and double_t in  should be imported in
+in namespace std for C++11. */
+#undef HAVE_C99_FLT_EVAL_TYPES
+
 /* Define to 1 if the target assembler supports thread-local storage. */
 #undef HAVE_CC_TLS
 
diff --git a/libstdc++-v3/configure b/libstdc++-v3/configure
index ba328a64be2..70d169cf64b 100755
--- a/libstdc++-v3/configure
+++ b/libstdc++-v3/configure
@@ -18275,6 +18275,43 @@ $as_echo "#define _GLIBCXX11_USE_C99_MATH 1" 
>>confdefs.h
 
 fi
 
+# Check for the existence of  typedefs.
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for ISO C99 float types 
for C++11 in " >&5
+$as_echo_n "checking for ISO C99 float types for C++11 in ... " >&6; }
+if ${glibcxx_cv_c99_flt_eval_types+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include 
+int
+main ()
+{
+// Types
+   typedef double_t  my_double_t;
+   typedef float_t   my_float_t;
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_cxx_try_compile "$LINENO"; then :
+  glibcxx_cv_c99_flt_eval_types=yes
+else
+  glibcxx_cv_c99_flt_eval_types=no
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: 
$glibcxx_cv_c99_flt_eval_types" >&5
+$as_echo "$glibcxx_cv_c99_flt_eval_types" >&6; }
+if test x"$glibcxx_cv_c99_flt_eval_types" = x"yes"; then
+
+$as_echo "#define HAVE_C99_FLT_EVAL_TYPES 1" >>confdefs.h
+
+fi
+
 # Check for the existence of  functions.
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for ISO C99 function 
support for C++11 in " >&5
 $as_echo_n "checking for ISO C99 function support for C++11 in ... " 
>&6; }
@@ -18288,9 +18325,7 @@ else
 int
 main ()
 {
-// Types
-   typedef double_t  my_double_t;
-   typedef float_t   my_float_t;
+
// Hyperbolic
acosh(0.0);
acoshf(0.0f);
diff --git a/libstdc++-v3/include/c_global/cmath 
b/libstdc++-v3/include/c_global/cmath
index c80ee7c8d72..b0ba395eb5c 100644
--- a/libstdc++-v3/include/c_global/cmath
+++ b/libstdc++-v3/include/c_global/cmath
@@ -1877,9 +1877,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

[committed] libstdc++: Stop using _GLIBCXX_USE_C99_MATH_TR1 in

2023-05-31 Thread Jonathan Wakely via Gcc-patches

Tested x86_64-linux. Pushed to trunk.

-- >8 --

Similar to the three commits r14-908, r14-909 and r14-910, the
_GLIBCXX_USE_C99_MATH_TR1 macro is misleading when it is also used for
, not only for  headers. It is also wrong, because the
configure checks for TR1 use -std=c++98 and a target might define the
C99 features for C++11 but not for C++98.

Add separate configure checks for the  functions using
-std=c++11 for the checks. Use the new macro defined by those checks in
the C++11-specific parts of , and in ,  etc.

The check that defines _GLIBCXX_NO_C99_ROUNDING_FUNCS is only needed for
the C++11  checks, so remove that from GLIBCXX_CHECK_C99_TR1 and
only do it for GLIBCXX_ENABLE_C99.

libstdc++-v3/ChangeLog:

* acinclude.m4 (GLIBCXX_ENABLE_C99): Add checks for C99 math
functions and define _GLIBCXX_USE_C99_MATH_FUNCS. Move checks
for C99 rounding functions to here.
(GLIBCXX_CHECK_C99_TR1): Remove checks for C99 rounding
functions from here.
* config.h.in: Regenerate.
* configure: Regenerate.
* include/bits/random.h: Use _GLIBCXX_USE_C99_MATH_FUNCS instead
of _GLIBCXX_USE_C99_MATH_TR1.
* include/bits/random.tcc: Likewise.
* include/c_compatibility/math.h: Likewise.
* include/c_global/cmath: Likewise.
* include/ext/random: Likewise.
* include/ext/random.tcc: Likewise.
* include/std/complex: Likewise.
* testsuite/20_util/from_chars/4.cc: Likewise.
* testsuite/20_util/from_chars/8.cc: Likewise.
* testsuite/26_numerics/complex/proj.cc: Likewise.
* testsuite/26_numerics/headers/cmath/60401.cc: Likewise.
* testsuite/26_numerics/headers/cmath/types_std_c++0x.cc:
Likewise.
* testsuite/lib/libstdc++.exp (check_v3_target_cstdint):
Likewise.
* testsuite/util/testsuite_random.h: Likewise.
---
 libstdc++-v3/acinclude.m4 | 183 +++--
 libstdc++-v3/config.h.in  |   8 +-
 libstdc++-v3/configure| 245 ++
 libstdc++-v3/include/bits/random.h|  12 +-
 libstdc++-v3/include/bits/random.tcc  |  14 +-
 libstdc++-v3/include/c_compatibility/math.h   |   4 +-
 libstdc++-v3/include/c_global/cmath   |   4 +-
 libstdc++-v3/include/ext/random   |   4 +-
 libstdc++-v3/include/ext/random.tcc   |   6 +-
 libstdc++-v3/include/std/complex  |   2 +-
 .../testsuite/20_util/from_chars/4.cc |   2 +-
 .../testsuite/20_util/from_chars/8.cc |   2 +-
 .../testsuite/26_numerics/complex/proj.cc |   4 +-
 .../26_numerics/headers/cmath/60401.cc|   2 +-
 .../headers/cmath/types_std_c++0x.cc  |   2 +-
 libstdc++-v3/testsuite/lib/libstdc++.exp  |   2 +-
 .../testsuite/util/testsuite_random.h |   6 +-
 17 files changed, 396 insertions(+), 106 deletions(-)

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index ca776974832..66194071b20 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -1245,8 +1245,8 @@ AC_DEFUN([GLIBCXX_ENABLE_C99], [
imported in  in namespace std in C++11.])
 fi
 
-# Check for the existence of  functions used if C99 is enabled.
-AC_CACHE_CHECK([for ISO C99 support in  for C++11],
+# Check for the existence of  generic macros used if C99 is 
enabled.
+AC_CACHE_CHECK([for ISO C99 generic macro support in  for C++11],
 glibcxx_cv_c99_math_cxx11, [
   GCC_TRY_COMPILE_OR_LINK(
 [#include 
@@ -1269,10 +1269,165 @@ AC_DEFUN([GLIBCXX_ENABLE_C99], [
 ])
 if test x"$glibcxx_cv_c99_math_cxx11" = x"yes"; then
   AC_DEFINE(_GLIBCXX11_USE_C99_MATH, 1,
-[Define if C99 functions or macros in  should be imported
+[Define if C99 generic macros in  should be imported
 in  in namespace std for C++11.])
 fi
 
+# Check for the existence of  functions.
+AC_CACHE_CHECK([for ISO C99 function support for C++11 in ],
+glibcxx_cv_c99_math_funcs, [
+AC_TRY_COMPILE([#include ],
+  [// Types
+   typedef double_t  my_double_t;
+   typedef float_t   my_float_t;
+   // Hyperbolic
+   acosh(0.0);
+   acoshf(0.0f);
+   acoshl(0.0l);
+   asinh(0.0);
+   asinhf(0.0f);
+   asinhl(0.0l);
+   atanh(0.0);
+   atanhf(0.0f);
+   atanhl(0.0l);
+   // Exponential and logarithmic
+   exp2(0.0);
+   exp2f(0.0f);
+   exp2l(0.0l);
+   expm1(0.0);
+   expm1f(0.0f);
+   expm1l(0.0l);
+   ilogb(0.0);
+   ilogbf(0.0f);
+   ilogbl(0.0l);
+   log1p(0.0);
+   log1pf(0.0f);

[committed] libstdc++: Express std::vector's size() <= capacity() invariant in code

2023-05-31 Thread Jonathan Wakely via Gcc-patches

Tested x86_64-linux. Pushed to trunk.

-- >8 --

This adds optimizer hints so that GCC knows that size() <= capacity() is
always true. This allows the compiler to optimize away re-allocating
paths when assigning new values to the vector without resizing it, e.g.,
vec.assign(vec.size(), new_val).
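
A minimal sketch of the idea, assuming a simplified three-pointer layout
(demo_buf and demo_fits are hypothetical names, not libstdc++ code):
states that can never occur are marked unreachable, so the optimizer is
free to assume the relation holds.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical miniature of a vector's three-pointer layout.
struct demo_buf
{
  int* start;
  int* finish;
  int* end_of_storage;

  void invariant() const
  {
#if defined(__GNUC__) && defined(__OPTIMIZE__)
    // Tell the optimizer that these states cannot happen.
    if (finish < start)
      __builtin_unreachable();
    if (finish > end_of_storage)
      __builtin_unreachable();
#endif
  }

  std::size_t size() const
  { invariant(); return std::size_t(finish - start); }

  std::size_t capacity() const
  { invariant(); return std::size_t(end_of_storage - start); }
};

// With the hints in place a compiler may fold this comparison to true;
// whether it does so depends on the optimization level.
inline bool demo_fits(const demo_buf& b)
{ return b.size() <= b.capacity(); }
```

The hints are only emitted when optimizing, so unoptimized builds pay
nothing for them.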

libstdc++-v3/ChangeLog:

* include/bits/stl_vector.h (_Vector_base::_M_invariant()): New
function.
(vector::size(), vector::capacity()): Call _M_invariant().
* testsuite/23_containers/vector/capacity/invariant.cc: New test.
* testsuite/23_containers/vector/types/1.cc: Add suppression for
false positive warning (PR110060).
---
 libstdc++-v3/include/bits/stl_vector.h| 30 +--
 .../vector/capacity/invariant.cc  | 16 ++
 .../testsuite/23_containers/vector/types/1.cc |  2 +-
 3 files changed, 44 insertions(+), 4 deletions(-)
 create mode 100644 
libstdc++-v3/testsuite/23_containers/vector/capacity/invariant.cc

diff --git a/libstdc++-v3/include/bits/stl_vector.h 
b/libstdc++-v3/include/bits/stl_vector.h
index acb29396d26..e593be443bc 100644
--- a/libstdc++-v3/include/bits/stl_vector.h
+++ b/libstdc++-v3/include/bits/stl_vector.h
@@ -388,6 +388,24 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   }
 
 protected:
+
+  __attribute__((__always_inline__))
+  _GLIBCXX20_CONSTEXPR void
+  _M_invariant() const
+  {
+#if __OPTIMIZE__
+   if (this->_M_impl._M_finish < this->_M_impl._M_start)
+ __builtin_unreachable();
+   if (this->_M_impl._M_finish > this->_M_impl._M_end_of_storage)
+ __builtin_unreachable();
+
+   size_t __sz = this->_M_impl._M_finish - this->_M_impl._M_start;
+   size_t __cap = this->_M_impl._M_end_of_storage - this->_M_impl._M_start;
+   if (__sz > __cap)
+ __builtin_unreachable();
+#endif
+  }
+
   _GLIBCXX20_CONSTEXPR
   void
   _M_create_storage(size_t __n)
@@ -987,7 +1005,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   _GLIBCXX_NODISCARD _GLIBCXX20_CONSTEXPR
   size_type
   size() const _GLIBCXX_NOEXCEPT
-  { return size_type(this->_M_impl._M_finish - this->_M_impl._M_start); }
+  {
+   _Base::_M_invariant();
+   return size_type(this->_M_impl._M_finish - this->_M_impl._M_start);
+  }
 
   /**  Returns the size() of the largest possible %vector.  */
   _GLIBCXX_NODISCARD _GLIBCXX20_CONSTEXPR
@@ -1073,8 +1094,11 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   _GLIBCXX_NODISCARD _GLIBCXX20_CONSTEXPR
   size_type
   capacity() const _GLIBCXX_NOEXCEPT
-  { return size_type(this->_M_impl._M_end_of_storage
-- this->_M_impl._M_start); }
+  {
+   _Base::_M_invariant();
+   return size_type(this->_M_impl._M_end_of_storage
+  - this->_M_impl._M_start);
+  }
 
   /**
*  Returns true if the %vector is empty.  (Thus begin() would
diff --git a/libstdc++-v3/testsuite/23_containers/vector/capacity/invariant.cc 
b/libstdc++-v3/testsuite/23_containers/vector/capacity/invariant.cc
new file mode 100644
index 000..d68db694add
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/vector/capacity/invariant.cc
@@ -0,0 +1,16 @@
+// { dg-do compile }
+// { dg-options "-O3 -g0" }
+// { dg-final { scan-assembler-not "_Znw" } }
+// GCC should be able to optimize away the paths involving reallocation.
+
+#include 
+
+void fill(std::vector& vec)
+{
+  vec.assign(vec.size(), 0);
+}
+
+void fill_val(std::vector& vec, int i)
+{
+  vec.assign(vec.size(), i);
+}
diff --git a/libstdc++-v3/testsuite/23_containers/vector/types/1.cc 
b/libstdc++-v3/testsuite/23_containers/vector/types/1.cc
index 079e5af9556..9be07d9fd5c 100644
--- a/libstdc++-v3/testsuite/23_containers/vector/types/1.cc
+++ b/libstdc++-v3/testsuite/23_containers/vector/types/1.cc
@@ -18,7 +18,7 @@
 // .
 
 // { dg-do compile }
-// { dg-options "-Wno-unused-result" }
+// { dg-options "-Wno-unused-result -Wno-stringop-overread" }
 
 #include 
 #include 
-- 
2.40.1

Re: Build-break in libstdc++-v3 at r14-1442-ge1240bda3e0bb1 for non-float128 targets

2023-05-31 Thread Jonathan Wakely via Gcc-patches

On Wed, 31 May 2023 at 16:32, Jonathan Wakely  wrote:

>
>
> On Wed, 31 May 2023 at 16:29, Hans-Peter Nilsson via Libstdc++ <
> libstd...@gcc.gnu.org> wrote:
>
>> Since I don't see a quick fix at r14-1444-g3f4853a5f00fab, I
>> thought I'd better notify the author (I have written authors
>> if there was more than one ;-) of suspect commits in the
>> range r14-1425-g80ee7d02e8db48..e1240bda3e0b for the
>> build-break at r14-1442-ge1240bda3e0bb1 for cris-elf, where
>> I get:
>>
>> /x/gcc/libstdc++-v3/src/c++17/floating_from_chars.cc:1330:47: error:
>> '_Float128' is not supported on this target
>>  1330 | from_chars(const char* first, const char* last, _Float128& value,
>>   |   ^
>>
>
> Sorry, I'll fix or revert it today.
>

It should be fixed at r14-1451-ga239a35075ffd8


>
>
>> /x/gcc/libstdc++-v3/src/c++17/floating_from_chars.cc:1330:49: error:
>> expected identifier before '_Float128'
>>  1330 | from_chars(const char* first, const char* last, _Float128& value,
>>   | ^
>> /x/gcc/libstdc++-v3/src/c++17/floating_from_chars.cc:1330:49: error:
>> '_Float128' is not supported on this target
>> /x/gcc/libstdc++-v3/src/c++17/floating_from_chars.cc:1330:49: error:
>> expected ',' or '...' before '_Float128'
>> /x/gcc/libstdc++-v3/src/c++17/floating_from_chars.cc: In function
>> 'std::from_chars_result std::from_chars(const char*, const char*, int)':
>> /x/gcc/libstdc++-v3/src/c++17/floating_from_chars.cc:1340:53: error:
>> 'fmt' was not declared in this scope; did you mean 'fma'?
>>  1340 |   auto res = std::from_chars(first, last, ldbl_val, fmt);
>>   | ^~~
>>   | fma
>> /x/gcc/libstdc++-v3/src/c++17/floating_from_chars.cc:1342:5: error:
>> 'value' was not declared in this scope
>>  1342 | value = ldbl_val;
>>   | ^
>> make[5]: *** [Makefile:587: floating_from_chars.lo] Error 1
>>
>> brgds, H-P
>>
>>

[committed] libstdc++: Fix build for targets without _Float128 [PR109921]

2023-05-31 Thread Jonathan Wakely via Gcc-patches

Tested x86_64-linux. Built msp430-elf and cris-elf. Pushed to trunk.

-- >8 --

My r14-1431-g7037e7b6e4ac41 change caused the _Float128 overload to be
compiled unconditionally, by moving the USE_STRTOF128_FOR_FROM_CHARS
check into the function body. That function should still only be
compiled if the target actually supports _Float128.
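
The guard can be sketched as follows (demo_float128_mant_dig is a
made-up helper for illustration; the sketch deliberately uses only the
predefined macro and not the _Float128 type itself, since not every C++
compiler accepts that keyword):

```cpp
#include <cassert>

// Hypothetical helper: reports the _Float128 mantissa width when the
// compiler advertises the type, or 0 when it does not.
#if defined(__FLT128_MANT_DIG__)
inline int demo_float128_mant_dig() { return __FLT128_MANT_DIG__; }
#else
inline int demo_float128_mant_dig() { return 0; }
#endif
```

On a target without _Float128 the macro is absent, so the second branch
is the one that gets compiled and nothing mentions the unsupported type.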

libstdc++-v3/ChangeLog:

PR libstdc++/109921
* src/c++17/floating_from_chars.cc: Check __FLT128_MANT_DIG__ is
defined before trying to use _Float128.
---
 libstdc++-v3/src/c++17/floating_from_chars.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/src/c++17/floating_from_chars.cc 
b/libstdc++-v3/src/c++17/floating_from_chars.cc
index eea878072b0..f1dd1037bf3 100644
--- a/libstdc++-v3/src/c++17/floating_from_chars.cc
+++ b/libstdc++-v3/src/c++17/floating_from_chars.cc
@@ -1325,7 +1325,7 @@ _ZSt10from_charsPKcS0_RDF128_St12chars_format(const char* 
first,
  __ieee128& value,
  chars_format fmt) noexcept
 __attribute__((alias ("_ZSt10from_charsPKcS0_Ru9__ieee128St12chars_format")));
-#else
+#elif defined(__FLT128_MANT_DIG__)
 from_chars_result
 from_chars(const char* first, const char* last, _Float128& value,
   chars_format fmt) noexcept
-- 
2.40.1

[committed] libstdc++: Fix configure test for 32-bit targets

2023-05-31 Thread Jonathan Wakely via Gcc-patches

Tested x86_64-linux. Built msp430-elf and cris-elf. Pushed to trunk.

-- >8 --

The -mlarge model for msp430-elf uses 20-bit pointers, which means that
sizeof(void*) == 4 and so the r14-1432-g51cf0b3949b88b change gives the
wrong answer. Check __INTPTR_WIDTH__ >= 32 instead.
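
To make the difference concrete, here is a small sketch (the demo_*
names are hypothetical, and the #else branch is only an assumed fallback
for compilers that lack the macro): sizeof reports storage size, while
__INTPTR_WIDTH__ reports the number of value bits.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// What the earlier check effectively asked: storage size in bytes.
constexpr bool demo_by_sizeof = sizeof(void*) >= 4;

#ifdef __INTPTR_WIDTH__
// What the fixed check asks: number of value bits in intptr_t.
constexpr bool demo_by_width = __INTPTR_WIDTH__ >= 32;
#else
// Assumed fallback for compilers without the macro; this again measures
// storage size, so it cannot tell 20-bit pointers from 32-bit ones.
constexpr bool demo_by_width = sizeof(std::intptr_t) * CHAR_BIT >= 32;
#endif
```

For the msp430-elf -mlarge case described above, the first constant
would be true while the second is expected to be false.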

libstdc++-v3/ChangeLog:

* acinclude.m4 (GLIBCXX_ZONEINFO_DIR): Fix for 32-bit pointers
to check __INTPTR_WIDTH__ instead of sizeof(void*).
* configure: Regenerate.
---
 libstdc++-v3/acinclude.m4 | 2 +-
 libstdc++-v3/configure| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index eb30c4f00a5..ca776974832 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -5427,7 +5427,7 @@ AC_DEFUN([GLIBCXX_ZONEINFO_DIR], [
;;
 esac
 
-AC_COMPUTE_INT(glibcxx_cv_at_least_32bit, [sizeof(void*) >= 4])
+AC_COMPUTE_INT(glibcxx_cv_at_least_32bit, [__INTPTR_WIDTH__ >= 32])
 if test "$glibcxx_cv_at_least_32bit" -ne 0; then
   # Also embed a copy of the tzdata.zi file as a static string.
   embed_zoneinfo=yes
diff --git a/libstdc++-v3/configure b/libstdc++-v3/configure
index 4db8a284083..f573dfced2e 100755
--- a/libstdc++-v3/configure
+++ b/libstdc++-v3/configure
@@ -71904,7 +71904,7 @@ fi
;;
 esac
 
-if ac_fn_c_compute_int "$LINENO" "sizeof(void*) >= 4" 
"glibcxx_cv_at_least_32bit"""; then :
+if ac_fn_c_compute_int "$LINENO" "__INTPTR_WIDTH__ >= 32" 
"glibcxx_cv_at_least_32bit"""; then :
 
 fi
 
-- 
2.40.1

Re: [PATCH] RISC-V: Support RVV permutation auto-vectorization

2023-05-31 Thread Robin Dapp via Gcc-patches

Hi Juzhe,

thanks, this looks pretty comprehensive already.

> +(define_expand "vec_perm"
> +  [(match_operand:V 0 "register_operand")
> +   (match_operand:V 1 "register_operand")
> +   (match_operand:V 2 "register_operand")
> +   (match_operand: 3 "vector_perm_operand")]
> +  "TARGET_VECTOR && GET_MODE_NUNITS (mode).is_constant ()"
> +  {
> +riscv_vector::expand_vec_perm (operands[0], operands[1],
> +operands[2], operands[3]);
> +DONE;
> +  }
> +)

IMHO this would be the perfect use of expand_vec_perm (operands)
instead of the individual arguments (also the way aarch64 does it)
but your call in the end.

> +/* Return true if VEC is a constant in which every element is in the range
> +   [MINVAL, MAXVAL].  The elements do not need to have the same value.  */
> +
> +static bool
> +const_vec_all_in_range_p (rtx vec, HOST_WIDE_INT minval, HOST_WIDE_INT 
> maxval)
> +{
> +  if (!CONST_VECTOR_P (vec)
> +  || GET_MODE_CLASS (GET_MODE (vec)) != MODE_VECTOR_INT)
> +return false;
> +
> +  int nunits;
> +  if (!CONST_VECTOR_STEPPED_P (vec))
> +nunits = const_vector_encoded_nelts (vec);
> +  else if (!CONST_VECTOR_NUNITS (vec).is_constant ())
> +return false;
> +
> +  for (int i = 0; i < nunits; i++)
> +{
> +  rtx vec_elem = CONST_VECTOR_ELT (vec, i);
> +  if (!CONST_INT_P (vec_elem)
> +   || !IN_RANGE (INTVAL (vec_elem), minval, maxval))
> + return false;
> +}
> +  return true;
> +}
> +
> +/* Return a const_int vector of VAL.  */
> +
> +static rtx
> +gen_const_vector_dup (machine_mode mode, HOST_WIDE_INT val)
> +{
> +  rtx c = gen_int_mode (val, GET_MODE_INNER (mode));
> +  return gen_const_vec_duplicate (mode, c);
> +}
> +

Both functions are taken from aarch64.  I would suggest adding
comments in that respect in case somebody wants to unify that in the
middle-end at some point (and wonders if they really do the same thing).

> +/* This function emits VLMAX vrgather instruction. Emit vrgather.vx/vi when 
> sel
> +   is a const duplicate vector. Otherwise, emit vrgather.vv.  */
> +static void
> +emit_vlmax_gather_insn (rtx target, rtx op, rtx sel)
> +{
> +  rtx elt;
> +  insn_code icode;
> +  machine_mode data_mode = GET_MODE (target);
> +  if (const_vec_duplicate_p (sel, ))
> +{
> +  icode = code_for_pred_gather_scalar (data_mode);
> +  sel = elt;
> +}
> +  else
> +icode = code_for_pred_gather (data_mode);
> +  rtx ops[] = {target, op, sel};
> +  emit_vlmax_insn (icode, RVV_BINOP, ops);
> +}
> +
> +static void
> +emit_vlmax_masked_gather_mu_insn (rtx target, rtx op, rtx sel, rtx mask)
> +{
> +  rtx elt;
> +  insn_code icode;
> +  machine_mode data_mode = GET_MODE (target);
> +  if (const_vec_duplicate_p (sel, ))
> +{
> +  icode = code_for_pred_gather_scalar (data_mode);
> +  sel = elt;
> +}
> +  else
> +icode = code_for_pred_gather (data_mode);
> +  rtx ops[] = {target, mask, target, op, sel};
> +  emit_vlmax_masked_mu_insn (icode, RVV_BINOP_MU, ops);
> +}
> +
> +/* Implement vec_perm.  */
> +
> +void
> +expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel)
> +{
> +  machine_mode data_mode = GET_MODE (target);
> +  machine_mode sel_mode = GET_MODE (sel);
> +  /* Enforced by the pattern condition.  */

This is supposed to refer to the const-ness of sel_mode?

> +  /* Note: vec_perm indices are supposed to wrap when they go beyond the
> + size of the two value vectors, i.e. the upper bits of the indices
> + are effectively ignored.  RVV vrgather instead produces 0 for any
> + out-of-range indices, so we need to modulo all the vec_perm indices
> + to ensure they are all in range.  */
> +  int nunits = GET_MODE_NUNITS (sel_mode).to_constant ();

Ah, saw it in the aarch64 version, it makes more sense when arranged the same
way they did it so I'd suggest using their comment/code order.  Where is the
modulo actually happening, though?

> +  /* Check if the two values vectors are the same.  */
> +  if (rtx_equal_p (op0, op1) || const_vec_duplicate_p (sel))
> +{
> +  rtx max_sel = gen_const_vector_dup (sel_mode, nunits - 1);
> +  rtx sel_mod = expand_simple_binop (sel_mode, AND, sel, max_sel, NULL, 
> 0,
> +  OPTAB_DIRECT);

Ah, here.  But why not also further down?  I mean it gets clearer when
reading the whole function a second time but a bit of a strategy comment
upfront would help clear up misunderstandings.

Just thinking out loud as I read it:

> +  rtx max_sel = gen_const_vector_dup (sel_mode, nunits);
> +
> +  /* Step 1: generate a mask that should select second vector.  */
> +  expand_vec_cmp (mask, GEU, sel, max_sel);

So we select everything >= nunits into the mask.

> +  /* Step2: gather a intermediate result for index < nunits,
> + we don't need to care about the result of the element
> + whose index >= nunits.  */
> +  emit_vlmax_gather_insn (target, op0, sel);

Then gather every op0 indexed by sel into target.

Re: [PATCH] jump: Change return type of predicate functions from int to bool

2023-05-31 Thread Bernhard Reutner-Fischer via Gcc-patches

On Wed, 31 May 2023 09:40:24 +0200
Uros Bizjak via Gcc-patches  wrote:

> On Wed, May 31, 2023 at 9:17 AM Richard Biener
>  wrote:

> > Do we have a diagnostic that would point out places we
> > assign the bool result to an integer variable?  Do we want
> > to change those places as well (did you intend to or restrict
> > the changes to functions only used in conditional context?)  
> 
> FWIW, I'm going through candidate files by hand, looking for predicate
> functions that return 0/1. The candidate files are the ones mentioned
> in rtl.h. In addition, I am doing some drive-by cleanups in candidate
> files.

I've scratched
https://inbox.sourceware.org/gcc-patches/20221112234543.95441-5-al...@gcc.gnu.org/
https://inbox.sourceware.org/gcc-patches/20221112234543.95441-6-al...@gcc.gnu.org/

to generate patches.
You had to manually adjust the declarations to match the patched
definitions, and I did not change the type of local variables feeding
into the return automatically.

But it helped find some low hanging fruit quickly.
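
For illustration, the kind of change meant here looks roughly like this
(a made-up predicate, not taken from jump.cc), including the local
variable feeding the return that needed adjusting by hand:

```cpp
#include <cassert>

// Before: predicate returning 0/1 as int, with an int local.
static int
demo_positive_p_old (int x)
{
  int ret = 0;
  if (x > 0)
    ret = 1;
  return ret;
}

// After: the return type and the local feeding it are both bool.
static bool
demo_positive_p_new (int x)
{
  bool ret = false;
  if (x > 0)
    ret = true;
  return ret;
}
```

Callers that only test the result in a condition are unaffected by the
change.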

HTH and cheers,

[Patch] OpenMP/Fortran: Permit pure directives inside PURE

2023-05-31 Thread Tobias Burnus


I intend to commit the attached patch soon.

However, I want to give anyone the chance to comment on any aspect before
committing. Comments after the commit are welcome as well :-)

OpenMP 5.2 now assigns properties to directives and clauses, and "pure" is among those properties.

Note that pure-2.f90 also contains stubs for directives only added in TR11 or
TR12, to reduce the chance of missing those once they get implemented.
Additionally, 'scan' has been 'pure' only since very recently, which I read
as a bug fix; hence, it is accepted with the attached patch.

Tobias
-
OpenMP/Fortran: Permit pure directives inside PURE

Update the set of permitted directives to match those marked as pure in
OpenMP 5.2. To ensure that the list stays up to date, unimplemented
directives are placed into pure-2.f90 such that the test FAILs once a
directive known to be pure is implemented without handling its pureness.

gcc/fortran/ChangeLog:

	* parse.cc (decode_omp_directive): Accept all pure directives
	inside a PURE procedure; handle 'error at(execution)'.

libgomp/ChangeLog:

	* libgomp.texi (OpenMP 5.2): Mark pure-directive handling as 'Y'.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/nothing-2.f90: Remove one dg-error.
	* gfortran.dg/gomp/pr79154-2.f90: Update expected dg-error wording.
	* gfortran.dg/gomp/pr79154-simd.f90: Likewise.
	* gfortran.dg/gomp/pure-1.f90: New test.
	* gfortran.dg/gomp/pure-2.f90: New test.
	* gfortran.dg/gomp/pure-3.f90: New test.
	* gfortran.dg/gomp/pure-4.f90: New test.

 gcc/fortran/parse.cc| 50 +-
 gcc/testsuite/gfortran.dg/gomp/nothing-2.f90|  2 +-
 gcc/testsuite/gfortran.dg/gomp/pr79154-2.f90| 24 +++
 gcc/testsuite/gfortran.dg/gomp/pr79154-simd.f90 |  2 +-
 gcc/testsuite/gfortran.dg/gomp/pure-1.f90   | 88 +
 gcc/testsuite/gfortran.dg/gomp/pure-2.f90   | 73 
 gcc/testsuite/gfortran.dg/gomp/pure-3.f90   | 31 +
 gcc/testsuite/gfortran.dg/gomp/pure-4.f90   | 35 ++
 libgomp/libgomp.texi|  2 +-
 9 files changed, 277 insertions(+), 30 deletions(-)

diff --git a/gcc/fortran/parse.cc b/gcc/fortran/parse.cc
index 9730ab095e2..733294c8cfa 100644
--- a/gcc/fortran/parse.cc
+++ b/gcc/fortran/parse.cc
@@ -934,7 +934,16 @@ decode_omp_directive (void)
  first (those also shall not turn off implicit pure).  */
   switch (c)
 {
+case 'a':
+  /* For -fopenmp-simd, ignore 'assumes'; note no clause starts with 's'. */
+  if (!flag_openmp && gfc_match ("assumes") == MATCH_YES)
+	break;
+  matcho ("assumes", gfc_match_omp_assumes, ST_OMP_ASSUMES);
+  matchs ("assume", gfc_match_omp_assume, ST_OMP_ASSUME);
+  break;
 case 'd':
+  matchds ("declare reduction", gfc_match_omp_declare_reduction,
+	   ST_OMP_DECLARE_REDUCTION);
   matchds ("declare simd", gfc_match_omp_declare_simd,
 	   ST_OMP_DECLARE_SIMD);
   matchdo ("declare target", gfc_match_omp_declare_target,
@@ -942,16 +951,25 @@ decode_omp_directive (void)
   matchdo ("declare variant", gfc_match_omp_declare_variant,
 	   ST_OMP_DECLARE_VARIANT);
   break;
+case 'e':
+  matchs ("end assume", gfc_match_omp_eos_error, ST_OMP_END_ASSUME);
+  matchs ("end simd", gfc_match_omp_eos_error, ST_OMP_END_SIMD);
+  matcho ("error", gfc_match_omp_error, ST_OMP_ERROR);
+  break;
 case 's':
+  matchs ("scan", gfc_match_omp_scan, ST_OMP_SCAN);
   matchs ("simd", gfc_match_omp_simd, ST_OMP_SIMD);
   break;
+case 'n':
+  matcho ("nothing", gfc_match_omp_nothing, ST_NONE);
+  break;
 }
 
   pure_ok = false;
   if (flag_openmp && gfc_pure (NULL))
 {
-  gfc_error_now ("OpenMP directives other than SIMD or DECLARE TARGET "
-		 "at %C may not appear in PURE procedures");
+  gfc_error_now ("OpenMP directive at %C is not pure and thus may not "
+		 "appear in a PURE procedure");
   gfc_error_recovery ();
   return ST_NONE;
 }
@@ -967,11 +985,6 @@ decode_omp_directive (void)
   else
 	matcho ("allocate", gfc_match_omp_allocate, ST_OMP_ALLOCATE);
   matcho ("allocators", gfc_match_omp_allocators, ST_OMP_ALLOCATORS);
-  /* For -fopenmp-simd, ignore 'assumes'; note no clause starts with 's'. */
-  if (!flag_openmp && gfc_match ("assumes") == MATCH_YES)
-	break;
-  matcho ("assumes", gfc_match_omp_assumes, ST_OMP_ASSUMES);
-  matchs ("assume", gfc_match_omp_assume, ST_OMP_ASSUME);
   matcho ("atomic", gfc_match_omp_atomic, ST_OMP_ATOMIC);
   break;
 case 'b':
@@ -984,8 +997,6 @@ decode_omp_directive (void)
   matcho ("critical", gfc_match_omp_critical, ST_OMP_CRITICAL);
   break;
 case 'd':
-  matchds ("declare

Re: [PATCH v4] tree-ssa-sink: Improve code sinking pass

2023-05-31 Thread Bernhard Reutner-Fischer via Gcc-patches

Hi!

On Wed, 31 May 2023 12:31:35 +0530
Ajit Agarwal via Gcc-patches  wrote:

> Hello All:
> 
> This patch improves code sinking pass to sink statements before call to reduce
> register pressure.
> Review comments are incorporated.
> 
> For example :
> 
> void bar();
> int j;
> void foo(int a, int b, int c, int d, int e, int f)
> {
>   int l;
>   l = a + b + c + d +e + f;
>   if (a != 5)
> {
>   bar();
>   j = l;
> }
> }
> 
> Code Sinking does the following:
> 
> void bar();
> int j;
> void foo(int a, int b, int c, int d, int e, int f)
> {
>   int l;
>   
>   if (a != 5)
> {
>   l = a + b + c + d +e + f; 
>   bar();
>   j = l;
> }
> }
> 
> Bootstrapped regtested on powerpc64-linux-gnu.
> 
> Thanks & Regards
> Ajit
> 
> tree-ssa-sink: Improve code sinking pass
> 
> Code Sinking sinks the blocks after call.This increases register pressure
> for callee-saved registers. Improves code sinking before call in the use
> blocks or immediate dominator of use blocks.
> 
> 2023-05-24  Ajit Kumar Agarwal  
> 
> gcc/ChangeLog:
> 
>   * tree-ssa-sink.cc (statement_sink_location): Move statements before
>   calls.
>   (def_use_same_block): New function.
>   (select_best_block): Add heuristics to select the best blocks in the
>   immediate post dominator.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/tree-ssa/ssa-sink-20.c: New testcase.
>   * gcc.dg/tree-ssa/ssa-sink-21.c: New testcase.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c | 15 +
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c | 19 ++
>  gcc/tree-ssa-sink.cc| 74 +
>  3 files changed, 96 insertions(+), 12 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c
> new file mode 100644
> index 000..49d5019ab93
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized -fdump-tree-sink-stats" } */

You don't test the optimized dump, do you?

> +void bar();
> +int j;
> +void foo(int a, int b, int c, int d, int e, int f)
> +{
> +  int l;
> +  l = a + b + c + d +e + f;
> +  if (a != 5)
> +{
> +  bar();
> +  j = l;
> +}
> +}
> +/* { dg-final { scan-tree-dump 
> {l_12\s+=\s+_4\s+\+\s+f_11\(D\);\n\s+bar\s+\(\)} sink1 } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
> new file mode 100644
> index 000..84e7938c54f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-sink-stats" } */
> +void bar();
> +int j, x;
> +void foo(int a, int b, int c, int d, int e, int f)
> +{
> +  int l;
> +  l = a + b + c + d +e + f;
> +  if (a != 5)
> +{
> +  bar();
> +  if (b != 3)
> +x = 3;
> +  else
> +x = 5;
> +  j = l;
> +}
> +}
> +/* { dg-final { scan-tree-dump 
> {l_13\s+=\s+_4\s+\+\s+f_12\(D\);\n\s+bar\s+\(\)} sink1 } } */
> diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
> index b1ba7a2ad6c..ee8988bbb2c 100644
> --- a/gcc/tree-ssa-sink.cc
> +++ b/gcc/tree-ssa-sink.cc
> @@ -171,9 +171,28 @@ nearest_common_dominator_of_uses (def_operand_p def_p, 
> bool *debug_stmts)
>return commondom;
>  }
>  
> +/* Return TRUE if immediate uses of the defs in
> +   STMT occur in the same block as STMT, FALSE otherwise.  */
> +
> +bool
> +def_use_same_block (gimple *stmt)

Looks like this function should be static.

> +{
> +  def_operand_p def;
> +  ssa_op_iter iter;
> +
> +  FOR_EACH_SSA_DEF_OPERAND (def, stmt, iter, SSA_OP_DEF)
> +    {
> +      gimple *def_stmt = SSA_NAME_DEF_STMT (DEF_FROM_PTR (def));
> +      if ((gimple_bb (def_stmt) == gimple_bb (stmt)))
> +        return true;
> +    }
> +  return false;
> +}
> +
>  /* Given EARLY_BB and LATE_BB, two blocks in a path through the dominator
> tree, return the best basic block between them (inclusive) to place
> -   statements.
> +   statements. The best basic block should be in immediate dominator of
> +   best basic block if the use stmt is after the call.

s/in/an/ ?

>  
> We want the most control dependent block in the shallowest loop nest.
>  
> @@ -190,7 +209,8 @@ nearest_common_dominator_of_uses (def_operand_p def_p, 
> bool *debug_stmts)
>  static basic_block
>  select_best_block (basic_block early_bb,
>                     basic_block late_bb,
> -                   gimple *stmt)
> +                   gimple *stmt,
> +                   gimple *use)
>  {
>    basic_block best_bb = late_bb;
>    basic_block temp_bb = late_bb;
> @@ -230,14 +250,46 @@ select_best_block (basic_block early_bb,
>        if (threshold > 100)
>          threshold = 100;
>      }
> -

superfluous whitespace change?
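
For readers following the two new tests: both scan-tree-dump patterns expect
the final addition into l to sit inside the "if (a != 5)" block, directly
before the call to bar ().  A source-level sketch of that before/after shape
follows.  It is hand-written for illustration only: foo_original, foo_sunk and
the counting bar are made-up names, and the real transform works on GIMPLE,
not on source.

```cpp
// Stand-in for the opaque call in the tests; it only counts invocations.
int bar_calls = 0;
void bar() { ++bar_calls; }

int j = 0;

// Shape of the testcase: the sum is computed before the branch,
// even on the path where it is never used.
void foo_original(int a, int b, int c, int d, int e, int f)
{
  int l = a + b + c + d + e + f;
  if (a != 5)
    {
      bar();
      j = l;
    }
}

// Hand-sunk equivalent: the sum is only computed on the path that
// uses it, and it is placed before the call.
void foo_sunk(int a, int b, int c, int d, int e, int f)
{
  if (a != 5)
    {
      int l = a + b + c + d + e + f;
      bar();
      j = l;
    }
}
```

Both versions store the same value to j whenever the branch is taken; the
sunk one just skips the additions when a == 5.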

Re: [PATCH] PR52665 do not let .ident confuse assembler scan tests

2023-05-31 Thread Bernhard Reutner-Fischer via Gcc-patches

On Wed, 5 Sep 2018 17:32:04 +0200
Bernhard Reutner-Fischer  wrote:

> On Tue, 21 Jun 2016 at 00:19, Jeff Law  wrote:
> >
> > On 06/18/2016 01:31 PM, Bernhard Reutner-Fischer wrote:  

> > > gcc/testsuite/ChangeLog
> > >
> > > 2016-06-18  Bernhard Reutner-Fischer  
> > >
> > >   PR testsuite/52665
> > >   * lib/gcc-dg.exp (gcc-dg-test-1): Iterate over _required_options.
> > >   * lib/target-supports.exp (scan-assembler_required_options,
> > >   scan-assembler-not_required_options,
> > >   scan-assembler-times_required_options): Add -fno-ident.
> > >   * lib/scanasm.exp (scan-assembler-times): Fix error message.
> > >   * c-c++-common/ident-0a.c: New test.
> > >   * c-c++-common/ident-0b.c: New test.
> > >   * c-c++-common/ident-1a.c: New test.
> > >   * c-c++-common/ident-1b.c: New test.
> > >   * c-c++-common/ident-2a.c: New test.
> > >   * c-c++-common/ident-2b.c: New test.
> > >
> > > Ok for trunk?
> > >
> > > > PS: proc force_conventional_output_for would be a bit of a misnomer after this,
> > > not sure if it should be renamed to maybe set_required_options_for or
> > > the like?  
> > OK.  
> 
> Now applied without the rename to trunk as r264128.
> 
> thanks,
> 
> >
> > Changing force_conventional_output to set_required_options_for is
> > pre-approved as well.

I've now applied the renaming as r14-1449-g994195b597ff20
thanks,

> >
> > jeff
> >

Re: [PATCH 1/3] testsuite: Unbork multilib testing on RISC-V (and any target really)

2023-05-31 Thread Vineet Gupta




On 5/31/23 11:13, Iain Sandoe wrote:

I do have a multilib problem [with libgomp] on Darwin (which has been noticed 
:https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109951) but it is not obvious how 
the fix proposed would solve this - unless it’s some subtle change in global 
content for the multilib options.


For the issue I was seeing, the actual multilib content didn't matter. 
It's just that the testsuite needed to run for more than one round, so that 
the unbalanced torture-init from the first multilib round left behind the 
state which triggered the errors in the second round...



Schedule of variations:
    riscv-sim/-march=rv32imac/-mabi=ilp32/-mcmodel=medlow
    riscv-sim/-march=rv32imafdc/-mabi=ilp32d/-mcmodel=medlow
    riscv-sim/-march=rv64imac/-mabi=lp64/-mcmodel=medlow
    riscv-sim/-march=rv64imafdc/-mabi=lp64d/-mcmodel=medlow

Running target riscv-sim/-march=rv32imac/-mabi=ilp32/-mcmodel=medlow
Using 
/scratch/vineetg/gnu/INSTALL/tc-up-230524-273895500425/share/dejagnu/baseboards/riscv-sim.exp 
as board description file for target.
Using 
/scratch/vineetg/gnu/INSTALL/tc-up-230524-273895500425/share/dejagnu/config/sim.exp 
as generic interface file for target.
Using 
/scratch/vineetg/gnu/INSTALL/tc-up-230524-273895500425/share/dejagnu/baseboards/basic-sim.exp 
as board description file for target.
Using 
/scratch/vineetg/gnu/toolchain-upstream-pristine/gcc/gcc/testsuite/config/default.exp 
as tool-and-target-specific interface file.
Running 
/scratch/vineetg/gnu/toolchain-upstream-pristine/gcc/gcc/testsuite/gcc.c-torture/compile/compile.exp 
...

===  torture-init
===  torture-finish

Running 
/scratch/vineetg/gnu/toolchain-upstream-pristine/gcc/gcc/testsuite/gcc.c-torture/execute/builtins/builtins.exp 
...

===  torture-init
===  torture-finish

Running 
/scratch/vineetg/gnu/toolchain-upstream-pristine/gcc/gcc/testsuite/gcc.c-torture/execute/execute.exp 
...

===  torture-init
===  torture-finish

Running 
/scratch/vineetg/gnu/toolchain-upstream-pristine/gcc/gcc/testsuite/gcc.misc-tests/i386-prefetch.exp 
...

===  torture-init
    ^^
Running 
/scratch/vineetg/gnu/toolchain-upstream-pristine/gcc/gcc/testsuite/gcc.misc-tests/linkage.exp 
...
Running 
/scratch/vineetg/gnu/toolchain-upstream-pristine/gcc/gcc/testsuite/gcc.misc-tests/matrix1.exp 
...

...
Running 
/scratch/vineetg/gnu/toolchain-upstream-pristine/gcc/gcc/testsuite/gcc.target/xtensa/xtensa.exp 
...
Running 
/scratch/vineetg/gnu/toolchain-upstream-pristine/gcc/gcc/testsuite/gcc.test-framework/test-framework.exp 
...

skipping test framework tests, CHECK_TEST_FRAMEWORK is not defined

    === gcc Summary for 
riscv-sim/-march=rv32imac/-mabi=ilp32/-mcmodel=medlow ===


# of expected passes        136964
# of unexpected failures    4
# of unexpected successes    3
# of expected failures        1072
# of unsupported tests        3052

Running target riscv-sim/-march=rv32imafdc/-mabi=ilp32d/-mcmodel=medlow 
<--- 2nd round
Using 
/scratch/vineetg/gnu/INSTALL/tc-up-230524-273895500425/share/dejagnu/baseboards/riscv-sim.exp 
as board description file for target.
Using 
/scratch/vineetg/gnu/INSTALL/tc-up-230524-273895500425/share/dejagnu/config/sim.exp 
as generic interface file for target.
Using 
/scratch/vineetg/gnu/INSTALL/tc-up-230524-273895500425/share/dejagnu/baseboards/basic-sim.exp 
as board description file for target.
Using 
/scratch/vineetg/gnu/toolchain-upstream-pristine/gcc/gcc/testsuite/config/default.exp 
as tool-and-target-specific interface file.
Running 
/scratch/vineetg/gnu/toolchain-upstream-pristine/gcc/gcc/testsuite/gcc.c-torture/compile/compile.exp 
...


--- gcc-dg-runtest : NOT calling torture-init
--- gcc-dg-runtest : NOT calling torture-finish

Running 
/scratch/vineetg/gnu/toolchain-upstream-pristine/gcc/gcc/testsuite/gcc.c-torture/execute/builtins/builtins.exp 
...


===  torture-init
ERROR: tcl error sourcing 
/scratch/vineetg/gnu/toolchain-upstream-pristine/gcc/gcc/testsuite/gcc.c-torture/execute/builtins/builtins.exp.

ERROR: tcl error code NONE
ERROR: torture-init: torture_without_loops is not empty as expected = "{ 
-O0 } { -O1 } { -O2 } { -O3 -g } { -Os } { -O2 -flto 
-fno-use-linker-plugin -flto-partition=none } { -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects }"

    while executing

Re: [PATCH 1/3] testsuite: Unbork multilib testing on RISC-V (and any target really)

2023-05-31 Thread Vineet Gupta





On 5/31/23 10:57, Jeff Law wrote:



On 5/31/23 10:25, Vineet Gupta wrote:

Multilib testing on trunk is currently busted (and surprisingly this
affects any/all targets but it seems nobody cares). We currently get the
following splat:
I wouldn't say that nobody cares, it just hasn't bubbled up on 
anyone's priority list yet (most developers aren't working on targets 
that make heavy use of multilibs).


Pardon my theatrics :-)

But probably more importantly, this problem seems to not be triggering 
on all multilib targets.  For example, I just examined my tester's 
build logs and couldn't see this on the H8/300 or V850 ports.  Which 
begs the question, why?


Or, just in case: could it be that this is only running a subset because of 
some stray RUNTESTFLAGS?


Yes, I'm curious to see why others are not seeing it. Could you rerun 
upstream with the following debug patch (and avoid -j when running the 
testsuite, just to serialize the logs; the problem does happen for -j runs 
too)? Then in the logs we could see whether init/finish get out of sync 
around the culprit file (as they do in my case at least).



--->
diff --git a/gcc/testsuite/lib/torture-options.exp 
b/gcc/testsuite/lib/torture-options.exp

index dfb536d1d96c..95a6f818fded 100644
--- a/gcc/testsuite/lib/torture-options.exp
+++ b/gcc/testsuite/lib/torture-options.exp
@@ -22,6 +22,8 @@
 proc torture-init { args } {
 global torture_without_loops global_with_loops

+    send_user "\n\n===  torture-init\n"
+
 if [info exists torture_without_loops] {
    error "torture-init: torture_without_loops is not empty as 
expected = \"${torture_without_loops}\""

 }
@@ -116,6 +118,8 @@ proc set-torture-options { args } {
 proc torture-finish { args } {
 global torture_without_loops torture_with_loops

+    send_user "\n\n===  torture-finish\n"
+
 if [info exists torture_without_loops] {
    unset torture_without_loops
 } else {

--->8---

FWIW I'd like to be able to test stuff cross-arch too (at least x86, 
aarch64 and a few others).


Thx,
-Vineet

Re: [PATCH 1/3] testsuite: Unbork multilib testing on RISC-V (and any target really)

2023-05-31 Thread Iain Sandoe via Gcc-patches




> On 31 May 2023, at 18:57, Jeff Law via Gcc-patches  
> wrote:
> 
> 
> 
> On 5/31/23 10:25, Vineet Gupta wrote:
>> Multilib testing on trunk is currently busted (and surprisingly this
>> affects any/all targets but it seems nobody cares). We currently get the
>> following splat:
> I wouldn't say that nobody cares, it just hasn't bubbled up on anyone's 
> priority list yet (most developers aren't working on targets that make heavy 
> use of multilibs).
> 
> But probably more importantly, this problem seems to not be triggering on all 
> multilib targets.  For example, I just examined my tester's build logs and 
> couldn't see this on the H8/300 or V850 ports.  Which begs the question, why?

I do have a multilib problem [with libgomp] on Darwin (which has been noticed : 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109951) but it is not obvious how 
the fix proposed would solve this - unless it’s some subtle change in global 
content for the multilib options.

(testing anyway)

thanks
Iain

[PATCH 0/3] Add diagram support to gcc diagnostics

2023-05-31 Thread David Malcolm via Gcc-patches

Existing diagnostic text output in GCC has to be implemented by writing
sequentially to a pretty_printer instance.  This makes it hard to
implement some kinds of diagnostic output (see e.g.
diagnostic-show-locus.cc, which is reaching the limits of
maintainability).

I've posted various experimental patches over the years that add other
kinds of output to GCC, such as ASCII art:
- "rich vectorization hints":
  - https://gcc.gnu.org/legacy-ml/gcc-patches/2018-07/msg01576.html
- visualizations of -Wformat-overflow:
  - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77696 comment 9 onwards
  - https://gcc.gnu.org/legacy-ml/gcc-patches/2018-09/msg00771.html

This patch kit combines the above ideas.  It:
- adds more flexible ways to create diagnostic output:
  - a canvas class, which can be "painted" to via random-access (rather
    than sequentially), and then printed when the painting is complete.
    A formatted pretty_printer can be roundtripped to a canvas and back,
    preserving formatting data (colors and URLs)
  - a table class for 2D grid layout, supporting items that span multiple
    rows/columns
  - a widget class for organizing diagrams hierarchically and painting
    them to a canvas
- expands GCC's diagnostics subsystem so that diagnostics can have
  "text art" diagrams - think ASCII art, but potentially including some
  Unicode characters, such as box-drawing chars (by using the canvas
  class)
- uses this to implement visualizations of -Wanalyzer-out-of-bounds so
  that, where possible, it will emit a text art diagram visualizing the
  spatial relationship between (a) the memory region that the analyzer
  predicts would be accessed, versus (b) the range of memory that is
  valid to access - whether they overlap, are touching, are close or far
  apart; which one is before or after in memory, the relative sizes
  involved, the direction of the access (read vs write), and, in some
  cases, the values of data involved.
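
To make the "random-access, then print" idea concrete, here is a minimal
sketch.  It is not the text_art::canvas API from the patch kit: toy_canvas and
its member names are invented for illustration, and it handles plain chars
only (no colors, URLs or Unicode).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// A fixed-size grid of characters that can be written in any order
// and is only turned into text once painting is complete.
class toy_canvas {
public:
  toy_canvas(std::size_t width, std::size_t height)
    : rows_(height, std::string(width, ' ')) {}

  // Paint one character at (x, y); writes outside the canvas are ignored.
  void paint(std::size_t x, std::size_t y, char ch) {
    if (y < rows_.size() && x < rows_[y].size())
      rows_[y][x] = ch;
  }

  // Paint a run of characters starting at (x, y).
  void paint_text(std::size_t x, std::size_t y, const std::string &text) {
    for (std::size_t i = 0; i < text.size(); ++i)
      paint(x + i, y, text[i]);
  }

  // Emit the rows top to bottom, trimming trailing blanks.
  std::string to_string() const {
    std::string out;
    for (const std::string &row : rows_) {
      std::size_t end = row.find_last_not_of(' ');
      if (end != std::string::npos)
        out.append(row, 0, end + 1);
      out += '\n';
    }
    return out;
  }

private:
  std::vector<std::string> rows_;
};
```

The point of the design is that a lower row can be painted before an upper
one, which a sequential pretty_printer cannot do without buffering.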

The new code is in a new "gcc/text-art" subdirectory and "text_art"
namespace.

Many examples of the visualizations can be seen in patch 3 of the kit;
here are two examples; given:

  int32_t arr[10];

  int32_t int_arr_read_element_before_start_far(void)
  {
    return arr[-100];
  }

it emits:

demo-1.c: In function ‘int_arr_read_element_before_start_far’:
demo-1.c:7:13: warning: buffer under-read [CWE-127] [-Wanalyzer-out-of-bounds]
    7 |   return arr[-100];
      |          ~~~^~~~~~
  ‘int_arr_read_element_before_start_far’: event 1
    |
    |    7 |   return arr[-100];
    |      |          ~~~^~~~~~
    |      |             |
    |      |             (1) out-of-bounds read from byte -400 till byte -397 but ‘arr’ starts at byte 0
    |
demo-1.c:7:13: note: valid subscripts for ‘arr’ are ‘[0]’ to ‘[9]’

  ┌───────────────────────────┐
  │read of ‘int32_t’ (4 bytes)│
  └───────────────────────────┘
                ^
                │
                │
  ┌───────────────────────────┐              ┌────────┬────────┬─────────┐
  │                           │              │  [0]   │  ...   │   [9]   │
  │    before valid range     │              ├────────┴────────┴─────────┤
  │                           │              │‘arr’ (type: ‘int32_t[10]’)│
  └───────────────────────────┘              └───────────────────────────┘
  ├─────────────┬─────────────┤├─────┬──────┤├─────────────┬─────────────┤
                │                    │                     │
   ╭────────────┴───────────╮   ╭────┴────╮        ╭───────┴──────╮
   │⚠️  under-read of 4 bytes│   │396 bytes│        │size: 40 bytes│
   ╰────────────────────────╯   ╰─────────╯        ╰──────────────╯

and given:

  #include 

  void
  test_non_ascii ()
  {
    char buf[5];
    strcpy (buf, "文字化け");
  }

it emits:

demo-2.c: In function ‘test_non_ascii’:
demo-2.c:7:3: warning: stack-based buffer overflow [CWE-121] [-Wanalyzer-out-of-bounds]
    7 |   strcpy (buf, "文字化け");
      |   ^~~~~~~~~~~~~~~~~~~~~~~~
  ‘test_non_ascii’: events 1-2
    |
    |    6 |   char buf[5];
    |      |        ^~~
    |      |        |
    |      |        (1) capacity: 5 bytes
    |    7 |   strcpy (buf, "文字化け");
    |      |   ~~~~~~~~~~~~~~~~~~~~~~~~
    |      |   |
    |      |   (2) out-of-bounds write from byte 5 till byte 12 but ‘buf’ ends at byte 5
    |
demo-2.c:7:3: note: write of 8 bytes to beyond the end of ‘buf’
    7 |   strcpy (buf, "文字化け");
      |   ^~~~~~~~~~~~~~~~~~~~~~~~
demo-2.c:7:3: note: valid subscripts for ‘buf’ are ‘[0]’ to ‘[4]’

  ┌─────┬─────┬─────┬────┬────┐┌────┬────┬────┬────┬────┬────┬────┬──────┐
  │ [0] │ [1] │ [2] │[3] │[4] ││[5] │[6] │[7] │[8] │[9] │[10]│[11]│ [12] │
  ├─────┼─────┼─────┼────┼────┤├────┼────┼────┼────┼────┼────┼────┼──────┤
  │0xe6 │0x96 │0x87 │0xe5│0xad││0x97│0xe5│0x8c│0x96│0xe3│0x81│0x91│ 0x00 │
  ├─────┴─────┴─────┼────┴────┴┴────┼────┴────┴────┼────┴────┴────┼──────┤
  │     U+6587      │    U+5b57     │    U+5316    │    U+3051    │U+0000│
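
The byte figures quoted in the two diagnostics above can be rechecked with a
little arithmetic.  The helpers below are only a sketch for doing that by
hand; byte_range, element_bytes, gap_before_start, strcpy_write_size and
overflow_bytes are hypothetical names and have nothing to do with how the
analyzer computes its regions.

```cpp
#include <cstring>

// Inclusive range of byte offsets relative to the start of a buffer.
struct byte_range {
  long first;
  long last;
};

// Bytes touched by accessing element `index` of an array whose
// elements are `elem_size` bytes each.
byte_range element_bytes(long index, long elem_size)
{
  long first = index * elem_size;
  return {first, first + elem_size - 1};
}

// For an access that lies entirely before the buffer: number of bytes
// strictly between the end of the access and byte 0.
long gap_before_start(byte_range access)
{
  return -access.last - 1;
}

// Number of bytes strcpy writes for `src`, including the terminating NUL.
long strcpy_write_size(const char *src)
{
  return static_cast<long>(std::strlen(src)) + 1;
}

// Bytes written past the end of a `capacity`-byte buffer when
// `write_size` bytes are written starting at offset 0.
long overflow_bytes(long write_size, long capacity)
{
  return write_size > capacity ? write_size - capacity : 0;
}
```

For arr[-100] with 4-byte elements this gives bytes -400 to -397 and a gap of
396 bytes; for the 12 UTF-8 bytes plus NUL copied into buf[5] it gives a
13-byte write, 8 bytes of which land past the end.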

[PATCH 1/3] testsuite: move handle-multiline-outputs to before check for blank lines

2023-05-31 Thread David Malcolm via Gcc-patches

I have followup patches that require checking for multiline patterns
that have blank lines within them, so this moves the handling of
multiline patterns before the check for blank lines, allowing for such
multiline patterns.

Doing so uncovers some issues with existing multiline directives, which
the patch fixes.
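
A small model of why the ordering matters.  This is only a sketch: the real
code is Tcl in gcc-dg.exp, multiline.exp and prune.exp and matches with
regexps, whereas the helpers here (has_blank_line, strip_expected and the two
*_order_complains functions) are invented names that use plain substring
matching.

```cpp
#include <cstddef>
#include <string>

// True if the text starts with an empty line or contains one.
bool has_blank_line(const std::string &text)
{
  return (!text.empty() && text[0] == '\n')
         || text.find("\n\n") != std::string::npos;
}

// Remove the first occurrence of an expected multiline chunk, if present.
std::string strip_expected(std::string text, const std::string &expected)
{
  std::size_t pos = text.find(expected);
  if (pos != std::string::npos)
    text.erase(pos, expected.size());
  return text;
}

// Old order: the blank-line check sees the raw compiler output, so
// the expected chunk plays no part in the decision.
bool old_order_complains(const std::string &output, const std::string &)
{
  return has_blank_line(output);
}

// New order: expected multiline chunks are consumed first, so blank
// lines inside them no longer trigger the complaint.
bool new_order_complains(const std::string &output, const std::string &expected)
{
  return has_blank_line(strip_expected(output, expected));
}
```

With an expected chunk that itself contains an empty line, only the old order
reports a blank line in the output.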

gcc/testsuite/ChangeLog:
* c-c++-common/Wlogical-not-parentheses-2.c: Split up the
multiline directive.
* gcc.dg/analyzer/malloc-macro-inline-events.c: Remove redundant
dg-regexp directives.
* gcc.dg/missing-header-fixit-5.c: Split up the multiline
directives.
* lib/gcc-dg.exp (gcc-dg-prune): Move call to
handle-multiline-outputs from prune_gcc_output to here.
* lib/multiline.exp (dg-end-multiline-output): Move call to
maybe-handle-nn-line-numbers from prune_gcc_output to here.
* lib/prune.exp (prune_gcc_output): Move calls to
maybe-handle-nn-line-numbers and handle-multiline-outputs from
here to the above.
---
 .../c-c++-common/Wlogical-not-parentheses-2.c  |  2 ++
 .../gcc.dg/analyzer/malloc-macro-inline-events.c   |  5 -
 gcc/testsuite/gcc.dg/missing-header-fixit-5.c  | 10 --
 gcc/testsuite/lib/gcc-dg.exp   |  5 +
 gcc/testsuite/lib/multiline.exp|  7 ++-
 gcc/testsuite/lib/prune.exp|  7 ---
 6 files changed, 21 insertions(+), 15 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/Wlogical-not-parentheses-2.c 
b/gcc/testsuite/c-c++-common/Wlogical-not-parentheses-2.c
index ba8dce84f5d..2d9382014c4 100644
--- a/gcc/testsuite/c-c++-common/Wlogical-not-parentheses-2.c
+++ b/gcc/testsuite/c-c++-common/Wlogical-not-parentheses-2.c
@@ -12,6 +12,8 @@ foo (int aaa, int bbb)
 /* { dg-begin-multiline-output "" }
r += !aaa == bbb;
  ^~
+   { dg-end-multiline-output "" } */
+/* { dg-begin-multiline-output "" }
r += !aaa == bbb;
 ^~~~
 (   )
diff --git a/gcc/testsuite/gcc.dg/analyzer/malloc-macro-inline-events.c 
b/gcc/testsuite/gcc.dg/analyzer/malloc-macro-inline-events.c
index f08aee626a5..9134bb4781e 100644
--- a/gcc/testsuite/gcc.dg/analyzer/malloc-macro-inline-events.c
+++ b/gcc/testsuite/gcc.dg/analyzer/malloc-macro-inline-events.c
@@ -12,11 +12,6 @@ int test (void *ptr)
   WRAPPED_FREE (ptr); /* { dg-message "in expansion of macro 'WRAPPED_FREE'" } 
*/
   WRAPPED_FREE (ptr); /* { dg-message "in expansion of macro 'WRAPPED_FREE'" } 
*/
 
-  /* Erase the spans indicating the header file
- (to avoid embedding path assumptions).  */
-  /* { dg-regexp "\[^|\]+/malloc-macro.h:\[0-9\]+:\[0-9\]+:" } */
-  /* { dg-regexp "\[^|\]+/malloc-macro.h:\[0-9\]+:\[0-9\]+:" } */
-
   /* { dg-begin-multiline-output "" }
NN | #define WRAPPED_FREE(PTR) free(PTR)
   |   ^
diff --git a/gcc/testsuite/gcc.dg/missing-header-fixit-5.c 
b/gcc/testsuite/gcc.dg/missing-header-fixit-5.c
index 916033c689c..bf44feb24a9 100644
--- a/gcc/testsuite/gcc.dg/missing-header-fixit-5.c
+++ b/gcc/testsuite/gcc.dg/missing-header-fixit-5.c
@@ -12,14 +12,18 @@ foo (char *m, int i)
   /* { dg-begin-multiline-output "" }
11 |   if (isdigit (m[0]))
   |   ^~~
+ { dg-end-multiline-output "" } */
+  /* { dg-begin-multiline-output "" }
   +++ |+#include 
 1 | 
  { dg-end-multiline-output "" } */
 {
   return abs (i); /* { dg-warning "implicit declaration of function" } */
   /* { dg-begin-multiline-output "" }
-   19 |   return abs (i);
+   21 |   return abs (i);
   |  ^~~
+ { dg-end-multiline-output "" } */
+  /* { dg-begin-multiline-output "" }
   +++ |+#include 
 1 | 
  { dg-end-multiline-output "" } */
@@ -27,8 +31,10 @@ foo (char *m, int i)
   else
 putchar (m[0]); /* { dg-warning "implicit declaration of function" } */
   /* { dg-begin-multiline-output "" }
-   28 | putchar (m[0]);
+   32 | putchar (m[0]);
   | ^~~
+ { dg-end-multiline-output "" } */
+  /* { dg-begin-multiline-output "" }
   +++ |+#include 
 1 | 
  { dg-end-multiline-output "" } */
diff --git a/gcc/testsuite/lib/gcc-dg.exp b/gcc/testsuite/lib/gcc-dg.exp
index 4ed4233efff..6475cab46de 100644
--- a/gcc/testsuite/lib/gcc-dg.exp
+++ b/gcc/testsuite/lib/gcc-dg.exp
@@ -364,6 +364,11 @@ proc gcc-dg-prune { system text } {
 # Always remember to clear it in .exp file after executed all tests.
 global dg_runtest_extra_prunes
 
+# Call into multiline.exp to handle any multiline output directives.
+# This is done before the check for blank lines so that multiline
+# output directives can have blank lines within them.
+set text [handle-multiline-outputs $text]
+
 # Complain about blank lines in the output (PR other/69006)
 global allow_blank_lines
 if { !$allow_blank_lines } {
diff --git a/gcc/testsuite/lib/multiline.exp

Re: [PATCH] rs6000: Fix __builtin_vec_xst_trunc definition

2023-05-31 Thread Peter Bergner via Gcc-patches

On 5/22/23 4:04 AM, Kewen.Lin wrote:
> on 2023/5/11 02:06, Carl Love via Gcc-patches wrote:
>> @@ -3161,12 +3161,15 @@
>>void __builtin_altivec_tr_stxvrbx (vsq, signed long, signed char *);
>>  TR_STXVRBX vsx_stxvrbx {stvec}
>>  
>> -  void __builtin_altivec_tr_stxvrhx (vsq, signed long, signed int *);
>> +  void __builtin_altivec_tr_stxvrhx (vsq, signed long, signed short *);
>>  TR_STXVRHX vsx_stxvrhx {stvec}
>>  
>> -  void __builtin_altivec_tr_stxvrwx (vsq, signed long, signed short *);
>> +  void __builtin_altivec_tr_stxvrwx (vsq, signed long, signed int *);
>>  TR_STXVRWX vsx_stxvrwx {stvec}
> 
> Good catching!

This hunk should be its own patch and commit, as it is independent of
the other change.  Especially since other built-ins also don't have
{,un}signed long * as arguments, not just __builtin_altivec_tr_stxvr*x.

>> +  void __builtin_altivec_tr_stxvrlx (vsq, signed long, signed long *);
>> +TR_STXVRLX vsx_stxvrdx {stvec}
>> +
> 
> This is mapped to the one used for type long long, it's a hard mapping,
> IMHO it's wrong and not consistent with what the users expect, since on Power
> the size of type long int is 4 bytes at -m32 while 8 bytes at -m64, this
> implementation binding to 8 bytes can cause trouble in 32-bit.  I wonder if
> it's a good idea to add one overloaded version for type long int, for now
> openxl also emits error message for long int type pointer (see its doc [1]),
> users can use casting to make it to the acceptable pointer types (long long
> or int as its size).

I'm the person who noticed that we don't accept signed/unsigned long * as
an argument type and asked Carl to investigate.  I find it hard to believe
we accept all integer pointer types, except long *.  I agree that it shouldn't
always map to long long *, since as you say, that's wrong for -m32.
My hope was that we could somehow automagically handle the long * types
in the built-in machinery, mapping them to either the int * built-in or
the long long * built-in depending on -m32 or -m64.  Again, this limitation
is not limited to the __builtin_altivec_tr_stx* built-ins, but affects others
as well, so I was kind of hoping for a general solution that would fix them all.
I'm not sure if that's possible though.
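
Until something like that exists, the casting workaround Kewen mentions can be
written so that it picks the right width by itself.  The following is a
user-side sketch only, not a proposal for the built-in machinery; long_equiv
and as_long_equiv are made-up names.

```cpp
#include <type_traits>

// Pick the integer type that has the same width as long on this
// target: int where long is 4 bytes (e.g. -m32), long long otherwise.
using long_equiv =
  std::conditional_t<sizeof(long) == sizeof(int), int, long long>;

static_assert(sizeof(long_equiv) == sizeof(long),
              "long matches neither int nor long long in size");

// Reinterpret a long pointer as a pointer to the same-width type, for
// passing to an interface that accepts int * / long long * but not long *.
// Caveat: dereferencing the result in ordinary code may run afoul of
// strict aliasing; it is meant to be handed straight to such an interface.
inline long_equiv *as_long_equiv(long *p)
{
  return reinterpret_cast<long_equiv *>(p);
}
```

This keeps the -m32/-m64 decision in one place in user code, which is roughly
what one would hope the overload resolution could do automatically.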

Peter

Re: [PATCH 1/3] testsuite: Unbork multilib testing on RISC-V (and any target really)

2023-05-31 Thread Jeff Law via Gcc-patches





On 5/31/23 10:25, Vineet Gupta wrote:

Multilib testing on trunk is currently busted (and surprisingly this
affects any/all targets but it seems nobody cares). We currently get the
following splat:
I wouldn't say that nobody cares, it just hasn't bubbled up on anyone's 
priority list yet (most developers aren't working on targets that make 
heavy use of multilibs).


But probably more importantly, this problem seems to not be triggering 
on all multilib targets.  For example, I just examined my tester's build 
logs and couldn't see this on the H8/300 or V850 ports.  Which begs the 
question, why?


Jeff

Re: [PATCH] Move std::search into algobase.h

2023-05-31 Thread Jonathan Wakely via Gcc-patches

On Wed, 31 May 2023 at 18:39, François Dumont via Libstdc++ <
libstd...@gcc.gnu.org> wrote:

> libstdc++: Reduce  inclusion to 
>
>
> Move the std::search definition from stl_algo.h to stl_algobase.h and use
> the later in .
>
> For consistency also move std::__parallel::search and associated helpers
> from
>  to  so that
> std::__parallel::search
> is accessible along with std::search.
>
> libstdc++-v3/ChangeLog:
>
>  * include/bits/stl_algo.h
>  (std::__search, std::search(_FwdIt1, _FwdIt1, _FwdIt2,
> _FwdIt2, _BinPred)): Move...
>  * include/bits/stl_algobase.h: ...here.
>  * include/std/functional: Replace  include by
> .
>  * include/parallel/algo.h (std::__parallel::search<_FIt1,
> _FIt2, _BinaryPred>)
>  (std::__parallel::__search_switch<_FIt1, _FIt2,
> _BinaryPred, _ItTag1, _ItTag2>):
>  Move...
>  * include/parallel/algobase.h: ...here.
>  * include/std/functional: Remove  and
> 
>  includes. Include .
>
> Tested under Linux x86_64.
>
> Ok to commit ?
>

OK

[PATCH] Move std::search into algobase.h

2023-05-31 Thread François Dumont via Gcc-patches


libstdc++: Reduce  inclusion to 


Move the std::search definition from stl_algo.h to stl_algobase.h and use
the latter in .

For consistency also move std::__parallel::search and associated helpers from
 to  so that std::__parallel::search
is accessible along with std::search.
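
For context, the overload being moved is the predicate form of std::search.
A short usage sketch (find_ci is a made-up helper name; it includes the
portable <algorithm> header, so it does not depend on which internal header
ends up holding the definition):

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Case-insensitive substring search built on the predicate overload of
// std::search.  Returns the offset of the first match, or npos if the
// search reaches the end of the haystack.
std::size_t find_ci(const std::string &haystack, const std::string &needle)
{
  auto same_letter = [](unsigned char a, unsigned char b) {
    return std::tolower(a) == std::tolower(b);
  };
  auto it = std::search(haystack.begin(), haystack.end(),
                        needle.begin(), needle.end(), same_letter);
  return it == haystack.end()
           ? std::string::npos
           : static_cast<std::size_t>(it - haystack.begin());
}
```

For example, find_ci("Hello World", "WORLD") finds the match at offset 6.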

libstdc++-v3/ChangeLog:

    * include/bits/stl_algo.h
    (std::__search, std::search(_FwdIt1, _FwdIt1, _FwdIt2, 
_FwdIt2, _BinPred)): Move...

    * include/bits/stl_algobase.h: ...here.
    * include/std/functional: Replace  include by 
.
    * include/parallel/algo.h (std::__parallel::search<_FIt1, 
_FIt2, _BinaryPred>)
    (std::__parallel::__search_switch<_FIt1, _FIt2, 
_BinaryPred, _ItTag1, _ItTag2>):

    Move...
    * include/parallel/algobase.h: ...here.
    * include/std/functional: Remove  and 


    includes. Include .

Tested under Linux x86_64.

Ok to commit ?

François
diff --git a/libstdc++-v3/include/bits/stl_algo.h b/libstdc++-v3/include/bits/stl_algo.h
index 54695490166..2c52ed51402 100644
--- a/libstdc++-v3/include/bits/stl_algo.h
+++ b/libstdc++-v3/include/bits/stl_algo.h
@@ -140,54 +140,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // count
   // count_if
   // search
-
-  template
-_GLIBCXX20_CONSTEXPR
-_ForwardIterator1
-__search(_ForwardIterator1 __first1, _ForwardIterator1 __last1,
-	 _ForwardIterator2 __first2, _ForwardIterator2 __last2,
-	 _BinaryPredicate  __predicate)
-{
-  // Test for empty ranges
-  if (__first1 == __last1 || __first2 == __last2)
-	return __first1;
-
-  // Test for a pattern of length 1.
-  _ForwardIterator2 __p1(__first2);
-  if (++__p1 == __last2)
-	return std::__find_if(__first1, __last1,
-		__gnu_cxx::__ops::__iter_comp_iter(__predicate, __first2));
-
-  // General case.
-  _ForwardIterator1 __current = __first1;
-
-  for (;;)
-	{
-	  __first1 =
-	std::__find_if(__first1, __last1,
-		__gnu_cxx::__ops::__iter_comp_iter(__predicate, __first2));
-
-	  if (__first1 == __last1)
-	return __last1;
-
-	  _ForwardIterator2 __p = __p1;
-	  __current = __first1;
-	  if (++__current == __last1)
-	return __last1;
-
-	  while (__predicate(__current, __p))
-	{
-	  if (++__p == __last2)
-		return __first1;
-	  if (++__current == __last1)
-		return __last1;
-	}
-	  ++__first1;
-	}
-  return __first1;
-}
-
   // search_n
 
   /**
@@ -4147,48 +4099,6 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
 			   __gnu_cxx::__ops::__iter_equal_to_iter());
 }
 
-  /**
-   *  @brief Search a sequence for a matching sub-sequence using a predicate.
-   *  @ingroup non_mutating_algorithms
-   *  @param  __first1 A forward iterator.
-   *  @param  __last1  A forward iterator.
-   *  @param  __first2 A forward iterator.
-   *  @param  __last2  A forward iterator.
-   *  @param  __predicate  A binary predicate.
-   *  @return   The first iterator @c i in the range
-   *  @p [__first1,__last1-(__last2-__first2)) such that
-   *  @p __predicate(*(i+N),*(__first2+N)) is true for each @c N in the range
-   *  @p [0,__last2-__first2), or @p __last1 if no such iterator exists.
-   *
-   *  Searches the range @p [__first1,__last1) for a sub-sequence that
-   *  compares equal value-by-value with the sequence given by @p
-   *  [__first2,__last2), using @p __predicate to determine equality,
-   *  and returns an iterator to the first element of the
-   *  sub-sequence, or @p __last1 if no such iterator exists.
-   *
-   *  @see search(_ForwardIter1, _ForwardIter1, _ForwardIter2, _ForwardIter2)
-  */
-  template
-_GLIBCXX20_CONSTEXPR
-inline _ForwardIterator1
-search(_ForwardIterator1 __first1, _ForwardIterator1 __last1,
-	   _ForwardIterator2 __first2, _ForwardIterator2 __last2,
-	   _BinaryPredicate  __predicate)
-{
-  // concept requirements
-  __glibcxx_function_requires(_ForwardIteratorConcept<_ForwardIterator1>)
-  __glibcxx_function_requires(_ForwardIteratorConcept<_ForwardIterator2>)
-  __glibcxx_function_requires(_BinaryPredicateConcept<_BinaryPredicate,
-	typename iterator_traits<_ForwardIterator1>::value_type,
-	typename iterator_traits<_ForwardIterator2>::value_type>)
-  __glibcxx_requires_valid_range(__first1, __last1);
-  __glibcxx_requires_valid_range(__first2, __last2);
-
-  return std::__search(__first1, __last1, __first2, __last2,
-			   __gnu_cxx::__ops::__iter_comp_iter(__predicate));
-}
-
   /**
*  @brief Search a sequence for a number of consecutive values.
*  @ingroup non_mutating_algorithms
diff --git a/libstdc++-v3/include/bits/stl_algobase.h b/libstdc++-v3/include/bits/stl_algobase.h
index 4a6f8195d98..dd95e94f7e9 100644
--- a/libstdc++-v3/include/bits/stl_algobase.h
+++ b/libstdc++-v3/include/bits/stl_algobase.h
@@ -2150,6 +2150,53 @@ _GLIBCXX_END_NAMESPACE_ALGO
   return __result;
 }
 
+  template
+_GLIBCXX20_CONSTEXPR
+_ForwardIterator1
+

Re: [PATCH] libgcc: Use initarray section type for .init_stack

2023-05-31 Thread Ian Lance Taylor via Gcc-patches

On Wed, May 31, 2023 at 12:41 AM Kewen.Lin via Gcc-patches
 wrote:
>
> >> libgcc/ChangeLog:
> >>
> >>  * config/i386/morestack.S: Use @init_array rather than
> >>  @progbits for section type of section .init_array.
> >>  * config/rs6000/morestack.S: Likewise.
> >>  * config/s390/morestack.S: Likewise.
> >
> > s390 parts are ok. I did run a bootstrap and regression. Looks all good. 
> > Thanks!
>
> Thanks for testing this on s390, really appreciate!
>
> Hi Ian & Uros,
>
> Do you have any concerns on this, or does it look good to you?

This is OK.

Thanks.

Ian

[PATCH][committed] aarch64: PR target/99195 Annotate dot-product patterns for vec-concat-zero

2023-05-31 Thread Kyrylo Tkachov via Gcc-patches

Hi all,

This straightforward patch annotates the dotproduct instructions, including the 
i8mm ones.
Tests included.
Nothing unexpected here.

Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
Pushing to trunk.
Thanks,
Kyrill

gcc/ChangeLog:

PR target/99195
* config/aarch64/aarch64-simd.md (dot_prod): Rename to...
(dot_prod): ... This.
(usdot_prod): Rename to...
(usdot_prod): ... This.
(aarch64_dot_lane): Rename to...
(aarch64_dot_lane): ... This.
(aarch64_dot_laneq): Rename to...
(aarch64_dot_laneq): ... This.
(aarch64_dot_lane): Rename 
to...

(aarch64_dot_lane):
... This.

gcc/testsuite/ChangeLog:

PR target/99195
* gcc.target/aarch64/simd/pr99195_11.c: New test.


dotprod.patch
Description: dotprod.patch

[PATCH][committed] aarch64: PR target/99195 Annotate saturating mult patterns for vec-concat-zero

2023-05-31 Thread Kyrylo Tkachov via Gcc-patches

Hi all,

This patch goes through the various alphabet soup saturating multiplication 
patterns, including those in TARGET_RDMA
and annotates them with . Many other patterns are widening and 
always write the full 128-bit vectors
so this annotation doesn't apply to them. Nothing out of the ordinary in this 
patch.

Bootstrapped and tested on aarch64-none-linux and aarch64_be-none-elf.
Pushing to trunk.
Thanks,
Kyrill

gcc/ChangeLog:

PR target/99195
* config/aarch64/aarch64-simd.md (aarch64_sqdmulh): Rename 
to...
(aarch64_sqdmulh): ... This.
(aarch64_sqdmulh_n): Rename to...
(aarch64_sqdmulh_n): ... This.
(aarch64_sqdmulh_lane): Rename to...
(aarch64_sqdmulh_lane): ... This.
(aarch64_sqdmulh_laneq): Rename to...
(aarch64_sqdmulh_laneq): ... This.
(aarch64_sqrdmlh): Rename to...
(aarch64_sqrdmlh): ... This.
(aarch64_sqrdmlh_lane): Rename to...
(aarch64_sqrdmlh_lane): ... 
This.
(aarch64_sqrdmlh_laneq): Rename to...
(aarch64_sqrdmlh_laneq): ... 
This.

gcc/testsuite/ChangeLog:

PR target/99195
* gcc.target/aarch64/simd/pr99195_1.c: Add tests for qdmulh, qrdmulh.
* gcc.target/aarch64/simd/pr99195_10.c: New test.


satmul.patch
Description: satmul.patch

Re: [PATCH] RISC-V: Add missing torture-init and torture-finish for rvv.exp

2023-05-31 Thread Vineet Gupta




On 5/30/23 11:43, Vineet Gupta wrote:

On 5/26/23 16:38, Vineet Gupta wrote:



On 5/25/23 13:26, Thomas Schwinge wrote:


I'm pasting a snippet of gcc.log. Issue is indeed triggered by rvv.exp
which needs some love.

I'd intentionally asked to "see a complete 'gcc.log' file where the
ERRORs are visible".


The full log files are humongous - even xz compressed it is ~7 MB - how 
can I share that w/o the list dropping it?

I guess I can try emailing it to you directly on work email - if that's OK.


The torture-{init,finish} needs to be in riscv.exp not rvv.exp
Running full tests now.

I still don't understand this.

My current theory would be that some other '*.exp' file runs
'torture-init' and then prematurely ends without 'torture-finish', and
thus the torture testing state bleeds into the next '*.exp' 
file(s).  I'd
hoped that I could pinpoint that via "a complete 'gcc.log' file 
where the

ERRORs are visible".


Seems likely. So back to good old printf style debugging: I added 
dumping of the dup options to see what exactly was leaking.


setup #1
 - riscv.exp: Added torture-init/finish
 - Deleted rvv.exp (to isolate the problem)

...

Setup #2
 - riscv.exp: Added torture-init/finish
 - riscv.exp: commented away ADDITIONAL_TORTURE_OPTIONS line
 - rvv.exp remains, unchanged

...


In the 3rd setup, I removed riscv.exp and rvv.exp and ran the 
testsuite: errors still show.


So we are iterating over multilib combinations.
Things are fine for the first one. The initial flags consist of 
DG_TORTURE_OPTIONS from gcc-dg.exp (-O0, -O1).
However when the 2nd multilib runs, it seems the old ones are not 
getting cleared, hence the splat.


I've posted the fix [1]. printf/send_user() to the rescue!

@Thomas I fat fingered the send and missed CC'ing you on the patches.

Thx,
-Vineet

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620263.html

[PATCH 3/3] testsuite: print any leaking torture options for debugging

2023-05-31 Thread Vineet Gupta

This was helpful when debugging the recent multilib testsuite failure.

gcc/testsuite:
* lib/torture-options.exp: print the value of non-empty options:
torture_without_loops, torture_with_loops, LTO_TORTURE_OPTIONS.

Signed-off-by: Vineet Gupta 
---
 gcc/testsuite/lib/torture-options.exp | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/lib/torture-options.exp b/gcc/testsuite/lib/torture-options.exp
index d00d07e9378d..dfb536d1d96c 100644
--- a/gcc/testsuite/lib/torture-options.exp
+++ b/gcc/testsuite/lib/torture-options.exp
@@ -23,15 +23,15 @@ proc torture-init { args } {
 global torture_without_loops global_with_loops
 
 if [info exists torture_without_loops] {
-   error "torture-init: torture_without_loops is not empty as expected"
+   error "torture-init: torture_without_loops is not empty as expected = \"${torture_without_loops}\""
 }
 if [info exists torture_with_loops] {
-   error "torture-init: torture_with_loops is not empty as expected"
+   error "torture-init: torture_with_loops is not empty as expected = \"${torture_with_loops}\""
 }
 
 global LTO_TORTURE_OPTIONS
 if [info exists LTO_TORTURE_OPTIONS] {
-   error "torture-init: LTO_TORTURE_OPTIONS is not empty as expected"
+   error "torture-init: LTO_TORTURE_OPTIONS is not empty as expected = \"${LTO_TORTURE_OPTIONS}\""
 }
 set LTO_TORTURE_OPTIONS ""
 if [check_effective_target_lto] {
-- 
2.34.1

[PATCH 2/3] RISC-V: Add missing torture-init and torture-finish for rvv.exp

2023-05-31 Thread Vineet Gupta

From: Kito Cheng 

This is in line with recent test harness expectations and is a
preventive change as it doesn't actually fix any errors.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp: Add torture-init and
torture-finish.

Signed-off-by: Vineet Gupta 
---
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
index 5e69235a268c..7ab7456d1d15 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
+++ b/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp
@@ -39,6 +39,7 @@ if [istarget riscv32-*-*] then {
 
 # Initialize `dg'.
 dg-init
+torture-init
 
 # Main loop.
 set CFLAGS "$DEFAULT_CFLAGS -march=$gcc_march -mabi=$gcc_mabi -O3"
@@ -90,5 +91,7 @@ foreach op $AUTOVEC_TEST_OPTS {
 dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/autovec/vls-vlmax/*.\[cS\]]] \
"-std=c99 -O3 -ftree-vectorize --param riscv-autovec-preference=fixed-vlmax" $CFLAGS
 
+torture-finish
+
 # All done.
 dg-finish
-- 
2.34.1

[PATCH 1/3] testsuite: Unbork multilib testing on RISC-V (and any target really)

2023-05-31 Thread Vineet Gupta

Multilib testing on trunk is currently busted (and surprisingly this
affects any/all targets but it seems nobody cares). We currently get the
following splat:

| ERROR: tcl error code NONE
| ERROR: torture-init: torture_without_loops is not empty as expected

And this takes down pretty much all of testsuite.

|   = Summary of gcc testsuite =
|                            | # of unexpected case / # of unique unexpected case
|                            |      gcc |   g++ | gfortran |
| rv64imafdc/  lp64d/ medlow | 5421 / 4 | 1 / 1 |  72 / 12 |
| rv32imafdc/ ilp32d/ medlow | 5422 / 5 | 3 / 2 |  72 / 12 |
|   rv32imac/  ilp32/ medlow |  391 / 5 | 3 / 2 | 109 / 19 |
|   rv64imac/   lp64/ medlow | 5422 / 5 | 1 / 1 | 109 / 19 |

There have been recent improvements in the test harness around pairing of
torture-{init,finish} and checking for leaking torture options. This
however triggers a latent bug introduced way back in 2009: commit 3dd1415dc88
"i386-prefetch.exp: Skip tests when multilib flags contain -march" which
returns early without the matching torture-finish. It was benign so far but
in the new regime it leaves behind extra state ("torture-init-done"),
confusing the 2nd round of tests (in multilib).

This fix moves the early exit outside of torture-{init,finish} bracket
and brings RISC-V testing back to sanity.
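
The failure mode can be illustrated with a small C++ model. This is only a sketch and not the Tcl harness itself: TortureState, run_exp_buggy, run_exp_fixed and second_round_errors are hypothetical names, and the only behaviour assumed is that torture-init raises an error when state from an earlier init is still present and that torture-finish clears that state.

```cpp
#include <stdexcept>

// Hypothetical stand-in for the harness state guarded by torture-init.
struct TortureState {
  bool init_done = false;

  void init() {
    if (init_done)
      throw std::runtime_error("torture-init: state is not empty as expected");
    init_done = true;
  }

  void finish() { init_done = false; }
};

// Old layout: the early return sits between init and finish.
void run_exp_buggy(TortureState &s, bool skip) {
  s.init();
  if (skip)
    return;  // leaves init_done set for whoever calls init next
  s.finish();
}

// New layout: the early return happens before init.
void run_exp_fixed(TortureState &s, bool skip) {
  if (skip)
    return;
  s.init();
  s.finish();
}

// Runs the same .exp body twice against shared state, as successive
// multilib rounds would, and reports whether the second round errors.
bool second_round_errors(void (*run_exp)(TortureState &, bool)) {
  TortureState s;
  run_exp(s, /*skip=*/true);
  try {
    run_exp(s, /*skip=*/true);
  } catch (const std::runtime_error &) {
    return true;
  }
  return false;
}
```

With the old layout the second round trips over the leftover state; with the early return moved ahead of init it does not.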

| rv64imafdc/  lp64d/ medlow |    3 / 2 | 1 / 1 |  72 / 12 |
| rv32imafdc/ ilp32d/ medlow |    4 / 3 | 3 / 2 |  72 / 12 |
|   rv32imac/  ilp32/ medlow |    3 / 2 | 3 / 2 | 109 / 19 |
|   rv64imac/   lp64/ medlow |    5 / 4 | 1 / 1 | 109 / 19 |

gcc/testsuite:
* gcc.misc-tests/i386-prefetch.exp: Move early return outside
  the torture-{init,finish}

Signed-off-by: Vineet Gupta 
---
 gcc/testsuite/gcc.misc-tests/i386-prefetch.exp | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/gcc/testsuite/gcc.misc-tests/i386-prefetch.exp b/gcc/testsuite/gcc.misc-tests/i386-prefetch.exp
index ad9e56a54bcf..7101b1e30576 100644
--- a/gcc/testsuite/gcc.misc-tests/i386-prefetch.exp
+++ b/gcc/testsuite/gcc.misc-tests/i386-prefetch.exp
@@ -82,6 +82,13 @@ if $tracelevel then {
 strace $tracelevel
 }
 
+if { [board_info target exists multilib_flags]
+ && [string match "* -march=*" " [board_info target multilib_flags] "] } {
+# Multilib flags come after the -march flags we pass and override
+# them, so skip these tests when such flags are passed.
+return
+}
+
 # Load support procs.
 load_lib gcc-dg.exp
 load_lib torture-options.exp
@@ -90,13 +97,6 @@ load_lib torture-options.exp
 dg-init
 torture-init
 
-if { [board_info target exists multilib_flags]
- && [string match "* -march=*" " [board_info target multilib_flags] "] } {
-# Multilib flags come after the -march flags we pass and override
-# them, so skip these tests when such flags are passed.
-return
-}
-
 set-torture-options $PREFETCH_NONE
 gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/i386-pf-none-*.c]] "" ""
 
-- 
2.34.1

[PATCH 0/3] Unbork testsuite for multlib setups

2023-05-31 Thread Vineet Gupta

Hi,

This fixes the current broken multilib testing on trunk.
1/3 is the actual fix
2/3 is preventive and
3/3 is debug aid in case this bites someone else in future.

Thx,
-Vineet

Kito Cheng (1):
  RISC-V: Add missing torture-init and torture-finish for rvv.exp

Vineet Gupta (2):
  testsuite: Unbork multilib testing on RISC-V (and any target really)
  testsuite: print any leaking torture options for debugging

 gcc/testsuite/gcc.misc-tests/i386-prefetch.exp | 14 +++---
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp |  3 +++
 gcc/testsuite/lib/torture-options.exp  |  6 +++---
 3 files changed, 13 insertions(+), 10 deletions(-)

-- 
2.34.1

Re: [PATCH 1/2] ipa-cp: Avoid long linear searches through DECL_ARGUMENTS

2023-05-31 Thread Martin Jambor

Hello,

On Wed, May 31 2023, Richard Biener wrote:
> On Tue, May 30, 2023 at 4:21 PM Jan Hubicka  wrote:
>>
>> > On Mon, May 29, 2023 at 6:20 PM Martin Jambor  wrote:
>> > >
>> > > Hi,
>> > >
>> > > there have been concerns that linear searches through DECL_ARGUMENTS
>> > > that are often necessary to compute the index of a particular
>> > > PARM_DECL which is the key to results of IPA-CP can happen often
>> > > enough to be a compile time issue, especially if we plug the results
>> > > into value numbering, as I intend to do with a follow-up patch.
>> > >
>> > > This patch creates a hash map to do the look-up for all functions
>> > > which have some information discovered by IPA-CP and which have 32
>> > > parameters or more.  32 is a hard-wired magical constant here to
>> > > capture the trade-off between the memory allocation overhead and
>> > > length of the linear search.  I do not think it is worth making it a
>> > > --param but if people think it appropriate, I can turn it into one.
>> >
>> > Since ipcp_transformation is short-lived (is it?) is it worth the trouble?
>> > Comments below ...
>>
>> It lives from ipa-cp time to WPA stream-out or IPA transform stage,
>> so memory consumption is a concern with -flto.

It lives longer, until the function is finished, it holds the
information we want to use during PRE, after all (and Honza also already
added queries to it to tree-ssa-ccp.cc though those probably could be
avoided).

The proposed mapping for long chains would only be created in the
transformation IPA-CP hook, so would only live in LTRANS and only
throughout the compilation of a single function.  (But I am adding a
pointer to the transformation summary of all.)

>> > > +  m_tree_to_idx = hash_map::create_ggc (c);
>> > > +  unsigned index = 0;
>> > > +  for (tree p = DECL_ARGUMENTS (fndecl); p; p = DECL_CHAIN (p), index++)
>> > > +m_tree_to_idx->put (p, index);
>> >
>> > I think allocating the hash-map with 'c' for some numbers (depending
>> > on the "prime" chosen) will necessarily cause re-allocation of the hash
>> > since we keep a load factor of at most 3/4 upon insertion.

Oh, right.
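
The arithmetic behind that remark can be sketched as follows. This is a simplified model that only encodes the "load factor of at most 3/4" rule quoted above; it does not model how the hash table rounds the requested size to a prime, and both function names are hypothetical.

```cpp
#include <cstddef>

// True if holding `elements` entries in `slots` slots would exceed a
// load factor of 3/4 (simplified model, not the GCC implementation).
bool exceeds_load_factor(std::size_t slots, std::size_t elements) {
  return elements * 4 > slots * 3;
}

// Smallest slot count that keeps `elements` entries at or below a
// 3/4 load factor, i.e. ceil(4 * elements / 3).
std::size_t min_slots_for(std::size_t elements) {
  return (elements * 4 + 2) / 3;
}
```

Under this model a table sized for exactly 32 entries cannot hold 32 entries without growing, whereas 43 slots would be enough.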

>> >
>> > But - I wonder if a UID sorted array isn't a very much better data
>> > structure for this?
>> > That is, a vec >?
>>
>> Yeah, I was thinking along these lines too.
>> Having a field directly in the PARM_DECL node would probably be prettiest.
>> In general this is probably not that important as the vast majority of the
>> time we have few parameters and linear lookup is just fine.
>
> There are 6 bits of DECL_OFFSET_ALIGN that could be re-purposed, but
> 64 parameters is a bit low.  _Maybe_ PARM_DECL doesn't need any of
> the tree_base bits so could use the full word for sth else as well ...
>
> I also thought it might be interesting to only record PARM_DECLs that
> we have interesting info for and skip VARYING ones.  So with an
> indirection DECL_OFFSET_ALIGN -> index to non-varying param or
> -1 the encoding space could shrink.
>
> But still using a vec<> looks like a straight-forward improvement here.

Yeah, 64 parameters seems too tight.  I guess a testcase in which we
would record information for that many parameters would be quite
artificial, but I can imagine something like that in machine generated
code.

Below is the patch based on DECL_UIDs in a vector.  The problem with
std::pair is that it is not GC-friendly and the transformation summary
unfortunately needs to live in GC.  So I added a simple GTY marked
structure.
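
As a rough illustration of the idea (not the patch itself), a UID-sorted vector with binary-search lookup could look like the sketch below. UidToIdx is a hypothetical stand-in for the GTY-marked element type, plain unsigned integers stand in for DECL_UIDs, and std::vector is used instead of GCC's vec.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical element: maps a parameter's UID to its position in the
// parameter list.
struct UidToIdx {
  unsigned uid;
  unsigned index;
};

// Build the map from UIDs listed in parameter order, then sort by UID.
std::vector<UidToIdx> build_uid_map(const std::vector<unsigned> &param_uids) {
  std::vector<UidToIdx> map;
  map.reserve(param_uids.size());
  for (unsigned i = 0; i < param_uids.size(); ++i)
    map.push_back({param_uids[i], i});
  std::sort(map.begin(), map.end(),
            [](const UidToIdx &a, const UidToIdx &b) { return a.uid < b.uid; });
  return map;
}

// Binary search for UID; returns the parameter index or -1 if absent.
int lookup_param_index(const std::vector<UidToIdx> &map, unsigned uid) {
  auto it = std::lower_bound(
      map.begin(), map.end(), uid,
      [](const UidToIdx &elt, unsigned key) { return elt.uid < key; });
  if (it != map.end() && it->uid == uid)
    return static_cast<int>(it->index);
  return -1;
}
```

The lookup costs O(log n) comparisons instead of a walk over the whole parameter chain, at the price of one allocation and one sort per function.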

Bootstrapped, tested and (together with the subsequent patch) LTO
bootstrapped on x86_64-linux, as is and with a lower threshold to
create the mapping.  OK for master now?

Thanks,

Martin

Subject: [PATCH 1/2] ipa-cp: Avoid long linear searches through DECL_ARGUMENTS

There have been concerns that linear searches through DECL_ARGUMENTS
that are often necessary to compute the index of a particular
PARM_DECL which is the key to results of IPA-CP can happen often
enough to be a compile time issue, especially if we plug the results
into value numbering, as I intend to do with a follow-up patch.

This patch creates a vector sorted according to PARM_DECLs to do the look-up
for all functions which have some information discovered by IPA-CP and which
have 32 parameters or more.  32 is a hard-wired magical constant here to
capture the trade-off between the memory allocation overhead and length of the
linear search.  I do not think it is worth making it a --param but if people
think it appropriate, I can turn it into one.

gcc/ChangeLog:

2023-05-31  Martin Jambor  

* ipa-prop.h (ipa_uid_to_idx_map_elt): New type.
(struct ipcp_transformation): Rearrange members according to
C++ class coding convention, add m_uid_to_idx,
get_param_index and maybe_create_parm_idx_map.
* ipa-cp.cc (ipcp_transformation::get_param_index): New function.
(compare_uids): Likewise.
(ipcp_transformation::maybe_create_parm_idx_map): Likewise.
*

Re: [PATCH v2] rs6000: Add buildin for mffscrn instructions

2023-05-31 Thread Carl Love via Gcc-patches

Kewen:

On Wed, 2023-05-31 at 17:11 +0800, Kewen.Lin wrote:
> > So, there is no need for the builtin to have to determine if the user
> > is storing the result of the __builtin_set_fpscr_rn.  The RN bits will
> > always be updated by the __builtin_set_fpscr_rn builtin and the
> > existing fields of the FPSCR will always be returned by the builtin.
> 
> Yeah, I agree, even with pre-P9 code when the returned value is unused,
> I'd expect DCE can eliminate the part for the FPSCR bits reading and
> masking, it's just like before (only setting RN bits).
> 
> The only concern I mentioned before is the built-in name doesn't clearly
> match what it does (with extending, it returns something instead) since
> it's only saying "set" and setting RN bits, the return value is easily
> misunderstood as returning old RN bits, the documentation has to explain
> and note it well.
> 
> Looking forward to Segher's opinion on this.

I have the patch to extend the __builtin_set_fpscr_rn builtin working. 
I agree the documentation on the instructions in the ISA is not really
clear about that.  It needs to be much more explicit in the builtin
description that the current RN field is returned then the field is
updated with the new RN bits from the argument.  

I sent the patch, with the updated builtin description and testcases to
the GLibC team to see what they thought of it.  The goal was for the
builtin to be effectively a "drop in replacement" for the inline asm
that they have.  I was planning on posting the new version if the GLibC
team says it works for them.  Hopefully I will hear from them soon.

Carl

Re: Build-break in libstdc++-v3 at r14-1442-ge1240bda3e0bb1 for non-float128 targets

2023-05-31 Thread Jonathan Wakely via Gcc-patches

On Wed, 31 May 2023 at 16:29, Hans-Peter Nilsson via Libstdc++ <
libstd...@gcc.gnu.org> wrote:

> Since I don't see a quick fix at r14-1444-g3f4853a5f00fab, I
> thought I'd better notify the author (I have written authors
> if there was more than one ;-) of suspect commits in the
> range r14-1425-g80ee7d02e8db48..e1240bda3e0b for the
> build-break at r14-1442-ge1240bda3e0bb1 for cris-elf, where
> I get:
>
> /x/gcc/libstdc++-v3/src/c++17/floating_from_chars.cc:1330:47: error:
> '_Float128' is not supported on this target
>  1330 | from_chars(const char* first, const char* last, _Float128& value,
>   |   ^
>

Sorry, I'll fix or revert it today.



> /x/gcc/libstdc++-v3/src/c++17/floating_from_chars.cc:1330:49: error:
> expected identifier before '_Float128'
>  1330 | from_chars(const char* first, const char* last, _Float128& value,
>   | ^
> /x/gcc/libstdc++-v3/src/c++17/floating_from_chars.cc:1330:49: error:
> '_Float128' is not supported on this target
> /x/gcc/libstdc++-v3/src/c++17/floating_from_chars.cc:1330:49: error:
> expected ',' or '...' before '_Float128'
> /x/gcc/libstdc++-v3/src/c++17/floating_from_chars.cc: In function
> 'std::from_chars_result std::from_chars(const char*, const char*, int)':
> /x/gcc/libstdc++-v3/src/c++17/floating_from_chars.cc:1340:53: error: 'fmt'
> was not declared in this scope; did you mean 'fma'?
>  1340 |   auto res = std::from_chars(first, last, ldbl_val, fmt);
>   | ^~~
>   | fma
> /x/gcc/libstdc++-v3/src/c++17/floating_from_chars.cc:1342:5: error:
> 'value' was not declared in this scope
>  1342 | value = ldbl_val;
>   | ^
> make[5]: *** [Makefile:587: floating_from_chars.lo] Error 1
>
> brgds, H-P
>
>

Re: [PATCH V2] Testsuite: Fix a fail about xtheadcondmov-indirect-rv64.c

2023-05-31 Thread Jeff Law via Gcc-patches





On 5/31/23 09:08, shiyul...@iscas.ac.cn wrote:

From: yulong 

I found a failure in the xtheadcondmov-indirect-rv64.c test case and provide a
way to solve it.
In this patch, I take Kito's advice and modify the form of the function
bodies. The operands are now matched with patterns like \s*[a-x0-9]+.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/xtheadcondmov-indirect-rv32.c:Modify
 * gcc.target/riscv/xtheadcondmov-indirect-rv64.c:Modify

I adjusted the ChangeLog entry and pushed this to the trunk.

jeff

Build-break in libstdc++-v3 at r14-1442-ge1240bda3e0bb1 for non-float128 targets

2023-05-31 Thread Hans-Peter Nilsson via Gcc-patches

Since I don't see a quick fix at r14-1444-g3f4853a5f00fab, I
thought I'd better notify the author (I have written authors
if there was more than one ;-) of suspect commits in the
range r14-1425-g80ee7d02e8db48..e1240bda3e0b for the
build-break at r14-1442-ge1240bda3e0bb1 for cris-elf, where
I get:

/x/gcc/libstdc++-v3/src/c++17/floating_from_chars.cc:1330:47: error: '_Float128' is not supported on this target
 1330 | from_chars(const char* first, const char* last, _Float128& value,
  |   ^
/x/gcc/libstdc++-v3/src/c++17/floating_from_chars.cc:1330:49: error: expected identifier before '_Float128'
 1330 | from_chars(const char* first, const char* last, _Float128& value,
  | ^
/x/gcc/libstdc++-v3/src/c++17/floating_from_chars.cc:1330:49: error: '_Float128' is not supported on this target
/x/gcc/libstdc++-v3/src/c++17/floating_from_chars.cc:1330:49: error: expected ',' or '...' before '_Float128'
/x/gcc/libstdc++-v3/src/c++17/floating_from_chars.cc: In function 'std::from_chars_result std::from_chars(const char*, const char*, int)':
/x/gcc/libstdc++-v3/src/c++17/floating_from_chars.cc:1340:53: error: 'fmt' was not declared in this scope; did you mean 'fma'?
 1340 |   auto res = std::from_chars(first, last, ldbl_val, fmt);
  | ^~~
  | fma
/x/gcc/libstdc++-v3/src/c++17/floating_from_chars.cc:1342:5: error: 'value' was not declared in this scope
 1342 | value = ldbl_val;
  | ^
make[5]: *** [Makefile:587: floating_from_chars.lo] Error 1

brgds, H-P

[PATCH] reload_cse_move2add: Handle trivial single_set:s

2023-05-31 Thread Hans-Peter Nilsson via Gcc-patches

Tested cris-elf, bootstrapped & checked native
x86_64-pc-linux-gnu for good measure.  Ok to commit?

If it wasn't for there already being an auto_inc_dec pass,
this looks like a good place to put it, considering the
framework data.  (BTW, current auto-inc-dec generation is so
poor that you can replace half of what auto_inc_dec does
with a few peephole2s.)

brgds, H-P

-- >8 --
The reload_cse_move2add part of "postreload" handled only
insns whose PATTERN was a SET.  That excludes insns that
e.g. clobber a flags register, which it does only for
"simplicity".  The patch extends the "simplicity" to most
single_set insns.  For a subset of those insns there's still
an assumption; that the single_set of a PARALLEL insn is the
first element in the PARALLEL.  If the assumption fails,
it's no biggie; the optimization just isn't performed.
Don't let the name deceive you, this optimization doesn't
hit often, but as often (or as rarely) for LRA as for reload
at least on e.g. cris-elf where the biggest effect was seen
in reducing repeated addresses in copies from fixed-address
arrays, like in gcc.c-torture/compile/pr78694.c.

* postreload.cc (move2add_use_add2_insn): Handle
trivial single_sets.  Rename variable PAT to SET.
(move2add_use_add3_insn, reload_cse_move2add): Similar.
---
 gcc/postreload.cc | 67 +++
 1 file changed, 38 insertions(+), 29 deletions(-)

diff --git a/gcc/postreload.cc b/gcc/postreload.cc
index fb392651e1b6..996206f589d3 100644
--- a/gcc/postreload.cc
+++ b/gcc/postreload.cc
@@ -1744,8 +1744,8 @@ static bool
 move2add_use_add2_insn (scalar_int_mode mode, rtx reg, rtx sym, rtx off,
rtx_insn *insn)
 {
-  rtx pat = PATTERN (insn);
-  rtx src = SET_SRC (pat);
+  rtx set = single_set (insn);
+  rtx src = SET_SRC (set);
   int regno = REGNO (reg);
   rtx new_src = gen_int_mode (UINTVAL (off) - reg_offset[regno], mode);
   bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (insn));
@@ -1764,21 +1764,21 @@ move2add_use_add2_insn (scalar_int_mode mode, rtx reg, rtx sym, rtx off,
 (reg)), would be discarded.  Maybe we should
 try a truncMN pattern?  */
   if (INTVAL (off) == reg_offset [regno])
-   changed = validate_change (insn, &SET_SRC (pat), reg, 0);
+   changed = validate_change (insn, &SET_SRC (set), reg, 0);
 }
   else
 {
   struct full_rtx_costs oldcst, newcst;
   rtx tem = gen_rtx_PLUS (mode, reg, new_src);
 
-  get_full_set_rtx_cost (pat, &oldcst);
-  SET_SRC (pat) = tem;
-  get_full_set_rtx_cost (pat, &newcst);
-  SET_SRC (pat) = src;
+  get_full_set_rtx_cost (set, &oldcst);
+  SET_SRC (set) = tem;
+  get_full_set_rtx_cost (set, &newcst);
+  SET_SRC (set) = src;
 
   if (costs_lt_p (&newcst, &oldcst, speed)
  && have_add2_insn (reg, new_src))
-   changed = validate_change (insn, &SET_SRC (pat), tem, 0);   
+   changed = validate_change (insn, &SET_SRC (set), tem, 0);
   else if (sym == NULL_RTX && mode != BImode)
{
  scalar_int_mode narrow_mode;
@@ -1796,10 +1796,15 @@ move2add_use_add2_insn (scalar_int_mode mode, rtx reg, rtx sym, rtx off,
narrow_reg),
   narrow_src);
  get_full_set_rtx_cost (new_set, );
- if (costs_lt_p (, , speed))
+
+ /* We perform this replacement only if NEXT is either a
+naked SET, or else its single_set is the first element
+in a PARALLEL.  */
+ rtx *setloc = GET_CODE (PATTERN (insn)) == PARALLEL
+   ?  (PATTERN (insn), 0) :  (insn);
+ if (*setloc == set && costs_lt_p (, , speed))
{
- changed = validate_change (insn,  (insn),
-new_set, 0);
+ changed = validate_change (insn, setloc, new_set, 0);
  if (changed)
break;
}
@@ -1825,8 +1830,8 @@ static bool
 move2add_use_add3_insn (scalar_int_mode mode, rtx reg, rtx sym, rtx off,
rtx_insn *insn)
 {
-  rtx pat = PATTERN (insn);
-  rtx src = SET_SRC (pat);
+  rtx set = single_set (insn);
+  rtx src = SET_SRC (set);
   int regno = REGNO (reg);
   int min_regno = 0;
   bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (insn));
@@ -1836,10 +1841,10 @@ move2add_use_add3_insn (scalar_int_mode mode, rtx reg, rtx sym, rtx off,
   rtx plus_expr;
 
   init_costs_to_max ();
-  get_full_set_rtx_cost (pat, );
+  get_full_set_rtx_cost (set, );
 
   plus_expr = gen_rtx_PLUS (GET_MODE (reg), reg, const0_rtx);
-  SET_SRC (pat) = plus_expr;
+  SET_SRC (set) = plus_expr;
 
   for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
 if (move2add_valid_value_p (i, mode)
@@ -1864,7 +1869,7 @@ move2add_use_add3_insn (scalar_int_mode mode, rtx reg, rtx sym, rtx off,
else

[PATCH V2] Testsuite: Fix a fail about xtheadcondmov-indirect-rv64.c

2023-05-31 Thread shiyulong

From: yulong 

I found a failure in the xtheadcondmov-indirect-rv64.c test case and provide a
way to solve it.
In this patch, I take Kito's advice and modify the form of the function
bodies. The operands are now matched with patterns like \s*[a-x0-9]+.
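
To show what the relaxed patterns accept, here is a small C++ sketch that applies the first addi pattern from the diff below to a few assembly lines. It assumes std::regex's default ECMAScript syntax treats \t, \s and the character class the same way the testsuite's matcher does for this pattern; matches_addi_line is a hypothetical helper, not part of the testsuite.

```cpp
#include <regex>
#include <string>

// Checks one assembly line against the relaxed pattern used in the
// updated test, which no longer hard-codes the register names.
bool matches_addi_line(const std::string &line) {
  // Pattern text copied from the patch; [a-x0-9]+ accepts any register
  // name built from those characters, such as a0, a5 or t1.
  static const std::regex pattern(
      R"(addi\t\s*[a-x0-9]+,\s*[a-x0-9]+,-1000+)");
  return std::regex_match(line, pattern);
}
```

The same line is accepted whether the compiler picks a5 or a4 as the destination, which is the register-allocation difference that made the fixed-register form of the test fail.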

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadcondmov-indirect-rv32.c:Modify
* gcc.target/riscv/xtheadcondmov-indirect-rv64.c:Modify

---
 .../riscv/xtheadcondmov-indirect-rv32.c   | 50 +--
 .../riscv/xtheadcondmov-indirect-rv64.c   | 50 +--
 2 files changed, 50 insertions(+), 50 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/xtheadcondmov-indirect-rv32.c b/gcc/testsuite/gcc.target/riscv/xtheadcondmov-indirect-rv32.c
index e2b135f3d00..d0df59c5e1c 100644
--- a/gcc/testsuite/gcc.target/riscv/xtheadcondmov-indirect-rv32.c
+++ b/gcc/testsuite/gcc.target/riscv/xtheadcondmov-indirect-rv32.c
@@ -5,9 +5,9 @@
 
 /*
 **ConEmv_imm_imm_reg:
-** addia5,a0,-1000
-** li  a0,10
-** th.mvneza0,a1,a5
+** addi\t\s*[a-x0-9]+,\s*[a-x0-9]+,-1000+
+** li\t\s*[a-x0-9]+,10+
+** th.mvnez\t\s*[a-x0-9]+,\s*[a-x0-9]+,\s*[a-x0-9]+
 ** ret
 */
 int ConEmv_imm_imm_reg(int x, int y){
@@ -17,9 +17,9 @@ int ConEmv_imm_imm_reg(int x, int y){
 
 /*
 **ConEmv_imm_reg_reg:
-** addia5,a0,-1000
-** th.mveqza2,a1,a5
-** mv  a0,a2
+** addi\t\s*[a-x0-9]+,\s*[a-x0-9]+,-1000+
+** th.mveqz\t\s*[a-x0-9]+,\s*[a-x0-9]+,\s*[a-x0-9]+
+** mv\t\s*[a-x0-9]+,\s*[a-x0-9]+
 ** ret
 */
 int ConEmv_imm_reg_reg(int x, int y, int z){
@@ -29,9 +29,9 @@ int ConEmv_imm_reg_reg(int x, int y, int z){
 
 /*
 **ConEmv_reg_imm_reg:
-** sub a1,a0,a1
-** li  a0,10
-** th.mvneza0,a2,a1
+** sub\t\s*[a-x0-9]+,\s*[a-x0-9]+,\s*[a-x0-9]+
+** li\t\s*[a-x0-9]+,10+
+** th.mvnez\t\s*[a-x0-9]+,\s*[a-x0-9]+,\s*[a-x0-9]+
 ** ret
 */
 int ConEmv_reg_imm_reg(int x, int y, int z){
@@ -41,9 +41,9 @@ int ConEmv_reg_imm_reg(int x, int y, int z){
 
 /*
 **ConEmv_reg_reg_reg:
-** sub a1,a0,a1
-** th.mveqza3,a2,a1
-** mv  a0,a3
+** sub\t\s*[a-x0-9]+,\s*[a-x0-9]+,\s*[a-x0-9]+
+** th.mveqz\t\s*[a-x0-9]+,\s*[a-x0-9]+,\s*[a-x0-9]+
+** mv\t\s*[a-x0-9]+,\s*[a-x0-9]+
 ** ret
 */
 int ConEmv_reg_reg_reg(int x, int y, int z, int n){
@@ -53,10 +53,10 @@ int ConEmv_reg_reg_reg(int x, int y, int z, int n){
 
 /*
 **ConNmv_imm_imm_reg:
-** addia5,a0,-1000
-** li  a0,9998336
-** addia0,a0,1664
-** th.mveqza0,a1,a5
+** addi\t\s*[a-x0-9]+,\s*[a-x0-9]+,-1000+
+** li\t\s*[a-x0-9]+,9998336+
+** addi\t\s*[a-x0-9]+,\s*[a-x0-9]+,1664+
+** th.mveqz\t\s*[a-x0-9]+,\s*[a-x0-9]+,\s*[a-x0-9]+
 ** ret
 */
 int ConNmv_imm_imm_reg(int x, int y){
@@ -66,9 +66,9 @@ int ConNmv_imm_imm_reg(int x, int y){
 
 /*
 **ConNmv_imm_reg_reg:
-** addia0,a0,-1000
-** th.mvneza2,a1,a0
-** mv  a0,a2
+** addi\t\s*[a-x0-9]+,\s*[a-x0-9]+,-1000+
+** th.mvnez\t\s*[a-x0-9]+,\s*[a-x0-9]+,\s*[a-x0-9]+
+** mv\t\s*[a-x0-9]+,\s*[a-x0-9]+
 ** ret
 */
 int ConNmv_imm_reg_reg(int x, int y, int z){
@@ -78,9 +78,9 @@ int ConNmv_imm_reg_reg(int x, int y, int z){
 
 /*
 **ConNmv_reg_imm_reg:
-** sub a1,a0,a1
-** li  a0,10
-** th.mveqza0,a2,a1
+** sub\t\s*[a-x0-9]+,\s*[a-x0-9]+,\s*[a-x0-9]+
+** li\t\s*[a-x0-9]+,10+
+** th.mveqz\t\s*[a-x0-9]+,\s*[a-x0-9]+,\s*[a-x0-9]+
 ** ret
 */
 int ConNmv_reg_imm_reg(int x, int y, int z){
@@ -90,9 +90,9 @@ int ConNmv_reg_imm_reg(int x, int y, int z){
 
 /*
 **ConNmv_reg_reg_reg:
-** sub a0,a0,a1
-** th.mvneza3,a2,a0
-** mv  a0,a3
+** sub\t\s*[a-x0-9]+,\s*[a-x0-9]+,\s*[a-x0-9]+
+** th.mvnez\t\s*[a-x0-9]+,\s*[a-x0-9]+,\s*[a-x0-9]+
+** mv\t\s*[a-x0-9]+,\s*[a-x0-9]+
 ** ret
 */
 int ConNmv_reg_reg_reg(int x, int y, int z, int n){
diff --git a/gcc/testsuite/gcc.target/riscv/xtheadcondmov-indirect-rv64.c b/gcc/testsuite/gcc.target/riscv/xtheadcondmov-indirect-rv64.c
index 99956f8496c..cc971a75ace 100644
--- a/gcc/testsuite/gcc.target/riscv/xtheadcondmov-indirect-rv64.c
+++ b/gcc/testsuite/gcc.target/riscv/xtheadcondmov-indirect-rv64.c
@@ -5,9 +5,9 @@
 
 /*
 **ConEmv_imm_imm_reg:
-** addia5,a0,-1000
-** li  a0,10
-** th.mvneza0,a1,a5
+** addi\t\s*[a-x0-9]+,\s*[a-x0-9]+,-1000+
+** li\t\s*[a-x0-9]+,10+
+** th.mvnez\t\s*[a-x0-9]+,\s*[a-x0-9]+,\s*[a-x0-9]+
 ** ret
 */
 int ConEmv_imm_imm_reg(int x, int y){
@@ -17,9 +17,9 @@ int ConEmv_imm_imm_reg(int x, int y){
 
 /*
 **ConEmv_imm_reg_reg:
-** addia0,a0,-1000
-** th.mveqza2,a1,a5
-** mv  a0,a2
+** addi\t\s*[a-x0-9]+,\s*[a-x0-9]+,-1000+
+** th.mveqz\t\s*[a-x0-9]+,\s*[a-x0-9]+,\s*[a-x0-9]+
+** mv\t\s*[a-x0-9]+,\s*[a-x0-9]+
 ** ret
 */
 int ConEmv_imm_reg_reg(int x, int y, int z){
@@ -29,9 +29,9 @@ int ConEmv_imm_reg_reg(int x, int y, int z){
 
 /*

[PATCH V2] VECT: Change flow of decrement IV

2023-05-31 Thread juzhe . zhong

From: Ju-Zhe Zhong 

Following Richi's suggestion, I change the current decrement IV flow from:

do {
   remain -= MIN (vf, remain);
} while (remain != 0);

into:

do {
   old_remain = remain;
   len = MIN (vf, remain);
   remain -= vf;
} while (old_remain > vf);

to enhance SCEV.
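
As a sanity check on the two flows, the sketch below records the length handled in each iteration. It is only a model: old_flow and new_flow are hypothetical names, unsigned 64-bit integers stand in for the IV type, and the new flow uses the strict comparison that the GT_EXPR in the patch below generates.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Original flow: the IV is decremented by the length actually processed.
std::vector<std::uint64_t> old_flow(std::uint64_t remain, std::uint64_t vf) {
  std::vector<std::uint64_t> lens;
  do {
    std::uint64_t len = std::min(vf, remain);
    lens.push_back(len);
    remain -= len;
  } while (remain != 0);
  return lens;
}

// New flow: the IV is decremented by the constant step vf, and the exit
// test compares the value from before the decrement against vf.
std::vector<std::uint64_t> new_flow(std::uint64_t remain, std::uint64_t vf) {
  std::vector<std::uint64_t> lens;
  std::uint64_t old_remain;
  do {
    old_remain = remain;
    lens.push_back(std::min(vf, remain));
    remain -= vf;  // may wrap in the final iteration; never read afterwards
  } while (old_remain > vf);
  return lens;
}
```

Both functions produce the same per-iteration lengths, but in the second the IV has a constant step, which is the property the change is after.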

Includes fixes from Kewen.


This patch will need to wait for Kewen's test feedback.

Testing on X86 is ongoing.

Co-Authored by: Kewen Lin  

gcc/ChangeLog:

* tree-vect-loop-manip.cc (vect_set_loop_controls_directly): Change
decrement IV flow.
(vect_set_loop_condition_partial_vectors): Ditto.

---
 gcc/tree-vect-loop-manip.cc | 36 +---
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index acf3642ceb2..3f735945e67 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -483,7 +483,7 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
 gimple_stmt_iterator loop_cond_gsi,
 rgroup_controls *rgc, tree niters,
 tree niters_skip, bool might_wrap_p,
-tree *iv_step)
+tree *iv_step, tree *compare_step)
 {
   tree compare_type = LOOP_VINFO_RGROUP_COMPARE_TYPE (loop_vinfo);
   tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
@@ -538,9 +538,9 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
   ...
   vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0);
   ...
-  ivtmp_35 = ivtmp_9 - _36;
+  ivtmp_35 = ivtmp_9 - POLY_INT_CST [4, 4];
   ...
-  if (ivtmp_35 != 0)
+  if (ivtmp_9 > POLY_INT_CST [4, 4])
 goto ; [83.33%]
   else
 goto ; [16.67%]
@@ -549,13 +549,15 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
   tree step = rgc->controls.length () == 1 ? rgc->controls[0]
   : make_ssa_name (iv_type);
   /* Create decrement IV.  */
-  create_iv (nitems_total, MINUS_EXPR, step, NULL_TREE, loop, &incr_gsi,
-insert_after, &index_before_incr, &index_after_incr);
+  create_iv (nitems_total, MINUS_EXPR, nitems_step, NULL_TREE, loop,
+&incr_gsi, insert_after, &index_before_incr,
+&index_after_incr);
   gimple_seq_add_stmt (header_seq, gimple_build_assign (step, MIN_EXPR,
index_before_incr,
nitems_step));
   *iv_step = step;
-  return index_after_incr;
+  *compare_step = nitems_step;
+  return index_before_incr;
 }
 
   /* Create increment IV.  */
@@ -825,6 +827,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
  arbitrarily pick the last.  */
   tree test_ctrl = NULL_TREE;
   tree iv_step = NULL_TREE;
+  tree compare_step = NULL_TREE;
   rgroup_controls *rgc;
   rgroup_controls *iv_rgc = nullptr;
   unsigned int i;
@@ -861,7 +864,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
 &preheader_seq, &header_seq,
 loop_cond_gsi, rgc, niters,
 niters_skip, might_wrap_p,
-&iv_step);
+&iv_step, &compare_step);
 
iv_rgc = rgc;
  }
@@ -884,10 +887,21 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
 
   /* Get a boolean result that tells us whether to iterate.  */
   edge exit_edge = single_exit (loop);
-  tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? EQ_EXPR : NE_EXPR;
-  tree zero_ctrl = build_zero_cst (TREE_TYPE (test_ctrl));
-  gcond *cond_stmt = gimple_build_cond (code, test_ctrl, zero_ctrl,
-   NULL_TREE, NULL_TREE);
+  gcond *cond_stmt;
+  if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
+{
+  gcc_assert (compare_step);
+  tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? LE_EXPR : GT_EXPR;
+  cond_stmt = gimple_build_cond (code, test_ctrl, compare_step, NULL_TREE,
+NULL_TREE);
+}
+  else
+{
+  tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? EQ_EXPR : NE_EXPR;
+  tree zero_ctrl = build_zero_cst (TREE_TYPE (test_ctrl));
+  cond_stmt
+   = gimple_build_cond (code, test_ctrl, zero_ctrl, NULL_TREE, NULL_TREE);
+}
   gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT);
 
   /* The loop iterates (NITERS - 1) / VF + 1 times.
-- 
2.36.3

Re: [committed] libstdc++: Add std::numeric_limits<__float128> specialization [PR104772]

2023-05-31 Thread Jakub Jelinek via Gcc-patches

On Wed, May 31, 2023 at 03:04:03PM +0100, Jonathan Wakely via Gcc-patches wrote:
> On Wed, 31 May 2023 at 13:23, Jonathan Wakely via Libstdc++ <
> libstd...@gcc.gnu.org> wrote:
> 
> > Tested powerpc64le-linux. Pushed to trunk.
> >
> > -- >8 --
> >
> > As suggested by Jakub in the PR, this just hardcodes the constants with
> > a Q suffix, since the properties of __float128 are not going to change.
> >
> > We can only define it for non-strict modes because the suffix gives an
> > error otherwise, even in system headers:
> >
> > limits:2085: error: unable to find numeric literal operator 'operator""Q'
> >
> > libstdc++-v3/ChangeLog:
> >
> > PR libstdc++/104772
> > * include/std/limits (numeric_limits<__float128>): Define.
> > * testsuite/18_support/numeric_limits/128bit.cc: New test.
> >
> 
> I should have tested this with clang before pushing:
> 
> /home/jwakely/gcc/latest/lib/gcc/x86_64-pc-linux-gnu/14.0.0/../../../../include/c++/14.0.0/limits:2125:16: error: use of undeclared identifier '__builtin_huge_valq'
>  { return __builtin_huge_valq(); }
>   ^
> /home/jwakely/gcc/latest/lib/gcc/x86_64-pc-linux-gnu/14.0.0/../../../../include/c++/14.0.0/limits:2129:16: error: use of undeclared identifier '__builtin_nanq'
>  { return __builtin_nanq(""); }
>   ^
> /home/jwakely/gcc/latest/lib/gcc/x86_64-pc-linux-gnu/14.0.0/../../../../include/c++/14.0.0/limits:2133:
> 16: error: use of undeclared identifier '__builtin_nansq'
>  { return __builtin_nansq(""); }
>   ^

See my comments in bugzilla on how to support this stuff even without
being able to use Q suffixes.
As for the builtins, there is no reason not to use
__float128(__builtin_huge_val()) for the first case or
__float128(__builtin_nan("")) for the second case.
I'm afraid there is nothing that can be done about signalling NaN
if neither __builtin_nansq nor __builtin_nansf128 is supported.

Jakub

RE: Re: [PATCH] RISC-V: Add vwadd/vwsub/vwmul/vwmulsu.vv lowering optimizaiton for RVV auto-vectorization

2023-05-31 Thread Li, Pan2 via Gcc-patches

Committed with that change, thanks Jeff.

Pan

-----Original Message-----
From: Gcc-patches On Behalf Of 钟居哲
Sent: Wednesday, May 31, 2023 9:24 PM
To: Jeff Law ; rdapp.gcc ; 
gcc-patches 
Cc: kito.cheng ; kito.cheng ; 
palmer ; palmer 
Subject: Re: Re: [PATCH] RISC-V: Add vwadd/vwsub/vwmul/vwmulsu.vv 
lowering optimizaiton for RVV auto-vectorization

I have sent V2 with the comment added:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620243.html 
Could you take a look at it?




juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-05-31 20:58
To: Robin Dapp; juzhe.zhong; gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer
Subject: Re: [PATCH] RISC-V: Add vwadd/vwsub/vwmul/vwmulsu.vv lowering 
optimizaiton for RVV auto-vectorization
 
 
On 5/31/23 05:55, Robin Dapp wrote:
> Hi Juzhe,
> 
>> The approach is quite simple and obvious, changing extension pattern
>> into define_insn_and_split will make combine PASS combine into widen
>> operations naturally.
> 
> looks good to me.  Tiny nit: I would add a comment above the patterns
> to clarify why insn_and_split instead of expand.  Something like "to help
> combine match...", no need for a V2 though.
OK with that change.
 
jeff

Re: [committed] libstdc++: Add std::numeric_limits<__float128> specialization [PR104772]

2023-05-31 Thread Jonathan Wakely via Gcc-patches

On Wed, 31 May 2023 at 13:23, Jonathan Wakely via Libstdc++ <
libstd...@gcc.gnu.org> wrote:

> Tested powerpc64le-linux. Pushed to trunk.
>
> -- >8 --
>
> As suggested by Jakub in the PR, this just hardcodes the constants with
> a Q suffix, since the properties of __float128 are not going to change.
>
> We can only define it for non-strict modes because the suffix gives an
> error otherwise, even in system headers:
>
> limits:2085: error: unable to find numeric literal operator 'operator""Q'
>
> libstdc++-v3/ChangeLog:
>
> PR libstdc++/104772
> * include/std/limits (numeric_limits<__float128>): Define.
> * testsuite/18_support/numeric_limits/128bit.cc: New test.
>

I should have tested this with clang before pushing:

/home/jwakely/gcc/latest/lib/gcc/x86_64-pc-linux-gnu/14.0.0/../../../../include/c++/14.0.0/limits:2125:
16: error: use of undeclared identifier '__builtin_huge_valq'
 { return __builtin_huge_valq(); }
  ^
/home/jwakely/gcc/latest/lib/gcc/x86_64-pc-linux-gnu/14.0.0/../../../../include/c++/14.0.0/limits:2129:
16: error: use of undeclared identifier '__builtin_nanq'
 { return __builtin_nanq(""); }
  ^
/home/jwakely/gcc/latest/lib/gcc/x86_64-pc-linux-gnu/14.0.0/../../../../include/c++/14.0.0/limits:2133:
16: error: use of undeclared identifier '__builtin_nansq'
 { return __builtin_nansq(""); }
  ^

Re: [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets.

2023-05-31 Thread Jeff Law via Gcc-patches





On 5/31/23 06:19, Manolis Tsamis wrote:

On Tue, May 30, 2023 at 2:30 AM Jeff Law  wrote:




On 5/25/23 08:02, Manolis Tsamis wrote:

On Thu, May 25, 2023 at 4:53 PM Richard Biener via Gcc-patches
 wrote:


On Thu, May 25, 2023 at 3:32 PM Jeff Law via Gcc-patches
 wrote:




On 5/25/23 07:01, Richard Biener via Gcc-patches wrote:

On Thu, May 25, 2023 at 2:36 PM Manolis Tsamis  wrote:


Implementation of the new RISC-V optimization pass for memory offset
calculations, documentation and testcases.


Why do fwprop or combine not do what you want?

I think a lot of them end up coming from register elimination.


Why isn't this a problem for other targets then?  Or maybe it is and this
shouldn't be a machine specific pass?  Maybe postreload-gcse should
perform strength reduction (I can't think of any other post reload pass
that would do something even remotely related).

Richard.



It should be a problem for other targets as well (especially RISC-style ISAs).

It can be easily seen by comparing the generated code for the
testcases: Example for testcase-2 on AArch64:
https://godbolt.org/z/GMT1K7Ebr
Although the patterns in the test cases are the simple ones, since the
complex ones only manifest in complex programs, the point still
holds.
The code for this pass is quite generic and could work for most/all
targets if that would be interesting.

Interestingly enough, fold-mem-offsets seems to interact strangely with the
load/store pair support on aarch64.  Note how store2a uses 2 stp
instructions on the trunk, but 4 str instructions with fold-mem-offsets.
Yet in load1r we're able to generate a load-quad rather than two load
pairs.  Weird.



I'm confused, where is this comparison from?
The fold-mem-offsets pass is only run on RISCV and doesn't (shouldn't)
affect AArch64.

I only see the 2x stp / 4x str in the godbolt link, but that is gcc vs
clang, no fold-mem-offsets involved here.
My bad!  I should have looked at the headings more closely.  I thought 
you'd set up a with/without fold-mem-offsets comparison.


jeff

RE: [PATCH] RISC-V: Add testcase for vrsub.vi auto-vectorization

2023-05-31 Thread Li, Pan2 via Gcc-patches

Committed, thanks Jeff.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Jeff Law via Gcc-patches
Sent: Wednesday, May 31, 2023 9:07 PM
To: juzhe.zh...@rivai.ai; gcc-patches@gcc.gnu.org
Cc: kito.ch...@gmail.com; kito.ch...@sifive.com; pal...@dabbelt.com; 
pal...@rivosinc.com; rdapp@gmail.com
Subject: Re: [PATCH] RISC-V: Add testcase for vrsub.vi auto-vectorization



On 5/31/23 04:23, juzhe.zh...@rivai.ai wrote:
> From: Juzhe-Zhong 
> 
> Apparently, we are missing vrsub.vi tests.
> 
> gcc/testsuite/ChangeLog:
> 
>  * gcc.target/riscv/rvv/autovec/binop/vsub-run.c: Add vsub.vi.
>  * gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv.c: Ditto.
>  * gcc.target/riscv/rvv/autovec/binop/vsub-rv64gcv.c: Ditto.
>  * gcc.target/riscv/rvv/autovec/binop/vsub-template.h: Ditto.
OK
jeff

RE: [PATCH] RISC-V: Remove FRM for vfwcvt (RVV float to float widening conversion)

2023-05-31 Thread Li, Pan2 via Gcc-patches

Committed, thanks Jeff.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Jeff Law via Gcc-patches
Sent: Wednesday, May 31, 2023 9:03 PM
To: juzhe.zh...@rivai.ai; gcc-patches@gcc.gnu.org
Cc: kito.ch...@gmail.com; kito.ch...@sifive.com; pal...@dabbelt.com; 
pal...@rivosinc.com; rdapp@gmail.com
Subject: Re: [PATCH] RISC-V: Remove FRM for vfwcvt (RVV float to float widening 
conversion)



On 5/31/23 04:35, juzhe.zh...@rivai.ai wrote:
> From: Juzhe-Zhong 
> 
> Based on the discussion here:
> https://github.com/riscv/riscv-v-spec/issues/884
> 
> vfwcvt doesn't depend on FRM. So remove FRM preparing for mode switching 
> support.
> 
> gcc/ChangeLog:
> 
>  * config/riscv/vector.md: Remove FRM.
OK
jeff

Re: [PATCH] RISC-V: Add vwadd/vwsub/vwmul/vwmulsu.vv lowering optimizaiton for RVV auto-vectorization

2023-05-31 Thread Jeff Law via Gcc-patches





On 5/31/23 07:23, 钟居哲 wrote:

I have sent V2 with the comment added:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620243.html 


Could you take a look at it?
I'd probably remove "PASS" from that comment.  I don't think you need to 
post a V3.  Just remove that word from the comment and commit.

jeff

RE: [PATCH] RISC-V: Remove FRM for vfwcvt.f.x.v (RVV integer to float widening conversion)

2023-05-31 Thread Li, Pan2 via Gcc-patches

Committed, thanks Jeff.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Jeff Law via Gcc-patches
Sent: Wednesday, May 31, 2023 9:01 PM
To: juzhe.zh...@rivai.ai; gcc-patches@gcc.gnu.org
Cc: kito.ch...@gmail.com; kito.ch...@sifive.com; pal...@dabbelt.com; 
pal...@rivosinc.com; rdapp@gmail.com
Subject: Re: [PATCH] RISC-V: Remove FRM for vfwcvt.f.x.v (RVV integer to 
float widening conversion)



On 5/31/23 04:43, juzhe.zh...@rivai.ai wrote:
> From: Juzhe-Zhong 
> 
> Based on the discussion here:
> https://github.com/riscv/riscv-v-spec/issues/884
> 
> vfwcvt.f.x.v doesn't depend on FRM. So remove FRM preparing for mode 
> switching support.
> 
> gcc/ChangeLog:
> 
>  * config/riscv/vector.md: Remove FRM.
OK.
jeff

RE: [PATCH] RISC-V: Remove FRM for vfncvt.rod instruction

2023-05-31 Thread Li, Pan2 via Gcc-patches

Committed, thanks Jeff.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Jeff Law via Gcc-patches
Sent: Wednesday, May 31, 2023 9:02 PM
To: juzhe.zh...@rivai.ai; gcc-patches@gcc.gnu.org
Cc: kito.ch...@gmail.com; kito.ch...@sifive.com; pal...@dabbelt.com; 
pal...@rivosinc.com; rdapp@gmail.com
Subject: Re: [PATCH] RISC-V: Remove FRM for vfncvt.rod instruction



On 5/31/23 04:47, juzhe.zh...@rivai.ai wrote:
> From: Juzhe-Zhong 
> 
> Apparently, vfncvt.rod rounding mode is encoded, so we don't need FRM.
> 
> gcc/ChangeLog:
> 
>  * config/riscv/vector.md: Remove FRM.
OK
jeff

Re: ping^^: [PATCH] rs6000: Enable const_anchor for 'addi'

2023-05-31 Thread David Edelsohn via Gcc-patches

On Tue, May 30, 2023 at 11:00 PM Jiufu Guo  wrote:

>
> Gentle ping...
>
> Jiufu Guo via Gcc-patches  writes:
>
> > Gentle ping...
> >
> > Jiufu Guo via Gcc-patches  writes:
> >
> >> Hi,
> >>
> >> I'm thinking that we may enable this patch for stage1, so ping it.
> >> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603530.html
> >>
> >> BR,
> >> Jeff (Jiufu)
> >>
> >> Jiufu Guo  writes:
> >>
> >>> Hi,
> >>>
> > >>> cse.cc has a feature called const_anchor.  It supports generating
> > >>> new constants by adding a small gap/offset to an existing
> > >>> constant.  For example:
> >>>
> >>> void __attribute__ ((noinline)) foo (long long *a)
> >>> {
> >>>   *a++ = 0x2351847027482577LL;
> >>>   *a++ = 0x2351847027482578LL;
> >>> }
> > >>> The second constant (0x2351847027482578LL) can be computed by adding '1'
> > >>> to the first constant (0x2351847027482577LL).
> > >>> This is profitable if more than one instruction is needed to build the
> > >>> second constant.
> >>>
> > >>> * For rs6000, we can enable this functionality, as the instruction
> > >>> 'addi' does exactly this when the gap is smaller than 0x8000.
> > >>>
> > >>> * Besides enabling TARGET_CONST_ANCHOR on rs6000, this patch also fixes
> > >>> one issue. The issue is:
> > >>> "gcc_assert (SCALAR_INT_MODE_P (mode))" is a requirement for function
> > >>> "try_const_anchors".
> >>>
> >>> * One potential side effect of this patch:
> >>> Comparing with
> >>> "r101=0x2351847027482577LL
> >>> ...
> >>> r201=0x2351847027482578LL"
> >>> The new r201 will be "r201=r101+1", and then r101 will live longer,
> >>> and would increase pressure when allocating registers.
> >>> But I feel, this would be acceptable for this const_anchor feature.
> >>>
> > >>> * With this patch, I checked the performance change on SPEC2017;
> > >>> the change is not significant, since this functionality is not
> > >>> hit on any hot path. There is run-time variation/noise (e.g. on
> > >>> povray_r/xalancbmk_r/xz_r) that is not caused by the patch.
> >>>
> > >>> With this patch, I also checked the changes in object files (from
> > >>> GCC bootstrap and SPEC).  The significant change is the improvement
> > >>> "addi" vs. "2 or more insns: lis+or.."; it also exposes some other
> > >>> optimization opportunities, such as combine/jump2.  Code to
> > >>> store/load one more register also occurs in a few cases, but it
> > >>> does not impact overall performance.
> > >>>
> > >>> * To refine this patch, some earlier discussions are referenced:
> >>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=33699
> >>> https://gcc.gnu.org/pipermail/gcc-patches/2009-April/260421.html
> >>> https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566744.html
> >>>
> >>>
> >>> Bootstrap and regtest pass on ppc64 and ppc64le for this patch.
> >>> Is this ok for trunk?
>

Hi, Jiufu

Thanks for developing this patch and your persistence.

The rs6000.cc part of the patch (TARGET_CONST_ANCHOR) is okay for Stage 1.
This is approved.

I don't have the authority to approve the change to cse_insn.  Is the
cse_insn change a prerequisite?  Will the rs6000 change break or produce
wrong code without the cse change?  The second part of the patch should be
posted separately to the mailing list, with a cc for appropriate
maintainers, because most maintainers will not be following this specific
thread to approve the other part of the patch.

Thanks, David


> >>>
> >>>
> >>> BR,
> >>> Jeff (Jiufu)
> >>>
> >>> gcc/ChangeLog:
> >>>
> >>> * config/rs6000/rs6000.cc (TARGET_CONST_ANCHOR): New define.
> >>> * cse.cc (cse_insn): Add guard condition.
> >>>
> >>> gcc/testsuite/ChangeLog:
> >>>
> >>> * gcc.target/powerpc/const_anchors.c: New test.
> >>> * gcc.target/powerpc/try_const_anchors_ice.c: New test.
> >>>
> >>> ---
> >>>  gcc/config/rs6000/rs6000.cc   |  4 
> >>>  gcc/cse.cc|  3 ++-
> >>>  .../gcc.target/powerpc/const_anchors.c| 20 +++
> >>>  .../powerpc/try_const_anchors_ice.c   | 16 +++
> >>>  4 files changed, 42 insertions(+), 1 deletion(-)
> >>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/const_anchors.c
> >>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/try_const_anchors_ice.c
> >>>
> >>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> >>> index d2743f7bce6..80cded6dec1 100644
> >>> --- a/gcc/config/rs6000/rs6000.cc
> >>> +++ b/gcc/config/rs6000/rs6000.cc
> >>> @@ -1760,6 +1760,10 @@ static const struct attribute_spec rs6000_attribute_table[] =
> >>>
> >>>  #undef TARGET_UPDATE_IPA_FN_TARGET_INFO
> >>>  #define TARGET_UPDATE_IPA_FN_TARGET_INFO rs6000_update_ipa_fn_target_info
> >>> +
> >>> +#undef TARGET_CONST_ANCHOR
> >>> +#define TARGET_CONST_ANCHOR 0x8000
> >>> +
> >>>
> >>>
> >>>  /* Processor table.  */
> >>> diff --git a/gcc/cse.cc b/gcc/cse.cc
> >>> index b13afd4ba72..56542b91c1e 100644
> >>> --- a/gcc/cse.cc
> >>> +++ b/gcc/cse.cc
> >>> @@

Re: Re: [PATCH] RISC-V: Add vwadd/vwsub/vwmul/vwmulsu.vv lowering optimizaiton for RVV auto-vectorization

2023-05-31 Thread 钟居哲

I have sent V2 with the comment added:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620243.html 
Could you take a look at it?




juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-05-31 20:58
To: Robin Dapp; juzhe.zhong; gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer
Subject: Re: [PATCH] RISC-V: Add vwadd/vwsub/vwmul/vwmulsu.vv lowering 
optimizaiton for RVV auto-vectorization
 
 
On 5/31/23 05:55, Robin Dapp wrote:
> Hi Juzhe,
> 
>> The approach is quite simple and obvious, changing extension pattern
>> into define_insn_and_split will make combine PASS combine into widen
>> operations naturally.
> 
> looks good to me.  Tiny nit: I would add a comment above the patterns
> to clarify why insn_and_split instead of expand.  Something like "to help
> combine match...", no need for a V2 though.
OK with that change.
 
jeff

[PATCH V2] RISC-V: Add vwadd/vwsub/vwmul/vwmulsu.vv lowering optimizaiton for RVV auto-vectorization

2023-05-31 Thread juzhe . zhong

From: Juzhe-Zhong 

Based on the V1 patch, with this comment added:
;; Use define_insn_and_split to define vsext.vf2/vzext.vf2 will help combine PASS
;; to combine instructions as below:
;;   vsext.vf2 + vsext.vf2 + vadd.vv ==> vwadd.vv

gcc/ChangeLog:

* config/riscv/autovec.md (2): Change 
expand into define_insn_and_split.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp:
* gcc.target/riscv/rvv/autovec/widen/widen-1.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-2.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-3.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-4.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-4.c: New test.

---
 gcc/config/riscv/autovec.md   | 16 ++---
 .../riscv/rvv/autovec/widen/widen-1.c | 27 +++
 .../riscv/rvv/autovec/widen/widen-2.c | 27 +++
 .../riscv/rvv/autovec/widen/widen-3.c | 27 +++
 .../riscv/rvv/autovec/widen/widen-4.c | 23 +
 .../riscv/rvv/autovec/widen/widen_run-1.c | 34 +++
 .../riscv/rvv/autovec/widen/widen_run-2.c | 34 +++
 .../riscv/rvv/autovec/widen/widen_run-3.c | 34 +++
 .../riscv/rvv/autovec/widen/widen_run-4.c | 31 +
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp| 13 +++
 10 files changed, 262 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-4.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 4834bb4b412..2a21ce3f93c 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -401,16 +401,24 @@
 ;; - vsext.vf[2|4|8]
 ;; -
 
-(define_expand "2"
-  [(set (match_operand:VWEXTI 0 "register_operand")
+;; Use define_insn_and_split to define vsext.vf2/vzext.vf2 will help combine PASS
+;; to combine instructions as below:
+;;   vsext.vf2 + vsext.vf2 + vadd.vv ==> vwadd.vv
+(define_insn_and_split "2"
+  [(set (match_operand:VWEXTI 0 "register_operand" "=")
 (any_extend:VWEXTI
- (match_operand: 1 "register_operand")))]
+ (match_operand: 1 "register_operand" "vr")))]
   "TARGET_VECTOR"
+  "#"
+  "&& can_create_pseudo_p ()"
+  [(const_int 0)]
 {
   insn_code icode = code_for_pred_vf2 (, mode);
   riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, operands);
   DONE;
-})
+}
+  [(set_attr "type" "vext")
+   (set_attr "mode" "")])
 
 (define_expand "2"
   [(set (match_operand:VQEXTI 0 "register_operand")
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-1.c
new file mode 100644
index 000..00edecab089
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-1.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable" } */
+
+#include 
+
+#define TEST_TYPE(TYPE1, TYPE2)                                                \
+  __attribute__ ((noipa)) void vwadd_##TYPE1_##TYPE2 (TYPE1 *__restrict dst,   \
+                                                      TYPE2 *__restrict a,     \
+                                                      TYPE2 *__restrict b,     \
+                                                      int n)                   \
+  {                                                                            \
+    for (int i = 0; i < n; i++)                                                \
+      dst[i] = (TYPE1) a[i] + (TYPE1) b[i];                                    \
+  }
+
+#define TEST_ALL()                                                             \
+  TEST_TYPE (int16_t, int8_t)                                                  \
+  TEST_TYPE (uint16_t, uint8_t)                                                \
+  TEST_TYPE (int32_t, int16_t)                                                 \
+  TEST_TYPE (uint32_t, uint16_t)                                               \
+  TEST_TYPE (int64_t, int32_t)

Re: [PATCH v2] aarch64: Add pattern for bswap + rotate [PR 110039]

2023-05-31 Thread Richard Sandiford via Gcc-patches

Christophe Lyon  writes:
> After commit g:d8545fb2c71683f407bfd96706103297d4d6e27b, we missed a
> pattern to match the new GIMPLE form.
>
> With this patch, gcc.target/aarch64/rev16_2.c passes again.
>
> 2023-05-31  Christophe Lyon  
>
>   PR target/110039
>   gcc/
>   * config/aarch64/aarch64.md (aarch64_rev16si2_alt3): New
>   pattern.

OK, thanks.

Richard

> ---
>  gcc/config/aarch64/aarch64.md | 10 ++
>  1 file changed, 10 insertions(+)
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 8b8951d7b14..9af7024da43 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -6267,6 +6267,16 @@
>[(set_attr "type" "rev")]
>  )
>  
> +;; Similar pattern to match (rotate (bswap) 16)
> +(define_insn "aarch64_rev16si2_alt3"
> +  [(set (match_operand:SI 0 "register_operand" "=r")
> +(rotate:SI (bswap:SI (match_operand:SI 1 "register_operand" "r"))
> +   (const_int 16)))]
> +  ""
> +  "rev16\\t%w0, %w1"
> +  [(set_attr "type" "rev")]
> +)
> +
>  ;; zero_extend version of above
>  (define_insn "*bswapsi2_uxtw"
>[(set (match_operand:DI 0 "register_operand" "=r")

Re: [PATCH] Add COMPLEX_VECTOR_INT modes

2023-05-31 Thread Richard Sandiford via Gcc-patches

Richard Biener via Gcc-patches  writes:
> On Fri, May 26, 2023 at 4:35 PM Andrew Stubbs  wrote:
>>
>> Hi all,
>>
>> I want to implement a vector DIVMOD libfunc for amdgcn, but I can't just
>> do it because the GCC middle-end models DIVMOD's return value as
>> "complex int" type, and there are no vector equivalents of that type.
>>
>> Therefore, this patch adds minimal support for "complex vector int"
>> modes.  I have not attempted to provide any means to use these modes
>> from C, so they're really only useful for DIVMOD.  The actual libfunc
>> implementation will pack the data into wider vector modes manually.
>>
>> A knock-on effect of this is that I needed to increase the range of
>> "mode_unit_size" (several of the vector modes supported by amdgcn exceed
>> the previous 255-byte limit).
>>
>> Since this change would add a large number of new, unused modes to many
>> architectures, I have elected to *not* enable them, by default, in
>> machmode.def (where the other complex modes are created).  The new modes
>> are therefore inactive on all architectures but amdgcn, for now.
>>
>> OK for mainline?  (I've not done a full test yet, but I will.)
>
> I think it makes more sense to map vector CSImode to vector SImode with
> the double number of lanes.

Agreed FWIW.  This is effectively what AArch64 now does for x2, x3 and
x4 tuple types (where x2 is often used for complex values).

Thanks,
Richard

[PATCH v2] aarch64: Add pattern for bswap + rotate [PR 110039]

2023-05-31 Thread Christophe Lyon via Gcc-patches

After commit g:d8545fb2c71683f407bfd96706103297d4d6e27b, we missed a
pattern to match the new GIMPLE form.

With this patch, gcc.target/aarch64/rev16_2.c passes again.

2023-05-31  Christophe Lyon  

PR target/110039
gcc/
* config/aarch64/aarch64.md (aarch64_rev16si2_alt3): New
pattern.
---
 gcc/config/aarch64/aarch64.md | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 8b8951d7b14..9af7024da43 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -6267,6 +6267,16 @@
   [(set_attr "type" "rev")]
 )
 
+;; Similar pattern to match (rotate (bswap) 16)
+(define_insn "aarch64_rev16si2_alt3"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+(rotate:SI (bswap:SI (match_operand:SI 1 "register_operand" "r"))
+   (const_int 16)))]
+  ""
+  "rev16\\t%w0, %w1"
+  [(set_attr "type" "rev")]
+)
+
 ;; zero_extend version of above
 (define_insn "*bswapsi2_uxtw"
   [(set (match_operand:DI 0 "register_operand" "=r")
-- 
2.34.1

Re: [PATCH] RISC-V: Add testcase for vrsub.vi auto-vectorization

2023-05-31 Thread Jeff Law via Gcc-patches





On 5/31/23 04:23, juzhe.zh...@rivai.ai wrote:

From: Juzhe-Zhong 

Apparently, we are missing vrsub.vi tests.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/rvv/autovec/binop/vsub-run.c: Add vsub.vi.
 * gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv.c: Ditto.
 * gcc.target/riscv/rvv/autovec/binop/vsub-rv64gcv.c: Ditto.
 * gcc.target/riscv/rvv/autovec/binop/vsub-template.h: Ditto.

OK
jeff

Re: [PATCH] alias: Change return type of predicate functions from int to bool

2023-05-31 Thread Jeff Law via Gcc-patches





On 5/31/23 01:51, Uros Bizjak via Gcc-patches wrote:

Also remove a bunch of unneeded forward declarations.

gcc/ChangeLog:

 * rtl.h (true_dependence): Change return type from int to bool.
 (canon_true_dependence): Ditto.
 (read_dependence): Ditto.
 (anti_dependence): Ditto.
 (canon_anti_dependence): Ditto.
 (output_dependence): Ditto.
 (canon_output_dependence): Ditto.
 (may_alias_p): Ditto.
 * alias.h (alias_sets_conflict_p): Ditto.
 (alias_sets_must_conflict_p): Ditto.
 (objects_must_conflict_p): Ditto.
 (nonoverlapping_memrefs_p): Ditto.
 * alias.cc (rtx_equal_for_memref_p): Remove forward declaration.
 (record_set): Ditto.
 (base_alias_check): Ditto.
 (find_base_value): Ditto.
 (mems_in_disjoint_alias_sets_p): Ditto.
 (get_alias_set_entry): Ditto.
 (decl_for_component_ref): Ditto.
 (write_dependence_p): Ditto.
 (memory_modified_1): Ditto.
 (mems_in_disjoint_alias_set_p): Change return type from int to bool
 and adjust function body accordingly.
 (alias_sets_conflict_p): Ditto.
 (alias_sets_must_conflict_p): Ditto.
 (objects_must_conflict_p): Ditto.
 (rtx_equal_for_memref_p): Ditto.
 (base_alias_check): Ditto.
 (read_dependence): Ditto.
 (nonoverlapping_memrefs_p): Ditto.
 (true_dependence_1): Ditto.
 (true_dependence): Ditto.
 (canon_true_dependence): Ditto.
 (write_dependence_p): Ditto.
 (anti_dependence): Ditto.
 (canon_anti_dependence): Ditto.
 (output_dependence): Ditto.
 (canon_output_dependence): Ditto.
 (may_alias_p): Ditto.
 (init_alias_analysis): Change "changed" variable to bool.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

OK for master?

OK
jeff

Re: [PATCH] RISC-V: Remove FRM for vfwcvt (RVV float to float widening conversion)

2023-05-31 Thread Jeff Law via Gcc-patches





On 5/31/23 04:35, juzhe.zh...@rivai.ai wrote:

From: Juzhe-Zhong 

Based on the discussion here:
https://github.com/riscv/riscv-v-spec/issues/884

vfwcvt doesn't depend on FRM. So remove FRM preparing for mode switching 
support.

gcc/ChangeLog:

 * config/riscv/vector.md: Remove FRM.

OK
jeff

Re: [PATCH] RISC-V: Remove FRM for vfncvt.rod instruction

2023-05-31 Thread Jeff Law via Gcc-patches





On 5/31/23 04:47, juzhe.zh...@rivai.ai wrote:

From: Juzhe-Zhong 

Apparently, vfncvt.rod rounding mode is encoded, so we don't need FRM.

gcc/ChangeLog:

 * config/riscv/vector.md: Remove FRM.

OK
jeff

Re: [PATCH] RISC-V: Remove FRM for vfwcvt.f.x.v (RVV integer to float widening conversion)

2023-05-31 Thread Jeff Law via Gcc-patches





On 5/31/23 04:43, juzhe.zh...@rivai.ai wrote:

From: Juzhe-Zhong 

Based on the discussion here:
https://github.com/riscv/riscv-v-spec/issues/884

vfwcvt.f.x.v doesn't depend on FRM. So remove FRM preparing for mode 
switching support.

gcc/ChangeLog:

 * config/riscv/vector.md: Remove FRM.

OK.
jeff

Re: [PATCH] emit-rtl: Change return type of predicate functions from int to bool

2023-05-31 Thread Jeff Law via Gcc-patches





On 5/31/23 05:06, Uros Bizjak via Gcc-patches wrote:

Also fix some stale comments.

gcc/ChangeLog:

 * rtl.h (subreg_lowpart_p): Change return type from int to bool.
 (active_insn_p): Ditto.
 (in_sequence_p): Ditto.
 (unshare_all_rtl): Change return type from int to void.
 * emit-rtl.h (mem_expr_equal_p): Change return type from int to bool.
 * emit-rtl.cc (subreg_lowpart_p): Change return type from int to bool
 and adjust function body accordingly.
 (mem_expr_equal_p): Ditto.
 (unshare_all_rtl): Change return type from int to void
 and adjust function body accordingly.
 (verify_rtx_sharing): Remove unneeded return.
 (active_insn_p): Change return type from int to bool
 and adjust function body accordingly.
 (in_sequence_p): Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

OK for master?
OK.  And given the nature of these changes, let's consider further 
changes of this nature pre-approved.


jeff

Re: [PATCH] RISC-V: Add vwadd/vwsub/vwmul/vwmulsu.vv lowering optimizaiton for RVV auto-vectorization

2023-05-31 Thread Jeff Law via Gcc-patches





On 5/31/23 05:55, Robin Dapp wrote:

Hi Juzhe,


The approach is quite simple and obvious, changing extension pattern
into define_insn_and_split will make combine PASS combine into widen
operations naturally.


looks good to me.  Tiny nit: I would add a comment above the patterns
to clarify why insn_and_split instead of expand.  Something like "to help
combine match...", no need for a V2 though.

OK with that change.

jeff

[committed] libstdc++: Fix preprocessor conditions for std::from_chars [PR109921]

2023-05-31 Thread Jonathan Wakely via Gcc-patches

Tested powerpc64le-linux. Pushed to trunk.

-- >8 --

We use the from_chars_strtod function with __strtof128 to read a
_Float128 value, but from_chars_strtod is not defined unless uselocale
is available. This can lead to compilation failures for some targets,
because we try to define the _Float128 overload in terms of a
non-existing from_chars_strtod function.

Only try to use __strtof128 if uselocale is available, otherwise
fall back to the long double overload of std::from_chars (which might
fall back to the double overload, which should use fast_float).

This ensures we always define the full set of overloads, even if they
are not always accurate for all values of the wider types.

libstdc++-v3/ChangeLog:

PR libstdc++/109921
* src/c++17/floating_from_chars.cc (USE_STRTOF128_FOR_FROM_CHARS):
Only define when USE_STRTOD_FOR_FROM_CHARS is also defined.
(USE_STRTOD_FOR_FROM_CHARS): Do not undefine when long double is
binary64.
(from_chars(const char*, const char*, double&, chars_format)):
Check __LDBL_MANT_DIG__ == __DBL_MANT_DIG__ here.
(from_chars(const char*, const char*, _Float128&, chars_format)):
Only use from_chars_strtod when USE_STRTOD_FOR_FROM_CHARS is
defined, otherwise parse a long double and convert to _Float128.
---
 libstdc++-v3/src/c++17/floating_from_chars.cc | 20 ---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/libstdc++-v3/src/c++17/floating_from_chars.cc b/libstdc++-v3/src/c++17/floating_from_chars.cc
index ebd428d5be3..eea878072b0 100644
--- a/libstdc++-v3/src/c++17/floating_from_chars.cc
+++ b/libstdc++-v3/src/c++17/floating_from_chars.cc
@@ -64,7 +64,7 @@
 // strtold for __ieee128
 extern "C" __ieee128 __strtoieee128(const char*, char**);
 #elif __FLT128_MANT_DIG__ == 113 && __LDBL_MANT_DIG__ != 113 \
-  && defined(__GLIBC_PREREQ)
+  && defined(__GLIBC_PREREQ) && defined(USE_STRTOD_FOR_FROM_CHARS)
 #define USE_STRTOF128_FOR_FROM_CHARS 1
 extern "C" _Float128 __strtof128(const char*, char**)
   __asm ("strtof128")
@@ -77,10 +77,6 @@ extern "C" _Float128 __strtof128(const char*, char**)
 #if _GLIBCXX_FLOAT_IS_IEEE_BINARY32 && _GLIBCXX_DOUBLE_IS_IEEE_BINARY64 \
 && __SIZE_WIDTH__ >= 32
 # define USE_LIB_FAST_FLOAT 1
-# if __LDBL_MANT_DIG__ == __DBL_MANT_DIG__
-// No need to use strtold.
-#  undef USE_STRTOD_FOR_FROM_CHARS
-# endif
 #endif
 
 #if USE_LIB_FAST_FLOAT
@@ -1261,7 +1257,7 @@ from_chars_result
 from_chars(const char* first, const char* last, long double& value,
   chars_format fmt) noexcept
 {
-#if ! USE_STRTOD_FOR_FROM_CHARS
+#if __LDBL_MANT_DIG__ == __DBL_MANT_DIG__ || !defined USE_STRTOD_FOR_FROM_CHARS
   // Either long double is the same as double, or we can't use strtold.
   // In the latter case, this might give an incorrect result (e.g. values
   // out of range of double give an error, even if they fit in long double).
@@ -1329,13 +1325,23 @@ _ZSt10from_charsPKcS0_RDF128_St12chars_format(const char* first,
  __ieee128& value,
  chars_format fmt) noexcept
 __attribute__((alias ("_ZSt10from_charsPKcS0_Ru9__ieee128St12chars_format")));
-#elif defined(USE_STRTOF128_FOR_FROM_CHARS)
+#else
 from_chars_result
 from_chars(const char* first, const char* last, _Float128& value,
   chars_format fmt) noexcept
 {
+#ifdef USE_STRTOF128_FOR_FROM_CHARS
   // fast_float doesn't support IEEE binary128 format, but we can use strtold.
   return from_chars_strtod(first, last, value, fmt);
+#else
+  // Read a long double. This might give an incorrect result (e.g. values
+  // out of range of long double give an error, even if they fit in _Float128).
+  long double ldbl_val;
+  auto res = std::from_chars(first, last, ldbl_val, fmt);
+  if (res.ec == errc{})
+    value = ldbl_val;
+  return res;
+#endif
 }
 #endif
 
-- 
2.40.1

[committed] libstdc++: Deprecate std::setfill for std::basic_istream [PR109922]

2023-05-31 Thread Jonathan Wakely via Gcc-patches

Tested powerpc64le-linux. Pushed to trunk.

-- >8 --

Prior to N0966 (July 1996) the std::setfill manipulator was specified to
work with both input and output streams. In the final C++98 standard it
is only specified to work with output streams.

We have always supported it for input streams, despite that never being
in the standard, and having no meaning for any input streams defined by
the standard. This commit adds a deprecated attribute to the overload
for input streams, so that we can stop supporting this some day.
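
For readers unfamiliar with the manipulator, here is a minimal sketch of the supported use of std::setfill, on an output stream. The helper name pad_number is hypothetical and exists only for this illustration; it is not part of the patch.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper (not from the patch): format `value` right-aligned in
// `width` columns, padding on the left with `fill`.
std::string pad_number(int value, int width, char fill)
{
  assert(width >= 0);
  std::ostringstream oss;
  // std::setfill applied to an output stream: the standard, supported use.
  oss << std::setfill(fill) << std::setw(width) << value;
  return oss.str();
}
```

By contrast, an expression such as `iss >> std::setfill('a')` on an input stream is the non-standard pattern that this commit marks as deprecated.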

libstdc++-v3/ChangeLog:

PR libstdc++/109922
* include/std/iomanip (operator>>(basic_istream&, _Setfill)):
Add deprecated attribute to non-standard overload.
* doc/xml/manual/evolution.xml: Document deprecation.
* doc/html/*: Regenerate.
* testsuite/27_io/manipulators/standard/char/1.cc: Add
dg-warning for expected deprecated warning.
* testsuite/27_io/manipulators/standard/char/2.cc: Likewise.
* testsuite/27_io/manipulators/standard/wchar_t/1.cc: Likewise.
* testsuite/27_io/manipulators/standard/wchar_t/2.cc: Likewise.
---
 libstdc++-v3/doc/html/index.html | 2 +-
 libstdc++-v3/doc/html/manual/api.html| 3 +++
 libstdc++-v3/doc/html/manual/appendix.html   | 2 +-
 libstdc++-v3/doc/html/manual/appendix_porting.html   | 2 +-
 libstdc++-v3/doc/html/manual/index.html  | 2 +-
 libstdc++-v3/doc/xml/manual/evolution.xml| 9 +
 libstdc++-v3/include/std/iomanip | 2 ++
 .../testsuite/27_io/manipulators/standard/char/1.cc  | 4 ++--
 .../testsuite/27_io/manipulators/standard/char/2.cc  | 2 +-
 .../testsuite/27_io/manipulators/standard/wchar_t/1.cc   | 4 ++--
 .../testsuite/27_io/manipulators/standard/wchar_t/2.cc   | 2 +-
 11 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/libstdc++-v3/doc/xml/manual/evolution.xml b/libstdc++-v3/doc/xml/manual/evolution.xml
index a29e4df3822..4037a18d2df 100644
--- a/libstdc++-v3/doc/xml/manual/evolution.xml
+++ b/libstdc++-v3/doc/xml/manual/evolution.xml
@@ -1089,4 +1089,13 @@ Tunables glibcxx.eh_pool.obj_count 
and
 
 
 
+14
+
+
+Deprecate the non-standard overload that allows std::setfill
+to be used with std::basic_istream.
+
+
+
+
 
diff --git a/libstdc++-v3/include/std/iomanip b/libstdc++-v3/include/std/iomanip
index 5c0fb09a60e..eb82fc584b6 100644
--- a/libstdc++-v3/include/std/iomanip
+++ b/libstdc++-v3/include/std/iomanip
@@ -168,6 +168,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { return { __c }; }
 
   template
+__attribute__((__deprecated__("'std::setfill' should only be used with "
+ "output streams")))
 inline basic_istream<_CharT, _Traits>&
 operator>>(basic_istream<_CharT, _Traits>& __is, _Setfill<_CharT> __f)
 {
diff --git a/libstdc++-v3/testsuite/27_io/manipulators/standard/char/1.cc b/libstdc++-v3/testsuite/27_io/manipulators/standard/char/1.cc
index d3eba45aac1..4da43200fe5 100644
--- a/libstdc++-v3/testsuite/27_io/manipulators/standard/char/1.cc
+++ b/libstdc++-v3/testsuite/27_io/manipulators/standard/char/1.cc
@@ -51,9 +51,9 @@ test01()
   oss << setbase(8);
   VERIFY(oss.good());
 
-  // setfil
+  // setfill
   setfill('a');
-  iss >> setfill('a');
+  iss >> setfill('a'); // { dg-warning "deprecated" }
   VERIFY(iss.good());
   oss << setfill('a');
   VERIFY(oss.good());
diff --git a/libstdc++-v3/testsuite/27_io/manipulators/standard/char/2.cc b/libstdc++-v3/testsuite/27_io/manipulators/standard/char/2.cc
index dc74e1983c7..9acc057ccbb 100644
--- a/libstdc++-v3/testsuite/27_io/manipulators/standard/char/2.cc
+++ b/libstdc++-v3/testsuite/27_io/manipulators/standard/char/2.cc
@@ -40,7 +40,7 @@ test01()
   sin >> resetiosflags(ios_base::dec)
   >> setiosflags(ios_base::dec)
   >> setbase(ios_base::dec)
-  >> setfill('c')
+  >> setfill('c') // { dg-warning "deprecated" }
   >> setprecision(5)
   >> setw(20)
   >> ws;
diff --git a/libstdc++-v3/testsuite/27_io/manipulators/standard/wchar_t/1.cc b/libstdc++-v3/testsuite/27_io/manipulators/standard/wchar_t/1.cc
index 0c27e8b126a..ebfab0cc732 100644
--- a/libstdc++-v3/testsuite/27_io/manipulators/standard/wchar_t/1.cc
+++ b/libstdc++-v3/testsuite/27_io/manipulators/standard/wchar_t/1.cc
@@ -51,9 +51,9 @@ test01()
   oss << setbase(8);
   VERIFY(oss.good());
 
-  // setfil
+  // setfill
   setfill(L'a');
-  iss >> setfill(L'a');
+  iss >> setfill(L'a'); // { dg-warning "deprecated" }
   VERIFY(iss.good());
   oss << setfill(L'a');
   VERIFY(oss.good());
diff --git a/libstdc++-v3/testsuite/27_io/manipulators/standard/wchar_t/2.cc b/libstdc++-v3/testsuite/27_io/manipulators/standard/wchar_t/2.cc
index 509c152a6d7..78b812d4288 100644
--- a/libstdc++-v3/testsuite/27_io/manipulators/standard/wchar_t/2.cc
+++ b/libstdc++-v3/testsuite/27_io/manipulators/standard/wchar_t/2.cc
@@ -40,7 +40,7 @@ test01()
   sin >>

[committed] libstdc++: Add std::numeric_limits<__float128> specialization [PR104772]

2023-05-31 Thread Jonathan Wakely via Gcc-patches

Tested powerpc64le-linux. Pushed to trunk.

-- >8 --

As suggested by Jakub in the PR, this just hardcodes the constants with
a Q suffix, since the properties of __float128 are not going to change.

We can only define it for non-strict modes because the suffix gives an
error otherwise, even in system headers:

limits:2085: error: unable to find numeric literal operator 'operator""Q'
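
As a rough illustration of what the new specialization provides, the sketch below queries the mantissa width for __float128. The helper float128_mantissa_bits is hypothetical, and the code assumes nothing about the library version: with an older libstdc++, in strict mode, or on a target without __float128, the primary template (or no type at all) is found and the helper reports -1 instead.

```cpp
#include <limits>

// Hypothetical helper (not from the patch): number of mantissa bits that
// std::numeric_limits reports for __float128, or -1 when the type or the
// specialization is not available in the current compilation mode.
int float128_mantissa_bits()
{
#ifdef __SIZEOF_FLOAT128__
  if (std::numeric_limits<__float128>::is_specialized)
    return std::numeric_limits<__float128>::digits;
#endif
  return -1;
}
```

With the patch applied and a non-strict mode such as -std=gnu++17, the expected answer is 113, matching the digits constant in the diff below.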

libstdc++-v3/ChangeLog:

PR libstdc++/104772
* include/std/limits (numeric_limits<__float128>): Define.
* testsuite/18_support/numeric_limits/128bit.cc: New test.
---
 libstdc++-v3/include/std/limits   | 75 +++
 .../18_support/numeric_limits/128bit.cc   | 12 +++
 2 files changed, 87 insertions(+)
 create mode 100644 libstdc++-v3/testsuite/18_support/numeric_limits/128bit.cc

diff --git a/libstdc++-v3/include/std/limits b/libstdc++-v3/include/std/limits
index 8bafd6fb972..5f341e63b93 100644
--- a/libstdc++-v3/include/std/limits
+++ b/libstdc++-v3/include/std/limits
@@ -2073,6 +2073,81 @@ __glibcxx_float_n(128)
 
 #endif
 
+#if defined(_GLIBCXX_USE_FLOAT128) && !defined(__STRICT_ANSI__)
+  __extension__
+  template<>
+struct numeric_limits<__float128>
+{
+  static _GLIBCXX_USE_CONSTEXPR bool is_specialized = true;
+
+  static _GLIBCXX_CONSTEXPR __float128
+  min() _GLIBCXX_USE_NOEXCEPT
+  { return __extension__ 3.36210314311209350626267781732175260e-4932Q; }
+
+  static _GLIBCXX_CONSTEXPR __float128
+  max() _GLIBCXX_USE_NOEXCEPT
+  { return __extension__ 1.18973149535723176508575932662800702e+4932Q; }
+
+  static _GLIBCXX_CONSTEXPR __float128
+  lowest() _GLIBCXX_USE_NOEXCEPT
+  { return -max(); }
+
+  static _GLIBCXX_USE_CONSTEXPR int digits = 113;
+  static _GLIBCXX_USE_CONSTEXPR int digits10 = 33;
+  static _GLIBCXX_USE_CONSTEXPR int max_digits10
+   = __glibcxx_max_digits10 (112);
+  static _GLIBCXX_USE_CONSTEXPR bool is_signed = true;
+  static _GLIBCXX_USE_CONSTEXPR bool is_integer = false;
+  static _GLIBCXX_USE_CONSTEXPR bool is_exact = false;
+  static _GLIBCXX_USE_CONSTEXPR int radix = __FLT_RADIX__;
+
+  static _GLIBCXX_CONSTEXPR __float128
+  epsilon() _GLIBCXX_USE_NOEXCEPT
+  { return __extension__ 1.92592994438723585305597794258492732e-34Q; }
+
+  static _GLIBCXX_CONSTEXPR __float128
+  round_error() _GLIBCXX_USE_NOEXCEPT { return __extension__ 0.5Q; }
+
+  static _GLIBCXX_USE_CONSTEXPR int min_exponent = -16381;
+  static _GLIBCXX_USE_CONSTEXPR int min_exponent10 = -4931;
+  static _GLIBCXX_USE_CONSTEXPR int max_exponent = 16384;
+  static _GLIBCXX_USE_CONSTEXPR int max_exponent10 = 4932;
+
+  static _GLIBCXX_USE_CONSTEXPR bool has_infinity = 1;
+  static _GLIBCXX_USE_CONSTEXPR bool has_quiet_NaN = 1;
+  static _GLIBCXX_USE_CONSTEXPR bool has_signaling_NaN = has_quiet_NaN;
+  static _GLIBCXX_USE_CONSTEXPR float_denorm_style has_denorm
+   = denorm_present;
+  static _GLIBCXX_USE_CONSTEXPR bool has_denorm_loss = false;
+
+  static _GLIBCXX_CONSTEXPR __float128
+  infinity() _GLIBCXX_USE_NOEXCEPT
+  { return __builtin_huge_valq(); }
+
+  static _GLIBCXX_CONSTEXPR __float128
+  quiet_NaN() _GLIBCXX_USE_NOEXCEPT
+  { return __builtin_nanq(""); }
+
+  static _GLIBCXX_CONSTEXPR __float128
+  signaling_NaN() _GLIBCXX_USE_NOEXCEPT
+  { return __builtin_nansq(""); }
+
+  static _GLIBCXX_CONSTEXPR __float128
+  denorm_min() _GLIBCXX_USE_NOEXCEPT
+  { return __extension__ 6.47517511943802511092443895822764655e-4966Q; }
+
+  static _GLIBCXX_USE_CONSTEXPR bool is_iec559
+   = has_infinity && has_quiet_NaN && has_denorm == denorm_present;
+  static _GLIBCXX_USE_CONSTEXPR bool is_bounded = true;
+  static _GLIBCXX_USE_CONSTEXPR bool is_modulo = false;
+
+  static _GLIBCXX_USE_CONSTEXPR bool traps = false;
+  static _GLIBCXX_USE_CONSTEXPR bool tinyness_before = false;
+  static _GLIBCXX_USE_CONSTEXPR float_round_style round_style
+   = round_to_nearest;
+};
+#endif // _GLIBCXX_USE_FLOAT128 && ! __STRICT_ANSI__
+
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace
 
diff --git a/libstdc++-v3/testsuite/18_support/numeric_limits/128bit.cc b/libstdc++-v3/testsuite/18_support/numeric_limits/128bit.cc
new file mode 100644
index 000..e8ea568df94
--- /dev/null
+++ b/libstdc++-v3/testsuite/18_support/numeric_limits/128bit.cc
@@ -0,0 +1,12 @@
+// { dg-do compile }
+
+#include 
+
+#if __SIZEOF_FLOAT128__ && !defined __STRICT_ANSI__
+__extension__ template class std::numeric_limits<__float128>;
+#endif
+
+#if __SIZEOF_INT128__
+__extension__ template class std::numeric_limits<__int128>;
+__extension__ template class std::numeric_limits;
+#endif
-- 
2.40.1

[committed] libstdc++: Do not include in

2023-05-31 Thread Jonathan Wakely via Gcc-patches

Tested powerpc64le-linux. Pushed to trunk.

-- >8 --

We previously needed  in  for the std::lock_error
exception class, but that was moved out of  in 2009 when it was
removed from the C++0x draft. We can stop including  now.

Move the include for  to 
where it's actually used, and only include  in  (for
EAGAIN and EDEADLK).

Also add some headers to  that are needed but are not included
directly: ,  and .

libstdc++-v3/ChangeLog:

* include/bits/unique_lock.h: Include 
here for std::errc constants.
* include/std/mutex: Do not include  and
 here.
---
 libstdc++-v3/include/bits/unique_lock.h |  1 +
 libstdc++-v3/include/std/mutex  | 12 +++-
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/libstdc++-v3/include/bits/unique_lock.h b/libstdc++-v3/include/bits/unique_lock.h
index f14674ed471..c28e6456ad5 100644
--- a/libstdc++-v3/include/bits/unique_lock.h
+++ b/libstdc++-v3/include/bits/unique_lock.h
@@ -37,6 +37,7 @@
 #else
 
 #include 
+#include  // for std::errc
 #include  // for std::swap
 #include  // for std::defer_lock_t
 
diff --git a/libstdc++-v3/include/std/mutex b/libstdc++-v3/include/std/mutex
index 79420388abc..2b0059fcfe8 100644
--- a/libstdc++-v3/include/std/mutex
+++ b/libstdc++-v3/include/std/mutex
@@ -37,11 +37,13 @@
 # include 
 #else
 
-#include 
-#include 
-#include 
-#include 
-#include 
+#include  // std::tuple
+#include // is_same_v
+#include// EAGAIN, EDEADLK
+#include   // duration, time_point, is_clock_v
+#include  // __throw_system_error
+#include   // __invoke
+#include// std::forward
 #include 
 #include 
 #if ! _GTHREAD_USE_MUTEX_TIMEDLOCK
-- 
2.40.1

[committed] libstdc++: Replace obsolete shell syntax in configure.ac

2023-05-31 Thread Jonathan Wakely via Gcc-patches

Tested powerpc64le-linux. Pushed to trunk.

-- >8 --

The current POSIX standard says that the -a and -o operators to the
'test' utility are obsolete, and the shell operators && and || should be
used instead.

libstdc++-v3/ChangeLog:

* configure.ac: Replace use of -o operator for test.
* configure: Regenerate.
---
 libstdc++-v3/configure| 2 +-
 libstdc++-v3/configure.ac | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/configure b/libstdc++-v3/configure
index 167aabd8fc5..4db8a284083 100755
--- a/libstdc++-v3/configure
+++ b/libstdc++-v3/configure
@@ -3298,7 +3298,7 @@ if test "$build" != "$host"; then
   *-*-darwin*,*-*-darwin*)
hostos=`echo $host | sed 's/.*-darwin/darwin/'`
targetos=`echo $target | sed 's/.*-darwin/darwin/'`
-   if test $hostos = $targetos -o $targetos = darwin ; then
+   if test $hostos = $targetos || test $targetos = darwin ; then
  GLIBCXX_IS_NATIVE=true
fi
;;
diff --git a/libstdc++-v3/configure.ac b/libstdc++-v3/configure.ac
index df01f58bd83..0abe54e7b9a 100644
--- a/libstdc++-v3/configure.ac
+++ b/libstdc++-v3/configure.ac
@@ -45,7 +45,7 @@ if test "$build" != "$host"; then
   *-*-darwin*,*-*-darwin*)
hostos=`echo $host | sed 's/.*-darwin/darwin/'`
targetos=`echo $target | sed 's/.*-darwin/darwin/'`
-   if test $hostos = $targetos -o $targetos = darwin ; then
+   if test $hostos = $targetos || test $targetos = darwin ; then
  GLIBCXX_IS_NATIVE=true
fi
;;
-- 
2.40.1

[committed] libstdc++: Add missing noexcept to std::scoped_allocator_adaptor

2023-05-31 Thread Jonathan Wakely via Gcc-patches

Tested powerpc64le-linux. Pushed to trunk.

-- >8 --

The standard requires these constructors and accessors to be noexcept.
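
One way to observe the property being fixed is to query it with the noexcept operator and the type traits. This is only a sketch: the alias Adaptor and both helper functions are hypothetical names for this illustration, and the constructor check is expected to return true only with a library that contains this change.

```cpp
#include <memory>
#include <scoped_allocator>
#include <type_traits>
#include <utility>

// Hypothetical alias for this illustration: a single-level adaptor.
using Adaptor = std::scoped_allocator_adaptor<std::allocator<int>>;

// True when the non-const accessors are declared noexcept.
constexpr bool accessors_are_noexcept()
{
  return noexcept(std::declval<Adaptor&>().outer_allocator())
      && noexcept(std::declval<Adaptor&>().inner_allocator());
}

// True when copy and move construction are declared noexcept. The result
// depends on the library version, so it is reported rather than asserted.
constexpr bool copy_and_move_are_noexcept()
{
  return std::is_nothrow_copy_constructible<Adaptor>::value
      && std::is_nothrow_move_constructible<Adaptor>::value;
}
```

The new test in the patch presumably performs checks of this kind with static_assert; the functions above return the values instead so that the sketch also compiles against older headers.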

libstdc++-v3/ChangeLog:

* include/std/scoped_allocator (scoped_allocator_adaptor): Add
noexcept to all constructors except the default constructor.
(scoped_allocator_adaptor::inner_allocator): Add noexcept.
(scoped_allocator_adaptor::outer_allocator): Likewise.
* testsuite/20_util/scoped_allocator/noexcept.cc: New test.
---
 libstdc++-v3/include/std/scoped_allocator | 45 ++
 .../20_util/scoped_allocator/noexcept.cc  | 47 +++
 2 files changed, 73 insertions(+), 19 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/20_util/scoped_allocator/noexcept.cc

diff --git a/libstdc++-v3/include/std/scoped_allocator b/libstdc++-v3/include/std/scoped_allocator
index 68e6afb1000..a11545026ba 100644
--- a/libstdc++-v3/include/std/scoped_allocator
+++ b/libstdc++-v3/include/std/scoped_allocator
@@ -65,7 +65,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 struct __outermost_type
 {
   using type = _Alloc;
-  static type& _S_outermost(_Alloc& __a) { return __a; }
+  static type& _S_outermost(_Alloc& __a) noexcept { return __a; }
 };
 
   template
@@ -79,7 +79,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   >;
 
   static typename __base::type&
-  _S_outermost(_Alloc& __a)
+  _S_outermost(_Alloc& __a) noexcept
   { return __base::_S_outermost(__a.outer_allocator()); }
 };
 
@@ -104,11 +104,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   __inner_type_impl& operator=(__inner_type_impl&&) = default;
 
   template
-  __inner_type_impl(const __inner_type_impl<_Alloc>& __other)
+  __inner_type_impl(const __inner_type_impl<_Alloc>& __other) noexcept
   { }
 
   template
-  __inner_type_impl(__inner_type_impl<_Alloc>&& __other)
+  __inner_type_impl(__inner_type_impl<_Alloc>&& __other) noexcept
   { }
 
   __type&
@@ -137,16 +137,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   __inner_type_impl& operator=(__inner_type_impl&&) = default;
 
   template
-  __inner_type_impl(const __inner_type_impl<_Allocs...>& __other)
+  __inner_type_impl(const __inner_type_impl<_Allocs...>& __other) noexcept
   : _M_inner(__other._M_inner) { }
 
   template
-  __inner_type_impl(__inner_type_impl<_Allocs...>&& __other)
+  __inner_type_impl(__inner_type_impl<_Allocs...>&& __other) noexcept
   : _M_inner(std::move(__other._M_inner)) { }
 
 template
   explicit
-  __inner_type_impl(_Args&&... __args)
+  __inner_type_impl(_Args&&... __args) noexcept
   : _M_inner(std::forward<_Args>(__args)...) { }
 
   __type&
@@ -307,31 +307,32 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template>
 scoped_allocator_adaptor(_Outer2&& __outer,
- const _InnerAllocs&... __inner)
+ const _InnerAllocs&... __inner) noexcept
 : _OuterAlloc(std::forward<_Outer2>(__outer)),
   _M_inner(__inner...)
 { }
 
-  scoped_allocator_adaptor(const scoped_allocator_adaptor& __other)
+  scoped_allocator_adaptor(const scoped_allocator_adaptor& __other) noexcept
   : _OuterAlloc(__other.outer_allocator()),
_M_inner(__other._M_inner)
   { }
 
-  scoped_allocator_adaptor(scoped_allocator_adaptor&& __other)
+  scoped_allocator_adaptor(scoped_allocator_adaptor&& __other) noexcept
   : _OuterAlloc(std::move(__other.outer_allocator())),
_M_inner(std::move(__other._M_inner))
   { }
 
   template>
 scoped_allocator_adaptor(
-const scoped_allocator_adaptor<_Outer2, _InnerAllocs...>& __other)
+ const scoped_allocator_adaptor<_Outer2, _InnerAllocs...>& __other
+   ) noexcept
 : _OuterAlloc(__other.outer_allocator()),
   _M_inner(__other._M_inner)
 { }
 
   template>
 scoped_allocator_adaptor(
-scoped_allocator_adaptor<_Outer2, _InnerAllocs...>&& __other)
+ scoped_allocator_adaptor<_Outer2, _InnerAllocs...>&& __other) noexcept
 : _OuterAlloc(std::move(__other.outer_allocator())),
   _M_inner(std::move(__other._M_inner))
 { }
@@ -342,25 +343,31 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   scoped_allocator_adaptor&
   operator=(scoped_allocator_adaptor&&) = default;
 
-  inner_allocator_type& inner_allocator() noexcept
+  inner_allocator_type&
+  inner_allocator() noexcept
   { return _M_inner._M_get(this); }
 
-  const inner_allocator_type& inner_allocator() const noexcept
+  const inner_allocator_type&
+  inner_allocator() const noexcept
   { return _M_inner._M_get(this); }
 
-  outer_allocator_type& outer_allocator() noexcept
+  outer_allocator_type&
+  outer_allocator() noexcept
   { return static_cast<_OuterAlloc&>(*this); }
 
-  const outer_allocator_type&

[committed] libstdc++: Disable embedded tzdata for all 16-bit targets

2023-05-31 Thread Jonathan Wakely via Gcc-patches

Tested powerpc64le-linux. Pushed to trunk.

-- >8 --

libstdc++-v3/ChangeLog:

* acinclude.m4 (GLIBCXX_ZONEINFO_DIR): Extend logic for avr and
msp430 to all 16-bit targets.
* configure: Regenerate.
---
 libstdc++-v3/acinclude.m4 | 15 +--
 libstdc++-v3/configure| 18 --
 2 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index 8129373e9dd..eb30c4f00a5 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -5426,12 +5426,15 @@ AC_DEFUN([GLIBCXX_ZONEINFO_DIR], [
zoneinfo_dir=none
;;
 esac
-case "$host" in
-  avr-*-* | msp430-*-* ) embed_zoneinfo=no ;;
-  *)
-   # Also embed a copy of the tzdata.zi file as a static string.
-   embed_zoneinfo=yes ;;
-esac
+
+AC_COMPUTE_INT(glibcxx_cv_at_least_32bit, [sizeof(void*) >= 4])
+if test "$glibcxx_cv_at_least_32bit" -ne 0; then
+  # Also embed a copy of the tzdata.zi file as a static string.
+  embed_zoneinfo=yes
+else
+  # The embedded data is too large for 16-bit targets.
+  embed_zoneinfo=no
+fi
   elif test "x${with_libstdcxx_zoneinfo}" = xno; then
 # Disable tzdb support completely.
 zoneinfo_dir=none
diff --git a/libstdc++-v3/configure b/libstdc++-v3/configure
index 412c4bf0e85..167aabd8fc5 100755
--- a/libstdc++-v3/configure
+++ b/libstdc++-v3/configure
@@ -71903,12 +71903,18 @@ fi
zoneinfo_dir=none
;;
 esac
-case "$host" in
-  avr-*-* | msp430-*-* ) embed_zoneinfo=no ;;
-  *)
-   # Also embed a copy of the tzdata.zi file as a static string.
-   embed_zoneinfo=yes ;;
-esac
+
+if ac_fn_c_compute_int "$LINENO" "sizeof(void*) >= 4" "glibcxx_cv_at_least_32bit"""; then :
+
+fi
+
+if test "$glibcxx_cv_at_least_32bit" -ne 0; then
+  # Also embed a copy of the tzdata.zi file as a static string.
+  embed_zoneinfo=yes
+else
+  # The embedded data is too large for 16-bit targets.
+  embed_zoneinfo=no
+fi
   elif test "x${with_libstdcxx_zoneinfo}" = xno; then
 # Disable tzdb support completely.
 zoneinfo_dir=none
-- 
2.40.1

Re: [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets.

2023-05-31 Thread Manolis Tsamis

On Tue, May 30, 2023 at 2:30 AM Jeff Law  wrote:
>
>
>
> On 5/25/23 08:02, Manolis Tsamis wrote:
> > On Thu, May 25, 2023 at 4:53 PM Richard Biener via Gcc-patches
> >  wrote:
> >>
> >> On Thu, May 25, 2023 at 3:32 PM Jeff Law via Gcc-patches
> >>  wrote:
> >>>
> >>>
> >>>
> >>> On 5/25/23 07:01, Richard Biener via Gcc-patches wrote:
>  On Thu, May 25, 2023 at 2:36 PM Manolis Tsamis  
>  wrote:
> >
> > Implementation of the new RISC-V optimization pass for memory offset
> > calculations, documentation and testcases.
> 
>  Why do fwprop or combine not do what you want?
> >>> I think a lot of them end up coming from register elimination.
> >>
> >> Why isn't this a problem for other targets then?  Or maybe it is and this
> >> shouldn't be a machine specific pass?  Maybe postreload-gcse should
> >> perform strength reduction (I can't think of any other post reload pass
> >> that would do something even remotely related).
> >>
> >> Richard.
> >>
> >
> > It should be a problem for other targets as well (especially
> > RISC-style ISAs).
> >
> > It can be easily seen by comparing the generated code for the
> > testcases: Example for testcase-2 on AArch64:
> > https://godbolt.org/z/GMT1K7Ebr
> > Although the patterns in the test cases are the simple ones, since the
> > complex ones only manifest in complex programs, the point still
> > holds.
> > The code for this pass is quite generic and could work for most/all
> > targets if that would be interesting.
> Interestingly enough, fold-mem-offsets seems to interact strangely with the
> load/store pair support on aarch64.  Note how store2a uses 2 stp
> instructions on the trunk, but 4 str instructions with fold-mem-offsets.
>   Yet in load1r we're able to generate a load-quad rather than two load
> pairs.  Weird.
>

I'm confused, where is this comparison from?
The fold-mem-offsets pass is only run on RISCV and doesn't (shouldn't)
affect AArch64.

I only see the 2x stp / 4x str in the godbolt link, but that is gcc vs
clang, no fold-mem-offsets involved here.

Manolis

> jeff

Re: Re: [PATCH] VECT: Change flow of decrement IV

2023-05-31 Thread 钟居哲

Sure. I will repost it with Kewen's fix.
Previously, I tested this patch on x86, and I also applied it to upstream RVV
GCC and ran the upstream RVV testsuite.
Since upstream RVV GCC is not ready yet and is still fragile, we can't use it
to compile big programs.
I will apply this patch to my downstream GCC and run my downstream regression
tests (the regression suite is very big and includes many benchmarks).

Thanks.

juzhe.zh...@rivai.ai

From: Richard Biener
Date: 2023-05-31 18:53
To: juzhe.zh...@rivai.ai
CC: richard.sandiford; gcc-patches; linkw
Subject: Re: Re: [PATCH] VECT: Change flow of decrement IV
On Wed, 31 May 2023, juzhe.zh...@rivai.ai wrote:

> Thanks Richard.
> It seems that this patch's approach is OK for trunk?
> Maybe the only thing we should do is wait for Kewen's testing feedback, am I
> right?

Can you repost the patch with Kewen's fix and state how you tested it?

Thanks,
Richard.

Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.

2023-05-31 Thread Manolis Tsamis

On Thu, May 25, 2023 at 4:38 PM Jeff Law  wrote:
>
>
>
> On 5/25/23 06:35, Manolis Tsamis wrote:
> > Propagation of the stack pointer in cprop_hardreg is currently forbidden
> > in all cases, due to maybe_mode_change returning NULL. Relax this
> > restriction and allow propagation when no mode change is requested.
> >
> > gcc/ChangeLog:
> >
> >  * regcprop.cc (maybe_mode_change): Enable stack pointer propagation.
> I can't see how this can be correct given the stack pointer equality
> tests elsewhere in the compiler, particularly the various targets.
>
> The problem is if you change the mode then you end up with multiple REG
> expressions that reference the stack pointer.
>
> See rev: d1446456c3fcaa7be628726c9de4a877729490ca and the thread around
> the change which introduced this code.
>

Hi Jeff,

Isn't this fine for this case since:

  1) stack_pointer_rtx is used which won't cause issues with pointer
equalities (If I understand correctly).
  2) Propagation is guarded with `if (orig_mode == new_mode)` so only
when there is no mode change.

Thanks,
Manolis

>
> Jeff

Re: [PATCH] RISC-V: Add vwadd/vwsub/vwmul/vwmulsu.vv lowering optimizaiton for RVV auto-vectorization

2023-05-31 Thread Robin Dapp via Gcc-patches

Hi Juzhe,

> The approach is quite simple and obvious, changing extension pattern
> into define_insn_and_split will make combine PASS combine into widen
> operations naturally.

looks good to me.  Tiny nit: I would add a comment above the patterns
to clarify why insn_and_split instead of expand.  Something like "to help
combine match...", no need for a V2 though.

Regards
 Robin

[PATCH] RISC-V: Add vwadd/vwsub/vwmul/vwmulsu.vv lowering optimizaiton for RVV auto-vectorization

2023-05-31 Thread juzhe . zhong

From: Juzhe-Zhong 

The approach is quite simple: changing the extension pattern from
define_expand into define_insn_and_split lets the combine pass merge it
into widening operations naturally.

gcc/ChangeLog:

* config/riscv/autovec.md (2): Change 
expand into define_insn_and_split.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp:
* gcc.target/riscv/rvv/autovec/widen/widen-1.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-2.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-3.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-4.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-4.c: New test.

---
 gcc/config/riscv/autovec.md   | 13 ---
 .../riscv/rvv/autovec/widen/widen-1.c | 27 +++
 .../riscv/rvv/autovec/widen/widen-2.c | 27 +++
 .../riscv/rvv/autovec/widen/widen-3.c | 27 +++
 .../riscv/rvv/autovec/widen/widen-4.c | 23 +
 .../riscv/rvv/autovec/widen/widen_run-1.c | 34 +++
 .../riscv/rvv/autovec/widen/widen_run-2.c | 34 +++
 .../riscv/rvv/autovec/widen/widen_run-3.c | 34 +++
 .../riscv/rvv/autovec/widen/widen_run-4.c | 31 +
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp| 13 +++
 10 files changed, 259 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-4.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 4834bb4b412..e96de60123b 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -401,16 +401,21 @@
 ;; - vsext.vf[2|4|8]
 ;; -
 
-(define_expand "2"
-  [(set (match_operand:VWEXTI 0 "register_operand")
+(define_insn_and_split "2"
+  [(set (match_operand:VWEXTI 0 "register_operand" "=")
 (any_extend:VWEXTI
- (match_operand: 1 "register_operand")))]
+ (match_operand: 1 "register_operand" "vr")))]
   "TARGET_VECTOR"
+  "#"
+  "&& can_create_pseudo_p ()"
+  [(const_int 0)]
 {
   insn_code icode = code_for_pred_vf2 (, mode);
   riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, operands);
   DONE;
-})
+}
+  [(set_attr "type" "vext")
+   (set_attr "mode" "")])
 
 (define_expand "2"
   [(set (match_operand:VQEXTI 0 "register_operand")
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-1.c
new file mode 100644
index 000..00edecab089
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-1.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable" } */
+
+#include 
+
+#define TEST_TYPE(TYPE1, TYPE2) \
+  __attribute__ ((noipa)) void vwadd_##TYPE1_##TYPE2 (TYPE1 *__restrict dst, \
+                                                      TYPE2 *__restrict a, \
+                                                      TYPE2 *__restrict b, \
+                                                      int n) \
+  { \
+    for (int i = 0; i < n; i++) \
+      dst[i] = (TYPE1) a[i] + (TYPE1) b[i]; \
+  }
+
+#define TEST_ALL() \
+  TEST_TYPE (int16_t, int8_t) \
+  TEST_TYPE (uint16_t, uint8_t) \
+  TEST_TYPE (int32_t, int16_t) \
+  TEST_TYPE (uint32_t, uint16_t) \
+  TEST_TYPE (int64_t, int32_t) \
+  TEST_TYPE (uint64_t, uint32_t)
+
+TEST_ALL ()
+
+/* { dg-final { scan-assembler-times {\tvwadd\.vv} 3 } } */
+/* { dg-final { scan-assembler-times {\tvwaddu\.vv} 3 } } */
diff --git

[PATCH] ipa/109983 - (IPA) PTA speedup

2023-05-31 Thread Richard Biener via Gcc-patches

This improves the edge avoidance heuristic by re-ordering the
topological sort of the graph to make sure the component with
the ESCAPED node is processed first.  This reduces the number
of created edges, which directly correlates with the number
of bitmap_ior_into calls: those drop from 141447426 to 239596 and the
compile-time drops from 1083s to 3s.  It also improves the compile-time
for the related PR109143 from 81s to 27s.
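
The re-ordering idea can be sketched outside of GCC as follows. This is a simplified illustration, not the code from tree-ssa-structalias.cc: the Graph alias, the function names and the plain std::vector containers are all assumptions made for the example, and there is no union-find or bitmap machinery. The nodes reachable from one designated node are visited first and appended at the end of a post-order list, so a consumer that pops from the back handles them before everything else.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical graph representation: successors of each node.
using Graph = std::vector<std::vector<std::size_t>>;

// Depth-first post-order visit starting at node n.
static void visit(const Graph& g, std::vector<std::size_t>& order,
                  std::vector<bool>& visited, std::size_t n)
{
  visited[n] = true;
  for (std::size_t succ : g[n])
    if (!visited[succ])
      visit(g, order, visited, succ);
  order.push_back(n);
}

// Returns a post-order list whose tail holds the nodes reachable from
// `special`; popping from the back therefore processes them first.
std::vector<std::size_t>
order_with_special_last(const Graph& g, std::size_t special)
{
  assert(special < g.size());
  std::vector<bool> visited(g.size(), false);

  // Extract the part of the graph reachable from the special node first.
  std::vector<std::size_t> tail;
  visit(g, tail, visited, special);

  // Then visit everything else in the usual way.
  std::vector<std::size_t> order;
  for (std::size_t i = 0; i != g.size(); ++i)
    if (!visited[i])
      visit(g, order, visited, i);

  // Append the special part so it ends up at the back.
  order.insert(order.end(), tail.begin(), tail.end());
  return order;
}
```

Note that this is a heuristic ordering: if some other node has an edge into the special part, popping from the back no longer follows a strict topological order, which is acceptable for an iterative solver that repeats until nothing changes.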

I've modernized the topological sorting API on the way as well.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR ipa/109983
PR tree-optimization/109143
* tree-ssa-structalias.cc (struct topo_info): Remove.
(init_topo_info): Likewise.
(free_topo_info): Likewise.
(compute_topo_order): Simplify API, put the component
with ESCAPED last so it's processed first.
(topo_visit): Adjust.
(solve_graph): Likewise.
---
 gcc/tree-ssa-structalias.cc | 118 ++--
 1 file changed, 46 insertions(+), 72 deletions(-)

diff --git a/gcc/tree-ssa-structalias.cc b/gcc/tree-ssa-structalias.cc
index 9ded34c1dd1..8db99a42565 100644
--- a/gcc/tree-ssa-structalias.cc
+++ b/gcc/tree-ssa-structalias.cc
@@ -1585,65 +1585,6 @@ unify_nodes (constraint_graph_t graph, unsigned int to, unsigned int from,
 bitmap_clear_bit (graph->succs[to], to);
 }
 
-/* Information needed to compute the topological ordering of a graph.  */
-
-struct topo_info
-{
-  /* sbitmap of visited nodes.  */
-  sbitmap visited;
-  /* Array that stores the topological order of the graph, *in
- reverse*.  */
-  vec topo_order;
-};
-
-
-/* Initialize and return a topological info structure.  */
-
-static struct topo_info *
-init_topo_info (void)
-{
-  size_t size = graph->size;
-  struct topo_info *ti = XNEW (struct topo_info);
-  ti->visited = sbitmap_alloc (size);
-  bitmap_clear (ti->visited);
-  ti->topo_order.create (1);
-  return ti;
-}
-
-
-/* Free the topological sort info pointed to by TI.  */
-
-static void
-free_topo_info (struct topo_info *ti)
-{
-  sbitmap_free (ti->visited);
-  ti->topo_order.release ();
-  free (ti);
-}
-
-/* Visit the graph in topological order, and store the order in the
-   topo_info structure.  */
-
-static void
-topo_visit (constraint_graph_t graph, struct topo_info *ti,
-   unsigned int n)
-{
-  bitmap_iterator bi;
-  unsigned int j;
-
-  bitmap_set_bit (ti->visited, n);
-
-  if (graph->succs[n])
-EXECUTE_IF_SET_IN_BITMAP (graph->succs[n], 0, j, bi)
-  {
-   unsigned k = find (j);
-   if (!bitmap_bit_p (ti->visited, k))
- topo_visit (graph, ti, k);
-  }
-
-  ti->topo_order.safe_push (n);
-}
-
 /* Add a copy edge FROM -> TO, optimizing special cases.  Returns TRUE
if the solution of TO changed.  */
 
@@ -1925,19 +1866,56 @@ find_indirect_cycles (constraint_graph_t graph)
   scc_visit (graph, , i);
 }
 
-/* Compute a topological ordering for GRAPH, and store the result in the
-   topo_info structure TI.  */
+/* Visit the graph in topological order starting at node N, and store the
+   order in TOPO_ORDER using VISITED to indicate visited nodes.  */
 
 static void
-compute_topo_order (constraint_graph_t graph,
-   struct topo_info *ti)
+topo_visit (constraint_graph_t graph, vec _order,
+   sbitmap visited, unsigned int n)
+{
+  bitmap_iterator bi;
+  unsigned int j;
+
+  bitmap_set_bit (visited, n);
+
+  if (graph->succs[n])
+EXECUTE_IF_SET_IN_BITMAP (graph->succs[n], 0, j, bi)
+  {
+   unsigned k = find (j);
+   if (!bitmap_bit_p (visited, k))
+ topo_visit (graph, topo_order, visited, k);
+  }
+
+  topo_order.quick_push (n);
+}
+
+/* Compute a topological ordering for GRAPH, and return the result.  */
+
+static auto_vec
+compute_topo_order (constraint_graph_t graph)
 {
   unsigned int i;
   unsigned int size = graph->size;
 
+  auto_sbitmap visited (size);
+  bitmap_clear (visited);
+
+  /* For the heuristic in add_graph_edge to work optimally make sure to
+ first visit the connected component of the graph containing
+ ESCAPED.  Do this by extracting the connected component
+ with ESCAPED and append that to all other components as solve_graph
+ pops from the order.  */
+  auto_vec tail (size);
+  topo_visit (graph, tail, visited, find (escaped_id));
+
+  auto_vec topo_order (size);
+
   for (i = 0; i != size; ++i)
-if (!bitmap_bit_p (ti->visited, i) && find (i) == i)
-  topo_visit (graph, ti, i);
+if (!bitmap_bit_p (visited, i) && find (i) == i)
+  topo_visit (graph, topo_order, visited, i);
+
+  topo_order.splice (tail);
+  return topo_order;
 }
 
 /* Structure used to for hash value numbering of pointer equivalence
@@ -2765,17 +2743,14 @@ solve_graph (constraint_graph_t graph)
   while (!bitmap_empty_p (changed))
 {
   unsigned int i;
-  struct topo_info *ti = init_topo_info ();
   stats.iterations++;
 
   bitmap_obstack_initialize (_obstack);
 
-
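
The ordering trick described in the compute_topo_order comment can be sketched outside of GCC as follows. This is a minimal toy in plain C, not the code from the patch: the adjacency matrix `succ`, the names `toy_topo_visit` and `toy_compute_topo_order`, and the `special` parameter (standing in for the ESCAPED node) are all assumptions made for illustration. The component reachable from the special node is collected first into a separate tail and appended last, so a consumer that pops from the end of the order sees it first.

```c
#include <stdbool.h>

#define NUM_NODES 5

/* Toy adjacency matrix: succ[i][j] is true when there is an edge i -> j.
   It stands in for graph->succs and is purely illustrative.  */
bool succ[NUM_NODES][NUM_NODES];

/* Depth-first visit starting at N, appending nodes to ORDER in postorder
   and marking them in VISITED.  */
void
toy_topo_visit (int n, bool *visited, int *order, int *len)
{
  visited[n] = true;
  for (int j = 0; j < NUM_NODES; j++)
    if (succ[n][j] && !visited[j])
      toy_topo_visit (j, visited, order, len);
  order[(*len)++] = n;
}

/* Fill OUT (room for NUM_NODES entries) with a reverse topological
   order in which everything reachable from SPECIAL comes last.
   Returns the number of entries written.  */
int
toy_compute_topo_order (int special, int *out)
{
  bool visited[NUM_NODES] = { false };
  int tail[NUM_NODES];
  int tail_len = 0, len = 0;

  /* Extract the part of the graph reachable from SPECIAL first.  */
  toy_topo_visit (special, visited, tail, &tail_len);

  /* Then order all remaining nodes.  */
  for (int i = 0; i < NUM_NODES; i++)
    if (!visited[i])
      toy_topo_visit (i, visited, out, &len);

  /* Append the tail so that popping from the end yields it first.  */
  for (int i = 0; i < tail_len; i++)
    out[len++] = tail[i];
  return len;
}
```

With edges 0 -> 1, 2 -> 3 and 3 -> 4 and node 2 as the special node, the nodes 4, 3, 2 end up at the end of the array, so a solver popping from the back processes 2, 3, 4 before 0 and 1.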

[PATCH] IPA PTA stats enhancement and non-details dump slimming

2023-05-31 Thread Richard Biener via Gcc-patches

The following keeps track of the number of edges we avoid creating
because they redundantly feed ESCAPED.  It also avoids printing
a header for -details when not using -details.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

* tree-ssa-structalias.cc (constraint_stats::num_avoided_edges):
New.
(add_graph_edge): Count redundant edges we avoid creating.
(dump_sa_stats): Dump them.
(ipa_pta_execute): Do not dump generating constraints when
we are not dumping them.
---
 gcc/tree-ssa-structalias.cc | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-ssa-structalias.cc b/gcc/tree-ssa-structalias.cc
index 546dab5035e..9ded34c1dd1 100644
--- a/gcc/tree-ssa-structalias.cc
+++ b/gcc/tree-ssa-structalias.cc
@@ -237,6 +237,7 @@ static struct constraint_stats
   unsigned int iterations;
   unsigned int num_edges;
   unsigned int num_implicit_edges;
+  unsigned int num_avoided_edges;
   unsigned int points_to_sets_created;
 } stats;
 
@@ -1213,7 +1214,10 @@ add_graph_edge (constraint_graph_t graph, unsigned int to,
   if (to < FIRST_REF_NODE
  && bitmap_bit_p (graph->succs[from], find (escaped_id))
  && bitmap_bit_p (get_varinfo (find (to))->solution, escaped_id))
-   return false;
+   {
+ stats.num_avoided_edges++;
+ return false;
+   }
 
   if (bitmap_set_bit (graph->succs[from], to))
{
@@ -7164,6 +7168,8 @@ dump_sa_stats (FILE *outfile)
   fprintf (outfile, "Number of edges:  %d\n", stats.num_edges);
   fprintf (outfile, "Number of implicit edges: %d\n",
   stats.num_implicit_edges);
+  fprintf (outfile, "Number of avoided edges: %d\n",
+  stats.num_avoided_edges);
 }
 
 /* Dump points-to information to OUTFILE.  */
@@ -8427,7 +8433,7 @@ ipa_pta_execute (void)
  || node->clone_of)
continue;
 
-  if (dump_file)
+  if (dump_file && (dump_flags & TDF_DETAILS))
{
  fprintf (dump_file,
   "Generating constraints for %s", node->dump_name ());
-- 
2.35.3
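
For readers who do not have tree-ssa-structalias.cc at hand, the counting added to add_graph_edge can be sketched like this. It is a simplified toy and not the GCC code: `TOY_ESCAPED`, `struct toy_stats` and `toy_add_edge` are hypothetical names, successor sets and solutions are modelled as 32-bit masks, and the `to < FIRST_REF_NODE` test and the `find ()` calls of the real function are left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_ESCAPED 0u	/* hypothetical id of the ESCAPED node */

struct toy_stats
{
  unsigned num_edges;
  unsigned num_avoided_edges;
};

/* SUCCS[n] and SOLUTION[n] are bitmasks over node ids.  Try to add the
   edge FROM -> TO and return true if the successor set changed.  */
bool
toy_add_edge (uint32_t *succs, const uint32_t *solution,
	      struct toy_stats *stats, unsigned from, unsigned to)
{
  uint32_t escaped_bit = UINT32_C (1) << TOY_ESCAPED;

  /* FROM already feeds ESCAPED and TO already contains ESCAPED, so the
     edge would be redundant.  Count it instead of creating it.  */
  if ((succs[from] & escaped_bit) && (solution[to] & escaped_bit))
    {
      stats->num_avoided_edges++;
      return false;
    }

  uint32_t to_bit = UINT32_C (1) << to;
  if (succs[from] & to_bit)
    return false;		/* edge already present */

  succs[from] |= to_bit;
  stats->num_edges++;
  return true;
}
```

The point of the counter is only bookkeeping: the early return existed before, the patch makes it visible in the statistics dump.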

[PATCH] emit-rtl: Change return type of predicate functions from int to bool

2023-05-31 Thread Uros Bizjak via Gcc-patches

Also fix some stale comments.

gcc/ChangeLog:

* rtl.h (subreg_lowpart_p): Change return type from int to bool.
(active_insn_p): Ditto.
(in_sequence_p): Ditto.
(unshare_all_rtl): Change return type from int to void.
* emit-rtl.h (mem_expr_equal_p): Change return type from int to bool.
* emit-rtl.cc (subreg_lowpart_p): Change return type from int to bool
and adjust function body accordingly.
(mem_expr_equal_p): Ditto.
(unshare_all_rtl): Change return type from int to void
and adjust function body accordingly.
(verify_rtx_sharing): Remove unneeded return.
(active_insn_p): Change return type from int to bool
and adjust function body accordingly.
(in_sequence_p): Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

OK for master?

Uros.
diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc
index 51db055d214..f6276a2d0b6 100644
--- a/gcc/emit-rtl.cc
+++ b/gcc/emit-rtl.cc
@@ -215,7 +215,7 @@ const_int_hasher::hash (rtx x)
   return (hashval_t) INTVAL (x);
 }
 
-/* Returns nonzero if the value represented by X (which is really a
+/* Returns true if the value represented by X (which is really a
CONST_INT) is the same as that given by Y (which is really a
HOST_WIDE_INT *).  */
 
@@ -241,7 +241,7 @@ const_wide_int_hasher::hash (rtx x)
   return (hashval_t) hash;
 }
 
-/* Returns nonzero if the value represented by X (which is really a
+/* Returns true if the value represented by X (which is really a
CONST_WIDE_INT) is the same as that given by Y (which is really a
CONST_WIDE_INT).  */
 
@@ -274,7 +274,7 @@ const_poly_int_hasher::hash (rtx x)
   return h.end ();
 }
 
-/* Returns nonzero if CONST_POLY_INT X is an rtx representation of Y.  */
+/* Returns true if CONST_POLY_INT X is an rtx representation of Y.  */
 
 bool
 const_poly_int_hasher::equal (rtx x, const compare_type )
@@ -305,7 +305,7 @@ const_double_hasher::hash (rtx x)
   return h;
 }
 
-/* Returns nonzero if the value represented by X (really a ...)
+/* Returns true if the value represented by X (really a ...)
is the same as that represented by Y (really a ...) */
 bool
 const_double_hasher::equal (rtx x, rtx y)
@@ -313,7 +313,7 @@ const_double_hasher::equal (rtx x, rtx y)
   const_rtx const a = x, b = y;
 
   if (GET_MODE (a) != GET_MODE (b))
-return 0;
+return false;
   if (TARGET_SUPPORTS_WIDE_INT == 0 && GET_MODE (a) == VOIDmode)
 return (CONST_DOUBLE_LOW (a) == CONST_DOUBLE_LOW (b)
&& CONST_DOUBLE_HIGH (a) == CONST_DOUBLE_HIGH (b));
@@ -336,7 +336,7 @@ const_fixed_hasher::hash (rtx x)
   return h;
 }
 
-/* Returns nonzero if the value represented by X is the same as that
+/* Returns true if the value represented by X is the same as that
represented by Y.  */
 
 bool
@@ -345,7 +345,7 @@ const_fixed_hasher::equal (rtx x, rtx y)
   const_rtx const a = x, b = y;
 
   if (GET_MODE (a) != GET_MODE (b))
-return 0;
+return false;
   return fixed_identical (CONST_FIXED_VALUE (a), CONST_FIXED_VALUE (b));
 }
 
@@ -403,7 +403,7 @@ reg_attr_hasher::hash (reg_attrs *x)
   return h.end ();
 }
 
-/* Returns nonzero if the value represented by X  is the same as that given by
+/* Returns true if the value represented by X  is the same as that given by
Y.  */
 
 bool
@@ -1710,17 +1710,17 @@ subreg_size_highpart_offset (poly_uint64 outer_bytes, poly_uint64 inner_bytes)
* BITS_PER_UNIT);
 }
 
-/* Return 1 iff X, assumed to be a SUBREG,
+/* Return true iff X, assumed to be a SUBREG,
refers to the least significant part of its containing reg.
-   If X is not a SUBREG, always return 1 (it is its own low part!).  */
+   If X is not a SUBREG, always return true (it is its own low part!).  */
 
-int
+bool
 subreg_lowpart_p (const_rtx x)
 {
   if (GET_CODE (x) != SUBREG)
-return 1;
+return true;
   else if (GET_MODE (SUBREG_REG (x)) == VOIDmode)
-return 0;
+return false;
 
   return known_eq (subreg_lowpart_offset (GET_MODE (x),
  GET_MODE (SUBREG_REG (x))),
@@ -1836,20 +1836,20 @@ mem_attrs::mem_attrs ()
 size_known_p (false)
 {}
 
-/* Returns 1 if both MEM_EXPR can be considered equal
-   and 0 otherwise.  */
+/* Returns true if both MEM_EXPR can be considered equal
+   and false otherwise.  */
 
-int
+bool
 mem_expr_equal_p (const_tree expr1, const_tree expr2)
 {
   if (expr1 == expr2)
-return 1;
+return true;
 
   if (! expr1 || ! expr2)
-return 0;
+return false;
 
   if (TREE_CODE (expr1) != TREE_CODE (expr2))
-return 0;
+return false;
 
   return operand_equal_p (expr1, expr2, 0);
 }
@@ -2820,7 +2820,7 @@ unshare_all_rtl_again (rtx_insn *insn)
   unshare_all_rtl_1 (insn);
 }
 
-unsigned int
+void
 unshare_all_rtl (void)
 {
   unshare_all_rtl_1 (get_insns ());
@@ -2831,8 +2831,6 @@ unshare_all_rtl (void)
SET_DECL_RTL (decl, copy_rtx_if_shared (DECL_RTL (decl)));
   DECL_INCOMING_RTL (decl) =

Re: Re: [PATCH] VECT: Change flow of decrement IV

2023-05-31 Thread Richard Biener via Gcc-patches

On Wed, 31 May 2023, juzhe.zh...@rivai.ai wrote:

> Thanks Richard.
> Seems that this patch's approach is OK for trunk?
> Maybe the only thing we should do is wait for Kewen's testing feedback, am I
> right?

Can you repost the patch with Kewen's fix and state how you tested it?

Thanks,
Richard.

[PATCH] RISC-V: Remove FRM for vfncvt.rod instruction

2023-05-31 Thread juzhe . zhong

From: Juzhe-Zhong 

Apparently, the rounding mode of vfncvt.rod (round-to-odd) is encoded in the
instruction itself, so we don't need FRM.

gcc/ChangeLog:

* config/riscv/vector.md: Remove FRM.

---
 gcc/config/riscv/vector.md | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 3c4565dc775..cd41ebbb24f 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -7286,10 +7286,8 @@
 (match_operand 5 "const_int_operand"  "  i,  i,  i,  i,i,i")
 (match_operand 6 "const_int_operand"  "  i,  i,  i,  i,i,i")
 (match_operand 7 "const_int_operand"  "  i,  i,  i,  i,i,i")
-(match_operand 8 "const_int_operand"  "  i,  i,  i,  i,i,i")
 (reg:SI VL_REGNUM)
-(reg:SI VTYPE_REGNUM)
-(reg:SI FRM_REGNUM)] UNSPEC_VPREDICATE)
+(reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
  (unspec:
[(float_truncate:
   (match_operand:VWEXTF 3 "register_operand"  "  0,  0,  0,  0,   vr,   vr"))] UNSPEC_ROD)
-- 
2.36.1

[PATCH] RISC-V: Remove FRM for vfwcvt.f.x.v (RVV integer to float widening conversion)

2023-05-31 Thread juzhe . zhong

From: Juzhe-Zhong 

Based on the discussion here:
https://github.com/riscv/riscv-v-spec/issues/884

vfwcvt.f.x.v doesn't depend on FRM, so remove FRM in preparation for mode
switching support.

gcc/ChangeLog:

* config/riscv/vector.md: Remove FRM.

---
 gcc/config/riscv/vector.md | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 28e7e63ce69..3c4565dc775 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -7159,10 +7159,8 @@
 (match_operand 5 "const_int_operand""i,i")
 (match_operand 6 "const_int_operand""i,i")
 (match_operand 7 "const_int_operand""i,i")
-(match_operand 8 "const_int_operand""i,i")
 (reg:SI VL_REGNUM)
-(reg:SI VTYPE_REGNUM)
-(reg:SI FRM_REGNUM)] UNSPEC_VPREDICATE)
+(reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
  (any_float:VF
 (match_operand: 3 "register_operand" "   vr,   vr"))
  (match_operand:VF 2 "vector_merge_operand" "   vu,0")))]
-- 
2.36.1

[PATCH] RISC-V: Remove FRM for vfwcvt (RVV float to float widening conversion)

2023-05-31 Thread juzhe . zhong

From: Juzhe-Zhong 

Based on the discussion here:
https://github.com/riscv/riscv-v-spec/issues/884

vfwcvt doesn't depend on FRM, so remove FRM in preparation for mode switching
support.

gcc/ChangeLog:

* config/riscv/vector.md: Remove FRM.

---
 gcc/config/riscv/vector.md | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index cd696da5d89..28e7e63ce69 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -7180,10 +7180,8 @@
 (match_operand 5 "const_int_operand" "i,i")
 (match_operand 6 "const_int_operand" "i,i")
 (match_operand 7 "const_int_operand" "i,i")
-(match_operand 8 "const_int_operand" "i,i")
 (reg:SI VL_REGNUM)
-(reg:SI VTYPE_REGNUM)
-(reg:SI FRM_REGNUM)] UNSPEC_VPREDICATE)
+(reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
  (float_extend:VWEXTF
 (match_operand: 3 "register_operand" "   vr,   vr"))
  (match_operand:VWEXTF 2 "vector_merge_operand"  "   vu,0")))]
-- 
2.36.1

Re: [PATCH v4] MIPS: add speculation_barrier support

2023-05-31 Thread Maciej W. Rozycki

On Wed, 31 May 2023, YunQiang Su wrote:

> If no objection, I will commit this V4 patch.

 At first glance it has coding style issues.

  Maciej

[PATCH] RISC-V: Add testcase for vrsub.vi auto-vectorization

2023-05-31 Thread juzhe . zhong

From: Juzhe-Zhong 

Apparently, we are missing vrsub.vi tests.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vsub-run.c: Add vrsub.vi.
* gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vsub-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vsub-template.h: Ditto.

---
 .../riscv/rvv/autovec/binop/vsub-run.c| 30 ++-
 .../riscv/rvv/autovec/binop/vsub-rv32gcv.c|  1 +
 .../riscv/rvv/autovec/binop/vsub-rv64gcv.c|  1 +
 .../riscv/rvv/autovec/binop/vsub-template.h   | 28 +
 4 files changed, 59 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-run.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-run.c
index 8c6d8e88d1a..4f254872e33 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-run.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-run.c
@@ -27,6 +27,22 @@
   for (int i = 0; i < SZ; i++) \
 assert (as##TYPE[i] == 999 - VAL);
 
+#define RUN3(TYPE) \
+  TYPE as2##TYPE[SZ];  \
+  for (int i = 0; i < SZ; i++) \
+as2##TYPE[i] = i * 33 - 779;   \
+  vsubi_##TYPE (as2##TYPE, as2##TYPE, SZ); \
+  for (int i = 0; i < SZ; i++) \
+assert (as2##TYPE[i] == (TYPE)(-16 - (i * 33 - 779)));
+
+#define RUN4(TYPE) \
+  TYPE as3##TYPE[SZ];  \
+  for (int i = 0; i < SZ; i++) \
+as3##TYPE[i] = i * -17 + 667;  \
+  vsubi2_##TYPE (as3##TYPE, as3##TYPE, SZ);\
+  for (int i = 0; i < SZ; i++) \
+assert (as3##TYPE[i] == (TYPE)(15 - (i * -17 + 667)));
+
 #define RUN_ALL()  \
  RUN(int16_t, 1)   \
  RUN(uint16_t, 2)  \
@@ -39,7 +55,19 @@
  RUN2(int32_t, 9)  \
  RUN2(uint32_t, 10)\
  RUN2(int64_t, 11) \
- RUN2(uint64_t, 12)
+ RUN2(uint64_t, 12)\
+ RUN3(int16_t) \
+ RUN3(uint16_t)\
+ RUN3(int32_t) \
+ RUN3(uint32_t)\
+ RUN3(int64_t) \
+ RUN3(uint64_t)\
+ RUN4(int16_t) \
+ RUN4(uint16_t)\
+ RUN4(int32_t) \
+ RUN4(uint32_t)\
+ RUN4(int64_t) \
+ RUN4(uint64_t)
 
 int main ()
 {
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv.c
index e2bdd0fe904..a0d3802be65 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv.c
@@ -4,3 +4,4 @@
 #include "vsub-template.h"
 
 /* { dg-final { scan-assembler-times {\tvsub\.vv} 12 } } */
+/* { dg-final { scan-assembler-times {\tvrsub\.vi} 12 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-rv64gcv.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-rv64gcv.c
index f7a2691b9f3..562c026a7e4 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-rv64gcv.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-rv64gcv.c
@@ -4,3 +4,4 @@
 #include "vsub-template.h"
 
 /* { dg-final { scan-assembler-times {\tvsub\.vv} 12 } } */
+/* { dg-final { scan-assembler-times {\tvrsub\.vi} 12 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-template.h b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-template.h
index 8c0a9c99217..47f07f13462 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-template.h
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vsub-template.h
@@ -16,6 +16,22 @@
   dst[i] = a[i] - b;   \
   }
 
+#define TEST3_TYPE(TYPE)   \
+  __attribute__((noipa))   \
+  void vsubi_##TYPE (TYPE *dst, TYPE *a, int n)\
+  {\
+for (int i = 0; i < n; i++)\
+  dst[i] = -16 - a[i]; \
+  }
+
+#define TEST4_TYPE(TYPE)   \
+  __attribute__((noipa))   \
+  void vsubi2_##TYPE (TYPE *dst, TYPE *a, int n)   \
+  {\
+for (int i = 0; i < n; i++)\
+  dst[i] = 15 - a[i];  \
+  }
+
 /* *int8_t not autovec currently. */
 #define TEST_ALL() \
  TEST_TYPE(int16_t)\
@@ -30,5 +46,17 @@
  TEST2_TYPE(uint32_t)  \
  TEST2_TYPE(int64_t)   \
  TEST2_TYPE(uint64_t)
+ TEST3_TYPE(int16_t)   \
+ TEST3_TYPE(uint16_t)  \
+ TEST3_TYPE(int32_t)   \
+ TEST3_TYPE(uint32_t)  \
+ TEST3_TYPE(int64_t)   \
+ TEST3_TYPE(uint64_t)  \
+ TEST4_TYPE(int16_t)   \
+ TEST4_TYPE(uint16_t)  \
+ TEST4_TYPE(int32_t)   \
+ TEST4_TYPE(uint32_t)  \
+ TEST4_TYPE(int64_t)   \
+
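
What the new tests exercise is a reverse subtract with an immediate, i.e. dst[i] = imm - a[i], with the result truncated to the element type. The two immediates in the template, -16 and 15, are the extremes of the 5-bit signed immediate range that vrsub.vi accepts. A stand-alone sketch of the scalar semantics, using hypothetical helper names `toy_rsub_i16` and `toy_rsub_u16` rather than the macros from the testsuite:

```c
#include <stdint.h>

/* dst[i] = -16 - a[i], truncated to int16_t, mirroring TEST3_TYPE.  */
void
toy_rsub_i16 (int16_t *dst, const int16_t *a, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = (int16_t) (-16 - a[i]);
}

/* dst[i] = 15 - a[i], truncated to uint16_t, mirroring TEST4_TYPE.
   The subtraction is done in int and wraps modulo 2^16 on the cast.  */
void
toy_rsub_u16 (uint16_t *dst, const uint16_t *a, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = (uint16_t) (15 - a[i]);
}
```

The cast to TYPE in the RUN3 and RUN4 asserts matters for the unsigned variants: without it the comparison would be done on the untruncated int value.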

[PATCH][NFC][committed] aarch64: Simplify output template emission code for a few patterns

2023-05-31 Thread Kyrylo Tkachov via Gcc-patches

Hi all,

If the output code for a define_insn just does a switch (which_alternative)
with no other computation, we can almost always replace it with the more
compact MD syntax that gives one template per alternative in a
multi-alternative '@' block.
This patch cleans up some such patterns in the aarch64 backend, making them
shorter and more concise.
No behavioural change intended.

Bootstrapped and tested on aarch64-none-linux-gnu.
Pushing to trunk.
Thanks,
Kyrill

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (*aarch64_simd_mov): Rewrite
output template to avoid explicit switch on which_alternative.
(*aarch64_simd_mov): Likewise.
(and3): Likewise.
(ior3): Likewise.
* config/aarch64/aarch64.md (*mov_aarch64): Likewise.


outp.patch
Description: outp.patch

Re: [PATCH] aarch64: Add pattern for bswap + rotate [PR 110039]

2023-05-31 Thread Richard Sandiford via Gcc-patches

Christophe Lyon  writes:
> On Wed, 31 May 2023 at 11:49, Richard Sandiford 
> wrote:
>
>> Christophe Lyon  writes:
>> > After commit g:d8545fb2c71683f407bfd96706103297d4d6e27b, we missed a
>> > pattern to match the new GIMPLE form.
>> >
>> > With this patch, gcc.target/aarch64/rev16_2.c passes again.
>> >
>> > 2023-05-31  Christophe Lyon  
>> >
>> >   PR target/110039
>> >   gcc/
>> >   * config/aarch64/aarch64.md (aarch64_rev162_alt3): New
>> >   pattern.
>> > ---
>> >  gcc/config/aarch64/aarch64.md | 10 ++
>> >  1 file changed, 10 insertions(+)
>> >
>> > diff --git a/gcc/config/aarch64/aarch64.md
>> b/gcc/config/aarch64/aarch64.md
>> > index 8b8951d7b14..663353791fd 100644
>> > --- a/gcc/config/aarch64/aarch64.md
>> > +++ b/gcc/config/aarch64/aarch64.md
>> > @@ -6267,6 +6267,16 @@
>> >[(set_attr "type" "rev")]
>> >  )
>> >
>> > +;; Similar pattern to match (rotate (bswap) 16)
>> > +(define_insn "aarch64_rev162_alt3"
>> > +  [(set (match_operand:GPI 0 "register_operand" "=r")
>> > +(rotate:GPI (bswap:GPI (match_operand:GPI 1 "register_operand"
>> "r"))
>> > +(const_int 16)))]
>> > +  ""
>> > +  "rev16\\t%0, %1"
>> > +  [(set_attr "type" "rev")]
>> > +)
>> > +
>>
>> Doesn't this have to be :SI only?  The rtl expression and the
>> instruction are different for :DI.
>>
> Do you mean the other two examples in the testcase?
> ( __rev16_64_alt, __rev16_64)
> They currently use aarch64_rev16di2_alt1 and aarch64_rev16di2_alt2
> respectively.

I meant more that the new pattern would generate wrong code if someone
wrote a 64-bit bswap followed by a 64-bit rotate left.

Richard
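
To make the :SI versus :DI point concrete, here is a small portable C model. The helper names (`toy_bswap32`, `toy_rev16_64` and so on) are invented for this sketch and are not GCC or ACLE functions; `toy_rev16_*` models a byte swap inside every 16-bit halfword, which is what rev16 does on both W and X registers.

```c
#include <stdint.h>

/* Reverse all four bytes of X.  */
uint32_t
toy_bswap32 (uint32_t x)
{
  uint32_t r = 0;
  for (int i = 0; i < 4; i++, x >>= 8)
    r = (r << 8) | (x & 0xff);
  return r;
}

/* Reverse all eight bytes of X.  */
uint64_t
toy_bswap64 (uint64_t x)
{
  uint64_t r = 0;
  for (int i = 0; i < 8; i++, x >>= 8)
    r = (r << 8) | (x & 0xff);
  return r;
}

/* Rotate left by 16 bits.  */
uint32_t
toy_rotl16_32 (uint32_t x)
{
  return (x << 16) | (x >> 16);
}

uint64_t
toy_rotl16_64 (uint64_t x)
{
  return (x << 16) | (x >> 48);
}

/* Swap the two bytes inside every 16-bit halfword.  */
uint32_t
toy_rev16_32 (uint32_t x)
{
  return ((x & UINT32_C (0x00ff00ff)) << 8)
	 | ((x >> 8) & UINT32_C (0x00ff00ff));
}

uint64_t
toy_rev16_64 (uint64_t x)
{
  return ((x & UINT64_C (0x00ff00ff00ff00ff)) << 8)
	 | ((x >> 8) & UINT64_C (0x00ff00ff00ff00ff));
}
```

For the 32-bit input 0x01020304 both toy_rotl16_32 (toy_bswap32 (x)) and toy_rev16_32 (x) give 0x02010403. For the 64-bit input 0x0102030405060708 the rotate of the bswap gives 0x0605040302010807 while toy_rev16_64 gives 0x0201040306050807, so a GPI pattern would emit the wrong instruction for the DI case.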
