Re: [PATCH V4] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-07 Thread Kewen.Lin via Gcc-patches
Hi,

on 2023/5/4 21:25, juzhe.zh...@rivai.ai wrote:
> From: Ju-Zhe Zhong 
> 
> This patch is fixing V3 patch:
> https://patchwork.sourceware.org/project/gcc/patch/20230407014741.139387-1-juzhe.zh...@rivai.ai/
> 
> Fix issues according to Richard Sandiford && Richard Biener.
> 
> 1. Rename WHILE_LEN pattern into SELECT_VL according to Richard Sandiford.
> 2. Support multiple-rgroup for non-SLP auto-vectorization.
> 
> For the vec_pack_trunc pattern (multiple rgroups of non-SLP), we generate the
> total length:
> 
>  _36 = MIN_EXPR ;
> 
>  First length (MIN (X, VF/N)):
>loop_len_15 = MIN_EXPR <_36, POLY_INT_CST [2, 2]>;
> 
>  Second length (X - MIN (X, 1 * VF/N)):
>loop_len_16 = _36 - loop_len_15;
> 
>  Third length (X - MIN (X, 2 * VF/N)):
>_38 = MIN_EXPR <_36, POLY_INT_CST [4, 4]>;
>loop_len_17 = _36 - _38;
> 
>  Fourth length (X - MIN (X, 3 * VF/N)):
>_39 = MIN_EXPR <_36, POLY_INT_CST [6, 6]>;
>loop_len_18 = _36 - _39;
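The MIN_EXPR arithmetic above can be transcribed literally in C as a sketch (the names `rgroup_len`, `total`, and `per_vf` are hypothetical, with per-group VF/N = 2 as in the quoted gimple):

```c
#include <assert.h>

static unsigned min_u (unsigned a, unsigned b) { return a < b ? a : b; }

/* Literal transcription of the quoted gimple:
   first length:         MIN (X, VF/N)
   k-th length (k >= 1): X - MIN (X, k * VF/N)  */
static unsigned
rgroup_len (unsigned total, unsigned k, unsigned per_vf)
{
  if (k == 0)
    return min_u (total, per_vf);
  return total - min_u (total, k * per_vf);
}
```

With total = 7 and per_vf = 2 this yields lengths 2, 5, 3, 1, following the pattern of the four MIN_EXPR/subtract statements above.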
> 
> The reason I use MIN_EXPR instead of SELECT_VL to calculate the total length
> is that using SELECT_VL to adapt the induction IV consumes more instructions
> than just using MIN_EXPR.  Also, during testing, I found it hard to adjust
> the length correctly according to SELECT_VL.
> 
> So, in this patch we only use SELECT_VL for a single rgroup with a single
> length control.
> 
> 3. Fix the documentation of select_vl for Richard Biener (remove mode N).
> 4. Fix comments of vect_set_loop_controls_by_select_vl according to Richard 
> Biener.
> 5. Keep loop_vinfo as first parameter for "vect_get_loop_len".
> 6. Move the requirement check of get_while_len_data_ref_ptr outside, so it is
> gated at the caller site.
> 
> More comments from Richard Biener:
>>> So it's not actually saturating.  The saturating operation is done by 
>>> .WHILE_LEN?
> I define the outcome of SELECT_VL (n, vf) (WHILE_LEN) as IN_RANGE (0, min (n,
> vf)), which makes the loop control counter never underflow below zero.
> 
>>> I see.  I wonder if it makes sense to leave .WHILE_LEN aside for a start,
>>> the above scheme should also work for single rgroups, no?
>>> As said, it _looks_ like you can progress without .WHILE_LEN and using
>>> .WHILE_LEN is a pure optimization?
> Yes, SELECT_VL (WHILE_LEN) is a pure optimization for single-rgroup and allows
> the target to adjust any length = IN_RANGE (0, min (n, vf)) each iteration.
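To illustrate why that in-range guarantee is enough for decrement-IV loop control, here is a minimal C sketch (hypothetical names; it models only the MIN-based semantics, though any legal in-range value would behave the same way):

```c
#include <assert.h>

/* Model of SELECT_VL: any return value in [1, MIN (n, vf)] is legal for
   n > 0; this sketch simply returns the MIN itself.  */
static unsigned
select_vl (unsigned n, unsigned vf)
{
  return n < vf ? n : vf;
}

/* Decrement-IV loop control: because select_vl never exceeds the
   remaining count n, the subtraction can never underflow zero.  */
static unsigned
run_loop (unsigned n, unsigned vf)
{
  unsigned processed = 0;
  while (n > 0)
    {
      unsigned vl = select_vl (n, vf);
      processed += vl; /* the loop body would process VL elements here */
      n -= vl;
    }
  return processed;
}
```

The loop terminates and covers exactly n elements for any n, since each iteration makes progress without wrapping the counter.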
> 
> Let me know if I missed something in the V3 patch.
> Thanks.
> 
> ---
>  gcc/cfgloopmanip.cc|   2 +-
>  gcc/doc/md.texi|  34 +++
>  gcc/gimple-loop-interchange.cc |   2 +-
>  gcc/internal-fn.def|   1 +
>  gcc/optabs.def |   1 +
>  gcc/tree-ssa-loop-ivcanon.cc   |   2 +-
>  gcc/tree-ssa-loop-ivopts.cc|   2 +-
>  gcc/tree-ssa-loop-manip.cc |  18 +-
>  gcc/tree-ssa-loop-manip.h  |   4 +-
>  gcc/tree-vect-data-refs.cc |   8 +-
>  gcc/tree-vect-loop-manip.cc| 374 -
>  gcc/tree-vect-loop.cc  |  32 ++-
>  gcc/tree-vect-stmts.cc |  89 +++-
>  gcc/tree-vectorizer.h  |   4 +-
>  14 files changed, 535 insertions(+), 38 deletions(-)
> 
> diff --git a/gcc/cfgloopmanip.cc b/gcc/cfgloopmanip.cc
> index 0e3ad8ed742..6e09dcbb0b1 100644
> --- a/gcc/cfgloopmanip.cc
> +++ b/gcc/cfgloopmanip.cc
> @@ -826,7 +826,7 @@ create_empty_loop_on_edge (edge entry_edge,
>  }
>  
>gsi = gsi_last_bb (loop_header);
> -  create_iv (initial_value, stride, iv, loop, &gsi, false,
> +  create_iv (initial_value, PLUS_EXPR, stride, iv, loop, &gsi, false,
>iv_before, iv_after);
>  
>/* Insert loop exit condition.  */
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index cc4a93a8763..99cf0cdbdca 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -4974,6 +4974,40 @@ for (i = 1; i < operand3; i++)
>operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
>  @end smallexample
>  
> +@cindex @code{select_vl@var{m}} instruction pattern
> +@item @code{select_vl@var{m}}
> +Set operand 0 to the number of active vector elements to be updated in this
> +iteration.
> +Operand 1 is the total number of elements that still need to be updated.
> +Operand 2 is the vectorization factor.
> +The value of operand 0 is target dependent and flexible in each iteration.
> +The operation of this pattern can be:
> +
> +@smallexample
> +Case 1:
> +operand0 = MIN (operand1, operand2);
> +operand2 can be const_poly_int or poly_int related to vector mode size.
> +Some targets, like RISC-V, have a standalone instruction to compute MIN (n,
> +MODE SIZE), so that we can avoid a use of a general purpose register.
> +
> +In this case, only the last iteration of the loop is a partial iteration.
> +@end smallexample
> +
> +@smallexample
> +Case 2:
> +if (operand1 <= operand2)
> +  operand0 = operand1;
> +else if (operand1 < 2 * operand2)
> +  operand0 = IN_RANGE (ceil (operand1 / 2), operand2);
> +else
> +  operand0 = operand2;
> +
> +This case will evenly distribute work over the last 2 iterations of a 
> 

[PATCH] Add a != MIN/MAX_VALUE_CST ? CST-+1 : a to minmax_from_comparison

2023-05-07 Thread Andrew Pinski via Gcc-patches
This patch adds to match the support that was implemented for PR 87913 in
phiopt.
It implements it by adding support to minmax_from_comparison for the check.
It uses the range information if available, which allows producing a MIN/MAX
expression when comparing against the lower/upper bound of the range instead
of the lower/upper bound of the type.

minmax-20.c is the new testcase which tests the ranges part.
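As a quick sanity sketch of the transformation itself (plain C, not GCC internals; function names are hypothetical): once range information proves `a >= 3`, the testcase's `a != 3 ? a : 4` is equivalent to a MAX of `a` and `4`:

```c
#include <assert.h>

/* The pattern before the transformation, as in minmax-20.c.  */
static int before (int a) { return a != 3 ? a : 4; }

/* MAX (a, 4), which phiopt2 should produce when range info proves
   a >= 3 (e.g. via __builtin_unreachable in the testcase).  */
static int after (int a) { return a > 4 ? a : 4; }

/* Brute-force equivalence check over the known range a >= 3.  */
static int
equivalent_on_range (void)
{
  for (int a = 3; a < 1000; a++)
    if (before (a) != after (a))
      return 0;
  return 1;
}
```

At the range's lower bound, `before (3)` selects the constant 4, which is exactly what the MAX form yields; everywhere else both return `a`.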

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* fold-const.cc (minmax_from_comparison): Add support for NE_EXPR.
* match.pd ((cond (cmp (convert1? x) c1) (convert2? x) c2) pattern):
Add ne as a possible cmp.
((a CMP b) ? minmax : minmax pattern): Likewise.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/minmax-20.c: New test.
---
 gcc/fold-const.cc | 26 +++
 gcc/match.pd  |  4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/minmax-20.c | 12 +++
 3 files changed, 40 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/minmax-20.c

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index db54bfc5662..d90671b9975 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -173,6 +173,19 @@ minmax_from_comparison (tree_code cmp, tree exp0, tree 
exp1, tree exp2, tree exp
  /* X > Y - 1 equals to X >= Y.  */
  if (cmp == GT_EXPR)
code = GE_EXPR;
+ /* a != MIN_RANGE ? a : MIN_RANGE+1 -> MAX_EXPR <MIN_RANGE+1, a>  */
+ if (cmp == NE_EXPR && TREE_CODE (exp0) == SSA_NAME)
+   {
+ value_range r;
+ get_range_query (cfun)->range_of_expr (r, exp0);
+ if (r.undefined_p ())
+   r.set_varying (TREE_TYPE (exp0));
+
+ widest_int min = widest_int::from (r.lower_bound (),
+TYPE_SIGN (TREE_TYPE (exp0)));
+ if (min == wi::to_widest (exp1))
+   code = MAX_EXPR;
+   }
}
   if (wi::to_widest (exp1) == (wi::to_widest (exp3) + 1))
{
@@ -182,6 +195,19 @@ minmax_from_comparison (tree_code cmp, tree exp0, tree 
exp1, tree exp2, tree exp
  /* X >= Y + 1 equals to X > Y.  */
  if (cmp == GE_EXPR)
  code = GT_EXPR;
+ /* a != MAX_RANGE ? a : MAX_RANGE-1 -> MIN_EXPR <MAX_RANGE-1, a>  */
+ if (cmp == NE_EXPR && TREE_CODE (exp0) == SSA_NAME)
+   {
+ value_range r;
+ get_range_query (cfun)->range_of_expr (r, exp0);
+ if (r.undefined_p ())
+   r.set_varying (TREE_TYPE (exp0));
+
+ widest_int max = widest_int::from (r.upper_bound (),
+TYPE_SIGN (TREE_TYPE (exp0)));
+ if (max == wi::to_widest (exp1))
+   code = MIN_EXPR;
+   }
}
 }
   if (code != ERROR_MARK
diff --git a/gcc/match.pd b/gcc/match.pd
index a55ede838cd..95f7e9a6abc 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -4751,7 +4751,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 
Case 2)
(cond (eq (convert1? x) c1) (convert2? x) c2) -> (cond (eq x c1) c1 c2).  */
-(for cmp (lt le gt ge eq)
+(for cmp (lt le gt ge eq ne)
  (simplify
   (cond (cmp (convert1? @1) INTEGER_CST@3) (convert2? @1) INTEGER_CST@2)
   (with
@@ -4942,7 +4942,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* Optimize (a CMP b) ? minmax<a, c> : minmax<b, c>
    to minmax<min/max<a, b>, c>  */
 (for minmax (min max)
- (for cmp (lt le gt ge)
+ (for cmp (lt le gt ge ne)
   (simplify
(cond (cmp @1 @3) (minmax:c @1 @4) (minmax:c @2 @4))
(with
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-20.c 
b/gcc/testsuite/gcc.dg/tree-ssa/minmax-20.c
new file mode 100644
index 000..481c375f5f9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-20.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-phiopt2" } */
+
+int f(int num)
+{
+  if (num < 3) __builtin_unreachable();
+  return num != 3 ?  num : 4;
+}
+
+/* In phiopt2 with the range information, this should be turned into
+   a MAX_EXPR.  */
+/* { dg-final { scan-tree-dump-times "MAX_EXPR" 1 "phiopt2" } } */
-- 
2.31.1



[PATCH] RISC-V: Fix ugly && incorrect codes of RVV auto-vectorization

2023-05-07 Thread juzhe . zhong
From: Juzhe-Zhong 

1. Add a movmisalign pattern for the TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT
   target hook; current RISC-V already implements this target hook, but we
   can't claim support without a movmisalign pattern.

2. Remove the global externs of get_mask_policy_no_pred && get_tail_policy_no_pred.
   These 2 functions come from the intrinsic builtin framework.
   We are sure we don't need them in the auto-vectorization implementation.

3. Refine mask mode implementation.

4. We should not have a "riscv_vector_" prefix inside the riscv_vector
   namespace since it makes the code inconsistent and ugly.
   
   For example:
   Before this patch:
   static opt_machine_mode
   riscv_get_mask_mode (machine_mode mode)
   {
 machine_mode mask_mode = VOIDmode;
 if (TARGET_VECTOR && riscv_vector::riscv_vector_get_mask_mode
(mode).exists (&mask_mode))
  return mask_mode;
   ..

   After this patch:
   riscv_get_mask_mode (machine_mode mode)
   {
 machine_mode mask_mode = VOIDmode;
 if (TARGET_VECTOR && riscv_vector::get_mask_mode (mode).exists
(&mask_mode))
  return mask_mode;
   ..

5. Fix failing testcase fixed-vlmax-1.c.

gcc/ChangeLog:

* config/riscv/autovec.md (movmisalign): New pattern.
* config/riscv/riscv-protos.h (riscv_vector_mask_mode_p): Delete.
(riscv_vector_get_mask_mode): Ditto.
(get_mask_policy_no_pred): Ditto.
(get_tail_policy_no_pred): Ditto.
(get_mask_mode): New function.
* config/riscv/riscv-v.cc (get_mask_policy_no_pred): Delete.
(get_tail_policy_no_pred): Ditto.
(riscv_vector_mask_mode_p): Ditto.
(riscv_vector_get_mask_mode): Ditto.
(get_mask_mode): New function.
* config/riscv/riscv-vector-builtins.cc (use_real_merge_p): Remove 
global extern.
(get_tail_policy_for_pred): Ditto.
* config/riscv/riscv-vector-builtins.h (get_tail_policy_for_pred): 
Ditto.
(get_mask_policy_for_pred): Ditto
* config/riscv/riscv.cc (riscv_get_mask_mode): Refine codes.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/fixed-vlmax-1.c: Fix typo.

---
 gcc/config/riscv/autovec.md   | 11 ++
 gcc/config/riscv/riscv-protos.h   |  5 +--
 gcc/config/riscv/riscv-v.cc   | 36 ++-
 gcc/config/riscv/riscv-vector-builtins.cc |  4 +--
 gcc/config/riscv/riscv-vector-builtins.h  |  3 --
 gcc/config/riscv/riscv.cc |  3 +-
 .../riscv/rvv/autovec/fixed-vlmax-1.c |  4 +--
 7 files changed, 19 insertions(+), 47 deletions(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index b5d46ff57ab..f1c5ff5951b 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -47,3 +47,14 @@
  operands[1], operands[2], mode);
   DONE;
 })
+
+(define_expand "movmisalign"
+  [(set (match_operand:V 0 "nonimmediate_operand")
+   (match_operand:V 1 "general_operand"))]
+  "TARGET_VECTOR"
+  {
+/* Equivalent to a normal move for our purposes.  */
+emit_move_insn (operands[0], operands[1]);
+DONE;
+  }
+)
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index d83ea2c77e4..c0293a306f9 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -218,10 +218,7 @@ bool slide1_sew64_helper (int, machine_mode, machine_mode,
 rtx gen_avl_for_scalar_move (rtx);
 void expand_tuple_move (machine_mode, rtx *);
 machine_mode preferred_simd_mode (scalar_mode);
-extern bool riscv_vector_mask_mode_p (machine_mode);
-extern opt_machine_mode riscv_vector_get_mask_mode (machine_mode mode);
-extern rtx get_mask_policy_no_pred (void);
-extern rtx get_tail_policy_no_pred (void);
+opt_machine_mode get_mask_mode (machine_mode);
 }
 
 /* We classify builtin types into two classes:
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 8c7f3206771..7ca49ca67c1 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -43,7 +43,6 @@
 #include "expr.h"
 #include "optabs.h"
 #include "tm-constrs.h"
-#include "riscv-vector-builtins.h"
 #include "rtx-vector-builder.h"
 #include "targhooks.h"
 
@@ -479,43 +478,12 @@ get_avl_type_rtx (enum avl_type type)
   return gen_int_mode (type, Pmode);
 }
 
-/* Return the mask policy for no predication.  */
-rtx
-get_mask_policy_no_pred (void)
-{
-  return get_mask_policy_for_pred (PRED_TYPE_none);
-}
-
-/* Return the tail policy for no predication.  */
-rtx
-get_tail_policy_no_pred (void)
-{
-  return get_tail_policy_for_pred (PRED_TYPE_none);
-}
-
-/* Return true if it is a RVV mask mode.  */
-bool
-riscv_vector_mask_mode_p (machine_mode mode)
-{
-  return (mode == VNx1BImode || mode == VNx2BImode || mode == VNx4BImode
- || mode == VNx8BImode || mode == VNx16BImode || mode == VNx32BImode
- || mode == VNx64BImode);
-}
-
 /* Return the appropriate mask mode for MODE.  */
 
 opt_machine_mode
-riscv_vector_get_mask_mode 

Re: [PATCH 2/2] [RISC-V] support cm.push cm.pop cm.popret in zcmp

2023-05-07 Thread Kito Cheng via Gcc-patches
diff --git a/gcc/config/riscv/zc.md b/gcc/config/riscv/zc.md
new file mode 100644
index 000..1c2f390269e
--- /dev/null
+++ b/gcc/config/riscv/zc.md
@@ -0,0 +1,55 @@
...
+(define_insn "gpr_multi_pop"
+  [(unspec_volatile [(match_operand 0 "const_int_operand")
+ (match_operand 1 "const_int_operand")]
+UNSPECV_GPR_MULTI_POP)]

I would strongly suggest modeling the right memory and register accesses
here correctly instead of using an unspec,
and the same for the other two patterns.

That will help GCC know the semantics of this operation.


Re: [PATCH 1/2] [RISC-V] disable shrink-wrap-separate if zcmp enabled.

2023-05-07 Thread Kito Cheng via Gcc-patches
Shrink-wrapping is already gated by -Os, so I think we don't need to add
another gate here, unless we are trying to force optimize-for-size behavior
when zcmp is present.

On Sat, May 6, 2023 at 4:41 PM Fei Gao  wrote:
>
> zcmp aims to reduce code size, while shrink-wrap-separate prefers
> speed over code size.  So disable shrink-wrap-separate if zcmp is
> enabled, just like save-restore does.
>
> author: Zhangjin Liao liaozhang...@eswincomputing.com
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.cc (riscv_get_separate_components):
> ---
>  gcc/config/riscv/riscv.cc | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 45a63cab9c9..629e5e45cac 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -5729,7 +5729,8 @@ riscv_get_separate_components (void)
>
>   if (riscv_use_save_libcall (&cfun->machine->frame)
>|| cfun->machine->interrupt_handler_p
> -  || !cfun->machine->frame.gp_sp_offset.is_constant ())
> +  || !cfun->machine->frame.gp_sp_offset.is_constant ()
> +  || TARGET_ZCMP)
>  return components;
>
>offset = cfun->machine->frame.gp_sp_offset.to_constant ();
> --
> 2.17.1
>


[PATCH V2] [vect]Enhance NARROW FLOAT_EXPR vectorization by truncating integer to lower precision.

2023-05-07 Thread liuhongt via Gcc-patches
> > @@ -4799,7 +4800,8 @@ vect_create_vectorized_demotion_stmts (vec_info 
> > *vinfo, vec *vec_oprnds,
> >stmt_vec_info stmt_info,
> >vec _dsts,
> >gimple_stmt_iterator *gsi,
> > -  slp_tree slp_node, enum tree_code 
> > code)
> > +  slp_tree slp_node, enum tree_code 
> > code,
> > +  bool last_stmt_p)
>
> Can you please document this new parameter?
>
Changed.

>
> I understand what you are doing, but somehow it looks a bit awkward?
> Maybe we should split the NARROW case into NARROW_SRC and NARROW_DST?
> The case of narrowing the source because we know its range isn't a
> good fit for the flow.
Changed.

Here's updated patch.
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?

Similar to WIDEN FLOAT_EXPR: when the direct optab does not exist, try an
intermediate integer type whenever the gimple ranger can tell it's safe.

I.e.
when there's no direct optab for vector long long -> vector float, but
the value range of the integer can be represented as int, try vector int
-> vector float if available.
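A small C sketch of why the intermediate type is safe (hypothetical helper names, scalar stand-ins for the vector case): whenever the ranger proves the 64-bit value fits in int, converting through int yields the identical float:

```c
#include <assert.h>

/* Direct conversion, standing in for vector long long -> vector float.  */
static float direct (long long k) { return (float) k; }

/* Conversion through the narrower intermediate integer type; exact
   whenever the value range of k fits in int.  */
static float via_int (long long k) { return (float) (int) k; }

/* Equality check for one in-range value.  */
static int same (long long k) { return direct (k) == via_int (k); }
```

For example, values produced by a mask as in the pr108804 testcase always fit in int, so the narrowed conversion loses nothing.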

gcc/ChangeLog:

PR tree-optimization/108804
* tree-vect-patterns.cc (vect_get_range_info): Remove static.
* tree-vect-stmts.cc (vect_create_vectorized_demotion_stmts):
Add new parameter narrow_src_p.
(vectorizable_conversion): Enhance NARROW FLOAT_EXPR
vectorization by truncating to lower precision.
* tree-vectorizer.h (vect_get_range_info): New declare.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr108804.c: New test.
---
 gcc/testsuite/gcc.target/i386/pr108804.c |  15 +++
 gcc/tree-vect-patterns.cc|   2 +-
 gcc/tree-vect-stmts.cc   | 135 +--
 gcc/tree-vectorizer.h|   1 +
 4 files changed, 121 insertions(+), 32 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr108804.c

diff --git a/gcc/testsuite/gcc.target/i386/pr108804.c 
b/gcc/testsuite/gcc.target/i386/pr108804.c
new file mode 100644
index 000..2a43c1e1848
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr108804.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx2 -Ofast -fdump-tree-vect-details" } */
+/* { dg-final { scan-tree-dump-times "vectorized \[1-3] loops" 1 "vect" } } */
+
+typedef unsigned long long uint64_t;
+uint64_t d[512];
+float f[1024];
+
+void foo() {
+for (int i=0; i<512; ++i) {
+uint64_t k = d[i];
+f[i]=(k & 0x3F30);
+}
+}
+
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index a49b0953977..dd546b488a4 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -61,7 +61,7 @@ along with GCC; see the file COPYING3.  If not see
 /* Return true if we have a useful VR_RANGE range for VAR, storing it
in *MIN_VALUE and *MAX_VALUE if so.  Note the range in the dump files.  */
 
-static bool
+bool
 vect_get_range_info (tree var, wide_int *min_value, wide_int *max_value)
 {
   value_range vr;
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 6b7dbfd4a23..3da89a8402d 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -51,6 +51,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "internal-fn.h"
 #include "tree-vector-builder.h"
 #include "vec-perm-indices.h"
+#include "gimple-range.h"
 #include "tree-ssa-loop-niter.h"
 #include "gimple-fold.h"
 #include "regs.h"
@@ -4791,7 +4792,9 @@ vect_gen_widened_results_half (vec_info *vinfo, enum 
tree_code code,
 
 /* Create vectorized demotion statements for vector operands from VEC_OPRNDS.
For multi-step conversions store the resulting vectors and call the function
-   recursively.  */
> +   recursively.  When NARROW_SRC_P is true, there's still a conversion after
> +   narrowing; don't store the vectors in the SLP_NODE or in the vector info of
> +   the scalar statement (or in the STMT_VINFO_RELATED_STMT chain).  */
 
 static void
 vect_create_vectorized_demotion_stmts (vec_info *vinfo, vec *vec_oprnds,
@@ -4799,7 +4802,8 @@ vect_create_vectorized_demotion_stmts (vec_info *vinfo, 
vec *vec_oprnds,
   stmt_vec_info stmt_info,
   vec _dsts,
   gimple_stmt_iterator *gsi,
-  slp_tree slp_node, enum tree_code code)
+  slp_tree slp_node, enum tree_code code,
+  bool narrow_src_p)
 {
   unsigned int i;
   tree vop0, vop1, new_tmp, vec_dest;
@@ -4815,9 +4819,9 @@ vect_create_vectorized_demotion_stmts (vec_info *vinfo, 
vec *vec_oprnds,
   new_tmp = make_ssa_name (vec_dest, new_stmt);
   gimple_assign_set_lhs (new_stmt, new_tmp);
   vect_finish_stmt_generation (vinfo, stmt_info, 

RE: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-05-07 Thread Li, Pan2 via Gcc-patches
Updated the x86 memory-bytes-allocated numbers with the changes below (including
Kito's patch for the tree common part).

rtx_def code 16 => 12 bits.
rtx_def mode 8 => 12 bits.
tree_base code 16 => 12 bits.

Bytes allocated with O2:
-
Benchmark   |  upstream | with the PATCH
-
400.perlbench   | 25286185160   | 25286590847 ~0.0%
401.bzip2   | 1429883731| 1430373103 ~0.0%
403.gcc | 55023568981   | 55027574220 ~0.0%
429.mcf | 1360975660| 1360959361 ~0.0%
445.gobmk   | 12791636502   | 12789648370 ~0.0%
456.hmmer   | 9354433652| 9353899089 ~0.0%
458.sjeng   | 1991260562| 1991107773 ~0.0%
462.libquantum  | 1725112078| 1724972077 ~0.0%
464.h264ref | 8597673515| 8597748172 ~0.0%
471.omnetpp | 37613034778   | 37614346380 ~0.0%
473.astar   | 3817295518| 3817226365 ~0.0%
483.xalancbmk   | 149418776991  | 149405214817 ~0.0%

Bytes allocated with Ofast + funroll-loops:
--
Benchmark   |  upstream | with the PATCH
--
400.perlbench   | 30438407499   | 30568217795 +0.4% 
401.bzip2   | 2277114519| 2318588280 +1.8%
403.gcc | 64499664264   | 64764400606 +0.4%
429.mcf | 1361486758| 1399872438 +2.8%
445.gobmk   | 15258056111   | 15392769408 +0.9%
456.hmmer   | 10896615649   | 10934649010 +0.3%
458.sjeng   | 2592620709| 2641551464 +1.9%
462.libquantum  | 1814487525| 1856446214 +2.3%
464.h264ref | 13528736878   | 13606989269 +0.6%
471.omnetpp | 38721066702   | 38908678658 +0.5%
473.astar   | 3924015756| 3967867190 +1.1%
483.xalancbmk   | 165897692838  | 166818255397 +0.6%

Pan

-Original Message-
From: Li, Pan2 
Sent: Saturday, May 6, 2023 10:20 AM
To: Kito Cheng 
Cc: juzhe.zh...@rivai.ai; rguenther ; richard.sandiford 
; jeffreyalaw ; gcc-patches 
; palmer ; jakub 
Subject: RE: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 
16-bit

Yes, that makes sense, will have a try and keep you posted.

Pan

-Original Message-
From: Kito Cheng 
Sent: Saturday, May 6, 2023 10:19 AM
To: Li, Pan2 
Cc: juzhe.zh...@rivai.ai; rguenther ; richard.sandiford 
; jeffreyalaw ; gcc-patches 
; palmer ; jakub 
Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 
16-bit

I think x86 first?  The major thing we want to make sure of is that this change
won't affect too much those targets that do not really require a 16-bit
machine_mode.


On Sat, May 6, 2023 at 10:12 AM Li, Pan2 via Gcc-patches 
 wrote:
>
> Sure thing, I will pick them all together and trigger the test again (I will
> send out the overall diff before starting, to make sure my understanding is
> correct).  BTW which target do we prefer first?  x86 or RISC-V.
>
> Pan
>
> From: juzhe.zh...@rivai.ai 
> Sent: Saturday, May 6, 2023 10:00 AM
> To: kito.cheng ; Li, Pan2 
> Cc: rguenther ; richard.sandiford 
> ; jeffreyalaw ; 
> gcc-patches ; palmer ; 
> jakub 
> Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 
> 8-bit to 16-bit
>
> Yeah, you should also swap mode and code in rtx_def according to
> Richard's suggestion, since it will not change the rtx_def data structure.
>
> I think the only problem is the mode in tree data structure.
> 
> juzhe.zh...@rivai.ai
>
> From: Kito Cheng
> Date: 2023-05-06 09:53
> To: Li, Pan2
> CC: Richard Biener;
> 钟居哲;
> richard.sandiford; Jeff 
> Law;
> gcc-patches;
> palmer; jakub
> Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 
> 8-bit to 16-bit Hi Pan:
>
> Could you try to apply the following diff and measure again? This 
> makes tree_type_common size unchanged.
>
>
> sizeof tree_type_common = 128 (mode = 8 bit)
> sizeof tree_type_common = 136 (mode = 16 bit)
> sizeof tree_type_common = 128 (mode = 8 bit w/ this diff)
>
> diff --git a/gcc/tree-core.h b/gcc/tree-core.h index
> af795aa81f98..b8ccfa407ed9 100644
> --- a/gcc/tree-core.h
> +++ b/gcc/tree-core.h
> @@ -1680,6 

Re: [PATCH] RISC-V: Optimize vsetvli of LCM INSERTED edge for user vsetvli [PR 109743]

2023-05-07 Thread juzhe.zh...@rivai.ai
Gentle ping this patch.

Is this Ok for trunk? Thanks.


juzhe.zh...@rivai.ai
 
From: juzhe.zhong
Date: 2023-05-06 19:14
To: gcc-patches
CC: kito.cheng; Juzhe-Zhong
Subject: [PATCH] RISC-V: Optimize vsetvli of LCM INSERTED edge for user vsetvli 
[PR 109743]
From: Juzhe-Zhong 
 
This patch is fixing: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109743.
 
This issue happens because we are currently very conservative in the
optimization of user vsetvli.
 
Consider this following case:
 
bb 1:
  vsetvli a5,a4... (demand AVL = a4).
bb 2:
  RVV insn use a5 (demand AVL = a5).
 
LCM will hoist vsetvl of bb 2 into bb 1.
We don't do AVL propagation for this situation since it's complicated: we
would have to analyze the code sequence between the vsetvli in bb 1 and the
RVV insn in bb 2, and they are not necessarily consecutive blocks.
 
This patch does the optimization after LCM: we check the LCM-inserted edge
and eliminate the vsetvli there if it is redundant.  Such an approach is much
simpler and safer.
 
code:
void
foo2 (int32_t *a, int32_t *b, int n)
{
  if (n <= 0)
  return;
  int i = n;
  size_t vl = __riscv_vsetvl_e32m1 (i);
 
  for (; i >= 0; i--)
  {
vint32m1_t v = __riscv_vle32_v_i32m1 (a, vl);
__riscv_vse32_v_i32m1 (b, v, vl);
 
if (i >= vl)
  continue;
 
if (i == 0)
  return;
 
vl = __riscv_vsetvl_e32m1 (i);
  }
}
 
Before this patch:
foo2:
.LFB2:
.cfi_startproc
ble a2,zero,.L1
mv  a4,a2
li  a3,-1
vsetvli a5,a2,e32,m1,ta,mu
vsetvli zero,a5,e32,m1,ta,ma  <- can be eliminated.
.L5:
vle32.v v1,0(a0)
vse32.v v1,0(a1)
bgeua4,a5,.L3
.L10:
beq a2,zero,.L1
vsetvli a5,a4,e32,m1,ta,mu
addia4,a4,-1
vsetvli zero,a5,e32,m1,ta,ma  <- can be eliminated.
vle32.v v1,0(a0)
vse32.v v1,0(a1)
addiw   a2,a2,-1
bltua4,a5,.L10
.L3:
addiw   a2,a2,-1
addia4,a4,-1
bne a2,a3,.L5
.L1:
ret
 
After this patch:
f:
ble a2,zero,.L1
mv  a4,a2
li  a3,-1
vsetvli a5,a2,e32,m1,ta,ma
.L5:
vle32.v v1,0(a0)
vse32.v v1,0(a1)
bgeua4,a5,.L3
.L10:
beq a2,zero,.L1
vsetvli a5,a4,e32,m1,ta,ma
addia4,a4,-1
vle32.v v1,0(a0)
vse32.v v1,0(a1)
addiw   a2,a2,-1
bltua4,a5,.L10
.L3:
addiw   a2,a2,-1
addia4,a4,-1
bne a2,a3,.L5
.L1:
ret
 
PR target/109743
 
gcc/ChangeLog:
 
* config/riscv/riscv-vsetvl.cc (pass_vsetvl::commit_vsetvls): Add 
optimization for LCM inserted edge.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/vsetvl/pr109743-1.c: New test.
* gcc.target/riscv/rvv/vsetvl/pr109743-2.c: New test.
* gcc.target/riscv/rvv/vsetvl/pr109743-3.c: New test.
* gcc.target/riscv/rvv/vsetvl/pr109743-4.c: New test.
 
---
gcc/config/riscv/riscv-vsetvl.cc  | 42 +++
.../gcc.target/riscv/rvv/vsetvl/pr109743-1.c  | 26 
.../gcc.target/riscv/rvv/vsetvl/pr109743-2.c  | 27 
.../gcc.target/riscv/rvv/vsetvl/pr109743-3.c  | 28 +
.../gcc.target/riscv/rvv/vsetvl/pr109743-4.c  | 28 +
5 files changed, 151 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-3.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-4.c
 
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index f55907a410e..fcee7fdf323 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -3834,6 +3834,48 @@ pass_vsetvl::commit_vsetvls (void)
  const vector_insn_info *require
= m_vector_manager->vector_exprs[i];
  gcc_assert (require->valid_or_dirty_p ());
+
+   /* Here we optimize the VSETVL is hoisted by LCM:
+
+ Before LCM:
+bb 1:
+  vsetvli a5,a2,e32,m1,ta,mu
+bb 2:
+  vsetvli zero,a5,e32,m1,ta,mu
+  ...
+
+ After LCM:
+bb 1:
+  vsetvli a5,a2,e32,m1,ta,mu
+  LCM INSERTED: vsetvli zero,a5,e32,m1,ta,mu --> eliminate
+bb 2:
+  ...
+*/
+   const basic_block pred_cfg_bb = eg->src;
+   const auto block_info
+ = m_vector_manager->vector_block_infos[pred_cfg_bb->index];
+   const insn_info *pred_insn = block_info.reaching_out.get_insn ();
+   if (pred_insn && vsetvl_insn_p (pred_insn->rtl ())
+   && require->get_avl_source ()
+   && require->get_avl_source ()->insn ()
+   && require->skip_avl_compatible_p (block_info.reaching_out))
+ {
+   vector_insn_info new_info = *require;
+   new_info.set_avl_info (
+ block_info.reaching_out.get_avl_info ());
+   new_info
+ = block_info.reaching_out.merge (new_info, LOCAL_MERGE);
+   change_vsetvl_insn (pred_insn, 

RE: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-05-07 Thread Li, Pan2 via Gcc-patches
I see. Thank you, will have a try soon.

Pan

-Original Message-
From: Jeff Law  
Sent: Sunday, May 7, 2023 11:24 PM
To: Li, Pan2 ; Kito Cheng 
Cc: juzhe.zh...@rivai.ai; rguenther ; richard.sandiford 
; gcc-patches ; palmer 
; jakub 
Subject: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 
16-bit



On 5/6/23 19:55, Li, Pan2 wrote:
> It looks like we cannot simply swap the code and mode in rtx_def; the code
> may have to occupy the same bits as the tree_code in tree_base, or we will hit
> an ICE like the one below.
> 
> rtx_def code 16 => 8 bits.
> rtx_def mode 8 => 16 bits.
> 
> static inline decl_or_value
> dv_from_value (rtx value)
> {
>decl_or_value dv;
>dv = value;
>gcc_checking_assert (dv_is_value_p (dv));  <=  ICE
>return dv;
Ugh.  We really just need to fix this code.  It assumes particular structure 
layouts and that's just wrong/dumb.

So I think the first step is to fix up this crap code in var-tracking.  
That should be a patch unto itself.  Then we'd have the structure changes as a 
separate change.

Jeff


Re: [patch, fortran] PR109662 Namelist input with comma after name accepted

2023-05-07 Thread Steve Kargl via Gcc-patches
Harald,
Thanks for keeping us honest.  I didn't check what other 
separators might cause a problem.

After 2 decades of working on gfortran, I've come to conclusion
that -std=f2018 should be the default.  When f2023 is ratified,
the default becomes -std=f2023.  All GNU Fortran extensions
should be behind an option, and we should be aggressive about
eliminating extensions.

Yes, this means that 'real*4' and similar would require 
a -fallow-nonstandard-declaration option.

-- 
steve

aarch64: Add scheduling model for Neoverse V1

2023-05-07 Thread Evandro Menezes via Gcc-patches
This patch adds the scheduling model for Neoverse V1, based on the information 
from the “Arm Neoverse V1 Software Optimization Guide” and on static and 
dynamic analysis of internal and public benchmarks.  Results are forthcoming. 

-- 
Evandro Menezes


0001-aarch64-Add-scheduling-model-for-Neoverse-V1.patch
Description: Binary data



[PATCH] MATCH: Move `a <= CST1 ? MAX : a` optimization to match

2023-05-07 Thread Andrew Pinski via Gcc-patches
This moves the `a <= CST1 ? MAX : a` optimization
from phiopt to match. It just adds a new pattern to match.pd.

There is one more change needed before being able to remove
minmax_replacement from phiopt.

A few notes on the testsuite changes:
* phi-opt-5.c is now able to optimize at phiopt1 so remove
the xfail.
* pr66726-4.c can be optimized during fold before phiopt1,
so the scanning needs to change.
* pr66726-5.c currently needs two phiopt passes to optimize
to the right thing (it needed two before as well); the cast
from int to unsigned char is the reason.
* pr66726-6.c is what the original pr66726-4.c was testing
before the fold was able to optimize it.
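The identity behind the new pattern can be brute-force checked in plain C (the constants below are arbitrary; this is just a sanity check sketch, not part of the patch):

```c
#include <assert.h>

/* For constants C3 <= C1, the pattern rewrites
     a <= C1 ? MAX_EXPR <a, C3> : a
   into MAX_EXPR <a, C3>.  Brute-force check of that identity
   over a range of values of a.  */
static int imax (int a, int b) { return a > b ? a : b; }

static int
check_le_max (int c1, int c3)
{
  for (int a = -1000; a <= 1000; a++)
    {
      int before = a <= c1 ? imax (a, c3) : a;   /* the original form */
      int after = imax (a, c3);                  /* the simplified form */
      if (before != after)
        return 0;
    }
  return 1;
}
```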

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* match.pd (`(a CMP CST1) ? max : a`): New
pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/phi-opt-5.c: Remove last xfail.
* gcc.dg/tree-ssa/pr66726-4.c: Change how scanning
works.
* gcc.dg/tree-ssa/pr66726-5.c: New test.
* gcc.dg/tree-ssa/pr66726-6.c: New test.
---
 gcc/match.pd  | 18 +++
 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-5.c |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr66726-4.c |  5 +++-
 gcc/testsuite/gcc.dg/tree-ssa/pr66726-5.c | 28 +++
 gcc/testsuite/gcc.dg/tree-ssa/pr66726-6.c | 17 ++
 5 files changed, 68 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr66726-5.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr66726-6.c

diff --git a/gcc/match.pd b/gcc/match.pd
index ceae1c34abc..a55ede838cd 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -4954,6 +4954,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (if (code == MAX_EXPR)
   (minmax (max @1 @2) @4)))
 
+/* Optimize (a CMP CST1) ? max : a */
+(for cmp(gt  ge  lt  le)
+ minmax (min min max max)
+ (simplify
+  (cond (cmp @0 @1) (minmax:c@2 @0 @3) @4)
+   (with
+{
+  tree_code code = minmax_from_comparison (cmp, @0, @1, @0, @4);
+}
+(if ((cmp == LT_EXPR || cmp == LE_EXPR)
+&& code == MIN_EXPR
+ && integer_nonzerop (fold_build2 (LE_EXPR, boolean_type_node, @3, @1)))
+ (min @2 @4)
+ (if ((cmp == GT_EXPR || cmp == GE_EXPR)
+ && code == MAX_EXPR
+  && integer_nonzerop (fold_build2 (GE_EXPR, boolean_type_node, @3, @1)))
+  (max @2 @4))
+
 /* X != C1 ? -X : C2 simplifies to -X when -C1 == C2.  */
 (simplify
  (cond (ne @0 INTEGER_CST@1) (negate@3 @0) INTEGER_CST@2)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-5.c 
b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-5.c
index 5f78a1ba6dc..e78d9d8b83d 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-5.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-5.c
@@ -39,7 +39,7 @@ float repl2 (float vary)
 
 /* phiopt1 confused by predictors.  */
 /* { dg-final { scan-tree-dump "vary.*MAX_EXPR.*0\\.0" "phiopt1" } } */
-/* { dg-final { scan-tree-dump "vary.*MIN_EXPR.*1\\.0" "phiopt1" { xfail *-*-* 
} } } */
+/* { dg-final { scan-tree-dump "vary.*MIN_EXPR.*1\\.0" "phiopt1" } } */
 /* { dg-final { scan-tree-dump "vary.*MAX_EXPR.*0\\.0" "phiopt2"} } */
 /* { dg-final { scan-tree-dump "vary.*MIN_EXPR.*1\\.0" "phiopt2"} } */
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr66726-4.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr66726-4.c
index 4e43522f3a3..930ad5fb79f 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr66726-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr66726-4.c
@@ -9,4 +9,7 @@ foo (unsigned char *p, int i)
   *p = SAT (i);
 }
 
-/* { dg-final { scan-tree-dump-times "COND_EXPR .*and PHI .*converted to 
straightline code" 1 "phiopt1" } } */
+/* fold could optimize SAT before phiopt1 so only match on the
+   MIN/MAX here.  */
+/* { dg-final { scan-tree-dump-times "= MIN_EXPR" 1 "phiopt1" } } */
+/* { dg-final { scan-tree-dump-times "= MAX_EXPR" 1 "phiopt1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr66726-5.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr66726-5.c
new file mode 100644
index 000..4b5066cdb6b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr66726-5.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-phiopt1-details -fdump-tree-phiopt2-details 
-fdump-tree-optimized" } */
+
+#define SAT(x) (x < 0 ? 0 : (x > 255 ? 255 : x))
+
+unsigned char
+foo (unsigned char *p, int i)
+{
+  if (i < 0)
+return 0;
+  {
+int t;
+if (i > 255)
+  t = 255;
+else
+  t = i;
+return t;
+  }
+}
+
+/* Because of the way PHIOPT works, it only does the merging of BBs after it
+   is done, so we get the case where we can't optimize the above until
+   phiopt2 right now.  */
+/* { dg-final { scan-tree-dump-times "COND_EXPR .*and PHI .*converted to 
straightline code" 2 "phiopt1" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "COND_EXPR .*and PHI .*converted to 
straightline code" 0 "phiopt2" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "= MIN_EXPR" 1 "phiopt1" } } */
+/* { dg-final { 

Re: Re: [PATCH V4] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-07 Thread 钟居哲
>> It's been pretty standard to stick with just PLUS_EXPR for this stuff
>> and instead negate the constant to produce the same effect as
>> MINUS_EXPR.  Is there a reason we're not continuing that practice?
>> Sorry if you've answered this already -- if you have, you can just point
>> me at the prior discussion and I'll read it.

Richard (Sandiford) has answered this question:
https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616745.html 
And as Richard (Biener) said, it will make IVOPTs fail if we have variable
IVs.
However, we have already implemented variable IVs in downstream RVV GCC;
it works fine and I don't see any bad codegen so far.  So I think it may not
be a serious issue for RVV.

Besides, this implementation is not my own idea; it just follows the guidance
of the RVV ISA spec, and is also inspired by the LLVM implementation.
Reference:
Reference:
1). RVV ISA: 
https://github.com/riscv/riscv-v-spec/blob/master/example/vvaddint32.s 
2). LLVM length-control implementation (note that LLVM's "get_vector_length"
 pattern does the same thing as GCC's "select_vl" pattern here):
 https://reviews.llvm.org/D99750 

vvaddint32:
vsetvli t0, a0, e32, ta, ma  # Set vector length based on 32-bit vectors
vle32.v v0, (a1) # Get first vector
  sub a0, a0, t0 # Decrement number done
  slli t0, t0, 2 # Multiply number done by 4 bytes
  add a1, a1, t0 # Bump pointer
vle32.v v1, (a2) # Get second vector
  add a2, a2, t0 # Bump pointer
vadd.vv v2, v0, v1   # Sum vectors
vse32.v v2, (a3) # Store result
  add a3, a3, t0 # Bump pointer
  bnez a0, vvaddint32# Loop back
  ret# Finished

Notice "sub a0, a0, t0": the "t0" is the variable coming from "vsetvli t0, a0,
e32, ta, ma", which is what "get_vector_length" generates in LLVM and,
similarly, what "select_vl" generates in GCC.
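A scalar C sketch of the strip-mined loop above, with select_vl modeled as min (n, VF); the IN_RANGE (0, min (n, vf)) contract allows any value in that range, and min is one valid choice (VF below is an arbitrary illustrative vector length):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of the vvaddint32 strip-mined loop.  */
enum { VF = 4 };

static size_t
select_vl (size_t n, size_t vf)
{
  return n < vf ? n : vf;   /* one valid choice under IN_RANGE (0, min (n, vf)) */
}

static void
vvaddint32 (size_t n, const int32_t *a, const int32_t *b, int32_t *c)
{
  while (n > 0)
    {
      size_t vl = select_vl (n, VF);    /* vsetvli t0, a0, e32, ta, ma */
      for (size_t i = 0; i < vl; i++)   /* vle32.v / vadd.vv / vse32.v */
        c[i] = a[i] + b[i];
      a += vl;                          /* bump pointers */
      b += vl;
      c += vl;
      n -= vl;                          /* decrement IV; never underflows,
                                           since vl <= min (n, VF) <= n */
    }
}
```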

Other comments from Jeff will be addressed in the next patch (V5).  I will wait
for comments from both Richards (Sandiford and Biener, the Loop Vectorizer
experts), then send the V5 patch incorporating all comments from Jeff and
both Richards.

Thanks.


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-05-07 23:19
To: juzhe.zhong; gcc-patches
CC: richard.sandiford; rguenther
Subject: Re: [PATCH V4] VECT: Add decrement IV iteration loop control by 
variable amount support
 
 
On 5/4/23 07:25, juzhe.zh...@rivai.ai wrote:
> From: Ju-Zhe Zhong 
> 
> This patch is fixing V3 patch:
> https://patchwork.sourceware.org/project/gcc/patch/20230407014741.139387-1-juzhe.zh...@rivai.ai/
> 
> Fix issues according to Richard Sandiford && Richard Biener.
> 
> 1. Rename WHILE_LEN pattern into SELECT_VL according to Richard Sandiford.
> 2. Support multiple-rgroup for non-SLP auto-vectorization.
> 
> For vec_pack_trunc pattern (multi-rgroup of non-SLP), we generate the 
> total length:
> 
>   _36 = MIN_EXPR ;
> 
>   First length (MIN (X, VF/N)):
> loop_len_15 = MIN_EXPR <_36, POLY_INT_CST [2, 2]>;
> 
>   Second length (X - MIN (X, 1 * VF/N)):
> loop_len_16 = _36 - loop_len_15;
> 
>   Third length (X - MIN (X, 2 * VF/N)):
> _38 = MIN_EXPR <_36, POLY_INT_CST [4, 4]>;
> loop_len_17 = _36 - _38;
> 
>   Forth length (X - MIN (X, 3 * VF/N)):
> _39 = MIN_EXPR <_36, POLY_INT_CST [6, 6]>;
> loop_len_18 = _36 - _39;
> 
> The reason that I use MIN_EXPR instead of SELECT_VL to calculate total length 
> since using SELECT_VL
> to adapt induction IV consumes more instructions than just using MIN_EXPR. 
> Also, during testing,
> I found it's hard to adjust length correctly according to SELECT_VL.
> 
> So, this patch we only use SELECT_VL for single-rgroup with single length 
> control.
> 
> 3. Fix document of select_vl for Richard Biener (remove mode N).
> 4. Fix comments of vect_set_loop_controls_by_select_vl according to Richard 
> Biener.
> 5. Keep loop_vinfo as first parameter for "vect_get_loop_len".
> 6. make requirement of get_while_len_data_ref_ptr outside, let it to be gated 
> at the caller site.
> 
> More comments from Richard Biener:
>>> So it's not actually saturating.  The saturating operation is done by 
>>> .WHILE_LEN?
> I define the outcome of SELECT_VL (n, vf)  (WHILE_LEN) = IN_RANGE (0, min (n, 
> vf)) will make
> the loop control counter never underflow zero.
> 
>>> I see.  I wonder if it makes sense to leave .WHILE_LEN aside for a start,
>>> the above scheme should also work for single rgroups, no?
>>> As said, it _looks_ like you can progress without .WHILE_LEN and using
>>> .WHILE_LEN is a pure optimization?
> Yes, SELECT_VL (WHILE_LEN) is pure optimization for single-rgroup and allow
> target adjust any length = INRANGE (0, min (n, vf)) each iteration.
> 
> Let me known if I missed something for the V3 patch.
So at a high level this is pretty good.  I think there's some
improvements we should make in the documentation and comments, but I'm
comfortable with most of the implementation details.

[PATCH] [GCC 13] Fix aarch64/109762: push_options/push_options does not work sometimes

2023-05-07 Thread Andrew Pinski via Gcc-patches
aarch64_isa_flags (and aarch64_asm_isa_flags) are both aarch64_feature_flags 
(uint64_t)
but since r12-8000-g14814e20161d, they are saved/restored as unsigned long. This
does not make a difference for LP64 targets but on ILP32 and LLP64IL32 targets,
it means it does not get restored correctly.
This patch changes over to use aarch64_feature_flags instead of unsigned long.
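The failure mode is easy to demonstrate in plain C by simulating an ILP32 `unsigned long` with `uint32_t` (illustrative sketch only; `feature_flags` is a stand-in for aarch64_feature_flags):

```c
#include <assert.h>
#include <stdint.h>

/* Round-tripping 64-bit feature flags through a 32-bit-wide type (what
   "unsigned long" is on ILP32/LLP64IL32 targets) silently drops the high
   bits; a full-width save keeps them.  */
typedef uint64_t feature_flags;   /* stand-in for aarch64_feature_flags */

static feature_flags
save_restore_narrow (feature_flags f)
{
  uint32_t saved = (uint32_t) f;  /* the pre-fix "unsigned long" on ILP32 */
  return saved;                   /* high 32 bits are gone */
}

static feature_flags
save_restore_wide (feature_flags f)
{
  feature_flags saved = f;        /* the fixed full-width save */
  return saved;
}
```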

Committed as obvious to gcc 13 branch after a bootstrap/test.

gcc/ChangeLog:

PR target/109762
* config/aarch64/aarch64-builtins.cc 
(aarch64_simd_switcher::aarch64_simd_switcher):
Change argument type to aarch64_feature_flags.
* config/aarch64/aarch64-protos.h (aarch64_simd_switcher): Change
constructor argument type to aarch64_feature_flags.
Change m_old_asm_isa_flags to be aarch64_feature_flags.

(cherry picked from commit a1a9ce2441df0675540faee8476523164e12578b)
---
 gcc/config/aarch64/aarch64-builtins.cc | 2 +-
 gcc/config/aarch64/aarch64-protos.h| 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
b/gcc/config/aarch64/aarch64-builtins.cc
index cc6b7c01fd1..4ee6cb00f8b 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -1547,7 +1547,7 @@ aarch64_scalar_builtin_type_p (aarch64_simd_type t)
 
 /* Enable AARCH64_FL_* flags EXTRA_FLAGS on top of the base Advanced SIMD
set.  */
-aarch64_simd_switcher::aarch64_simd_switcher (unsigned int extra_flags)
+aarch64_simd_switcher::aarch64_simd_switcher (aarch64_feature_flags 
extra_flags)
   : m_old_asm_isa_flags (aarch64_asm_isa_flags),
 m_old_general_regs_only (TARGET_GENERAL_REGS_ONLY)
 {
diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 63339fa47df..e727e207367 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -733,11 +733,11 @@ const unsigned int AARCH64_BUILTIN_CLASS = (1 << 
AARCH64_BUILTIN_SHIFT) - 1;
 class aarch64_simd_switcher
 {
 public:
-  aarch64_simd_switcher (unsigned int extra_flags = 0);
+  aarch64_simd_switcher (aarch64_feature_flags extra_flags = 0);
   ~aarch64_simd_switcher ();
 
 private:
-  unsigned long m_old_asm_isa_flags;
+  aarch64_feature_flags m_old_asm_isa_flags;
   bool m_old_general_regs_only;
 };
 
-- 
2.31.1



Re: [PATCH v2] MIPS: add speculation_barrier support

2023-05-07 Thread Maciej W. Rozycki
On Sun, 7 May 2023, Jiaxun Yang wrote:

> > Both EHB and J[AL]R.HB are backwards compatible however (except for an 
> > obscure 4Kc J[AL]R.HB erratum I came across once and which may be no 
> > longer relevant), so I think the legacy sequence ought to just return via 
> > JR.HB as well, therefore providing the required semantics with newer 
> > hardware.  If it does trap for 4Kc, then the OS can emulate it (and we can 
> > ignore it for bare metal, deferring to whoever might be interested for a 
> > workaround).
> 
> Hmm, I just checked MIPS-IV manual, it seems like HB bit (bit 10) is defined 
> as
> zero for both JR and JALR.
> 
> Is it actually omitted in implementation?

 What has become the hint field was supposed not to be decoded in earlier 
silicon, which is why such encodings were chosen for JR.HB and JALR.HB.  
I was told it was actually verified across existing silicon at the time 
the instructions were being defined with MIPSr1 (i.e. when the hint field 
was added to the ISA).  The issue with the 4Kc was an unfortunate mistake.

 Many instructions are not fully decoded, causing encoding aliases; MFHI, 
etc., come to mind.

  Maciej


Re: [PATCH v2] MIPS: add speculation_barrier support

2023-05-07 Thread Jiaxun Yang via Gcc-patches



> On 7 May 2023, at 18:34, Maciej W. Rozycki wrote:
> 
> On Wed, 3 May 2023, Jiaxun Yang wrote:
> 
>> Since it’s possible to run an R2- binary on an R2+ processor, we’d better find
>> semantics that eliminate speculation on all processors. While SSNOP behavior
>> on R2+ processors is pretty much undefined, there is no guarantee that an
>> SSNOP sequence can eliminate speculation.
> 
> Not exactly undefined on R2+, SSNOP is still required to single-issue, so 
> it does act as an execution barrier.  Good point otherwise.
> 
> Both EHB and J[AL]R.HB are backwards compatible however (except for an 
> obscure 4Kc J[AL]R.HB erratum I came across once and which may be no 
> longer relevant), so I think the legacy sequence ought to just return via 
> JR.HB as well, therefore providing the required semantics with newer 
> hardware.  If it does trap for 4Kc, then the OS can emulate it (and we can 
> ignore it for bare metal, deferring to whoever might be interested for a 
> workaround).

Hmm, I just checked MIPS-IV manual, it seems like HB bit (bit 10) is defined as
zero for both JR and JALR.

Is it actually omitted in implementation?

Thanks
Jiaxun




Re: [patch, fortran] PR109662 Namelist input with comma after name accepted

2023-05-07 Thread Harald Anlauf via Gcc-patches

Hi Jerry,

I've made a small compiler survey how they behave on namelist read
from an internal unit when:

1.) there is a single input line of the type
"" // testchar // " n = 666/"

2.) the input spans 2 lines split after the testchar

3.) same as 2.) but first line right-adjusted

See attached source code.

Competitors: Intel, NAG, NVidia, gfortran at r14-547 with -std=f2018.

My findings were (last column is iostat, next-to-last is n read or -1):

NAG:

 Compiler version = NAG Fortran Compiler Release 7.1(Hanzomon) Build 7101
 1-line:   > < 666 0
 2-line/left:  > < 666 0
 2-line/right: > < 666 0
 1-line:   >!< -1 187
 2-line/left:  >!< -1 187
 2-line/right: >!< -1 187
 1-line:   >/< -1 187
 2-line/left:  >/< -1 187
 2-line/right: >/< -1 187
 1-line:   >,< -1 187
 2-line/left:  >,< -1 187
 2-line/right: >,< -1 187
 1-line:   >;< -1 187
 2-line/left:  >;< -1 187
 2-line/right: >;< -1 187
 1-line:   tab 666 0
 2-line/left:  tab 666 0
 2-line/right: tab 666 0
 1-line:   lf -1 187
 2-line/left:  lf -1 187
 2-line/right: lf -1 187
 1-line:   ret -1 187
 2-line/left:  ret -1 187
 2-line/right: ret -1 187

My interpretation of this is that NAG treats tab as (white)space,
everything else gives an error.  This is the strictest compiler.

Intel:

 Compiler version = Intel(R) Fortran Intel(R) 64 Compiler Classic for
applications running on Intel(R) 64, Version 2021.9.0 Build 20230302_00
 1-line:   > < 666   0
 2-line/left:  > < 666   0
 2-line/right: > < 666   0
 1-line:   >!<  -1  -1
 2-line/left:  >!< 666   0
 2-line/right: >!< 666   0
 1-line:   >/<  -1   0
 2-line/left:  >/<  -1   0
 2-line/right: >/<  -1   0
 1-line:   >,<  -1  17
 2-line/left:  >,<  -1  17
 2-line/right: >,<  -1  17
 1-line:   >;<  -1  17
 2-line/left:  >;<  -1  17
 2-line/right: >;<  -1  17
 1-line:   tab 666   0
 2-line/left:  tab 666   0
 2-line/right: tab 666   0
 1-line:   lf 666   0
 2-line/left:  lf 666   0
 2-line/right: lf 666   0
 1-line:   ret  -1  17
 2-line/left:  ret  -1  17
 2-line/right: ret  -1  17

Nvidia:

 Compiler version = nvfortran 23.3-0
 1-line:   > <  666   0
 2-line/left:  > <  666   0
 2-line/right: > <  666   0
 1-line:   >!<   -1  -1
 2-line/left:  >!<   -1  -1
 2-line/right: >!<   -1  -1
 1-line:   >/<   -1  -1
 2-line/left:  >/<   -1  -1
 2-line/right: >/<   -1  -1
 1-line:   >,<   -1  -1
 2-line/left:  >,<   -1  -1
 2-line/right: >,<   -1  -1
 1-line:   >;<   -1  -1
 2-line/left:  >;<   -1  -1
 2-line/right: >;<   -1  -1
 1-line:   tab  666   0
 2-line/left:  tab  666   0
 2-line/right: tab  666   0
 1-line:   lf    -1  -1
 2-line/left:  lf   666   0
 2-line/right: lf   666   0
 1-line:   ret  666   0
 2-line/left:  ret  666   0
 2-line/right: ret  666   0

gfortran (see above):

 Compiler version = GCC version 14.0.0 20230506 (experimental)
 1-line:   > < 666   0
 2-line/left:  > < 666   0
 2-line/right: > < 666   0
 1-line:   >!<  -1  -1
 2-line/left:  >!<  -1   0
 2-line/right: >!< 666   0
 1-line:   >/<  -1   0
 2-line/left:  >/<  -1   0
 2-line/right: >/<  -1   0
 1-line:   >,< 666  5010
 2-line/left:  >,< 666  5010
 2-line/right: >,< 666  5010
 1-line:   >;< 666   0
 2-line/left:  >;< 666   0
 2-line/right: >;< 666   0
 1-line:   tab 666   0
 2-line/left:  tab 666   0
 2-line/right: tab 666   0
 1-line:   lf 666   0
 2-line/left:  lf 666   0
 2-line/right: lf 666   0
 1-line:   ret 666   0
 2-line/left:  ret 666   0
 2-line/right: ret 666   0


So there seems to be a consensus that "," and ";" must be rejected,
and tab is accepted (makes real sense), but already the termination
character "/" and comment character "!" are treated differently.
And how do we want to treat lf and ret in internal files with
-std=f20xx?

Cheers,

Re: [PATCH v2] MIPS: add speculation_barrier support

2023-05-07 Thread Maciej W. Rozycki
On Wed, 3 May 2023, Jiaxun Yang wrote:

> Since it’s possible to run an R2- binary on an R2+ processor, we’d better find
> semantics that eliminate speculation on all processors. While SSNOP behavior
> on R2+ processors is pretty much undefined, there is no guarantee that an
> SSNOP sequence can eliminate speculation.

 Not exactly undefined on R2+, SSNOP is still required to single-issue, so 
it does act as an execution barrier.  Good point otherwise.

 Both EHB and J[AL]R.HB are backwards compatible however (except for an 
obscure 4Kc J[AL]R.HB erratum I came across once and which may be no 
longer relevant), so I think the legacy sequence ought to just return via 
JR.HB as well, therefore providing the required semantics with newer 
hardware.  If it does trap for 4Kc, then the OS can emulate it (and we can 
ignore it for bare metal, deferring to whoever might be interested for a 
workaround).

> My proposal is that for R2- CPUs we can do a dummy syscall to act as an
> instruction hazard barrier; since an exception must clear the pipeline, this
> should be true for all known implementations.

 I think the SSNOP approach should be sufficient.

> The most lightweight syscall I know of is to do a MIPS_ATOMIC_SET with
> sysmips.  A dummy variable on the stack should do the trick.  Do let me know
> if there is a better option.

 That would have to be gettimeofday(2) then, the most performance-critical 
one, and also one that does not have side effects.  The real syscall and 
not VSDO emulation of course (there's a reason it's emulated via VSDO, 
which is exactly our reason too).

> I have a vague memory of a discussion finding that an exception does not
> imply a memory barrier, so perhaps we still need a sync preceding that
> syscall.

 There is no claim that I could find in the architecture specification 
saying that taking an exception implies a memory barrier and therefore we 
must conclude it does not.  Likewise executing ERET.

 As I say I think the SSNOP approach should be sufficient, along with 
relying on SYNC emulation.

> > I think there may be no authoritative source of information here, this is 
> > a grey area.  The longest SSNOP sequences I have seen were for the various 
> > Broadcom implementations and counted 7 instructions.  Both the Linux 
> > kernel and the CFE firmware has them.
> 
> Was it for SiByte or BMIPS?

 Both AFAICT.

> > Also we may not be able to fully enforce ordering for the oldest devices 
> > that do not implement SYNC, as this is system-specific, e.g. involving 
> > branching on the CP0 condition with the BC0F instruction, and inventing an 
> > OS interface for that seems unreasonable at this point of history.
> 
> I guess this is not a valid concern for user space applications?
> As per R4000 manual BC0F will issue “Coprocessor unusable exception”
> exception and it’s certain that we have Staus.CU0 = 0 in user space.

 Exactly, which is why an OS service would have to provide the required 
semantics to the userland, and none might be available.  And we probably 
do not care anyway, because I gather this is a security feature to prevent 
certain types of data leaks via a side channel.  I wouldn't expect anyone 
doing any serious security-sensitive processing with legacy MIPS hardware.  
And then there's no speculative execution with all these pieces of legacy 
hardware (R3000, eh?) that have no suitable barriers provided, mostly 
because they are not relevant for them anyway.

 For bare metal we probably do not care about such legacy hardware either 
way.

 Overall I'd say let's do the best we can without bending backwards and 
then rely on people's common sense.

  Maciej


Re: [patch, fortran] PR109662 Namelist input with comma after name accepted

2023-05-07 Thread Jerry D via Gcc-patches

On 5/6/23 11:15 AM, Harald Anlauf via Fortran wrote:

Hi Jerry, Steve,

I think I have to pour a little water into the wine.

The patch fixes the reported issue only for a comma after
the namelist name, but we still accept a few other illegal
characters, e.g. ';', because:

#define is_separator(c) (c == '/' ||  c == ',' || c == '\n' || c == ' ' \
  || c == '\t' || c == '\r' || c == ';' || \
  (dtp->u.p.namelist_mode && c == '!'))

We don't want that in standard conformance mode, or do we?

Cheers,
Harald

On 5/6/23 06:02, Steve Kargl via Gcc-patches wrote:

On Fri, May 05, 2023 at 08:41:48PM -0700, Jerry D via Fortran wrote:
The attached patch adds a check for the invalid comma and emits a runtime
error if -std=f95,f2003,f2018 are specified at compile time.
error if -std=f95,f2003,f2018 are specified at compile time.

Attached patch includes a new test case.

Regression tested on x86_64-linux-gnu.

OK for mainline?



Yes.  Thanks for the fix.  It's been a long time since
I looked at libgfortran code and couldn't quite determine
where to start to fix this.





As I think back, I don't recall ever seeing a semi-colon used after a 
NAMELIST name, so I think we should reject it always.  The other "soft" 
blanks we should allow.


I will make another patch on trunk to reject the semi-colon, and if no 
one objects here I will test and push it.
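For illustration, a conformance-aware separator check might look roughly like this; the names, parameters, and structure here are hypothetical, not the actual libgfortran code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch: split the separator test so that ';' (and '!'
   outside its role as a namelist comment starter) are only accepted as
   GNU extensions, not under -std=f95/f2003/f2018.  */
static bool
is_separator (char c, bool strict_std, bool namelist_mode)
{
  /* Blanks and standard separators.  */
  if (c == ' ' || c == '\t' || c == '\n' || c == '\r'
      || c == '/' || c == ',')
    return true;
  if (strict_std)
    return false;
  /* GNU extensions.  */
  return c == ';' || (namelist_mode && c == '!');
}
```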


Regards,

Jerry



[PATCH] Fix aarch64/109762: push_options/push_options does not work sometimes

2023-05-07 Thread Andrew Pinski via Gcc-patches
aarch64_isa_flags (and aarch64_asm_isa_flags) are both aarch64_feature_flags 
(uint64_t)
but since r12-8000-g14814e20161d, they are saved/restored as unsigned long. This
does not make a difference for LP64 targets but on ILP32 and LLP64IL32 targets,
it means it does not get restored correctly.
This patch changes over to use aarch64_feature_flags instead of unsigned long.

Committed as obvious after a bootstrap/test.

gcc/ChangeLog:

PR target/109762
* config/aarch64/aarch64-builtins.cc 
(aarch64_simd_switcher::aarch64_simd_switcher):
Change argument type to aarch64_feature_flags.
* config/aarch64/aarch64-protos.h (aarch64_simd_switcher): Change
constructor argument type to aarch64_feature_flags.
Change m_old_asm_isa_flags to be aarch64_feature_flags.
---
 gcc/config/aarch64/aarch64-builtins.cc | 2 +-
 gcc/config/aarch64/aarch64-protos.h| 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
b/gcc/config/aarch64/aarch64-builtins.cc
index 94ad364b997..cb6aae3f1fa 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -1547,7 +1547,7 @@ aarch64_scalar_builtin_type_p (aarch64_simd_type t)
 
 /* Enable AARCH64_FL_* flags EXTRA_FLAGS on top of the base Advanced SIMD
set.  */
-aarch64_simd_switcher::aarch64_simd_switcher (unsigned int extra_flags)
+aarch64_simd_switcher::aarch64_simd_switcher (aarch64_feature_flags 
extra_flags)
   : m_old_asm_isa_flags (aarch64_asm_isa_flags),
 m_old_general_regs_only (TARGET_GENERAL_REGS_ONLY)
 {
diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 4904d193647..b138494384b 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -733,11 +733,11 @@ const unsigned int AARCH64_BUILTIN_CLASS = (1 << 
AARCH64_BUILTIN_SHIFT) - 1;
 class aarch64_simd_switcher
 {
 public:
-  aarch64_simd_switcher (unsigned int extra_flags = 0);
+  aarch64_simd_switcher (aarch64_feature_flags extra_flags = 0);
   ~aarch64_simd_switcher ();
 
 private:
-  unsigned long m_old_asm_isa_flags;
+  aarch64_feature_flags m_old_asm_isa_flags;
   bool m_old_general_regs_only;
 };
 
-- 
2.31.1



Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit

2023-05-07 Thread Jeff Law via Gcc-patches




On 5/6/23 19:55, Li, Pan2 wrote:

It looks like we cannot simply swap the code and mode in rtx_def; the code may
have to be the same bits as the tree_code in tree_base, or we will meet an ICE
like the one below.

rtx_def code 16 => 8 bits.
rtx_def mode 8 => 16 bits.

static inline decl_or_value
dv_from_value (rtx value)
{
   decl_or_value dv;
   dv = value;
   gcc_checking_assert (dv_is_value_p (dv));  <=  ICE
   return dv;
}
Ugh.  We really just need to fix this code.  It assumes particular 
structure layouts and that's just wrong/dumb.


So I think think the first step is to fix up this crap code in 
var-tracking.  That should be a patch unto itself.  Then we'd have the 
structure changes as a separate change.
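For illustration, the layout-assumption-free alternative is just an explicit tag; the types below are hypothetical, not GCC's actual decl_or_value:

```c
#include <assert.h>

/* Sketch of a discriminated union with an explicit tag, instead of
   inferring the kind from pointer/structure layout the way the
   var-tracking code above does.  */
enum dv_kind { DV_DECL, DV_VALUE };

typedef struct
{
  enum dv_kind kind;
  void *p;
} decl_or_value;

static decl_or_value
dv_from_value (void *value)
{
  decl_or_value dv = { DV_VALUE, value };   /* tag set explicitly */
  return dv;
}

static int
dv_is_value_p (decl_or_value dv)
{
  return dv.kind == DV_VALUE;               /* no layout assumption needed */
}
```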


Jeff


Re: [x86_64 PATCH] Introduce insvti_highpart define_insn_and_split.

2023-05-07 Thread Uros Bizjak via Gcc-patches
On Sat, May 6, 2023 at 4:00 PM Roger Sayle  wrote:
>
>
> Hi Uros,
> This is a repost/respin of a patch that was conditionally approved:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609470.html
>
> This patch adds a convenient post-reload splitter for setting/updating
> the highpart of a TImode variable, using i386's previously added
> split_double_concat infrastructure.
>
> For the new test case below:
>
> __int128 foo(__int128 x, unsigned long long y)
> {
>   __int128 t = (__int128)y << 64;
>   __int128 r = (x & ~0ull) | t;
>   return r;
> }
>
> mainline GCC with -O2 currently generates:
>
> foo:movq%rdi, %rcx
> xorl%eax, %eax
> xorl%edi, %edi
> orq %rcx, %rax
> orq %rdi, %rdx
> ret
>
> with this patch, GCC instead now generates the much better:
>
> foo:movq%rdi, %rcx
> movq%rcx, %rax
> ret
>
> It turns out that the -m32 equivalent of this testcase already
> avoids using explicit orl/xor instructions, as it gets optimized
> (in combine) by a completely different path.  Given that this idiom
> isn't seen in 32-bit code (so this pattern doesn't match with -m32),
> and also that the shorter 32-bit AND bitmask is represented as a
> CONST_INT rather than a CONST_WIDE_INT, this new define_insn_and_split
> is implemented for just TARGET_64BIT rather than contort a "generic"
> implementation using DWI mode iterators.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  Ok for mainline now that we're back in stage 1?
>
>
> 2023-05-06  Roger Sayle  
>
> gcc/ChangeLog
> * config/i386/i386.md (any_or_plus): Move definition earlier.
> (*insvti_highpart_1): New define_insn_and_split to overwrite
> (insv) the highpart of a TImode register/memory.
>
> gcc/testsuite/ChangeLog
> * gcc.target/i386/insvti_highpart-1.c: New test case.
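For reference, the semantics being matched can be modeled in portable C with an explicit lo/hi pair (an illustrative sketch only, not part of the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Model of the TImode highpart insert: (x & ~0ull) | ((__int128) y << 64)
   keeps the low 64 bits of x and replaces the high 64 bits with y.  */
typedef struct { uint64_t lo, hi; } u128;

static u128
insv_highpart (u128 x, uint64_t y)
{
  u128 r = { x.lo, y };   /* low half preserved, high half overwritten */
  return r;
}
```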

@@ -3479,6 +3479,31 @@
   "mov{b}\t{%h1, %h0|%h0, %h1}"
   [(set_attr "type" "imov")
(set_attr "mode" "QI")])
+
+(define_code_iterator any_or_plus [plus ior xor])

Please add a line of vertical space here.

+(define_insn_and_split "*insvti_highpart_1"

...

+   && CONST_WIDE_INT_P (operands[3])
+   && CONST_WIDE_INT_NUNITS (operands[3]) == 2
+   && CONST_WIDE_INT_ELT (operands[3], 0) == -1
+   && CONST_WIDE_INT_ELT (operands[3], 1) == 0"
+  "#"
+  "&& reload_completed"
+  [(clobber (const_int 0))]

The above RTX is unreachable, but please use

[(const_int 0)]

as is the case with similar patterns.

OK with the above changes.

Thanks,
Uros.


Re: [PATCH V4] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-07 Thread Jeff Law via Gcc-patches




On 5/4/23 07:25, juzhe.zh...@rivai.ai wrote:

From: Ju-Zhe Zhong 

This patch is fixing V3 patch:
https://patchwork.sourceware.org/project/gcc/patch/20230407014741.139387-1-juzhe.zh...@rivai.ai/

Fix issues according to Richard Sandiford && Richard Biener.

1. Rename WHILE_LEN pattern into SELECT_VL according to Richard Sandiford.
2. Support multiple-rgroup for non-SLP auto-vectorization.

For vec_pack_trunc pattern (multi-rgroup of non-SLP), we generate the total 
length:

  _36 = MIN_EXPR ;

  First length (MIN (X, VF/N)):
loop_len_15 = MIN_EXPR <_36, POLY_INT_CST [2, 2]>;

  Second length (X - MIN (X, 1 * VF/N)):
loop_len_16 = _36 - loop_len_15;

  Third length (X - MIN (X, 2 * VF/N)):
_38 = MIN_EXPR <_36, POLY_INT_CST [4, 4]>;
loop_len_17 = _36 - _38;

  Fourth length (X - MIN (X, 3 * VF/N)):
_39 = MIN_EXPR <_36, POLY_INT_CST [6, 6]>;
loop_len_18 = _36 - _39;
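The arithmetic above can be sketched with a scalar model (names and parameters here are illustrative, not the vectorizer code; it reproduces the quoted GIMPLE for N parts of VF/N elements each):

```c
#include <assert.h>

/* Scalar model of the quoted length computation:
   total  = MIN (X, VF)                            -- _36
   len[0] = MIN (total, VF/N)                      -- loop_len_15
   len[k] = total - MIN (total, k * VF/N), k >= 1  -- loop_len_16..18  */
static void
compute_lengths (long x, long vf, long n, long *len)
{
  long total = x < vf ? x : vf;
  long part = vf / n;
  len[0] = total < part ? total : part;
  for (long k = 1; k < n; k++)
    {
      long cap = k * part;
      len[k] = total - (total < cap ? total : cap);
    }
}
```

For example, with VF = 8 and N = 4 (so VF/N = 2, the POLY_INT_CST [2, 2] above) and 5 remaining elements, the four lengths come out as 2, 3, 1 and 0.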

The reason I use MIN_EXPR instead of SELECT_VL to calculate the total length
is that using SELECT_VL to adapt the induction IV consumes more instructions
than just using MIN_EXPR.  Also, during testing, I found it hard to adjust
the length correctly according to SELECT_VL.

So, in this patch we only use SELECT_VL for single-rgroup with single length
control.

3. Fix documentation of select_vl for Richard Biener (remove mode N).
4. Fix comments of vect_set_loop_controls_by_select_vl according to Richard 
Biener.
5. Keep loop_vinfo as first parameter for "vect_get_loop_len".
6. Move the requirement of get_while_len_data_ref_ptr outside, letting it be
gated at the caller site.

More comments from Richard Biener:

So it's not actually saturating.  The saturating operation is done by 
.WHILE_LEN?

I define the outcome of SELECT_VL (n, vf) (WHILE_LEN) to be IN_RANGE (0,
min (n, vf)), which makes the loop control counter never underflow below
zero.


I see.  I wonder if it makes sense to leave .WHILE_LEN aside for a start,
the above scheme should also work for single rgroups, no?
As said, it _looks_ like you can progress without .WHILE_LEN and using
.WHILE_LEN is a pure optimization?

Yes, SELECT_VL (WHILE_LEN) is a pure optimization for single-rgroup and allows
the target to adjust the length to any value in INRANGE (0, min (n, vf)) each
iteration.
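That contract — SELECT_VL may return any value in [0, min (n, vf)], and the decrement IV therefore never underflows — can be sketched with a scalar stand-in (a hypothetical model, not the vectorizer code; a real target may return any legal value in the range):

```c
#include <assert.h>

/* Scalar stand-in for SELECT_VL: here simply min (remaining, vf),
   the largest value the contract allows.  */
static long
select_vl (long remaining, long vf)
{
  return remaining < vf ? remaining : vf;
}

/* Decrement-IV loop shape: the IV counts remaining elements down to 0
   and can never underflow, because each step subtracts at most
   min (remaining, vf).  */
static long
count_iterations (long n, long vf)
{
  long iters = 0;
  for (long remaining = n; remaining > 0; )
    {
      long vl = select_vl (remaining, vf);
      /* ... process VL elements ... */
      remaining -= vl;
      iters++;
      assert (remaining >= 0);  /* the counter never underflows */
    }
  return iters;
}
```

Any smaller return value from select_vl (down to the point of forward progress) would still terminate, which is exactly the latitude the optimization gives the target.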

Let me know if I missed something in the V3 patch.

So at a high level this is pretty good.  I think there are some 
improvements we should make in the documentation and comments, but I'm 
comfortable with most of the implementation details.






diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index cc4a93a8763..99cf0cdbdca 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -4974,6 +4974,40 @@ for (i = 1; i < operand3; i++)
operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
  @end smallexample
  
+@cindex @code{select_vl@var{m}} instruction pattern

+@item @code{select_vl@var{m}}
+Set operand 0 to the number of active elements in vector will be updated value.

This reads rather poorly.  Is this still accurate?

Set operand 0 to the number of active elements in a vector to be updated 
in a loop iteration based on the total number of elements to be updated, 
the vectorization factor and vector properties of the target.




+operand 1 is the total elements need to be updated value.

operand 1 is the total elements in the vector to be updated.



+
+The output of this pattern is not only used as IV of loop control counter, but 
also
+is used as the IV of address calculation with multiply/shift operation. This 
allow
+us dynamic adjust the number of elements is processed in each iteration of the 
loop.
This allows dynamic adjustment of the number of elements processed each 
loop iteration. -- is that still accurate and does it read better?




@@ -47,7 +47,9 @@ along with GCC; see the file COPYING3.  If not see
 so that we can free them all at once.  */
  static bitmap_obstack loop_renamer_obstack;
  
-/* Creates an induction variable with value BASE + STEP * iteration in LOOP.

+/* Creates an induction variable with value BASE (+/-) STEP * iteration in 
LOOP.
+   If CODE is PLUS_EXPR, the induction variable is BASE + STEP * iteration.
+   If CODE is MINUS_EXPR, the induction variable is BASE - STEP * iteration.
 It is expected that neither BASE nor STEP are shared with other expressions
 (unless the sharing rules allow this).  Use VAR as a base var_decl for it
 (if NULL, a new temporary will be created).  The increment will occur at
It's been pretty standard to stick with just PLUS_EXPR for this stuff 
and instead negate the constant to produce the same effect as 
MINUS_EXPR.  Is there a reason we're not continuing that practice? 
Sorry if you've answered this already -- if you have, you can just point 
me at the prior discussion and I'll read it.
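The equivalence behind that question — a MINUS_EXPR IV is a PLUS_EXPR IV with a negated step, exactly so for unsigned (modulo) arithmetic — can be shown in two lines. This is a sketch of the identity only; negating a variable or poly_int step inside the vectorizer may be less convenient than negating a constant, which is presumably the motivation for the new parameter:

```c
/* BASE - STEP * i  ==  BASE + (-STEP) * i  under modulo arithmetic.  */
static unsigned long
iv_minus (unsigned long base, unsigned long step, unsigned long i)
{
  return base - step * i;
}

static unsigned long
iv_plus_neg (unsigned long base, unsigned long step, unsigned long i)
{
  /* 0UL - step is the two's-complement negation of the step.  */
  return base + (0UL - step) * i;
}
```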





diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 44bd5f2c805..d63ded5d4f0 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -385,6 +385,48 @@ 

Re: [PATCH V3] rs6000: Load high and low part of 64bit constant independently

2023-05-07 Thread guojiufu via Gcc-patches

Hi,

On 2023-04-26 17:35, Kewen.Lin wrote:

Hi Jeff,

on 2023/1/4 14:51, Jiufu Guo wrote:

Hi,

Compare with previous version, this patch updates the comments only.
https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608293.html

For a complicated 64-bit constant, below is one instruction sequence to
build it:
lis 9,0x800a
ori 9,9,0xabcd
sldi 9,9,32
oris 9,9,0xc167
ori 9,9,0xfa16

while we can also use below sequence to build:
lis 9,0xc167
lis 10,0x800a
ori 9,9,0xfa16
ori 10,10,0xabcd
rldimi 9,10,32,0
This sequence uses 2 registers to build the high and low parts first,
and then merges them.
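That the two sequences produce the same value can be checked with a small host-side model of the instructions involved (a sketch only; `lis`, `ori`, `oris`, `sldi` and `rldimi` are modeled just as far as this example needs):

```c
#include <assert.h>
#include <stdint.h>

/* lis: load 16-bit immediate shifted left 16, sign-extended to 64 bits.  */
static uint64_t
lis (uint16_t imm)
{
  uint64_t r = (uint64_t) imm << 16;
  return (imm & 0x8000) ? r | 0xffffffff00000000ull : r;
}

static uint64_t ori  (uint64_t r, uint16_t imm) { return r | imm; }
static uint64_t oris (uint64_t r, uint16_t imm) { return r | ((uint64_t) imm << 16); }
static uint64_t sldi (uint64_t r, int sh)       { return r << sh; }

/* rldimi ra,rs,32,0: rotate RS left 32 and insert it as the high
   32 bits of RA, keeping RA's low 32 bits.  */
static uint64_t
rldimi_32_0 (uint64_t ra, uint64_t rs)
{
  return (rs << 32) | (ra & 0xffffffffu);
}

static uint64_t
build_serial (void)             /* lis/ori/sldi/oris/ori on one register */
{
  uint64_t r9 = lis (0x800a);
  r9 = ori (r9, 0xabcd);
  r9 = sldi (r9, 32);
  r9 = oris (r9, 0xc167);
  return ori (r9, 0xfa16);
}

static uint64_t
build_parallel (void)           /* two independent halves, merged by rldimi */
{
  uint64_t r9 = lis (0xc167), r10 = lis (0x800a);
  r9 = ori (r9, 0xfa16);
  r10 = ori (r10, 0xabcd);
  return rldimi_32_0 (r9, r10);
}
```

Both build 0x800aabcdc167fa16; the parallel version's first four instructions form two independent dependency chains, which is where the expected speedup comes from.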

In terms of parallelism, this sequence would be faster.  (Of course, it uses
one more register, with potential register pressure.)

The two-register parallel instruction sequence is generated only if
can_create_pseudo_p.  Otherwise, the one-register sequence is generated.

Bootstrap and regtest pass on ppc64{,le}.
Is this ok for trunk?


OK for trunk, thanks for the improvement!


Thanks! Committed via r14-555-gb05b529125fa51.

BR,
Jeff (Jiufu)



BR,
Kewen






gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Generate
more parallel code if can_create_pseudo_p.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/parall_5insn_const.c: New test.



[pushed] wwwdocs: onlinedocs: Fix markup

2023-05-07 Thread Gerald Pfeifer
Pushed.

Gerald

---

Commit 7c0b3155efaecf8c9bfa5192ca99acc7356bec71 for GCC 13.1 "stole" an
opening  from the existing GCC 13.2 entry.
---
 htdocs/onlinedocs/index.html | 1 +
 1 file changed, 1 insertion(+)

diff --git a/htdocs/onlinedocs/index.html b/htdocs/onlinedocs/index.html
index 421d1d0b..e72f5bfa 100644
--- a/htdocs/onlinedocs/index.html
+++ b/htdocs/onlinedocs/index.html
@@ -111,6 +111,7 @@
   
 
 
+
  GCC 12.2 manuals:
   
 https://gcc.gnu.org/onlinedocs/gcc-12.2.0/gcc/;>GCC
-- 
2.40.1


New Russian PO file for 'cpplib' (version 13.1-b20230212)

2023-05-07 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'cpplib' has been submitted
by the Russian team of translators.  The file is available at:

https://translationproject.org/latest/cpplib/ru.po

(This file, 'cpplib-13.1-b20230212.ru.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/cpplib/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/cpplib.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Contents of PO file 'cpplib-13.1-b20230212.ru.po'

2023-05-07 Thread Translation Project Robot


cpplib-13.1-b20230212.ru.po.gz
Description: Binary data
The Translation Project robot, in the
name of your translation coordinator.