Re: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V

2023-05-30 Thread juzhe.zh...@rivai.ai
Ok.  LGTM as long as you change the patch as I suggested.

Thanks.


juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-05-30 14:51
To: juzhe.zh...@rivai.ai
CC: gcc-patches; palmer; kito.cheng; jeffreyalaw; Robin Dapp; pan2.li
Subject: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V
> >> /* Return true if MODE is a true VLS mode.  */
> >> bool
> >> vls_mode_p (machine_mode mode)
> >> {
> >>   switch (mode)
> >> {
> >> case E_V4SImode:
> >> case E_V2DImode:
> >> case E_V8HImode:
> >> case E_V16QImode:
> >>   return true;
> >> default:
> >>   return false;
> >> }
> >> }
>
> To be consistent, you should put these into riscv-vector-switch.def.
> It can make the function easier to extend; change it like this:
> change the name to riscv_v_ext_vls_mode_p
>
> bool
> riscv_v_ext_vls_mode_p (machine_mode mode)
> {
> #define VLS_ENTRY(MODE, REQUIREMENT, ...)                                   \
>   case MODE##mode:                                                          \
>     return REQUIREMENT;
>   switch (mode)
>     {
> #include "riscv-vector-switch.def"
>     default:
>       return false;
>     }
>   return false;
> }
>
> Then in riscv-vector-switch.def
> VLS_ENTRY (V4SI...
> VLS_ENTRY (V2DI..
> ...
> In the future, we extend more VLS modes in riscv-vector-switch.def
 
Good point, we should make this more consistent :)
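 
For illustration, the riscv-vector-switch.def entries could then look
something like this (a hypothetical sketch; the exact VLS_ENTRY arguments
and REQUIREMENT predicates are assumptions, not the final patch):

/* Hypothetical VLS entries; each REQUIREMENT below is a placeholder for
   whatever predicate the mode really needs (e.g. an ELEN check).  */
VLS_ENTRY (V16QI, TARGET_VECTOR)
VLS_ENTRY (V8HI, TARGET_VECTOR)
VLS_ENTRY (V4SI, TARGET_VECTOR)
VLS_ENTRY (V2DI, TARGET_VECTOR)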
 
> >> (define_insn_and_split "<optab><mode>3"
> >>   [(set (match_operand:VLS 0 "register_operand" "=vr")
> >>     (any_int_binop_no_shift:VLS
> >>       (match_operand:VLS 1 "register_operand" "vr")
> >>       (match_operand:VLS 2 "register_operand" "vr")))]
> >>   "TARGET_VECTOR"
> >>   "#"
> >>   "reload_completed"
> >>   [(const_int 0)]
> >> {
> >>   machine_mode vla_mode = riscv_vector::minimal_vla_mode (<MODE>mode);
> >>   riscv_vector::vls_insn_expander (
> >>     code_for_pred (<CODE>, vla_mode), riscv_vector::RVV_BINOP,
> >>     operands, <MODE>mode, vla_mode);
> >>   DONE;
> >> })
>
> This pattern can work for the current VLS modes so far since they are within
> 0~31, but if we add more VLS modes such as V32QImode or V64QImode,
> it can't work.  I am OK with this, but I should remind you early.
 
Yeah, I know the problem; my thought is we will have another set of
VLS patterns for those with NUNITS >= 32, and require one clobber with a GPR.
 
> Add tests with -march=rv64gcv_zvl256b to see whether your testcase can
> generate an LMUL = mf2 vsetvli,
>
> and -march=rv64gcv_zvl2048 to make sure your testcase will not go into the
> VLS modes (2048 * 1 / 8 > 128).
 
I guess I should make a loop to test those combinations instead of
separate files with different options.
 
>
>
> For the VSETVL part, I didn't see you define the sew/vlmul/.../ratio
> attributes for VLS modes.
>
> I wonder how these VLS modes emit the correct VSETVL?
 
That's the magic I made here: I split the pattern after RA, but before
vsetvli, and convert all operands to VLA modes and use the VLA patterns, so
that we don't need to modify any line of the vsetvli stuff.
 


Re: [PATCH 1/2] ipa-cp: Avoid long linear searches through DECL_ARGUMENTS

2023-05-30 Thread Richard Biener via Gcc-patches
On Mon, May 29, 2023 at 6:20 PM Martin Jambor  wrote:
>
> Hi,
>
> there have been concerns that linear searches through DECL_ARGUMENTS
> that are often necessary to compute the index of a particular
> PARM_DECL which is the key to results of IPA-CP can happen often
> enough to be a compile time issue, especially if we plug the results
> into value numbering, as I intend to do with a follow-up patch.
>
> This patch creates a hash map to do the look-up for all functions
> which have some information discovered by IPA-CP and which have 32
> parameters or more.  32 is a hard-wired magical constant here to
> capture the trade-off between the memory allocation overhead and
> length of the linear search.  I do not think it is worth making it a
> --param but if people think it appropriate, I can turn it into one.

Since ipcp_transformation is short-lived (is it?), is it worth the trouble?
Comments below ...

> Bootstrapped, tested and LTO bootstrapped on x86_64-linux, both as-is
> and with the magical constant dropped to 4 so that the hash lookup path
> is also well exercised.  OK for master?
>
> Thanks,
>
> Martin
>
>
> gcc/ChangeLog:
>
> 2023-05-26  Martin Jambor  
>
> * ipa-prop.h (struct ipcp_transformation): Rearrange members
> according to C++ class coding convention, add m_tree_to_idx,
> get_param_index and maybe_create_parm_idx_map.
> * ipa-cp.cc (ipcp_transformation::get_param_index): New function.
> (ipcp_transformation::maybe_create_parm_idx_map): Likewise.
> * ipa-prop.cc (ipcp_get_parm_bits): Use get_param_index.
> (ipcp_update_bits): Accept TS as a parameter, assume it is not NULL.
> (ipcp_update_vr): Likewise.
> (ipcp_transform_function): Call maybe_create_parm_idx_map of TS, bail
> out quickly if empty, pass it to ipcp_update_bits and ipcp_update_vr.
> ---
>  gcc/ipa-cp.cc   | 45 +
>  gcc/ipa-prop.cc | 44 +++-
>  gcc/ipa-prop.h  | 33 +
>  3 files changed, 89 insertions(+), 33 deletions(-)
>
> diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
> index 0f37bb5e336..9f8b07b2398 100644
> --- a/gcc/ipa-cp.cc
> +++ b/gcc/ipa-cp.cc
> @@ -6761,3 +6761,48 @@ ipa_cp_cc_finalize (void)
>orig_overall_size = 0;
>ipcp_free_transformation_sum ();
>  }
> +
> +/* Given PARAM which must be a parameter of function FNDECL described by THIS,
> +   return its index in the DECL_ARGUMENTS chain, using a pre-computed hash map
> +   if available (which is pre-computed only if there are many parameters).  Can
> +   return -1 if param is a static chain not represented among DECL_ARGUMENTS.  */
> +
> +int
> +ipcp_transformation::get_param_index (const_tree fndecl, const_tree param) const
> +{
> +  gcc_assert (TREE_CODE (param) == PARM_DECL);
> +  if (m_tree_to_idx)
> +{
> +  unsigned *pr = m_tree_to_idx->get(param);
> +  if (!pr)
> +   {
> + gcc_assert (DECL_STATIC_CHAIN (fndecl));
> + return -1;
> +   }
> +  return (int) *pr;
> +}
> +
> +  unsigned index = 0;
> +  for (tree p = DECL_ARGUMENTS (fndecl); p; p = DECL_CHAIN (p), index++)
> +if (p == param)
> +  return (int) index;
> +
> +  gcc_assert (DECL_STATIC_CHAIN (fndecl));
> +  return -1;
> +}
> +
> +/* Assuming THIS describes FNDECL and it has sufficiently many parameters to
> +   justify the overhead, create a hash map from parameter trees to their
> +   indices.  */
> +void
> +ipcp_transformation::maybe_create_parm_idx_map (tree fndecl)
> +{
> +  int c = count_formal_params (fndecl);
> +  if (c < 32)
> +return;
> +
> +  m_tree_to_idx = hash_map<tree, unsigned>::create_ggc (c);
> +  unsigned index = 0;
> +  for (tree p = DECL_ARGUMENTS (fndecl); p; p = DECL_CHAIN (p), index++)
> +m_tree_to_idx->put (p, index);

I think allocating the hash-map with 'c' for some numbers (depending on the
"prime" chosen) will necessarily cause re-allocation of the hash since we
keep a load factor of at most 3/4 upon insertion.

But - I wonder if a UID sorted array isn't a very much better data
structure for this?
That is, a vec<std::pair<unsigned, unsigned>>?
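 
To illustrate the suggested alternative, here is a minimal standalone
sketch using the C++ standard library instead of GCC's vec/tree types
(uid_index_map and lookup_param_index are hypothetical names):

#include <algorithm>
#include <utility>
#include <vector>

/* Pairs of (DECL_UID, parameter index), kept sorted by UID, so a lookup
   is a binary search over a compact array instead of a hash probe.  */
typedef std::vector<std::pair<unsigned, unsigned> > uid_index_map;

/* Return the parameter index recorded for UID, or -1 if absent.  */
static int
lookup_param_index (const uid_index_map &map, unsigned uid)
{
  std::pair<unsigned, unsigned> key (uid, 0u);
  uid_index_map::const_iterator it
    = std::lower_bound (map.begin (), map.end (), key);
  if (it != map.end () && it->first == uid)
    return (int) it->second;
  return -1;
}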

> +}
> diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
> index ab6de9f10da..f0976e363f7 100644
> --- a/gcc/ipa-prop.cc
> +++ b/gcc/ipa-prop.cc
> @@ -5776,16 +5776,9 @@ ipcp_get_parm_bits (tree parm, tree *value, widest_int 
> *mask)
>if (!ts || vec_safe_length (ts->bits) == 0)
>  return false;
>
> -  int i = 0;
> -  for (tree p = DECL_ARGUMENTS (current_function_decl);
> -   p != parm; p = DECL_CHAIN (p))
> -{
> -  i++;
> -  /* Ignore static chain.  */
> -  if (!p)
> -   return false;
> -}
> -
> +  int i = ts->get_param_index (current_function_decl, parm);
> +  if (i < 0)
> +return false;
>clone_info *cinfo = clone_info::get (cnode);
>if (cinfo && cinfo->param_adjustments)
>  {
> @@ -5802,16 +5795,12 @@ ipcp_get_parm_bits (tree parm, tree *value, 
> widest_int *mask)
>

Re: [PATCH v1] tree-ssa-sink: Improve code sinking pass.

2023-05-30 Thread Richard Biener via Gcc-patches
On Tue, May 30, 2023 at 7:06 AM Ajit Agarwal  wrote:
>
> Hello Richard:
>
> On 22/05/23 6:26 pm, Richard Biener wrote:
> > On Thu, May 18, 2023 at 9:14 AM Ajit Agarwal  wrote:
> >>
> >> Hello All:
> >>
> >> This patch improves the code sinking pass to sink statements before a
> >> call to reduce register pressure.
> >> Review comments are incorporated.
> >>
> >> Bootstrapped and regtested on powerpc64-linux-gnu.
> >>
> >> Thanks & Regards
> >> Ajit
> >>
> >>
> >> tree-ssa-sink: Improve code sinking pass.
> >>
> >> Code sinking currently sinks statements to blocks after a call. This
> >> increases register pressure for callee-saved registers. This patch
> >> improves code sinking by sinking statements before the call, into the
> >> use blocks or the immediate dominator of the use blocks.
> >>
> >> 2023-05-18  Ajit Kumar Agarwal  
> >>
> >> gcc/ChangeLog:
> >>
> >> * tree-ssa-sink.cc (statement_sink_location): Modified to
> >> move statements before calls.
> >> (block_call_p): New function.
> >> (def_use_same_block): New function.
> >> (select_best_block): Add heuristics to select the best
> >> blocks in the immediate post dominator.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >> * gcc.dg/tree-ssa/ssa-sink-20.c: New testcase.
> >> * gcc.dg/tree-ssa/ssa-sink-21.c: New testcase.
> >> ---
> >>  gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c |  16 ++
> >>  gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c |  20 +++
> >>  gcc/tree-ssa-sink.cc| 159 ++--
> >>  3 files changed, 185 insertions(+), 10 deletions(-)
> >>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c
> >>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
> >>
> >> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c 
> >> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c
> >> new file mode 100644
> >> index 000..716bc1f9257
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c
> >> @@ -0,0 +1,16 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-O2 -fdump-tree-sink -fdump-tree-optimized -fdump-tree-sink-stats" } */
> >> +
> >> +void bar();
> >> +int j;
> >> +void foo(int a, int b, int c, int d, int e, int f)
> >> +{
> >> +  int l;
> >> +  l = a + b + c + d + e + f;
> >> +  if (a != 5)
> >> +{
> >> +  bar();
> >> +  j = l;
> >> +}
> >> +}
> >> +/* { dg-final { scan-tree-dump-times "Sunk statements: 5" 1 "sink" } } */
> >
> > this doesn't verify the place we sink to?
> >
>
> I am not sure how to verify the place we sink to with dg-final.

I think dejagnu supports matching multi-line regexps, so I suggest
scanning for the sunk expr RHS followed by the call.
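 
For instance, something along these lines (a hypothetical, untested
directive: scan-tree-dump matches the regexp against the whole dump file,
so the pattern may span lines; the exact SSA names and dump syntax here
are guesses):

/* { dg-final { scan-tree-dump "l_\[0-9\]+ = \[^\n\]*;\n\[^\n\]*bar \\(\\);" "sink" } } */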

> >> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c 
> >> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
> >> new file mode 100644
> >> index 000..ff41e2ea8ae
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
> >> @@ -0,0 +1,20 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-O2 -fdump-tree-sink-stats -fdump-tree-sink-stats" } */
> >> +
> >> +void bar();
> >> +int j, x;
> >> +void foo(int a, int b, int c, int d, int e, int f)
> >> +{
> >> +  int l;
> >> +  l = a + b + c + d + e + f;
> >> +  if (a != 5)
> >> +{
> >> +  bar();
> >> +  if (b != 3)
> >> +x = 3;
> >> +  else
> >> +x = 5;
> >> +  j = l;
> >> +}
> >> +}
> >> +/* { dg-final { scan-tree-dump-times "Sunk statements: 5" 1 "sink" } } */
> >
> > likewise.  So both tests already pass before the patch?
> >
> >> diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
> >> index 87b1d40c174..76556e7795b 100644
> >> --- a/gcc/tree-ssa-sink.cc
> >> +++ b/gcc/tree-ssa-sink.cc
> >> @@ -171,6 +171,72 @@ nearest_common_dominator_of_uses (def_operand_p 
> >> def_p, bool *debug_stmts)
> >>return commondom;
> >>  }
> >>
> >> +/* Return TRUE if immediate uses of the defs in
> >> +   USE occur in the same block as USE, FALSE otherwise.  */
> >> +
> >> +bool
> >> +def_use_same_block (gimple *stmt)
> >> +{
> >> +  use_operand_p use_p;
> >> +  def_operand_p def_p;
> >> +  imm_use_iterator imm_iter;
> >> +  ssa_op_iter iter;
> >> +
> >> +  FOR_EACH_SSA_DEF_OPERAND (def_p, stmt, iter, SSA_OP_DEF)
> >> +{
> >> +  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, DEF_FROM_PTR (def_p))
> >> +   {
> >> + if (is_gimple_debug (USE_STMT (use_p)))
> >> +   continue;
> >> +
> >> + if (use_p
> >
> > use_p is never null
> >
> >> + && (gimple_bb (USE_STMT (use_p)) == gimple_bb (stmt)))
> >> +   return true;
> >
> > the function behavior is obviously odd ...
> >
> >> +   }
> >> + }
> >> +  return false;
> >> +}
> >> +
> >> +/* Return TRUE if the block has only calls, FALSE otherwise. */
> >> +
> >> +bool
> >> +block_call_p (basic_block bb)
> >> +{
> >> +  int i = 0;
> >> +  bool is_call = false;
> >> +  gimple_stmt_iterator gsi = gsi_last_bb (bb);
> >> +  gimple *last_stmt = gsi_stmt (gsi);
> >> +
> >> +  if (last_stmt && gimp

Re: [PATCH] [libstdc++] [testsuite] xfail double-prec from_chars for x86_64 ldbl

2023-05-30 Thread Jonathan Wakely via Gcc-patches
On Tue, 30 May 2023, 05:35 Alexandre Oliva via Libstdc++, <libstd...@gcc.gnu.org> wrote:

>
> When long double is wider than double, but from_chars is implemented
> in terms of double, tests that involve the full precision of long
> double are expected to fail.  Mark them as such on x86_64-*-vxworks*.
>

OK for trunk/13/12



> Tested on x86_64-vxworks7r2 with gcc-12.  Ok to install?
>
>
> for  libstdc++-v3/ChangeLog
>
> * testsuite/20_util/from_chars/4.cc: Skip long double test06
> on x86_64-vxworks.
> * testsuite/20_util/to_chars/long_double.cc: Xfail run on
> x86_64-vxworks.
> ---
>  libstdc++-v3/testsuite/20_util/from_chars/4.cc |2 +-
>  .../testsuite/20_util/to_chars/long_double.cc  |2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/libstdc++-v3/testsuite/20_util/from_chars/4.cc
> b/libstdc++-v3/testsuite/20_util/from_chars/4.cc
> index c3594f9014bd3..63a32b511be4e 100644
> --- a/libstdc++-v3/testsuite/20_util/from_chars/4.cc
> +++ b/libstdc++-v3/testsuite/20_util/from_chars/4.cc
> @@ -18,7 +18,7 @@
>  //  is supported in C++14 as a GNU extension
>  // { dg-do run { target c++14 } }
>  // { dg-add-options ieee }
> -// { dg-additional-options "-DSKIP_LONG_DOUBLE" { target aarch64-*-vxworks* } }
> +// { dg-additional-options "-DSKIP_LONG_DOUBLE" { target aarch64-*-vxworks* x86_64-*-vxworks* } }
>
>  #include 
>  #include 
> diff --git a/libstdc++-v3/testsuite/20_util/to_chars/long_double.cc
> b/libstdc++-v3/testsuite/20_util/to_chars/long_double.cc
> index 08363d9d04003..df02dff935f40 100644
> --- a/libstdc++-v3/testsuite/20_util/to_chars/long_double.cc
> +++ b/libstdc++-v3/testsuite/20_util/to_chars/long_double.cc
> @@ -36,7 +36,7 @@
>
>  // On systems that use double-precision from_chars for long double,
>  // this is expected to fail.
> -// { dg-xfail-run-if "from_chars limited to double-precision" { aarch64-*-vxworks* i*86-*-vxworks* } }
> +// { dg-xfail-run-if "from_chars limited to double-precision" { aarch64-*-vxworks* i*86-*-vxworks* x86_64-*-vxworks* } }
>
>  // { dg-require-effective-target ieee_floats }
>  // { dg-require-effective-target size32plus }
>
>
> --
> Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
>    Free Software Activist                GNU Toolchain Engineer
> Disinformation flourishes because many people care deeply about injustice
> but very few check the facts.  Ask me about 
>


Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V

2023-05-30 Thread Richard Biener via Gcc-patches
On Tue, May 30, 2023 at 8:07 AM Kito Cheng via Gcc-patches wrote:
>
> The GNU vector extensions are widely used around the world, and this patch
> enables them with the RISC-V vector extensions; this can help people
> leverage existing code bases with RVV, and also lets them write vector
> programs in a familiar way.
>
> The idea of the VLS code gen support is to emulate VLS operations by VLA
> operations with a specific length.
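 
For reference, this is the kind of fixed-length GNU vector extension code
the patch targets (a minimal example of the extension itself, not taken
from the patch; under the proposed scheme the addition below becomes a
V4SImode operation emulated by a VLA pattern with the length fixed to 4
elements):

/* A 16-byte fixed-length vector type via the GNU vector extension.  */
typedef int v4si __attribute__ ((vector_size (16)));

v4si
add_v4si (v4si a, v4si b)
{
  /* Element-wise addition on 4 x int32.  */
  return a + b;
}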

In the patch you added fixed 16-byte vector modes, correct?  I've never
looked at how ARM deals with the GNU vector extensions but I suppose they
get mapped to NEON and not SVE, so they basically behave the same way here.

But I do wonder about the efficiency for RVV where there doesn't exist a
complementary fixed-length ISA.  Shouldn't vector lowering
(tree-vect-generic.cc) be enhanced to support lowering fixed-length vectors
to variable-length ones with (variable) fixed length instead?  From your
patch I second-guess the RVV specification requires 16-byte vectors to be
available (or will your patch split the insns?), but ideally the user would
be able to specify -mrvv-size=32 for an implementation with 32-byte vectors
and then vector lowering would make use of vectors up to 32 bytes?

Also vector lowering will split smaller vectors not equal to the fixed size to
scalars unless you add all fixed length modes smaller than 16 bytes as well.
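 
To make that concrete, a small illustrative example of a sub-16-byte
vector the lowering concern applies to (assuming no V2SI VLS mode is
defined, tree-vect-generic.cc would decompose this into scalar code):

/* An 8-byte vector, smaller than the 16-byte modes added by the patch.  */
typedef int v2si __attribute__ ((vector_size (8)));

v2si
add_v2si (v2si a, v2si b)
{
  return a + b;
}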

> Key design point: we defer the mode conversion (from VLS to VLA modes) until
> after register allocation; it comes with several advantages:
> - VLS patterns are much friendlier to most optimization passes like combine.
> - The register allocator can spill/restore the exact size of VLS types
>   instead of the whole register.
>
> This is compatible with VLA vectorization.
>
> Only move and binary operation patterns are supported so far.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-modes.def: Introduce VLS modes.
> * config/riscv/riscv-protos.h (riscv_vector::minimal_vls_mode): New.
> (riscv_vector::vls_insn_expander): New.
> (riscv_vector::vls_mode_p): New.
> * config/riscv/riscv-v.cc (riscv_vector::minimal_vls_mode): New.
> (riscv_vector::vls_mode_p): New.
> (riscv_vector::vls_insn_expander): New.
> (riscv_vector::update_vls_mode): New.
> * config/riscv/riscv.cc (riscv_v_ext_mode_p): New.
> (riscv_v_adjust_nunits): Handle VLS type.
> (riscv_hard_regno_nregs): Ditto.
> (riscv_hard_regno_mode_ok): Ditto.
> (riscv_regmode_natural_size): Ditto.
> * config/riscv/vector-iterators.md (VLS): New.
> (VM): Handle VLS type.
> (vel): Ditto.
> * config/riscv/vector.md: Include vector-vls.md.
> * config/riscv/vector-vls.md: New file.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/rvv.exp: Add vls folder.
> * gcc.target/riscv/rvv/vls/binop-template.h: New test.
> * gcc.target/riscv/rvv/vls/binop-v.c: New test.
> * gcc.target/riscv/rvv/vls/binop-zve32x.c: New test.
> * gcc.target/riscv/rvv/vls/binop-zve64x.c: New test.
> * gcc.target/riscv/rvv/vls/move-template.h: New test.
> * gcc.target/riscv/rvv/vls/move-v.c: New test.
> * gcc.target/riscv/rvv/vls/move-zve32x.c: New test.
> * gcc.target/riscv/rvv/vls/move-zve64x.c: New test.
> * gcc.target/riscv/rvv/vls/load-store-template.h: New test.
> * gcc.target/riscv/rvv/vls/load-store-v.c: New test.
> * gcc.target/riscv/rvv/vls/load-store-zve32x.c: New test.
> * gcc.target/riscv/rvv/vls/load-store-zve64x.c: New test.
> * gcc.target/riscv/rvv/vls/vls-types.h: New test.
> ---
>  gcc/config/riscv/riscv-modes.def  |  3 +
>  gcc/config/riscv/riscv-protos.h   |  4 ++
>  gcc/config/riscv/riscv-v.cc   | 67 +++
>  gcc/config/riscv/riscv.cc | 27 +++-
>  gcc/config/riscv/vector-iterators.md  |  6 ++
>  gcc/config/riscv/vector-vls.md| 64 ++
>  gcc/config/riscv/vector.md|  2 +
>  gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|  4 ++
>  .../gcc.target/riscv/rvv/vls/binop-template.h | 18 +
>  .../gcc.target/riscv/rvv/vls/binop-v.c| 18 +
>  .../gcc.target/riscv/rvv/vls/binop-zve32x.c   | 18 +
>  .../gcc.target/riscv/rvv/vls/binop-zve64x.c   | 18 +
>  .../riscv/rvv/vls/load-store-template.h   |  8 +++
>  .../gcc.target/riscv/rvv/vls/load-store-v.c   | 17 +
>  .../riscv/rvv/vls/load-store-zve32x.c | 17 +
>  .../riscv/rvv/vls/load-store-zve64x.c | 17 +
>  .../gcc.target/riscv/rvv/vls/move-template.h  | 13 
>  .../gcc.target/riscv/rvv/vls/move-v.c | 10 +++
>  .../gcc.target/riscv/rvv/vls/move-zve32x.c| 10 +++
>  .../gcc.target/riscv/rvv/vls/move-zve64x.c| 10 +++
>  .../gcc.target/riscv/rvv/vls/vls-types.h  | 42 
>  21 files changed, 391 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/config/riscv/v

[COMMITTED] ada: Fix coding style in init.c

2023-05-30 Thread Marc Poulhiès via Gcc-patches
From: Cedric Landet 

The coding style rules require avoiding FIXME comments; ??? is preferred.

gcc/ada/

* init.c: Replace FIXME by ???

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/init.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/ada/init.c b/gcc/ada/init.c
index 5212a38490d..53ca142 100644
--- a/gcc/ada/init.c
+++ b/gcc/ada/init.c
@@ -248,7 +248,7 @@ __gnat_error_handler (int sig,
   switch (sig)
 {
 case SIGSEGV:
-  /* FIXME: we need to detect the case of a *real* SIGSEGV.  */
+  /* ??? we need to detect the case of a *real* SIGSEGV.  */
   exception = &storage_error;
   msg = "stack overflow or erroneous memory access";
   break;
@@ -340,7 +340,7 @@ __gnat_error_handler (int sig, siginfo_t *si 
ATTRIBUTE_UNUSED, void *ucontext)
   switch (sig)
 {
 case SIGSEGV:
-  /* FIXME: we need to detect the case of a *real* SIGSEGV.  */
+  /* ??? we need to detect the case of a *real* SIGSEGV.  */
   exception = &storage_error;
   msg = "stack overflow or erroneous memory access";
   break;
-- 
2.40.0



[COMMITTED] ada: Fix visibility error with DIC or Type_Invariant aspect on generic type

2023-05-30 Thread Marc Poulhiès via Gcc-patches
From: Eric Botcazou 

The compiler fails to capture global references during the analysis of the
aspect on the generic type because it analyzes a copy of the expression.

gcc/ada/

* exp_util.adb (Build_DIC_Procedure_Body.Add_Own_DIC): When inside
a generic unit, preanalyze the expression directly.
(Build_Invariant_Procedure_Body.Add_Own_Invariants): Likewise.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_util.adb | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/gcc/ada/exp_util.adb b/gcc/ada/exp_util.adb
index 2582524b1dd..4c4844594d2 100644
--- a/gcc/ada/exp_util.adb
+++ b/gcc/ada/exp_util.adb
@@ -1853,7 +1853,15 @@ package body Exp_Util is
 
   begin
  pragma Assert (Present (DIC_Expr));
- Expr := New_Copy_Tree (DIC_Expr);
+
+ --  We need to preanalyze the expression itself inside a generic to
+ --  be able to capture global references present in it.
+
+ if Inside_A_Generic then
+Expr := DIC_Expr;
+ else
+Expr := New_Copy_Tree (DIC_Expr);
+ end if;
 
  --  Perform the following substitution:
 
@@ -3111,7 +3119,14 @@ package body Exp_Util is
   return;
end if;
 
-   Expr := New_Copy_Tree (Prag_Expr);
+   --  We need to preanalyze the expression itself inside a generic
+   --  to be able to capture global references present in it.
+
+   if Inside_A_Generic then
+  Expr := Prag_Expr;
+   else
+  Expr := New_Copy_Tree (Prag_Expr);
+   end if;
 
--  Substitute all references to type T with references to the
--  _object formal parameter.
-- 
2.40.0



[COMMITTED] ada: Use generalized loop iteration in Put_Image routines

2023-05-30 Thread Marc Poulhiès via Gcc-patches
From: Eric Botcazou 

gcc/ada/

* libgnat/a-cidlli.adb (Put_Image): Simplify.
* libgnat/a-coinve.adb (Put_Image): Likewise.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnat/a-cidlli.adb | 13 +
 gcc/ada/libgnat/a-coinve.adb | 13 +
 2 files changed, 10 insertions(+), 16 deletions(-)

diff --git a/gcc/ada/libgnat/a-cidlli.adb b/gcc/ada/libgnat/a-cidlli.adb
index 65582d152f0..9e6ad70a103 100644
--- a/gcc/ada/libgnat/a-cidlli.adb
+++ b/gcc/ada/libgnat/a-cidlli.adb
@@ -1283,22 +1283,19 @@ is
is
   First_Time : Boolean := True;
   use System.Put_Images;
+   begin
+  Array_Before (S);
 
-  procedure Put_Elem (Position : Cursor);
-  procedure Put_Elem (Position : Cursor) is
-  begin
+  for X of V loop
  if First_Time then
 First_Time := False;
  else
 Simple_Array_Between (S);
  end if;
 
- Element_Type'Put_Image (S, Element (Position));
-  end Put_Elem;
+ Element_Type'Put_Image (S, X);
+  end loop;
 
-   begin
-  Array_Before (S);
-  Iterate (V, Put_Elem'Access);
   Array_After (S);
end Put_Image;
 
diff --git a/gcc/ada/libgnat/a-coinve.adb b/gcc/ada/libgnat/a-coinve.adb
index 846f819a732..dd0e8cdee40 100644
--- a/gcc/ada/libgnat/a-coinve.adb
+++ b/gcc/ada/libgnat/a-coinve.adb
@@ -2679,22 +2679,19 @@ is
is
   First_Time : Boolean := True;
   use System.Put_Images;
+   begin
+  Array_Before (S);
 
-  procedure Put_Elem (Position : Cursor);
-  procedure Put_Elem (Position : Cursor) is
-  begin
+  for X of V loop
  if First_Time then
 First_Time := False;
  else
 Simple_Array_Between (S);
  end if;
 
- Element_Type'Put_Image (S, Element (Position));
-  end Put_Elem;
+ Element_Type'Put_Image (S, X);
+  end loop;
 
-   begin
-  Array_Before (S);
-  Iterate (V, Put_Elem'Access);
   Array_After (S);
end Put_Image;
 
-- 
2.40.0



[COMMITTED] ada: Only build access-to-subprogram wrappers when expander is active

2023-05-30 Thread Marc Poulhiès via Gcc-patches
From: Piotr Trojanek 

For access-to-subprogram types with Pre/Post aspects we create a wrapper
routine that evaluates these aspects. The spec of this wrapper was always
created, while its body was only created when expansion was enabled.

Now we only create these wrappers when expansion is enabled. In
particular, we don't create them in GNATprove mode; instead, GNATprove
picks the Pre/Post expressions directly from the aspects.

gcc/ada/

* exp_ch3.adb
(Build_Access_Subprogram_Wrapper_Body): Build wrapper body if requested
by routine that builds wrapper spec.
* sem_ch3.adb
(Analyze_Full_Type_Declaration): Only build wrapper when expander is
active.
(Build_Access_Subprogram_Wrapper):
Remove special-case for GNATprove.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch3.adb |  4 
 gcc/ada/sem_ch3.adb | 12 ++--
 2 files changed, 2 insertions(+), 14 deletions(-)

diff --git a/gcc/ada/exp_ch3.adb b/gcc/ada/exp_ch3.adb
index 5f651bacafb..f8c99470dd7 100644
--- a/gcc/ada/exp_ch3.adb
+++ b/gcc/ada/exp_ch3.adb
@@ -583,10 +583,6 @@ package body Exp_Ch3 is
   Ptr   : Entity_Id;
 
begin
-  if not Expander_Active then
- return;
-  end if;
-
   --  Create List of actuals for indirect call. The last parameter of the
   --  subprogram declaration is the access value for the indirect call.
 
diff --git a/gcc/ada/sem_ch3.adb b/gcc/ada/sem_ch3.adb
index f360be810b4..29733e9d31f 100644
--- a/gcc/ada/sem_ch3.adb
+++ b/gcc/ada/sem_ch3.adb
@@ -3224,6 +3224,7 @@ package body Sem_Ch3 is
 
if Ada_Version >= Ada_2022
  and then Present (Aspect_Specifications (N))
+ and then Expander_Active
then
   Build_Access_Subprogram_Wrapper (N);
end if;
@@ -6915,16 +6916,7 @@ package body Sem_Ch3 is
   --  may be handled as a dispatching operation and erroneously registered
   --  in a dispatch table.
 
-  if not GNATprove_Mode then
- Append_Freeze_Action (Id, New_Decl);
-
-  --  Under GNATprove mode there is no such problem but we do not declare
-  --  it in the freezing actions since they are not analyzed under this
-  --  mode.
-
-  else
- Insert_After (Decl, New_Decl);
-  end if;
+  Append_Freeze_Action (Id, New_Decl);
 
   Set_Access_Subprogram_Wrapper (Designated_Type (Id), Subp);
   Build_Access_Subprogram_Wrapper_Body (Decl, New_Decl);
-- 
2.40.0



[COMMITTED] ada: Ensure Default_Stack_Size is greater than Minimum_Stack_Size

2023-05-30 Thread Marc Poulhiès via Gcc-patches
From: Johannes Kliemann 

The Default_Stack_Size function does not check that the binder-specified
default stack size is greater than the minimum stack size for the runtime.
This can result in tasks using default stack sizes less than the minimum
stack size, because Adjust_Storage_Size only adjusts storage sizes for
tasks that explicitly specify a storage size. To avoid this, the
binder-specified default stack size is rounded up to the minimum stack
size if required.

gcc/ada/

* libgnat/s-parame.adb: Check that Default_Stack_Size >=
Minimum_Stack_Size.
* libgnat/s-parame__rtems.adb: Ditto.
* libgnat/s-parame__vxworks.adb: Check that Default_Stack_Size >=
Minimum_Stack_Size and use the proper Minimum_Stack_Size if
Stack_Check_Limits is enabled.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnat/s-parame.adb  |  2 ++
 gcc/ada/libgnat/s-parame__rtems.adb   |  2 ++
 gcc/ada/libgnat/s-parame__vxworks.adb | 11 +--
 3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/gcc/ada/libgnat/s-parame.adb b/gcc/ada/libgnat/s-parame.adb
index 930c92d35e2..6bd9f03f63f 100644
--- a/gcc/ada/libgnat/s-parame.adb
+++ b/gcc/ada/libgnat/s-parame.adb
@@ -58,6 +58,8 @@ package body System.Parameters is
begin
   if Default_Stack_Size = -1 then
  return 2 * 1024 * 1024;
+  elsif Size_Type (Default_Stack_Size) < Minimum_Stack_Size then
+ return Minimum_Stack_Size;
   else
  return Size_Type (Default_Stack_Size);
   end if;
diff --git a/gcc/ada/libgnat/s-parame__rtems.adb 
b/gcc/ada/libgnat/s-parame__rtems.adb
index 2f2e70b1796..1d51ae9ec04 100644
--- a/gcc/ada/libgnat/s-parame__rtems.adb
+++ b/gcc/ada/libgnat/s-parame__rtems.adb
@@ -63,6 +63,8 @@ package body System.Parameters is
begin
   if Default_Stack_Size = -1 then
  return 32 * 1024;
+  elsif Size_Type (Default_Stack_Size) < Minimum_Stack_Size then
+ return Minimum_Stack_Size;
   else
  return Size_Type (Default_Stack_Size);
   end if;
diff --git a/gcc/ada/libgnat/s-parame__vxworks.adb 
b/gcc/ada/libgnat/s-parame__vxworks.adb
index 8e0768e1e29..38fe0222622 100644
--- a/gcc/ada/libgnat/s-parame__vxworks.adb
+++ b/gcc/ada/libgnat/s-parame__vxworks.adb
@@ -58,11 +58,13 @@ package body System.Parameters is
begin
   if Default_Stack_Size = -1 then
  if Stack_Check_Limits then
-return 32 * 1024;
 --  Extra stack to allow for 12K exception area.
+return 32 * 1024;
  else
 return 20 * 1024;
  end if;
+  elsif Size_Type (Default_Stack_Size) < Minimum_Stack_Size then
+ return Minimum_Stack_Size;
   else
  return Size_Type (Default_Stack_Size);
   end if;
@@ -74,7 +76,12 @@ package body System.Parameters is
 
function Minimum_Stack_Size return Size_Type is
begin
-  return 8 * 1024;
+  if Stack_Check_Limits then
+ --  Extra stack to allow for 12K exception area.
+ return 20 * 1024;
+  else
+ return 8 * 1024;
+  end if;
end Minimum_Stack_Size;
 
 end System.Parameters;
-- 
2.40.0



[COMMITTED] ada: Fix regression of secondary stack management in return statements

2023-05-30 Thread Marc Poulhiès via Gcc-patches
From: Eric Botcazou 

This happens when the expression of the return statement is a call that does
not return on the same stack as the enclosing function.

gcc/ada/

* sem_res.adb (Resolve_Call): Restrict previous change to calls that
return on the same stack as the enclosing function.  Tidy up.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_res.adb | 69 -
 1 file changed, 31 insertions(+), 38 deletions(-)

diff --git a/gcc/ada/sem_res.adb b/gcc/ada/sem_res.adb
index b16e48917f2..c2a4bcb58cd 100644
--- a/gcc/ada/sem_res.adb
+++ b/gcc/ada/sem_res.adb
@@ -6923,69 +6923,62 @@ package body Sem_Res is
  return;
   end if;
 
-  --  Create a transient scope if the resulting type requires it
+  --  Create a transient scope if the expander is active and the resulting
+  --  type requires it.
 
   --  There are several notable exceptions:
 
-  --  a) In init procs, the transient scope overhead is not needed, and is
-  --  even incorrect when the call is a nested initialization call for a
-  --  component whose expansion may generate adjust calls. However, if the
-  --  call is some other procedure call within an initialization procedure
-  --  (for example a call to Create_Task in the init_proc of the task
-  --  run-time record) a transient scope must be created around this call.
-
-  --  b) Enumeration literal pseudo-calls need no transient scope
-
-  --  c) Intrinsic subprograms (Unchecked_Conversion and source info
+  --  a) Intrinsic subprograms (Unchecked_Conversion and source info
   --  functions) do not use the secondary stack even though the return
   --  type may be unconstrained.
 
-  --  d) Calls to a build-in-place function, since such functions may
+  --  b) Subprograms that are ignored ghost entities do not return anything
+
+  --  c) Calls to a build-in-place function, since such functions may
   --  allocate their result directly in a target object, and cases where
   --  the result does get allocated in the secondary stack are checked for
   --  within the specialized Exp_Ch6 procedures for expanding those
   --  build-in-place calls.
 
-  --  e) Calls to inlinable expression functions do not use the secondary
+  --  d) Calls to inlinable expression functions do not use the secondary
   --  stack (since the call will be replaced by its returned object).
 
-  --  f) If the subprogram is marked Inline_Always, then even if it returns
+  --  e) If the subprogram is marked Inline, then even if it returns
   --  an unconstrained type the call does not require use of the secondary
   --  stack. However, inlining will only take place if the body to inline
   --  is already present. It may not be available if e.g. the subprogram is
   --  declared in a child instance.
 
-  --  g) If the subprogram is a static expression function and the call is
+  --  f) If the subprogram is a static expression function and the call is
   --  a static call (the actuals are all static expressions), then we never
   --  want to create a transient scope (this could occur in the case of a
   --  static string-returning call).
 
-  --  h) If the subprogram is an ignored ghost entity, because it does not
-  --  return anything.
-
-  --  i) If the call is the expression of a simple return statement, since
-  --  it will be handled as a tail call by Expand_Simple_Function_Return.
-
-  if Is_Inlined (Nam)
-and then Has_Pragma_Inline (Nam)
-and then Nkind (Unit_Declaration_Node (Nam)) = N_Subprogram_Declaration
-and then Present (Body_To_Inline (Unit_Declaration_Node (Nam)))
-  then
- null;
+  --  g) If the call is the expression of a simple return statement that
+  --  returns on the same stack, since it will be handled as a tail call
+  --  by Expand_Simple_Function_Return.
 
-  elsif Ekind (Nam) = E_Enumeration_Literal
-or else Is_Build_In_Place_Function (Nam)
-or else Is_Intrinsic_Subprogram (Nam)
-or else Is_Inlinable_Expression_Function (Nam)
-or else Is_Static_Function_Call (N)
-or else Is_Ignored_Ghost_Entity (Nam)
-or else Nkind (Parent (N)) = N_Simple_Return_Statement
-  then
- null;
-
-  elsif Expander_Active
+  if Expander_Active
 and then Ekind (Nam) in E_Function | E_Subprogram_Type
 and then Requires_Transient_Scope (Etype (Nam))
+and then not Is_Intrinsic_Subprogram (Nam)
+and then not Is_Ignored_Ghost_Entity (Nam)
+and then not Is_Build_In_Place_Function (Nam)
+and then not Is_Inlinable_Expression_Function (Nam)
+and then not (Is_Inlined (Nam)
+   and then Has_Pragma_Inline (Nam)
+   and then Nkind (Unit_Declaration_Node (Nam)) =
+  

[COMMITTED] ada: Make use of Cannot_Be_Superflat flag on N_Range nodes

2023-05-30 Thread Marc Poulhiès via Gcc-patches
From: Eric Botcazou 

gcc/ada/

* gcc-interface/decl.cc (range_cannot_be_superflat): Return true
immediately if Cannot_Be_Superflat is set.
* gcc-interface/misc.cc (gnat_post_options): Do not override the
-Wstringop-overflow setting.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gcc-interface/decl.cc | 4 
 gcc/ada/gcc-interface/misc.cc | 3 ---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/ada/gcc-interface/decl.cc b/gcc/ada/gcc-interface/decl.cc
index ec61593a65b..53a11243590 100644
--- a/gcc/ada/gcc-interface/decl.cc
+++ b/gcc/ada/gcc-interface/decl.cc
@@ -6673,6 +6673,10 @@ range_cannot_be_superflat (Node_Id gnat_range)
   Node_Id gnat_scalar_range;
   tree gnu_lb, gnu_hb, gnu_lb_minus_one;
 
+  /* This is the easy case.  */
+  if (Cannot_Be_Superflat (gnat_range))
+return true;
+
   /* If the low bound is not constant, take the worst case by finding an upper
  bound for its type, repeatedly if need be.  */
   while (Nkind (gnat_lb) != N_Integer_Literal
diff --git a/gcc/ada/gcc-interface/misc.cc b/gcc/ada/gcc-interface/misc.cc
index b18ca8c7d88..56c7bb9b533 100644
--- a/gcc/ada/gcc-interface/misc.cc
+++ b/gcc/ada/gcc-interface/misc.cc
@@ -267,9 +267,6 @@ gnat_post_options (const char **pfilename ATTRIBUTE_UNUSED)
   /* No return type warnings for Ada.  */
   warn_return_type = 0;
 
-  /* No string overflow warnings for Ada.  */
-  warn_stringop_overflow = 0;
-
   /* No caret by default for Ada.  */
   if (!OPTION_SET_P (flag_diagnostics_show_caret))
 global_dc->show_caret = false;
-- 
2.40.0



[COMMITTED] ada: Fix fallout of recent fix for missing finalization

2023-05-30 Thread Marc Poulhiès via Gcc-patches
From: Eric Botcazou 

The original fix makes it possible to create transient scopes around return
statements in more cases, but it overlooks that transient scopes are reused
and, in particular, that they can be promoted to secondary stack management.

gcc/ada/

* exp_ch7.adb (Find_Enclosing_Transient_Scope): Return the index in
the scope table instead of the scope's entity.
(Establish_Transient_Scope): If an enclosing scope already exists,
do not set the Uses_Sec_Stack flag on it if the node to be wrapped
is a return statement which requires secondary stack management.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch7.adb | 36 ++--
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/gcc/ada/exp_ch7.adb b/gcc/ada/exp_ch7.adb
index 520bb099d33..42b41e5cf6b 100644
--- a/gcc/ada/exp_ch7.adb
+++ b/gcc/ada/exp_ch7.adb
@@ -4476,10 +4476,10 @@ package body Exp_Ch7 is
   function Is_Package_Or_Subprogram (Id : Entity_Id) return Boolean;
   --  Determine whether arbitrary Id denotes a package or subprogram [body]
 
-  function Find_Enclosing_Transient_Scope return Entity_Id;
+  function Find_Enclosing_Transient_Scope return Int;
   --  Examine the scope stack looking for the nearest enclosing transient
   --  scope within the innermost enclosing package or subprogram. Return
-  --  Empty if no such scope exists.
+  --  its index in the table or else -1 if no such scope exists.
 
   function Find_Transient_Context (N : Node_Id) return Node_Id;
   --  Locate a suitable context for arbitrary node N which may need to be
@@ -4605,7 +4605,7 @@ package body Exp_Ch7 is
   -- Find_Enclosing_Transient_Scope --
   
 
-  function Find_Enclosing_Transient_Scope return Entity_Id is
+  function Find_Enclosing_Transient_Scope return Int is
   begin
  for Index in reverse Scope_Stack.First .. Scope_Stack.Last loop
 declare
@@ -4620,12 +4620,12 @@ package body Exp_Ch7 is
   exit;
 
elsif Scope.Is_Transient then
-  return Scope.Entity;
+  return Index;
end if;
 end;
  end loop;
 
- return Empty;
+ return -1;
   end Find_Enclosing_Transient_Scope;
 
   
@@ -4822,8 +4822,8 @@ package body Exp_Ch7 is
 
   --  Local variables
 
-  Trans_Id : constant Entity_Id := Find_Enclosing_Transient_Scope;
-  Context  : Node_Id;
+  Trans_Idx : constant Int := Find_Enclosing_Transient_Scope;
+  Context   : Node_Id;
 
--  Start of processing for Establish_Transient_Scope
 
@@ -4831,13 +4831,29 @@ package body Exp_Ch7 is
   --  Do not create a new transient scope if there is already an enclosing
   --  transient scope within the innermost enclosing package or subprogram.
 
-  if Present (Trans_Id) then
+  if Trans_Idx >= 0 then
 
  --  If the transient scope was requested for purposes of managing the
- --  secondary stack, then the existing scope must perform this task.
+ --  secondary stack, then the existing scope must perform this task,
+ --  unless the node to be wrapped is a return statement of a function
+ --  that requires secondary stack management, because the function's
+ --  result would be reclaimed too early (see Find_Transient_Context).
 
  if Manage_Sec_Stack then
-Set_Uses_Sec_Stack (Trans_Id);
+declare
+   SE : Scope_Stack_Entry renames Scope_Stack.Table (Trans_Idx);
+
+begin
+   if Nkind (SE.Node_To_Be_Wrapped) /= N_Simple_Return_Statement
+ or else not
+   Needs_Secondary_Stack
+ (Etype
+   (Return_Applies_To
+ (Return_Statement_Entity (SE.Node_To_Be_Wrapped
+   then
+  Set_Uses_Sec_Stack (SE.Entity);
+   end if;
+end;
  end if;
 
  return;
-- 
2.40.0



[COMMITTED] ada: Fix minor issues in user's guide

2023-05-30 Thread Marc Poulhiès via Gcc-patches
From: Ronan Desplanques 

gcc/ada/

* doc/gnat_ugn/building_executable_programs_with_gnat.rst: Fix minor issues.
* doc/gnat_ugn/the_gnat_compilation_model.rst: Fix minor issues.
* gnat_ugn.texi: Regenerate.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 ...building_executable_programs_with_gnat.rst | 32 -
 .../gnat_ugn/the_gnat_compilation_model.rst   |  2 +-
 gcc/ada/gnat_ugn.texi | 34 +--
 3 files changed, 32 insertions(+), 36 deletions(-)

diff --git a/gcc/ada/doc/gnat_ugn/building_executable_programs_with_gnat.rst 
b/gcc/ada/doc/gnat_ugn/building_executable_programs_with_gnat.rst
index 2838a302f2e..20e003d4ac7 100644
--- a/gcc/ada/doc/gnat_ugn/building_executable_programs_with_gnat.rst
+++ b/gcc/ada/doc/gnat_ugn/building_executable_programs_with_gnat.rst
@@ -895,12 +895,12 @@ by ``gnatmake``. It may be necessary to use the switch
 Examples of ``gnatmake`` Usage
 --
 
-*gnatmake hello.adb*
+``gnatmake hello.adb``
   Compile all files necessary to bind and link the main program
   :file:`hello.adb` (containing unit ``Hello``) and bind and link the
   resulting object files to generate an executable file :file:`hello`.
 
-*gnatmake main1 main2 main3*
+``gnatmake main1 main2 main3``
   Compile all files necessary to bind and link the main programs
   :file:`main1.adb` (containing unit ``Main1``), :file:`main2.adb`
   (containing unit ``Main2``) and :file:`main3.adb`
@@ -908,7 +908,7 @@ Examples of ``gnatmake`` Usage
   to generate three executable files :file:`main1`,
   :file:`main2`  and :file:`main3`.
 
-*gnatmake -q Main_Unit -cargs -O2 -bargs -l*
+``gnatmake -q Main_Unit -cargs -O2 -bargs -l``
   Compile all files necessary to bind and link the main program unit
   ``Main_Unit`` (from file :file:`main_unit.adb`). All compilations will
   be done with optimization level 2 and the order of elaboration will be
@@ -949,7 +949,7 @@ You need *not* compile the following files
 
 * subunits
 
-because they are compiled as part of compiling related units. GNAT
+because they are compiled as part of compiling related units. GNAT compiles
 package specs
 when the corresponding body is compiled, and subunits when the parent is
 compiled.
@@ -997,8 +997,6 @@ two output files in the current directory, but you may 
specify a source
 file in any directory using an absolute or relative path specification
 containing the directory information.
 
-TESTING: the :switch:`--foobar{NN}` switch
-
 .. index::  gnat1
 
 ``gcc`` is actually a driver program that looks at the extensions of
@@ -1068,7 +1066,7 @@ directories, in the following order:
 * The content of the :file:`ada_source_path` file which is part of the GNAT
   installation tree and is used to store standard libraries such as the
   GNAT Run Time Library (RTL) source files.
-  :ref:`Installing_a_library`
+  See also :ref:`Installing_a_library`.
 
 Specifying the switch :switch:`-I-`
 inhibits the use of the directory
@@ -1159,7 +1157,7 @@ Compile body in file :file:`xyz.adb` with all default 
options.
 $ gcc -c -O2 -gnata xyz-def.adb
 
 Compile the child unit package in file :file:`xyz-def.adb` with extensive
-optimizations, and pragma ``Assert``/`Debug` statements
+optimizations, and pragma ``Assert``/``Debug`` statements
 enabled.
 
 .. code-block:: sh
@@ -1274,7 +1272,7 @@ Alphabetical List of All Switches
   size of the executable, compared with a traditional per-unit compilation
   with inlining across units enabled by the :switch:`-gnatn` switch.
   The drawback of this approach is that it may require more memory and that
-  the debugging information generated by -g with it might be hardly usable.
+  the debugging information generated by ``-g`` with it might be hardly usable.
   The switch, as well as the accompanying :switch:`-Ox` switches, must be
   specified both for the compilation and the link phases.
   If the ``n`` parameter is specified, the optimization and final code
@@ -1472,7 +1470,7 @@ Alphabetical List of All Switches
   This switch will generate an intermediate representation suitable for
   use by CodePeer (:file:`.scil` files). This switch is not compatible with
   code generation (it will, among other things, disable some switches such
-  as -gnatn, and enable others such as -gnata).
+  as ``-gnatn``, and enable others such as ``-gnata``).
 
 
 .. index:: -gnatd  (gcc)
@@ -1482,9 +1480,9 @@ Alphabetical List of All Switches
   the :switch:`-gnatd` specifies the specific debug options. The possible
   characters are 0-9, a-z, A-Z, optionally preceded by a dot or underscore.
   See compiler source file :file:`debug.adb` for details of the implemented
-  debug options. Certain debug options are relevant to applications
+  debug options. Certain debug options are relevant to application
   programmers, and these are documented at appropriate points in this
-  users guide.
+  user's guide.
 
 
 .. index:: -gnatD[nn]  (gcc

[COMMITTED] ada: Fix storage model handling for dereference as lvalue and renamings

2023-05-30 Thread Marc Poulhiès via Gcc-patches
Don't require storage access for explicit dereferences used as
lvalue (e.g. Some_Access.all'Address) or for renamings.

gcc/ada/

* gcc-interface/trans.cc (get_storage_model_access): Don't require
storage model access for dereference used as lvalue or renamings.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gcc-interface/trans.cc | 24 +---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/gcc/ada/gcc-interface/trans.cc b/gcc/ada/gcc-interface/trans.cc
index 13f438c424b..f4a5db002f4 100644
--- a/gcc/ada/gcc-interface/trans.cc
+++ b/gcc/ada/gcc-interface/trans.cc
@@ -4398,14 +4398,32 @@ static void
 get_storage_model_access (Node_Id gnat_node, Entity_Id *gnat_smo)
 {
   const Node_Id gnat_parent = Parent (gnat_node);
+  *gnat_smo = Empty;
 
-  /* If we are the prefix of the parent, then the access is above us.  */
-  if (node_is_component (gnat_parent) && Prefix (gnat_parent) == gnat_node)
+  switch (Nkind (gnat_parent))
 {
-  *gnat_smo = Empty;
+case N_Attribute_Reference:
+  /* If the parent is an attribute reference that requires an lvalue and
+ gnat_node is the Prefix (i.e. not a parameter), we do not need to
+ actually access any storage. */
+  if (lvalue_required_for_attribute_p (gnat_parent)
+  && Prefix (gnat_parent) == gnat_node)
+return;
+  break;
+
+case N_Object_Renaming_Declaration:
+  /* Nothing to do for the identifier in an object renaming declaration,
+ the renaming itself does not need storage model access. */
   return;
+
+default:
+  break;
 }
 
+  /* If we are the prefix of the parent, then the access is above us.  */
+  if (node_is_component (gnat_parent) && Prefix (gnat_parent) == gnat_node)
+return;
+
   /* Now strip any type conversion from GNAT_NODE.  */
   if (Nkind (gnat_node) == N_Type_Conversion
   || Nkind (gnat_node) == N_Unchecked_Type_Conversion)
-- 
2.40.0



[COMMITTED] ada: Fix wrong access for qualified aggregate with storage model

2023-05-30 Thread Marc Poulhiès via Gcc-patches
From: Eric Botcazou 

The previous fix to get_storage_model_access was incomplete and needs to be
extended to the node itself.

gcc/ada/

* gcc-interface/trans.cc (get_storage_model_access): Also strip any
type conversion in the node when unwinding the components.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gcc-interface/trans.cc | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/gcc/ada/gcc-interface/trans.cc b/gcc/ada/gcc-interface/trans.cc
index 2e8d979831f..ddc7b6dde1e 100644
--- a/gcc/ada/gcc-interface/trans.cc
+++ b/gcc/ada/gcc-interface/trans.cc
@@ -4438,12 +4438,15 @@ get_storage_model_access (Node_Id gnat_node, Entity_Id 
*gnat_smo)
  && Prefix (Parent (gnat_parent)) == gnat_parent))
 return;
 
-  /* Now strip any type conversion from GNAT_NODE.  */
+  /* Find the innermost prefix in GNAT_NODE, stripping any type conversion.  */
   if (node_is_type_conversion (gnat_node))
 gnat_node = Expression (gnat_node);
-
   while (node_is_component (gnat_node))
-gnat_node = Prefix (gnat_node);
+{
+  gnat_node = Prefix (gnat_node);
+  if (node_is_type_conversion (gnat_node))
+   gnat_node = Expression (gnat_node);
+}
 
   *gnat_smo = get_storage_model (gnat_node);
 }
-- 
2.40.0



[COMMITTED] ada: Fix internal error on array constant in expression function

2023-05-30 Thread Marc Poulhiès via Gcc-patches
From: Eric Botcazou 

This happens when the peculiar check emitted by Check_Large_Modular_Array
is applied to an object whose actual subtype is an itype with dynamic size,
because the first reference to the itype in the expanded code may turn out
to be within the raise statement, which is problematic for the elaboration
of this itype by the code generator at library level.

gcc/ada/

* freeze.adb (Check_Large_Modular_Array): Fix head comment, use
Standard_Long_Long_Integer_Size directly and generate a reference
just before the raise statement if the Etype of the object is an
itype declared in an open scope.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/freeze.adb | 25 +
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/gcc/ada/freeze.adb b/gcc/ada/freeze.adb
index 8ebf10bd576..83ce0300871 100644
--- a/gcc/ada/freeze.adb
+++ b/gcc/ada/freeze.adb
@@ -4110,9 +4110,10 @@ package body Freeze is
  procedure Check_Large_Modular_Array (Typ : Entity_Id);
  --  Check that the size of array type Typ can be computed without
  --  overflow, and generates a Storage_Error otherwise. This is only
- --  relevant for array types whose index has System_Max_Integer_Size
- --  bits, where wrap-around arithmetic might yield a meaningless value
- --  for the length of the array, or its corresponding attribute.
+ --  relevant for array types whose index is a modular type with
+ --  Standard_Long_Long_Integer_Size bits: wrap-around arithmetic
+ --  might yield a meaningless value for the length of the array,
+ --  or its corresponding attribute.
 
  procedure Check_Pragma_Thread_Local_Storage (Var_Id : Entity_Id);
  --  Ensure that the initialization state of variable Var_Id subject
@@ -4170,8 +4171,24 @@ package body Freeze is
 --  Storage_Error.
 
 if Is_Modular_Integer_Type (Idx_Typ)
-  and then RM_Size (Idx_Typ) = RM_Size (Standard_Long_Long_Integer)
+  and then RM_Size (Idx_Typ) = Standard_Long_Long_Integer_Size
 then
+   --  Ensure that the type of the object is elaborated before
+   --  the check itself is emitted to avoid elaboration issues
+   --  in the code generator at the library level.
+
+   if Is_Itype (Etype (E))
+ and then In_Open_Scopes (Scope (Etype (E)))
+   then
+  declare
+ Ref_Node : constant Node_Id :=
+  Make_Itype_Reference (Obj_Loc);
+  begin
+ Set_Itype (Ref_Node, Etype (E));
+ Insert_Action (Declaration_Node (E), Ref_Node);
+  end;
+   end if;
+
Insert_Action (Declaration_Node (E),
  Make_Raise_Storage_Error (Obj_Loc,
Condition =>
-- 
2.40.0



[COMMITTED] ada: Add System.Traceback.Symbolic.Module_Name support on AArch64 Linux

2023-05-30 Thread Marc Poulhiès via Gcc-patches
From: Joel Brobecker 

This commit changes the runtime on aarch64-linux to use the Linux
version of s-tsmona.adb, so as to add support for this functionality
on aarch64-linux.

gcc/ada/

* Makefile.rtl: Use libgnat/s-tsmona__linux.adb on
aarch64-linux.  Link libgnat with -ldl, as the use of
s-tsmona__linux.adb requires it.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/Makefile.rtl | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/ada/Makefile.rtl b/gcc/ada/Makefile.rtl
index e2f437ff6e5..ca4c528a7e0 100644
--- a/gcc/ada/Makefile.rtl
+++ b/gcc/ada/Makefile.rtl
@@ -2250,6 +2250,7 @@ ifeq ($(strip $(filter-out aarch64% linux%,$(target_cpu) 
$(target_os))),)
   s-intman.adb

[COMMITTED] ada: Minor generic tweaks left and right

2023-05-30 Thread Marc Poulhiès via Gcc-patches
From: Eric Botcazou 

No functional changes.

gcc/ada/

* gcc-interface/decl.cc (gnat_to_gnu_entity) : Replace
integer_zero_node with null_pointer_node for pointer types.
* gcc-interface/trans.cc (gnat_gimplify_expr) : Likewise.
* gcc-interface/utils.cc (maybe_pad_type): Do not attempt to make a
packable type from a fat pointer type.
* gcc-interface/utils2.cc (build_atomic_load): Use a local variable.
(build_atomic_store): Likewise.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gcc-interface/decl.cc   |  2 +-
 gcc/ada/gcc-interface/trans.cc  |  2 +-
 gcc/ada/gcc-interface/utils.cc  |  1 +
 gcc/ada/gcc-interface/utils2.cc | 21 +++--
 4 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/gcc/ada/gcc-interface/decl.cc b/gcc/ada/gcc-interface/decl.cc
index 53a11243590..456fe53737d 100644
--- a/gcc/ada/gcc-interface/decl.cc
+++ b/gcc/ada/gcc-interface/decl.cc
@@ -1212,7 +1212,7 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, 
bool definition)
&& (POINTER_TYPE_P (gnu_type) || TYPE_IS_FAT_POINTER_P (gnu_type))
&& !gnu_expr
&& !Is_Imported (gnat_entity))
- gnu_expr = integer_zero_node;
+ gnu_expr = null_pointer_node;
 
/* If we are defining the object and it has an Address clause, we must
   either get the address expression from the saved GCC tree for the
diff --git a/gcc/ada/gcc-interface/trans.cc b/gcc/ada/gcc-interface/trans.cc
index 4e5f26305f5..8c8a78f5d2d 100644
--- a/gcc/ada/gcc-interface/trans.cc
+++ b/gcc/ada/gcc-interface/trans.cc
@@ -8987,7 +8987,7 @@ gnat_gimplify_expr (tree *expr_p, gimple_seq *pre_p,
  || TREE_CODE (type) == UNCONSTRAINED_ARRAY_TYPE)
*expr_p = build_unary_op (INDIRECT_REF, NULL_TREE,
  convert (build_pointer_type (type),
-  integer_zero_node));
+  null_pointer_node));
 
   /* Otherwise, just make a VAR_DECL.  */
   else
diff --git a/gcc/ada/gcc-interface/utils.cc b/gcc/ada/gcc-interface/utils.cc
index 337b552619e..8f1861b848e 100644
--- a/gcc/ada/gcc-interface/utils.cc
+++ b/gcc/ada/gcc-interface/utils.cc
@@ -1562,6 +1562,7 @@ maybe_pad_type (tree type, tree size, unsigned int align,
  at the RTL level when the stand-alone object is accessed as a whole.  */
   if (align > 0
   && RECORD_OR_UNION_TYPE_P (type)
+  && !TYPE_IS_FAT_POINTER_P (type)
   && TYPE_MODE (type) == BLKmode
   && !TYPE_BY_REFERENCE_P (type)
   && TREE_CODE (orig_size) == INTEGER_CST
diff --git a/gcc/ada/gcc-interface/utils2.cc b/gcc/ada/gcc-interface/utils2.cc
index c56fccb4a98..e1737724b65 100644
--- a/gcc/ada/gcc-interface/utils2.cc
+++ b/gcc/ada/gcc-interface/utils2.cc
@@ -692,13 +692,14 @@ build_atomic_load (tree src, bool sync)
 = build_int_cst (integer_type_node,
 sync ? MEMMODEL_SEQ_CST : MEMMODEL_RELAXED);
   tree orig_src = src;
-  tree t, addr, val;
+  tree type, t, addr, val;
   unsigned int size;
   int fncode;
 
   /* Remove conversions to get the address of the underlying object.  */
   src = remove_conversions (src, false);
-  size = resolve_atomic_size (TREE_TYPE (src));
+  type = TREE_TYPE (src);
+  size = resolve_atomic_size (type);
   if (size == 0)
 return orig_src;
 
@@ -710,7 +711,7 @@ build_atomic_load (tree src, bool sync)
 
   /* First reinterpret the loaded bits in the original type of the load,
  then convert to the expected result type.  */
-  t = fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (src), val);
+  t = fold_build1 (VIEW_CONVERT_EXPR, type, val);
   return convert (TREE_TYPE (orig_src), t);
 }
 
@@ -728,13 +729,14 @@ build_atomic_store (tree dest, tree src, bool sync)
 = build_int_cst (integer_type_node,
 sync ? MEMMODEL_SEQ_CST : MEMMODEL_RELAXED);
   tree orig_dest = dest;
-  tree t, int_type, addr;
+  tree type, t, int_type, addr;
   unsigned int size;
   int fncode;
 
   /* Remove conversions to get the address of the underlying object.  */
   dest = remove_conversions (dest, false);
-  size = resolve_atomic_size (TREE_TYPE (dest));
+  type = TREE_TYPE (dest);
+  size = resolve_atomic_size (type);
   if (size == 0)
 return build_binary_op (MODIFY_EXPR, NULL_TREE, orig_dest, src);
 
@@ -746,12 +748,11 @@ build_atomic_store (tree dest, tree src, bool sync)
  then reinterpret them in the effective type.  But if the original type
  is a padded type with the same size, convert to the inner type instead,
  as we don't want to artificially introduce a CONSTRUCTOR here.  */
-  if (TYPE_IS_PADDING_P (TREE_TYPE (dest))
-  && TYPE_SIZE (TREE_TYPE (dest))
-== TYPE_SIZE (TREE_TYPE (TYPE_FIELDS (TREE_TYPE (dest)
-src = convert (TREE_TYPE (TYPE_FIELDS (TREE_TYPE (dest))), src);
+  if (TYPE_IS_PADDING_P (type)
+  && TYPE_SIZE (type) == TYPE_SIZE (TREE_TYPE (T

[COMMITTED] ada: Fix wrong expansion of array aggregate with noncontiguous choices

2023-05-30 Thread Marc Poulhiès via Gcc-patches
From: Eric Botcazou 

This extends an earlier fix done for the others choice of an array aggregate
to all the choices of the aggregate, since the same sharing issue may happen
when the choices are not contiguous.

gcc/ada/

* exp_aggr.adb (Build_Array_Aggr_Code.Get_Assoc_Expr): Duplicate the
expression here instead of...
(Build_Array_Aggr_Code): ...here.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_aggr.adb | 38 ++
 1 file changed, 18 insertions(+), 20 deletions(-)

diff --git a/gcc/ada/exp_aggr.adb b/gcc/ada/exp_aggr.adb
index 93fcac5439e..da31d2480f2 100644
--- a/gcc/ada/exp_aggr.adb
+++ b/gcc/ada/exp_aggr.adb
@@ -2215,21 +2215,32 @@ package body Exp_Aggr is
   -- Get_Assoc_Expr --
   
 
+  --  Duplicate the expression in case we will be generating several loops.
+  --  As a result the expression is no longer shared between the loops and
+  --  is reevaluated for each such loop.
+
   function Get_Assoc_Expr (Assoc : Node_Id) return Node_Id is
  Typ : constant Entity_Id := Base_Type (Etype (N));
 
   begin
  if Box_Present (Assoc) then
 if Present (Default_Aspect_Component_Value (Typ)) then
-   return Default_Aspect_Component_Value (Typ);
+   return New_Copy_Tree (Default_Aspect_Component_Value (Typ));
 elsif Needs_Simple_Initialization (Ctype) then
-   return Get_Simple_Init_Val (Ctype, N);
+   return New_Copy_Tree (Get_Simple_Init_Val (Ctype, N));
 else
return Empty;
 end if;
 
  else
-return Expression (Assoc);
+--  The expression will be passed to Gen_Loop, which immediately
+--  calls Parent_Kind on it, so we set Parent when it matters.
+
+return
+   Expr : constant Node_Id := New_Copy_Tree (Expression (Assoc))
+do
+   Copy_Parent (To => Expr, From => Expression (Assoc));
+end return;
  end if;
   end Get_Assoc_Expr;
 
@@ -2394,8 +2405,7 @@ package body Exp_Aggr is
 
  if Present (Others_Assoc) then
 declare
-   First: Boolean := True;
-   Dup_Expr : Node_Id;
+   First : Boolean := True;
 
 begin
for J in 0 .. Nb_Choices loop
@@ -2429,23 +2439,11 @@ package body Exp_Aggr is
  end if;
   end if;
 
-  if First
-or else not Empty_Range (Low, High)
-  then
+  if First or else not Empty_Range (Low, High) then
  First := False;
-
- --  Duplicate the expression in case we will be generating
- --  several loops. As a result the expression is no longer
- --  shared between the loops and is reevaluated for each
- --  such loop.
-
- Expr := Get_Assoc_Expr (Others_Assoc);
- Dup_Expr := New_Copy_Tree (Expr);
- Copy_Parent (To => Dup_Expr, From => Expr);
-
  Set_Loop_Actions (Others_Assoc, New_List);
- Append_List
-   (Gen_Loop (Low, High, Dup_Expr), To => New_Code);
+ Expr := Get_Assoc_Expr (Others_Assoc);
+ Append_List (Gen_Loop (Low, High, Expr), To => New_Code);
   end if;
end loop;
 end;
-- 
2.40.0



[COMMITTED] ada: Small cleanups and fixes in expansion of aggregates

2023-05-30 Thread Marc Poulhiès via Gcc-patches
From: Eric Botcazou 

This streamlines the handling of qualified expressions in the expansion of
aggregates and plugs a couple of loopholes that may cause memory leaks.

gcc/ada/

* exp_aggr.adb (Build_Array_Aggr_Code): Move the declaration of Typ
to the beginning.
(Initialize_Array_Component): Test the unqualified version of the
expression for the nested array case.
(Initialize_Ctrl_Array_Component): Do not duplicate the expression
here.  Do the pattern matching of the unqualified version of it.
(Gen_Assign): Call Unqualify to compute Expr_Q and use Expr_Q in
subsequent pattern matching.
(Initialize_Ctrl_Record_Component): Do the pattern matching of the
unqualified version of the aggregate.
(Build_Record_Aggr_Code): Call Unqualify.
(Convert_Aggr_In_Assignment): Likewise.
(Convert_Aggr_In_Object_Decl): Likewise.
(Component_OK_For_Backend): Likewise.
(Is_Delayed_Aggregate): Likewise.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_aggr.adb | 90 ++--
 1 file changed, 28 insertions(+), 62 deletions(-)

diff --git a/gcc/ada/exp_aggr.adb b/gcc/ada/exp_aggr.adb
index da31d2480f2..270d3bb8d66 100644
--- a/gcc/ada/exp_aggr.adb
+++ b/gcc/ada/exp_aggr.adb
@@ -1060,6 +1060,7 @@ package body Exp_Aggr is
   Indexes : List_Id := No_List) return List_Id
is
   Loc  : constant Source_Ptr := Sloc (N);
+  Typ  : constant Entity_Id  := Etype (N);
   Index_Base   : constant Entity_Id  := Base_Type (Etype (Index));
   Index_Base_L : constant Node_Id := Type_Low_Bound (Index_Base);
   Index_Base_H : constant Node_Id := Type_High_Bound (Index_Base);
@@ -1460,7 +1461,7 @@ package body Exp_Aggr is
   and then not
 (Is_Array_Type (Comp_Typ)
   and then Needs_Finalization (Component_Type (Comp_Typ))
-  and then Nkind (Expr) = N_Aggregate)
+  and then Nkind (Unqualify (Init_Expr)) = N_Aggregate)
 then
Adj_Call :=
  Make_Adjust_Call
@@ -1522,9 +1523,10 @@ package body Exp_Aggr is
 Init_Expr : Node_Id;
 Stmts : List_Id)
  is
+Init_Expr_Q : constant Node_Id := Unqualify (Init_Expr);
+
 Act_Aggr   : Node_Id;
 Act_Stmts  : List_Id;
-Expr   : Node_Id;
 Fin_Call   : Node_Id;
 Hook_Clear : Node_Id;
 
@@ -1533,29 +1535,20 @@ package body Exp_Aggr is
 --  in-place expansion.
 
  begin
---  Duplicate the initialization expression in case the context is
---  a multi choice list or an "others" choice which plugs various
---  holes in the aggregate. As a result the expression is no longer
---  shared between the various components and is reevaluated for
---  each such component.
-
-Expr := New_Copy_Tree (Init_Expr);
-Set_Parent (Expr, Parent (Init_Expr));
-
 --  Perform a preliminary analysis and resolution to determine what
 --  the initialization expression denotes. An unanalyzed function
 --  call may appear as an identifier or an indexed component.
 
-if Nkind (Expr) in N_Function_Call
- | N_Identifier
- | N_Indexed_Component
-  and then not Analyzed (Expr)
+if Nkind (Init_Expr_Q) in N_Function_Call
+| N_Identifier
+| N_Indexed_Component
+  and then not Analyzed (Init_Expr)
 then
-   Preanalyze_And_Resolve (Expr, Comp_Typ);
+   Preanalyze_And_Resolve (Init_Expr, Comp_Typ);
 end if;
 
 In_Place_Expansion :=
-  Nkind (Expr) = N_Function_Call
+  Nkind (Init_Expr_Q) = N_Function_Call
 and then not Is_Build_In_Place_Result_Type (Comp_Typ);
 
 --  The initialization expression is a controlled function call.
@@ -1572,7 +1565,7 @@ package body Exp_Aggr is
--  generation of a transient scope, which leads to out-of-order
--  adjustment and finalization.
 
-   Set_No_Side_Effect_Removal (Expr);
+   Set_No_Side_Effect_Removal (Init_Expr);
 
--  When the transient component initialization is related to a
--  range or an "others", keep all generated statements within
@@ -1598,7 +1591,7 @@ package body Exp_Aggr is
Process_Transient_Component
  (Loc=> Loc,
   Comp_Typ   => Comp_Typ,
-  Init_Expr  => Expr,
+  Init_Expr  => Init_Expr,
   Fin_Call   => Fin_Call,
   Hook_Clear => Hook_C

[COMMITTED] ada: Fix internal error on qualified aggregate with storage model

2023-05-30 Thread Marc Poulhiès via Gcc-patches
From: Eric Botcazou 

It comes from a small oversight in get_storage_model_access.

gcc/ada/

* gcc-interface/trans.cc (node_is_component): Remove parentheses.
(node_is_type_conversion): New predicate.
(get_atomic_access): Use it.
(get_storage_model_access): Likewise and look into the parent to
find a component if it returns true.
(present_in_lhs_or_actual_p): Likewise.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gcc-interface/trans.cc | 36 ++
 1 file changed, 19 insertions(+), 17 deletions(-)

diff --git a/gcc/ada/gcc-interface/trans.cc b/gcc/ada/gcc-interface/trans.cc
index 18f7e73d45d..2e8d979831f 100644
--- a/gcc/ada/gcc-interface/trans.cc
+++ b/gcc/ada/gcc-interface/trans.cc
@@ -4264,8 +4264,16 @@ static inline bool
 node_is_component (Node_Id gnat_node)
 {
   const Node_Kind k = Nkind (gnat_node);
-  return
-(k == N_Indexed_Component || k == N_Selected_Component || k == N_Slice);
+  return k == N_Indexed_Component || k == N_Selected_Component || k == N_Slice;
+}
+
+/* Return true if GNAT_NODE is a type conversion.  */
+
+static inline bool
+node_is_type_conversion (Node_Id gnat_node)
+{
+  const Node_Kind k = Nkind (gnat_node);
+  return k == N_Type_Conversion || k == N_Unchecked_Type_Conversion;
 }
 
 /* Compute whether GNAT_NODE requires atomic access and set TYPE to the type
@@ -4316,8 +4324,7 @@ get_atomic_access (Node_Id gnat_node, atomic_acces_t *type, bool *sync)
 }
 
   /* Now strip any type conversion from GNAT_NODE.  */
-  if (Nkind (gnat_node) == N_Type_Conversion
-  || Nkind (gnat_node) == N_Unchecked_Type_Conversion)
+  if (node_is_type_conversion (gnat_node))
 gnat_node = Expression (gnat_node);
 
   /* Up to Ada 2012, for Atomic itself, only reads and updates of the object as
@@ -4425,12 +4432,14 @@ get_storage_model_access (Node_Id gnat_node, Entity_Id *gnat_smo)
 }
 
   /* If we are the prefix of the parent, then the access is above us.  */
-  if (node_is_component (gnat_parent) && Prefix (gnat_parent) == gnat_node)
+  if ((node_is_component (gnat_parent) && Prefix (gnat_parent) == gnat_node)
+  || (node_is_type_conversion (gnat_parent)
+ && node_is_component (Parent (gnat_parent))
+ && Prefix (Parent (gnat_parent)) == gnat_parent))
 return;
 
   /* Now strip any type conversion from GNAT_NODE.  */
-  if (Nkind (gnat_node) == N_Type_Conversion
-  || Nkind (gnat_node) == N_Unchecked_Type_Conversion)
+  if (node_is_type_conversion (gnat_node))
 gnat_node = Expression (gnat_node);
 
   while (node_is_component (gnat_node))
@@ -6115,16 +6124,9 @@ lhs_or_actual_p (Node_Id gnat_node)
 static bool
 present_in_lhs_or_actual_p (Node_Id gnat_node)
 {
-  if (lhs_or_actual_p (gnat_node))
-return true;
-
-  const Node_Kind kind = Nkind (Parent (gnat_node));
-
-  if ((kind == N_Type_Conversion || kind == N_Unchecked_Type_Conversion)
-  && lhs_or_actual_p (Parent (gnat_node)))
-return true;
-
-  return false;
+  return lhs_or_actual_p (gnat_node)
+|| (node_is_type_conversion (Parent (gnat_node))
+&& lhs_or_actual_p (Parent (gnat_node)));
 }
 
 /* Return true if GNAT_NODE, an unchecked type conversion, is a no-op as far
-- 
2.40.0



[COMMITTED] ada: Simplify the implementation of storage models

2023-05-30 Thread Marc Poulhiès via Gcc-patches
From: Eric Botcazou 

As the additional temporaries required by the semantics of nonnative storage
models are now created by the front-end, in particular for actual parameters
and assignment statements, the corresponding code in gigi can be removed.

gcc/ada/

* gcc-interface/trans.cc (Call_to_gnu): Remove code implementing the
by-copy semantics for actuals with nonnative storage models.
(gnat_to_gnu) : Remove code instantiating a
temporary for assignments between nonnative storage models.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gcc-interface/trans.cc | 130 +++--
 1 file changed, 27 insertions(+), 103 deletions(-)

diff --git a/gcc/ada/gcc-interface/trans.cc b/gcc/ada/gcc-interface/trans.cc
index f4a5db002f4..92c8dc33af8 100644
--- a/gcc/ada/gcc-interface/trans.cc
+++ b/gcc/ada/gcc-interface/trans.cc
@@ -4560,14 +4560,13 @@ elaborate_profile (Entity_Id first_formal, Entity_Id result_type)
N_Assignment_Statement and the result is to be placed into that object.
ATOMIC_ACCESS is the type of atomic access to be used for the assignment
to GNU_TARGET.  If, in addition, ATOMIC_SYNC is true, then the assignment
-   to GNU_TARGET requires atomic synchronization.  GNAT_STORAGE_MODEL is the
-   storage model object to be used for the assignment to GNU_TARGET or Empty
-   if there is none.  */
+   to GNU_TARGET requires atomic synchronization.  GNAT_SMO is the storage
+   model object to be used for the assignment to GNU_TARGET or Empty if there
+   is none.  */
 
 static tree
 Call_to_gnu (Node_Id gnat_node, tree *gnu_result_type_p, tree gnu_target,
-atomic_acces_t atomic_access, bool atomic_sync,
-Entity_Id gnat_storage_model)
+atomic_acces_t atomic_access, bool atomic_sync, Entity_Id gnat_smo)
 {
   const bool function_call = (Nkind (gnat_node) == N_Function_Call);
   const bool returning_value = (function_call && !gnu_target);
@@ -4599,7 +4598,6 @@ Call_to_gnu (Node_Id gnat_node, tree *gnu_result_type_p, tree gnu_target,
   Node_Id gnat_actual;
   atomic_acces_t aa_type;
   bool aa_sync;
-  Entity_Id gnat_smo;
 
   /* The only way we can make a call via an access type is if GNAT_NAME is an
  explicit dereference.  In that case, get the list of formal args from the
@@ -4751,8 +4749,8 @@ Call_to_gnu (Node_Id gnat_node, tree *gnu_result_type_p, tree gnu_target,
 != TYPE_SIZE (TREE_TYPE (gnu_target))
  && type_is_padding_self_referential (gnu_result_type))
  || (gnu_target
- && Present (gnat_storage_model)
- && Present (Storage_Model_Copy_To (gnat_storage_model)
+ && Present (gnat_smo)
+ && Present (Storage_Model_Copy_To (gnat_smo)
 {
   gnu_retval = create_temporary ("R", gnu_result_type);
   DECL_RETURN_VALUE_P (gnu_retval) = 1;
@@ -4823,19 +4821,12 @@ Call_to_gnu (Node_Id gnat_node, tree *gnu_result_type_p, tree gnu_target,
  = build_compound_expr (TREE_TYPE (gnu_name), init, gnu_name);
}
 
-  get_storage_model_access (gnat_actual, &gnat_smo);
-
-  /* If we are passing a non-addressable actual parameter by reference,
-pass the address of a copy.  Likewise if it needs to be accessed with
-a storage model.  In the In Out or Out case, set up to copy back out
-after the call.  */
+  /* If we are passing a non-addressable parameter by reference, pass the
+address of a copy.  In the In Out or Out case, set up to copy back
+out after the call.  */
   if (is_by_ref_formal_parm
  && (gnu_name_type = gnat_to_gnu_type (Etype (gnat_name)))
- && (!addressable_p (gnu_name, gnu_name_type)
- || (Present (gnat_smo)
- && (Present (Storage_Model_Copy_From (gnat_smo))
- || (!in_param
- && Present (Storage_Model_Copy_To (gnat_smo)))
+ && !addressable_p (gnu_name, gnu_name_type))
{
  tree gnu_orig = gnu_name, gnu_temp, gnu_stmt;
 
@@ -4906,40 +4897,21 @@ Call_to_gnu (Node_Id gnat_node, tree *gnu_result_type_p, tree gnu_target,
}
 
  /* Create an explicit temporary holding the copy.  */
- tree gnu_temp_type;
- if (Nkind (gnat_actual) == N_Explicit_Dereference
- && Present (Actual_Designated_Subtype (gnat_actual)))
-   gnu_temp_type
- = gnat_to_gnu_type (Actual_Designated_Subtype (gnat_actual));
- else
-   gnu_temp_type = TREE_TYPE (gnu_name);
 
  /* Do not initialize it for the _Init parameter of an initialization
 procedure since no data is meant to be passed in.  */
  if (Ekind (gnat_formal) == E_Out_Parameter
  && Is_Entity_Name (gnat_subprog)
  && Is_Init_Proc (Entity (gnat_subprog)))
-   gnu_name = gnu_temp = create_temporary ("A", gnu_temp_type);
+   

[COMMITTED] ada: Disable PIE mode during the build of the Ada front-end

2023-05-30 Thread Marc Poulhiès via Gcc-patches
From: Eric Botcazou 

This also removes some obsolete stuff.

gcc/ada/

* gcc-interface/Make-lang.in (ADA_CFLAGS): Move up.
(ALL_ADAFLAGS): Add $(NO_PIE_CFLAGS).
(ada/mdll.o): Remove.
(ada/mdll-fil.o): Likewise.
(ada/mdll-utl.o): Likewise.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gcc-interface/Make-lang.in | 16 +++-
 1 file changed, 3 insertions(+), 13 deletions(-)

diff --git a/gcc/ada/gcc-interface/Make-lang.in b/gcc/ada/gcc-interface/Make-lang.in
index 7b826f2366f..d7bab7d3ce8 100644
--- a/gcc/ada/gcc-interface/Make-lang.in
+++ b/gcc/ada/gcc-interface/Make-lang.in
@@ -71,10 +71,11 @@ else
   ADAFLAGS=$(COMMON_ADAFLAGS)
 endif
 
+ADA_CFLAGS =
 ALL_ADAFLAGS = \
-  $(CFLAGS) $(ADA_CFLAGS) $(ADAFLAGS) $(CHECKING_ADAFLAGS) $(WARN_ADAFLAGS)
+  $(CFLAGS) $(NO_PIE_CFLAGS) $(ADA_CFLAGS) \
+  $(ADAFLAGS) $(CHECKING_ADAFLAGS) $(WARN_ADAFLAGS)
 FORCE_DEBUG_ADAFLAGS = -g
-ADA_CFLAGS =
 COMMON_ADA_INCLUDES = -I- -I. -Iada/generated -Iada -I$(srcdir)/ada
 
 STAGE1_LIBS=
@@ -1174,17 +1175,6 @@ ada/gnatvsn.o : ada/gnatvsn.adb ada/generated/gnatvsn.ads
$(CC) -c $(ALL_ADAFLAGS) $(ADA_INCLUDES) $< $(ADA_OUTPUT_OPTION)
@$(ADA_DEPS)
 
-# Dependencies for windows specific tool (mdll)
-
-ada/mdll.o : ada/mdll.adb ada/mdll.ads ada/mdll-fil.ads ada/mdll-utl.ads
-   $(CC) -c $(ALL_ADAFLAGS) $(ADA_INCLUDES) $< $(ADA_OUTPUT_OPTION)
-
-ada/mdll-fil.o : ada/mdll-fil.adb ada/mdll.ads ada/mdll-fil.ads
-   $(CC) -c $(ALL_ADAFLAGS) $(ADA_INCLUDES) $< $(ADA_OUTPUT_OPTION)
-
-ada/mdll-utl.o : ada/mdll-utl.adb ada/mdll.ads ada/mdll-utl.ads ada/sdefault.ads ada/types.ads
-   $(CC) -c $(ALL_ADAFLAGS) $(ADA_INCLUDES) $< $(ADA_OUTPUT_OPTION)
-
 # All generated files.  Perhaps we should build all of these in the same
 # subdirectory, and get rid of ada/bldtools.
 # Warning: the files starting with ada/gnat.ads are not really generated,
-- 
2.40.0



[COMMITTED] ada: Adjust again the implementation of storage models

2023-05-30 Thread Marc Poulhiès via Gcc-patches
From: Eric Botcazou 

The code generator must now be prepared to translate assignment statements
to objects allocated with a storage model and that are not initialized yet.

gcc/ada/

* gcc-interface/trans.cc (Attribute_to_gnu) : Tweak.
(gnat_to_gnu) : Declare a local variable.
For a target with a storage model, use the Actual_Designated_Subtype
to compute the size if it is present.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gcc-interface/trans.cc | 51 +++---
 1 file changed, 29 insertions(+), 22 deletions(-)

diff --git a/gcc/ada/gcc-interface/trans.cc b/gcc/ada/gcc-interface/trans.cc
index 92c8dc33af8..4e5f26305f5 100644
--- a/gcc/ada/gcc-interface/trans.cc
+++ b/gcc/ada/gcc-interface/trans.cc
@@ -1945,24 +1945,20 @@ Attribute_to_gnu (Node_Id gnat_node, tree *gnu_result_type_p, int attribute)
  /* If this is a dereference and we have a special dynamic constrained
 subtype on the prefix, use it to compute the size; otherwise, use
 the designated subtype.  */
- if (Nkind (gnat_prefix) == N_Explicit_Dereference)
+ if (Nkind (gnat_prefix) == N_Explicit_Dereference
+ && Present (Actual_Designated_Subtype (gnat_prefix)))
{
- Node_Id gnat_actual_subtype
-   = Actual_Designated_Subtype (gnat_prefix);
+ tree gnu_actual_obj_type
+   = gnat_to_gnu_type (Actual_Designated_Subtype (gnat_prefix));
  tree gnu_ptr_type
= TREE_TYPE (gnat_to_gnu (Prefix (gnat_prefix)));
 
- if (TYPE_IS_FAT_OR_THIN_POINTER_P (gnu_ptr_type)
- && Present (gnat_actual_subtype))
-   {
- tree gnu_actual_obj_type
-   = gnat_to_gnu_type (gnat_actual_subtype);
- gnu_type
-   = build_unc_object_type_from_ptr (gnu_ptr_type,
- gnu_actual_obj_type,
- get_identifier ("SIZE"),
- false);
-   }
+ if (TYPE_IS_FAT_OR_THIN_POINTER_P (gnu_ptr_type))
+   gnu_type
+ = build_unc_object_type_from_ptr (gnu_ptr_type,
+   gnu_actual_obj_type,
+   get_identifier ("SIZE"),
+   false);
}
 
  gnu_result = TYPE_SIZE (gnu_type);
@@ -7378,13 +7374,13 @@ gnat_to_gnu (Node_Id gnat_node)
   /* Otherwise we need to build the assignment statement manually.  */
   else
{
+ const Node_Id gnat_name = Name (gnat_node);
  const Node_Id gnat_expr = Expression (gnat_node);
  const Node_Id gnat_inner
= Nkind (gnat_expr) == N_Qualified_Expression
  ? Expression (gnat_expr)
  : gnat_expr;
- const Entity_Id gnat_type
-   = Underlying_Type (Etype (Name (gnat_node)));
+ const Entity_Id gnat_type = Underlying_Type (Etype (gnat_name));
  const bool use_memset_p
= Is_Array_Type (gnat_type)
  && Nkind (gnat_inner) == N_Aggregate
@@ -7409,8 +7405,8 @@ gnat_to_gnu (Node_Id gnat_node)
 
  gigi_checking_assert (!Do_Range_Check (gnat_expr));
 
- get_atomic_access (Name (gnat_node), &aa_type, &aa_sync);
- get_storage_model_access (Name (gnat_node), &gnat_smo);
+ get_atomic_access (gnat_name, &aa_type, &aa_sync);
+ get_storage_model_access (gnat_name, &gnat_smo);
 
  /* If an outer atomic access is required on the LHS, build the load-
 modify-store sequence.  */
@@ -7427,15 +7423,26 @@ gnat_to_gnu (Node_Id gnat_node)
  else if (Present (gnat_smo)
   && Present (Storage_Model_Copy_To (gnat_smo)))
{
+ tree gnu_size;
+
  /* We obviously cannot use memset in this case.  */
  gcc_assert (!use_memset_p);
 
- /* We cannot directly move between nonnative storage models.  */
- tree t = remove_conversions (gnu_rhs, false);
- gcc_assert (TREE_CODE (t) != LOAD_EXPR);
+ /* If this is a dereference with a special dynamic constrained
+subtype on the node, use it to compute the size.  */
+ if (Nkind (gnat_name) == N_Explicit_Dereference
+ && Present (Actual_Designated_Subtype (gnat_name)))
+   {
+ tree gnu_actual_obj_type
+   = gnat_to_gnu_type (Actual_Designated_Subtype (gnat_name));
+ gnu_size = TYPE_SIZE_UNIT (gnu_actual_obj_type);
+   }
+ else
+   gnu_size = NULL_TREE;
 
  gnu_result
-   = build_storage_model_store (gnat_smo, gnu_lhs, gnu_rhs);
+ 

[COMMITTED] ada: Make internal_error_function more robust

2023-05-30 Thread Marc Poulhiès via Gcc-patches
From: Eric Botcazou 

gcc/ada/

* gcc-interface/misc.cc (internal_error_function): Be prepared for
an input_location set to UNKNOWN_LOCATION.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gcc-interface/misc.cc | 22 --
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/gcc/ada/gcc-interface/misc.cc b/gcc/ada/gcc-interface/misc.cc
index 56c7bb9b533..30319ae58b1 100644
--- a/gcc/ada/gcc-interface/misc.cc
+++ b/gcc/ada/gcc-interface/misc.cc
@@ -330,13 +330,23 @@ internal_error_function (diagnostic_context *context, const char *msgid,
   sp.Bounds = &temp;
   sp.Array = buffer;
 
-  xloc = expand_location (input_location);
-  if (context->show_column && xloc.column != 0)
-loc = xasprintf ("%s:%d:%d", xloc.file, xloc.line, xloc.column);
+  if (input_location == UNKNOWN_LOCATION)
+{
+  loc = NULL;
+  temp_loc.Low_Bound = 1;
+  temp_loc.High_Bound = 0;
+}
   else
-loc = xasprintf ("%s:%d", xloc.file, xloc.line);
-  temp_loc.Low_Bound = 1;
-  temp_loc.High_Bound = strlen (loc);
+{
+  xloc = expand_location (input_location);
+  if (context->show_column && xloc.column != 0)
+   loc = xasprintf ("%s:%d:%d", xloc.file, xloc.line, xloc.column);
+  else
+   loc = xasprintf ("%s:%d", xloc.file, xloc.line);
+  temp_loc.Low_Bound = 1;
+  temp_loc.High_Bound = strlen (loc);
+}
+
   sp_loc.Bounds = &temp_loc;
   sp_loc.Array = loc;
 
-- 
2.40.0



[COMMITTED] ada: Fix minor issue with Mod operator

2023-05-30 Thread Marc Poulhiès via Gcc-patches
From: Eric Botcazou 

gcc/ada/

* gcc-interface/trans.cc (gnat_to_gnu) : Test the
precision of the operation rather than that of the result type.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gcc-interface/trans.cc | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/ada/gcc-interface/trans.cc b/gcc/ada/gcc-interface/trans.cc
index 8c8a78f5d2d..1c3c6c0618e 100644
--- a/gcc/ada/gcc-interface/trans.cc
+++ b/gcc/ada/gcc-interface/trans.cc
@@ -7095,9 +7095,9 @@ gnat_to_gnu (Node_Id gnat_node)
gnu_rhs = convert (gnu_count_type, gnu_rhs);
gnu_max_shift
  = convert (TREE_TYPE (gnu_rhs), TYPE_SIZE (gnu_type));
-   /* If the result type is larger than a word, then declare the dependence
-  on the libgcc routine.  */
-   if (TYPE_PRECISION (gnu_result_type) > BITS_PER_WORD)
+   /* If the result type is larger than a word, then declare the
+  dependence on the libgcc routine.  */
+   if (TYPE_PRECISION (gnu_type) > BITS_PER_WORD)
  Check_Restriction_No_Dependence_On_System (Name_Gcc, gnat_node);
  }
 
@@ -7114,7 +7114,7 @@ gnat_to_gnu (Node_Id gnat_node)
/* If this is a modulo/remainder and the result type is larger than a
   word, then declare the dependence on the libgcc routine.  */
else if ((kind == N_Op_Mod ||kind == N_Op_Rem)
-&& TYPE_PRECISION (gnu_result_type) > BITS_PER_WORD)
+&& TYPE_PRECISION (gnu_type) > BITS_PER_WORD)
  Check_Restriction_No_Dependence_On_System (Name_Gcc, gnat_node);
 
/* Pending generic support for efficient vector logical operations in
-- 
2.40.0



Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V

2023-05-30 Thread Robin Dapp via Gcc-patches
Hi Kito,

> GNU vector extensions are widely used around the world, and this patch
> enables that with the RISC-V vector extensions; this can help people
> leverage existing code bases with RVV and also write vector programs in a
> familiar way.
> 
> The idea of VLS code gen support is to emulate VLS operations with VLA
> operations of a specific length.
> 
> The key design point is that we defer the mode conversion (from VLS to VLA
> mode) until after register allocation, which comes with several advantages:
> - VLS patterns are much friendlier to most optimization passes like combine.
> - The register allocator can spill/restore the exact size of a VLS type
>   instead of the whole register.
> 
> This is compatible with VLA vectorization.
> 
> Only the move and binary parts of the operation patterns are supported.

On a high-level:  Why do we need to do it this way and not any other way? :)
Some more comments/explanations would definitely help, i.e. prior art on
aarch64, what exactly is easier for combine and friends now (no undef and so
on) and, importantly, why is the conversion after register allocation always
safe?  Couldn't we "lower" the fixed-length vectors to VLA at some point and
how does everything relate to fixed-vlmax? Essentially this is a "separate"
backend similar to ARM NEON but we share most of the things and possibly grow
it in the future?

What would the alternative be?

That said, couldn't we reuse the existing binop tests?  If you don't like them,
change the existing ones as well and reuse them?

> +/* Return the minimal containable VLA mode for MODE.  */
> +
> +machine_mode
> +minimal_vla_mode (machine_mode mode)
> +{
> +  gcc_assert (GET_MODE_NUNITS (mode).is_constant ());
> +  unsigned type_size = GET_MODE_NUNITS (mode).to_constant ();

Couldn't you use .require () right away?  Same in some other hunks.

Regards
 Robin



[COMMITTED] ada: Fix incorrect copies being used with 'Address

2023-05-30 Thread Marc Poulhiès via Gcc-patches
When using 'Address on an object with a size clause, gigi would end up
creating a copy and using its address instead of the one of the original
object, leading to incorrect behavior. Remove the conversion (that
triggers the copy) when 'Address is applied to a declaration.

gcc/ada/

* gcc-interface/trans.cc (Attribute_to_gnu): Also strip conversion
in case of DECL.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gcc-interface/trans.cc | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/gcc/ada/gcc-interface/trans.cc b/gcc/ada/gcc-interface/trans.cc
index 1c3c6c0618e..57933ceb8a3 100644
--- a/gcc/ada/gcc-interface/trans.cc
+++ b/gcc/ada/gcc-interface/trans.cc
@@ -1714,12 +1714,17 @@ Attribute_to_gnu (Node_Id gnat_node, tree *gnu_result_type_p, int attribute)
 case Attr_Address:
 case Attr_Unrestricted_Access:
   /* Conversions don't change the address of references but can cause
-build_unary_op to miss the references below, so strip them off.
+build_unary_op to miss the references below so strip them off.
+
+ Also remove the conversions applied to declarations as the intent is
+ to take the decls' address, not that of the copies that the
+ conversions may create.
+
 On the contrary, if the address-of operation causes a temporary
 to be created, then it must be created with the proper type.  */
   gnu_expr = remove_conversions (gnu_prefix,
 !Must_Be_Byte_Aligned (gnat_node));
-  if (REFERENCE_CLASS_P (gnu_expr))
+  if (REFERENCE_CLASS_P (gnu_expr) || DECL_P (gnu_expr))
gnu_prefix = gnu_expr;
 
   /* If we are taking 'Address of an unconstrained object, this is the
@@ -4575,7 +4580,7 @@ Call_to_gnu (Node_Id gnat_node, tree *gnu_result_type_p, tree gnu_target,
   /* The FUNCTION_TYPE node giving the GCC type of the subprogram.  */
   tree gnu_subprog_type = TREE_TYPE (gnu_subprog);
   /* The return type of the FUNCTION_TYPE.  */
-  tree gnu_result_type;;
+  tree gnu_result_type;
   const bool frontend_builtin
 = (TREE_CODE (gnu_subprog) == FUNCTION_DECL
&& DECL_BUILT_IN_CLASS (gnu_subprog) == BUILT_IN_FRONTEND);
@@ -4657,7 +4662,7 @@ Call_to_gnu (Node_Id gnat_node, tree *gnu_result_type_p, tree gnu_target,
 }
 
   /* We must elaborate the entire profile now because, if it references types
- that were initially incomplete,, their elaboration changes the contents
+ that were initially incomplete, their elaboration changes the contents
  of GNU_SUBPROG_TYPE and, in particular, may change the result type.  */
   elaborate_profile (gnat_formal, gnat_result_type);
 
-- 
2.40.0



[COMMITTED] ada: Fix bogus Storage_Error on dynamic array with static zero length

2023-05-30 Thread Marc Poulhiès via Gcc-patches
From: Eric Botcazou 

This works around the limitations present for the support of arrays in the
middle-end by clearing the TREE_OVERFLOW flag for arrays with zero length.

gcc/ada/

* gcc-interface/decl.cc (gnat_to_gnu_entity) : Use a
local variable for the GNAT index type.
: Likewise.  Call Is_Null_Range on the bounds and
force the zero on TYPE_SIZE and TYPE_SIZE_UNIT if it returns true.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gcc-interface/decl.cc | 25 +
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/gcc/ada/gcc-interface/decl.cc b/gcc/ada/gcc-interface/decl.cc
index 456fe53737d..e5e04ddad93 100644
--- a/gcc/ada/gcc-interface/decl.cc
+++ b/gcc/ada/gcc-interface/decl.cc
@@ -2241,9 +2241,10 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, bool definition)
 index += (convention_fortran_p ? - 1 : 1),
 gnat_index = Next_Index (gnat_index))
  {
+   const Entity_Id gnat_index_type = Etype (gnat_index);
const bool is_flb
- = Is_Fixed_Lower_Bound_Index_Subtype (Etype (gnat_index));
-   tree gnu_index_type = get_unpadded_type (Etype (gnat_index));
+ = Is_Fixed_Lower_Bound_Index_Subtype (gnat_index_type);
+   tree gnu_index_type = get_unpadded_type (gnat_index_type);
tree gnu_orig_min = TYPE_MIN_VALUE (gnu_index_type);
tree gnu_orig_max = TYPE_MAX_VALUE (gnu_index_type);
tree gnu_index_base_type = get_base_type (gnu_index_type);
@@ -2479,6 +2480,7 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, bool definition)
  const int ndim = Number_Dimensions (gnat_entity);
  tree gnu_base_type = gnu_type;
  tree *gnu_index_types = XALLOCAVEC (tree, ndim);
+ bool *gnu_null_ranges = XALLOCAVEC (bool, ndim);
  tree gnu_max_size = size_one_node;
  bool need_index_type_struct = false;
  int index;
@@ -2494,7 +2496,8 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, bool definition)
   gnat_index = Next_Index (gnat_index),
   gnat_base_index = Next_Index (gnat_base_index))
{
- tree gnu_index_type = get_unpadded_type (Etype (gnat_index));
+ const Entity_Id gnat_index_type = Etype (gnat_index);
+ tree gnu_index_type = get_unpadded_type (gnat_index_type);
  tree gnu_orig_min = TYPE_MIN_VALUE (gnu_index_type);
  tree gnu_orig_max = TYPE_MAX_VALUE (gnu_index_type);
  tree gnu_index_base_type = get_base_type (gnu_index_type);
@@ -2671,6 +2674,13 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, bool definition)
= create_index_type (gnu_min, gnu_high, gnu_index_type,
 gnat_entity);
 
+ /* Record whether the range is known to be null at compile time
+to disambiguate it from too large ranges.  */
+ const Entity_Id gnat_ui_type = Underlying_Type (gnat_index_type);
+ gnu_null_ranges[index]
+   = Is_Null_Range (Type_Low_Bound (gnat_ui_type),
+Type_High_Bound (gnat_ui_type));
+
  /* We need special types for debugging information to point to
 the index types if they have variable bounds, are not integer
 types, are biased or are wider than sizetype.  These are GNAT
@@ -2737,7 +2747,14 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, bool definition)
  if (array_type_has_nonaliased_component (gnu_type, gnat_entity))
set_nonaliased_component_on_array_type (gnu_type);
 
- /* Kludge to remove the TREE_OVERFLOW flag for the sake of LTO
+ /* Clear the TREE_OVERFLOW flag, if any, for null arrays.  */
+ if (gnu_null_ranges[index])
+   {
+ TYPE_SIZE (gnu_type) = bitsize_zero_node;
+ TYPE_SIZE_UNIT (gnu_type) = size_zero_node;
+   }
+
+ /* Kludge to clear the TREE_OVERFLOW flag for the sake of LTO
 on maximally-sized array types designed by access types.  */
  if (integer_zerop (TYPE_SIZE (gnu_type))
  && TREE_OVERFLOW (TYPE_SIZE (gnu_type))
-- 
2.40.0



[COMMITTED] ada: Add missing guards for degenerate storage models

2023-05-30 Thread Marc Poulhiès via Gcc-patches
From: Eric Botcazou 

gcc/ada/

* gcc-interface/trans.cc (Attribute_to_gnu) : Check that
the storage model has Copy_From before instantiating loads for it.
: Likewise.
: Likewise.
(gnat_to_gnu) : Likewise.
: Likewise.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gcc-interface/trans.cc | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/gcc/ada/gcc-interface/trans.cc b/gcc/ada/gcc-interface/trans.cc
index 57933ceb8a3..18f7e73d45d 100644
--- a/gcc/ada/gcc-interface/trans.cc
+++ b/gcc/ada/gcc-interface/trans.cc
@@ -1978,7 +1978,8 @@ Attribute_to_gnu (Node_Id gnat_node, tree *gnu_result_type_p, int attribute)
   if (TREE_CODE (gnu_prefix) != TYPE_DECL)
{
  gnu_result = SUBSTITUTE_PLACEHOLDER_IN_EXPR (gnu_result, gnu_prefix);
- if (Present (gnat_smo))
+ if (Present (gnat_smo)
+ && Present (Storage_Model_Copy_From (gnat_smo)))
gnu_result = INSTANTIATE_LOAD_IN_EXPR (gnu_result, gnat_smo);
}
   else if (CONTAINS_PLACEHOLDER_P (gnu_result))
@@ -2211,7 +2212,8 @@ Attribute_to_gnu (Node_Id gnat_node, tree *gnu_result_type_p, int attribute)
   handling.  Note that these attributes could not have been used on
   an unconstrained array type.  */
gnu_result = SUBSTITUTE_PLACEHOLDER_IN_EXPR (gnu_result, gnu_prefix);
-   if (Present (gnat_smo))
+   if (Present (gnat_smo)
+   && Present (Storage_Model_Copy_From (gnat_smo)))
  gnu_result = INSTANTIATE_LOAD_IN_EXPR (gnu_result, gnat_smo);
 
/* Cache the expression we have just computed.  Since we want to do it
@@ -2373,7 +2375,8 @@ Attribute_to_gnu (Node_Id gnat_node, tree *gnu_result_type_p, int attribute)
/* If this has a PLACEHOLDER_EXPR, qualify it by the object we are
   handling.  */
gnu_result = SUBSTITUTE_PLACEHOLDER_IN_EXPR (gnu_result, gnu_prefix);
-   if (Present (gnat_smo))
+   if (Present (gnat_smo)
+   && Present (Storage_Model_Copy_From (gnat_smo)))
  gnu_result = INSTANTIATE_LOAD_IN_EXPR (gnu_result, gnat_smo);
break;
   }
@@ -6701,7 +6704,8 @@ gnat_to_gnu (Node_Id gnat_node)
gnu_result
  = build_binary_op (ARRAY_REF, NULL_TREE, gnu_result, gnu_expr);
 
-   if (Present (gnat_smo))
+   if (Present (gnat_smo)
+   && Present (Storage_Model_Copy_From (gnat_smo)))
  instantiate_load_in_array_ref (gnu_result, gnat_smo);
  }
 
@@ -6746,7 +6750,8 @@ gnat_to_gnu (Node_Id gnat_node)
gnu_result = build_binary_op (ARRAY_RANGE_REF, gnu_result_type,
  gnu_array_object, gnu_expr);
 
-   if (Present (gnat_smo))
+   if (Present (gnat_smo)
+   && Present (Storage_Model_Copy_From (gnat_smo)))
  instantiate_load_in_array_ref (gnu_result, gnat_smo);
 
/* If storage model access is required on the RHS, build the load.  */
-- 
2.40.0



Re: [PATCH] VECT: Add SELECT_VL support

2023-05-30 Thread juzhe.zh...@rivai.ai
Hi, this patch passes bootstrap.

OK for trunk?

Thanks.


juzhe.zh...@rivai.ai
 
From: juzhe.zhong
Date: 2023-05-25 23:26
To: gcc-patches
CC: richard.sandiford; rguenther; Ju-Zhe Zhong
Subject: [PATCH] VECT: Add SELECT_VL support
From: Ju-Zhe Zhong 
 
This patch adds SELECT_VL middle-end support to
allow targets to apply target-dependent optimizations to the
length calculation.
 
This patch is inspired by RVV ISA and LLVM:
https://reviews.llvm.org/D99750
 
SELECT_VL has the same behavior as LLVM's "get_vector_length", with
the following properties:
 
1. Only applies to single-rgroup.
2. Non-SLP.
3. Adjusts the loop control IV.
4. Adjusts the data reference IV.
5. Allows processing fewer than vf elements in a non-final iteration.
 
Code:
   # void vvaddint32(size_t n, const int*x, const int*y, int*z)
# { for (size_t i=0; i<n; i++) { z[i]=x[i]+y[i]; } }
-_36 = MIN_EXPR ;
+_36 = (MIN_EXPR | SELECT_VL) ;
   ...
   vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0);
   ...
@@ -551,9 +551,14 @@ vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
   /* Create decrement IV.  */
   create_iv (nitems_total, MINUS_EXPR, step, NULL_TREE, loop, &incr_gsi,
insert_after, &index_before_incr, &index_after_incr);
-  gimple_seq_add_stmt (header_seq, gimple_build_assign (step, MIN_EXPR,
- index_before_incr,
- nitems_step));
+  tree len = NULL_TREE;
+  if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
+ len = gimple_build (header_seq, IFN_SELECT_VL, iv_type,
+ index_before_incr, nitems_step);
+  else
+ len = gimple_build (header_seq, MIN_EXPR, iv_type, index_before_incr,
+ nitems_step);
+  gimple_seq_add_stmt (header_seq, gimple_build_assign (step, len));
   *iv_step = step;
   return index_after_incr;
 }
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 5b7a0da0034..f67340976c8 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -974,6 +974,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
 can_use_partial_vectors_p (param_vect_partial_vector_usage != 0),
 using_partial_vectors_p (false),
 using_decrementing_iv_p (false),
+using_select_vl_p (false),
 epil_using_partial_vectors_p (false),
 partial_load_store_bias (0),
 peeling_for_gaps (false),
@@ -2737,6 +2738,14 @@ start_over:
LOOP_VINFO_VECT_FACTOR (loop_vinfo
 LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) = true;
+  /* If we're using a decrement IV and SELECT_VL is supported by the target,
+ use the output of SELECT_VL to adjust the IV of loop control and data reference.
+ Note: We only use SELECT_VL on single-rgroup control.  */
+  if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo)
+  && LOOP_VINFO_LENS (loop_vinfo).length () == 1
+  && !slp)
+LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo) = true;
+
   /* If we're vectorizing an epilogue loop, the vectorized loop either needs
  to be able to handle fewer than VF scalars, or needs to have a lower VF
  than the main loop.  */
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 127b987cd62..8e8b0f71a4a 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -3147,6 +3147,61 @@ vect_get_data_ptr_increment (vec_info *vinfo,
   return iv_step;
}
+/* Prepare the pointer IVs which needs to be updated by a variable amount.
+   Such variable amount is the outcome of .SELECT_VL. In this case, we can
+   allow each iteration process the flexible number of elements as long as
+   the number <= vf elments.
+
+   Return data reference according to SELECT_VL.
+   If new statements are needed, insert them before GSI.  */
+
+static tree
+get_select_vl_data_ref_ptr (vec_info *vinfo, stmt_vec_info stmt_info,
+ tree aggr_type, class loop *at_loop, tree offset,
+ tree *dummy, gimple_stmt_iterator *gsi,
+ bool simd_lane_access_p, vec_loop_lens *loop_lens,
+ dr_vec_info *dr_info,
+ vect_memory_access_type memory_access_type)
+{
+  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
+  tree step = vect_dr_behavior (vinfo, dr_info)->step;
+
+  /* TODO: We don't support gather/scatter or load_lanes/store_lanes for pointer
+ IVs are updated by variable amount but we will support them in the future.
+   */
+  gcc_assert (memory_access_type != VMAT_GATHER_SCATTER
+   && memory_access_type != VMAT_LOAD_STORE_LANES);
+
+  /* When we support the SELECT_VL pattern, we dynamically adjust
+ the memory address by the .SELECT_VL result.
+
+ The result of .SELECT_VL is the number of elements to
+ be processed of each iteration. So the memory address
+ adjustment operation should be:
+
+ bytesize = GET_MODE_SIZE (element_mode (aggr_type));
+ addr = addr + .SELECT_VL (ARG..) * bytesize;
+  */
+  gimple *ptr_incr;
+  tree loop_len
+= vect_get_loop_len (loop_vinfo, gsi, loop_lens, 1, aggr_type, 0, 0);
+  tree len_type = TREE_TYPE (loop_len);
+  poly_uint64 bytesize = GET_MODE_SIZE (element_mode (aggr_type));
+  /* Since the outcome of .SELECT_VL is elem

Re: [PATCH] btf: improve -dA comments for testsuite

2023-05-30 Thread Indu Bhagat via Gcc-patches

On 5/25/23 9:37 AM, David Faust via Gcc-patches wrote:

Many BTF type kinds refer to other types via index to the final types
list. However, the order of the final types list is not guaranteed to
remain the same for the same source program between different runs of
the compiler, making it difficult to test inter-type references.

This patch updates the assembler comments output when writing a
given BTF record to include minimal information about the referenced
type, if any. This allows for the regular expressions used in the gcc
testsuite to do some basic integrity checks on inter-type references.

For example, for the type

unsigned int *

Assembly comments like the following are written with -dA:

.4byte  0   ; TYPE 2 BTF_KIND_PTR ''
.4byte  0x200   ; btt_info: kind=2, kflag=0, vlen=0
.4byte  0x1 ; btt_type: (BTF_KIND_INT 'unsigned int')

Several BTF tests which can immediately be made more robust with this
change are updated. It will also be useful in new tests for the upcoming
btf_type_tag support.
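
For example, a test can now match a reference by both kind and referenced-type
name.  A hypothetical directive along these lines (illustrative, not one of
the updated tests):

  /* { dg-final { scan-assembler "btt_type: \\(BTF_KIND_INT 'unsigned int'\\)" } } */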



Thanks for working on this, David.  It will be nice to use these
enhanced assembler comments in the output for some of the CTF testing as
well sometime.  But we can get to that later after this commit.


Some comments inlined below.


Tested on BPF and x86_64, no known regressions.
OK for trunk?

Thanks.

gcc/

* btfout.cc (btf_kind_names): New.
(btf_kind_name): New.
(btf_absolute_var_id): New utility function.
(btf_relative_var_id): Likewise.
(btf_relative_func_id): Likewise.
(btf_absolute_datasec_id): Likewise.
(btf_asm_type_ref): New.
(btf_asm_type): Update asm comments and use btf_asm_type_ref ().
(btf_asm_array): Likewise. Accept ctf_container_ref parameter.
(btf_asm_varent): Likewise.
(btf_asm_func_arg): Likewise.
(btf_asm_datasec_entry): Likewise.
(btf_asm_datasec_type): Likewise.
(btf_asm_func_type): Likewise. Add index parameter.
(btf_asm_sou_member): Likewise.
(output_btf_vars): Update btf_asm_* call accordingly.
(output_asm_btf_sou_fields): Likewise.
(output_asm_btf_func_args_list): Likewise.
(output_asm_btf_vlen_bytes): Likewise.
(output_btf_func_types): Add ctf_container_ref parameter.
Pass it to btf_asm_func_type.
(output_btf_datasec_types): Update btf_asm_datsec_type call similarly.
(btf_output): Update output_btf_func_types call similarly.

gcc/testsuite/

* gcc.dg/debug/btf/btf-array-1.c: Use new BTF asm comments
in scan-assembler expressions where useful.
* gcc.dg/debug/btf/btf-anonymous-struct-1.c: Likewise.
* gcc.dg/debug/btf/btf-anonymous-union-1.c: Likewise.
* gcc.dg/debug/btf/btf-bitfields-2.c: Likewise.
* gcc.dg/debug/btf/btf-bitfields-3.c: Likewise.
* gcc.dg/debug/btf/btf-function-6.c: Likewise.
* gcc.dg/debug/btf/btf-pointers-1.c: Likewise.
* gcc.dg/debug/btf/btf-struct-1.c: Likewise.
* gcc.dg/debug/btf/btf-struct-2.c: Likewise.
* gcc.dg/debug/btf/btf-typedef-1.c: Likewise.
* gcc.dg/debug/btf/btf-union-1.c: Likewise.
* gcc.dg/debug/btf/btf-variables-1.c: Likewise.
* gcc.dg/debug/btf/btf-variables-2.c: Likewise. Update outdated comment.
* gcc.dg/debug/btf/btf-function-3.c: Update outdated comment.
---
  gcc/btfout.cc | 220 ++
  .../gcc.dg/debug/btf/btf-anonymous-struct-1.c |   3 +-
  .../gcc.dg/debug/btf/btf-anonymous-union-1.c  |   4 +-
  gcc/testsuite/gcc.dg/debug/btf/btf-array-1.c  |   3 +
  .../gcc.dg/debug/btf/btf-bitfields-2.c|   2 +-
  .../gcc.dg/debug/btf/btf-bitfields-3.c|   2 +-
  .../gcc.dg/debug/btf/btf-function-3.c |   2 +-
  .../gcc.dg/debug/btf/btf-function-6.c |   4 +-
  .../gcc.dg/debug/btf/btf-pointers-1.c |   3 +
  gcc/testsuite/gcc.dg/debug/btf/btf-struct-1.c |   4 +-
  gcc/testsuite/gcc.dg/debug/btf/btf-struct-2.c |   2 +-
  .../gcc.dg/debug/btf/btf-typedef-1.c  |  14 +-
  gcc/testsuite/gcc.dg/debug/btf/btf-union-1.c  |   2 +-
  .../gcc.dg/debug/btf/btf-variables-1.c|   6 +
  .../gcc.dg/debug/btf/btf-variables-2.c|   7 +-
  15 files changed, 215 insertions(+), 63 deletions(-)

diff --git a/gcc/btfout.cc b/gcc/btfout.cc
index 497c1ca06e6..8960acfbbaa 100644
--- a/gcc/btfout.cc
+++ b/gcc/btfout.cc
@@ -114,6 +114,23 @@ static unsigned int num_types_added = 0;
 CTF types.  */
  static unsigned int num_types_created = 0;
  
+/* Name strings for BTF kinds.

+   Note: the indices here must match the type defines in btf.h.  */
+static const char *const btf_kind_names[] =
+  {
+"UNKN", "INT", "PTR", "ARRAY", "STRUCT", "UNION", "ENUM", "FWD",
+"TYPEDEF", "VOLATILE", "CONST", "RESTRICT", "FUNC", "FUNC_PROTO",
+"VAR", "DATASEC", "FLOAT", "DECL_TAG", "TYPE_TAG", "ENUM64"
+  };
+
+/* Return a name string 

Re: [PATCH v1] tree-ssa-sink: Improve code sinking pass.

2023-05-30 Thread Ajit Agarwal via Gcc-patches
Hello Richard:

On 30/05/23 12:34 pm, Richard Biener wrote:
> On Tue, May 30, 2023 at 7:06 AM Ajit Agarwal  wrote:
>>
>> Hello Richard:
>>
>> On 22/05/23 6:26 pm, Richard Biener wrote:
>>> On Thu, May 18, 2023 at 9:14 AM Ajit Agarwal  wrote:

 Hello All:

 This patch improves the code sinking pass to sink statements before calls
 to reduce register pressure.
 Review comments are incorporated.

 Bootstrapped and regtested on powerpc64-linux-gnu.

 Thanks & Regards
 Ajit


 tree-ssa-sink: Improve code sinking pass.

 Code sinking sinks statements into blocks after a call, which increases
 register pressure on callee-saved registers. This patch improves
 code sinking by sinking before the call, into the use blocks or the
 immediate dominator of the use blocks.

 2023-05-18  Ajit Kumar Agarwal  

 gcc/ChangeLog:

 * tree-ssa-sink.cc (statement_sink_location): Modified to
 move statements before calls.
 (block_call_p): New function.
 (def_use_same_block): New function.
 (select_best_block): Add heuristics to select the best
 blocks in the immediate post dominator.

 gcc/testsuite/ChangeLog:

 * gcc.dg/tree-ssa/ssa-sink-20.c: New testcase.
 * gcc.dg/tree-ssa/ssa-sink-21.c: New testcase.
 ---
  gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c |  16 ++
  gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c |  20 +++
  gcc/tree-ssa-sink.cc| 159 ++--
  3 files changed, 185 insertions(+), 10 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c

 diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c
 new file mode 100644
 index 000..716bc1f9257
 --- /dev/null
 +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c
 @@ -0,0 +1,16 @@
 +/* { dg-do compile } */
 +/* { dg-options "-O2 -fdump-tree-sink -fdump-tree-optimized -fdump-tree-sink-stats" } */
 +
 +void bar();
 +int j;
 +void foo(int a, int b, int c, int d, int e, int f)
 +{
 +  int l;
 +  l = a + b + c + d +e + f;
 +  if (a != 5)
 +{
 +  bar();
 +  j = l;
 +}
 +}
 +/* { dg-final { scan-tree-dump-times "Sunk statements: 5" 1 "sink" } } */
>>>
>>> this doesn't verify the place we sink to?
>>>
>>
>> I am not sure how to verify the place we sink to with dg-final.
> 
> I think dejagnu supports matching multi-line regexps so I suggest
> to scan for the sunk expr RHS to be followed by the call?
> 

Do you mean dg-begin-multiline-output and dg-end-multiline-output?
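
For illustration, a multi-line match along the lines Richard suggests might
look like this (the SSA name and regex details are invented, not from the
patch):

  /* { dg-final { scan-tree-dump "l_\[0-9\]+ = .*;\\n.*bar \\(\\);" "sink" } } */

i.e. require the sunk statement to appear immediately before the call to bar.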

Thanks & Regards
Ajit
 diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
 new file mode 100644
 index 000..ff41e2ea8ae
 --- /dev/null
 +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
 @@ -0,0 +1,20 @@
 +/* { dg-do compile } */
 +/* { dg-options "-O2 -fdump-tree-sink-stats -fdump-tree-sink-stats" } */
 +
 +void bar();
 +int j, x;
 +void foo(int a, int b, int c, int d, int e, int f)
 +{
 +  int l;
 +  l = a + b + c + d +e + f;
 +  if (a != 5)
 +{
 +  bar();
 +  if (b != 3)
 +x = 3;
 +  else
 +x = 5;
 +  j = l;
 +}
 +}
 +/* { dg-final { scan-tree-dump-times "Sunk statements: 5" 1 "sink" } } */
>>>
>>> likewise.  So both tests already pass before the patch?
>>>
 diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
 index 87b1d40c174..76556e7795b 100644
 --- a/gcc/tree-ssa-sink.cc
 +++ b/gcc/tree-ssa-sink.cc
 @@ -171,6 +171,72 @@ nearest_common_dominator_of_uses (def_operand_p def_p, bool *debug_stmts)
return commondom;
  }

 +/* Return TRUE if immediate uses of the defs in
 +   USE occur in the same block as USE, FALSE otherwise.  */
 +
 +bool
 +def_use_same_block (gimple *stmt)
 +{
 +  use_operand_p use_p;
 +  def_operand_p def_p;
 +  imm_use_iterator imm_iter;
 +  ssa_op_iter iter;
 +
 +  FOR_EACH_SSA_DEF_OPERAND (def_p, stmt, iter, SSA_OP_DEF)
 +{
 +  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, DEF_FROM_PTR (def_p))
 +   {
 + if (is_gimple_debug (USE_STMT (use_p)))
 +   continue;
 +
 + if (use_p
>>>
>>> use_p is never null
>>>
 + && (gimple_bb (USE_STMT (use_p)) == gimple_bb (stmt)))
 +   return true;
>>>
>>> the function behavior is obviously odd ...
>>>
 +   }
 + }
 +  return false;
 +}
 +
 +/* Return TRUE if the block has only calls, FALSE otherwise. */
 +
 +bool
 +block_call_p (basic_block bb)
 +{
 +  in

Re: [x86_64 PATCH] PR target/109973: CCZmode and CCCmode variants of [v]ptest.

2023-05-30 Thread Uros Bizjak via Gcc-patches
On Mon, May 29, 2023 at 8:17 PM Roger Sayle  wrote:
>
>
> This is my proposed minimal fix for PR target/109973 (hopefully suitable
> for backporting) that follows Jakub Jelinek's suggestion that we introduce
> CCZmode and CCCmode variants of ptest and vptest, so that the i386
> backend treats [v]ptest instructions similarly to testl instructions;
> using different CCmodes to indicate which condition flags are desired,
> and then relying on the RTL cmpelim pass to eliminate redundant tests.
>
> This conveniently matches Intel's intrinsics, that provide different
> functions for retrieving different flags, _mm_testz_si128 tests the
> Z flag, _mm_testc_si128 tests the carry flag.  Currently we use the
> same instruction (pattern) for both, and unfortunately the *ptest_and
> optimization is only valid when the ptest/vptest instruction is used to
> set/test the Z flag.
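
For context, the two intrinsics in question, as commonly used via smmintrin.h
(an illustrative snippet, not part of the patch):

  #include <smmintrin.h>

  int uses_zf (__m128i a, __m128i b) { return _mm_testz_si128 (a, b); } /* ZF: (a & b) == 0 */
  int uses_cf (__m128i a, __m128i b) { return _mm_testc_si128 (a, b); } /* CF: (~a & b) == 0 */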
>
> The downside, as predicted by Jakub, is that GCC's cmpelim pass is
> currently COMPARE-centric and not able to merge the ptests from expressions
> such as _mm256_testc_si256 (a, b) + _mm256_testz_si256 (a, b), which is a
> known issue, PR target/80040.  I've some follow-up patches to improve
> things, but this first patch fixes the wrong-code regression, replacing
> it with a rare missed-optimization (hopefully suitable for GCC 13).
>
> The only change that was unanticipated was the tweak to ix86_match_ccmode.
> Oddly, CCZmode is allowable for CCmode, but CCCmode isn't.  Given that
> CCZmode means just the Z flag, CCCmode means just the C flag, and
> CCmode means all the flags, I'm guessing this asymmetry is unintentional.
> Perhaps a super-safe fix is to explicitly test for CCZmode, CCCmode or CCmode
> in the *_ptest pattern's predicate, and not attempt to
> re-use ix86_match_ccmode?

It is actually the other way. CCZmode should NOT be allowed for CCmode
in ix86_match_ccmode. When CCmode is requested, we don't assume
anything about FLAGS bits, so we expect all bits to be valid. CCZmode
implies only Z bit, and should be compatible only with itself. So, the
"break;" is in the wrong place, it should be before E_CCZmode.

Uros.

> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2023-05-29  Roger Sayle  
>
> gcc/ChangeLog
> PR target/109973
> * config/i386/i386-builtin.def (__builtin_ia32_ptestz128): Use new
> CODE_for_sse4_1_ptestzv2di.
> (__builtin_ia32_ptestc128): Use new CODE_for_sse4_1_ptestcv2di.
> (__builtin_ia32_ptestz256): Use new CODE_for_avx_ptestzv4di.
> (__builtin_ia32_ptestc256): Use new CODE_for_avx_ptestcv4di.
> * config/i386/i386-expand.cc (ix86_expand_branch): Use CCZmode
> when expanding UNSPEC_PTEST to compare against zero.
> * config/i386/i386-features.cc (scalar_chain::convert_compare):
> Likewise generate CCZmode UNSPEC_PTESTs when converting comparisons.
> (general_scalar_chain::convert_insn): Use CCZmode for COMPARE
> result.
> (timode_scalar_chain::convert_insn): Use CCZmode for COMPARE result.
> * config/i386/i386.cc (ix86_match_ccmode): Allow the SET_SRC to be
> an UNSPEC, in addition to a COMPARE.  Consider CCCmode to be a form
> of CCmode.
> * config/i386/sse.md (define_split): When splitting UNSPEC_MOVMSK
> to UNSPEC_PTEST, preserve the FLAG_REG mode as CCZ.
> (*_ptest): Add asterisk to hide define_insn.
> Remove ":CC" flags specification, and use ix86_match_ccmode instead.
> (_ptestz): New define_expand to specify CCZ.
> (_ptestc): New define_expand to specify CCC.
> (_ptest): A define_expand using CC to preserve the
> current behavior.
> (*ptest_and): Specify CCZ to only perform this optimization
> when only the Z flag is required.
>
> gcc/testsuite/ChangeLog
> PR target/109973
> * gcc.target/i386/pr109973-1.c: New test case.
> * gcc.target/i386/pr109973-2.c: Likewise.
>
>
> Thanks,
> Roger
> --
>


Re: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V

2023-05-30 Thread juzhe.zh...@rivai.ai
>> why is the conversion after register allocation always
>> safe?
I do worry about this issue too.
I just noticed:

+   case MEM:
+ operands[i] = change_address (operands[i], vla_mode, NULL_RTX);

I am not sure whether it is safe.

>> Couldn't we "lower" the fixed-length vectors to VLA at some point and
>> how does everything relate to fixed-vlmax?

I can answer why we need this patch (I call it fixed-vlmin).
You can take a look at this example:
https://godbolt.org/z/3jYqoM84h 

This is how LLVM works.
In this example, you can see GCC needs --param=riscv-autovec-preference=fixed-vlmax
-march=rv64gcv (same as mrvv-vector-bits=128).
However, LLVM doesn't need the vector length to be specified.

The benefits:
1. We don't need to specify the actual vector length, so we can vectorize
this example.
2. The GCC code can only run on a CPU with vector length = 128, whereas the
LLVM code can run on any RVV CPU with vector length >= 128.
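
As a reminder of the style of code this enables, a minimal GNU-vector-extension
sketch (a generic example, not taken from the patch or the godbolt link):

  typedef int v4si __attribute__ ((vector_size (16)));

  v4si
  add_v4si (v4si a, v4si b)
  {
    /* A VLS operation; the patch emulates it with a VLA operation of the
       appropriate fixed length after register allocation.  */
    return a + b;
  }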

Thanks.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-05-30 15:27
To: Kito Cheng; gcc-patches; palmer; kito.cheng; juzhe.zhong; jeffreyalaw; 
pan2.li
CC: rdapp.gcc
Subject: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V
Hi Kito,
 
> GNU vector extensions are widely used around the world, and this patch
> enables that with the RISC-V vector extensions; this can help people
> leverage existing code bases with RVV and also write vector programs in a
> familiar way.
> 
> The idea of VLS code gen support is to emulate VLS operations with VLA
> operations of a specific length.
> 
> The key design point is that we defer the mode conversion (from VLS to VLA
> mode) until after register allocation, which comes with several advantages:
> - VLS patterns are much friendlier to most optimization passes like combine.
> - The register allocator can spill/restore the exact size of a VLS type
>   instead of the whole register.
> 
> This is compatible with VLA vectorization.
> 
> Only the move and binary parts of the operation patterns are supported.
 
On a high-level:  Why do we need to do it this way and not any other way? :)
Some more comments/explanations would definitely help, i.e. prior art on
aarch64, what exactly is easier for combine and friends now (no undef and so
on) and, importantly, why is the conversion after register allocation always
safe?  Couldn't we "lower" the fixed-length vectors to VLA at some point and
how does everything relate to fixed-vlmax? Essentially this is a "separate"
backend similar to ARM NEON but we share most of the things and possibly grow
it in the future?
 
What would the alternative be?
 
That said, couldn't we reuse the existing binop tests?  If you don't like them,
change the existing ones as well and reuse them?
 
> +/* Return the minimal containable VLA mode for MODE.  */
> +
> +machine_mode
> +minimal_vla_mode (machine_mode mode)
> +{
> +  gcc_assert (GET_MODE_NUNITS (mode).is_constant ());
> +  unsigned type_size = GET_MODE_NUNITS (mode).to_constant ();
 
Couldn't you use .require () right away?  Same in some other hunks.
 
Regards
Robin
 
 


Re: Re: [PATCH 1/1] [V2] [RISC-V] support cm.push cm.pop cm.popret in zcmp

2023-05-30 Thread Fei Gao
On 2023-05-30 13:26  Sinan  wrote:
>
>>> +/* Return TRUE if Zcmp push and pop insns should be
>>> + avoided. FALSE otherwise.
>>> + Only use multi push & pop if all GPRs masked can be covered,
>>> + and stack access is SP based,
>>> + and GPRs are at top of the stack frame,
>>> + and no conflicts in stack allocation with other features */
>>> +static bool
>>> +riscv_avoid_multi_push(const struct riscv_frame_info *frame)
>>> +{
>>> + if (!TARGET_ZCMP
>>> + || crtl->calls_eh_return
>>> + || frame_pointer_needed
>>> + || cfun->machine->interrupt_handler_p
>>> + || cfun->machine->varargs_size != 0
>>> + || crtl->args.pretend_args_size != 0
>>> + || (frame->mask & ~ MULTI_PUSH_GPR_MASK))
>>> + return true;
>>> +
>>> + return false;
>>> +}
>Any reason to skip generating push/pop in the cases where a frame pointer is 
>needed?
>IIRC, only code compiled with -O1 and above will omit the frame pointer; if
>so, then code compiled at -O0 will never generate cm.push/pop.
Without -fomit-frame-pointer at -O0, stack access is s0-based, while
cm.push/pop uses sp-based access, so cm.push/pop will not be generated.
This is the same logic save-restore takes.
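For illustration, a Zcmp push such as "cm.push {ra, s0-s2}, -32" (register
list and stack adjustment invented here) both saves the registers and moves
the stack purely through sp, so it cannot serve s0-based frame accesses.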

>Same question for interrupt_handler_p. I think cm.push/pop can handle this 
>case. e.g.
>the test case zc-zcmp-push-pop-6.c from Jiawei's patch. 
The same logic as save-restore is used. I don't know the exact reason why
save-restore cannot be used in interrupt handlers.
In riscv_compute_frame_info, riscv_stack_align (num_save_restore *
UNITS_PER_WORD) == x_save_size fails in most cases for interrupt handlers.
          if (riscv_stack_align (num_save_restore * UNITS_PER_WORD) == 
x_save_size
              && !riscv_avoid_save_libcall ())
            {
              ...
              frame->save_libcall_adjustment = x_save_size;
            }
In my understanding, save-restore is used only if all regs to be saved can be 
covered. That's why I added (frame->mask & ~ MULTI_PUSH_GPR_MASK)
in riscv_avoid_multi_push.

BR, 
Fei

>BR,
>Sinan 

>--
>Sender:Fei Gao 
>Sent At:2023 May 16 (Tue.) 17:34
>Recipient:sinan.lin ; jiawei 
>; shihua ; lidie 
>
>Cc:Fei Gao 
>Subject:[PATCH 1/1] [V2] [RISC-V] support cm.push cm.pop cm.popret in zcmp
>Zcmp can share the same logic as save-restore in stack allocation: 
>pre-allocation
>by cm.push, step 1 and step 2.
>Please note that cm.push pushes ra, s0-s11 in the reverse order of what 
>save-restore does.
>So the .cfi directives have been adapted in my patch.
>gcc/ChangeLog:
> * config/riscv/predicates.md (slot_0_offset_operand): predicates for slot 0 
> offset.
> (slot_1_offset_operand): likewise
> (slot_2_offset_operand): likewise
> (slot_3_offset_operand): likewise
> (slot_4_offset_operand): likewise
> (slot_5_offset_operand): likewise
> (slot_6_offset_operand): likewise
> (slot_7_offset_operand): likewise
> (slot_8_offset_operand): likewise
> (slot_9_offset_operand): likewise
> (slot_10_offset_operand): likewise
> (slot_11_offset_operand): likewise
> (slot_12_offset_operand): likewise
> (stack_push_up_to_ra_operand): predicates for stack adjust of pushing ra
> (stack_push_up_to_s0_operand): predicates for stack adjust of pushing ra, s0
> (stack_push_up_to_s1_operand): likewise
> (stack_push_up_to_s2_operand): likewise
> (stack_push_up_to_s3_operand): likewise
> (stack_push_up_to_s4_operand): likewise
> (stack_push_up_to_s5_operand): likewise
> (stack_push_up_to_s6_operand): likewise
> (stack_push_up_to_s7_operand): likewise
> (stack_push_up_to_s8_operand): likewise
> (stack_push_up_to_s9_operand): likewise
> (stack_push_up_to_s11_operand): likewise
> (stack_pop_up_to_ra_operand): predicates for stack adjust of popping ra
> (stack_pop_up_to_s0_operand): predicates for stack adjust of popping ra, s0
> (stack_pop_up_to_s1_operand): likewise
> (stack_pop_up_to_s2_operand): likewise
> (stack_pop_up_to_s3_operand): likewise
> (stack_pop_up_to_s4_operand): likewise
> (stack_pop_up_to_s5_operand): likewise
> (stack_pop_up_to_s6_operand): likewise
> (stack_pop_up_to_s7_operand): likewise
> (stack_pop_up_to_s8_operand): likewise
> (stack_pop_up_to_s9_operand): likewise
> (stack_pop_up_to_s11_operand): likewise
> * config/riscv/riscv-protos.h (riscv_zcmp_valid_slot_offset_p): declaration
> (riscv_zcmp_valid_stack_adj_bytes_p): declaration
> * config/riscv/riscv.cc (struct riscv_frame_info): comment change
> (riscv_avoid_multi_push): helper function of riscv_use_multi_push
> (riscv_use_multi_push): true if multi push is used
> (riscv_multi_push_sregs_count): num of sregs in multi-push
> (riscv_multi_push_regs_count): num of regs in multi-push
> (riscv_16bytes_align): align to 16 bytes
> (riscv_stack_align): moved to a better place
> (riscv_save_libcall_count): no functional change
> (riscv_compute_frame_info): add zcmp frame info
> (riscv_adjust_multi_push_cfi_prologue): adjust cfi for cm.push
> (get_slot_offset_rtx): get the rtx of slot to push or pop
> (riscv_gen_multi_push_pop_insn): gen function for multi push and pop
> (ris

Re: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V

2023-05-30 Thread juzhe.zh...@rivai.ai
Hi, Richi.

>> but ideally the user would be able to specify -mrvv-size=32 for an
>> implementation with 32 byte vectors and then vector lowering would make use
>> of vectors up to 32 bytes?

Actually, we don't want to have to specify -mrvv-size=32 to enable 
vectorization of GNU vectors.
You can take a look at this example:
https://godbolt.org/z/3jYqoM84h 

GCC needs the mrvv size specified to enable GNU vectors, and the generated 
code can only run on a CPU with vector-length = 128 bits.
However, LLVM doesn't need the vector length specified, and its generated code 
can run on any CPU with RVV vector-length >= 128 bits.
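
The example is roughly of this shape (a minimal GNU C vector-extension sketch,
not the exact godbolt source):

typedef int v4si __attribute__ ((vector_size (16)));

v4si
vadd (v4si a, v4si b)
{
  return a + b;   /* a VLS V4SImode add, emulated by a VLA RVV add after RA */
}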

This is what this patch wants to do.

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-05-30 15:13
To: Kito Cheng
CC: gcc-patches; palmer; kito.cheng; juzhe.zhong; jeffreyalaw; rdapp.gcc; 
pan2.li
Subject: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V
On Tue, May 30, 2023 at 8:07 AM Kito Cheng via Gcc-patches
 wrote:
>
> GNU vector extensions is widly used around this world, and this patch
> enable that with RISC-V vector extensions, this can help people
> leverage existing code base with RVV, and also can write vector programs in a
> familiar way.
>
> The idea of VLS code gen support is emulate VLS operation by VLA operation 
> with
> specific length.
 
In the patch you added fixed 16-byte vector modes, correct?  I've never
looked at how ARM deals with the GNU vector extensions, but I suppose they
get mapped to NEON and not SVE, so they basically behave the same way here.
 
But I do wonder about the efficiency for RVV where there doesn't exist a
complementary fixed-length ISA.  Shouldn't vector lowering
(tree-vect-generic.cc)
be enhanced to support lowering fixed-length vectors to variable length ones
with (variable) fixed length instead?  From your patch I second-guess the RVV
specification requires 16 byte vectors to be available (or will your
patch split the
insns?) but ideally the user would be able to specify -mrvv-size=32 for an
implementation with 32 byte vectors and then vector lowering would make use
of vectors up to 32 bytes?
 
Also vector lowering will split smaller vectors not equal to the fixed size to
scalars unless you add all fixed length modes smaller than 16 bytes as well.
 
> Key design point is we defer the mode conversion (From VLS to VLA mode) after
> register allocation, it come with several advantages:
> - VLS pattern is much friendly for most optimization pass like combine.
> - Register allocator can spill/restore exact size of VLS type instead of
>   whole register.
>
> This is compatible with VLA vectorization.
>
> Only support move and binary part of operation patterns.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-modes.def: Introduce VLS modes.
> * config/riscv/riscv-protos.h (riscv_vector::minimal_vls_mode): New.
> (riscv_vector::vls_insn_expander): New.
> (riscv_vector::vls_mode_p): New.
> * config/riscv/riscv-v.cc (riscv_vector::minimal_vls_mode): New.
> (riscv_vector::vls_mode_p): New.
> (riscv_vector::vls_insn_expander): New.
> (riscv_vector::update_vls_mode): New.
> * config/riscv/riscv.cc (riscv_v_ext_mode_p): New.
> (riscv_v_adjust_nunits): Handle VLS type.
> (riscv_hard_regno_nregs): Ditto.
> (riscv_hard_regno_mode_ok): Ditto.
> (riscv_regmode_natural_size): Ditto.
> * config/riscv/vector-iterators.md (VLS): New.
> (VM): Handle VLS type.
> (vel): Ditto.
> * config/riscv/vector.md: Include vector-vls.md.
> * config/riscv/vector-vls.md: New file.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/rvv.exp: Add vls folder.
> * gcc.target/riscv/rvv/vls/binop-template.h: New test.
> * gcc.target/riscv/rvv/vls/binop-v.c: New test.
> * gcc.target/riscv/rvv/vls/binop-zve32x.c: New test.
> * gcc.target/riscv/rvv/vls/binop-zve64x.c: New test.
> * gcc.target/riscv/rvv/vls/move-template.h: New test.
> * gcc.target/riscv/rvv/vls/move-v.c: New test.
> * gcc.target/riscv/rvv/vls/move-zve32x.c: New test.
> * gcc.target/riscv/rvv/vls/move-zve64x.c: New test.
> * gcc.target/riscv/rvv/vls/load-store-template.h: New test.
> * gcc.target/riscv/rvv/vls/load-store-v.c: New test.
> * gcc.target/riscv/rvv/vls/load-store-zve32x.c: New test.
> * gcc.target/riscv/rvv/vls/load-store-zve64x.c: New test.
> * gcc.target/riscv/rvv/vls/vls-types.h: New test.
> ---
>  gcc/config/riscv/riscv-modes.def  |  3 +
>  gcc/config/riscv/riscv-protos.h   |  4 ++
>  gcc/config/riscv/riscv-v.cc   | 67 +++
>  gcc/config/riscv/riscv.cc | 27 +++-
>  gcc/config/riscv/vector-iterators.md  |  6 ++
>  gcc/config/riscv/vector-vls.md| 64 ++
>  gcc/config/riscv/vector.md|  2 +
>  gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|

RE: [RFC][PATCH] Improve generating FMA by adding a widening_mul pass

2023-05-30 Thread Di Zhao OS via Gcc-patches
Sorry, I missed the recent updates on trunk regarding FMA handling.
I'll measure again to see whether anything in this still helps.

Thanks,
Di Zhao

> -Original Message-
> From: Di Zhao OS
> Sent: Friday, May 26, 2023 3:15 PM
> To: gcc-patches@gcc.gnu.org
> Subject: [RFC][PATCH] Improve generating FMA by adding a widening_mul pass
> 
> As GCC's reassociation pass does not have knowledge of FMA, when
> transforming expression lists to parallel, it reduces the
> opportunities to generate FMAs. Currently there's a workaround
> on AArch64 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84114),
> that is, to disable the parallelization with floating-point additions.
> However, this approach may cause regressions. For example, in the
> code below there are only floating-point additions when calculating
> "result += array[j]", and rewriting to parallel is better:
> 
> // Compile with -Ofast on aarch64
> float foo (int n, float in)
> {
>   float array[8] = { 0.1, 1.0, 1.1, 100.0, 10.5, 0.5, 0.01, 9.9 };
>   float result = 0.0;
>   for (int i = 0; i < n; i++)
> {
>   if (i % 10)
> for (unsigned j = 0; j < 8; j++)
>   array[j] *= in;
> 
>   for (unsigned j = 0; j < 8; j++)
>result += array[j];
> }
>   return result;
> }
> 
> To improve this, one option is to count the number of MUL_EXPRs in an
> operator list before rewriting to parallel, and allow the rewriting
> when there's none (or 1 MUL_EXPR). This is simple and unlikely to
> introduce regressions. However it lacks flexibility and can not handle
> more general cases.
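> 
> A rough sketch of that check (hypothetical helper, GCC-internals style):
> 
> static bool
> ops_ok_for_parallel_p (const vec<operand_entry *> &ops)
> {
>   unsigned n_muls = 0;
>   for (operand_entry *oe : ops)
>     if (TREE_CODE (oe->op) == SSA_NAME
>         && is_gimple_assign (SSA_NAME_DEF_STMT (oe->op))
>         && gimple_assign_rhs_code (SSA_NAME_DEF_STMT (oe->op)) == MULT_EXPR)
>       n_muls++;
>   /* Rewriting to parallel cannot lose an FMA when at most one
>      multiplication feeds the chain.  */
>   return n_muls <= 1;
> }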
> 
> Here's an attempt to address the issue more generally.
> 
> 1. Added an additional widening_mul pass before the original reassoc2
> pass. The new pass is limited to only insert FMA, and leave other
> operations like convert_mult_to_widen to the old late widening_mul pass,
> in case other optimizations between the two passes could be hindered.
> 
> 2. On some platforms, for a very long FMA chain, rewriting to parallel
> can be faster. Extended the original "deferring" logic so that all
> conversions to FMA can be deferred. Introduced a new parameter
> op-count-prefer-reassoc to control this behavior.
> 
> 3. Additionally, the new widening_mul pass calls execute_reassoc first,
> to avoid losing opportunities such as folding constants and
> undistributing.
> 
> However, changing the sequence of generating FMA and reassociation may
> expose more FMA chains that are slow (see commit 4a0d0ed2).
> To reduce possible regressions, improved handling the slow FMA chain by:
> 
> 1. Modified result_of_phi to support checking an additional FADD/FMUL.
> 
> 2. On some CPUs, rather than removing the whole FMA chain, only skipping
> a few candidates may generate faster code. Added new parameter
> fskip-fma-heuristic to control this behavior.
> 
> This patch also solves https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98350.
> 
> Thanks,
> Di Zhao



Re: [PATCH] tree: Fix up save_expr [PR52339]

2023-05-30 Thread Eric Botcazou via Gcc-patches
> We want to be able to treat such things as invariant somehow even if we
> can't do that for references to user data that might be changed by
> intervening code.
> 
> That is, indicate that we know that the _REF actually refers to a const
> variable or is otherwise known to be unchanging.
> 
> Perhaps that should be a new flag that tree_invariant_p can check
> instead of TREE_READONLY.

Richard earlier suggested a langhook; given that Ada will be the main (sole?) 
user of it, this would probably be better.

-- 
Eric Botcazou





Re: [COMMITTED] ada: Remove the body of System.Storage_Elements

2023-05-30 Thread Maciej W. Rozycki
On Mon, 29 May 2023, Jan-Benedict Glaw wrote:

> > Can you elaborate how you build GCC?
> 
> > My host compiler is Debian's "gcc-snapshot", by now some two months
> > old. (As Eric wrote, it's probably just too old.) That compiler is
> given for CC/CXX. The new build is just (as I wrote in the initial
> mail) the configure/make call. So I'll just wait for the next drop for
> Debian's "gcc-snapshot" package. I see that there are already a good
> number of additional commits on the package source, I guess a new
> package version is imminent.

 Alternatively you can just bootstrap GCC under test natively first and 
then use the newly-built compiler for all the cross builds you want to 
verify.  As you need to do it only once per iteration the extra time spent 
on the native build shouldn't be a big fraction of the duration of the 
whole iteration.  A drawback is if this native bootstrap fails for any 
reason, it will make the whole run invalid, i.e. none of the cross targets 
will be verified.

  Maciej


Re: [PATCH] tree: Fix up save_expr [PR52339]

2023-05-30 Thread Jakub Jelinek via Gcc-patches
On Tue, May 30, 2023 at 10:03:05AM +0200, Eric Botcazou wrote:
> > We want to be able to treat such things as invariant somehow even if we
> > can't do that for references to user data that might be changed by
> > intervening code.
> > 
> > That is, indicate that we know that the _REF actually refers to a const
> > variable or is otherwise known to be unchanging.
> > 
> > Perhaps that should be a new flag that tree_invariant_p can check
> > instead of TREE_READONLY.
> 
> Richard earlier suggested a langhook; given that Ada will be the main (sole?) 
> user of it, this would probably be better.

Are the DECL_INVARIANT_P FIELD_DECLs in Ada really invariant no matter how
exactly they are accessed?  Or can Ada suffer from the same problem as
C/C++, where the FIELD_DECL is TREE_READONLY but the underlying object could
go out of scope or a pointer to it could change?
I mean the p->fld cases in C/C++, where there could be free (p); or p++
etc. in between the place where save_expr is first evaluated and later
uses?
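
I.e. a C sketch of the hazard (hypothetical):

#include <stdlib.h>

struct S { const int fld; };  /* fld's FIELD_DECL is TREE_READONLY */

int
f (struct S *p, struct S *q)
{
  int a = p->fld;    /* first evaluation */
  free (p);          /* the object dies...   */
  p = q;             /* ...and p is repointed */
  return a + p->fld; /* so p->fld is not invariant between the two uses */
}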

Jakub



Re: [x86_64 PATCH] PR target/109973: CCZmode and CCCmode variants of [v]ptest.

2023-05-30 Thread Uros Bizjak via Gcc-patches
On Tue, May 30, 2023 at 9:39 AM Uros Bizjak  wrote:
>
> On Mon, May 29, 2023 at 8:17 PM Roger Sayle  
> wrote:
> >
> >
> > This is my proposed minimal fix for PR target/109973 (hopefully suitable
> > for backporting) that follows Jakub Jelinek's suggestion that we introduce
> > CCZmode and CCCmode variants of ptest and vptest, so that the i386
> > backend treats [v]ptest instructions similarly to testl instructions;
> > using different CCmodes to indicate which condition flags are desired,
> > and then relying on the RTL cmpelim pass to eliminate redundant tests.
> >
> > This conveniently matches Intel's intrinsics, that provide different
> > functions for retrieving different flags, _mm_testz_si128 tests the
> > Z flag, _mm_testc_si128 tests the carry flag.  Currently we use the
> > same instruction (pattern) for both, and unfortunately the *ptest_and
> > optimization is only valid when the ptest/vptest instruction is used to
> > set/test the Z flag.
> >
> > The downside, as predicted by Jakub, is that GCC's cmpelim pass is
> > currently COMPARE-centric and not able to merge the ptests from expressions
> > such as _mm256_testc_si256 (a, b) + _mm256_testz_si256 (a, b), which is a
> > known issue, PR target/80040.  I've some follow-up patches to improve
> > things, but this first patch fixes the wrong-code regression, replacing
> > it with a rare missed-optimization (hopefully suitable for GCC 13).
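> >
> > For reference, the flag semantics involved (a minimal intrinsics sketch):
> >
> > #include <immintrin.h>
> >
> > int
> > both (__m256i a, __m256i b)
> > {
> >   /* testz reads ZF: (a & b) == 0; testc reads CF: (~a & b) == 0.
> >      Ideally a single VPTEST would feed both.  */
> >   return _mm256_testz_si256 (a, b) + _mm256_testc_si256 (a, b);
> > }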
> >
> > The only change that was unanticipated was the tweak to ix86_match_ccmode.
> > Oddly, CCZmode is allowable for CCmode, but CCCmode isn't.  Given that
> > CCZmode means just the Z flag, CCCmode means just the C flag, and
> > CCmode means all the flags, I'm guessing this asymmetry is unintentional.
> > Perhaps a super-safe fix is to explicitly test for CCZmode, CCCmode or
> > CCmode
> > in the *_ptest pattern's predicate, and not attempt to
> > re-use ix86_match_ccmode?
>
> It is actually the other way. CCZmode should NOT be allowed for CCmode
> in ix86_match_ccmode. When CCmode is requested, we don't assume
> anything about FLAGS bits, so we expect all bits to be valid. CCZmode
> implies only Z bit, and should be compatible only with itself. So, the
> "break;" is in the wrong place, it should be before E_CCZmode.

Hm, but PTEST is the *PRODUCER* of flags, not the consumer...

So, the whole picture should be like this:

(define_insn "*cmp_ccno_1"
  [(set (reg FLAGS_REG)
(compare (match_operand:SWI 0 "nonimmediate_operand" ",?m")
 (match_operand:SWI 1 "const0_operand")))]
  "ix86_match_ccmode (insn, CCNOmode)"

The above means that the compare PROVIDES all bits, but O is
guaranteed to be zero.

(define_insn "*cmp_1"
  [(set (reg FLAGS_REG)
(compare (match_operand:SWI 0 "nonimmediate_operand" "m,")
 (match_operand:SWI 1 "" ",")))]
  "ix86_match_ccmode (insn, CCmode)"

The above means that compare PROVIDES all bits.

+(define_expand "_ptest"
+  [(set (reg:CC FLAGS_REG)
+ (unspec:CC [(match_operand:V_AVX 0 "register_operand")
+(match_operand:V_AVX 1 "vector_operand")]
+   UNSPEC_PTEST))]
+  "TARGET_SSE4_1")

This is not true, PTEST does not provide all FLAGS bits in a general sense.

So, I think your original patch is OK, but please introduce the
ix86_match_ptest_ccmode function instead of reusing ix86_match_ccmode.

Uros.


>
> Uros.
>
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with no new failures.  Ok for mainline?
> >
> >
> > 2023-05-29  Roger Sayle  
> >
> > gcc/ChangeLog
PR target/109973
> > * config/i386/i386-builtin.def (__builtin_ia32_ptestz128): Use new
> > CODE_for_sse4_1_ptestzv2di.
> > (__builtin_ia32_ptestc128): Use new CODE_for_sse4_1_ptestcv2di.
> > (__builtin_ia32_ptestz256): Use new CODE_for_avx_ptestzv4di.
> > (__builtin_ia32_ptestc256): Use new CODE_for_avx_ptestcv4di.
> > * config/i386/i386-expand.cc (ix86_expand_branch): Use CCZmode
> > when expanding UNSPEC_PTEST to compare against zero.
> > * config/i386/i386-features.cc (scalar_chain::convert_compare):
> > Likewise generate CCZmode UNSPEC_PTESTs when converting comparisons.
> > (general_scalar_chain::convert_insn): Use CCZmode for COMPARE
> > result.
> > (timode_scalar_chain::convert_insn): Use CCZmode for COMPARE result.
> > * config/i386/i386.cc (ix86_match_ccmode): Allow the SET_SRC to be
> > an UNSPEC, in addition to a COMPARE.  Consider CCCmode to be a form
> > of CCmode.
> > * config/i386/sse.md (define_split): When splitting UNSPEC_MOVMSK
> > to UNSPEC_PTEST, preserve the FLAG_REG mode as CCZ.
> (*<sse4_1>_ptest<mode>): Add asterisk to hide define_insn.
> Remove ":CC" flags specification, and use ix86_match_ccmode instead.
> (<sse4_1>_ptestz<mode>): New define_expand to specify CCZ.
> (<sse4_1>_ptestc<mode>): New define_expand to specify CCC.
> (<sse4_1>_ptest<mode>):

[PATCH] riscv: update riscv_asan_shadow_offset

2023-05-30 Thread Andreas Schwab via Gcc-patches
This fixes all asan tests, apart from
c-c++-common/asan/pointer-compare-1.c which needs a workaround for PR
sanitizer/82501.
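
For context, the value being matched is the offset of the standard ASan
shadow mapping (a C sketch; the RV64 offset is the asan_mapping.h value):

#include <stdint.h>

static inline uintptr_t
mem_to_shadow (uintptr_t addr)
{
  /* 8 application bytes map to 1 shadow byte at a fixed offset.  */
  return (addr >> 3) + UINT64_C (0xd55550000);
}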

PR target/110036
* config/riscv/riscv.cc (riscv_asan_shadow_offset): Update to
match libsanitizer.
---
 gcc/config/riscv/riscv.cc | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 09fc9e5d95e..b358ca8b5d0 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -7043,10 +7043,9 @@ riscv_asan_shadow_offset (void)
 {
   /* We only have libsanitizer support for RV64 at present.
 
- This number must match kRiscv*_ShadowOffset* in the file
- libsanitizer/asan/asan_mapping.h which is currently 1<<29 for rv64,
- even though 1<<36 makes more sense.  */
-  return TARGET_64BIT ? (HOST_WIDE_INT_1 << 29) : 0;
+ This number must match ASAN_SHADOW_OFFSET_CONST in the file
+ libsanitizer/asan/asan_mapping.h.  */
+  return TARGET_64BIT ? HOST_WIDE_INT_UC (0xd55550000) : 0;
 }
 
 /* Implement TARGET_MANGLE_TYPE.  */
-- 
2.40.1


-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


[PATCH] riscv: add work around for PR sanitizer/82501

2023-05-30 Thread Andreas Schwab via Gcc-patches
PR sanitizer/82501
* c-c++-common/asan/pointer-compare-1.c: Disable use of small data
on RISC-V.
---
 gcc/testsuite/c-c++-common/asan/pointer-compare-1.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/c-c++-common/asan/pointer-compare-1.c 
b/gcc/testsuite/c-c++-common/asan/pointer-compare-1.c
index 4b558bf8179..fb9126d6df1 100644
--- a/gcc/testsuite/c-c++-common/asan/pointer-compare-1.c
+++ b/gcc/testsuite/c-c++-common/asan/pointer-compare-1.c
@@ -5,6 +5,7 @@
 /* FIXME: remove me after PR sanitizer/82501 is resolved.  */
 /* { dg-additional-options "-fno-section-anchors" } */
 /* { dg-additional-options "-msdata=none" { target { powerpc*-*-* } } } */
+/* { dg-additional-options "-msmall-data-limit=0" { target { riscv*-*-* } } } 
*/
 
 volatile int v;
 
-- 
2.40.1


-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [PATCH] riscv: update riscv_asan_shadow_offset

2023-05-30 Thread Kito Cheng via Gcc-patches
LGTM. I remember Luís updated[1] that, but apparently I forgot to sync it to GCC.

And just as a reminder, I plan to change that to a dynamic offset[2] to make
it work on Sv39, Sv48 and Sv57,
but we are still testing and debugging to make sure LSAN works well...

[1] https://reviews.llvm.org/D97646
[2] https://reviews.llvm.org/D139827

On Tue, May 30, 2023 at 4:43 PM Andreas Schwab via Gcc-patches
 wrote:
>
> This fixes all asan tests, apart from
> c-c++-common/asan/pointer-compare-1.c which needs a workaround for PR
> sanitizer/82501.
>
> PR target/110036
> * config/riscv/riscv.cc (riscv_asan_shadow_offset): Update to
> match libsanitizer.
> ---
>  gcc/config/riscv/riscv.cc | 7 +++
>  1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 09fc9e5d95e..b358ca8b5d0 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -7043,10 +7043,9 @@ riscv_asan_shadow_offset (void)
>  {
>/* We only have libsanitizer support for RV64 at present.
>
> - This number must match kRiscv*_ShadowOffset* in the file
> - libsanitizer/asan/asan_mapping.h which is currently 1<<29 for rv64,
> - even though 1<<36 makes more sense.  */
> -  return TARGET_64BIT ? (HOST_WIDE_INT_1 << 29) : 0;
> + This number must match ASAN_SHADOW_OFFSET_CONST in the file
> + libsanitizer/asan/asan_mapping.h.  */
> +  return TARGET_64BIT ? HOST_WIDE_INT_UC (0xd55550000) : 0;
>  }
>
>  /* Implement TARGET_MANGLE_TYPE.  */
> --
> 2.40.1
>
>
> --
> Andreas Schwab, SUSE Labs, sch...@suse.de
> GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
> "And now for something completely different."


Re: [PATCH] riscv: add work around for PR sanitizer/82501

2023-05-30 Thread Kito Cheng via Gcc-patches
LGTM, thanks :)

On Tue, May 30, 2023 at 4:43 PM Andreas Schwab via Gcc-patches
 wrote:
>
> PR sanitizer/82501
> * c-c++-common/asan/pointer-compare-1.c: Disable use of small data
> on RISC-V.
> ---
>  gcc/testsuite/c-c++-common/asan/pointer-compare-1.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/gcc/testsuite/c-c++-common/asan/pointer-compare-1.c 
> b/gcc/testsuite/c-c++-common/asan/pointer-compare-1.c
> index 4b558bf8179..fb9126d6df1 100644
> --- a/gcc/testsuite/c-c++-common/asan/pointer-compare-1.c
> +++ b/gcc/testsuite/c-c++-common/asan/pointer-compare-1.c
> @@ -5,6 +5,7 @@
>  /* FIXME: remove me after PR sanitizer/82501 is resolved.  */
>  /* { dg-additional-options "-fno-section-anchors" } */
>  /* { dg-additional-options "-msdata=none" { target { powerpc*-*-* } } } */
> +/* { dg-additional-options "-msmall-data-limit=0" { target { riscv*-*-* } } 
> } */
>
>  volatile int v;
>
> --
> 2.40.1
>
>
> --
> Andreas Schwab, SUSE Labs, sch...@suse.de
> GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
> "And now for something completely different."


Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V

2023-05-30 Thread Robin Dapp via Gcc-patches
>>> but ideally the user would be able to specify -mrvv-size=32 for an
>>> implementation with 32 byte vectors and then vector lowering would make use
>>> of vectors up to 32 bytes?
> 
> Actually, we don't want to specify -mrvv-size = 32 to enable vectorization on 
> GNU vectors.
> You can take a look this example:
> https://godbolt.org/z/3jYqoM84h  
> 
> GCC need to specify the mrvv size to enable GNU vectors and the codegen only 
> can run on CPU with vector-length = 128bit.
> However, LLVM doesn't need to specify the vector length, and the codegen can 
> run on any CPU with RVV  vector-length >= 128 bits.
> 
> This is what this patch want to do.
> 
> Thanks.
I think Richard's question was rather whether it wouldn't be better to do it
more generically and lower vectors to what either the current CPU or the
user specified, rather than just 16-byte vectors (i.e. indeed a fixed
vlmin and not a fixed vlmin == fixed vlmax).

This patch assumes everything is fixed for optimization purposes and then
switches over to variable-length when nothing can be changed anymore.  That
is, we would work on "vlmin"-sized chunks in a VLA fashion at runtime?
We would need to make sure that no pass after reload makes use of VLA
properties at all.

In general I don't have a good overview of which optimizations we gain by
such an approach or rather which ones are prevented by VLA altogether?
What's the idea for the future?  Still use LEN_LOAD et al. (and masking)
with "fixed vlmin"?  Wouldn't we select different IVs with this patch than
what we would have for pure VLA?

Regards
 Robin


Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V

2023-05-30 Thread Kito Cheng via Gcc-patches
(I am still stuck in meeting hell and will not be free until much later;
apologies for the short and incomplete reply, I will reply in full later.)

One point in favor of adding VLS mode support is SLP: especially for SLP
candidates that are not inside a loop, those cases are better served by VLS
types. Of course, using a larger safe VLA type can be optimized too, but that
causes an issue we found for RISC-V in LLVM - it will spill/reload the
whole register instead of the exact size.

e.g.

int32x4_t a;
// def a
// spill a
foo ()
// reload a
// use a

If we use a VLA mode for a, it will spill and reload the whole
register in that VLA mode.
Online demo here: https://godbolt.org/z/Y1fThbxE6
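
A compilable version of the sketch (GNU C, hypothetical names):

typedef int int32x4_t __attribute__ ((vector_size (16)));

void foo (void);

int32x4_t
bar (int32x4_t a)
{
  a = a + a;      /* def a */
  foo ();         /* a is spilled/reloaded around the call */
  return a + a;   /* use a; with a VLA mode the spill is whole-register */
}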

On Tue, May 30, 2023 at 5:05 PM Robin Dapp  wrote:
>
> >>> but ideally the user would be able to specify -mrvv-size=32 for an
> >>> implementation with 32 byte vectors and then vector lowering would make 
> >>> use
> >>> of vectors up to 32 bytes?
> >
> > Actually, we don't want to specify -mrvv-size = 32 to enable vectorization 
> > on GNU vectors.
> > You can take a look this example:
> > https://godbolt.org/z/3jYqoM84h 
> >
> > GCC need to specify the mrvv size to enable GNU vectors and the codegen 
> > only can run on CPU with vector-length = 128bit.
> > However, LLVM doesn't need to specify the vector length, and the codegen 
> > can run on any CPU with RVV  vector-length >= 128 bits.
> >
> > This is what this patch want to do.
> >
> > Thanks.
> I think Richard's question was rather if it wasn't better to do it more
> generically and lower vectors to what either the current cpu or what the
> user specified rather than just 16-byte vectors (i.e. indeed a fixed
> vlmin and not a fixed vlmin == fixed vlmax).
>
> This patch assumes everything is fixed for optimization purposes and then
> switches over to variable-length when nothing can be changed anymore.  That
> is, we would work on "vlmin"-sized chunks in a VLA fashion at runtime?
> We would need to make sure that no pass after reload makes use of VLA
> properties at all.
>
> In general I don't have a good overview of which optimizations we gain by
> such an approach or rather which ones are prevented by VLA altogether?
> What's the idea for the future?  Still use LEN_LOAD et al. (and masking)
> with "fixed vlmin"?  Wouldn't we select different IVs with this patch than
> what we would have for pure VLA?
>
> Regards
>  Robin


Re: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V

2023-05-30 Thread juzhe.zh...@rivai.ai
In the future, we will definitely be mixing VLA and VLS-vlmin together in 
codegen, and it will not cause any issues.
For VLS-vlmin, I prefer that it be used in length-style auto-vectorization 
(I am not sure yet since my SELECT_VL patch is not
finished; I will check whether it works while working on the SELECT_VL patch).

>> In general I don't have a good overview of which optimizations we gain by
>> such an approach or rather which ones are prevented by VLA altogether?
The VLS modes in these patches can help SLP auto-vectorization.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-05-30 17:05
To: juzhe.zh...@rivai.ai; Richard Biener; Kito.cheng
CC: rdapp.gcc; gcc-patches; palmer; kito.cheng; jeffreyalaw; pan2.li
Subject: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V
>>> but ideally the user would be able to specify -mrvv-size=32 for an
>>> implementation with 32 byte vectors and then vector lowering would make use
>>> of vectors up to 32 bytes?
> 
> Actually, we don't want to specify -mrvv-size = 32 to enable vectorization on 
> GNU vectors.
> You can take a look this example:
> https://godbolt.org/z/3jYqoM84h  
> 
> GCC need to specify the mrvv size to enable GNU vectors and the codegen only 
> can run on CPU with vector-length = 128bit.
> However, LLVM doesn't need to specify the vector length, and the codegen can 
> run on any CPU with RVV  vector-length >= 128 bits.
> 
> This is what this patch want to do.
> 
> Thanks.
I think Richard's question was rather if it wasn't better to do it more
generically and lower vectors to what either the current cpu or what the
user specified rather than just 16-byte vectors (i.e. indeed a fixed
vlmin and not a fixed vlmin == fixed vlmax).
 
This patch assumes everything is fixed for optimization purposes and then
switches over to variable-length when nothing can be changed anymore.  That
is, we would work on "vlmin"-sized chunks in a VLA fashion at runtime?
We would need to make sure that no pass after reload makes use of VLA
properties at all.
 
In general I don't have a good overview of which optimizations we gain by
such an approach or rather which ones are prevented by VLA altogether?
What's the idea for the future?  Still use LEN_LOAD et al. (and masking)
with "fixed vlmin"?  Wouldn't we select different IVs with this patch than
what we would have for pure VLA?
 
Regards
Robin
 


Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V

2023-05-30 Thread Kito Cheng via Gcc-patches
One more note: we found a real case in SPEC 2006 where SLP converts two 8-bit
values into an int8x2_t, but the value is live across a function call. It
only needs a 16-bit save-restore, yet it becomes a save-restore of VLEN bits
because it uses a VLA mode in the backend. You can imagine that as VLEN gets
larger, the performance penalty also increases, which is the opposite of what
we expect - larger VLEN, better performance.
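
Roughly this shape (a compilable sketch):

typedef char int8x2_t __attribute__ ((vector_size (2)));

void callee (void);

int8x2_t
keep (int8x2_t v)
{
  callee ();   /* v is live across the call: only 16 bits really need
                  saving, but a VLA backend mode spills VLEN bits */
  return v;
}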

On Tue, May 30, 2023 at 5:11 PM Kito Cheng  wrote:
>
> (I am still on the meeting hell, and will be released very later,
> apology for short and incomplete reply, and will reply complete later)
>
> One point for adding VLS mode support is because SLP, especially for
> those SLP candidate not in the loop, those case use VLS type can be
> better, of cause using larger safe VLA type can optimize too, but that
> will cause one issue we found in RISC-V in LLVM - it will spill/reload
> whole register instead of exact size.
>
> e.g.
>
> int32x4_t a;
> // def a
> // spill a
> foo ()
> // reload a
> // use a
>
> Consider we use a VLA mode for a, it will spill and reload with whole
> register VLA mode
> Online demo here: https://godbolt.org/z/Y1fThbxE6
>
> On Tue, May 30, 2023 at 5:05 PM Robin Dapp  wrote:
> >
> > >>> but ideally the user would be able to specify -mrvv-size=32 for an
> > >>> implementation with 32 byte vectors and then vector lowering would make 
> > >>> use
> > >>> of vectors up to 32 bytes?
> > >
> > > Actually, we don't want to specify -mrvv-size = 32 to enable 
> > > vectorization on GNU vectors.
> > > You can take a look this example:
> > > https://godbolt.org/z/3jYqoM84h 
> > >
> > > GCC need to specify the mrvv size to enable GNU vectors and the codegen 
> > > only can run on CPU with vector-length = 128bit.
> > > However, LLVM doesn't need to specify the vector length, and the codegen 
> > > can run on any CPU with RVV  vector-length >= 128 bits.
> > >
> > > This is what this patch want to do.
> > >
> > > Thanks.
> > I think Richard's question was rather if it wasn't better to do it more
> > generically and lower vectors to what either the current cpu or what the
> > user specified rather than just 16-byte vectors (i.e. indeed a fixed
> > vlmin and not a fixed vlmin == fixed vlmax).
> >
> > This patch assumes everything is fixed for optimization purposes and then
> > switches over to variable-length when nothing can be changed anymore.  That
> > is, we would work on "vlmin"-sized chunks in a VLA fashion at runtime?
> > We would need to make sure that no pass after reload makes use of VLA
> > properties at all.
> >
> > In general I don't have a good overview of which optimizations we gain by
> > such an approach or rather which ones are prevented by VLA altogether?
> > What's the idea for the future?  Still use LEN_LOAD et al. (and masking)
> > with "fixed vlmin"?  Wouldn't we select different IVs with this patch than
> > what we would have for pure VLA?
> >
> > Regards
> >  Robin


Re: Re: decrement IV patch creates failures on PowerPC

2023-05-30 Thread Richard Biener via Gcc-patches
On Fri, 26 May 2023, juzhe.zh...@rivai.ai wrote:

> Hi, Richi. Thanks for your analysis and help.
> 
> >> We could simply retain the original
> >> incrementing IV for loop control and add the decrementing
> >> IV for computing LEN in addition to that and leave IVOPTs
> >> sorting out to eventually merge them (or not).
> 
> I am not sure how to do that. Could you give me more information?
> 
> I somewhat understand: your concern is that a variable IV step will make
> IVOPTs fail. 
> 
> I have seen a similar situation in LLVM (when applying a variable IV,
> it failed to interleave the vectorized code). I am not sure whether the
> reason is the same.
> 
> For RVV, we not only want the decrement IV style in vectorization but also
> want to apply SELECT_VL in the single-rgroup case, which is the most common 
> one (LLVM also only applies get_vector_length for a single vector length).
>
> >>You can do some testing with a cross compiler, alternatively
> >>there are powerpc machines in the GCC compile farm.
> 
> It seems that Power is ok with decrement IV since most cases are improved.

Well, Power will never have SELECT_VL, so at least for !SELECT_VL
targets you should avoid having an IV with a variable decrement.  As
I said, it should be easy to rewrite the decrement IV to use a constant
increment (when not using SELECT_VL) and test the pre-decrement
value in the exit test.
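
In C terms, the rewrite is roughly (a sketch with hypothetical names, VF
standing for the vectorization factor):

void
vec_add (int *a, const int *b, long n)   /* n >= 0 */
{
  const long VF = 4;                     /* vectorization factor */
  for (long rem = n; ; rem -= VF)        /* constant step, no SELECT_VL */
    {
      long len = rem < VF ? rem : VF;    /* stands in for the length control */
      for (long i = 0; i < len; i++)     /* stands in for the masked body */
        a[n - rem + i] += b[n - rem + i];
      if (rem <= VF)                     /* exit test on pre-decrement value */
        break;
    }
}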

Richard.
 
> I think Richard may help to explain decrement IV more clearly.
> 
> Thanks
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-05-26 14:46
> To: ???
> CC: gcc-patches; richard.sandiford; linkw
> Subject: Re: decrement IV patch creates failures on PowerPC
> On Fri, 26 May 2023, ??? wrote:
>  
> > Yesterday's patch has been approved (decrement IV support):
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619663.html 
> > 
> > However, it creates failures on PowerPC:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109971 
> > 
> > I am really sorry for causing inconvenience.
> > 
> > I wonder, as we discussed:
> > +  /* If we're vectorizing a loop that uses length "controls" and
> > + can iterate more than once, we apply decrementing IV approach
> > + in loop control.  */
> > +  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
> > +  && !LOOP_VINFO_LENS (loop_vinfo).is_empty ()
> > +  && LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) == 0
> > +  && !(LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> > +&& known_le (LOOP_VINFO_INT_NITERS (loop_vinfo),
> > + LOOP_VINFO_VECT_FACTOR (loop_vinfo
> > +LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) = true;
> > 
> > These conditions cannot disable the decrement IV on PowerPC.
> > Should I add a target hook for it?
>  
> No.  I've put some analysis in the PR.  To me the question is
> why (without that SELECT_VL case) we need a decrementing IV
> _for the loop control_?  We could simply retain the original
> incrementing IV for loop control and add the decrementing
> IV for computing LEN in addition to that and leave IVOPTs
> sorting out to eventually merge them (or not).
>  
> Alternatively avoid the variable decrement as I wrote in the
> PR and do the exit test based on the previous IV value.
>  
> But as said all this won't work for the SELECT_VL case, but
> then it's availability is something to key off rather than a
> new target hook?
>  
> > I could only bootstrap and regression-test the patch on x86.
> > I didn't have an environment to test PowerPC. I am really sorry.
>  
> You can do some testing with a cross compiler, alternatively
> there are powerpc machines in the GCC compile farm.
>  
> Richard.
>  
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


[PATCH 1/3] xtensa: Improve "*shlrd_reg" insn pattern and its variant

2023-05-30 Thread Takayuki 'January June' Suwa via Gcc-patches
The insn "*shlrd_reg" shifts two registers with a funnel shifter by the
third register to get a single word result:

  reg0 = (reg1 SHIFT_OP0 reg3) BIT_JOIN_OP (reg2 SHIFT_OP1 (32 - reg3))

where the funnel left shift is SHIFT_OP0 := ASHIFT, SHIFT_OP1 := LSHIFTRT
and its right shift is SHIFT_OP0 := LSHIFTRT, SHIFT_OP1 := ASHIFT,
respectively.  Also, BIT_JOIN_OP can be either PLUS or IOR in either
shift direction.
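
In C terms the left-shift variant is (a sketch, valid for shift amounts 1..31):

#include <stdint.h>

uint32_t
funnel_shl (uint32_t reg1, uint32_t reg2, unsigned reg3)
{
  /* The two bit ranges cannot overlap, so '|' and '+' join
     the halves identically.  */
  return (reg1 << reg3) | (reg2 >> (32 - reg3));
}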

  [(set (match_operand:SI 0 "register_operand" "=a")
(match_operator:SI 6 "xtensa_bit_join_operator"
[(match_operator:SI 4 "logical_shift_operator"
[(match_operand:SI 1 "register_operand" "r")
 (match_operand:SI 3 "register_operand" "r")])
 (match_operator:SI 5 "logical_shift_operator"
[(match_operand:SI 2 "register_operand" "r")
 (neg:SI (match_dup 3))])]))]

Although the RTL matching template can express it as above, there is no
way of indicating that the operator (operands[6]) that combines the two
individual shifts is commutative.
Thus, if multiple insn sequences matching the above pattern appear
adjacently, the combiner may accidentally mix them up and get partial
results.

This patch adds a new insn-and-split pattern with the two sides swapped
representation of the bit-combining operation that was lacking and
described above.

This patch also changes the other "*shlrd" variants: instead of describing
the arbitrariness of the bit-combining operation with code iterators, they
now use a combination of match_operator and the predicate above.

gcc/ChangeLog:

* config/xtensa/predicates.md (xtensa_bit_join_operator):
New predicate.
* config/xtensa/xtensa.md (ior_op): Remove.
(*shlrd_reg): Rename from "*shlrd_reg_<code>", and add the
insn_and_split pattern of the same name to express and capture
the bit-combining operation with both sides swapped.
In addition, replace use of code iterator with new operator
predicate.
(*shlrd_const_<code>, *shlrd_per_byte_<code>):
Likewise regarding the code iterator.
---
 gcc/config/xtensa/predicates.md |  3 ++
 gcc/config/xtensa/xtensa.md | 81 ++---
 2 files changed, 58 insertions(+), 26 deletions(-)

diff --git a/gcc/config/xtensa/predicates.md b/gcc/config/xtensa/predicates.md
index 5faf1be8c15..a3575a68892 100644
--- a/gcc/config/xtensa/predicates.md
+++ b/gcc/config/xtensa/predicates.md
@@ -200,6 +200,9 @@
 (define_predicate "xtensa_shift_per_byte_operator"
   (match_code "ashift,ashiftrt,lshiftrt"))
 
+(define_predicate "xtensa_bit_join_operator"
+  (match_code "plus,ior"))
+
 (define_predicate "tls_symbol_operand"
   (and (match_code "symbol_ref")
(match_test "SYMBOL_REF_TLS_MODEL (op) != 0")))
diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
index 57e50911f52..eda1353894b 100644
--- a/gcc/config/xtensa/xtensa.md
+++ b/gcc/config/xtensa/xtensa.md
@@ -87,9 +87,6 @@
 ;; the same template.
 (define_mode_iterator HQI [HI QI])
 
-;; This code iterator is for *shlrd and its variants.
-(define_code_iterator ior_op [ior plus])
-
 
 ;; Attributes.
 
@@ -1682,21 +1679,22 @@
(set_attr "mode""SI")
(set_attr "length"  "9")])
 
-(define_insn "*shlrd_reg_"
+(define_insn "*shlrd_reg"
   [(set (match_operand:SI 0 "register_operand" "=a")
-   (ior_op:SI (match_operator:SI 4 "logical_shift_operator"
+   (match_operator:SI 6 "xtensa_bit_join_operator"
+   [(match_operator:SI 4 "logical_shift_operator"
[(match_operand:SI 1 "register_operand" "r")
-(match_operand:SI 2 "register_operand" "r")])
-  (match_operator:SI 5 "logical_shift_operator"
-   [(match_operand:SI 3 "register_operand" "r")
-(neg:SI (match_dup 2))])))]
+(match_operand:SI 3 "register_operand" "r")])
+(match_operator:SI 5 "logical_shift_operator"
+   [(match_operand:SI 2 "register_operand" "r")
+(neg:SI (match_dup 3))])]))]
   "!optimize_debug && optimize
&& xtensa_shlrd_which_direction (operands[4], operands[5]) != UNKNOWN"
 {
   switch (xtensa_shlrd_which_direction (operands[4], operands[5]))
 {
-case ASHIFT:   return "ssl\t%2\;src\t%0, %1, %3";
-case LSHIFTRT: return "ssr\t%2\;src\t%0, %3, %1";
+case ASHIFT:   return "ssl\t%3\;src\t%0, %1, %2";
+case LSHIFTRT: return "ssr\t%3\;src\t%0, %2, %1";
 default:   gcc_unreachable ();
 }
 }
@@ -1704,14 +1702,42 @@
(set_attr "mode""SI")
(set_attr "length"  "6")])
 
-(define_insn "*shlrd_const_"
+(define_insn_and_split "*shlrd_reg"
+  [(set (match_operand:SI 0 "register_operand" "=a")
+   (match_operator:SI 6 "xtensa_bit_join_operator"
+   [(match_operator:SI 4 "logical_shift_operator"
+   [(match_operand:SI

[PATCH 2/3] xtensa: Add 'adddi3' and 'subdi3' insn patterns

2023-05-30 Thread Takayuki 'January June' Suwa via Gcc-patches
More optimized than the default RTL generation.
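
The add idiom corresponds to this C sketch (32-bit halves, carry detected by
an unsigned compare; hypothetical names):

#include <stdint.h>

uint64_t
add64 (uint64_t x, uint64_t y)
{
  uint32_t lo = (uint32_t) x + (uint32_t) y;
  uint32_t hi = (uint32_t) (x >> 32) + (uint32_t) (y >> 32);
  if (lo < (uint32_t) y)   /* the low part wrapped around: carry into hi */
    hi++;
  return ((uint64_t) hi << 32) | lo;
}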

gcc/ChangeLog:

* config/xtensa/xtensa.md (adddi3, subdi3):
New RTL generation patterns implemented according to the instruction
idioms described in the Xtensa ISA reference manual (p. 600).
---
 gcc/config/xtensa/xtensa.md | 52 +
 1 file changed, 52 insertions(+)

diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
index eda1353894b..7870fb0bfce 100644
--- a/gcc/config/xtensa/xtensa.md
+++ b/gcc/config/xtensa/xtensa.md
@@ -190,6 +190,32 @@
(set_attr "mode""SI")
(set_attr "length"  "3")])
 
+(define_expand "adddi3"
+  [(set (match_operand:DI 0 "register_operand")
+   (plus:DI (match_operand:DI 1 "register_operand")
+(match_operand:DI 2 "register_operand")))]
+  ""
+{
+  rtx_code_label *label = gen_label_rtx ();
+  rtx lo_dest, hi_dest, lo_op0, hi_op0, lo_op1, hi_op1;
+  lo_dest = gen_lowpart (SImode, operands[0]);
+  hi_dest = gen_highpart (SImode, operands[0]);
+  lo_op0 = gen_lowpart (SImode, operands[1]);
+  hi_op0 = gen_highpart (SImode, operands[1]);
+  lo_op1 = gen_lowpart (SImode, operands[2]);
+  hi_op1 = gen_highpart (SImode, operands[2]);
+  if (rtx_equal_p (lo_dest, lo_op1))
+FAIL;
+  emit_clobber (operands[0]);
+  emit_insn (gen_addsi3 (lo_dest, lo_op0, lo_op1));
+  emit_insn (gen_addsi3 (hi_dest, hi_op0, hi_op1));
+  emit_cmp_and_jump_insns (lo_dest, lo_op1, GEU,
+  const0_rtx, SImode, true, label);
+  emit_insn (gen_addsi3 (hi_dest, hi_dest, const1_rtx));
+  emit_label (label);
+  DONE;
+})
+
 (define_insn "addsf3"
   [(set (match_operand:SF 0 "register_operand" "=f")
(plus:SF (match_operand:SF 1 "register_operand" "%f")
@@ -237,6 +263,32 @@
  (const_int 5)
  (const_int 6)))])
 
+(define_expand "subdi3"
+  [(set (match_operand:DI 0 "register_operand")
+   (minus:DI (match_operand:DI 1 "register_operand")
+ (match_operand:DI 2 "register_operand")))]
+  ""
+{
+  rtx_code_label *label = gen_label_rtx ();
+  rtx lo_dest, hi_dest, lo_op0, hi_op0, lo_op1, hi_op1;
+  lo_dest = gen_lowpart (SImode, operands[0]);
+  hi_dest = gen_highpart (SImode, operands[0]);
+  lo_op0 = gen_lowpart (SImode, operands[1]);
+  hi_op0 = gen_highpart (SImode, operands[1]);
+  lo_op1 = gen_lowpart (SImode, operands[2]);
+  hi_op1 = gen_highpart (SImode, operands[2]);
+  if (rtx_equal_p (lo_op0, lo_op1))
+FAIL;
+  emit_clobber (operands[0]);
+  emit_insn (gen_subsi3 (lo_dest, lo_op0, lo_op1));
+  emit_insn (gen_subsi3 (hi_dest, hi_op0, hi_op1));
+  emit_cmp_and_jump_insns (lo_op0, lo_op1, GEU,
+  const0_rtx, SImode, true, label);
+  emit_insn (gen_addsi3 (hi_dest, hi_dest, constm1_rtx));
+  emit_label (label);
+  DONE;
+})
+
 (define_insn "subsf3"
   [(set (match_operand:SF 0 "register_operand" "=f")
(minus:SF (match_operand:SF 1 "register_operand" "f")
-- 
2.30.2


[PATCH 3/3] xtensa: Optimize 'cstoresi4' insn pattern

2023-05-30 Thread Takayuki 'January June' Suwa via Gcc-patches
This patch introduces more optimized implementations for the 6 cstoresi4
insn comparison methods (eq/ne/lt/le/gt/ge; eq, however, requires
TARGET_NSA).
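
For example, the EQ method corresponds to this C sketch (NSAU returns 32 for
a zero input):

#include <stdint.h>

static unsigned
nsau (uint32_t x)                 /* models the Xtensa NSAU instruction */
{
  return x ? (unsigned) __builtin_clz (x) : 32;
}

unsigned
eq_p (uint32_t a, uint32_t b)
{
  return nsau (a - b) >> 5;       /* 32 >> 5 == 1; 0..31 >> 5 == 0 */
}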

gcc/ChangeLog:

* config/xtensa/xtensa.cc (xtensa_expand_scc):
Add dedicated optimization code for cstoresi4 (eq/ne/gt/ge/lt/le).
* config/xtensa/xtensa.md (xtensa_ge_zero):
Rename from '*signed_ge_zero', because it had to be called from
'xtensa_expand_scc()'.
---
 gcc/config/xtensa/xtensa.cc | 106 
 gcc/config/xtensa/xtensa.md |  14 ++---
 2 files changed, 102 insertions(+), 18 deletions(-)

diff --git a/gcc/config/xtensa/xtensa.cc b/gcc/config/xtensa/xtensa.cc
index 3b5d25b660a..64efd3d7287 100644
--- a/gcc/config/xtensa/xtensa.cc
+++ b/gcc/config/xtensa/xtensa.cc
@@ -991,24 +991,108 @@ xtensa_expand_conditional_move (rtx *operands, int isflt)
 int
 xtensa_expand_scc (rtx operands[4], machine_mode cmp_mode)
 {
-  rtx dest = operands[0];
-  rtx cmp;
-  rtx one_tmp, zero_tmp;
+  rtx dest = operands[0], op0 = operands[2], op1 = operands[3];
+  enum rtx_code code = GET_CODE (operands[1]);
+  rtx cmp, tmp0, tmp1;
   rtx (*gen_fn) (rtx, rtx, rtx, rtx, rtx);
 
-  if (!(cmp = gen_conditional_move (GET_CODE (operands[1]), cmp_mode,
-   operands[2], operands[3])))
-return 0;
+  /* Dedicated optimizations for cstoresi4.
+ a. In a magnitude comparison operator, swapping both sides and
+   inverting magnitude does not change the result,
+   eg. '(x >= y) != (y <= x)' is a constant of zero
+   (GE is changed to LE, not LT).
+ b. Due to room for further optimization, we use subtraction rather
+   than XOR (the default for RTL expansion of EQ/NE) as the binary
+   operation which is zero if both sides are the same and non-zero
+   otherwise.  */
+  if (cmp_mode == SImode)
+switch (code)
+  {
+  /* EQ(op0, op1) := clz(op0 - op1) / 32 [requires TARGET_NSA] */
+  case EQ:
+   if (!TARGET_NSA)
+ break;
+   /* EQ to EQZ conversion by subtracting op1 from op0.  */
+   emit_move_insn (dest,
+   expand_binop (SImode, sub_optab, op0, op1,
+ 0, 0, OPTAB_LIB_WIDEN));
+   /* NSAU instruction will return 32 iff the source is zero,
+  zero through 31 otherwise (See Xtensa ISA Reference Manual,
+  p. 462)  */
+   emit_insn (gen_clzsi2 (dest, dest));
+   emit_insn (gen_lshrsi3 (dest, dest, GEN_INT (5)));
+   return 1;
+
+  /* NE(op0, op1) := (op0 - op1) == 0 ? 0 : 1 */
+  case NE:
+   /* NE to NEZ conversion by subtracting op1 from op0.  */
+   emit_move_insn (tmp0 = gen_reg_rtx (SImode),
+   expand_binop (SImode, sub_optab, op0, op1,
+ 0, 0, OPTAB_LIB_WIDEN));
+   emit_move_insn (dest, const_true_rtx);
+   emit_move_insn (dest,
+   gen_rtx_fmt_eee (IF_THEN_ELSE, SImode,
+gen_rtx_fmt_ee (EQ, VOIDmode,
+tmp0, const0_rtx),
+tmp0, dest));
+   return 1;
+
+  case LE:
+   if (REG_P (op1))
+ {
+   /* LE to GE conversion by swapping both sides.  */
+   tmp0 = op0, op0 = op1, op1 = tmp0;
+   goto case_GE_reg;
+ }
+   /* LE to LT conversion by adding one to op1.  */
+   op1 = GEN_INT (INTVAL (op1) + 1);
+   /* fallthru */
+
+  /* LT(op0, op1) := (unsigned)(op0 - op1) >> 31 */
+  case LT:
+case_LT:
+   /* LT to LTZ conversion by subtracting op1 from op0.  */
+   emit_move_insn (dest,
+   expand_binop (SImode, sub_optab, op0, op1,
+ 0, 0, OPTAB_LIB_WIDEN));
+   emit_insn (gen_lshrsi3 (dest, dest, GEN_INT (31)));
+   return 1;
+
+  case GE:
+   if (REG_P (op1))
+ {
+case_GE_reg:
+   /* GE to GEZ conversion by subtracting op1 from op0.  */
+   emit_move_insn (dest,
+   expand_binop (SImode, sub_optab, op0, op1,
+ 0, 0, OPTAB_LIB_WIDEN));
+   /* Emitting the dedicated insn pattern.  */
+   emit_insn (gen_xtensa_ge_zero (dest, dest));
+   return 1;
+ }
+   /* GE to GT conversion by subtracting one from op1.  */
+   op1 = GEN_INT (INTVAL (op1) - 1);
+   /* fallthru */
 
-  one_tmp = gen_reg_rtx (SImode);
-  zero_tmp = gen_reg_rtx (SImode);
-  emit_insn (gen_movsi (one_tmp, const_true_rtx));
-  emit_insn (gen_movsi (zero_tmp, const0_rtx));
+  case GT:
+   /* GT to LT conversion by swapping both sides.  */
+   tmp0 = op0, op0 = op1, op1 = tmp0;
+   goto case_LT;
 
+  default:
+   break;
+  }
+
+  if (! (cmp = gen_conditional_move (code, cmp_mode, op0, op1)))
+return 0;
+
+  tmp0 = force_reg (SImo

Re: Re: decrement IV patch creates failures on PowerPC

2023-05-30 Thread juzhe.zh...@rivai.ai
Ok.

It seems that for these conditions:

+  /* If we're vectorizing a loop that uses length "controls" and
+ can iterate more than once, we apply decrementing IV approach
+ in loop control.  */
+  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
+  && !LOOP_VINFO_LENS (loop_vinfo).is_empty ()
+  && LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) == 0
+  && !(LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+  && known_le (LOOP_VINFO_INT_NITERS (loop_vinfo),
+   LOOP_VINFO_VECT_FACTOR (loop_vinfo
+LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) = true;

I should add a direct_supportted_p (SELECT_VL, ...) check to this, is that right?

I have sent the SELECT_VL patch. I will add this in the next SELECT_VL patch.

Let's wait for Richard's further comments.

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-05-30 17:22
To: juzhe.zh...@rivai.ai
CC: gcc-patches; richard.sandiford; linkw
Subject: Re: Re: decrement IV patch creates failures on PowerPC
On Fri, 26 May 2023, juzhe.zh...@rivai.ai wrote:
 
> Hi, Richi. Thanks for your analysis and help.
> 
> >> We could simply retain the original
> >> incrementing IV for loop control and add the decrementing
> >> IV for computing LEN in addition to that and leave IVOPTs
> >> sorting out to eventually merge them (or not).
> 
> I am not sure how to do that. Could you give me more informations?
> 
> I somehow understand your concern is that variable amount of IV will make
> IVOPT fails. 
> 
> I have seen similar situation in LLVM (when apply variable IV,
> they failed to interleave the vectorize code). I am not sure whether they
> are the same reason for that.
> 
> For RVV, we not only want decrement IV style in vectorization but also
> we want to apply SELECT_VL in single-rgroup which is most happen cases (LLVM 
> also only apply get_vector_length in single vector length).
>
> >>You can do some testing with a cross compiler, alternatively
> >>there are powerpc machines in the GCC compile farm.
> 
> It seems that Power is ok with decrement IV since most cases are improved.
 
Well, but Power never will have SELECT_VL so at least for !SELECT_VL
targets you should avoid having an IV with variable decrement.  As
I said it should be easy to rewrite decrement IV to use a constant
increment (when not using SELECT_VL) and testing the pre-decrement
value in the exit test.
 
Richard.
> I think Richard may help to explain decrement IV more clearly.
> 
> Thanks
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-05-26 14:46
> To: ???
> CC: gcc-patches; richard.sandiford; linkw
> Subject: Re: decrement IV patch creates failures on PowerPC
> On Fri, 26 May 2023, ??? wrote:
>  
> > Yesterday's patch has been approved (decrement IV support):
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619663.html 
> > 
> > However, it creates failures on PowerPC:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109971 
> > 
> > I am really sorry for causing inconvenience.
> > 
> > I wonder, as we discussed:
> > +  /* If we're vectorizing a loop that uses length "controls" and
> > + can iterate more than once, we apply decrementing IV approach
> > + in loop control.  */
> > +  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
> > +  && !LOOP_VINFO_LENS (loop_vinfo).is_empty ()
> > +  && LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) == 0
> > +  && !(LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> > +&& known_le (LOOP_VINFO_INT_NITERS (loop_vinfo),
> > + LOOP_VINFO_VECT_FACTOR (loop_vinfo
> > +LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) = true;
> > 
> > These conditions cannot disable the decrement IV on PowerPC.
> > Should I add a target hook for it?
>  
> No.  I've put some analysis in the PR.  To me the question is
> why (without that SELECT_VL case) we need a decrementing IV
> _for the loop control_?  We could simply retain the original
> incrementing IV for loop control and add the decrementing
> IV for computing LEN in addition to that and leave IVOPTs
> sorting out to eventually merge them (or not).
>  
> Alternatively avoid the variable decrement as I wrote in the
> PR and do the exit test based on the previous IV value.
>  
> But as said all this won't work for the SELECT_VL case, but
> then it's availability is something to key off rather than a
> new target hook?
>  
> > I could only bootstrap and regression-test the patch on x86.
> > I didn't have an environment to test PowerPC. I am really sorry.
>  
> You can do some testing with a cross compiler, alternatively
> there are powerpc machines in the GCC compile farm.
>  
> Richard.
>  
> 
 
-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
 


Re: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V

2023-05-30 Thread Richard Biener via Gcc-patches
On Tue, May 30, 2023 at 11:17 AM juzhe.zh...@rivai.ai
 wrote:
>
> In the future, we will definitely mixing VLA and VLS-vlmin together in a 
> codegen and it will not cause any issues.
> For VLS-vlmin, I prefer it is used in length style auto-vectorization (I am 
> not sure since my SELECT_VL patch is not
> finished, I will check if can work when I am working in SELECT_VL patch).

For the future it would then be good to have the vectorizer re-vectorize
loops with VLS vector uses to VLA style?  I think there's a PR with a draft
patch from a few years ago attached (from me) somewhere.  Currently the
vectorizer will give up when seeing vector operations in a loop, but ideally
those should simply be SLPed.

> >> In general I don't have a good overview of which optimizations we gain by
> >> such an approach or rather which ones are prevented by VLA altogether?
> These patches VLS modes can help for SLP auto-vectorization.
>
> 
> juzhe.zh...@rivai.ai
>
>
> From: Robin Dapp
> Date: 2023-05-30 17:05
> To: juzhe.zh...@rivai.ai; Richard Biener; Kito.cheng
> CC: rdapp.gcc; gcc-patches; palmer; kito.cheng; jeffreyalaw; pan2.li
> Subject: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V
> >>> but ideally the user would be able to specify -mrvv-size=32 for an
> >>> implementation with 32 byte vectors and then vector lowering would make 
> >>> use
> >>> of vectors up to 32 bytes?
> >
> > Actually, we don't want to specify -mrvv-size = 32 to enable vectorization 
> > on GNU vectors.
> > You can take a look this example:
> > https://godbolt.org/z/3jYqoM84h 
> >
> > GCC need to specify the mrvv size to enable GNU vectors and the codegen 
> > only can run on CPU with vector-length = 128bit.
> > However, LLVM doesn't need to specify the vector length, and the codegen 
> > can run on any CPU with RVV  vector-length >= 128 bits.
> >
> > This is what this patch want to do.
> >
> > Thanks.
> I think Richard's question was rather if it wasn't better to do it more
> generically and lower vectors to what either the current cpu or what the
> user specified rather than just 16-byte vectors (i.e. indeed a fixed
> vlmin and not a fixed vlmin == fixed vlmax).
>
> This patch assumes everything is fixed for optimization purposes and then
> switches over to variable-length when nothing can be changed anymore.  That
> is, we would work on "vlmin"-sized chunks in a VLA fashion at runtime?
> We would need to make sure that no pass after reload makes use of VLA
> properties at all.
>
> In general I don't have a good overview of which optimizations we gain by
> such an approach or rather which ones are prevented by VLA altogether?
> What's the idea for the future?  Still use LEN_LOAD et al. (and masking)
> with "fixed vlmin"?  Wouldn't we select different IVs with this patch than
> what we would have for pure VLA?
>
> Regards
> Robin
>
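
For reference, a minimal example of the fixed-size GNU vector code under
discussion (a hypothetical stand-in for the godbolt link above, not taken
from the patch):

typedef int v4si __attribute__ ((vector_size (16)));

v4si
add (v4si a, v4si b)
{
  /* A V4SImode operation; the patch under discussion emits such
     fixed-size (VLS) operations via the existing VLA RVV patterns.  */
  return a + b;
}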


Re: [PATCH] MATCH: Move `a <= CST1 ? MAX : a` optimization to match

2023-05-30 Thread Richard Biener via Gcc-patches
On Mon, May 8, 2023 at 12:21 AM Andrew Pinski via Gcc-patches
 wrote:
>
> This moves the `a <= CST1 ? MAX : a` optimization
> from phiopt to match. It just adds a new pattern to match.pd.
>
> There is one more change needed before being able to remove
> minmax_replacement from phiopt.
>
> A few notes on the testsuite changes:
> * phi-opt-5.c is now able to optimize at phiopt1 so remove
> the xfail.
> * pr66726-4.c can be optimized during fold before phiopt1,
> so the scanning needs to change.
> * pr66726-5.c currently needs two phiopt passes to optimize
> to the right thing; it needed 2 phiopt passes before, and the cast
> from int to unsigned char is the reason.
> * pr66726-6.c is what the original pr66726-4.c was testing
> before the fold was able to optimize it.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu.

OK.

> gcc/ChangeLog:
>
> * match.pd (`(a CMP CST1) ? max : a`): New
> pattern.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/phi-opt-5.c: Remove last xfail.
> * gcc.dg/tree-ssa/pr66726-4.c: Change how scanning
> works.
> * gcc.dg/tree-ssa/pr66726-5.c: New test.
> * gcc.dg/tree-ssa/pr66726-6.c: New test.
> ---
>  gcc/match.pd  | 18 +++
>  gcc/testsuite/gcc.dg/tree-ssa/phi-opt-5.c |  2 +-
>  gcc/testsuite/gcc.dg/tree-ssa/pr66726-4.c |  5 +++-
>  gcc/testsuite/gcc.dg/tree-ssa/pr66726-5.c | 28 +++
>  gcc/testsuite/gcc.dg/tree-ssa/pr66726-6.c | 17 ++
>  5 files changed, 68 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr66726-5.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr66726-6.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index ceae1c34abc..a55ede838cd 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -4954,6 +4954,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (if (code == MAX_EXPR)
>(minmax (max @1 @2) @4)))
>
> +/* Optimize (a CMP CST1) ? max : a */
> +(for cmp(gt  ge  lt  le)
> + minmax (min min max max)
> + (simplify
> +  (cond (cmp @0 @1) (minmax:c@2 @0 @3) @4)
> +   (with
> +{
> +  tree_code code = minmax_from_comparison (cmp, @0, @1, @0, @4);
> +}
> +(if ((cmp == LT_EXPR || cmp == LE_EXPR)
> +&& code == MIN_EXPR
> + && integer_nonzerop (fold_build2 (LE_EXPR, boolean_type_node, @3, 
> @1)))
> + (min @2 @4)
> + (if ((cmp == GT_EXPR || cmp == GE_EXPR)
> + && code == MAX_EXPR
> +  && integer_nonzerop (fold_build2 (GE_EXPR, boolean_type_node, @3, 
> @1)))
> +  (max @2 @4))
> +
>  /* X != C1 ? -X : C2 simplifies to -X when -C1 == C2.  */
>  (simplify
>   (cond (ne @0 INTEGER_CST@1) (negate@3 @0) INTEGER_CST@2)
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-5.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-5.c
> index 5f78a1ba6dc..e78d9d8b83d 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-5.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-5.c
> @@ -39,7 +39,7 @@ float repl2 (float vary)
>
>  /* phiopt1 confused by predictors.  */
>  /* { dg-final { scan-tree-dump "vary.*MAX_EXPR.*0\\.0" "phiopt1" } } */
> -/* { dg-final { scan-tree-dump "vary.*MIN_EXPR.*1\\.0" "phiopt1" { xfail 
> *-*-* } } } */
> +/* { dg-final { scan-tree-dump "vary.*MIN_EXPR.*1\\.0" "phiopt1" } } */
>  /* { dg-final { scan-tree-dump "vary.*MAX_EXPR.*0\\.0" "phiopt2"} } */
>  /* { dg-final { scan-tree-dump "vary.*MIN_EXPR.*1\\.0" "phiopt2"} } */
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr66726-4.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr66726-4.c
> index 4e43522f3a3..930ad5fb79f 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/pr66726-4.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr66726-4.c
> @@ -9,4 +9,7 @@ foo (unsigned char *p, int i)
>*p = SAT (i);
>  }
>
> -/* { dg-final { scan-tree-dump-times "COND_EXPR .*and PHI .*converted to 
> straightline code" 1 "phiopt1" } } */
> +/* fold could optimize SAT before phiopt1 so only match on the
> +   MIN/MAX here.  */
> +/* { dg-final { scan-tree-dump-times "= MIN_EXPR" 1 "phiopt1" } } */
> +/* { dg-final { scan-tree-dump-times "= MAX_EXPR" 1 "phiopt1" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr66726-5.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr66726-5.c
> new file mode 100644
> index 000..4b5066cdb6b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr66726-5.c
> @@ -0,0 +1,28 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-phiopt1-details -fdump-tree-phiopt2-details 
> -fdump-tree-optimized" } */
> +
> +#define SAT(x) (x < 0 ? 0 : (x > 255 ? 255 : x))
> +
> +unsigned char
> +foo (unsigned char *p, int i)
> +{
> +  if (i < 0)
> +return 0;
> +  {
> +int t;
> +if (i > 255)
> +  t = 255;
> +else
> +  t = i;
> +return t;
> +  }
> +}
> +
> +/* Because of the way PHIOPT works, it only does the merging of BBs after it 
> is done, so we get the case where we can't
> +   optimize the above until phiopt2 right now.  */
> +/* { dg-final { scan-tree-dump-ti

Re: [PATCH] Add a != MIN/MAX_VALUE_CST ? CST-+1 : a to minmax_from_comparison

2023-05-30 Thread Richard Biener via Gcc-patches
On Mon, May 8, 2023 at 7:27 AM Andrew Pinski via Gcc-patches
 wrote:
>
> This patch adds to match the support that was implemented for PR 87913 in 
> phiopt.
> It implements this by adding support to minmax_from_comparison for the check.
> It uses the range information if available, which allows producing a MIN/MAX 
> expression
> when comparing against the lower/upper bound of the range instead of the 
> lower/upper bound
> of the type.
>
> minmax-20.c is the new testcase which tests the ranges part.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.
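
As a quick sanity check of the idea (not part of the patch), the equivalence
the range-based folding relies on can be spot-checked in plain C; the variable
names are illustrative:

#include <assert.h>

int
main (void)
{
  /* With num's range narrowed to [3, ...] as in the testcase below,
     "num != 3 ? num : 4" agrees with MAX_EXPR (num, 4) on every value.  */
  for (int num = 3; num <= 1000; num++)
    {
      int a = num != 3 ? num : 4;
      int b = num > 4 ? num : 4;   /* MAX_EXPR (num, 4) */
      assert (a == b);
    }
  return 0;
}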

> gcc/ChangeLog:
>
> * fold-const.cc (minmax_from_comparison): Add support for NE_EXPR.
> * match.pd ((cond (cmp (convert1? x) c1) (convert2? x) c2) pattern):
> Add ne as a possible cmp.
> ((a CMP b) ? minmax : minmax pattern): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/minmax-20.c: New test.
> ---
>  gcc/fold-const.cc | 26 +++
>  gcc/match.pd  |  4 ++--
>  gcc/testsuite/gcc.dg/tree-ssa/minmax-20.c | 12 +++
>  3 files changed, 40 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/minmax-20.c
>
> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> index db54bfc5662..d90671b9975 100644
> --- a/gcc/fold-const.cc
> +++ b/gcc/fold-const.cc
> @@ -173,6 +173,19 @@ minmax_from_comparison (tree_code cmp, tree exp0, tree 
> exp1, tree exp2, tree exp
>   /* X > Y - 1 equals to X >= Y.  */
>   if (cmp == GT_EXPR)
> code = GE_EXPR;
> + /* a != MIN_RANGE ? a : MIN_RANGE+1 -> 
> MAX_EXPR<MIN_RANGE+1, a> */
> + if (cmp == NE_EXPR && TREE_CODE (exp0) == SSA_NAME)
> +   {
> + value_range r;
> + get_range_query (cfun)->range_of_expr (r, exp0);
> + if (r.undefined_p ())
> +   r.set_varying (TREE_TYPE (exp0));
> +
> + widest_int min = widest_int::from (r.lower_bound (),
> +TYPE_SIGN (TREE_TYPE 
> (exp0)));
> + if (min == wi::to_widest (exp1))
> +   code = MAX_EXPR;
> +   }
> }
>if (wi::to_widest (exp1) == (wi::to_widest (exp3) + 1))
> {
> @@ -182,6 +195,19 @@ minmax_from_comparison (tree_code cmp, tree exp0, tree 
> exp1, tree exp2, tree exp
>   /* X >= Y + 1 equals to X > Y.  */
>   if (cmp == GE_EXPR)
>   code = GT_EXPR;
> + /* a != MAX_RANGE ? a : MAX_RANGE-1 -> 
> MIN_EXPR<MAX_RANGE-1, a> */
> + if (cmp == NE_EXPR && TREE_CODE (exp0) == SSA_NAME)
> +   {
> + value_range r;
> + get_range_query (cfun)->range_of_expr (r, exp0);
> + if (r.undefined_p ())
> +   r.set_varying (TREE_TYPE (exp0));
> +
> + widest_int max = widest_int::from (r.upper_bound (),
> +TYPE_SIGN (TREE_TYPE 
> (exp0)));
> + if (max == wi::to_widest (exp1))
> +   code = MIN_EXPR;
> +   }
> }
>  }
>if (code != ERROR_MARK
> diff --git a/gcc/match.pd b/gcc/match.pd
> index a55ede838cd..95f7e9a6abc 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -4751,7 +4751,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>
> Case 2)
> (cond (eq (convert1? x) c1) (convert2? x) c2) -> (cond (eq x c1) c1 c2).  
> */
> -(for cmp (lt le gt ge eq)
> +(for cmp (lt le gt ge eq ne)
>   (simplify
>(cond (cmp (convert1? @1) INTEGER_CST@3) (convert2? @1) INTEGER_CST@2)
>(with
> @@ -4942,7 +4942,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  /* Optimize (a CMP b) ? minmax<a, c> : minmax<b, c>
> to minmax<min/max<a, b>, c> */
>  (for minmax (min max)
> - (for cmp (lt le gt ge)
> + (for cmp (lt le gt ge ne)
>(simplify
> (cond (cmp @1 @3) (minmax:c @1 @4) (minmax:c @2 @4))
> (with
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-20.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/minmax-20.c
> new file mode 100644
> index 000..481c375f5f9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-20.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-phiopt2" } */
> +
> +int f(int num)
> +{
> +  if (num < 3) __builtin_unreachable();
> +  return num != 3 ?  num : 4;
> +}
> +
> +/* In phiopt2 with the range information, this should be turned into
> +   a MAX_EXPR.  */
> +/* { dg-final { scan-tree-dump-times "MAX_EXPR" 1 "phiopt2" } } */
> --
> 2.31.1
>


Re: [PATCH] riscv: update riscv_asan_shadow_offset

2023-05-30 Thread Andreas Schwab via Gcc-patches
Ok for the 12 and 13 branches?

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V

2023-05-30 Thread juzhe.zh...@rivai.ai
>> For the future it would be then good to have the vectorizer
>>re-vectorize loops with
>>VLS vector uses to VLA style?
 Not really, this patch just uses a trick to convert VLS vectors into VLA style, 
since
 that avoids defining the RVV patterns with VLS modes and avoids a lot of work.

 There is no benefit in this kind of VLS-to-VLA conversion,
 and I don't even consider it safe.

especially this code:
+   case MEM: 
+ operands[i] = change_address (operands[i], vla_mode, NULL_RTX); 

I feel it is unsafe code.

Actually, my original plan was to define new RVV patterns with new VLS modes 
(the patterns are the same as the VLA patterns, just with different modes),
then emit codegen via these VLS RVV patterns.




juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-05-30 17:29
To: juzhe.zh...@rivai.ai
CC: Robin Dapp; Kito.cheng; gcc-patches; palmer; kito.cheng; jeffreyalaw; 
pan2.li
Subject: Re: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V
On Tue, May 30, 2023 at 11:17 AM juzhe.zh...@rivai.ai
 wrote:
>
> In the future, we will definitely be mixing VLA and VLS-vlmin together in 
> codegen, and it will not cause any issues.
> For VLS-vlmin, I prefer that it be used in length-style auto-vectorization (I am 
> not sure yet, since my SELECT_VL patch is not
> finished; I will check whether it can work while working on the SELECT_VL patch).
 
For the future it would be then good to have the vectorizer
re-vectorize loops with
VLS vector uses to VLA style?  I think there's a PR with a draft patch
from a few
years ago attached (from me) somewhere.  Currently the vectorizer will give
up when seeing vector operations in a loop but ideally those should simply
be SLPed.
 
> >> In general I don't have a good overview of which optimizations we gain by
> >> such an approach or rather which ones are prevented by VLA altogether?
> These patches' VLS modes can help with SLP auto-vectorization.
>
> 
> juzhe.zh...@rivai.ai
>
>
> From: Robin Dapp
> Date: 2023-05-30 17:05
> To: juzhe.zh...@rivai.ai; Richard Biener; Kito.cheng
> CC: rdapp.gcc; gcc-patches; palmer; kito.cheng; jeffreyalaw; pan2.li
> Subject: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V
> >>> but ideally the user would be able to specify -mrvv-size=32 for an
> >>> implementation with 32 byte vectors and then vector lowering would make 
> >>> use
> >>> of vectors up to 32 bytes?
> >
> > Actually, we don't want to specify -mrvv-size = 32 to enable vectorization 
> > on GNU vectors.
> > You can take a look at this example:
> > https://godbolt.org/z/3jYqoM84h 
> >
> > GCC needs to specify the mrvv size to enable GNU vectors, and the codegen 
> > can only run on CPUs with vector-length = 128 bits.
> > However, LLVM doesn't need to specify the vector length, and the codegen 
> > can run on any CPU with RVV vector-length >= 128 bits.
> >
> > This is what this patch wants to do.
> >
> > Thanks.
> I think Richard's question was rather if it wasn't better to do it more
> generically and lower vectors to what either the current cpu or what the
> user specified rather than just 16-byte vectors (i.e. indeed a fixed
> vlmin and not a fixed vlmin == fixed vlmax).
>
> This patch assumes everything is fixed for optimization purposes and then
> switches over to variable-length when nothing can be changed anymore.  That
> is, we would work on "vlmin"-sized chunks in a VLA fashion at runtime?
> We would need to make sure that no pass after reload makes use of VLA
> properties at all.
>
> In general I don't have a good overview of which optimizations we gain by
> such an approach or rather which ones are prevented by VLA altogether?
> What's the idea for the future?  Still use LEN_LOAD et al. (and masking)
> with "fixed vlmin"?  Wouldn't we select different IVs with this patch than
> what we would have for pure VLA?
>
> Regards
> Robin
>
 


[PATCH][committed] aarch64: Reimplement v(r)hadd and vhsub intrinsics with RTL codes

2023-05-30 Thread Kyrylo Tkachov via Gcc-patches
Hi all,

This patch reimplements the MD patterns for the 
UHADD,SHADD,UHSUB,SHSUB,URHADD,SRHADD instructions using
standard RTL operations rather than unspecs. The correct RTL representation 
involves widening
the inputs before adding them and halving, followed by a truncation back to the 
original mode.
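
In scalar terms, the element-wise semantics being expressed are as follows (a
sketch for the unsigned byte case only; these match the avg_floor/avg_ceil
optab definitions, and the helper names are illustrative, not from the patch):

#include <stdint.h>

static uint8_t
uhadd (uint8_t a, uint8_t b)    /* UHADD: avg_floor */
{
  return (uint8_t) (((uint16_t) a + b) >> 1);
}

static uint8_t
urhadd (uint8_t a, uint8_t b)   /* URHADD: avg_ceil, i.e. rounding */
{
  return (uint8_t) (((uint16_t) a + b + 1) >> 1);
}
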
An unfortunate wart in the patch is that we end up having very similar 
expanders for the intrinsics
through the aarch64_h and aarch64_rhadd names 
and the standard names
for the vector averaging optabs avg3_floor and avg3_ceil.
I'd like to reuse avg3_ceil for the intrinsics builtin as well but 
our scheme
in aarch64-simd-builtins.def and aarch64-builtins.cc makes it awkward by only 
allowing mappings
of entries in aarch64-simd-builtins.def to:
   0 - CODE_FOR_aarch64_
   1-9 - CODE_FOR_<1-9>
   10 - CODE_FOR_

whereas here we want a string after the  i.e. CODE_FOR_uavg3_ceil.
This patch adds a bit of remapping logic in aarch64-builtins.cc before the 
construction of the
builtin info that remaps the CODE_FOR_* definitions in 
aarch64-simd-builtins.def to the
optab-derived ones. CODE_FOR_aarch64_srhaddv4si gets remapped to 
CODE_FOR_avgv4si3_ceil, for example.
It's a bit specific to this case, but this solution requires the least invasive 
changes while avoiding
having duplicate expanders just for the sake of a different pattern name.

Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
Pushing to trunk.
Thanks,
Kyrill

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc (VAR1): Move to after inclusion of
aarch64-builtin-iterators.h.  Add definition to remap shadd, uhadd,
srhadd, urhadd builtin codes for standard optab ones.
* config/aarch64/aarch64-simd.md (avg3_floor): Rename to...
(avg3_floor): ... This.  Expand to RTL codes rather than
unspec.
(avg3_ceil): Rename to...
(avg3_ceil): ... This.  Expand to RTL codes rather than
unspec.
(aarch64_hsub): New define_expand.
(aarch64_h): Split into...
(*aarch64_h_insn): ... This...
(*aarch64_rhadd_insn): ... And this.


vrhadd.patch
Description: vrhadd.patch


Re: [PATCH] Detect bswap + rotate for byte permutation in pass_bswap.

2023-05-30 Thread Richard Biener via Gcc-patches
On Tue, May 9, 2023 at 9:06 AM liuhongt via Gcc-patches
 wrote:
>
> The patch doesn't handle:
>   1. cast64_to_32,
>   2. memory source with rsize < range.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?

OK and sorry for the delay.

Richard.
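
For context, this is the shape of source the pass can now recognize (a
hypothetical example, not taken from the testsuite): the byte permutation
ABCD -> BADC below is __builtin_bswap32 followed by a rotate by 16.

#include <stdint.h>

uint32_t
perm (uint32_t x)
{
  /* Swap the bytes within each 16-bit half: ABCD -> BADC,
     which equals bswap32 (x) rotated left by 16.  */
  return ((x & 0x00ff00ffu) << 8) | ((x & 0xff00ff00u) >> 8);
}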

> gcc/ChangeLog:
>
> PR middle-end/108938
> * gimple-ssa-store-merging.cc (is_bswap_or_nop_p): New
> function, cut from original find_bswap_or_nop function.
> (find_bswap_or_nop): Add a new parameter, detect bswap +
> rotate and save rotate result in the new parameter.
> (bswap_replace): Add a new parameter to indicate rotate and
> generate rotate stmt if needed.
> (maybe_optimize_vector_constructor): Adjust for new rotate
> parameter in the upper 2 functions.
> (pass_optimize_bswap::execute): Ditto.
> (imm_store_chain_info::output_merged_store): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr108938-1.c: New test.
> * gcc.target/i386/pr108938-2.c: New test.
> * gcc.target/i386/pr108938-3.c: New test.
> * gcc.target/i386/pr108938-load-1.c: New test.
> * gcc.target/i386/pr108938-load-2.c: New test.
> ---
>  gcc/gimple-ssa-store-merging.cc   | 130 ++
>  gcc/testsuite/gcc.target/i386/pr108938-1.c|  79 +++
>  gcc/testsuite/gcc.target/i386/pr108938-2.c|  35 +
>  gcc/testsuite/gcc.target/i386/pr108938-3.c|  26 
>  .../gcc.target/i386/pr108938-load-1.c |  69 ++
>  .../gcc.target/i386/pr108938-load-2.c |  30 
>  6 files changed, 342 insertions(+), 27 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr108938-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr108938-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr108938-3.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr108938-load-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr108938-load-2.c
>
> diff --git a/gcc/gimple-ssa-store-merging.cc b/gcc/gimple-ssa-store-merging.cc
> index df7afd2fd78..9cb574fa315 100644
> --- a/gcc/gimple-ssa-store-merging.cc
> +++ b/gcc/gimple-ssa-store-merging.cc
> @@ -893,6 +893,37 @@ find_bswap_or_nop_finalize (struct symbolic_number *n, 
> uint64_t *cmpxchg,
>n->range *= BITS_PER_UNIT;
>  }
>
> +/* Helper function for find_bswap_or_nop.
> +   Return true if N is a swap or nop with MASK.  */
> +static bool
> +is_bswap_or_nop_p (uint64_t n, uint64_t cmpxchg,
> +  uint64_t cmpnop, uint64_t* mask,
> +  bool* bswap)
> +{
> +  *mask = ~(uint64_t) 0;
> +  if (n == cmpnop)
> +*bswap = false;
> +  else if (n == cmpxchg)
> +*bswap = true;
> +  else
> +{
> +  int set = 0;
> +  for (uint64_t msk = MARKER_MASK; msk; msk <<= BITS_PER_MARKER)
> +   if ((n & msk) == 0)
> + *mask &= ~msk;
> +   else if ((n & msk) == (cmpxchg & msk))
> + set++;
> +   else
> + return false;
> +
> +  if (set < 2)
> +   return false;
> +  *bswap = true;
> +}
> +  return true;
> +}
> +
> +
>  /* Check if STMT completes a bswap implementation or a read in a given
> endianness consisting of ORs, SHIFTs and ANDs and sets *BSWAP
> accordingly.  It also sets N to represent the kind of operations
> @@ -903,7 +934,7 @@ find_bswap_or_nop_finalize (struct symbolic_number *n, 
> uint64_t *cmpxchg,
>
>  gimple *
>  find_bswap_or_nop (gimple *stmt, struct symbolic_number *n, bool *bswap,
> -  bool *cast64_to_32, uint64_t *mask)
> +  bool *cast64_to_32, uint64_t *mask, uint64_t* l_rotate)
>  {
>tree type_size = TYPE_SIZE_UNIT (TREE_TYPE (gimple_get_lhs (stmt)));
>if (!tree_fits_uhwi_p (type_size))
> @@ -984,29 +1015,57 @@ find_bswap_or_nop (gimple *stmt, struct 
> symbolic_number *n, bool *bswap,
>  }
>
>uint64_t cmpxchg, cmpnop;
> +  uint64_t orig_range = n->range * BITS_PER_UNIT;
>find_bswap_or_nop_finalize (n, &cmpxchg, &cmpnop, cast64_to_32);
>
>/* A complete byte swap should make the symbolic number to start with
>   the largest digit in the highest order byte. Unchanged symbolic
>   number indicates a read with same endianness as target architecture.  */
> -  *mask = ~(uint64_t) 0;
> -  if (n->n == cmpnop)
> -*bswap = false;
> -  else if (n->n == cmpxchg)
> -*bswap = true;
> -  else
> +  *l_rotate = 0;
> +  uint64_t tmp_n = n->n;
> +  if (!is_bswap_or_nop_p (tmp_n, cmpxchg, cmpnop, mask, bswap))
>  {
> -  int set = 0;
> -  for (uint64_t msk = MARKER_MASK; msk; msk <<= BITS_PER_MARKER)
> -   if ((n->n & msk) == 0)
> - *mask &= ~msk;
> -   else if ((n->n & msk) == (cmpxchg & msk))
> - set++;
> -   else
> - return NULL;
> -  if (set < 2)
> +  /* Try bswap + lrotate.  */
> +  /* TODO, handle cast64_to_32 and big/little_endian memory
> +source when rsize < range.  */
> +  if (n->range == ori

[PATCH][committed] aarch64: Convert ADDLP and ADALP patterns to standard RTL codes

2023-05-30 Thread Kyrylo Tkachov via Gcc-patches
Hi all,

This patch converts the patterns for the integer widen and pairwise-add 
instructions
to standard RTL operations. The pairwise addition within a vector can be 
represented
as an addition of two vec_selects, one selecting the even elements, and one 
selecting odd.
Thus for the intrinsic vpaddlq_s8 we can generate:
(set (reg:V8HI 92)
(plus:V8HI (vec_select:V8HI (sign_extend:V16HI (reg/v:V16QI 93 [ a ]))
(parallel [
(const_int 0 [0])
(const_int 2 [0x2])
(const_int 4 [0x4])
(const_int 6 [0x6])
(const_int 8 [0x8])
(const_int 10 [0xa])
(const_int 12 [0xc])
(const_int 14 [0xe])
]))
(vec_select:V8HI (sign_extend:V16HI (reg/v:V16QI 93 [ a ]))
(parallel [
(const_int 1 [0x1])
(const_int 3 [0x3])
(const_int 5 [0x5])
(const_int 7 [0x7])
(const_int 9 [0x9])
(const_int 11 [0xb])
(const_int 13 [0xd])
(const_int 15 [0xf])
] 

Similarly for the accumulating forms where there's an extra outer PLUS for the 
accumulation.
We already have the handy helper functions aarch64_stepped_int_parallel_p and
aarch64_gen_stepped_int_parallel defined in aarch64.cc that we can make use of 
to define
the right predicate for the VEC_SELECT PARALLEL.
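
In scalar terms, the even/odd vec_select decomposition above computes the
following for vpaddlq_s8 (a sketch of the semantics only, not the intrinsic's
implementation):

#include <stdint.h>

static void
paddl_s8 (const int8_t a[16], int16_t out[8])
{
  for (int i = 0; i < 8; i++)
    /* even element plus odd element, each sign-extended first */
    out[i] = (int16_t) a[2 * i] + (int16_t) a[2 * i + 1];
}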

This patch allows us to remove some code iterators and the UNSPEC definitions 
for SADDLP and UADDLP.
UNSPEC_UADALP and UNSPEC_SADALP are retained because they are used by SVE2 
patterns still.

Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
Pushing to trunk.
Thanks,
Kyrill

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (aarch64_adalp): Delete.
(aarch64_adalp): New define_expand.
(*aarch64_adalp_insn): New define_insn.
(aarch64_addlp): Convert to define_expand.
(*aarch64_addlp_insn): New define_insn.
* config/aarch64/iterators.md (UNSPEC_SADDLP, UNSPEC_UADDLP): Delete.
(ADALP): Likewise.
(USADDLP): Likewise.
* config/aarch64/predicates.md (vect_par_cnst_even_or_odd_half): Define.


adalp.patch
Description: adalp.patch


Re: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V

2023-05-30 Thread juzhe.zh...@rivai.ai
I think I prefer doing VLS modes like this:
This is the current VLA pattern:
(define_insn "@pred_"
  [(set (match_operand:VI 0 "register_operand"   "=vd, vd, vr, vr, vd, 
vd, vr, vr, vd, vd, vr, vr")
  (if_then_else:VI
(unspec:
  [(match_operand: 1 "vector_mask_operand" " vm, vm,Wc1, Wc1, vm, 
vm,Wc1,Wc1, vm, vm,Wc1,Wc1")
   (match_operand 5 "vector_length_operand"" rK, rK, rK,  rK, rK, rK, 
rK, rK, rK, rK, rK, rK")
   (match_operand 6 "const_int_operand""  i,  i,  i,   i,  i,  i,  
i,  i,  i,  i,  i,  i")
   (match_operand 7 "const_int_operand""  i,  i,  i,   i,  i,  i,  
i,  i,  i,  i,  i,  i")
   (match_operand 8 "const_int_operand""  i,  i,  i,   i,  i,  i,  
i,  i,  i,  i,  i,  i")
   (reg:SI VL_REGNUM)
   (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
(any_int_binop:VI
  (match_operand:VI 3 "" "")
  (match_operand:VI 4 "" ""))
(match_operand:VI 2 "vector_merge_operand" 
"vu,0,vu,0,vu,0,vu,0,vu,0,vu,0")))]
  "TARGET_VECTOR"
  "@
   v.vv\t%0,%3,%4%p1
   v.vv\t%0,%3,%4%p1
   v.vv\t%0,%3,%4%p1
   v.vv\t%0,%3,%4%p1
   v\t%0,%p1
   v\t%0,%p1
   v\t%0,%p1
   v\t%0,%p1
   v\t%0,%p1
   v\t%0,%p1
   v\t%0,%p1
   v\t%0,%p1"
  [(set_attr "type" "")
   (set_attr "mode" "")])

(define_mode_iterator VI [
  (VNx1QI "TARGET_MIN_VLEN < 128") VNx2QI VNx4QI VNx8QI VNx16QI VNx32QI 
(VNx64QI "TARGET_MIN_VLEN > 32") (VNx128QI "TARGET_MIN_VLEN >= 128")
  (VNx1HI "TARGET_MIN_VLEN < 128") VNx2HI VNx4HI VNx8HI VNx16HI (VNx32HI 
"TARGET_MIN_VLEN > 32") (VNx64HI "TARGET_MIN_VLEN >= 128")
  (VNx1SI "TARGET_MIN_VLEN < 128") VNx2SI VNx4SI VNx8SI (VNx16SI 
"TARGET_MIN_VLEN > 32") (VNx32SI "TARGET_MIN_VLEN >= 128")
  (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128") (VNx2DI 
"TARGET_VECTOR_ELEN_64")
  (VNx4DI "TARGET_VECTOR_ELEN_64") (VNx8DI "TARGET_VECTOR_ELEN_64") (VNx16DI 
"TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
])

You can see there are no VLS modes in "VI". Now, to support VLS, I think we 
should extend the "VI" iterator:
(define_mode_iterator VI [
  (VNx1QI "TARGET_MIN_VLEN < 128") VNx2QI VNx4QI VNx8QI VNx16QI VNx32QI 
(VNx64QI "TARGET_MIN_VLEN > 32") (VNx128QI "TARGET_MIN_VLEN >= 128")
  (VNx1HI "TARGET_MIN_VLEN < 128") VNx2HI VNx4HI VNx8HI VNx16HI (VNx32HI 
"TARGET_MIN_VLEN > 32") (VNx64HI "TARGET_MIN_VLEN >= 128")
  (VNx1SI "TARGET_MIN_VLEN < 128") VNx2SI VNx4SI VNx8SI (VNx16SI 
"TARGET_MIN_VLEN > 32") (VNx32SI "TARGET_MIN_VLEN >= 128")
  (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128") (VNx2DI 
"TARGET_VECTOR_ELEN_64")
  (VNx4DI "TARGET_VECTOR_ELEN_64") (VNx8DI "TARGET_VECTOR_ELEN_64") (VNx16DI 
"TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
V4SI V2DI V8HI V16QI
])

Then we codegen directly to these VLS patterns without any conversion.
This is the safe way to deal with VLS patterns.

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-05-30 17:29
To: juzhe.zh...@rivai.ai
CC: Robin Dapp; Kito.cheng; gcc-patches; palmer; kito.cheng; jeffreyalaw; 
pan2.li
Subject: Re: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V
On Tue, May 30, 2023 at 11:17 AM juzhe.zh...@rivai.ai
 wrote:
>
> In the future, we will definitely be mixing VLA and VLS-vlmin together in 
> codegen, and it will not cause any issues.
> For VLS-vlmin, I prefer that it be used in length-style auto-vectorization (I am 
> not sure yet, since my SELECT_VL patch is not
> finished; I will check whether it can work while working on the SELECT_VL patch).
 
For the future it would be then good to have the vectorizer
re-vectorize loops with
VLS vector uses to VLA style?  I think there's a PR with a draft patch
from a few
years ago attached (from me) somewhere.  Currently the vectorizer will give
up when seeing vector operations in a loop but ideally those should simply
be SLPed.
 
> >> In general I don't have a good overview of which optimizations we gain by
> >> such an approach or rather which ones are prevented by VLA altogether?
> These patches' VLS modes can help with SLP auto-vectorization.
>
> 
> juzhe.zh...@rivai.ai
>
>
> From: Robin Dapp
> Date: 2023-05-30 17:05
> To: juzhe.zh...@rivai.ai; Richard Biener; Kito.cheng
> CC: rdapp.gcc; gcc-patches; palmer; kito.cheng; jeffreyalaw; pan2.li
> Subject: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V
> >>> but ideally the user would be able to specify -mrvv-size=32 for an
> >>> implementation with 32 byte vectors and then vector lowering would make 
> >>> use
> >>> of vectors up to 32 bytes?
> >
> > Actually, we don't want to specify -mrvv-size = 32 to enable vectorization 
> > on GNU vectors.
> > You can take a look at this example:
> > https://godbolt.org/z/3jYqoM84h 
> >
> > GCC needs to specify the mrvv size to enable GNU vectors, and the codegen 
> > can only run on CPUs with vector-length = 128 bits.
> > However, LLVM doesn't need to specify the vector length, and the codegen 
> > can run on any CPU with RVV vector-length >= 128 bits.
> >
> > This is what this pat

Re: [PATCH] Optimized "(X - N * M) / N + M" to "X / N" if valid

2023-05-30 Thread Richard Biener via Gcc-patches
On Wed, 17 May 2023, Jiufu Guo wrote:

> Hi,
> 
> This patch tries to optimize "(X - N * M) / N + M" to "X / N".

But if that's valid why not make the transform simpler and transform
(X - N * M) / N  to X / N - M instead?

You use the same optimize_x_minus_NM_div_N_plus_M validator for
the division and shift variants but the overflow rules are different,
so I'm not sure that's warranted.  I'd also prefer to not split out
the validator to a different file - if so, then the appropriate file
is fold-const.cc, not gimple-match-head.cc (I see we're a bit
inconsistent here, for pure gimple matches gimple-fold.cc would
be another place).

Since you use range information why is the transform restricted
to constant M?

Richard.
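
For reference, a tiny counterexample showing why such range conditions are
needed at all when "X - N * M" crosses zero (hypothetical values; C division
truncates toward zero):

#include <stdio.h>

int
main (void)
{
  int x = 1, n = 4, m = 1;          /* X - N*M = -3, crossing zero */
  int lhs = (x - n * m) / n + m;    /* (-3)/4 + 1 = 0 + 1 = 1 */
  int rhs = x / n;                  /* 1/4 = 0 */
  printf ("%d != %d\n", lhs, rhs);  /* the transform would be wrong here */
  return 0;
}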

> As per the discussions in PR108757, we know this transformation is valid
> only under some conditions.
> For C code, "/" truncates towards zero (trunc_div), and "X - N * M"
> may wrap/overflow/underflow. So, it is valid only if "X - N * M" does
> not cross zero and does not wrap/overflow/underflow.
> 
> This patch also handles the case when "N" is the power of 2, where
> "(X - N * M) / N" is "(X - N * M) >> log2(N)".
> 
> Bootstrap & regtest pass on ppc64{,le} and x86_64.
> Is this ok for trunk?
> 
> BR,
> Jeff (Jiufu)
> 
>   PR tree-optimization/108757
> 
> gcc/ChangeLog:
> 
>   * gimple-match-head.cc (optimize_x_minus_NM_div_N_plus_M): New function.
>   * match.pd ((X - N * M) / N + M): New pattern.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/pr108757-1.c: New test.
>   * gcc.dg/pr108757-2.c: New test.
>   * gcc.dg/pr108757.h: New test.
> 
> ---
>  gcc/gimple-match-head.cc  |  54 ++
>  gcc/match.pd  |  22 
>  gcc/testsuite/gcc.dg/pr108757-1.c |  17 
>  gcc/testsuite/gcc.dg/pr108757-2.c |  18 
>  gcc/testsuite/gcc.dg/pr108757.h   | 160 ++
>  5 files changed, 271 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/pr108757-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/pr108757-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/pr108757.h
> 
> diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
> index b08cd891a13..680a4cb2fc6 100644
> --- a/gcc/gimple-match-head.cc
> +++ b/gcc/gimple-match-head.cc
> @@ -224,3 +224,57 @@ optimize_successive_divisions_p (tree divisor, tree 
> inner_div)
>  }
>return true;
>  }
> +
> +/* Return true if "(X - N * M) / N + M" can be optimized into "X / N".
> +   Otherwise return false.
> +
> +   For unsigned,
> +   If sign bit of M is 0 (clz is 0), valid range is [N*M, MAX].
> +   If sign bit of M is 1, valid range is [0, MAX - N*(-M)].
> +
> +   For signed,
> +   If N*M > 0, valid range: [MIN+N*M, 0] + [N*M, MAX]
> +   If N*M < 0, valid range: [MIN, -(-N*M)] + [0, MAX - (-N*M)].  */
> +
> +static bool
> +optimize_x_minus_NM_div_N_plus_M (tree x, wide_int n, wide_int m, tree type)
> +{
> +  wide_int max = wi::max_value (type);
> +  signop sgn = TYPE_SIGN (type);
> +  wide_int nm;
> +  wi::overflow_type ovf;
> +  if (TYPE_UNSIGNED (type) && wi::clz (m) == 0)
> +nm = wi::mul (n, -m, sgn, &ovf);
> +  else
> +nm = wi::mul (n, m, sgn, &ovf);
> +
> +  if (ovf != wi::OVF_NONE)
> +return false;
> +
> +  value_range vr0;
> +  if (!get_range_query (cfun)->range_of_expr (vr0, x) || vr0.varying_p ()
> +  || vr0.undefined_p ())
> +return false;
> +
> +  wide_int wmin0 = vr0.lower_bound ();
> +  wide_int wmax0 = vr0.upper_bound ();
> +  wide_int min = wi::min_value (type);
> +
> +  /* unsigned */
> +  if ((TYPE_UNSIGNED (type)))
> +/* M > 0 (clz != 0): [N*M, MAX],  M < 0 : [0, MAX-N*(-M)]  */
> +return wi::clz (m) != 0 ? wi::ge_p (wmin0, nm, sgn)
> + : wi::le_p (wmax0, max - nm, sgn);
> +
> +  /* signed, N*M > 0 */
> +  else if (wi::gt_p (nm, 0, sgn))
> +/* [N*M, MAX] or [MIN+N*M, 0] */
> +return wi::ge_p (wmin0, nm, sgn)
> +|| (wi::ge_p (wmin0, min + nm, sgn) && wi::le_p (wmax0, 0, sgn));
> +
> +  /* signed, N*M < 0 */
> +  /* [MIN, N*M] or [0, MAX + N*M]*/
> +  else
> +return wi::le_p (wmax0, nm, sgn)
> +|| (wi::ge_p (wmin0, 0, sgn) && wi::le_p (wmax0, max - (-nm), sgn));
> +}
> diff --git a/gcc/match.pd b/gcc/match.pd
> index ceae1c34abc..1aaa5530577 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -881,6 +881,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  #endif
> 
>  
> +#if GIMPLE
> +/* Simplify ((t + -N*M) / N + M) -> t / N.  */
> +(for div (trunc_div exact_div)
> + (simplify
> +  (plus (div (plus @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3)
> +  (with {wide_int n = wi::to_wide (@2); wide_int m = wi::to_wide (@3);}
> +(if (INTEGRAL_TYPE_P (type)
> +  && n * m == -wi::to_wide (@1)
> +  && optimize_x_minus_NM_div_N_plus_M (@0, n, m, type))
> +(div @0 @2)
> +
> +/* Simplify ((t + -(M<<N)) >> N + M) -> t >> N.  */
> +(simplify
> + (plus (rshift (plus @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3)
> + (with {wide_int n = wi::to_wide (@2); wide_int m = wi:

Re: Re: decrement IV patch creates failures on PowerPC

2023-05-30 Thread Richard Biener via Gcc-patches
On Tue, 30 May 2023, juzhe.zh...@rivai.ai wrote:

> Ok.
> 
> It seems that for these conditions:
> 
> +  /* If we're vectorizing a loop that uses length "controls" and
> + can iterate more than once, we apply decrementing IV approach
> + in loop control.  */
> +  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
> +  && !LOOP_VINFO_LENS (loop_vinfo).is_empty ()
> +  && LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) == 0
> +  && !(LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> +&& known_le (LOOP_VINFO_INT_NITERS (loop_vinfo),
> + LOOP_VINFO_VECT_FACTOR (loop_vinfo
> +LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) = true;
> 
> I should add direct_supportted_p (SELECT_VL...) to this, is that right?

No, since powerpc is fine with decrementing VL it should also use it.
Instead you should make sure to produce SCEV analyzable IVs when
possible (when SELECT_VL is not or cannot be used).

Richard.

> I have sent the SELECT_VL patch. I will add this in the next SELECT_VL patch.
> 
> Let's wait for more comments from Richard.
> 
> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-05-30 17:22
> To: juzhe.zh...@rivai.ai
> CC: gcc-patches; richard.sandiford; linkw
> Subject: Re: Re: decrement IV patch creates failures on PowerPC
> On Fri, 26 May 2023, juzhe.zh...@rivai.ai wrote:
>  
> Hi, Richi. Thanks for your analysis and help.
> > 
> > >> We could simply retain the original
> > >> incrementing IV for loop control and add the decrementing
> > >> IV for computing LEN in addition to that and leave IVOPTs
> > >> sorting out to eventually merge them (or not).
> > 
> I am not sure how to do that. Could you give me more information?
> > 
> I somehow understand; your concern is that a variable IV step will make
> IVOPTs fail. 
> 
> I have seen a similar situation in LLVM (when applying a variable IV,
> it failed to interleave the vectorized code). I am not sure whether it
> is for the same reason.
> 
> For RVV, we not only want the decrement IV style in vectorization but also
> want to apply SELECT_VL in the single-rgroup case, which is the most common one 
> (LLVM also only applies get_vector_length for a single vector length).
> >
> > >>You can do some testing with a cross compiler, alternatively
> > >>there are powerpc machines in the GCC compile farm.
> > 
> > It seems that Power is ok with decrement IV since most cases are improved.
>  
> Well, but Power never will have SELECT_VL so at least for !SELECT_VL
> targets you should avoid having an IV with variable decrement.  As
> I said it should be easy to rewrite decrement IV to use a constant
> increment (when not using SELECT_VL) and testing the pre-decrement
> value in the exit test.
>  
> Richard.
> > I think Richard may help to explain decrement IV more clearly.
> > 
> > Thanks
> > 
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-05-26 14:46
> > To: ???
> > CC: gcc-patches; richard.sandiford; linkw
> > Subject: Re: decrement IV patch creates failures on PowerPC
> > On Fri, 26 May 2023, ??? wrote:
> >  
> > > Yesterday's patch has been approved (decrement IV support):
> > > https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619663.html 
> > > 
> > > However, it creates failures on PowerPC:
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109971 
> > > 
> > > I am really sorry for causing inconvenience.
> > > 
> > > I wonder, as we discussed:
> > > +  /* If we're vectorizing a loop that uses length "controls" and
> > > + can iterate more than once, we apply decrementing IV approach
> > > + in loop control.  */
> > > +  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
> > > +  && !LOOP_VINFO_LENS (loop_vinfo).is_empty ()
> > > +  && LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) == 0
> > > +  && !(LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> > > +&& known_le (LOOP_VINFO_INT_NITERS (loop_vinfo),
> > > + LOOP_VINFO_VECT_FACTOR (loop_vinfo
> > > +LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) = true;
> > > 
> > > This condition cannot disable the decrement IV on PowerPC.
> > > Should I add a target hook for it?
> >  
> > No.  I've put some analysis in the PR.  To me the question is
> > why (without that SELECT_VL case) we need a decrementing IV
> > _for the loop control_?  We could simply retain the original
> > incrementing IV for loop control and add the decrementing
> > IV for computing LEN in addition to that and leave IVOPTs
> > sorting out to eventually merge them (or not).
> >  
> > Alternatively avoid the variable decrement as I wrote in the
> > PR and do the exit test based on the previous IV value.
> >  
> > But as said all this won't work for the SELECT_VL case, but
> > then its availability is something to key off rather than a
> > new target hook?
> >  
> > > I can only do bootstrap and regression testing of the patch on x86.
> > > I don't have an environment to test PowerPC. I am really sorry.
> >  
> > You can do some testing with a cross co

[PATCH 2/3 v2] xtensa: Add 'adddi3' and 'subdi3' insn patterns

2023-05-30 Thread Takayuki 'January June' Suwa via Gcc-patches
Resubmitting the correct one due to a mistake in merging order of fixes.
---
More optimized than the default RTL generation.
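
In scalar terms, the sequences the expanders below emit amount to the
following (a sketch; the carry/borrow is detected with an unsigned comparison
on the low 32-bit halves, and the helper names are illustrative):

#include <stdint.h>

static void
adddi_model (uint32_t lo0, uint32_t hi0, uint32_t lo1, uint32_t hi1,
             uint32_t *lo, uint32_t *hi)
{
  *lo = lo0 + lo1;
  *hi = hi0 + hi1;
  if (*lo < lo1)        /* carry out of the low-part addition */
    ++*hi;
}

static void
subdi_model (uint32_t lo0, uint32_t hi0, uint32_t lo1, uint32_t hi1,
             uint32_t *lo, uint32_t *hi)
{
  *lo = lo0 - lo1;
  *hi = hi0 - hi1;
  if (lo0 < lo1)        /* borrow into the low-part subtraction */
    --*hi;
}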

gcc/ChangeLog:

* config/xtensa/xtensa.md (adddi3, subdi3):
New RTL generation patterns implemented according to the instruc-
tion idioms described in the Xtensa ISA reference manual (p. 600).
---
 gcc/config/xtensa/xtensa.md | 52 +
 1 file changed, 52 insertions(+)

diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
index eda1353894b..6882baaedfd 100644
--- a/gcc/config/xtensa/xtensa.md
+++ b/gcc/config/xtensa/xtensa.md
@@ -190,6 +190,32 @@
(set_attr "mode""SI")
(set_attr "length"  "3")])
 
+(define_expand "adddi3"
+  [(set (match_operand:DI 0 "register_operand")
+   (plus:DI (match_operand:DI 1 "register_operand")
+(match_operand:DI 2 "register_operand")))]
+  ""
+{
+  rtx lo_dest, hi_dest, lo_op0, hi_op0, lo_op1, hi_op1;
+  rtx_code_label *label;
+  lo_dest = gen_lowpart (SImode, operands[0]);
+  hi_dest = gen_highpart (SImode, operands[0]);
+  lo_op0 = gen_lowpart (SImode, operands[1]);
+  hi_op0 = gen_highpart (SImode, operands[1]);
+  lo_op1 = gen_lowpart (SImode, operands[2]);
+  hi_op1 = gen_highpart (SImode, operands[2]);
+  if (rtx_equal_p (lo_dest, lo_op1))
+FAIL;
+  emit_clobber (operands[0]);
+  emit_insn (gen_addsi3 (lo_dest, lo_op0, lo_op1));
+  emit_insn (gen_addsi3 (hi_dest, hi_op0, hi_op1));
+  emit_cmp_and_jump_insns (lo_dest, lo_op1, GEU, const0_rtx,
+  SImode, true, label = gen_label_rtx ());
+  emit_insn (gen_addsi3 (hi_dest, hi_dest, const1_rtx));
+  emit_label (label);
+  DONE;
+})
+
 (define_insn "addsf3"
   [(set (match_operand:SF 0 "register_operand" "=f")
(plus:SF (match_operand:SF 1 "register_operand" "%f")
@@ -237,6 +263,32 @@
  (const_int 5)
  (const_int 6)))])
 
+(define_expand "subdi3"
+  [(set (match_operand:DI 0 "register_operand")
+   (minus:DI (match_operand:DI 1 "register_operand")
+ (match_operand:DI 2 "register_operand")))]
+  ""
+{
+  rtx lo_dest, hi_dest, lo_op0, hi_op0, lo_op1, hi_op1;
+  rtx_code_label *label;
+  lo_dest = gen_lowpart (SImode, operands[0]);
+  hi_dest = gen_highpart (SImode, operands[0]);
+  lo_op0 = gen_lowpart (SImode, operands[1]);
+  hi_op0 = gen_highpart (SImode, operands[1]);
+  lo_op1 = gen_lowpart (SImode, operands[2]);
+  hi_op1 = gen_highpart (SImode, operands[2]);
+  if (rtx_equal_p (lo_op0, lo_op1))
+FAIL;
+  emit_clobber (operands[0]);
+  emit_insn (gen_subsi3 (lo_dest, lo_op0, lo_op1));
+  emit_insn (gen_subsi3 (hi_dest, hi_op0, hi_op1));
+  emit_cmp_and_jump_insns (lo_op0, lo_op1, GEU, const0_rtx,
+  SImode, true, label = gen_label_rtx ());
+  emit_insn (gen_addsi3 (hi_dest, hi_dest, constm1_rtx));
+  emit_label (label);
+  DONE;
+})
+
 (define_insn "subsf3"
   [(set (match_operand:SF 0 "register_operand" "=f")
(minus:SF (match_operand:SF 1 "register_operand" "f")
-- 
2.30.2


[PATCH 3/3 v2] xtensa: Optimize 'cstoresi4' insn pattern

2023-05-30 Thread Takayuki 'January June' Suwa via Gcc-patches
Resubmitting the correct one due to a mistake in merging order of fixes.
---
This patch introduces more optimized implementations for the 6 cstoresi4
insn comparison methods (eq/ne/lt/le/gt/ge; eq additionally requires
TARGET_NSA).
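
Two of the tricks, modeled in scalar C for reference (a sketch; NSAU is
modeled with __builtin_clz plus its documented zero-input result of 32, and
the helper names are illustrative):

#include <stdint.h>

/* EQ via NSAU (TARGET_NSA): NSAU returns 32 for a zero input and
   0..31 otherwise, so "NSAU (op0 - op1) >> 5" is 1 iff op0 == op1.  */
static uint32_t
eq_model (uint32_t op0, uint32_t op1)
{
  uint32_t d = op0 - op1;
  uint32_t nsau = d == 0 ? 32 : (uint32_t) __builtin_clz (d);
  return nsau >> 5;
}

/* LT via the sign bit of the difference: (unsigned) (op0 - op1) >> 31.
   As in the comments in the patch below, this assumes the subtraction
   does not overflow.  */
static uint32_t
lt_model (int32_t op0, int32_t op1)
{
  return (uint32_t) (op0 - op1) >> 31;
}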

gcc/ChangeLog:

* config/xtensa/xtensa.cc (xtensa_expand_scc):
Add dedicated optimization code for cstoresi4 (eq/ne/gt/ge/lt/le).
* config/xtensa/xtensa.md (xtensa_ge_zero):
Rename from '*signed_ge_zero', because it had to be called from
'xtensa_expand_scc()'.
---
 gcc/config/xtensa/xtensa.cc | 106 
 gcc/config/xtensa/xtensa.md |   2 +-
 2 files changed, 96 insertions(+), 12 deletions(-)

diff --git a/gcc/config/xtensa/xtensa.cc b/gcc/config/xtensa/xtensa.cc
index 3b5d25b660a..64efd3d7287 100644
--- a/gcc/config/xtensa/xtensa.cc
+++ b/gcc/config/xtensa/xtensa.cc
@@ -991,24 +991,108 @@ xtensa_expand_conditional_move (rtx *operands, int isflt)
 int
 xtensa_expand_scc (rtx operands[4], machine_mode cmp_mode)
 {
-  rtx dest = operands[0];
-  rtx cmp;
-  rtx one_tmp, zero_tmp;
+  rtx dest = operands[0], op0 = operands[2], op1 = operands[3];
+  enum rtx_code code = GET_CODE (operands[1]);
+  rtx cmp, tmp0, tmp1;
   rtx (*gen_fn) (rtx, rtx, rtx, rtx, rtx);
 
-  if (!(cmp = gen_conditional_move (GET_CODE (operands[1]), cmp_mode,
-   operands[2], operands[3])))
-return 0;
+  /* Dedicated optimizations for cstoresi4.
+ a. In a magnitude comparison operator, swapping both sides and
+   inverting magnitude does not change the result,
+   eg. '(x >= y) != (y <= x)' is a constant of zero
+   (GE is changed to LE, not LT).
+ b. Due to room for further optimization, we use subtraction rather
+   than XOR (the default for RTL expansion of EQ/NE) as the binary
+   operation which is zero if both sides are the same and non-zero
+   otherwise.  */
+  if (cmp_mode == SImode)
+switch (code)
+  {
+  /* EQ(op0, op1) := clz(op0 - op1) / 32 [requires TARGET_NSA] */
+  case EQ:
+   if (!TARGET_NSA)
+ break;
+   /* EQ to EQZ conversion by subtracting op1 from op0.  */
+   emit_move_insn (dest,
+   expand_binop (SImode, sub_optab, op0, op1,
+ 0, 0, OPTAB_LIB_WIDEN));
+   /* NSAU instruction will return 32 iff the source is zero,
+  zero through 31 otherwise (See Xtensa ISA Reference Manual,
+  p. 462)  */
+   emit_insn (gen_clzsi2 (dest, dest));
+   emit_insn (gen_lshrsi3 (dest, dest, GEN_INT (5)));
+   return 1;
+
+  /* NE(op0, op1) := (op0 - op1) == 0 ? 0 : 1 */
+  case NE:
+   /* NE to NEZ conversion by subtracting op1 from op0.  */
+   emit_move_insn (tmp0 = gen_reg_rtx (SImode),
+   expand_binop (SImode, sub_optab, op0, op1,
+ 0, 0, OPTAB_LIB_WIDEN));
+   emit_move_insn (dest, const_true_rtx);
+   emit_move_insn (dest,
+   gen_rtx_fmt_eee (IF_THEN_ELSE, SImode,
+gen_rtx_fmt_ee (EQ, VOIDmode,
+tmp0, const0_rtx),
+tmp0, dest));
+   return 1;
+
+  case LE:
+   if (REG_P (op1))
+ {
+   /* LE to GE conversion by swapping both sides.  */
+   tmp0 = op0, op0 = op1, op1 = tmp0;
+   goto case_GE_reg;
+ }
+   /* LE to LT conversion by adding one to op1.  */
+   op1 = GEN_INT (INTVAL (op1) + 1);
+   /* fallthru */
+
+  /* LT(op0, op1) := (unsigned)(op0 - op1) >> 31 */
+  case LT:
+case_LT:
+   /* LT to LTZ conversion by subtracting op1 from op0.  */
+   emit_move_insn (dest,
+   expand_binop (SImode, sub_optab, op0, op1,
+ 0, 0, OPTAB_LIB_WIDEN));
+   emit_insn (gen_lshrsi3 (dest, dest, GEN_INT (31)));
+   return 1;
+
+  case GE:
+   if (REG_P (op1))
+ {
+case_GE_reg:
+   /* GE to GEZ conversion by subtracting op1 from op0.  */
+   emit_move_insn (dest,
+   expand_binop (SImode, sub_optab, op0, op1,
+ 0, 0, OPTAB_LIB_WIDEN));
+   /* Emitting the dedicated insn pattern.  */
+   emit_insn (gen_xtensa_ge_zero (dest, dest));
+   return 1;
+ }
+   /* GE to GT conversion by subtracting one from op1.  */
+   op1 = GEN_INT (INTVAL (op1) - 1);
+   /* fallthru */
 
-  one_tmp = gen_reg_rtx (SImode);
-  zero_tmp = gen_reg_rtx (SImode);
-  emit_insn (gen_movsi (one_tmp, const_true_rtx));
-  emit_insn (gen_movsi (zero_tmp, const0_rtx));
+  case GT:
+   /* GT to LT conversion by swapping both sides.  */
+   tmp0 = op0, op0 = op1, op1 = tmp0;
+   goto case_LT;
 
+  default:
+   break;
+  }
+
+  if (! (cmp = gen_conditional_mov

Re: decrement IV patch creates failures on PowerPC

2023-05-30 Thread Kewen.Lin via Gcc-patches
on 2023/5/30 17:26, juzhe.zh...@rivai.ai wrote:
> Ok.
> 
> It seems that for these conditions:
> 
> +  /* If we're vectorizing a loop that uses length "controls" and
> + can iterate more than once, we apply decrementing IV approach
> + in loop control.  */
> +  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
> +  && !LOOP_VINFO_LENS (loop_vinfo).is_empty ()
> +  && LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) == 0
> +  && !(LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> +&& known_le (LOOP_VINFO_INT_NITERS (loop_vinfo),
> + LOOP_VINFO_VECT_FACTOR (loop_vinfo
> +LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) = true;
> 
> 
> I should add direct_supportted_p (SELECT_VL...) to this, is that right?

I guess no; with this condition, any target without SELECT_VL would be unable
to leverage the new decrement scheme for lengths, and as your reply in PR109971
indicated, you didn't mean to disable it.  IIUC, what Richi suggested is to introduce
one new IV, just like the previous one, which has a non-variable step; then it's
SCEV-ed and some analysis based on it can do a good job.

Since this is mainly for targets without SELECT_VL capability, I can follow
up on this if you don't mind.

BR,
Kewen


Re: Re: decrement IV patch creates failures on PowerPC

2023-05-30 Thread juzhe.zh...@rivai.ai
>> No, since powerpc is fine with decrementing VL it should also use it.
>>Instead you should make sure to produce SCEV analyzable IVs when
>>possible (when SELECT_VL is not or cannot be used).
Ok. Would you mind giving me a guideline on how to rewrite the decrement IV?
I am not familiar with SCEV, and I am not sure how to make the decrement IV 
analyzable by SCEV.



juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-05-30 17:50
To: juzhe.zh...@rivai.ai
CC: gcc-patches; richard.sandiford; linkw
Subject: Re: Re: decrement IV patch creates failures on PowerPC
On Tue, 30 May 2023, juzhe.zh...@rivai.ai wrote:
 
> Ok.
> 
> It seems that for these conditions:
> 
> +  /* If we're vectorizing a loop that uses length "controls" and
> + can iterate more than once, we apply decrementing IV approach
> + in loop control.  */
> +  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
> +  && !LOOP_VINFO_LENS (loop_vinfo).is_empty ()
> +  && LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) == 0
> +  && !(LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> +&& known_le (LOOP_VINFO_INT_NITERS (loop_vinfo),
> + LOOP_VINFO_VECT_FACTOR (loop_vinfo
> +LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) = true;
> 
> I should add direct_supportted_p (SELECT_VL...) to this, is that right?
 
No, since powerpc is fine with decrementing VL it should also use it.
Instead you should make sure to produce SCEV analyzable IVs when
possible (when SELECT_VL is not or cannot be used).
 
Richard.
 
> I have sent the SELECT_VL patch. I will add this in the next SELECT_VL patch.
> 
> Let's wait for more comments from Richard.
> 
> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-05-30 17:22
> To: juzhe.zh...@rivai.ai
> CC: gcc-patches; richard.sandiford; linkw
> Subject: Re: Re: decrement IV patch creates failures on PowerPC
> On Fri, 26 May 2023, juzhe.zh...@rivai.ai wrote:
>  
> > Hi, Richi. Thanks for your analysis and help.
> > 
> > >> We could simply retain the original
> > >> incrementing IV for loop control and add the decrementing
> > >> IV for computing LEN in addition to that and leave IVOPTs
> > >> sorting out to eventually merge them (or not).
> > 
> > I am not sure how to do that. Could you give me more information?
> > 
> > I somehow understand; your concern is that a variable IV step will make
> > IVOPTs fail. 
> > 
> > I have seen a similar situation in LLVM (when applying a variable IV,
> > it failed to interleave the vectorized code). I am not sure whether it
> > is for the same reason.
> > 
> > For RVV, we not only want the decrement IV style in vectorization but also
> > want to apply SELECT_VL in the single-rgroup case, which is the most common one 
> > (LLVM also only applies get_vector_length for a single vector length).
> >
> > >>You can do some testing with a cross compiler, alternatively
> > >>there are powerpc machines in the GCC compile farm.
> > 
> > It seems that Power is ok with decrement IV since most cases are improved.
>  
> Well, but Power never will have SELECT_VL so at least for !SELECT_VL
> targets you should avoid having an IV with variable decrement.  As
> I said it should be easy to rewrite decrement IV to use a constant
> increment (when not using SELECT_VL) and testing the pre-decrement
> value in the exit test.
>  
> Richard.
> > I think Richard may help to explain decrement IV more clearly.
> > 
> > Thanks
> > 
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-05-26 14:46
> > To: ???
> > CC: gcc-patches; richard.sandiford; linkw
> > Subject: Re: decrement IV patch creates failures on PowerPC
> > On Fri, 26 May 2023, ??? wrote:
> >  
> > > Yesterday's patch has been approved (decrement IV support):
> > > https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619663.html 
> > > 
> > > However, it creates failures on PowerPC:
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109971 
> > > 
> > > I am really sorry for causing inconvenience.
> > > 
> > > I wonder, as we discussed:
> > > +  /* If we're vectorizing a loop that uses length "controls" and
> > > + can iterate more than once, we apply decrementing IV approach
> > > + in loop control.  */
> > > +  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
> > > +  && !LOOP_VINFO_LENS (loop_vinfo).is_empty ()
> > > +  && LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) == 0
> > > +  && !(LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> > > +&& known_le (LOOP_VINFO_INT_NITERS (loop_vinfo),
> > > + LOOP_VINFO_VECT_FACTOR (loop_vinfo
> > > +LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) = true;
> > > 
> > > This condition cannot disable the decrement IV on PowerPC.
> > > Should I add a target hook for it?
> >  
> > No.  I've put some analysis in the PR.  To me the question is
> > why (without that SELECT_VL case) we need a decrementing IV
> > _for the loop control_?  We could simply retain the original
> > incrementing IV for loop control and add the decrementing
> > IV for computing LEN in addition to

Re: [PATCH] libatomic: Provide gthr.h default implementation

2023-05-30 Thread Richard Biener via Gcc-patches
On Tue, May 23, 2023 at 11:28 AM Sebastian Huber
 wrote:
>
> On 10.01.23 16:38, Sebastian Huber wrote:
> > On 19/12/2022 17:02, Sebastian Huber wrote:
> >> Build libatomic for all targets.  Use gthr.h to provide a default
> >> implementation.  If the thread model is "single", then this
> >> implementation will
> >> not work if for example atomic operations are used for thread/interrupt
> >> synchronization.
> >
> > Is this and the related -fprofile-update=atomic patch something for GCC 14?
>
> Now that the GCC 14 development is in progress, what about this patch?

Sorry, there doesn't seem to be a main maintainer for libatomic and your patch
touches targets which didn't have it before.

Can you explain how this affects the ABI of targets not having (needing?!)
libatomic?  It might help if you can say this is still opt-in: targets not
building libatomic right now still would not build it with your patch, and
targets already building libatomic see no changes with your patch.

That said - what ABI implications does providing libatomic support
have for a target that didn't do so before?

Richard.

> --
> embedded brains GmbH
> Herr Sebastian HUBER
> Dornierstr. 4
> 82178 Puchheim
> Germany
> email: sebastian.hu...@embedded-brains.de
> phone: +49-89-18 94 741 - 16
> fax:   +49-89-18 94 741 - 08
>
> Court of registration: Amtsgericht München
> Registration number: HRB 157899
> Managing directors with power of representation: Peter Rasmussen, Thomas Dörfler
> You can find our privacy policy here:
> https://embedded-brains.de/datenschutzerklaerung/


Re: decrement IV patch creates failures on PowerPC

2023-05-30 Thread Richard Biener via Gcc-patches
On Tue, 30 May 2023, Kewen.Lin wrote:

> on 2023/5/30 17:26, juzhe.zh...@rivai.ai wrote:
> > Ok.
> > 
> > It seems that for these conditions:
> > 
> > +  /* If we're vectorizing a loop that uses length "controls" and
> > + can iterate more than once, we apply decrementing IV approach
> > + in loop control.  */
> > +  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
> > +  && !LOOP_VINFO_LENS (loop_vinfo).is_empty ()
> > +  && LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) == 0
> > +  && !(LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> > +  && known_le (LOOP_VINFO_INT_NITERS (loop_vinfo),
> > +   LOOP_VINFO_VECT_FACTOR (loop_vinfo
> > +LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) = true;
> > 
> > 
> > I should add direct_supportted_p (SELECT_VL...) to this, is that right?
> 
> I guess no; with this condition, any target without SELECT_VL would be unable
> to leverage the new decrement scheme for lengths, and as your reply in PR109971
> indicated, you didn't mean to disable it.  IIUC, what Richi suggested is to introduce
> one new IV, just like the previous one, which has a non-variable step; then it's
> SCEV-ed and some analysis based on it can do a good job.

No, I said the current scheme does something along the lines of

 do {
   remain -= MIN (vf, remain);
 } while (remain != 0);

and I suggest to instead do

 do {
   old_remain = remain;
   len = MIN (vf, remain);
   remain -= vf;
 } while (old_remain >= vf);

basically since only the last iteration will have len < vf we can
ignore that remain -= vf will underflow there if we appropriately
rewrite the exit test to use the pre-decrement value.
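
As a concrete illustration, a runnable C rendering of the two schemes above
(a sketch only; remain, vf and len are illustrative names, with len standing
in for what feeds the length-controlled vector operations). Note that testing
the pre-decrement value with > rather than >= also avoids a trailing
zero-length iteration when vf divides the trip count exactly:

#include <stdio.h>

/* Current scheme: the loop-control IV takes a variable step,
   MIN (vf, remain), which is what defeats SCEV analysis.  */
static void
variable_step (unsigned remain, unsigned vf)
{
  do
    {
      unsigned len = remain < vf ? remain : vf;
      printf ("len = %u\n", len);
      remain -= len;               /* variable decrement */
    }
  while (remain != 0);
}

/* Suggested scheme: constant step, exit test on the pre-decrement
   value.  Only the last iteration has len < vf, so the unsigned
   wraparound of remain there is never observed.  */
static void
constant_step (unsigned remain, unsigned vf)
{
  unsigned old_remain;
  do
    {
      old_remain = remain;
      unsigned len = remain < vf ? remain : vf;
      printf ("len = %u\n", len);
      remain -= vf;                /* constant decrement */
    }
  while (old_remain > vf);
}

int
main (void)
{
  variable_step (10, 4);           /* len = 4, 4, 2 */
  constant_step (10, 4);           /* len = 4, 4, 2 */
  return 0;
}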

> Since this is mainly for targets without SELECT_VL capability, I can follow
> up on this if you don't mind.
> 
> BR,
> Kewen
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


Re: Re: decrement IV patch creates failures on PowerPC

2023-05-30 Thread juzhe.zh...@rivai.ai
>> No, I said the current scheme does sth along

>> do {
>>remain -= MIN (vf, remain);
>> } while (remain != 0);

>> and I suggest to instead do

>> do {
>>old_remain = remain;
>>len = MIN (vf, remain);
>>remain -= vf;
>> } while (old_remain >= vf);

>> basically since only the last iteration will have len < vf we can
>> ignore that remain -= vf will underflow there if we appropriately
>> rewrite the exit test to use the pre-decrement value.

Oh, I understand you now. I will definitely have a try and send a patch.

Thank you so much.

By the way, could you take a look at the SELECT_VL patch?
I guess you want to defer it to Richard, and I will wait, but I still think your 
comments are very important.

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-05-30 18:00
To: Kewen.Lin
CC: juzhe.zh...@rivai.ai; gcc-patches; richard.sandiford
Subject: Re: decrement IV patch creates failures on PowerPC
On Tue, 30 May 2023, Kewen.Lin wrote:
 
> on 2023/5/30 17:26, juzhe.zh...@rivai.ai wrote:
> > Ok.
> > 
> > It seems that for these conditions:
> > 
> > +  /* If we're vectorizing a loop that uses length "controls" and
> > + can iterate more than once, we apply decrementing IV approach
> > + in loop control.  */
> > +  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
> > +  && !LOOP_VINFO_LENS (loop_vinfo).is_empty ()
> > +  && LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) == 0
> > +  && !(LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> > +&& known_le (LOOP_VINFO_INT_NITERS (loop_vinfo),
> > + LOOP_VINFO_VECT_FACTOR (loop_vinfo
> > +LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) = true;
> > 
> > 
> > I should add direct_supportted_p (SELECT_VL...) to this, is that right?
> 
> I guess no; with this condition, any target without SELECT_VL would be unable
> to leverage the new decrement scheme for lengths, and as your reply in PR109971
> indicated, you didn't mean to disable it.  IIUC, what Richi suggested is to introduce
> one new IV, just like the previous one, which has a non-variable step; then it's
> SCEV-ed and some analysis based on it can do a good job.
 
No, I said the current scheme does something along the lines of
 
do {
   remain -= MIN (vf, remain);
} while (remain != 0);
 
and I suggest to instead do
 
do {
   old_remain = remain;
   len = MIN (vf, remain);
   remain -= vf;
} while (old_remain >= vf);
 
basically since only the last iteration will have len < vf we can
ignore that remain -= vf will underflow there if we appropriately
rewrite the exit test to use the pre-decrement value.
 
> Since this is mainly for targets without SELECT_VL capability, I can follow
> up on this if you don't mind.
> 
> BR,
> Kewen
> 
 
-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
 


Re: decrement IV patch creates failures on PowerPC

2023-05-30 Thread Richard Sandiford via Gcc-patches
My understanding was that we went into this knowing that the IVs
would defeat SCEV analysis.  Apparently that wasn't a problem for RVV,
but it's not surprising that it is a problem in general.

This isn't just about SELECT_VL though.  We use the same type of IV
for cases that aren't going to use SELECT_VL.

Richard Biener  writes:
> On Tue, 30 May 2023, Kewen.Lin wrote:
>
>> on 2023/5/30 17:26, juzhe.zh...@rivai.ai wrote:
>> > Ok.
>> > 
>> > It seems that for this conditions:
>> > 
>> > +  /* If we're vectorizing a loop that uses length "controls" and
>> > + can iterate more than once, we apply decrementing IV approach
>> > + in loop control.  */
>> > +  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
>> > +  && !LOOP_VINFO_LENS (loop_vinfo).is_empty ()
>> > +  && LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) == 0
>> > +  && !(LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
>> > + && known_le (LOOP_VINFO_INT_NITERS (loop_vinfo),
>> > +  LOOP_VINFO_VECT_FACTOR (loop_vinfo
>> > +LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) = true;
>> > 
>> > 
> >> > I should add direct_supported_p (SELECT_VL...) to this, is that right?
>> 
>> I guess no, with this condition any targets without SELECT_VL are unable
>> to leverage the new decrement scheme for lengths, as your reply in PR109971
>> you didn't mean to disable it.  IIUC, what Richi suggested is to introduce
>> one new IV just like the previous one which has non-variable step, then it's
>> SCEV-ed and some analysis based on it can do a good job.
>
> No, I said the current scheme does sth along
>
>  do {
>remain -= MIN (vf, remain);
>  } while (remain != 0);
>
> and I suggest to instead do
>
>  do {
>old_remain = remain;
>len = MIN (vf, remain);
>remain -= vf;
>  } while (old_remain >= vf);
>
> basically since only the last iteration will have len < vf we can
> ignore that remain -= vf will underflow there if we appropriately
> rewrite the exit test to use the pre-decrement value.

Yeah, agree that should work.

But how easy would it be to extend SCEV analysis, via a pattern match?
The evolution of the IV phi wrt the inner loop is still a normal SCEV.

Thanks,
Richard


Re: [PATCH] libatomic: Provide gthr.h default implementation

2023-05-30 Thread Sebastian Huber

On 30.05.23 11:53, Richard Biener wrote:

On Tue, May 23, 2023 at 11:28 AM Sebastian Huber
  wrote:

On 10.01.23 16:38, Sebastian Huber wrote:

On 19/12/2022 17:02, Sebastian Huber wrote:

Build libatomic for all targets.  Use gthr.h to provide a default
implementation.  If the thread model is "single", then this
implementation will
not work if for example atomic operations are used for thread/interrupt
synchronization.

Is this and the related -fprofile-update=atomic patch something for GCC 14?

Now that the GCC 14 development is in progress, what about this patch?

Sorry, there doesn't seem to be a main maintainer for libatomic and your patch
touches targets which didn't have it before.

Can you explain how this affects the ABI of targets not having (needing?!)
libatomic?  It might help if you can say this is still opt-in and targets not
building libatomic right now would not with your patch and targets already
building libatomic have no changes with your patch.

That said - what kind of ABI implications has providing libatomic support
for a target that didn't do so before?


Sorry for the missing context. The root problem I want to solve is 
getting gcov support for multi-threaded applications. For this we need 
atomic 64-bit operations, see also:


https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608620.html

The libatomic patch lets it build for every target. Targets with no 
explicit support will use the gthr.h API to provide a default 
implementation.
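
A minimal sketch of what such a gthr-based fallback could look like (the
helper name is made up and it assumes the port defines __GTHREAD_MUTEX_INIT;
real libatomic hashes the object address into a table of locks rather than
using one global lock):

#include "gthr.h"

static __gthread_mutex_t atomic_lock = __GTHREAD_MUTEX_INIT;

unsigned long long
locked_fetch_add_8 (unsigned long long *ptr, unsigned long long val)
{
  unsigned long long old;
  __gthread_mutex_lock (&atomic_lock);  /* no-op under the "single" model */
  old = *ptr;
  *ptr = old + val;
  __gthread_mutex_unlock (&atomic_lock);
  return old;
}

With the "single" thread model the lock degenerates to a no-op, which is
exactly the thread/interrupt caveat above.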


An alternative would be to use the RTEMS approach which uses the 
following API (provided by Newlib  for RTEMS):


#include 
#include 

__BEGIN_DECLS

__uint32_t _Libatomic_Protect_start(void *);

void _Libatomic_Protect_end(void *, __uint32_t);

void _Libatomic_Lock_n(void *, __size_t);

void _Libatomic_Unlock_n(void *, __size_t);

__END_DECLS

We could also leave libatomic as is, but then you may get unresolved 
references if you use -fprofile-update=atomic with the patch mentioned 
above.


--
embedded brains GmbH
Herr Sebastian HUBER
Dornierstr. 4
82178 Puchheim
Germany
email: sebastian.hu...@embedded-brains.de
phone: +49-89-18 94 741 - 16
fax:   +49-89-18 94 741 - 08

Registergericht: Amtsgericht München
Registernummer: HRB 157899
Vertretungsberechtigte Geschäftsführer: Peter Rasmussen, Thomas Dörfler
Unsere Datenschutzerklärung finden Sie hier:
https://embedded-brains.de/datenschutzerklaerung/


Re: decrement IV patch creates failures on PowerPC

2023-05-30 Thread Richard Biener via Gcc-patches
On Tue, 30 May 2023, Richard Sandiford wrote:

> My understanding was that we went into this knowing that the IVs
> would defeat SCEV analysis.  Apparently that wasn't a problem for RVV,
> but it's not surprising that it is a problem in general.
> 
> This isn't just about SELECT_VL though.  We use the same type of IV
> for cases what aren't going to use SELECT_VL.
> 
> Richard Biener  writes:
> > On Tue, 30 May 2023, Kewen.Lin wrote:
> >
> >> on 2023/5/30 17:26, juzhe.zh...@rivai.ai wrote:
> >> > Ok.
> >> > 
> >> > It seems that for this conditions:
> >> > 
> >> > +  /* If we're vectorizing a loop that uses length "controls" and
> >> > + can iterate more than once, we apply decrementing IV approach
> >> > + in loop control.  */
> >> > +  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
> >> > +  && !LOOP_VINFO_LENS (loop_vinfo).is_empty ()
> >> > +  && LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) == 0
> >> > +  && !(LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> >> > +   && known_le (LOOP_VINFO_INT_NITERS (loop_vinfo),
> >> > +LOOP_VINFO_VECT_FACTOR (loop_vinfo
> >> > +LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) = true;
> >> > 
> >> > 
> >> > I should add direct_supported_p (SELECT_VL...) to this, is that right?
> >> 
> >> I guess no, with this condition any targets without SELECT_VL are unable
> >> to leverage the new decrement scheme for lengths, as your reply in PR109971
> >> you didn't mean to disable it.  IIUC, what Richi suggested is to introduce
> >> one new IV just like the previous one which has non-variable step, then 
> >> it's
> >> SCEV-ed and some analysis based on it can do a good job.
> >
> > No, I said the current scheme does sth along
> >
> >  do {
> >remain -= MIN (vf, remain);
> >  } while (remain != 0);
> >
> > and I suggest to instead do
> >
> >  do {
> >old_remain = remain;
> >len = MIN (vf, remain);
> >remain -= vf;
> >  } while (old_remain >= vf);
> >
> > basically since only the last iteration will have len < vf we can
> > ignore that remain -= vf will underflow there if we appropriately
> > rewrite the exit test to use the pre-decrement value.
> 
> Yeah, agree that should work.

Btw, it's still on my TOOD list (unless somebody beats me...) to
rewrite the vectorizer code gen to do all loop control and conditions
on a decrementing "remaining scalar iters" IV.

> But how easy would it be to extend SCEV analysis, via a pattern match?
> The evolution of the IV phi wrt the inner loop is still a normal SCEV.

No, the IV isn't a normal SCEV, the final value is different.
I think pattern matching this in niter analysis could work though.

Richard.


Re: [PATCH] riscv: update riscv_asan_shadow_offset

2023-05-30 Thread Kito Cheng via Gcc-patches
Andreas Schwab via Gcc-patches  於 2023年5月30日 週二
17:37 寫道:

> Ok for 12 and 13 branch?
>

Yes, thanks!


> --
> Andreas Schwab, SUSE Labs, sch...@suse.de
> GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
> "And now for something completely different."
>


[PATCH] Update perf auto profile script

2023-05-30 Thread Andi Kleen via Gcc-patches
- Fix gen_autofdo_event: The download URL for the Intel Perfmon Event
  list has changed, as well as the JSON format.
  Also it now uses pattern matching to match CPUs. Update the script to support 
all of this.
- Regenerate gcc-auto-profile with the latest published Intel model
  numbers, so it works with recent systems.
- So far it's still broken on hybrid systems
---
 contrib/gen_autofdo_event.py | 7 ---
 gcc/config/i386/gcc-auto-profile | 9 -
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/contrib/gen_autofdo_event.py b/contrib/gen_autofdo_event.py
index ac23b83888db..533c706c090b 100755
--- a/contrib/gen_autofdo_event.py
+++ b/contrib/gen_autofdo_event.py
@@ -32,8 +32,9 @@ import json
 import argparse
 import collections
 import os
+import fnmatch
 
-baseurl = "https://download.01.org/perfmon";
+baseurl = "https://raw.githubusercontent.com/intel/perfmon/main";
 
 target_events = ('BR_INST_RETIRED.NEAR_TAKEN',
  'BR_INST_EXEC.TAKEN',
@@ -74,7 +75,7 @@ def get_cpustr():
 def find_event(eventurl, model):
 print("Downloading", eventurl, file = sys.stderr)
 u = urllib.request.urlopen(eventurl)
-events = json.loads(u.read())
+events = json.loads(u.read())["Events"]
 u.close()
 
 found = 0
@@ -102,7 +103,7 @@ found = 0
 cpufound = 0
 for j in u:
 n = j.rstrip().decode().split(',')
-if len(n) >= 4 and (args.all or n[0] == cpu) and n[3] == "core":
+if len(n) >= 4 and (args.all or fnmatch.fnmatch(cpu, n[0])) and n[3] == "core":
 components = n[0].split("-")
 model = components[2]
 model = int(model, 16)
diff --git a/gcc/config/i386/gcc-auto-profile b/gcc/config/i386/gcc-auto-profile
index 5ab224b041b9..04f7d35dcc51 100755
--- a/gcc/config/i386/gcc-auto-profile
+++ b/gcc/config/i386/gcc-auto-profile
@@ -43,8 +43,10 @@ model*:\ 47|\
 model*:\ 37|\
 model*:\ 44) E="cpu/event=0x88,umask=0x40/$FLAGS" ;;
 model*:\ 55|\
+model*:\ 74|\
 model*:\ 77|\
 model*:\ 76|\
+model*:\ 90|\
 model*:\ 92|\
 model*:\ 95|\
 model*:\ 87|\
@@ -75,14 +77,19 @@ model*:\ 165|\
 model*:\ 166|\
 model*:\ 85|\
 model*:\ 85) E="cpu/event=0xC4,umask=0x20/p$FLAGS" ;;
+model*:\ 125|\
 model*:\ 126|\
+model*:\ 167|\
 model*:\ 140|\
 model*:\ 141|\
 model*:\ 143|\
+model*:\ 207|\
 model*:\ 106|\
 model*:\ 108) E="cpu/event=0xc4,umask=0x20/p$FLAGS" ;;
 model*:\ 134|\
-model*:\ 150) E="cpu/event=0xc4,umask=0xfe/p$FLAGS" ;;
+model*:\ 150|\
+model*:\ 156|\
+model*:\ 190) E="cpu/event=0xc4,umask=0xfe/p$FLAGS" ;;
 *)
 echo >&2 "Unknown CPU. Run contrib/gen_autofdo_event.py --all --script to 
update script."
exit 1 ;;
-- 
2.40.1
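
For reference, the CPU matching the script now does follows shell-style
(fnmatch) patterns; a standalone C sketch of the same check, using a
made-up mapfile entry as the pattern:

#include <fnmatch.h>
#include <stdio.h>

int
main (void)
{
  /* The mapfile field is the pattern and the running CPU the string;
     "GenuineIntel-6-8[CD]*" is a made-up example entry.  */
  const char *cpu = "GenuineIntel-6-8C-1";
  if (fnmatch ("GenuineIntel-6-8[CD]*", cpu, 0) == 0)
    puts ("matched");
  return 0;
}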



[committed] OpenMP: Improve C/C++ parsing error message [PR109999]

2023-05-30 Thread Tobias Burnus

I stumbled over that error message the other day and found it a bit
confusing:

  error: expected ‘#pragma omp’ clause before ‘uses_allocators’

The new wording is not the best, but I think at least better:

  error: expected an OpenMP clause before ‘uses_allocators’

('uses_allocators' is a valid OpenMP 5.x clause but not yet handled in GCC;
There is a patch that implements the support but some review-requested
changes are still required before it can be merged.)

Committed as r14-1404-ga899401404186843f38462c8fc9de733f19ce864

Tobias

PS: Jakub did wonder about the translatability of "expect ..." + "",
but it is a preexisting issue – and the commit makes it neither worse
nor better than before.
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
commit a899401404186843f38462c8fc9de733f19ce864
Author: Tobias Burnus 
Date:   Tue May 30 12:49:09 2023 +0200

OpenMP: Improve C/C++ parsing error message [PR109999]

Replace
  error: expected '#pragma omp' clause before ...
by the more readable/clearer
  error: expected an OpenMP clause before ...

(And likewise for '#pragma acc' and OpenACC.)

PR c/109999

gcc/c/ChangeLog:

* c-parser.cc (c_parser_oacc_all_clauses,
c_parser_omp_all_clauses): Improve error wording.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_oacc_all_clauses,
cp_parser_omp_all_clauses): Improve error wording.

gcc/testsuite/ChangeLog:

* c-c++-common/goacc/asyncwait-1.c: Update dg-error.
* c-c++-common/goacc/clauses-fail.c: Likewise.
* c-c++-common/goacc/data-2.c: Likewise.
* c-c++-common/gomp/declare-target-2.c: Likewise.
* c-c++-common/gomp/directive-1.c: Likewise.
* g++.dg/goacc/data-1.C: Likewise.
---
 gcc/c/c-parser.cc  | 4 ++--
 gcc/cp/parser.cc   | 4 ++--
 gcc/testsuite/c-c++-common/goacc/asyncwait-1.c | 4 ++--
 gcc/testsuite/c-c++-common/goacc/clauses-fail.c| 8 
 gcc/testsuite/c-c++-common/goacc/data-2.c  | 2 +-
 gcc/testsuite/c-c++-common/gomp/declare-target-2.c | 4 ++--
 gcc/testsuite/c-c++-common/gomp/directive-1.c  | 2 +-
 gcc/testsuite/g++.dg/goacc/data-1.C| 4 ++--
 8 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 0ec75348091..5baa501dbee 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -17692,7 +17692,7 @@ c_parser_oacc_all_clauses (c_parser *parser, omp_clause_mask mask,
 		c_name, clauses);
 	  break;
 	default:
-	  c_parser_error (parser, "expected %<#pragma acc%> clause");
+	  c_parser_error (parser, "expected an OpenACC clause");
 	  goto saw_error;
 	}
 
@@ -18050,7 +18050,7 @@ c_parser_omp_all_clauses (c_parser *parser, omp_clause_mask mask,
 	  c_name = "enter";
 	  break;
 	default:
-	  c_parser_error (parser, "expected %<#pragma omp%> clause");
+	  c_parser_error (parser, "expected an OpenMP clause");
 	  goto saw_error;
 	}
 
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 5feed77c7ac..1c9aa671851 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -41087,7 +41087,7 @@ cp_parser_oacc_all_clauses (cp_parser *parser, omp_clause_mask mask,
 		 c_name, clauses);
 	  break;
 	default:
-	  cp_parser_error (parser, "expected %<#pragma acc%> clause");
+	  cp_parser_error (parser, "expected an OpenACC clause");
 	  goto saw_error;
 	}
 
@@ -41489,7 +41489,7 @@ cp_parser_omp_all_clauses (cp_parser *parser, omp_clause_mask mask,
 	  c_name = "enter";
 	  break;
 	default:
-	  cp_parser_error (parser, "expected %<#pragma omp%> clause");
+	  cp_parser_error (parser, "expected an OpenMP clause");
 	  goto saw_error;
 	}
 
diff --git a/gcc/testsuite/c-c++-common/goacc/asyncwait-1.c b/gcc/testsuite/c-c++-common/goacc/asyncwait-1.c
index 1857d65a0b2..3f0b1451cf6 100644
--- a/gcc/testsuite/c-c++-common/goacc/asyncwait-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/asyncwait-1.c
@@ -185,9 +185,9 @@ f (int N, float *a, float *b)
 
 #pragma acc wait (1.0) /* { dg-error "expression must be integral" } */
 
-#pragma acc wait 1 /* { dg-error "expected '#pragma acc' clause before numeric constant" } */
+#pragma acc wait 1 /* { dg-error "expected an OpenACC clause before numeric constant" } */
 
-#pragma acc wait N /* { dg-error "expected '#pragma acc' clause before 'N'" } */
+#pragma acc wait N /* { dg-error "expected an OpenACC clause before 'N'" } */
 
 #pragma acc wait async (1 2) /* { dg-error "expected '\\)' before numeric constant" } */
 
diff --git a/gcc/testsuite/c-c++-common/goacc/clauses-fail.c b/gcc/testsuite/c-c++-common/goacc/clauses-fail.c
index 853d010d038..41d7e6bd194 100644
--- a

Re: [PATCH] libatomic: Provide gthr.h default implementation

2023-05-30 Thread Richard Biener via Gcc-patches
On Tue, May 30, 2023 at 12:17 PM Sebastian Huber
 wrote:
>
> On 30.05.23 11:53, Richard Biener wrote:
> > On Tue, May 23, 2023 at 11:28 AM Sebastian Huber
> >   wrote:
> >> On 10.01.23 16:38, Sebastian Huber wrote:
> >>> On 19/12/2022 17:02, Sebastian Huber wrote:
>  Build libatomic for all targets.  Use gthr.h to provide a default
>  implementation.  If the thread model is "single", then this
>  implementation will
>  not work if for example atomic operations are used for thread/interrupt
>  synchronization.
> >>> Is this and the related -fprofile-update=atomic patch something for GCC 
> >>> 14?
> >> Now that the GCC 14 development is in progress, what about this patch?
> > Sorry, there doesn't seem to be a main maintainer for libatomic and your 
> > patch
> > touches targets which didn't have it before.
> >
> > Can you explain how this affects the ABI of targets not having (needing?!)
> > libatomic?  It might help if you can say this is still opt-in and targets 
> > not
> > building libatomic right now would not with your patch and targets already
> > building libatomic have no changes with your patch.
> >
> > That said - what kind of ABI implications has providing libatomic support
> > for a target that didn't do so before?
>
> Sorry for the missing context. The root problem I want to solve is
> getting gcov support for multi-threaded applications. For this we need
> atomic 64-bit operations, see also:

I was aware of the context but still worry about the ABI implications.
A target that doesn't build libatomic but would need one currently
has "unsupported" (aka fail to link) atomic operations that require
libatomic support.  After your patch such targets suddenly have
a new ABI (and supported atomic ops) - this ABI they need to
maintain for compatibility reasons I think but it would be (likely)
not documented anywhere.

I think that's undesirable, esp. without buy-in from the affected
target maintainers.

> https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608620.html
>
> The libatomic patch lets it build for every target. Targets with no
> explicit support will use the gthr.h API to provide a default
> implementation.
>
> An alternative would be to use the RTEMS approach which uses the
> following API (provided by Newlib  for RTEMS):
>
> #include 
> #include 
>
> __BEGIN_DECLS
>
> __uint32_t _Libatomic_Protect_start(void *);
>
> void _Libatomic_Protect_end(void *, __uint32_t);
>
> void _Libatomic_Lock_n(void *, __size_t);
>
> void _Libatomic_Unlock_n(void *, __size_t);
>
> __END_DECLS
>
> We could also leave libatomic as is, but then you may get unresolved
> references if you use -fprofile-update=atomic with the patch mentioned
> above.

The alternative would be to provide the required subset of atomic
library functions from libgcov.a and emit calls to that directly?
The locked data isn't part of any ABI so no compatibility guarantee
needs to be maintained?
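
A sketch of that alternative, i.e. conceptually the per-counter operation
-fprofile-update=atomic needs (the helper name is made up; whether the
builtin expands inline or to a library call is exactly the point at issue):

#include <stdint.h>

typedef int64_t gcov_type;  /* 64-bit profile counter */

/* Counter bump under -fprofile-update=atomic.  On targets without
   native 64-bit atomics this currently lowers to a libatomic call,
   which is the unresolved-reference problem above; a private copy of
   just this operation could live in libgcov.a instead.  */
static inline void
gcov_counter_bump (gcov_type *counter)
{
  __atomic_fetch_add (counter, 1, __ATOMIC_RELAXED);
}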

Richard.

>
> --
> embedded brains GmbH
> Herr Sebastian HUBER
> Dornierstr. 4
> 82178 Puchheim
> Germany
> email: sebastian.hu...@embedded-brains.de
> phone: +49-89-18 94 741 - 16
> fax:   +49-89-18 94 741 - 08
>
> Registergericht: Amtsgericht München
> Registernummer: HRB 157899
> Vertretungsberechtigte Geschäftsführer: Peter Rasmussen, Thomas Dörfler
> Unsere Datenschutzerklärung finden Sie hier:
> https://embedded-brains.de/datenschutzerklaerung/


Re: [PATCH V2] [vect] Enhance NARROW FLOAT_EXPR vectorization by truncating integer to lower precision.

2023-05-30 Thread Richard Biener via Gcc-patches
On Mon, May 29, 2023 at 5:21 AM Hongtao Liu via Gcc-patches
 wrote:
>
> ping.
>
> On Mon, May 8, 2023 at 9:59 AM liuhongt  wrote:
> >
> > > > @@ -4799,7 +4800,8 @@ vect_create_vectorized_demotion_stmts (vec_info 
> > > > *vinfo, vec *vec_oprnds,
> > > >stmt_vec_info stmt_info,
> > > >vec &vec_dsts,
> > > >gimple_stmt_iterator *gsi,
> > > > -  slp_tree slp_node, enum 
> > > > tree_code code)
> > > > +  slp_tree slp_node, enum 
> > > > tree_code code,
> > > > +  bool last_stmt_p)
> > >
> > > Can you please document this new parameter?
> > >
> > Changed.
> >
> > >
> > > I understand what you are doing, but somehow it looks a bit awkward?
> > > Maybe we should split the NARROW case into NARROW_SRC and NARROW_DST?
> > > The case of narrowing the source because we know its range isn't a
> > > good fit for the
> > > flow.
> > Changed.
> >
> > Here's updated patch.
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > Ok for trunk?

OK, sorry for the delay.

Thanks,
Richard.

> > Similar to WIDEN FLOAT_EXPR: when a direct optab does not exist, try an
> > intermediate integer type whenever the gimple ranger can tell it's safe.
> >
> > I.e.:
> > When there's no direct optab for vector long long -> vector float, but
> > the value range of the integer can be represented as int, try vector int
> > -> vector float if available.
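
In scalar terms the transform is as follows (a sketch; the mask is made up,
chosen so the ranger can prove the value fits in int32_t):

#include <stdint.h>

float
narrow_then_convert (uint64_t k)
{
  /* value <= 0xFFFF, so truncation to int32_t is lossless and the
     conversion can be vectorized as int -> float instead of the
     unsupported long long -> float.  */
  int32_t t = (int32_t) (k & 0xFFFF);
  return (float) t;
}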
> >
> > gcc/ChangeLog:
> >
> > PR tree-optimization/108804
> > * tree-vect-patterns.cc (vect_get_range_info): Remove static.
> > * tree-vect-stmts.cc (vect_create_vectorized_demotion_stmts):
> > Add new parameter narrow_src_p.
> > (vectorizable_conversion): Enhance NARROW FLOAT_EXPR
> > vectorization by truncating to lower precision.
> > * tree-vectorizer.h (vect_get_range_info): New declare.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/pr108804.c: New test.
> > ---
> >  gcc/testsuite/gcc.target/i386/pr108804.c |  15 +++
> >  gcc/tree-vect-patterns.cc|   2 +-
> >  gcc/tree-vect-stmts.cc   | 135 +--
> >  gcc/tree-vectorizer.h|   1 +
> >  4 files changed, 121 insertions(+), 32 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr108804.c
> >
> > diff --git a/gcc/testsuite/gcc.target/i386/pr108804.c 
> > b/gcc/testsuite/gcc.target/i386/pr108804.c
> > new file mode 100644
> > index 000..2a43c1e1848
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr108804.c
> > @@ -0,0 +1,15 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-mavx2 -Ofast -fdump-tree-vect-details" } */
> > +/* { dg-final { scan-tree-dump-times "vectorized \[1-3] loops" 1 "vect" } } */
> > +
> > +typedef unsigned long long uint64_t;
> > +uint64_t d[512];
> > +float f[1024];
> > +
> > +void foo() {
> > +for (int i=0; i<512; ++i) {
> > +uint64_t k = d[i];
> > +f[i]=(k & 0x3F30);
> > +}
> > +}
> > +
> > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> > index a49b0953977..dd546b488a4 100644
> > --- a/gcc/tree-vect-patterns.cc
> > +++ b/gcc/tree-vect-patterns.cc
> > @@ -61,7 +61,7 @@ along with GCC; see the file COPYING3.  If not see
> >  /* Return true if we have a useful VR_RANGE range for VAR, storing it
> > in *MIN_VALUE and *MAX_VALUE if so.  Note the range in the dump files.  
> > */
> >
> > -static bool
> > +bool
> >  vect_get_range_info (tree var, wide_int *min_value, wide_int *max_value)
> >  {
> >value_range vr;
> > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > index 6b7dbfd4a23..3da89a8402d 100644
> > --- a/gcc/tree-vect-stmts.cc
> > +++ b/gcc/tree-vect-stmts.cc
> > @@ -51,6 +51,7 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "internal-fn.h"
> >  #include "tree-vector-builder.h"
> >  #include "vec-perm-indices.h"
> > +#include "gimple-range.h"
> >  #include "tree-ssa-loop-niter.h"
> >  #include "gimple-fold.h"
> >  #include "regs.h"
> > @@ -4791,7 +4792,9 @@ vect_gen_widened_results_half (vec_info *vinfo, enum 
> > tree_code code,
> >
> >  /* Create vectorized demotion statements for vector operands from 
> > VEC_OPRNDS.
> > For multi-step conversions store the resulting vectors and call the 
> > function
> > -   recursively.  */
> > +   recursively. When NARROW_SRC_P is true, there's still a conversion after
> > +   narrowing, don't store the vectors in the SLP_NODE or in vector info of
> > +   the scalar statement(or in STMT_VINFO_RELATED_STMT chain).  */
> >
> >  static void
> >  vect_create_vectorized_demotion_stmts (vec_info *vinfo, vec 
> > *vec_oprnds,
> > @@ -4799,7 +4802,8 @@ vect_create_vectorized_demotion_stmts (vec_info 
> > *vinfo, vec *vec_oprnds,
> >stmt_vec_info 

Re: [PATCH v1] tree-ssa-sink: Improve code sinking pass.

2023-05-30 Thread Richard Biener via Gcc-patches
On Tue, May 30, 2023 at 9:35 AM Ajit Agarwal  wrote:
>
> Hello Richard:
>
> On 30/05/23 12:34 pm, Richard Biener wrote:
> > On Tue, May 30, 2023 at 7:06 AM Ajit Agarwal  wrote:
> >>
> >> Hello Richard:
> >>
> >> On 22/05/23 6:26 pm, Richard Biener wrote:
> >>> On Thu, May 18, 2023 at 9:14 AM Ajit Agarwal  
> >>> wrote:
> 
>  Hello All:
> 
>  This patch improves the code sinking pass to sink statements before a call
>  to reduce register pressure.
>  Review comments are incorporated.
> 
>  Bootstrapped and regtested on powerpc64-linux-gnu.
> 
>  Thanks & Regards
>  Ajit
> 
> 
>  tree-ssa-sink: Improve code sinking pass.
> 
>  Code sinking sinks blocks after a call.  This increases
>  register pressure for callee-saved registers.  Improve
>  code sinking before a call in the use blocks or the immediate
>  dominator of the use blocks.
> 
>  2023-05-18  Ajit Kumar Agarwal  
> 
>  gcc/ChangeLog:
> 
>  * tree-ssa-sink.cc (statement_sink_location): Modified to
>  move statements before calls.
>  (block_call_p): New function.
>  (def_use_same_block): New function.
>  (select_best_block): Add heuristics to select the best
>  blocks in the immediate post dominator.
> 
>  gcc/testsuite/ChangeLog:
> 
>  * gcc.dg/tree-ssa/ssa-sink-20.c: New testcase.
>  * gcc.dg/tree-ssa/ssa-sink-21.c: New testcase.
>  ---
>   gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c |  16 ++
>   gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c |  20 +++
>   gcc/tree-ssa-sink.cc| 159 ++--
>   3 files changed, 185 insertions(+), 10 deletions(-)
>   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c
>   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
> 
>  diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c 
>  b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c
>  new file mode 100644
>  index 000..716bc1f9257
>  --- /dev/null
>  +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c
>  @@ -0,0 +1,16 @@
>  +/* { dg-do compile } */
>  +/* { dg-options "-O2 -fdump-tree-sink -fdump-tree-optimized 
>  -fdump-tree-sink-stats" } */
>  +
>  +void bar();
>  +int j;
>  +void foo(int a, int b, int c, int d, int e, int f)
>  +{
>  +  int l;
>  +  l = a + b + c + d +e + f;
>  +  if (a != 5)
>  +{
>  +  bar();
>  +  j = l;
>  +}
>  +}
>  +/* { dg-final { scan-tree-dump-times "Sunk statements: 5" 1 "sink" } } 
>  */
> >>>
> >>> this doesn't verify the place we sink to?
> >>>
> >>
> >> I am not sure how to verify the place we sink to with dg-final.
> >
> > I think dejagnu supports matching multi-line regexps so I suggest
> > to scan for the sunk expr RHS to be followed by the call?
> >
>
> You meant to use dg-begin-multiline-output and dg-end-multiline-output.

I was referring to uses like that in gcc.dg/debug/dwarf2/pr41445-6.c

> Thanks & Regards
> Ajit
>  diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c 
>  b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
>  new file mode 100644
>  index 000..ff41e2ea8ae
>  --- /dev/null
>  +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
>  @@ -0,0 +1,20 @@
>  +/* { dg-do compile } */
>  +/* { dg-options "-O2 -fdump-tree-sink-stats -fdump-tree-sink-stats" } */
>  +
>  +void bar();
>  +int j, x;
>  +void foo(int a, int b, int c, int d, int e, int f)
>  +{
>  +  int l;
>  +  l = a + b + c + d +e + f;
>  +  if (a != 5)
>  +{
>  +  bar();
>  +  if (b != 3)
>  +x = 3;
>  +  else
>  +x = 5;
>  +  j = l;
>  +}
>  +}
>  +/* { dg-final { scan-tree-dump-times "Sunk statements: 5" 1 "sink" } } 
>  */
> >>>
> >>> likewise.  So both tests already pass before the patch?
> >>>
>  diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
>  index 87b1d40c174..76556e7795b 100644
>  --- a/gcc/tree-ssa-sink.cc
>  +++ b/gcc/tree-ssa-sink.cc
>  @@ -171,6 +171,72 @@ nearest_common_dominator_of_uses (def_operand_p 
>  def_p, bool *debug_stmts)
> return commondom;
>   }
> 
>  +/* Return TRUE if immediate uses of the defs in
>  +   USE occur in the same block as USE, FALSE otherwise.  */
>  +
>  +bool
>  +def_use_same_block (gimple *stmt)
>  +{
>  +  use_operand_p use_p;
>  +  def_operand_p def_p;
>  +  imm_use_iterator imm_iter;
>  +  ssa_op_iter iter;
>  +
>  +  FOR_EACH_SSA_DEF_OPERAND (def_p, stmt, iter, SSA_OP_DEF)
>  +{
>  +  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, DEF_FROM_PTR (def_p))
>  +   {
>  + if (is_gimple_debug (USE_STMT (use_p)))
>  +   cont

[PATCH] VECT: Change flow of decrement IV

2023-05-30 Thread juzhe . zhong
From: Ju-Zhe Zhong 

Following Richi's suggestion, I change the current decrement IV flow from:

do {
   remain -= MIN (vf, remain);
} while (remain != 0);

into:

do {
   old_remain = remain;
   len = MIN (vf, remain);
   remain -= vf;
} while (old_remain >= vf);

to enhance SCEV.

All (decrement IV) tests of RVV pass.
Ok for trunk?

gcc/ChangeLog:

* tree-vect-loop-manip.cc (vect_set_loop_controls_directly): Change 
decrement IV flow.
(vect_set_loop_condition_partial_vectors): Ditto.

---
 gcc/tree-vect-loop-manip.cc | 40 +
 1 file changed, 27 insertions(+), 13 deletions(-)

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index acf3642ceb2..ef28711c58f 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -483,7 +483,7 @@ vect_set_loop_controls_directly (class loop *loop, 
loop_vec_info loop_vinfo,
 gimple_stmt_iterator loop_cond_gsi,
 rgroup_controls *rgc, tree niters,
 tree niters_skip, bool might_wrap_p,
-tree *iv_step)
+tree *iv_step, tree *compare_step)
 {
   tree compare_type = LOOP_VINFO_RGROUP_COMPARE_TYPE (loop_vinfo);
   tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
@@ -538,24 +538,26 @@ vect_set_loop_controls_directly (class loop *loop, 
loop_vec_info loop_vinfo,
   ...
   vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0);
   ...
-  ivtmp_35 = ivtmp_9 - _36;
+  ivtmp_35 = ivtmp_9 - POLY_INT_CST [4, 4];
   ...
-  if (ivtmp_35 != 0)
-goto ; [83.33%]
+  if (ivtmp_9 > POLY_INT_CST [4, 4])
+goto ; [83.33%]
   else
-goto ; [16.67%]
+goto ; [16.67%]
   */
   nitems_total = gimple_convert (preheader_seq, iv_type, nitems_total);
   tree step = rgc->controls.length () == 1 ? rgc->controls[0]
   : make_ssa_name (iv_type);
   /* Create decrement IV.  */
-  create_iv (nitems_total, MINUS_EXPR, step, NULL_TREE, loop, &incr_gsi,
-insert_after, &index_before_incr, &index_after_incr);
+  create_iv (nitems_total, MINUS_EXPR, nitems_step, NULL_TREE, loop,
+&incr_gsi, insert_after, &index_before_incr,
+&index_after_incr);
   gimple_seq_add_stmt (header_seq, gimple_build_assign (step, MIN_EXPR,
index_before_incr,
nitems_step));
   *iv_step = step;
-  return index_after_incr;
+  *compare_step = nitems_step;
+  return index_before_incr;
 }
 
   /* Create increment IV.  */
@@ -825,6 +827,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
  arbitrarily pick the last.  */
   tree test_ctrl = NULL_TREE;
   tree iv_step = NULL_TREE;
+  tree compare_step = NULL_TREE;
   rgroup_controls *rgc;
   rgroup_controls *iv_rgc = nullptr;
   unsigned int i;
@@ -861,7 +864,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
 &preheader_seq, &header_seq,
 loop_cond_gsi, rgc, niters,
 niters_skip, might_wrap_p,
-&iv_step);
+&iv_step, &compare_step);
 
iv_rgc = rgc;
  }
@@ -884,11 +887,22 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
 
   /* Get a boolean result that tells us whether to iterate.  */
   edge exit_edge = single_exit (loop);
-  tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? EQ_EXPR : NE_EXPR;
+  gcond *cond_stmt;
   tree zero_ctrl = build_zero_cst (TREE_TYPE (test_ctrl));
-  gcond *cond_stmt = gimple_build_cond (code, test_ctrl, zero_ctrl,
-   NULL_TREE, NULL_TREE);
-  gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT);
+  if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
+{
+  gcc_assert (compare_step);
+  cond_stmt = gimple_build_cond (GT_EXPR, test_ctrl, compare_step,
+NULL_TREE, NULL_TREE);
+  gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT);
+}
+  else
+{
+  tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? EQ_EXPR : NE_EXPR;
+  cond_stmt
+   = gimple_build_cond (code, test_ctrl, zero_ctrl, NULL_TREE, NULL_TREE);
+  gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT);
+}
 
   /* The loop iterates (NITERS - 1) / VF + 1 times.
  Subtract one from this to get the latch count.  */
-- 
2.36.1



Re: decrement IV patch creates failures on PowerPC

2023-05-30 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
>> But how easy would it be to extend SCEV analysis, via a pattern match?
>> The evolution of the IV phi wrt the inner loop is still a normal SCEV.
>
> No, the IV isn't a normal SCEV, the final value is different.

Which part of the IV though?  Won't all executions of the latch edge
decrement the IV phi (and specifically the phi) by VF (and only VF)?
So if we analyse the IV phi wrt the inner loop, the IV phi is simply
{ initial, -, VF }.

I agree "IV phi - step" isn't a SCEV, but that doesn't seem fatal.

Richard


Re: Re: decrement IV patch creates failures on PowerPC

2023-05-30 Thread juzhe.zh...@rivai.ai
Hi, Richi.
I have sent a patch following your suggestion, changing the decrement IV flow:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620086.html 

It works well in RVV.

Could you take a look at it?
If it's ok, I will send patch of SELECT_VL base on this.

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-05-30 17:50
To: juzhe.zh...@rivai.ai
CC: gcc-patches; richard.sandiford; linkw
Subject: Re: Re: decrement IV patch creates failures on PowerPC
On Tue, 30 May 2023, juzhe.zh...@rivai.ai wrote:
 
> Ok.
> 
> It seems that for this conditions:
> 
> +  /* If we're vectorizing a loop that uses length "controls" and
> + can iterate more than once, we apply decrementing IV approach
> + in loop control.  */
> +  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
> +  && !LOOP_VINFO_LENS (loop_vinfo).is_empty ()
> +  && LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) == 0
> +  && !(LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> +&& known_le (LOOP_VINFO_INT_NITERS (loop_vinfo),
> + LOOP_VINFO_VECT_FACTOR (loop_vinfo
> +LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) = true;
> 
> I should add direct_supported_p (SELECT_VL...) to this, is that right?
 
No, since powerpc is fine with decrementing VL it should also use it.
Instead you should make sure to produce SCEV analyzable IVs when
possible (when SELECT_VL is not or cannot be used).
 
Richard.
 
> I have sent the SELECT_VL patch. I will add this in the next SELECT_VL patch.
> 
> Let's wait Richard's more comments.
> 
> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-05-30 17:22
> To: juzhe.zh...@rivai.ai
> CC: gcc-patches; richard.sandiford; linkw
> Subject: Re: Re: decrement IV patch creates failures on PowerPC
> On Fri, 26 May 2023, juzhe.zh...@rivai.ai wrote:
>  
> > Hi, Richi. Thanks for your analysis and help.
> > 
> > >> We could simply retain the original
> > >> incrementing IV for loop control and add the decrementing
> > >> IV for computing LEN in addition to that and leave IVOPTs
> > >> sorting out to eventually merge them (or not).
> > 
> > I am not sure how to do that. Could you give me more informations?
> > 
> > I somehow understand your concern to be that a variable-step IV will make
> > IVOPTs fail.
> > 
> > I have seen a similar situation in LLVM (when applying a variable IV,
> > it failed to interleave the vectorized code). I am not sure whether it
> > was for the same reason.
> > 
> > For RVV, we not only want the decrement IV style in vectorization, but we
> > also want to apply SELECT_VL in the single-rgroup case, which is the most
> > common (LLVM likewise only applies get_vector_length with a single vector
> > length).
> >
> > >>You can do some testing with a cross compiler, alternatively
> > >>there are powerpc machines in the GCC compile farm.
> > 
> > It seems that Power is ok with decrement IV since most cases are improved.
>  
> Well, but Power never will have SELECT_VL so at least for !SELECT_VL
> targets you should avoid having an IV with variable decrement.  As
> I said it should be easy to rewrite decrement IV to use a constant
> increment (when not using SELECT_VL) and testing the pre-decrement
> value in the exit test.
>  
> Richard.
> > I think Richard may help to explain decrement IV more clearly.
> > 
> > Thanks
> > 
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-05-26 14:46
> > To: 钟居哲
> > CC: gcc-patches; richard.sandiford; linkw
> > Subject: Re: decrement IV patch creates failures on PowerPC
> > On Fri, 26 May 2023, 钟居哲 wrote:
> >  
> > > Yesterday's patch has been approved (decrement IV support):
> > > https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619663.html 
> > > 
> > > However, it creates fails on PowerPC:
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109971 
> > > 
> > > I am really sorry for causing inconvenience.
> > > 
> > > I wonder, as we discussed:
> > > +  /* If we're vectorizing a loop that uses length "controls" and
> > > + can iterate more than once, we apply decrementing IV approach
> > > + in loop control.  */
> > > +  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
> > > +  && !LOOP_VINFO_LENS (loop_vinfo).is_empty ()
> > > +  && LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) == 0
> > > +  && !(LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> > > +&& known_le (LOOP_VINFO_INT_NITERS (loop_vinfo),
> > > + LOOP_VINFO_VECT_FACTOR (loop_vinfo
> > > +LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) = true;
> > > 
> > > These conditions cannot disable the decrement IV on PowerPC.
> > > Should I add a target hook for it?
> >  
> > No.  I've put some analysis in the PR.  To me the question is
> > why (without that SELECT_VL case) we need a decrementing IV
> > _for the loop control_?  We could simply retain the original
> > incrementing IV for loop control and add the decrementing
> > IV for computing LEN in addition to that and leave IVOPTs
> > sorting out to eventually merge them (or not).
> >  
> > Alternatively

Re: [PATCH] VECT: Change flow of decrement IV

2023-05-30 Thread Richard Sandiford via Gcc-patches
juzhe.zh...@rivai.ai writes:
> From: Ju-Zhe Zhong 
>
> Following Richi's suggestion, I change the current decrement IV flow from:
>
> do {
>remain -= MIN (vf, remain);
> } while (remain != 0);
>
> into:
>
> do {
>old_remain = remain;
>len = MIN (vf, remain);
>remain -= vf;
> } while (old_remain >= vf);
>
> to enhance SCEV.
>
> All (decrement IV) tests of RVV pass.

How does it affect RVV code quality?  I thought you specifically chose
the previous approach because code quality was better that way.

Richard


Re: Re: [PATCH] VECT: Change flow of decrement IV

2023-05-30 Thread juzhe.zh...@rivai.ai
Before this patch:
foo:
ble a2,zero,.L5
csrr a3,vlenb
srli a4,a3,2
.L3:
minu a5,a2,a4
vsetvli zero,a5,e32,m1,ta,ma
vle32.v v2,0(a1)
vle32.v v1,0(a0)
vsetvli t1,zero,e32,m1,ta,ma
vadd.vv v1,v1,v2
vsetvli zero,a5,e32,m1,ta,ma
vse32.v v1,0(a0)
add a1,a1,a3
add a0,a0,a3
  sub   a2,a2,a5
bne a2,zero,.L3
.L5:
ret

After this patch:

foo:
ble a2,zero,.L5
csrr a3,vlenb
srli a4,a3,2
neg a7,a4   -->>>additional instruction
.L3:
minu a5,a2,a4
vsetvli zero,a5,e32,m1,ta,ma
vle32.v v2,0(a1)
vle32.v v1,0(a0)
vsetvli t1,zero,e32,m1,ta,ma
mv a6,a2  -->>>additional instruction
vadd.vv v1,v1,v2
vsetvli zero,a5,e32,m1,ta,ma
vse32.v v1,0(a0)
add a1,a1,a3
add a0,a0,a3
add a2,a2,a7
bgtu a6,a4,.L3
.L5:
ret

There is 1 more instruction in the preheader and 1 more instruction in the loop.
But I think it's OK for RVV since we will definitely be using SELECT_VL, so this
issue will be gone.
As long as this flow is better for Power (SCEV).



juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-05-30 19:31
To: juzhe.zhong
CC: gcc-patches; rguenther; linkw
Subject: Re: [PATCH] VECT: Change flow of decrement IV
juzhe.zh...@rivai.ai writes:
> From: Ju-Zhe Zhong 
>
> Following Richi's suggestion, I change the current decrement IV flow from:
>
> do {
>remain -= MIN (vf, remain);
> } while (remain != 0);
>
> into:
>
> do {
>old_remain = remain;
>len = MIN (vf, remain);
>remain -= vf;
> } while (old_remain >= vf);
>
> to enhance SCEV.
>
> All (decrement IV) tests of RVV pass.
 
How does it affect RVV code quality?  I thought you specifically chose
the previous approach because code quality was better that way.
 
Richard
 


Re: decrement IV patch creates failures on PowerPC

2023-05-30 Thread Richard Biener via Gcc-patches
On Tue, 30 May 2023, Richard Sandiford wrote:

> Richard Biener  writes:
> >> But how easy would it be to extend SCEV analysis, via a pattern match?
> >> The evolution of the IV phi wrt the inner loop is still a normal SCEV.
> >
> > No, the IV isn't a normal SCEV, the final value is different.
> 
> Which part of the IV though?

The relevant IV (for niter analysis) is the one used in the loop
exit test and that currently isn't a SCEV.  The IV used in the
*_len operations isn't either (and that's not going to change,
obviously).

>  Won't all executions of the latch edge
> decrement the IV phi (and specifically the phi) by VF (and only VF)?

But currently there's no decrement by invariant VF but only
by MIN (VF, remain), that's what I suggested to address to
make the loop exit condition analyzable (as said, in theory
we can try pattern matching the analysis of the exit test in
niter analysis).

> So if we analyse the IV phi wrt the inner loop, the IV phi is simply
> { initial, -, VF }.
> 
> I agree "IV phi - step" isn't a SCEV, but that doesn't seem fatal.

Right.  Fatal is the non-SCEV in the exit test which makes most
followup loop optimizations fail to consider the loop because the
number of iterations cannot be determined.

Richard.


Re: Re: [PATCH] VECT: Change flow of decrement IV

2023-05-30 Thread juzhe.zh...@rivai.ai
>> How does it affect RVV code quality?  I thought you specifically chose
>> the previous approach because code quality was better that way.
Yes, the previous way is better for RVV.  But as I said, we will definitely use
SELECT_VL; then, with SELECT_VL, we will be using remain - step (produced by
SELECT_VL).



juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-05-30 19:31
To: juzhe.zhong
CC: gcc-patches; rguenther; linkw
Subject: Re: [PATCH] VECT: Change flow of decrement IV
juzhe.zh...@rivai.ai writes:
> From: Ju-Zhe Zhong 
>
> Following Richi's suggestion, I change the current decrement IV flow from:
>
> do {
>remain -= MIN (vf, remain);
> } while (remain != 0);
>
> into:
>
> do {
>old_remain = remain;
>len = MIN (vf, remain);
>remain -= vf;
> } while (old_remain >= vf);
>
> to enhance SCEV.
>
> All (decrement IV) tests of RVV pass.
 
How does it affect RVV code quality?  I thought you specifically chose
the previous approach because code quality was better that way.
 
Richard
 


Re: [PATCH] VECT: Change flow of decrement IV

2023-05-30 Thread Richard Sandiford via Gcc-patches
"juzhe.zh...@rivai.ai"  writes:
> Before this patch:
> foo:
> ble a2,zero,.L5
> csrr a3,vlenb
> srli a4,a3,2
> .L3:
> minu a5,a2,a4
> vsetvli zero,a5,e32,m1,ta,ma
> vle32.v v2,0(a1)
> vle32.v v1,0(a0)
> vsetvli t1,zero,e32,m1,ta,ma
> vadd.vv v1,v1,v2
> vsetvli zero,a5,e32,m1,ta,ma
> vse32.v v1,0(a0)
> add a1,a1,a3
> add a0,a0,a3
>   sub   a2,a2,a5
> bne a2,zero,.L3
> .L5:
> ret
>
> After this patch:
>
> foo:
> ble a2,zero,.L5
> csrr a3,vlenb
> srli a4,a3,2
> neg a7,a4   -->>>additional instruction
> .L3:
> minu a5,a2,a4
> vsetvli zero,a5,e32,m1,ta,ma
> vle32.v v2,0(a1)
> vle32.v v1,0(a0)
> vsetvli t1,zero,e32,m1,ta,ma
> mv a6,a2  -->>>additional instruction
> vadd.vv v1,v1,v2
> vsetvli zero,a5,e32,m1,ta,ma
> vse32.v v1,0(a0)
> add a1,a1,a3
> add a0,a0,a3
> add a2,a2,a7
> bgtu a6,a4,.L3
> .L5:
> ret
>
> There is 1 more instruction in the preheader and 1 more instruction in the loop.
> But I think it's OK for RVV since we will definitely be using SELECT_VL, so
> this issue will be gone.

But what about cases where you won't be using SELECT_VL, such as SLP?

Richard


Re: [PATCH] MIPS: don't expand large block move

2023-05-30 Thread Maciej W. Rozycki
On Wed, 24 May 2023, YunQiang Su wrote:

> > or even:
> >
> >   if (INTVAL (length) <= MIPS_MAX_MOVE_BYTES_STRAIGHT)
> > ...
> >   else if (INTVAL (length) < 64 && optimize)
> > ...
> >
> 
> I don't think this is a good option, since somebody may add some code,
> and may break our logic.

 There's no need to plan ahead for changes, which may never happen.  As it
stands, reading through such flattened code here requires one step less to
analyse, as there's one nested level and one exit point less.  If more
cases have to be added in the future, then whoever needs them will make 
any necessary adjustments to the structure (assuming minimal understanding 
how code works, which I think is a reasonable requirement for working on a 
compiler).

  Maciej


Re: [PATCH] VECT: Change flow of decrement IV

2023-05-30 Thread Richard Sandiford via Gcc-patches
"juzhe.zhong"  writes:
> Maybe we can include the rgroup number in the select_vl pattern? So that I
> always use the select_vl pattern. In my backend, if it is a single rgroup, we
> gen vsetvl; otherwise we gen min.

That just seems to be a way of hiding an “is the target RVV?” test though.

IMO targets shouldn't implement SELECT_VL in cases where it's always
equivalent to MIN.
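
A rough scalar sketch of the contract being discussed (names made up):

/* SELECT_VL picks how many elements this iteration handles.  The
   trivial implementation is MIN (remain, vf); per the above, a target
   for which this is always the answer should not provide the optab.
   An RVV-style target may instead return any nonzero value
   <= MIN (remain, vf), e.g. whatever vsetvli grants.  */
unsigned int
select_vl (unsigned int remain, unsigned int vf)
{
  return remain < vf ? remain : vf;  /* the MIN fallback */
}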

Also, my understanding was that Kewen was happy with the new IV approach
in general.  It's just the (half-expected) effect on SCEV/niters analysis
that's the problem.

I think it's worth at least looking how difficult it would be to extend
niters analysis.  It could end up being beneficial for RVV too (in future).

Thanks,
Richard


Re: [PATCH] VECT: Change flow of decrement IV

2023-05-30 Thread Richard Biener via Gcc-patches
On Tue, 30 May 2023, juzhe.zhong wrote:

> This patch will generate "number of rgroups" mov instructions inside the
> loop. This is unacceptable. For example, if the number of rgroups is 3, there
> will be 3 more instructions in the loop. If this patch is necessary, I think
> I should find a way to fix it.

That's odd, you only need to adjust the IV which is used in the exit test,
not all the others.

>  Replied Message 
> From: Richard Sandiford
> Date: 05/30/2023 19:41
> To: juzhe.zh...@rivai.ai
> Cc: gcc-patches, rguenther, linkw
> Subject: Re: [PATCH] VECT: Change flow of decrement IV
> "juzhe.zh...@rivai.ai"  writes:
> > Before this patch:
> > foo:
> > ble a2,zero,.L5
> > csrr a3,vlenb
> > srli a4,a3,2
> > .L3:
> > minu a5,a2,a4
> > vsetvli zero,a5,e32,m1,ta,ma
> > vle32.v v2,0(a1)
> > vle32.v v1,0(a0)
> > vsetvli t1,zero,e32,m1,ta,ma
> > vadd.vv v1,v1,v2
> > vsetvli zero,a5,e32,m1,ta,ma
> > vse32.v v1,0(a0)
> > add a1,a1,a3
> > add a0,a0,a3
> >   sub   a2,a2,a5
> > bne a2,zero,.L3
> > .L5:
> > ret
> >
> > After this patch:
> >
> > foo:
> > ble a2,zero,.L5
> > csrr a3,vlenb
> > srli a4,a3,2
> > neg a7,a4   -->>>additional instruction
> > .L3:
> > minu a5,a2,a4
> > vsetvli zero,a5,e32,m1,ta,ma
> > vle32.v v2,0(a1)
> > vle32.v v1,0(a0)
> > vsetvli t1,zero,e32,m1,ta,ma
> > mv a6,a2  -->>>additional instruction
> > vadd.vv v1,v1,v2
> > vsetvli zero,a5,e32,m1,ta,ma
> > vse32.v v1,0(a0)
> > add a1,a1,a3
> > add a0,a0,a3
> > add a2,a2,a7
> > bgtu a6,a4,.L3
> > .L5:
> > ret
> >
> > There is 1 more instruction in the preheader and 1 more instruction in the loop.
> > But I think it's OK for RVV since we will definitely be using SELECT_VL, so
> > this issue will be gone.
> 
> But what about cases where you won't be using SELECT_VL, such as SLP?
> 
> Richard
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


Re: Re: [PATCH] VECT: Change flow of decrement IV

2023-05-30 Thread 钟居哲

>> That's odd, you only need to adjust the IV which is used in the exit test,
>> not all the others.
Sorry for my incorrect information. I checked the codegen of both single-rgroup
and multi-rgroup.
Their codegen shows the same behavior: after this patch, there will be 1 more
neg instruction in the preheader and 1 more mv instruction inside the loop.



juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-05-30 20:33
To: juzhe.zhong
CC: Richard Sandiford; gcc-patches; linkw
Subject: Re: [PATCH] VECT: Change flow of decrement IV
On Tue, 30 May 2023, juzhe.zhong wrote:
 
> This patch will generate "number of rgroups" mov instructions inside the
> loop. This is unacceptable. For example, if the number of rgroups is 3, there
> will be 3 more instructions in the loop. If this patch is necessary, I think
> I should find a way to fix it.
 
That's odd, you only need to adjust the IV which is used in the exit test,
not all the others.
 
>  Replied Message 
> From: Richard Sandiford
> Date: 05/30/2023 19:41
> To: juzhe.zh...@rivai.ai
> Cc: gcc-patches, rguenther, linkw
> Subject: Re: [PATCH] VECT: Change flow of decrement IV
> "juzhe.zh...@rivai.ai"  writes:
> > Before this patch:
> > foo:
> > ble a2,zero,.L5
> > csrr a3,vlenb
> > srli a4,a3,2
> > .L3:
> > minu a5,a2,a4
> > vsetvli zero,a5,e32,m1,ta,ma
> > vle32.v v2,0(a1)
> > vle32.v v1,0(a0)
> > vsetvli t1,zero,e32,m1,ta,ma
> > vadd.vv v1,v1,v2
> > vsetvli zero,a5,e32,m1,ta,ma
> > vse32.v v1,0(a0)
> > add a1,a1,a3
> > add a0,a0,a3
> >   sub   a2,a2,a5
> > bne a2,zero,.L3
> > .L5:
> > ret
> >
> > After this patch:
> >
> > foo:
> > ble a2,zero,.L5
> > csrr a3,vlenb
> > srli a4,a3,2
> > neg a7,a4   -->>>additional instruction
> > .L3:
> > minu a5,a2,a4
> > vsetvli zero,a5,e32,m1,ta,ma
> > vle32.v v2,0(a1)
> > vle32.v v1,0(a0)
> > vsetvli t1,zero,e32,m1,ta,ma
> > mv a6,a2  -->>>additional instruction
> > vadd.vv v1,v1,v2
> > vsetvli zero,a5,e32,m1,ta,ma
> > vse32.v v1,0(a0)
> > add a1,a1,a3
> > add a0,a0,a3
> > add a2,a2,a7
> > bgtu a6,a4,.L3
> > .L5:
> > ret
> >
> > There is 1 more instruction in the preheader and 1 more instruction in the loop.
> > But I think it's OK for RVV since we will definitely be using SELECT_VL, so
> > this issue will be gone.
> 
> But what about cases where you won't be using SELECT_VL, such as SLP?
> 
> Richard
> 
> 
 
-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


Ping #1: [patch,avr] Fix PR109650 wrong code

2023-05-30 Thread Georg-Johann Lay

Ping #1 for:

https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618976.html

https://gcc.gnu.org/pipermail/gcc-patches/attachments/20230519/9536bf8c/attachment-0001.bin

Johann

Am 19.05.23 um 10:49 schrieb Georg-Johann Lay:


Here is a revised version of the patch.  The difference to the
previous one is that it adds some combine patterns for *cbranch
insns that were lost in the PR92729 transition.  The post-reload
part of the patterns were still there.  The new patterns are
slightly more general in that they also handle fixed-point modes.

Apart from that, the patch behaves the same:

Am 15.05.23 um 20:05 schrieb Georg-Johann Lay:

This patch fixes a wrong-code bug in the wake of PR92729, the transition
that turned the AVR backend from cc0 to CCmode.  In cc0, the insn that
uses cc0 like a conditional branch always follows the cc0 setter, which
is no more the case with CCmode where set and use of REG_CC might be in
different basic blocks.

This patch removes the machine-dependent reorg pass in avr_reorg 
entirely.


It is replaced by a new, AVR specific mini-pass that runs prior to
split2. Canonicalization of comparisons away from the "difficult"
codes GT[U] and LE[U] is now mostly performed by implementing
TARGET_CANONICALIZE_COMPARISON.

Moreover:

* Text peephole conditions get "dead_or_set_regno_p (*, REG_CC)" as
needed.

* RTL peephole conditions get "peep2_regno_dead_p (*, REG_CC)" as
needed.

* Conditional branches no more clobber REG_CC.

* insn output for compares looks ahead to determine the branch mode in
use. This needs also "dead_or_set_regno_p (*, REG_CC)".

* Add RTL peepholes for decrement-and-branch detection.

Finally, it fixes some of the many indentation glitches left over from
PR92729.

Ok?

I'd also backport this one because all of v12+ is affected by the 
wrong code.


Johann

--

gcc/
 PR target/109650
 PR target/92729

 * config/avr/avr-passes.def (avr_pass_ifelse): Insert new pass.
 * config/avr/avr.cc (avr_pass_ifelse): New RTL pass.
 (avr_pass_data_ifelse): New pass_data for it.
 (make_avr_pass_ifelse, avr_redundant_compare, avr_cbranch_cost)
 (avr_canonicalize_comparison, avr_out_plus_set_ZN)
 (avr_out_cmp_ext): New functions.
 (compare_condtition): Make sure REG_CC dies in the branch insn.
 (avr_rtx_costs_1): Add computation of cbranch costs.
 (avr_adjust_insn_length) [ADJUST_LEN_ADD_SET_ZN, ADJUST_LEN_CMP_ZEXT]:
 [ADJUST_LEN_CMP_SEXT]: Handle them.
 (TARGET_CANONICALIZE_COMPARISON): New define.
 (avr_simplify_comparison_p, compare_diff_p, avr_compare_pattern)
 (avr_reorg_remove_redundant_compare, avr_reorg): Remove functions.
 (TARGET_MACHINE_DEPENDENT_REORG): Remove define.

 * avr-protos.h (avr_simplify_comparison_p): Remove proto.
 (make_avr_pass_ifelse, avr_out_plus_set_ZN, cc_reg_rtx)
 (avr_out_cmp_zext): New Protos

 * config/avr/avr.md (branch, difficult_branch): Don't split insns.
 (*cbranchhi.zero-extend.0", *cbranchhi.zero-extend.1")
 (*swapped_tst, *add.for.eqne.): New insns.
 (*cbranch4): Rename to cbranch4_insn.
 (define_peephole): Add dead_or_set_regno_p(insn,REG_CC) as needed.
 (define_peephole2): Add peep2_regno_dead_p(*,REG_CC) as needed.
 Add new RTL peepholes for decrement-and-branch and *swapped_tst.
 Rework signtest-and-branch peepholes for *sbrx_branch.
 (adjust_len) [add_set_ZN, cmp_zext]: New.
 (QIPSI): New mode iterator.
 (ALLs1, ALLs2, ALLs4, ALLs234): New mode iterators.
 (gelt): New code iterator.
 (gelt_eqne): New code attribute.
 (rvbranch, *rvbranch, difficult_rvbranch, *difficult_rvbranch)
 (branch_unspec, *negated_tst, *reversed_tst)
 (*cmpqi_sign_extend): Remove insns.
 (define_c_enum "unspec") [UNSPEC_IDENTITY]: Remove.

 * config/avr/avr-dimode.md (cbranch4): Canonicalize comparisons.
 * config/avr/predicates.md (scratch_or_d_register_operand): New.
 * config/avr/contraints.md (Yxx): New constraint.

gcc/testsuite/
 PR target/109650
 * config/avr/torture/pr109650-1.c: New test.
 * config/avr/torture/pr109650-2.c: New test.


[PATCH] [arm][testsuite]: Fix ACLE data-intrinsics testcases

2023-05-30 Thread Christophe Lyon via Gcc-patches
data-intrinsics-assembly.c forces -march=armv6 using dg-add-options
arm_arch_v6, which implicitly adds -mfloat-abi=softfp.

However, for a toolchain configured for arm-linux-gnueabihf and
--with-arch=armv7-a, the testcase will fail when including arm_acle.h
(which includes stdint.h, which will fail to include the non-existing
gnu/stubs-soft.h).

Other effective-targets related to arm_acle.h would also pass because
they first try without -mfloat-abi=softfp, so it seems the
simplest/safest is to add { dg-require-effective-target arm_softfp_ok }
to make sure arm_arch_v6_ok's assumption is valid.

The patch also fixes what seems to be an oversight in
data-intrinsics-armv6.c: it requires arm_arch_v6_ok, but uses
arm_arch_v6t2: the patch makes it require arm_arch_v6t2_ok.

2023-05-30  Christophe Lyon  

gcc/testsuite/
* gcc.target/arm/acle/data-intrinsics-armv6.c: Fix typo.
* gcc.target/arm/acle/data-intrinsics-assembly.c: Require
arm_softfp_ok.
---
 gcc/testsuite/gcc.target/arm/acle/data-intrinsics-armv6.c| 2 +-
 gcc/testsuite/gcc.target/arm/acle/data-intrinsics-assembly.c | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/arm/acle/data-intrinsics-armv6.c 
b/gcc/testsuite/gcc.target/arm/acle/data-intrinsics-armv6.c
index aafdff35cee..988ecac3787 100644
--- a/gcc/testsuite/gcc.target/arm/acle/data-intrinsics-armv6.c
+++ b/gcc/testsuite/gcc.target/arm/acle/data-intrinsics-armv6.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-require-effective-target arm_arch_v6_ok } */
+/* { dg-require-effective-target arm_arch_v6t2_ok } */
 /* { dg-add-options arm_arch_v6t2 } */
 
 #include "arm_acle.h"
diff --git a/gcc/testsuite/gcc.target/arm/acle/data-intrinsics-assembly.c 
b/gcc/testsuite/gcc.target/arm/acle/data-intrinsics-assembly.c
index 3e066877a70..478cbde1600 100644
--- a/gcc/testsuite/gcc.target/arm/acle/data-intrinsics-assembly.c
+++ b/gcc/testsuite/gcc.target/arm/acle/data-intrinsics-assembly.c
@@ -1,5 +1,6 @@
 /* Test the ACLE data intrinsics get expanded to the correct instructions on a 
specific architecture  */
 /* { dg-do assemble } */
+/* { dg-require-effective-target arm_softfp_ok } */
 /* { dg-require-effective-target arm_arch_v6_ok } */
 /* { dg-additional-options "--save-temps -O1" } */
 /* { dg-add-options arm_arch_v6 } */
-- 
2.34.1



RE: [PATCH] [arm][testsuite]: Fix ACLE data-intrinsics testcases

2023-05-30 Thread Kyrylo Tkachov via Gcc-patches


> -Original Message-
> From: Christophe Lyon 
> Sent: Tuesday, May 30, 2023 3:00 PM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> Chris Sidebottom 
> Cc: Christophe Lyon 
> Subject: [PATCH] [arm][testsuite]: Fix ACLE data-intrinsics testcases
> 
> data-intrinsics-assembly.c forces -march=armv6 using dg-add-options
> arm_arch_v6, which implicitly adds -mfloat-abi=softfp.
> 
> However, for a toolchain configured for arm-linux-gnueabihf and
> --with-arch=armv7-a, the testcase will fail when including arm_acle.h
> (which includes stdint.h, which will fail to include the non-existing
> gnu/stubs-soft.h).
> 
> Other effective-targets related to arm_acle.h would also pass because
> they first try without -mfloat-abi=softfp, so it seems the
> simplest/safest is to add { dg-require-effective-target arm_softfp_ok }
> to make sure arm_arch_v6_ok's assumption is valid.
> 
> The patch also fixes what seems to be an oversight in
> data-intrinsics-armv6.c: it requires arm_arch_v6_ok, but uses
> arm_arch_v6t2: the patch makes it require arm_arch_v6t2_ok.

Ok, thanks for sorting this out. The arm effective target checks always catch 
me off guard if I don't deal with them for a few months ☹
Kyrill

> 
> 2023-05-30  Christophe Lyon  
> 
>   gcc/testsuite/
>   * gcc.target/arm/acle/data-intrinsics-armv6.c: Fix typo.
>   * gcc.target/arm/acle/data-intrinsics-assembly.c: Require
>   arm_softfp_ok.
> ---
>  gcc/testsuite/gcc.target/arm/acle/data-intrinsics-armv6.c| 2 +-
>  gcc/testsuite/gcc.target/arm/acle/data-intrinsics-assembly.c | 1 +
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/acle/data-intrinsics-armv6.c b/gcc/testsuite/gcc.target/arm/acle/data-intrinsics-armv6.c
> index aafdff35cee..988ecac3787 100644
> --- a/gcc/testsuite/gcc.target/arm/acle/data-intrinsics-armv6.c
> +++ b/gcc/testsuite/gcc.target/arm/acle/data-intrinsics-armv6.c
> @@ -1,5 +1,5 @@
>  /* { dg-do run } */
> -/* { dg-require-effective-target arm_arch_v6_ok } */
> +/* { dg-require-effective-target arm_arch_v6t2_ok } */
>  /* { dg-add-options arm_arch_v6t2 } */
> 
>  #include "arm_acle.h"
> diff --git a/gcc/testsuite/gcc.target/arm/acle/data-intrinsics-assembly.c b/gcc/testsuite/gcc.target/arm/acle/data-intrinsics-assembly.c
> index 3e066877a70..478cbde1600 100644
> --- a/gcc/testsuite/gcc.target/arm/acle/data-intrinsics-assembly.c
> +++ b/gcc/testsuite/gcc.target/arm/acle/data-intrinsics-assembly.c
> @@ -1,5 +1,6 @@
>  /* Test the ACLE data intrinsics get expanded to the correct instructions on a
> specific architecture  */
>  /* { dg-do assemble } */
> +/* { dg-require-effective-target arm_softfp_ok } */
>  /* { dg-require-effective-target arm_arch_v6_ok } */
>  /* { dg-additional-options "--save-temps -O1" } */
>  /* { dg-add-options arm_arch_v6 } */
> --
> 2.34.1



Re: Re: [PATCH] VECT: Change flow of decrement IV

2023-05-30 Thread 钟居哲
Hi all. After several investigations, here are my experiments:
void
single_rgroup (int32_t *__restrict a, int32_t *__restrict b, int n)
{
  for (int i = 0; i < n; i++)
a[i] = b[i] + a[i];
}

void
multiple_rgroup (float *__restrict f, double *__restrict d, int n)
{
  for (int i = 0; i < n; ++i)
{
  f[i * 2 + 0] = 1;
  f[i * 2 + 1] = 2;
  d[i] = 3;
}
} 


single_rgroup:
ble a2,zero,.L5
li a4,4
.L3:
minu a5,a2,a4
vsetvli zero,a5,e32,m1,ta,ma
vle32.v v1,0(a0)
vle32.v v2,0(a1)
vsetivli zero,4,e32,m1,ta,ma
mv a3,a2                           -> 1 more "mv" instruction
vadd.vv v1,v1,v2
vsetvli zero,a5,e32,m1,ta,ma
vse32.v v1,0(a0)
addi a1,a1,16
addi a0,a0,16
addi a2,a2,-4
bgtu a3,a4,.L3
.L5:
ret
.size single_rgroup, .-single_rgroup
.align 1
.globl foo5
.type foo5, @function
multiple_rgroup:
ble a2,zero,.L11
lui a5,%hi(.LANCHOR0)
addi a5,a5,%lo(.LANCHOR0)
vl1re32.v v2,0(a5)
lui a5,%hi(.LANCHOR0+16)
addi a5,a5,%lo(.LANCHOR0+16)
slli a2,a2,1
li a3,8
li a7,4
vl1re64.v v1,0(a5)
.L9:
minu a5,a2,a3
minu a4,a5,a7
sub a5,a5,a4
addi a6,a0,16
vsetvli zero,a4,e32,m1,ta,ma
vse32.v v2,0(a0)
srli a4,a4,1
vsetvli zero,a5,e32,m1,ta,ma
vse32.v v2,0(a6)
srli a5,a5,1
vsetvli zero,a4,e64,m1,ta,ma
addi a6,a1,16
vse64.v v1,0(a1)
mv a4,a2                           -> 1 more "mv" instruction
vsetvli zero,a5,e64,m1,ta,ma
vse64.v v1,0(a6)
addi a0,a0,32
addi a1,a1,32
addi a2,a2,-8
bgtu a4,a3,.L9
.L11:
ret

These are the examples; I have tried a good number of cases. This is the worst
case after this patch for RVV:
no matter whether single-rgroup or multiple-rgroup, we end up with 1 more "mv"
instruction inside the loop.
There are also some examples I tried that get no extra instructions (it seems
IVOPTS has done some optimization in some cases).

From my side (RVV), I think one more "mv" instruction is not a big deal if
this patch (apply a vf step and check the exit condition by remain > vf)
can help IBM.

For single-rgroup, this 'mv' instruction will be gone once we use SELECT_VL.
For multiple-rgroup, the 'mv' instruction remains, but as I said, it is not a
big deal.
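
As a scalar C sketch of the new flow (illustrative only, not the real
vectorizer output; the VF here is just a variable):

  #include <stdint.h>

  /* Assumes n > 0, as the vector loop is guarded by ble a2,zero.  */
  void
  sketch (int32_t *a, int32_t *b, int n)
  {
    const int vf = 4;
    int remain = n, old;
    do
      {
        int len = remain < vf ? remain : vf;  /* MIN (remain, VF)  */
        for (int i = 0; i < len; i++)         /* stands in for the */
          a[i] = b[i] + a[i];                 /* vectorized body   */
        a += len, b += len;
        old = remain;                         /* the extra "mv"    */
        remain -= vf;                         /* step by VF        */
      }
    while (old > vf);      /* compare the pre-decrement value */
  }

The copy is what keeps the pre-decrement value live across the
subtraction for the exit test (the bgtu a3,a4 above).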

If this patch's approach is approved, I will rebase and send the SELECT_VL
patch again, based on this patch.

Looking forward to your suggestions.

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-05-30 20:33
To: juzhe.zhong
CC: Richard Sandiford; gcc-patches; linkw
Subject: Re: [PATCH] VECT: Change flow of decrement IV
On Tue, 30 May 2023, juzhe.zhong wrote:
 
> This patch will generate one "mov" instruction per rgroup inside the
> loop. This is unacceptable. For example, if the number of rgroups is 3,
> there will be 3 more instructions in the loop. If this patch is
> necessary, I think I should find a way to fix it.
 
That's odd, you only need to adjust the IV which is used in the exit test,
not all the others.
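
A sketch of that with made-up names (not GCC code): every rgroup length
can be derived from the one counter that feeds the exit test, so the
pre-decrement copy is needed once per iteration rather than once per
rgroup:

  #define MIN(a, b) ((a) < (b) ? (a) : (b))

  /* Assumes n > 0; the counter counts smallest-element units.  */
  void
  sketch_two_rgroups (int n, int vf)
  {
    int remain = 2 * n, old;
    do
      {
        int len_f = MIN (remain, vf);      /* float rgroup length  */
        int len_d = MIN (remain, vf) / 2;  /* double rgroup length */
        (void) len_f, (void) len_d;        /* placeholder for the stores */
        old = remain;                      /* one copy, not one per rgroup */
        remain -= vf;
      }
    while (old > vf);
  }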
 
>  Replied Message 
> From: Richard Sandiford
> Date: 05/30/2023 19:41
> To: juzhe.zh...@rivai.ai
> Cc: gcc-patches, rguenther, linkw
> Subject: Re: [PATCH] VECT: Change flow of decrement IV
> "juzhe.zh...@rivai.ai"  writes:
> > Before this patch:
> > foo:
> > ble a2,zero,.L5
> > csrr a3,vlenb
> > srli a4,a3,2
> > .L3:
> > minu a5,a2,a4
> > vsetvli zero,a5,e32,m1,ta,ma
> > vle32.v v2,0(a1)
> > vle32.v v1,0(a0)
> > vsetvli t1,zero,e32,m1,ta,ma
> > vadd.vv v1,v1,v2
> > vsetvli zero,a5,e32,m1,ta,ma
> > vse32.v v1,0(a0)
> > add a1,a1,a3
> > add a0,a0,a3
> >   sub   a2,a2,a5
> > bne a2,zero,.L3
> > .L5:
> > ret
> >
> > After this patch:
> >
> > foo:
> > ble a2,zero,.L5
> > csrr a3,vlenb
> > srli a4,a3,2
> > neg a7,a4   -->>>additional instruction
> > .L3:
> > minu a5,a2,a4
> > vsetvli zero,a5,e32,m1,ta,ma
> > vle32.v v2,0(a1)
> > vle32.v v1,0(a0)
> > vsetvli t1,zero,e32,m1,ta,ma
> > mv a6,a2  -->>>additional instruction
> > vadd.vv v1,v1,v2
> > vsetvli zero,a5,e32,m1,ta,ma
> > vse32.v v1,0(a0)
> > add a1,a1,a3
> > add a0,a0,a3
> > add a2,a2,a7
> > bgtu a6,a4,.L3
> > .L5:
> > ret
> >
> > There is 1 more instruction in the preheader and 1 more instruction in the loop.
> > But I think it's OK for RVV since we will definitely be using SELECT_VL, so
> > this issue will be gone.
> 
> But what about cases where you won't be using SELECT_VL, such as SLP?
> 
> Richard
> 
> 
 
-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


Re: [PATCH] Fix type error of 'switch (SUBREG_BYTE (op)).'

2023-05-30 Thread Jeff Law via Gcc-patches




On 5/23/23 06:27, Richard Sandiford wrote:

Jeff Law via Gcc-patches  writes:

On 5/17/23 03:03, Jin Ma wrote:

For example:
(define_insn "mov_lowpart_sidi2"
[(set (match_operand:SI0 "register_operand" "=r")
  (subreg:SI (match_operand:DI 1 "register_operand" " r") 0))]
"TARGET_64BIT"
"mov\t%0,%1")

(define_insn "mov_highpart_sidi2"
[(set (match_operand:SI0 "register_operand" "=r")
  (subreg:SI (match_operand:DI 1 "register_operand" " r") 1))]
"TARGET_64BIT"
"movh\t%0,%1")

When defining the above patterns, the generated file insn-recog.cc will
contain 'switch (SUBREG_BYTE (op))', but since the return value of
SUBREG_BYTE is poly_uint16_pod, the following error occurs:
"error: switch quantity not an integer".

gcc/ChangeLog:

* genrecog.cc (print_nonbool_test): Fix type error of
'switch (SUBREG_BYTE (op))'.

Thanks.  Installed.


We shouldn't add to_constant just because it's a convenient
way of getting rid of errors :)  There has to be a good reason
in principle why the value is known at compile time.
Agreed.  I fully expected the constant to be known at compile time.  I 
wasn't aware we had real uses of polys in the SUBREG_BYTE field.




So I think this should be reverted.  Nothing guarantees that
SUBREG_BYTEs are constant on AArch64 and RISC-V.  And for SVE
it's common for them not to be.
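
For illustration, a poly-int-safe shape for this kind of check (a
hand-written sketch assuming GCC's rtl.h/poly-int.h context; the
classification values are made up):

  static int
  classify_sidi_subreg (rtx op)
  {
    poly_uint64 off = SUBREG_BYTE (op);
    if (known_eq (off, 0U))
      return 0;    /* lowpart  */
    if (known_eq (off, subreg_highpart_offset (SImode, DImode)))
      return 1;    /* highpart */
    return -1;     /* not provably either at compile time */
  }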

That's fine with me.

Jeff

