Re: [r14-9173 Regression] FAIL: gcc.dg/tree-ssa/andnot-2.c scan-tree-dump-not forwprop3 "_expr" on Linux/x86_64

2024-02-26 Thread Hongtao Liu
On Tue, Feb 27, 2024 at 3:44 PM Richard Biener  wrote:
>
> On Tue, 27 Feb 2024, haochen.jiang wrote:
>
> > On Linux/x86_64,
> >
> > af66ad89e8169f44db723813662917cf4cbb78fc is the first bad commit
> > commit af66ad89e8169f44db723813662917cf4cbb78fc
> > Author: Richard Biener 
> > Date:   Fri Feb 23 16:06:05 2024 +0100
> >
> > middle-end/114070 - folding breaking VEC_COND expansion
> >
> > caused
> >
> > FAIL: gcc.dg/tree-ssa/andnot-2.c scan-tree-dump-not forwprop3 "_expr"
>
> This shows that the x86 backend is missing vcond_mask_qiqi and friends
Interesting, so both operand and mask are vector boolean.
> (for AVX512 mask modes).  Either that or both expand_vec_cond_expr_p
> and all the machinery behind it (ISEL pass, lowering) should handle
> pure integer mode VEC_COND_EXPR via bit operations.  I think quite some
> targets now implement patterns for these variants, whatever their
> boolean vector modes are.
>
> One complication with the change, which was
>
>   (simplify
>(op @3 (vec_cond:s @0 @1 @2))
> -  (vec_cond @0 (op! @3 @1) (op! @3 @2
> +  (if (TREE_CODE_CLASS (op) != tcc_comparison
> +   || types_match (type, TREE_TYPE (@1))
> +   || expand_vec_cond_expr_p (type, TREE_TYPE (@0), ERROR_MARK))
> +   (vec_cond @0 (op! @3 @1) (op! @3 @2)
>
> is that expand_vec_cond_expr_p can also handle comparison defined
> masks, but whether or not we have this isn't visible here so we
> can only check whether vcond_mask expansion would work.
>
> We have optimize_vectors_before_lowering_p but we shouldn't even there
> turn supported into not supported ops and as said, what's supported or
> not cannot be finally decided (if it's only vcond and not vcond_mask
> that is supported).  Also optimize_vectors_before_lowering_p is set
> for a short time between vectorization and vector lowering and we
> definitely do not want to turn supported vectorizer emitted stmts
> into ones that we need to lower.  For GCC 15 we should see to move
> vector lowering before vectorization (before loop optimization I'd
> say) to close this particula hole (and also reliably ICE when the
> vectorizer creates unsupported IL).  We also definitely want to
> retire vcond expanders (no target I know of supports single-instruction
> compare-and-select).
>
> So short term we either live with this regression (the testcase
> verifies we perform constant folding to { 0, 0 }), implement
> the four missing patterns (qi, hi, si and di missing value mode
> vcond_mask patterns) or see to implement generic code for this.
>
> Given precedent I'd tend towards adding the x86 patterns.
>
> Hongtao, can you handle that?
Sure, I'll take a look.
>
> Thanks,
> Richard.



-- 
BR,
Hongtao


Re: [r14-9173 Regression] FAIL: gcc.dg/tree-ssa/andnot-2.c scan-tree-dump-not forwprop3 "_expr" on Linux/x86_64

2024-02-26 Thread Richard Biener
On Tue, 27 Feb 2024, haochen.jiang wrote:

> On Linux/x86_64,
> 
> af66ad89e8169f44db723813662917cf4cbb78fc is the first bad commit
> commit af66ad89e8169f44db723813662917cf4cbb78fc
> Author: Richard Biener 
> Date:   Fri Feb 23 16:06:05 2024 +0100
> 
> middle-end/114070 - folding breaking VEC_COND expansion
> 
> caused
> 
> FAIL: gcc.dg/tree-ssa/andnot-2.c scan-tree-dump-not forwprop3 "_expr"

This shows that the x86 backend is missing vcond_mask_qiqi and friends
(for AVX512 mask modes).  Either that or both expand_vec_cond_expr_p
and all the machinery behind it (ISEL pass, lowering) should handle
pure integer mode VEC_COND_EXPR via bit operations.  I think quite some
targets now implement patterns for these variants, whatever their
boolean vector modes are.

One complication with the change, which was

  (simplify
   (op @3 (vec_cond:s @0 @1 @2))
-  (vec_cond @0 (op! @3 @1) (op! @3 @2
+  (if (TREE_CODE_CLASS (op) != tcc_comparison
+   || types_match (type, TREE_TYPE (@1))
+   || expand_vec_cond_expr_p (type, TREE_TYPE (@0), ERROR_MARK))
+   (vec_cond @0 (op! @3 @1) (op! @3 @2)

is that expand_vec_cond_expr_p can also handle comparison defined
masks, but whether or not we have this isn't visible here so we
can only check whether vcond_mask expansion would work.

We have optimize_vectors_before_lowering_p but we shouldn't even there
turn supported into not supported ops and as said, what's supported or
not cannot be finally decided (if it's only vcond and not vcond_mask
that is supported).  Also optimize_vectors_before_lowering_p is set
for a short time between vectorization and vector lowering and we
definitely do not want to turn supported vectorizer emitted stmts
into ones that we need to lower.  For GCC 15 we should see to move
vector lowering before vectorization (before loop optimization I'd
say) to close this particula hole (and also reliably ICE when the
vectorizer creates unsupported IL).  We also definitely want to
retire vcond expanders (no target I know of supports single-instruction
compare-and-select).

So short term we either live with this regression (the testcase
verifies we perform constant folding to { 0, 0 }), implement
the four missing patterns (qi, hi, si and di missing value mode
vcond_mask patterns) or see to implement generic code for this.

Given precedent I'd tend towards adding the x86 patterns.

Hongtao, can you handle that?

Thanks,
Richard.


ReRe:[PATCH V3 0/2] aarch64: Place target independent and dependent changed and unchanged code in one file.

2024-02-26 Thread Ajit Agarwal
Hello Richard/Alex:

This patch has better diff with changed and unchanged code.
Unchanged code and some of the changed code  will be extracted 
into target independent headers and sources wherein target
deoendent code changed and unchanged code would be in target
dependent file like aarch64-ldp-fusion

Please review.

Thanks & Regards
Ajit

On 23/02/24 4:41 pm, Ajit Agarwal wrote:
> Hello Richard/Alex/Segher:
> 
> This patch adds the changed code for target independent and
> dependent code for load store fusion.
> 
> Common infrastructure of load store pair fusion is
> divided into target independent and target dependent
> changed code.
> 
> Target independent code is the Generic code with
> pure virtual function to interface betwwen target
> independent and dependent code.
> 
> Target dependent code is the implementation of pure
> virtual function for aarch64 target and the call
> to target independent code.
> 
> Bootstrapped for aarch64-linux-gnu.
> 
> Thanks & Regards
> Ajit
> 
> aarch64: Place target independent and dependent changed code in one file.
> 
> Common infrastructure of load store pair fusion is
> divided into target independent and target dependent
> changed code.
> 
> Target independent code is the Generic code with
> pure virtual function to interface betwwen target
> independent and dependent code.
> 
> Target dependent code is the implementation of pure
> virtual function for aarch64 target and the call
> to target independent code.
> 
> 2024-02-23  Ajit Kumar Agarwal  
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64-ldp-fusion.cc: Place target
>   independent and dependent changed code.
> ---
>  gcc/config/aarch64/aarch64-ldp-fusion.cc | 437 ---
>  1 file changed, 305 insertions(+), 132 deletions(-)
> 
> diff --git a/gcc/config/aarch64/aarch64-ldp-fusion.cc 
> b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> index 22ed95eb743..2ef22ff1e96 100644
> --- a/gcc/config/aarch64/aarch64-ldp-fusion.cc
> +++ b/gcc/config/aarch64/aarch64-ldp-fusion.cc
> @@ -40,10 +40,10 @@
>  
>  using namespace rtl_ssa;
>  
> -static constexpr HOST_WIDE_INT LDP_IMM_BITS = 7;
> -static constexpr HOST_WIDE_INT LDP_IMM_SIGN_BIT = (1 << (LDP_IMM_BITS - 1));
> -static constexpr HOST_WIDE_INT LDP_MAX_IMM = LDP_IMM_SIGN_BIT - 1;
> -static constexpr HOST_WIDE_INT LDP_MIN_IMM = -LDP_MAX_IMM - 1;
> +static constexpr HOST_WIDE_INT PAIR_MEM_IMM_BITS = 7;
> +static constexpr HOST_WIDE_INT PAIR_MEM_IMM_SIGN_BIT = (1 << 
> (PAIR_MEM_IMM_BITS - 1));
> +static constexpr HOST_WIDE_INT PAIR_MEM_MAX_IMM = PAIR_MEM_IMM_SIGN_BIT - 1;
> +static constexpr HOST_WIDE_INT PAIR_MEM_MIN_IMM = -PAIR_MEM_MAX_IMM - 1;
>  
>  // We pack these fields (load_p, fpsimd_p, and size) into an integer
>  // (LFS) which we use as part of the key into the main hash tables.
> @@ -138,8 +138,18 @@ struct alt_base
>poly_int64 offset;
>  };
>  
> +// Virtual base class for load/store walkers used in alias analysis.
> +struct alias_walker
> +{
> +  virtual bool conflict_p (int ) const = 0;
> +  virtual insn_info *insn () const = 0;
> +  virtual bool valid () const  = 0;
> +  virtual void advance () = 0;
> +};
> +
> +
>  // State used by the pass for a given basic block.
> -struct ldp_bb_info
> +struct pair_fusion
>  {
>using def_hash = nofree_ptr_hash;
>using expr_key_t = pair_hash>;
> @@ -161,13 +171,13 @@ struct ldp_bb_info
>static const size_t obstack_alignment = sizeof (void *);
>bb_info *m_bb;
>  
> -  ldp_bb_info (bb_info *bb) : m_bb (bb), m_emitted_tombstone (false)
> +  pair_fusion (bb_info *bb) : m_bb (bb), m_emitted_tombstone (false)
>{
>  obstack_specify_allocation (_obstack, OBSTACK_CHUNK_SIZE,
>   obstack_alignment, obstack_chunk_alloc,
>   obstack_chunk_free);
>}
> -  ~ldp_bb_info ()
> +  ~pair_fusion ()
>{
>  obstack_free (_obstack, nullptr);
>  
> @@ -177,10 +187,50 @@ struct ldp_bb_info
>   bitmap_obstack_release (_bitmap_obstack);
>}
>}
> +  void track_access (insn_info *, bool load, rtx mem);
> +  void transform ();
> +  void cleanup_tombstones ();
> +  virtual void set_multiword_subreg (insn_info *i1, insn_info *i2,
> +  bool load_p) = 0;
> +  virtual rtx gen_load_store_pair (rtx *pats,  rtx writeback,
> +bool load_p) = 0;
> +  void merge_pairs (insn_list_t &, insn_list_t &,
> + bool load_p, unsigned access_size);
> +  virtual void transform_for_base (int load_size, access_group ) = 0;
> +
> +  bool try_fuse_pair (bool load_p, unsigned access_size,
> +  insn_info *i1, insn_info *i2);
> +
> +  bool fuse_pair (bool load_p, unsigned access_size,
> +   int writeback,
> +   insn_info *i1, insn_info *i2,
> +   base_cand ,
> +   const insn_range_info _range);
> +
> +  void do_alias_analysis (insn_info *alias_hazards[4],
> +   

Re: [PATCH] x86: Properly implement AMX-TILE load/store intrinsics

2024-02-26 Thread Hongtao Liu
On Mon, Feb 26, 2024 at 6:30 PM H.J. Lu  wrote:
>
> On Sun, Feb 25, 2024 at 8:25 PM H.J. Lu  wrote:
> >
> > On Sun, Feb 25, 2024 at 7:03 PM Hongtao Liu  wrote:
> > >
> > > On Mon, Feb 26, 2024 at 10:37 AM H.J. Lu  wrote:
> > > >
> > > > On Sun, Feb 25, 2024 at 6:03 PM Hongtao Liu  wrote:
> > > > >
> > > > > On Mon, Feb 26, 2024 at 5:11 AM H.J. Lu  wrote:
> > > > > >
> > > > > > ldtilecfg and sttilecfg take a 512-byte memory block.  With
> > > > > > _tile_loadconfig implemented as
> > > > > >
> > > > > > extern __inline void
> > > > > > __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> > > > > > _tile_loadconfig (const void *__config)
> > > > > > {
> > > > > >   __asm__ volatile ("ldtilecfg\t%X0" :: "m" (*((const void 
> > > > > > **)__config)));
> > > > > > }
> > > > > >
> > > > > > GCC sees:
> > > > > >
> > > > > > (parallel [
> > > > > >   (asm_operands/v ("ldtilecfg   %X0") ("") 0
> > > > > >[(mem/f/c:DI (plus:DI (reg/f:DI 77 virtual-stack-vars)
> > > > > >  (const_int -64 [0xffc0])) [1 
> > > > > > MEM[(const void * *)_data]+0 S8 A128])]
> > > > > >[(asm_input:DI ("m"))]
> > > > > >(clobber (reg:CC 17 flags))])
> > > > > >
> > > > > > and the memory operand size is 1 byte.  As the result, the rest of 
> > > > > > 511
> > > > > > bytes is ignored by GCC.  Implement ldtilecfg and sttilecfg 
> > > > > > intrinsics
> > > > > > with a pointer to BLKmode to honor the 512-byte memory block.
> > > > > >
> > > > > > gcc/ChangeLog:
> > > > > >
> > > > > > PR target/114098
> > > > > > * config/i386/amxtileintrin.h (_tile_loadconfig): Use
> > > > > > __builtin_ia32_ldtilecfg.
> > > > > > (_tile_storeconfig): Use __builtin_ia32_sttilecfg.
> > > > > > * config/i386/i386-builtin.def (BDESC): Add
> > > > > > __builtin_ia32_ldtilecfg and __builtin_ia32_sttilecfg.
> > > > > > * config/i386/i386-expand.cc (ix86_expand_builtin): Handle
> > > > > > IX86_BUILTIN_LDTILECFG and IX86_BUILTIN_STTILECFG.
> > > > > > * config/i386/i386.md (ldtilecfg): New pattern.
> > > > > > (sttilecfg): Likewise.
> > > > > >
> > > > > > gcc/testsuite/ChangeLog:
> > > > > >
> > > > > > PR target/114098
> > > > > > * gcc.target/i386/amxtile-4.c: New test.
> > > > > > ---
> > > > > >  gcc/config/i386/amxtileintrin.h   |  4 +-
> > > > > >  gcc/config/i386/i386-builtin.def  |  4 ++
> > > > > >  gcc/config/i386/i386-expand.cc| 19 
> > > > > >  gcc/config/i386/i386.md   | 24 ++
> > > > > >  gcc/testsuite/gcc.target/i386/amxtile-4.c | 55 
> > > > > > +++
> > > > > >  5 files changed, 104 insertions(+), 2 deletions(-)
> > > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/amxtile-4.c
> > > > > >
> > > > > > diff --git a/gcc/config/i386/amxtileintrin.h 
> > > > > > b/gcc/config/i386/amxtileintrin.h
> > > > > > index d1a26e0fea5..5081b326498 100644
> > > > > > --- a/gcc/config/i386/amxtileintrin.h
> > > > > > +++ b/gcc/config/i386/amxtileintrin.h
> > > > > > @@ -39,14 +39,14 @@ extern __inline void
> > > > > >  __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> > > > > >  _tile_loadconfig (const void *__config)
> > > > > >  {
> > > > > > -  __asm__ volatile ("ldtilecfg\t%X0" :: "m" (*((const void 
> > > > > > **)__config)));
> > > > > > +  __builtin_ia32_ldtilecfg (__config);
> > > > > >  }
> > > > > >
> > > > > >  extern __inline void
> > > > > >  __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> > > > > >  _tile_storeconfig (void *__config)
> > > > > >  {
> > > > > > -  __asm__ volatile ("sttilecfg\t%X0" : "=m" (*((void 
> > > > > > **)__config)));
> > > > > > +  __builtin_ia32_sttilecfg (__config);
> > > > > >  }
> > > > > >
> > > > > >  extern __inline void
> > > > > > diff --git a/gcc/config/i386/i386-builtin.def 
> > > > > > b/gcc/config/i386/i386-builtin.def
> > > > > > index 729355230b8..88dd7f8857f 100644
> > > > > > --- a/gcc/config/i386/i386-builtin.def
> > > > > > +++ b/gcc/config/i386/i386-builtin.def
> > > > > > @@ -126,6 +126,10 @@ BDESC (OPTION_MASK_ISA_XSAVES | 
> > > > > > OPTION_MASK_ISA_64BIT, 0, CODE_FOR_nothing, "__b
> > > > > >  BDESC (OPTION_MASK_ISA_XSAVES | OPTION_MASK_ISA_64BIT, 0, 
> > > > > > CODE_FOR_nothing, "__builtin_ia32_xrstors64", 
> > > > > > IX86_BUILTIN_XRSTORS64, UNKNOWN, (int) VOID_FTYPE_PVOID_INT64)
> > > > > >  BDESC (OPTION_MASK_ISA_XSAVEC | OPTION_MASK_ISA_64BIT, 0, 
> > > > > > CODE_FOR_nothing, "__builtin_ia32_xsavec64", IX86_BUILTIN_XSAVEC64, 
> > > > > > UNKNOWN, (int) VOID_FTYPE_PVOID_INT64)
> > > > > >
> > > > > > +/* LDFILECFG and STFILECFG.  */
> > > > > > +BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AMX_TILE, 
> > > > > > CODE_FOR_ldtilecfg, "__builtin_ia32_ldtilecfg", 
> > > > > > IX86_BUILTIN_LDTILECFG, UNKNOWN, (int) VOID_FTYPE_PCVOID)
> > > > > > +BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AMX_TILE, 
> > > > 

[PATCH 3/3] Fix PR libcc1/113977

2024-02-26 Thread Tom Tromey
PR libcc1/113977 points out a case where a simple expression is
rejected with a compiler error message.  The bug here is that gdb does
not inform the plugin of the correct alignment -- in fact, there is no
way to do that.

This patch adds a new method to allow the alignment to be set, and
bumps the C front end protocol version.

It also includes some updates to various comments in 'include', done
here to simplify the merge to binutils-gdb.

include/ChangeLog
2024-02-26  Tom Tromey  

* gcc-cp-interface.h (gcc_cp_fe_context_function): Update
comment.
* gcc-c-interface.h (enum gcc_c_api_version) :
New constant.
(gcc_c_fe_context_function): Update comment.
* gcc-c-fe.def (finish_record_with_alignment): New method.
Update documentation.

libcc1/ChangeLog
2024-02-26  Tom Tromey  

PR libcc1/113977
* libcc1plugin.cc (plugin_finish_record_or_union): New function.
(plugin_finish_record_or_union): Rewrite.
(plugin_init): Use GCC_C_FE_VERSION_2.
* libcc1.cc (c_vtable): Use GCC_C_FE_VERSION_2.
(gcc_c_fe_context): Check for GCC_C_FE_VERSION_2.
---
 include/ChangeLog  | 10 ++
 include/gcc-c-fe.def   | 13 -
 include/gcc-c-interface.h  | 11 +--
 include/gcc-cp-interface.h |  6 +-
 libcc1/ChangeLog   |  9 +
 libcc1/libcc1.cc   |  5 +++--
 libcc1/libcc1plugin.cc | 26 ++
 7 files changed, 66 insertions(+), 14 deletions(-)

diff --git a/include/ChangeLog b/include/ChangeLog
index 8bfaec6faa1..4c8b5b28039 100644
--- a/include/ChangeLog
+++ b/include/ChangeLog
@@ -1,3 +1,13 @@
+2024-02-26  Tom Tromey  
+
+   * gcc-cp-interface.h (gcc_cp_fe_context_function): Update
+   comment.
+   * gcc-c-interface.h (enum gcc_c_api_version) :
+   New constant.
+   (gcc_c_fe_context_function): Update comment.
+   * gcc-c-fe.def (finish_record_with_alignment): New method.
+   Update documentation.
+
 2024-01-13  Jakub Jelinek  
 
* demangle.h (enum demangle_component_type): Add
diff --git a/include/gcc-c-fe.def b/include/gcc-c-fe.def
index 36a765484a7..cb7cf197525 100644
--- a/include/gcc-c-fe.def
+++ b/include/gcc-c-fe.def
@@ -89,7 +89,10 @@ GCC_METHOD5 (int /* bool */, build_add_field,
 
 /* After all the fields have been added to a struct or union, the
struct or union type must be "finished".  This does some final
-   cleanups in GCC.  */
+   cleanups in GCC.
+
+   Note that when using GCC_C_FE_VERSION_2, it is preferable to call
+   finish_record_with_alignment instead.  */
 
 GCC_METHOD2 (int /* bool */, finish_record_or_union,
 gcc_type, /* Argument RECORD_OR_UNION_TYPE. */
@@ -220,3 +223,11 @@ GCC_METHOD2 (gcc_type, float_type,
 unsigned long,/* Argument SIZE_IN_BYTES.  */
 const char *) /* Argument BUILTIN_NAME.  */
 
+/* New in GCC_FE_VERSION_2.  Like finish_record_or_union but the caller also
+   supplies the alignment.  If the alignment is 0, this acts identically to
+   finish_record_or_union.  */
+
+GCC_METHOD3 (int /* bool */, finish_record_with_alignment,
+gcc_type, /* Argument RECORD_OR_UNION_TYPE. */
+unsigned long,/* Argument SIZE_IN_BYTES.  */
+unsigned long)/* Argument ALIGNMENT.  */
diff --git a/include/gcc-c-interface.h b/include/gcc-c-interface.h
index feece1e38a2..700d7483a4a 100644
--- a/include/gcc-c-interface.h
+++ b/include/gcc-c-interface.h
@@ -45,7 +45,10 @@ enum gcc_c_api_version
 
   /* Added char_type.  Added new version of int_type and float_type,
  deprecated int_type_v0 and float_type_v0.  */
-  GCC_C_FE_VERSION_1 = 1
+  GCC_C_FE_VERSION_1 = 1,
+
+  /* Added finish_record_with_alignment method.  */
+  GCC_C_FE_VERSION_2 = 2,
 };
 
 /* Qualifiers.  */
@@ -198,7 +201,11 @@ struct gcc_c_context
 /* The type of the initialization function.  The caller passes in the
desired base version and desired C-specific version.  If the
request can be satisfied, a compatible gcc_context object will be
-   returned.  Otherwise, the function returns NULL.  */
+   returned.  In particular, this may return a context object with a higher
+   actual version number than was requested, provided the higher version is
+   fully compatible.  (As of GCC_C_FE_VERSION_2, this is always true.)
+
+   Otherwise, the function returns NULL.  */
 
 typedef struct gcc_c_context *gcc_c_fe_context_function
 (enum gcc_base_api_version,
diff --git a/include/gcc-cp-interface.h b/include/gcc-cp-interface.h
index 2f950729b9b..15b911cb216 100644
--- a/include/gcc-cp-interface.h
+++ b/include/gcc-cp-interface.h
@@ -483,7 +483,11 @@ struct gcc_cp_context
 /* The type of the initialization function.  The caller passes in the
desired base version and desired C-specific version.  If the
request can be satisfied, a compatible 

[PATCH 2/3] Fix version negotiation in libcc1 plugins

2024-02-26 Thread Tom Tromey
This fixes version negotiation in the libcc1 plugins.  It's done in a
simple way: the version number from the context object is now passed
to base_gdb_plugin.

The idea behind this is that when the client (gdb) requests version N,
the plugin should respond with the newest version that it knows of
that is backward compatible to N.  That is, the connection can be
upgraded.  Note that the protocol does not change much, and no
backward incompatibilities have ever been needed.

The C plugin is also changed to advertise GCC_C_FE_VERSION_1.

The version negotiation approach should of course be documented, but I
did that in a subsequent patch, in order to only have one patch
touching the 'include' directory and thus needing a merge to
binutils-gdb.

2024-02-26  Tom Tromey  

* libcp1.cc (libcp1::libcp1): Use FE version number from context.
* libcc1.cc (libcc1::libcc1): Use FE version number from context.
(c_vtable): Use GCC_C_FE_VERSION_1.
---
 libcc1/ChangeLog | 6 ++
 libcc1/libcc1.cc | 4 ++--
 libcc1/libcp1.cc | 2 +-
 3 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/libcc1/ChangeLog b/libcc1/ChangeLog
index b0b31ee6586..b4072574ba8 100644
--- a/libcc1/ChangeLog
+++ b/libcc1/ChangeLog
@@ -1,3 +1,9 @@
+2024-02-26  Tom Tromey  
+
+   * libcp1.cc (libcp1::libcp1): Use FE version number from context.
+   * libcc1.cc (libcc1::libcc1): Use FE version number from context.
+   (c_vtable): Use GCC_C_FE_VERSION_1.
+
 2024-02-26  Tom Tromey  
 
* libcc1plugin.cc (safe_lookup_builtin_type): Handle ERROR_MARK.
diff --git a/libcc1/libcc1.cc b/libcc1/libcc1.cc
index 8d4ddc5ddfe..992181e8fdc 100644
--- a/libcc1/libcc1.cc
+++ b/libcc1/libcc1.cc
@@ -54,7 +54,7 @@ struct libcc1 : public 
cc1_plugin::base_gdb_plugin
 libcc1::libcc1 (const gcc_c_fe_vtable *cv)
   : cc1_plugin::base_gdb_plugin ("libcc1plugin",
C_COMPILER_NAME,
-   GCC_C_FE_VERSION_1)
+   cv->c_version)
 {
   c_ops = cv;
 }
@@ -108,7 +108,7 @@ set_callbacks (struct gcc_c_context *s,
 
 static const struct gcc_c_fe_vtable c_vtable =
 {
-  GCC_C_FE_VERSION_0,
+  GCC_C_FE_VERSION_1,
   set_callbacks,
 
 #define GCC_METHOD0(R, N) \
diff --git a/libcc1/libcp1.cc b/libcc1/libcp1.cc
index ec3eec2c606..cc2915d30af 100644
--- a/libcc1/libcp1.cc
+++ b/libcc1/libcp1.cc
@@ -55,7 +55,7 @@ struct libcp1 : public 
cc1_plugin::base_gdb_plugin
 libcp1::libcp1 (const gcc_cp_fe_vtable *cv)
   : cc1_plugin::base_gdb_plugin ("libcp1plugin",
 CP_COMPILER_NAME,
-GCC_CP_FE_VERSION_0)
+cv->cp_version)
 {
   cp_ops = cv;
 }

-- 
2.43.0



[PATCH 1/3] Change 'v1' float and int code to fall back to v0

2024-02-26 Thread Tom Tromey
While working on another patch, I discovered that the libcc1 plugin
code never did version negotiation correctly.  So, the patches to
introduce v1 never did anything -- the new code, as far as I know, has
never been run.

Making version negotiation work shows that the existing code causes
crashes.  For example, safe_lookup_builtin_type might return
error_mark_node in some cases, which the callers aren't prepared to
accept.

Looking into it some more, I couldn't find any justification for this
v1 code for the C compiler plugin.  Since it's not run at all, it's
also clear that removing it doesn't cause any regressions in gdb.

However, rather than remove it, this patch changes it to handle
ERROR_MARK better, and then to fall back to the v0 code if the new
code fails to find the type it's looking for.

2024-02-26  Tom Tromey  

* libcc1plugin.cc (safe_lookup_builtin_type): Handle ERROR_MARK.
(plugin_int_type): Fall back to plugin_int_type_v0.
(plugin_float_type): Fall back to plugin_float_type_v0.
---
 libcc1/ChangeLog   |  6 ++
 libcc1/libcc1plugin.cc | 19 ++-
 2 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/libcc1/ChangeLog b/libcc1/ChangeLog
index f81fe389e71..b0b31ee6586 100644
--- a/libcc1/ChangeLog
+++ b/libcc1/ChangeLog
@@ -1,3 +1,9 @@
+2024-02-26  Tom Tromey  
+
+   * libcc1plugin.cc (safe_lookup_builtin_type): Handle ERROR_MARK.
+   (plugin_int_type): Fall back to plugin_int_type_v0.
+   (plugin_float_type): Fall back to plugin_float_type_v0.
+
 2024-02-09  Marek Polacek  
 
PR c++/98388
diff --git a/libcc1/libcc1plugin.cc b/libcc1/libcc1plugin.cc
index 00d3963029d..f1082d8e9d3 100644
--- a/libcc1/libcc1plugin.cc
+++ b/libcc1/libcc1plugin.cc
@@ -555,7 +555,7 @@ safe_lookup_builtin_type (const char *builtin_name)
 
   gcc_assert (TREE_CODE (result) == TYPE_DECL);
   result = TREE_TYPE (result);
-  return result;
+  return TREE_CODE (result) == ERROR_MARK ? nullptr : result;
 }
 
 static gcc_type
@@ -592,13 +592,14 @@ plugin_int_type (cc1_plugin::connection *self,
 int is_unsigned, unsigned long size_in_bytes,
 const char *builtin_name)
 {
-  if (!builtin_name)
-return plugin_int_type_v0 (self, is_unsigned, size_in_bytes);
-
-  tree result = safe_lookup_builtin_type (builtin_name);
-  gcc_assert (!result || TREE_CODE (result) == INTEGER_TYPE);
-
-  return plugin_int_check (self, is_unsigned, size_in_bytes, result);
+  if (builtin_name != nullptr)
+{
+  tree result = safe_lookup_builtin_type (builtin_name);
+  gcc_assert (!result || TREE_CODE (result) == INTEGER_TYPE);
+  if (result != nullptr)
+   return plugin_int_check (self, is_unsigned, size_in_bytes, result);
+}
+  return plugin_int_type_v0 (self, is_unsigned, size_in_bytes);
 }
 
 gcc_type
@@ -631,7 +632,7 @@ plugin_float_type (cc1_plugin::connection *self,
   tree result = safe_lookup_builtin_type (builtin_name);
 
   if (!result)
-return convert_out (error_mark_node);
+return plugin_float_type_v0 (self, size_in_bytes);
 
   gcc_assert (SCALAR_FLOAT_TYPE_P (result));
   gcc_assert (BITS_PER_UNIT * size_in_bytes == TYPE_PRECISION (result));

-- 
2.43.0



[PATCH 0/3] Fix libcc1 failure

2024-02-26 Thread Tom Tromey
This started as a patch to fix the libcc1 failure pointed out in
PR libcc1/113977.  However, investigating that pointed out some other
issues, which are also fixed in this series.

I tested this on x86-64 Fedora 38 against two versions of gdb: one
patched to fix the gdb side of the bug, and one that was not.  In both
cases gdb's "gdb.compile" tests were run.

---
Tom Tromey (3):
  Change 'v1' float and int code to fall back to v0
  Fix version negotiation in libcc1 plugins
  Fix PR libcc1/113977

 include/ChangeLog  | 10 ++
 include/gcc-c-fe.def   | 13 -
 include/gcc-c-interface.h  | 11 +--
 include/gcc-cp-interface.h |  6 +-
 libcc1/ChangeLog   | 21 +
 libcc1/libcc1.cc   |  7 ---
 libcc1/libcc1plugin.cc | 45 -
 libcc1/libcp1.cc   |  2 +-
 8 files changed, 90 insertions(+), 25 deletions(-)
---
base-commit: 1e2a3b278d7770db6b5ca869756b1375fc3a77d6
change-id: 20240226-gdb-compile-align-f31c69137d6a

Best regards,
-- 
Tom Tromey 



Re: [PATCH] rs6000: Don't allow immediate value in the vsx_splat pattern [PR113950]

2024-02-26 Thread Kewen.Lin
on 2024/2/27 10:13, Peter Bergner wrote:
> On 2/26/24 7:55 PM, Kewen.Lin wrote:
>> on 2024/2/26 23:07, Peter Bergner wrote:
>>> so I think we should use both Jeevitha's predicate change and
>>> your operands[1] change.
>>
>> Since either the original predicate change or operands[1] change
>> can fix this issue, I think it's implied that only either of them
>> is enough, so we can remove "else if (!REG_P (op1))" arm (or even
>> replaced with one else arm to assert REG_P (op1))?
> 
> splat_input_operand allows, mem, reg and subreg, so I don't think
> we can just assert on REG_P (op1), since op1 could be a subreg.

ah, you are right! I missed the "subreg".

> I do agree we can remove the "if (!REG_P (op1))" test on the else
> branch, since force_reg() has an early exit for regs, so a simple:
> 
>   ...
>   else
> operands[1] = force_reg (mode, op1);
> 
> ..should work.

Yes!

> 
> 
> 
> 
>> Good point, or maybe just an explicit -mvsx like some existing ones, which
>> can avoid to only test some fixed cpu type.
> 
> If a simple "-O1 -vsx" is enough to expose the ICE on an unpacthed
> compiler and a PASS on a patched compiler, then I'm all for it.
> Jeevitha, can you try confirming that?

Jeevitha, can you also check why we have the different behavior on GCC 11 when
you get time?  GCC 12 has new built-in framework, so this ICE gets exposed, but
IMHO it would still be good to double check the previous behavior is due to
some miss support or some other latent bug.  Thanks in advance!

BR,
Kewen



[PATCH] c++/modules: local class merging [PR99426]

2024-02-26 Thread Patrick Palka
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this approach
look reasonable?

-- >8 --

One known missing piece in the modules implementation is merging of a
streamed-in local class with the corresponding in-TU version of the
local class.  This missing piece turns out to cause a hard-to-reduce
use-after-free GC issue due to the entity_ary not being marked as a GC
root (deliberately), and manifests as a serialization error on stream-in
as in PR99426 (see comment #6 for a reduction).  It's also reproducible
on trunk when running the xtreme-header tests without -fno-module-lazy.

This patch makes us merge such local classes according to their position
within the containing function's definition, similar to how we merge
FIELD_DECLs of a class according to their index in the TYPE_FIELDS
list.

PR c++/99426

gcc/cp/ChangeLog:

* module.cc (merge_kind::MK_local_class): New enumerator.
(merge_kind_name): Update.
(trees_out::chained_decls): Move BLOCK-specific handling
of DECL_LOCAL_DECL_P decls to ...
(trees_out::core_vals) : ... here.  Stream
BLOCK_VARS manually.
(trees_in::core_vals) : Stream BLOCK_VARS
manually.  Handle deduplicated local classes.
(trees_out::key_local_class): Define.
(trees_in::key_local_class): Define.
(trees_out::get_merge_kind) : Return
MK_local_class for a local class.
(trees_out::key_mergeable) : Use
key_local_class.
(trees_in::key_mergeable) : Likewise.
(trees_in::is_matching_decl): Be flexible with type mismatches
for local entities.

gcc/testsuite/ChangeLog:

* g++.dg/modules/xtreme-header-7_a.H: New test.
* g++.dg/modules/xtreme-header-7_b.C: New test.
---
 gcc/cp/module.cc  | 167 +++---
 .../g++.dg/modules/xtreme-header-7_a.H|   4 +
 .../g++.dg/modules/xtreme-header-7_b.C|   6 +
 3 files changed, 149 insertions(+), 28 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/xtreme-header-7_a.H
 create mode 100644 gcc/testsuite/g++.dg/modules/xtreme-header-7_b.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index fa91c6ff9cb..f77f73a59ed 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -2771,6 +2771,7 @@ enum merge_kind
 
   MK_enum, /* Found by CTX, & 1stMemberNAME.  */
   MK_keyed, /* Found by key & index.  */
+  MK_local_class, /* Found by CTX, index.  */
 
   MK_friend_spec,  /* Like named, but has a tmpl & args too.  */
   MK_local_friend, /* Found by CTX, index.  */
@@ -2799,7 +2800,7 @@ static char const *const merge_kind_name[MK_hwm] =
 "unique", "named", "field", "vtable",  /* 0...3  */
 "asbase", "partial", "enum", "attached",   /* 4...7  */
 
-"friend spec", "local friend", NULL, NULL,  /* 8...11 */
+"local class", "friend spec", "local friend", NULL,  /* 8...11 */
 NULL, NULL, NULL, NULL,
 
 "type spec", "type tmpl spec", /* 16,17 type (template).  */
@@ -2928,6 +2929,7 @@ public:
   unsigned binfo_mergeable (tree *);
 
 private:
+  tree key_local_class (const merge_key&, tree);
   uintptr_t *find_duplicate (tree existing);
   void register_duplicate (tree decl, tree existing);
   /* Mark as an already diagnosed bad duplicate.  */
@@ -3086,6 +3088,7 @@ public:
   void binfo_mergeable (tree binfo);
 
 private:
+  void key_local_class (merge_key&, tree, tree);
   bool decl_node (tree, walk_kind ref);
   void type_node (tree);
   void tree_value (tree);
@@ -4952,18 +4955,7 @@ void
 trees_out::chained_decls (tree decls)
 {
   for (; decls; decls = DECL_CHAIN (decls))
-{
-  if (VAR_OR_FUNCTION_DECL_P (decls)
- && DECL_LOCAL_DECL_P (decls))
-   {
- /* Make sure this is the first encounter, and mark for
-walk-by-value.  */
- gcc_checking_assert (!TREE_VISITED (decls)
-  && !DECL_TEMPLATE_INFO (decls));
- mark_by_value (decls);
-   }
-  tree_node (decls);
-}
+tree_node (decls);
   tree_node (NULL_TREE);
 }
 
@@ -6204,7 +6196,21 @@ trees_out::core_vals (tree t)
 
   /* DECL_LOCAL_DECL_P decls are first encountered here and
  streamed by value.  */
-  chained_decls (t->block.vars);
+  for (tree decls = t->block.vars; decls; decls = DECL_CHAIN (decls))
+   {
+ if (VAR_OR_FUNCTION_DECL_P (decls)
+ && DECL_LOCAL_DECL_P (decls))
+   {
+ /* Make sure this is the first encounter, and mark for
+walk-by-value.  */
+ gcc_checking_assert (!TREE_VISITED (decls)
+  && !DECL_TEMPLATE_INFO (decls));
+ mark_by_value (decls);
+   }
+ tree_node (decls);
+   }
+  tree_node (NULL_TREE);
+
   /* nonlocalized_vars is a middle-end thing.  */
   WT (t->block.subblocks);
   WT (t->block.supercontext);
@@ -6717,7 +6723,34 @@ trees_in::core_vals (tree t)
 case 

Re: [PATCH] rs6000: Don't allow immediate value in the vsx_splat pattern [PR113950]

2024-02-26 Thread Peter Bergner
On 2/26/24 7:55 PM, Kewen.Lin wrote:
> on 2024/2/26 23:07, Peter Bergner wrote:
>> so I think we should use both Jeevitha's predicate change and
>> your operands[1] change.
> 
> Since either the original predicate change or operands[1] change
> can fix this issue, I think it's implied that only either of them
> is enough, so we can remove "else if (!REG_P (op1))" arm (or even
> replaced with one else arm to assert REG_P (op1))?

splat_input_operand allows, mem, reg and subreg, so I don't think
we can just assert on REG_P (op1), since op1 could be a subreg.
I do agree we can remove the "if (!REG_P (op1))" test on the else
branch, since force_reg() has an early exit for regs, so a simple:

  ...
  else
operands[1] = force_reg (mode, op1);

..should work.




> Good point, or maybe just an explicit -mvsx like some existing ones, which
> can avoid to only test some fixed cpu type.

If a simple "-O1 -vsx" is enough to expose the ICE on an unpacthed
compiler and a PASS on a patched compiler, then I'm all for it.
Jeevitha, can you try confirming that?


Peter




Re: [PATCH] rs6000: Don't allow immediate value in the vsx_splat pattern [PR113950]

2024-02-26 Thread Kewen.Lin
on 2024/2/26 23:07, Peter Bergner wrote:
> On 2/26/24 4:49 AM, Kewen.Lin wrote:
>> on 2024/2/26 14:18, jeevitha wrote:
>>> Hi All,
>>> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
>>> index 6111cc90eb7..e5688ff972a 100644
>>> --- a/gcc/config/rs6000/vsx.md
>>> +++ b/gcc/config/rs6000/vsx.md
>>> @@ -4660,7 +4660,7 @@
>>>  (define_expand "vsx_splat_"
>>>[(set (match_operand:VSX_D 0 "vsx_register_operand")
>>> (vec_duplicate:VSX_D
>>> -(match_operand: 1 "input_operand")))]
>>> +(match_operand: 1 "splat_input_operand")))]
>>>"VECTOR_MEM_VSX_P (mode)"
>>>  {
>>>rtx op1 = operands[1];
>>
>> This hunk actually does force_reg already:
>>
>> ...
>>   else if (!REG_P (op1))
>> op1 = force_reg (mode, op1);
>>
>> but it's assigning to op1 unexpectedly (an omission IMHO), so just
>> simply fix it with:
>>
>>   else if (!REG_P (op1))
>> -op1 = force_reg (mode, op1);
>> +operands[1] = force_reg (mode, op1);
> 
> I agree op1 was an oversight and it should be operands[1].
> That said, I think using more precise predicates is a good thing,

Agreed.

> so I think we should use both Jeevitha's predicate change and
> your operands[1] change.

Since either the original predicate change or operands[1] change
can fix this issue, I think it's implied that only either of them
is enough, so we can remove "else if (!REG_P (op1))" arm (or even
replaced with one else arm to assert REG_P (op1))?

> 
> I'll note that Jeevitha originally had the operands[1] change, but I
> didn't look closely enough at the issue or the pattern and mentioned
> that these kinds of bugs can be caused by too loose constraints and
> predicates, which is when she found the updated predicate to use.
> I believe she already even bootstrapped and regtested the operands[1]
> only change.  Jeevitha???
> 

Good to know that. :)

> 
> 
> 
>>> +/* PR target/113950 */
>>> +/* { dg-do compile } */
>>
>> We need an effective target to ensure vsx support, for now it's 
>> powerpc_vsx_ok.
>> ie: /* { dg-require-effective-target powerpc_vsx_ok } */
> 
> Agreed.
> 
> 
>>> +/* { dg-options "-O1" } */
> 
> I think we should also use a -mcpu=XXX option to ensure VSX is enabled
> when compiling these VSX built-in functions.  I'm fine using any CPU
> (power7 or later) where the ICE exists with an unpatched compiler.
> Otherwise, testing will be limited to our server systems that have
> VSX enabled by default.

Good point, or maybe just an explicit -mvsx like some existing ones, which
can avoid to only test some fixed cpu type.

BR,
Kewen


Re: [PATCH] c++: Revert deferring emission of inline variables [PR114013]

2024-02-26 Thread Jason Merrill

On 2/21/24 05:28, Nathaniel Shead wrote:

My earlier patch appears to have caused some regressions. I've taken a
quick look to see if there are obvious workarounds, but given the time
frame and the fact that I still don't really understand all the details
of how and when symbols get emitted, I felt it was safer to revert the
non-modules parts of this change instead.

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?


The make_rtl_for_nonlocal_decl hunk is OK, I don't see the need to 
change import_export_decl.



-- >8 --

This is a (partial) reversion of r14-8987-gdd9d14f7d53 to return to
eagerly emitting inline variables to the middle-end when they are
declared. 'import_export_decl' will still continue to accept them, as
allowing this is a pure extension and doesn't seem to cause issues with
modules, but otherwise deferring the emission of inline variables
appears to cause issues on some targets and prevents some code using
inline variable templates from correctly linking.

There might be a more targetted way to fix this, but due to the
complexity of handling linkage I'd prefer to wait till GCC 15 to explore
our options.

PR c++/113970
PR c++/114013

gcc/cp/ChangeLog:

* decl.cc (make_rtl_for_nonlocal_decl): Don't defer inline
variables.
* decl2.cc (import_export_decl): Only support inline variables
imported from a module.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/inline-var10.C: New test.
---
  gcc/cp/decl.cc|  4 ---
  gcc/cp/decl2.cc   |  6 +++--
  gcc/testsuite/g++.dg/cpp1z/inline-var10.C | 33 +++
  3 files changed, 37 insertions(+), 6 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1z/inline-var10.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index e47f694e4e5..d19d09adde4 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -7954,10 +7954,6 @@ make_rtl_for_nonlocal_decl (tree decl, tree init, const 
char* asmspec)
&& DECL_IMPLICIT_INSTANTIATION (decl))
  defer_p = 1;
  
-  /* Defer vague-linkage variables.  */

-  if (DECL_INLINE_VAR_P (decl))
-defer_p = 1;
-
/* If we're not deferring, go ahead and assemble the variable.  */
if (!defer_p)
  rest_of_decl_compilation (decl, toplev, at_eof);
diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
index 1dddbaab38b..24a9332ccb1 100644
--- a/gcc/cp/decl2.cc
+++ b/gcc/cp/decl2.cc
@@ -3362,7 +3362,7 @@ import_export_decl (tree decl)
  
   * inline functions
  
- * inline variables

+ * inline variables (from modules)
  
   * implicit instantiations of static data members of class

 templates
@@ -3385,7 +3385,9 @@ import_export_decl (tree decl)
|| DECL_DECLARED_INLINE_P (decl));
else
  gcc_assert (DECL_IMPLICIT_INSTANTIATION (decl)
-   || DECL_INLINE_VAR_P (decl)
+   || (DECL_INLINE_VAR_P (decl)
+   && DECL_LANG_SPECIFIC (decl)
+   && DECL_MODULE_IMPORT_P (decl))
|| DECL_VTABLE_OR_VTT_P (decl)
|| DECL_TINFO_P (decl));
/* Check that a definition of DECL is available in this translation
diff --git a/gcc/testsuite/g++.dg/cpp1z/inline-var10.C 
b/gcc/testsuite/g++.dg/cpp1z/inline-var10.C
new file mode 100644
index 000..8a198556778
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/inline-var10.C
@@ -0,0 +1,33 @@
+// PR c++/114013
+// { dg-do link { target c++17 } }
+
+struct S { int a, b; };
+
+template 
+constexpr struct S var[8] = {};
+
+template <>
+constexpr inline struct S var<6>[8] = {
+  { 1, 1 }, { 2, 0 }, { 3, 1 }, { 4, 0 },
+  { 5, 1 }, { 6, 0 }, { 7, 1 }, { 8, 0 }
+};
+
+[[gnu::noipa]] void
+foo (S)
+{
+}
+
+template 
+void
+bar (int x)
+{
+  foo (var[x]);
+}
+
+volatile int x;
+
+int
+main ()
+{
+  bar <6> (x);
+}




Re: [PATCH] RISC-V: Add riscv_vector_cc function attribute

2024-02-26 Thread juzhe.zh...@rivai.ai
Thanks for supporting this.
I'd rather leave this patch review to kito's since it's kito's proposal.



juzhe.zh...@rivai.ai
 
From: Li Xu
Date: 2024-02-27 09:17
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong; zhengyu; xuli
Subject: [PATCH] RISC-V: Add riscv_vector_cc function attribute
From: xuli 
 
Standard vector calling convention variant will only enabled when function
has vector argument or returning value by default, however user may also
want to invoke function without that during a vectorized loop at some situation,
but it will cause a huge performance penalty due to vector register 
store/restore.
 
So user can declare function with this riscv_vector_cc attribute like below, 
that could enforce
function will use standard vector calling convention variant.
 
void foo() __attribute__((riscv_vector_cc));
[[riscv::vector_cc]] void foo(); // For C++11 and C23
 
For more details please reference the below link.
https://github.com/riscv-non-isa/riscv-c-api-doc/pull/67
 
gcc/ChangeLog:
 
* config/riscv/riscv.cc (TARGET_GNU_ATTRIBUTES): Add riscv_vector_cc
attribute to riscv_attribute_table.
(riscv_vector_cc_function_p): Return true if FUNC is a riscv_vector_cc function.
(riscv_fntype_abi): Add riscv_vector_cc attribute check.
* doc/extend.texi: Add riscv_vector_cc attribute description.
 
gcc/testsuite/ChangeLog:
 
* g++.target/riscv/rvv/base/attribute-riscv_vector_cc-error.C: New test.
* gcc.target/riscv/rvv/base/attribute-riscv_vector_cc-callee-saved.c: New test.
* gcc.target/riscv/rvv/base/attribute-riscv_vector_cc-error.c: New test.
---
gcc/config/riscv/riscv.cc |  55 ++--
gcc/doc/extend.texi   |  12 ++
.../base/attribute-riscv_vector_cc-error.C|  22 
.../attribute-riscv_vector_cc-callee-saved.c  | 117 ++
.../base/attribute-riscv_vector_cc-error.c|  11 ++
5 files changed, 209 insertions(+), 8 deletions(-)
create mode 100644 
gcc/testsuite/g++.target/riscv/rvv/base/attribute-riscv_vector_cc-error.C
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/attribute-riscv_vector_cc-callee-saved.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/attribute-riscv_vector_cc-error.c
 
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 4100abc9dd1..7f37f231796 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -537,24 +537,52 @@ static tree riscv_handle_fndecl_attribute (tree *, tree, 
tree, int, bool *);
static tree riscv_handle_type_attribute (tree *, tree, tree, int, bool *);
/* Defining target-specific uses of __attribute__.  */
-TARGET_GNU_ATTRIBUTES (riscv_attribute_table,
+static const attribute_spec riscv_gnu_attributes[] =
{
   /* Syntax: { name, min_len, max_len, decl_required, type_required,
   function_type_required, affects_type_identity, handler,
   exclude } */
   /* The attribute telling no prologue/epilogue.  */
-  { "naked", 0,  0, true, false, false, false,
-riscv_handle_fndecl_attribute, NULL },
+  {"naked", 0, 0, true, false, false, false, riscv_handle_fndecl_attribute,
+   NULL},
   /* This attribute generates prologue/epilogue for interrupt handlers.  */
-  { "interrupt", 0, 1, false, true, true, false,
-riscv_handle_type_attribute, NULL },
+  {"interrupt", 0, 1, false, true, true, false, riscv_handle_type_attribute,
+   NULL},
   /* The following two are used for the built-in properties of the Vector type
  and are not used externally */
   {"RVV sizeless type", 4, 4, false, true, false, true, NULL, NULL},
-  {"RVV type", 0, 0, false, true, false, true, NULL, NULL}
-});
+  {"RVV type", 0, 0, false, true, false, true, NULL, NULL},
+  /* This attribute is used to declare a function, forcing it to use the
+standard vector calling convention variant. Syntax:
+__attribute__((riscv_vector_cc)). */
+  {"riscv_vector_cc", 0, 0, false, true, true, true, NULL, NULL}
+};
+
+static const scoped_attribute_specs riscv_gnu_attribute_table  =
+{
+  "gnu", {riscv_gnu_attributes}
+};
+
+static const attribute_spec riscv_attributes[] =
+{
+  /* This attribute is used to declare a function, forcing it to use the
+ standard vector calling convention variant. Syntax:
+ [[riscv::vector_cc]]. */
+  {"vector_cc", 0, 0, false, true, true, true, NULL, NULL}
+};
+
+static const scoped_attribute_specs riscv_nongnu_attribute_table =
+{
+  "riscv", {riscv_attributes}
+};
+
+static const scoped_attribute_specs *const riscv_attribute_table[] =
+{
+  _gnu_attribute_table,
+  _nongnu_attribute_table
+};
/* Order for the CLOBBERs/USEs of gpr_save.  */
static const unsigned gpr_save_reg_order[] = {
@@ -5425,6 +5453,16 @@ riscv_arguments_is_vector_type_p (const_tree fntype)
   return false;
}
+/* Return true if FUNC is a riscv_vector_cc function.
+   For more details please reference the below link.
+   https://github.com/riscv-non-isa/riscv-c-api-doc/pull/67 */
+static bool
+riscv_vector_cc_function_p (const_tree fntype)
+{
+  return 

Re: Re: [PATCH] RISC-V: Update test expectancies with recent scheduler change

2024-02-26 Thread juzhe.zh...@rivai.ai
If the scheduling model increases the vsetvls, we shouldn't set it as default 
scheduling model



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2024-02-26 21:29
To: Edwin Lu; gcc-patches
CC: rdapp.gcc; gnu-toolchain; pan2.li; juzhe.zh...@rivai.ai
Subject: Re: [PATCH] RISC-V: Update test expectancies with recent scheduler 
change
On 2/24/24 00:10, Edwin Lu wrote:
> Given the recent change with adding the scheduler pipeline descriptions,
> many scan-dump failures emerged. Relax the expected assembler output
> conditions on the affected tests to reduce noise.
 
I'm not entirely sure yet about relaxing the scans like this.
There seem to be uarchs that want to minimize vsetvls under all
circumstances while others don't seem to care all that much.  We could
(not must) assume that the tests that now regress have been written
with this minimization aspect in mind and that we'd want to be sure
that we still manage to emit the minimal number of vsetvls.
 
Why is the new upper bound acceptable?  What if a vector_load cost
of 12 (or so) causes even more vsetvls?  The 6 in generic_ooo is more
or less arbitrary chosen.
 
My suggestion before was to create another sched model that has
load costs like before and run the regressing tests with that
model.  That's of course also not really ideal and actually
shoehorned a bit, in particular as no scheduling also increases
the number of vsetvls.
 
Juzhe: What's your intention with those tests?  I'd suppose you
want the vsetvl number to be minimal here and not higher?  Did you
plan to add a particular scheduling model or are you happy with
the default (all 1) latencies?
 
Regards
Robin
 


Re: [PATCH] RISC-V: Add initial cost handling for segment loads/stores.

2024-02-26 Thread juzhe.zh...@rivai.ai
This patch looks odd to me.
I don't see memrefs in the trunk code.

Also, I prefer list all cost in cost tune info for NF = 2 ~ 8 like ARM SVE does:

  /* Detect cases in which a vector load or store represents an
LD[234] or ST[234] instruction.  */
  switch (aarch64_ld234_st234_vectors (kind, stmt_info))
{
case 2:
 stmt_cost += simd_costs->ld2_st2_permute_cost;
 break;

case 3:
 stmt_cost += simd_costs->ld3_st3_permute_cost;
 break;

case 4:
 stmt_cost += simd_costs->ld4_st4_permute_cost;
 break;
}



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2024-02-26 23:54
To: gcc-patches; palmer; Kito Cheng; juzhe.zh...@rivai.ai
CC: rdapp.gcc; jeffreyalaw
Subject: [PATCH] RISC-V: Add initial cost handling for segment loads/stores.
Hi,
 
This has been sitting on my local tree - I've been wanting to post it
for a while but somehow forgot.
 
This patch makes segment loads and stores more expensive.  It adds
segment_load and segment_store cost fields to the common vector costs
and adds handling to adjust_stmt_cost.  In the future we could handle
this in a more fine-grained manner but let's start somehow.
 
Regtested on rv64.
 
Regards
Robin
 
gcc/ChangeLog:
 
* config/riscv/riscv-protos.h (struct common_vector_cost): Add
segment_[load/store]_cost.
* config/riscv/riscv-vector-costs.cc (costs::adjust_stmt_cost):
Handle segment loads/stores.
* config/riscv/riscv.cc: Initialize segment_[load/store]_cost
to 1.
 
gcc/testsuite/ChangeLog:
 
* gcc.dg/vect/costmodel/riscv/rvv/pr113112-4.c: Expect m4
instead of m2.
---
gcc/config/riscv/riscv-protos.h   |   4 +
gcc/config/riscv/riscv-vector-costs.cc| 127 --
gcc/config/riscv/riscv.cc |   4 +
.../vect/costmodel/riscv/rvv/pr113112-4.c |   4 +-
4 files changed, 95 insertions(+), 44 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 80efdf2b7e5..2e8ab9990a8 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -218,6 +218,10 @@ struct common_vector_cost
   const int gather_load_cost;
   const int scatter_store_cost;
+  /* Segment load/store cost.  */
+  const int segment_load_cost;
+  const int segment_store_cost;
+
   /* Cost of a vector-to-scalar operation.  */
   const int vec_to_scalar_cost;
diff --git a/gcc/config/riscv/riscv-vector-costs.cc 
b/gcc/config/riscv/riscv-vector-costs.cc
index adf9c197df5..d3c12444773 100644
--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -1043,6 +1043,24 @@ costs::better_main_loop_than_p (const vector_costs 
*uncast_other) const
   return vector_costs::better_main_loop_than_p (other);
}
+/* Returns the group size i.e. the number of vectors to be loaded by a
+   segmented load/store instruction.  Return 0 if it is no segmented
+   load/store.  */
+static int
+segment_loadstore_group_size (enum vect_cost_for_stmt kind,
+   stmt_vec_info stmt_info)
+{
+  if ((kind == vector_load || kind == vector_store)
+  && STMT_VINFO_DATA_REF (stmt_info))
+{
+  stmt_info = DR_GROUP_FIRST_ELEMENT (stmt_info);
+  if (stmt_info
+   && STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) == VMAT_LOAD_STORE_LANES)
+ return DR_GROUP_SIZE (stmt_info);
+}
+  return 0;
+}
+
/* Adjust vectorization cost after calling riscv_builtin_vectorization_cost.
For some statement, we would like to further fine-grain tweak the cost on
top of riscv_builtin_vectorization_cost handling which doesn't have any
@@ -1067,55 +1085,80 @@ costs::adjust_stmt_cost (enum vect_cost_for_stmt kind, 
loop_vec_info loop,
 case vector_load:
 case vector_store:
{
-   /* Unit-stride vector loads and stores do not have offset addressing
-  as opposed to scalar loads and stores.
-  If the address depends on a variable we need an additional
-  add/sub for each load/store in the worst case.  */
-   if (stmt_info && stmt_info->stmt)
+   if (stmt_info && stmt_info->stmt && STMT_VINFO_DATA_REF (stmt_info))
{
-   data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
-   class loop *father = stmt_info->stmt->bb->loop_father;
-   if (!loop && father && !father->inner && father->superloops)
+   int group_size;
+   if ((group_size
+= segment_loadstore_group_size (kind, stmt_info)) > 1)
{
-   tree ref;
-   if (TREE_CODE (dr->ref) != MEM_REF
-   || !(ref = TREE_OPERAND (dr->ref, 0))
-   || TREE_CODE (ref) != SSA_NAME)
- break;
+   /* Segment loads and stores.  When the group size is > 1
+  the vectorizer will add a vector load/store statement for
+  each vector in the group.  Note that STMT_COST is
+  overwritten here rather than adjusted.  */
+   if (riscv_v_ext_vector_mode_p (loop->vector_mode))
+ stmt_cost
+   = (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info))
+ ? costs->vla->segment_load_cost
+ : costs->vla->segment_store_cost);
+   else
+ stmt_cost
+   = (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info))
+ ? 

[PATCH] RISC-V: Add riscv_vector_cc function attribute

2024-02-26 Thread Li Xu
From: xuli 

Standard vector calling convention variant will only enabled when function
has vector argument or returning value by default, however user may also
want to invoke function without that during a vectorized loop at some situation,
but it will cause a huge performance penalty due to vector register 
store/restore.

So user can declare function with this riscv_vector_cc attribute like below, 
that could enforce
function will use standard vector calling convention variant.

void foo() __attribute__((riscv_vector_cc));
[[riscv::vector_cc]] void foo(); // For C++11 and C23

For more details please reference the below link.
https://github.com/riscv-non-isa/riscv-c-api-doc/pull/67

gcc/ChangeLog:

* config/riscv/riscv.cc (TARGET_GNU_ATTRIBUTES): Add riscv_vector_cc
attribute to riscv_attribute_table.
(riscv_vector_cc_function_p): Return true if FUNC is a riscv_vector_cc 
function.
(riscv_fntype_abi): Add riscv_vector_cc attribute check.
* doc/extend.texi: Add riscv_vector_cc attribute description.

gcc/testsuite/ChangeLog:

* g++.target/riscv/rvv/base/attribute-riscv_vector_cc-error.C: New test.
* gcc.target/riscv/rvv/base/attribute-riscv_vector_cc-callee-saved.c: 
New test.
* gcc.target/riscv/rvv/base/attribute-riscv_vector_cc-error.c: New test.
---
 gcc/config/riscv/riscv.cc |  55 ++--
 gcc/doc/extend.texi   |  12 ++
 .../base/attribute-riscv_vector_cc-error.C|  22 
 .../attribute-riscv_vector_cc-callee-saved.c  | 117 ++
 .../base/attribute-riscv_vector_cc-error.c|  11 ++
 5 files changed, 209 insertions(+), 8 deletions(-)
 create mode 100644 
gcc/testsuite/g++.target/riscv/rvv/base/attribute-riscv_vector_cc-error.C
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/attribute-riscv_vector_cc-callee-saved.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/attribute-riscv_vector_cc-error.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 4100abc9dd1..7f37f231796 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -537,24 +537,52 @@ static tree riscv_handle_fndecl_attribute (tree *, tree, 
tree, int, bool *);
 static tree riscv_handle_type_attribute (tree *, tree, tree, int, bool *);
 
 /* Defining target-specific uses of __attribute__.  */
-TARGET_GNU_ATTRIBUTES (riscv_attribute_table,
+static const attribute_spec riscv_gnu_attributes[] =
 {
   /* Syntax: { name, min_len, max_len, decl_required, type_required,
   function_type_required, affects_type_identity, handler,
   exclude } */
 
   /* The attribute telling no prologue/epilogue.  */
-  { "naked",   0,  0, true, false, false, false,
-riscv_handle_fndecl_attribute, NULL },
+  {"naked", 0, 0, true, false, false, false, riscv_handle_fndecl_attribute,
+   NULL},
   /* This attribute generates prologue/epilogue for interrupt handlers.  */
-  { "interrupt", 0, 1, false, true, true, false,
-riscv_handle_type_attribute, NULL },
+  {"interrupt", 0, 1, false, true, true, false, riscv_handle_type_attribute,
+   NULL},
 
   /* The following two are used for the built-in properties of the Vector type
  and are not used externally */
   {"RVV sizeless type", 4, 4, false, true, false, true, NULL, NULL},
-  {"RVV type", 0, 0, false, true, false, true, NULL, NULL}
-});
+  {"RVV type", 0, 0, false, true, false, true, NULL, NULL},
+  /* This attribute is used to declare a function, forcing it to use the
+standard vector calling convention variant. Syntax:
+__attribute__((riscv_vector_cc)). */
+  {"riscv_vector_cc", 0, 0, false, true, true, true, NULL, NULL}
+};
+
+static const scoped_attribute_specs riscv_gnu_attribute_table  =
+{
+  "gnu", {riscv_gnu_attributes}
+};
+
+static const attribute_spec riscv_attributes[] =
+{
+  /* This attribute is used to declare a function, forcing it to use the
+ standard vector calling convention variant. Syntax:
+ [[riscv::vector_cc]]. */
+  {"vector_cc", 0, 0, false, true, true, true, NULL, NULL}
+};
+
+static const scoped_attribute_specs riscv_nongnu_attribute_table =
+{
+  "riscv", {riscv_attributes}
+};
+
+static const scoped_attribute_specs *const riscv_attribute_table[] =
+{
+  _gnu_attribute_table,
+  _nongnu_attribute_table
+};
 
 /* Order for the CLOBBERs/USEs of gpr_save.  */
 static const unsigned gpr_save_reg_order[] = {
@@ -5425,6 +5453,16 @@ riscv_arguments_is_vector_type_p (const_tree fntype)
   return false;
 }
 
+/* Return true if FUNC is a riscv_vector_cc function.
+   For more details please reference the below link.
+   https://github.com/riscv-non-isa/riscv-c-api-doc/pull/67 */
+static bool
+riscv_vector_cc_function_p (const_tree fntype)
+{
+  return lookup_attribute ("vector_cc", TYPE_ATTRIBUTES (fntype)) != NULL_TREE
+|| lookup_attribute ("riscv_vector_cc", TYPE_ATTRIBUTES (fntype)) != 
NULL_TREE;
+}
+
 /* Implement TARGET_FNTYPE_ABI.  */
 
 

[PATCH] combine: Don't simplify paradoxical SUBREG on WORD_REGISTER_OPERATIONS [PR113010]

2024-02-26 Thread Greg McGary
The sign-bit-copies of a sign-extending load cannot be known until runtime on
WORD_REGISTER_OPERATIONS targets, except in the case of a zero-extending MEM
load.  See the fix for PR112758.

2024-02-22  Greg McGary  

PR rtl-optimization/113010
* combine.cc (simplify_comparison): Simplify a SUBREG on
  WORD_REGISTER_OPERATIONS targets only if it is a zero-extending
  MEM load.

* gcc.c-torture/execute/pr113010.c: New test.
---
 gcc/combine.cc | 15 +--
 gcc/testsuite/gcc.c-torture/execute/pr113010.c |  9 +
 2 files changed, 22 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr113010.c

diff --git a/gcc/combine.cc b/gcc/combine.cc
index 76543d85b7c..b09200d757e 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -12550,9 +12550,20 @@ simplify_comparison (enum rtx_code code, rtx *pop0, 
rtx *pop1)
}
 
  /* If the inner mode is narrower and we are extracting the low part,
-we can treat the SUBREG as if it were a ZERO_EXTEND.  */
+we can treat the SUBREG as if it were a ZERO_EXTEND ...  */
  if (paradoxical_subreg_p (op0))
-   ;
+   {
+ if (WORD_REGISTER_OPERATIONS
+ && GET_MODE_PRECISION (inner_mode) < BITS_PER_WORD
+ /* On WORD_REGISTER_OPERATIONS targets the bits
+beyond sub_mode aren't considered undefined,
+so optimize only if it is a MEM load when MEM loads
+zero extend, because then the upper bits are all zero.  */
+ && !(MEM_P (SUBREG_REG (op0))
+  && load_extend_op (inner_mode) == ZERO_EXTEND))
+   break;
+ /* FALLTHROUGH to case ZERO_EXTEND */
+   }
  else if (subreg_lowpart_p (op0)
   && GET_MODE_CLASS (mode) == MODE_INT
   && is_int_mode (GET_MODE (SUBREG_REG (op0)), _mode)
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr113010.c 
b/gcc/testsuite/gcc.c-torture/execute/pr113010.c
new file mode 100644
index 000..a95c613c1df
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr113010.c
@@ -0,0 +1,9 @@
+int minus_1 = -1;
+
+int
+main ()
+{
+  if ((0, 0xul) >= minus_1)
+__builtin_abort ();
+  return 0;
+}
-- 
2.34.1



[PATCH] tree-profile: Don't instrument an IFUNC resolver nor its callees

2024-02-26 Thread H.J. Lu
We can't instrument an IFUNC resolver nor its callees as it may require
TLS which hasn't been set up yet when the dynamic linker is resolving
IFUNC symbols.  Add an IFUNC resolver caller marker to symtab_node to
avoid recursive checking.

gcc/ChangeLog:

PR tree-optimization/114115
* cgraph.h (enum ifunc_caller): New.
(symtab_node): Add has_ifunc_caller.
* tree-profile.cc (check_ifunc_resolver): New.
(is_caller_ifunc_resolver): Likewise.
(tree_profiling): Don't instrument an IFUNC resolver nor its
callees.

gcc/testsuite/ChangeLog:

PR tree-optimization/114115
* gcc.dg/pr114115.c: New test.
---
 gcc/cgraph.h| 18 +++
 gcc/testsuite/gcc.dg/pr114115.c | 24 +
 gcc/tree-profile.cc | 92 +
 3 files changed, 134 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pr114115.c

diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 47f35e8078d..ce99f4a5114 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -100,6 +100,21 @@ enum symbol_partitioning_class
SYMBOL_DUPLICATE
 };
 
+/* Classification whether a function has any IFUNC resolver caller.  */
+enum ifunc_caller
+{
+  /* It is unknown if this function has any IFUNC resolver caller.  */
+  IFUNC_CALLER_UNKNOWN,
+  /* Work in progress to check if this function has any IFUNC resolver
+ caller.  */
+  IFUNC_CALLER_WIP,
+  /* This function has at least an IFUNC resolver caller, including
+ itself.  */
+  IFUNC_CALLER_TRUE,
+  /* This function doesn't have any IFUNC resolver caller.  */
+  IFUNC_CALLER_FALSE
+};
+
 /* Base of all entries in the symbol table.
The symtab_node is inherited by cgraph and varpol nodes.  */
 struct GTY((desc ("%h.type"), tag ("SYMTAB_SYMBOL"),
@@ -121,6 +136,7 @@ public:
   used_from_other_partition (false), in_other_partition (false),
   address_taken (false), in_init_priority_hash (false),
   need_lto_streaming (false), offloadable (false), ifunc_resolver (false),
+  has_ifunc_caller (IFUNC_CALLER_UNKNOWN),
   order (false), next_sharing_asm_name (NULL),
   previous_sharing_asm_name (NULL), same_comdat_group (NULL), ref_list (),
   alias_target (NULL), lto_file_data (NULL), aux (NULL),
@@ -595,6 +611,8 @@ public:
   /* Set when symbol is an IFUNC resolver.  */
   unsigned ifunc_resolver : 1;
 
+  /* Classification whether a function has any IFUNC resolver caller.  */
+  ENUM_BITFIELD (ifunc_caller) has_ifunc_caller : 2;
 
   /* Ordering of all symtab entries.  */
   int order;
diff --git a/gcc/testsuite/gcc.dg/pr114115.c b/gcc/testsuite/gcc.dg/pr114115.c
new file mode 100644
index 000..2629f591877
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr114115.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -fprofile-generate -fdump-tree-optimized" } */
+/* { dg-require-profiling "-fprofile-generate" } */
+/* { dg-require-ifunc "" } */
+
+void *foo_ifunc2() __attribute__((ifunc("foo_resolver")));
+
+void bar(void)
+{
+}
+
+static int f3()
+{
+  bar ();
+  return 5;
+}
+
+void (*foo_resolver(void))(void)
+{
+  f3();
+  return bar;
+}
+
+/* { dg-final { scan-tree-dump-not "__gcov_indirect_call_profiler_v" 
"optimized" } } */
diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc
index aed13e2b1bc..46478648b32 100644
--- a/gcc/tree-profile.cc
+++ b/gcc/tree-profile.cc
@@ -738,6 +738,72 @@ include_source_file_for_profile (const char *filename)
   return false;
 }
 
+/* Return true and set *DATA to true if NODE is an ifunc resolver.  */
+
+static bool
+check_ifunc_resolver (cgraph_node *node, void *data)
+{
+  if (node->ifunc_resolver)
+{
+  bool *is_ifunc_resolver = (bool *) data;
+  *is_ifunc_resolver = true;
+  return true;
+}
+  return false;
+}
+
+/* Return true if any caller of NODE is an ifunc resolver.  */
+
+static bool
+is_caller_ifunc_resolver (cgraph_node *node)
+{
+  if (node->has_ifunc_caller == IFUNC_CALLER_WIP)
+gcc_unreachable ();
+
+  node->has_ifunc_caller = IFUNC_CALLER_WIP;
+  bool is_ifunc_resolver = false;
+
+  for (cgraph_edge *e = node->callers; e; e = e->next_caller)
+{
+  /* Check for recursive call.  */
+  if (e->caller == node)
+   continue;
+
+  switch (e->caller->has_ifunc_caller)
+   {
+   case IFUNC_CALLER_UNKNOWN:
+ e->caller->call_for_symbol_and_aliases (check_ifunc_resolver,
+ _ifunc_resolver,
+ true);
+ if (is_ifunc_resolver)
+   {
+ e->caller->has_ifunc_caller = IFUNC_CALLER_TRUE;
+ return true;
+   }
+ break;
+   case IFUNC_CALLER_TRUE:
+ return true;
+   case IFUNC_CALLER_FALSE:
+ /* This caller doesn't have any IFUNC resolver call.  Check
+the next caller.  */
+ continue;
+
+   case IFUNC_CALLER_WIP:
+ continue;
+   }
+
+  if 

[r14-9173 Regression] FAIL: gcc.dg/tree-ssa/andnot-2.c scan-tree-dump-not forwprop3 "_expr" on Linux/x86_64

2024-02-26 Thread haochen.jiang
On Linux/x86_64,

af66ad89e8169f44db723813662917cf4cbb78fc is the first bad commit
commit af66ad89e8169f44db723813662917cf4cbb78fc
Author: Richard Biener 
Date:   Fri Feb 23 16:06:05 2024 +0100

middle-end/114070 - folding breaking VEC_COND expansion

caused

FAIL: gcc.dg/tree-ssa/andnot-2.c scan-tree-dump-not forwprop3 "_expr"

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-9173/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/andnot-2.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/andnot-2.c 
--target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at haochen dot jiang at intel.com.)
(If you met problems with cascadelake related, disabling AVX512F in command 
line might save that.)
(However, please make sure that there is no potential problems with AVX512.)


[patch,avr,applied] Tag optimization options as such.

2024-02-26 Thread Georg-Johann Lay

Some avr options were missing the "Optimization" flag, which
is added by this patch.

Johann

--

AVR: Tag optimization options as "Optimization".

Some options that are pure optimizations where not tagged as such.

gcc/
* config/avr/avr.opt (mcall-prologues, mrelax, maccumulate-args)
(mstrict-X): Tag as "Optimization".

diff --git a/gcc/config/avr/avr.opt b/gcc/config/avr/avr.opt
index c9f2b4d2fe5..76530fd0f71 100644
--- a/gcc/config/avr/avr.opt
+++ b/gcc/config/avr/avr.opt
@@ -19,7 +19,7 @@
 ; .

 mcall-prologues
-Target Mask(CALL_PROLOGUES)
+Target Mask(CALL_PROLOGUES) Optimization
 Use subroutines for function prologues and epilogues.

 mmcu=
@@ -79,7 +79,7 @@ Target Mask(TINY_STACK)
 Change only the low 8 bits of the stack pointer.

 mrelax
-Target
+Target Optimization
 Relax branches.

 mpmem-wrap-around
@@ -87,11 +87,11 @@ Target
 Make the linker relaxation machine assume that a program counter 
wrap-around occurs.


 maccumulate-args
-Target Mask(ACCUMULATE_OUTGOING_ARGS)
+Target Mask(ACCUMULATE_OUTGOING_ARGS) Optimization
 Accumulate outgoing function arguments and acquire/release the needed 
stack space for outgoing function arguments in function 
prologue/epilogue.  Without this option, outgoing arguments are pushed 
before calling a function and popped afterwards.  This option can lead 
to reduced code size for functions that call many functions that get 
their arguments on the stack like, for example printf.


 mstrict-X
-Target Var(avr_strict_X) Init(0)
+Target Var(avr_strict_X) Init(0) Optimization
 When accessing RAM, use X as imposed by the hardware, i.e. just use 
pre-decrement, post-increment and indirect addressing with the X 
register.  Without this option, the compiler may assume that there is an 
addressing mode X+const similar to Y+const and Z+const and emit 
instructions to emulate such an addressing mode for X.


 mflmap


[PATCH] c++/modules: complete_vars ICE with non-exported constexpr var

2024-02-26 Thread Patrick Palka
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

-- >8 --

Here during stream-in of the non-exported constexpr var 'a' we call
maybe_register_incomplete_var, which ends up taking the second branch
and pushing {a, NULL_TREE} onto incomplete_vars.  We later ICE from
complete_vars due to this NULL_TREE class context.

Judging by the two commits that introduced/modified this part of
maybe_register_incomplete_var, r196852 and r214333, ISTM this branch
is really only concerned with constexpr static data members (whose
initializer may contain a pointer-to-member for a currently open class).
So this patch restricts this branch accordingly so it's not inadvertently
taken during stream-in.

gcc/cp/ChangeLog:

* decl.cc (maybe_register_incomplete_var): Restrict second
branch to static data members from a currently open class.

gcc/testsuite/ChangeLog:

* g++.dg/modules/cexpr-4_a.C: New test.
* g++.dg/modules/cexpr-4_b.C: New test.
---
 gcc/cp/decl.cc   |  2 ++
 gcc/testsuite/g++.dg/modules/cexpr-4_a.C | 12 
 gcc/testsuite/g++.dg/modules/cexpr-4_b.C |  6 ++
 3 files changed, 20 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/modules/cexpr-4_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/cexpr-4_b.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index e47f694e4e5..82b5bc83927 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -18976,6 +18976,8 @@ maybe_register_incomplete_var (tree var)
  vec_safe_push (incomplete_vars, iv);
}
   else if (!(DECL_LANG_SPECIFIC (var) && DECL_TEMPLATE_INFO (var))
+  && DECL_CLASS_SCOPE_P (var)
+  && currently_open_class (DECL_CONTEXT (var))
   && decl_constant_var_p (var)
   && (TYPE_PTRMEM_P (inner_type) || CLASS_TYPE_P (inner_type)))
{
diff --git a/gcc/testsuite/g++.dg/modules/cexpr-4_a.C 
b/gcc/testsuite/g++.dg/modules/cexpr-4_a.C
new file mode 100644
index 000..cec2e351898
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/cexpr-4_a.C
@@ -0,0 +1,12 @@
+// { dg-additional-options "-fmodules-ts" }
+export module Cexpr4;
+// { dg-module-cmi "Cexpr4" }
+
+struct A { int v = 42; };
+
+constexpr A a;
+
+export
+inline int f() {
+  return a.v;
+}
diff --git a/gcc/testsuite/g++.dg/modules/cexpr-4_b.C 
b/gcc/testsuite/g++.dg/modules/cexpr-4_b.C
new file mode 100644
index 000..6fbe652bf22
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/cexpr-4_b.C
@@ -0,0 +1,6 @@
+// { dg-additional-options "-fmodules-ts" }
+import Cexpr4;
+
+int w = f();
+
+struct A { };
-- 
2.44.0.rc1.15.g4fc51f00ef



[PATCH] c++/modules: befriending template from current class scope

2024-02-26 Thread Patrick Palka
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk?

-- >8 --

Here the TEMPLATE_DECL representing the template friend declaration for
B has class scope since B has class scope, but get_merge_kind assumes
all DECL_UNINSTANTIATED_TEMPLATE_FRIEND_P TEMPLATE_DECL have namespace
scope and wrongly returns MK_named instead of MK_local_friend.

gcc/cp/ChangeLog:

* module.cc (trees_out::get_merge_kind) :
Accomodate class-scope DECL_UNINSTANTIATED_TEMPLATE_FRIEND_P
TEMPLATE_DECL.  Merge IDENTIFIER_ANON_P branches.

gcc/testsuite/ChangeLog:

* g++.dg/modules/friend-7.h: New test.
* g++.dg/modules/friend-7_a.H: New test.
* g++.dg/modules/friend-7_b.C: New test.
---
 gcc/cp/module.cc  | 19 +--
 gcc/testsuite/g++.dg/modules/friend-7.h   |  5 +
 gcc/testsuite/g++.dg/modules/friend-7_a.H |  3 +++
 gcc/testsuite/g++.dg/modules/friend-7_b.C |  5 +
 4 files changed, 22 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/friend-7.h
 create mode 100644 gcc/testsuite/g++.dg/modules/friend-7_a.H
 create mode 100644 gcc/testsuite/g++.dg/modules/friend-7_b.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 106af7bdb3e..fa91c6ff9cb 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -10491,21 +10491,20 @@ trees_out::get_merge_kind (tree decl, depset *dep)
break;
  }
 
-   if (RECORD_OR_UNION_TYPE_P (ctx))
+   if (TREE_CODE (decl) == TEMPLATE_DECL
+   && DECL_UNINSTANTIATED_TEMPLATE_FRIEND_P (decl))
  {
-   if (IDENTIFIER_ANON_P (DECL_NAME (decl)))
- mk = MK_field;
+   mk = MK_local_friend;
break;
  }
 
-   if (TREE_CODE (decl) == TEMPLATE_DECL
-   && DECL_UNINSTANTIATED_TEMPLATE_FRIEND_P (decl))
- mk = MK_local_friend;
-   else if (IDENTIFIER_ANON_P (DECL_NAME (decl)))
+   if (IDENTIFIER_ANON_P (DECL_NAME (decl)))
  {
-   if (DECL_IMPLICIT_TYPEDEF_P (decl)
-   && UNSCOPED_ENUM_P (TREE_TYPE (decl))
-   && TYPE_VALUES (TREE_TYPE (decl)))
+   if (RECORD_OR_UNION_TYPE_P (ctx))
+ mk = MK_field;
+   else if (DECL_IMPLICIT_TYPEDEF_P (decl)
+&& UNSCOPED_ENUM_P (TREE_TYPE (decl))
+&& TYPE_VALUES (TREE_TYPE (decl)))
  /* Keyed by first enum value, and underlying type.  */
  mk = MK_enum;
else
diff --git a/gcc/testsuite/g++.dg/modules/friend-7.h 
b/gcc/testsuite/g++.dg/modules/friend-7.h
new file mode 100644
index 000..c0f00394f3b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/friend-7.h
@@ -0,0 +1,5 @@
+template
+struct A {
+  template struct B { };
+  template friend struct B;
+};
diff --git a/gcc/testsuite/g++.dg/modules/friend-7_a.H 
b/gcc/testsuite/g++.dg/modules/friend-7_a.H
new file mode 100644
index 000..e750e4c7d8d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/friend-7_a.H
@@ -0,0 +1,3 @@
+// { dg-additional-options "-fmodule-header" }
+// { dg-module-cmi {} }
+#include "friend-7.h"
diff --git a/gcc/testsuite/g++.dg/modules/friend-7_b.C 
b/gcc/testsuite/g++.dg/modules/friend-7_b.C
new file mode 100644
index 000..d90b685d89d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/friend-7_b.C
@@ -0,0 +1,5 @@
+// { dg-additional-options "-fmodules-ts" }
+#include "friend-7.h"
+import "friend-7_a.H";
+
+A a;
-- 
2.44.0.rc1.15.g4fc51f00ef



[PATCH V2] rs6000: Don't allow immediate value in the vsx_splat pattern [PR113950]

2024-02-26 Thread jeevitha
Hi All,

The following patch has been bootstrapped and regtested on powerpc64le-linux.

There is no immediate value splatting instruction in Power. Currently, those
values need to be stored in a register or memory. To address this issue, I
have updated the predicate for the second operand in vsx_splat to
splat_input_operand and corrected the assignment of op1 to operands[1].
These changes ensure that operand1 is stored in a register.

2024-02-26  Jeevitha Palanisamy  

gcc/
PR target/113950
* config/rs6000/vsx.md (vsx_splat_): Updated the predicates
for second operand and corrected the assignment.

gcc/testsuite/
PR target/113950
* gcc.target/powerpc/pr113950.c: New testcase.

diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 6111cc90eb7..3e2df247630 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -4660,14 +4660,14 @@
 (define_expand "vsx_splat_"
   [(set (match_operand:VSX_D 0 "vsx_register_operand")
(vec_duplicate:VSX_D
-(match_operand: 1 "input_operand")))]
+(match_operand: 1 "splat_input_operand")))]
   "VECTOR_MEM_VSX_P (mode)"
 {
   rtx op1 = operands[1];
   if (MEM_P (op1))
 operands[1] = rs6000_force_indexed_or_indirect_mem (op1);
   else if (!REG_P (op1))
-op1 = force_reg (mode, op1);
+operands[1] = force_reg (mode, op1);
 })
 
 (define_insn "vsx_splat__reg"
diff --git a/gcc/testsuite/gcc.target/powerpc/pr113950.c 
b/gcc/testsuite/gcc.target/powerpc/pr113950.c
new file mode 100644
index 000..5c6865a8544
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr113950.c
@@ -0,0 +1,25 @@
+/* PR target/113950 */
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O1 -mdejagnu-cpu=power7" } */
+
+/* Verify we do not ICE on the following.  */
+
+void abort (void);
+
+int main ()
+{
+  int i;
+  vector signed long long vsll_result, vsll_expected_result;
+  signed long long sll_arg1;
+
+  sll_arg1 = 300;
+  vsll_expected_result = (vector signed long long) {300, 300};
+  vsll_result = __builtin_vsx_splat_2di (sll_arg1);  
+
+  for (i = 0; i < 2; i++)
+if (vsll_result[i] != vsll_expected_result[i])
+  abort();
+
+  return 0;
+}




[avr,patch,applied] Remove some dead code

2024-02-26 Thread Georg-Johann Lay

This code was dead in the block where it lived,
because avr_adiw_reg_p() is only true when ADIW and SBIW
are available -- which is not the case for AVR_TINY.

Johann

--

AVR: Dead code removal.

gcc/
* config/avr/avr.cc (avr_out_compare) [AVR_TINY]: Remove code in
an "if avr_adiw_reg_p()" block that's dead for AVR_TINY.

diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc
index d3756a2f036..655a8e89fdc 100644
--- a/gcc/config/avr/avr.cc
+++ b/gcc/config/avr/avr.cc
@@ -6291,10 +6291,7 @@ avr_out_compare (rtx_insn *insn, rtx *xop, int *plen)
  && (val8 == 0
  || reg_unused_after (insn, xreg)))
{
- if (AVR_TINY)
-   avr_asm_len (TINY_SBIW (%A0, %B0, %1), xop, plen, 2);
- else
-   avr_asm_len ("sbiw %0,%1", xop, plen, 1);
+ avr_asm_len ("sbiw %0,%1", xop, plen, 1);

  i++;
  continue;
@@ -6305,9 +6302,7 @@ avr_out_compare (rtx_insn *insn, rtx *xop, int *plen)
  && compare_eq_p (insn)
  && reg_unused_after (insn, xreg))
{
- return AVR_TINY
-   ? avr_asm_len (TINY_ADIW (%A0, %B0, %n1), xop, plen, 2)
-   : avr_asm_len ("adiw %0,%n1", xop, plen, 1);
+ return avr_asm_len ("adiw %0,%n1", xop, plen, 1);
}
}



Re: [PATCH v5 RESEND] C, ObjC: Add -Wunterminated-string-initialization

2024-02-26 Thread Sandra Loosemore

On 2/26/24 09:19, Joseph Myers wrote:

On Mon, 26 Feb 2024, Alejandro Colomar wrote:


The idea seems reasonable, but the patch needs documentation for the new
option in invoke.texi.


Thanks!  Will do.

I don't see an obvious order in that file.  Where would you put the
option?  Do you want me to sort(1) it first, and then insert the new
option in alphabetic order?


Sandra might have a better idea of how the warning options ought to be
ordered in the manual.  (Obviously, don't mix reordering with any content
changes.)


Please don't blindly sort the options sections of the manual.  They've 
typically been organized to present general options first (things like 
-Wpedantic in the warning options section, or -g or -o in the debugging 
and optimization optimization sections), then more specialized flags for 
controlling specific individual behaviors.  We could do a better job of 
keeping the latter sublists alphabetized, but in some cases we have also 
grouped documentation for related options together even if they are not 
alphabetical, but that needs some manual review and decision-making and 
should not be mixed with adding new content.  We do try to keep the 
lists in the "Option Summary" section alphabetized except for the 
general options listed first, though.


As far as where to put documentation for new options, I'd say that you 
should put it in alphabetical order in the appropriate section, unless 
there is good reason to put it somewhere else, or the alphabetization of 
the whole section is so messed-up you can't figure out where it should 
go (putting it at the end is OK in that case, as opposed to some other 
random location in the list).  Remember to add the matching entry in the 
appropriate "Option Summary" list, and for this option IIUC you also 
need to add it to the lists of other options enabled by -Wextra and 
-Wc++-compat.


Are we still accepting patches to add new options in stage 4, or is this 
a stage 1 item?


-Sandra


Re: [wwwdocs] Add Ada's GCC 14 changelog entry

2024-02-26 Thread Fernando Oleo Blanco
Hi Mark,

On 2/26/24 10:17, Marc Poulhiès wrote:
> 
> Fernando,
> 
> Thank you for this work! I have a few comments, see below.
> 
> diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
> index 85ccc54d..e6c96c9f 100644
> --- a/htdocs/gcc-14/changes.html
> +++ b/htdocs/gcc-14/changes.html
> @@ -171,7 +171,49 @@ a work-in-progress.
>   
>   New Languages and Language specific improvements
> 
> -
> +Ada
> +
> +
> +  Several new aspects and contracts have been implemented:
> 
> Maybe worth noting that these are implementation defined aspects.

Noted

> 
> +
> +  Exceptional_Cases may be specified for procedures and

[...]
> +  Side_Effecs.
> +  Always_Terminates is a boolean equivalent to 
> pragma
> +  Always_Terminates
> +  Ghost_Predicate
> 
> It looks like Ghost_Predicate is missing some text here.
> 
> It may be a good thing to link to the actual documentation for these
> options. Thanks to some documention changes, we can now link to
> an option directly. For example:
> 
> https://gcc.gnu.org/onlinedocs/gnat_rm/Implementation-Defined-Pragmas.html

Added

> 
> You would need to point to the correct version (this one points to
> current devel version).
> 

Done

> +
> +  
> +  The new attributes and contracts have been applied to the relevant 
> parts
[...]
> +  -fharden-control-flow-redundancy.
> +  Custom bools with higher Hamming distance.
> +  The strub attribute has been added for functions and
> 
> Same as above for doc links:
> 
> https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html#index-fharden-compares

Done

> 
> +  variables in order to automatically zero-out their stack upon use or
> +  return.
> +
> +  
> +  Further clean up and improvements to the GNAT code.
> +  Support for vxWorks 7 Cert RTP has been removed.
> +
> 
>   

I have applied your recommendations. The documentation links are still 
not up... Nonetheless, I created the URL in such a way that they should 
work once the final documentation is given a release number (which I 
guessed to be 14.1.0). If you think this can be improved just say so. 
Nonetheless, feel free to modify my patch if you see it fit.

Best regards,
FerFrom 0ae94649be7f638bb4f98ba3e2ba2e1bf9770c09 Mon Sep 17 00:00:00 2001
From: Fernando Oleo Blanco 
Date: Sun, 25 Feb 2024 21:43:43 +0100
Subject: [PATCH 1/2] Add Ada changes for v14

---
 htdocs/gcc-14/changes.html | 44 +-
 1 file changed, 43 insertions(+), 1 deletion(-)

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index 85ccc54d..e6c96c9f 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -171,7 +171,49 @@ a work-in-progress.
 
 New Languages and Language specific improvements
 
-
+Ada
+
+
+  Several new aspects and contracts have been implemented:
+
+  Exceptional_Cases may be specified for procedures and
+  functions with side effects; it can be used to list exceptions that might
+  be propagated by the subprogram with side effects in the context of its
+  precondition, and associate them with a specific postcondition. For more
+  information, refer to SPARK 2014 Reference Manual, section 6.1.9.
+  User_Aspect takes an argument that is the name of an
+  aspect defined by a User_Aspect_Definition configuration pragma.
+  Local_Restrictions is used to specify that a particular
+  subprogram does not violate one or more local restrictions, nor can it
+  call a subprogram that is not subject to the same requirements.
+  Side_Effects is equivalent to pragma
+  Side_Effecs.
+  Always_Terminates is a boolean equivalent to pragma
+  Always_Terminates
+  Ghost_Predicate
+
+  
+  The new attributes and contracts have been applied to the relevant parts
+of the Ada library and more code has been proven to be correct.
+  Initial support for the
+  https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/;>CHERI
+  architecture.
+  Support for the LoongArch architecture.
+  Hardening improvements:
+
+  Use of the new -fharden* options. Most
+  notably -fharden-compares,
+  -fharden-conditional-branches and
+  -fharden-control-flow-redundancy.
+  Custom bools with higher Hamming distance.
+  The strub attribute has been added for functions and
+  variables in order to automatically zero-out their stack upon use or
+  return.
+
+  
+  Further clean up and improvements to the GNAT code.
+  Support for vxWorks 7 Cert RTP has been removed.
+
 
 
 
-- 
2.43.2


From ddacac13df6df1527b2368818fe6e9f0cc34028e Mon Sep 17 00:00:00 2001
From: Fernando Oleo Blanco 
Date: Mon, 26 Feb 2024 20:26:55 +0100
Subject: [PATCH 2/2] Add more information to the Ada changes

---
 htdocs/gcc-14/changes.html | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index e6c96c9f..260d04f5 100644
--- 

Re: [PATCH v5 RESEND] C, ObjC: Add -Wunterminated-string-initialization

2024-02-26 Thread Mike Stump
On Feb 26, 2024, at 7:56 AM, Alejandro Colomar  wrote:
> 
> I don't see an obvious order in that file.  Where would you put the
> option?

The best place, would be to put it just after:

-Warray-bounds
-Warray-bounds=n

This is a functional style grouping that best mirrors the existing ordering.  A 
runner up would be next to the string options or near a "terminating NUL", but 
after review, these don't seem more suitable.

> Do you want me to sort(1) it first, and then insert the new
> option in alphabetic order?

No.  Let others play at that.

[PATCH v1 05/13] Reuse MinGW from i386 for AArch64

2024-02-26 Thread Evgeny Karpov
Thank you for the fix, Bernhard! Please send it as a separate patch.

Regards,
Evgeny

-Original Message-
Sent: Monday, February 26, 2024 9:18 AM
Bernhard Reutner-Fischer 

PS: Please don't forget to add an entry to contrib/config-list.mk thanks


[PATCH v1 04/13] aarch64: Add aarch64-w64-mingw32 COFF

2024-02-26 Thread Evgeny Karpov
Thanks for noticing this definition.
Yes, it was added to enable proper types in mingw/mingw-stdint.h for AArch64.
Based on the review, TARGET_64BIT has been excluded from 
aarch64/aarch64-coff.h, and 
mingw/mingw-stdint.h has been modified to support AArch64.

Regards,
Evgeny


gcc/config/mingw/mingw-stdint.h
@@ -46,5 +46,10 @@

-#define INTPTR_TYPE (TARGET_64BIT ? "long long int" : "int")
-#define UINTPTR_TYPE (TARGET_64BIT ? "long long unsigned int" : "unsigned int")
+#if defined (TARGET_AARCH64_MS_ABI)
+# define INTPTR_TYPE "long long int"
+# define UINTPTR_TYPE "long long unsigned int"
+#else
+# define INTPTR_TYPE (TARGET_64BIT ? "long long int" : "int")
+# define UINTPTR_TYPE (TARGET_64BIT ? "long long unsigned int" : "unsigned 
int")
+#endif


-Original Message-
Friday, February 23, 2024 6:02 PM 
Richard Sandiford wrote:


The only surprising thing here to me was:

> +
> +#define TARGET_64BIT 1

...this.  Does some code that is shared between x86 and aarch64 rely on this 
definition?  It might be worth identifying the code in a comment if so.

Thanks,
Richard



Re: [PATCH] c++: Fix explicit instantiation of const variable templates after earlier implicit instantation [PR113976]

2024-02-26 Thread Patrick Palka
On Tue, 20 Feb 2024, Jakub Jelinek wrote:

> Hi!
> 
> Already previously instantiated const variable templates had
> cp_apply_type_quals_to_decl called when they were instantiated,
> but if they need runtime initialization, their TREE_READONLY flag
> has been subsequently cleared.
> Explicit variable template instantiation calls grokdeclarator which
> calls cp_apply_type_quals_to_decl on them again, setting TREE_READONLY
> flag again, but nothing clears it afterwards, so we emit such
> instantiations into rodata sections and segfault when the dynamic
> initialization attempts to initialize them.
> 
> The following patch fixes that by not calling cp_apply_type_quals_to_decl
> on already instantiated variable declarations.

LGTM, this seems like the safest approach for backporting.  Note
we can't check DECL_EXPLICIT_INSTANTIATION at this point because
that doesn't get set until later from do_decl_instantiation.

> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2024-02-19  Jakub Jelinek  
>   Patrick Palka  
> 
>   PR c++/113976
>   * decl.cc (grokdeclarator): Don't call cp_apply_type_quals_to_decl
>   on DECL_TEMPLATE_INSTANTIATED VAR_DECLs.
> 
>   * g++.dg/cpp1y/var-templ87.C: New test.
> 
> --- gcc/cp/decl.cc.jj 2024-02-15 09:51:34.460065992 +0100
> +++ gcc/cp/decl.cc2024-02-19 19:18:09.839188137 +0100
> @@ -15263,7 +15263,12 @@ grokdeclarator (const cp_declarator *dec
>  /* Record constancy and volatility on the DECL itself .  There's
> no need to do this when processing a template; we'll do this
> for the instantiated declaration based on the type of DECL.  */
> -if (!processing_template_decl)
> +if (!processing_template_decl
> + /* Don't do it for instantiated variable templates either,
> +cp_apply_type_quals_to_decl should have been called on it
> +already and might have been overridden in cp_finish_decl
> +if initializer needs runtime initialization.  */
> + && (!VAR_P (decl) || !DECL_TEMPLATE_INSTANTIATED (decl)))
>cp_apply_type_quals_to_decl (type_quals, decl);
>  
>  return decl;
> --- gcc/testsuite/g++.dg/cpp1y/var-templ87.C.jj   2024-02-19 
> 19:21:49.668129195 +0100
> +++ gcc/testsuite/g++.dg/cpp1y/var-templ87.C  2024-02-19 19:21:42.218232862 
> +0100
> @@ -0,0 +1,43 @@
> +// PR c++/113976
> +// { dg-do run { target c++14 } }
> +
> +int
> +foo ()
> +{
> +  return 42;
> +}
> +
> +template 
> +const int a = foo ();
> +const int *b =  <0>;
> +template 
> +const int c = foo ();
> +template const int c <0>;
> +template 
> +const int d = foo ();
> +const int *e =  <0>;
> +template const int d <0>;
> +template 
> +const int f = foo ();
> +template const int f <0>;
> +const int *g =  <0>;
> +struct S { int a, b; };
> +template 
> +const S h = { 42, foo () };
> +const S *i =  <0>;
> +template 
> +const S j =  { 42, foo () };
> +template const S j <0>;
> +template 
> +const S k =  { 42, foo () };
> +const S *l =  <0>;
> +template const S k <0>;
> +template 
> +const S m =  { 42, foo () };
> +template const S m <0>;
> +const S *n =  <0>;
> +
> +int
> +main ()
> +{
> +}
> 
>   Jakub
> 
> 



[PATCH v1 03/13] aarch64: Mark x18 register as a fixed register for MS ABI

2024-02-26 Thread Evgeny Karpov
In user mode on Windows, it points to TEB (Thread Environment Block).
more information here
https://learn.microsoft.com/en-us/cpp/build/arm64-windows-abi-conventions?view=msvc-170#integer-registers
https://learn.microsoft.com/en-us/windows/win32/api/winternl/ns-winternl-teb

Regards,
Evgeny

-Original Message-
Friday, February 23, 2024 5:56 PM
Richard Sandiford wrote:

How does the MS ABI use this register?  Same question for Darwin I suppose.

Thanks,
Richard


Re: [PATCH] c++: Revert deferring emission of inline variables [PR114013]

2024-02-26 Thread Patrick Palka
On Wed, 21 Feb 2024, Nathaniel Shead wrote:

> My earlier patch appears to have caused some regressions. I've taken a
> quick look to see if there are obvious workarounds, but given the time
> frame and the fact that I still don't really understand all the details
> of how and when symbols get emitted, I felt it was safer to revert the
> non-modules parts of this change instead.
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?
> 
> -- >8 --
> 
> This is a (partial) reversion of r14-8987-gdd9d14f7d53 to return to
> eagerly emitting inline variables to the middle-end when they are
> declared. 'import_export_decl' will still continue to accept them, as
> allowing this is a pure extension and doesn't seem to cause issues with
> modules, but otherwise deferring the emission of inline variables
> appears to cause issues on some targets and prevents some code using
> inline variable templates from correctly linking.
> 
> There might be a more targetted way to fix this, but due to the
> complexity of handling linkage I'd prefer to wait till GCC 15 to explore
> our options.

Seems reasonable to me.

Aside from reverting, I suppose a couple of other options (which you've
probably already considered) are to restrict that defer_p condition
to non-templates by checking !DECL_USE_TEMPLATE (so in particular
explicit specializations aren't deferred), or to call
note_vague_linkage_variable for inline explicit specializations (e.g.
from make_rtl_for_nonlocal_decl upon deferring, or from
check_explicit_specialization).  These should help with PR114013, no
idea about PR113970 though..

> 
>   PR c++/113970
>   PR c++/114013
> 
> gcc/cp/ChangeLog:
> 
>   * decl.cc (make_rtl_for_nonlocal_decl): Don't defer inline
>   variables.
>   * decl2.cc (import_export_decl): Only support inline variables
>   imported from a module.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/cpp1z/inline-var10.C: New test.
> ---
>  gcc/cp/decl.cc|  4 ---
>  gcc/cp/decl2.cc   |  6 +++--
>  gcc/testsuite/g++.dg/cpp1z/inline-var10.C | 33 +++
>  3 files changed, 37 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp1z/inline-var10.C
> 
> diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
> index e47f694e4e5..d19d09adde4 100644
> --- a/gcc/cp/decl.cc
> +++ b/gcc/cp/decl.cc
> @@ -7954,10 +7954,6 @@ make_rtl_for_nonlocal_decl (tree decl, tree init, 
> const char* asmspec)
>&& DECL_IMPLICIT_INSTANTIATION (decl))
>  defer_p = 1;
>  
> -  /* Defer vague-linkage variables.  */
> -  if (DECL_INLINE_VAR_P (decl))
> -defer_p = 1;
> -
>/* If we're not deferring, go ahead and assemble the variable.  */
>if (!defer_p)
>  rest_of_decl_compilation (decl, toplev, at_eof);
> diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
> index 1dddbaab38b..24a9332ccb1 100644
> --- a/gcc/cp/decl2.cc
> +++ b/gcc/cp/decl2.cc
> @@ -3362,7 +3362,7 @@ import_export_decl (tree decl)
>  
>   * inline functions
>  
> - * inline variables
> + * inline variables (from modules)
>  
>   * implicit instantiations of static data members of class
> templates
> @@ -3385,7 +3385,9 @@ import_export_decl (tree decl)
>   || DECL_DECLARED_INLINE_P (decl));
>else
>  gcc_assert (DECL_IMPLICIT_INSTANTIATION (decl)
> - || DECL_INLINE_VAR_P (decl)
> + || (DECL_INLINE_VAR_P (decl)
> + && DECL_LANG_SPECIFIC (decl)
> + && DECL_MODULE_IMPORT_P (decl))
>   || DECL_VTABLE_OR_VTT_P (decl)
>   || DECL_TINFO_P (decl));
>/* Check that a definition of DECL is available in this translation
> diff --git a/gcc/testsuite/g++.dg/cpp1z/inline-var10.C 
> b/gcc/testsuite/g++.dg/cpp1z/inline-var10.C
> new file mode 100644
> index 000..8a198556778
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp1z/inline-var10.C
> @@ -0,0 +1,33 @@
> +// PR c++/114013
> +// { dg-do link { target c++17 } }
> +
> +struct S { int a, b; };
> +
> +template 
> +constexpr struct S var[8] = {};
> +
> +template <>
> +constexpr inline struct S var<6>[8] = {
> +  { 1, 1 }, { 2, 0 }, { 3, 1 }, { 4, 0 },
> +  { 5, 1 }, { 6, 0 }, { 7, 1 }, { 8, 0 }
> +};
> +
> +[[gnu::noipa]] void
> +foo (S)
> +{
> +}
> +
> +template 
> +void
> +bar (int x)
> +{
> +  foo (var[x]);
> +}
> +
> +volatile int x;
> +
> +int
> +main ()
> +{
> +  bar <6> (x);
> +}
> -- 
> 2.43.0
> 
> 



Re: [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE

2024-02-26 Thread Andre Vieira (lists)




On 05/02/2024 09:56, Richard Biener wrote:

On Thu, 1 Feb 2024, Andre Vieira (lists) wrote:




On 01/02/2024 07:19, Richard Biener wrote:

On Wed, 31 Jan 2024, Andre Vieira (lists) wrote:


The patch didn't come with a testcase so it's really hard to tell
what goes wrong now and how it is fixed ...


My bad! I had a testcase locally but never added it...

However... now I look at it and ran it past Richard S, the codegen isn't
'wrong', but it does have the potential to lead to some pretty slow codegen,
especially for inbranch simdclones where it transforms the SVE predicate into
an Advanced SIMD vector by inserting the elements one at a time...

An example of which can be seen if you do:

gcc -O3 -march=armv8-a+sve -msve-vector-bits=128  -fopenmp-simd t.c -S

with the following t.c:
#pragma omp declare simd simdlen(4) inbranch
int __attribute__ ((const)) fn5(int);

void fn4 (int *a, int *b, int n)
{
 for (int i = 0; i < n; ++i)
 b[i] = fn5(a[i]);
}

Now I do have to say, for our main usecase of libmvec we won't have any
'inbranch' Advanced SIMD clones, so we avoid that issue... But of course that
doesn't mean user-code will.


It seems to use SVE masks with vector(4)  and the
ABI says the mask is vector(4) int.  You say that's because we choose
a Adv SIMD clone for the SVE VLS vector code (it calls _ZGVnM4v_fn5).

The vectorizer creates

   _44 = VEC_COND_EXPR ;

and then vector lowering decomposes this.  That means the vectorizer
lacks a check that the target handles this VEC_COND_EXPR.

Of course I would expect that SVE with VLS vectors is able to
code generate this operation, so it's missing patterns in the end.

Richard.



What should we do for GCC-14? Going forward I think the right thing to 
do is to add these patterns. But I am not even going to try to do that 
right now and even though we can codegen for this, the result doesn't 
feel like it would ever be profitable which means I'd rather not 
vectorize, or well pick a different vector mode if possible.


This would be achieved with the change to the targethook. If I change 
the hook to take modes, using STMT_VINFO_VECTYPE (stmt_vinfo), is that 
OK for now?


Kind regards,
Andre


Re: [PATCH] arm: fix c23 0-named-args caller-side stdarg

2024-02-26 Thread Matthew Malcomson

Hi Alexandre,

I don't have the ability to OK the patch, but I'm attempting to do a 
review in

order to reduce the workload for any maintainer.  (Apologies for the slow
response).

I think you're right that the AAPCS32 requires all arguments to be passed in
registers for this testcase.
(Nit on the commit-message: It says that your reading of the AAPCS32 
suggests
that the *caller* is correct -- I believe based on the change you 
suggested you

meant *callee* is correct in expecting arguments in registers.)

The approach you suggest looks OK to me -- I do notice that it doesn't 
fix the

legacy ABI's of `atpcs` and `apcs` and guess it would be nicer to have them
working at the same time though would defer to maintainers on how 
important that

is.
(For the benefit of others reading) I don't believe there is any ABI concern
with this since it's fixing something that is currently not working at 
all and

only applies to c23 (so a change shouldn't have too much of an impact).

You mention you chose to make the change in the arm backend rather than 
general

code due to hesitancy to change the generic ABI-affecting code. That makes
sense to me, certainly at this late stage in the development cycle.

That said, I do expect the more robust solution would be to change a part of
the generic ABI code and would like to check that this makes sense to 
you and

anyone else upstream in the discussion (justification below).
So would suggest to the maintainers that maybe this goes in with a note to
remember to look at a possible generic code change later on.

Does that sound reasonable to you?

--
Justification for why a change to the generic ABI code would be better 
in the

end:

IIUC (from the documentation of the hooks) `pretend_outgoing_varargs_named`
really should mean that `n_named_args = num_actuals`.  That seems to be 
what the

documentation for `strict_argument_naming` indicates should happen.

``` (Documentation for TARGET_STRICT_ARGUMENT_NAMING):
If it returns 'false', but 'TARGET_PRETEND_OUTGOING_VARARGS_NAMED' returns
'true', then all arguments are treated as named.
```

Because of this I guess that if `pretend_outgoing_varargs_named` only 
means to
pretend the above *except* in the c23 0-named-args case that seems like 
it would

be a bit awkward to account for in backends.

From a quick check on c23-stdarg-4.c it does look like the below change 
ends up
with the same codegen as your patch (except in the case of those legacy 
ABI's,

where the below does make the caller and callee ABI match AFAICT):

```
  diff --git a/gcc/calls.cc b/gcc/calls.cc
  index 01f44734743..0b302f633ed 100644
  --- a/gcc/calls.cc
  +++ b/gcc/calls.cc
  @@ -2970,14 +2970,15 @@ expand_call (tree exp, rtx target, int ignore)
    we do not have any reliable way to pass unnamed args in
    registers, so we must force them into memory.  */

  -  if (type_arg_types != 0
  +  if ((type_arg_types != 0 || TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
 && targetm.calls.strict_argument_naming (args_so_far))
   ;
 else if (type_arg_types != 0
 && ! targetm.calls.pretend_outgoing_varargs_named 
(args_so_far))

   /* Don't include the last named arg.  */
   --n_named_args;
  -  else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype))
  +  else if (TYPE_NO_NAMED_ARGS_STDARG_P (funtype)
  +    && ! targetm.calls.pretend_outgoing_varargs_named (args_so_far))
   n_named_args = 0;
 else
   /* Treat all args as named.  */
```

Do you agree that this makes sense (i.e. is there something I'm 
completely missing)?


FWIW I attempted to try and find other targets which have
`strict_argument_naming` returning `false`, `pretend_outgoing_varargs_named`
returning true, and some use of the `.named` member of function 
arguments.  I

got the below list.  I recognise it doesn't mean there's a bug in these
backends, but thought it might help the discussion.
(lm32 mcore msp430 gcn cris fr30 frv h8300 arm v850 rx pru)



On 1/23/24 08:26, Alexandre Oliva wrote:

On Dec  5, 2023, Alexandre Oliva  wrote:


arm: fix c23 0-named-args caller-side stdarg

Ping?
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639472.html


The commit message doesn't name explicitly the fixed testsuite
failures.  Here they are:
FAIL: gcc.dg/c23-stdarg-4.c execution test
FAIL: gcc.dg/torture/c23-stdarg-split-1a.c   -O0  execution test
FAIL: gcc.dg/torture/c23-stdarg-split-1a.c   -O1  execution test
FAIL: gcc.dg/torture/c23-stdarg-split-1a.c   -O2  execution test
FAIL: gcc.dg/torture/c23-stdarg-split-1a.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  execution test
FAIL: gcc.dg/torture/c23-stdarg-split-1a.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  execution test
FAIL: gcc.dg/torture/c23-stdarg-split-1a.c   -O3 -g  execution test
FAIL: gcc.dg/torture/c23-stdarg-split-1a.c   -Os  execution test
Tested on arm-eabi.  Ok to install?




[PATCH v1 03/13] aarch64: Mark x18 register as a fixed register for MS ABI

2024-02-26 Thread Evgeny Karpov
Thank you Jacek for clarifying Wine's needs for ms_abi!
Work on vararg support is planned.

Regards,
Evgeny

-Original Message-
Friday, February 23, 2024 5:10 PM 
Jacek Caban wrote:

> Dynamically might be needed also if we want to support ms_abi 
> attribute and/or -mabi=ms to support the wine folks.

Wine no longer needs ms_abi, it was needed for PE-in-ELF modules in the past. 
We use use proper PE files now, so we need a cross compiler, but no special 
attributes. aarch64-w64-mingw32 is already well supported by Wine when using 
llvm-mingw, so as soon as GCC properly supports the ABI, Wine should just work 
with it, in theory. I didn't try it, but I don't see things like vararg support 
in this patchset nor in the repo, so I assume it won't work yet.


Thanks for the work!

Jacek



Re: Patch ping^2

2024-02-26 Thread Jan Hubicka
> Hi!
> 
> I'd like to ping 2 patches:
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2024-January/644580.html
>   
>   
> PR113617 P1 - Handle private COMDAT function symbol reference in readonly 
> data section  
>   
>   
>   
>   
> More details in the   
>   
>   
> https://gcc.gnu.org/pipermail/gcc-patches/2024-January/thread.html#644121 
>   
>   
> and   
>   
>   
> https://gcc.gnu.org/pipermail/gcc-patches/2024-January/thread.html#644486 
>   
>   
> threads.  
>   
>   

I went through the thread - it looks like an issue that was latent for
quite some time.  I think it belongs to cgraph/symtab maintenance, so I
think I can OK this patch.

Honza
> 
> and
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2024-February/645326.html
> i386: Enable _BitInt support on ia32
> 
> all the FAILs mentioned in that mail have been fixed by now.
> 
> Thanks
> 
>   Jakub
> 


Re: [PATCH v5 RESEND] C, ObjC: Add -Wunterminated-string-initialization

2024-02-26 Thread Joseph Myers
On Mon, 26 Feb 2024, Alejandro Colomar wrote:

> > The idea seems reasonable, but the patch needs documentation for the new 
> > option in invoke.texi.
> 
> Thanks!  Will do.
> 
> I don't see an obvious order in that file.  Where would you put the
> option?  Do you want me to sort(1) it first, and then insert the new
> option in alphabetic order?

Sandra might have a better idea of how the warning options ought to be 
ordered in the manual.  (Obviously, don't mix reordering with any content 
changes.)

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [PATCH] ARM: Fix conditional execution [PR113915]

2024-02-26 Thread Richard Earnshaw (lists)
On 26/02/2024 16:05, Wilco Dijkstra wrote:
> Hi Richard,
> 
>> Did you test this on a thumb1 target?  It seems to me that the target parts 
>> that you've
>> removed were likely related to that.  In fact, I don't see why this test 
>> would need to be changed at all.
> 
> The testcase explicitly forces a Thumb-2 target (arm_arch_v6t2). The patterns
> were wrong for Thumb-2 indeed, and the testcase was explicitly testing for 
> this.
> There is a separate builtin-bswap-2.c for Thumb-1 target (arm_arch_v6m).
> 
> Cheers,
> Wilco

That's why statements like:

* gcc.target/arm/builtin-bswap-1.c: Fix test.

are less than helpful.  Perhaps if you'd said what you actually changed that 
would have made it more obvious.

So OK, but please fix the commit message to say what you did.

R.


[PATCH v1 03/13] aarch64: Mark x18 register as a fixed register for MS ABI

2024-02-26 Thread Evgeny Karpov
This change has been refactored based on the review and will be included in v2.

Original aarch64.h will remain unchanged, and the required definition for MS 
ABI will be implemented in aarch64-abi-ms.h. The reference to this file will be 
added to config.gcc.
This change has been verified, and it appears to work correctly.

The X18 register should not be used in any case.
The call used flag should be set to 0 for the X18 register to prevent its use 
in function calls.
The same applies to the static chain. The X17 register will be used instead of 
X18.

Regards,
Evgeny


gcc/config.gcc
@@ -1263,7 +1263,8 @@

aarch64-*-mingw*)
+   tm_file="${tm_file} aarch64/aarch64-abi-ms.h"



gcc/config/aarch64/aarch64-abi-ms.h
/* Copyright */

/* X18 reserved for the TEB on Windows.  */

#undef FIXED_REGISTERS
#define FIXED_REGISTERS \
  { \
0, 0, 0, 0,   0, 0, 0, 0,   /* R0 - R7 */   \
0, 0, 0, 0,   0, 0, 0, 0,   /* R8 - R15 */  \
0, 0, 1, 0,   0, 0, 0, 0,   /* R16 - R23.  */   \
0, 0, 0, 0,   0, 1, 0, 1,   /* R24 - R30, SP */ \
0, 0, 0, 0,   0, 0, 0, 0,   /* V0 - V7 */   \
0, 0, 0, 0,   0, 0, 0, 0,   /* V8 - V15 */  \
0, 0, 0, 0,   0, 0, 0, 0,   /* V16 - V23 */ \
0, 0, 0, 0,   0, 0, 0, 0,   /* V24 - V31 */ \
1, 1, 1, 1, /* SFP, AP, CC, VG */   \
0, 0, 0, 0,   0, 0, 0, 0,   /* P0 - P7 */   \
0, 0, 0, 0,   0, 0, 0, 0,   /* P8 - P15 */  \
1, 1,   /* FFR and FFRT */  \
1, 1, 1, 1, 1, 1, 1, 1  /* Fake registers */\
  }

#undef CALL_REALLY_USED_REGISTERS
#define CALL_REALLY_USED_REGISTERS  \
  { \
1, 1, 1, 1,   1, 1, 1, 1,   /* R0 - R7 */   \
1, 1, 1, 1,   1, 1, 1, 1,   /* R8 - R15 */  \
1, 1, 0, 0,   0, 0, 0, 0,   /* R16 - R23.  */   \
0, 0, 0, 0,   0, 1, 1, 1,   /* R24 - R30, SP */ \
1, 1, 1, 1,   1, 1, 1, 1,   /* V0 - V7 */   \
0, 0, 0, 0,   0, 0, 0, 0,   /* V8 - V15 */  \
1, 1, 1, 1,   1, 1, 1, 1,   /* V16 - V23 */ \
1, 1, 1, 1,   1, 1, 1, 1,   /* V24 - V31 */ \
1, 1, 1, 0, /* SFP, AP, CC, VG */   \
1, 1, 1, 1,   1, 1, 1, 1,   /* P0 - P7 */   \
1, 1, 1, 1,   1, 1, 1, 1,   /* P8 - P15 */  \
1, 1,   /* FFR and FFRT */  \
0, 0, 0, 0, 0, 0, 0, 0  /* Fake registers */\
  }

#undef  STATIC_CHAIN_REGNUM
#define STATIC_CHAIN_REGNUM R17_REGNUM

 

-Original Message-
Thursday, February 22, 2024 12:55 PM 
Richard Earnshaw (lists) wrote:

+/* X18 reserved for the TEB on Windows.  */ #ifdef TARGET_ARM64_MS_ABI 
+# define FIXED_X18 1 # define CALL_USED_X18 0 #else # define FIXED_X18 
+0 # define CALL_USED_X18 1 #endif

I'm not overly keen on ifdefs like this (and the one below), it can get quite 
confusing if we have to support more than a couple of ABIs.  Perhaps we could 
create a couple of new headers, one for the EABI (which all existing targets 
would then need to include) and one for the MS ABI.  Then the mingw port would 
use that instead of the EABI header.

An alternative is to make all this dynamic, based on the setting of the 
aarch64_calling_abi enum and to make the adjustments in 
aarch64_conditional_register_usage.

+# define CALL_USED_X18 0

Is that really correct?  If the register is really reserved, but some code 
modifies it anyway, this will cause the compiler to restore the old value at 
the end of a function; generally, for a reserved register, code that knows what 
it's doing would want to make permanent changes to this value.

+#ifdef TARGET_ARM64_MS_ABI
+# define STATIC_CHAIN_REGNUM   R17_REGNUM
+#else
+# define STATIC_CHAIN_REGNUM   R18_REGNUM
+#endif

If we went the enum way, we'd want something like

#define STATIC_CHAIN_REGNUM (calling_abi == AARCH64_CALLING_ABI_MS ? R17_REGNUM 
: R18_REGNUM)

R.


Re: [PATCH] ARM: Fix conditional execution [PR113915]

2024-02-26 Thread Wilco Dijkstra
Hi Richard,

> Did you test this on a thumb1 target?  It seems to me that the target parts 
> that you've
> removed were likely related to that.  In fact, I don't see why this test 
> would need to be changed at all.

The testcase explicitly forces a Thumb-2 target (arm_arch_v6t2). The patterns
were wrong for Thumb-2 indeed, and the testcase was explicitly testing for this.
There is a separate builtin-bswap-2.c for Thumb-1 target (arm_arch_v6m).

Cheers,
Wilco


Re: [PATCH] tree-optimization/114074 - CHREC multiplication and undefined overflow

2024-02-26 Thread Jakub Jelinek
On Mon, Feb 26, 2024 at 03:15:02PM +0100, Richard Biener wrote:
> When folding a multiply CHRECs are handled like {a, +, b} * c
> is {a*c, +, b*c} but that isn't generally correct when overflow
> invokes undefined behavior.  The following uses unsigned arithmetic
> unless either a is zero or a and b have the same sign.
> 
> I've used simple early outs for INTEGER_CSTs and otherwise use
> a range-query since we lack a tree_expr_nonpositive_p.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
> 
> I'm not sure it's worth using ranger, there might be also more
> cases where the integer multiply is OK, say when abs (A) > abs (B).
> But also {-2, +, 2}, but not for {1, +, -1} for example.

So, given that we found that get_range_pos_neg is not what you want,
I think the patch is ok, except a minor nit

> @@ -428,10 +434,41 @@ chrec_fold_multiply (tree type,
> if (integer_zerop (op1))
>   return build_int_cst (type, 0);
>  
> -   return build_polynomial_chrec
> - (CHREC_VARIABLE (op0),
> -  chrec_fold_multiply (type, CHREC_LEFT (op0), op1),
> -  chrec_fold_multiply (type, CHREC_RIGHT (op0), op1));
> +   /* When overflow is undefined and CHREC_LEFT/RIGHT do not have the
> +  same sign or CHREC_LEFT is zero then folding the multiply into
> +  the addition does not have the same behavior on overflow.  Use
> +  unsigned arithmetic in that case.  */
> +   value_range rl, rr;
> +   if (!ANY_INTEGRAL_TYPE_P (type)
> +   || TYPE_OVERFLOW_WRAPS (type)
> +   || integer_zerop (CHREC_LEFT (op0))
> +   || (TREE_CODE (CHREC_LEFT (op0)) == INTEGER_CST
> +   && TREE_CODE (CHREC_RIGHT (op0)) == INTEGER_CST
> +   && (tree_int_cst_sgn (CHREC_LEFT (op0))
> +   == tree_int_cst_sgn (CHREC_RIGHT (op0
> +   || (get_range_query (cfun)->range_of_expr (rl, CHREC_LEFT (op0))
> +   && !rl.undefined_p ()
> +   && (rl.nonpositive_p () || rl.nonnegative_p ())
> +   && get_range_query (cfun)->range_of_expr (rr, CHREC_RIGHT 
> (op0))
> +   && !rr.undefined_p ()
> +   && ((rl.nonpositive_p () && rr.nonpositive_p ())
> +   || (rl.nonnegative_p () && rr.nonnegative_p ()
> + return build_polynomial_chrec
> +  (CHREC_VARIABLE (op0),
> +   chrec_fold_multiply (type, CHREC_LEFT (op0), op1),
> +   chrec_fold_multiply (type, CHREC_RIGHT (op0), op1));
> +   else
> + {
> +   tree utype = unsigned_type_for (type);
> +   op1 = chrec_convert_rhs (utype, op1);
> +   tree tem = build_polynomial_chrec
> + (CHREC_VARIABLE (op0),
> +  chrec_fold_multiply
> +(utype, chrec_convert_rhs (utype, CHREC_LEFT (op0)), op1),
> +  chrec_fold_multiply
> +(utype, chrec_convert_rhs (utype, CHREC_RIGHT (op0)), op1));
> +   return chrec_convert_rhs (type, tem);

When you touch these, can you please rewrite it to more readable code with
temporaries, instead of the ugly calls with ( on different line from the
function name?
{
  tree left = chrec_fold_multiply (type, CHREC_LEFT (op0), op1);
  tree right = chrec_fold_multiply (type, CHREC_RIGHT (op0), op1);
  return build_polynomial_chrec (CHREC_VARIABLE (op0),
 left, right);
}
and
  tree left = chrec_convert_rhs (utype, CHREC_LEFT (op0));
  left = chrec_fold_multiply (utype, left, op1);
  tree right = chrec_convert_rhs (utype, CHREC_RIGHT (op0));
  right = chrec_fold_multiply (utype, right, op1);
  tree tem = build_polynomial_chrec (CHREC_VARIABLE (op0),
 left, right);
  return chrec_convert_rhs (type, tem);
?

Jakub



Re: [PATCH] ARM: Fix conditional execution [PR113915]

2024-02-26 Thread Richard Earnshaw (lists)
On 23/02/2024 15:46, Wilco Dijkstra wrote:
> Hi Richard,
> 
>> This bit isn't.  The correct fix here is to fix the pattern(s) concerned to 
>> add the missing predicate.
>>
>> Note that builtin-bswap.x explicitly mentions predicated mnemonics in the 
>> comments.
> 
> I fixed the patterns in v2. There are likely some more, plus we could likely 
> merge many t1 and t2
> patterns where the only difference is predication. But those cleanups are for 
> another time...
> 
> Cheers,
> Wilco
> 
> v2: Add predicable to the rev patterns.
> 
> By default most patterns can be conditionalized on Arm targets.  However
> Thumb-2 predication requires the "predicable" attribute be explicitly
> set to "yes".  Most patterns are shared between Arm and Thumb(-2) and are
> marked with "predicable".  Given this sharing, it does not make sense to
> use a different default for Arm.  So only consider conditional execution
> of instructions that have the predicable attribute set to yes.  This ensures
> that patterns not explicitly marked as such are never conditionally executed.
> 
> Passes regress and bootstrap, OK for commit?
> 
> gcc/ChangeLog:
> PR target/113915
> * config/arm/arm.md (NOCOND): Improve comment.
> (arm_rev*) Add predicable.
> * config/arm/arm.cc (arm_final_prescan_insn): Add check for
> PREDICABLE_YES.
> 
> gcc/testsuite/ChangeLog:
> PR target/113915
> * gcc.target/arm/builtin-bswap-1.c: Fix test.
> 
> ---
> 
> diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
> index 
> 1cd69268ee986a0953cc85ab259355d2191250ac..6a35fe44138135998877a9fb74c2a82a7f99dcd5
>  100644
> --- a/gcc/config/arm/arm.cc
> +++ b/gcc/config/arm/arm.cc
> @@ -25613,11 +25613,12 @@ arm_final_prescan_insn (rtx_insn *insn)
> break;
>  
>   case INSN:
> -   /* Instructions using or affecting the condition codes make it
> -  fail.  */
> +   /* Check the instruction is explicitly marked as predicable.
> +  Instructions using or affecting the condition codes are not.  
> */
> scanbody = PATTERN (this_insn);
> if (!(GET_CODE (scanbody) == SET
>   || GET_CODE (scanbody) == PARALLEL)
> +   || get_attr_predicable (this_insn) != PREDICABLE_YES
> || get_attr_conds (this_insn) != CONDS_NOCOND)
>   fail = TRUE;
> break;
> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
> index 
> 5816409f86f1106b410c5e21d77e599b485f85f2..81237a61d4a2ebcfb77e47c2bd29137aba28a521
>  100644
> --- a/gcc/config/arm/arm.md
> +++ b/gcc/config/arm/arm.md
> @@ -307,6 +307,8 @@
>  ;
>  ; NOCOND means that the instruction does not use or alter the condition
>  ;   codes but can be converted into a conditionally exectuted instruction.
> +;   Given that NOCOND is the default for most instructions if omitted,
> +;   the attribute predicable must be set to yes as well.
>  
>  (define_attr "conds" "use,set,clob,unconditional,nocond"
>   (if_then_else
> @@ -12547,6 +12549,7 @@
>revsh%?\t%0, %1"
>[(set_attr "arch" "t1,t2,32")
> (set_attr "length" "2,2,4")
> +   (set_attr "predicable" "no,yes,yes")
> (set_attr "type" "rev")]
>  )
>  
> @@ -12560,6 +12563,7 @@
> rev16%?\t%0, %1"
>[(set_attr "arch" "t1,t2,32")
> (set_attr "length" "2,2,4")
> +   (set_attr "predicable" "no,yes,yes")
> (set_attr "type" "rev")]
>  )
>  
> @@ -12584,6 +12588,7 @@
> rev16%?\t%0, %1"
>[(set_attr "arch" "t1,t2,32")
> (set_attr "length" "2,2,4")
> +   (set_attr "predicable" "no,yes,yes")
> (set_attr "type" "rev")]
>  )
>  
> @@ -12619,6 +12624,7 @@
> rev16%?\t%0, %1"
>[(set_attr "arch" "t1,t2,32")
> (set_attr "length" "2,2,4")
> +   (set_attr "predicable" "no,yes,yes")
> (set_attr "type" "rev")]
>  )
>  
> diff --git a/gcc/testsuite/gcc.target/arm/builtin-bswap-1.c 
> b/gcc/testsuite/gcc.target/arm/builtin-bswap-1.c
> index 
> c1e7740d14d3ca4e93a71e38b12f82c19791a204..1a311a6a5af647d40abd553e5d0ba1273c76d288
>  100644
> --- a/gcc/testsuite/gcc.target/arm/builtin-bswap-1.c
> +++ b/gcc/testsuite/gcc.target/arm/builtin-bswap-1.c
> @@ -5,14 +5,11 @@
> of the instructions.  Add an -mtune option known to facilitate that.  */
>  /* { dg-additional-options "-O2 -mtune=cortex-a53" } */
>  /* { dg-final { scan-assembler-not "orr\[ \t\]" } } */
> -/* { dg-final { scan-assembler-times "revsh\\t" 1 { target { arm_nothumb } } 
> } }  */
> -/* { dg-final { scan-assembler-times "revshne\\t" 1 { target { arm_nothumb } 
> } } }  */
> -/* { dg-final { scan-assembler-times "revsh\\t" 2 { target { ! arm_nothumb } 
> } } }  */
> -/* { dg-final { scan-assembler-times "rev16\\t" 1 { target { arm_nothumb } } 
> } }  */
> -/* { dg-final { scan-assembler-times "rev16ne\\t" 1 { target { arm_nothumb } 
> } } }  */
> -/* { dg-final { scan-assembler-times "rev16\\t" 2 { target { ! arm_nothumb } 
> } } }  */
> -/* { dg-final { scan-assembler-times "rev\\t" 2 

Re: [PATCH v5 RESEND] C, ObjC: Add -Wunterminated-string-initialization

2024-02-26 Thread Alejandro Colomar
Hi Joseph,

On Mon, Feb 26, 2024 at 03:24:32PM +, Joseph Myers wrote:
> On Sun, 25 Feb 2024, Mike Stump wrote:
> 
> > On Feb 6, 2024, at 2:45 AM, Alejandro Colomar  wrote:
> > > 
> > > Warn about the following:
> > > 
> > >char  s[3] = "foo";
> > 
> > No ObjC specific impact here, so no need for ObjC review.
> > 
> > As a member of the peanut gallery, I like the patch.
> > 
> > Joseph, this is been submitted 5 times over the past year.  Any thoughts?
> 
> The idea seems reasonable, but the patch needs documentation for the new 
> option in invoke.texi.

Thanks!  Will do.

I don't see an obvious order in that file.  Where would you put the
option?  Do you want me to sort(1) it first, and then insert the new
option in alphabetic order?

$ man gcc 2>/dev/null | grep -e '^ \{0,3\}[^ ]' -e '^ \{4,7\}-' | sed -n '/^ 
\{1,3\}Options to Request or Suppress Warnings/,/^ \{1,3\}[^ ]/p'
   Options to Request or Suppress Warnings
 -fsyntax-only
 -fmax-errors=n
 -w  Inhibit all warning messages.
 -Werror
 -Werror=
 -Wfatal-errors
 -Wno- options with old compilers, but if something goes wrong, the 
compiler warns that an unrecognized option is present.
 -Wpedantic
 -pedantic
 -pedantic-errors
 -Wall
 -Wextra
 -Wabi (C, Objective‐C, C++ and Objective-C++ only)
 -Wno-changes-meaning (C++ and Objective-C++ only)
 -Wchar-subscripts
 -Wno-coverage-mismatch
 -Wno-coverage-invalid-line-number
 -Wno-cpp (C, Objective‐C, C++, Objective-C++ and Fortran only)
 -Wdouble-promotion (C, C++, Objective‐C and Objective-C++ only)
 -Wduplicate-decl-specifier (C and Objective‐C only)
 -Wformat
 -Wformat=n
 -Wno-format-contains-nul
 -Wno-format-extra-args
 -Wformat-overflow
 -Wformat-overflow=level
 -Wno-format-zero-length
 -Wformat-nonliteral
 -Wformat-security
 -Wformat-signedness
 -Wformat-truncation
 -Wformat-truncation=level
 -Wformat-y2k
 -Wnonnull
 -Wnonnull-compare
 -Wnull-dereference
 -Winfinite-recursion
 -Winit-self (C, C++, Objective‐C and Objective-C++ only)
 -Wno-implicit-int (C and Objective‐C only)
 -Wno-implicit-function-declaration (C and Objective‐C only)
 -Wimplicit (C and Objective‐C only)
 -Wimplicit-fallthrough
 -Wimplicit-fallthrough=n
 -Wno-if-not-aligned (C, C++, Objective‐C and Objective-C++ only)
 -Wignored-qualifiers (C and C++ only)
 -Wno-ignored-attributes (C and C++ only)
 -Wmain
 -Wmisleading-indentation (C and C++ only)
 -Wmissing-attributes
 -Wmissing-braces
 -Wmissing-include-dirs (C, C++, Objective‐C, Objective-C++ and Fortran 
only)
 -Wno-missing-profile
 -Wmismatched-dealloc
 -Wmultistatement-macros
 -Wparentheses
 -Wno-self-move (C++ and Objective-C++ only)
 -Wsequence-point
 -Wno-return-local-addr
 -Wreturn-type
 -Wno-shift-count-negative
 -Wno-shift-count-overflow
 -Wshift-negative-value
 -Wno-shift-overflow
 -Wshift-overflow=n
 -Wswitch
 -Wswitch-default
 -Wswitch-enum
 -Wno-switch-bool
 -Wno-switch-outside-range
 -Wno-switch-unreachable
 -Wsync-nand (C and C++ only)
 -Wtrivial-auto-var-init
 -Wunused-but-set-parameter
 -Wunused-but-set-variable
 -Wunused-function
 -Wunused-label
 -Wunused-local-typedefs (C, Objective‐C, C++ and Objective-C++ only)
 -Wunused-parameter
 -Wno-unused-result
 -Wunused-variable
 -Wunused-const-variable
 -Wunused-const-variable=n
 -Wunused-value
 -Wunused
 -Wuninitialized
 -Wno-invalid-memory-model
 -Wmaybe-uninitialized
 -Wunknown-pragmas
 -Wno-pragmas
 -Wno-prio-ctor-dtor
 -Wstrict-aliasing
 -Wstrict-aliasing=n
 -Wstrict-overflow
 -Wstrict-overflow=n
 -Wstring-compare
 -Wno-stringop-overflow
 -Wstringop-overflow
 -Wstringop-overflow=type
 -Wno-stringop-overread
 -Wno-stringop-truncation
 -Wstrict-flex-arrays
 -Wsuggest-attribute=[pure|const|noreturn|format|cold|malloc]
 -Walloc-zero
 -Walloc-size-larger-than=byte‐size
 -Wno-alloc-size-larger-than
 -Walloca
 -Walloca-larger-than=byte‐size
 -Wno-alloca-larger-than
 -Warith-conversion
 -Warray-bounds
 -Warray-bounds=n
 -Warray-compare
 -Warray-parameter
 -Warray-parameter=n
 -Wattribute-alias=n
 -Wno-attribute-alias
 -Wbidi-chars=[none|unpaired|any|ucn]
 -Wbool-compare
 -Wbool-operation
 -Wduplicated-branches
 -Wduplicated-cond
 -Wframe-address
 -Wno-discarded-qualifiers (C and Objective‐C only)
 -Wno-discarded-array-qualifiers (C and Objective‐C only)
 -Wno-incompatible-pointer-types (C and Objective‐C only)
 -Wno-int-conversion (C and Objective‐C only)
 -Wzero-length-bounds
 -Wno-div-by-zero
 -Wsystem-headers
 -Wtautological-compare
 -Wtrampolines
 -Wfloat-equal
 -Wtraditional (C and 

Re: [PATCH] tree-optimization/114074 - CHREC multiplication and undefined overflow

2024-02-26 Thread Jakub Jelinek
On Mon, Feb 26, 2024 at 04:51:08PM +0100, Michael Matz wrote:
> On Mon, 26 Feb 2024, Jakub Jelinek wrote:
> 
> > > Will update the patch, I think any improvement should be done
> > > to get_range_pos_neg (it's a bit odd in behavior for unsigned
> > > but I have only signed things incoming).
> > 
> > For unsigned if it always returned 1, it would be quite useless, there would
> > be no point for the caller to call it in that case.
> 
> Which seems to make sense for a function called ...pos_neg on unsigned 
> types.  I would expect calling it to be useless and always return "yep, 
> non-negative, why did you ask?".

The callers heavily rely on it doing for unsigned types what it does now
and it matches what the function comment says.
If you have a suggestion for a different name that wouldn't be much longer
than the current one, it can be renamed.
get_msb_range?
Except it doesn't return a range for the most significant bit, but a bitmask
which values are possible.

Jakub



[PATCH] RISC-V: Add initial cost handling for segment loads/stores.

2024-02-26 Thread Robin Dapp
Hi,

This has been sitting on my local tree - I've been wanting to post it
for a while but somehow forgot.

This patch makes segment loads and stores more expensive.  It adds
segment_load and segment_store cost fields to the common vector costs
and adds handling to adjust_stmt_cost.  In the future we could handle
this in a more fine-grained manner but let's start somehow.

Regtested on rv64.

Regards
 Robin

gcc/ChangeLog:

* config/riscv/riscv-protos.h (struct common_vector_cost): Add
segment_[load/store]_cost.
* config/riscv/riscv-vector-costs.cc (costs::adjust_stmt_cost):
Handle segment loads/stores.
* config/riscv/riscv.cc: Initialize segment_[load/store]_cost
to 1.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/pr113112-4.c: Expect m4
instead of m2.
---
 gcc/config/riscv/riscv-protos.h   |   4 +
 gcc/config/riscv/riscv-vector-costs.cc| 127 --
 gcc/config/riscv/riscv.cc |   4 +
 .../vect/costmodel/riscv/rvv/pr113112-4.c |   4 +-
 4 files changed, 95 insertions(+), 44 deletions(-)

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 80efdf2b7e5..2e8ab9990a8 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -218,6 +218,10 @@ struct common_vector_cost
   const int gather_load_cost;
   const int scatter_store_cost;
 
+  /* Segment load/store cost.  */
+  const int segment_load_cost;
+  const int segment_store_cost;
+
   /* Cost of a vector-to-scalar operation.  */
   const int vec_to_scalar_cost;
 
diff --git a/gcc/config/riscv/riscv-vector-costs.cc 
b/gcc/config/riscv/riscv-vector-costs.cc
index adf9c197df5..d3c12444773 100644
--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -1043,6 +1043,24 @@ costs::better_main_loop_than_p (const vector_costs 
*uncast_other) const
   return vector_costs::better_main_loop_than_p (other);
 }
 
+/* Returns the group size i.e. the number of vectors to be loaded by a
+   segmented load/store instruction.  Return 0 if it is no segmented
+   load/store.  */
+static int
+segment_loadstore_group_size (enum vect_cost_for_stmt kind,
+ stmt_vec_info stmt_info)
+{
+  if ((kind == vector_load || kind == vector_store)
+  && STMT_VINFO_DATA_REF (stmt_info))
+{
+  stmt_info = DR_GROUP_FIRST_ELEMENT (stmt_info);
+  if (stmt_info
+ && STMT_VINFO_MEMORY_ACCESS_TYPE (stmt_info) == VMAT_LOAD_STORE_LANES)
+   return DR_GROUP_SIZE (stmt_info);
+}
+  return 0;
+}
+
 /* Adjust vectorization cost after calling riscv_builtin_vectorization_cost.
For some statement, we would like to further fine-grain tweak the cost on
top of riscv_builtin_vectorization_cost handling which doesn't have any
@@ -1067,55 +1085,80 @@ costs::adjust_stmt_cost (enum vect_cost_for_stmt kind, 
loop_vec_info loop,
 case vector_load:
 case vector_store:
{
- /* Unit-stride vector loads and stores do not have offset addressing
-as opposed to scalar loads and stores.
-If the address depends on a variable we need an additional
-add/sub for each load/store in the worst case.  */
- if (stmt_info && stmt_info->stmt)
+ if (stmt_info && stmt_info->stmt && STMT_VINFO_DATA_REF (stmt_info))
{
- data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
- class loop *father = stmt_info->stmt->bb->loop_father;
- if (!loop && father && !father->inner && father->superloops)
+ int group_size;
+ if ((group_size
+  = segment_loadstore_group_size (kind, stmt_info)) > 1)
{
- tree ref;
- if (TREE_CODE (dr->ref) != MEM_REF
- || !(ref = TREE_OPERAND (dr->ref, 0))
- || TREE_CODE (ref) != SSA_NAME)
-   break;
+ /* Segment loads and stores.  When the group size is > 1
+the vectorizer will add a vector load/store statement for
+each vector in the group.  Note that STMT_COST is
+overwritten here rather than adjusted.  */
+ if (riscv_v_ext_vector_mode_p (loop->vector_mode))
+   stmt_cost
+ = (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info))
+? costs->vla->segment_load_cost
+: costs->vla->segment_store_cost);
+ else
+   stmt_cost
+ = (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info))
+? costs->vls->segment_load_cost
+: costs->vls->segment_store_cost);
+ break;
+ /* TODO: Indexed and ordered/unordered cost.  */
+   }
+ else
+   {
+ /* 

Re: [PATCH] tree-optimization/114074 - CHREC multiplication and undefined overflow

2024-02-26 Thread Michael Matz
Hello,

On Mon, 26 Feb 2024, Jakub Jelinek wrote:

> > Will update the patch, I think any improvement should be done
> > to get_range_pos_neg (it's a bit odd in behavior for unsigned
> > but I have only signed things incoming).
> 
> For unsigned if it always returned 1, it would be quite useless, there would
> be no point for the caller to call it in that case.

Which seems to make sense for a function called ...pos_neg on unsigned 
types.  I would expect calling it to be useless and always return "yep, 
non-negative, why did you ask?".


Ciao,
Michael.


Re: [PATCH v5 RESEND] C, ObjC: Add -Wunterminated-string-initialization

2024-02-26 Thread Joseph Myers
On Sun, 25 Feb 2024, Alejandro Colomar wrote:

> or if it's just that everyone was busy doing
> other stuff.

Yes, that's right.  The patch was already listed on my patch review 
backlog, but that backlog is long.

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [PATCH v5 RESEND] C, ObjC: Add -Wunterminated-string-initialization

2024-02-26 Thread Joseph Myers
On Sun, 25 Feb 2024, Mike Stump wrote:

> On Feb 6, 2024, at 2:45 AM, Alejandro Colomar  wrote:
> > 
> > Warn about the following:
> > 
> >char  s[3] = "foo";
> 
> No ObjC specific impact here, so no need for ObjC review.
> 
> As a member of the peanut gallery, I like the patch.
> 
> Joseph, this is been submitted 5 times over the past year.  Any thoughts?

The idea seems reasonable, but the patch needs documentation for the new 
option in invoke.texi.

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [PATCH] c: Improve some diagnostics for __builtin_stdc_bit_* [PR114042]

2024-02-26 Thread Joseph Myers
On Fri, 23 Feb 2024, Jakub Jelinek wrote:

> Hi!
> 
> The PR complains that for the __builtin_stdc_bit_* "builtins" the
> diagnostics doesn't mention the name of the builtin the user used, but
> instead __builtin_{clz,ctz,popcount}g instead (which is what the FE
> immediately lowers it to).
> 
> The following patch repeats the checks from check_builtin_function_arguments
> which are there done on BUILT_IN_{CLZ,CTZ,POPCOUNT}G, such that they
> diagnose it with the name of the "builtin" user actually used before it
> is gone.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

-- 
Joseph S. Myers
josmy...@redhat.com



Re: [PATCH] rs6000: Don't allow immediate value in the vsx_splat pattern [PR113950]

2024-02-26 Thread jeevitha



On 26/02/24 8:37 pm, Peter Bergner wrote:
> On 2/26/24 4:49 AM, Kewen.Lin wrote:
>> on 2024/2/26 14:18, jeevitha wrote:
>>> Hi All,
>>> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
>>> index 6111cc90eb7..e5688ff972a 100644
>>> --- a/gcc/config/rs6000/vsx.md
>>> +++ b/gcc/config/rs6000/vsx.md
>>> @@ -4660,7 +4660,7 @@
>>>  (define_expand "vsx_splat_"
>>>[(set (match_operand:VSX_D 0 "vsx_register_operand")
>>> (vec_duplicate:VSX_D
>>> -(match_operand: 1 "input_operand")))]
>>> +(match_operand: 1 "splat_input_operand")))]
>>>"VECTOR_MEM_VSX_P (mode)"
>>>  {
>>>rtx op1 = operands[1];
>>
>> This hunk actually does force_reg already:
>>
>> ...
>>   else if (!REG_P (op1))
>> op1 = force_reg (mode, op1);
>>
>> but it's assigning to op1 unexpectedly (an omission IMHO), so just
>> simply fix it with:
>>
>>   else if (!REG_P (op1))
>> -op1 = force_reg (mode, op1);
>> +operands[1] = force_reg (mode, op1);
> 
> I agree op1 was an oversight and it should be operands[1].
> That said, I think using more precise predicates is a good thing,
> so I think we should use both Jeevitha's predicate change and
> your operands[1] change.
> 
> I'll note that Jeevitha originally had the operands[1] change, but I
> didn't look closely enough at the issue or the pattern and mentioned
> that these kinds of bugs can be caused by too loose constraints and
> predicates, which is when she found the updated predicate to use.
> I believe she already even bootstrapped and regtested the operands[1]
> only change.  Jeevitha???
> 
> 

Yes, Peter. I have already bootstrapped and regtested the operands[1] change.


Re: [PATCH] rs6000: Don't allow immediate value in the vsx_splat pattern [PR113950]

2024-02-26 Thread Peter Bergner
On 2/26/24 4:49 AM, Kewen.Lin wrote:
> on 2024/2/26 14:18, jeevitha wrote:
>> Hi All,
>> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
>> index 6111cc90eb7..e5688ff972a 100644
>> --- a/gcc/config/rs6000/vsx.md
>> +++ b/gcc/config/rs6000/vsx.md
>> @@ -4660,7 +4660,7 @@
>>  (define_expand "vsx_splat_"
>>[(set (match_operand:VSX_D 0 "vsx_register_operand")
>>  (vec_duplicate:VSX_D
>> - (match_operand: 1 "input_operand")))]
>> + (match_operand: 1 "splat_input_operand")))]
>>"VECTOR_MEM_VSX_P (mode)"
>>  {
>>rtx op1 = operands[1];
> 
> This hunk actually does force_reg already:
> 
> ...
>   else if (!REG_P (op1))
> op1 = force_reg (mode, op1);
> 
> but it's assigning to op1 unexpectedly (an omission IMHO), so just
> simply fix it with:
> 
>   else if (!REG_P (op1))
> -op1 = force_reg (mode, op1);
> +operands[1] = force_reg (mode, op1);

I agree op1 was an oversight and it should be operands[1].
That said, I think using more precise predicates is a good thing,
so I think we should use both Jeevitha's predicate change and
your operands[1] change.

I'll note that Jeevitha originally had the operands[1] change, but I
didn't look closely enough at the issue or the pattern and mentioned
that these kinds of bugs can be caused by too loose constraints and
predicates, which is when she found the updated predicate to use.
I believe she already even bootstrapped and regtested the operands[1]
only change.  Jeevitha???




>> +/* PR target/113950 */
>> +/* { dg-do compile } */
> 
> We need an effective target to ensure vsx support, for now it's 
> powerpc_vsx_ok.
> ie: /* { dg-require-effective-target powerpc_vsx_ok } */

Agreed.


>> +/* { dg-options "-O1" } */

I think we should also use a -mcpu=XXX option to ensure VSX is enabled
when compiling these VSX built-in functions.  I'm fine using any CPU
(power7 or later) where the ICE exists with an unpatched compiler.
Otherwise, testing will be limited to our server systems that have
VSX enabled by default.


Peter





Re: [PATCH] tree-optimization/114074 - CHREC multiplication and undefined overflow

2024-02-26 Thread Jakub Jelinek
On Mon, Feb 26, 2024 at 03:29:30PM +0100, Richard Biener wrote:
> On Mon, 26 Feb 2024, Jakub Jelinek wrote:
> 
> > On Mon, Feb 26, 2024 at 03:15:02PM +0100, Richard Biener wrote:
> > > When folding a multiply CHRECs are handled like {a, +, b} * c
> > > is {a*c, +, b*c} but that isn't generally correct when overflow
> > > invokes undefined behavior.  The following uses unsigned arithmetic
> > > unless either a is zero or a and b have the same sign.
> > > 
> > > I've used simple early outs for INTEGER_CSTs and otherwise use
> > > a range-query since we lack a tree_expr_nonpositive_p.
> > 
> > What about testing
> > (get_range_pos_neg (CHREC_LEFT (op0))
> >  | get_range_pos_neg (CHREC_RIGHT (op0))) != 3
> > ?
> 
> Ah, didn't know about that.  It seems to treat zero as "always
> positive", so for 0 and -1 I'd get 3.  OK as I check for
> zero CHREC_LEFT separately.

The function has been used where one cares about the possible values
of the sign bit, so 0 in that case is 0 sign bit.
If you want to differentiate between negative, 0 and positive and allow
non-positive with non-positive or non-negative with non-negative together,
then it isn't the right function to call (unless we add tracking if the
value can be zero, return bitmask with 3 bits rather than 2 and adjust
all callers).

> I'll note that get_range_pos_neg only asks global range query
> and for SSA names (but not sure if range_of_expr handles aribitrary
> GENERIC as SCEV tends to have here ...).

I think range_of_expr should handle usual GENERIC trees and punt on stuff it
doesn't handle.  And your patch was using ranger from current pass while
get_range_pos_neg uses always the global ranges (it is usually used during
expansion where the pass doesn't have its ranger instance).
Though, if you don't ask for range on a particular statement, it doesn't
really matter and is just a global range query.

> Will update the patch, I think any improvement should be done
> to get_range_pos_neg (it's a bit odd in behavior for unsigned
> but I have only signed things incoming).

For unsigned if it always returned 1, it would be quite useless, there would
be no point for the caller to call it in that case.

Jakub



[PATCH] tree-optimization/114081 - dominator update for prologue peeling

2024-02-26 Thread Richard Biener
The following implements manual update for multi-exit loop prologue
peeling during vectorization.

Boostrap / regtest running on x86_64-unknown-linux-gnu.

I think the amount of coverage for prologue peeling with early exits
is very low, so my testing success might not mean much.

Richard.

PR tree-optimization/114081
* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
Perform manual dominator update for prologue peeling.
(vect_do_peeling): Properly update dominators after adding the
prologue-around guard.

* gcc.dg/vect/vect-early-break_121-pr114081.c: New testcase.
---
 .../vect/vect-early-break_121-pr114081.c  | 39 ++
 gcc/tree-vect-loop-manip.cc   | 78 +--
 2 files changed, 95 insertions(+), 22 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c 
b/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
new file mode 100644
index 000..423ff0b566b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
@@ -0,0 +1,39 @@
+/* { dg-do compile } */
+/* { dg-add-options vect_early_break } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-O3" } */
+/* { dg-additional-options "-mavx2" { target { x86_64-*-* i?86-*-* } } } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+typedef struct filter_list_entry {
+  const char *name;
+  int id;
+  void (*function)();
+} filter_list_entry;
+
+static const filter_list_entry filter_list[9] = {0};
+
+void php_zval_filter(int filter, int id1) {
+  filter_list_entry filter_func;
+
+  int size = 9;
+  for (int i = 0; i < size; ++i) {
+if (filter_list[i].id == filter) {
+  filter_func = filter_list[i];
+  goto done;
+}
+  }
+
+#pragma GCC novector
+  for (int i = 0; i < size; ++i) {
+if (filter_list[i].id == 0x0204) {
+  filter_func = filter_list[i];
+  goto done;
+}
+  }
+done:
+  if (!filter_func.id)
+filter_func.function();
+}
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 137b053ac35..f72da915103 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -1594,7 +1594,6 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, 
edge loop_exit,
   auto loop_exits = get_loop_exit_edges (loop);
   bool multiple_exits_p = loop_exits.length () > 1;
   auto_vec doms;
-  class loop *update_loop = NULL;
 
   if (at_exit) /* Add the loop copy at exit.  */
 {
@@ -1856,11 +1855,33 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop 
*loop, edge loop_exit,
 correct.  */
   if (multiple_exits_p)
{
- update_loop = new_loop;
+ class loop *update_loop = new_loop;
  doms = get_all_dominated_blocks (CDI_DOMINATORS, loop->header);
  for (unsigned i = 0; i < doms.length (); ++i)
if (flow_bb_inside_loop_p (loop, doms[i]))
  doms.unordered_remove (i);
+
+ for (edge e : get_loop_exit_edges (update_loop))
+   {
+ edge ex;
+ edge_iterator ei;
+ FOR_EACH_EDGE (ex, ei, e->dest->succs)
+   {
+ /* Find the first non-fallthrough block as fall-throughs can't
+dominate other blocks.  */
+ if (single_succ_p (ex->dest))
+   {
+ doms.safe_push (ex->dest);
+ ex = single_succ_edge (ex->dest);
+   }
+ doms.safe_push (ex->dest);
+   }
+ doms.safe_push (e->dest);
+   }
+
+ iterate_fix_dominators (CDI_DOMINATORS, doms, false);
+ if (updated_doms)
+   updated_doms->safe_splice (doms);
}
 }
   else /* Add the copy at entry.  */
@@ -1910,33 +1931,28 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop 
*loop, edge loop_exit,
   set_immediate_dominator (CDI_DOMINATORS, new_loop->header,
   loop_preheader_edge (new_loop)->src);
 
+  /* Update dominators for multiple exits.  */
   if (multiple_exits_p)
-   update_loop = loop;
-}
-
-  if (multiple_exits_p)
-{
-  for (edge e : get_loop_exit_edges (update_loop))
{
- edge ex;
- edge_iterator ei;
- FOR_EACH_EDGE (ex, ei, e->dest->succs)
+ for (edge alt_e : loop_exits)
{
- /* Find the first non-fallthrough block as fall-throughs can't
-dominate other blocks.  */
- if (single_succ_p (ex->dest))
+ if (alt_e == loop_exit)
+   continue;
+ basic_block old_dom
+   = get_immediate_dominator (CDI_DOMINATORS, alt_e->dest);
+ if (flow_bb_inside_loop_p (loop, old_dom))
{
-   

Re: [PATCH] tree-optimization/114074 - CHREC multiplication and undefined overflow

2024-02-26 Thread Richard Biener
On Mon, 26 Feb 2024, Jakub Jelinek wrote:

> On Mon, Feb 26, 2024 at 03:15:02PM +0100, Richard Biener wrote:
> > When folding a multiply CHRECs are handled like {a, +, b} * c
> > is {a*c, +, b*c} but that isn't generally correct when overflow
> > invokes undefined behavior.  The following uses unsigned arithmetic
> > unless either a is zero or a and b have the same sign.
> > 
> > I've used simple early outs for INTEGER_CSTs and otherwise use
> > a range-query since we lack a tree_expr_nonpositive_p.
> 
> What about testing
> (get_range_pos_neg (CHREC_LEFT (op0))
>  | get_range_pos_neg (CHREC_RIGHT (op0))) != 3
> ?

Ah, didn't know about that.  It seems to treat zero as "always
positive", so for 0 and -1 I'd get 3.  OK as I check for
zero CHREC_LEFT separately.

I'll note that get_range_pos_neg only asks global range query
and for SSA names (but not sure if range_of_expr handles aribitrary
GENERIC as SCEV tends to have here ...).

Will update the patch, I think any improvement should be done
to get_range_pos_neg (it's a bit odd in behavior for unsigned
but I have only signed things incoming).

Richard.


Re: [PATCH] tree-optimization/114074 - CHREC multiplication and undefined overflow

2024-02-26 Thread Jakub Jelinek
On Mon, Feb 26, 2024 at 03:15:02PM +0100, Richard Biener wrote:
> When folding a multiply CHRECs are handled like {a, +, b} * c
> is {a*c, +, b*c} but that isn't generally correct when overflow
> invokes undefined behavior.  The following uses unsigned arithmetic
> unless either a is zero or a and b have the same sign.
> 
> I've used simple early outs for INTEGER_CSTs and otherwise use
> a range-query since we lack a tree_expr_nonpositive_p.

What about testing
(get_range_pos_neg (CHREC_LEFT (op0))
 | get_range_pos_neg (CHREC_RIGHT (op0))) != 3
?

Jakub



[PATCH v2] DSE: Bugfix ICE after allow vector type in get_stored_val

2024-02-26 Thread pan2 . li
From: Pan Li 

We allowed vector type for get_stored_val when read is less than or
equal to store in previous.  Unfortunately, we missed to adjust the
validate_subreg part accordingly.  When the vector type's size is
less than vector register, it will be considered as invalid in the
validate_subreg.

Consider the validate_subreg is kind of a can with worms and we are
in stage 4.  We will fix the issue from the DES side, and make sure
the subreg is valid for both the read_mode and store_mode before
perform the real gen_lowpart.

The below test are passed for this patch:

* The x86 bootstrap test.
* The x86 regression test.
* The riscv regression test.
* The aarch64 regression test.

gcc/ChangeLog:

* dse.cc (get_stored_val): Add validate_subreg check before
perform the gen_lowpart for rtl.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/ssa-fre-44.c: Add compile option to trigger
the ICE.
* gcc.target/riscv/rvv/base/bug-6.c: New test.

Signed-off-by: Pan Li 
---
 gcc/dse.cc|  4 +++-
 gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-44.c|  2 +-
 .../gcc.target/riscv/rvv/base/bug-6.c | 22 +++
 3 files changed, 26 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/bug-6.c

diff --git a/gcc/dse.cc b/gcc/dse.cc
index edc7a1dfecf..1596da91da0 100644
--- a/gcc/dse.cc
+++ b/gcc/dse.cc
@@ -1946,7 +1946,9 @@ get_stored_val (store_info *store_info, machine_mode 
read_mode,
 copy_rtx (store_info->const_rhs));
   else if (VECTOR_MODE_P (read_mode) && VECTOR_MODE_P (store_mode)
 && known_le (GET_MODE_BITSIZE (read_mode), GET_MODE_BITSIZE (store_mode))
-&& targetm.modes_tieable_p (read_mode, store_mode))
+&& targetm.modes_tieable_p (read_mode, store_mode)
+&& validate_subreg (read_mode, store_mode, copy_rtx (store_info->rhs),
+   subreg_lowpart_offset (read_mode, store_mode)))
 read_reg = gen_lowpart (read_mode, copy_rtx (store_info->rhs));
   else
 read_reg = extract_low_bits (read_mode, store_mode,
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-44.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-44.c
index f79b4c142ae..624a00a4f32 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-44.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-44.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-fre1" } */
+/* { dg-options "-O -fdump-tree-fre1 -O3 -ftree-vectorize" } */
 
 struct A { float x, y; };
 struct B { struct A u; };
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/bug-6.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/bug-6.c
new file mode 100644
index 000..5bb00b8f587
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/bug-6.c
@@ -0,0 +1,22 @@
+/* Test that we do not have ice when compile */
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize" } */
+
+struct A { float x, y; };
+struct B { struct A u; };
+
+extern void bar (struct A *);
+
+float
+f3 (struct B *x, int y)
+{
+  struct A p = {1.0f, 2.0f};
+  struct A *q = [y].u;
+
+  __builtin_memcpy (>x, , sizeof (float));
+  __builtin_memcpy (>y, , sizeof (float));
+
+  bar ();
+
+  return x[y].u.x + x[y].u.y;
+}
-- 
2.34.1



[PATCH] tree-optimization/114074 - CHREC multiplication and undefined overflow

2024-02-26 Thread Richard Biener
When folding a multiply CHRECs are handled like {a, +, b} * c
is {a*c, +, b*c} but that isn't generally correct when overflow
invokes undefined behavior.  The following uses unsigned arithmetic
unless either a is zero or a and b have the same sign.

I've used simple early outs for INTEGER_CSTs and otherwise use
a range-query since we lack a tree_expr_nonpositive_p.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

I'm not sure it's worth using ranger, there might be also more
cases where the integer multiply is OK, say when abs (A) > abs (B).
But also {-2, +, 2}, but not for {1, +, -1} for example.

OK?

Thanks,
Richard.

PR tree-optimization/114074
* tree-chrec.h (chrec_convert_rhs): Default at_stmt arg to NULL.
* tree-chrec.cc (chrec_fold_multiply): Canonicalize inputs.
Handle poly vs. non-poly multiplication correctly with respect
to undefined behavior on overflow.

* gcc.dg/torture/pr114074.c: New testcase.
* gcc.dg/pr68317.c: Adjust expected location of diagnostic.
---
 gcc/testsuite/gcc.dg/pr68317.c  |  4 +-
 gcc/testsuite/gcc.dg/torture/pr114074.c | 27 +
 gcc/tree-chrec.cc   | 53 -
 gcc/tree-chrec.h|  2 +-
 4 files changed, 72 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr114074.c

diff --git a/gcc/testsuite/gcc.dg/pr68317.c b/gcc/testsuite/gcc.dg/pr68317.c
index bd053a7522b..06cd2e1da9c 100644
--- a/gcc/testsuite/gcc.dg/pr68317.c
+++ b/gcc/testsuite/gcc.dg/pr68317.c
@@ -12,8 +12,8 @@ foo ()
 {
  int32_t index = 0;
 
- for (index; index <= 10; index--) // expected warning here
+ for (index; index <= 10; index--) /* { dg-warning "iteration \[0-9\]+ invokes 
undefined behavior" } */
/* Result of the following multiply will overflow
   when converted to signed int32_t.  */
-   bar ((0xcafe + index) * 0xdead);  /* { dg-warning "iteration \[0-9\]+ 
invokes undefined behavior" } */
+   bar ((0xcafe + index) * 0xdead);
 }
diff --git a/gcc/testsuite/gcc.dg/torture/pr114074.c 
b/gcc/testsuite/gcc.dg/torture/pr114074.c
new file mode 100644
index 000..336e97673c3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr114074.c
@@ -0,0 +1,27 @@
+/* { dg-do run } */
+
+int a, b, d;
+
+__attribute__((noipa)) void
+foo (void)
+{
+  ++d;
+}
+
+int
+main ()
+{
+  for (a = 0; a > -3; a -= 2)
+{
+  int c = a;
+  b = __INT_MAX__ - 3000;
+  a = ~c * b;
+  foo ();
+  if (!a)
+break;
+  a = c;
+}
+  if (d != 2)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-chrec.cc b/gcc/tree-chrec.cc
index 61456fe1141..1b1023f68af 100644
--- a/gcc/tree-chrec.cc
+++ b/gcc/tree-chrec.cc
@@ -38,6 +38,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple.h"
 #include "tree-ssa-loop.h"
 #include "dumpfile.h"
+#include "value-range.h"
+#include "value-query.h"
 #include "tree-scalar-evolution.h"
 
 /* Extended folder for chrecs.  */
@@ -404,6 +406,10 @@ chrec_fold_multiply (tree type,
   || automatically_generated_chrec_p (op1))
 return chrec_fold_automatically_generated_operands (op0, op1);
 
+  if (TREE_CODE (op0) != POLYNOMIAL_CHREC
+  && TREE_CODE (op1) == POLYNOMIAL_CHREC)
+std::swap (op0, op1);
+
   switch (TREE_CODE (op0))
 {
 case POLYNOMIAL_CHREC:
@@ -428,10 +434,41 @@ chrec_fold_multiply (tree type,
  if (integer_zerop (op1))
return build_int_cst (type, 0);
 
- return build_polynomial_chrec
-   (CHREC_VARIABLE (op0),
-chrec_fold_multiply (type, CHREC_LEFT (op0), op1),
-chrec_fold_multiply (type, CHREC_RIGHT (op0), op1));
+ /* When overflow is undefined and CHREC_LEFT/RIGHT do not have the
+same sign or CHREC_LEFT is zero then folding the multiply into
+the addition does not have the same behavior on overflow.  Use
+unsigned arithmetic in that case.  */
+ value_range rl, rr;
+ if (!ANY_INTEGRAL_TYPE_P (type)
+ || TYPE_OVERFLOW_WRAPS (type)
+ || integer_zerop (CHREC_LEFT (op0))
+ || (TREE_CODE (CHREC_LEFT (op0)) == INTEGER_CST
+ && TREE_CODE (CHREC_RIGHT (op0)) == INTEGER_CST
+ && (tree_int_cst_sgn (CHREC_LEFT (op0))
+ == tree_int_cst_sgn (CHREC_RIGHT (op0
+ || (get_range_query (cfun)->range_of_expr (rl, CHREC_LEFT (op0))
+ && !rl.undefined_p ()
+ && (rl.nonpositive_p () || rl.nonnegative_p ())
+ && get_range_query (cfun)->range_of_expr (rr, CHREC_RIGHT 
(op0))
+ && !rr.undefined_p ()
+ && ((rl.nonpositive_p () && rr.nonpositive_p ())
+ || (rl.nonnegative_p () && rr.nonnegative_p ()
+   return build_polynomial_chrec
+(CHREC_VARIABLE (op0),
+ chrec_fold_multiply (type, 

RE: [PATCH]middle-end: delay updating of dominators until later during vectorization. [PR114081]

2024-02-26 Thread Richard Biener
On Mon, 26 Feb 2024, Tamar Christina wrote:

> > > The testcase shows an interesting case where we have multiple loops 
> > > sharing a
> > > live value and have an early exit that go to the same location.  The 
> > > additional
> > > complication is that on x86_64 with -mavx we seem to also do prologue 
> > > peeling
> > > on the loops.
> > >
> > > We correctly identify which BB we need their dominators updated for, but 
> > > we do
> > > so too early.
> > >
> > > Instead of adding more dominator update we can solve this by for the 
> > > cases with
> > > multiple exits not to verify dominators at the end of peeling if peeling 
> > > for
> > > vectorization.
> > >
> > > We can then perform the final dominator updates just before vectorization 
> > > when
> > > all loop transformations are done.
> > 
> > What's the actual CFG transform that happens between the old and the new
> > place?  I see a possible edge splitting but where is the one that makes
> > this patch work?
> 
> It's not one but two.
> 1. loop 1 is prologue peeled. This ICEs because the dominator update is only 
> happening
> for epilogue peeling.  Note that loop 1 here dominates 21 and the ICE is:
> 
> ice.c: In function 'void php_zval_filter(int, int)':
> ice.c:7:6: error: dominator of 14 should be 21, not 3
> 7 | void php_zval_filter(int filter, int id1) {
>   |  ^~~
> ice.c:7:6: error: dominator of 10 should be 21, not 3
> during GIMPLE pass: vect
> dump file: a-ice.c.179t.vect
> 
> This can be simply fixed by just moving the dom update code down:
> 
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index a5202f32e27..e88948370c6 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -1845,13 +1845,7 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop 
> *loop, edge loop_exit,
>  to the original function exit we recorded.  Other exits are already
>  correct.  */
>if (multiple_exits_p)
> -   {
> - update_loop = new_loop;
> - doms = get_all_dominated_blocks (CDI_DOMINATORS, loop->header);
> - for (unsigned i = 0; i < doms.length (); ++i)
> -   if (flow_bb_inside_loop_p (loop, doms[i]))
> - doms.unordered_remove (i);
> -   }
> +   update_loop = new_loop;
>  }
>else /* Add the copy at entry.  */
>  {
> @@ -1906,6 +1900,11 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop 
> *loop, edge loop_exit,
> 
>if (multiple_exits_p)
>  {
> +  doms = get_all_dominated_blocks (CDI_DOMINATORS, loop->header);
> +  for (unsigned i = 0; i < doms.length (); ++i)
> +   if (flow_bb_inside_loop_p (loop, doms[i]))
> + doms.unordered_remove (i);
> +
>for (edge e : get_loop_exit_edges (update_loop))
> {
>   edge ex;
> 
> with that done, the next ICE comes along. Loop 1 is peeled again, but this 
> time for epilogue.
> however loop 1 no longer dominates the exits as the prologue peeled loop does.
> 
> So we don't find anything to update and ice with the second ICE:
> 
> ice.c: In function 'void php_zval_filter(int, int)':
> ice.c:7:6: error: dominator of 14 should be 2, not 21
> 7 | void php_zval_filter(int filter, int id1) {
>   |  ^~~
> ice.c:7:6: error: dominator of 10 should be 2, not 21
> during GIMPLE pass: vect
> dump file: a-ice.c.179t.vect

OK, that seems to be because the condition around the prologue doesn't
update everything for the skip_prolog case (slpeel_add_loop_guard
fails to update for multi-exits).

A more "general" dominance update in the face of multiple exits might
be something like

  for (edge alt_e : loop_exits)
{
  if (alt_e == loop_exit)
continue;
  basic_block old_dom
= get_immediate_dominator (CDI_DOMINATORS, alt_e->dest);
  if (flow_bb_inside_loop_p (loop, old_dom))
{
  auto_vec queue;
  for (auto son = first_dom_son (CDI_DOMINATORS, old_dom);
   son; son = next_dom_son (CDI_DOMINATORS, son))
if (!flow_bb_inside_loop_p (loop, son))
  queue.safe_push (son);
  for (auto son : queue)
set_immediate_dominator (CDI_DOMINATORS,
 son, get_bb_copy (old_dom));
}
}

basically we know the new most dominating place and we know where we
wire in new edges (the exits, or in the new guard the skip point
destination).  From there we can look at the old immediate dominator
which we know we have to adjust - but with multiple exits there might
be other blocks also dominated by this block and those we need to adjust
as well.  The above can be possibly split out to a helper getting
old_dom and the "new dom" (the get_bb_copy one) and a predicate
that might generally reduce to !dominated_by_p the "exit dest".


Re: [PATCH] RISC-V: Update test expectancies with recent scheduler change

2024-02-26 Thread Robin Dapp
On 2/24/24 00:10, Edwin Lu wrote:
> Given the recent change with adding the scheduler pipeline descriptions,
> many scan-dump failures emerged. Relax the expected assembler output
> conditions on the affected tests to reduce noise.

I'm not entirely sure yet about relaxing the scans like this.
There seem to be uarchs that want to minimize vsetvls under all
circumstances while others don't seem to care all that much.  We could
(not must) assume that the tests that now regress have been written
with this minimization aspect in mind and that we'd want to be sure
that we still manage to emit the minimal number of vsetvls.

Why is the new upper bound acceptable?  What if a vector_load cost
of 12 (or so) causes even more vsetvls?  The 6 in generic_ooo is more
or less arbitrary chosen.

My suggestion before was to create another sched model that has
load costs like before and run the regressing tests with that
model.  That's of course also not really ideal and actually
shoehorned a bit, in particular as no scheduling also increases
the number of vsetvls.

Juzhe: What's your intention with those tests?  I'd suppose you
want the vsetvl number to be minimal here and not higher?  Did you
plan to add a particular scheduling model or are you happy with
the default (all 1) latencies?

Regards
 Robin


[Ada] Fix PR ada/113893

2024-02-26 Thread Eric Botcazou
The finalization of objects dynamically allocated through an anonymous access 
type is deferred to the enclosing library unit in the current implementation 
and a warning is given on each of them.

However this cannot be done if the designated type is local, because this 
would generate dangling references to the local finalization routine, so
the finalization needs to be dropped in this case and the warning adjusted.

Tested on x86-64/Linux, applied on all active branches.


2024-02-26  Eric Botcazou  

PR ada/113893
* exp_ch7.adb (Build_Anonymous_Master): Do not build the master
for a local designated type.
* exp_util.adb (Build_Allocate_Deallocate_Proc): Force Needs_Fin
to false if no finalization master is attached to an access type
and assert that it is anonymous in this case.
* sem_res.adb (Resolve_Allocator): Mention that the object might
not be finalized at all in the warning given when the type is an
anonymous access-to-controlled type.


2024-02-26  Eric Botcazou  

* gnat.dg/access10.adb: New test.

-- 
Eric Botcazoudiff --git a/gcc/ada/exp_ch7.adb b/gcc/ada/exp_ch7.adb
index 2ac73101351..e594a534244 100644
--- a/gcc/ada/exp_ch7.adb
+++ b/gcc/ada/exp_ch7.adb
@@ -749,6 +749,7 @@ package body Exp_Ch7 is
   Desig_Typ : Entity_Id;
   FM_Id : Entity_Id;
   Priv_View : Entity_Id;
+  Scop  : Entity_Id;
   Unit_Decl : Node_Id;
   Unit_Id   : Entity_Id;
 
@@ -787,6 +788,18 @@ package body Exp_Ch7 is
  Desig_Typ := Priv_View;
   end if;
 
+  --  For a designated type not declared at library level, we cannot create
+  --  a finalization collection attached to an outer unit since this would
+  --  generate dangling references to the dynamic scope through access-to-
+  --  procedure values designating the local Finalize_Address primitive.
+
+  Scop := Enclosing_Dynamic_Scope (Desig_Typ);
+  if Scop /= Standard_Standard
+and then Scope_Depth (Scop) > Scope_Depth (Unit_Id)
+  then
+ return;
+  end if;
+
   --  Determine whether the current semantic unit already has an anonymous
   --  master which services the designated type.
 
diff --git a/gcc/ada/exp_util.adb b/gcc/ada/exp_util.adb
index 31cd47de7d2..04d114694ab 100644
--- a/gcc/ada/exp_util.adb
+++ b/gcc/ada/exp_util.adb
@@ -936,6 +936,16 @@ package body Exp_Util is
 Needs_Finalization (Desig_Typ)
   and then not No_Heap_Finalization (Ptr_Typ);
 
+  --  The allocation/deallocation of a controlled object must be associated
+  --  with an attachment to/detachment from a finalization master, but the
+  --  implementation cannot guarantee this property for every anonymous
+  --  access tyoe, see Build_Anonymous_Collection.
+
+  if Needs_Fin and then No (Finalization_Master (Ptr_Typ)) then
+ pragma Assert (Ekind (Ptr_Typ) = E_Anonymous_Access_Type);
+ Needs_Fin := False;
+  end if;
+
   if Needs_Fin then
 
  --  Do nothing if the access type may never allocate / deallocate
@@ -945,11 +955,6 @@ package body Exp_Util is
 return;
  end if;
 
- --  The allocation / deallocation of a controlled object must be
- --  chained on / detached from a finalization master.
-
- pragma Assert (Present (Finalization_Master (Ptr_Typ)));
-
   --  The only other kind of allocation / deallocation supported by this
   --  routine is on / from a subpool.
 
diff --git a/gcc/ada/sem_res.adb b/gcc/ada/sem_res.adb
index 8e9714c1c86..075c0d85ccd 100644
--- a/gcc/ada/sem_res.adb
+++ b/gcc/ada/sem_res.adb
@@ -5679,19 +5679,19 @@ package body Sem_Res is
Set_Is_Dynamic_Coextension (N, False);
Set_Is_Static_Coextension  (N, False);
 
-   --  Anonymous access-to-controlled objects are not finalized on
-   --  time because this involves run-time ownership and currently
-   --  this property is not available. In rare cases the object may
-   --  not be finalized at all. Warn on potential issues involving
-   --  anonymous access-to-controlled objects.
+   --  Objects allocated through anonymous access types are not
+   --  finalized on time because this involves run-time ownership
+   --  and currently this property is not available. In rare cases
+   --  the object might not be finalized at all. Warn on potential
+   --  issues involving anonymous access-to-controlled types.
 
if Ekind (Typ) = E_Anonymous_Access_Type
  and then Is_Controlled_Active (Desig_T)
then
   Error_Msg_N
-("??object designated by anonymous access object might "
+("??object designated by anonymous access value might "
  & "not be finalized until its 

Re: Ping: Re: [PATCH] libgcc: fix SEH C++ rethrow semantics [PR113337]

2024-02-26 Thread Matteo Italia

Il 26/02/24 02:41, NightStrike ha scritto:

It's mostly up to you whether you want to make the patch and test it.


I mean, the whole file has no code modifications since bd6ecbe48ada 
(2020), and that specific function is the same since it was first 
committed (bf1431e3596b, from 2012). I don't think I should make a 
separate patch for backports: the one I originally posted cherry-picks 
cleanly even to releases/gcc-4.8.0 (first release containing 
bf1431e3596b, at least according to git tag --contains), and the caller 
code is just the same as well, so I expect that technically it could be 
applied pretty much up to there without any modification.


Now, having a look at https://gcc.gnu.org/releases.html I seem to 
understand that the current "active" major releases are 11, 12 and 13. I 
would surely backport to 13, especially given that the patch was 
developed and tested on that release in first place. 11 and 12 would be 
nice too, although I made no explicit tests there; a quick check with


git diff releases/gcc-11.1.0 origin/master -- libgcc/unwind.inc 
libgcc/unwind-seh.c libstdc++-v3/libsupc++/eh_throw.cc


doesn't show any difference that is relevant for my patch. Still, if I 
find some time for that I could compile these patched releases and see 
if the patch still works correctly for extra caution.


Matteo



RE: [PATCH]middle-end: delay updating of dominators until later during vectorization. [PR114081]

2024-02-26 Thread Tamar Christina
> > The testcase shows an interesting case where we have multiple loops sharing 
> > a
> > live value and have an early exit that go to the same location.  The 
> > additional
> > complication is that on x86_64 with -mavx we seem to also do prologue 
> > peeling
> > on the loops.
> >
> > We correctly identify which BB we need their dominators updated for, but we 
> > do
> > so too early.
> >
> > Instead of adding more dominator update we can solve this by for the cases 
> > with
> > multiple exits not to verify dominators at the end of peeling if peeling for
> > vectorization.
> >
> > We can then perform the final dominator updates just before vectorization 
> > when
> > all loop transformations are done.
> 
> What's the actual CFG transform that happens between the old and the new
> place?  I see a possible edge splitting but where is the one that makes
> this patch work?

It's not one but two.
1. loop 1 is prologue peeled. This ICEs because the dominator update is only 
happening
for epilogue peeling.  Note that loop 1 here dominates 21 and the ICE is:

ice.c: In function 'void php_zval_filter(int, int)':
ice.c:7:6: error: dominator of 14 should be 21, not 3
7 | void php_zval_filter(int filter, int id1) {
  |  ^~~
ice.c:7:6: error: dominator of 10 should be 21, not 3
during GIMPLE pass: vect
dump file: a-ice.c.179t.vect

This can be simply fixed by just moving the dom update code down:

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index a5202f32e27..e88948370c6 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -1845,13 +1845,7 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop 
*loop, edge loop_exit,
 to the original function exit we recorded.  Other exits are already
 correct.  */
   if (multiple_exits_p)
-   {
- update_loop = new_loop;
- doms = get_all_dominated_blocks (CDI_DOMINATORS, loop->header);
- for (unsigned i = 0; i < doms.length (); ++i)
-   if (flow_bb_inside_loop_p (loop, doms[i]))
- doms.unordered_remove (i);
-   }
+   update_loop = new_loop;
 }
   else /* Add the copy at entry.  */
 {
@@ -1906,6 +1900,11 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop 
*loop, edge loop_exit,

   if (multiple_exits_p)
 {
+  doms = get_all_dominated_blocks (CDI_DOMINATORS, loop->header);
+  for (unsigned i = 0; i < doms.length (); ++i)
+   if (flow_bb_inside_loop_p (loop, doms[i]))
+ doms.unordered_remove (i);
+
   for (edge e : get_loop_exit_edges (update_loop))
{
  edge ex;

with that done, the next ICE comes along. Loop 1 is peeled again, but this time 
for epilogue.
however loop 1 no longer dominates the exits as the prologue peeled loop does.

So we don't find anything to update and ice with the second ICE:

ice.c: In function 'void php_zval_filter(int, int)':
ice.c:7:6: error: dominator of 14 should be 2, not 21
7 | void php_zval_filter(int filter, int id1) {
  |  ^~~
ice.c:7:6: error: dominator of 10 should be 2, not 21
during GIMPLE pass: vect
dump file: a-ice.c.179t.vect

because the prologue loop no longer dominates them due to the skip edge.  This 
is why delaying
works because we know we have to update the dominators of 14 and 10, but to 
what we don't know
yet.

Tamar

> 
> > This also means we reduce the number of dominator updates needed by at least
> > 50% and fixes the ICE.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and
> > x86_64-pc-linux-gnu no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > PR tree-optimization/114081
> > PR tree-optimization/113290
> > * tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
> > Skip dominator update when multiple exit.
> > (vect_do_peeling): Remove multiple exit dominator update.
> > * tree-vect-loop.cc (vect_transform_loop): Update dominators when
> > multiple exits.
> > * tree-vectorizer.h (LOOP_VINFO_DOMS_NEED_UPDATE,
> >  dominators_needing_update): New.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR tree-optimization/114081
> > PR tree-optimization/113290
> > * gcc.dg/vect/vect-early-break_120-pr114081.c: New test.
> > * gcc.dg/vect/vect-early-break_121-pr114081.c: New test.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_120-pr114081.c
> b/gcc/testsuite/gcc.dg/vect/vect-early-break_120-pr114081.c
> > new file mode 100644
> > index
> ..2cd4ce1e4ac573ba6e4173
> 0fd2216f0ec8061376
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_120-pr114081.c
> > @@ -0,0 +1,38 @@
> > +/* { dg-do compile } */
> > +/* { dg-add-options vect_early_break } */
> > +/* { dg-require-effective-target vect_early_break } */
> > +/* { dg-require-effective-target vect_int } */
> > +/* { 

[PATCH] Add myself to write after approval and DCO.

2024-02-26 Thread Juergen Christ
Hello,

I have added myself to write after approval and DCO.

Thanks,

Juergen Christ

ChangeLog:

* MAINTAINERS: Add myself to write after approval and DCO.

Signed-off-by: Juergen Christ 
---
 MAINTAINERS | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index cb5a42501dd2..ca6a27b4c11b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -375,6 +375,7 @@ Dehao Chen  

 Fabien Chêne   
 Clément Chigot 
 Harshit Chopra 
+Juergen Christ 
 Tamar Christina

 Eric Christopher   
 Paul Clarke
@@ -756,6 +757,7 @@ Certificate of Origin Version 1.1.  See 
https://gcc.gnu.org/dco.html for more
 information.
 
 
+Juergen Christ  
 Robin Dapp 
 Robin Dapp 
 Michal Jires   
-- 
2.39.3



[PATCH 2/2] tree-optimization/114099 - virtual LC PHIs and early exit vect

2024-02-26 Thread Richard Biener
In some cases exits can lack LC PHI nodes for the virtual operand.
We have to create them when the epilog loop requires them which also
allows us to remove some only halfway correct fixups.  This is the
variant triggering for alternate exits.

Bootstrap and regtest pending on x86_64-unknown-linux-gnu.

PR tree-optimization/114099
* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
Create and fill in a needed virtual LC PHI for the alternate
exits.  Remove code dealing with that missing.

* gcc.dg/vect/vect-early-break_120-pr114099.c: New testcase.
---
 .../vect/vect-early-break_120-pr114099.c  | 20 +++
 gcc/tree-vect-loop-manip.cc   | 35 +++
 2 files changed, 33 insertions(+), 22 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-early-break_120-pr114099.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_120-pr114099.c 
b/gcc/testsuite/gcc.dg/vect/vect-early-break_120-pr114099.c
new file mode 100644
index 000..77e47e30417
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_120-pr114099.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-add-options vect_early_break } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-O3" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+int m;
+void __attribute__((noreturn)) n();
+void t1(int jj, int l) {
+  for (int i = 1; i < l; i++)
+  {
+int p = m++;
+if (p)
+  n();
+if(jj <= i)
+  __builtin_unreachable();
+  }
+}
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 39bac1e99ef..137b053ac35 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -1667,17 +1667,18 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop 
*loop, edge loop_exit,
  alt_loop_exit_block = split_edge (exit);
if (!need_virtual_phi)
  continue;
-   if (vphi_def)
- {
-   if (!vphi)
- vphi = create_phi_node (copy_ssa_name (vphi_def),
- alt_loop_exit_block);
-   else
- /* Edge redirection might re-allocate the PHI node
-so we have to rediscover it.  */
- vphi = get_virtual_phi (alt_loop_exit_block);
-   add_phi_arg (vphi, vphi_def, exit, UNKNOWN_LOCATION);
- }
+   /* When the edge has no virtual LC PHI get at the live
+  virtual operand by other means.  */
+   if (!vphi_def)
+ vphi_def = get_live_virtual_operand_on_edge (exit);
+   if (!vphi)
+ vphi = create_phi_node (copy_ssa_name (vphi_def),
+ alt_loop_exit_block);
+   else
+ /* Edge redirection might re-allocate the PHI node
+so we have to rediscover it.  */
+ vphi = get_virtual_phi (alt_loop_exit_block);
+   add_phi_arg (vphi, vphi_def, exit, UNKNOWN_LOCATION);
  }
 
  set_immediate_dominator (CDI_DOMINATORS, new_preheader,
@@ -1789,17 +1790,7 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop 
*loop, edge loop_exit,
  if (virtual_operand_p (alt_arg))
{
  gphi *vphi = get_virtual_phi (alt_loop_exit_block);
- /* ???  When the exit yields to a path without
-any virtual use we can miss a LC PHI for the
-live virtual operand.  Simply choosing the
-one live at the start of the loop header isn't
-correct, but we should get here only with
-early-exit vectorization which will move all
-defs after the main exit, so leave a temporarily
-wrong virtual operand in place.  This happens
-for gcc.c-torture/execute/20150611-1.c  */
- if (vphi)
-   alt_arg = gimple_phi_result (vphi);
+ alt_arg = gimple_phi_result (vphi);
}
  /* For other live args we didn't create LC PHI nodes.
 Do so here.  */
-- 
2.35.3


[PATCH 1/2] tree-optimization/114068 - missed virtual LC PHI after vect peeling

2024-02-26 Thread Richard Biener
When we choose the IV exit to be one leading to no virtual use we
fail to have a virtual LC PHI even though we need it for the epilog
entry.  The following makes sure to create it so that later updating
works.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

PR tree-optimization/114068
* tree-vect-loop-manip.cc (get_live_virtual_operand_on_edge):
New function.
(slpeel_tree_duplicate_loop_to_edge_cfg): Add a virtual LC PHI
on the main exit if needed.  Remove band-aid for the case
it was missing.

* gcc.dg/vect/vect-early-break_118-pr114068.c: New testcase.
* gcc.dg/vect/vect-early-break_119-pr114068.c: Likewise.
---
 .../vect/vect-early-break_118-pr114068.c  | 23 
 .../vect/vect-early-break_119-pr114068.c  | 25 +
 gcc/tree-vect-loop-manip.cc   | 52 ++-
 3 files changed, 87 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-early-break_118-pr114068.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-early-break_119-pr114068.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_118-pr114068.c 
b/gcc/testsuite/gcc.dg/vect/vect-early-break_118-pr114068.c
new file mode 100644
index 000..b462a464b66
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_118-pr114068.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-add-options vect_early_break } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-O3" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+struct h {
+  int b;
+  int f;
+} k;
+
+void n(int m) {
+  struct h a = k;
+  for (int o = m; o; ++o) {
+if (a.f)
+  __builtin_unreachable();
+if (o > 1)
+  __builtin_unreachable();
+*( + o) = 1;
+  }
+}
diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_119-pr114068.c 
b/gcc/testsuite/gcc.dg/vect/vect-early-break_119-pr114068.c
new file mode 100644
index 000..a65ef7b8c49
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_119-pr114068.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-add-options vect_early_break } */
+/* { dg-require-effective-target vect_early_break } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-O3" } */
+
+/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
+
+struct h {
+  int b;
+  int c;
+  int f;
+} k;
+
+void n(int m) {
+  struct h a = k;
+  for (int o = m; o; ++o) {
+if (a.f)
+  __builtin_unreachable();
+if (o > 1)
+  __builtin_unreachable();
+*( + o) = 1;
+*( + o*m) = 2;
+  }
+}
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 3f974d6d839..39bac1e99ef 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -1429,6 +1429,32 @@ vect_set_loop_condition (class loop *loop, edge loop_e, 
loop_vec_info loop_vinfo
 (gimple *) cond_stmt);
 }
 
+/* Get the virtual operand live on E.  The precondition on this is valid
+   immediate dominators and an actual virtual definition dominating E.  */
+/* ???  Costly band-aid.  For the use in question we can populate a
+   live-on-exit/end-of-BB virtual operand when copying stmts.  */
+
+static tree
+get_live_virtual_operand_on_edge (edge e)
+{
+  basic_block bb = e->src;
+  do
+{
+  for (auto gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev ())
+   {
+ gimple *stmt = gsi_stmt (gsi);
+ if (gimple_vdef (stmt))
+   return gimple_vdef (stmt);
+ if (gimple_vuse (stmt))
+   return gimple_vuse (stmt);
+   }
+  if (gphi *vphi = get_virtual_phi (bb))
+   return gimple_phi_result (vphi);
+  bb = get_immediate_dominator (CDI_DOMINATORS, bb);
+}
+  while (1);
+}
+
 /* Given LOOP this function generates a new copy of it and puts it
on E which is either the entry or exit of LOOP.  If SCALAR_LOOP is
non-NULL, assume LOOP and SCALAR_LOOP are equivalent and copy the
@@ -1595,6 +1621,18 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop 
*loop, edge loop_exit,
   flush_pending_stmts (loop_exit);
   set_immediate_dominator (CDI_DOMINATORS, new_preheader, loop_exit->src);
 
+  /* If we ended up choosing an exit leading to a path not using memory
+we can end up without a virtual LC PHI.  Create it when it is
+needed because of the epilog loop continuation.  */
+  if (need_virtual_phi && !get_virtual_phi (loop_exit->dest))
+   {
+ tree header_def = gimple_phi_result (get_virtual_phi (loop->header));
+ gphi *vphi = create_phi_node (copy_ssa_name (header_def),
+   new_preheader);
+ add_phi_arg (vphi, get_live_virtual_operand_on_edge (loop_exit),
+  loop_exit, UNKNOWN_LOCATION);
+   }
+
   bool multiple_exits_p = loop_exits.length () > 1;
   

Re: [PATCH] rs6000: Don't allow immediate value in the vsx_splat pattern [PR113950]

2024-02-26 Thread Kewen.Lin
Hi,

on 2024/2/26 14:18, jeevitha wrote:
> Hi All,
> 
> The following patch has been bootstrapped and regtested on powerpc64le-linux.
> 
> There is no immediate value splatting instruction in powerpc. Currently that
> needs to be stored in a register or memory. For addressing this I have updated
> the predicate for the second operand in vsx_splat to splat_input_operand,
> which will handle the operands appropriately.

The test case fails with error message with GCC 11, but fails with ICE from GCC
12, it's kind of regression, so I think we can make such fix in this stage.

Out of curiosity, did you check why it triggers error messages on GCC 11?  I
guess the difference from GCC 12 is Bill introduced new built-in framework in
GCC12 which adds the support for the bif, but I'm curious what prevent this
being supported before that.

> 
> 2024-02-26  Jeevitha Palanisamy  
> 
> gcc/
>   PR target/113950
>   * config/rs6000/vsx.md (vsx_splat_): Updated the predicates
>   for second operand.
> 
> gcc/testsuite/
>   PR target/113950
>   * gcc.target/powerpc/pr113950.c: New testcase.
> 
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 6111cc90eb7..e5688ff972a 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -4660,7 +4660,7 @@
>  (define_expand "vsx_splat_"
>[(set (match_operand:VSX_D 0 "vsx_register_operand")
>   (vec_duplicate:VSX_D
> -  (match_operand: 1 "input_operand")))]
> +  (match_operand: 1 "splat_input_operand")))]
>"VECTOR_MEM_VSX_P (mode)"
>  {
>rtx op1 = operands[1];

This hunk actually does force_reg already:

...
  else if (!REG_P (op1))
op1 = force_reg (mode, op1);

but it's assigning to op1 unexpectedly (an omission IMHO), so just
simply fix it with:

  else if (!REG_P (op1))
-op1 = force_reg (mode, op1);
+operands[1] = force_reg (mode, op1);

instead, can you verify?

> diff --git a/gcc/testsuite/gcc.target/powerpc/pr113950.c 
> b/gcc/testsuite/gcc.target/powerpc/pr113950.c
> new file mode 100644
> index 000..29ded29f683
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr113950.c
> @@ -0,0 +1,24 @@
> +/* PR target/113950 */
> +/* { dg-do compile } */

We need an effective target to ensure vsx support, for now it's powerpc_vsx_ok.
ie: /* { dg-require-effective-target powerpc_vsx_ok } */

(most/all of its uses would be replaced with an enhanced powerpc_vsx in next 
stage 1).

BR,
Kewen


> +/* { dg-options "-O1" } */
> +
> +/* Verify we do not ICE on the following.  */
> +
> +void abort (void);
> +
> +int main ()
> +{
> +  int i;
> +  vector signed long long vsll_result, vsll_expected_result;
> +  signed long long sll_arg1;
> +
> +  sll_arg1 = 300;
> +  vsll_expected_result = (vector signed long long) {300, 300};
> +  vsll_result = __builtin_vsx_splat_2di (sll_arg1);  
> +
> +  for (i = 0; i < 2; i++)
> +if (vsll_result[i] != vsll_expected_result[i])
> +  abort();
> +
> +  return 0;
> +}
> 
> 



Re: Repost [PATCH 1/6] Add -mcpu=future

2024-02-26 Thread Kewen.Lin
on 2024/2/21 15:19, Michael Meissner wrote:
> On Tue, Feb 20, 2024 at 06:35:34PM +0800, Kewen.Lin wrote:
>> Hi Mike,
>>
>> Sorry for late reply (just back from vacation).
>>
>> on 2024/2/8 03:58, Michael Meissner wrote:
>>> On Wed, Feb 07, 2024 at 05:21:10PM +0800, Kewen.Lin wrote:
 on 2024/2/6 14:01, Michael Meissner wrote:
 Sorry for the possible confusion here, the "tune_proc" that I referred to 
 is
 the variable in the above else branch:

enum processor_type tune_proc = (TARGET_POWERPC64 ? PROCESSOR_DEFAULT64 
 : PROCESSOR_DEFAULT);

 It's either PROCESSOR_DEFAULT64 or PROCESSOR_DEFAULT, so it doesn't have a
 chance to be PROCESSOR_FUTURE, so the checking "tune_proc == 
 PROCESSOR_FUTURE"
 is useless.
>>>
>>> PROCESSOR_DEFAULT can be PROCESSOR_FUTURE if somebody configures GCC with
>>> --with-cpu=future.  While in general it shouldn't occur, it is helpful to
>>> consider all of the corner cases.
>>
>> But it sounds not true, I think you meant TARGET_CPU_DEFAULT instead?
>>
>> On one local ppc64le machine I tried to configure with --with-cpu=power10,
>> I got {,OPTION_}TARGET_CPU_DEFAULT "power10" but PROCESSOR_DEFAULT is still
>> PROCESSOR_POWER7 (PROCESSOR_DEFAULT64 is PROCESSOR_POWER8).  I think these
>> PROCESSOR_DEFAULT{,64} are defined by various headers:
> 
> Yes, I was mistaken.  You are correct TARGET_CPU_DEFAULT is set.  I will 
> change
> the comments.

Thanks!

> 
>> gcc/config/rs6000/aix71.h:#define PROCESSOR_DEFAULT PROCESSOR_POWER7
>> gcc/config/rs6000/aix71.h:#define PROCESSOR_DEFAULT64 PROCESSOR_POWER7
>> gcc/config/rs6000/aix72.h:#define PROCESSOR_DEFAULT PROCESSOR_POWER7
>> gcc/config/rs6000/aix72.h:#define PROCESSOR_DEFAULT64 PROCESSOR_POWER7
>> gcc/config/rs6000/aix73.h:#define PROCESSOR_DEFAULT PROCESSOR_POWER8
>> gcc/config/rs6000/aix73.h:#define PROCESSOR_DEFAULT64 PROCESSOR_POWER8
>> gcc/config/rs6000/darwin.h:#define PROCESSOR_DEFAULT  PROCESSOR_PPC7400
>> gcc/config/rs6000/darwin.h:#define PROCESSOR_DEFAULT64  PROCESSOR_POWER4
>> gcc/config/rs6000/freebsd64.h:#define PROCESSOR_DEFAULT PROCESSOR_PPC7450
>> gcc/config/rs6000/freebsd64.h:#define PROCESSOR_DEFAULT64 PROCESSOR_POWER8
>> gcc/config/rs6000/linux64.h:#define PROCESSOR_DEFAULT PROCESSOR_POWER7
>> gcc/config/rs6000/linux64.h:#define PROCESSOR_DEFAULT64 PROCESSOR_POWER8
>> gcc/config/rs6000/rs6000.h:#define PROCESSOR_DEFAULT   PROCESSOR_PPC603
>> gcc/config/rs6000/rs6000.h:#define PROCESSOR_DEFAULT64 PROCESSOR_RS64A
>> gcc/config/rs6000/vxworks.h:#define PROCESSOR_DEFAULT PROCESSOR_PPC604
>>
>> , and they are unlikely to be updated later, no?
>>
>> btw, the given --with-cpu=future will make cpu_index never negative so
>>
>>   ...
>>   else if (cpu_index >= 0)
>> rs6000_tune_index = tune_index = cpu_index;
>>   else
>> ... 
>>
>> so there is no chance to enter "else" arm, that is, that arm only takes
>> effect when no cpu/tune is given (neither -m{cpu,tune} nor --with-cpu=).
> 
> Note, this is existing code.  I didn't modify it.  If we want to change it, we
> should do it as another patch.

Yes, I agree.  Just to clarify, I didn't suggest changing it but instead
suggested almost keeping them, since we don't need any changes in "else"
arm, so instead of updating in arms "if" and "else if" for "future cpu type",
it seems a bit more clear to just check it after this, ie.:



bool explicit_tune = false;
if (rs6000_tune_index >= 0)
  {
tune_index = rs6000_tune_index;
explicit_tune = true;
  }
else if (cpu_index >= 0)
  // as before
  rs6000_tune_index = tune_index = cpu_index;
else
  {
   //as before
   ...
  }

// Check tune_index here instead.

if (processor_target_table[tune_index].processor == PROCESSOR_FUTURE)
  {
tune_index = rs6000_cpu_index_lookup (PROCESSOR_POWER10);
if (explicit_tune)
  warn ...
  }

// as before
rs6000_tune = processor_target_table[tune_index].processor;



, copied from previous comment: 
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643681.html

BR,
Kewen



Re: [PATCH]middle-end: delay updating of dominators until later during vectorization. [PR114081]

2024-02-26 Thread Richard Biener
On Sun, 25 Feb 2024, Tamar Christina wrote:

> Hi All,
> 
> The testcase shows an interesting case where we have multiple loops sharing a
> live value and have an early exit that go to the same location.  The 
> additional
> complication is that on x86_64 with -mavx we seem to also do prologue peeling
> on the loops.
> 
> We correctly identify which BB we need their dominators updated for, but we do
> so too early.
> 
> Instead of adding more dominator update we can solve this by for the cases 
> with
> multiple exits not to verify dominators at the end of peeling if peeling for
> vectorization.
> 
> We can then perform the final dominator updates just before vectorization when
> all loop transformations are done.

What's the actual CFG transform that happens between the old and the new
place?  I see a possible edge splitting but where is the one that makes
this patch work?

> This also means we reduce the number of dominator updates needed by at least
> 50% and fixes the ICE.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and
> x86_64-pc-linux-gnu no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/114081
>   PR tree-optimization/113290
>   * tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
>   Skip dominator update when multiple exit.
>   (vect_do_peeling): Remove multiple exit dominator update.
>   * tree-vect-loop.cc (vect_transform_loop): Update dominators when
>   multiple exits.
>   * tree-vectorizer.h (LOOP_VINFO_DOMS_NEED_UPDATE,
>dominators_needing_update): New.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/114081
>   PR tree-optimization/113290
>   * gcc.dg/vect/vect-early-break_120-pr114081.c: New test.
>   * gcc.dg/vect/vect-early-break_121-pr114081.c: New test.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_120-pr114081.c 
> b/gcc/testsuite/gcc.dg/vect/vect-early-break_120-pr114081.c
> new file mode 100644
> index 
> ..2cd4ce1e4ac573ba6e41730fd2216f0ec8061376
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_120-pr114081.c
> @@ -0,0 +1,38 @@
> +/* { dg-do compile } */
> +/* { dg-add-options vect_early_break } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-O3" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +typedef struct filter_list_entry {
> +  const char *name;
> +  int id;
> +  void (*function)();
> +} filter_list_entry;
> +
> +static const filter_list_entry filter_list[9] = {0};
> +
> +void php_zval_filter(int filter, int id1) {
> +  filter_list_entry filter_func;
> +
> +  int size = 9;
> +  for (int i = 0; i < size; ++i) {
> +if (filter_list[i].id == filter) {
> +  filter_func = filter_list[i];
> +  goto done;
> +}
> +  }
> +
> +#pragma GCC novector
> +  for (int i = 0; i < size; ++i) {
> +if (filter_list[i].id == 0x0204) {
> +  filter_func = filter_list[i];
> +  goto done;
> +}
> +  }
> +done:
> +  if (!filter_func.id)
> +filter_func.function();
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c 
> b/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> new file mode 100644
> index 
> ..feebdb7a6c9b8981d7be31dd1c741f9e36738515
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_121-pr114081.c
> @@ -0,0 +1,37 @@
> +/* { dg-do compile } */
> +/* { dg-add-options vect_early_break } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-O3" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +typedef struct filter_list_entry {
> +  const char *name;
> +  int id;
> +  void (*function)();
> +} filter_list_entry;
> +
> +static const filter_list_entry filter_list[9] = {0};
> +
> +void php_zval_filter(int filter, int id1) {
> +  filter_list_entry filter_func;
> +
> +  int size = 9;
> +  for (int i = 0; i < size; ++i) {
> +if (filter_list[i].id == filter) {
> +  filter_func = filter_list[i];
> +  goto done;
> +}
> +  }
> +
> +  for (int i = 0; i < size; ++i) {
> +if (filter_list[i].id == 0x0204) {
> +  filter_func = filter_list[i];
> +  goto done;
> +}
> +  }
> +done:
> +  if (!filter_func.id)
> +filter_func.function();
> +}
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index 
> 3f974d6d839e32516ae316f28ca25316e43d7d86..b5e158bc5cfb5107d5ff461e489d306f81e090d0
>  100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -1917,7 +1917,6 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop 
> *loop, edge loop_exit,
> doms.safe_push (e->dest);
>   }
>  
> -  

Re: [PATCH] x86: Properly implement AMX-TILE load/store intrinsics

2024-02-26 Thread H.J. Lu
On Sun, Feb 25, 2024 at 8:25 PM H.J. Lu  wrote:
>
> On Sun, Feb 25, 2024 at 7:03 PM Hongtao Liu  wrote:
> >
> > On Mon, Feb 26, 2024 at 10:37 AM H.J. Lu  wrote:
> > >
> > > On Sun, Feb 25, 2024 at 6:03 PM Hongtao Liu  wrote:
> > > >
> > > > On Mon, Feb 26, 2024 at 5:11 AM H.J. Lu  wrote:
> > > > >
> > > > > ldtilecfg and sttilecfg take a 512-byte memory block.  With
> > > > > _tile_loadconfig implemented as
> > > > >
> > > > > extern __inline void
> > > > > __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> > > > > _tile_loadconfig (const void *__config)
> > > > > {
> > > > >   __asm__ volatile ("ldtilecfg\t%X0" :: "m" (*((const void 
> > > > > **)__config)));
> > > > > }
> > > > >
> > > > > GCC sees:
> > > > >
> > > > > (parallel [
> > > > >   (asm_operands/v ("ldtilecfg   %X0") ("") 0
> > > > >[(mem/f/c:DI (plus:DI (reg/f:DI 77 virtual-stack-vars)
> > > > >  (const_int -64 [0xffc0])) [1 
> > > > > MEM[(const void * *)_data]+0 S8 A128])]
> > > > >[(asm_input:DI ("m"))]
> > > > >(clobber (reg:CC 17 flags))])
> > > > >
> > > > > and the memory operand size is 1 byte.  As the result, the rest of 511
> > > > > bytes is ignored by GCC.  Implement ldtilecfg and sttilecfg intrinsics
> > > > > with a pointer to BLKmode to honor the 512-byte memory block.
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > > PR target/114098
> > > > > * config/i386/amxtileintrin.h (_tile_loadconfig): Use
> > > > > __builtin_ia32_ldtilecfg.
> > > > > (_tile_storeconfig): Use __builtin_ia32_sttilecfg.
> > > > > * config/i386/i386-builtin.def (BDESC): Add
> > > > > __builtin_ia32_ldtilecfg and __builtin_ia32_sttilecfg.
> > > > > * config/i386/i386-expand.cc (ix86_expand_builtin): Handle
> > > > > IX86_BUILTIN_LDTILECFG and IX86_BUILTIN_STTILECFG.
> > > > > * config/i386/i386.md (ldtilecfg): New pattern.
> > > > > (sttilecfg): Likewise.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > > PR target/114098
> > > > > * gcc.target/i386/amxtile-4.c: New test.
> > > > > ---
> > > > >  gcc/config/i386/amxtileintrin.h   |  4 +-
> > > > >  gcc/config/i386/i386-builtin.def  |  4 ++
> > > > >  gcc/config/i386/i386-expand.cc| 19 
> > > > >  gcc/config/i386/i386.md   | 24 ++
> > > > >  gcc/testsuite/gcc.target/i386/amxtile-4.c | 55 
> > > > > +++
> > > > >  5 files changed, 104 insertions(+), 2 deletions(-)
> > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/amxtile-4.c
> > > > >
> > > > > diff --git a/gcc/config/i386/amxtileintrin.h 
> > > > > b/gcc/config/i386/amxtileintrin.h
> > > > > index d1a26e0fea5..5081b326498 100644
> > > > > --- a/gcc/config/i386/amxtileintrin.h
> > > > > +++ b/gcc/config/i386/amxtileintrin.h
> > > > > @@ -39,14 +39,14 @@ extern __inline void
> > > > >  __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> > > > >  _tile_loadconfig (const void *__config)
> > > > >  {
> > > > > -  __asm__ volatile ("ldtilecfg\t%X0" :: "m" (*((const void 
> > > > > **)__config)));
> > > > > +  __builtin_ia32_ldtilecfg (__config);
> > > > >  }
> > > > >
> > > > >  extern __inline void
> > > > >  __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> > > > >  _tile_storeconfig (void *__config)
> > > > >  {
> > > > > -  __asm__ volatile ("sttilecfg\t%X0" : "=m" (*((void **)__config)));
> > > > > +  __builtin_ia32_sttilecfg (__config);
> > > > >  }
> > > > >
> > > > >  extern __inline void
> > > > > diff --git a/gcc/config/i386/i386-builtin.def 
> > > > > b/gcc/config/i386/i386-builtin.def
> > > > > index 729355230b8..88dd7f8857f 100644
> > > > > --- a/gcc/config/i386/i386-builtin.def
> > > > > +++ b/gcc/config/i386/i386-builtin.def
> > > > > @@ -126,6 +126,10 @@ BDESC (OPTION_MASK_ISA_XSAVES | 
> > > > > OPTION_MASK_ISA_64BIT, 0, CODE_FOR_nothing, "__b
> > > > >  BDESC (OPTION_MASK_ISA_XSAVES | OPTION_MASK_ISA_64BIT, 0, 
> > > > > CODE_FOR_nothing, "__builtin_ia32_xrstors64", IX86_BUILTIN_XRSTORS64, 
> > > > > UNKNOWN, (int) VOID_FTYPE_PVOID_INT64)
> > > > >  BDESC (OPTION_MASK_ISA_XSAVEC | OPTION_MASK_ISA_64BIT, 0, 
> > > > > CODE_FOR_nothing, "__builtin_ia32_xsavec64", IX86_BUILTIN_XSAVEC64, 
> > > > > UNKNOWN, (int) VOID_FTYPE_PVOID_INT64)
> > > > >
> > > > > +/* LDFILECFG and STFILECFG.  */
> > > > > +BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AMX_TILE, 
> > > > > CODE_FOR_ldtilecfg, "__builtin_ia32_ldtilecfg", 
> > > > > IX86_BUILTIN_LDTILECFG, UNKNOWN, (int) VOID_FTYPE_PCVOID)
> > > > > +BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_AMX_TILE, 
> > > > > CODE_FOR_ldtilecfg, "__builtin_ia32_sttilecfg", 
> > > > > IX86_BUILTIN_STTILECFG, UNKNOWN, (int) VOID_FTYPE_PVOID)
> > > > CODE_FOR_sttilecfg.
> > >
> > > It is unused.  I changed both to CODE_FOR_nothing.
> > >
> > > > > +
> > > > >  /* SSE */
> > > > >  BDESC (OPTION_MASK_ISA_SSE, 0, 

[patch,avr] PR114100 : Better indirect addressing on reduced cores

2024-02-26 Thread Georg-Johann Lay



A description of what the patch does follows in the commit message below.

On ATmega128, there are no changes in test results.

On ATtiny40 (reduced core) there are no new execution fails,
but apart from that there is quite some noise in the delta:

unsupported (memory full) -> pass
unsupported (memory full) -> fail due to unresolved symbol (printf, 
float, ...)

unsupported (memory full) -> fail (excess errors) this is because the
testsuite is far from being diagnostic-clean.

All these transitions are because the code size shrinks a lot,
sometimes 20% or more.

When there are no objections or improvements, I would go ahead
and install it so it can go into v14.

Johann

--

The Reduced Tiny core does not support indirect addressing with offset,
which basically means that every indirect memory access with a size
of more than one byte is effectively POST_INC or PRE_DEC.  The lack of
that addressing mode is currently handled by pretending to support it,
and then let the insn printers add and subtract again offsets as needed.
For example, the following C code

   int vars[10];

   void inc_var2 (void) {
  ++vars[2];
   }

is compiled to:

   ldi r30,lo8(vars) ;  14   [c=4 l=2]  *movhi/4
   ldi r31,hi8(vars)
   subi r30,lo8(-(4));  15   [c=8 l=6]  *movhi/2
   sbci r31,hi8(-(4))
   ld r20,Z+
   ld r21,Z
   subi r30,lo8((4+1))
   sbci r31,hi8((4+1))
   subi r20,-1 ;  16   [c=4 l=2]  *addhi3_clobber/1
   sbci r21,-1
   subi r30,lo8(-(4+1));  17   [c=4 l=4]  *movhi/3
   sbci r31,hi8(-(4+1))
   st Z,r21
   st -Z,r20

where the code could be -- and with this patch actually is -- like

   ldi r30,lo8(vars+4);  28   [c=4 l=2]  *movhi/4
   ldi r31,hi8(vars+4)
   ld r20,Z+  ;  17   [c=8 l=2]  *movhi/2
   ld r21,Z+
   subi r20,-1;  19   [c=4 l=2]  *addhi3_clobber/1
   sbci r21,-1
   st -Z,r21  ;  30   [c=4 l=2]  *movhi/3
   st -Z,r20

This is achieved in two steps:

- A post-reload split into "real" instructions during .split2.
- A new avr-specific mini pass .avr-fuse-add that runs before
  RTL peephole and that tries to combine the generated pointer
  additions into memory accesses to form POST_INC or PRE_DEC.

gcc/
PR target/114100
* doc/invoke.texi (AVR Options) <-mfuse-add>: Document.
* config/avr/avr.opt (-mfuse-add=): New target option.
* common/config/avr/avr-common.cc (avr_option_optimization_table)
[OPT_LEVELS_1_PLUS]: Set -mfuse-add=1.
[OPT_LEVELS_2_PLUS]: Set -mfuse-add=2.
* config/avr/avr-passes.def (avr_pass_fuse_add): Insert new pass.
* config/avr/avr-protos.h (avr_split_tiny_move)
(make_avr_pass_fuse_add): New protos.
* config/avr/avr.md [AVR_TINY]: New post-reload splitter uses
avr_split_tiny_move to split indirect memory accesses.
(gen_move_clobbercc): New define_expand helper.
* config/avr/avr.cc (avr_pass_data_fuse_add): New pass data.
(avr_pass_fuse_add): New class from rtl_opt_pass.
(make_avr_pass_fuse_add, avr_split_tiny_move): New functions.
(reg_seen_between_p, emit_move_ccc, emit_move_ccc_after): New functions.
(avr_legitimate_address_p) [AVR_TINY]: Don't restrict offsets
of PLUS addressing for AVR_TINY.
(avr_regno_mode_code_ok_for_base_p) [AVR_TINY]: Ignore -mstrict-X.
(avr_out_plus_1) [AVR_TINY]: Tweak ++Y and --Y.
(avr_mode_code_base_reg_class) [AVR_TINY]: Always return POINTER_REGS.

 gcc/common/config/avr/avr-common.cc |   2 +
 gcc/config/avr/avr-passes.def   |   9 ++
 gcc/config/avr/avr-protos.h |   2 +
 gcc/config/avr/avr.cc   | 787 
-

 gcc/config/avr/avr.md   |  29 +
 gcc/config/avr/avr.opt  |   8 ++
 gcc/doc/invoke.texi |  10 +-
 7 files changed, 845 insertions(+), 2 deletions(-)diff --git a/gcc/common/config/avr/avr-common.cc b/gcc/common/config/avr/avr-common.cc
index 7867483909d..fdf130f1e1a 100644
--- a/gcc/common/config/avr/avr-common.cc
+++ b/gcc/common/config/avr/avr-common.cc
@@ -34,6 +34,8 @@ static const struct default_options avr_option_optimization_table[] =
 { OPT_LEVELS_ALL, OPT_fcaller_saves, NULL, 0 },
 { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_mgas_isr_prologues, NULL, 1 },
 { OPT_LEVELS_1_PLUS, OPT_mmain_is_OS_task, NULL, 1 },
+{ OPT_LEVELS_1_PLUS, OPT_mfuse_add_, NULL, 1 },
+{ OPT_LEVELS_2_PLUS, OPT_mfuse_add_, NULL, 2 },
 // Stick to the "old" placement of the subreg lowering pass.
 { OPT_LEVELS_1_PLUS, OPT_fsplit_wide_types_early, NULL, 1 },
 /* Allow optimizer to introduce store data races. This used to be the
diff --git a/gcc/config/avr/avr-passes.def b/gcc/config/avr/avr-passes.def
index 34e5b95f920..748260edaef 100644
--- a/gcc/config/avr/avr-passes.def
+++ b/gcc/config/avr/avr-passes.def
@@ -17,6 +17,15 @@
along with 

Re: [PATCH 1/2] LoongArch: NFC: Deduplicate crc instruction defines

2024-02-26 Thread chenglulu

LGTM!

Thanks!

在 2024/2/26 下午12:28, Xi Ruoyao 写道:

Introduce an iterator for UNSPEC_CRC and UNSPEC_CRCC to make the next
change easier.

gcc/ChangeLog:

* config/loongarch/loongarch.md (CRC): New define_int_iterator.
(crc): New define_int_attr.
(loongarch_crc_w__w, loongarch_crcc_w__w): Unify
into ...
(loongarch__w__w): ... here.
---
  gcc/config/loongarch/loongarch.md | 18 +-
  1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.md 
b/gcc/config/loongarch/loongarch.md
index 2ce7a151880..4ded1b3a117 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -4251,24 +4251,16 @@ (define_peephole2
  
  
  (define_mode_iterator QHSD [QI HI SI DI])

+(define_int_iterator CRC [UNSPEC_CRC UNSPEC_CRCC])
+(define_int_attr crc [(UNSPEC_CRC "crc") (UNSPEC_CRCC "crcc")])
  
-(define_insn "loongarch_crc_w__w"

+(define_insn "loongarch__w__w"
[(set (match_operand:SI 0 "register_operand" "=r")
(unspec:SI [(match_operand:QHSD 1 "register_operand" "r")
   (match_operand:SI 2 "register_operand" "r")]
-UNSPEC_CRC))]
+CRC))]
""
-  "crc.w..w\t%0,%1,%2"
-  [(set_attr "type" "unknown")
-   (set_attr "mode" "")])
-
-(define_insn "loongarch_crcc_w__w"
-  [(set (match_operand:SI 0 "register_operand" "=r")
-   (unspec:SI [(match_operand:QHSD 1 "register_operand" "r")
-  (match_operand:SI 2 "register_operand" "r")]
-UNSPEC_CRCC))]
-  ""
-  "crcc.w..w\t%0,%1,%2"
+  ".w..w\t%0,%1,%2"
[(set_attr "type" "unknown")
 (set_attr "mode" "")])
  




Re: [PATCH]middle-end: update vuses out of loop which use a vdef that's moved [PR114068]

2024-02-26 Thread Richard Biener
On Fri, 23 Feb 2024, Tamar Christina wrote:

> Hi All,
> 
> In certain cases we can have a situation where the merge block has a vUSE
> virtual PHI and the exits do not.  In this case for instance the exits lead
> to an abort so they have no virtual PHIs.  If we have a store before the first
> exit and we move it to a later block during vectorization we update the vUSE
> chain.
> 
> However the merge block is not an exit and is not visited by the update code.
> 
> This patch fixes it by checking during moving if there are any out of loop 
> uses
> of the vDEF that is the last_seen_vuse.  Normally there wouldn't be any and
> things are skipped, but if there is then update that to the last vDEF in the
> exit block.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and
> x86_64-pc-linux-gnu no issues.
> 
> Ok for master?

I think the problem is rather that BB13 misses the LC virtual PHI which
it requires because of the merge block.  So this should be fixed during
peeling instead.  There we assume the IV exit we choose will have
a virtual LC PHI if the loop needs one (but we choose the exit going
to unreachable () as IV exit ...).  This assumption breaks down in this
case.

Let me try to fix this.

Richard.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR tree-optimizations/114068
>   * tree-vect-loop.cc (move_early_exit_stmts): Update vUSE chain in merge
>   block.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimizations/114068
>   * gcc.dg/vect/vect-early-break_118-pr114068.c: New test.
>   * gcc.dg/vect/vect-early-break_119-pr114068.c: New test.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_118-pr114068.c 
> b/gcc/testsuite/gcc.dg/vect/vect-early-break_118-pr114068.c
> new file mode 100644
> index 
> ..b462a464b6603e718c5a283513ea586fc13e37ce
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_118-pr114068.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-add-options vect_early_break } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-O3" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +struct h {
> +  int b;
> +  int f;
> +} k;
> +
> +void n(int m) {
> +  struct h a = k;
> +  for (int o = m; o; ++o) {
> +if (a.f)
> +  __builtin_unreachable();
> +if (o > 1)
> +  __builtin_unreachable();
> +*( + o) = 1;
> +  }
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_119-pr114068.c 
> b/gcc/testsuite/gcc.dg/vect/vect-early-break_119-pr114068.c
> new file mode 100644
> index 
> ..a65ef7b8c4901b2ada585f38fda436dc07d1e1de
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_119-pr114068.c
> @@ -0,0 +1,25 @@
> +/* { dg-do compile } */
> +/* { dg-add-options vect_early_break } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-O3" } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +struct h {
> +  int b;
> +  int c;
> +  int f;
> +} k;
> +
> +void n(int m) {
> +  struct h a = k;
> +  for (int o = m; o; ++o) {
> +if (a.f)
> +  __builtin_unreachable();
> +if (o > 1)
> +  __builtin_unreachable();
> +*( + o) = 1;
> +*( + o*m) = 2;
> +  }
> +}
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 
> 35f1f8c7d4245135ace740ff9be548919587..44bd8032b55b1ef84fdf4fa9d6117304b7709d6f
>  100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -11837,6 +11837,27 @@ move_early_exit_stmts (loop_vec_info loop_vinfo)
>update_stmt (p);
>  }
>  
> +  /* last_seen_vuse should now be the PHI in the loop header.  Check for
> + any out of loop uses and update them to the vUSE on the loop latch.  */
> +  auto vuse_stmt =  loop_vinfo->lookup_def (last_seen_vuse);
> +  gphi *vuse_def;
> +  if (vuse_stmt
> +  && (vuse_def = dyn_cast  (STMT_VINFO_STMT (vuse_stmt
> +{
> +  imm_use_iterator iter;
> +  use_operand_p use_p;
> +  gimple *use_stmt;
> +  auto loop = LOOP_VINFO_LOOP (loop_vinfo);
> +  tree vuse = PHI_ARG_DEF_FROM_EDGE (vuse_def, loop_latch_edge (loop));
> +  FOR_EACH_IMM_USE_STMT (use_stmt, iter, last_seen_vuse)
> + {
> +   if (flow_bb_inside_loop_p (loop, use_stmt->bb))
> + continue;
> +   FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
> + SET_USE (use_p, vuse);
> + }
> +}
> +
>/* And update the LC PHIs on exits.  */
>for (edge e : get_loop_exit_edges (LOOP_VINFO_LOOP  (loop_vinfo)))
>  if (!dominated_by_p (CDI_DOMINATORS, e->src, dest_bb))
> 
> 
> 
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG 

[COMMITTED] testsuite: Fix gcc.dg/attr-weakref-1.c on Solaris/x86 with as [PR70582]

2024-02-26 Thread Rainer Orth
gcc.dg/attr-weakref-1.c FAILs on 32 and 64-bit Solaris/x86 with the
native assembler:

FAIL: gcc.dg/attr-weakref-1.c (test for excess errors)
UNRESOLVED: gcc.dg/attr-weakref-1.c compilation failed to produce executable

Excess errors:
Assembler: attr-weakref-1.c
"/var/tmp//ccUSaysF.s", line 171 : Multiply defined symbol: "Wv3a"

This is a bug in the native as, which isn't seeing fixes recently.

Since only a single subtest is affected, this patch omits that one.

Tested on i386-pc-solaris2.11 (as and gas) and x86_64-pc-linux-gnu.

Committed to trunk.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2024-02-24  Rainer Orth  

gcc/testsuite:
PR ipa/70582
* gcc.dg/attr-weakref-1.c (dg-additional-options): Define
SOLARIS_X86_AS as appropriate.
(lv3, Wv3a, pv3a): Wrap in !SOLARIS_X86_AS.
(main): Likewise for chk (pv3a).

# HG changeset patch
# Parent  c2e97b9d632bb04de7c4c87d39b7e813f640f9f9
testsuite: Fix gcc.dg/attr-weakref-1.c on Solaris/x86 with as [PR70582]

diff --git a/gcc/testsuite/gcc.dg/attr-weakref-1.c b/gcc/testsuite/gcc.dg/attr-weakref-1.c
--- a/gcc/testsuite/gcc.dg/attr-weakref-1.c
+++ b/gcc/testsuite/gcc.dg/attr-weakref-1.c
@@ -14,6 +14,8 @@
 // { dg-options "-O2" }
 // { dg-additional-options "-Wl,-undefined,dynamic_lookup" { target *-*-darwin* } }
 // { dg-additional-options "-Wl,-flat_namespace" { target *-*-darwin[89]* } }
+// One subtest doesn't assemble with the Solaris/x86 as (PR ipa/70582)
+// { dg-additional-options "-DSOLARIS_X86_AS" { target { *86*-*-solaris2* && { ! gas } } } }
 // { dg-additional-sources "attr-weakref-1a.c" }
 
 // Copyright 2005 Free Software Foundation, Inc.
@@ -46,9 +48,11 @@ vtype gv2;
 static vtype Wv2a __attribute__((weakref ("gv2")));
 static vtype *pv2a USED = 
 
+#if !defined SOLARIS_X86_AS
 static vtype lv3;
 static vtype Wv3a __attribute__((weakref ("lv3")));
 static vtype *pv3a USED = 
+#endif
 
 extern vtype uv4;
 static vtype Wv4a __attribute__((weakref ("uv4")));
@@ -192,7 +196,9 @@ extern ftype wf14 __attribute__((weak));
 int main () {
   chk (!pv1a);
   chk (pv2a);
+#if !defined(SOLARIS_X86_AS)
   chk (pv3a);
+#endif
   chk (pv4a);
   chk (pv4);
   chk (pv5a);


[COMMITTED] testsuite: xfail gcc.c-torture/compile/pr61159.c on Solaris/x86 with as [PR61159]

2024-02-26 Thread Rainer Orth
gcc.c-torture/compile/pr61159.c currently FAILs on 32 and 64-bit
Solaris/x86 with the native assembler:

FAIL: gcc.c-torture/compile/pr61159.c   -O0  (test for excess errors)
FAIL: gcc.c-torture/compile/pr61159.c   -O1  (test for excess errors)
FAIL: gcc.c-torture/compile/pr61159.c   -O2  (test for excess errors)
FAIL: gcc.c-torture/compile/pr61159.c   -O2 -flto  (test for excess errors)
FAIL: gcc.c-torture/compile/pr61159.c   -O2 -flto -flto-partition=none  (test 
for excess errors)
FAIL: gcc.c-torture/compile/pr61159.c   -O3 -g  (test for excess errors)
FAIL: gcc.c-torture/compile/pr61159.c   -Os  (test for excess errors)

Excess errors:
Assembler: pr61159.c
"/var/tmp//ccRtFPva.s", line 5 : Cannot set a weak symbol to a common 
symbol

This is a bug/limitation in the native assembler.  Given that this
hasn't seen fixes for a long time, this patch xfails the test.

Tested on i386-pc-solaris2.11 (as and gas) and x86_64-pc-linux-gnu.

Committed to trunk.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2024-02-24  Rainer Orth  

gcc/testsuite:
PR ipa/61159
* gcc.c-torture/compile/pr61159.c: xfail on Solaris/x86 with as.

# HG changeset patch
# Parent  32a7094d0d8c1c287ab2dce6f79e2ad1530d1113
testsuite: xfail gcc.c-torture/compile/pr61159.c on Solaris/x86 with as [PR61159]

diff --git a/gcc/testsuite/gcc.c-torture/compile/pr61159.c b/gcc/testsuite/gcc.c-torture/compile/pr61159.c
--- a/gcc/testsuite/gcc.c-torture/compile/pr61159.c
+++ b/gcc/testsuite/gcc.c-torture/compile/pr61159.c
@@ -1,6 +1,6 @@
 /* { dg-require-alias "" } */
 /* { dg-require-weak "" } */
-/* { dg-xfail-if "weak alias" { powerpc-ibm-aix* } } */
+/* { dg-xfail-if "weak alias" { powerpc-ibm-aix* || { *86*-*-solaris* && { ! gas } } } } */
 
 static int dummy = 0;
 extern int foo __attribute__((__weak__, __alias__("dummy")));


Re: Patch ping^2

2024-02-26 Thread Uros Bizjak
On Mon, Feb 26, 2024 at 10:33 AM Jakub Jelinek  wrote:
>
> Hi!
>
> I'd like to ping 2 patches:

> https://gcc.gnu.org/pipermail/gcc-patches/2024-February/645326.html
> i386: Enable _BitInt support on ia32
>
> all the FAILs mentioned in that mail have been fixed by now.

LGTM, based on HJ's advice.

Uros.


Patch ping^2

2024-02-26 Thread Jakub Jelinek
Hi!

I'd like to ping 2 patches:

https://gcc.gnu.org/pipermail/gcc-patches/2024-January/644580.html  

  
PR113617 P1 - Handle private COMDAT function symbol reference in readonly data 
section 
   


  
More details in the 

  
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/thread.html#644121   

  
and 

  
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/thread.html#644486   

  
threads.

  

and

https://gcc.gnu.org/pipermail/gcc-patches/2024-February/645326.html
i386: Enable _BitInt support on ia32

all the FAILs mentioned in that mail have been fixed by now.

Thanks

Jakub



Re: [wwwdocs] Add Ada's GCC 14 changelog entry

2024-02-26 Thread Marc Poulhiès


Fernando Oleo Blanco  writes:

> Dear all,
>
> just like last year, I would like to commit the changes that took place
> over at GNAT for GCC v14. The patch is attached to the email. Hopefully
> it is good enough to just be added to master. If you see something wrong
> or if you would like to add anything to it, feel free :) Feedback is
> always welcomed.

Fernando,

Thank you for this work! I have a few comments, see below.

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index 85ccc54d..e6c96c9f 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -171,7 +171,49 @@ a work-in-progress.
 
 New Languages and Language specific improvements

-
+Ada
+
+
+  Several new aspects and contracts have been implemented:

Maybe worth noting that these are implementation defined aspects.

+
+  Exceptional_Cases may be specified for procedures and
+  functions with side effects; it can be used to list exceptions that might
+  be propagated by the subprogram with side effects in the context of its
+  precondition, and associate them with a specific postcondition. For more
+  information, refer to SPARK 2014 Reference Manual, section 6.1.9.
+  User_Aspect takes an argument that is the name of an
+  aspect defined by a User_Aspect_Definition configuration pragma.
+  Local_Restrictions is used to specify that a particular
+  subprogram does not violate one or more local restrictions, nor can it
+  call a subprogram that is not subject to the same requirements.
+  Side_Effects is equivalent to pragma
+  Side_Effecs.
+  Always_Terminates is a boolean equivalent to 
pragma
+  Always_Terminates
+  Ghost_Predicate

It looks like Ghost_Predicate is missing some text here.

It may be a good thing to link to the actual documentation for these
options. Thanks to some documention changes, we can now link to
an option directly. For example:

https://gcc.gnu.org/onlinedocs/gnat_rm/Implementation-Defined-Pragmas.html

You would need to point to the correct version (this one points to
current devel version).

+
+  
+  The new attributes and contracts have been applied to the relevant parts
+of the Ada library and more code has been proven to be correct.
+  Initial support for the
+  https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/;>CHERI
+  architecture.
+  Support for the LoongArch architecture.
+  Hardening improvements:
+
+  Use of the new -fharden* options. Most
+  notably -fharden-compares,
+  -fharden-conditional-branches and
+  -fharden-control-flow-redundancy.
+  Custom bools with higher Hamming distance.
+  The strub attribute has been added for functions and

Same as above for doc links:

https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html#index-fharden-compares

+  variables in order to automatically zero-out their stack upon use or
+  return.
+
+  
+  Further clean up and improvements to the GNAT code.
+  Support for vxWorks 7 Cert RTP has been removed.
+

 


Re: [PATCH v2] Do not emulate vectors containing floats.

2024-02-26 Thread Richard Biener
On Mon, 26 Feb 2024, Jakub Jelinek wrote:

> On Mon, Feb 26, 2024 at 09:53:41AM +0100, Richard Biener wrote:
> > On Mon, 26 Feb 2024, Jakub Jelinek wrote:
> > 
> > > On Mon, Feb 26, 2024 at 09:00:58AM +0100, Richard Biener wrote:
> > > > > > @@ -6756,7 +6756,8 @@ vectorizable_operation (vec_info *vinfo,
> > > > > >  those through even when the mode isn't word_mode.  For
> > > > > >  ops we have to lower the lowering code assumes we are
> > > > > >  dealing with word_mode.  */
> > > > > > -  if code == PLUS_EXPR || code == MINUS_EXPR || code == 
> > > > > > NEGATE_EXPR)
> > > > > > +  if (!INTEGRAL_TYPE_P (TREE_TYPE (vectype))
> > > > > > + || (((code == PLUS_EXPR || code == MINUS_EXPR || code == 
> > > > > > NEGATE_EXPR)
> > > > > > || !target_support_p)
> > > > > >&& maybe_ne (GET_MODE_SIZE (vec_mode), UNITS_PER_WORD))
> > > > > >   /* Check only during analysis.  */
> > > > 
> > > > I think it will work fine.  Even after the last TLC this feels like in
> > > > the need of more TLC ;)
> > > 
> > > Note, at least in theory, floating point NEGATE_EXPR could be handled just
> > > fine in the emulated vectors, just xor the sign bit, and
> > > BIT_{AND,IOR,XOR}_EXPR also would work but likely aren't valid IL on
> > > floating point modes (though e.g. in RTL they are used even for them),
> > 
> > Hmm, I think vector types not supported only ever get integer modes
> > assigned, not FP modes so the operations would be on integer modes.
> 
> I mean one could still SLP vectorize using the emulated vectors
> float a[2], b[2];
> ...
>   a[0] = -b[0];
>   a[1] = -b[1];
> as effectively
>   *(unsigned long long *)a = *(unsigned long long *)b ^ 0x80008000ULL;
> if we handled that case later on (except the above !INTEGRAL_TYPE_P check
> would prevent that).

Yep.

> As for BIT_{AND,IOR,XOR}_EXPR, seems the GIMPLE verifiers actually don't
> disallow
>   float f, g;
>   f_3 |= g_4;
> etc. (but the FEs do), though maybe we'd ICE on some targets during
> expansion; i386.md has {and,ior,xor}{s,d,x}f3 optabs.

I think that's an omission in the verifier (we allow the ops on
pointers though).

Richard.


Re: [PATCH v2] Do not emulate vectors containing floats.

2024-02-26 Thread Jakub Jelinek
On Mon, Feb 26, 2024 at 09:53:41AM +0100, Richard Biener wrote:
> On Mon, 26 Feb 2024, Jakub Jelinek wrote:
> 
> > On Mon, Feb 26, 2024 at 09:00:58AM +0100, Richard Biener wrote:
> > > > > @@ -6756,7 +6756,8 @@ vectorizable_operation (vec_info *vinfo,
> > > > >those through even when the mode isn't word_mode.  For
> > > > >ops we have to lower the lowering code assumes we are
> > > > >dealing with word_mode.  */
> > > > > -  if code == PLUS_EXPR || code == MINUS_EXPR || code == 
> > > > > NEGATE_EXPR)
> > > > > +  if (!INTEGRAL_TYPE_P (TREE_TYPE (vectype))
> > > > > +   || (((code == PLUS_EXPR || code == MINUS_EXPR || code == 
> > > > > NEGATE_EXPR)
> > > > >   || !target_support_p)
> > > > >  && maybe_ne (GET_MODE_SIZE (vec_mode), UNITS_PER_WORD))
> > > > > /* Check only during analysis.  */
> > > 
> > > I think it will work fine.  Even after the last TLC this feels like in
> > > the need of more TLC ;)
> > 
> > Note, at least in theory, floating point NEGATE_EXPR could be handled just
> > fine in the emulated vectors, just xor the sign bit, and
> > BIT_{AND,IOR,XOR}_EXPR also would work but likely aren't valid IL on
> > floating point modes (though e.g. in RTL they are used even for them),
> 
> Hmm, I think vector types not supported only ever get integer modes
> assigned, not FP modes so the operations would be on integer modes.

I mean one could still SLP vectorize using the emulated vectors
float a[2], b[2];
...
  a[0] = -b[0];
  a[1] = -b[1];
as effectively
  *(unsigned long long *)a = *(unsigned long long *)b ^ 0x80008000ULL;
if we handled that case later on (except the above !INTEGRAL_TYPE_P check
would prevent that).

As for BIT_{AND,IOR,XOR}_EXPR, seems the GIMPLE verifiers actually don't
disallow
  float f, g;
  f_3 |= g_4;
etc. (but the FEs do), though maybe we'd ICE on some targets during
expansion; i386.md has {and,ior,xor}{s,d,x}f3 optabs.
Emulation using integer mode would be well defined.

Jakub



Re: [PATCH v2] Do not emulate vectors containing floats.

2024-02-26 Thread Richard Biener
On Mon, 26 Feb 2024, Jakub Jelinek wrote:

> On Mon, Feb 26, 2024 at 09:00:58AM +0100, Richard Biener wrote:
> > > > @@ -6756,7 +6756,8 @@ vectorizable_operation (vec_info *vinfo,
> > > >  those through even when the mode isn't word_mode.  For
> > > >  ops we have to lower the lowering code assumes we are
> > > >  dealing with word_mode.  */
> > > > -  if code == PLUS_EXPR || code == MINUS_EXPR || code == 
> > > > NEGATE_EXPR)
> > > > +  if (!INTEGRAL_TYPE_P (TREE_TYPE (vectype))
> > > > + || (((code == PLUS_EXPR || code == MINUS_EXPR || code == 
> > > > NEGATE_EXPR)
> > > > || !target_support_p)
> > > >&& maybe_ne (GET_MODE_SIZE (vec_mode), UNITS_PER_WORD))
> > > >   /* Check only during analysis.  */
> > 
> > I think it will work fine.  Even after the last TLC this feels like in
> > the need of more TLC ;)
> 
> Note, at least in theory, floating point NEGATE_EXPR could be handled just
> fine in the emulated vectors, just xor the sign bit, and
> BIT_{AND,IOR,XOR}_EXPR also would work but likely aren't valid IL on
> floating point modes (though e.g. in RTL they are used even for them),

Hmm, I think vector types not supported only ever get integer modes
assigned, not FP modes so the operations would be on integer modes.
Unless I'm missing the point ...

> it is mainly PLUS_EXPR/MINUS_EXPR.  Perhaps the NEGATE_EXPR isn't worth
> it though.  In any case, that wouldn't be stage4 material.

Agreed.

Richard.


Re: [PATCH v2] Do not emulate vectors containing floats.

2024-02-26 Thread Jakub Jelinek
On Mon, Feb 26, 2024 at 09:00:58AM +0100, Richard Biener wrote:
> > > @@ -6756,7 +6756,8 @@ vectorizable_operation (vec_info *vinfo,
> > >those through even when the mode isn't word_mode.  For
> > >ops we have to lower the lowering code assumes we are
> > >dealing with word_mode.  */
> > > -  if code == PLUS_EXPR || code == MINUS_EXPR || code == 
> > > NEGATE_EXPR)
> > > +  if (!INTEGRAL_TYPE_P (TREE_TYPE (vectype))
> > > +   || (((code == PLUS_EXPR || code == MINUS_EXPR || code == NEGATE_EXPR)
> > >   || !target_support_p)
> > >  && maybe_ne (GET_MODE_SIZE (vec_mode), UNITS_PER_WORD))
> > > /* Check only during analysis.  */
> 
> I think it will work fine.  Even after the last TLC this feels like in
> the need of more TLC ;)

Note, at least in theory, floating point NEGATE_EXPR could be handled just
fine in the emulated vectors, just xor the sign bit, and
BIT_{AND,IOR,XOR}_EXPR also would work but likely aren't valid IL on
floating point modes (though e.g. in RTL they are used even for them),
it is mainly PLUS_EXPR/MINUS_EXPR.  Perhaps the NEGATE_EXPR isn't worth
it though.  In any case, that wouldn't be stage4 material.

Jakub



Re: [PATCH v2] x86: Check interrupt instead of noreturn attribute

2024-02-26 Thread Uros Bizjak
On Sun, Feb 25, 2024 at 10:14 PM H.J. Lu  wrote:
>
> ix86_set_func_type checks noreturn attribute to avoid incompatible
> attribute error in LTO1 on interrupt functions.  Since TREE_THIS_VOLATILE
> is set also for _Noreturn without noreturn attribute, check interrupt
> attribute for interrupt functions instead.
>
> gcc/
>
> PR target/114097
> * config/i386/i386-options.cc (ix86_set_func_type): Check
> interrupt instead of noreturn attribute.
>
> gcc/testsuite/
>
> PR target/114097
> * gcc.target/i386/pr114097-1.c: New test.

LGTM.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386-options.cc|  8 ---
>  gcc/testsuite/gcc.target/i386/pr114097-1.c | 26 ++
>  2 files changed, 31 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr114097-1.c
>
> diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
> index 93a01146db7..1301f6b913e 100644
> --- a/gcc/config/i386/i386-options.cc
> +++ b/gcc/config/i386/i386-options.cc
> @@ -3391,11 +3391,13 @@ ix86_set_func_type (tree fndecl)
>   into a noreturn function by setting TREE_THIS_VOLATILE.  Normally
>   the local-pure-const pass is run after ix86_set_func_type is called.
>   When the local-pure-const pass is enabled for LTO, the interrupt
> - function is marked as noreturn in the IR output, which leads the
> - incompatible attribute error in LTO1.  */
> + function is marked with TREE_THIS_VOLATILE in the IR output, which
> + leads to the incompatible attribute error in LTO1.  Ignore the
> + interrupt function in this case.  */
>bool has_no_callee_saved_registers
>  = ((TREE_THIS_VOLATILE (fndecl)
> -   && lookup_attribute ("noreturn", DECL_ATTRIBUTES (fndecl))
> +   && !lookup_attribute ("interrupt",
> + TYPE_ATTRIBUTES (TREE_TYPE (fndecl)))
> && optimize
> && !optimize_debug
> && (TREE_NOTHROW (fndecl) || !flag_exceptions))
> diff --git a/gcc/testsuite/gcc.target/i386/pr114097-1.c 
> b/gcc/testsuite/gcc.target/i386/pr114097-1.c
> new file mode 100644
> index 000..b14c7b6214d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr114097-1.c
> @@ -0,0 +1,26 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move 
> -fomit-frame-pointer" } */
> +
> +#define ARRAY_SIZE 256
> +
> +extern int array[ARRAY_SIZE][ARRAY_SIZE][ARRAY_SIZE];
> +extern int value (int, int, int)
> +#ifndef __x86_64__
> +__attribute__ ((regparm(3)))
> +#endif
> +;
> +
> +void
> +_Noreturn
> +no_return_to_caller (void)
> +{
> +  unsigned i, j, k;
> +  for (i = ARRAY_SIZE; i > 0; --i)
> +for (j = ARRAY_SIZE; j > 0; --j)
> +  for (k = ARRAY_SIZE; k > 0; --k)
> +   array[i - 1][j - 1][k - 1] = value (i, j, k);
> +  while (1);
> +}
> +
> +/* { dg-final { scan-assembler-not "push" } } */
> +/* { dg-final { scan-assembler-not "pop" } } */
> --
> 2.43.2
>


Re: [PATCH v1 05/13] Reuse MinGW from i386 for AArch64

2024-02-26 Thread Bernhard Reutner-Fischer
On Sun, 25 Feb 2024 at 22:12, Mark Harmstone  wrote:
> Also, there are relocation types needed for Windows programs that are 
> supported in COFF but not in ELF object files.
>

Right, it's been a long time since i last dealt with PECOFF and i had
assumed that things might have changed in the meantime.

PS: Please don't forget to add an entry to contrib/config-list.mk
thanks


Re: [PATCH] match.pd: Guard 2 simplifications on integral TYPE_OVERFLOW_UNDEFINED [PR114090]

2024-02-26 Thread Richard Biener
On Mon, 26 Feb 2024, Jakub Jelinek wrote:

> Hi!
> 
> These 2 patterns are incorrect on floating point, or for -fwrapv, or
> for -ftrapv, or the first one for unsigned types (the second one is
> mathematically correct, but we ought to just fold that to 0 instead).
> 
> So, the following patch properly guards this.
> 
> I think we don't need && !TYPE_OVERFLOW_SANITIZED (type) because
> in both simplifications there would be UB before and after on
> signed integer minimum.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2024-02-26  Jakub Jelinek  
> 
>   PR tree-optimization/114090
>   * match.pd ((x >= 0 ? x : 0) + (x <= 0 ? -x : 0) -> abs x):
>   Restrict pattern to ANY_INTEGRAL_TYPE_P and TYPE_OVERFLOW_UNDEFINED
>   types.
>   ((x <= 0 ? -x : 0) -> max(-x, 0)): Likewise.
> 
>   * gcc.dg/pr114090.c: New test.
> 
> --- gcc/match.pd.jj   2024-02-22 10:09:48.678446435 +0100
> +++ gcc/match.pd  2024-02-24 19:23:32.201014245 +0100
> @@ -453,8 +453,9 @@ (define_operator_list SYNC_FETCH_AND_AND
>  
>  /* (x >= 0 ? x : 0) + (x <= 0 ? -x : 0) -> abs x.  */
>  (simplify
> -  (plus:c (max @0 integer_zerop) (max (negate @0) integer_zerop))
> -  (abs @0))
> + (plus:c (max @0 integer_zerop) (max (negate @0) integer_zerop))
> + (if (ANY_INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_UNDEFINED (type))
> +  (abs @0)))
>  
>  /* X * 1, X / 1 -> X.  */
>  (for op (mult trunc_div ceil_div floor_div round_div exact_div)
> @@ -4218,8 +4219,9 @@ (define_operator_list SYNC_FETCH_AND_AND
>  
>  /* (x <= 0 ? -x : 0) -> max(-x, 0).  */
>  (simplify
> -  (cond (le @0 integer_zerop@1) (negate@2 @0) integer_zerop@1)
> -  (max @2 @1))
> + (cond (le @0 integer_zerop@1) (negate@2 @0) integer_zerop@1)
> + (if (ANY_INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_UNDEFINED (type))
> +  (max @2 @1)))
>  
>  /* (zero_one == 0) ? y : z  y -> ((typeof(y))zero_one * z)  y */
>  (for op (bit_xor bit_ior plus)
> --- gcc/testsuite/gcc.dg/pr114090.c.jj2024-02-24 19:38:56.301096850 
> +0100
> +++ gcc/testsuite/gcc.dg/pr114090.c   2024-02-24 19:42:26.917153801 +0100
> @@ -0,0 +1,38 @@
> +/* PR tree-optimization/114090 */
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fwrapv" } */
> +
> +__attribute__((noipa)) int
> +foo (int x)
> +{
> +  int w = (x >= 0 ? x : 0);
> +  int y = -x;
> +  int z = (y >= 0 ? y : 0);
> +  return w + z;
> +}
> +
> +__attribute__((noipa)) int
> +bar (int x)
> +{
> +  int w = (x >= 0 ? x : 0);
> +  int z = (x <= 0 ? -x : 0);
> +  return w + z;
> +}
> +
> +__attribute__((noipa)) int
> +baz (int x)
> +{
> +  return x <= 0 ? -x : 0;
> +}
> +
> +int
> +main ()
> +{
> +  int v = -__INT_MAX__ - 1;
> +  if (foo (v) != 0)
> +__builtin_abort ();
> +  if (bar (v) != v)
> +__builtin_abort ();
> +  if (baz (v) != v)
> +__builtin_abort ();
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] fold-const: Avoid infinite recursion in +-*&|^minmax reassociation [PR114084]

2024-02-26 Thread Richard Biener
On Mon, 26 Feb 2024, Jakub Jelinek wrote:

> Hi!
> 
> In the following testcase we infinitely recurse during BIT_IOR_EXPR
> reassociation.
> One operand is (unsigned _BitInt(31)) a << 4 and another operand
> 2147483647 >> 1 | 80 where both the right shift and the | 80
> trees have TREE_CONSTANT set, but weren't folded because of delayed
> folding, where some foldings are apparently done even in that case
> unfortunately.
> Now, the fold_binary_loc reassocation code splits both operands into
> variable part, minus variable part, constant part, minus constant part,
> literal part and minus literal parts, to prevent infinite recursion
> punts if there are just 2 parts altogether from the 2 operands and then goes
> on with reassociation, merges first the corresponding parts from both
> operands and then some further merges.
> The problem with the above expressions is that we get 3 different objects,
> var0 (the left shift), con1 (the right shift) and lit1 (80), so the infinite
> recursion prevention doesn't trigger, and we eventually merge con1 with
> lit1, which effectively reconstructs the original op1 and then associate
> that with var0 which is original op0, and associate_trees for that case
> calls fold_binary.  There are some casts involved there too (the T typedef
> type and the underlying _BitInt type which are stripped with STRIP_NOPS).
> 
> The following patch attempts to prevent this infinite recursion by tracking
> the origin (if certain var comes from nothing - 0, op0 - 1, op1 - 2 or both - 
> 3)
> and propagates it through all the associate_tree calls which merge the vars.
> If near the end we'd try to merge what comes solely from op0 with what comes
> solely from op1 (or vice versa), the patch punts, because then it isn't any
> kind of reassociation between the two operands, if anything it should be
> handled when folding the suboperands.

That sounds like a nice thing to do.

> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2024-02-26  Jakub Jelinek  
> 
>   PR middle-end/114084
>   * fold-const.cc (fold_binary_loc): Avoid the final associate_trees
>   if all subtrees of var0 come from one of the op0 or op1 operands
>   and all subtrees of con0 come from the other one.  Don't clear
>   variables which are never used afterwards.
> 
>   * gcc.dg/bitint-94.c: New test.
> 
> --- gcc/fold-const.cc.jj  2024-02-24 09:49:09.098815803 +0100
> +++ gcc/fold-const.cc 2024-02-24 11:08:56.036223491 +0100
> @@ -11779,6 +11779,15 @@ fold_binary_loc (location_t loc, enum tr
> + (lit0 != 0) + (lit1 != 0)
> + (minus_lit0 != 0) + (minus_lit1 != 0)) > 2)
>   {
> +   int var0_origin = (var0 != 0) + 2 * (var1 != 0);
> +   int minus_var0_origin
> + = (minus_var0 != 0) + 2 * (minus_var1 != 0);
> +   int con0_origin = (con0 != 0) + 2 * (con1 != 0);
> +   int minus_con0_origin
> + = (minus_con0 != 0) + 2 * (minus_con1 != 0);
> +   int lit0_origin = (lit0 != 0) + 2 * (lit1 != 0);
> +   int minus_lit0_origin
> + = (minus_lit0 != 0) + 2 * (minus_lit1 != 0);
> var0 = associate_trees (loc, var0, var1, code, atype);
> minus_var0 = associate_trees (loc, minus_var0, minus_var1,
>   code, atype);
> @@ -11791,15 +11800,19 @@ fold_binary_loc (location_t loc, enum tr
>  
> if (minus_var0 && var0)
>   {
> +   var0_origin |= minus_var0_origin;
> var0 = associate_trees (loc, var0, minus_var0,
> MINUS_EXPR, atype);
> minus_var0 = 0;
> +   minus_var0_origin = 0;
>   }
> if (minus_con0 && con0)
>   {
> +   con0_origin |= minus_con0_origin;
> con0 = associate_trees (loc, con0, minus_con0,
> MINUS_EXPR, atype);
> minus_con0 = 0;
> +   minus_con0_origin = 0;
>   }
>  
> /* Preserve the MINUS_EXPR if the negative part of the literal is
> @@ -11815,15 +11828,19 @@ fold_binary_loc (location_t loc, enum tr
> /* But avoid ending up with only negated parts.  */
> && (var0 || con0))
>   {
> +   minus_lit0_origin |= lit0_origin;
> minus_lit0 = associate_trees (loc, minus_lit0, lit0,
>   MINUS_EXPR, atype);
> lit0 = 0;
> +   lit0_origin = 0;
>   }
> else
>   {
> +   lit0_origin |= minus_lit0_origin;
> lit0 = associate_trees (loc, lit0, minus_lit0,
> MINUS_EXPR, atype);
> minus_lit0 = 0;
> + 

Re: [PATCH v2] Do not emulate vectors containing floats.

2024-02-26 Thread Richard Biener
On Fri, 23 Feb 2024, Jakub Jelinek wrote:

> On Fri, Feb 23, 2024 at 02:43:45PM +0100, Juergen Christ wrote:
> > The emulation via word mode tries to perform integer arithmetic on floating
> > point values instead of floating point arithmetic.  This leads to
> > mis-compilations.
> > 
> > Failure occured on s390x on these existing test cases:
> > gcc.dg/vect/tsvc/vect-tsvc-s112.c
> > gcc.dg/vect/tsvc/vect-tsvc-s113.c
> > gcc.dg/vect/tsvc/vect-tsvc-s119.c
> > gcc.dg/vect/tsvc/vect-tsvc-s121.c
> > gcc.dg/vect/tsvc/vect-tsvc-s131.c
> > gcc.dg/vect/tsvc/vect-tsvc-s132.c
> > gcc.dg/vect/tsvc/vect-tsvc-s2233.c
> > gcc.dg/vect/tsvc/vect-tsvc-s421.c
> > gcc.dg/vect/vect-alias-check-14.c
> > gcc.target/s390/vector/partial/s390-vec-length-epil-run-1.c
> > gcc.target/s390/vector/partial/s390-vec-length-epil-run-3.c
> > gcc.target/s390/vector/partial/s390-vec-length-full-run-3.c
> > 
> > gcc/ChangeLog:
> > 
> 
> Please add
>   PR tree-optimization/114075
> above the * tree-vect-stmts line.
> > * tree-vect-stmts.cc (vectorizable_operation): Don't emulate floating
> >   point vectors
> 
> This line should be tab indented like the first one, and end with .
> And given what the patch does, perhaps say non-integral instead of floating
> point.
> 
> As for testcase, I'll handle it separately, given that it already
> fixes some pre-existing tests.
> 
> > Signed-off-by: Juergen Christ 
> > ---
> >  gcc/tree-vect-stmts.cc | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > index 09749ae38174..f95ff2c2aa34 100644
> > --- a/gcc/tree-vect-stmts.cc
> > +++ b/gcc/tree-vect-stmts.cc
> > @@ -6756,7 +6756,8 @@ vectorizable_operation (vec_info *vinfo,
> >  those through even when the mode isn't word_mode.  For
> >  ops we have to lower the lowering code assumes we are
> >  dealing with word_mode.  */
> > -  if code == PLUS_EXPR || code == MINUS_EXPR || code == 
> > NEGATE_EXPR)
> > +  if (!INTEGRAL_TYPE_P (TREE_TYPE (vectype))
> > + || (((code == PLUS_EXPR || code == MINUS_EXPR || code == NEGATE_EXPR)
> > || !target_support_p)
> >&& maybe_ne (GET_MODE_SIZE (vec_mode), UNITS_PER_WORD))
> >   /* Check only during analysis.  */

I think it will work fine.  Even after the last TLC this feels like in
the need of more TLC ;)

So OK.  Also for affected branches - the effective check should be the
same in GCC 13 at least, but with some added ad-hoc costing which might
make this not trigger (maybe_lt (nunits_out, 4U)) - so we'd need a
word_mode that can cover 4 FP elements.  Possibly triggerable with
HFmode?

Thanks,
Richard.

> LGTM, but please wait until Monday evening so that Richi or Richard
> have a chance to chime in.
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)