Re: [i386] Fix couple of issues in large PIC model on x86-64/VxWorks

2021-11-08 Thread Uros Bizjak via Gcc-patches
On Tue, Oct 5, 2021 at 5:50 PM Eric Botcazou via Gcc-patches
 wrote:
>
> Hi,
>
> the first issue is that the !gotoff_operand path of legitimize_pic_address in
> large PIC model does not make use of REG when it is available, which breaks
> for thunks because new pseudo-registers can no longer be created.  And the
> second issue is that the system compiler (LLVM) generates @GOTOFF in large
> model even for RTP, so we do the same.
>
> Tested on x86-64/Linux and VxWorks, OK for the mainline?
>
>
> 2021-10-05  Eric Botcazou  
>
> * config/i386/i386.c (legitimize_pic_address): Adjust comment and
> use the REG argument on the CM_LARGE_PIC code path as well.
> * config/i386/predicates.md (gotoff_operand): Do not treat VxWorks
> specially with the large code models.

LGTM for the generic part, no idea for VxWorks.

Thanks,
Uros.


Re: [i386] Fix couple of issues in large PIC model on x86-64/VxWorks

2021-11-08 Thread Eric Botcazou via Gcc-patches
> LGTM for the generic part, no idea for VxWorks.

Thanks.  The VxWorks-specific hunk is needed to make GCC compatible with the 
system compiler on this architecture (LLVM) and I have CCed Olivier.

-- 
Eric Botcazou




Re: Values of WIDE_INT_MAX_ELTS in gcc11 and gcc12 are different

2021-11-08 Thread Richard Biener via Gcc-patches
On Sat, Nov 6, 2021 at 10:56 AM Jakub Jelinek  wrote:
>
> On Fri, Nov 05, 2021 at 05:37:25PM +, Qing Zhao wrote:
> > > On Nov 5, 2021, at 11:17 AM, Jakub Jelinek  wrote:
> > >
> > > On Fri, Nov 05, 2021 at 04:11:36PM +, Qing Zhao wrote:
> > >> 3076   if (TREE_CODE (TREE_TYPE (lhs)) != BOOLEAN_TYPE
> > >> 3077   && tree_fits_uhwi_p (var_size)
> > >> 3078   && (init_type == AUTO_INIT_PATTERN
> > >> 3079   || !is_gimple_reg_type (var_type))
> > >> 3080   && int_mode_for_size (tree_to_uhwi (var_size) * BITS_PER_UNIT,
> > >> 3081 0).exists ())
> > >> 3082 {
> > >> 3083   unsigned HOST_WIDE_INT total_bytes = tree_to_uhwi (var_size);
> > >> 3084   unsigned char *buf = (unsigned char *) xmalloc (total_bytes);
> > >> 3085   memset (buf, (init_type == AUTO_INIT_PATTERN
> > >> 3086 ? INIT_PATTERN_VALUE : 0), total_bytes);
> > >> 3087   tree itype = build_nonstandard_integer_type
> > >> 3088  (total_bytes * BITS_PER_UNIT, 1);
> > >>
> > >> The exact failing point is at function “set_min_and_max_values_for_integral_type”:
> > >>
> > >> 2851   gcc_assert (precision <= WIDE_INT_MAX_PRECISION);
> > >>
> > >> For _Complex long double,  “precision” is 256.
> > >> In GCC11, “WIDE_INT_MAX_PRECISION” is 192,  in GCC12, it’s 512.
> > >> As a result, the above assertion failed on GCC11.
> > >>
> > >> I am wondering what’s the best fix for this issue in gcc11?
> > >
> > > Even for gcc 12 the above is wrong: you can't blindly assume that
> > > build_nonstandard_integer_type will work for arbitrary precisions,
> > > or, even if it creates the type, that expansion will actually work.
> > > The fact that such a mode exists is one thing, but
> > > targetm.scalar_mode_supported_p should be tested for whether the mode
> > > is actually supported.
> >
> > You mean “int_mode_for_size().exists()” is not enough to make sure
> > “build_nonstandard_integer_type” to be valid?  We should add
> > “targetm.scalar_mode_supported_p” too ?
>
> Yeah.  The former says whether the backend has that mode at all.
> But some modes may be there only in some specific patterns but
> without support for mov, add, etc.  Only for
> targetm.scalar_mode_supported_p modes the backend guarantees that
> one can use them e.g. in mode attribute and can expect expansion
> to expand everything with that mode that is needed in some way.
> E.g. only if targetm.scalar_mode_supported_p (TImode) the FEs
> support __int128_t type, etc.

The memcpy folding code now checks

  scalar_int_mode mode;
  if (int_mode_for_size (ilen * 8, 0).exists (&mode)
  && GET_MODE_SIZE (mode) * BITS_PER_UNIT == ilen * 8
  && have_insn_for (SET, mode)

thus specifically only have_insn_for (SET, mode), which I guess is
good enough for this case as well?

> Jakub
>


Re: [COMMITTED] path oracle: Do not look at root oracle for killed defs.

2021-11-08 Thread Richard Biener via Gcc-patches
On Sat, Nov 6, 2021 at 4:38 PM Aldy Hernandez via Gcc-patches
 wrote:
>
> [This is more Andrew's domain, but this is a P1 PR and I'd like to
> unbreak the SPEC run, since this is a straightforward fix.  When he
> returns he can double check my work and give additional suggestions.]
>
> The problem here is that we are incorrectly threading 41->20->21 here:
>
>[local count: 56063504182]:
>   _134 = M.10_120 + 1;
>   if (_71 <= _134)
> goto ; [11.00%]
>   else
> goto ; [89.00%]
> ...
> ...
> ...
>[local count: 49896518755]:
>
>[local count: 56063503181]:
>   # lb_75 = PHI <_134(41), 1(18)>
>   _117 = mstep_49 + lb_75;
>   _118 = _117 + -1;
>   _119 = mstep_49 + _118;
>   M.10_120 = MIN_EXPR <_119, _71>;
>   if (lb_75 > M.10_120)
> goto ; [11.00%]
>   else
> goto ; [89.00%]
>
> First, lb_75 == _134 because of the PHI.
> Second, _134 > M.10_120 because of _134 = M.10_120 + 1.
>
> We then assume that lb_75 > M.10_120, but this is incorrect because
> M.10_120 was killed along the path.

Huh, since SSA has only a single definition it cannot be "killed".
What can happen is that if you look across backedges that the same
reg can have two different values.  Basically when you look across
backedges you have to discard all knowledge you derived from
stuff that's dominated by the backedge destination.

>
> This incorrect thread causes the miscompilation in 527.cam4_r.
>
> Tested on x86-64 and ppc64le Linux.
>
> Committed.
>
> gcc/ChangeLog:
>
> PR tree-optimization/103061
> * value-relation.cc (path_oracle::path_oracle): Initialize
> m_killed_defs.
> (path_oracle::killing_def): Set m_killed_defs.
> (path_oracle::query_relation): Do not look at the root oracle for
> killed defs.
> * value-relation.h (class path_oracle): Add m_killed_defs.
> ---
>  gcc/value-relation.cc | 9 +
>  gcc/value-relation.h  | 1 +
>  2 files changed, 10 insertions(+)
>
> diff --git a/gcc/value-relation.cc b/gcc/value-relation.cc
> index f1e46d38de1..a0105481466 100644
> --- a/gcc/value-relation.cc
> +++ b/gcc/value-relation.cc
> @@ -1235,6 +1235,7 @@ path_oracle::path_oracle (relation_oracle *oracle)
>m_equiv.m_next = NULL;
>m_relations.m_names = BITMAP_ALLOC (&m_bitmaps);
>m_relations.m_head = NULL;
> +  m_killed_defs = BITMAP_ALLOC (&m_bitmaps);
>  }
>
>  path_oracle::~path_oracle ()
> @@ -1305,6 +1306,8 @@ path_oracle::killing_def (tree ssa)
>
>unsigned v = SSA_NAME_VERSION (ssa);
>
> +  bitmap_set_bit (m_killed_defs, v);
> +
>// Walk the equivalency list and remove SSA from any equivalencies.
>if (bitmap_bit_p (m_equiv.m_names, v))
>  {
> @@ -1389,6 +1392,12 @@ path_oracle::query_relation (basic_block bb, const_bitmap b1, const_bitmap b2)
>
>relation_kind k = m_relations.find_relation (b1, b2);
>
> +  // Do not look at the root oracle for names that have been killed
> +  // along the path.
> +  if (bitmap_intersect_p (m_killed_defs, b1)
> +  || bitmap_intersect_p (m_killed_defs, b2))
> +return k;
> +
>if (k == VREL_NONE && m_root)
>  k = m_root->query_relation (bb, b1, b2);
>
> diff --git a/gcc/value-relation.h b/gcc/value-relation.h
> index 97be3251144..8086f5552b5 100644
> --- a/gcc/value-relation.h
> +++ b/gcc/value-relation.h
> @@ -233,6 +233,7 @@ private:
>equiv_chain m_equiv;
>relation_chain_head m_relations;
>relation_oracle *m_root;
> +  bitmap m_killed_defs;
>
>bitmap_obstack m_bitmaps;
>struct obstack m_chain_obstack;
> --
> 2.31.1
>


[PATCH] Remove dead code.

2021-11-08 Thread Martin Liška

This fixes issue reported in the PR.

Ready to be installed?
Thanks,
Martin

PR other/89259

liboffloadmic/ChangeLog:

* runtime/offload_omp_host.cpp: Remove size < 0 for a size of
size_t type.
---
 liboffloadmic/runtime/offload_omp_host.cpp | 5 -
 1 file changed, 5 deletions(-)

diff --git a/liboffloadmic/runtime/offload_omp_host.cpp b/liboffloadmic/runtime/offload_omp_host.cpp
index 0439fec313b..4d8c57e3385 100644
--- a/liboffloadmic/runtime/offload_omp_host.cpp
+++ b/liboffloadmic/runtime/offload_omp_host.cpp
@@ -688,11 +688,6 @@ int omp_target_associate_ptr(
 return 1;
 }
 
-// An incorrect size is treated as failure
-if (size < 0) {
-return 1;
-}
-
 // If OpenMP allows wrap-around for device numbers, enable next line
 //Engine& device = mic_engines[device_num % mic_engines_total];
 Engine& device = mic_engines[device_num];
--
2.33.1



Re: Some PINGs

2021-11-08 Thread Richard Biener via Gcc-patches
On Sat, Nov 6, 2021 at 11:21 PM Roger Sayle  wrote:
>
>
> I wonder if reviewers could take a look (or a second look) at some of my
> outstanding patches.
>
> Four nvptx backend patches:
>
> nvptx: Use cvt to perform sign-extension of truncation.
> https://gcc.gnu.org/pipermail/gcc-patches/2021-August/578256.html
>
> nvptx: Add (experimental) support for HFmode with -misa=sm_53
> https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579623.html
>
> nvptx: Adds uses of -misa=sm_75 and -misa=sm_80
> https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579691.html
>
> Transition nvptx backend to STORE_FLAG_VALUE = 1
> https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581004.html
>
> Two middle-end patches:
>
> Simplify paradoxical subreg extensions of TRUNCATE
> https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578848.html
>
> PR middle-end/100810: Penalize IV candidates with undefined value bases
> https://gcc.gnu.org/pipermail/gcc-patches/2021-August/578441.html

I did comment on this one, noting the more general issue.  My opinion is still
that doing heavy lifting in IVOPTs is misplaced.

> And a gfortran patch:
>
> gfortran: Improve translation of POPPAR intrinsic
> https://gcc.gnu.org/pipermail/gcc-patches/2020-June/548055.html
>
>
> Many thanks in advance,
> Roger
> --
>
>


Re: [PATCH 1/2] [Gimple] Simplify (trunc)fmax/fmin((extend)a, (extend)b) to MAX/MIN(a,b)

2021-11-08 Thread Richard Biener via Gcc-patches
On Mon, Nov 8, 2021 at 2:30 AM Hongtao Liu  wrote:
>
> On Fri, Nov 5, 2021 at 5:52 PM Richard Biener
>  wrote:
> >
> > On Fri, Nov 5, 2021 at 6:38 AM liuhongt  wrote:
> > >
> > > a and b are the same type as the truncated type and have less precision
> > > than the extended type; the transformation is guarded by
> > > flag_finite_math_only.
> > >
> > > Bootstrapped and regtested under x86_64-pc-linux-gnu{-m32,}
> > > Ok for trunk?
> > >
> > > gcc/ChangeLog:
> > >
> > > PR target/102464
> > > * match.pd: Simplify (trunc)fmax/fmin((extend)a, (extend)b) to
> > > MAX/MIN(a,b)
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.target/i386/pr102464-maxmin.c: New test.
> > > ---
> > >  gcc/match.pd  | 14 ++
> > >  .../gcc.target/i386/pr102464-maxmin.c | 44 +++
> > >  2 files changed, 58 insertions(+)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr102464-maxmin.c
> > >
> > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > index f63079023d0..857ce7f712a 100644
> > > --- a/gcc/match.pd
> > > +++ b/gcc/match.pd
> > > @@ -6182,6 +6182,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > > && direct_internal_fn_supported_p (IFN_COPYSIGN,
> > >   type, OPTIMIZE_FOR_BOTH))
> > >  (IFN_COPYSIGN @0 @1
> > > +
> > > +(for maxmin (max min)
> > > + (simplify
> > > +  (convert (maxmin (convert@2 @0) (convert @1)))
> > > +   (if (flag_finite_math_only
> >
> > I suppose you are concerned about infinities, not about NaNs.
> > Please use !HONOR_INFINITIES (@2) then (in general testing
> > flag_* is frowned upon).  You may want to do the FLOAT_TYPE_P
> > tests first.
> I'm concerned about NaNs since MAX/MIN_EXPR differs from the IEEE
> minimum and maximum operations for NaN operands.

But you are already only handling non-IEEE MAX/MIN_EXPR where the
behavior with a NaN argument is unspecified?

> So I think I'd use MODE_HAS_NANS (@2)?
> >
> > > +   && optimize
> > > +   && FLOAT_TYPE_P (type)
> > > +   && FLOAT_TYPE_P (TREE_TYPE (@2))
> > > +   && types_match (type, TREE_TYPE (@0))
> > > +   && types_match (type, TREE_TYPE (@1))
> > > +   && TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (@2))
> > > +   && optab_handler (maxmin == MAX_EXPR ? smax_optab : smin_optab,
> > > +   TYPE_MODE (type)) != CODE_FOR_nothing)

And just noticing this now - since we're only changing the type a MAX/MIN_EXPR
operate on, we don't really need to do the optab check.  At RTL expansion
we'd eventually try a wider mode.

> > > +(maxmin @0 @1
> > >  #endif
> > >
> > >  (for froms (XFLOORL XCEILL XROUNDL XRINTL)
> > > diff --git a/gcc/testsuite/gcc.target/i386/pr102464-maxmin.c b/gcc/testsuite/gcc.target/i386/pr102464-maxmin.c
> > > new file mode 100644
> > > index 000..37867235a6c
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/pr102464-maxmin.c
> > > @@ -0,0 +1,44 @@
> > > +/* PR target/102464.  */
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2 -mavx512fp16 -mavx512vl -ffast-math -ftree-vectorize -mtune=generic -mfpmath=sse" } */
> > > +/* { dg-final { scan-assembler-times "vmaxph" 3 } }  */
> > > +/* { dg-final { scan-assembler-times "vminph" 3 } }  */
> > > +/* { dg-final { scan-assembler-times "vmaxsh" 3 } }  */
> > > +/* { dg-final { scan-assembler-times "vminsh" 3 } }  */
> > > +/* { dg-final { scan-assembler-times "vmaxps" 2 } }  */
> > > +/* { dg-final { scan-assembler-times "vminps" 2 } }  */
> > > +/* { dg-final { scan-assembler-times "vmaxss" 2 } }  */
> > > +/* { dg-final { scan-assembler-times "vminss" 2 } }  */
> > > +/* { dg-final { scan-assembler-times "vmaxpd" 1 } }  */
> > > +/* { dg-final { scan-assembler-times "vminpd" 1 } }  */
> > > +/* { dg-final { scan-assembler-times "vmaxsd" 1 } }  */
> > > +/* { dg-final { scan-assembler-times "vminsd" 1 } }  */
> > > +
> > > +#include
> > > +#define FOO(CODE,TYPE,SUFFIX)  \
> > > +  void \
> > > +  foo_vect_##CODE##TYPE##SUFFIX (TYPE* __restrict a, TYPE* b, TYPE* c) \
> > > +  {\
> > > +for (int i = 0; i != 8; i++)   \
> > > +  a[i] = CODE##SUFFIX (b[i], c[i]); \
> > > +  }\
> > > +  TYPE \
> > > +  foo_##CODE##TYPE##SUFFIX (TYPE b, TYPE c)\
> > > +  {\
> > > +return CODE##l (b, c); \
> > > +  }
> > > +
> > > +FOO (fmax, _Float16, f);
> > > +FOO (fmax, _Float16,);
> > > +FOO (fmax, _Float16, l);
> > > +FOO (fmin, _Float16, f);
> >

Re: [PATCH] Remove dead code.

2021-11-08 Thread Jakub Jelinek via Gcc-patches
On Mon, Nov 08, 2021 at 09:45:39AM +0100, Martin Liška wrote:
> This fixes issue reported in the PR.
> 
> Ready to be installed?

I'm not sure.  liboffloadmic is copied from upstream, so the right
thing, if we want to do anything at all (if we don't remove it, nothing
bad happens; the condition is never true anyway, whether removed in the
source or optimized away by the compiler), would be to let Intel fix it
in their source and update from that.
But I have no idea where it even lives upstream.

Jakub



Re: Improve optimization of some builtins

2021-11-08 Thread Richard Biener via Gcc-patches
On Mon, Nov 8, 2021 at 8:04 AM Jan Hubicka via Gcc-patches
 wrote:
>
> Hi,
> for nested functions we output a call to builtin_dwarf_cfa which
> initializes a frame entry used only for debugging.  This however
> prevents us from detecting functions containing nested functions
> as const/pure or analyzing side effects in modref.
>
> builtin_dwarf_cfa is not documented and I wonder if it should be turned
> into an internal function.  But I think we could consider functions using
> it const even if in theory one can do things like test the return address
> and see the difference between different frame addresses.
>
> While doing so I also noticed that special_builtin_state handles quite a
> few builtins that are not special cased by ipa-modref.  They do not make
> user-visible loads/stores and thus I think they should be annotated by
> ".c" to make this explicit for both modref and PTA.
>
> Finally I added dwarf_cfa and the similar return_address to the list of
> simple builtins since they compile to a simple stack frame load and we
> consider several other builtins doing so to be simple.
>
> lto-bootstrapped/regtested all languages on x86_64-linux, seems sane?

LGTM.

> Honza
>
> * builtins.c (is_simple_builtin): Add builtin_dwarf_cfa
> and builtin_return_address.
> (builtin_fnspec): Annotate builtin_return,
> builtin_eh_pointer, builtin_eh_filter, builtin_unwind_resume,
> builtin_cxa_end_cleanup, builtin_eh_copy_values,
> builtin_frame_address, builtin_apply_args,
> builtin_asan_before_dynamic_init, builtin_asan_after_dynamic_init,
> builtin_prefetch, builtin_dwarf_cfa, builtin_return_address
> as ".c".
> * ipa-pure-const.c (special_builtin_state): Add builtin_dwarf_cfa
> and builtin_return_address.
> diff --git a/gcc/builtins.c b/gcc/builtins.c
> index 7d0f61fc98b..43433e8d6ce 100644
> --- a/gcc/builtins.c
> +++ b/gcc/builtins.c
> @@ -10711,6 +10711,8 @@ is_simple_builtin (tree decl)
>case BUILT_IN_VA_END:
>case BUILT_IN_STACK_SAVE:
>case BUILT_IN_STACK_RESTORE:
> +  case BUILT_IN_DWARF_CFA:
> +  case BUILT_IN_RETURN_ADDRESS:
> /* Exception state returns or moves registers around.  */
>case BUILT_IN_EH_FILTER:
>case BUILT_IN_EH_POINTER:
> @@ -11099,6 +11099,19 @@ builtin_fnspec (tree callee)
>CASE_BUILT_IN_TM_STORE (M256):
> return ".cO ";
>case BUILT_IN_STACK_SAVE:
> +  case BUILT_IN_RETURN:
> +  case BUILT_IN_EH_POINTER:
> +  case BUILT_IN_EH_FILTER:
> +  case BUILT_IN_UNWIND_RESUME:
> +  case BUILT_IN_CXA_END_CLEANUP:
> +  case BUILT_IN_EH_COPY_VALUES:
> +  case BUILT_IN_FRAME_ADDRESS:
> +  case BUILT_IN_APPLY_ARGS:
> +  case BUILT_IN_ASAN_BEFORE_DYNAMIC_INIT:
> +  case BUILT_IN_ASAN_AFTER_DYNAMIC_INIT:
> +  case BUILT_IN_PREFETCH:
> +  case BUILT_IN_DWARF_CFA:
> +  case BUILT_IN_RETURN_ADDRESS:
> return ".c";
>case BUILT_IN_ASSUME_ALIGNED:
> return "1cX ";
> diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c
> index a84a4eb7ac0..e5048092939 100644
> --- a/gcc/ipa-pure-const.c
> +++ b/gcc/ipa-pure-const.c
> @@ -529,6 +529,8 @@ special_builtin_state (enum pure_const_state_e *state, bool *looping,
>case BUILT_IN_APPLY_ARGS:
>case BUILT_IN_ASAN_BEFORE_DYNAMIC_INIT:
>case BUILT_IN_ASAN_AFTER_DYNAMIC_INIT:
> +  case BUILT_IN_DWARF_CFA:
> +  case BUILT_IN_RETURN_ADDRESS:
> *looping = false;
> *state = IPA_CONST;
> return true;


[PATCH] tree-optimization/103102 - fix error in vectorizer refactoring

2021-11-08 Thread Richard Biener via Gcc-patches
This fixes an oversight that caused vectorized epilogues to have
versioning for niters applied.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-11-08  Richard Biener  

* tree-vectorizer.h (vect_create_loop_vinfo): Add main_loop_info
parameter.
* tree-vect-loop.c (vect_create_loop_vinfo): Likewise.  Set
LOOP_VINFO_ORIG_LOOP_INFO and conditionalize set of
LOOP_VINFO_NITERS_ASSUMPTIONS.
(vect_analyze_loop_1): Adjust.
(vect_analyze_loop): Move loop constraint setting and
SCEV/niter reset here from vect_create_loop_vinfo to perform
it only once.
(vect_analyze_loop_form): Move dumping of symbolic niters
here from vect_create_loop_vinfo.
---
 gcc/tree-vect-loop.c  | 58 ++-
 gcc/tree-vectorizer.h |  3 ++-
 2 files changed, 32 insertions(+), 29 deletions(-)

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index b56b7a4a386..ede9aff0522 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -1464,6 +1464,18 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
   (info->loop_cond,
"not vectorized: number of iterations = 0.\n");
 
+  if (!(tree_fits_shwi_p (info->number_of_iterations)
+   && tree_to_shwi (info->number_of_iterations) > 0))
+{
+  if (dump_enabled_p ())
+   {
+ dump_printf_loc (MSG_NOTE, vect_location,
+  "Symbolic number of iterations is ");
+ dump_generic_expr (MSG_NOTE, TDF_DETAILS, info->number_of_iterations);
+ dump_printf (MSG_NOTE, "\n");
+   }
+}
+
   return opt_result::success ();
 }
 
@@ -1472,36 +1484,17 @@ vect_analyze_loop_form (class loop *loop, vect_loop_form_info *info)
 
 loop_vec_info
 vect_create_loop_vinfo (class loop *loop, vec_info_shared *shared,
-   const vect_loop_form_info *info)
+   const vect_loop_form_info *info,
+   loop_vec_info main_loop_info)
 {
   loop_vec_info loop_vinfo = new _loop_vec_info (loop, shared);
   LOOP_VINFO_NITERSM1 (loop_vinfo) = info->number_of_iterationsm1;
   LOOP_VINFO_NITERS (loop_vinfo) = info->number_of_iterations;
   LOOP_VINFO_NITERS_UNCHANGED (loop_vinfo) = info->number_of_iterations;
-  if (!integer_onep (info->assumptions))
-{
-  /* We consider to vectorize this loop by versioning it under
-some assumptions.  In order to do this, we need to clear
-existing information computed by scev and niter analyzer.  */
-  scev_reset_htab ();
-  free_numbers_of_iterations_estimates (loop);
-  /* Also set flag for this loop so that following scev and niter
-analysis are done under the assumptions.  */
-  loop_constraint_set (loop, LOOP_C_FINITE);
-  /* Also record the assumptions for versioning.  */
-  LOOP_VINFO_NITERS_ASSUMPTIONS (loop_vinfo) = info->assumptions;
-}
-
-  if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
-{
-  if (dump_enabled_p ())
-{
-  dump_printf_loc (MSG_NOTE, vect_location,
-  "Symbolic number of iterations is ");
- dump_generic_expr (MSG_NOTE, TDF_DETAILS, info->number_of_iterations);
-  dump_printf (MSG_NOTE, "\n");
-}
-}
+  LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo) = main_loop_info;
+  /* Also record the assumptions for versioning.  */
+  if (!integer_onep (info->assumptions) && !main_loop_info)
+LOOP_VINFO_NITERS_ASSUMPTIONS (loop_vinfo) = info->assumptions;
 
   stmt_vec_info loop_cond_info = loop_vinfo->lookup_stmt (info->loop_cond);
   STMT_VINFO_TYPE (loop_cond_info) = loop_exit_ctrl_vec_info_type;
@@ -2903,9 +2896,7 @@ vect_analyze_loop_1 (class loop *loop, vec_info_shared *shared,
 bool &fatal)
 {
   loop_vec_info loop_vinfo
-= vect_create_loop_vinfo (loop, shared, loop_form_info);
-  if (main_loop_vinfo)
-LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo) = main_loop_vinfo;
+= vect_create_loop_vinfo (loop, shared, loop_form_info, main_loop_vinfo);
 
   machine_mode vector_mode = vector_modes[mode_i];
   loop_vinfo->vector_mode = vector_mode;
@@ -2997,6 +2988,17 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
 "bad loop form.\n");
   return opt_loop_vec_info::propagate_failure (res);
 }
+  if (!integer_onep (loop_form_info.assumptions))
+{
+  /* We consider to vectorize this loop by versioning it under
+some assumptions.  In order to do this, we need to clear
+existing information computed by scev and niter analyzer.  */
+  scev_reset_htab ();
+  free_numbers_of_iterations_estimates (loop);
+  /* Also set flag for this loop so that following scev and niter
+analysis are done under the assumptions.  */
+  loop_constraint_set (loop, LOOP_C_FINITE);
+}
 
   auto_vector_modes vector_modes;
   /* Autodetect first vector size we try.  */
diff --git a/gc

[PATCH] libstdc++: only define _GLIBCXX_HAVE_TLS for VxWorks >= 6.6

2021-11-08 Thread Rasmus Villemoes
According to
https://gcc.gnu.org/legacy-ml/gcc-patches/2008-03/msg01698.html, the
TLS support, including the __tls_lookup function, was added to VxWorks
in 6.6.

It certainly doesn't exist on our VxWorks 5 platform, but the fallback
code in eh_globals.cc using __gthread_key_create() etc. used to work
just fine.

libstdc++-v3/ChangeLog:

* config/os/vxworks/os_defines.h (_GLIBCXX_HAVE_TLS): Only
define for VxWorks >= 6.6.
---
 libstdc++-v3/config/os/vxworks/os_defines.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/config/os/vxworks/os_defines.h b/libstdc++-v3/config/os/vxworks/os_defines.h
index c881b2b4b9e..75a68bc605b 100644
--- a/libstdc++-v3/config/os/vxworks/os_defines.h
+++ b/libstdc++-v3/config/os/vxworks/os_defines.h
@@ -45,8 +45,10 @@
 #define _GLIBCXX_USE_WEAK_REF 0
 #endif
 
-// We support TLS on VxWorks (either directly or with emutls)
+// We support TLS on VxWorks >= 6.6 (either directly or with emutls)
+#if !_VXWORKS_PRE(6, 6)
 #define _GLIBCXX_HAVE_TLS 1
+#endif
 
 // VxWorks7 comes with a DinkumWare library and the system headers which we
 // are going to include for libstdc++ have a few related intrinsic
-- 
2.31.1



Re: [PATCH] gcov-profile: Fix -fcompare-debug with -fprofile-generate [PR100520]

2021-11-08 Thread Martin Liška

On 11/5/21 18:30, Jan Hubicka wrote:

every gcc source looks like a bit of an overkill given that it can be open
coded in 3 statements?


Why? It's a static inline function with a few statements. I don't want to
copy&paste the same code at every location. I bet there must be quite a few
open-coded implementations of endswith in the GCC source code.

Martin


Re: [PATCH] PR middle-end/103059: reload: Also accept ASHIFT with indexed addressing

2021-11-08 Thread Maciej W. Rozycki
On Sun, 7 Nov 2021, Hans-Peter Nilsson wrote:

> >  How do I run regression-testing with this target however?  I can see QEMU
> > support upstream, even for user-mode Linux, which would be the easiest to
> > run (sadly toolchain support for CRIS/Linux was removed a while ago as was
> > the Linux kernel port; at one point I even considered getting myself a
> > CRIS development board as an alternative RISC platform that would Linux,
> > but concluded that it was too expensive for the features it offered), but
> > for a bare metal environment both a C library (newlib?) and then a
> > specific board support package is required.
> 
> Classic "bare-metal" whatever-elf testing should not be a
> stranger: sim and binutils support are in place in the official
> binutils+gdb git, as is newlib in that git and since many
> dejagnu releases a cris-sim.exp baseboard file.  Just build and
> install binutils and sim for cris-elf (can probable be done at
> the same time/same builds from a binutils-and-gdb checkout, but
> separate builds are sometimes necessary) then build and test gcc
> from a combined-source-tree containing newlib and gcc.
> (Instructions for combining trees may be salvaged from the
> rottening https://gcc.gnu.org/simtest-howto.html but actually I
> roll tarballs and untar gcc over an (untarred) newlib tree.)

 Thanks, I'll give it a try.  I don't use GNU sim-based configurations very 
often, so I'm not even used to thinking they exist.  It might be good to 
have a template build configuration then.

> >  Or may I ask you to put this patch through testing with your environment?
> 
> Where's the fun in that? :)
> (I thought you'd use 6cb68940dcf9 and do the same for VAX.)

 I could, easily, but being confined to gcc/config/cris I don't expect it 
to be included in the build let alone trigger anything.

> > > Your proposed patch reminded me of 6cb68940dcf9; giving reload a
> > > reload-specific insn_and_split pattern to play with, matching
> > > "mult" outside of a mem.  I *guess* that's the CRIS-specific
> > > replacement to c605a8bf9270.
> >
> >  Possibly, except for the missing reload bits making it incomplete.
> 
> No, my thinking was that it wouldn't be needed.  But, I didn't
> have a close look and maybe the problem isn't exactly the same
> or VAX has additional caveats.  Also, that reload-pacifying
> pattern *is* a target-specific workaround for a reload bug, but
> a risk-free one for other targets.

 Right.

> brgds, H-P
> PS. I'll fire up a round with that patch "tomorrow".  Film at 11.

 Great, thanks!  I'll build and test your configuration anyway.  Though I 
can see that CRIS has LRA wired to off right now, which means there'll be 
little interference likely from my upcoming work with the VAX port as I'm 
going to focus on making LRA better now rather than poking at old reload 
unless something else as grave as this issue pops up.

  Maciej


Re: [aarch64] PR102376 - Emit better diagnostic for arch extensions in target attr

2021-11-08 Thread Prathamesh Kulkarni via Gcc-patches
On Thu, 4 Nov 2021 at 14:19, Richard Sandiford
 wrote:
>
> Prathamesh Kulkarni  writes:
> > On Wed, 20 Oct 2021 at 15:05, Richard Sandiford
> >  wrote:
> >>
> >> Prathamesh Kulkarni  writes:
> >> > On Tue, 19 Oct 2021 at 19:58, Richard Sandiford
> >> >  wrote:
> >> >>
> >> >> Prathamesh Kulkarni  writes:
> >> >> > Hi,
> >> >> > The attached patch emits a more verbose diagnostic for target 
> >> >> > attribute that
> >> >> > is an architecture extension needing a leading '+'.
> >> >> >
> >> >> > For the following test,
> >> >> > void calculate(void) __attribute__ ((__target__ ("sve")));
> >> >> >
> >> >> > With patch, the compiler now emits:
> >> >> > 102376.c:1:1: error: arch extension ‘sve’ should be prepended with ‘+’
> >> >> > 1 | void calculate(void) __attribute__ ((__target__ ("sve")));
> >> >> >   | ^~~~
> >> >> >
> >> >> > instead of:
> >> >> > 102376.c:1:1: error: pragma or attribute ‘target("sve")’ is not valid
> >> >> > 1 | void calculate(void) __attribute__ ((__target__ ("sve")));
> >> >> >   | ^~~~
> >> >>
> >> >> Nice :-)
> >> >>
> >> >> > (This isn't specific to sve though).
> >> >> > OK to commit after bootstrap+test ?
> >> >> >
> >> >> > Thanks,
> >> >> > Prathamesh
> >> >> >
> >> >> > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> >> >> > index a9a1800af53..975f7faf968 100644
> >> >> > --- a/gcc/config/aarch64/aarch64.c
> >> >> > +++ b/gcc/config/aarch64/aarch64.c
> >> >> > @@ -17821,7 +17821,16 @@ aarch64_process_target_attr (tree args)
> >> >> >num_attrs++;
> >> >> >if (!aarch64_process_one_target_attr (token))
> >> >> >   {
> >> >> > -   error ("pragma or attribute % is not valid", 
> >> >> > token);
> >> >> > +   /* Check if token is possibly an arch extension without
> >> >> > +  leading '+'.  */
> >> >> > +   char *str = (char *) xmalloc (strlen (token) + 2);
> >> >> > +   str[0] = '+';
> >> >> > +   strcpy(str + 1, token);
> >> >>
> >> >> I think std::string would be better here, e.g.:
> >> >>
> >> >>   auto with_plus = std::string ("+") + token;
> >> >>
> >> >> > +   if (aarch64_handle_attr_isa_flags (str))
> >> >> > + error("arch extension %<%s%> should be prepended with 
> >> >> > %<+%>", token);
> >> >>
> >> >> Nit: should be a space before the “(”.
> >> >>
> >> >> In principle, a fixit hint would have been nice here, but I don't think
> >> >> we have enough information to provide one.  (Just saying for the 
> >> >> record.)
> >> > Thanks for the suggestions.
> >> > Does the attached patch look OK ?
> >>
> >> Looks good apart from a couple of formatting nits.
> >> >
> >> > Thanks,
> >> > Prathamesh
> >> >>
> >> >> Thanks,
> >> >> Richard
> >> >>
> >> >> > +   else
> >> >> > + error ("pragma or attribute % is not 
> >> >> > valid", token);
> >> >> > +   free (str);
> >> >> > return false;
> >> >> >   }
> >> >> >
> >> >
> >> > [aarch64] PR102376 - Emit better diagnostics for arch extension in 
> >> > target attribute.
> >> >
> >> > gcc/ChangeLog:
> >> >   PR target/102376
> >> >   * config/aarch64/aarch64.c (aarch64_handle_attr_isa_flags): Change 
> >> > str's
> >> >   type to const char *.
> >> >   (aarch64_process_target_attr): Check if token is possibly an arch 
> >> > extension
> >> >   without leading '+' and emit diagnostic accordingly.
> >> >
> >> > gcc/testsuite/ChangeLog:
> >> >   PR target/102376
> >> >   * gcc.target/aarch64/pr102376.c: New test.
> >> > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> >> > index a9a1800af53..b72079bc466 100644
> >> > --- a/gcc/config/aarch64/aarch64.c
> >> > +++ b/gcc/config/aarch64/aarch64.c
> >> > @@ -17548,7 +17548,7 @@ aarch64_handle_attr_tune (const char *str)
> >> > modified.  */
> >> >
> >> >  static bool
> >> > -aarch64_handle_attr_isa_flags (char *str)
> >> > +aarch64_handle_attr_isa_flags (const char *str)
> >> >  {
> >> >enum aarch64_parse_opt_result parse_res;
> >> >uint64_t isa_flags = aarch64_isa_flags;
> >> > @@ -17821,7 +17821,13 @@ aarch64_process_target_attr (tree args)
> >> >num_attrs++;
> >> >if (!aarch64_process_one_target_attr (token))
> >> >   {
> >> > -   error ("pragma or attribute % is not valid", 
> >> > token);
> >> > +   /* Check if token is possibly an arch extension without
> >> > +  leading '+'.  */
> >> > +   auto with_plus = std::string("+") + token;
> >>
> >> Should be a space before “(”.
> >>
> >> > +   if (aarch64_handle_attr_isa_flags (with_plus.c_str ()))
> >> > + error ("arch extension %<%s%> should be prepended with %<+%>", 
> >> > token);
> >>
> >> Long line, should be:
> >>
> >> error ("arch extension %<%s%> should be prepended with %<+%>",
> >>token);
> >>
> >> OK with those changes, thanks.
> > Thanks, the patch regressed some target attr tests because it emitted
> > diagnostics twice from
> > aarch64_handle_attr_isa_flags.

RE: [PATCH][GCC] arm: add armv9-a architecture to -march

2021-11-08 Thread Przemyslaw Wirkus via Gcc-patches
Ping :)

> -Original Message-
> From: Przemyslaw Wirkus
> Sent: 18 October 2021 10:37
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Earnshaw ; Ramana
> Radhakrishnan ; Kyrylo Tkachov
> ; ni...@redhat.com
> Subject: [PATCH][GCC] arm: add armv9-a architecture to -march
> 
> Hi,
> 
> This patch is adding `armv9-a` to -march in Arm GCC.
> 
> In this patch:
>   + Add `armv9-a` to -march.
>   + Update multilib with armv9-a and armv9-a+simd.
> 
> After this patch three additional multilib directories are available:
> 
> $ arm-none-eabi-gcc --print-multi-lib
> .;
> [...vanilla multi-lib dirs...]
> thumb/v9-a/nofp;@mthumb@march=armv9-a@mfloat-abi=soft
> thumb/v9-a+simd/softfp;@mthumb@march=armv9-a+simd@mfloat-
> abi=softfp
> thumb/v9-a+simd/hard;@mthumb@march=armv9-a+simd@mfloat-
> abi=hard
> 
> New multi-lib directories under
> $GCC_INSTALL_DIR/lib/gcc/arm-none-eabi/12.0.0/thumb are created:
> 
> thumb/
> +--- v9-a
> ||--- nofp
> |
> +--- v9-a+simd
>  |--- hard
>  |--- softfp
> 
> Regtested on arm-none-eabi cross and no issues.
> 
> OK for master?
> 
> gcc/ChangeLog:
> 
>   * config/arm/arm-cpus.in (armv9): New define.
>   (ARMv9a): New group.
>   (armv9-a): New arch definition.
>   * config/arm/arm-tables.opt: Regenerate.
>   * config/arm/arm.h (BASE_ARCH_9A): New arch enum value.
>   * config/arm/t-aprofile: Added armv9-a and armv9+simd.
>   * config/arm/t-arm-elf: Added armv9-a, v9_fps and all_v9_archs
>   to MULTILIB_MATCHES.
>   * config/arm/t-multilib: Added v9_a_nosimd_variants and
>   v9_a_simd_variants to MULTILIB_MATCHES.
>   * doc/invoke.texi: Update docs.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/multilib.exp: Update test with armv9-a entries.
>   * lib/target-supports.exp (v9a): Add new armflag.
>   (__ARM_ARCH_9A__): Add new armdef.
> 
> --
> kind regards,
> Przemyslaw Wirkus


RE: [PATCH][GCC] arm: enable cortex-a710 CPU

2021-11-08 Thread Przemyslaw Wirkus via Gcc-patches
Ping :)

> -Original Message-
> From: Przemyslaw Wirkus
> Sent: 18 October 2021 10:40
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Earnshaw ; Ramana Radhakrishnan 
> ; Kyrylo Tkachov 
> ; ni...@redhat.com
> Subject: [PATCH][GCC] arm: enable cortex-a710 CPU
> 
> Hi,
> 
> This patch is adding support for Cortex-A710 CPU [0].
> 
>   [0] https://www.arm.com/products/silicon-ip-cpu/cortex-a/cortex-a710
> 
> OK for master?
> 
> gcc/ChangeLog:
> 
>   * config/arm/arm-cpus.in (cortex-a710): New CPU.
>   * config/arm/arm-tables.opt: Regenerate.
>   * config/arm/arm-tune.md: Regenerate.
>   * doc/invoke.texi: Update docs.
> 
> --
> kind regards,
> Przemyslaw Wirkus
> 
> Staff Compiler Engineer | Arm
> . . . . . . . . . . . . . . . . . . . . . . . . . .
> 
> Arm.com


[committed] genmodes: Define NUM_MODE_* macros

2021-11-08 Thread Richard Sandiford via Gcc-patches
I was working on a patch that needed to calculate the number of
modes in a particular class.  It seemed better to have genmodes
generate this directly rather than do the kind of dance that
expmed.h had.

Tested on aarch64-linux-gnu and x86_64-linux-gnu & pushed,
counting the non-gen* bits as obvious.

Richard


gcc/
* genmodes.c (emit_insn_modes_h): Define NUM_MODE_* macros.
* expmed.h (NUM_MODE_INT): Delete in favor of genmodes definitions.
(NUM_MODE_PARTIAL_INT, NUM_MODE_VECTOR_INT): Likewise.
* real.h (real_format_for_mode): Use NUM_MODE_FLOAT and
NUM_MODE_DECIMAL_FLOAT.
(REAL_MODE_FORMAT): Likewise.
---
 gcc/expmed.h   |  9 -
 gcc/genmodes.c | 13 +
 gcc/real.h |  5 ++---
 3 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/gcc/expmed.h b/gcc/expmed.h
index 93cd6316f0d..6b13ea96c49 100644
--- a/gcc/expmed.h
+++ b/gcc/expmed.h
@@ -133,15 +133,6 @@ struct alg_hash_entry {
 #define NUM_ALG_HASH_ENTRIES 307
 #endif
 
-#define NUM_MODE_INT \
-  (MAX_MODE_INT - MIN_MODE_INT + 1)
-#define NUM_MODE_PARTIAL_INT \
-  (MIN_MODE_PARTIAL_INT == E_VOIDmode ? 0 \
-   : MAX_MODE_PARTIAL_INT - MIN_MODE_PARTIAL_INT + 1)
-#define NUM_MODE_VECTOR_INT \
-  (MIN_MODE_VECTOR_INT == E_VOIDmode ? 0 \
-   : MAX_MODE_VECTOR_INT - MIN_MODE_VECTOR_INT + 1)
-
 #define NUM_MODE_IP_INT (NUM_MODE_INT + NUM_MODE_PARTIAL_INT)
 #define NUM_MODE_IPV_INT (NUM_MODE_IP_INT + NUM_MODE_VECTOR_INT)
 
diff --git a/gcc/genmodes.c b/gcc/genmodes.c
index c9af4efba46..ecc8b448406 100644
--- a/gcc/genmodes.c
+++ b/gcc/genmodes.c
@@ -1316,6 +1316,19 @@ enum machine_mode\n{");
   NUM_MACHINE_MODES = MAX_MACHINE_MODE\n\
 };\n");
 
+  /* Define a NUM_* macro for each mode class, giving the number of modes
+ in the class.  */
+  for (c = 0; c < MAX_MODE_CLASS; c++)
+{
+  printf ("#define NUM_%s ", mode_class_names[c]);
+  if (modes[c])
+   printf ("(MAX_%s - MIN_%s + 1)\n", mode_class_names[c],
+   mode_class_names[c]);
+  else
+   printf ("0\n");
+}
+  printf ("\n");
+
   /* I can't think of a better idea, can you?  */
   printf ("#define CONST_MODE_NUNITS%s\n", adj_nunits ? "" : " const");
   printf ("#define CONST_MODE_PRECISION%s\n", adj_nunits ? "" : " const");
diff --git a/gcc/real.h b/gcc/real.h
index 015163d9917..39dd34e3971 100644
--- a/gcc/real.h
+++ b/gcc/real.h
@@ -178,13 +178,12 @@ struct real_format
decimal float modes indexed by (MODE - first decimal float mode) +
the number of float modes.  */
 extern const struct real_format *
-  real_format_for_mode[MAX_MODE_FLOAT - MIN_MODE_FLOAT + 1
-  + MAX_MODE_DECIMAL_FLOAT - MIN_MODE_DECIMAL_FLOAT + 1];
+  real_format_for_mode[NUM_MODE_FLOAT + NUM_MODE_DECIMAL_FLOAT];
 
 #define REAL_MODE_FORMAT(MODE) \
   (real_format_for_mode[DECIMAL_FLOAT_MODE_P (MODE)\
? (((MODE) - MIN_MODE_DECIMAL_FLOAT)\
-  + (MAX_MODE_FLOAT - MIN_MODE_FLOAT + 1)) \
+  + NUM_MODE_FLOAT)\
: GET_MODE_CLASS (MODE) == MODE_FLOAT   \
? ((MODE) - MIN_MODE_FLOAT) \
: (gcc_unreachable (), 0)])
-- 
2.25.1



[committed] aarch64: LD3/LD4 post-modify costs for struct modes

2021-11-08 Thread Richard Sandiford via Gcc-patches
The LD3/ST3 and LD4/ST4 address cost code had no test coverage (oops).
This patch fixes that and updates it for the new structure modes.
The test only covers Advanced SIMD because SVE doesn't have
post-increment forms.

Tested on aarch64-linux-gnu & pushed.

Richard


gcc/
* config/aarch64/aarch64.c (aarch64_ldn_stn_vectors): New function.
(aarch64_address_cost): Use it instead of testing for CImode and
XImode directly.

gcc/testsuite/
* gcc.target/aarch64/neoverse_v1_1.c: New test.
---
 gcc/config/aarch64/aarch64.c  | 22 +--
 .../gcc.target/aarch64/neoverse_v1_1.c| 15 +
 2 files changed, 35 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/neoverse_v1_1.c

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index fdf05505846..19f67415234 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -3108,6 +3108,23 @@ aarch64_vl_bytes (machine_mode mode, unsigned int 
vec_flags)
   return BYTES_PER_SVE_PRED;
 }
 
+/* If MODE holds an array of vectors, return the number of vectors
+   in the array, otherwise return 1.  */
+
+static unsigned int
+aarch64_ldn_stn_vectors (machine_mode mode)
+{
+  unsigned int vec_flags = aarch64_classify_vector_mode (mode);
+  if (vec_flags == (VEC_ADVSIMD | VEC_PARTIAL | VEC_STRUCT))
+return exact_div (GET_MODE_SIZE (mode), 8).to_constant ();
+  if (vec_flags == (VEC_ADVSIMD | VEC_STRUCT))
+return exact_div (GET_MODE_SIZE (mode), 16).to_constant ();
+  if (vec_flags == (VEC_SVE_DATA | VEC_STRUCT))
+return exact_div (GET_MODE_SIZE (mode),
+ BYTES_PER_SVE_VECTOR).to_constant ();
+  return 1;
+}
+
 /* Given an Advanced SIMD vector mode MODE and a tuple size NELEMS, return the
corresponding vector structure mode.  */
 static opt_machine_mode
@@ -12511,9 +12528,10 @@ aarch64_address_cost (rtx x,
  cost += addr_cost->pre_modify;
else if (c == POST_INC || c == POST_DEC || c == POST_MODIFY)
  {
-   if (mode == CImode)
+   unsigned int nvectors = aarch64_ldn_stn_vectors (mode);
+   if (nvectors == 3)
  cost += addr_cost->post_modify_ld3_st3;
-   else if (mode == XImode)
+   else if (nvectors == 4)
  cost += addr_cost->post_modify_ld4_st4;
else
  cost += addr_cost->post_modify;
diff --git a/gcc/testsuite/gcc.target/aarch64/neoverse_v1_1.c 
b/gcc/testsuite/gcc.target/aarch64/neoverse_v1_1.c
new file mode 100644
index 000..c1563f01861
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/neoverse_v1_1.c
@@ -0,0 +1,15 @@
+/* { dg-options "-O2 -mcpu=neoverse-v1" } */
+
+void
+foo (short *restrict x, short y[restrict][128])
+{
+  for (int i = 0; i < 128; ++i)
+{
+  y[0][i] = x[i * 3 + 0];
+  y[1][i] = x[i * 3 + 1];
+  y[2][i] = x[i * 3 + 2];
+}
+}
+
+/* This shouldn't be a post-increment.  */
+/* { dg-final { scan-assembler {ld3\t{[^{}]*}, \[x[0-9]+\]\n} } } */
-- 
2.25.1



RE: [PATCH][GCC] arm: add armv9-a architecture to -march

2021-11-08 Thread Kyrylo Tkachov via Gcc-patches
Hi Przemek,

> -Original Message-
> From: Przemyslaw Wirkus 
> Sent: Monday, November 8, 2021 10:34 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Earnshaw ; Ramana
> Radhakrishnan ; Kyrylo Tkachov
> ; ni...@redhat.com
> Subject: RE: [PATCH][GCC] arm: add armv9-a architecture to -march
> 
> Ping :)
> 
> > -Original Message-
> > From: Przemyslaw Wirkus
> > Sent: 18 October 2021 10:37
> > To: gcc-patches@gcc.gnu.org
> > Cc: Richard Earnshaw ; Ramana
> > Radhakrishnan ; Kyrylo Tkachov
> > ; ni...@redhat.com
> > Subject: [PATCH][GCC] arm: add armv9-a architecture to -march
> >
> > Hi,
> >
> > This patch is adding `armv9-a` to -march in Arm GCC.
> >
> > In this patch:
> > + Add `armv9-a` to -march.
> > + Update multilib with armv9-a and armv9-a+simd.
> >
> > After this patch three additional multilib directories are available:
> >
> > $ arm-none-eabi-gcc --print-multi-lib
> > .;
> > [...vanilla multi-lib dirs...]
> > thumb/v9-a/nofp;@mthumb@march=armv9-a@mfloat-abi=soft
> > thumb/v9-a+simd/softfp;@mthumb@march=armv9-a+simd@mfloat-
> > abi=softfp
> > thumb/v9-a+simd/hard;@mthumb@march=armv9-a+simd@mfloat-
> > abi=hard
> >
> > New multi-lib directories under
> > $GCC_INSTALL_DIR/lib/gcc/arm-none-eabi/12.0.0/thumb are created:
> >
> > thumb/
> > +--- v9-a
> > ||--- nofp
> > |
> > +--- v9-a+simd
> >  |--- hard
> >  |--- softfp
> >
> > Regtested on arm-none-eabi cross and no issues.
> >
> > OK for master?

Ok.
Thanks,
Kyrill


> >
> > gcc/ChangeLog:
> >
> > * config/arm/arm-cpus.in (armv9): New define.
> > (ARMv9a): New group.
> > (armv9-a): New arch definition.
> > * config/arm/arm-tables.opt: Regenerate.
> > * config/arm/arm.h (BASE_ARCH_9A): New arch enum value.
> > * config/arm/t-aprofile: Added armv9-a and armv9+simd.
> > * config/arm/t-arm-elf: Added armv9-a, v9_fps and all_v9_archs
> > to MULTILIB_MATCHES.
> > * config/arm/t-multilib: Added v9_a_nosimd_variants and
> > v9_a_simd_variants to MULTILIB_MATCHES.
> > * doc/invoke.texi: Update docs.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/arm/multilib.exp: Update test with armv9-a entries.
> > * lib/target-supports.exp (v9a): Add new armflag.
> > (__ARM_ARCH_9A__): Add new armdef.
> >
> > --
> > kind regards,
> > Przemyslaw Wirkus



RE: [PATCH][GCC] arm: enable cortex-a710 CPU

2021-11-08 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Przemyslaw Wirkus 
> Sent: Monday, November 8, 2021 10:35 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Earnshaw ; Ramana
> Radhakrishnan ; Kyrylo Tkachov
> ; ni...@redhat.com
> Subject: RE: [PATCH][GCC] arm: enable cortex-a710 CPU
> 
> Ping :)
> 
> > -Original Message-
> > From: Przemyslaw Wirkus
> > Sent: 18 October 2021 10:40
> > To: gcc-patches@gcc.gnu.org
> > Cc: Richard Earnshaw ; Ramana
> Radhakrishnan
> > ; Kyrylo Tkachov
> > ; ni...@redhat.com
> > Subject: [PATCH][GCC] arm: enable cortex-a710 CPU
> >
> > Hi,
> >
> > This patch is adding support for Cortex-A710 CPU [0].
> >
> >   [0] https://www.arm.com/products/silicon-ip-cpu/cortex-a/cortex-a710
> >
> > OK for master?

Ok.
Thanks,
Kyrill

> >
> > gcc/ChangeLog:
> >
> > * config/arm/arm-cpus.in (cortex-a710): New CPU.
> > * config/arm/arm-tables.opt: Regenerate.
> > * config/arm/arm-tune.md: Regenerate.
> > * doc/invoke.texi: Update docs.
> >
> > --
> > kind regards,
> > Przemyslaw Wirkus
> >
> > Staff Compiler Engineer | Arm
> > . . . . . . . . . . . . . . . . . . . . . . . . . .
> >
> > Arm.com



[PATCH] vect: Remove vec_outside/inside_cost fields

2021-11-08 Thread Richard Sandiford via Gcc-patches
The vector costs now use a common base class instead of being
completely abstract.  This means that there's no longer a
need to record the inside and outside costs separately.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


gcc/
* tree-vectorizer.h (_loop_vec_info): Remove vec_outside_cost
and vec_inside_cost.
(vector_costs::outside_cost): New function.
* tree-vectorizer.c (_loop_vec_info::_loop_vec_info): Update
after above.
(vect_estimate_min_profitable_iters): Likewise.
(vect_better_loop_vinfo_p): Get the inside and outside costs
from the loop_vec_infos' vector_costs.
---
 gcc/tree-vect-loop.c  | 24 ++--
 gcc/tree-vectorizer.h | 16 +---
 2 files changed, 19 insertions(+), 21 deletions(-)

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index b6a631d4384..dd4a363fee5 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -840,8 +840,6 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, 
vec_info_shared *shared)
 scan_map (NULL),
 slp_unrolling_factor (1),
 single_scalar_iteration_cost (0),
-vec_outside_cost (0),
-vec_inside_cost (0),
 inner_loop_cost_factor (param_vect_inner_loop_cost_factor),
 vectorizable (false),
 can_use_partial_vectors_p (param_vect_partial_vector_usage != 0),
@@ -2845,10 +2843,10 @@ vect_better_loop_vinfo_p (loop_vec_info new_loop_vinfo,
   /* Compute the costs by multiplying the inside costs with the factor and
 add the outside costs for a more complete picture.  The factor is the
 amount of times we are expecting to iterate this epilogue.  */
-  old_cost = old_loop_vinfo->vec_inside_cost * old_factor;
-  new_cost = new_loop_vinfo->vec_inside_cost * new_factor;
-  old_cost += old_loop_vinfo->vec_outside_cost;
-  new_cost += new_loop_vinfo->vec_outside_cost;
+  old_cost = old_loop_vinfo->vector_costs->body_cost () * old_factor;
+  new_cost = new_loop_vinfo->vector_costs->body_cost () * new_factor;
+  old_cost += old_loop_vinfo->vector_costs->outside_cost ();
+  new_cost += new_loop_vinfo->vector_costs->outside_cost ();
   return new_cost < old_cost;
 }
 
@@ -2865,8 +2863,8 @@ vect_better_loop_vinfo_p (loop_vec_info new_loop_vinfo,
 
   /* Check whether the (fractional) cost per scalar iteration is lower
  or higher: new_inside_cost / new_vf vs. old_inside_cost / old_vf.  */
-  poly_int64 rel_new = new_loop_vinfo->vec_inside_cost * old_vf;
-  poly_int64 rel_old = old_loop_vinfo->vec_inside_cost * new_vf;
+  poly_int64 rel_new = new_loop_vinfo->vector_costs->body_cost () * old_vf;
+  poly_int64 rel_old = old_loop_vinfo->vector_costs->body_cost () * new_vf;
 
   HOST_WIDE_INT est_rel_new_min
 = estimated_poly_value (rel_new, POLY_VALUE_MIN);
@@ -2918,8 +2916,10 @@ vect_better_loop_vinfo_p (loop_vec_info new_loop_vinfo,
 
   /* If there's nothing to choose between the loop bodies, see whether
  there's a difference in the prologue and epilogue costs.  */
-  if (new_loop_vinfo->vec_outside_cost != old_loop_vinfo->vec_outside_cost)
-return new_loop_vinfo->vec_outside_cost < old_loop_vinfo->vec_outside_cost;
+  auto old_outside_cost = old_loop_vinfo->vector_costs->outside_cost ();
+  auto new_outside_cost = new_loop_vinfo->vector_costs->outside_cost ();
+  if (new_outside_cost != old_outside_cost)
+return new_outside_cost < old_outside_cost;
 
   return false;
 }
@@ -4272,10 +4272,6 @@ vect_estimate_min_profitable_iters (loop_vec_info 
loop_vinfo,
 
   vec_outside_cost = (int)(vec_prologue_cost + vec_epilogue_cost);
 
-  /* Stash the costs so that we can compare two loop_vec_infos.  */
-  loop_vinfo->vec_inside_cost = vec_inside_cost;
-  loop_vinfo->vec_outside_cost = vec_outside_cost;
-
   if (dump_enabled_p ())
 {
   dump_printf_loc (MSG_NOTE, vect_location, "Cost model analysis: \n");
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 1cd6cc036f2..87d3f211a2e 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -724,13 +724,6 @@ public:
   /* Cost of a single scalar iteration.  */
   int single_scalar_iteration_cost;
 
-  /* The cost of the vector prologue and epilogue, including peeled
- iterations and set-up code.  */
-  int vec_outside_cost;
-
-  /* The cost of the vector loop body.  */
-  int vec_inside_cost;
-
   /* The factor used to over weight those statements in an inner loop
  relative to the loop being vectorized.  */
   unsigned int inner_loop_cost_factor;
@@ -1429,6 +1422,7 @@ public:
   unsigned int prologue_cost () const;
   unsigned int body_cost () const;
   unsigned int epilogue_cost () const;
+  unsigned int outside_cost () const;
 
 protected:
   unsigned int record_stmt_cost (stmt_vec_info, vect_cost_model_location,
@@ -1489,6 +1483,14 @@ vector_costs::epilogue_cost () const
   return m_costs[vect_epilogue];
 }
 
+/* Return the cost of the prologue and epilogue code (in abstract units).  */

[PATCH] vect: Hookize better_loop_vinfo_p

2021-11-08 Thread Richard Sandiford via Gcc-patches
One of the things we want to do on AArch64 is compare vector loops
side-by-side and pick the best one.  For some targets, we want this
to be based on issue rates as well as the usual latency-based costs
(at least for loops with relatively high iteration counts).

The current approach to doing this is: when costing vectorisation
candidate A, try to guess what the other main candidate B will look
like and adjust A's latency-based cost up or down based on the likely
difference between A and B's issue rates.  This effectively means
that we try to cost parts of B at the same time as A, without actually
being able to see B.

This is needlessly indirect and complex.  It was a compromise due
to the code being added (too) late in the GCC 11 cycle, so that
target-independent changes weren't possible.

The target-independent code already compares two candidate loop_vec_infos
side-by-side, so that information about A and B above are available
directly.  This patch creates a way for targets to hook into this
comparison.

The AArch64 code can therefore hook into better_main_loop_than_p to
compare issue rates.  If the issue rate comparison isn't decisive,
the code can fall back to the normal latency-based comparison instead.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


gcc/
* tree-vectorizer.h (vector_costs::better_main_loop_than_p)
(vector_costs::better_epilogue_loop_than_p)
(vector_costs::compare_inside_loop_cost)
(vector_costs::compare_outside_loop_cost): Likewise.
* tree-vectorizer.c (vector_costs::better_main_loop_than_p)
(vector_costs::better_epilogue_loop_than_p)
(vector_costs::compare_inside_loop_cost)
(vector_costs::compare_outside_loop_cost): New functions,
containing code moved from...
* tree-vect-loop.c (vect_better_loop_vinfo_p): ...here.
---
 gcc/tree-vect-loop.c  | 142 ++---
 gcc/tree-vectorizer.c | 204 ++
 gcc/tree-vectorizer.h |  17 
 3 files changed, 226 insertions(+), 137 deletions(-)

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index dd4a363fee5..c9ee2e15e35 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2784,144 +2784,12 @@ vect_better_loop_vinfo_p (loop_vec_info new_loop_vinfo,
return new_simdlen_p;
 }
 
-  loop_vec_info main_loop = LOOP_VINFO_ORIG_LOOP_INFO (old_loop_vinfo);
-  if (main_loop)
-{
-  poly_uint64 main_poly_vf = LOOP_VINFO_VECT_FACTOR (main_loop);
-  unsigned HOST_WIDE_INT main_vf;
-  unsigned HOST_WIDE_INT old_factor, new_factor, old_cost, new_cost;
-  /* If we can determine how many iterations are left for the epilogue
-loop, that is if both the main loop's vectorization factor and number
-of iterations are constant, then we use them to calculate the cost of
-the epilogue loop together with a 'likely value' for the epilogues
-vectorization factor.  Otherwise we use the main loop's vectorization
-factor and the maximum poly value for the epilogue's.  If the target
-has not provided with a sensible upper bound poly vectorization
-factors are likely to be favored over constant ones.  */
-  if (main_poly_vf.is_constant (&main_vf)
- && LOOP_VINFO_NITERS_KNOWN_P (main_loop))
-   {
- unsigned HOST_WIDE_INT niters
-   = LOOP_VINFO_INT_NITERS (main_loop) % main_vf;
- HOST_WIDE_INT old_likely_vf
-   = estimated_poly_value (old_vf, POLY_VALUE_LIKELY);
- HOST_WIDE_INT new_likely_vf
-   = estimated_poly_value (new_vf, POLY_VALUE_LIKELY);
-
- /* If the epilogue is using partial vectors we account for the
-partial iteration here too.  */
- old_factor = niters / old_likely_vf;
- if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (old_loop_vinfo)
- && niters % old_likely_vf != 0)
-   old_factor++;
-
- new_factor = niters / new_likely_vf;
- if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (new_loop_vinfo)
- && niters % new_likely_vf != 0)
-   new_factor++;
-   }
-  else
-   {
- unsigned HOST_WIDE_INT main_vf_max
-   = estimated_poly_value (main_poly_vf, POLY_VALUE_MAX);
-
- old_factor = main_vf_max / estimated_poly_value (old_vf,
-  POLY_VALUE_MAX);
- new_factor = main_vf_max / estimated_poly_value (new_vf,
-  POLY_VALUE_MAX);
-
- /* If the loop is not using partial vectors then it will iterate one
-time less than one that does.  It is safe to subtract one here,
-because the main loop's vf is always at least 2x bigger than that
-of an epilogue.  */
- if (!LOOP_VINFO_USING_PARTIAL_VECTORS_P (old_loop_vinfo))
-   old_factor -= 1;
- if (!LOOP_VINFO_USING_PARTIAL_VECTORS_P (new_loop_vinfo))

[ping^3] Make sure that we get unique test names if several DejaGnu directives refer to the same line [PR102735]

2021-11-08 Thread Thomas Schwinge
Hi!

Ping, once more.


Grüße
 Thomas


On 2021-10-14T12:12:41+0200, I wrote:
> Hi!
>
> Ping, again.
>
> Commit log updated for 
> "privatization-1-compute.c results in both XFAIL and PASS".
>
>
> Grüße
>  Thomas
>
>
> On 2021-09-30T08:42:25+0200, I wrote:
>> Hi!
>>
>> Ping.
>>
>> On 2021-09-22T13:03:46+0200, I wrote:
>>> On 2021-09-19T11:35:00-0600, Jeff Law via Gcc-patches 
>>>  wrote:
 A couple of goacc tests do not have unique names.
>>>
>>> Thanks for fixing this up, and sorry, largely my "fault", I suppose.  ;-|
>>>
 This causes problems
 for the test comparison script when one of the test passes and the other
 fails -- in this scenario the test comparison script claims there is a
 regression.
>>>
>>> So I understand correctly that this is a problem not just for actual
>>> mixed PASS vs. FAIL (which we'd like you to report anyway!) that appear
>>> for the same line, but also for mixed PASS vs. XFAIL?  (Because, the
>>> latter appears to be what you're addressing with your commit here.)
>>>
 This slipped through for a while because I had turned off x86_64 testing
 (others test it regularly and I was revamping the tester's hardware
 requirements).  Now that I've acquired more x86_64 resources and turned
 on native x86 testing again, it's been flagged.
>>>
>>> (I don't follow that argument -- these test cases should be all generic?
>>> Anyway, not important, I guess.)
>>>
 This patch just adds a numeric suffix to the TODO string to disambiguate
 them.
>>>
>>> So, instead of doing this manually (always error-prone!), like you've...
>>>
 Committed to the trunk,
>>>
 commit f75b237254f32d5be32c9d9610983b777abea633
 Author: Jeff Law 
 Date:   Sun Sep 19 13:31:32 2021 -0400

 [committed] Make test names unique for a couple of goacc tests
>>>
 --- a/gcc/testsuite/gfortran.dg/goacc/privatization-1-compute.f90
 +++ b/gcc/testsuite/gfortran.dg/goacc/privatization-1-compute.f90
 @@ -39,9 +39,9 @@ contains
!$acc atomic write ! ... to force 'TREE_ADDRESSABLE'.
y = a
  !$acc end parallel
 -! { dg-note {variable 'i' in 'private' clause potentially has 
 improper OpenACC privatization level: 'parm_decl'} "TODO" { xfail *-*-* } 
 l_compute$c_compute }
 -! { dg-note {variable 'j' in 'private' clause potentially has 
 improper OpenACC privatization level: 'parm_decl'} "TODO" { xfail *-*-* } 
 l_compute$c_compute }
 -! { dg-note {variable 'a' in 'private' clause potentially has 
 improper OpenACC privatization level: 'parm_decl'} "TODO" { xfail *-*-* } 
 l_compute$c_compute }
 +! { dg-note {variable 'i' in 'private' clause potentially has 
 improper OpenACC privatization level: 'parm_decl'} "TODO2" { xfail *-*-* } 
 l_compute$c_compute }
 +! { dg-note {variable 'j' in 'private' clause potentially has 
 improper OpenACC privatization level: 'parm_decl'} "TODO3" { xfail *-*-* } 
 l_compute$c_compute }
 +! { dg-note {variable 'a' in 'private' clause potentially has 
 improper OpenACC privatization level: 'parm_decl'} "TODO4" { xfail *-*-* } 
 l_compute$c_compute }
>>>
>>> ... etc. (also similarly in a handful of earlier commits, if I remember
>>> correctly), why don't we do that programmatically, like in the attached
>>> "Make sure that we get unique test names if several DejaGnu directives
>>> refer to the same line", once and for all?  OK to push after proper
>>> testing?
>>
>> Attached again, for easy reference.
>>
>> I figure it may help if I showed an example of how this changes things;
>> for the test case cited above (word-diff):
>>
>> PASS: gfortran.dg/goacc/privatization-1-compute.f90   -O   {+at line 
>> 40+} (test for warnings, line 39)
>> PASS: gfortran.dg/goacc/privatization-1-compute.f90   -O   {+at line 
>> 41+} (test for warnings, line 22)
>> PASS: gfortran.dg/goacc/privatization-1-compute.f90   -O   {+at line 
>> 42+} (test for warnings, line 39)
>> PASS: gfortran.dg/goacc/privatization-1-compute.f90   -O   {+at line 
>> 43+} (test for warnings, line 22)
>> PASS: gfortran.dg/goacc/privatization-1-compute.f90   -O   {+at line 
>> 44+} (test for warnings, line 39)
>> PASS: gfortran.dg/goacc/privatization-1-compute.f90   -O   {+at line 
>> 45+} (test for warnings, line 22)
>> XFAIL: gfortran.dg/goacc/privatization-1-compute.f90   -O  TODO2 {+at 
>> line 50+} (test for warnings, line 29)
>> XFAIL: gfortran.dg/goacc/privatization-1-compute.f90   -O  TODO3 {+at 
>> line 51+} (test for warnings, line 29)
>> XFAIL: gfortran.dg/goacc/privatization-1-compute.f90   -O  TODO4 {+at 
>> line 52+} (test for warnings, line 29)
>> PASS: gfortran.dg/goacc/privatization-1-compute.f90   -O  TODO {+at line 
>> 53+} (test for warnings, line 29)
>> PASS: gfortran.dg/goacc/privatization-1-compute.f90   -O   {+at line 
>> 54+} (test for warning

[PATCH] vect: Keep scalar costs around longer

2021-11-08 Thread Richard Sandiford via Gcc-patches
The scalar costs for a loop are fleeting, with only the final
single_scalar_iteration_cost being kept for later comparison.
This patch replaces single_scalar_iteration_cost with the cost
structure, so that (with later patches) it's possible for targets
to examine other target-specific cost properties as well.  This will
be done by passing the scalar costs to hooks where appropriate;
targets shouldn't try to read the information directly from
loop_vec_infos.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


gcc/
* tree-vectorizer.h (_loop_vec_info::scalar_costs): New member
variable.
(_loop_vec_info::single_scalar_iteration_cost): Delete.
(LOOP_VINFO_SINGLE_SCALAR_ITERATION_COST): Delete.
(vector_costs::total_cost): New function.
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Update
after above changes.
(_loop_vec_info::~_loop_vec_info): Delete scalar_costs.
(vect_compute_single_scalar_iteration_cost): Store the costs
in loop_vinfo->scalar_costs.
(vect_estimate_min_profitable_iters): Get the scalar cost from
loop_vinfo->scalar_costs.
---
 gcc/tree-vect-loop.c  | 17 ++---
 gcc/tree-vectorizer.h | 17 +
 2 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index c9ee2e15e35..887275a5071 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -822,6 +822,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, 
vec_info_shared *shared)
 num_iters_unchanged (NULL_TREE),
 num_iters_assumptions (NULL_TREE),
 vector_costs (nullptr),
+scalar_costs (nullptr),
 th (0),
 versioning_threshold (0),
 vectorization_factor (0),
@@ -839,7 +840,6 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, 
vec_info_shared *shared)
 ivexpr_map (NULL),
 scan_map (NULL),
 slp_unrolling_factor (1),
-single_scalar_iteration_cost (0),
 inner_loop_cost_factor (param_vect_inner_loop_cost_factor),
 vectorizable (false),
 can_use_partial_vectors_p (param_vect_partial_vector_usage != 0),
@@ -931,6 +931,7 @@ _loop_vec_info::~_loop_vec_info ()
   delete ivexpr_map;
   delete scan_map;
   epilogue_vinfos.release ();
+  delete scalar_costs;
   delete vector_costs;
 
   /* When we release an epiloge vinfo that we do not intend to use
@@ -1292,20 +1293,15 @@ vect_compute_single_scalar_iteration_cost 
(loop_vec_info loop_vinfo)
 }
 
   /* Now accumulate cost.  */
-  vector_costs *target_cost_data = init_cost (loop_vinfo, true);
+  loop_vinfo->scalar_costs = init_cost (loop_vinfo, true);
   stmt_info_for_cost *si;
   int j;
   FOR_EACH_VEC_ELT (LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo),
j, si)
-(void) add_stmt_cost (target_cost_data, si->count,
+(void) add_stmt_cost (loop_vinfo->scalar_costs, si->count,
  si->kind, si->stmt_info, si->vectype,
  si->misalign, si->where);
-  unsigned prologue_cost = 0, body_cost = 0, epilogue_cost = 0;
-  finish_cost (target_cost_data, &prologue_cost, &body_cost,
-  &epilogue_cost);
-  delete target_cost_data;
-  LOOP_VINFO_SINGLE_SCALAR_ITERATION_COST (loop_vinfo)
-= prologue_cost + body_cost + epilogue_cost;
+  loop_vinfo->scalar_costs->finish_cost ();
 }
 
 
@@ -3868,8 +3864,7 @@ vect_estimate_min_profitable_iters (loop_vec_info 
loop_vinfo,
  TODO: Consider assigning different costs to different scalar
  statements.  */
 
-  scalar_single_iter_cost
-= LOOP_VINFO_SINGLE_SCALAR_ITERATION_COST (loop_vinfo);
+  scalar_single_iter_cost = loop_vinfo->scalar_costs->total_cost ();
 
   /* Add additional cost for the peeled instructions in prologue and epilogue
  loop.  (For fully-masked loops there will be no peeling.)
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 0e3aad590e8..8dba3a34aa9 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -590,6 +590,9 @@ public:
   /* The cost of the vector code.  */
   class vector_costs *vector_costs;
 
+  /* The cost of the scalar code.  */
+  class vector_costs *scalar_costs;
+
   /* Threshold of number of iterations below which vectorization will not be
  performed. It is calculated from MIN_PROFITABLE_ITERS and
  param_min_vect_loop_bound.  */
@@ -721,9 +724,6 @@ public:
  applied to the loop, i.e., no unrolling is needed, this is 1.  */
   poly_uint64 slp_unrolling_factor;
 
-  /* Cost of a single scalar iteration.  */
-  int single_scalar_iteration_cost;
-
   /* The factor used to over weight those statements in an inner loop
  relative to the loop being vectorized.  */
   unsigned int inner_loop_cost_factor;
@@ -843,7 +843,6 @@ public:
 #define LOOP_VINFO_SCALAR_LOOP_SCALING(L)  (L)->scalar_loop_scaling
 #define LOOP_VINFO_HAS_MASK_STORE(L)   (L)->has_mask_store
 #define LOOP_VINFO_SCALAR_ITERATION_COST(L) (L)->scalar_cost_vec
-#define LOOP_VINFO_SINGLE_SCALAR_ITERATION_COST(L) (L)->single_scalar_iteration_cost

[PATCH] vect: Pass scalar_costs to finish_cost

2021-11-08 Thread Richard Sandiford via Gcc-patches
When finishing the vector costs, it can be useful to know
what the associated scalar costs were.  This allows targets
to read information collected about the original scalar loop
when trying to make a final judgement about the cost of the
vector code.

This patch therefore passes the scalar costs to
vector_costs::finish_cost.  The parameter is null for the
scalar costs themselves.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


gcc/
* tree-vectorizer.h (vector_costs::finish_cost): Take the
corresponding scalar costs as a parameter.
(finish_cost): Likewise.
* tree-vect-loop.c (vect_compute_single_scalar_iteration_cost)
(vect_estimate_min_profitable_iters): Update accordingly.
* tree-vect-slp.c (vect_bb_vectorization_profitable_p): Likewise.
* tree-vectorizer.c (vector_costs::finish_cost): Likewise.
* config/aarch64/aarch64.c (aarch64_vector_costs::finish_cost):
Likewise.
* config/rs6000/rs6000.c (rs6000_cost_data::finish_cost): Likewise.
---
 gcc/config/aarch64/aarch64.c |  6 +++---
 gcc/config/rs6000/rs6000.c   |  6 +++---
 gcc/tree-vect-loop.c |  6 +++---
 gcc/tree-vect-slp.c  |  7 ---
 gcc/tree-vectorizer.c|  2 +-
 gcc/tree-vectorizer.h| 14 +-
 6 files changed, 23 insertions(+), 18 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 19f67415234..ebb937211ed 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -14745,7 +14745,7 @@ public:
  stmt_vec_info stmt_info, tree vectype,
  int misalign,
  vect_cost_model_location where) override;
-  void finish_cost () override;
+  void finish_cost (const vector_costs *) override;
 
 private:
   void record_potential_advsimd_unrolling (loop_vec_info);
@@ -16138,7 +16138,7 @@ aarch64_vector_costs::adjust_body_cost (unsigned int 
body_cost)
 }
 
 void
-aarch64_vector_costs::finish_cost ()
+aarch64_vector_costs::finish_cost (const vector_costs *scalar_costs)
 {
   loop_vec_info loop_vinfo = dyn_cast (m_vinfo);
   if (loop_vinfo
@@ -16146,7 +16146,7 @@ aarch64_vector_costs::finish_cost ()
   && aarch64_use_new_vector_costs_p ())
 m_costs[vect_body] = adjust_body_cost (m_costs[vect_body]);
 
-  vector_costs::finish_cost ();
+  vector_costs::finish_cost (scalar_costs);
 }
 
 static void initialize_aarch64_code_model (struct gcc_options *);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index ec054800491..cd44ac61336 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -5268,7 +5268,7 @@ public:
  stmt_vec_info stmt_info, tree vectype,
  int misalign,
  vect_cost_model_location where) override;
-  void finish_cost () override;
+  void finish_cost (const vector_costs *) override;
 
 protected:
   void update_target_cost_per_stmt (vect_cost_for_stmt, stmt_vec_info,
@@ -5522,7 +5522,7 @@ rs6000_cost_data::adjust_vect_cost_per_loop 
(loop_vec_info loop_vinfo)
 }
 
 void
-rs6000_cost_data::finish_cost ()
+rs6000_cost_data::finish_cost (const vector_costs *scalar_costs)
 {
   if (loop_vec_info loop_vinfo = dyn_cast (m_vinfo))
 {
@@ -5539,7 +5539,7 @@ rs6000_cost_data::finish_cost ()
m_costs[vect_body] += 1;
 }
 
-  vector_costs::finish_cost ();
+  vector_costs::finish_cost (scalar_costs);
 }
 
 /* Implement targetm.loop_unroll_adjust.  */
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 887275a5071..190b52142e4 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -1301,7 +1301,7 @@ vect_compute_single_scalar_iteration_cost (loop_vec_info 
loop_vinfo)
 (void) add_stmt_cost (loop_vinfo->scalar_costs, si->count,
  si->kind, si->stmt_info, si->vectype,
  si->misalign, si->where);
-  loop_vinfo->scalar_costs->finish_cost ();
+  loop_vinfo->scalar_costs->finish_cost (nullptr);
 }
 
 
@@ -4130,8 +4130,8 @@ vect_estimate_min_profitable_iters (loop_vec_info 
loop_vinfo,
 }
 
   /* Complete the target-specific cost calculations.  */
-  finish_cost (loop_vinfo->vector_costs, &vec_prologue_cost,
-  &vec_inside_cost, &vec_epilogue_cost);
+  finish_cost (loop_vinfo->vector_costs, loop_vinfo->scalar_costs,
+  &vec_prologue_cost, &vec_inside_cost, &vec_epilogue_cost);
 
   vec_outside_cost = (int)(vec_prologue_cost + vec_epilogue_cost);
 
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index d437bfd20d0..94c75497495 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -5344,7 +5344,8 @@ vect_bb_vectorization_profitable_p (bb_vec_info bb_vinfo,
   while (si < li_scalar_costs.length ()
 && li_scalar_costs[si].first == sl);
   unsigned dummy;
-  finish_cost (scalar_target_cost_data, &dummy, &sca

Re: [PATCH] vect: Remove vec_outside/inside_cost fields

2021-11-08 Thread Richard Biener via Gcc-patches
On Mon, Nov 8, 2021 at 11:44 AM Richard Sandiford via Gcc-patches
 wrote:
>
> The vector costs now use a common base class instead of being
> completely abstract.  This means that there's no longer a
> need to record the inside and outside costs separately.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

OK.

Richard.

> Richard
>
>
> gcc/
> * tree-vectorizer.h (_loop_vec_info): Remove vec_outside_cost
> and vec_inside_cost.
> (vector_costs::outside_cost): New function.
> * tree-vectorizer.c (_loop_vec_info::_loop_vec_info): Update
> after above.
> (vect_estimate_min_profitable_iters): Likewise.
> (vect_better_loop_vinfo_p): Get the inside and outside costs
> from the loop_vec_infos' vector_costs.
> ---
>  gcc/tree-vect-loop.c  | 24 ++--
>  gcc/tree-vectorizer.h | 16 +---
>  2 files changed, 19 insertions(+), 21 deletions(-)
>
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index b6a631d4384..dd4a363fee5 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -840,8 +840,6 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, 
> vec_info_shared *shared)
>  scan_map (NULL),
>  slp_unrolling_factor (1),
>  single_scalar_iteration_cost (0),
> -vec_outside_cost (0),
> -vec_inside_cost (0),
>  inner_loop_cost_factor (param_vect_inner_loop_cost_factor),
>  vectorizable (false),
>  can_use_partial_vectors_p (param_vect_partial_vector_usage != 0),
> @@ -2845,10 +2843,10 @@ vect_better_loop_vinfo_p (loop_vec_info 
> new_loop_vinfo,
>/* Compute the costs by multiplying the inside costs with the factor 
> and
>  add the outside costs for a more complete picture.  The factor is the
>  amount of times we are expecting to iterate this epilogue.  */
> -  old_cost = old_loop_vinfo->vec_inside_cost * old_factor;
> -  new_cost = new_loop_vinfo->vec_inside_cost * new_factor;
> -  old_cost += old_loop_vinfo->vec_outside_cost;
> -  new_cost += new_loop_vinfo->vec_outside_cost;
> +  old_cost = old_loop_vinfo->vector_costs->body_cost () * old_factor;
> +  new_cost = new_loop_vinfo->vector_costs->body_cost () * new_factor;
> +  old_cost += old_loop_vinfo->vector_costs->outside_cost ();
> +  new_cost += new_loop_vinfo->vector_costs->outside_cost ();
>return new_cost < old_cost;
>  }
>
> @@ -2865,8 +2863,8 @@ vect_better_loop_vinfo_p (loop_vec_info new_loop_vinfo,
>
>/* Check whether the (fractional) cost per scalar iteration is lower
>   or higher: new_inside_cost / new_vf vs. old_inside_cost / old_vf.  */
> -  poly_int64 rel_new = new_loop_vinfo->vec_inside_cost * old_vf;
> -  poly_int64 rel_old = old_loop_vinfo->vec_inside_cost * new_vf;
> +  poly_int64 rel_new = new_loop_vinfo->vector_costs->body_cost () * old_vf;
> +  poly_int64 rel_old = old_loop_vinfo->vector_costs->body_cost () * new_vf;
>
>HOST_WIDE_INT est_rel_new_min
>  = estimated_poly_value (rel_new, POLY_VALUE_MIN);
> @@ -2918,8 +2916,10 @@ vect_better_loop_vinfo_p (loop_vec_info new_loop_vinfo,
>
>/* If there's nothing to choose between the loop bodies, see whether
>   there's a difference in the prologue and epilogue costs.  */
> -  if (new_loop_vinfo->vec_outside_cost != old_loop_vinfo->vec_outside_cost)
> -return new_loop_vinfo->vec_outside_cost < 
> old_loop_vinfo->vec_outside_cost;
> +  auto old_outside_cost = old_loop_vinfo->vector_costs->outside_cost ();
> +  auto new_outside_cost = new_loop_vinfo->vector_costs->outside_cost ();
> +  if (new_outside_cost != old_outside_cost)
> +return new_outside_cost < old_outside_cost;
>
>return false;
>  }
> @@ -4272,10 +4272,6 @@ vect_estimate_min_profitable_iters (loop_vec_info 
> loop_vinfo,
>
>vec_outside_cost = (int)(vec_prologue_cost + vec_epilogue_cost);
>
> -  /* Stash the costs so that we can compare two loop_vec_infos.  */
> -  loop_vinfo->vec_inside_cost = vec_inside_cost;
> -  loop_vinfo->vec_outside_cost = vec_outside_cost;
> -
>if (dump_enabled_p ())
>  {
>dump_printf_loc (MSG_NOTE, vect_location, "Cost model analysis: \n");
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index 1cd6cc036f2..87d3f211a2e 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -724,13 +724,6 @@ public:
>/* Cost of a single scalar iteration.  */
>int single_scalar_iteration_cost;
>
> -  /* The cost of the vector prologue and epilogue, including peeled
> - iterations and set-up code.  */
> -  int vec_outside_cost;
> -
> -  /* The cost of the vector loop body.  */
> -  int vec_inside_cost;
> -
>/* The factor used to over weight those statements in an inner loop
>   relative to the loop being vectorized.  */
>unsigned int inner_loop_cost_factor;
> @@ -1429,6 +1422,7 @@ public:
>unsigned int prologue_cost () const;
>unsigned int body_cost () const;
>unsigned int epilogue

[PATCH] Dump static chain for cgraph_node.

2021-11-08 Thread Martin Liška

The patch bootstraps on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

* cgraph.c (cgraph_node::dump): Dump static_chain_decl.
---
 gcc/cgraph.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index de078653781..8299ee92946 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -2203,6 +2203,10 @@ cgraph_node::dump (FILE *f)
 fprintf (f, " %soperator_delete",
 DECL_IS_REPLACEABLE_OPERATOR (decl) ? "replaceable_" : "");
 
+  function *fn = DECL_STRUCT_FUNCTION (decl);
+  if (fn != NULL && fn->static_chain_decl)
+fprintf (f, " static_chain_decl");
+
   fprintf (f, "\n");
 
   if (thunk)

--
2.33.1



Re: [PATCH] vect: Hookize better_loop_vinfo_p

2021-11-08 Thread Richard Biener via Gcc-patches
On Mon, Nov 8, 2021 at 11:45 AM Richard Sandiford via Gcc-patches
 wrote:
>
> One of the things we want to do on AArch64 is compare vector loops
> side-by-side and pick the best one.  For some targets, we want this
> to be based on issue rates as well as the usual latency-based costs
> (at least for loops with relatively high iteration counts).
>
> The current approach to doing this is: when costing vectorisation
> candidate A, try to guess what the other main candidate B will look
> like and adjust A's latency-based cost up or down based on the likely
> difference between A and B's issue rates.  This effectively means
> that we try to cost parts of B at the same time as A, without actually
> being able to see B.

I think the idea of the current costing is that comparisons are always
made against the original scalar loop (by comparing the resulting magic
cost numbers), so that, by means of transitivity, comparing two
vector variants works as well.  So I'm not sure where 'variant B'
comes into play here?

> This is needlessly indirect and complex.  It was a compromise due
> to the code being added (too) late in the GCC 11 cycle, so that
> target-independent changes weren't possible.
>
> The target-independent code already compares two candidate loop_vec_infos
> side-by-side, so that information about A and B above are available
> directly.  This patch creates a way for targets to hook into this
> comparison.

You mean it has both loop_vec_infos.  But as I said, I'm not sure it's a good
idea to compare those side-by-side - will that not lead to _more_ special-casing,
since you need to have a working comparison against the scalar variant as well
to determine the vectorization threshold?

>
> The AArch64 code can therefore hook into better_main_loop_than_p to
> compare issue rates.  If the issue rate comparison isn't decisive,
> the code can fall back to the normal latency-based comparison instead.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>
> Richard
>
>
> gcc/
> * tree-vectorizer.h (vector_costs::better_main_loop_than_p)
> (vector_costs::better_epilogue_loop_than_p)
> (vector_costs::compare_inside_loop_cost)
> (vector_costs::compare_outside_loop_cost): Likewise.
> * tree-vectorizer.c (vector_costs::better_main_loop_than_p)
> (vector_costs::better_epilogue_loop_than_p)
> (vector_costs::compare_inside_loop_cost)
> (vector_costs::compare_outside_loop_cost): New functions,
> containing code moved from...
> * tree-vect-loop.c (vect_better_loop_vinfo_p): ...here.
> ---
>  gcc/tree-vect-loop.c  | 142 ++---
>  gcc/tree-vectorizer.c | 204 ++
>  gcc/tree-vectorizer.h |  17 
>  3 files changed, 226 insertions(+), 137 deletions(-)
>
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index dd4a363fee5..c9ee2e15e35 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -2784,144 +2784,12 @@ vect_better_loop_vinfo_p (loop_vec_info 
> new_loop_vinfo,
> return new_simdlen_p;
>  }
>
> -  loop_vec_info main_loop = LOOP_VINFO_ORIG_LOOP_INFO (old_loop_vinfo);
> -  if (main_loop)
> -{
> -  poly_uint64 main_poly_vf = LOOP_VINFO_VECT_FACTOR (main_loop);
> -  unsigned HOST_WIDE_INT main_vf;
> -  unsigned HOST_WIDE_INT old_factor, new_factor, old_cost, new_cost;
> -  /* If we can determine how many iterations are left for the epilogue
> -loop, that is if both the main loop's vectorization factor and number
> -of iterations are constant, then we use them to calculate the cost of
> -the epilogue loop together with a 'likely value' for the epilogues
> -vectorization factor.  Otherwise we use the main loop's vectorization
> -factor and the maximum poly value for the epilogue's.  If the target
> -has not provided with a sensible upper bound poly vectorization
> -factors are likely to be favored over constant ones.  */
> -  if (main_poly_vf.is_constant (&main_vf)
> - && LOOP_VINFO_NITERS_KNOWN_P (main_loop))
> -   {
> - unsigned HOST_WIDE_INT niters
> -   = LOOP_VINFO_INT_NITERS (main_loop) % main_vf;
> - HOST_WIDE_INT old_likely_vf
> -   = estimated_poly_value (old_vf, POLY_VALUE_LIKELY);
> - HOST_WIDE_INT new_likely_vf
> -   = estimated_poly_value (new_vf, POLY_VALUE_LIKELY);
> -
> - /* If the epilogue is using partial vectors we account for the
> -partial iteration here too.  */
> - old_factor = niters / old_likely_vf;
> - if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (old_loop_vinfo)
> - && niters % old_likely_vf != 0)
> -   old_factor++;
> -
> - new_factor = niters / new_likely_vf;
> - if (LOOP_VINFO_USING_PARTIAL_VECTORS_P (new_loop_vinfo)
> - && niters % new_likely_vf != 0)
> -   new_factor++;
> -   }
> -  else
> -   

[PATCH] vect: Move vector costs to loop_vec_info

2021-11-08 Thread Richard Sandiford via Gcc-patches
target_cost_data is in vec_info but is really specific to
loop_vec_info.  This patch moves it there and renames it to
vector_costs, to distinguish it from scalar target costs.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


gcc/
* tree-vectorizer.h (vec_info::target_cost_data): Replace with...
(_loop_vec_info::vector_costs): ...this.
(LOOP_VINFO_TARGET_COST_DATA): Delete.
* tree-vectorizer.c (vec_info::vec_info): Remove target_cost_data
initialization.
(vec_info::~vec_info): Remove corresponding delete.
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize
vector_costs to null.
(_loop_vec_info::~_loop_vec_info): Delete vector_costs.
(vect_analyze_loop_operations): Update after above changes.
(vect_analyze_loop_2): Likewise.
(vect_estimate_min_profitable_iters): Likewise.
* tree-vect-slp.c (vect_slp_analyze_operations): Likewise.
---
 gcc/tree-vect-loop.c  | 14 --
 gcc/tree-vect-slp.c   | 13 ++---
 gcc/tree-vectorizer.c |  4 +---
 gcc/tree-vectorizer.h |  7 +++
 4 files changed, 18 insertions(+), 20 deletions(-)

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index a28bb6321d7..b6a631d4384 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -821,6 +821,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, 
vec_info_shared *shared)
 num_iters (NULL_TREE),
 num_iters_unchanged (NULL_TREE),
 num_iters_assumptions (NULL_TREE),
+vector_costs (nullptr),
 th (0),
 versioning_threshold (0),
 vectorization_factor (0),
@@ -932,6 +933,7 @@ _loop_vec_info::~_loop_vec_info ()
   delete ivexpr_map;
   delete scan_map;
   epilogue_vinfos.release ();
+  delete vector_costs;
 
   /* When we release an epiloge vinfo that we do not intend to use
  avoid clearing AUX of the main loop which should continue to
@@ -1765,7 +1767,7 @@ vect_analyze_loop_operations (loop_vec_info loop_vinfo)
 }
 } /* bbs */
 
-  add_stmt_costs (loop_vinfo->target_cost_data, &cost_vec);
+  add_stmt_costs (loop_vinfo->vector_costs, &cost_vec);
 
   /* All operations in the loop are either irrelevant (deal with loop
  control, or dead), or only used outside the loop and can be moved
@@ -2375,7 +2377,7 @@ start_over:
   LOOP_VINFO_INT_NITERS (loop_vinfo));
 }
 
-  LOOP_VINFO_TARGET_COST_DATA (loop_vinfo) = init_cost (loop_vinfo, false);
+  loop_vinfo->vector_costs = init_cost (loop_vinfo, false);
 
   /* Analyze the alignment of the data-refs in the loop.
  Fail if a data reference is found that cannot be vectorized.  */
@@ -2742,8 +2744,8 @@ again:
   LOOP_VINFO_COMP_ALIAS_DDRS (loop_vinfo).release ();
   LOOP_VINFO_CHECK_UNEQUAL_ADDRS (loop_vinfo).release ();
   /* Reset target cost data.  */
-  delete LOOP_VINFO_TARGET_COST_DATA (loop_vinfo);
-  LOOP_VINFO_TARGET_COST_DATA (loop_vinfo) = nullptr;
+  delete loop_vinfo->vector_costs;
+  loop_vinfo->vector_costs = nullptr;
   /* Reset accumulated rgroup information.  */
   release_vec_loop_controls (&LOOP_VINFO_MASKS (loop_vinfo));
   release_vec_loop_controls (&LOOP_VINFO_LENS (loop_vinfo));
@@ -3919,7 +3921,7 @@ vect_estimate_min_profitable_iters (loop_vec_info 
loop_vinfo,
   int scalar_outside_cost = 0;
   int assumed_vf = vect_vf_for_cost (loop_vinfo);
   int npeel = LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo);
-  vector_costs *target_cost_data = LOOP_VINFO_TARGET_COST_DATA (loop_vinfo);
+  vector_costs *target_cost_data = loop_vinfo->vector_costs;
 
   /* Cost model disabled.  */
   if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
@@ -4265,7 +4267,7 @@ vect_estimate_min_profitable_iters (loop_vec_info 
loop_vinfo,
 }
 
   /* Complete the target-specific cost calculations.  */
-  finish_cost (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo), &vec_prologue_cost,
+  finish_cost (loop_vinfo->vector_costs, &vec_prologue_cost,
   &vec_inside_cost, &vec_epilogue_cost);
 
   vec_outside_cost = (int)(vec_prologue_cost + vec_epilogue_cost);
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 7e1061c8c4e..d437bfd20d0 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -4889,16 +4889,15 @@ vect_slp_analyze_operations (vec_info *vinfo)
   else
{
  i++;
-
- /* For BB vectorization remember the SLP graph entry
-cost for later.  */
- if (is_a  (vinfo))
-   instance->cost_vec = cost_vec;
- else
+ if (loop_vec_info loop_vinfo = dyn_cast (vinfo))
{
- add_stmt_costs (vinfo->target_cost_data, &cost_vec);
+ add_stmt_costs (loop_vinfo->vector_costs, &cost_vec);
  cost_vec.release ();
}
+ else
+   /* For BB vectorization remember the SLP graph entry
+  cost for later.  */
+   instance->cost_vec = cost_vec;
}
 }
 
diff --git a/gcc/tree-vectorizer.c 

Re: [PATCH] gcov-profile: Fix -fcompare-debug with -fprofile-generate [PR100520]

2021-11-08 Thread Jan Hubicka via Gcc-patches
> On 11/5/21 18:30, Jan Hubicka wrote:
> > every gcc source looks like bit of overkill given that is can be open
> > coded in 3 statements?
> 
> Why? It's a static inline function with few statements. I don't want to 
> copy&paste
> the same code at every location. I bet there must be quite some open-coded 
> implementations
> of endswith in the GCC source code.

I guess it is a matter of taste, but to me system.h should not be a
universal include bringing in a lot of unrelated things, because in the
long term that is how precompiled headers came to be.  In theory such
random generally useful things would probably belong in libiberty, but
that also seems a bit of overkill to me.

Anyway, I guess we need someone with approval rights to system.h to OK
that.  I see there is already startswith, so having endswith is probably
not too bad.

Honza
> 
> Martin


Re: [PATCH] Dump static chain for cgraph_node.

2021-11-08 Thread Jan Hubicka via Gcc-patches
> The patch bootstraps on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?
> Thanks,
> Martin
> 
> gcc/ChangeLog:
> 
>   * cgraph.c (cgraph_node::dump): Dump static_chain_decl.
OK
Honza
> ---
>  gcc/cgraph.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/gcc/cgraph.c b/gcc/cgraph.c
> index de078653781..8299ee92946 100644
> --- a/gcc/cgraph.c
> +++ b/gcc/cgraph.c
> @@ -2203,6 +2203,10 @@ cgraph_node::dump (FILE *f)
>  fprintf (f, " %soperator_delete",
>DECL_IS_REPLACEABLE_OPERATOR (decl) ? "replaceable_" : "");
> +  function *fn = DECL_STRUCT_FUNCTION (decl);
> +  if (fn != NULL && fn->static_chain_decl)
> +fprintf (f, " static_chain_decl");
> +
>fprintf (f, "\n");
>if (thunk)
> -- 
> 2.33.1
> 


Re: [PATCH] vect: Keep scalar costs around longer

2021-11-08 Thread Richard Biener via Gcc-patches
On Mon, Nov 8, 2021 at 11:47 AM Richard Sandiford via Gcc-patches
 wrote:
>
> The scalar costs for a loop are fleeting, with only the final
> single_scalar_iteration_cost being kept for later comparison.
> This patch replaces single_scalar_iteration_cost with the cost
> structure, so that (with later patches) it's possible for targets
> to examine other target-specific cost properties as well.  This will
> be done by passing the scalar costs to hooks where appropriate;
> targets shouldn't try to read the information directly from
> loop_vec_infos.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

OK.  I wondered if we could put this cost into vec_info_shared, but
we seem to look at per-stmt info in vect_compute_single_scalar_iteration_cost,
though quite possibly the relevant bits should not change.  So
we could eventually compute it lazily, once.  Something to think about
later.

Richard.

> Richard
>
>
> gcc/
> * tree-vectorizer.h (_loop_vec_info::scalar_costs): New member
> variable.
> (_loop_vec_info::single_scalar_iteration_cost): Delete.
> (LOOP_VINFO_SINGLE_SCALAR_ITERATION_COST): Delete.
> (vector_costs::total_cost): New function.
> * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Update
> after above changes.
> (_loop_vec_info::~_loop_vec_info): Delete scalar_costs.
> (vect_compute_single_scalar_iteration_cost): Store the costs
> in loop_vinfo->scalar_costs.
> (vect_estimate_min_profitable_iters): Get the scalar cost from
> loop_vinfo->scalar_costs.
> ---
>  gcc/tree-vect-loop.c  | 17 ++---
>  gcc/tree-vectorizer.h | 17 +
>  2 files changed, 19 insertions(+), 15 deletions(-)
>
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index c9ee2e15e35..887275a5071 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -822,6 +822,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, 
> vec_info_shared *shared)
>  num_iters_unchanged (NULL_TREE),
>  num_iters_assumptions (NULL_TREE),
>  vector_costs (nullptr),
> +scalar_costs (nullptr),
>  th (0),
>  versioning_threshold (0),
>  vectorization_factor (0),
> @@ -839,7 +840,6 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, 
> vec_info_shared *shared)
>  ivexpr_map (NULL),
>  scan_map (NULL),
>  slp_unrolling_factor (1),
> -single_scalar_iteration_cost (0),
>  inner_loop_cost_factor (param_vect_inner_loop_cost_factor),
>  vectorizable (false),
>  can_use_partial_vectors_p (param_vect_partial_vector_usage != 0),
> @@ -931,6 +931,7 @@ _loop_vec_info::~_loop_vec_info ()
>delete ivexpr_map;
>delete scan_map;
>epilogue_vinfos.release ();
> +  delete scalar_costs;
>delete vector_costs;
>
>/* When we release an epiloge vinfo that we do not intend to use
> @@ -1292,20 +1293,15 @@ vect_compute_single_scalar_iteration_cost 
> (loop_vec_info loop_vinfo)
>  }
>
>/* Now accumulate cost.  */
> -  vector_costs *target_cost_data = init_cost (loop_vinfo, true);
> +  loop_vinfo->scalar_costs = init_cost (loop_vinfo, true);
>stmt_info_for_cost *si;
>int j;
>FOR_EACH_VEC_ELT (LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo),
> j, si)
> -(void) add_stmt_cost (target_cost_data, si->count,
> +(void) add_stmt_cost (loop_vinfo->scalar_costs, si->count,
>   si->kind, si->stmt_info, si->vectype,
>   si->misalign, si->where);
> -  unsigned prologue_cost = 0, body_cost = 0, epilogue_cost = 0;
> -  finish_cost (target_cost_data, &prologue_cost, &body_cost,
> -  &epilogue_cost);
> -  delete target_cost_data;
> -  LOOP_VINFO_SINGLE_SCALAR_ITERATION_COST (loop_vinfo)
> -= prologue_cost + body_cost + epilogue_cost;
> +  loop_vinfo->scalar_costs->finish_cost ();
>  }
>
>
> @@ -3868,8 +3864,7 @@ vect_estimate_min_profitable_iters (loop_vec_info 
> loop_vinfo,
>   TODO: Consider assigning different costs to different scalar
>   statements.  */
>
> -  scalar_single_iter_cost
> -= LOOP_VINFO_SINGLE_SCALAR_ITERATION_COST (loop_vinfo);
> +  scalar_single_iter_cost = loop_vinfo->scalar_costs->total_cost ();
>
>/* Add additional cost for the peeled instructions in prologue and epilogue
>   loop.  (For fully-masked loops there will be no peeling.)
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index 0e3aad590e8..8dba3a34aa9 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -590,6 +590,9 @@ public:
>/* The cost of the vector code.  */
>class vector_costs *vector_costs;
>
> +  /* The cost of the scalar code.  */
> +  class vector_costs *scalar_costs;
> +
>/* Threshold of number of iterations below which vectorization will not be
>   performed. It is calculated from MIN_PROFITABLE_ITERS and
>   param_min_vect_loop_bound.  */
> @@ -721,9 +724,6 @@ public:
>   applied 

Re: [PATCH] vect: Pass scalar_costs to finish_cost

2021-11-08 Thread Richard Biener via Gcc-patches
On Mon, Nov 8, 2021 at 11:48 AM Richard Sandiford via Gcc-patches
 wrote:
>
> When finishing the vector costs, it can be useful to know
> what the associated scalar costs were.  This allows targets
> to read information collected about the original scalar loop
> when trying to make a final judgement about the cost of the
> vector code.

Again, what kind of information would you be looking for here?

> This patch therefore passes the scalar costs to
> vector_costs::finish_cost.  The parameter is null for the
> scalar costs themselves.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>
> Richard
>
>
> gcc/
> * tree-vectorizer.h (vector_costs::finish_cost): Take the
> corresponding scalar costs as a parameter.
> (finish_cost): Likewise.
> * tree-vect-loop.c (vect_compute_single_scalar_iteration_cost)
> (vect_estimate_min_profitable_iters): Update accordingly.
> * tree-vect-slp.c (vect_bb_vectorization_profitable_p): Likewise.
> * tree-vectorizer.c (vector_costs::finish_cost): Likewise.
> * config/aarch64/aarch64.c (aarch64_vector_costs::finish_cost):
> Likewise.
> * config/rs6000/rs6000.c (rs6000_cost_data::finish_cost): Likewise.
> ---
>  gcc/config/aarch64/aarch64.c |  6 +++---
>  gcc/config/rs6000/rs6000.c   |  6 +++---
>  gcc/tree-vect-loop.c |  6 +++---
>  gcc/tree-vect-slp.c  |  7 ---
>  gcc/tree-vectorizer.c|  2 +-
>  gcc/tree-vectorizer.h| 14 +-
>  6 files changed, 23 insertions(+), 18 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 19f67415234..ebb937211ed 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -14745,7 +14745,7 @@ public:
>   stmt_vec_info stmt_info, tree vectype,
>   int misalign,
>   vect_cost_model_location where) override;
> -  void finish_cost () override;
> +  void finish_cost (const vector_costs *) override;
>
>  private:
>void record_potential_advsimd_unrolling (loop_vec_info);
> @@ -16138,7 +16138,7 @@ aarch64_vector_costs::adjust_body_cost (unsigned int 
> body_cost)
>  }
>
>  void
> -aarch64_vector_costs::finish_cost ()
> +aarch64_vector_costs::finish_cost (const vector_costs *scalar_costs)
>  {
>loop_vec_info loop_vinfo = dyn_cast (m_vinfo);
>if (loop_vinfo
> @@ -16146,7 +16146,7 @@ aarch64_vector_costs::finish_cost ()
>&& aarch64_use_new_vector_costs_p ())
>  m_costs[vect_body] = adjust_body_cost (m_costs[vect_body]);
>
> -  vector_costs::finish_cost ();
> +  vector_costs::finish_cost (scalar_costs);
>  }
>
>  static void initialize_aarch64_code_model (struct gcc_options *);
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index ec054800491..cd44ac61336 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -5268,7 +5268,7 @@ public:
>   stmt_vec_info stmt_info, tree vectype,
>   int misalign,
>   vect_cost_model_location where) override;
> -  void finish_cost () override;
> +  void finish_cost (const vector_costs *) override;
>
>  protected:
>void update_target_cost_per_stmt (vect_cost_for_stmt, stmt_vec_info,
> @@ -5522,7 +5522,7 @@ rs6000_cost_data::adjust_vect_cost_per_loop 
> (loop_vec_info loop_vinfo)
>  }
>
>  void
> -rs6000_cost_data::finish_cost ()
> +rs6000_cost_data::finish_cost (const vector_costs *scalar_costs)
>  {
>if (loop_vec_info loop_vinfo = dyn_cast (m_vinfo))
>  {
> @@ -5539,7 +5539,7 @@ rs6000_cost_data::finish_cost ()
> m_costs[vect_body] += 1;
>  }
>
> -  vector_costs::finish_cost ();
> +  vector_costs::finish_cost (scalar_costs);
>  }
>
>  /* Implement targetm.loop_unroll_adjust.  */
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index 887275a5071..190b52142e4 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -1301,7 +1301,7 @@ vect_compute_single_scalar_iteration_cost 
> (loop_vec_info loop_vinfo)
>  (void) add_stmt_cost (loop_vinfo->scalar_costs, si->count,
>   si->kind, si->stmt_info, si->vectype,
>   si->misalign, si->where);
> -  loop_vinfo->scalar_costs->finish_cost ();
> +  loop_vinfo->scalar_costs->finish_cost (nullptr);
>  }
>
>
> @@ -4130,8 +4130,8 @@ vect_estimate_min_profitable_iters (loop_vec_info 
> loop_vinfo,
>  }
>
>/* Complete the target-specific cost calculations.  */
> -  finish_cost (loop_vinfo->vector_costs, &vec_prologue_cost,
> -  &vec_inside_cost, &vec_epilogue_cost);
> +  finish_cost (loop_vinfo->vector_costs, loop_vinfo->scalar_costs,
> +  &vec_prologue_cost, &vec_inside_cost, &vec_epilogue_cost);
>
>vec_outside_cost = (int)(vec_prologue_cost + vec_epilogue_cost);
>
> diff --git a/gcc/tree-vect-slp.c b/gcc/tr

[tree-ssa-loop] Remove redundant check for number of loops in pass_vectorize::execute

2021-11-08 Thread Prathamesh Kulkarni via Gcc-patches
Hi,
The attached patch removes the redundant check for the number of loops in
pass_vectorize::execute, since it only calls vectorize_loops, and
vectorize_loops immediately bails out if no loops are present:
  vect_loops_num = number_of_loops (cfun);
  /* Bail out if there are no loops.  */
  if (vect_loops_num <= 1)
return 0;

Is the patch OK to commit?

Thanks,
Prathamesh
diff --git a/gcc/tree-ssa-loop.c b/gcc/tree-ssa-loop.c
index 1bbf2f1fb2c..d10e2dc0d54 100644
--- a/gcc/tree-ssa-loop.c
+++ b/gcc/tree-ssa-loop.c
@@ -407,9 +407,6 @@ public:
 unsigned int
 pass_vectorize::execute (function *fun)
 {
-  if (number_of_loops (fun) <= 1)
-return 0;
-
   return vectorize_loops ();
 }
 


Re: [PATCH] vect: Move vector costs to loop_vec_info

2021-11-08 Thread Richard Biener via Gcc-patches
On Mon, Nov 8, 2021 at 11:59 AM Richard Sandiford via Gcc-patches
 wrote:
>
> target_cost_data is in vec_info but is really specific to
> loop_vec_info.  This patch moves it there and renames it to
> vector_costs, to distinguish it from scalar target costs.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

OK.

Thanks,
Richard.

> Richard
>
>
> gcc/
> * tree-vectorizer.h (vec_info::target_cost_data): Replace with...
> (_loop_vec_info::vector_costs): ...this.
> (LOOP_VINFO_TARGET_COST_DATA): Delete.
> * tree-vectorizer.c (vec_info::vec_info): Remove target_cost_data
> initialization.
> (vec_info::~vec_info): Remove corresponding delete.
> * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize
> vector_costs to null.
> (_loop_vec_info::~_loop_vec_info): Delete vector_costs.
> (vect_analyze_loop_operations): Update after above changes.
> (vect_analyze_loop_2): Likewise.
> (vect_estimate_min_profitable_iters): Likewise.
> * tree-vect-slp.c (vect_slp_analyze_operations): Likewise.
> ---
>  gcc/tree-vect-loop.c  | 14 --
>  gcc/tree-vect-slp.c   | 13 ++---
>  gcc/tree-vectorizer.c |  4 +---
>  gcc/tree-vectorizer.h |  7 +++
>  4 files changed, 18 insertions(+), 20 deletions(-)
>
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index a28bb6321d7..b6a631d4384 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -821,6 +821,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, 
> vec_info_shared *shared)
>  num_iters (NULL_TREE),
>  num_iters_unchanged (NULL_TREE),
>  num_iters_assumptions (NULL_TREE),
> +vector_costs (nullptr),
>  th (0),
>  versioning_threshold (0),
>  vectorization_factor (0),
> @@ -932,6 +933,7 @@ _loop_vec_info::~_loop_vec_info ()
>delete ivexpr_map;
>delete scan_map;
>epilogue_vinfos.release ();
> +  delete vector_costs;
>
>/* When we release an epiloge vinfo that we do not intend to use
>   avoid clearing AUX of the main loop which should continue to
> @@ -1765,7 +1767,7 @@ vect_analyze_loop_operations (loop_vec_info loop_vinfo)
>  }
>  } /* bbs */
>
> -  add_stmt_costs (loop_vinfo->target_cost_data, &cost_vec);
> +  add_stmt_costs (loop_vinfo->vector_costs, &cost_vec);
>
>/* All operations in the loop are either irrelevant (deal with loop
>   control, or dead), or only used outside the loop and can be moved
> @@ -2375,7 +2377,7 @@ start_over:
>LOOP_VINFO_INT_NITERS (loop_vinfo));
>  }
>
> -  LOOP_VINFO_TARGET_COST_DATA (loop_vinfo) = init_cost (loop_vinfo, false);
> +  loop_vinfo->vector_costs = init_cost (loop_vinfo, false);
>
>/* Analyze the alignment of the data-refs in the loop.
>   Fail if a data reference is found that cannot be vectorized.  */
> @@ -2742,8 +2744,8 @@ again:
>LOOP_VINFO_COMP_ALIAS_DDRS (loop_vinfo).release ();
>LOOP_VINFO_CHECK_UNEQUAL_ADDRS (loop_vinfo).release ();
>/* Reset target cost data.  */
> -  delete LOOP_VINFO_TARGET_COST_DATA (loop_vinfo);
> -  LOOP_VINFO_TARGET_COST_DATA (loop_vinfo) = nullptr;
> +  delete loop_vinfo->vector_costs;
> +  loop_vinfo->vector_costs = nullptr;
>/* Reset accumulated rgroup information.  */
>release_vec_loop_controls (&LOOP_VINFO_MASKS (loop_vinfo));
>release_vec_loop_controls (&LOOP_VINFO_LENS (loop_vinfo));
> @@ -3919,7 +3921,7 @@ vect_estimate_min_profitable_iters (loop_vec_info 
> loop_vinfo,
>int scalar_outside_cost = 0;
>int assumed_vf = vect_vf_for_cost (loop_vinfo);
>int npeel = LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo);
> -  vector_costs *target_cost_data = LOOP_VINFO_TARGET_COST_DATA (loop_vinfo);
> +  vector_costs *target_cost_data = loop_vinfo->vector_costs;
>
>/* Cost model disabled.  */
>if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
> @@ -4265,7 +4267,7 @@ vect_estimate_min_profitable_iters (loop_vec_info 
> loop_vinfo,
>  }
>
>/* Complete the target-specific cost calculations.  */
> -  finish_cost (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo), &vec_prologue_cost,
> +  finish_cost (loop_vinfo->vector_costs, &vec_prologue_cost,
>&vec_inside_cost, &vec_epilogue_cost);
>
>vec_outside_cost = (int)(vec_prologue_cost + vec_epilogue_cost);
> diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
> index 7e1061c8c4e..d437bfd20d0 100644
> --- a/gcc/tree-vect-slp.c
> +++ b/gcc/tree-vect-slp.c
> @@ -4889,16 +4889,15 @@ vect_slp_analyze_operations (vec_info *vinfo)
>else
> {
>   i++;
> -
> - /* For BB vectorization remember the SLP graph entry
> -cost for later.  */
> - if (is_a  (vinfo))
> -   instance->cost_vec = cost_vec;
> - else
> + if (loop_vec_info loop_vinfo = dyn_cast (vinfo))
> {
> - add_stmt_costs (vinfo->target_cost_data, &cost_vec);
> + ad

Re: [PATCH] gcov-profile: Fix -fcompare-debug with -fprofile-generate [PR100520]

2021-11-08 Thread Richard Biener via Gcc-patches
On Mon, Nov 8, 2021 at 12:01 PM Jan Hubicka via Gcc-patches
 wrote:
>
> > On 11/5/21 18:30, Jan Hubicka wrote:
> > > every gcc source looks like a bit of overkill given that it can be open
> > > coded in 3 statements?
> >
> > Why? It's a static inline function with few statements. I don't want to 
> > copy&paste
> > the same code at every location. I bet there must be quite some open-coded 
> > implementations
> > of endswith in the GCC source code.
>
> I guess it is a matter of taste, but to me system.h should not be
> universal include bringing a lot of unrelated things because in long
> term it is how precompiled headers came to be.  In theory such random
> generally useful things probably would belong to libiberty, but that
> also seems a bit of overkill to me.
>
> Anyway I guess we need someone with approval right to system.h to OK
> that. I see there is already startswith so having endswith is probably
> not too bad.

OK.

Richard.

> Honza
> >
> > Martin
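
For reference, an endswith () counterpart in the style of the existing
startswith () in gcc/system.h might look roughly like the sketch below;
this is only an illustration, not necessarily the committed helper.

```cpp
#include <cassert>
#include <cstring>

/* Sketch only: an endswith () in the spirit of gcc/system.h's
   startswith (); the actual committed implementation may differ.  */
static inline bool
endswith (const char *str, const char *suffix)
{
  size_t len = strlen (str);
  size_t suffix_len = strlen (suffix);
  if (len < suffix_len)
    return false;
  return memcmp (str + len - suffix_len, suffix, suffix_len) == 0;
}
```

A typical use would be checking a file name such as "stdc++.h.gch" for the
".gch" suffix.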


Re: [PATCH] Dump static chain for cgraph_node.

2021-11-08 Thread Eric Botcazou via Gcc-patches
> diff --git a/gcc/cgraph.c b/gcc/cgraph.c
> index de078653781..8299ee92946 100644
> --- a/gcc/cgraph.c
> +++ b/gcc/cgraph.c
> @@ -2203,6 +2203,10 @@ cgraph_node::dump (FILE *f)
>   fprintf (f, " %soperator_delete",
>DECL_IS_REPLACEABLE_OPERATOR (decl) ? "replaceable_" : "");
> 
> +  function *fn = DECL_STRUCT_FUNCTION (decl);
> +  if (fn != NULL && fn->static_chain_decl)
> +fprintf (f, " static_chain_decl");
> +
> fprintf (f, "\n");
> 
> if (thunk)

static_chain_decl is not a flag though, it's a tree.

-- 
Eric Botcazou




Re: [tree-ssa-loop] Remove redundant check for number of loops in pass_vectorize::execute

2021-11-08 Thread Richard Biener via Gcc-patches
On Mon, Nov 8, 2021 at 12:06 PM Prathamesh Kulkarni via Gcc-patches
 wrote:
>
> Hi,
> The attached patch removes redundant check for number of loops in
> pass_vectorize::execute,
> since it only calls vectorize_loops, and in vectorize_loops, we
> immediately bail out if no loops are present:
>   vect_loops_num = number_of_loops (cfun);
>   /* Bail out if there are no loops.  */
>   if (vect_loops_num <= 1)
> return 0;
>
> Is the patch OK to commit ?

Can you please merge both functions then, keeping only
pass_vectorize::execute (and replacing 'cfun' occurrences
by 'fun' as passed as argument)?

Thanks,
Richard.

>
> Thanks,
> Prathamesh


Re: [PATCH] Dump static chain for cgraph_node.

2021-11-08 Thread Jan Hubicka via Gcc-patches
> > diff --git a/gcc/cgraph.c b/gcc/cgraph.c
> > index de078653781..8299ee92946 100644
> > --- a/gcc/cgraph.c
> > +++ b/gcc/cgraph.c
> > @@ -2203,6 +2203,10 @@ cgraph_node::dump (FILE *f)
> >   fprintf (f, " %soperator_delete",
> >  DECL_IS_REPLACEABLE_OPERATOR (decl) ? "replaceable_" : "");
> > 
> > +  function *fn = DECL_STRUCT_FUNCTION (decl);
> > +  if (fn != NULL && fn->static_chain_decl)
> > +fprintf (f, " static_chain_decl");
> > +
> > fprintf (f, "\n");
> > 
> > if (thunk)
> 
> static_chain_decl is not a flag though, it's a tree.
Ah yes, you want to just test if (DECL_STATIC_CHAIN (decl))
and then print "static_chain".
I was not reading carefully enough. Thanks!
Honza
> 
> -- 
> Eric Botcazou
> 
> 


Re: [PATCH 0/4] config: Allow a host to opt out of PCH.

2021-11-08 Thread Jakub Jelinek via Gcc-patches
On Fri, Nov 05, 2021 at 04:37:09PM +, Iain Sandoe wrote:
> It is hard to judge the relative effort in the two immediately visible 
> solutions:
> 
> 1. relocatable PCH
> 2. taking the tree streamer from the modules implementation, moving its home
> to c-family and adding hooks so that each FE can stream its own special 
> trees.
> 
> ISTM, that part of the reason people dislike PCH is because the 
> implementation is
> mixed up with the GC solution - the rendering is non-transparent etc.
> 
> So, in some ways, (2) above would be a better investment - the process of PCH 
> is:
> generate:
> “get to the end of parsing a TU” .. stream the AST
> consume:
> .. see a header .. stream the PCH AST in if there is one available for the 
> header.
> 
> There is no reason for this to be mixed into the GC solution - the read in 
> (currently)
> happens to an empty TU and there should be nothing in the AST that carries any
> reference to the compiler’s executable.

I'm afraid (2) is much more work though even just for C++, because handling
modules streaming and handling arbitrary headers are quite different.

Anyway, I've rebuilt my cc1plus as PIE (and am invoking it under gdb wrapper 
which
has ASLR disabled when building x86_64-pc-linux-gnu/bits/stdc++.h.gch/O2g.gch).
With the hack patch I've posted earlier, the results are much shorter than
before, in particular those scalar at messages only for ovl_op_info array
and then
object 0x7fffe9f6b3c0 at 0x7fffe9f6b3c8 points to executable 0x56a2a180
object 0x7fffe9f6b3c0 at 0x7fffe9f6b3d0 points to executable 0x5772f9b9
object 0x7fffe9f6b3a0 at 0x7fffe9f6b3a8 points to executable 0x56a2a180
object 0x7fffe9f6b3a0 at 0x7fffe9f6b3b0 points to executable 0x5767da08
object 0x7fffe9f6b400 at 0x7fffe9f6b408 points to executable 0x56a2a180
object 0x7fffe9f6b400 at 0x7fffe9f6b410 points to executable 0x5772f9d2
object 0x7fffe9f6b480 at 0x7fffe9f6b488 points to executable 0x56a306d0
object 0x7fffe9f6b3e0 at 0x7fffe9f6b3e8 points to executable 0x56a2a180
object 0x7fffe9f6b3e0 at 0x7fffe9f6b3f0 points to executable 0x5772f9c0
object 0x7fffe9f6b420 at 0x7fffe9f6b428 points to executable 0x56a2b880
object 0x7fffe9f6b440 at 0x7fffe9f6b448 points to executable 0x56a2b7a0
object 0x7fffe9f6b460 at 0x7fffe9f6b468 points to executable 0x56a30710
object 0x7fffe9f79168 at 0x7fffe9f791d8 points to executable 0x576832b9
object 0x77fca000 at 0x77fca048 points to executable 0x5670b7d0
object 0x77fca000 at 0x77fca050 points to executable 0x561eb040
If I look at the unique addresses in the last column after subtracting my
PIE base of 0x4000, they are:
000c97040   _Z20ggc_round_alloc_sizem
0011b77d0   _ZL20realloc_for_line_mapPvm
0014d6180   _Z21output_section_asm_opPKv
0014d77a0   _ZL10emit_localP9tree_nodePKcmm
0014d7880   _ZL15emit_tls_commonP9tree_nodePKcmm
0014dc6d0   _ZL8emit_bssP9tree_nodePKcmm
0014dc710   _ZL11emit_commonP9tree_nodePKcmm
002129a08   "\t.text"
00212f2b9   "GNU C++17"
0021db9b9   "\t.data"
0021db9c0   "\t.section"
0021db9d2   "\t.bss"

For ovl_op_info array, I've mentioned that the array has:
struct GTY(()) ovl_op_info_t {
  tree identifier;
  const char *name;
  const char *mangled_name;
  // And a bunch of scalar members.
};
and while .name and .mangled_name are initially initialized to
NULL or string literals, init_operators then (at least in my understanding
not based on any command line switches and therefore probably always the
same way) reorders some of the elements plus creates those identifier trees.
I said I didn't know what exactly PCH does with const char * or char *
members.  Looking in more detail, gt_ggc_m_S clearly supports:
1) NULL
2) non-GC addresses (so most likely const literals):
  /* Look up the page on which the object is alloced.  If it was not
 GC allocated, gracefully bail out.  */
  entry = safe_lookup_page_table_entry (p);
  if (!entry)
return;
3) GC addresses not pointing to start of objects - here it assumes
   it points to STRING_CST payload and marks the STRING_CST
4) GC addresses which are starts of objects
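
A toy model of that four-way classification might look like the sketch
below.  Here page_entry_for () is an invented stand-in for GCC's
safe_lookup_page_table_entry, and the one-array "GC heap" exists only to
make the example self-contained; none of this is the real ggc code.

```cpp
#include <cassert>
#include <cstdint>

enum mark_action { IGNORED, MARK_STRING_CST, MARK_OBJECT };
struct page_entry { char *object_start; };

/* Fake one-object "GC heap" so the sketch is self-contained.  */
static char gc_object[16];
static page_entry gc_page = { gc_object };

/* Stand-in for safe_lookup_page_table_entry: non-null only for
   pointers into the fake GC heap.  */
static page_entry *
page_entry_for (const void *p)
{
  uintptr_t a = reinterpret_cast<uintptr_t> (p);
  uintptr_t lo = reinterpret_cast<uintptr_t> (gc_object);
  return (a >= lo && a < lo + sizeof gc_object) ? &gc_page : nullptr;
}

/* The four cases described above for marking a string pointer.  */
static mark_action
classify_string_pointer (const void *p)
{
  if (p == nullptr)
    return IGNORED;            /* case 1: NULL */
  page_entry *e = page_entry_for (p);
  if (!e)
    return IGNORED;            /* case 2: not GC allocated (e.g. .rodata) */
  if (p != e->object_start)
    return MARK_STRING_CST;    /* case 3: inside a STRING_CST payload */
  return MARK_OBJECT;          /* case 4: start of a GC object */
}
```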
And then as can be seen in gt_pch_note_object during PCH, it only
has an exception for NULL and (void *) 1, otherwise for gt_pch_p_S
it remembers the pointer and uses strlen (pointer) + 1 to determine
the size.  While I haven't verified it yet, my understanding is that
if PCH save is done and some GC object or e.g. that
ovl_op_info[?][?].{,mangled_}name points to a .rodata string literal
that when the PCH is saved, we actually make the .gch file not point
it to the string literal in .rodata, but allocate in GC that string
literal and so when PCH is loaded, they will point to some GC allocated
memory containing a copy of that string literal.
So, in theory ovl_op_info would be fine, my printout happens for
scalars when saving the scalar data, but after that we do
write_pch_globals and we have:
  {
&ovl_

Re: [PATCH] Dump static chain for cgraph_node.

2021-11-08 Thread Martin Liška

On 11/8/21 12:17, Jan Hubicka wrote:

diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index de078653781..8299ee92946 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -2203,6 +2203,10 @@ cgraph_node::dump (FILE *f)
   fprintf (f, " %soperator_delete",
 DECL_IS_REPLACEABLE_OPERATOR (decl) ? "replaceable_" : "");

+  function *fn = DECL_STRUCT_FUNCTION (decl);
+  if (fn != NULL && fn->static_chain_decl)
+fprintf (f, " static_chain_decl");
+
 fprintf (f, "\n");

 if (thunk)


static_chain_decl is not a flag though, it's a tree.

Ah yes, you want to just test if (DECL_STATIC_CHAIN (decl))
and then print "static_chain".
I was not reading carefully enough. Thanks!


Oh, ok. One can read it from decl.

Pushed with the modification as g:409767d774c59ee4c3eefca5015ba00539fddc08.

Cheers,
Martin


Honza


--
Eric Botcazou






Re: [PATCH] Dump static chain for cgraph_node.

2021-11-08 Thread Martin Liška

On 11/8/21 12:51, Martin Liška wrote:

Pushed with the modification as g:409767d774c59ee4c3eefca5015ba00539fddc08.


Sorry, I forgot to commit it, so fixed in:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=355eb60b6672220642fae853021599afaa87dfd6

Martin


Re: [PATCH] Add debug counters to back threader.

2021-11-08 Thread Martin Liška

On 11/6/21 16:53, Aldy Hernandez wrote:

Martin, pinskia, do you have opinions here?


Hello.

I do prefer having the 4 counters as introduced:

DEBUG_COUNTER (back_thread1)
DEBUG_COUNTER (back_thread2)
DEBUG_COUNTER (back_threadfull1)
DEBUG_COUNTER (back_threadfull2)

Cheers,
Martin


Re: GCC 11 backports

2021-11-08 Thread Martin Liška

On 11/5/21 17:08, Martin Liška wrote:

On 8/23/21 10:54, Martin Liška wrote:

On 8/16/21 13:13, Martin Liška wrote:

I'm going to apply the following 3 tested patches.

Martin


One more patch I've just tested.

Martin


And one more backport.

Martin


One more tested patch.

Martin


Re: [PATCH] vect: Hookize better_loop_vinfo_p

2021-11-08 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Mon, Nov 8, 2021 at 11:45 AM Richard Sandiford via Gcc-patches
>  wrote:
>>
>> One of the things we want to do on AArch64 is compare vector loops
>> side-by-side and pick the best one.  For some targets, we want this
>> to be based on issue rates as well as the usual latency-based costs
>> (at least for loops with relatively high iteration counts).
>>
>> The current approach to doing this is: when costing vectorisation
>> candidate A, try to guess what the other main candidate B will look
>> like and adjust A's latency-based cost up or down based on the likely
>> difference between A and B's issue rates.  This effectively means
>> that we try to cost parts of B at the same time as A, without actually
>> being able to see B.
>
> I think the idea of the current costing is that compares are always
> to the original scalar loop (with comparing the resulting magic
> cost numbers) so that by means of transitivity comparing two
> vector variants work as well.  So I'm not sure where 'variant B'
> comes into play here?

E.g. A could be SVE and B could be Advanced SIMD, or A could be
SVE with fully-packed vectors and B could be SVE with partially
unpacked vectors.

One motivating case is Neoverse V1, which is a 256-bit SVE target.
There, scalar code, Advanced SIMD code and SVE code have different issue
characteristics.  SVE vector ops generally have the same latencies as
the corresponding Advanced SIMD ops.  Advanced SIMD ops generally
have double the throughput of SVE ones, so that the overall bit-for-bit
throughputs are roughly equal.  However, there are differences due to
predication, load/store handling, and so on, and those differences
should be decisive in some cases.

For SLP, latency-based costs are fine.  But for loop costs, they hide
too many details.  What works best in practice, both for vector vs.
vector and vector vs. scalar, is:

(1) compare issue rates between two loop bodies (vector vs. vector
or vector vs. scalar)
(2) if issue rates are equal for a given number of scalar iterations,
compare the latencies of the loop bodies, as we do now
(3) if both the above are equal, compare the latencies of the prologue
and epilogue, as we do now
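
As a toy illustration of that (1)-(3) tie-break order (a sketch with
invented field names, not the vectorizer's actual structures):

```cpp
#include <cassert>

/* Toy model only: compare two candidate loop bodies by issue rate
   first, then body latency, then prologue/epilogue latency, matching
   steps (1)-(3) above.  Field names are invented for the example.  */
struct loop_costs
{
  double cycles_per_iter;     /* issue-rate estimate; lower is better */
  unsigned body_latency;      /* latency-based body cost */
  unsigned outside_latency;   /* prologue + epilogue cost */
};

/* Return true if A is the better candidate.  */
static bool
better_loop_p (const loop_costs &a, const loop_costs &b)
{
  if (a.cycles_per_iter != b.cycles_per_iter)
    return a.cycles_per_iter < b.cycles_per_iter;   /* step (1) */
  if (a.body_latency != b.body_latency)
    return a.body_latency < b.body_latency;         /* step (2) */
  return a.outside_latency < b.outside_latency;     /* step (3) */
}
```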

It's very difficult to combine latency and issue rate information
into a single per-statement integer, in such a way that the integers
can just be summed and compared.

So returning to the above, when costing an SVE loop A, we currently
collect on-the-side issue information about both A and the likely
Advanced SIMD implementation B.  If B would have a higher throughput
than A then we multiply A's latency-based costs by:

   ceil(B throughput/A throughput)

The problem is, we don't actually know whether B exists or what it
would look like (given that Advanced SIMD and SVE have different
features).

In principle, we should do the same in reverse, but since SVE needs
fewer ops than Advanced SIMD, the latencies are usually smaller already.

We also do something similar for the scalar code.  When costing both
Advanced SIMD and SVE, we try to estimate the issue rate of the
original scalar code.  If the scalar code could issue more quickly
than the equivalent vector code, we increase the latency-based cost
of the vector code to account for that.

The idea of the patch (and others) is that we can do the (1), (2),
(3) comparison above directly rather than indirectly.  We can also
use the issue information calculated while costing the scalar code,
rather than having to estimate the scalar issue information while
costing the vector code.

>> This is needlessly indirect and complex.  It was a compromise due
>> to the code being added (too) late in the GCC 11 cycle, so that
>> target-independent changes weren't possible.
>>
>> The target-independent code already compares two candidate loop_vec_infos
>> side-by-side, so that information about A and B above are available
>> directly.  This patch creates a way for targets to hook into this
>> comparison.
>
> You mean it has both loop_vec_infos.  But as said I'm not sure it's a good
> idea to compare those side-by-side - will that not lead to _more_ 
> special-casing
> since you need to have a working compare to the scalar variant as well
> to determine the vectorization threshold?

A later patch allows the code to do the same comparison for vector
vs. scalar.  It makes the target code significantly simpler compared
to now, since add_stmt_cost only needs to consider the latency and
issue rate of the code it's actually supposed to be costing, rather than
trying also to estimate the issue rate of the scalar code and the issue
rate of the Advanced SIMD code.

Thanks,
Richard

>
>>
>> The AArch64 code can therefore hook into better_main_loop_than_p to
>> compare issue rates.  If the issue rate comparison isn't decisive,
>> the code can fall back to the normal latency-based comparison instead.
>>
>> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>>
>> Richard
>>
>>
>> gcc/
>>  

Re: [PATCH 11/18] rs6000: Builtin expansion, part 6

2021-11-08 Thread Bill Schmidt via Gcc-patches
Sorry for the misunderstanding.  What I meant is the 6 patches entitled 
"Builtin expansion, part N".

I still have 6-7 patches left to look at.

Thanks!
Bill

On 11/7/21 3:05 PM, Segher Boessenkool wrote:
> Hi!
>
> On Sun, Nov 07, 2021 at 09:28:09AM -0600, Bill Schmidt wrote:
>> Thank you for all of the reviews!  I appreciate your hard work and thorough 
>> study of the patches.
>>
>> I've updated these 6 patches and combined them into 1, pushed today.  There 
>> are still a couple of cleanups I haven't done, but I made note in the code
>> where these are needed.
> I did not approve the testsuite one, it needs more work?
>
>
> Segher


Re: [PATCH] Objective-C: fix protocol list count type (pertinent to non-LP64)

2021-11-08 Thread Iain Sandoe



> On 7 Nov 2021, at 22:50, Matt Jacobson  wrote:
> 
> 
>> On Oct 25, 2021, at 5:43 AM, Iain Sandoe  wrote:
>> 
>> Did you test objective-c++ on Darwin?
>> 
>> I see a lot of fails of the form:
>> Excess errors:
>> : error: initialization of a flexible array member [-Wpedantic]
> 
> Looked into this.  It’s happening because obj-c++.dg/dg.exp has:
> 
>set DEFAULT_OBJCXXFLAGS " -ansi -pedantic-errors -Wno-long-long"
> 
> Specifically, the `-pedantic-errors` argument prohibits initialization of a 
> flexible array member.  Notably, this flag does *not* appear in objc/dg.exp.

I think -pedantic-errors might be applied at a higher level - if it is not, 
then perhaps that is an omission to rectify.
> 
> Admittedly I didn’t know that initialization of a FAM was prohibited by the 
> standard.  It’s allowed by GCC, though, as documented here:
> 
> 
> 
> Is it OK to use a GCC extension this way in the Objective-C frontend?

Well, I had two thoughts so far;

1/ allow the extension and suppress the warning on the relevant trees.

2/ maybe create a new type “on the fly” for each protocol list with a correct 
length [initialising this would not be an error] (since the type is only used 
once it can be local to the use).

— but I haven’t had time to implement either ….
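
To make option 2 concrete: initializing a true flexible array member is
the GCC extension that -pedantic-errors rejects, while a one-off type
with the exact element count initializes cleanly.  The struct names
below are invented purely for illustration.

```cpp
#include <cassert>
#include <cstddef>

/* Invented names, for illustration only.  The FAM form is what
   -pedantic-errors objects to initializing; the sized form is plain
   C/C++ and layout-compatible up to the declared elements.  */
struct proto_list_fam
{
  proto_list_fam *next;
  long count;
  const void *list[];          /* flexible array member (GCC extension) */
};

struct proto_list_2            /* “on the fly” type for a 2-protocol list */
{
  proto_list_fam *next;
  long count;
  const void *list[2];
};

static int proto_a, proto_b;   /* stand-ins for protocol metadata */
static proto_list_2 protos = { nullptr, 2, { &proto_a, &proto_b } };
```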

>> For a patch that changes code-gen we should have a test that it produces 
>> what’s
>> expected (in general, a ‘torture’ test would be preferable so that we can 
>> be sure the
>> output is as expected for different optimisation levels). 
> 
> The output is different only for targets where 
> sizeof (long) != sizeof (void *).  
> Do we have the ability to run “cross” 
> torture tests?

For most platforms, the answer is pretty much “no” - GCC is built for a single 
target (unless you have a multilib with the necessary difference - which seems 
unlikely in this case).

For Darwin platforms, we can (in most cases) choose a different 
-mmacosx-version-min= from the current host - but that doesn’t really gain us 
any more, and the 32b multilibs have sizeof (long) == sizeof (ptr) so no help 
there.

>  Could such a test verify the emitted assembly (like LLVM’s 
> FileCheck tests do)?  

We have a similar set of possibilities to LLVM/clang (the tree check could be 
the right one here)

* we can verify at some appropriate IR stage [-fdump-tree-x] (check that 
the right trees are emitted)

* we can verify the expected assembler is emitted (scan-asm in dg-tests)

> Or would it need to execute something?

An execute test can be a good way for checking that code-gen is working 
properly (providing, of course, that a wrong codegen would in some way make the 
execute fail - the question is can one construct such a test for this)

sorry this is not an answer - just a list of possible ways forward.
Iain




RE: Some PINGs

2021-11-08 Thread Roger Sayle


Hi Richard,

>> I wonder if reviewers could take a look (or a second look) at some of 
>> my outstanding patches.
>> PR middle-end/100810: Penalize IV candidates with undefined value 
>> bases 
>> https://gcc.gnu.org/pipermail/gcc-patches/2021-August/578441.html
>
> I did comment on this one, noting the more general issue.
> My opinion is still that doing heavy lifting in IVOPTs is misplaced.

I wasn't sure whether you'd had the opportunity to give this bug some more
thought.

You're completely right, that it is theoretically possible for GCC to upgrade
its data flow (CCP/VRP) passes to use a finer grained definition of undefined/
uninitialized/indeterminate values; an indeterminate-value numbering pass
if you will.  Under the constraints that the automatic variables are not 
volatile,
and the types don't supporting a trapping values, the compiler could determine
that "undef1 - undef1", or "undef2 ^ undef2" have defined values, but that 
"undef1 - undef2" and "undef3 ^ undef4" remain indeterminate.  Like
traditional value numbering, it may be possible to track "t1 = undef3 ^ undef4",
"t2 = t1 ^ undef4", "t3 = t2 - undef3".   Conceptually, the same applies to 
(floating point) mathematics and its numerous infinities: sometimes "+Inf - 
+Inf"
is known to be zero, provided that it is the exact same infinity (omega) that is
being subtracted.

The two counter arguments for this solution as a fix for PR 100810, is that such
a pass doesn't yet exist, and it still (probably) falls foul of C/C++'s 
undefined
behaviour from use of an uninitialized automatic variable.  From an engineering
perspective, it's a lot of effort to support poorly written code.

Quick question to the language lawyers:

int foo()
{
  int undef;
  return undef ^ undef;
}

int bar()
{
  volatile int undef;
  return undef ^ undef;
}

Do the above functions invoke undefined behaviour?

The approach taken by the proposed patch is that it's the pass/transformation 
that
introduces more undefined behaviour than was present in the original code, that 
is
at fault.  Even if later passes decided not to take advantage of UB, is there 
any benefit
for replacing an induction variable with a well-defined value (evolution) and 
substituting
it with one that depends upon indefinite values.  I'd argue the correct fix is 
to go the other
way, and attempt to reduce the instances of undefined behaviour.

So transform

int undef;
while (cond())
  undef++;
...

which invokes UB on each iteration with:

int undef;
unsigned int count = 0;
while (cond())
  count++;
undef += count;
...

which now only invokes UB after the loop.

Consider:
int undef;
int result = 0;
while (cond())
  result ^= undef;
return result;

where the final value of result may be well-defined if the loop iterates
an even number of times.

int undef;
int result = 0;
while (cond())
  result ^= 1;
return result ? undef : 0;


Has anyone proposed an alternate fix to PR middle-end/100810?
We can always revert my proposed fix (for this P1 regression), once IV opts
is able to confirm that it is safe to make the substitution that it is 
proposing.


I'm curious to hear your (latest) thoughts.

Thanks again for thinking about this.

Best regards,
Roger
--




Re: [PATCH] Remove dead code.

2021-11-08 Thread Jeff Law via Gcc-patches




On 11/8/2021 2:59 AM, Jakub Jelinek via Gcc-patches wrote:

On Mon, Nov 08, 2021 at 09:45:39AM +0100, Martin Liška wrote:

This fixes issue reported in the PR.

Ready to be installed?

I'm not sure.  liboffloadmic is copied from upstream, so the right
thing if we want to do anything at all (if we don't remove it, nothing
bad happens, the condition is never true anyway, whether removed
in the source or removed by the compiler) would be to let Intel fix it in
their source and update from that.
But I have no idea where it even lives upstream.
I thought MIC as an architecture was dead, so it could well be the case 
that there isn't a viable upstream anymore for that code.


jeff


Re: [PATCH] vect: Hookize better_loop_vinfo_p

2021-11-08 Thread Richard Biener via Gcc-patches
On Mon, Nov 8, 2021 at 2:06 PM Richard Sandiford
 wrote:
>
> Richard Biener  writes:
> > On Mon, Nov 8, 2021 at 11:45 AM Richard Sandiford via Gcc-patches
> >  wrote:
> >>
> >> One of the things we want to do on AArch64 is compare vector loops
> >> side-by-side and pick the best one.  For some targets, we want this
> >> to be based on issue rates as well as the usual latency-based costs
> >> (at least for loops with relatively high iteration counts).
> >>
> >> The current approach to doing this is: when costing vectorisation
> >> candidate A, try to guess what the other main candidate B will look
> >> like and adjust A's latency-based cost up or down based on the likely
> >> difference between A and B's issue rates.  This effectively means
> >> that we try to cost parts of B at the same time as A, without actually
> >> being able to see B.
> >
> > I think the idea of the current costing is that compares are always
> > to the original scalar loop (with comparing the resulting magic
> > cost numbers) so that by means of transitivity comparing two
> > vector variants work as well.  So I'm not sure where 'variant B'
> > comes into play here?
>
> E.g. A could be SVE and B could be Advanced SIMD, or A could be
> SVE with fully-packed vectors and B could be SVE with partially
> unpacked vectors.
>
> One motivating case is Neoverse V1, which is a 256-bit SVE target.
> There, scalar code, Advanced SIMD code and SVE code have different issue
> characteristics.  SVE vector ops generally have the same latencies as
> the corresponding Advanced SIMD ops.  Advanced SIMD ops generally
> have double the throughput of SVE ones, so that the overall bit-for-bit
> throughputs are roughly equal.  However, there are differences due to
> predication, load/store handling, and so on, and those differences
> should be decisive in some cases.
>
> For SLP, latency-based costs are fine.  But for loop costs, they hide
> too many details.  What works best in practice, both for vector vs.
> vector and vector vs. scalar, is:
>
> (1) compare issue rates between two loop bodies (vector vs. vector
> or vector vs. scalar)
> (2) if issue rates are equal for a given number of scalar iterations,
> compare the latencies of the loop bodies, as we do now
> (3) if both the above are equal, compare the latencies of the prologue
> and epilogue, as we do now
>
> It's very difficult to combine latency and issue rate information
> into a single per-statement integer, in such a way that the integers
> can just be summed and compared.

Yeah, so the idea I had when proposing the init_cost/add_stmt/finish_cost
was that finish_cost would determine the issue rate cost part, for
example if the uarch can issue 2 ops with ISA A and 4 ops with ISA B
then finish_cost would compute the number of cycles required to
issue all vector stmts.  That yields something like an IPC value the latency
cost could be divided by.  Doing that for both ISA A and ISA B would produce
weighted latency values that can be compared?  Alternatively
you can of course simulate issue with the actual instruction latencies
in mind and produce an overall iteration latency number.

Comparing just latency or issue rate in isolation is likely not good
enough?
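
By way of a toy example of that weighting (the numbers and the exact
formula are invented here purely for illustration, not the vectorizer's
actual cost model):

```cpp
#include <cassert>

/* Toy sketch of the idea above: compute the cycles needed to issue all
   statements, derive an IPC-like factor from it, and divide the latency
   sum by that factor so ISAs with different issue widths compare on the
   same scale.  */
static double
weighted_latency (unsigned latency_sum, unsigned n_stmts,
                  unsigned issue_width)
{
  /* Cycles just to issue all statements, rounded up.  */
  unsigned issue_cycles = (n_stmts + issue_width - 1) / issue_width;
  double ipc = static_cast<double> (n_stmts) / issue_cycles;
  return latency_sum / ipc;
}
```

With equal latency sums, the narrower-issue ISA ends up with the larger
weighted value, so the wider-issue candidate wins the comparison.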

> So returning to the above, when costing an SVE loop A, we currently
> collect on-the-side issue information about both A and the likely
> Advanced SIMD implementation B.  If B would have a higher throughput
> than A then we multiply A's latency-based costs by:
>
>ceil(B throughput/A throughput)
>
> The problem is, we don't actually know whether B exists or what it
> would look like (given that Advanced SIMD and SVE have different
> features).
>
> In principle, we should do the same in reverse, but since SVE needs
> fewer ops than Advanced SIMD, the latencies are usually smaller already.
>
> We also do something similar for the scalar code.  When costing both
> Advanced SIMD and SVE, we try to estimate the issue rate of the
> original scalar code.  If the scalar code could issue more quickly
> than the equivalent vector code, we increase the latency-based cost
> of the vector code to account for that.
>
> The idea of the patch (and others) is that we can do the (1), (2),
> (3) comparison above directly rather than indirectly.  We can also
> use the issue information calculated while costing the scalar code,
> rather than having to estimate the scalar issue information while
> costing the vector code.
>
> >> This is needlessly indirect and complex.  It was a compromise due
> >> to the code being added (too) late in the GCC 11 cycle, so that
> >> target-independent changes weren't possible.
> >>
> >> The target-independent code already compares two candidate loop_vec_infos
> >> side-by-side, so that information about A and B above are available
> >> directly.  This patch creates a way for targets to hook into this
> >> comparison.
> >
> > You mean it has both loop_vec_infos.  But as said I'm not sure it's a good
> > idea to compare those side-

[PATCH] c++: unexpanded pack in var tmpl partial spec [PR100652]

2021-11-08 Thread Patrick Palka via Gcc-patches
Here we're not spotting a bare parameter pack appearing in the argument
list of a variable template partial specialization because we only look
for them within the decl's TREE_TYPE, which is sufficient for class
templates but not for variable templates.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/100652

gcc/cp/ChangeLog:

* pt.c (push_template_decl): Check for bare parameter packs in
the argument list of a variable template partial specialization.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/var-templ69.C: New test.
---
 gcc/cp/pt.c  | 17 -
 gcc/testsuite/g++.dg/cpp1y/var-templ69.C |  5 +
 2 files changed, 17 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/var-templ69.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index b82b9cc3cd2..991a20a85d4 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -5877,12 +5877,19 @@ push_template_decl (tree decl, bool is_friend)
   if (check_for_bare_parameter_packs (TYPE_RAISES_EXCEPTIONS (type)))
TYPE_RAISES_EXCEPTIONS (type) = NULL_TREE;
 }
-  else if (check_for_bare_parameter_packs (is_typedef_decl (decl)
-  ? DECL_ORIGINAL_TYPE (decl)
-  : TREE_TYPE (decl)))
+  else
 {
-  TREE_TYPE (decl) = error_mark_node;
-  return error_mark_node;
+  if (check_for_bare_parameter_packs (is_typedef_decl (decl)
+ ? DECL_ORIGINAL_TYPE (decl)
+ : TREE_TYPE (decl)))
+   {
+ TREE_TYPE (decl) = error_mark_node;
+ return error_mark_node;
+   }
+
+  if (is_partial && VAR_P (decl)
+ && check_for_bare_parameter_packs (DECL_TI_ARGS (decl)))
+   return error_mark_node;
 }
 
   if (is_partial)
diff --git a/gcc/testsuite/g++.dg/cpp1y/var-templ69.C 
b/gcc/testsuite/g++.dg/cpp1y/var-templ69.C
new file mode 100644
index 000..420d617368c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/var-templ69.C
@@ -0,0 +1,5 @@
+// PR c++/100652
+// { dg-do compile { target c++14 } }
+
+template int var;
+template char var; // { dg-error "parameter packs not 
expanded" }
-- 
2.34.0.rc1.14.g88d915a634



Re: [PATCH] vect: Hookize better_loop_vinfo_p

2021-11-08 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches  writes:
> On Mon, Nov 8, 2021 at 2:06 PM Richard Sandiford
>  wrote:
>>
>> Richard Biener  writes:
>> > On Mon, Nov 8, 2021 at 11:45 AM Richard Sandiford via Gcc-patches
>> >  wrote:
>> >>
>> >> One of the things we want to do on AArch64 is compare vector loops
>> >> side-by-side and pick the best one.  For some targets, we want this
>> >> to be based on issue rates as well as the usual latency-based costs
>> >> (at least for loops with relatively high iteration counts).
>> >>
>> >> The current approach to doing this is: when costing vectorisation
>> >> candidate A, try to guess what the other main candidate B will look
>> >> like and adjust A's latency-based cost up or down based on the likely
>> >> difference between A and B's issue rates.  This effectively means
>> >> that we try to cost parts of B at the same time as A, without actually
>> >> being able to see B.
>> >
>> > I think the idea of the current costing is that compares are always
>> > to the original scalar loop (with comparing the resulting magic
>> > cost numbers) so that by means of transitivity comparing two
>> > vector variants work as well.  So I'm not sure where 'variant B'
>> > comes into play here?
>>
>> E.g. A could be SVE and B could be Advanced SIMD, or A could be
>> SVE with fully-packed vectors and B could be SVE with partially
>> unpacked vectors.
>>
>> One motivating case is Neoverse V1, which is a 256-bit SVE target.
>> There, scalar code, Advanced SIMD code and SVE code have different issue
>> characteristics.  SVE vector ops generally have the same latencies as
>> the corresponding Advanced SIMD ops.  Advanced SIMD ops generally
>> have double the throughput of SVE ones, so that the overall bit-for-bit
>> throughputs are roughly equal.  However, there are differences due to
>> predication, load/store handling, and so on, and those differences
>> should be decisive in some cases.
>>
>> For SLP, latency-based costs are fine.  But for loop costs, they hide
>> too many details.  What works best in practice, both for vector vs.
>> vector and vector vs. scalar, is:
>>
>> (1) compare issue rates between two loop bodies (vector vs. vector
>> or vector vs. scalar)
>> (2) if issue rates are equal for a given number of scalar iterations,
>> compare the latencies of the loop bodies, as we do now
>> (3) if both the above are equal, compare the latencies of the prologue
>> and epilogue, as we do now
>>
>> It's very difficult to combine latency and issue rate information
>> into a single per-statement integer, in such a way that the integers
>> can just be summed and compared.
>
> Yeah, so the idea I had when proposing the init_cost/add_stmt/finish_cost
> was that finish_cost would determine the issue rate cost part, for
> example if the uarch can issue 2 ops with ISA A and 4 ops with ISA B
> then finish_cost would compute the number of cycles required to
> issue all vector stmts.  That yields sth like IPC the latency cost could
> be divided by.  Doing that for both ISA A and ISA B would produce
> weighted latency values that can be compared?  Alternatively
> you can of course simulate issue with the actual instruction latencies
> in mind and produce an overall iteration latency number.
>
> Comparing just latency or issue rate in isolation is likely not good
> enough?

In practice, comparing issue rates in isolation does seem to be what we
want as the first level of comparison.  I don't think total latency /
issue rate is really a meaningful combination.  It makes sense to the
extent that “lower latency == good” and “higher issue rate == good”,
but I don't think it's the case that halving the total latency is
equally valuable as doubling the issue rate.  In practice the latter is
a significant win while the former might or might not be.

total latency / issue rate also runs the risk of double-counting:
reducing the number of ops decreases the total latency *and* increases
the issue rate, which could have a quadratic effect.  (This is why we
don't decrease SVE costs when we think that the SVE code will issue
more quickly than the Advanced SIMD code.)

For a long-running loop, the only latencies that really matter are
for loop-carried dependencies.  The issue rate information takes
those into account, in that it tracks reduction and (with later
patches) induction limiters.

Thanks,
Richard

>
>> So returning to the above, when costing an SVE loop A, we currently
>> collect on-the-side issue information about both A and the likely
>> Advanced SIMD implementation B.  If B would have a higher throughput
>> than A then we multiply A's latency-based costs by:
>>
>>ceil(B throughput/A throughput)
>>
>> The problem is, we don't actually know whether B exists or what it
>> would look like (given that Advanced SIMD and SVE have different
>> features).
>>
>> In principle, we should do the same in reverse, but since SVE needs
>> fewer ops than Advanced SIMD, the latencies are usually smaller already.
>>

Re: [PATCH] Loop unswitching: support gswitch statements.

2021-11-08 Thread Martin Liška

On 9/28/21 22:39, Andrew MacLeod wrote:

In theory, modifying the IL should be fine; it happens already in places, but
it's not extensively tested under those conditions yet.


Hello Andrew.

I've just tried using a global gimple_ranger and it crashes when loop 
unswitching duplicates
some BBs.

Please try the attached patch for:

$ ./xgcc -B. 
/home/marxin/Programming/gcc/gcc/testsuite/gcc.dg/loop-unswitch-1.c -O3 -c
during GIMPLE pass: unswitch
/home/marxin/Programming/gcc/gcc/testsuite/gcc.dg/loop-unswitch-1.c: In 
function ‘xml_colorize_line’:
/home/marxin/Programming/gcc/gcc/testsuite/gcc.dg/loop-unswitch-1.c:6:6: 
internal compiler error: in get_bb_range, at gimple-range-cache.cc:255
6 | void xml_colorize_line(unsigned int *p, int state)
  |  ^
0x871fcf sbr_vector::get_bb_range(irange&, basic_block_def const*)
/home/marxin/Programming/gcc/gcc/gimple-range-cache.cc:255
0x871fcf sbr_vector::get_bb_range(irange&, basic_block_def const*)
/home/marxin/Programming/gcc/gcc/gimple-range-cache.cc:253
0x1b99800 ranger_cache::fill_block_cache(tree_node*, basic_block_def*, 
basic_block_def*)
/home/marxin/Programming/gcc/gcc/gimple-range-cache.cc:1332
0x1b99bc4 ranger_cache::block_range(irange&, basic_block_def*, tree_node*, bool)
/home/marxin/Programming/gcc/gcc/gimple-range-cache.cc:1107
0x1b9461c gimple_ranger::range_on_entry(irange&, basic_block_def*, tree_node*)
/home/marxin/Programming/gcc/gcc/gimple-range.cc:144
0x1b95140 gimple_ranger::range_of_expr(irange&, tree_node*, gimple*)
/home/marxin/Programming/gcc/gcc/gimple-range.cc:118
0x1b9ebef fold_using_range::range_of_range_op(irange&, gimple*, fur_source&)
/home/marxin/Programming/gcc/gcc/gimple-range-fold.cc:600
0x1ba0221 fold_using_range::fold_stmt(irange&, gimple*, fur_source&, tree_node*)
/home/marxin/Programming/gcc/gcc/gimple-range-fold.cc:552
0x1b94a16 gimple_ranger::fold_range_internal(irange&, gimple*, tree_node*)
/home/marxin/Programming/gcc/gcc/gimple-range.cc:243
0x1b94bab gimple_ranger::range_of_stmt(irange&, gimple*, tree_node*)
/home/marxin/Programming/gcc/gcc/gimple-range.cc:273
0x10a5b5f tree_unswitch_single_loop
/home/marxin/Programming/gcc/gcc/tree-ssa-loop-unswitch.c:390
0x10a5d62 tree_unswitch_single_loop
/home/marxin/Programming/gcc/gcc/tree-ssa-loop-unswitch.c:546
0x10a68fe tree_ssa_unswitch_loops()
/home/marxin/Programming/gcc/gcc/tree-ssa-loop-unswitch.c:106

Martin

diff --git a/gcc/testsuite/gcc.dg/loop-unswitch-6.c b/gcc/testsuite/gcc.dg/loop-unswitch-6.c
new file mode 100644
index 000..8a022e0f200
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/loop-unswitch-6.c
@@ -0,0 +1,56 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -funswitch-loops -fdump-tree-unswitch-details --param=max-unswitch-insns=1000 --param=max-unswitch-level=10" } */
+
+int
+__attribute__((noipa))
+foo(double *a, double *b, double *c, double *d, double *r, int size, int order)
+{
+  for (int i = 0; i < size; i++)
+  {
+    double tmp, tmp2;
+
+    switch (order)
+    {
+      case 0:
+        tmp = -8 * a[i];
+        tmp2 = 2 * b[i];
+        break;
+      case 1:
+        tmp = 3 * a[i] - 2 * b[i];
+        tmp2 = 5 * b[i] - 2 * c[i];
+        break;
+      case 2:
+        tmp = 9 * a[i] + 2 * b[i] + c[i];
+        tmp2 = 4 * b[i] + 2 * c[i] + 8 * d[i];
+        break;
+      case 3:
+        tmp = 3 * a[i] + 2 * b[i] - c[i];
+        tmp2 = b[i] - 2 * c[i] + 8 * d[i];
+        break;
+      default:
+        __builtin_unreachable ();
+    }
+
+    double x = 3 * tmp + d[i] + tmp;
+    double y = 3.4f * tmp + d[i] + tmp2;
+    r[i] = x + y;
+  }
+
+  return 0;
+}
+
+#define N 16 * 1024
+double aa[N], bb[N], cc[N], dd[N], rr[N];
+
+int main()
+{
+  for (int i = 0; i < 100 * 1000; i++)
+    foo (aa, bb, cc, dd, rr, N, i % 4);
+}
+
+
+/* Test that we actually unswitched something.  */
+/* { dg-final { scan-tree-dump ";; Unswitching loop with condition: order.* == 0" "unswitch" } } */
+/* { dg-final { scan-tree-dump ";; Unswitching loop with condition: order.* == 1" "unswitch" } } */
+/* { dg-final { scan-tree-dump ";; Unswitching loop with condition: order.* == 2" "unswitch" } } */
+/* { dg-final { scan-tree-dump ";; Unswitching loop with condition: order.* == 3" "unswitch" } } */
diff --git a/gcc/testsuite/gcc.dg/loop-unswitch-7.c b/gcc/testsuite/gcc.dg/loop-unswitch-7.c
new file mode 100644
index 000..00f2fcff64b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/loop-unswitch-7.c
@@ -0,0 +1,45 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -funswitch-loops -fdump-tree-unswitch-details --param=max-unswitch-insns=1000 --param=max-unswitch-level=10" } */
+
+int
+foo(double *a, double *b, double *c, double *d, double *r, int size, int order)
+{
+  for (int i = 0; i < size; i++)
+  {
+    double tmp, tmp2;
+
+    switch (order)
+    {
+      case 5 ... 6:
+      case 9:
+        tmp = -8 * a[i];
+        tmp2 = 2 * b[i];
+ 

[COMMITTED] path solver: Avoid recalculating ranges already in the cache.

2021-11-08 Thread Aldy Hernandez via Gcc-patches
The problem here is an ordering issue with a path that starts
with 19->3:

  [local count: 916928331]:
  # value_20 = PHI 
  # n_27 = PHI 
  n_16 = n_27 + 4;
  value_17 = value_20 / 1;
  if (value_20 > 4294967295)
goto ; [89.00%]
  else
goto ; [11.00%]

The problem here is that both value_17 and value_20 are in the set of
imports we must pre-calculate.  The value_17 name occurs first in the
bitmap, so we try to resolve it first, which causes us to recursively
solve the value_20 range.  We do so correctly and put them both in the
cache.  However, when we try to solve value_20 from the bitmap, we
ignore that it already has a cached entry and try to resolve the PHI
with the wrong value of value_17:

  # value_20 = PHI 

The right thing to do is to avoid recalculating definitions already
solved.

Regstrapped and checked for # threads before and after on x86-64 Linux.

gcc/ChangeLog:

PR tree-optimization/103120
* gimple-range-path.cc (path_range_query::range_defined_in_block):
Return the cached range for NAME if available.

gcc/testsuite/ChangeLog:

* gcc.dg/pr103120.c: New test.
---
 gcc/gimple-range-path.cc|  3 +++
 gcc/testsuite/gcc.dg/pr103120.c | 33 +
 2 files changed, 36 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pr103120.c

diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
index 9175651e896..9d3fe89185e 100644
--- a/gcc/gimple-range-path.cc
+++ b/gcc/gimple-range-path.cc
@@ -300,6 +300,9 @@ path_range_query::range_defined_in_block (irange &r, tree 
name, basic_block bb)
   if (def_bb != bb)
 return false;
 
+  if (get_cache (r, name))
+return true;
+
   if (gimple_code (def_stmt) == GIMPLE_PHI)
 ssa_range_in_phi (r, as_a (def_stmt));
   else
diff --git a/gcc/testsuite/gcc.dg/pr103120.c b/gcc/testsuite/gcc.dg/pr103120.c
new file mode 100644
index 000..b680a6c0fb0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr103120.c
@@ -0,0 +1,33 @@
+// { dg-do run }
+// { dg-options "-O2" }
+
+#define radix 10
+__INT32_TYPE__ numDigits(__UINT64_TYPE__ value)
+{
+  __INT32_TYPE__ n = 1;
+  while (value > __UINT32_MAX__)
+  {
+    n += 4;
+    value /= radix * radix * radix * radix;
+  }
+  __UINT32_TYPE__ v = (__UINT32_TYPE__)value;
+  while (1)
+  {
+    if (v < radix)
+      return n;
+    if (v < radix * radix)
+      return n + 1;
+    if (v < radix * radix * radix)
+      return n + 2;
+    if (v < radix * radix * radix * radix)
+      return n + 3;
+    n += 4;
+    v /= radix * radix * radix * radix;
+  }
+}
+
+int main()
+{
+  if (numDigits(__UINT64_MAX__) != 20)
+    __builtin_abort();
+}
-- 
2.31.1



[PATCH] Fix spurious valgrind errors in irred loop verification

2021-11-08 Thread Richard Biener via Gcc-patches
The sbitmap bitmap_{set,clear}_bit changes trigger spurious
uninitialized-value-use reports from valgrind, since we now
read the old value before setting/clearing a bit, so
verify_loop_structure's optimization of not clearing the sbitmap is flagged.

Fixed by using a temporary BB flag which should also be more
efficient in terms of cache re-use.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-11-08  Richard Biener  

* cfgloop.c (verify_loop_structure): Use a temporary BB flag
instead of an sbitmap to cache irreducible state.
---
 gcc/cfgloop.c | 23 +++
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/gcc/cfgloop.c b/gcc/cfgloop.c
index 2ba9918bfa2..20c24c13c36 100644
--- a/gcc/cfgloop.c
+++ b/gcc/cfgloop.c
@@ -1567,19 +1567,17 @@ verify_loop_structure (void)
   /* Check irreducible loops.  */
   if (loops_state_satisfies_p (LOOPS_HAVE_MARKED_IRREDUCIBLE_REGIONS))
 {
-  auto_edge_flag saved_irr_mask (cfun);
-  /* Record old info.  */
-  auto_sbitmap irreds (last_basic_block_for_fn (cfun));
+  auto_edge_flag saved_edge_irr (cfun);
+  auto_bb_flag saved_bb_irr (cfun);
+  /* Save old info.  */
   FOR_EACH_BB_FN (bb, cfun)
{
  edge_iterator ei;
  if (bb->flags & BB_IRREDUCIBLE_LOOP)
-   bitmap_set_bit (irreds, bb->index);
- else
-   bitmap_clear_bit (irreds, bb->index);
+   bb->flags |= saved_bb_irr;
  FOR_EACH_EDGE (e, ei, bb->succs)
if (e->flags & EDGE_IRREDUCIBLE_LOOP)
- e->flags |= saved_irr_mask;
+ e->flags |= saved_edge_irr;
}
 
   /* Recount it.  */
@@ -1591,34 +1589,35 @@ verify_loop_structure (void)
  edge_iterator ei;
 
  if ((bb->flags & BB_IRREDUCIBLE_LOOP)
- && !bitmap_bit_p (irreds, bb->index))
+ && !(bb->flags & saved_bb_irr))
{
  error ("basic block %d should be marked irreducible", bb->index);
  err = 1;
}
  else if (!(bb->flags & BB_IRREDUCIBLE_LOOP)
- && bitmap_bit_p (irreds, bb->index))
+  && (bb->flags & saved_bb_irr))
{
  error ("basic block %d should not be marked irreducible", 
bb->index);
  err = 1;
}
+ bb->flags &= ~saved_bb_irr;
  FOR_EACH_EDGE (e, ei, bb->succs)
{
  if ((e->flags & EDGE_IRREDUCIBLE_LOOP)
- && !(e->flags & saved_irr_mask))
+ && !(e->flags & saved_edge_irr))
{
  error ("edge from %d to %d should be marked irreducible",
 e->src->index, e->dest->index);
  err = 1;
}
  else if (!(e->flags & EDGE_IRREDUCIBLE_LOOP)
-  && (e->flags & saved_irr_mask))
+  && (e->flags & saved_edge_irr))
{
  error ("edge from %d to %d should not be marked irreducible",
 e->src->index, e->dest->index);
  err = 1;
}
- e->flags &= ~saved_irr_mask;
+ e->flags &= ~saved_edge_irr;
}
}
 }
-- 
2.31.1


[PATCH] c++: bogus error w/ variadic concept-id as if cond [PR98394]

2021-11-08 Thread Patrick Palka via Gcc-patches
Here when tentatively parsing the if condition as a declaration, we try
to treat C<1> as the start of a constrained placeholder type, which we
quickly reject because C doesn't accept a type as its first argument.
But since we're parsing tentatively, we shouldn't emit an error in this
case.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/11?

PR c++/98394

gcc/cp/ChangeLog:

* parser.c (cp_parser_placeholder_type_specifier): Don't emit
a "does not constrain a type" error when parsing tentatively.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-pr98394.C: New test.
---
 gcc/cp/parser.c   |  7 +--
 gcc/testsuite/g++.dg/cpp2a/concepts-pr98394.C | 14 ++
 2 files changed, 19 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-pr98394.C

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 4c2075742d6..f1498e28da4 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -19909,8 +19909,11 @@ cp_parser_placeholder_type_specifier (cp_parser 
*parser, location_t loc,
   if (!flag_concepts_ts
  || !processing_template_parmlist)
{
- error_at (loc, "%qE does not constrain a type", DECL_NAME (con));
- inform (DECL_SOURCE_LOCATION (con), "concept defined here");
+ if (!tentative)
+   {
+ error_at (loc, "%qE does not constrain a type", DECL_NAME (con));
+ inform (DECL_SOURCE_LOCATION (con), "concept defined here");
+   }
  return error_mark_node;
}
 }
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-pr98394.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-pr98394.C
new file mode 100644
index 000..c8407cdf7cd
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-pr98394.C
@@ -0,0 +1,14 @@
+// PR c++/98394
+// { dg-do compile { target c++20 } }
+
+template
+concept C = true;
+
+template
+concept D = true;
+
+int main() {
+  if (C<1>); // { dg-bogus "does not constrain a type" }
+  if (D<1>); // { dg-error "wrong number of template arguments" }
+// { dg-bogus "does not constrain a type" "" { target *-*-* } .-1 }
+}
-- 
2.34.0.rc1.14.g88d915a634



Re: Some PINGs

2021-11-08 Thread Richard Biener via Gcc-patches
On Mon, Nov 8, 2021 at 3:02 PM Roger Sayle  wrote:
>
>
> Hi Richard,
>
> >> I wonder if reviewers could take a look (or a second look) at some of
> >> my outstanding patches.
> >> PR middle-end/100810: Penalize IV candidates with undefined value
> >> bases
> >> https://gcc.gnu.org/pipermail/gcc-patches/2021-August/578441.html
> >
> > I did comment on this one, noting the more general issue.
> > My opinion is still that doing heavy lifting in IVOPTs is misplaced.
>
> I wasn't sure whether you'd had the opportunity to give this bug some more
> thought.
>
> You're completely right, that it is theoretically possible for GCC to upgrade
> its data flow (CCP/VRP) passes to use a finer grained definition of undefined/
> uninitialized/indeterminate values; an indeterminate-value numbering pass
> if you will.  Under the constraints that the automatic variables are not 
> volatile,
> and the types don't support trapping values, the compiler could determine
> that "undef1 - undef1", or "undef2 ^ undef2" have defined values, but that
> "undef1 - undef2" and "undef3 ^ undef4" remain indeterminate.  Like
> traditional value numbering, it may be possible to track "t1 = undef3 ^ 
> undef4",
> "t2 = t1 ^ undef4", "t3 = t2 - undef3".   Conceptually, the same applies to
> (floating point) mathematics and its numerous infinities, sometimes "+Inf - 
> +Inf"
> is known to be zero, provided that it is the exact same infinity (omega) that 
> is
> being subtracted.
>
> The two counter arguments for this solution as a fix for PR 100810, is that 
> such
> a pass doesn't yet exist, and it still (probably) falls foul of C/C++'s 
> undefined
> behaviour from use of an uninitialized automatic variable.  From an 
> engineering
> perspective, it's a lot of effort to support poorly written code.
>
> Quick question to the language lawyers:
>
> int foo()
> {
>   int undef;
>   return undef ^ undef;
> }
>
> int bar()
> {
>   volatile int undef;
>   return undef ^ undef;
> }
>
> Do the above functions invoke undefined behaviour?

Yes.  C17 6.3.2.1 says so.  Note that it has a strange restriction
on being 'register' qualifiable, so taking the address of 'undef'
in an unrelated stmt would make it not undefined?  Whenever the
language spec makes the use not undefined the argument is
that the abstract machine would for

  int bar()
  {
int undef;
int tem = undef;
return tem ^ tem;
  }

cause 'undef' to have a single evaluation and thus 'tem' have
a single consistent value.  In GCC we'd have propagated those out,
resulting in two evaluations (not sure if an SSA use can be considered
an "evaluation" or whether the non-existing(!) single definition is the sole
evaluation)

> The approach taken by the proposed patch is that it's the pass/transformation 
> that
> introduces more undefined behaviour than was present in the original code, 
> that is
> at fault.  Even if later passes decided not to take advantage of UB, is 
> there any benefit
> for replacing an induction variable with a well-defined value (evolution) and 
> substituting
> it with one that depends upon indefinite values.  I'd argue the correct fix 
> is to go the other
> way, and attempt to reduce the instances of undefined behaviour.
>
> So transform
>
> int undef;
> while (cond())
>   undef++;
> ...
>
> which invokes UB on each iteration with:
>
> int undef;
> unsigned int count = 0;
> while (cond())
>   count++;
> undef += count;
> ...
>
> which now only invokes UB after the loop.
>
> Consider:
> int undef;
> int result = 0;
> while (cond())
>   result ^= undef;
> return result;
>
> where the final value of result may be well-defined if the loop iterates
> an even number of times.
>
> int undef;
> int result = 0;
> while (cond())
>   result ^= 1;
> return result ? undef : 0;
>
>
> Has anyone proposed an alternate fix to PR middle-end/100810?
> We can always revert my proposed fix (for this P1 regression), once IV opts
> is able to confirm that it is safe to make the substitution that it is 
> proposing.

I do agree with the idea to not increase the number of UNDEF uses.  But the
issue is that undefinedness as in what your patch would consider is brittle,
it can be easily hidden in a function parameter and thus the bug will likely
re-surface if you just hide the undefinedness from the pass in some way.

So any place where we try to do sth when we see an UNDEF for _correctness_
reasons should really be testing for must-definedness instead which I think
is similarly infeasible and broken.

So for the PR at hand it's actually ifcombine that turns the testcase from
one not invoking undefined behavior first (all uses of 'i' are never executed)
into one that does by turning

if (!b)
  i && d;

into

   _Bool tem = b == 0 & i != 0;
   if (tem)
 d;

the loop body is indeed mangled by IVOPTs by replacing h with 

[PATCH] ipa: Unshare expressions before putting them into debug statements (PR 103099, PR 103107)

2021-11-08 Thread Martin Jambor
Hi,

my recent patch to improve debug experience when there are removed
parameters (by ipa-sra or ipa-split) was not careful to unshare the
expressions that were then put into debug statements, which manifests
itself as PR 103099.  This patch adds unsharing them using
unshare_expr_without_location which is a bit more careful with stripping
locations than what we were doing manually and so also fixes PR 103107.

Bootstrapped and tested on x86_64-linux.  OK for master?

Thanks,

Martin

gcc/ChangeLog:

2021-11-08  Martin Jambor  

PR ipa/103099
PR ipa/103107
* tree-inline.c (remap_gimple_stmt): Unshare the expression without
location before invoking remap_with_debug_expressions on it.
* ipa-param-manipulation.c
(ipa_param_body_adjustments::prepare_debug_expressions): Likewise.

gcc/testsuite/ChangeLog:

2021-11-08  Martin Jambor  

PR ipa/103099
PR ipa/103107
* g++.dg/ipa/pr103099.C: New test.
* gcc.dg/ipa/pr103107.c: Likewise.
---
 gcc/ipa-param-manipulation.c|  4 ++--
 gcc/testsuite/g++.dg/ipa/pr103099.C | 25 +
 gcc/testsuite/gcc.dg/ipa/pr103107.c | 17 +
 gcc/tree-inline.c   |  5 -
 4 files changed, 48 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ipa/pr103099.C
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr103107.c

diff --git a/gcc/ipa-param-manipulation.c b/gcc/ipa-param-manipulation.c
index c84d669521c..44f3bed2640 100644
--- a/gcc/ipa-param-manipulation.c
+++ b/gcc/ipa-param-manipulation.c
@@ -1185,8 +1185,8 @@ ipa_param_body_adjustments::prepare_debug_expressions 
(tree dead_ssa)
  return (*d != NULL_TREE);
}
 
-  tree val = gimple_assign_rhs_to_tree (def);
-  SET_EXPR_LOCATION (val, UNKNOWN_LOCATION);
+  tree val
+   = unshare_expr_without_location (gimple_assign_rhs_to_tree (def));
   remap_with_debug_expressions (&val);
 
   tree vexpr = make_node (DEBUG_EXPR_DECL);
diff --git a/gcc/testsuite/g++.dg/ipa/pr103099.C 
b/gcc/testsuite/g++.dg/ipa/pr103099.C
new file mode 100644
index 000..5fb137d6799
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ipa/pr103099.C
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -g" } */
+
+void pthread_mutex_unlock(int *);
+int __gthread_mutex_unlock___mutex, unlock___trans_tmp_1;
+struct Object {
+  void _change_notify() {}
+  bool _is_queued_for_deletion;
+};
+struct ClassDB {
+  template  static int bind_method(N, M);
+};
+struct CanvasItemMaterial : Object {
+  bool particles_animation;
+  void set_particles_animation(bool);
+};
+void CanvasItemMaterial::set_particles_animation(bool p_particles_anim) {
+  particles_animation = p_particles_anim;
+  if (unlock___trans_tmp_1)
+pthread_mutex_unlock(&__gthread_mutex_unlock___mutex);
+  _change_notify();
+}
+void CanvasItemMaterial_bind_methods() {
+  ClassDB::bind_method("", &CanvasItemMaterial::set_particles_animation);
+}
diff --git a/gcc/testsuite/gcc.dg/ipa/pr103107.c 
b/gcc/testsuite/gcc.dg/ipa/pr103107.c
new file mode 100644
index 000..3ea3fc55947
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/pr103107.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-Og -g -fipa-sra -fno-tree-dce" } */
+
+typedef int __attribute__((__vector_size__ (8))) V;
+V v;
+
+static void
+bar (int i)
+{
+  V l = v + i;
+}
+
+void
+foo (void)
+{
+  bar (0);
+}
diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index d78e4392b69..53d664ec2e4 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -1825,7 +1825,10 @@ remap_gimple_stmt (gimple *stmt, copy_body_data *id)
  tree value = gimple_debug_bind_get_value (stmt);
  if (id->param_body_adjs
  && id->param_body_adjs->m_dead_stmts.contains (stmt))
-   id->param_body_adjs->remap_with_debug_expressions (&value);
+   {
+ value = unshare_expr_without_location (value);
+ id->param_body_adjs->remap_with_debug_expressions (&value);
+   }
 
  gdebug *copy = gimple_build_debug_bind (var, value, stmt);
  if (id->reset_location)
-- 
2.33.0



Re: [PATCH] ipa: Unshare expressions before putting them into debug statements (PR 103099, PR 103107)

2021-11-08 Thread Jan Hubicka via Gcc-patches
> Hi,
> 
> my recent patch to improve debug experience when there are removed
> parameters (by ipa-sra or ipa-split) was not careful to unshare the
> expressions that were then put into debug statements, which manifests
> itself as PR 103099.  This patch adds unsharing them using
> unshare_expr_without_location which is a bit more careful with stripping
> locations than what we were doing manually and so also fixes PR 103107.
> 
> Bootstrapped and tested on x86_64-linux.  OK for master?
> 
> Thanks,
> 
> Martin
> 
> gcc/ChangeLog:
> 
> 2021-11-08  Martin Jambor  
> 
>   PR ipa/103099
>   PR ipa/103107
>   * tree-inline.c (remap_gimple_stmt): Unshare the expression without
>   location before invoking remap_with_debug_expressions on it.
>   * ipa-param-manipulation.c
>   (ipa_param_body_adjustments::prepare_debug_expressions): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
> 2021-11-08  Martin Jambor  
> 
>   PR ipa/103099
>   PR ipa/103107
>   * g++.dg/ipa/pr103099.C: New test.
>   * gcc.dg/ipa/pr103107.c: Likewise.
OK,
thanks!
Honza


Re: [PATCH v1 1/7] LoongArch Port: gcc

2021-11-08 Thread Xi Ruoyao via Gcc-patches
On Mon, 2021-11-08 at 10:30 +0800, Chenghua Xu wrote:
> This patch does not arrive at  mail list. Send as an attachment in a 
> compressed format.

I think .patch.gz is preferred instead of .tar.gz.

And is it possible to separate this into multiple commits?  For example
the whole "-march=native" support can be in a separate commit.  It will
be easier to review those changes one-by-one.

> --- /dev/null
> +++ b/gcc/config/loongarch/linux.h
/* snip */
> +  /* Integer ABI  */
> +  #if DEFAULT_ABI_INT == ABI_LP64
> +#define INT_ABI_SUFFIX "lib64"
> +  #endif

"INT_ABI_SUFFIX" should be renamed to INT_ABI_LIBDIR or something. 
"lib64" is not a "suffix".

> --- /dev/null
> +++ b/gcc/config/loongarch/loongarch-opts.c
/* snip */

> +  /* 5.  Check integer ABI-ISA for conflicts.  */
> +  switch (*isa_int)
> +{
> +case ISA_LA64:
> +  if (*abi_int != ABI_LP64) goto error_int_abi;
> +  break;

/* snip */

> +  switch (*isa_float)
> +   {
> +   case ISA_SOFT_FLOAT:
> + if (*abi_float != ABI_SOFT_FLOAT) goto error_float_abi;
> + break;
> +
> +   case ISA_SINGLE_FLOAT:
> + if (*abi_float != ABI_SINGLE_FLOAT) goto error_float_abi;
> + break;
> +
> +   case ISA_DOUBLE_FLOAT:
> + if (*abi_float != ABI_DOUBLE_FLOAT) goto error_float_abi;
> + break;
> 

The goto statements should be in a new line (coding style).

> --- /dev/null
> +++ b/gcc/config/loongarch/gnu-user.h
/* snip */

> +#define GLIBC_DYNAMIC_LINKER_LP64 "/lib64/ld.so.1"

It is "ld-linux-loongarch-lp64d.so.x" in the latest ELF psABI.  "x" is
now 0 but we have an ongoing discussion to make it 1.

> +++ b/gcc/config/loongarch/sync.md
/* snip */

> +(define_insn "atomic_cas_value_strong"
> +  [(set (match_operand:GPR 0 "register_operand" "=&r")
> +   (match_operand:GPR 1 "memory_operand" "+ZC"))
> +   (set (match_dup 1)
> +   (unspec_volatile:GPR [(match_operand:GPR 2 "reg_or_0_operand" "rJ")
> + (match_operand:GPR 3 "reg_or_0_operand" "rJ")
> + (match_operand:SI 4 "const_int_operand")  ;; 
> mod_s
> + (match_operand:SI 5 "const_int_operand")] ;; 
> mod_f
> +UNSPEC_COMPARE_AND_SWAP))
> +   (clobber (match_scratch:GPR 6 "=&r"))]
> +  ""
> +{
> +  static char buff[256] = {0};

> +  buff[0] = '\0';
> +  sprintf (buff + strlen (buff), "%%G5\\n\\t");
> +  sprintf (buff + strlen (buff), "1:\\n\\t");
> +  sprintf (buff + strlen (buff), "ll.\\t%%0,%%1\\n\\t");
> +  sprintf (buff + strlen (buff), "bne\\t%%0,%%z2,2f\\n\\t");
> +  sprintf (buff + strlen (buff), "or%%i3\\t%%6,$zero,%%3\\n\\t");
> +  sprintf (buff + strlen (buff), "sc.\\t%%6,%%1\\n\\t");
> +  sprintf (buff + strlen (buff), "beq\\t$zero,%%6,1b\\n\\t");
> +  sprintf (buff + strlen (buff), "b\\t3f\\n\\t");
> +  sprintf (buff + strlen (buff), "2:\\n\\t");
> +  sprintf (buff + strlen (buff), "dbar\\t0x700\\n\\t");
> +  sprintf (buff + strlen (buff), "3:\\n\\t");
> +
> +  return buff;
> +}

These "cascading" sprintf/strlen looks stupid. It can be simply:

  return "%G5\\n\\t"
 "1:\\n\\t"
 "ll.\\t%0,%1\\n\\t"
 ...
 "3:\\n\\t";

The compiler will concatenate the string literals so there will be no
runtime overhead.

And there should be some comment to explain this LL/SC loop and dbar
workaround IMO.

Likewise for other atomic LL/SC expansions.
-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


[RFC] c++: Print function template parms when relevant (was: [PATCH v4] c++: Add gnu::diagnose_as attribute)

2021-11-08 Thread Matthias Kretz
On Tuesday, 17 August 2021 20:31:54 CET Jason Merrill wrote:
> > 2. Given a DECL_TI_ARGS tree, can I query whether an argument was deduced
> > or explicitly specified? I'm asking because I still consider diagnostics
> > of function templates unfortunate. `template  void f()` is fine,
> > as is `void f(T) [with T = float]`, but `void f() [with T = float]` could
> > be better. I.e. if the template parameter appears somewhere in the
> > function parameter list, dump_template_parms would only produce noise.
> > If, however, the template parameter was given explicitly, it would be
> > nice if it could show up accordingly in diagnostics.
> 
> NON_DEFAULT_TEMPLATE_ARGS_COUNT has that information, though there are
> some issues with it.  Attached is my WIP from May to improve it
> somewhat, if that's interesting.

It is interesting. I used your patch to come up with the attached patch. I 
must say, I didn't try to read through all the cp/pt.c code to understand all 
of what you did there (which is why my ChangeLog entry says "Jason?"), but it 
works for me (and all of `make check`).

Anyway, I'd like to propose the following before finishing my diagnose_as 
patch. I believe it's useful to fix this part first. The diagnostic/default-
template-args-[12].C tests show a lot of examples of the intent of this patch. 
And the remaining changes to the testsuite show how it changes diagnostic 
output.

-- 8< 

The choice when to print a function template parameter was still
suboptimal. That's because sometimes the function template parameter
list only adds noise, while in other situations the lack of a function
template parameter list makes diagnostic messages hard to understand.

The general idea of this change is to print template parms wherever they
would appear in the source code as well. Thus, the diagnostics code
needs to know whether any template parameter was given explicitly.

Signed-off-by: Matthias Kretz 

gcc/testsuite/ChangeLog:

* g++.dg/debug/dwarf2/template-params-12n.C: Optionally, allow
DW_AT_default_value.
* g++.dg/diagnostic/default-template-args-1.C: New.
* g++.dg/diagnostic/default-template-args-2.C: New.
* g++.dg/diagnostic/param-type-mismatch-2.C: Expect template
parms in diagnostic.
* g++.dg/ext/pretty1.C: Expect function template specialization
to not pretty-print template parms.
* g++.old-deja/g++.ext/pretty3.C: Ditto.
* g++.old-deja/g++.pt/memtemp77.C: Ditto.
* g++.dg/goacc/template.C: Expect function template parms for
explicit arguments.
* g++.dg/gomp/declare-variant-7.C: Expect no function template
parms for deduced arguments.
* g++.dg/template/error40.C: Expect only non-default template
arguments in diagnostic.

gcc/cp/ChangeLog:

* cp-tree.h (GET_NON_DEFAULT_TEMPLATE_ARGS_COUNT): Return
absolute value of stored constant.
(EXPLICIT_TEMPLATE_ARGS_P): New.
(SET_EXPLICIT_TEMPLATE_ARGS_P): New.
(TFF_AS_PRIMARY): New constant.
* error.c (get_non_default_template_args_count): Avoid
GET_NON_DEFAULT_TEMPLATE_ARGS_COUNT if
NON_DEFAULT_TEMPLATE_ARGS_COUNT is a NULL_TREE. Make independent
of flag_pretty_templates.
(dump_template_bindings): Add flags parameter to be passed to
get_non_default_template_args_count. Print only non-default
template arguments.
(dump_function_decl): Call dump_function_name and dump_type of
the DECL_CONTEXT with specialized template and set
TFF_AS_PRIMARY for their flags.
(dump_function_name): Add and document conditions for calling
dump_template_parms.
(dump_template_parms): Print only non-default template
parameters.
* pt.c (determine_specialization): Jason?
(template_parms_level_to_args): Jason?
(copy_template_args): Jason?
(fn_type_unification): Set EXPLICIT_TEMPLATE_ARGS_P on the
template arguments tree if any template parameter was explicitly
given.
(type_unification_real): Jason?
(get_partial_spec_bindings): Jason?
(tsubst_template_args): Determine number of defaulted arguments
from new argument vector, if possible.
---
 gcc/cp/cp-tree.h  | 18 +++-
 gcc/cp/error.c| 83 ++-
 gcc/cp/pt.c   | 58 +
 .../g++.dg/debug/dwarf2/template-params-12n.C |  2 +-
 .../diagnostic/default-template-args-1.C  | 73 
 .../diagnostic/default-template-args-2.C  | 37 +
 .../g++.dg/diagnostic/param-type-mismatch-2.C |  2 +-
 gcc/testsuite/g++.dg/ext/pretty1.C|  2 +-
 gcc/testsuite/g++.dg/goacc/template.C |  8 +-
 gcc/testsuite/g++.dg/gomp/declare-variant-7.C |  4 +-
 gcc/testsuite/g++.dg/template/error40.C   |  6 +-
 gcc/testsuite

[PATCH] ipa: Fix segfault when remapping debug_binds with expressions (PR 103132)

2021-11-08 Thread Martin Jambor
Hi,

my initial implementation of the method
ipa_param_body_adjustments::remap_with_debug_expressions was based on
the assumption that if it was asked to remap an expression (as opposed
to a simple SSA_NAME), the expression would not contain an SSA_NAME
operand which is to be debug-reset.  While that is true for when
called from ipa_param_body_adjustments::prepare_debug_expressions, it
turns out it is not true when invoked from remap_gimple_stmt in
tree-inline.c.  This patch adds a simple logic to handle such cases
and simply map the entire value to NULL_TREE in those cases.

I have bootstrapped and tested the patch on x86_64-linux.  OK for trunk?

Thanks,

Martin


gcc/ChangeLog:

2021-11-08  Martin Jambor  

PR ipa/103132
* ipa-param-manipulation.c (replace_with_mapped_expr): Early
return with error_mark_mode when part of expression is mapped to
NULL.
(ipa_param_body_adjustments::remap_with_debug_expressions): Set
mapped value to NULL if walk_tree returns error_mark_mode.

gcc/testsuite/ChangeLog:

2021-11-08  Martin Jambor  

PR ipa/103132
* gcc.dg/ipa/pr103132.c: New test.
---
 gcc/ipa-param-manipulation.c| 27 +++
 gcc/testsuite/gcc.dg/ipa/pr103132.c | 19 +++
 2 files changed, 38 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr103132.c

diff --git a/gcc/ipa-param-manipulation.c b/gcc/ipa-param-manipulation.c
index 44f3bed2640..4610fc4ac03 100644
--- a/gcc/ipa-param-manipulation.c
+++ b/gcc/ipa-param-manipulation.c
@@ -1071,8 +1071,9 @@ ipa_param_body_adjustments::mark_dead_statements (tree dead_param,
 }
 
 /* Callback to walk_tree.  If REMAP is an SSA_NAME that is present in hash_map
-   passed in DATA, replace it with unshared version of what it was mapped
-   to.  */
+   passed in DATA, replace it with unshared version of what it was mapped to.
+   If an SSA argument would be remapped to NULL, the whole operation needs to
+   abort which is signaled by returning error_mark_node.  */
 
 static tree
 replace_with_mapped_expr (tree *remap, int *walk_subtrees, void *data)
@@ -1089,7 +1090,11 @@ replace_with_mapped_expr (tree *remap, int *walk_subtrees, void *data)
 
   hash_map *equivs = (hash_map *) data;
   if (tree *p = equivs->get (*remap))
-*remap = unshare_expr (*p);
+{
+  if (!*p)
+   return error_mark_node;
+  *remap = unshare_expr (*p);
+}
   return 0;
 }
 
@@ -1100,16 +1105,22 @@ void
 ipa_param_body_adjustments::remap_with_debug_expressions (tree *t)
 {
   /* If *t is an SSA_NAME which should have its debug statements reset, it is
- mapped to NULL in the hash_map.  We need to handle that case separately or
- otherwise the walker would segfault.  No expression that is more
- complicated than that can have its operands mapped to NULL.  */
+ mapped to NULL in the hash_map.
+
+ It is perhaps simpler to handle the SSA_NAME cases directly and only
+ invoke walk_tree on more complex expressions.  When
+ remap_with_debug_expressions is called from tree-inline.c, a to-be-reset
+ SSA_NAME can be an operand to such expressions and the entire debug
+ variable we are remapping should be reset.  This is signaled by walk_tree
+ returning error_mark_node and done by setting *t to NULL.  */
   if (TREE_CODE (*t) == SSA_NAME)
 {
   if (tree *p = m_dead_ssa_debug_equiv.get (*t))
*t = *p;
 }
-  else
-walk_tree (t, replace_with_mapped_expr, &m_dead_ssa_debug_equiv, NULL);
+  else if (walk_tree (t, replace_with_mapped_expr,
+ &m_dead_ssa_debug_equiv, NULL) == error_mark_node)
+*t = NULL_TREE;
 }
 
 /* For an SSA_NAME DEAD_SSA which is about to be DCEd because it is based on a
diff --git a/gcc/testsuite/gcc.dg/ipa/pr103132.c b/gcc/testsuite/gcc.dg/ipa/pr103132.c
new file mode 100644
index 000..bef56494c03
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/pr103132.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -g" } */
+
+int globus_i_GLOBUS_GRIDFTP_SERVER_debug_handle_1;
+int globus_l_gfs_ipc_unpack_data__sz;
+void globus_i_GLOBUS_GRIDFTP_SERVER_debug_printf(const char *, ...);
+static void globus_l_gfs_ipc_unpack_cred(int len) {
+  if (globus_i_GLOBUS_GRIDFTP_SERVER_debug_handle_1)
+globus_i_GLOBUS_GRIDFTP_SERVER_debug_printf("", __func__);
+}
+static void globus_l_gfs_ipc_unpack_data(int len) {
+  for (; globus_l_gfs_ipc_unpack_data__sz;)
+len--;
+  len -= 4;
+  len -= 4;
+  globus_l_gfs_ipc_unpack_cred(len);
+}
+void globus_l_gfs_ipc_reply_read_body_cb(int len)
+{ globus_l_gfs_ipc_unpack_data(len); }
-- 
2.33.0



Re: [PATCH] libstdc++: only define _GLIBCXX_HAVE_TLS for VxWorks >= 6.6

2021-11-08 Thread Olivier Hainque via Gcc-patches



> On 8 Nov 2021, at 10:56, Rasmus Villemoes  wrote:
> 
> According to
> https://gcc.gnu.org/legacy-ml/gcc-patches/2008-03/msg01698.html, the
> TLS support, including the __tls_lookup function, was added to VxWorks
> in 6.6.
> 
> It certainly doesn't exist on our VxWorks 5 platform, but the fallback
> code in eh_globals.cc using __gthread_key_create() etc. used to work
> just fine.
> 
> libstdc++-v3/ChangeLog:
> 
>   * config/os/vxworks/os_defines.h (_GLIBCXX_HAVE_TLS): Only
>   define for VxWorks >= 6.6.

Good for me, thanks Rasmus.

Cheers,

Olivier




Re: [i386] Fix couple of issues in large PIC model on x86-64/VxWorks

2021-11-08 Thread Olivier Hainque via Gcc-patches



> On 8 Nov 2021, at 09:27, Eric Botcazou  wrote:
> 
>> LGTM for the generic part, no idea for VxWorks.
> 
> Thanks.  The VxWorks-specific hunk is needed to make GCC compatible with the 
> system compiler on this architecture (LLVM) and I have CCed Olivier.

Good for me, thanks Eric!



Re: [PATCH v4] Fix ICE when mixing VLAs and statement expressions [PR91038]

2021-11-08 Thread Jason Merrill via Gcc-patches

On 11/7/21 01:40, Uecker, Martin wrote:

Am Mittwoch, den 03.11.2021, 10:18 -0400 schrieb Jason Merrill:

On 10/31/21 05:22, Uecker, Martin wrote:

Hi Jason,

here is the fourth version of the patch.

I followed your suggestion and now make this
transformation sooner in pointer_int_sum.
I also added a check to only do this
transformation when the pointer is not a
VAR_DECL, which avoids it in the most
common cases where it is not necessary.

Looking for BIND_EXPR seems complicated
and I could not convince myself that it is
worth it.  I also see the risk that this
makes potential failure cases even more
subtle. What do you think?


That makes sense.  Though see some minor comments below.


Thank you! I made these changes and ran
bootstrap and tests again.


Hmm, it doesn't look like you made the change to use the save_expr 
function instead of build1?



Ok for trunk?


Any idea how to fix returning structs with
VLA member from statement expressions?


Testcase?


Otherwise, I will add an error message to
the FE in another patch.

Martin




Fix ICE when mixing VLAs and statement expressions [PR91038]

Returning VM-types from statement expressions can
lead to an ICE when declarations from the statement expression
are referred to later. Most of these issues can be addressed by
gimplifying the base expression earlier in gimplify_compound_lval.
Another issue is fixed by wrapping the pointer expression in
pointer_int_sum. This fixes PR91038 and some of the test cases
from PR29970 (structs with VLA members need further work).

  
  2021-10-30  Martin Uecker  
  
  gcc/

  PR c/91038
  PR c/29970
* c-family/c-common.c (pointer_int_sum): Make sure
pointer expressions are evaluated first when the size
expression depends on for variably-modified types.
  * gimplify.c (gimplify_var_or_parm_decl): Update comment.
  (gimplify_compound_lval): Gimplify base expression first.
  (gimplify_target_expr): Add comment.
  
  gcc/testsuite/

  PR c/91038
  PR c/29970
  * gcc.dg/vla-stexp-3.c: New test.
  * gcc.dg/vla-stexp-4.c: New test.
  * gcc.dg/vla-stexp-5.c: New test.
  * gcc.dg/vla-stexp-6.c: New test.
  * gcc.dg/vla-stexp-7.c: New test.
  * gcc.dg/vla-stexp-8.c: New test.
  * gcc.dg/vla-stexp-9.c: New test.




diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 436df45df68..668a2a129c6 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -3306,7 +3306,19 @@ pointer_int_sum (location_t loc, enum tree_code resultcode,
 TREE_TYPE (result_type)))
  size_exp = integer_one_node;
else
-size_exp = size_in_bytes_loc (loc, TREE_TYPE (result_type));
+{
+  size_exp = size_in_bytes_loc (loc, TREE_TYPE (result_type));
+  /* Wrap the pointer expression in a SAVE_EXPR to make sure it
+is evaluated first when the size expression may depend
+on it for VM types.  */
+  if (TREE_SIDE_EFFECTS (size_exp)
+ && TREE_SIDE_EFFECTS (ptrop)
+ && variably_modified_type_p (TREE_TYPE (ptrop), NULL))
+   {
+ ptrop = build1_loc (loc, SAVE_EXPR, TREE_TYPE (ptrop), ptrop);


I still wonder why you are using build1 instead of the save_expr function?


+ size_exp = build2 (COMPOUND_EXPR, TREE_TYPE (intop), ptrop, size_exp);
+   }
+}
  
/* We are manipulating pointer values, so we don't need to warn

   about relying on undefined signed overflow.  We disable the
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index c2ab96e7e18..84f7dc3c248 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -2964,7 +2964,9 @@ gimplify_var_or_parm_decl (tree *expr_p)
   declaration, for which we've already issued an error.  It would
   be really nice if the front end wouldn't leak these at all.
   Currently the only known culprit is C++ destructors, as seen
- in g++.old-deja/g++.jason/binding.C.  */
+ in g++.old-deja/g++.jason/binding.C.
+ Another possible culpit are size expressions for variably modified
+ types which are lost in the FE or not gimplified correctly.  */
if (VAR_P (decl)
&& !DECL_SEEN_IN_BIND_EXPR_P (decl)
&& !TREE_STATIC (decl) && !DECL_EXTERNAL (decl)
@@ -3109,16 +3111,22 @@ gimplify_compound_lval (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
   expression until we deal with any variable bounds, sizes, or
   positions in order to deal with PLACEHOLDER_EXPRs.
  
- So we do this in three steps.  First we deal with the annotations

- for any variables in the components, then we gimplify the base,
- then we gimplify any indices, from left to right.  */
+ The base expression may contain a statement expression that
+ has declarations used in size expressions, so has to be
+ gimplified before gimplifying the size expressions.
+
+ So we do this in th

Re: [PATCH] c++: bogus error w/ variadic concept-id as if cond [PR98394]

2021-11-08 Thread Jason Merrill via Gcc-patches

On 11/8/21 10:35, Patrick Palka wrote:

Here when tentatively parsing the if condition as a declaration, we try
to treat C<1> as the start of a constrained placeholder type, which we
quickly reject because C doesn't accept a type as its first argument.
But since we're parsing tentatively, we shouldn't emit an error in this
case.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/11?

PR c++/98394

gcc/cp/ChangeLog:

* parser.c (cp_parser_placeholder_type_specifier): Don't emit
a "does not constrain a type" error when parsing tentatively.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-pr98394.C: New test.
---
  gcc/cp/parser.c   |  7 +--
  gcc/testsuite/g++.dg/cpp2a/concepts-pr98394.C | 14 ++
  2 files changed, 19 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-pr98394.C

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 4c2075742d6..f1498e28da4 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -19909,8 +19909,11 @@ cp_parser_placeholder_type_specifier (cp_parser *parser, location_t loc,
if (!flag_concepts_ts
  || !processing_template_parmlist)
{
- error_at (loc, "%qE does not constrain a type", DECL_NAME (con));
- inform (DECL_SOURCE_LOCATION (con), "concept defined here");
+ if (!tentative)
+   {
+ error_at (loc, "%qE does not constrain a type", DECL_NAME (con));
+ inform (DECL_SOURCE_LOCATION (con), "concept defined here");
+   }


We probably want to call cp_parser_simulate_error in the tentative case?

I also wonder why this code uses a "tentative" parameter instead of 
checking cp_parser_parsing_tentatively itself.


Jason



Re: [PATCH] c++: unexpanded pack in var tmpl partial spec [PR100652]

2021-11-08 Thread Jason Merrill via Gcc-patches

On 11/8/21 09:45, Patrick Palka wrote:

Here we're not spotting a bare parameter pack appearing in the argument
list of a variable template partial specialization because we only look
for them within the decl's TREE_TYPE, which is sufficient for class
templates but not for variable templates.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?


OK.


PR c++/100652

gcc/cp/ChangeLog:

* pt.c (push_template_decl): Check for bare parameter packs in
the argument list of a variable template partial specialization.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/var-templ69.C: New test.
---
  gcc/cp/pt.c  | 17 -
  gcc/testsuite/g++.dg/cpp1y/var-templ69.C |  5 +
  2 files changed, 17 insertions(+), 5 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1y/var-templ69.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index b82b9cc3cd2..991a20a85d4 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -5877,12 +5877,19 @@ push_template_decl (tree decl, bool is_friend)
if (check_for_bare_parameter_packs (TYPE_RAISES_EXCEPTIONS (type)))
TYPE_RAISES_EXCEPTIONS (type) = NULL_TREE;
  }
-  else if (check_for_bare_parameter_packs (is_typedef_decl (decl)
-  ? DECL_ORIGINAL_TYPE (decl)
-  : TREE_TYPE (decl)))
+  else
  {
-  TREE_TYPE (decl) = error_mark_node;
-  return error_mark_node;
+  if (check_for_bare_parameter_packs (is_typedef_decl (decl)
+ ? DECL_ORIGINAL_TYPE (decl)
+ : TREE_TYPE (decl)))
+   {
+ TREE_TYPE (decl) = error_mark_node;
+ return error_mark_node;
+   }
+
+  if (is_partial && VAR_P (decl)
+ && check_for_bare_parameter_packs (DECL_TI_ARGS (decl)))
+   return error_mark_node;
  }
  
if (is_partial)

diff --git a/gcc/testsuite/g++.dg/cpp1y/var-templ69.C b/gcc/testsuite/g++.dg/cpp1y/var-templ69.C
new file mode 100644
index 000..420d617368c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/var-templ69.C
@@ -0,0 +1,5 @@
+// PR c++/100652
+// { dg-do compile { target c++14 } }
+
+template int var;
+template char var; // { dg-error "parameter packs not expanded" }





Re: [COMMITTED] path oracle: Do not look at root oracle for killed defs.

2021-11-08 Thread Andrew MacLeod via Gcc-patches

On 11/8/21 3:44 AM, Richard Biener via Gcc-patches wrote:

On Sat, Nov 6, 2021 at 4:38 PM Aldy Hernandez via Gcc-patches
 wrote:

[This is more Andrew's domain, but this is a P1 PR and I'd like to
unbreak the SPEC run, since this is a straigthforward fix.  When he
returns he can double check my work and give additional suggestions.]

The problem here is that we are incorrectly threading 41->20->21 here:

[local count: 56063504182]:
   _134 = M.10_120 + 1;
   if (_71 <= _134)
 goto ; [11.00%]
   else
 goto ; [89.00%]
...
...
...
[local count: 49896518755]:

[local count: 56063503181]:
   # lb_75 = PHI <_134(41), 1(18)>
   _117 = mstep_49 + lb_75;
   _118 = _117 + -1;
   _119 = mstep_49 + _118;
   M.10_120 = MIN_EXPR <_119, _71>;
   if (lb_75 > M.10_120)
 goto ; [11.00%]
   else
 goto ; [89.00%]

First, lb_75 == _134 because of the PHI.
Second, _134 > M.10_120 because of _134 = M.10_120 + 1.

We then assume that lb_75 > M.10_120, but this is incorrect because
M.10_120 was killed along the path.

Huh, since SSA has only a single definition it cannot be "killed".
What can happen is that if you look across backedges that the same
reg can have two different values.  Basically when you look across
backedges you have to discard all knowledge you derived from
stuff that's dominated by the backedge destination.

Yeah, it's just terminology... the path oracle follows a linear set of 
edges, so when we see the DEF of an ssa-name, any existing relations 
that have been seen have, by definition, come from a back edge at some 
point.  If the root oracle has that relation, as in this case, it came 
from a back edge.


"Killing" is referring to killing any relations in the path up to this 
point when we see the definition of the SSA_NAME. This includes anything 
in, or preceding, the path to this point.. We were removing any 
relations found in the path, but  were missing the "preceding" part.


Andrew




Re: [COMMITTED] path oracle: Do not look at root oracle for killed defs.

2021-11-08 Thread Andrew MacLeod via Gcc-patches

On 11/6/21 11:35 AM, Aldy Hernandez wrote:

[This is more Andrew's domain, but this is a P1 PR and I'd like to
unbreak the SPEC run, since this is a straigthforward fix.  When he
returns he can double check my work and give additional suggestions.]

The problem here is that we are incorrectly threading 41->20->21 here:

[local count: 56063504182]:
   _134 = M.10_120 + 1;
   if (_71 <= _134)
 goto ; [11.00%]
   else
 goto ; [89.00%]
...
...
...
[local count: 49896518755]:

[local count: 56063503181]:
   # lb_75 = PHI <_134(41), 1(18)>
   _117 = mstep_49 + lb_75;
   _118 = _117 + -1;
   _119 = mstep_49 + _118;
   M.10_120 = MIN_EXPR <_119, _71>;
   if (lb_75 > M.10_120)
 goto ; [11.00%]
   else
 goto ; [89.00%]

First, lb_75 == _134 because of the PHI.
Second, _134 > M.10_120 because of _134 = M.10_120 + 1.

We then assume that lb_75 > M.10_120, but this is incorrect because
M.10_120 was killed along the path.

This incorrect thread causes the miscompilation in 527.cam4_r.

Tested on x86-64 and ppc64le Linux.

Committed.

gcc/ChangeLog:

PR tree-optimization/103061
* value-relation.cc (path_oracle::path_oracle): Initialize
m_killed_defs.
(path_oracle::killing_def): Set m_killed_defs.
(path_oracle::query_relation): Do not look at the root oracle for
killed defs.
* value-relation.h (class path_oracle): Add m_killed_defs.
---
  gcc/value-relation.cc | 9 +
  gcc/value-relation.h  | 1 +
  2 files changed, 10 insertions(+)

diff --git a/gcc/value-relation.cc b/gcc/value-relation.cc
index f1e46d38de1..a0105481466 100644
--- a/gcc/value-relation.cc
+++ b/gcc/value-relation.cc
@@ -1235,6 +1235,7 @@ path_oracle::path_oracle (relation_oracle *oracle)
m_equiv.m_next = NULL;
m_relations.m_names = BITMAP_ALLOC (&m_bitmaps);
m_relations.m_head = NULL;
+  m_killed_defs = BITMAP_ALLOC (&m_bitmaps);
  }
  
  path_oracle::~path_oracle ()

@@ -1305,6 +1306,8 @@ path_oracle::killing_def (tree ssa)
  
unsigned v = SSA_NAME_VERSION (ssa);
  
+  bitmap_set_bit (m_killed_defs, v);

+
// Walk the equivalency list and remove SSA from any equivalencies.
if (bitmap_bit_p (m_equiv.m_names, v))
  {
@@ -1389,6 +1392,12 @@ path_oracle::query_relation (basic_block bb, const_bitmap b1, const_bitmap b2)
  
relation_kind k = m_relations.find_relation (b1, b2);
  
+  // Do not look at the root oracle for names that have been killed

+  // along the path.
+  if (bitmap_intersect_p (m_killed_defs, b1)
+  || bitmap_intersect_p (m_killed_defs, b2))
+return k;
+
if (k == VREL_NONE && m_root)
  k = m_root->query_relation (bb, b1, b2);
  
diff --git a/gcc/value-relation.h b/gcc/value-relation.h

index 97be3251144..8086f5552b5 100644
--- a/gcc/value-relation.h
+++ b/gcc/value-relation.h
@@ -233,6 +233,7 @@ private:
equiv_chain m_equiv;
relation_chain_head m_relations;
relation_oracle *m_root;
+  bitmap m_killed_defs;
  
bitmap_obstack m_bitmaps;

struct obstack m_chain_obstack;


Yeah, that is fine I think.

Ranger's oracle doesn't suffer from this issue since it is dominance 
based: we won't find any relation from a back edge because it won't 
dominate the definition.  The path oracle may start after a block 
containing a back edge, and so may inherit relations from that back edge.


The path oracle operates as an add-on to the normal oracle.  It simply 
maintains a list of relations that have been seen along the path, in the 
order it has seen them.  These are combined with any relations Ranger's 
oracle has active at the first block in the path.  That is why, when we 
see an SSA_NAME defined while walking the path, we may see relations 
already existing... and we need to NOT find any relation that existed 
before the definition.


Andrew




Re: [PING^2 PATCH v4 2/3] rs6000: Support SSE4.1 "round" intrinsics

2021-11-08 Thread Paul A. Clarke via Gcc-patches
On Tue, Oct 26, 2021 at 03:00:11PM -0500, Paul A. Clarke via Gcc-patches wrote:
> Patches 1/3 and 3/3 have been committed.
> This is only a ping for 2/3.

Gentle re-ping.

> On Mon, Oct 18, 2021 at 08:15:11PM -0500, Paul A. Clarke via Gcc-patches 
> wrote:
> > Suppress exceptions (when specified), by saving, manipulating, and
> > restoring the FPSCR.  Similarly, save, set, and restore the floating-point
> > rounding mode when required.
> > 
> > No attempt is made to optimize writing the FPSCR (by checking if the new
> > value would be the same), other than using lighter weight instructions
> > when possible. Note that explicit instruction scheduling "barriers" are
> > added to prevent floating-point computations from being moved before or
> > after the explicit FPSCR manipulations.  (That these are required has
> > been reported as an issue in GCC: PR102783.)
> > 
> > The scalar versions naively use the parallel versions to compute the
> > single scalar result and then construct the remainder of the result.
> > 
> > Of minor note, the values of _MM_FROUND_TO_NEG_INF and _MM_FROUND_TO_ZERO
> > are swapped from the corresponding values on x86 so as to match the
> > corresponding rounding mode values in the Power ISA.
> > 
> > Move implementations of _mm_ceil* and _mm_floor* into _mm_round*, and
> > convert _mm_ceil* and _mm_floor* into macros. This matches the current
> > analogous implementations in config/i386/smmintrin.h.
> > 
> > Function signatures match the analogous functions in 
> > config/i386/smmintrin.h.
> > 
> > Add tests for _mm_round_pd, _mm_round_ps, _mm_round_sd, _mm_round_ss,
> > modeled after the very similar "floor" and "ceil" tests.
> > 
> > Include basic tests, plus tests at the boundaries for floating-point
> > representation, positive and negative, test all of the parameterized
> > rounding modes as well as the C99 rounding modes and interactions
> > between the two.
> > 
> > Exceptions are not explicitly tested.
> > 
> > 2021-10-18  Paul A. Clarke  
> > 
> > gcc
> > * config/rs6000/smmintrin.h (_mm_round_pd, _mm_round_ps,
> > _mm_round_sd, _mm_round_ss, _MM_FROUND_TO_NEAREST_INT,
> > _MM_FROUND_TO_ZERO, _MM_FROUND_TO_POS_INF, _MM_FROUND_TO_NEG_INF,
> > _MM_FROUND_CUR_DIRECTION, _MM_FROUND_RAISE_EXC, _MM_FROUND_NO_EXC,
> > _MM_FROUND_NINT, _MM_FROUND_FLOOR, _MM_FROUND_CEIL, _MM_FROUND_TRUNC,
> > _MM_FROUND_RINT, _MM_FROUND_NEARBYINT): New.
> > * config/rs6000/smmintrin.h (_mm_ceil_pd, _mm_ceil_ps, _mm_ceil_sd,
> > _mm_ceil_ss, _mm_floor_pd, _mm_floor_ps, _mm_floor_sd, _mm_floor_ss):
> > Convert from function to macro.
> > 
> > gcc/testsuite
> > * gcc.target/powerpc/sse4_1-round3.h: New.
> > * gcc.target/powerpc/sse4_1-roundpd.c: New.
> > * gcc.target/powerpc/sse4_1-roundps.c: New.
> > * gcc.target/powerpc/sse4_1-roundsd.c: New.
> > * gcc.target/powerpc/sse4_1-roundss.c: New.
> > ---
> >  gcc/config/rs6000/smmintrin.h | 292 ++
> >  .../gcc.target/powerpc/sse4_1-round3.h|  81 +
> >  .../gcc.target/powerpc/sse4_1-roundpd.c   | 143 +
> >  .../gcc.target/powerpc/sse4_1-roundps.c   |  98 ++
> >  .../gcc.target/powerpc/sse4_1-roundsd.c   | 256 +++
> >  .../gcc.target/powerpc/sse4_1-roundss.c   | 208 +
> >  6 files changed, 1014 insertions(+), 64 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c
> > 
> > diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
> > index 90ce03d22709..6bb03e6e20ac 100644
> > --- a/gcc/config/rs6000/smmintrin.h
> > +++ b/gcc/config/rs6000/smmintrin.h
> > @@ -42,6 +42,234 @@
> >  #include 
> >  #include 
> >  
> > +/* Rounding mode macros. */
> > +#define _MM_FROUND_TO_NEAREST_INT   0x00
> > +#define _MM_FROUND_TO_ZERO  0x01
> > +#define _MM_FROUND_TO_POS_INF   0x02
> > +#define _MM_FROUND_TO_NEG_INF   0x03
> > +#define _MM_FROUND_CUR_DIRECTION0x04
> > +
> > +#define _MM_FROUND_NINT\
> > +  (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_RAISE_EXC)
> > +#define _MM_FROUND_FLOOR   \
> > +  (_MM_FROUND_TO_NEG_INF | _MM_FROUND_RAISE_EXC)
> > +#define _MM_FROUND_CEIL\
> > +  (_MM_FROUND_TO_POS_INF | _MM_FROUND_RAISE_EXC)
> > +#define _MM_FROUND_TRUNC   \
> > +  (_MM_FROUND_TO_ZERO | _MM_FROUND_RAISE_EXC)
> > +#define _MM_FROUND_RINT\
> > +  (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_RAISE_EXC)
> > +#define _MM_FROUND_NEARBYINT   \
> > +  (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_NO_EXC)
> > +
> > +#define _MM_FROUND_RAISE_EXC0x00
> > +#define _MM_FROUND_NO_EXC   0x08
> > +
> > +extern __

[PING PATCH] rs6000: Add Power10 optimization for _mm_blendv*

2021-11-08 Thread Paul A. Clarke via Gcc-patches
Gentle ping...

On Wed, Oct 20, 2021 at 08:42:07PM -0500, Paul A. Clarke via Gcc-patches wrote:
> Power10 ISA added `xxblendv*` instructions which are realized in the
> `vec_blendv` instrinsic.
> 
> Use `vec_blendv` for `_mm_blendv_epi8`, `_mm_blendv_ps`, and
> `_mm_blendv_pd` compatibility intrinsics, when `_ARCH_PWR10`.
> 
> Also, copy a test from i386 for testing `_mm_blendv_ps`.
> This should have come with commit ed04cf6d73e233c74c4e55c27f1cbd89ae4710e8,
> but was inadvertently omitted.
> 
> 2021-10-20  Paul A. Clarke  
> 
> gcc
>   * config/rs6000/smmintrin.h (_mm_blendv_epi8): Use vec_blendv
>   when _ARCH_PWR10.
>   (_mm_blendv_ps): Likewise.
>   (_mm_blendv_pd): Likewise.
> 
> gcc/testsuite
>   * gcc.target/powerpc/sse4_1-blendvps.c: Copy from gcc.target/i386,
>   adjust dg directives to suit.
> ---
> Tested on Power10 powerpc64le-linux (compiled with and without
> `-mcpu=power10`).
> 
> OK for trunk?
> 
>  gcc/config/rs6000/smmintrin.h | 12 
>  .../gcc.target/powerpc/sse4_1-blendvps.c  | 65 +++
>  2 files changed, 77 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendvps.c
> 
> diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
> index b732fbca7b09..5d87fd7b6f61 100644
> --- a/gcc/config/rs6000/smmintrin.h
> +++ b/gcc/config/rs6000/smmintrin.h
> @@ -113,9 +113,13 @@ _mm_blend_epi16 (__m128i __A, __m128i __B, const int 
> __imm8)
>  extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, 
> __artificial__))
>  _mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask)
>  {
> +#ifdef _ARCH_PWR10
> +  return (__m128i) vec_blendv ((__v16qu) __A, (__v16qu) __B, (__v16qu) 
> __mask);
> +#else
>const __v16qu __seven = vec_splats ((unsigned char) 0x07);
>__v16qu __lmask = vec_sra ((__v16qu) __mask, __seven);
>return (__m128i) vec_sel ((__v16qu) __A, (__v16qu) __B, __lmask);
> +#endif
>  }
>  
>  __inline __m128
> @@ -149,9 +153,13 @@ __inline __m128
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>  _mm_blendv_ps (__m128 __A, __m128 __B, __m128 __mask)
>  {
> +#ifdef _ARCH_PWR10
> +  return (__m128) vec_blendv ((__v4sf) __A, (__v4sf) __B, (__v4su) __mask);
> +#else
>const __v4si __zero = {0};
>const __vector __bool int __boolmask = vec_cmplt ((__v4si) __mask, __zero);
>return (__m128) vec_sel ((__v4su) __A, (__v4su) __B, (__v4su) __boolmask);
> +#endif
>  }
>  
>  __inline __m128d
> @@ -174,9 +182,13 @@ __inline __m128d
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>  _mm_blendv_pd (__m128d __A, __m128d __B, __m128d __mask)
>  {
> +#ifdef _ARCH_PWR10
> +  return (__m128d) vec_blendv ((__v2df) __A, (__v2df) __B, (__v2du) __mask);
> +#else
>const __v2di __zero = {0};
>const __vector __bool long long __boolmask = vec_cmplt ((__v2di) __mask, 
> __zero);
>return (__m128d) vec_sel ((__v2du) __A, (__v2du) __B, (__v2du) __boolmask);
> +#endif
>  }
>  #endif
>  
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-blendvps.c 
> b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendvps.c
> new file mode 100644
> index ..8fcb55383047
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendvps.c
> @@ -0,0 +1,65 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target p8vector_hw } */
> +/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
> +
> +#include "sse4_1-check.h"
> +
> +#include 
> +#include 
> +
> +#define NUM 20
> +
> +static void
> +init_blendvps (float *src1, float *src2, float *mask)
> +{
> +  int i, msk, sign = 1; 
> +
> +  msk = -1;
> +  for (i = 0; i < NUM * 4; i++)
> +{
> +  if((i % 4) == 0)
> + msk++;
> +  src1[i] = i* (i + 1) * sign;
> +  src2[i] = (i + 20) * sign;
> +  mask[i] = (i + 120) * i;
> +  if( (msk & (1 << (i % 4
> + mask[i] = -mask[i];
> +  sign = -sign;
> +}
> +}
> +
> +static int
> +check_blendvps (__m128 *dst, float *src1, float *src2,
> + float *mask)
> +{
> +  float tmp[4];
> +  int j;
> +
> +  memcpy (&tmp[0], src1, sizeof (tmp));
> +  for (j = 0; j < 4; j++)
> +if (mask [j] < 0.0)
> +  tmp[j] = src2[j];
> +
> +  return memcmp (dst, &tmp[0], sizeof (tmp));
> +}
> +
> +static void
> +sse4_1_test (void)
> +{
> +  union
> +{
> +  __m128 x[NUM];
> +  float f[NUM * 4];
> +} dst, src1, src2, mask;
> +  int i;
> +
> +  init_blendvps (src1.f, src2.f, mask.f);
> +
> +  for (i = 0; i < NUM; i++)
> +{
> +  dst.x[i] = _mm_blendv_ps (src1.x[i], src2.x[i], mask.x[i]);
> +  if (check_blendvps (&dst.x[i], &src1.f[i * 4], &src2.f[i * 4],
> +   &mask.f[i * 4]))
> + abort ();
> +}
> +}
> -- 
> 2.27.0
> 


[PING PATCH] rs6000: Add Power10 optimization for most _mm_movemask*

2021-11-08 Thread Paul A. Clarke via Gcc-patches
Gentle ping...

On Thu, Oct 21, 2021 at 12:22:12PM -0500, Paul A. Clarke via Gcc-patches wrote:
> Power10 ISA added `vextract*` instructions which are realized in the
> `vec_extractm` intrinsic.
> 
> Use `vec_extractm` for `_mm_movemask_ps`, `_mm_movemask_pd`, and
> `_mm_movemask_epi8` compatibility intrinsics, when `_ARCH_PWR10`.
> 
> 2021-10-21  Paul A. Clarke  
> 
> gcc
>   * config/rs6000/xmmintrin.h (_mm_movemask_ps): Use vec_extractm
>   when _ARCH_PWR10.
>   * config/rs6000/emmintrin.h (_mm_movemask_pd): Likewise.
>   (_mm_movemask_epi8): Likewise.
> ---
> Tested on Power10 powerpc64le-linux (compiled with and without
> `-mcpu=power10`).
> 
> OK for trunk?
> 
>  gcc/config/rs6000/emmintrin.h | 8 
>  gcc/config/rs6000/xmmintrin.h | 4 
>  2 files changed, 12 insertions(+)
> 
> diff --git a/gcc/config/rs6000/emmintrin.h b/gcc/config/rs6000/emmintrin.h
> index 32ad72b4cc35..ab16c13c379e 100644
> --- a/gcc/config/rs6000/emmintrin.h
> +++ b/gcc/config/rs6000/emmintrin.h
> @@ -1233,6 +1233,9 @@ _mm_loadl_pd (__m128d __A, double const *__B)
>  extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
> __artificial__))
>  _mm_movemask_pd (__m128d  __A)
>  {
> +#ifdef _ARCH_PWR10
> +  return vec_extractm ((__v2du) __A);
> +#else
>__vector unsigned long long result;
>static const __vector unsigned int perm_mask =
>  {
> @@ -1252,6 +1255,7 @@ _mm_movemask_pd (__m128d  __A)
>  #else
>return result[0];
>  #endif
> +#endif /* !_ARCH_PWR10 */
>  }
>  #endif /* _ARCH_PWR8 */
>  
> @@ -2030,6 +2034,9 @@ _mm_min_epu8 (__m128i __A, __m128i __B)
>  extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
> __artificial__))
>  _mm_movemask_epi8 (__m128i __A)
>  {
> +#ifdef _ARCH_PWR10
> +  return vec_extractm ((__v16qu) __A);
> +#else
>__vector unsigned long long result;
>static const __vector unsigned char perm_mask =
>  {
> @@ -2046,6 +2053,7 @@ _mm_movemask_epi8 (__m128i __A)
>  #else
>return result[0];
>  #endif
> +#endif /* !_ARCH_PWR10 */
>  }
>  #endif /* _ARCH_PWR8 */
>  
> diff --git a/gcc/config/rs6000/xmmintrin.h b/gcc/config/rs6000/xmmintrin.h
> index ae1a33e8d95b..4c093fd1d5ae 100644
> --- a/gcc/config/rs6000/xmmintrin.h
> +++ b/gcc/config/rs6000/xmmintrin.h
> @@ -1352,6 +1352,9 @@ _mm_storel_pi (__m64 *__P, __m128 __A)
>  extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
> __artificial__))
>  _mm_movemask_ps (__m128  __A)
>  {
> +#ifdef _ARCH_PWR10
> +  return vec_extractm ((vector unsigned int) __A);
> +#else
>__vector unsigned long long result;
>static const __vector unsigned int perm_mask =
>  {
> @@ -1371,6 +1374,7 @@ _mm_movemask_ps (__m128  __A)
>  #else
>return result[0];
>  #endif
> +#endif /* !_ARCH_PWR10 */
>  }
>  #endif /* _ARCH_PWR8 */
>  
> -- 
> 2.27.0
> 


[PING PATCH] rs6000: Add optimizations for _mm_sad_epu8

2021-11-08 Thread Paul A. Clarke via Gcc-patches
Gentle ping...

On Fri, Oct 22, 2021 at 12:28:49PM -0500, Paul A. Clarke via Gcc-patches wrote:
> Power9 ISA added `vabsdub` instruction which is realized in the
> `vec_absd` instrinsic.
> 
> Use `vec_absd` for `_mm_sad_epu8` compatibility intrinsic, when
> `_ARCH_PWR9`.
> 
> Also, the realization of `vec_sum2s` on little-endian includes
> two shifts in order to position the input and output to match
> the semantics of `vec_sum2s`:
> - Shift the second input vector left 12 bytes. In the current usage,
>   that vector is `{0}`, so this shift is unnecessary, but is currently
>   not eliminated under optimization.
> - Shift the vector produced by the `vsum2sws` instruction left 4 bytes.
>   The two words within each doubleword of this (shifted) result must then
>   be explicitly swapped to match the semantics of `_mm_sad_epu8`,
>   effectively reversing this shift.  So, this shift (and a subsequent swap)
>   are unnecessary, but not currently removed under optimization.
> 
> Using `__builtin_altivec_vsum2sws` retains both shifts, so is not an
> option for removing the shifts.
> 
> For little-endian, use the `vsum2sws` instruction directly, and
> eliminate the explicit shift (swap).
> 
> 2021-10-22  Paul A. Clarke  
> 
> gcc
>   * config/rs6000/emmintrin.h (_mm_sad_epu8): Use vec_absd
>   when _ARCH_PWR9, optimize vec_sum2s when LE.
> ---
> Tested on powerpc64le-linux on Power9, with and without `-mcpu=power9`,
> and on powerpc/powerpc64-linux on Power8.
> 
> OK for trunk?
> 
>  gcc/config/rs6000/emmintrin.h | 24 +---
>  1 file changed, 17 insertions(+), 7 deletions(-)
> 
> diff --git a/gcc/config/rs6000/emmintrin.h b/gcc/config/rs6000/emmintrin.h
> index ab16c13c379e..c4758be0e777 100644
> --- a/gcc/config/rs6000/emmintrin.h
> +++ b/gcc/config/rs6000/emmintrin.h
> @@ -2197,27 +2197,37 @@ extern __inline __m128i 
> __attribute__((__gnu_inline__, __always_inline__, __arti
>  _mm_sad_epu8 (__m128i __A, __m128i __B)
>  {
>__v16qu a, b;
> -  __v16qu vmin, vmax, vabsdiff;
> +  __v16qu vabsdiff;
>__v4si vsum;
>const __v4su zero = { 0, 0, 0, 0 };
>__v4si result;
>  
>a = (__v16qu) __A;
>b = (__v16qu) __B;
> -  vmin = vec_min (a, b);
> -  vmax = vec_max (a, b);
> +#ifndef _ARCH_PWR9
> +  __v16qu vmin = vec_min (a, b);
> +  __v16qu vmax = vec_max (a, b);
>vabsdiff = vec_sub (vmax, vmin);
> +#else
> +  vabsdiff = vec_absd (a, b);
> +#endif
>/* Sum four groups of bytes into integers.  */
>vsum = (__vector signed int) vec_sum4s (vabsdiff, zero);
> +#ifdef __LITTLE_ENDIAN__
> +  /* Sum across four integers with two integer results.  */
> +  asm ("vsum2sws %0,%1,%2" : "=v" (result) : "v" (vsum), "v" (zero));
> +  /* Note: vec_sum2s could be used here, but on little-endian, vector
> + shifts are added that are not needed for this use-case.
> + A vector shift to correctly position the 32-bit integer results
> + (currently at [0] and [2]) to [1] and [3] would then need to be
> + swapped back again since the desired results are two 64-bit
> + integers ([1]|[0] and [3]|[2]).  Thus, no shift is performed.  */
> +#else
>/* Sum across four integers with two integer results.  */
>result = vec_sum2s (vsum, (__vector signed int) zero);
>/* Rotate the sums into the correct position.  */
> -#ifdef __LITTLE_ENDIAN__
> -  result = vec_sld (result, result, 4);
> -#else
>result = vec_sld (result, result, 6);
>  #endif
> -  /* Rotate the sums into the correct position.  */
>return (__m128i) result;
>  }
>  
> -- 
> 2.27.0
> 


Move uncprop after modref pass

2021-11-08 Thread Jan Hubicka via Gcc-patches
Hi,
this patch moves uncprop after the modref and pure/const passes and adds a
comment that this pass should always be last since it is only supposed to help
PHI lowering.  The pass replaces constants by SSA names that are known to be
constant at that place, which hardly helps other passes.

Modref now makes it easy to compare IPA solutions with local solutions done
at compile time.  The local solutions should be monotonically better; ideally
they should be the same, showing that the IPA machinery can do all we can do
locally.

Neither is quite true. Building cc1plus we get
 - 1075 parameters whose EAF flags detected late are worse than those from IPA
   (1384 before the patch)
 - 5943 parameters whose EAF flags detected late are better than those from IPA
   (5766 before the patch)
 - 367 parameters whose EAF flags are changed (some better some worse)
   (375 before the patch)

Out of about 30k params tracked in 32k functions.
So not optimal, but the situation is still noticeably better than before the
patch, changing from

Alias oracle query stats:
  refs_may_alias_p: 76514746 disambiguations, 101144061 queries
  ref_maybe_used_by_call_p: 642091 disambiguations, 77522063 queries
  call_may_clobber_ref_p: 387354 disambiguations, 390389 queries
  nonoverlapping_component_refs_p: 0 disambiguations, 26150 queries
  nonoverlapping_refs_since_match_p: 30138 disambiguations, 65278 must 
overlaps, 96375 queries
  aliasing_component_refs_p: 57707 disambiguations, 15412274 queries
  TBAA oracle: 28146643 disambiguations 104216840 queries
   14970479 are in alias set 0
   8940210 queries asked about the same object
   117 queries asked about the same alias set
   0 access volatile
   50245719 are dependent in the DAG
   1913672 are aritificially in conflict with void *

Modref stats:
  modref use: 25185 disambiguations, 697431 queries
  modref clobber: 271 disambiguations, 22334828 queries
  5347614 tbaa queries (0.239429 per modref query)
  759061 base compares (0.033986 per modref query)

PTA query stats:
  pt_solution_includes: 13361936 disambiguations, 4036 queries
  pt_solutions_intersect: 1589896 disambiguations, 13702105 queries

to

Alias oracle query stats:
  refs_may_alias_p: 76706288 disambiguations, 101289627 queries
  ref_maybe_used_by_call_p: 647660 disambiguations, 77711837 queries
  call_may_clobber_ref_p: 388155 disambiguations, 391104 queries
  nonoverlapping_component_refs_p: 0 disambiguations, 26150 queries
  nonoverlapping_refs_since_match_p: 30138 disambiguations, 65170 must 
overlaps, 96267 queries
  aliasing_component_refs_p: 57149 disambiguations, 15405496 queries
  TBAA oracle: 28122633 disambiguations 104205741 queries
   14987347 are in alias set 0
   8944156 queries asked about the same object
   99 queries asked about the same alias set
   0 access volatile
   50238319 are dependent in the DAG
   1913187 are aritificially in conflict with void *

Modref stats:
  modref use: 25273 disambiguations, 701571 queries
  modref clobber: 2337545 disambiguations, 22431672 queries
  5357026 tbaa queries (0.238815 per modref query)
  762911 base compares (0.034010 per modref query)

PTA query stats:
  pt_solution_includes: 13467699 disambiguations, 40734635 queries
  pt_solutions_intersect: 1681618 disambiguations, 13751306 queries

So we got 6% better on pt_solutions_intersect
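The "6%" can be checked directly from the pt_solutions_intersect disambiguation counts quoted in the two statistics blocks above:

```python
# pt_solutions_intersect disambiguation counts quoted above,
# before and after moving uncprop.
before = 1589896
after = 1681618

# Relative improvement in the number of disambiguations.
improvement = (after - before) / before
print(f"{improvement:.1%}")  # ~5.8%, i.e. roughly "6% better"
```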

Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog:

PR tree-opt/103177
* passes.def: Move uncprop after pure/const and modref.

diff --git a/gcc/passes.def b/gcc/passes.def
index 0f541454e7f..56dab80a029 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -360,9 +360,11 @@ along with GCC; see the file COPYING3.  If not see
  number of false positives from it.  */
   NEXT_PASS (pass_split_crit_edges);
   NEXT_PASS (pass_late_warn_uninitialized);
-  NEXT_PASS (pass_uncprop);
   NEXT_PASS (pass_local_pure_const);
   NEXT_PASS (pass_modref);
+  /* uncprop replaces constants by SSA names.  This makes analysis harder
+and thus it should be run last.  */
+  NEXT_PASS (pass_uncprop);
   POP_INSERT_PASSES ()
   NEXT_PASS (pass_all_optimizations_g);
   PUSH_INSERT_PASSES_WITHIN (pass_all_optimizations_g)
@@ -393,9 +395,11 @@ along with GCC; see the file COPYING3.  If not see
  number of false positives from it.  */
   NEXT_PASS (pass_split_crit_edges);
   NEXT_PASS (pass_late_warn_uninitialized);
-  NEXT_PASS (pass_uncprop);
   NEXT_PASS (pass_local_pure_const);
   NEXT_PASS (pass_modref);
+  /* uncprop replaces constants by SSA names.  This makes analysis harder
+and thus it should be run last.  */
+  NEXT_PASS (pass_uncprop);
   POP_INSERT_PASSES ()
   NEXT_PASS (pass_tm_init);
   PUSH_INSERT_PASSES_WITHIN (pass_tm_init)


[COMMITTED] tree-optimization/103122 - Don't calculate new values when using the private context callback.

2021-11-08 Thread Andrew MacLeod via Gcc-patches
When using the private communication method I introduced for fold_stmt 
to acquire context, we are supposed to be using the cache in read-only 
mode.  I didn't query it in that mode, so it was accidentally triggering 
another lookup before finishing the original query, and that triggered 
the safety trap.  This patch makes sure we make the request in read-only 
mode.


Bootstraps with no regressions on x86_64-pc-linux-gnu.  Pushed.

Andrew

commit 0cd653bd2559701da9cc4c9bf51f22bdd68623b5
Author: Andrew MacLeod 
Date:   Mon Nov 8 09:32:42 2021 -0500

Don't calculate new values when using the private context callback.

When using ranger's private callback mechanism to provide context
to fold_stmt calls, we are only supposed to be using the cache in
read-only mode, never calculating new values.

gcc/
PR tree-optimization/103122
* gimple-range.cc (gimple_ranger::range_of_expr): Request the cache
entry with "calculate new values" set to false.

gcc/testsuite/
* g++.dg/pr103122.C: New.

diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index e1177b1c5e8..87dba6e81d8 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -88,8 +88,8 @@ gimple_ranger::range_of_expr (irange &r, tree expr, gimple *stmt)
   if (!m_cache.get_global_range (r, expr))
 r = gimple_range_global (expr);
   // Pick up implied context information from the on-entry cache
-  // if current_bb is set.
-  if (current_bb && m_cache.block_range (tmp, current_bb, expr))
+  // if current_bb is set.  Do not attempt any new calculations.
+  if (current_bb && m_cache.block_range (tmp, current_bb, expr, false))
 	{
 	  r.intersect (tmp);
 	  char str[80];
diff --git a/gcc/testsuite/g++.dg/pr103122.C b/gcc/testsuite/g++.dg/pr103122.C
new file mode 100644
index 000..3465eade46b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr103122.C
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+unsigned a;
+int b;
+short c;
+void d(long) {
+  for (bool e = (bool)c - 1; e < (bool)b - 1; e += 0)
+;
+  if (a) {
+for (char f = 0; f < 7; f = 7)
+  for (int g = 0; g < c; g += 10)
+;
+d(-!c);
+  }
+}


[PATCH] Introduce build_debug_expr_decl

2021-11-08 Thread Martin Jambor
Hi,

this patch introduces a helper function build_debug_expr_decl to build
DEBUG_EXPR_DECL tree nodes in the most common way and replaces with a
call of this function all code pieces which build such a DECL itself
and sets its mode to the TYPE_MODE of its type.

There still remain 11 instances of open-coded creation of a
DEBUG_EXPR_DECL which set the mode of the DECL to something else.  It
would probably be a good idea to figure out that has any effect and if
not, convert them to calls of build_debug_expr_decl too.  But this
patch deliberately does not introduce any functional changes.

Bootstrapped and tested on x86_64-linux, OK for trunk?

Thanks,

Martin


gcc/ChangeLog:

2021-11-08  Martin Jambor  

* tree.h (build_debug_expr_decl): Declare.
* tree.c (build_debug_expr_decl): New function.
* cfgexpand.c (avoid_deep_ter_for_debug): Use build_debug_expr_decl
instead of building a DEBUG_EXPR_DECL.
* ipa-param-manipulation.c
(ipa_param_body_adjustments::prepare_debug_expressions): Likewise.
* omp-simd-clone.c (ipa_simd_modify_stmt_ops): Likewise.
* tree-ssa-ccp.c (optimize_atomic_bit_test_and): Likewise.
* tree-ssa-phiopt.c (spaceship_replacement): Likewise.
* tree-ssa-reassoc.c (make_new_ssa_for_def): Likewise.
---
 gcc/cfgexpand.c  |  5 +
 gcc/ipa-param-manipulation.c |  5 +
 gcc/omp-simd-clone.c |  5 +
 gcc/tree-ssa-ccp.c   |  5 +
 gcc/tree-ssa-phiopt.c| 10 ++
 gcc/tree-ssa-reassoc.c   |  5 +
 gcc/tree.c   | 12 
 gcc/tree.h   |  1 +
 8 files changed, 20 insertions(+), 28 deletions(-)

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 01d0cdc548a..55ff75bd78e 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -4341,11 +4341,8 @@ avoid_deep_ter_for_debug (gimple *stmt, int depth)
  tree &vexpr = deep_ter_debug_map->get_or_insert (use);
  if (vexpr != NULL)
continue;
- vexpr = make_node (DEBUG_EXPR_DECL);
+ vexpr = build_debug_expr_decl (TREE_TYPE (use));
  gimple *def_temp = gimple_build_debug_bind (vexpr, use, g);
- DECL_ARTIFICIAL (vexpr) = 1;
- TREE_TYPE (vexpr) = TREE_TYPE (use);
- SET_DECL_MODE (vexpr, TYPE_MODE (TREE_TYPE (use)));
  gimple_stmt_iterator gsi = gsi_for_stmt (g);
  gsi_insert_after (&gsi, def_temp, GSI_NEW_STMT);
  avoid_deep_ter_for_debug (def_temp, 0);
diff --git a/gcc/ipa-param-manipulation.c b/gcc/ipa-param-manipulation.c
index 4610fc4ac03..ae3149718ca 100644
--- a/gcc/ipa-param-manipulation.c
+++ b/gcc/ipa-param-manipulation.c
@@ -1200,10 +1200,7 @@ ipa_param_body_adjustments::prepare_debug_expressions 
(tree dead_ssa)
= unshare_expr_without_location (gimple_assign_rhs_to_tree (def));
   remap_with_debug_expressions (&val);
 
-  tree vexpr = make_node (DEBUG_EXPR_DECL);
-  DECL_ARTIFICIAL (vexpr) = 1;
-  TREE_TYPE (vexpr) = TREE_TYPE (val);
-  SET_DECL_MODE (vexpr, TYPE_MODE (TREE_TYPE (val)));
+  tree vexpr = build_debug_expr_decl (TREE_TYPE (val));
   m_dead_stmt_debug_equiv.put (def, val);
   m_dead_ssa_debug_equiv.put (dead_ssa, vexpr);
   return true;
diff --git a/gcc/omp-simd-clone.c b/gcc/omp-simd-clone.c
index b772b7ff520..4d43a86669a 100644
--- a/gcc/omp-simd-clone.c
+++ b/gcc/omp-simd-clone.c
@@ -910,11 +910,8 @@ ipa_simd_modify_stmt_ops (tree *tp, int *walk_subtrees, 
void *data)
   gimple *stmt;
   if (is_gimple_debug (info->stmt))
{
- tree vexpr = make_node (DEBUG_EXPR_DECL);
+ tree vexpr = build_debug_expr_decl (TREE_TYPE (repl));
  stmt = gimple_build_debug_source_bind (vexpr, repl, NULL);
- DECL_ARTIFICIAL (vexpr) = 1;
- TREE_TYPE (vexpr) = TREE_TYPE (repl);
- SET_DECL_MODE (vexpr, TYPE_MODE (TREE_TYPE (repl)));
  repl = vexpr;
}
   else
diff --git a/gcc/tree-ssa-ccp.c b/gcc/tree-ssa-ccp.c
index 70ce6a4d5b8..60ae5e6601f 100644
--- a/gcc/tree-ssa-ccp.c
+++ b/gcc/tree-ssa-ccp.c
@@ -3452,10 +3452,7 @@ optimize_atomic_bit_test_and (gimple_stmt_iterator *gsip,
   tree temp = NULL_TREE;
   if (!throws || after || single_pred_p (e->dest))
{
- temp = make_node (DEBUG_EXPR_DECL);
- DECL_ARTIFICIAL (temp) = 1;
- TREE_TYPE (temp) = TREE_TYPE (lhs);
- SET_DECL_MODE (temp, TYPE_MODE (TREE_TYPE (lhs)));
+ temp = build_debug_expr_decl (TREE_TYPE (lhs));
  tree t = build2 (LSHIFT_EXPR, TREE_TYPE (lhs), new_lhs, bit);
  g = gimple_build_debug_bind (temp, t, g);
  if (throws && !after)
diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
index 0e339c46afa..173ac835ca6 100644
--- a/gcc/tree-ssa-phiopt.c
+++ b/gcc/tree-ssa-phiopt.c
@@ -2429,19 +2429,13 @@ spaceship_replacement (basic_block cond_bb, basic_block 
middle_bb,
 all floating point numbers should be comparable.

Revert workaround allowing interposition on nested functions

2021-11-08 Thread Jan Hubicka via Gcc-patches
Hi,
the workaround seems to be no longer necessary - it seems that all the
issues were isolated to wrong behaviour of can_be_interposed wrt
partitioned functions.

Honza

* gimple.c (gimple_call_static_chain_flags): Revert the workaround
allowing interposition since issues with binds_to_local_def were
hopefully solved.
diff --git a/gcc/gimple.c b/gcc/gimple.c
index 3d1d3a15b2c..9e65fa61c73 100644
--- a/gcc/gimple.c
+++ b/gcc/gimple.c
@@ -1645,13 +1645,13 @@ gimple_call_static_chain_flags (const gcall *stmt)
   modref_summary *summary = node ? get_modref_function_summary (node)
: NULL;
 
+  /* Nested functions should always bind to current def since
+there is no public ABI for them.  */
+  gcc_checking_assert (node->binds_to_current_def_p ());
   if (summary)
{
  int modref_flags = summary->static_chain_flags;
 
- /* ??? Nested functions should always bind to current def.  */
- if (!node->binds_to_current_def_p ())
-   modref_flags = interposable_eaf_flags (modref_flags, flags);
  if (dbg_cnt (ipa_mod_ref_pta))
flags |= modref_flags;
}


Re: [PATCH v4] Fix ICE when mixing VLAs and statement expressions [PR91038]

2021-11-08 Thread Uecker, Martin
Am Montag, den 08.11.2021, 12:13 -0500 schrieb Jason Merrill:
> On 11/7/21 01:40, Uecker, Martin wrote:
> > Am Mittwoch, den 03.11.2021, 10:18 -0400 schrieb Jason Merrill:

...

> > 
> > Thank you! I made these changes and ran
> > bootstrap and tests again.
> 
> Hmm, it doesn't look like you made the change to use the save_expr 
> function instead of build1?

Oh, sorry. I wanted to change it and then forgot.
Now also with this change (changelog as before).

> > Ok for trunk?
> > 
> > 
> > Any idea how to fix returning structs with
> > VLA member from statement expressions?
> 
> Testcase?

void foo(void)
{
  ({ int N = 3; struct { char x[N]; } x; x; });
}

The difference from the tests in this patch (which I
also forgot to include in the last version) is that
the object of variable size is returned from the
statement expression and not a pointer to it.
This cannot happen with arrays because they decay
to pointers.


Martin


> > Otherwise, I will add an error message to
> > the FE in another patch.
> > 
> > Martin
> > 

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 436df45df68..95083f95442 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -3306,7 +3306,19 @@ pointer_int_sum (location_t loc, enum tree_code 
resultcode,
 TREE_TYPE (result_type)))
 size_exp = integer_one_node;
   else
-size_exp = size_in_bytes_loc (loc, TREE_TYPE (result_type));
+{
+  size_exp = size_in_bytes_loc (loc, TREE_TYPE (result_type));
+  /* Wrap the pointer expression in a SAVE_EXPR to make sure it
+is evaluated first when the size expression may depend
+on it for VM types.  */
+  if (TREE_SIDE_EFFECTS (size_exp)
+ && TREE_SIDE_EFFECTS (ptrop)
+ && variably_modified_type_p (TREE_TYPE (ptrop), NULL))
+   {
+ ptrop = save_expr (ptrop);
+ size_exp = build2 (COMPOUND_EXPR, TREE_TYPE (intop), ptrop, size_exp);
+   }
+}
 
   /* We are manipulating pointer values, so we don't need to warn
  about relying on undefined signed overflow.  We disable the
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index c2ab96e7e18..84f7dc3c248 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -2964,7 +2964,9 @@ gimplify_var_or_parm_decl (tree *expr_p)
  declaration, for which we've already issued an error.  It would
  be really nice if the front end wouldn't leak these at all.
  Currently the only known culprit is C++ destructors, as seen
- in g++.old-deja/g++.jason/binding.C.  */
+ in g++.old-deja/g++.jason/binding.C.
+ Another possible culpit are size expressions for variably modified
+ types which are lost in the FE or not gimplified correctly.  */
   if (VAR_P (decl)
   && !DECL_SEEN_IN_BIND_EXPR_P (decl)
   && !TREE_STATIC (decl) && !DECL_EXTERNAL (decl)
@@ -3109,16 +3111,22 @@ gimplify_compound_lval (tree *expr_p, gimple_seq 
*pre_p, gimple_seq *post_p,
  expression until we deal with any variable bounds, sizes, or
  positions in order to deal with PLACEHOLDER_EXPRs.
 
- So we do this in three steps.  First we deal with the annotations
- for any variables in the components, then we gimplify the base,
- then we gimplify any indices, from left to right.  */
+ The base expression may contain a statement expression that
+ has declarations used in size expressions, so has to be
+ gimplified before gimplifying the size expressions.
+
+ So we do this in three steps.  First we deal with variable
+ bounds, sizes, and positions, then we gimplify the base,
+ then we deal with the annotations for any variables in the
+ components and any indices, from left to right.  */
+
   for (i = expr_stack.length () - 1; i >= 0; i--)
 {
   tree t = expr_stack[i];
 
   if (TREE_CODE (t) == ARRAY_REF || TREE_CODE (t) == ARRAY_RANGE_REF)
{
- /* Gimplify the low bound and element type size and put them into
+ /* Deal with the low bound and element type size and put them into
 the ARRAY_REF.  If these values are set, they have already been
 gimplified.  */
  if (TREE_OPERAND (t, 2) == NULL_TREE)
@@ -3127,18 +3135,8 @@ gimplify_compound_lval (tree *expr_p, gimple_seq *pre_p, 
gimple_seq *post_p,
  if (!is_gimple_min_invariant (low))
{
  TREE_OPERAND (t, 2) = low;
- tret = gimplify_expr (&TREE_OPERAND (t, 2), pre_p,
-   post_p, is_gimple_reg,
-   fb_rvalue);
- ret = MIN (ret, tret);
}
}
- else
-   {
- tret = gimplify_expr (&TREE_OPERAND (t, 2), pre_p, post_p,
-   is_gimple_reg, fb_rvalue);
- ret = MIN (ret, tret);
-   }
 
  if (TREE_OPERAND (t, 3) == NULL_TREE)
{
@@ -3155,18 +3153,8 @@ gimplify_compoun

[PATCH] arm: Initialize vector costing fields

2021-11-08 Thread Christophe Lyon via Gcc-patches
The movi, dup and extract costing fields were recently added to struct
vector_cost_table, but their initialization is missing from the arm
(aarch32) specific descriptions.

Although the arm port does not use these fields (only aarch64 does),
this is causing warnings during the build, and even build failures
when using gcc-4.8.5 as host compiler:

/gccsrc/gcc/config/arm/arm.c:1194:1: error: uninitialized const member 
'vector_cost_table::movi'
 };
  ^
/gccsrc/gcc/config/arm/arm.c:1194:1: warning: missing initializer for member 
'vector_cost_table::movi' [-Wmissing-field-initializers]
/gccsrc/gcc/config/arm/arm.c:1194:1: error: uninitialized const member 
'vector_cost_table::dup'
/gccsrc/gcc/config/arm/arm.c:1194:1: warning: missing initializer for member 
'vector_cost_table::dup' [-Wmissing-field-initializers]
/gccsrc/gcc/config/arm/arm.c:1194:1: error: uninitialized const member 
'vector_cost_table::extract'
/gccsrc/gcc/config/arm/arm.c:1194:1: warning: missing initializer for member 
'vector_cost_table::extract' [-Wmissing-field-initializers]

This patch uses the same initialization values as in aarch64 for
consistency:
+COSTS_N_INSNS (1),  /* movi.  */
+COSTS_N_INSNS (2),  /* dup.  */
+COSTS_N_INSNS (2)   /* extract.  */

But given these fields are not used, maybe a dummy value should be
used instead? (zero?)

2021-11-08  Christophe Lyon  

gcc/
* config/arm/arm.c (cortexa9_extra_costs, cortexa8_extra_costs,
cortexa5_extra_costs, cortexa7_extra_costs,
cortexa12_extra_costs, cortexa15_extra_costs, v7m_extra_costs):
Initialize movi, dup and extract costing fields.
---
 gcc/config/arm/arm.c | 35 ---
 1 file changed, 28 insertions(+), 7 deletions(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 6c6e77fab66..3f5e1162853 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1197,7 +1197,10 @@ const struct cpu_cost_table cortexa9_extra_costs =
   /* Vector */
   {
 COSTS_N_INSNS (1), /* alu.  */
-COSTS_N_INSNS (4)  /* mult.  */
+COSTS_N_INSNS (4), /* mult.  */
+COSTS_N_INSNS (1), /* movi.  */
+COSTS_N_INSNS (2), /* dup.  */
+COSTS_N_INSNS (2)  /* extract.  */
   }
 };
 
@@ -1301,7 +1304,10 @@ const struct cpu_cost_table cortexa8_extra_costs =
   /* Vector */
   {
 COSTS_N_INSNS (1), /* alu.  */
-COSTS_N_INSNS (4)  /* mult.  */
+COSTS_N_INSNS (4), /* mult.  */
+COSTS_N_INSNS (1), /* movi.  */
+COSTS_N_INSNS (2), /* dup.  */
+COSTS_N_INSNS (2)  /* extract.  */
   }
 };
 
@@ -1406,7 +1412,10 @@ const struct cpu_cost_table cortexa5_extra_costs =
   /* Vector */
   {
 COSTS_N_INSNS (1), /* alu.  */
-COSTS_N_INSNS (4)  /* mult.  */
+COSTS_N_INSNS (4), /* mult.  */
+COSTS_N_INSNS (1), /* movi.  */
+COSTS_N_INSNS (2), /* dup.  */
+COSTS_N_INSNS (2)  /* extract.  */
   }
 };
 
@@ -1512,7 +1521,10 @@ const struct cpu_cost_table cortexa7_extra_costs =
   /* Vector */
   {
 COSTS_N_INSNS (1), /* alu.  */
-COSTS_N_INSNS (4)  /* mult.  */
+COSTS_N_INSNS (4), /* mult.  */
+COSTS_N_INSNS (1), /* movi.  */
+COSTS_N_INSNS (2), /* dup.  */
+COSTS_N_INSNS (2)  /* extract.  */
   }
 };
 
@@ -1616,7 +1628,10 @@ const struct cpu_cost_table cortexa12_extra_costs =
   /* Vector */
   {
 COSTS_N_INSNS (1), /* alu.  */
-COSTS_N_INSNS (4)  /* mult.  */
+COSTS_N_INSNS (4), /* mult.  */
+COSTS_N_INSNS (1), /* movi.  */
+COSTS_N_INSNS (2), /* dup.  */
+COSTS_N_INSNS (2)  /* extract.  */
   }
 };
 
@@ -1720,7 +1735,10 @@ const struct cpu_cost_table cortexa15_extra_costs =
   /* Vector */
   {
 COSTS_N_INSNS (1), /* alu.  */
-COSTS_N_INSNS (4)  /* mult.  */
+COSTS_N_INSNS (4), /* mult.  */
+COSTS_N_INSNS (1), /* movi.  */
+COSTS_N_INSNS (2), /* dup.  */
+COSTS_N_INSNS (2)  /* extract.  */
   }
 };
 
@@ -1824,7 +1842,10 @@ const struct cpu_cost_table v7m_extra_costs =
   /* Vector */
   {
 COSTS_N_INSNS (1), /* alu.  */
-COSTS_N_INSNS (4)  /* mult.  */
+COSTS_N_INSNS (4), /* mult.  */
+COSTS_N_INSNS (1), /* movi.  */
+COSTS_N_INSNS (2), /* dup.  */
+COSTS_N_INSNS (2)  /* extract.  */
   }
 };
 
-- 
2.25.1



Re: [PATCH] Introduce build_debug_expr_decl

2021-11-08 Thread Prathamesh Kulkarni via Gcc-patches
On Mon, 8 Nov 2021 at 23:24, Martin Jambor  wrote:
>
> Hi,
>
> this patch introduces a helper function build_debug_expr_decl to build
> DEBUG_EXPR_DECL tree nodes in the most common way and replaces with a
> call of this function all code pieces which build such a DECL itself
> and sets its mode to the TYPE_MODE of its type.
>
> There still remain 11 instances of open-coded creation of a
> DEBUG_EXPR_DECL which set the mode of the DECL to something else.  It
> would probably be a good idea to figure out whether that has any effect and if
> not, convert them to calls of build_debug_expr_decl too.  But this
> patch deliberately does not introduce any functional changes.
>
> Bootstrapped and tested on x86_64-linux, OK for trunk?
>
> Thanks,
>
> Martin
>
>
> gcc/ChangeLog:
>
> 2021-11-08  Martin Jambor  
>
> * tree.h (build_debug_expr_decl): Declare.
> * tree.c (build_debug_expr_decl): New function.
> * cfgexpand.c (avoid_deep_ter_for_debug): Use build_debug_expr_decl
> instead of building a DEBUG_EXPR_DECL.
> * ipa-param-manipulation.c
> (ipa_param_body_adjustments::prepare_debug_expressions): Likewise.
> * omp-simd-clone.c (ipa_simd_modify_stmt_ops): Likewise.
> * tree-ssa-ccp.c (optimize_atomic_bit_test_and): Likewise.
> * tree-ssa-phiopt.c (spaceship_replacement): Likewise.
> * tree-ssa-reassoc.c (make_new_ssa_for_def): Likewise.
> ---
>  gcc/cfgexpand.c  |  5 +
>  gcc/ipa-param-manipulation.c |  5 +
>  gcc/omp-simd-clone.c |  5 +
>  gcc/tree-ssa-ccp.c   |  5 +
>  gcc/tree-ssa-phiopt.c| 10 ++
>  gcc/tree-ssa-reassoc.c   |  5 +
>  gcc/tree.c   | 12 
>  gcc/tree.h   |  1 +
>  8 files changed, 20 insertions(+), 28 deletions(-)
>
> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
> index 01d0cdc548a..55ff75bd78e 100644
> --- a/gcc/cfgexpand.c
> +++ b/gcc/cfgexpand.c
> @@ -4341,11 +4341,8 @@ avoid_deep_ter_for_debug (gimple *stmt, int depth)
>   tree &vexpr = deep_ter_debug_map->get_or_insert (use);
>   if (vexpr != NULL)
> continue;
> - vexpr = make_node (DEBUG_EXPR_DECL);
> + vexpr = build_debug_expr_decl (TREE_TYPE (use));
>   gimple *def_temp = gimple_build_debug_bind (vexpr, use, g);
> - DECL_ARTIFICIAL (vexpr) = 1;
> - TREE_TYPE (vexpr) = TREE_TYPE (use);
> - SET_DECL_MODE (vexpr, TYPE_MODE (TREE_TYPE (use)));
>   gimple_stmt_iterator gsi = gsi_for_stmt (g);
>   gsi_insert_after (&gsi, def_temp, GSI_NEW_STMT);
>   avoid_deep_ter_for_debug (def_temp, 0);
> diff --git a/gcc/ipa-param-manipulation.c b/gcc/ipa-param-manipulation.c
> index 4610fc4ac03..ae3149718ca 100644
> --- a/gcc/ipa-param-manipulation.c
> +++ b/gcc/ipa-param-manipulation.c
> @@ -1200,10 +1200,7 @@ ipa_param_body_adjustments::prepare_debug_expressions 
> (tree dead_ssa)
> = unshare_expr_without_location (gimple_assign_rhs_to_tree (def));
>remap_with_debug_expressions (&val);
>
> -  tree vexpr = make_node (DEBUG_EXPR_DECL);
> -  DECL_ARTIFICIAL (vexpr) = 1;
> -  TREE_TYPE (vexpr) = TREE_TYPE (val);
> -  SET_DECL_MODE (vexpr, TYPE_MODE (TREE_TYPE (val)));
> +  tree vexpr = build_debug_expr_decl (TREE_TYPE (val));
>m_dead_stmt_debug_equiv.put (def, val);
>m_dead_ssa_debug_equiv.put (dead_ssa, vexpr);
>return true;
> diff --git a/gcc/omp-simd-clone.c b/gcc/omp-simd-clone.c
> index b772b7ff520..4d43a86669a 100644
> --- a/gcc/omp-simd-clone.c
> +++ b/gcc/omp-simd-clone.c
> @@ -910,11 +910,8 @@ ipa_simd_modify_stmt_ops (tree *tp, int *walk_subtrees, 
> void *data)
>gimple *stmt;
>if (is_gimple_debug (info->stmt))
> {
> - tree vexpr = make_node (DEBUG_EXPR_DECL);
> + tree vexpr = build_debug_expr_decl (TREE_TYPE (repl));
>   stmt = gimple_build_debug_source_bind (vexpr, repl, NULL);
> - DECL_ARTIFICIAL (vexpr) = 1;
> - TREE_TYPE (vexpr) = TREE_TYPE (repl);
> - SET_DECL_MODE (vexpr, TYPE_MODE (TREE_TYPE (repl)));
>   repl = vexpr;
> }
>else
> diff --git a/gcc/tree-ssa-ccp.c b/gcc/tree-ssa-ccp.c
> index 70ce6a4d5b8..60ae5e6601f 100644
> --- a/gcc/tree-ssa-ccp.c
> +++ b/gcc/tree-ssa-ccp.c
> @@ -3452,10 +3452,7 @@ optimize_atomic_bit_test_and (gimple_stmt_iterator 
> *gsip,
>tree temp = NULL_TREE;
>if (!throws || after || single_pred_p (e->dest))
> {
> - temp = make_node (DEBUG_EXPR_DECL);
> - DECL_ARTIFICIAL (temp) = 1;
> - TREE_TYPE (temp) = TREE_TYPE (lhs);
> - SET_DECL_MODE (temp, TYPE_MODE (TREE_TYPE (lhs)));
> + temp = build_debug_expr_decl (TREE_TYPE (lhs));
>   tree t = build2 (LSHIFT_EXPR, TREE_TYPE (lhs), new_lhs, bit);
>   g = gimple_build_debug_bind (temp, t, g);
>   if (throws && !after)
> diff --git a/gcc/tree-ssa-phiopt.c

Re: Move uncprop after modref pass

2021-11-08 Thread Jeff Law via Gcc-patches




On 11/8/2021 11:50 AM, Jan Hubicka via Gcc-patches wrote:

Hi,
this patch moves uncprop after the modref and pure/const passes and adds a
comment that this pass should always be last, since it is only supposed to help
PHI lowering.  The pass replaces constants by SSA names that are known to be
constant at that place, which hardly helps other passes.

Modref now allows easily to compare ipa solutions with local solutions done
at compile time.  The local solutions should be monotonously better ideally
they should be the same showing that IPA machinery can do all we can do locally.

Neither is quite true. Building cc1plus we get
  - 1075 parameters whose EAF flags detected late are worse than those from IPA
(1384 before the patch)
  - 5943 parameters whose EAF flags detected late are better than those from IPA
(5766 before the patch)
  - 367 parameters whose EAF flags are changed (some better, some worse)
(375 before the patch)

Out of about 30k params tracked in 32k functions.
So not optimal, but still the situation is noticeably better than before the patch,
changing from

Alias oracle query stats:
   refs_may_alias_p: 76514746 disambiguations, 101144061 queries
   ref_maybe_used_by_call_p: 642091 disambiguations, 77522063 queries
   call_may_clobber_ref_p: 387354 disambiguations, 390389 queries
   nonoverlapping_component_refs_p: 0 disambiguations, 26150 queries
   nonoverlapping_refs_since_match_p: 30138 disambiguations, 65278 must 
overlaps, 96375 queries
   aliasing_component_refs_p: 57707 disambiguations, 15412274 queries
   TBAA oracle: 28146643 disambiguations 104216840 queries
14970479 are in alias set 0
8940210 queries asked about the same object
117 queries asked about the same alias set
0 access volatile
50245719 are dependent in the DAG
1913672 are aritificially in conflict with void *

Modref stats:
   modref use: 25185 disambiguations, 697431 queries
   modref clobber: 271 disambiguations, 22334828 queries
   5347614 tbaa queries (0.239429 per modref query)
   759061 base compares (0.033986 per modref query)

PTA query stats:
   pt_solution_includes: 13361936 disambiguations, 4036 queries
   pt_solutions_intersect: 1589896 disambiguations, 13702105 queries

to

Alias oracle query stats:
   refs_may_alias_p: 76706288 disambiguations, 101289627 queries
   ref_maybe_used_by_call_p: 647660 disambiguations, 77711837 queries
   call_may_clobber_ref_p: 388155 disambiguations, 391104 queries
   nonoverlapping_component_refs_p: 0 disambiguations, 26150 queries
   nonoverlapping_refs_since_match_p: 30138 disambiguations, 65170 must 
overlaps, 96267 queries
   aliasing_component_refs_p: 57149 disambiguations, 15405496 queries
   TBAA oracle: 28122633 disambiguations 104205741 queries
14987347 are in alias set 0
8944156 queries asked about the same object
99 queries asked about the same alias set
0 access volatile
50238319 are dependent in the DAG
1913187 are aritificially in conflict with void *

Modref stats:
   modref use: 25273 disambiguations, 701571 queries
   modref clobber: 2337545 disambiguations, 22431672 queries
   5357026 tbaa queries (0.238815 per modref query)
   762911 base compares (0.034010 per modref query)

PTA query stats:
   pt_solution_includes: 13467699 disambiguations, 40734635 queries
   pt_solutions_intersect: 1681618 disambiguations, 13751306 queries

So we got 6% better on pt_solutions_intersect.

Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog:

PR tree-opt/103177
* passes.def: Move uncprop after pure/const and modref.

OK.

jeff



Re: [PATCH] Loop unswitching: support gswitch statements.

2021-11-08 Thread Andrew MacLeod via Gcc-patches

On 11/8/21 10:05 AM, Martin Liška wrote:

On 9/28/21 22:39, Andrew MacLeod wrote:
In theory, modifying the IL should be fine; it happens already in
places, but it's not extensively tested under those conditions yet.


Hello Andrew.

I've just tried using a global gimple_ranger and it crashes when loop
unswitching duplicates some BBs.

ah, ok, so the default on-entry cache for ranger doesn't expect to see
the number of BBs increase.  I can change this to grow, but I want to
avoid too many grows.  This test case looks like it grows the number of
BBs from 24 to somewhere north of 90.  Do you have any idea in advance
how many BBs you will be adding?  Although I'm not sure how to make such
a suggestion anyway.  I'll work something out.  The sparse cache has no
such issue, but you will lose precision, so we don't want to trigger on
that anyway.


As a workaround for the moment to keep you going, here's a patch which
simply starts with 256 extra slots; that should allow you to continue
while I fix this properly to grow, and you can see if things continue to
work as expected.  You can increase that number as you see fit.


I'll put in a proper fix in a bit.


diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index e5591bab0ef..6a3dcfadf98 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -220,7 +220,7 @@ sbr_vector::sbr_vector (tree t, irange_allocator *allocator)
   gcc_checking_assert (TYPE_P (t));
   m_type = t;
   m_irange_allocator = allocator;
-  m_tab_size = last_basic_block_for_fn (cfun) + 1;
+  m_tab_size = last_basic_block_for_fn (cfun) + 256;
   m_tab = (irange **)allocator->get_memory (m_tab_size * sizeof (irange *));
   memset (m_tab, 0, m_tab_size * sizeof (irange *));
 


Re: [PATCH 18/18] rs6000: Add escape-newline support for builtins files

2021-11-08 Thread Bill Schmidt via Gcc-patches
On 11/5/21 6:50 PM, Segher Boessenkool wrote:
> Hi!
>
> On Wed, Sep 01, 2021 at 11:13:54AM -0500, Bill Schmidt wrote:
>> +/* Escape-newline support.  For readability, we prefer to allow developers
>> +   to use escape-newline to continue long lines to the next one.  We
>> +   maintain a buffer of "original" lines here, which are concatenated into
>> +   linebuf, above, and which can be used to convert the virtual line
>> +   position "line / pos" into actual line and position information.  */
>> +#define MAXLINES 4
> Make this bigger already?  Or, want to bet if we will need to increase
> it for GCC 12 already?  Because for GCC 13 we almost certainly will :-)

We *could*, but honestly I don't think we'll need it anytime soon.  The only
reason we need 4 is for a single built-in that takes sixteen parameters:

+  const vsc __builtin_vec_init_v16qi (signed char, signed char, signed char, \
+signed char, signed char, signed char, signed char, signed char, \
+signed char, signed char, signed char, signed char, signed char, \
+signed char, signed char, signed char);

It's hard to think of a rational built-in that will need more space than this.
We can always make it bigger later if needed.

I'll make the rest of the cleanups you suggested.  Thanks again for the review!

Bill




Re: [PATCH] Loop unswitching: support gswitch statements.

2021-11-08 Thread Andrew MacLeod via Gcc-patches

On 11/8/21 10:05 AM, Martin Liška wrote:

On 9/28/21 22:39, Andrew MacLeod wrote:
In theory, modifying the IL should be fine; it happens already in
places, but it's not extensively tested under those conditions yet.


Hello Andrew.

I've just tried using a global gimple_ranger and it crashes when loop
unswitching duplicates some BBs.

Please try the attached patch for: 


hey Martin,

try using this in your tree.  Since nothing else is using a growing BB
cache right now, I'll let you work with it and see if everything works
as expected before checking it in, just in case we need more tweaking.
With this,


make RUNTESTFLAGS=dg.exp=loop-unswitch*.c check-gcc

runs clean.


basically, I tried to grow it by either 10% of the current BB size when
the grow is requested, or double the needed extra size, or 128...
whichever value is the maximum.  That means it shouldn't be asking for
too much each time, but also not a minimal amount.


I'm certainly open to suggestions on how much to grow it each time.
Note the vector being grown is ONLY for the SSA_NAME being asked for, so
it's really an on-demand thing just for specific names; in your case,
mostly just the switch index.


Let me know how this works for you, and if you have any other issues.

Andrew

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index e5591bab0ef..e010d35904f 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -210,6 +210,7 @@ protected:
   int_range<2> m_undefined;
   tree m_type;
   irange_allocator *m_irange_allocator;
+  void grow ();
 };
 
 
@@ -229,12 +230,33 @@ sbr_vector::sbr_vector (tree t, irange_allocator *allocator)
   m_undefined.set_undefined ();
 }
 
+void
+sbr_vector::grow ()
+{
+  int curr_bb_size = last_basic_block_for_fn (cfun);
+  gcc_checking_assert (curr_bb_size > m_tab_size);
+
+  int inc = MAX ((curr_bb_size - m_tab_size) * 2, 128);
+  inc = MAX (inc, curr_bb_size / 10);
+  int new_size = inc + curr_bb_size;
+
+  irange **t = (irange **)m_irange_allocator->get_memory (new_size
+			  * sizeof (irange *));
+  memcpy (t, m_tab, m_tab_size * sizeof (irange *));
+  memset (t + m_tab_size, 0, (new_size - m_tab_size) * sizeof (irange *));
+
+  m_tab = t;
+  m_tab_size = new_size;
+}
+
 // Set the range for block BB to be R.
 
 bool
 sbr_vector::set_bb_range (const_basic_block bb, const irange &r)
 {
   irange *m;
+  if (bb->index >= m_tab_size)
+grow ();
   gcc_checking_assert (bb->index < m_tab_size);
   if (r.varying_p ())
 m = &m_varying;
@@ -252,6 +274,8 @@ sbr_vector::set_bb_range (const_basic_block bb, const irange &r)
 bool
 sbr_vector::get_bb_range (irange &r, const_basic_block bb)
 {
+  if (bb->index >= m_tab_size)
+grow ();
   gcc_checking_assert (bb->index < m_tab_size);
   irange *m = m_tab[bb->index];
   if (m)
@@ -267,8 +291,9 @@ sbr_vector::get_bb_range (irange &r, const_basic_block bb)
 bool
 sbr_vector::bb_range_p (const_basic_block bb)
 {
-  gcc_checking_assert (bb->index < m_tab_size);
-  return m_tab[bb->index] != NULL;
+  if (bb->index < m_tab_size)
+return m_tab[bb->index] != NULL;
+  return false;
 }
 
 // This class implements the on entry cache via a sparse bitmap.


[PATCH] pch: Add support for PCH for relocatable executables

2021-11-08 Thread Jakub Jelinek via Gcc-patches
On Mon, Nov 08, 2021 at 12:46:04PM +0100, Jakub Jelinek via Gcc-patches wrote:
> So, if we want to make PCH work for PIEs, I'd say we can:
> 1) add a new GTY option, say callback, which would act like
>skip for non-PCH and for PCH would make us skip it but
>remember for address bias translation
> 2) drop the skip for tree_translation_unit_decl::language
> 3) change get_unnamed_section to have const char * as
>last argument instead of const void *, change
>unnamed_section::data also to const char * and update
>everything related to that
> 4) maybe add a host hook whether it is ok to support binaries
>changing addresses (the only thing I'm worried is if
>some host that uses function descriptors allocates them
>dynamically instead of having them somewhere in the
>executable)
> 5) maybe add a gengtype warning if it sees in GTY tracked
>structure a function pointer without that new callback
>option

So, here is 1), 2), 3) implemented.  With this patch alone,
g++.dg/pch/system-2.C test ICEs.  This is because GCC 12 has added
function::x_range_query member, which is set to &global_ranges on
cfun creation and is:
  range_query * GTY ((skip)) x_range_query;
which means that when a PIE binary writes PCH and a PIE binary loaded
at a different address loads it, cfun->x_range_query might be a garbage
pointer.  We can either apply a patch like the attached one on top of
this inline patch (but then callback is probably misnamed and we should
rename it to relocate_and_skip or something similar), or we could
e.g. overwrite cfun->x_range_query = &global_ranges during gimplification.
Other than that make check-gcc check-g++ passes RUNTESTFLAGS=pch.exp.

Not really sure about PA or IA-64 function descriptors, are any of those
allocated by the dynamic linker rather than created by the static linker?
I guess instead of removing the c-pch.c changes we could remember there
not just a function pointer, but also a data pointer and compare if both
are either the same or have the same load bias and punt only if they
have a different bias.  Though, on an architecture where all function
pointers are dynamically allocated and could change at any time, even
that wouldn't really be a reliable check.

Note, on stdc++.h.gch/O2g.gch there are just those 10 relocations without
the second patch, with it a few more, but nothing huge.  And for non-PIEs
there isn't really any extra work on the load side except freading two scalar
values and fseek.

Thoughts on this?

2021-11-08  Jakub Jelinek  

gcc/
* ggc.h (gt_pch_note_callback): Declare.
* gengtype.h (enum typekind): Add TYPE_CALLBACK.
(callback_type): Declare.
* gengtype.c (dbgprint_count_type_at): Handle TYPE_CALLBACK.
(callback_type): New variable.
(process_gc_options): Add CALLBACK argument, handle callback
option.
(set_gc_used_type): Adjust process_gc_options caller, if callback,
set type to &callback_type.
(output_mangled_typename): Handle TYPE_CALLBACK.
(walk_type): Likewise.  Handle callback option.
(write_types_process_field): Handle TYPE_CALLBACK.
(write_types_local_user_process_field): Likewise.
(write_types_local_process_field): Likewise.
(write_root): Likewise.
(dump_typekind): Likewise.
(dump_type): Likewise.
* gengtype-state.c (type_lineloc): Handle TYPE_CALLBACK.
(state_writer::write_state_callback_type): New method.
(state_writer::write_state_type): Handle TYPE_CALLBACK.
(read_state_callback_type): New function.
(read_state_type): Handle TYPE_CALLBACK.
* ggc-common.c (callback_vec): New variable.
(gt_pch_note_callback): New function.
(gt_pch_save): Stream out gt_pch_save function address and relocation
table.
(gt_pch_restore): Stream in saved gt_pch_save function address and
relocation table and apply relocations if needed.
* doc/gty.texi (callback): Document new GTY option.
* varasm.c (get_unnamed_section): Change callback argument's type and
last argument's type from const void * to const char *.
(output_section_asm_op): Change argument's type from const void *
to const char *, remove unnecessary cast.
* tree-core.h (struct tree_translation_unit_decl): Drop GTY((skip))
from language member.
* output.h (unnamed_section_callback): Change argument type from
const void * to const char *.
(struct unnamed_section): Use GTY((callback)) instead of GTY((skip))
for callback member.  Change data member type from const void *
to const char *.
(struct noswitch_section): Use GTY((callback)) instead of GTY((skip))
for callback member.
(get_unnamed_section): Change callback argument's type and
last argument's type from const void * to const char *.
(output_section_asm_op): Change argument's type from const 

[PATCH] c++: Skip unnamed bit-fields more

2021-11-08 Thread Marek Polacek via Gcc-patches
As Jason noticed in
,
we shouldn't require an initializer for an unnamed bit-field, because,
as [class.bit] says, they cannot be initialized.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

gcc/cp/ChangeLog:

* class.c (default_init_uninitialized_part): Use
next_initializable_field.
* method.c (walk_field_subobs): Skip unnamed bit-fields.

gcc/testsuite/ChangeLog:

* g++.dg/init/bitfield6.C: New test.
---
 gcc/cp/class.c|  7 +++
 gcc/cp/method.c   |  4 +++-
 gcc/testsuite/g++.dg/init/bitfield6.C | 20 
 3 files changed, 26 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/init/bitfield6.C

diff --git a/gcc/cp/class.c b/gcc/cp/class.c
index f16e50b9de9..bf92300a178 100644
--- a/gcc/cp/class.c
+++ b/gcc/cp/class.c
@@ -5455,10 +5455,9 @@ default_init_uninitialized_part (tree type)
   if (r)
return r;
 }
-  for (t = TYPE_FIELDS (type); t; t = DECL_CHAIN (t))
-if (TREE_CODE (t) == FIELD_DECL
-   && !DECL_ARTIFICIAL (t)
-   && !DECL_INITIAL (t))
+  for (t = next_initializable_field (TYPE_FIELDS (type)); t;
+   t = next_initializable_field (DECL_CHAIN (t)))
+if (!DECL_INITIAL (t) && !DECL_ARTIFICIAL (t))
   {
r = default_init_uninitialized_part (TREE_TYPE (t));
if (r)
diff --git a/gcc/cp/method.c b/gcc/cp/method.c
index 1023aefc575..935946f5eef 100644
--- a/gcc/cp/method.c
+++ b/gcc/cp/method.c
@@ -2295,7 +2295,9 @@ walk_field_subobs (tree fields, special_function_kind 
sfk, tree fnname,
 {
   tree mem_type, argtype, rval;
 
-  if (TREE_CODE (field) != FIELD_DECL || DECL_ARTIFICIAL (field))
+  if (TREE_CODE (field) != FIELD_DECL
+ || DECL_ARTIFICIAL (field)
+ || DECL_UNNAMED_BIT_FIELD (field))
continue;
 
   /* Variant members only affect deletedness.  In particular, they don't
diff --git a/gcc/testsuite/g++.dg/init/bitfield6.C 
b/gcc/testsuite/g++.dg/init/bitfield6.C
new file mode 100644
index 000..70854ee9588
--- /dev/null
+++ b/gcc/testsuite/g++.dg/init/bitfield6.C
@@ -0,0 +1,20 @@
+// { dg-do compile { target c++11 } }
+
+struct A
+{
+  int : 8;
+  int i = 42;
+};
+
+constexpr A a;
+
+struct B : A {
+};
+
+constexpr B b;
+
+struct C {
+  int : 0;
+};
+
+constexpr C c;

base-commit: d626fe77cdc40de0ae1651c8b94090eea73a719f
-- 
2.33.1



Re: [PATCH] PR middle-end/103059: reload: Also accept ASHIFT with indexed addressing

2021-11-08 Thread Hans-Peter Nilsson via Gcc-patches
> From: "Maciej W. Rozycki" 
> Date: Wed, 3 Nov 2021 14:53:58 +0100

>   gcc/
>   PR middle-end/103059
>   * reload.c (find_reloads_address_1): Also accept the ASHIFT form 
>   of indexed addressing.
>   (find_reloads): Adjust accordingly.

> ---
>  gcc/reload.c |9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> gcc-find-reloads-address-ashift.diff
> Index: gcc/gcc/reload.c
> ===
> --- gcc.orig/gcc/reload.c
> +++ gcc/gcc/reload.c
> @@ -2846,10 +2846,11 @@ find_reloads (rtx_insn *insn, int replac
>   i, operand_type[i], ind_levels, insn);
>  
> /* If we now have a simple operand where we used to have a
> -  PLUS or MULT, re-recognize and try again.  */
> +  PLUS or MULT or ASHIFT, re-recognize and try again.  */
> if ((OBJECT_P (*recog_data.operand_loc[i])
>  || GET_CODE (*recog_data.operand_loc[i]) == SUBREG)
> && (GET_CODE (recog_data.operand[i]) == MULT
> +   || GET_CODE (recog_data.operand[i]) == ASHIFT
> || GET_CODE (recog_data.operand[i]) == PLUS))
>   {
> INSN_CODE (insn) = -1;
> @@ -5562,7 +5563,8 @@ find_reloads_address_1 (machine_mode mod
>   return 1;
> }
>  
> - if (code0 == MULT || code0 == SIGN_EXTEND || code0 == TRUNCATE
> + if (code0 == MULT || code0 == ASHIFT
> + || code0 == SIGN_EXTEND || code0 == TRUNCATE
>   || code0 == ZERO_EXTEND || code1 == MEM)
> {
>   find_reloads_address_1 (mode, as, orig_op0, 1, PLUS, SCRATCH,
> @@ -5573,7 +5575,8 @@ find_reloads_address_1 (machine_mode mod
>   insn);
> }
>  
> - else if (code1 == MULT || code1 == SIGN_EXTEND || code1 == TRUNCATE
> + else if (code1 == MULT || code1 == ASHIFT
> +  || code1 == SIGN_EXTEND || code1 == TRUNCATE
>|| code1 == ZERO_EXTEND || code0 == MEM)
> {
>   find_reloads_address_1 (mode, as, orig_op0, 0, PLUS, code1,
> 

I regression-tested this patch for cris-elf at
r12-4987-g14e355df3053.  No regressions compared to
r12-4987-g14e355df3053.  (JFTR, that's at regress-11,
compared to T0=2007-01-05-16:47:21).

brgds, H-P



Re: [RFC] c++: Print function template parms when relevant (was: [PATCH v4] c++: Add gnu::diagnose_as attribute)

2021-11-08 Thread Matthias Kretz
I forgot to mention why I tagged it [RFC]: I needed one more bit of 
information on the template args TREE_VEC to encode EXPLICIT_TEMPLATE_ARGS_P. 
Its TREE_CHAIN already points to an integer constant denoting the number of 
non-default arguments, so I couldn't trivially replace that. Therefore, I used 
the sign of that integer. I was hoping to find a cleaner solution, though.

-Matthias

On Monday, 8 November 2021 17:40:44 CET Matthias Kretz wrote:
> On Tuesday, 17 August 2021 20:31:54 CET Jason Merrill wrote:
> > > 2. Given a DECL_TI_ARGS tree, can I query whether an argument was
> > > deduced
> > > or explicitly specified? I'm asking because I still consider diagnostics
> > > of function templates unfortunate. `template  void f()` is
> > > fine,
> > > as is `void f(T) [with T = float]`, but `void f() [with T = float]`
> > > could
> > > be better. I.e. if the template parameter appears somewhere in the
> > > function parameter list, dump_template_parms would only produce noise.
> > > If, however, the template parameter was given explicitly, it would be
> > > nice if it could show up accordingly in diagnostics.
> > 
> > NON_DEFAULT_TEMPLATE_ARGS_COUNT has that information, though there are
> > some issues with it.  Attached is my WIP from May to improve it
> > somewhat, if that's interesting.
> 
> It is interesting. I used your patch to come up with the attached patch. I
> must say, I didn't try to read through all the cp/pt.c code to understand
> all of what you did there (which is why my ChangeLog entry says "Jason?"),
> but it works for me (and all of `make check`).
> 
> Anyway, I'd like to propose the following before finishing my diagnose_as
> patch. I believe it's useful to fix this part first. The diagnostic/default-
> template-args-[12].C tests show a lot of examples of the intent of this
> patch. And the remaining changes to the testsuite show how it changes
> diagnostic output.
> 
> -- 8< 
> 
> The choice when to print a function template parameter was still
> suboptimal. That's because sometimes the function template parameter
> list only adds noise, while in other situations the lack of a function
> template parameter list makes diagnostic messages hard to understand.
> 
> The general idea of this change is to print template parms wherever they
> would appear in the source code as well. Thus, the diagnostics code
> needs to know whether any template parameter was given explicitly.
> 
> Signed-off-by: Matthias Kretz 
> 
> gcc/testsuite/ChangeLog:
> 
> * g++.dg/debug/dwarf2/template-params-12n.C: Optionally, allow
> DW_AT_default_value.
> * g++.dg/diagnostic/default-template-args-1.C: New.
> * g++.dg/diagnostic/default-template-args-2.C: New.
> * g++.dg/diagnostic/param-type-mismatch-2.C: Expect template
> parms in diagnostic.
> * g++.dg/ext/pretty1.C: Expect function template specialization
> to not pretty-print template parms.
> * g++.old-deja/g++.ext/pretty3.C: Ditto.
> * g++.old-deja/g++.pt/memtemp77.C: Ditto.
> * g++.dg/goacc/template.C: Expect function template parms for
> explicit arguments.
> * g++.dg/gomp/declare-variant-7.C: Expect no function template
> parms for deduced arguments.
> * g++.dg/template/error40.C: Expect only non-default template
> arguments in diagnostic.
> 
> gcc/cp/ChangeLog:
> 
> * cp-tree.h (GET_NON_DEFAULT_TEMPLATE_ARGS_COUNT): Return
> absolute value of stored constant.
> (EXPLICIT_TEMPLATE_ARGS_P): New.
> (SET_EXPLICIT_TEMPLATE_ARGS_P): New.
> (TFF_AS_PRIMARY): New constant.
> * error.c (get_non_default_template_args_count): Avoid
> GET_NON_DEFAULT_TEMPLATE_ARGS_COUNT if
> NON_DEFAULT_TEMPLATE_ARGS_COUNT is a NULL_TREE. Make independent
> of flag_pretty_templates.
> (dump_template_bindings): Add flags parameter to be passed to
> get_non_default_template_args_count. Print only non-default
> template arguments.
> (dump_function_decl): Call dump_function_name and dump_type of
> the DECL_CONTEXT with specialized template and set
> TFF_AS_PRIMARY for their flags.
> (dump_function_name): Add and document conditions for calling
> dump_template_parms.
> (dump_template_parms): Print only non-default template
> parameters.
> * pt.c (determine_specialization): Jason?
> (template_parms_level_to_args): Jason?
> (copy_template_args): Jason?
> (fn_type_unification): Set EXPLICIT_TEMPLATE_ARGS_P on the
> template arguments tree if any template parameter was explicitly
> given.
> (type_unification_real): Jason?
> (get_partial_spec_bindings): Jason?
> (tsubst_template_args): Determine number of defaulted arguments
> from new argument vector, if possible.
> ---
>  gcc/cp/cp-tree.h   

[PATCH] c++: __builtin_bit_cast To C array target type [PR103140]

2021-11-08 Thread Will Wray via Gcc-patches
This patch allows __builtin_bit_cast to materialize a C array as its To type.

It was developed as part of an implementation of P1997, array copy-semantics,
but is independent, so it makes sense to submit, review and merge ahead of it.

gcc/cp/ChangeLog:

* constexpr.c (check_bit_cast_type): Handle ARRAY_TYPE.
(cxx_eval_bit_cast): Handle ARRAY_TYPE copy.
* semantics.c (cp_build_bit_cast): Error only on unbounded/VLA types.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/bit-cast2.C: update XFAIL tests.
* g++.dg/cpp2a/bit-cast-to-array1.C: New test.
---
 gcc/cp/constexpr.c  |  8 -
 gcc/cp/semantics.c  |  7 ++---
 gcc/testsuite/g++.dg/cpp2a/bit-cast-to-array1.C | 40 +
 gcc/testsuite/g++.dg/cpp2a/bit-cast2.C  |  8 ++---
 4 files changed, 53 insertions(+), 10 deletions(-)

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 453007c686b..be1cdada6f8 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -4124,6 +4124,11 @@ static bool
 check_bit_cast_type (const constexpr_ctx *ctx, location_t loc, tree type,
 tree orig_type)
 {
+  if (TREE_CODE (type) == ARRAY_TYPE)
+  return check_bit_cast_type (ctx, loc,
+ TYPE_MAIN_VARIANT (TREE_TYPE (type)),
+ orig_type);
+
   if (TREE_CODE (type) == UNION_TYPE)
 {
   if (!ctx->quiet)
@@ -4280,7 +4285,8 @@ cxx_eval_bit_cast (const constexpr_ctx *ctx, tree t, bool 
*non_constant_p,
   tree r = NULL_TREE;
   if (can_native_interpret_type_p (TREE_TYPE (t)))
 r = native_interpret_expr (TREE_TYPE (t), ptr, len);
-  else if (TREE_CODE (TREE_TYPE (t)) == RECORD_TYPE)
+  else if (TREE_CODE (TREE_TYPE (t)) == RECORD_TYPE
+  || TREE_CODE (TREE_TYPE (t)) == ARRAY_TYPE)
 {
   r = native_interpret_aggregate (TREE_TYPE (t), ptr, 0, len);
   if (r != NULL_TREE)
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 2443d032749..b3126b12abc 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -11562,13 +11562,10 @@ cp_build_bit_cast (location_t loc, tree type, tree 
arg,
 {
   if (!complete_type_or_maybe_complain (type, NULL_TREE, complain))
return error_mark_node;
-  if (TREE_CODE (type) == ARRAY_TYPE)
+  if (TREE_CODE (type) == ARRAY_TYPE && !TYPE_DOMAIN (type))
{
- /* std::bit_cast for destination ARRAY_TYPE is not possible,
-as functions may not return an array, so don't bother trying
-to support this (and then deal with VLAs etc.).  */
  error_at (loc, "%<__builtin_bit_cast%> destination type %qT "
-"is an array type", type);
+"is a VLA variable-length array type", type);
  return error_mark_node;
}
   if (!trivially_copyable_p (type))
diff --git a/gcc/testsuite/g++.dg/cpp2a/bit-cast-to-array1.C 
b/gcc/testsuite/g++.dg/cpp2a/bit-cast-to-array1.C
new file mode 100644
index 000..e6e50c06389
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/bit-cast-to-array1.C
@@ -0,0 +1,40 @@
+// { dg-do compile }
+
+class S { int s; };
+S s();
+class U { int a, b; };
+U u();
+
+void
+foo (int *q)
+{
+  __builtin_bit_cast (int [1], 0);
+  __builtin_bit_cast (S [1], 0);
+  __builtin_bit_cast (U [1], u);
+}
+
+template 
+void
+bar (int *q)
+{
+  int intN[N] = {};
+  int int2N[2*N] = {};
+  __builtin_bit_cast (int [N], intN);
+  __builtin_bit_cast (S [N], intN);
+  __builtin_bit_cast (U [N], int2N);
+}
+
+template 
+void
+baz (T1 ia, T2 sa, T3 ua)
+{
+  __builtin_bit_cast (T1, *ia);
+  __builtin_bit_cast (T2, *sa);
+  __builtin_bit_cast (T3, *ua);
+}
+
+void
+qux (S* sp, int *ip, U* up)
+{
+  baz  (ip, sp, up);
+}
diff --git a/gcc/testsuite/g++.dg/cpp2a/bit-cast2.C 
b/gcc/testsuite/g++.dg/cpp2a/bit-cast2.C
index 6bb1760e621..7f1836ee4e9 100644
--- a/gcc/testsuite/g++.dg/cpp2a/bit-cast2.C
+++ b/gcc/testsuite/g++.dg/cpp2a/bit-cast2.C
@@ -14,7 +14,7 @@ foo (int *q)
   __builtin_bit_cast (int, s); // { dg-error "'__builtin_bit_cast' 
source type 'S' is not trivially copyable" }
   __builtin_bit_cast (S, 0);   // { dg-error "'__builtin_bit_cast' 
destination type 'S' is not trivially copyable" }
   __builtin_bit_cast (int &, q);   // { dg-error "'__builtin_bit_cast' 
destination type 'int&' is not trivially copyable" }
-  __builtin_bit_cast (int [1], 0); // { dg-error "'__builtin_bit_cast' 
destination type \[^\n\r]* is an array type" }
+  __builtin_bit_cast (S [1], 0);   // { dg-error "'__builtin_bit_cast' 
destination type \[^\n\r]* is not trivially copyable" }
   __builtin_bit_cast (V, 0);   // { dg-error "invalid use of 
incomplete type 'struct V'" }
   __builtin_bit_cast (int, v);
   __builtin_bit_cast (int, *p);// { dg-error "invalid use of 
incomplete type 'struct V'" }
@@ -29,7 +29,7 @@ bar (int *q)
   __builtin_bit_cast (int, s); // { dg-er

Re: [PATCH] pch: Add support for PCH for relocatable executables

2021-11-08 Thread John David Anglin

On 2021-11-08 2:48 p.m., Jakub Jelinek wrote:

Not really sure about PA or IA-64 function descriptors, are any of those
allocated by the dynamic linker rather than created by the static linker?

On PA, the static linker creates all function descriptors.  The dynamic linker 
is responsible for
resolving descriptors when lazy binding is used.

The primary difference between 32 and 64-bit descriptors is that there can be 
multiple descriptors
that resolve to the same function in the 32-bit run time.  In the 64-bit case, 
there is one official
procedure descriptor for each function.

I guess instead of removing the c-pch.c changes we could remember there
not just a function pointer, but also a data pointer and compare if both
are either the same or have the same load bias and punt only if they
have different bias.  Though, on architecture where all function pointers
would be dynamically allocated and could change any time even that wouldn't
be really a reliable check.

There is no call to dynamically allocate a descriptor but it is possible for 
code to dynamically build a descriptor.

Dave

--
John David Anglin  dave.ang...@bell.net



Re: [PATCH] c++: Implement -Wuninitialized for mem-initializers (redux) [PR19808]

2021-11-08 Thread Marek Polacek via Gcc-patches
On Sun, Nov 07, 2021 at 04:48:46PM -0700, Martin Sebor wrote:
> On 11/5/21 5:22 PM, Marek Polacek via Gcc-patches wrote:
> > 2021 update: Last year I posted a version of this patch:
> > 
> > but it didn't make it in.  The main objection seemed to be that the
> > patch tried to do too much, and overlapped with the ME uninitialized
> > warnings.  Since the patch used walk_tree without any data flow info,
> > it issued false positives for things like a(0 ? b : 42) and similar.
> > 
> > I'll admit I've been dreading resurrecting this because of the lack
> > of clarity about where we should warn about what.  On the other hand,
> > I think we really should do something about this.  So I've simplified
> > the original patch as much as it seemed reasonable.  For instance, it
> > doesn't even attempt to handle cases like "a((b = 42)), c(b)" -- for
> > these I simply give up for the whole mem-initializer (but who writes
> > code like that, anyway?).  I also give up when a member is initialized
> > with a function call, because we don't know what the call could do.
> 
> I (still) believe this is a great improvement :)

Yea, would be nice to finally nail it down this time around.

> Thinking about your last sentence above and experimenting with
> the patch just a little, we do know that function calls cannot
> "initialize" (by assigning a value) const or reference members,
> so those can be assumed to be uninitialized when used before
> their initializer has been evaluated.  As in:
> 
> int f (int);
> 
> struct A
> {
>   int x;
>   const int y;
>   A ():
> x (f (y)),   << y used before initialized
> y (1) { }
> };
> 
> It should also be reasonable for the purposes of warnings to
> assume that a call to a const member function (or any call where
> an object is passed by a const reference) doesn't change any
> non-mutable members.  The middle end issues -Wmaybe-uninitialized
> when it sees an uninitialized object passed to a const-qualified
> pointer or reference.

Maybe, I don't know.  It gets tricky and I don't think I want to
handle it in my patch, at least not in this initial version.  It
may be best to leave this to the ME.  Dunno.

> At the same time, a call to a non-const member function can assign
> a value to a not-yet-initialized member, so I wonder if its uses
> subsequent to it should be accepted without warning:
> 
> struct A
> {
>   int x, y, z;
>   int f () { y = 1; return 2; }
>   A ():
> x (f ()),
> z (y)   << avoid warning here?
>   { }
> };

This is actually a bug in my code, now fixed (made changes in the
DECL_NONSTATIC_MEMBER_FUNCTION_P block).
 
> > +  /* We're initializing a reference member with itself.  */
> > +  if (TYPE_REF_P (type) && cp_tree_equal (d->member, init))
> > +   warning_at (EXPR_LOCATION (init), OPT_Winit_self,
> > +   "%qD is initialized with itself", field);
> > +  else if (cp_tree_equal (TREE_OPERAND (init, 0), current_class_ref)
> > +  && uninitialized->contains (field))
> > +   {
> > + if (TYPE_REF_P (TREE_TYPE (field)))
> > +   warning_at (EXPR_LOCATION (init), OPT_Wuninitialized,
> > +   "reference %qD is not yet bound to a value when used "
> > +   "here", field);
> > + else if (!INDIRECT_TYPE_P (type) || is_this_parameter (d->member))
> > +   warning_at (EXPR_LOCATION (init), OPT_Wuninitialized,
> > +   "field %qD is used uninitialized", field);
> 
> Might I suggest the word "member" instead?
> 
> This is my OCD rearing its head but I always cringe when I read
> the word field in reference to a subobject.  Field is a term
> common in database programming but in C and C++, either member
> or subobject are more wide-spread.   Except for bit-fields, C
formally uses the term field to refer to the area into which
> printf directives format their output.  With one exception, C++
> 98 used it the same way.  I see that in recent versions another
> reference has crept in with the memory model, but I think it
> would be fair to call these two anomalies.

Hahah!  No worries, changed to "member".  "Field" is probably just
a GCCism in my head (due to FIELD_DECL).

> > +++ b/gcc/testsuite/g++.dg/warn/Wuninitialized-14.C
> > @@ -0,0 +1,31 @@
> > +// PR c++/19808
> > +// { dg-do compile { target c++11 } }
> > +// { dg-options "-Wuninitialized" }
> > +
> > +struct A {
> > +  int m;
> > +  int get() const { return m; }
> > +
> > +  A() { }
> > +  A(int) { }
> > +  A(const A &) { }
> > +  A(A *) { }
> > +};
> > +
> > +struct S {
> > +  A a, b;
> > +
> > +  S(int (*)[1]) : a() {}
> > +  S(int (*)[2]) : b(a.get()) {}
> 
> Here, because S::a.m isn't initialized when a.get() returns it,
> we ideally want a warning, we just can't get it, right?
> 
> To make this clear to people who read tests and also easier to
> add a warning if GCC gets smart enough (without causing what
> might look like false positives), I'd s

Re: [PATH][_GLIBCXX_DEBUG] Fix unordered container merge

2021-11-08 Thread François Dumont via Gcc-patches
Yet another version, this time with only one guard implementation. The 
predicate to invalidate the safe iterators has been externalized.


Ok to commit ?


On 06/11/21 2:51 pm, François Dumont wrote:
You were right to delay your reply. Here is a new version with less 
code duplication and a bug fix in the new _UContMergeGuard where we 
were using it->second rather than it->first to get the key.


Note also that the change to _M_merge_multi implementation is also 
important because otherwise we might be trying to extract a key from a 
destructed node.


    libstdc++: [_GLIBCXX_DEBUG] Implement unordered container merge

    The _GLIBCXX_DEBUG unordered containers need a dedicated merge 
implementation
so that any existing iterator on the transferred nodes is properly 
invalidated.


    Add typedef/using declaration for everything used as-is from 
normal implementation.


    libstdc++-v3/ChangeLog:

    * include/bits/hashtable_policy.h (__distance_fw): Replace 
class keyword with

    typename.
    * include/bits/hashtable.h 
(_Hashtable<>::_M_merge_unique): Remove noexcept
    qualification. Use const_iterator for node 
extraction/reinsert.
    (_Hashtable<>::_M_merge_multi): Likewise. Compute new hash 
code before extract.
    * include/debug/safe_container.h (_Safe_container<>): Make 
all methods

    protected.
    * include/debug/safe_unordered_container.h
(_Safe_unordered_container<>::_UContMergeGuard<_ExtractKey, _Source>): 
New.

(_Safe_unordered_container<>::_S_uc_guard<_ExtractKey, _Source>): New.
(_Safe_unordered_container<>::_UMContMergeGuard<_ExtractKey, 
_Source>): New.

(_Safe_unordered_container<>::_S_umc_guard<_ExtractKey, _Source>): New.
(_Safe_unordered_container<>::_M_invalide_all): Make public.
    (_Safe_unordered_container<>::_M_invalide_if): Likewise.
(_Safe_unordered_container<>::_M_invalide_local_if): Likewise.
    * include/debug/unordered_map
    (unordered_map<>::mapped_type, pointer, const_pointer): 
New typedef.
    (unordered_map<>::reference, const_reference, 
difference_type): New typedef.
    (unordered_map<>::get_allocator, empty, size, max_size): 
Add usings.
    (unordered_map<>::bucket_count, max_bucket_count, bucket): 
Add usings.
    (unordered_map<>::hash_function, key_equal, count, 
contains): Add usings.
    (unordered_map<>::operator[], at, rehash, reserve): Add 
usings.

    (unordered_map<>::merge): New.
    (unordered_multimap<>::mapped_type, pointer, 
const_pointer): New typedef.
    (unordered_multimap<>::reference, const_reference, 
difference_type): New typedef.
    (unordered_multimap<>::get_allocator, empty, size, 
max_size): Add usings.
    (unordered_multimap<>::bucket_count, max_bucket_count, 
bucket): Add usings.
    (unordered_multimap<>::hash_function, key_equal, count, 
contains): Add usings.

    (unordered_multimap<>::rehash, reserve): Add usings.
    (unordered_multimap<>::merge): New.
    * include/debug/unordered_set
    (unordered_set<>::mapped_type, pointer, const_pointer): 
New typedef.
    (unordered_set<>::reference, const_reference, 
difference_type): New typedef.
    (unordered_set<>::get_allocator, empty, size, max_size): 
Add usings.
    (unordered_set<>::bucket_count, max_bucket_count, bucket): 
Add usings.
    (unordered_set<>::hash_function, key_equal, count, 
contains): Add usings.

    (unordered_set<>::rehash, reserve): Add usings.
    (unordered_set<>::merge): New.
    (unordered_multiset<>::mapped_type, pointer, 
const_pointer): New typedef.
    (unordered_multiset<>::reference, const_reference, 
difference_type): New typedef.
    (unordered_multiset<>::get_allocator, empty, size, 
max_size): Add usings.
    (unordered_multiset<>::bucket_count, max_bucket_count, 
bucket): Add usings.
    (unordered_multiset<>::hash_function, key_equal, count, 
contains): Add usings.

    (unordered_multiset<>::rehash, reserve): Add usings.
    (unordered_multiset<>::merge): New.
    * 
testsuite/23_containers/unordered_map/debug/merge1_neg.cc: New test.
    * 
testsuite/23_containers/unordered_map/debug/merge2_neg.cc: New test.
    * 
testsuite/23_containers/unordered_map/debug/merge3_neg.cc: New test.
    * 
testsuite/23_containers/unordered_map/debug/merge4_neg.cc: New test.
    * 
testsuite/23_containers/unordered_multimap/debug/merge1_neg.cc: New test.
    * 
testsuite/23_containers/unordered_multimap/debug/merge2_neg.cc: New test.
    * 
testsuite/23_containers/unordered_multimap/debug/merge3_neg.cc: New test.
    * 
testsuite/23_containers/unordered_multimap/debug/merge4_neg.cc: New test.
    * 
testsuite/23_containers/unordered_multiset/debug/merge1_neg.c

PING [PATCH 0/2] provide simple detection of indeterminate pointers

2021-11-08 Thread Martin Sebor via Gcc-patches

Ping for the two patches below:

-Wuse-after-free:
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/583044.html

and -Wdangling-pointer:
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/583045.html

On 11/1/21 4:15 PM, Martin Sebor wrote:

This two-patch series adds support for the detection of uses
of pointers invalidated as a result of the lifetime of
the objects they point to having ended: either explicitly,
after a call to a dynamic deallocation function, or implicitly,
by virtue of an object with automatic storage duration having
gone out of scope.

To minimize false positives the initial logic is very simple
(even simplistic): the code only checks uses in basic blocks
dominated by the invalidating calls (either calls to
deallocation functions or GCC's clobbers).

A more thorough checker is certainly possible and I'd say most
desirable but will require a more sophisticated implementation
and a better predicate analyzer than is available, and so will
need to wait for GCC 13.

Martin




Re: [PATCH v1 1/7] LoongArch Port: gcc

2021-11-08 Thread Joseph Myers
You have:

> +#define GLIBC_DYNAMIC_LINKER_LP64 "/lib64/ld.so.1"

See my comments on the glibc patch series 
.  
Specifically, the point that all new glibc ports should have unique 
per-ABI dynamic linker names for each ABI supported by the port, 
preferably referencing the architecture name somewhere in the dynamic 
linker name.  /lib64/ld.so.1 is a name that's already in use, so should 
not be used by any ABI of this new port.

> +  error ("%<-march=%s%> does not work on a cross compiler.",

Error messages should not end with '.'.

> +  error ("%<-mtune=%s%> does not work on a cross compiler.",

Likewise.

I didn't see any additions to contrib/config-list.mk anywhere in the patch 
series.  (See "Back End" in sourcebuild.texi for a list of places you may 
need to update as part of a GCC port, including config-list.mk.)

Please make sure the back end builds cleanly with current GCC mainline.  
This can be tested either with a native bootstrap, or by building a cross 
compiler, using a native compiler of the same GCC mainline version for the 
build and configuring using --enable-werror-always (that configure option 
has the effect of enabling -Werror in the same way that later bootstrap 
stages in a native bootstrap do).

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH v2] c++: Implement -Wuninitialized for mem-initializers (redux) [PR19808]

2021-11-08 Thread Marek Polacek via Gcc-patches
On Sun, Nov 07, 2021 at 12:45:24AM -0400, Jason Merrill wrote:
> On 11/5/21 19:22, Marek Polacek wrote:
> > 2021 update: Last year I posted a version of this patch:
> > 
> > but it didn't make it in.  The main objection seemed to be that the
> > patch tried to do too much, and overlapped with the ME uninitialized
> > warnings.  Since the patch used walk_tree without any data flow info,
> > it issued false positives for things like a(0 ? b : 42) and similar.
> > 
> > I'll admit I've been dreading resurrecting this because of the lack
> > of clarity about where we should warn about what.  On the other hand,
> > I think we really should do something about this.  So I've simplified
> > the original patch as much as it seemed reasonable.  For instance, it
> > doesn't even attempt to handle cases like "a((b = 42)), c(b)" -- for
> > these I simply give up for the whole mem-initializer (but who writes
> > code like that, anyway?).  I also give up when a member is initialized
> > with a function call, because we don't know what the call could do.
> > See Wuninitialized-17.C, for which clang emits a false positive but
> > we don't.  I remember having a hard time dealing with initializer lists
> > in my previous patch, so now I only handle simple a{b} cases, but no
> > more.  It turned out that this abridged version still warns about 90%
> > cases where users would expect a warning.
> 
> Sounds good.
> 
> > More complicated cases are left for the ME, which, for unused inline
> > functions, will only warn with -fkeep-inline-functions, but so be it.
> 
> This seems like a broader issue that should really be addressed; since we
> have warnings that depend on running optimization passes, there should be a
> way to force the function through to that point even if we don't need to
> write it out.  This is also the problem with bug 21678.
 
Presumably not addressed as part of this patch ;)  FWIW, it will probably
need a separate option given how expensive that will be...

> > @@ -820,6 +936,12 @@ perform_member_init (tree member, tree init)
> > warning_at (DECL_SOURCE_LOCATION (current_function_decl),
> > OPT_Winit_self, "%qD is initialized with itself",
> > member);
> > +  else if (!uninitialized.is_empty ())
> > +   {
> > + find_uninit_data data = { &uninitialized, decl };
> > + cp_walk_tree_without_duplicates (&val, find_uninit_fields_r,
> > +  &data);
> > +   }
> 
> You repeat this pattern three times; it might be good to factor it into a
> find_uninit_fields function.

Done.
 
> >   }
> > if (array_of_unknown_bound_p (type))
> > @@ -848,6 +970,9 @@ perform_member_init (tree member, tree init)
> >  do aggregate-initialization.  */
> >   }
> > +  /* Assume we are initializing the member.  */
> > +  bool member_initialized_p = true;
> > +
> > if (init == void_type_node)
> >   {
> > /* mem() means value-initialization.  */
> > @@ -988,6 +1113,9 @@ perform_member_init (tree member, tree init)
> > diagnose_uninitialized_cst_or_ref_member (core_type,
> >   /*using_new=*/false,
> >   /*complain=*/true);
> > +
> > + /* We left the member uninitialized.  */
> > + member_initialized_p = false;
> > }
> > maybe_warn_list_ctor (member, init);
> > @@ -998,6 +1126,11 @@ perform_member_init (tree member, tree init)
> > tf_warning_or_error));
> >   }
> > +  if (member_initialized_p && warn_uninitialized)
> > +/* This member is now initialized, remove it from the uninitialized
> > +   set.  */
> > +uninitialized.remove (member);
> > +
> > if (type_build_dtor_call (type))
> >   {
> > tree expr;
> > @@ -1311,13 +1444,30 @@ emit_mem_initializers (tree mem_inits)
> > if (!COMPLETE_TYPE_P (current_class_type))
> >   return;
> > +  /* Keep a set holding fields that are not initialized.  */
> > +  hash_set uninitialized;
> > +
> > +  /* Initially that is all of them.  */
> > +  if (warn_uninitialized)
> > +for (tree field = TYPE_FIELDS (current_class_type); field;
> > +field = TREE_CHAIN (field))
> > +  if (TREE_CODE (field) == FIELD_DECL && !DECL_ARTIFICIAL (field))
> 
> I think you need to check DECL_UNNAMED_BIT_FIELD here.  Incidentally, so
> does default_init_uninitialized_part:
> 
> struct A
> {
>   int : 8;
>   int i = 42;
> };
> 
> constexpr A a; // bogus error

I sent a patch earlier today for this:
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/583717.html
 
> It would probably be better to use next_initializable_field more widely.
> But you'd still need to check DECL_ARTIFICIAL to exclude the base/vtable
> fields that function includes.

OK, done.

Here's a v2, which also fixes a bug that Martin S. found (in the CALL_EXPR
block)

Re: Values of WIDE_INT_MAX_ELTS in gcc11 and gcc12 are different

2021-11-08 Thread Qing Zhao via Gcc-patches
Hi, I tried both the following patches:

Patch1:

[opc@qinzhao-ol8u3-x86 gcc]$ git diff 
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 0cba95411a6..ca49d2b4514 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -3073,12 +3073,14 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt)
   /* If this variable is in a register use expand_assignment.
 For boolean scalars force zero-init.  */
   tree init;
+  scalar_int_mode var_mode;
   if (TREE_CODE (TREE_TYPE (lhs)) != BOOLEAN_TYPE
  && tree_fits_uhwi_p (var_size)
  && (init_type == AUTO_INIT_PATTERN
  || !is_gimple_reg_type (var_type))
  && int_mode_for_size (tree_to_uhwi (var_size) * BITS_PER_UNIT,
-   0).exists ())
+   0).exists (&var_mode)
+ && targetm.scalar_mode_supported_p (var_mode))
{
  unsigned HOST_WIDE_INT total_bytes = tree_to_uhwi (var_size);
  unsigned char *buf = (unsigned char *) xmalloc (total_bytes);

AND

Patch2:
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 0cba95411a6..7f129655926 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -3073,12 +3073,14 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt)
   /* If this variable is in a register use expand_assignment.
 For boolean scalars force zero-init.  */
   tree init;
+  scalar_int_mode var_mode;
   if (TREE_CODE (TREE_TYPE (lhs)) != BOOLEAN_TYPE
  && tree_fits_uhwi_p (var_size)
  && (init_type == AUTO_INIT_PATTERN
  || !is_gimple_reg_type (var_type))
  && int_mode_for_size (tree_to_uhwi (var_size) * BITS_PER_UNIT,
-   0).exists ())
+   0).exists (&var_mode)
+ && have_insn_for (SET, var_mode))
{
  unsigned HOST_WIDE_INT total_bytes = tree_to_uhwi (var_size);
  unsigned char *buf = (unsigned char *) xmalloc (total_bytes);

Both have the same effect:

1. Resolved the ICE in gcc11;
2. For _Complex long double variables, both return FALSE; as a result, PATTERN 
initialization of _Complex long double variables now initializes them with 
zeros instead of 0xFE bytes.

Let me know your opinion on this.  If the above 2 is okay, then I might pick the 
above Patch 1 as the final patch for this issue.

Thanks.

Qing

> On Nov 8, 2021, at 2:41 AM, Richard Biener  wrote:
> 
> On Sat, Nov 6, 2021 at 10:56 AM Jakub Jelinek  wrote:
>> 
>> On Fri, Nov 05, 2021 at 05:37:25PM +, Qing Zhao wrote:
 On Nov 5, 2021, at 11:17 AM, Jakub Jelinek  wrote:
 
 On Fri, Nov 05, 2021 at 04:11:36PM +, Qing Zhao wrote:
> 3076   if (TREE_CODE (TREE_TYPE (lhs)) != BOOLEAN_TYPE
> 3077   && tree_fits_uhwi_p (var_size)
> 3078   && (init_type == AUTO_INIT_PATTERN
> 3079   || !is_gimple_reg_type (var_type))
> 3080   && int_mode_for_size (tree_to_uhwi (var_size) * 
> BITS_PER_UNIT,
> 3081 0).exists ())
> 3082 {
> 3083   unsigned HOST_WIDE_INT total_bytes = tree_to_uhwi 
> (var_size);
> 3084   unsigned char *buf = (unsigned char *) xmalloc 
> (total_bytes);
> 3085   memset (buf, (init_type == AUTO_INIT_PATTERN
> 3086 ? INIT_PATTERN_VALUE : 0), total_bytes);
> 3087   tree itype = build_nonstandard_integer_type
> 3088  (total_bytes * BITS_PER_UNIT, 1);
> 
> The exact failing point is at function 
> “set_min_and_max_values_for_integral_type”:
> 
> 2851   gcc_assert (precision <= WIDE_INT_MAX_PRECISION);
> 
> For _Complex long double,  “precision” is 256.
> In GCC11, “WIDE_INT_MAX_PRECISION” is 192,  in GCC12, it’s 512.
> As a result, the above assertion failed on GCC11.
> 
> I am wondering what’s the best fix for this issue in gcc11?
 
 Even for gcc 12 the above is wrong, you can't blindly assume that
 build_nonstandard_integer_type will work for arbitrary precisions,
 and even if it works that it will actually work.
 The fact that such a mode exist is one thing, but
 targetm.scalar_mode_supported_p should be tested for whether the mode
 is actually supported.
>>> 
>>> You mean “int_mode_for_size().exists()” is not enough to make sure
>>> “build_nonstandard_integer_type” to be valid?  We should add
>>> “targetm.scalar_mode_supported_p” too ?
>> 
>> Yeah.  The former says whether the backend has that mode at all.
>> But some modes may be there only in some specific patterns but
>> without support for mov, add, etc.  Only for
>> targetm.scalar_mode_supported_p modes the backend guarantees that
>> one can use them e.g. in mode attribute and can expect expansion
>> to expand everything with that mode that is needed in some way.
>> E.g. only if targetm.scalar_mode_supported_p (TImode) the FEs
>> support __int128_t

Re: [PATCH] Convert strlen pass from evrp to ranger.

2021-11-08 Thread Jeff Law via Gcc-patches




On 10/15/2021 4:39 AM, Aldy Hernandez wrote:



On 10/15/21 2:47 AM, Andrew MacLeod wrote:

On 10/14/21 6:07 PM, Martin Sebor via Gcc-patches wrote:

On 10/9/21 12:47 PM, Aldy Hernandez via Gcc-patches wrote:

We seem to be passing a lot of context around in the strlen code.  I
certainly don't want to contribute to more.

Most of the handle_* functions are passing the gsi as well as either
ptr_qry or rvals.  That looks a bit messy.  May I suggest putting all
of that in the strlen pass object (well, the dom walker object, but we
can rename it to be less dom centric)?

Something like the attached (untested) patch could be the basis for
further cleanups.

Jakub, would this line of work interest you?


You didn't ask me but since no one spoke up against it let me add
some encouragement: this is exactly what I was envisioning and in
line with other such modernization we have been doing elsewhere.
Could you please submit it for review?

Martin


I'm willing to bet he didn't submit it for review because he doesn't 
have time this release to polish and track it...  (I think the 
threader has been quite consuming).  Rather, it was offered as a 
starting point for someone else who might be interested in continuing 
to pursue this work...  *everyone* is interested in cleanup work 
others do :-)


Exactly.  There's a lot of work that could be done in this area, and 
I'm trying to avoid the situation with the threaders where what 
started as refactoring ended up with me basically owning them ;-).


That being said, I there are enough cleanups that are useful on their 
own.  I've removed all the passing around of GSIs, as well as ptr_qry, 
with the exception of anything dealing with the sprintf pass, since it 
has a slightly different interface.


This is patch 0001, which I'm formally submitting for inclusion. No 
functional changes with this patch.  OK for trunk?


Also, I am PINGing patch 0002, which is the strlen pass conversion to 
the ranger.  As mentioned, this is just a change from an evrp client 
to a ranger client.  The APIs are exactly the same, and besides, the 
evrp analyzer is deprecated and slated for removal. OK for trunk?


Aldy

0001-Convert-strlen-pass-from-evrp-to-ranger.patch

 From 152bc3a1dad9a960b7c0c53c65d6690532d9da5a Mon Sep 17 00:00:00 2001
From: Aldy Hernandez
Date: Fri, 8 Oct 2021 15:54:23 +0200
Subject: [PATCH] Convert strlen pass from evrp to ranger.

The following patch converts the strlen pass from evrp to ranger,
leaving DOM as the last remaining user.

No additional cleanups have been done.  For example, the strlen pass
still has uses of VR_ANTI_RANGE, and the sprintf still passes around
pairs of integers instead of using a proper range.  Fixing this
could further improve these passes.

Basically the entire patch is just adjusting the calls to range_of_expr
to include context.  The previous context of si->stmt was mostly
empty, so not really useful ;-).

With ranger we are now able to remove the range calculation from
before_dom_children entirely.  Just working with the ranger on-demand
catches all the strlen and sprintf testcases with the exception of
builtin-sprintf-warn-22.c which is due to a limitation of the sprintf
code.  I have XFAILed the test and documented what the problem is.

On a positive note, these changes found two possible sprintf overflow
bugs in the C++ and Fortran front-ends which I have fixed below.

Tested on x86-64 Linux.

gcc/ChangeLog:

* tree-ssa-strlen.c (compare_nonzero_chars): Pass statement
context to ranger.
(get_addr_stridx): Same.
(get_stridx): Same.
(get_range_strlen_dynamic): Same.
(handle_builtin_strlen): Same.
(handle_builtin_strchr): Same.
(handle_builtin_strcpy): Same.
(maybe_diag_stxncpy_trunc): Same.
(handle_builtin_stxncpy_strncat):
(handle_builtin_memcpy): Same.
(handle_builtin_strcat): Same.
(handle_alloc_call): Same.
(handle_builtin_memset): Same.
(handle_builtin_string_cmp): Same.
(handle_pointer_plus): Same.
(count_nonzero_bytes_addr): Same.
(count_nonzero_bytes): Same.
(handle_store): Same.
(fold_strstr_to_strncmp): Same.
(handle_integral_assign): Same.
(check_and_optimize_stmt): Same.
(class strlen_dom_walker): Replace evrp with ranger.
(strlen_dom_walker::before_dom_children): Remove evrp.
(strlen_dom_walker::after_dom_children): Remove evrp.
* gimple-ssa-warn-access.cc (maybe_check_access_sizes):
Restrict sprintf output.

gcc/cp/ChangeLog:

* ptree.c (cxx_print_xnode): Add more space to pfx array.

gcc/fortran/ChangeLog:

* misc.c (gfc_dummy_typename): Make sure ts->kind is
non-negative.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/builtin-sprintf-warn-22.c: XFAIL.
OK.  Had I realized 99% was just adding the new argument to a bunch of 
call sites, I would have taken care of it earlier.


Re: [PATCH] PR middle-end/103059: reload: Also accept ASHIFT with indexed addressing

2021-11-08 Thread Jeff Law via Gcc-patches




On 11/4/2021 6:18 PM, Maciej W. Rozycki wrote:

On Thu, 4 Nov 2021, Jeff Law wrote:


Sometimes the language we're using in email is not as crisp as it should be.  So
just to be clear, the canonicalization I'm referring to is only in effect within
a MEM.  It does not apply to address calculations that happen outside a MEM.  I
think that is consistent with Richard's comments.

  Ah, OK then.


and then reload substitutes (reg/v:SI 154 [ n_ctrs ]) with the inner MEM
as it fails to reload the pseudo and just uses its memory location.

OK.  So what I still don't see is why  we would need to re-recognize.   You're
changing code that I thought was only applicable when we were reloading an
address inside a MEM and if we're inside a MEM, then we shouldn't be seeing an
ASHIFT.   We're replacing the argument of the ASHIFT.

  Well, the context of this code (around and including hunk #1) is:

   else if (insn_extra_address_constraint
   (lookup_constraint (constraints[i])))
{
  address_operand_reloaded[i]
= find_reloads_address (recog_data.operand_mode[i], (rtx*) 0,
recog_data.operand[i],
recog_data.operand_loc[i],
i, operand_type[i], ind_levels, insn);

  /* If we now have a simple operand where we used to have a
 PLUS or MULT, re-recognize and try again.  */
  if ((OBJECT_P (*recog_data.operand_loc[i])
   || GET_CODE (*recog_data.operand_loc[i]) == SUBREG)
  && (GET_CODE (recog_data.operand[i]) == MULT
  || GET_CODE (recog_data.operand[i]) == PLUS))
{
  INSN_CODE (insn) = -1;
  retval = find_reloads (insn, replace, ind_levels, live_known,
 reload_reg_p);
  return retval;
}

so the body of the conditional is specifically executed for an address and
not a MEM; in this particular case matched with the plain "p" constraint.

  MEMs are handled with the next conditional right below.
Ah!  Thanks for the clarification.  We're digging deep into history
here.  I always thought this code was re-recognizing inside a MEM, but
as you note, it's actually handling stuff outside the MEM, such as a
'p' constraint, which is an address, but being outside a MEM means it's
not subject to the mult-by-power-of-2 canonicalization.
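The mult-by-power-of-2 canonicalization under discussion — that inside a MEM
an address term shifting by a constant is written as a multiplication by a
power of two rather than an ASHIFT — can be modeled with a small sketch.
This is a toy Python illustration, not GCC code; the tuple encoding of RTL
expressions is invented purely for demonstration:

```python
def canonicalize_address(expr):
    """Toy model: rewrite ('ashift', x, n) as ('mult', x, 2**n),
    recursing into operands, as expected of addresses inside a MEM."""
    if not isinstance(expr, tuple):
        return expr  # leaf: register, symbol, constant
    op, *args = expr
    args = [canonicalize_address(a) for a in args]
    if op == 'ashift':
        base, count = args
        return ('mult', base, 1 << count)
    return (op, *args)

# A scaled-index address like the one in the thread:
addr = ('plus', ('ashift', 'reg:SI 154', 2), 'symbol_ref')
print(canonicalize_address(addr))
# ('plus', ('mult', 'reg:SI 154', 4), 'symbol_ref')
```

Outside a MEM (e.g. an operand matched by the bare 'p' constraint), no such
rewrite is expected, which is why an ASHIFT can legitimately appear there.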


So I think the first hunk is fine.  There are two others that twiddle
find_reloads_address_1, which I think can only be reached from
find_reloads_address.  The comment at the front would indicate it's only
called where AD is inside a MEM.


Are we getting into find_reloads_address_1 in any case where the RTL is 
not an address inside a MEM?


jeff



Re: [PATCH] c++: bogus error w/ variadic concept-id as if cond [PR98394]

2021-11-08 Thread Patrick Palka via Gcc-patches
On Mon, 8 Nov 2021, Jason Merrill wrote:

> On 11/8/21 10:35, Patrick Palka wrote:
> > Here when tentatively parsing the if condition as a declaration, we try
> > to treat C<1> as the start of a constrained placeholder type, which we
> > quickly reject because C doesn't accept a type as its first argument.
> > But since we're parsing tentatively, we shouldn't emit an error in this
> > case.
> > 
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> > trunk/11?
> > 
> > PR c++/98394
> > 
> > gcc/cp/ChangeLog:
> > 
> > * parser.c (cp_parser_placeholder_type_specifier): Don't emit
> > a "does not constrain a type" error when parsing tentatively.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/cpp2a/concepts-pr98394.C: New test.
> > ---
> >   gcc/cp/parser.c   |  7 +--
> >   gcc/testsuite/g++.dg/cpp2a/concepts-pr98394.C | 14 ++
> >   2 files changed, 19 insertions(+), 2 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-pr98394.C
> > 
> > diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
> > index 4c2075742d6..f1498e28da4 100644
> > --- a/gcc/cp/parser.c
> > +++ b/gcc/cp/parser.c
> > @@ -19909,8 +19909,11 @@ cp_parser_placeholder_type_specifier (cp_parser *parser, location_t loc,
> > if (!flag_concepts_ts
> >   || !processing_template_parmlist)
> > {
> > - error_at (loc, "%qE does not constrain a type", DECL_NAME (con));
> > - inform (DECL_SOURCE_LOCATION (con), "concept defined here");
> > + if (!tentative)
> > +   {
> > + error_at (loc, "%qE does not constrain a type", DECL_NAME (con));
> > + inform (DECL_SOURCE_LOCATION (con), "concept defined here");
> > +   }
> 
> We probably want to cp_parser_simulate_error in the tentative case?

It seems the only caller, cp_parser_simple_type_specifier, is
responsible for that currently.

> 
> I also wonder why this code uses a "tentative" parameter instead of checking
> cp_parser_parsing_tentatively itself.

During the first call to cp_parser_placeholder_type_specifier from
cp_parser_simple_type_specifier, we're always parsing tentatively so
cp_parser_parsing_tentatively would always be true whereas 'tentative'
could be false if there's also an outer tentative parse.  So some
reworking of the caller would also be needed in order to avoid the
'tentative' parameter.  I tried, but I wasn't able to make it work
without introducing diagnostic regressions :/

While poking at the idea though, I was reminded of the related PR85846
which is about the concept-id C<> being rejected due to a stray error
being emitted during the tentative type parse.  The below patch
additionally fixes this PR since it just requires a one-line change in
cp_parser_placeholder_type_specifier.
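The tentative-parsing discipline being relied on here — diagnostics are
suppressed while a tentative parse is active, and a failure is merely
recorded so the caller can roll back and try another parse — can be
sketched as a toy model.  Python, illustrative only: apart from echoing
the names parsing_tentatively/simulate_error mentioned above, nothing in
this sketch corresponds to the real cp_parser structures:

```python
class TentativeParser:
    """Toy model of tentative parsing: while a tentative parse is
    active, errors are simulated instead of emitted, and only surface
    as a failed parse when the tentative region ends."""

    def __init__(self):
        self.tentative_depth = 0      # nesting of tentative parses
        self.simulated_error = False  # like cp_parser_simulate_error
        self.diagnostics = []

    def parsing_tentatively(self):
        return self.tentative_depth > 0

    def begin_tentative(self):
        self.tentative_depth += 1

    def error(self, msg):
        # The essence of the fix: no hard diagnostic when tentative;
        # just mark the tentative parse as failed.
        if self.parsing_tentatively():
            self.simulated_error = True
        else:
            self.diagnostics.append(msg)

    def end_tentative(self):
        """Return True if the tentative parse succeeded."""
        self.tentative_depth -= 1
        failed, self.simulated_error = self.simulated_error, False
        return not failed

p = TentativeParser()
p.begin_tentative()
p.error("'C<1>' does not constrain a type")  # suppressed
ok = p.end_tentative()
print(ok, p.diagnostics)  # False [] -- caller falls back silently
```

In this model the nested-tentative-parse subtlety Patrick describes is
visible too: parsing_tentatively() is true for any nesting depth, so it
cannot by itself distinguish the innermost tentative parse from an outer
one, which is roughly why the explicit 'tentative' parameter survives.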

-- >8 --

Subject: [PATCH] c++: bogus error w/ variadic concept-id as if cond [PR98394]

Here when tentatively parsing the if condition as a declaration, we try
to treat C<1> as the start of a constrained placeholder type, which we
quickly reject because C doesn't accept a type as its first argument.
But since we're parsing tentatively, we shouldn't emit an error in this
case.

In passing, also fix PR85846 by only overriding tentative to false when
given a concept-name, and not also when given a concept-id with empty
argument list.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/11?

PR c++/98394
PR c++/85846

gcc/cp/ChangeLog:

* parser.c (cp_parser_placeholder_type_specifier): Declare
static.  Don't override tentative to false when tmpl is a
concept-id with empty argument list.  Don't emit a "does not
constrain a type" error when tentative.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-pr98394.C: New test.
* g++.dg/cpp2a/concepts-pr85846.C: New test.
---
 gcc/cp/parser.c   | 11 +++
 gcc/testsuite/g++.dg/cpp2a/concepts-pr85846.C |  5 +
 gcc/testsuite/g++.dg/cpp2a/concepts-pr98394.C | 14 ++
 3 files changed, 26 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-pr85846.C
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-pr98394.C

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 4c2075742d6..71f782468ef 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -19855,7 +19855,7 @@ cp_parser_simple_type_specifier (cp_parser* parser,
   Note that the Concepts TS allows the auto or decltype(auto) to be
   omitted in a constrained-type-specifier.  */
 
-tree
+static tree
 cp_parser_placeholder_type_specifier (cp_parser *parser, location_t loc,
  tree tmpl, bool tentative)
 {
@@ -19871,7 +19871,7 @@ cp_parser_placeholder_type_specifier (cp_parser *parser, location_t loc,
   args = TREE_OPERAND (tmpl, 1);
   tmpl = TREE_OPERAND (tmpl, 0);
 }
-  if (args == NULL_TREE)
+  else if (args == NULL_TREE)
 /* A conc
