date:20230308

Re: [PATCH] libgomp: Fix default value of GOMP_SPINCOUNT [PR 109062]

2023-03-08 Thread Hongyu Wang via Gcc-patches

> I think the right spot to fix this would be instead in initialize_icvs,
> change the
>   icvs->wait_policy = 0;
> in there to
>   icvs->wait_policy = -1;
> That way it will be the default for all the devices, not just the
> initial one.

It doesn't work, for the code that determines value of wait_policy:

if (none != NULL && gomp_get_icv_flag (none->flags, GOMP_ICV_WAIT_POLICY))
  wait_policy = none->icvs.wait_policy;
else if (all != NULL && gomp_get_icv_flag (all->flags, GOMP_ICV_WAIT_POLICY))
  wait_policy = all->icvs.wait_policy;

gomp_get_icv_flag (none->flags, GOMP_ICV_WAIT_POLICY) returns true only when
OMP_WAIT_POLICY is explicitly set, so the initial icvs->wait_policy
could not affect the global wait_policy that used to set
GOMP_SPINCOUNT.

Re: [PATCH] libgomp: Fix default value of GOMP_SPINCOUNT [PR 109062]

2023-03-08 Thread Hongyu Wang via Gcc-patches

Hongyu Wang  于2023年3月8日周三 16:07写道：
>
> > I think the right spot to fix this would be instead in initialize_icvs,
> > change the
> >   icvs->wait_policy = 0;
> > in there to
> >   icvs->wait_policy = -1;
> > That way it will be the default for all the devices, not just the
> > initial one.
>
> It doesn't work, for the code that determines value of wait_policy:
>
> if (none != NULL && gomp_get_icv_flag (none->flags, GOMP_ICV_WAIT_POLICY))
>   wait_policy = none->icvs.wait_policy;
> else if (all != NULL && gomp_get_icv_flag (all->flags, GOMP_ICV_WAIT_POLICY))
>   wait_policy = all->icvs.wait_policy;
>
> gomp_get_icv_flag (none->flags, GOMP_ICV_WAIT_POLICY) returns true only when
> OMP_WAIT_POLICY is explicitly set, so the initial icvs->wait_policy
> could not affect the global wait_policy that used to set
> GOMP_SPINCOUNT.

Also the global variable wait_policy here is only used for setting
spin_count related values that do not
belong to any ICV, so there is no need to set icvs->wait_policy since
for OMP_WAIT_POLICY_(DEV|ALL)
itself only has value 0 for passive and value 1 for active.

Re: [PATCH] libgomp: Fix default value of GOMP_SPINCOUNT [PR 109062]

2023-03-08 Thread Jakub Jelinek via Gcc-patches

On Wed, Mar 08, 2023 at 04:21:46PM +0800, Hongyu Wang wrote:
> Hongyu Wang  于2023年3月8日周三 16:07写道：
> >
> > > I think the right spot to fix this would be instead in initialize_icvs,
> > > change the
> > >   icvs->wait_policy = 0;
> > > in there to
> > >   icvs->wait_policy = -1;
> > > That way it will be the default for all the devices, not just the
> > > initial one.
> >
> > It doesn't work, for the code that determines value of wait_policy:
> >
> > if (none != NULL && gomp_get_icv_flag (none->flags, GOMP_ICV_WAIT_POLICY))
> >   wait_policy = none->icvs.wait_policy;
> > else if (all != NULL && gomp_get_icv_flag (all->flags, 
> > GOMP_ICV_WAIT_POLICY))
> >   wait_policy = all->icvs.wait_policy;
> >
> > gomp_get_icv_flag (none->flags, GOMP_ICV_WAIT_POLICY) returns true only when
> > OMP_WAIT_POLICY is explicitly set, so the initial icvs->wait_policy
> > could not affect the global wait_policy that used to set
> > GOMP_SPINCOUNT.
> 
> Also the global variable wait_policy here is only used for setting
> spin_count related values that do not
> belong to any ICV, so there is no need to set icvs->wait_policy since
> for OMP_WAIT_POLICY_(DEV|ALL)
> itself only has value 0 for passive and value 1 for active.

Seems for many ICVs the default values are done through
gomp_default_icv_values, but that doesn't cover wait_policy.
For other vars, the defaults are provided through just initializers of
those vars on the var definitions, e.g.:
char *gomp_affinity_format_var = "level %L thread %i affinity %A";
So, I'd do the initialize_icvs change and change
static int wait_policy;
to
static int wait_policy = -1;

Jakub

Re: [PATCH 3/4]middle-end: Implement preferred_div_as_shifts_over_mult [PR108583]

2023-03-08 Thread Richard Sandiford via Gcc-patches

Tamar Christina  writes:
> Ping,
>
> And updated the hook to allow to differentiate between ISAs.
>
> As Andy said before initializing a ranger instance is cheap but not free, and 
> if
> the intention is to call it often during a pass it should be instantiated at
> pass startup and passed along to the places that need it.  This is a big
> refactoring and doesn't seem right to do in this PR.  But we should in GCC 14.
>
> Currently we only instantiate it after a long series of much cheaper checks.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> PR target/108583
> * target.def (preferred_div_as_shifts_over_mult): New.
> * doc/tm.texi.in: Document it.
> * doc/tm.texi: Regenerate.
> * targhooks.cc (default_preferred_div_as_shifts_over_mult): New.
> * targhooks.h (default_preferred_div_as_shifts_over_mult): New.
> * tree-vect-patterns.cc (vect_recog_divmod_pattern): Use it.
>
> gcc/testsuite/ChangeLog:
>
> PR target/108583
> * gcc.dg/vect/vect-div-bitmask-4.c: New test.
> * gcc.dg/vect/vect-div-bitmask-5.c: New test.
>
> --- inline copy of patch ---
>
> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> index 
> 50a8872a6695b18b9bed0d393bacf733833633db..f69f7f036272e867ea1c3fee851b117f057f68c5
>  100644
> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi
> @@ -6137,6 +6137,10 @@ instruction pattern.  There is no need for the hook to 
> handle these two
>  implementation approaches itself.
>  @end deftypefn
>
> +@deftypefn {Target Hook} bool 
> TARGET_VECTORIZE_PREFERRED_DIV_AS_SHIFTS_OVER_MULT (const_tree @var{type})
> +If possible, when decomposing a division operation of vectors of
> +type @var{type} during vectorization, prefer to use shifts rather than
> +multiplication by magic constants.
>  @end deftypefn
>
>  @deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION 
> (unsigned @var{code}, tree @var{vec_type_out}, tree @var{vec_type_in})
> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> index 
> 3e07978a02f4e6077adae6cadc93ea4273295f1f..0051017a7fd67691a343470f36ad4fc32c8e7e15
>  100644
> --- a/gcc/doc/tm.texi.in
> +++ b/gcc/doc/tm.texi.in
> @@ -4173,6 +4173,7 @@ address;  but often a machine-dependent strategy can 
> generate better code.
>
>  @hook TARGET_VECTORIZE_VEC_PERM_CONST
>
> +@hook TARGET_VECTORIZE_PREFERRED_DIV_AS_SHIFTS_OVER_MULT
>
>  @hook TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION
>
> diff --git a/gcc/target.def b/gcc/target.def
> index 
> e0a5c7adbd962f5d08ed08d1d81afa2c2baa64a5..bdee9b7f9c941508738fac49593b5baa525e2915
>  100644
> --- a/gcc/target.def
> +++ b/gcc/target.def
> @@ -1868,6 +1868,16 @@ correct for most targets.",
>   poly_uint64, (const_tree type),
>   default_preferred_vector_alignment)
>
> +/* Returns whether the target has a preference for decomposing divisions 
> using
> +   shifts rather than multiplies.  */
> +DEFHOOK
> +(preferred_div_as_shifts_over_mult,
> + "If possible, when decomposing a division operation of vectors of\n\
> +type @var{type} during vectorization, prefer to use shifts rather than\n\
> +multiplication by magic constants.",

Both approaches requires shifts though.  How about:

  Sometimes it is possible to implement a vector division using a sequence
  of two addition-shift pairs, giving four instructions in total.
  Return true if taking this approach for @var{vectype} is likely
  to be better than using a sequence involving highpart multiplication.

It should also say what the default is, more below.

> + bool, (const_tree type),
> + default_preferred_div_as_shifts_over_mult)
> +
>  /* Return true if vector alignment is reachable (by peeling N
> iterations) for the given scalar type.  */
>  DEFHOOK
> diff --git a/gcc/targhooks.h b/gcc/targhooks.h
> index 
> a6a4809ca91baa5d7fad2244549317a31390f0c2..a207963b9e6eb9300df0043e1b79aa6c941d0f7f
>  100644
> --- a/gcc/targhooks.h
> +++ b/gcc/targhooks.h
> @@ -53,6 +53,8 @@ extern scalar_int_mode default_unwind_word_mode (void);
>  extern unsigned HOST_WIDE_INT default_shift_truncation_mask
>(machine_mode);
>  extern unsigned int default_min_divisions_for_recip_mul (machine_mode);
> +extern bool default_preferred_div_as_shifts_over_mult
> +  (const_tree);
>  extern int default_mode_rep_extended (scalar_int_mode, scalar_int_mode);
>
>  extern tree default_stack_protect_guard (void);
> diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
> index 
> 211525720a620d6f533e2da91e03877337a931e7..becea6ef4b6329cfa0b676f8d844630fbdc97f20
>  100644
> --- a/gcc/targhooks.cc
> +++ b/gcc/targhooks.cc
> @@ -1483,6 +1483,15 @@ default_preferred_vector_alignment (const_tree type)
>return TYPE_ALIGN (type);
>  }
>
> +/* The default implementation of
> +   TARGET_VECTORIZE_PREFERRED_DIV_AS_SHIFTS_OVER_MULT.  */
> +
> +bool
> +default_preferred_div_as_shifts_over_mult (const_tree /* type */)
> +{
> +  return false;
> +}
> +

I think the de

Re: [PATCH 2/4][ranger]: Add range-ops for widen addition and widen multiplication [PR108583]

2023-03-08 Thread Aldy Hernandez via Gcc-patches

As Andrew has been advising on this one, I'd prefer for him to review
it.  However, he's on vacation this week.  FYI...

Aldy

On Mon, Mar 6, 2023 at 12:22 PM Tamar Christina  wrote:
>
> Ping.
>
> And updated the patch to reject cases that we don't expect or can handle 
> cleanly for now.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> PR target/108583
> * gimple-range-op.h (gimple_range_op_handler): Add maybe_non_standard.
> * gimple-range-op.cc 
> (gimple_range_op_handler::gimple_range_op_handler):
> Use it.
> (gimple_range_op_handler::maybe_non_standard): New.
> * range-op.cc (class operator_widen_plus_signed,
> operator_widen_plus_signed::wi_fold, class 
> operator_widen_plus_unsigned,
> operator_widen_plus_unsigned::wi_fold, class 
> operator_widen_mult_signed,
> operator_widen_mult_signed::wi_fold, class 
> operator_widen_mult_unsigned,
> operator_widen_mult_unsigned::wi_fold,
> ptr_op_widen_mult_signed, ptr_op_widen_mult_unsigned,
> ptr_op_widen_plus_signed, ptr_op_widen_plus_unsigned): New.
> * range-op.h (ptr_op_widen_mult_signed, ptr_op_widen_mult_unsigned,
> ptr_op_widen_plus_signed, ptr_op_widen_plus_unsigned): New
>
> Co-Authored-By: Andrew MacLeod 
>
> --- Inline copy of patch ---
>
> diff --git a/gcc/gimple-range-op.h b/gcc/gimple-range-op.h
> index 
> 743b858126e333ea9590c0f175aacb476260c048..1bf63c5ce6f5db924a1f5907ab4539e376281bd0
>  100644
> --- a/gcc/gimple-range-op.h
> +++ b/gcc/gimple-range-op.h
> @@ -41,6 +41,7 @@ public:
>  relation_trio = TRIO_VARYING);
>  private:
>void maybe_builtin_call ();
> +  void maybe_non_standard ();
>gimple *m_stmt;
>tree m_op1, m_op2;
>  };
> diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
> index 
> d9dfdc56939bb62ade72726b15c3d5e87e4ddcd1..a5d625387e712c170e1e68f6a7d494027f6ef0d0
>  100644
> --- a/gcc/gimple-range-op.cc
> +++ b/gcc/gimple-range-op.cc
> @@ -179,6 +179,8 @@ gimple_range_op_handler::gimple_range_op_handler (gimple 
> *s)
>// statements.
>if (is_a  (m_stmt))
>  maybe_builtin_call ();
> +  else
> +maybe_non_standard ();
>  }
>
>  // Calculate what we can determine of the range of this unary
> @@ -764,6 +766,57 @@ public:
>}
>  } op_cfn_parity;
>
> +// Set up a gimple_range_op_handler for any nonstandard function which can be
> +// supported via range-ops.
> +
> +void
> +gimple_range_op_handler::maybe_non_standard ()
> +{
> +  range_operator *signed_op = ptr_op_widen_mult_signed;
> +  range_operator *unsigned_op = ptr_op_widen_mult_unsigned;
> +  if (gimple_code (m_stmt) == GIMPLE_ASSIGN)
> +switch (gimple_assign_rhs_code (m_stmt))
> +  {
> +   case WIDEN_PLUS_EXPR:
> +   {
> + signed_op = ptr_op_widen_plus_signed;
> + unsigned_op = ptr_op_widen_plus_unsigned;
> +   }
> +   gcc_fallthrough ();
> +   case WIDEN_MULT_EXPR:
> +   {
> + m_valid = false;
> + m_op1 = gimple_assign_rhs1 (m_stmt);
> + m_op2 = gimple_assign_rhs2 (m_stmt);
> + tree ret = gimple_assign_lhs (m_stmt);
> + bool signed1 = TYPE_SIGN (TREE_TYPE (m_op1)) == SIGNED;
> + bool signed2 = TYPE_SIGN (TREE_TYPE (m_op2)) == SIGNED;
> + bool signed_ret = TYPE_SIGN (TREE_TYPE (ret)) == SIGNED;
> +
> + /* Normally these operands should all have the same sign, but
> +some passes and violate this by taking mismatched sign args.  At
> +the moment the only one that's possible is mismatch inputs and
> +unsigned output.  Once ranger supports signs for the operands we
> +can properly fix it,  for now only accept the case we can do
> +correctly.  */
> + if ((signed1 ^ signed2) && signed_ret)
> +   return;
> +
> + m_valid = true;
> + if (signed2 && !signed1)
> +   std::swap (m_op1, m_op2);
> +
> + if (signed1 || signed2)
> +   m_int = signed_op;
> + else
> +   m_int = unsigned_op;
> + break;
> +   }
> +   default:
> + break;
> +  }
> +}
> +
>  // Set up a gimple_range_op_handler for any built in function which can be
>  // supported via range-ops.
>
> diff --git a/gcc/range-op.h b/gcc/range-op.h
> index 
> f00b747f08a1fa8404c63bfe5a931b4048008b03..b1eeac70df81f2bdf228af7adff5399e7ac5e5d6
>  100644
> --- a/gcc/range-op.h
> +++ b/gcc/range-op.h
> @@ -311,4 +311,8 @@ private:
>  // This holds the range op table for floating point operations.
>  extern floating_op_table *floating_tree_table;
>
> +extern range_operator *ptr_op_widen_mult_signed;
> +extern range_operator *ptr_op_widen_mult_unsigned;
> +extern range_operator *ptr_op_widen_plus_signed;
> +extern range_operator *ptr_op_widen_plus_unsigned;
>  #endif // GCC_RANGE_OP_H
> diff --git a/gcc/range-op.cc b/gcc/range-op.cc
> ind

Re: [PATCH] libgomp: Fix default value of GOMP_SPINCOUNT [PR 109062]

2023-03-08 Thread Hongyu Wang via Gcc-patches

> Seems for many ICVs the default values are done through
> gomp_default_icv_values, but that doesn't cover wait_policy.
> For other vars, the defaults are provided through just initializers of
> those vars on the var definitions, e.g.:
> char *gomp_affinity_format_var = "level %L thread %i affinity %A";
> So, I'd do the initialize_icvs change and change
> static int wait_policy;
> to
> static int wait_policy = -1;

Agreed, here is the updated patch, ok for trunk?

When OMP_WAIT_POLICY is not specified, current implementation will cause
icv flag GOMP_ICV_WAIT_POLICY unset, so global variable wait_policy
will remain its uninitialized value. Initialize it to -1 to make
GOMP_SPINCOUNT behavior consistent with its description.

libgomp/ChangeLog:

PR libgomp/109062
* env.c (wait_policy): Initialize to -1.
(initialize_icvs): Initialize icvs->wait_policy to -1.
* testsuite/libgomp.c-c++-common/pr109062.c: New test.
---
 libgomp/env.c |  4 ++--
 libgomp/testsuite/libgomp.c-c++-common/pr109062.c | 14 ++
 2 files changed, 16 insertions(+), 2 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/pr109062.c

diff --git a/libgomp/env.c b/libgomp/env.c
index c41c1f852cc..e7a035b593c 100644
--- a/libgomp/env.c
+++ b/libgomp/env.c
@@ -124,7 +124,7 @@ int goacc_default_dims[GOMP_DIM_MAX];

 #ifndef LIBGOMP_OFFLOADED_ONLY

-static int wait_policy;
+static int wait_policy = -1;
 static unsigned long stacksize = GOMP_DEFAULT_STACKSIZE;

 static void
@@ -1981,7 +1981,7 @@ initialize_icvs (struct gomp_initial_icvs *icvs)
   icvs->bind_var = gomp_default_icv_values.bind_var;
   icvs->nteams_var = gomp_default_icv_values.nteams_var;
   icvs->teams_thread_limit_var =
gomp_default_icv_values.teams_thread_limit_var;
-  icvs->wait_policy = 0;
+  icvs->wait_policy = -1;
 }

 /* Helper function for initialize_env to add a device specific ICV value
diff --git a/libgomp/testsuite/libgomp.c-c++-common/pr109062.c
b/libgomp/testsuite/libgomp.c-c++-common/pr109062.c
new file mode 100644
index 000..5c7c287dafd
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/pr109062.c
@@ -0,0 +1,14 @@
+/* { dg-do run } */
+
+#include 
+#include 
+
+int
+main ()
+{
+  omp_display_env (1);
+
+  return 0;
+}
+
+/* { dg-output ".*\\\[host] GOMP_SPINCOUNT = '30'.*" { target native } } */
-- 
2.31.1

Re: [PATCH] libgomp: Fix default value of GOMP_SPINCOUNT [PR 109062]

2023-03-08 Thread Jakub Jelinek via Gcc-patches

On Wed, Mar 08, 2023 at 04:54:20PM +0800, Hongyu Wang wrote:
> > Seems for many ICVs the default values are done through
> > gomp_default_icv_values, but that doesn't cover wait_policy.
> > For other vars, the defaults are provided through just initializers of
> > those vars on the var definitions, e.g.:
> > char *gomp_affinity_format_var = "level %L thread %i affinity %A";
> > So, I'd do the initialize_icvs change and change
> > static int wait_policy;
> > to
> > static int wait_policy = -1;
> 
> Agreed, here is the updated patch, ok for trunk?
> 
> When OMP_WAIT_POLICY is not specified, current implementation will cause
> icv flag GOMP_ICV_WAIT_POLICY unset, so global variable wait_policy
> will remain its uninitialized value. Initialize it to -1 to make
> GOMP_SPINCOUNT behavior consistent with its description.
> 
> libgomp/ChangeLog:
> 
> PR libgomp/109062
> * env.c (wait_policy): Initialize to -1.
> (initialize_icvs): Initialize icvs->wait_policy to -1.
> * testsuite/libgomp.c-c++-common/pr109062.c: New test.

Ok, thanks.

Jakub

Re: [PATCH v2 0/5] A small Texinfo refinement

2023-03-08 Thread Arsen Arsenović via Gcc-patches

Sandra Loosemore  writes:

> On 2/23/23 03:27, Arsen Arsenović via Gcc-patches wrote:
>> I've rerendered the updated documentation with latest development
>> Texinfo (as some of the changes I made for the purposes of the GCC
>> manual still aren't in releases) at:
>>https://www.aarsen.me/~arsen/final/
>
> Ummm.  I don't think GCC's documentation should depend on an unreleased 
> version
> of Texinfo.  Currently install.texi documents that version 4.7 or later is
> required, 4.8 for "make pdf"; did I miss something in your patch set that 
> bumps
> this requirement?  Exactly what features do you depend on that are not yet
> supported by an official Texinfo release?

This patch should still build with older Texinfo versions (albeit, I
hadn't tested 4.7, I missed that requirement).  The unreleased version
should be installed on the server building HTML documentation as it
produces better results w.r.t clickable anchors and index-in-table
handling.  It should not be a hard dependency, and should only degrade
to its current state should in-dev Texinfo be missing.

It might be worth bumping the minimum, 4.7 is a version from 2004; in
the meanwhile, I'll try a few older versions too.

Thanks, have a lovely day.
-- 
Arsen Arsenović

signature.asc
Description: PGP signature

Re: [PATCH 4/4]AArch64 Update div-bitmask to implement new optab instead of target hook [PR108583]

2023-03-08 Thread Richard Sandiford via Gcc-patches

Tamar Christina  writes:
> Ping,
>
> And updating the hook.
>
> There are no new test as new correctness tests were added to the mid-end and
> the existing codegen tests for this already exist.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> PR target/108583
> * config/aarch64/aarch64-simd.md (@aarch64_bitmask_udiv3): 
> Remove.
> (*bitmask_shift_plus): New.
> * config/aarch64/aarch64-sve2.md (*bitmask_shift_plus): New.
> (@aarch64_bitmask_udiv3): Remove.
> * config/aarch64/aarch64.cc
> (aarch64_vectorize_can_special_div_by_constant,
> TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST): Removed.
> (TARGET_VECTORIZE_PREFERRED_DIV_AS_SHIFTS_OVER_MULT,
> aarch64_vectorize_preferred_div_as_shifts_over_mult): New.
>
> --- inline copy of patch ---
>
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 
> 7f212bf37cd2c120dceb7efa733c9fa76226f029..e1ecb88634f93d380ef534093ea6599dc7278108
>  100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -4867,60 +4867,27 @@ (define_expand "aarch64_hn2"
>}
>  )
>
> -;; div optimizations using narrowings
> -;; we can do the division e.g. shorts by 255 faster by calculating it as
> -;; (x + ((x + 257) >> 8)) >> 8 assuming the operation is done in
> -;; double the precision of x.
> -;;
> -;; If we imagine a short as being composed of two blocks of bytes then
> -;; adding 257 or 0b_0001__0001 to the number is equivalent to
> -;; adding 1 to each sub component:
> -;;
> -;;  short value of 16-bits
> -;; ┌──┬┐
> -;; │  ││
> -;; └──┴┘
> -;;   8-bit part1 ▲  8-bit part2   ▲
> -;;   ││
> -;;   ││
> -;;  +1   +1
> -;;
> -;; after the first addition, we have to shift right by 8, and narrow the
> -;; results back to a byte.  Remember that the addition must be done in
> -;; double the precision of the input.  Since 8 is half the size of a short
> -;; we can use a narrowing halfing instruction in AArch64, addhn which also
> -;; does the addition in a wider precision and narrows back to a byte.  The
> -;; shift itself is implicit in the operation as it writes back only the top
> -;; half of the result. i.e. bits 2*esize-1:esize.
> -;;
> -;; Since we have narrowed the result of the first part back to a byte, for
> -;; the second addition we can use a widening addition, uaddw.
> -;;
> -;; For the final shift, since it's unsigned arithmetic we emit an ushr by 8.
> -;;
> -;; The shift is later optimized by combine to a uzp2 with movi #0.
> -(define_expand "@aarch64_bitmask_udiv3"
> -  [(match_operand:VQN 0 "register_operand")
> -   (match_operand:VQN 1 "register_operand")
> -   (match_operand:VQN 2 "immediate_operand")]
> +;; Optimize ((a + b) >> n) + c where n is half the bitsize of the vector
> +(define_insn_and_split "*bitmask_shift_plus"
> +  [(set (match_operand:VQN 0 "register_operand" "=&w")
> +   (plus:VQN
> + (lshiftrt:VQN
> +   (plus:VQN (match_operand:VQN 1 "register_operand" "w")
> + (match_operand:VQN 2 "register_operand" "w"))
> +   (match_operand:VQN 3 "aarch64_simd_shift_imm_vec_exact_top" "Dr"))

I guess this is personal preference, sorry, but I think we should drop
the constraint.  The predicate does the real check, and the operand is
never reloaded, so "Dr" isn't any more helpful than an empty constraint,
and IMO can be confusing.

> + (match_operand:VQN 4 "register_operand" "w")))]
>"TARGET_SIMD"
> +  "#"
> +  "&& true"
> +  [(const_int 0)]
>  {
> -  unsigned HOST_WIDE_INT size
> -= (1ULL << GET_MODE_UNIT_BITSIZE (mode)) - 1;
> -  rtx elt = unwrap_const_vec_duplicate (operands[2]);
> -  if (!CONST_INT_P (elt) || UINTVAL (elt) != size)
> -FAIL;
> -
> -  rtx addend = gen_reg_rtx (mode);
> -  rtx val = aarch64_simd_gen_const_vector_dup (mode, 1);
> -  emit_move_insn (addend, lowpart_subreg (mode, val, mode));
> -  rtx tmp1 = gen_reg_rtx (mode);
> -  rtx tmp2 = gen_reg_rtx (mode);
> -  emit_insn (gen_aarch64_addhn (tmp1, operands[1], addend));
> -  unsigned bitsize = GET_MODE_UNIT_BITSIZE (mode);
> -  rtx shift_vector = aarch64_simd_gen_const_vector_dup (mode, bitsize);
> -  emit_insn (gen_aarch64_uaddw (tmp2, operands[1], tmp1));
> -  emit_insn (gen_aarch64_simd_lshr (operands[0], tmp2, shift_vector));
> +  rtx tmp;
> +  if (can_create_pseudo_p ())
> +tmp = gen_reg_rtx (mode);
> +  else
> +tmp = gen_rtx_REG (mode, REGNO (operands[0]));
> +  emit_insn (gen_aarch64_addhn (tmp, operands[1], operands[2]));
> +  emit_insn (gen_aarch64_uaddw (operands[0], operands[4], tmp));
>DONE;
>  })

In the previous review, I said:

  However, IIUC, this pattern would only be formed from combining
  th

RE: [PATCH 4/4]AArch64 Update div-bitmask to implement new optab instead of target hook [PR108583]

2023-03-08 Thread Tamar Christina via Gcc-patches

> -Original Message-
> From: Richard Sandiford 
> Sent: Wednesday, March 8, 2023 9:18 AM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; Marcus Shawcroft
> ; Kyrylo Tkachov 
> Subject: Re: [PATCH 4/4]AArch64 Update div-bitmask to implement new
> optab instead of target hook [PR108583]
> 
> Tamar Christina  writes:
> > Ping,
> >
> > And updating the hook.
> >
> > There are no new test as new correctness tests were added to the
> > mid-end and the existing codegen tests for this already exist.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > PR target/108583
> > * config/aarch64/aarch64-simd.md
> (@aarch64_bitmask_udiv3): Remove.
> > (*bitmask_shift_plus): New.
> > * config/aarch64/aarch64-sve2.md (*bitmask_shift_plus): New.
> > (@aarch64_bitmask_udiv3): Remove.
> > * config/aarch64/aarch64.cc
> > (aarch64_vectorize_can_special_div_by_constant,
> > TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST): Removed.
> > (TARGET_VECTORIZE_PREFERRED_DIV_AS_SHIFTS_OVER_MULT,
> > aarch64_vectorize_preferred_div_as_shifts_over_mult): New.
> >
> > --- inline copy of patch ---
> >
> > diff --git a/gcc/config/aarch64/aarch64-simd.md
> > b/gcc/config/aarch64/aarch64-simd.md
> > index
> >
> 7f212bf37cd2c120dceb7efa733c9fa76226f029..e1ecb88634f93d380ef534
> 093ea6
> > 599dc7278108 100644
> > --- a/gcc/config/aarch64/aarch64-simd.md
> > +++ b/gcc/config/aarch64/aarch64-simd.md
> > @@ -4867,60 +4867,27 @@ (define_expand
> "aarch64_hn2"
> >}
> >  )
> >
> > -;; div optimizations using narrowings -;; we can do the division e.g.
> > shorts by 255 faster by calculating it as -;; (x + ((x + 257) >> 8))
> > >> 8 assuming the operation is done in -;; double the precision of x.
> > -;;
> > -;; If we imagine a short as being composed of two blocks of bytes
> > then -;; adding 257 or 0b_0001__0001 to the number is
> > equivalent to -;; adding 1 to each sub component:
> > -;;
> > -;;  short value of 16-bits
> > -;; ┌──┬┐
> > -;; │  ││
> > -;; └──┴┘
> > -;;   8-bit part1 ▲  8-bit part2   ▲
> > -;;   ││
> > -;;   ││
> > -;;  +1   +1
> > -;;
> > -;; after the first addition, we have to shift right by 8, and narrow
> > the -;; results back to a byte.  Remember that the addition must be
> > done in -;; double the precision of the input.  Since 8 is half the
> > size of a short -;; we can use a narrowing halfing instruction in
> > AArch64, addhn which also -;; does the addition in a wider precision
> > and narrows back to a byte.  The -;; shift itself is implicit in the
> > operation as it writes back only the top -;; half of the result. i.e. bits 
> > 2*esize-
> 1:esize.
> > -;;
> > -;; Since we have narrowed the result of the first part back to a
> > byte, for -;; the second addition we can use a widening addition, uaddw.
> > -;;
> > -;; For the final shift, since it's unsigned arithmetic we emit an ushr by 
> > 8.
> > -;;
> > -;; The shift is later optimized by combine to a uzp2 with movi #0.
> > -(define_expand "@aarch64_bitmask_udiv3"
> > -  [(match_operand:VQN 0 "register_operand")
> > -   (match_operand:VQN 1 "register_operand")
> > -   (match_operand:VQN 2 "immediate_operand")]
> > +;; Optimize ((a + b) >> n) + c where n is half the bitsize of the
> > +vector (define_insn_and_split "*bitmask_shift_plus"
> > +  [(set (match_operand:VQN 0 "register_operand" "=&w")
> > +   (plus:VQN
> > + (lshiftrt:VQN
> > +   (plus:VQN (match_operand:VQN 1 "register_operand" "w")
> > + (match_operand:VQN 2 "register_operand" "w"))
> > +   (match_operand:VQN 3
> > +"aarch64_simd_shift_imm_vec_exact_top" "Dr"))
> 
> I guess this is personal preference, sorry, but I think we should drop the
> constraint.  The predicate does the real check, and the operand is never
> reloaded, so "Dr" isn't any more helpful than an empty constraint, and IMO
> can be confusing.
> 
> > + (match_operand:VQN 4 "register_operand" "w")))]
> >"TARGET_SIMD"
> > +  "#"
> > +  "&& true"
> > +  [(const_int 0)]
> >  {
> > -  unsigned HOST_WIDE_INT size
> > -= (1ULL << GET_MODE_UNIT_BITSIZE (mode)) - 1;
> > -  rtx elt = unwrap_const_vec_duplicate (operands[2]);
> > -  if (!CONST_INT_P (elt) || UINTVAL (elt) != size)
> > -FAIL;
> > -
> > -  rtx addend = gen_reg_rtx (mode);
> > -  rtx val = aarch64_simd_gen_const_vector_dup (mode, 1);
> > -  emit_move_insn (addend, lowpart_subreg (mode, val,
> > mode));
> > -  rtx tmp1 = gen_reg_rtx (mode);
> > -  rtx tmp2 = gen_reg_rtx (mode);
> > -  emit_insn (gen_aarch64_addhn (tmp1, operands[1], addend));
> > -  unsigned bitsize = GET_MODE_UNIT_BITSIZE (mode);
> > -  rtx shift_vector = aarch64_simd_ge

[PATCH] middle-end/108995 - avoid folding when sanitizing overflow

2023-03-08 Thread Richard Biener via Gcc-patches

The following plugs one place in extract_muldiv where it should avoid
folding when sanitizing overflow.

I'm unsure about the testcase, I didn't find any that tests for
a runtime sanitizer error ...

Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK?

PR middle-end/108995
* fold-const.cc (extract_muldiv_1): Avoid folding
(CST * b) / CST2 when sanitizing overflow and we rely on
overflow being undefined.

* gcc.dg/ubsan/pr108995.c: New testcase.
---
 gcc/fold-const.cc |  7 +++
 gcc/testsuite/gcc.dg/ubsan/pr108995.c | 15 +++
 2 files changed, 18 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/ubsan/pr108995.c

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 99882ef820a..02a24c5fe65 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -7093,6 +7093,7 @@ extract_muldiv_1 (tree t, tree c, enum tree_code code, 
tree wide_type,
 If we have an unsigned type, we cannot do this since it will change
 the result if the original computation overflowed.  */
   if (TYPE_OVERFLOW_UNDEFINED (ctype)
+ && !TYPE_OVERFLOW_SANITIZED (ctype)
  && ((code == MULT_EXPR && tcode == EXACT_DIV_EXPR)
  || (tcode == MULT_EXPR
  && code != TRUNC_MOD_EXPR && code != CEIL_MOD_EXPR
@@ -7102,8 +7103,7 @@ extract_muldiv_1 (tree t, tree c, enum tree_code code, 
tree wide_type,
  if (wi::multiple_of_p (wi::to_wide (op1), wi::to_wide (c),
 TYPE_SIGN (type)))
{
- if (TYPE_OVERFLOW_UNDEFINED (ctype))
-   *strict_overflow_p = true;
+ *strict_overflow_p = true;
  return fold_build2 (tcode, ctype, fold_convert (ctype, op0),
  fold_convert (ctype,
const_binop (TRUNC_DIV_EXPR,
@@ -7112,8 +7112,7 @@ extract_muldiv_1 (tree t, tree c, enum tree_code code, 
tree wide_type,
  else if (wi::multiple_of_p (wi::to_wide (c), wi::to_wide (op1),
  TYPE_SIGN (type)))
{
- if (TYPE_OVERFLOW_UNDEFINED (ctype))
-   *strict_overflow_p = true;
+ *strict_overflow_p = true;
  return fold_build2 (code, ctype, fold_convert (ctype, op0),
  fold_convert (ctype,
const_binop (TRUNC_DIV_EXPR,
diff --git a/gcc/testsuite/gcc.dg/ubsan/pr108995.c 
b/gcc/testsuite/gcc.dg/ubsan/pr108995.c
new file mode 100644
index 000..79a178c6751
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ubsan/pr108995.c
@@ -0,0 +1,15 @@
+/* { dg-do run { xfail *-*-* } } */
+/* With optimization we constant fold and diagnose the overflow and do
+   not sanitize anything.  */
+/* { dg-skip-if "" { *-*-* } { "*" } { ! "-O0" } } */
+/* { dg-options "-fsanitize=undefined -fno-sanitize-recover=undefined" } */
+
+int a;
+const int b = 44514;
+int *c = &a;
+
+int main ()
+{
+  *c = 65526 * b / 6;
+  return 0;
+}
-- 
2.35.3

Re: [PATCH] [RFC] RAII auto_mpfr and autp_mpz

2023-03-08 Thread Jonathan Wakely via Gcc-patches

On Wed, 8 Mar 2023 at 07:25, Richard Biener wrote:
>
> On Wed, 8 Mar 2023, Alexander Monakov wrote:
>
> >
> > On Tue, 7 Mar 2023, Jonathan Wakely wrote:
> >
> > > > Shouldn't this use the idiom suggested in ansidecl.h, i.e.
> > > >
> > > >   private:
> > > > DISABLE_COPY_AND_ASSIGN (auto_mpfr);
> > >
> > >
> > > Why? A macro like that (or a base class like boost::noncopyable) has
> > > some value in a code base that wants to work for both C++03 and C++11
> > > (or later). But in GCC we know we have C++11 now, so we can just
> > > delete members. I don't see what the macro adds.
> >
> > Evidently it's possible to forget to delete one of the members, as
> > showcased in this very thread.
>
> Yes.  And I copy&pasted from somewhere I forgot which also forgot it ...

Looks like gcc/gimple-predicate-analysis.h and gcc/timevar.h each have
two cases of a deleted copy constructor but no deleted assignment.
I'll send a patch for those.


> > The idiom is also slightly easier to read.
>
> Of course inconsistency in the code-base isn't helping that.
> auto_bitmap seems to declare but not define things (including
> move assign/CTOR?)

I'll change those to deleted too.

Re: [PATCH 1/2] ipa-cp: Fix various issues in update_specialized_profile (PR 107925)

2023-03-08 Thread Martin Jambor

Hello,

I'd like to ping the patch below.

Martin


On Tue, Feb 21 2023, Martin Jambor wrote:
> Hi,
>
> the patch below fixes various issues in function
> update_specialized_profile.  The main is removal of the assert which
> is bogus in the case of recursive cloning.  The division of
> unexplained counts is guesswork, which then leads to updates of counts
> of recursive edges, which then can be redirected to the new clone and
> their count subtracted from the count and there simply may not be
> enough left in the count of the original node - especially when we
> clone a lot because of using --param ipa-cp-eval-threshold=1.
>
> The other issue was omission to drop the count of the original node to
> ipa count.  And when calculating the remainder, we should use
> lenient_count_portion_handling to account for partial train runs.
> Finally, the patch adds dumping of the original count which I think
> is useful.
>
> Profiled-LTO-bootstrapped on its own and also normally bootstrapped and
> tested together with the subsequent patch on an x86_64-linux.  OK for
> master and the 12 branch - assuming it is also affected?
>
> Thanks,
>
> Martin
>
>
> gcc/ChangeLog:
>
> 2023-02-17  Martin Jambor  
>
>   PR ipa/107925
>   * ipa-cp.cc (update_specialized_profile): Drop orig_node_count to
>   ipa count, remove assert, lenient_count_portion_handling, dump
>   also orig_node_count.
> ---
>  gcc/ipa-cp.cc | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
> index 4b8dedc0c51..5a6b41cf2d6 100644
> --- a/gcc/ipa-cp.cc
> +++ b/gcc/ipa-cp.cc
> @@ -5093,22 +5093,24 @@ update_specialized_profile (struct cgraph_node 
> *new_node,
>   profile_count redirected_sum)
>  {
>struct cgraph_edge *cs;
> -  profile_count new_node_count, orig_node_count = orig_node->count;
> +  profile_count new_node_count, orig_node_count = orig_node->count.ipa ();
>  
>if (dump_file)
>  {
>fprintf (dump_file, "the sum of counts of redirected  edges is ");
>redirected_sum.dump (dump_file);
> +  fprintf (dump_file, "\nold ipa count of the original node is ");
> +  orig_node_count.dump (dump_file);
>fprintf (dump_file, "\n");
>  }
>if (!(orig_node_count > profile_count::zero ()))
>  return;
>  
> -  gcc_assert (orig_node_count >= redirected_sum);
> -
>new_node_count = new_node->count;
>new_node->count += redirected_sum;
> -  orig_node->count -= redirected_sum;
> +  orig_node->count
> += lenient_count_portion_handling (orig_node->count - redirected_sum,
> +   orig_node);
>  
>for (cs = new_node->callees; cs; cs = cs->next_callee)
>  cs->count += cs->count.apply_scale (redirected_sum, new_node_count);
> -- 
> 2.39.1

Re: [PATCH 2/2] ipa-cp: Improve updating behavior when profile counts have gone bad

2023-03-08 Thread Martin Jambor

Hello,

I'd like to ping the patch below.

Martin


On Tue, Feb 21 2023, Martin Jambor wrote:
> Hi,
>
> Looking into the behavior of profile count updating in PR 107925, I
> noticed that an option not considered possible was actually happening,
> and - with the guesswork in place to distribute unexplained counts -
> it simply can happen.  Currently it is handled by dropping the counts
> to local estimated zero, whereas it is probably better to leave the
> count as they are but drop the category to GUESSED_GLOBAL0 - which is
> what profile_count::combine_with_ipa_count in a similar case (or so I
> hope :-)
>
> Profiled-LTO-bootstrapped and normally bootstrapped and tested on an
> x86_64-linux.  OK for master once stage1 opens up?  Or perhaps even now?
>
> Thanks,
>
> Martin
>
>
> gcc/ChangeLog:
>
> 2023-02-20  Martin Jambor  
>
>   PR ipa/107925
>   * ipa-cp.cc (update_profiling_info): Drop counts of orig_node to
>   global0 instead of zeroing when it does not have as many counts as
>   it should.
> ---
>  gcc/ipa-cp.cc | 29 ++---
>  1 file changed, 22 insertions(+), 7 deletions(-)
>
> diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
> index 5a6b41cf2d6..6477bb840e5 100644
> --- a/gcc/ipa-cp.cc
> +++ b/gcc/ipa-cp.cc
> @@ -4969,10 +4969,20 @@ update_profiling_info (struct cgraph_node *orig_node,
> false);
>new_sum = stats.count_sum;
>  
> +  bool orig_edges_processed = false;
>if (new_sum > orig_node_count)
>  {
> -  /* TODO: Perhaps this should be gcc_unreachable ()?  */
> -  remainder = profile_count::zero ().guessed_local ();
> +  /* TODO: Profile has alreay gone astray, keep what we have but lower it
> +  to global0 category.  */
> +  remainder = orig_node->count.global0 ();
> +
> +  for (cgraph_edge *cs = orig_node->callees; cs; cs = cs->next_callee)
> + cs->count = cs->count.global0 ();
> +  for (cgraph_edge *cs = orig_node->indirect_calls;
> +cs;
> +cs = cs->next_callee)
> + cs->count = cs->count.global0 ();
> +  orig_edges_processed = true;
>  }
>else if (stats.rec_count_sum.nonzero_p ())
>  {
> @@ -5070,11 +5080,16 @@ update_profiling_info (struct cgraph_node *orig_node,
>for (cgraph_edge *cs = new_node->indirect_calls; cs; cs = cs->next_callee)
>  cs->count = cs->count.apply_scale (new_sum, orig_new_node_count);
>  
> -  profile_count::adjust_for_ipa_scaling (&remainder, &orig_node_count);
> -  for (cgraph_edge *cs = orig_node->callees; cs; cs = cs->next_callee)
> -cs->count = cs->count.apply_scale (remainder, orig_node_count);
> -  for (cgraph_edge *cs = orig_node->indirect_calls; cs; cs = cs->next_callee)
> -cs->count = cs->count.apply_scale (remainder, orig_node_count);
> +  if (!orig_edges_processed)
> +{
> +  profile_count::adjust_for_ipa_scaling (&remainder, &orig_node_count);
> +  for (cgraph_edge *cs = orig_node->callees; cs; cs = cs->next_callee)
> + cs->count = cs->count.apply_scale (remainder, orig_node_count);
> +  for (cgraph_edge *cs = orig_node->indirect_calls;
> +cs;
> +cs = cs->next_callee)
> + cs->count = cs->count.apply_scale (remainder, orig_node_count);
> +}
>  
>if (dump_file)
>  {
> -- 
> 2.39.1

Re: [PATCH] [RFC] RAII auto_mpfr and autp_mpz

2023-03-08 Thread Jakub Jelinek via Gcc-patches

On Wed, Mar 08, 2023 at 10:13:39AM +, Jonathan Wakely wrote:
> On Wed, 8 Mar 2023 at 07:25, Richard Biener wrote:
> >
> > On Wed, 8 Mar 2023, Alexander Monakov wrote:
> >
> > >
> > > On Tue, 7 Mar 2023, Jonathan Wakely wrote:
> > >
> > > > > Shouldn't this use the idiom suggested in ansidecl.h, i.e.
> > > > >
> > > > >   private:
> > > > > DISABLE_COPY_AND_ASSIGN (auto_mpfr);
> > > >
> > > >
> > > > Why? A macro like that (or a base class like boost::noncopyable) has
> > > > some value in a code base that wants to work for both C++03 and C++11
> > > > (or later). But in GCC we know we have C++11 now, so we can just
> > > > delete members. I don't see what the macro adds.
> > >
> > > Evidently it's possible to forget to delete one of the members, as
> > > showcased in this very thread.
> >
> > Yes.  And I copy&pasted from somewhere I forgot which also forgot it ...
> 
> Looks like gcc/gimple-predicate-analysis.h and gcc/timevar.h each have
> two cases of a deleted copy constructor but no deleted assignment.
> I'll send a patch for those.

So perhaps use the = delete decls and a standardized comment (same wording
for all the auto_* types) above those which says what the above macro says?
Then one can grep that comment etc.

Jakub

Re: [PATCH 4/4]AArch64 Update div-bitmask to implement new optab instead of target hook [PR108583]

2023-03-08 Thread Richard Sandiford via Gcc-patches

Tamar Christina  writes:
>> -Original Message-
>> > + (match_operand:VQN 4 "register_operand" "w")))]
>> >"TARGET_SIMD"
>> > +  "#"
>> > +  "&& true"
>> > +  [(const_int 0)]
>> >  {
>> > -  unsigned HOST_WIDE_INT size
>> > -= (1ULL << GET_MODE_UNIT_BITSIZE (mode)) - 1;
>> > -  rtx elt = unwrap_const_vec_duplicate (operands[2]);
>> > -  if (!CONST_INT_P (elt) || UINTVAL (elt) != size)
>> > -FAIL;
>> > -
>> > -  rtx addend = gen_reg_rtx (mode);
>> > -  rtx val = aarch64_simd_gen_const_vector_dup (mode, 1);
>> > -  emit_move_insn (addend, lowpart_subreg (mode, val,
>> > mode));
>> > -  rtx tmp1 = gen_reg_rtx (mode);
>> > -  rtx tmp2 = gen_reg_rtx (mode);
>> > -  emit_insn (gen_aarch64_addhn (tmp1, operands[1], addend));
>> > -  unsigned bitsize = GET_MODE_UNIT_BITSIZE (mode);
>> > -  rtx shift_vector = aarch64_simd_gen_const_vector_dup (mode,
>> > bitsize);
>> > -  emit_insn (gen_aarch64_uaddw (tmp2, operands[1], tmp1));
>> > -  emit_insn (gen_aarch64_simd_lshr (operands[0], tmp2,
>> > shift_vector));
>> > +  rtx tmp;
>> > +  if (can_create_pseudo_p ())
>> > +tmp = gen_reg_rtx (mode);  else
>> > +tmp = gen_rtx_REG (mode, REGNO (operands[0]));
>> > + emit_insn (gen_aarch64_addhn (tmp, operands[1],
>> operands[2]));
>> > + emit_insn (gen_aarch64_uaddw (operands[0], operands[4],
>> > + tmp));
>> >DONE;
>> >  })
>> 
>> In the previous review, I said:
>> 
>>   However, IIUC, this pattern would only be formed from combining
>>   three distinct patterns.  Is that right?  If so, we should be able
>>   to handle it as a plain define_split, with no define_insn.
>>   That should make things simpler, so would be worth trying before
>>   the changes I mentioned above.
>> 
>> Did you try that?  I still think it'd be preferable to defining a new insn.
>
> Yes I did! Sorry I forgot to mention that.  When I made it a split for some
> reason It wasn't matching it anymore.

I was hoping for a bit more detail than that :-)  But it seems that
the reason is that we match SRA first, so the final combination
is a 2-to-1 rather than 3-to-1.

So yeah, the patch is OK with the other changes mentioned in the review.

Thanks,
Richard

Re: Enable UTF-8 code page in driver and compiler on 64-bit mingw host [PR108865]

2023-03-08 Thread Costas Argyris via Gcc-patches

Added .manifest file to the make rule for utf8rc-mingw32.o, latest patch
attached.

On Tue, 7 Mar 2023 at 15:27, Costas Argyris 
wrote:

> Hi Jacek,
>
> "but I think it should work just fine if you didn't explicitly limit the
> patch to x86_64."
>
> I would think so too.
>
> Actually, even cygwin might benefit from this, assuming it has the same
> problem, which I don't know if it's the case.
>
> But I'm not experienced with that so I would like to explore these hosts
> separately and just focus on the most common 64-bit Windows host with this
> change, if possible.
>
> "The point that when winnt-utf8.manifest is modified, utf8-mingw32.o
> should be rebuilt."
>
> Right, makes sense.
>
> Just noting that winnt-utf8.manifest is really not meant to be modified,
> because it is copied straight from:
>
>
> https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page
>
> and will probably remain like that, but I do get your point and I am happy
> to make the change.
>
> Thanks,
> Costas
>
> On Tue, 7 Mar 2023 at 14:18, Jacek Caban  wrote:
>
>> Hi Costas,
>>
>> On 3/7/23 15:00, Costas Argyris wrote:
>> > Hi Jacek,
>> >
>> > "Is there a reason to make it specific to x86_64? It seems to me that
>> > all mingw hosts could use it."
>> >
>> > Are you referring to the 32-bit host?My concern here is that this
>> > functionality (embedding the UTF-8
>> > manifest file into the executable) is only truly supported in recent
>> > versions of Windows.From:
>> >
>> >
>> https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page
>> >
>> > It says that Windows Version 1903 (May 2019 Update) enables this, so
>> > we are looking at the 64-bit
>> > version of Windows.
>> >
>> > I suppose you are referring to the scenario where one has a 32-bit
>> > gcc + mingw running in a 64-bit
>> > Windows that is recent enough to support this?It is not clear to
>> > me based on the above doc what
>> > would happen encoding-wise in that situation, and I haven't tried it
>> > either because I assumed that
>> > most people would want the 64-bit version of gcc since they are
>> > probably running a 64-bit OS.
>> >
>> > If you think it is useful, I could look into that as a separate task
>> > to try and keep this one simple, if
>> > that makes sense.
>>
>>
>> Yes, realistically it's mostly about 32-bit gcc on 64-bit Windows
>> (perhaps aarch64 as well at some point in the future). It's probably
>> indeed not very popular configuration those days, but I think it should
>> work just fine if you didn't explicitly limit the patch to x86_64.
>>
>>
>> > "I think that .manifest file should also be a dependency here."
>> >
>> > Why is that?Windres takes only the .rc file as its input, as per
>> > its own doc, and it successfully
>> > compiles it into an object file.The .manifest file is only
>> > referenced by the .rc file, and it doesn't
>> > get passed to windres, so I don't see why it has to be listed as a
>> > prerequisite in the make rule.
>>
>>
>> The point that when winnt-utf8.manifest is modified, utf8-mingw32.o
>> should be rebuilt. Anyway, it's probably not a big deal (I should
>> disclaim that I'm not very familiar with gcc build system; I'm mostly on
>> this ML due to mingw-w64 contributions).
>>
>>
>> Thanks,
>>
>> Jacek
>>
>>
From 694d6f4860a08f690070df411f3f72d66a48a981 Mon Sep 17 00:00:00 2001
From: Costas Argyris 
Date: Tue, 28 Feb 2023 17:10:18 +
Subject: [PATCH] Enable UTF-8 code page on Windows 64-bit host [PR108865]

Compile a resource object that contains the utf8 manifest.

Then link that object into the driver and compiler proper.

For compiler proper the link has to be forced because the
resource object file gets into a static library (libbackend.a)
and gets eventually dropped because it has no symbols of
its own and nothing is referencing it inside the library.

Therefore, an artificial symbol is planted to force the link.
---
 gcc/config.host |  5 ++-
 gcc/config/i386/sym-mingw32.cc  |  1 +
 gcc/config/i386/utf8-mingw32.rc |  3 ++
 gcc/config/i386/winnt-utf8.manifest |  8 
 gcc/config/i386/x-mingw32   |  3 +-
 gcc/config/i386/x-mingw32-utf8  | 57 +
 6 files changed, 73 insertions(+), 4 deletions(-)
 create mode 100644 gcc/config/i386/sym-mingw32.cc
 create mode 100644 gcc/config/i386/utf8-mingw32.rc
 create mode 100644 gcc/config/i386/winnt-utf8.manifest
 create mode 100644 gcc/config/i386/x-mingw32-utf8

diff --git a/gcc/config.host b/gcc/config.host
index a522c39658e..4abb32ad73d 100644
--- a/gcc/config.host
+++ b/gcc/config.host
@@ -241,10 +241,11 @@ case ${host} in
   x86_64-*-mingw*)
 use_long_long_for_widest_fast_int=yes
 host_xm_file=i386/xm-mingw32.h
-host_xmake_file="${host_xmake_file} i386/x-mingw32"
+host_xmake_file="${host_xmake_file} i386/x-mingw32 i386/x-mingw32-utf8"
 host_exeext=.exe
 out_host_hook_obj=host-mingw32.o
-host_extra_gcc_objs="${ho

[Patch] GCN update for wwwdocs / libgomp.texi

2023-03-08 Thread Tobias Burnus


Hi Andrew,

attached are two patches related to GCN, one for libgomp.texi documenting an 
env var
and a release-notes update in www docs.

OK? Comments?

Tobias

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
gcc-13/changes.html: Mention SIMD and updated newlib requirement
diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
index a803f501..e7a342d3 100644
--- a/htdocs/gcc-13/changes.html
+++ b/htdocs/gcc-13/changes.html
@@ -374,6 +374,10 @@ a work-in-progress.
   Support for the Instinct MI200 series devices (https://gcc.gnu.org/onlinedocs/gcc/AMD-GCN-Options.html";>
   gfx90a) has been added.
+  SIMD vectorization support has been improved; this and stack-handling
+  changes https://gcc.gnu.org/install/specific.html#amdgcn-x-amdhsa";>require
+  newlib 4.3.0 (or newer).

Re: [Patch] GCN update for wwwdocs / libgomp.texi

2023-03-08 Thread Tobias Burnus


Next try – this time with both patches.

On 08.03.23 12:05, Tobias Burnus wrote:

Hi Andrew,

attached are two patches related to GCN, one for libgomp.texi
documenting an env var
and a release-notes update in www docs.

OK? Comments?

Tobias


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
gcc-13/changes.html: Mention SIMD and updated newlib requirement
diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
index a803f501..e7a342d3 100644
--- a/htdocs/gcc-13/changes.html
+++ b/htdocs/gcc-13/changes.html
@@ -374,6 +374,10 @@ a work-in-progress.
   Support for the Instinct MI200 series devices (https://gcc.gnu.org/onlinedocs/gcc/AMD-GCN-Options.html";>
   gfx90a) has been added.
+  SIMD vectorization support has been improved; this and stack-handling
+  changes https://gcc.gnu.org/install/specific.html#amdgcn-x-amdhsa";>require
+  newlib 4.3.0 (or newer).
 
 
 
libgomp.texi: Mention GCN_STACK_SIZE in Offload-Target Specifics

libgomp/ChangeLog:

	* libgomp.texi (Offload-Target Specifics): Mention GCN_STACK_SIZE.

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 1b6358db4dc..5bcb84a1d6f 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -4461,6 +4461,8 @@ The implementation remark:
 @item OpenMP code that has a requires directive with @code{unified_address} or
   @code{unified_shared_memory} will remove any GCN device from the list of
   available devices (``host fallback'').
+@item The available stack size can be changed using the @code{GCN_STACK_SIZE}
+  environment variable; the default is 32 kiB per thread.
 @end itemize

RE: [PATCH] RISC-V: Bugfix for rvv bool mode size adjustment

2023-03-08 Thread Li, Pan2 via Gcc-patches

Completed the regression test and the RISC-V backend test without any surprise.

Pan

-Original Message-
From: Li, Pan2  
Sent: Wednesday, March 8, 2023 3:34 PM
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@sifive.com; rguent...@suse.de; Li, Pan2 

Subject: [PATCH] RISC-V: Bugfix for rvv bool mode size adjustment

From: yes 

Fix the bug of the rvv bool mode size by the adjustment.
Besides the mode precision (aka bit size [1, 2, 4, 8, 16, 32, 64]) of the 
vbool*_t, the mode size (aka byte size) will be adjusted to [1, 1, 1, 1, 2, 4, 
8] according to the rvv spec 1.0 isa. The adjustment will provide correct 
information for the underlying redundant instruction elimiation.

Given the below sample code:
{
  vbool1_t v1 = *(vbool1_t*)in;
  vbool64_t v2 = *(vbool64_t*)in;

  *(vbool1_t*)(out + 100) = v1;
  *(vbool64_t*)(out + 200) = v2;
}

Before the size adjustment:
csrrt0,vlenb
sllit1,t0,1
csrra3,vlenb
sub sp,sp,t1
sllia4,a3,1
add a4,a4,sp
addia2,a1,100
vsetvli a5,zero,e8,m8,ta,ma
sub a3,a4,a3
vlm.v   v24,0(a0)
vsm.v   v24,0(a2)
vsm.v   v24,0(a3)
addia1,a1,200
csrrt0,vlenb
vsetvli a4,zero,e8,mf8,ta,ma
sllit1,t0,1
vlm.v   v24,0(a3)
vsm.v   v24,0(a1)
add sp,sp,t1
jr  ra

After the size adjustment:
addia3,a1,100
vsetvli a4,zero,e8,m8,ta,ma
addia1,a1,200
vlm.v   v24,0(a0)
vsm.v   v24,0(a3)
vsetvli a5,zero,e8,mf8,ta,ma
vlm.v   v24,0(a0)
vsm.v   v24,0(a1)
ret

Additionally, the size adjust cannot cover all possible combinations of the 
vbool*_t code pattern like above. We will take a look into it in another 
patches.

PR 108185
PR 108654

gcc/ChangeLog:

* config/riscv/riscv-modes.def (ADJUST_BYTESIZE):
* config/riscv/riscv.cc (riscv_v_adjust_bytesize):
* config/riscv/riscv.h (riscv_v_adjust_bytesize):

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr108185-1.c:
* gcc.target/riscv/rvv/base/pr108185-2.c:
* gcc.target/riscv/rvv/base/pr108185-3.c:

Signed-off-by: Pan Li 
Co-authored-by: Ju-Zhe Zhong 
---
 gcc/config/riscv/riscv-modes.def  | 14 ++--
 gcc/config/riscv/riscv.cc | 22 +++
 gcc/config/riscv/riscv.h  |  1 +
 .../gcc.target/riscv/rvv/base/pr108185-1.c|  2 +-
 .../gcc.target/riscv/rvv/base/pr108185-2.c|  2 +-
 .../gcc.target/riscv/rvv/base/pr108185-3.c|  2 +-
 6 files changed, 33 insertions(+), 10 deletions(-)

diff --git a/gcc/config/riscv/riscv-modes.def b/gcc/config/riscv/riscv-modes.def
index 110bddce851..4cf7cf8b1c6 100644
--- a/gcc/config/riscv/riscv-modes.def
+++ b/gcc/config/riscv/riscv-modes.def
@@ -64,13 +64,13 @@ ADJUST_ALIGNMENT (VNx16BI, 1);  ADJUST_ALIGNMENT (VNx32BI, 
1);  ADJUST_ALIGNMENT (VNx64BI, 1);
 
-ADJUST_BYTESIZE (VNx1BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk); 
-ADJUST_BYTESIZE (VNx2BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk); 
-ADJUST_BYTESIZE (VNx4BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk); 
-ADJUST_BYTESIZE (VNx8BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk); 
-ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk); 
-ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * riscv_bytes_per_vector_chunk); 
-ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
+ADJUST_BYTESIZE (VNx1BI, riscv_v_adjust_bytesize (VNx1BImode, 1)); 
+ADJUST_BYTESIZE (VNx2BI, riscv_v_adjust_bytesize (VNx2BImode, 1)); 
+ADJUST_BYTESIZE (VNx4BI, riscv_v_adjust_bytesize (VNx4BImode, 1)); 
+ADJUST_BYTESIZE (VNx8BI, riscv_v_adjust_bytesize (VNx8BImode, 1)); 
+ADJUST_BYTESIZE (VNx16BI, riscv_v_adjust_bytesize (VNx16BImode, 2)); 
+ADJUST_BYTESIZE (VNx32BI, riscv_v_adjust_bytesize (VNx32BImode, 4)); 
+ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_bytesize (VNx64BImode, 8));
 
 ADJUST_PRECISION (VNx1BI, riscv_v_adjust_precision (VNx1BImode, 1));  
ADJUST_PRECISION (VNx2BI, riscv_v_adjust_precision (VNx2BImode, 2)); diff --git 
a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc index 
e7b7d87cebc..428fbb28fae 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -1003,6 +1003,28 @@ riscv_v_adjust_nunits (machine_mode mode, int scale)
   return scale;
 }
 
+/* Call from ADJUST_BYTESIZE in riscv-modes.def.  Return the correct
+   BYTE size for corresponding machine_mode.  */
+
+poly_int64
+riscv_v_adjust_bytesize (machine_mode mode, int scale) {
+  if (riscv_v_ext_vector_mode_p (mode))
+  {
+poly_uint16 mode_size = GET_MODE_SIZE (mode);
+
+if (maybe_eq (mode_size, (uint16_t)-1))
+  mode_size = riscv_vector_chunks * scale;
+
+if (known_gt (mode_size, BYTES_PER_RISCV_VECTOR))
+  mode_size = BYTES_PER_RISCV_VECTOR;
+
+return mode_size;
+  }
+
+  return scale;
+}
+
 /* Call from ADJUST_PRECISION in riscv-modes.def.  Return the correct
PRECISION size for corresponding machine_mode.  */
 
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h index 
15b9317a8ce..66fb0

Re: [PATCH] combine: Try harder to form zero_extends [PR106594]

2023-03-08 Thread Richard Sandiford via Gcc-patches

Segher Boessenkool  writes:
> Hi!
>
> On Mon, Mar 06, 2023 at 07:13:08PM +, Richard Sandiford wrote:
>> Segher Boessenkool  writes:
>> > Most importantly, what makes you think this is a problem for aarch64
>> > only?  If it actually is, you can fix it in the aarch64 config!  Either
>> > with or without new hooks, whatever works best.
>> 
>> The point is that I don't think it's a problem for AArch64 only.
>> I think it's a generic issue that should be solved in a generic way
>> (which is what the patch is trying to do).  The suggestion to restrict
>> it to AArch64 came from Jakub.
>> 
>> The reason I'm pushing back against a hook is precisely because
>> I don't want to solve this in AArch64-specific code.
>
> But it is many times worse still to do it in target-specific magic code
> disguised as generic code :-(
>
> If there is no clear explanation why combine should do X, then it
> probably should not.
>
>> I'm not sure we would be talking about restricting this to AArch64
>> if the patch had been posted in stage 1.  If people are concerned
>> about doing this for all targets in stage 4 (which they seem to be),
>
> Not me, not in principle.  But it takes more time than we have left in
> stage 4 to handle this, even for only combine.  We should give the other
> target maintainers much longer as well.
>
>> I thought the #ifdef was the simplest way of addressing that concern.
>
> An #ifdef is a way of making a change that is not finished yet not hurt
> the other targets.  It still hurts generic development, which indirectly
> hurts all targets.

Seems like this might be moot anyway given that your results
suggest no impact on other targets.

>> And I don't think what the patch does is ad hoc.
>
> It is almost impossible to explain what it does and why that is a good
> thing, why it is what we want, what we should do here; and certainly not
> in a compact, terse, focused way.  It has all the hallmarks of ad hoc
> patches.
>
>> Reorganising the
>> expression in this way isn't something new.  extract_left_shift already
>> does a similar thing (and does it for all targets).
>
> That is not similar at all, no.
>
> /* See if X (of mode MODE) contains an ASHIFT of COUNT or more bits that
>can be commuted with any other operations in X.  Return X without
>that shift if so.  */
>
> If you can factor out a utility function like that, with an actual nice
> description like that, it would be a much more palatable patch.

OK, I've factored it out below.  Does the comment look OK?

As mentioned in the patch description below, I think some of the other
"and" handling could be moved here too, which should avoid a bit of
(existing) duplication.  But that isn't necessary to fix the bug.

On the code size results: as I mentioned on IRC yesterday, when I tried
building the linux kernel locally (with -Os -fno-schedule-insns{,2}),
I saw code size improvements in 182 .os and a regression in only one .o.
(I was comparing individual .os because it makes isolation easier.)

The one regression was because the optimised version had fewer pseudos,
and so something that was previously allocated to x3 was allocated to x2.
This pseudo was initialised to 0, and a preceding stack protect
instruction had the known side effect (both before and after the patch)
of setting x3 to 0.  So with the x3 allocation, postreload was able to
remove the separate initialisation of x3 with 0, but with the x2
allocation it couldn't.

I also tried seeing what effect it had on gcc.c-torture, gcc.dg and
g++.dg.  All the changes were improvements.

And the testcase in the PR was from a real workload (ArmRAL).

If you can isolate the case you saw, I'll have a look.  But ISTM that
the change is a pretty clear win on aarch64.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.

Thanks,
Richard

Combine's approach to simplifying a pattern X is to:

(1) expand "compound" operations (such as extends and extracts)
into multiple operations (such as shifts), to give a new
expression X1

(2) simplify X1 in the normal way, and with some combine-specific
extras, to give X2

(3) remake compound operations from X2, to give X3

For example, (1) expands sign_extend to an ashift/ashiftrt pair,
with the ashift being an immediate operand of the ashiftrt.
I'll call ashiftrt the "outer" operation and ashift the
"inner" operation below.

By design, (2) can perturb the structure of the expanded operations
in X1.  Sometimes it will rework them entirely.  But sometimes it
will keep the outer operations and inner operations, but in a
slightly different arrangement.  For example, the inner operation
might no longer be a direct operand of the outer operation.

Therefore, when make_compound_operation sees a potential outer
operation, it sometimes looks through suboperands to find the
potential inner operation.  Examples include:

(a) the ashiftrt handling, which uses extract_left_shift to find
a partnering ashift operation

(b) the "and" handling, which looks t

Re: [PATCH] -Wdangling-pointer: don't mark SSA lhs sets as stores

2023-03-08 Thread Martin Liška

On 3/3/23 12:12, Richard Biener via Gcc-patches wrote:
> On Fri, Mar 3, 2023 at 9:30 AM Alexandre Oliva  wrote:
>>
>> On Feb 17, 2023, Richard Biener  wrote:
>>
 * gimple-ssa-warn-access.cc
 (pass_waccess::check_dangling_stores): Skip non-stores.

 for  gcc/testsuite/ChangeLog

 * g++.dg/warn/Wdangling-pointer.C (warn_init_ref_member): Add
 two new variants, one fixed, one xfailed.
 * c-c++-common/Wdangling-pointer-5.c
 (nowarn_store_arg_store_arg): Add now-expected warnings.
>>
>> Ping?
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612186.html
> 
> I was hoping Martin would chime in, but he didn't.
> 
> So - OK.
> 
> Thanks,
> Richard.
> 
>>
>> --
>> Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
>>Free Software Activist   GNU Toolchain Engineer
>> Disinformation flourishes because many people care deeply about injustice
>> but very few check the facts.  Ask me about 

Hi.

I've just noticed this change triggered one more warning for qemu 7.1.0:

cc -m64 -mcx16 -Ilibqemuutil.a.p -I. -I.. -Isubprojects/libvhost-user 
-I../subprojects/libvhost-user -Iqapi -Itrace -Iui -Iui/shader 
-I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/libmount 
-I/usr/include/blkid -I/usr/include/gio-unix-2.0 -I/usr/include/p11-kit-1 
-I/usr/include/pixman-1 -fdiagnostics-color=auto -Wall -Winvalid-pch -Werror 
-std=gnu11 -O2 -isystem /home/abuild/rpmbuild/BUILD/qemu-7.1.0/linux-headers 
-isystem linux-headers -iquote . -iquote /home/abuild/rpmbuild/BUILD/qemu-7.1.0 
-iquote /home/abuild/rpmbuild/BUILD/qemu-7.1.0/include -iquote 
/home/abuild/rpmbuild/BUILD/qemu-7.1.0/tcg/i386 -pthread -D_GNU_SOURCE 
-D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes 
-Wredundant-decls -Wundef -Wwrite-strings -Wmissing-prototypes 
-fno-strict-aliasing -fno-common -fwrapv -Wold-style-declaration 
-Wold-style-definition -Wtype-limits -Wformat-security -Wformat-y2k -Winit-self 
-Wignored-qualifiers -Wempty-body -Wnested-externs -Wendif-labels 
-Wexpansion-to-defined -Wimplicit-fallthrough=2 -Wno-missing-include-dirs 
-Wno-shift-negative-value -Wno-psabi -fstack-protector-strong -O2 -Wall 
-fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables 
-fstack-clash-protection -Werror=return-type -U_FORTIFY_SOURCE 
-D_FORTIFY_SOURCE=2 -fPIE -MD -MQ libqemuutil.a.p/util_async.c.o -MF 
libqemuutil.a.p/util_async.c.o.d -o libqemuutil.a.p/util_async.c.o -c 
../util/async.c
In file included from 
/home/abuild/rpmbuild/BUILD/qemu-7.1.0/include/qemu/coroutine.h:18,
 from 
/home/abuild/rpmbuild/BUILD/qemu-7.1.0/include/block/aio.h:20,
 from ../util/async.c:28:
../util/async.c: In function 'aio_bh_poll':
/home/abuild/rpmbuild/BUILD/qemu-7.1.0/include/qemu/queue.h:303:22: error: 
storing the address of local variable 'slice' in '*ctx.bh_slice_list.sqh_last' 
[-Werror=dangling-pointer=]
  303 | (head)->sqh_last = &(elm)->field.sqe_next;  
\
  | ~^~~~
../util/async.c:161:5: note: in expansion of macro 'QSIMPLEQ_INSERT_TAIL'
  161 | QSIMPLEQ_INSERT_TAIL(&ctx->bh_slice_list, &slice, next);
  | ^~~~
../util/async.c:156:17: note: 'slice' declared here
  156 | BHListSlice slice;
  | ^
../util/async.c:156:17: note: 'ctx' declared here

which I reduced to:

$ cat util_async.i
typedef struct BHListSlice BHListSlice;
struct BHListSlice {
  struct {
BHListSlice *sqe_next;
  } next;
} *aio_bh_poll_s;
struct AioContext {
  struct {
BHListSlice *sqh_first;
BHListSlice **sqh_last;
  } bh_slice_list;
} aio_bh_dequeue();
int aio_bh_poll_bh;
int aio_bh_poll(struct AioContext *ctx) {
  BHListSlice slice;
  (&ctx->bh_slice_list)->sqh_last = &(slice.next.sqe_next);
  while (aio_bh_poll_s) {
unsigned flags;
aio_bh_dequeue(&flags);
if (aio_bh_poll_bh) {
  (&ctx->bh_slice_list)->sqh_last = &(&ctx->bh_slice_list)->sqh_first;
  continue;
}
  }
  return 0;
}

$ gcc util_async.i -c -Werror=dangling-pointer
util_async.i: In function ‘aio_bh_poll’:
util_async.i:16:35: error: storing the address of local variable ‘slice’ in 
‘*ctx.bh_slice_list.sqh_last’ [-Werror=dangling-pointer=]
   16 |   (&ctx->bh_slice_list)->sqh_last = &(slice.next.sqe_next);
  |   ^~~~
util_async.i:15:15: note: ‘slice’ declared here
   15 |   BHListSlice slice;
  |   ^
util_async.i:15:15: note: ‘ctx’ declared here
cc1: some warnings being treated as errors

Is the emitted warning correct?

Thank you,
Martin

[PATCH v3] gcov: Fix "do-while" structure in case statement leads to incorrect code coverage [PR93680]

2023-03-08 Thread Xionghu Luo via Gcc-patches





On 2023/3/7 19:25, Richard Biener wrote:

It would be nice to avoid creating blocks / preserving labels we'll
immediately remove again.  For that we do need some analysis
before creating basic-blocks that determines whether a label is
possibly reached by a non-falltru edge.



 :
p = 0;
switch (s) , case 0: , case 1: >

 :
:   <= prev_stmt
:   <= stmt
p = p + 1;
n = n + -1;
if (n != 0) goto ; else goto ;

Check if  is a case label and  is a goto target then return true
in stmt_starts_bb_p to start a new basic block?  This would avoid creating and
removing blocks, but cleanup_dead_labels has all bbs setup while
stmt_starts_bb_p
does't yet to iterate bbs/labels to establish label_for_bb[] map?



Yes.  I think we'd need something more pragmatic before make_blocks (),
like re-computing TREE_USED of the label decls or computing a bitmap
of targeted labels (targeted by goto, switch or any other means).

I'll note that doing a cleanup_dead_labels () like optimization before
we create blocks will help keeping LABEL_DECL_UID and thus
label_to_block_map dense.  But it does look like a bit of
an chicken-and-egg problem and the question is how effective the
dead label removal is in practice.


Tried to add function compute_target_labels(not sure whether the function
name is suitable) in the front of make_blocks_1, now the fortran case doesn't
create/removing blocks now, but I still have several questions:

 1. I used hash_set to save the target labels instead of bitmap, as labels
are tree type value instead of block index so bitmap is not good for it since
we don't have LABEL_DECL_UID now?
 2. Is the compute_target_labels still only for !optimize?  And if we compute
the target labels before create bbs, it is unnessary to guard the first
cleanup_dead_labels under !optimize now, because the switch-case-do-while
case already create new block for CASE_LABEL already.
 3. I only added GIMPLE_SWITCH/GIMPLE_COND in compute_target_labels
so far, is it needed to also handle GIMPLE_ASM/GIMPLE_TRANSACTION and even
labels_eh?


PS1: The v3 patch will cause one test case fail:

Number of regressions in total: 1

FAIL: gcc.c-torture/compile/limits-caselabels.c   -O0  (test for excess errors)


due to this exausting case has labels from L0 to L11, they won't be 
optimized
to a simple if-else expression like before...


PS2: The GIMPLE_GOTO piece of code would cause some fortran cases run fail due
to __builtin_unreachable trap generated in .fixup_cfg1, I didn't dig into it so
just skip these label...


+   case GIMPLE_GOTO:
+#if 0
+ if (!computed_goto_p (stmt))
+   {
+ tree dest = gimple_goto_dest (stmt);
+ target_labels->add (dest);
+   }
+#endif
+ break;

Change the #if 0 to #if 1 result in:

Number of regressions in total: 8

FAIL: gcc.c-torture/compile/limits-caselabels.c   -O0  (test for excess errors)
FAIL: gcc.dg/analyzer/explode-2a.c (test for excess errors)
FAIL: gcc.dg/analyzer/pragma-2.c (test for excess errors)
FAIL: gfortran.dg/bound_2.f90   -O0  execution test
FAIL: gfortran.dg/bound_7.f90   -O0  execution test
FAIL: gfortran.dg/char_result_14.f90   -O0  execution test
FAIL: gfortran.dg/pointer_array_1.f90   -O0  execution test
FAIL: gfortran.dg/select_type_15.f03   -O0  execution test




Paste the updated patch v3:


v3: Add compute_target_labels and call it in the front of make_blocks_1.

Start a new basic block if two labels have different location when
test-coverage.

Regression tested pass on x86_64-linux-gnu and aarch64-linux-gnu, OK for
master?

gcc/ChangeLog:

PR gcov/93680
* tree-cfg.cc (stmt_starts_bb_p): Check whether the label is in
target_labels.
(compute_target_labels): New function.
(make_blocks_1): Call compute_target_labels.

gcc/testsuite/ChangeLog:

PR gcov/93680
* g++.dg/gcov/gcov-1.C: Correct counts.
* gcc.misc-tests/gcov-4.c: Likewise.
* gcc.misc-tests/gcov-pr85332.c: Likewise.
* lib/gcov.exp: Also clean gcda if fail.
* gcc.misc-tests/gcov-pr93680.c: New test.

Signed-off-by: Xionghu Luo 
---
 gcc/tree-cfg.cc | 68 -
 gcc/testsuite/g++.dg/gcov/gcov-1.C  |  2 +-
 gcc/testsuite/gcc.dg/analyzer/paths-4.c |  8 +--
 gcc/testsuite/gcc.misc-tests/gcov-pr85332.c |  2 +-
 gcc/testsuite/gcc.misc-tests/gcov-pr93680.c | 24 
 gcc/testsuite/lib/gcov.exp  |  4 +-
 6 files changed, 96 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-pr93680.c

diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
index a9fcc7fd050..0f8efcf4aa3 100644
--- a/gcc/tree-cfg.cc
+++ b/gcc/tree-cfg.cc
@@ -164,7 +164,7 @@ static edge gimple_redirect_edge_and_branch (edge, 
basic_block);
 static edge gimple_try_redirect_by_replacing_jump (edge, basic_block);
 
 /* Various helpers.  */

-static inline bool stmt_starts_bb_p (gimple *, gimple *);
+static inline bool s

Re: [PATCH] -Wdangling-pointer: don't mark SSA lhs sets as stores

2023-03-08 Thread Richard Biener via Gcc-patches

On Wed, Mar 8, 2023 at 2:04 PM Martin Liška  wrote:
>
> On 3/3/23 12:12, Richard Biener via Gcc-patches wrote:
> > On Fri, Mar 3, 2023 at 9:30 AM Alexandre Oliva  wrote:
> >>
> >> On Feb 17, 2023, Richard Biener  wrote:
> >>
>  * gimple-ssa-warn-access.cc
>  (pass_waccess::check_dangling_stores): Skip non-stores.
> 
>  for  gcc/testsuite/ChangeLog
> 
>  * g++.dg/warn/Wdangling-pointer.C (warn_init_ref_member): Add
>  two new variants, one fixed, one xfailed.
>  * c-c++-common/Wdangling-pointer-5.c
>  (nowarn_store_arg_store_arg): Add now-expected warnings.
> >>
> >> Ping?
> >> https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612186.html
> >
> > I was hoping Martin would chime in, but he didn't.
> >
> > So - OK.
> >
> > Thanks,
> > Richard.
> >
> >>
> >> --
> >> Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
> >>Free Software Activist   GNU Toolchain Engineer
> >> Disinformation flourishes because many people care deeply about injustice
> >> but very few check the facts.  Ask me about 
>
> Hi.
>
> I've just noticed this change triggered one more warning for qemu 7.1.0:
>
> cc -m64 -mcx16 -Ilibqemuutil.a.p -I. -I.. -Isubprojects/libvhost-user 
> -I../subprojects/libvhost-user -Iqapi -Itrace -Iui -Iui/shader 
> -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/libmount 
> -I/usr/include/blkid -I/usr/include/gio-unix-2.0 -I/usr/include/p11-kit-1 
> -I/usr/include/pixman-1 -fdiagnostics-color=auto -Wall -Winvalid-pch -Werror 
> -std=gnu11 -O2 -isystem /home/abuild/rpmbuild/BUILD/qemu-7.1.0/linux-headers 
> -isystem linux-headers -iquote . -iquote 
> /home/abuild/rpmbuild/BUILD/qemu-7.1.0 -iquote 
> /home/abuild/rpmbuild/BUILD/qemu-7.1.0/include -iquote 
> /home/abuild/rpmbuild/BUILD/qemu-7.1.0/tcg/i386 -pthread -D_GNU_SOURCE 
> -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes 
> -Wredundant-decls -Wundef -Wwrite-strings -Wmissing-prototypes 
> -fno-strict-aliasing -fno-common -fwrapv -Wold-style-declaration 
> -Wold-style-definition -Wtype-limits -Wformat-security -Wformat-y2k 
> -Winit-self -Wignored-qualifiers -Wempty-body -Wnested-externs -Wendif-labels 
> -Wexpansion-to-defined -Wimplicit-fallthrough=2 -Wno-missing-include-dirs 
> -Wno-shift-negative-value -Wno-psabi -fstack-protector-strong -O2 -Wall 
> -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables 
> -fstack-clash-protection -Werror=return-type -U_FORTIFY_SOURCE 
> -D_FORTIFY_SOURCE=2 -fPIE -MD -MQ libqemuutil.a.p/util_async.c.o -MF 
> libqemuutil.a.p/util_async.c.o.d -o libqemuutil.a.p/util_async.c.o -c 
> ../util/async.c
> In file included from 
> /home/abuild/rpmbuild/BUILD/qemu-7.1.0/include/qemu/coroutine.h:18,
>  from 
> /home/abuild/rpmbuild/BUILD/qemu-7.1.0/include/block/aio.h:20,
>  from ../util/async.c:28:
> ../util/async.c: In function 'aio_bh_poll':
> /home/abuild/rpmbuild/BUILD/qemu-7.1.0/include/qemu/queue.h:303:22: error: 
> storing the address of local variable 'slice' in 
> '*ctx.bh_slice_list.sqh_last' [-Werror=dangling-pointer=]
>   303 | (head)->sqh_last = &(elm)->field.sqe_next;
>   \
>   | ~^~~~
> ../util/async.c:161:5: note: in expansion of macro 'QSIMPLEQ_INSERT_TAIL'
>   161 | QSIMPLEQ_INSERT_TAIL(&ctx->bh_slice_list, &slice, next);
>   | ^~~~
> ../util/async.c:156:17: note: 'slice' declared here
>   156 | BHListSlice slice;
>   | ^
> ../util/async.c:156:17: note: 'ctx' declared here
>
> which I reduced to:
>
> $ cat util_async.i
> typedef struct BHListSlice BHListSlice;
> struct BHListSlice {
>   struct {
> BHListSlice *sqe_next;
>   } next;
> } *aio_bh_poll_s;
> struct AioContext {
>   struct {
> BHListSlice *sqh_first;
> BHListSlice **sqh_last;
>   } bh_slice_list;
> } aio_bh_dequeue();
> int aio_bh_poll_bh;
> int aio_bh_poll(struct AioContext *ctx) {
>   BHListSlice slice;
>   (&ctx->bh_slice_list)->sqh_last = &(slice.next.sqe_next);
>   while (aio_bh_poll_s) {
> unsigned flags;
> aio_bh_dequeue(&flags);
> if (aio_bh_poll_bh) {
>   (&ctx->bh_slice_list)->sqh_last = &(&ctx->bh_slice_list)->sqh_first;
>   continue;
> }
>   }
>   return 0;
> }
>
> $ gcc util_async.i -c -Werror=dangling-pointer
> util_async.i: In function ‘aio_bh_poll’:
> util_async.i:16:35: error: storing the address of local variable ‘slice’ in 
> ‘*ctx.bh_slice_list.sqh_last’ [-Werror=dangling-pointer=]
>16 |   (&ctx->bh_slice_list)->sqh_last = &(slice.next.sqe_next);
>   |   ^~~~
> util_async.i:15:15: note: ‘slice’ declared here
>15 |   BHListSlice slice;
>   |   ^
> util_async.i:15:15: note: ‘ctx’ declared here
> cc1: some warnings being treated as errors
>
> Is the emitted warning correct?

For the reduced

Re: HELP: Questions on multiple PROGRAM_SUMMARY sections in a profiling data file

2023-03-08 Thread Jan Hubicka via Gcc-patches

> Hi, Jan,
> 
> I am studying one profiling feedback ICE bug with GCC8 recently. 
> It’s an assertion failure inside the routine “compute_working_sets”of 
> gcov-io.c:
> 
> gcov_nonruntime_assert (ws_ix == NUM_GCOV_WORKING_SETS);
> 
> After some debugging and study, I found that the corresponding .gcda file has 
> two PROGRAM_SUMMARY sections:
> 
> foo.gcda:  a300: 202:PROGRAM_SUMMARY checksum=0x91f3e3ae 
> foo.gcda:counts=10831, runs=0, sum_all=478965, 
> run_max=125615, sum_max=201126 
> foo.gcda:counter histogram: 
> foo.gcda: 0: num counts=10187, min counter=0, cum_counter=0 
> … 
> foo.gcda:51: num counts=1, min counter=14524, 
> cum_counter=14524 
> foo.gcda:63: num counts=1, min counter=125615, 
> cum_counter=125615 
> foo.gcda:  a300: 137:PROGRAM_SUMMARY checksum=0xcf9f0896 
> foo.gcda:counts=10502, runs=1, sum_all=48618, run_max=13999, 
> sum_max=14046 
> foo.gcda:counter histogram: 
> foo.gcda: 0: num counts=9821, min counter=0, cum_counter=0 
> … 
> foo.gcda:43: num counts=1, min counter=3830, cum_counter=3830 
> foo.gcda:50: num counts=1, min counter=13999, 
> cum_counter=13999 
> 
> Looks like the 2nd PROGRAM_SUMMARY has some issue. If I manually change 
> gcc/coverage.c 
> to ignore the 2nd PROGRAM_SUMMARY section, the ICE disappears. 
> 
> I have several questions for the profiling feedback data file:
> 
> 1. Under what situation, there will be multiple PROGRAM_SUMMARY sections for 
> one module?

This is itended to resolve situations where one object file is linked
into multiple final binaries (just like libbackend is linked into
multiple FEs).  The overall binary checksum is known to
the runtime and it creates entry for each checksum.
> 2. How to check whether one of the PROGRAM_SUMMARY has issue?

The histograms was added by google to get better threshold for hot/cold
partitioning. They never worked too well, because it is hard to merge
histograms from multipe runs meaningfully, so we dropped them some time
ago.  With LTO it is easier to produce the histogram after reading all
profiles together and without LTO we use easier heuristics that only
computes constant fraction of maximum number of iterations.

What confuses me is the first summary having runs=0.  I wonder how with
zero runs it even has an entry?

What is the assert that fials in gcov-io.c?

Honza

> 
> Thanks a lot for any help.
> 
> Qing

Re: [Patch] GCN update for wwwdocs / libgomp.texi

2023-03-08 Thread Andrew Stubbs


On 08/03/2023 11:06, Tobias Burnus wrote:

Next try – this time with both patches.

On 08.03.23 12:05, Tobias Burnus wrote:

Hi Andrew,

attached are two patches related to GCN, one for libgomp.texi
documenting an env var
and a release-notes update in www docs.

OK? Comments?


LGTM

Andrew

[PATCH] libstdc++: Fix handling of surrogate CP in codecvt [PR108976]

2023-03-08 Thread Dimitrij Mijoski via Gcc-patches

This patch fixes the handling of surrogate code points in all standard
facets for transcoding Unicode that are based on std::codecvt. Surrogate
code points should always be treated as error. On the other hand
surrogate code units can only appear in UTF-16 and only when they come
in a proper pair.

Additionally, it fixes a bug in std::codecvt_utf16::in() when odd number
of bytes were given in the range [from, from_end), error was returned
always. The last byte in such range does not form a full UTF-16 code
unit and we can not make any decisions for error, instead partial should
be returned.

The testsuite for testing these facets was updated in the following
order:

1. All functions that test codecvts that work with UTF-8 were refactored
   and made more generic so they accept codecvt that works with the char
   type char8_t.
2. The same functions were updated with new test cases for transcoding
   errors and now additionally test for surrogates, overlong UTF-8
   sequences, code points out of the Unicode range, and more tests for
   missing leading and trailing code units.
3. New tests were added to test codecvt_utf16 in both of its variants,
   UTF-16 <-> UTF-32/UCS-4 and UTF-16 <-> UCS-2.

libstdc++-v3/ChangeLog:

* src/c++11/codecvt.cc (read_utf8_code_point): Fix handing of
surrogates in UTF-8.
(ucs4_out): Fix handling of surrogates in UCS-4 -> UTF-8.
(ucs4_in): Fix handling of range with odd number of bytes.
(ucs4_out): Fix handling of surrogates in UCS-4 -> UTF-16.
(ucs2_out): Fix handling of surrogates in UCS-2 -> UTF-16.
(ucs2_in): Fix handling of range with odd number of bytes.
(__codecvt_utf16_base::do_in): Likewise.
(__codecvt_utf16_base::do_in): Likewise.
(__codecvt_utf16_base::do_in): Likewise.
* testsuite/22_locale/codecvt/codecvt_unicode.cc: Renames, add
tests for codecvt_utf16 and codecvt_utf16.
* testsuite/22_locale/codecvt/codecvt_unicode.h: Refactor UTF-8
testing functions for char8_t, add more test cases for errors,
add testing functions for codecvt_utf16.
* testsuite/22_locale/codecvt/codecvt_unicode_wchar_t.cc:
Renames, add tests for codecvt_utf16.
* testsuite/22_locale/codecvt/codecvt_unicode_char8_t.cc: New test.
---
 libstdc++-v3/src/c++11/codecvt.cc |   18 +-
 .../22_locale/codecvt/codecvt_unicode.cc  |   38 +-
 .../22_locale/codecvt/codecvt_unicode.h   | 1799 +
 .../codecvt/codecvt_unicode_char8_t.cc|   53 +
 .../codecvt/codecvt_unicode_wchar_t.cc|   32 +-
 5 files changed, 1492 insertions(+), 448 deletions(-)
 create mode 100644 
libstdc++-v3/testsuite/22_locale/codecvt/codecvt_unicode_char8_t.cc

diff --git a/libstdc++-v3/src/c++11/codecvt.cc 
b/libstdc++-v3/src/c++11/codecvt.cc
index 02f05752d..2cc812cfc 100644
--- a/libstdc++-v3/src/c++11/codecvt.cc
+++ b/libstdc++-v3/src/c++11/codecvt.cc
@@ -284,6 +284,8 @@ namespace
return invalid_mb_sequence;
   if (c1 == 0xE0 && c2 < 0xA0) [[unlikely]] // overlong
return invalid_mb_sequence;
+  if (c1 == 0xED && c2 >= 0xA0) [[unlikely]] // surrogate
+   return invalid_mb_sequence;
   if (avail < 3) [[unlikely]]
return incomplete_mb_character;
   char32_t c3 = (unsigned char) from[2];
@@ -484,6 +486,8 @@ namespace
 while (from.size())
   {
const char32_t c = from[0];
+   if (0xD800 <= c && c <= 0xDFFF) [[unlikely]]
+ return codecvt_base::error;
if (c > maxcode) [[unlikely]]
  return codecvt_base::error;
if (!write_utf8_code_point(to, c)) [[unlikely]]
@@ -508,7 +512,7 @@ namespace
  return codecvt_base::error;
to = codepoint;
   }
-return from.size() ? codecvt_base::partial : codecvt_base::ok;
+return from.nbytes() ? codecvt_base::partial : codecvt_base::ok;
   }
 
   // ucs4 -> utf16
@@ -521,6 +525,8 @@ namespace
 while (from.size())
   {
const char32_t c = from[0];
+   if (0xD800 <= c && c <= 0xDFFF) [[unlikely]]
+ return codecvt_base::error;
if (c > maxcode) [[unlikely]]
  return codecvt_base::error;
if (!write_utf16_code_point(to, c, mode)) [[unlikely]]
@@ -653,7 +659,7 @@ namespace
 while (from.size() && to.size())
   {
char16_t c = from[0];
-   if (is_high_surrogate(c))
+   if (0xD800 <= c && c <= 0xDFFF)
  return codecvt_base::error;
if (c > maxcode)
  return codecvt_base::error;
@@ -680,7 +686,7 @@ namespace
  return codecvt_base::error;
to = c;
   }
-return from.size() == 0 ? codecvt_base::ok : codecvt_base::partial;
+return from.nbytes() == 0 ? codecvt_base::ok : codecvt_base::partial;
   }
 
   const char16_t*
@@ -1344,8 +1350,6 @@ do_in(state_type&, const extern_type* __from, const 
extern_type* __from_end,
   auto res = ucs2_in(from, to, _M_maxcode, _M_mode);
   __from_next = reinterpret_c

[PATCH] libstdc++: Make views::single/iota/istream SFINAE-friendly [PR108362]

2023-03-08 Thread Patrick Palka via Gcc-patches

Tested on x86_64-pc-linux-gnu, does this look OK for trunk and perhaps 12?

PR libstdc++/108362

libstdc++-v3/ChangeLog:

* include/std/ranges (__detail::__can_single_view): New concept.
(_Single::operator()): Constrain it.  Move [[nodiscard]] to the
end of the function declarator.
(__detail::__can_iota_view): New concept.
(_Iota::operator()): Constrain it.  Move [[nodiscard]] to the
end of the function declarator.
(__detail::__can_istream_view): New concept.
(_Istream::operator()): Constrain it.  Move [[nodiscard]] to the
end of the function declarator.
* testsuite/std/ranges/iota/iota_view.cc (test07): New test.
* testsuite/std/ranges/istream_view.cc (test08): New test.
* testsuite/std/ranges/single_view.cc (test07): New test.
---
 libstdc++-v3/include/std/ranges   | 40 ++-
 .../testsuite/std/ranges/iota/iota_view.cc| 10 +
 .../testsuite/std/ranges/istream_view.cc  | 12 ++
 .../testsuite/std/ranges/single_view.cc   | 13 ++
 4 files changed, 65 insertions(+), 10 deletions(-)

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index 0a65d74bb5b..67566c6ebcf 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -675,30 +675,41 @@ namespace views
   template
 inline constexpr empty_view<_Tp> empty{};
 
+  namespace __detail
+  {
+template
+  concept __can_single_view
+   = requires { single_view>(std::declval<_Tp>()); };
+  } // namespace __detail
+
   struct _Single
   {
-template
-  [[nodiscard]]
+template<__detail::__can_single_view _Tp>
   constexpr auto
-  operator()(_Tp&& __e) const
+  operator() [[nodiscard]] (_Tp&& __e) const
   noexcept(noexcept(single_view>(std::forward<_Tp>(__e
   { return single_view>(std::forward<_Tp>(__e)); }
   };
 
   inline constexpr _Single single{};
 
+  namespace __detail
+  {
+template
+  concept __can_iota_view = requires { 
iota_view(std::declval<_Args>()...); };
+  } // namespace __detail
+
   struct _Iota
   {
-template
-  [[nodiscard]]
+template<__detail::__can_iota_view _Tp>
   constexpr auto
-  operator()(_Tp&& __e) const
+  operator() [[nodiscard]] (_Tp&& __e) const
   { return iota_view(std::forward<_Tp>(__e)); }
 
 template
-  [[nodiscard]]
+  requires __detail::__can_iota_view<_Tp, _Up>
   constexpr auto
-  operator()(_Tp&& __e, _Up&& __f) const
+  operator() [[nodiscard]] (_Tp&& __e, _Up&& __f) const
   { return iota_view(std::forward<_Tp>(__e), std::forward<_Up>(__f)); }
   };
 
@@ -796,13 +807,22 @@ namespace views
 
 namespace views
 {
+  namespace __detail
+  {
+template
+concept __can_istream_view
+  = requires (_Up __e) {
+   basic_istream_view<_Tp, typename _Up::char_type, typename 
_Up::traits_type>(__e);
+  };
+  };
+
   template
 struct _Istream
 {
   template
-   [[nodiscard]]
constexpr auto
-   operator()(basic_istream<_CharT, _Traits>& __e) const
+   operator() [[nodiscard]] (basic_istream<_CharT, _Traits>& __e) const
+   requires __detail::__can_istream_view<_Tp, 
std::remove_reference_t>
{ return basic_istream_view<_Tp, _CharT, _Traits>(__e); }
 };
 
diff --git a/libstdc++-v3/testsuite/std/ranges/iota/iota_view.cc 
b/libstdc++-v3/testsuite/std/ranges/iota/iota_view.cc
index 2dd17113536..0d2eaf1d0c2 100644
--- a/libstdc++-v3/testsuite/std/ranges/iota/iota_view.cc
+++ b/libstdc++-v3/testsuite/std/ranges/iota/iota_view.cc
@@ -110,6 +110,15 @@ test06()
   VERIFY( std::ranges::equal(v3, w3) );
 }
 
+template
+void
+test07()
+{
+  // Verify SFINAE behavior.
+  static_assert(!requires { iota(nullptr); });
+  static_assert(!requires { iota(nullptr, nullptr); });
+}
+
 int
 main()
 {
@@ -119,4 +128,5 @@ main()
   test04();
   test05();
   test06();
+  test07();
 }
diff --git a/libstdc++-v3/testsuite/std/ranges/istream_view.cc 
b/libstdc++-v3/testsuite/std/ranges/istream_view.cc
index 26f109fdeaa..cc1c3e006b9 100644
--- a/libstdc++-v3/testsuite/std/ranges/istream_view.cc
+++ b/libstdc++-v3/testsuite/std/ranges/istream_view.cc
@@ -115,6 +115,17 @@ test07()
   VERIFY( sum == 10 );
 }
 
+template
+concept can_istream_view = requires (U u) { views::istream(u); };
+
+void
+test08()
+{
+  // Verify SFINAE behavior.
+  struct S { };
+  static_assert(!can_istream_view);
+}
+
 int
 main()
 {
@@ -125,4 +136,5 @@ main()
   test05();
   test06();
   test07();
+  test08();
 }
diff --git a/libstdc++-v3/testsuite/std/ranges/single_view.cc 
b/libstdc++-v3/testsuite/std/ranges/single_view.cc
index 063caeb8785..38a3946ca43 100644
--- a/libstdc++-v3/testsuite/std/ranges/single_view.cc
+++ b/libstdc++-v3/testsuite/std/ranges/single_view.cc
@@ -111,6 +111,18 @@ test06()
   auto y = std::views::single(std::move(obj));
 }
 
+template
+void
+test07()
+{
+  // Verify SFINAE behavior.
+

Re: HELP: Questions on multiple PROGRAM_SUMMARY sections in a profiling data file

2023-03-08 Thread Qing Zhao via Gcc-patches

Honza, 

Thanks a lot for your information.

> On Mar 8, 2023, at 8:19 AM, Jan Hubicka  wrote:
> 
>> Hi, Jan,
>> 
>> I am studying one profiling feedback ICE bug with GCC8 recently. 
>> It’s an assertion failure inside the routine “compute_working_sets”of 
>> gcov-io.c:
>> 
>> gcov_nonruntime_assert (ws_ix == NUM_GCOV_WORKING_SETS);
>> 
>> After some debugging and study, I found that the corresponding .gcda file 
>> has two PROGRAM_SUMMARY sections:
>> 
>> foo.gcda:  a300: 202:PROGRAM_SUMMARY checksum=0x91f3e3ae 
>> foo.gcda:counts=10831, runs=0, sum_all=478965, 
>> run_max=125615, sum_max=201126 
>> foo.gcda:counter histogram: 
>> foo.gcda: 0: num counts=10187, min counter=0, cum_counter=0 
>> … 
>> foo.gcda:51: num counts=1, min counter=14524, 
>> cum_counter=14524 
>> foo.gcda:63: num counts=1, min counter=125615, 
>> cum_counter=125615 
>> foo.gcda:  a300: 137:PROGRAM_SUMMARY checksum=0xcf9f0896 
>> foo.gcda:counts=10502, runs=1, sum_all=48618, run_max=13999, 
>> sum_max=14046 
>> foo.gcda:counter histogram: 
>> foo.gcda: 0: num counts=9821, min counter=0, cum_counter=0 
>> … 
>> foo.gcda:43: num counts=1, min counter=3830, 
>> cum_counter=3830 
>> foo.gcda:50: num counts=1, min counter=13999, 
>> cum_counter=13999 
>> 
>> Looks like the 2nd PROGRAM_SUMMARY has some issue. If I manually change 
>> gcc/coverage.c 
>> to ignore the 2nd PROGRAM_SUMMARY section, the ICE disappears. 
>> 
>> I have several questions for the profiling feedback data file:
>> 
>> 1. Under what situation, there will be multiple PROGRAM_SUMMARY sections for 
>> one module?
> 
> This is itended to resolve situations where one object file is linked
> into multiple final binaries (just like libbackend is linked into
> multiple FEs).

Okay, I see. (That was my guess too. -:)
>  The overall binary checksum is known to
> the runtime and it creates entry for each checksum.
Okay. 
>> 2. How to check whether one of the PROGRAM_SUMMARY has issue?
> 
> The histograms was added by google to get better threshold for hot/cold
> partitioning. They never worked too well, because it is hard to merge
> histograms from multipe runs meaningfully, so we dropped them some time
> ago.

Just checked, the PROGRAM_SUMMARY has been removed completely since GCC9 with 
the following commit:

commit 512cc0151207de4c7ff3a84f040f730fe0d52458
Author: Martin Liska 
Date:   Fri Sep 21 10:41:17 2018 +0200

Remove arc profile histogram in non-LTO mode.

2018-09-21  Martin Liska  

>  With LTO it is easier to produce the histogram after reading all
> profiles together and without LTO we use easier heuristics that only
> computes constant fraction of maximum number of iterations.
Okay. 
> 
> What confuses me is the first summary having runs=0.  I wonder how with
> zero runs it even has an entry?
I was surprised by this too. Don’t know the reason yet.

The application that has this problem is huge. I don’t have the complete source 
codes and make files. 

The thing that is even more strange is, there are multiple files have the same 
ICE during profiling use phase, and as I checked today, all the corresponding 
.gcda files have
the exactly same PROGRAM_SUMMARY sections, i.e.

foo.gcda:

foo.gcda:  a300: 202:PROGRAM_SUMMARY checksum=0x91f3e3ae 
foo.gcda:counts=10831, runs=0, sum_all=478965, run_max=125615, 
sum_max=201126 
foo.gcda:counter histogram: 
foo.gcda: 0: num counts=10187, min counter=0, cum_counter=0 
… 
foo.gcda:51: num counts=1, min counter=14524, cum_counter=14524 
foo.gcda:63: num counts=1, min counter=125615, 
cum_counter=125615 
foo.gcda:  a300: 137:PROGRAM_SUMMARY checksum=0xcf9f0896 
foo.gcda:counts=10502, runs=1, sum_all=48618, run_max=13999, 
sum_max=14046 
foo.gcda:counter histogram: 
foo.gcda: 0: num counts=9821, min counter=0, cum_counter=0 
… 
foo.gcda:43: num counts=1, min counter=3830, cum_counter=3830 
foo.gcda:50: num counts=1, min counter=13999, cum_counter=13999 

foo_1.gcda:
foo_1.gcda:  a300: 202:PROGRAM_SUMMARY checksum=0x91f3e3ae 
foo_1.gcda:counts=10831, runs=0, sum_all=478965, 
run_max=125615, sum_max=201126 
foo_1.gcda:counter histogram: 
foo_1.gcda: 0: num counts=10187, min counter=0, cum_counter=0 
… 
foo_1.gcda:51: num counts=1, min counter=14524, 
cum_counter=14524 
foo_1.gcda:63: num counts=1, min counter=125615, 
cum_counter=125615 
foo_1.gcda:  a300: 137:PROGRAM_SUMMARY checksum=0xcf9f0896 
foo_1.gcda:counts=10502, runs=1, sum_all=48618, run_max=13999, 
sum_max=14046 
foo_1.gcda:counter histogram: 
foo_1.gcda: 0: num counts=9821, min counter=0, cum_cou

[PATCH] libstdc++: Implement LWG 3820/3849 changes to cartesian_product_view

2023-03-08 Thread Patrick Palka via Gcc-patches

The LWG 3820 testcase revealed a bug in _M_advance, which this patch
also fixes.

Tested on x86_64-pc-linux-gnu, does this look OK for trunk?

libstdc++-v3/ChangeLog:

* include/std/ranges
(cartesian_product_view::_Iterator::_Iterator): Remove
constraint on default constructor as per LWG 3849.
(cartesian_product_view::_Iterator::_M_prev): Adjust position
of _Nm > 0 test as per LWG 3820.
(cartesian_product_view::_Iterator::_M_advance): Perform bound
checking only on sized cartesian products.
* testsuite/std/ranges/cartesian_product/1.cc (test08): New test.
---
 libstdc++-v3/include/std/ranges   | 23 +++
 .../std/ranges/cartesian_product/1.cc |  9 
 2 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index 67566c6ebcf..14f38727198 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -8225,7 +8225,7 @@ namespace views::__adaptor
range_reference_t<__maybe_const_t<_Const, 
_Vs>>...>;
 using difference_type = 
decltype(cartesian_product_view::_S_difference_type());
 
-_Iterator() requires forward_range<__maybe_const_t<_Const, _First>> = 
default;
+_Iterator() = default;
 
 constexpr
 _Iterator(_Iterator __i)
@@ -8390,12 +8390,12 @@ namespace views::__adaptor
 _M_prev()
 {
   auto& __it = std::get<_Nm>(_M_current);
-  if (__it == ranges::begin(std::get<_Nm>(_M_parent->_M_bases)))
-   {
- __it = 
__detail::__cartesian_common_arg_end(std::get<_Nm>(_M_parent->_M_bases));
- if constexpr (_Nm > 0)
+  if constexpr (_Nm > 0)
+   if (__it == ranges::begin(std::get<_Nm>(_M_parent->_M_bases)))
+ {
+   __it = 
__detail::__cartesian_common_arg_end(std::get<_Nm>(_M_parent->_M_bases));
_M_prev<_Nm - 1>();
-   }
+ }
   --__it;
 }
 
@@ -8416,10 +8416,13 @@ namespace views::__adaptor
  if constexpr (_Nm == 0)
{
 #ifdef _GLIBCXX_ASSERTIONS
- auto __size = ranges::ssize(__r);
- auto __begin = ranges::begin(__r);
- auto __offset = __it - __begin;
- __glibcxx_assert(__offset + __x >= 0 && __offset + __x <= __size);
+ if constexpr (sized_range<__maybe_const_t<_Const, _First>>)
+   {
+ auto __size = ranges::ssize(__r);
+ auto __begin = ranges::begin(__r);
+ auto __offset = __it - __begin;
+ __glibcxx_assert(__offset + __x >= 0 && __offset + __x <= 
__size);
+   }
 #endif
  __it += __x;
}
diff --git a/libstdc++-v3/testsuite/std/ranges/cartesian_product/1.cc 
b/libstdc++-v3/testsuite/std/ranges/cartesian_product/1.cc
index f52c2b96d58..56ff3d152c6 100644
--- a/libstdc++-v3/testsuite/std/ranges/cartesian_product/1.cc
+++ b/libstdc++-v3/testsuite/std/ranges/cartesian_product/1.cc
@@ -201,6 +201,14 @@ test07()
   VERIFY( i == 5 );
 }
 
+void
+test08()
+{
+  // LWG 3820
+  auto r = std::views::cartesian_product(std::views::iota(0));
+  r.begin() += 3; // hard error
+}
+
 int
 main()
 {
@@ -211,4 +219,5 @@ main()
   test05();
   static_assert(test06());
   test07();
+  test08();
 }
-- 
2.40.0.rc0.57.g454dfcbddf

[PATCH] libstdc++: Implement LWG 3715 changes to view_interface::empty

2023-03-08 Thread Patrick Palka via Gcc-patches

Tested on x86_64-pc-linux-gnu, does this look OK for trunk?

libstdc++-v3/ChangeLog:

* include/bits/ranges_util.h (view_interface::empty): Add
preferred overloads that use ranges::size when the range is
sized as per LWG 3715.
* testsuite/std/ranges/adaptors/lwg3715.cc: New test.
---
 libstdc++-v3/include/bits/ranges_util.h   | 16 +++--
 .../testsuite/std/ranges/adaptors/lwg3715.cc  | 33 +++
 2 files changed, 47 insertions(+), 2 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/std/ranges/adaptors/lwg3715.cc

diff --git a/libstdc++-v3/include/bits/ranges_util.h 
b/libstdc++-v3/include/bits/ranges_util.h
index e4643e31a20..880a0ce0143 100644
--- a/libstdc++-v3/include/bits/ranges_util.h
+++ b/libstdc++-v3/include/bits/ranges_util.h
@@ -97,15 +97,27 @@ namespace ranges
   constexpr bool
   empty()
   noexcept(noexcept(_S_empty(_M_derived(
-  requires forward_range<_Derived>
+  requires forward_range<_Derived> && (!sized_range<_Derived>)
   { return _S_empty(_M_derived()); }
 
+  constexpr bool
+  empty()
+  noexcept(noexcept(ranges::size(_M_derived()) == 0))
+  requires sized_range<_Derived>
+  { return ranges::size(_M_derived()) == 0; }
+
   constexpr bool
   empty() const
   noexcept(noexcept(_S_empty(_M_derived(
-  requires forward_range
+  requires forward_range && (!sized_range)
   { return _S_empty(_M_derived()); }
 
+  constexpr bool
+  empty() const
+  noexcept(noexcept(ranges::size(_M_derived()) == 0))
+  requires sized_range
+  { return ranges::size(_M_derived()) == 0; }
+
   constexpr explicit
   operator bool() noexcept(noexcept(ranges::empty(_M_derived(
   requires requires { ranges::empty(_M_derived()); }
diff --git a/libstdc++-v3/testsuite/std/ranges/adaptors/lwg3715.cc 
b/libstdc++-v3/testsuite/std/ranges/adaptors/lwg3715.cc
new file mode 100644
index 000..96ee7087be0
--- /dev/null
+++ b/libstdc++-v3/testsuite/std/ranges/adaptors/lwg3715.cc
@@ -0,0 +1,33 @@
+// { dg-options "-std=gnu++23" }
+// { dg-do run { target c++23 } }
+
+// Verify LWG 3715 changes.
+
+#include 
+#include 
+#include 
+
+void
+test01()
+{
+  std::istringstream ints("0 1 2 3 4");
+  auto i = std::views::istream(ints);
+  auto r4 = std::views::counted(i.begin(), 4) | std::views::chunk(2);
+  VERIFY( !r4.empty() );
+}
+
+void
+test02()
+{
+  std::istringstream ints("0 1 2 3 4");
+  auto i = std::views::istream(ints);
+  auto r0 = std::views::counted(i.begin(), 0) | std::views::chunk(2);
+  VERIFY( r0.empty() );
+}
+
+int
+main()
+{
+  test01();
+  test02();
+}
-- 
2.40.0.rc0.57.g454dfcbddf

Re: [PATCH RFC] c++: lambda mangling alias issues [PR107897]

2023-03-08 Thread Jan Hubicka via Gcc-patches

> Tested x86_64-pc-linux-gnu.  Does this look good, or do we want to factor the
> flag clearing into a symtab_node counterpart to cgraph_node::reset?
> 
> -- 8< --
> 
> In 107897, by the time we are looking at the mangling clash, the
> alias has already been removed from the symbol table by analyze_functions,
> so we can't look at n->cpp_implicit_alias.  So just assume that it's an
> alias if it's internal.
> 
> In 108887 the problem is that removing the mangling alias from the symbol
> table confuses analyze_functions, because it ended up as first_analyzed
> somehow, so it becomes a dangling pointer.  Fixed by clearing various flags
> to neutralize the alias.
> 
>   PR c++/107897
>   PR c++/108887
> 
> gcc/cp/ChangeLog:
> 
>   * decl2.cc (record_mangling): Improve symbol table handling.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/cpp2a/concepts-lambda3.C: Use -flto if supported.
>   * g++.dg/cpp0x/lambda/lambda-mangle7.C: New test.
> ---
>  gcc/cp/decl2.cc   | 25 +--
>  .../g++.dg/cpp0x/lambda/lambda-mangle7.C  | 70 +++
>  gcc/testsuite/g++.dg/cpp2a/concepts-lambda3.C |  1 +
>  3 files changed, 91 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp0x/lambda/lambda-mangle7.C
> 
> diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
> index 387e24542cd..e6e58b08de4 100644
> --- a/gcc/cp/decl2.cc
> +++ b/gcc/cp/decl2.cc
> @@ -4742,15 +4742,30 @@ record_mangling (tree decl, bool need_warning)
>  = mangled_decls->find_slot_with_hash (id, IDENTIFIER_HASH_VALUE (id),
> INSERT);
>  
> -  /* If this is already an alias, remove the alias, because the real
> +  /* If this is already an alias, cancel the alias, because the real
>   decl takes precedence.  */
>if (*slot && DECL_ARTIFICIAL (*slot) && DECL_IGNORED_P (*slot))
> -if (symtab_node *n = symtab_node::get (*slot))
> -  if (n->cpp_implicit_alias)
> +{
> +  if (symtab_node *n = symtab_node::get (*slot))
>   {
> -   n->remove ();
> -   *slot = NULL_TREE;
> +   if (n->cpp_implicit_alias)
> + {
> +   /* Actually removing the node isn't safe if other code is already
> +  holding a pointer to it, so just neutralize it.  */
> +   n->remove_from_same_comdat_group ();
> +   n->analyzed = false;
> +   n->definition = false;
> +   n->alias = false;
> +   n->cpp_implicit_alias = false;
We have n->reset () for that which is used in similar situation when
frontends overwrites extern inline function by its different offline
implementation.

reset doesn't call remove_from_same_comdat_group probably because it was
never used on anything in comdat group.  So I think it would make sense
to call n->reset() here and add remove_from_same_comdat_group into that.

OK with that change.
Honza
> + }
>   }
> +  else
> + /* analyze_functions might have already removed the alias from the
> +symbol table if it's internal.  */
> + gcc_checking_assert (!TREE_PUBLIC (*slot));
> +
> +  *slot = NULL_TREE;
> +}
>  
>if (!*slot)
>  *slot = decl;
> diff --git a/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-mangle7.C 
> b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-mangle7.C
> new file mode 100644
> index 000..c7946a2be08
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-mangle7.C
> @@ -0,0 +1,70 @@
> +// PR c++/108887
> +// { dg-do compile { target c++11 } }
> +
> +template  struct integral_constant {
> +  static constexpr int value = __v;
> +};
> +using false_type = integral_constant;
> +template  struct __result_of_impl;
> +template 
> +struct __result_of_impl {
> +  typedef decltype(0) type;
> +};
> +template 
> +struct __invoke_result
> +: __result_of_impl 
> {};
> +template 
> +void __invoke_impl(_Fn __f, _Args... __args) {
> +  __f(__args...);
> +}
> +template 
> +void __invoke_r(_Callable __fn, _Args... __args) {
> +  using __result = __invoke_result<_Args...>;
> +  using __type = typename __result::type;
> +  __invoke_impl<__type>(__fn, __args...);
> +}
> +struct QString {
> +  QString(const char *);
> +};
> +template  class function;
> +template  struct _Base_manager {
> +  static _Functor _M_get_pointer(int) { __builtin_abort (); }
> +};
> +template  class _Function_handler;
> +template 
> +struct _Function_handler<_Res(_ArgTypes...), _Functor> {
> +  using _Base = _Base_manager<_Functor>;
> +  static _Res _M_invoke(const int &__functor, _ArgTypes &&...__args) {
> +auto __trans_tmp_1 = _Base::_M_get_pointer(__functor);
> +__invoke_r<_Res>(__trans_tmp_1, __args...);
> +  }
> +};
> +template 
> +struct function<_Res(_ArgTypes...)> {
> +  template 
> +  using _Handler = _Function_handler<_Res(_ArgTypes...), _Functor>;
> +  template  function(_Functor) {
> +using _My_handler = _Handler<_Functor>;
> +_M_invoker = _My_handler::_M_invoke;
> +  }
> +  using _Invoker_type = _R

Re: [PATCH RFC] c++: lambda mangling alias issues [PR107897]

2023-03-08 Thread Jason Merrill via Gcc-patches


On 3/8/23 10:53, Jan Hubicka wrote:

Tested x86_64-pc-linux-gnu.  Does this look good, or do we want to factor the
flag clearing into a symtab_node counterpart to cgraph_node::reset?

-- 8< --

In 107897, by the time we are looking at the mangling clash, the
alias has already been removed from the symbol table by analyze_functions,
so we can't look at n->cpp_implicit_alias.  So just assume that it's an
alias if it's internal.

In 108887 the problem is that removing the mangling alias from the symbol
table confuses analyze_functions, because it ended up as first_analyzed
somehow, so it becomes a dangling pointer.  Fixed by clearing various flags
to neutralize the alias.

PR c++/107897
PR c++/108887

gcc/cp/ChangeLog:

* decl2.cc (record_mangling): Improve symbol table handling.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-lambda3.C: Use -flto if supported.
* g++.dg/cpp0x/lambda/lambda-mangle7.C: New test.
---
  gcc/cp/decl2.cc   | 25 +--
  .../g++.dg/cpp0x/lambda/lambda-mangle7.C  | 70 +++
  gcc/testsuite/g++.dg/cpp2a/concepts-lambda3.C |  1 +
  3 files changed, 91 insertions(+), 5 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/lambda/lambda-mangle7.C

diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
index 387e24542cd..e6e58b08de4 100644
--- a/gcc/cp/decl2.cc
+++ b/gcc/cp/decl2.cc
@@ -4742,15 +4742,30 @@ record_mangling (tree decl, bool need_warning)
  = mangled_decls->find_slot_with_hash (id, IDENTIFIER_HASH_VALUE (id),
  INSERT);
  
-  /* If this is already an alias, remove the alias, because the real

+  /* If this is already an alias, cancel the alias, because the real
   decl takes precedence.  */
if (*slot && DECL_ARTIFICIAL (*slot) && DECL_IGNORED_P (*slot))
-if (symtab_node *n = symtab_node::get (*slot))
-  if (n->cpp_implicit_alias)
+{
+  if (symtab_node *n = symtab_node::get (*slot))
{
- n->remove ();
- *slot = NULL_TREE;
+ if (n->cpp_implicit_alias)
+   {
+ /* Actually removing the node isn't safe if other code is already
+holding a pointer to it, so just neutralize it.  */
+ n->remove_from_same_comdat_group ();
+ n->analyzed = false;
+ n->definition = false;
+ n->alias = false;
+ n->cpp_implicit_alias = false;

We have n->reset () for that which is used in similar situation when
frontends overwrites extern inline function by its different offline
implementation.


The problem there is that reset() is a member of cgraph_node, not 
symtab_node, and I need something that works for variables as well.



reset doesn't call remove_from_same_comdat_group probably because it was
never used on anything in comdat group.  So I think it would make sense
to call n->reset() here and add remove_from_same_comdat_group into that.

OK with that change.
Honza

+   }
}
+  else
+   /* analyze_functions might have already removed the alias from the
+  symbol table if it's internal.  */
+   gcc_checking_assert (!TREE_PUBLIC (*slot));
+
+  *slot = NULL_TREE;
+}
  
if (!*slot)

  *slot = decl;
diff --git a/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-mangle7.C 
b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-mangle7.C
new file mode 100644
index 000..c7946a2be08
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/lambda/lambda-mangle7.C
@@ -0,0 +1,70 @@
+// PR c++/108887
+// { dg-do compile { target c++11 } }
+
+template  struct integral_constant {
+  static constexpr int value = __v;
+};
+using false_type = integral_constant;
+template  struct __result_of_impl;
+template 
+struct __result_of_impl {
+  typedef decltype(0) type;
+};
+template 
+struct __invoke_result
+: __result_of_impl {};
+template 
+void __invoke_impl(_Fn __f, _Args... __args) {
+  __f(__args...);
+}
+template 
+void __invoke_r(_Callable __fn, _Args... __args) {
+  using __result = __invoke_result<_Args...>;
+  using __type = typename __result::type;
+  __invoke_impl<__type>(__fn, __args...);
+}
+struct QString {
+  QString(const char *);
+};
+template  class function;
+template  struct _Base_manager {
+  static _Functor _M_get_pointer(int) { __builtin_abort (); }
+};
+template  class _Function_handler;
+template 
+struct _Function_handler<_Res(_ArgTypes...), _Functor> {
+  using _Base = _Base_manager<_Functor>;
+  static _Res _M_invoke(const int &__functor, _ArgTypes &&...__args) {
+auto __trans_tmp_1 = _Base::_M_get_pointer(__functor);
+__invoke_r<_Res>(__trans_tmp_1, __args...);
+  }
+};
+template 
+struct function<_Res(_ArgTypes...)> {
+  template 
+  using _Handler = _Function_handler<_Res(_ArgTypes...), _Functor>;
+  template  function(_Functor) {
+using _My_handler = _Handler<_Functor>;
+_M_invoker = _My_handler::_M_invoke;
+  }
+  using _Invoker_type = _Res (*)(const int &, _Arg

[RFC 0/X] Implement GCC support for AArch64 libmvec

2023-03-08 Thread Andre Vieira (lists) via Gcc-patches


Hi all,

This is a series of patches/RFCs to implement support in GCC to be able 
to target AArch64's libmvec functions that will be/are being added to glibc.
We have chosen to use the omp pragma '#pragma omp declare variant ...' 
with a simd construct as the way for glibc to inform GCC what functions 
are available.


For example, if we would like to supply a vector version of the scalar 
'cosf' we would have an include file with something like:

typedef __attribute__((__neon_vector_type__(4))) float __f32x4_t;
typedef __attribute__((__neon_vector_type__(2))) float __f32x2_t;
typedef __SVFloat32_t __sv_f32_t;
typedef __SVBool_t __sv_bool_t;
__f32x4_t _ZGVnN4v_cosf (__f32x4_t);
__f32x2_t _ZGVnN2v_cosf (__f32x2_t);
__sv_f32_t _ZGVsMxv_cosf (__sv_f32_t, __sv_bool_t);
#pragma omp declare variant(_ZGVnN4v_cosf) \
match(construct = {simd(notinbranch, simdlen(4))}, device = 
{isa("simd")})

#pragma omp declare variant(_ZGVnN2v_cosf) \
match(construct = {simd(notinbranch, simdlen(2))}, device = 
{isa("simd")})

#pragma omp declare variant(_ZGVsMxv_cosf) \
match(construct = {simd(inbranch)}, device = {isa("sve")})
extern float cosf (float);

The BETA ABI can be found in the vfabia64 subdir of 
https://github.com/ARM-software/abi-aa/
This currently disagrees with how this patch series implements 'omp 
declare simd' for SVE and I also do not see a need for the 'omp declare 
variant' scalable extension constructs. I will make changes to the ABI 
once we've finalized the co-design of the ABI and this implementation.


The patch series has three main steps:
1) Add SVE support for 'omp declare simd', see PR 96342
2) Enable GCC to use omp declare variants with simd constructs as simd 
clones during auto-vectorization.
3) Add SLP support for vectorizable_simd_clone_call (This sounded like a 
nice thing to add as we want to move away from non-slp vectorization).


Below you can see the list of current Patches/RFCs, the difference being 
on how confident I am of the proposed changes. For the RFC I am hoping 
to get early comments on the approach, rather than more indepth 
code-reviews.


I appreciate we are still in Stage 4, so I can completely understand if 
you don't have time to review this now, but I thought it can't hurt to 
post these early.


Andre Vieira:
[PATCH] omp: Replace simd_clone_subparts with TYPE_VECTOR_SUBPARTS
[PATCH] parloops: Copy target and optimizations when creating a function 
clone

[PATCH] parloops: Allow poly nit and bound
[RFC] omp, aarch64: Add SVE support for 'omp declare simd' [PR 96342]
[RFC] omp: Create simd clones from 'omp declare variant's
[RFC] omp: Allow creation of simd clones from omp declare variant with 
-fopenmp-simd flag


Work in progress:
[RFC] vect: Enable SLP codegen for vectorizable_simd_clone_call

[PATCH 1/X] omp: Replace simd_clone_subparts with TYPE_VECTOR_SUBPARTS

2023-03-08 Thread Andre Vieira (lists) via Gcc-patches


Hi,

This patch replaces the uses of simd_clone_subparts with 
TYPE_VECTOR_SUBPARTS and removes the definition of the first.


gcc/ChangeLog:

* omp-sind-clone.cc (simd_clone_subparts): Remove.
(simd_clone_init_simd_arrays): Replace simd_clone_subparts with 
TYPE_VECTOR_SUBPARTS.

(ipa_simd_modify_function_body): Likewise.
* tree-vect-stmts.cc (simd_clone_subparts): Remove.
(vectorizable_simd_clone_call): Replace simd_clone_subparts 
with TYPE_VECTOR_SUBPARTS.diff --git a/gcc/omp-simd-clone.cc b/gcc/omp-simd-clone.cc
index 
0949b8ba288dfc7e7692403bfc600983faddf5dd..48b480e7556d9ad8e5502e10e513ec36b17b9cbb
 100644
--- a/gcc/omp-simd-clone.cc
+++ b/gcc/omp-simd-clone.cc
@@ -255,16 +255,6 @@ ok_for_auto_simd_clone (struct cgraph_node *node)
   return true;
 }
 
-
-/* Return the number of elements in vector type VECTYPE, which is associated
-   with a SIMD clone.  At present these always have a constant length.  */
-
-static unsigned HOST_WIDE_INT
-simd_clone_subparts (tree vectype)
-{
-  return TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
-}
-
 /* Allocate a fresh `simd_clone' and return it.  NARGS is the number
of arguments to reserve space for.  */
 
@@ -1027,7 +1017,7 @@ simd_clone_init_simd_arrays (struct cgraph_node *node,
}
  continue;
}
-  if (known_eq (simd_clone_subparts (TREE_TYPE (arg)),
+  if (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg)),
node->simdclone->simdlen))
{
  tree ptype = build_pointer_type (TREE_TYPE (TREE_TYPE (array)));
@@ -1039,7 +1029,7 @@ simd_clone_init_simd_arrays (struct cgraph_node *node,
}
   else
{
- unsigned int simdlen = simd_clone_subparts (TREE_TYPE (arg));
+ poly_uint64 simdlen = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg));
  unsigned int times = vector_unroll_factor (node->simdclone->simdlen,
 simdlen);
  tree ptype = build_pointer_type (TREE_TYPE (TREE_TYPE (array)));
@@ -1225,9 +1215,9 @@ ipa_simd_modify_function_body (struct cgraph_node *node,
  iter, NULL_TREE, NULL_TREE);
   adjustments->register_replacement (&(*adjustments->m_adj_params)[j], r);
 
-  if (multiple_p (node->simdclone->simdlen, simd_clone_subparts (vectype)))
+  if (multiple_p (node->simdclone->simdlen, TYPE_VECTOR_SUBPARTS 
(vectype)))
j += vector_unroll_factor (node->simdclone->simdlen,
-  simd_clone_subparts (vectype)) - 1;
+  TYPE_VECTOR_SUBPARTS (vectype)) - 1;
 }
   adjustments->sort_replacements ();
 
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 
df6239a1c61c7213ad3c1468723bc1adf70bc02c..c85b6babc4bc5bc3111ef326dcc8f32bb25333f6
 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -3964,16 +3964,6 @@ vect_simd_lane_linear (tree op, class loop *loop,
 }
 }
 
-/* Return the number of elements in vector type VECTYPE, which is associated
-   with a SIMD clone.  At present these vectors always have a constant
-   length.  */
-
-static unsigned HOST_WIDE_INT
-simd_clone_subparts (tree vectype)
-{
-  return TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
-}
-
 /* Function vectorizable_simd_clone_call.
 
Check if STMT_INFO performs a function call that can be vectorized
@@ -4251,7 +4241,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, 
stmt_vec_info stmt_info,
  slp_node);
if (arginfo[i].vectype == NULL
|| !constant_multiple_p (bestn->simdclone->simdlen,
-simd_clone_subparts (arginfo[i].vectype)))
+TYPE_VECTOR_SUBPARTS (arginfo[i].vectype)))
  return false;
   }
 
@@ -4349,15 +4339,19 @@ vectorizable_simd_clone_call (vec_info *vinfo, 
stmt_vec_info stmt_info,
case SIMD_CLONE_ARG_TYPE_VECTOR:
  atype = bestn->simdclone->args[i].vector_type;
  o = vector_unroll_factor (nunits,
-   simd_clone_subparts (atype));
+   TYPE_VECTOR_SUBPARTS (atype));
  for (m = j * o; m < (j + 1) * o; m++)
{
- if (simd_clone_subparts (atype)
- < simd_clone_subparts (arginfo[i].vectype))
+ poly_uint64 atype_subparts = TYPE_VECTOR_SUBPARTS (atype);
+ poly_uint64 arginfo_subparts
+   = TYPE_VECTOR_SUBPARTS (arginfo[i].vectype);
+ if (known_lt (atype_subparts, arginfo_subparts))
{
  poly_uint64 prec = GET_MODE_BITSIZE (TYPE_MODE (atype));
- k = (simd_clone_subparts (arginfo[i].vectype)
-  / simd_clone_subparts (atype));
+ if (!constant_multiple_p (atype_subparts,
+

[PATCH 2/X] parloops: Copy target and optimizations when creating a function clone

2023-03-08 Thread Andre Vieira (lists) via Gcc-patches


Hi,

This patch makes sure we copy over 
DECL_FUNCTION_SPECIFIC_{TARGET,OPTIMIZATION} in parloops when creating 
function clones.  This is required for SVE clones as we will need to 
enable +sve for them, regardless of the current target options.
I don't actually need the 'OPTIMIZATION' for this patch, but it sounds 
like a nice feature to have, so you can use pragmas to better control 
options used in simd_clone generation.


gcc/ChangeLog:

* tree-parloops.cc (create_loop_fn): Copy specific target and 
optimization options

when creating a function clone.

Is this OK for stage 1?diff --git a/gcc/tree-parloops.cc b/gcc/tree-parloops.cc
index 
dfb75c369d6d00d893ddd6fc28f189ec0d774711..02c1ed3220a949c1349536ef3f74bb497bf76f71
 100644
--- a/gcc/tree-parloops.cc
+++ b/gcc/tree-parloops.cc
@@ -2203,6 +2203,11 @@ create_loop_fn (location_t loc)
   DECL_CONTEXT (t) = decl;
   TREE_USED (t) = 1;
   DECL_ARGUMENTS (decl) = t;
+  DECL_FUNCTION_SPECIFIC_TARGET (decl)
+= DECL_FUNCTION_SPECIFIC_TARGET (act_cfun->decl);
+  DECL_FUNCTION_SPECIFIC_OPTIMIZATION (decl)
+= DECL_FUNCTION_SPECIFIC_OPTIMIZATION (act_cfun->decl);
+
 
   allocate_struct_function (decl, false);

[PATCH 3/X] parloops: Allow poly number of iterations

2023-03-08 Thread Andre Vieira (lists) via Gcc-patches


Hi,

This patch modifies this function in parloops to allow it to handle 
loops with poly iteration counts.


gcc/ChangeLog:

* tree-parloops.cc (try_transform_to_exit_first_loop_alt): 
Handle poly nits.


Is this OK for Stage 1?diff --git a/gcc/tree-parloops.cc b/gcc/tree-parloops.cc
index 
02c1ed3220a949c1349536ef3f74bb497bf76f71..0a3133a3ae7932e11aa680dc14b8ea01613a514c
 100644
--- a/gcc/tree-parloops.cc
+++ b/gcc/tree-parloops.cc
@@ -2531,14 +2531,16 @@ try_transform_to_exit_first_loop_alt (class loop *loop,
   tree nit_type = TREE_TYPE (nit);
 
   /* Figure out whether nit + 1 overflows.  */
-  if (TREE_CODE (nit) == INTEGER_CST)
+  if (TREE_CODE (nit) == INTEGER_CST
+  || TREE_CODE (nit) == POLY_INT_CST)
 {
   if (!tree_int_cst_equal (nit, TYPE_MAX_VALUE (nit_type)))
{
  alt_bound = fold_build2_loc (UNKNOWN_LOCATION, PLUS_EXPR, nit_type,
   nit, build_one_cst (nit_type));
 
- gcc_assert (TREE_CODE (alt_bound) == INTEGER_CST);
+ gcc_assert (TREE_CODE (alt_bound) == INTEGER_CST
+ || TREE_CODE (alt_bound) == POLY_INT_CST);
  transform_to_exit_first_loop_alt (loop, reduction_list, alt_bound);
  return true;
}

[RFC 4/X] omp, aarch64: Add SVE support for 'omp declare simd' [PR 96342]

2023-03-08 Thread Andre Vieira (lists) via Gcc-patches


Hi,

This patch adds SVE support for simd clone generation when using 'omp 
declare simd'. The design is based on what was discussed in PR 96342, 
but I did not look at YangYang's patch as I wasn't sure of whether that 
code's copyright had been assigned to FSF.


This patch also is not in accordance with the examples in the BETA 
VFABIA64 document that can be found in the vfabia64 subdir of 
https://github.com/ARM-software/abi-aa/

If we agree to this approach I will propose changes to the ABI.
It differs in that we take the ommission of 'simdlen' to be the only way 
to create a SVE simd clone using 'omp declare simd', and that the 
current target defined on the command-line has no bearing in what simd 
clones are generated. This SVE simd clone is always VLA.
The document describes a way to specify SVE simdclones of VLS by using 
the simdlen clause, but that would require another way to toggle between 
SVE and Advanced SIMD and since there is no clause to do that for 'omp 
declare simd' I would have to assume this would be controllable by the 
command-line target options (march/mcpu).
By generating all possible Advanced SIMD simdlens and a VLA simdlen for 
SVE when ommitting simdlen we would be adhering to the same practice 
x86_64 does.


Targethook changes

This patch proposes two targethook changes:
1) Add mode parameter to TARGET_SIMD_CLONE_USABLE
We require the mode parameter to distinguish between calls to a simd 
clone from a Advanced SIMD mode and a SVE mode.


2) Add new TARGET_SIMD_CLONE_ADJUST_RET_OR_PARAM
We require this to be able to modify the types used in SVE simd clones, 
as we need to add the SVE type attribute so that the correct PCS can be 
applied.


Other notable changes:
- We discourage the use of an 'inbranch' simdclone for when the caller 
is not in a branch, such that it picks a 'notinbranch' variant if 
available over an inbranch one. (we could probably rely on ordering but 
that's quite error prone and the ordering I'm looking at is by 
definition target specific).
- I currently put the VLA mangling in the target agnostic mangling 
function, if other targets with VLA want to use a different mangling in 
the future we may want to change this into a targethook.



I'll create a ChangeLog when I turn this into a PATCH if we agree on 
this direction.diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 
f75eb892f3daa7c2576efcedc8d944ab1e895cdb..122a473770eb4526ecce326f02d843608d088b5b
 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -995,6 +995,8 @@ namespace aarch64_sve {
 #ifdef GCC_TARGET_H
   bool verify_type_context (location_t, type_context_kind, const_tree, bool);
 #endif
+ void add_sve_type_attribute (tree, unsigned int, unsigned int,
+ const char *, const char *);
 }
 
 extern void aarch64_split_combinev16qi (rtx operands[3]);
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc 
b/gcc/config/aarch64/aarch64-sve-builtins.cc
index 
161a14edde7c9fb1b13b146cf50463e2d78db264..6f99c438d10daa91b7e3b623c995489f1a8a0f4c
 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -569,14 +569,16 @@ static bool reported_missing_registers_p;
 /* Record that TYPE is an ABI-defined SVE type that contains NUM_ZR SVE vectors
and NUM_PR SVE predicates.  MANGLED_NAME, if nonnull, is the ABI-defined
mangling of the type.  ACLE_NAME is the  name of the type.  */
-static void
+void
 add_sve_type_attribute (tree type, unsigned int num_zr, unsigned int num_pr,
const char *mangled_name, const char *acle_name)
 {
   tree mangled_name_tree
 = (mangled_name ? get_identifier (mangled_name) : NULL_TREE);
+  tree acle_name_tree
+= (acle_name ? get_identifier (acle_name) : NULL_TREE);
 
-  tree value = tree_cons (NULL_TREE, get_identifier (acle_name), NULL_TREE);
+  tree value = tree_cons (NULL_TREE, acle_name_tree, NULL_TREE);
   value = tree_cons (NULL_TREE, mangled_name_tree, value);
   value = tree_cons (NULL_TREE, size_int (num_pr), value);
   value = tree_cons (NULL_TREE, size_int (num_zr), value);
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
5c40b6ed22a508723bd535a7460762c3a243d441..ef93a4e9d43799df4410f152cdd798db285e8897
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -4015,13 +4015,13 @@ aarch64_takes_arguments_in_sve_regs_p (const_tree 
fntype)
 static const predefined_function_abi &
 aarch64_fntype_abi (const_tree fntype)
 {
-  if (lookup_attribute ("aarch64_vector_pcs", TYPE_ATTRIBUTES (fntype)))
-return aarch64_simd_abi ();
-
   if (aarch64_returns_value_in_sve_regs_p (fntype)
   || aarch64_takes_arguments_in_sve_regs_p (fntype))
 return aarch64_sve_abi ();
 
+  if (lookup_attribute ("aarch64_vector_pcs", TYPE_ATTRIBUTES (fntype)))
+return aarch64_simd_abi ();
+
   return default_function_abi;
 }

[RFC 5/X] omp: Create simd clones from 'omp declare variant's

2023-03-08 Thread Andre Vieira (lists) via Gcc-patches


Hi,

This RFC extends the omp-simd-clone pass to create simd clones for 
functions with 'omp declare variant' pragmas that contain simd 
constructs. This patch also implements AArch64's use for this functionality.
This requires two extra pieces of information be kept for each 
simd-clone, a 'variant_name' since each variant has to be named upon 
declaration, and a 'device' since a omp variant has the capability of 
having device clauses that can 'select' the device the variant is meant 
to be used with. For the latter I decided to currently implement it as 
an 'int', to keep a 'code' per device which is target dependent. Though 
we may want to expand this in the future to contain a target dependent 
'target selector' of sorts. This would enable the implementation of the 
'arch' device clause we describe in the BETA ABI can be found in the 
vfabia64 subdir of https://github.com/ARM-software/abi-aa/, this patch 
only implements support for the two 'isa' device clauses isa("simd") and 
isa("sve").


I'll create a ChangeLog when I turn this into a PATCH if we agree on 
this direction.diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 
b5fc739f1b0602a871040292a5bb1d69a9ef305f..ae1af65a9b5913ec435e783223e79767ddd68341
 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -810,6 +810,14 @@ struct GTY(()) cgraph_simd_clone {
   /* Original cgraph node the SIMD clones were created for.  */
   cgraph_node *origin;
 
+  /* This is a flag to indicate what device was selected for the variant
+ clone.  Always 0 for 'omp declare simd' clones.  */
+  unsigned device;
+
+  /* The identifier for the name of the variant in case of a declare variant
+ clone, this is NULL_TREE for declare simd clones.  */
+  tree variant_name;
+
   /* Annotated function arguments for the original function.  */
   cgraph_simd_clone_arg GTY((length ("%h.nargs"))) args[1];
 };
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
ef93a4e9d43799df4410f152cdd798db285e8897..344c6001fdd646a31326f5deb8ff94873d346ed1
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -26970,15 +26970,28 @@ aarch64_simd_clone_compute_vecsize_and_simdlen 
(struct cgraph_node *node,
 
   clonei->mask_mode = VOIDmode;
   elt_bits = GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type));
+  /* A simdclone without simdlen can legally originate from either a:
+ 'omp declare simd':
+   In this case generate at least 3 simd clones, one for Advanced SIMD
+   64-bit vectors, one for Advanced SIMD 128-bit vectors and one for SVE
+   vector length agnostic vectors.
+  'omp declare variant':
+   In this case we must be generating a simd clone for SVE vector length
+   agnostic vectors.
+   */
   if (known_eq (clonei->simdlen, 0U))
 {
-  if (num >= 2)
+  if (clonei->device == 2 || num >= 2)
{
+ count = 1;
  vec_bits = poly_uint64 (128, 128);
  clonei->simdlen = exact_div (vec_bits, elt_bits);
}
   else
{
+ if (clonei->device != 0)
+   return 0;
+
  count = 3;
  vec_bits = (num == 0 ? 64 : 128);
  clonei->simdlen = exact_div (vec_bits, elt_bits);
@@ -26991,7 +27004,14 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct 
cgraph_node *node,
   /* For now, SVE simdclones won't produce illegal simdlen, So only check
 const simdlens here.  */
   if (clonei->simdlen.is_constant (&const_simdlen)
- && maybe_ne (vec_bits, 64U) && maybe_ne (vec_bits, 128U))
+ /* For Advanced SIMD we require either 64- or 128-bit vectors.  */
+ && ((clonei->device < 2
+  && maybe_ne (vec_bits, 64U)
+  && maybe_ne (vec_bits, 128U))
+ /* For SVE we require multiples of 128-bits.  TODO: should we check
+for max VL?  */
+ || (clonei->device == 2
+ && !constant_multiple_p (vec_bits, 128
{
  if (explicit_p)
warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
@@ -27002,7 +27022,7 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct 
cgraph_node *node,
}
 }
 
-  if (num >= 2)
+  if (clonei->device == 2 || num >= 2)
 {
   clonei->vecsize_mangle = 's';
   clonei->inbranch = 1;
@@ -27082,22 +27102,21 @@ aarch64_simd_clone_adjust_ret_or_param (struct 
cgraph_node *node, tree type,
aarch64_sve_vg = poly_uint16 (2, 2);
   unsigned int num_zr = 0;
   unsigned int num_pr = 0;
+  tree base_type = TREE_TYPE (type);
+  if (POINTER_TYPE_P (base_type))
+   base_type = pointer_sized_int_node;
+  scalar_mode base_mode = as_a  (TYPE_MODE (base_type));
+  machine_mode vec_mode = aarch64_full_sve_mode (base_mode).require ();
+  tree vectype = build_vector_type_for_mode (base_type, vec_mode);
   if (is_mask)
{
- type = truth_type_for (type);
  num_pr = 1;
+ type = truth_type_for (vectype);
}
   el

[RFC 6/X] omp: Allow creation of simd clones from omp declare variant with -fopenmp-simd flag

2023-03-08 Thread Andre Vieira (lists) via Gcc-patches


Hi,

This RFC is to propose relaxing the flag needed to allow the creation of 
simd clones from omp declare variants, such that we can use 
-fopenmp-simd rather than -fopenmp.
This should only change the behaviour of omp simd clones and should not 
enable any other openmp functionality, though I need to test this 
furter, for the time being I just played around a bit with some of the 
existing declare-variant tests.


Any objections to this in general? And/or ideas to properly test the 
effect of this on other omp codegen? My current plan is to have a look 
at the declare-variant tests we had before this patch series, locally 
modify them to pass -fopenmp-simd and make sure they fail the same way 
before and after this patch.diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 
21bc3167ce224823c214efc064be399f2da9c787..b28e3d0a8adb520941dc3a17173cc07de4a653c5
 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -23564,6 +23564,13 @@ c_parser_omp_declare (c_parser *parser, enum 
pragma_context context)
  c_parser_omp_declare_reduction (parser, context);
  return false;
}
+  if (strcmp (p, "variant") == 0)
+   {
+ /* c_parser_consume_token (parser); done in
+c_parser_omp_declare_simd.  */
+ c_parser_omp_declare_simd (parser, context);
+ return true;
+   }
   if (!flag_openmp)  /* flag_openmp_simd  */
{
  c_parser_skip_to_pragma_eol (parser, false);
@@ -23575,13 +23582,6 @@ c_parser_omp_declare (c_parser *parser, enum 
pragma_context context)
  c_parser_omp_declare_target (parser);
  return false;
}
-  if (strcmp (p, "variant") == 0)
-   {
- /* c_parser_consume_token (parser); done in
-c_parser_omp_declare_simd.  */
- c_parser_omp_declare_simd (parser, context);
- return true;
-   }
 }
 
   c_parser_error (parser, "expected %, %, "
diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 
1aa5f1a7898df9483a2af4f6f9fea99e6b219271..7bd32fd3e345a003be03d1e9acf33db76eed9460
 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -8428,7 +8428,7 @@ cp_finish_decl (tree decl, tree init, bool 
init_const_expr_p,
suppress_warning (decl, OPT_Winit_self);
 }
 
-  if (flag_openmp
+  if (flag_openmp_simd
   && TREE_CODE (decl) == FUNCTION_DECL
   /* #pragma omp declare variant on methods handled in finish_struct
 instead.  */
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 
1a124f5395e018f3c4b2f9f36fcd42159d0b868f..d1c7f9d91d2546ad8f5674232a05f7d7726eeafe
 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -47884,7 +47884,7 @@ cp_parser_omp_declare (cp_parser *parser, cp_token 
*pragma_tok,
  context, false);
  return true;
}
-  if (flag_openmp && strcmp (p, "variant") == 0)
+  if (strcmp (p, "variant") == 0)
{
  cp_lexer_consume_token (parser->lexer);
  cp_parser_omp_declare_simd (parser, pragma_tok,
diff --git a/gcc/testsuite/gcc.target/aarch64/declare-variant-1.c 
b/gcc/testsuite/gcc.target/aarch64/declare-variant-1.c
index 
c44c9464f4e27047db9be5b0c9710ae3cfee8eee..83eeadd108b5578623c63e73dea11b2b17a08618
 100644
--- a/gcc/testsuite/gcc.target/aarch64/declare-variant-1.c
+++ b/gcc/testsuite/gcc.target/aarch64/declare-variant-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -fopenmp" } */
+/* { dg-options "-O3 -fopenmp-simd" } */
 
 #include "declare-variant-1.x"
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/declare-variant-1.c 
b/gcc/testsuite/gcc.target/aarch64/sve/declare-variant-1.c
index 
7a8129fe88ac9759b2337892a3d14f4e8196e61f..616b0ed1c1dc019103dae504d2cec65523a35a3d
 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/declare-variant-1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/declare-variant-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -fopenmp" } */
+/* { dg-options "-O3 -fopenmp-simd" } */
 
 #include "../declare-variant-1.x"
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/declare-variant-2.c 
b/gcc/testsuite/gcc.target/aarch64/sve/declare-variant-2.c
index 
2b6eabac76cf1cd059ec8d960ddd9e30973dc797..a832c5255306999b0006b68b1890c7f42c3dafb0
 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/declare-variant-2.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/declare-variant-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -fopenmp -msve-vector-bits=128" } */
+/* { dg-options "-O3 -fopenmp-simd -msve-vector-bits=128" } */
 
 #include "../declare-variant-1.x"
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/declare-variant-3.c 
b/gcc/testsuite/gcc.target/aarch64/sve/declare-variant-3.c
index 
e8b598fe479d7e1e92eb7f9e3413d5ac183626a9..455c0338d4680d143daae666c29e4f018df5bff9
 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/declare-variant-3.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/declare-variant-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-o

[PATCH] libstdc++: Implement P2520R0 changes to move_iterator's iterator_concept

2023-03-08 Thread Patrick Palka via Gcc-patches

Tested on x86_64-pc-linux-gnu, does this look OK for trunk and perhaps
backports?

libstdc++-v3/ChangeLog:

* include/bits/stl_iterator.h (move_iterator::_S_iter_concept):
Define.
(__cpp_lib_move_iterator_concept): Define for C++20.
(move_iterator::iterator_concept): Strengthen as per P2520R0.
* include/std/version (__cpp_lib_move_iterator_concept): Define
for C++20.
* testsuite/24_iterators/move_iterator/p2520r0.cc: New test.
---
 libstdc++-v3/include/bits/stl_iterator.h  | 20 +-
 libstdc++-v3/include/std/version  |  1 +
 .../24_iterators/move_iterator/p2520r0.cc | 37 +++
 3 files changed, 57 insertions(+), 1 deletion(-)
 create mode 100644 libstdc++-v3/testsuite/24_iterators/move_iterator/p2520r0.cc

diff --git a/libstdc++-v3/include/bits/stl_iterator.h 
b/libstdc++-v3/include/bits/stl_iterator.h
index c20dc9ecac5..a6a09dbac16 100644
--- a/libstdc++-v3/include/bits/stl_iterator.h
+++ b/libstdc++-v3/include/bits/stl_iterator.h
@@ -1465,11 +1465,29 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
&& convertible_to;
 #endif
 
+#if __cplusplus > 201703L && __cpp_lib_concepts
+  static auto
+  _S_iter_concept()
+  {
+   if constexpr (random_access_iterator<_Iterator>)
+ return random_access_iterator_tag{};
+   else if constexpr (bidirectional_iterator<_Iterator>)
+ return bidirectional_iterator_tag{};
+   else if constexpr (forward_iterator<_Iterator>)
+ return forward_iterator_tag{};
+   else
+ return input_iterator_tag{};
+  }
+#endif
+
 public:
   using iterator_type = _Iterator;
 
 #if __cplusplus > 201703L && __cpp_lib_concepts
-  using iterator_concept = input_iterator_tag;
+  // This is P2520R0, a C++23 change, but we treat it as a DR against 
C++20.
+# define __cpp_lib_move_iterator_concept 202207L
+  using iterator_concept = decltype(_S_iter_concept());
+
   // iterator_category defined in __move_iter_cat
   using value_type = iter_value_t<_Iterator>;
   using difference_type = iter_difference_t<_Iterator>;
diff --git a/libstdc++-v3/include/std/version b/libstdc++-v3/include/std/version
index 871d30db5b3..25ebfc3e512 100644
--- a/libstdc++-v3/include/std/version
+++ b/libstdc++-v3/include/std/version
@@ -289,6 +289,7 @@
 #define __cpp_lib_polymorphic_allocator 201902L
 #if __cpp_lib_concepts
 # define __cpp_lib_ranges 202110L
+# define __cpp_lib_move_iterator_concept 202207L
 #endif
 #if __cpp_lib_atomic_wait || _GLIBCXX_HAVE_POSIX_SEMAPHORE
 # define __cpp_lib_semaphore 201907L
diff --git a/libstdc++-v3/testsuite/24_iterators/move_iterator/p2520r0.cc 
b/libstdc++-v3/testsuite/24_iterators/move_iterator/p2520r0.cc
new file mode 100644
index 000..883d6cc09e0
--- /dev/null
+++ b/libstdc++-v3/testsuite/24_iterators/move_iterator/p2520r0.cc
@@ -0,0 +1,37 @@
+// { dg-options "-std=gnu++20" }
+// { dg-do compile { target c++20 } }
+
+// Verify P2520R0 changes to move_iterator's iterator_concept, which we treat
+// as a DR against C++20.
+
+#include 
+#if __cpp_lib_move_iterator_concept != 202207L
+# error "Feature-test macro __cpp_lib_move_iterator_concept has wrong value in 
"
+#endif
+
+#undef __cpp_lib_move_iterator_concept
+#include 
+#if __cpp_lib_move_iterator_concept != 202207L
+# error "Feature-test macro __cpp_lib_move_iterator_concept has wrong value in 
"
+#endif
+
+#include 
+
+using __gnu_test::test_input_range;
+using __gnu_test::test_forward_range;
+using __gnu_test::test_bidirectional_range;
+using __gnu_test::test_random_access_range;
+
+using ty1 = 
std::move_iterator&>().begin())>;
+static_assert(std::same_as);
+
+using ty2 = 
std::move_iterator&>().begin())>;
+static_assert(std::same_as);
+
+using ty3 = 
std::move_iterator&>().begin())>;
+static_assert(std::same_as);
+
+using ty4 = 
std::move_iterator&>().begin())>;
+static_assert(std::same_as);
+
+static_assert(std::random_access_iterator>);
-- 
2.40.0.rc0.57.g454dfcbddf

Re: [PATCH RFC] c++: lambda mangling alias issues [PR107897]

2023-03-08 Thread Jason Merrill via Gcc-patches


On 3/8/23 11:15, Jason Merrill wrote:

On 3/8/23 10:53, Jan Hubicka wrote:
Tested x86_64-pc-linux-gnu.  Does this look good, or do we want to 
factor the

flag clearing into a symtab_node counterpart to cgraph_node::reset?

-- 8< --

In 107897, by the time we are looking at the mangling clash, the
alias has already been removed from the symbol table by 
analyze_functions,

so we can't look at n->cpp_implicit_alias.  So just assume that it's an
alias if it's internal.

In 108887 the problem is that removing the mangling alias from the 
symbol

table confuses analyze_functions, because it ended up as first_analyzed
somehow, so it becomes a dangling pointer.  Fixed by clearing various 
flags

to neutralize the alias.

PR c++/107897
PR c++/108887

gcc/cp/ChangeLog:

* decl2.cc (record_mangling): Improve symbol table handling.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-lambda3.C: Use -flto if supported.
* g++.dg/cpp0x/lambda/lambda-mangle7.C: New test.
---
  gcc/cp/decl2.cc   | 25 +--
  .../g++.dg/cpp0x/lambda/lambda-mangle7.C  | 70 +++
  gcc/testsuite/g++.dg/cpp2a/concepts-lambda3.C |  1 +
  3 files changed, 91 insertions(+), 5 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/lambda/lambda-mangle7.C

diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
index 387e24542cd..e6e58b08de4 100644
--- a/gcc/cp/decl2.cc
+++ b/gcc/cp/decl2.cc
@@ -4742,15 +4742,30 @@ record_mangling (tree decl, bool need_warning)
  = mangled_decls->find_slot_with_hash (id, IDENTIFIER_HASH_VALUE 
(id),

    INSERT);
-  /* If this is already an alias, remove the alias, because the real
+  /* If this is already an alias, cancel the alias, because the real
   decl takes precedence.  */
    if (*slot && DECL_ARTIFICIAL (*slot) && DECL_IGNORED_P (*slot))
-    if (symtab_node *n = symtab_node::get (*slot))
-  if (n->cpp_implicit_alias)
+    {
+  if (symtab_node *n = symtab_node::get (*slot))
  {
-  n->remove ();
-  *slot = NULL_TREE;
+  if (n->cpp_implicit_alias)
+    {
+  /* Actually removing the node isn't safe if other code is 
already

+ holding a pointer to it, so just neutralize it.  */
+  n->remove_from_same_comdat_group ();
+  n->analyzed = false;
+  n->definition = false;
+  n->alias = false;
+  n->cpp_implicit_alias = false;

We have n->reset () for that which is used in similar situation when
frontends overwrites extern inline function by its different offline
implementation.


The problem there is that reset() is a member of cgraph_node, not 
symtab_node, and I need something that works for variables as well.



reset doesn't call remove_from_same_comdat_group probably because it was
never used on anything in comdat group.  So I think it would make sense
to call n->reset() here and add remove_from_same_comdat_group into that.


How about moving it to symtab_node and using dyn_cast for the cgraph 
bits, like this:
From 1d869ceb04573727e59be6518903133c8654069a Mon Sep 17 00:00:00 2001
From: Jason Merrill 
Date: Mon, 6 Mar 2023 15:33:45 -0500
Subject: [PATCH] c++: lambda mangling alias issues [PR107897]
To: gcc-patches@gcc.gnu.org

In 107897, by the time we are looking at the mangling clash, the
alias has already been removed from the symbol table by analyze_functions,
so we can't look at n->cpp_implicit_alias.  So just assume that it's an
alias if it's internal.

In 108887 the problem is that removing the mangling alias from the symbol
table confuses analyze_functions, because it ended up as first_analyzed
somehow, so it becomes a dangling pointer.  Fixed by clearing various flags
to neutralize the alias.

	PR c++/107897
	PR c++/108887

gcc/ChangeLog:

	* cgraph.h: Move reset() from cgraph_node to symtab_node.
	* cgraphunit.cc (symtab_node::reset): Adjust.

gcc/cp/ChangeLog:

	* decl2.cc (record_mangling): Use symtab_node::reset.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp2a/concepts-lambda3.C: Use -flto if supported.
	* g++.dg/cpp0x/lambda/lambda-mangle7.C: New test.
---
 gcc/cgraph.h  | 16 ++---
 gcc/cgraphunit.cc | 27 ---
 gcc/cp/decl2.cc   | 19 +++--
 .../g++.dg/cpp0x/lambda/lambda-mangle7.C  | 70 +++
 gcc/testsuite/g++.dg/cpp2a/concepts-lambda3.C |  1 +
 5 files changed, 109 insertions(+), 24 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/lambda/lambda-mangle7.C

diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index b5fc739f1b0..fb938470be9 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -149,6 +149,14 @@ public:
  cgraph/varpool node creation routines.  */
   void register_symbol (void);
 
+  /* As an GCC extension we allow redefinition of the function.  The
+ semantics when both copies of bodies differ is not well defined.
+ We replace the old body with new body so in unit at a time mode
+ we always use

Re: [PATCH v2 0/5] A small Texinfo refinement

2023-03-08 Thread Sandra Loosemore


On 3/8/23 02:11, Arsen Arsenović wrote:


Sandra Loosemore  writes:


On 2/23/23 03:27, Arsen Arsenović via Gcc-patches wrote:

I've rerendered the updated documentation with latest development
Texinfo (as some of the changes I made for the purposes of the GCC
manual still aren't in releases) at:
https://www.aarsen.me/~arsen/final/


Ummm.  I don't think GCC's documentation should depend on an unreleased version
of Texinfo.  Currently install.texi documents that version 4.7 or later is
required, 4.8 for "make pdf"; did I miss something in your patch set that bumps
this requirement?  Exactly what features do you depend on that are not yet
supported by an official Texinfo release?


This patch should still build with older Texinfo versions (albeit, I
hadn't tested 4.7, I missed that requirement).  The unreleased version
should be installed on the server building HTML documentation as it
produces better results w.r.t clickable anchors and index-in-table
handling.  It should not be a hard dependency, and should only degrade
to its current state should in-dev Texinfo be missing.


Hmmm, OK.  We presently have Texinfo version 6.7 installed here, so I'll 
give that a try.  I'm not sure I'd be able to detect problems with 
incorrect HTML anchors or whatever, though.


Most people building GCC from source probably use whatever versions of 
build dependencies are provided by their OS distribution.  In our group 
we need reproducible builds for long-term support so we maintain our own 
list of dependencies and normally update to the latest stable versions 
only once every few years unless there is a hard requirement to upgrade 
some particular tool meanwhile.  I personally do not know how the 
manuals for the GCC web site are built, but it seems kind of important 
to make sure that works as intended since it's the main online resource 
for ordinary GCC users.



It might be worth bumping the minimum, 4.7 is a version from 2004; in
the meanwhile, I'll try a few older versions too.


I agree that it's unlikely anyone is building current GCC with a Texinfo 
version as old as 4.7 any more, and it may be that the manual doesn't 
even build properly with such an old release due to existing 
unintentional dependencies on newer features, independently of your 
patch.  If we do update the version, there's a version check in 
configure.ac and some hack for "makeinfo 4.7 brokenness" in 
doc/install.texi2html that need to be changed, as well as install.texi.


-Sandra

Re: [V4][PATCH 0/2] Handle component_ref to a structure/union field including FAM for builtin_object_size

2023-03-08 Thread Kees Cook via Gcc-patches

On Fri, Feb 24, 2023 at 06:35:03PM +, Qing Zhao wrote:
> Hi, Joseph and Richard,
> 
> Could you please review this patch and let me know whether it’s ready
>  for committing into GCC13?
> 
> The fix to Bug PR101832 is an important patch for kernel security
> purpose. it's better to be put into GCC13.

Hi!

This series tests well for me. Getting this landed would be very nice
for the Linux kernel. :)

Are there any additional review notes for it, or is it ready to land?

Thanks!

-Kees

-- 
Kees Cook

[PATCH] ubsan: missed -fsanitize=bounds for compound ops [PR108060]

2023-03-08 Thread Marek Polacek via Gcc-patches

In this PR we are dealing with a missing .UBSAN_BOUNDS, so the
out-of-bounds access in the test makes the program crash before
a UBSan diagnostic was emitted.  In C and C++, c_genericize gets

  a[b] = a[b] | c;

but in C, both a[b] are one identical shared tree (not in C++ because
cp_fold/ARRAY_REF created two same but not identical trees).  Since
ubsan_walk_array_refs_r keeps a pset, in C we produce

  a[.UBSAN_BOUNDS (0B, SAVE_EXPR , 8);, SAVE_EXPR ;] = a[b] | c;

because the LHS is walked before the RHS.

Since r7-1900, we gimplify the RHS before the LHS.  So the statement above
gets gimplified into

_1 = a[b];
c.0_2 = c;
b.1 = b;
.UBSAN_BOUNDS (0B, b.1, 8);

With this patch we produce:

  a[b] = a[.UBSAN_BOUNDS (0B, SAVE_EXPR , 8);, SAVE_EXPR ;] | c;

which gets gimplified into:

b.0 = b;
.UBSAN_BOUNDS (0B, b.0, 8);
_1 = a[b.0];

therefore we emit a runtime error before making the bad array access.

I think it's OK that only the RHS gets a .UBSAN_BOUNDS, as in few lines
above: the instrumented array access dominates the array access on the
LHS, and I've verified that

  b = 0;
  a[b] = (a[b], b = -32768, a[0] | c);

works as expected: the inner a[b] is OK but we do emit an error for the
a[b] on the LHS.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/12?

PR sanitizer/108060
PR sanitizer/109050

gcc/c-family/ChangeLog:

* c-gimplify.cc (ubsan_walk_array_refs_r): For a MODIFY_EXPR, instrument
the RHS before the LHS.

gcc/testsuite/ChangeLog:

* c-c++-common/ubsan/bounds-17.c: New test.
* c-c++-common/ubsan/bounds-18.c: New test.
* c-c++-common/ubsan/bounds-19.c: New test.
* c-c++-common/ubsan/bounds-20.c: New test.
---
 gcc/c-family/c-gimplify.cc   | 12 
 gcc/testsuite/c-c++-common/ubsan/bounds-17.c | 17 +
 gcc/testsuite/c-c++-common/ubsan/bounds-18.c | 17 +
 gcc/testsuite/c-c++-common/ubsan/bounds-19.c | 20 
 gcc/testsuite/c-c++-common/ubsan/bounds-20.c | 16 
 5 files changed, 82 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/ubsan/bounds-17.c
 create mode 100644 gcc/testsuite/c-c++-common/ubsan/bounds-18.c
 create mode 100644 gcc/testsuite/c-c++-common/ubsan/bounds-19.c
 create mode 100644 gcc/testsuite/c-c++-common/ubsan/bounds-20.c

diff --git a/gcc/c-family/c-gimplify.cc b/gcc/c-family/c-gimplify.cc
index 74b276b2b26..ef5c7d919fc 100644
--- a/gcc/c-family/c-gimplify.cc
+++ b/gcc/c-family/c-gimplify.cc
@@ -106,6 +106,18 @@ ubsan_walk_array_refs_r (tree *tp, int *walk_subtrees, 
void *data)
 }
   else if (TREE_CODE (*tp) == ARRAY_REF)
 ubsan_maybe_instrument_array_ref (tp, false);
+  else if (TREE_CODE (*tp) == MODIFY_EXPR)
+{
+  /* Since r7-1900, we gimplify RHS before LHS.  Consider
+  a[b] |= c;
+wherein we can have a single shared tree a[b] in both LHS and RHS.
+If we only instrument the LHS and the access is invalid, the program
+could crash before emitting a UBSan error.  So instrument the RHS
+first.  */
+  *walk_subtrees = 0;
+  walk_tree (&TREE_OPERAND (*tp, 1), ubsan_walk_array_refs_r, pset, pset);
+  walk_tree (&TREE_OPERAND (*tp, 0), ubsan_walk_array_refs_r, pset, pset);
+}
   return NULL_TREE;
 }
 
diff --git a/gcc/testsuite/c-c++-common/ubsan/bounds-17.c 
b/gcc/testsuite/c-c++-common/ubsan/bounds-17.c
new file mode 100644
index 000..b727e3235b8
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/ubsan/bounds-17.c
@@ -0,0 +1,17 @@
+/* PR sanitizer/108060 */
+/* { dg-do run } */
+/* { dg-options "-fsanitize=bounds" } */
+/* { dg-skip-if "" { *-*-* } "-flto" } */
+/* { dg-shouldfail "ubsan" } */
+
+int a[8];
+int c;
+
+int
+main ()
+{
+  int b = -32768;
+  a[b] |= c;
+}
+
+/* { dg-output "index -32768 out of bounds for type 'int \\\[8\\\]'" } */
diff --git a/gcc/testsuite/c-c++-common/ubsan/bounds-18.c 
b/gcc/testsuite/c-c++-common/ubsan/bounds-18.c
new file mode 100644
index 000..556abc0e1c0
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/ubsan/bounds-18.c
@@ -0,0 +1,17 @@
+/* PR sanitizer/108060 */
+/* { dg-do run } */
+/* { dg-options "-fsanitize=bounds" } */
+/* { dg-skip-if "" { *-*-* } "-flto" } */
+/* { dg-shouldfail "ubsan" } */
+
+int a[8];
+int c;
+
+int
+main ()
+{
+  int b = -32768;
+  a[b] = a[b] | c;
+}
+
+/* { dg-output "index -32768 out of bounds for type 'int \\\[8\\\]'" } */
diff --git a/gcc/testsuite/c-c++-common/ubsan/bounds-19.c 
b/gcc/testsuite/c-c++-common/ubsan/bounds-19.c
new file mode 100644
index 000..54217ae399f
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/ubsan/bounds-19.c
@@ -0,0 +1,20 @@
+/* PR sanitizer/108060 */
+/* { dg-do run } */
+/* { dg-options "-fsanitize=bounds" } */
+/* { dg-skip-if "" { *-*-* } "-flto" } */
+/* { dg-shouldfail "ubsan" } */
+
+int a[8];
+int a2[18];
+int c;
+
+int
+main ()
+{
+  int b = 0;
+  a[0] = (a2[b], b = -32768, a[0] | c);

Re: [PATCH] RISC-V: Add fault first load C/C++ support

2023-03-08 Thread Bernhard Reutner-Fischer via Gcc-patches

On 7 March 2023 07:21:23 CET, juzhe.zh...@rivai.ai wrote:
>From: Ju-Zhe Zhong 
>

>+class vleff : public function_base
>+{
>+public:
>+  unsigned int call_properties (const function_instance &) const override
>+  {
>+return CP_READ_MEMORY | CP_WRITE_CSR;
>+  }
>+
>+  gimple *fold (gimple_folder &f) const override
>+  {
>+/* fold vleff (const *base, size_t *new_vl, size_t vl)
>+
>+   > vleff (const *base, size_t vl)
>+   new_vl = MEM_REF[read_vl ()].  */
>+
>+auto_vec vargs;

Where is that magic 8 coming from?

Wouldn't you rather have one temporary to hold this manually CSEd

nargs = gimple_call_num_args (f.call) - 2;

which you would use throughout this function as it does not seem to change?

Would you reserve something based off nargs for the auto_vec above?
If not, please add a comment where the 8 comes from?

thanks,

>+
>+for (unsigned i = 0; i < gimple_call_num_args (f.call); i++)
>+  {
>+  /* Exclude size_t *new_vl argument.  */
>+  if (i == gimple_call_num_args (f.call) - 2)
>+continue;
>+
>+  vargs.quick_push (gimple_call_arg (f.call, i));
>+  }
>+
>+gimple *repl = gimple_build_call_vec (gimple_call_fn (f.call), vargs);
>+gimple_call_set_lhs (repl, f.lhs);
>+
>+/* Handle size_t *new_vl by read_vl.  */
>+tree new_vl = gimple_call_arg (f.call, gimple_call_num_args (f.call) - 2);
>+if (integer_zerop (new_vl))
>+  {
>+  /* This case happens when user passes the nullptr to new_vl argument.
>+ In this case, we just need to ignore the new_vl argument and return
>+ vleff instruction directly. */
>+  return repl;
>+  }
>+
>+tree tmp_var = create_tmp_var (size_type_node, "new_vl");
>+tree decl = get_read_vl_decl ();
>+gimple *g = gimple_build_call (decl, 0);
>+gimple_call_set_lhs (g, tmp_var);
>+tree indirect
>+  = fold_build2 (MEM_REF, size_type_node,
>+   gimple_call_arg (f.call,
>+gimple_call_num_args (f.call) - 2),
>+   build_int_cst (build_pointer_type (size_type_node), 0));
>+gassign *assign = gimple_build_assign (indirect, tmp_var);
>+
>+gsi_insert_after (f.gsi, assign, GSI_SAME_STMT);
>+gsi_insert_after (f.gsi, g, GSI_SAME_STMT);
>+return repl;
>+  }
>+

Re: [PATCH] testsuite: Fix omp-parallel-for-get-min.c and -for-1.c for non-openmp

2023-03-08 Thread David Malcolm via Gcc-patches

On Wed, 2023-03-08 at 05:58 +0100, Hans-Peter Nilsson wrote:
> Committed as obvious.
> -- >8 --
> The recently added tests missed checking for "fopenmp" (see
> other tests where "-fopenmp" is passed), which makes them
> fail on non-openmp systems.

Sorry about that; thanks for the fix.
Dave

Re: [PATCH][stage1] Remove conditionals around free()

2023-03-08 Thread Thomas Schwinge

Hi Bernhard!

On 2023-03-01T22:28:56+0100, Bernhard Reutner-Fischer via Gcc-patches 
 wrote:
> // POSIX: free(NULL) is perfectly valid
> // quote: If ptr is a null pointer, no action shall occur.
> @ rule1 @
> expression e;
> @@
>
> - if (e != NULL)
> -  { free(e); }
> + free (e);

Nice, Coccinelle/spatch!  (Another very interesting tool that I so far
had no chance to actually use.)

> # find ./ \( -name "*.[ch]" -o -name "*.cpp" \) -a \( ! -path 
> "./gcc/testsuite/*" -a ! -path "./gcc/contrib/*" \) -exec spatch --sp-file 
> ~/coccinelle/free-without-if-null.0.cocci --in-place

Also include '*.cc' if you'd like to find some more in 'gcc/' (and
possibly elsewhere, too) than just the following lonely one.  ;-)

> --- a/gcc/ada/rtinit.c
> +++ b/gcc/ada/rtinit.c
> @@ -481,8 +481,7 @@ __gnat_runtime_initialize (int install_handler)
>
>FindClose (hDir);
>
> -  if (dir != NULL)
> -free (dir);
> +  free (dir);


Grüße
 Thomas
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955

[PATCH V2] RISC-V: Add fault first load C/C++ support

2023-03-08 Thread juzhe . zhong

From: Ju-Zhe Zhong 

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc: Remove magic number.

---
 gcc/config/riscv/riscv-vector-builtins-bases.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 9f87f8c645a..3f0f809c714 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -1617,7 +1617,7 @@ public:
> vleff (const *base, size_t vl)
 new_vl = MEM_REF[read_vl ()].  */
 
-auto_vec vargs;
+auto_vec vargs (gimple_call_num_args (f.call) - 1);
 
 for (unsigned i = 0; i < gimple_call_num_args (f.call); i++)
   {
-- 
2.36.3

Re: Re: [PATCH] RISC-V: Add fault first load C/C++ support

2023-03-08 Thread juzhe.zhong

Address comment and fix it in this V2 patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-March/613608.html



juzhe.zh...@rivai.ai
 
From: Bernhard Reutner-Fischer
Date: 2023-03-09 05:16
To: juzhe.zhong; gcc-patches
CC: kito.cheng; Ju-Zhe Zhong
Subject: Re: [PATCH] RISC-V: Add fault first load C/C++ support
On 7 March 2023 07:21:23 CET, juzhe.zh...@rivai.ai wrote:
>From: Ju-Zhe Zhong 
>
 
>+class vleff : public function_base
>+{
>+public:
>+  unsigned int call_properties (const function_instance &) const override
>+  {
>+return CP_READ_MEMORY | CP_WRITE_CSR;
>+  }
>+
>+  gimple *fold (gimple_folder &f) const override
>+  {
>+/* fold vleff (const *base, size_t *new_vl, size_t vl)
>+
>+   > vleff (const *base, size_t vl)
>+  new_vl = MEM_REF[read_vl ()].  */
>+
>+auto_vec vargs;
 
Where is that magic 8 coming from?
 
Wouldn't you rather have one temporary to hold this manually CSEd
 
nargs = gimple_call_num_args (f.call) - 2;
 
which you would use throughout this function as it does not seem to change?
 
Would you reserve something based off nargs for the auto_vec above?
If not, please add a comment where the 8 comes from?
 
thanks,
 
>+
>+for (unsigned i = 0; i < gimple_call_num_args (f.call); i++)
>+  {
>+ /* Exclude size_t *new_vl argument.  */
>+ if (i == gimple_call_num_args (f.call) - 2)
>+   continue;
>+
>+ vargs.quick_push (gimple_call_arg (f.call, i));
>+  }
>+
>+gimple *repl = gimple_build_call_vec (gimple_call_fn (f.call), vargs);
>+gimple_call_set_lhs (repl, f.lhs);
>+
>+/* Handle size_t *new_vl by read_vl.  */
>+tree new_vl = gimple_call_arg (f.call, gimple_call_num_args (f.call) - 2);
>+if (integer_zerop (new_vl))
>+  {
>+ /* This case happens when user passes the nullptr to new_vl argument.
>+In this case, we just need to ignore the new_vl argument and return
>+vleff instruction directly. */
>+ return repl;
>+  }
>+
>+tree tmp_var = create_tmp_var (size_type_node, "new_vl");
>+tree decl = get_read_vl_decl ();
>+gimple *g = gimple_build_call (decl, 0);
>+gimple_call_set_lhs (g, tmp_var);
>+tree indirect
>+  = fold_build2 (MEM_REF, size_type_node,
>+  gimple_call_arg (f.call,
>+   gimple_call_num_args (f.call) - 2),
>+  build_int_cst (build_pointer_type (size_type_node), 0));
>+gassign *assign = gimple_build_assign (indirect, tmp_var);
>+
>+gsi_insert_after (f.gsi, assign, GSI_SAME_STMT);
>+gsi_insert_after (f.gsi, g, GSI_SAME_STMT);
>+return repl;
>+  }
>+

Re: [PATCH v2 0/5] A small Texinfo refinement

2023-03-08 Thread Arsen Arsenović via Gcc-patches

Sandra Loosemore  writes:

> On 3/8/23 02:11, Arsen Arsenović wrote:
>> Sandra Loosemore  writes:
>> 
>>> On 2/23/23 03:27, Arsen Arsenović via Gcc-patches wrote:
 I've rerendered the updated documentation with latest development
 Texinfo (as some of the changes I made for the purposes of the GCC
 manual still aren't in releases) at:
 https://www.aarsen.me/~arsen/final/
>>>
>>> Ummm.  I don't think GCC's documentation should depend on an unreleased 
>>> version
>>> of Texinfo.  Currently install.texi documents that version 4.7 or later is
>>> required, 4.8 for "make pdf"; did I miss something in your patch set that 
>>> bumps
>>> this requirement?  Exactly what features do you depend on that are not yet
>>> supported by an official Texinfo release?
>> This patch should still build with older Texinfo versions (albeit, I
>> hadn't tested 4.7, I missed that requirement).  The unreleased version
>> should be installed on the server building HTML documentation as it
>> produces better results w.r.t clickable anchors and index-in-table
>> handling.  It should not be a hard dependency, and should only degrade
>> to its current state should in-dev Texinfo be missing.
>
> Hmmm, OK.  We presently have Texinfo version 6.7 installed here, so I'll give
> that a try.  I'm not sure I'd be able to detect problems with incorrect HTML
> anchors or whatever, though.

As an example, let's take this link:
https://gcc.gnu.org/onlinedocs/gcc-12.2.0/gcc/Warning-Options.html#index-Wpedantic

This should place you below the item line this index entry refers to,
and there aren't any copiable anchors (see equivalent in my render for
an example of those), both of which were often named as annoyances with
the onlinedocs while the Sphinx experiment was taking place.

A similar thing happens in the standalone and Emacs info viewers (but
that's less noticeable there since the cursor is placed in the middle of
the screen when jumping to an index entry there).  Try, for instance,
'info gcc Wpedantic' (your cursor will be placed just below the item
line).

The fix for the first of these issues should already be applied by
Gerald (in the reordering commits, IIRC at least, save for one that I
created later because someone snuck in new "misplaced" indices), and
that fix should also fix up previous versions of Texinfo.

Even with this change, the copiable anchors will remain missing since
released Texinfo versions lack some AST transformations that enable
those.

Otherwise, manuals should work fine with older releases, unless I missed
something when refactoring @defbuiltin and removing @gols (which I do
believe are superfluous with current versions of texinfo.tex, which is
why I bumped that too).

To be clear, the anchor-related stuff is entirely specific to HTML
output.

> Most people building GCC from source probably use whatever versions of build
> dependencies are provided by their OS distribution.  In our group we need
> reproducible builds for long-term support so we maintain our own list of
> dependencies and normally update to the latest stable versions only once every
> few years unless there is a hard requirement to upgrade some particular tool
> meanwhile.  I personally do not know how the manuals for the GCC web site are
> built, but it seems kind of important to make sure that works as intended 
> since
> it's the main online resource for ordinary GCC users.

Yes, I can get behind this sentiment too.  I don't mean to impose a hard
dependency on the bleeding edge of Texinfo.  My target was indeed the
GCC website and ordinary users.

>> It might be worth bumping the minimum, 4.7 is a version from 2004; in
>> the meanwhile, I'll try a few older versions too.
>
> I agree that it's unlikely anyone is building current GCC with a Texinfo
> version as old as 4.7 any more, and it may be that the manual doesn't even
> build properly with such an old release due to existing unintentional
> dependencies on newer features, independently of your patch.  If we do update
> the version, there's a version check in configure.ac and some hack for
> "makeinfo 4.7 brokenness" in doc/install.texi2html that need to be changed, as
> well as install.texi.

FWIW, I (briefly) tested with Texinfo 6.0, and output seems okay.  On
5.0, I got a few warnings, but I think even 6.0 is apt considering its
age.  I haven't given it a proper scrutiny, though (workdays are busy
this time of year..).

> -Sandra

Thanks, have a lovely evening.
-- 
Arsen Arsenović

signature.asc
Description: PGP signature

Re: [PATCH] combine: Try harder to form zero_extends [PR106594]

2023-03-08 Thread Segher Boessenkool

On Wed, Mar 08, 2023 at 11:58:51AM +, Richard Sandiford wrote:
> Segher Boessenkool  writes:
> > An #ifdef is a way of making a change that is not finished yet not hurt
> > the other targets.  It still hurts generic development, which indirectly
> > hurts all targets.
> 
> Seems like this might be moot anyway given that your results
> suggest no impact on other targets.

Which means the patch does not do what it says it does.  It is a net
negative on the only target it did change code on, too.

If the patch did do what it promises it would be a (large!) net benefit,
and also on various other targets.

As it is, either the regression wasn't P1 at all, or the patch doesn't
fix the problem, or the problem only happens in unusual code (or vector
or float code).  Please explain what the regression is you want to
solve?  With a compilable testcase etc., the usual.

> >> Reorganising the
> >> expression in this way isn't something new.  extract_left_shift already
> >> does a similar thing (and does it for all targets).
> >
> > That is not similar at all, no.
> >
> > /* See if X (of mode MODE) contains an ASHIFT of COUNT or more bits that
> >can be commuted with any other operations in X.  Return X without
> >that shift if so.  */
> >
> > If you can factor out a utility function like that, with an actual nice
> > description like that, it would be a much more palatable patch.
> 
> OK, I've factored it out below.  Does the comment look OK?

> As mentioned in the patch description below, I think some of the other
> "and" handling could be moved here too, which should avoid a bit of
> (existing) duplication.  But that isn't necessary to fix the bug.

And stage 1 material.  Like I still think the current patch is as well.

> On the code size results: as I mentioned on IRC yesterday, when I tried
> building the linux kernel locally (with -Os -fno-schedule-insns{,2}),
> I saw code size improvements in 182 .os and a regression in only one .o.
> (I was comparing individual .os because it makes isolation easier.)

Nothing here is about code size.  It is just a good stand-in to compare
the result of a change in combine with: almost all changes in generated
code are because combine could combine more (or fewer) instructions.

This is good if you just want to look at a table of numbers.  It will
often show something is obviously not good, or obviously good, and it
shows what targets are of extra interest.

You still need to look at the actual generated code to confirm things.
For example, with your previous patch on aarch64 part of the code size
increase is extra tail duplication (in the bpf interpreter), not a bad
thing.  Unfortunately that was only a small part of the code size
increase.

> And the testcase in the PR was from a real workload (ArmRAL).

What testcase?  Oh the snippet in #c0?

This is not a good example if you want this to be P1, heh.

> If you can isolate the case you saw, I'll have a look.  But ISTM that
> the change is a pretty clear win on aarch64.

No, *you* have to show it is an improvement.  I have a day job as well,
you know.

> Combine's approach to simplifying a pattern X is to:
> 
> (1) expand "compound" operations (such as extends and extracts)
> into multiple operations (such as shifts), to give a new
> expression X1
> 
> (2) simplify X1 in the normal way, and with some combine-specific
> extras, to give X2
> 
> (3) remake compound operations from X2, to give X3

This is not a good description of what really goes on.  This is only
what is done for some compound operations in some cases.  And then it
is a tiny part of what is done.  And yes, it sucks, but disabling it
causes regressions.

> For example, (1) expands sign_extend to an ashift/ashiftrt pair,
> with the ashift being an immediate operand of the ashiftrt.

It more often converts it to a zero_extend here, for example (a real
one, or what expand_compound_operation comes up with).

> By design, (2) can perturb the structure of the expanded operations
> in X1.

Where X1 is just an abstraction you use here, not something that
actually exists in combine.  Okay.

> Sometimes it will rework them entirely.  But sometimes it
> will keep the outer operations and inner operations, but in a
> slightly different arrangement.  For example, the inner operation
> might no longer be a direct operand of the outer operation.

What does that mean?  It will always still be a single expression.

What is a "direct operand"?  A subexpression?  But it always *is* one.

> The PR contains another case where we need this.  We have:
> 
>   (set (reg:DI R2) (sign_extend:DI (reg:SI R1)))
>   ... (plus:DI (mult:DI (reg:DI R2) (const_int 4)) (reg:DI R3)) ...
> 
> which is a natural, direct comnbination on aarch64.
> 
> First, (1) expands the sign_extend into two operations.  It uses
> R1's nonzero_bits information to determine that the sign extension
> is equivalent to a zero extension:
> 
>   /* Convert sign extension to zero extension, if we kno

[PATCH] rs6000: Accept const pointer operands for MMA builtins [PR109073]

2023-03-08 Thread Peter Bergner via Gcc-patches

PR109073 shows a problem where GCC 11 and GCC 10 do not accept a const
__vector_pair pointer operand to some MMA builtins, which GCC 12 and later
correctly accept.  Fixed here by initializing the builtins to accept const
pointers.

This patch was tested in both GCC 11 and GCC 10 on powerpc64le-linux and
showed no regressions.  Ok for backports?

Peter


gcc/

PR target/109073
* config/rs6000/rs6000-call.c (mma_init_builtins): Accept const pointer
operands for lxvp, stxvp and disassemble builtins.

gcc/testsuite/

PR target/109073
* gcc.target/powerpc/mma-builtin-4.c): New const * test. Update
expected instruction counts.
* gcc.target/powerpc/mma-builtin-5.c: Likewise.
* gcc.target/powerpc/mma-builtin-7.c: Likewise.


diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 1be4797e834..3b6d40f0aef 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -14343,22 +14343,30 @@ mma_init_builtins (void)
{
  op[nopnds++] = build_pointer_type (void_type_node);
  if (d->code == MMA_BUILTIN_DISASSEMBLE_ACC)
-   op[nopnds++] = build_pointer_type (vector_quad_type_node);
+   op[nopnds++] = build_pointer_type (build_qualified_type
+(vector_quad_type_node,
+ TYPE_QUAL_CONST));
  else
-   op[nopnds++] = build_pointer_type (vector_pair_type_node);
+   op[nopnds++] = build_pointer_type (build_qualified_type
+(vector_pair_type_node,
+ TYPE_QUAL_CONST));
}
   else if (d->code == VSX_BUILTIN_LXVP)
{
  op[nopnds++] = vector_pair_type_node;
  op[nopnds++] = sizetype;
- op[nopnds++] = build_pointer_type (vector_pair_type_node);
+ op[nopnds++] = build_pointer_type (build_qualified_type
+  (vector_pair_type_node,
+   TYPE_QUAL_CONST));
}
   else if (d->code == VSX_BUILTIN_STXVP)
{
  op[nopnds++] = void_type_node;
  op[nopnds++] = vector_pair_type_node;
  op[nopnds++] = sizetype;
- op[nopnds++] = build_pointer_type (vector_pair_type_node);
+ op[nopnds++] = build_pointer_type (build_qualified_type
+  (vector_pair_type_node,
+   TYPE_QUAL_CONST));
}
   else
{
diff --git a/gcc/testsuite/gcc.target/powerpc/mma-builtin-4.c 
b/gcc/testsuite/gcc.target/powerpc/mma-builtin-4.c
index a9fb0107d12..0ba650fcee7 100644
--- a/gcc/testsuite/gcc.target/powerpc/mma-builtin-4.c
+++ b/gcc/testsuite/gcc.target/powerpc/mma-builtin-4.c
@@ -46,6 +46,15 @@ bar2 (vec_t *dst, __vector_pair *src)
   dst[4] = res[1];
 }
 
+void
+bar3 (vec_t *dst, const __vector_pair *src)
+{
+  vec_t res[2];
+  __builtin_vsx_disassemble_pair (res, src);
+  dst[0] = res[0];
+  dst[4] = res[1];
+}
+
 #if !__has_builtin (__builtin_vsx_assemble_pair)
 #  error "__has_builtin (__builtin_vsx_assemble_pair) failed"
 #endif
@@ -67,7 +76,7 @@ bar2 (vec_t *dst, __vector_pair *src)
 #endif
 
 /* { dg-final { scan-assembler-times {\mlxv\M} 6 } } */
-/* { dg-final { scan-assembler-times {\mlxvp\M} 2 } } */
-/* { dg-final { scan-assembler-times {\mstxv\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mlxvp\M} 3 } } */
+/* { dg-final { scan-assembler-times {\mstxv\M} 6 } } */
 /* { dg-final { scan-assembler-times {\mstxvp\M} 3 } } */
 
diff --git a/gcc/testsuite/gcc.target/powerpc/mma-builtin-5.c 
b/gcc/testsuite/gcc.target/powerpc/mma-builtin-5.c
index 00503b7343d..998c436a8bb 100644
--- a/gcc/testsuite/gcc.target/powerpc/mma-builtin-5.c
+++ b/gcc/testsuite/gcc.target/powerpc/mma-builtin-5.c
@@ -31,6 +31,17 @@ bar (vec_t *dst, __vector_quad *src)
   dst[12] = res[3];
 }
 
+void
+bar2 (vec_t *dst, const __vector_quad *src)
+{
+  vec_t res[4];
+  __builtin_mma_disassemble_acc (res, src);
+  dst[0] = res[0];
+  dst[4] = res[1];
+  dst[8] = res[2];
+  dst[12] = res[3];
+}
+
 #if !__has_builtin (__builtin_mma_assemble_acc)
 #  error "__has_builtin (__builtin_mma_assemble_acc) failed"
 #endif
@@ -40,8 +51,8 @@ bar (vec_t *dst, __vector_quad *src)
 #endif
 
 /* { dg-final { scan-assembler-times {\mlxv\M} 8 } } */
-/* { dg-final { scan-assembler-times {\mlxvp\M} 2 } } */
-/* { dg-final { scan-assembler-times {\mstxv\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mlxvp\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mstxv\M} 8 } } */
 /* { dg-final { scan-assembler-times {\mstxvp\M} 4 } } */
-/* { dg-final { scan-assembler-times {\mxxmfacc\M} 3 } } */
-/* { dg-final { scan-assembler-times {\mxxmtacc\M} 3 } } */
+/* { dg-final { scan-assembler-times {\mxxmfacc\M} 4 } } */
+/* { dg-final { scan-assembler-times {\m

Re: [PATCH v2 0/5] A small Texinfo refinement

2023-03-08 Thread Sandra Loosemore


On 3/8/23 14:22, Arsen Arsenović wrote:


Sandra Loosemore  writes:


On 3/8/23 02:11, Arsen Arsenović wrote:

Sandra Loosemore  writes:


On 2/23/23 03:27, Arsen Arsenović via Gcc-patches wrote:

I've rerendered the updated documentation with latest development
Texinfo (as some of the changes I made for the purposes of the GCC
manual still aren't in releases) at:
 https://www.aarsen.me/~arsen/final/


Ummm.  I don't think GCC's documentation should depend on an unreleased version
of Texinfo.  Currently install.texi documents that version 4.7 or later is
required, 4.8 for "make pdf"; did I miss something in your patch set that bumps
this requirement?  Exactly what features do you depend on that are not yet
supported by an official Texinfo release?

This patch should still build with older Texinfo versions (albeit, I
hadn't tested 4.7, I missed that requirement).  The unreleased version
should be installed on the server building HTML documentation as it
produces better results w.r.t clickable anchors and index-in-table
handling.  It should not be a hard dependency, and should only degrade
to its current state should in-dev Texinfo be missing.


Hmmm, OK.  We presently have Texinfo version 6.7 installed here, so I'll give
that a try.  I'm not sure I'd be able to detect problems with incorrect HTML
anchors or whatever, though.


As an example, let's take this link:
https://gcc.gnu.org/onlinedocs/gcc-12.2.0/gcc/Warning-Options.html#index-Wpedantic

This should place you below the item line this index entry refers to,
and there aren't any copiable anchors (see equivalent in my render for
an example of those), both of which were often named as annoyances with
the onlinedocs while the Sphinx experiment was taking place.

A similar thing happens in the standalone and Emacs info viewers (but
that's less noticeable there since the cursor is placed in the middle of
the screen when jumping to an index entry there).  Try, for instance,
'info gcc Wpedantic' (your cursor will be placed just below the item
line).

The fix for the first of these issues should already be applied by
Gerald (in the reordering commits, IIRC at least, save for one that I
created later because someone snuck in new "misplaced" indices), and
that fix should also fix up previous versions of Texinfo.

Even with this change, the copiable anchors will remain missing since
released Texinfo versions lack some AST transformations that enable
those.


OK, I can see the difference there between the current online docs, the 
set you produced with the unreleased Texinfo support, and what I got 
building with Texinfo 6.7.



Otherwise, manuals should work fine with older releases, unless I missed
something when refactoring @defbuiltin and removing @gols (which I do
believe are superfluous with current versions of texinfo.tex, which is
why I bumped that too).


I did a few spot-checks here and there of those changes.  I saw a couple 
of line break problems but they turn out to be due to existing errors in 
the .texi files that were not introduced by your (mostly mechanical) 
changes.



Most people building GCC from source probably use whatever versions of build
dependencies are provided by their OS distribution.  In our group we need
reproducible builds for long-term support so we maintain our own list of
dependencies and normally update to the latest stable versions only once every
few years unless there is a hard requirement to upgrade some particular tool
meanwhile.  I personally do not know how the manuals for the GCC web site are
built, but it seems kind of important to make sure that works as intended since
it's the main online resource for ordinary GCC users.


Yes, I can get behind this sentiment too.  I don't mean to impose a hard
dependency on the bleeding edge of Texinfo.  My target was indeed the
GCC website and ordinary users.


It might be worth bumping the minimum, 4.7 is a version from 2004; in
the meanwhile, I'll try a few older versions too.


I agree that it's unlikely anyone is building current GCC with a Texinfo
version as old as 4.7 any more, and it may be that the manual doesn't even
build properly with such an old release due to existing unintentional
dependencies on newer features, independently of your patch.  If we do update
the version, there's a version check in configure.ac and some hack for
"makeinfo 4.7 brokenness" in doc/install.texi2html that need to be changed, as
well as install.texi.


FWIW, I (briefly) tested with Texinfo 6.0, and output seems okay.  On
5.0, I got a few warnings, but I think even 6.0 is apt considering its
age.  I haven't given it a proper scrutiny, though (workdays are busy
this time of year..).


Texinfo 6.0 was released in 2015, 5.0 in 2013.  FWIW, Trusty Tahr (the 
current oldest Ubuntu LTS release) has 5.2.  4.7 was released in 2004, I 
don't know why anyone would still be trying to use that unless it's 
needed for building legacy code from the same era.


I think we could do away with the requi

Re: [PATCH v2 0/5] A small Texinfo refinement

2023-03-08 Thread Andrew Pinski via Gcc-patches

On Wed, Mar 8, 2023 at 5:09 PM Sandra Loosemore  wrote:
>
> On 3/8/23 14:22, Arsen Arsenović wrote:
> >
> > Sandra Loosemore  writes:
> >
> >> On 3/8/23 02:11, Arsen Arsenović wrote:
> >>> Sandra Loosemore  writes:
> >>>
>  On 2/23/23 03:27, Arsen Arsenović via Gcc-patches wrote:
> > I've rerendered the updated documentation with latest development
> > Texinfo (as some of the changes I made for the purposes of the GCC
> > manual still aren't in releases) at:
> >  https://www.aarsen.me/~arsen/final/
> 
>  Ummm.  I don't think GCC's documentation should depend on an unreleased 
>  version
>  of Texinfo.  Currently install.texi documents that version 4.7 or later 
>  is
>  required, 4.8 for "make pdf"; did I miss something in your patch set 
>  that bumps
>  this requirement?  Exactly what features do you depend on that are not 
>  yet
>  supported by an official Texinfo release?
> >>> This patch should still build with older Texinfo versions (albeit, I
> >>> hadn't tested 4.7, I missed that requirement).  The unreleased version
> >>> should be installed on the server building HTML documentation as it
> >>> produces better results w.r.t clickable anchors and index-in-table
> >>> handling.  It should not be a hard dependency, and should only degrade
> >>> to its current state should in-dev Texinfo be missing.
> >>
> >> Hmmm, OK.  We presently have Texinfo version 6.7 installed here, so I'll 
> >> give
> >> that a try.  I'm not sure I'd be able to detect problems with incorrect 
> >> HTML
> >> anchors or whatever, though.
> >
> > As an example, let's take this link:
> > https://gcc.gnu.org/onlinedocs/gcc-12.2.0/gcc/Warning-Options.html#index-Wpedantic
> >
> > This should place you below the item line this index entry refers to,
> > and there aren't any copiable anchors (see equivalent in my render for
> > an example of those), both of which were often named as annoyances with
> > the onlinedocs while the Sphinx experiment was taking place.
> >
> > A similar thing happens in the standalone and Emacs info viewers (but
> > that's less noticeable there since the cursor is placed in the middle of
> > the screen when jumping to an index entry there).  Try, for instance,
> > 'info gcc Wpedantic' (your cursor will be placed just below the item
> > line).
> >
> > The fix for the first of these issues should already be applied by
> > Gerald (in the reordering commits, IIRC at least, save for one that I
> > created later because someone snuck in new "misplaced" indices), and
> > that fix should also fix up previous versions of Texinfo.
> >
> > Even with this change, the copiable anchors will remain missing since
> > released Texinfo versions lack some AST transformations that enable
> > those.
>
> OK, I can see the difference there between the current online docs, the
> set you produced with the unreleased Texinfo support, and what I got
> building with Texinfo 6.7.
>
> > Otherwise, manuals should work fine with older releases, unless I missed
> > something when refactoring @defbuiltin and removing @gols (which I do
> > believe are superfluous with current versions of texinfo.tex, which is
> > why I bumped that too).
>
> I did a few spot-checks here and there of those changes.  I saw a couple
> of line break problems but they turn out to be due to existing errors in
> the .texi files that were not introduced by your (mostly mechanical)
> changes.
>
> >> Most people building GCC from source probably use whatever versions of 
> >> build
> >> dependencies are provided by their OS distribution.  In our group we need
> >> reproducible builds for long-term support so we maintain our own list of
> >> dependencies and normally update to the latest stable versions only once 
> >> every
> >> few years unless there is a hard requirement to upgrade some particular 
> >> tool
> >> meanwhile.  I personally do not know how the manuals for the GCC web site 
> >> are
> >> built, but it seems kind of important to make sure that works as intended 
> >> since
> >> it's the main online resource for ordinary GCC users.
> >
> > Yes, I can get behind this sentiment too.  I don't mean to impose a hard
> > dependency on the bleeding edge of Texinfo.  My target was indeed the
> > GCC website and ordinary users.
> >
> >>> It might be worth bumping the minimum, 4.7 is a version from 2004; in
> >>> the meanwhile, I'll try a few older versions too.
> >>
> >> I agree that it's unlikely anyone is building current GCC with a Texinfo
> >> version as old as 4.7 any more, and it may be that the manual doesn't even
> >> build properly with such an old release due to existing unintentional
> >> dependencies on newer features, independently of your patch.  If we do 
> >> update
> >> the version, there's a version check in configure.ac and some hack for
> >> "makeinfo 4.7 brokenness" in doc/install.texi2html that need to be 
> >> changed, as
> >> well as install.texi.
> >
> > FWIW, I (briefly) teste

[gcc12 backport] i386: Call get_available_features for all CPUs with max_level >= 1 [PR100758]

2023-03-08 Thread mayshao

From: mayshao-oc 

Hi Jakub:
  This is backport of the fix for PR target/100758 from mainline to the gcc12 
release branch. Because the bug still exists in gcc12 on Zhaoxin platform, and 
it will incur ISA feature detection failure, we want to fix it as the 
mainline.This patch has been retested against the gcc12 branch on Intel, Amd, 
Zhaoxin with make bootstrap and make -k check without failure. Ok for the gcc12 
branch?

BR
Mayshao

gcc/ChangeLog:
PR target/100758
* common/config/i386/cpuinfo.h (cpu_indicator_init): Call 
get_available_features
for all CPUs with max_level >= 1, rather than just Intel, AMD.
---
 gcc/common/config/i386/cpuinfo.h | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
index 388f4798406..0333da56ba5 100644
--- a/gcc/common/config/i386/cpuinfo.h
+++ b/gcc/common/config/i386/cpuinfo.h
@@ -910,6 +910,10 @@ cpu_indicator_init (struct __processor_model *cpu_model,
   extended_model = (eax >> 12) & 0xf0;
   extended_family = (eax >> 20) & 0xff;
 
+  /* Find available features. */
+  get_available_features (cpu_model, cpu_model2, cpu_features2,
+ ecx, edx);
+
   if (vendor == signature_INTEL_ebx)
 {
   /* Adjust model and family for Intel CPUS. */
@@ -924,9 +928,6 @@ cpu_indicator_init (struct __processor_model *cpu_model,
   cpu_model2->__cpu_family = family;
   cpu_model2->__cpu_model = model;
 
-  /* Find available features. */
-  get_available_features (cpu_model, cpu_model2, cpu_features2,
- ecx, edx);
   /* Get CPU type.  */
   get_intel_cpu (cpu_model, cpu_model2, cpu_features2);
   cpu_model->__cpu_vendor = VENDOR_INTEL;
@@ -943,9 +944,6 @@ cpu_indicator_init (struct __processor_model *cpu_model,
   cpu_model2->__cpu_family = family;
   cpu_model2->__cpu_model = model;
 
-  /* Find available features. */
-  get_available_features (cpu_model, cpu_model2, cpu_features2,
- ecx, edx);
   /* Get CPU type.  */
   get_amd_cpu (cpu_model, cpu_model2, cpu_features2);
   cpu_model->__cpu_vendor = VENDOR_AMD;
-- 
2.17.1

[gcc11 backport] i386: Call get_available_features for all CPUs with max_level >= 1 [PR100758]

2023-03-08 Thread mayshao

Hi Jakub:
  This is backport of the fix for PR target/100758 from mainline to the gcc11 
release branch. Because the bug still exists in gcc11 on Zhaoxin platform, and 
it will incur ISA feature detection failure, we want to fix it as the 
mainline.This patch has been retested against the gcc11 branch on 
Intel,Amd,Zhaoxin with make bootstrap and make -k check without failure. Ok for 
the gcc11 branch?

BR
Mayshao

gcc/ChangeLog:
PR target/100758
* common/config/i386/cpuinfo.h (cpu_indicator_init): Call 
get_available_features
for all CPUs with max_level >= 1, rather than just Intel, AMD.
---
 gcc/common/config/i386/cpuinfo.h | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
index 18ff71ac5ad..00b5eee21d4 100644
--- a/gcc/common/config/i386/cpuinfo.h
+++ b/gcc/common/config/i386/cpuinfo.h
@@ -882,6 +882,10 @@ cpu_indicator_init (struct __processor_model *cpu_model,
   extended_model = (eax >> 12) & 0xf0;
   extended_family = (eax >> 20) & 0xff;
 
+  /* Find available features. */
+  get_available_features (cpu_model, cpu_model2, cpu_features2,
+ ecx, edx);
+
   if (vendor == signature_INTEL_ebx)
 {
   /* Adjust model and family for Intel CPUS. */
@@ -896,9 +900,6 @@ cpu_indicator_init (struct __processor_model *cpu_model,
   cpu_model2->__cpu_family = family;
   cpu_model2->__cpu_model = model;
 
-  /* Find available features. */
-  get_available_features (cpu_model, cpu_model2, cpu_features2,
- ecx, edx);
   /* Get CPU type.  */
   get_intel_cpu (cpu_model, cpu_model2, cpu_features2);
   cpu_model->__cpu_vendor = VENDOR_INTEL;
@@ -915,9 +916,6 @@ cpu_indicator_init (struct __processor_model *cpu_model,
   cpu_model2->__cpu_family = family;
   cpu_model2->__cpu_model = model;
 
-  /* Find available features. */
-  get_available_features (cpu_model, cpu_model2, cpu_features2,
- ecx, edx);
   /* Get CPU type.  */
   get_amd_cpu (cpu_model, cpu_model2, cpu_features2);
   cpu_model->__cpu_vendor = VENDOR_AMD;
-- 
2.17.1

[gcc10 backport] i386: Call get_available_features for all CPUs with max_level >= 1 [PR100758]

2023-03-08 Thread mayshao

From: mayshao-oc 

Hi Jakub:
  This is backport of the fix for PR target/100758 from mainline to the gcc10 
release branch. Because the bug still exists in gcc10 on Zhaoxin platform, and 
it will incur ISA feature detection failure, we want to fix it as the 
mainline.This patch has been retested against the gcc10 branch on 
Intel,Amd,Zhaoxin with make bootstrap and make -k check without failure. Ok for 
the gcc10 branch?

BR
Mayshao

libgcc/ChangeLog:
PR target/100758
* config/i386/cpuinfo.c (__cpu_indicator_init): Call 
get_available_features
for all CPUs with max_level >= 1, rather than just Intel, AMD.
---
 libgcc/config/i386/cpuinfo.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/libgcc/config/i386/cpuinfo.c b/libgcc/config/i386/cpuinfo.c
index 83301a1445f..ad250edbcb7 100644
--- a/libgcc/config/i386/cpuinfo.c
+++ b/libgcc/config/i386/cpuinfo.c
@@ -474,6 +474,9 @@ __cpu_indicator_init (void)
   extended_model = (eax >> 12) & 0xf0;
   extended_family = (eax >> 20) & 0xff;
 
+  /* Find available features. */
+  get_available_features (ecx, edx, max_level);
+
   if (vendor == signature_INTEL_ebx)
 {
   /* Adjust model and family for Intel CPUS. */
@@ -487,8 +490,6 @@ __cpu_indicator_init (void)
 
   /* Get CPU type.  */
   get_intel_cpu (family, model, brand_id);
-  /* Find available features. */
-  get_available_features (ecx, edx, max_level);
   __cpu_model.__cpu_vendor = VENDOR_INTEL;
 }
   else if (vendor == signature_AMD_ebx)
@@ -502,8 +503,6 @@ __cpu_indicator_init (void)
 
   /* Get CPU type.  */
   get_amd_cpu (family, model);
-  /* Find available features. */
-  get_available_features (ecx, edx, max_level);
   __cpu_model.__cpu_vendor = VENDOR_AMD;
 }
   else
-- 
2.17.1

Re: [PATCH v2 1/5] docs: Create Indices appendix

2023-03-08 Thread Sandra Loosemore


On 2/23/23 03:27, Arsen Arsenović via Gcc-patches wrote:

The GCC manual has multiple indices.  By creating an appendix which
lists them, we help makeinfo present a more accessible way for the
reader to see all the indices.

gcc/ChangeLog:

* doc/gcc.texi: Add the Indices appendix, to make texinfo
generate nice indices overview page.
(@copying): Move "This file documents the use of the GNU
compilers" into @copying.  Add quotations around cover texts.



I guess this patch is OK and is necessary to smooth over some misfeatures
in newer versions of Texinfo.  In particular, comparing your sample output
https://www.aarsen.me/~arsen/final/gcc.html/index.html

to my own fresh Texinfo 6.7-generated version with your patches applied, 
and the existing online documention like


https://gcc.gnu.org/onlinedocs/gcc-12.2.0/gcc/index.html

the order of the "Short Table of Contents" and longer "Table of 
Contents" have been switched, so that in the new version you have to 
scroll all the way down to the bottom of the page (ugh) to click on 
"Option Index".  (Frankly, this seems like a misfeature; the point of 
having a "Short Table of Contents" is *not* to have to page through the 
long one to find a particular chapter.)


I guess that is a Texinfo change?  gcc.texi still has:

@summarycontents
@contents

in that order.

OTOH, I see that in your new version there is now a line with links 
[Contents][Index] before the Introduction.  If adding this new appendix 
makes the [Index] link point at the indices, I think it is OK, although 
I'm still worried that the overall effect (even without the new version 
of Texinfo) is making the indices harder to find.


I wonder, could we add something to the Introduction text like

Tip: This manual is very long.  If you're looking for something in 
particular, try searching the @ref{Option Index} or @ref{Concept and 
Symbol Index}.


???

-Sandra

[pushed] wwwdocs: gcc-13: Spell front end (noun) without dash

2023-03-08 Thread Gerald Pfeifer

Pushed in alignment with the list in our coding conventions.

Gerald
---
 htdocs/gcc-13/porting_to.html | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/htdocs/gcc-13/porting_to.html b/htdocs/gcc-13/porting_to.html
index 733bb254..170da096 100644
--- a/htdocs/gcc-13/porting_to.html
+++ b/htdocs/gcc-13/porting_to.html
@@ -121,8 +121,9 @@ the operand as an lvalue.
 
 
 Excess precision changes
+
 GCC 13 implements in C++ excess precision 
support
-which has been before implemented just in the C front-end.  The new behavior is
+which has been before implemented just in the C front end.  The new behavior is
 enabled by default in -std=c++NN modes and e.g. when
 FLT_EVAL_METHOD is 1 or 2 affects behavior of floating point
 constants and expressions.  E.g. for FLT_EVAL_METHOD equal
-- 
2.39.1

Re: [wwwdocs] gcc-13/porting_to.html: Document C++ -fexcess-precision=standard

2023-03-08 Thread Gerald Pfeifer

On Thu, 2 Mar 2023, Jakub Jelinek wrote:
> --- a/htdocs/gcc-13/porting_to.html
> +++ b/htdocs/gcc-13/porting_to.html

> +GCC 13 implements in C++ excess precision 
> support
> +which has been implemented just in the C front-end before.  The new behavior 
> is
> +enabled by default in -std=c++NN modes and when
> +FLT_EVAL_METHOD is 1 or 2 affects behavior of floating point
   ^^
> +constants and expressions.  E.g. for FLT_EVAL_METHOD equal
> +to 2 on ia32:

:

> +will not abort with standard excess precision, because constants and 
> expressions
> +in float or double are evaluated in precision of
> +long double and demoted only on casts or assignments, but will
> +abort with fast excess precision, where whether something is evaluated in
> +precision of long double or not depends on what evaluations are
> +done in the i387 floating point stack or are spilled from it.
> +
> +The -fexcess-precision=fast option can be used to request the
> +previous behavior.

I struggled a bit understanding this and so have come up with what I 
hope is simpler (without changing the meaning).

What do you think of the change below?

Gerald


diff --git a/htdocs/gcc-13/porting_to.html b/htdocs/gcc-13/porting_to.html
index 170da096..8a2822ff 100644
--- a/htdocs/gcc-13/porting_to.html
+++ b/htdocs/gcc-13/porting_to.html
@@ -122,12 +122,14 @@ the operand as an lvalue.
 
 Excess precision changes
 
-GCC 13 implements in C++ excess precision 
support
-which has been before implemented just in the C front end.  The new behavior is
-enabled by default in -std=c++NN modes and e.g. when
-FLT_EVAL_METHOD is 1 or 2 affects behavior of floating point
-constants and expressions.  E.g. for FLT_EVAL_METHOD equal
-to 2 on ia32:
+GCC 13 implements excess precision
+support, which was implemented just in the C front end
+before, in C++. The new behavior is enabled by default in
+-std=c++NN modes and when
+FLT_EVAL_METHOD is 1 or 2 and affects the behavior of
+floating point constants and expressions.
+
+E.g. for FLT_EVAL_METHOD equal to 2 on ia32
 
 
 #include 
@@ -139,11 +141,11 @@ will not abort with standard excess precision, because 
constants and expressions
 in float or double are evaluated in precision of
 long double and demoted only on casts or assignments, but will
 abort with fast excess precision, where whether something is evaluated in
-precision of long double or not depends on what evaluations are
-done in the i387 floating point stack or are spilled from it.
+long double precision depends on what evaluations are
+done in the i387 floating point stack or are spilled from it.
 
-The -fexcess-precision=fast option can be used to request the
-previous behavior.
+The -fexcess-precision=fast option can be used to
+request the previous behavior.
 
 allocator_traits::rebind_alloc 
must be A

[PATCH v2] vect: Check that vector factor is a compile-time constant

2023-03-08 Thread Michael Collison

2023-03-05  Michael Collison  

* tree-vect-loop-manip.cc (vect_do_peeling): Use
result of constant_lower_bound instead of vf in case
vf is not a compile time constant.
---
 gcc/tree-vect-loop-manip.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index d88edafa018..f60fa50e8f4 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -2921,7 +2921,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, 
tree nitersm1,
   if (new_var_p)
{
  value_range vr (type,
- wi::to_wide (build_int_cst (type, vf)),
+ wi::to_wide (build_int_cst (type, lowest_vf)),
  wi::to_wide (TYPE_MAX_VALUE (type)));
  set_range_info (niters, vr);
}
-- 
2.34.1

62 matches

Mail list logo