[Bug target/111591] ppc64be: miscompilation with -mstrict-align / -O3

2023-10-19 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111591

--- Comment #24 from Kewen Lin  ---
(In reply to Richard Biener from comment #22)
> I see the mems properly get their base adjusted:
> 
> (insn 384 383 0 (set (mem/c:V2DI (plus:DI (reg/f:DI 112 virtual-stack-vars)
> (const_int 16 [0x10])) [7 MEM[(struct Vec128D.30433 *)_10]+0
> S16 A128])
> (reg:V2DI 616)) -1
>  (nil))
> 
> vs.
> 
> (insn 389 388 390 (set (reg:HI 619)
> (mem/c:HI (plus:DI (reg/f:DI 112 virtual-stack-vars)
> (const_int 16 [0x10])) [4 MEM[(struct Vec128D.30212
> *)_10].rawD.30221[0]+0 S2 A128])) "test.cc":218:14 -1
>  (nil))
> 
> both are based off a fake _10.  But we get alias sets 7 and 4 used here
> which might be a problem.
> 
> See update_alias_info_with_stack_vars and uses of decls_to_pointers,
> in particular from set_mem_attributes_minus_bitpos where we preserve
> TBAA info with the rewrite.  I'm not sure why that should be OK ...
> (but I'm sure I must have thought of this problem back in time)
> 
> Does the following fix the testcase?
> 
> diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc
> index 84b6833225e..81c0a63eddc 100644
> --- a/gcc/emit-rtl.cc
> +++ b/gcc/emit-rtl.cc
> @@ -2128,7 +2128,7 @@ set_mem_attributes_minus_bitpos (rtx ref, tree t, int
> objectp,
>   tree *orig_base = 
>   while (handled_component_p (*orig_base))
> orig_base = _OPERAND (*orig_base, 0);
> - tree aptrt = reference_alias_ptr_type (*orig_base);
> + tree aptrt = ptr_type_node;
>   *orig_base = build2 (MEM_REF, TREE_TYPE (*orig_base), *namep,
>build_int_cst (aptrt, 0));
> }

Sorry, this doesn't help.

I noticed that it makes insns 384 and 389 become to:

(insn 384 383 0 (set (mem/c:V2DI (plus:DI (reg/f:DI 112 virtual-stack-vars)
(const_int 16 [0x10])) [7 MEM  [(voidD.48
*)_10]+0 S16 A128])
(reg:V2DI 616)) -1
 (nil))

(insn 389 388 390 (set (reg:HI 619)
(mem/c:HI (plus:DI (reg/f:DI 112 virtual-stack-vars)
(const_int 16 [0x10])) [4 MEM  [(voidD.48
*)_10].rawD.30221[0]+0 S2 A128])) "test.cc":218:14 -1
 (nil))

alias sets are not changed. Aggressively further hacking with attrs.alias = 0
can make it pass. Can we make an new alias set for each partition? then all
involved decls in the same partition is aliased. For a particular involved
decl, it's aliased to the previous ones and the new ones in its own partitions.

[PATCH v4] Introduce hardbool attribute for C

2023-10-19 Thread Alexandre Oliva
Here's a refreshed and retested version of the patch for hardened
booleans in C.  It is unchanged aside from some conflict resolution,
compared with the previous version posted back in June.
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622668.html

Regstrapped on x86_64-linux-gnu and ppc64le-linux-gnu.  Ok to install?



This patch introduces hardened booleans in C.  The hardbool attribute,
when attached to an integral type, turns it into an enumerate type
with boolean semantics, using the named or implied constants as
representations for false and true.

Expressions of such types decay to _Bool, trapping if the value is
neither true nor false, and _Bool can convert implicitly back to them.
Other conversions go through _Bool first.


for  gcc/c-family/ChangeLog

* c-attribs.cc (c_common_attribute_table): Add hardbool.
(handle_hardbool_attribute): New.
(type_valid_for_vector_size): Reject hardbool.
* c-common.cc (convert_and_check): Skip warnings for convert
and check for hardbool.
(c_hardbool_type_attr_1): New.
* c-common.h (c_hardbool_type_attr): New.

for  gcc/c/ChangeLog

* c-typeck.cc (convert_lvalue_to_rvalue): Decay hardbools.
* c-convert.cc (convert): Convert to hardbool through
truthvalue.
* c-decl.cc (check_bitfield_type_and_width): Skip enumeral
truncation warnings for hardbool.
(finish_struct): Propagate hardbool attribute to bitfield
types.
(digest_init): Convert to hardbool.

for  gcc/ChangeLog

* doc/extend.texi (hardbool): New type attribute.
* doc/invoke.texi (-ftrivial-auto-var-init): Document
representation vs values.

for  gcc/testsuite/ChangeLog

* gcc.dg/hardbool-err.c: New.
* gcc.dg/hardbool-trap.c: New.
* gcc.dg/hardbool.c: New.
* gcc.dg/hardbool-s.c: New.
* gcc.dg/hardbool-us.c: New.
* gcc.dg/hardbool-i.c: New.
* gcc.dg/hardbool-ul.c: New.
* gcc.dg/hardbool-ll.c: New.
* gcc.dg/hardbool-5a.c: New.
* gcc.dg/hardbool-s-5a.c: New.
* gcc.dg/hardbool-us-5a.c: New.
* gcc.dg/hardbool-i-5a.c: New.
* gcc.dg/hardbool-ul-5a.c: New.
* gcc.dg/hardbool-ll-5a.c: New.
---
 gcc/c-family/c-attribs.cc |   97 -
 gcc/c-family/c-common.cc  |   21 
 gcc/c-family/c-common.h   |   18 
 gcc/c/c-convert.cc|   14 +++
 gcc/c/c-decl.cc   |   10 ++
 gcc/c/c-typeck.cc |   31 ++-
 gcc/doc/extend.texi   |   65 ++
 gcc/doc/invoke.texi   |   21 
 gcc/testsuite/gcc.dg/hardbool-err.c   |   31 +++
 gcc/testsuite/gcc.dg/hardbool-trap.c  |   13 +++
 gcc/testsuite/gcc.dg/torture/hardbool-5a.c|6 +
 gcc/testsuite/gcc.dg/torture/hardbool-i-5a.c  |6 +
 gcc/testsuite/gcc.dg/torture/hardbool-i.c |5 +
 gcc/testsuite/gcc.dg/torture/hardbool-ll-5a.c |6 +
 gcc/testsuite/gcc.dg/torture/hardbool-ll.c|5 +
 gcc/testsuite/gcc.dg/torture/hardbool-s-5a.c  |6 +
 gcc/testsuite/gcc.dg/torture/hardbool-s.c |5 +
 gcc/testsuite/gcc.dg/torture/hardbool-ul-5a.c |6 +
 gcc/testsuite/gcc.dg/torture/hardbool-ul.c|5 +
 gcc/testsuite/gcc.dg/torture/hardbool-us-5a.c |6 +
 gcc/testsuite/gcc.dg/torture/hardbool-us.c|5 +
 gcc/testsuite/gcc.dg/torture/hardbool.c   |  118 +
 22 files changed, 497 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/hardbool-err.c
 create mode 100644 gcc/testsuite/gcc.dg/hardbool-trap.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-5a.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-i-5a.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-i.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-ll-5a.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-ll.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-s-5a.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-s.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-ul-5a.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-ul.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-us-5a.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-us.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool.c

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index abf44d5426e82..232365d46e237 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -176,6 +176,7 @@ static tree handle_objc_root_class_attribute (tree *, tree, 
tree, int, bool *);
 static tree handle_objc_nullability_attribute (tree *, tree, tree, int, bool 
*);
 static tree handle_signed_bool_precision_attribute (tree *, tree, tree, int,
 

[Bug tree-optimization/50856] ARM: suboptimal code for absolute difference calculation

2023-10-19 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50856

--- Comment #4 from Andrew Pinski  ---
Here is a full testcase (f3 is caught via fold_cond_expr_with_comparison):
```
int f(int a, int b)
{
  int t = a - b;
  if (t > 0) return t;
  return b - a;
}
int f1(int a, int b)
{
  if (a > b)  return a - b;
  return b - a;
}

int f2(int a, int b)
{
 return (a > b) ? a - b : b - a;
}

int f3(int a, int b)
{
 return (a - b) > 0 ? a - b : b - a;
}
```

I should note we currently have the following note in match.pd about not
folding `(a - b) > 0` because of catching f3:
/* Transform comparisons of the form X - Y CMP 0 to X CMP Y.
   ??? The transformation is valid for the other operators if overflow
   is undefined for the type, but performing it here badly interacts
   with the transformation in fold_cond_expr_with_comparison which
   attempts to synthetize ABS_EXPR.  */

Re: [PATCH] move the (a-b) CMP 0 ? (a-b) : (b-a) optimization from fold_cond_expr_with_comparison to match

2023-10-19 Thread Andrew Pinski
On Mon, Jul 12, 2021 at 4:47 AM Richard Biener via Gcc-patches
 wrote:
>
> On Sun, Jul 11, 2021 at 4:12 AM apinski--- via Gcc-patches
>  wrote:
> >
> > From: Andrew Pinski 
> >
> > This patch moves the (a-b) CMP 0 ? (a-b) : (b-a) optimization
> > from fold_cond_expr_with_comparison to match.
>
> So I searched and I guess these transforms are produced from
>
>   /* If we have A op 0 ? A : -A, consider applying the following
>  transformations:
>
>  A == 0? A : -Asame as -A
>  A != 0? A : -Asame as A
>  A >= 0? A : -Asame as abs (A)
>  A > 0?  A : -Asame as abs (A)
>  A <= 0? A : -Asame as -abs (A)
>  A < 0?  A : -Asame as -abs (A)
>
>  None of these transformations work for modes with signed
>  zeros.  If A is +/-0, the first two transformations will
>  change the sign of the result (from +0 to -0, or vice
>  versa).  The last four will fix the sign of the result,
>  even though the original expressions could be positive or
>  negative, depending on the sign of A.
>
>  Note that all these transformations are correct if A is
>  NaN, since the two alternatives (A and -A) are also NaNs.  */
>   if (!HONOR_SIGNED_ZEROS (type)
>   && (FLOAT_TYPE_P (TREE_TYPE (arg01))
>   ? real_zerop (arg01)
>   : integer_zerop (arg01))
>   && ((TREE_CODE (arg2) == NEGATE_EXPR
>&& operand_equal_p (TREE_OPERAND (arg2, 0), arg1, 0))
>  /* In the case that A is of the form X-Y, '-A' (arg2) may
> have already been folded to Y-X, check for that. */
>   || (TREE_CODE (arg1) == MINUS_EXPR
>   && TREE_CODE (arg2) == MINUS_EXPR
>   && operand_equal_p (TREE_OPERAND (arg1, 0),
>   TREE_OPERAND (arg2, 1), 0)
>   && operand_equal_p (TREE_OPERAND (arg1, 1),
>   TREE_OPERAND (arg2, 0), 0
> ...
>
> I wonder at which point we can remove the code from fold-const.c?

I have to double check if after an updated patch, if that code does
anything that match does not do.
I will do that before I submit an updated patch.

>
> Some comments inline below.
>
> > OK? Bootstrapped and tested on x86_64-linux-gnu.
> >
> > gcc/ChangeLog:
> >
> > * match.pd ((A-B) CMP 0 ? (A-B) : (B - A)):
> > New patterns.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/tree-ssa/phi-opt-25.c: New test.
> > ---
> >  gcc/match.pd   | 48 --
> >  gcc/testsuite/gcc.dg/tree-ssa/phi-opt-25.c | 45 
> >  2 files changed, 90 insertions(+), 3 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-25.c
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 30680d488ab..aa88381fdcb 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -4040,9 +4040,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >(cnd (logical_inverted_value truth_valued_p@0) @1 @2)
> >(cnd @0 @2 @1)))
> >
> > -/* abs/negative simplifications moved from fold_cond_expr_with_comparison,
> > -   Need to handle (A - B) case as fold_cond_expr_with_comparison does.
> > -   Need to handle UN* comparisons.
> > +/* abs/negative simplifications moved from fold_cond_expr_with_comparison.
> >
> > None of these transformations work for modes with signed
> > zeros.  If A is +/-0, the first two transformations will
> > @@ -4098,6 +4096,50 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > (convert (negate (absu:utype @0
> > (negate (abs @0)
> >   )
> > +
> > + /* (A - B) == 0 ? (A - B) : (B - A)same as (B - A) */
> > + (for cmp (eq uneq)
> > +  (simplify
> > +   (cnd (cmp (minus@0 @1 @2) zerop) @0 (minus@3 @2 @1))
> > +(if (!HONOR_SIGNED_ZEROS (type))
> > + @3))
> > +  (simplify
> > +   (cnd (cmp (minus@0 @1 @2) zerop) integer_zerop (minus@3 @2 @1))
>
> So that makes me think why integer_zerop?  'type' should then be
> integer and thus never HONOR_SIGNED_ZEROS.

yes that should be done.

>
> Don't we also need the inverted condition case for completeness?

Yes we should. Though for phiopt we don't.


>
> > +(if (!HONOR_SIGNED_ZEROS (type))
> > + @3))
> > +  (simplify
> > +   (cnd (cmp @1 @2) integer_zerop (minus@3 @2 @1))
>
> I think this needs to be (cmp:c @1 @2)

This is now actually handled already by r14-3606-g3d86e7f4a8ae so I removed it.
I will submit a new patch in the next few days for this too.

Thanks,
Andrew Pinski

>
> > +(if (!HONOR_SIGNED_ZEROS (type))
> > + @3))
> > + )
> > + /* (A - B) != 0 ? (A - B) : (B - A)same as (A - B) */
> > + (for cmp (ne ltgt)
> > +  (simplify
> > +   (cnd (cmp (minus@0 @1 @2) zerop) @0 (minus @2 @1))
> > +(if (!HONOR_SIGNED_ZEROS (type))
> > + @0))
> > + )
> > + /* (A - B) >=/> 0 ? (A - B) : (B - A)same as abs (A - B) */
> > + (for cmp (ge gt)
> > +  (simplify
> > +   (cnd (cmp (minus@0 @1 @2) zerop) @0 (minus @2 @1))
> > +(if 

Re: [PATCH v3 1/2] c++: Initial support for P0847R7 (Deducing This) [PR102609]

2023-10-19 Thread waffl3x
> > 
> > Based on what you've said, I assume that OFFSET_REF handles static
> > member functions that are overloaded. But as I've said this seems to
> > contradict the comments I'm reading, so I'm not sure that I'm
> > understanding you correctly.
> 
> 
> That's right. For instance,
> 
> struct A {
> static void g();
> static void g(int);
> };
> 
> void (*p)(int) = ::g; // cp_build_addr_expr_1 gets an OFFSET_REF
> 
> > > > I think we need the OFFSET_REF for an explicit-object member function
> > > > because it expresses that the code satisfies the requirement "If the
> > > > operand names an explicit object member function, the operand shall be a
> > > > qualified-id."
> > 
> > I do agree here, but it does reinforce that OFFSET_REF is no longer
> > just for members represented by pointer to member type. So that might
> > be something to take into consideration.
> 
> 
> An OFFSET_REF that isn't type_unknown_p, agreed.
> 
> > > > It might simplify things to remove the optimization in build_offset_ref
> > > > so we get an OFFSET_REF even for a single static member function, and
> > > > add support for that to cp_build_addr_expr_1.
> > 
> > I don't think this should be necessary, the "right thing" should just
> > be done for explicit-object member functions. With all the stuff going
> > on here that I missed I'm starting to wonder how function overloads
> > ever worked at all in my patch. On the other hand though, this
> > optimization probably could be documented better, but I very well might
> > have missed it even if it were.
> > 
> > Hell, maybe it needs a greater redesign altogether, it seems strange to
> > me to bundle overload information in with a construct for a specific
> > expression. (Assuming that's whats happening of course, I still don't
> > fully understand it.) It's not like this has rules unique to it for how
> > overload resolution is decided, right? Initializing a param/variable of
> > pointer to function type with an overloaded function resolves that with
> > similar rules, I think? Maybe it is a little different now that I write
> > it out loud.
> > 
> > I wasn't going to finish my musings about that, but it made me realize
> > that it might not actually be correct for address of explicit-object
> > member functions to be wrapped by OFFSET_REF. I mean surely it's fine
> > because based on what you've said static member functions are also
> > wrapped by OFFSET_REF, so it's likely fully implemented, especially
> > considering things worked before. But now that there are 2 different
> > varieties of class members that the address of them can be taken, it
> > might make sense to split things up a bit? Then again, why were static
> > member functions ever handled the same way? Taking the address of other
> > static members isn't handled in the same way here is it?
> 
> 
> Functions are different because of overloading; in general we can't
> decide what an expression that names a function actually means until we
> have enough context to decide which function, exactly. So we represent
> the id-expression largely as lookup+syntax until overload resolution
> turns it into a specific function. The type_unknown_p check earlier in
> cp_build_addr_expr_1 is for that case.

Yeah this all makes sense, but that's why I was confused by the
following documentation from cp-tree.def.

```
/* An OFFSET_REF is used in two situations:

   1. An expression of the form `A::m' where `A' is a class and `m' is
  a non-static member.  In this case, operand 0 will be a TYPE
  (corresponding to `A') and operand 1 will be a FIELD_DECL,
  BASELINK, or TEMPLATE_ID_EXPR (corresponding to `m').

  The expression is a pointer-to-member if its address is taken,
  but simply denotes a member of the object if its address is not
  taken.
```

> An OFFSET_REF that isn't type_unknown_p, agreed.
But I suppose that's what this might have been referring to.

So is that the case then? OFFSET_REF might be for a regular address of
member expression unless it's type_unknown_p?

> 
> An id-expression that names a single non-template function
> (!really_overloaded_fn) is handled somewhat differently, as we don't
> need to defer everything. But that means various special-case code.
> 
> Currently build_offset_ref special-cases ::f for a single static
> member function, but we can't use the same special case for single
> explicit object member functions because we need to distinguish between
> ::f and  somehow to check the requirement I quoted above. So it
> seems to me we'll need to add support for single explicit object member
> functions in the OFFSET_REF handling in cp_build_addr_expr_1. And I
> thought if we're doing that, perhaps we want to move the single static
> handling over there as well, but that's not necessary.
> 
> Jason

I'm done for today but it does seem like that special case is what is
causing my crash. I found that it only jumps into the section that it
crashes in when there are no overloads. I'm 

Re: Enable top-level recursive 'autoreconf'

2023-10-19 Thread Alexandre Oliva via Gcc
On Oct 19, 2023, Thomas Schwinge  wrote:

> On 2023-10-18T15:42:18+0100, R jd <3246251196r...@gmail.com> wrote:
>> I guess I can ask, why there is not a recursive approach for configuring
>> GCC. e.g. AC_SUBDIRS in the top level?

> ('AC_CONFIG_SUBDIRS' you mean.)  You know, often it just takes someone to
> ask the right questions...  ;-)

> What do people think about the attached
> "Enable top-level recursive 'autoreconf'"?  Only lightly tested, so far.

Interesting idea!

It is a little hackish, in that it seems to exploit an implementation
detail in AC_CONFIG_SUBDIRS rather than a documented feature.

I like it!

The autoconf documentation suggests that optional directories can be
tested for:

  if test -d "$srcdir/foo"; then
AC_CONFIG_SUBDIRS([foo])
  fi

We could use a macro that takes a list and iterates over the list (untested):

dnl Handle a list of optional subdirs.
dnl After AC_OUTPUT, affects autoreconf runs, but not configure runs.
AC_DEFUN([AC_CONFIG_SUBDIRS_OPT], [
  m4_foreach_w([dir], [$1], [
if test -d "$srcdir/dir"; then
  AC_CONFIG_SUBDIRS(dir)
fi
  ])
])

Thanks,

-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive


Re: Enable top-level recursive 'autoreconf'

2023-10-19 Thread Alexandre Oliva
On Oct 19, 2023, Thomas Schwinge  wrote:

> On 2023-10-18T15:42:18+0100, R jd <3246251196r...@gmail.com> wrote:
>> I guess I can ask, why there is not a recursive approach for configuring
>> GCC. e.g. AC_SUBDIRS in the top level?

> ('AC_CONFIG_SUBDIRS' you mean.)  You know, often it just takes someone to
> ask the right questions...  ;-)

> What do people think about the attached
> "Enable top-level recursive 'autoreconf'"?  Only lightly tested, so far.

Interesting idea!

It is a little hackish, in that it seems to exploit an implementation
detail in AC_CONFIG_SUBDIRS rather than a documented feature.

I like it!

The autoconf documentation suggests that optional directories can be
tested for:

  if test -d "$srcdir/foo"; then
AC_CONFIG_SUBDIRS([foo])
  fi

We could use a macro that takes a list and iterates over the list (untested):

dnl Handle a list of optional subdirs.
dnl After AC_OUTPUT, affects autoreconf runs, but not configure runs.
AC_DEFUN([AC_CONFIG_SUBDIRS_OPT], [
  m4_foreach_w([dir], [$1], [
if test -d "$srcdir/dir"; then
  AC_CONFIG_SUBDIRS(dir)
fi
  ])
])

Thanks,

-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive


Re: [PATCH V2] RISC-V: Fix failed hoist in LICM of vmv.v.x instruction

2023-10-19 Thread Lehua Ding
Committed after the commited of the vsetvl pass refactor patch, thanks 
Robin.


On 2023/10/19 16:43, Robin Dapp wrote:

Hi Juzhe,

as discussed off-list this approach generally makes sense to me so
the patch LGTM once the vsetvl rework is upstream and settled.

Independently, we still need to understand why the more complex
broadcast pattern is not hoisted out of the loop.

Regards
  Robin


--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai



Re: [PATCH V3 00/11] Refactor and cleanup vsetvl pass

2023-10-19 Thread Lehua Ding

Committed, thanks Patrick and Juzhe.

On 2023/10/20 2:04, Patrick O'Neill wrote:

I tested it this morning on my machine and it passed!

Tested against:
04d6c74564b7eb51660a00b35353aeab706b5a50

Using targets:
glibc rv32gcv qemu
glibc rv64gcv qemu

This patch series does not introduce any new failures.

Here's a list of *resolved* failures by this patch series:
rv64gcv:
FAIL: gfortran.dg/host_assoc_function_7.f90   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution test

FAIL: gfortran.dg/host_assoc_function_7.f90   -O3 -g  execution test

rv32gcv:
FAIL: gcc.target/riscv/rvv/autovec/binop/narrow_run-1.c execution test
FAIL: gfortran.dg/host_assoc_function_7.f90   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution test

FAIL: gfortran.dg/host_assoc_function_7.f90   -O3 -g  execution test

Thanks for the quick revision Lehua!

Tested-by: Patrick O'Neill 

Patrick

On 10/19/23 01:50, 钟居哲 wrote:

LGTM now. But wait for Patrick CI testing.

Hi, @Patrick. Could you apply this patch and trigger CI in your 
github  so that we can see the full running result.


Issues · patrick-rivos/riscv-gnu-toolchain · GitHub 




juzhe.zh...@rivai.ai

*From:* Lehua Ding 
*Date:* 2023-10-19 16:33
*To:* gcc-patches 
*CC:* juzhe.zhong ; kito.cheng
; rdapp.gcc
; palmer ;
jeffreyalaw ; lehua.ding

*Subject:* [PATCH V3 00/11] Refactor and cleanup vsetvl pass
This patch refactors and cleanups the vsetvl pass in order to make
the code
easier to modify and understand. This patch does several things:
1. Introducing a virtual CFG for vsetvl infos and Phase 1, 2 and 3
only maintain
   and modify this virtual CFG. Phase 4 performs insertion,
modification and
   deletion of vsetvl insns based on the virtual CFG. The Basic
block in the
   virtual CFG is called vsetvl_block_info and the vsetvl
information inside
   is called vsetvl_info.
2. Combine Phase 1 and 2 into a single Phase 1 and unified the
demand system,
   this Phase only fuse local vsetvl info in forward direction.
3. Refactor Phase 3, change the logic for determining whether to
uplift vsetvl
   info to a pred basic block to a more unified method that there
is a vsetvl
   info in the vsetvl defintion reaching in compatible with it.
4. Place all modification operations to the RTL in Phase 4 and
Phase 5.
   Phase 4 is responsible for inserting, modifying and deleting vsetvl
   instructions based on fully optimized vsetvl infos. Phase 5
removes the avl
   operand from the RVV instruction and removes the unused dest
operand
   register from the vsetvl insns.
These modifications resulted in some testcases needing to be
updated. The reasons
for updating are summarized below:
1. more optimized
vlmax_back_prop-25.c/vlmax_back_prop-26.c/vlmax_conflict-3.c/
   vlmax_conflict-12.c/vsetvl-13.c/vsetvl-23.c/
avl_single-23.c/avl_single-89.c/avl_single-95.c/pr109773-1.c
2. less unnecessary fusion
avl_single-46.c/imm_bb_prop-1.c/pr109743-2.c/vsetvl-18.c
3. local fuse direction (backward -> forward)
   scalar_move-1.c/
4. add some bugfix testcases.
   pr111037-3.c/pr111037-4.c
   avl_single-89.c
PR target/111037
PR target/111234
PR target/111725
Lehua Ding (11):
  RISC-V: P1: Refactor
avl_info/vl_vtype_info/vector_insn_info/vector_block_info
  RISC-V: P2: Refactor and cleanup demand system
  RISC-V: P3: Refactor vector_infos_manager
  RISC-V: P4: move method from pass_vsetvl to pre_vsetvl
  RISC-V: P5: combine phase 1 and 2
  RISC-V: P6: Add computing reaching definition data flow
  RISC-V: P7: Move earliest fuse and lcm code to pre_vsetvl class
  RISC-V: P8: Refactor emit-vsetvl phase and delete post optimization
  RISC-V: P9: Cleanup and reorganize helper functions
  RISC-V: P10: Delete riscv-vsetvl.h and adjust riscv-vsetvl.def
  RISC-V: P11: Adjust and add testcases
gcc/config/riscv/riscv-vsetvl.cc  | 6502 +++--
gcc/config/riscv/riscv-vsetvl.def |  641 +-
gcc/config/riscv/riscv-vsetvl.h   |  488 --
gcc/config/riscv/t-riscv  |    2 +-
.../gcc.target/riscv/rvv/base/scalar_move-1.c |    2 +-
.../riscv/rvv/vsetvl/avl_single-104.c |   35 +
.../riscv/rvv/vsetvl/avl_single-105.c |   23 +
.../riscv/rvv/vsetvl/avl_single-106.c |   34 +
.../riscv/rvv/vsetvl/avl_single-107.c  

[Bug target/111725] Missed one vsetivli insn

2023-10-19 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111725

--- Comment #1 from CVS Commits  ---
The trunk branch has been updated by Lehua Ding :

https://gcc.gnu.org/g:29331e72d0ce9fe8aabdeb8c320b99943b9e067a

commit r14-4773-g29331e72d0ce9fe8aabdeb8c320b99943b9e067a
Author: Lehua Ding 
Date:   Fri Oct 20 10:22:43 2023 +0800

RISC-V: Refactor and cleanup vsetvl pass

This patch refactors and cleanups the vsetvl pass in order to make the code
easier to modify and understand. This patch does several things:

1. Introducing a virtual CFG for vsetvl infos and Phase 1, 2 and 3 only
maintain
   and modify this virtual CFG. Phase 4 performs insertion, modification
and
   deletion of vsetvl insns based on the virtual CFG. The basic block in
the
   virtual CFG is called vsetvl_block_info and the vsetvl information
inside
   is called vsetvl_info.
2. Combine Phase 1 and 2 into a single Phase 1 and unified the demand
system,
   this phase only fuse local vsetvl info in forward direction.
3. Refactor Phase 3, change the logic for determining whether to uplift
vsetvl
   info to a pred basic block to a more unified method that there is a
vsetvl
   info in the vsetvl defintion reaching in compatible with it.
4. Place all modification operations to the RTL in Phase 4 and Phase 5.
   Phase 4 is responsible for inserting, modifying and deleting vsetvl
   instructions based on fully optimized vsetvl infos. Phase 5 removes the
avl
   operand from the RVV instruction and removes the unused dest operand
   register from the vsetvl insns.

These modifications resulted in some testcases needing to be updated. The
reasons
for updating are summarized below:

1. more optimized
   vlmax_back_prop-{25,26}.c
   vlmax_conflict-{3,12}.c/vsetvl-{13,23}.c/vsetvl-23.c/
   avl_single-{23,84,95}.c/pr109773-1.c
2. less unnecessary fusion
   avl_single-46.c/imm_bb_prop-1.c/pr109743-2.c/vsetvl-18.c
3. local fuse direction (backward -> forward)
   scalar_move-1.c
4. add some bugfix testcases.
   pr111037-{3,4}.c/pr111037-4.c
   avl_single-{89,104,105,106,107,108,109}.c

PR target/111037
PR target/111234
PR target/111725

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (bitmap_union_of_preds_with_entry):
New.
(debug): Removed.
(compute_reaching_defintion): New.
(enum vsetvl_type): Moved.
(vlmax_avl_p): Moved.
(enum emit_type): Moved.
(vlmul_to_str): Moved.
(vlmax_avl_insn_p): Removed.
(policy_to_str): Moved.
(loop_basic_block_p): Removed.
(valid_sew_p): Removed.
(vsetvl_insn_p): Moved.
(vsetvl_vtype_change_only_p): Removed.
(after_or_same_p): Removed.
(before_p): Removed.
(anticipatable_occurrence_p): Removed.
(available_occurrence_p): Removed.
(insn_should_be_added_p): Removed.
(get_all_sets): Moved.
(get_same_bb_set): Moved.
(gen_vsetvl_pat): Removed.
(calculate_vlmul): Moved.
(get_max_int_sew): New.
(emit_vsetvl_insn): Removed.
(get_max_float_sew): New.
(eliminate_insn): Removed.
(insert_vsetvl): Removed.
(count_regno_occurrences): Moved.
(get_vl_vtype_info): Removed.
(enum def_type): Moved.
(validate_change_or_fail): Moved.
(change_insn): Removed.
(get_all_real_uses): Moved.
(get_forward_read_vl_insn): Removed.
(get_backward_fault_first_load_insn): Removed.
(change_vsetvl_insn): Removed.
(avl_source_has_vsetvl_p): Removed.
(source_equal_p): Moved.
(calculate_sew): Removed.
(same_equiv_note_p): Moved.
(get_expr_id): New.
(incompatible_avl_p): Removed.
(get_regno): New.
(different_sew_p): Removed.
(get_bb_index): New.
(different_lmul_p): Removed.
(has_no_uses): Moved.
(different_ratio_p): Removed.
(different_tail_policy_p): Removed.
(different_mask_policy_p): Removed.
(possible_zero_avl_p): Removed.
(enum demand_flags): New.
(second_ratio_invalid_for_first_sew_p): Removed.
(second_ratio_invalid_for_first_lmul_p): Removed.
(enum class): New.
(float_insn_valid_sew_p): Removed.
(second_sew_less_than_first_sew_p): Removed.
(first_sew_less_than_second_sew_p): Removed.
(class vsetvl_info): New.
(compare_lmul): Removed.
(second_lmul_less_than_first_lmul_p): Removed.
(second_ratio_less_than_first_ratio_p): Removed.
(DEF_INCOMPATIBLE_COND): Removed.

[Bug target/111234] RISC-V: ICE in vsetvl pass

2023-10-19 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111234

--- Comment #3 from CVS Commits  ---
The trunk branch has been updated by Lehua Ding :

https://gcc.gnu.org/g:29331e72d0ce9fe8aabdeb8c320b99943b9e067a

commit r14-4773-g29331e72d0ce9fe8aabdeb8c320b99943b9e067a
Author: Lehua Ding 
Date:   Fri Oct 20 10:22:43 2023 +0800

RISC-V: Refactor and cleanup vsetvl pass

This patch refactors and cleanups the vsetvl pass in order to make the code
easier to modify and understand. This patch does several things:

1. Introducing a virtual CFG for vsetvl infos and Phase 1, 2 and 3 only
maintain
   and modify this virtual CFG. Phase 4 performs insertion, modification
and
   deletion of vsetvl insns based on the virtual CFG. The basic block in
the
   virtual CFG is called vsetvl_block_info and the vsetvl information
inside
   is called vsetvl_info.
2. Combine Phase 1 and 2 into a single Phase 1 and unified the demand
system,
   this phase only fuse local vsetvl info in forward direction.
3. Refactor Phase 3, change the logic for determining whether to uplift
vsetvl
   info to a pred basic block to a more unified method that there is a
vsetvl
   info in the vsetvl defintion reaching in compatible with it.
4. Place all modification operations to the RTL in Phase 4 and Phase 5.
   Phase 4 is responsible for inserting, modifying and deleting vsetvl
   instructions based on fully optimized vsetvl infos. Phase 5 removes the
avl
   operand from the RVV instruction and removes the unused dest operand
   register from the vsetvl insns.

These modifications resulted in some testcases needing to be updated. The
reasons
for updating are summarized below:

1. more optimized
   vlmax_back_prop-{25,26}.c
   vlmax_conflict-{3,12}.c/vsetvl-{13,23}.c/vsetvl-23.c/
   avl_single-{23,84,95}.c/pr109773-1.c
2. less unnecessary fusion
   avl_single-46.c/imm_bb_prop-1.c/pr109743-2.c/vsetvl-18.c
3. local fuse direction (backward -> forward)
   scalar_move-1.c
4. add some bugfix testcases.
   pr111037-{3,4}.c/pr111037-4.c
   avl_single-{89,104,105,106,107,108,109}.c

PR target/111037
PR target/111234
PR target/111725

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (bitmap_union_of_preds_with_entry):
New.
(debug): Removed.
(compute_reaching_defintion): New.
(enum vsetvl_type): Moved.
(vlmax_avl_p): Moved.
(enum emit_type): Moved.
(vlmul_to_str): Moved.
(vlmax_avl_insn_p): Removed.
(policy_to_str): Moved.
(loop_basic_block_p): Removed.
(valid_sew_p): Removed.
(vsetvl_insn_p): Moved.
(vsetvl_vtype_change_only_p): Removed.
(after_or_same_p): Removed.
(before_p): Removed.
(anticipatable_occurrence_p): Removed.
(available_occurrence_p): Removed.
(insn_should_be_added_p): Removed.
(get_all_sets): Moved.
(get_same_bb_set): Moved.
(gen_vsetvl_pat): Removed.
(calculate_vlmul): Moved.
(get_max_int_sew): New.
(emit_vsetvl_insn): Removed.
(get_max_float_sew): New.
(eliminate_insn): Removed.
(insert_vsetvl): Removed.
(count_regno_occurrences): Moved.
(get_vl_vtype_info): Removed.
(enum def_type): Moved.
(validate_change_or_fail): Moved.
(change_insn): Removed.
(get_all_real_uses): Moved.
(get_forward_read_vl_insn): Removed.
(get_backward_fault_first_load_insn): Removed.
(change_vsetvl_insn): Removed.
(avl_source_has_vsetvl_p): Removed.
(source_equal_p): Moved.
(calculate_sew): Removed.
(same_equiv_note_p): Moved.
(get_expr_id): New.
(incompatible_avl_p): Removed.
(get_regno): New.
(different_sew_p): Removed.
(get_bb_index): New.
(different_lmul_p): Removed.
(has_no_uses): Moved.
(different_ratio_p): Removed.
(different_tail_policy_p): Removed.
(different_mask_policy_p): Removed.
(possible_zero_avl_p): Removed.
(enum demand_flags): New.
(second_ratio_invalid_for_first_sew_p): Removed.
(second_ratio_invalid_for_first_lmul_p): Removed.
(enum class): New.
(float_insn_valid_sew_p): Removed.
(second_sew_less_than_first_sew_p): Removed.
(first_sew_less_than_second_sew_p): Removed.
(class vsetvl_info): New.
(compare_lmul): Removed.
(second_lmul_less_than_first_lmul_p): Removed.
(second_ratio_less_than_first_ratio_p): Removed.
(DEF_INCOMPATIBLE_COND): Removed.

[Bug target/111848] RISC-V: RVV cost model pick unexpected big LMUL

2023-10-19 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111848

--- Comment #2 from CVS Commits  ---
The trunk branch has been updated by Lehua Ding :

https://gcc.gnu.org/g:f0e28d8c13713f509fde26fbe7dd13280b67fb87

commit r14-4774-gf0e28d8c13713f509fde26fbe7dd13280b67fb87
Author: Juzhe-Zhong 
Date:   Wed Oct 18 18:25:33 2023 +0800

RISC-V: Fix failed hoist in LICM of vmv.v.x instruction

Confirm dynamic LMUL algorithm works well for choosing LMUL = 4 for the PR:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111848

But it generate horrible register spillings.

The root cause is that we didn't hoist the vmv.v.x outside the loop which
increase the SLP loop register pressure.

So, change the COSNT_VECTOR move into vec_duplicate splitter that we can
gain better optimizations:

1. better LICM.
2. More opportunities of transforming 'vv' into 'vx' in the future.

Before this patch:

f3:
ble a4,zero,.L8
csrrt0,vlenb
sllit1,t0,4
csrra6,vlenb
sub sp,sp,t1
csrra5,vlenb
sllia6,a6,3
sllia5,a5,2
add a6,a6,sp
vsetvli a7,zero,e16,m8,ta,ma
sllia4,a4,3
vid.v   v8
addit6,a5,-1
vand.vi v8,v8,-2
neg t5,a5
vs8r.v  v8,0(sp)
vadd.vi v8,v8,1
vs8r.v  v8,0(a6)
j   .L4
.L12:
vsetvli a7,zero,e16,m8,ta,ma
.L4:
csrrt0,vlenb
sllit0,t0,3
vl8re16.v   v16,0(sp)
add t0,t0,sp
vmv.v.x v8,t6
mv  t1,a4
vand.vv v24,v16,v8
mv  a6,a4
vl8re16.v   v16,0(t0)
vand.vv v8,v16,v8
bleua4,a5,.L3
mv  a6,a5
.L3:
vsetvli zero,a6,e8,m4,ta,ma
vle8.v  v20,0(a2)
vle8.v  v16,0(a3)
vsetvli a7,zero,e8,m4,ta,ma
vrgatherei16.vv v4,v20,v24
vadd.vv v4,v16,v4
vsetvli zero,a6,e8,m4,ta,ma
vse8.v  v4,0(a0)
vle8.v  v20,0(a2)
vsetvli a7,zero,e8,m4,ta,ma
vrgatherei16.vv v4,v20,v8
vadd.vv v4,v4,v16
vsetvli zero,a6,e8,m4,ta,ma
vse8.v  v4,0(a1)
add a4,a4,t5
add a0,a0,a5
add a3,a3,a5
add a1,a1,a5
add a2,a2,a5
bgtut1,a5,.L12
csrrt0,vlenb
sllit1,t0,4
add sp,sp,t1
jr  ra
.L8:
ret

After this patch:

f3:
ble a4,zero,.L6
csrra6,vlenb
csrra5,vlenb
sllia6,a6,2
sllia5,a5,2
addia6,a6,-1
sllia4,a4,3
neg t5,a5
vsetvli t1,zero,e16,m8,ta,ma
vmv.v.x v24,a6
vid.v   v8
vand.vi v8,v8,-2
vadd.vi v16,v8,1
vand.vv v8,v8,v24
vand.vv v16,v16,v24
.L4:
mv  t1,a4
mv  a6,a4
bleua4,a5,.L3
mv  a6,a5
.L3:
vsetvli zero,a6,e8,m4,ta,ma
vle8.v  v28,0(a2)
vle8.v  v24,0(a3)
vsetvli a7,zero,e8,m4,ta,ma
vrgatherei16.vv v4,v28,v8
vadd.vv v4,v24,v4
vsetvli zero,a6,e8,m4,ta,ma
vse8.v  v4,0(a0)
vle8.v  v28,0(a2)
vsetvli a7,zero,e8,m4,ta,ma
vrgatherei16.vv v4,v28,v16
vadd.vv v4,v4,v24
vsetvli zero,a6,e8,m4,ta,ma
vse8.v  v4,0(a1)
add a4,a4,t5
add a0,a0,a5
add a3,a3,a5
add a1,a1,a5
add a2,a2,a5
bgtut1,a5,.L4
.L6:
ret

Note that this patch triggers multiple FAILs:
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-3.c execution test
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-3.c execution test
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-4.c execution test
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-4.c execution test
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-8.c execution test
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-8.c execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c
execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c
execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-2.c
execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-2.c
execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-1.c
execution test
FAIL: 

[Bug target/111037] RISC-V: Invalid vsetvli fusion

2023-10-19 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111037

--- Comment #4 from CVS Commits  ---
The trunk branch has been updated by Lehua Ding :

https://gcc.gnu.org/g:29331e72d0ce9fe8aabdeb8c320b99943b9e067a

commit r14-4773-g29331e72d0ce9fe8aabdeb8c320b99943b9e067a
Author: Lehua Ding 
Date:   Fri Oct 20 10:22:43 2023 +0800

RISC-V: Refactor and cleanup vsetvl pass

This patch refactors and cleanups the vsetvl pass in order to make the code
easier to modify and understand. This patch does several things:

1. Introducing a virtual CFG for vsetvl infos and Phase 1, 2 and 3 only
maintain
   and modify this virtual CFG. Phase 4 performs insertion, modification
and
   deletion of vsetvl insns based on the virtual CFG. The basic block in
the
   virtual CFG is called vsetvl_block_info and the vsetvl information
inside
   is called vsetvl_info.
2. Combine Phase 1 and 2 into a single Phase 1 and unified the demand
system,
   this phase only fuse local vsetvl info in forward direction.
3. Refactor Phase 3, change the logic for determining whether to uplift
vsetvl
   info to a pred basic block to a more unified method that there is a
vsetvl
   info in the vsetvl defintion reaching in compatible with it.
4. Place all modification operations to the RTL in Phase 4 and Phase 5.
   Phase 4 is responsible for inserting, modifying and deleting vsetvl
   instructions based on fully optimized vsetvl infos. Phase 5 removes the
avl
   operand from the RVV instruction and removes the unused dest operand
   register from the vsetvl insns.

These modifications resulted in some testcases needing to be updated. The
reasons
for updating are summarized below:

1. more optimized
   vlmax_back_prop-{25,26}.c
   vlmax_conflict-{3,12}.c/vsetvl-{13,23}.c/vsetvl-23.c/
   avl_single-{23,84,95}.c/pr109773-1.c
2. less unnecessary fusion
   avl_single-46.c/imm_bb_prop-1.c/pr109743-2.c/vsetvl-18.c
3. local fuse direction (backward -> forward)
   scalar_move-1.c
4. add some bugfix testcases.
   pr111037-{3,4}.c/pr111037-4.c
   avl_single-{89,104,105,106,107,108,109}.c

PR target/111037
PR target/111234
PR target/111725

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (bitmap_union_of_preds_with_entry):
New.
(debug): Removed.
(compute_reaching_defintion): New.
(enum vsetvl_type): Moved.
(vlmax_avl_p): Moved.
(enum emit_type): Moved.
(vlmul_to_str): Moved.
(vlmax_avl_insn_p): Removed.
(policy_to_str): Moved.
(loop_basic_block_p): Removed.
(valid_sew_p): Removed.
(vsetvl_insn_p): Moved.
(vsetvl_vtype_change_only_p): Removed.
(after_or_same_p): Removed.
(before_p): Removed.
(anticipatable_occurrence_p): Removed.
(available_occurrence_p): Removed.
(insn_should_be_added_p): Removed.
(get_all_sets): Moved.
(get_same_bb_set): Moved.
(gen_vsetvl_pat): Removed.
(calculate_vlmul): Moved.
(get_max_int_sew): New.
(emit_vsetvl_insn): Removed.
(get_max_float_sew): New.
(eliminate_insn): Removed.
(insert_vsetvl): Removed.
(count_regno_occurrences): Moved.
(get_vl_vtype_info): Removed.
(enum def_type): Moved.
(validate_change_or_fail): Moved.
(change_insn): Removed.
(get_all_real_uses): Moved.
(get_forward_read_vl_insn): Removed.
(get_backward_fault_first_load_insn): Removed.
(change_vsetvl_insn): Removed.
(avl_source_has_vsetvl_p): Removed.
(source_equal_p): Moved.
(calculate_sew): Removed.
(same_equiv_note_p): Moved.
(get_expr_id): New.
(incompatible_avl_p): Removed.
(get_regno): New.
(different_sew_p): Removed.
(get_bb_index): New.
(different_lmul_p): Removed.
(has_no_uses): Moved.
(different_ratio_p): Removed.
(different_tail_policy_p): Removed.
(different_mask_policy_p): Removed.
(possible_zero_avl_p): Removed.
(enum demand_flags): New.
(second_ratio_invalid_for_first_sew_p): Removed.
(second_ratio_invalid_for_first_lmul_p): Removed.
(enum class): New.
(float_insn_valid_sew_p): Removed.
(second_sew_less_than_first_sew_p): Removed.
(first_sew_less_than_second_sew_p): Removed.
(class vsetvl_info): New.
(compare_lmul): Removed.
(second_lmul_less_than_first_lmul_p): Removed.
(second_ratio_less_than_first_ratio_p): Removed.
(DEF_INCOMPATIBLE_COND): Removed.

Re: [PATCH 2/2] tree-optimization/111131 - SLP for non-IFN gathers

2023-10-19 Thread Lehua Ding

Hi Richard,

I'm hitting a couple of testcase ICEs for RISC-V while testing with the 
latest trunk code, it looks like these two patches are causing it, can 
you help me look at it? ICE log like bellow:


➜  vsetvl git:(tintin-dev) 
~/open-source/riscv-gnu-toolchain-golden/build/dev-rv64gcv_zfh-lp64d-medany-newlib-spike-debug-4/install/bin/riscv64-unknown-elf-gcc 
-march=rv64gcv_zfh -mabi=lp64d -mcmodel=medany 
-fdiagnostics-plain-output   -ftree-vectorize -O3 --param 
riscv-autovec-lmul=m1 --param=riscv-autovec-preference=scalable 
-fno-vect-cost-model -ffast-math 
../../../riscv-gnu-toolchain-golden/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c 
 -march=rv64gcv_zfh -mabi=lp64d -mcmodel=medany 
-fdiagnostics-plain-output -ftree-vectorize -O2 --param 
riscv-autovec-preference=fixed-vlmax --param riscv-autovec-lmul=m1 
-fno-vect-cost-model -ffast-math -mcmodel=medany -lm

during GIMPLE pass: vect
In file included from 
../../../riscv-gnu-toolchain-golden/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c:4:
../../../riscv-gnu-toolchain-golden/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c: 
In function 'f_int16_t_8':
../../../riscv-gnu-toolchain-golden/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c:15:3: 
internal compiler error: in update_epilogue_loop_vinfo, at 
tree-vect-loop.cc:11376
../../../riscv-gnu-toolchain-golden/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c:23:3: 
note: in expansion of macro 'TEST_LOOP'
../../../riscv-gnu-toolchain-golden/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c:31:3: 
note: in expansion of macro 'TEST_TYPE'
../../../riscv-gnu-toolchain-golden/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c:41:1: 
note: in expansion of macro 'TEST_ALL'

0x1bf5838 update_epilogue_loop_vinfo
../../../../gcc/gcc/tree-vect-loop.cc:11375
0x1bf746f vect_transform_loop(_loop_vec_info*, gimple*)
../../../../gcc/gcc/tree-vect-loop.cc:11826
0x1c4e8a7 vect_transform_loops
../../../../gcc/gcc/tree-vectorizer.cc:1007
0x1c4eff9 try_vectorize_loop_1
../../../../gcc/gcc/tree-vectorizer.cc:1152
0x1c4f135 try_vectorize_loop
../../../../gcc/gcc/tree-vectorizer.cc:1184
0x1c4f3e5 execute
../../../../gcc/gcc/tree-vectorizer.cc:1298
Please submit a full bug report, with preprocessed source (by using 
-freport-bug).

Please include the complete backtrace with any bug report.
See  for instructions.


On 2023/10/19 19:47, Richard Biener wrote:

The following implements SLP vectorization support for gathers
without relying on IFNs being pattern detected (and supported by
the target).  That includes support for emulated gathers but also
the legacy x86 builtin path.

Bootstrapped and tested on x86_64-unknown-linux-gnu, will push.

Richard.

PR tree-optimization/31
* tree-vect-loop.cc (update_epilogue_loop_vinfo): Make
sure to update all gather/scatter stmt DRs, not only those
that eventually got VMAT_GATHER_SCATTER set.
* tree-vect-slp.cc (_slp_oprnd_info::first_gs_info): Add.
(vect_get_and_check_slp_defs): Handle gathers/scatters,
adding the offset as SLP operand and comparing base and scale.
(vect_build_slp_tree_1): Handle gathers.
(vect_build_slp_tree_2): Likewise.

* gcc.dg/vect/vect-gather-1.c: Now expected to vectorize
everywhere.
* gcc.dg/vect/vect-gather-2.c: Expected to not SLP anywhere.
Massage the scale case to more reliably produce a different
one.  Scan for the specific messages.
* gcc.dg/vect/vect-gather-3.c: Masked gather is also supported
for AVX2, but not emulated.
* gcc.dg/vect/vect-gather-4.c: Expected to not SLP anywhere.
Massage to more properly ensure this.
* gcc.dg/vect/tsvc/vect-tsvc-s353.c: Expect to vectorize
everywhere.
---
  .../gcc.dg/vect/tsvc/vect-tsvc-s353.c |  2 +-
  gcc/testsuite/gcc.dg/vect/vect-gather-1.c |  2 +-
  gcc/testsuite/gcc.dg/vect/vect-gather-2.c | 13 --
  gcc/testsuite/gcc.dg/vect/vect-gather-3.c |  2 +-
  gcc/testsuite/gcc.dg/vect/vect-gather-4.c |  6 +--
  gcc/tree-vect-loop.cc |  6 ++-
  gcc/tree-vect-slp.cc  | 45 +--
  7 files changed, 61 insertions(+), 15 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s353.c 
b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s353.c
index 98ba7522471..2c4fa3f5991 100644
--- a/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s353.c
+++ b/gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s353.c
@@ -44,4 +44,4 @@ int main (int argc, char **argv)
return 0;
  }
  
-/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { xfail { ! riscv_v } } } } */

+/* { dg-final { scan-tree-dump 

Re: [PATCH] return edge in make_eh_edges

2023-10-19 Thread Alexandre Oliva
On Oct 19, 2023, Richard Biener  wrote:

> OK.  Maybe time to do s/make_eh_edges/make_eh_edge/ though.

Thanks, will do, ideally on top of the already-tested refreshed patches
that I'm going to post shortly.

-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive


[Bug c/111888] New: RISC-V: Horrible redundant number vsetvl instructions in vectorized codes

2023-10-19 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111888

Bug ID: 111888
   Summary: RISC-V: Horrible redundant number vsetvl instructions
in vectorized codes
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: juzhe.zhong at rivai dot ai
  Target Milestone: ---

https://godbolt.org/z/9G5MMa3Tq

void
foo (int32_t *__restrict a, int32_t *__restrict b,int32_t *__restrict c,
  int32_t *__restrict a2, int32_t *__restrict b2, int32_t *__restrict c2,
  int32_t *__restrict a3, int32_t *__restrict b3, int32_t *__restrict c3,
  int32_t *__restrict a4, int32_t *__restrict b4, int32_t *__restrict c4,
  int32_t *__restrict a5, int32_t *__restrict b5, int32_t *__restrict c5,
  int32_t *__restrict d,
  int32_t *__restrict d2,
  int32_t *__restrict d3,
  int32_t *__restrict d4,
  int32_t *__restrict d5,
  int n)
{
  for (int i = 0; i < n; i++)
{
  a[i] = b[i] + c[i];
  b5[i] = b[i] + c[i];
  a2[i] = b2[i] + c2[i];
  a3[i] = b3[i] + c3[i];
  a4[i] = b4[i] + c4[i];
  a5[i] = a[i] + a4[i];
  d2[i] = a2[i] + c2[i];
  d3[i] = a3[i] + c3[i];
  d4[i] = a4[i] + c4[i];
  d5[i] = a[i] + a4[i];
  a[i] = a5[i] + b5[i] + a[i];

  c2[i] = a[i] + c[i];
  c3[i] = b5[i] * a5[i];
  c4[i] = a2[i] * a3[i];
  c5[i] = b5[i] * a2[i];
  c[i] = a[i] + c3[i];
  c2[i] = a[i] + c4[i];
  a5[i] = a[i] + a4[i];
  a[i] = a[i] + b5[i] + a[i] * a2[i] * a3[i] * a4[i] 
  * a5[i] * c[i] * c2[i] * c3[i] * c4[i] * c5[i]
  * d[i] * d2[i] * d3[i] * d4[i] * d5[i];
}
}


Loop body:

vsetvli t1,t4,e8,mf4,ta,ma
vle32.v v1,0(a1)
vle32.v v4,0(a2)
vle32.v v2,0(s10)
vsetvli t3,zero,e32,m1,ta,ma
vadd.vv v4,v4,v1
vsetvli zero,t4,e32,m1,ta,ma
vle32.v v7,0(s9)
vle32.v v1,0(a4)
vse32.v v4,0(t0)
vsetvli t3,zero,e32,m1,ta,ma
vadd.vv v2,v7,v2
vsetvli zero,t4,e32,m1,ta,ma
vse32.v v2,0(t5)
vsetvli t3,zero,e32,m1,ta,ma
vadd.vv v5,v2,v4
vsetvli zero,t4,e32,m1,ta,ma
vse32.v v5,0(s3)
vsetvli t3,zero,e32,m1,ta,ma
vadd.vv v3,v5,v4
vsetvli zero,t4,e32,m1,ta,ma
vle32.v v9,0(a5)
vsetvli t3,zero,e32,m1,ta,ma
vadd.vv v3,v3,v4
vsetvli zero,t4,e32,m1,ta,ma
vle32.v v6,0(a7)
vsetvli t3,zero,e32,m1,ta,ma
vadd.vv v1,v9,v1
vsetvli zero,t4,e32,m1,ta,ma
vle32.v v8,0(s8)
vse32.v v1,0(a3)
vsetvli t3,zero,e32,m1,ta,ma
vadd.vv v6,v8,v6
vsetvli zero,t4,e32,m1,ta,ma
vse32.v v6,0(a6)
vsetvli t3,zero,e32,m1,ta,ma
vmul.vv v11,v5,v4
vsetvli zero,t4,e32,m1,ta,ma
vse32.v v11,0(s4)
vsetvli t3,zero,e32,m1,ta,ma
vadd.vv v13,v11,v3
vsetvli zero,t4,e32,m1,ta,ma
vse32.v v13,0(s6)
vsetvli t3,zero,e32,m1,ta,ma
vmul.vv v10,v6,v1
vsetvli zero,t4,e32,m1,ta,ma
vse32.v v10,0(s5)
vsetvli t3,zero,e32,m1,ta,ma
vmul.vv v12,v1,v4
vsetvli zero,t4,e32,m1,ta,ma
vse32.v v12,0(t2)
vsetvli t3,zero,e32,m1,ta,ma
vadd.vv v9,v1,v9
vsetvli zero,t4,e32,m1,ta,ma
vse32.v v9,0(s0)
vsetvli t3,zero,e32,m1,ta,ma
vadd.vv v8,v6,v8
vsetvli zero,t4,e32,m1,ta,ma
vse32.v v8,0(s1)
vsetvli t3,zero,e32,m1,ta,ma
vadd.vv v7,v2,v7
vsetvli zero,t4,e32,m1,ta,ma
vse32.v v7,0(s2)
vsetvli t3,zero,e32,m1,ta,ma
vmul.vv v1,v3,v1
vmul.vv v1,v1,v6
vadd.vv v6,v10,v3
vmul.vv v1,v1,v2
vadd.vv v2,v3,v2
vmul.vv v1,v1,v2
vmul.vv v1,v1,v13
vsetvli zero,t1,e32,m1,ta,ma
vse32.v v6,0(s7)
vsetvli t3,zero,e32,m1,ta,ma
vmul.vv v1,v1,v6
vsetvli zero,t1,e32,m1,ta,ma
vse32.v v2,0(t6)
vsetvli t3,zero,e32,m1,ta,ma
vmul.vv v1,v1,v11
vsetvli zero,t1,e32,m1,ta,ma
vle32.v v2,0(s11)
vsetvli t3,zero,e32,m1,ta,ma
sllit3,t1,2
vmul.vv v1,v1,v10
vadd.vv v3,v3,v4
vmul.vv v1,v1,v12
sub t4,t4,t1
vmul.vv v1,v1,v2
vmul.vv v1,v1,v9
vmul.vv v1,v1,v8
vmul.vv v1,v1,v7
vmadd.vvv5,v1,v3
vsetvli zero,t1,e32,m1,ta,ma
vse32.v v5,0(a0)

So many redundant AVL toggling. Ideally, it should be only a single vsetvl
instruction in the header of the loop. All other vsetvls should be elided.

It's known issue for a long time.
And I will be working on it recently base on refactored VSETVL PASS.

[Bug c++/101631] gcc allows for the changing of an union active member to be changed via a reference

2023-10-19 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101631

--- Comment #4 from CVS Commits  ---
The trunk branch has been updated by Jason Merrill :

https://gcc.gnu.org/g:1d260ab0e39ea63644e3af3ab2e0db946026b5a6

commit r14-4771-g1d260ab0e39ea63644e3af3ab2e0db946026b5a6
Author: Nathaniel Shead 
Date:   Thu Oct 12 19:53:55 2023 +1100

c++: indirect change of active union member in constexpr
[PR101631,PR102286]

This patch adds checks for attempting to change the active member of a
union by methods other than a member access expression.

To be able to properly distinguish `*() = ` from `u.a = `, this
patch redoes the solution for c++/59950 to avoid extranneous *&; it
seems that the only case that needed the workaround was when copying
empty classes.

This patch also ensures that constructors for a union field mark that
field as the active member before entering the call itself; this ensures
that modifications of the field within the constructor's body don't
cause false positives (as these will not appear to be member access
expressions). This means that we no longer need to start the lifetime of
empty union members after the constructor body completes.

As a drive-by fix, this patch also ensures that value-initialised unions
are considered to have activated their initial member for the purpose of
checking stores and accesses, which catches some additional mistakes
pre-C++20.

PR c++/101631
PR c++/102286

gcc/cp/ChangeLog:

* call.cc (build_over_call): Fold more indirect refs for trivial
assignment op.
* class.cc (type_has_non_deleted_trivial_default_ctor): Create.
* constexpr.cc (cxx_eval_call_expression): Start lifetime of
union member before entering constructor.
(cxx_eval_component_reference): Check against first member of
value-initialised union.
(cxx_eval_store_expression): Activate member for
value-initialised union. Check for accessing inactive union
member indirectly.
* cp-tree.h (type_has_non_deleted_trivial_default_ctor):
Forward declare.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/constexpr-89336-3.C: Fix union initialisation.
* g++.dg/cpp1y/constexpr-union6.C: New test.
* g++.dg/cpp1y/constexpr-union7.C: New test.
* g++.dg/cpp2a/constexpr-union2.C: New test.
* g++.dg/cpp2a/constexpr-union3.C: New test.
* g++.dg/cpp2a/constexpr-union4.C: New test.
* g++.dg/cpp2a/constexpr-union5.C: New test.
* g++.dg/cpp2a/constexpr-union6.C: New test.

Signed-off-by: Nathaniel Shead 
Reviewed-by: Jason Merrill 

[Bug c++/102286] [constexpr] construct_at incorrectly starts union array lifetime in some cases

2023-10-19 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102286

--- Comment #5 from CVS Commits  ---
The trunk branch has been updated by Jason Merrill :

https://gcc.gnu.org/g:1d260ab0e39ea63644e3af3ab2e0db946026b5a6

commit r14-4771-g1d260ab0e39ea63644e3af3ab2e0db946026b5a6
Author: Nathaniel Shead 
Date:   Thu Oct 12 19:53:55 2023 +1100

c++: indirect change of active union member in constexpr
[PR101631,PR102286]

This patch adds checks for attempting to change the active member of a
union by methods other than a member access expression.

To be able to properly distinguish `*() = ` from `u.a = `, this
patch redoes the solution for c++/59950 to avoid extranneous *&; it
seems that the only case that needed the workaround was when copying
empty classes.

This patch also ensures that constructors for a union field mark that
field as the active member before entering the call itself; this ensures
that modifications of the field within the constructor's body don't
cause false positives (as these will not appear to be member access
expressions). This means that we no longer need to start the lifetime of
empty union members after the constructor body completes.

As a drive-by fix, this patch also ensures that value-initialised unions
are considered to have activated their initial member for the purpose of
checking stores and accesses, which catches some additional mistakes
pre-C++20.

PR c++/101631
PR c++/102286

gcc/cp/ChangeLog:

* call.cc (build_over_call): Fold more indirect refs for trivial
assignment op.
* class.cc (type_has_non_deleted_trivial_default_ctor): Create.
* constexpr.cc (cxx_eval_call_expression): Start lifetime of
union member before entering constructor.
(cxx_eval_component_reference): Check against first member of
value-initialised union.
(cxx_eval_store_expression): Activate member for
value-initialised union. Check for accessing inactive union
member indirectly.
* cp-tree.h (type_has_non_deleted_trivial_default_ctor):
Forward declare.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/constexpr-89336-3.C: Fix union initialisation.
* g++.dg/cpp1y/constexpr-union6.C: New test.
* g++.dg/cpp1y/constexpr-union7.C: New test.
* g++.dg/cpp2a/constexpr-union2.C: New test.
* g++.dg/cpp2a/constexpr-union3.C: New test.
* g++.dg/cpp2a/constexpr-union4.C: New test.
* g++.dg/cpp2a/constexpr-union5.C: New test.
* g++.dg/cpp2a/constexpr-union6.C: New test.

Signed-off-by: Nathaniel Shead 
Reviewed-by: Jason Merrill 

Re: [PATCH v2] c++: Improve diagnostics for constexpr cast from void*

2023-10-19 Thread Jason Merrill

On 10/11/23 11:41, Marek Polacek wrote:

On Wed, Oct 11, 2023 at 10:57:06AM +1100, Nathaniel Shead wrote:

On Mon, Oct 09, 2023 at 04:10:20PM -0400, Jason Merrill wrote:

On 10/9/23 06:03, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu with
GXX_TESTSUITE_STDS=98,11,14,17,20,23,26,impcx.

-- >8 --

This patch improves the errors given when casting from void* in C++26 to
include the expected type if the type of the pointed-to object was
not similar to the casted-to type.

It also ensures (for all standard modes) that void* casts are checked
even for DECL_ARTIFICIAL declarations, such as lifetime-extended
temporaries, and is only ignored for cases where we know it's OK (heap
identifiers and source_location::current). This provides more accurate
diagnostics when using the pointer and ensures that some other casts
from void* are now correctly rejected.

gcc/cp/ChangeLog:

* constexpr.cc (is_std_source_location_current): New.
(cxx_eval_constant_expression): Only ignore cast from void* for
specific cases and improve other diagnostics.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-cast4.C: New test.

Signed-off-by: Nathaniel Shead 
---
   gcc/cp/constexpr.cc  | 83 +---
   gcc/testsuite/g++.dg/cpp0x/constexpr-cast4.C |  7 ++
   2 files changed, 78 insertions(+), 12 deletions(-)
   create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-cast4.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 0f948db7c2d..f38d541a662 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -2301,6 +2301,36 @@ is_std_allocator_allocate (const constexpr_call *call)
  && is_std_allocator_allocate (call->fundef->decl));
   }
+/* Return true if FNDECL is std::source_location::current.  */
+
+static inline bool
+is_std_source_location_current (tree fndecl)
+{
+  if (!decl_in_std_namespace_p (fndecl))
+return false;
+
+  tree name = DECL_NAME (fndecl);
+  if (name == NULL_TREE || !id_equal (name, "current"))
+return false;
+
+  tree ctx = DECL_CONTEXT (fndecl);
+  if (ctx == NULL_TREE || !CLASS_TYPE_P (ctx) || !TYPE_MAIN_DECL (ctx))
+return false;
+
+  name = DECL_NAME (TYPE_MAIN_DECL (ctx));
+  return name && id_equal (name, "source_location");
+}
+
+/* Overload for the above taking constexpr_call*.  */
+
+static inline bool
+is_std_source_location_current (const constexpr_call *call)
+{
+  return (call
+ && call->fundef
+ && is_std_source_location_current (call->fundef->decl));
+}
+
   /* Return true if FNDECL is __dynamic_cast.  */
   static inline bool
@@ -7850,33 +7880,62 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, 
tree t,
if (TYPE_PTROB_P (type)
&& TYPE_PTR_P (TREE_TYPE (op))
&& VOID_TYPE_P (TREE_TYPE (TREE_TYPE (op)))
-   /* Inside a call to std::construct_at or to
-  std::allocator::{,de}allocate, we permit casting from void*
+   /* Inside a call to std::construct_at,
+  std::allocator::{,de}allocate, or
+  std::source_location::current, we permit casting from void*
   because that is compiler-generated code.  */
&& !is_std_construct_at (ctx->call)
-   && !is_std_allocator_allocate (ctx->call))
+   && !is_std_allocator_allocate (ctx->call)
+   && !is_std_source_location_current (ctx->call))
  {
/* Likewise, don't error when casting from void* when OP is
uninit and similar.  */
tree sop = tree_strip_nop_conversions (op);
-   if (TREE_CODE (sop) == ADDR_EXPR
-   && VAR_P (TREE_OPERAND (sop, 0))
-   && DECL_ARTIFICIAL (TREE_OPERAND (sop, 0)))
+   tree decl = NULL_TREE;
+   if (TREE_CODE (sop) == ADDR_EXPR)
+ decl = TREE_OPERAND (sop, 0);
+   if (decl
+   && VAR_P (decl)
+   && DECL_ARTIFICIAL (decl)
+   && (DECL_NAME (decl) == heap_identifier
+   || DECL_NAME (decl) == heap_uninit_identifier
+   || DECL_NAME (decl) == heap_vec_identifier
+   || DECL_NAME (decl) == heap_vec_uninit_identifier))
  /* OK */;
/* P2738 (C++26): a conversion from a prvalue P of type "pointer to
   cv void" to a pointer-to-object type T unless P points to an
   object whose type is similar to T.  */
-   else if (cxx_dialect > cxx23
-&& (sop = cxx_fold_indirect_ref (ctx, loc,
- TREE_TYPE (type), sop)))
+   else if (cxx_dialect > cxx23)
  {
-   r = build1 (ADDR_EXPR, type, sop);
-   break;
+   r = cxx_fold_indirect_ref (ctx, loc, TREE_TYPE (type), sop);
+   if (r)
+ {
+   r = build1 (ADDR_EXPR, type, r);
+   break;
+   

Re: [PATCH v6] c++: Check for indirect change of active union member in constexpr [PR101631,PR102286]

2023-10-19 Thread Jason Merrill

On 10/12/23 18:05, Nathaniel Shead wrote:

On Thu, Oct 12, 2023 at 04:24:00PM -0400, Jason Merrill wrote:

On 10/12/23 04:53, Nathaniel Shead wrote:

On Wed, Oct 11, 2023 at 12:48:12AM +1100, Nathaniel Shead wrote:

On Mon, Oct 09, 2023 at 04:46:46PM -0400, Jason Merrill wrote:

On 10/8/23 21:03, Nathaniel Shead wrote:

Ping for https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631203.html

+ && (TREE_CODE (t) == MODIFY_EXPR
+ /* Also check if initializations have implicit change of active
+member earlier up the access chain.  */
+ || !refs->is_empty())


I'm not sure what the cumulative point of these two tests is.  TREE_CODE (t)
will be either MODIFY_EXPR or INIT_EXPR, and either should be OK.

As I understand it, the problematic case is something like
constexpr-union2.C, where we're also looking at a MODIFY_EXPR.  So what is
this check doing?


The reasoning was to correctly handle cases like the the following (in
constexpr-union6.C):

constexpr int test1() {
  U u {};
  std::construct_at(, S{ 1, 2 });
  return u.s.b;
}
static_assert(test1() == 2);

The initialisation of  here is not a member access expression within
the call to std::construct_at, since it's just a pointer, but this code
is still legal; in general, an INIT_EXPR to initialise a union member
should always be OK (I believe?), hence constraining to just
MODIFY_EXPR.

However, just that would then (incorrectly) allow all the following
cases in that test to compile, such as

constexpr int test2() {
  U u {};
  int* p = 
  std::construct_at(p, 5);
  return u.s.b;
}
constexpr int x2 = test2();

since the INIT_EXPR is really only initialising 'b', but the implicit
"modification" of active member to 'u.s' is illegal.

Maybe a better way of expressing this condition would be something like
this?

/* An INIT_EXPR of the last member in an access chain is always OK,
   but still check implicit change of members earlier on; see
   cpp2a/constexpr-union6.C.  */
&& !(TREE_CODE (t) == INIT_EXPR && refs->is_empty ())

Otherwise I'll see if I can rework some of the other conditions instead.


Incidentally, I think constexpr-union6.C could use a test where we pass 
to a function other than construct_at, and then try (and fail) to assign to
the b member from that function.

Jason



Sounds good; I've added the following test:

constexpr void foo(S* s) {
  s->b = 10;  // { dg-error "accessing .U::s. member instead of initialized 
.U::k." }
}
constexpr int test3() {
  U u {};
  foo();  // { dg-message "in .constexpr. expansion" }
  return u.s.b;
}
constexpr int x3 = test3();  // { dg-message "in .constexpr. expansion" }

Incidentally I found this particular example caused a very unhelpful
error + ICE due to reporting that S could not be value-initialized in
the current version of the patch. The updated version below fixes that
by using 'build_zero_init' instead -- is this an appropriate choice
here?

A similar (but unrelated) issue is with e.g.
struct S { const int a; int b; };
union U { int k; S s; };

constexpr int test() {
  U u {};
  return u.s.b;
}
constexpr int x = test();

giving me this pretty unhelpful error message:

/home/ns/main.cpp:8:23:   in ‘constexpr’ expansion of ‘test()’
/home/ns/main.cpp:6:12: error: use of deleted function ‘S::S()’
  6 |   return u.s.b;
|  ~~^
/home/ns/main.cpp:1:8: note: ‘S::S()’ is implicitly deleted because the default 
definition would be ill-formed:
  1 | struct S { const int a; int b; };
|^
/home/ns/main.cpp:1:8: error: uninitialised const member in ‘struct S’
/home/ns/main.cpp:1:22: note: ‘const int S::a’ should be initialised
  1 | struct S { const int a; int b; };
|  ^
/home/ns/main.cpp:8:23:   in ‘constexpr’ expansion of ‘test()’
/home/ns/main.cpp:6:12: error: use of deleted function ‘S::S()’
  6 |   return u.s.b;
|  ~~^
/home/ns/main.cpp:8:23:   in ‘constexpr’ expansion of ‘test()’
/home/ns/main.cpp:6:12: error: use of deleted function ‘S::S()’

but I'll try and fix this separately (it exists on current trunk without
this patch as well).


While attempting to fix this I found a much better way of handling
value-initialised unions. Here's a new version of the patch which also
includes the fix for accessing the wrong member of a value-initialised
union as well.

Additionally includes an `auto_diagnostic_group` for the `inform`
diagnostics as Marek helpfully informed me about in my other patch.

Bootstrapped and regtested on x86_64-pc-linux-gnu.


@@ -4496,21 +4491,36 @@ cxx_eval_component_reference (const constexpr_ctx *ctx, 
tree t,

break;
}
   }
-  if (TREE_CODE (TREE_TYPE (whole)) == UNION_TYPE
-  && CONSTRUCTOR_NELTS (whole) > 0)
+  if (TREE_CODE (TREE_TYPE (whole)) == UNION_TYPE)
   {
-  /* DR 1188 

[Bug c/111887] GCC: 14: A potential miscompilation with __builtin_inf

2023-10-19 Thread 141242068 at smail dot nju.edu.cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111887

--- Comment #2 from wierton <141242068 at smail dot nju.edu.cn> ---
Thanks for you reply, I got it!

[Bug c/111887] GCC: 14: A potential miscompilation with __builtin_inf

2023-10-19 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111887

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|--- |INVALID
 Status|UNCONFIRMED |RESOLVED

--- Comment #1 from Andrew Pinski  ---
-Ofast enables -ffinite-math-only which means infinite will not show up and
compares against it will not work.

[PATCH] testsuite: check for and use -mno-strict-align where needed

2023-10-19 Thread Alexandre Oliva
On Mar 10, 2021, Alexandre Oliva  wrote:

> ppc configurations that have -mstrict-align enabled by default fail
> gcc.dg/strlenopt-80.c, because some memcpy calls don't get turned into
> MEM_REFs, which defeats the tested-for strlen optimization.

I've combined this patch with other patches that added -mno-strict-align
to tests that needed it on targets configured with -mstrict-align
enabled by default, and conditioned the use of the flag to targets that
support it.

Regstrapped on x86_64-linux-gnu, ppc64le-linux-gnu, also tested on a
ppc-vx7r2 configured with -mstrict-align.  Ok to install?



Various tests fail on powerpc if the toolchain is configured to enable
-mstrict-align by default.  This patch introduces -mno-strict-align on
tests found to fail that way, when the target supports this option.

I suppose !non_strict_align could be used to skip tests, instead of or
in addition to this tweak, and that might be desirable if they still
fail on targets that do no support -mno-strict-align, but I haven't
observed such scenarios.


The p9-vec-length tests expect vectorization on loop bodies and
epilogues that reference arrays that are not known to be more aligned
than their small element types.

Though VSX vectors work best with 32- or 64-bit alignment, unaligned
vector loads and stores are expected by the tests.  However, with
-mstrict-align by default, vector loads and stores not known to be
aligned end up open coded, which doesn't match the asm output
expectations coded in the tests.


for  gcc/ChangeLog

* doc/sourcebuild.texi (opt_mstrict_align): New target.

for  gcc/testsuite/ChangeLog

* lib/target-supports.exp
(check_effective_target_opt_mstrict_align): New.
* gcc.dg/strlenopt-80.c: Add -mno-strict-align if supported.
* gcc.target/powerpc/prefix-ds-dq.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-1.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-2.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-3.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-4.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-5.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-6.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-7.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-8.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-run-1.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-run-2.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-run-3.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-run-4.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-run-5.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-run-6.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-run-7.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-run-8.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-1.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-2.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-3.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-4.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-5.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-6.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-7.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-8.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-run-1.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-run-2.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-run-3.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-run-4.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-run-5.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-run-6.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-run-7.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-run-8.c: Likewise.
---
 gcc/doc/sourcebuild.texi   |3 +++
 gcc/testsuite/gcc.dg/strlenopt-80.c|4 
 .../gcc.target/powerpc/p9-vec-length-epil-1.c  |2 ++
 .../gcc.target/powerpc/p9-vec-length-epil-2.c  |2 ++
 .../gcc.target/powerpc/p9-vec-length-epil-3.c  |2 ++
 .../gcc.target/powerpc/p9-vec-length-epil-4.c  |2 ++
 .../gcc.target/powerpc/p9-vec-length-epil-5.c  |2 ++
 .../gcc.target/powerpc/p9-vec-length-epil-6.c  |2 ++
 .../gcc.target/powerpc/p9-vec-length-epil-7.c  |2 ++
 .../gcc.target/powerpc/p9-vec-length-epil-8.c  |2 ++
 .../gcc.target/powerpc/p9-vec-length-epil-run-1.c  |2 ++
 .../gcc.target/powerpc/p9-vec-length-epil-run-2.c  |2 ++
 .../gcc.target/powerpc/p9-vec-length-epil-run-3.c  |2 ++
 .../gcc.target/powerpc/p9-vec-length-epil-run-4.c  |2 ++
 .../gcc.target/powerpc/p9-vec-length-epil-run-5.c  |2 ++
 .../gcc.target/powerpc/p9-vec-length-epil-run-6.c  |2 ++
 .../gcc.target/powerpc/p9-vec-length-epil-run-7.c  |2 ++
 

[Bug c/111887] New: GCC: 14: A potential miscompilation with __builtin_inf

2023-10-19 Thread 141242068 at smail dot nju.edu.cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111887

Bug ID: 111887
   Summary: GCC: 14: A potential miscompilation with __builtin_inf
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: 141242068 at smail dot nju.edu.cn
  Target Milestone: ---

I am uncertain whether this program contains undefined behavior, the testcase
is:
```
extern void abort (void);

void test(double f, double i)
{
  if (i != __builtin_inf())
abort ();
}

int main()
{
  test (34.0, __builtin_inf());
  return 0;
}

```

When compile it with -O0, -O1, -O2, the program normally exits, but when
compile it with -Ofast, the program abort.

Here is the verification link: https://gcc.godbolt.org/z/7YKbvha84

[Bug middle-end/101195] ICE: in tree_to_uhwi, at tree.c:6324

2023-10-19 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101195

Andrew Pinski  changed:

   What|Removed |Added

 CC||141242068 at smail dot 
nju.edu.cn

--- Comment #3 from Andrew Pinski  ---
*** Bug 111886 has been marked as a duplicate of this bug. ***

[Bug c/111886] GCC: 14: internal compiler error: in tree_to_uhwi, at tree.cc:6467

2023-10-19 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111886

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #1 from Andrew Pinski  ---
Dup.

*** This bug has been marked as a duplicate of bug 101195 ***

[Bug c/111886] New: GCC: 14: internal compiler error: in tree_to_uhwi, at tree.cc:6467

2023-10-19 Thread 141242068 at smail dot nju.edu.cn via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111886

Bug ID: 111886
   Summary: GCC: 14: internal compiler error: in tree_to_uhwi, at
tree.cc:6467
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: 141242068 at smail dot nju.edu.cn
  Target Milestone: ---

The ICE trigger:
```
void f() {
  __builtin_eh_return_data_regno(-1);;
}
```

Verification link: https://gcc.godbolt.org/z/K9178a89W

The stack dump:
```
during RTL pass: expand
: In function 'f':
:2:3: internal compiler error: in tree_to_uhwi, at tree.cc:6467
2 |   __builtin_eh_return_data_regno(-1);;
  |   ^~
0x231788e internal_error(char const*, ...)
???:0
0xa0002a fancy_abort(char const*, int, char const*)
???:0
0xca7b96 expand_builtin_eh_return_data_regno(tree_node*)
???:0
0xb79cc6 expand_builtin(tree_node*, rtx_def*, rtx_def*, machine_mode, int)
???:0
0xccd48b expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
expand_modifier, rtx_def**, bool)
???:0
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See  for instructions.
```

Re: [PATCH v3 1/2] c++: Initial support for P0847R7 (Deducing This) [PR102609]

2023-10-19 Thread Jason Merrill

On 10/19/23 19:35, waffl3x wrote:

(waffl3x (me))
At a glance it seems like all I need to do then is disable the
PTRMEM_OK_P flag then.


I'm just now realizing that I'm almost certainly wrong about this. It
still needs PTRMEM_OK_P set if there are any implicit-object member
functions in the overload set. That is, if OFFSET_REF includes that
information... but it doesn't seem like it does? Reading the
information on OFFSET_REF, particularly build_offset_ref, seems to
indicate that OFFSET_REF (at least historically) was only for things
with a pointer to member type.


Or things that might end up with pointer-to-member type after overload 
resolution.



An OFFSET_REF (with PTRMEM_OK_P) is used to express that we saw the
::f syntax, so we could build a pointer to member if it resolves to an
implicit-object member function.

For an overload set containing only a single static member function,
build_offset_ref doesn't bother to build an OFFSET_REF, but returns the
BASELINK itself.


Based on what you've said, I assume that OFFSET_REF handles static
member functions that are overloaded. But as I've said this seems to
contradict the comments I'm reading, so I'm not sure that I'm
understanding you correctly.


That's right.  For instance,

struct A {
  static void g();
  static void g(int);
};

void (*p)(int) = ::g; // cp_build_addr_expr_1 gets an OFFSET_REF


I think we need the OFFSET_REF for an explicit-object member function
because it expresses that the code satisfies the requirement "If the
operand names an explicit object member function, the operand shall be a
qualified-id."


I do agree here, but it does reinforce that OFFSET_REF is no longer
just for members represented by pointer to member type. So that might
be something to take into consideration.


An OFFSET_REF that isn't type_unknown_p, agreed.


It might simplify things to remove the optimization in build_offset_ref
so we get an OFFSET_REF even for a single static member function, and
add support for that to cp_build_addr_expr_1.


I don't think this should be necessary, the "right thing" should just
be done for explicit-object member functions. With all the stuff going
on here that I missed I'm starting to wonder how function overloads
ever worked at all in my patch. On the other hand though, this
optimization probably could be documented better, but I very well might
have missed it even if it were.

Hell, maybe it needs a greater redesign altogether, it seems strange to
me to bundle overload information in with a construct for a specific
expression. (Assuming that's whats happening of course, I still don't
fully understand it.) It's not like this has rules unique to it for how
overload resolution is decided, right? Initializing a param/variable of
pointer to function type with an overloaded function resolves that with
similar rules, I think? Maybe it is a little different now that I write
it out loud.

I wasn't going to finish my musings about that, but it made me realize
that it might not actually be correct for address of explicit-object
member functions to be wrapped by OFFSET_REF. I mean surely it's fine
because based on what you've said static member functions are also
wrapped by OFFSET_REF, so it's likely fully implemented, especially
considering things worked before. But now that there are 2 different
varieties of class members that the address of them can be taken, it
might make sense to split things up a bit? Then again, why were static
member functions ever handled the same way? Taking the address of other
static members isn't handled in the same way here is it?


Functions are different because of overloading; in general we can't 
decide what an expression that names a function actually means until we 
have enough context to decide which function, exactly.  So we represent 
the id-expression largely as lookup+syntax until overload resolution 
turns it into a specific function.  The type_unknown_p check earlier in 
cp_build_addr_expr_1 is for that case.


An id-expression that names a single non-template function 
(!really_overloaded_fn) is handled somewhat differently, as we don't 
need to defer everything.  But that means various special-case code.


Currently build_offset_ref special-cases ::f for a single static 
member function, but we can't use the same special case for single 
explicit object member functions because we need to distinguish between 
::f and  somehow to check the requirement I quoted above.  So it 
seems to me we'll need to add support for single explicit object member 
functions in the OFFSET_REF handling in cp_build_addr_expr_1.  And I 
thought if we're doing that, perhaps we want to move the single static 
handling over there as well, but that's not necessary.


Jason



[PATCH] Avoid compile time hog on vect_peel_nonlinear_iv_init for nonlinear induction vec_step_op_mul when iteration count is too big.

2023-10-19 Thread liuhongt
>So with pow being available this limit shouldn't be necessary any more and
>the testcase adjustment can be avoided?
I tries, compile time still hogs on mpz_powm(3, INT_MAX), so i'll just
keep this.

>and to avoid undefined behavior with too large shift just go the gmp
>way unconditionally.
Changed.

>this comment is now resolved I think.
Removed.

>mpz_pow_ui uses unsigned long while i think we constrain known niters
>to uint64 - so I suggest to use mpz_powm instead (limiting to a possibly
>host specific limit - unsigned long - is unfortunately a no-go).
Changed.

There's loop in vect_peel_nonlinear_iv_init to get init_expr *
pow (step_expr, skip_niters). When skipn_iters is too big, compile time
hogs. To avoid that, optimize init_expr * pow (step_expr, skip_niters) to
init_expr << (exact_log2 (step_expr) * skip_niters) when step_expr is
pow of 2, otherwise give up vectorization when skip_niters >=
TYPE_PRECISION (TREE_TYPE (init_expr)).

Also give up vectorization when niters_skip is negative which will be
used for fully masked loop.

Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
Ok for trunk.

gcc/ChangeLog:

PR tree-optimization/111820
PR tree-optimization/111833
* tree-vect-loop-manip.cc (vect_can_peel_nonlinear_iv_p): Give
up vectorization for nonlinear iv vect_step_op_mul when
step_expr is not exact_log2 and niters is greater than
TYPE_PRECISION (TREE_TYPE (step_expr)). Also don't vectorize
for nagative niters_skip which will be used by fully masked
loop.
(vect_can_advance_ivs_p): Pass whole phi_info to
vect_can_peel_nonlinear_iv_p.
* tree-vect-loop.cc (vect_peel_nonlinear_iv_init): Optimize
init_expr * pow (step_expr, skipn) to init_expr
<< (log2 (step_expr) * skipn) when step_expr is exact_log2.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr111820-1.c: New test.
* gcc.target/i386/pr111820-2.c: New test.
* gcc.target/i386/pr111820-3.c: New test.
* gcc.target/i386/pr103144-mul-1.c: Adjust testcase.
* gcc.target/i386/pr103144-mul-2.c: Adjust testcase.
---
 .../gcc.target/i386/pr103144-mul-1.c  |  8 +++---
 .../gcc.target/i386/pr103144-mul-2.c  |  8 +++---
 gcc/testsuite/gcc.target/i386/pr111820-1.c| 16 +++
 gcc/testsuite/gcc.target/i386/pr111820-2.c| 16 +++
 gcc/testsuite/gcc.target/i386/pr111820-3.c| 16 +++
 gcc/tree-vect-loop-manip.cc   | 28 +--
 gcc/tree-vect-loop.cc | 13 ++---
 7 files changed, 90 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr111820-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr111820-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr111820-3.c

diff --git a/gcc/testsuite/gcc.target/i386/pr103144-mul-1.c 
b/gcc/testsuite/gcc.target/i386/pr103144-mul-1.c
index 640c34fd959..913d7737dcd 100644
--- a/gcc/testsuite/gcc.target/i386/pr103144-mul-1.c
+++ b/gcc/testsuite/gcc.target/i386/pr103144-mul-1.c
@@ -11,7 +11,7 @@ foo_mul (int* a, int b)
   for (int i = 0; i != N; i++)
 {
   a[i] = b;
-  b *= 3;
+  b *= 4;
 }
 }
 
@@ -23,7 +23,7 @@ foo_mul_const (int* a)
   for (int i = 0; i != N; i++)
 {
   a[i] = b;
-  b *= 3;
+  b *= 4;
 }
 }
 
@@ -34,7 +34,7 @@ foo_mul_peel (int* a, int b)
   for (int i = 0; i != 39; i++)
 {
   a[i] = b;
-  b *= 3;
+  b *= 4;
 }
 }
 
@@ -46,6 +46,6 @@ foo_mul_peel_const (int* a)
   for (int i = 0; i != 39; i++)
 {
   a[i] = b;
-  b *= 3;
+  b *= 4;
 }
 }
diff --git a/gcc/testsuite/gcc.target/i386/pr103144-mul-2.c 
b/gcc/testsuite/gcc.target/i386/pr103144-mul-2.c
index 39fdea3a69d..b2ff186e335 100644
--- a/gcc/testsuite/gcc.target/i386/pr103144-mul-2.c
+++ b/gcc/testsuite/gcc.target/i386/pr103144-mul-2.c
@@ -16,12 +16,12 @@ avx2_test (void)
 
   __builtin_memset (epi32_exp, 0, N * sizeof (int));
   int b = 8;
-  v8si init = __extension__(v8si) { b, b * 3, b * 9, b * 27, b * 81, b * 243, 
b * 729, b * 2187 };
+  v8si init = __extension__(v8si) { b, b * 4, b * 16, b * 64, b * 256, b * 
1024, b * 4096, b * 16384 };
 
   for (int i = 0; i != N / 8; i++)
 {
   memcpy (epi32_exp + i * 8, , 32);
-  init *= 6561;
+  init *= 65536;
 }
 
   foo_mul (epi32_dst, b);
@@ -32,11 +32,11 @@ avx2_test (void)
   if (__builtin_memcmp (epi32_dst, epi32_exp, 39 * 4) != 0)
 __builtin_abort ();
 
-  init = __extension__(v8si) { 1, 3, 9, 27, 81, 243, 729, 2187 };
+  init = __extension__(v8si) { 1, 4, 16, 64, 256, 1024, 4096, 16384 };
   for (int i = 0; i != N / 8; i++)
 {
   memcpy (epi32_exp + i * 8, , 32);
-  init *= 6561;
+  init *= 65536;
 }
 
   foo_mul_const (epi32_dst);
diff --git a/gcc/testsuite/gcc.target/i386/pr111820-1.c 
b/gcc/testsuite/gcc.target/i386/pr111820-1.c
new file mode 100644
index 000..50e960c39d4
--- /dev/null
+++ 

Re: [PATCH V3 00/11] Refactor and cleanup vsetvl pass

2023-10-19 Thread Lehua Ding

Hi Patrick,

Thanks a lot for helping to test these patchs!

On 2023/10/20 2:04, Patrick O'Neill wrote:

I tested it this morning on my machine and it passed!

Tested against:
04d6c74564b7eb51660a00b35353aeab706b5a50

Using targets:
glibc rv32gcv qemu
glibc rv64gcv qemu

This patch series does not introduce any new failures.

Here's a list of *resolved* failures by this patch series:
rv64gcv:
FAIL: gfortran.dg/host_assoc_function_7.f90   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution test

FAIL: gfortran.dg/host_assoc_function_7.f90   -O3 -g  execution test

rv32gcv:
FAIL: gcc.target/riscv/rvv/autovec/binop/narrow_run-1.c execution test
FAIL: gfortran.dg/host_assoc_function_7.f90   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution test

FAIL: gfortran.dg/host_assoc_function_7.f90   -O3 -g  execution test

Thanks for the quick revision Lehua!

Tested-by: Patrick O'Neill 

Patrick

On 10/19/23 01:50, 钟居哲 wrote:

LGTM now. But wait for Patrick CI testing.

Hi, @Patrick. Could you apply this patch and trigger CI in your 
github  so that we can see the full running result.


Issues · patrick-rivos/riscv-gnu-toolchain · GitHub 




juzhe.zh...@rivai.ai

*From:* Lehua Ding 
*Date:* 2023-10-19 16:33
*To:* gcc-patches 
*CC:* juzhe.zhong ; kito.cheng
; rdapp.gcc
; palmer ;
jeffreyalaw ; lehua.ding

*Subject:* [PATCH V3 00/11] Refactor and cleanup vsetvl pass
This patch refactors and cleanups the vsetvl pass in order to make
the code
easier to modify and understand. This patch does several things:
1. Introducing a virtual CFG for vsetvl infos and Phase 1, 2 and 3
only maintain
   and modify this virtual CFG. Phase 4 performs insertion,
modification and
   deletion of vsetvl insns based on the virtual CFG. The Basic
block in the
   virtual CFG is called vsetvl_block_info and the vsetvl
information inside
   is called vsetvl_info.
2. Combine Phase 1 and 2 into a single Phase 1 and unified the
demand system,
   this Phase only fuse local vsetvl info in forward direction.
3. Refactor Phase 3, change the logic for determining whether to
uplift vsetvl
   info to a pred basic block to a more unified method that there
is a vsetvl
   info in the vsetvl defintion reaching in compatible with it.
4. Place all modification operations to the RTL in Phase 4 and
Phase 5.
   Phase 4 is responsible for inserting, modifying and deleting vsetvl
   instructions based on fully optimized vsetvl infos. Phase 5
removes the avl
   operand from the RVV instruction and removes the unused dest
operand
   register from the vsetvl insns.
These modifications resulted in some testcases needing to be
updated. The reasons
for updating are summarized below:
1. more optimized
vlmax_back_prop-25.c/vlmax_back_prop-26.c/vlmax_conflict-3.c/
   vlmax_conflict-12.c/vsetvl-13.c/vsetvl-23.c/
avl_single-23.c/avl_single-89.c/avl_single-95.c/pr109773-1.c
2. less unnecessary fusion
avl_single-46.c/imm_bb_prop-1.c/pr109743-2.c/vsetvl-18.c
3. local fuse direction (backward -> forward)
   scalar_move-1.c/
4. add some bugfix testcases.
   pr111037-3.c/pr111037-4.c
   avl_single-89.c
PR target/111037
PR target/111234
PR target/111725
Lehua Ding (11):
  RISC-V: P1: Refactor
avl_info/vl_vtype_info/vector_insn_info/vector_block_info
  RISC-V: P2: Refactor and cleanup demand system
  RISC-V: P3: Refactor vector_infos_manager
  RISC-V: P4: move method from pass_vsetvl to pre_vsetvl
  RISC-V: P5: combine phase 1 and 2
  RISC-V: P6: Add computing reaching definition data flow
  RISC-V: P7: Move earliest fuse and lcm code to pre_vsetvl class
  RISC-V: P8: Refactor emit-vsetvl phase and delete post optimization
  RISC-V: P9: Cleanup and reorganize helper functions
  RISC-V: P10: Delete riscv-vsetvl.h and adjust riscv-vsetvl.def
  RISC-V: P11: Adjust and add testcases
gcc/config/riscv/riscv-vsetvl.cc  | 6502 +++--
gcc/config/riscv/riscv-vsetvl.def |  641 +-
gcc/config/riscv/riscv-vsetvl.h   |  488 --
gcc/config/riscv/t-riscv  |    2 +-
.../gcc.target/riscv/rvv/base/scalar_move-1.c |    2 +-
.../riscv/rvv/vsetvl/avl_single-104.c |   35 +
.../riscv/rvv/vsetvl/avl_single-105.c |   23 +
.../riscv/rvv/vsetvl/avl_single-106.c |   34 +

Re: [PATCH] c++: print source code in print_instantiation_partial_context_line

2023-10-19 Thread Patrick Palka
On Tue, 3 Oct 2023, David Malcolm wrote:

> As mentioned in my Cauldron talk, this patch adds a call to
> diagnostic_show_locus to the "required from here" messages
> in print_instantiation_partial_context_line, so that e.g., rather
> than the rather mystifying:
> 
> In file included from ../x86_64-pc-linux-gnu/libstdc++-v3/include/memory:78,
>  from ../../src/demo-1.C:1:
> ../x86_64-pc-linux-gnu/libstdc++-v3/include/bits/unique_ptr.h: In 
> instantiation of ‘std::__detail::__unique_ptr_t<_Tp> std::make_unique(_Args&& 
> ...) [with _Tp = bar; _Args = {}; __detail::__unique_ptr_t<_Tp> = 
> __detail::__unique_ptr_t]’:
> ../../src/demo-1.C:15:32:   required from here
> ../x86_64-pc-linux-gnu/libstdc++-v3/include/bits/unique_ptr.h:1066:30: error: 
> no matching function for call to ‘bar::bar()’
>  1066 | { return unique_ptr<_Tp>(new 
> _Tp(std::forward<_Args>(__args)...)); }
>   |  ^~~
> ../../src/demo-1.C:10:3: note: candidate: ‘bar::bar(int)’
>10 |   bar (int);
>   |   ^~~
> ../../src/demo-1.C:10:3: note:   candidate expects 1 argument, 0 provided
> ../../src/demo-1.C:7:7: note: candidate: ‘constexpr bar::bar(const bar&)’
> 7 | class bar : public foo
>   |   ^~~
> ../../src/demo-1.C:7:7: note:   candidate expects 1 argument, 0 provided
> ../../src/demo-1.C:7:7: note: candidate: ‘constexpr bar::bar(bar&&)’
> ../../src/demo-1.C:7:7: note:   candidate expects 1 argument, 0 provided
> 
> we emit:
> 
> In file included from ../x86_64-pc-linux-gnu/libstdc++-v3/include/memory:78,
>  from ../../src/demo-1.C:1:
> ../x86_64-pc-linux-gnu/libstdc++-v3/include/bits/unique_ptr.h: In 
> instantiation of ‘std::__detail::__unique_ptr_t<_Tp> std::make_unique(_Args&& 
> ...) [with _Tp = bar; _Args = {}; __detail::__unique_ptr_t<_Tp> = 
> __detail::__unique_ptr_t]’:
> ../../src/demo-1.C:15:32:   required from here
>15 |   return std::make_unique ();
>   |  ~~^~
> ../x86_64-pc-linux-gnu/libstdc++-v3/include/bits/unique_ptr.h:1066:30: error: 
> no matching function for call to ‘bar::bar()’
>  1066 | { return unique_ptr<_Tp>(new 
> _Tp(std::forward<_Args>(__args)...)); }
>   |  ^~~
> ../../src/demo-1.C:10:3: note: candidate: ‘bar::bar(int)’
>10 |   bar (int);
>   |   ^~~
> ../../src/demo-1.C:10:3: note:   candidate expects 1 argument, 0 provided
> ../../src/demo-1.C:7:7: note: candidate: ‘constexpr bar::bar(const bar&)’
> 7 | class bar : public foo
>   |   ^~~
> ../../src/demo-1.C:7:7: note:   candidate expects 1 argument, 0 provided
> ../../src/demo-1.C:7:7: note: candidate: ‘constexpr bar::bar(bar&&)’
> ../../src/demo-1.C:7:7: note:   candidate expects 1 argument, 0 provided
> 
> which shows the code that's leading to the error (the bad call to
> std::make_unique).

This is great!  I noticed however that the source code gets printed in a
surprising way in some contexts.  Consider:

template void f(typename T::type);

int main() {
  f(0);
}

For this testcase we emit:

testcase.C: In function ‘int main()’:
testcase.C:4:9: error: no matching function for call to ‘f(int)’
4 |   f(0);
  |   ~~^~~
testcase.C:1:24: note: candidate: ‘template void f(typename T::type)’
1 | template void f(typename T::type);
  |^
testcase.C:1:24: note:   template argument deduction/substitution failed:
testcase.C: In substitution of ‘template void f(typename T::type) 
[with T = int]’:
testcase.C:4:9:   required from here
testcase.C:1:24: note: 4 |   f(0);
testcase.C:1:24: note:   |   ~~^~~
testcase.C:1:24: error: ‘int’ is not a class, struct, or union type
1 | template void f(typename T::type);
  |^

In particular the source code part following the "required from here" line

testcase.C:4:9:   required from here
testcase.C:1:24: note: 4 |   f(0);
testcase.C:1:24: note:   |   ~~^~~

seems off, I would have expected it be

testcase.C:4:9:   required from here
4 |   f(0);
  |   ~~^~~

i.e. without the "testcase.C:1:24: note:  " prefix.  Does this look
expected?  (I also wonder if we might want to omit printing the source
code altogether in this case, since we already printed that same line
earlier during the "no matching function" error?)

> 
> 
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> 
> OK for trunk?
> 
> 
> gcc/cp/ChangeLog:
>   * error.cc (print_instantiation_partial_context_line): Call
>   diagnostic_show_locus.
> 
> gcc/testsuite/ChangeLog:
>   * g++.dg/diagnostic/static_assert3.C: Add directives for
>   additional source printing.
>   * g++.dg/template/error60.C: New test.
> 
> Signed-off-by: David Malcolm 
> ---
>  gcc/cp/error.cc   |  2 +
>  .../g++.dg/diagnostic/static_assert3.C|  7 +++-
>  

[Bug c++/111885] [14 Regression] source code after "required from here" note sometimes printed strangely

2023-10-19 Thread ppalka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111885

Patrick Palka  changed:

   What|Removed |Added

 Resolution|--- |MOVED
 Status|UNCONFIRMED |RESOLVED

--- Comment #1 from Patrick Palka  ---
On second thought I reckon I'll just note this observation in the patch email
thread instead, sorry for the noise...

[Bug tree-optimization/50856] ARM: suboptimal code for absolute difference calculation

2023-10-19 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50856

--- Comment #3 from Andrew Pinski  ---
The second case will be solved by updating the patch at:
https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574892.html

For the review at
https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574948.html

Re: [PATCH v3 1/2] c++: Initial support for P0847R7 (Deducing This) [PR102609]

2023-10-19 Thread waffl3x
> (waffl3x (me))
> At a glance it seems like all I need to do then is disable the
> PTRMEM_OK_P flag then.

I'm just now realizing that I'm almost certainly wrong about this. It
still needs PTRMEM_OK_P set if there are any implicit-object member
functions in the overload set. That is, if OFFSET_REF includes that
information... but it doesn't seem like it does? Reading the
information on OFFSET_REF, particularly build_offset_ref, seems to
indicate that OFFSET_REF (at least historically) was only for things
with a pointer to member type.

> > An OFFSET_REF (with PTRMEM_OK_P) is used to express that we saw the
> > ::f syntax, so we could build a pointer to member if it resolves to an
> > implicit-object member function.
> > 
> > For an overload set containing only a single static member function,
> > build_offset_ref doesn't bother to build an OFFSET_REF, but returns the
> > BASELINK itself.

Based on what you've said, I assume that OFFSET_REF handles static
member functions that are overloaded. But as I've said this seems to
contradict the comments I'm reading, so I'm not sure that I'm
understanding you correctly.

I suppose that will be pretty easy to test, so I'll just do that as
well.

> > I think we need the OFFSET_REF for an explicit-object member function
> > because it expresses that the code satisfies the requirement "If the
> > operand names an explicit object member function, the operand shall be a
> > qualified-id."

I do agree here, but it does reinforce that OFFSET_REF is no longer
just for members represented by pointer to member type. So that might
be something to take into consideration.

> > It might simplify things to remove the optimization in build_offset_ref
> > so we get an OFFSET_REF even for a single static member function, and
> > add support for that to cp_build_addr_expr_1.

I don't think this should be necessary, the "right thing" should just
be done for explicit-object member functions. With all the stuff going
on here that I missed I'm starting to wonder how function overloads
ever worked at all in my patch. On the other hand though, this
optimization probably could be documented better, but I very well might
have missed it even if it were.

Hell, maybe it needs a greater redesign altogether, it seems strange to
me to bundle overload information in with a construct for a specific
expression. (Assuming that's whats happening of course, I still don't
fully understand it.) It's not like this has rules unique to it for how
overload resolution is decided, right? Initializing a param/variable of
pointer to function type with an overloaded function resolves that with
similar rules, I think? Maybe it is a little different now that I write
it out loud.

I wasn't going to finish my musings about that, but it made me realize
that it might not actually be correct for address of explicit-object
member functions to be wrapped by OFFSET_REF. I mean surely it's fine
because based on what you've said static member functions are also
wrapped by OFFSET_REF, so it's likely fully implemented, especially
considering things worked before. But now that there are 2 different
varieties of class members that the address of them can be taken, it
might make sense to split things up a bit? Then again, why were static
member functions ever handled the same way? Taking the address of other
static members isn't handled in the same way here is it?

I'm probably spending too much time thinking about it when I don't
fully understand how it's all being done, I'll just go back to poking
around trying to figure it all out. Then I'll worry about whether
thing's should be done differently or not.

Alex



Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-19 Thread Kees Cook
On Wed, Oct 18, 2023 at 09:11:43PM +, Qing Zhao wrote:
> As I replied to Martin in another email, I plan to do the following to 
> resolve this issue:
> 
> 1. No specification for signed or unsigned for counted_by field.
> 2. Add a sanitizer option -fsanitize=counted-by-bound to catch the cases when 
> the size of the counted-by is not positive.

I don't understand why this needs to be a runtime sanitizer. The
signedness is known at compile time, so I would expect a -W option. Or
do you mean you'd split up -fsanitize=bounds between unsigned and signed
indexes? I'd find that kind of awkward for the kernel... but I feel like
I've misunderstood something. :)

-Kees

-- 
Kees Cook


Re: [PATCH] c: Add -Wreturn-mismatch warning, split from -Wreturn-type

2023-10-19 Thread Eric Gallager
On Thu, Oct 19, 2023 at 5:08 PM Florian Weimer  wrote:
>
> The existing -Wreturn-type option covers both constraint violations
> (which are mandatory to diagnose) and warnings that have known
> false positives.  The new -Wreturn-mismatch warning is only about
> the constraint violations (missing or extra return expressions),
> and should eventually be turned into a permerror.
>
> The -std=gnu89 test cases show that by default, we do not warn for
> return; in a function not returning void.  This matches previous
> practice for -Wreturn-type.
>
> gcc/c-family/
>
> * c.opt (Wreturn-mismatch): New.
>
> gcc/c/
>
> * c-typeck.cc (c_finish_return): Use pedwarn with
> OPT_Wreturn_mismatch for missing/extra return expressions.
>
> gcc/
>
> * doc/invoke.texi (Warning Options): Document
> -Wreturn-mismatch.  Update -Wreturn-type documentation.
>
> gcc/testsuite/
>
> * gcc.dg/Wreturn-mismatch-1.c: New.
> * gcc.dg/Wreturn-mismatch-2.c: New.
> * gcc.dg/Wreturn-mismatch-3.c: New.
> * gcc.dg/Wreturn-mismatch-4.c: New.
> * gcc.dg/Wreturn-mismatch-5.c: New.
> * gcc.dg/noncompile/pr55976-1.c: Change -Werror=return-type
> to -Werror=return-mismatch.
> * gcc.dg/noncompile/pr55976-2.c: Likewise.
>
> ---
>  gcc/c-family/c.opt  |  4 +++
>  gcc/c/c-typeck.cc   | 15 +++
>  gcc/doc/invoke.texi | 35 +++-
>  gcc/testsuite/gcc.dg/Wreturn-mismatch-1.c   | 40 
>  gcc/testsuite/gcc.dg/Wreturn-mismatch-2.c   | 41 
> +
>  gcc/testsuite/gcc.dg/Wreturn-mismatch-3.c   | 40 
>  gcc/testsuite/gcc.dg/Wreturn-mismatch-4.c   | 40 
>  gcc/testsuite/gcc.dg/Wreturn-mismatch-5.c   | 40 
>  gcc/testsuite/gcc.dg/noncompile/pr55976-1.c |  2 +-
>  gcc/testsuite/gcc.dg/noncompile/pr55976-2.c |  2 +-
>  10 files changed, 234 insertions(+), 25 deletions(-)
>
> diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
> index 44b9c862c14..4c2ebd08af5 100644
> --- a/gcc/c-family/c.opt
> +++ b/gcc/c-family/c.opt
> @@ -1263,6 +1263,10 @@ Wreorder
>  C++ ObjC++ Var(warn_reorder) Warning LangEnabledBy(C++ ObjC++,Wall)
>  Warn when the compiler reorders code.
>
> +Wreturn-mismatch
> +C ObjC Var(warn_return_mismatch) Warning Init(1)
> +Warn whenever void-returning functions return a non-void expressions, or a 
> return expression is missing in a function not returning void.
> +
>  Wreturn-type
>  C ObjC C++ ObjC++ Var(warn_return_type) Warning LangEnabledBy(C ObjC C++ 
> ObjC++,Wall) Init(-1)
>  Warn whenever a function's return type defaults to \"int\" (C), or about 
> inconsistent return types (C++).
> diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
> index 6e044b4afbc..1da622160a3 100644
> --- a/gcc/c/c-typeck.cc
> +++ b/gcc/c/c-typeck.cc
> @@ -11281,17 +11281,10 @@ c_finish_return (location_t loc, tree retval, tree 
> origtype)
>if ((warn_return_type >= 0 || flag_isoc99)
>   && valtype != NULL_TREE && TREE_CODE (valtype) != VOID_TYPE)
> {
> - bool warned_here;
> - if (flag_isoc99)
> -   warned_here = pedwarn
> - (loc, warn_return_type >= 0 ? OPT_Wreturn_type : 0,
> -  "% with no value, in function returning non-void");
> - else
> -   warned_here = warning_at
> - (loc, OPT_Wreturn_type,
> -  "% with no value, in function returning non-void");
>   no_warning = true;
> - if (warned_here)
> + if (pedwarn (loc, OPT_Wreturn_mismatch,
> +  "% with no value,"
> +  " in function returning non-void"))
> inform (DECL_SOURCE_LOCATION (current_function_decl),
> "declared here");
> }
> @@ -11302,7 +11295,7 @@ c_finish_return (location_t loc, tree retval, tree 
> origtype)
>bool warned_here;
>if (TREE_CODE (TREE_TYPE (retval)) != VOID_TYPE)
> warned_here = pedwarn
> - (xloc, warn_return_type >= 0 ? OPT_Wreturn_type : 0,
> + (xloc, OPT_Wreturn_mismatch,
>"% with a value, in function returning void");
>else
> warned_here = pedwarn
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 9b5ff457027..c0c155af7cd 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -7338,22 +7338,33 @@ This warning is enabled by @option{-Wall} for C and 
> C++.
>  Do not warn about returning a pointer (or in C++, a reference) to a
>  variable that goes out of scope after the function returns.
>
> +@opindex Wreturn-mismatch
> +@opindex Wno-return-mismatch
> +@item -Wreturn-mismatch
> +Warn about return statements without an expressions in functions which
> +do not return @code{void}.  Also warn about a @code{return} statement
> +with an expression in a function 

Re: Improve -Wflex-array-member-not-at-end changes.html wording |Plus: and warning bug? (was: [V2][PATCH] gcc-14/changes.html: Deprecate a GCC C extension on flexible array members.)

2023-10-19 Thread Kees Cook
On Thu, Oct 19, 2023 at 08:49:00PM +, Qing Zhao wrote:
> > On Sep 25, 2023, at 2:24 PM, Tobias Burnus  wrote:
> > Secondly, if this is deprecated, shouldn't then the warning enabled by, 
> > e.g., -Wall or made
> > otherwise more prominent? (-std=?) - Currently, one either has to find the 
> > new flag or use
> > -pedantic.
> 
> Yes, agreed, However, I think that it might be better to delay this to next 
> GCC release by giving users plenty time to fix all the 
> -Wflex-array-member-not-at-end warnings.  As I know, linux kernel exposed a 
> lot of warnings when adding -Wflex-array-member-not-at-end, and kernel people 
> are trying to fix all these warnings in the source base.  

Yeah, for the code bases that use flexible arrays (C99 or GNU
Extension), there are cases where they're used as intentional implicit
unions, and the refactoring to make them unambiguous is non-trivial. I
think it may need to be some time before -Wflex-array-member-not-at-end
ends up in -Wall.

I gave some examples of this code pattern (and potential solutions)
here:
https://lore.kernel.org/lkml/202310161249.43FB42A6@keescook

-- 
Kees Cook


Re: Enable top-level recursive 'autoreconf'

2023-10-19 Thread Eric Gallager
On Thu, Oct 19, 2023 at 6:43 AM Thomas Schwinge  wrote:
>
> Hi!
>
> On 2023-10-19T11:57:33+0200, Andreas Schwab  wrote:
> > On Okt 19 2023, Thomas Schwinge wrote:
> >> On 2023-10-18T15:42:18+0100, R jd <3246251196r...@gmail.com> wrote:
> >>> I guess I can ask, why there is not a recursive approach for configuring
> >>> GCC. e.g. AC_SUBDIRS in the top level?
> >>
> >> ('AC_CONFIG_SUBDIRS' you mean.)  You know, often it just takes someone to
> >> ask the right questions...  ;-)
> >>
> >> What do people think about the attached
> >> "Enable top-level recursive 'autoreconf'"?  Only lightly tested, so far.
> >
> > The top-level files are shared with binutils-gdb, which has a different
> > set of subdirs.
>
> Good point, thanks!  Fortunately, the failure mode for non-existing
> directories is non-fatal (skipped with 'subdirectory [...] not present'
> diagnostic); with my original "Enable top-level recursive 'autoreconf'"
> (also re-attached) applied to Binutils/GDB Git master branch, we get:
>
> $ PATH=[...] autoreconf -v
> autoreconf: Entering directory `.'
> autoreconf: configure.ac: not using Gettext
> autoreconf: running: aclocal
> autoreconf: configure.ac: tracing
> autoreconf: configure.ac: subdirectory c++tools not present
> autoreconf: configure.ac: subdirectory fixincludes not present
> autoreconf: configure.ac: subdirectory gcc not present
> autoreconf: configure.ac: subdirectory gcc/m2 not present
> autoreconf: configure.ac: subdirectory gnattools not present
> autoreconf: configure.ac: subdirectory gotools not present
> autoreconf: configure.ac: adding subdirectory intl to autoreconf
> autoreconf: Entering directory `intl'
> [...]
> autoreconf: Leaving directory `intl'
> autoreconf: configure.ac: subdirectory libada not present
> autoreconf: configure.ac: subdirectory libatomic not present
> autoreconf: configure.ac: adding subdirectory libbacktrace to autoreconf
> autoreconf: Entering directory `libbacktrace'
> [...]
>
> So we could (a) simply list *all* directories in the shared top-level
> 'configure.ac', or (b) configure GCC vs. other projrcts via a non-shared
> file ('m4_include([config/AC_CONFIG_SUBDIRS.m4])' or similar -- is there
> an established procedure for non-shared top-level files)?  (I don't have
> a strong preference either way.)
>

I'd just like to note that I have GCC bug 103459 open about silencing
warnings from autoreconf:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103459
Although I guess in this case, they're just notices, and not warnings,
so it's probably okay. (-Werror doesn't turn them into errors, does
it?)

> It's just GCC and Binutils/GDB, or are the top-level files also shared
> with additional projects?
>
>
> Grüße
>  Thomas
>
>
> -
> Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
> München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
> Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
> München, HRB 106955


Re: Enable top-level recursive 'autoreconf'

2023-10-19 Thread Eric Gallager
On Thu, Oct 19, 2023 at 6:43 AM Thomas Schwinge  wrote:
>
> Hi!
>
> On 2023-10-19T11:57:33+0200, Andreas Schwab  wrote:
> > On Okt 19 2023, Thomas Schwinge wrote:
> >> On 2023-10-18T15:42:18+0100, R jd <3246251196r...@gmail.com> wrote:
> >>> I guess I can ask, why there is not a recursive approach for configuring
> >>> GCC. e.g. AC_SUBDIRS in the top level?
> >>
> >> ('AC_CONFIG_SUBDIRS' you mean.)  You know, often it just takes someone to
> >> ask the right questions...  ;-)
> >>
> >> What do people think about the attached
> >> "Enable top-level recursive 'autoreconf'"?  Only lightly tested, so far.
> >
> > The top-level files are shared with binutils-gdb, which has a different
> > set of subdirs.
>
> Good point, thanks!  Fortunately, the failure mode for non-existing
> directories is non-fatal (skipped with 'subdirectory [...] not present'
> diagnostic); with my original "Enable top-level recursive 'autoreconf'"
> (also re-attached) applied to Binutils/GDB Git master branch, we get:
>
> $ PATH=[...] autoreconf -v
> autoreconf: Entering directory `.'
> autoreconf: configure.ac: not using Gettext
> autoreconf: running: aclocal
> autoreconf: configure.ac: tracing
> autoreconf: configure.ac: subdirectory c++tools not present
> autoreconf: configure.ac: subdirectory fixincludes not present
> autoreconf: configure.ac: subdirectory gcc not present
> autoreconf: configure.ac: subdirectory gcc/m2 not present
> autoreconf: configure.ac: subdirectory gnattools not present
> autoreconf: configure.ac: subdirectory gotools not present
> autoreconf: configure.ac: adding subdirectory intl to autoreconf
> autoreconf: Entering directory `intl'
> [...]
> autoreconf: Leaving directory `intl'
> autoreconf: configure.ac: subdirectory libada not present
> autoreconf: configure.ac: subdirectory libatomic not present
> autoreconf: configure.ac: adding subdirectory libbacktrace to autoreconf
> autoreconf: Entering directory `libbacktrace'
> [...]
>
> So we could (a) simply list *all* directories in the shared top-level
> 'configure.ac', or (b) configure GCC vs. other projrcts via a non-shared
> file ('m4_include([config/AC_CONFIG_SUBDIRS.m4])' or similar -- is there
> an established procedure for non-shared top-level files)?  (I don't have
> a strong preference either way.)
>

I'd just like to note that I have GCC bug 103459 open about silencing
warnings from autoreconf:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103459
Although I guess in this case, they're just notices, and not warnings,
so it's probably okay. (-Werror doesn't turn them into errors, does
it?)

> It's just GCC and Binutils/GDB, or are the top-level files also shared
> with additional projects?
>
>
> Grüße
>  Thomas
>
>
> -
> Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
> München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
> Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
> München, HRB 106955


Re: gcc 13.2 is missing warnings?

2023-10-19 Thread Eric Gallager
On Thu, Oct 19, 2023 at 7:52 AM Martin Uecker  wrote:
>
>
>
> Note that the C++ warning is for jumping over a declaration,
> which is generally allowed in C but not in C++.
>
> Martin

(Also note that in C, there's -Wjump-misses-init for this, which is
enabled by -Wc++-compat, which isn't enabled by anything else, and has
to be requested manually)

>
> Am Donnerstag, dem 19.10.2023 um 13:49 +0200 schrieb Martin Uecker:
> >
> >
> > GCC supports this as an extension.
> >
> > Mixing declarations and code is allowed in C99 and C23
> > will also allow placing labels before declarations and at
> > the end of a compound statement. GCC supports all this
> > also in earlier language modes.
> >
> > See:
> > https://gcc.gnu.org/onlinedocs/gcc/Mixed-Labels-and-Declarations.html
> >
> > You will get the warnings with -pedantic.
> >
> > Martin
> >
> > Am Donnerstag, dem 19.10.2023 um 07:39 -0400 schrieb Eric Sokolowsky via 
> > Gcc:
> > > I am using gcc 13.2 on Fedora 38. Consider the following program.
> > >
> > > #include 
> > > int main(int argc, char **argv)
> > > {
> > > printf("Enter a number: ");
> > > int num = 0;
> > > scanf("%d", );
> > >
> > > switch (num)
> > > {
> > > case 1:
> > > int a = num + 3;
> > > printf("The new number is %d.\n", a);
> > > break;
> > > case 2:
> > > int b = num - 4;
> > > printf("The new number is %d.\n", b);
> > > break;
> > > default:
> > > int c = num * 3;
> > > printf("The new number is %d.\n", c);
> > > break;
> > > }
> > > }
> > >
> > > I would expect that gcc would complain about the declaration of
> > > variables (a, b, and c) within the case statements. When I run "gcc
> > > -Wall t.c" I get no warnings. When I run "g++ -Wall t.c" I get
> > > warnings and errors as expected. I do get warnings when using MinGW on
> > > Windows (gcc version 6.3 specifically). Did something change in 13.2?
> > >
> > > Eric
> >
>


Re: [PATCH v3 1/2] c++: Initial support for P0847R7 (Deducing This) [PR102609]

2023-10-19 Thread waffl3x


> A BASELINK expresses the result of name lookup for a member function,
> since we need to pass information about the name lookup context along to
> after overload resolution.
> 
> An OFFSET_REF (with PTRMEM_OK_P) is used to express that we saw the
> ::f syntax, so we could build a pointer to member if it resolves to an
> implicit-object member function.
> 
> For an overload set containing only a single static member function,
> build_offset_ref doesn't bother to build an OFFSET_REF, but returns the
> BASELINK itself.
> 
> I think we need the OFFSET_REF for an explicit-object member function
> because it expresses that the code satisfies the requirement "If the
> operand names an explicit object member function, the operand shall be a
> qualified-id."
> 
> It might simplify things to remove the optimization in build_offset_ref
> so we get an OFFSET_REF even for a single static member function, and
> add support for that to cp_build_addr_expr_1.
> 
> Jason

Ah okay I think that sheds a little bit of light on things, and here I
was trying not to involve overloads to make it easier for me to
understand things, it seems it ended up making me miss some things.

At a glance it seems like all I need to do then is disable the
PTRMEM_OK_P flag then. I will try that and see how it goes, provided I
can find where it's all setup. (I think I'm almost to the bottom of it,
but it's tough to unravel so I'm not sure.)

I'm also now realizing it's probably about time I add more tests
involving overloads, because I avoided that early on and I don't think
I ever got around to adding any for that.

Alex



gcc-11-20231019 is now available

2023-10-19 Thread GCC Administrator via Gcc
Snapshot gcc-11-20231019 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/11-20231019/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 11 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch 
releases/gcc-11 revision 71f2b05f8f494b34d2db4b90683ef76ec338bb19

You'll find:

 gcc-11-20231019.tar.xz   Complete GCC

  SHA256=5139731854dda6797c76b9b1c1b7ccde080fae1e2d99ba962beb078638db2ab0
  SHA1=a8398f30ec58dca73ced1a985907835fcb8a

Diffs from 11-20231012 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-11
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


[Bug c++/54367] [meta-bug] lambda expressions

2023-10-19 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54367
Bug 54367 depends on bug 79021, which changed state.

Bug 79021 Summary: attribute noreturn on function template ignored in generic 
lambda
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79021

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

[Bug c++/79021] attribute noreturn on function template ignored in generic lambda

2023-10-19 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79021

Andrew Pinski  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED
   Target Milestone|--- |8.5

--- Comment #4 from Andrew Pinski  ---
Yes this was the same issue as PR 94742 . I will note in the original testcase,
the f call inside h was not a depedent call would have been resolved and if you
make it a depedent call, then you run into the same issue as the generic
lambda. Or if you change the call to f inside the generic lambda to being
non-depedent, you don't get the warning.

So this is all fixed with a testcase added already.

[Bug c++/79021] attribute noreturn on function template ignored in generic lambda

2023-10-19 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79021

Andrew Pinski  changed:

   What|Removed |Added

   Keywords||needs-bisection
  Known to work||10.1.0

--- Comment #3 from Andrew Pinski  ---
Looks to be fixed in GCC 10 though.

[Bug c++/108238] auto return type and some attributes don't get along

2023-10-19 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108238

--- Comment #4 from Andrew Pinski  ---
Here is one which is a little more complex for templated function too:
```
template
[[gnu::returns_nonnull]]
auto f() {
  return new T(42);
}

auto g(void)
{
   return f();
}
```

[Bug target/111528] aarch64: Test gfortran.dg/pr80494.f90 fails with -fstack-protector-strong with gcc-13 since r13-7813-gb96e66fd4ef3e3

2023-10-19 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111528

Richard Sandiford  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #7 from Richard Sandiford  ---
Fixed on GCC 12 and 13 branches.  I'm a bit nervous about backporting the fix
to GCC 11 given that the next release will be the last.

Re: [PATCH V5] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]

2023-10-19 Thread Richard Sandiford
"juzhe.zh...@rivai.ai"  writes:
> Hi, this patch fix V4 issue:
>
> Previously as Richard S commented:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633178.html 
>
> slp_op and mask_vectype are only initialised when mask_index >= 0.
> Shouldn't this code be under mask_index >= 0 too?
> Also, when do we encounter mismatched mask_vectypes?  Presumably the SLP
> node has a known vectype by this point.  I think a comment would be useful.
>
> Since I didn't encounter mismatched case in the regression of RISC-V and X86, 
> so 
> I fix it in V4 patch as follows:
> +  if (mask_index >= 0 && slp_node)
> + {
> +   bool match_p
> + = vect_maybe_update_slp_op_vectype (slp_op, mask_vectype);
> +   gcc_assert (match_p);
> + }
> Add assertion here.
>
> However, recently an ICE suddenly appear today in RISC-V regression:
>
> FAIL: gcc.dg/tree-ssa/pr44306.c (internal compiler error: in 
> vectorizable_load, at tree-vect-stmts.cc:9885)
> FAIL: gcc.dg/tree-ssa/pr44306.c (test for excess errors)
>
> This is because we are encountering that mask_vectype is boolean type and it 
> is external def.
> Then vect_maybe_update_slp_op_vectype will return false.
>
> Then I fix this piece of code in V5 here:
>
> +  if (mask_index >= 0 && slp_node
> +   && !vect_maybe_update_slp_op_vectype (slp_op, mask_vectype))
> + {
> +   /* We don't vectorize the boolean type external SLP mask.  */
> +   if (dump_enabled_p ())
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +  "incompatible vector types for invariants\n");
> +   return false;
> + }
>
> Bootstrap and Regression on x86 passed.

Why are external defs a problem though?  E.g. in pr44306, it looks like
we should be able to create an invariant mask that contains

   (!present[0]) || UseDefaultScalingMatrix8x8Flag[0]
   (!present[1]) || UseDefaultScalingMatrix8x8Flag[1]
   (!present[0]) || UseDefaultScalingMatrix8x8Flag[0]
   (!present[1]) || UseDefaultScalingMatrix8x8Flag[1]
   ...repeating...

The type of the mask can be driven by the code that needs it.

Thanks,
Richard

>
> Thanks.
>
>
> juzhe.zh...@rivai.ai
>  
> From: Juzhe-Zhong
> Date: 2023-10-18 20:36
> To: gcc-patches
> CC: richard.sandiford; rguenther; Juzhe-Zhong
> Subject: [PATCH V5] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
> This patch fixes this following FAILs in RISC-V regression:
>  
> FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump 
> vect "Loop contains only SLP stmts"
> FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP 
> stmts"
> FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  scan-tree-dump 
> vect "Loop contains only SLP stmts"
> FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP 
> stmts"
>  
> The root cause of these FAIL is that GCC SLP failed on MASK_LEN_GATHER_LOAD.
>  
> We have 2 following situations of scalar recognized MASK_LEN_GATHER_LOAD:
>  
> 1. conditional gather load: MASK_LEN_GATHER_LOAD (base, offset, scale, zero, 
> condtional mask).
>
>This situation we just need to leverage the current MASK_GATHER_LOAD which 
> can achieve SLP MASK_LEN_GATHER_LOAD.
>  
> 2. un-conditional gather load: MASK_LEN_GATHER_LOAD (base, offset, scale, 
> zero, -1)
>
>Current SLP check will failed on dummy mask -1, so we relax the check in 
> tree-vect-slp.cc and allow it to be materialized.
> 
> Consider this following case:
>  
> void __attribute__((noipa))
> f (int *restrict y, int *restrict x, int *restrict indices, int n)
> {
>   for (int i = 0; i < n; ++i)
> {
>   y[i * 2] = x[indices[i * 2]] + 1;
>   y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2;
> }
> }
>  
> https://godbolt.org/z/WG3M3n7Mo
>  
> GCC unable to SLP using VEC_LOAD_LANES/VEC_STORE_LANES:
>  
> f:
> ble a3,zero,.L5
> .L3:
> vsetvli a5,a3,e8,mf4,ta,ma
> vsetvli zero,a5,e32,m1,ta,ma
> vlseg2e32.v v6,(a2)
> vsetvli a4,zero,e64,m2,ta,ma
> vsext.vf2   v2,v6
> vsll.vi v2,v2,2
> vsetvli zero,a5,e32,m1,ta,ma
> vluxei64.v  v1,(a1),v2
> vsetvli a4,zero,e64,m2,ta,ma
> vsext.vf2   v2,v7
> vsetvli zero,zero,e32,m1,ta,ma
> vadd.vi v4,v1,1
> vsetvli zero,zero,e64,m2,ta,ma
> vsll.vi v2,v2,2
> vsetvli zero,a5,e32,m1,ta,ma
> vluxei64.v  v2,(a1),v2
> vsetvli a4,zero,e32,m1,ta,ma
> sllia6,a5,3
> vadd.vi v5,v2,2
> sub a3,a3,a5
> vsetvli zero,a5,e32,m1,ta,ma
> vsseg2e32.v v4,(a0)
> add a2,a2,a6
> add a0,a0,a6
> bne a3,zero,.L3
> .L5:
> ret
>  
> After this patch:
>  
> f:
> ble a3,zero,.L5
> li a5,1
> csrr t1,vlenb
> slli a5,a5,33
> srli a7,t1,2
> addi a5,a5,1
> slli a3,a3,1
> neg t3,a7
> vsetvli a4,zero,e64,m1,ta,ma
> vmv.v.x v4,a5
> .L3:
> minu a5,a3,a7
> vsetvli 

Re: [PATCH v3 1/2] c++: Initial support for P0847R7 (Deducing This) [PR102609]

2023-10-19 Thread Jason Merrill

On 10/19/23 17:05, waffl3x wrote:

Also, I'm not sure what %qs is, should I be using that instead of %s
for strings?


The q prefix means quoted, with ' or other quotation marks, depending on 
the locale.



On another topic, I have been trying to fix taking pointers to explicit
object member functions today, as I ended up breaking that when I
started setting static_function to false for them. Originally it just
worked so I haven't touched any code related to this, but now that they
aren't being treating like static member functions as much so a few
things just broke. What I'm asking myself now is whether it would be
appropriate to just opt-in to static member function behavior for this
one, and I'm not sure that would be correct.

So I started by checking what it did before I turned off the
static_function flag. It's was being passed into cp_build_addr_expr_1
as a baselink node, while regular member functions are passed in as an
offset_ref node. I then checked what the case is for static member
function, and unsurprisingly those are also handled wrapped in a
baselink node, but this raised some questions for me.

I am now trying to figure out what exactly a baselink is, and why it's
used for static member functions. My current best guess is that
method_type nodes already hold the information that a baselink does,
and that information is needed in general. If that is the case, it
might just be correct to just do the same thing for explicit object
member functions, but I wonder if there is more to it, but maybe there
isn't. It worked just fine before when the static_function was still
being set after all.

Any insight on this is appreciated.


A BASELINK expresses the result of name lookup for a member function, 
since we need to pass information about the name lookup context along to 
after overload resolution.


An OFFSET_REF (with PTRMEM_OK_P) is used to express that we saw the 
::f syntax, so we could build a pointer to member if it resolves to an 
implicit-object member function.


For an overload set containing only a single static member function, 
build_offset_ref doesn't bother to build an OFFSET_REF, but returns the 
BASELINK itself.


I think we need the OFFSET_REF for an explicit-object member function 
because it expresses that the code satisfies the requirement "If the 
operand names an explicit object member function, the operand shall be a 
qualified-id."


It might simplify things to remove the optimization in build_offset_ref 
so we get an OFFSET_REF even for a single static member function, and 
add support for that to cp_build_addr_expr_1.


Jason



Re: [PATCH] vect: Cost adjacent vector loads/stores together [PR111784]

2023-10-19 Thread Richard Sandiford
"Kewen.Lin"  writes:
> Hi,
>
> As comments[1][2], this patch is to change the costing way
> on some adjacent vector loads/stores from costing one by
> one to costing them together with the total number once.
>
> It helps to fix the exposed regression PR111784 on aarch64,
> as aarch64 specific costing could make different decisions
> according to the different costing ways (counting with total
> number vs. counting one by one).  Based on a reduced test
> case from PR111784, only considering vec_num can fix the
> regression already, but vector loads/stores in regard to
> ncopies are also adjacent accesses, so they are considered
> as well.
>
> btw, this patch leaves the costing on dr_explicit_realign
> and dr_explicit_realign_optimized alone to make it simple.
> The costing way change can cause the differences for them
> since there is one costing depending on targetm.vectorize.
> builtin_mask_for_load and it's costed according to the
> calling times.  IIUC, these two dr_alignment_support are
> mainly used for old Power? (only having 16 bytes aligned
> vector load/store but no unaligned vector load/store).
>
> Bootstrapped and regtested on x86_64-redhat-linux,
> aarch64-linux-gnu, powerpc64-linux-gnu P{7,8,9}
> and powerpc64le-linux-gnu P{8,9,10}.
>
> Is it ok for trunk?
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-September/630742.html
> [2] https://gcc.gnu.org/pipermail/gcc-patches/2023-September/630744.html
>
> BR,
> Kewen
> -
> gcc/ChangeLog:
>
>   * tree-vect-stmts.cc (vectorizable_store): Adjust costing way for
>   adjacent vector stores, by costing them with the total number
>   rather than costing them one by one.
>   (vectorizable_load): Adjust costing way for adjacent vector
>   loads, by costing them with the total number rather than costing
>   them one by one.

OK.  Thanks for doing this!  Like Richard says, the way that the aarch64
cost hooks rely on the count isn't super robust, but I think it's the
best we can easily do in the circumstances.  Hopefully costing will
become much easier once the non-SLP representation goes away.

Richard

> ---
>  gcc/tree-vect-stmts.cc | 137 -
>  1 file changed, 95 insertions(+), 42 deletions(-)
>
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index b3a56498595..af134ff2bf7 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -8681,6 +8681,9 @@ vectorizable_store (vec_info *vinfo,
>alias_off = build_int_cst (ref_type, 0);
>stmt_vec_info next_stmt_info = first_stmt_info;
>auto_vec vec_oprnds (ncopies);
> +  /* For costing some adjacent vector stores, we'd like to cost with
> +  the total number of them once instead of cost each one by one. */
> +  unsigned int n_adjacent_stores = 0;
>for (g = 0; g < group_size; g++)
>   {
> running_off = offvar;
> @@ -8738,10 +8741,7 @@ vectorizable_store (vec_info *vinfo,
>store to avoid ICE like 110776.  */
> if (VECTOR_TYPE_P (ltype)
> && known_ne (TYPE_VECTOR_SUBPARTS (ltype), 1U))
> - vect_get_store_cost (vinfo, stmt_info, 1,
> -  alignment_support_scheme,
> -  misalignment, _cost,
> -  cost_vec);
> + n_adjacent_stores++;
> else
>   inside_cost
> += record_stmt_cost (cost_vec, 1, scalar_store,
> @@ -8798,11 +8798,18 @@ vectorizable_store (vec_info *vinfo,
>   break;
>   }
>
> -  if (costing_p && dump_enabled_p ())
> - dump_printf_loc (MSG_NOTE, vect_location,
> -  "vect_model_store_cost: inside_cost = %d, "
> -  "prologue_cost = %d .\n",
> -  inside_cost, prologue_cost);
> +  if (costing_p)
> + {
> +   if (n_adjacent_stores > 0)
> + vect_get_store_cost (vinfo, stmt_info, n_adjacent_stores,
> +  alignment_support_scheme, misalignment,
> +  _cost, cost_vec);
> +   if (dump_enabled_p ())
> + dump_printf_loc (MSG_NOTE, vect_location,
> +  "vect_model_store_cost: inside_cost = %d, "
> +  "prologue_cost = %d .\n",
> +  inside_cost, prologue_cost);
> + }
>
>return true;
>  }
> @@ -8909,6 +8916,9 @@ vectorizable_store (vec_info *vinfo,
>  {
>gcc_assert (!slp && grouped_store);
>unsigned inside_cost = 0, prologue_cost = 0;
> +  /* For costing some adjacent vector stores, we'd like to cost with
> +  the total number of them once instead of cost each one by one. */
> +  unsigned int n_adjacent_stores = 0;
>for (j = 0; j < ncopies; j++)
>   {
> gimple *new_stmt;
> @@ -8974,10 

[Bug target/111876] bf16 complex mul/div does not work when the target has +fp16 support or when -fexcess-precision=16 is supplied

2023-10-19 Thread iains at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111876

--- Comment #5 from Iain Sandoe  ---
for the record an __fp16 implementation works as expected;

* when the target does not support +fp16, the code-gen promotes to float and
does the multiply with __mulsc3

* when the target supports +fp16, the code-gen uses __mulhc3 (which is
implemented in libgcc)

* adding "+bf16" does not affect this.

---

So the issue is that __bf16 works when there is no support for a 16b float,
(because, like fp16, it gets promoted) - but fails when the target declares
support for a different 16 float.

[Bug c++/108238] returns_nonnull attribute with auto return type fails to compile

2023-10-19 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108238

Andrew Pinski  changed:

   What|Removed |Added

   Keywords||diagnostic

--- Comment #3 from Andrew Pinski  ---
alloc_align, alloc_size, assume_aligned, malloc attributes have a similar
issue.
const and pure does too.

```
[[gnu::const]]
auto f(){}

[[gnu::const]]
void f1(){}
```

warn_unused_result has a similar issue too:
```
[[gnu::warn_unused_result]]
auto f(){}

[[gnu::warn_unused_result]]
void f1(){}
```

There might be others too.

Re: Arm assembler crc issue

2023-10-19 Thread Iain Sandoe
Hi Richard,

> On 19 Oct 2023, at 22:49, Richard Sandiford  wrote:
> Iain Sandoe  writes:

>> I am being bitten by a problem that falls out from the code that emits
>> 
>>  .arch Armv8.n-a+crc
>> 
>> when the arch is less than Armv8-r.
>> The code that does this,  in gcc/common/config/aarch64 is quite recent 
>> (2022-09).
> 
> Heh.  A workaround for one assembler bug triggers another assembler bug.

Indeed …  the good news is that the LLVM bug seems fixed on current release 
(17) and main.
The bad news is that it will likely take some to percolate through (and it 
won’t help released
toolsets anyway).



>> So, it seems that this ought to be a reasonable configure test:
>> 
>>  .arch armv8.2-a
>>  .text
>> m:
>>  crc32b w0, w1, w2 
>> 
>> and then emit HAS_GAS_AARCH64_CRC_BUG (for example) if that fails to 
>> assemble which can be used to make the +crc emit conditional on a broken 
>> assembler.
> 
> AIUI the problem was in the CPU descriptions, so I don't think this
> would test for the old gas bug that is being worked around.

I see,

> Perhaps instead we could have a configure test for the bug that you've
> found, and disable the crc workaround if so?

OK - I’ll work in that direction, thanks
Iain

> 
> Thanks,
> Richard
> 
>> 
>> - I am asking here before constructing the patch, in case there’s some 
>> reason that doing this at configure time is not acceptable.
>> 
>> thanks
>> Iain



Re: [PATCH v3 1/2] c++: Initial support for P0847R7 (Deducing This) [PR102609]

2023-10-19 Thread Jakub Jelinek
On Thu, Oct 19, 2023 at 09:31:06PM +, waffl3x wrote:
> Ah alright, I see what you're saying, I see what the difference is now.
> It's a shame we can't have the translated string insert a %s and format
> into that :^). Ah well, I guess this code is just doomed to look poor
> then, what can you do.

Consider e.g. de.po:
msgid "Generate code to check exception specifications."
msgstr "Code zur Überprüfung von Exception-Spezifikationen erzeugen."
If you try to construct the above from 2 parts as
"%s to check exception specifications.", _("Generate code")
then the german translator will need to give up, because the verb
needs to go last and noun first, so translating "Generate code"
to "Code erzeugen" and the rest can't be done, you want to put one word
in one place and another at another.  A lot of languages
have quite strict rules on order of words in sentence, unlike say
Latin with its comparatively free word order.
You could be lucky, but without knowing at least all the currently
supported languages it can be hard to prove it is ok.
Furthermore, it could prevent some other language translations from being
added.

Jakub



Re: Arm assembler crc issue

2023-10-19 Thread Richard Sandiford via Gcc
Iain Sandoe  writes:
> Hi Richard,
>
>
> I am being bitten by a problem that falls out from the code that emits
>
>   .arch Armv8.n-a+crc
>
> when the arch is less than Armv8-r.
> The code that does this,  in gcc/common/config/aarch64 is quite recent 
> (2022-09).

Heh.  A workaround for one assembler bug triggers another assembler bug.

The special treatment of CRC is much older than 2022-09 though.  I think
it dates back to 04a99ebecee885e42e56b6e0c832570e2a91c196 (2016-04),
with 4ca82fc9f86fc1187ee112e3a637cb3ca5d2ef2a providing the more
complete explanation.

>
> --
>
> (I admit the permutations are complex and I might have miss-analyzed) - but 
> it appears that llvm assembler (for mach-o, at least) sees an explict mention 
> of an attribute for a feature which is mandatory at a specified arch level as 
> demoting that arch to the minimum that made the explicit feature mandatory.  
> Of course, it could just be a bug in the handling of transitive feature 
> enables...
>
> the problem is that, for example:
>
>   .arch Armv8.4-a+crc
>
> no longer recognises fp16 insns. (and appending +fp16 does not fix this).
>
> 
>
> Even if upstream LLVM is deemed to be buggy (it does not do what I would 
> expect, at least), and fixed - I will still have a bunch of assembler 
> versions that are broken (before the fix percolates through to downstream 
> xcode) - and the LLVM assembler is the only current option for Darwin.
>
> So, it seems that this ought to be a reasonable configure test:
>
>   .arch armv8.2-a
>   .text
> m:
>   crc32b w0, w1, w2 
>
> and then emit HAS_GAS_AARCH64_CRC_BUG (for example) if that fails to assemble 
> which can be used to make the +crc emit conditional on a broken assembler.

AIUI the problem was in the CPU descriptions, so I don't think this
would test for the old gas bug that is being worked around.

Perhaps instead we could have a configure test for the bug that you've
found, and disable the crc workaround if so?

Thanks,
Richard

>
> - I am asking here before constructing the patch, in case there’s some reason 
> that doing this at configure time is not acceptable.
>
> thanks
> Iain


Re: [PATCH 2/2] c++: remove NON_DEPENDENT_EXPR, part 2

2023-10-19 Thread Jason Merrill

On 9/25/23 16:43, Patrick Palka wrote:

This much more mechanical patch removes build_non_dependent_expr
(and make_args_non_dependent) and adjusts callers accordingly,
no functional change.


These two patches are OK either separately or squashed, whichever you 
prefer.



gcc/cp/ChangeLog:

* call.cc (build_new_method_call): Remove calls to
build_non_dependent_expr and/or make_args_non_dependent.
* coroutines.cc (finish_co_return_stmt): Likewise.
* cp-tree.h (build_non_dependent_expr): Remove.
(make_args_non_dependent): Remove.
* decl2.cc (grok_array_decl): Remove calls to
build_non_dependent_expr and/or make_args_non_dependent.
(build_offset_ref_call_from_tree): Likewise.
* init.cc (build_new): Likewise.
* pt.cc (make_args_non_dependent): Remove.
(test_build_non_dependent_expr): Remove.
(cp_pt_cc_tests): Adjust.
* semantics.cc (finish_expr_stmt): Remove calls to
build_non_dependent_expr and/or make_args_non_dependent.
(finish_for_expr): Likewise.
(finish_call_expr): Likewise.
(finish_omp_atomic): Likewise.
* typeck.cc (finish_class_member_access_expr): Likewise.
(build_x_indirect_ref): Likewise.
(build_x_binary_op): Likewise.
(build_x_array_ref): Likewise.
(build_x_vec_perm_expr): Likewise.
(build_x_shufflevector): Likewise.
(build_x_unary_op): Likewise.
(cp_build_addressof): Likewise.
(build_x_conditional_expr):
(build_x_compound_expr): Likewise.
(build_static_cast): Likewise.
(build_x_modify_expr): Likewise.
(check_return_expr): Likewise.
* typeck2.cc (build_x_arrow): Likewise.
---
  gcc/cp/call.cc   |  7 +--
  gcc/cp/coroutines.cc |  3 ---
  gcc/cp/cp-tree.h |  2 --
  gcc/cp/decl2.cc  | 17 +++-
  gcc/cp/init.cc   |  5 -
  gcc/cp/pt.cc | 46 
  gcc/cp/semantics.cc  | 25 ++--
  gcc/cp/typeck.cc | 31 -
  gcc/cp/typeck2.cc|  1 -
  9 files changed, 6 insertions(+), 131 deletions(-)

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index e8dafbd8ba6..15079ddf6dc 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -11430,12 +11430,7 @@ build_new_method_call (tree instance, tree fns, vec **args,
  }
  
if (processing_template_decl)

-{
-  orig_args = args == NULL ? NULL : make_tree_vector_copy (*args);
-  instance = build_non_dependent_expr (instance);
-  if (args != NULL)
-   make_args_non_dependent (*args);
-}
+orig_args = args == NULL ? NULL : make_tree_vector_copy (*args);
  
/* Process the argument list.  */

if (args != NULL && *args != NULL)
diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index df3cc820797..a5464becf7f 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -1351,9 +1351,6 @@ finish_co_return_stmt (location_t kw, tree expr)
 to undo it so we can try to treat it as an rvalue below.  */
expr = maybe_undo_parenthesized_ref (expr);
  
-  if (processing_template_decl)

-   expr = build_non_dependent_expr (expr);
-
if (error_operand_p (expr))
return error_mark_node;
  }
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 66b9a9c4b9a..8b9a7d58462 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7488,8 +7488,6 @@ extern bool any_value_dependent_elements_p  
(const_tree);
  extern bool dependent_omp_for_p   (tree, tree, tree, 
tree);
  extern tree resolve_typename_type (tree, bool);
  extern tree template_for_substitution (tree);
-inline tree build_non_dependent_expr   (tree t) { return t; } // XXX 
remove
-extern void make_args_non_dependent(vec *);
  extern bool reregister_specialization (tree, tree, tree);
  extern tree instantiate_non_dependent_expr(tree, tsubst_flags_t = 
tf_error);
  extern tree instantiate_non_dependent_expr_internal (tree, tsubst_flags_t);
diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
index 344e19ec98b..0aa1e355972 100644
--- a/gcc/cp/decl2.cc
+++ b/gcc/cp/decl2.cc
@@ -427,14 +427,8 @@ grok_array_decl (location_t loc, tree array_expr, tree 
index_exp,
  return build_min_nt_loc (loc, ARRAY_REF, array_expr, index_exp,
   NULL_TREE, NULL_TREE);
}
-  array_expr = build_non_dependent_expr (array_expr);
-  if (index_exp)
-   index_exp = build_non_dependent_expr (index_exp);
-  else
-   {
- orig_index_exp_list = make_tree_vector_copy (*index_exp_list);
- make_args_non_dependent (*index_exp_list);
-   }
+  if (!index_exp)
+   orig_index_exp_list = make_tree_vector_copy (*index_exp_list);
  }
  
type = TREE_TYPE (array_expr);

@@ -5435,18 +5429,13 @@ build_offset_ref_call_from_tree (tree fn, vec **args,
   

Re: [PATCH 2/1] c++: rename tsubst_copy_and_build and tsubst_expr

2023-10-19 Thread Jason Merrill

On 10/4/23 15:23, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?


OK.


-- >8 --

After the previous patch, we currently have two tsubst entry points
for expression trees: tsubst_copy_and_build and tsubst_expr.  But the
latter is just a superset of the former that also handles statement
trees.  We could merge the two entry points so that we just have
tsubst_expr, but it seems natural to distinguish statement trees from
expression trees and to maintain a separate entry point for them.

To that end, this this patch renames tsubst_copy_and_build to
tsubst_expr, and renames the current tsubst_expr to tsubst_stmt, which
continues to be a superset of the former (since sometimes expression
trees appear in statement contexts, e.g. a branch of an IF_STMT could be
NOP_EXPR).  Making tsubst_stmt disjoint from tsubst_expr is left as
future work (if deemed desirable).

This patch in turn renames suitable existing uses of tsubst_expr (that
expect to take statement trees) to use tsubst_stmt.  Thus untouched
tsubst_expr calls are implicitly strengthened to expect only expression
trees after this patch.  And since I'm not familiar with the
tsubst_omp_* routines this patch renames all tsubst_expr uses there to
tsubst_stmt to ensure no unintended functional change in that area.
This patch also moves the handling of CO_YIELD_EXPR and CO_AWAIT_EXPR
from tsubst_stmt to tsubst_expr since they're expression trees.

gcc/cp/ChangeLog:

* cp-lang.cc (objcp_tsubst_copy_and_build): Rename to ...
(objcp_tsubst_expr): ... this.
* cp-objcp-common.h (objcp_tsubst_copy_and_build): Rename to ...
(objcp_tsubst_expr): ... this.
* cp-tree.h (tsubst_copy_and_build): Remove declaration.
* init.cc (maybe_instantiate_nsdmi_init): Use tsubst_expr
instead of tsubst_copy_and_build.
* pt.cc (expand_integer_pack): Likewise.
(instantiate_non_dependent_expr_internal): Likewise.
(instantiate_class_template): Use tsubst_stmt instead of
tsubst_expr for STATIC_ASSERT.
(tsubst_function_decl): Adjust tsubst_copy_and_build uses.
(tsubst_arg_types): Likewise.
(tsubst_exception_specification): Likewise.
(tsubst_tree_list): Likewise.
(tsubst): Likewise.
(tsubst_name): Likewise.
(tsubst_omp_clause_decl): Use tsubst_stmt instead of tsubst_expr.
(tsubst_omp_clauses): Likewise.
(tsubst_copy_asm_operands): Adjust tsubst_copy_and_build use.
(tsubst_omp_for_iterator): Use tsubst_stmt instead of tsubst_expr.
(tsubst_expr): Rename to ...
(tsubst_stmt): ... this.
: Move to tsubst_expr.
(tsubst_omp_udr): Use tsubst_stmt instead of tsubst_expr.
(tsubst_non_call_postfix_expression): Adjust tsubst_copy_and_build
use.
(tsubst_lambda_expr): Likewise.  Use tsubst_stmt instead of
tsubst_expr for the body of a lambda.
(tsubst_copy_and_build_call_args): Rename to ...
(tsubst_call_args): ... this.  Adjust tsubst_copy_and_build use.
(tsubst_copy_and_build): Rename to tsubst_expr.  Adjust
tsubst_copy_and_build and tsubst_copy_and_build_call_args use.
: Use tsubst_stmt instead of tsubst_expr.
(maybe_instantiate_noexcept): Adjust tsubst_copy_and_build use.
(instantiate_body): Use tsubst_stmt instead of tsubst_expr for
substituting the function body.
(tsubst_initializer_list): Adjust tsubst_copy_and_build use.

gcc/objcp/ChangeLog:

* objcp-lang.cc (objcp_tsubst_copy_and_build): Rename to ...
(objcp_tsubst_expr): ... this.  Adjust tsubst_copy_and_build
uses.
---
  gcc/cp/cp-lang.cc|   6 +-
  gcc/cp/cp-objcp-common.h |   2 +-
  gcc/cp/cp-tree.h |   1 -
  gcc/cp/init.cc   |   3 +-
  gcc/cp/pt.cc | 177 +--
  gcc/objcp/objcp-lang.cc  |   5 +-
  6 files changed, 85 insertions(+), 109 deletions(-)

diff --git a/gcc/cp/cp-lang.cc b/gcc/cp/cp-lang.cc
index 2f541460c4a..f2ed83de4fb 100644
--- a/gcc/cp/cp-lang.cc
+++ b/gcc/cp/cp-lang.cc
@@ -113,10 +113,8 @@ struct lang_hooks lang_hooks = LANG_HOOKS_INITIALIZER;
  /* The following function does something real, but only in Objective-C++.  */
  
  tree

-objcp_tsubst_copy_and_build (tree /*t*/,
-tree /*args*/,
-tsubst_flags_t /*complain*/,
-tree /*in_decl*/)
+objcp_tsubst_expr (tree /*t*/, tree /*args*/, tsubst_flags_t /*complain*/,
+  tree /*in_decl*/)
  {
return NULL_TREE;
  }
diff --git a/gcc/cp/cp-objcp-common.h b/gcc/cp/cp-objcp-common.h
index 80893aa1752..1408301a300 100644
--- a/gcc/cp/cp-objcp-common.h
+++ b/gcc/cp/cp-objcp-common.h
@@ -24,7 +24,7 @@ along with GCC; see the file COPYING3.  If not see
  /* In cp/objcp-common.c, cp/cp-lang.cc and objcp/objcp-lang.cc.  */
  
  extern tree 

Re: [PATCH] c++: merge tsubst_copy into tsubst_copy_and_build

2023-10-19 Thread Jason Merrill

On 10/4/23 12:08, Patrick Palka wrote:

On Tue, 3 Oct 2023, Jason Merrill wrote:


On 10/3/23 08:41, Patrick Palka wrote:

On Mon, 2 Oct 2023, Patrick Palka wrote:


Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk?

-- >8 --

The relationship between tsubst_copy_and_build and tsubst_copy (two of
the main template argument substitution routines for expression trees)
is rather hazy.  The former is mostly a superset of the latter, with
some differences.

The main difference is that they handle many tree codes differently, but
much of the tree code handling in tsubst_copy appears to be dead code[1].
This is because tsubst_copy only gets directly called in a few places
and mostly on id-expressions.  The interesting exceptions are PARM_DECL,
VAR_DECL, BIT_NOT_EXPR, SCOPE_REF, TEMPLATE_ID_EXPR and IDENTIFIER_NODE:

   * for PARM_DECL and VAR_DECL, tsubst_copy_and_build calls tsubst_copy
 followed by doing some extra handling of its own
   * for BIT_NOT_EXPR tsubst_copy implicitly handles unresolved destructor
 calls (i.e. the first operand is an identifier or a type)
   * for SCOPE_REF, TEMPLATE_ID_EXPR and IDENTIFIER_NODE tsubst_copy
 refrains from doing name lookup of the terminal name

Other more minor differences are that tsubst_copy exits early when
'args' is null, and it calls maybe_dependent_member_ref


That's curious, since what that function does seems like name lookup; I
wouldn't think we would want to call it when tf_no_name_lookup.


Ah, that makes sense I think.




and finally it dispatches to tsubst for type trees.


And it looks like you fix the callers to avoid that?


Yes, I'll make a note of that in the commit message.




Thus tsubst_copy is (at this point) similar enough to
tsubst_copy_and_build
that it makes sense to merge the two functions, with the main difference
being the name lookup behavior[2].  So this patch merges tsubst_copy into
tsubst_copy_and_build via a new tsubst tf_no_name_lookup which controls
name lookup and resolution of a (top-level) id-expression.

[1]: http://thrifty.mooo.com:8008/gcc-lcov/gcc/cp/pt.cc.gcov.html#17231
[2]: I don't know the history of tsubst_copy but I would guess it was
added before we settled on using processing_template_decl to control
whether our AST building routines perform semantic checking and return
non-templated trees, and so we needed a separate tsubst routine that
avoids semantic checking and always returns a templated tree for e.g.
partial substitution.


Oops, this is wrong -- tsubst_copy_and_build came after tsubst_copy,
and was introduced as an optimization with the intent of getting rid
of tsubst_copy eventually:
https://gcc.gnu.org/pipermail/gcc-patches/2003-January/093659.html


I wonder if we want to add a small tsubst_name wrapper to call
tsubst_copy_and_build with tf_no_name_lookup?


Good idea, that'll complement tsubst_scope nicely.



Can we also merge in tsubst_expr and use that name instead of the unwieldy
tsubst_copy_and_build?


That'd be nice.  Another idea would be to rename tsubst_expr to
tsubst_stmt and make it disjoint from tsubst_copy_and_build, and then
rename tsubst_copy_and_build to tsubst_expr, to draw a distinction
between statement-like trees (the substitution of which typically has
side effects like calling add_stmt) and expression-like trees (which
don't usually have such side effects).  I can work on that as a
follow-up patch.

Here's v2 which guards the call to maybe_dependent_member_ref and adds
tsubst_name, bootstrapped and regtested on x86_64-pc-linux-gnu:


OK.



Re: [PATCH] c: Add -Wreturn-mismatch warning, split from -Wreturn-type

2023-10-19 Thread Joseph Myers
On Thu, 19 Oct 2023, Florian Weimer wrote:

> diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
> index 6e044b4afbc..1da622160a3 100644
> --- a/gcc/c/c-typeck.cc
> +++ b/gcc/c/c-typeck.cc
> @@ -11281,17 +11281,10 @@ c_finish_return (location_t loc, tree retval, tree 
> origtype)
>if ((warn_return_type >= 0 || flag_isoc99)
> && valtype != NULL_TREE && TREE_CODE (valtype) != VOID_TYPE)
>   {
> -   bool warned_here;
> -   if (flag_isoc99)
> - warned_here = pedwarn
> -   (loc, warn_return_type >= 0 ? OPT_Wreturn_type : 0,
> -"% with no value, in function returning non-void");
> -   else
> - warned_here = warning_at
> -   (loc, OPT_Wreturn_type,
> -"% with no value, in function returning non-void");
> no_warning = true;
> -   if (warned_here)
> +   if (pedwarn (loc, OPT_Wreturn_mismatch,
> +"% with no value,"
> +" in function returning non-void"))

This should not be a pedwarn before C99 (should not be an error with 
-std=c90 -pedantic-errors -Wreturn-type), such a return is valid before 
C99.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH v3 1/2] c++: Initial support for P0847R7 (Deducing This) [PR102609]

2023-10-19 Thread waffl3x
> No, that wouldn't be appropriate for translation.
> None of non-member, static member and explicit object member are
> something that should be printed verbatim untranslated.
> "%s function %qD cannot have cv-qualifier", _("non-member")
> etc. is still inappropriate, some language might need to reorder
> the words in one case and not another one etc.
>
> What is ok if that %s (but in that case better %qs) is say some
> language keyword which shouldn't be translated.
>
> "%qs keyword not expected"
> with
> the arg being "inline", "constexpr", "consteval" for example would
> be fine.
>
> Jakub

Ah alright, I see what you're saying, I see what the difference is now.
It's a shame we can't have the translated string insert a %s and format
into that :^). Ah well, I guess this code is just doomed to look poor
then, what can you do.

This is pretty much why I've been afraid to touch anything with these
macros, if I don't understand exactly how it works anything slightly
weird solution I use might just not work.

Anyway, I think that clears up everything regarding that now, thanks.

Alex



Re: [PATCH]AArch64 Add SVE implementation for cond_copysign.

2023-10-19 Thread Richard Sandiford
Tamar Christina  writes:
> Hi All,
>
> This adds an implementation for masked copysign along with an optimized
> pattern for masked copysign (x, -1).
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   PR tree-optimization/109154
>   * config/aarch64/aarch64-sve.md (cond_copysign): New.
>
> gcc/testsuite/ChangeLog:
>
>   PR tree-optimization/109154
>   * gcc.target/aarch64/sve/fneg-abs_5.c: New test.
>
> --- inline copy of patch -- 
> diff --git a/gcc/config/aarch64/aarch64-sve.md 
> b/gcc/config/aarch64/aarch64-sve.md
> index 
> 071400c820a5b106ddf9dc9faebb117975d74ea0..00ca30c24624dc661254568f45b61a14aa11c305
>  100644
> --- a/gcc/config/aarch64/aarch64-sve.md
> +++ b/gcc/config/aarch64/aarch64-sve.md
> @@ -6429,6 +6429,57 @@ (define_expand "copysign3"
>}
>  )
>  
> +(define_expand "cond_copysign"
> +  [(match_operand:SVE_FULL_F 0 "register_operand")
> +   (match_operand: 1 "register_operand")
> +   (match_operand:SVE_FULL_F 2 "register_operand")
> +   (match_operand:SVE_FULL_F 3 "nonmemory_operand")
> +   (match_operand:SVE_FULL_F 4 "aarch64_simd_reg_or_zero")]
> +  "TARGET_SVE"
> +  {
> +rtx sign = gen_reg_rtx (mode);
> +rtx mant = gen_reg_rtx (mode);
> +rtx int_res = gen_reg_rtx (mode);
> +int bits = GET_MODE_UNIT_BITSIZE (mode) - 1;
> +
> +rtx arg2 = lowpart_subreg (mode, operands[2], mode);
> +rtx arg3 = lowpart_subreg (mode, operands[3], mode);
> +rtx arg4 = lowpart_subreg (mode, operands[4], mode);
> +
> +rtx v_sign_bitmask
> +  = aarch64_simd_gen_const_vector_dup (mode,
> +HOST_WIDE_INT_M1U << bits);
> +
> +/* copysign (x, -1) should instead be expanded as orr with the sign
> +   bit.  */
> +if (!REG_P (operands[3]))
> +  {
> + auto r0
> +   = CONST_DOUBLE_REAL_VALUE (unwrap_const_vec_duplicate (operands[3]));
> + if (-1 == real_to_integer (r0))

OK with the same change and under the same conditions as the FP/SIMD patch.

Thanks,
Richard

> +   {
> + arg3 = force_reg (mode, v_sign_bitmask);
> + emit_insn (gen_cond_ior (int_res, operands[1], arg2,
> +   arg3, arg4));
> + emit_move_insn (operands[0], gen_lowpart (mode, int_res));
> + DONE;
> +   }
> +  }
> +
> +operands[2] = force_reg (mode, operands[3]);
> +emit_insn (gen_and3 (sign, arg3, v_sign_bitmask));
> +emit_insn (gen_and3
> +(mant, arg2,
> + aarch64_simd_gen_const_vector_dup (mode,
> +~(HOST_WIDE_INT_M1U
> +  << bits;
> +emit_insn (gen_cond_ior (int_res, operands[1], sign, mant,
> +   arg4));
> +emit_move_insn (operands[0], gen_lowpart (mode, int_res));
> +DONE;
> +  }
> +)
> +
>  (define_expand "xorsign3"
>[(match_operand:SVE_FULL_F 0 "register_operand")
> (match_operand:SVE_FULL_F 1 "register_operand")
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_5.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_5.c
> new file mode 100644
> index 
> ..f4ecbeecbe1290134e688f46a4389d17155e4a0a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/fneg-abs_5.c
> @@ -0,0 +1,36 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3" } */
> +/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
> +
> +#include 
> +#include 
> +
> +/*
> +** f1:
> +**   ...
> +**   orr z[0-9]+.s, p[0-9]+/m, z[0-9]+.s, z[0-9]+.s
> +**   ...
> +*/
> +void f1 (float32_t *a, int n)
> +{
> +  for (int i = 0; i < (n & -8); i++)
> +   if (a[i] > n)
> + a[i] = -fabsf (a[i]);
> +   else
> + a[i] = n;
> +}
> +
> +/*
> +** f2:
> +**   ...
> +**   orr z[0-9]+.d, p[0-9]+/m, z[0-9]+.d, z[0-9]+.d
> +**   ...
> +*/
> +void f2 (float64_t *a, int n)
> +{
> +  for (int i = 0; i < (n & -8); i++)
> +   if (a[i] > n)
> + a[i] = -fabs (a[i]);
> +   else
> + a[i] = n;
> +}


Re: [PATCH]AArch64 Handle copysign (x, -1) expansion efficiently

2023-10-19 Thread Richard Sandiford
Tamar Christina  writes:
> Hi All,
>
> copysign (x, -1) is effectively fneg (abs (x)) which on AArch64 can be
> most efficiently done by doing an OR of the signbit.
>
> The middle-end will optimize fneg (abs (x)) now to copysign as the
> canonical form and so this optimizes the expansion.
>
> If the target has an inclusive-OR that takes an immediate, then the 
> transformed
> instruction is both shorter and faster.  For those that don't, the immediate
> has to be separately constructed, but this still ends up being faster as the
> immediate construction is not on the critical path.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Note that this is part of another patch series, the additional testcases
> are mutually dependent on the match.pd patch.  As such the tests are added
> there insteadof here.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   PR tree-optimization/109154
>   * config/aarch64/aarch64.md (copysign3): Handle
>   copysign (x, -1).
>   * config/aarch64/aarch64-simd.md (copysign3): Likewise.
>   * config/aarch64/aarch64-sve.md (copysign3): Likewise.
>
> --- inline copy of patch -- 
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 
> 25a1e4e8ecf767636c0ff3cdab6cad6e1482f73e..a78e77dcc3473445108b06c50f9c28a8369f3e3f
>  100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -754,15 +754,33 @@ (define_insn 
> "aarch64_dot_lane<
>  (define_expand "copysign3"
>[(match_operand:VHSDF 0 "register_operand")
> (match_operand:VHSDF 1 "register_operand")
> -   (match_operand:VHSDF 2 "register_operand")]
> +   (match_operand:VHSDF 2 "nonmemory_operand")]
>"TARGET_SIMD"
>  {
> -  rtx v_bitmask = gen_reg_rtx (mode);
> +  machine_mode int_mode = mode;
> +  rtx v_bitmask = gen_reg_rtx (int_mode);
>int bits = GET_MODE_UNIT_BITSIZE (mode) - 1;
>  
>emit_move_insn (v_bitmask,
> aarch64_simd_gen_const_vector_dup (mode,
>HOST_WIDE_INT_M1U << 
> bits));
> +
> +  /* copysign (x, -1) should instead be expanded as orr with the sign
> + bit.  */
> +  if (!REG_P (operands[2]))
> +{
> +  auto r0
> + = CONST_DOUBLE_REAL_VALUE (unwrap_const_vec_duplicate (operands[2]));
> +  if (-1 == real_to_integer (r0))

Just restating Andrew's review really, but I think this is safer and
more general as:

  rtx op2_elt = unwrap_const_vec_duplicate (operands[2]);
  if (GET_CODE (op2_elt) == CONST_DOUBLE
  && real_isneg (CONST_DOUBLE_REAL_VALUE (op2_elt)))

For vectors, the condition (in principle) is instead whether each
element is individually negative, but we can leave that for later.

OK with that change if the patch to reverse the current
(copysign x -1) -> (fneg (fabs x)) canonicalisation is accepted.
(As said, I'm nervous that it'll regress powerpc.)

Thanks,
Richard

> + {
> +   emit_insn (gen_ior3 (
> + lowpart_subreg (int_mode, operands[0], mode),
> + lowpart_subreg (int_mode, operands[1], mode), v_bitmask));
> +   DONE;
> + }
> +}
> +
> +  operands[2] = force_reg (mode, operands[2]);
>emit_insn (gen_aarch64_simd_bsl (operands[0], v_bitmask,
>operands[2], operands[1]));
>DONE;
> diff --git a/gcc/config/aarch64/aarch64-sve.md 
> b/gcc/config/aarch64/aarch64-sve.md
> index 
> 5a652d8536a0ef9461f40da7b22834e683e73ceb..071400c820a5b106ddf9dc9faebb117975d74ea0
>  100644
> --- a/gcc/config/aarch64/aarch64-sve.md
> +++ b/gcc/config/aarch64/aarch64-sve.md
> @@ -6387,7 +6387,7 @@ (define_insn "*3"
>  (define_expand "copysign3"
>[(match_operand:SVE_FULL_F 0 "register_operand")
> (match_operand:SVE_FULL_F 1 "register_operand")
> -   (match_operand:SVE_FULL_F 2 "register_operand")]
> +   (match_operand:SVE_FULL_F 2 "nonmemory_operand")]
>"TARGET_SVE"
>{
>  rtx sign = gen_reg_rtx (mode);
> @@ -6398,11 +6398,26 @@ (define_expand "copysign3"
>  rtx arg1 = lowpart_subreg (mode, operands[1], mode);
>  rtx arg2 = lowpart_subreg (mode, operands[2], mode);
>  
> -emit_insn (gen_and3
> -(sign, arg2,
> - aarch64_simd_gen_const_vector_dup (mode,
> -HOST_WIDE_INT_M1U
> -<< bits)));
> +rtx v_sign_bitmask
> +  = aarch64_simd_gen_const_vector_dup (mode,
> +HOST_WIDE_INT_M1U << bits);
> +
> +/* copysign (x, -1) should instead be expanded as orr with the sign
> +   bit.  */
> +if (!REG_P (operands[2]))
> +  {
> + auto r0
> +   = CONST_DOUBLE_REAL_VALUE (unwrap_const_vec_duplicate (operands[2]));
> + if (-1 == real_to_integer (r0))
> +   {
> + emit_insn (gen_ior3 (int_res, arg1, v_sign_bitmask));
> + emit_move_insn (operands[0], gen_lowpart (mode, int_res));
> + 

Re: [PATCH v3 1/2] c++: Initial support for P0847R7 (Deducing This) [PR102609]

2023-10-19 Thread Jakub Jelinek
On Thu, Oct 19, 2023 at 09:05:38PM +, waffl3x wrote:
> Okay so taking what you guys are saying here it sounds like it would be
> appropriate to refactor the code I was reluctant to refactor. The code
> (in grokfndecl) conditionally selects one of the two (now three with my
> changes) following strings despite them being mostly identical.
> 
> "non-member function %qD cannot have cv-qualifier"
> "static member function %qD cannot have cv-qualifier"
> "explicit object member function %qD cannot have cv-qualifier"
> 
> >From what I'm getting from what you two have said is that it would be
> reasonable to instead construct a string from it's two parts, just
> selecting the differing part dynamically.
> 
> "%s function %qD cannot have cv-qualifier"

No, that wouldn't be appropriate for translation.
None of non-member, static member and explicit object member are
something that should be printed verbatim untranslated.
"%s function %qD cannot have cv-qualifier", _("non-member")
etc. is still inappropriate, some language might need to reorder
the words in one case and not another one etc.

What is ok if that %s (but in that case better %qs) is say some
language keyword which shouldn't be translated.

"%qs keyword not expected"
with
the arg being "inline", "constexpr", "consteval" for example would
be fine.

Jakub



[PATCH] c: Add -Wreturn-mismatch warning, split from -Wreturn-type

2023-10-19 Thread Florian Weimer
The existing -Wreturn-type option covers both constraint violations
(which are mandatory to diagnose) and warnings that have known
false positives.  The new -Wreturn-mismatch warning is only about
the constraint violations (missing or extra return expressions),
and should eventually be turned into a permerror.

The -std=gnu89 test cases show that by default, we do not warn for
return; in a function not returning void.  This matches previous
practice for -Wreturn-type.

gcc/c-family/

* c.opt (Wreturn-mismatch): New.

gcc/c/

* c-typeck.cc (c_finish_return): Use pedwarn with
OPT_Wreturn_mismatch for missing/extra return expressions.

gcc/

* doc/invoke.texi (Warning Options): Document
-Wreturn-mismatch.  Update -Wreturn-type documentation.

gcc/testsuite/

* gcc.dg/Wreturn-mismatch-1.c: New.
* gcc.dg/Wreturn-mismatch-2.c: New.
* gcc.dg/Wreturn-mismatch-3.c: New.
* gcc.dg/Wreturn-mismatch-4.c: New.
* gcc.dg/Wreturn-mismatch-5.c: New.
* gcc.dg/noncompile/pr55976-1.c: Change -Werror=return-type
to -Werror=return-mismatch.
* gcc.dg/noncompile/pr55976-2.c: Likewise.

---
 gcc/c-family/c.opt  |  4 +++
 gcc/c/c-typeck.cc   | 15 +++
 gcc/doc/invoke.texi | 35 +++-
 gcc/testsuite/gcc.dg/Wreturn-mismatch-1.c   | 40 
 gcc/testsuite/gcc.dg/Wreturn-mismatch-2.c   | 41 +
 gcc/testsuite/gcc.dg/Wreturn-mismatch-3.c   | 40 
 gcc/testsuite/gcc.dg/Wreturn-mismatch-4.c   | 40 
 gcc/testsuite/gcc.dg/Wreturn-mismatch-5.c   | 40 
 gcc/testsuite/gcc.dg/noncompile/pr55976-1.c |  2 +-
 gcc/testsuite/gcc.dg/noncompile/pr55976-2.c |  2 +-
 10 files changed, 234 insertions(+), 25 deletions(-)

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 44b9c862c14..4c2ebd08af5 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1263,6 +1263,10 @@ Wreorder
 C++ ObjC++ Var(warn_reorder) Warning LangEnabledBy(C++ ObjC++,Wall)
 Warn when the compiler reorders code.
 
+Wreturn-mismatch
+C ObjC Var(warn_return_mismatch) Warning Init(1)
+Warn whenever void-returning functions return a non-void expressions, or a 
return expression is missing in a function not returning void.
+
 Wreturn-type
 C ObjC C++ ObjC++ Var(warn_return_type) Warning LangEnabledBy(C ObjC C++ 
ObjC++,Wall) Init(-1)
 Warn whenever a function's return type defaults to \"int\" (C), or about 
inconsistent return types (C++).
diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 6e044b4afbc..1da622160a3 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -11281,17 +11281,10 @@ c_finish_return (location_t loc, tree retval, tree 
origtype)
   if ((warn_return_type >= 0 || flag_isoc99)
  && valtype != NULL_TREE && TREE_CODE (valtype) != VOID_TYPE)
{
- bool warned_here;
- if (flag_isoc99)
-   warned_here = pedwarn
- (loc, warn_return_type >= 0 ? OPT_Wreturn_type : 0,
-  "% with no value, in function returning non-void");
- else
-   warned_here = warning_at
- (loc, OPT_Wreturn_type,
-  "% with no value, in function returning non-void");
  no_warning = true;
- if (warned_here)
+ if (pedwarn (loc, OPT_Wreturn_mismatch,
+  "% with no value,"
+  " in function returning non-void"))
inform (DECL_SOURCE_LOCATION (current_function_decl),
"declared here");
}
@@ -11302,7 +11295,7 @@ c_finish_return (location_t loc, tree retval, tree 
origtype)
   bool warned_here;
   if (TREE_CODE (TREE_TYPE (retval)) != VOID_TYPE)
warned_here = pedwarn
- (xloc, warn_return_type >= 0 ? OPT_Wreturn_type : 0,
+ (xloc, OPT_Wreturn_mismatch,
   "% with a value, in function returning void");
   else
warned_here = pedwarn
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 9b5ff457027..c0c155af7cd 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -7338,22 +7338,33 @@ This warning is enabled by @option{-Wall} for C and C++.
 Do not warn about returning a pointer (or in C++, a reference) to a
 variable that goes out of scope after the function returns.
 
+@opindex Wreturn-mismatch
+@opindex Wno-return-mismatch
+@item -Wreturn-mismatch
+Warn about return statements without an expressions in functions which
+do not return @code{void}.  Also warn about a @code{return} statement
+with an expression in a function whose return type is @code{void},
+unless the expression type is also @code{void}.  As a GNU extension, the
+latter case is accepted without a warning unless @option{-Wpedantic} is
+used.
+
+Attempting to use the return value of a non-@code{void} function other
+than @code{main} 

Re: [PATCH v3 1/2] c++: Initial support for P0847R7 (Deducing This) [PR102609]

2023-10-19 Thread waffl3x
> (Jakub)
> There are different kinds of format strings in GCC, the most common
> are the gcc-internal-format strings. If you call a function which
> is expected to take such translatable format string (in particular
> a function which takes a gmsgid named argument like error, error_at,
> pedwarn, warning_at, ...) and pass a string literal to that function,
> nothing needs to be marked in a special way, both gcc/po/exgettext
> is able to collect such literals into gcc/po/gcc.pot for translations
> and the function is supposed to use gettext etc. to translate it
> - e.g. see diagnostic_set_info using (gmsgid) for that.
> But, if there is e.g. a temporary pointer var which points to format
> strings and only that is eventually passed to the diagnostic functions,
> gcc/po/exgettext won't be able to collect such literals, which is where
> the G() macro comes into play and one marks the string as
> gcc-internal-format with it; the translation is still handled by the
> diagnostic function at runtime. The N_() macro is similar but for c-format
> strings instead. The _() macro both collects for translations if it is
> used with string literal, and expands to gettext call to translate it at
> runtime, which is something that should be avoided if something translates
> it again.

> (Jason)
> The protocol is described in gcc/ABOUT-GCC-NLS. In general, "strings"
> passed directly to a diagnostic function don't need any decoration, but
> if they're assigned to a variable first, they need G_() so they're
> recognized as diagnostic strings to be added to the translation table.

I read this last night and decided to sleep on it hoping it would make
more sense now. I was going to write about how I'm still confused but I
think it just clicked for me. I somehow didn't make the connection that
any kind of of expression (like a conditional) is going to require it's
use. I guess I kept thinking "but they aren't being assigned to a
variable" but I get it now, it's when the strings aren't passed
DIRECTLY into the diagnostic function.

> (Jakub)
> And another i18n rule is that one shouldn't try to construct diagnostic
> messages from parts of english sentences, it is fine to fill in with %s/%qs
> etc. language keywords etc. but otherwise the format string should contain
> the whole diagnostic line, so that translators can reorder the words etc.

> (Jason)
> The () macro is used for strings that are going to be passed to a %s,
> but better to avoid doing that for strings that need translation. N()
> is (rarely) used for strings that aren't diagnostic format strings, but
> get passed to another function that passes them to _().

Okay so taking what you guys are saying here it sounds like it would be
appropriate to refactor the code I was reluctant to refactor. The code
(in grokfndecl) conditionally selects one of the two (now three with my
changes) following strings despite them being mostly identical.

"non-member function %qD cannot have cv-qualifier"
"static member function %qD cannot have cv-qualifier"
"explicit object member function %qD cannot have cv-qualifier"

>From what I'm getting from what you two have said is that it would be
reasonable to instead construct a string from it's two parts, just
selecting the differing part dynamically.

"%s function %qD cannot have cv-qualifier"

Just to clarify, should I be marking the "non-member", "static member",
"explicit object member" strings with the _ macro then? I'm just not
sure since it seems like Jason is saying they shouldn't be since the
string they are being formatted into is being translated afterwards.

Also, I'm not sure what %qs is, should I be using that instead of %s
for strings?

On another topic, I have been trying to fix taking pointers to explicit
object member functions today, as I ended up breaking that when I
started setting static_function to false for them. Originally it just
worked so I haven't touched any code related to this, but now that they
aren't being treating like static member functions as much so a few
things just broke. What I'm asking myself now is whether it would be
appropriate to just opt-in to static member function behavior for this
one, and I'm not sure that would be correct.

So I started by checking what it did before I turned off the
static_function flag. It's was being passed into cp_build_addr_expr_1
as a baselink node, while regular member functions are passed in as an
offset_ref node. I then checked what the case is for static member
function, and unsurprisingly those are also handled wrapped in a
baselink node, but this raised some questions for me.

I am now trying to figure out what exactly a baselink is, and why it's
used for static member functions. My current best guess is that
method_type nodes already hold the information that a baselink does,
and that information is needed in general. If that is the case, it
might just be correct to just do the same thing for explicit object
member functions, but I wonder if there is more 

[Bug c++/111885] New: source code after "required from here" note sometimes printed strangely

2023-10-19 Thread ppalka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111885

Bug ID: 111885
   Summary: source code after "required from here" note sometimes
printed strangely
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ppalka at gcc dot gnu.org
  Target Milestone: ---

$ cat required-from-here-bug.C
template void f(typename T::type);

int main() {
  f(0);
}

$ g++ required-from-here-bug.C
required-from-here-bug.C: In function ‘int main()’:
required-from-here-bug.C:4:9: error: no matching function for call to
‘f(int)’
4 |   f(0);
  |   ~~^~~
required-from-here-bug.C:1:24: note: candidate: ‘template void
f(typename T::type)’
1 | template void f(typename T::type);
  |^
required-from-here-bug.C:1:24: note:   template argument deduction/substitution
failed:
required-from-here-bug.C: In substitution of ‘template void f(typename
T::type) [with T = int]’:
required-from-here-bug.C:4:9:   required from here
required-from-here-bug.C:1:24: note: 4 |   f(0);
required-from-here-bug.C:1:24: note:   |   ~~^~~
required-from-here-bug.C:1:24: error: ‘int’ is not a class, struct, or union
type
1 | template void f(typename T::type);
  |^

The two notes following the "required from here" diagnostic seem to be
misprinted -- they refer to line 1, but show source code from line 4, which
seems like a bug?

[Bug c/111884] [13/14 Regression] unsigned char no longer aliases anything under -std=c2x

2023-10-19 Thread tom at honermann dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111884

--- Comment #4 from Tom Honermann  ---
(In reply to Marek Polacek from comment #3)
> Thanks, I can test

Thank you. That change looks right. My apologies for introducing the
regression.

Re: Improve -Wflex-array-member-not-at-end changes.html wording |Plus: and warning bug? (was: [V2][PATCH] gcc-14/changes.html: Deprecate a GCC C extension on flexible array members.)

2023-10-19 Thread Qing Zhao
Hi, Tobias,

Sorry for the late reply (just came back from a long vacation after Cauldron).

And thank you for reporting this issue.

Please see my reply embedded below:

> On Sep 25, 2023, at 2:24 PM, Tobias Burnus  wrote:
> 
> Hi all,
> 
> I stumbled over this as I found the wording in the release notes rather 
> unclear.is.
> 
> 
> First, the following gives only a -pedantic warning and not a 
> -Wflex-array-member-not-at-end:
> 
>  struct t { int b; int x[]; };
>  struct q { int b; struct t a[2]; int c; };
> 
> warning: invalid use of structure with flexible array member [-Wpedantic]
> 
> If I remove the "[2]", it shows additionally:
>  warning: structure containing a flexible array member is not at the end of 
> another structure [-Wflex-array-member-not-at-end]

I think that the above behavior is correct as Richard mentioned previously.  -:)

First, by C99, a structure with flexible array member cannot be an element of 
an array, 
Therefore, struct t a[2] is an incorrect usage, the warning:

warning: invalid use of structure with flexible array member [-Wpedantic]

is complaining about this standard violation.  Though the diagnositic  message 
might need to be more specific like the following:

Warning: invalid use of structure with flexible array member as array element 
[-Wpedantic]

Then, after fixing this standard violation issue, the testing case is as 
following:

 struct t { int b; int x[]; };
 struct q { int b; struct t a; int c; };

adding -Wflex-array-member-not-at-end will report the expecting message:

/home/opc/Install/latest/bin/gcc -O  -Wflex-array-member-not-at-end t.c -S
t.c:2:29: warning: structure containing a flexible array member is not at the 
end of another structure [-Wflex-array-member-not-at-end]
2 |  struct q { int b; struct t a; int c; };
  | ^

So, I think that GCC’s behavior is correct. 

However, we might need to make the diagnostic message more accurate here (I can 
submit a small patch to improve this). 

> 
> It seems as if it should print latter warning also inside the struct.
> 
> Qing? Joseph? Thoughts?
> 
> * * *
> 
> Secondly, if this is deprecated, shouldn't then the warning enabled by, e.g., 
> -Wall or made
> otherwise more prominent? (-std=?) - Currently, one either has to find the 
> new flag or use
> -pedantic.

Yes, agreed, However, I think that it might be better to delay this to next GCC 
release by giving users plenty time to fix all the 
-Wflex-array-member-not-at-end warnings.  As I know, linux kernel exposed a lot 
of warnings when adding -Wflex-array-member-not-at-end, and kernel people are 
trying to fix all these warnings in the source base.  

> 
> Or is this not really regarded as deprecated? But then (IMHO) we should not 
> really claim so and just
> add the warning without deprecation.

I think that our final goal is to deprecate this ambiguous extension from GCC 
completely, but we need time to mitigate users step by step. 
> 
> BTW; clang-15 prints the -Wgnu-variable-sized-type-not-at-end warning by 
> default.
> 
> Joseph, all: Thoughts?
> 
> * * *
> 
> Cross ref: The patch adding the new warning is r14-2197-g070a6bf0bdc6761
> https://gcc.gnu.org/pipermail/gcc-cvs/2023-June/385730.html (cf. previously 
> in this thread)
> 
> 
> * * *
> 
> Regarding the changes.html wording:
> 
> On 07.08.23 16:22, Qing Zhao via Gcc-patches wrote:
> 
>> Comparing to the 1st version, the only change is to address Richard's
>> comment on refering a warning option for diagnosing deprecated behavior.
> ...
>> +++ b/htdocs/gcc-14/changes.html
>> @@ -30,7 +30,18 @@ a work-in-progress.
>>  
>>  Caveats
>>  
>> -  ...
>> +  C:
>> +  Support for the GCC extension, a structure containing a C99 flexible 
>> array
>> +  member, or a union containing such a structure, is not the last field 
>> of
>> +  another structure, is deprecated. Refer to
>> +  https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html;>
>> +  Zero Length Arrays.
> 
> ...
> 
> I find the first sentence difficult to read. What do you think of the 
> following?
> (It is hard to come up with some good wording.)
> 
> --- a/htdocs/gcc-14/changes.html
> +++ b/htdocs/gcc-14/changes.html
> @@ -31,9 +31,10 @@ a work-in-progress.
> Caveats
> 
>   C:
> -  Support for the GCC extension, a structure containing a C99 flexible 
> array
> -  member, or a union containing such a structure, is not the last field 
> of
> -  another structure, is deprecated. Refer to
> +  Support for the GCC extension that a structure containing a C99 
> flexible
> +  array (and any union containing a member of such structure) can be a
> +  member of a structure has been deprecated for the case that it is not
> +  the last member. Refer to
>   https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html;>
>   Zero Length Arrays.
>   Any code relying on this extension should be modifed to ensure that

I  felt the above is more difficult to understand… how 

[Bug testsuite/111883] Wstringop-overflow-6.C FAILs with -std=c++26

2023-10-19 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111883

--- Comment #3 from Marek Polacek  ---
Did you mean like the following?  I have no idea if that's correct but is
suppresses the warnings I see.

In C++23 I don't see the code in the .ii file at all, so it doesn't warn.

--- a/libstdc++-v3/include/std/charconv
+++ b/libstdc++-v3/include/std/charconv
@@ -684,7 +684,7 @@ namespace __detail
 from_chars_result __res
   = __from_chars_float16_t(__first, __last, __val, __fmt);
 if (__res.ec == errc{})
-  __value = __val;
+  __value = _Float16(__val);
 return __res;
   }
 #endif
@@ -697,7 +697,7 @@ namespace __detail
 float __val;
 from_chars_result __res = from_chars(__first, __last, __val, __fmt);
 if (__res.ec == errc{})
-  __value = __val;
+  __value = _Float32(__val);
 return __res;
   }
 #endif
@@ -710,7 +710,7 @@ namespace __detail
 double __val;
 from_chars_result __res = from_chars(__first, __last, __val, __fmt);
 if (__res.ec == errc{})
-  __value = __val;
+  __value = _Float64(__val);
 return __res;
   }
 #endif
@@ -723,7 +723,7 @@ namespace __detail
 long double __val;
 from_chars_result __res = from_chars(__first, __last, __val, __fmt);
 if (__res.ec == errc{})
-  __value = __val;
+  __value = _Float128(__val);
 return __res;
   }
 #elif defined(__STDCPP_FLOAT128_T__) && defined(_GLIBCXX_HAVE_FLOAT128_MATH)
@@ -739,7 +739,7 @@ namespace __detail
 __extension__ __ieee128 __val;
 from_chars_result __res = from_chars(__first, __last, __val, __fmt);
 if (__res.ec == errc{})
-  __value = __val;
+  __value = _Float128(__val);
 return __res;
   }
 #else
@@ -760,7 +760,7 @@ namespace __detail
 from_chars_result __res
   = __from_chars_bfloat16_t(__first, __last, __val, __fmt);
 if (__res.ec == errc{})
-  __value = __val;
+  __value = __gnu_cxx::__bfloat16_t(__val);
 return __res;
   }
 #endif

[Bug c/111884] [13/14 Regression] unsigned char no longer aliases anything under -std=c2x

2023-10-19 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111884

Andrew Pinski  changed:

   What|Removed |Added

Summary|unsigned char no longer |[13/14 Regression] unsigned
   |aliases anything under  |char no longer aliases
   |-std=c2x|anything under -std=c2x
   Keywords||diagnostic
   Target Milestone|--- |13.3

[Bug c/111884] unsigned char no longer aliases anything under -std=c2x

2023-10-19 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111884

Marek Polacek  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 CC||mpolacek at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #3 from Marek Polacek  ---
Thanks, I can test

--- a/gcc/c-family/c-common.cc
+++ b/gcc/c-family/c-common.cc
@@ -3828,8 +3828,9 @@ c_common_get_alias_set (tree t)
   if (!TYPE_P (t))
 return -1;

-  /* Unlike char, char8_t doesn't alias. */
-  if (flag_char8_t && t == char8_type_node)
+  /* Unlike char, char8_t doesn't alias in C++.  (In C, char8_t is not
+ a distinct type.)  */
+  if (flag_char8_t && t == char8_type_node && c_dialect_cxx ())
 return -1;

   /* The C standard guarantees that any object may be accessed via an

[Bug testsuite/111883] Wstringop-overflow-6.C FAILs with -std=c++26

2023-10-19 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111883

--- Comment #2 from Jakub Jelinek  ---
Why doesn't it fail with -std=c++23 though?  Was there some C++26 change I'm
not aware of?
In the to_chars cases, we already use float(__value) casts in the
_Float16/__bfloat16_t cases (but others too), so I think we just want to add
explicit casts also to all the from_chars
  __value = __val;
lines (or at least the _Float16/__bfloat16_t cases).

[Bug c/111884] unsigned char no longer aliases anything under -std=c2x

2023-10-19 Thread joseph at codesourcery dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111884

--- Comment #2 from joseph at codesourcery dot com  ---
I'm going to guess this was introduced by the char8_t changes ("C: 
Implement C2X N2653 char8_t and UTF-8 string literal changes", commit 
703837b2cc8ac03c53ac7cc0fb1327055acaebd2).

  /* Unlike char, char8_t doesn't alias. */
  if (flag_char8_t && t == char8_type_node)
return -1;

is not correct for C, where char8_t is not a distinct type.

[Bug c/111884] unsigned char no longer aliases anything under -std=c2x

2023-10-19 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111884

Sam James  changed:

   What|Removed |Added

 CC||jsm28 at gcc dot gnu.org,
   ||sjames at gcc dot gnu.org

--- Comment #1 from Sam James  ---
Originally reported at
https://inbox.sourceware.org/gcc-help/e18818e9-396c-41fe-6c4a-9ac86c564...@ispras.ru/T/#m147d6322a26e2bc094dd5e0935b7f2223bc910ac.
Curious indeed..

[pushed] doc: Update contrib.texi

2023-10-19 Thread Marek Polacek
I noticed that Patrick is missing here.

gcc/ChangeLog:

* doc/contrib.texi: Add entry for Patrick Palka.
---
 gcc/doc/contrib.texi | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/doc/contrib.texi b/gcc/doc/contrib.texi
index 031c4ec44ce..0fe4a874616 100644
--- a/gcc/doc/contrib.texi
+++ b/gcc/doc/contrib.texi
@@ -781,6 +781,9 @@ ABI support, improvements to dejagnu's MIPS support, Java 
configuration
 clean-ups and porting work, and maintaining the IRIX, Solaris 2, and
 Tru64 UNIX ports.
 
+@item
+Patrick Palka for contributions to the C++ library and front end.
+
 @item
 Steven Pemberton for his contribution of @file{enquire} which allowed GCC to
 determine various properties of the floating point unit and generate

base-commit: d8e4e7def3338344a761d841010e98d017c58b0a
-- 
2.41.0



[Bug testsuite/111883] Wstringop-overflow-6.C FAILs with -std=c++26

2023-10-19 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111883

--- Comment #1 from Jonathan Wakely  ---
I think Jakub wrote that code, but it looks like we just want the explicit
casts. I can add those.

Re: [PATCH v3] c++: Fix compile-time-hog in cp_fold_immediate_r [PR111660]

2023-10-19 Thread Jason Merrill

On 10/19/23 14:45, Marek Polacek wrote:

On Thu, Oct 19, 2023 at 01:02:33PM -0400, Jason Merrill wrote:

On 10/19/23 12:55, Marek Polacek wrote:

On Thu, Oct 19, 2023 at 12:32:49PM -0400, Jason Merrill wrote:

On 10/19/23 10:14, Marek Polacek wrote:

On Thu, Oct 19, 2023 at 10:06:01AM -0400, Jason Merrill wrote:

On 10/19/23 09:39, Patrick Palka wrote:

On Tue, 17 Oct 2023, Marek Polacek wrote:

[...]

  case PTRMEM_CST:
if (TREE_CODE (PTRMEM_CST_MEMBER (stmt)) == FUNCTION_DECL
  && DECL_IMMEDIATE_FUNCTION_P (PTRMEM_CST_MEMBER (stmt)))
@@ -1162,8 +1141,35 @@ cp_fold_r (tree *stmt_p, int *walk_subtrees, void *data_)
tree stmt = *stmt_p;
enum tree_code code = TREE_CODE (stmt);
-  if (cxx_dialect > cxx17)
-cp_fold_immediate_r (stmt_p, walk_subtrees, data);
+  if (cxx_dialect >= cxx20)
+{
+  /* Unfortunately we must handle code like
+  false ? bar () : 42
+where we have to check bar too.  The cp_fold call below could
+fold the ?: into a constant before we've checked it.  */
+  if (code == COND_EXPR)
+   {
+ auto then_fn = cp_fold_r, else_fn = cp_fold_r;
+ /* See if we can figure out if either of the branches is dead.  If it
+is, we don't need to do everything that cp_fold_r does.  */
+ tree cond = maybe_constant_value (TREE_OPERAND (stmt, 0));
+ if (integer_zerop (cond))
+   then_fn = cp_fold_immediate_r;
+ else if (TREE_CODE (cond) == INTEGER_CST)
+   else_fn = cp_fold_immediate_r;
+
+ cp_walk_tree (_OPERAND (stmt, 0), cp_fold_r, data, nullptr);


I wonder about doing this before maybe_constant_value, to hopefully reduce
the redundant calculations?  OK either way.


Yeah, I was toying with that, I had

  foo() ? x : y

where foo was a constexpr function but the cp_fold_r on op0 didn't eval it
to a constant :(.


IIUC that's because cp_fold evaluates constexpr calls only with -O, so
doing cp_fold_r before maybe_constant_value on the condition should
still be desirable in that case?


Hmm, and if the cp_fold_r doesn't reduce the test to a constant, I would
think that folding the COND_EXPR also won't discard the other branch, so we
shouldn't need to work harder to get a constant here, so we don't need to
call maybe_constant_value at all?


Sorry, I hadn't seen this message when I posted the tweak.  But the
maybe_constant_value call on TREE_OPERAND (stmt, 0) should still make
sense because it can render a branch dead even on -O0.  Right?


But if op0 isn't constant after cp_fold, folding the COND_EXPR won't discard
the branch, so we don't need to do the extra work to find out that it's
actually dead.


Hmm, so how about this?  Thus far dg.exp passed.

-- >8 --
This patch is an optimization tweak for cp_fold_r.  If we cp_fold_r the
COND_EXPR's op0 first, we may be able to evaluate it to a constant if -O.
cp_fold has:

3143 if (callee && DECL_DECLARED_CONSTEXPR_P (callee)
3144 && !flag_no_inline)
...
3151 r = maybe_constant_value (x, /*decl=*/NULL_TREE,

flag_no_inline is 1 for -O0:

1124   if (opts->x_optimize == 0)
1125 {
1126   /* Inlining does not work if not optimizing,
1127  so force it not to be done.  */
1128   opts->x_warn_inline = 0;
1129   opts->x_flag_no_inline = 1;
1130 }

but otherwise it's 0 and cp_fold will maybe_constant_value calls to
constexpr functions.  And if it doesn't, then folding the COND_EXPR
will keep both arms, and we can avoid calling maybe_constant_value.

gcc/cp/ChangeLog:

* cp-gimplify.cc (cp_fold_r): Don't call maybe_constant_value.
---
   gcc/cp/cp-gimplify.cc | 7 +++
   1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index a282c3930a3..385042aeef2 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -1152,13 +1152,12 @@ cp_fold_r (tree *stmt_p, int *walk_subtrees, void 
*data_)
  auto then_fn = cp_fold_r, else_fn = cp_fold_r;
  /* See if we can figure out if either of the branches is dead.  If it
 is, we don't need to do everything that cp_fold_r does.  */
- tree cond = maybe_constant_value (TREE_OPERAND (stmt, 0));
- if (integer_zerop (cond))
+ cp_walk_tree (_OPERAND (stmt, 0), cp_fold_r, data, nullptr);
+ if (integer_zerop (TREE_OPERAND (stmt, 0)))
then_fn = cp_fold_immediate_r;
- else if (TREE_CODE (cond) == INTEGER_CST)
+ else if (TREE_CODE (TREE_OPERAND (stmt, 0)) == INTEGER_CST)


You probably want to STRIP_NOPS before checking the TREE_CODE, like
fold_ternary_loc and integer_zerop do.


Ok, or use integer_nonzerop like Patrick suggested:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK.


-- >8 --
This patch is an optimization tweak for cp_fold_r.  If we cp_fold_r the
COND_EXPR's op0 first, we may be able to evaluate it to a constant if -O.
cp_fold has:

3143 

Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-19 Thread Robin Dapp
Ugh, I didn't push yet because with a rebased trunk I am
seeing different behavior for some riscv testcases.

A reduction is not recognized because there is yet another
"double use" occurrence in check_reduction_path.  I guess it's
reasonable to loosen the restriction for conditional operations
here as well.

The only change to v4 therefore is:

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index ebab1953b9c..64654a55e4c 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -4085,7 +4094,15 @@ pop:
|| flow_bb_inside_loop_p (loop, gimple_bb (op_use_stmt
  FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
cnt++;
-  if (cnt != 1)
+
+  bool cond_fn_p = op.code.is_internal_fn ()
+   && (conditional_internal_fn_code (internal_fn (*code))
+   != ERROR_MARK);
+
+  /* In case of a COND_OP (mask, op1, op2, op1) reduction we might have
+op1 twice (once as definition, once as else) in the same operation.
+Allow this.  */
+  if ((!cond_fn_p && cnt != 1) || (opi == 1 && cond_fn_p && cnt != 2))

Bootstrapped and regtested again on x86, aarch64 and power10.
Testsuite on riscv unchanged.

Regards
 Robin

Subject: [PATCH v5] ifcvt/vect: Emit COND_OP for conditional scalar reduction.

As described in PR111401 we currently emit a COND and a PLUS expression
for conditional reductions.  This makes it difficult to combine both
into a masked reduction statement later.
This patch improves that by directly emitting a COND_ADD/COND_OP during
ifcvt and adjusting some vectorizer code to handle it.

It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS
is true.

gcc/ChangeLog:

PR middle-end/111401
* tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_OP
if supported.
(predicate_scalar_phi): Add whitespace.
* tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_OP.
(neutral_op_for_reduction): Return -0 for PLUS.
(check_reduction_path): Don't count else operand in COND_OP.
(vect_is_simple_reduction): Ditto.
(vect_create_epilog_for_reduction): Fix whitespace.
(vectorize_fold_left_reduction): Add COND_OP handling.
(vectorizable_reduction): Don't count else operand in COND_OP.
(vect_transform_reduction): Add COND_OP handling.
* tree-vectorizer.h (neutral_op_for_reduction): Add default
parameter.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test.
* gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_call-2.c: Adjust.
* gcc.target/riscv/rvv/autovec/reduc/reduc_call-4.c: Ditto.
---
 .../vect-cond-reduc-in-order-2-signed-zero.c  | 141 +++
 .../riscv/rvv/autovec/cond/pr111401.c | 139 +++
 .../riscv/rvv/autovec/reduc/reduc_call-2.c|   4 +-
 .../riscv/rvv/autovec/reduc/reduc_call-4.c|   4 +-
 gcc/tree-if-conv.cc   |  49 +++--
 gcc/tree-vect-loop.cc | 168 ++
 gcc/tree-vectorizer.h |   2 +-
 7 files changed, 456 insertions(+), 51 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c 
b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
new file mode 100644
index 000..7b46e7d8a2a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
@@ -0,0 +1,141 @@
+/* Make sure a -0 stays -0 when we perform a conditional reduction.  */
+/* { dg-do run } */
+/* { dg-require-effective-target vect_double } */
+/* { dg-add-options ieee } */
+/* { dg-additional-options "-std=gnu99 -fno-fast-math" } */
+
+#include "tree-vect.h"
+
+#include 
+
+#define N (VECTOR_BITS * 17)
+
+double __attribute__ ((noinline, noclone))
+reduc_plus_double (double *restrict a, double init, int *cond, int n)
+{
+  double res = init;
+  for (int i = 0; i < n; i++)
+if (cond[i])
+  res += a[i];
+  return res;
+}
+
+double __attribute__ ((noinline, noclone, optimize ("0")))
+reduc_plus_double_ref (double *restrict a, double init, int *cond, int n)
+{
+  double res = init;
+  for (int i = 0; i < n; i++)
+if (cond[i])
+  res += a[i];
+  return res;
+}
+
+double __attribute__ ((noinline, noclone))
+reduc_minus_double (double *restrict a, double init, int *cond, int n)
+{
+  double res = init;
+  for (int i = 0; i < n; i++)
+if (cond[i])
+  res -= a[i];
+  return res;
+}
+
+double __attribute__ ((noinline, noclone, optimize ("0")))
+reduc_minus_double_ref (double *restrict a, double init, int *cond, int n)
+{
+  double res = init;
+  for (int i = 0; i < n; i++)
+if (cond[i])
+  res -= a[i];
+  return res;
+}
+
+int 

[Bug c/111884] New: unsigned char no longer aliases anything under -std=c2x

2023-10-19 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111884

Bug ID: 111884
   Summary: unsigned char no longer aliases anything under
-std=c2x
   Product: gcc
   Version: 13.2.1
Status: UNCONFIRMED
  Keywords: wrong-code
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: amonakov at gcc dot gnu.org
  Target Milestone: ---

int f(int i)
{
int f = 1;
return i[(unsigned char *)];
}
int g(int i)
{
int f = 1;
return i[(signed char *)];
}
int h(int i)
{
int f = 1;
return i[(char *)];
}


gcc -O2 -std=c2x compiles 'f' as though inspecting representation via an
'unsigned char *' is not valid (with a confusing warning under -Wall).

[Patch] omp_lib.f90.in: Deprecate omp_lock_hint_* for OpenMP 5.0

2023-10-19 Thread Tobias Burnus

I noticed that we only tagged half of the 5.0 deprecations in the omp_lib 
module;
this added the other half. (omp_lock_hint_* was renamed to omp_sync_hint_*.)

Currently, this patch has has no visible effect - but once we bump the OpenMP 
version
from 4.5 to 5.0, there will warnings of the those.

Comments, remarks? If not, I will commit the attached patch tomorrow.

Tobias

PS: Once the OpenMP version has been bumped (as done for testing), the warning
looks as follows:

1 | use omp_lib
  |1
Warning: Using parameter ‘omp_lock_hint_kind’ declared at (1) is deprecated 
[-Wdeprecated-declarations]

3 | print *, omp_get_nested()
  |1
Warning: Using function ‘omp_get_nested’ at (1) is deprecated 
[-Wdeprecated-declarations]

The 'omp_lock_hint_kind' warning is not ideal as it uses the declared-at 
location
and not the first-used location, which would be in line 2:

integer(omp_lock_hint_kind) :: hint
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
omp_lib.f90.in: Deprecate omp_lock_hint_* for OpenMP 5.0

The omp_lock_hint_* parameters were deprecated in favor of
omp_sync_hint_*.  While omp.h contained deprecation markers for those,
the omp_lib module only contained them for omp_{g,s}_nested.

Note: The -Wdeprecated-declarations warning will only become active once
openmp_version / _OPENMP is bumped from 201511 (4.5) to 201811 (5.0).

libgomp/ChangeLog:

	* omp_lib.f90.in: Tag omp_lock_hint_* as being deprecated when
	_OPENMP >= 201811.

diff --git a/libgomp/omp_lib.f90.in b/libgomp/omp_lib.f90.in
index e4515271252..755c0bf16ec 100644
--- a/libgomp/omp_lib.f90.in
+++ b/libgomp/omp_lib.f90.in
@@ -904,6 +904,10 @@
 
 #if _OPENMP >= 201811
 !GCC$ ATTRIBUTES DEPRECATED :: omp_get_nested, omp_set_nested
+!GCC$ ATTRIBUTES DEPRECATED :: omp_lock_hint_kind, omp_lock_hint_none
+!GCC$ ATTRIBUTES DEPRECATED :: omp_lock_hint_uncontended, omp_lock_hint_contended
+!GCC$ ATTRIBUTES DEPRECATED :: omp_lock_hint_nonspeculative
+!GCC$ ATTRIBUTES DEPRECATED :: omp_lock_hint_speculative
 #endif
 
 #if _OPENMP >= 202011


Re: [PATCH v3 2/4] ifcvt: Allow more operations in multiple set if conversion

2023-10-19 Thread Richard Sandiford
Manolis Tsamis  writes:
> Currently the operations allowed for if conversion of a basic block with
> multiple sets are few, namely REG, SUBREG and CONST_INT (as controlled by
> bb_ok_for_noce_convert_multiple_sets).
>
> This commit allows more operations (arithmetic, compare, etc) to participate
> in if conversion. The target's profitability hook and ifcvt's costing is
> expected to reject sequences that are unprofitable.
>
> This is especially useful for targets which provide a rich selection of
> conditional instructions (like aarch64 which has cinc, csneg, csinv, ccmp, 
> ...)
> which are currently not used in basic blocks with more than a single set.
>
> gcc/ChangeLog:
>
>   * ifcvt.cc (try_emit_cmove_seq): Modify comments.
>   (noce_convert_multiple_sets_1): Modify comments.
>   (bb_ok_for_noce_convert_multiple_sets): Allow more operations.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/ifcvt_multiple_sets_arithm.c: New test.
>
> Signed-off-by: Manolis Tsamis 
> ---
>
> Changes in v3:
> - Add SCALAR_INT_MODE_P check in bb_ok_for_noce_convert_multiple_sets.
> - Allow rewiring of multiple regs.
> - Refactor code with noce_multiple_sets_info.
> - Remove old code for subregs.
>
>  gcc/ifcvt.cc  | 63 ++-
>  .../aarch64/ifcvt_multiple_sets_arithm.c  | 79 +++
>  2 files changed, 123 insertions(+), 19 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.target/aarch64/ifcvt_multiple_sets_arithm.c
>
> diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
> index 3273aeca125..efe8ab1577a 100644
> --- a/gcc/ifcvt.cc
> +++ b/gcc/ifcvt.cc
> @@ -3215,13 +3215,13 @@ try_emit_cmove_seq (struct noce_if_info *if_info, rtx 
> temp,
>  /* We have something like:
>  
>   if (x > y)
> -   { i = a; j = b; k = c; }
> +   { i = EXPR_A; j = EXPR_B; k = EXPR_C; }
>  
> Make it:
>  
> - tmp_i = (x > y) ? a : i;
> - tmp_j = (x > y) ? b : j;
> - tmp_k = (x > y) ? c : k;
> + tmp_i = (x > y) ? EXPR_A : i;
> + tmp_j = (x > y) ? EXPR_B : j;
> + tmp_k = (x > y) ? EXPR_C : k;
>   i = tmp_i;
>   j = tmp_j;
>   k = tmp_k;
> @@ -3637,11 +3637,10 @@ noce_convert_multiple_sets_1 (struct noce_if_info 
> *if_info,
>  
>  
>  
> -/* Return true iff basic block TEST_BB is comprised of only
> -   (SET (REG) (REG)) insns suitable for conversion to a series
> -   of conditional moves.  Also check that we have more than one set
> -   (other routines can handle a single set better than we would), and
> -   fewer than PARAM_MAX_RTL_IF_CONVERSION_INSNS sets.  While going
> +/* Return true iff basic block TEST_BB is suitable for conversion to a
> +   series of conditional moves.  Also check that we have more than one
> +   set (other routines can handle a single set better than we would),
> +   and fewer than PARAM_MAX_RTL_IF_CONVERSION_INSNS sets.  While going
> through the insns store the sum of their potential costs in COST.  */
>  
>  static bool
> @@ -3667,20 +3666,46 @@ bb_ok_for_noce_convert_multiple_sets (basic_block 
> test_bb, unsigned *cost)
>rtx dest = SET_DEST (set);
>rtx src = SET_SRC (set);
>  
> -  /* We can possibly relax this, but for now only handle REG to REG
> -  (including subreg) moves.  This avoids any issues that might come
> -  from introducing loads/stores that might violate data-race-freedom
> -  guarantees.  */
> -  if (!REG_P (dest))
> +  /* Do not handle anything involving memory loads/stores since it might
> +  violate data-race-freedom guarantees.  */
> +  if (!REG_P (dest) || contains_mem_rtx_p (src))
> + return false;
> +
> +  if (!SCALAR_INT_MODE_P (GET_MODE (src)))
>   return false;
>  
> -  if (!((REG_P (src) || CONSTANT_P (src))
> - || (GET_CODE (src) == SUBREG && REG_P (SUBREG_REG (src))
> -   && subreg_lowpart_p (src
> +  /* Allow a wide range of operations and let the costing function decide
> +  if the conversion is worth it later.  */
> +  enum rtx_code code = GET_CODE (src);
> +  if (!(CONSTANT_P (src)
> + || code == REG
> + || code == SUBREG
> + || code == ZERO_EXTEND
> + || code == SIGN_EXTEND
> + || code == NOT
> + || code == NEG
> + || code == PLUS
> + || code == MINUS
> + || code == AND
> + || code == IOR
> + || code == MULT
> + || code == ASHIFT
> + || code == ASHIFTRT
> + || code == NE
> + || code == EQ
> + || code == GE
> + || code == GT
> + || code == LE
> + || code == LT
> + || code == GEU
> + || code == GTU
> + || code == LEU
> + || code == LTU
> + || code == COMPARE))
>   return false;

I'm nervous about lists of operations like these, for two reasons:

(1) It isn't obvious what criteria are used to select the codes.

(2) It requires the 

Re: [PATCH v3 1/4] ifcvt: handle sequences that clobber flags in noce_convert_multiple_sets

2023-10-19 Thread Richard Sandiford
Manolis Tsamis  writes:
> This is an extension of what was done in PR106590.
>
> Currently if a sequence generated in noce_convert_multiple_sets clobbers the
> condition rtx (cc_cmp or rev_cc_cmp) then only seq1 is used afterwards
> (sequences that emit the comparison itself). Since this applies only from the
> next iteration it assumes that the sequences generated (in particular seq2)
> doesn't clobber the condition rtx itself before using it in the if_then_else,
> which is only true in specific cases (currently only register/subregister 
> moves
> are allowed).
>
> This patch changes this so it also tests if seq2 clobbers cc_cmp/rev_cc_cmp in
> the current iteration. This makes it possible to include arithmetic operations
> in noce_convert_multiple_sets.
>
> gcc/ChangeLog:
>
>   * ifcvt.cc (check_for_cc_cmp_clobbers): Use modified_in_p instead.
>   (noce_convert_multiple_sets_1): Don't use seq2 if it clobbers cc_cmp.
>
> Signed-off-by: Manolis Tsamis 
> ---
>
> (no changes since v1)
>
>  gcc/ifcvt.cc | 49 +++--
>  1 file changed, 19 insertions(+), 30 deletions(-)

Sorry for the slow review.  TBH I was hoping someone else would pick
it up, since (a) I'm not very familiar with this code, and (b) I don't
really agree with the way that the current code works.  I'm not sure the
current dependency checking is safe, so I'm nervous about adding even
more cases to it.  And it feels like the different ifcvt techniques are
not now well distinguished, so that they're beginning to overlap and
compete with one another.  None of that is your fault, of course. :)

> diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
> index a0af553b9ff..3273aeca125 100644
> --- a/gcc/ifcvt.cc
> +++ b/gcc/ifcvt.cc
> @@ -3375,20 +3375,6 @@ noce_convert_multiple_sets (struct noce_if_info 
> *if_info)
>return true;
>  }
>  
> -/* Helper function for noce_convert_multiple_sets_1.  If store to
> -   DEST can affect P[0] or P[1], clear P[0].  Called via note_stores.  */
> -
> -static void
> -check_for_cc_cmp_clobbers (rtx dest, const_rtx, void *p0)
> -{
> -  rtx *p = (rtx *) p0;
> -  if (p[0] == NULL_RTX)
> -return;
> -  if (reg_overlap_mentioned_p (dest, p[0])
> -  || (p[1] && reg_overlap_mentioned_p (dest, p[1])))
> -p[0] = NULL_RTX;
> -}
> -
>  /* This goes through all relevant insns of IF_INFO->then_bb and tries to
> create conditional moves.  In case a simple move sufficis the insn
> should be listed in NEED_NO_CMOV.  The rewired-src cases should be
> @@ -3552,9 +3538,17 @@ noce_convert_multiple_sets_1 (struct noce_if_info 
> *if_info,
>creating an additional compare for each.  If successful, costing
>is easier and this sequence is usually preferred.  */
>if (cc_cmp)
> - seq2 = try_emit_cmove_seq (if_info, temp, cond,
> -new_val, old_val, need_cmov,
> -, _dest2, cc_cmp, rev_cc_cmp);
> + {
> +   seq2 = try_emit_cmove_seq (if_info, temp, cond,
> +  new_val, old_val, need_cmov,
> +  , _dest2, cc_cmp, rev_cc_cmp);
> +
> +   /* The if_then_else in SEQ2 may be affected when cc_cmp/rev_cc_cmp is
> +  clobbered.  We can't safely use the sequence in this case.  */
> +   if (seq2 && (modified_in_p (cc_cmp, seq2)
> +   || (rev_cc_cmp && modified_in_p (rev_cc_cmp, seq2
> + seq2 = NULL;

modified_in_p only checks the first instruction in seq2, not the whole
sequence.

I think the unpatched approach is OK in cases where seq2 clobbers
cc_cmp/rev_cc_cmp in or after the last use, since instructions are
defined to operate on a read-all/compute/write-all basis.

Soon after the snippet above, the unpatched code has this loop:

  /* The backend might have created a sequence that uses the
 condition.  Check this.  */
  rtx_insn *walk = seq2;
  while (walk)
{
  rtx set = single_set (walk);

  if (!set || !SET_SRC (set))

This condition looks odd.  A SET_SRC is never null.  But more importantly,
continuing means "assume the best", and I don't think we should assume
the best for (say) an insn with two parallel sets.

It doesn't look like the series addresses this, but !set seems more
likely to occur if we extend the function to general operations.

{
  walk = NEXT_INSN (walk);
  continue;
}

  rtx src = SET_SRC (set);

  if (XEXP (set, 1) && GET_CODE (XEXP (set, 1)) == IF_THEN_ELSE)
; /* We assume that this is the cmove created by the backend that
 naturally uses the condition.  Therefore we ignore it.  */

XEXP (set, 1) is src.  But here too, I'm a bit nervous about assuming
that any if_then_else is safe to ignore, even currently.  It seems
more dangerous if we extend the function to handle more cases.

  else
{
  if (reg_mentioned_p 

[Bug target/111645] Intrinsics vec_sldb /vec_srdb fail with __vector unsigned __int128

2023-10-19 Thread carll at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111645

Carl Love  changed:

   What|Removed |Added

 CC||carll at gcc dot gnu.org

--- Comment #5 from Carl Love  ---

There are a couple of issues with the test case in the attachment.  For example
one of the tests is:


static inline vui64_t
vec_vsldbi_64 (vui64_t vra, vui64_t vrb, const unsigned int shb)
{
 return vec_sldb (vra, vrb, shb);
}

When I tried to compile it, it seemed to compile.  However if I take off the
static inline, then I get an error about in compatible arguments.  The built-in
requires an explicit integer be based in the third argument.  The following
worked for me:


static inline vui64_t
vec_vsldbi_64 (vui64_t vra, vui64_t vrb, const unsigned int shb)
{
 return vec_sldb (vra, vrb, 1);
}

The compiler/assembler needs an explicit value for the third argument as it has
to generate the instruction with the immediate shift value as part of the
instruction.  Hence a variable for the third argument will not work.

Agreed that the __int128 arguments can and should be supported.  Patch to add
that support is in progress but will require getting the LLVM/OpenXL team to
agree to adding the __128int variants as well.

[Bug target/111876] aarch64: bf16 complex mul/div does not work when the target has +fp16 support or when -fexcess-precision=16 is supplied

2023-10-19 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111876

Andrew Pinski  changed:

   What|Removed |Added

 Target|aarch64 |aarch64 x86_64
Summary|aarch64: bf16 complex   |aarch64: bf16 complex
   |mul/div does not work when  |mul/div does not work when
   |the target has +fp16|the target has +fp16
   |support.|support or when
   ||-fexcess-precision=16 is
   ||supplied
 CC||jakub at gcc dot gnu.org

--- Comment #4 from Andrew Pinski  ---
So the difference between with/without +fp16 is -fexcess-precision=16 vs
without that option.

On x86_64 with -fexcess-precision=16, we get the same link failure.
Maybe Jakub understands what is the correct behavior here; (since he added
bfloat16 support to x86_64; r13-3292-gc2565a31c1622ab0926aeef4a6579413e121b9f9
).

[Bug target/111876] aarch64: bf16 complex mul/div does not work when the target has +fp16 support.

2023-10-19 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111876

--- Comment #3 from Andrew Pinski  ---
(In reply to Iain Sandoe from comment #2)
> (In reply to Andrew Pinski from comment #1)
> > This could either be wrong code for not doing the promotion or just missing
> > the libgcc functions (which could be implemented as doing the promotion).
> > 
> > Either ways confirmed.
> 
> thanks, for checking.
> but I think the underlying concern is that providing a disjoint extension
> (+fp16) should not alter the behaviour of bf16 (in this case I did some
> limited poking about but could not see any obvious place where the addition
> of fp16 alters complex number handling).

The difference comes from the front-end which adds the promotions even.

[Bug target/111876] aarch64: bf16 complex mul/div does not work when the target has +fp16 support.

2023-10-19 Thread iains at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111876

--- Comment #2 from Iain Sandoe  ---
(In reply to Andrew Pinski from comment #1)
> This could either be wrong code for not doing the promotion or just missing
> the libgcc functions (which could be implemented as doing the promotion).
> 
> Either ways confirmed.

thanks, for checking.
but I think the underlying concern is that providing a disjoint extension
(+fp16) should not alter the behaviour of bf16 (in this case I did some limited
poking about but could not see any obvious place where the addition of fp16
alters complex number handling).

[Bug target/111876] aarch64: bf16 complex mul/div does not work when the target has +fp16 support.

2023-10-19 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111876

Andrew Pinski  changed:

   What|Removed |Added

   Keywords||wrong-code
   Last reconfirmed||2023-10-19
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
Summary|aarch64: Wrong code for |aarch64: bf16 complex
   |bf16 complex mul/div when   |mul/div does not work when
   |the target has +fp16|the target has +fp16
   |support.|support.

--- Comment #1 from Andrew Pinski  ---
This could either be wrong code for not doing the promotion or just missing the
libgcc functions (which could be implemented as doing the promotion).

Either ways confirmed.

[Bug tree-optimization/111878] [14 Regression] ICE: in get_loop_exit_edges, at cfgloop.cc:1204 with -O3 -fgraphite-identity -fsave-optimization-record/-fdump-tree-graphite

2023-10-19 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111878

Andrew Pinski  changed:

   What|Removed |Added

Summary|[14 Regression] ICE: in |[14 Regression] ICE: in
   |get_loop_exit_edges, at |get_loop_exit_edges, at
   |cfgloop.cc:1204 with -O3|cfgloop.cc:1204 with -O3
   |-fgraphite-identity |-fgraphite-identity
   |-fsave-optimization-record  |-fsave-optimization-record/
   ||-fdump-tree-graphite

--- Comment #3 from Andrew Pinski  ---
If we wrap the whole function inside a loop, there is no crash.
That is:
```
int long_c2i_ltmp;
int *long_c2i_cont;

void
long_c2i (long utmp, int i)
{
  for(int j = 0; j < 100; j++)
 {
  int neg = 1;
  switch (long_c2i_cont[0])
case 0:
neg = 0;

  for (; i; i++)
if (neg)
  utmp |= long_c2i_cont[i] ^ 5;
else
  utmp |= long_c2i_cont[i];
  long_c2i_ltmp = utmp;
 }
}
```

That is because the loop that is being chosen here for the `->loop_father` is
the outer most loop (that was just added).

Re: [PATCH v3] c++: Fix compile-time-hog in cp_fold_immediate_r [PR111660]

2023-10-19 Thread Marek Polacek
On Thu, Oct 19, 2023 at 01:02:33PM -0400, Jason Merrill wrote:
> On 10/19/23 12:55, Marek Polacek wrote:
> > On Thu, Oct 19, 2023 at 12:32:49PM -0400, Jason Merrill wrote:
> > > On 10/19/23 10:14, Marek Polacek wrote:
> > > > On Thu, Oct 19, 2023 at 10:06:01AM -0400, Jason Merrill wrote:
> > > > > On 10/19/23 09:39, Patrick Palka wrote:
> > > > > > On Tue, 17 Oct 2023, Marek Polacek wrote:
[...]
> > > > > > > > >  case PTRMEM_CST:
> > > > > > > > >if (TREE_CODE (PTRMEM_CST_MEMBER (stmt)) == 
> > > > > > > > > FUNCTION_DECL
> > > > > > > > > && DECL_IMMEDIATE_FUNCTION_P (PTRMEM_CST_MEMBER 
> > > > > > > > > (stmt)))
> > > > > > > > > @@ -1162,8 +1141,35 @@ cp_fold_r (tree *stmt_p, int 
> > > > > > > > > *walk_subtrees, void *data_)
> > > > > > > > >tree stmt = *stmt_p;
> > > > > > > > >enum tree_code code = TREE_CODE (stmt);
> > > > > > > > > -  if (cxx_dialect > cxx17)
> > > > > > > > > -cp_fold_immediate_r (stmt_p, walk_subtrees, data);
> > > > > > > > > +  if (cxx_dialect >= cxx20)
> > > > > > > > > +{
> > > > > > > > > +  /* Unfortunately we must handle code like
> > > > > > > > > +false ? bar () : 42
> > > > > > > > > +  where we have to check bar too.  The cp_fold call 
> > > > > > > > > below could
> > > > > > > > > +  fold the ?: into a constant before we've checked it.  
> > > > > > > > > */
> > > > > > > > > +  if (code == COND_EXPR)
> > > > > > > > > + {
> > > > > > > > > +   auto then_fn = cp_fold_r, else_fn = cp_fold_r;
> > > > > > > > > +   /* See if we can figure out if either of the branches 
> > > > > > > > > is dead.  If it
> > > > > > > > > +  is, we don't need to do everything that cp_fold_r 
> > > > > > > > > does.  */
> > > > > > > > > +   tree cond = maybe_constant_value (TREE_OPERAND (stmt, 
> > > > > > > > > 0));
> > > > > > > > > +   if (integer_zerop (cond))
> > > > > > > > > + then_fn = cp_fold_immediate_r;
> > > > > > > > > +   else if (TREE_CODE (cond) == INTEGER_CST)
> > > > > > > > > + else_fn = cp_fold_immediate_r;
> > > > > > > > > +
> > > > > > > > > +   cp_walk_tree (_OPERAND (stmt, 0), cp_fold_r, 
> > > > > > > > > data, nullptr);
> > > > > > > > 
> > > > > > > > I wonder about doing this before maybe_constant_value, to 
> > > > > > > > hopefully reduce
> > > > > > > > the redundant calculations?  OK either way.
> > > > > > > 
> > > > > > > Yeah, I was toying with that, I had
> > > > > > > 
> > > > > > >  foo() ? x : y
> > > > > > > 
> > > > > > > where foo was a constexpr function but the cp_fold_r on op0 
> > > > > > > didn't eval it
> > > > > > > to a constant :(.
> > > > > > 
> > > > > > IIUC that's because cp_fold evaluates constexpr calls only with -O, 
> > > > > > so
> > > > > > doing cp_fold_r before maybe_constant_value on the condition should
> > > > > > still be desirable in that case?
> > > > > 
> > > > > Hmm, and if the cp_fold_r doesn't reduce the test to a constant, I 
> > > > > would
> > > > > think that folding the COND_EXPR also won't discard the other branch, 
> > > > > so we
> > > > > shouldn't need to work harder to get a constant here, so we don't 
> > > > > need to
> > > > > call maybe_constant_value at all?
> > > > 
> > > > Sorry, I hadn't seen this message when I posted the tweak.  But the
> > > > maybe_constant_value call on TREE_OPERAND (stmt, 0) should still make
> > > > sense because it can render a branch dead even on -O0.  Right?
> > > 
> > > But if op0 isn't constant after cp_fold, folding the COND_EXPR won't 
> > > discard
> > > the branch, so we don't need to do the extra work to find out that it's
> > > actually dead.
> > 
> > Hmm, so how about this?  Thus far dg.exp passed.
> > 
> > -- >8 --
> > This patch is an optimization tweak for cp_fold_r.  If we cp_fold_r the
> > COND_EXPR's op0 first, we may be able to evaluate it to a constant if -O.
> > cp_fold has:
> > 
> > 3143 if (callee && DECL_DECLARED_CONSTEXPR_P (callee)
> > 3144 && !flag_no_inline)
> > ...
> > 3151 r = maybe_constant_value (x, /*decl=*/NULL_TREE,
> > 
> > flag_no_inline is 1 for -O0:
> > 
> > 1124   if (opts->x_optimize == 0)
> > 1125 {
> > 1126   /* Inlining does not work if not optimizing,
> > 1127  so force it not to be done.  */
> > 1128   opts->x_warn_inline = 0;
> > 1129   opts->x_flag_no_inline = 1;
> > 1130 }
> > 
> > but otherwise it's 0 and cp_fold will maybe_constant_value calls to
> > constexpr functions.  And if it doesn't, then folding the COND_EXPR
> > will keep both arms, and we can avoid calling maybe_constant_value.
> > 
> > gcc/cp/ChangeLog:
> > 
> > * cp-gimplify.cc (cp_fold_r): Don't call maybe_constant_value.
> > ---
> >   gcc/cp/cp-gimplify.cc | 7 +++
> >   1 file changed, 3 insertions(+), 4 deletions(-)
> > 
> > diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
> > index a282c3930a3..385042aeef2 100644
> > --- a/gcc/cp/cp-gimplify.cc
> > 

[Bug tree-optimization/111878] [14 Regression] ICE: in get_loop_exit_edges, at cfgloop.cc:1204 with -O3 -fgraphite-identity -fsave-optimization-record

2023-10-19 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111878

Andrew Pinski  changed:

   What|Removed |Added

   Last reconfirmed||2023-10-19
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1

--- Comment #2 from Andrew Pinski  ---
`-O3 -fgraphite-identity -fdump-tree-graphite` will also cause the ICE.

The code in graphite is:
dump_user_location_t loc = find_loop_location
  (scops[i]->scop_info->region.entry->dest->loop_father);
dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loc,
 "loop nest optimized\n");

[Bug ipa/111873] [12/13/14 Regression] runtime Segmentation fault with '-O3 -fno-code-hoisting -fno-early-inlining -fno-tree-fre -fno-tree-loop-optimize -fno-tree-pre'

2023-10-19 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111873

Andrew Pinski  changed:

   What|Removed |Added

Summary|runtime Segmentation fault  |[12/13/14 Regression]
   |with '-O3   |runtime Segmentation fault
   |-fno-code-hoisting  |with '-O3
   |-fno-early-inlining |-fno-code-hoisting
   |-fno-tree-fre   |-fno-early-inlining
   |-fno-tree-loop-optimize |-fno-tree-fre
   |-fno-tree-pre'  |-fno-tree-loop-optimize
   ||-fno-tree-pre'
   Target Milestone|--- |12.4
 Status|UNCONFIRMED |NEW
 CC||marxin at gcc dot gnu.org
 Ever confirmed|0   |1
   Last reconfirmed||2023-10-19
  Component|middle-end  |ipa

--- Comment #3 from Andrew Pinski  ---
Confirmed.
It is the inliner which messes up and introduces the store to a const variable.
We go from:
```
int main ()
{
   [local count: 1073741824]:
  h (c);
  return 0;

}


void h (const struct a i)
{
  const short int i$b;
  int _1;

   [local count: 1073741824]:
  i$b_5 = i.b;
  i.b = i$b_5;
```

To:
```
int main ()
{
  int D.2031;
  const short int i$b;
  int _4;
  int _6;

   [local count: 1073741824]:
  i$b_3 = 0;
  c.b = i$b_3;
```

Which is totally wrong.

Re: [PATCH v2] gcc: Introduce -fhardened

2023-10-19 Thread Marek Polacek
On Thu, Oct 19, 2023 at 01:33:51PM +0100, Sam James wrote:
> 
> Richard Biener  writes:
> 
> > On Wed, Oct 11, 2023 at 10:48 PM Marek Polacek  wrote:
> >>
> >> On Tue, Sep 19, 2023 at 10:58:19AM -0400, Marek Polacek wrote:
> >> > On Mon, Sep 18, 2023 at 08:57:39AM +0200, Richard Biener wrote:
> >> > > On Fri, Sep 15, 2023 at 5:09 PM Marek Polacek via Gcc-patches
> >> > >  wrote:
> >> > > >
> >> > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, 
> >> > > > powerpc64le-unknown-linux-gnu,
> >> > > > and aarch64-unknown-linux-gnu; ok for trunk?
> >> > > >
> >> > > > -- >8 --
> >> > > > In 
> >> > > > 
> >> > > > I proposed -fhardened, a new umbrella option that enables a 
> >> > > > reasonable set
> >> > > > of hardening flags.  The read of the room seems to be that the option
> >> > > > would be useful.  So here's a patch implementing that option.
> >> > > >
> >> > > > Currently, -fhardened enables:
> >> > > >
> >> > > >   -D_FORTIFY_SOURCE=3 (or =2 for older glibcs)
> >> > > >   -D_GLIBCXX_ASSERTIONS
> >> > > >   -ftrivial-auto-var-init=pattern
> >
> > I think =zero is much better here given the overhead is way
> > cheaper and pointers get a more reliable behavior.
> 
> Yes please, as I wouldn't want us to use =pattern distro-wide.

Thanks for the feedback, I switched back to =zero.

Marek



Re: [PATCH v2] gcc: Introduce -fhardened

2023-10-19 Thread Marek Polacek
On Wed, Oct 18, 2023 at 08:12:37PM +, Qing Zhao wrote:
> Marek,
> 
> Sorry for the late comment (I was just back from a long vacation immediate 
> after Cauldron). 

No worries.
 
> One question:
> 
> Is the option “-fhandened” for production build or for development build? 

I intend -fhardened to be for production builds.

> If it’s for development build, then adding -ftrivial-auto-var-init=pattern is 
> reasonable since the major purpose for  -ftrivial-auto-var-init=pattern is 
> for debugging, the runtime overhead of -ftrivial-auto-var-init=pattern is 
> higher then -ftrivial-auto-var-init=zero.
> 
> However, if it’s for production build, then adding 
> -ftrivial-auto-var-init=zero is better since the major purpose for 
> -ftrivial-auto-var-init=zero is for production build to eliminate all 
> uninitialization. And the runtime overhead of =zero is smaller than =pattern.

Okay, I've changed the patch back to enabling -ftrivial-auto-var-init=zero.

In future C++ modes, we may have to default to -ftrivial-auto-var-init=zero
as per  anyway.  On the other hand, it seems that
 might change that again...

Marek



Re: [PATCH V3 00/11] Refactor and cleanup vsetvl pass

2023-10-19 Thread Patrick O'Neill

I tested it this morning on my machine and it passed!

Tested against:
04d6c74564b7eb51660a00b35353aeab706b5a50

Using targets:
glibc rv32gcv qemu
glibc rv64gcv qemu

This patch series does not introduce any new failures.

Here's a list of *resolved* failures by this patch series:
rv64gcv:
FAIL: gfortran.dg/host_assoc_function_7.f90   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution test

FAIL: gfortran.dg/host_assoc_function_7.f90   -O3 -g  execution test

rv32gcv:
FAIL: gcc.target/riscv/rvv/autovec/binop/narrow_run-1.c execution test
FAIL: gfortran.dg/host_assoc_function_7.f90   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution test

FAIL: gfortran.dg/host_assoc_function_7.f90   -O3 -g  execution test

Thanks for the quick revision Lehua!

Tested-by: Patrick O'Neill 

Patrick

On 10/19/23 01:50, 钟居哲 wrote:

LGTM now. But wait for Patrick CI testing.

Hi, @Patrick. Could you apply this patch and trigger CI in your 
github  so that we can see the full running result.


Issues · patrick-rivos/riscv-gnu-toolchain · GitHub 




juzhe.zh...@rivai.ai

*From:* Lehua Ding 
*Date:* 2023-10-19 16:33
*To:* gcc-patches 
*CC:* juzhe.zhong ; kito.cheng
; rdapp.gcc
; palmer ;
jeffreyalaw ; lehua.ding

*Subject:* [PATCH V3 00/11] Refactor and cleanup vsetvl pass
This patch refactors and cleanups the vsetvl pass in order to make
the code
easier to modify and understand. This patch does several things:
1. Introducing a virtual CFG for vsetvl infos and Phase 1, 2 and 3
only maintain
   and modify this virtual CFG. Phase 4 performs insertion,
modification and
   deletion of vsetvl insns based on the virtual CFG. The Basic
block in the
   virtual CFG is called vsetvl_block_info and the vsetvl
information inside
   is called vsetvl_info.
2. Combine Phase 1 and 2 into a single Phase 1 and unified the
demand system,
   this Phase only fuse local vsetvl info in forward direction.
3. Refactor Phase 3, change the logic for determining whether to
uplift vsetvl
   info to a pred basic block to a more unified method that there
is a vsetvl
   info in the vsetvl defintion reaching in compatible with it.
4. Place all modification operations to the RTL in Phase 4 and
Phase 5.
   Phase 4 is responsible for inserting, modifying and deleting vsetvl
   instructions based on fully optimized vsetvl infos. Phase 5
removes the avl
   operand from the RVV instruction and removes the unused dest
operand
   register from the vsetvl insns.
These modifications resulted in some testcases needing to be
updated. The reasons
for updating are summarized below:
1. more optimized
vlmax_back_prop-25.c/vlmax_back_prop-26.c/vlmax_conflict-3.c/
   vlmax_conflict-12.c/vsetvl-13.c/vsetvl-23.c/
avl_single-23.c/avl_single-89.c/avl_single-95.c/pr109773-1.c
2. less unnecessary fusion
avl_single-46.c/imm_bb_prop-1.c/pr109743-2.c/vsetvl-18.c
3. local fuse direction (backward -> forward)
   scalar_move-1.c/
4. add some bugfix testcases.
   pr111037-3.c/pr111037-4.c
   avl_single-89.c
PR target/111037
PR target/111234
PR target/111725
Lehua Ding (11):
  RISC-V: P1: Refactor
avl_info/vl_vtype_info/vector_insn_info/vector_block_info
  RISC-V: P2: Refactor and cleanup demand system
  RISC-V: P3: Refactor vector_infos_manager
  RISC-V: P4: move method from pass_vsetvl to pre_vsetvl
  RISC-V: P5: combine phase 1 and 2
  RISC-V: P6: Add computing reaching definition data flow
  RISC-V: P7: Move earliest fuse and lcm code to pre_vsetvl class
  RISC-V: P8: Refactor emit-vsetvl phase and delete post optimization
  RISC-V: P9: Cleanup and reorganize helper functions
  RISC-V: P10: Delete riscv-vsetvl.h and adjust riscv-vsetvl.def
  RISC-V: P11: Adjust and add testcases
gcc/config/riscv/riscv-vsetvl.cc  | 6502 +++--
gcc/config/riscv/riscv-vsetvl.def |  641 +-
gcc/config/riscv/riscv-vsetvl.h   |  488 --
gcc/config/riscv/t-riscv  |    2 +-
.../gcc.target/riscv/rvv/base/scalar_move-1.c |    2 +-
.../riscv/rvv/vsetvl/avl_single-104.c |   35 +
.../riscv/rvv/vsetvl/avl_single-105.c |   23 +
.../riscv/rvv/vsetvl/avl_single-106.c |   34 +
.../riscv/rvv/vsetvl/avl_single-107.c |   41 +
.../riscv/rvv/vsetvl/avl_single-108.c |   41 +

Re: [PATCH 06/11] haifa-sched: Allow for NOTE_INSN_DELETED at start of epilogue

2023-10-19 Thread Alex Coplan
Hi Jeff,

On 19/10/2023 08:54, Jeff Law wrote:
> 
> 
> On 10/17/23 14:48, Alex Coplan wrote:
> > haifa-sched.cc:remove_notes asserts that it lands on a real (non-note)
> > insn after advancing past NOTE_INSN_EPILOGUE_BEG, but with the upcoming
> > post-RA aarch64 load pair pass enabled, we can land on
> > NOTE_INSN_DELETED.
> > 
> > This patch adjusts remove_notes to remove these if they occur at the
> > start of the epilogue instead of asserting.
> > 
> > Bootstrapped/regtested as a series on aarch64-linux-gnu, OK for trunk?
> > 
> > gcc/ChangeLog:
> > 
> > * haifa-sched.cc (remove_notes): Allow for NOTE_INSN_DELETED at
> > the start of the epilgoue, remove these.
> One could argue that the pass should have actually deleted the insn rather
> than just turned it into a NOTE_INSN_DELETED.  Is there some reason that's
> not done?  A NOTE_INSN_DELETED carries no useful information.

Yeah, I think we can teach rtl-ssa to do that.  I'm testing a patch to
do so now.  I think the pass is the first consumer of the deletion
functionality (insn_change::DELETE) in rtl-ssa, and I can't see a reason for it
to keep the NOTE_INSN_DELETED around.

So consider the scheduler patch withdrawn for now, thanks!

Alex

> 
> 
> > + /* Skip over any NOTE_INSN_DELETED at the start of the epilogue.
> > +  */
> 
> Don't bring the close comment down to a new line.  If it fits, but it on the
> last line of the actual comment.  Otherwise bring down part of comment so
> that we don't have the close comment on a line by itself.
> 
> Jeff


Re: [RFC] Add function attribute: null_terminated_string_arg(PARAM_IDX)

2023-10-19 Thread David Malcolm
On Thu, 2023-10-19 at 16:13 +0200, Andreas Schwab wrote:
> On Okt 19 2023, David Malcolm wrote:

[...]

> > +   /* First, check for a null-terminated string *without*
> > +  emitting emitting warnings (via a null context), to
> 
> -emitting
> 

Thanks; I've fixed this in my working copy.

Dave



[Bug testsuite/111883] New: Wstringop-overflow-6.C FAILs with -std=c++26

2023-10-19 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111883

Bug ID: 111883
   Summary: Wstringop-overflow-6.C FAILs with -std=c++26
   Product: gcc
   Version: 13.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: testsuite
  Assignee: unassigned at gcc dot gnu.org
  Reporter: mpolacek at gcc dot gnu.org
  Target Milestone: ---

FAIL: g++.dg/warn/Wstringop-overflow-6.C  -std=gnu++26 (test for excess errors)

the excess warning is:

$ xg++ -c Wstringop-overflow-6.C  -std=gnu++26 -O2 -Wall -Wsystem-headers 
In file included from
/home/mpolacek/x/trunk/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/basic_string.h:51,
 from
/home/mpolacek/x/trunk/x86_64-pc-linux-gnu/libstdc++-v3/include/string:54,
 from
/home/mpolacek/x/trunk/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/locale_classes.h:40,
 from
/home/mpolacek/x/trunk/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/ios_base.h:41,
 from
/home/mpolacek/x/trunk/x86_64-pc-linux-gnu/libstdc++-v3/include/ios:44,
 from
/home/mpolacek/x/trunk/x86_64-pc-linux-gnu/libstdc++-v3/include/ostream:40,
 from
/home/mpolacek/x/trunk/x86_64-pc-linux-gnu/libstdc++-v3/include/iostream:41,
 from Wstringop-overflow-6.C:6:
/home/mpolacek/x/trunk/x86_64-pc-linux-gnu/libstdc++-v3/include/charconv: In
function ‘std::from_chars_result std::from_chars(const char*, const char*,
_Float16&, chars_format)’:
/home/mpolacek/x/trunk/x86_64-pc-linux-gnu/libstdc++-v3/include/charconv:687:17:
warning: converting to ‘_Float16’ from ‘float’ with greater conversion rank
  687 |   __value = __val;
  | ^
/home/mpolacek/x/trunk/x86_64-pc-linux-gnu/libstdc++-v3/include/charconv: In
function ‘std::from_chars_result std::from_chars(const char*, const char*,
__gnu_cxx::__bfloat16_t&, chars_format)’:
/home/mpolacek/x/trunk/x86_64-pc-linux-gnu/libstdc++-v3/include/charconv:763:17:
warning: converting to ‘__gnu_cxx::__bfloat16_t’ {aka ‘__bf16’} from ‘float’
with greater conversion rank
  763 |   __value = __val;
  | ^

[Bug fortran/111880] [9/10/11/12/13] False positive warning of obsolescent COMMON block with Fortran submodule

2023-10-19 Thread sgk at troutmask dot apl.washington.edu via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111880

--- Comment #3 from Steve Kargl  ---
On Thu, Oct 19, 2023 at 05:20:46PM +, zed.three at gmail dot com wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111880
> 
> --- Comment #2 from zed.three at gmail dot com ---
> The common block is in 'third_party_module', rather than 'foo',
> unless you mean that it is visible from 'foo'?

Exactly. 'not_my_code' is in the namespace for foo
through use association of third_party_module. 
It seems that trying to hide 'not_my_code' with PRIVATE
or 'use third_party_module, only : some_param' in foo 
does not mute the warning.  Likely, due to -std=f2018
and F2018, Sec. 4.2.

[Bug tree-optimization/110485] vectorizing simd clone calls without loop masking applied

2023-10-19 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110485

--- Comment #3 from CVS Commits  ---
The master branch has been updated by Andre Simoes Dias Vieira
:

https://gcc.gnu.org/g:8b704ed0b8f35ec1a57e70bd8e6913ba0e9d1f24

commit r14-4765-g8b704ed0b8f35ec1a57e70bd8e6913ba0e9d1f24
Author: Andre Vieira 
Date:   Thu Oct 19 18:28:12 2023 +0100

vect: don't allow fully masked loops with non-masked simd clones [PR
110485]

When analyzing a loop and choosing a simdclone to use it is possible to
choose
a simdclone that cannot be used 'inbranch' for a loop that can use partial
vectors.  This may lead to the vectorizer deciding to use partial vectors
which
are not supported for notinbranch simd clones.  This patch fixes that by
disabling the use of partial vectors once a notinbranch simd clone has been
selected.

gcc/ChangeLog:

PR tree-optimization/110485
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Disable
partial
vectors usage if a notinbranch simdclone has been selected.

gcc/testsuite/ChangeLog:

* gcc.dg/gomp/pr110485.c: New test.

[Bug fortran/111880] [9/10/11/12/13] False positive warning of obsolescent COMMON block with Fortran submodule

2023-10-19 Thread zed.three at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111880

--- Comment #2 from zed.three at gmail dot com ---
The common block is in 'third_party_module', rather than 'foo', unless you mean
that it is visible from 'foo'? It is still a surprising warning in this
location at any rate!

  1   2   3   >