[gcc r12-10534] c: Fix up pointer types to may_alias structures [PR114493]

2024-06-11 Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:d4126b329b2ae4f2b60efa1c7ad51b576de168bd

commit r12-10534-gd4126b329b2ae4f2b60efa1c7ad51b576de168bd
Author: Jakub Jelinek 
Date:   Thu Jun 6 22:12:11 2024 +0200

c: Fix up pointer types to may_alias structures [PR114493]

The following testcase ICEs in ipa-free-lang, because the
fld_incomplete_type_of
  gcc_assert (TYPE_CANONICAL (t2) != t2
              && TYPE_CANONICAL (t2) == TYPE_CANONICAL (TREE_TYPE (t)));
assertion doesn't hold.
This is because t is a struct S * type which was created while struct S
was still incomplete and without the may_alias attribute (and TYPE_CANONICAL
of a pointer type is a type created with can_alias_all = false argument),
while later on the may_alias attribute was used on the struct definition.
fld_incomplete_type_of then creates an incomplete distinct copy of the
structure (but with the original attributes), and because of the "may_alias"
attribute the pointers created for it are TYPE_REF_CAN_ALIAS_ALL, including
their TYPE_CANONICAL: while that is created with the !can_alias_all
argument, we later set the flag because of the "may_alias" attribute on the
to_type.

This doesn't ICE with C++ since PR70512 fix because the C++ FE sets
TYPE_REF_CAN_ALIAS_ALL on all pointer types to the class type (and its
variants) when the may_alias is added.

The following patch does that in the C FE as well.
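For readers unfamiliar with the attribute, here is a minimal sketch of what may_alias means at the source level. It is not part of the patch; the names raw_word, bits_via_may_alias, bits_via_memcpy and readers_agree are hypothetical, and the sketch assumes a compiler that implements the GCC may_alias extension and a target where float and unsigned have the same size.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical wrapper type.  With may_alias, accesses through a
   pointer to this struct are allowed to alias objects of any type
   (GCC extension).  */
struct __attribute__((__may_alias__)) raw_word {
  unsigned u;
};

/* Assumption of this sketch: both types occupy the same storage size.  */
static_assert (sizeof (float) == sizeof (unsigned),
               "sketch assumes float and unsigned have equal size");

/* Read the bits of a float through a may_alias pointer.  */
static unsigned
bits_via_may_alias (const float *f)
{
  return ((const struct raw_word *) f)->u;
}

/* Reference reader using memcpy, which is valid without any extension.  */
static unsigned
bits_via_memcpy (const float *f)
{
  unsigned u;
  memcpy (&u, f, sizeof u);
  return u;
}

/* Both readers should see the same bits.  */
static int
readers_agree (float f)
{
  return bits_via_may_alias (&f) == bits_via_memcpy (&f);
}
```

The pointer type struct raw_word * is the kind of type that has to carry TYPE_REF_CAN_ALIAS_ALL, which is why pointer types created before the attribute was seen need the fixup.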

    2024-06-06  Jakub Jelinek  

PR c/114493
* c-decl.cc (c_fixup_may_alias): New function.
(finish_struct): Call it if "may_alias" attribute is
specified.

* gcc.dg/pr114493-1.c: New test.
* gcc.dg/pr114493-2.c: New test.

(cherry picked from commit d5a3c6d43acb8b2211d9fb59d59482d74c010f01)

Diff:
---
 gcc/c/c-decl.cc   | 15 +++
 gcc/testsuite/gcc.dg/pr114493-1.c | 19 +++
 gcc/testsuite/gcc.dg/pr114493-2.c | 26 ++
 3 files changed, 60 insertions(+)

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index 619a2090937..668e234f019 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -8699,6 +8699,17 @@ finish_incomplete_vars (tree incomplete_vars, bool toplevel)
 }
 }
 
+/* TYPE is a struct or union that we're applying may_alias to after the body is
+   parsed.  Fixup any POINTER_TO types.  */
+
+static void
+c_fixup_may_alias (tree type)
+{
+  for (tree t = TYPE_POINTER_TO (type); t; t = TYPE_NEXT_PTR_TO (t))
+for (tree v = TYPE_MAIN_VARIANT (t); v; v = TYPE_NEXT_VARIANT (v))
+  TYPE_REF_CAN_ALIAS_ALL (v) = true;
+}
+
 /* Fill in the fields of a RECORD_TYPE or UNION_TYPE node, T.
LOC is the location of the RECORD_TYPE or UNION_TYPE's definition.
FIELDLIST is a chain of FIELD_DECL nodes for the fields.
@@ -8973,6 +8984,10 @@ finish_struct (location_t loc, tree t, tree fieldlist, tree attributes,
   warning_at (loc, 0, "union cannot be made transparent");
 }
 
+  if (lookup_attribute ("may_alias", TYPE_ATTRIBUTES (t)))
+for (x = TYPE_MAIN_VARIANT (t); x; x = TYPE_NEXT_VARIANT (x))
+  c_fixup_may_alias (x);
+
   tree incomplete_vars = C_TYPE_INCOMPLETE_VARS (TYPE_MAIN_VARIANT (t));
   for (x = TYPE_MAIN_VARIANT (t); x; x = TYPE_NEXT_VARIANT (x))
 {
diff --git a/gcc/testsuite/gcc.dg/pr114493-1.c b/gcc/testsuite/gcc.dg/pr114493-1.c
new file mode 100644
index 000..446f33eac3b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr114493-1.c
@@ -0,0 +1,19 @@
+/* PR c/114493 */
+/* { dg-do compile { target lto } } */
+/* { dg-options "-O2 -flto" } */
+
+void foo (void);
+struct S;
+struct S bar (struct S **);
+struct S qux (const struct S **);
+
+struct __attribute__((__may_alias__)) S {
+  int s;
+};
+
+struct S
+baz (void)
+{
+  foo ();
+  return (struct S) {};
+}
diff --git a/gcc/testsuite/gcc.dg/pr114493-2.c b/gcc/testsuite/gcc.dg/pr114493-2.c
new file mode 100644
index 000..93e3d6e5bc4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr114493-2.c
@@ -0,0 +1,26 @@
+/* PR c/114493 */
+/* { dg-do compile { target lto } } */
+/* { dg-options "-O2 -flto -std=c2x" } */
+
+void foo (void);
+struct S;
+struct S bar (struct S **);
+struct S qux (const struct S **);
+
+void
+corge (void)
+{
+  struct S { int s; } s;
+  s.s = 0;
+}
+
+struct __attribute__((__may_alias__)) S {
+  int s;
+};
+
+struct S
+baz (void)
+{
+  foo ();
+  return (struct S) {};
+}


[gcc r12-10531] rs6000: Fix up PCH in --enable-host-pie builds [PR115324]

2024-06-11 Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:bda8c28e6fcdbe0b486b54616877eec32c86d322

commit r12-10531-gbda8c28e6fcdbe0b486b54616877eec32c86d322
Author: Jakub Jelinek 
Date:   Mon Jun 3 23:11:06 2024 +0200

rs6000: Fix up PCH in --enable-host-pie builds [PR115324]

PCH doesn't work properly in --enable-host-pie configurations on
powerpc*-linux*.
The problem is that the rs6000_builtin_info and rs6000_instance_info
arrays mix pointers to .rodata/.data (bifname and attr_string point
to string literals in .rodata section, and the next member is either NULL
or &rs6000_instance_info[XXX]) and GC member (tree fntype).
Now, for normal GC this works just fine, we emit
  {
    &rs6000_instance_info[0].fntype,
    1 * (RS6000_INST_MAX),
    sizeof (rs6000_instance_info[0]),
    &gt_ggc_mx_tree_node,
    &gt_pch_nx_tree_node
  },
  {
    &rs6000_builtin_info[0].fntype,
    1 * (RS6000_BIF_MAX),
    sizeof (rs6000_builtin_info[0]),
    &gt_ggc_mx_tree_node,
    &gt_pch_nx_tree_node
  },
GC roots which are strided and thus cover only the fntype members of all
the elements of the two arrays.
For PCH though it actually results in saving those huge arrays (one is
130832 bytes, another 81568 bytes) into the .gch files and loading them back
in full.  While the bifname and attr_string and next pointers are marked as
GTY((skip)), they are actually saved to point to the .rodata and .data
sections of the process which writes the PCH, but because cc1/cc1plus etc.
are position independent executables with --enable-host-pie, when the
pointers are loaded back from the PCH file, they can point to completely
different addresses where nothing is mapped at all or where some unrelated
data happens to live.
While gengtype supports the callback option, that one is meant for
relocatable function pointers and doesn't work in the case of GTY arrays
inside of .data section anyway.

So, either we'd need to add some further GTY extensions, or the following
patch instead reworks it such that the fntype members which were the only
reason for PCH in those arrays are moved to separate arrays.

Size-wise in .data sections it is (in bytes):

                             vanilla  patched
rs6000_builtin_info           130832   110704
rs6000_instance_info           81568    40784
rs6000_overload_info            7392     7392
rs6000_builtin_info_fntype         0    10064
rs6000_instance_info_fntype        0    20392
sum                           219792   189336

where previously we saved/restored for PCH those 130832+81568 bytes, now we
save/restore just 10064+20392 bytes, so this change is beneficial for the
data section size.

Unfortunately, it grows the size of the rs6000_init_generated_builtins
function, vanilla had 218328 bytes, patched has 228668.

When I applied
 void
 rs6000_init_generated_builtins ()
 {
+  bifdata *rs6000_builtin_info_p;
+  tree *rs6000_builtin_info_fntype_p;
+  ovlddata *rs6000_instance_info_p;
+  tree *rs6000_instance_info_fntype_p;
+  ovldrecord *rs6000_overload_info_p;
+  __asm ("" : "=r" (rs6000_builtin_info_p) : "0" (rs6000_builtin_info));
+  __asm ("" : "=r" (rs6000_builtin_info_fntype_p) : "0" (rs6000_builtin_info_fntype));
+  __asm ("" : "=r" (rs6000_instance_info_p) : "0" (rs6000_instance_info));
+  __asm ("" : "=r" (rs6000_instance_info_fntype_p) : "0" (rs6000_instance_info_fntype));
+  __asm ("" : "=r" (rs6000_overload_info_p) : "0" (rs6000_overload_info));
+  #define rs6000_builtin_info rs6000_builtin_info_p
+  #define rs6000_builtin_info_fntype rs6000_builtin_info_fntype_p
+  #define rs6000_instance_info rs6000_instance_info_p
+  #define rs6000_instance_info_fntype rs6000_instance_info_fntype_p
+  #define rs6000_overload_info rs6000_overload_info_p
+
hack by hand, the size of the function is 209700 though, so if really
wanted, we could add __attribute__((__noipa__)) to the function when
building with recent enough GCC and pass pointers to the first elements
of the 5 arrays to the function as arguments.  If you want such a change,
could that be done incrementally?

2024-06-03  Jakub Jelinek  

PR target/115324
* config/rs6000/rs6000-gen-builtins.cc (write_decls): Remove
GTY markup from struct bifdata and struct ovlddata and remove their
fntype members.  Change next member in struct ovlddata and
first_instance member of struct ovldrecord to have int type rather
than struct ovlddata *.  Remove GTY markup from rs6000_builtin_info
and rs6000_instance_info arrays, declare new
rs6000_builtin_in

[gcc r12-10530] combine: Fix up simplify_compare_const [PR115092]

2024-06-11 Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:840bc6741680a9c4b58fa1005f19a5d2e7d4be1f

commit r12-10530-g840bc6741680a9c4b58fa1005f19a5d2e7d4be1f
Author: Jakub Jelinek 
Date:   Wed May 15 18:37:17 2024 +0200

combine: Fix up simplify_compare_const [PR115092]

The following testcases are miscompiled (with tons of GIMPLE
optimization disabled) because combine sees GE comparison of
1-bit sign_extract (i.e. something with [-1, 0] value range)
with (const_int -1) (which is always true) and optimizes it into
NE comparison of 1-bit zero_extract ([0, 1] value range) against
(const_int 0).
The reason is that simplify_compare_const first (correctly)
simplifies the comparison to
GE (ashift:SI something (const_int 31)) (const_int -2147483648)
and then an optimization for when the second operand is a power of 2
triggers.  That optimization is fine for powers of 2 which aren't
the signed minimum of the mode, or if it is NE, EQ, GEU or LTU
against the signed minimum of the mode, but for GE or LT optimizing
it into NE (or EQ) against const0_rtx is wrong: those cases
are always true or always false (but the function doesn't have
a standardized way to tell callers the comparison is now unconditional).

The following patch just disables the optimization in that case.
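The reasoning above can be checked with a small standalone model. This is only a sketch of the arithmetic, not combine's code; the function names are hypothetical. It assumes op0 is known to be either 0 or the power-of-2 constant c, which is the situation in which the rewrite from GE op0 c to NE op0 0 is attempted.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Original comparison: op0 >= c (signed).  */
static bool
ge_compare (int op0, int c)
{
  return op0 >= c;
}

/* The rewritten comparison: op0 != 0.  */
static bool
ne_zero (int op0)
{
  return op0 != 0;
}

/* True if the rewrite gives the same answer for both values op0 can
   take, namely 0 and c.  */
static bool
rewrite_is_valid (int c)
{
  return ge_compare (0, c) == ne_zero (0)
         && ge_compare (c, c) == ne_zero (c);
}

/* Positive powers of 2 are fine; the signed minimum is not, because
   0 >= INT_MIN is true while 0 != 0 is false.  */
void
check_rewrite (void)
{
  assert (rewrite_is_valid (4));
  assert (!rewrite_is_valid (INT_MIN));
}
```

This matches the const_op > 0 condition the patch adds for GE and LT.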

2024-05-15  Jakub Jelinek  

PR rtl-optimization/114902
PR rtl-optimization/115092
* combine.cc (simplify_compare_const): Don't optimize
GE op0 SIGNED_MIN or LT op0 SIGNED_MIN into NE op0 const0_rtx or
EQ op0 const0_rtx.

* gcc.dg/pr114902.c: New test.
* gcc.dg/pr115092.c: New test.

(cherry picked from commit 0b93a0ae153ef70a82ff63e67926a01fdab9956b)

Diff:
---
 gcc/combine.cc  |  6 --
 gcc/testsuite/gcc.dg/pr114902.c | 23 +++
 gcc/testsuite/gcc.dg/pr115092.c | 16 
 3 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/gcc/combine.cc b/gcc/combine.cc
index 9a34ef847aa..e79500d40c9 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -11789,8 +11789,10 @@ simplify_compare_const (enum rtx_code code, machine_mode mode,
  `and'ed with that bit), we can replace this with a comparison
  with zero.  */
   if (const_op
-  && (code == EQ || code == NE || code == GE || code == GEU
- || code == LT || code == LTU)
+  && (code == EQ || code == NE || code == GEU || code == LTU
+ /* This optimization is incorrect for signed >= INT_MIN or
+< INT_MIN, those are always true or always false.  */
+ || ((code == GE || code == LT) && const_op > 0))
   && is_a <scalar_int_mode> (mode, &int_mode)
   && GET_MODE_PRECISION (int_mode) - 1 < HOST_BITS_PER_WIDE_INT
   && pow2p_hwi (const_op & GET_MODE_MASK (int_mode))
diff --git a/gcc/testsuite/gcc.dg/pr114902.c b/gcc/testsuite/gcc.dg/pr114902.c
new file mode 100644
index 000..60684faa25d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr114902.c
@@ -0,0 +1,23 @@
+/* PR rtl-optimization/114902 */
+/* { dg-do run } */
+/* { dg-options "-O1 -fno-tree-fre -fno-tree-forwprop -fno-tree-ccp -fno-tree-dominator-opts" } */
+
+__attribute__((noipa))
+int foo (int x)
+{
+  int a = ~x;
+  int t = a & 1;
+  int e = -t;
+  int b = e >= -1;
+  if (b)
+return 0;
+  __builtin_trap ();
+}
+
+int
+main ()
+{
+  foo (-1);
+  foo (0);
+  foo (1);
+}
diff --git a/gcc/testsuite/gcc.dg/pr115092.c b/gcc/testsuite/gcc.dg/pr115092.c
new file mode 100644
index 000..c9047f4d321
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr115092.c
@@ -0,0 +1,16 @@
+/* PR rtl-optimization/115092 */
+/* { dg-do run } */
+/* { dg-options "-O1 -fgcse -ftree-pre -fno-tree-dominator-opts -fno-tree-fre -fno-guess-branch-probability" } */
+
+int a, b, c = 1, d, e;
+
+int
+main ()
+{
+  int f, g = a;
+  b = -2;
+  f = -(1 >> ((c && b) & ~a));
+  if (f <= b)
+d = g / e;
+  return 0;
+}


[gcc r12-10532] builtins: Force SAVE_EXPR for __builtin_{add, sub, mul}_overflow [PR108789]

2024-06-11 Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:91a371254494934e191e3060ae2a86905eb4b2b2

commit r12-10532-g91a371254494934e191e3060ae2a86905eb4b2b2
Author: Jakub Jelinek 
Date:   Tue Jun 4 12:28:01 2024 +0200

builtins: Force SAVE_EXPR for __builtin_{add,sub,mul}_overflow [PR108789]

The following testcase is miscompiled, because we use save_expr
on the .{ADD,SUB,MUL}_OVERFLOW call we are creating, but if the first
two operands are not INTEGER_CSTs (in that case we just fold it right away)
but are TREE_READONLY/!TREE_SIDE_EFFECTS, save_expr doesn't actually
create a SAVE_EXPR at all and so we lower it to
*arg2 = REALPART_EXPR (.ADD_OVERFLOW (arg0, arg1)), \
IMAGPART_EXPR (.ADD_OVERFLOW (arg0, arg1))
which evaluates the ifn twice and just hopes it will be CSEd back.
As *arg2 aliases *arg0, that is not the case.
The builtins are really never const/pure as they store into what
the third argument points to, so after handling the INTEGER_CST+INTEGER_CST
case, I think we should just always use SAVE_EXPR.  Just building SAVE_EXPR
by hand and setting TREE_SIDE_EFFECTS on it doesn't work, because
c_fully_fold optimizes it away again, so the following patch marks the
ifn calls as TREE_SIDE_EFFECTS (but doesn't do it for the
__builtin_{add,sub,mul}_overflow_p cases, which were designed for use
especially in constant expressions and don't really evaluate the
realpart side, so we don't really need a SAVE_EXPR in that case).
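To see why evaluating the operation twice matters when the result pointer aliases an operand, here is a rough C model of the two lowerings. It is a sketch only: add_ovf, add_once and add_twice are hypothetical helpers written in plain C rather than the internal function, and the overflow test is the usual unsigned wrap-around check.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for .ADD_OVERFLOW on unsigned operands:
   stores the wrapped sum and returns 1 if it wrapped.  */
static int
add_ovf (unsigned a, unsigned b, unsigned *res)
{
  *res = a + b;
  return *res < a;
}

/* Model of the fixed lowering: one evaluation, both parts taken
   from the saved result.  */
static int
add_once (unsigned *r, const unsigned *a, const unsigned *b)
{
  unsigned sum;
  int ovf = add_ovf (*a, *b, &sum);
  *r = sum;
  return ovf;
}

/* Model of the buggy lowering: the sum is stored first, then the
   operation is evaluated again to obtain the overflow flag.  */
static int
add_twice (unsigned *r, const unsigned *a, const unsigned *b)
{
  unsigned sum, ignored;
  (void) add_ovf (*a, *b, &sum);
  *r = sum;                             /* may clobber *a and *b */
  return add_ovf (*a, *b, &ignored);    /* re-reads the operands */
}

/* With r, a and b all pointing at the same object the two differ.  */
void
check_aliasing (void)
{
  unsigned x = UINT_MAX / 4 + 1;
  assert (add_once (&x, &x, &x) == 0);
  x = UINT_MAX / 4 + 1;
  assert (add_twice (&x, &x, &x) == 1);
}
```

In the aliased case the second evaluation sees the already updated operand and reports an overflow that never happened, which is the failure the new test exercises.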

2024-06-04  Jakub Jelinek  

PR middle-end/108789
* builtins.cc (fold_builtin_arith_overflow): For ovf_only,
don't call save_expr and don't build REALPART_EXPR, otherwise
set TREE_SIDE_EFFECTS on call before calling save_expr.

* gcc.c-torture/execute/pr108789.c: New test.

(cherry picked from commit b8e28381cb5c0cddfe5201faf799d8b27f5d7d6c)

Diff:
---
 gcc/builtins.cc| 16 ++-
 gcc/testsuite/gcc.c-torture/execute/pr108789.c | 39 ++
 2 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 57929a42bc4..f91947020b6 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -9180,7 +9180,21 @@ fold_builtin_arith_overflow (location_t loc, enum built_in_function fcode,
   tree ctype = build_complex_type (type);
   tree call = build_call_expr_internal_loc (loc, ifn, ctype, 2,
arg0, arg1);
-  tree tgt = save_expr (call);
+  tree tgt;
+  if (ovf_only)
+   {
+ tgt = call;
+ intres = NULL_TREE;
+   }
+  else
+   {
+ /* Force SAVE_EXPR even for calls which satisfy tree_invariant_p_1,
+as while the call itself is const, the REALPART_EXPR store is
+certainly not.  And in any case, we want just one call,
+not multiple and trying to CSE them later.  */
+ TREE_SIDE_EFFECTS (call) = 1;
+ tgt = save_expr (call);
+   }
   intres = build1_loc (loc, REALPART_EXPR, type, tgt);
   ovfres = build1_loc (loc, IMAGPART_EXPR, type, tgt);
   ovfres = fold_convert_loc (loc, boolean_type_node, ovfres);
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr108789.c b/gcc/testsuite/gcc.c-torture/execute/pr108789.c
new file mode 100644
index 000..32ee19be1c4
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr108789.c
@@ -0,0 +1,39 @@
+/* PR middle-end/108789 */
+
+int
+add (unsigned *r, const unsigned *a, const unsigned *b)
+{
+  return __builtin_add_overflow (*a, *b, r);
+}
+
+int
+mul (unsigned *r, const unsigned *a, const unsigned *b)
+{
+  return __builtin_mul_overflow (*a, *b, r);
+}
+
+int
+main ()
+{
+  unsigned x;
+
+  /* 1073741824U + 1073741824U should not overflow.  */
+  x = (__INT_MAX__ + 1U) / 2;
+  if (add (&x, &x, &x))
+__builtin_abort ();
+
+  /* 256U * 256U should not overflow */
+  x = 1U << (sizeof (int) * __CHAR_BIT__ / 4);
+  if (mul (&x, &x, &x))
+__builtin_abort ();
+
+  /* 2147483648U + 2147483648U should overflow */
+  x = __INT_MAX__ + 1U;
+  if (!add (&x, &x, &x))
+__builtin_abort ();
+
+  /* 65536U * 65536U should overflow */
+  x = 1U << (sizeof (int) * __CHAR_BIT__ / 2);
+  if (!mul (&x, &x, &x))
+__builtin_abort ();
+}


[gcc r12-10529] tree-inline: Remove .ASAN_MARK calls when inlining functions into no_sanitize callers [PR114956]

2024-06-11 Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:25bd98dfd99e92c57ff393d393f54d028d7f86f4

commit r12-10529-g25bd98dfd99e92c57ff393d393f54d028d7f86f4
Author: Jakub Jelinek 
Date:   Tue May 7 21:29:14 2024 +0200

tree-inline: Remove .ASAN_MARK calls when inlining functions into no_sanitize callers [PR114956]

In r9-5742 we started allowing always_inline functions to be inlined into
functions which have e.g. address sanitization disabled, even when the
always_inline function is implicitly sanitized through command line options.

This mostly works fine because most of the asan instrumentation is done only
late after ipa, but as the following testcase shows, the .ASAN_MARK ifn calls
the gimplifier adds can result in ICEs.

Fixed by dropping those during inlining, similarly to how we drop
.TSAN_FUNC_EXIT calls.

2024-05-07  Jakub Jelinek  

PR sanitizer/114956
* tree-inline.cc: Include asan.h.
(copy_bb): Remove also .ASAN_MARK calls if id->dst_fn has asan/hwasan
sanitization disabled.

* gcc.dg/asan/pr114956.c: New test.

(cherry picked from commit d4e25cf4f7c1f51a8824cc62bbb85a81a41b829a)

Diff:
---
 gcc/testsuite/gcc.dg/asan/pr114956.c | 26 ++
 gcc/tree-inline.cc   | 28 +---
 2 files changed, 47 insertions(+), 7 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/asan/pr114956.c b/gcc/testsuite/gcc.dg/asan/pr114956.c
new file mode 100644
index 000..fb87d514f25
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/asan/pr114956.c
@@ -0,0 +1,26 @@
+/* PR sanitizer/114956 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fsanitize=address,null" } */
+
+int **a;
+void qux (int *);
+
+__attribute__((always_inline)) static inline int *
+foo (void)
+{
+  int b[1];
+  qux (b);
+  return a[1];
+}
+
+__attribute__((no_sanitize_address)) void
+bar (void)
+{
+  *a = foo ();
+}
+
+void
+baz (void)
+{
+  bar ();
+}
diff --git a/gcc/tree-inline.cc b/gcc/tree-inline.cc
index 8f5a44ee6f5..a49724f3ff2 100644
--- a/gcc/tree-inline.cc
+++ b/gcc/tree-inline.cc
@@ -65,6 +65,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "symbol-summary.h"
 #include "symtab-thunks.h"
 #include "symtab-clones.h"
+#include "asan.h"
 
 /* I'm not real happy about this, but we need to handle gimple and
non-gimple trees.  */
@@ -2210,13 +2211,26 @@ copy_bb (copy_body_data *id, basic_block bb,
}
  else if (call_stmt
   && id->call_stmt
-  && gimple_call_internal_p (stmt)
-  && gimple_call_internal_fn (stmt) == IFN_TSAN_FUNC_EXIT)
-   {
- /* Drop TSAN_FUNC_EXIT () internal calls during inlining.  */
- gsi_remove (&copy_gsi, false);
- continue;
-   }
+  && gimple_call_internal_p (stmt))
+   switch (gimple_call_internal_fn (stmt))
+ {
+ case IFN_TSAN_FUNC_EXIT:
+   /* Drop .TSAN_FUNC_EXIT () internal calls during inlining.  */
+   gsi_remove (&copy_gsi, false);
+   continue;
+ case IFN_ASAN_MARK:
+   /* Drop .ASAN_MARK internal calls during inlining into
+  no_sanitize functions.  */
+   if (!sanitize_flags_p (SANITIZE_ADDRESS, id->dst_fn)
+   && !sanitize_flags_p (SANITIZE_HWADDRESS, id->dst_fn))
+ {
+   gsi_remove (&copy_gsi, false);
+   continue;
+ }
+   break;
+ default:
+   break;
+ }
 
  /* Statements produced by inlining can be unfolded, especially
 when we constant propagated some operands.  We can't fold


[gcc r12-10527] openmp: Copy DECL_LANG_SPECIFIC and DECL_LANG_FLAG_? to tree-nested decl copy [PR114825]

2024-06-11 Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:cc96dc569f74b7410a97b4beee16435fc2abcfdd

commit r12-10527-gcc96dc569f74b7410a97b4beee16435fc2abcfdd
Author: Jakub Jelinek 
Date:   Thu Apr 25 20:09:35 2024 +0200

openmp: Copy DECL_LANG_SPECIFIC and DECL_LANG_FLAG_? to tree-nested decl copy [PR114825]

tree-nested.cc creates in 2 spots artificial VAR_DECLs, one of them is used
both for debug info and OpenMP/OpenACC lowering purposes, the other solely
for OpenMP/OpenACC lowering purposes.
When the decls are used in OpenMP/OpenACC lowering, the OMP langhooks (mostly
Fortran, C just a little and C++ doesn't have nested functions) then inspect
the flags on the vars and based on that decide how to lower the corresponding
clauses.

Unfortunately we weren't copying DECL_LANG_SPECIFIC and DECL_LANG_FLAG_?, so
the langhooks made decisions on the default flags on those instead.
As the original decl isn't necessarily a VAR_DECL, could be e.g. PARM_DECL,
using copy_node wouldn't work properly, so this patch just copies those
flags in addition to other flags it was copying already.  And I've removed
code duplication by introducing a helper function which does copying common
to both uses.

2024-04-25  Jakub Jelinek  

PR fortran/114825
* tree-nested.cc (get_debug_decl): New function.
(get_nonlocal_debug_decl): Use it.
(get_local_debug_decl): Likewise.

* gfortran.dg/gomp/pr114825.f90: New test.

(cherry picked from commit 14d48516e588ad2b35e2007b3970bdcb1b3f145c)

Diff:
---
 gcc/testsuite/gfortran.dg/gomp/pr114825.f90 | 16 
 gcc/tree-nested.cc  | 61 -
 2 files changed, 49 insertions(+), 28 deletions(-)

diff --git a/gcc/testsuite/gfortran.dg/gomp/pr114825.f90 b/gcc/testsuite/gfortran.dg/gomp/pr114825.f90
new file mode 100644
index 000..b635476af61
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/pr114825.f90
@@ -0,0 +1,16 @@
+! PR fortran/114825
+
+subroutine pr114825(b)
+  type t
+real, allocatable :: m(:)
+  end type t
+  type(t), allocatable, target :: b(:)
+  type(t), pointer :: d
+  !$omp parallel private(d)
+  d => b(1)
+  !$omp end parallel
+contains
+  subroutine sub
+d => b(1)
+  end subroutine sub
+end subroutine pr114825
diff --git a/gcc/tree-nested.cc b/gcc/tree-nested.cc
index 3956e495f92..3c2cf46e3f4 100644
--- a/gcc/tree-nested.cc
+++ b/gcc/tree-nested.cc
@@ -1039,6 +1039,37 @@ get_frame_field (struct nesting_info *info, tree target_context,
 
 static void note_nonlocal_vla_type (struct nesting_info *info, tree type);
 
+/* Helper for get_nonlocal_debug_decl and get_local_debug_decl.  */
+
+static tree
+get_debug_decl (tree decl)
+{
+  tree new_decl
+= build_decl (DECL_SOURCE_LOCATION (decl),
+ VAR_DECL, DECL_NAME (decl), TREE_TYPE (decl));
+  DECL_ARTIFICIAL (new_decl) = DECL_ARTIFICIAL (decl);
+  DECL_IGNORED_P (new_decl) = DECL_IGNORED_P (decl);
+  TREE_THIS_VOLATILE (new_decl) = TREE_THIS_VOLATILE (decl);
+  TREE_SIDE_EFFECTS (new_decl) = TREE_SIDE_EFFECTS (decl);
+  TREE_READONLY (new_decl) = TREE_READONLY (decl);
+  TREE_ADDRESSABLE (new_decl) = TREE_ADDRESSABLE (decl);
+  DECL_SEEN_IN_BIND_EXPR_P (new_decl) = 1;
+  if ((TREE_CODE (decl) == PARM_DECL
+   || TREE_CODE (decl) == RESULT_DECL
+   || VAR_P (decl))
+  && DECL_BY_REFERENCE (decl))
+DECL_BY_REFERENCE (new_decl) = 1;
+  /* Copy DECL_LANG_SPECIFIC and DECL_LANG_FLAG_* for OpenMP langhook
+ purposes.  */
+  DECL_LANG_SPECIFIC (new_decl) = DECL_LANG_SPECIFIC (decl);
+#define COPY_DLF(n) DECL_LANG_FLAG_##n (new_decl) = DECL_LANG_FLAG_##n (decl)
+  COPY_DLF (0); COPY_DLF (1); COPY_DLF (2); COPY_DLF (3);
+  COPY_DLF (4); COPY_DLF (5); COPY_DLF (6); COPY_DLF (7);
+  COPY_DLF (8);
+#undef COPY_DLF
+  return new_decl;
+}
+
 /* A subroutine of convert_nonlocal_reference_op.  Create a local variable
in the nested function with DECL_VALUE_EXPR set to reference the true
variable in the parent function.  This is used both for debug info
@@ -1086,21 +1117,8 @@ get_nonlocal_debug_decl (struct nesting_info *info, tree decl)
 x = build_simple_mem_ref_notrap (x);
 
   /* ??? We should be remapping types as well, surely.  */
-  new_decl = build_decl (DECL_SOURCE_LOCATION (decl),
-VAR_DECL, DECL_NAME (decl), TREE_TYPE (decl));
+  new_decl = get_debug_decl (decl);
   DECL_CONTEXT (new_decl) = info->context;
-  DECL_ARTIFICIAL (new_decl) = DECL_ARTIFICIAL (decl);
-  DECL_IGNORED_P (new_decl) = DECL_IGNORED_P (decl);
-  TREE_THIS_VOLATILE (new_decl) = TREE_THIS_VOLATILE (decl);
-  TREE_SIDE_EFFECTS (new_decl) = TREE_SIDE_EFFECTS (decl);
-  TREE_READONLY (new_decl) = TREE_READONLY (decl);
-  TREE_ADDRESSABLE (new_decl) = TREE_ADDRESSABLE (decl);
-  DECL_SEEN_IN_BIND_EXPR_P (new_decl) = 1;
-  if ((TREE_CODE (decl) == PARM_DECL
-   || TREE_CODE (decl) == RESULT_DECL

[gcc r12-10526] rtlanal: Fix set_noop_p for volatile loads or stores [PR114768]

2024-06-11 Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:7d0673575aba5dfb41022897a882b9c386c332f4

commit r12-10526-g7d0673575aba5dfb41022897a882b9c386c332f4
Author: Jakub Jelinek 
Date:   Fri Apr 19 08:47:53 2024 +0200

rtlanal: Fix set_noop_p for volatile loads or stores [PR114768]

On the following testcase, combine propagates the mem/v load into mem store
with the same address and then removes it, because noop_move_p says it is a
no-op move.  If it were the other way around, i.e. mem/v store and mem load,
or if both were mem/v, it would be kept.
The problem is that rtx_equal_p never checks any kind of flags on the rtxes
(and I think it would be quite dangerous to change it at this point), and
set_noop_p checks side_effects_p on just one of the operands, not both.
In the MEM <- MEM set, it only checks it on the destination; in a
store to ZERO_EXTRACT, it only checks it on the source.

The following patch adds the missing side_effects_p checks.
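The MEM <- MEM case can be illustrated with a toy model. This is a deliberately simplified sketch, not GCC's rtx representation: struct mem_ref, same_location, noop_before and noop_after are hypothetical, and the is_volatile flag stands in for everything side_effects_p would detect.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a MEM rtx: an address plus a volatile flag.  */
struct mem_ref {
  int address;
  bool is_volatile;
};

/* Like rtx_equal_p in this respect: compares structure only and
   ignores flags such as volatility.  */
static bool
same_location (struct mem_ref a, struct mem_ref b)
{
  return a.address == b.address;
}

/* Model of the old check: only the destination is inspected.  */
static bool
noop_before (struct mem_ref dst, struct mem_ref src)
{
  return same_location (dst, src) && !dst.is_volatile;
}

/* Model of the patched check: both operands must be free of
   side effects.  */
static bool
noop_after (struct mem_ref dst, struct mem_ref src)
{
  return same_location (dst, src)
         && !dst.is_volatile
         && !src.is_volatile;
}

/* The testcase shape: plain store fed by a volatile load of the
   same address.  */
void
check_volatile_load (void)
{
  struct mem_ref dst = { 16, false };
  struct mem_ref src = { 16, true };
  assert (noop_before (dst, src));   /* load would be dropped */
  assert (!noop_after (dst, src));   /* load is kept */
}
```

The asymmetry is visible directly: swapping the flags makes both models agree, which is why only the volatile-load direction was miscompiled.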

2024-04-19  Jakub Jelinek  

PR rtl-optimization/114768
* rtlanal.cc (set_noop_p): Don't return true for MEM <- MEM
sets if src has side-effects or for stores into ZERO_EXTRACT
if ZERO_EXTRACT operand has side-effects.

* gcc.dg/pr114768.c: New test.

(cherry picked from commit 9f295847a9c32081bdd0fe908ffba58e830a24fb)

Diff:
---
 gcc/rtlanal.cc  | 11 +++
 gcc/testsuite/gcc.dg/pr114768.c | 10 ++
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/gcc/rtlanal.cc b/gcc/rtlanal.cc
index 78a740cb54b..5ba5e4aaaf9 100644
--- a/gcc/rtlanal.cc
+++ b/gcc/rtlanal.cc
@@ -1639,12 +1639,15 @@ set_noop_p (const_rtx set)
 return 1;
 
   if (MEM_P (dst) && MEM_P (src))
-return rtx_equal_p (dst, src) && !side_effects_p (dst);
+return (rtx_equal_p (dst, src)
+   && !side_effects_p (dst)
+   && !side_effects_p (src));
 
   if (GET_CODE (dst) == ZERO_EXTRACT)
-return rtx_equal_p (XEXP (dst, 0), src)
-  && !BITS_BIG_ENDIAN && XEXP (dst, 2) == const0_rtx
-  && !side_effects_p (src);
+return (rtx_equal_p (XEXP (dst, 0), src)
+   && !BITS_BIG_ENDIAN && XEXP (dst, 2) == const0_rtx
+   && !side_effects_p (src)
+   && !side_effects_p (XEXP (dst, 0)));
 
   if (GET_CODE (dst) == STRICT_LOW_PART)
 dst = XEXP (dst, 0);
diff --git a/gcc/testsuite/gcc.dg/pr114768.c b/gcc/testsuite/gcc.dg/pr114768.c
new file mode 100644
index 000..ffe3b368638
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr114768.c
@@ -0,0 +1,10 @@
+/* PR rtl-optimization/114768 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-final" } */
+/* { dg-final { scan-rtl-dump "\\\(mem/v:" "final" } } */
+
+void
+foo (int *p)
+{
+  *p = *(volatile int *) p;
+}


[gcc r12-10528] gimple-ssa-sprintf: Use [0, 1] range for %lc with (wint_t) 0 argument [PR114876]

2024-06-11 Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:bf134407b494bf79f66fc5048ff0ca409275089c

commit r12-10528-gbf134407b494bf79f66fc5048ff0ca409275089c
Author: Jakub Jelinek 
Date:   Tue Apr 30 11:22:32 2024 +0200

gimple-ssa-sprintf: Use [0, 1] range for %lc with (wint_t) 0 argument [PR114876]

It seems that when Martin S. implemented this, he coded a strict reading
of the standard, which said that %lc with (wint_t) 0 argument is handled
as wchar_t[2] temp = { arg, 0 }; %ls with temp arg and so shouldn't print
any values.  But most of the libc implementations actually handle that
case like %c with '\0' argument, adding a single NUL character; the only
known exception is musl.
Recently, C23 changed this in response to GB-141 and POSIX in
https://austingroupbugs.net/view.php?id=1647
so that it should have the same behavior as %c with '\0'.

Because there is implementation divergence, the following patch uses
a range rather than hardcoding it to all 1s (i.e. the %c behavior),
though the likely case is still 1 (forward looking plus most of
implementations).
The res.knownrange = true; assignment removed is redundant due to
the same assignment done unconditionally before the if statement,
rest is formatting fixes.
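The implementation divergence can be observed from user code with a short probe. This is a sketch, not the new test from the patch; lc_nul_length, c_nul_length and check_lengths are hypothetical names, and the result for %lc depends on the C library in use, which is exactly why the patch uses a [0, 1] range.

```c
#include <assert.h>
#include <stdio.h>
#include <wchar.h>

/* Bytes the local libc reports for "%lc" with a NUL wide character:
   0 under the strict older reading, 1 under the C23/POSIX behaviour.  */
static int
lc_nul_length (void)
{
  char buf[8];
  return snprintf (buf, sizeof buf, "%lc", (wint_t) 0);
}

/* For comparison, "%c" with '\0' always produces exactly one byte.  */
static int
c_nul_length (void)
{
  char buf[8];
  return snprintf (buf, sizeof buf, "%c", '\0');
}

/* Either answer for %lc is accepted here, mirroring the range.  */
void
check_lengths (void)
{
  int n = lc_nul_length ();
  assert (n == 0 || n == 1);
  assert (c_nul_length () == 1);
}
```

A compiler may warn about the embedded NUL in the output of these calls; that diagnostic is the part of GCC this patch adjusts.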

I don't think the min >= 0 && min < 128 case is right either, I'd think
it should be min >= 0 && max < 128, otherwise it is just some possible
inputs are (maybe) ASCII and there can be others, but this code is a total
mess anyway, with the min, max, likely (somewhere in [min, max]?) and then
unlikely possibly larger than max, dunno, perhaps for at least some chars
in the ASCII range the likely case could be for the ascii case; so perhaps
just the one_2_one_ascii shouldn't set max to 1 and mayfail should be true
for max >= 128.  Anyway, didn't feel I should touch that right now.
    
2024-04-30  Jakub Jelinek  

PR tree-optimization/114876
* gimple-ssa-sprintf.cc (format_character): For min == 0 && max == 0,
set max, likely and unlikely members to 1 rather than 0.  Remove
useless res.knownrange = true;.  Formatting fixes.

* gcc.dg/pr114876.c: New test.
* gcc.dg/tree-ssa/builtin-sprintf-warn-1.c: Adjust expected
diagnostics.

(cherry picked from commit 6c6b70f07208ca14ba783933988c04c6fc2fff42)

Diff:
---
 gcc/gimple-ssa-sprintf.cc  | 20 +++--
 gcc/testsuite/gcc.dg/pr114876.c| 34 ++
 .../gcc.dg/tree-ssa/builtin-sprintf-warn-1.c   | 12 
 3 files changed, 51 insertions(+), 15 deletions(-)

diff --git a/gcc/gimple-ssa-sprintf.cc b/gcc/gimple-ssa-sprintf.cc
index c0405ab32db..301078ac95f 100644
--- a/gcc/gimple-ssa-sprintf.cc
+++ b/gcc/gimple-ssa-sprintf.cc
@@ -2166,8 +2166,7 @@ format_character (const directive &dir, tree arg, pointer_query &ptr_qry)
 
   res.knownrange = true;
 
-  if (dir.specifier == 'C'
-  || dir.modifier == FMT_LEN_l)
+  if (dir.specifier == 'C' || dir.modifier == FMT_LEN_l)
 {
   /* A wide character can result in as few as zero bytes.  */
   res.range.min = 0;
@@ -2178,10 +2177,13 @@ format_character (const directive &dir, tree arg, pointer_query &ptr_qry)
{
  if (min == 0 && max == 0)
{
- /* The NUL wide character results in no bytes.  */
- res.range.max = 0;
- res.range.likely = 0;
- res.range.unlikely = 0;
+ /* In strict reading of older ISO C or POSIX, this required
+no characters to be emitted.  ISO C23 changes that, so
+does POSIX, to match what has been implemented in most of the
+implementations, namely emitting a single NUL character.
+Let's use 0 for minimum and 1 for all the other values.  */
+ res.range.max = 1;
+ res.range.likely = res.range.unlikely = 1;
}
  else if (min >= 0 && min < 128)
{
@@ -2189,11 +2191,12 @@ format_character (const directive &dir, tree arg, pointer_query &ptr_qry)
 is not a 1-to-1 mapping to the source character set or
 if the source set is not ASCII.  */
  bool one_2_one_ascii
-   = (target_to_host_charmap[0] == 1 && target_to_host ('a') == 97);
+   = (target_to_host_charmap[0] == 1
+  && target_to_host ('a') == 97);
 
  /* A wide character in the ASCII range most likely results
 in a single byte, and only unlikely in up to MB_LEN_MAX.  */
- res.range.max = one_2_one_ascii ? 1 : target_mb_len_max ();;
+ res.range.max = one_2_one_ascii ? 1 : target_mb_len_max ();
  res.range.likely = 1;
  res.range.unlikely = target_mb_len_max ();
  

[gcc r12-10524] attribs: Don't crash on NULL TREE_TYPE in diag_attr_exclusions [PR114634]

2024-06-11 Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:bb21a7de31183108bdb2489f987deaf94e4985b6

commit r12-10524-gbb21a7de31183108bdb2489f987deaf94e4985b6
Author: Jakub Jelinek 
Date:   Mon Apr 15 10:25:22 2024 +0200

attribs: Don't crash on NULL TREE_TYPE in diag_attr_exclusions [PR114634]

The enumerator still doesn't have TREE_TYPE set but diag_attr_exclusions
assumes that all decls must have types.
I think it is better in something as unimportant as diag_attr_exclusions
to be more robust, if there is no type, it can just diagnose exclusions
on the DECL_ATTRIBUTES, like for types it only diagnoses it on
TYPE_ATTRIBUTES.

2024-04-15  Jakub Jelinek  

PR c++/114634
* attribs.cc (diag_attr_exclusions): Set attrs[1] to NULL_TREE for
decls with NULL TREE_TYPE.

* g++.dg/ext/attrib68.C: New test.

(cherry picked from commit 7ec54f5fdfec298812a749699874db4d6a7246bb)

Diff:
---
 gcc/attribs.cc  | 7 ++-
 gcc/testsuite/g++.dg/ext/attrib68.C | 8 
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/gcc/attribs.cc b/gcc/attribs.cc
index f73e00b6201..3bf4253ebc4 100644
--- a/gcc/attribs.cc
+++ b/gcc/attribs.cc
@@ -479,7 +479,12 @@ diag_attr_exclusions (tree last_decl, tree node, tree attrname,
   if (DECL_P (node))
 {
   attrs[0] = DECL_ATTRIBUTES (node);
-  attrs[1] = TYPE_ATTRIBUTES (TREE_TYPE (node));
+  if (TREE_TYPE (node))
+   attrs[1] = TYPE_ATTRIBUTES (TREE_TYPE (node));
+  else
+   /* TREE_TYPE can be NULL e.g. while processing attributes on
+  enumerators.  */
+   attrs[1] = NULL_TREE;
 }
   else
 {
diff --git a/gcc/testsuite/g++.dg/ext/attrib68.C b/gcc/testsuite/g++.dg/ext/attrib68.C
new file mode 100644
index 000..be3b1108491
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/attrib68.C
@@ -0,0 +1,8 @@
+// PR c++/114634
+// { dg-do compile }
+
+template 
+struct A
+{
+  enum { e __attribute__ ((aligned (16))) };   // { dg-error "alignment may not be specified for 'e'" }
+};


[gcc r12-10533] fold-const: Fix up CLZ handling in tree_call_nonnegative_warnv_p [PR115337]

2024-06-11 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:b065824e30e9168d33b56039e436c4b09078e260

commit r12-10533-gb065824e30e9168d33b56039e436c4b09078e260
Author: Jakub Jelinek 
Date:   Tue Jun 4 15:49:41 2024 +0200

fold-const: Fix up CLZ handling in tree_call_nonnegative_warnv_p [PR115337]

The function currently incorrectly assumes all the __builtin_clz* and .CLZ
calls have non-negative result.  That is the case for the former, which is UB
on zero and has a [0, prec-1] return value otherwise, and for the
single-argument .CLZ as well (again, UB on zero), but for the two-argument
.CLZ it holds only if the second argument is also nonnegative (or if we
know the argument can't be zero, but let's do that just in the ranger IMHO).

The following patch does that.
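
The [0, prec-1] range quoted above for the __builtin_clz* family can be
checked from user code.  The following is only a sketch under stated
assumptions: it needs a GCC-compatible compiler, and the names
leading_zeros and uint_prec are hypothetical helpers, not anything in GCC.

```cpp
#include <cassert>
#include <climits>

// Precision (number of bits) of unsigned int on this target.
constexpr int uint_prec = sizeof (unsigned int) * CHAR_BIT;

// Hypothetical helper: wraps __builtin_clz.  The builtin is undefined for
// a zero argument, so that precondition is asserted here.
int leading_zeros (unsigned int x)
{
  assert (x != 0);
  return __builtin_clz (x);
}
```

For any nonzero argument the result lies between 0 (top bit set) and
uint_prec - 1 (only the lowest bit set), which is why treating the plain
builtin as non-negative stays correct.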

2024-06-04  Jakub Jelinek  

PR tree-optimization/115337
* fold-const.cc (tree_call_nonnegative_warnv_p) <CASE_CFN_CLZ>:
If fn is CFN_CLZ, use CLZ_DEFINED_VALUE_AT.

(cherry picked from commit b82a816000791e7a286c7836b3a473ec0e2a577b)

Diff:
---
 gcc/fold-const.cc | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 70302943cea..d81a71c41a1 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -84,6 +84,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "vec-perm-indices.h"
 #include "asan.h"
 #include "gimple-range.h"
+#include "internal-fn.h"
 
 /* Nonzero if we are folding constants inside an initializer or a C++
manifestly-constant-evaluated context; zero otherwise.
@@ -14861,7 +14862,6 @@ tree_call_nonnegative_warnv_p (tree type, combined_fn fn, tree arg0, tree arg1,
 CASE_CFN_FFS:
 CASE_CFN_PARITY:
 CASE_CFN_POPCOUNT:
-CASE_CFN_CLZ:
 CASE_CFN_CLRSB:
 case CFN_BUILT_IN_BSWAP16:
 case CFN_BUILT_IN_BSWAP32:
@@ -14870,6 +14870,22 @@ tree_call_nonnegative_warnv_p (tree type, combined_fn fn, tree arg0, tree arg1,
   /* Always true.  */
   return true;
 
+CASE_CFN_CLZ:
+  if (fn != CFN_CLZ)
+   return true;
+  else if (INTEGRAL_TYPE_P (TREE_TYPE (arg0)))
+   {
+ tree atype = TREE_TYPE (arg0);
+ int val = 0;
+ if (direct_internal_fn_supported_p (IFN_CLZ, atype,
+ OPTIMIZE_FOR_BOTH)
+ && CLZ_DEFINED_VALUE_AT_ZERO (SCALAR_INT_TYPE_MODE (atype),
+   val) == 2
+ && val >= 0)
+   return true;
+   }
+  break;
+
 CASE_CFN_SQRT:
 CASE_CFN_SQRT_FN:
   /* sqrt(-0.0) is -0.0.  */


[gcc r12-10521] c++: Fix up maybe_warn_for_constant_evaluated calls [PR114580]

2024-06-11 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:b3b7176d5857f116a4a42d885df70f8847e4cd2a

commit r12-10521-gb3b7176d5857f116a4a42d885df70f8847e4cd2a
Author: Jakub Jelinek 
Date:   Tue Apr 9 09:31:42 2024 +0200

c++: Fix up maybe_warn_for_constant_evaluated calls [PR114580]

When looking at maybe_warn_for_constant_evaluated for the trivial
infinite loops patch, I've noticed that it can emit weird diagnostics
for if constexpr in templates, first warn that std::is_constant_evaluated()
always evaluates to false (because the function template is not constexpr)
and then during instantiation warn that std::is_constant_evaluated()
always evaluates to true (because it is used in if constexpr condition).
Now, only the latter is actually true: even when the if constexpr
is in a non-constexpr function, it will still always evaluate to true.

So, the following patch fixes it to call maybe_warn_for_constant_evaluated
always with IF_STMT_CONSTEXPR_P (if_stmt) as the second argument rather than
true if it is if constexpr with non-dependent condition etc.
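
The behaviour described above can be observed directly.  This is a minimal
sketch, assuming a C++17 compiler that provides
__builtin_is_constant_evaluated; the function names in_if_constexpr,
in_plain_if and demo are hypothetical.

```cpp
#include <cassert>

// Deliberately not constexpr.  The condition of 'if constexpr' is a
// manifestly constant-evaluated context, so the builtin yields true there
// regardless of the kind of function around it.
template <typename T>
bool in_if_constexpr ()
{
  if constexpr ((T) __builtin_is_constant_evaluated ())
    return true;
  else
    return false;
}

// The condition of an ordinary 'if' in a non-constexpr function is not
// constant evaluated, so a run-time call takes the false path.
bool in_plain_if ()
{
  if (__builtin_is_constant_evaluated ())
    return true;
  return false;
}

// Usage: the two observations side by side.
void demo ()
{
  assert (in_if_constexpr<bool> ());
  assert (!in_plain_if ());
}
```

Only the "always evaluates to true" diagnostic matches what the first
function does, which is the point of passing IF_STMT_CONSTEXPR_P (if_stmt).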

2024-04-09  Jakub Jelinek  

PR c++/114580
* semantics.cc (finish_if_stmt_cond): Call
maybe_warn_for_constant_evaluated with IF_STMT_CONSTEXPR_P (if_stmt)
as the second argument, rather than true/false depending on if
it is if constexpr with non-dependent constant expression with
bool type.

* g++.dg/cpp2a/is-constant-evaluated15.C: New test.

(cherry picked from commit cfed80b9e4f562c99679739548df9369117dd791)

Diff:
---
 gcc/cp/semantics.cc|  4 +---
 .../g++.dg/cpp2a/is-constant-evaluated15.C | 28 ++
 2 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 0672d6c5b68..d4d1b01b91e 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -1026,6 +1026,7 @@ tree
 finish_if_stmt_cond (tree cond, tree if_stmt)
 {
   cond = maybe_convert_cond (cond);
+  maybe_warn_for_constant_evaluated (cond, IF_STMT_CONSTEXPR_P (if_stmt));
   if (IF_STMT_CONSTEXPR_P (if_stmt)
   && !type_dependent_expression_p (cond)
   && require_constant_expression (cond)
@@ -1034,12 +1035,9 @@ finish_if_stmt_cond (tree cond, tree if_stmt)
 converted to bool.  */
   && TYPE_MAIN_VARIANT (TREE_TYPE (cond)) == boolean_type_node)
 {
-  maybe_warn_for_constant_evaluated (cond, /*constexpr_if=*/true);
   cond = instantiate_non_dependent_expr (cond);
   cond = cxx_constant_value (cond, NULL_TREE);
 }
-  else
-maybe_warn_for_constant_evaluated (cond, /*constexpr_if=*/false);
   finish_cond (&IF_COND (if_stmt), cond);
   add_stmt (if_stmt);
   THEN_CLAUSE (if_stmt) = push_stmt_list ();
diff --git a/gcc/testsuite/g++.dg/cpp2a/is-constant-evaluated15.C b/gcc/testsuite/g++.dg/cpp2a/is-constant-evaluated15.C
new file mode 100644
index 000..50a3cac6e07
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/is-constant-evaluated15.C
@@ -0,0 +1,28 @@
+// PR c++/114580
+// { dg-do compile { target c++17 } }
+// { dg-options "-Wtautological-compare" }
+
+namespace std {
+  constexpr inline bool
+  is_constant_evaluated () noexcept
+  {
+#if __cpp_if_consteval >= 202106L
+if consteval { return true; } else { return false; }
+#else
+return __builtin_is_constant_evaluated ();
+#endif
+  }
+}
+
+template <typename T>
+void foo ()
+{
+  if constexpr ((T) std::is_constant_evaluated ()) // { dg-warning "'std::is_constant_evaluated' always evaluates to true in 'if constexpr'" }
+;  // { dg-bogus "'std::is_constant_evaluated' always evaluates to false in a non-'constexpr' function" }
+}
+
+void
+bar ()
+{
+  foo  ();
+}


[gcc r12-10523] c++: Fix bogus warnings about ignored annotations [PR114691]

2024-06-11 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:e9b960edb01449786a29a8d196c476bfefc4f243

commit r12-10523-ge9b960edb01449786a29a8d196c476bfefc4f243
Author: Jakub Jelinek 
Date:   Fri Apr 12 20:53:10 2024 +0200

c++: Fix bogus warnings about ignored annotations [PR114691]

The middle-end warns about the ANNOTATE_EXPR added for while/for loops
if they declare a var inside of the loop condition.
This is because the assumption is that ANNOTATE_EXPR argument is used
immediately in a COND_EXPR (later GIMPLE_COND), but simplify_loop_decl_cond
wraps the ANNOTATE_EXPR inside of a TRUTH_NOT_EXPR, so it no longer
holds.

The following patch fixes that by adding the TRUTH_NOT_EXPR inside of the
ANNOTATE_EXPR argument if any.
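
The affected loop shape is a while or for loop that declares a variable in
its condition and carries a loop annotation.  Below is a sketch of such
loops, assuming a compiler that understands #pragma GCC ivdep; the helpers
countdown, doubled_while and doubled_for are hypothetical and unrelated to
the testcase in the patch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: decrements the counter and returns its new value,
// or 0 once the counter is exhausted.
static int countdown (int &n)
{
  return n > 0 ? --n : 0;
}

// while loop with a declaration in its condition.  Each iteration writes a
// distinct element, so the ivdep promise holds.
std::vector<int> doubled_while (int n)
{
  assert (n >= 0);
  std::vector<int> out (n);
  int *p = out.data ();
#pragma GCC ivdep
  while (int y = countdown (n))
    p[y] = 2 * y;
  return out;
}

// The same loop written as a for loop with a condition declaration.
std::vector<int> doubled_for (int n)
{
  assert (n >= 0);
  std::vector<int> out (n);
  int *p = out.data ();
#pragma GCC ivdep
  for (; int y = countdown (n); )
    p[y] = 2 * y;
  return out;
}
```

Both functions fill indices n-1 down to 1 and leave index 0 at zero; the
annotation only concerns how the loop may be optimized, not its result.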

2024-04-12  Jakub Jelinek  

PR c++/114691
* semantics.cc (simplify_loop_decl_cond): Use cp_build_unary_op with
TRUTH_NOT_EXPR on ANNOTATE_EXPR argument (if any) rather than
ANNOTATE_EXPR itself.

* g++.dg/ext/pr114691.C: New test.

(cherry picked from commit 91146346f57cc54dfeb2669347edd0eb3d13af7f)

Diff:
---
 gcc/cp/semantics.cc |  6 +-
 gcc/testsuite/g++.dg/ext/pr114691.C | 22 ++
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index d4d1b01b91e..2d29b0ae1b5 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -800,7 +800,11 @@ simplify_loop_decl_cond (tree *cond_p, tree body)
   *cond_p = boolean_true_node;
 
   if_stmt = begin_if_stmt ();
-  cond = cp_build_unary_op (TRUTH_NOT_EXPR, cond, false, tf_warning_or_error);
+  cond_p = &cond;
+  while (TREE_CODE (*cond_p) == ANNOTATE_EXPR)
+    cond_p = &TREE_OPERAND (*cond_p, 0);
+  *cond_p = cp_build_unary_op (TRUTH_NOT_EXPR, *cond_p, false,
+  tf_warning_or_error);
   finish_if_stmt_cond (cond, if_stmt);
   finish_break_stmt ();
   finish_then_clause (if_stmt);
diff --git a/gcc/testsuite/g++.dg/ext/pr114691.C b/gcc/testsuite/g++.dg/ext/pr114691.C
new file mode 100644
index 000..bda8ff9b39f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/pr114691.C
@@ -0,0 +1,22 @@
+// PR c++/114691
+// { dg-do compile }
+// { dg-options "-O2 -Wall" }
+
+void qux (int);
+int foo (int);
+
+void
+bar (int x)
+{
+  #pragma GCC ivdep
+  while (int y = foo (x))  // { dg-bogus "ignoring loop annotation" }
+qux (y);
+}
+
+void
+baz (int x)
+{
+  #pragma GCC ivdep
+  for (; int y = foo (x); )// { dg-bogus "ignoring loop annotation" }
+qux (y);
+}


[gcc r12-10525] internal-fn: Temporarily disable flag_trapv during .{ADD, SUB, MUL}_OVERFLOW etc. expansion [PR114753]

2024-06-11 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:b3ef00f8b8d577d7b62cea36c13cf087a3b13d0c

commit r12-10525-gb3ef00f8b8d577d7b62cea36c13cf087a3b13d0c
Author: Jakub Jelinek 
Date:   Thu Apr 18 09:45:14 2024 +0200

internal-fn: Temporarily disable flag_trapv during .{ADD,SUB,MUL}_OVERFLOW etc. expansion [PR114753]

__builtin_{add,sub,mul}_overflow{,_p} builtins are well defined
for all inputs even for -ftrapv, and the -fsanitize=signed-integer-overflow
ifns shouldn't abort in libgcc but emit the desired ubsan diagnostics
or abort depending on -fsanitize* setting regardless of -ftrapv.
The expansion of these internal functions uses expand_expr* in various
places (e.g. MULT_EXPR at least in 2 spots), so temporarily disabling
flag_trapv in all those spots would be hard.
The following patch disables it around the bodies of 3 functions
which can do the expand_expr calls.
If it was in the C++ FE, I'd use some RAII sentinel, but I don't think
we have one in the middle-end.
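
For comparison, the RAII sentinel alternative mentioned above could look
roughly like this.  It is a sketch only: trapv_flag is a stand-in for the
real flag_trapv, and flag_override and expand_something are hypothetical
names, not existing GCC facilities.

```cpp
#include <cassert>

// Stand-in for GCC's global flag_trapv; hypothetical, for illustration.
int trapv_flag = 1;

// Hypothetical RAII sentinel: saves an int flag, overrides it, and restores
// the saved value when the scope is left by any path.
class flag_override
{
public:
  flag_override (int &flag, int value) : m_flag (flag), m_saved (flag)
  { m_flag = value; }
  ~flag_override () { m_flag = m_saved; }
  flag_override (const flag_override &) = delete;
  flag_override &operator= (const flag_override &) = delete;

private:
  int &m_flag;
  int m_saved;
};

// Hypothetical expander with several return paths; returns the flag value
// seen while the override is active.
int expand_something (bool early_exit)
{
  flag_override guard (trapv_flag, 0);
  if (early_exit)
    return trapv_flag;		// The destructor restores the flag afterwards.
  int seen = trapv_flag;
  assert (seen == 0);
  return seen;
}
```

With a sentinel, each early return restores the flag automatically, whereas
the manual approach in the patch has to repeat the restore before every
return statement.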

2024-04-18  Jakub Jelinek  

PR middle-end/114753
* internal-fn.cc (expand_mul_overflow): Save flag_trapv and
temporarily clear it for the duration of the function, then
restore previous value.
(expand_vector_ubsan_overflow): Likewise.
(expand_arith_overflow): Likewise.

* gcc.dg/pr114753.c: New test.

(cherry picked from commit 6c152c9db3b5b9d43e12846fb7a44977c0b65fc2)

Diff:
---
 gcc/internal-fn.cc  | 19 +++
 gcc/testsuite/gcc.dg/pr114753.c | 14 ++
 2 files changed, 33 insertions(+)

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index db23f66b021..ca9cf30b6b5 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -1395,7 +1395,11 @@ expand_mul_overflow (location_t loc, tree lhs, tree arg0, tree arg1,
   rtx target = NULL_RTX;
   signop sign;
   enum insn_code icode;
+  int save_flag_trapv = flag_trapv;
 
+  /* We don't want any __mulv?i3 etc. calls from the expansion of
+ these internal functions, so disable -ftrapv temporarily.  */
+  flag_trapv = 0;
   done_label = gen_label_rtx ();
   do_error = gen_label_rtx ();
 
@@ -2237,6 +2241,7 @@ expand_mul_overflow (location_t loc, tree lhs, tree arg0, tree arg1,
   else
expand_arith_overflow_result_store (lhs, target, mode, res);
 }
+  flag_trapv = save_flag_trapv;
 }
 
 /* Expand UBSAN_CHECK_* internal function if it has vector operands.  */
@@ -2257,7 +2262,11 @@ expand_vector_ubsan_overflow (location_t loc, enum tree_code code, tree lhs,
   rtx resvr = NULL_RTX;
   unsigned HOST_WIDE_INT const_cnt = 0;
   bool use_loop_p = (!cnt.is_constant (&const_cnt) || const_cnt > 4);
+  int save_flag_trapv = flag_trapv;
 
+  /* We don't want any __mulv?i3 etc. calls from the expansion of
+ these internal functions, so disable -ftrapv temporarily.  */
+  flag_trapv = 0;
   if (lhs)
 {
   optab op;
@@ -2387,6 +2396,7 @@ expand_vector_ubsan_overflow (location_t loc, enum tree_code code, tree lhs,
 }
   else if (resvr)
 emit_move_insn (lhsr, resvr);
+  flag_trapv = save_flag_trapv;
 }
 
 /* Expand UBSAN_CHECK_ADD call STMT.  */
@@ -2465,7 +2475,11 @@ expand_arith_overflow (enum tree_code code, gimple *stmt)
   prec0 = MIN (prec0, pr);
   pr = get_min_precision (arg1, uns1_p ? UNSIGNED : SIGNED);
   prec1 = MIN (prec1, pr);
+  int save_flag_trapv = flag_trapv;
 
+  /* We don't want any __mulv?i3 etc. calls from the expansion of
+ these internal functions, so disable -ftrapv temporarily.  */
+  flag_trapv = 0;
   /* If uns0_p && uns1_p, precop is minimum needed precision
  of unsigned type to hold the exact result, otherwise
  precop is minimum needed precision of signed type to
@@ -2506,6 +2520,7 @@ expand_arith_overflow (enum tree_code code, gimple *stmt)
  ops.location = loc;
  rtx tem = expand_expr_real_2 (&ops, NULL_RTX, mode, EXPAND_NORMAL);
  expand_arith_overflow_result_store (lhs, target, mode, tem);
+ flag_trapv = save_flag_trapv;
  return;
}
 
@@ -2529,16 +2544,19 @@ expand_arith_overflow (enum tree_code code, gimple *stmt)
  if (integer_zerop (arg0) && !unsr_p)
{
  expand_neg_overflow (loc, lhs, arg1, false, NULL);
+ flag_trapv = save_flag_trapv;
  return;
}
  /* FALLTHRU */
case PLUS_EXPR:
  expand_addsub_overflow (loc, code, lhs, arg0, arg1, unsr_p,
  unsr_p, unsr_p, false, NULL);
+ flag_trapv = save_flag_trapv;
  return;
case MULT_EXPR:
  expand_mul_overflow (loc, lhs, arg0, arg1, unsr_p,
   unsr_p, unsr_p, false, NULL);
+ flag_trapv = save_flag_trapv;
  return;
default:
  gcc_unreachable ();
@@ -2584,6 +2602,7 @@ ex

[gcc r12-10522] asan, v3: Fix up handling of > 32 byte aligned variables with -fsanitize=address -fstack-protector*

2024-06-11 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:082fe43efd241caf8f757c056b98e1ae8b55c300

commit r12-10522-g082fe43efd241caf8f757c056b98e1ae8b55c300
Author: Jakub Jelinek 
Date:   Thu Apr 11 11:12:11 2024 +0200

asan, v3: Fix up handling of > 32 byte aligned variables with -fsanitize=address -fstack-protector* [PR110027]

On Tue, Mar 26, 2024 at 02:08:02PM +0800, liuhongt wrote:
> > > So, try to add some other variable with larger size and smaller alignment
> > > to the frame (and make sure it isn't optimized away).
> > >
> > > alignb above is the alignment of the first partition's var, if
> > > align_frame_offset really needs to depend on the var alignment, it probably
> > > should be the maximum alignment of all the vars with alignment
> > > alignb * BITS_PER_UNIT <= MAX_SUPPORTED_STACK_ALIGNMENT
> > >
>
> In asan_emit_stack_protection, when it allocated fake stack, it assume
> bottom of stack is also aligned to alignb. And the place violated this
> is the first var partition. which is 32 bytes offsets,  it should be
> BIGGEST_ALIGNMENT / BITS_PER_UNIT.
> So I think we need to use MAX (BIGGEST_ALIGNMENT /
> BITS_PER_UNIT, ASAN_RED_ZONE_SIZE) for the first var partition.

Your first patch aligned offsets[0] to maximum of alignb and
ASAN_RED_ZONE_SIZE.  But as I wrote in the reply to that mail, alignb there
is the alignment of just a single variable which is the first one to appear
in the sorted list and is placed in the highest spot in the stack frame.
That is not necessarily the largest alignment, the sorting ensures that it
is a variable with the largest size in the frame (and only if several of
them have equal size, largest alignment from the same sized ones).  Your
second patch used maximum of BIGGEST_ALIGNMENT / BITS_PER_UNIT and
ASAN_RED_ZONE_SIZE.  That doesn't change anything at all when using
-mno-avx512f - offsets[0] is still just 32-byte aligned in that case
relative to top of frame, just changes the -mavx512f case to be 64-byte
aligned offsets[0] (aka offsets[0] is then either 0 or -64 instead of either
0 or -32).  That will not help if any variable in the frame needs 128-byte,
256-byte, 512-byte ...  4096-byte alignment.  If you want to fix the bug in
the spot you've touched, you'd need to walk all the
stack_vars[stack_vars_sorted[si2]] for si2 [si + 1, n - 1] and for those
where the loop would do anything (i.e.
stack_vars[i2].representative == i2
&& TREE_CODE (decl2) == SSA_NAME
   ? SA.partition_to_pseudo[var_to_partition (SA.map, decl2)] == NULL_RTX
   : DECL_RTL (decl2) == pc_rtx
and the pred applies (but that means also walking the earlier ones!
because with -fstack-protector* the vars can be processed in several calls) and
alignb2 * BITS_PER_UNIT <= MAX_SUPPORTED_STACK_ALIGNMENT
and compute maximum of those alignments.
That maximum is already computed,
data->asan_alignb = MAX (data->asan_alignb, alignb);
computes that, but you get the final result only after you do all the
expand_stack_vars calls.  You'd need to compute it before.

Though, that change would be still in the wrong place.
The thing is, it would be a waste of the precious stack space when it isn't
needed at all (e.g.  when asan will not at compile time do the use after
return checking, or if it won't do it at runtime, or even if it will do at
runtime it will waste the space on the stack).

The following patch fixes it solely for the __asan_stack_malloc_N
allocations, doesn't enlarge unnecessarily further the actual stack frame.
Because asan is only supported on FRAME_GROWS_DOWNWARD architectures
(mips, rs6000 and xtensa are conditional FRAME_GROWS_DOWNWARD arches, which
for -fsanitize=address or -fstack-protector* use FRAME_GROWS_DOWNWARD 1,
otherwise 0, others supporting asan always just use 1), the assumption for
the dynamic stack realignment is that the top of the stack frame (aka offset
0) is aligned to alignb passed to the function (which is the maximum of alignb
of all the vars in the frame).  As checked by the assertion in the patch,
offsets[0] is 0 most of the time and so that assumption is correct, the only
case when it is not 0 is if -fstack-protector* is on together with
-fsanitize=address and cfgexpand.cc (create_stack_guard) created a stack
guard.  That is the only variable which is allocated in the stack frame
right away, for all others with -fsanitize=address defer_stack_allocation
(or -fstack-protector*) returns true and so they aren't allocated
immediately but handled during the frame layout phases.  So, the original
frame_offset of 0 is changed because of the stack guard to
-pointer_size_in_bytes and later at 

[gcc r12-10520] vect: Don't clear base_misaligned in update_epilogue_loop_vinfo [PR114566]

2024-06-11 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:f8a327930b82e89ae1466cfacb9e8ac9f5c44e77

commit r12-10520-gf8a327930b82e89ae1466cfacb9e8ac9f5c44e77
Author: Jakub Jelinek 
Date:   Fri Apr 5 14:56:14 2024 +0200

vect: Don't clear base_misaligned in update_epilogue_loop_vinfo [PR114566]

The following testcase is miscompiled, because in the vectorized
epilogue the vectorizer assumes it can use aligned loads/stores
(if the base decl gets alignment increased), but it actually doesn't
increase that.
This is because r10-4203-g97c1460367 added the hunk that the following
patch removes.  The explanation feels reasonable, but actually it
is not true as the testcase proves.
The thing is, we vectorize the main loop with 64-byte vectors
and the corresponding data refs have base_alignment 16 (the
a array has DECL_ALIGN 128) and offset_alignment 32.  Now, because
of the offset_alignment 32 rather than 64, we need to use unaligned
loads/stores in the main loop (and ditto in the first load/store
in vectorized epilogue).  But the second load/store in the vectorized
epilogue uses only 32-byte vectors and because it is a multiple
of offset_alignment, it checks if we could increase alignment of the
a VAR_DECL, the function returns true, sets base_misaligned = true
and says the access is then aligned.
But when update_epilogue_loop_vinfo clears base_misaligned with the
assumption that the var had to have the alignment increased already,
the update of DECL_ALIGN doesn't happen anymore.

Now, I'd think this base_misaligned = false was needed before
the r10-4030-gd2db7f7901 change was committed where it incorrectly
overwrote DECL_ALIGN even if it was already larger, rather than
just always increasing it.  But with that change in, it doesn't
make sense to me anymore.

Note, the testcase is latent on the trunk, but reproduces on the 13
branch.

2024-04-05  Jakub Jelinek  

PR tree-optimization/114566
* tree-vect-loop.cc (update_epilogue_loop_vinfo): Don't clear
base_misaligned.

* gcc.target/i386/avx512f-pr114566.c: New test.

(cherry picked from commit a844095e17c1a5aada1364c6f6eaade87ead463c)

Diff:
---
 gcc/testsuite/gcc.target/i386/avx512f-pr114566.c | 34 
 gcc/tree-vect-loop.cc|  8 +-
 2 files changed, 35 insertions(+), 7 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/avx512f-pr114566.c b/gcc/testsuite/gcc.target/i386/avx512f-pr114566.c
new file mode 100644
index 000..abfab1bfcd5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512f-pr114566.c
@@ -0,0 +1,34 @@
+/* PR tree-optimization/114566 */
+/* { dg-do run } */
+/* { dg-options "-O3 -mavx512f" } */
+/* { dg-additional-options "-fstack-protector-strong" { target fstack_protector } } */
+/* { dg-require-effective-target avx512f } */
+
+#define AVX512F
+#include "avx512f-helper.h"
+
+__attribute__((noipa)) int
+foo (float x, float y)
+{
+  float a[8][56];
+  __builtin_memset (a, 0, sizeof (a));
+
+  for (int j = 0; j < 8; j++)
+for (int k = 0; k < 56; k++)
+  {
+   float b = k * y;
+   if (b < 0.)
+ b = 0.;
+   if (b > 0.)
+ b = 0.;
+   a[j][k] += b;
+  }
+
+  return __builtin_log (x);
+}
+
+void
+TEST (void)
+{
+  foo (86.25f, 0.625f);
+}
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index fd0e5a70a96..1abc43f396e 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -9499,9 +9499,7 @@ find_in_mapping (tree t, void *context)
corresponding dr_vec_info need to be reconnected to the EPILOGUE's
stmt_vec_infos, their statements need to point to their corresponding copy,
if they are gather loads or scatter stores then their reference needs to be
-   updated to point to its corresponding copy and finally we set
-   'base_misaligned' to false as we have already peeled for alignment in the
-   prologue of the main loop.  */
+   updated to point to its corresponding copy.  */
 
 static void
 update_epilogue_loop_vinfo (class loop *epilogue, tree advance)
@@ -9642,10 +9640,6 @@ update_epilogue_loop_vinfo (class loop *epilogue, tree advance)
}
   DR_STMT (dr) = STMT_VINFO_STMT (stmt_vinfo);
   stmt_vinfo->dr_aux.stmt = stmt_vinfo;
-  /* The vector size of the epilogue is smaller than that of the main loop
-so the alignment is either the same or lower. This means the dr will
-thus by definition be aligned.  */
-  STMT_VINFO_DR_INFO (stmt_vinfo)->base_misaligned = false;
 }
 
   epilogue_vinfo->shared->datarefs_copy.release ();


[gcc r12-10519] c++: Fix ICE with weird copy assignment operator [PR114572]

2024-06-11 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:f33e8ee4cb44e7a6326a894a9c153557238bde03

commit r12-10519-gf33e8ee4cb44e7a6326a894a9c153557238bde03
Author: Jakub Jelinek 
Date:   Fri Apr 5 09:31:28 2024 +0200

c++: Fix ICE with weird copy assignment operator [PR114572]

While ctors/dtors don't return anything (undeclared void or this pointer
on arm) and copy assignment operators normally return a reference to *this,
it isn't invalid to return uselessly some class object which might need
destructing, but the OpenMP clause handling code wasn't expecting that.

The following patch fixes that.
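
What makes such an operator special is the temporary it returns, which has
to be destroyed at the end of the assignment expression.  A small sketch
outside of OpenMP, with a hypothetical destructor counter and a
hypothetical helper name, makes that temporary visible:

```cpp
#include <cassert>

// Counts destructor runs so the extra temporary becomes observable.
int destroyed = 0;

struct S
{
  S () : s (0) {}
  S (const S &x) : s (x.s) {}
  ~S () { ++destroyed; }
  // Unusual but valid: returns a copy of *this instead of a reference.
  S operator= (const S &x) { s = x.s; return *this; }
  int s;
};

// Performs one assignment and reports how many destructors ran for that
// single statement, i.e. for the temporary returned by operator=.
int temporaries_destroyed_by_assignment (int value)
{
  S from, to;
  from.s = value;
  int before = destroyed;
  to = from;		// The by-value result dies at the end of this statement.
  int after = destroyed;
  assert (to.s == value);
  return after - before;
}
```

Exactly one destructor runs per assignment here.  A call built by the
compiler for an OpenMP clause needs the same cleanup, which is what wrapping
the call with build_cplus_new arranges.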

2024-04-05  Jakub Jelinek  

PR c++/114572
* cp-gimplify.cc (cxx_omp_clause_apply_fn): Call build_cplus_new
on build_call_a result if it has class type.

* testsuite/libgomp.c++/pr114572.C: New test.

(cherry picked from commit 592536eb3c0a97a55b1019ff0216ef77e6ca847e)

Diff:
---
 gcc/cp/cp-gimplify.cc|  4 
 libgomp/testsuite/libgomp.c++/pr114572.C | 24 
 2 files changed, 28 insertions(+)

diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index 7ae5327f693..4b7e5729ef1 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -2038,6 +2038,8 @@ cxx_omp_clause_apply_fn (tree fn, tree arg1, tree arg2)
   TREE_PURPOSE (parm), fn,
   i - is_method, tf_warning_or_error);
   t = build_call_a (fn, i, argarray);
+  if (MAYBE_CLASS_TYPE_P (TREE_TYPE (t)))
+   t = build_cplus_new (TREE_TYPE (t), t, tf_warning_or_error);
   t = fold_convert (void_type_node, t);
   t = fold_build_cleanup_point_expr (TREE_TYPE (t), t);
   append_to_statement_list (t, &ret);
@@ -2071,6 +2073,8 @@ cxx_omp_clause_apply_fn (tree fn, tree arg1, tree arg2)
   TREE_PURPOSE (parm), fn,
   i - is_method, tf_warning_or_error);
   t = build_call_a (fn, i, argarray);
+  if (MAYBE_CLASS_TYPE_P (TREE_TYPE (t)))
+   t = build_cplus_new (TREE_TYPE (t), t, tf_warning_or_error);
   t = fold_convert (void_type_node, t);
   return fold_build_cleanup_point_expr (TREE_TYPE (t), t);
 }
diff --git a/libgomp/testsuite/libgomp.c++/pr114572.C b/libgomp/testsuite/libgomp.c++/pr114572.C
new file mode 100644
index 000..21d5c847f8d
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c++/pr114572.C
@@ -0,0 +1,24 @@
+// PR c++/114572
+// { dg-do run }
+// { dg-options "-fopenmp -O0" }
+
+#include <stdlib.h>
+
+struct S
+{
+  S () : s (0) {}
+  ~S () {}
+  S operator= (const S &x) { s = x.s; return *this; }
+  int s;
+};
+
+int
+main ()
+{
+  S s;
+  #pragma omp parallel for lastprivate(s)
+  for (int i = 0; i < 10; ++i)
+s.s = i;
+  if (s.s != 9)
+abort ();
+}


[gcc r12-10516] icf: Reset SSA_NAME_{PTR, RANGE}_INFO in successfully merged functions [PR113907]

2024-06-11 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:81c300bf6836505ef1df1c4430972863c732fc14

commit r12-10516-g81c300bf6836505ef1df1c4430972863c732fc14
Author: Jakub Jelinek 
Date:   Thu Mar 14 17:48:30 2024 +0100

icf: Reset SSA_NAME_{PTR,RANGE}_INFO in successfully merged functions [PR113907]

AFAIK we have no code in LTO streaming to stream out or in
SSA_NAME_{RANGE,PTR}_INFO, so LTO effectively throws it all away
and lets vrp1 and alias analysis after IPA recompute that.  There is
just one spot, for IPA VRP and IPA bit CCP we save/restore ranges
and set SSA_NAME_{PTR,RANGE}_INFO e.g. on parameters depending on what
we saved and propagated, but that is after streaming in bodies for the
post IPA optimizations.

Now, without LTO SSA_NAME_{RANGE,PTR}_INFO is already computed from
earlier in many cases (e.g. evrp and early alias analysis but other spots
too), but IPA ICF is ignoring the ranges and points-to details when
comparing the bodies.  I think ignoring that is just fine, that is
effectively what we do for LTO where we throw that information away
before the analysis, and not ignoring it could lead to fewer ICF merging
possibilities.

So, the following patch instead verifies that for LTO SSA_NAME_{PTR,RANGE}_INFO
just isn't there on SSA_NAMEs in functions into which other functions have
been ICFed, and for non-LTO throws that information away (which matches the
LTO behavior).

Another possibility would be to remember the SSA_NAME <-> SSA_NAME mapping
vector (just one of the 2) on successful sem_function::equals on the
sem_function which is not the chosen leader (e.g. how SSA_NAMEs in the
leader map to SSA_NAMEs in the other function) and use that vector
to union the ranges in sem_function::merge.  I can implement that for
comparison, but wanted to post this first if there is an agreement on
doing that or if Honza thinks we should take SSA_NAME_{RANGE,PTR}_INFO
into account.  I think we can compare SSA_NAME_RANGE_INFO, but have
no idea how to try to compare points to info.  And I think it will result
in less effective ICF for non-LTO vs. LTO unnecessarily.

2024-03-12  Jakub Jelinek  

PR middle-end/113907
* ipa-icf.cc (sem_item_optimizer::merge_classes): Reset
SSA_NAME_RANGE_INFO and SSA_NAME_PTR_INFO on successfully ICF merged
functions.

* gcc.dg/pr113907-1.c: New test.

(cherry picked from commit 7580e39452b65ab5fb5a06f3f1ad7d59720269b5)

Diff:
---
 gcc/ipa-icf.cc| 32 -
 gcc/testsuite/gcc.dg/pr113907-1.c | 49 +++
 2 files changed, 80 insertions(+), 1 deletion(-)

diff --git a/gcc/ipa-icf.cc b/gcc/ipa-icf.cc
index 6528a7a10b2..bf06ad522d9 100644
--- a/gcc/ipa-icf.cc
+++ b/gcc/ipa-icf.cc
@@ -3389,6 +3389,7 @@ sem_item_optimizer::merge_classes (unsigned int prev_class_count,
  continue;
 
sem_item *source = c->members[0];
+   bool this_merged_p = false;
 
if (DECL_NAME (source->decl)
&& MAIN_NAME_P (DECL_NAME (source->decl)))
@@ -3435,7 +3436,7 @@ sem_item_optimizer::merge_classes (unsigned int prev_class_count,
if (dbg_cnt (merged_ipa_icf))
  {
bool merged = source->merge (alias);
-   merged_p |= merged;
+   this_merged_p |= merged;
 
if (merged && alias->type == VAR)
  {
@@ -3444,6 +3445,35 @@ sem_item_optimizer::merge_classes (unsigned int prev_class_count,
  }
  }
  }
+
+   merged_p |= this_merged_p;
+   if (this_merged_p
+   && source->type == FUNC
+   && (!flag_wpa || flag_checking))
+ {
+   unsigned i;
+   tree name;
+   FOR_EACH_SSA_NAME (i, name, DECL_STRUCT_FUNCTION (source->decl))
+ {
+   /* We need to either merge or reset SSA_NAME_*_INFO.
+  For merging we don't preserve the mapping between
+  original and alias SSA_NAMEs from successful equals
+  calls.  */
+   if (POINTER_TYPE_P (TREE_TYPE (name)))
+ {
+   if (SSA_NAME_PTR_INFO (name))
+ {
+   gcc_checking_assert (!flag_wpa);
+   SSA_NAME_PTR_INFO (name) = NULL;
+ }
+ }
+   else if (SSA_NAME_RANGE_INFO (name))
+ {
+   gcc_checking_assert (!flag_wpa);
+   SSA_NAME_RANGE_INFO (name) = NULL;
+ }
+ }
+ }
   }
 
   if (!m_merged_variables.is_empty ())
diff --git a/gcc/testsuite/gcc.dg/pr113907-1.c b/gcc/testsuite/gcc.dg/pr113907-1.c
new file mode 100644
index 000..04c4fb8c128
--

[gcc r12-10515] aarch64: Fix TImode __sync_*_compare_and_exchange expansion with LSE [PR114310]

2024-06-11 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:9f484597028f2b2862bf22003dbae25c24ce5930

commit r12-10515-g9f484597028f2b2862bf22003dbae25c24ce5930
Author: Jakub Jelinek 
Date:   Thu Mar 14 14:09:20 2024 +0100

aarch64: Fix TImode __sync_*_compare_and_exchange expansion with LSE [PR114310]

The following testcase ICEs with LSE atomics.
The problem is that the @atomic_compare_and_swap expander uses
aarch64_reg_or_zero predicate for the desired operand, which is fine,
given that for most of the modes and even for TImode in some cases
it can handle zero immediate just fine, but the TImode
@aarch64_compare_and_swap_lse just uses register_operand for
that operand instead, again intentionally so, because the casp,
caspa, caspl and caspal instructions need to use a pair of consecutive
registers for the operand and xzr is just one register and we can't
just store zero into the link register to emulate pair of zeros.

So, the following patch fixes that by forcing the newval operand into
a register for the TImode LSE case.
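
For readers less familiar with the builtin, its semantics can be sketched
with a 64-bit variable instead of __int128, so the sketch does not depend
on 16-byte atomics.  The names v, cas and replay_testcase are hypothetical,
and the step sequence mirrors the one used by the new test, including a
zero new value.

```cpp
#include <cassert>

// 64-bit stand-in for the __int128 variable of the testcase; hypothetical.
volatile long long v = 10;

// Returns the value V held before the operation; NEWVAL is stored only if
// that value equalled EXPECTED.
long long cas (long long expected, long long newval)
{
  return __sync_val_compare_and_swap (&v, expected, newval);
}

// Replays the four steps of the testcase on the 64-bit stand-in.
void replay_testcase ()
{
  v = 10;
  long long r1 = cas (10, 0);	// Match: v becomes 0.
  long long r2 = cas (10, 15);	// No match: v stays 0.
  long long r3 = cas (0, 42);	// Match: v becomes 42.
  long long r4 = cas (31, 35);	// No match: v stays 42.
  assert (r1 == 10 && r2 == 0 && r3 == 0 && r4 == 42);
}
```

The first step is the interesting one for this bug: the new value is a
constant zero, which for TImode with LSE has to be placed in a register
pair before the casp family of instructions can use it.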

2024-03-14  Jakub Jelinek  

PR target/114310
* config/aarch64/aarch64.cc (aarch64_expand_compare_and_swap): For
TImode force newval into a register.

* gcc.dg/pr114310.c: New test.

(cherry picked from commit 9349aefa1df7ae36714b7b9f426ad46e314892d1)

Diff:
---
 gcc/config/aarch64/aarch64.cc   |  2 ++
 gcc/testsuite/gcc.dg/pr114310.c | 20 
 2 files changed, 22 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 96976abdbf4..f8082c4035e 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -23010,6 +23010,8 @@ aarch64_expand_compare_and_swap (rtx operands[])
 rval = copy_to_mode_reg (r_mode, oldval);
   else
emit_move_insn (rval, gen_lowpart (r_mode, oldval));
+  if (mode == TImode)
+   newval = force_reg (mode, newval);
 
   emit_insn (gen_aarch64_compare_and_swap_lse (mode, rval, mem,
   newval, mod_s));
diff --git a/gcc/testsuite/gcc.dg/pr114310.c b/gcc/testsuite/gcc.dg/pr114310.c
new file mode 100644
index 000..55edd800e42
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr114310.c
@@ -0,0 +1,20 @@
+/* PR target/114310 */
+/* { dg-do run { target int128 } } */
+
+volatile __attribute__((aligned (sizeof (__int128_t)))) __int128_t v = 10;
+
+int
+main ()
+{
+#if __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16
+  if (__sync_val_compare_and_swap (&v, (__int128_t) 10, (__int128_t) 0) != 10)
+    __builtin_abort ();
+  if (__sync_val_compare_and_swap (&v, (__int128_t) 10, (__int128_t) 15) != 0)
+    __builtin_abort ();
+  if (__sync_val_compare_and_swap (&v, (__int128_t) 0, (__int128_t) 42) != 0)
+    __builtin_abort ();
+  if (__sync_val_compare_and_swap (&v, (__int128_t) 31, (__int128_t) 35) != 42)
+    __builtin_abort ();
+#endif
+  return 0;
+}


[gcc r12-10514] bb-reorder: Fix -freorder-blocks-and-partition ICEs on aarch64 with asm goto [PR110079]

2024-06-11 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:b294d461e2efd6894ba6570ca003701c20fc3cd8

commit r12-10514-gb294d461e2efd6894ba6570ca003701c20fc3cd8
Author: Jakub Jelinek 
Date:   Thu Mar 7 10:02:49 2024 +0100

bb-reorder: Fix -freorder-blocks-and-partition ICEs on aarch64 with asm goto [PR110079]

The following testcase ICEs, because fix_crossing_unconditional_branches
thinks that asm goto is an unconditional jump and removes it, replacing it
with unconditional jump to one of the labels.
This doesn't happen on x86 because the function in question isn't invoked
there at all:
  /* If the architecture does not have unconditional branches that
 can span all of memory, convert crossing unconditional branches
 into indirect jumps.  Since adding an indirect jump also adds
 a new register usage, update the register usage information as
 well.  */
  if (!HAS_LONG_UNCOND_BRANCH)
fix_crossing_unconditional_branches ();
I think for the asm goto case, for the non-fallthru edge if any we should
handle it like any other fallthru (and fix_crossing_unconditional_branches
doesn't really deal with those, it only looks at explicit branches at the
end of bbs and we are in cfglayout mode at that point) and for the labels
we just pass the labels as immediates to the assembly and it is up to the
user to figure out how to store them/branch to them or whatever they want to
do.
So, the following patch fixes this by not treating asm goto as a simple
unconditional jump.

I really think that on the !HAS_LONG_UNCOND_BRANCH targets we have a bug
somewhere else, where outofcfglayout or whatever should actually create
those indirect jumps on the crossing edges instead of adding normal
unconditional jumps, I see e.g. in
__attribute__((cold)) int bar (char *);
__attribute__((hot)) int baz (char *);
void qux (int x) { if (__builtin_expect (!x, 1)) goto l1; bar (""); goto l1; l1: baz (""); }
void corge (int x) { if (__builtin_expect (!x, 0)) goto l1; baz (""); l2: return; l1: bar (""); goto l2; }
with -O2 -freorder-blocks-and-partition on aarch64 before/after this patch
just b .L? jumps which I believe are +-32MB, so if .text is larger than
32MB, it could fail to link, but this patch doesn't address that.

2024-03-07  Jakub Jelinek  

PR rtl-optimization/110079
* bb-reorder.cc (fix_crossing_unconditional_branches): Don't adjust
asm goto.

* gcc.dg/pr110079.c: New test.

(cherry picked from commit b209d905f5ce1fa9d76ce634fd54245ff340960b)

Diff:
---
 gcc/bb-reorder.cc   |  3 ++-
 gcc/testsuite/gcc.dg/pr110079.c | 43 +
 2 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/gcc/bb-reorder.cc b/gcc/bb-reorder.cc
index 2c194aa9055..ef2a1923511 100644
--- a/gcc/bb-reorder.cc
+++ b/gcc/bb-reorder.cc
@@ -2266,7 +2266,8 @@ fix_crossing_unconditional_branches (void)
  /* Make sure the jump is not already an indirect or table jump.  */
 
  if (!computed_jump_p (last_insn)
- && !tablejump_p (last_insn, NULL, NULL))
+ && !tablejump_p (last_insn, NULL, NULL)
+ && asm_noperands (PATTERN (last_insn)) < 0)
{
  /* We have found a "crossing" unconditional branch.  Now
 we must convert it to an indirect jump.  First create
diff --git a/gcc/testsuite/gcc.dg/pr110079.c b/gcc/testsuite/gcc.dg/pr110079.c
new file mode 100644
index 000..1682f9c2344
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr110079.c
@@ -0,0 +1,43 @@
+/* PR rtl-optimization/110079 */
+/* { dg-do compile { target lra } } */
+/* { dg-options "-O2" } */
+/* { dg-additional-options "-freorder-blocks-and-partition" { target freorder } } */
+
+int a;
+__attribute__((cold)) int bar (char *);
+__attribute__((hot)) int baz (char *);
+
+void
+foo (void)
+{
+l1:
+  while (a)
+;
+  bar ("");
+  asm goto ("" : : : : l2);
+  asm ("");
+l2:
+  goto l1;
+}
+
+void
+qux (void)
+{
+  asm goto ("" : : : : l1);
+  bar ("");
+  goto l1;
+l1:
+  baz ("");
+}
+
+void
+corge (void)
+{
+  asm goto ("" : : : : l1);
+  baz ("");
+l2:
+  return;
+l1:
+  bar ("");
+  goto l2;
+}


[gcc r12-10518] fold-const: Handle NON_LVALUE_EXPR in native_encode_initializer [PR114537]

2024-06-11 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:42afabb838d511f5feb150bfa4e68b5880aae1fa

commit r12-10518-g42afabb838d511f5feb150bfa4e68b5880aae1fa
Author: Jakub Jelinek 
Date:   Thu Apr 4 10:47:52 2024 +0200

fold-const: Handle NON_LVALUE_EXPR in native_encode_initializer [PR114537]

The following testcase is incorrectly rejected.  The problem is that
for bit-fields native_encode_initializer expects the corresponding
CONSTRUCTOR elt value must be INTEGER_CST, but that isn't the case
here, it is wrapped into NON_LVALUE_EXPR by maybe_wrap_with_location.
We could STRIP_ANY_LOCATION_WRAPPER as well, but as all we are looking for
is INTEGER_CST inside, just looking through NON_LVALUE_EXPR seems easier.
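
The look-through step described above can be pictured with a toy tree.  This
is only a minimal sketch: the toy_node type, the toy_code enum and
toy_encode_bitfield_value are hypothetical stand-ins invented for
illustration, not GCC's tree API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC tree codes; not the real enum.  */
enum toy_code { TOY_INTEGER_CST, TOY_NON_LVALUE_EXPR, TOY_OTHER };

struct toy_node
{
  enum toy_code code;
  struct toy_node *op0;   /* Wrapped operand, NULL for leaves.  */
  long value;             /* Only meaningful for TOY_INTEGER_CST.  */
};

/* Return 1 and store the constant in *OUT if VAL is an integer constant,
   possibly wrapped in a single TOY_NON_LVALUE_EXPR; return 0 otherwise.  */
int
toy_encode_bitfield_value (const struct toy_node *val, long *out)
{
  assert (val != NULL && out != NULL);
  /* Look through one wrapper before testing the code, as the patch does.  */
  if (val->code == TOY_NON_LVALUE_EXPR && val->op0 != NULL)
    val = val->op0;
  if (val->code != TOY_INTEGER_CST)
    return 0;
  *out = val->value;
  return 1;
}
```

Without the look-through, a wrapped constant would take the "return 0" path,
which corresponds to the incorrect rejection reported in the PR.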

2024-04-04  Jakub Jelinek  

PR c++/114537
* fold-const.cc (native_encode_initializer): Look through
NON_LVALUE_EXPR if val is INTEGER_CST.

* g++.dg/cpp2a/bit-cast16.C: New test.

(cherry picked from commit 1baec8deb014b8a7da58879a407a4c00cdeb5a09)

Diff:
---
 gcc/fold-const.cc   |  2 ++
 gcc/testsuite/g++.dg/cpp2a/bit-cast16.C | 16 
 2 files changed, 18 insertions(+)

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index da96ed34a4c..70302943cea 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -8423,6 +8423,8 @@ native_encode_initializer (tree init, unsigned char *ptr, int len,
  if (BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN)
return 0;
 
+ if (TREE_CODE (val) == NON_LVALUE_EXPR)
+   val = TREE_OPERAND (val, 0);
  if (TREE_CODE (val) != INTEGER_CST)
return 0;
 
diff --git a/gcc/testsuite/g++.dg/cpp2a/bit-cast16.C b/gcc/testsuite/g++.dg/cpp2a/bit-cast16.C
new file mode 100644
index 000..d298af67ef2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/bit-cast16.C
@@ -0,0 +1,16 @@
+// PR c++/114537
+// { dg-do compile { target c++20 } }
+
+namespace std {
+template <typename T, typename F>
+constexpr T
+bit_cast (const F& f) noexcept
+{
+  return __builtin_bit_cast (T, f);
+}
+}
+
+struct A { signed char b : 1 = 0; signed char c : 7 = 0; };
+struct D { unsigned char e; };
+constexpr unsigned char f = std::bit_cast<D> (A{}).e;
+static_assert (f == 0);


[gcc r12-10513] i386: Fix ICEs with SUBREGs from vector etc. constants to XFmode [PR114184]

2024-06-11 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:929972273e858a9a913b0d74e69ac2f8d7255c28

commit r12-10513-g929972273e858a9a913b0d74e69ac2f8d7255c28
Author: Jakub Jelinek 
Date:   Mon Mar 4 10:04:19 2024 +0100

i386: Fix ICEs with SUBREGs from vector etc. constants to XFmode [PR114184]

The Intel extended format has the various weird number categories,
pseudo denormals, pseudo infinities, pseudo NaNs and unnormals.
Those are not representable in the GCC real_value and so neither
GIMPLE nor RTX VIEW_CONVERT_EXPR/SUBREG folding folds those into
constants.

As can be seen on the following testcase, because it isn't folded
(since GCC 12, before that we were folding it) we can end up with
a SUBREG of a CONST_VECTOR or similar constant, which isn't valid
general_operand, so we ICE during vregs pass trying to recognize
the move instruction.
Initially I thought it is a middle-end bug, the movxf instruction
has general_operand predicate, but the middle-end certainly never
tests that predicate, seems moves are special optabs.
And looking at other mov optabs, e.g. for vector modes the i386
patterns use nonimmediate_operand predicate on the input, yet
ix86_expand_vector_move deals with CONSTANT_P and SUBREG of CONSTANT_P
arguments which if the predicate was checked couldn't ever make it through.

The following patch handles this case similarly to the
ix86_expand_vector_move's SUBREG of CONSTANT_P case, does it just for XFmode
because I believe that is the only mode that needs it from the scalar ones,
others should just be folded.

2024-03-04  Jakub Jelinek  

PR target/114184
* config/i386/i386-expand.cc (ix86_expand_move): If XFmode op1
is SUBREG of CONSTANT_P, force the SUBREG_REG into memory or
register.

* gcc.target/i386/pr114184.c: New test.

(cherry picked from commit ea1c16f95b8fbaba4a7f3663ff9933ebedfb92a5)

Diff:
---
 gcc/config/i386/i386-expand.cc   | 17 +
 gcc/testsuite/gcc.target/i386/pr114184.c | 22 ++
 2 files changed, 39 insertions(+)

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 304d5e7cbbc..c57a8f56dac 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -350,6 +350,23 @@ ix86_expand_move (machine_mode mode, rtx operands[])
 
 default:
   break;
+
+case SUBREG:
+  /* As not all values in XFmode are representable in real_value,
+we might be called with unfoldable SUBREGs of constants.  */
+  if (mode == XFmode
+ && CONSTANT_P (SUBREG_REG (op1))
+ && can_create_pseudo_p ())
+   {
+ machine_mode imode = GET_MODE (SUBREG_REG (op1));
+ rtx r = force_const_mem (imode, SUBREG_REG (op1));
+ if (r)
+   r = validize_mem (r);
+ else
+   r = force_reg (imode, SUBREG_REG (op1));
+ op1 = simplify_gen_subreg (mode, r, imode, SUBREG_BYTE (op1));
+   }
+  break;
 }
 
   if ((flag_pic || MACHOPIC_INDIRECT)
diff --git a/gcc/testsuite/gcc.target/i386/pr114184.c b/gcc/testsuite/gcc.target/i386/pr114184.c
new file mode 100644
index 000..360b3b95026
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr114184.c
@@ -0,0 +1,22 @@
+/* PR target/114184 */
+/* { dg-do compile } */
+/* { dg-options "-Og -mavx2" } */
+
+typedef unsigned char V __attribute__((vector_size (32)));
+typedef unsigned char W __attribute__((vector_size (16)));
+
+_Complex long double
+foo (void)
+{
+  _Complex long double d;
+  *(V *) &d = (V) { 149, 136, 89, 42, 38, 240, 196, 194 };
+  return d;
+}
+
+long double
+bar (void)
+{
+  long double d;
+  *(W *) &d = (W) { 149, 136, 89, 42, 38, 240, 196, 194 };
+  return d;
+}


[gcc r12-10512] c: Handle scoped attributes in __has*attribute and scoped attribute parsing changes in -std=c11 etc.

2024-06-11 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:c2cd5eefccf54074ea9f8dc677a9a05b8a880ae4

commit r12-10512-gc2cd5eefccf54074ea9f8dc677a9a05b8a880ae4
Author: Jakub Jelinek 
Date:   Thu Feb 22 19:32:02 2024 +0100

c: Handle scoped attributes in __has*attribute and scoped attribute parsing changes in -std=c11 etc. modes [PR114007]

We aren't able to parse __has_attribute (vendor::attr) (and __has_c_attribute
and __has_cpp_attribute) in strict C < C23 modes.  While in -std=gnu* modes
or in -std=c23 there is CPP_SCOPE token, in -std=c* (except for -std=c23)
there is just a pair of CPP_COLON tokens.
The c-lex.cc hunk adds support for that, but always returns 0 in that case
unlike the GCC 14+ version.
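
The distinction between "::" and ": :" can be sketched with a toy
character-level lexer.  This is an illustration under stated assumptions, not
libcpp code: toy_token, toy_lex, toy_is_scope and the TOY_COLON_SCOPE value
are all hypothetical names, and each non-space character is treated as its
own token.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_COLON_SCOPE 1   /* Hypothetical flag value.  */

struct toy_token
{
  char ch;     /* The character this token stands for.  */
  int flags;   /* TOY_COLON_SCOPE if a ':' is directly followed by ':'.  */
};

/* Lex each non-space character of SRC into TOKS (at most MAX entries) and
   return the number of tokens.  A ':' immediately followed by another ':'
   gets TOY_COLON_SCOPE, so "::" can later be told apart from ": :".  */
size_t
toy_lex (const char *src, struct toy_token *toks, size_t max)
{
  size_t n = 0;
  assert (src != NULL && toks != NULL);
  for (size_t i = 0; src[i] != '\0' && n < max; i++)
    {
      if (src[i] == ' ')
        continue;
      toks[n].ch = src[i];
      toks[n].flags
        = (src[i] == ':' && src[i + 1] == ':') ? TOY_COLON_SCOPE : 0;
      n++;
    }
  return n;
}

/* Return 1 if TOKS[I] and TOKS[I + 1] together form a scope operator.  */
int
toy_is_scope (const struct toy_token *toks, size_t n, size_t i)
{
  return i + 1 < n
         && toks[i].ch == ':'
         && (toks[i].flags & TOY_COLON_SCOPE) != 0
         && toks[i + 1].ch == ':';
}
```

Recording adjacency on the first colon at lexing time is what lets the
consumer skip padding tokens later and still reject a spaced-out ": :".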

2024-02-22  Jakub Jelinek  

PR c/114007
gcc/c-family/
* c-lex.cc (c_common_has_attribute): Parse 2 CPP_COLONs with
the first one with COLON_SCOPE flag the same as CPP_SCOPE but
ensure 0 is returned then.
gcc/testsuite/
* gcc.dg/c23-attr-syntax-8.c: New test.
libcpp/
* include/cpplib.h (COLON_SCOPE): Define to PURE_ZERO.
* lex.cc (_cpp_lex_direct): When lexing CPP_COLON with another
colon after it, if !CPP_OPTION (pfile, scope) set COLON_SCOPE
flag on the first CPP_COLON token.

(cherry picked from commit 37127ed975e09813eaa2d1cf1062055fce45dd16)

Diff:
---
 gcc/c-family/c-lex.cc| 32 ++--
 gcc/testsuite/gcc.dg/c23-attr-syntax-8.c | 12 
 libcpp/include/cpplib.h  |  1 +
 libcpp/lex.cc|  9 +++--
 4 files changed, 50 insertions(+), 4 deletions(-)

diff --git a/gcc/c-family/c-lex.cc b/gcc/c-family/c-lex.cc
index 8bfa4f4024f..bd48bfc88e0 100644
--- a/gcc/c-family/c-lex.cc
+++ b/gcc/c-family/c-lex.cc
@@ -327,9 +327,28 @@ c_common_has_attribute (cpp_reader *pfile, bool std_syntax)
   do
nxt_token = cpp_peek_token (pfile, idx++);
   while (nxt_token->type == CPP_PADDING);
-  if (nxt_token->type == CPP_SCOPE)
+  if (!c_dialect_cxx ()
+ && nxt_token->type == CPP_COLON
+ && (nxt_token->flags & COLON_SCOPE) != 0)
+   {
+ const cpp_token *prev_token = nxt_token;
+ do
+   nxt_token = cpp_peek_token (pfile, idx++);
+ while (nxt_token->type == CPP_PADDING);
+ if (nxt_token->type == CPP_COLON)
+   {
+ /* __has_attribute (vendor::attr) in -std=c17 etc. modes.
+:: isn't CPP_SCOPE but 2 CPP_COLON tokens, where the
+first one should have COLON_SCOPE flag to distinguish
+it from : :.  */
+ have_scope = true;
+ get_token_no_padding (pfile); // Eat first colon.
+   }
+ else
+   nxt_token = prev_token;
+   }
+  if (nxt_token->type == CPP_SCOPE || have_scope)
{
- have_scope = true;
  get_token_no_padding (pfile); // Eat scope.
  nxt_token = get_token_no_padding (pfile);
  if (nxt_token->type == CPP_NAME)
@@ -359,6 +378,15 @@ c_common_has_attribute (cpp_reader *pfile, bool std_syntax)
 "attribute identifier required after scope");
  attr_name = NULL_TREE;
}
+ if (have_scope)
+   {
+ /* The parser in this case won't be able to parse
+[[vendor::attr]], so ensure 0 is returned.  */
+ result = 0;
+ attr_name = NULL_TREE;
+   }
+ else
+   have_scope = true;
}
   else
{
diff --git a/gcc/testsuite/gcc.dg/c23-attr-syntax-8.c b/gcc/testsuite/gcc.dg/c23-attr-syntax-8.c
new file mode 100644
index 000..6fff160dff0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/c23-attr-syntax-8.c
@@ -0,0 +1,12 @@
+/* PR c/114007 */
+/* { dg-do compile } */
+/* { dg-options "-std=c11" } */
+
+#if __has_c_attribute (gnu::unused)
+[[gnu::unused]]
+#endif
+int i;
+#if __has_cpp_attribute (gnu::unused)
+[[gnu::unused]]
+#endif
+int j;
diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h
index 3eba6f74b57..abdc207d1a1 100644
--- a/libcpp/include/cpplib.h
+++ b/libcpp/include/cpplib.h
@@ -193,6 +193,7 @@ struct GTY(()) cpp_string {
 #define BOL        (1 << 6) /* Token at beginning of line.  */
 #define PURE_ZERO  (1 << 7) /* Single 0 digit, used by the C++ frontend,
set in c-lex.cc.  */
+#define COLON_SCOPE PURE_ZERO /* Adjacent colons in C < 23.  */
 #define SP_DIGRAPH (1 << 8) /* # or ## token was a digraph.  */
 #define SP_PREV_WHITE  (1 << 9) /* If whitespace before a ##
operator, or before this token
diff --git a/libcpp/lex.cc b/libcpp/lex.cc
index fb1dfabb7af..04acaa72331 100644
--- a/libcpp/lex.cc
+++ b/libcpp/lex.cc
@

[gcc r12-10511] attribs: Don't canonicalize lookup_scoped_attribute_spec argument [PR113674]

2024-06-11 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:fda7a897d037ff1c59630f0a741eb20e68f45848

commit r12-10511-gfda7a897d037ff1c59630f0a741eb20e68f45848
Author: Jakub Jelinek 
Date:   Mon Feb 12 20:45:01 2024 +0100

attribs: Don't canonicalize lookup_scoped_attribute_spec argument [PR113674]

The C and C++ FEs when parsing attributes already canonicalize them
(i.e. if they start with __ and end with __ substrings, we remove those).
lookup_attribute already verifies in gcc_assert that the first character
of name is not an underscore, and even lookup_scoped_attribute_spec doesn't
attempt to canonicalize the namespace it is passed.  But for some historic
reason it was canonicalizing the name argument, which misbehaves when
an attribute starts with ____ and ends with ____.
I believe it is just wrong to try to canonicalize the
lookup_scoped_attribute_spec name argument, it should have been
canonicalized already, in other spots where it is called it is already
canonicalized before.
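
Why a second canonicalization misbehaves can be shown with a small sketch.
It assumes the problematic spelling is a doubly wrapped name such as
____noreturn____; toy_canonicalize is a hypothetical helper that only models
the "strip one leading and one trailing __" rule described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Strip one leading and one trailing "__" from the name at *STR of length
   *LEN, if both are present.  Hypothetical helper for illustration only.  */
void
toy_canonicalize (const char **str, size_t *len)
{
  assert (str != NULL && *str != NULL && len != NULL);
  const char *s = *str;
  size_t l = *len;
  if (l > 4
      && s[0] == '_' && s[1] == '_'
      && s[l - 1] == '_' && s[l - 2] == '_')
    {
      *str = s + 2;
      *len = l - 4;
    }
}
```

Applied once, ____noreturn____ becomes __noreturn__, which is not a known
attribute and should be diagnosed as ignored.  Applied a second time it
becomes noreturn, so the lookup would wrongly succeed.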

2024-02-12  Jakub Jelinek  

PR c++/113674
* attribs.cc (extract_attribute_substring): Remove.
(lookup_scoped_attribute_spec): Don't call it.

* c-c++-common/Wattributes-3.c: New test.

(cherry picked from commit b42e978f29b33071addff6d7bb8bcdb11d176606)

Diff:
---
 gcc/attribs.cc | 10 --
 gcc/testsuite/c-c++-common/Wattributes-3.c | 13 +
 2 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/gcc/attribs.cc b/gcc/attribs.cc
index 876277dd5b3..f73e00b6201 100644
--- a/gcc/attribs.cc
+++ b/gcc/attribs.cc
@@ -109,15 +109,6 @@ static const struct attribute_spec empty_attribute_table[] =
   { NULL, 0, 0, false, false, false, false, NULL, NULL }
 };
 
-/* Return base name of the attribute.  Ie '__attr__' is turned into 'attr'.
-   To avoid need for copying, we simply return length of the string.  */
-
-static void
-extract_attribute_substring (struct substring *str)
-{
-  canonicalize_attr_name (str->str, str->length);
-}
-
 /* Insert an array of attributes ATTRIBUTES into a namespace.  This
array must be NULL terminated.  NS is the name of attribute
namespace.  IGNORED_P is true iff all unknown attributes in this
@@ -409,7 +400,6 @@ lookup_scoped_attribute_spec (const_tree ns, const_tree name)
 
   attr.str = IDENTIFIER_POINTER (name);
   attr.length = IDENTIFIER_LENGTH (name);
-  extract_attribute_substring (&attr);
   return attrs->attribute_hash->find_with_hash (&attr,
substring_hash (attr.str,
attr.length));
diff --git a/gcc/testsuite/c-c++-common/Wattributes-3.c b/gcc/testsuite/c-c++-common/Wattributes-3.c
new file mode 100644
index 000..a1a6d9a5895
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/Wattributes-3.c
@@ -0,0 +1,13 @@
+/* PR c++/113674 */
+/* { dg-do compile { target { c || c++11 } } } */
+/* { dg-options "" } */
+
+[[____noreturn____]] int foo (int i)   /* { dg-warning "'__noreturn__' attribute (directive )?ignored" } */
+{
+  return i;
+}
+
+[[____maybe_unused____]] int bar (int i)   /* { dg-warning "'__maybe_unused__' attribute (directive )?ignored" } */
+{
+  return i;
+}


[gcc r12-10510] ggc-common: Fix save PCH assertion

2024-06-11 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:e6976013c0910a1043b82a820180f01f356ffd3d

commit r12-10510-ge6976013c0910a1043b82a820180f01f356ffd3d
Author: Jakub Jelinek 
Date:   Sat Feb 3 14:37:19 2024 +0100

ggc-common: Fix save PCH assertion

We are getting a gnuradio PCH ICE
/usr/include/pybind11/stl.h:447:1: internal compiler error: in gt_pch_save, at ggc-common.cc:693
0x1304e7d gt_pch_save(_IO_FILE*)
../../gcc/ggc-common.cc:693
0x12a45fb c_common_write_pch()
../../gcc/c-family/c-pch.cc:175
0x18ad711 c_parse_final_cleanups()
../../gcc/cp/decl2.cc:5062
0x213988b c_common_parse_file()
../../gcc/c-family/c-opts.cc:1319
(unfortunately it isn't always reproducible, but often needs
up to 100 attempts, isn't reproducible in a cross etc.).
The bug is in the assertion I've added in gt_pch_save when adding
relocation support for the PCH files in case they happen not to be
mmapped at the selected address.
addr is a relocated address which points to a location in the PCH
blob (starting at mmi.preferred_base, with mmi.size bytes) which contains
a pointer that needs to be relocated.  So the assertion is meant to
verify the address is within the PCH blob, obviously it needs to be
equal or above mmi.preferred_base, but I got the other comparison wrong
and when one is very unlucky and the last sizeof (void *) bytes of the
blob happen to be a pointer which needs to be relocated, such as on the
s390x host addr 0x8008a04ff8, mmi.preferred_base 0x8000000000 and
mmi.size 0x8a05000, addr + sizeof (void *) is equal to mmi.preferred_base +
mmi.size and that is still fine, both addresses are end of something.
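
The off-by-one can be checked with plain integer arithmetic.  The sketch
below is an illustration, not the GCC code: the function names are
hypothetical, the pointer size is passed in explicitly, and
mmi.preferred_base is taken to be 0x8000000000, the value implied by
addr + sizeof (void *) being equal to mmi.preferred_base + mmi.size.

```c
#include <assert.h>
#include <stdint.h>

/* Return 1 if a slot of PTR_SIZE bytes starting at ADDR lies entirely
   inside the blob [BASE, BASE + SIZE).  The slot may end exactly at the
   end of the blob, hence <= rather than <.  */
int
toy_slot_in_blob (uint64_t addr, uint64_t ptr_size,
                  uint64_t base, uint64_t size)
{
  assert (ptr_size > 0);
  return addr >= base && addr + ptr_size <= base + size;
}

/* The pre-patch form of the check, which rejects the very last slot.  */
int
toy_slot_in_blob_strict (uint64_t addr, uint64_t ptr_size,
                         uint64_t base, uint64_t size)
{
  assert (ptr_size > 0);
  return addr >= base && addr + ptr_size < base + size;
}
```

With addr 0x8008a04ff8, an 8-byte pointer and size 0x8a05000, both sides of
the comparison are 0x8008a05000, so only the <= form accepts the slot.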

2024-02-03  Jakub Jelinek  

* ggc-common.cc (gt_pch_save): Allow addr to be equal to
mmi.preferred_base + mmi.size - sizeof (void *).

(cherry picked from commit a4e240643cfa387579d4fa2bf9210a7d20433847)

Diff:
---
 gcc/ggc-common.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/ggc-common.cc b/gcc/ggc-common.cc
index 755d166417a..002212f8203 100644
--- a/gcc/ggc-common.cc
+++ b/gcc/ggc-common.cc
@@ -670,7 +670,7 @@ gt_pch_save (FILE *f)
 {
   gcc_assert ((uintptr_t) addr >= (uintptr_t) mmi.preferred_base
  && ((uintptr_t) addr + sizeof (void *)
- < (uintptr_t) mmi.preferred_base + mmi.size));
+ <= (uintptr_t) mmi.preferred_base + mmi.size));
   if (addr == last_addr)
continue;
   if (last_addr == NULL)


[gcc r12-10509] tree-ssa-strlen: Fix up handle_store [PR113603]

2024-06-11 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:f5758e8142d8926f9a3e3500ba3c9956054dfaf8

commit r12-10509-gf5758e8142d8926f9a3e3500ba3c9956054dfaf8
Author: Jakub Jelinek 
Date:   Tue Jan 30 09:58:05 2024 +0100

tree-ssa-strlen: Fix up handle_store [PR113603]

Since r10-2101-gb631bdb3c16e85f35d3 handle_store uses
count_nonzero_bytes{,_addr} which (more recently limited to statements
with the same vuse) can walk earlier statements feeding the rhs
of the store and call get_stridx on it.
Unlike most of the other functions where get_stridx is called first on
rhs and only later on lhs, handle_store calls get_stridx on the lhs before
the count_nonzero_bytes* call and does some si->nonzero_bytes comparison
on it.
Now, strinfo structures are refcounted and it is important not to screw
it up.
What happens on the following testcase is that we call get_strinfo on the
destination idx's base (g), which returns a strinfo at that moment
with refcount of 2, one copy referenced in bb 2 final strinfos, one in bb 3
(the vector of strinfos was unshared from the dominator there because some
other strinfo was added) and finally we process a store in bb 6.
Now, count_nonzero_bytes is called and that sees &g[1] in a PHI and
calls get_stridx on it, which in turn calls get_stridx_plus_constant
because &g + 1 address doesn't have stridx yet.  This creates a new
strinfo for it:
  si = new_strinfo (ptr, idx, build_int_cst (size_type_node, nonzero_chars),
basesi->full_string_p);
  set_strinfo (idx, si);
and the latter call, because it is the first one in bb 6 that needs it,
unshares the stridx_to_strinfo vector (so refcount of the g strinfo becomes
3).
Now, get_stridx_plus_constant needs to chain the new strinfo of &g[1] in
between the related strinfos, so after the g record.  Because the strinfo
is now shared between the current bb and 2 other bbs, it needs to
unshare_strinfo it (creating a new strinfo which can be modified as a copy
of the old one, decrementing refcount of the old shared one and setting
refcount of the new one to 1):
  if (strinfo *nextsi = get_strinfo (chainsi->next))
{
  nextsi = unshare_strinfo (nextsi);
  si->next = nextsi->idx;
  nextsi->prev = idx;
}
  chainsi = unshare_strinfo (chainsi);
  if (chainsi->first == 0)
chainsi->first = chainsi->idx;
  chainsi->next = idx;
Now, the bug is that the caller of this a couple of frames above,
handle_store, holds on a pointer to this g strinfo (but doesn't know
about the unsharing, so the pointer is to the old strinfo with refcount
of 2), and later needs to update it, so it
  si = unshare_strinfo (si);
and modifies some fields in it.
This creates a new strinfo (with refcount of 1 which is stored into
the vector of the current bb) based on the old strinfo for g and
decrements refcount of the old one to 1.  So, now we are in inconsistent
state, because the old strinfo for g is referenced in bb 2 and bb 3
vectors, but has just refcount of 1, and then have one strinfo (the one
created by unshare_strinfo (chainsi) in get_stridx_plus_constant) which
has refcount of 1 but isn't referenced from anywhere anymore.
Later on when we free one of the bb 2 or bb 3 vectors (forgot which)
that decrements refcount from 1 to 0 and poisons the strinfo/returns it to
the pool, but then maybe_invalidate when looking at the other bb's pointer
to it ICEs.

The following patch fixes it by calling get_strinfo again, it is guaranteed
to return non-NULL, but could be an unshared copy instead of the originally
fetched shared one.
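
The stale-pointer problem and the refetch fix can be modelled with a toy
copy-on-write table.  This is a sketch under simplifying assumptions, not the
tree-ssa-strlen.cc code: toy_info, toy_unshare, toy_callee and toy_store are
hypothetical, and the table has a single vector rather than one per basic
block.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy refcounted record; a hypothetical stand-in for strinfo.  */
struct toy_info
{
  int refcount;
  int idx;      /* Slot of this record in the table.  */
  int length;
};

/* Return a record for SI's slot that is safe to modify.  If SI is shared,
   copy it, drop one reference from SI and store the copy in TABLE.  */
struct toy_info *
toy_unshare (struct toy_info **table, struct toy_info *si)
{
  assert (table != NULL && si != NULL);
  if (si->refcount == 1)
    return si;
  struct toy_info *copy = malloc (sizeof *copy);
  if (copy == NULL)
    abort ();
  *copy = *si;
  copy->refcount = 1;
  si->refcount--;
  table[si->idx] = copy;
  return copy;
}

/* Stand-in for a callee that unshares the slot behind the caller's back.  */
void
toy_callee (struct toy_info **table, int idx)
{
  toy_unshare (table, table[idx]);
}

/* Caller that fetched SI early.  With REFETCH nonzero it reloads SI from
   the table after the call, which is the shape of the fix.  */
struct toy_info *
toy_store (struct toy_info **table, int idx, int new_length, int refetch)
{
  struct toy_info *si = table[idx];
  toy_callee (table, idx);
  if (refetch)
    si = table[idx];
  si = toy_unshare (table, si);
  si->length = new_length;
  return si;
}
```

Starting from a record with refcount 3, the refetching caller leaves the
shared original at refcount 2, matching its two remaining holders.  Without
the refetch the original is decremented twice and ends at 1, while the copy
made by the callee is no longer referenced from the table.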

I believe we only need to do this refetching for the case where get_strinfo
is called on the lhs before get_stridx is called on other operands, because
we should be always modifying (apart from the chaining changes) the strinfo
for the destination of the statements, not other strinfos just consumed in
there.

2024-01-30  Jakub Jelinek  

PR tree-optimization/113603
* tree-ssa-strlen.cc (strlen_pass::handle_store): After
count_nonzero_bytes call refetch si using get_strinfo in case it
has been unshared in the meantime.

* gcc.c-torture/compile/pr113603.c: New test.

(cherry picked from commit d7250c1e02478586a0cd6d5cb67bf4d17249a7e7)

Diff:
---
 gcc/testsuite/gcc.c-torture/compile/pr113603.c | 40 ++
 gcc/tree-ssa-strlen.cc |  3 ++
 2 files changed, 43 insertions(+)

diff --git a/gcc/testsuite/gcc.c-torture/compile/pr113603.c b/gcc/testsuite/gcc.c-torture/compile/pr113603.c
new file mode 100644
index 000..0d4e817fbef
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr113603.c
@@ -0,0 +1,40 @@
+/* PR tree

[gcc r12-10517] libquadmath: Don't assume the storage for __float128 arguments is aligned [PR114533]

2024-06-11 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:9987fe67cf6211515d8ebf6528cc83c77dfb5bf3

commit r12-10517-g9987fe67cf6211515d8ebf6528cc83c77dfb5bf3
Author: Jakub Jelinek 
Date:   Wed Apr 3 10:02:35 2024 +0200

libquadmath: Don't assume the storage for __float128 arguments is aligned [PR114533]

With the register_printf_type/register_printf_modifier/register_printf_specifier
APIs the C library is just told the size of the argument and is provided with
a callback to fetch the argument from va_list using va_arg into C library
provided memory.  The C library isn't told what alignment requirement it has,
but we were using direct load of a __float128 value from that memory which
assumes __alignof (__float128) alignment.

The following patch fixes that by using memcpy instead.
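
As a minimal sketch of the technique, the helper below loads a value from
storage whose alignment is unknown.  It uses double as a stand-in for
__float128 so that it builds on targets without that type, and
toy_load_unaligned is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/* Load a double from SRC, which need not be suitably aligned.  A direct
   *(const double *) SRC would assume __alignof (double) alignment.  */
double
toy_load_unaligned (const void *src)
{
  double value;
  assert (src != NULL);
  memcpy (&value, src, sizeof value);
  return value;
}
```

Compilers typically expand a fixed-size memcpy like this into an unaligned
load where the target allows it, so the change should cost little.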

I haven't been able to reproduce an actual crash, tried
#include <quadmath.h>
#include <stdlib.h>
#include <stdio.h>

int main ()
{
  __float128 r;
  int prec = 20;
  int width = 46;
  char buf[128];

  r = 2.0q;
  r = sqrtq (r);
  int n = quadmath_snprintf (buf, sizeof buf, "%+-#*.20Qe", width, r);
  if ((size_t) n < sizeof buf)
printf ("%s\n", buf);
/* Prints: +1.41421356237309504880e+00 */
  quadmath_snprintf (buf, sizeof buf, "%Qa", r);
  if ((size_t) n < sizeof buf)
printf ("%s\n", buf);
/* Prints: 0x1.6a09e667f3bcc908b2fb1366ea96p+0 */
  n = quadmath_snprintf (NULL, 0, "%+-#46.*Qe", prec, r);
  if (n > -1)
{
  char *str = malloc (n + 1);
  if (str)
{
  quadmath_snprintf (str, n + 1, "%+-#46.*Qe", prec, r);
  printf ("%s\n", str);
  /* Prints: +1.41421356237309504880e+00 */
}
  free (str);
}
  printf ("%+-#*.20Qe\n", width, r);
  printf ("%Qa\n", r);
  printf ("%+-#46.*Qe\n", prec, r);
  printf ("%d %Qe %d %Qe %d %Qe\n", 1, r, 2, r, 3, r);
  return 0;
}
In any case, I think memcpy for loading from it is right.

2024-04-03  Simon Chopin  
Jakub Jelinek  

PR libquadmath/114533
* printf/printf_fp.c (__quadmath_printf_fp): Use memcpy to copy
__float128 out of args.
* printf/printf_fphex.c (__quadmath_printf_fphex): Likewise.

Signed-off-by: Simon Chopin 
(cherry picked from commit 8455d6f6cd43b7b143ab9ee19437452fceba9cc9)

Diff:
---
 libquadmath/printf/printf_fp.c| 2 +-
 libquadmath/printf/printf_fphex.c | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/libquadmath/printf/printf_fp.c b/libquadmath/printf/printf_fp.c
index 8effcee88fa..9968aa5307c 100644
--- a/libquadmath/printf/printf_fp.c
+++ b/libquadmath/printf/printf_fp.c
@@ -363,7 +363,7 @@ __quadmath_printf_fp (struct __quadmath_printf_file *fp,
 
   /* Fetch the argument value. */
 {
-  fpnum = **(const __float128 **) args[0];
+  memcpy (&fpnum, *(const void *const *) args[0], sizeof (fpnum));
 
   /* Check for special values: not a number or infinity.  */
   if (isnanq (fpnum))
diff --git a/libquadmath/printf/printf_fphex.c b/libquadmath/printf/printf_fphex.c
index a40a6b00945..ddb413563c6 100644
--- a/libquadmath/printf/printf_fphex.c
+++ b/libquadmath/printf/printf_fphex.c
@@ -163,7 +163,8 @@ __quadmath_printf_fphex (struct __quadmath_printf_file *fp,
 
   /* Fetch the argument value. */
 {
-  fpnum.value = **(const __float128 **) args[0];
+  memcpy (&fpnum.value, *(const void *const *) args[0],
+ sizeof (fpnum.value));
 
   /* Check for special values: not a number or infinity.  */
   if (isnanq (fpnum.value))


[gcc r12-10507] i386: Add -masm=intel profiling support [PR113122]

2024-06-11 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:bc51280bea76a382875da36e45ebb265b8c0

commit r12-10507-gbc51280bea76a382875da36e45ebb265b8c0
Author: Jakub Jelinek 
Date:   Thu Jan 18 10:21:12 2024 +0100

i386: Add -masm=intel profiling support [PR113122]

x86_function_profiler emits assembly directly into file and only emits
AT&T syntax.  The following patch adjusts it to emit MASM syntax
if -masm=intel.
As it doesn't use asm_fprintf, I can't use {|} syntax for the dialects.
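
The approach of choosing the format string by dialect can be sketched
outside GCC as follows.  The two format strings are the ones used for the
64-bit counter load in the patch; toy_dialect, toy_emit_lea and the ".L"
prefix passed in the usage are assumptions made for illustration, and the
output goes to a buffer instead of the assembly file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum toy_dialect { TOY_ASM_ATT, TOY_ASM_INTEL };

/* Write the counter-address load for label number LABELNO into BUF (SIZE
   bytes) in the requested dialect and return snprintf's result.  PREFIX
   plays the role of LPREFIX.  */
int
toy_emit_lea (char *buf, size_t size, enum toy_dialect dialect,
              const char *prefix, int labelno)
{
  assert (buf != NULL && prefix != NULL);
  if (dialect == TOY_ASM_INTEL)
    return snprintf (buf, size, "\tlea\tr11, %sP%d[rip]\n", prefix, labelno);
  return snprintf (buf, size, "\tleaq\t%sP%d(%%rip), %%r11\n",
                   prefix, labelno);
}
```

Note the swapped operand order and the missing % register prefix and q
suffix in the Intel form; both strings should assemble to the same bytes.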

I've tested using
for i in -mcmodel=large "-mcmodel=large -fpic" "" -fpic "-m32 -fpic" "-m32"; do
./xgcc -B ./ -c -O2 -fprofile $i -masm=att pr113122.c -o pr113122.o1;
./xgcc -B ./ -c -O2 -fprofile $i -masm=intel pr113122.c -o pr113122.o2;
objdump -dr pr113122.o1 > /tmp/1; objdump -dr pr113122.o2 > /tmp/2;
diff -up /tmp/1 /tmp/2; done
that the emitted sequences are identical after assembly.

2024-01-18  Jakub Jelinek  

PR target/113122
* config/i386/i386.cc (x86_function_profiler): Add -masm=intel
support.  Add missing space after , in emitted assembly in some
cases.  Formatting fixes.

* gcc.target/i386/pr113122-1.c: New test.
* gcc.target/i386/pr113122-2.c: New test.
* gcc.target/i386/pr113122-3.c: New test.
* gcc.target/i386/pr113122-4.c: New test.

(cherry picked from commit d4a2d91b46b2cf758b249a4545e34287e90da23b)

Diff:
---
 gcc/config/i386/i386.cc| 62 --
 gcc/testsuite/gcc.target/i386/pr113122-1.c | 10 +
 gcc/testsuite/gcc.target/i386/pr113122-2.c | 11 ++
 gcc/testsuite/gcc.target/i386/pr113122-3.c |  9 +
 gcc/testsuite/gcc.target/i386/pr113122-4.c | 10 +
 5 files changed, 90 insertions(+), 12 deletions(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 6b6142f4aa0..af42e4b9739 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -21532,7 +21532,10 @@ x86_function_profiler (FILE *file, int labelno ATTRIBUTE_UNUSED)
   if (TARGET_64BIT)
 {
 #ifndef NO_PROFILE_COUNTERS
-  fprintf (file, "\tleaq\t%sP%d(%%rip),%%r11\n", LPREFIX, labelno);
+  if (ASSEMBLER_DIALECT == ASM_INTEL)
+   fprintf (file, "\tlea\tr11, %sP%d[rip]\n", LPREFIX, labelno);
+  else
+   fprintf (file, "\tleaq\t%sP%d(%%rip), %%r11\n", LPREFIX, labelno);
 #endif
 
   if (!TARGET_PECOFF)
@@ -21543,12 +21546,29 @@ x86_function_profiler (FILE *file, int labelno ATTRIBUTE_UNUSED)
  /* NB: R10 is caller-saved.  Although it can be used as a
 static chain register, it is preserved when calling
 mcount for nested functions.  */
- fprintf (file, "1:\tmovabsq\t$%s, %%r10\n\tcall\t*%%r10\n",
-  mcount_name);
+ if (ASSEMBLER_DIALECT == ASM_INTEL)
+   fprintf (file, "1:\tmovabs\tr10, OFFSET FLAT:%s\n"
+  "\tcall\tr10\n", mcount_name);
+ else
+   fprintf (file, "1:\tmovabsq\t$%s, %%r10\n\tcall\t*%%r10\n",
+mcount_name);
  break;
case CM_LARGE_PIC:
 #ifdef NO_PROFILE_COUNTERS
- fprintf (file, "1:\tmovabsq\t$_GLOBAL_OFFSET_TABLE_-1b, %%r11\n");
+ if (ASSEMBLER_DIALECT == ASM_INTEL)
+   {
+ fprintf (file, "1:movabs\tr11, "
+"OFFSET FLAT:_GLOBAL_OFFSET_TABLE_-1b\n");
+ fprintf (file, "\tlea\tr10, 1b[rip]\n");
+ fprintf (file, "\tadd\tr10, r11\n");
+ fprintf (file, "\tmovabs\tr11, OFFSET FLAT:%s@PLTOFF\n",
+  mcount_name);
+ fprintf (file, "\tadd\tr10, r11\n");
+ fprintf (file, "\tcall\tr10\n");
+ break;
+   }
+ fprintf (file,
+  "1:\tmovabsq\t$_GLOBAL_OFFSET_TABLE_-1b, %%r11\n");
  fprintf (file, "\tleaq\t1b(%%rip), %%r10\n");
  fprintf (file, "\taddq\t%%r11, %%r10\n");
  fprintf (file, "\tmovabsq\t$%s@PLTOFF, %%r11\n", mcount_name);
@@ -21560,7 +21580,11 @@ x86_function_profiler (FILE *file, int labelno ATTRIBUTE_UNUSED)
  break;
case CM_SMALL_PIC:
case CM_MEDIUM_PIC:
- fprintf (file, "1:\tcall\t*%s@GOTPCREL(%%rip)\n", mcount_name);
+ if (ASSEMBLER_DIALECT == ASM_INTEL)
+   fprintf (file, "1:\tcall\t[QWORD PTR %s@GOTPCREL[rip]]\n",
+mcount_name);
+ else
+   fprintf (file, "1:\tcall\t*%s@GOTPCREL(%%rip)\n", mcount_name);
  break

[gcc r12-10506] cfgexpand: Workaround CSE of ADDR_EXPRs in VAR_DECL partitioning [PR113372]

2024-06-11 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:170c2bba7cb85b3ac9380a7d5a1c6d82b3c6aa63

commit r12-10506-g170c2bba7cb85b3ac9380a7d5a1c6d82b3c6aa63
Author: Jakub Jelinek 
Date:   Tue Jan 16 11:49:34 2024 +0100

cfgexpand: Workaround CSE of ADDR_EXPRs in VAR_DECL partitioning [PR113372]

The following patch adds a quick workaround for bugs in VAR_DECL
partitioning.
The problem is that there is no dependency between ADDR_EXPRs of local
decls and CLOBBERs of those vars, so VN can CSE uses of ADDR_EXPRs
(including ivopts integral variants thereof), which can break
add_scope_conflicts discovery of which variables are actually used
in a certain region.
E.g. we can have
  ivtmp.40_3 = (unsigned long)   [(void *)&bitint.6 + 8B];
...
  uses of ivtmp.40_3
...
  bitint.6 ={v} {CLOBBER(eos)};
...
  ivtmp.28_43 = (unsigned long)   [(void *)&bitint.6 + 8B];
...
  uses of ivtmp.28_43
before VN (such as dom3), which the add_scope_conflicts code identifies as 2
independent uses of the bitint.6 variable (which is correct), but then VN
determines ivtmp.28_43 is the same as ivtmp.40_3 and just uses ivtmp.40_3
even in the second region; at that point add_scope_conflicts thinks the
bitint.6 variable is not used in that region anymore.

The following patch does a simple single def-stmt check for such ADDR_EXPRs
(rather than say trying to do a full propagation of what SSA_NAMEs can
contain ADDR_EXPRs of local variables), which seems to work around all 4 PRs.
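
As a rough illustration of what variable partitioning relies on (this is not
one of the PR testcases; fill, sum and two_scopes are made-up names), the
sketch below has two locals in sibling scopes.  Their lifetimes do not overlap,
so the compiler is free to put them into one stack partition, and that is only
safe as long as every region in which a variable is still reachable through an
earlier computed address is counted as a use of it.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: writes through a pointer, so callers have to take
   the address of their local buffer.  */
static void
fill (unsigned char *buf, size_t len, unsigned char value)
{
  memset (buf, value, len);
}

/* Hypothetical helper: adds up LEN bytes starting at BUF.  */
static unsigned
sum (const unsigned char *buf, size_t len)
{
  unsigned total = 0;
  assert (buf != NULL);
  for (size_t i = 0; i < len; i++)
    total += buf[i];
  return total;
}

/* A and B live in sibling scopes, so their lifetimes are disjoint and
   they are candidates for sharing one stack slot.  */
static unsigned
two_scopes (void)
{
  unsigned total = 0;
  {
    unsigned char a[8];
    fill (a, sizeof a, 1);
    total += sum (a, sizeof a);	/* 8 * 1 */
  }
  {
    unsigned char b[8];
    fill (b, sizeof b, 2);
    total += sum (b, sizeof b);	/* 8 * 2 */
  }
  return total;
}
```

Whatever layout the compiler picks, two_scopes has to return 24; the PRs above
are cases where an incorrectly shared slot changed such a result.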

In addition to this patch I've used the attached one to gather statistics
on the total size of all variable partitions in a function, and it seems
that besides the new testcases nothing is really affected compared to no
patch (for the unpatched build I've actually just modified the patch to use
== OMP_SCAN instead of == ADDR_EXPR, so it looks the same except that it
never triggers).
The comparison wasn't perfect: I've only gathered BITS_PER_WORD,
main_input_filename (with some replacement of build directories and
/tmp/ccXX names of LTO to make it more similar between the two
bootstraps/regtests), current_function_name and the total size of all
variable partitions if any, but didn't record e.g. the optimization
options, so e.g. torture tests which iterate over options could have
different partition sizes even in one compiler when BITS_PER_WORD,
main_input_filename and current_function_name are all equal.
So I had to write an awk script to check whether the triple from each
entry of the second build appeared in the first one and whether the whole
quadruple appeared in the first one too, printing the entry otherwise;
that only triggered in the new tests.
Also, the cc1plus binary according to objdump -dr is identical between the
two builds except for the ADDR_EXPR vs. OMP_SCAN constant in the two spots.

2024-01-16  Jakub Jelinek  

PR tree-optimization/113372
PR middle-end/90348
PR middle-end/110115
PR middle-end/111422
* cfgexpand.cc (add_scope_conflicts_2): New function.
(add_scope_conflicts_1): Use it.

* gcc.c-torture/execute/pr90348.c: New test.
* gcc.c-torture/execute/pr110115.c: New test.
* gcc.c-torture/execute/pr111422.c: New test.

(cherry picked from commit 1251d3957de04dc9b023a23c09400217e13deadb)

Diff:
---
 gcc/cfgexpand.cc   | 30 +++--
 gcc/testsuite/gcc.c-torture/execute/pr110115.c | 45 ++
 gcc/testsuite/gcc.c-torture/execute/pr111422.c | 39 ++
 gcc/testsuite/gcc.c-torture/execute/pr90348.c  | 38 ++
 4 files changed, 150 insertions(+), 2 deletions(-)

diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
index 9c4d67ba7b6..eadec9a2bfd 100644
--- a/gcc/cfgexpand.cc
+++ b/gcc/cfgexpand.cc
@@ -571,6 +571,26 @@ visit_conflict (gimple *, tree op, tree, void *data)
   return false;
 }
 
+/* Helper function for add_scope_conflicts_1.  For USE on
+   a stmt, if it is a SSA_NAME and in its SSA_NAME_DEF_STMT is known to be
+   based on some ADDR_EXPR, invoke VISIT on that ADDR_EXPR.  */
+
+static inline void
+add_scope_conflicts_2 (tree use, bitmap work,
+  walk_stmt_load_store_addr_fn visit)
+{
+  if (TREE_CODE (use) == SSA_NAME
+  && (POINTER_TYPE_P (TREE_TYPE (use))
+ || INTEGRAL_TYPE_P (TREE_TYPE (use))))
+{
+  gimple *g = SSA_NAME_DEF_STMT (use);
+  if (is_gimple_assign (g))
+   if (tree op = gimple_assign_rhs1 (g))
+ if (TREE_CODE (op) == ADDR_EXPR)
+   visit (g, TREE_OPERAND (op, 0), op, work);
+}
+}
+
 /* Helper routine for add_scope_conflicts, calculating the active partitions
at the end of BB, leaving the result in WORK.  We're called to generate
conflicts when FOR_CONFLICT is true, otherwise we're just tracking
@@ -583,6 +603,8 @@ add_scope_con

[gcc r12-10505] libgomp: Fix up FLOCK fallback handling [PR113192]

2024-06-11 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:3f0d1e53892348d4df79d822a9910583378674d7

commit r12-10505-g3f0d1e53892348d4df79d822a9910583378674d7
Author: Jakub Jelinek 
Date:   Wed Jan 10 13:29:47 2024 +0100

libgomp: Fix up FLOCK fallback handling [PR113192]

My earlier change broke Solaris testing, because @FLOCK@ isn't substituted
just into libgomp/Makefile, where it worked, but also into the
testsuite/libgomp-site-extra.exp file, where Make variables aren't present
and can't be substituted.

The following patch instead computes the absolute srcdir path and uses it
for FLOCK.

2024-01-10  Jakub Jelinek  

PR libgomp/113192
* configure.ac (FLOCK): Use $libgomp_abs_srcdir/testsuite/flock
instead of \$(abs_top_srcdir)/testsuite/flock.
* configure: Regenerated.

(cherry picked from commit 2fb3ee3ee82874e160309344bc3e52afeed8f26a)

Diff:
---
 libgomp/configure|  9 -
 libgomp/configure.ac | 11 ++-
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/libgomp/configure b/libgomp/configure
index be2c5a63d69..67f6b1435a5 100755
--- a/libgomp/configure
+++ b/libgomp/configure
@@ -16710,6 +16710,13 @@ done
 
 # Fallback if 'perl' is available.
 if test -z "$FLOCK"; then
+  # These need to be absolute paths, yet at the same time need to
+  # canonicalize only relative paths, because then amd will not unmount
+  # drives. Thus the use of PWDCMD: set it to 'pawd' or 'amq -w' if using amd.
+  case $srcdir in
+[\\/$]* | ?:[\\/]*) libgomp_abs_srcdir=${srcdir} ;;
+*) libgomp_abs_srcdir=`cd "$srcdir" && ${PWDCMD-pwd} || echo "$srcdir"` ;;
+  esac
   # Extract the first word of "perl", so it can be a program name with args.
 set dummy perl; ac_word=$2
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5
@@ -16727,7 +16734,7 @@ do
   test -z "$as_dir" && as_dir=.
 for ac_exec_ext in '' $ac_executable_extensions; do
   if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then
-ac_cv_prog_FLOCK="$srcdir/testsuite/flock"
+ac_cv_prog_FLOCK="$libgomp_abs_srcdir/testsuite/flock"
 $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5
 break 2
   fi
diff --git a/libgomp/configure.ac b/libgomp/configure.ac
index cc96e5b753b..dd88f20103a 100644
--- a/libgomp/configure.ac
+++ b/libgomp/configure.ac
@@ -343,7 +343,16 @@ AC_MSG_NOTICE([checking for flock implementation])
 AC_CHECK_PROGS(FLOCK, flock)
 # Fallback if 'perl' is available.
 if test -z "$FLOCK"; then
-  AC_CHECK_PROG(FLOCK, perl, $srcdir/testsuite/flock)
+  # These need to be absolute paths, yet at the same time need to
+  # canonicalize only relative paths, because then amd will not unmount
+  # drives. Thus the use of PWDCMD: set it to 'pawd' or 'amq -w' if using amd.
+  case $srcdir in
+changequote(,)dnl
+[\\/$]* | ?:[\\/]*) libgomp_abs_srcdir=${srcdir} ;;
+changequote([,])dnl
+*) libgomp_abs_srcdir=`cd "$srcdir" && ${PWDCMD-pwd} || echo "$srcdir"` ;;
+  esac
+  AC_CHECK_PROG(FLOCK, perl, $libgomp_abs_srcdir/testsuite/flock)
 fi
 
 # Get target configury.


[gcc r12-10508] docs: Fix 2 typos

2024-06-11 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:ba385435a9c6f2ae211c2595ffb96ee176aec12c

commit r12-10508-gba385435a9c6f2ae211c2595ffb96ee176aec12c
Author: Jakub Jelinek 
Date:   Thu Jan 25 09:10:08 2024 +0100

docs: Fix 2 typos

When looking into PR113572, I've noticed a typo in VECTOR_CST documentation
and grep found a copy-pasted instance of it elsewhere.

2024-01-25  Jakub Jelinek  

* doc/generic.texi (VECTOR_CST): Fix typo - petterns -> patterns.
* doc/rtl.texi (CONST_VECTOR): Likewise.

(cherry picked from commit 36c1384038f3b9f01124f0fc38bb3c930b1cbe8a)

Diff:
---
 gcc/doc/generic.texi | 2 +-
 gcc/doc/rtl.texi | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/generic.texi b/gcc/doc/generic.texi
index e5f9d1be8ea..1f7b00a2403 100644
--- a/gcc/doc/generic.texi
+++ b/gcc/doc/generic.texi
@@ -1144,7 +1144,7 @@ vector.  For example @{ 0, 1 @} could be seen as two patterns with
 one element each or one pattern with two elements (@var{base0} and
 @var{base1}).  The canonical encoding is always the one with the
 fewest patterns or (if both encodings have the same number of
-petterns) the one with the fewest encoded elements.
+patterns) the one with the fewest encoded elements.
 
 @samp{vector_cst_encoding_nelts (@var{v})} gives the total number of
 encoded elements in @var{v}, which is 6 in the example above.
diff --git a/gcc/doc/rtl.texi b/gcc/doc/rtl.texi
index 43c9ee8bffe..2aed9a0454e 100644
--- a/gcc/doc/rtl.texi
+++ b/gcc/doc/rtl.texi
@@ -1843,7 +1843,7 @@ vector.  For example @{ 0, 1 @} could be seen as two patterns with
 one element each or one pattern with two elements (@var{base0} and
 @var{base1}).  The canonical encoding is always the one with the
 fewest patterns or (if both encodings have the same number of
-petterns) the one with the fewest encoded elements.
+patterns) the one with the fewest encoded elements.
 
 @samp{const_vector_encoding_nelts (@var{v})} gives the total number of
 encoded elements in @var{v}, which is 6 in the example above.


[gcc r12-10504] c-family: copy attribute diagnostic fixes [PR113262]

2024-06-11 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:ca8ad807cf33ca9d74a2aecdd78b59af9834b882

commit r12-10504-gca8ad807cf33ca9d74a2aecdd78b59af9834b882
Author: Jakub Jelinek 
Date:   Tue Jan 9 15:37:04 2024 +0100

c-family: copy attribute diagnostic fixes [PR113262]

The copy attribute is allowed on decls as well as types and even has
checks whether decl (set to *node) is DECL_P or TYPE_P, but for diagnostics
it unconditionally uses DECL_SOURCE_LOCATION (decl), which obviously only
works if it applies to a decl.

2024-01-09  Jakub Jelinek  

PR c/113262
* c-attribs.cc (handle_copy_attribute): Don't use
DECL_SOURCE_LOCATION (decl) if decl is not DECL_P, use input_location
instead.  Formatting fixes.

* gcc.dg/pr113262.c: New test.

(cherry picked from commit c9fc7f398e8b330ff12ec8a29bfa058b6daf6624)

Diff:
---
 gcc/c-family/c-attribs.cc   | 32 ++--
 gcc/testsuite/gcc.dg/pr113262.c |  6 ++
 2 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index 8221733613e..88f026336c9 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -2820,13 +2820,14 @@ handle_copy_attribute (tree *node, tree name, tree args,
   if (ref == error_mark_node)
 return NULL_TREE;
 
+  location_t loc = input_location;
+  if (DECL_P (decl))
+loc = DECL_SOURCE_LOCATION (decl);
   if (TREE_CODE (ref) == STRING_CST)
 {
   /* Explicitly handle this case since using a string literal
 as an argument is a likely mistake.  */
-  error_at (DECL_SOURCE_LOCATION (decl),
-   "%qE attribute argument cannot be a string",
-   name);
+  error_at (loc, "%qE attribute argument cannot be a string", name);
   return NULL_TREE;
 }
 
@@ -2837,10 +2838,8 @@ handle_copy_attribute (tree *node, tree name, tree args,
   /* Similar to the string case, since some function attributes
 accept literal numbers as arguments (e.g., alloc_size or
 nonnull) using one here is a likely mistake.  */
-  error_at (DECL_SOURCE_LOCATION (decl),
-   "%qE attribute argument cannot be a constant arithmetic "
-   "expression",
-   name);
+  error_at (loc, "%qE attribute argument cannot be a constant arithmetic "
+   "expression", name);
   return NULL_TREE;
 }
 
@@ -2848,12 +2847,11 @@ handle_copy_attribute (tree *node, tree name, tree args,
 {
   /* Another possible mistake (but indirect self-references aren't
 and diagnosed and shouldn't be).  */
-  if (warning_at (DECL_SOURCE_LOCATION (decl), OPT_Wattributes,
+  if (warning_at (loc, OPT_Wattributes,
  "%qE attribute ignored on a redeclaration "
- "of the referenced symbol",
- name))
-   inform (DECL_SOURCE_LOCATION (node[1]),
-   "previous declaration here");
+ "of the referenced symbol", name)
+ && DECL_P (node[1]))
+   inform (DECL_SOURCE_LOCATION (node[1]), "previous declaration here");
   return NULL_TREE;
 }
 
@@ -2873,7 +2871,8 @@ handle_copy_attribute (tree *node, tree name, tree args,
ref = TREE_OPERAND (ref, 1);
   else
break;
-} while (!DECL_P (ref));
+}
+  while (!DECL_P (ref));
 
   /* For object pointer expressions, consider those to be requests
  to copy from their type, such as in:
@@ -2905,8 +2904,7 @@ handle_copy_attribute (tree *node, tree name, tree args,
 to a variable, or variable attributes to a function.  */
  if (warning (OPT_Wattributes,
   "%qE attribute ignored on a declaration of "
-  "a different kind than referenced symbol",
-  name)
+  "a different kind than referenced symbol", name)
  && DECL_P (ref))
inform (DECL_SOURCE_LOCATION (ref),
"symbol %qD referenced by %qD declared here", ref, decl);
@@ -2956,9 +2954,7 @@ handle_copy_attribute (tree *node, tree name, tree args,
 }
   else if (!TYPE_P (decl))
 {
-  error_at (DECL_SOURCE_LOCATION (decl),
-   "%qE attribute must apply to a declaration",
-   name);
+  error_at (loc, "%qE attribute must apply to a declaration", name);
   return NULL_TREE;
 }
 
diff --git a/gcc/testsuite/gcc.dg/pr113262.c b/gcc/testsuite/gcc.dg/pr113262.c
new file mode 100644
index 000..ee55183b587
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr113262.c
@@ -0,0 +1,6 @@
+/* PR c/113262 */
+/* { dg-do compile } */
+/* { dg-options "" } */
+
+int [[gnu::copy ("")]] a;  /* { dg-error "'copy' attribute argument cannot be a string" } */
+


[gcc/redhat/heads/gcc-13-branch] (47 commits) Merge commit 'r13-8838-g7813d94393f60ac641265cb3fc3a446f9f3

2024-06-11 Thread Jakub Jelinek via Gcc-cvs
The branch 'redhat/heads/gcc-13-branch' was updated to point to:

 03b1a31f980... Merge commit 'r13-8838-g7813d94393f60ac641265cb3fc3a446f9f3

It previously pointed to:

 5632a1dc05e... Merge commit 'r13-8792-g53bc98f5355ada17d1629a2d0e96aebd397

Diff:

Summary of changes (added commits):
---

  03b1a31... Merge commit 'r13-8838-g7813d94393f60ac641265cb3fc3a446f9f3
  7813d94... c: Fix up pointer types to may_alias structures [PR114493] (*)
  865d60a... fold-const: Fix up CLZ handling in tree_call_nonnegative_wa (*)
  f9db8b0... builtins: Force SAVE_EXPR for __builtin_{add,sub,mul}_overf (*)
  308ca60... invoke.texi: Clarify -march=lujiazui (*)
  50b5019... rs6000: Fix up PCH in --enable-host-pie builds [PR115324] (*)
  8deaab6... combine: Fix up simplify_compare_const [PR115092] (*)
  f2ef3ac... Daily bump. (*)
  ef494b1... Fix crash on access-to-incomplete type (*)
  02025fb... Add testcase for PR ada/114398 (*)
  e54d909... ada: Storage_Error in indirect call to function returning l (*)
  7067b7e... Daily bump. (*)
  cd8dc16... Daily bump. (*)
  e11fb72... Daily bump. (*)
  e4f85ea... Disable FMADD in chains for Zen4 and generic (*)
  3cf6c1f... Daily bump. (*)
  c0f2293... Daily bump. (*)
  16fe81c... Daily bump. (*)
  38360ba... Daily bump. (*)
  ed06ca8... alpha: Fix invalid RTX in divmodsi insn patterns [PR115297] (*)
  218246b... Daily bump. (*)
  6634ff0... Daily bump. (*)
  c57d73f... AVR: tree-optimization/115307 - Work around isinf bloat fro (*)
  b45d728... AVR: target/115317 - Make isinf(-Inf) return -1. (*)
  3687dcf... libstdc++: Replace link to gcc-4.3.2 docs in manual [PR1152 (*)
  acdf0f7... Daily bump. (*)
  2602b71... vect: Tighten vect_determine_precisions_from_range [PR11328 (*)
  0836216... vect: Fix access size alignment assumption [PR115192] (*)
  173f876... i386: Fix ix86_option override after change [PR 113719] (*)
  d0fb9d2... Daily bump. (*)
  3be8fa7... MIPS16: Mark $2/$3 as clobbered if GP is used (*)
  2618cda... Daily bump. (*)
  ebca600... Daily bump. (*)
  fd91953... libstdc++: Fix up 19_diagnostics/stacktrace/hash.cc on 13 b (*)
  3185cfe... Fortran: Fix SHAPE for zero-size arrays (*)
  67434fe... libstdc++: Guard use of sized deallocation [PR114940] (*)
  d7f9f23... Daily bump. (*)
  b954f15... Daily bump. (*)
  513d050... Daily bump. (*)
  91c7ec5... Daily bump. (*)
  53cdaa7... c++: unroll pragma in templates [PR111529] (*)
  5f14578... c++: array of PMF [PR113598] (*)
  cf76815... Daily bump. (*)
  6f8933c... Daily bump. (*)
  75d394c... testsuite: Verify r0-r3 are extended with CMSE (*)
  f0b88ec... Fortran: fix issues with class(*) assignment [PR114827] (*)
  2ebf3af... Fortran: fix reallocation on assignment of polymorphic vari (*)

(*) This commit already exists in another branch.
Because the reference `refs/vendors/redhat/heads/gcc-13-branch' matches
your hooks.email-new-commits-only configuration,
no separate email is sent for this commit.


[gcc(refs/vendors/redhat/heads/gcc-13-branch)] Merge commit 'r13-8838-g7813d94393f60ac641265cb3fc3a446f9f3aea7e' into redhat/gcc-13-branch

2024-06-11 Thread Jakub Jelinek via Libstdc++-cvs
https://gcc.gnu.org/g:03b1a31f9807251f378fcecb29c4669eed357eb2

commit 03b1a31f9807251f378fcecb29c4669eed357eb2
Merge: 5632a1dc05e 7813d94393f
Author: Jakub Jelinek 
Date:   Tue Jun 11 11:10:28 2024 +0200

Merge commit 'r13-8838-g7813d94393f60ac641265cb3fc3a446f9f3aea7e' into redhat/gcc-13-branch

Diff:

 gcc/ChangeLog  |  95 +++
 gcc/DATESTAMP  |   2 +-
 gcc/ada/ChangeLog  |  13 ++
 gcc/ada/exp_ch6.adb|  11 +-
 gcc/ada/exp_util.adb   |   6 +
 gcc/ada/sem_ch6.adb|  12 +-
 gcc/builtins.cc|  16 ++-
 gcc/c/c-decl.cc|  15 +++
 gcc/combine.cc |   6 +-
 gcc/config/alpha/alpha.md  |  21 ++--
 gcc/config/alpha/constraints.md|   2 +-
 gcc/config/avr/avr.md  |  16 +++
 gcc/config/i386/i386-options.cc|  10 +-
 gcc/config/i386/x86-tune.def   |   2 +-
 gcc/config/mips/mips.cc|  11 +-
 gcc/config/rs6000/rs6000-builtin.cc|   2 +-
 gcc/config/rs6000/rs6000-c.cc  |  62 +-
 gcc/config/rs6000/rs6000-gen-builtins.cc   |  72 +--
 gcc/cp/ChangeLog   |  19 +++
 gcc/cp/init.cc |   4 +-
 gcc/cp/parser.cc   |   7 +-
 gcc/cp/pt.cc   |  14 +--
 gcc/doc/invoke.texi|   6 +-
 gcc/fold-const.cc  |  18 ++-
 gcc/fortran/ChangeLog  |  38 ++
 gcc/fortran/trans-array.cc |  16 +++
 gcc/fortran/trans-expr.cc  |  52 +---
 gcc/fortran/trans-intrinsic.cc |   4 +-
 gcc/testsuite/ChangeLog| 121 ++
 gcc/testsuite/g++.dg/cpp0x/initlist-pmf2.C |  12 ++
 gcc/testsuite/g++.dg/ext/unroll-4.C|  16 +++
 gcc/testsuite/gcc.c-torture/execute/pr108789.c |  39 ++
 gcc/testsuite/gcc.dg/pr114493-1.c  |  19 +++
 gcc/testsuite/gcc.dg/pr114493-2.c  |  26 
 gcc/testsuite/gcc.dg/pr114902.c|  23 
 gcc/testsuite/gcc.dg/pr115092.c|  16 +++
 gcc/testsuite/gcc.dg/vect/pr113281-1.c |  17 +++
 gcc/testsuite/gcc.dg/vect/pr113281-2.c |  50 
 gcc/testsuite/gcc.dg/vect/pr113281-3.c |  39 ++
 gcc/testsuite/gcc.dg/vect/pr113281-4.c |  55 +
 gcc/testsuite/gcc.dg/vect/pr113281-5.c |  66 ++
 gcc/testsuite/gcc.dg/vect/pr115192.c   |  28 +
 gcc/testsuite/gcc.target/alpha/pr115297.c  |  13 ++
 gcc/testsuite/gcc.target/arm/cmse/extend-param.c   |  21 +++-
 gcc/testsuite/gcc.target/arm/cmse/extend-return.c  |   4 +-
 .../gcc.target/avr/torture/pr115307-isinf.c|  21 
 .../gcc.target/avr/torture/pr115317-isinf.c|  55 +
 gcc/testsuite/gfortran.dg/asan/pr110415-2.f90  |  45 +++
 gcc/testsuite/gfortran.dg/asan/pr110415-3.f90  |  49 
 .../gfortran.dg/asan/unlimited_polymorphic_34.f90  | 135 +
 gcc/testsuite/gfortran.dg/pr110415.f90 |  20 +++
 gcc/testsuite/gfortran.dg/shape_12.f90 |  51 
 gcc/testsuite/gnat.dg/access11.adb |  80 
 gcc/testsuite/gnat.dg/incomplete8.adb  |  22 
 gcc/tree-data-ref.cc   |   5 +-
 gcc/tree-vect-patterns.cc  | 107 +++-
 libgcc/config/avr/libf7/ChangeLog  |   8 ++
 libgcc/config/avr/libf7/libf7-asm.sx   |  19 +--
 libstdc++-v3/ChangeLog |  24 
 libstdc++-v3/doc/html/manual/using.html|  10 +-
 libstdc++-v3/doc/xml/manual/using.xml  |  33 +
 libstdc++-v3/include/std/stacktrace|  13 +-
 .../testsuite/19_diagnostics/stacktrace/hash.cc|   2 +-
 63 files changed, 1614 insertions(+), 202 deletions(-)


Re: [PATCH] jit: Ensure ssize_t is defined.

2024-06-11 Thread Jakub Jelinek
On Tue, Jun 11, 2024 at 10:06:49AM +0200, Richard Biener wrote:
> > appropriate #define _POSIX_C_SOURCE or #define _XOPEN_SOURCE before the
> > include in case somebody builds with -std=c99?
> 
> Oh, and the manpage says that <stdio.h> also defines ssize_t which
> is a bit odd since we already include that ...

https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/stdio.h.html
shows that indeed POSIX 2018 stdio.h should provide ssize_t, but
e.g. POSIX 2004 stdio.h doesn't have to:
https://pubs.opengroup.org/onlinepubs/007904875/basedefs/stdio.h.html

Jakub



Re: [PATCH] jit: Ensure ssize_t is defined.

2024-06-11 Thread Jakub Jelinek
On Tue, Jun 11, 2024 at 09:27:37AM +0200, Richard Biener wrote:
> On Tue, 11 Jun 2024, FX Coudert wrote:
> 
> > Hi
> > 
> > I can’t seem to get a review of this one-line patch. Could a global reviewer help?
> 
> While stdio.h can be relied on to exist I do not think you can assume
> the same for sys/types.h without "configury", but libgccjit.h is an
> installed API.  I would assume including stdlib.h gets you ssize_t as well?

If stdlib.h includes sys/types.h like often on Linux, yes, but not
necessarily.  ssize_t is a POSIX type and it might be solely in sys/types.h.

Perhaps libgccjit.h could use
#ifdef __has_include
#if __has_include (<sys/types.h>)
#include <sys/types.h>
#endif
#endif
instead of just #include <sys/types.h>.
When compiled by gcc, one can use hacks like
#define unsigned signed
typedef __SIZE_TYPE__ gcc_jit_ssize_t;
#undef unsigned
but that might not work with other compilers and is perhaps
just too ugly.
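
A slightly fuller sketch of the guarded include, under stated assumptions:
demo_ssize_t, DEMO_HAVE_SYS_TYPES_H and demo_get_size are made-up names for
illustration and not part of libgccjit, and the ptrdiff_t fallback is only one
possible choice for hosts without the header.

```c
#include <assert.h>
#include <stddef.h>

/* Pull in <sys/types.h> only when the preprocessor can confirm that
   the header exists.  */
#ifdef __has_include
# if __has_include (<sys/types.h>)
#  include <sys/types.h>
#  define DEMO_HAVE_SYS_TYPES_H 1
# endif
#endif

/* Hypothetical typedef, used here so the rest compiles either way.  */
#ifdef DEMO_HAVE_SYS_TYPES_H
typedef ssize_t demo_ssize_t;
#else
typedef ptrdiff_t demo_ssize_t;	/* Assumed fallback: another signed type.  */
#endif

/* Hypothetical size query: returns a size, or -1 to signal an error.  */
static demo_ssize_t
demo_get_size (int known)
{
  return known ? (demo_ssize_t) sizeof (int) : (demo_ssize_t) -1;
}

/* Small self-check showing both outcomes.  */
static void
demo_check (void)
{
  assert (demo_get_size (1) == (demo_ssize_t) sizeof (int));
  assert (demo_get_size (0) < 0);
}
```

On compilers without __has_include the typedef silently falls back, which is
the part an installed header would have to decide on deliberately.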

>  In fact the C11 standard doesn't even mention ssize_t so the
> API should probably avoid using it and instead use size_t for
> 
> /* Given type "T", get its size.
>This API entrypoint was added in LIBGCCJIT_ABI_20; you can test for its
>presence using
>  #ifdef LIBGCCJIT_HAVE_SIZED_INTEGERS  */
> extern ssize_t
> gcc_jit_type_get_size (gcc_jit_type *type);

Jakub



[gcc r13-8837] fold-const: Fix up CLZ handling in tree_call_nonnegative_warnv_p [PR115337]

2024-06-11 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:865d60ab4edbdb10d13000af81f9168fd3816a86

commit r13-8837-g865d60ab4edbdb10d13000af81f9168fd3816a86
Author: Jakub Jelinek 
Date:   Tue Jun 4 15:49:41 2024 +0200

fold-const: Fix up CLZ handling in tree_call_nonnegative_warnv_p [PR115337]

The function currently incorrectly assumes all the __builtin_clz* and .CLZ
calls have a non-negative result.  That holds for the former, which are UB
on zero and return a value in [0, prec-1] otherwise, and for the
single-argument .CLZ as well (again, UB on zero), but for two-argument
.CLZ it holds only if the second argument is also nonnegative (or if we
know the argument can't be zero, but let's do that just in the ranger IMHO).

The following patch does that.
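
The distinction can be modelled in plain C.  This is only a sketch: the
internal function .CLZ cannot be called from user code, so model_clz2 below is
a made-up stand-in whose second parameter plays the role of the value defined
for a zero input.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical model of two-argument .CLZ: VALUE_AT_ZERO is what the
   call yields for X == 0; otherwise it is the usual leading zero count.  */
static int
model_clz2 (unsigned x, int value_at_zero)
{
  return x ? __builtin_clz (x) : value_at_zero;
}

/* The result is known to be nonnegative for every input only if the
   value used for zero is itself nonnegative.  */
static int
model_clz2_nonnegative_p (int value_at_zero)
{
  return value_at_zero >= 0;
}

/* Small self-check of the model.  */
static void
model_clz2_check (void)
{
  int prec = (int) (sizeof (unsigned) * CHAR_BIT);
  assert (model_clz2 (1u, -1) == prec - 1);	/* Largest nonzero-input result.  */
  assert (model_clz2 (UINT_MAX, -1) == 0);	/* Smallest result.  */
  assert (model_clz2 (0u, -1) == -1);		/* Negative only via the default.  */
  assert (!model_clz2_nonnegative_p (-1));
  assert (model_clz2_nonnegative_p (prec));
}
```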

2024-06-04  Jakub Jelinek  

PR tree-optimization/115337
* fold-const.cc (tree_call_nonnegative_warnv_p) <CASE_CFN_CLZ>:
If fn is CFN_CLZ, use CLZ_DEFINED_VALUE_AT.

(cherry picked from commit b82a816000791e7a286c7836b3a473ec0e2a577b)

Diff:
---
 gcc/fold-const.cc | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 25dd7c1094e..ef537e1c620 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -85,6 +85,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "vec-perm-indices.h"
 #include "asan.h"
 #include "gimple-range.h"
+#include "internal-fn.h"
 
 /* Nonzero if we are folding constants inside an initializer or a C++
manifestly-constant-evaluated context; zero otherwise.
@@ -14887,7 +14888,6 @@ tree_call_nonnegative_warnv_p (tree type, combined_fn fn, tree arg0, tree arg1,
 CASE_CFN_FFS:
 CASE_CFN_PARITY:
 CASE_CFN_POPCOUNT:
-CASE_CFN_CLZ:
 CASE_CFN_CLRSB:
 case CFN_BUILT_IN_BSWAP16:
 case CFN_BUILT_IN_BSWAP32:
@@ -14896,6 +14896,22 @@ tree_call_nonnegative_warnv_p (tree type, combined_fn fn, tree arg0, tree arg1,
   /* Always true.  */
   return true;
 
+CASE_CFN_CLZ:
+  if (fn != CFN_CLZ)
+   return true;
+  else if (INTEGRAL_TYPE_P (TREE_TYPE (arg0)))
+   {
+ tree atype = TREE_TYPE (arg0);
+ int val = 0;
+ if (direct_internal_fn_supported_p (IFN_CLZ, atype,
+ OPTIMIZE_FOR_BOTH)
+  && CLZ_DEFINED_VALUE_AT_ZERO (SCALAR_INT_TYPE_MODE (atype),
+val) == 2
+ && val >= 0)
+   return true;
+   }
+  break;
+
 CASE_CFN_SQRT:
 CASE_CFN_SQRT_FN:
   /* sqrt(-0.0) is -0.0.  */


[gcc r13-8838] c: Fix up pointer types to may_alias structures [PR114493]

2024-06-11 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:7813d94393f60ac641265cb3fc3a446f9f3aea7e

commit r13-8838-g7813d94393f60ac641265cb3fc3a446f9f3aea7e
Author: Jakub Jelinek 
Date:   Thu Jun 6 22:12:11 2024 +0200

c: Fix up pointer types to may_alias structures [PR114493]

The following testcase ICEs in ipa-free-lang, because the
fld_incomplete_type_of
  gcc_assert (TYPE_CANONICAL (t2) != t2
  && TYPE_CANONICAL (t2) == TYPE_CANONICAL (TREE_TYPE (t)));
assertion doesn't hold.
This is because t is a struct S * type which was created while struct S
was still incomplete and without the may_alias attribute (and TYPE_CANONICAL
of a pointer type is a type created with can_alias_all = false argument),
while later on, on the struct definition, the may_alias attribute was used.
fld_incomplete_type_of then creates an incomplete distinct copy of the
structure (but with the original attributes), but pointers created for it
are TYPE_REF_CAN_ALIAS_ALL because of the "may_alias" attribute, including
their TYPE_CANONICAL: while that is created with the !can_alias_all
argument, we later set the flag because of the "may_alias" attribute on the
to_type.

This doesn't ICE with C++ since the PR70512 fix because the C++ FE sets
TYPE_REF_CAN_ALIAS_ALL on all pointer types to the class type (and its
variants) when may_alias is added.

The following patch does that in the C FE as well.
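
For readers less familiar with the attribute, here is a minimal sketch of the
source pattern involved (pass_through and read_as_S are made-up names, and this
is not one of the new tests): a pointer to struct S is formed while the type is
still incomplete, and only the later definition carries may_alias.

```c
#include <assert.h>

struct S;				/* Incomplete at first ...  */
struct S *pass_through (struct S *);	/* ... so struct S * already exists here.  */

/* The attribute only shows up on the definition.  */
struct __attribute__((__may_alias__)) S {
  int s;
};

struct S *
pass_through (struct S *p)
{
  return p;
}

/* Reads an int object through a struct S lvalue; may_alias is what
   tells the compiler that such accesses can alias other types.  */
static int
read_as_S (int *p)
{
  assert (p != 0);
  return pass_through ((struct S *) p)->s;
}
```

The pointer type used in the early declaration and the one used after the
definition have to agree on the alias-all property, which is what the fix-up
ensures in the C front end.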

    2024-06-06  Jakub Jelinek  

PR c/114493
* c-decl.cc (c_fixup_may_alias): New function.
(finish_struct): Call it if "may_alias" attribute is
specified.

* gcc.dg/pr114493-1.c: New test.
* gcc.dg/pr114493-2.c: New test.

(cherry picked from commit d5a3c6d43acb8b2211d9fb59d59482d74c010f01)

Diff:
---
 gcc/c/c-decl.cc   | 15 +++
 gcc/testsuite/gcc.dg/pr114493-1.c | 19 +++
 gcc/testsuite/gcc.dg/pr114493-2.c | 26 ++
 3 files changed, 60 insertions(+)

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index 7dcb1141bf7..318e9c5b253 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -9115,6 +9115,17 @@ is_flexible_array_member_p (bool is_last_field,
 }
 
 
+/* TYPE is a struct or union that we're applying may_alias to after the body is
+   parsed.  Fixup any POINTER_TO types.  */
+
+static void
+c_fixup_may_alias (tree type)
+{
+  for (tree t = TYPE_POINTER_TO (type); t; t = TYPE_NEXT_PTR_TO (t))
+for (tree v = TYPE_MAIN_VARIANT (t); v; v = TYPE_NEXT_VARIANT (v))
+  TYPE_REF_CAN_ALIAS_ALL (v) = true;
+}
+
 /* Fill in the fields of a RECORD_TYPE or UNION_TYPE node, T.
LOC is the location of the RECORD_TYPE or UNION_TYPE's definition.
FIELDLIST is a chain of FIELD_DECL nodes for the fields.
@@ -9409,6 +9420,10 @@ finish_struct (location_t loc, tree t, tree fieldlist, tree attributes,
   warning_at (loc, 0, "union cannot be made transparent");
 }
 
+  if (lookup_attribute ("may_alias", TYPE_ATTRIBUTES (t)))
+for (x = TYPE_MAIN_VARIANT (t); x; x = TYPE_NEXT_VARIANT (x))
+  c_fixup_may_alias (x);
+
   tree incomplete_vars = C_TYPE_INCOMPLETE_VARS (TYPE_MAIN_VARIANT (t));
   for (x = TYPE_MAIN_VARIANT (t); x; x = TYPE_NEXT_VARIANT (x))
 {
diff --git a/gcc/testsuite/gcc.dg/pr114493-1.c b/gcc/testsuite/gcc.dg/pr114493-1.c
new file mode 100644
index 000..446f33eac3b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr114493-1.c
@@ -0,0 +1,19 @@
+/* PR c/114493 */
+/* { dg-do compile { target lto } } */
+/* { dg-options "-O2 -flto" } */
+
+void foo (void);
+struct S;
+struct S bar (struct S **);
+struct S qux (const struct S **);
+
+struct __attribute__((__may_alias__)) S {
+  int s;
+};
+
+struct S
+baz (void)
+{
+  foo ();
+  return (struct S) {};
+}
diff --git a/gcc/testsuite/gcc.dg/pr114493-2.c b/gcc/testsuite/gcc.dg/pr114493-2.c
new file mode 100644
index 000..93e3d6e5bc4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr114493-2.c
@@ -0,0 +1,26 @@
+/* PR c/114493 */
+/* { dg-do compile { target lto } } */
+/* { dg-options "-O2 -flto -std=c2x" } */
+
+void foo (void);
+struct S;
+struct S bar (struct S **);
+struct S qux (const struct S **);
+
+void
+corge (void)
+{
+  struct S { int s; } s;
+  s.s = 0;
+}
+
+struct __attribute__((__may_alias__)) S {
+  int s;
+};
+
+struct S
+baz (void)
+{
+  foo ();
+  return (struct S) {};
+}


[gcc r13-8836] builtins: Force SAVE_EXPR for __builtin_{add, sub, mul}_overflow [PR108789]

2024-06-11 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:f9db8b0571348adfcc98204ea7be787058af85cd

commit r13-8836-gf9db8b0571348adfcc98204ea7be787058af85cd
Author: Jakub Jelinek 
Date:   Tue Jun 4 12:28:01 2024 +0200

builtins: Force SAVE_EXPR for __builtin_{add,sub,mul}_overflow [PR108789]

The following testcase is miscompiled, because we use save_expr
on the .{ADD,SUB,MUL}_OVERFLOW call we are creating, but if the first
two operands are not INTEGER_CSTs (in that case we just fold it right away)
but are TREE_READONLY/!TREE_SIDE_EFFECTS, save_expr doesn't actually
create a SAVE_EXPR at all and so we lower it to
*arg2 = REALPART_EXPR (.ADD_OVERFLOW (arg0, arg1)), \
IMAGPART_EXPR (.ADD_OVERFLOW (arg0, arg1))
which evaluates the ifn twice and just hopes it will be CSEd back.
As *arg2 aliases *arg0, that is not the case.
The builtins are really never const/pure as they store into what
the third argument points to, so after handling the INTEGER_CST+INTEGER_CST
case, I think we should just always use SAVE_EXPR.  Just building SAVE_EXPR
by hand and setting TREE_SIDE_EFFECTS on it doesn't work, because
c_fully_fold optimizes it away again, so the following patch marks the
ifn calls as TREE_SIDE_EFFECTS (but doesn't do it for the
__builtin_{add,sub,mul}_overflow_p cases, which were designed for use
especially in constant expressions and don't really evaluate the
realpart side, so we don't really need a SAVE_EXPR in that case).
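
The effect of the double evaluation can be reproduced by hand.  The sketch
below is a model written for illustration, not the compiler's actual lowering;
add_single_eval and add_double_eval_model are made-up names.  The second
function stores the result between two evaluations, which is what goes wrong
once the result pointer aliases the operands.

```c
#include <assert.h>
#include <limits.h>

/* One evaluation: result and overflow flag come from the same call.  */
static int
add_single_eval (unsigned *r, const unsigned *a, const unsigned *b)
{
  unsigned res;
  int ovf = __builtin_add_overflow (*a, *b, &res);
  *r = res;
  return ovf;
}

/* Model of the double evaluation: the store through R happens first,
   then the operands are read again to compute the overflow flag.  */
static int
add_double_eval_model (unsigned *r, const unsigned *a, const unsigned *b)
{
  unsigned tmp;
  (void) __builtin_add_overflow (*a, *b, &tmp);
  *r = tmp;				/* Clobbers *a and *b if they alias *r.  */
  return __builtin_add_overflow (*a, *b, &tmp);
}

/* Self-check with all three pointers aliasing one object.  */
static void
overflow_model_check (void)
{
  unsigned x = UINT_MAX / 4 + 1;	/* x + x does not overflow.  */
  assert (add_single_eval (&x, &x, &x) == 0);
  assert (x == UINT_MAX / 2 + 1);

  x = UINT_MAX / 4 + 1;
  /* The second evaluation sees the already doubled value and reports
     an overflow that never happened.  */
  assert (add_double_eval_model (&x, &x, &x) == 1);
}
```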

2024-06-04  Jakub Jelinek  

PR middle-end/108789
* builtins.cc (fold_builtin_arith_overflow): For ovf_only,
don't call save_expr and don't build REALPART_EXPR, otherwise
set TREE_SIDE_EFFECTS on call before calling save_expr.

* gcc.c-torture/execute/pr108789.c: New test.

(cherry picked from commit b8e28381cb5c0cddfe5201faf799d8b27f5d7d6c)

Diff:
---
 gcc/builtins.cc                                | 16 ++-
 gcc/testsuite/gcc.c-torture/execute/pr108789.c | 39 ++
 2 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 1bfdc598eec..e5210bfde49 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -9539,7 +9539,21 @@ fold_builtin_arith_overflow (location_t loc, enum built_in_function fcode,
   tree ctype = build_complex_type (type);
   tree call = build_call_expr_internal_loc (loc, ifn, ctype, 2,
arg0, arg1);
-  tree tgt = save_expr (call);
+  tree tgt;
+  if (ovf_only)
+   {
+ tgt = call;
+ intres = NULL_TREE;
+   }
+  else
+   {
+ /* Force SAVE_EXPR even for calls which satisfy tree_invariant_p_1,
+as while the call itself is const, the REALPART_EXPR store is
+certainly not.  And in any case, we want just one call,
+not multiple and trying to CSE them later.  */
+ TREE_SIDE_EFFECTS (call) = 1;
+ tgt = save_expr (call);
+   }
   intres = build1_loc (loc, REALPART_EXPR, type, tgt);
   ovfres = build1_loc (loc, IMAGPART_EXPR, type, tgt);
   ovfres = fold_convert_loc (loc, boolean_type_node, ovfres);
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr108789.c b/gcc/testsuite/gcc.c-torture/execute/pr108789.c
new file mode 100644
index 000..32ee19be1c4
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr108789.c
@@ -0,0 +1,39 @@
+/* PR middle-end/108789 */
+
+int
+add (unsigned *r, const unsigned *a, const unsigned *b)
+{
+  return __builtin_add_overflow (*a, *b, r);
+}
+
+int
+mul (unsigned *r, const unsigned *a, const unsigned *b)
+{
+  return __builtin_mul_overflow (*a, *b, r);
+}
+
+int
+main ()
+{
+  unsigned x;
+
+  /* 1073741824U + 1073741824U should not overflow.  */
+  x = (__INT_MAX__ + 1U) / 2;
+  if (add (&x, &x, &x))
+    __builtin_abort ();
+
+  /* 256U * 256U should not overflow */
+  x = 1U << (sizeof (int) * __CHAR_BIT__ / 4);
+  if (mul (&x, &x, &x))
+    __builtin_abort ();
+
+  /* 2147483648U + 2147483648U should overflow */
+  x = __INT_MAX__ + 1U;
+  if (!add (&x, &x, &x))
+    __builtin_abort ();
+
+  /* 65536U * 65536U should overflow */
+  x = 1U << (sizeof (int) * __CHAR_BIT__ / 2);
+  if (!mul (&x, &x, &x))
+    __builtin_abort ();
+}


[gcc r13-8835] invoke.texi: Clarify -march=lujiazui

2024-06-11 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:308ca60bc010f7745a34bdb4527ecced506f72c1

commit r13-8835-g308ca60bc010f7745a34bdb4527ecced506f72c1
Author: Jakub Jelinek 
Date:   Tue Jun 4 12:20:13 2024 +0200

invoke.texi: Clarify -march=lujiazui

I was recently searching which exact CPUs are affected by the PR114576
wrong-code issue and went from the PTA_* bitmasks in GCC, so arrived
at the goldmont, goldmont-plus, tremont and lujiazui CPUs (as -march=
cases which do enable -maes and don't enable -mavx).
But when double-checking that against the invoke.texi documentation,
that was true for the first 3, but lujiazui said it supported AVX.
I was really confused by that, until I found the
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604407.html
explanation.  So, it seems the CPUs do have AVX and F16C but -march=lujiazui
doesn't enable those and even actively attempts to filter those out from
the announced CPUID features, in glibc as well as e.g. in libgcc.

Thus, I think we should document what actually happens, otherwise
users could assume that
gcc -march=lujiazui predefines __AVX__ and __F16C__, which it doesn't.

2024-06-04  Jakub Jelinek  

* doc/invoke.texi (lujiazui): Clarify that while the CPUs do support
AVX and F16C, -march=lujiazui actually doesn't enable those.

(cherry picked from commit 09b4ab53155ea16e1fb12c2afcd9b6fe29a31c74)

Diff:
---
 gcc/doc/invoke.texi | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 792ce283bb9..914c4bc8e6d 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -32743,8 +32743,10 @@ instruction set support.
 
 @item lujiazui
 ZHAOXIN lujiazui CPU with x86-64, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1,
-SSE4.2, AVX, POPCNT, AES, PCLMUL, RDRND, XSAVE, XSAVEOPT, FSGSBASE, CX16,
-ABM, BMI, BMI2, F16C, FXSR, RDSEED instruction set support.
+SSE4.2, POPCNT, AES, PCLMUL, RDRND, XSAVE, XSAVEOPT, FSGSBASE, CX16,
+ABM, BMI, BMI2, FXSR, RDSEED instruction set support.  While the CPUs
+do support AVX and F16C, these aren't enabled by @code{-march=lujiazui}
+for performance reasons.
 
 @item geode
 AMD Geode embedded processor with MMX and 3DNow!@: instruction set support.


[gcc r13-8834] rs6000: Fix up PCH in --enable-host-pie builds [PR115324]

2024-06-11 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:50b5019fde97c20a377e004c9d73df62e4898773

commit r13-8834-g50b5019fde97c20a377e004c9d73df62e4898773
Author: Jakub Jelinek 
Date:   Mon Jun 3 23:11:06 2024 +0200

rs6000: Fix up PCH in --enable-host-pie builds [PR115324]

PCH doesn't work properly in --enable-host-pie configurations on
powerpc*-linux*.
The problem is that the rs6000_builtin_info and rs6000_instance_info
arrays mix pointers to .rodata/.data (bifname and attr_string point
to string literals in .rodata section, and the next member is either NULL
or &rs6000_instance_info[XXX]) and GC member (tree fntype).
Now, for normal GC this works just fine, we emit
  {
    &rs6000_instance_info[0].fntype,
    1 * (RS6000_INST_MAX),
    sizeof (rs6000_instance_info[0]),
    &gt_ggc_mx_tree_node,
    &gt_pch_nx_tree_node
  },
  {
    &rs6000_builtin_info[0].fntype,
    1 * (RS6000_BIF_MAX),
    sizeof (rs6000_builtin_info[0]),
    &gt_ggc_mx_tree_node,
    &gt_pch_nx_tree_node
  },
GC roots which are strided and thus cover only the fntype members of all
the elements of the two arrays.
For PCH though it actually results in saving those huge arrays (one is
130832 bytes, another 81568 bytes) into the .gch files and loading them back
in full.  While the bifname and attr_string and next pointers are marked as
GTY((skip)), they are actually saved to point to the .rodata and .data
sections of the process which writes the PCH, but because cc1/cc1plus etc.
are position independent executables with --enable-host-pie, when the data is
loaded from the PCH file, those pointers can point to completely different
addresses, where nothing is mapped at all or where something unrelated lives.
While gengtype supports the callback option, that one is meant for
relocatable function pointers and doesn't work in the case of GTY arrays
inside of .data section anyway.

So, either we'd need to add some further GTY extensions, or the following
patch instead reworks it such that the fntype members which were the only
reason for PCH in those arrays are moved to separate arrays.
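
The reworked layout can be pictured with a small sketch.  All names here
(rec_before, rec_after, recs, recs_fntype, chain_length) and the -1 end
marker are hypothetical and chosen for illustration; the real generated
structures are bifdata, ovlddata and ovldrecord, and their exact fields
are not reproduced here.

```c
#include <assert.h>

/* Hypothetical stand-in for GCC's tree type.  */
typedef void *tree_t;

/* Before: one record mixes pointers into static storage with a member
   that has to be saved and restored.  Saving the whole array also saves
   the static pointers, which are stale in a process loaded elsewhere.  */
struct rec_before
{
  const char *name;          /* points into .rodata */
  tree_t fntype;             /* the only member that needs saving */
  struct rec_before *next;   /* points into .data */
};

/* After: the record holds no saved member, and the link is an index,
   so it stays valid wherever the executable is loaded.  */
struct rec_after
{
  const char *name;
  int next;                  /* index of the next record, -1 for none */
};

enum { N_RECS = 3 };

static const struct rec_after recs[N_RECS] =
{
  { "first", 2 },
  { "last", -1 },
  { "middle", 1 },
};

/* The saved members live in their own array, indexed like recs.  */
static tree_t recs_fntype[N_RECS];

/* Walk a chain through the index links and count its records.  */
static int
chain_length (int start)
{
  int n = 0;
  for (int i = start; i >= 0; i = recs[i].next)
    n++;
  return n;
}

/* Usage: the chain starting at record 0 visits 0, 2 and 1.  */
static void
check_chain (void)
{
  assert (chain_length (0) == 3);
  assert (chain_length (1) == 1);
  assert (recs_fntype[0] == 0);   /* static storage starts out zeroed */
}
```

With this split, only the small side array would need to be written to
and read back from a saved image, while the main array stays an ordinary
static table.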

Size-wise in .data sections it is (in bytes):

                             vanilla  patched
rs6000_builtin_info           130832   110704
rs6000_instance_info           81568    40784
rs6000_overload_info            7392     7392
rs6000_builtin_info_fntype         0    10064
rs6000_instance_info_fntype        0    20392
sum                           219792   189336

where previously we saved/restored for PCH those 130832+81568 bytes, now we
save/restore just 10064+20392 bytes, so this change is beneficial for the
data section size.

Unfortunately, it grows the size of the rs6000_init_generated_builtins
function, vanilla had 218328 bytes, patched has 228668.

When I applied
 void
 rs6000_init_generated_builtins ()
 {
+  bifdata *rs6000_builtin_info_p;
+  tree *rs6000_builtin_info_fntype_p;
+  ovlddata *rs6000_instance_info_p;
+  tree *rs6000_instance_info_fntype_p;
+  ovldrecord *rs6000_overload_info_p;
+  __asm ("" : "=r" (rs6000_builtin_info_p) : "0" (rs6000_builtin_info));
+  __asm ("" : "=r" (rs6000_builtin_info_fntype_p) : "0" (rs6000_builtin_info_fntype));
+  __asm ("" : "=r" (rs6000_instance_info_p) : "0" (rs6000_instance_info));
+  __asm ("" : "=r" (rs6000_instance_info_fntype_p) : "0" (rs6000_instance_info_fntype));
+  __asm ("" : "=r" (rs6000_overload_info_p) : "0" (rs6000_overload_info));
+  #define rs6000_builtin_info rs6000_builtin_info_p
+  #define rs6000_builtin_info_fntype rs6000_builtin_info_fntype_p
+  #define rs6000_instance_info rs6000_instance_info_p
+  #define rs6000_instance_info_fntype rs6000_instance_info_fntype_p
+  #define rs6000_overload_info rs6000_overload_info_p
+
hack by hand, the size of the function is 209700 though, so if really
wanted, we could add __attribute__((__noipa__)) to the function when
building with recent enough GCC and pass pointers to the first elements
of the 5 arrays to the function as arguments.  If you want such a change,
could that be done incrementally?

2024-06-03  Jakub Jelinek  

PR target/115324
* config/rs6000/rs6000-gen-builtins.cc (write_decls): Remove
GTY markup from struct bifdata and struct ovlddata and remove their
fntype members.  Change next member in struct ovlddata and
first_instance member of struct ovldrecord to have int type rather
than struct ovlddata *.  Remove GTY markup from rs6000_builtin_info
and rs6000_instance_info arrays, declare new
rs6000_builtin_in

[gcc r13-8833] combine: Fix up simplify_compare_const [PR115092]

2024-06-11 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:8deaab6f79768700e1bf05fe6af83b185f678b7f

commit r13-8833-g8deaab6f79768700e1bf05fe6af83b185f678b7f
Author: Jakub Jelinek 
Date:   Wed May 15 18:37:17 2024 +0200

combine: Fix up simplify_compare_const [PR115092]

The following testcases are miscompiled (with tons of GIMPLE
optimization disabled) because combine sees GE comparison of
1-bit sign_extract (i.e. something with [-1, 0] value range)
with (const_int -1) (which is always true) and optimizes it into
NE comparison of 1-bit zero_extract ([0, 1] value range) against
(const_int 0).
The reason is that simplify_compare_const first (correctly)
simplifies the comparison to
GE (ashift:SI something (const_int 31)) (const_int -2147483648)
and then an optimization for when the second operand is power of 2
triggers.  That optimization is fine for power of 2s which aren't
the signed minimum of the mode, or if it is NE, EQ, GEU or LTU
against the signed minimum of the mode, but for GE or LT optimizing
it into NE (or EQ) against const0_rtx is wrong, those cases
are always true or always false (but the function doesn't have
a standardized way to tell callers the comparison is now unconditional).

The following patch just disables the optimization in that case.
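
As a rough illustration of why the unsigned forms of this transformation
are fine while the signed ones are not, consider the following sketch.
It is a model in plain C, not combine's RTL, and the function names are
made up for this example.  It assumes int and unsigned have the same
width with no padding bits.

```c
#include <assert.h>
#include <limits.h>

/* The bit pattern of the signed minimum, as an unsigned value.  */
#define TOP_BIT (UINT_MAX / 2 + 1)

/* GEU against the top-bit constant: true exactly when the top bit is
   set, so rewriting it as a bit test against zero is valid.  */
static int
geu_top_bit (unsigned x)
{
  return x >= TOP_BIT;
}

static int
top_bit_test (unsigned x)
{
  return (x & TOP_BIT) != 0;
}

/* Signed GE against the signed minimum: true for every input.  */
static int
ge_signed_min (int x)
{
  return x >= INT_MIN;
}

/* Usage: the bit test matches the unsigned comparison everywhere, but
   disagrees with the signed comparison for non-negative inputs.  */
static void
check_signed_min_compare (void)
{
  assert (geu_top_bit (0u) == top_bit_test (0u));
  assert (geu_top_bit (UINT_MAX) == top_bit_test (UINT_MAX));
  assert (ge_signed_min (0) == 1);
  assert (top_bit_test (0u) == 0);   /* would wrongly turn "true" into "false" */
}
```

The same reasoning applies to LT against the signed minimum, which is
false for every input and so cannot be expressed as a bit test either.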

2024-05-15  Jakub Jelinek  

PR rtl-optimization/114902
PR rtl-optimization/115092
* combine.cc (simplify_compare_const): Don't optimize
GE op0 SIGNED_MIN or LT op0 SIGNED_MIN into NE op0 const0_rtx or
EQ op0 const0_rtx.

* gcc.dg/pr114902.c: New test.
* gcc.dg/pr115092.c: New test.

(cherry picked from commit 0b93a0ae153ef70a82ff63e67926a01fdab9956b)

Diff:
---
 gcc/combine.cc  |  6 --
 gcc/testsuite/gcc.dg/pr114902.c | 23 +++
 gcc/testsuite/gcc.dg/pr115092.c | 16 
 3 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/gcc/combine.cc b/gcc/combine.cc
index fbc84099f73..39d47f50f47 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -11801,8 +11801,10 @@ simplify_compare_const (enum rtx_code code, machine_mode mode,
  `and'ed with that bit), we can replace this with a comparison
  with zero.  */
   if (const_op
-  && (code == EQ || code == NE || code == GE || code == GEU
- || code == LT || code == LTU)
+  && (code == EQ || code == NE || code == GEU || code == LTU
+ /* This optimization is incorrect for signed >= INT_MIN or
+< INT_MIN, those are always true or always false.  */
+ || ((code == GE || code == LT) && const_op > 0))
   && is_a <scalar_int_mode> (mode, &int_mode)
   && GET_MODE_PRECISION (int_mode) - 1 < HOST_BITS_PER_WIDE_INT
   && pow2p_hwi (const_op & GET_MODE_MASK (int_mode))
diff --git a/gcc/testsuite/gcc.dg/pr114902.c b/gcc/testsuite/gcc.dg/pr114902.c
new file mode 100644
index 000..60684faa25d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr114902.c
@@ -0,0 +1,23 @@
+/* PR rtl-optimization/114902 */
+/* { dg-do run } */
+/* { dg-options "-O1 -fno-tree-fre -fno-tree-forwprop -fno-tree-ccp -fno-tree-dominator-opts" } */
+
+__attribute__((noipa))
+int foo (int x)
+{
+  int a = ~x;
+  int t = a & 1;
+  int e = -t;
+  int b = e >= -1;
+  if (b)
+return 0;
+  __builtin_trap ();
+}
+
+int
+main ()
+{
+  foo (-1);
+  foo (0);
+  foo (1);
+}
diff --git a/gcc/testsuite/gcc.dg/pr115092.c b/gcc/testsuite/gcc.dg/pr115092.c
new file mode 100644
index 000..c9047f4d321
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr115092.c
@@ -0,0 +1,16 @@
+/* PR rtl-optimization/115092 */
+/* { dg-do run } */
+/* { dg-options "-O1 -fgcse -ftree-pre -fno-tree-dominator-opts -fno-tree-fre -fno-guess-branch-probability" } */
+
+int a, b, c = 1, d, e;
+
+int
+main ()
+{
+  int f, g = a;
+  b = -2;
+  f = -(1 >> ((c && b) & ~a));
+  if (f <= b)
+d = g / e;
+  return 0;
+}


Re: [PATCH] Fix ICE in rtl check due to CONST_WIDE_INT in CONST_VECTOR_DUPLICATE_P

2024-06-11 Thread Jakub Jelinek
On Tue, Jun 11, 2024 at 01:36:35PM +0800, liuhongt wrote:
> In theory, const_wide_int can also be handled with an extra check for each
> component of the HOST_WIDE_INT array, and the check is needed for both
> shift and bit_and operands.
> I assume the optimization opportunity is rare, so the patch just adds an
> extra check to make sure GET_MODE_INNER (mode) can fit into a
> HOST_WIDE_INT.

I think if you only handle CONST_INT_P, you should check just for that, and
in both places where you check for CONST_VECTOR_DUPLICATE_P (there is one
spot 2 lines above this).
So add
&& CONST_INT_P (XVECEXP (XEXP (op0, 1), 0, 0))
and
&& CONST_INT_P (XVECEXP (op1, 0, 0))
tests right below those && CONST_VECTOR_DUPLICATE_P (something) tests.
> 
> gcc/ChangeLog:
> 
>   PR target/115384
>   * simplify-rtx.cc (simplify_context::simplify_binary_operation_1):
>   Only do the simplification of (AND (ASHIFTRT A imm) mask)
>   to (LSHIFTRT A imm) when inner mode fits HOST_WIDE_INT.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/i386/pr115384.c: New test.

Jakub



[gcc/redhat/heads/gcc-14-branch] (58 commits) Merge commit 'r14-10288-g0f616e75f32083e1bc6d08f31e3fbc3dea

2024-06-07 Thread Jakub Jelinek via Gcc-cvs
The branch 'redhat/heads/gcc-14-branch' was updated to point to:

 1de1e03e8bd... Merge commit 'r14-10288-g0f616e75f32083e1bc6d08f31e3fbc3dea

It previously pointed to:

 e6b72839728... Merge commit 'r14-10231-gfc9fb69ad624fd4cc89ff31ad0a7b8d884

Diff:

Summary of changes (added commits):
---

  1de1e03... Merge commit 'r14-10288-g0f616e75f32083e1bc6d08f31e3fbc3dea
  0f616e7... bitint: Fix up lower_addsub_overflow [PR115352] (*)
  7d40974... Daily bump. (*)
  56c7372... c: Fix up pointer types to may_alias structures [PR114493] (*)
  35ed54f... aarch64: Add missing ACLE macro for NEON-SVE Bridge (*)
  d576034... Daily bump. (*)
  e11a42b... testsuite: i386: Require ifunc support in gcc.target/i386/a (*)
  7f0f88e... Daily bump. (*)
  c6e6258... libstdc++: Only define std::span::at for C++26 [PR115335] (*)
  a88e13b... fold-const: Fix up CLZ handling in tree_call_nonnegative_wa (*)
  f9af4a0... builtins: Force SAVE_EXPR for __builtin_{add,sub,mul}_overf (*)
  1c1bc25... invoke.texi: Clarify -march=lujiazui (*)
  a7dd44c... rs6000: Fix up PCH in --enable-host-pie builds [PR115324] (*)
  14a7296... combine: Fix up simplify_compare_const [PR115092] (*)
  e805232... testsuite: gm2: Remove timeout overrides [PR114886] (*)
  d92b508... libstdc++: Build libbacktrace and 19_diagnostics/stacktrace (*)
  b2bbf98... Daily bump. (*)
  955202e... libstdc++: Fix -Wstringop-overflow warning coming from std: (*)
  97474ba... Add AVX10.1 target_clones support (*)
  1dbf796... Daily bump. (*)
  a31676a... Daily bump. (*)
  d7f4279... AVR: target/115317 - Make isinf(-Inf) return -1. (*)
  2f097c0... libstdc++: Replace link to gcc-4.3.2 docs in manual [PR1152 (*)
  9d08c55... AVR: tree-optimization/115307 - Work around isinf bloat fro (*)
  5ca4e16... Daily bump. (*)
  ec92744... alpha: Fix invalid RTX in divmodsi insn patterns [PR115297] (*)
  36575f5... vect: Fix access size alignment assumption [PR115192] (*)
  cd161b3... i386: Fix ix86_option override after change [PR 113719] (*)
  06333a1... Daily bump. (*)
  201cfa7... MIPS16: Mark $2/$3 as clobbered if GP is used (*)
  8f6c56c... Daily bump. (*)
  fba2843... Fix link failure of GNAT tools on 32-bit SPARC/Linux (*)
  90a4476... tree-optimization/115149 - VOP live and missing PHIs (*)
  2a1fdd5... tree-optimization/115197 - fix ICE w/ constant in LC PHI an (*)
  9e971c6... tree-optimization/114921 - _Float16 -> __bf16 isn't noop fi (*)
  b4d4ece... Align tight loop without considering max skipping bytes (*)
  8060035... Adjust generic loop alignment from 16:11:8 to 16 for Intel  (*)
  e2b66da... Daily bump. (*)
  dbeb3d1... Fortran: Fix SHAPE for zero-size arrays (*)
  89dff14... libstdc++: Guard use of sized deallocation [PR114940] (*)
  e78980f... LoongArch: Guard REGNO with REG_P in loongarch_expand_condi (*)
  133da68... Daily bump. (*)
  4790076... tree-optimization/115232 - demangle failure during -Waccess (*)
  0cae44a... Daily bump. (*)
  2e0f832... Daily bump. (*)
  b0b21d5... Fortran: fix bounds check for assignment, class component [ (*)
  cab8941... Daily bump. (*)
  9031c02... c++: deleting array temporary [PR115187] (*)
  782ad20... c++: Propagate using decls from partitions [PR114868] (*)
  fd6fd88... c++: Fix instantiation of imported temploid friends [PR1142 (*)
  557cddc... c++: Standardise errors for module_may_redeclare (*)
  5429e6a... Daily bump. (*)
  1a6c1c8... sra: Do not leave work for DSE (that it can sometimes not p (*)
  137e7a8... Daily bump. (*)
  c27d6c7... c++: failure to suppress -Wsizeof-array-div in template [PR (*)
  da3a6b0... testsuite: Verify r0-r3 are extended with CMSE (*)
  2f0e086... Fix internal error in seh_cfa_offset with -O2 -fno-omit-fra (*)
  4896bb3... libstdc++: Implement std::formatter withou (*)

(*) This commit already exists in another branch.
Because the reference `refs/vendors/redhat/heads/gcc-14-branch' matches
your hooks.email-new-commits-only configuration,
no separate email is sent for this commit.


[gcc(refs/vendors/redhat/heads/gcc-14-branch)] Merge commit 'r14-10288-g0f616e75f32083e1bc6d08f31e3fbc3dea41fa0c' into redhat/gcc-14-branch

2024-06-07 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:1de1e03e8bd3490b53f6fe454f7a48ddc1c839f2

commit 1de1e03e8bd3490b53f6fe454f7a48ddc1c839f2
Merge: e6b72839728 0f616e75f32
Author: Jakub Jelinek 
Date:   Fri Jun 7 10:39:08 2024 +0200

Merge commit 'r14-10288-g0f616e75f32083e1bc6d08f31e3fbc3dea41fa0c' into 
redhat/gcc-14-branch

Diff:

 gcc/ChangeLog  | 234 +++
 gcc/DATESTAMP  |   2 +-
 gcc/ada/ChangeLog  |   7 +
 gcc/ada/Makefile.rtl   |  13 +-
 gcc/builtins.cc|  22 +-
 gcc/c/ChangeLog|  10 +
 gcc/c/c-decl.cc|  15 ++
 gcc/combine.cc |   6 +-
 gcc/common/config/i386/i386-common.cc  |   4 +-
 gcc/common/config/i386/i386-cpuinfo.h  |   5 +-
 gcc/common/config/i386/i386-isas.h |   4 +-
 gcc/config/aarch64/aarch64-c.cc|   1 +
 gcc/config/alpha/alpha.md  |  21 +-
 gcc/config/alpha/constraints.md|   2 +-
 gcc/config/avr/avr.md  |  16 ++
 gcc/config/i386/i386-options.cc|  10 +-
 gcc/config/i386/i386.cc| 148 +++-
 gcc/config/i386/i386.md|  10 +-
 gcc/config/i386/x86-tune-costs.h   |   2 +-
 gcc/config/loongarch/loongarch.cc  |  17 +-
 gcc/config/mips/mips.cc|  11 +-
 gcc/config/rs6000/rs6000-builtin.cc|   2 +-
 gcc/config/rs6000/rs6000-c.cc  |  62 ++---
 gcc/config/rs6000/rs6000-gen-builtins.cc   |  72 +++---
 gcc/cp/ChangeLog   |  66 ++
 gcc/cp/cp-tree.h   |   5 +-
 gcc/cp/decl.cc |  69 +++---
 gcc/cp/init.cc |   9 +-
 gcc/cp/module.cc   | 201 
 gcc/cp/name-lookup.cc  |  53 +
 gcc/cp/pt.cc   |  33 ++-
 gcc/cp/semantics.cc|   8 +-
 gcc/cp/tree.cc |   6 +-
 gcc/doc/invoke.texi|   6 +-
 gcc/fold-const.cc  |   6 +-
 gcc/fold-mem-offsets.cc|   2 +-
 gcc/fortran/ChangeLog  |  20 ++
 gcc/fortran/trans-array.cc |   7 +-
 gcc/fortran/trans-expr.cc  |  40 ++--
 gcc/fortran/trans-intrinsic.cc |   4 +-
 gcc/gimple-lower-bitint.cc |   6 +-
 gcc/gimple-ssa-warn-access.cc  |   2 +-
 gcc/testsuite/ChangeLog| 253 +
 gcc/testsuite/g++.dg/cpp1z/array-prvalue3.C|   8 +
 gcc/testsuite/g++.dg/modules/enum-12.C |   2 +-
 gcc/testsuite/g++.dg/modules/friend-5_b.C  |   2 +-
 gcc/testsuite/g++.dg/modules/shadow-1_b.C  |   5 +-
 gcc/testsuite/g++.dg/modules/tpl-friend-10_a.C |  15 ++
 gcc/testsuite/g++.dg/modules/tpl-friend-10_b.C |   5 +
 gcc/testsuite/g++.dg/modules/tpl-friend-10_c.C |   7 +
 gcc/testsuite/g++.dg/modules/tpl-friend-10_d.C |   8 +
 gcc/testsuite/g++.dg/modules/tpl-friend-11_a.C |  14 ++
 gcc/testsuite/g++.dg/modules/tpl-friend-11_b.C |   5 +
 gcc/testsuite/g++.dg/modules/tpl-friend-12_a.C |  10 +
 gcc/testsuite/g++.dg/modules/tpl-friend-12_b.C |   9 +
 gcc/testsuite/g++.dg/modules/tpl-friend-12_c.C |  10 +
 gcc/testsuite/g++.dg/modules/tpl-friend-12_d.C |   8 +
 gcc/testsuite/g++.dg/modules/tpl-friend-12_e.C |   7 +
 gcc/testsuite/g++.dg/modules/tpl-friend-12_f.C |   8 +
 gcc/testsuite/g++.dg/modules/tpl-friend-13_a.C |  13 ++
 gcc/testsuite/g++.dg/modules/tpl-friend-13_b.C |  11 +
 gcc/testsuite/g++.dg/modules/tpl-friend-13_c.C |  13 ++
 gcc/testsuite/g++.dg/modules/tpl-friend-13_d.C |   7 +
 gcc/testsuite/g++.dg/modules/tpl-friend-13_e.C |  18 ++
 gcc/testsuite/g++.dg/modules/tpl-friend-13_f.C |   7 +
 gcc/testsuite/g++.dg/modules/tpl-friend-13_g.C |  11 +
 gcc/testsuite/g++.dg/modules/tpl-friend-14_a.C |   8 +
 gcc/testsuite/g++.dg/modules/tpl-friend-14_b.C |   8 +
 gcc/testsuite/g++.dg/modules/tpl-friend-14_c.C |   7 +
 gcc/testsuite/g++.dg/modules/tpl-friend-14_d.C |   9 +
 gcc/testsuite/g++.dg/modules/tpl-friend-9.C|  13 ++
 gcc/testsuite/g++.dg/modules/using-15_a.C  |  14 ++
 gcc/testsuite/g++.dg/modules/using-15_b.C  |   6 +
 gcc/testsuite/g++.dg/modules/using-15_c.C  |   8 +
 gcc/testsuite/g++.dg/opt/fmo1.C|  25 ++
 gcc/testsuite/g++.dg/pr115232.C

[gcc r14-10288] bitint: Fix up lower_addsub_overflow [PR115352]

2024-06-07 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:0f616e75f32083e1bc6d08f31e3fbc3dea41fa0c

commit r14-10288-g0f616e75f32083e1bc6d08f31e3fbc3dea41fa0c
Author: Jakub Jelinek 
Date:   Fri Jun 7 10:32:08 2024 +0200

bitint: Fix up lower_addsub_overflow [PR115352]

The following testcase is miscompiled because of a flawed optimization.
If one changes the 65 in the testcase to e.g. 66, one gets:
...
  _25 = .USUBC (0, _24, _14);
  _12 = IMAGPART_EXPR <_25>;
  _26 = REALPART_EXPR <_25>;
  if (_23 >= 1)
    goto <bb 8>; [80.00%]
  else
    goto <bb 11>; [20.00%]

  <bb 8> :
  if (_23 != 1)
    goto <bb 10>; [80.00%]
  else
    goto <bb 9>; [20.00%]

  <bb 9> :
  _27 = (signed long) _26;
  _28 = _27 >> 1;
  _29 = (unsigned long) _28;
  _31 = _29 + 1;
  _30 = _31 > 1;
  goto <bb 11>; [100.00%]

  <bb 10> :
  _32 = _26 != _18;
  _33 = _22 | _32;

  <bb 11> :
  # _17 = PHI <_30(9), _22(7), _33(10)>
  # _19 = PHI <_29(9), _18(7), _18(10)>
...
so there is one path for limbs below the boundary (in this case there are
actually no limbs there, maybe we could consider optimizing that further,
say with simply folding that _23 >= 1 condition to 1 == 1 and letting
cfg cleanup handle it), another case where it is exactly the limb on the
boundary (that is the bb 9 handling where it extracts the interesting
bits (the first 3 statements) and then checks if it is zero or all ones) and
finally the case of limbs above that where it compares the current result
limb against the previously recorded 0 or all ones and ors differences into
accumulated result.

Now, the optimization which the first hunk removes was based on the idea
that for that case the extraction of the interesting bits from the limb
doesn't need anything special, so the _27/_28/_29 statements above aren't
needed, the whole limb is interesting bits, so it handled the >= 1
case like the bb 9 above without the first 3 statements and bb 10 wasn't
there at all.  There are 2 problems with that, for the higher limbs it
only checks if the result limb bits are all zeros or all ones, but
doesn't check if they are the same as the other extension bits, and
it forgets the previous flag whether there was an overflow.
First I wanted to fix it just by adding the _33 = _22 | _30; statement
to the end of bb 9 above, which fixed the originally filed huge testcase
and the first 2 foo calls in the testcase included in the patch, it no
longer forgets about previously checked differences from 0/1.
But as the last 2 foo calls show, it still didn't check whether each
even (or each odd depending on the exact position) result limb is
equal to the first one, so every second limb it could choose some other
0 vs. all ones value and as long as it repeated in another limb above it
it would be ok.
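
The two problems can be shown with a simplified model in plain C.  This
is only a sketch of the idea, not what gimple-lower-bitint.cc emits; the
function names, the 64-bit limb type and the array-based interface are
assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the flawed handling: each upper limb is only tested for
   being 0 or all ones (the "+ 1 > 1" trick from the dump), and the flag
   is overwritten instead of accumulated.  */
static int
upper_limbs_differ_flawed (const uint64_t *limbs, size_t n)
{
  int ovf = 0;
  for (size_t i = 0; i < n; i++)
    ovf = (limbs[i] + 1) > 1;
  return ovf;
}

/* Model of the intended check: every upper limb must equal the recorded
   extension value EXT (0 or all ones), and differences are ORed into
   the accumulated flag.  */
static int
upper_limbs_differ (const uint64_t *limbs, size_t n, uint64_t ext)
{
  int ovf = 0;
  for (size_t i = 0; i < n; i++)
    ovf |= limbs[i] != ext;
  return ovf;
}

/* Usage: both inputs must be flagged when the extension is zero, but
   the flawed model misses them.  */
static void
check_upper_limbs (void)
{
  const uint64_t mixed[2] = { UINT64_MAX, 0 };   /* all ones, then zero */
  const uint64_t stale[2] = { 5, 0 };            /* other bits, then zero */
  assert (upper_limbs_differ (mixed, 2, 0) == 1);
  assert (upper_limbs_differ_flawed (mixed, 2) == 0);   /* mismatch missed */
  assert (upper_limbs_differ (stale, 2, 0) == 1);
  assert (upper_limbs_differ_flawed (stale, 2) == 0);   /* flag forgotten */
}
```

The first input shows a limb that is "all ones" but does not match the
expected extension; the second shows an earlier difference being lost
when a later limb looks fine.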

So, the optimization just can't work properly and the following patch
removes it.

2024-06-07  Jakub Jelinek  

PR middle-end/115352
* gimple-lower-bitint.cc (lower_addsub_overflow): Don't disable
single_comparison if cmp_code is GE_EXPR.

* gcc.dg/torture/bitint-71.c: New test.

(cherry picked from commit a47b1aaa7a76201da7e091d9f8d4488105786274)

Diff:
---
 gcc/gimple-lower-bitint.cc   |  6 +-
 gcc/testsuite/gcc.dg/torture/bitint-71.c | 28 
 2 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/gcc/gimple-lower-bitint.cc b/gcc/gimple-lower-bitint.cc
index 7e8b6e3c51a..56e5f826a8d 100644
--- a/gcc/gimple-lower-bitint.cc
+++ b/gcc/gimple-lower-bitint.cc
@@ -4286,11 +4286,7 @@ bitint_large_huge::lower_addsub_overflow (tree obj, gimple *stmt)
  bool single_comparison
= (startlimb + 2 >= fin || (startlimb & 1) != (i & 1));
  if (!single_comparison)
-   {
- cmp_code = GE_EXPR;
- if (!check_zero && (start % limb_prec) == 0)
-   single_comparison = true;
-   }
+   cmp_code = GE_EXPR;
  else if ((startlimb & 1) == (i & 1))
cmp_code = EQ_EXPR;
  else
diff --git a/gcc/testsuite/gcc.dg/torture/bitint-71.c b/gcc/testsuite/gcc.dg/torture/bitint-71.c
new file mode 100644
index 000..8ebd42b30b2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/bitint-71.c
@@ -0,0 +1,28 @@
+/* PR middle-end/115352 */
+/* { dg-do run { target bitint } } */
+/* { dg-options "-std=c23" } */
+/* { dg-skip-if "" { ! run_expensive_tests }  { "*" } { "-O0" "-O2" } } */
+/* { dg-skip-if "" { ! run_expensive_tests } { "-flto" } { "" } } */
+
+#if __BITINT_MAXWIDTH__ >= 385
+

[gcc r15-1093] bitint: Fix up lower_addsub_overflow [PR115352]

2024-06-07 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:a47b1aaa7a76201da7e091d9f8d4488105786274

commit r15-1093-ga47b1aaa7a76201da7e091d9f8d4488105786274
Author: Jakub Jelinek 
Date:   Fri Jun 7 10:32:08 2024 +0200

bitint: Fix up lower_addsub_overflow [PR115352]

The following testcase is miscompiled because of a flawed optimization.
If one changes the 65 in the testcase to e.g. 66, one gets:
...
  _25 = .USUBC (0, _24, _14);
  _12 = IMAGPART_EXPR <_25>;
  _26 = REALPART_EXPR <_25>;
  if (_23 >= 1)
    goto <bb 8>; [80.00%]
  else
    goto <bb 11>; [20.00%]

  <bb 8> :
  if (_23 != 1)
    goto <bb 10>; [80.00%]
  else
    goto <bb 9>; [20.00%]

  <bb 9> :
  _27 = (signed long) _26;
  _28 = _27 >> 1;
  _29 = (unsigned long) _28;
  _31 = _29 + 1;
  _30 = _31 > 1;
  goto <bb 11>; [100.00%]

  <bb 10> :
  _32 = _26 != _18;
  _33 = _22 | _32;

  <bb 11> :
  # _17 = PHI <_30(9), _22(7), _33(10)>
  # _19 = PHI <_29(9), _18(7), _18(10)>
...
so there is one path for limbs below the boundary (in this case there are
actually no limbs there, maybe we could consider optimizing that further,
say with simply folding that _23 >= 1 condition to 1 == 1 and letting
cfg cleanup handle it), another case where it is exactly the limb on the
boundary (that is the bb 9 handling where it extracts the interesting
bits (the first 3 statements) and then checks if it is zero or all ones) and
finally the case of limbs above that where it compares the current result
limb against the previously recorded 0 or all ones and ors differences into
accumulated result.

Now, the optimization which the first hunk removes was based on the idea
that for that case the extraction of the interesting bits from the limb
doesn't need anything special, so the _27/_28/_29 statements above aren't
needed, the whole limb is interesting bits, so it handled the >= 1
case like the bb 9 above without the first 3 statements and bb 10 wasn't
there at all.  There are 2 problems with that, for the higher limbs it
only checks if the result limb bits are all zeros or all ones, but
doesn't check if they are the same as the other extension bits, and
it forgets the previous flag whether there was an overflow.
First I wanted to fix it just by adding the _33 = _22 | _30; statement
to the end of bb 9 above, which fixed the originally filed huge testcase
and the first 2 foo calls in the testcase included in the patch, it no
longer forgets about previously checked differences from 0/1.
But as the last 2 foo calls show, it still didn't check whether each
even (or each odd depending on the exact position) result limb is
equal to the first one, so every second limb it could choose some other
0 vs. all ones value and as long as it repeated in another limb above it
it would be ok.

So, the optimization just can't work properly and the following patch
removes it.

2024-06-07  Jakub Jelinek  

PR middle-end/115352
* gimple-lower-bitint.cc (lower_addsub_overflow): Don't disable
single_comparison if cmp_code is GE_EXPR.

* gcc.dg/torture/bitint-71.c: New test.

Diff:
---
 gcc/gimple-lower-bitint.cc   |  6 +-
 gcc/testsuite/gcc.dg/torture/bitint-71.c | 28 
 2 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/gcc/gimple-lower-bitint.cc b/gcc/gimple-lower-bitint.cc
index 7e8b6e3c51a..56e5f826a8d 100644
--- a/gcc/gimple-lower-bitint.cc
+++ b/gcc/gimple-lower-bitint.cc
@@ -4286,11 +4286,7 @@ bitint_large_huge::lower_addsub_overflow (tree obj, gimple *stmt)
  bool single_comparison
= (startlimb + 2 >= fin || (startlimb & 1) != (i & 1));
  if (!single_comparison)
-   {
- cmp_code = GE_EXPR;
- if (!check_zero && (start % limb_prec) == 0)
-   single_comparison = true;
-   }
+   cmp_code = GE_EXPR;
  else if ((startlimb & 1) == (i & 1))
cmp_code = EQ_EXPR;
  else
diff --git a/gcc/testsuite/gcc.dg/torture/bitint-71.c b/gcc/testsuite/gcc.dg/torture/bitint-71.c
new file mode 100644
index 000..8ebd42b30b2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/bitint-71.c
@@ -0,0 +1,28 @@
+/* PR middle-end/115352 */
+/* { dg-do run { target bitint } } */
+/* { dg-options "-std=c23" } */
+/* { dg-skip-if "" { ! run_expensive_tests }  { "*" } { "-O0" "-O2" } } */
+/* { dg-skip-if "" { ! run_expensive_tests } { "-flto" } { "" } } */
+
+#if __BITINT_MAXWIDTH__ >= 385
+int
+foo (_BitInt (385) b)
+

Re: [PATCH] OpenMP: warn about iteration var modifications in loop body

2024-06-07 Thread Jakub Jelinek
On Wed, Mar 06, 2024 at 06:08:47PM +0100, Frederik Harwath wrote:
> Subject: [PATCH] OpenMP: warn about iteration var modifications in loop body

Note, the partially rewritten OpenMP loop transformations changes are now
in.
See below.

> --- a/gcc/gimplify.cc
> +++ b/gcc/gimplify.cc
> @@ -235,6 +235,8 @@ struct gimplify_omp_ctx
>bool order_concurrent;
>bool has_depend;
>bool in_for_exprs;
> +  bool in_omp_for_body;
> +  bool is_doacross;
>int defaultmap[5];
>  };
>  
> @@ -456,6 +458,10 @@ new_omp_context (enum omp_region_type region_type)
>c->privatized_types = new hash_set;
>c->location = input_location;
>c->region_type = region_type;
> +  c->loop_iter_var.create (0);
> +  c->in_omp_for_body = false;
> +  c->is_doacross = false;

I'm not sure it is a good idea to reuse loop_iter_var for this.

>if ((region_type & ORT_TASK) == 0)
>  c->default_kind = OMP_CLAUSE_DEFAULT_SHARED;
>else
> @@ -6312,6 +6318,18 @@ gimplify_modify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
>gcc_assert (TREE_CODE (*expr_p) == MODIFY_EXPR
> || TREE_CODE (*expr_p) == INIT_EXPR);
>  
> +  if (gimplify_omp_ctxp && gimplify_omp_ctxp->in_omp_for_body)
> +{
> +  size_t num_vars = gimplify_omp_ctxp->loop_iter_var.length () / 2;
> +  for (size_t i = 0; i < num_vars; i++)
> + {
> +   if (*to_p == gimplify_omp_ctxp->loop_iter_var[2 * i + 1])
> + warning_at (input_location, OPT_Wopenmp,
> + "forbidden modification of iteration variable %qE in "
> + "OpenMP loop", *to_p);

I think the forbidden word doesn't belong there, just modification of ...

Note, your patch seems to handle just one gimplify_omp_ctxp, not all.
If I do:
#pragma omp for
for (int i = 0; i < 32; ++i)
{
  ++i; // This is warned about
  #pragma omp parallel shared (i)
  #pragma omp master
  ++i; // This is not
  #pragma omp parallel private (i)
  ++i; // This should not
  #pragma omp target map(tofrom:i)
  ++i; // This is not
  #pragma omp target firstprivate (i)
  ++i; // This should not
  #pragma omp simd
  for (i = 0; i < 32; ++i) // This is not
;
}
The question is if it isn't just too hard to figure out the data sharing
in nested constructs.  But to be useful, perhaps at least loop
transformation constructs which don't have any privatization on the
iterators (pending the resolution of the data sharing loop transformation
issue) should be handled.

> @@ -15380,23 +15398,22 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
>gcc_assert (DECL_P (decl));
>gcc_assert (INTEGRAL_TYPE_P (TREE_TYPE (decl))
> || POINTER_TYPE_P (TREE_TYPE (decl)));
> -  if (is_doacross)
> +
> +  if (TREE_CODE (for_stmt) == OMP_FOR && OMP_FOR_ORIG_DECLS (for_stmt))

There is nothing specific about OMP_FOR for the orig decls, the reason
why the check is (probably) there is that simd construct has extra
restriction:
"The only random access iterator types that are allowed for the associated
loops are pointer types."
and so there is no point in looking at the orig decls for, say, for simd
ordered(2) doacross loops.
I was worried your patch wouldn't handle
void bar (int &);

void
foo ()
{
  int i;
  #pragma omp for
  for (i = 0; i < 32; ++i)
bar (i);
}
where because the IV is addressable we actually choose to use an artificial
IV and assign i = i.0; at the start of the loop body, but apparently that
works right (though maybe it should go into the testsuite), supposedly we
emit it in gimplify_omp_for in GIMPLE before actually gimplifying the actual
OMP_FOR_BODY (but it is an assignment in there).
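
To make the shape of that lowering concrete, here is a rough hand-written C sketch. It is only an illustration: bar takes a pointer because plain C has no references, i_0 is a made-up name standing in for the artificial i.0, and seen_sum exists only so the effect can be observed.

```c
/* Sketch only: mimics by hand what the gimplifier is described as doing
   for an addressable iteration variable.  Not actual compiler output.  */
static int seen_sum;

static void
bar (int *p)
{
  seen_sum += *p;
}

static void
foo_lowered (void)
{
  int i;
  for (int i_0 = 0; i_0 < 32; ++i_0)
    {
      i = i_0;  /* the copy emitted at the start of the loop body */
      bar (&i);
    }
}
```

The assignment i = i_0 is the one that must not trigger the warning, even though it is a store to the user's iteration variable inside the body.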

Anyway, what the patch certainly doesn't handle is the loop transformations.
The tile/unroll partial as done right now have the inter-tile emitted into
the OMP_FOR body, so both the initial assignment and the increment in there
would trigger the warning.  I guess similarly for reverse construct when
implemented.  Furthermore, the generated loops together with associated
ORIG_DECLs move to whatever outer construct loop needs them.

So, I think instead of doing it during gimplification of actual statements,
we should do it through a walk_tree on the bodies, done perhaps from inside
of omp_maybe_apply_loop_xforms or better right before that and mark through some
new flag loops whose bodies were walked for the diagnostics so that we don't
do that again.  Just have one hash map based on say DECL_UID into which we
mark all the loop iterators which should be warned about,
*walk_subtrees = 0; for OpenMP constructs which could privatize stuff
because it would be too difficult to handle but walk using a separate
walk_tree the loop transformation constructs and normally walk say
OMP_CRITICAL, OMP_MASKED and other constructs which never privatize stuff.
So, handle say
#pragma omp for
#pragma omp tile sizes (2, 2)
for (int i = 0; i < 32; ++i)
for (int j = 0; j < 32; ++j)
{
  ++i; // warn here; this is in the end generated loop of for, 

[PATCH] bitint: Fix up lower_addsub_overflow [PR115352]

2024-06-07 Thread Jakub Jelinek
Hi!

The following testcase is miscompiled because of a flawed optimization.
If one changes the 65 in the testcase to e.g. 66, one gets:
...
  _25 = .USUBC (0, _24, _14);
  _12 = IMAGPART_EXPR <_25>;
  _26 = REALPART_EXPR <_25>;
  if (_23 >= 1)
goto ; [80.00%]
  else
goto ; [20.00%]

   :
  if (_23 != 1)
goto ; [80.00%]
  else
goto ; [20.00%]

   :
  _27 = (signed long) _26;
  _28 = _27 >> 1;
  _29 = (unsigned long) _28;
  _31 = _29 + 1;
  _30 = _31 > 1;
  goto ; [100.00%]

   :
  _32 = _26 != _18;
  _33 = _22 | _32;

   :
  # _17 = PHI <_30(9), _22(7), _33(10)>
  # _19 = PHI <_29(9), _18(7), _18(10)>
...
so there is one path for limbs below the boundary (in this case there are
actually no limbs there, maybe we could consider optimizing that further,
say with simply folding that _23 >= 1 condition to 1 == 1 and letting
cfg cleanup handle it), another case where it is exactly the limb on the
boundary (that is the bb 9 handling where it extracts the interesting
bits (the first 3 statements) and then checks if it is zero or all ones and
finally the case of limbs above that where it compares the current result
limb against the previously recorded 0 or all ones and ors differences into
accumulated result.

Now, the optimization which the first hunk removes was based on the idea
that for that case the extraction of the interesting bits from the limb
don't need anything special, so the _27/_28/_29 statements above aren't
needed, the whole limb is interesting bits, so it handled the >= 1
case like the bb 9 above without the first 3 statements and bb 10 wasn't
there at all.  There are 2 problems with that, for the higher limbs it
only checks if the result limb bits are all zeros or all ones, but
doesn't check if they are the same as the other extension bits, and
it forgets the previous flag whether there was an overflow.
First I wanted to fix it just by adding the _33 = _22 | _30; statement
to the end of bb 9 above, which fixed the originally filed huge testcase
and the first 2 foo calls in the testcase included in the patch, it no
longer forgets about previously checked differences from 0/1.
But as the last 2 foo calls show, it still didn't check whether each
even (or each odd depending on the exact position) result limb is
equal to the first one, so every second limb it could choose some other
0 vs. all ones value and as long as it repeated in another limb above it
it would be ok.

So, the optimization just can't work properly and the following patch
removes it.
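
For readers who want the three cases spelled out, here is a minimal C model of the check. It is not the code in gimple-lower-bitint.cc: overflows_signed, the 64-bit limb size and the little-endian limb order are all assumptions made for the sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not GCC code.  LIMBS holds an N-limb two's
   complement value, least significant limb first.  Returns true if the
   value does not fit into a signed type of PREC bits, where
   1 <= PREC <= 64 * N.  */
static bool
overflows_signed (const uint64_t *limbs, size_t n, unsigned prec)
{
  size_t startlimb = (prec - 1) / 64;  /* limb containing the sign bit */
  unsigned shift = (prec - 1) % 64;    /* sign bit position in that limb */

  /* Boundary limb: the bits from the sign bit upwards must be all zeros
     or all ones.  Limbs below the boundary need no check at all.  */
  uint64_t high = limbs[startlimb] >> shift;
  uint64_t all_ones = UINT64_MAX >> shift;
  bool ovf = high != 0 && high != all_ones;

  /* Record which extension (0 or all ones) the boundary limb chose.  */
  uint64_t ext = (high & 1) ? UINT64_MAX : 0;

  /* Limbs above the boundary: each one has to match the recorded
     extension, and every difference is ORed into the accumulated flag.  */
  for (size_t i = startlimb + 1; i < n; i++)
    ovf |= limbs[i] != ext;
  return ovf;
}
```

The removed optimization corresponds roughly to skipping both the comparison against ext and the OR into the accumulated flag for the upper limbs, which is why alternating 0 and all-ones limbs could slip through.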

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/14.2?

2024-06-07  Jakub Jelinek  

PR middle-end/115352
* gimple-lower-bitint.cc (lower_addsub_overflow): Don't disable
single_comparison if cmp_code is GE_EXPR.

* gcc.dg/torture/bitint-71.c: New test.

--- gcc/gimple-lower-bitint.cc.jj   2024-04-12 10:59:48.233153262 +0200
+++ gcc/gimple-lower-bitint.cc  2024-06-06 12:06:57.065717651 +0200
@@ -4286,11 +4286,7 @@ bitint_large_huge::lower_addsub_overflow
  bool single_comparison
= (startlimb + 2 >= fin || (startlimb & 1) != (i & 1));
  if (!single_comparison)
-   {
- cmp_code = GE_EXPR;
- if (!check_zero && (start % limb_prec) == 0)
-   single_comparison = true;
-   }
+   cmp_code = GE_EXPR;
  else if ((startlimb & 1) == (i & 1))
cmp_code = EQ_EXPR;
  else
--- gcc/testsuite/gcc.dg/torture/bitint-71.c.jj  2024-06-06 12:20:55.824913276 +0200
+++ gcc/testsuite/gcc.dg/torture/bitint-71.c  2024-06-06 12:20:45.260044338 +0200
@@ -0,0 +1,28 @@
+/* PR middle-end/115352 */
+/* { dg-do run { target bitint } } */
+/* { dg-options "-std=c23" } */
+/* { dg-skip-if "" { ! run_expensive_tests }  { "*" } { "-O0" "-O2" } } */
+/* { dg-skip-if "" { ! run_expensive_tests } { "-flto" } { "" } } */
+
+#if __BITINT_MAXWIDTH__ >= 385
+int
+foo (_BitInt (385) b)
+{
+  return __builtin_sub_overflow_p (0, b, (_BitInt (65)) 0);
+}
+#endif
+
+int
+main ()
+{
+#if __BITINT_MAXWIDTH__ >= 385
+  if (!foo (-(_BitInt (385)) 0x0c377e8a3fd1881fff035bb487a51c9ed1f7350befa7ec445a3cf8d1ebb723981wb))
+__builtin_abort ();
+  if (!foo (-0x1c377e8a3fd1881fff035bb487a51c9ed1f7350befa7ec445a3cf8d1ebb723981uwb))
+__builtin_abort ();
+  if (!foo (-(_BitInt (385)) 0x0a3cf8d1ebb723981wb))
+__builtin_abort ();
+  if (!foo (-0x1a3cf8d1ebb723981uwb))
+__builtin_abort ();
+#endif
+}

Jakub



[gcc r14-10286] c: Fix up pointer types to may_alias structures [PR114493]

2024-06-06 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:56c73729c3eab08ca48f366bd435f98457743e45

commit r14-10286-g56c73729c3eab08ca48f366bd435f98457743e45
Author: Jakub Jelinek 
Date:   Thu Jun 6 22:12:11 2024 +0200

c: Fix up pointer types to may_alias structures [PR114493]

The following testcase ICEs in ipa-free-lang, because the
fld_incomplete_type_of
  gcc_assert (TYPE_CANONICAL (t2) != t2
  && TYPE_CANONICAL (t2) == TYPE_CANONICAL (TREE_TYPE (t)));
assertion doesn't hold.
This is because t is a struct S * type which was created while struct S
was still incomplete and without the may_alias attribute (and TYPE_CANONICAL
of a pointer type is a type created with can_alias_all = false argument),
while later on the may_alias attribute was used on the struct definition.
fld_incomplete_type_of then creates an incomplete distinct copy of the
structure (but with the original attributes) but pointers created for it
are because of the "may_alias" attribute TYPE_REF_CAN_ALIAS_ALL, including
their TYPE_CANONICAL, because while that is created with !can_alias_all
argument, we later set it because of the "may_alias" attribute on the
to_type.

This doesn't ICE with C++ since PR70512 fix because the C++ FE sets
TYPE_REF_CAN_ALIAS_ALL on all pointer types to the class type (and its
variants) when the may_alias is added.

The following patch does that in the C FE as well.
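
As a rough illustration of the traversal this amounts to, here is a toy C model. The struct toy_type and its field names are invented for the sketch and only loosely mirror the TYPE_POINTER_TO, TYPE_NEXT_PTR_TO, TYPE_MAIN_VARIANT and TYPE_NEXT_VARIANT chains; it is not the front end code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a type node; every name here is hypothetical.  */
struct toy_type
{
  struct toy_type *pointer_to;    /* first pointer type to this type */
  struct toy_type *next_ptr_to;   /* next pointer type to the same target */
  struct toy_type *main_variant;  /* head of the variant chain */
  struct toy_type *next_variant;  /* next qualified variant */
  bool ref_can_alias_all;
};

/* Flag every pointer type to TYPE, including all qualified variants of
   each such pointer type.  */
static void
toy_fixup_may_alias (struct toy_type *type)
{
  for (struct toy_type *t = type->pointer_to; t; t = t->next_ptr_to)
    for (struct toy_type *v = t->main_variant; v; v = v->next_variant)
      v->ref_can_alias_all = true;
}
```

The point of the nested walk is that pointer types created while the struct was still incomplete are reached too, since they hang off the same chains.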

    2024-06-06  Jakub Jelinek  

PR c/114493
* c-decl.cc (c_fixup_may_alias): New function.
(finish_struct): Call it if "may_alias" attribute is
specified.

* gcc.dg/pr114493-1.c: New test.
* gcc.dg/pr114493-2.c: New test.

(cherry picked from commit d5a3c6d43acb8b2211d9fb59d59482d74c010f01)

Diff:
---
 gcc/c/c-decl.cc   | 15 +++
 gcc/testsuite/gcc.dg/pr114493-1.c | 19 +++
 gcc/testsuite/gcc.dg/pr114493-2.c | 26 ++
 3 files changed, 60 insertions(+)

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index 52af8f32998..e63dab49589 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -9393,6 +9393,17 @@ c_update_type_canonical (tree t)
 }
 }
 
+/* TYPE is a struct or union that we're applying may_alias to after the body is
+   parsed.  Fixup any POINTER_TO types.  */
+
+static void
+c_fixup_may_alias (tree type)
+{
+  for (tree t = TYPE_POINTER_TO (type); t; t = TYPE_NEXT_PTR_TO (t))
+for (tree v = TYPE_MAIN_VARIANT (t); v; v = TYPE_NEXT_VARIANT (v))
+  TYPE_REF_CAN_ALIAS_ALL (v) = true;
+}
+
 /* Fill in the fields of a RECORD_TYPE or UNION_TYPE node, T.
LOC is the location of the RECORD_TYPE or UNION_TYPE's definition.
FIELDLIST is a chain of FIELD_DECL nodes for the fields.
@@ -9737,6 +9748,10 @@ finish_struct (location_t loc, tree t, tree fieldlist, 
tree attributes,
 
   C_TYPE_BEING_DEFINED (t) = 0;
 
+  if (lookup_attribute ("may_alias", TYPE_ATTRIBUTES (t)))
+for (x = TYPE_MAIN_VARIANT (t); x; x = TYPE_NEXT_VARIANT (x))
+  c_fixup_may_alias (x);
+
   /* Set type canonical based on equivalence class.  */
   if (flag_isoc23)
 {
diff --git a/gcc/testsuite/gcc.dg/pr114493-1.c 
b/gcc/testsuite/gcc.dg/pr114493-1.c
new file mode 100644
index 000..446f33eac3b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr114493-1.c
@@ -0,0 +1,19 @@
+/* PR c/114493 */
+/* { dg-do compile { target lto } } */
+/* { dg-options "-O2 -flto" } */
+
+void foo (void);
+struct S;
+struct S bar (struct S **);
+struct S qux (const struct S **);
+
+struct __attribute__((__may_alias__)) S {
+  int s;
+};
+
+struct S
+baz (void)
+{
+  foo ();
+  return (struct S) {};
+}
diff --git a/gcc/testsuite/gcc.dg/pr114493-2.c 
b/gcc/testsuite/gcc.dg/pr114493-2.c
new file mode 100644
index 000..1b4a5792dc9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr114493-2.c
@@ -0,0 +1,26 @@
+/* PR c/114493 */
+/* { dg-do compile { target lto } } */
+/* { dg-options "-O2 -flto -std=c23" } */
+
+void foo (void);
+struct S;
+struct S bar (struct S **);
+struct S qux (const struct S **);
+
+void
+corge (void)
+{
+  struct S { int s; } s;
+  s.s = 0;
+}
+
+struct __attribute__((__may_alias__)) S {
+  int s;
+};
+
+struct S
+baz (void)
+{
+  foo ();
+  return (struct S) {};
+}


[gcc r15-1080] c: Fix up pointer types to may_alias structures [PR114493]

2024-06-06 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:d5a3c6d43acb8b2211d9fb59d59482d74c010f01

commit r15-1080-gd5a3c6d43acb8b2211d9fb59d59482d74c010f01
Author: Jakub Jelinek 
Date:   Thu Jun 6 22:12:11 2024 +0200

c: Fix up pointer types to may_alias structures [PR114493]

The following testcase ICEs in ipa-free-lang, because the
fld_incomplete_type_of
  gcc_assert (TYPE_CANONICAL (t2) != t2
  && TYPE_CANONICAL (t2) == TYPE_CANONICAL (TREE_TYPE (t)));
assertion doesn't hold.
This is because t is a struct S * type which was created while struct S
was still incomplete and without the may_alias attribute (and TYPE_CANONICAL
of a pointer type is a type created with can_alias_all = false argument),
while later on the may_alias attribute was used on the struct definition.
fld_incomplete_type_of then creates an incomplete distinct copy of the
structure (but with the original attributes) but pointers created for it
are because of the "may_alias" attribute TYPE_REF_CAN_ALIAS_ALL, including
their TYPE_CANONICAL, because while that is created with !can_alias_all
argument, we later set it because of the "may_alias" attribute on the
to_type.

This doesn't ICE with C++ since PR70512 fix because the C++ FE sets
TYPE_REF_CAN_ALIAS_ALL on all pointer types to the class type (and its
variants) when the may_alias is added.

The following patch does that in the C FE as well.

    2024-06-06  Jakub Jelinek  

PR c/114493
* c-decl.cc (c_fixup_may_alias): New function.
(finish_struct): Call it if "may_alias" attribute is
specified.

* gcc.dg/pr114493-1.c: New test.
* gcc.dg/pr114493-2.c: New test.

Diff:
---
 gcc/c/c-decl.cc   | 15 +++
 gcc/testsuite/gcc.dg/pr114493-1.c | 19 +++
 gcc/testsuite/gcc.dg/pr114493-2.c | 26 ++
 3 files changed, 60 insertions(+)

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index 64924b87a91..6c09eb73128 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -9446,6 +9446,17 @@ verify_counted_by_attribute (tree struct_type, tree 
field_decl)
   return;
 }
 
+/* TYPE is a struct or union that we're applying may_alias to after the body is
+   parsed.  Fixup any POINTER_TO types.  */
+
+static void
+c_fixup_may_alias (tree type)
+{
+  for (tree t = TYPE_POINTER_TO (type); t; t = TYPE_NEXT_PTR_TO (t))
+for (tree v = TYPE_MAIN_VARIANT (t); v; v = TYPE_NEXT_VARIANT (v))
+  TYPE_REF_CAN_ALIAS_ALL (v) = true;
+}
+
 /* Fill in the fields of a RECORD_TYPE or UNION_TYPE node, T.
LOC is the location of the RECORD_TYPE or UNION_TYPE's definition.
FIELDLIST is a chain of FIELD_DECL nodes for the fields.
@@ -9791,6 +9802,10 @@ finish_struct (location_t loc, tree t, tree fieldlist, 
tree attributes,
 
   C_TYPE_BEING_DEFINED (t) = 0;
 
+  if (lookup_attribute ("may_alias", TYPE_ATTRIBUTES (t)))
+for (x = TYPE_MAIN_VARIANT (t); x; x = TYPE_NEXT_VARIANT (x))
+  c_fixup_may_alias (x);
+
   /* Set type canonical based on equivalence class.  */
   if (flag_isoc23 && !C_TYPE_VARIABLE_SIZE (t))
 {
diff --git a/gcc/testsuite/gcc.dg/pr114493-1.c 
b/gcc/testsuite/gcc.dg/pr114493-1.c
new file mode 100644
index 000..446f33eac3b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr114493-1.c
@@ -0,0 +1,19 @@
+/* PR c/114493 */
+/* { dg-do compile { target lto } } */
+/* { dg-options "-O2 -flto" } */
+
+void foo (void);
+struct S;
+struct S bar (struct S **);
+struct S qux (const struct S **);
+
+struct __attribute__((__may_alias__)) S {
+  int s;
+};
+
+struct S
+baz (void)
+{
+  foo ();
+  return (struct S) {};
+}
diff --git a/gcc/testsuite/gcc.dg/pr114493-2.c 
b/gcc/testsuite/gcc.dg/pr114493-2.c
new file mode 100644
index 000..1b4a5792dc9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr114493-2.c
@@ -0,0 +1,26 @@
+/* PR c/114493 */
+/* { dg-do compile { target lto } } */
+/* { dg-options "-O2 -flto -std=c23" } */
+
+void foo (void);
+struct S;
+struct S bar (struct S **);
+struct S qux (const struct S **);
+
+void
+corge (void)
+{
+  struct S { int s; } s;
+  s.s = 0;
+}
+
+struct __attribute__((__may_alias__)) S {
+  int s;
+};
+
+struct S
+baz (void)
+{
+  foo ();
+  return (struct S) {};
+}


[committed] libgomp: Mark Loop transformation constructs as implemented in the implementation status

2024-06-06 Thread Jakub Jelinek
Hi!

The implementation has been committed in r15-1037.

2024-06-06  Jakub Jelinek  

* libgomp.texi (OpenMP 5.1 status): Mark Loop transformation constructs
as implemented.

--- libgomp/libgomp.texi
+++ libgomp/libgomp.texi
@@ -302,7 +302,7 @@ The OpenMP 4.5 specification is fully supported.
 @item @code{error} directive @tab Y @tab
 @item @code{masked} construct @tab Y @tab
 @item @code{scope} directive @tab Y @tab
-@item Loop transformation constructs @tab N @tab
+@item Loop transformation constructs @tab Y @tab
 @item @code{strict} modifier in the @code{grainsize} and @code{num_tasks}
   clauses of the @code{taskloop} construct @tab Y @tab
 @item @code{align} clause in @code{allocate} directive @tab P

Jakub



[gcc r15-1052] libgomp: Mark Loop transformation constructs as implemented in the implementation status

2024-06-06 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:6a6bab4ba36c5d190b3151055e683e7067be92c1

commit r15-1052-g6a6bab4ba36c5d190b3151055e683e7067be92c1
Author: Jakub Jelinek 
Date:   Thu Jun 6 08:30:42 2024 +0200

libgomp: Mark Loop transformation constructs as implemented in the 
implementation status

The implementation has been committed in r15-1037.

2024-06-06  Jakub Jelinek  

* libgomp.texi (OpenMP 5.1 status): Mark Loop transformation 
constructs
as implemented.

Diff:
---
 libgomp/libgomp.texi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index d612488ad10..c52bb2672c6 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -302,7 +302,7 @@ The OpenMP 4.5 specification is fully supported.
 @item @code{error} directive @tab Y @tab
 @item @code{masked} construct @tab Y @tab
 @item @code{scope} directive @tab Y @tab
-@item Loop transformation constructs @tab N @tab
+@item Loop transformation constructs @tab Y @tab
 @item @code{strict} modifier in the @code{grainsize} and @code{num_tasks}
   clauses of the @code{taskloop} construct @tab Y @tab
 @item @code{align} clause in @code{allocate} directive @tab P


Re: [PATCH] c++: Handle erroneous DECL_LOCAL_DECL_ALIAS in duplicate_decls [PR107575]

2024-06-05 Thread Jakub Jelinek
On Wed, Jun 05, 2024 at 08:13:14AM +, Simon Martin wrote:
> --- a/gcc/cp/decl.cc
> +++ b/gcc/cp/decl.cc
> @@ -2792,10 +2792,13 @@ duplicate_decls (tree newdecl, tree olddecl, bool 
> hiding, bool was_hidden)
> retrofit_lang_decl (newdecl);
> tree alias = DECL_LOCAL_DECL_ALIAS (newdecl)
>   = DECL_LOCAL_DECL_ALIAS (olddecl);
> -   DECL_ATTRIBUTES (alias)
> - = (*targetm.merge_decl_attributes) (alias, newdecl);
> -   if (TREE_CODE (newdecl) == FUNCTION_DECL)
> - merge_attribute_bits (newdecl, alias);
> +   if (alias != error_mark_node)
> + {
> +   DECL_ATTRIBUTES (alias) =
> + (*targetm.merge_decl_attributes) (alias, newdecl);

Formatting nit, = should be on the next line, not at the end of a line.
See https://gcc.gnu.org/codingconventions.html

Jakub



Re: How to avoid some built-in expansions in gcc?

2024-06-04 Thread Jakub Jelinek via Gcc
On Tue, Jun 04, 2024 at 07:43:40PM +0200, Michael Matz via Gcc wrote:
> (Well, and without reverse-recognition of isfinite-like idioms in the 
> sources.  That's orthogonal as well.)

Why?  If isfinite is better done by a libcall, why isn't isfinite-like
idiom also better done as a libcall?

Jakub



[gcc r14-10280] fold-const: Fix up CLZ handling in tree_call_nonnegative_warnv_p [PR115337]

2024-06-04 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:a88e13bd7e0f50011e7f7f6e05c6f5e2a031143c

commit r14-10280-ga88e13bd7e0f50011e7f7f6e05c6f5e2a031143c
Author: Jakub Jelinek 
Date:   Tue Jun 4 15:49:41 2024 +0200

fold-const: Fix up CLZ handling in tree_call_nonnegative_warnv_p [PR115337]

The function currently incorrectly assumes all the __builtin_clz* and .CLZ
calls have non-negative result.  That is the case of the former which is UB
on zero and has [0, prec-1] return value otherwise, and is the case of the
single argument .CLZ as well (again, UB on zero), but for two argument
.CLZ is the case only if the second argument is also nonnegative (or if we
know the argument can't be zero, but let's do that just in the ranger IMHO).

The following patch does that.
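
A small C model of why the second argument matters. clz32_or is a hypothetical helper written for this sketch (a portable loop on a 32-bit value, not the internal function itself): the result is in [0, 31] for nonzero input, but for zero it is whatever the caller passed.

```c
#include <stdint.h>

/* Hypothetical model of a two-argument count-leading-zeros: returns
   AT_ZERO when X is zero, the number of leading zero bits otherwise.  */
static int
clz32_or (uint32_t x, int at_zero)
{
  if (x == 0)
    return at_zero;
  int n = 0;
  /* X is nonzero here, so some bit is set and the loop terminates.  */
  for (uint32_t bit = UINT32_C (1) << 31; (x & bit) == 0; bit >>= 1)
    n++;
  return n;
}
```

So a caller passing -1 as the value for zero gets a negative result, and the call can only be treated as non-negative when that second argument is known to be non-negative.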

2024-06-04  Jakub Jelinek  

PR tree-optimization/115337
* fold-const.cc (tree_call_nonnegative_warnv_p) <CASE_CFN_CLZ>:
If arg1 is non-NULL, RECURSE on it, otherwise return true.

* gcc.dg/bitint-106.c: New test.

(cherry picked from commit b82a816000791e7a286c7836b3a473ec0e2a577b)

Diff:
---
 gcc/fold-const.cc |  6 +-
 gcc/testsuite/gcc.dg/bitint-106.c | 29 +
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 7b268964acc..f496b3436df 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -15241,7 +15241,6 @@ tree_call_nonnegative_warnv_p (tree type, combined_fn 
fn, tree arg0, tree arg1,
 CASE_CFN_FFS:
 CASE_CFN_PARITY:
 CASE_CFN_POPCOUNT:
-CASE_CFN_CLZ:
 CASE_CFN_CLRSB:
 case CFN_BUILT_IN_BSWAP16:
 case CFN_BUILT_IN_BSWAP32:
@@ -15250,6 +15249,11 @@ tree_call_nonnegative_warnv_p (tree type, combined_fn 
fn, tree arg0, tree arg1,
   /* Always true.  */
   return true;
 
+CASE_CFN_CLZ:
+  if (arg1)
+   return RECURSE (arg1);
+  return true;
+
 CASE_CFN_SQRT:
 CASE_CFN_SQRT_FN:
   /* sqrt(-0.0) is -0.0.  */
diff --git a/gcc/testsuite/gcc.dg/bitint-106.c 
b/gcc/testsuite/gcc.dg/bitint-106.c
new file mode 100644
index 000..a36e8836690
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/bitint-106.c
@@ -0,0 +1,29 @@
+/* PR tree-optimization/115337 */
+/* { dg-do run { target bitint } } */
+/* { dg-options "-O2" } */
+
+#if __BITINT_MAXWIDTH__ >= 129
+#define N 128
+#else
+#define N 63
+#endif
+
+_BitInt (N) g;
+int c;
+
+void
+foo (unsigned _BitInt (N + 1) z, _BitInt (N) *ret)
+{
+  c = __builtin_stdc_first_leading_one (z << N);
+  _BitInt (N) y = *(_BitInt (N) *) __builtin_memset (&g, c, 5);
+  *ret = y;
+}
+
+int
+main ()
+{
+  _BitInt (N) x;
+  foo (0, &x);
+  if (c || g || x)
+__builtin_abort ();
+}


[gcc r14-10278] invoke.texi: Clarify -march=lujiazui

2024-06-04 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:1c1bc2553f6cb6d104f1f1b749aac0f39c4a3959

commit r14-10278-g1c1bc2553f6cb6d104f1f1b749aac0f39c4a3959
Author: Jakub Jelinek 
Date:   Tue Jun 4 12:20:13 2024 +0200

invoke.texi: Clarify -march=lujiazui

I was recently searching which exact CPUs are affected by the PR114576
wrong-code issue and went from the PTA_* bitmasks in GCC, so arrived
at the goldmont, goldmont-plus, tremont and lujiazui CPUs (as -march=
cases which do enable -maes and don't enable -mavx).
But when double-checking that against the invoke.texi documentation,
that was true for the first 3, but lujiazui said it supported AVX.
I was really confused by that, until I found the
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604407.html
explanation.  So, seems the CPUs do have AVX and F16C but -march=lujiazui
doesn't enable those and even actively attempts to filter those out from
the announced CPUID features, in glibc as well as e.g. in libgcc.

Thus, I think we should document what actually happens, otherwise
users could assume that
gcc -march=lujiazui predefines __AVX__ and __F16C__, which it doesn't.
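
One way for user code to see what a given -march= setting actually enables is to test the predefined macros. A minimal sketch, with hypothetical function names:

```c
/* Report whether this translation unit was compiled with AVX and F16C
   enabled, based on the compiler's predefined macros.  */
static int
compiled_with_avx (void)
{
#ifdef __AVX__
  return 1;
#else
  return 0;
#endif
}

static int
compiled_with_f16c (void)
{
#ifdef __F16C__
  return 1;
#else
  return 0;
#endif
}
```

Per the description above, both functions would be expected to return 0 when built with -march=lujiazui alone, even though the hardware has the instructions.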

2024-06-04  Jakub Jelinek  

* doc/invoke.texi (lujiazui): Clarify that while the CPUs do support
AVX and F16C, -march=lujiazui actually doesn't enable those.

(cherry picked from commit 09b4ab53155ea16e1fb12c2afcd9b6fe29a31c74)

Diff:
---
 gcc/doc/invoke.texi | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 9456ced468a..a916d618960 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -34732,8 +34732,10 @@ instruction set support.
 
 @item lujiazui
 ZHAOXIN lujiazui CPU with x86-64, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1,
-SSE4.2, AVX, POPCNT, AES, PCLMUL, RDRND, XSAVE, XSAVEOPT, FSGSBASE, CX16,
-ABM, BMI, BMI2, F16C, FXSR, RDSEED instruction set support.
+SSE4.2, POPCNT, AES, PCLMUL, RDRND, XSAVE, XSAVEOPT, FSGSBASE, CX16,
+ABM, BMI, BMI2, FXSR, RDSEED instruction set support.  While the CPUs
+do support AVX and F16C, these aren't enabled by @code{-march=lujiazui}
+for performance reasons.
 
 @item yongfeng
 ZHAOXIN yongfeng CPU with x86-64, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1,


[gcc r14-10279] builtins: Force SAVE_EXPR for __builtin_{add, sub, mul}_overflow and __builtin{add, sub}c [PR108789]

2024-06-04 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:f9af4a05e027a8b797628f1a2c39ef0b28dc36d9

commit r14-10279-gf9af4a05e027a8b797628f1a2c39ef0b28dc36d9
Author: Jakub Jelinek 
Date:   Tue Jun 4 12:28:01 2024 +0200

builtins: Force SAVE_EXPR for __builtin_{add,sub,mul}_overflow and 
__builtin{add,sub}c [PR108789]

The following testcase is miscompiled, because we use save_expr
on the .{ADD,SUB,MUL}_OVERFLOW call we are creating, but if the first
two operands are not INTEGER_CSTs (in that case we just fold it right away)
but are TREE_READONLY/!TREE_SIDE_EFFECTS, save_expr doesn't actually
create a SAVE_EXPR at all and so we lower it to
*arg2 = REALPART_EXPR (.ADD_OVERFLOW (arg0, arg1)), \
IMAGPART_EXPR (.ADD_OVERFLOW (arg0, arg1))
which evaluates the ifn twice and just hopes it will be CSEd back.
As *arg2 aliases *arg0, that is not the case.
The builtins are really never const/pure as they store into what
the third arguments points to, so after handling the INTEGER_CST+INTEGER_CST
case, I think we should just always use SAVE_EXPR.  Just building SAVE_EXPR
by hand and setting TREE_SIDE_EFFECTS on it doesn't work, because
c_fully_fold optimizes it away again, so the following patch marks the
ifn calls as TREE_SIDE_EFFECTS (but doesn't do it for the
__builtin_{add,sub,mul}_overflow_p case which were designed for use
especially in constant expressions and don't really evaluate the
realpart side, so we don't really need a SAVE_EXPR in that case).
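
The aliasing problem can be reproduced by hand in plain C. The sketch below is only a model: uadd stands in for the internal function, and the two wrappers (names invented here) correspond to evaluating it twice versus once.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of the internal function: stores the wrapped sum through SUM and
   returns the overflow flag.  */
static bool
uadd (uint32_t a, uint32_t b, uint32_t *sum)
{
  uint32_t s = a + b;
  *sum = s;
  return s < a;
}

/* What the lowering without a SAVE_EXPR amounts to: the operation is
   evaluated once for the stored result and again for the flag.  If R
   aliases A or B, the second evaluation sees the already stored sum.  */
static bool
add_evaluated_twice (uint32_t *r, const uint32_t *a, const uint32_t *b)
{
  uint32_t unused;
  uadd (*a, *b, r);
  return uadd (*a, *b, &unused);
}

/* With a single evaluation both parts come from the same call.  */
static bool
add_evaluated_once (uint32_t *r, const uint32_t *a, const uint32_t *b)
{
  uint32_t sum;
  bool ovf = uadd (*a, *b, &sum);
  *r = sum;
  return ovf;
}
```

Calling either wrapper with all three pointers referring to the same variable holding 1073741824 shows the difference: the single-evaluation version reports no overflow, while the double-evaluation version recomputes the flag from the stored 2147483648 and reports one.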

2024-06-04  Jakub Jelinek  

PR middle-end/108789
* builtins.cc (fold_builtin_arith_overflow): For ovf_only,
don't call save_expr and don't build REALPART_EXPR, otherwise
set TREE_SIDE_EFFECTS on call before calling save_expr.
(fold_builtin_addc_subc): Set TREE_SIDE_EFFECTS on call before
calling save_expr.

* gcc.c-torture/execute/pr108789.c: New test.

(cherry picked from commit b8e28381cb5c0cddfe5201faf799d8b27f5d7d6c)

Diff:
---
 gcc/builtins.cc| 22 ++-
 gcc/testsuite/gcc.c-torture/execute/pr108789.c | 39 ++
 2 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index f8d94c4b435..7c1497561f7 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -10042,7 +10042,21 @@ fold_builtin_arith_overflow (location_t loc, enum 
built_in_function fcode,
   tree ctype = build_complex_type (type);
   tree call = build_call_expr_internal_loc (loc, ifn, ctype, 2,
arg0, arg1);
-  tree tgt = save_expr (call);
+  tree tgt;
+  if (ovf_only)
+   {
+ tgt = call;
+ intres = NULL_TREE;
+   }
+  else
+   {
+ /* Force SAVE_EXPR even for calls which satisfy tree_invariant_p_1,
+as while the call itself is const, the REALPART_EXPR store is
+certainly not.  And in any case, we want just one call,
+not multiple and trying to CSE them later.  */
+ TREE_SIDE_EFFECTS (call) = 1;
+ tgt = save_expr (call);
+   }
   intres = build1_loc (loc, REALPART_EXPR, type, tgt);
   ovfres = build1_loc (loc, IMAGPART_EXPR, type, tgt);
   ovfres = fold_convert_loc (loc, boolean_type_node, ovfres);
@@ -10354,11 +10368,17 @@ fold_builtin_addc_subc (location_t loc, enum 
built_in_function fcode,
   tree ctype = build_complex_type (type);
   tree call = build_call_expr_internal_loc (loc, ifn, ctype, 2,
args[0], args[1]);
+  /* Force SAVE_EXPR even for calls which satisfy tree_invariant_p_1,
+ as while the call itself is const, the REALPART_EXPR store is
+ certainly not.  And in any case, we want just one call,
+ not multiple and trying to CSE them later.  */
+  TREE_SIDE_EFFECTS (call) = 1;
   tree tgt = save_expr (call);
   tree intres = build1_loc (loc, REALPART_EXPR, type, tgt);
   tree ovfres = build1_loc (loc, IMAGPART_EXPR, type, tgt);
   call = build_call_expr_internal_loc (loc, ifn, ctype, 2,
   intres, args[2]);
+  TREE_SIDE_EFFECTS (call) = 1;
   tgt = save_expr (call);
   intres = build1_loc (loc, REALPART_EXPR, type, tgt);
   tree ovfres2 = build1_loc (loc, IMAGPART_EXPR, type, tgt);
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr108789.c 
b/gcc/testsuite/gcc.c-torture/execute/pr108789.c
new file mode 100644
index 000..32ee19be1c4
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr108789.c
@@ -0,0 +1,39 @@
+/* PR middle-end/108789 */
+
+int
+add (unsigned *r, const unsigned *a, const unsigned *b)
+{
+  return __builtin_add_overflow (*a, *b, r);
+}
+
+int
+mul (unsigned *r, const unsigned *a, const unsigned *b)
+{
+  return __builtin_mul_overflow (*a, *b, r);
+}
+
+int
+main ()
+{
+  unsigned x;
+
+  /* 1073741824U + 1073741824U should not overflow.  */
+  x

[gcc r14-10277] rs6000: Fix up PCH in --enable-host-pie builds [PR115324]

2024-06-04 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:a7dd44c02ec1047166b4bacc3faa6255c816da2a

commit r14-10277-ga7dd44c02ec1047166b4bacc3faa6255c816da2a
Author: Jakub Jelinek 
Date:   Mon Jun 3 23:11:06 2024 +0200

rs6000: Fix up PCH in --enable-host-pie builds [PR115324]

PCH doesn't work properly in --enable-host-pie configurations on
powerpc*-linux*.
The problem is that the rs6000_builtin_info and rs6000_instance_info
arrays mix pointers to .rodata/.data (bifname and attr_string point
to string literals in .rodata section, and the next member is either NULL
or &rs6000_instance_info[XXX]) and GC member (tree fntype).
Now, for normal GC this works just fine, we emit
  {
    &rs6000_instance_info[0].fntype,
    1 * (RS6000_INST_MAX),
    sizeof (rs6000_instance_info[0]),
    &gt_ggc_mx_tree_node,
    &gt_pch_nx_tree_node
  },
  {
    &rs6000_builtin_info[0].fntype,
    1 * (RS6000_BIF_MAX),
    sizeof (rs6000_builtin_info[0]),
    &gt_ggc_mx_tree_node,
    &gt_pch_nx_tree_node
  },
GC roots which are strided and thus cover only the fntype members of all
the elements of the two arrays.
For PCH though it actually results in saving those huge arrays (one is
130832 bytes, another 81568 bytes) into the .gch files and loading them back
in full.  While the bifname and attr_string and next pointers are marked as
GTY((skip)), they are actually saved to point to the .rodata and .data
sections of the process which writes the PCH, but because cc1/cc1plus etc.
are position independent executables with --enable-host-pie, when it is
loaded from the PCH file, it can point in a completely different addresses
where nothing is mapped at all or some random different thing appears at.
While gengtype supports the callback option, that one is meant for
relocatable function pointers and doesn't work in the case of GTY arrays
inside of .data section anyway.

So, either we'd need to add some further GTY extensions, or the following
patch instead reworks it such that the fntype members which were the only
reason for PCH in those arrays are moved to separate arrays.
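
The general shape of such a split can be sketched in plain C. Everything below is a simplified, hypothetical stand-in (toy_instance, the -1 end marker, the void * element type), not the generated rs6000 tables: the static table keeps only position-independent data, with chain links stored as indexes, and the one member that needs saving lives in a parallel array.

```c
#include <stddef.h>

/* Static, read-only description.  The chain link is an index into the
   same array rather than a pointer, so nothing in it depends on where
   the executable was loaded.  -1 as end marker is an assumption of this
   sketch.  */
struct toy_instance
{
  const char *name;
  int next;
};

static const struct toy_instance toy_instances[] = {
  { "vec_add_a", 1 },
  { "vec_add_b", 2 },
  { "vec_add_c", -1 },
};

#define TOY_INST_MAX (sizeof toy_instances / sizeof toy_instances[0])

/* The only data that would need to be saved and restored, indexed in
   parallel with toy_instances.  */
static void *toy_instance_fntype[TOY_INST_MAX];

static void
toy_set_fntype (int idx, void *fntype)
{
  toy_instance_fntype[idx] = fntype;
}

/* Walk a chain starting at index FIRST and count its elements.  */
static size_t
toy_chain_length (int first)
{
  size_t n = 0;
  for (int i = first; i != -1; i = toy_instances[i].next)
    n++;
  return n;
}
```

With this layout only the small parallel array has to round-trip through a saved image, at the cost of an extra indexing step wherever the chain is followed.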

Size-wise in .data sections it is (in bytes):

                             vanilla  patched
rs6000_builtin_info           130832   110704
rs6000_instance_info           81568    40784
rs6000_overload_info            7392     7392
rs6000_builtin_info_fntype         0    10064
rs6000_instance_info_fntype        0    20392
sum                           219792   189336

where previously we saved/restored for PCH those 130832+81568 bytes, now we
save/restore just 10064+20392 bytes, so this change is beneficial for the
data section size.

Unfortunately, it grows the size of the rs6000_init_generated_builtins
function, vanilla had 218328 bytes, patched has 228668.

When I applied
 void
 rs6000_init_generated_builtins ()
 {
+  bifdata *rs6000_builtin_info_p;
+  tree *rs6000_builtin_info_fntype_p;
+  ovlddata *rs6000_instance_info_p;
+  tree *rs6000_instance_info_fntype_p;
+  ovldrecord *rs6000_overload_info_p;
+  __asm ("" : "=r" (rs6000_builtin_info_p) : "0" (rs6000_builtin_info));
+  __asm ("" : "=r" (rs6000_builtin_info_fntype_p) : "0" 
(rs6000_builtin_info_fntype));
+  __asm ("" : "=r" (rs6000_instance_info_p) : "0" (rs6000_instance_info));
+  __asm ("" : "=r" (rs6000_instance_info_fntype_p) : "0" (rs6000_instance_info_fntype));
+  __asm ("" : "=r" (rs6000_overload_info_p) : "0" (rs6000_overload_info));
+  #define rs6000_builtin_info rs6000_builtin_info_p
+  #define rs6000_builtin_info_fntype rs6000_builtin_info_fntype_p
+  #define rs6000_instance_info rs6000_instance_info_p
+  #define rs6000_instance_info_fntype rs6000_instance_info_fntype_p
+  #define rs6000_overload_info rs6000_overload_info_p
+
hack by hand, the size of the function is 209700 though, so if really
wanted, we could add __attribute__((__noipa__)) to the function when
building with recent enough GCC and pass pointers to the first elements
of the 5 arrays to the function as arguments.  If you want such a change,
could that be done incrementally?
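
For readers unfamiliar with the idiom used in that hack, here is a small
GNU C sketch with a hypothetical table array.  An empty __asm with a
matching "0" constraint copies the address through a register, so the
compiler has to treat the result as a pointer of unknown origin and
every later access goes through that one pointer.

```c
#include <assert.h>

static int table[4] = { 1, 2, 3, 4 };

/* Sum TABLE through a pointer whose value the optimizer cannot trace
   back to the array.  GNU C extension; the asm body is empty.  */
static int
sum_through_opaque_pointer (void)
{
  int *p;
  __asm ("" : "=r" (p) : "0" (table));
  return p[0] + p[1] + p[2] + p[3];
}

/* Usage: the result is the same as summing the array directly.  */
static void
check_sum (void)
{
  assert (sum_through_opaque_pointer () == 10);
}
```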

2024-06-03  Jakub Jelinek  

PR target/115324
* config/rs6000/rs6000-gen-builtins.cc (write_decls): Remove
GTY markup from struct bifdata and struct ovlddata and remove their
fntype members.  Change next member in struct ovlddata and
first_instance member of struct ovldrecord to have int type rather
than struct ovlddata *.  Remove GTY markup from rs6000_builtin_info
and rs6000_instance_info arrays, declare new
rs6000_builtin_in

[gcc r14-10276] combine: Fix up simplify_compare_const [PR115092]

2024-06-04 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:14a7296d04474055bfe1d7f130dceac6dabf390d

commit r14-10276-g14a7296d04474055bfe1d7f130dceac6dabf390d
Author: Jakub Jelinek 
Date:   Wed May 15 18:37:17 2024 +0200

combine: Fix up simplify_compare_const [PR115092]

The following testcases are miscompiled (with tons of GIMPLE
optimization disabled) because combine sees GE comparison of
1-bit sign_extract (i.e. something with [-1, 0] value range)
with (const_int -1) (which is always true) and optimizes it into
NE comparison of 1-bit zero_extract ([0, 1] value range) against
(const_int 0).
The reason is that simplify_compare_const first (correctly)
simplifies the comparison to
GE (ashift:SI something (const_int 31)) (const_int -2147483648)
and then an optimization for when the second operand is power of 2
triggers.  That optimization is fine for power of 2s which aren't
the signed minimum of the mode, or if it is NE, EQ, GEU or LTU
against the signed minimum of the mode, but for GE or LT optimizing
it into NE (or EQ) against const0_rtx is wrong, those cases
are always true or always false (but the function doesn't have
a standardized way to tell callers the comparison is now unconditional).

The following patch just disables the optimization in that case.
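
The difference between the two forms can be sketched in plain C.  This
is a simplified model of the comparison described above, not the RTL
involved, and the function names are made up for illustration.

```c
#include <assert.h>

/* The original comparison: a 1-bit sign-extracted field is -1 or 0,
   so comparing it >= -1 holds for every input.  */
static int
original_compare (unsigned x)
{
  int field = -(int) (x & 1);   /* -1 or 0 */
  return field >= -1;
}

/* What the bad transformation produced: the same bit zero-extracted
   ([0, 1]) and compared != 0.  */
static int
miscompiled_compare (unsigned x)
{
  return (x & 1) != 0;
}

/* Usage: the two forms disagree whenever the bit is clear.  */
static void
check_compare (void)
{
  assert (original_compare (0) == 1);
  assert (original_compare (1) == 1);
  assert (miscompiled_compare (0) == 0);
  assert (miscompiled_compare (1) == 1);
}
```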

2024-05-15  Jakub Jelinek  

PR rtl-optimization/114902
PR rtl-optimization/115092
* combine.cc (simplify_compare_const): Don't optimize
GE op0 SIGNED_MIN or LT op0 SIGNED_MIN into NE op0 const0_rtx or
EQ op0 const0_rtx.

* gcc.dg/pr114902.c: New test.
* gcc.dg/pr115092.c: New test.

(cherry picked from commit 0b93a0ae153ef70a82ff63e67926a01fdab9956b)

Diff:
---
 gcc/combine.cc  |  6 --
 gcc/testsuite/gcc.dg/pr114902.c | 23 +++
 gcc/testsuite/gcc.dg/pr115092.c | 16 
 3 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/gcc/combine.cc b/gcc/combine.cc
index 92b8d98e6c1..60afe043578 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -11841,8 +11841,10 @@ simplify_compare_const (enum rtx_code code, machine_mode mode,
  `and'ed with that bit), we can replace this with a comparison
  with zero.  */
   if (const_op
-  && (code == EQ || code == NE || code == GE || code == GEU
- || code == LT || code == LTU)
+  && (code == EQ || code == NE || code == GEU || code == LTU
+ /* This optimization is incorrect for signed >= INT_MIN or
+< INT_MIN, those are always true or always false.  */
+ || ((code == GE || code == LT) && const_op > 0))
   && is_a <scalar_int_mode> (mode, &int_mode)
   && GET_MODE_PRECISION (int_mode) - 1 < HOST_BITS_PER_WIDE_INT
   && pow2p_hwi (const_op & GET_MODE_MASK (int_mode))
diff --git a/gcc/testsuite/gcc.dg/pr114902.c b/gcc/testsuite/gcc.dg/pr114902.c
new file mode 100644
index 00000000000..60684faa25d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr114902.c
@@ -0,0 +1,23 @@
+/* PR rtl-optimization/114902 */
+/* { dg-do run } */
+/* { dg-options "-O1 -fno-tree-fre -fno-tree-forwprop -fno-tree-ccp -fno-tree-dominator-opts" } */
+
+__attribute__((noipa))
+int foo (int x)
+{
+  int a = ~x;
+  int t = a & 1;
+  int e = -t;
+  int b = e >= -1;
+  if (b)
+return 0;
+  __builtin_trap ();
+}
+
+int
+main ()
+{
+  foo (-1);
+  foo (0);
+  foo (1);
+}
diff --git a/gcc/testsuite/gcc.dg/pr115092.c b/gcc/testsuite/gcc.dg/pr115092.c
new file mode 100644
index 00000000000..c9047f4d321
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr115092.c
@@ -0,0 +1,16 @@
+/* PR rtl-optimization/115092 */
+/* { dg-do run } */
+/* { dg-options "-O1 -fgcse -ftree-pre -fno-tree-dominator-opts -fno-tree-fre -fno-guess-branch-probability" } */
+
+int a, b, c = 1, d, e;
+
+int
+main ()
+{
+  int f, g = a;
+  b = -2;
+  f = -(1 >> ((c && b) & ~a));
+  if (f <= b)
+d = g / e;
+  return 0;
+}


[gcc r15-1014] ranger: Improve CLZ fold_range [PR115337]

2024-06-04 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:591d30c5c97e757f63ce0d99ae9a3dbe8c75a50a

commit r15-1014-g591d30c5c97e757f63ce0d99ae9a3dbe8c75a50a
Author: Jakub Jelinek 
Date:   Tue Jun 4 16:16:49 2024 +0200

ranger: Improve CLZ fold_range [PR115337]

cfn_ctz::fold_range includes special cases for the case where .CTZ has
two arguments and so is well defined at zero, and the second argument is
equal to prec or -1, but cfn_clz::fold_range does that only for the prec
case.  -1 is fairly common as well though, because the <stdbit.h> builtins
do use it now, so I think it is worth special casing that.
If we don't know anything about the argument, the difference for
.CLZ (arg, -1) is that previously the result was varying, now it will be
[-1, prec-1].  If we knew arg can't be zero, it used to be optimized before
as well into e.g. [0, prec-1] or similar.
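
The ranges quoted above can be checked by brute force on a small model.
The sketch below assumes a hypothetical 8-bit precision and a helper
clz8 that takes the value to return at zero as its second argument; it
is an illustration of the arithmetic, not ranger code.

```c
#include <assert.h>

#define PREC 8  /* Hypothetical precision, small enough to enumerate.  */

/* Count leading zeros of an 8-bit value; AT_ZERO is returned for 0,
   modelling the second argument of a two-argument .CLZ.  */
static int
clz8 (unsigned char x, int at_zero)
{
  if (x == 0)
    return at_zero;
  int n = 0;
  for (unsigned bit = 1u << (PREC - 1); !(x & bit); bit >>= 1)
    n++;
  return n;
}

/* Compute the smallest and largest result over all inputs in [LO, HI].  */
static void
clz8_range (unsigned lo, unsigned hi, int at_zero, int *min, int *max)
{
  *min = *max = clz8 ((unsigned char) lo, at_zero);
  for (unsigned x = lo; x <= hi; x++)
    {
      int r = clz8 ((unsigned char) x, at_zero);
      if (r < *min)
        *min = r;
      if (r > *max)
        *max = r;
    }
}

/* Usage: unknown argument gives [-1, prec-1]; a known non-zero
   argument gives [0, prec-1].  */
static void
check_clz_range (void)
{
  int min, max;
  clz8_range (0, 255, -1, &min, &max);
  assert (min == -1 && max == PREC - 1);
  clz8_range (1, 255, -1, &min, &max);
  assert (min == 0 && max == PREC - 1);
}
```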

2024-06-04  Jakub Jelinek  

PR tree-optimization/115337
* gimple-range-op.cc (cfn_clz::fold_range): For
m_gimple_call_internal_p handle as a special case also second argument
of -1 next to prec.

Diff:
---
 gcc/gimple-range-op.cc | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index aec3f39ec0e..1b9a84708b9 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -941,8 +941,10 @@ cfn_clz::fold_range (irange &r, tree type, const irange &lh,
   int maxi = prec - 1;
   if (m_gimple_call_internal_p)
 {
-  // Only handle the single common value.
-  if (rh.lower_bound () == prec)
+  // Handle only the two common values.
+  if (rh.lower_bound () == -1)
+   mini = -1;
+  else if (rh.lower_bound () == prec)
maxi = prec;
   else
// Magic value to give up, unless we can prove arg is non-zero.
@@ -953,7 +955,7 @@ cfn_clz::fold_range (irange &r, tree type, const irange &lh,
   if (wi::gt_p (lh.lower_bound (), 0, TYPE_SIGN (lh.type ())))
 {
   maxi = prec - 1 - wi::floor_log2 (lh.lower_bound ());
-  if (mini == -2)
+  if (mini < 0)
mini = 0;
 }
   else if (!range_includes_zero_p (lh))
@@ -969,11 +971,11 @@ cfn_clz::fold_range (irange &r, tree type, const irange &lh,
   if (max == 0)
 {
   // If CLZ_DEFINED_VALUE_AT_ZERO is 2 with VALUE of prec,
-  // return [prec, prec], otherwise ignore the range.
-  if (maxi == prec)
-   mini = prec;
+  // return [prec, prec] or [-1, -1], otherwise ignore the range.
+  if (maxi == prec || mini == -1)
+   mini = maxi;
 }
-  else
+  else if (mini >= 0)
 mini = newmini;
 
   if (mini == -2)


[gcc r15-1013] fold-const: Handle CTZ like CLZ in tree_call_nonnegative_warnv_p [PR115337]

2024-06-04 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:181861b072ff1ef650c1a9d0290a4a672b9e747c

commit r15-1013-g181861b072ff1ef650c1a9d0290a4a672b9e747c
Author: Jakub Jelinek 
Date:   Tue Jun 4 15:52:09 2024 +0200

fold-const: Handle CTZ like CLZ in tree_call_nonnegative_warnv_p [PR115337]

I think we can handle CTZ exactly like CLZ in tree_call_nonnegative_warnv_p.
Like CLZ, if it is UB at zero, the result range is [0, prec-1] and if it is
well defined at zero, the second argument provides the value at zero.

2024-06-04  Jakub Jelinek  

PR tree-optimization/115337
* fold-const.cc (tree_call_nonnegative_warnv_p): Handle
CASE_CFN_CTZ like CASE_CFN_CLZ.

Diff:
---
 gcc/fold-const.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 048c654c848..92b048c307e 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -15250,6 +15250,7 @@ tree_call_nonnegative_warnv_p (tree type, combined_fn fn, tree arg0, tree arg1,
   return true;
 
 CASE_CFN_CLZ:
+CASE_CFN_CTZ:
   if (arg1)
return RECURSE (arg1);
   return true;


[gcc r15-1011] fold-const: Fix up CLZ handling in tree_call_nonnegative_warnv_p [PR115337]

2024-06-04 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:b82a816000791e7a286c7836b3a473ec0e2a577b

commit r15-1011-gb82a816000791e7a286c7836b3a473ec0e2a577b
Author: Jakub Jelinek 
Date:   Tue Jun 4 15:49:41 2024 +0200

fold-const: Fix up CLZ handling in tree_call_nonnegative_warnv_p [PR115337]

The function currently incorrectly assumes all the __builtin_clz* and .CLZ
calls have non-negative result.  That is the case of the former which is UB
on zero and has [0, prec-1] return value otherwise, and is the case of the
single argument .CLZ as well (again, UB on zero), but for two argument
.CLZ is the case only if the second argument is also nonnegative (or if we
know the argument can't be zero, but let's do that just in the ranger IMHO).

The following patch does that.
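
The rule can be sketched as a small predicate.  This is a hypothetical
model, assuming the second argument is a known constant; the real code
instead asks recursively whether the second argument is known to be
non-negative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: ARG1 is NULL for __builtin_clz* and one-argument
   .CLZ, otherwise it points at the value returned for a zero input.  */
static bool
clz_result_nonnegative_p (const int *arg1)
{
  if (arg1)
    return *arg1 >= 0;   /* The result can be *ARG1, so that decides it.  */
  return true;           /* Zero input is UB; result is in [0, prec-1].  */
}

/* Usage: only a negative value at zero makes the answer false.  */
static void
check_nonnegative (void)
{
  int minus_one = -1, prec = 32;
  assert (clz_result_nonnegative_p (NULL));
  assert (clz_result_nonnegative_p (&prec));
  assert (!clz_result_nonnegative_p (&minus_one));
}
```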

2024-06-04  Jakub Jelinek  

PR tree-optimization/115337
* fold-const.cc (tree_call_nonnegative_warnv_p) <CASE_CFN_CLZ>:
If arg1 is non-NULL, RECURSE on it, otherwise return true.

* gcc.dg/bitint-106.c: New test.

Diff:
---
 gcc/fold-const.cc |  6 +-
 gcc/testsuite/gcc.dg/bitint-106.c | 29 +
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 117a816fec6..65ce03d572f 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -15241,7 +15241,6 @@ tree_call_nonnegative_warnv_p (tree type, combined_fn fn, tree arg0, tree arg1,
 CASE_CFN_FFS:
 CASE_CFN_PARITY:
 CASE_CFN_POPCOUNT:
-CASE_CFN_CLZ:
 CASE_CFN_CLRSB:
 case CFN_BUILT_IN_BSWAP16:
 case CFN_BUILT_IN_BSWAP32:
@@ -15250,6 +15249,11 @@ tree_call_nonnegative_warnv_p (tree type, combined_fn fn, tree arg0, tree arg1,
   /* Always true.  */
   return true;
 
+CASE_CFN_CLZ:
+  if (arg1)
+   return RECURSE (arg1);
+  return true;
+
 CASE_CFN_SQRT:
 CASE_CFN_SQRT_FN:
   /* sqrt(-0.0) is -0.0.  */
diff --git a/gcc/testsuite/gcc.dg/bitint-106.c b/gcc/testsuite/gcc.dg/bitint-106.c
new file mode 100644
index 00000000000..a36e8836690
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/bitint-106.c
@@ -0,0 +1,29 @@
+/* PR tree-optimization/115337 */
+/* { dg-do run { target bitint } } */
+/* { dg-options "-O2" } */
+
+#if __BITINT_MAXWIDTH__ >= 129
+#define N 128
+#else
+#define N 63
+#endif
+
+_BitInt (N) g;
+int c;
+
+void
+foo (unsigned _BitInt (N + 1) z, _BitInt (N) *ret)
+{
+  c = __builtin_stdc_first_leading_one (z << N);
+  _BitInt (N) y = *(_BitInt (N) *) __builtin_memset (&g, c, 5);
+  *ret = y;
+}
+
+int
+main ()
+{
+  _BitInt (N) x;
+  foo (0, &x);
+  if (c || g || x)
+__builtin_abort ();
+}


[gcc r15-1012] fold-const, gimple-fold: Some formatting cleanups

2024-06-04 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:7be37a9bd40862e6a4686105cacf22d393258848

commit r15-1012-g7be37a9bd40862e6a4686105cacf22d393258848
Author: Jakub Jelinek 
Date:   Tue Jun 4 15:51:31 2024 +0200

fold-const, gimple-fold: Some formatting cleanups

While looking into PR115337, I've spotted some badly formatted code,
which the following patch fixes.

2024-06-04  Jakub Jelinek  

* fold-const.cc (tree_call_nonnegative_warnv_p): Formatting fixes.
(tree_invalid_nonnegative_warnv_p): Likewise.
* gimple-fold.cc (gimple_call_nonnegative_warnv_p): Likewise.

Diff:
---
 gcc/fold-const.cc  | 8 
 gcc/gimple-fold.cc | 8 
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 65ce03d572f..048c654c848 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -15331,8 +15331,8 @@ tree_call_nonnegative_warnv_p (tree type, combined_fn fn, tree arg0, tree arg1,
 non-negative if both operands are non-negative.  In the presence
 of qNaNs, we're non-negative if either operand is non-negative
 and can't be a qNaN, or if both operands are non-negative.  */
-  if (tree_expr_maybe_signaling_nan_p (arg0) ||
- tree_expr_maybe_signaling_nan_p (arg1))
+  if (tree_expr_maybe_signaling_nan_p (arg0)
+ || tree_expr_maybe_signaling_nan_p (arg1))
 return RECURSE (arg0) && RECURSE (arg1);
   return RECURSE (arg0) ? (!tree_expr_maybe_nan_p (arg0)
   || RECURSE (arg1))
@@ -15431,8 +15431,8 @@ tree_invalid_nonnegative_warnv_p (tree t, bool *strict_overflow_p, int depth)
 
 case CALL_EXPR:
   {
-   tree arg0 = call_expr_nargs (t) > 0 ?  CALL_EXPR_ARG (t, 0) : NULL_TREE;
-   tree arg1 = call_expr_nargs (t) > 1 ?  CALL_EXPR_ARG (t, 1) : NULL_TREE;
+   tree arg0 = call_expr_nargs (t) > 0 ? CALL_EXPR_ARG (t, 0) : NULL_TREE;
+   tree arg1 = call_expr_nargs (t) > 1 ? CALL_EXPR_ARG (t, 1) : NULL_TREE;
 
return tree_call_nonnegative_warnv_p (TREE_TYPE (t),
  get_call_combined_fn (t),
diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index c33583cf3ee..7c534d56bf1 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -9334,10 +9334,10 @@ static bool
 gimple_call_nonnegative_warnv_p (gimple *stmt, bool *strict_overflow_p,
 int depth)
 {
-  tree arg0 = gimple_call_num_args (stmt) > 0 ?
-gimple_call_arg (stmt, 0) : NULL_TREE;
-  tree arg1 = gimple_call_num_args (stmt) > 1 ?
-gimple_call_arg (stmt, 1) : NULL_TREE;
+  tree arg0
+= gimple_call_num_args (stmt) > 0 ? gimple_call_arg (stmt, 0) : NULL_TREE;
+  tree arg1
+= gimple_call_num_args (stmt) > 1 ? gimple_call_arg (stmt, 1) : NULL_TREE;
   tree lhs = gimple_call_lhs (stmt);
   return (lhs
  && tree_call_nonnegative_warnv_p (TREE_TYPE (lhs),


[PATCH] fold-const: Handle CTZ like CLZ in tree_call_nonnegative_warnv_p [PR115337]

2024-06-04 Thread Jakub Jelinek
Hi!

I think we can handle CTZ exactly like CLZ in tree_call_nonnegative_warnv_p.
Like CLZ, if it is UB at zero, the result range is [0, prec-1] and if it is
well defined at zero, the second argument provides the value at zero.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-06-04  Jakub Jelinek  

PR tree-optimization/115337
* fold-const.cc (tree_call_nonnegative_warnv_p): Handle
CASE_CFN_CTZ like CASE_CFN_CLZ.

--- gcc/fold-const.cc.jj   2024-06-04 12:08:14.671262211 +0200
+++ gcc/fold-const.cc   2024-06-04 10:56:57.575425348 +0200
@@ -15250,6 +15250,7 @@ tree_call_nonnegative_warnv_p (tree type
   return true;
 
 CASE_CFN_CLZ:
+CASE_CFN_CTZ:
   if (arg1)
return RECURSE (arg1);
   return true;

Jakub



[PATCH] ranger: Improve CLZ fold_range [PR115337]

2024-06-04 Thread Jakub Jelinek
Hi!

cfn_ctz::fold_range includes special cases for the case where .CTZ has
two arguments and so is well defined at zero, and the second argument is
equal to prec or -1, but cfn_clz::fold_range does that only for the prec
case.  -1 is fairly common as well though, because the <stdbit.h> builtins
do use it now, so I think it is worth special casing that.
If we don't know anything about the argument, the difference for
.CLZ (arg, -1) is that previously the result was varying, now it will be
[-1, prec-1].  If we knew arg can't be zero, it used to be optimized before
as well into e.g. [0, prec-1] or similar.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-06-04  Jakub Jelinek  

PR tree-optimization/115337
* gimple-range-op.cc (cfn_clz::fold_range): For
m_gimple_call_internal_p handle as a special case also second argument
of -1 next to prec.

--- gcc/gimple-range-op.cc.jj   2024-05-21 10:19:34.736524824 +0200
+++ gcc/gimple-range-op.cc  2024-06-04 11:53:35.190005093 +0200
@@ -941,8 +941,10 @@ cfn_clz::fold_range (irange &r, tree typ
   int maxi = prec - 1;
   if (m_gimple_call_internal_p)
 {
-  // Only handle the single common value.
-  if (rh.lower_bound () == prec)
+  // Handle only the two common values.
+  if (rh.lower_bound () == -1)
+   mini = -1;
+  else if (rh.lower_bound () == prec)
maxi = prec;
   else
// Magic value to give up, unless we can prove arg is non-zero.
@@ -953,7 +955,7 @@ cfn_clz::fold_range (irange &r, tree typ
   if (wi::gt_p (lh.lower_bound (), 0, TYPE_SIGN (lh.type ())))
 {
   maxi = prec - 1 - wi::floor_log2 (lh.lower_bound ());
-  if (mini == -2)
+  if (mini < 0)
mini = 0;
 }
   else if (!range_includes_zero_p (lh))
@@ -969,11 +971,11 @@ cfn_clz::fold_range (irange &r, tree typ
   if (max == 0)
 {
   // If CLZ_DEFINED_VALUE_AT_ZERO is 2 with VALUE of prec,
-  // return [prec, prec], otherwise ignore the range.
-  if (maxi == prec)
-   mini = prec;
+  // return [prec, prec] or [-1, -1], otherwise ignore the range.
+  if (maxi == prec || mini == -1)
+   mini = maxi;
 }
-  else
+  else if (mini >= 0)
 mini = newmini;
 
   if (mini == -2)

Jakub



[PATCH] fold-const, gimple-fold: Some formatting cleanups

2024-06-04 Thread Jakub Jelinek
Hi!

While looking into PR115337, I've spotted some badly formatted code,
which the following patch fixes.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-06-04  Jakub Jelinek  

* fold-const.cc (tree_call_nonnegative_warnv_p): Formatting fixes.
(tree_invalid_nonnegative_warnv_p): Likewise.
* gimple-fold.cc (gimple_call_nonnegative_warnv_p): Likewise.

--- gcc/fold-const.cc.jj   2024-04-04 10:47:46.363287718 +0200
+++ gcc/fold-const.cc   2024-06-04 10:56:57.575425348 +0200
@@ -15331,8 +15331,8 @@ tree_call_nonnegative_warnv_p (tree type
 non-negative if both operands are non-negative.  In the presence
 of qNaNs, we're non-negative if either operand is non-negative
 and can't be a qNaN, or if both operands are non-negative.  */
-  if (tree_expr_maybe_signaling_nan_p (arg0) ||
- tree_expr_maybe_signaling_nan_p (arg1))
+  if (tree_expr_maybe_signaling_nan_p (arg0)
+ || tree_expr_maybe_signaling_nan_p (arg1))
 return RECURSE (arg0) && RECURSE (arg1);
   return RECURSE (arg0) ? (!tree_expr_maybe_nan_p (arg0)
   || RECURSE (arg1))
@@ -15431,8 +15431,8 @@ tree_invalid_nonnegative_warnv_p (tree t
 
 case CALL_EXPR:
   {
-   tree arg0 = call_expr_nargs (t) > 0 ?  CALL_EXPR_ARG (t, 0) : NULL_TREE;
-   tree arg1 = call_expr_nargs (t) > 1 ?  CALL_EXPR_ARG (t, 1) : NULL_TREE;
+   tree arg0 = call_expr_nargs (t) > 0 ? CALL_EXPR_ARG (t, 0) : NULL_TREE;
+   tree arg1 = call_expr_nargs (t) > 1 ? CALL_EXPR_ARG (t, 1) : NULL_TREE;
 
return tree_call_nonnegative_warnv_p (TREE_TYPE (t),
  get_call_combined_fn (t),
--- gcc/gimple-fold.cc.jj   2024-02-28 09:40:09.473563056 +0100
+++ gcc/gimple-fold.cc  2024-06-04 10:38:37.515145399 +0200
@@ -9334,10 +9334,10 @@ static bool
 gimple_call_nonnegative_warnv_p (gimple *stmt, bool *strict_overflow_p,
 int depth)
 {
-  tree arg0 = gimple_call_num_args (stmt) > 0 ?
-gimple_call_arg (stmt, 0) : NULL_TREE;
-  tree arg1 = gimple_call_num_args (stmt) > 1 ?
-gimple_call_arg (stmt, 1) : NULL_TREE;
+  tree arg0
+= gimple_call_num_args (stmt) > 0 ? gimple_call_arg (stmt, 0) : NULL_TREE;
+  tree arg1
+= gimple_call_num_args (stmt) > 1 ? gimple_call_arg (stmt, 1) : NULL_TREE;
   tree lhs = gimple_call_lhs (stmt);
   return (lhs
  && tree_call_nonnegative_warnv_p (TREE_TYPE (lhs),

Jakub



[PATCH] fold-const: Fix up CLZ handling in tree_call_nonnegative_warnv_p [PR115337]

2024-06-04 Thread Jakub Jelinek
Hi!

The function currently incorrectly assumes all the __builtin_clz* and .CLZ
calls have non-negative result.  That is the case of the former which is UB
on zero and has [0, prec-1] return value otherwise, and is the case of the
single argument .CLZ as well (again, UB on zero), but for two argument
.CLZ is the case only if the second argument is also nonnegative (or if we
know the argument can't be zero, but let's do that just in the ranger IMHO).

The following patch does that.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk and 14?
For 13 and earlier, we can't use the testcase and the fold-const.cc change
would need to differentiate between __builtin_clz* vs. .CLZ and in the
latter case look at CLZ_DEFINED_VALUE_AT_ZERO.

2024-06-04  Jakub Jelinek  

PR tree-optimization/115337
* fold-const.cc (tree_call_nonnegative_warnv_p) <CASE_CFN_CLZ>:
If arg1 is non-NULL, RECURSE on it, otherwise return true.

* gcc.dg/bitint-106.c: New test.

--- gcc/fold-const.cc.jj   2024-04-04 10:47:46.363287718 +0200
+++ gcc/fold-const.cc   2024-06-04 10:56:57.575425348 +0200
@@ -15241,7 +15241,6 @@ tree_call_nonnegative_warnv_p (tree type
 CASE_CFN_FFS:
 CASE_CFN_PARITY:
 CASE_CFN_POPCOUNT:
-CASE_CFN_CLZ:
 CASE_CFN_CLRSB:
 case CFN_BUILT_IN_BSWAP16:
 case CFN_BUILT_IN_BSWAP32:
@@ -15250,6 +15249,11 @@ tree_call_nonnegative_warnv_p (tree type
   /* Always true.  */
   return true;
 
+CASE_CFN_CLZ:
+  if (arg1)
+   return RECURSE (arg1);
+  return true;
+
 CASE_CFN_SQRT:
 CASE_CFN_SQRT_FN:
   /* sqrt(-0.0) is -0.0.  */
--- gcc/testsuite/gcc.dg/bitint-106.c.jj   2024-06-04 12:00:59.017079094 +0200
+++ gcc/testsuite/gcc.dg/bitint-106.c   2024-06-04 12:00:41.975306632 +0200
@@ -0,0 +1,29 @@
+/* PR tree-optimization/115337 */
+/* { dg-do run { target bitint } } */
+/* { dg-options "-O2" } */
+
+#if __BITINT_MAXWIDTH__ >= 129
+#define N 128
+#else
+#define N 63
+#endif
+
+_BitInt (N) g;
+int c;
+
+void
+foo (unsigned _BitInt (N + 1) z, _BitInt (N) *ret)
+{
+  c = __builtin_stdc_first_leading_one (z << N);
+  _BitInt (N) y = *(_BitInt (N) *) __builtin_memset (&g, c, 5);
+  *ret = y;
+}
+
+int
+main ()
+{
+  _BitInt (N) x;
+  foo (0, &x);
+  if (c || g || x)
+__builtin_abort ();
+}

Jakub



Re: [PATCH] Implement -fassume-sane-operator-new [PR110137]

2024-06-04 Thread Jakub Jelinek
On Wed, May 29, 2024 at 04:09:08AM +, user202...@protonmail.com wrote:
> This patch implements the flag -fassume-sane-operator-new as suggested in 
> PR110137. When the flag is enabled, it is assumed that operator new does not 
> modify global memory.
> 
> While this patch is not powerful enough to handle the original issue in 
> PR110035, it allows the optimizer to handle some simpler case (e.g. load from 
> global memory with fixed address), as demonstrated in the test 
> sane-operator-new-1.C.
> 
> To handle the original issue in PR110035, some other improvement to the 
> optimizer is needed, which will be sent as subsequent patches.
> 
> Bootstrapped and regression tested on x86_64-pc-linux-gnu.

> From 14a8604907c89838577ff8560df9a3f9dc2d8afb Mon Sep 17 00:00:00 2001
> From: user202729 
> Date: Fri, 24 May 2024 17:40:55 +0800
> Subject: [PATCH] Implement -fassume-sane-operator-new [PR110137]
> 
>   PR c++/110137
> 
> gcc/c-family/ChangeLog:
> 
>   * c.opt: New option.

You need c.opt (fassume-sane-operator-new): New option.

> gcc/ChangeLog:
> 
>   * ira.cc (is_call_operator_new_p): New function.
>   (may_modify_memory_p): Likewise.
>   (validate_equiv_mem): Modify to use may_modify_memory_p.

The patch doesn't update doc/invoke.texi with the description of
what the option does, that is essential.

> +fassume-sane-operator-new
> +C++ Optimization Var(flag_assume_sane_operator_new)
> +Assume operator new does not have any side effect other than the allocation.

Is it just about operator new and not about operator delete as well in
clang?
Is it about all operator new or just the replaceable ones (standard ones in
global scope, those also have DECL_IS_REPLACEABLE_OPERATOR flag on them)?
Depending on this, if the flag is about only replaceable ones, I think it is
a global property, so for LTO it should be merged as if there is a single TU
which uses this flag, it is set for the whole LTO compilation (or should it
be only for TUs with that flag which actually use such operator new calls?).
If it is all operators new, then it is a local property in each function (or
even better a property of the operators actually) and we should track
somewhere in cfun whether a function compiled with that flag calls operator
new and whether a function compiled without that flag calls operator new.
Then e.g. during inlining merge it, such that if both the functions invoke
operator new and they disagree on whether it is sane or not, the non-sane
case wins.
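
As a rough sketch of that merging rule for the per-function case, with
an entirely hypothetical three-state enum (nothing like it exists in the
posted patch):

```c
#include <assert.h>

/* Hypothetical per-function state; the ordering makes "not sane" the
   largest value so that merging is a simple maximum.  */
enum new_state
{
  NEW_NOT_CALLED,   /* Function never calls operator new.  */
  NEW_SANE,         /* Calls it, compiled with the flag.  */
  NEW_NOT_SANE      /* Calls it, compiled without the flag.  */
};

/* State of CALLER after CALLEE has been inlined into it.  */
static enum new_state
merge_new_state (enum new_state caller, enum new_state callee)
{
  return caller > callee ? caller : callee;
}

/* Usage: on disagreement the non-sane case wins.  */
static void
check_merge (void)
{
  assert (merge_new_state (NEW_SANE, NEW_NOT_SANE) == NEW_NOT_SANE);
  assert (merge_new_state (NEW_NOT_SANE, NEW_SANE) == NEW_NOT_SANE);
  assert (merge_new_state (NEW_SANE, NEW_NOT_CALLED) == NEW_SANE);
}
```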

> --- a/gcc/ira.cc
> +++ b/gcc/ira.cc

This surely is much more important to handle in the alias oracle, not just
IRA.

> @@ -3080,6 +3080,27 @@ validate_equiv_mem_from_store (rtx dest, const_rtx set ATTRIBUTE_UNUSED,
>  
>  static bool equiv_init_varies_p (rtx x);
>  
> +static bool is_call_operator_new_p (rtx_insn *insn)

Formatting, static bool on one line, is_call_... on another one.
And needs a function comment.

> +{
> +  if (!CALL_P (insn))
> +return false;
> +  tree fn = get_call_fndecl (insn);
> +  if (fn == NULL_TREE)
> +return false;
> +  return DECL_IS_OPERATOR_NEW_P (fn);
> +}
> +
> +/* Returns true if there is a possibility that INSN may modify memory.
> +   If false is returned, the compiler proved INSN never modify memory.  */
> +static bool may_modify_memory_p (rtx_insn *insn)

Again, missing newline instead of space after bool.
Not sure about the name of this function, even sane replaceable operator new
may modify memory (it actually has to), just shouldn't modify memory
the compiler cares about.

> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/sane-operator-new-1.C
> @@ -0,0 +1,12 @@
> +/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
> +/* { dg-options "-O2 -fassume-sane-operator-new" } */

If the tests are x86 specific, they should go to g++.target/i386/ directory.
But as I said earlier, it would be better to handle optimizations like that
on GIMPLE too and then you can test that say on optimized dump on all
targets.

Jakub



Re: RFC: Support for pragma clang loop interleave_count(N)

2024-06-04 Thread Jakub Jelinek
On Tue, Jun 04, 2024 at 11:58:43AM +0100, Andre Vieira (lists) wrote:
>   case annot_expr_unroll_kind:
> + case annot_expr_interleaves_kind:
> {
> - pp_string (pp, ", unroll ");
> + pp_string (pp,
> +annot_expr_unroll_kind

I think annot_expr_unroll_kind is 1 and thus always non-zero.
You want to compare the value of the operand, or just use separate
cases, they aren't that large.
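
To illustrate with a standalone sketch (the enumerators below are
hypothetical stand-ins, only the fact that the unroll kind is a non-zero
constant matters):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical enumerators standing in for the annot_expr_*_kind
   values.  */
enum annot_kind
{
  KIND_IVDEP,
  KIND_UNROLL,
  KIND_INTERLEAVES
};

/* Mirrors the quoted code: the condition is the enumerator itself, so
   the first branch is always taken.  */
static const char *
label_buggy (enum annot_kind kind)
{
  (void) kind;
  return KIND_UNROLL ? ", unroll " : ", interleaves ";
}

/* Compares the operand's value against the enumerator instead.  */
static const char *
label_fixed (enum annot_kind kind)
{
  return kind == KIND_UNROLL ? ", unroll " : ", interleaves ";
}

/* Usage: only the fixed variant distinguishes the two kinds.  */
static void
check_labels (void)
{
  assert (strcmp (label_buggy (KIND_INTERLEAVES), ", unroll ") == 0);
  assert (strcmp (label_fixed (KIND_INTERLEAVES), ", interleaves ") == 0);
  assert (strcmp (label_fixed (KIND_UNROLL), ", unroll ") == 0);
}
```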

> +? ", unroll "
> +: ", interleaves ");
>   pp_decimal_int (pp,
>   (int) TREE_INT_CST_LOW (TREE_OPERAND (node, 2)));
>   break;

Jakub



[gcc r15-1009] builtins: Force SAVE_EXPR for __builtin_{add, sub, mul}_overflow and __builtin{add, sub}c [PR108789]

2024-06-04 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:b8e28381cb5c0cddfe5201faf799d8b27f5d7d6c

commit r15-1009-gb8e28381cb5c0cddfe5201faf799d8b27f5d7d6c
Author: Jakub Jelinek 
Date:   Tue Jun 4 12:28:01 2024 +0200

builtins: Force SAVE_EXPR for __builtin_{add,sub,mul}_overflow and __builtin{add,sub}c [PR108789]

The following testcase is miscompiled, because we use save_expr
on the .{ADD,SUB,MUL}_OVERFLOW call we are creating, but if the first
two operands are not INTEGER_CSTs (in that case we just fold it right away)
but are TREE_READONLY/!TREE_SIDE_EFFECTS, save_expr doesn't actually
create a SAVE_EXPR at all and so we lower it to
*arg2 = REALPART_EXPR (.ADD_OVERFLOW (arg0, arg1)), \
IMAGPART_EXPR (.ADD_OVERFLOW (arg0, arg1))
which evaluates the ifn twice and just hopes it will be CSEd back.
As *arg2 aliases *arg0, that is not the case.
The builtins are really never const/pure as they store into what
the third argument points to, so after handling the INTEGER_CST+INTEGER_CST
case, I think we should just always use SAVE_EXPR.  Just building SAVE_EXPR
by hand and setting TREE_SIDE_EFFECTS on it doesn't work, because
c_fully_fold optimizes it away again, so the following patch marks the
ifn calls as TREE_SIDE_EFFECTS (but doesn't do it for the
__builtin_{add,sub,mul}_overflow_p case which were designed for use
especially in constant expressions and don't really evaluate the
realpart side, so we don't really need a SAVE_EXPR in that case).
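
The effect of evaluating the operation twice when the result aliases an
input can be reproduced by hand.  The sketch below is a model with
made-up function names, not what the compiler generates: add_twice
re-reads its operands after the store, the way the unsaved expression
effectively did.

```c
#include <assert.h>
#include <limits.h>

/* Evaluate the addition once and take both the sum and the overflow
   flag from that single evaluation.  */
static int
add_once (unsigned *r, const unsigned *a, const unsigned *b)
{
  unsigned sum;
  int ovf = __builtin_add_overflow (*a, *b, &sum);
  *r = sum;
  return ovf;
}

/* Evaluate it twice: the second evaluation re-reads *A and *B after
   *R has been stored.  */
static int
add_twice (unsigned *r, const unsigned *a, const unsigned *b)
{
  unsigned sum;
  (void) __builtin_add_overflow (*a, *b, &sum);
  *r = sum;
  return __builtin_add_overflow (*a, *b, &sum);
}

/* Usage: with all three pointers aliased, a quarter of the unsigned
   range added to itself does not overflow, but the double evaluation
   reports that it does.  */
static void
check_add (void)
{
  unsigned x = UINT_MAX / 4 + 1;
  assert (add_once (&x, &x, &x) == 0);
  assert (x == UINT_MAX / 2 + 1);

  x = UINT_MAX / 4 + 1;
  assert (add_twice (&x, &x, &x) == 1);
}
```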

2024-06-04  Jakub Jelinek  

PR middle-end/108789
* builtins.cc (fold_builtin_arith_overflow): For ovf_only,
don't call save_expr and don't build REALPART_EXPR, otherwise
set TREE_SIDE_EFFECTS on call before calling save_expr.
(fold_builtin_addc_subc): Set TREE_SIDE_EFFECTS on call before
calling save_expr.

* gcc.c-torture/execute/pr108789.c: New test.

Diff:
---
 gcc/builtins.cc| 22 ++-
 gcc/testsuite/gcc.c-torture/execute/pr108789.c | 39 ++
 2 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 00ee9eb2925..5b5307c67b8 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -10042,7 +10042,21 @@ fold_builtin_arith_overflow (location_t loc, enum built_in_function fcode,
   tree ctype = build_complex_type (type);
   tree call = build_call_expr_internal_loc (loc, ifn, ctype, 2,
arg0, arg1);
-  tree tgt = save_expr (call);
+  tree tgt;
+  if (ovf_only)
+   {
+ tgt = call;
+ intres = NULL_TREE;
+   }
+  else
+   {
+ /* Force SAVE_EXPR even for calls which satisfy tree_invariant_p_1,
+as while the call itself is const, the REALPART_EXPR store is
+certainly not.  And in any case, we want just one call,
+not multiple and trying to CSE them later.  */
+ TREE_SIDE_EFFECTS (call) = 1;
+ tgt = save_expr (call);
+   }
   intres = build1_loc (loc, REALPART_EXPR, type, tgt);
   ovfres = build1_loc (loc, IMAGPART_EXPR, type, tgt);
   ovfres = fold_convert_loc (loc, boolean_type_node, ovfres);
@@ -10354,11 +10368,17 @@ fold_builtin_addc_subc (location_t loc, enum built_in_function fcode,
   tree ctype = build_complex_type (type);
   tree call = build_call_expr_internal_loc (loc, ifn, ctype, 2,
args[0], args[1]);
+  /* Force SAVE_EXPR even for calls which satisfy tree_invariant_p_1,
+ as while the call itself is const, the REALPART_EXPR store is
+ certainly not.  And in any case, we want just one call,
+ not multiple and trying to CSE them later.  */
+  TREE_SIDE_EFFECTS (call) = 1;
   tree tgt = save_expr (call);
   tree intres = build1_loc (loc, REALPART_EXPR, type, tgt);
   tree ovfres = build1_loc (loc, IMAGPART_EXPR, type, tgt);
   call = build_call_expr_internal_loc (loc, ifn, ctype, 2,
   intres, args[2]);
+  TREE_SIDE_EFFECTS (call) = 1;
   tgt = save_expr (call);
   intres = build1_loc (loc, REALPART_EXPR, type, tgt);
   tree ovfres2 = build1_loc (loc, IMAGPART_EXPR, type, tgt);
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr108789.c b/gcc/testsuite/gcc.c-torture/execute/pr108789.c
new file mode 100644
index 00000000000..32ee19be1c4
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr108789.c
@@ -0,0 +1,39 @@
+/* PR middle-end/108789 */
+
+int
+add (unsigned *r, const unsigned *a, const unsigned *b)
+{
+  return __builtin_add_overflow (*a, *b, r);
+}
+
+int
+mul (unsigned *r, const unsigned *a, const unsigned *b)
+{
+  return __builtin_mul_overflow (*a, *b, r);
+}
+
+int
+main ()
+{
+  unsigned x;
+
+  /* 1073741824U + 1073741824U should not overflow.  */
+  x = (__INT_MAX__ + 1U) / 2;
+  if (add (&x, &x, &x))
+__builtin_abort ();
+
+  /* 256U

[gcc r15-1008] invoke.texi: Clarify -march=lujiazui

2024-06-04 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:09b4ab53155ea16e1fb12c2afcd9b6fe29a31c74

commit r15-1008-g09b4ab53155ea16e1fb12c2afcd9b6fe29a31c74
Author: Jakub Jelinek 
Date:   Tue Jun 4 12:20:13 2024 +0200

invoke.texi: Clarify -march=lujiazui

I was recently searching which exact CPUs are affected by the PR114576
wrong-code issue and went from the PTA_* bitmasks in GCC, so arrived
at the goldmont, goldmont-plus, tremont and lujiazui CPUs (as -march=
cases which do enable -maes and don't enable -mavx).
But when double-checking that against the invoke.texi documentation,
that was true for the first 3, but lujiazui said it supported AVX.
I was really confused by that, until I found the
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604407.html
explanation.  So, seems the CPUs do have AVX and F16C but -march=lujiazui
doesn't enable those and even actively attempts to filter those out from
the announced CPUID features, in glibc as well as e.g. in libgcc.

Thus, I think we should document what actually happens, otherwise
users could assume that
gcc -march=lujiazui predefines __AVX__ and __F16C__, which it doesn't.
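
One way to see what a given -march= value actually enables is to test the
predefined macros in a translation unit.  The following is only a minimal
sketch; the helper names have_avx_macro and have_f16c_macro are made up for
illustration, only __AVX__ and __F16C__ come from the text above.

```c
/* Report which ISA macros the compiler predefined for this translation
   unit.  The function names are hypothetical; only the __AVX__ and
   __F16C__ macros are taken from the commit message.  */
static int
have_avx_macro (void)
{
#ifdef __AVX__
  return 1;
#else
  return 0;
#endif
}

static int
have_f16c_macro (void)
{
#ifdef __F16C__
  return 1;
#else
  return 0;
#endif
}
```

Going by the documentation change, both helpers would be expected to return 0
when compiled with -march=lujiazui and no explicit -mavx or -mf16c.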

2024-06-04  Jakub Jelinek  

* doc/invoke.texi (lujiazui): Clarify that while the CPUs do support
AVX and F16C, -march=lujiazui actually doesn't enable those.

Diff:
---
 gcc/doc/invoke.texi | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 45115b5fbed..4e8967fd8ab 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -34808,8 +34808,10 @@ instruction set support.
 
 @item lujiazui
 ZHAOXIN lujiazui CPU with x86-64, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1,
-SSE4.2, AVX, POPCNT, AES, PCLMUL, RDRND, XSAVE, XSAVEOPT, FSGSBASE, CX16,
-ABM, BMI, BMI2, F16C, FXSR, RDSEED instruction set support.
+SSE4.2, POPCNT, AES, PCLMUL, RDRND, XSAVE, XSAVEOPT, FSGSBASE, CX16,
+ABM, BMI, BMI2, FXSR, RDSEED instruction set support.  While the CPUs
+do support AVX and F16C, these aren't enabled by @code{-march=lujiazui}
+for performance reasons.
 
 @item yongfeng
 ZHAOXIN yongfeng CPU with x86-64, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1,


[PATCH] rs6000: Decrease code size of rs6000_init_generated_builtins [PR115324]

2024-06-04 Thread Jakub Jelinek
On Mon, Jun 03, 2024 at 03:40:38PM -0500, Segher Boessenkool wrote:
> > So, either we'd need to add some further GTY extensions, or the following
> > patch instead reworks it such that the fntype members which were the only
> > reason for PCH in those arrays are moved to separate arrays.
> 
> And that just sidesteps the limitation in PCH?

Yes.  But at the same time it decreases the sizes of the data sections and
decreases the size of the data written to/from PCH files, so I think it is a
win.

> >  void
> >  rs6000_init_generated_builtins ()
> >  {
> > +  bifdata *rs6000_builtin_info_p;
> > +  tree *rs6000_builtin_info_fntype_p;
> > +  ovlddata *rs6000_instance_info_p;
> > +  tree *rs6000_instance_info_fntype_p;
> > +  ovldrecord *rs6000_overload_info_p;
> > +  __asm ("" : "=r" (rs6000_builtin_info_p) : "0" (rs6000_builtin_info));
> 
> Bah.
> 
> It should not be called _p of course, it is not a predicate.  And
> relying on the operand tie to not have to do a much more obvious
> assignment, please don't.  Just *do* write assignments, and then use
> a simple "+r"?
> 
> But you call this a hack anyway, you wouldn't propose to actually
> include this patch :-)

It was a quick hack just to see why the size grew that much.
Ideally some optimization would figure out we have a single function which has
    461   rs6000_overload_info
   1257   rs6000_builtin_info_fntype
   1768   rs6000_builtin_decls
   2548   rs6000_instance_info_fntype
array references and that maybe it might be a good idea to just preload
the addresses of those arrays into some register if it decreases code size
and doesn't slow things down.
The function actually is called just once and is huge, so code size is even
more important than speed, which is dominated by all the GC allocations
anyway.

Until that is done, here is a slightly cleaner version of the hack, which
makes the function noipa (so that LTO doesn't undo it) for GCC 8.1+ and
passes the 4 arrays as arguments to the function from the caller.
This decreases the function size from 228668 bytes to 207572 bytes.
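
To illustrate the shape of that change outside of the generator, here is a
small sketch.  It is not the generated code: NOIPA, N_DECLS, the storage
arrays and the function names are all hypothetical, and the version guard is
just one assumed way to apply noipa only where GCC 8.1+ understands it.

```c
#include <stddef.h>

#define N_DECLS 4   /* hypothetical table size */

/* Stand-ins for the global arrays the real function fills in.  */
static int builtin_decls_storage[N_DECLS];
static int overload_info_storage[N_DECLS];

/* noipa is only understood by GCC 8.1 and later; expand to nothing
   for other compilers (assumed guard, not the one from the patch).  */
#if defined (__GNUC__) && !defined (__clang__) \
    && (__GNUC__ > 8 || (__GNUC__ == 8 && __GNUC_MINOR__ >= 1))
# define NOIPA __attribute__ ((__noipa__))
#else
# define NOIPA
#endif

/* The array addresses arrive as arguments, so every store in the body
   can use a pointer that already lives in a register.  noipa keeps the
   optimizers from propagating the global addresses back in.  */
static NOIPA void
init_generated (int *builtin_decls, int *overload_info)
{
  for (size_t i = 0; i < N_DECLS; i++)
    {
      builtin_decls[i] = (int) i + 1;
      overload_info[i] = (int) i * 2;
    }
}

/* The single caller passes the addresses of the global arrays.  */
static void
init_all (void)
{
  init_generated (builtin_decls_storage, overload_info_storage);
}
```

Whether this saves anything depends on the target; the figures quoted above
are for powerpc64le, where each reference to a global array otherwise needs
its own address computation.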

Bootstrapped/regtested on powerpc64le-linux, ok for trunk?

2024-06-04  Jakub Jelinek  

PR target/115324
* config/rs6000/rs6000-gen-builtins.cc (write_decls): Change
declaration of rs6000_init_generated_builtins from no arguments
to 4 pointer arguments.
(write_init_bif_table): Change rs6000_builtin_info_fntype to
builtin_info_fntype and rs6000_builtin_decls to builtin_decls.
(write_init_ovld_table): Change rs6000_instance_info_fntype to
instance_info_fntype, rs6000_builtin_decls to builtin_decls and
rs6000_overload_info to overload_info.
(write_init_file): Add __noipa__ attribute to
rs6000_init_generated_builtins for GCC 8.1+ and change the function
from no arguments to 4 pointer arguments.  Change rs6000_builtin_decls
to builtin_decls.
* config/rs6000/rs6000-builtin.cc (rs6000_init_builtins): Adjust
rs6000_init_generated_builtins caller.

--- gcc/config/rs6000/rs6000-gen-builtins.cc.jj 2024-06-03 23:11:02.662631144 +0200
+++ gcc/config/rs6000/rs6000-gen-builtins.cc    2024-06-03 23:38:31.727620920 +0200
@@ -2376,7 +2376,10 @@ write_decls (void)
   "rs6000_instance_info_fntype[RS6000_INST_MAX];\n");
   fprintf (header_file, "extern ovldrecord rs6000_overload_info[];\n\n");
 
-  fprintf (header_file, "extern void rs6000_init_generated_builtins ();\n\n");
+  fprintf (header_file,
+  "extern void rs6000_init_generated_builtins (tree *, tree *,\n");
+  fprintf (header_file,
+  "\t\t\t\t\tovldrecord *, tree *);\n\n");
   fprintf (header_file,
   "extern bool rs6000_builtin_is_supported (rs6000_gen_builtins);\n");
   fprintf (header_file,
@@ -2651,7 +2654,7 @@ write_init_bif_table (void)
   for (int i = 0; i <= curr_bif; i++)
 {
   fprintf (init_file,
-  "  rs6000_builtin_info_fntype[RS6000_BIF_%s]"
+  "  builtin_info_fntype[RS6000_BIF_%s]"
   "\n= %s;\n",
   bifs[i].idname, bifs[i].fndecl);
 
@@ -2678,7 +2681,7 @@ write_init_bif_table (void)
}
 
   fprintf (init_file,
-  "  rs6000_builtin_decls[(int)RS6000_BIF_%s] = t\n",
+  "  builtin_decls[(int)RS6000_BIF_%s] = t\n",
   bifs[i].idname);
   fprintf (init_file,
   "= add_builtin_function (\"%s\",\n",
@@ -2719,7 +2722,7 @@ write_init_bif_table (void)
  fprintf (init_file, "}\n");
  fprintf (init_file, "  else\n");
  fprintf (init_file, "{\n");
- fprintf (init_file, "  rs6000_b

[PATCH] c: Fix up pointer types to may_alias structures [PR114493]

2024-06-04 Thread Jakub Jelinek
Hi!

The following testcase ICEs in ipa-free-lang, because the
fld_incomplete_type_of
  gcc_assert (TYPE_CANONICAL (t2) != t2
  && TYPE_CANONICAL (t2) == TYPE_CANONICAL (TREE_TYPE (t)));
assertion doesn't hold.
This is because t is a struct S * type which was created while struct S
was still incomplete and without the may_alias attribute (and TYPE_CANONICAL
of a pointer type is a type created with can_alias_all = false argument),
while later on the struct definition the may_alias attribute was used.
fld_incomplete_type_of then creates an incomplete distinct copy of the
structure (but with the original attributes) but pointers created for it
are because of the "may_alias" attribute TYPE_REF_CAN_ALIAS_ALL, including
their TYPE_CANONICAL, because while that is created with !can_alias_all
argument, we later set it because of the "may_alias" attribute on the
to_type.

This doesn't ICE with C++ since PR70512 fix because the C++ FE sets
TYPE_REF_CAN_ALIAS_ALL on all pointer types to the class type (and its
variants) when the may_alias is added.

The following patch does that in the C FE as well.
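
For readers less familiar with the attribute, a minimal sketch of the
declaration order involved (pointer type created first, may_alias added at
the definition) could look as follows.  The function names read_bits and
float_bits are hypothetical, and the sketch assumes float and unsigned have
the same size and compatible alignment.

```c
struct S;                          /* incomplete, no attribute yet */
unsigned read_bits (struct S *);   /* the struct S * type is created here */

/* The attribute only shows up at the definition.  */
struct __attribute__ ((__may_alias__)) S
{
  unsigned u;
};

_Static_assert (sizeof (float) == sizeof (unsigned),
		"sketch assumes float and unsigned have the same size");

/* Access through the pointer type that predates the attribute.  */
unsigned
read_bits (struct S *p)
{
  return p->u;
}

/* may_alias is what permits looking at a float through struct S.  */
unsigned
float_bits (float *f)
{
  return read_bits ((struct S *) f);
}
```

This is the same ordering as in the new testcases; it only illustrates the
user-visible side and says nothing about the internal TYPE_CANONICAL
bookkeeping.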

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk and
release branches?

2024-06-04  Jakub Jelinek  

PR c/114493
* c-decl.cc (c_fixup_may_alias): New function.
(finish_struct): Call it if "may_alias" attribute is
specified.

* gcc.dg/pr114493-1.c: New test.
* gcc.dg/pr114493-2.c: New test.

--- gcc/c/c-decl.cc.jj  2024-05-07 08:47:35.974836903 +0200
+++ gcc/c/c-decl.cc 2024-06-03 19:55:53.819586291 +0200
@@ -9446,6 +9446,17 @@ verify_counted_by_attribute (tree struct
   return;
 }
 
+/* TYPE is a struct or union that we're applying may_alias to after the body is
+   parsed.  Fixup any POINTER_TO types.  */
+
+static void
+c_fixup_may_alias (tree type)
+{
+  for (tree t = TYPE_POINTER_TO (type); t; t = TYPE_NEXT_PTR_TO (t))
+    for (tree v = TYPE_MAIN_VARIANT (t); v; v = TYPE_NEXT_VARIANT (v))
+      TYPE_REF_CAN_ALIAS_ALL (v) = true;
+}
+
 /* Fill in the fields of a RECORD_TYPE or UNION_TYPE node, T.
LOC is the location of the RECORD_TYPE or UNION_TYPE's definition.
FIELDLIST is a chain of FIELD_DECL nodes for the fields.
@@ -9791,6 +9802,10 @@ finish_struct (location_t loc, tree t, t
 
   C_TYPE_BEING_DEFINED (t) = 0;
 
+  if (lookup_attribute ("may_alias", TYPE_ATTRIBUTES (t)))
+    for (x = TYPE_MAIN_VARIANT (t); x; x = TYPE_NEXT_VARIANT (x))
+      c_fixup_may_alias (x);
+
   /* Set type canonical based on equivalence class.  */
   if (flag_isoc23 && !C_TYPE_VARIABLE_SIZE (t))
 {
--- gcc/testsuite/gcc.dg/pr114493-1.c.jj   2024-06-03 19:59:58.774336785 +0200
+++ gcc/testsuite/gcc.dg/pr114493-1.c      2024-06-03 19:59:12.931944923 +0200
@@ -0,0 +1,19 @@
+/* PR c/114493 */
+/* { dg-do compile { target lto } } */
+/* { dg-options "-O2 -flto" } */
+
+void foo (void);
+struct S;
+struct S bar (struct S **);
+struct S qux (const struct S **);
+
+struct __attribute__((__may_alias__)) S {
+  int s;
+};
+
+struct S
+baz (void)
+{
+  foo ();
+  return (struct S) {};
+}
--- gcc/testsuite/gcc.dg/pr114493-2.c.jj   2024-06-03 19:59:58.774336785 +0200
+++ gcc/testsuite/gcc.dg/pr114493-2.c      2024-06-03 20:01:00.886512830 +0200
@@ -0,0 +1,26 @@
+/* PR c/114493 */
+/* { dg-do compile { target lto } } */
+/* { dg-options "-O2 -flto -std=c23" } */
+
+void foo (void);
+struct S;
+struct S bar (struct S **);
+struct S qux (const struct S **);
+
+void
+corge (void)
+{
+  struct S { int s; } s;
+  s.s = 0;
+}
+
+struct __attribute__((__may_alias__)) S {
+  int s;
+};
+
+struct S
+baz (void)
+{
+  foo ();
+  return (struct S) {};
+}

Jakub



[PATCH] builtins: Force SAVE_EXPR for __builtin_{add,sub,mul}_overflow and __builtin{add,sub}c [PR108789]

2024-06-04 Thread Jakub Jelinek
Hi!

The following testcase is miscompiled, because we use save_expr
on the .{ADD,SUB,MUL}_OVERFLOW call we are creating, but if the first
two operands are not INTEGER_CSTs (in that case we just fold it right away)
but are TREE_READONLY/!TREE_SIDE_EFFECTS, save_expr doesn't actually
create a SAVE_EXPR at all and so we lower it to
*arg2 = REALPART_EXPR (.ADD_OVERFLOW (arg0, arg1)), \
IMAGPART_EXPR (.ADD_OVERFLOW (arg0, arg1))
which evaluates the ifn twice and just hopes it will be CSEd back.
As *arg2 aliases *arg0, that is not the case.
The builtins are really never const/pure as they store into what
the third arguments points to, so after handling the INTEGER_CST+INTEGER_CST
case, I think we should just always use SAVE_EXPR.  Just building SAVE_EXPR
by hand and setting TREE_SIDE_EFFECTS on it doesn't work, because
c_fully_fold optimizes it away again, so the following patch marks the
ifn calls as TREE_SIDE_EFFECTS (but doesn't do it for the
__builtin_{add,sub,mul}_overflow_p cases, which were designed for use
especially in constant expressions and don't really evaluate the
realpart side, so we don't really need a SAVE_EXPR in that case).
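
The effect of the double evaluation can be modelled by hand.  The sketch
below is not what the compiler emits; add_once and add_twice are
hypothetical names, and add_twice merely imitates the lowering described
above by calling the builtin twice.

```c
#include <limits.h>

/* Single evaluation.  The operands are read into locals first, so the
   result cannot depend on whether R aliases A or B.  */
static int
add_once (unsigned *r, const unsigned *a, const unsigned *b)
{
  unsigned av = *a, bv = *b;
  return __builtin_add_overflow (av, bv, r);
}

/* Hand-written model of the bad lowering: the operation is evaluated
   twice, once for the stored result and once for the overflow flag.  */
static int
add_twice (unsigned *r, const unsigned *a, const unsigned *b)
{
  unsigned scratch;
  /* First evaluation: plays the role of the REALPART_EXPR store to *r.  */
  (void) __builtin_add_overflow (*a, *b, r);
  /* Second evaluation: plays the role of the IMAGPART_EXPR.  If R
     aliases A or B, it reads the operand that was just overwritten.  */
  return __builtin_add_overflow (*a, *b, &scratch);
}
```

With x = UINT_MAX / 4 + 1 and all three pointers equal to &x, add_once
reports no overflow, while add_twice doubles x in the first step and then
reports an overflow for the second addition.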

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-06-04  Jakub Jelinek  

PR middle-end/108789
* builtins.cc (fold_builtin_arith_overflow): For ovf_only,
don't call save_expr and don't build REALPART_EXPR, otherwise
set TREE_SIDE_EFFECTS on call before calling save_expr.
(fold_builtin_addc_subc): Set TREE_SIDE_EFFECTS on call before
calling save_expr.

* gcc.c-torture/execute/pr108789.c: New test.

--- gcc/builtins.cc.jj  2024-04-05 09:19:47.899050410 +0200
+++ gcc/builtins.cc 2024-06-03 17:27:11.071693074 +0200
@@ -10042,7 +10042,21 @@ fold_builtin_arith_overflow (location_t
   tree ctype = build_complex_type (type);
   tree call = build_call_expr_internal_loc (loc, ifn, ctype, 2,
arg0, arg1);
-  tree tgt = save_expr (call);
+  tree tgt;
+  if (ovf_only)
+   {
+ tgt = call;
+ intres = NULL_TREE;
+   }
+  else
+   {
+ /* Force SAVE_EXPR even for calls which satisfy tree_invariant_p_1,
+as while the call itself is const, the REALPART_EXPR store is
+certainly not.  And in any case, we want just one call,
+not multiple and trying to CSE them later.  */
+ TREE_SIDE_EFFECTS (call) = 1;
+ tgt = save_expr (call);
+   }
   intres = build1_loc (loc, REALPART_EXPR, type, tgt);
   ovfres = build1_loc (loc, IMAGPART_EXPR, type, tgt);
   ovfres = fold_convert_loc (loc, boolean_type_node, ovfres);
@@ -10354,11 +10368,17 @@ fold_builtin_addc_subc (location_t loc,
   tree ctype = build_complex_type (type);
   tree call = build_call_expr_internal_loc (loc, ifn, ctype, 2,
args[0], args[1]);
+  /* Force SAVE_EXPR even for calls which satisfy tree_invariant_p_1,
+ as while the call itself is const, the REALPART_EXPR store is
+ certainly not.  And in any case, we want just one call,
+ not multiple and trying to CSE them later.  */
+  TREE_SIDE_EFFECTS (call) = 1;
   tree tgt = save_expr (call);
   tree intres = build1_loc (loc, REALPART_EXPR, type, tgt);
   tree ovfres = build1_loc (loc, IMAGPART_EXPR, type, tgt);
   call = build_call_expr_internal_loc (loc, ifn, ctype, 2,
   intres, args[2]);
+  TREE_SIDE_EFFECTS (call) = 1;
   tgt = save_expr (call);
   intres = build1_loc (loc, REALPART_EXPR, type, tgt);
   tree ovfres2 = build1_loc (loc, IMAGPART_EXPR, type, tgt);
--- gcc/testsuite/gcc.c-torture/execute/pr108789.c.jj   2024-06-03 17:15:01.143366766 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr108789.c      2024-06-03 17:12:55.189036744 +0200
@@ -0,0 +1,39 @@
+/* PR middle-end/108789 */
+
+int
+add (unsigned *r, const unsigned *a, const unsigned *b)
+{
+  return __builtin_add_overflow (*a, *b, r);
+}
+
+int
+mul (unsigned *r, const unsigned *a, const unsigned *b)
+{
+  return __builtin_mul_overflow (*a, *b, r);
+}
+
+int
+main ()
+{
+  unsigned x;
+
+  /* 1073741824U + 1073741824U should not overflow.  */
+  x = (__INT_MAX__ + 1U) / 2;
+  if (add (&x, &x, &x))
+    __builtin_abort ();
+
+  /* 256U * 256U should not overflow */
+  x = 1U << (sizeof (int) * __CHAR_BIT__ / 4);
+  if (mul (&x, &x, &x))
+    __builtin_abort ();
+
+  /* 2147483648U + 2147483648U should overflow */
+  x = __INT_MAX__ + 1U;
+  if (!add (&x, &x, &x))
+    __builtin_abort ();
+
+  /* 65536U * 65536U should overflow */
+  x = 1U << (sizeof (int) * __CHAR_BIT__ / 2);
+  if (!mul (&x, &x, &x))
+    __builtin_abort ();
+}

Jakub



[gcc r15-1001] rs6000: Fix up PCH in --enable-host-pie builds [PR115324]

2024-06-03 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:4cf2de9b5268224816a3d53fdd2c3d799ebfd9c8

commit r15-1001-g4cf2de9b5268224816a3d53fdd2c3d799ebfd9c8
Author: Jakub Jelinek 
Date:   Mon Jun 3 23:11:06 2024 +0200

rs6000: Fix up PCH in --enable-host-pie builds [PR115324]

PCH doesn't work properly in --enable-host-pie configurations on
powerpc*-linux*.
The problem is that the rs6000_builtin_info and rs6000_instance_info
arrays mix pointers to .rodata/.data (bifname and attr_string point
to string literals in .rodata section, and the next member is either NULL
or &rs6000_instance_info[XXX]) and GC member (tree fntype).
Now, for normal GC this works just fine, we emit
  {
    &rs6000_instance_info[0].fntype,
    1 * (RS6000_INST_MAX),
    sizeof (rs6000_instance_info[0]),
    &gt_ggc_mx_tree_node,
    &gt_pch_nx_tree_node
  },
  {
    &rs6000_builtin_info[0].fntype,
    1 * (RS6000_BIF_MAX),
    sizeof (rs6000_builtin_info[0]),
    &gt_ggc_mx_tree_node,
    &gt_pch_nx_tree_node
  },
GC roots which are strided and thus cover only the fntype members of all
the elements of the two arrays.
For PCH though it actually results in saving those huge arrays (one is
130832 bytes, another 81568 bytes) into the .gch files and loading them back
in full.  While the bifname and attr_string and next pointers are marked as
GTY((skip)), they are actually saved to point to the .rodata and .data
sections of the process which writes the PCH, but because cc1/cc1plus etc.
are position independent executables with --enable-host-pie, when the arrays
are loaded from the PCH file, those pointers can refer to completely different
addresses, where either nothing is mapped at all or something unrelated is.
While gengtype supports the callback option, that one is meant for
relocatable function pointers and doesn't work in the case of GTY arrays
inside of .data section anyway.

So, either we'd need to add some further GTY extensions, or the following
patch instead reworks it such that the fntype members which were the only
reason for PCH in those arrays are moved to separate arrays.
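
As a rough illustration of that rework (not the generated rs6000 code), the
sketch below splits a mixed record into a plain array and a parallel array
of GC pointers.  All names here (gc_tree, ovld_mixed, ovld_plain, N_INST,
chain_length, fntype_of) are hypothetical, and using -1 as the end-of-chain
index is an assumption made for this example only.

```c
#include <stddef.h>

typedef struct gc_node *gc_tree;   /* stand-in for GCC's GC-managed tree */

/* Before: one record mixes addresses into .rodata/.data with a GC
   pointer, so the whole array ends up in the PCH.  */
struct ovld_mixed
{
  const char *bifname;       /* points at a string literal */
  gc_tree fntype;            /* GC-managed */
  struct ovld_mixed *next;   /* points into the same array */
};

/* After: no GC member, and the chain is expressed as an index.  */
struct ovld_plain
{
  const char *bifname;
  int next;                  /* index of the next entry, -1 for none
				(assumed sentinel) */
};

#define N_INST 3

static struct ovld_plain instance_info[N_INST] = {
  { "first", 1 }, { "second", 2 }, { "third", -1 }
};

/* The only array the GC and PCH machinery would still need to see.  */
static gc_tree instance_info_fntype[N_INST];

/* Walk a chain using indices instead of pointers.  */
static int
chain_length (int start)
{
  int n = 0;
  for (int i = start; i >= 0; i = instance_info[i].next)
    n++;
  return n;
}

/* The GC member is looked up in the parallel array by the same index.  */
static gc_tree
fntype_of (int i)
{
  return instance_info_fntype[i];
}
```

The plain array still holds addresses of string literals, but since it no
longer carries a GC member it does not have to be written to the PCH file.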

Size-wise in .data sections it is (in bytes):

                             vanilla  patched
rs6000_builtin_info           130832   110704
rs6000_instance_info           81568    40784
rs6000_overload_info            7392     7392
rs6000_builtin_info_fntype         0    10064
rs6000_instance_info_fntype        0    20392
sum                           219792   189336

where previously we saved/restored for PCH those 130832+81568 bytes, now we
save/restore just 10064+20392 bytes, so this change is beneficial for the
data section size.

Unfortunately, it grows the size of the rs6000_init_generated_builtins
function, vanilla had 218328 bytes, patched has 228668.

When I applied
 void
 rs6000_init_generated_builtins ()
 {
+  bifdata *rs6000_builtin_info_p;
+  tree *rs6000_builtin_info_fntype_p;
+  ovlddata *rs6000_instance_info_p;
+  tree *rs6000_instance_info_fntype_p;
+  ovldrecord *rs6000_overload_info_p;
+  __asm ("" : "=r" (rs6000_builtin_info_p) : "0" (rs6000_builtin_info));
+  __asm ("" : "=r" (rs6000_builtin_info_fntype_p) : "0" (rs6000_builtin_info_fntype));
+  __asm ("" : "=r" (rs6000_instance_info_p) : "0" (rs6000_instance_info));
+  __asm ("" : "=r" (rs6000_instance_info_fntype_p) : "0" (rs6000_instance_info_fntype));
+  __asm ("" : "=r" (rs6000_overload_info_p) : "0" (rs6000_overload_info));
+  #define rs6000_builtin_info rs6000_builtin_info_p
+  #define rs6000_builtin_info_fntype rs6000_builtin_info_fntype_p
+  #define rs6000_instance_info rs6000_instance_info_p
+  #define rs6000_instance_info_fntype rs6000_instance_info_fntype_p
+  #define rs6000_overload_info rs6000_overload_info_p
+
hack by hand, the size of the function is 209700 though, so if really
wanted, we could add __attribute__((__noipa__)) to the function when
building with recent enough GCC and pass pointers to the first elements
of the 5 arrays to the function as arguments.  If you want such a change,
could that be done incrementally?

2024-06-03  Jakub Jelinek  

PR target/115324
* config/rs6000/rs6000-gen-builtins.cc (write_decls): Remove
GTY markup from struct bifdata and struct ovlddata and remove their
fntype members.  Change next member in struct ovlddata and
first_instance member of struct ovldrecord to have int type rather
than struct ovlddata *.  Remove GTY markup from rs6000_builtin_info
and rs6000_instance_info arrays, declare new
rs6000_builtin_in

[gcc r15-1000] c++: Fix parsing of abstract-declarator starting with ... followed by opening paren [PR115012]

2024-06-03 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:48c3e5a4f935d6b8cd9ef7c51995e3b29ceb8be7

commit r15-1000-g48c3e5a4f935d6b8cd9ef7c51995e3b29ceb8be7
Author: Jakub Jelinek 
Date:   Mon Jun 3 23:07:08 2024 +0200

c++: Fix parsing of abstract-declarator starting with ... followed by opening paren [PR115012]

The C++26 P2662R3 Pack indexing paper mentions that both GCC
and MSVC don't handle T...[10] parameter declaration when T
is a pack.  And apparently neither T...(args).
While the former will change meaning in C++26, T...(args) is still
valid even in C++26.

The following patch handles just the T...(args) case in
cp_parser_direct_declarator.

2024-06-03  Jakub Jelinek  

PR c++/115012
* parser.cc (cp_parser_direct_declarator): Handle
abstract declarator starting with ... followed by opening paren.

* g++.dg/cpp0x/variadic185.C: New test.

Diff:
---
 gcc/cp/parser.cc | 15 +--
 gcc/testsuite/g++.dg/cpp0x/variadic185.C | 43 
 2 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 779625144db..3b2ad25af9f 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -23991,7 +23991,12 @@ cp_parser_direct_declarator (cp_parser* parser,
 {
   /* Peek at the next token.  */
   token = cp_lexer_peek_token (parser->lexer);
-  if (token->type == CPP_OPEN_PAREN)
+  if (token->type == CPP_OPEN_PAREN
+ || (first
+ && dcl_kind != CP_PARSER_DECLARATOR_NAMED
+ && token->type == CPP_ELLIPSIS
+ && cxx_dialect > cxx98
+ && cp_lexer_nth_token_is (parser->lexer, 2, CPP_OPEN_PAREN)))
{
  /* This is either a parameter-declaration-clause, or a
 parenthesized declarator. When we know we are parsing a
@@ -24030,6 +24035,11 @@ cp_parser_direct_declarator (cp_parser* parser,
 
 Thus again, we try a parameter-declaration-clause, and if
 that fails, we back out and return.  */
+ bool pack_expansion_p = token->type == CPP_ELLIPSIS;
+
+ if (pack_expansion_p)
+   /* Consume the `...' */
+   cp_lexer_consume_token (parser->lexer);
 
  if (!first || dcl_kind != CP_PARSER_DECLARATOR_NAMED)
{
@@ -24173,6 +24183,7 @@ cp_parser_direct_declarator (cp_parser* parser,
 attrs,
 parens_loc);
  declarator->attributes = gnu_attrs;
+ declarator->parameter_pack_p |= pack_expansion_p;
  /* Any subsequent parameter lists are to do with
 return type, so are not those of the declared
 function.  */
@@ -24196,7 +24207,7 @@ cp_parser_direct_declarator (cp_parser* parser,
 
  /* If this is the first, we can try a parenthesized
 declarator.  */
- if (first)
+ if (first && !pack_expansion_p)
{
  bool saved_in_type_id_in_expr_p;
 
diff --git a/gcc/testsuite/g++.dg/cpp0x/variadic185.C b/gcc/testsuite/g++.dg/cpp0x/variadic185.C
new file mode 100644
index 000..2c04afeda00
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/variadic185.C
@@ -0,0 +1,43 @@
+// PR c++/115012
+// { dg-do compile { target { c++11 } } }
+// { dg-final { scan-assembler "_Z3fooIJidEEvDpFT_iE" } }
+// { dg-final { scan-assembler "_Z3barIiEvPFT_iE" } }
+// { dg-final { scan-assembler "_Z3bazIJidEEvDpFT_iE" } }
+
+template <typename ...T>
+void
+foo (T... x (int))
+{
+}
+
+template <typename T>
+void
+bar (T (int))
+{
+}
+
+template <typename ...T>
+void
+baz (T... (int))
+{
+}
+
+int
+f1 (int)
+{
+  return 0;
+}
+
+double
+f2 (int)
+{
+  return 0;
+}
+
+void
+corge ()
+{
+  foo <int, double> (f1, f2);
+  bar <int> (f1);
+  baz <int, double> (f1, f2);
+}


Re: [PATCH v6 1/8] Improve must tail in RTL backend

2024-06-03 Thread Jakub Jelinek
On Mon, Jun 03, 2024 at 07:02:00PM +0200, Michael Matz wrote:
> Hello,
> 
> On Fri, 31 May 2024, Andi Kleen wrote:
> 
> > > I think the ultimate knowledge if a call can or cannot be implemented as 
> > > tail-call lies within calls.cc/expand_call: It is inherently 
> > > target and ABI specific how arguments and returns are layed out, how the 
> > > stack frame is generated, if arguments are or aren't removed by callers 
> > > or callees and so on; all of that being knowledge that tree-tailcall 
> > > doesn't have and doesn't want to have.  As such tree-tailcall should 
> > > not be regarded as ultimate truth, and failures of tree-tailcall to 
> > > recognize something as tail-callable shouldn't matter.
> > 
> > It's not the ultimate truth, but some of the checks it does are not 
> > duplicated at expand time nor the backend. So it's one necessary pre 
> > condition with the current code base.
> > 
> > Yes maybe the checks could be all moved, but that's a much larger 
> > project.
> 
> Hmm.  I count six tests in about 25 lines of code in 
> tree-tailcall.cc:suitable_for_tail_opt_p and suitable_for_tail_call_opt_p.
> 
> Are you perhaps worrying about the sibcall discovery itself (i.e. much of 
> find_tail_calls)?  Why would that be needed for musttail?  Is that 
> attribute sometimes applied to calls that aren't in fact sibcall-able?
> 
> One thing I'm worried about is the need for a new sibcall pass at O0 just 
> for sibcall discovery.  find_tail_calls isn't cheap, because it computes 
> live local variables for the whole function, potentially being quadratic.

But the pass could be done only if there is at least one musttail call
in a function (remembered in some cfun flag).  If people use that attribute,
guess they are willing to pay for it.

Jakub



Re: [PATCH v7 4/9] C++: Support clang compatible [[musttail]] (PR83324)

2024-06-03 Thread Jakub Jelinek
On Mon, Jun 03, 2024 at 08:33:52AM -0700, Andi Kleen wrote:
> On Mon, Jun 03, 2024 at 10:42:20AM -0400, Jason Merrill wrote:
> > > @@ -30316,7 +30348,7 @@ cp_parser_std_attribute (cp_parser *parser, tree 
> > > attr_ns)
> > >   /* Maybe we don't expect to see any arguments for this attribute.  
> > > */
> > >   const attribute_spec *as
> > > = lookup_attribute_spec (TREE_PURPOSE (attribute));
> > > -if (as && as->max_length == 0)
> > > +if ((as && as->max_length == 0) || is_attribute_p ("musttail", 
> > > attr_id))
> > 
> > This shouldn't be necessary with the attribute in the c-attribs table,
> > right?  This patch is OK without this hunk and with the comment tweak above.
> 
> Yes I will remove it. Also the hunk above can be simplified, we don't
> need the extra case anymore.
> 
> But unfortunately there's another problem (sorry I missed that earlier
> but the Linaro bot pointed it out again):
> 
> This hunk:
> 
> @@ -21085,12 +21085,14 @@ tsubst_expr (tree t, tree args, tsubst_flags_t 
> complain, tree in_decl)
>   bool op = CALL_EXPR_OPERATOR_SYNTAX (t);
>   bool ord = CALL_EXPR_ORDERED_ARGS (t);
>   bool rev = CALL_EXPR_REVERSE_ARGS (t);
> - if (op || ord || rev)
> + bool mtc = CALL_EXPR_MUST_TAIL_CALL (t);
> + if (op || ord || rev || mtc)
> if (tree call = extract_call_expr (ret))
>   {
> CALL_EXPR_OPERATOR_SYNTAX (call) = op;
> CALL_EXPR_ORDERED_ARGS (call) = ord;
> CALL_EXPR_REVERSE_ARGS (call) = rev;
> +   CALL_EXPR_MUST_TAIL_CALL (call) = mtc;
>   }

The difference is that CALL_EXPR_MUST_TAIL_CALL is defined as:
#define CALL_EXPR_MUST_TAIL_CALL(NODE) \
  (CALL_EXPR_CHECK (NODE)->base.static_flag)
while the others like:
#define CALL_EXPR_ORDERED_ARGS(NODE) \
  TREE_LANG_FLAG_3 (CALL_OR_AGGR_INIT_CHECK (NODE))
where
#define CALL_OR_AGGR_INIT_CHECK(NODE) \
  TREE_CHECK2 ((NODE), CALL_EXPR, AGGR_INIT_EXPR)
while
#define CALL_EXPR_CHECK(t)  TREE_CHECK (t, CALL_EXPR)
(this one is defined in generated tree-check.h).
So, while the CALL_EXPR_REVERSE_ARGS etc. can be used on either
CALL_EXPR or AGGR_INIT_EXPR (the latter is a C++ specific tree code),
CALL_EXPR_MUST_TAIL_CALL is allowed only on CALL_EXPR.
AGGR_INIT_EXPR is used for C++ constructor calls, so I think
they really don't need such a flag, so you could do:
  bool mtc = (TREE_CODE (t) == CALL_EXPR
              ? CALL_EXPR_MUST_TAIL_CALL (t) : false);
  if (op || ord || rev || mtc)
...
      if (mtc)
        CALL_EXPR_MUST_TAIL_CALL (call) = 1;
or something similar.
Or you'd need to define a variant of the CALL_EXPR_MUST_TAIL_CALL
macro for the C++ FE (as CALL_OR_AGGR_INIT_CHECK is C++ FE too)
and use that in the FE and somehow assert it means the same thing
as the middle-end flag except that it can be also used on AGGR_INIT_EXPR.
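
To make the first option concrete without GCC's tree machinery, here is a
toy model of it.  Everything in it (toy_code, toy_node, toy_must_tail,
toy_copy_flags) is made up for illustration; it only mirrors the idea of
reading the flag solely on nodes where it is valid.

```c
/* Toy stand-ins for the two tree codes discussed above.  */
enum toy_code { TOY_CALL_EXPR, TOY_AGGR_INIT_EXPR };

struct toy_node
{
  enum toy_code code;
  unsigned must_tail : 1;   /* only meaningful for TOY_CALL_EXPR */
  unsigned ordered : 1;     /* meaningful for both codes */
};

/* Read the must-tail bit only when the node kind allows it, so the
   accessor is never applied to the other code.  */
static int
toy_must_tail (const struct toy_node *t)
{
  return t->code == TOY_CALL_EXPR ? t->must_tail : 0;
}

/* Copy flags from T to CALL.  The must-tail bit is written only when it
   was set on T, which in this toy model implies T was a call.  */
static void
toy_copy_flags (struct toy_node *call, const struct toy_node *t)
{
  int mtc = toy_must_tail (t);
  call->ordered = t->ordered;
  if (mtc)
    call->must_tail = 1;
}
```

In the real front end the checking macros abort on a wrong tree code in
checking builds, which is why the guard has to come before the accessor
rather than after it.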

Jakub



[PATCH] rs6000: Fix up PCH in --enable-host-pie builds [PR115324]

2024-06-03 Thread Jakub Jelinek
Hi!

PCH doesn't work properly in --enable-host-pie configurations on
powerpc*-linux*.
The problem is that the rs6000_builtin_info and rs6000_instance_info
arrays mix pointers to .rodata/.data (bifname and attr_string point
to string literals in .rodata section, and the next member is either NULL
or &rs6000_instance_info[XXX]) and GC member (tree fntype).
Now, for normal GC this works just fine, we emit
  {
    &rs6000_instance_info[0].fntype,
    1 * (RS6000_INST_MAX),
    sizeof (rs6000_instance_info[0]),
    &gt_ggc_mx_tree_node,
    &gt_pch_nx_tree_node
  },
  {
    &rs6000_builtin_info[0].fntype,
    1 * (RS6000_BIF_MAX),
    sizeof (rs6000_builtin_info[0]),
    &gt_ggc_mx_tree_node,
    &gt_pch_nx_tree_node
  },
GC roots which are strided and thus cover only the fntype members of all
the elements of the two arrays.
For PCH though it actually results in saving those huge arrays (one is
130832 bytes, another 81568 bytes) into the .gch files and loading them back
in full.  While the bifname and attr_string and next pointers are marked as
GTY((skip)), they are actually saved to point to the .rodata and .data
sections of the process which writes the PCH, but because cc1/cc1plus etc.
are position independent executables with --enable-host-pie, when the arrays
are loaded from the PCH file, those pointers can refer to completely different
addresses, where either nothing is mapped at all or something unrelated is.
While gengtype supports the callback option, that one is meant for
relocatable function pointers and doesn't work in the case of GTY arrays
inside of .data section anyway.

So, either we'd need to add some further GTY extensions, or the following
patch instead reworks it such that the fntype members which were the only
reason for PCH in those arrays are moved to separate arrays.

Size-wise in .data sections it is (in bytes):

                             vanilla  patched
rs6000_builtin_info           130832   110704
rs6000_instance_info           81568    40784
rs6000_overload_info            7392     7392
rs6000_builtin_info_fntype         0    10064
rs6000_instance_info_fntype        0    20392
sum                           219792   189336

where previously we saved/restored for PCH those 130832+81568 bytes, now we
save/restore just 10064+20392 bytes, so this change is beneficial for the
data section size.

Unfortunately, it grows the size of the rs6000_init_generated_builtins
function, vanilla had 218328 bytes, patched has 228668.

When I applied
 void
 rs6000_init_generated_builtins ()
 {
+  bifdata *rs6000_builtin_info_p;
+  tree *rs6000_builtin_info_fntype_p;
+  ovlddata *rs6000_instance_info_p;
+  tree *rs6000_instance_info_fntype_p;
+  ovldrecord *rs6000_overload_info_p;
+  __asm ("" : "=r" (rs6000_builtin_info_p) : "0" (rs6000_builtin_info));
+  __asm ("" : "=r" (rs6000_builtin_info_fntype_p) : "0" (rs6000_builtin_info_fntype));
+  __asm ("" : "=r" (rs6000_instance_info_p) : "0" (rs6000_instance_info));
+  __asm ("" : "=r" (rs6000_instance_info_fntype_p) : "0" (rs6000_instance_info_fntype));
+  __asm ("" : "=r" (rs6000_overload_info_p) : "0" (rs6000_overload_info));
+  #define rs6000_builtin_info rs6000_builtin_info_p
+  #define rs6000_builtin_info_fntype rs6000_builtin_info_fntype_p
+  #define rs6000_instance_info rs6000_instance_info_p
+  #define rs6000_instance_info_fntype rs6000_instance_info_fntype_p
+  #define rs6000_overload_info rs6000_overload_info_p
+
hack by hand, the size of the function is 209700 though, so if really
wanted, we could add __attribute__((__noipa__)) to the function when
building with recent enough GCC and pass pointers to the first elements
of the 5 arrays to the function as arguments.  If you want such a change,
could that be done incrementally?

Bootstrapped/regtested on powerpc64le-linux and powerpc64-linux (-m32/-m64
testing there), ok for trunk and after a while for release branches?

2024-06-03  Jakub Jelinek  

PR target/115324
* config/rs6000/rs6000-gen-builtins.cc (write_decls): Remove
GTY markup from struct bifdata and struct ovlddata and remove their
fntype members.  Change next member in struct ovlddata and
first_instance member of struct ovldrecord to have int type rather
than struct ovlddata *.  Remove GTY markup from rs6000_builtin_info
and rs6000_instance_info arrays, declare new
rs6000_builtin_info_fntype and rs6000_instance_info_fntype arrays,
which have GTY markup.
(write_bif_static_init): Adjust for the above changes.
(write_ovld_static_init): Likewise.
(write_init_bif_table): Likewise.
(write_init_ovld_table): Likewise.
* config/rs6000/rs6000-builtin.cc (rs6000_init_builtins): Likewise.
* config/rs6000/rs6000-c.cc (find_instance): Likewise.  Make static.
(altivec_resolve_overloaded_builtin): Adjust for

Re: [PATCH 17/52] gcn: Remove macros {FLOAT, DOUBLE, LONG_DOUBLE}_TYPE_SIZE

2024-06-03 Thread Jakub Jelinek
On Mon, Jun 03, 2024 at 05:41:11PM +0800, Kewen.Lin wrote:
> > GCN does have some partially implemented support for HFmode ... do I need 
> > to do something new for that to work?
> 
> For this hook, no, as it's mainly for float, double and long double types (C 
> language supported non decimal floating
> point types).  If you are referring to _Float16, I guess you may be 
> interested in another hook TARGET_FLOATN_MODE
> which is for FloatN types.

You don't need a new hook for that, the current _FloatNN discovery code is all
that is needed.  There should be just one mode for the IEEE compliant
implementations for each size (there is the _Float16 vs. __bf16 but the
latter isn't IEEE compliant, or just IEEE like), so tree.cc should figure
everything out together with the current langhooks.

Jakub



Re: [PATCH v2] [libstdc++] add _GLIBCXX_CLANG to workaround predefined __clang__

2024-06-01 Thread Jakub Jelinek
On Sat, Jun 01, 2024 at 09:21:53AM +0100, Jonathan Wakely wrote:
> On Fri, 31 May 2024 at 18:43, Alexandre Oliva  wrote:
> >
> > On May 31, 2024, Alexandre Oliva  wrote:
> >
> > >> So either don't change this line at all, or just do a simple
> > >> s/__clang__/_GLIBCXX_CLANG/
> >
> > > If c++config can be counted on, I'd be happy to do that, but I couldn't
> > > tell that it could.
> >
> > Here's what I've retested on x86_64-linux-gnu and, slightly adjusted for
> > gcc-13, on arm-vx7r2.  Ok to install?
> 
> OK
> 
> If there's any chance of getting the vxworks system headers fixed to
> work with GCC properly, that would be nice.

Fixincludes?
That seems like the standard way to work around bugs in system headers on
proprietary targets.

Jakub



Re: [PATCH 01/11] OpenMP/PolyInt: Pass poly-int structures by address to OMP libs.

2024-05-31 Thread Jakub Jelinek
On Fri, May 31, 2024 at 08:45:54AM +0100, Richard Sandiford wrote:
> > When you say same way, do you mean the way SVE ABI defines the rules for 
> > SVE types?
> 
> No, sorry, I meant that if the choice isn't purely local to a source
> code function, the condition should be something like sizeless_type_p
> (suitably abstracted) rather than POLY_INT_CST_P.  That way, the "ABI"
> stays the same regardless of -msve-vector-bits.

There is no ABI, it is how the caller and indirect callee communicate,
but both parts are compiled with the same compiler, so it can choose
differently based on different compiler version etc.
It is effectively simplified:
struct whatever { ... };
void callee (void *x) { struct whatever *w = x; use *w; }
void caller (void) { struct whatever w; fill in w; ABI_call (callee, &w); }
(plus in some cases the callee can also update values and propagate that
back to caller).
In any case, it is a similar "ABI" to e.g. tree-nested.cc communication
between caller and nested callee, how exactly are the variables laid out
in a struct depends on compiler version and whatever it decides, same
compiler then emits both sides.

Jakub
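
For illustration, the simplified caller/callee scheme described above can be written as compilable C. This is only a sketch under stated assumptions: the fields of struct whatever, the helper abi_call and the summing logic are hypothetical stand-ins, not the actual GCC or libgomp interfaces. It shows the caller filling in a struct, passing its address through an opaque pointer, and reading back a value the callee updated.

```c
/* Hypothetical payload; in the real case the layout is whatever the
   compiler decides, since it emits both sides.  */
struct whatever
{
  int n;      /* input filled in by the caller */
  long sum;   /* result written back by the callee */
};

/* The callee only sees an opaque pointer and converts it back.  */
static void
callee (void *x)
{
  struct whatever *w = x;
  for (int i = 1; i <= w->n; i++)
    w->sum += i;
}

/* Hypothetical stand-in for the runtime entry point that eventually
   invokes the callee with the data pointer.  */
static void
abi_call (void (*fn) (void *), void *data)
{
  fn (data);
}

long
caller (int n)
{
  struct whatever w = { n, 0 };   /* fill in w */
  abi_call (callee, &w);          /* pass the struct by address */
  return w.sum;                   /* value propagated back to the caller */
}
```

Since one compiler produces both caller and callee, the struct layout here is a private agreement between the two and can differ from one compiler version to the next.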



Re: [patch] libgomp: Enable USM for some nvptx devices

2024-05-29 Thread Jakub Jelinek
On Wed, May 29, 2024 at 08:20:01AM +0200, Tobias Burnus wrote:
> +  if (num_devices > 0
> +  && (omp_requires_mask & GOMP_REQUIRES_UNIFIED_SHARED_MEMORY))
> +for (int dev = 0; dev < num_devices; dev++)
> +  {
> + int pi;
> + CUresult r;
> + r = CUDA_CALL_NOCHECK (cuDeviceGetAttribute, &pi,
> +   CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS,
> +   dev);

Formatting nit, the CU_DEVICE_... should be below cuDeviceGetAttribute,
I think it fits like that (if it wouldn't one could use a temporary
variable).

Otherwise LGTM.

Jakub



Re: [patch] libgomp: Enable USM for AMD APUs and MI200 devices

2024-05-29 Thread Jakub Jelinek
On Wed, May 29, 2024 at 02:15:07PM +0200, Tobias Burnus wrote:
> +  bool b;
> +  hsa_status_t status;
> +  status = hsa_fns.hsa_system_get_info_fn (
> +  HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT, &b);
> +  if (status != HSA_STATUS_SUCCESS)
> + GOMP_PLUGIN_error (
> +   "HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT failed");

Formatting, the (s at the end of lines look terrible.
In the first case, perhaps using a temporary would help,
  hsa_system_info_t arg = HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT;
  status = hsa_fns.hsa_system_get_info_fn (arg, &b);
(or use something else instead of arg, as long as it's short), while in the
second
GOMP_PLUGIN_error ("HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT "
   "failed");
will do.

Other than that LGTM.

Jakub
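
The two formatting suggestions above can be tried out in a small self-contained C sketch. Everything named demo_* below is a hypothetical stand-in for the HSA types and query function, assumed only for illustration; the point is the layout: a short temporary keeps the call on one line, and adjacent string literals let a long message be split without leaving a "(" at the end of a line.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the HSA system-info enumeration.  */
typedef enum
{
  DEMO_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT = 1
} demo_system_info_t;

/* Hypothetical query function: returns 0 on success and stores the
   answer through VALUE.  */
static int
demo_system_get_info (demo_system_info_t attr, void *value)
{
  if (attr != DEMO_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT)
    return 1;
  *(bool *) value = true;
  return 0;
}

/* Returns "ok" on success, otherwise the error message.  */
const char *
demo_check_svm (bool *b)
{
  /* A short temporary keeps the long enumerator off the call line.  */
  demo_system_info_t arg = DEMO_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT;
  int status = demo_system_get_info (arg, b);
  if (status != 0)
    /* Adjacent string literals are concatenated by the compiler.  */
    return ("DEMO_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT "
            "failed");
  return "ok";
}
```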



Re: [patch] OpenMP: Add -fopenmp-force-usm mode

2024-05-29 Thread Jakub Jelinek
On Wed, May 29, 2024 at 08:49:01AM +0200, Tobias Burnus wrote:
> Jakub Jelinek wrote:
> > I mean, if we want to add something, maybe better would be an -include-like
> > option that, instead of including a file, includes the given text directly.
> > gcc --include-inline '#pragma omp requires unified_shared_memory' ...
> 
> Likewise for Fortran, but there the question is whether it should be in the
> use-stmt, import-stmt, implicit-part or declaration-part; I guess having one
> --include-inline-use-stmt and --include-inline-declaration would make sense

Maybe name it slightly differently for Fortran and have the where it should
be added as one argument, so --whatever=where=what

> And, I guess, multiple flags should be permitted, which can then be
> processed as separate lines.

Obviously.  That was the intent with --include-inline= for C as well,
after all, -include works that way too.
-include a.h -include b.h -include c.h

Jakub



Re: [patch] OpenMP: Add -fopenmp-force-usm mode

2024-05-29 Thread Jakub Jelinek
On Wed, May 29, 2024 at 08:41:04AM +0200, Tobias Burnus wrote:
> Jakub Jelinek wrote:
> > How is that option different from
> > echo '#pragma omp requires unified_shared_memory' > omp-usm.h
> > gcc -include omp-usm.h
> > ?
> > I mean with -include you can add anything you want, not just one particular
> > directive, and adding a separate option for each is just weird.
> 
> For C/C++, -include seems to be indeed sufficient (albeit not widely known).
> For Fortran, there are two issues: One placement/semantic issue: it has to be
> added per "compilation unit", i.e. to the specification part of a module,
> subprogram or main program. And a practical issue, gfortran shows:
> 
> error: command-line option '-include !$omp requires' is valid for
> C/C++/ObjC/ObjC++ but not for Fortran
> 
> Thus, for Fortran it is still intrinsically useful – even if one can argue
> whether that feature is needed at all / whether it should be added as
> command-line argument.

But then shouldn't we have an option that adds something at the start of
the declaration part of each ?
I mean, option to add 'implicit none' everywhere, or this
'!$omp requires unified_shared_memory' etc.?

I could live with a one-off option for clang compatibility, I just fear
that in 2 years we'll need another one etc. and that solving it in some more
versatile way would be better.

Jakub



Re: [patch] OpenMP: Add -fopenmp-force-usm mode

2024-05-29 Thread Jakub Jelinek
On Wed, May 29, 2024 at 08:26:04AM +0200, Jakub Jelinek wrote:
> > *I am especially thinking about a global variable and "#pragma omp declare
> > target". At least with 'omp requires self_maps' of OpenMP 6, it seems as if
> > 'declare target enter(global_var)' should become 'link(global_var)' where
> > the global_var pointer is updated to point to the host version.
> 
> How is that option different from
> echo '#pragma omp requires unified_shared_memory' > omp-usm.h
> gcc -include omp-usm.h
> ?
> I mean with -include you can add anything you want, not just one particular
> directive, and adding a separate option for each is just weird.

I mean, if we want to add something, maybe better would be an -include-like
option that, instead of including a file, includes the given text directly.
gcc --include-inline '#pragma omp requires unified_shared_memory' ...

Jakub



Re: [patch] OpenMP: Add -fopenmp-force-usm mode

2024-05-29 Thread Jakub Jelinek
On Tue, May 28, 2024 at 09:23:41PM +0200, Tobias Burnus wrote:
> -fopenmp-force-usm can be useful for some badly written code. Explicitly
> using 'omp requires' makes more sense but still. It might also make sense
> for testing purpose.
> 
> Unfortunately, I did not see a simple way of testing it. When trying it
> manually, I looked at the 'a.xamdgcn-amdhsa.c' -save-temps file, where
> gcn_data has the omp_requires_mask as second argument and testing showed
> that an explicit pragma and the -f... argument have the same result.
> 
> Alternative would be to move this code later, e.g. to lto-cgraph.cc's
> omp_requires_mask, which might be safer (as it avoids changing as many
> locations). On the other hand, it might require more special cases
> elsewhere.*
> 
> Comment, suggestions?
> 
> Tobias
> 
> *I am especially thinking about a global variable and "#pragma omp declare
> target". At least with 'omp requires self_maps' of OpenMP 6, it seems as if
> 'declare target enter(global_var)' should become 'link(global_var)' where
> the global_var pointer is updated to point to the host version.

How is that option different from
echo '#pragma omp requires unified_shared_memory' > omp-usm.h
gcc -include omp-usm.h
?
I mean with -include you can add anything you want, not just one particular
directive, and adding a separate option for each is just weird.

Jakub
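
As a rough illustration of the point about -include: the option makes the compiler process the named file before the first line of the source, so the result is as if the directive had been written at the top of the translation unit. The sketch below shows that equivalent source; the function sum_to is a hypothetical example, and the OpenMP pragmas only take effect when the file is compiled with -fopenmp (without it they are ignored and the loop simply runs on the host).

```c
/* With `gcc -include omp-usm.h file.c`, the header's contents are seen
   here, ahead of everything else in file.c.  */
#pragma omp requires unified_shared_memory

/* Hypothetical example: sum 1..n inside a target region.  Execution
   falls back to the host when no suitable offload device exists.  */
int
sum_to (int n)
{
  int s = 0;
#pragma omp target map(tofrom: s)
  for (int i = 1; i <= n; i++)
    s += i;
  return s;
}
```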



Re: [Patch] testsuite/*/gomp: Remove 'dg-prune-output "not supported yet"'

2024-05-28 Thread Jakub Jelinek
On Tue, May 28, 2024 at 07:43:00PM +0200, Tobias Burnus wrote:
> Improve test coverage by removing 'prune-output' given that the features have
> been implemented in the meantime.
> 
> Comments, suggestions? Otherwise I will commit the patch as obvious.
> 
> Tobias

> testsuite/*/gomp: Remove 'dg-prune-output "not supported yet"'
> 
> gcc/testsuite/ChangeLog:
> 
>   * c-c++-common/gomp/lastprivate-conditional-1.c: Remove
>   '{ dg-prune-output "not supported yet" }'.
>   * c-c++-common/gomp/requires-1.c: Likewise.
>   * c-c++-common/gomp/requires-2.c: Likewise.
>   * c-c++-common/gomp/reverse-offload-1.c: Likewise.
>   * g++.dg/gomp/requires-1.C: Likewise.
>   * gfortran.dg/gomp/requires-1.f90: Likewise.
>   * gfortran.dg/gomp/requires-2.f90: Likewise.
>   * gfortran.dg/gomp/requires-4.f90: Likewise.
>   * gfortran.dg/gomp/requires-5.f90: Likewise.
>   * gfortran.dg/gomp/requires-6.f90: Likewise.
>   * gfortran.dg/gomp/requires-7.f90: Likewise.

LGTM.

Jakub



Re: configure adds -std=gnu++11 to CXX variable

2024-05-28 Thread Jakub Jelinek via Gcc
On Tue, May 28, 2024 at 07:35:43AM -0700, Paul Eggert wrote:
> On 2024-05-28 01:20, Jonathan Wakely wrote:
> > I am not aware of any distro ever changing the default -std setting for g++
> > or clang++. Are you attempting to solve a non-problem, but introducing new
> > ones?
> 
> If it's a non-problem for C++, why does Autoconf upgrade to C++11 when the
> default is C++98? Autoconf has done so since Autoconf 2.70 (2020), with
> nobody complaining as far as I know.
> 
> Was the Autoconf 2.70 change done so late that it had no practical effect,
> because no distro was defaulting to C++98 any more? If so, it sounds like

That seems to be the case.
Dunno about clang++ defaults, but GCC defaults to
-std=gnu++98 (before GCC 6), or
-std=gnu++14 starting with GCC 6 (April 2016), or
-std=gnu++17 starting with GCC 11 (April 2021).
So, if Autoconf in 2020 changed the C++98 default to C++11, I bet it affected
almost nothing.  RHEL 7 uses GCC 4.8, which partially but not fully supports C++11
(e.g. GCC which is written in C++11 these days can build using GCC 4.8.5
as oldest compiler, has to use some workarounds because the C++11 support
isn't finished there, the library side even further from that).  And trying
to enable C++11 on GCC 4.4 (RHEL 6) wouldn't be a good idea, too much was
missing there.
With C++20 in GCC 14, from core most of the features are in except that
modules still need further work, for library side at least cppreference
says it is complete too but I think the ABI hasn't been declared stable yet,
so C++17 is still the default, dunno if it will change in GCC 15 or not.

> Autoconf should go back to its 2.69 behavior and not mess with the C++
> version as that's more likely to hurt than help.

Yes.

> For background on that Autoconf 2.70 change, see this 2013 thread:
> 
> https://lists.gnu.org/r/autoconf/2013-01/msg00016.html

From what I can see, the change was proposed when C++11 support wasn't
complete, and nobody expected that compilers would actually change their
defaults once the support was sufficiently stable.
Note, GCC updates the default for C as well: the -std=gnu90 default was
changed to -std=gnu11 in GCC 5 (April 2015) and to -std=gnu17 in GCC 8
(May 2018).  -std=gnu23 support is still incomplete even in GCC 14.

Jakub
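
To see which C dialect a given compiler picks when no -std option is passed, one can inspect the standard __STDC_VERSION__ macro. The following is a minimal sketch; the function name c_dialect and the returned labels are illustrative assumptions, while the macro values are the ones the C standards define.

```c
/* Report the C dialect the compiler is operating in, based on the
   standard __STDC_VERSION__ macro.  */
const char *
c_dialect (void)
{
#if !defined (__STDC_VERSION__)
  return "C90";
#elif __STDC_VERSION__ > 201710L
  return "C23 (or a draft of it)";
#elif __STDC_VERSION__ >= 201710L
  return "C17";
#elif __STDC_VERSION__ >= 201112L
  return "C11";
#elif __STDC_VERSION__ >= 199901L
  return "C99";
#else
  return "C95";
#endif
}
```

Compiling the same file with and without an explicit -std option shows whether a build system is overriding the compiler's own default.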




[gcc r13-8806] libstdc++: Fix up 19_diagnostics/stacktrace/hash.cc on 13 branch

2024-05-28 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:fd91953c4dfba2a592ec15f2b4a2da28b1cf1947

commit r13-8806-gfd91953c4dfba2a592ec15f2b4a2da28b1cf1947
Author: Jakub Jelinek 
Date:   Tue May 28 16:30:48 2024 +0200

libstdc++: Fix up 19_diagnostics/stacktrace/hash.cc on 13 branch

The r13-8207-g17acf9fbeb10d7adad commit changed some tests to use
-lstdc++exp instead of -lstdc++_libbacktrace, but it didn't change
the 19_diagnostics/stacktrace/hash.cc test, presumably because
when it was added on the trunk, it already had -lstdc++exp and
it was changed to -lstdc++_libbacktrace only in the
r13-8067-g16635b89f36c07b9e0 cherry-pick.

The test fails with
/usr/bin/ld: cannot find -lstdc++_libbacktrace
collect2: error: ld returned 1 exit status
compiler exited with status 1
FAIL: 19_diagnostics/stacktrace/hash.cc (test for excess errors)
without this (while the library is still built, it isn't added in
-L options).

2024-05-27  Jakub Jelinek  

* testsuite/19_diagnostics/stacktrace/hash.cc: Adjust
dg-options to use -lstdc++exp.

Diff:
---
 libstdc++-v3/testsuite/19_diagnostics/stacktrace/hash.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/testsuite/19_diagnostics/stacktrace/hash.cc b/libstdc++-v3/testsuite/19_diagnostics/stacktrace/hash.cc
index a2f61e49981..37e6d6dd7ec 100644
--- a/libstdc++-v3/testsuite/19_diagnostics/stacktrace/hash.cc
+++ b/libstdc++-v3/testsuite/19_diagnostics/stacktrace/hash.cc
@@ -1,4 +1,4 @@
-// { dg-options "-std=gnu++23 -lstdc++_libbacktrace" }
+// { dg-options "-std=gnu++23 -lstdc++exp" }
 // { dg-do run { target c++23 } }
 // { dg-require-effective-target stacktrace }


Re: configure adds -std=gnu++11 to CXX variable

2024-05-27 Thread Jakub Jelinek
On Mon, May 27, 2024 at 12:04:40PM -0700, Paul Eggert wrote:
> On 2024-05-27 03:35, Florian Weimer wrote:
> > Does this turn on experimental language modes by default?  That's
> > probably not what we want.
> 
> What do C++ developers want these days? Autoconf should have a reasonable
> default, and C++11 is surely not a good default anymore.

Maybe respect the carefully chosen compiler default (unless explicitly
overridden in configure.ac)?

Jakub




Re: [PATCH] tree-optimization/115232 - demangle failure during -Waccess

2024-05-27 Thread Jakub Jelinek
On Mon, May 27, 2024 at 11:11:43AM +0200, Richard Biener wrote:
> For the following testcase we fail to demangle
> _ZZN5OuterIvE6methodIvEEvvQ3cstITL0__EEN5InnernwEm and
> _ZZN5OuterIvE6methodIvEEvvQ3cstITL0__EEN5InnerdlEPv and in turn end
> up building NULL references.  The following puts in a safeguard for
> failed demangling into -Waccess.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?
> 
> Thanks,
> Richard.
> 
>   PR tree-optimization/115232
>   * gimple-ssa-warn-access.cc (new_delete_mismatch_p): Handle
>   failure to demangle gracefully.
> 
>   * g++.dg/pr115232.C: New testcase.

LGTM, thanks.

Jakub



[PATCH] libstdc++: Fix up 19_diagnostics/stacktrace/hash.cc on 13 branch

2024-05-27 Thread Jakub Jelinek
Hi!

The r13-8207-g17acf9fbeb10d7adad commit changed some tests to use
-lstdc++exp instead of -lstdc++_libbacktrace, but it didn't change
the 19_diagnostics/stacktrace/hash.cc test, presumably because
when it was added on the trunk, it already had -lstdc++exp and
it was changed to -lstdc++_libbacktrace only in the
r13-8067-g16635b89f36c07b9e0 cherry-pick.

The test fails with
/usr/bin/ld: cannot find -lstdc++_libbacktrace
collect2: error: ld returned 1 exit status
compiler exited with status 1
FAIL: 19_diagnostics/stacktrace/hash.cc (test for excess errors)
without this (while the library is still built, it isn't added in
-L options).

Ok for 13 branch?

I think the r13-8067 cherry-pick hasn't been applied to 12 branch,
so we don't need it there.

2024-05-27  Jakub Jelinek  

* testsuite/19_diagnostics/stacktrace/hash.cc: Adjust
dg-options to use -lstdc++exp.

--- libstdc++-v3/testsuite/19_diagnostics/stacktrace/hash.cc.jj 2023-11-22 11:03:28.812657550 +0100
+++ libstdc++-v3/testsuite/19_diagnostics/stacktrace/hash.cc 2024-05-27 10:18:44.900058884 +0200
@@ -1,4 +1,4 @@
-// { dg-options "-std=gnu++23 -lstdc++_libbacktrace" }
+// { dg-options "-std=gnu++23 -lstdc++exp" }
 // { dg-do run { target c++23 } }
 // { dg-require-effective-target stacktrace }
 


Jakub



Re: [C PATCH, v2]: allow aliasing of compatible types derived from enumeral types [PR115157]

2024-05-24 Thread Jakub Jelinek
On Fri, May 24, 2024 at 05:39:45PM +0200, Martin Uecker wrote:
> PR 115157
> PR 115177
> 
> gcc/c/
> * c-decl.cc (shadow_tag_warned,parse_xref_tag,start_enum,
> finish_enum): Set SET_TYPE_STRUCTURAL_EQUALITY / TYPE_CANONICAL.
> * c-objc-common.cc (get_alias_set): Remove special case.
> (get_aka_type): Add special case.
> 
> gcc/c-family/
> * c-attribs.cc (handle_hardbool_attribute): Set TYPE_CANONICAL
> for hardbools.
> 
> gcc/
> * godump.cc (go_output_typedef): use TYPE_MAIN_VARIANT instead
> of TYPE_CANONICAL.

Just a nit:
s/use/Use/

Jakub



gcc-wwwdocs branch master updated. b20402b74f21724e2772d48ec8f12043ca785503

2024-05-24 Thread Jakub Jelinek via Gcc-cvs-wwwdocs
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "gcc-wwwdocs".

The branch, master has been updated
   via  b20402b74f21724e2772d48ec8f12043ca785503 (commit)
  from  c18141d3bac790885c68d2b7fa6e99559557460d (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -
commit b20402b74f21724e2772d48ec8f12043ca785503
Author: Jakub Jelinek 
Date:   Fri May 24 10:36:19 2024 +0200

Adjust 12.3.1 Status Report URL.

diff --git a/htdocs/index.html b/htdocs/index.html
index afc76800..0680ef30 100644
--- a/htdocs/index.html
+++ b/htdocs/index.html
@@ -189,7 +189,7 @@ More news? Let ger...@pfeifer.com know!
 
   Status:
   
-  <a href="https://gcc.gnu.org/pipermail/gcc/2023-May/241260.html">2023-05-08</a>
+  <a href="https://gcc.gnu.org/pipermail/gcc/2024-May/243994.html">2024-05-24</a>
   
   (regression fixes &amp; docs only).
   

---

Summary of changes:
 htdocs/index.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


hooks/post-receive
-- 
gcc-wwwdocs

