[gcc r15-501] tree-cfg: Move the returns_twice check to be last statement only [PR114301]

2024-05-15 Thread Andrew Pinski via Gcc-cvs
https://gcc.gnu.org/g:642f31d6286b8a342130fbface51530befd975fd

commit r15-501-g642f31d6286b8a342130fbface51530befd975fd
Author: Andrew Pinski 
Date:   Tue May 14 06:29:18 2024 -0700

tree-cfg: Move the returns_twice check to be last statement only [PR114301]

While checking that all of the bugs dealing with the case where
gimple_can_duplicate_bb_p would return false were fixed, I noticed
that the code checking whether a call statement is returns_twice was
checking all call statements rather than just the last statement.
Since calling gimple_call_flags has a small non-zero overhead due to
a few string comparisons, removing uses of it can give a small
performance improvement. A returns_twice function call will always
end the basic block due to the check in stmt_can_terminate_bb_p
(and others), so checking only the last statement is a small
optimization and is safe.
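
For readers unfamiliar with returns_twice: setjmp is the usual example of
such a call. The following is only a minimal sketch (the helper name
count_setjmp_returns is hypothetical and not part of GCC) showing that
control can come back out of the same call site a second time, which is
why such a call has to end its basic block.

```cpp
#include <cassert>
#include <csetjmp>

// Hypothetical helper, not GCC code: counts how often control comes
// back out of a single setjmp call site when longjmp is used once.
int count_setjmp_returns()
{
  std::jmp_buf env;
  volatile int returns = 0;    // volatile so the value survives longjmp

  if (setjmp(env) == 0)
    {
      returns = returns + 1;   // first (direct) return from setjmp
      std::longjmp(env, 1);    // re-enter the setjmp call site
    }
  else
    returns = returns + 1;     // second return, reached via longjmp

  return returns;
}
```

Calling count_setjmp_returns() from a small driver should yield 2, one
count for each time the setjmp call returned.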

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/114301
gcc/ChangeLog:

* tree-cfg.cc (gimple_can_duplicate_bb_p): Check returns_twice
only on the last call statement rather than all.

Signed-off-by: Andrew Pinski 

Diff:
---
 gcc/tree-cfg.cc | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
index b2d47b720847..7fb7b92966be 100644
--- a/gcc/tree-cfg.cc
+++ b/gcc/tree-cfg.cc
@@ -6495,6 +6495,13 @@ gimple_can_duplicate_bb_p (const_basic_block bb)
&& gimple_call_internal_p (last)
&& gimple_call_internal_unique_p (last))
   return false;
+
+/* Prohibit duplication of returns_twice calls, otherwise associated
+   abnormal edges also need to be duplicated properly.
+   return_twice functions will always be the last statement.  */
+if (is_gimple_call (last)
+   && (gimple_call_flags (last) & ECF_RETURNS_TWICE))
+  return false;
   }
 
   for (gimple_stmt_iterator gsi = gsi_start_bb (CONST_CAST_BB (bb));
@@ -6502,15 +6509,12 @@ gimple_can_duplicate_bb_p (const_basic_block bb)
 {
   gimple *g = gsi_stmt (gsi);
 
-  /* Prohibit duplication of returns_twice calls, otherwise associated
-abnormal edges also need to be duplicated properly.
-An IFN_GOMP_SIMT_ENTER_ALLOC/IFN_GOMP_SIMT_EXIT call must be
+  /* An IFN_GOMP_SIMT_ENTER_ALLOC/IFN_GOMP_SIMT_EXIT call must be
 duplicated as part of its group, or not at all.
 The IFN_GOMP_SIMT_VOTE_ANY and IFN_GOMP_SIMT_XCHG_* are part of such a
 group, so the same holds there.  */
   if (is_gimple_call (g)
- && (gimple_call_flags (g) & ECF_RETURNS_TWICE
- || gimple_call_internal_p (g, IFN_GOMP_SIMT_ENTER_ALLOC)
+ && (gimple_call_internal_p (g, IFN_GOMP_SIMT_ENTER_ALLOC)
  || gimple_call_internal_p (g, IFN_GOMP_SIMT_EXIT)
  || gimple_call_internal_p (g, IFN_GOMP_SIMT_VOTE_ANY)
  || gimple_call_internal_p (g, IFN_GOMP_SIMT_XCHG_BFLY)


[gcc r15-502] libstdc++: Give std::memory_order a fixed underlying type [PR89624]

2024-05-15 Thread Jonathan Wakely via Libstdc++-cvs
https://gcc.gnu.org/g:99dd1be14172445795f0012b935359e7014a2215

commit r15-502-g99dd1be14172445795f0012b935359e7014a2215
Author: Jonathan Wakely 
Date:   Thu Apr 11 19:12:48 2024 +0100

libstdc++: Give std::memory_order a fixed underlying type [PR89624]

Prior to C++20 this enum type doesn't have a fixed underlying type,
which means it can be modified by -fshort-enums, which then means the
HLE bits are outside the range of valid values for the type.

As it has a fixed type of int in C++20 and later, do the same for
earlier standards too. This is technically a change for C++17 down,
because the implicit underlying type (without -fshort-enums) was
unsigned before. I doubt it matters in practice. That incompatibility
already exists between C++17 and C++20 and nobody has noticed or
complained. Now at least the underlying type will be int for all -std
modes.
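
As an illustration only, the effect of a fixed underlying type can be
sketched with stand-in enums. The names order, order_modifier and
hle_bit_survives below are assumptions made for this sketch and are not
the libstdc++ declarations; the bit value 0x10000 is likewise just an
example of a bit that lies well above the enumerator values.

```cpp
#include <cassert>
#include <type_traits>

// Stand-in for the enum: with ": int" every int value is a valid value
// of the type, regardless of -fshort-enums.
enum order : int { relaxed, consume, acquire, release, acq_rel, seq_cst };

// Stand-in for the extra modifier bits (values chosen for illustration).
enum order_modifier { hle_acquire = 0x10000, hle_release = 0x20000 };

constexpr order operator|(order m, order_modifier mod)
{ return order(int(m) | int(mod)); }

static_assert(std::is_same<std::underlying_type<order>::type, int>::value,
              "the fixed underlying type is int");
static_assert((acquire | hle_acquire) != acquire,
              "the modifier bit is representable and preserved");

// Run-time view of the same property.
constexpr bool hle_bit_survives()
{ return (acquire | hle_acquire) != acquire; }
```

The second static_assert mirrors the shape of the check in the new test,
applied here to the stand-in types.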

libstdc++-v3/ChangeLog:

PR libstdc++/89624
* include/bits/atomic_base.h (memory_order): Use int as
underlying type.
* testsuite/29_atomics/atomic/89624.cc: New test.

Diff:
---
 libstdc++-v3/include/bits/atomic_base.h   | 4 ++--
 libstdc++-v3/testsuite/29_atomics/atomic/89624.cc | 9 +
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/bits/atomic_base.h b/libstdc++-v3/include/bits/atomic_base.h
index dd360302f801..062f15497403 100644
--- a/libstdc++-v3/include/bits/atomic_base.h
+++ b/libstdc++-v3/include/bits/atomic_base.h
@@ -78,7 +78,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   inline constexpr memory_order memory_order_acq_rel = memory_order::acq_rel;
   inline constexpr memory_order memory_order_seq_cst = memory_order::seq_cst;
 #else
-  typedef enum memory_order
+  enum memory_order : int
 {
   memory_order_relaxed,
   memory_order_consume,
@@ -86,7 +86,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   memory_order_release,
   memory_order_acq_rel,
   memory_order_seq_cst
-} memory_order;
+};
 #endif
 
   /// @cond undocumented
diff --git a/libstdc++-v3/testsuite/29_atomics/atomic/89624.cc b/libstdc++-v3/testsuite/29_atomics/atomic/89624.cc
new file mode 100644
index ..480f7c65e2d7
--- /dev/null
+++ b/libstdc++-v3/testsuite/29_atomics/atomic/89624.cc
@@ -0,0 +1,9 @@
+// { dg-options "-fshort-enums" }
+// { dg-do compile { target c++11 } }
+
+// Bug 89624 HLE bits don't work with -fshort-enums or -fstrict-enums
+
+#include <atomic>
+
+static_assert((std::memory_order_acquire | std::__memory_order_hle_acquire)
+!= std::memory_order_acquire, "HLE acquire sets a bit");


[gcc r15-503] libstdc++: Rewrite std::variant comparisons without macros

2024-05-15 Thread Jonathan Wakely via Libstdc++-cvs
https://gcc.gnu.org/g:d08247b77831c496277b266807d4bd17656d1654

commit r15-503-gd08247b77831c496277b266807d4bd17656d1654
Author: Jonathan Wakely 
Date:   Tue Apr 2 19:40:51 2024 +0100

libstdc++: Rewrite std::variant comparisons without macros

libstdc++-v3/ChangeLog:

* include/std/variant (__detail::__variant::__compare): New
function template.
(operator==, operator!=, operator<, operator>, operator<=)
(operator>=): Replace macro definition with handwritten function
calling __detail::__variant::__compare.
(operator<=>): Call __detail::__variant::__compare.

Diff:
---
 libstdc++-v3/include/std/variant | 163 +++
 1 file changed, 112 insertions(+), 51 deletions(-)

diff --git a/libstdc++-v3/include/std/variant b/libstdc++-v3/include/std/variant
index bf05eec9a6bf..cfb4bcdbcc98 100644
--- a/libstdc++-v3/include/std/variant
+++ b/libstdc++-v3/include/std/variant
@@ -48,6 +48,7 @@
 #include 
 #include  // in_place_index_t
 #if __cplusplus >= 202002L
+# include 
 # include 
 #endif
 
@@ -1237,47 +1238,119 @@ namespace __variant
 
   struct monostate { };
 
+namespace __detail::__variant
+{
+  template
+constexpr _Ret
+__compare(_Ret __ret, const _Vp& __lhs, const _Vp& __rhs, _Op __op)
+{
+  __variant::__raw_idx_visit(
+   [&__ret, &__lhs, __op] (auto&& __rhs_mem, auto __rhs_index) mutable
+   {
+ if constexpr (__rhs_index != variant_npos)
+   {
+ if (__lhs.index() == __rhs_index.value)
+   {
+ auto& __this_mem = std::get<__rhs_index>(__lhs);
+ __ret = __op(__this_mem, __rhs_mem);
+ return;
+   }
+   }
+ __ret = __op(__lhs.index() + 1, __rhs_index + 1);
+   }, __rhs);
+  return __ret;
+}
+} // namespace __detail::__variant
+
+  template
 #if __cpp_lib_concepts
-# define _VARIANT_RELATION_FUNCTION_CONSTRAINTS(TYPES, OP) \
-  requires ((requires (const TYPES& __t) { \
-   { __t OP __t } -> __detail::__boolean_testable; }) && ...)
-#else
-# define _VARIANT_RELATION_FUNCTION_CONSTRAINTS(TYPES, OP)
+requires ((requires (const _Types& __t) {
+  { __t == __t } -> convertible_to; }) && ...)
 #endif
+constexpr bool
+operator== [[nodiscard]] (const variant<_Types...>& __lhs,
+ const variant<_Types...>& __rhs)
+{
+  return __detail::__variant::__compare(true, __lhs, __rhs,
+   [](auto&& __l, auto&& __r) {
+ return __l == __r;
+   });
+}
 
-#define _VARIANT_RELATION_FUNCTION_TEMPLATE(__OP) \
-  template \
-_VARIANT_RELATION_FUNCTION_CONSTRAINTS(_Types, __OP) \
-constexpr bool \
-operator __OP [[nodiscard]] (const variant<_Types...>& __lhs, \
-const variant<_Types...>& __rhs) \
-{ \
-  bool __ret = true; \
-  __detail::__variant::__raw_idx_visit( \
-[&__ret, &__lhs] (auto&& __rhs_mem, auto __rhs_index) mutable \
-{ \
- if constexpr (__rhs_index != variant_npos) \
-   { \
- if (__lhs.index() == __rhs_index) \
-   { \
- auto& __this_mem = std::get<__rhs_index>(__lhs);  \
-  __ret = __this_mem __OP __rhs_mem; \
- return; \
-} \
-} \
- __ret = (__lhs.index() + 1) __OP (__rhs_index + 1); \
-   }, __rhs); \
-  return __ret; \
+  template
+#if __cpp_lib_concepts
+requires ((requires (const _Types& __t) {
+  { __t != __t } -> convertible_to; }) && ...)
+#endif
+constexpr bool
+operator!= [[nodiscard]] (const variant<_Types...>& __lhs,
+ const variant<_Types...>& __rhs)
+{
+  return __detail::__variant::__compare(true, __lhs, __rhs,
+   [](auto&& __l, auto&& __r) {
+ return __l != __r;
+   });
 }
 
-  _VARIANT_RELATION_FUNCTION_TEMPLATE(<)
-  _VARIANT_RELATION_FUNCTION_TEMPLATE(<=)
-  _VARIANT_RELATION_FUNCTION_TEMPLATE(==)
-  _VARIANT_RELATION_FUNCTION_TEMPLATE(!=)
-  _VARIANT_RELATION_FUNCTION_TEMPLATE(>=)
-  _VARIANT_RELATION_FUNCTION_TEMPLATE(>)
+  template
+#if __cpp_lib_concepts
+requires ((requires (const _Types& __t) {
+  { __t < __t } -> convertible_to; }) && ...)
+#endif
+constexpr bool
+operator< [[nodiscard]] (const variant<_Types...>& __lhs,
+const variant<_Types...>& __rhs)
+{
+  return __detail::__variant::__compare(true, __lhs, __rhs,
+   [](auto&& __l, auto&& __r) {
+ return __l < __r;
+   });
+}
 
-#undef _V

[gcc r15-504] [prange] Default pointers_handled_p() to true.

2024-05-15 Thread Aldy Hernandez via Gcc-cvs
https://gcc.gnu.org/g:c400b2100719d0a9e5989c63e0827b9e98919df3

commit r15-504-gc400b2100719d0a9e5989c63e0827b9e98919df3
Author: Aldy Hernandez 
Date:   Tue May 14 16:21:50 2024 +0200

[prange] Default pointers_handled_p() to true.

The pointers_handled_p() method is an internal range-op helper to help
catch dispatch type mismatches for pointer operands.  This is what
caught the IPA mismatch in PR114985.

This method is only a temporary measure to catch any incompatibilities
in the current pointer range-op entries.  This patch returns true for
any *new* entries in the range-op table, as the current ones are
already fleshed out.  This keeps us from having to implement this
boilerplate function for any new range-op entries.
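
The effect of a permissive default in a base class can be sketched as
follows. All names here (range_op_sketch, existing_op, new_op, handled)
are hypothetical and only loosely modelled on the description above;
this is not the GCC range-op interface.

```cpp
#include <cassert>

// Hypothetical miniature of a table entry with a checking helper.
struct range_op_sketch
{
  virtual ~range_op_sketch() = default;

  // Default answer: entries that do not override this are assumed to
  // handle pointer operands, so they need no boilerplate.
  virtual bool pointers_handled() const { return true; }
};

// An already fleshed-out entry keeps its own explicit answer.
struct existing_op : range_op_sketch
{
  bool pointers_handled() const override { return false; }
};

// A newly added entry simply inherits the default.
struct new_op : range_op_sketch { };

bool handled(const range_op_sketch& op)
{ return op.pointers_handled(); }
```

With this arrangement only the entries that already carry an explicit
override can trip the check; new entries pass by default.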

PR tree-optimization/114995
* range-op-ptr.cc (range_operator::pointers_handled_p): Default to true.

Diff:
---
 gcc/range-op-ptr.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/range-op-ptr.cc b/gcc/range-op-ptr.cc
index 65cca65103af..2f47f3354ed7 100644
--- a/gcc/range-op-ptr.cc
+++ b/gcc/range-op-ptr.cc
@@ -58,7 +58,7 @@ bool
 range_operator::pointers_handled_p (range_op_dispatch_type ATTRIBUTE_UNUSED,
unsigned dispatch ATTRIBUTE_UNUSED) const
 {
-  return false;
+  return true;
 }
 
 bool


[gcc r15-505] RISC-V: Add test cases for cpymem expansion

2024-05-15 Thread Christoph Müllner via Gcc-cvs
https://gcc.gnu.org/g:00029408387e9cc64e135324c22d15cd5a70e946

commit r15-505-g00029408387e9cc64e135324c22d15cd5a70e946
Author: Christoph Müllner 
Date:   Wed May 1 16:54:42 2024 +0200

RISC-V: Add test cases for cpymem expansion

We have two mechanisms in the RISC-V backend that expand the
cpymem pattern: a) by-pieces, b) riscv_expand_block_move()
in riscv-string.cc. The by-pieces framework has higher priority
and emits a sequence of up to 15 instructions
(see use_by_pieces_infrastructure_p() for more details).

As a rule-of-thumb, by-pieces emits alternating load/store sequences
and the cpymem expansion in the backend emits a sequence of loads
followed by a sequence of stores.

Let's add some test cases to document the current behaviour
and to have tests to identify regressions.
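
The two instruction orderings can be modelled in source form as below.
This is only a sketch of the order of operations, not of what the
backend emits (an optimizing compiler is free to reorder either loop);
the function names and the max_batch bound are assumptions made for the
illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical upper bound for the batched variant; not a GCC parameter.
constexpr std::size_t max_batch = 32;

// Models the by-pieces shape: each load is followed by its store.
void copy_alternating(unsigned char* to, const unsigned char* from,
                      std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
    {
      unsigned char tmp = from[i];  // load
      to[i] = tmp;                  // store
    }
}

// Models the block-move shape: all loads first, then all stores.
// Returns false without copying if n exceeds the temporary buffer.
bool copy_loads_then_stores(unsigned char* to, const unsigned char* from,
                            std::size_t n)
{
  if (n > max_batch)
    return false;

  unsigned char regs[max_batch];
  for (std::size_t i = 0; i < n; ++i)
    regs[i] = from[i];              // sequence of loads
  for (std::size_t i = 0; i < n; ++i)
    to[i] = regs[i];                // sequence of stores
  return true;
}
```

For non-overlapping buffers both variants produce the same bytes; only
the interleaving of loads and stores differs, which is what the new
check-function-bodies patterns look for.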

Signed-off-by: Christoph Müllner 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cpymem-32-ooo.c: New test.
* gcc.target/riscv/cpymem-32.c: New test.
* gcc.target/riscv/cpymem-64-ooo.c: New test.
* gcc.target/riscv/cpymem-64.c: New test.

Diff:
---
 gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c | 131 +++
 gcc/testsuite/gcc.target/riscv/cpymem-32.c | 138 +
 gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c | 129 +++
 gcc/testsuite/gcc.target/riscv/cpymem-64.c | 138 +
 4 files changed, 536 insertions(+)

diff --git a/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c b/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c
new file mode 100644
index ..33fb9891d823
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c
@@ -0,0 +1,131 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv32 } */
+/* { dg-options "-march=rv32gc -mabi=ilp32d -mtune=generic-ooo" } */
+/* { dg-skip-if "" { *-*-* } {"-O0" "-Os" "-Og" "-Oz" "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+/* { dg-allow-blank-lines-in-output 1 } */
+
+#define COPY_N(N)  \
+void copy_##N (void *to, void *from)   \
+{  \
+  __builtin_memcpy (to, from, N);  \
+}
+
+#define COPY_ALIGNED_N(N)  \
+void copy_aligned_##N (void *to, void *from)   \
+{  \
+  to = __builtin_assume_aligned(to, sizeof(long)); \
+  from = __builtin_assume_aligned(from, sizeof(long)); \
+  __builtin_memcpy (to, from, N);  \
+}
+
+/*
+**copy_7:
+**...
+**lw\t[at][0-9],0\([at][0-9]\)
+**sw\t[at][0-9],0\([at][0-9]\)
+**...
+**lbu\t[at][0-9],6\([at][0-9]\)
+**sb\t[at][0-9],6\([at][0-9]\)
+**...
+*/
+COPY_N(7)
+
+/*
+**copy_aligned_7:
+**...
+**lw\t[at][0-9],0\([at][0-9]\)
+**sw\t[at][0-9],0\([at][0-9]\)
+**...
+**lbu\t[at][0-9],6\([at][0-9]\)
+**sb\t[at][0-9],6\([at][0-9]\)
+**...
+*/
+COPY_ALIGNED_N(7)
+
+/*
+**copy_8:
+**...
+**lw\ta[0-9],0\(a[0-9]\)
+**sw\ta[0-9],0\(a[0-9]\)
+**...
+*/
+COPY_N(8)
+
+/*
+**copy_aligned_8:
+**...
+**lw\ta[0-9],0\(a[0-9]\)
+**sw\ta[0-9],0\(a[0-9]\)
+**...
+*/
+COPY_ALIGNED_N(8)
+
+/*
+**copy_11:
+**...
+**lbu\t[at][0-9],0\([at][0-9]\)
+**...
+**lbu\t[at][0-9],10\([at][0-9]\)
+**...
+**sb\t[at][0-9],0\([at][0-9]\)
+**...
+**sb\t[at][0-9],10\([at][0-9]\)
+**...
+*/
+COPY_N(11)
+
+/*
+**copy_aligned_11:
+**...
+**lw\t[at][0-9],0\([at][0-9]\)
+**...
+**sw\t[at][0-9],0\([at][0-9]\)
+**...
+**lbu\t[at][0-9],10\([at][0-9]\)
+**sb\t[at][0-9],10\([at][0-9]\)
+**...
+*/
+COPY_ALIGNED_N(11)
+
+/*
+**copy_15:
+**...
+**(call|tail)\tmemcpy
+**...
+*/
+COPY_N(15)
+
+/*
+**copy_aligned_15:
+**...
+**lw\t[at][0-9],0\([at][0-9]\)
+**...
+**sw\t[at][0-9],0\([at][0-9]\)
+**...
+**lbu\t[at][0-9],14\([at][0-9]\)
+**sb\t[at][0-9],14\([at][0-9]\)
+**...
+*/
+COPY_ALIGNED_N(15)
+
+/*
+**copy_27:
+**...
+**(call|tail)\tmemcpy
+**...
+*/
+COPY_N(27)
+
+/*
+**copy_aligned_27:
+**...
+**lw\t[at][0-9],20\([at][0-9]\)
+**...
+**sw\t[at][0-9],20\([at][0-9]\)
+**...
+**lbu\t[at][0-9],26\([at][0-9]\)
+**sb\t[at][0-9],26\([at][0-9]\)
+**...
+*/
+COPY_ALIGNED_N(27)
diff --git a/gcc/testsuite/gcc.target/riscv/cpymem-32.c b/gcc/testsuite/gcc.target/riscv/cpymem-32.c
new file mode 100644
index ..44ba14a1d51f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/cpymem-32.c
@@ -0,0 +1,138 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv32 } */
+/* { dg-options "-march=rv32gc -mabi=ilp32d -mtune=rocket" } */
+/* { dg-skip-if "" { *-*-* } {"-O0" "-Os" "-Og" "-Oz" "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+/* { dg-allow-blank-lines-in-outpu

[gcc r15-509] testsuite: i386: Fix g++.target/i386/pr97054.C on Solaris

2024-05-15 Thread Rainer Orth via Gcc-cvs
https://gcc.gnu.org/g:a0e9fde8766a5b5a28a76c3a6fb0276efa47cd6d

commit r15-509-ga0e9fde8766a5b5a28a76c3a6fb0276efa47cd6d
Author: Rainer Orth 
Date:   Wed May 15 13:13:48 2024 +0200

testsuite: i386: Fix g++.target/i386/pr97054.C on Solaris

g++.target/i386/pr97054.C currently FAILs on 64-bit Solaris/x86:

FAIL: g++.target/i386/pr97054.C  -std=gnu++14 (test for excess errors)
UNRESOLVED: g++.target/i386/pr97054.C  -std=gnu++14 compilation failed to produce executable
FAIL: g++.target/i386/pr97054.C  -std=gnu++17 (test for excess errors)
UNRESOLVED: g++.target/i386/pr97054.C  -std=gnu++17 compilation failed to produce executable
FAIL: g++.target/i386/pr97054.C  -std=gnu++2a (test for excess errors)
UNRESOLVED: g++.target/i386/pr97054.C  -std=gnu++2a compilation failed to produce executable
FAIL: g++.target/i386/pr97054.C  -std=gnu++98 (test for excess errors)
UNRESOLVED: g++.target/i386/pr97054.C  -std=gnu++98 compilation failed to produce executable

Excess errors:
/vol/gcc/src/hg/master/local/gcc/testsuite/g++.target/i386/pr97054.C:49:20: error: frame pointer required, but reserved

Since Solaris/x86 defaults to -fno-omit-frame-pointer, this patch
explicitly builds with -fomit-frame-pointer as is the default on other
x86 targets.

Tested on i386-pc-solaris2.11 (32 and 64-bit) and x86_64-pc-linux-gnu.

2024-05-15  Rainer Orth  

gcc/testsuite:
* g++.target/i386/pr97054.C (dg-options): Add -fomit-frame-pointer.

Diff:
---
 gcc/testsuite/g++.target/i386/pr97054.C | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.target/i386/pr97054.C b/gcc/testsuite/g++.target/i386/pr97054.C
index d0693af2a42c..b18ef2e46cae 100644
--- a/gcc/testsuite/g++.target/i386/pr97054.C
+++ b/gcc/testsuite/g++.target/i386/pr97054.C
@@ -1,6 +1,6 @@
 // { dg-do run { target { ! ia32 } } }
 // { dg-require-effective-target fstack_protector }
-// { dg-options "-O2 -fno-strict-aliasing -msse4.2 -mfpmath=sse -fPIC -fstack-protector-strong -O2" }
+// { dg-options "-O2 -fno-strict-aliasing -msse4.2 -mfpmath=sse -fPIC -fstack-protector-strong -O2 -fomit-frame-pointer" }
 
 struct p2_icode *ipc;
 register int pars asm("r13");


[gcc r15-510] libstdc++: Fix data race in std::basic_ios::fill() [PR77704]

2024-05-15 Thread Jonathan Wakely via Libstdc++-cvs
https://gcc.gnu.org/g:23ef0f68ad5fca1fd7027c5f6cb9f6d27b28

commit r15-510-g23ef0f68ad5fca1fd7027c5f6cb9f6d27b28
Author: Jonathan Wakely 
Date:   Fri May 3 20:00:08 2024 +0100

libstdc++: Fix data race in std::basic_ios::fill() [PR77704]

The lazy caching in std::basic_ios::fill() updates a mutable member
without synchronization, which can cause a data race if two threads both
call fill() on the same stream object when _M_fill_init is false.

To avoid this we can just cache the _M_fill member and set _M_fill_init
early in std::basic_ios::init, instead of doing it lazily. As explained
by the comment in init, there's a good reason for doing it lazily. When
char_type is neither char nor wchar_t, the locale might not have a
std::ctype facet, so getting the fill character would throw
an exception. The current lazy init allows using unformatted I/O with
such a stream, because the fill character is never needed and so it
doesn't matter if the locale doesn't have a ctype facet. We
can maintain this property by only setting the fill character in
std::basic_ios::init if the ctype facet is present at that time. If
fill() is called later and the fill character wasn't set by init, we can
get it from the stream's current locale at the point when fill() is
called (and not try to cache it without synchronization). If the stream
hasn't been imbued with a locale that includes the facet when we need
the fill() character, then throw bad_cast at that point.

This causes a change in behaviour for the following program:

  std::ostringstream out;
  out.imbue(loc);
  auto fill = out.fill();

Previously the fill character would have been set when fill() is called,
and so would have used the new locale. This commit changes it so that
the fill character is set on construction and isn't affected by the new
locale being imbued later. This new behaviour seems to be what the
standard requires, and matches MSVC.
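
The calls involved can be sketched as below. The helper names
fill_after_imbue and swap_fill are hypothetical. With the classic "C"
locale the result is a space under both the old and the new behaviour,
so this sketch does not by itself show the difference; that only appears
with a locale whose ctype facet widens ' ' to something else, which is
left out here.

```cpp
#include <cassert>
#include <locale>
#include <ostream>
#include <sstream>

// Same sequence as the program above: construct, imbue, then ask for
// the fill character.
char fill_after_imbue(const std::locale& loc)
{
  std::ostringstream out;
  out.imbue(loc);
  return out.fill();
}

// fill(ch) installs a new fill character and returns the previous one.
char swap_fill(std::ostream& os, char ch)
{
  return os.fill(ch);
}
```

Under the new behaviour the value returned by fill_after_imbue is the one
determined at construction, unless fill(char_type) was called in between.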

The new 27_io/basic_ios/fill/char/1.cc test verifies that it's still
possible to use a std::basic_ios without the ctype facet
being present at construction.

libstdc++-v3/ChangeLog:

PR libstdc++/77704
* include/bits/basic_ios.h (basic_ios::fill()): Do not modify
_M_fill and _M_fill_init in a const member function.
(basic_ios::fill(char_type)): Use _M_fill directly instead of
calling fill(). Set _M_fill_init to true.
* include/bits/basic_ios.tcc (basic_ios::init): Set _M_fill and
_M_fill_init here instead.
* testsuite/27_io/basic_ios/fill/char/1.cc: New test.
* testsuite/27_io/basic_ios/fill/wchar_t/1.cc: New test.

Diff:
---
 libstdc++-v3/include/bits/basic_ios.h  | 10 ++-
 libstdc++-v3/include/bits/basic_ios.tcc| 15 +++--
 .../testsuite/27_io/basic_ios/fill/char/1.cc   | 78 ++
 .../testsuite/27_io/basic_ios/fill/wchar_t/1.cc| 55 +++
 4 files changed, 148 insertions(+), 10 deletions(-)

diff --git a/libstdc++-v3/include/bits/basic_ios.h b/libstdc++-v3/include/bits/basic_ios.h
index 258e6042b8f7..bc3be4d2e371 100644
--- a/libstdc++-v3/include/bits/basic_ios.h
+++ b/libstdc++-v3/include/bits/basic_ios.h
@@ -373,11 +373,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   char_type
   fill() const
   {
-   if (!_M_fill_init)
- {
-   _M_fill = this->widen(' ');
-   _M_fill_init = true;
- }
+   if (__builtin_expect(!_M_fill_init, false))
+ return this->widen(' ');
return _M_fill;
   }
 
@@ -393,8 +390,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   char_type
   fill(char_type __ch)
   {
-   char_type __old = this->fill();
+   char_type __old = _M_fill;
_M_fill = __ch;
+   _M_fill_init = true;
return __old;
   }
 
diff --git a/libstdc++-v3/include/bits/basic_ios.tcc b/libstdc++-v3/include/bits/basic_ios.tcc
index a9313736e327..0197bdf8f671 100644
--- a/libstdc++-v3/include/bits/basic_ios.tcc
+++ b/libstdc++-v3/include/bits/basic_ios.tcc
@@ -138,13 +138,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // return without throwing an exception. Unfortunately,
   // ctype is not necessarily a required facet, so
   // streams with char_type != [char, wchar_t] will not have it by
-  // default. Because of this, the correct value for _M_fill is
-  // constructed on the first call of fill(). That way,
+  // default. If the ctype facet is available now,
+  // _M_fill is set here, but otherwise no fill character will be
+  // cached and a call to fill() will check for the facet again later
+  // (and will throw if the facet is still not present). This way
   // unformatted input and output with non-required basic_ios
   // instantiations is possible even without imbuing the 

[gcc r15-511] testsuite: Require lto-plugin in gcc.dg/ipa/ipa-icf-38.c [PR85656]

2024-05-15 Thread Rainer Orth via Gcc-cvs
https://gcc.gnu.org/g:ff105c39bde43bdb57615e3a4c7af71fbef5f26e

commit r15-511-gff105c39bde43bdb57615e3a4c7af71fbef5f26e
Author: Rainer Orth 
Date:   Wed May 15 13:23:08 2024 +0200

testsuite: Require lto-plugin in gcc.dg/ipa/ipa-icf-38.c [PR85656]

gcc.dg/ipa/ipa-icf-38.c currently FAILs on Solaris (SPARC and x86, 32
and 64-bit):

FAIL: gcc.dg/ipa/ipa-icf-38.c scan-ltrans-tree-dump-not optimized "Function bar"

As it turns out, this only happens when the Solaris linker is used; with
GNU ld the test PASSes just fine.  In fact, that happens because gld
supports the lto-plugin while ld does not: in a Solaris build with gld,
the test FAILs the same way as with ld when -fno-use-linker-plugin is
passed, so this patch requires linker_plugin.

Tested on i386-pc-solaris2.11 (ld and gld) and x86_64-pc-linux-gnu.

2024-05-15  Rainer Orth  

gcc/testsuite:
PR ipa/85656
* gcc.dg/ipa/ipa-icf-38.c: Require linker_plugin.

Diff:
---
 gcc/testsuite/gcc.dg/ipa/ipa-icf-38.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.dg/ipa/ipa-icf-38.c b/gcc/testsuite/gcc.dg/ipa/ipa-icf-38.c
index a8824d040e52..74b5e56f9180 100644
--- a/gcc/testsuite/gcc.dg/ipa/ipa-icf-38.c
+++ b/gcc/testsuite/gcc.dg/ipa/ipa-icf-38.c
@@ -2,6 +2,7 @@
 /* { dg-require-alias "" } */
 /* { dg-options "-O2 -fdump-ipa-icf-optimized -flto -fdump-tree-optimized -fno-ipa-vrp" } */
 /* { dg-require-effective-target lto } */
+/* { dg-require-effective-target linker_plugin } */
 /* { dg-additional-sources "ipa-icf-38a.c" }*/
 
 /* Based on ipa-icf-3.c.  */


[gcc r15-512] Avoid pointer compares on TYPE_MAIN_VARIANT in TBAA

2024-05-15 Thread Jan Hubicka via Gcc-cvs
https://gcc.gnu.org/g:9b7cad5884f21cc5783075be0043777448db3fab

commit r15-512-g9b7cad5884f21cc5783075be0043777448db3fab
Author: Jan Hubicka 
Date:   Wed May 15 14:14:27 2024 +0200

Avoid pointer compares on TYPE_MAIN_VARIANT in TBAA

While building more testcases for ipa-icf I noticed that there are two
places in aliasing code where we still compare TYPE_MAIN_VARIANT for
pointer equality.  This is not a good idea for LTO since type merging may
not happen, for example when in one unit the pointed-to type is forward
declared while in another it is fully defined.  We have same_type_for_tbaa
for that.

Bootstrapped/regtested x86_64-linux, OK?

gcc/ChangeLog:

* alias.cc (reference_alias_ptr_type_1): Use view_converted_memref_p.
* alias.h (view_converted_memref_p): Declare.
* tree-ssa-alias.cc (view_converted_memref_p): Export.
(ao_compare::compare_ao_refs): Use same_type_for_tbaa.

Diff:
---
 gcc/alias.cc  | 5 +
 gcc/alias.h   | 1 +
 gcc/tree-ssa-alias.cc | 6 +++---
 3 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/gcc/alias.cc b/gcc/alias.cc
index 808e2095d9b4..853e84d7439a 100644
--- a/gcc/alias.cc
+++ b/gcc/alias.cc
@@ -770,10 +770,7 @@ reference_alias_ptr_type_1 (tree *t)
   /* If the innermost reference is a MEM_REF that has a
  conversion embedded treat it like a VIEW_CONVERT_EXPR above,
  using the memory access type for determining the alias-set.  */
-  if (TREE_CODE (inner) == MEM_REF
-  && (TYPE_MAIN_VARIANT (TREE_TYPE (inner))
- != TYPE_MAIN_VARIANT
-  (TREE_TYPE (TREE_TYPE (TREE_OPERAND (inner, 1))
+  if (view_converted_memref_p (inner))
 {
   tree alias_ptrtype = TREE_TYPE (TREE_OPERAND (inner, 1));
   /* Unless we have the (aggregate) effective type of the access
diff --git a/gcc/alias.h b/gcc/alias.h
index f8d93e8b5f4c..36095f0bf736 100644
--- a/gcc/alias.h
+++ b/gcc/alias.h
@@ -41,6 +41,7 @@ bool alias_ptr_types_compatible_p (tree, tree);
 int compare_base_decls (tree, tree);
 bool refs_same_for_tbaa_p (tree, tree);
 bool mems_same_for_tbaa_p (rtx, rtx);
+bool view_converted_memref_p (tree);
 
 /* This alias set can be used to force a memory to conflict with all
other memories, creating a barrier across which no memory reference
diff --git a/gcc/tree-ssa-alias.cc b/gcc/tree-ssa-alias.cc
index 374ba04e6fd0..96301bbde7fa 100644
--- a/gcc/tree-ssa-alias.cc
+++ b/gcc/tree-ssa-alias.cc
@@ -2049,7 +2049,7 @@ decl_refs_may_alias_p (tree ref1, tree base1,
which is done by ao_ref_base and thus one extra walk
of handled components is needed.  */
 
-static bool
+bool
 view_converted_memref_p (tree base)
 {
   if (TREE_CODE (base) != MEM_REF && TREE_CODE (base) != TARGET_MEM_REF)
@@ -4330,8 +4330,8 @@ ao_compare::compare_ao_refs (ao_ref *ref1, ao_ref *ref2,
   else if ((end_struct_ref1 != NULL) != (end_struct_ref2 != NULL))
 return flags | ACCESS_PATH;
   if (end_struct_ref1
-  && TYPE_MAIN_VARIANT (TREE_TYPE (end_struct_ref1))
-!= TYPE_MAIN_VARIANT (TREE_TYPE (end_struct_ref2)))
+  && same_type_for_tbaa (TREE_TYPE (end_struct_ref1),
+TREE_TYPE (end_struct_ref2)) != 1)
 return flags | ACCESS_PATH;
 
   /* Now compare all handled components of the access path.


[gcc r15-513] AArch64: Use UZP1 instead of INS

2024-05-15 Thread Wilco Dijkstra via Gcc-cvs
https://gcc.gnu.org/g:43fb827f259e6fdea39bc4021950c810be769d58

commit r15-513-g43fb827f259e6fdea39bc4021950c810be769d58
Author: Wilco Dijkstra 
Date:   Wed May 15 13:07:27 2024 +0100

AArch64: Use UZP1 instead of INS

Use UZP1 instead of INS when combining low and high halves of vectors.
UZP1 has 3 operands which improves register allocation, and is faster on
some microarchitectures.
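
A small model of the two instructions on two-lane vectors may help. It
is only a sketch: vec2, ins_lane1 and uzp1 are names made up for the
illustration, and real code would use the assembler or ACLE intrinsics
rather than std::array.

```cpp
#include <array>
#include <cassert>

using vec2 = std::array<float, 2>;

// Model of "ins vd.s[1], vn.s[0]": the destination is also an input,
// so lane 0 of dst has to be in the destination register already.
void ins_lane1(vec2& dst, const vec2& src)
{
  dst[1] = src[0];
}

// Model of "uzp1 vd.2s, vn.2s, vm.2s": take the even-numbered lanes of
// the pair n, m.  For two-lane vectors that is { n[0], m[0] }, written
// to a destination independent of both inputs.
vec2 uzp1(const vec2& n, const vec2& m)
{
  return {n[0], m[0]};
}
```

Both produce the same lanes; the three-operand form leaves the register
allocator free to pick any destination, as the commit message notes.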

gcc:
* config/aarch64/aarch64-simd.md (aarch64_combine_internal):
Use UZP1 instead of INS.
(aarch64_combine_internal_be): Likewise.

gcc/testsuite:
* gcc.target/aarch64/ldp_stp_16.c: Update to check for UZP1.
* gcc.target/aarch64/pr109072_1.c: Likewise.
* gcc.target/aarch64/vec-init-14.c: Likewise.
* gcc.target/aarch64/vec-init-9.c: Likewise.

Diff:
---
 gcc/config/aarch64/aarch64-simd.md |  4 ++--
 gcc/testsuite/gcc.target/aarch64/ldp_stp_16.c  | 16 
 gcc/testsuite/gcc.target/aarch64/pr109072_1.c  |  4 ++--
 gcc/testsuite/gcc.target/aarch64/vec-init-14.c |  4 ++--
 gcc/testsuite/gcc.target/aarch64/vec-init-9.c  | 12 ++--
 5 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index f8bb973a278c..16b7445d9f72 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4388,7 +4388,7 @@
&& (register_operand (operands[0], mode)
|| register_operand (operands[2], mode))"
   {@ [ cons: =0 , 1  , 2   ; attrs: type   , arch  ]
- [ w, 0  , w   ; neon_ins, simd  ] ins\t%0.[1], %2.[0]
+ [ w, w  , w   ; neon_permute, simd  ] uzp1\t%0.2, %1.2, %2.2
  [ w, 0  , ?r  ; neon_from_gp, simd  ] ins\t%0.[1], %2
  [ w, 0  , ?r  ; f_mcr , * ] fmov\t%0.d[1], %2
  [ w, 0  , Utv ; neon_load1_one_lane , simd  ] ld1\t{%0.}[1], %2
@@ -4407,7 +4407,7 @@
&& (register_operand (operands[0], mode)
|| register_operand (operands[2], mode))"
   {@ [ cons: =0 , 1  , 2   ; attrs: type   , arch  ]
- [ w, 0  , w   ; neon_ins, simd  ] ins\t%0.[1], %2.[0]
+ [ w, w  , w   ; neon_permute, simd  ] uzp1\t%0.2, %1.2, %2.2
  [ w, 0  , ?r  ; neon_from_gp, simd  ] ins\t%0.[1], %2
  [ w, 0  , ?r  ; f_mcr , * ] fmov\t%0.d[1], %2
  [ w, 0  , Utv ; neon_load1_one_lane , simd  ] ld1\t{%0.}[1], %2
diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_stp_16.c b/gcc/testsuite/gcc.target/aarch64/ldp_stp_16.c
index f1f46e051a86..95835aa2eb41 100644
--- a/gcc/testsuite/gcc.target/aarch64/ldp_stp_16.c
+++ b/gcc/testsuite/gcc.target/aarch64/ldp_stp_16.c
@@ -80,16 +80,16 @@ CONS2_FN (2, float);
 
 /*
 ** cons2_4_float:  { target aarch64_little_endian }
-** ins v0.s\[1\], v1.s\[0\]
-** stp d0, d0, \[x0\]
-** stp d0, d0, \[x0, #?16\]
+** uzp1v([0-9])\.2s, v0\.2s, v1\.2s
+** stp d\1, d\1, \[x0\]
+** stp d\1, d\1, \[x0, #?16\]
 ** ret
 */
 /*
 ** cons2_4_float:  { target aarch64_big_endian }
-** ins v1.s\[1\], v0.s\[0\]
-** stp d1, d1, \[x0\]
-** stp d1, d1, \[x0, #?16\]
+** uzp1v([0-9])\.2s, v1\.2s, v0\.2s
+** stp d\1, d\1, \[x0\]
+** stp d\1, d\1, \[x0, #?16\]
 ** ret
 */
 CONS2_FN (4, float);
@@ -125,8 +125,8 @@ CONS4_FN (2, float);
 
 /*
 ** cons4_4_float:
-** ins v[0-9]+\.s[^\n]+
-** ins v[0-9]+\.s[^\n]+
+** uzp1v[0-9]+\.2s[^\n]+
+** uzp1v[0-9]+\.2s[^\n]+
 ** zip1v([0-9]+).4s, [^\n]+
 ** stp q\1, q\1, \[x0\]
 ** stp q\1, q\1, \[x0, #?32\]
diff --git a/gcc/testsuite/gcc.target/aarch64/pr109072_1.c b/gcc/testsuite/gcc.target/aarch64/pr109072_1.c
index 6c1d2b0bdccf..0fc195a598f3 100644
--- a/gcc/testsuite/gcc.target/aarch64/pr109072_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/pr109072_1.c
@@ -54,7 +54,7 @@ f32x2_1 (float32_t x)
 
 /*
 ** f32x2_2:
-** ins v0\.s\[1\], v1.s\[0\]
+** uzp1v0\.2s, v0\.2s, v1\.2s
 ** ret
 */
 float32x2_t
@@ -165,7 +165,7 @@ f64x2_1 (float64_t x)
 
 /*
 ** f64x2_2:
-** ins v0\.d\[1\], v1.d\[0\]
+** uzp1v0\.2d, v0\.2d, v1\.2d
 ** ret
 */
 float64x2_t
diff --git a/gcc/testsuite/gcc.target/aarch64/vec-init-14.c b/gcc/testsuite/gcc.target/aarch64/vec-init-14.c
index 02875088cd98..1a2cc9fbf473 100644
--- a/gcc/testsuite/gcc.target/aarch64/vec-init-14.c
+++ b/gcc/testsuite/gcc.target/aarch64/vec-init-14.c
@@ -67,7 +67,7 @@ int32x2_t s32_6(int32_t a0, int32_t a1) {
 
 /*
 ** f32_1:
-** ins v0\.s\[1\], v1\.s\[0\]
+** uzp1v0\.2s, v0\.2s, v1\.2s
 ** ret
 */
 float32x2_t f32_1(float32_t a0, float32_t a1) {
@@ -90,7 +90,7 @@ float32x2_t f32_2(float32_t a0, float32_t *ptr) {
 /*
 ** f32_3:
 ** ldr s0, \

[gcc r15-514] RISC-V: Test cbo.zero expansion for rv32

2024-05-15 Thread Christoph Müllner via Gcc-cvs
https://gcc.gnu.org/g:5609d77e683944439fae38323ecabc44a1eb4671

commit r15-514-g5609d77e683944439fae38323ecabc44a1eb4671
Author: Christoph Müllner 
Date:   Wed May 15 01:34:54 2024 +0200

RISC-V: Test cbo.zero expansion for rv32

We had an issue when expanding via cmo-zero for RV32.
This was fixed upstream, but we don't have an RV32 test.
Therefore, this patch introduces such a test.
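
The instruction counts the updated test scans for follow from simple
arithmetic on the memset length.  The sketch below is only an illustration
of that arithmetic, assuming a greedy largest-unit-first split and a
64-byte cache block; ZeroPlan and plan_zero are hypothetical names, not
part of GCC, and the address adjustment (the addi) between two cbo.zero
operations is not modelled.

```cpp
#include <cstddef>

// Hypothetical tally of the instructions used to zero a buffer; not a GCC type.
struct ZeroPlan {
  std::size_t cbo_zero;    // 64-byte cbo.zero operations
  std::size_t xlen_store;  // sd on rv64 (8 bytes) or sw on rv32 (4 bytes)
  std::size_t sw_tail;     // extra 4-byte sw, only possible on rv64
  std::size_t sh;          // 2-byte stores
  std::size_t sb;          // 1-byte stores
};

// Greedy split, largest unit first.  xlen_bytes is 8 for rv64 and 4 for rv32.
constexpr ZeroPlan plan_zero(std::size_t len, std::size_t xlen_bytes) {
  ZeroPlan p{0, 0, 0, 0, 0};
  p.cbo_zero = len / 64;
  len %= 64;
  p.xlen_store = len / xlen_bytes;
  len %= xlen_bytes;
  p.sw_tail = len / 4;  // always 0 when xlen_bytes is 4
  len %= 4;
  p.sh = len / 2;
  len %= 2;
  p.sb = len;
  return p;
}
```

For 123 bytes this gives one cbo.zero, then 59 remaining bytes split into
seven 8-byte stores on rv64 or fourteen 4-byte stores on rv32, plus one sh
and one sb; 128 bytes gives two cbo.zero and nothing else.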

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cmo-zicboz-zic64-1.c: Fix for rv32.

Signed-off-by: Christoph Müllner 

Diff:
---
 .../gcc.target/riscv/cmo-zicboz-zic64-1.c  | 37 +++---
 1 file changed, 11 insertions(+), 26 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicboz-zic64-1.c 
b/gcc/testsuite/gcc.target/riscv/cmo-zicboz-zic64-1.c
index 6d4535287d08..9192b391b11d 100644
--- a/gcc/testsuite/gcc.target/riscv/cmo-zicboz-zic64-1.c
+++ b/gcc/testsuite/gcc.target/riscv/cmo-zicboz-zic64-1.c
@@ -1,24 +1,9 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv64gc_zic64b_zicboz -mabi=lp64d" } */
+/* { dg-options "-march=rv32gc_zic64b_zicboz" { target { rv32 } } } */
+/* { dg-options "-march=rv64gc_zic64b_zicboz" { target { rv64 } } } */
 /* { dg-skip-if "" { *-*-* } {"-O0" "-Os" "-Og" "-Oz" "-flto" } } */
-/* { dg-final { check-function-bodies "**" "" } } */
-/* { dg-allow-blank-lines-in-output 1 } */
 
-/*
-**clear_buf_123:
-**...
-**cbo\.zero\t0\(a[0-9]+\)
-**sd\tzero,64\(a[0-9]+\)
-**sd\tzero,72\(a[0-9]+\)
-**sd\tzero,80\(a[0-9]+\)
-**sd\tzero,88\(a[0-9]+\)
-**sd\tzero,96\(a[0-9]+\)
-**sd\tzero,104\(a[0-9]+\)
-**sd\tzero,112\(a[0-9]+\)
-**sh\tzero,120\(a[0-9]+\)
-**sb\tzero,122\(a[0-9]+\)
-**...
-*/
+// 1x cbo.zero, 7x sd (rv64) or 14x sw (rv32), 1x sh, 1x sb
 int
 clear_buf_123 (void *p)
 {
@@ -26,17 +11,17 @@ clear_buf_123 (void *p)
   __builtin_memset (p, 0, 123);
 }
 
-/*
-**clear_buf_128:
-**...
-**cbo\.zero\t0\(a[0-9]+\)
-**addi\ta[0-9]+,a[0-9]+,64
-**cbo\.zero\t0\(a[0-9]+\)
-**...
-*/
+// 2x cbo.zero, 1x addi 64
 int
 clear_buf_128 (void *p)
 {
   p = __builtin_assume_aligned(p, 64);
   __builtin_memset (p, 0, 128);
 }
+
+/* { dg-final { scan-assembler-times "cbo\.zero\t" 3 } } */
+/* { dg-final { scan-assembler-times "addi\ta\[0-9\]+,a\[0-9\]+,64" 1 } } */
+/* { dg-final { scan-assembler-times "sd\t" 7 { target { rv64 } } } } */
+/* { dg-final { scan-assembler-times "sw\t" 14 { target { rv32 } } } } */
+/* { dg-final { scan-assembler-times "sh\t" 1 } } */
+/* { dg-final { scan-assembler-times "sb\t" 1 } } */


[gcc r12-10442] ipa: Force args obtined through pass-through maps to the expected type (PR 114247)

2024-05-15 Thread Martin Jambor via Gcc-cvs
https://gcc.gnu.org/g:44191982c6bd41db1c9d126ea2f15febec3c1f81

commit r12-10442-g44191982c6bd41db1c9d126ea2f15febec3c1f81
Author: Martin Jambor 
Date:   Tue May 14 14:13:36 2024 +0200

ipa: Force args obtined through pass-through maps to the expected type (PR 
114247)

Interactions of IPA-CP and IPA-SRA on the same data are a rather big
source of issues, I'm afraid.  PR 113964 is a situation where IPA-CP
propagates an unsigned short in a union parameter into a function
which itself calls a different function which has the same union
parameter and both these union parameters are split with IPA-SRA.  The
leaf function however uses a signed short member of the union.

In the calling function, we get the unsigned constant as the
replacement for the union and it is then passed in the call without
any type compatibility checks.  Apparently on riscv64 it matters
whether the parameter is signed or unsigned short and so the leaf
function can see different values.

Fixed by using useless_type_conversion_p at the appropriate place and
if it fails, use force_value_to_type as elsewhere in similar
situations.
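
Why the signedness of the replacement matters can be seen without any
IPA machinery.  The following is a minimal sketch, not GCC code: the two
hypothetical helpers widen the same 16-bit pattern the way a consumer
expecting unsigned short or signed short would, and the results differ
once the top bit is set.

```cpp
#include <cstdint>

// Widen a 16-bit pattern as a consumer of `unsigned short` would
// (zero extension).
constexpr long widen_as_unsigned(std::uint16_t bits) {
  return static_cast<long>(bits);
}

// Widen the same pattern as a consumer of `signed short` would
// (sign extension), written out arithmetically to stay fully defined.
constexpr long widen_as_signed(std::uint16_t bits) {
  return bits >= 0x8000 ? static_cast<long>(bits) - 0x10000
                        : static_cast<long>(bits);
}
```

With the pattern 0xFFFF the first helper yields 65535 and the second -1,
which is the difference the new test checks for through `h != -1`.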

gcc/ChangeLog:

2024-04-04  Martin Jambor  

PR ipa/114247
* ipa-param-manipulation.cc (ipa_param_adjustments::modify_call):
Force values obtined through pass-through maps to the expected
split type.

gcc/testsuite/ChangeLog:

2024-04-04  Patrick O'Neill  
Martin Jambor  

PR ipa/114247
* gcc.dg/ipa/pr114247.c: New test.

(cherry picked from commit 8cd0d29270d4ed86c69b80c08de66dcb6c1e22fe)

Diff:
---
 gcc/ipa-param-manipulation.cc   |  6 ++
 gcc/testsuite/gcc.dg/ipa/pr114247.c | 31 +++
 2 files changed, 37 insertions(+)

diff --git a/gcc/ipa-param-manipulation.cc b/gcc/ipa-param-manipulation.cc
index 38328c3e8d0a..3472ef13bc2f 100644
--- a/gcc/ipa-param-manipulation.cc
+++ b/gcc/ipa-param-manipulation.cc
@@ -719,6 +719,12 @@ ipa_param_adjustments::modify_call (cgraph_edge *cs,
  }
   if (repl)
{
+ if (!useless_type_conversion_p(apm->type, repl->typed.type))
+   {
+ repl = force_value_to_type (apm->type, repl);
+ repl = force_gimple_operand_gsi (&gsi, repl,
+  true, NULL, true, GSI_SAME_STMT);
+   }
  vargs.quick_push (repl);
  continue;
}
diff --git a/gcc/testsuite/gcc.dg/ipa/pr114247.c 
b/gcc/testsuite/gcc.dg/ipa/pr114247.c
new file mode 100644
index ..60aa2bc0122f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/pr114247.c
@@ -0,0 +1,31 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fsigned-char -fno-strict-aliasing -fwrapv" } */
+
+union a {
+  unsigned short b;
+  int c;
+  signed short d;
+};
+int e, f = 1, g;
+long h;
+const int **i;
+void j(union a k, int l, unsigned m) {
+  const int *a[100];
+  i = &a[0];
+  h = k.d;
+}
+static int o(union a k) {
+  k.d = -1;
+  while (1)
+if (f)
+  break;
+  j(k, g, e);
+  return 0;
+}
+int main() {
+  union a n = {1};
+  o(n);
+  if (h != -1)
+__builtin_abort();
+  return 0;
+}


[gcc r12-10443] ipa: Self-DCE of uses of removed call LHSs (PR 108007)

2024-05-15 Thread Martin Jambor via Gcc-cvs
https://gcc.gnu.org/g:2183e5b5aa3a080624cb95a06993e34dedd09cb2

commit r12-10443-g2183e5b5aa3a080624cb95a06993e34dedd09cb2
Author: Martin Jambor 
Date:   Mon Apr 8 17:34:33 2024 +0200

ipa: Self-DCE of uses of removed call LHSs (PR 108007)

PR 108007 is another manifestation where we rely on DCE to clean up
after IPA-SRA and if the user explicitly switches DCE off, IPA-SRA
can leave behind statements which are fed uninitialized values and
trap, even though their results are themselves never used.

I have already fixed this for unused parameters in callees, this bug
shows that almost the same thing can happen for removed returns, on
the side of callers.  This means that the issue has to be fixed
elsewhere, in call redirection.  This patch adds a function which
looks for (and through, using a work-list) uses of operations fed
specific SSA names and removes them all.

That would have been easy if it wasn't for debug statements during
tree-inline (from which call redirection is also invoked).  Debug
statements are decoupled from the rest at this point and iterating
over uses of SSAs does not bring them up.  During tree-inline they are
handled specially at the end, I assume in order to make sure that
relative ordering of UIDs is the same with and without debug info.

This means that during tree-inline we need to make a hash of killed
SSAs, that we already have in copy_body_data, available to the
function making the purging.  So the patch duly does also that, making
the interface slightly ugly.  Moreover, all newly unused SSA names
need to be freed and as PR 112616 showed, it must be done in a defined
order, which is what newly added ipa_release_ssas_in_hash does.

This backport to gcc-13 also contains
54e505d0446f86b7ad383acbb8e5501f20872b64 in order not to reintroduce
PR 113757.

gcc/ChangeLog:

2024-04-05  Martin Jambor  

PR ipa/108007
PR ipa/112616
* cgraph.h (cgraph_edge): Add a parameter to
redirect_call_stmt_to_callee.
* ipa-param-manipulation.h (ipa_param_adjustments): Add a
parameter to modify_call.
(ipa_release_ssas_in_hash): Declare.
* cgraph.cc (cgraph_edge::redirect_call_stmt_to_callee): New
parameter killed_ssas, pass it to padjs->modify_call.
* ipa-param-manipulation.cc (purge_all_uses): New function.
(ipa_param_adjustments::modify_call): New parameter killed_ssas.
Instead of substituting uses, invoke purge_all_uses.  If
hash of killed SSAs has not been provided, create a temporary one
and release SSAs that have been added to it.
(compare_ssa_versions): New function.
(ipa_release_ssas_in_hash): Likewise.
* tree-inline.cc (redirect_all_calls): Create
id->killed_new_ssa_names earlier, pass it to edge redirection,
adjust a comment.
(copy_body): Release SSAs in id->killed_new_ssa_names.

gcc/testsuite/ChangeLog:

2024-01-15  Martin Jambor  

PR ipa/108007
PR ipa/112616
* gcc.dg/ipa/pr108007.c: New test.
* gcc.dg/ipa/pr112616.c: Likewise.

(cherry picked from commit 40ddc0b05a47f999b24f20c1becb79004995731b)

Diff:
---
 gcc/cgraph.cc   |  10 +++-
 gcc/cgraph.h|   9 ++-
 gcc/ipa-param-manipulation.cc   | 112 +---
 gcc/ipa-param-manipulation.h|   5 +-
 gcc/testsuite/g++.dg/ipa/pr113757.C |  14 +
 gcc/testsuite/gcc.dg/ipa/pr108007.c |  32 +++
 gcc/testsuite/gcc.dg/ipa/pr112616.c |  28 +
 gcc/tree-inline.cc  |  27 -
 8 files changed, 193 insertions(+), 44 deletions(-)

diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc
index 3734c85db637..b5cfa3b36c57 100644
--- a/gcc/cgraph.cc
+++ b/gcc/cgraph.cc
@@ -1403,11 +1403,17 @@ cgraph_edge::redirect_callee (cgraph_node *n)
speculative indirect call, remove "speculative" of the indirect call and
also redirect stmt to it's final direct target.
 
+   When called from within tree-inline, KILLED_SSAs has to contain the pointer
+   to killed_new_ssa_names within the copy_body_data structure and SSAs
+   discovered to be useless (if LHS is removed) will be added to it, otherwise
+   it needs to be NULL.
+
It is up to caller to iteratively transform each "speculative"
direct call as appropriate.  */
 
 gimple *
-cgraph_edge::redirect_call_stmt_to_callee (cgraph_edge *e)
+cgraph_edge::redirect_call_stmt_to_callee (cgraph_edge *e,
+  hash_set <tree> *killed_ssas)
 {
   tree decl = gimple_call_fndecl (e->call_stmt);
   gcall *new_stmt;
@@ -1528,7 +1534,7 @@ cgraph_edge::redirect_call_stmt_to_callee (cgraph_edge *e)
remove_stmt_from_eh_lp (e-

[gcc r15-515] c++: add test for DR 2855

2024-05-15 Thread Marek Polacek via Gcc-cvs
https://gcc.gnu.org/g:3cefe91b74f65f0db71472e01ca83c69e2cd81cc

commit r15-515-g3cefe91b74f65f0db71472e01ca83c69e2cd81cc
Author: Marek Polacek 
Date:   Tue May 14 13:48:30 2024 -0400

c++: add test for DR 2855

Let

  int8_t x = 127;

This DR says that while

  x++;

invokes UB,

  ++x;

does not.  The resolution was to make the first one valid.  The
following test verifies that we don't report any errors in a constexpr
context.
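
As an illustration only (this is not the committed test), the three
hypothetical functions below spell out the equivalence being relied on:
the increment is computed after promotion to int and the result is then
converted back to the narrow type, whether the operator is postfix or
prefix.  The sketch assumes C++14 or later for the constexpr bodies.

```cpp
using i8 = signed char;

// Postfix increment, then read the stored value back.
constexpr int post_inc(i8 x) {
  x++;
  return x;
}

// Prefix increment, then read the stored value back.
constexpr int pre_inc(i8 x) {
  ++x;
  return x;
}

// The same update written out: promote, add one, convert back.
constexpr int spelled_out(i8 x) {
  x = static_cast<i8>(x + 1);
  return x;
}
```

All three should agree for every input, including 127.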

DR 2855

gcc/testsuite/ChangeLog:

* g++.dg/DRs/dr2855.C: New test.

Diff:
---
 gcc/testsuite/g++.dg/DRs/dr2855.C | 16 
 1 file changed, 16 insertions(+)

diff --git a/gcc/testsuite/g++.dg/DRs/dr2855.C 
b/gcc/testsuite/g++.dg/DRs/dr2855.C
new file mode 100644
index ..b4609ceaa158
--- /dev/null
+++ b/gcc/testsuite/g++.dg/DRs/dr2855.C
@@ -0,0 +1,16 @@
+// DR 2855, Undefined behavior in postfix increment
+// { dg-do compile { target c++14 } }
+
+using int8_t = signed char;
+
+constexpr int
+f ()
+{
+  int8_t x = 127;
+  x++;
+  int8_t z = 127;
+  ++z;
+  return x + z;
+}
+
+constexpr int i = f();


[gcc r15-516] PR modula2/115057 TextIO.ReadRestLine raises an exception when buffer is exceeded

2024-05-15 Thread Gaius Mulley via Gcc-cvs
https://gcc.gnu.org/g:680af0e1e90d4b80260d173636dfe15654fd470d

commit r15-516-g680af0e1e90d4b80260d173636dfe15654fd470d
Author: Gaius Mulley 
Date:   Wed May 15 16:58:21 2024 +0100

PR modula2/115057 TextIO.ReadRestLine raises an exception when buffer is 
exceeded

TextIO.ReadRestLine will raise an "attempting to read beyond end of file"
exception if the buffer is exceeded.  This bug is caused by
TextIO.ReadRestLine calling IOChan.Skip without a preceding IOChan.Look.
The Look procedure will update the status result whereas
Skip always sets read result to allRight.
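
The shape of the fix, reading every character through the same call and
merely dropping those that do not fit, can be sketched in C++ as follows.
This is an analogy built on std::istream, not a translation of the
Modula-2 library; read_rest_line and its capacity parameter are
hypothetical.

```cpp
#include <cstddef>
#include <sstream>  // std::istringstream for trying it out; provides std::istream
#include <string>

// Reads up to and including the next '\n'.  At most `capacity`
// characters are kept; the remainder of the line is still read through
// the same get() call, so the stream state is updated on every step,
// and is then discarded.
inline std::string read_rest_line(std::istream &in, std::size_t capacity) {
  std::string kept;
  char c;
  while (in.get(c) && c != '\n') {
    if (kept.size() < capacity)
      kept.push_back(c);
  }
  return kept;
}
```

Called twice with a capacity of 6 on a stream holding "a line of text
exceeding 6 chars" followed by "second", it returns "a line" and then
"second"; the overflow of the first line does not leak into the second
read.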

gcc/m2/ChangeLog:

PR modula2/115057
* gm2-libs-iso/TextIO.mod (ReadRestLine): Use ReadChar to
skip unwanted characters as this calls IOChan.Look and updates
the cid result status.  A Skip without a Look does not update
the status.  Skip always sets read result to allRight.
* gm2-libs-iso/TextUtil.def (SkipSpaces): Improve comments.
(CharAvailable): Improve comments.
* gm2-libs-iso/TextUtil.mod (SkipSpaces): Improve comments.
(CharAvailable): Improve comments.

gcc/testsuite/ChangeLog:

PR modula2/115057
* gm2/isolib/run/pass/testrestline.mod: New test.
* gm2/isolib/run/pass/testrestline2.mod: New test.
* gm2/isolib/run/pass/testrestline3.mod: New test.

Signed-off-by: Gaius Mulley 

Diff:
---
 gcc/m2/gm2-libs-iso/TextIO.mod  | 13 -
 gcc/m2/gm2-libs-iso/TextUtil.def|  6 +-
 gcc/m2/gm2-libs-iso/TextUtil.mod|  6 +-
 gcc/testsuite/gm2/isolib/run/pass/testrestline.mod  | 20 
 gcc/testsuite/gm2/isolib/run/pass/testrestline2.mod | 17 +
 gcc/testsuite/gm2/isolib/run/pass/testrestline3.mod | 16 
 6 files changed, 71 insertions(+), 7 deletions(-)

diff --git a/gcc/m2/gm2-libs-iso/TextIO.mod b/gcc/m2/gm2-libs-iso/TextIO.mod
index 78d67187b799..5204467b1921 100644
--- a/gcc/m2/gm2-libs-iso/TextIO.mod
+++ b/gcc/m2/gm2-libs-iso/TextIO.mod
@@ -114,13 +114,19 @@ PROCEDURE ReadRestLine (cid: IOChan.ChanId; VAR s: ARRAY 
OF CHAR);
   *)
 VAR
i, h: CARDINAL ;
+   ignore  : CHAR ;
finished: BOOLEAN ;
 BEGIN
h := HIGH(s) ;
i := 0 ;
finished := FALSE ;
-   WHILE (i<=h) AND CharAvailable (cid) AND (NOT finished) DO
-  ReadChar (cid, s[i]) ;
+   WHILE CharAvailable (cid) AND (NOT finished) DO
+  IF i <= h
+  THEN
+ ReadChar (cid, s[i])
+  ELSE
+ ReadChar (cid, ignore)
+  END ;
   IF EofOrEoln (cid)
   THEN
  finished := TRUE
@@ -128,9 +134,6 @@ BEGIN
  INC (i)
   END
END ;
-   WHILE CharAvailable (cid) DO
-  IOChan.Skip (cid)
-   END ;
SetNul (cid, i, s, TRUE)
 END ReadRestLine ;
 
diff --git a/gcc/m2/gm2-libs-iso/TextUtil.def b/gcc/m2/gm2-libs-iso/TextUtil.def
index ead045617233..7e6b3ed07dce 100644
--- a/gcc/m2/gm2-libs-iso/TextUtil.def
+++ b/gcc/m2/gm2-libs-iso/TextUtil.def
@@ -45,11 +45,15 @@ IMPORT IOChan ;
 PROCEDURE SkipSpaces (cid: IOChan.ChanId) ;
 
 
-(* The following procedures do not read past line marks.  *)
+(* CharAvailable returns TRUE if IOChan.ReadResult is notKnown or
+   allRight.  *)
 
 PROCEDURE CharAvailable (cid: IOChan.ChanId) : BOOLEAN ;
 
 
+(* EofOrEoln returns TRUE if IOChan.ReadResult is endOfLine or
+   endOfInput.  *)
+
 PROCEDURE EofOrEoln (cid: IOChan.ChanId) : BOOLEAN ;
 
 
diff --git a/gcc/m2/gm2-libs-iso/TextUtil.mod b/gcc/m2/gm2-libs-iso/TextUtil.mod
index 44dbd1c69f8b..ad5786ca2fb2 100644
--- a/gcc/m2/gm2-libs-iso/TextUtil.mod
+++ b/gcc/m2/gm2-libs-iso/TextUtil.mod
@@ -23,7 +23,8 @@ BEGIN
 END SkipSpaces ;
 
 
-(* The following procedures do not read past line marks.  *)
+(* CharAvailable returns TRUE if IOChan.ReadResult is notKnown or
+   allRight.  *)
 
 PROCEDURE CharAvailable (cid: IOChan.ChanId) : BOOLEAN ;
 BEGIN
@@ -32,6 +33,9 @@ BEGIN
 END CharAvailable ;
 
 
+(* EofOrEoln returns TRUE if IOChan.ReadResult is endOfLine or
+   endOfInput.  *)
+
 PROCEDURE EofOrEoln (cid: IOChan.ChanId) : BOOLEAN ;
 BEGIN
RETURN( (IOChan.ReadResult (cid) = IOConsts.endOfLine) OR
diff --git a/gcc/testsuite/gm2/isolib/run/pass/testrestline.mod 
b/gcc/testsuite/gm2/isolib/run/pass/testrestline.mod
new file mode 100644
index ..7702e88bdd95
--- /dev/null
+++ b/gcc/testsuite/gm2/isolib/run/pass/testrestline.mod
@@ -0,0 +1,20 @@
+MODULE testrestline ;
+
+IMPORT SeqFile, TextIO ;
+
+VAR
+   chan: SeqFile.ChanId ;
+   line: ARRAY [0..5] OF CHAR ;
+   results : SeqFile.OpenResults ;
+BEGIN
+   SeqFile.OpenWrite (chan, "test.input", SeqFile.write, results) ;
+   TextIO.WriteString (chan, "a line of text exceeding 6 chars") ;
+   TextIO.WriteLn (chan) ;
+   TextIO.WriteString (chan, "a second lineline of text exceeding 6 chars") ;
+   TextIO.WriteLn (chan) ;
+   SeqFile.Close (chan

[gcc r14-10210] Avoid changing type in the type_hash_canon hash

2024-05-15 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:573e1df0ec8428e564c97af7c237a5e0c98c59bd

commit r14-10210-g573e1df0ec8428e564c97af7c237a5e0c98c59bd
Author: Richard Biener 
Date:   Fri May 3 11:48:07 2024 +0200

Avoid changing type in the type_hash_canon hash

When building a type and type_hash_canon returns an existing type,
avoid changing it, in particular its TYPE_CANONICAL.
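
The pattern can be shown on a toy interning table.  This is a sketch
under stated assumptions, not GCC's implementation: Type, TypeTable and
build_type are hypothetical, and finish_count merely stands in for the
later work (setting TYPE_CANONICAL, layout) that must not be repeated on
a node that already existed.

```cpp
#include <deque>
#include <string>
#include <unordered_map>

// Toy type node; finish_count counts how often post-processing ran.
struct Type {
  std::string key;
  int finish_count = 0;
};

class TypeTable {
  std::deque<Type> arena_;  // nodes are never freed, so pointers stay valid
  std::unordered_map<std::string, Type *> interned_;

 public:
  Type *make(std::string key) {
    arena_.push_back(Type{std::move(key)});
    return &arena_.back();
  }
  // Returns the node already interned under probe->key, else interns probe.
  Type *canon(Type *probe) {
    return interned_.emplace(probe->key, probe).first->second;
  }
};

inline Type *build_type(TypeTable &tab, const std::string &key) {
  Type *t = tab.make(key);
  Type *probe = t;
  t = tab.canon(t);
  if (t != probe)
    return t;          // an older node exists: return it untouched
  ++t->finish_count;   // only a genuinely new node is post-processed
  return t;
}
```

Building the same key twice returns the same node, and its
finish_count stays at 1.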

PR middle-end/114931
* tree.cc (build_array_type_1): Return early when type_hash_canon
returned an older existing type.
(build_function_type): Likewise.
(build_method_type_directly): Likewise.
(build_offset_type): Likewise.

(cherry picked from commit 7a212ac678e13e0df5da2d090144b246a1262b64)

Diff:
---
 gcc/tree.cc | 12 
 1 file changed, 12 insertions(+)

diff --git a/gcc/tree.cc b/gcc/tree.cc
index 83f3bf306afa..780662549fea 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -7352,7 +7352,10 @@ build_array_type_1 (tree elt_type, tree index_type, bool 
typeless_storage,
   if (shared)
 {
   hashval_t hash = type_hash_canon_hash (t);
+  tree probe_type = t;
   t = type_hash_canon (hash, t);
+  if (t != probe_type)
+   return t;
 }
 
   if (TYPE_CANONICAL (t) == t && set_canonical)
@@ -7509,7 +7512,10 @@ build_function_type (tree value_type, tree arg_types,
 
   /* If we already have such a type, use the old one.  */
   hashval_t hash = type_hash_canon_hash (t);
+  tree probe_type = t;
   t = type_hash_canon (hash, t);
+  if (t != probe_type)
+return t;
 
   /* Set up the canonical type. */
   any_structural_p   = TYPE_STRUCTURAL_EQUALITY_P (value_type);
@@ -7663,7 +7669,10 @@ build_method_type_directly (tree basetype,
 
   /* If we already have such a type, use the old one.  */
   hashval_t hash = type_hash_canon_hash (t);
+  tree probe_type = t;
   t = type_hash_canon (hash, t);
+  if (t != probe_type)
+return t;
 
   /* Set up the canonical type. */
   any_structural_p
@@ -7720,7 +7729,10 @@ build_offset_type (tree basetype, tree type)
 
   /* If we already have such a type, use the old one.  */
   hashval_t hash = type_hash_canon_hash (t);
+  tree probe_type = t;
   t = type_hash_canon (hash, t);
+  if (t != probe_type)
+return t;
 
   if (!COMPLETE_TYPE_P (t))
 layout_type (t);


[gcc r14-10211] middle-end/114931 - type_hash_canon and structual equality types

2024-05-15 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:1d89cb43943e77d0bbb48fd5a58a352bdd3d82c7

commit r14-10211-g1d89cb43943e77d0bbb48fd5a58a352bdd3d82c7
Author: Richard Biener 
Date:   Fri May 3 10:44:50 2024 +0200

middle-end/114931 - type_hash_canon and structual equality types

TYPE_STRUCTURAL_EQUALITY_P is part of our type system so we have
to make sure to include it in the type unification done via
type_hash_canon.  This requires the flag to be set before querying
the hash, which is the biggest part of the patch.

PR middle-end/114931
gcc/
* tree.cc (type_hash_canon_hash): Hash TYPE_STRUCTURAL_EQUALITY_P.
(type_cache_hasher::equal): Compare TYPE_STRUCTURAL_EQUALITY_P.
(build_array_type_1): Set TYPE_STRUCTURAL_EQUALITY_P before
probing with type_hash_canon.
(build_function_type): Likewise.
(build_method_type_directly): Likewise.
(build_offset_type): Likewise.
(build_complex_type): Likewise.
* attribs.cc (build_type_attribute_qual_variant): Likewise.

gcc/c-family/
* c-common.cc (complete_array_type): Set TYPE_STRUCTURAL_EQUALITY_P
before probing with type_hash_canon.

gcc/testsuite/
* gcc.dg/pr114931.c: New testcase.

(cherry picked from commit b09c2e9560648b0cf993c2ca9ad972c34e6bddfa)

Diff:
---
 gcc/attribs.cc  | 20 ++---
 gcc/c-family/c-common.cc| 11 +--
 gcc/testsuite/gcc.dg/pr114931.c | 10 +++
 gcc/tree.cc | 65 +
 4 files changed, 74 insertions(+), 32 deletions(-)

diff --git a/gcc/attribs.cc b/gcc/attribs.cc
index 12ffc5f170a1..3ab0b0fd87a4 100644
--- a/gcc/attribs.cc
+++ b/gcc/attribs.cc
@@ -1336,6 +1336,16 @@ build_type_attribute_qual_variant (tree otype, tree 
attribute, int quals)
   tree dtype = ntype = build_distinct_type_copy (ttype);
 
   TYPE_ATTRIBUTES (ntype) = attribute;
+  /* If the target-dependent attributes make NTYPE different from
+its canonical type, we will need to use structural equality
+checks for this type.
+
+We shouldn't get here for stripping attributes from a type;
+the no-attribute type might not need structural comparison.  But
+we can if was discarded from type_hash_table.  */
+  if (TYPE_STRUCTURAL_EQUALITY_P (ttype)
+ || !comp_type_attributes (ntype, ttype))
+   SET_TYPE_STRUCTURAL_EQUALITY (ntype);
 
   hashval_t hash = type_hash_canon_hash (ntype);
   ntype = type_hash_canon (hash, ntype);
@@ -1343,16 +1353,6 @@ build_type_attribute_qual_variant (tree otype, tree 
attribute, int quals)
   if (ntype != dtype)
/* This variant was already in the hash table, don't mess with
   TYPE_CANONICAL.  */;
-  else if (TYPE_STRUCTURAL_EQUALITY_P (ttype)
-  || !comp_type_attributes (ntype, ttype))
-   /* If the target-dependent attributes make NTYPE different from
-  its canonical type, we will need to use structural equality
-  checks for this type.
-
-  We shouldn't get here for stripping attributes from a type;
-  the no-attribute type might not need structural comparison.  But
-  we can if was discarded from type_hash_table.  */
-   SET_TYPE_STRUCTURAL_EQUALITY (ntype);
   else if (TYPE_CANONICAL (ntype) == ntype)
TYPE_CANONICAL (ntype) = TYPE_CANONICAL (ttype);
 
diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
index d14591c7bd3b..aae998d0f738 100644
--- a/gcc/c-family/c-common.cc
+++ b/gcc/c-family/c-common.cc
@@ -7115,6 +7115,13 @@ complete_array_type (tree *ptype, tree initial_value, 
bool do_default)
   TYPE_TYPELESS_STORAGE (main_type) = TYPE_TYPELESS_STORAGE (type);
   layout_type (main_type);
 
+  /* Set TYPE_STRUCTURAL_EQUALITY_P early.  */
+  if (TYPE_STRUCTURAL_EQUALITY_P (TREE_TYPE (main_type))
+  || TYPE_STRUCTURAL_EQUALITY_P (TYPE_DOMAIN (main_type)))
+SET_TYPE_STRUCTURAL_EQUALITY (main_type);
+  else
+TYPE_CANONICAL (main_type) = main_type;
+
   /* Make sure we have the canonical MAIN_TYPE. */
   hashval_t hashcode = type_hash_canon_hash (main_type);
   main_type = type_hash_canon (hashcode, main_type);
@@ -7122,7 +7129,7 @@ complete_array_type (tree *ptype, tree initial_value, 
bool do_default)
   /* Fix the canonical type.  */
   if (TYPE_STRUCTURAL_EQUALITY_P (TREE_TYPE (main_type))
   || TYPE_STRUCTURAL_EQUALITY_P (TYPE_DOMAIN (main_type)))
-SET_TYPE_STRUCTURAL_EQUALITY (main_type);
+gcc_assert (TYPE_STRUCTURAL_EQUALITY_P (main_type));
   else if (TYPE_CANONICAL (TREE_TYPE (main_type)) != TREE_TYPE (main_type)
   || (TYPE_CANONICAL (TYPE_DOMAIN (main_type))
   != TYPE_DOMAIN (main_type)))
@@ -7130,8 +7137,6 @@ complete_array_type (tree *ptype, tree initial_value, 
bool do_default)
   = build_array_type (TYPE_CANONICAL (TREE_TYPE (main_type)),

[gcc r15-517] middle-end/111422 - wrong stack var coalescing, handle PHIs

2024-05-15 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:ab25eef36400e8c1d28e3ed059c5f95a38b45f17

commit r15-517-gab25eef36400e8c1d28e3ed059c5f95a38b45f17
Author: Richard Biener 
Date:   Wed May 15 13:06:30 2024 +0200

middle-end/111422 - wrong stack var coalescing, handle PHIs

The gcc.c-torture/execute/pr111422.c testcase after installing the
sink pass improvement reveals that we also need to handle

 _65 = &g + _58;  _44 = &g + _43;
 # _59 = PHI <_65, _44>
 *_59 = 8;
 g = {v} {CLOBBER(eos)};
 ...
 n[0] = &f;
 *_59 = 8;
 g = {v} {CLOBBER(eos)};

where we fail to see the conflict between n and g after the first
clobber of g.  Before the sinking improvement there was a conflict
recorded on a path where _65/_44 are unused, so the real conflict
was missed but the fake one avoided the miscompile.

The following handles PHI defs in add_scope_conflicts_2 which
fixes the issue.
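
The idea can be sketched on a toy IR.  Def and referenced_vars below are
hypothetical stand-ins, not GCC data structures: a definition either
takes the address of a variable, is a PHI over other definitions, or is
something else, and a PHI is handled by looking one level into its
arguments, as the patch does.

```cpp
#include <set>
#include <string>
#include <vector>

// Toy stand-in for a GIMPLE definition; not GCC's representation.
struct Def {
  enum Kind { AddrOf, Phi, Other } kind;
  std::string var;                // AddrOf: variable whose address is taken
  std::vector<const Def *> args;  // Phi: incoming definitions
};

// Record the variable if D is an address-taking definition.
inline void visit_addr(const Def *d, std::set<std::string> &out) {
  if (d && d->kind == Def::AddrOf)
    out.insert(d->var);
}

// Variables whose address may flow into D, directly or through one PHI.
inline std::set<std::string> referenced_vars(const Def &d) {
  std::set<std::string> out;
  if (d.kind == Def::AddrOf)
    out.insert(d.var);
  else if (d.kind == Def::Phi)
    for (const Def *arg : d.args)
      visit_addr(arg, out);
  return out;
}
```

For a PHI whose two arguments both take the address of g, as in the
example above, the result is the single variable g, so a conflict with g
can be recorded at the use of the PHI result.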

PR middle-end/111422
* cfgexpand.cc (add_scope_conflicts_2): Handle PHIs
by recursing to their arguments.

Diff:
---
 gcc/cfgexpand.cc | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
index 557cb28733bd..8de5f2ba58b7 100644
--- a/gcc/cfgexpand.cc
+++ b/gcc/cfgexpand.cc
@@ -584,10 +584,21 @@ add_scope_conflicts_2 (tree use, bitmap work,
  || INTEGRAL_TYPE_P (TREE_TYPE (use
 {
   gimple *g = SSA_NAME_DEF_STMT (use);
-  if (is_gimple_assign (g))
-   if (tree op = gimple_assign_rhs1 (g))
- if (TREE_CODE (op) == ADDR_EXPR)
-   visit (g, TREE_OPERAND (op, 0), op, work);
+  if (gassign *a = dyn_cast <gassign *> (g))
+   {
+ if (tree op = gimple_assign_rhs1 (a))
+   if (TREE_CODE (op) == ADDR_EXPR)
+ visit (a, TREE_OPERAND (op, 0), op, work);
+   }
+  else if (gphi *p = dyn_cast <gphi *> (g))
+   for (unsigned i = 0; i < gimple_phi_num_args (p); ++i)
+ if (TREE_CODE (use = gimple_phi_arg_def (p, i)) == SSA_NAME)
+   if (gassign *a = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (use)))
+ {
+   if (tree op = gimple_assign_rhs1 (a))
+ if (TREE_CODE (op) == ADDR_EXPR)
+   visit (a, TREE_OPERAND (op, 0), op, work);
+ }
 }
 }


[gcc r15-518] tree-optimization/114589 - remove profile based sink heuristics

2024-05-15 Thread Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:99b1daae18c095d6c94d32efb77442838e11cbfb

commit r15-518-g99b1daae18c095d6c94d32efb77442838e11cbfb
Author: Richard Biener 
Date:   Fri May 3 14:04:41 2024 +0200

tree-optimization/114589 - remove profile based sink heuristics

The following removes the profile-based heuristic limiting sinking
and instead uses post-dominators to avoid sinking to places that
are executed under the same conditions as the earlier location, which
the profile-based heuristic should have guaranteed as well.

To avoid regressions, this moves the empty-latch check to cover all
sink cases.

It also streamlines the resulting select_best_block a bit but avoids
adjusting heuristics further with this change.  gfortran.dg/streamio_9.f90
starts failing at execution with this on x86_64 with -m32 because the
(float)i * 9....e-7 computation is sunk across a STOP, so it is
no longer spilled and the comparison fails due to excess
precision.  The patch adds -ffloat-store to avoid this, following
other similar testcases.

This change fixes the testcase in the PR only when using -fno-ivopts
as otherwise VRP is confused.

PR tree-optimization/114589
* tree-ssa-sink.cc (select_best_block): Remove profile-based
heuristics.  Instead reject sink locations that sink
to post-dominators.  Move empty latch check here from
statement_sink_location.  Also consider early_bb for the
loop depth check.
(statement_sink_location): Remove superfluous check.  Remove
empty latch check.
(pass_sink_code::execute): Compute/release post-dominators.

* gfortran.dg/streamio_9.f90: Use -ffloat-store to avoid
excess precision when not spilling.
* g++.dg/tree-ssa/pr114589.C: New testcase.

Diff:
---
 gcc/testsuite/g++.dg/tree-ssa/pr114589.C | 22 
 gcc/testsuite/gfortran.dg/streamio_9.f90 |  1 +
 gcc/tree-ssa-sink.cc | 62 ++--
 3 files changed, 42 insertions(+), 43 deletions(-)

diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr114589.C 
b/gcc/testsuite/g++.dg/tree-ssa/pr114589.C
new file mode 100644
index ..85bb6d03015b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr114589.C
@@ -0,0 +1,22 @@
+// { dg-do compile { target c++11 } }
+// { dg-options "-O2 -fno-ivopts -fdump-tree-optimized" }
+
+template <typename T>
+struct simple_optional {
+bool has_val;
+T val;
+
+auto begin() const -> T const* { return &val; }
+auto end() const -> T const* { return &val + (has_val ? 1 : 0); }
+};
+
+void f(int);
+
+void call_f(simple_optional<int> const& o) {
+for (int i : o) {
+f(i);
+}
+}
+
+// Only a conditional execution of 'f' should prevail, no loop
+// { dg-final { scan-tree-dump-times ".
 ! Test case derived from that given in PR by Steve Kargl.
diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
index 2f90acb7ef48..2188b7523c7b 100644
--- a/gcc/tree-ssa-sink.cc
+++ b/gcc/tree-ssa-sink.cc
@@ -178,15 +178,7 @@ nearest_common_dominator_of_uses (def_operand_p def_p, 
bool *debug_stmts)
 
We want the most control dependent block in the shallowest loop nest.
 
-   If the resulting block is in a shallower loop nest, then use it.  Else
-   only use the resulting block if it has significantly lower execution
-   frequency than EARLY_BB to avoid gratuitous statement movement.  We
-   consider statements with VOPS more desirable to move.
-
-   This pass would obviously benefit from PDO as it utilizes block
-   frequencies.  It would also benefit from recomputing frequencies
-   if profile data is not available since frequencies often get out
-   of sync with reality.  */
+   If the resulting block is in a shallower loop nest, then use it.  */
 
 static basic_block
 select_best_block (basic_block early_bb,
@@ -195,18 +187,17 @@ select_best_block (basic_block early_bb,
 {
   basic_block best_bb = late_bb;
   basic_block temp_bb = late_bb;
-  int threshold;
 
   while (temp_bb != early_bb)
 {
+  /* Walk up the dominator tree, hopefully we'll find a shallower
+loop nest.  */
+  temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
+
   /* If we've moved into a lower loop nest, then that becomes
 our best block.  */
   if (bb_loop_depth (temp_bb) < bb_loop_depth (best_bb))
best_bb = temp_bb;
-
-  /* Walk up the dominator tree, hopefully we'll find a shallower
-loop nest.  */
-  temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
 }
 
   /* Placing a statement before a setjmp-like function would be invalid
@@ -221,6 +212,16 @@ select_best_block (basic_block early_bb,
   if (bb_loop_depth (best_bb) < bb_loop_depth (early_bb))
 return best_bb;
 
+  /* Do not move stmts to post-dominating places on the same loop depth.  */
+  if (dominated_by_p (CDI_POST_DOMINATORS, early_bb, 

[gcc r15-519] openmp: Diagnose using grainsize+num_tasks clauses together [PR115103]

2024-05-15 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:7fdbefc575c24881356b5f4091fa57b5f7166a90

commit r15-519-g7fdbefc575c24881356b5f4091fa57b5f7166a90
Author: Jakub Jelinek 
Date:   Wed May 15 18:34:44 2024 +0200

openmp: Diagnose using grainsize+num_tasks clauses together [PR115103]

I've noticed that while we diagnose many other mutually exclusive
OpenMP clauses, we don't diagnose grainsize together with num_tasks on
the taskloop construct in any of C, C++ and Fortran (the implementation
simply ignored grainsize in that case), and for Fortran we also don't
diagnose mixing the nogroup clause with reduction clause(s).

Fixed thusly.
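
The check has the same shape as the existing exclusive-clause checks:
remember whether each clause was seen, and if both were, report an error
and drop one of them.  Below is a minimal sketch on a list of clause
names; drop_conflicting_grainsize is a hypothetical helper, not a front
end function, and its return value stands in for emitting the
diagnostic.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Returns true when both grainsize and num_tasks are present, in which
// case the grainsize clause is removed from the list (mirroring how the
// front ends drop the offending clause after diagnosing it).
inline bool drop_conflicting_grainsize(std::vector<std::string> &clauses) {
  auto grainsize = std::find(clauses.begin(), clauses.end(), "grainsize");
  bool num_tasks_seen =
      std::find(clauses.begin(), clauses.end(), "num_tasks") != clauses.end();
  if (grainsize == clauses.end() || !num_tasks_seen)
    return false;
  clauses.erase(grainsize);
  return true;
}
```

A list containing only one of the two clauses is left unchanged.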

2024-05-15  Jakub Jelinek  

PR c/115103
gcc/c/
* c-typeck.cc (c_finish_omp_clauses): Diagnose grainsize
used together with num_tasks.
gcc/cp/
* semantics.cc (finish_omp_clauses): Diagnose grainsize
used together with num_tasks.
gcc/fortran/
* openmp.cc (resolve_omp_clauses): Diagnose grainsize
used together with num_tasks or nogroup used together with
reduction.
gcc/testsuite/
* c-c++-common/gomp/clause-dups-1.c: Add 2 further expected errors.
* gfortran.dg/gomp/pr115103.f90: New test.

Diff:
---
 gcc/c/c-typeck.cc   | 22 --
 gcc/cp/semantics.cc | 16 
 gcc/fortran/openmp.cc   |  7 +++
 gcc/testsuite/c-c++-common/gomp/clause-dups-1.c |  4 ++--
 gcc/testsuite/gfortran.dg/gomp/pr115103.f90 | 14 ++
 5 files changed, 59 insertions(+), 4 deletions(-)

diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 4567b114734b..7ecca9f58c68 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -14722,6 +14722,8 @@ c_finish_omp_clauses (tree clauses, enum 
c_omp_region_type ort)
   tree *detach_seen = NULL;
   bool linear_variable_step_check = false;
   tree *nowait_clause = NULL;
+  tree *grainsize_seen = NULL;
+  bool num_tasks_seen = false;
   tree ordered_clause = NULL_TREE;
   tree schedule_clause = NULL_TREE;
   bool oacc_async = false;
@@ -16021,8 +16023,6 @@ c_finish_omp_clauses (tree clauses, enum 
c_omp_region_type ort)
case OMP_CLAUSE_PROC_BIND:
case OMP_CLAUSE_DEVICE_TYPE:
case OMP_CLAUSE_PRIORITY:
-   case OMP_CLAUSE_GRAINSIZE:
-   case OMP_CLAUSE_NUM_TASKS:
case OMP_CLAUSE_THREADS:
case OMP_CLAUSE_SIMD:
case OMP_CLAUSE_HINT:
@@ -16048,6 +16048,16 @@ c_finish_omp_clauses (tree clauses, enum 
c_omp_region_type ort)
  pc = &OMP_CLAUSE_CHAIN (c);
  continue;
 
+   case OMP_CLAUSE_GRAINSIZE:
+ grainsize_seen = pc;
+ pc = &OMP_CLAUSE_CHAIN (c);
+ continue;
+
+   case OMP_CLAUSE_NUM_TASKS:
+ num_tasks_seen = true;
+ pc = &OMP_CLAUSE_CHAIN (c);
+ continue;
+
case OMP_CLAUSE_MERGEABLE:
  mergeable_seen = true;
  pc = &OMP_CLAUSE_CHAIN (c);
@@ -16333,6 +16343,14 @@ c_finish_omp_clauses (tree clauses, enum 
c_omp_region_type ort)
   *nogroup_seen = OMP_CLAUSE_CHAIN (*nogroup_seen);
 }
 
+  if (grainsize_seen && num_tasks_seen)
+{
+  error_at (OMP_CLAUSE_LOCATION (*grainsize_seen),
+   "%<grainsize%> clause must not be used together with "
+   "%<num_tasks%> clause");
+  *grainsize_seen = OMP_CLAUSE_CHAIN (*grainsize_seen);
+}
+
   if (detach_seen)
 {
   if (mergeable_seen)
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index df62e2d80dbd..f90c304a65b7 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -7098,6 +7098,7 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
   bool mergeable_seen = false;
   bool implicit_moved = false;
   bool target_in_reduction_seen = false;
+  bool num_tasks_seen = false;
 
   bitmap_obstack_initialize (NULL);
   bitmap_initialize (&generic_head, &bitmap_default_obstack);
@@ -7656,6 +7657,10 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
  /* FALLTHRU */
 
case OMP_CLAUSE_NUM_TASKS:
+ if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_NUM_TASKS)
+   num_tasks_seen = true;
+ /* FALLTHRU */
+
case OMP_CLAUSE_NUM_TEAMS:
case OMP_CLAUSE_NUM_THREADS:
case OMP_CLAUSE_NUM_GANGS:
@@ -9246,6 +9251,17 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
}
  pc = &OMP_CLAUSE_CHAIN (c);
  continue;
+   case OMP_CLAUSE_GRAINSIZE:
+ if (num_tasks_seen)
+   {
+ error_at (OMP_CLAUSE_LOCATION (c),
+   "%<grainsize%> clause must not be used together with "
+   "%<num_tasks%> clause");
+ *pc = OMP_CLAUSE_CHAIN (c);
+ continue;
+   }
+ pc = &OMP_CLAUSE_CHAIN (c);
+ continue;
case OMP_CLAUSE_ORDERED:
  if (reduction_seen == -2)
error_at (OMP_CLAUSE_LOCATION (c),
di

[gcc r15-520] combine: Fix up simplify_compare_const [PR115092]

2024-05-15 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:0b93a0ae153ef70a82ff63e67926a01fdab9956b

commit r15-520-g0b93a0ae153ef70a82ff63e67926a01fdab9956b
Author: Jakub Jelinek 
Date:   Wed May 15 18:37:17 2024 +0200

combine: Fix up simplify_compare_const [PR115092]

The following testcases are miscompiled (with tons of GIMPLE
optimization disabled) because combine sees GE comparison of
1-bit sign_extract (i.e. something with [-1, 0] value range)
with (const_int -1) (which is always true) and optimizes it into
NE comparison of 1-bit zero_extract ([0, 1] value range) against
(const_int 0).
The reason is that simplify_compare_const first (correctly)
simplifies the comparison to
GE (ashift:SI something (const_int 31)) (const_int -2147483648)
and then an optimization for when the second operand is power of 2
triggers.  That optimization is fine for power of 2s which aren't
the signed minimum of the mode, or if it is NE, EQ, GEU or LTU
against the signed minimum of the mode, but for GE or LT optimizing
it into NE (or EQ) against const0_rtx is wrong, those cases
are always true or always false (but the function doesn't have
a standardized way to tell callers the comparison is now unconditional).

The following patch just disables the optimization in that case.
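
As an illustration of the reasoning above, the following C sketch models the fold on 32-bit values. It is not code from combine.cc; the helper names (signed_ge, unsigned_ge, bit_ne_zero) are made up for this example, and it assumes op0 is already known to be either zero or exactly the power-of-two constant.

```c
#include <assert.h>
#include <stdint.h>

/* Original comparison, signed: x >= c (GE).  */
static int
signed_ge (int32_t x, int32_t c)
{
  return x >= c;
}

/* Original comparison, unsigned: x >= c (GEU).  */
static int
unsigned_ge (uint32_t x, uint32_t c)
{
  return x >= c;
}

/* What the fold produces: a test of the single bit of the power-of-two
   constant against zero (NE op0 const0_rtx in the commit's terms).  */
static int
bit_ne_zero (uint32_t x, uint32_t pow2)
{
  assert (pow2 != 0 && (pow2 & (pow2 - 1)) == 0);
  return (x & pow2) != 0;
}
```

With x restricted to 0 or 0x80000000, unsigned_ge (x, 0x80000000u) agrees with bit_ne_zero (x, 0x80000000u) for both values, and signed_ge (x, 16) agrees with bit_ne_zero (x, 16) for x in {0, 16}. By contrast, signed_ge (0, INT32_MIN) is 1 while bit_ne_zero (0, 0x80000000u) is 0, which is the mismatch the patch avoids by requiring const_op > 0 for GE and LT.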

2024-05-15  Jakub Jelinek  

PR rtl-optimization/114902
PR rtl-optimization/115092
* combine.cc (simplify_compare_const): Don't optimize
GE op0 SIGNED_MIN or LT op0 SIGNED_MIN into NE op0 const0_rtx or
EQ op0 const0_rtx.

* gcc.dg/pr114902.c: New test.
* gcc.dg/pr115092.c: New test.

Diff:
---
 gcc/combine.cc  |  6 --
 gcc/testsuite/gcc.dg/pr114902.c | 23 +++
 gcc/testsuite/gcc.dg/pr115092.c | 16 
 3 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/gcc/combine.cc b/gcc/combine.cc
index 71c9abc145c2..3b50bc3529c4 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -11852,8 +11852,10 @@ simplify_compare_const (enum rtx_code code, machine_mode mode,
  `and'ed with that bit), we can replace this with a comparison
  with zero.  */
   if (const_op
-  && (code == EQ || code == NE || code == GE || code == GEU
- || code == LT || code == LTU)
+  && (code == EQ || code == NE || code == GEU || code == LTU
+ /* This optimization is incorrect for signed >= INT_MIN or
+< INT_MIN, those are always true or always false.  */
+ || ((code == GE || code == LT) && const_op > 0))
   && is_a <scalar_int_mode> (mode, &int_mode)
   && GET_MODE_PRECISION (int_mode) - 1 < HOST_BITS_PER_WIDE_INT
   && pow2p_hwi (const_op & GET_MODE_MASK (int_mode))
diff --git a/gcc/testsuite/gcc.dg/pr114902.c b/gcc/testsuite/gcc.dg/pr114902.c
new file mode 100644
index ..60684faa25d5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr114902.c
@@ -0,0 +1,23 @@
+/* PR rtl-optimization/114902 */
+/* { dg-do run } */
+/* { dg-options "-O1 -fno-tree-fre -fno-tree-forwprop -fno-tree-ccp -fno-tree-dominator-opts" } */
+
+__attribute__((noipa))
+int foo (int x)
+{
+  int a = ~x;
+  int t = a & 1;
+  int e = -t;
+  int b = e >= -1;
+  if (b)
+return 0;
+  __builtin_trap ();
+}
+
+int
+main ()
+{
+  foo (-1);
+  foo (0);
+  foo (1);
+}
diff --git a/gcc/testsuite/gcc.dg/pr115092.c b/gcc/testsuite/gcc.dg/pr115092.c
new file mode 100644
index ..c9047f4d321a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr115092.c
@@ -0,0 +1,16 @@
+/* PR rtl-optimization/115092 */
+/* { dg-do run } */
+/* { dg-options "-O1 -fgcse -ftree-pre -fno-tree-dominator-opts -fno-tree-fre -fno-guess-branch-probability" } */
+
+int a, b, c = 1, d, e;
+
+int
+main ()
+{
+  int f, g = a;
+  b = -2;
+  f = -(1 >> ((c && b) & ~a));
+  if (f <= b)
+d = g / e;
+  return 0;
+}


[gcc r15-521] c++: Optimize in maybe_clone_body aliases even when not at_eof [PR113208]

2024-05-15 Thread Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:6ad7ca1bb905736c0f57688e93e9e77cbc71a325

commit r15-521-g6ad7ca1bb905736c0f57688e93e9e77cbc71a325
Author: Jakub Jelinek 
Date:   Wed May 15 18:50:11 2024 +0200

c++: Optimize in maybe_clone_body aliases even when not at_eof [PR113208]

This patch reworks the cdtor alias optimization, such that we can create
aliases even when maybe_clone_body is called not at at_eof time, without
trying to repeat it in maybe_optimize_cdtor.

2024-05-15  Jakub Jelinek  
Jason Merrill  

PR lto/113208
* cp-tree.h (maybe_optimize_cdtor): Remove.
* decl2.cc (tentative_decl_linkage): Call maybe_make_one_only
for implicit instantiations of maybe in charge ctors/dtors
declared inline.
(import_export_decl): Don't call maybe_optimize_cdtor.
(c_parse_final_cleanups): Formatting fixes.
* optimize.cc (can_alias_cdtor): Adjust condition, for
HAVE_COMDAT_GROUP && DECL_ONE_ONLY && DECL_WEAK return true even
if not DECL_INTERFACE_KNOWN.
(maybe_clone_body): Don't clear DECL_SAVED_TREE, instead set it
to void_node.
(maybe_optimize_cdtor): Remove.
* decl.cc (cxx_comdat_group): For DECL_CLONED_FUNCTION_P
functions if SUPPORTS_ONE_ONLY return DECL_COMDAT_GROUP if already
set.

* g++.dg/abi/comdat3.C: New test.
* g++.dg/abi/comdat4.C: New test.

Diff:
---
 gcc/cp/cp-tree.h   |  1 -
 gcc/cp/decl.cc |  7 +
 gcc/cp/decl2.cc| 32 ++-
 gcc/cp/optimize.cc | 63 ++
 gcc/testsuite/g++.dg/abi/comdat3.C | 22 +
 gcc/testsuite/g++.dg/abi/comdat4.C | 28 +
 6 files changed, 78 insertions(+), 75 deletions(-)

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 9a8c86591573..ba9e848c177f 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7453,7 +7453,6 @@ extern bool handle_module_option (unsigned opt, const char *arg, int value);
 /* In optimize.cc */
 extern tree clone_attrs(tree);
 extern bool maybe_clone_body   (tree);
-extern void maybe_optimize_cdtor   (tree);
 
 /* In parser.cc */
 extern tree cp_convert_range_for (tree, tree, tree, cp_decomp *, bool,
diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index a139b293e00c..6fcab615d55e 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -19287,6 +19287,13 @@ cxx_comdat_group (tree decl)
  else
break;
}
+  /* If a ctor/dtor has already set the comdat group by
+maybe_clone_body, don't override it.  */
+  if (SUPPORTS_ONE_ONLY
+ && TREE_CODE (decl) == FUNCTION_DECL
+ && DECL_CLONED_FUNCTION_P (decl))
+   if (tree comdat = DECL_COMDAT_GROUP (decl))
+ return comdat;
 }
 
   return decl;
diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
index 6913efa53552..7baff46a1921 100644
--- a/gcc/cp/decl2.cc
+++ b/gcc/cp/decl2.cc
@@ -3325,16 +3325,23 @@ tentative_decl_linkage (tree decl)
 linkage of all functions, and as that causes writes to
 the data mapped in from the PCH file, it's advantageous
 to mark the functions at this point.  */
- if (DECL_DECLARED_INLINE_P (decl)
- && (!DECL_IMPLICIT_INSTANTIATION (decl)
- || DECL_DEFAULTED_FN (decl)))
+ if (DECL_DECLARED_INLINE_P (decl))
{
- /* This function must have external linkage, as
-otherwise DECL_INTERFACE_KNOWN would have been
-set.  */
- gcc_assert (TREE_PUBLIC (decl));
- comdat_linkage (decl);
- DECL_INTERFACE_KNOWN (decl) = 1;
+ if (!DECL_IMPLICIT_INSTANTIATION (decl)
+ || DECL_DEFAULTED_FN (decl))
+   {
+ /* This function must have external linkage, as
+otherwise DECL_INTERFACE_KNOWN would have been
+set.  */
+ gcc_assert (TREE_PUBLIC (decl));
+ comdat_linkage (decl);
+ DECL_INTERFACE_KNOWN (decl) = 1;
+   }
+ else if (DECL_MAYBE_IN_CHARGE_CDTOR_P (decl))
+   /* For implicit instantiations of cdtors try to make
+  it comdat, so that maybe_clone_body can use aliases.
+  See PR113208.  */
+   maybe_make_one_only (decl);
}
}
   else if (VAR_P (decl))
@@ -3604,9 +3611,6 @@ import_export_decl (tree decl)
 }
 
   DECL_INTERFACE_KNOWN (decl) = 1;
-
-  if (DECL_CLONED_FUNCTION_P (decl))
-maybe_optimize_cdtor (decl);
 }
 
 /* Return an expression that performs the destruction of DECL, which
@@ -5331,7 +5335,7 @@ c_parse_final_cleanups (void)
node = node->get_alias_targe

[gcc r15-522] c++: DR 569, DR 1693: fun with semicolons [PR113760]

2024-05-15 Thread Marek Polacek via Gcc-cvs
https://gcc.gnu.org/g:0b3eac4b54a52bf206b88743d1e987badc97cff4

commit r15-522-g0b3eac4b54a52bf206b88743d1e987badc97cff4
Author: Marek Polacek 
Date:   Mon Feb 12 19:36:16 2024 -0500

c++: DR 569, DR 1693: fun with semicolons [PR113760]

Prompted by c++/113760, I started looking into a bogus "extra ;"
warning in C++11.  It quickly turned out that if I want to fix
this for good, the fix will not be so small.

This patch touches on DR 569, which says that an extra ; at namespace
scope should be allowed since C++11:

  struct S {
  };
  ; // pedwarn in C++98

It also touches on DR 1693, which allows superfluous semicolons in
class definitions since C++11:

  struct S {
int a;
; // pedwarn in C++98
  };

Note that a single semicolon is valid after a member function definition:

  struct S {
void foo () {}; // only warns with -Wextra-semi
  };

There's a new function maybe_warn_extra_semi to handle all of the above
in a single place.  So now they all get a fix-it hint.

-Wextra-semi turns on all "extra ;" diagnostics.  Currently, options
like -Wc++11-compat or -Wc++11-extensions are not considered.

DR 1693
PR c++/113760
DR 569

gcc/c-family/ChangeLog:

* c.opt (Wextra-semi): Initialize to -1.

gcc/cp/ChangeLog:

* parser.cc (extra_semi_kind): New.
(maybe_warn_extra_semi): New.
(cp_parser_declaration): Call maybe_warn_extra_semi.
(cp_parser_member_declaration): Likewise.

gcc/ChangeLog:

* doc/invoke.texi: Update -Wextra-semi documentation.

gcc/testsuite/ChangeLog:

* g++.dg/diagnostic/semicolon1.C: New test.
* g++.dg/diagnostic/semicolon10.C: New test.
* g++.dg/diagnostic/semicolon11.C: New test.
* g++.dg/diagnostic/semicolon12.C: New test.
* g++.dg/diagnostic/semicolon13.C: New test.
* g++.dg/diagnostic/semicolon14.C: New test.
* g++.dg/diagnostic/semicolon15.C: New test.
* g++.dg/diagnostic/semicolon16.C: New test.
* g++.dg/diagnostic/semicolon17.C: New test.
* g++.dg/diagnostic/semicolon2.C: New test.
* g++.dg/diagnostic/semicolon3.C: New test.
* g++.dg/diagnostic/semicolon4.C: New test.
* g++.dg/diagnostic/semicolon5.C: New test.
* g++.dg/diagnostic/semicolon6.C: New test.
* g++.dg/diagnostic/semicolon7.C: New test.
* g++.dg/diagnostic/semicolon8.C: New test.
* g++.dg/diagnostic/semicolon9.C: New test.

Diff:
---
 gcc/c-family/c.opt|  2 +-
 gcc/cp/parser.cc  | 92 ++-
 gcc/doc/invoke.texi   | 29 -
 gcc/testsuite/g++.dg/diagnostic/semicolon1.C  | 18 ++
 gcc/testsuite/g++.dg/diagnostic/semicolon10.C | 11 
 gcc/testsuite/g++.dg/diagnostic/semicolon11.C | 29 +
 gcc/testsuite/g++.dg/diagnostic/semicolon12.C | 29 +
 gcc/testsuite/g++.dg/diagnostic/semicolon13.C | 29 +
 gcc/testsuite/g++.dg/diagnostic/semicolon14.C | 29 +
 gcc/testsuite/g++.dg/diagnostic/semicolon15.C | 29 +
 gcc/testsuite/g++.dg/diagnostic/semicolon16.C | 38 +++
 gcc/testsuite/g++.dg/diagnostic/semicolon17.C | 29 +
 gcc/testsuite/g++.dg/diagnostic/semicolon2.C  | 18 ++
 gcc/testsuite/g++.dg/diagnostic/semicolon3.C  | 18 ++
 gcc/testsuite/g++.dg/diagnostic/semicolon4.C  | 18 ++
 gcc/testsuite/g++.dg/diagnostic/semicolon5.C  | 18 ++
 gcc/testsuite/g++.dg/diagnostic/semicolon6.C  | 23 +++
 gcc/testsuite/g++.dg/diagnostic/semicolon7.C  | 18 ++
 gcc/testsuite/g++.dg/diagnostic/semicolon8.C  | 11 
 gcc/testsuite/g++.dg/diagnostic/semicolon9.C  | 11 
 20 files changed, 480 insertions(+), 19 deletions(-)

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 403abc1f26e1..fb34c3b70319 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -727,7 +727,7 @@ C ObjC C++ ObjC++ Warning
 ; in common.opt
 
 Wextra-semi
-C++ ObjC++ Var(warn_extra_semi) Warning
+C++ ObjC++ Var(warn_extra_semi) Init(-1) Warning
 Warn about semicolon after in-class function definition.
 
 Wflex-array-member-not-at-end
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 7306ce9a8a8b..476ddc0d63ad 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -15331,6 +15331,61 @@ cp_parser_module_export (cp_parser *parser)
   module_kind = mk;
 }
 
+/* Used for maybe_warn_extra_semi.  */
+
+enum class extra_semi_kind { decl, member, in_class_fn_def };
+
+/* Warn about an extra semicolon.  KIND says in which context the extra
+   semicolon occurs.  */
+
+static void
+maybe_warn_extra_semi (location_t loc, extra_semi_kind kind)
+{
+  /* -Wno-extra-semi suppresses all.  */
+  if (warn_extra_semi 

[gcc r15-523] c++: ICE with reference NSDMI [PR114854]

2024-05-15 Thread Marek Polacek via Gcc-cvs
https://gcc.gnu.org/g:1a05332bbac98a4c002bef3fb45a3ad9d56b3a71

commit r15-523-g1a05332bbac98a4c002bef3fb45a3ad9d56b3a71
Author: Marek Polacek 
Date:   Wed May 8 15:43:58 2024 -0400

c++: ICE with reference NSDMI [PR114854]

Here we crash on a cp_gimplify_expr/TARGET_EXPR assert:

  /* A TARGET_EXPR that expresses direct-initialization should have been
 elided by cp_gimplify_init_expr.  */
  gcc_checking_assert (!TARGET_EXPR_DIRECT_INIT_P (*expr_p));

the TARGET_EXPR in question is created for the NSDMI in:

  class Vector { int m_size; };
  struct S {
const Vector &vec{};
  };

where we first need to create a Vector{} temporary, and then bind the
vec reference to it.  The temporary is represented by a TARGET_EXPR
and it cannot be elided.  When we create an object of type S, we get

  D.2848 = {.vec=(const struct Vector &) &TARGET_EXPR }

where the TARGET_EXPR is no longer direct-initializing anything.

Fixed by not setting TARGET_EXPR_DIRECT_INIT_P in convert_like_internal/ck_user.

PR c++/114854

gcc/cp/ChangeLog:

* call.cc (convert_like_internal) <case ck_user>: Don't set
TARGET_EXPR_DIRECT_INIT_P.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/nsdmi-aggr22.C: New test.

Diff:
---
 gcc/cp/call.cc|  6 +-
 gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr22.C | 12 
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index e058da7735fa..ed68eb3c5684 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -8597,16 +8597,12 @@ convert_like_internal (conversion *convs, tree expr, tree fn, int argnum,
&& TYPE_HAS_DEFAULT_CONSTRUCTOR (totype)
&& !processing_template_decl)
  {
-   bool direct = CONSTRUCTOR_IS_DIRECT_INIT (expr);
if (abstract_virtuals_error (NULL_TREE, totype, complain))
  return error_mark_node;
expr = build_value_init (totype, complain);
expr = get_target_expr (expr, complain);
if (expr != error_mark_node)
- {
-   TARGET_EXPR_LIST_INIT_P (expr) = true;
-   TARGET_EXPR_DIRECT_INIT_P (expr) = direct;
- }
+ TARGET_EXPR_LIST_INIT_P (expr) = true;
return expr;
  }
 
diff --git a/gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr22.C b/gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr22.C
new file mode 100644
index ..a4f9ae19ca9d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/nsdmi-aggr22.C
@@ -0,0 +1,12 @@
+// PR c++/114854
+// { dg-do compile { target c++14 } }
+
+struct Vector {
+  int m_size;
+};
+struct S {
+  const Vector &vec{};
+};
+
+void spawn(S);
+void test() { spawn({}); }


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] [committed] Fix rv32 issues with recent zicboz work

2024-05-15 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:75a06302ef660397001d67afc1fb4d22e6da5870

commit 75a06302ef660397001d67afc1fb4d22e6da5870
Author: Jeff Law 
Date:   Tue May 14 22:50:15 2024 -0600

[committed] Fix rv32 issues with recent zicboz work

I should have double-checked the CI system before pushing Christoph's
patches for memset-zero.  While I thought I'd checked CI state, I must
have been looking at the wrong patch from Christoph.

Anyway, this fixes the rv32 ICEs and disables one of the tests for rv32.

The test would need a revamp for rv32 as the expected output is all rv64
code using "sd" instructions.  I'm just not vested deeply enough into rv32
to adjust the test to work in that environment though it should be fairly
trivial to copy the test and provide new expected output if someone cares
enough.

Verified this fixes the rv32 failures in my tester:
> New tests that FAIL (6 tests):
>
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O1  (internal compiler error: in extract_insn, at recog.cc:2812)
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O1  (test for excess errors)
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O2  (internal compiler error: in extract_insn, at recog.cc:2812)
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O2  (test for excess errors)
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O3 -g  (internal compiler error: in extract_insn, at recog.cc:2812)
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O3 -g  (test for excess errors)

And after the ICE is fixed, these are eliminated by only running the test
for rv64:

> New tests that FAIL (3 tests):
>
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O1   check-function-bodies clear_buf_123
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O2   check-function-bodies clear_buf_123
> unix/-march=rv32gcv: gcc: gcc.target/riscv/cmo-zicboz-zic64-1.c   -O3 -g   check-function-bodies clear_buf_123

gcc/
* config/riscv/riscv-string.cc
(riscv_expand_block_clear_zicboz_zic64b): Handle rv32 correctly.

gcc/testsuite

* gcc.target/riscv/cmo-zicboz-zic64-1.c: Don't run on rv32.

(cherry picked from commit e410ad74e5e4589aeb666aa298b2f933e7b5d9e7)

Diff:
---
 gcc/config/riscv/riscv-string.cc| 5 -
 gcc/testsuite/gcc.target/riscv/cmo-zicboz-zic64-1.c | 3 +--
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 87f5fdee3c14..b515f44d17ae 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -827,7 +827,10 @@ riscv_expand_block_clear_zicboz_zic64b (rtx dest, rtx length)
 {
   rtx mem = adjust_address (dest, BLKmode, offset);
   rtx addr = force_reg (Pmode, XEXP (mem, 0));
-  emit_insn (gen_riscv_zero_di (addr));
+  if (TARGET_64BIT)
+   emit_insn (gen_riscv_zero_di (addr));
+  else
+   emit_insn (gen_riscv_zero_si (addr));
   offset += cbo_bytes;
 }
 
diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicboz-zic64-1.c b/gcc/testsuite/gcc.target/riscv/cmo-zicboz-zic64-1.c
index c2d79eb7ae68..6d4535287d08 100644
--- a/gcc/testsuite/gcc.target/riscv/cmo-zicboz-zic64-1.c
+++ b/gcc/testsuite/gcc.target/riscv/cmo-zicboz-zic64-1.c
@@ -1,6 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv64gc_zic64b_zicboz" { target { rv64 } } } */
-/* { dg-options "-march=rv32gc_zic64b_zicboz" { target { rv32 } } } */
+/* { dg-options "-march=rv64gc_zic64b_zicboz -mabi=lp64d" } */
 /* { dg-skip-if "" { *-*-* } {"-O0" "-Os" "-Og" "-Oz" "-flto" } } */
 /* { dg-final { check-function-bodies "**" "" } } */
 /* { dg-allow-blank-lines-in-output 1 } */


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Allow unaligned accesses in cpymemsi expansion

2024-05-15 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:69408db9b2b3ede055f4392f9d30be33804eec77

commit 69408db9b2b3ede055f4392f9d30be33804eec77
Author: Christoph Müllner 
Date:   Wed May 1 18:50:38 2024 +0200

RISC-V: Allow unaligned accesses in cpymemsi expansion

The RISC-V cpymemsi expansion is called whenever the by-pieces
infrastructure will not take care of the builtin expansion.
The code emitted by the by-pieces infrastructure may include
unaligned accesses if riscv_slow_unaligned_access_p is false.

The RISC-V cpymemsi expansion is handled via riscv_expand_block_move().
The current implementation of this function does not check
riscv_slow_unaligned_access_p and never emits unaligned accesses.

Since by-pieces emits unaligned accesses, it is reasonable to implement
the same behaviour in the cpymemsi expansion. And that's what this patch
is doing.

The patch checks riscv_slow_unaligned_access_p at the entry and sets
the allowed alignment accordingly. This alignment is then propagated
down to the routines that emit the actual instructions.
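
A rough standalone model of that alignment choice, as a sketch only: allowed_align_bits and access_bits are hypothetical helpers, and the rule "use the full word when unaligned access is fast" is an assumption read from the ChangeLog entry, not copied from riscv-string.cc. The MAX/MIN clamp in access_bits mirrors the `bits = MAX (BITS_PER_UNIT, MIN (BITS_PER_WORD, align))` line in the diff.

```c
#include <assert.h>

#define BITS_PER_UNIT 8

static unsigned
min_u (unsigned a, unsigned b)
{
  return a < b ? a : b;
}

static unsigned
max_u (unsigned a, unsigned b)
{
  return a > b ? a : b;
}

/* Alignment in bits that the expansion may assume.  SLOW_UNALIGNED
   models riscv_slow_unaligned_access_p (assumed semantics).  */
static unsigned
allowed_align_bits (unsigned src_align, unsigned dest_align,
                    int slow_unaligned, unsigned bits_per_word)
{
  if (!slow_unaligned)
    return bits_per_word;   /* Unaligned is cheap: allow full words.  */
  return min_u (src_align, dest_align);
}

/* Width in bits of each load/store: at least a byte, at most a word.  */
static unsigned
access_bits (unsigned align, unsigned bits_per_word)
{
  assert (bits_per_word >= BITS_PER_UNIT);
  return max_u (BITS_PER_UNIT, min_u (bits_per_word, align));
}
```

Under these assumptions, byte-aligned operands on a 64-bit target give 8-bit accesses when unaligned access is slow and 64-bit accesses when it is fast.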

The changes introduced by this patch can be seen in the adjustments
of the cpymem tests.

gcc/ChangeLog:

* config/riscv/riscv-string.cc (riscv_block_move_straight): Add
parameter align.
(riscv_adjust_block_mem): Replace parameter length by align.
(riscv_block_move_loop): Add parameter align.
(riscv_expand_block_move_scalar): Set alignment properly if the
target has fast unaligned access.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cpymem-32-ooo.c: Adjust for unaligned access.
* gcc.target/riscv/cpymem-64-ooo.c: Likewise.

Signed-off-by: Christoph Müllner 
(cherry picked from commit 04cd8ccaec90405ccf7471252c0e06ba7f5437dc)

Diff:
---
 gcc/config/riscv/riscv-string.cc   | 54 --
 gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c | 20 +++---
 gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c | 14 ++-
 3 files changed, 60 insertions(+), 28 deletions(-)

diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index b515f44d17ae..b6cd70323563 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -617,11 +617,13 @@ riscv_expand_strlen (rtx result, rtx src, rtx search_char, rtx align)
   return false;
 }
 
-/* Emit straight-line code to move LENGTH bytes from SRC to DEST.
+/* Emit straight-line code to move LENGTH bytes from SRC to DEST
+   with accesses that are ALIGN bytes aligned.
Assume that the areas do not overlap.  */
 
 static void
-riscv_block_move_straight (rtx dest, rtx src, unsigned HOST_WIDE_INT length)
+riscv_block_move_straight (rtx dest, rtx src, unsigned HOST_WIDE_INT length,
+  unsigned HOST_WIDE_INT align)
 {
   unsigned HOST_WIDE_INT offset, delta;
   unsigned HOST_WIDE_INT bits;
@@ -629,8 +631,7 @@ riscv_block_move_straight (rtx dest, rtx src, unsigned HOST_WIDE_INT length)
   enum machine_mode mode;
   rtx *regs;
 
-  bits = MAX (BITS_PER_UNIT,
- MIN (BITS_PER_WORD, MIN (MEM_ALIGN (src), MEM_ALIGN (dest))));
+  bits = MAX (BITS_PER_UNIT, MIN (BITS_PER_WORD, align));
 
   mode = mode_for_size (bits, MODE_INT, 0).require ();
   delta = bits / BITS_PER_UNIT;
@@ -655,21 +656,20 @@ riscv_block_move_straight (rtx dest, rtx src, unsigned HOST_WIDE_INT length)
 {
   src = adjust_address (src, BLKmode, offset);
   dest = adjust_address (dest, BLKmode, offset);
-  move_by_pieces (dest, src, length - offset,
- MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), RETURN_BEGIN);
+  move_by_pieces (dest, src, length - offset, align, RETURN_BEGIN);
 }
 }
 
 /* Helper function for doing a loop-based block operation on memory
-   reference MEM.  Each iteration of the loop will operate on LENGTH
-   bytes of MEM.
+   reference MEM.
 
Create a new base register for use within the loop and point it to
the start of MEM.  Create a new memory reference that uses this
-   register.  Store them in *LOOP_REG and *LOOP_MEM respectively.  */
+   register and has an alignment of ALIGN.  Store them in *LOOP_REG
+   and *LOOP_MEM respectively.  */
 
 static void
-riscv_adjust_block_mem (rtx mem, unsigned HOST_WIDE_INT length,
+riscv_adjust_block_mem (rtx mem, unsigned HOST_WIDE_INT align,
rtx *loop_reg, rtx *loop_mem)
 {
   *loop_reg = copy_addr_to_reg (XEXP (mem, 0));
@@ -677,15 +677,17 @@ riscv_adjust_block_mem (rtx mem, unsigned HOST_WIDE_INT length,
   /* Although the new mem does not refer to a known location,
  it does keep up to LENGTH bytes of alignment.  */
   *loop_mem = change_address (mem, BLKmode, *loop_reg);
-  set_mem_align (*loop_mem, MIN (MEM_ALIGN (mem), length * BITS_PER_UNIT));
+  set_mem_align (*loop_mem, align);
 }
 
 /* Move LENGTH bytes from SRC t

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Add test cases for cpymem expansion

2024-05-15 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:0dcd2d26d0da77af7f173b6c0d79a7f5ea25c642

commit 0dcd2d26d0da77af7f173b6c0d79a7f5ea25c642
Author: Christoph Müllner 
Date:   Wed May 1 16:54:42 2024 +0200

RISC-V: Add test cases for cpymem expansion

We have two mechanisms in the RISC-V backend that expand the
cpymem pattern: a) by-pieces, b) riscv_expand_block_move()
in riscv-string.cc. The by-pieces framework has higher priority
and emits a sequence of up to 15 instructions
(see use_by_pieces_infrastructure_p() for more details).

As a rule of thumb, by-pieces emits alternating load/store sequences
and the cpymem expansion in the backend emits a sequence of loads
followed by a sequence of stores.

Let's add some test cases to document the current behaviour
and to have tests to identify regressions.

Signed-off-by: Christoph Müllner 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cpymem-32-ooo.c: New test.
* gcc.target/riscv/cpymem-32.c: New test.
* gcc.target/riscv/cpymem-64-ooo.c: New test.
* gcc.target/riscv/cpymem-64.c: New test.

(cherry picked from commit 00029408387e9cc64e135324c22d15cd5a70e946)

Diff:
---
 gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c | 131 +++
 gcc/testsuite/gcc.target/riscv/cpymem-32.c | 138 +
 gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c | 129 +++
 gcc/testsuite/gcc.target/riscv/cpymem-64.c | 138 +
 4 files changed, 536 insertions(+)

diff --git a/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c b/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c
new file mode 100644
index ..33fb9891d823
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c
@@ -0,0 +1,131 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv32 } */
+/* { dg-options "-march=rv32gc -mabi=ilp32d -mtune=generic-ooo" } */
+/* { dg-skip-if "" { *-*-* } {"-O0" "-Os" "-Og" "-Oz" "-flto" } } */
+/* { dg-final { check-function-bodies "**" "" } } */
+/* { dg-allow-blank-lines-in-output 1 } */
+
+#define COPY_N(N)  \
+void copy_##N (void *to, void *from)   \
+{  \
+  __builtin_memcpy (to, from, N);  \
+}
+
+#define COPY_ALIGNED_N(N)  \
+void copy_aligned_##N (void *to, void *from)   \
+{  \
+  to = __builtin_assume_aligned(to, sizeof(long)); \
+  from = __builtin_assume_aligned(from, sizeof(long)); \
+  __builtin_memcpy (to, from, N);  \
+}
+
+/*
+**copy_7:
+**...
+**lw\t[at][0-9],0\([at][0-9]\)
+**sw\t[at][0-9],0\([at][0-9]\)
+**...
+**lbu\t[at][0-9],6\([at][0-9]\)
+**sb\t[at][0-9],6\([at][0-9]\)
+**...
+*/
+COPY_N(7)
+
+/*
+**copy_aligned_7:
+**...
+**lw\t[at][0-9],0\([at][0-9]\)
+**sw\t[at][0-9],0\([at][0-9]\)
+**...
+**lbu\t[at][0-9],6\([at][0-9]\)
+**sb\t[at][0-9],6\([at][0-9]\)
+**...
+*/
+COPY_ALIGNED_N(7)
+
+/*
+**copy_8:
+**...
+**lw\ta[0-9],0\(a[0-9]\)
+**sw\ta[0-9],0\(a[0-9]\)
+**...
+*/
+COPY_N(8)
+
+/*
+**copy_aligned_8:
+**...
+**lw\ta[0-9],0\(a[0-9]\)
+**sw\ta[0-9],0\(a[0-9]\)
+**...
+*/
+COPY_ALIGNED_N(8)
+
+/*
+**copy_11:
+**...
+**lbu\t[at][0-9],0\([at][0-9]\)
+**...
+**lbu\t[at][0-9],10\([at][0-9]\)
+**...
+**sb\t[at][0-9],0\([at][0-9]\)
+**...
+**sb\t[at][0-9],10\([at][0-9]\)
+**...
+*/
+COPY_N(11)
+
+/*
+**copy_aligned_11:
+**...
+**lw\t[at][0-9],0\([at][0-9]\)
+**...
+**sw\t[at][0-9],0\([at][0-9]\)
+**...
+**lbu\t[at][0-9],10\([at][0-9]\)
+**sb\t[at][0-9],10\([at][0-9]\)
+**...
+*/
+COPY_ALIGNED_N(11)
+
+/*
+**copy_15:
+**...
+**(call|tail)\tmemcpy
+**...
+*/
+COPY_N(15)
+
+/*
+**copy_aligned_15:
+**...
+**lw\t[at][0-9],0\([at][0-9]\)
+**...
+**sw\t[at][0-9],0\([at][0-9]\)
+**...
+**lbu\t[at][0-9],14\([at][0-9]\)
+**sb\t[at][0-9],14\([at][0-9]\)
+**...
+*/
+COPY_ALIGNED_N(15)
+
+/*
+**copy_27:
+**...
+**(call|tail)\tmemcpy
+**...
+*/
+COPY_N(27)
+
+/*
+**copy_aligned_27:
+**...
+**lw\t[at][0-9],20\([at][0-9]\)
+**...
+**sw\t[at][0-9],20\([at][0-9]\)
+**...
+**lbu\t[at][0-9],26\([at][0-9]\)
+**sb\t[at][0-9],26\([at][0-9]\)
+**...
+*/
+COPY_ALIGNED_N(27)
diff --git a/gcc/testsuite/gcc.target/riscv/cpymem-32.c b/gcc/testsuite/gcc.target/riscv/cpymem-32.c
new file mode 100644
index ..44ba14a1d51f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/cpymem-32.c
@@ -0,0 +1,138 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv32 } */
+/* { dg-options "-march=rv32gc -mabi=ilp32d -mtune=rocket" } */
+/* { dg-skip-if "" { *-*-* } {"-O0" "-Os" "-Og" "-Oz" "-flto" } } */
+/* { dg-final { che

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: add tests for overlapping mem ops

2024-05-15 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:ad0413b832400aa9e81e20070b3ef6b0a9a6d888

commit ad0413b832400aa9e81e20070b3ef6b0a9a6d888
Author: Christoph Müllner 
Date:   Mon Apr 29 03:06:52 2024 +0200

RISC-V: add tests for overlapping mem ops

A recent patch added the field overlap_op_by_pieces to the struct
riscv_tune_param, which is used by the TARGET_OVERLAP_OP_BY_PIECES_P()
hook. This hook is used by the by-pieces infrastructure to decide
if overlapping memory accesses should be emitted.

The changes in the expansion can be seen in the adjustments of the
cpymem test cases. These tests also reveal a limitation in the
RISC-V cpymem expansion that prevents this optimization as only
by-pieces cpymem expansions emit overlapping memory accesses.
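
To show what an overlapping access sequence means here, a small C sketch follows. It is only a model of the idea, not compiler output: copy_overlapping is a hypothetical helper, and fixed-size memcpy calls stand in for the (possibly unaligned) 4-byte loads and stores.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy N bytes (4 <= N <= 8) using two 4-byte accesses instead of a
   word access followed by narrower ones.  The second access starts at
   N - 4, so for N < 8 it overlaps the tail of the first.  Source and
   destination are assumed not to overlap each other.  */
static void
copy_overlapping (void *to, const void *from, size_t n)
{
  uint32_t lo, hi;

  assert (n >= 4 && n <= 8);
  memcpy (&lo, from, 4);
  memcpy (&hi, (const unsigned char *) from + (n - 4), 4);
  memcpy (to, &lo, 4);
  memcpy ((unsigned char *) to + (n - 4), &hi, 4);
}
```

For N = 7 the two accesses sit at offsets 0 and 3, which corresponds to the lw/sw pairs at offsets 0 and 3 that the adjusted copy_7 expectations in the diff below look for.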

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cpymem-32-ooo.c: Adjust for overlapping
access.
* gcc.target/riscv/cpymem-64-ooo.c: Likewise.

Signed-off-by: Christoph Müllner 
(cherry picked from commit 5814437b4fcc550697d6e286f49a2f8b108815bf)

Diff:
---
 gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c | 20 +++-
 gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c | 33 ++
 2 files changed, 20 insertions(+), 33 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c b/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c
index 946a773f77a0..947d58c30fa3 100644
--- a/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c
+++ b/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c
@@ -24,9 +24,8 @@ void copy_aligned_##N (void *to, void *from)  \
 **...
 **lw\t[at][0-9],0\([at][0-9]\)
 **sw\t[at][0-9],0\([at][0-9]\)
-**...
-**lbu\t[at][0-9],6\([at][0-9]\)
-**sb\t[at][0-9],6\([at][0-9]\)
+**lw\t[at][0-9],3\([at][0-9]\)
+**sw\t[at][0-9],3\([at][0-9]\)
 **...
 */
 COPY_N(7)
@@ -36,9 +35,8 @@ COPY_N(7)
 **...
 **lw\t[at][0-9],0\([at][0-9]\)
 **sw\t[at][0-9],0\([at][0-9]\)
-**...
-**lbu\t[at][0-9],6\([at][0-9]\)
-**sb\t[at][0-9],6\([at][0-9]\)
+**lw\t[at][0-9],3\([at][0-9]\)
+**sw\t[at][0-9],3\([at][0-9]\)
 **...
 */
 COPY_ALIGNED_N(7)
@@ -66,11 +64,10 @@ COPY_ALIGNED_N(8)
 **...
 **...
 **lw\t[at][0-9],0\([at][0-9]\)
-**...
 **sw\t[at][0-9],0\([at][0-9]\)
 **...
-**lbu\t[at][0-9],10\([at][0-9]\)
-**sb\t[at][0-9],10\([at][0-9]\)
+**lw\t[at][0-9],7\([at][0-9]\)
+**sw\t[at][0-9],7\([at][0-9]\)
 **...
 */
 COPY_N(11)
@@ -79,11 +76,10 @@ COPY_N(11)
 **copy_aligned_11:
 **...
 **lw\t[at][0-9],0\([at][0-9]\)
-**...
 **sw\t[at][0-9],0\([at][0-9]\)
 **...
-**lbu\t[at][0-9],10\([at][0-9]\)
-**sb\t[at][0-9],10\([at][0-9]\)
+**lw\t[at][0-9],7\([at][0-9]\)
+**sw\t[at][0-9],7\([at][0-9]\)
 **...
 */
 COPY_ALIGNED_N(11)
diff --git a/gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c b/gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c
index 08a927b94835..108748690cd3 100644
--- a/gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c
+++ b/gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c
@@ -24,9 +24,8 @@ void copy_aligned_##N (void *to, void *from)  \
 **...
 **lw\t[at][0-9],0\([at][0-9]\)
 **sw\t[at][0-9],0\([at][0-9]\)
-**...
-**lbu\t[at][0-9],6\([at][0-9]\)
-**sb\t[at][0-9],6\([at][0-9]\)
+**lw\t[at][0-9],3\([at][0-9]\)
+**sw\t[at][0-9],3\([at][0-9]\)
 **...
 */
 COPY_N(7)
@@ -36,9 +35,8 @@ COPY_N(7)
 **...
 **lw\t[at][0-9],0\([at][0-9]\)
 **sw\t[at][0-9],0\([at][0-9]\)
-**...
-**lbu\t[at][0-9],6\([at][0-9]\)
-**sb\t[at][0-9],6\([at][0-9]\)
+**lw\t[at][0-9],3\([at][0-9]\)
+**sw\t[at][0-9],3\([at][0-9]\)
 **...
 */
 COPY_ALIGNED_N(7)
@@ -66,9 +64,8 @@ COPY_ALIGNED_N(8)
 **...
 **ld\t[at][0-9],0\([at][0-9]\)
 **sd\t[at][0-9],0\([at][0-9]\)
-**...
-**lbu\t[at][0-9],10\([at][0-9]\)
-**sb\t[at][0-9],10\([at][0-9]\)
+**lw\t[at][0-9],7\([at][0-9]\)
+**sw\t[at][0-9],7\([at][0-9]\)
 **...
 */
 COPY_N(11)
@@ -77,11 +74,9 @@ COPY_N(11)
 **copy_aligned_11:
 **...
 **ld\t[at][0-9],0\([at][0-9]\)
-**...
 **sd\t[at][0-9],0\([at][0-9]\)
-**...
-**lbu\t[at][0-9],10\([at][0-9]\)
-**sb\t[at][0-9],10\([at][0-9]\)
+**lw\t[at][0-9],7\([at][0-9]\)
+**sw\t[at][0-9],7\([at][0-9]\)
 **...
 */
 COPY_ALIGNED_N(11)
@@ -90,11 +85,9 @@ COPY_ALIGNED_N(11)
 **copy_15:
 **...
 **ld\t[at][0-9],0\([at][0-9]\)
-**...
 **sd\t[at][0-9],0\([at][0-9]\)
-**...
-**lbu\t[at][0-9],14\([at][0-9]\)
-**sb\t[at][0-9],14\([at][0-9]\)
+**ld\t[at][0-9],7\([at][0-9]\)
+**sd\t[at][0-9],7\([at][0-9]\)
 **...
 */
 COPY_N(15)
@@ -103,11 +96,9 @@ COPY_N(15)
 **copy_aligned_15:
 **...
 **ld\t[at][0-9],0\([at][0-9]\)
-**...
 **sd\t[at][0-9],0\([at][0-9]\)
-**...
-**lbu\t[at][0-9],14\([at][0-9]\)
-**sb\t[at][0-9],14\([at][0-9]\)
+**ld\t[at][0-9],7\([at][0-9]\)
+**sd\

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Allow by-pieces to do overlapping accesses in block_move_straight

2024-05-15 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:59e6343f99eb53da07bbd6198f083ce1bbdf20d8

commit 59e6343f99eb53da07bbd6198f083ce1bbdf20d8
Author: Christoph Müllner 
Date:   Mon Apr 29 02:53:20 2024 +0200

RISC-V: Allow by-pieces to do overlapping accesses in block_move_straight

The current implementation of riscv_block_move_straight() emits a couple
of loads/stores with maximum width (e.g. 8-byte for RV64).
The remainder is handed over to move_by_pieces().
The by-pieces framework utilizes target hooks to decide about the emitted
instructions (e.g. unaligned accesses or overlapping accesses).

Since the current implementation will always request less than XLEN bytes
to be handled by the by-pieces infrastructure, it is impossible that
overlapping memory accesses can ever be emitted (the by-pieces code does
not know of any previous instructions that were emitted by the backend).

This patch changes riscv_block_move_straight() so that it hands the
remaining data to the by-pieces framework once less than 2*XLEN bytes
are left, which is sufficient to enable overlapping memory accesses
(if the requirements for them are met).
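The effect of the changed loop bound can be sketched in plain C. This is
only a model of the two loop conditions from the diff, not GCC code: the
names split, split_old and split_new are hypothetical, delta stands for
the access width in bytes, and the tail is what would be handed to
move_by_pieces().

```c
#include <assert.h>

/* Result of splitting a copy into full-width chunks plus a tail.  */
struct split
{
  unsigned long chunks; /* full-width loads/stores emitted directly */
  unsigned long tail;   /* bytes left over for the by-pieces code */
};

/* Old bound: emit a chunk whenever one full width still fits.  */
static struct split
split_old (unsigned long length, unsigned long delta)
{
  assert (delta > 0);
  struct split s = { 0, 0 };
  unsigned long offset;
  for (offset = 0; offset + delta <= length; offset += delta)
    s.chunks++;
  s.tail = length - offset;
  return s;
}

/* New bound: stop one chunk earlier, so the tail is at least DELTA and
   less than 2 * DELTA bytes whenever LENGTH is at least DELTA.  */
static struct split
split_new (unsigned long length, unsigned long delta)
{
  assert (delta > 0);
  struct split s = { 0, 0 };
  unsigned long offset;
  for (offset = 0; offset + 2 * delta <= length; offset += delta)
    s.chunks++;
  s.tail = length - offset;
  return s;
}
```

For a 15-byte copy on RV64 (delta = 8), the old bound leaves a 7-byte
tail after one 8-byte chunk, while the new bound leaves all 15 bytes as
the tail, which is large enough to be covered by two overlapping 8-byte
accesses at offsets 0 and 7.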

The changes in the expansion can be seen in the adjustments of the
cpymem-NN-ooo test cases. The changes in the cpymem-NN tests are
caused by the different instruction ordering of the code emitted
by the by-pieces infrastructure, which emits alternating load/store
sequences.

gcc/ChangeLog:

* config/riscv/riscv-string.cc (riscv_block_move_straight):
Hand over up to 2xXLEN bytes to move_by_pieces().

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cpymem-32-ooo.c: Adjustments for overlapping
access.
* gcc.target/riscv/cpymem-32.c: Adjustments for code emitted by
by-pieces.
* gcc.target/riscv/cpymem-64-ooo.c: Adjustments for overlapping
access.
* gcc.target/riscv/cpymem-64.c: Adjustments for code emitted by
by-pieces.

Signed-off-by: Christoph Müllner 
(cherry picked from commit ad22c607f3e17f2c6ca45699c1d88adaa618c23c)

Diff:
---
 gcc/config/riscv/riscv-string.cc   |  6 +++---
 gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c | 16 
 gcc/testsuite/gcc.target/riscv/cpymem-32.c | 10 --
 gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c |  8 
 gcc/testsuite/gcc.target/riscv/cpymem-64.c |  9 +++--
 5 files changed, 22 insertions(+), 27 deletions(-)

diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index b6cd70323563..96394844bbb6 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -637,18 +637,18 @@ riscv_block_move_straight (rtx dest, rtx src, unsigned HOST_WIDE_INT length,
   delta = bits / BITS_PER_UNIT;
 
   /* Allocate a buffer for the temporary registers.  */
-  regs = XALLOCAVEC (rtx, length / delta);
+  regs = XALLOCAVEC (rtx, length / delta - 1);
 
   /* Load as many BITS-sized chunks as possible.  Use a normal load if
  the source has enough alignment, otherwise use left/right pairs.  */
-  for (offset = 0, i = 0; offset + delta <= length; offset += delta, i++)
+  for (offset = 0, i = 0; offset + 2 * delta <= length; offset += delta, i++)
 {
   regs[i] = gen_reg_rtx (mode);
   riscv_emit_move (regs[i], adjust_address (src, mode, offset));
 }
 
   /* Copy the chunks to the destination.  */
-  for (offset = 0, i = 0; offset + delta <= length; offset += delta, i++)
+  for (offset = 0, i = 0; offset + 2 * delta <= length; offset += delta, i++)
 riscv_emit_move (adjust_address (dest, mode, offset), regs[i]);
 
   /* Mop up any left-over bytes.  */
diff --git a/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c b/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c
index 947d58c30fa3..2a48567353a6 100644
--- a/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c
+++ b/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c
@@ -91,8 +91,8 @@ COPY_ALIGNED_N(11)
 **...
 **sw\t[at][0-9],0\([at][0-9]\)
 **...
-**lbu\t[at][0-9],14\([at][0-9]\)
-**sb\t[at][0-9],14\([at][0-9]\)
+**lw\t[at][0-9],11\([at][0-9]\)
+**sw\t[at][0-9],11\([at][0-9]\)
 **...
 */
 COPY_N(15)
@@ -104,8 +104,8 @@ COPY_N(15)
 **...
 **sw\t[at][0-9],0\([at][0-9]\)
 **...
-**lbu\t[at][0-9],14\([at][0-9]\)
-**sb\t[at][0-9],14\([at][0-9]\)
+**lw\t[at][0-9],11\([at][0-9]\)
+**sw\t[at][0-9],11\([at][0-9]\)
 **...
 */
 COPY_ALIGNED_N(15)
@@ -117,8 +117,8 @@ COPY_ALIGNED_N(15)
 **...
 **sw\t[at][0-9],20\([at][0-9]\)
 **...
-**lbu\t[at][0-9],26\([at][0-9]\)
-**sb\t[at][0-9],26\([at][0-9]\)
+**lw\t[at][0-9],23\([at][0-9]\)
+**sw\t[at][0-9],23\([at][0-9]\)
 **...
 */
 COPY_N(27)
@@ -130,8 +130,8 @@ COPY_N(27)
 **...
 **sw\t[at][0-9],20\([at][0-9]\)
 **...
-**lbu\t[at][0-9],

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] RISC-V: Test cbo.zero expansion for rv32

2024-05-15 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:f3d5808070acf09d4ca1da5f5e692be52e3a73a6

commit f3d5808070acf09d4ca1da5f5e692be52e3a73a6
Author: Christoph Müllner 
Date:   Wed May 15 01:34:54 2024 +0200

RISC-V: Test cbo.zero expansion for rv32

We had an issue when expanding via cmo-zero for RV32.
This was fixed upstream, but we don't have an RV32 test.
Therefore, this patch introduces such a test.
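The store counts that the new test scans for can be reproduced with a
small arithmetic sketch. It assumes a 64-byte cache block (zic64b) and a
greedy split of the remaining bytes into the widest stores available;
the struct and function names are hypothetical, and the sketch models
only the expected counts, not the expander itself.

```c
#include <assert.h>

/* Hypothetical summary of the stores needed to clear a buffer.  */
struct clear_plan
{
  unsigned cbo_zero; /* number of 64-byte cbo.zero operations */
  unsigned store[4]; /* store[k]: number of (1 << k)-byte stores
                        (sb, sh, sw, sd) */
};

/* Split LENGTH bytes into cache blocks first, then into the widest
   stores that fit XLEN_BYTES (4 for rv32, 8 for rv64).  */
static struct clear_plan
plan_clear (unsigned length, unsigned xlen_bytes)
{
  assert (xlen_bytes == 4 || xlen_bytes == 8);
  struct clear_plan p = { 0, { 0, 0, 0, 0 } };
  p.cbo_zero = length / 64;
  length %= 64;
  for (int k = 3; k >= 0; k--)
    {
      unsigned size = 1u << k;
      if (size > xlen_bytes)
        continue; /* no sd on rv32 */
      p.store[k] = length / size;
      length %= size;
    }
  return p;
}
```

For 123 bytes this gives one cbo.zero plus 7 sd, 1 sh and 1 sb on rv64,
or 14 sw, 1 sh and 1 sb on rv32; for 128 bytes it gives two cbo.zero and
no stores.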

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cmo-zicboz-zic64-1.c: Fix for rv32.

Signed-off-by: Christoph Müllner 
(cherry picked from commit 5609d77e683944439fae38323ecabc44a1eb4671)

Diff:
---
 .../gcc.target/riscv/cmo-zicboz-zic64-1.c  | 37 +++---
 1 file changed, 11 insertions(+), 26 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicboz-zic64-1.c b/gcc/testsuite/gcc.target/riscv/cmo-zicboz-zic64-1.c
index 6d4535287d08..9192b391b11d 100644
--- a/gcc/testsuite/gcc.target/riscv/cmo-zicboz-zic64-1.c
+++ b/gcc/testsuite/gcc.target/riscv/cmo-zicboz-zic64-1.c
@@ -1,24 +1,9 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv64gc_zic64b_zicboz -mabi=lp64d" } */
+/* { dg-options "-march=rv32gc_zic64b_zicboz" { target { rv32 } } } */
+/* { dg-options "-march=rv64gc_zic64b_zicboz" { target { rv64 } } } */
 /* { dg-skip-if "" { *-*-* } {"-O0" "-Os" "-Og" "-Oz" "-flto" } } */
-/* { dg-final { check-function-bodies "**" "" } } */
-/* { dg-allow-blank-lines-in-output 1 } */
 
-/*
-**clear_buf_123:
-**...
-**cbo\.zero\t0\(a[0-9]+\)
-**sd\tzero,64\(a[0-9]+\)
-**sd\tzero,72\(a[0-9]+\)
-**sd\tzero,80\(a[0-9]+\)
-**sd\tzero,88\(a[0-9]+\)
-**sd\tzero,96\(a[0-9]+\)
-**sd\tzero,104\(a[0-9]+\)
-**sd\tzero,112\(a[0-9]+\)
-**sh\tzero,120\(a[0-9]+\)
-**sb\tzero,122\(a[0-9]+\)
-**...
-*/
+// 1x cbo.zero, 7x sd (rv64) or 14x sw (rv32), 1x sh, 1x sb
 int
 clear_buf_123 (void *p)
 {
@@ -26,17 +11,17 @@ clear_buf_123 (void *p)
   __builtin_memset (p, 0, 123);
 }
 
-/*
-**clear_buf_128:
-**...
-**cbo\.zero\t0\(a[0-9]+\)
-**addi\ta[0-9]+,a[0-9]+,64
-**cbo\.zero\t0\(a[0-9]+\)
-**...
-*/
+// 2x cbo.zero, 1x addi 64
 int
 clear_buf_128 (void *p)
 {
   p = __builtin_assume_aligned(p, 64);
   __builtin_memset (p, 0, 128);
 }
+
+/* { dg-final { scan-assembler-times "cbo\.zero\t" 3 } } */
+/* { dg-final { scan-assembler-times "addi\ta\[0-9\]+,a\[0-9\]+,64" 1 } } */
+/* { dg-final { scan-assembler-times "sd\t" 7 { target { rv64 } } } } */
+/* { dg-final { scan-assembler-times "sw\t" 14 { target { rv32 } } } } */
+/* { dg-final { scan-assembler-times "sh\t" 1 } } */
+/* { dg-final { scan-assembler-times "sb\t" 1 } } */


[gcc r15-524] [v2,1/2] RISC-V: Add cmpmemsi expansion

2024-05-15 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:4bf1aa1ab90dd487fadc27c86523ec3562b2d2fe

commit r15-524-g4bf1aa1ab90dd487fadc27c86523ec3562b2d2fe
Author: Christoph Müllner 
Date:   Wed May 15 12:18:20 2024 -0600

[v2,1/2] RISC-V: Add cmpmemsi expansion

GCC has a generic cmpmemsi expansion via the by-pieces framework,
which shows some room for target-specific optimizations.
E.g. for comparing two aligned memory blocks of 15 bytes
we get the following sequence:

my_mem_cmp_aligned_15:
li      a4,0
j       .L2
.L8:
bgeu    a4,a7,.L7
.L2:
add     a2,a0,a4
add     a3,a1,a4
lbu     a5,0(a2)
lbu     a6,0(a3)
addi    a4,a4,1
li      a7,15      // missed hoisting
subw    a5,a5,a6
andi    a5,a5,0xff // useless
beq     a5,zero,.L8
lbu     a0,0(a2)   // loading again!
lbu     a5,0(a3)   // loading again!
subw    a0,a0,a5
ret
.L7:
li      a0,0
ret

Diff first byte: 15 insns
Diff second byte: 25 insns
No diff: 25 insns

Possible improvements:
* unroll the loop and use load-with-displacement to avoid offset increments
* load and compare multiple (aligned) bytes at once
* Use the bitmanip/strcmp result calculation (reverse words and
  synthesize (a2 >= a3) ? 1 : -1 in a branchless sequence)

When applying these improvements we get the following sequence:

my_mem_cmp_aligned_15:
ld      a5,0(a0)
ld      a4,0(a1)
bne     a5,a4,.L2
ld      a5,8(a0)
ld      a4,8(a1)
slli    a5,a5,8
slli    a4,a4,8
bne     a5,a4,.L2
li      a0,0
.L3:
sext.w  a0,a0
ret
.L2:
rev8    a5,a5
rev8    a4,a4
sltu    a5,a5,a4
neg     a5,a5
ori     a0,a5,1
j       .L3

Diff first byte: 11 insns
Diff second byte: 16 insns
No diff: 11 insns

This patch implements these improvements.
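The improved sequence can be mirrored in portable C as a rough sketch.
The function names are hypothetical; __builtin_bswap64 is the GCC/Clang
builtin and stands in for rev8. As in the assembly, the second word is
loaded in full, so both buffers must have 16 readable bytes even though
only 15 are compared.

```c
#include <stdint.h>

/* Little-endian 64-bit load, written byte-wise so that the sketch does
   not depend on the endianness of the host.  */
static uint64_t
load_le64 (const unsigned char *p)
{
  uint64_t v = 0;
  for (int i = 7; i >= 0; i--)
    v = (v << 8) | p[i];
  return v;
}

/* Result calculation: reverse the bytes so that the lowest address is
   most significant, then synthesize (a >= b) ? 1 : -1 without a branch
   (sltu, neg, ori 1).  */
static int
result_from_words (uint64_t a, uint64_t b)
{
  a = __builtin_bswap64 (a);
  b = __builtin_bswap64 (b);
  int lt = a < b;  /* sltu */
  return -lt | 1;  /* neg, ori 1 */
}

/* Compare 15 bytes using two word-sized loads per buffer.  */
static int
memcmp15_sketch (const unsigned char *s1, const unsigned char *s2)
{
  uint64_t a = load_le64 (s1);
  uint64_t b = load_le64 (s2);
  if (a != b)
    return result_from_words (a, b);

  /* Shift out the 16th byte, which is not part of the comparison.  */
  a = load_le64 (s1 + 8) << 8;
  b = load_le64 (s2 + 8) << 8;
  if (a != b)
    return result_from_words (a, b);

  return 0;
}
```

The left shift by 8 drops the byte at the highest address of the second
word, which is why the special load modes in the expander are only
needed when the remaining length matches a narrower access exactly.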

The tests consist of an execution test (similar to
gcc/testsuite/gcc.dg/torture/inline-mem-cmp-1.c) and a few tests
that test the expansion conditions (known length and alignment).

Similar to the cpymemsi expansion, this patch does not introduce any
gating for the cmpmemsi expansion (on top of requiring the known length,
alignment and Zbb).

Bootstrapped and SPEC CPU 2017 tested.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_expand_block_compare): New
prototype.
* config/riscv/riscv-string.cc (GEN_EMIT_HELPER2): New helper
for zero_extendhi.
(do_load_from_addr): Add support for HI and SI/64 modes.
(do_load): Add helper for zero-extended loads.
(emit_memcmp_scalar_load_and_compare): New helper to emit memcmp.
(emit_memcmp_scalar_result_calculation): Likewise.
(riscv_expand_block_compare_scalar): Likewise.
(riscv_expand_block_compare): New RISC-V expander for memory compare.
* config/riscv/riscv.md (cmpmemsi): New cmpmem expansion.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cmpmemsi-1.c: New test.
* gcc.target/riscv/cmpmemsi-2.c: New test.
* gcc.target/riscv/cmpmemsi-3.c: New test.
* gcc.target/riscv/cmpmemsi.c: New test.

Diff:
---
 gcc/config/riscv/riscv-protos.h |  1 +
 gcc/config/riscv/riscv-string.cc| 40 +--
 gcc/config/riscv/riscv.md   | 15 ++
 gcc/testsuite/gcc.target/riscv/cmpmemsi-1.c |  6 
 gcc/testsuite/gcc.target/riscv/cmpmemsi-2.c | 42 
 gcc/testsuite/gcc.target/riscv/cmpmemsi-3.c | 43 +
 gcc/testsuite/gcc.target/riscv/cmpmemsi.c   | 22 +++
 7 files changed, 155 insertions(+), 14 deletions(-)

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 5c8a52b78a22..565ead1382a7 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -189,6 +189,7 @@ rtl_opt_pass * make_pass_avlprop (gcc::context *ctxt);
 rtl_opt_pass * make_pass_vsetvl (gcc::context *ctxt);
 
 /* Routines implemented in riscv-string.c.  */
+extern bool riscv_expand_block_compare (rtx, rtx, rtx, rtx);
 extern bool riscv_expand_block_move (rtx, rtx, rtx);
 extern bool riscv_expand_block_clear (rtx, rtx);
 
diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 96394844bbb6..8f3b6f925e01 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -86,35 +86,47 @@ GEN_EMIT_HELPER2(th_rev) /* do_th_rev2  */
 GEN_EMIT_HELPER2(th_tstnbz) /* do_th_tstnbz2  */
 GEN_EMIT_HELPER3(xor) /* do_xo

[gcc r15-525] [v2, 2/2] RISC-V: strcmp expansion: Use adjust_address() for address calculation

2024-05-15 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:1fbbae1d4ba3618a3da829a6d7e11a1606a583b3

commit r15-525-g1fbbae1d4ba3618a3da829a6d7e11a1606a583b3
Author: Christoph Müllner 
Date:   Wed May 15 12:19:40 2024 -0600

[v2,2/2] RISC-V: strcmp expansion: Use adjust_address() for address calculation

We have an arch-independent routine to generate an address with an offset.
Let's use that instead of doing the calculation in the backend.

gcc/ChangeLog:

* config/riscv/riscv-string.cc (emit_strcmp_scalar_load_and_compare):
Use adjust_address() to calculate MEM-PLUS pattern.

Diff:
---
 gcc/config/riscv/riscv-string.cc | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 8f3b6f925e01..cbb9724d2308 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -227,8 +227,6 @@ emit_strcmp_scalar_load_and_compare (rtx result, rtx src1, rtx src2,
 rtx final_label)
 {
   const unsigned HOST_WIDE_INT xlen = GET_MODE_SIZE (Xmode);
-  rtx src1_addr = force_reg (Pmode, XEXP (src1, 0));
-  rtx src2_addr = force_reg (Pmode, XEXP (src2, 0));
   unsigned HOST_WIDE_INT offset = 0;
 
   rtx testval = gen_reg_rtx (Xmode);
@@ -246,10 +244,10 @@ emit_strcmp_scalar_load_and_compare (rtx result, rtx src1, rtx src2,
   else
load_mode = Xmode;
 
-  rtx addr1 = gen_rtx_PLUS (Pmode, src1_addr, GEN_INT (offset));
-  do_load_from_addr (load_mode, data1, addr1, src1);
-  rtx addr2 = gen_rtx_PLUS (Pmode, src2_addr, GEN_INT (offset));
-  do_load_from_addr (load_mode, data2, addr2, src2);
+  rtx addr1 = adjust_address (src1, load_mode, offset);
+  do_load (load_mode, data1, addr1);
+  rtx addr2 = adjust_address (src2, load_mode, offset);
+  do_load (load_mode, data2, addr2);
 
   if (cmp_bytes == 1)
{


[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] [v2, 1/2] RISC-V: Add cmpmemsi expansion

2024-05-15 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:d57dfea6e051695349fb9f6da1c30899b7f5

commit d57dfea6e051695349fb9f6da1c30899b7f5
Author: Christoph Müllner 
Date:   Wed May 15 12:18:20 2024 -0600

[v2,1/2] RISC-V: Add cmpmemsi expansion

GCC has a generic cmpmemsi expansion via the by-pieces framework,
which shows some room for target-specific optimizations.
E.g. for comparing two aligned memory blocks of 15 bytes
we get the following sequence:

my_mem_cmp_aligned_15:
li      a4,0
j       .L2
.L8:
bgeu    a4,a7,.L7
.L2:
add     a2,a0,a4
add     a3,a1,a4
lbu     a5,0(a2)
lbu     a6,0(a3)
addi    a4,a4,1
li      a7,15      // missed hoisting
subw    a5,a5,a6
andi    a5,a5,0xff // useless
beq     a5,zero,.L8
lbu     a0,0(a2)   // loading again!
lbu     a5,0(a3)   // loading again!
subw    a0,a0,a5
ret
.L7:
li      a0,0
ret

Diff first byte: 15 insns
Diff second byte: 25 insns
No diff: 25 insns

Possible improvements:
* unroll the loop and use load-with-displacement to avoid offset increments
* load and compare multiple (aligned) bytes at once
* Use the bitmanip/strcmp result calculation (reverse words and
  synthesize (a2 >= a3) ? 1 : -1 in a branchless sequence)

When applying these improvements we get the following sequence:

my_mem_cmp_aligned_15:
ld      a5,0(a0)
ld      a4,0(a1)
bne     a5,a4,.L2
ld      a5,8(a0)
ld      a4,8(a1)
slli    a5,a5,8
slli    a4,a4,8
bne     a5,a4,.L2
li      a0,0
.L3:
sext.w  a0,a0
ret
.L2:
rev8    a5,a5
rev8    a4,a4
sltu    a5,a5,a4
neg     a5,a5
ori     a0,a5,1
j       .L3

Diff first byte: 11 insns
Diff second byte: 16 insns
No diff: 11 insns

This patch implements these improvements.

The tests consist of an execution test (similar to
gcc/testsuite/gcc.dg/torture/inline-mem-cmp-1.c) and a few tests
that test the expansion conditions (known length and alignment).

Similar to the cpymemsi expansion, this patch does not introduce any
gating for the cmpmemsi expansion (on top of requiring the known length,
alignment and Zbb).

Bootstrapped and SPEC CPU 2017 tested.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_expand_block_compare): New
prototype.
* config/riscv/riscv-string.cc (GEN_EMIT_HELPER2): New helper
for zero_extendhi.
(do_load_from_addr): Add support for HI and SI/64 modes.
(do_load): Add helper for zero-extended loads.
(emit_memcmp_scalar_load_and_compare): New helper to emit memcmp.
(emit_memcmp_scalar_result_calculation): Likewise.
(riscv_expand_block_compare_scalar): Likewise.
(riscv_expand_block_compare): New RISC-V expander for memory compare.
* config/riscv/riscv.md (cmpmemsi): New cmpmem expansion.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cmpmemsi-1.c: New test.
* gcc.target/riscv/cmpmemsi-2.c: New test.
* gcc.target/riscv/cmpmemsi-3.c: New test.
* gcc.target/riscv/cmpmemsi.c: New test.

(cherry picked from commit 4bf1aa1ab90dd487fadc27c86523ec3562b2d2fe)

Diff:
---
 gcc/config/riscv/riscv-protos.h |  1 +
 gcc/config/riscv/riscv-string.cc| 40 +--
 gcc/config/riscv/riscv.md   | 15 ++
 gcc/testsuite/gcc.target/riscv/cmpmemsi-1.c |  6 
 gcc/testsuite/gcc.target/riscv/cmpmemsi-2.c | 42 
 gcc/testsuite/gcc.target/riscv/cmpmemsi-3.c | 43 +
 gcc/testsuite/gcc.target/riscv/cmpmemsi.c   | 22 +++
 7 files changed, 155 insertions(+), 14 deletions(-)

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 5c8a52b78a22..565ead1382a7 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -189,6 +189,7 @@ rtl_opt_pass * make_pass_avlprop (gcc::context *ctxt);
 rtl_opt_pass * make_pass_vsetvl (gcc::context *ctxt);
 
 /* Routines implemented in riscv-string.c.  */
+extern bool riscv_expand_block_compare (rtx, rtx, rtx, rtx);
 extern bool riscv_expand_block_move (rtx, rtx, rtx);
 extern bool riscv_expand_block_clear (rtx, rtx);
 
diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 96394844bbb6..8f3b6f925e01 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -86,35 +86,47 @@ GEN_EMIT_HELPER2(th_rev) /* do_th_rev2  */
 GEN_EMIT_HE

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] [v2, 2/2] RISC-V: strcmp expansion: Use adjust_address() for address calculation

2024-05-15 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:72e6ff2bcf293116099988ebd367182cba699e9b

commit 72e6ff2bcf293116099988ebd367182cba699e9b
Author: Christoph Müllner 
Date:   Wed May 15 12:19:40 2024 -0600

[v2,2/2] RISC-V: strcmp expansion: Use adjust_address() for address calculation

We have an arch-independent routine to generate an address with an offset.
Let's use that instead of doing the calculation in the backend.

gcc/ChangeLog:

* config/riscv/riscv-string.cc (emit_strcmp_scalar_load_and_compare):
Use adjust_address() to calculate MEM-PLUS pattern.

(cherry picked from commit 1fbbae1d4ba3618a3da829a6d7e11a1606a583b3)

Diff:
---
 gcc/config/riscv/riscv-string.cc | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 8f3b6f925e01..cbb9724d2308 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -227,8 +227,6 @@ emit_strcmp_scalar_load_and_compare (rtx result, rtx src1, rtx src2,
 rtx final_label)
 {
   const unsigned HOST_WIDE_INT xlen = GET_MODE_SIZE (Xmode);
-  rtx src1_addr = force_reg (Pmode, XEXP (src1, 0));
-  rtx src2_addr = force_reg (Pmode, XEXP (src2, 0));
   unsigned HOST_WIDE_INT offset = 0;
 
   rtx testval = gen_reg_rtx (Xmode);
@@ -246,10 +244,10 @@ emit_strcmp_scalar_load_and_compare (rtx result, rtx src1, rtx src2,
   else
load_mode = Xmode;
 
-  rtx addr1 = gen_rtx_PLUS (Pmode, src1_addr, GEN_INT (offset));
-  do_load_from_addr (load_mode, data1, addr1, src1);
-  rtx addr2 = gen_rtx_PLUS (Pmode, src2_addr, GEN_INT (offset));
-  do_load_from_addr (load_mode, data2, addr2, src2);
+  rtx addr1 = adjust_address (src1, load_mode, offset);
+  do_load (load_mode, data1, addr1);
+  rtx addr2 = adjust_address (src2, load_mode, offset);
+  do_load (load_mode, data2, addr2);
 
   if (cmp_bytes == 1)
{


[gcc r15-526] analyzer: fix ICE seen with -fsanitize=undefined [PR114899]

2024-05-15 Thread David Malcolm via Gcc-cvs
https://gcc.gnu.org/g:1779e22150b917e28e959623c819ef943fab02df

commit r15-526-g1779e22150b917e28e959623c819ef943fab02df
Author: David Malcolm 
Date:   Wed May 15 18:40:56 2024 -0400

analyzer: fix ICE seen with -fsanitize=undefined [PR114899]

gcc/analyzer/ChangeLog:
PR analyzer/114899
* access-diagram.cc
(written_svalue_spatial_item::get_label_string): Bulletproof
against SSA_NAME_VAR being null.

gcc/testsuite/ChangeLog:
PR analyzer/114899
* c-c++-common/analyzer/out-of-bounds-diagram-pr114899.c: New test.

Signed-off-by: David Malcolm 

Diff:
---
 gcc/analyzer/access-diagram.cc|  3 ++-
 .../analyzer/out-of-bounds-diagram-pr114899.c | 15 +++
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/gcc/analyzer/access-diagram.cc b/gcc/analyzer/access-diagram.cc
index 500480b68328..8d7461fe381d 100644
--- a/gcc/analyzer/access-diagram.cc
+++ b/gcc/analyzer/access-diagram.cc
@@ -1632,7 +1632,8 @@ protected:
 if (rep_tree)
   {
if (TREE_CODE (rep_tree) == SSA_NAME)
- rep_tree = SSA_NAME_VAR (rep_tree);
+ if (tree var = SSA_NAME_VAR (rep_tree))
+   rep_tree = var;
switch (TREE_CODE (rep_tree))
  {
  default:
diff --git a/gcc/testsuite/c-c++-common/analyzer/out-of-bounds-diagram-pr114899.c b/gcc/testsuite/c-c++-common/analyzer/out-of-bounds-diagram-pr114899.c
new file mode 100644
index ..14ba540d4ec2
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/analyzer/out-of-bounds-diagram-pr114899.c
@@ -0,0 +1,15 @@
+/* Verify we don't ICE generating out-of-bounds diagram.  */
+
+/* { dg-additional-options " -fsanitize=undefined -fdiagnostics-text-art-charset=unicode" } */
+
+int * a() {
+  int *b = (int *)__builtin_malloc(sizeof(int));
+  int *c = b - 1;
+  ++*c;
+  return b;
+}
+
+/* We don't care about the exact diagram, just that we don't ICE.  */
+
+/* { dg-allow-blank-lines-in-output 1 } */
+/* { dg-prune-output ".*" } */


[gcc r15-527] Add missing hunk in recent change.

2024-05-15 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:d7e6fe0f72ad41b8361f927d2796dbc275347297

commit r15-527-gd7e6fe0f72ad41b8361f927d2796dbc275347297
Author: Jeff Law 
Date:   Wed May 15 17:05:24 2024 -0600

Add missing hunk in recent change.

gcc/
* config/riscv/riscv-string.cc: Add missing hunk from last change.

Diff:
---
 gcc/config/riscv/riscv-string.cc | 177 +++
 1 file changed, 177 insertions(+)

diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index cbb9724d2308..83e7afbd693b 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -627,6 +627,183 @@ riscv_expand_strlen (rtx result, rtx src, rtx search_char, rtx align)
   return false;
 }
 
+/* Generate the sequence of load and compares for memcmp using Zbb.
+
+   RESULT is the register where the return value of memcmp will be stored.
+   The source pointers are SRC1 and SRC2 (NBYTES bytes to compare).
+   DATA1 and DATA2 are registers where the data chunks will be stored.
+   DIFF_LABEL is the location of the code that calculates the return value.
+   FINAL_LABEL is the location of the code that comes after the calculation
+   of the return value.  */
+
+static void
+emit_memcmp_scalar_load_and_compare (rtx result, rtx src1, rtx src2,
+unsigned HOST_WIDE_INT nbytes,
+rtx data1, rtx data2,
+rtx diff_label, rtx final_label)
+{
+  const unsigned HOST_WIDE_INT xlen = GET_MODE_SIZE (Xmode);
+  unsigned HOST_WIDE_INT offset = 0;
+
+  while (nbytes > 0)
+{
+  unsigned HOST_WIDE_INT cmp_bytes = xlen < nbytes ? xlen : nbytes;
+  machine_mode load_mode;
+
+  /* Special cases to avoid masking of trailing bytes.  */
+  if (cmp_bytes == 1)
+   load_mode = QImode;
+  else if (cmp_bytes == 2)
+   load_mode = HImode;
+  else if (cmp_bytes == 4)
+   load_mode = SImode;
+  else
+   load_mode = Xmode;
+
+  rtx addr1 = adjust_address (src1, load_mode, offset);
+  do_load (load_mode, data1, addr1);
+  rtx addr2 = adjust_address (src2, load_mode, offset);
+  do_load (load_mode, data2, addr2);
+
+  /* Fast-path for a single byte.  */
+  if (cmp_bytes == 1)
+   {
+ rtx tmp = gen_reg_rtx (Xmode);
+ do_sub3 (tmp, data1, data2);
+ emit_insn (gen_movsi (result, gen_lowpart (SImode, tmp)));
+ emit_jump_insn (gen_jump (final_label));
+ emit_barrier (); /* No fall-through.  */
+ return;
+   }
+
+  /* Shift off trailing bytes in words if needed.  */
+  unsigned int load_bytes = GET_MODE_SIZE (load_mode).to_constant ();
+  if (cmp_bytes < load_bytes)
+   {
+ int shamt = (load_bytes - cmp_bytes) * BITS_PER_UNIT;
+ do_ashl3 (data1, data1, GEN_INT (shamt));
+ do_ashl3 (data2, data2, GEN_INT (shamt));
+   }
+
+  /* Break out if data1 != data2 */
+  rtx cond = gen_rtx_NE (VOIDmode, data1, data2);
+  emit_unlikely_jump_insn (gen_cbranch4 (Pmode, cond, data1,
+data2, diff_label));
+  /* Fall-through on equality.  */
+
+  offset += cmp_bytes;
+  nbytes -= cmp_bytes;
+}
+}
+
+/* memcmp result calculation.
+
+   RESULT is the register where the return value will be stored.
+   The two data chunks are in DATA1 and DATA2.  */
+
+static void
+emit_memcmp_scalar_result_calculation (rtx result, rtx data1, rtx data2)
+{
+  /* Get bytes in big-endian order and compare as words.  */
+  do_bswap2 (data1, data1);
+  do_bswap2 (data2, data2);
+  /* Synthesize (data1 >= data2) ? 1 : -1 in a branchless sequence.  */
+  rtx tmp = gen_reg_rtx (Xmode);
+  emit_insn (gen_slt_3 (LTU, Xmode, Xmode, tmp, data1, data2));
+  do_neg2 (tmp, tmp);
+  do_ior3 (tmp, tmp, const1_rtx);
+  emit_insn (gen_movsi (result, gen_lowpart (SImode, tmp)));
+}
+
+/* Expand memcmp using scalar instructions (incl. Zbb).
+
+   RESULT is the register where the return value will be stored.
+   The source pointers are SRC1 and SRC2 (NBYTES bytes to compare).  */
+
+static bool
+riscv_expand_block_compare_scalar (rtx result, rtx src1, rtx src2, rtx nbytes)
+{
+  const unsigned HOST_WIDE_INT xlen = GET_MODE_SIZE (Xmode);
+
+  if (optimize_function_for_size_p (cfun))
+return false;
+
+  /* We don't support big endian.  */
+  if (BYTES_BIG_ENDIAN)
+return false;
+
+  if (!CONST_INT_P (nbytes))
+return false;
+
+  /* We need the rev (bswap) instruction.  */
+  if (!TARGET_ZBB)
+return false;
+
+  unsigned HOST_WIDE_INT length = UINTVAL (nbytes);
+
+  /* Limit to 12-bits (maximum load-offset).  */
+  if (length > IMM_REACH)
+length = IMM_REACH;
+
+  /* We need xlen-aligned memory.  */
+  unsigned HOST_WIDE_INT align = MIN (MEM_ALIGN (src1), MEM_ALIGN (src2));
+  if (align < (xlen * BITS_PER_UNIT))
+return false;
+
+  if (length > RISCV_MAX_MOVE_BYTES_STRAIGHT)
+return false;

[gcc(refs/vendors/riscv/heads/gcc-14-with-riscv-opts)] Add missing hunk in recent change.

2024-05-15 Thread Jeff Law via Gcc-cvs
https://gcc.gnu.org/g:45c5684c8242add5e97a392374dc160a6e68f2f0

commit 45c5684c8242add5e97a392374dc160a6e68f2f0
Author: Jeff Law 
Date:   Wed May 15 17:05:24 2024 -0600

Add missing hunk in recent change.

gcc/
* config/riscv/riscv-string.cc: Add missing hunk from last change.

(cherry picked from commit d7e6fe0f72ad41b8361f927d2796dbc275347297)

Diff:
---
 gcc/config/riscv/riscv-string.cc | 177 +++
 1 file changed, 177 insertions(+)

diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index cbb9724d2308..83e7afbd693b 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -627,6 +627,183 @@ riscv_expand_strlen (rtx result, rtx src, rtx search_char, rtx align)
   return false;
 }
 
+/* Generate the sequence of load and compares for memcmp using Zbb.
+
+   RESULT is the register where the return value of memcmp will be stored.
+   The source pointers are SRC1 and SRC2 (NBYTES bytes to compare).
+   DATA1 and DATA2 are registers where the data chunks will be stored.
+   DIFF_LABEL is the location of the code that calculates the return value.
+   FINAL_LABEL is the location of the code that comes after the calculation
+   of the return value.  */
+
+static void
+emit_memcmp_scalar_load_and_compare (rtx result, rtx src1, rtx src2,
+unsigned HOST_WIDE_INT nbytes,
+rtx data1, rtx data2,
+rtx diff_label, rtx final_label)
+{
+  const unsigned HOST_WIDE_INT xlen = GET_MODE_SIZE (Xmode);
+  unsigned HOST_WIDE_INT offset = 0;
+
+  while (nbytes > 0)
+{
+  unsigned HOST_WIDE_INT cmp_bytes = xlen < nbytes ? xlen : nbytes;
+  machine_mode load_mode;
+
+  /* Special cases to avoid masking of trailing bytes.  */
+  if (cmp_bytes == 1)
+   load_mode = QImode;
+  else if (cmp_bytes == 2)
+   load_mode = HImode;
+  else if (cmp_bytes == 4)
+   load_mode = SImode;
+  else
+   load_mode = Xmode;
+
+  rtx addr1 = adjust_address (src1, load_mode, offset);
+  do_load (load_mode, data1, addr1);
+  rtx addr2 = adjust_address (src2, load_mode, offset);
+  do_load (load_mode, data2, addr2);
+
+  /* Fast-path for a single byte.  */
+  if (cmp_bytes == 1)
+   {
+ rtx tmp = gen_reg_rtx (Xmode);
+ do_sub3 (tmp, data1, data2);
+ emit_insn (gen_movsi (result, gen_lowpart (SImode, tmp)));
+ emit_jump_insn (gen_jump (final_label));
+ emit_barrier (); /* No fall-through.  */
+ return;
+   }
+
+  /* Shift off trailing bytes in words if needed.  */
+  unsigned int load_bytes = GET_MODE_SIZE (load_mode).to_constant ();
+  if (cmp_bytes < load_bytes)
+   {
+ int shamt = (load_bytes - cmp_bytes) * BITS_PER_UNIT;
+ do_ashl3 (data1, data1, GEN_INT (shamt));
+ do_ashl3 (data2, data2, GEN_INT (shamt));
+   }
+
+  /* Break out if data1 != data2 */
+  rtx cond = gen_rtx_NE (VOIDmode, data1, data2);
+  emit_unlikely_jump_insn (gen_cbranch4 (Pmode, cond, data1,
+data2, diff_label));
+  /* Fall-through on equality.  */
+
+  offset += cmp_bytes;
+  nbytes -= cmp_bytes;
+}
+}
+
+/* memcmp result calculation.
+
+   RESULT is the register where the return value will be stored.
+   The two data chunks are in DATA1 and DATA2.  */
+
+static void
+emit_memcmp_scalar_result_calculation (rtx result, rtx data1, rtx data2)
+{
+  /* Get bytes in big-endian order and compare as words.  */
+  do_bswap2 (data1, data1);
+  do_bswap2 (data2, data2);
+  /* Synthesize (data1 >= data2) ? 1 : -1 in a branchless sequence.  */
+  rtx tmp = gen_reg_rtx (Xmode);
+  emit_insn (gen_slt_3 (LTU, Xmode, Xmode, tmp, data1, data2));
+  do_neg2 (tmp, tmp);
+  do_ior3 (tmp, tmp, const1_rtx);
+  emit_insn (gen_movsi (result, gen_lowpart (SImode, tmp)));
+}
+
+/* Expand memcmp using scalar instructions (incl. Zbb).
+
+   RESULT is the register where the return value will be stored.
+   The source pointers are SRC1 and SRC2 (NBYTES bytes to compare).  */
+
+static bool
+riscv_expand_block_compare_scalar (rtx result, rtx src1, rtx src2, rtx nbytes)
+{
+  const unsigned HOST_WIDE_INT xlen = GET_MODE_SIZE (Xmode);
+
+  if (optimize_function_for_size_p (cfun))
+return false;
+
+  /* We don't support big endian.  */
+  if (BYTES_BIG_ENDIAN)
+return false;
+
+  if (!CONST_INT_P (nbytes))
+return false;
+
+  /* We need the rev (bswap) instruction.  */
+  if (!TARGET_ZBB)
+return false;
+
+  unsigned HOST_WIDE_INT length = UINTVAL (nbytes);
+
+  /* Limit to 12-bits (maximum load-offset).  */
+  if (length > IMM_REACH)
+length = IMM_REACH;
+
+  /* We need xlen-aligned memory.  */
+  unsigned HOST_WIDE_INT align = MIN (MEM_ALIGN (src1), MEM_ALIGN (src2));
+  if (align < (xlen * BITS_PER_UNIT))
+return false

[gcc r15-529] Optimize ashift >> 7 to vpcmpgtb for vector int8.

2024-05-15 Thread hongtao Liu via Gcc-cvs
https://gcc.gnu.org/g:0cc0956b3bb8bcbc9196075b9073a227d799e042

commit r15-529-g0cc0956b3bb8bcbc9196075b9073a227d799e042
Author: liuhongt 
Date:   Tue May 14 18:39:54 2024 +0800

Optimize ashift >> 7 to vpcmpgtb for vector int8.

Since there is no corresponding instruction, the shift operation for
vector int8 is implemented using the instructions for vector int16,
but for some special shift counts, it can be transformed into vpcmpgtb.
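
The equivalence behind this transformation can be checked one element at a time. The following is only a scalar sketch, not the vector expansion from the patch: the function names are made up for illustration, and it assumes that >> on a negative signed value is an arithmetic shift (as GCC implements it).

```c
#include <assert.h>
#include <stdint.h>

/* Per-element reference: arithmetic shift right by 7.  Assumes >> on a
   negative signed value is an arithmetic shift, as GCC implements it.  */
int8_t
shift_right_7 (int8_t x)
{
  return (int8_t) (x >> 7);
}

/* Per-element result of a vector "0 > x" compare: all ones (-1) when
   true, all zeros when false.  */
int8_t
zero_greater_than (int8_t x)
{
  return (int8_t) -(0 > x);
}

/* Count the byte values on which the two forms disagree.  */
int
count_mismatches (void)
{
  int mismatches = 0;
  for (int v = -128; v <= 127; v++)
    if (shift_right_7 ((int8_t) v) != zero_greater_than ((int8_t) v))
      mismatches++;
  return mismatches;
}

/* Usage: the two forms agree on every signed byte.  */
void
check_shift_equals_compare (void)
{
  assert (count_mismatches () == 0);
}
```

Since both forms only depend on the sign bit, a single compare against zero can stand in for the widen, shift, narrow sequence.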

gcc/ChangeLog:

PR target/114514
* config/i386/i386-expand.cc
(ix86_expand_vec_shift_qihi_constant): Optimize ashift >> 7 to
vpcmpgtb.
(ix86_expand_vecop_qihi_partial): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr114514-shift.c: New test.

Diff:
---
 gcc/config/i386/i386-expand.cc | 32 +
 gcc/testsuite/gcc.target/i386/pr114514-shift.c | 49 ++
 2 files changed, 81 insertions(+)

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index e846a946de07..4c47cfe468ef 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -24246,6 +24246,28 @@ ix86_expand_vec_shift_qihi_constant (enum rtx_code code,
 return false;
 
   gcc_assert (code == ASHIFT || code == ASHIFTRT || code == LSHIFTRT);
+
+
+  if (shift_amount == 7
+  && code == ASHIFTRT)
+{
+  if (qimode == V16QImode
+ || qimode == V32QImode)
+   {
+ rtx zero = gen_reg_rtx (qimode);
+ emit_move_insn (zero, CONST0_RTX (qimode));
+ emit_move_insn (dest, gen_rtx_fmt_ee (GT, qimode, zero, op1));
+   }
+  else
+   {
+ gcc_assert (qimode == V64QImode);
+ rtx kmask = gen_reg_rtx (DImode);
+ emit_insn (gen_avx512bw_cvtb2maskv64qi (kmask, op1));
+ emit_insn (gen_avx512bw_cvtmask2bv64qi (dest, kmask));
+   }
+  return true;
+}
+
   /* Record sign bit.  */
   xor_constant = 1 << (8 - shift_amount - 1);
 
@@ -24356,6 +24378,16 @@ ix86_expand_vecop_qihi_partial (enum rtx_code code, rtx dest, rtx op1, rtx op2)
   return;
 }
 
+  if (CONST_INT_P (op2)
+  && code == ASHIFTRT
+  && INTVAL (op2) == 7)
+{
+  rtx zero = gen_reg_rtx (qimode);
+  emit_move_insn (zero, CONST0_RTX (qimode));
+  emit_move_insn (dest, gen_rtx_fmt_ee (GT, qimode, zero, op1));
+  return;
+}
+
   switch (code)
 {
 case MULT:
diff --git a/gcc/testsuite/gcc.target/i386/pr114514-shift.c b/gcc/testsuite/gcc.target/i386/pr114514-shift.c
new file mode 100644
index ..cf8b32b3b1d2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr114514-shift.c
@@ -0,0 +1,49 @@
+/* { dg-do compile  } */
+/* { dg-options "-mavx512vl -mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "vpxor" 4 } } */
+/* { dg-final { scan-assembler-times "vpcmpgtb" 4 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vpcmpgtb" 5 { target  ia32 } } } */
+/* { dg-final { scan-assembler-times "vpmovb2m" 1 } } */
+/* { dg-final { scan-assembler-times "vpmovm2b" 1 } } */
+
+
+typedef char v16qi __attribute__((vector_size(16)));
+typedef char v32qi __attribute__((vector_size(32)));
+typedef char v64qi __attribute__((vector_size(64)));
+typedef char v8qi __attribute__((vector_size(8)));
+typedef char v4qi __attribute__((vector_size(4)));
+
+v4qi
+__attribute__((noipa))
+foo1 (v4qi a)
+{
+  return a >> 7;
+}
+
+v8qi
+__attribute__((noipa))
+foo2 (v8qi a)
+{
+  return a >> 7;
+}
+
+v16qi
+__attribute__((noipa))
+foo3 (v16qi a)
+{
+  return a >> 7;
+}
+
+v32qi
+__attribute__((noipa))
+foo4 (v32qi a)
+{
+  return a >> 7;
+}
+
+v64qi
+__attribute__((noipa))
+foo5 (v64qi a)
+{
+  return a >> 7;
+}


[gcc r15-530] Set d.one_operand_p to true when TARGET_SSSE3 in ix86_expand_vecop_qihi_partial.

2024-05-15 Thread hongtao Liu via Gcc-cvs
https://gcc.gnu.org/g:090714e6cf8029f4ff8883dce687200024adbaeb

commit r15-530-g090714e6cf8029f4ff8883dce687200024adbaeb
Author: liuhongt 
Date:   Wed May 15 10:56:24 2024 +0800

Set d.one_operand_p to true when TARGET_SSSE3 in ix86_expand_vecop_qihi_partial.

pshufb is available under TARGET_SSSE3, so
ix86_expand_vec_perm_const_1 must return true when TARGET_SSSE3.

With the patch under -march=x86-64-v2

v8qi
foo (v8qi a)
{
  return a >> 5;
}

<   pmovsxbw  %xmm0, %xmm0
<   psraw     $5, %xmm0
<   pshufb    .LC0(%rip), %xmm0

vs.

>   movdqa    %xmm0, %xmm1
>   pcmpeqd   %xmm0, %xmm0
>   pmovsxbw  %xmm1, %xmm1
>   psrlw     $8, %xmm0
>   psraw     $5, %xmm1
>   pand      %xmm1, %xmm0
>   packuswb  %xmm0, %xmm0

Although this adds a memory load from the constant pool, it should be
better inside a loop, where the load can be hoisted out. It is then
1 instruction vs. 4 instructions.

<   pshufb    .LC0(%rip), %xmm0

vs.

>   pcmpeqd   %xmm0, %xmm0
>   psrlw     $8, %xmm0
>   pand      %xmm1, %xmm0
>   packuswb  %xmm0, %xmm0
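
The two narrowing sequences compared above can be modelled in scalar C to see that they produce the same bytes. This is only a sketch: the helper names are made up, the shuffle control vector is a hypothetical stand-in for whatever .LC0 actually holds, and signed >> is assumed to be arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Model of pand with 0x00ff followed by packuswb: mask each 16-bit lane,
   then pack with unsigned saturation (which cannot trigger after the
   mask).  */
void
narrow_mask_pack (const uint16_t in[8], uint8_t out[8])
{
  for (int i = 0; i < 8; i++)
    {
      int16_t lane = (int16_t) (in[i] & 0x00ff);
      out[i] = lane < 0 ? 0 : lane > 255 ? 255 : (uint8_t) lane;
    }
}

/* Model of pshufb: each control byte selects a source byte, or zero
   when its top bit is set.  */
void
shuffle_bytes (const uint8_t src[16], const uint8_t ctrl[16],
               uint8_t dst[16])
{
  for (int i = 0; i < 16; i++)
    dst[i] = (ctrl[i] & 0x80) ? 0 : src[ctrl[i] & 0x0f];
}

/* Narrow by picking the low byte of every lane (little-endian lane
   layout, as on x86).  The control vector is hypothetical.  */
void
narrow_shuffle (const uint16_t in[8], uint8_t out[8])
{
  uint8_t bytes[16], ctrl[16], res[16];
  for (int i = 0; i < 8; i++)
    {
      bytes[2 * i] = (uint8_t) (in[i] & 0xff);
      bytes[2 * i + 1] = (uint8_t) (in[i] >> 8);
    }
  for (int i = 0; i < 16; i++)
    ctrl[i] = i < 8 ? (uint8_t) (2 * i) : 0x80;
  shuffle_bytes (bytes, ctrl, res);
  for (int i = 0; i < 8; i++)
    out[i] = res[i];
}

/* Usage: both routes give a >> 5 for a few signed bytes.  */
void
check_narrowing_agrees (void)
{
  const int8_t a[8] = { -128, -33, -1, 0, 1, 31, 64, 127 };
  uint16_t wide[8];
  uint8_t by_pack[8], by_shuffle[8];
  for (int i = 0; i < 8; i++)
    wide[i] = (uint16_t) ((int16_t) a[i] >> 5); /* pmovsxbw + psraw $5 */
  narrow_mask_pack (wide, by_pack);
  narrow_shuffle (wide, by_shuffle);
  for (int i = 0; i < 8; i++)
    {
      assert (by_pack[i] == by_shuffle[i]);
      assert ((int8_t) by_pack[i] == (int8_t) (a[i] >> 5));
    }
}
```

The shuffle route needs one operation once its control vector is loaded, which is where the instruction-count difference in the commit message comes from.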

gcc/ChangeLog:

PR target/114514
* config/i386/i386-expand.cc (ix86_expand_vecop_qihi_partial):
Set d.one_operand_p to true when TARGET_SSSE3.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr114514-shufb.c: New test.

Diff:
---
 gcc/config/i386/i386-expand.cc |  2 +-
 gcc/testsuite/gcc.target/i386/pr114514-shufb.c | 35 ++
 2 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 4c47cfe468ef..4e16aedc5c13 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -24458,7 +24458,7 @@ ix86_expand_vecop_qihi_partial (enum rtx_code code, rtx dest, rtx op1, rtx op2)
   d.op0 = d.op1 = qres;
   d.vmode = V16QImode;
   d.nelt = 16;
-  d.one_operand_p = false;
+  d.one_operand_p = TARGET_SSSE3;
   d.testing_p = false;
 
   for (i = 0; i < d.nelt; ++i)
diff --git a/gcc/testsuite/gcc.target/i386/pr114514-shufb.c b/gcc/testsuite/gcc.target/i386/pr114514-shufb.c
new file mode 100644
index ..71fdc9d8daf1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr114514-shufb.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-options "-msse4.1 -O2 -mno-avx512f" } */
+/* { dg-final { scan-assembler-not "packuswb" } }  */
+/* { dg-final { scan-assembler-times "pshufb" 4 { target { ! ia32 } } } }  */
+/* { dg-final { scan-assembler-times "pshufb" 6 { target  ia32 } } }  */
+
+typedef unsigned char v8uqi __attribute__((vector_size(8)));
+typedef  char v8qi __attribute__((vector_size(8)));
+typedef unsigned char v4uqi __attribute__((vector_size(4)));
+typedef  char v4qi __attribute__((vector_size(4)));
+
+v8qi
+foo (v8qi a)
+{
+  return a >> 5;
+}
+
+v8uqi
+foo1 (v8uqi a)
+{
+  return a >> 5;
+}
+
+v4qi
+foo2 (v4qi a)
+{
+  return a >> 5;
+}
+
+v4uqi
+foo3 (v4uqi a)
+{
+  return a >> 5;
+}
+


[gcc r15-531] RISC-V: Add Zvfbfwma extension to the -march= option

2024-05-15 Thread xiao via Gcc-cvs
https://gcc.gnu.org/g:38dd4e26e07c6be7cf4d169141ee4f3a03f3a09d

commit r15-531-g38dd4e26e07c6be7cf4d169141ee4f3a03f3a09d
Author: Xiao Zeng 
Date:   Wed May 15 10:03:40 2024 +0800

RISC-V: Add Zvfbfwma extension to the -march= option

This patch adds a new sub-extension, Zvfbfwma, to the
-march= option. It introduces a new data type BF16.

1 In spec: "Zvfbfwma requires the Zvfbfmin extension and the Zfbfmin extension."
  1.1 In Embedded Processor: Zvfbfwma -> Zvfbfmin -> Zve32f
  1.2 In Application Processor: Zvfbfwma -> Zvfbfmin -> V
  1.3 In both scenarios: Zvfbfwma -> Zfbfmin

2 Zvfbfmin's information is in:



3 Zfbfmin's information is in:



4 Depending on the usage scenario, the Zvfbfwma extension may depend on
'V' or 'Zve32f'. This patch implements only the dependency for the
Embedded Processor scenario, which is consistent with how Zvfbfmin is
handled. In the Application Processor scenario, the 'V' extension must
be specified explicitly.

5 More information about Zvfbfwma can be found in the spec doc below:


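The dependency chain in items 1 and 4 can be sketched as a lookup over (extension, implied extension) pairs. This is a minimal illustration, not GCC's own resolution code in riscv-common.cc: the struct, table and function names are hypothetical, and only the three pairs visible in the diff are listed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct implied_pair
{
  const char *ext;
  const char *implied;
};

/* The three pairs visible in the diff; the real table is much longer.  */
static const struct implied_pair implied_info[] = {
  { "zvfbfmin", "zve32f" },
  { "zvfbfwma", "zvfbfmin" },
  { "zvfbfwma", "zfbfmin" },
};

/* Return true if enabling EXT enables TARGET, directly or transitively.
   Assumes the table has no cycles.  */
bool
ext_implies (const char *ext, const char *target)
{
  if (strcmp (ext, target) == 0)
    return true;
  for (size_t i = 0; i < sizeof implied_info / sizeof implied_info[0]; i++)
    if (strcmp (implied_info[i].ext, ext) == 0
        && ext_implies (implied_info[i].implied, target))
      return true;
  return false;
}

/* Usage: zvfbfwma pulls in zve32f and zfbfmin, but not v.  */
void
check_zvfbfwma_implications (void)
{
  assert (ext_implies ("zvfbfwma", "zvfbfmin"));
  assert (ext_implies ("zvfbfwma", "zve32f"));
  assert (ext_implies ("zvfbfwma", "zfbfmin"));
  assert (!ext_implies ("zvfbfwma", "v"));
}
```

With only these pairs, 'v' is never reached from zvfbfwma, which matches the note that the Application Processor scenario has to name 'V' explicitly.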

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc:
(riscv_implied_info): Add zvfbfwma item.
(riscv_ext_version_table): Ditto.
(riscv_ext_flag_table): Ditto.
* config/riscv/riscv.opt:
(MASK_ZVFBFWMA): New macro.
(TARGET_ZVFBFWMA): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-37.c: New test.
* gcc.target/riscv/arch-38.c: New test.
* gcc.target/riscv/predef-36.c: New test.
* gcc.target/riscv/predef-37.c: New test.

Diff:
---
 gcc/common/config/riscv/riscv-common.cc|  5 
 gcc/config/riscv/riscv.opt |  2 ++
 gcc/testsuite/gcc.target/riscv/arch-37.c   |  5 
 gcc/testsuite/gcc.target/riscv/arch-38.c   |  5 
 gcc/testsuite/gcc.target/riscv/predef-36.c | 48 ++
 gcc/testsuite/gcc.target/riscv/predef-37.c | 48 ++
 6 files changed, 113 insertions(+)

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index fb76017ffbc0..88204393fde0 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -162,6 +162,8 @@ static const riscv_implied_info_t riscv_implied_info[] =
   {"zfa", "f"},
 
   {"zvfbfmin", "zve32f"},
+  {"zvfbfwma", "zvfbfmin"},
+  {"zvfbfwma", "zfbfmin"},
   {"zvfhmin", "zve32f"},
   {"zvfh", "zve32f"},
   {"zvfh", "zfhmin"},
@@ -336,6 +338,7 @@ static const struct riscv_ext_version riscv_ext_version_table[] =
   {"zfh",   ISA_SPEC_CLASS_NONE, 1, 0},
   {"zfhmin",ISA_SPEC_CLASS_NONE, 1, 0},
   {"zvfbfmin",  ISA_SPEC_CLASS_NONE, 1, 0},
+  {"zvfbfwma",  ISA_SPEC_CLASS_NONE, 1, 0},
   {"zvfhmin",   ISA_SPEC_CLASS_NONE, 1, 0},
   {"zvfh",  ISA_SPEC_CLASS_NONE, 1, 0},
 
@@ -1667,6 +1670,7 @@ static const riscv_ext_flag_table_t riscv_ext_flag_table[] =
   {"zve64f",   &gcc_options::x_riscv_vector_elen_flags, MASK_VECTOR_ELEN_FP_32},
   {"zve64d",   &gcc_options::x_riscv_vector_elen_flags, MASK_VECTOR_ELEN_FP_64},
   {"zvfbfmin", &gcc_options::x_riscv_vector_elen_flags, MASK_VECTOR_ELEN_BF_16},
+  {"zvfbfwma", &gcc_options::x_riscv_vector_elen_flags, MASK_VECTOR_ELEN_BF_16},
   {"zvfhmin",  &gcc_options::x_riscv_vector_elen_flags, MASK_VECTOR_ELEN_FP_16},
   {"zvfh", &gcc_options::x_riscv_vector_elen_flags, MASK_VECTOR_ELEN_FP_16},
 
@@ -1704,6 +1708,7 @@ static const riscv_ext_flag_table_t riscv_ext_flag_table[] =
   {"zfhmin",&gcc_options::x_riscv_zf_subext, MASK_ZFHMIN},
   {"zfh",   &gcc_options::x_riscv_zf_subext, MASK_ZFH},
   {"zvfbfmin",  &gcc_options::x_riscv_zf_subext, MASK_ZVFBFMIN},
+  {"zvfbfwma",  &gcc_options::x_riscv_zf_subext, MASK_ZVFBFWMA},
   {"zvfhmin",   &gcc_options::x_riscv_zf_subext, MASK_ZVFHMIN},
   {"zvfh",  &gcc_options::x_riscv_zf_subext, MASK_ZVFH},
 
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 1252834aec5b..d209ac896fde 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -401,6 +401,8 @@ Mask(ZFH) Var(riscv_zf_subext)
 
 Mask(ZVFBFMIN) Var(riscv_zf_subext)
 
+Mask(ZVFBFWMA) Var(riscv_zf_subext)
+
 Mask(ZVFHMIN) Var(riscv_zf_subext)
 
 Mask(ZVFH)Var(riscv_zf_subext)
diff --git a/gcc/testsuite/gcc.target/riscv/arch-37.c b/gcc/testsuite/gcc.target/riscv/arch-37.c
new file mode 100644
index ..5b19a73c5567
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/arch-37.c
@@ -0,0 +1,5

[gcc r15-532] diagnostics: handle SGR codes in line_label::m_display_width

2024-05-15 Thread David Malcolm via Gcc-cvs
https://gcc.gnu.org/g:a7be993806a90a58397e9d5bc9b54160ac9f35db

commit r15-532-ga7be993806a90a58397e9d5bc9b54160ac9f35db
Author: David Malcolm 
Date:   Wed May 15 21:22:51 2024 -0400

diagnostics: handle SGR codes in line_label::m_display_width

gcc/ChangeLog:
* diagnostic-show-locus.cc: Define INCLUDE_VECTOR and include
"text-art/types.h".
(line_label::line_label): Drop "policy" argument.  Use
styled_string::calc_canvas_width when computing m_display_width,
as this skips SGR codes.
(layout::print_any_labels): Update for line_label ctor change.
(selftest::test_one_liner_labels_utf8): Update expected text to
reflect that the labels can fit on one line if we don't get
confused by SGR colorization codes.
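
The idea of measuring a label's width while ignoring SGR codes can be sketched as follows. This is not the text_art::styled_string implementation the patch uses: visible_width is a hypothetical helper that only understands "ESC [ ... m" sequences and treats every other byte as one column, so it is correct for ASCII text only.

```c
#include <assert.h>
#include <stddef.h>

/* Count the columns of S, skipping "ESC [ ... m" (SGR) sequences.
   Every other byte counts as one column, so this is ASCII-only.  */
size_t
visible_width (const char *s)
{
  size_t width = 0;
  while (*s)
    {
      if (s[0] == '\x1b' && s[1] == '[')
        {
          /* Skip the parameter bytes up to and including the final 'm'.  */
          s += 2;
          while (*s && *s != 'm')
            s++;
          if (*s == 'm')
            s++;
          continue;
        }
      width++;
      s++;
    }
  return width;
}

/* Usage: colorization does not change the measured width.  */
void
check_visible_width (void)
{
  assert (visible_width ("label 0") == 7);
  assert (visible_width ("\x1b[01;35mlabel 0\x1b[m") == 7);
}
```

A byte count such as strlen would report 18 for the colorized string, which is the kind of over-estimate that pushed labels onto separate lines.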

Signed-off-by: David Malcolm 

Diff:
---
 gcc/diagnostic-show-locus.cc | 28 +---
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/gcc/diagnostic-show-locus.cc b/gcc/diagnostic-show-locus.cc
index ceccc0b793d1..f42006cfe2a1 100644
--- a/gcc/diagnostic-show-locus.cc
+++ b/gcc/diagnostic-show-locus.cc
@@ -19,6 +19,7 @@ along with GCC; see the file COPYING3.  If not see
 .  */
 
 #include "config.h"
+#define INCLUDE_VECTOR
 #include "system.h"
 #include "coretypes.h"
 #include "version.h"
@@ -31,6 +32,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "selftest.h"
 #include "selftest-diagnostic.h"
 #include "cpplib.h"
+#include "text-art/types.h"
 
 #ifdef HAVE_TERMIOS_H
 # include <termios.h>
@@ -1923,14 +1925,18 @@ struct pod_label_text
 class line_label
 {
 public:
-  line_label (const cpp_char_column_policy &policy,
- int state_idx, int column,
+  line_label (int state_idx, int column,
  label_text text)
   : m_state_idx (state_idx), m_column (column),
 m_text (std::move (text)), m_label_line (0), m_has_vbar (true)
   {
-const int bytes = strlen (m_text.m_buffer);
-m_display_width = cpp_display_width (m_text.m_buffer, bytes, policy);
+/* Using styled_string rather than cpp_display_width here
+   lets us skip SGR formatting characters for color and URLs.
+   It doesn't handle tabs and unicode escaping, but we don't
+   expect to see either of those in labels.  */
+text_art::style_manager sm;
+text_art::styled_string str (sm, m_text.m_buffer);
+m_display_width = str.calc_canvas_width ();
   }
 
   /* Sorting is primarily by column, then by state index.  */
@@ -1990,7 +1996,7 @@ layout::print_any_labels (linenum_type row)
if (text.get () == NULL)
  continue;
 
-   labels.safe_push (line_label (m_policy, i, disp_col, std::move (text)));
+   labels.safe_push (line_label (i, disp_col, std::move (text)));
   }
   }
 
@@ -4382,9 +4388,9 @@ test_one_liner_labels_utf8 ()
   ASSERT_STREQ (" _foo = _bar._field;\n"
" ^    ~~~\n"
" |   ||\n"
-   " |   |label 2\xcf\x80\n"
-   " |   label 1\xcf\x80\n"
-   " label 0\xf0\x9f\x98\x82\n",
+   " label 0\xf0\x9f\x98\x82"
+   /* ... */ "   label 1\xcf\x80"
+   /* ...*/ " label 2\xcf\x80\n",
pp_formatted_text (dc.printer));
 }
 {
@@ -4395,9 +4401,9 @@ test_one_liner_labels_utf8 ()
(" <9f><98><82>_foo = 
<80>_bar.<9f><98><82>_field<80>;\n"
 " ^~~~    ~~\n"
 " |  ||\n"
-" |  |label 2\xcf\x80\n"
-" |  label 1\xcf\x80\n"
-" label 0\xf0\x9f\x98\x82\n",
+" label 0\xf0\x9f\x98\x82"
+/* ... */ "  label 1\xcf\x80"
+/* ..*/ " label 2\xcf\x80\n",
 pp_formatted_text (dc.printer));
 }
   }


[gcc r15-534] diagnostics: add warning emoji to events with VERB_danger

2024-05-15 Thread David Malcolm via Gcc-cvs
https://gcc.gnu.org/g:0b7ebe5427a4af0956e0aed5e7432b98559ca7b5

commit r15-534-g0b7ebe5427a4af0956e0aed5e7432b98559ca7b5
Author: David Malcolm 
Date:   Wed May 15 21:22:51 2024 -0400

diagnostics: add warning emoji to events with VERB_danger

Tweak the printing of -fdiagnostics-path-format=inline-events so that
any event with diagnostic_event::VERB_danger gains a warning emoji,
provided that the text art theme enables emoji support.

VERB_danger is set by the analyzer on the last event in a path, and so
this emoji appears at the end of all analyzer execution paths
highlighting the location of the problem.
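
As a rough illustration of the resulting label text, the sketch below builds "(N) text" and inserts the emoji only for a danger event when emojis are enabled. format_event_label and its parameters are hypothetical. The two spaces after the emoji follow the expected output in the new test, and the U+FE0F variation selector after U+26A0 is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* U+26A0 WARNING SIGN followed by U+FE0F (assumed), encoded as UTF-8.  */
#define WARNING_EMOJI "\xE2\x9A\xA0\xEF\xB8\x8F"

/* Write "(ID) TEXT" into BUF, with the emoji between the event number
   and the text when the event is a danger event and emojis are on.  */
int
format_event_label (char *buf, size_t size, int event_id, bool is_danger,
                    bool emojis_enabled, const char *text)
{
  if (is_danger && emojis_enabled)
    return snprintf (buf, size, "(%d) %s  %s", event_id, WARNING_EMOJI, text);
  return snprintf (buf, size, "(%d) %s", event_id, text);
}

/* Usage: only the final, dangerous event gains the emoji.  */
void
check_event_labels (void)
{
  char buf[128];
  format_event_label (buf, sizeof buf, 1, false, true, "first 'free' here");
  assert (strcmp (buf, "(1) first 'free' here") == 0);
  format_event_label (buf, sizeof buf, 2, true, true, "second 'free' here");
  assert (strcmp (buf, "(2) " WARNING_EMOJI "  second 'free' here") == 0);
  format_event_label (buf, sizeof buf, 2, true, false, "second 'free' here");
  assert (strcmp (buf, "(2) second 'free' here") == 0);
}
```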

gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/out-of-bounds-diagram-1-emoji.c: Update expected
output to include warning emoji.
* gcc.dg/analyzer/warning-emoji.c: New test.

gcc/ChangeLog:
* tree-diagnostic-path.cc: Include "text-art/theme.h".
(path_label::get_text): If the event has
diagnostic_event::VERB_danger, and the theme enables emojis, then
add a warning emoji between the event number and the event text.

Signed-off-by: David Malcolm 

Diff:
---
 .../analyzer/out-of-bounds-diagram-1-emoji.c   |  2 +-
 gcc/testsuite/gcc.dg/analyzer/warning-emoji.c  | 29 +
 gcc/tree-diagnostic-path.cc| 30 +++---
 3 files changed, 57 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-1-emoji.c b/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-1-emoji.c
index 1c6125225ff2..7b4ecf0d6b0c 100644
--- a/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-1-emoji.c
+++ b/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-1-emoji.c
@@ -29,7 +29,7 @@ void int_arr_write_element_after_end_off_by_one(int32_t x)
|   arr[10] = x;
|   ^~~
|   |
-   |   (2) out-of-bounds write from byte 40 till byte 43 but 'arr' ends at byte 40
+   |   (2) ⚠️  out-of-bounds write from byte 40 till byte 43 but 'arr' ends at byte 40
|
{ dg-end-multiline-output "" } */
 
diff --git a/gcc/testsuite/gcc.dg/analyzer/warning-emoji.c b/gcc/testsuite/gcc.dg/analyzer/warning-emoji.c
new file mode 100644
index ..47e5fb0acf90
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/warning-emoji.c
@@ -0,0 +1,29 @@
+/* Verify that the final event in an analyzer path gets a "warning" emoji 
+   when -fdiagnostics-text-art-charset=emoji (and
+   -fdiagnostics-path-format=inline-events).  */
+
+/* { dg-additional-options "-fdiagnostics-show-line-numbers" } */
+/* { dg-additional-options "-fdiagnostics-show-caret" } */
+/* { dg-additional-options "-fdiagnostics-path-format=inline-events" } */
+/* { dg-additional-options "-fdiagnostics-text-art-charset=emoji" } */
+/* { dg-enable-nn-line-numbers "" } */
+
+void test (void *p)
+{
+  __builtin_free (p);
+  __builtin_free (p); /* { dg-warning "double-'free'" } */
+}
+
+/* { dg-begin-multiline-output "" }
+   NN |   __builtin_free (p);
+  |   ^~
+  'test': events 1-2
+   NN |   __builtin_free (p);
+  |   ^~
+  |   |
+  |   (1) first 'free' here
+   NN |   __builtin_free (p);
+  |   ~~
+  |   |
+  |   (2) ⚠️  second 'free' here; first 'free' was at (1)
+   { dg-end-multiline-output "" } */
diff --git a/gcc/tree-diagnostic-path.cc b/gcc/tree-diagnostic-path.cc
index 33389ef5d33e..bc90aaf321cc 100644
--- a/gcc/tree-diagnostic-path.cc
+++ b/gcc/tree-diagnostic-path.cc
@@ -36,6 +36,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "diagnostic-event-id.h"
 #include "selftest.h"
 #include "selftest-diagnostic.h"
+#include "text-art/theme.h"
 
 /* Anonymous namespace for path-printing code.  */
 
@@ -60,13 +61,36 @@ class path_label : public range_label
 /* Get the description of the event, perhaps with colorization:
normally, we don't colorize within a range_label, but this
is special-cased for diagnostic paths.  */
-bool colorize = pp_show_color (global_dc->printer);
+const bool colorize = pp_show_color (global_dc->printer);
 label_text event_text (event.get_desc (colorize));
 gcc_assert (event_text.get ());
+
+const diagnostic_event::meaning meaning (event.get_meaning ());
+
 pretty_printer pp;
-pp_show_color (&pp) = pp_show_color (global_dc->printer);
+pp_show_color (&pp) = colorize;
 diagnostic_event_id_t event_id (event_idx);
-pp_printf (&pp, "%@ %s", &event_id, event_text.get ());
+
+pp_printf (&pp, "%@", &event_id);
+pp_space (&pp);
+
+if (meaning.m_verb == diagnostic_event::VERB_danger)
+  if (text_art::theme *theme = global_dc->get_diagram_theme ())
+   if (theme->emojis_p ())
+ {
+   pp_unicode_character (&pp, 0x26A0); /* U+26A0 WARNING SIGN.  */
+   /* Append U+F

[gcc r15-533] diagnostics: simplify output of purely intraprocedural execution paths

2024-05-15 Thread David Malcolm via Gcc-cvs
https://gcc.gnu.org/g:3cd267446755ab6b2c59936a718d34c8bc474ca5

commit r15-533-g3cd267446755ab6b2c59936a718d34c8bc474ca5
Author: David Malcolm 
Date:   Wed May 15 21:22:51 2024 -0400

diagnostics: simplify output of purely intraprocedural execution paths

Diagnostic path printing was added in r10-5901-g4bc1899b2e883f.  As of
that commit, with -fdiagnostics-path-format=inline-events (the default),
we print a vertical line to the left of the source line numbering,
visualizing the stack depth and interprocedural calls and returns as
indentation changes.

For cases where the events on a thread are purely intraprocedural, this
line does nothing except take up space and complicate the output.

This patch adds logic to omit it for such cases, simplifying the output,
and, I believe, improving readability.
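
One way to picture the check is a predicate over the events of a thread. The body of per_thread_summary::interprocedural_p is not shown in this excerpt, so the following is only a guess at the idea, with hypothetical types and names: a path counts as interprocedural as soon as any event differs from the first in function or stack depth.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of one event on a path.  */
struct path_event
{
  const char *function;
  int stack_depth;
};

/* Return true if any event leaves the function or stack depth of the
   first event; false means the path is purely intraprocedural.  */
bool
path_is_interprocedural (const struct path_event *events, size_t count)
{
  for (size_t i = 1; i < count; i++)
    if (events[i].stack_depth != events[0].stack_depth
        || strcmp (events[i].function, events[0].function) != 0)
      return true;
  return false;
}

/* Usage: two events in 'test' stay intraprocedural; a call does not.  */
void
check_interprocedural (void)
{
  const struct path_event same[] = { { "test", 0 }, { "test", 0 } };
  const struct path_event call[] = { { "caller", 0 }, { "callee", 1 } };
  assert (!path_is_interprocedural (same, 2));
  assert (path_is_interprocedural (call, 2));
}
```

When the predicate is false, a printer could skip the indentation and the depth line entirely and emit only the source lines and labels.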

gcc/ChangeLog:
* diagnostic-path.h: Update leading comment to reflect
intraprocedural cases.  Fix typo in comment.
* doc/invoke.texi: Update intraprocedural example.

gcc/testsuite/ChangeLog:
* c-c++-common/analyzer/allocation-size-multiline-1.c: Update
expected results for purely intraprocedural path.
* c-c++-common/analyzer/allocation-size-multiline-2.c: Likewise.
* c-c++-common/analyzer/allocation-size-multiline-3.c: Likewise.
* c-c++-common/analyzer/analyzer-verbosity-0.c: Likewise.
* c-c++-common/analyzer/analyzer-verbosity-1.c: Likewise.
* c-c++-common/analyzer/analyzer-verbosity-2.c: Likewise.
* c-c++-common/analyzer/analyzer-verbosity-3.c: Likewise.
* c-c++-common/analyzer/malloc-macro-inline-events.c: Likewise.
Doing so for this file requires a rewrite since the paths
prefixing the "in expansion of macro" lines become the only thing
on their line and so are no longer pruned by multiline.exp logic
for pruning extra content on non-blank lines.
* c-c++-common/analyzer/malloc-paths-9-noexcept.c: Likewise.
* c-c++-common/analyzer/setjmp-2.c: Likewise.
* gcc.dg/analyzer/malloc-paths-9.c: Likewise.
* gcc.dg/analyzer/out-of-bounds-multiline-2.c: Likewise.
* gcc.dg/plugin/diagnostic-test-paths-2.c: Likewise.

gcc/ChangeLog:
* tree-diagnostic-path.cc (per_thread_summary::interprocedural_p):
New.
(thread_event_printer::print_swimlane_for_event_range): Don't
indent and print the stack depth line if this thread's events are
purely intraprocedural.
(selftest::test_intraprocedural_path): Update expected output.

Signed-off-by: David Malcolm 

Diff:
---
 gcc/diagnostic-path.h  |  32 +-
 gcc/doc/invoke.texi|  30 +-
 .../analyzer/allocation-size-multiline-1.c |  68 +--
 .../analyzer/allocation-size-multiline-2.c |  72 +--
 .../analyzer/allocation-size-multiline-3.c |  48 +-
 .../c-c++-common/analyzer/analyzer-verbosity-0.c   |  40 +-
 .../c-c++-common/analyzer/analyzer-verbosity-1.c   |  40 +-
 .../c-c++-common/analyzer/analyzer-verbosity-2.c   |  40 +-
 .../c-c++-common/analyzer/analyzer-verbosity-3.c   |  40 +-
 .../analyzer/malloc-macro-inline-events.c  |  83 +--
 .../analyzer/malloc-paths-9-noexcept.c | 604 ++---
 gcc/testsuite/c-c++-common/analyzer/setjmp-2.c | 140 +++--
 gcc/testsuite/gcc.dg/analyzer/malloc-paths-9.c | 302 +--
 .../gcc.dg/analyzer/out-of-bounds-multiline-2.c|  21 +-
 .../gcc.dg/plugin/diagnostic-test-paths-2.c|  30 +-
 gcc/tree-diagnostic-path.cc|  86 ++-
 16 files changed, 799 insertions(+), 877 deletions(-)

diff --git a/gcc/diagnostic-path.h b/gcc/diagnostic-path.h
index fb7abe88ed32..696991c6d736 100644
--- a/gcc/diagnostic-path.h
+++ b/gcc/diagnostic-path.h
@@ -41,22 +41,20 @@ class sarif_object;
 29 | PyList_Append(list, item);
| ^
'demo': events 1-3
-  |
-  |   25 |   list = PyList_New(0);
-  |  |  ^
-  |  |  |
-  |  |  (1) when 'PyList_New' fails, returning NULL
-  |   26 |
-  |   27 |   for (i = 0; i < count; i++) {
-  |  |   ~~~
-  |  |   |
-  |  |   (2) when 'i < count'
-  |   28 | item = PyLong_FromLong(random());
-  |   29 | PyList_Append(list, item);
-  |  | ~
-  |  | |
-  |  | (3) when calling 'PyList_Append', passing NULL from (1) as argument 1
-  |
+25 |   list = PyList_New(0);
+   |  ^
+   |  |
+   |  (1) when 'PyList_New' fails, returning NULL
+ 

[gcc r15-535] diagnostics: use unicode art for interprocedural depth

2024-05-15 Thread David Malcolm via Gcc-cvs
https://gcc.gnu.org/g:e656656e711949ef42a7e284f7cf81ca56f37374

commit r15-535-ge656656e711949ef42a7e284f7cf81ca56f37374
Author: David Malcolm 
Date:   Wed May 15 21:22:52 2024 -0400

diagnostics: use unicode art for interprocedural depth

gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/out-of-bounds-diagram-1-emoji.c: Update expected
output to use unicode for depth indication.
* gcc.dg/analyzer/out-of-bounds-diagram-1-unicode.c: Likewise.

gcc/ChangeLog:
* text-art/theme.cc (ascii_theme::get_cppchar): Add
cell_kind::INTERPROCEDURAL_*.
(unicode_theme::get_cppchar): Likewise.
* text-art/theme.h (theme::cell_kind): Likewise.
* tree-diagnostic-path.cc:
(thread_event_printer::print_swimlane_for_event_range): Use the
above to get characters for indicating interprocedural stack
depth activity, falling back to ascii.
(selftest::test_interprocedural_path_1): Test with both ascii
and unicode themes.
(selftest::test_interprocedural_path_2): Likewise.
(selftest::test_recursion): Likewise.
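
The theme lookup can be pictured as a function from a cell kind to a code point. The sketch below is not the text_art::theme API: the enum and function are hypothetical and cover only four kinds. The ASCII characters are the ones returned in the theme.cc hunk, while the Unicode code points (U+2502, U+2514, U+2500) are inferred from the expected test output and should be read as assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the cell kinds added by the patch.  */
enum depth_cell_kind
{
  PUSH_FRAME_LEFT,
  PUSH_FRAME_MIDDLE,
  PUSH_FRAME_RIGHT,
  DEPTH_MARKER
};

/* Return the code point used to draw KIND in the chosen theme.  */
unsigned long
get_depth_char (enum depth_cell_kind kind, bool unicode)
{
  switch (kind)
    {
    case PUSH_FRAME_LEFT:
      return unicode ? 0x2514 : '+'; /* BOX DRAWINGS LIGHT UP AND RIGHT */
    case PUSH_FRAME_MIDDLE:
      return unicode ? 0x2500 : '-'; /* BOX DRAWINGS LIGHT HORIZONTAL */
    case PUSH_FRAME_RIGHT:
      return '>';
    case DEPTH_MARKER:
      return unicode ? 0x2502 : '|'; /* BOX DRAWINGS LIGHT VERTICAL */
    }
  return '?';
}

/* Usage: the ASCII theme reproduces the old "+-->" and "|" drawing.  */
void
check_depth_chars (void)
{
  assert (get_depth_char (PUSH_FRAME_LEFT, false) == '+');
  assert (get_depth_char (PUSH_FRAME_MIDDLE, false) == '-');
  assert (get_depth_char (PUSH_FRAME_RIGHT, false) == '>');
  assert (get_depth_char (DEPTH_MARKER, false) == '|');
  assert (get_depth_char (DEPTH_MARKER, true) == 0x2502);
}
```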

Signed-off-by: David Malcolm 

Diff:
---
 .../analyzer/out-of-bounds-diagram-1-emoji.c   |  26 +-
 .../analyzer/out-of-bounds-diagram-1-unicode.c |  26 +-
 gcc/text-art/theme.cc  |  30 ++
 gcc/text-art/theme.h   |  10 +
 gcc/tree-diagnostic-path.cc| 381 ++---
 5 files changed, 331 insertions(+), 142 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-1-emoji.c b/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-1-emoji.c
index 7b4ecf0d6b0c..8d22e4109628 100644
--- a/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-1-emoji.c
+++ b/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-1-emoji.c
@@ -18,19 +18,19 @@ void int_arr_write_element_after_end_off_by_one(int32_t x)
arr[10] = x;
^~~
   event 1 (depth 0)
-|
-| int32_t arr[10];
-| ^~~
-| |
-| (1) capacity: 40 bytes
-|
-+--> 'int_arr_write_element_after_end_off_by_one': event 2 (depth 1)
-   |
-   |   arr[10] = x;
-   |   ^~~
-   |   |
-   |   (2) ⚠️  out-of-bounds write from byte 40 till byte 43 but 'arr' ends at byte 40
-   |
+│
+│ int32_t arr[10];
+│ ^~~
+│ |
+│ (1) capacity: 40 bytes
+│
+└──> 'int_arr_write_element_after_end_off_by_one': event 2 (depth 1)
+   │
+   │   arr[10] = x;
+   │   ^~~
+   │   |
+   │   (2) ⚠️  out-of-bounds write from byte 40 till byte 43 but 'arr' ends at byte 40
+   │
{ dg-end-multiline-output "" } */
 
 /* { dg-begin-multiline-output "" }
diff --git a/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-1-unicode.c b/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-1-unicode.c
index 71f66ff87c9e..58c4a7bedf34 100644
--- a/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-1-unicode.c
+++ b/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-1-unicode.c
@@ -18,19 +18,19 @@ void int_arr_write_element_after_end_off_by_one(int32_t x)
arr[10] = x;
^~~
   event 1 (depth 0)
-|
-| int32_t arr[10];
-| ^~~
-| |
-| (1) capacity: 40 bytes
-|
-+--> 'int_arr_write_element_after_end_off_by_one': event 2 (depth 1)
-   |
-   |   arr[10] = x;
-   |   ^~~
-   |   |
-   |   (2) out-of-bounds write from byte 40 till byte 43 but 'arr' ends at byte 40
-   |
+│
+│ int32_t arr[10];
+│ ^~~
+│ |
+│ (1) capacity: 40 bytes
+│
+└──> 'int_arr_write_element_after_end_off_by_one': event 2 (depth 1)
+   │
+   │   arr[10] = x;
+   │   ^~~
+   │   |
+   │   (2) out-of-bounds write from byte 40 till byte 43 but 'arr' ends at byte 40
+   │
{ dg-end-multiline-output "" } */
 
 /* { dg-begin-multiline-output "" }
diff --git a/gcc/text-art/theme.cc b/gcc/text-art/theme.cc
index 4ac0cae92e26..cba4c585c469 100644
--- a/gcc/text-art/theme.cc
+++ b/gcc/text-art/theme.cc
@@ -125,6 +125,21 @@ ascii_theme::get_cppchar (enum cell_kind kind) const
 case cell_kind::Y_ARROW_UP_TAIL:
 case cell_kind::Y_ARROW_DOWN_TAIL:
   return '|';
+
+case cell_kind::INTERPROCEDURAL_PUSH_FRAME_LEFT:
+  return '+';
+case cell_kind::INTERPROCEDURAL_PUSH_FRAME_MIDDLE:
+  return '-';
+case cell_kind::INTERPROCEDURAL_PUSH_FRAME_RIGHT:
+  return '>';
+case cell_kind::INTERPROCEDURAL_DEPTH_MARKER:
+  return '|';
+case cell_kind::INTERPROCEDURAL_POP_FRAMES_LEFT:
+  return '<';
+case cell_kind::INTERPROC