Re: [PATCH] LoongArch: Allow using --with-arch=native if host CPU is LoongArch

2023-07-22 Thread chenglulu



On 2023/7/20 at 9:28 PM, Xi Ruoyao wrote:

If the host triple and the target triple are different but the host is
LoongArch, in some cases --with-arch=native can be useful.  For example,
if we are bootstrapping a loongarch64-linux-musl toolchain on a
Glibc-based system and we don't intend to use the toolchain on other
machines, we can use

 ../gcc/configure --{build,host}=loongarch64-linux-gnu \
  --target=loongarch64-linux-musl --with-arch=native

Relax the check in config.gcc to allow such configurations.

gcc/ChangeLog:

* config.gcc [target=loongarch*-*-*, with_arch=native]: Allow
building cross compiler if the host CPU is LoongArch.
---

Tested on x86_64-linux-gnu (building a cross compiler targeting
LoongArch --with-arch=native still rejected) and loongarch64-linux-gnu
(building a cross compiler targeting loongarch64-linux-musl allowed).
Ok for trunk?



Hi, Ruoyao:

 Sorry for the late reply; I had some things to deal with at home. Could
you describe this usage scenario in more detail? I didn't really understand it.




  gcc/config.gcc | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 1446eb2b3ca..146bca22a38 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4939,10 +4939,13 @@ case "${target}" in
case ${with_arch} in
"" | loongarch64 | la464) ;; # OK, append here.
native)
-   if test x${host} != x${target}; then
+   case ${host} in
+   loongarch*) ;; # OK
+   *)
echo "--with-arch=native is illegal for cross-compiler." 1>&2
exit 1
-   fi
+   ;;
+   esac
;;
"")
echo "Please set a default value for \${with_arch}" \




Re: [PATCH v2 2/3] libstdc++: Optimize is_arithmetic performance by __is_arithmetic built-in

2023-07-22 Thread François Dumont via Gcc-patches



On 17/07/2023 06:48, Ken Matsui wrote:

On Sun, Jul 16, 2023 at 5:32 AM François Dumont  wrote:


On 15/07/2023 06:55, Ken Matsui via Libstdc++ wrote:

This patch optimizes the performance of the is_arithmetic trait by
dispatching to the new __is_arithmetic built-in trait.

libstdc++-v3/ChangeLog:

   * include/std/type_traits (is_arithmetic): Use __is_arithmetic
   built-in trait.
   (is_arithmetic_v): Likewise.

Signed-off-by: Ken Matsui 
---
   libstdc++-v3/include/std/type_traits | 14 ++
   1 file changed, 14 insertions(+)

diff --git a/libstdc++-v3/include/std/type_traits b/libstdc++-v3/include/std/type_traits
index 0e7a9c9c7f3..7ebbe04c77b 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -655,10 +655,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   { };

 /// is_arithmetic
+#if __has_builtin(__is_arithmetic)
+  template<typename _Tp>
+struct is_arithmetic
+: public __bool_constant<__is_arithmetic(_Tp)>
+{ };
+#else
 template<typename _Tp>
   struct is_arithmetic
   : public __or_<is_integral<_Tp>, is_floating_point<_Tp>>::type
   { };
+#endif

 /// is_fundamental
 template<typename _Tp>
@@ -3198,8 +3205,15 @@ template <typename _Tp>
 inline constexpr bool is_reference_v<_Tp&> = true;
   template <typename _Tp>
 inline constexpr bool is_reference_v<_Tp&&> = true;
+
+#if __has_builtin(__is_arithmetic)
+template <typename _Tp>
+  inline constexpr bool is_arithmetic_v = __is_arithmetic(_Tp);
+#else
   template <typename _Tp>
 inline constexpr bool is_arithmetic_v = is_arithmetic<_Tp>::value;
+#endif
+
   template <typename _Tp>
 inline constexpr bool is_fundamental_v = is_fundamental<_Tp>::value;
   template <typename _Tp>

Same remark as the one I made for __is_pointer in cpp_type_traits.h. You
could implement it as:

template<typename _Tp>
  struct __is_arithmetic_t
  : public __truth_type<__is_arithmetic(_Tp)>
  { };

François


Thank you for your review! This is from the type_traits header, so the
name should be as-is.


What I meant is that the current libstdc++ implementation of
std::__is_arithmetic in cpp_type_traits.h should also use the
__is_arithmetic builtin that you are introducing. That is, replace
this:


  template<typename _Tp>
    struct __is_arithmetic
    : public __traitor<__is_integer<_Tp>, __is_floating<_Tp> >
    { };

with:

#if __has_builtin(__is_arithmetic)

  template<typename _Tp>
    struct __is_arithmetic_t
    : public __truth_type<__is_arithmetic(_Tp)>
    { };

#else

  template<typename _Tp>
    struct __is_arithmetic_t
    : public __traitor<__is_integer<_Tp>, __is_floating<_Tp> >
    { };

#endif

This assumes you rename '__is_arithmetic' to '__is_arithmetic_t' in
libstdc++; just adapt it to whatever name you eventually adopt.
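
The dispatch-with-fallback pattern discussed above can be sketched outside
libstdc++ as follows. This is only an illustration under stated assumptions:
the namespace `sketch` and the name `is_arith_t` are hypothetical, and the
fallback uses the public std::is_integral / std::is_floating_point traits
rather than the internal __is_integer / __is_floating helpers.

```cpp
#include <type_traits>

// Older preprocessors may not provide __has_builtin; treat that as "absent".
#ifndef __has_builtin
# define __has_builtin(x) 0
#endif

namespace sketch  // hypothetical namespace, not part of libstdc++
{
#if __has_builtin(__is_arithmetic)
  // The compiler answers directly; no helper templates are instantiated.
  template<typename T>
    struct is_arith_t
    : std::integral_constant<bool, __is_arithmetic(T)>
    { };
#else
  // Fallback: combine the two standard category traits.
  template<typename T>
    struct is_arith_t
    : std::integral_constant<bool, std::is_integral<T>::value
                                   || std::is_floating_point<T>::value>
    { };
#endif
}

// Both branches are expected to agree with std::is_arithmetic.
static_assert(sketch::is_arith_t<int>::value, "int is arithmetic");
static_assert(sketch::is_arith_t<const double>::value, "cv-qualifiers are ignored");
static_assert(!sketch::is_arith_t<void>::value, "void is not arithmetic");
```

Whichever branch is selected, user code sees the same interface, which is
why the choice of internal name matters only inside the library.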





Re: [PATCH v2 3/3] libstdc++: Optimize is_fundamental performance by __is_arithmetic built-in

2023-07-22 Thread François Dumont via Gcc-patches
That seems rather logical, because std::disjunction is supposed to avoid
instantiations, but in the case of:


std::disjunction<std::is_void<_Tp>, std::is_null_pointer<_Tp>>

you avoid the std::is_null_pointer instantiation only for the 'void' type,
and at the price of instantiating std::disjunction. So that is 2
instantiations at best and 3 most of the time, which is clearly useless here.


On 18/07/2023 08:24, Ken Matsui wrote:

Hi,

I ran a benchmark for this.

https://github.com/ken-matsui/gcc-benches/blob/main/is_fundamental-disjunction.md#mon-jul-17-105937-pm-pdt-2023

template<typename _Tp>
struct is_fundamental
: public std::bool_constant<__is_arithmetic(_Tp)
 || std::is_void<_Tp>::value
 || std::is_null_pointer<_Tp>::value>
{ };

is faster than:

template<typename _Tp>
struct is_fundamental
: public std::bool_constant<__is_arithmetic(_Tp)
 || std::disjunction<std::is_void<_Tp>,
 std::is_null_pointer<_Tp>
 >::value>
{ };

Time: -32.2871%
Peak Memory: -18.5071%
Total Memory: -20.1991%

Sincerely,
Ken Matsui
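
As a rough sketch of the two formulations being compared, the following
checks only that they compute the same value; it does not measure
instantiation counts or compile time. The names `arith_v`,
`is_fundamental_or`, `is_fundamental_disj` and `agree_v` are hypothetical,
and std::is_arithmetic stands in for the __is_arithmetic builtin so that the
code compiles without the patch. C++17 is assumed.

```cpp
#include <cstddef>
#include <type_traits>

namespace sketch  // hypothetical namespace for illustration only
{
  // Stand-in for the __is_arithmetic(T) builtin.
  template<typename T>
    constexpr bool arith_v = std::is_arithmetic<T>::value;

  // Plain logical-or chain: every operand trait is named, so each one
  // is instantiated, but no extra helper template is needed.
  template<typename T>
    struct is_fundamental_or
    : std::bool_constant<arith_v<T>
                         || std::is_void<T>::value
                         || std::is_null_pointer<T>::value>
    { };

  // std::disjunction form: may skip is_null_pointer for void, at the
  // cost of instantiating std::disjunction itself.
  template<typename T>
    struct is_fundamental_disj
    : std::bool_constant<arith_v<T>
                         || std::disjunction<std::is_void<T>,
                                             std::is_null_pointer<T>>::value>
    { };

  // True when both sketches agree with the standard trait.
  template<typename T>
    constexpr bool agree_v
      = is_fundamental_or<T>::value == is_fundamental_disj<T>::value
        && is_fundamental_or<T>::value == std::is_fundamental<T>::value;
}

static_assert(sketch::agree_v<int>, "arithmetic type");
static_assert(sketch::agree_v<void>, "void");
static_assert(sketch::agree_v<std::nullptr_t>, "null pointer type");
static_assert(sketch::agree_v<int*>, "non-fundamental type");
```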

On Sun, Jul 16, 2023 at 9:49 PM Ken Matsui  wrote:

On Sun, Jul 16, 2023 at 5:41 AM François Dumont  wrote:


On 15/07/2023 06:55, Ken Matsui via Libstdc++ wrote:

This patch optimizes the performance of the is_fundamental trait by
dispatching to the new __is_arithmetic built-in trait.

libstdc++-v3/ChangeLog:

   * include/std/type_traits (is_fundamental_v): Use __is_arithmetic
   built-in trait.
   (is_fundamental): Likewise. Optimize the original implementation.

Signed-off-by: Ken Matsui 
---
   libstdc++-v3/include/std/type_traits | 21 +
   1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/include/std/type_traits b/libstdc++-v3/include/std/type_traits
index 7ebbe04c77b..cf24de2fcac 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -668,11 +668,21 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   #endif

 /// is_fundamental
+#if __has_builtin(__is_arithmetic)
+  template<typename _Tp>
+struct is_fundamental
+: public __bool_constant<__is_arithmetic(_Tp)
+ || is_void<_Tp>::value
+ || is_null_pointer<_Tp>::value>
+{ };

What about doing this?

template<typename _Tp>
  struct is_fundamental
  : public __bool_constant<__is_arithmetic(_Tp)
   || __or_<is_void<_Tp>,
   is_null_pointer<_Tp>>::value>
  { };

Based on your benchmarks, the __is_arithmetic builtin seems to be much better
than std::is_arithmetic. But __or_ could still avoid the instantiation of
is_null_pointer.


Let me run a benchmark for this later.


Re: [PATCH v3 1/3] c++, libstdc++: Implement __is_arithmetic built-in trait

2023-07-22 Thread François Dumont via Gcc-patches



On 18/07/2023 08:27, Ken Matsui via Libstdc++ wrote:

This patch implements a built-in trait for std::is_arithmetic.

gcc/cp/ChangeLog:

* cp-trait.def: Define __is_arithmetic.
* constraint.cc (diagnose_trait_expr): Handle CPTK_IS_ARITHMETIC.
* semantics.cc (trait_expr_value): Likewise.
(finish_trait_expr): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of __is_arithmetic.
* g++.dg/ext/is_arithmetic.C: New test.
* g++.dg/tm/pr46567.C (__is_arithmetic): Rename to ...
(__is_arith): ... this.
* g++.dg/torture/pr57107.C: Likewise.

libstdc++-v3/ChangeLog:

* include/bits/cpp_type_traits.h (__is_arithmetic): Rename to ...
(__is_arith): ... this.
* include/c_global/cmath: Use __is_arith instead.
* include/c_std/cmath: Likewise.
* include/tr1/cmath: Likewise.

Signed-off-by: Ken Matsui 
---
  gcc/cp/constraint.cc|  3 ++
  gcc/cp/cp-trait.def |  1 +
  gcc/cp/semantics.cc |  4 ++
  gcc/testsuite/g++.dg/ext/has-builtin-1.C|  3 ++
  gcc/testsuite/g++.dg/ext/is_arithmetic.C| 33 ++
  gcc/testsuite/g++.dg/tm/pr46567.C   |  6 +--
  gcc/testsuite/g++.dg/torture/pr57107.C  |  4 +-
  libstdc++-v3/include/bits/cpp_type_traits.h |  4 +-
  libstdc++-v3/include/c_global/cmath | 48 ++---
  libstdc++-v3/include/c_std/cmath| 24 +--
  libstdc++-v3/include/tr1/cmath  | 24 +--
  11 files changed, 99 insertions(+), 55 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/ext/is_arithmetic.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 8cf0f2d0974..bd517d08843 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -3754,6 +3754,9 @@ diagnose_trait_expr (tree expr, tree args)
  case CPTK_IS_AGGREGATE:
inform (loc, "  %qT is not an aggregate", t1);
break;
+case CPTK_IS_ARITHMETIC:
+  inform (loc, "  %qT is not an arithmetic type", t1);
+  break;
  case CPTK_IS_TRIVIALLY_COPYABLE:
inform (loc, "  %qT is not trivially copyable", t1);
break;
diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
index 8b7fece0cc8..a95aeeaf778 100644
--- a/gcc/cp/cp-trait.def
+++ b/gcc/cp/cp-trait.def
@@ -82,6 +82,7 @@ DEFTRAIT_EXPR (IS_TRIVIALLY_ASSIGNABLE, "__is_trivially_assignable", 2)
  DEFTRAIT_EXPR (IS_TRIVIALLY_CONSTRUCTIBLE, "__is_trivially_constructible", -1)
  DEFTRAIT_EXPR (IS_TRIVIALLY_COPYABLE, "__is_trivially_copyable", 1)
  DEFTRAIT_EXPR (IS_UNION, "__is_union", 1)
+DEFTRAIT_EXPR (IS_ARITHMETIC, "__is_arithmetic", 1)
  DEFTRAIT_EXPR (REF_CONSTRUCTS_FROM_TEMPORARY, "__reference_constructs_from_temporary", 2)
  DEFTRAIT_EXPR (REF_CONVERTS_FROM_TEMPORARY, "__reference_converts_from_temporary", 2)
  /* FIXME Added space to avoid direct usage in GCC 13.  */
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 8fb47fd179e..4531f047d73 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -12118,6 +12118,9 @@ trait_expr_value (cp_trait_kind kind, tree type1, tree type2)
  case CPTK_IS_UNION:
return type_code1 == UNION_TYPE;
  
+case CPTK_IS_ARITHMETIC:

+  return ARITHMETIC_TYPE_P (type1);
+
  case CPTK_IS_ASSIGNABLE:
return is_xible (MODIFY_EXPR, type1, type2);
  
@@ -12296,6 +12299,7 @@ finish_trait_expr (location_t loc, cp_trait_kind kind, tree type1, tree type2)

  case CPTK_IS_ENUM:
  case CPTK_IS_UNION:
  case CPTK_IS_SAME:
+case CPTK_IS_ARITHMETIC:
break;
  
  case CPTK_IS_LAYOUT_COMPATIBLE:

diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
index f343e153e56..3d63b0101d1 100644
--- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
+++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
@@ -146,3 +146,6 @@
  #if !__has_builtin (__remove_cvref)
  # error "__has_builtin (__remove_cvref) failed"
  #endif
+#if !__has_builtin (__is_arithmetic)
+# error "__has_builtin (__is_arithmetic) failed"
+#endif
diff --git a/gcc/testsuite/g++.dg/ext/is_arithmetic.C b/gcc/testsuite/g++.dg/ext/is_arithmetic.C
new file mode 100644
index 000..fd35831f646
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/is_arithmetic.C
@@ -0,0 +1,33 @@
+// { dg-do compile { target c++11 } }
+
+#include 
+
+using namespace __gnu_test;
+
+#define SA(X) static_assert((X),#X)
+#define SA_TEST_CATEGORY(TRAIT, TYPE, EXPECT)  \
+  SA(TRAIT(TYPE) == EXPECT);   \
+  SA(TRAIT(const TYPE) == EXPECT); \
+  SA(TRAIT(volatile TYPE) == EXPECT);  \
+  SA(TRAIT(const volatile TYPE) == EXPECT)
+
+SA_TEST_CATEGORY(__is_arithmetic, void, false);
+
+SA_TEST_CATEGORY(__is_arithmetic, char, true);
+SA_TEST_CATEGORY(__is_arithmetic, signed char, true);
+SA_TEST_CATEGORY(__is_arithmetic, unsigned char, true);
+SA_TEST_CATEGORY
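
The test macro quoted above checks a trait against all four cv-qualified
variants of a type. A minimal sketch of how it expands, assuming a
hypothetical stand-in macro `STD_IS_ARITHMETIC` built on std::is_arithmetic
so that it compiles without the new builtin (the struct `S` is likewise
only for illustration):

```cpp
#include <type_traits>

#define SA(X) static_assert((X), #X)
#define SA_TEST_CATEGORY(TRAIT, TYPE, EXPECT) \
  SA(TRAIT(TYPE) == EXPECT);                  \
  SA(TRAIT(const TYPE) == EXPECT);            \
  SA(TRAIT(volatile TYPE) == EXPECT);         \
  SA(TRAIT(const volatile TYPE) == EXPECT)

// Hypothetical stand-in for the __is_arithmetic builtin: a function-like
// macro, so that TRAIT(TYPE) in SA_TEST_CATEGORY expands as intended.
#define STD_IS_ARITHMETIC(T) std::is_arithmetic<T>::value

struct S { };  // an arbitrary class type, expected to be non-arithmetic

// Each line expands to four static_asserts, one per cv-qualification.
SA_TEST_CATEGORY(STD_IS_ARITHMETIC, void, false);
SA_TEST_CATEGORY(STD_IS_ARITHMETIC, char, true);
SA_TEST_CATEGORY(STD_IS_ARITHMETIC, unsigned char, true);
SA_TEST_CATEGORY(STD_IS_ARITHMETIC, double, true);
SA_TEST_CATEGORY(STD_IS_ARITHMETIC, S, false);
```

Because the macro appends the qualifier textually, it is best used with
simple type names; a pointer type such as `int*` would yield `const int*`
rather than a const-qualified pointer.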

[x86 PATCH] Don't use insvti_{high, low}part with -O0 (for compile-time).

2023-07-22 Thread Roger Sayle

This patch attempts to help with PR rtl-optimization/110587, a regression
of -O0 compile time for the pathological pr28071.c.  My recent patch helps
a bit, but hasn't returned -O0 compile-time to where it was before my
ix86_expand_move changes.  The obvious solution/workaround is to guard
these new TImode parameter passing optimizations with "&& optimize", so
they don't trigger when compiling with -O0.  The very minor complication
is that "&& optimize" alone leads to the regression of pr110533.c, where
our improved TImode parameter passing fixes a wrong-code issue with naked
functions, importantly, when compiling with -O0.  This should explain
the one-line fix below: "&& (optimize || ix86_function_naked (current_function_decl))".

I've an additional fix/tweak or two for this compile-time issue, but
this change eliminates the part of the regression that I've caused.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?

2023-07-22  Roger Sayle  

gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_expand_move): Disable the
64-bit insertions into TImode optimizations with -O0, unless
the function has the "naked" attribute (for PR target/110533).

Cheers,
Roger
--

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 7e94447..cdef95e 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -544,6 +544,7 @@ ix86_expand_move (machine_mode mode, rtx operands[])
 
   /* Special case inserting 64-bit values into a TImode register.  */
   if (TARGET_64BIT
+  && (optimize || ix86_function_naked (current_function_decl))
   && (mode == DImode || mode == DFmode)
   && SUBREG_P (op0)
   && GET_MODE (SUBREG_REG (op0)) == TImode


Re: [WIP RFC] analyzer: Add optional trim of the analyzer diagnostics going too deep [PR110543]

2023-07-22 Thread Prathamesh Kulkarni via Gcc-patches
On Fri, 21 Jul 2023 at 21:05, Benjamin Priour via Gcc-patches
 wrote:
>
> Hi,
>
> At David's request I've attached the in-progress patch to the email below.
> I hope it makes more sense now.
>
> Best,
> Benjamin.
>
> -- Forwarded message -
> From: Benjamin Priour 
> Date: Tue, Jul 18, 2023 at 3:30 PM
> Subject: [RFC] analyzer: Add optional trim of the analyzer diagnostics
> going too deep [PR110543]
> To: , David Malcolm 
>
>
> Hi,
>
> I'd like to request comments on a patch I am writing for PR110543.
> The goal of this patch is to reduce the noise of the analyzer emitted
> diagnostics when dealing with
> system headers, or simply diagnostic paths that are too long. The new
> option only affects the display
> of the diagnostics, but doesn't hinder the actual analysis.
>
> I've defaulted the new option to "system", thus preventing the diagnostic
> paths from showing system headers.
> "never" corresponds to the pre-patch behavior, whereas you can also specify
> an unsigned value
> that prevents paths from going deeper than that many frames.
>
> fanalyzer-trim-diagnostics=
> > Common Joined RejectNegative ToLower Var(flag_analyzer_trim_diagnostics)
> > Init("system")
> > -fanalyzer-trim-diagnostics=[never|system|] Trim diagnostics
> > path that are too long before emission.
> >
>
> Does it sound reasonable and user-friendly?
>
> Regstrapping against trunk was a success, although one of the newly added
> test cases fails for c++14.
> Note that the test case below was done with "never", thus behaves exactly
> as the pre-patch analyzer
> on x86_64-linux-gnu.
>
> /* { dg-additional-options "-fdiagnostics-plain-output
> > -fdiagnostics-path-format=inline-events -fanalyzer-trim-diagnostics=never"
> > } */
> > /* { dg-skip-if "" { c++98_only }  } */
> >
> > #include <memory>
> > struct A {int x; int y;};
> >
> > int main () {
> >   std::shared_ptr<A> a;
> >   a->x = 4; /* { dg-line deref_a } */
> >   /* { dg-warning "dereference of NULL" "" { target *-*-* } deref_a } */
> >
> >   return 0;
> > }
> >
> > /* { dg-begin-multiline-output "" }
> >   'int main()': events 1-2
> > |
> > |
> > +--> 'std::__shared_ptr_access<_Tp, _Lp, , 
> > >::element_type* std::__shared_ptr_access<_Tp, _Lp, ,
> >  >::operator->() const [with _Tp = A; __gnu_cxx::_Lock_policy
> > _Lp = __gnu_cxx::_S_atomic; bool  = false; bool  =
> > false]': events 3-4
> >|
> >|
> >+--> 'std::__shared_ptr_access<_Tp, _Lp, ,
> >  >::element_type* std::__shared_ptr_access<_Tp, _Lp,
> > ,  >::_M_get() const [with _Tp = A;
> > __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic; bool  =
> > false; bool  = false]': events 5-6
> >   |
> >   |
> >   +--> 'std::__shared_ptr<_Tp, _Lp>::element_type*
> > std::__shared_ptr<_Tp, _Lp>::get() const [with _Tp = A;
> > __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic]': events 7-8
> >  |
> >  |
> >   <--+
> >   |
> > 'std::__shared_ptr_access<_Tp, _Lp, ,
> >  >::element_type* std::__shared_ptr_access<_Tp, _Lp,
> > ,  >::_M_get() const [with _Tp = A;
> > __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic; bool  =
> > false; bool  = false]': event 9
> >   |
> >   |
> ><--+
> >|
> >  'std::__shared_ptr_access<_Tp, _Lp, , 
> > >::element_type* std::__shared_ptr_access<_Tp, _Lp, ,
> >  >::operator->() const [with _Tp = A; __gnu_cxx::_Lock_policy
> > _Lp = __gnu_cxx::_S_atomic; bool  = false; bool  =
> > false]': event 10
> >|
> >|
> > <--+
> > |
> >   'int main()': events 11-12
> > |
> > |
> >{ dg-end-multiline-output "" } */
> >
>
>
> The first events "'int main()': events 1-2" differ under c++14 (which gets events 1-3).
>
> >
> > // c++14 with fully detailed output
> >   ‘int main()’: events 1-3
> > |
> > |8 | int main () {
> > |  | ^~~~
> > |  | |
> > |  | (1) entry to ‘main’
> > |9 |   std::shared_ptr a;
> > |  |  ~
> > |  |  |
> > |  |  (2)
> > ‘a.std::shared_ptr::.std::__shared_ptr > __gnu_cxx::_S_atomic>::_M_ptr’ is NULL
> > |   10 |   a->x = 4; /* { dg-line deref_a } */
> > |  |~~
> > |  ||
> > |  |(3) calling ‘std::__shared_ptr_access > __gnu_cxx::_S_atomic, false, false>::operator->’ from ‘main’
> >
>
> whereas c++17 and later give
>
> > // c++17 with fully detailed output
> >
> // ./xg++ -fanalyzer
> >  ../../gcc/gcc/testsuite/g++.dg/analyzer/fanalyzer-trim-diagnostics-never.C
> >  -B. -shared-libgcc -fanalyzer-trim-diagnostics=never -std=c++17
> >
>   ‘int main()’: events 1-2
> > |
> > |8 | int main () {
> > |  | ^~~~
> > |  | |
> > |  | (1) entry to ‘main’
> > |9 |   std::shared_ptr a;
> >   

[PATCH v5 0/3] c++: Track lifetimes in constant evaluation [PR70331, ...]

2023-07-22 Thread Nathaniel Shead via Gcc-patches
This is an update of the patch series at
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625050.html

Changes since v4:

- Reordered patches to be more independent from each other (they don't need 
  to keep updating the new tests)
- Removed workaround for better locations in cxx_eval_store_expression
- Don't bother checking lifetime for CONST_DECLs
- Rewrite patch for dangling pointers to keep the transformation to
  `return (&x, nullptr)`, but only perform it when genericising. It turns out
  that implementing this wasn't as hard as I thought it might be, at least for
  this specific case.

Thanks very much for all the reviews and comments so far!

Bootstrapped and regtested on x86_64-pc-linux-gnu.

Nathaniel Shead (3):
  c++: Improve location information in constant evaluation
  c++: Prevent dangling pointers from becoming nullptr in constexpr
[PR110619]
  c++: Track lifetimes in constant evaluation [PR70331,PR96630,PR98675]

 gcc/cp/constexpr.cc   | 159 +-
 gcc/cp/cp-gimplify.cc |  23 ++-
 gcc/cp/cp-tree.h  |   8 +-
 gcc/cp/semantics.cc   |   4 +-
 gcc/cp/typeck.cc  |   9 +-
 gcc/testsuite/g++.dg/cpp0x/constexpr-48089.C  |  10 +-
 gcc/testsuite/g++.dg/cpp0x/constexpr-70323.C  |   8 +-
 gcc/testsuite/g++.dg/cpp0x/constexpr-70323a.C |   8 +-
 .../g++.dg/cpp0x/constexpr-delete2.C  |   5 +-
 gcc/testsuite/g++.dg/cpp0x/constexpr-diag3.C  |   2 +-
 gcc/testsuite/g++.dg/cpp0x/constexpr-ice20.C  |   1 +
 .../g++.dg/cpp0x/constexpr-recursion.C|   6 +-
 gcc/testsuite/g++.dg/cpp0x/overflow1.C|   2 +-
 gcc/testsuite/g++.dg/cpp1y/constexpr-110619.C |  10 ++
 gcc/testsuite/g++.dg/cpp1y/constexpr-89285.C  |   5 +-
 gcc/testsuite/g++.dg/cpp1y/constexpr-89481.C  |   3 +-
 .../g++.dg/cpp1y/constexpr-lifetime1.C|  13 ++
 .../g++.dg/cpp1y/constexpr-lifetime2.C|  20 +++
 .../g++.dg/cpp1y/constexpr-lifetime3.C|  13 ++
 .../g++.dg/cpp1y/constexpr-lifetime4.C|  11 ++
 .../g++.dg/cpp1y/constexpr-lifetime5.C|  11 ++
 .../g++.dg/cpp1y/constexpr-lifetime6.C|  15 ++
 .../g++.dg/cpp1y/constexpr-tracking-const14.C |   3 +-
 .../g++.dg/cpp1y/constexpr-tracking-const16.C |   3 +-
 .../g++.dg/cpp1y/constexpr-tracking-const18.C |   4 +-
 .../g++.dg/cpp1y/constexpr-tracking-const19.C |   4 +-
 .../g++.dg/cpp1y/constexpr-tracking-const21.C |   4 +-
 .../g++.dg/cpp1y/constexpr-tracking-const22.C |   4 +-
 .../g++.dg/cpp1y/constexpr-tracking-const3.C  |   3 +-
 .../g++.dg/cpp1y/constexpr-tracking-const4.C  |   3 +-
 .../g++.dg/cpp1y/constexpr-tracking-const7.C  |   3 +-
 gcc/testsuite/g++.dg/cpp1y/constexpr-union5.C |   4 +-
 gcc/testsuite/g++.dg/cpp1y/pr68180.C  |   4 +-
 .../g++.dg/cpp1z/constexpr-lambda6.C  |   4 +-
 .../g++.dg/cpp1z/constexpr-lambda8.C  |   5 +-
 gcc/testsuite/g++.dg/cpp2a/bit-cast11.C   |  10 +-
 gcc/testsuite/g++.dg/cpp2a/bit-cast12.C   |  10 +-
 gcc/testsuite/g++.dg/cpp2a/bit-cast14.C   |  14 +-
 gcc/testsuite/g++.dg/cpp2a/constexpr-98122.C  |   4 +-
 .../g++.dg/cpp2a/constexpr-dynamic17.C|   5 +-
 gcc/testsuite/g++.dg/cpp2a/constexpr-init1.C  |   5 +-
 gcc/testsuite/g++.dg/cpp2a/constexpr-new12.C  |   6 +-
 gcc/testsuite/g++.dg/cpp2a/constexpr-new3.C   |  10 +-
 gcc/testsuite/g++.dg/cpp2a/constinit10.C  |   5 +-
 .../g++.dg/cpp2a/is-corresponding-member4.C   |   4 +-
 gcc/testsuite/g++.dg/ext/constexpr-vla2.C |   4 +-
 gcc/testsuite/g++.dg/ext/constexpr-vla3.C |   4 +-
 gcc/testsuite/g++.dg/ubsan/pr63956.C  |  23 +--
 .../25_algorithms/equal/constexpr_neg.cc  |   7 +-
 .../testsuite/26_numerics/gcd/105844.cc   |  10 +-
 .../testsuite/26_numerics/lcm/105844.cc   |  14 +-
 51 files changed, 361 insertions(+), 168 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-110619.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime1.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime2.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime3.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime4.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime5.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime6.C

-- 
2.41.0



[PATCH v5 1/3] c++: Improve location information in constant evaluation

2023-07-22 Thread Nathaniel Shead via Gcc-patches
This patch updates 'input_location' during constant evaluation to ensure
that errors in subexpressions that lack location information still
provide accurate diagnostics.

By itself this change causes some small regressions in diagnostic
quality for circumstances where errors used 'input_location' but the
location of the parent subexpression doesn't make sense, so this patch
also includes a small diagnostic improvement to fix the most egregious
case.

gcc/cp/ChangeLog:

* constexpr.cc (modifying_const_object_error): Find the source
location of the const object's declaration.
(cxx_eval_constant_expression): Update input_location to the location
of the currently evaluated expression, if possible.

libstdc++-v3/ChangeLog:

* testsuite/25_algorithms/equal/constexpr_neg.cc: Update diagnostic
locations.
* testsuite/26_numerics/gcd/105844.cc: Likewise.
* testsuite/26_numerics/lcm/105844.cc: Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-48089.C: Update diagnostic locations.
* g++.dg/cpp0x/constexpr-70323.C: Likewise.
* g++.dg/cpp0x/constexpr-70323a.C: Likewise.
* g++.dg/cpp0x/constexpr-delete2.C: Likewise.
* g++.dg/cpp0x/constexpr-diag3.C: Likewise.
* g++.dg/cpp0x/constexpr-ice20.C: Likewise.
* g++.dg/cpp0x/constexpr-recursion.C: Likewise.
* g++.dg/cpp0x/overflow1.C: Likewise.
* g++.dg/cpp1y/constexpr-89285.C: Likewise.
* g++.dg/cpp1y/constexpr-89481.C: Likewise.
* g++.dg/cpp1y/constexpr-tracking-const14.C: Likewise.
* g++.dg/cpp1y/constexpr-tracking-const16.C: Likewise.
* g++.dg/cpp1y/constexpr-tracking-const18.C: Likewise.
* g++.dg/cpp1y/constexpr-tracking-const19.C: Likewise.
* g++.dg/cpp1y/constexpr-tracking-const21.C: Likewise.
* g++.dg/cpp1y/constexpr-tracking-const22.C: Likewise.
* g++.dg/cpp1y/constexpr-tracking-const3.C: Likewise.
* g++.dg/cpp1y/constexpr-tracking-const4.C: Likewise.
* g++.dg/cpp1y/constexpr-tracking-const7.C: Likewise.
* g++.dg/cpp1y/constexpr-union5.C: Likewise.
* g++.dg/cpp1y/pr68180.C: Likewise.
* g++.dg/cpp1z/constexpr-lambda6.C: Likewise.
* g++.dg/cpp1z/constexpr-lambda8.C: Likewise.
* g++.dg/cpp2a/bit-cast11.C: Likewise.
* g++.dg/cpp2a/bit-cast12.C: Likewise.
* g++.dg/cpp2a/bit-cast14.C: Likewise.
* g++.dg/cpp2a/constexpr-98122.C: Likewise.
* g++.dg/cpp2a/constexpr-dynamic17.C: Likewise.
* g++.dg/cpp2a/constexpr-init1.C: Likewise.
* g++.dg/cpp2a/constexpr-new12.C: Likewise.
* g++.dg/cpp2a/constexpr-new3.C: Likewise.
* g++.dg/cpp2a/constinit10.C: Likewise.
* g++.dg/cpp2a/is-corresponding-member4.C: Likewise.
* g++.dg/ext/constexpr-vla2.C: Likewise.
* g++.dg/ext/constexpr-vla3.C: Likewise.
* g++.dg/ubsan/pr63956.C: Likewise.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/constexpr.cc   | 31 ++-
 gcc/testsuite/g++.dg/cpp0x/constexpr-48089.C  | 10 +++---
 gcc/testsuite/g++.dg/cpp0x/constexpr-70323.C  |  8 ++---
 gcc/testsuite/g++.dg/cpp0x/constexpr-70323a.C |  8 ++---
 .../g++.dg/cpp0x/constexpr-delete2.C  |  5 +--
 gcc/testsuite/g++.dg/cpp0x/constexpr-diag3.C  |  2 +-
 gcc/testsuite/g++.dg/cpp0x/constexpr-ice20.C  |  1 +
 .../g++.dg/cpp0x/constexpr-recursion.C|  6 ++--
 gcc/testsuite/g++.dg/cpp0x/overflow1.C|  2 +-
 gcc/testsuite/g++.dg/cpp1y/constexpr-89285.C  |  5 +--
 gcc/testsuite/g++.dg/cpp1y/constexpr-89481.C  |  3 +-
 .../g++.dg/cpp1y/constexpr-tracking-const14.C |  3 +-
 .../g++.dg/cpp1y/constexpr-tracking-const16.C |  3 +-
 .../g++.dg/cpp1y/constexpr-tracking-const18.C |  4 +--
 .../g++.dg/cpp1y/constexpr-tracking-const19.C |  4 +--
 .../g++.dg/cpp1y/constexpr-tracking-const21.C |  4 +--
 .../g++.dg/cpp1y/constexpr-tracking-const22.C |  4 +--
 .../g++.dg/cpp1y/constexpr-tracking-const3.C  |  3 +-
 .../g++.dg/cpp1y/constexpr-tracking-const4.C  |  3 +-
 .../g++.dg/cpp1y/constexpr-tracking-const7.C  |  3 +-
 gcc/testsuite/g++.dg/cpp1y/constexpr-union5.C |  4 +--
 gcc/testsuite/g++.dg/cpp1y/pr68180.C  |  4 +--
 .../g++.dg/cpp1z/constexpr-lambda6.C  |  4 +--
 .../g++.dg/cpp1z/constexpr-lambda8.C  |  5 ++-
 gcc/testsuite/g++.dg/cpp2a/bit-cast11.C   | 10 +++---
 gcc/testsuite/g++.dg/cpp2a/bit-cast12.C   | 10 +++---
 gcc/testsuite/g++.dg/cpp2a/bit-cast14.C   | 14 -
 gcc/testsuite/g++.dg/cpp2a/constexpr-98122.C  |  4 +--
 .../g++.dg/cpp2a/constexpr-dynamic17.C|  5 ++-
 gcc/testsuite/g++.dg/cpp2a/constexpr-init1.C  |  5 ++-
 gcc/testsuite/g++.dg/cpp2a/constexpr-new12.C  |  6 ++--
 gcc/testsuite/g++.dg/cpp2a/constexpr-new3.C   | 10 +++---
 gcc/testsuite/g++.dg/cpp2a/constinit10.C  |  5 ++-
 .../g++.dg/cpp2a/is-corresponding-member4.C   |  4 +--
 gcc/testsuite/g++.dg/ext/constexpr-vla2.C |  4 +--
 gcc/testsu

[PATCH v5 2/3] c++: Prevent dangling pointers from becoming nullptr in constexpr [PR110619]

2023-07-22 Thread Nathaniel Shead via Gcc-patches
Currently, when typeck discovers that a return statement will refer to a
local variable, it rewrites the statement to return a null pointer. This
makes the error messages for using the return value in a constant expression
unhelpful, especially for reference return values, and is also a visible
change to otherwise valid code (as in the linked PR).

The transformation is nonetheless important, both as a safety
guard against attackers being able to gain a handle to other data on the
stack, and to prevent duplicate warnings from later null-dereference
warning passes.

As such, this patch just delays the transformation until cp_genericize,
after constexpr function definitions have been generated.
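
At the source level, the rewritten return statement behaves roughly like the
comma expression in the sketch below. This is only a model of the
`return (&x, nullptr)` form mentioned in the cover letter, not the tree that
GCC actually builds; the function name `address_then_null` is hypothetical,
and C++14 is assumed.

```cpp
// Model of the transformed return: the address of the local is still
// evaluated (so any side effects of the operand are kept), but its value
// is discarded and a null pointer is what the caller receives.
constexpr int* address_then_null()
{
  int x = 0;
  return (static_cast<void>(&x), nullptr);
}

// The caller never sees a pointer into the dead stack frame.
static_assert(address_then_null() == nullptr, "dangling address replaced");
```

Writing the model by hand like this sidesteps the transformation entirely,
since the function never returns the local's address in the first place.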

PR c++/110619

gcc/cp/ChangeLog:

* cp-gimplify.cc (cp_genericize_r): Transform RETURN_EXPRs to
not return dangling pointers.
* cp-tree.h (RETURN_EXPR_LOCAL_ADDR_P): New flag.
(check_return_expr): Add a new parameter.
* semantics.cc (finish_return_stmt): Set flag on RETURN_EXPR
when referring to dangling pointer.
* typeck.cc (check_return_expr): Disable transformation of
dangling pointers, instead pass this information to caller.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/constexpr-110619.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/cp-gimplify.cc | 23 ---
 gcc/cp/cp-tree.h  |  8 ++-
 gcc/cp/semantics.cc   |  4 +++-
 gcc/cp/typeck.cc  |  9 
 gcc/testsuite/g++.dg/cpp1y/constexpr-110619.C | 10 
 5 files changed, 45 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-110619.C

diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
index f5734197774..0a5d6300aca 100644
--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -1336,9 +1336,26 @@ cp_genericize_r (tree *stmt_p, int *walk_subtrees, void *data)
   break;
 
 case RETURN_EXPR:
-  if (TREE_OPERAND (stmt, 0) && is_invisiref_parm (TREE_OPERAND (stmt, 0)))
-   /* Don't dereference an invisiref RESULT_DECL inside a RETURN_EXPR.  */
-   *walk_subtrees = 0;
+  if (TREE_OPERAND (stmt, 0))
+   {
+ if (is_invisiref_parm (TREE_OPERAND (stmt, 0)))
+   /* Don't dereference an invisiref RESULT_DECL inside a RETURN_EXPR.  */
+   *walk_subtrees = 0;
+ if (RETURN_EXPR_LOCAL_ADDR_P (stmt))
+   {
+ /* Don't return the address of a local variable.  */
+ tree *p = &TREE_OPERAND (stmt, 0);
+ while (TREE_CODE (*p) == COMPOUND_EXPR)
+   p = &TREE_OPERAND (*p, 0);
+ if (TREE_CODE (*p) == INIT_EXPR)
+   {
+ tree op = TREE_OPERAND (*p, 1);
+ tree new_op = build2 (COMPOUND_EXPR, TREE_TYPE (op), op,
+   build_zero_cst (TREE_TYPE (op)));
+ TREE_OPERAND (*p, 1) = new_op;
+   }
+   }
+   }
   break;
 
 case OMP_CLAUSE:
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 3de0e154c12..e0c181d9aef 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -447,6 +447,7 @@ extern GTY(()) tree cp_global_trees[CPTI_MAX];
   INIT_EXPR_NRV_P (in INIT_EXPR)
   ATOMIC_CONSTR_MAP_INSTANTIATED_P (in ATOMIC_CONSTR)
   contract_semantic (in ASSERTION_, PRECONDITION_, POSTCONDITION_STMT)
+  RETURN_EXPR_LOCAL_ADDR_P (in RETURN_EXPR)
1: IDENTIFIER_KIND_BIT_1 (in IDENTIFIER_NODE)
   TI_PENDING_TEMPLATE_FLAG.
   TEMPLATE_PARMS_FOR_INLINE.
@@ -4071,6 +4072,11 @@ struct GTY(()) lang_decl {
   (LANG_DECL_FN_CHECK (FUNCTION_DECL_CHECK (NODE)) \
->u.saved_auto_return_type)
 
+/* In a RETURN_EXPR, whether the expression refers to the address
+   of a local variable.  */
+#define RETURN_EXPR_LOCAL_ADDR_P(NODE) \
+  TREE_LANG_FLAG_0 (RETURN_EXPR_CHECK (NODE))
+
 /* True if NODE is an implicit INDIRECT_REF from convert_from_reference.  */
 #define REFERENCE_REF_P(NODE)  \
   (INDIRECT_REF_P (NODE)   \
@@ -8139,7 +8145,7 @@ extern tree composite_pointer_type(const op_location_t &,
 tsubst_flags_t);
 extern tree merge_types(tree, tree);
 extern tree strip_array_domain (tree);
-extern tree check_return_expr  (tree, bool *);
+extern tree check_return_expr  (tree, bool *, bool *);
 extern tree spaceship_type (tree, tsubst_flags_t = tf_warning_or_error);
 extern tree genericize_spaceship   (location_t, tree, tree, tree);
 extern tree cp_build_binary_op  (const op_location_t &,
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 8fb47fd179e..720521b7f1a 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -1240

[PATCH v5 3/3] c++: Track lifetimes in constant evaluation [PR70331,PR96630,PR98675]

2023-07-22 Thread Nathaniel Shead via Gcc-patches
This adds rudimentary lifetime tracking in C++ constexpr contexts,
allowing the compiler to report errors when values are used after their
backing storage has gone out of scope. We don't yet handle other ways of
accessing values outside their lifetime (e.g. following explicit
destructor calls).
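
A small sketch of the kind of code this concerns, with hypothetical function
names. The valid case is compiled; the out-of-lifetime case is left in a
comment because it is expected to be rejected once lifetimes are tracked,
and the exact diagnostic wording is not assumed here. C++14 is assumed.

```cpp
// Valid: 'inner' is only read while its enclosing block is still active.
constexpr int sum_inside_scope()
{
  int total = 1;
  {
    int inner = 2;
    total += inner;
  }
  return total;
}

static_assert(sum_inside_scope() == 3, "read happens within the lifetime");

// Invalid counterpart, kept as a comment so the sketch compiles everywhere:
//
//   constexpr int read_after_scope()
//   {
//     int *p = nullptr;
//     { int inner = 2; p = &inner; }  // inner's lifetime ends at '}'
//     return *p;                      // read outside the lifetime
//   }
//   constexpr int v = read_after_scope();  // expected to be diagnosed
```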

PR c++/96630
PR c++/98675
PR c++/70331

gcc/cp/ChangeLog:

* constexpr.cc (constexpr_global_ctx::is_outside_lifetime): New
function.
(constexpr_global_ctx::get_value): Don't return expired values.
(constexpr_global_ctx::get_value_ptr): Likewise.
(constexpr_global_ctx::remove_value): Mark value outside
lifetime.
(outside_lifetime_error): New function.
(cxx_eval_call_expression): No longer track save_exprs.
(cxx_eval_loop_expr): Likewise.
(cxx_eval_constant_expression): Add checks for outside lifetime
values. Remove local variables at end of bind exprs, and
temporaries after cleanup points.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/constexpr-lifetime1.C: New test.
* g++.dg/cpp1y/constexpr-lifetime2.C: New test.
* g++.dg/cpp1y/constexpr-lifetime3.C: New test.
* g++.dg/cpp1y/constexpr-lifetime4.C: New test.
* g++.dg/cpp1y/constexpr-lifetime5.C: New test.
* g++.dg/cpp1y/constexpr-lifetime6.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/constexpr.cc   | 128 --
 .../g++.dg/cpp1y/constexpr-lifetime1.C|  13 ++
 .../g++.dg/cpp1y/constexpr-lifetime2.C|  20 +++
 .../g++.dg/cpp1y/constexpr-lifetime3.C|  13 ++
 .../g++.dg/cpp1y/constexpr-lifetime4.C|  11 ++
 .../g++.dg/cpp1y/constexpr-lifetime5.C|  11 ++
 .../g++.dg/cpp1y/constexpr-lifetime6.C|  15 ++
 7 files changed, 169 insertions(+), 42 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime1.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime2.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime3.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime4.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime5.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-lifetime6.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 2bf2458c3cd..0ab77dcaf62 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -1148,7 +1148,8 @@ enum constexpr_switch_state {
 
 class constexpr_global_ctx {
   /* Values for any temporaries or local variables within the
- constant-expression. */
+ constant-expression. Objects outside their lifetime have
+ value 'void_node'.  */
   hash_map values;
 public:
   /* Number of cxx_eval_constant_expression calls (except skipped ones,
@@ -1170,17 +1171,28 @@ public:
 : constexpr_ops_count (0), cleanups (NULL), modifiable (nullptr),
   heap_dealloc_count (0) {}
 
+  bool is_outside_lifetime (tree t)
+  {
+if (tree *p = values.get (t))
+  if (*p == void_node)
+   return true;
+return false;
+  }
  tree get_value (tree t)
   {
 if (tree *p = values.get (t))
-  return *p;
+  if (*p != void_node)
+   return *p;
 return NULL_TREE;
   }
   tree *get_value_ptr (tree t)
   {
 if (modifiable && !modifiable->contains (t))
   return nullptr;
-return values.get (t);
+if (tree *p = values.get (t))
+  if (*p != void_node)
+   return p;
+return nullptr;
   }
   void put_value (tree t, tree v)
   {
@@ -1188,7 +1200,13 @@ public:
 if (!already_in_map && modifiable)
   modifiable->add (t);
   }
-  void remove_value (tree t) { values.remove (t); }
+  void remove_value (tree t)
+  {
+if (DECL_P (t))
+  values.put (t, void_node);
+else
+  values.remove (t);
+  }
 };
 
 /* Helper class for constexpr_global_ctx.  In some cases we want to avoid
@@ -3111,12 +3129,9 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
  gcc_assert (!DECL_BY_REFERENCE (res));
  ctx->global->put_value (res, NULL_TREE);
 
- /* Track the callee's evaluated SAVE_EXPRs and TARGET_EXPRs so that
-we can forget their values after the call.  */
- constexpr_ctx ctx_with_save_exprs = *ctx;
- auto_vec save_exprs;
- ctx_with_save_exprs.save_exprs = &save_exprs;
- ctx_with_save_exprs.call = &new_call;
+ /* Remember the current call we're evaluating.  */
+ constexpr_ctx call_ctx = *ctx;
+ call_ctx.call = &new_call;
  unsigned save_heap_alloc_count = ctx->global->heap_vars.length ();
  unsigned save_heap_dealloc_count = ctx->global->heap_dealloc_count;
 
@@ -3127,7 +3142,7 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
  non_constant_p, overflow_p);
 
  tree jump_target = NULL_TREE;
- cxx_eval_constant_expression (&ctx_with_save_exprs, body,
+ cxx_eval_constant_expressi

[x86 PATCH] Use QImode for offsets in zero_extract/sign_extract in i386.md

2023-07-22 Thread Roger Sayle

As suggested by Uros, this patch changes the ZERO_EXTRACTs and SIGN_EXTRACTs
in i386.md to consistently use QImode for bit offsets (i.e. third and fourth
operands), matching the use of QImode for bit counts in shifts and rotates.

There's no change in functionality, and the new patterns simply ensure that
we continue to generate the same code (match revised patterns) as before.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2023-07-22  Roger Sayle  

gcc/ChangeLog
* config/i386/i386.md (extv): Use QImode for offsets.
(extzv): Likewise.
(insv): Likewise.
(*testqi_ext_3): Likewise.
(*btr_2): Likewise.
(define_split): Likewise.
(*btsq_imm): Likewise.
(*btrq_imm): Likewise.
(*btcq_imm): Likewise.
(define_peephole2 x3): Likewise.
(*bt): Likewise.
(*bt_mask): New define_insn_and_split.
(*jcc_bt): Use QImode for offsets.
(*jcc_bt_1): Delete obsolete pattern.
(*jcc_bt_mask): Use QImode offsets.
(*jcc_bt_mask_1): Likewise.
(define_split): Likewise.
(*bt_setcqi): Likewise.
(*bt_setncqi): Likewise.
(*bt_setnc): Likewise.
(*bt_setncqi_2): Likewise.
(*bt_setc_mask): New define_insn_and_split.
(bmi2_bzhi_3): Use QImode offsets.
(*bmi2_bzhi_3): Likewise.
(*bmi2_bzhi_3_1): Likewise.
(*bmi2_bzhi_3_1_ccz): Likewise.
(@tbm_bextri_): Likewise.


Thanks,
Roger
--

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 47ea050..de8c3a5 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -3312,8 +3312,8 @@
 (define_expand "extv"
   [(set (match_operand:SWI24 0 "register_operand")
(sign_extract:SWI24 (match_operand:SWI24 1 "register_operand")
-   (match_operand:SI 2 "const_int_operand")
-   (match_operand:SI 3 "const_int_operand")))]
+   (match_operand:QI 2 "const_int_operand")
+   (match_operand:QI 3 "const_int_operand")))]
   ""
 {
   /* Handle extractions from %ah et al.  */
@@ -3340,8 +3340,8 @@
 (define_expand "extzv"
   [(set (match_operand:SWI248 0 "register_operand")
(zero_extract:SWI248 (match_operand:SWI248 1 "register_operand")
-(match_operand:SI 2 "const_int_operand")
-(match_operand:SI 3 "const_int_operand")))]
+(match_operand:QI 2 "const_int_operand")
+(match_operand:QI 3 "const_int_operand")))]
   ""
 {
   if (ix86_expand_pextr (operands))
@@ -3428,8 +3428,8 @@
 
 (define_expand "insv"
   [(set (zero_extract:SWI248 (match_operand:SWI248 0 "register_operand")
-(match_operand:SI 1 "const_int_operand")
-(match_operand:SI 2 "const_int_operand"))
+(match_operand:QI 1 "const_int_operand")
+(match_operand:QI 2 "const_int_operand"))
 (match_operand:SWI248 3 "register_operand"))]
   ""
 {
@@ -10788,8 +10788,8 @@
 (match_operator 1 "compare_operator"
  [(zero_extract:SWI248
 (match_operand 2 "int_nonimmediate_operand" "rm")
-(match_operand 3 "const_int_operand")
-(match_operand 4 "const_int_operand"))
+(match_operand:QI 3 "const_int_operand")
+(match_operand:QI 4 "const_int_operand"))
   (const_int 0)]))]
   "/* Ensure that resulting mask is zero or sign extended operand.  */
INTVAL (operands[4]) >= 0
@@ -15904,7 +15904,7 @@
   [(set (zero_extract:HI
  (match_operand:SWI12 0 "nonimmediate_operand")
  (const_int 1)
- (zero_extend:SI (match_operand:QI 1 "register_operand")))
+ (match_operand:QI 1 "register_operand"))
(const_int 0))
(clobber (reg:CC FLAGS_REG))]
   "TARGET_USE_BT && ix86_pre_reload_split ()"
@@ -15928,7 +15928,7 @@
   [(set (zero_extract:HI
  (match_operand:SWI12 0 "register_operand")
  (const_int 1)
- (zero_extend:SI (match_operand:QI 1 "register_operand")))
+ (match_operand:QI 1 "register_operand"))
(const_int 0))
(clobber (reg:CC FLAGS_REG))]
   "TARGET_USE_BT && ix86_pre_reload_split ()"
@@ -15955,7 +15955,7 @@
 (define_insn "*btsq_imm"
   [(set (zero_extract:DI (match_operand:DI 0 "nonimmediate_operand" "+rm")
 (const_int 1)
-(match_operand 1 "const_0_to_63_operand"))
+(match_operand:QI 1 "const_0_to_63_operand"))
(const_int 1))
(clobber (reg:CC FLAGS_REG))]
   "TARGET_64BIT && (TARGET_USE_BT || reload_completed)"
@@ -15968,7 +15968,7 @@
 (define_insn "*btrq_imm"
   [(set (zero_extract:DI (match_operand:DI 0 "nonimmediate_operand" "+rm")
  

[committed] Fix length computation bug in bfin port

2023-07-22 Thread Jeff Law via Gcc-patches
The tester seemed to occasionally ping-pong a compilation failure on the 
builtin-bitops-1.c test.  I long suspected it was something like length 
computations.


I finally got a few minutes to dig into it, and sure enough the blackfin 
port was claiming the "ones" operation was 2 bytes when it is in fact 4 
bytes.


This fixes the compilation failure for the builtin-bitops-1.c test. 
Sadly, it doesn't fix any of the other failures on the bfin port.


Committed to the trunk.

Jeff
commit bb095e8a343db043a0cd0b0da9b2ab1186d1a1ed
Author: Jeff Law 
Date:   Sat Jul 22 09:47:21 2023 -0600

[committed] Fix length computation bug in bfin port

The tester seemed to occasionally ping-pong a compilation failure on the
builtin-bitops-1.c test.  I long suspected it was something like length
computations.

I finally got a few minutes to dig into it, and sure enough the blackfin
port was claiming the "ones" operation was 2 bytes when it is in fact 4 
bytes.

This fixes the compilation failure for the builtin-bitops-1.c test.   Sadly,
it doesn't fix any of the other failures on the bfin port.

Committed to the trunk.

gcc/
* config/bfin/bfin.md (ones): Fix length computation.

diff --git a/gcc/config/bfin/bfin.md b/gcc/config/bfin/bfin.md
index 9b5ab071778..c6b174dc3bd 100644
--- a/gcc/config/bfin/bfin.md
+++ b/gcc/config/bfin/bfin.md
@@ -1401,7 +1401,8 @@ (define_insn "ones"
 (popcount:SI (match_operand:SI 1 "register_operand" "d"]
   ""
   "%h0 = ONES %1;"
-  [(set_attr "type" "alu0")])
+  [(set_attr "type" "alu0")
+   (set_attr "length" "4")])
 
 (define_expand "popcountsi2"
   [(set (match_dup 2)


Re: [PATCH 4/9] vect: Adjust vectorizable_load costing on VMAT_ELEMENTWISE and VMAT_STRIDED_SLP

2023-07-22 Thread Iain Sandoe
Hi Kewen,

This patch breaks bootstrap on powerpc-darwin (which has Altivec, but not VSX) 
while building libgfortran.

> On 3 Jul 2023, at 04:19, Kewen.Lin via Gcc-patches  
> wrote:

Please see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110776
thanks
Iain



[PATCH] Replace lra-spills.cc's return_regno_p with return_reg_p.

2023-07-22 Thread Roger Sayle

This patch is my attempt to address the compile-time hog issue
in PR rtl-optimization/110587.  Richard Biener's analysis shows that
compilation of pr28071.c with -O0 currently spends ~70% in timer
"LRA non-specific" due to return_regno_p failing to filter a large
number of calls to regno_in_use_p, resulting in quadratic behaviour.

For this pathological test case, things can be improved significantly.
Although the return register (%rax) is indeed mentioned a large
number of times in this function, due to inlining, the inlined functions
access the returned register in TImode, whereas the current function
returns a DImode.  Hence the check to see if we're the last SET of the
return register, which should be followed by a USE, can be improved
by also testing the mode.  Implementation-wise, rather than pass an
additional mode parameter to LRA's local return_regno_p function, which
only has a single caller, it's more convenient to pass the rtx REG_P,
and from this extract both the REGNO and the mode in the callee, and
rename this function to return_reg_p.

The good news is that with this change "LRA non-specific" drops from
70% to 13%.
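
To make the filtering argument concrete, here is a small sketch comparing a filter on the register number alone with one on number and mode. Reg, Mode, and the helper names are hypothetical stand-ins for the example, not LRA's data structures; the point is only that the stricter predicate lets far fewer candidates through to the expensive check.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Mode { DI, TI };  // hypothetical stand-ins for machine modes

struct Reg {
  unsigned regno;
  Mode mode;
};

// Filter on the register number alone (the behaviour being replaced).
inline bool matches_regno(const Reg &outgoing, const Reg &reg) {
  return outgoing.regno == reg.regno;
}

// Filter on register number and mode (the behaviour the patch describes).
inline bool matches_reg(const Reg &outgoing, const Reg &reg) {
  return outgoing.regno == reg.regno && outgoing.mode == reg.mode;
}

// Count how many candidates would go on to the expensive follow-up check.
template <typename Pred>
std::size_t count_expensive_checks(const Reg &outgoing,
                                   const std::vector<Reg> &candidates,
                                   Pred pred) {
  std::size_t n = 0;
  for (const Reg &r : candidates)
    if (pred(outgoing, r))
      ++n;
  return n;
}
```

With a DImode return register and many TImode uses of the same hard register (as from inlined callees), the number-only filter passes every candidate, while the number-and-mode filter passes only the genuine return-value set.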

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, with no new failures.  Ok for mainline?


2023-07-22  Roger Sayle  

gcc/ChangeLog
PR middle-end/28071
PR rtl-optimization/110587
* lra-spills.cc (return_regno_p): Change argument and rename to...
(return_reg_p): Check if the given register RTX has the same
REGNO and machine mode as the function's return value.
(lra_final_code_change): Update call to return_reg_p.


Thanks in advance,
Roger
--

diff --git a/gcc/lra-spills.cc b/gcc/lra-spills.cc
index 3a7bb7e..ae147ad 100644
--- a/gcc/lra-spills.cc
+++ b/gcc/lra-spills.cc
@@ -705,10 +705,10 @@ alter_subregs (rtx *loc, bool final_p)
   return res;
 }
 
-/* Return true if REGNO is used for return in the current
-   function.  */
+/* Return true if register REG, known to be REG_P, is used for return
+   in the current function.  */
 static bool
-return_regno_p (unsigned int regno)
+return_reg_p (rtx reg)
 {
   rtx outgoing = crtl->return_rtx;
 
@@ -716,7 +716,8 @@ return_regno_p (unsigned int regno)
 return false;
 
   if (REG_P (outgoing))
-return REGNO (outgoing) == regno;
+return REGNO (outgoing) == REGNO (reg)
+  && GET_MODE (outgoing) == GET_MODE (reg);
   else if (GET_CODE (outgoing) == PARALLEL)
 {
   int i;
@@ -725,7 +726,9 @@ return_regno_p (unsigned int regno)
{
  rtx x = XEXP (XVECEXP (outgoing, 0, i), 0);
 
- if (REG_P (x) && REGNO (x) == regno)
+ if (REG_P (x)
+ && REGNO (x) == REGNO (reg)
+ && GET_MODE (x) == GET_MODE (reg))
return true;
}
 }
@@ -821,7 +824,7 @@ lra_final_code_change (void)
  if (NONJUMP_INSN_P (insn) && GET_CODE (pat) == SET
  && REG_P (SET_SRC (pat)) && REG_P (SET_DEST (pat))
  && REGNO (SET_SRC (pat)) == REGNO (SET_DEST (pat))
- && (! return_regno_p (REGNO (SET_SRC (pat)))
+ && (! return_reg_p (SET_SRC (pat))
  || ! regno_in_use_p (insn, REGNO (SET_SRC (pat)))))
{
  lra_invalidate_insn_data (insn);


[committed] testsuite: Limit bb-slp-pr95839-v8.c to 64-bit vector targets

2023-07-22 Thread Maciej W. Rozycki
Only run bb-slp-pr95839-v8.c with targets that support vectors of 64 
bits, removing regressions with 32-bit x86 targets:

FAIL: gcc.dg/vect/bb-slp-pr95839-v8.c scan-tree-dump slp2 "optimized: basic block"
FAIL: gcc.dg/vect/bb-slp-pr95839-v8.c -flto -ffat-lto-objects  scan-tree-dump slp2 "optimized: basic block"

gcc/testsuite/
* gcc.dg/vect/bb-slp-pr95839-v8.c: Limit to `vect64' targets.
---
On Fri, 21 Jul 2023, Jiang, Haochen wrote:

> > > > I think the issue is we disable V2SF on ia32 because of the conflict
> > > > with MMX which we don't want to use.
> > >
> > >  I'm not sure if I have a way to test with such a target.  Would you
> > > expect:
> > >
> > > /* { dg-require-effective-target vect64 } */
> > >
> > > to cover it?  If so, then I'll put it back as in the original version
> > > and post for Haochen to verify.
> 
> I suppose just commit to trunk and it should be ok since it is only -m32 
> issue.

 Thanks for mentioning `-m32', I was able to use it to verify this change 
and confirm that the qualifier does indeed work for this configuration.  
Committed as posted then and apologies for messing this up.

  Maciej
---
 gcc/testsuite/gcc.dg/vect/bb-slp-pr95839-v8.c |1 +
 1 file changed, 1 insertion(+)

gcc-test-bb-slp-pr95839-v8-fix.diff
Index: gcc/gcc/testsuite/gcc.dg/vect/bb-slp-pr95839-v8.c
===
--- gcc.orig/gcc/testsuite/gcc.dg/vect/bb-slp-pr95839-v8.c
+++ gcc/gcc/testsuite/gcc.dg/vect/bb-slp-pr95839-v8.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target vect_float } */
+/* { dg-require-effective-target vect64 } */
 /* { dg-additional-options "-w -Wno-psabi" } */
 
 typedef float __attribute__((vector_size(8))) v2f32;


Re: [x86 PATCH] Use QImode for offsets in zero_extract/sign_extract in i386.md

2023-07-22 Thread Uros Bizjak via Gcc-patches
On Sat, Jul 22, 2023 at 5:37 PM Roger Sayle  wrote:
>
>
> As suggested by Uros, this patch changes the ZERO_EXTRACTs and SIGN_EXTRACTs
> in i386.md to consistently use QImode for bit offsets (i.e. third and fourth
> operands), matching the use of QImode for bit counts in shifts and rotates.
>
> There's no change in functionality, and the new patterns simply ensure that
> we continue to generate the same code (match revised patterns) as before.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2023-07-22  Roger Sayle  
>
> gcc/ChangeLog
> * config/i386/i386.md (extv): Use QImode for offsets.
> (extzv): Likewise.
> (insv): Likewise.
> (*testqi_ext_3): Likewise.
> (*btr_2): Likewise.
> (define_split): Likewise.
> (*btsq_imm): Likewise.
> (*btrq_imm): Likewise.
> (*btcq_imm): Likewise.
> (define_peephole2 x3): Likewise.
> (*bt): Likewise
> (*bt_mask): New define_insn_and_split.
> (*jcc_bt): Use QImode for offsets.
> (*jcc_bt_1): Delete obsolete pattern.
> (*jcc_bt_mask): Use QImode offsets.
> (*jcc_bt_mask_1): Likewise.
> (define_split): Likewise.
> (*bt_setcqi): Likewise.
> (*bt_setncqi): Likewise.
> (*bt_setnc): Likewise.
> (*bt_setncqi_2): Likewise.
> (*bt_setc_mask): New define_insn_and_split.
> (bmi2_bzhi_3): Use QImode offsets.
> (*bmi2_bzhi_3): Likewise.
> (*bmi2_bzhi_3_1): Likewise.
> (*bmi2_bzhi_3_1_ccz): Likewise.
> (@tbm_bextri_): Likewise.

OK.

Thanks,
Uros.

>
>
> Thanks,
> Roger
> --
>


Re: [x86 PATCH] Don't use insvti_{high, low}part with -O0 (for compile-time).

2023-07-22 Thread Uros Bizjak via Gcc-patches
On Sat, Jul 22, 2023 at 4:17 PM Roger Sayle  wrote:
>
>
> This patch attempts to help with PR rtl-optimization/110587, a regression
> of -O0 compile time for the pathological pr28071.c.  My recent patch helps
> a bit, but hasn't returned -O0 compile-time to where it was before my
> ix86_expand_move changes.  The obvious solution/workaround is to guard
> these new TImode parameter passing optimizations with "&& optimize", so
> they don't trigger when compiling with -O0.  The very minor complication
> is that "&& optimize" alone leads to the regression of pr110533.c, where
> our improved TImode parameter passing fixes a wrong-code issue with naked
> functions, importantly, when compiling with -O0.  This should explain
> the one line fix below "&& (optimize || ix86_function_naked (cfun))".
>
> I've an additional fix/tweak or two for this compile-time issue, but
> this change eliminates the part of the regression that I've caused.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
> 2023-07-22  Roger Sayle  
>
> gcc/ChangeLog
> * config/i386/i386-expand.cc (ix86_expand_move): Disable the
> 64-bit insertions into TImode optimizations with -O0, unless
> the function has the "naked" attribute (for PR target/110533).

LGTM, but please add some comments, why only when optimizing (please
mention PR110587) and especially mention PR110533 on why the naked
attribute is allowed.

Thanks,
Uros.

> Cheers,
> Roger
> --
>


[PATCH] Fix alpha building

2023-07-22 Thread Andrew Pinski via Gcc-patches
The problem is after r14-2587-gd8105b10fff951, the definition of
extended_count now takes a bool as its last argument but we only
have a declaration for the version which takes an int as the last
argument. This fixes the problem by changing the declaration to take
a bool too.
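
For readers less familiar with why this mismatch matters in C++: a declaration taking int and a definition taking bool name two different functions. The sketch below uses hypothetical functions (not the real extended_count) to show that the two signatures are distinct overloads; a caller that only sees the int declaration would therefore refer to a function that is never defined, which would typically show up as a build failure.

```cpp
#include <cassert>

// Two hypothetical functions that differ only in the last parameter type.
// In C++ these are distinct overloads, not two spellings of one function.
inline unsigned count_like(unsigned x, int) { return x + 1; }
inline unsigned count_like(unsigned x, bool) { return x + 2; }

// Overload resolution picks by the argument's type: an int literal selects
// the int overload, a bool literal selects the bool overload.
inline unsigned call_with_int() { return count_like(5u, 0); }
inline unsigned call_with_bool() { return count_like(5u, false); }
```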

Committed as obvious after building a cross to alpha-linux-gnu.

gcc/ChangeLog:

PR target/110778
* rtl.h (extended_count): Change last argument type
to bool.
---
 gcc/rtl.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/rtl.h b/gcc/rtl.h
index 03b7d058295..e1c51156f90 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -4214,7 +4214,7 @@ extern bool validate_subreg (machine_mode, machine_mode,
 const_rtx, poly_uint64);
 
 /* In combine.cc  */
-extern unsigned int extended_count (const_rtx, machine_mode, int);
+extern unsigned int extended_count (const_rtx, machine_mode, bool);
 extern rtx remove_death (unsigned int, rtx_insn *);
 extern rtx make_compound_operation (rtx, enum rtx_code);
 
-- 
2.31.1



Re: [PATCH v2] RISC-V: optim const DF +0.0 store to mem [PR/110748]

2023-07-22 Thread Vineet Gupta

On 7/21/23 23:05, Jeff Law wrote:



On 7/21/23 12:30, Vineet Gupta wrote:

Fixes: ef85d150b5963 ("RISC-V: Enable TARGET_SUPPORTS_WIDE_INT")
(gcc-13 regression)

DF +0.0 is bitwise all zeros so int x0 store to mem can be used to 
optimize it.


void zd(double *d) { *d = 0.0; }

currently:

| fmv.d.x fa5,zero
| fsd fa5,0(a0)
| ret

With patch

| sd  zero,0(a0)
| ret

This came to light when testing the in-flight f-m-o patch where an ICE
was getting triggered due to lack of this pattern but turns out this
is an independent optimization of its own [1]

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624857.html

Ran thru full multilib testsuite, there was 1 false failure due to
random string "lw" appearing in lto build assembler output, which is
also fixed in the patch.

gcc/ChangeLog:

PR target/110748
* config/riscv/predicates.md (const_0_operand): Add back
  const_double.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr110748-1.c: New Test.
* gcc.target/riscv/xtheadfmv-fmv.c: Add '\t' around test
  patterns to avoid random string matches.

OK
jeff


Thx Jeff. I couldn't resist beefing up the changelog some more to 
capture the technicalities at heart. Hopefully someone getting up to 
speed on gcc in the future will find it informative.


-Vineet


[Committed] RISC-V: optim const DF +0.0 store to mem [PR/110748]

2023-07-22 Thread Vineet Gupta
Fixes: ef85d150b5963 ("RISC-V: Enable TARGET_SUPPORTS_WIDE_INT")

DF +0.0 is bitwise all zeros so int x0 store to mem can be used to optimize it.

void zd(double *d) { *d = 0.0; }

currently:

| fmv.d.x fa5,zero
| fsd fa5,0(a0)
| ret

With patch

| sd  zero,0(a0)
| ret
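
The premise that DF +0.0 is bitwise all zeros can be checked with a short sketch like the one below. It assumes IEEE 754 binary64 doubles (asserted at compile time); bits_of is a hypothetical helper introduced for the example. Note that -0.0 has the sign bit set, so the observation applies to +0.0 only.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<double>::is_iec559,
              "this sketch assumes IEEE 754 doubles");
static_assert(sizeof(std::uint64_t) == sizeof(double),
              "this sketch assumes 64-bit doubles");

// Return the raw bit pattern of a double without violating aliasing rules.
inline std::uint64_t bits_of(double d) {
  std::uint64_t u;
  std::memcpy(&u, &d, sizeof u);
  return u;
}
```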

The fix updates predicate const_0_operand() so reg_or_0_operand () now
includes const_double, enabling movdf expander -> riscv_legitimize_move ()
to generate below vs. an intermediate set (reg:DF) const_double:DF

| (insn 6 3 0 2 (set (mem:DF (reg/v/f:DI 134 [ d ])
|(const_double:DF 0.0 [0x0.0p+0]))

This change also enables such insns to be recog() by later passes.
The md pattern "*movdf_hardfloat_rv64" despite already supporting the
needed constraints {"m","G"} mem/const 0.0 was failing to match because
the additional condition check reg_or_0_operand() was failing due to
missing const_double.

This failure to recog() was triggering an ICE when testing the in-flight
f-m-o patches and is how all of this started, but then was deemed to be
an independent optimization of its own [1].

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624857.html

It's worthwhile to note that all the set pieces were already there and
working up until my own commit mentioned at top regressed the whole thing.

Ran thru full multilib testsuite and no surprises. There was 1 false
failure due to random string "lw" appearing in lto build assembler output,
which is also fixed here.

gcc/ChangeLog:

PR target/110748
* config/riscv/predicates.md (const_0_operand): Add back
const_double.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr110748-1.c: New Test.
* gcc.target/riscv/xtheadfmv-fmv.c: Add '\t' around test
patterns to avoid random string matches.

Signed-off-by: Vineet Gupta 
---
 gcc/config/riscv/predicates.md |  2 +-
 gcc/testsuite/gcc.target/riscv/pr110748-1.c| 10 ++
 gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c |  8 
 3 files changed, 15 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/pr110748-1.c

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 5a22c77f0cd0..9db28c2def7e 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -58,7 +58,7 @@
(match_test "INTVAL (op) + 1 != 0")))
 
 (define_predicate "const_0_operand"
-  (and (match_code "const_int,const_wide_int,const_vector")
+  (and (match_code "const_int,const_wide_int,const_double,const_vector")
(match_test "op == CONST0_RTX (GET_MODE (op))")))
 
 (define_predicate "const_1_operand"
diff --git a/gcc/testsuite/gcc.target/riscv/pr110748-1.c b/gcc/testsuite/gcc.target/riscv/pr110748-1.c
new file mode 100644
index ..2f5bc08aae72
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr110748-1.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-march=rv64g -mabi=lp64d -O2" } */
+
+
+void zd(double *d) { *d = 0.0;  }
+void zf(float *f)  { *f = 0.0;  }
+
+/* { dg-final { scan-assembler-not "\tfmv\\.d\\.x\t" } } */
+/* { dg-final { scan-assembler-not "\tfmv\\.s\\.x\t" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
index 1036044291e7..89eb48bed1b9 100644
--- a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
+++ b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
@@ -18,7 +18,7 @@ d2ll (double d)
 /* { dg-final { scan-assembler "th.fmv.hw.x" } } */
 /* { dg-final { scan-assembler "fmv.x.w" } } */
 /* { dg-final { scan-assembler "th.fmv.x.hw" } } */
-/* { dg-final { scan-assembler-not "sw" } } */
-/* { dg-final { scan-assembler-not "fld" } } */
-/* { dg-final { scan-assembler-not "fsd" } } */
-/* { dg-final { scan-assembler-not "lw" } } */
+/* { dg-final { scan-assembler-not "\tsw\t" } } */
+/* { dg-final { scan-assembler-not "\tfld\t" } } */
+/* { dg-final { scan-assembler-not "\tfsd\t" } } */
+/* { dg-final { scan-assembler-not "\tlw\t" } } */
-- 
2.34.1



[PATCH] Fix 100864: `(a&!b) | b` is not optimized to `a | b` for comparisons

2023-07-22 Thread Andrew Pinski via Gcc-patches
This adds a special case of the `(a&~b) | b` pattern where
`b` and `~b` are comparisons.
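
The identity behind this can be checked exhaustively on a small range. The sketch below is only a sanity check of the algebra for integer operands, using hypothetical helper names; it says nothing about how match.pd implements the transform, and it deliberately avoids floating point, where NaNs make the inverse of a comparison less straightforward.

```cpp
#include <cassert>

// (e & !c) | c, with c being a comparison and !c its inverse.
inline bool original_form(int a, int b, bool e) {
  bool c = a < b;
  bool d = !c;  // for integers this is the same as a >= b
  return (e & d) | c;
}

// The simplified form: e | c.
inline bool simplified_form(int a, int b, bool e) {
  bool c = a < b;
  return e | c;
}

// Compare both forms over a small grid of inputs.
inline bool forms_agree() {
  for (int a = -3; a <= 3; ++a)
    for (int b = -3; b <= 3; ++b)
      for (int e = 0; e <= 1; ++e)
        if (original_form(a, b, e != 0) != simplified_form(a, b, e != 0))
          return false;
  return true;
}
```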

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

PR tree-optimization/100864
* match.pd ((~x & y) | x -> x | y): Add comparison variant.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/bitops-3.c: New test.
---
 gcc/match.pd | 17 +-
 gcc/testsuite/gcc.dg/tree-ssa/bitops-3.c | 67 
 2 files changed, 83 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bitops-3.c

diff --git a/gcc/match.pd b/gcc/match.pd
index bfd15d6cd4a..dd4a2df537d 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1928,7 +1928,22 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  /* (~x & y) | x -> x | y */
  (simplify
   (bitop:c (rbitop:c (bit_not @0) @1) @0)
-  (bitop @0 @1)))
+  (bitop @0 @1))
+ /* Similar but for comparisons which have been inverted already.
+Note it is hard to simulate the inverted tcc_comparison due to
+NaNs; that is, == and != are sometimes inversions and sometimes not.
+So a double for loop is needed, and the inverse code is then compared
+with the result of invert_tree_comparison.
+This works fine for vector compares as -1 and 0 are bitwise
+inverses.  */
+ (for cmp (tcc_comparison)
+  (for icmp (tcc_comparison)
+   (simplify
+(bitop:c (rbitop:c (icmp @0 @1) @2) (cmp@3 @0 @1))
+ (with { enum tree_code ic = invert_tree_comparison
+ (cmp, HONOR_NANS (@0)); }
+  (if (ic == icmp)
+   (bitop @3 @2)))
 
 /* ((x | y) & z) | x -> (z & y) | x */
 (simplify
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/bitops-3.c b/gcc/testsuite/gcc.dg/tree-ssa/bitops-3.c
new file mode 100644
index 000..68fff4edce9
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/bitops-3.c
@@ -0,0 +1,67 @@
+/* PR tree-optimization/100864 */
+
+/* { dg-do run } */
+/* { dg-options "-O1 -fdump-tree-optimized-raw" } */
+
+#define op_ne !=
+#define op_eq ==
+#define op_lt <
+#define op_le <=
+#define op_gt >
+#define op_ge >=
+
+#define operators(t) \
+t(ne) \
+t(eq) \
+t(lt) \
+t(le) \
+t(gt) \
+t(ge)
+
+#define cmpfunc(v, op) \
+__attribute__((noipa)) \
+_Bool func_##op##_##v(v int a, v int b, v _Bool e) \
+{ \
+  v _Bool c = (a op_##op b); \
+  v _Bool d = !c; \
+  return (e & d) | c; \
+}
+
+#define cmp_funcs(op) \
+cmpfunc(, op) \
+cmpfunc(volatile , op)
+
+operators(cmp_funcs)
+
+#define test(op) \
+if (func_##op##_ (a, b, e) != func_##op##_volatile (a, b, e)) \
+ __builtin_abort();
+ 
+int main()
+{
+  for(int a = -3; a <= 3; a++)
+for(int b = -3; b <= 3; b++)
+  {
+   _Bool e = 0;
+   operators(test)
+   e = 1;
+   operators(test)
+  }
+  return 0;
+}
+
+/* Check to make sure we optimize `(a&!b) | b` -> `a | b`. */
+/* There are 6 different comparison operators testing here. */
+/* bit_not_expr and bit_and_expr should show up for each one (volatile). */
+/* Each operator should show up twice
+   (except for `!=` which shows up 2*6 (each tester) + 2 (the 2 loops) extra = 16). */
+/* bit_ior_expr will show up for each operator twice (non-volatile and volatile). */
+/* { dg-final { scan-tree-dump-times "ne_expr,"  16 "optimized"} } */
+/* { dg-final { scan-tree-dump-times "eq_expr,"   2 "optimized"} } */
+/* { dg-final { scan-tree-dump-times "lt_expr,"   2 "optimized"} } */
+/* { dg-final { scan-tree-dump-times "le_expr,"   2 "optimized"} } */
+/* { dg-final { scan-tree-dump-times "gt_expr,"   2 "optimized"} } */
+/* { dg-final { scan-tree-dump-times "ge_expr,"   2 "optimized"} } */
+/* { dg-final { scan-tree-dump-times "bit_not_expr,"  6 "optimized"} } */
+/* { dg-final { scan-tree-dump-times "bit_and_expr,"  6 "optimized"} } */
+/* { dg-final { scan-tree-dump-times "bit_ior_expr," 12 "optimized"} } */
\ No newline at end of file
-- 
2.31.1



[PATCH 2/2] AARCH64: Turn off unwind tables for crtbeginT.o

2023-07-22 Thread Andrew Pinski via Gcc-patches
The problem is that -fasynchronous-unwind-tables is on by default for aarch64.
We need to turn it off for crt*.o because it would make __EH_FRAME_BEGIN__ point
to .eh_frame data from crtbeginT.o instead of the user-defined object
during static linking.

This turns it off.

OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.

libgcc/ChangeLog:

* config.host (aarch64*-*-*): Add t-crtstuff to tmake_file.
* config/aarch64/t-crtstuff: New file.
---
 libgcc/config.host   | 6 ++
 libgcc/config/aarch64/t-crtstuff | 5 +
 2 files changed, 11 insertions(+)
 create mode 100644 libgcc/config/aarch64/t-crtstuff

diff --git a/libgcc/config.host b/libgcc/config.host
index c94d69d84b7..b2d82041a69 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -390,6 +390,7 @@ aarch64*-*-elf | aarch64*-*-rtems*)
extra_parts="$extra_parts crtbegin.o crtend.o crti.o crtn.o"
extra_parts="$extra_parts crtfastmath.o"
tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
+   tmake_file="${tmake_file} ${cpu_type}/t-crtstuff"
tmake_file="${tmake_file} ${cpu_type}/t-lse t-slibgcc-libgcc"
tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
tmake_file="${tmake_file} t-dfprules"
@@ -398,6 +399,7 @@ aarch64*-*-elf | aarch64*-*-rtems*)
 aarch64*-*-freebsd*)
extra_parts="$extra_parts crtfastmath.o"
tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
+   tmake_file="${tmake_file} ${cpu_type}/t-crtstuff"
tmake_file="${tmake_file} ${cpu_type}/t-lse t-slibgcc-libgcc"
tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
tmake_file="${tmake_file} t-dfprules"
@@ -406,12 +408,14 @@ aarch64*-*-freebsd*)
 aarch64*-*-netbsd*)
extra_parts="$extra_parts crtfastmath.o"
tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
+   tmake_file="${tmake_file} ${cpu_type}/t-crtstuff"
tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
tmake_file="${tmake_file} t-dfprules"
md_unwind_header=aarch64/aarch64-unwind.h
;;
 aarch64*-*-fuchsia*)
tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
+   tmake_file="${tmake_file} ${cpu_type}/t-crtstuff"
tmake_file="${tmake_file} ${cpu_type}/t-lse t-slibgcc-libgcc"
tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp"
tmake_file="${tmake_file} t-dfprules"
@@ -420,6 +424,7 @@ aarch64*-*-linux*)
extra_parts="$extra_parts crtfastmath.o"
md_unwind_header=aarch64/linux-unwind.h
tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
+   tmake_file="${tmake_file} ${cpu_type}/t-crtstuff"
tmake_file="${tmake_file} ${cpu_type}/t-lse t-slibgcc-libgcc"
tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
tmake_file="${tmake_file} t-dfprules"
@@ -428,6 +433,7 @@ aarch64*-*-vxworks7*)
extra_parts="$extra_parts crtfastmath.o"
md_unwind_header=aarch64/aarch64-unwind.h
tmake_file="${tmake_file} ${cpu_type}/t-aarch64"
+   tmake_file="${tmake_file} ${cpu_type}/t-crtstuff"
tmake_file="${tmake_file} ${cpu_type}/t-lse"
tmake_file="${tmake_file} ${cpu_type}/t-softfp t-softfp t-crtfm"
tmake_file="${tmake_file} t-dfprules"
diff --git a/libgcc/config/aarch64/t-crtstuff b/libgcc/config/aarch64/t-crtstuff
new file mode 100644
index 000..2e2814e6c67
--- /dev/null
+++ b/libgcc/config/aarch64/t-crtstuff
@@ -0,0 +1,5 @@
+# -fasynchronous-unwind-tables -funwind-tables is on by default for aarch64
+# We turn it off for crt*.o because it would make __EH_FRAME_BEGIN__ point
+# to .eh_frame data from crtbeginT.o instead of the user-defined object
+# during static linking.
+CRTSTUFF_T_CFLAGS += -fno-asynchronous-unwind-tables -fno-unwind-tables
-- 
2.39.1



[PATCH 1/2] Fix PR 110066: crash with -pg -static on riscv

2023-07-22 Thread Andrew Pinski via Gcc-patches
The problem is that -fasynchronous-unwind-tables is on by default for riscv linux.
We need to turn it off for crt*.o because it would make __EH_FRAME_BEGIN__ point
to .eh_frame data from crtbeginT.o instead of the user-defined object
during static linking.

This turns it off.
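
For readers unfamiliar with why the position of __EH_FRAME_BEGIN__ matters,
here is a toy model in C. It is not libgcc code. It assumes only the general
.eh_frame layout: a sequence of records that each begin with a 4-byte length,
ended by a record whose length is zero. The names put_record and count_records
are made up for this illustration. A consumer that is handed a single start
address sees only what follows that address, so the set of records it finds
depends on where the start marker lands.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: append one record (4-byte length + payload) at
   offset OFF in BUF and return the offset just past it.  A payload
   length of 0 writes the terminator.  */
static size_t put_record(unsigned char *buf, size_t off, uint32_t payload_len)
{
    memcpy(buf + off, &payload_len, sizeof payload_len);
    memset(buf + off + sizeof payload_len, 0xAA, payload_len);
    return off + sizeof payload_len + payload_len;
}

/* Hypothetical walker: count length-prefixed records from BEGIN until
   the zero-length terminator.  Only records at or after BEGIN are seen.  */
static size_t count_records(const unsigned char *begin)
{
    size_t n = 0;

    assert(begin != NULL);
    for (;;) {
        uint32_t len;
        memcpy(&len, begin, sizeof len);   /* avoid unaligned reads */
        if (len == 0)
            return n;
        n++;
        begin += sizeof len + len;
    }
}
```

With two records followed by a terminator in one buffer, walking from the
start of the buffer finds both records, while walking from the start of the
second record finds only one.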

OK?

libgcc/ChangeLog:

* config.host (riscv*-*-linux*): Add t-crtstuff to tmake_file.
(riscv*-*-freebsd*): Likewise.
* config/riscv/t-crtstuff: New file.
---
 libgcc/config.host | 4 ++--
 libgcc/config/riscv/t-crtstuff | 5 +
 2 files changed, 7 insertions(+), 2 deletions(-)
 create mode 100644 libgcc/config/riscv/t-crtstuff

diff --git a/libgcc/config.host b/libgcc/config.host
index 9d7212028d0..c94d69d84b7 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -1304,12 +1304,12 @@ pru-*-*)
tm_file="$tm_file pru/pru-abi.h"
;;
 riscv*-*-linux*)
-   tmake_file="${tmake_file} riscv/t-softfp${host_address} t-softfp riscv/t-elf riscv/t-elf${host_address} t-slibgcc-libgcc"
+   tmake_file="${tmake_file} riscv/t-crtstuff riscv/t-softfp${host_address} t-softfp riscv/t-elf riscv/t-elf${host_address} t-slibgcc-libgcc"
extra_parts="$extra_parts crtbegin.o crtend.o crti.o crtn.o crtendS.o crtbeginT.o"
md_unwind_header=riscv/linux-unwind.h
;;
 riscv*-*-freebsd*)
-   tmake_file="${tmake_file} riscv/t-softfp${host_address} t-softfp riscv/t-elf riscv/t-elf${host_address} t-slibgcc-libgcc"
+   tmake_file="${tmake_file} riscv/t-crtstuff riscv/t-softfp${host_address} t-softfp riscv/t-elf riscv/t-elf${host_address} t-slibgcc-libgcc"
extra_parts="$extra_parts crtbegin.o crtend.o crti.o crtn.o crtendS.o crtbeginT.o"
;;
 riscv*-*-*)
diff --git a/libgcc/config/riscv/t-crtstuff b/libgcc/config/riscv/t-crtstuff
new file mode 100644
index 000..685d11b3e66
--- /dev/null
+++ b/libgcc/config/riscv/t-crtstuff
@@ -0,0 +1,5 @@
+# -fasynchronous-unwind-tables -funwind-tables is on by default for riscv linux
+# We turn it off for crt*.o because it would make __EH_FRAME_BEGIN__ point
+# to .eh_frame data from crtbeginT.o instead of the user-defined object
+# during static linking.
+CRTSTUFF_T_CFLAGS += -fno-asynchronous-unwind-tables -fno-unwind-tables
-- 
2.39.1



Re: [PATCH 1/2] Fix PR 110066: crash with -pg -static on riscv

2023-07-22 Thread Kito Cheng via Gcc-patches
OK for trunk, thanks:)

Andrew Pinski via Gcc-patches wrote on Sunday, July 23, 2023 at 09:07:

> The problem -fasynchronous-unwind-tables is on by default for riscv linux
> We need turn it off for crt*.o because it would make __EH_FRAME_BEGIN__
> point
> to .eh_frame data from crtbeginT.o instead of the user-defined object
> during static linking.
>
> This turns it off.
>
> OK?
>
> libgcc/ChangeLog:
>
> * config.host (riscv*-*-linux*): Add t-crtstuff to tmake_file.
> (riscv*-*-freebsd*): Likewise.
> * config/riscv/t-crtstuff: New file.
> ---
>  libgcc/config.host | 4 ++--
>  libgcc/config/riscv/t-crtstuff | 5 +
>  2 files changed, 7 insertions(+), 2 deletions(-)
>  create mode 100644 libgcc/config/riscv/t-crtstuff
>
> diff --git a/libgcc/config.host b/libgcc/config.host
> index 9d7212028d0..c94d69d84b7 100644
> --- a/libgcc/config.host
> +++ b/libgcc/config.host
> @@ -1304,12 +1304,12 @@ pru-*-*)
> tm_file="$tm_file pru/pru-abi.h"
> ;;
>  riscv*-*-linux*)
> -   tmake_file="${tmake_file} riscv/t-softfp${host_address} t-softfp
> riscv/t-elf riscv/t-elf${host_address} t-slibgcc-libgcc"
> +   tmake_file="${tmake_file} riscv/t-crtstuff
> riscv/t-softfp${host_address} t-softfp riscv/t-elf
> riscv/t-elf${host_address} t-slibgcc-libgcc"
> extra_parts="$extra_parts crtbegin.o crtend.o crti.o crtn.o
> crtendS.o crtbeginT.o"
> md_unwind_header=riscv/linux-unwind.h
> ;;
>  riscv*-*-freebsd*)
> -   tmake_file="${tmake_file} riscv/t-softfp${host_address} t-softfp
> riscv/t-elf riscv/t-elf${host_address} t-slibgcc-libgcc"
> +   tmake_file="${tmake_file} riscv/t-crtstuff
> riscv/t-softfp${host_address} t-softfp riscv/t-elf
> riscv/t-elf${host_address} t-slibgcc-libgcc"
> extra_parts="$extra_parts crtbegin.o crtend.o crti.o crtn.o
> crtendS.o crtbeginT.o"
> ;;
>  riscv*-*-*)
> diff --git a/libgcc/config/riscv/t-crtstuff
> b/libgcc/config/riscv/t-crtstuff
> new file mode 100644
> index 000..685d11b3e66
> --- /dev/null
> +++ b/libgcc/config/riscv/t-crtstuff
> @@ -0,0 +1,5 @@
> +# -fasynchronous-unwind-tables -funwind-tables is on by default for riscv
> linux
> +# We turn it off for crt*.o because it would make __EH_FRAME_BEGIN__ point
> +# to .eh_frame data from crtbeginT.o instead of the user-defined object
> +# during static linking.
> +CRTSTUFF_T_CFLAGS += -fno-asynchronous-unwind-tables -fno-unwind-tables
> --
> 2.39.1
>
>


[PATCH V5] VECT: Support floating-point in-order reduction for length loop control

2023-07-22 Thread Lehua Ding
From: Ju-Zhe Zhong 

PS: Submitted on behalf of Juzhe Zhong

Hi, Richard and Richi.

This patch supports floating-point in-order reduction for length loop control.

Consider the following case:

float foo (float *__restrict a, int n)
{
  float result = 1.0;
  for (int i = 0; i < n; i++)
   result += a[i];
  return result;
}

When compiled without -ffast-math on ARM SVE, we end up with:

loop_mask = WHILE_ULT
result = MASK_FOLD_LEFT_PLUS (...loop_mask...)

For RVV, we use length loop control instead of mask.

So, with this patch, we expect to see:

loop_len = SELECT_VL
result = MASK_LEN_FOLD_LEFT_PLUS (...loop_len...)
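
To make the intended semantics concrete, here is a scalar model in C. It is
only a sketch, not the vectorizer's output. It assumes that
MASK_LEN_FOLD_LEFT_PLUS adds, strictly left to right, the elements i with
i < len + bias whose mask bit is set, and that SELECT_VL can be modeled as
min (remaining, vf). The function names, MAX_VF, and the vf parameter are
made up for this illustration. The mask is all ones and the bias is 0 here,
matching the length-only case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_VF 16   /* assumed upper bound on elements per iteration */

/* Scalar reference: strictly in-order accumulation, as in foo above.  */
static float reduce_scalar(const float *a, size_t n, float init)
{
    float result = init;
    for (size_t i = 0; i < n; i++)
        result += a[i];
    return result;
}

/* Hypothetical stand-in for SELECT_VL.  */
static size_t select_vl(size_t remaining, size_t vf)
{
    return remaining < vf ? remaining : vf;
}

/* Model of the assumed MASK_LEN_FOLD_LEFT_PLUS semantics: fold the
   first len + bias active elements into ACC, in order.  */
static float mask_len_fold_left_plus(float acc, const float *vec,
                                     const unsigned char *mask,
                                     size_t len, int bias)
{
    ptrdiff_t active = (ptrdiff_t) len + bias;
    for (ptrdiff_t i = 0; i < active; i++)
        if (mask[i])
            acc += vec[i];
    return acc;
}

/* Length-controlled loop: each iteration handles LEN elements.  */
static float reduce_len_controlled(const float *a, size_t n, float init,
                                   size_t vf)
{
    unsigned char all_ones[MAX_VF];
    float result = init;
    size_t i = 0;

    assert(vf > 0 && vf <= MAX_VF);
    memset(all_ones, 1, sizeof all_ones);
    while (i < n) {
        size_t len = select_vl(n - i, vf);
        result = mask_len_fold_left_plus(result, a + i, all_ones, len, 0);
        i += len;
    }
    return result;
}
```

Because the additions happen in the same order as in the scalar loop, the
two functions should return the same value for any vf in this model, which
is the property an in-order reduction has to preserve without -ffast-math.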

gcc/ChangeLog:

* tree-vect-loop.cc (get_masked_reduction_fn): Add mask_len_fold_left_plus.
(vectorize_fold_left_reduction): Ditto.
(vectorizable_reduction): Ditto.
(vect_transform_reduction): Ditto.

---
 gcc/tree-vect-loop.cc | 41 -
 1 file changed, 36 insertions(+), 5 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index d036a7d4480..dba509b6f37 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -6800,11 +6800,13 @@ static internal_fn
 get_masked_reduction_fn (internal_fn reduc_fn, tree vectype_in)
 {
   internal_fn mask_reduc_fn;
+  internal_fn mask_len_reduc_fn;
 
   switch (reduc_fn)
 {
 case IFN_FOLD_LEFT_PLUS:
   mask_reduc_fn = IFN_MASK_FOLD_LEFT_PLUS;
+  mask_len_reduc_fn = IFN_MASK_LEN_FOLD_LEFT_PLUS;
   break;
 
 default:
@@ -6814,6 +6816,9 @@ get_masked_reduction_fn (internal_fn reduc_fn, tree vectype_in)
   if (direct_internal_fn_supported_p (mask_reduc_fn, vectype_in,
  OPTIMIZE_FOR_SPEED))
 return mask_reduc_fn;
+  if (direct_internal_fn_supported_p (mask_len_reduc_fn, vectype_in,
+ OPTIMIZE_FOR_SPEED))
+return mask_len_reduc_fn;
   return IFN_LAST;
 }
 
@@ -6834,7 +6839,8 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
   gimple *reduc_def_stmt,
   tree_code code, internal_fn reduc_fn,
   tree ops[3], tree vectype_in,
-  int reduc_index, vec_loop_masks *masks)
+  int reduc_index, vec_loop_masks *masks,
+  vec_loop_lens *lens)
 {
   class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);
@@ -6896,8 +6902,18 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
 {
   gimple *new_stmt;
   tree mask = NULL_TREE;
+  tree len = NULL_TREE;
+  tree bias = NULL_TREE;
   if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
mask = vect_get_loop_mask (loop_vinfo, gsi, masks, vec_num, vectype_in, i);
+  if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
+   {
+ len = vect_get_loop_len (loop_vinfo, gsi, lens, vec_num, vectype_in,
+  i, 1);
+ signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
+ bias = build_int_cst (intQI_type_node, biasval);
+ mask = build_minus_one_cst (truth_type_for (vectype_in));
+   }
 
   /* Handle MINUS by adding the negative.  */
   if (reduc_fn != IFN_LAST && code == MINUS_EXPR)
@@ -6917,7 +6933,10 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
 the preceding operation.  */
   if (reduc_fn != IFN_LAST || (mask && mask_reduc_fn != IFN_LAST))
{
- if (mask && mask_reduc_fn != IFN_LAST)
+ if (mask && len && mask_reduc_fn == IFN_MASK_LEN_FOLD_LEFT_PLUS)
+   new_stmt = gimple_build_call_internal (mask_reduc_fn, 5, reduc_var,
+  def0, mask, len, bias);
+ else if (mask && mask_reduc_fn == IFN_MASK_FOLD_LEFT_PLUS)
new_stmt = gimple_build_call_internal (mask_reduc_fn, 3, reduc_var,
   def0, mask);
  else
@@ -7979,6 +7998,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   else if (loop_vinfo && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
 {
   vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
+  vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
   internal_fn cond_fn = get_conditional_internal_fn (op.code, op.type);
 
   if (reduction_type != FOLD_LEFT_REDUCTION
@@ -8006,8 +8026,17 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
  LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
}
   else
-   vect_record_loop_mask (loop_vinfo, masks, ncopies * vec_num,
-  vectype_in, NULL);
+   {
+ internal_fn mask_reduc_fn
+   = get_masked_reduction_fn (reduc_fn, vectype_in);
+
+ if (mask_reduc_fn == IFN_MASK_LEN_FOLD_LEFT_PLUS)
+   vect_record_loop_len (loop_vinfo, lens, ncopies * vec_num,

Re: Re: [PATCH V2] VECT: Support floating-point in-order reduction for length loop control

2023-07-22 Thread Lehua Ding
Hi Richard,


Bootstrap and regression testing passed on x86, and
no new test cases fail on AArch64 with the V5 patch:


https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625293.html


Is the V5 patch OK for trunk?


Best,
Lehua

Re: [PATCH 1/2] Fix PR 110066: crash with -pg -static on riscv

2023-07-22 Thread Andreas Schwab
On Jul 22 2023, Andrew Pinski via Gcc-patches wrote:

> The problem -fasynchronous-unwind-tables is on by default for riscv linux
> We need turn it off for crt*.o because it would make __EH_FRAME_BEGIN__ point
> to .eh_frame data from crtbeginT.o instead of the user-defined object
> during static linking.
>
> This turns it off.

Since this is a recurring problem, and difficult to notice (see how long
the aarch64 case went unnoticed), it should be fixed generically,
instead of having to patch every case separately.

> diff --git a/libgcc/config/riscv/t-crtstuff b/libgcc/config/riscv/t-crtstuff
> new file mode 100644
> index 000..685d11b3e66
> --- /dev/null
> +++ b/libgcc/config/riscv/t-crtstuff
> @@ -0,0 +1,5 @@
> +# -fasynchronous-unwind-tables -funwind-tables is on by default for riscv 
> linux
> +# We turn it off for crt*.o because it would make __EH_FRAME_BEGIN__ point
> +# to .eh_frame data from crtbeginT.o instead of the user-defined object
> +# during static linking.
> +CRTSTUFF_T_CFLAGS += -fno-asynchronous-unwind-tables -fno-unwind-tables

What about CRTSTUFF_T_CFLAGS_S?

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."