Re: [pushed] c++: repeated export using

2024-06-12 Thread Andrew Pinski
On Wed, Jun 12, 2024 at 1:32 PM Jason Merrill  wrote:
>
> Tested x86_64-pc-linux-gnu, applying to trunk.
>
> -- 8< --
>
> A sample implementation of module std was breaking because the exports
> included 'using std::operator&' twice.  Since Nathaniel's r15-964 for
> PR114867, the first using added an extra instance of each function that was
> revealed/exported by that using, resulting in duplicates for
> lookup_maybe_add to dedup.  But if the duplicate is the first thing in the
> list, lookup_add doesn't make an OVERLOAD, so trying to set OVL_USING_P
> crashes.  Fixed by using ovl_make in the case where we want to set the flag.

Note this was recorded as
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115445, which I have already
closed as fixed.

Thanks,
Andrew

>
> gcc/cp/ChangeLog:
>
> * tree.cc (lookup_maybe_add): Use ovl_make when setting OVL_USING_P.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/modules/using-21_a.C: New test.
> ---
>  gcc/cp/tree.cc|  8 ++--
>  gcc/testsuite/g++.dg/modules/using-21_a.C | 11 +++
>  2 files changed, 17 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/modules/using-21_a.C
>
> diff --git a/gcc/cp/tree.cc b/gcc/cp/tree.cc
> index d2a8f79ffab..28648c14c6d 100644
> --- a/gcc/cp/tree.cc
> +++ b/gcc/cp/tree.cc
> @@ -2526,11 +2526,15 @@ lookup_maybe_add (tree fns, tree lookup, bool deduping)
>predecessors onto the lookup.  */
> for (; fns != probe; fns = OVL_CHAIN (fns))
>   {
> -   lookup = lookup_add (OVL_FUNCTION (fns), lookup);
> /* Propagate OVL_USING, but OVL_HIDDEN &
>OVL_DEDUP_P don't matter.  */
> if (OVL_USING_P (fns))
> - OVL_USING_P (lookup) = true;
> + {
> +   lookup = ovl_make (OVL_FUNCTION (fns), lookup);
> +   OVL_USING_P (lookup) = true;
> + }
> +   else
> + lookup = lookup_add (OVL_FUNCTION (fns), lookup);
>   }
>
> /* And now skip this function.  */
> diff --git a/gcc/testsuite/g++.dg/modules/using-21_a.C b/gcc/testsuite/g++.dg/modules/using-21_a.C
> new file mode 100644
> index 000..ce6e3f920f1
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/modules/using-21_a.C
> @@ -0,0 +1,11 @@
> +// { dg-additional-options "-fmodules-ts -Wno-global-module" }
> +
> +module;
> +namespace foo {
> +  void baz();
> +}
> +export module foo;
> +namespace foo {
> +  export using foo::baz;
> +  export using foo::baz;
> +}
>
> base-commit: 7bf072e87a03c9eaff9b7a1ac182537b70f0ba8e
> prerequisite-patch-id: 6c196fa553aea243ce21f45cd2ddf3daaa840921
> --
> 2.44.0
>
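To make the failure mode concrete, here is a heavily simplified C++ model of the loop body in lookup_maybe_add. Everything in it (the `Node` struct, `model_lookup_add`, `model_ovl_make`, `model_set_using`) is hypothetical and only mirrors the behaviour described above: adding to an empty lookup returns the bare function instead of an OVERLOAD cell, so setting a using flag on the result is invalid unless an overload cell is built explicitly.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, heavily simplified model of a lookup list: either a bare
// function or a chain of OVERLOAD cells.  Not GCC's real data structure.
struct Node {
  bool is_overload;  // true: an OVERLOAD cell; false: a bare function
  std::string fn;    // the function (OVL_FUNCTION for overload cells)
  Node *chain;       // OVL_CHAIN; only meaningful for overload cells
  bool using_p;      // OVL_USING_P; only valid on overload cells
};

// Owns every node so the example does not leak.
static std::vector<std::unique_ptr<Node>> arena;

static Node *new_node(bool is_overload, const std::string &fn, Node *chain) {
  arena.push_back(std::unique_ptr<Node>(new Node{is_overload, fn, chain, false}));
  return arena.back().get();
}

// Models lookup_add: adding to an empty lookup yields the bare function.
Node *model_lookup_add(const std::string &fn, Node *lookup) {
  if (lookup == nullptr)
    return new_node(false, fn, nullptr);
  return new_node(true, fn, lookup);
}

// Models ovl_make: always builds an overload cell, even with an empty tail.
Node *model_ovl_make(const std::string &fn, Node *next) {
  return new_node(true, fn, next);
}

// Models setting OVL_USING_P; returns false where GCC's tree checking would
// reject the access because the node is not an OVERLOAD.
bool model_set_using(Node *n) {
  if (!n->is_overload)
    return false;
  n->using_p = true;
  return true;
}

// Old shape of the loop body: lookup_add first, then try to set the flag.
Node *add_before_fix(const std::string &fn, bool using_p, Node *lookup, bool &ok) {
  lookup = model_lookup_add(fn, lookup);
  ok = !using_p || model_set_using(lookup);
  return lookup;
}

// Fixed shape: use ovl_make whenever the flag has to be set.
Node *add_after_fix(const std::string &fn, bool using_p, Node *lookup) {
  if (using_p) {
    lookup = model_ovl_make(fn, lookup);
    bool ok = model_set_using(lookup);
    assert(ok);  // cannot fail: ovl_make always returns an overload cell
    (void)ok;
  } else {
    lookup = model_lookup_add(fn, lookup);
  }
  return lookup;
}
```

With an empty lookup and a using declaration, `add_before_fix` reports failure (the point where GCC would crash in tree checking). `add_after_fix` always produces a flagged overload cell.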


[gcc r15-1216] aarch64: Use bitreverse rtl code instead of unspec [PR115176]

2024-06-12 Thread Andrew Pinski via Gcc-cvs
https://gcc.gnu.org/g:c2f0aaf7539c42b024ed6b3fb6909bd2c86bb206

commit r15-1216-gc2f0aaf7539c42b024ed6b3fb6909bd2c86bb206
Author: Andrew Pinski 
Date:   Tue Jun 11 20:36:34 2024 +

aarch64: Use bitreverse rtl code instead of unspec [PR115176]

The bitreverse RTL code was added with r14-1586-g6160572f8d243c, so let's
use it instead of an unspec. This is mostly a small cleanup, but it also
fixes the rtx costs: the UNSPEC_RBIT costing did not handle vector modes
correctly, and the BITREVERSE costing now does.
This is part of the first step towards adding the __builtin_bitreverse
builtins, but it is independent of that work.
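RBIT on the VB (byte-vector) modes reverses the order of the bits inside each byte. `(bitreverse:VB ...)` now states this directly to the rest of the compiler instead of hiding it behind an unspec. The following is a small scalar C++ model of those semantics. The function names are hypothetical, and nothing here reflects how GCC itself implements or folds bitreverse.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reverse the bit order within one byte: bit 0 <-> bit 7, bit 1 <-> bit 6, ...
std::uint8_t bitreverse8(std::uint8_t x) {
  std::uint8_t r = 0;
  for (int i = 0; i < 8; ++i) {
    r = static_cast<std::uint8_t>((r << 1) | (x & 1u));  // append the low bit of x
    x = static_cast<std::uint8_t>(x >> 1);
  }
  return r;
}

// Apply bitreverse8 to each element of a byte vector, the way RBIT treats
// each lane of a VB-mode register.
void bitreverse_bytes(const std::uint8_t *in, std::uint8_t *out, std::size_t n) {
  assert(n == 0 || (in != nullptr && out != nullptr));
  for (std::size_t i = 0; i < n; ++i)
    out[i] = bitreverse8(in[i]);
}
```

For example, `bitreverse8(0xB4)` (binary 10110100) yields `0x2D` (binary 00101101). Each lane is independent, so a byte vector is just the scalar operation applied per element.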

Bootstrapped and tested on aarch64-linux-gnu with no regressions.

gcc/ChangeLog:

PR target/115176
* config/aarch64/aarch64-simd.md (aarch64_rbit): Use
bitreverse instead of unspec.
* config/aarch64/aarch64-sve-builtins-base.cc (svrbit): Convert over to using
rtx_code_function instead of unspec_based_function.
* config/aarch64/aarch64-sve.md: Update comment where RBIT is included.
* config/aarch64/aarch64.cc (aarch64_rtx_costs): Handle BITREVERSE like BSWAP.
Remove UNSPEC_RBIT support.
* config/aarch64/aarch64.md (unspec): Remove UNSPEC_RBIT.
(aarch64_rbit): Use bitreverse instead of unspec.
* config/aarch64/iterators.md (SVE_INT_UNARY): Add bitreverse.
(optab): Likewise.
(sve_int_op): Likewise.
(SVE_INT_UNARY): Remove UNSPEC_RBIT.
(optab): Likewise.
(sve_int_op): Likewise.
(min_elem_bits): Likewise.

Signed-off-by: Andrew Pinski 

Diff:
---
 gcc/config/aarch64/aarch64-simd.md  |  3 +--
 gcc/config/aarch64/aarch64-sve-builtins-base.cc |  2 +-
 gcc/config/aarch64/aarch64-sve.md   |  2 +-
 gcc/config/aarch64/aarch64.cc   |  9 +
 gcc/config/aarch64/aarch64.md   |  3 +--
 gcc/config/aarch64/iterators.md | 10 +-
 6 files changed, 10 insertions(+), 19 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index f644bd1731e5..0bb39091a385 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -377,8 +377,7 @@
 
 (define_insn "aarch64_rbit"
   [(set (match_operand:VB 0 "register_operand" "=w")
-   (unspec:VB [(match_operand:VB 1 "register_operand" "w")]
-  UNSPEC_RBIT))]
+   (bitreverse:VB (match_operand:VB 1 "register_operand" "w")))]
   "TARGET_SIMD"
   "rbit\\t%0., %1."
   [(set_attr "type" "neon_rbit")]
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index 0d2edf3f19e1..dea2f6e6bfc4 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -3186,7 +3186,7 @@ FUNCTION (svqincp, svqdecp_svqincp_impl, (SS_PLUS, US_PLUS))
 FUNCTION (svqincw, svqinc_bhwd_impl, (SImode))
 FUNCTION (svqincw_pat, svqinc_bhwd_impl, (SImode))
 FUNCTION (svqsub, rtx_code_function, (SS_MINUS, US_MINUS, -1))
-FUNCTION (svrbit, unspec_based_function, (UNSPEC_RBIT, UNSPEC_RBIT, -1))
+FUNCTION (svrbit, rtx_code_function, (BITREVERSE, BITREVERSE, -1))
 FUNCTION (svrdffr, svrdffr_impl,)
 FUNCTION (svrecpe, unspec_based_function, (-1, UNSPEC_URECPE, UNSPEC_FRECPE))
 FUNCTION (svrecps, unspec_based_function, (-1, -1, UNSPEC_FRECPS))
diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index d69db34016a5..5331e7121d55 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -3083,6 +3083,7 @@
 ;; - CLS (= clrsb)
 ;; - CLZ
 ;; - CNT (= popcount)
+;; - RBIT (= bitreverse)
 ;; - NEG
 ;; - NOT
 ;; -
@@ -3171,7 +3172,6 @@
 ;;  [INT] General unary arithmetic corresponding to unspecs
 ;; -
 ;; Includes
-;; - RBIT
 ;; - REVB
 ;; - REVH
 ;; - REVW
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 13191ec8e345..149e5b2f69ae 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -14690,6 +14690,7 @@ cost_plus:
return true;
   }
 
+case BITREVERSE:
 case BSWAP:
   *cost = COSTS_N_INSNS (1);
 
@@ -15339,14 +15340,6 @@ cost_plus:
 
   return false;
 }
-
-  if (XINT (x, 1) == UNSPEC_RBIT)
-{
-  if (speed)
-*cost += extra_cost->alu.rev;
-
-  return false;
-}
   break;
 
 case TRUNCATE:
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 389a1906e236..9de6235b1398 10064

[gcc r15-1215] match: Improve gimple_bitwise_equal_p and gimple_bitwise_inverted_equal_p for truncating casts [PR11

2024-06-12 Thread Andrew Pinski via Gcc-cvs
https://gcc.gnu.org/g:0256121e2f23ac3550e87410c9b1e690c8edfc7c

commit r15-1215-g0256121e2f23ac3550e87410c9b1e690c8edfc7c
Author: Andrew Pinski 
Date:   Tue Jun 11 17:16:42 2024 -0700

match: Improve gimple_bitwise_equal_p and gimple_bitwise_inverted_equal_p for truncating casts [PR115449]

As mentioned by Jeff in r15-831-g05daf617ea22e1d818295ed2d037456937e23530,
we don't handle `(X | Y) & ~Y` -> `X & ~Y` at the gimple level when
matching `~Y` against the `Y` part involves types that differ in
signedness but have the same precision.  This improves both
gimple_bitwise_equal_p and gimple_bitwise_inverted_equal_p so that they
recognize `(truncate)a` and `(truncate)a` as bitwise equal, and
`~(truncate)a` and `(truncate)a` as bitwise inverted.
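The identity itself is easy to check exhaustively for 8-bit values. The C++ sketch below (function names are hypothetical) computes `~Y` in `signed char` and the rest in `unsigned char`: two types of the same precision that differ only in signedness. This is the situation the match.pd pattern previously failed to see through. The sketch assumes two's-complement narrowing conversions, which C++20 guarantees and GCC has always implemented.

```cpp
#include <cassert>

// Returns true if (X | Y) & ~Y equals X & ~Y for the 8-bit values x and y,
// where ~Y is computed in signed char and then viewed as unsigned char.
bool identity_holds(unsigned char x, unsigned char y) {
  signed char ys = static_cast<signed char>(y);           // same bits, other signedness
  unsigned char not_y = static_cast<unsigned char>(~ys);  // ~Y in the signed type
  unsigned char lhs = static_cast<unsigned char>((x | y) & not_y);
  unsigned char rhs = static_cast<unsigned char>(x & ~y);
  return lhs == rhs;
}

// Exhaustively count the 8-bit (x, y) pairs for which the identity fails.
int count_identity_failures() {
  int failures = 0;
  for (int xi = 0; xi < 256; ++xi)
    for (int yi = 0; yi < 256; ++yi)
      if (!identity_holds(static_cast<unsigned char>(xi),
                          static_cast<unsigned char>(yi)))
        ++failures;
  return failures;
}
```

Because the signedness change does not alter any bits, the identity holds for every pair. Only the compiler's matching of the two types needed to improve.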

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/115449

gcc/ChangeLog:

* gimple-match-head.cc (gimple_maybe_truncate): New declaration.
(gimple_bitwise_equal_p): Match truncations that differ only
in types with the same precision.
(gimple_bitwise_inverted_equal_p): For matching after bit_not_with_nop
call gimple_bitwise_equal_p.
* match.pd (maybe_truncate): New match pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/bitops-10.c: New test.

Signed-off-by: Andrew Pinski 

Diff:
---
 gcc/gimple-match-head.cc  | 17 +++-
 gcc/match.pd  |  7 +++
 gcc/testsuite/gcc.dg/tree-ssa/bitops-10.c | 34 +++
 3 files changed, 48 insertions(+), 10 deletions(-)

diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
index e26fa0860ee9..924d3f1e7103 100644
--- a/gcc/gimple-match-head.cc
+++ b/gcc/gimple-match-head.cc
@@ -243,6 +243,7 @@ optimize_successive_divisions_p (tree divisor, tree inner_div)
   gimple_bitwise_equal_p (expr1, expr2, valueize)
 
 bool gimple_nop_convert (tree, tree *, tree (*) (tree));
+bool gimple_maybe_truncate (tree, tree *, tree (*) (tree));
 
 /* Helper function for bitwise_equal_p macro.  */
 
@@ -271,6 +272,10 @@ gimple_bitwise_equal_p (tree expr1, tree expr2, tree (*valueize) (tree))
 }
   if (expr2 != expr4 && operand_equal_p (expr1, expr4, 0))
 return true;
+  if (gimple_maybe_truncate (expr3, &expr3, valueize)
+  && gimple_maybe_truncate (expr4, &expr4, valueize)
+  && operand_equal_p (expr3, expr4, 0))
+return true;
   return false;
 }
 
@@ -318,21 +323,13 @@ gimple_bitwise_inverted_equal_p (tree expr1, tree expr2, bool &wascmp, tree (*va
   /* Try if EXPR1 was defined as ~EXPR2. */
   if (gimple_bit_not_with_nop (expr1, &other, valueize))
 {
-  if (operand_equal_p (other, expr2, 0))
-   return true;
-  tree expr4;
-  if (gimple_nop_convert (expr2, &expr4, valueize)
- && operand_equal_p (other, expr4, 0))
+  if (gimple_bitwise_equal_p (other, expr2, valueize))
return true;
 }
   /* Try if EXPR2 was defined as ~EXPR1. */
   if (gimple_bit_not_with_nop (expr2, &other, valueize))
 {
-  if (operand_equal_p (other, expr1, 0))
-   return true;
-  tree expr3;
-  if (gimple_nop_convert (expr1, &expr3, valueize)
- && operand_equal_p (other, expr3, 0))
+  if (gimple_bitwise_equal_p (other, expr1, valueize))
return true;
 }
 
diff --git a/gcc/match.pd b/gcc/match.pd
index 5cfe81e80b31..3204cf415387 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -200,6 +200,13 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (match (maybe_bit_not @0)
  (bit_xor_cst@0 @1 @2))
 
+#if GIMPLE
+(match (maybe_truncate @0)
+ (convert @0)
+ (if (INTEGRAL_TYPE_P (type)
+  && TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (@0)))))
+#endif
+
 /* Transform likes of (char) ABS_EXPR <(int) x> into (char) ABSU_EXPR 
ABSU_EXPR returns unsigned absolute value of the operand and the operand
of the ABSU_EXPR will have the corresponding signed type.  */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/bitops-10.c b/gcc/testsuite/gcc.dg/tree-ssa/bitops-10.c
new file mode 100644
index ..000c5aef2377
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/bitops-10.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-optimized-raw" } */
+/* PR tree-optimization/115449 */
+
+void setBit_un(unsigned char *a, int b) {
+   unsigned char c = 0x1UL << b;
+   *a &= ~c;
+   *a |= c;
+}
+
+void setBit_sign(signed char *a, int b) {
+   signed char c = 0x1UL << b;
+   *a &= ~c;
+   *a |= c;
+}
+
+void setBit(char *a, int b) {
+   char c = 0x1UL << b;
+   *a &= ~c;
+   *a |= c;
+}
+/*
+   All three should produce:
+_1 = 1 << b_4(D);
+c_5 = (cast) _1;
+_2 = *a_7(D);
+_3 = _2 | c_5;
+*a_7(D) = _3;
+   Removing the `&~c` as we are matching `(~x & y) | x` -&g

[PATCH] match: Improve gimple_bitwise_equal_p and gimple_bitwise_inverted_equal_p for truncating casts [PR115449]

2024-06-11 Thread Andrew Pinski
As mentioned by Jeff in r15-831-g05daf617ea22e1d818295ed2d037456937e23530,
we don't handle `(X | Y) & ~Y` -> `X & ~Y` at the gimple level when
matching `~Y` against the `Y` part involves types that differ in
signedness but have the same precision.  This improves both
gimple_bitwise_equal_p and gimple_bitwise_inverted_equal_p so that they
recognize `(truncate)a` and `(truncate)a` as bitwise equal, and
`~(truncate)a` and `(truncate)a` as bitwise inverted.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/115449

gcc/ChangeLog:

* gimple-match-head.cc (gimple_maybe_truncate): New declaration.
(gimple_bitwise_equal_p): Match truncations that differ only
in types with the same precision.
(gimple_bitwise_inverted_equal_p): For matching after bit_not_with_nop
call gimple_bitwise_equal_p.
* match.pd (maybe_truncate): New match pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/bitops-10.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/gimple-match-head.cc  | 17 +---
 gcc/match.pd  |  7 +
 gcc/testsuite/gcc.dg/tree-ssa/bitops-10.c | 34 +++
 3 files changed, 48 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bitops-10.c

diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
index e26fa0860ee..924d3f1e710 100644
--- a/gcc/gimple-match-head.cc
+++ b/gcc/gimple-match-head.cc
@@ -243,6 +243,7 @@ optimize_successive_divisions_p (tree divisor, tree inner_div)
   gimple_bitwise_equal_p (expr1, expr2, valueize)
 
 bool gimple_nop_convert (tree, tree *, tree (*) (tree));
+bool gimple_maybe_truncate (tree, tree *, tree (*) (tree));
 
 /* Helper function for bitwise_equal_p macro.  */
 
@@ -271,6 +272,10 @@ gimple_bitwise_equal_p (tree expr1, tree expr2, tree (*valueize) (tree))
 }
   if (expr2 != expr4 && operand_equal_p (expr1, expr4, 0))
 return true;
+  if (gimple_maybe_truncate (expr3, &expr3, valueize)
+  && gimple_maybe_truncate (expr4, &expr4, valueize)
+  && operand_equal_p (expr3, expr4, 0))
+return true;
   return false;
 }
 
@@ -318,21 +323,13 @@ gimple_bitwise_inverted_equal_p (tree expr1, tree expr2, bool &wascmp, tree (*va
   /* Try if EXPR1 was defined as ~EXPR2. */
   if (gimple_bit_not_with_nop (expr1, &other, valueize))
 {
-  if (operand_equal_p (other, expr2, 0))
-   return true;
-  tree expr4;
-  if (gimple_nop_convert (expr2, &expr4, valueize)
- && operand_equal_p (other, expr4, 0))
+  if (gimple_bitwise_equal_p (other, expr2, valueize))
return true;
 }
   /* Try if EXPR2 was defined as ~EXPR1. */
   if (gimple_bit_not_with_nop (expr2, &other, valueize))
 {
-  if (operand_equal_p (other, expr1, 0))
-   return true;
-  tree expr3;
-  if (gimple_nop_convert (expr1, &expr3, valueize)
- && operand_equal_p (other, expr3, 0))
+  if (gimple_bitwise_equal_p (other, expr1, valueize))
return true;
 }
 
diff --git a/gcc/match.pd b/gcc/match.pd
index 5cfe81e80b3..3204cf41538 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -200,6 +200,13 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (match (maybe_bit_not @0)
  (bit_xor_cst@0 @1 @2))
 
+#if GIMPLE
+(match (maybe_truncate @0)
+ (convert @0)
+ (if (INTEGRAL_TYPE_P (type)
+  && TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (@0)))))
+#endif
+
 /* Transform likes of (char) ABS_EXPR <(int) x> into (char) ABSU_EXPR 
ABSU_EXPR returns unsigned absolute value of the operand and the operand
of the ABSU_EXPR will have the corresponding signed type.  */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/bitops-10.c b/gcc/testsuite/gcc.dg/tree-ssa/bitops-10.c
new file mode 100644
index 000..000c5aef237
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/bitops-10.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-optimized-raw" } */
+/* PR tree-optimization/115449 */
+
+void setBit_un(unsigned char *a, int b) {
+   unsigned char c = 0x1UL << b;
+   *a &= ~c;
+   *a |= c;
+}
+
+void setBit_sign(signed char *a, int b) {
+   signed char c = 0x1UL << b;
+   *a &= ~c;
+   *a |= c;
+}
+
+void setBit(char *a, int b) {
+   char c = 0x1UL << b;
+   *a &= ~c;
+   *a |= c;
+}
+/*
+   All three should produce:
+_1 = 1 << b_4(D);
+c_5 = (cast) _1;
+_2 = *a_7(D);
+_3 = _2 | c_5;
+*a_7(D) = _3;
+   Removing the `&~c` as we are matching `(~x & y) | x` -> `x | y`
+   match pattern even with extra casts are being involved. */
+
+/* { dg-final { scan-tree-dump-not "bit_not_expr, " "optimized" } } */
+/* { dg-final { scan-tree-dump-not "bit_and_expr, " "optimized" } } */
+/* { dg-final { scan-tree-dump-times "bit_ior_expr, " 3 "optimized" } } */
-- 
2.43.0
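You can confirm outside the compiler that the clear-then-set sequence in the new test is equivalent to a single OR. This is why the optimized dump should contain only `bit_ior_expr`. Below is a hypothetical C++ re-implementation of `setBit_un` and `setBit_sign` that returns values instead of writing through pointers. It checks the arithmetic only and says nothing about what GCC actually emits. It assumes two's-complement narrowing to `signed char`, as GCC implements.

```cpp
#include <cassert>

// Hypothetical helpers mirroring setBit_un and setBit_sign from bitops-10.c:
// clear bit b, then set it again, returning the new value.
unsigned char set_bit_twice_un(unsigned char a, int b) {
  assert(b >= 0 && b < 8);
  unsigned char c = static_cast<unsigned char>(1UL << b);
  a &= ~c;  // the `&~c` step that the optimizer is expected to remove
  a |= c;
  return a;
}

signed char set_bit_twice_sign(signed char a, int b) {
  assert(b >= 0 && b < 8);
  signed char c = static_cast<signed char>(1UL << b);
  a &= ~c;
  a |= c;
  return a;
}

// Counts (a, b) pairs where the two-step sequence differs from a single OR.
int count_mismatches() {
  int mismatches = 0;
  for (int v = 0; v < 256; ++v)
    for (int b = 0; b < 8; ++b) {
      unsigned char ua = static_cast<unsigned char>(v);
      unsigned char uc = static_cast<unsigned char>(1UL << b);
      if (set_bit_twice_un(ua, b) != static_cast<unsigned char>(ua | uc))
        ++mismatches;
      signed char sa = static_cast<signed char>(v);
      signed char sc = static_cast<signed char>(1UL << b);
      if (set_bit_twice_sign(sa, b) != static_cast<signed char>(sa | sc))
        ++mismatches;
    }
  return mismatches;
}
```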



Re: [PATCH] [testsuite] add linkonly to dg-additional-sources [PR115295]

2024-06-11 Thread Andrew Pinski
On Tue, Jun 11, 2024 at 7:03 PM Alexandre Oliva  wrote:
>
>
> The D testsuite shows it was a mistake to assume that
> dg-additional-sources are never to be used for compilation tests.
> Even if an output file is specified for compilation, extra module
> files can be named and used in the compilation without being flagged
> as errors.
>
> Introduce a 'linkonly' flag for dg-additional-sources, and use it in
> pr95401.cc, so that its additional sources get discarded when vector
> tests downgrade to compile-only.
>
> Regstrapped on x86_64-linux-gnu.  Also tested with 'dg-do compile' in
> pr95401.cc.  Ok to install?

I think we should just fully revert the changes to
dg-additional-sources and add an explicit `dg-do run` to pr95401.cc, as
I did for the other vect testcases that fail in a similar way (see
PR 113899, r14-8978-g948dbc5ee45f9f), since this approach only works
around this one testcase.

Thanks,
Andrew Pinski

>
>
> for  gcc/ChangeLog
>
> * doc/sourcebuild.texi (dg-additional-sources): Add linkonly.
>
> for  gcc/testsuite/ChangeLog
>
> * g++.dg/vect/pr95401.cc: Add linkonly to dg-additional-sources.
> * lib/gcc-defs (additional_sources_omit_on_compile): New.
> (dg-additional-sources): Add to it on linkonly.
> (dg-additional-files-options): Omit select sources on compile.
> ---
>  gcc/doc/sourcebuild.texi |9 +
>  gcc/testsuite/g++.dg/vect/pr95401.cc |2 +-
>  gcc/testsuite/lib/gcc-defs.exp   |   35 +++---
>  3 files changed, 34 insertions(+), 12 deletions(-)
>
> diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
> index e997dbec3334b..08c178db674c8 100644
> --- a/gcc/doc/sourcebuild.texi
> +++ b/gcc/doc/sourcebuild.texi
> @@ -1320,15 +1320,16 @@ to @var{var_value} before execution of the program created by the test.
>  Specify additional files, other than source files, that must be copied
>  to the system where the compiler runs.
>
> -@item @{ dg-additional-sources "@var{filelist}" [@{ target @var{selector} @}] @}
> +@item @{ dg-additional-sources "@var{filelist}" [@{ \[linkonly\] \[target @var{selector}\] @}] @}
>  Specify additional source files to appear in the compile line
>  following the main test file.
>  If the directive includes the optional @samp{@{ @var{selector} @}}
>  then the additional sources are only added if the target system
>  matches the @var{selector}.
> -Additional sources are generally used only in @samp{link} and @samp{run}
> -tests; they are reported as unsupported and discarded in other kinds of
> -tests that direct the compiler to output to a single file.
> +If @samp{linkonly} is specified, additional sources are used only in
> +@samp{link} and @samp{run} tests; they are reported as unsupported and
> +discarded in other kinds of tests that direct the compiler to output to
> +a single file.
>  @end table
>
>  @subsubsection Add checks at the end of a test
> diff --git a/gcc/testsuite/g++.dg/vect/pr95401.cc b/gcc/testsuite/g++.dg/vect/pr95401.cc
> index 6a56dab095722..8b1be4f242521 100644
> --- a/gcc/testsuite/g++.dg/vect/pr95401.cc
> +++ b/gcc/testsuite/g++.dg/vect/pr95401.cc
> @@ -1,5 +1,5 @@
>  // { dg-additional-options "-mavx2 -O3" { target avx2_runtime } }
> -// { dg-additional-sources pr95401a.cc }
> +// { dg-additional-sources pr95401a.cc linkonly }
>
>  extern int var_9;
>  extern unsigned var_14;
> diff --git a/gcc/testsuite/lib/gcc-defs.exp b/gcc/testsuite/lib/gcc-defs.exp
> index cdca4c254d6ec..c6ec490f0092e 100644
> --- a/gcc/testsuite/lib/gcc-defs.exp
> +++ b/gcc/testsuite/lib/gcc-defs.exp
> @@ -303,18 +303,26 @@ proc dg-additional-options { args } {
>  # main source file.
>
>  set additional_sources ""
> +set additional_sources_omit_on_compile ""
>  set additional_sources_used ""
>
>  proc dg-additional-sources { args } {
>  global additional_sources
> +global additional_sources_omit_on_compile
>
>  if { [llength $args] > 3 } {
> error "[lindex $args 0]: too many arguments"
> return
>  }
>
> -if { [llength $args] >= 3 } {
> -   switch [dg-process-target [lindex $args 2]] {
> +set target [lindex $args 2]
> +if { [llength $args] >= 3 && [lindex $target 0] == "linkonly" } {
> +   append additional_sources_omit_on_compile " [lindex $args 1]"
> +   set target [lreplace $target 0 1]
> +}
> +
> +if { [llength $args] >= 3 && $target != ""} {
> +   switch [dg-process-target $target] {
> "S"

[PATCH] aarch64: Use bitreverse rtl code instead of unspec [PR115176]

2024-06-11 Thread Andrew Pinski
The bitreverse RTL code was added with r14-1586-g6160572f8d243c, so let's
use it instead of an unspec. This is mostly a small cleanup, but it also
fixes the rtx costs: the UNSPEC_RBIT costing did not handle vector modes
correctly, and the BITREVERSE costing now does.
This is part of the first step towards adding the __builtin_bitreverse
builtins, but it is independent of that work.

Bootstrapped and tested on aarch64-linux-gnu with no regressions.

gcc/ChangeLog:

PR target/115176
* config/aarch64/aarch64-simd.md (aarch64_rbit): Use
bitreverse instead of unspec.
* config/aarch64/aarch64-sve-builtins-base.cc (svrbit): Convert over to using
rtx_code_function instead of unspec_based_function.
* config/aarch64/aarch64-sve.md: Update comment where RBIT is included.
* config/aarch64/aarch64.cc (aarch64_rtx_costs): Handle BITREVERSE like BSWAP.
Remove UNSPEC_RBIT support.
* config/aarch64/aarch64.md (unspec): Remove UNSPEC_RBIT.
(aarch64_rbit): Use bitreverse instead of unspec.
* config/aarch64/iterators.md (SVE_INT_UNARY): Add bitreverse.
(optab): Likewise.
(sve_int_op): Likewise.
(SVE_INT_UNARY): Remove UNSPEC_RBIT.
(optab): Likewise.
(sve_int_op): Likewise.
(min_elem_bits): Likewise.

Signed-off-by: Andrew Pinski 
---
 gcc/config/aarch64/aarch64-simd.md  |  3 +--
 gcc/config/aarch64/aarch64-sve-builtins-base.cc |  2 +-
 gcc/config/aarch64/aarch64-sve.md   |  2 +-
 gcc/config/aarch64/aarch64.cc   | 10 ++
 gcc/config/aarch64/aarch64.md   |  3 +--
 gcc/config/aarch64/iterators.md | 10 +-
 6 files changed, 11 insertions(+), 19 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index f644bd1731e..0bb39091a38 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -377,8 +377,7 @@ (define_insn "bswap2"
 
 (define_insn "aarch64_rbit"
   [(set (match_operand:VB 0 "register_operand" "=w")
-   (unspec:VB [(match_operand:VB 1 "register_operand" "w")]
-  UNSPEC_RBIT))]
+   (bitreverse:VB (match_operand:VB 1 "register_operand" "w")))]
   "TARGET_SIMD"
   "rbit\\t%0., %1."
   [(set_attr "type" "neon_rbit")]
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index 0d2edf3f19e..dea2f6e6bfc 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -3186,7 +3186,7 @@ FUNCTION (svqincp, svqdecp_svqincp_impl, (SS_PLUS, US_PLUS))
 FUNCTION (svqincw, svqinc_bhwd_impl, (SImode))
 FUNCTION (svqincw_pat, svqinc_bhwd_impl, (SImode))
 FUNCTION (svqsub, rtx_code_function, (SS_MINUS, US_MINUS, -1))
-FUNCTION (svrbit, unspec_based_function, (UNSPEC_RBIT, UNSPEC_RBIT, -1))
+FUNCTION (svrbit, rtx_code_function, (BITREVERSE, BITREVERSE, -1))
 FUNCTION (svrdffr, svrdffr_impl,)
 FUNCTION (svrecpe, unspec_based_function, (-1, UNSPEC_URECPE, UNSPEC_FRECPE))
 FUNCTION (svrecps, unspec_based_function, (-1, -1, UNSPEC_FRECPS))
diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index d69db34016a..5331e7121d5 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -3083,6 +3083,7 @@ (define_expand "vec_extract"
 ;; - CLS (= clrsb)
 ;; - CLZ
 ;; - CNT (= popcount)
+;; - RBIT (= bitreverse)
 ;; - NEG
 ;; - NOT
 ;; -
@@ -3171,7 +3172,6 @@ (define_insn "*cond__any"
 ;;  [INT] General unary arithmetic corresponding to unspecs
 ;; -
 ;; Includes
-;; - RBIT
 ;; - REVB
 ;; - REVH
 ;; - REVW
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 13191ec8e34..0e9d7b1ec0f 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -14690,6 +14690,7 @@ cost_plus:
return true;
   }
 
+case BITREVERSE:
 case BSWAP:
   *cost = COSTS_N_INSNS (1);
 
@@ -15339,16 +15340,9 @@ cost_plus:
 
   return false;
 }
-
-  if (XINT (x, 1) == UNSPEC_RBIT)
-{
-  if (speed)
-*cost += extra_cost->alu.rev;
-
-  return false;
-}
   break;
 
+
 case TRUNCATE:
 
   /* Decompose muldi3_highpart.  */
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index dd88fd891b5..69167ab0c04 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -259,7 +259,6 @@ (define_c_enum "unspec" [
 UNSPEC_PACIBSP
 UNSPEC_PRLG_STK
 UNSPEC_REV
-UNSPEC_RBIT
 UNSPEC_SADALP
 UNSPEC_

Re: [committed] [v2] More logical op simplifications in simplify-rtx.cc

2024-06-11 Thread Andrew Pinski
On Sat, May 25, 2024 at 11:42 AM Jeff Law  wrote:
>
> This is a revamp of what started as a target specific patch.
>
> Basically xalan (corrected, I originally thought it was perlbench) has a
> bitset implementation with a bit of an oddity.  Specifically setBit will
> clear the bit before it is set:
>
> > if (bitToSet < 32)
> > {
> > fBits1 &= ~mask;
> > fBits1 |= mask;
> > }
> >  else
> > {
> > fBits2 &= ~mask;
> > fBits2 |= mask;
> > }
>
> We can clean this up pretty easily in RTL with a small bit of code in
> simplify-rtx.  While xalan doesn't have other cases, we can synthesize
> tests pretty easily and handle them as well.
>
>
> It turns out we don't actually have to recognize this stuff at the bit
> level, just standard logical identities are sufficient.  For example
>
> (X | Y) & ~Y -> X & ~Y
>
>
>
> Andrew P. might poke at this at the gimple level.  The type changes
> kind of get in the way in gimple, but he's much better at match.pd than I
> am, so if he wants to chase it from the gimple side, I'll fully support
> that.

So we already have this pattern (without the type change) in gimple:
 /* (~x | y) & x -> x & y */
 /* (~x & y) | x -> x | y */
 (simplify
  (bitop:c (rbitop:c @2 @1) @0)
  (with { bool wascmp; }
   (if (bitwise_inverted_equal_p (@0, @2, wascmp)
&& (!wascmp || element_precision (type) == 1))
    (bitop @0 @1))))

The problem is that bitwise_inverted_equal_p does not see that:
  c.0_4 = (signed char) _1;
  _5 = ~c.0_4;
  _16 = (charD.11) _5;

and
  c_11 = (charD.11) _1;

are bitwise inversions of each other.
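To confirm that these two sequences really are bitwise inversions, whether or not plain `char` is signed, you can replay them on concrete values. This C++ sketch is purely illustrative. The function names are hypothetical, and it assumes two's-complement narrowing conversions, as GCC implements them.

```cpp
#include <cassert>

// Mirrors the two GIMPLE sequences quoted above for an int input v (_1):
//   c.0_4 = (signed char) _1;  _5 = ~c.0_4;  _16 = (char) _5;
//   c_11  = (char) _1;
// and checks that _16 has the same bits as ~c_11.
bool is_bitwise_inversion(int v) {
  signed char c0_4 = static_cast<signed char>(v);
  signed char t5 = static_cast<signed char>(~c0_4);
  char t16 = static_cast<char>(t5);
  char c_11 = static_cast<char>(v);
  // Compare as unsigned char so the result does not depend on whether
  // plain char is signed on the host.
  return static_cast<unsigned char>(t16) == static_cast<unsigned char>(~c_11);
}

// Counts inputs in [lo, hi] for which the relationship does not hold.
int count_failures(int lo, int hi) {
  assert(lo <= hi);
  int failures = 0;
  for (int v = lo; v <= hi; ++v)
    if (!is_bitwise_inversion(v))
      ++failures;
  return failures;
}
```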

I filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115449 to keep
track of this.

Thanks,
Andrew Pinski




>
> Bootstrapped and regression tested on x86.  Also run through my tester
> on its embedded targets.
>
> Pushing to the trunk.
>
> jeff
>


Re: [PUSHED] Fix building JIT with musl libc [PR115442]

2024-06-11 Thread Andrew Pinski
On Tue, Jun 11, 2024 at 12:42 PM Andrew Pinski  wrote:
>
> Just like r13-6662-g0e6f87835ccabf but this time for jit/jit-recording.cc.
>
> Pushed as obvious after a quick build to make sure jit still builds.

Also backported to GCC 14 and GCC 13.

Thanks,
Andrew

>
> gcc/jit/ChangeLog:
>
> * jit-recording.cc: Define INCLUDE_SSTREAM before including
> system.h and don't directly include sstream.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/jit/jit-recording.cc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/jit/jit-recording.cc b/gcc/jit/jit-recording.cc
> index 68a2e860c1f..70830e34965 100644
> --- a/gcc/jit/jit-recording.cc
> +++ b/gcc/jit/jit-recording.cc
> @@ -19,6 +19,7 @@ along with GCC; see the file COPYING3.  If not see
>  <http://www.gnu.org/licenses/>.  */
>
>  #include "config.h"
> +#define INCLUDE_SSTREAM
>  #include "system.h"
>  #include "coretypes.h"
>  #include "tm.h"
> @@ -29,7 +30,6 @@ along with GCC; see the file COPYING3.  If not see
>  #include "jit-builtins.h"
>  #include "jit-recording.h"
>  #include "jit-playback.h"
> -#include <sstream>
>
>  namespace gcc {
>  namespace jit {
> --
> 2.43.0
>
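GCC's system.h supports `INCLUDE_*` request macros. These let C++ standard headers be included before system.h redefines or poisons names such as the ctype functions (via safe-ctype.h). The thread does not spell out the exact musl error, so the following only sketches the ordering idiom. system.h is inlined here as a simplified, hypothetical stand-in, and `describe_handle` is an invented example function.

```cpp
#include <cassert>
#include <string>

// 1. The source file requests the C++ header before "system.h",
//    as jit-recording.cc now does.
#define INCLUDE_SSTREAM

// 2. Simplified, hypothetical stand-in for the relevant parts of system.h.
#ifdef INCLUDE_SSTREAM
# include <sstream>  // included while the <ctype.h> names are still intact
#endif
// Stand-in for safe-ctype.h: from here on, any header that calls toupper()
// directly would fail to compile.
#undef toupper
#define toupper(c) do_not_use_toupper_with_safe_ctype

// 3. Code after "system.h" can use std::ostringstream normally.
std::string describe_handle(int id) {
  std::ostringstream os;
  os << "handle #" << id;
  return os.str();
}
```

Moving `#include <sstream>` after the stand-in block is the pattern the patch removes. It works only if nothing in `<sstream>`'s transitive includes still needs to be parsed at that point, and that can vary between C libraries.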


[gcc r13-8842] Fix building JIT with musl libc [PR115442]

2024-06-11 Thread Andrew Pinski via Gcc-cvs
https://gcc.gnu.org/g:6eb0e931097a8fec01591051c9ef582d52fe7f0c

commit r13-8842-g6eb0e931097a8fec01591051c9ef582d52fe7f0c
Author: Andrew Pinski 
Date:   Tue Jun 11 12:30:01 2024 -0700

Fix building JIT with musl libc [PR115442]

Just like r13-6662-g0e6f87835ccabf but this time for jit/jit-recording.cc.

Pushed as obvious after a quick build to make sure jit still builds.

gcc/jit/ChangeLog:

PR jit/115442
* jit-recording.cc: Define INCLUDE_SSTREAM before including
system.h and don't directly include sstream.

Signed-off-by: Andrew Pinski 
(cherry picked from commit e4244b88d75124f6957bfa080c8ad34017364e53)

Diff:
---
 gcc/jit/jit-recording.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/jit/jit-recording.cc b/gcc/jit/jit-recording.cc
index cf734cf7ef5f..914082ae861e 100644
--- a/gcc/jit/jit-recording.cc
+++ b/gcc/jit/jit-recording.cc
@@ -19,6 +19,7 @@ along with GCC; see the file COPYING3.  If not see
 <http://www.gnu.org/licenses/>.  */
 
 #include "config.h"
+#define INCLUDE_SSTREAM
 #include "system.h"
 #include "coretypes.h"
 #include "tm.h"


[gcc r14-10304] Fix building JIT with musl libc [PR115442]

2024-06-11 Thread Andrew Pinski via Gcc-cvs
https://gcc.gnu.org/g:e6b1c0820590a1f330099ed7560982b5c6da4e91

commit r14-10304-ge6b1c0820590a1f330099ed7560982b5c6da4e91
Author: Andrew Pinski 
Date:   Tue Jun 11 12:30:01 2024 -0700

Fix building JIT with musl libc [PR115442]

Just like r13-6662-g0e6f87835ccabf but this time for jit/jit-recording.cc.

Pushed as obvious after a quick build to make sure jit still builds.

gcc/jit/ChangeLog:

PR jit/115442
* jit-recording.cc: Define INCLUDE_SSTREAM before including
system.h and don't directly include sstream.

Signed-off-by: Andrew Pinski 
(cherry picked from commit e4244b88d75124f6957bfa080c8ad34017364e53)

Diff:
---
 gcc/jit/jit-recording.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/jit/jit-recording.cc b/gcc/jit/jit-recording.cc
index 68a2e860c1fb..70830e349653 100644
--- a/gcc/jit/jit-recording.cc
+++ b/gcc/jit/jit-recording.cc
@@ -19,6 +19,7 @@ along with GCC; see the file COPYING3.  If not see
 <http://www.gnu.org/licenses/>.  */
 
 #include "config.h"
+#define INCLUDE_SSTREAM
 #include "system.h"
 #include "coretypes.h"
 #include "tm.h"
@@ -29,7 +30,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "jit-builtins.h"
 #include "jit-recording.h"
 #include "jit-playback.h"
-#include <sstream>
 
 namespace gcc {
 namespace jit {


[gcc r15-1188] Fix building JIT with musl libc [PR115442]

2024-06-11 Thread Andrew Pinski via Gcc-cvs
https://gcc.gnu.org/g:e4244b88d75124f6957bfa080c8ad34017364e53

commit r15-1188-ge4244b88d75124f6957bfa080c8ad34017364e53
Author: Andrew Pinski 
Date:   Tue Jun 11 12:30:01 2024 -0700

Fix building JIT with musl libc [PR115442]

Just like r13-6662-g0e6f87835ccabf but this time for jit/jit-recording.cc.

Pushed as obvious after a quick build to make sure jit still builds.

gcc/jit/ChangeLog:

PR jit/115442
* jit-recording.cc: Define INCLUDE_SSTREAM before including
system.h and don't directly include sstream.

Signed-off-by: Andrew Pinski 

Diff:
---
 gcc/jit/jit-recording.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/jit/jit-recording.cc b/gcc/jit/jit-recording.cc
index 68a2e860c1fb..70830e349653 100644
--- a/gcc/jit/jit-recording.cc
+++ b/gcc/jit/jit-recording.cc
@@ -19,6 +19,7 @@ along with GCC; see the file COPYING3.  If not see
 <http://www.gnu.org/licenses/>.  */
 
 #include "config.h"
+#define INCLUDE_SSTREAM
 #include "system.h"
 #include "coretypes.h"
 #include "tm.h"
@@ -29,7 +30,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "jit-builtins.h"
 #include "jit-recording.h"
 #include "jit-playback.h"
-#include 
 
 namespace gcc {
 namespace jit {


[PUSHED] Fix building JIT with musl libc [PR115442]

2024-06-11 Thread Andrew Pinski
Just like r13-6662-g0e6f87835ccabf but this time for jit/jit-recording.cc.

Pushed as obvious after a quick build to make sure jit still builds.

gcc/jit/ChangeLog:

* jit-recording.cc: Define INCLUDE_SSTREAM before including
system.h and don't directly include sstream.

Signed-off-by: Andrew Pinski 
---
 gcc/jit/jit-recording.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/jit/jit-recording.cc b/gcc/jit/jit-recording.cc
index 68a2e860c1f..70830e34965 100644
--- a/gcc/jit/jit-recording.cc
+++ b/gcc/jit/jit-recording.cc
@@ -19,6 +19,7 @@ along with GCC; see the file COPYING3.  If not see
 <http://www.gnu.org/licenses/>.  */
 
 #include "config.h"
+#define INCLUDE_SSTREAM
 #include "system.h"
 #include "coretypes.h"
 #include "tm.h"
@@ -29,7 +30,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "jit-builtins.h"
 #include "jit-recording.h"
 #include "jit-playback.h"
-#include 
 
 namespace gcc {
 namespace jit {
-- 
2.43.0



[gcc r12-10546] PHIOPT: Don't transform minmax if middle bb contains a phi [PR115143]

2024-06-11 Thread Andrew Pinski via Gcc-cvs
https://gcc.gnu.org/g:d30afaae6764379a63c22459b40aaecfa82b0fc4

commit r12-10546-gd30afaae6764379a63c22459b40aaecfa82b0fc4
Author: Andrew Pinski 
Date:   Sat May 18 11:55:58 2024 -0700

PHIOPT: Don't transform minmax if middle bb contains a phi [PR115143]

The problem here is that even if last_and_only_stmt returns a statement,
the bb might still contain a phi node which defines an SSA name
used in that statement, so we need to add a check to make sure
that the phi nodes are empty for the middle bbs in both the
`CMP?MINMAX:MINMAX` case and the `CMP?MINMAX:B` case.

Bootstrapped and tested on x86_64_linux-gnu with no regressions.

PR tree-optimization/115143

gcc/ChangeLog:

* tree-ssa-phiopt.cc (minmax_replacement): Check for empty
phi nodes for middle bbs for the case where middle bb is not empty.

gcc/testsuite/ChangeLog:

* gcc.c-torture/compile/pr115143-1.c: New test.
* gcc.c-torture/compile/pr115143-2.c: New test.
* gcc.c-torture/compile/pr115143-3.c: New test.

Signed-off-by: Andrew Pinski 
(cherry picked from commit 9ff8f041331ef8b56007fb3c4d41d76f9850010d)

Diff:
---
 gcc/testsuite/gcc.c-torture/compile/pr115143-1.c | 21 +
 gcc/testsuite/gcc.c-torture/compile/pr115143-2.c | 30 
 gcc/testsuite/gcc.c-torture/compile/pr115143-3.c | 29 +++
 gcc/tree-ssa-phiopt.cc   |  4 
 4 files changed, 84 insertions(+)

diff --git a/gcc/testsuite/gcc.c-torture/compile/pr115143-1.c b/gcc/testsuite/gcc.c-torture/compile/pr115143-1.c
new file mode 100644
index ..5cb119ea4325
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr115143-1.c
@@ -0,0 +1,21 @@
+/* PR tree-optimization/115143 */
+/* This used to ICE.
+   minmax part of phiopt would transform,
+   would transform `a!=0?min(a, b) : 0` into `min(a,b)`
+   which was correct except b was defined by a phi in the inner
+   bb which was not handled. */
+short a, d;
+char b;
+long c;
+unsigned long e, f;
+void g(unsigned long h) {
+  if (c ? e : b)
+if (e)
+  if (d) {
+a = f ? ({
+  unsigned long i = d ? f : 0, j = e ? h : 0;
+  i < j ? i : j;
+}) : 0;
+  }
+}
+
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr115143-2.c b/gcc/testsuite/gcc.c-torture/compile/pr115143-2.c
new file mode 100644
index ..05c3bbe9738e
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr115143-2.c
@@ -0,0 +1,30 @@
+/* { dg-options "-fgimple" } */
+/* PR tree-optimization/115143 */
+/* This used to ICE.
+   minmax part of phiopt would transform,
+   would transform `a!=0?min(a, b) : 0` into `min(a,b)`
+   which was correct except b was defined by a phi in the inner
+   bb which was not handled. */
+unsigned __GIMPLE (ssa,startwith("phiopt"))
+foo (unsigned a, unsigned b)
+{
+  unsigned j;
+  unsigned _23;
+  unsigned _12;
+
+  __BB(2):
+  if (a_6(D) != 0u)
+goto __BB3;
+  else
+goto __BB4;
+
+  __BB(3):
+  j_10 = __PHI (__BB2: b_11(D));
+  _23 = __MIN (a_6(D), j_10);
+  goto __BB4;
+
+  __BB(4):
+  _12 = __PHI (__BB3: _23, __BB2: 0u);
+  return _12;
+
+}
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr115143-3.c b/gcc/testsuite/gcc.c-torture/compile/pr115143-3.c
new file mode 100644
index ..53c5fb5588e9
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr115143-3.c
@@ -0,0 +1,29 @@
+/* { dg-options "-fgimple" } */
+/* PR tree-optimization/115143 */
+/* This used to ICE.
+   minmax part of phiopt would transform,
+   would transform `a!=0?min(a, b) : 0` into `min(a,b)`
+   which was correct except b was defined by a phi in the inner
+   bb which was not handled. */
+unsigned __GIMPLE (ssa,startwith("phiopt"))
+foo (unsigned a, unsigned b)
+{
+  unsigned j;
+  unsigned _23;
+  unsigned _12;
+
+  __BB(2):
+  if (a_6(D) > 0u)
+goto __BB3;
+  else
+goto __BB4;
+
+  __BB(3):
+  j_10 = __PHI (__BB2: b_7(D));
+  _23 = __MIN (a_6(D), j_10);
+  goto __BB4;
+
+  __BB(4):
+  _12 = __PHI (__BB3: _23, __BB2: 0u);
+  return _12;
+}
diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index e2dba56383b4..558d5b4b57db 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -1973,6 +1973,10 @@ minmax_replacement (basic_block cond_bb, basic_block middle_bb,
  || gimple_code (assign) != GIMPLE_ASSIGN)
return false;
 
+  /* There cannot be any phi nodes in the middle bb. */
+  if (!gimple_seq_empty_p (phi_nodes (middle_bb)))
+   return false;
+
   lhs = gimple_assign_lhs (assign);
   ass_code = gimple_assign_rhs_code (assign);
   if (ass_code != MAX_EXPR && ass_code != MIN_EXPR)
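The testcases use unsigned operands, and for those the value-level rewrite is sound. When `a == 0`, `MIN (a, b)` is already `0`, so the guard adds nothing. The exhaustive 8-bit check below sketches that identity. The helper names are illustrative and are not GCC functions. The check also records a signed counterexample (`a = 0`, `b = -1`), which shows that the identity depends on the operands being unsigned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative helpers only; these are not GCC internals.  */
static uint8_t min_u8(uint8_t a, uint8_t b) { return a < b ? a : b; }
static int8_t min_s8(int8_t a, int8_t b) { return a < b ? a : b; }

/* Source form from the testcases: a != 0 ? MIN (a, b) : 0.  */
static uint8_t guarded_u8(uint8_t a, uint8_t b) {
  return a != 0 ? min_u8(a, b) : 0;
}

/* True if dropping the guard never changes the result for any pair of
   8-bit unsigned operands.  */
bool unsigned_fold_is_valid(void) {
  for (unsigned a = 0; a <= UINT8_MAX; a++)
    for (unsigned b = 0; b <= UINT8_MAX; b++)
      if (guarded_u8((uint8_t)a, (uint8_t)b) != min_u8((uint8_t)a, (uint8_t)b))
        return false;
  return true;
}

/* True if a = 0, b = -1 breaks the same fold for signed operands.  */
bool signed_fold_has_counterexample(void) {
  int8_t a = 0, b = -1;
  int8_t guarded = a != 0 ? min_s8(a, b) : 0;
  return guarded != min_s8(a, b);
}

/* Both properties together.  */
void check_minmax_fold(void) {
  assert(unsigned_fold_is_valid());
  assert(signed_fold_has_counterexample());
}
```

According to the commit message, the ICE did not come from this arithmetic. It came from the IR: the operand `b` (`j_10` in the GIMPLE tests) was defined by a PHI node in the middle block, and the transformation did not account for that. The fix therefore rejects any middle block that has PHI nodes.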


Re: [PATCH] PHIOPT: Don't transform minmax if middle bb contains a phi [PR115143]

2024-06-11 Thread Andrew Pinski
On Mon, May 20, 2024 at 11:08 PM Richard Biener
 wrote:
>
> On Mon, May 20, 2024 at 11:37 PM Andrew Pinski (QUIC)
>  wrote:
> >
> > > -----Original Message-----
> > > From: Richard Biener
> > > Sent: Sunday, May 19, 2024 11:55 AM
> > > To: Andrew Pinski (QUIC)
> > > Cc: gcc-patches@gcc.gnu.org
> > > Subject: Re: [PATCH] PHIOPT: Don't transform minmax if middle bb contains a phi [PR115143]
> > >
> > >
> > >
> > > > On 19.05.2024 at 01:12, Andrew Pinski wrote:
> > > >
> > > > The problem here is even if last_and_only_stmt returns a statement,
> > > > the bb might still contain a phi node which defines a ssa name which
> > > > is used in that statement so we need to add a check to make sure that
> > > > the phi nodes are empty for the middle bbs in both the
> > > > `CMP?MINMAX:MINMAX` case and the `CMP?MINMAX:B` cases.
> > >
> > > Is that single arg PHIs or do we have an extra edge into the
> > > middle BB?  I think that might be unexpected, at least costing
> > > wise.  Maybe Also to some of the replacement code we have ?
> >
> > It is only a single-arg PHI, since we already reject multiple edges into
> > the middle BBs for these cases.
> > It was EVRP that produced the single-arg PHI in the original testcase, from
> > folding a conditional to false; EVRP does not do simple name propagation in
> > this case, and no pass between EVRP and phiopt cleans up single-arg PHIs.
> > I added the GIMPLE-based testcases mainly to avoid depending on what
> > previous passes happen to produce.
> >
> > >
> > > > OK for trunk and backport to all open branches since r14-3827-g30e6ee074588ba was backported?
> > > > Bootstrapped and tested on x86_64_linux-gnu with no regressions.
> > > >
> > >
> > > Ok
> >
> > Does this include the GCC 13 branch or should I wait until after the GCC 
> > 13.3.0 release?
>
> Please wait until after the release.

Committed the modified attached patch for GCC 12. GCC 12 didn't have
the diamond case which is why a modified patch was needed.

Thanks,
Andrew

>
> Thanks,
> Richard.
>
> > Thanks,
> > Andrew Pinski
> >
> > >
> > > Richard
> > >
> > > >PR tree-optimization/115143
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >* tree-ssa-phiopt.cc (minmax_replacement): Check for empty
> > > >phi nodes for middle bbs for the case where middle bb is not empty.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > >* gcc.c-torture/compile/pr115143-1.c: New test.
> > > >* gcc.c-torture/compile/pr115143-2.c: New test.
> > > >* gcc.c-torture/compile/pr115143-3.c: New test.
> > > >
> > > > Signed-off-by: Andrew Pinski 
> > > > ---
> > > > .../gcc.c-torture/compile/pr115143-1.c| 21 +
> > > > .../gcc.c-torture/compile/pr115143-2.c| 30 +++
> > > > .../gcc.c-torture/compile/pr115143-3.c| 29 ++
> > > > gcc/tree-ssa-phiopt.cc| 12
> > > > 4 files changed, 92 insertions(+)
> > > > create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr115143-1.c
> > > > create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr115143-2.c
> > > > create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr115143-3.c
> > > >
> > > > diff --git a/gcc/testsuite/gcc.c-torture/compile/pr115143-1.c b/gcc/testsuite/gcc.c-torture/compile/pr115143-1.c
> > > > new file mode 100644
> > > > index 000..5cb119ea432
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.c-torture/compile/pr115143-1.c
> > > > @@ -0,0 +1,21 @@
> > > > +/* PR tree-optimization/115143 */
> > > > +/* This used to ICE.
> > > > +   minmax part of phiopt would transform,
> > > > +   would transform `a!=0?min(a, b) : 0` into `min(a,b)`
> > > > +   which was correct except b was defined by a phi in the inner
> > > > +   bb which was not handled. */
> > > > +short a, d;
> > > 

Re: [PATCH v1] Test: Move target independent test cases to gcc.dg/torture

2024-06-11 Thread Andrew Pinski
On Mon, Jun 10, 2024, 11:20 PM  wrote:

> From: Pan Li 
>
> The test cases of pr115387 are target independent; at least x86
> and riscv are able to reproduce the issue.  Thus, move these cases to
> gcc.dg/torture.
>
> The following test suites pass:
> 1. The rv64gcv full regression test.
> 2. The x86 full regression test.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/pr115387-1.c: Move to...
> * gcc.dg/torture/pr115387-1.c: ...here.
> * gcc.target/riscv/pr115387-2.c: Move to...
> * gcc.dg/torture/pr115387-2.c: ...here.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/testsuite/{gcc.target/riscv => gcc.dg/torture}/pr115387-1.c | 2 +-
>  gcc/testsuite/{gcc.target/riscv => gcc.dg/torture}/pr115387-2.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>  rename gcc/testsuite/{gcc.target/riscv => gcc.dg/torture}/pr115387-1.c (92%)
>  rename gcc/testsuite/{gcc.target/riscv => gcc.dg/torture}/pr115387-2.c (84%)
>
> diff --git a/gcc/testsuite/gcc.target/riscv/pr115387-1.c b/gcc/testsuite/gcc.dg/torture/pr115387-1.c
> similarity index 92%
> rename from gcc/testsuite/gcc.target/riscv/pr115387-1.c
> rename to gcc/testsuite/gcc.dg/torture/pr115387-1.c
> index a1c926977c4..fde79f66757 100644
> --- a/gcc/testsuite/gcc.target/riscv/pr115387-1.c
> +++ b/gcc/testsuite/gcc.dg/torture/pr115387-1.c
> @@ -1,6 +1,6 @@
>  /* Test there is no ICE when compile.  */
>  /* { dg-do compile } */
> -/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */
> +/* { dg-options "-O3" } */
>

Since you are moving it to torture, please remove -O3 as it is already
supplied there as one of the torture options.



>  #define PRINTF_CHK 0x34
>
> diff --git a/gcc/testsuite/gcc.target/riscv/pr115387-2.c b/gcc/testsuite/gcc.dg/torture/pr115387-2.c
> similarity index 84%
> rename from gcc/testsuite/gcc.target/riscv/pr115387-2.c
> rename to gcc/testsuite/gcc.dg/torture/pr115387-2.c
> index 7183bf18dfd..0cb4b48d27b 100644
> --- a/gcc/testsuite/gcc.target/riscv/pr115387-2.c
> +++ b/gcc/testsuite/gcc.dg/torture/pr115387-2.c
> @@ -1,6 +1,6 @@
>  /* Test there is no ICE when compile.  */
>  /* { dg-do compile } */
> -/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */
> +/* { dg-options "-O3" } */
>
>  #include 
>  #include 
> --
> 2.34.1
>
>


RE: [PATCH] aarch64: Improve popcount for bytes [PR113042]

2024-06-10 Thread Andrew Pinski (QUIC)


> -----Original Message-----
> From: Kyrylo Tkachov
> Sent: Monday, June 10, 2024 12:26 AM
> To: Andrew Pinski (QUIC) ; gcc-patc...@gcc.gnu.org
> Subject: Re: [PATCH] aarch64: Improve popcount for bytes [PR113042]
> 
> Hi Andrew
> 
> -----Original Message-----
> From: Andrew Pinski <quic_apin...@quicinc.com>
> Date: Monday, 10 June 2024 at 06:05
> To: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
> Cc: Andrew Pinski <quic_apin...@quicinc.com>
> Subject: [PATCH] aarch64: Improve popcount for bytes [PR113042]
>
> For popcount for bytes, we don't need the reduction addition
> after the vector cnt instruction as we are only counting one
> byte's popcount.
> This implements a new define_expand to handle that.
>
> Bootstrapped and tested on aarch64-linux-gnu with no regressions.
>
> PR target/113042
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64.md (popcountqi2): New pattern.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/popcnt5.c: New test.
>
> Signed-off-by: Andrew Pinski <quic_apin...@quicinc.com>
> ---
> gcc/config/aarch64/aarch64.md | 26 ++
> gcc/testsuite/gcc.target/aarch64/popcnt5.c | 19
> 2 files changed, 45 insertions(+)
> create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt5.c
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 389a1906e23..ebaf7ec9970 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -5358,6 +5358,32 @@ (define_expand "popcount2"
> }
> })
>
> +/* Popcount for byte can remove the reduction part after the popcount.
> +   For optimization reasons, enabling this for CSSC. */
> +(define_expand "popcountqi2"
> +  [(set (match_operand:QI 0 "register_operand" "=w")
> +   (popcount:QI (match_operand:QI 1 "register_operand" "w")))]
> +  "TARGET_CSSC || TARGET_SIMD"
> +{
> +  rtx in = operands[1];
> +  rtx out = operands[0];
> +  if (TARGET_CSSC)
> +{
> +  rtx tmp = gen_reg_rtx (SImode);
> +  rtx out1 = gen_reg_rtx (SImode);
> +  emit_insn (gen_zero_extendqisi2 (tmp, in));
> +  emit_insn (gen_popcountsi2 (out1, tmp));
> +  emit_move_insn (out, gen_lowpart (QImode, out1));
> +  DONE;
> +}
> +  rtx v = gen_reg_rtx (V8QImode);
> +  rtx v1 = gen_reg_rtx (V8QImode);
> +  emit_move_insn (v, gen_lowpart (V8QImode, in));
> +  emit_insn (gen_popcountv8qi2 (v1, v));
> +  emit_move_insn (out, gen_lowpart (QImode, v1));
> +  DONE;
> +})
> 
> TBH I'd rather merge it with the GPI popcount pattern that
> looks almost identical. You could extend it with the ALLI
> iterator and handle HImode as well quite easily.

I was thinking about that beforehand, but I was trying for the simplified patch 
at the time.
Anyways I posted the updated version: 
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/654115.html

And it includes the CSSC testcases too to make sure the generated code is 
correct.

Thanks,
Andrew Pinski



> Thanks,
> Kyrill
> 
> 
> +
> (define_insn "clrsb2"
> [(set (match_operand:GPI 0 "register_operand" "=r")
> (clrsb:GPI (match_operand:GPI 1 "register_operand" "r")))]
> diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt5.c b/gcc/testsuite/gcc.target/aarch64/popcnt5.c
> new file mode 100644
> index 000..406369d9b29
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/popcnt5.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> +/* PR target/113042 */
> +
> +#pragma GCC target "+nocssc"
> +
> +/*
> +** h8:
> +** ldr b[0-9]+, \[x0\]
> +** cnt v[0-9]+.8b, v[0-9]+.8b
> +** smov w0, v[0-9]+.b\[0\]
> +** ret
> +*/
> +/* We should not need the addv here since we only need a byte popcount. */
> +
> +unsigned h8 (const unsigned char *a) {
> + return __builtin_popcountg (a[0]);
> +}
> --
> 2.42.0
> 
> 
> 
> 



[PATCH v2] aarch64: Improve popcount for bytes [PR113042]

2024-06-10 Thread Andrew Pinski
For popcount for bytes, we don't need the reduction addition
after the vector cnt instruction as we are only counting one
byte's popcount.
This changes the popcount expander to cover all ALLI modes rather than just GPI.
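A scalar C model of the non-CSSC strategy may help. CNT produces a per-byte popcount in each of the eight lanes, and ADDV sums those lanes. In this sketch, QImode reads lane 0 directly, while HImode, SImode and DImode need the reduction. The sketch is not GCC code: `cnt_8b`, `addv_8b`, `popcount_sketch` and `check_popcount_sketch` are hypothetical names, and lane 0 is assumed to be the least-significant byte. The CSSC path is not modelled; that path zero-extends QImode and HImode to SImode and uses the scalar count.

```c
#include <assert.h>
#include <stdint.h>

/* Model of AArch64 CNT Vd.8B: an independent popcount per byte lane
   (lane 0 = least-significant byte).  */
static void cnt_8b(uint64_t v, uint8_t lanes[8]) {
  for (int i = 0; i < 8; i++) {
    uint8_t byte = (uint8_t)(v >> (8 * i));
    uint8_t n = 0;
    for (; byte; byte >>= 1)
      n += byte & 1u;
    lanes[i] = n;
  }
}

/* Model of ADDV Bd, Vn.8B: horizontal add of the eight lanes.  */
static unsigned addv_8b(const uint8_t lanes[8]) {
  unsigned sum = 0;
  for (int i = 0; i < 8; i++)
    sum += lanes[i];
  return sum;
}

/* Sketch of the v2 non-CSSC strategy.  BITS is the operand width:
   8 (QImode), 16 (HImode), 32 (SImode) or 64 (DImode).  */
unsigned popcount_sketch(uint64_t x, int bits) {
  uint8_t lanes[8];
  /* Keep only the operand's bits.  The expander zero-extends HImode and
     SImode to DImode (extra zero bits leave the count unchanged); for
     QImode only lane 0 is read, so the upper lanes never matter.  */
  if (bits < 64)
    x &= (UINT64_C(1) << bits) - 1;
  cnt_8b(x, lanes);
  if (bits == 8)
    return lanes[0];       /* QImode: no reduction needed.  */
  return addv_8b(lanes);   /* HImode/SImode/DImode: sum the lanes.  */
}

/* Compare the sketch against a naive bit-by-bit count.  */
void check_popcount_sketch(void) {
  static const uint64_t samples[] = {0, 1, 0xFF, 0x1FF, 0xFFFF, 0x8001,
                                     UINT64_C(0xDEADBEEF),
                                     UINT64_C(0xFFFFFFFFFFFFFFFF)};
  static const int widths[] = {8, 16, 32, 64};
  for (unsigned s = 0; s < sizeof samples / sizeof samples[0]; s++)
    for (unsigned w = 0; w < sizeof widths / sizeof widths[0]; w++) {
      uint64_t v = samples[s];
      if (widths[w] < 64)
        v &= (UINT64_C(1) << widths[w]) - 1;
      unsigned expected = 0;
      for (; v; v >>= 1)
        expected += (unsigned)(v & 1u);
      assert(popcount_sketch(samples[s], widths[w]) == expected);
    }
}
```

The only per-mode choice in the model is whether ADDV is needed. This mirrors how the combined ALLI pattern handles the modes with a mode test inside one expander rather than with a separate `popcountqi2` pattern.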

Changes since v1:
* v2 - Use ALLI iterator and combine all into one pattern.
   Add new testcases popcnt[6-8].c.

Bootstrapped and tested on aarch64-linux-gnu with no regressions.

PR target/113042

gcc/ChangeLog:

* config/aarch64/aarch64.md (popcount2): Update pattern
to support ALLI modes.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/popcnt5.c: New test.
* gcc.target/aarch64/popcnt6.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/config/aarch64/aarch64.md  | 52 +++---
 gcc/testsuite/gcc.target/aarch64/popcnt5.c | 19 
 gcc/testsuite/gcc.target/aarch64/popcnt6.c | 19 
 gcc/testsuite/gcc.target/aarch64/popcnt7.c | 18 
 gcc/testsuite/gcc.target/aarch64/popcnt8.c | 18 
 5 files changed, 119 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt5.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt6.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt7.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt8.c

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 389a1906e23..dd88fd891b5 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5332,28 +5332,66 @@ (define_insn "*aarch64_popcount2_cssc_insn"
 ;; MOV w0, v2.b[0]
 
 (define_expand "popcount2"
-  [(set (match_operand:GPI 0 "register_operand")
-   (popcount:GPI (match_operand:GPI 1 "register_operand")))]
+  [(set (match_operand:ALLI 0 "register_operand")
+   (popcount:ALLI (match_operand:ALLI 1 "register_operand")))]
   "TARGET_CSSC || TARGET_SIMD"
 {
+  rtx in = operands[1];
+  rtx out = operands[0];
+  if (TARGET_CSSC
+  && (mode == HImode
+  || mode == QImode))
+{
+  rtx tmp = gen_reg_rtx (SImode);
+  rtx out1 = gen_reg_rtx (SImode);
+  if (mode == HImode)
+emit_insn (gen_zero_extendhisi2 (tmp, in));
+  else
+emit_insn (gen_zero_extendqisi2 (tmp, in));
+  emit_insn (gen_popcountsi2 (out1, tmp));
+  emit_move_insn (out, gen_lowpart (mode, out1));
+  DONE;
+}
   if (!TARGET_CSSC)
 {
   rtx v = gen_reg_rtx (V8QImode);
   rtx v1 = gen_reg_rtx (V8QImode);
   rtx in = operands[1];
   rtx out = operands[0];
-  if(mode == SImode)
+  /* SImode and HImode should be zero extended to DImode. */
+  if (mode == SImode || mode == HImode)
{
  rtx tmp;
  tmp = gen_reg_rtx (DImode);
- /* If we have SImode, zero extend to DImode, pop count does
-not change if we have extra zeros. */
- emit_insn (gen_zero_extendsidi2 (tmp, in));
+ /* If we have SImode, zero extend to DImode,
+pop count does not change if we have extra zeros. */
+ if (mode == SImode)
+   emit_insn (gen_zero_extendsidi2 (tmp, in));
+ else
+   emit_insn (gen_zero_extendhidi2 (tmp, in));
  in = tmp;
}
   emit_move_insn (v, gen_lowpart (V8QImode, in));
   emit_insn (gen_popcountv8qi2 (v1, v));
-  emit_insn (gen_aarch64_zero_extend_reduc_plus_v8qi (out, v1));
+  /* QImode, just extract from the v8qi vector.  */
+  if (mode == QImode)
+   {
+ emit_move_insn (out, gen_lowpart (QImode, v1));
+   }
+  /* HI and SI, reduction is zero extended to SImode. */
+  else if (mode == SImode || mode == HImode)
+   {
+ rtx out1;
+ out1 = gen_reg_rtx (SImode);
+ emit_insn (gen_aarch64_zero_extendsi_reduc_plus_v8qi (out1, v1));
+ emit_move_insn (out, gen_lowpart (mode, out1));
+   }
+  /* DImode, reduction is zero extended to DImode. */
+  else
+   {
+ gcc_assert (mode == DImode);
+ emit_insn (gen_aarch64_zero_extenddi_reduc_plus_v8qi (out, v1));
+   }
   DONE;
 }
 })
diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt5.c b/gcc/testsuite/gcc.target/aarch64/popcnt5.c
new file mode 100644
index 000..406369d9b29
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/popcnt5.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+/* PR target/113042 */
+
+#pragma GCC target "+nocssc"
+
+/*
+** h8:
+** ldr b[0-9]+, \[x0\]
+** cnt v[0-9]+.8b, v[0-9]+.8b
+** smovw0, v[0-9]+.b\[0\]
+** ret
+*/
+/* We should not need the addv here since we only need a byte popcount. */
+
+unsigned h8 (const unsigned char *a) {
+ return __builtin_popcountg (a[0]);
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt6.c 
b/gcc/testsuite/gcc.tar

[gcc r15-1165] Fix pr115388.c: plain char could be unsigned by default [PR115415]

2024-06-10 Thread Andrew Pinski via Gcc-cvs
https://gcc.gnu.org/g:c3d1153bc0a2b820e3c373ecf19a5a127703f854

commit r15-1165-gc3d1153bc0a2b820e3c373ecf19a5a127703f854
Author: Andrew Pinski 
Date:   Mon Jun 10 08:23:00 2024 -0700

Fix pr115388.c: plain char could be unsigned by default [PR115415]

This is a simple fix to the testcase as plain `char` could be
unsigned by default on some targets (e.g. aarch64 and powerpc).
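The difference can be checked portably in C. This is a sketch, and the function names are hypothetical. `CHAR_MIN` from `<limits.h>` reveals the target's choice. A guard like the testcase's `e >= 0` rejects `-1` only when the variable is signed.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* CHAR_MIN is 0 where plain char is unsigned, SCHAR_MIN where signed.  */
bool plain_char_is_signed(void) { return CHAR_MIN < 0; }

/* The testcase's `e >= 0` guard with plain char: where char is unsigned,
   -1 converts to CHAR_MAX, so the guard holds.  */
bool plain_minus_one_passes_guard(void) {
  char e = -1;
  return e >= 0;
}

/* The fixed testcase declares `signed char`, so -1 stays negative.  */
bool signed_minus_one_passes_guard(void) {
  signed char e = -1;
  return e >= 0;
}

/* Properties that hold on every conforming target.  */
void check_char_guards(void) {
  assert(!signed_minus_one_passes_guard());
  assert(plain_minus_one_passes_guard() == !plain_char_is_signed());
}
```

Because the plain `char` result depends on the target, spelling out `signed char` is what makes the loop in pr115388.c behave the same on aarch64, powerpc and x86_64.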

Committed as obvious after a quick test of the testcase on both aarch64 and
x86_64.

gcc/testsuite/ChangeLog:

PR testsuite/115415
PR tree-optimization/115388
* gcc.dg/torture/pr115388.c: Use `signed char` directly instead
of plain `char`.

Signed-off-by: Andrew Pinski 

Diff:
---
 gcc/testsuite/gcc.dg/torture/pr115388.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/torture/pr115388.c b/gcc/testsuite/gcc.dg/torture/pr115388.c
index c7c902888da..17b3f1bcd90 100644
--- a/gcc/testsuite/gcc.dg/torture/pr115388.c
+++ b/gcc/testsuite/gcc.dg/torture/pr115388.c
@@ -2,7 +2,7 @@
 
 int printf(const char *, ...);
 int a[10], b, c, d[0], h, i, j, k, l;
-char e = -1, g;
+signed char e = -1, g;
 volatile int f;
 static void n() {
   while (e >= 0)


[PUSHED] Fix pr115388.c: plain char could be unsigned by default [PR115415]

2024-06-10 Thread Andrew Pinski
This is a simple fix to the testcase as plain `char` could be
unsigned by default on some targets (e.g. aarch64 and powerpc).

Committed as obvious after a quick test of the testcase on both aarch64 and
x86_64.

gcc/testsuite/ChangeLog:

PR testsuite/115415
PR tree-optimization/115388
* gcc.dg/torture/pr115388.c: Use `signed char` directly instead
of plain `char`.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/gcc.dg/torture/pr115388.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/torture/pr115388.c b/gcc/testsuite/gcc.dg/torture/pr115388.c
index c7c902888da..17b3f1bcd90 100644
--- a/gcc/testsuite/gcc.dg/torture/pr115388.c
+++ b/gcc/testsuite/gcc.dg/torture/pr115388.c
@@ -2,7 +2,7 @@
 
 int printf(const char *, ...);
 int a[10], b, c, d[0], h, i, j, k, l;
-char e = -1, g;
+signed char e = -1, g;
 volatile int f;
 static void n() {
   while (e >= 0)
-- 
2.43.0



[PATCH] aarch64: Improve popcount for bytes [PR113042]

2024-06-09 Thread Andrew Pinski
For popcount for bytes, we don't need the reduction addition
after the vector cnt instruction as we are only counting one
byte's popcount.
This implements a new define_expand to handle that.
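A small C model shows why the reduction is unnecessary. It is a sketch with hypothetical helper names, not GCC code. The new expander does not zero-extend the QImode input before moving it into a V8QImode register, so the upper lanes may hold other bits. CNT counts each byte lane independently, so lane 0 depends only on the input byte either way.

```c
#include <assert.h>
#include <stdint.h>

/* Model of one lane of AArch64 CNT Vd.8B: the popcount of byte LANE
   of V (lane 0 = least-significant byte).  */
static unsigned cnt_lane(uint64_t v, int lane) {
  uint8_t byte = (uint8_t)(v >> (8 * lane));
  unsigned n = 0;
  for (; byte; byte >>= 1)
    n += byte & 1u;
  return n;
}

/* Reference popcount of one byte (clears the lowest set bit each step).  */
static unsigned popcount8(uint8_t x) {
  unsigned n = 0;
  for (; x; x &= (uint8_t)(x - 1))
    n++;
  return n;
}

/* For every byte value, and whatever sits in the upper lanes, lane 0 of
   CNT already equals the byte's popcount, so no ADDV is needed.  */
void check_lane0_suffices(void) {
  static const uint64_t upper[] = {0, UINT64_C(0xFFFFFFFFFFFFFF00),
                                   UINT64_C(0x0123456789ABCD00)};
  for (unsigned b = 0; b <= 0xFF; b++)
    for (unsigned u = 0; u < sizeof upper / sizeof upper[0]; u++)
      assert(cnt_lane(upper[u] | b, 0) == popcount8((uint8_t)b));
}
```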

Bootstrapped and tested on aarch64-linux-gnu with no regressions.

PR target/113042

gcc/ChangeLog:

* config/aarch64/aarch64.md (popcountqi2): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/popcnt5.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/config/aarch64/aarch64.md  | 26 ++
 gcc/testsuite/gcc.target/aarch64/popcnt5.c | 19 
 2 files changed, 45 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt5.c

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 389a1906e23..ebaf7ec9970 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5358,6 +5358,32 @@ (define_expand "popcount2"
 }
 })
 
+/* Popcount for byte can remove the reduction part after the popcount.
+   For optimization reasons, enabling this for CSSC. */
+(define_expand "popcountqi2"
+  [(set (match_operand:QI 0 "register_operand" "=w")
+   (popcount:QI (match_operand:QI 1 "register_operand" "w")))]
+  "TARGET_CSSC || TARGET_SIMD"
+{
+  rtx in = operands[1];
+  rtx out = operands[0];
+  if (TARGET_CSSC)
+{
+  rtx tmp = gen_reg_rtx (SImode);
+  rtx out1 = gen_reg_rtx (SImode);
+  emit_insn (gen_zero_extendqisi2 (tmp, in));
+  emit_insn (gen_popcountsi2 (out1, tmp));
+  emit_move_insn (out, gen_lowpart (QImode, out1));
+  DONE;
+}
+  rtx v = gen_reg_rtx (V8QImode);
+  rtx v1 = gen_reg_rtx (V8QImode);
+  emit_move_insn (v, gen_lowpart (V8QImode, in));
+  emit_insn (gen_popcountv8qi2 (v1, v));
+  emit_move_insn (out, gen_lowpart (QImode, v1));
+  DONE;
+})
+
 (define_insn "clrsb2"
   [(set (match_operand:GPI 0 "register_operand" "=r")
 (clrsb:GPI (match_operand:GPI 1 "register_operand" "r")))]
diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt5.c b/gcc/testsuite/gcc.target/aarch64/popcnt5.c
new file mode 100644
index 000..406369d9b29
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/popcnt5.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+/* PR target/113042 */
+
+#pragma GCC target "+nocssc"
+
+/*
+** h8:
+** ldr b[0-9]+, \[x0\]
+** cnt v[0-9]+.8b, v[0-9]+.8b
+** smovw0, v[0-9]+.b\[0\]
+** ret
+*/
+/* We should not need the addv here since we only need a byte popcount. */
+
+unsigned h8 (const unsigned char *a) {
+ return __builtin_popcountg (a[0]);
+}
-- 
2.42.0



Re: [PATCH] FreeBSD: Stop linking _p libs for -pg as of FreeBSD 14

2024-06-09 Thread Andrew Pinski
On Sun, Jun 9, 2024 at 2:22 PM Gerald Pfeifer  wrote:
>
> On Fri, 13 Aug 2021, Andreas Tobler via Gcc-patches wrote:
> > I would like to commit the attached patch to trunk and after a settling
> > period also to all open branches.
> > Is this ok?
>
> Our MAINTAINERS file has the following entry:
>
>   freebsd   Andreas Tobler   
>
> So ... yes. :-)
>
> Seeing this did not make it into our tree, I applied the patchset,
> bootstrapped on x86_64-unknown-freebsd13.2 and pushed with a minor
> simplification to the ChangeLog. Patch as pushed below...

Note this was https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113218
which I just closed as fixed. I am not sure if someone was going to
backport it though.

Thanks,
Andrew

>
> Gerald
>
>
> commit 48abb540701447b0cd9df7542720ab65a34fc1b1
> Author: Andreas Tobler 
> Date:   Sun Jun 9 23:18:04 2024 +0200
>
> FreeBSD: Stop linking _p libs for -pg as of FreeBSD 14
>
> As of FreeBSD version 14, FreeBSD no longer provides profiled system
> libraries like libc_p and libpthread_p. Stop linking against them if
> the FreeBSD major version is 14 or more.
>
> gcc:
> * config/freebsd-spec.h: Change fbsd-lib-spec for FreeBSD > 13,
> do not link against profiled system libraries if -pg is invoked.
> Add a define to note about this change.
> * config/aarch64/aarch64-freebsd.h: Use the note to inform if
> -pg is invoked on FreeBSD > 13.
> * config/arm/freebsd.h: Likewise.
> * config/i386/freebsd.h: Likewise.
> * config/i386/freebsd64.h: Likewise.
> * config/riscv/freebsd.h: Likewise.
> * config/rs6000/freebsd64.h: Likewise.
> * config/rs6000/sysv4.h: Likewise.
>
> diff --git a/gcc/config/aarch64/aarch64-freebsd.h b/gcc/config/aarch64/aarch64-freebsd.h
> index 53cc17a1caf..e26d69ce46c 100644
> --- a/gcc/config/aarch64/aarch64-freebsd.h
> +++ b/gcc/config/aarch64/aarch64-freebsd.h
> @@ -35,6 +35,7 @@
>  #undef  FBSD_TARGET_LINK_SPEC
>  #define FBSD_TARGET_LINK_SPEC " \
>  %{p:%nconsider using `-pg' instead of `-p' with gprof (1)}  \
> +" FBSD_LINK_PG_NOTE "  \
>  %{v:-V} \
>  %{assert*} %{R*} %{rpath*} %{defsym*}   \
>  %{shared:-Bshareable %{h*} %{soname*}}  \
> diff --git a/gcc/config/arm/freebsd.h b/gcc/config/arm/freebsd.h
> index 9d0a5a842ab..ee4860ae637 100644
> --- a/gcc/config/arm/freebsd.h
> +++ b/gcc/config/arm/freebsd.h
> @@ -47,6 +47,7 @@
>  #undef LINK_SPEC
>  #define LINK_SPEC "\
>%{p:%nconsider using `-pg' instead of `-p' with gprof (1)}   \
> +  " FBSD_LINK_PG_NOTE "  
>   \
>%{v:-V}  \
>%{assert*} %{R*} %{rpath*} %{defsym*}  
>   \
>%{shared:-Bshareable %{h*} %{soname*}}   \
> diff --git a/gcc/config/freebsd-spec.h b/gcc/config/freebsd-spec.h
> index a6d1ad1280f..f43056bf2cf 100644
> --- a/gcc/config/freebsd-spec.h
> +++ b/gcc/config/freebsd-spec.h
> @@ -92,19 +92,29 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> libc, depending on whether we're doing profiling or need threads support.
> (similar to the default, except no -lg, and no -p).  */
>
> +#if FBSD_MAJOR < 14
> +#define FBSD_LINK_PG_NOTHREADS "%{!pg: -lc}  %{pg: -lc_p}"
> +#define FBSD_LINK_PG_THREADS   "%{!pg: %{pthread:-lpthread} -lc} " \
> +   "%{pg: %{pthread:-lpthread} -lc_p}"
> +#define FBSD_LINK_PG_NOTE ""
> +#else
> +#define FBSD_LINK_PG_NOTHREADS "%{-lc} "
> +#define FBSD_LINK_PG_THREADS   "%{pthread:-lpthread} -lc "
> +#define FBSD_LINK_PG_NOTE "%{pg:%nFreeBSD no longer provides profiled "\
> + "system libraries}"
> +#endif
> +
>  #ifdef FBSD_NO_THREADS
>  #define FBSD_LIB_SPEC "  
>   \
>%{pthread: %eThe -pthread option is only supported on FreeBSD when gcc \
>  is built with the --enable-threads configure-time option.} \
>%{!shared:   \
> -%{!pg: -lc}  
>   \
> -%{pg:  -lc_p}  \
> +" FBSD_LINK_PG_NOTHREADS " \
>}"
>  #else
>  #define FBSD_LIB_SPEC "  
>   \
>%{!shared:   \
> -%{!pg: %{pthread:-lpthread} -lc}   \
> -%{pg:  %{pthread:-lpthread_p} -lc_p}   \
> +" 

Re: [pushed 2/3] libcpp: move label_text to its own header

2024-06-06 Thread Andrew Pinski
On Thu, Jun 6, 2024 at 9:00 AM David Malcolm  wrote:
>
> On Thu, 2024-06-06 at 08:40 -0700, Andrew Pinski wrote:
> > On Thu, Jun 6, 2024 at 6:02 AM Bert Wesarg
> >  wrote:
> > >
> > > Dear David,
> > >
> > > On Tue, May 28, 2024 at 10:07 PM David Malcolm
> > >  wrote:
> > > >
> > > > No functional change intended.
> > > >
> > > > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> > > > Pushed to trunk as r15-874-g9bda2c4c81b668.
> > > >
> > > > libcpp/ChangeLog:
> > > > * Makefile.in (TAGS_SOURCES): Add include/label-text.h.
> > > > * include/label-text.h: New file.
> > > > * include/rich-location.h: Include "label-text.h".
> > > > (class label_text): Move to label-text.h.
> > > >
> > > > Signed-off-by: David Malcolm 
> > > > ---
> > > >  libcpp/Makefile.in |   2 +-
> > > >  libcpp/include/label-text.h| 102 +
> > > >  libcpp/include/rich-location.h |  79 +
> > > >  3 files changed, 105 insertions(+), 78 deletions(-)
> > > >  create mode 100644 libcpp/include/label-text.h
> > > >
> > > > diff --git a/libcpp/Makefile.in b/libcpp/Makefile.in
> > > > index ebbca3fb..7e47153264c0 100644
> > > > --- a/libcpp/Makefile.in
> > > > +++ b/libcpp/Makefile.in
> > > > @@ -271,7 +271,7 @@ ETAGS = @ETAGS@
> > > >
> > > >  TAGS_SOURCES = $(libcpp_a_SOURCES) internal.h system.h ucnid.h \
> > > >  include/cpplib.h include/line-map.h include/mkdeps.h include/symtab.h \
> > > > -include/rich-location.h
> > > > +include/rich-location.h include/label-text.h
> > >
> > > this does not seem to be enough for the new header to be
> > > installed.
> > > I get compile errors when compiling a plug-in with this patch:
> > >
> > > In file included from /home/bitten/opt/gcc-15-20240602/lib/gcc/x86_64-pc-linux-gnu/15.0.0/plugin/include/diagnostic.h:24,
> > >                  from /home/bitten/builds/oCyPvWN6/1/perftools/cicd/scorep/src/build-gcc-plugin/../src/adapters/compiler/gcc-plugin/scorep_plugin_inst_descriptor.cpp:43:
> > > /home/bitten/opt/gcc-15-20240602/lib/gcc/x86_64-pc-linux-gnu/15.0.0/plugin/include/rich-location.h:25:10: fatal error: label-text.h: No such file or directory
> > >    25 | #include "label-text.h"
> > >       |          ^~
> > > compilation terminated.
> >
> > I have a fix which I am testing.
>
> Likewise (and sorry about the breakage)

Committed as r15-1076-g6e6471806d886b .

>
> Dave
>


[gcc r15-1076] Plugins: Add label-text.h to CPPLIB_H so it will be installed [PR115288]

2024-06-06 Thread Andrew Pinski via Gcc-cvs
https://gcc.gnu.org/g:6e6471806d886bc052d3922d636d49aaf75d5d16

commit r15-1076-g6e6471806d886bc052d3922d636d49aaf75d5d16
Author: Andrew Pinski 
Date:   Thu May 30 07:59:00 2024 -0700

Plugins: Add label-text.h to CPPLIB_H so it will be installed [PR115288]

After r15-874-g9bda2c4c81b668, out-of-tree plugins won't compile
as the new libcpp header file label-text.h is not installed.

This adds the new header file to CPPLIB_H, which is used to
determine the plugin headers to install.

Committed as obvious after building, installing, and making sure
the new header file is installed.

gcc/ChangeLog:

PR plugins/115288
* Makefile.in (CPPLIB_H): Add label-text.h.

Signed-off-by: Andrew Pinski 

Diff:
---
 gcc/Makefile.in | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index c983b0c102a..f5adb647d3f 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1038,6 +1038,7 @@ SYSTEM_H = system.h hwint.h $(srcdir)/../include/libiberty.h \
 PREDICT_H = predict.h predict.def
 CPPLIB_H = $(srcdir)/../libcpp/include/line-map.h \
$(srcdir)/../libcpp/include/rich-location.h \
+   $(srcdir)/../libcpp/include/label-text.h \
$(srcdir)/../libcpp/include/cpplib.h
 CODYLIB_H = $(srcdir)/../libcody/cody.hh
 INPUT_H = $(srcdir)/../libcpp/include/line-map.h input.h


[COMMITTED] Plugins: Add label-text.h to CPPLIB_H so it will be installed [PR115288]

2024-06-06 Thread Andrew Pinski
After r15-874-g9bda2c4c81b668, out-of-tree plugins won't compile
as the new libcpp header file label-text.h is not installed.

This adds the new header file to CPPLIB_H, which is used to
determine the plugin headers to install.

Committed as obvious after building, installing, and making sure
the new header file is installed.

gcc/ChangeLog:

* Makefile.in (CPPLIB_H): Add label-text.h.

Signed-off-by: Andrew Pinski 
---
 gcc/Makefile.in | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index c983b0c102a..f5adb647d3f 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1038,6 +1038,7 @@ SYSTEM_H = system.h hwint.h $(srcdir)/../include/libiberty.h \
 PREDICT_H = predict.h predict.def
 CPPLIB_H = $(srcdir)/../libcpp/include/line-map.h \
$(srcdir)/../libcpp/include/rich-location.h \
+   $(srcdir)/../libcpp/include/label-text.h \
$(srcdir)/../libcpp/include/cpplib.h
 CODYLIB_H = $(srcdir)/../libcody/cody.hh
 INPUT_H = $(srcdir)/../libcpp/include/line-map.h input.h
-- 
2.43.0



Re: [pushed 2/3] libcpp: move label_text to its own header

2024-06-06 Thread Andrew Pinski
On Thu, Jun 6, 2024 at 6:02 AM Bert Wesarg  wrote:
>
> Dear David,
>
> On Tue, May 28, 2024 at 10:07 PM David Malcolm  wrote:
> >
> > No functional change intended.
> >
> > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> > Pushed to trunk as r15-874-g9bda2c4c81b668.
> >
> > libcpp/ChangeLog:
> > * Makefile.in (TAGS_SOURCES): Add include/label-text.h.
> > * include/label-text.h: New file.
> > * include/rich-location.h: Include "label-text.h".
> > (class label_text): Move to label-text.h.
> >
> > Signed-off-by: David Malcolm 
> > ---
> >  libcpp/Makefile.in |   2 +-
> >  libcpp/include/label-text.h| 102 +
> >  libcpp/include/rich-location.h |  79 +
> >  3 files changed, 105 insertions(+), 78 deletions(-)
> >  create mode 100644 libcpp/include/label-text.h
> >
> > diff --git a/libcpp/Makefile.in b/libcpp/Makefile.in
> > index ebbca3fb..7e47153264c0 100644
> > --- a/libcpp/Makefile.in
> > +++ b/libcpp/Makefile.in
> > @@ -271,7 +271,7 @@ ETAGS = @ETAGS@
> >
> >  TAGS_SOURCES = $(libcpp_a_SOURCES) internal.h system.h ucnid.h \
> >  include/cpplib.h include/line-map.h include/mkdeps.h include/symtab.h \
> > -include/rich-location.h
> > +include/rich-location.h include/label-text.h
>
> this does not seem to be enough for the new header to be installed.
> I get compile errors when compiling a plug-in with this patch:
>
> In file included from /home/bitten/opt/gcc-15-20240602/lib/gcc/x86_64-pc-linux-gnu/15.0.0/plugin/include/diagnostic.h:24,
>                  from /home/bitten/builds/oCyPvWN6/1/perftools/cicd/scorep/src/build-gcc-plugin/../src/adapters/compiler/gcc-plugin/scorep_plugin_inst_descriptor.cpp:43:
> /home/bitten/opt/gcc-15-20240602/lib/gcc/x86_64-pc-linux-gnu/15.0.0/plugin/include/rich-location.h:25:10: fatal error: label-text.h: No such file or directory
>    25 | #include "label-text.h"
>       |          ^~
> compilation terminated.

I have a fix which I am testing.

>
> Best,
> Bert
>
> >
> >
> >  TAGS: $(TAGS_SOURCES)
> > diff --git a/libcpp/include/label-text.h b/libcpp/include/label-text.h
> > new file mode 100644
> > index ..13562cda41f9
> > --- /dev/null
> > +++ b/libcpp/include/label-text.h
> > @@ -0,0 +1,102 @@
> > +/* A very simple string class.
> > +   Copyright (C) 2015-2024 Free Software Foundation, Inc.
> > +
> > +This program is free software; you can redistribute it and/or modify it
> > +under the terms of the GNU General Public License as published by the
> > +Free Software Foundation; either version 3, or (at your option) any
> > +later version.
> > +
> > +This program is distributed in the hope that it will be useful,
> > +but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > +GNU General Public License for more details.
> > +
> > +You should have received a copy of the GNU General Public License
> > +along with this program; see the file COPYING3.  If not see
> > +.
> > +
> > + In other words, you are welcome to use, share and improve this program.
> > + You are forbidden to forbid anyone else to use, share and improve
> > + what you give them.   Help stamp out software-hoarding!  */
> > +
> > +#ifndef LIBCPP_LABEL_TEXT_H
> > +#define LIBCPP_LABEL_TEXT_H
> > +
> > +/* A struct for the result of range_label::get_text: a NUL-terminated buffer
> > +   of localized text, and a flag to determine if the caller should "free" the
> > +   buffer.  */
> > +
> > +class label_text
> > +{
> > +public:
> > +  label_text ()
> > +  : m_buffer (NULL), m_owned (false)
> > +  {}
> > +
> > +  ~label_text ()
> > +  {
> > +    if (m_owned)
> > +      free (m_buffer);
> > +  }
> > +
> > +  /* Move ctor.  */
> > +  label_text (label_text &&other)
> > +  : m_buffer (other.m_buffer), m_owned (other.m_owned)
> > +  {
> > +    other.release ();
> > +  }
> > +
> > +  /* Move assignment.  */
> > +  label_text & operator= (label_text &&other)
> > +  {
> > +    if (m_owned)
> > +      free (m_buffer);
> > +    m_buffer = other.m_buffer;
> > +    m_owned = other.m_owned;
> > +    other.release ();
> > +    return *this;
> > +  }
> > +
> > +  /* Delete the copy ctor and copy-assignment operator.  */
> > +  label_text (const label_text &) = delete;
> > +  label_text & operator= (const label_text &) = delete;
> > +
> > +  /* Create a label_text instance that borrows BUFFER from a
> > +     longer-lived owner.  */
> > +  static label_text borrow (const char *buffer)
> > +  {
> > +    return label_text (const_cast <char *> (buffer), false);
> > +  }
> > +
> > +  /* Create a label_text instance that takes ownership of BUFFER.  */
> > +  static label_text take (char *buffer)
> > +  {
> > +    return label_text (buffer, true);
> > +  }
> > +
> > +  void release ()
> > +  {
> > +    m_buffer = NULL;
> > +    m_owned = false;
> > +  }
> > +
> > +  const char *get () const
> > +  {
> > +
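For readers following the ownership rules in the (truncated) header above, here is a minimal standalone sketch of the same borrow/take pattern. It is not the GCC class: the name label_text_sketch, the is_owner () accessor and the self-assignment guard are assumptions added for illustration, and get () is assumed to simply return the buffer.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <utility>

// Hypothetical standalone sketch of the label_text idea (not the GCC class):
// a NUL-terminated buffer plus a flag recording whether we must free() it.
class label_text_sketch
{
public:
  label_text_sketch () : m_buffer (nullptr), m_owned (false) {}

  ~label_text_sketch ()
  {
    if (m_owned)
      std::free (m_buffer);
  }

  // Move ctor: steal OTHER's buffer and leave it empty and non-owning.
  label_text_sketch (label_text_sketch &&other) noexcept
  : m_buffer (other.m_buffer), m_owned (other.m_owned)
  {
    other.release ();
  }

  // Move assignment: free our current buffer (if owned) before stealing.
  // The self-assignment guard is an addition not present in the patch.
  label_text_sketch &operator= (label_text_sketch &&other) noexcept
  {
    if (this != &other)
      {
        if (m_owned)
          std::free (m_buffer);
        m_buffer = other.m_buffer;
        m_owned = other.m_owned;
        other.release ();
      }
    return *this;
  }

  // Copying would cause a double free, so it is disabled.
  label_text_sketch (const label_text_sketch &) = delete;
  label_text_sketch &operator= (const label_text_sketch &) = delete;

  // Borrow BUFFER from a longer-lived owner; it is never freed here.
  static label_text_sketch borrow (const char *buffer)
  {
    return label_text_sketch (const_cast<char *> (buffer), false);
  }

  // Take ownership of a malloc'd BUFFER; it is freed by the destructor.
  static label_text_sketch take (char *buffer)
  {
    return label_text_sketch (buffer, true);
  }

  void release ()
  {
    m_buffer = nullptr;
    m_owned = false;
  }

  const char *get () const { return m_buffer; }
  bool is_owner () const { return m_owned; }

private:
  label_text_sketch (char *buffer, bool owned)
  : m_buffer (buffer), m_owned (owned)
  {}

  char *m_buffer;
  bool m_owned;
};
```

Borrowed text (for example a string literal) is never freed, while taken text is freed exactly once, by whichever object holds it after any moves; deleting the copy operations is what rules out a double free.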

Re: [PATCH] testsuite: Improve check-function-bodies

2024-06-05 Thread Andrew Pinski
On Fri, May 31, 2024 at 9:09 AM Richard Sandiford
 wrote:
>
> Wilco Dijkstra  writes:
> > Improve check-function-bodies by allowing single-character function names.
> > Also skip '#' comments which may be emitted from inline assembler.
> >
> > Passes regress, OK for commit?
> >
> > gcc/testsuite:
> > * lib/scanasm.exp (configure_check-function-bodies): Allow single-char
> > function names.  Skip '#' comments.
> >
> > ---
> >
> > diff --git a/gcc/testsuite/lib/scanasm.exp b/gcc/testsuite/lib/scanasm.exp
> > index 6cf9997240deec274a191103d21690d80e34ba95..0e461ef260b7a6fee5a9c60d0571e46468f752c0 100644
> > --- a/gcc/testsuite/lib/scanasm.exp
> > +++ b/gcc/testsuite/lib/scanasm.exp
> > @@ -869,15 +869,15 @@ proc configure_check-function-bodies { config } {
> >  # Regexp for the start of a function definition (name in \1).
> >  if { [istarget nvptx*-*-*] } {
> >   set up_config(start) {
> > - {^// BEGIN(?: GLOBAL|) FUNCTION DEF: ([a-zA-Z_]\S+)$}
> > + {^// BEGIN(?: GLOBAL|) FUNCTION DEF: ([a-zA-Z_]\S*)$}
> >   }
> >  } elseif { [istarget *-*-darwin*] } {
> >   set up_config(start) {
> > - {^_([a-zA-Z_]\S+):$}
> > + {^_([a-zA-Z_]\S*):$}
> >   {^LFB[0-9]+:}
> >   }
> >  } else {
> > - set up_config(start) {{^([a-zA-Z_]\S+):$}}
> > + set up_config(start) {{^([a-zA-Z_]\S*):$}}
> >  }
> >
> >  # Regexp for the end of a function definition.
>
> This part is ok, thanks.

Note the issue with single function names was recorded as
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111658 (which I just
closed as fixed).

Thanks,
Andrew Pinski

>
> > @@ -899,9 +899,9 @@ proc configure_check-function-bodies { config } {
> >  } else {
> >   # Skip lines beginning with labels ('.L[...]:') or other directives
> >   # ('.align', '.cfi_startproc', '.quad [...]', '.text', etc.), '//' or
> > - # '@' comments ('-fverbose-asm' or ARM-style, for example), or empty
> > - # lines.
> > - set up_config(fluff) {^\s*(?:\.|//|@|$)}
> > + # '@' or '#' comments ('-fverbose-asm' or ARM-style, for example), or
> > + # empty lines.
> > + set up_config(fluff) {^\s*(?:\.|//|@|#|$)}
> >  }
> >
> >  # Regexp for expected output lines prefix.
>
> I think this should be done separately.  It looks like at least
> gcc.target/riscv/target-attr-06.c relies on the current behaviour.
>
> Richard
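A quick way to see what the `\S+` to `\S*` change does is to run the old and new start-of-function patterns through std::regex. This is a sketch assuming the patterns mean the same in ECMAScript syntax as in Tcl's regexp syntax, which should hold here since they only use anchors, a bracket expression, `\s`/`\S` and a non-capturing group; the helper names are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Old and new "start of function" patterns from scanasm.exp (generic case).
const std::regex old_start (R"(^([a-zA-Z_]\S+):$)");
const std::regex new_start (R"(^([a-zA-Z_]\S*):$)");

// Proposed "fluff" pattern: also treats lines starting with '#' as noise.
const std::regex new_fluff (R"(^\s*(?:\.|//|@|#|$))");

// Returns the function name captured by RE, or "" if LINE does not match.
std::string function_name (const std::regex &re, const std::string &line)
{
  std::smatch m;
  if (std::regex_match (line, m, re))
    return m[1].str ();
  return "";
}

// True if LINE would be skipped as a label, directive, comment or blank line.
bool is_fluff (const std::string &line)
{
  return std::regex_search (line, new_fluff);
}
```

With the old pattern a one-letter function such as `f:` is not recognised, because `\S+` must consume at least one character before the colon; with `\S*` it is. The `#` fluff pattern is the part Richard asked to split out, since gcc.target/riscv/target-attr-06.c appears to rely on the current behaviour.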


Re: [PATCH] libstdc++: Implement C++26 features (P2546R5)

2024-06-03 Thread Andrew Pinski
On Mon, Jun 3, 2024 at 9:06 AM Peter0x44  wrote:
>
> 3 Jun 2024 4:14:28 pm Jonathan Wakely :
>
> > On Mon, 3 Jun 2024 at 14:36, Peter0x44 wrote:
> >>> +void
> >>> +std::breakpoint() noexcept
> >>> +{
> >>> +#if _GLIBCXX_HAVE_DEBUGAPI_H && defined(_WIN32) && !defined(__CYGWIN__)
> >>> +  DebugBreak();
> >>> +#elif __has_builtin(__builtin_debugtrap)
> >>> +  __builtin_debugtrap(); // Clang
> >>> +#elif defined(__i386__) || defined(__x86_64__)
> >>> +  __asm__ volatile ("int3");
> >> Worth noting, currently gcc has some bugs around its handling of int3,
> >
> > s/gcc/gdb/ ?
> Yes, my bad
> >
> >> https://sourceware.org/bugzilla/show_bug.cgi?id=31194
> >> int3;nop seems to work around it okay. __builtin_debugtrap() of clang
> >> does run into the same issues.
> >
> > It seemed to work OK for me, maybe because there's no code after it?
> I suspect it won't matter for the tests; the assertion failure only happens
> if you step after hitting the int3. I just figured I'd mention it as a heads
> up. It would affect users writing code.

Note there is a request for adding __builtin_break to GCC:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84595
(and one for __builtin_debugtrap:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99299).
When __builtin_break is added, I think we should use it here when it
is available (with a higher priority than __builtin_debugtrap).

Thanks,
Andrew


> >
> > void breakpoint() {
> > __asm__ volatile ("int3");
> > }
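The selection order suggested above (prefer __builtin_break if it ever exists, then __builtin_debugtrap, then inline int3) can be sketched with __has_builtin. This is a hypothetical helper, not libstdc++ code: SKETCH_HAS_BUILTIN and breakpoint_mechanism are invented names, the helper only reports which mechanism would be chosen, and the int3 variant includes the trailing nop mentioned earlier in the thread as a GDB workaround.

```cpp
#include <cassert>
#include <cstring>

// Wrap __has_builtin so compilers that lack it treat every builtin as absent.
#if defined(__has_builtin)
#  define SKETCH_HAS_BUILTIN(x) __has_builtin(x)
#else
#  define SKETCH_HAS_BUILTIN(x) 0
#endif

// Hypothetical helper: names the trap mechanism a std::breakpoint-style
// function would pick.  It never executes a trap instruction.
const char *breakpoint_mechanism () noexcept
{
#if SKETCH_HAS_BUILTIN(__builtin_break)
  return "__builtin_break";      // requested for GCC in PR84595
#elif SKETCH_HAS_BUILTIN(__builtin_debugtrap)
  return "__builtin_debugtrap";  // Clang; requested for GCC in PR99299
#elif defined(__i386__) || defined(__x86_64__)
  return "int3";                 // inline-asm fallback
#else
  return "none";                 // no known mechanism on this target
#endif
}

#if defined(__i386__) || defined(__x86_64__)
// x86 fallback with "int3; nop" (see sourceware PR 31194 in the thread).
// Deliberately not called here: without a debugger it raises SIGTRAP.
inline void breakpoint_int3_nop () noexcept
{
  __asm__ volatile ("int3\n\tnop");
}
#endif
```

Because an unknown builtin makes __has_builtin evaluate to 0, listing __builtin_break first costs nothing today and would select it automatically if GCC later provides a builtin with that name.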


Re: GCC trouble in dump_printf_loc

2024-06-03 Thread Andrew Pinski via Gcc
On Mon, Jun 3, 2024 at 2:54 AM weizhe wang via Gcc  wrote:
>
> Hi Guys,
>
>
>
>  I have run into some issues while debugging GCC.
>
>  I want to use dump_printf_loc to dump some debug messages in GCC. I found
> that the -fopt-info-all option enables some dump_printf_loc calls.
>
>  But some dump_printf_loc calls can't be enabled by the -fopt-info-all
> option because of the m_scope_depth variable in class dump_context.
>
>  Are there any options that can enable the dump_printf_loc calls that are
> disabled by m_scope_depth?
>
>  I want to enable dump_printf_loc in vect_pattern_recog_1.

`-fopt-info-all-internals`, as documented at
https://gcc.gnu.org/onlinedocs/gcc-14.1.0/gcc/Developer-Options.html#index-fopt-info

Note you can also find the same information in the vect dump file if you
dump it via -fdump-tree-vect-details.

Thanks,
Andrew Pinski


>
>
>
> Thanks
> Sent using https://www.zoho.com/mail/


RE: [PATCH] aarch64: Support multiple variants including up to 3

2024-06-03 Thread Andrew Pinski (QUIC)
> -Original Message-
> From: Andrew Pinski (QUIC) 
> Sent: Saturday, May 4, 2024 2:03 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Andrew Pinski (QUIC) 
> Subject: [PATCH] aarch64: Support multiple variants including up to 3
> 
> On some Qualcomm SoCs that include the oryon-1 core, the
> variant differs between cores due to the big.LITTLE
> configuration. However, the difference between big and little
> is not significant enough to warrant separate cost/scheduling
> models, and the feature set is the same across all variants.
> 
> Also, some SoCs have 3 variants of the core (big.middle.little),
> so this increases the support in the original parsing loop to up
> to 3 cores and 3 variants, but it does not change the maximum of
> 2 different cores.
> 
> After this patch and the patch that adds oryon-1,
> -mcpu=native works on the SoCs I am working with.
> 
> Bootstrapped and tested on aarch64-linux-gnu with no
> regressions.

Ping?
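The new parsing rules can be summarised in a small standalone sketch. The names, the sentinel values and the (part, variant) input format are assumptions for illustration, not the driver's actual code: it records up to three distinct variants and core parts, still rejects more than two core types, and maps "one core type, several variants" to ALL_VARIANTS instead of failing.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical sentinels standing in for INVALID_CORE / ALL_VARIANTS; the
// real values are defined in driver-aarch64.cc and are not reproduced here.
const unsigned k_invalid_core = 0xffffffffu;
const unsigned k_all_variants = 0xfffffffeu;

struct detected_cpu
{
  bool ok;                 // false plays the role of "goto not_found"
  unsigned n_cores;
  unsigned cores[3];
  unsigned n_variants;
  unsigned variants[3];
};

// True if VAL is among the first N entries of ARR.
static bool contains (const unsigned *arr, unsigned n, unsigned val)
{
  for (unsigned i = 0; i < n; i++)
    if (arr[i] == val)
      return true;
  return false;
}

// CPUS holds one (CPU part, CPU variant) pair per /proc/cpuinfo entry.
detected_cpu detect (const std::vector<std::pair<unsigned, unsigned>> &cpus)
{
  detected_cpu d = {};
  for (unsigned i = 0; i < 3; i++)
    {
      d.cores[i] = k_invalid_core;
      d.variants[i] = k_all_variants;
    }

  for (const auto &cpu : cpus)
    {
      // Up to three distinct variants are now recorded (previously two).
      if (!contains (d.variants, d.n_variants, cpu.second))
        {
          if (d.n_variants == 3)
            return d;
          d.variants[d.n_variants++] = cpu.second;
        }
      // Likewise up to three distinct core parts.
      if (!contains (d.cores, d.n_cores, cpu.first))
        {
          if (d.n_cores == 3)
            return d;
          d.cores[d.n_cores++] = cpu.first;
        }
    }

  // At most two different core types are still accepted afterwards.
  if (d.n_cores == 0 || d.n_cores > 2)
    return d;

  // New behaviour: one core type with several variants no longer fails;
  // it is treated as that core with ALL_VARIANTS.
  if (d.n_cores == 1 && d.n_variants != 1)
    d.variants[0] = k_all_variants;

  d.ok = true;
  return d;
}
```

For instance, three CPUs with the same part number but variants 0, 1 and 2 (a big.middle.little layout of one core design) are accepted as that core with all variants, while three different part numbers are still rejected.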

> 
> gcc/ChangeLog:
> 
>   * config/aarch64/driver-aarch64.cc (host_detect_local_cpu): Support
>   3 cores and 3 variants. If there is one core but multiple variants,
>   then treat the variant as being all.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/aarch64/cpunative/info_25: New file.
>   * gcc.target/aarch64/cpunative/info_26: New file.
>   * gcc.target/aarch64/cpunative/native_cpu_25.c: New test.
>   * gcc.target/aarch64/cpunative/native_cpu_26.c: New test.
> 
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/config/aarch64/driver-aarch64.cc  | 14 ++
>  .../gcc.target/aarch64/cpunative/info_25  | 17
>  .../gcc.target/aarch64/cpunative/info_26  | 26 +++
>  .../aarch64/cpunative/native_cpu_25.c | 11 
>  .../aarch64/cpunative/native_cpu_26.c | 11 
>  5 files changed, 74 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/cpunative/info_25
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/cpunative/info_26
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_25.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/cpunative/native_cpu_26.c
> 
> diff --git a/gcc/config/aarch64/driver-aarch64.cc b/gcc/config/aarch64/driver-aarch64.cc
> index b620351e572..abe6e7df7dc 100644
> --- a/gcc/config/aarch64/driver-aarch64.cc
> +++ b/gcc/config/aarch64/driver-aarch64.cc
> @@ -256,9 +256,9 @@ host_detect_local_cpu (int argc, const char **argv)
>bool cpu = false;
>unsigned int i = 0;
>unsigned char imp = INVALID_IMP;
> -  unsigned int cores[2] = { INVALID_CORE, INVALID_CORE };
> +  unsigned int cores[3] = { INVALID_CORE, INVALID_CORE, INVALID_CORE };
>unsigned int n_cores = 0;
> -  unsigned int variants[2] = { ALL_VARIANTS, ALL_VARIANTS };
> +  unsigned int variants[3] = { ALL_VARIANTS, ALL_VARIANTS, ALL_VARIANTS };
>unsigned int n_variants = 0;
>bool processed_exts = false;
>aarch64_feature_flags extension_flags = 0;
> @@ -314,7 +314,7 @@ host_detect_local_cpu (int argc, const char **argv)
> unsigned cvariant = parse_field (buf);
> if (!contains_core_p (variants, cvariant))
>   {
> -  if (n_variants == 2)
> +   if (n_variants == 3)
>  goto not_found;
> 
>variants[n_variants++] = cvariant;
> @@ -326,7 +326,7 @@ host_detect_local_cpu (int argc, const char **argv)
> unsigned ccore = parse_field (buf);
> if (!contains_core_p (cores, ccore))
>   {
> -   if (n_cores == 2)
> +   if (n_cores == 3)
>   goto not_found;
> 
> cores[n_cores++] = ccore;
> @@ -383,11 +383,15 @@ host_detect_local_cpu (int argc, const char **argv)
>/* Weird cpuinfo format that we don't know how to handle.  */
>if (n_cores == 0
>|| n_cores > 2
> -  || (n_cores == 1 && n_variants != 1)
>|| imp == INVALID_IMP
>|| !processed_exts)
>  goto not_found;
> 
> +  /* If we have one core type but multiple variants, consider
> +     that as one variant with ALL_VARIANTS instead.  */
> +  if (n_cores == 1 && n_variants != 1)
> +    variants[0] = ALL_VARIANTS;
> +
>/* Simple case, one core type or just looking for the arch. */
>if (n_cores == 1 || arch)
>  {
> diff --git a/gcc/testsuite/gcc.target/aarch64/cpunative/info_25
> b/gcc/testsuite/gcc.target/aarch64/cpunative/info_25
> new file mode 100644
> index 000..d6e83ccab09
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/cp

Re: [PATCH v7 2/9] Fix pro_and_epilogue for sibcalls at -O0

2024-06-02 Thread Andrew Pinski
On Sun, Jun 2, 2024, 10:24 AM Andi Kleen  wrote:

> Some of the CFG fixups in pro_and_epilogue for sibcalls were dependent on
> "optimize".
> Make them also check cfun->tail_call_marked to handle the -O0 musttail
> case. This fixes the musttail test cases on arm targets.
>
> PR115255
>
> gcc/ChangeLog:
>
> * function.cc (thread_prologue_and_epilogue_insns): Check
>   cfun->tail_call_marked for sibcalls too.
> (rest_of_handle_thread_prologue_and_epilogue): Ditto.
> ---
>  gcc/function.cc | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/function.cc b/gcc/function.cc
> index 4edd4da12474..f949c38b916c 100644
> --- a/gcc/function.cc
> +++ b/gcc/function.cc
> @@ -6261,6 +6261,7 @@ thread_prologue_and_epilogue_insns (void)
>/* Threading the prologue and epilogue changes the artificial refs in the
>   entry and exit blocks, and may invalidate DF info for tail calls.  */
>if (optimize
> +  || cfun->tail_call_marked
>|| flag_optimize_sibling_calls
>|| flag_ipa_icf_functions
>|| in_lto_p)
> @@ -6557,7 +6558,7 @@ rest_of_handle_thread_prologue_and_epilogue (function *fun)
>  {
>/* prepare_shrink_wrap is sensitive to the block structure of the control
>   flow graph, so clean it up first.  */
> -  if (optimize)
> +  if (cfun->tail_call_marked || optimize)
>  cleanup_cfg (0);
>
>/* On some machines, the prologue and epilogue code, or parts thereof,
> @@ -6579,7 +6580,7 @@ rest_of_handle_thread_prologue_and_epilogue (function *fun)
>
>/* Shrink-wrapping can result in unreachable edges in the epilogue,
>   see PR57320.  */
> -  cleanup_cfg (optimize ? CLEANUP_EXPENSIVE : 0);
> +  cleanup_cfg ((cfun->tail_call_marked || optimize) ? CLEANUP_EXPENSIVE : 0);
>

I think this makes -g -O0 with musttail useless. Please make sure that
having a musttail function call in the IR does not change the CFG in the
-O0 case, which keeps extra jumps specifically for debugging.

Thanks,
Andrew



>/* The stack usage info is finalized during prologue expansion.  */
>if (flag_stack_usage_info || flag_callgraph_info)
> --
> 2.44.0
>
>


[gcc r15-938] Fix some opindex for some options [PR115022]

2024-05-31 Thread Andrew Pinski via Gcc-cvs
https://gcc.gnu.org/g:a0d60660f2aae2d79685f73d568facb2397582d8

commit r15-938-ga0d60660f2aae2d79685f73d568facb2397582d8
Author: Andrew Pinski 
Date:   Wed May 29 20:40:31 2024 -0700

Fix some opindex for some options [PR115022]

While looking at the index I noticed that some options had
`-` in front of the index entry, which is wrong. Then I
noticed there was no index entry for `mcmodel=` on some targets,
or that `-mcmodel` had been used incorrectly.

This fixes both of those and regenerates the urls files so that
the `-mcmodel=` option now has a URL associated with it.

gcc/ChangeLog:

PR target/115022
* doc/invoke.texi (fstrub=disable): Fix opindex.
(minline-memops-threshold): Fix opindex.
(mcmodel=): Add opindex and fix them.
* common.opt.urls: Regenerate.
* config/aarch64/aarch64.opt.urls: Regenerate.
* config/bpf/bpf.opt.urls: Regenerate.
* config/i386/i386.opt.urls: Regenerate.
* config/loongarch/loongarch.opt.urls: Regenerate.
* config/nds32/nds32-elf.opt.urls: Regenerate.
* config/nds32/nds32-linux.opt.urls: Regenerate.
* config/or1k/or1k.opt.urls: Regenerate.
* config/riscv/riscv.opt.urls: Regenerate.
* config/rs6000/aix64.opt.urls: Regenerate.
* config/rs6000/linux64.opt.urls: Regenerate.
* config/sparc/sparc.opt.urls: Regenerate.

Signed-off-by: Andrew Pinski 
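The regenerated .opt.urls entries point at anchors generated from each @opindex entry. The following hypothetical helper reproduces the anchors visible in this diff; it is a sketch under that assumption only, and texi2any's real escaping and de-duplication rules are not reproduced in full.

```cpp
#include <cassert>
#include <cctype>
#include <cstdio>
#include <string>

// Hypothetical helper: rebuilds the anchor names seen in the regenerated
// *.opt.urls files.  Letters, digits and '-' are kept; any other byte becomes
// '_' plus four lowercase hex digits ('=' -> "_003d").  A nonzero DUP adds a
// "-DUP" suffix, as in "index-mcmodel_003d-7" for a repeated index entry.
std::string index_anchor (const std::string &opindex_text, int dup = 0)
{
  std::string anchor = "index-";
  for (unsigned char c : opindex_text)
    {
      if (std::isalnum (c) || c == '-')
        anchor += static_cast<char> (c);
      else
        {
          char buf[8];
          std::snprintf (buf, sizeof buf, "_%04x", static_cast<unsigned> (c));
          anchor += buf;
        }
    }
  if (dup > 0)
    anchor += "-" + std::to_string (dup);
  return anchor;
}
```

Under this scheme an @opindex written as `-fstrub=disable` would produce `index--fstrub_003ddisable`, which no longer corresponds to the option name `fstrub=disable`; that is consistent with such options having had no URL before the fix.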

Diff:
---
 gcc/common.opt.urls |  3 +++
 gcc/config/aarch64/aarch64.opt.urls |  3 ++-
 gcc/config/bpf/bpf.opt.urls |  3 +++
 gcc/config/i386/i386.opt.urls   |  3 ++-
 gcc/config/loongarch/loongarch.opt.urls |  2 +-
 gcc/config/nds32/nds32-elf.opt.urls |  2 +-
 gcc/config/nds32/nds32-linux.opt.urls   |  2 +-
 gcc/config/or1k/or1k.opt.urls   |  3 ++-
 gcc/config/riscv/riscv.opt.urls |  3 ++-
 gcc/config/rs6000/aix64.opt.urls|  3 ++-
 gcc/config/rs6000/linux64.opt.urls  |  3 ++-
 gcc/config/sparc/sparc.opt.urls |  2 +-
 gcc/doc/invoke.texi | 17 +++--
 13 files changed, 33 insertions(+), 16 deletions(-)

diff --git a/gcc/common.opt.urls b/gcc/common.opt.urls
index 10462e40874..1f2eb67c8e0 100644
--- a/gcc/common.opt.urls
+++ b/gcc/common.opt.urls
@@ -1339,6 +1339,9 @@ UrlSuffix(gcc/Optimize-Options.html#index-fstrict-aliasing)
 fstrict-overflow
 UrlSuffix(gcc/Code-Gen-Options.html#index-fstrict-overflow)
 
+fstrub=disable
+UrlSuffix(gcc/Instrumentation-Options.html#index-fstrub_003ddisable)
+
 fstrub=strict
 UrlSuffix(gcc/Instrumentation-Options.html#index-fstrub_003dstrict)
 
diff --git a/gcc/config/aarch64/aarch64.opt.urls b/gcc/config/aarch64/aarch64.opt.urls
index 993634c52f8..4fa90384378 100644
--- a/gcc/config/aarch64/aarch64.opt.urls
+++ b/gcc/config/aarch64/aarch64.opt.urls
@@ -18,7 +18,8 @@ UrlSuffix(gcc/AArch64-Options.html#index-mfix-cortex-a53-843419)
 mlittle-endian
 UrlSuffix(gcc/AArch64-Options.html#index-mlittle-endian)
 
-; skipping UrlSuffix for 'mcmodel=' due to finding no URLs
+mcmodel=
+UrlSuffix(gcc/AArch64-Options.html#index-mcmodel_003d)
 
 mtp=
 UrlSuffix(gcc/AArch64-Options.html#index-mtp)
diff --git a/gcc/config/bpf/bpf.opt.urls b/gcc/config/bpf/bpf.opt.urls
index 8c1e5f86d5c..1e8873a899f 100644
--- a/gcc/config/bpf/bpf.opt.urls
+++ b/gcc/config/bpf/bpf.opt.urls
@@ -33,3 +33,6 @@ UrlSuffix(gcc/eBPF-Options.html#index-msmov)
 mcpu=
 UrlSuffix(gcc/eBPF-Options.html#index-mcpu-5)
 
+minline-memops-threshold=
+UrlSuffix(gcc/eBPF-Options.html#index-minline-memops-threshold)
+
diff --git a/gcc/config/i386/i386.opt.urls b/gcc/config/i386/i386.opt.urls
index 40e8a844936..9384b0b3187 100644
--- a/gcc/config/i386/i386.opt.urls
+++ b/gcc/config/i386/i386.opt.urls
@@ -40,7 +40,8 @@ UrlSuffix(gcc/x86-Options.html#index-march-16)
 mlarge-data-threshold=
 UrlSuffix(gcc/x86-Options.html#index-mlarge-data-threshold)
 
-; skipping UrlSuffix for 'mcmodel=' due to finding no URLs
+mcmodel=
+UrlSuffix(gcc/x86-Options.html#index-mcmodel_003d-7)
 
 mcpu=
 UrlSuffix(gcc/x86-Options.html#index-mcpu-14)
diff --git a/gcc/config/loongarch/loongarch.opt.urls b/gcc/config/loongarch/loongarch.opt.urls
index 9ed5d7b5596..f7545f65103 100644
--- a/gcc/config/loongarch/loongarch.opt.urls
+++ b/gcc/config/loongarch/loongarch.opt.urls
@@ -58,7 +58,7 @@ mrecip
 UrlSuffix(gcc/LoongArch-Options.html#index-mrecip)
 
 mcmodel=
-UrlSuffix(gcc/LoongArch-Options.html#index-mcmodel)
+UrlSuffix(gcc/LoongArch-Options.html#index-mcmodel_003d-1)
 
 mdirect-extern-access
 UrlSuffix(gcc/LoongArch-Options.html#index-mdirect-extern-access)
diff --git a/gcc/config/nds32/nds32-elf.opt.urls b/gcc/config/nds32/nds32-elf.opt.urls
index 3ae1efe7312..e5432b62863 100644
--- a/gcc/config/nds32/nds32-elf.opt.urls
+++ b/gcc/config/nds32/nds32-elf.opt.urls
@@ -1,5 +1,5 @@
 ; Autogenerated by regenerate-opt-urls.py from gcc/config/nds32

Re: [PATCH] gimple ssa: Teach switch conversion to optimize powers of 2 switches

2024-05-30 Thread Andrew Pinski
On Thu, May 30, 2024 at 5:09 AM Filip Kastl  wrote:
>
> Hi,
>
> This patch adds a transformation into the switch conversion pass --
> the "exponential index transform".  This transformation can help switch
> conversion convert switches it otherwise could not.  The transformation is
> intended for switches whose cases are all powers of 2.  Here is a more
> detailed description of what this patch addresses and how:
>
>
> The problem
> ---
>
> Consider this piece of C code
>
> switch (i)
>   {
> case (1 << 0): return 0;
> case (1 << 1): return 1;
> case (1 << 2): return 2;
> ...
> case (1 << 30): return 30;
> default: return 31;
>   }
>
> If i is a power of 2 (2^k), the switch just returns the exponent (k).  This can
> be seen as taking the base 2 logarithm of i or as returning the position of the
> single 1 bit in i.
>
> Currently, GCC fails to see this and doesn't optimize the switch in any way.
>
> Switch conversion is able to optimize similar switches to an array lookup.
> This is not possible here because the range of cases is too wide.  Another
> optimization that switch conversion is able to do is the "linear
> transformation" -- switch conversion is able to notice a linear relationship
> between the index variable (variable i in this case) and the result value and
> rewrite the switch to just an assignment (or multiple assignments in case of
> multiple result values). Sadly, linear transformation isn't applicable here
> because the linear relationship is based on the exponent of i, not on i itself.
>
>
> The solution
> ---
>
> The exponential index transformation does the following.  When it recognises
> that a switch only has case numbers that are powers of 2, it replaces them with
> their exponents.  It also replaces the index variable by its exponent.  This is
> done by inserting a statement that takes the logarithm of i and using the
> result as the new index variable.  Actually we use the FFS operation for this
> -- since we expect a power of two, we may just ask for the position of the
> first 1 bit.
>
> We also need to insert a conditional that checks at runtime that the index
> variable is a power of two.  If it isn't, the resulting value should just be
> the default case value (31 in the example above).
>
> With exponential index transform, switch conversion is able to simplify the
> above example into something like this
>
> if (i is power of 2)
>   return log2(i); // actually implemented as ffs(i) - 1
> else
>   return 31;
>
> Switch conversion bails if the range of case numbers is too big.  Exponential
> index transform shrinks this range (exponentially).  So even if there is no
> linear relationship in the switch, exponential index transform can still help
> convert the switch at least to an array lookup.
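The before/after shapes described above can be compared directly in C++. This is a hand-written sketch, assuming GCC or Clang (__builtin_popcount and __builtin_ffs); the pass itself works on GIMPLE and only fires when the target supports the POPCOUNT and FFS internal functions, so the function names here are purely illustrative.

```cpp
#include <cassert>

// The switch from the cover letter, written out in full: 1 << k maps to k
// for k = 0..30, and every other value goes to the default (31).
int switch_original (int i)
{
  switch (i)
    {
    case 1 << 0: return 0;   case 1 << 1: return 1;   case 1 << 2: return 2;   case 1 << 3: return 3;
    case 1 << 4: return 4;   case 1 << 5: return 5;   case 1 << 6: return 6;   case 1 << 7: return 7;
    case 1 << 8: return 8;   case 1 << 9: return 9;   case 1 << 10: return 10; case 1 << 11: return 11;
    case 1 << 12: return 12; case 1 << 13: return 13; case 1 << 14: return 14; case 1 << 15: return 15;
    case 1 << 16: return 16; case 1 << 17: return 17; case 1 << 18: return 18; case 1 << 19: return 19;
    case 1 << 20: return 20; case 1 << 21: return 21; case 1 << 22: return 22; case 1 << 23: return 23;
    case 1 << 24: return 24; case 1 << 25: return 25; case 1 << 26: return 26; case 1 << 27: return 27;
    case 1 << 28: return 28; case 1 << 29: return 29; case 1 << 30: return 30;
    default: return 31;
    }
}

// Hand-written equivalent of the transformed code: check for a power of two
// with popcount, then use ffs (i) - 1, i.e. the exponent, as the new index.
int switch_transformed (int i)
{
  unsigned u = static_cast<unsigned> (i);
  if (__builtin_popcount (u) != 1)
    return 31;                      // not a power of two: default case
  int exp = __builtin_ffs (i) - 1;  // 0-based position of the single set bit
  // The switch on EXP has cases 0..30 returning EXP itself, which linear
  // transformation turns into "return exp"; exp == 31 (i == INT_MIN) is not a
  // case and takes the default, which happens to be 31 as well.
  return exp;
}
```

Both functions agree on every input, including 0, negative values and INT_MIN (1 << 31 as an unsigned bit pattern), whose exponent 31 is not a case label and therefore falls to the default.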
>
>
> Limitations
> ---
>
> Currently we only run the exponential index transform if the target has the
> POPCOUNT (for checking a number is a power of 2) and FFS (for taking the
> logarithm) instructions -- we check direct_internal_fn_supported_p () for
> POPCOUNT and FFS internal functions.  Otherwise, computing FFS could be
> less efficient than just using a jump table, and we try to avoid transforming a
> switch into a less efficient form.  Maybe this is too conservative and could be
> tweaked in the future.
>
>
> Bootstrapped and regtested on x86_64 linux.  I have additionally run bootstrap
> and regtest on a version where I removed the check that the target has the
> POPCOUNT and FFS instructions so that the transformation would be triggered
> more often.  That testing also went well.
>
> Are there any things I should tweak?  Or is the patch ready to be applied?
>
> Cheers,
> Filip Kastl
>
>
> -- 8< --
>
>
> Sometimes a switch has case numbers that are powers of 2.  Switch
> conversion usually isn't able to optimize such switches.  This patch adds
> "exponential index transformation" to switch conversion.  After switch
> conversion applies this transformation to the switch, the index variable
> of the switch becomes the exponent instead of the whole value.  For
> example:
>
> switch (i)
>   {
> case (1 << 0): return 0;
> case (1 << 1): return 1;
> case (1 << 2): return 2;
> ...
> case (1 << 30): return 30;
> default: return 31;
>   }
>
> gets transformed roughly into
>
> switch (log2(i))
>   {
> case 0: return 0;
> case 1: return 1;
> case 2: return 2;
> ...
> case 30: return 30;
> default: return 31;
>   }
>
> This enables switch conversion to further optimize the switch.
>
> This patch only enables this transformation if there are optabs for
> POPCOUNT and FFS so that the base 2 logarithm can be computed
> efficiently at runtime.
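The transformation also helps when the results are not linear in the exponent, because shrinking the case range makes an array lookup feasible. The following is a hypothetical example with hand-picked values, again assuming GCC/Clang builtins; the table is written by hand here rather than generated by the pass.

```cpp
#include <array>
#include <cassert>

// A switch whose case values are powers of two spread over a huge range
// (1 .. 2^30) and whose results have no linear relationship to anything.
int lookup_original (int i)
{
  switch (i)
    {
    case 1 << 0:  return 5;
    case 1 << 8:  return 11;
    case 1 << 16: return 2;
    case 1 << 24: return 8;
    case 1 << 30: return 13;
    default:      return -1;
    }
}

// After replacing I by its exponent, the case range is only 0..30, small
// enough for a 31-entry table (default value in the unused slots).
int lookup_transformed (int i)
{
  static const std::array<int, 31> table = [] {
    std::array<int, 31> t{};
    t.fill (-1);
    t[0] = 5; t[8] = 11; t[16] = 2; t[24] = 8; t[30] = 13;
    return t;
  } ();
  unsigned u = static_cast<unsigned> (i);
  if (__builtin_popcount (u) != 1)
    return -1;                          // not a power of two: default
  int exp = __builtin_ffs (i) - 1;      // 0..31
  return exp < 31 ? table[exp] : -1;    // exponent 31 is not a case: default
}
```

The original case values span 1 to 2^30, far too wide for a table, but their exponents fit in 31 slots.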
>
> gcc/ChangeLog:
>
> * tree-switch-conversion.cc (switch_conversion::switch_conversion):
> Track if the transformation happened.
> (switch_conversion::is_exp_index_transform_viable): New function
> to decide 

[PATCH] Fix some opindex for some options [PR115022]

2024-05-29 Thread Andrew Pinski
While looking at the index I noticed that some options had
`-` in front of the index entry, which is wrong. Then I
noticed there was no index entry for `mcmodel=` on some targets,
or that `-mcmodel` had been used incorrectly.

This fixes both of those and regenerates the urls files so that
the `-mcmodel=` option now has a URL associated with it.

OK?

gcc/ChangeLog:

PR target/115022
* doc/invoke.texi (fstrub=disable): Fix opindex.
(minline-memops-threshold): Fix opindex.
(mcmodel=): Add opindex and fix them.
* common.opt.urls: Regenerate.
* config/aarch64/aarch64.opt.urls: Regenerate.
* config/bpf/bpf.opt.urls: Regenerate.
* config/i386/i386.opt.urls: Regenerate.
* config/loongarch/loongarch.opt.urls: Regenerate.
* config/nds32/nds32-elf.opt.urls: Regenerate.
* config/nds32/nds32-linux.opt.urls: Regenerate.
* config/or1k/or1k.opt.urls: Regenerate.
* config/riscv/riscv.opt.urls: Regenerate.
* config/rs6000/aix64.opt.urls: Regenerate.
* config/rs6000/linux64.opt.urls: Regenerate.
* config/sparc/sparc.opt.urls: Regenerate.

Signed-off-by: Andrew Pinski 
---
 gcc/common.opt.urls |  3 +++
 gcc/config/aarch64/aarch64.opt.urls |  3 ++-
 gcc/config/bpf/bpf.opt.urls |  3 +++
 gcc/config/i386/i386.opt.urls   |  3 ++-
 gcc/config/loongarch/loongarch.opt.urls |  2 +-
 gcc/config/nds32/nds32-elf.opt.urls |  2 +-
 gcc/config/nds32/nds32-linux.opt.urls   |  2 +-
 gcc/config/or1k/or1k.opt.urls   |  3 ++-
 gcc/config/riscv/riscv.opt.urls |  3 ++-
 gcc/config/rs6000/aix64.opt.urls|  3 ++-
 gcc/config/rs6000/linux64.opt.urls  |  3 ++-
 gcc/config/sparc/sparc.opt.urls |  2 +-
 gcc/doc/invoke.texi | 17 +++--
 13 files changed, 33 insertions(+), 16 deletions(-)

diff --git a/gcc/common.opt.urls b/gcc/common.opt.urls
index 10462e40874..1f2eb67c8e0 100644
--- a/gcc/common.opt.urls
+++ b/gcc/common.opt.urls
@@ -1339,6 +1339,9 @@ UrlSuffix(gcc/Optimize-Options.html#index-fstrict-aliasing)
 fstrict-overflow
 UrlSuffix(gcc/Code-Gen-Options.html#index-fstrict-overflow)
 
+fstrub=disable
+UrlSuffix(gcc/Instrumentation-Options.html#index-fstrub_003ddisable)
+
 fstrub=strict
 UrlSuffix(gcc/Instrumentation-Options.html#index-fstrub_003dstrict)
 
diff --git a/gcc/config/aarch64/aarch64.opt.urls b/gcc/config/aarch64/aarch64.opt.urls
index 993634c52f8..4fa90384378 100644
--- a/gcc/config/aarch64/aarch64.opt.urls
+++ b/gcc/config/aarch64/aarch64.opt.urls
@@ -18,7 +18,8 @@ UrlSuffix(gcc/AArch64-Options.html#index-mfix-cortex-a53-843419)
 mlittle-endian
 UrlSuffix(gcc/AArch64-Options.html#index-mlittle-endian)
 
-; skipping UrlSuffix for 'mcmodel=' due to finding no URLs
+mcmodel=
+UrlSuffix(gcc/AArch64-Options.html#index-mcmodel_003d)
 
 mtp=
 UrlSuffix(gcc/AArch64-Options.html#index-mtp)
diff --git a/gcc/config/bpf/bpf.opt.urls b/gcc/config/bpf/bpf.opt.urls
index 8c1e5f86d5c..1e8873a899f 100644
--- a/gcc/config/bpf/bpf.opt.urls
+++ b/gcc/config/bpf/bpf.opt.urls
@@ -33,3 +33,6 @@ UrlSuffix(gcc/eBPF-Options.html#index-msmov)
 mcpu=
 UrlSuffix(gcc/eBPF-Options.html#index-mcpu-5)
 
+minline-memops-threshold=
+UrlSuffix(gcc/eBPF-Options.html#index-minline-memops-threshold)
+
diff --git a/gcc/config/i386/i386.opt.urls b/gcc/config/i386/i386.opt.urls
index 40e8a844936..9384b0b3187 100644
--- a/gcc/config/i386/i386.opt.urls
+++ b/gcc/config/i386/i386.opt.urls
@@ -40,7 +40,8 @@ UrlSuffix(gcc/x86-Options.html#index-march-16)
 mlarge-data-threshold=
 UrlSuffix(gcc/x86-Options.html#index-mlarge-data-threshold)
 
-; skipping UrlSuffix for 'mcmodel=' due to finding no URLs
+mcmodel=
+UrlSuffix(gcc/x86-Options.html#index-mcmodel_003d-7)
 
 mcpu=
 UrlSuffix(gcc/x86-Options.html#index-mcpu-14)
diff --git a/gcc/config/loongarch/loongarch.opt.urls b/gcc/config/loongarch/loongarch.opt.urls
index 9ed5d7b5596..f7545f65103 100644
--- a/gcc/config/loongarch/loongarch.opt.urls
+++ b/gcc/config/loongarch/loongarch.opt.urls
@@ -58,7 +58,7 @@ mrecip
 UrlSuffix(gcc/LoongArch-Options.html#index-mrecip)
 
 mcmodel=
-UrlSuffix(gcc/LoongArch-Options.html#index-mcmodel)
+UrlSuffix(gcc/LoongArch-Options.html#index-mcmodel_003d-1)
 
 mdirect-extern-access
 UrlSuffix(gcc/LoongArch-Options.html#index-mdirect-extern-access)
diff --git a/gcc/config/nds32/nds32-elf.opt.urls b/gcc/config/nds32/nds32-elf.opt.urls
index 3ae1efe7312..e5432b62863 100644
--- a/gcc/config/nds32/nds32-elf.opt.urls
+++ b/gcc/config/nds32/nds32-elf.opt.urls
@@ -1,5 +1,5 @@
 ; Autogenerated by regenerate-opt-urls.py from gcc/config/nds32/nds32-elf.opt and generated HTML
 
 mcmodel=
-UrlSuffix(gcc/NDS32-Options.html#index-mcmodel-1)
+UrlSuffix(gcc/NDS32-Options.html#index-mcmodel_003d-2)
 
diff --git a/gcc/config/nds32/nds32-linux.opt.urls b/gcc/config/nds32/nds32-linux.opt.urls
index ac589ccd472..3986cf225ef 100644
--- a/gcc/config/nds32/nds32-linux.opt.urls
+++ b/gcc/config/nds32
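The regenerated anchors above spell `=` as `_003d` (for example `mcmodel=` becomes `index-mcmodel_003d` and `fstrub=disable` becomes `index-fstrub_003ddisable`). The sketch below reproduces only that observed spelling, under the assumption that ASCII letters, digits and `-` pass through unchanged and every other byte becomes `_00` plus two hex digits. `encode_option_anchor` is a hypothetical helper, not part of regenerate-opt-urls.py or texinfo, and it ignores the numeric suffixes (such as `-7`) that appear to disambiguate repeated index entries.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: spell an option name the way the UrlSuffix anchors
   in this patch appear to, e.g. "mcmodel=" -> "mcmodel_003d".
   Assumption: letters, digits and '-' are kept; any other byte becomes
   "_00" followed by its two-digit lowercase hex value.
   Returns 0 on success, -1 if OUT_SIZE is too small.  */
static int
encode_option_anchor (const char *opt, char *out, size_t out_size)
{
  size_t len = 0;

  if (out_size == 0)
    return -1;

  for (const unsigned char *p = (const unsigned char *) opt; *p; p++)
    {
      char piece[8];
      int keep = (*p >= 'a' && *p <= 'z') || (*p >= 'A' && *p <= 'Z')
                 || (*p >= '0' && *p <= '9') || *p == '-';

      if (keep)
        {
          piece[0] = (char) *p;
          piece[1] = '\0';
        }
      else
        snprintf (piece, sizeof piece, "_00%02x", (unsigned int) *p);

      size_t n = strlen (piece);
      if (len + n + 1 > out_size)   /* keep room for the terminator */
        return -1;
      memcpy (out + len, piece, n);
      len += n;
    }

  out[len] = '\0';
  return 0;
}
```

For instance, `encode_option_anchor ("mcmodel=", buf, sizeof buf)` leaves `mcmodel_003d` in `buf`, matching the suffix of the aarch64 anchor above.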

Re: Is fcommon related with performance optimization logic?

2024-05-29 Thread Andrew Pinski via Gcc
On Wed, May 29, 2024 at 7:13 PM 赵海峰 via Gcc  wrote:
>
> Dear Sir/Madam,
>
>
> We found that UnixBench running on Intel SPR performs worse on the
> dhry2reg benchmark when compiled with gcc 10.3 than with gcc 8.5.
>
>
> I found that it is related to the -fcommon option, which is disabled by
> default in 10.3. -fcommon places global variable addresses in a particular
> order in the bss section (as seen with nm -n), regardless of the order in
> which they are defined in the source code.
>
>
> We are wondering whether -fcommon involves some special performance
> optimization.
>
>
> (I also posted this to gcc-help. I hope to get some suggestions on this
> mailing list. Sorry for the bother.)

This was already filed as
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114532 . But someone
needs to go in and analyze in more detail what is going wrong. The
biggest difference for x86_64 is how the variables are laid out and by
whom (the compiler or the linker).  There is some indication that
-fno-common increases the number of L1-dcache-load-misses, which
points to differences in variable layout causing the performance
difference. But nobody has yet checked which variables are laid out
differently and why. I suspect that small changes in the
code or variables would cause layout differences, which in turn cause
cache misses and hence the performance difference, almost entirely by
accident.
I suspect adding -fdata-sections would cause another performance
difference here too. And there is not much GCC can do about this, since
getting the best performance from data layout in every case is "hard".

Thanks,
Andrew Pinski

>
>
> Best regards.
>
>
> Clark Zhao
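One way to follow up on the layout theory is to check whether hot globals share a cache line under -fcommon versus -fno-common, for example by printing their addresses or reading `nm -n` output. A minimal sketch of such a check follows. It assumes a 64-byte cache line (typical of current x86_64 parts, but verify for your target), and `same_cache_line`, `report_layout` and the two globals are hypothetical names, not taken from dhry2reg.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed cache-line size; not queried from the hardware.  */
enum { ASSUMED_CACHE_LINE = 64 };

/* Hypothetical helper: do the first bytes of A and B sit in the same
   cache line?  */
static bool
same_cache_line (const void *a, const void *b)
{
  return (uintptr_t) a / ASSUMED_CACHE_LINE
         == (uintptr_t) b / ASSUMED_CACHE_LINE;
}

/* Two uninitialized globals (tentative definitions): with -fcommon they
   are emitted as common symbols and placed by the linker; with
   -fno-common the compiler places them in .bss itself.  */
int hot_counter;
int hot_flag;

/* Print where the two globals ended up in this particular build.  */
static void
report_layout (void)
{
  printf ("hot_counter=%p hot_flag=%p same line: %s\n",
          (void *) &hot_counter, (void *) &hot_flag,
          same_cache_line (&hot_counter, &hot_flag) ? "yes" : "no");
}
```

Building the same file once with -fcommon and once with -fno-common and comparing the `report_layout` output is a cheap way to see whether two variables moved relative to each other; it says nothing by itself about which layout is faster.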


[gcc r15-908] match: Add support for `a ^ CST` to bitwise_inverted_equal_p [PR115224]

2024-05-29 Thread Andrew Pinski via Gcc-cvs
https://gcc.gnu.org/g:547143df5aa0960fb149a26933dad7ca1c363afb

commit r15-908-g547143df5aa0960fb149a26933dad7ca1c363afb
Author: Andrew Pinski 
Date:   Sun May 26 17:38:37 2024 -0700

match: Add support for `a ^ CST` to bitwise_inverted_equal_p [PR115224]

While looking into something else, I noticed that `a ^ CST` needed to be
special-cased in bitwise_inverted_equal_p, since its bitwise not
simplifies to `a ^ ~CST`.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/115224

gcc/ChangeLog:

* generic-match-head.cc (bitwise_inverted_equal_p): Add `a ^ CST`
case.
* gimple-match-head.cc (gimple_bit_xor_cst): New declaration.
(gimple_bitwise_inverted_equal_p): Add `a ^ CST` case.
* match.pd (bit_xor_cst): New match.
(maybe_bit_not): Add bit_xor_cst case.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/bitops-8.c: New test.

Signed-off-by: Andrew Pinski 

Diff:
---
 gcc/generic-match-head.cc| 10 ++
 gcc/gimple-match-head.cc | 13 +
 gcc/match.pd |  4 
 gcc/testsuite/gcc.dg/tree-ssa/bitops-8.c | 15 +++
 4 files changed, 42 insertions(+)

diff --git a/gcc/generic-match-head.cc b/gcc/generic-match-head.cc
index 55ba369c6b3..641d8e9b2de 100644
--- a/gcc/generic-match-head.cc
+++ b/gcc/generic-match-head.cc
@@ -158,6 +158,16 @@ bitwise_inverted_equal_p (tree expr1, tree expr2, bool &wascmp)
   if (TREE_CODE (expr2) == BIT_NOT_EXPR
   && bitwise_equal_p (expr1, TREE_OPERAND (expr2, 0)))
 return true;
+
+  /* `X ^ CST` and `X ^ ~CST` match for ~. */
+  if (TREE_CODE (expr1) == BIT_XOR_EXPR && TREE_CODE (expr2) == BIT_XOR_EXPR
+  && bitwise_equal_p (TREE_OPERAND (expr1, 0), TREE_OPERAND (expr2, 0)))
+{
+  tree cst1 = uniform_integer_cst_p (TREE_OPERAND (expr1, 1));
+  tree cst2 = uniform_integer_cst_p (TREE_OPERAND (expr2, 1));
+  if (cst1 && cst2 && wi::to_wide (cst1) == ~wi::to_wide (cst2))
+   return true;
+}
   if (COMPARISON_CLASS_P (expr1)
   && COMPARISON_CLASS_P (expr2))
 {
diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
index 6220725b259..e26fa0860ee 100644
--- a/gcc/gimple-match-head.cc
+++ b/gcc/gimple-match-head.cc
@@ -283,6 +283,7 @@ gimple_bitwise_equal_p (tree expr1, tree expr2, tree (*valueize) (tree))
 
 bool gimple_bit_not_with_nop (tree, tree *, tree (*) (tree));
 bool gimple_maybe_cmp (tree, tree *, tree (*) (tree));
+bool gimple_bit_xor_cst (tree, tree *, tree (*) (tree));
 
 /* Helper function for bitwise_inverted_equal_p macro.  */
 
@@ -301,6 +302,18 @@ gimple_bitwise_inverted_equal_p (tree expr1, tree expr2, bool &wascmp, tree (*va
   if (operand_equal_p (expr1, expr2, 0))
 return false;
 
+  tree xor1[2];
+  tree xor2[2];
+  /* `X ^ CST` and `X ^ ~CST` match for ~. */
+  if (gimple_bit_xor_cst (expr1, xor1, valueize)
+  && gimple_bit_xor_cst (expr2, xor2, valueize))
+{
+  if (operand_equal_p (xor1[0], xor2[0], 0)
+ && (wi::to_wide (uniform_integer_cst_p (xor1[1]))
+ == ~wi::to_wide (uniform_integer_cst_p (xor2[1]))))
+   return true;
+}
+
   tree other;
   /* Try if EXPR1 was defined as ~EXPR2. */
   if (gimple_bit_not_with_nop (expr1, &other, valueize))
diff --git a/gcc/match.pd b/gcc/match.pd
index 090ad4e08b0..480e36bbbaf 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -174,6 +174,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (match (bit_not_with_nop @0)
  (convert (bit_not @0))
  (if (tree_nop_conversion_p (type, TREE_TYPE (@0)
+(match (bit_xor_cst @0 @1)
+ (bit_xor @0 uniform_integer_cst_p@1))
 (for cmp (tcc_comparison)
  (match (maybe_cmp @0)
   (cmp@0 @1 @2))
@@ -195,6 +197,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (INTEGER_CST@0))
 (match (maybe_bit_not @0)
  (maybe_cmp@0 @1))
+(match (maybe_bit_not @0)
+ (bit_xor_cst@0 @1 @2))
 
 /* Transform likes of (char) ABS_EXPR <(int) x> into (char) ABSU_EXPR <x>
ABSU_EXPR returns unsigned absolute value of the operand and the operand
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/bitops-8.c b/gcc/testsuite/gcc.dg/tree-ssa/bitops-8.c
new file mode 100644
index 000..40f756e4455
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/bitops-8.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized-raw" } */
+/* PR tree-optimization/115224 */
+
+int f1(int a, int b)
+{
+a = a ^ 1;
+int c = ~a;
+return c | (a ^ b);
+// ~((a ^ 1) & b) or (a ^ -2) | ~b
+}
+/* { dg-final { scan-tree-dump-times   "bit_xor_expr, "  1  "optimized" } } */
+/* { dg-final { scan-tree-dump-times   "bit_ior_expr, "  1  "optimized" } } */
+/* { dg-final { scan-tree-dump-times   "bit_not_expr, "  1  "optimized" } } */
+
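The new test relies on the identity behind this patch: for any `x` and constant `c`, `~(x ^ c) == x ^ ~c`, so `~(a ^ 1)` is `a ^ -2` (since `~1 == -2` in two's complement). As a sanity check that is not part of the patch, the sketch below brute-forces that identity and compares `f1` from bitops-8.c with the one-XOR, one-IOR, one-NOT form that the dump scans expect; `f1_folded` and `check_bitops8` are illustrative names.

```c
#include <stdbool.h>

/* f1 from bitops-8.c as written.  */
static int
f1_source (int a, int b)
{
  a = a ^ 1;
  int c = ~a;
  return c | (a ^ b);
}

/* The form the dump scans point to: (a ^ -2) | ~b, i.e. ~((a ^ 1) & b).  */
static int
f1_folded (int a, int b)
{
  return (a ^ -2) | ~b;
}

/* The identity the patch teaches bitwise_inverted_equal_p:
   ~(x ^ c) == x ^ ~c.  */
static bool
xor_cst_inverts (int x, int c)
{
  return ~(x ^ c) == (x ^ ~c);
}

/* Brute-force both claims over [LO, HI] x [LO, HI].  */
static bool
check_bitops8 (int lo, int hi)
{
  for (int a = lo; a <= hi; a++)
    for (int b = lo; b <= hi; b++)
      if (f1_source (a, b) != f1_folded (a, b) || !xor_cst_inverts (a, b))
        return false;
  return true;
}
```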


[gcc r15-907] Match: Add maybe_bit_not instead of plain matching

2024-05-29 Thread Andrew Pinski via Gcc-cvs
https://gcc.gnu.org/g:0a9154d154957b21eb2c9e4fbe9869e50fb9742f

commit r15-907-g0a9154d154957b21eb2c9e4fbe9869e50fb9742f
Author: Andrew Pinski 
Date:   Sat May 25 23:29:48 2024 -0700

Match: Add maybe_bit_not instead of plain matching

While working on adding matching of negative expressions of `a - b`,
I noticed that we started to have "duplicated" patterns because there
was no way to match expressions that might be negated. So I went back to
what I did for bit_not and decided to improve the situation there: for
patterns with 2 operands where one could be a bit_not, add back
maybe_bit_not.
This does not add maybe_bit_not in every place where bitwise_inverted_equal_p
is used, just the ones where the 2 operands of an expression could be swapped.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* match.pd (bit_not_with_nop): Unconditionalize.
(maybe_cmp): Likewise.
(maybe_bit_not): New match pattern.
(`~X & X`): Use maybe_bit_not and add `:c` back.
(`~x ^ x`/`~x | x`): Likewise.

Signed-off-by: Andrew Pinski 

Diff:
---
 gcc/match.pd | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index 024e3350465..090ad4e08b0 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -167,7 +167,6 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   TYPE_VECTOR_SUBPARTS (TREE_TYPE (@0)))
   && tree_nop_conversion_p (TREE_TYPE (type), TREE_TYPE (TREE_TYPE (@0))
 
-#if GIMPLE
 /* These are used by gimple_bitwise_inverted_equal_p to simplify
detection of BIT_NOT and comparisons. */
 (match (bit_not_with_nop @0)
@@ -188,7 +187,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (bit_xor@0 @1 @2)
  (if (INTEGRAL_TYPE_P (type)
   && TYPE_PRECISION (type) == 1)))
-#endif
+/* maybe_bit_not is used to match what
+   is acceptable for bitwise_inverted_equal_p. */
+(match (maybe_bit_not @0)
+ (bit_not_with_nop@0 @1))
+(match (maybe_bit_not @0)
+ (INTEGER_CST@0))
+(match (maybe_bit_not @0)
+ (maybe_cmp@0 @1))
 
 /* Transform likes of (char) ABS_EXPR <(int) x> into (char) ABSU_EXPR <x>
ABSU_EXPR returns unsigned absolute value of the operand and the operand
@@ -1332,7 +1338,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 
 /* Simplify ~X & X as zero.  */
 (simplify
- (bit_and (convert? @0) (convert? @1))
+ (bit_and:c (convert? @0) (convert? (maybe_bit_not @1)))
  (with { bool wascmp; }
   (if (types_match (TREE_TYPE (@0), TREE_TYPE (@1))
&& bitwise_inverted_equal_p (@0, @1, wascmp))
@@ -1597,7 +1603,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* ~x ^ x -> -1 */
 (for op (bit_ior bit_xor)
  (simplify
-  (op (convert? @0) (convert? @1))
+  (op:c (convert? @0) (convert? (maybe_bit_not @1)))
   (with { bool wascmp; }
(if (types_match (TREE_TYPE (@0), TREE_TYPE (@1))
 && bitwise_inverted_equal_p (@0, @1, wascmp))
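To illustrate how `maybe_bit_not` and `:c` work together, here is a rough toy model with hypothetical types and names, far simpler than GCC's trees. The cheap `maybe_bit_not` filter is applied to the second operand, and the commutative form lets the matcher try again with the operands swapped before paying for the full inverted-equality check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy expression nodes (hypothetical; real GCC trees are far richer).  */
enum toy_kind { E_VAR, E_CST, E_NOT };

struct toy_expr
{
  enum toy_kind kind;
  int value;                     /* E_VAR: variable id; E_CST: constant.  */
  const struct toy_expr *op;     /* E_NOT: the negated operand.  */
};

/* Rough analog of maybe_bit_not: a cheap syntactic filter for operands
   that could be the inverted half of a pair.  */
static bool
maybe_bit_not (const struct toy_expr *e)
{
  return e->kind == E_NOT || e->kind == E_CST;
}

/* Structural equality of two toy expressions.  */
static bool
same_expr (const struct toy_expr *a, const struct toy_expr *b)
{
  if (a->kind != b->kind)
    return false;
  if (a->kind == E_NOT)
    return same_expr (a->op, b->op);
  return a->value == b->value;
}

/* Rough analog of bitwise_inverted_equal_p: is A known to equal ~B?  */
static bool
inverted_equal (const struct toy_expr *a, const struct toy_expr *b)
{
  if (a->kind == E_CST && b->kind == E_CST)
    return a->value == ~b->value;
  if (b->kind == E_NOT && same_expr (a, b->op))
    return true;
  return a->kind == E_NOT && same_expr (a->op, b);
}

/* ~X & X -> 0.  Like `bit_and:c`, try both operand orders, but only run
   the full check when the second operand passes maybe_bit_not.  */
static bool
and_folds_to_zero (const struct toy_expr *x, const struct toy_expr *y)
{
  if (maybe_bit_not (y) && inverted_equal (x, y))
    return true;
  return maybe_bit_not (x) && inverted_equal (y, x);
}
```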


[PATCH] ASAN: call initialize_sanitizer_builtins for hwasan [PR115205]

2024-05-28 Thread Andrew Pinski
Sometimes initialize_sanitizer_builtins is not called before the asan
builtins are emitted for hwasan. In the case of the bug report, there
was a path through the Fortran front end where it was never called.
So call it in asan_instrument before calling transform_statements.

Built and tested for aarch64-linux-gnu with no regressions.

gcc/ChangeLog:

PR sanitizer/115205
* asan.cc (asan_instrument): Call initialize_sanitizer_builtins
for hwasan.

Signed-off-by: Andrew Pinski 
---
 gcc/asan.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/asan.cc b/gcc/asan.cc
index 9e0f51b1477..c684ca6d366 100644
--- a/gcc/asan.cc
+++ b/gcc/asan.cc
@@ -4276,6 +4276,7 @@ asan_instrument (void)
 {
   if (hwasan_sanitize_p ())
 {
+  initialize_sanitizer_builtins ();
   transform_statements ();
   return 0;
 }
-- 
2.43.0
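The fix amounts to having the hwasan path initialize what it needs itself instead of relying on an earlier caller. A sketch of that pattern with hypothetical names follows. It assumes, as the unconditional call in the patch suggests, that the initializer returns early when the builtins already exist, so calling it on every path is cheap and safe.

```c
#include <stdbool.h>

/* Hypothetical stand-ins, not GCC's API.  */
static bool builtins_ready;   /* have the builtin decls been created?  */
static int init_count;        /* how many times creation actually ran  */

/* Idempotent initializer: safe to call from every entry point.  */
static void
init_sanitizer_builtins_sketch (void)
{
  if (builtins_ready)
    return;                   /* already done on some earlier path */
  builtins_ready = true;
  init_count++;               /* stands in for building the decls */
}

/* Each instrumentation entry point initializes what it needs itself,
   instead of relying on a front end having done it earlier.  */
static int
instrument_hwasan_sketch (void)
{
  init_sanitizer_builtins_sketch ();
  /* ... the statement transformation, which may now emit calls to the
     builtins, would go here ...  */
  return 0;
}
```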



Re: [r15-853 Regression] FAIL: gcc.dg/tree-ssa/bitops-9.c scan-tree-dump-times optimized "bit_not_expr, " 1 on Linux/x86_64

2024-05-27 Thread Andrew Pinski
On Mon, May 27, 2024 at 4:30 PM haochen.jiang
 wrote:
>
> On Linux/x86_64,
>
> c5a7628470a7fb801ebeea82e16a4549db43bfa5 is the first bad commit
> commit c5a7628470a7fb801ebeea82e16a4549db43bfa5
> Author: Andrew Pinski 
> Date:   Sun May 26 17:59:21 2024 -0700
>
> match: Use uniform_integer_cst_p in bitwise_inverted_equal_p [PR115238]
>
> caused
>
> FAIL: gcc.dg/tree-ssa/bitops-9.c scan-tree-dump-times optimized "bit_ior_expr, " 1
> FAIL: gcc.dg/tree-ssa/bitops-9.c scan-tree-dump-times optimized "bit_not_expr, " 1


Sorry about that, fixed in
r15-855-g4fcdc37e8856bde847d3b8dd2915b68d56ad1d62 . It was really
just a testcase issue.

Thanks,
Andrew

>
> with GCC configured with
>
> ../../gcc/configure --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r15-853/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap
>
> To reproduce:
>
> $ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/bitops-9.c --target_board='unix{-m32}'"
> $ cd {build_dir}/gcc && make check RUNTESTFLAGS="tree-ssa.exp=gcc.dg/tree-ssa/bitops-9.c --target_board='unix{-m32\ -march=cascadelake}'"
>
> (Please do not reply to this email; for questions about this report, contact
> me at haochen dot jiang at intel.com.)
> (If you run into cascadelake-related problems, disabling AVX512F on the
> command line might help.)
> (However, please make sure that there are no potential problems with AVX512.)


[gcc r15-855] Fix bitops-9.c for -m32 and other targets that don't have vector modes

2024-05-27 Thread Andrew Pinski via Gcc-cvs
https://gcc.gnu.org/g:4fcdc37e8856bde847d3b8dd2915b68d56ad1d62

commit r15-855-g4fcdc37e8856bde847d3b8dd2915b68d56ad1d62
Author: Andrew Pinski 
Date:   Mon May 27 17:24:11 2024 -0700

Fix bitops-9.c for -m32 and other targets that don't have vector modes

This just moves the tree dump scan to an earlier pass, so the test detects
the optimization without also having to detect the vector splitting.

Committed as obvious after a quick test.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/bitops-9.c: Scan the cddce1 dump rather than
optimized.

Signed-off-by: Andrew Pinski 

Diff:
---
 gcc/testsuite/gcc.dg/tree-ssa/bitops-9.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/bitops-9.c b/gcc/testsuite/gcc.dg/tree-ssa/bitops-9.c
index a18b6bf3214..bcf079ab59d 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/bitops-9.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/bitops-9.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-optimized-raw" } */
+/* { dg-options "-O2 -fdump-tree-cddce1-raw" } */
 /* PR tree-optimization/115238 */
 
 
@@ -10,6 +10,8 @@ void f(int a, vector8 int *b)
 a = 1;
 *b = a | ((~a) ^ *b);
 }
-/* { dg-final { scan-tree-dump-not "bit_xor_expr, " "optimized" } } */
-/* { dg-final { scan-tree-dump-times   "bit_ior_expr, "  1  "optimized" } } */
-/* { dg-final { scan-tree-dump-times   "bit_not_expr, "  1  "optimized" } } */
+/* Scan early on in the phases before the vector has possibily been split
+   but late enough after forwprop or other match-simplify has happened though. */
+/* { dg-final { scan-tree-dump-not "bit_xor_expr, " "cddce1" } } */
+/* { dg-final { scan-tree-dump-times   "bit_ior_expr, "  1  "cddce1" } } */
+/* { dg-final { scan-tree-dump-times   "bit_not_expr, "  1  "cddce1" } } */


[COMMITTED] Fix bitops-9.c for -m32 and other targets that don't have vector modes

2024-05-27 Thread Andrew Pinski
This just moves the tree dump scan to an earlier pass, so the test detects
the optimization without also having to detect the vector splitting.

Committed as obvious after a quick test.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/bitops-9.c: Scan the cddce1 dump rather than optimized.

Signed-off-by: Andrew Pinski 
---
 gcc/testsuite/gcc.dg/tree-ssa/bitops-9.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/bitops-9.c b/gcc/testsuite/gcc.dg/tree-ssa/bitops-9.c
index a18b6bf3214..bcf079ab59d 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/bitops-9.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/bitops-9.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-optimized-raw" } */
+/* { dg-options "-O2 -fdump-tree-cddce1-raw" } */
 /* PR tree-optimization/115238 */
 
 
@@ -10,6 +10,8 @@ void f(int a, vector8 int *b)
 a = 1;
 *b = a | ((~a) ^ *b);
 }
-/* { dg-final { scan-tree-dump-not "bit_xor_expr, " "optimized" } } */
-/* { dg-final { scan-tree-dump-times   "bit_ior_expr, "  1  "optimized" } } */
-/* { dg-final { scan-tree-dump-times   "bit_not_expr, "  1  "optimized" } } */
+/* Scan early on in the phases before the vector has possibily been split
+   but late enough after forwprop or other match-simplify has happened though. */
+/* { dg-final { scan-tree-dump-not "bit_xor_expr, " "cddce1" } } */
+/* { dg-final { scan-tree-dump-times   "bit_ior_expr, "  1  "cddce1" } } */
+/* { dg-final { scan-tree-dump-times   "bit_not_expr, "  1  "cddce1" } } */
-- 
2.43.0



[gcc r15-853] match: Use uniform_integer_cst_p in bitwise_inverted_equal_p [PR115238]

2024-05-27 Thread Andrew Pinski via Gcc-cvs
https://gcc.gnu.org/g:c5a7628470a7fb801ebeea82e16a4549db43bfa5

commit r15-853-gc5a7628470a7fb801ebeea82e16a4549db43bfa5
Author: Andrew Pinski 
Date:   Sun May 26 17:59:21 2024 -0700

match: Use uniform_integer_cst_p in bitwise_inverted_equal_p [PR115238]

While working on the `a ^ CST` patch, I noticed that bitwise_inverted_equal_p
checked for INTEGER_CST directly and therefore did not handle uniform
vector constants. This switches to uniform_integer_cst_p instead of
checking INTEGER_CST directly.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/115238

gcc/ChangeLog:

* generic-match-head.cc (bitwise_inverted_equal_p): Use
uniform_integer_cst_p instead of checking INTEGER_CST.
* gimple-match-head.cc (gimple_bitwise_inverted_equal_p): Likewise.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/bitops-9.c: New test.

Signed-off-by: Andrew Pinski 

Diff:
---
 gcc/generic-match-head.cc|  6 --
 gcc/gimple-match-head.cc |  6 --
 gcc/testsuite/gcc.dg/tree-ssa/bitops-9.c | 15 +++
 3 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/gcc/generic-match-head.cc b/gcc/generic-match-head.cc
index e2e1e4b2d64..55ba369c6b3 100644
--- a/gcc/generic-match-head.cc
+++ b/gcc/generic-match-head.cc
@@ -146,8 +146,10 @@ bitwise_inverted_equal_p (tree expr1, tree expr2, bool &wascmp)
 return false;
   if (!tree_nop_conversion_p (TREE_TYPE (expr1), TREE_TYPE (expr2)))
 return false;
-  if (TREE_CODE (expr1) == INTEGER_CST && TREE_CODE (expr2) == INTEGER_CST)
-return wi::to_wide (expr1) == ~wi::to_wide (expr2);
+  tree cst1 = uniform_integer_cst_p (expr1);
+  tree cst2 = uniform_integer_cst_p (expr2);
+  if (cst1 && cst2)
+return wi::to_wide (cst1) == ~wi::to_wide (cst2);
   if (operand_equal_p (expr1, expr2, 0))
 return false;
   if (TREE_CODE (expr1) == BIT_NOT_EXPR
diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
index 49b1dde6ae4..6220725b259 100644
--- a/gcc/gimple-match-head.cc
+++ b/gcc/gimple-match-head.cc
@@ -294,8 +294,10 @@ gimple_bitwise_inverted_equal_p (tree expr1, tree expr2, bool &wascmp, tree (*va
 return false;
   if (!tree_nop_conversion_p (TREE_TYPE (expr1), TREE_TYPE (expr2)))
 return false;
-  if (TREE_CODE (expr1) == INTEGER_CST && TREE_CODE (expr2) == INTEGER_CST)
-return wi::to_wide (expr1) == ~wi::to_wide (expr2);
+  tree cst1 = uniform_integer_cst_p (expr1);
+  tree cst2 = uniform_integer_cst_p (expr2);
+  if (cst1 && cst2)
+return wi::to_wide (cst1) == ~wi::to_wide (cst2);
   if (operand_equal_p (expr1, expr2, 0))
 return false;
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/bitops-9.c b/gcc/testsuite/gcc.dg/tree-ssa/bitops-9.c
new file mode 100644
index 000..a18b6bf3214
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/bitops-9.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized-raw" } */
+/* PR tree-optimization/115238 */
+
+
+#define vector8 __attribute__((vector_size(2*sizeof(int))))
+
+void f(int a, vector8 int *b)
+{
+a = 1;
+*b = a | ((~a) ^ *b);
+}
+/* { dg-final { scan-tree-dump-not "bit_xor_expr, " "optimized" } } */
+/* { dg-final { scan-tree-dump-times   "bit_ior_expr, "  1  "optimized" } } */
+/* { dg-final { scan-tree-dump-times   "bit_not_expr, "  1  "optimized" } } */
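A toy analog of this change over plain arrays of lanes follows; the names are hypothetical and not GCC's API. A scalar behaves like a one-lane constant, a vector constant only counts if all of its lanes are equal, and two constants are treated as bitwise inverses when their element values satisfy `c1 == ~c2`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical analog of uniform_integer_cst_p over an array of lanes:
   succeed only if every lane holds the same value, and report it.
   A scalar constant is just the one-lane case.  */
static bool
uniform_lanes (const int *lanes, size_t n, int *out)
{
  if (n == 0)
    return false;
  for (size_t i = 1; i < n; i++)
    if (lanes[i] != lanes[0])
      return false;
  *out = lanes[0];
  return true;
}

/* The shape of the patched check: both constants uniform, and the
   element values are bitwise inverses.  */
static bool
inverted_constants (const int *a, const int *b, size_t n)
{
  int ca, cb;
  return uniform_lanes (a, n, &ca) && uniform_lanes (b, n, &cb)
         && ca == ~cb;
}
```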


[PATCH] match: Use uniform_integer_cst_p in bitwise_inverted_equal_p [PR115238]

2024-05-26 Thread Andrew Pinski
While working on the `a ^ CST` patch, I noticed that bitwise_inverted_equal_p
checked for INTEGER_CST directly and therefore did not handle uniform
vector constants. This switches to uniform_integer_cst_p instead of
checking INTEGER_CST directly.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/115238

gcc/ChangeLog:

* generic-match-head.cc (bitwise_inverted_equal_p): Use
uniform_integer_cst_p instead of checking INTEGER_CST.
* gimple-match-head.cc (gimple_bitwise_inverted_equal_p): Likewise.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/bitops-9.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/generic-match-head.cc|  6 --
 gcc/gimple-match-head.cc |  6 --
 gcc/testsuite/gcc.dg/tree-ssa/bitops-9.c | 15 +++
 3 files changed, 23 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bitops-9.c

diff --git a/gcc/generic-match-head.cc b/gcc/generic-match-head.cc
index 3709fe5456d..641d8e9b2de 100644
--- a/gcc/generic-match-head.cc
+++ b/gcc/generic-match-head.cc
@@ -146,8 +146,10 @@ bitwise_inverted_equal_p (tree expr1, tree expr2, bool &wascmp)
 return false;
   if (!tree_nop_conversion_p (TREE_TYPE (expr1), TREE_TYPE (expr2)))
 return false;
-  if (TREE_CODE (expr1) == INTEGER_CST && TREE_CODE (expr2) == INTEGER_CST)
-return wi::to_wide (expr1) == ~wi::to_wide (expr2);
+  tree cst1 = uniform_integer_cst_p (expr1);
+  tree cst2 = uniform_integer_cst_p (expr2);
+  if (cst1 && cst2)
+return wi::to_wide (cst1) == ~wi::to_wide (cst2);
   if (operand_equal_p (expr1, expr2, 0))
 return false;
   if (TREE_CODE (expr1) == BIT_NOT_EXPR
diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
index d5908f4e9a6..e26fa0860ee 100644
--- a/gcc/gimple-match-head.cc
+++ b/gcc/gimple-match-head.cc
@@ -295,8 +295,10 @@ gimple_bitwise_inverted_equal_p (tree expr1, tree expr2, bool &wascmp, tree (*va
 return false;
   if (!tree_nop_conversion_p (TREE_TYPE (expr1), TREE_TYPE (expr2)))
 return false;
-  if (TREE_CODE (expr1) == INTEGER_CST && TREE_CODE (expr2) == INTEGER_CST)
-return wi::to_wide (expr1) == ~wi::to_wide (expr2);
+  tree cst1 = uniform_integer_cst_p (expr1);
+  tree cst2 = uniform_integer_cst_p (expr2);
+  if (cst1 && cst2)
+return wi::to_wide (cst1) == ~wi::to_wide (cst2);
   if (operand_equal_p (expr1, expr2, 0))
 return false;
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/bitops-9.c b/gcc/testsuite/gcc.dg/tree-ssa/bitops-9.c
new file mode 100644
index 000..a18b6bf3214
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/bitops-9.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized-raw" } */
+/* PR tree-optimization/115238 */
+
+
+#define vector8 __attribute__((vector_size(2*sizeof(int))))
+
+void f(int a, vector8 int *b)
+{
+a = 1;
+*b = a | ((~a) ^ *b);
+}
+/* { dg-final { scan-tree-dump-not "bit_xor_expr, " "optimized" } } */
+/* { dg-final { scan-tree-dump-times   "bit_ior_expr, "  1  "optimized" } } */
+/* { dg-final { scan-tree-dump-times   "bit_not_expr, "  1  "optimized" } } */
-- 
2.43.0
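The dump scans in the new bitops-9.c (no XOR, one IOR, one NOT) are consistent with `a | (~a ^ b)` folding to `a | ~b` lane-wise, because for every bit `x | (~x ^ y) == x | ~y`. The sketch below, using the same GNU C vector extension as the test, compares the two forms at run time; the function names are illustrative and not part of the testsuite.

```c
#include <stdbool.h>

/* Same vector type as bitops-9.c: two ints, GNU C vector extension.  */
typedef int v2si __attribute__ ((vector_size (2 * sizeof (int))));

/* The statement from the test, with a == 1 broadcast to every lane.  */
static v2si
bitops9_source (v2si b)
{
  int a = 1;
  return a | ((~a) ^ b);
}

/* The form the dump scans point to: one IOR, one NOT, no XOR.  */
static v2si
bitops9_folded (v2si b)
{
  int a = 1;
  return a | ~b;
}

/* Lane-wise equality for the two-lane vector.  */
static bool
same_v2si (v2si x, v2si y)
{
  return x[0] == y[0] && x[1] == y[1];
}
```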



[PATCH 2/2] match: Add support for `a ^ CST` to bitwise_inverted_equal_p [PR115224]

2024-05-26 Thread Andrew Pinski
While looking into something else, I noticed that `a ^ CST` needed to be
special-cased in bitwise_inverted_equal_p, since its bitwise not
simplifies to `a ^ ~CST`.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/115224

gcc/ChangeLog:

* generic-match-head.cc (bitwise_inverted_equal_p): Add `a ^ CST`
case.
* gimple-match-head.cc (gimple_bit_xor_cst): New declaration.
(gimple_bitwise_inverted_equal_p): Add `a ^ CST` case.
* match.pd (bit_xor_cst): New match.
(maybe_bit_not): Add bit_xor_cst case.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/bitops-8.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/generic-match-head.cc| 10 ++
 gcc/gimple-match-head.cc | 13 +
 gcc/match.pd |  4 
 gcc/testsuite/gcc.dg/tree-ssa/bitops-8.c | 15 +++
 4 files changed, 42 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bitops-8.c

diff --git a/gcc/generic-match-head.cc b/gcc/generic-match-head.cc
index e2e1e4b2d64..3709fe5456d 100644
--- a/gcc/generic-match-head.cc
+++ b/gcc/generic-match-head.cc
@@ -156,6 +156,16 @@ bitwise_inverted_equal_p (tree expr1, tree expr2, bool &wascmp)
   if (TREE_CODE (expr2) == BIT_NOT_EXPR
   && bitwise_equal_p (expr1, TREE_OPERAND (expr2, 0)))
 return true;
+
+  /* `X ^ CST` and `X ^ ~CST` match for ~. */
+  if (TREE_CODE (expr1) == BIT_XOR_EXPR && TREE_CODE (expr2) == BIT_XOR_EXPR
+  && bitwise_equal_p (TREE_OPERAND (expr1, 0), TREE_OPERAND (expr2, 0)))
+{
+  tree cst1 = uniform_integer_cst_p (TREE_OPERAND (expr1, 1));
+  tree cst2 = uniform_integer_cst_p (TREE_OPERAND (expr2, 1));
+  if (cst1 && cst2 && wi::to_wide (cst1) == ~wi::to_wide (cst2))
+   return true;
+}
   if (COMPARISON_CLASS_P (expr1)
   && COMPARISON_CLASS_P (expr2))
 {
diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
index 49b1dde6ae4..d5908f4e9a6 100644
--- a/gcc/gimple-match-head.cc
+++ b/gcc/gimple-match-head.cc
@@ -283,6 +283,7 @@ gimple_bitwise_equal_p (tree expr1, tree expr2, tree (*valueize) (tree))
 
 bool gimple_bit_not_with_nop (tree, tree *, tree (*) (tree));
 bool gimple_maybe_cmp (tree, tree *, tree (*) (tree));
+bool gimple_bit_xor_cst (tree, tree *, tree (*) (tree));
 
 /* Helper function for bitwise_inverted_equal_p macro.  */
 
@@ -299,6 +300,18 @@ gimple_bitwise_inverted_equal_p (tree expr1, tree expr2, bool &wascmp, tree (*va
   if (operand_equal_p (expr1, expr2, 0))
 return false;
 
+  tree xor1[2];
+  tree xor2[2];
+  /* `X ^ CST` and `X ^ ~CST` match for ~. */
+  if (gimple_bit_xor_cst (expr1, xor1, valueize)
+  && gimple_bit_xor_cst (expr2, xor2, valueize))
+{
+  if (operand_equal_p (xor1[0], xor2[0], 0)
+ && (wi::to_wide (uniform_integer_cst_p (xor1[1]))
+ == ~wi::to_wide (uniform_integer_cst_p (xor2[1]))))
+   return true;
+}
+
   tree other;
   /* Try if EXPR1 was defined as ~EXPR2. */
   if (gimple_bit_not_with_nop (expr1, &other, valueize))
diff --git a/gcc/match.pd b/gcc/match.pd
index 090ad4e08b0..480e36bbbaf 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -174,6 +174,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (match (bit_not_with_nop @0)
  (convert (bit_not @0))
  (if (tree_nop_conversion_p (type, TREE_TYPE (@0)
+(match (bit_xor_cst @0 @1)
+ (bit_xor @0 uniform_integer_cst_p@1))
 (for cmp (tcc_comparison)
  (match (maybe_cmp @0)
   (cmp@0 @1 @2))
@@ -195,6 +197,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (INTEGER_CST@0))
 (match (maybe_bit_not @0)
  (maybe_cmp@0 @1))
+(match (maybe_bit_not @0)
+ (bit_xor_cst@0 @1 @2))
 
 /* Transform likes of (char) ABS_EXPR <(int) x> into (char) ABSU_EXPR <x>
ABSU_EXPR returns unsigned absolute value of the operand and the operand
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/bitops-8.c b/gcc/testsuite/gcc.dg/tree-ssa/bitops-8.c
new file mode 100644
index 000..40f756e4455
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/bitops-8.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized-raw" } */
+/* PR tree-optimization/115224 */
+
+int f1(int a, int b)
+{
+a = a ^ 1;
+int c = ~a;
+return c | (a ^ b);
+// ~((a ^ 1) & b) or (a ^ -2) | ~b
+}
+/* { dg-final { scan-tree-dump-times   "bit_xor_expr, "  1  "optimized" } } */
+/* { dg-final { scan-tree-dump-times   "bit_ior_expr, "  1  "optimized" } } */
+/* { dg-final { scan-tree-dump-times   "bit_not_expr, "  1  "optimized" } } */
+
-- 
2.43.0
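On the GIMPLE side the operands are SSA names, so the generated `gimple_bit_xor_cst` matcher looks through each name to its defining statement before it can see `X ^ CST`. A toy model of that lookup follows; `struct ssa_def`, `match_xor_cst` and `xor_names_inverted` are hypothetical and only mimic the shape of the check, not GCC's API.

```c
#include <stdbool.h>

/* Toy GIMPLE-like definitions (hypothetical): SSA name i is defined by
   defs[i], either as an opaque input or as "t_i = t_var ^ cst".  */
struct ssa_def
{
  bool is_xor_cst;   /* true if defined as t_var ^ cst */
  int var;           /* SSA name of the non-constant XOR operand */
  int cst;           /* the constant XOR operand */
};

/* Analog of the bit_xor_cst match: look at NAME's definition and, if it
   is X ^ CST, return X and CST.  */
static bool
match_xor_cst (const struct ssa_def *defs, int name, int *x, int *cst)
{
  if (!defs[name].is_xor_cst)
    return false;
  *x = defs[name].var;
  *cst = defs[name].cst;
  return true;
}

/* N1 and N2 are bitwise inverses if they are X ^ C1 and X ^ C2 for the
   same X with C1 == ~C2.  */
static bool
xor_names_inverted (const struct ssa_def *defs, int n1, int n2)
{
  int x1, c1, x2, c2;
  return match_xor_cst (defs, n1, &x1, &c1)
         && match_xor_cst (defs, n2, &x2, &c2)
         && x1 == x2 && c1 == ~c2;
}
```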



[PATCH 1/2] Match: Add maybe_bit_not instead of plain matching

2024-05-26 Thread Andrew Pinski
While working on adding matching of negative expressions of `a - b`,
I noticed that we started to have "duplicated" patterns because there
was no way to match expressions that might be negated. So I went back to
what I did for bit_not and decided to improve the situation there: for
patterns with 2 operands where one could be a bit_not, add back
maybe_bit_not.
This does not add maybe_bit_not in every place where bitwise_inverted_equal_p
is used, just the ones where the 2 operands of an expression could be swapped.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* match.pd (bit_not_with_nop): Unconditionalize.
(maybe_cmp): Likewise.
(maybe_bit_not): New match pattern.
(`~X & X`): Use maybe_bit_not and add `:c` back.
(`~x ^ x`/`~x | x`): Likewise.

Signed-off-by: Andrew Pinski 
---
 gcc/match.pd | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index 024e3350465..090ad4e08b0 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -167,7 +167,6 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   TYPE_VECTOR_SUBPARTS (TREE_TYPE (@0)))
   && tree_nop_conversion_p (TREE_TYPE (type), TREE_TYPE (TREE_TYPE (@0))
 
-#if GIMPLE
 /* These are used by gimple_bitwise_inverted_equal_p to simplify
detection of BIT_NOT and comparisons. */
 (match (bit_not_with_nop @0)
@@ -188,7 +187,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (bit_xor@0 @1 @2)
  (if (INTEGRAL_TYPE_P (type)
   && TYPE_PRECISION (type) == 1)))
-#endif
+/* maybe_bit_not is used to match what
+   is acceptable for bitwise_inverted_equal_p. */
+(match (maybe_bit_not @0)
+ (bit_not_with_nop@0 @1))
+(match (maybe_bit_not @0)
+ (INTEGER_CST@0))
+(match (maybe_bit_not @0)
+ (maybe_cmp@0 @1))
 
 /* Transform likes of (char) ABS_EXPR <(int) x> into (char) ABSU_EXPR <x>
ABSU_EXPR returns unsigned absolute value of the operand and the operand
@@ -1332,7 +1338,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 
 /* Simplify ~X & X as zero.  */
 (simplify
- (bit_and (convert? @0) (convert? @1))
+ (bit_and:c (convert? @0) (convert? (maybe_bit_not @1)))
  (with { bool wascmp; }
   (if (types_match (TREE_TYPE (@0), TREE_TYPE (@1))
&& bitwise_inverted_equal_p (@0, @1, wascmp))
@@ -1597,7 +1603,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* ~x ^ x -> -1 */
 (for op (bit_ior bit_xor)
  (simplify
-  (op (convert? @0) (convert? @1))
+  (op:c (convert? @0) (convert? (maybe_bit_not @1)))
   (with { bool wascmp; }
(if (types_match (TREE_TYPE (@0), TREE_TYPE (@1))
 && bitwise_inverted_equal_p (@0, @1, wascmp))
-- 
2.43.0
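The folds touched here are plain bitwise identities, and adding `:c` is sound because AND, IOR and XOR are commutative. The brute-force check below (illustrative only) also covers the comparison case: for inverse comparisons such as `x < y` and `x >= y`, the "all ones" result of IOR and XOR is 1 rather than -1, which appears to be what the `wascmp` flag in these patterns tracks.

```c
#include <stdbool.h>

/* Brute-force check of the folds touched by this patch, in both operand
   orders (as `:c` lets the matcher try):
     ~X & X -> 0,   ~X | X -> -1,   ~X ^ X -> -1.  */
static bool
check_inverse_folds (int lo, int hi)
{
  for (int x = lo; x <= hi; x++)
    {
      int nx = ~x;
      if ((nx & x) != 0 || (x & nx) != 0
          || (nx | x) != -1 || (x | nx) != -1
          || (nx ^ x) != -1 || (x ^ nx) != -1)
        return false;

      /* Inverse comparisons: x < y and x >= y.  Their results are 0/1,
         so the "all ones" answer for | and ^ is 1, not -1.  */
      for (int y = lo; y <= hi; y++)
        {
          int lt = x < y;
          int ge = x >= y;
          if ((lt & ge) != 0 || (ge | lt) != 1 || (lt ^ ge) != 1)
            return false;
        }
    }
  return true;
}
```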



[gcc r15-813] Use simple_dce_from_worklist in phiprop

2024-05-24 Thread Andrew Pinski via Gcc-cvs
https://gcc.gnu.org/g:3e06763a695d97aa46c9de71573ec6a43bb92449

commit r15-813-g3e06763a695d97aa46c9de71573ec6a43bb92449
Author: Andrew Pinski 
Date:   Thu May 23 09:56:37 2024 -0700

Use simple_dce_from_worklist in phiprop

I noticed that phiprop leaves around phi nodes that
define an SSA name which is unused. This just adds a
bitmap to mark those SSA names and then calls
simple_dce_from_worklist at the very end to remove
those phi nodes and all of their dependencies, if there
are any. This might allow us to optimize something earlier
due to the removal of the phi which was taking the address
of the variables.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* tree-ssa-phiprop.cc (phiprop_insert_phi): Add
dce_ssa_names argument. Add the phi's result to it.
(propagate_with_phi): Add dce_ssa_names argument.
Update call to phiprop_insert_phi.
(pass_phiprop::execute): Update call to propagate_with_phi.
Call simple_dce_from_worklist if there was a change.

Signed-off-by: Andrew Pinski 

Diff:
---
 gcc/tree-ssa-phiprop.cc | 28 ++--
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/gcc/tree-ssa-phiprop.cc b/gcc/tree-ssa-phiprop.cc
index 041521ef106..2a1cdae46d2 100644
--- a/gcc/tree-ssa-phiprop.cc
+++ b/gcc/tree-ssa-phiprop.cc
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "stor-layout.h"
 #include "tree-ssa-loop.h"
 #include "tree-cfg.h"
+#include "tree-ssa-dce.h"
 
 /* This pass propagates indirect loads through the PHI node for its
address to make the load source possibly non-addressable and to
@@ -132,12 +133,15 @@ phivn_valid_p (struct phiprop_d *phivn, tree name, basic_block bb)
 
 static tree
 phiprop_insert_phi (basic_block bb, gphi *phi, gimple *use_stmt,
-   struct phiprop_d *phivn, size_t n)
+   struct phiprop_d *phivn, size_t n,
+   bitmap dce_ssa_names)
 {
   tree res;
   gphi *new_phi = NULL;
   edge_iterator ei;
   edge e;
+  tree phi_result = PHI_RESULT (phi);
+  bitmap_set_bit (dce_ssa_names, SSA_NAME_VERSION (phi_result));
 
   gcc_assert (is_gimple_assign (use_stmt)
  && gimple_assign_rhs_code (use_stmt) == MEM_REF);
@@ -276,7 +280,7 @@ chk_uses (tree, tree *idx, void *data)
 
 static bool
 propagate_with_phi (basic_block bb, gphi *phi, struct phiprop_d *phivn,
-   size_t n)
+   size_t n, bitmap dce_ssa_names)
 {
   tree ptr = PHI_RESULT (phi);
   gimple *use_stmt;
@@ -420,9 +424,10 @@ propagate_with_phi (basic_block bb, gphi *phi, struct phiprop_d *phivn,
goto next;
}
 
- phiprop_insert_phi (bb, phi, use_stmt, phivn, n);
+ phiprop_insert_phi (bb, phi, use_stmt, phivn, n, dce_ssa_names);
 
- /* Remove old stmt.  The phi is taken care of by DCE.  */
+ /* Remove old stmt. The phi and all of maybe its depedencies
+will be removed later via simple_dce_from_worklist. */
  gsi = gsi_for_stmt (use_stmt);
  /* Unlinking the VDEF here is fine as we are sure that we process
 stmts in execution order due to aggregate copies having VDEFs
@@ -442,16 +447,15 @@ propagate_with_phi (basic_block bb, gphi *phi, struct phiprop_d *phivn,
 is the first load transformation.  */
   else if (!phi_inserted)
{
- res = phiprop_insert_phi (bb, phi, use_stmt, phivn, n);
+ res = phiprop_insert_phi (bb, phi, use_stmt, phivn, n, dce_ssa_names);
  type = TREE_TYPE (res);
 
  /* Remember the value we created for *ptr.  */
  phivn[SSA_NAME_VERSION (ptr)].value = res;
  phivn[SSA_NAME_VERSION (ptr)].vuse = vuse;
 
- /* Remove old stmt.  The phi is taken care of by DCE, if we
-want to delete it here we also have to delete all intermediate
-copies.  */
+ /* Remove old stmt.  The phi and all of maybe its depedencies
+will be removed later via simple_dce_from_worklist. */
  gsi = gsi_for_stmt (use_stmt);
  gsi_remove (&gsi, true);
 
@@ -514,6 +518,7 @@ pass_phiprop::execute (function *fun)
   gphi_iterator gsi;
   unsigned i;
   size_t n;
+  auto_bitmap dce_ssa_names;
 
   calculate_dominance_info (CDI_DOMINATORS);
 
@@ -531,11 +536,14 @@ pass_phiprop::execute (function *fun)
   if (bb_has_abnormal_pred (bb))
continue;
   for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
-   did_something |= propagate_with_phi (bb, gsi.phi (), phivn, n);
+   did_something |= propagate_with_phi (bb, gsi.phi (), phivn, n, dce_ssa_names);
 }
 
   if (did_something)
-gsi_commit_edge_inserts ();
+{
+  gsi_commit_edge_inserts ();
+  simple_dce_from_worklist (dce_ssa_names);
+}
 
   free (phivn);
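
The idea behind simple_dce_from_worklist can be sketched on a toy representation. This is a hypothetical C model, not GCC's implementation, which works on a bitmap of SSA name versions and real GIMPLE statements. The model seeds a worklist with the names phiprop marked, deletes any definition that is left with no uses and no side effects, and then revisits that definition's operands, since their use counts just dropped.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_OPS 2

/* Hypothetical toy SSA value: use count, liveness, side effects, and up to
   MAX_OPS operand values it uses (-1 means no operand).  */
struct toy_value {
    int uses;
    bool live;
    bool has_side_effects;
    int ops[MAX_OPS];
};

/* Pop values from STACK (TOP entries, capacity CAP).  A live value with no
   uses and no side effects is deleted, and each operand whose use count
   drops to zero is pushed so that it is reconsidered.  If the stack is
   full, the operand is skipped; that loses an optimization but stays safe.
   Returns how many values were deleted.  */
static size_t toy_simple_dce(struct toy_value *vals, int *stack,
                             size_t top, size_t cap)
{
    size_t removed = 0;
    while (top > 0) {
        struct toy_value *val = &vals[stack[--top]];
        if (!val->live || val->uses != 0 || val->has_side_effects)
            continue;
        val->live = false;              /* delete the definition */
        removed++;
        for (int i = 0; i < MAX_OPS; i++) {
            int op = val->ops[i];
            if (op < 0)
                continue;
            vals[op].uses--;            /* one fewer user of the operand */
            if (vals[op].uses == 0 && top < cap)
                stack[top++] = op;      /* the operand may now be dead too */
        }
    }
    return removed;
}
```

This mirrors the commit message. Take value 2 = PHI <&a, &b>, which lost its only use when phiprop replaced the load through it. Seeding the worklist with value 2 removes the PHI and then both address computations, and leaves unrelated live values alone.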


[PATCH] Use simple_dce_from_worklist in phiprop

2024-05-23 Thread Andrew Pinski
I noticed that phiprop leaves around phi nodes that
define an SSA name which is unused. This just adds a
bitmap to mark those SSA names and then calls
simple_dce_from_worklist at the very end to remove
those phi nodes and all of their dependencies, if there
are any. This might allow us to optimize something earlier
due to the removal of the phi which was taking the address
of the variables.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* tree-ssa-phiprop.cc (phiprop_insert_phi): Add
dce_ssa_names argument. Add the phi's result to it.
(propagate_with_phi): Add dce_ssa_names argument.
Update call to phiprop_insert_phi.
(pass_phiprop::execute): Update call to propagate_with_phi.
Call simple_dce_from_worklist if there was a change.

Signed-off-by: Andrew Pinski 
---
 gcc/tree-ssa-phiprop.cc | 28 ++--
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/gcc/tree-ssa-phiprop.cc b/gcc/tree-ssa-phiprop.cc
index 041521ef106..2a1cdae46d2 100644
--- a/gcc/tree-ssa-phiprop.cc
+++ b/gcc/tree-ssa-phiprop.cc
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "stor-layout.h"
 #include "tree-ssa-loop.h"
 #include "tree-cfg.h"
+#include "tree-ssa-dce.h"
 
 /* This pass propagates indirect loads through the PHI node for its
address to make the load source possibly non-addressable and to
@@ -132,12 +133,15 @@ phivn_valid_p (struct phiprop_d *phivn, tree name, basic_block bb)
 
 static tree
 phiprop_insert_phi (basic_block bb, gphi *phi, gimple *use_stmt,
-   struct phiprop_d *phivn, size_t n)
+   struct phiprop_d *phivn, size_t n,
+   bitmap dce_ssa_names)
 {
   tree res;
   gphi *new_phi = NULL;
   edge_iterator ei;
   edge e;
+  tree phi_result = PHI_RESULT (phi);
+  bitmap_set_bit (dce_ssa_names, SSA_NAME_VERSION (phi_result));
 
   gcc_assert (is_gimple_assign (use_stmt)
  && gimple_assign_rhs_code (use_stmt) == MEM_REF);
@@ -276,7 +280,7 @@ chk_uses (tree, tree *idx, void *data)
 
 static bool
 propagate_with_phi (basic_block bb, gphi *phi, struct phiprop_d *phivn,
-   size_t n)
+   size_t n, bitmap dce_ssa_names)
 {
   tree ptr = PHI_RESULT (phi);
   gimple *use_stmt;
@@ -420,9 +424,10 @@ propagate_with_phi (basic_block bb, gphi *phi, struct phiprop_d *phivn,
goto next;
}
 
- phiprop_insert_phi (bb, phi, use_stmt, phivn, n);
+ phiprop_insert_phi (bb, phi, use_stmt, phivn, n, dce_ssa_names);
 
- /* Remove old stmt.  The phi is taken care of by DCE.  */
+ /* Remove old stmt. The phi and all of maybe its depedencies
+will be removed later via simple_dce_from_worklist. */
  gsi = gsi_for_stmt (use_stmt);
  /* Unlinking the VDEF here is fine as we are sure that we process
 stmts in execution order due to aggregate copies having VDEFs
@@ -442,16 +447,15 @@ propagate_with_phi (basic_block bb, gphi *phi, struct phiprop_d *phivn,
 is the first load transformation.  */
   else if (!phi_inserted)
{
- res = phiprop_insert_phi (bb, phi, use_stmt, phivn, n);
+ res = phiprop_insert_phi (bb, phi, use_stmt, phivn, n, dce_ssa_names);
  type = TREE_TYPE (res);
 
  /* Remember the value we created for *ptr.  */
  phivn[SSA_NAME_VERSION (ptr)].value = res;
  phivn[SSA_NAME_VERSION (ptr)].vuse = vuse;
 
- /* Remove old stmt.  The phi is taken care of by DCE, if we
-want to delete it here we also have to delete all intermediate
-copies.  */
+ /* Remove old stmt.  The phi and all of maybe its depedencies
+will be removed later via simple_dce_from_worklist. */
  gsi = gsi_for_stmt (use_stmt);
  gsi_remove (&gsi, true);
 
@@ -514,6 +518,7 @@ pass_phiprop::execute (function *fun)
   gphi_iterator gsi;
   unsigned i;
   size_t n;
+  auto_bitmap dce_ssa_names;
 
   calculate_dominance_info (CDI_DOMINATORS);
 
@@ -531,11 +536,14 @@ pass_phiprop::execute (function *fun)
   if (bb_has_abnormal_pred (bb))
continue;
   for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next (&gsi))
-   did_something |= propagate_with_phi (bb, gsi.phi (), phivn, n);
+   did_something |= propagate_with_phi (bb, gsi.phi (), phivn, n, dce_ssa_names);
 }
 
   if (did_something)
-gsi_commit_edge_inserts ();
+{
+  gsi_commit_edge_inserts ();
+  simple_dce_from_worklist (dce_ssa_names);
+}
 
   free (phivn);
 
-- 
2.43.0



Re: [PATCH] [RFC] Target-independent store forwarding avoidance. [PR48696] Target-independent store forwarding avoidance.

2024-05-23 Thread Andrew Pinski
On Thu, May 23, 2024 at 8:01 AM Manolis Tsamis  wrote:
>
> This pass detects cases of expensive store forwarding and tries to avoid them
> by reordering the stores and using suitable bit insertion sequences.
> For example it can transform this:
>
>  strb w2, [x1, 1]
>  ldr x0, [x1]  # Expensive store forwarding to larger load.
>
> To:
>
>  ldr x0, [x1]
>  strb w2, [x1]
>  bfi x0, x2, 0, 8
>

Are you sure this is correct with respect to the C11/C++11 memory
models? If not then the pass should be gated with
flag_store_data_races.
Also stores like this start a new "alias set" (I can't remember the
exact term here). So how do you represent the store's aliasing set? Do
you change it? If not, are you sure that will do the right thing?

You didn't document the new option or the new --param (invoke.texi);
this is the bare minimum requirement.
You should also document the new pass in the internals manual
(passes.texi); most folks forget to update it when adding a new pass.

Thanks,
Andrew
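
To see why the reordering preserves semantics, here is a hypothetical C model of the two sequences. It is not code from the patch, and a compiler is free to rearrange either function. Storing a byte and then loading the whole 8-byte word gives the same value as loading the word first, performing the store, and then inserting the stored byte into the loaded value. That insertion is what BFI does in the quoted AArch64 sequence. The sketch assumes a little-endian layout, so byte `off` of the buffer occupies bits 8*off through 8*off+7 of the loaded word.

```c
#include <stdint.h>
#include <string.h>

/* Original order: narrow store, then a wider overlapping load.  On many
   cores the load has to wait for the store to be forwarded.  */
static uint64_t store_then_load(unsigned char buf[8], unsigned off, uint8_t v)
{
    uint64_t w;
    buf[off] = v;
    memcpy(&w, buf, sizeof w);
    return w;
}

/* Reordered: load first, then store, then insert V into the loaded value
   (the job of BFI).  Assumes little-endian byte order and off < 8.  */
static uint64_t load_then_insert(unsigned char buf[8], unsigned off, uint8_t v)
{
    uint64_t w;
    unsigned shift = 8 * off;
    uint64_t mask = (uint64_t)0xFF << shift;
    memcpy(&w, buf, sizeof w);
    buf[off] = v;
    return (w & ~mask) | ((uint64_t)v << shift);
}
```

Both functions leave the same bytes in memory and return the same register value. Only the order in which memory is accessed differs.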


> Assembly like this can appear with bitfields or type punning / unions.
> On stress-ng when running the cpu-union microbenchmark the following speedups
> have been observed.
>
>   Neoverse-N1:      +29.4%
>   Intel Coffeelake: +13.1%
>   AMD 5950X:        +17.5%
>
> PR rtl-optimization/48696
>
> gcc/ChangeLog:
>
> * Makefile.in: Add avoid-store-forwarding.o.
> * common.opt: New option -favoid-store-forwarding.
> * params.opt: New param store-forwarding-max-distance.
> * passes.def: Schedule a new pass.
> * tree-pass.h (make_pass_rtl_avoid_store_forwarding): Declare.
> * avoid-store-forwarding.cc: New file.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/avoid-store-forwarding-1.c: New test.
> * gcc.dg/avoid-store-forwarding-2.c: New test.
> * gcc.dg/avoid-store-forwarding-3.c: New test.
>
> Signed-off-by: Manolis Tsamis 
> ---
>
>  gcc/Makefile.in   |   1 +
>  gcc/avoid-store-forwarding.cc | 554 ++
>  gcc/common.opt|   4 +
>  gcc/params.opt|   4 +
>  gcc/passes.def|   1 +
>  .../gcc.dg/avoid-store-forwarding-1.c |  46 ++
>  .../gcc.dg/avoid-store-forwarding-2.c |  39 ++
>  .../gcc.dg/avoid-store-forwarding-3.c |  31 +
>  gcc/tree-pass.h   |   1 +
>  9 files changed, 681 insertions(+)
>  create mode 100644 gcc/avoid-store-forwarding.cc
>  create mode 100644 gcc/testsuite/gcc.dg/avoid-store-forwarding-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/avoid-store-forwarding-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/avoid-store-forwarding-3.c
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index a7f15694c34..be969b1ca1d 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1681,6 +1681,7 @@ OBJS = \
> statistics.o \
> stmt.o \
> stor-layout.o \
> +   avoid-store-forwarding.o \
> store-motion.o \
> streamer-hooks.o \
> stringpool.o \
> diff --git a/gcc/avoid-store-forwarding.cc b/gcc/avoid-store-forwarding.cc
> new file mode 100644
> index 000..d90627c4872
> --- /dev/null
> +++ b/gcc/avoid-store-forwarding.cc
> @@ -0,0 +1,554 @@
> +/* Avoid store forwarding optimization pass.
> +   Copyright (C) 2024 Free Software Foundation, Inc.
> +   Contributed by VRULL GmbH.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3, or (at your option)
> +   any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   .  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "backend.h"
> +#include "rtl.h"
> +#include "alias.h"
> +#include "rtlanal.h"
> +#include "tree-pass.h"
> +#include "cselib.h"
> +#include "predict.h"
> +#include "insn-config.h"
> +#include "expmed.h"
> +#include "recog.h"
> +#include "regset.h"
> +#include "df.h"
> +#include "expr.h"
> +#include "memmodel.h"
> +#include "emit-rtl.h"
> +#include "vec.h"
> +
> +/* This pass tries to detect and avoid cases of store forwarding.
> +   On many processors there is a large penalty when smaller stores are
> +   forwarded to larger loads.  The idea used to avoid the stall is to move
> +   the store after the load and in addition emit a bit insert sequence so
> +   the load register has the 

Re: [PATCH] AARCH64: Add Qualcomnm oryon-1 core

2024-05-22 Thread Andrew Pinski
On Tue, May 14, 2024 at 10:27 AM Kyrill Tkachov
 wrote:
>
> Hi Andrew,
>
> On Fri, May 3, 2024 at 8:50 PM Andrew Pinski  wrote:
>>
>> This patch adds Qualcomm's new oryon-1 core; this is enough
>> to recognize the core; the tuning structure will be added later.
>>
>> gcc/ChangeLog:
>>
>> * config/aarch64/aarch64-cores.def (oryon-1): New entry.
>> * config/aarch64/aarch64-tune.md: Regenerate.
>> * doc/invoke.texi  (AArch64 Options): Document oryon-1.
>>
>> Signed-off-by: Andrew Pinski 
>> Co-authored-by: Joel Jones 
>> Co-authored-by: Wei Zhao 
>> ---
>>  gcc/config/aarch64/aarch64-cores.def | 5 +
>>  gcc/config/aarch64/aarch64-tune.md   | 2 +-
>>  gcc/doc/invoke.texi  | 1 +
>>  3 files changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
>> index f69fc212d56..be60929e400 100644
>> --- a/gcc/config/aarch64/aarch64-cores.def
>> +++ b/gcc/config/aarch64/aarch64-cores.def
>> @@ -151,6 +151,11 @@ AARCH64_CORE("neoverse-512tvb", neoverse512tvb, cortexa57, V8_4A,  (SVE, I8MM, B
>>  /* Qualcomm ('Q') cores. */
>>  AARCH64_CORE("saphira", saphira,saphira,V8_4A,  (CRYPTO), saphira,   0x51, 0xC01, -1)
>>
>> +/* ARMv8.6-A Architecture Processors.  */
>> +
>> +/* Qualcomm ('Q') cores. */
>> +AARCH64_CORE("oryon-1", oryon1, cortexa57, V8_6A, (CRYPTO, SM4, SHA3, F16), cortexa72,   0x51, 0x001, -1)
>> +
>>  /* ARMv8-A big.LITTLE implementations.  */
>>
>>  AARCH64_CORE("cortex-a57.cortex-a53",  cortexa57cortexa53, cortexa53, V8A,  (CRC), cortexa57, 0x41, AARCH64_BIG_LITTLE (0xd07, 0xd03), -1)
>> diff --git a/gcc/config/aarch64/aarch64-tune.md b/gcc/config/aarch64/aarch64-tune.md
>> index abd3c9e0822..ba940f1c890 100644
>> --- a/gcc/config/aarch64/aarch64-tune.md
>> +++ b/gcc/config/aarch64/aarch64-tune.md
>> @@ -1,5 +1,5 @@
>>  ;; -*- buffer-read-only: t -*-
>>  ;; Generated automatically by gentune.sh from aarch64-cores.def
>>  (define_attr "tune"
>> -   
>> "cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,ampere1b,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,cobalt100,neoversev2,demeter,generic,generic_armv8_a,generic_armv9_a"
>> +   
>> "cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,ampere1b,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,oryon1,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,cobalt100,neoversev2,demeter,generic,generic_armv8_a,generic_armv9_a"
>> (const (symbol_ref "((enum attr_tune) aarch64_tune)")))
>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>> index 9456ced468a..eabe09dc28f 100644
>> --- a/gcc/doc/invoke.texi
>> +++ b/gcc/doc/invoke.texi
>> @@ -21323,6 +21323,7 @@ performance of the code.  Permissible values for this option are:
>>  @samp{cortex-a65}, @samp{cortex-a65ae}, @samp{cortex-a34},
>>  @samp{cortex-a78}, @samp{cortex-a78ae}, @samp{cortex-a78c},
>>  @samp{ares}, @samp{exynos-m1}, @samp{emag}, @samp{falkor},
>> +@samp{oyron-1},
>
>
> Typo in the name.
> LGTM with that fixed.

Thanks, pushed as r15-784-g01cfd601825014.

Thanks,
Andrew

> Thanks,
> Kyrill
>
>>
>>
>>  @samp{neoverse-512tvb}, @samp{neoverse-e1}, @samp{neoverse-n1},
>>  @samp{neoverse-n2}, @samp{neoverse-v1}, @samp{neoverse-v2}, @samp{qdf24xx},
>>  @samp{saphira}, @samp{phecda}, @samp{xgene1}, @samp{vulcan},
>> --
>> 2.43.0
>>


[gcc r15-784] AARCH64: Add Qualcomnm oryon-1 core

2024-05-22 Thread Andrew Pinski via Gcc-cvs
https://gcc.gnu.org/g:01cfd6018250141a262219c5803c3f2a278d909d

commit r15-784-g01cfd6018250141a262219c5803c3f2a278d909d
Author: Andrew Pinski 
Date:   Fri Apr 5 13:40:35 2024 -0700

AARCH64: Add Qualcomnm oryon-1 core

This patch adds Qualcomm's new oryon-1 core; this is enough
to recognize the core; the tuning structure will be added later.

gcc/ChangeLog:

* config/aarch64/aarch64-cores.def (oryon-1): New entry.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc/invoke.texi  (AArch64 Options): Document oryon-1.

Signed-off-by: Andrew Pinski 
Co-authored-by: Joel Jones 
Co-authored-by: Wei Zhao 

Diff:
---
 gcc/config/aarch64/aarch64-cores.def | 5 +
 gcc/config/aarch64/aarch64-tune.md   | 2 +-
 gcc/doc/invoke.texi  | 1 +
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index f69fc212d56..be60929e400 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -151,6 +151,11 @@ AARCH64_CORE("neoverse-512tvb", neoverse512tvb, cortexa57, V8_4A,  (SVE, I8MM, B
 /* Qualcomm ('Q') cores. */
 AARCH64_CORE("saphira", saphira,saphira,V8_4A,  (CRYPTO), saphira,   0x51, 0xC01, -1)
 
+/* ARMv8.6-A Architecture Processors.  */
+
+/* Qualcomm ('Q') cores. */
+AARCH64_CORE("oryon-1", oryon1, cortexa57, V8_6A, (CRYPTO, SM4, SHA3, F16), cortexa72,   0x51, 0x001, -1)
+
 /* ARMv8-A big.LITTLE implementations.  */
 
 AARCH64_CORE("cortex-a57.cortex-a53",  cortexa57cortexa53, cortexa53, V8A,  (CRC), cortexa57, 0x41, AARCH64_BIG_LITTLE (0xd07, 0xd03), -1)
diff --git a/gcc/config/aarch64/aarch64-tune.md b/gcc/config/aarch64/aarch64-tune.md
index abd3c9e0822..ba940f1c890 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-   
"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,ampere1b,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,cobalt100,neoversev2,demeter,generic,generic_armv8_a,generic_armv9_a"
+   
"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,ampere1b,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,oryon1,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,cobalt100,neoversev2,demeter,generic,generic_armv8_a,generic_armv9_a"
(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 0625a5ede6f..c9d8f6b37b6 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -21382,6 +21382,7 @@ performance of the code.  Permissible values for this option are:
 @samp{cortex-a65}, @samp{cortex-a65ae}, @samp{cortex-a34},
 @samp{cortex-a78}, @samp{cortex-a78ae}, @samp{cortex-a78c},
 @samp{ares}, @samp{exynos-m1}, @samp{emag}, @samp{falkor},
+@samp{oryon-1},
 @samp{neoverse-512tvb}, @samp{neoverse-e1}, @samp{neoverse-n1},
 @samp{neoverse-n2}, @samp{neoverse-v1}, @samp{neoverse-v2}, @samp{qdf24xx},
 @samp{saphira}, @samp{phecda}, @samp{xgene1}, @samp{vulcan},


Re: [PATCH] aarch64: Fold vget_high_* intrinsics to BIT_FIELD_REF [PR102171]

2024-05-22 Thread Andrew Pinski
On Wed, May 22, 2024 at 5:28 AM Richard Sandiford
 wrote:
>
> Pengxuan Zheng  writes:
> > This patch is a follow-up of r15-697-ga2e4fe5a53cf75 to also fold vget_high_*
> > intrinsics to BIT_FIELD_REF and remove the vget_high_* definitions from
> > arm_neon.h to use the new intrinsics framework.
> >
> >   PR target/102171
> >
> > gcc/ChangeLog:
> >
> >   * config/aarch64/aarch64-builtins.cc (AARCH64_SIMD_VGET_HIGH_BUILTINS):
> >   New macro to create definitions for all vget_high intrinsics.
> >   (VGET_HIGH_BUILTIN): Likewise.
> >   (enum aarch64_builtins): Add vget_high function codes.
> >   (AARCH64_SIMD_VGET_LOW_BUILTINS): Delete duplicate macro.
> >   (aarch64_general_fold_builtin): Fold vget_high calls.
> >   * config/aarch64/aarch64-simd-builtins.def: Delete vget_high builtins.
> >   * config/aarch64/aarch64-simd.md (aarch64_get_high): Delete.
> >   (aarch64_vget_hi_halfv8bf): Likewise.
> >   * config/aarch64/arm_neon.h (__attribute__): Delete.
> >   (vget_high_f16): Likewise.
> >   (vget_high_f32): Likewise.
> >   (vget_high_f64): Likewise.
> >   (vget_high_p8): Likewise.
> >   (vget_high_p16): Likewise.
> >   (vget_high_p64): Likewise.
> >   (vget_high_s8): Likewise.
> >   (vget_high_s16): Likewise.
> >   (vget_high_s32): Likewise.
> >   (vget_high_s64): Likewise.
> >   (vget_high_u8): Likewise.
> >   (vget_high_u16): Likewise.
> >   (vget_high_u32): Likewise.
> >   (vget_high_u64): Likewise.
> >   (vget_high_bf16): Likewise.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/aarch64/vget_high_2.c: New test.
> >   * gcc.target/aarch64/vget_high_2_be.c: New test.
>
> OK, thanks.

Pushed as r15-778-g1d1ef1c22752b3 .

Thanks,
Andrew


>
> Richard
>
> > Signed-off-by: Pengxuan Zheng 
> > ---
> >  gcc/config/aarch64/aarch64-builtins.cc|  59 +++---
> >  gcc/config/aarch64/aarch64-simd-builtins.def  |   6 -
> >  gcc/config/aarch64/aarch64-simd.md|  22 
> >  gcc/config/aarch64/arm_neon.h | 105 --
> >  .../gcc.target/aarch64/vget_high_2.c  |  30 +
> >  .../gcc.target/aarch64/vget_high_2_be.c   |  31 ++
> >  6 files changed, 104 insertions(+), 149 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/vget_high_2.c
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/vget_high_2_be.c
> >
> > diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
> > index 11b888016ed..f8eeccb554d 100644
> > --- a/gcc/config/aarch64/aarch64-builtins.cc
> > +++ b/gcc/config/aarch64/aarch64-builtins.cc
> > @@ -675,6 +675,23 @@ static aarch64_simd_builtin_datum aarch64_simd_builtin_data[] = {
> >VGET_LOW_BUILTIN(u64) \
> >VGET_LOW_BUILTIN(bf16)
> >
> > +#define AARCH64_SIMD_VGET_HIGH_BUILTINS \
> > +  VGET_HIGH_BUILTIN(f16) \
> > +  VGET_HIGH_BUILTIN(f32) \
> > +  VGET_HIGH_BUILTIN(f64) \
> > +  VGET_HIGH_BUILTIN(p8) \
> > +  VGET_HIGH_BUILTIN(p16) \
> > +  VGET_HIGH_BUILTIN(p64) \
> > +  VGET_HIGH_BUILTIN(s8) \
> > +  VGET_HIGH_BUILTIN(s16) \
> > +  VGET_HIGH_BUILTIN(s32) \
> > +  VGET_HIGH_BUILTIN(s64) \
> > +  VGET_HIGH_BUILTIN(u8) \
> > +  VGET_HIGH_BUILTIN(u16) \
> > +  VGET_HIGH_BUILTIN(u32) \
> > +  VGET_HIGH_BUILTIN(u64) \
> > +  VGET_HIGH_BUILTIN(bf16)
> > +
> >  typedef struct
> >  {
> >const char *name;
> > @@ -717,6 +734,9 @@ typedef struct
> >  #define VGET_LOW_BUILTIN(A) \
> >AARCH64_SIMD_BUILTIN_VGET_LOW_##A,
> >
> > +#define VGET_HIGH_BUILTIN(A) \
> > +  AARCH64_SIMD_BUILTIN_VGET_HIGH_##A,
> > +
> >  #undef VAR1
> >  #define VAR1(T, N, MAP, FLAG, A) \
> >AARCH64_SIMD_BUILTIN_##T##_##N##A,
> > @@ -753,6 +773,7 @@ enum aarch64_builtins
> >/* SIMD intrinsic builtins.  */
> >AARCH64_SIMD_VREINTERPRET_BUILTINS
> >AARCH64_SIMD_VGET_LOW_BUILTINS
> > +  AARCH64_SIMD_VGET_HIGH_BUILTINS
> >/* ARMv8.3-A Pointer Authentication Builtins.  */
> >AARCH64_PAUTH_BUILTIN_AUTIA1716,
> >AARCH64_PAUTH_BUILTIN_PACIA1716,
> > @@ -855,26 +876,21 @@ static aarch64_fcmla_laneq_builtin_datum aarch64_fcmla_lane_builtin_data[] = {
> > false \
> >},
> >
> > -#define AARCH64_SIMD_VGET_LOW_BUILTINS \
> > -  VGET_LOW_BUILTIN(f16) \
> > -  VGET_LOW_BUILTIN(f32) \
> > -  VGET_LOW_BUILTIN(f64) \
> > -  VGET_LOW_BUILTIN(p8) \
> > -  VGET_LOW_BUILTIN(p16) \
> > -  VGET_LOW_BUILTIN(p64) \
> > -  VGET_LOW_BUILTIN(s8) \
> > -  VGET_LOW_BUILTIN(s16) \
> > -  VGET_LOW_BUILTIN(s32) \
> > -  VGET_LOW_BUILTIN(s64) \
> > -  VGET_LOW_BUILTIN(u8) \
> > -  VGET_LOW_BUILTIN(u16) \
> > -  VGET_LOW_BUILTIN(u32) \
> > -  VGET_LOW_BUILTIN(u64) \
> > -  VGET_LOW_BUILTIN(bf16)
> > +#undef VGET_HIGH_BUILTIN
> > +#define VGET_HIGH_BUILTIN(A) \
> > +  {"vget_high_" #A, \
> > +   AARCH64_SIMD_BUILTIN_VGET_HIGH_##A, \
> > +   2, \
> > +   { SIMD_INTR_MODE(A, d), SIMD_INTR_MODE(A, q) }, \
> > +   { SIMD_INTR_QUAL(A), 

[gcc r15-778] aarch64: Fold vget_high_* intrinsics to BIT_FIELD_REF [PR102171]

2024-05-22 Thread Andrew Pinski via Gcc-cvs
https://gcc.gnu.org/g:1d1ef1c22752b3e250ee769ae6d79f537471a57f

commit r15-778-g1d1ef1c22752b3e250ee769ae6d79f537471a57f
Author: Pengxuan Zheng 
Date:   Tue May 21 10:55:06 2024 -0700

aarch64: Fold vget_high_* intrinsics to BIT_FIELD_REF [PR102171]

This patch is a follow-up of r15-697-ga2e4fe5a53cf75 to also fold vget_high_*
intrinsics to BIT_FIELD_REF and remove the vget_high_* definitions from
arm_neon.h to use the new intrinsics framework.

PR target/102171

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc (AARCH64_SIMD_VGET_HIGH_BUILTINS):
New macro to create definitions for all vget_high intrinsics.
(VGET_HIGH_BUILTIN): Likewise.
(enum aarch64_builtins): Add vget_high function codes.
(AARCH64_SIMD_VGET_LOW_BUILTINS): Delete duplicate macro.
(aarch64_general_fold_builtin): Fold vget_high calls.
* config/aarch64/aarch64-simd-builtins.def: Delete vget_high builtins.
* config/aarch64/aarch64-simd.md (aarch64_get_high): Delete.
(aarch64_vget_hi_halfv8bf): Likewise.
* config/aarch64/arm_neon.h (__attribute__): Delete.
(vget_high_f16): Likewise.
(vget_high_f32): Likewise.
(vget_high_f64): Likewise.
(vget_high_p8): Likewise.
(vget_high_p16): Likewise.
(vget_high_p64): Likewise.
(vget_high_s8): Likewise.
(vget_high_s16): Likewise.
(vget_high_s32): Likewise.
(vget_high_s64): Likewise.
(vget_high_u8): Likewise.
(vget_high_u16): Likewise.
(vget_high_u32): Likewise.
(vget_high_u64): Likewise.
(vget_high_bf16): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vget_high_2.c: New test.
* gcc.target/aarch64/vget_high_2_be.c: New test.

Signed-off-by: Pengxuan Zheng 

Diff:
---
 gcc/config/aarch64/aarch64-builtins.cc|  59 
 gcc/config/aarch64/aarch64-simd-builtins.def  |   6 --
 gcc/config/aarch64/aarch64-simd.md|  22 -
 gcc/config/aarch64/arm_neon.h | 105 --
 gcc/testsuite/gcc.target/aarch64/vget_high_2.c|  30 +++
 gcc/testsuite/gcc.target/aarch64/vget_high_2_be.c |  31 +++
 6 files changed, 104 insertions(+), 149 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
index 11b888016ed..f8eeccb554d 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -675,6 +675,23 @@ static aarch64_simd_builtin_datum aarch64_simd_builtin_data[] = {
   VGET_LOW_BUILTIN(u64) \
   VGET_LOW_BUILTIN(bf16)
 
+#define AARCH64_SIMD_VGET_HIGH_BUILTINS \
+  VGET_HIGH_BUILTIN(f16) \
+  VGET_HIGH_BUILTIN(f32) \
+  VGET_HIGH_BUILTIN(f64) \
+  VGET_HIGH_BUILTIN(p8) \
+  VGET_HIGH_BUILTIN(p16) \
+  VGET_HIGH_BUILTIN(p64) \
+  VGET_HIGH_BUILTIN(s8) \
+  VGET_HIGH_BUILTIN(s16) \
+  VGET_HIGH_BUILTIN(s32) \
+  VGET_HIGH_BUILTIN(s64) \
+  VGET_HIGH_BUILTIN(u8) \
+  VGET_HIGH_BUILTIN(u16) \
+  VGET_HIGH_BUILTIN(u32) \
+  VGET_HIGH_BUILTIN(u64) \
+  VGET_HIGH_BUILTIN(bf16)
+
 typedef struct
 {
   const char *name;
@@ -717,6 +734,9 @@ typedef struct
 #define VGET_LOW_BUILTIN(A) \
   AARCH64_SIMD_BUILTIN_VGET_LOW_##A,
 
+#define VGET_HIGH_BUILTIN(A) \
+  AARCH64_SIMD_BUILTIN_VGET_HIGH_##A,
+
 #undef VAR1
 #define VAR1(T, N, MAP, FLAG, A) \
   AARCH64_SIMD_BUILTIN_##T##_##N##A,
@@ -753,6 +773,7 @@ enum aarch64_builtins
   /* SIMD intrinsic builtins.  */
   AARCH64_SIMD_VREINTERPRET_BUILTINS
   AARCH64_SIMD_VGET_LOW_BUILTINS
+  AARCH64_SIMD_VGET_HIGH_BUILTINS
   /* ARMv8.3-A Pointer Authentication Builtins.  */
   AARCH64_PAUTH_BUILTIN_AUTIA1716,
   AARCH64_PAUTH_BUILTIN_PACIA1716,
@@ -855,26 +876,21 @@ static aarch64_fcmla_laneq_builtin_datum aarch64_fcmla_lane_builtin_data[] = {
false \
   },
 
-#define AARCH64_SIMD_VGET_LOW_BUILTINS \
-  VGET_LOW_BUILTIN(f16) \
-  VGET_LOW_BUILTIN(f32) \
-  VGET_LOW_BUILTIN(f64) \
-  VGET_LOW_BUILTIN(p8) \
-  VGET_LOW_BUILTIN(p16) \
-  VGET_LOW_BUILTIN(p64) \
-  VGET_LOW_BUILTIN(s8) \
-  VGET_LOW_BUILTIN(s16) \
-  VGET_LOW_BUILTIN(s32) \
-  VGET_LOW_BUILTIN(s64) \
-  VGET_LOW_BUILTIN(u8) \
-  VGET_LOW_BUILTIN(u16) \
-  VGET_LOW_BUILTIN(u32) \
-  VGET_LOW_BUILTIN(u64) \
-  VGET_LOW_BUILTIN(bf16)
+#undef VGET_HIGH_BUILTIN
+#define VGET_HIGH_BUILTIN(A) \
+  {"vget_high_" #A, \
+   AARCH64_SIMD_BUILTIN_VGET_HIGH_##A, \
+   2, \
+   { SIMD_INTR_MODE(A, d), SIMD_INTR_MODE(A, q) }, \
+   { SIMD_INTR_QUAL(A), SIMD_INTR_QUAL(A) }, \
+   FLAG_AUTO_FP, \
+   false \
+  },
 
 static const aarch64_simd_intrinsic_datum aarch64_simd_intrinsic_data[] = {
   AARCH64_SIMD_VREINTERPRET_BUILTINS
   AARCH64_SIMD_VGET_LOW_BUILTINS
+  AARCH64_SIMD_VGET_HIGH_BUILTINS
 };
 
 
@@ -3270,6 +3286,10 @@ 
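
What the fold computes can be pictured in plain C with GCC's generic vector extension rather than arm_neon.h. vget_high_s32 returns lanes 2 and 3 of a four-lane 128-bit vector. In memory order those lanes are the upper 8 bytes of the vector, which is the kind of 64-bit sub-vector a BIT_FIELD_REF can extract. The typedefs `v4si` and `v2si` are our own stand-ins for int32x4_t and int32x2_t. The sketch deliberately ignores the big-endian lane-numbering details that the patch's vget_high_2_be.c test exercises.

```c
#include <string.h>

/* Stand-ins for int32x4_t / int32x2_t using GCC vector extensions.  */
typedef int v4si __attribute__((vector_size(16)));
typedef int v2si __attribute__((vector_size(8)));

/* Lane view: the high half is lanes 2 and 3.  */
static v2si get_high_lanes(v4si v)
{
    v2si r = { v[2], v[3] };
    return r;
}

/* Memory view: the same lanes are bytes 8..15, because vector elements
   are laid out in memory in array order.  */
static v2si get_high_bytes(v4si v)
{
    v2si r;
    memcpy(&r, (const char *)&v + 8, sizeof r);
    return r;
}
```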

[gcc r13-8784] PHIOPT: Don't transform minmax if middle bb contains a phi [PR115143]

2024-05-21 Thread Andrew Pinski via Gcc-cvs
https://gcc.gnu.org/g:3f6a42510a1bd4b004ed70ac44cdad2770b732a8

commit r13-8784-g3f6a42510a1bd4b004ed70ac44cdad2770b732a8
Author: Andrew Pinski 
Date:   Sat May 18 11:55:58 2024 -0700

PHIOPT: Don't transform minmax if middle bb contains a phi [PR115143]

The problem here is that even if last_and_only_stmt returns a statement,
the bb might still contain a phi node defining an SSA name that is used
in that statement, so we need to add a check to make sure that there are
no phi nodes in the middle bbs in both the
`CMP?MINMAX:MINMAX` case and the `CMP?MINMAX:B` case.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/115143

gcc/ChangeLog:

* tree-ssa-phiopt.cc (minmax_replacement): Check for empty
phi nodes for middle bbs for the case where middle bb is not empty.

gcc/testsuite/ChangeLog:

* gcc.c-torture/compile/pr115143-1.c: New test.
* gcc.c-torture/compile/pr115143-2.c: New test.
* gcc.c-torture/compile/pr115143-3.c: New test.

Signed-off-by: Andrew Pinski 
(cherry picked from commit 9ff8f041331ef8b56007fb3c4d41d76f9850010d)

Diff:
---
 gcc/testsuite/gcc.c-torture/compile/pr115143-1.c | 21 +
 gcc/testsuite/gcc.c-torture/compile/pr115143-2.c | 30 
 gcc/testsuite/gcc.c-torture/compile/pr115143-3.c | 29 +++
 gcc/tree-ssa-phiopt.cc   | 12 ++
 4 files changed, 92 insertions(+)

diff --git a/gcc/testsuite/gcc.c-torture/compile/pr115143-1.c b/gcc/testsuite/gcc.c-torture/compile/pr115143-1.c
new file mode 100644
index 000..5cb119ea432
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr115143-1.c
@@ -0,0 +1,21 @@
+/* PR tree-optimization/115143 */
+/* This used to ICE.
+   minmax part of phiopt would transform,
+   would transform `a!=0?min(a, b) : 0` into `min(a,b)`
+   which was correct except b was defined by a phi in the inner
+   bb which was not handled. */
+short a, d;
+char b;
+long c;
+unsigned long e, f;
+void g(unsigned long h) {
+  if (c ? e : b)
+if (e)
+  if (d) {
+a = f ? ({
+  unsigned long i = d ? f : 0, j = e ? h : 0;
+  i < j ? i : j;
+}) : 0;
+  }
+}
+
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr115143-2.c b/gcc/testsuite/gcc.c-torture/compile/pr115143-2.c
new file mode 100644
index 000..05c3bbe9738
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr115143-2.c
@@ -0,0 +1,30 @@
+/* { dg-options "-fgimple" } */
+/* PR tree-optimization/115143 */
+/* This used to ICE.
+   minmax part of phiopt would transform,
+   would transform `a!=0?min(a, b) : 0` into `min(a,b)`
+   which was correct except b was defined by a phi in the inner
+   bb which was not handled. */
+unsigned __GIMPLE (ssa,startwith("phiopt"))
+foo (unsigned a, unsigned b)
+{
+  unsigned j;
+  unsigned _23;
+  unsigned _12;
+
+  __BB(2):
+  if (a_6(D) != 0u)
+goto __BB3;
+  else
+goto __BB4;
+
+  __BB(3):
+  j_10 = __PHI (__BB2: b_11(D));
+  _23 = __MIN (a_6(D), j_10);
+  goto __BB4;
+
+  __BB(4):
+  _12 = __PHI (__BB3: _23, __BB2: 0u);
+  return _12;
+
+}
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr115143-3.c b/gcc/testsuite/gcc.c-torture/compile/pr115143-3.c
new file mode 100644
index 000..53c5fb5588e
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr115143-3.c
@@ -0,0 +1,29 @@
+/* { dg-options "-fgimple" } */
+/* PR tree-optimization/115143 */
+/* This used to ICE.
+   minmax part of phiopt would transform,
+   would transform `a!=0?min(a, b) : 0` into `min(a,b)`
+   which was correct except b was defined by a phi in the inner
+   bb which was not handled. */
+unsigned __GIMPLE (ssa,startwith("phiopt"))
+foo (unsigned a, unsigned b)
+{
+  unsigned j;
+  unsigned _23;
+  unsigned _12;
+
+  __BB(2):
+  if (a_6(D) > 0u)
+goto __BB3;
+  else
+goto __BB4;
+
+  __BB(3):
+  j_10 = __PHI (__BB2: b_7(D));
+  _23 = __MIN (a_6(D), j_10);
+  goto __BB4;
+
+  __BB(4):
+  _12 = __PHI (__BB3: _23, __BB2: 0u);
+  return _12;
+}
diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index c3d78d1400b..d507530307a 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -2106,6 +2106,10 @@ minmax_replacement (basic_block cond_bb, basic_block middle_bb, basic_block alt_
  || gimple_code (assign) != GIMPLE_ASSIGN)
return false;
 
+  /* There cannot be any phi nodes in the middle bb. */
+  if (!gimple_seq_empty_p (phi_nodes (middle_bb)))
+   return false;
+
   lhs = gimple_assign_lhs (assign);
   ass_code = gimple_assign_rhs_code (assign);
   if (ass_code != MAX_EXPR && ass_code != MIN_EXPR)
@@ -2119,6 +2123,10 @@ minmax_replacement (basic_block cond_bb, basic_block middle_bb, basic_block alt_
  || gimple_code (assign) != GIMPLE_ASSIGN)

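The test comments describe the source-level identity that `minmax_replacement` relies on. For unsigned `a`, `a != 0 ? MIN (a, b) : 0` equals `MIN (a, b)`, because `MIN (0, b)` is 0. The sketch below checks that identity in plain C; the function names are hypothetical. It cannot reproduce the ICE itself. That ICE depends on `b` being defined by a PHI node in the middle bb (as in the `__GIMPLE` tests), which has no direct C equivalent.

```c
#include <assert.h>

/* MIN of two unsigned values, written out so the example is standalone.  */
static unsigned
umin (unsigned a, unsigned b)
{
  return a < b ? a : b;
}

/* Source shape matched by the transform: a != 0 ? MIN (a, b) : 0.  */
static unsigned
guarded_min (unsigned a, unsigned b)
{
  return a != 0 ? umin (a, b) : 0;
}

/* Shape after the transform: the guard is redundant because
   MIN (0, b) == 0 for every unsigned b.  */
static unsigned
plain_min (unsigned a, unsigned b)
{
  return umin (a, b);
}

/* Return 1 if both shapes agree over a small grid that includes 0 and
   the largest unsigned value, 0 otherwise.  */
static int
identity_holds (void)
{
  static const unsigned vals[] = { 0u, 1u, 2u, 7u, 0x80000000u, 0xffffffffu };
  const unsigned n = sizeof vals / sizeof vals[0];
  for (unsigned i = 0; i < n; i++)
    for (unsigned j = 0; j < n; j++)
      if (guarded_min (vals[i], vals[j]) != plain_min (vals[i], vals[j]))
        return 0;
  return 1;
}
```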
[gcc r13-8783] match: Disable `(type)zero_one_valuep*CST` for 1bit signed types [PR115154]

2024-05-21 Thread Andrew Pinski via Gcc-cvs
https://gcc.gnu.org/g:d6cf49eaf5ac237c57785dce42c89deac911affa

commit r13-8783-gd6cf49eaf5ac237c57785dce42c89deac911affa
Author: Andrew Pinski 
Date:   Mon May 20 00:16:40 2024 -0700

match: Disable `(type)zero_one_valuep*CST` for 1bit signed types [PR115154]

The problem here is that the pattern added in r13-1162-g9991d84d2a8435
assumes that it is well defined to multiply zero_one_valuep by the truncated
converted integer constant. It is well defined for all types except signed
1-bit types, where `a * -1` is produced, which is undefined.
So disable this pattern for 1-bit signed types.

Note the pattern added in r14-3432-gddd64a6ec3b38e is able to work around
the undefinedness except when `-fsanitize=undefined` is turned on; this is
why I added a testcase for that.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/115154

gcc/ChangeLog:

* match.pd (convert (mult zero_one_valued_p@1 INTEGER_CST@2)): Disable
for 1bit signed types.

gcc/testsuite/ChangeLog:

* c-c++-common/ubsan/signed1bitfield-1.c: New test.
* gcc.c-torture/execute/signed1bitfield-1.c: New test.

Signed-off-by: Andrew Pinski 
(cherry picked from commit 49c87d22535ac4f8aacf088b3f462861c26cacb4)

Diff:
---
 gcc/match.pd   |  6 --
 .../c-c++-common/ubsan/signed1bitfield-1.c | 25 ++
 .../gcc.c-torture/execute/signed1bitfield-1.c  | 23 
 3 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index dc34e7ead9f..fda4a211efc 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2023,12 +2023,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (mult (convert @0) @1)))
 
 /* Narrow integer multiplication by a zero_one_valued_p operand.
-   Multiplication by [0,1] is guaranteed not to overflow.  */
+   Multiplication by [0,1] is guaranteed not to overflow except for
+   1bit signed types.  */
 (simplify
  (convert (mult@0 zero_one_valued_p@1 INTEGER_CST@2))
  (if (INTEGRAL_TYPE_P (type)
   && INTEGRAL_TYPE_P (TREE_TYPE (@0))
-  && TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (@0)))
+  && TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (@0))
+  && (TYPE_UNSIGNED (type) || TYPE_PRECISION (type) > 1))
   (mult (convert @1) (convert @2
 
 /* (X << C) != 0 can be simplified to X, when C is zero_one_valued_p.
diff --git a/gcc/testsuite/c-c++-common/ubsan/signed1bitfield-1.c b/gcc/testsuite/c-c++-common/ubsan/signed1bitfield-1.c
new file mode 100644
index 000..2ba8cf4dab0
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/ubsan/signed1bitfield-1.c
@@ -0,0 +1,25 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fsanitize=undefined" } */
+
+/* PR tree-optimization/115154 */
+/* This was being miscompiled with -fsanitize=undefined due to
+   `(signed:1)(t*5)` being transformed into `-((signed:1)t)` which
+   is undefined. */
+
+struct s {
+  signed b : 1;
+} f;
+int i = 55;
+__attribute__((noinline))
+void check(int a)
+{
+if (!a)
+__builtin_abort();
+}
+int main() {
+int t = i != 5;
+t = t*5;
+f.b = t;
+int tt = f.b;
+check(f.b);
+}
diff --git a/gcc/testsuite/gcc.c-torture/execute/signed1bitfield-1.c b/gcc/testsuite/gcc.c-torture/execute/signed1bitfield-1.c
new file mode 100644
index 000..ab888ca3a04
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/signed1bitfield-1.c
@@ -0,0 +1,23 @@
+/* PR tree-optimization/115154 */
+/* This was being miscompiled to `(signed:1)(t*5)`
+   being transformed into `-((signed:1)t)` which is undefined.
+   Note there is a pattern which removes the negative in some cases
+   which works around the issue.  */
+
+struct {
+  signed b : 1;
+} f;
+int i = 55;
+__attribute__((noinline))
+void check(int a)
+{
+if (!a)
+__builtin_abort();
+}
+int main() {
+int t = i != 5;
+t = t*5;
+f.b = t;
+int tt = f.b;
+check(f.b);
+}


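A 1-bit signed type is the exception because `signed b : 1` can only hold -1 and 0. Both the zero_one value 1 and the constant 5 therefore convert to -1. For `t == 1`, the narrowed product `(signed:1)t * (signed:1)5` becomes `(-1) * (-1) == 1`, and the type cannot represent that value. The sketch below shows this, assuming GCC's documented modulo-2^N reduction for out-of-range conversions to signed types (ISO C leaves this implementation-defined). The helper names are hypothetical.

```c
#include <assert.h>

/* A signed 1-bit bit-field, as in the PR115154 testcases.
   Its only representable values are -1 and 0.  */
struct one_bit
{
  signed b : 1;
};

/* Store V into the bit-field and read it back.  Under GCC's modulo-2^1
   reduction, odd values read back as -1 and even values as 0.  */
static int
store_and_load (int v)
{
  struct one_bit s;
  s.b = v;
  return s.b;
}

/* Evaluate the narrowed form (signed:1)t * (signed:1)c in int arithmetic,
   so that a result outside the 1-bit range is observable.  */
static int
narrowed_product (int t, int c)
{
  return store_and_load (t) * store_and_load (c);
}

/* True if V fits in a signed 1-bit type.  */
static int
fits_signed_1bit (int v)
{
  return v == -1 || v == 0;
}
```

With `t == 1` and `c == 5`, the wide form `(signed:1)(t * 5)` is -1 and fits. The narrowed form yields 1, which would overflow the 1-bit type. This is why the pattern now requires `TYPE_UNSIGNED (type) || TYPE_PRECISION (type) > 1`.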
[gcc r14-10224] match: Disable `(type)zero_one_valuep*CST` for 1bit signed types [PR115154]

2024-05-21 Thread Andrew Pinski via Gcc-cvs
https://gcc.gnu.org/g:b2bb49d6a77e4568c0b91db17b2599f5929fb85b

commit r14-10224-gb2bb49d6a77e4568c0b91db17b2599f5929fb85b
Author: Andrew Pinski 
Date:   Mon May 20 00:16:40 2024 -0700

match: Disable `(type)zero_one_valuep*CST` for 1bit signed types [PR115154]

The problem here is that the pattern added in r13-1162-g9991d84d2a8435
assumes that it is well defined to multiply zero_one_valuep by the truncated
converted integer constant. It is well defined for all types except signed
1-bit types, where `a * -1` is produced, which is undefined.
So disable this pattern for 1-bit signed types.

Note the pattern added in r14-3432-gddd64a6ec3b38e is able to work around
the undefinedness except when `-fsanitize=undefined` is turned on; this is
why I added a testcase for that.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/115154

gcc/ChangeLog:

* match.pd (convert (mult zero_one_valued_p@1 INTEGER_CST@2)): Disable
for 1bit signed types.

gcc/testsuite/ChangeLog:

* c-c++-common/ubsan/signed1bitfield-1.c: New test.
* gcc.c-torture/execute/signed1bitfield-1.c: New test.

Signed-off-by: Andrew Pinski 
(cherry picked from commit 49c87d22535ac4f8aacf088b3f462861c26cacb4)

Diff:
---
 gcc/match.pd   |  6 --
 .../c-c++-common/ubsan/signed1bitfield-1.c | 25 ++
 .../gcc.c-torture/execute/signed1bitfield-1.c  | 23 
 3 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index d401e7503e6..4a0aa80cee1 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2395,12 +2395,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (mult (convert @0) @1)))
 
 /* Narrow integer multiplication by a zero_one_valued_p operand.
-   Multiplication by [0,1] is guaranteed not to overflow.  */
+   Multiplication by [0,1] is guaranteed not to overflow except for
+   1bit signed types.  */
 (simplify
  (convert (mult@0 zero_one_valued_p@1 INTEGER_CST@2))
  (if (INTEGRAL_TYPE_P (type)
   && INTEGRAL_TYPE_P (TREE_TYPE (@0))
-  && TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (@0)))
+  && TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (@0))
+  && (TYPE_UNSIGNED (type) || TYPE_PRECISION (type) > 1))
   (mult (convert @1) (convert @2
 
 /* (X << C) != 0 can be simplified to X, when C is zero_one_valued_p.
diff --git a/gcc/testsuite/c-c++-common/ubsan/signed1bitfield-1.c b/gcc/testsuite/c-c++-common/ubsan/signed1bitfield-1.c
new file mode 100644
index 000..2ba8cf4dab0
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/ubsan/signed1bitfield-1.c
@@ -0,0 +1,25 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fsanitize=undefined" } */
+
+/* PR tree-optimization/115154 */
+/* This was being miscompiled with -fsanitize=undefined due to
+   `(signed:1)(t*5)` being transformed into `-((signed:1)t)` which
+   is undefined. */
+
+struct s {
+  signed b : 1;
+} f;
+int i = 55;
+__attribute__((noinline))
+void check(int a)
+{
+if (!a)
+__builtin_abort();
+}
+int main() {
+int t = i != 5;
+t = t*5;
+f.b = t;
+int tt = f.b;
+check(f.b);
+}
diff --git a/gcc/testsuite/gcc.c-torture/execute/signed1bitfield-1.c b/gcc/testsuite/gcc.c-torture/execute/signed1bitfield-1.c
new file mode 100644
index 000..ab888ca3a04
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/signed1bitfield-1.c
@@ -0,0 +1,23 @@
+/* PR tree-optimization/115154 */
+/* This was being miscompiled to `(signed:1)(t*5)`
+   being transformed into `-((signed:1)t)` which is undefined.
+   Note there is a pattern which removes the negative in some cases
+   which works around the issue.  */
+
+struct {
+  signed b : 1;
+} f;
+int i = 55;
+__attribute__((noinline))
+void check(int a)
+{
+if (!a)
+__builtin_abort();
+}
+int main() {
+int t = i != 5;
+t = t*5;
+f.b = t;
+int tt = f.b;
+check(f.b);
+}


[gcc r15-755] match: Disable `(type)zero_one_valuep*CST` for 1bit signed types [PR115154]

2024-05-21 Thread Andrew Pinski via Gcc-cvs
https://gcc.gnu.org/g:49c87d22535ac4f8aacf088b3f462861c26cacb4

commit r15-755-g49c87d22535ac4f8aacf088b3f462861c26cacb4
Author: Andrew Pinski 
Date:   Mon May 20 00:16:40 2024 -0700

match: Disable `(type)zero_one_valuep*CST` for 1bit signed types [PR115154]

The problem here is that the pattern added in r13-1162-g9991d84d2a8435
assumes that it is well defined to multiply zero_one_valuep by the truncated
converted integer constant. It is well defined for all types except signed
1-bit types, where `a * -1` is produced, which is undefined.
So disable this pattern for 1-bit signed types.

Note the pattern added in r14-3432-gddd64a6ec3b38e is able to work around
the undefinedness except when `-fsanitize=undefined` is turned on; this is
why I added a testcase for that.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/115154

gcc/ChangeLog:

* match.pd (convert (mult zero_one_valued_p@1 INTEGER_CST@2)): Disable
for 1bit signed types.

gcc/testsuite/ChangeLog:

* c-c++-common/ubsan/signed1bitfield-1.c: New test.
* gcc.c-torture/execute/signed1bitfield-1.c: New test.

Signed-off-by: Andrew Pinski 

Diff:
---
 gcc/match.pd   |  6 --
 .../c-c++-common/ubsan/signed1bitfield-1.c | 25 ++
 .../gcc.c-torture/execute/signed1bitfield-1.c  | 23 
 3 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index 0f9c34fa897..35e3d82b131 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2395,12 +2395,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (mult (convert @0) @1)))
 
 /* Narrow integer multiplication by a zero_one_valued_p operand.
-   Multiplication by [0,1] is guaranteed not to overflow.  */
+   Multiplication by [0,1] is guaranteed not to overflow except for
+   1bit signed types.  */
 (simplify
  (convert (mult@0 zero_one_valued_p@1 INTEGER_CST@2))
  (if (INTEGRAL_TYPE_P (type)
   && INTEGRAL_TYPE_P (TREE_TYPE (@0))
-  && TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (@0)))
+  && TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (@0))
+  && (TYPE_UNSIGNED (type) || TYPE_PRECISION (type) > 1))
   (mult (convert @1) (convert @2
 
 /* (X << C) != 0 can be simplified to X, when C is zero_one_valued_p.
diff --git a/gcc/testsuite/c-c++-common/ubsan/signed1bitfield-1.c b/gcc/testsuite/c-c++-common/ubsan/signed1bitfield-1.c
new file mode 100644
index 000..2ba8cf4dab0
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/ubsan/signed1bitfield-1.c
@@ -0,0 +1,25 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fsanitize=undefined" } */
+
+/* PR tree-optimization/115154 */
+/* This was being miscompiled with -fsanitize=undefined due to
+   `(signed:1)(t*5)` being transformed into `-((signed:1)t)` which
+   is undefined. */
+
+struct s {
+  signed b : 1;
+} f;
+int i = 55;
+__attribute__((noinline))
+void check(int a)
+{
+if (!a)
+__builtin_abort();
+}
+int main() {
+int t = i != 5;
+t = t*5;
+f.b = t;
+int tt = f.b;
+check(f.b);
+}
diff --git a/gcc/testsuite/gcc.c-torture/execute/signed1bitfield-1.c b/gcc/testsuite/gcc.c-torture/execute/signed1bitfield-1.c
new file mode 100644
index 000..ab888ca3a04
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/signed1bitfield-1.c
@@ -0,0 +1,23 @@
+/* PR tree-optimization/115154 */
+/* This was being miscompiled to `(signed:1)(t*5)`
+   being transformed into `-((signed:1)t)` which is undefined.
+   Note there is a pattern which removes the negative in some cases
+   which works around the issue.  */
+
+struct {
+  signed b : 1;
+} f;
+int i = 55;
+__attribute__((noinline))
+void check(int a)
+{
+if (!a)
+__builtin_abort();
+}
+int main() {
+int t = i != 5;
+t = t*5;
+f.b = t;
+int tt = f.b;
+check(f.b);
+}


Re: [PATCH v1 1/2] Match: Support __builtin_add_overflow branch form for unsigned SAT_ADD

2024-05-21 Thread Andrew Pinski
On Tue, May 21, 2024 at 5:28 AM Li, Pan2  wrote:
>
> Thanks Andrew for comments.
>
>
>
> > I think you need to make sure type and @0's type match.
>
>
>
> Oh, yes, we need that, will update in v2.
>
>
>
> > Also I don't think you need :c here since you don't match @0 nor @1 more than once.
>
>
>
> You mean the :c from (IFN_ADD_OVERFLOW:c@2 @0 @1)), right?
>
> My initial idea is to catch both the (IFN_ADD_OVERFLOW @0 @1) and 
> (IFN_ADD_OVERFLOW @1 @0).
>
> It is unnecessary if IFN_ADD_OVERFLOW takes care of this already.

In this case there is a canonical form/order (at least there
should be).
> + (cond (ne (imagpart (IFN_ADD_OVERFLOW:c@2 @0 @1)) integer_zerop)
> +  integer_minus_onep (realpart @2))

Since you are matching @2 for the realpart rather than `(IFN_ADD_OVERFLOW
@0 @1)` directly, the :c is not needed, and genmatch will just generate
extra matching code that can never be reached.

Thanks,
Andrew.

>
>
>
> Pan
>
>
>
>
>
> From: Andrew Pinski 
> Sent: Tuesday, May 21, 2024 7:40 PM
> To: Li, Pan2 
> Cc: GCC Patches ; 钟居哲 ; Kito Cheng ; Tamar Christina ; Richard Guenther
> Subject: Re: [PATCH v1 1/2] Match: Support __builtin_add_overflow branch form for unsigned SAT_ADD
>
>
>
>
>
> On Tue, May 21, 2024, 3:55 AM  wrote:
>
> From: Pan Li 
>
> This patch adds support for the __builtin_add_overflow branch form of
> unsigned SAT_ADD.  For example:
>
> uint64_t
> sat_add (uint64_t x, uint64_t y)
> {
>   uint64_t ret;
>   return __builtin_add_overflow (x, y, ) ? -1 : ret;
> }
>
> Unlike the branchless version, we use a simplify pattern to
> convert the branch version of SAT_ADD into the branchless form if and
> only if the backend supports IFN_SAT_ADD.  Thus, the backend can
> choose a branch or branchless implementation of .SAT_ADD.
> For example, some targets can handle branchy code more efficiently.
>
> When the target implements IFN_SAT_ADD for unsigned types, before this
> patch:
>
> uint64_t sat_add (uint64_t x, uint64_t y)
> {
>   long unsigned int _1;
>   long unsigned int _2;
>   uint64_t _3;
>   __complex__ long unsigned int _6;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _6 = .ADD_OVERFLOW (x_4(D), y_5(D));
>   _2 = IMAGPART_EXPR <_6>;
>   if (_2 != 0)
> goto ; [35.00%]
>   else
> goto ; [65.00%]
> ;;succ:   4
> ;;3
>
> ;;   basic block 3, loop depth 0
> ;;pred:   2
>   _1 = REALPART_EXPR <_6>;
> ;;succ:   4
>
> ;;   basic block 4, loop depth 0
> ;;pred:   3
> ;;2
>   # _3 = PHI <_1(3), 18446744073709551615(2)>
>   return _3;
> ;;succ:   EXIT
> }
>
> After this patch:
> uint64_t sat_add (uint64_t x, uint64_t y)
> {
>   long unsigned int _12;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _12 = .SAT_ADD (x_4(D), y_5(D)); [tail call]
>   return _12;
> ;;succ:   EXIT
> }
>
> The following test suites pass with this patch:
> * The x86 bootstrap test.
> * The x86 full regression test.
> * The riscv full regression test.
>
> gcc/ChangeLog:
>
> * match.pd: Add new simplify to convert branch SAT_ADD into
> branchless, if and only if the backend implements the IFN.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd | 10 ++
>  1 file changed, 10 insertions(+)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 0f9c34fa897..8b9ded98323 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3094,6 +3094,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (match (unsigned_integer_sat_add @0 @1)
>   (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1)))
>
> +#if GIMPLE
> +
> +(simplify
> + (cond (ne (imagpart (IFN_ADD_OVERFLOW:c@2 @0 @1)) integer_zerop)
> +  integer_minus_onep (realpart @2))
> + (if (direct_internal_fn_supported_p (IFN_SAT_ADD, type, OPTIMIZE_FOR_BOTH))
> +  (bit_ior (plus@3 @0 @1) (negate (convert (lt @3 @0))
>
>
>
> I think you need to make sure type and @0's type match.
>
>
>
> Also I don't think you need :c here since you don't match @0 nor @1 more than once.
>
>
>
> Thanks,
>
> Andrew
>
>
>
>
>
> +
> +#endif
> +
>  /* x >  y  &&  x != XXX_MIN  -->  x > y
> x >  y  &&  x == XXX_MIN  -->  false . */
>  (for eqne (eq ne)
> --
> 2.34.1

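The proposed simplify rewrites the branch form into `(bit_ior (plus@3 @0 @1) (negate (convert (lt @3 @0))))`, that is, `(x + y) | -(x + y < x)`. The unsigned sum wraps on overflow, the comparison recovers the carry, and negating the carry gives an all-ones mask. Below is a minimal C sketch of both forms for `uint64_t`, using the GCC/Clang `__builtin_add_overflow` builtin. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Branch form from the patch description: saturate to UINT64_MAX
   when the addition overflows.  */
static uint64_t
sat_add_branch (uint64_t x, uint64_t y)
{
  uint64_t ret;
  return __builtin_add_overflow (x, y, &ret) ? (uint64_t) -1 : ret;
}

/* Branchless form produced by the simplify:
   (x + y) | -(uint64_t) ((x + y) < x).  */
static uint64_t
sat_add_branchless (uint64_t x, uint64_t y)
{
  uint64_t sum = x + y;                   /* wraps modulo 2^64 */
  uint64_t carry = (uint64_t) (sum < x);  /* 1 iff the addition wrapped */
  return sum | -carry;                    /* all-ones mask on overflow */
}
```

When the target supports IFN_SAT_ADD, the branchless shape is what the existing `unsigned_integer_sat_add` matcher already recognizes. That is why it can then become a single `.SAT_ADD` call.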

Re: [PATCH] driver: Use -as/ld as final fallback instead of as/ld for cross

2024-05-21 Thread Andrew Pinski
On Tue, May 21, 2024 at 5:12 AM YunQiang Su  wrote:
>
> If `find_a_program` cannot find `as/ld` and we are a cross toolchain,
> the final fallback is the system's `as/ld`.  In fact, we can try
> -as/ld before falling back to the native as/ld.
>
> This patch is derived from Debian's patch:
>   gcc-search-prefixed-as-ld.diff
>
> gcc
> * gcc.cc (execute): Look for -as/ld before falling back
> to native as/ld.
> ---
>  gcc/gcc.cc | 21 +
>  1 file changed, 21 insertions(+)
>
> diff --git a/gcc/gcc.cc b/gcc/gcc.cc
> index 830a4700a87..8a1bdb5e3e2 100644
> --- a/gcc/gcc.cc
> +++ b/gcc/gcc.cc
> @@ -3293,6 +3293,27 @@ execute (void)
>string = find_a_program(commands[0].prog);
>if (string)
> commands[0].argv[0] = string;
> +  else if (*cross_compile != '0'
> +   && (!strcmp (commands[0].argv[0], "as")
> +   || !strcmp (commands[0].argv[0], "ld")))
> +   {
> + string = XNEWVEC (char, strlen (commands[0].argv[0]) + 2
> + + strlen (DEFAULT_REAL_TARGET_MACHINE));
> + strcpy (string, DEFAULT_REAL_TARGET_MACHINE);
> + strcat (string, "-");
> + strcat (string, commands[0].argv[0]);
> + const char *string_args[] = {string, "--version", NULL};
> + int exit_status = 0;
> + int err = 0;
> + const char *errmsg = pex_one (PEX_SEARCH, string,
> + CONST_CAST (char **, string_args), string,
> + NULL, NULL, _status, );

I think this should be handled in find_a_program instead of
execute. That should simplify things slightly.
You should also most likely use concat here instead of
XNEWVEC/strcpy/strcat, which will also simplify the code, like:
string = concat (DEFAULT_REAL_TARGET_MACHINE, "-", commands[0].prog, NULL);

I think this should be done not only for as/ld but also for objcopy
(which is used for -gsplit-dwarf).
Is there a reason you need to try executing it with
"--version" as an argument here?

Thanks,
Andrew Pinski

> + if (errmsg == NULL && exit_status == 0 && err == 0)
> +   {
> + commands[0].argv[0] = string;
> + commands[0].prog = string;
> +   }
> +   }
>  }
>
>for (n_commands = 1, i = 0; argbuf.iterate (i, ); i++)
> --
> 2.39.2
>

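The review suggests building the prefixed name with libiberty's `concat`. That function takes a NULL-terminated list of strings and returns newly allocated memory. Below is a minimal standalone sketch of the same string construction using only the C standard library, plus a check for which tools would get the prefix (`as` and `ld` from the patch, `objcopy` from the review). Both helper names are hypothetical, and this is not GCC's driver code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Build "<target>-<prog>" in freshly malloc'd memory, the same string that
   concat (target, "-", prog, NULL) would return.  Returns NULL on
   allocation failure; the caller frees the result.  */
static char *
target_prefixed_name (const char *target, const char *prog)
{
  size_t tlen = strlen (target);
  size_t plen = strlen (prog);
  char *name = malloc (tlen + 1 + plen + 1);   /* room for "-" and NUL */
  if (name == NULL)
    return NULL;
  memcpy (name, target, tlen);
  name[tlen] = '-';
  memcpy (name + tlen + 1, prog, plen + 1);    /* also copies the NUL */
  return name;
}

/* Return 1 if PROG is one of the tools that would be looked up with a
   target prefix before falling back to the native tool.  */
static int
wants_target_prefix (const char *prog)
{
  static const char *const tools[] = { "as", "ld", "objcopy" };
  for (size_t i = 0; i < sizeof tools / sizeof tools[0]; i++)
    if (strcmp (prog, tools[i]) == 0)
      return 1;
  return 0;
}
```

The `+ 2` in the patch's `XNEWVEC` size accounts for the same `"-"` and terminating NUL that this sketch adds explicitly.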

Re: [PATCH v1 1/2] Match: Support __builtin_add_overflow branch form for unsigned SAT_ADD

2024-05-21 Thread Andrew Pinski
On Tue, May 21, 2024, 3:55 AM  wrote:

> From: Pan Li 
>
> This patch adds support for the __builtin_add_overflow branch form of
> unsigned SAT_ADD.  For example:
>
> uint64_t
> sat_add (uint64_t x, uint64_t y)
> {
>   uint64_t ret;
>   return __builtin_add_overflow (x, y, ) ? -1 : ret;
> }
>
> Unlike the branchless version, we use a simplify pattern to
> convert the branch version of SAT_ADD into the branchless form if and
> only if the backend supports IFN_SAT_ADD.  Thus, the backend can
> choose a branch or branchless implementation of .SAT_ADD.
> For example, some targets can handle branchy code more efficiently.
>
> When the target implements IFN_SAT_ADD for unsigned types, before this
> patch:
>
> uint64_t sat_add (uint64_t x, uint64_t y)
> {
>   long unsigned int _1;
>   long unsigned int _2;
>   uint64_t _3;
>   __complex__ long unsigned int _6;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _6 = .ADD_OVERFLOW (x_4(D), y_5(D));
>   _2 = IMAGPART_EXPR <_6>;
>   if (_2 != 0)
> goto ; [35.00%]
>   else
> goto ; [65.00%]
> ;;succ:   4
> ;;3
>
> ;;   basic block 3, loop depth 0
> ;;pred:   2
>   _1 = REALPART_EXPR <_6>;
> ;;succ:   4
>
> ;;   basic block 4, loop depth 0
> ;;pred:   3
> ;;2
>   # _3 = PHI <_1(3), 18446744073709551615(2)>
>   return _3;
> ;;succ:   EXIT
> }
>
> After this patch:
> uint64_t sat_add (uint64_t x, uint64_t y)
> {
>   long unsigned int _12;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _12 = .SAT_ADD (x_4(D), y_5(D)); [tail call]
>   return _12;
> ;;succ:   EXIT
> }
>
> The following test suites pass with this patch:
> * The x86 bootstrap test.
> * The x86 full regression test.
> * The riscv full regression test.
>
> gcc/ChangeLog:
>
> * match.pd: Add new simplify to convert branch SAT_ADD into
> branchless, if and only if the backend implements the IFN.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd | 10 ++
>  1 file changed, 10 insertions(+)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 0f9c34fa897..8b9ded98323 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3094,6 +3094,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (match (unsigned_integer_sat_add @0 @1)
>   (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1)))
>
> +#if GIMPLE
> +
> +(simplify
> + (cond (ne (imagpart (IFN_ADD_OVERFLOW:c@2 @0 @1)) integer_zerop)
> +  integer_minus_onep (realpart @2))
> + (if (direct_internal_fn_supported_p (IFN_SAT_ADD, type, OPTIMIZE_FOR_BOTH))
> +  (bit_ior (plus@3 @0 @1) (negate (convert (lt @3 @0))
>

I think you need to make sure type and @0's type match.

Also I don't think you need :c here since you don't match @0 nor @1 more than once.

Thanks,
Andrew


> +
> +#endif
> +
>  /* x >  y  &&  x != XXX_MIN  -->  x > y
> x >  y  &&  x == XXX_MIN  -->  false . */
>  (for eqne (eq ne)
> --
> 2.34.1
>
>


[PATCH] match: Disable `(type)zero_one_valuep*CST` for 1bit signed types [PR115154]

2024-05-20 Thread Andrew Pinski
The problem here is that the pattern added in r13-1162-g9991d84d2a8435
assumes that it is well defined to multiply zero_one_valuep by the truncated
converted integer constant. It is well defined for all types except signed
1-bit types, where `a * -1` is produced, which is undefined.
So disable this pattern for 1-bit signed types.

Note the pattern added in r14-3432-gddd64a6ec3b38e is able to work around
the undefinedness except when `-fsanitize=undefined` is turned on; this is
why I added a testcase for that.

OK for trunk and gcc-14 and gcc-13 branches? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/115154

gcc/ChangeLog:

* match.pd (convert (mult zero_one_valued_p@1 INTEGER_CST@2)): Disable
for 1bit signed types.

gcc/testsuite/ChangeLog:

* c-c++-common/ubsan/signed1bitfield-1.c: New test.
* gcc.c-torture/execute/signed1bitfield-1.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/match.pd  |  6 +++--
 .../c-c++-common/ubsan/signed1bitfield-1.c| 25 +++
 .../gcc.c-torture/execute/signed1bitfield-1.c | 23 +
 3 files changed, 52 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/ubsan/signed1bitfield-1.c
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/signed1bitfield-1.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 0f9c34fa897..35e3d82b131 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2395,12 +2395,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (mult (convert @0) @1)))
 
 /* Narrow integer multiplication by a zero_one_valued_p operand.
-   Multiplication by [0,1] is guaranteed not to overflow.  */
+   Multiplication by [0,1] is guaranteed not to overflow except for
+   1bit signed types.  */
 (simplify
  (convert (mult@0 zero_one_valued_p@1 INTEGER_CST@2))
  (if (INTEGRAL_TYPE_P (type)
   && INTEGRAL_TYPE_P (TREE_TYPE (@0))
-  && TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (@0)))
+  && TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (@0))
+  && (TYPE_UNSIGNED (type) || TYPE_PRECISION (type) > 1))
   (mult (convert @1) (convert @2
 
 /* (X << C) != 0 can be simplified to X, when C is zero_one_valued_p.
diff --git a/gcc/testsuite/c-c++-common/ubsan/signed1bitfield-1.c b/gcc/testsuite/c-c++-common/ubsan/signed1bitfield-1.c
new file mode 100644
index 000..2ba8cf4dab0
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/ubsan/signed1bitfield-1.c
@@ -0,0 +1,25 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fsanitize=undefined" } */
+
+/* PR tree-optimization/115154 */
+/* This was being miscompiled with -fsanitize=undefined due to
+   `(signed:1)(t*5)` being transformed into `-((signed:1)t)` which
+   is undefined. */
+
+struct s {
+  signed b : 1;
+} f;
+int i = 55;
+__attribute__((noinline))
+void check(int a)
+{
+if (!a)
+__builtin_abort();
+}
+int main() {
+int t = i != 5;
+t = t*5;
+f.b = t;
+int tt = f.b;
+check(f.b);
+}
diff --git a/gcc/testsuite/gcc.c-torture/execute/signed1bitfield-1.c b/gcc/testsuite/gcc.c-torture/execute/signed1bitfield-1.c
new file mode 100644
index 000..ab888ca3a04
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/signed1bitfield-1.c
@@ -0,0 +1,23 @@
+/* PR tree-optimization/115154 */
+/* This was being miscompiled to `(signed:1)(t*5)`
+   being transformed into `-((signed:1)t)` which is undefined.
+   Note there is a pattern which removes the negative in some cases
+   which works around the issue.  */
+
+struct {
+  signed b : 1;
+} f;
+int i = 55;
+__attribute__((noinline))
+void check(int a)
+{
+if (!a)
+__builtin_abort();
+}
+int main() {
+int t = i != 5;
+t = t*5;
+f.b = t;
+int tt = f.b;
+check(f.b);
+}
-- 
2.43.0

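For contrast, the pattern is sound when the narrow type has more than one bit. With `t` in {0, 1}, `(T)t * (T)c` is either 0 or `(T)c`, so it always fits in `T` and equals `(T)(t * c)`. Below is a minimal sketch with `signed char` as the narrow type, assuming GCC's modulo-2^N conversion of out-of-range values. The names and the tested range are illustrative.

```c
#include <assert.h>
#include <limits.h>

/* Wide form matched by the pattern: (signed char) (t * c).  */
static signed char
mult_then_narrow (int t, int c)
{
  return (signed char) (t * c);
}

/* Narrowed product (signed char) t * (signed char) c, kept in int so we
   can check whether it fits in signed char.  */
static int
narrowed_product_schar (int t, int c)
{
  return (signed char) t * (signed char) c;
}

/* Return 1 if, for t in {0, 1} and every c in [LO, HI], the narrowed
   product fits in signed char and equals the wide form.  */
static int
narrowing_is_safe (int lo, int hi)
{
  for (int t = 0; t <= 1; t++)
    for (int c = lo; c <= hi; c++)
      {
        int p = narrowed_product_schar (t, c);
        if (p < SCHAR_MIN || p > SCHAR_MAX)
          return 0;
        if ((signed char) p != mult_then_narrow (t, c))
          return 0;
      }
  return 1;
}
```

The difference from the 1-bit case is that here `(signed char) 1` is still 1. In `signed:1`, the zero_one value 1 itself becomes -1.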


[gcc r14-10222] PHIOPT: Don't transform minmax if middle bb contains a phi [PR115143]

2024-05-20 Thread Andrew Pinski via Gcc-cvs
https://gcc.gnu.org/g:89ab128656b9da1359705bd770ae7d2367b33ec2

commit r14-10222-g89ab128656b9da1359705bd770ae7d2367b33ec2
Author: Andrew Pinski 
Date:   Sat May 18 11:55:58 2024 -0700

PHIOPT: Don't transform minmax if middle bb contains a phi [PR115143]

The problem here is that even if last_and_only_stmt returns a statement,
the bb might still contain a phi node defining an SSA name that is used
in that statement, so we need to add a check to make sure that there are
no phi nodes in the middle bbs in both the
`CMP?MINMAX:MINMAX` case and the `CMP?MINMAX:B` case.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/115143

gcc/ChangeLog:

* tree-ssa-phiopt.cc (minmax_replacement): Check for empty
phi nodes for middle bbs for the case where middle bb is not empty.

gcc/testsuite/ChangeLog:

* gcc.c-torture/compile/pr115143-1.c: New test.
* gcc.c-torture/compile/pr115143-2.c: New test.
* gcc.c-torture/compile/pr115143-3.c: New test.

Signed-off-by: Andrew Pinski 
(cherry picked from commit 9ff8f041331ef8b56007fb3c4d41d76f9850010d)

Diff:
---
 gcc/testsuite/gcc.c-torture/compile/pr115143-1.c | 21 +
 gcc/testsuite/gcc.c-torture/compile/pr115143-2.c | 30 
 gcc/testsuite/gcc.c-torture/compile/pr115143-3.c | 29 +++
 gcc/tree-ssa-phiopt.cc   | 12 ++
 4 files changed, 92 insertions(+)

diff --git a/gcc/testsuite/gcc.c-torture/compile/pr115143-1.c b/gcc/testsuite/gcc.c-torture/compile/pr115143-1.c
new file mode 100644
index ..5cb119ea4325
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr115143-1.c
@@ -0,0 +1,21 @@
+/* PR tree-optimization/115143 */
+/* This used to ICE.
+   minmax part of phiopt would transform,
+   would transform `a!=0?min(a, b) : 0` into `min(a,b)`
+   which was correct except b was defined by a phi in the inner
+   bb which was not handled. */
+short a, d;
+char b;
+long c;
+unsigned long e, f;
+void g(unsigned long h) {
+  if (c ? e : b)
+if (e)
+  if (d) {
+a = f ? ({
+  unsigned long i = d ? f : 0, j = e ? h : 0;
+  i < j ? i : j;
+}) : 0;
+  }
+}
+
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr115143-2.c b/gcc/testsuite/gcc.c-torture/compile/pr115143-2.c
new file mode 100644
index ..05c3bbe9738e
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr115143-2.c
@@ -0,0 +1,30 @@
+/* { dg-options "-fgimple" } */
+/* PR tree-optimization/115143 */
+/* This used to ICE.
+   minmax part of phiopt would transform,
+   would transform `a!=0?min(a, b) : 0` into `min(a,b)`
+   which was correct except b was defined by a phi in the inner
+   bb which was not handled. */
+unsigned __GIMPLE (ssa,startwith("phiopt"))
+foo (unsigned a, unsigned b)
+{
+  unsigned j;
+  unsigned _23;
+  unsigned _12;
+
+  __BB(2):
+  if (a_6(D) != 0u)
+goto __BB3;
+  else
+goto __BB4;
+
+  __BB(3):
+  j_10 = __PHI (__BB2: b_11(D));
+  _23 = __MIN (a_6(D), j_10);
+  goto __BB4;
+
+  __BB(4):
+  _12 = __PHI (__BB3: _23, __BB2: 0u);
+  return _12;
+
+}
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr115143-3.c 
b/gcc/testsuite/gcc.c-torture/compile/pr115143-3.c
new file mode 100644
index ..53c5fb5588e9
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr115143-3.c
@@ -0,0 +1,29 @@
+/* { dg-options "-fgimple" } */
+/* PR tree-optimization/115143 */
+/* This used to ICE.
+   minmax part of phiopt would transform,
+   would transform `a!=0?min(a, b) : 0` into `min(a,b)`
+   which was correct except b was defined by a phi in the inner
+   bb which was not handled. */
+unsigned __GIMPLE (ssa,startwith("phiopt"))
+foo (unsigned a, unsigned b)
+{
+  unsigned j;
+  unsigned _23;
+  unsigned _12;
+
+  __BB(2):
+  if (a_6(D) > 0u)
+goto __BB3;
+  else
+goto __BB4;
+
+  __BB(3):
+  j_10 = __PHI (__BB2: b_7(D));
+  _23 = __MIN (a_6(D), j_10);
+  goto __BB4;
+
+  __BB(4):
+  _12 = __PHI (__BB3: _23, __BB2: 0u);
+  return _12;
+}
diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index d1746c4b468a..150e58e39e3f 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -1918,6 +1918,10 @@ minmax_replacement (basic_block cond_bb, basic_block 
middle_bb, basic_block alt_
  || gimple_code (assign) != GIMPLE_ASSIGN)
return false;
 
+  /* There cannot be any phi nodes in the middle bb. */
+  if (!gimple_seq_empty_p (phi_nodes (middle_bb)))
+   return false;
+
   lhs = gimple_assign_lhs (assign);
   ass_code = gimple_assign_rhs_code (assign);
   if (ass_code != MAX_EXPR && ass_code != MIN_EXPR)
@@ -1931,6 +1935,10 @@ minmax_replacement (basic_block cond_bb, basic_block 
middle_bb, basic_block alt_
  || gimpl

[gcc r15-699] PHIOPT: Don't transform minmax if middle bb contains a phi [PR115143]

2024-05-20 Thread Andrew Pinski via Gcc-cvs
https://gcc.gnu.org/g:9ff8f041331ef8b56007fb3c4d41d76f9850010d

commit r15-699-g9ff8f041331ef8b56007fb3c4d41d76f9850010d
Author: Andrew Pinski 
Date:   Sat May 18 11:55:58 2024 -0700

PHIOPT: Don't transform minmax if middle bb contains a phi [PR115143]

The problem here is even if last_and_only_stmt returns a statement,
the bb might still contain a phi node which defines a ssa name
which is used in that statement so we need to add a check to make sure
that the phi nodes are empty for the middle bbs in both the
`CMP?MINMAX:MINMAX` case and the `CMP?MINMAX:B` cases.

Bootstrapped and tested on x86_64_linux-gnu with no regressions.

PR tree-optimization/115143

gcc/ChangeLog:

* tree-ssa-phiopt.cc (minmax_replacement): Check for empty
phi nodes for middle bbs for the case where middle bb is not empty.

gcc/testsuite/ChangeLog:

* gcc.c-torture/compile/pr115143-1.c: New test.
* gcc.c-torture/compile/pr115143-2.c: New test.
* gcc.c-torture/compile/pr115143-3.c: New test.

Signed-off-by: Andrew Pinski 

Diff:
---
 gcc/testsuite/gcc.c-torture/compile/pr115143-1.c | 21 +
 gcc/testsuite/gcc.c-torture/compile/pr115143-2.c | 30 
 gcc/testsuite/gcc.c-torture/compile/pr115143-3.c | 29 +++
 gcc/tree-ssa-phiopt.cc   | 12 ++
 4 files changed, 92 insertions(+)

diff --git a/gcc/testsuite/gcc.c-torture/compile/pr115143-1.c 
b/gcc/testsuite/gcc.c-torture/compile/pr115143-1.c
new file mode 100644
index ..5cb119ea4325
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr115143-1.c
@@ -0,0 +1,21 @@
+/* PR tree-optimization/115143 */
+/* This used to ICE.
+   minmax part of phiopt would transform,
+   would transform `a!=0?min(a, b) : 0` into `min(a,b)`
+   which was correct except b was defined by a phi in the inner
+   bb which was not handled. */
+short a, d;
+char b;
+long c;
+unsigned long e, f;
+void g(unsigned long h) {
+  if (c ? e : b)
+if (e)
+  if (d) {
+a = f ? ({
+  unsigned long i = d ? f : 0, j = e ? h : 0;
+  i < j ? i : j;
+}) : 0;
+  }
+}
+
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr115143-2.c 
b/gcc/testsuite/gcc.c-torture/compile/pr115143-2.c
new file mode 100644
index ..05c3bbe9738e
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr115143-2.c
@@ -0,0 +1,30 @@
+/* { dg-options "-fgimple" } */
+/* PR tree-optimization/115143 */
+/* This used to ICE.
+   minmax part of phiopt would transform,
+   would transform `a!=0?min(a, b) : 0` into `min(a,b)`
+   which was correct except b was defined by a phi in the inner
+   bb which was not handled. */
+unsigned __GIMPLE (ssa,startwith("phiopt"))
+foo (unsigned a, unsigned b)
+{
+  unsigned j;
+  unsigned _23;
+  unsigned _12;
+
+  __BB(2):
+  if (a_6(D) != 0u)
+goto __BB3;
+  else
+goto __BB4;
+
+  __BB(3):
+  j_10 = __PHI (__BB2: b_11(D));
+  _23 = __MIN (a_6(D), j_10);
+  goto __BB4;
+
+  __BB(4):
+  _12 = __PHI (__BB3: _23, __BB2: 0u);
+  return _12;
+
+}
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr115143-3.c 
b/gcc/testsuite/gcc.c-torture/compile/pr115143-3.c
new file mode 100644
index ..53c5fb5588e9
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr115143-3.c
@@ -0,0 +1,29 @@
+/* { dg-options "-fgimple" } */
+/* PR tree-optimization/115143 */
+/* This used to ICE.
+   minmax part of phiopt would transform,
+   would transform `a!=0?min(a, b) : 0` into `min(a,b)`
+   which was correct except b was defined by a phi in the inner
+   bb which was not handled. */
+unsigned __GIMPLE (ssa,startwith("phiopt"))
+foo (unsigned a, unsigned b)
+{
+  unsigned j;
+  unsigned _23;
+  unsigned _12;
+
+  __BB(2):
+  if (a_6(D) > 0u)
+goto __BB3;
+  else
+goto __BB4;
+
+  __BB(3):
+  j_10 = __PHI (__BB2: b_7(D));
+  _23 = __MIN (a_6(D), j_10);
+  goto __BB4;
+
+  __BB(4):
+  _12 = __PHI (__BB3: _23, __BB2: 0u);
+  return _12;
+}
diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index f166c3132cb7..918cf50b5898 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -1925,6 +1925,10 @@ minmax_replacement (basic_block cond_bb, basic_block 
middle_bb, basic_block alt_
  || gimple_code (assign) != GIMPLE_ASSIGN)
return false;
 
+  /* There cannot be any phi nodes in the middle bb. */
+  if (!gimple_seq_empty_p (phi_nodes (middle_bb)))
+   return false;
+
   lhs = gimple_assign_lhs (assign);
   ass_code = gimple_assign_rhs_code (assign);
   if (ass_code != MAX_EXPR && ass_code != MIN_EXPR)
@@ -1938,6 +1942,10 @@ minmax_replacement (basic_block cond_bb, basic_block 
middle_bb, basic_block alt_
  || gimple_code (assign) != GIMPLE_ASSIGN)
return false;
 
+  /* There cannot be any phi nodes i

RE: [PATCH] PHIOPT: Don't transform minmax if middle bb contains a phi [PR115143]

2024-05-20 Thread Andrew Pinski (QUIC)
> -Original Message-
> From: Richard Biener 
> Sent: Sunday, May 19, 2024 11:55 AM
> To: Andrew Pinski (QUIC) 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] PHIOPT: Don't transform minmax if
> middle bb contains a phi [PR115143]
> 
> 
> 
> > Am 19.05.2024 um 01:12 schrieb Andrew Pinski
> :
> >
> > The problem here is even if last_and_only_stmt returns a
> statement,
> > the bb might still contain a phi node which defines a ssa
> name which
> > is used in that statement so we need to add a check to make
> sure that
> > the phi nodes are empty for the middle bbs in both the
> > `CMP?MINMAX:MINMAX` case and the `CMP?MINMAX:B`
> cases.
> 
> Is that single arg PHIs or do we have an extra edge into the
> middle BB?  I think that might be unexpected, at least costing
> wise.  Maybe Also to some of the replacement code we have ?

It is only a single-argument PHI, since we already reject multiple edges into the
middle BBs for these cases.
It was EVRP that produced the single-argument PHI in the original testcase, from
folding a conditional to false; EVRP does not do simple name propagation in this
case, and there is no pass between EVRP and phiopt that would clean up a
single-argument PHI.
I added the GIMPLE-based testcases mainly to avoid depending on what earlier
passes happen to produce.

> 
> > OK for trunk and backport to all open branches since r14-
> 3827-g30e6ee074588ba was backported?
> > Bootstrapped and tested on x86_64_linux-gnu with no
> regressions.
> >
> 
> Ok

Does this include the GCC 13 branch or should I wait until after the GCC 13.3.0 
release?

Thanks,
Andrew Pinski

> 
> Richard
> 
> >PR tree-optimization/115143
> >
> > gcc/ChangeLog:
> >
> >* tree-ssa-phiopt.cc (minmax_replacement): Check for
> empty
> >phi nodes for middle bbs for the case where middle bb is
> not empty.
> >
> > gcc/testsuite/ChangeLog:
> >
> >* gcc.c-torture/compile/pr115143-1.c: New test.
> >* gcc.c-torture/compile/pr115143-2.c: New test.
> >* gcc.c-torture/compile/pr115143-3.c: New test.
> >
> > Signed-off-by: Andrew Pinski 
> > ---
> > .../gcc.c-torture/compile/pr115143-1.c| 21
> +
> > .../gcc.c-torture/compile/pr115143-2.c| 30
> +++
> > .../gcc.c-torture/compile/pr115143-3.c| 29
> ++
> > gcc/tree-ssa-phiopt.cc| 12 
> > 4 files changed, 92 insertions(+)
> > create mode 100644 gcc/testsuite/gcc.c-
> torture/compile/pr115143-1.c
> > create mode 100644 gcc/testsuite/gcc.c-
> torture/compile/pr115143-2.c
> > create mode 100644 gcc/testsuite/gcc.c-
> torture/compile/pr115143-3.c
> >
> > diff --git a/gcc/testsuite/gcc.c-torture/compile/pr115143-1.c
> > b/gcc/testsuite/gcc.c-torture/compile/pr115143-1.c
> > new file mode 100644
> > index 000..5cb119ea432
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.c-torture/compile/pr115143-1.c
> > @@ -0,0 +1,21 @@
> > +/* PR tree-optimization/115143 */
> > +/* This used to ICE.
> > +   minmax part of phiopt would transform,
> > +   would transform `a!=0?min(a, b) : 0` into `min(a,b)`
> > +   which was correct except b was defined by a phi in the
> inner
> > +   bb which was not handled. */
> > +short a, d;
> > +char b;
> > +long c;
> > +unsigned long e, f;
> > +void g(unsigned long h) {
> > +  if (c ? e : b)
> > +if (e)
> > +  if (d) {
> > +a = f ? ({
> > +  unsigned long i = d ? f : 0, j = e ? h : 0;
> > +  i < j ? i : j;
> > +}) : 0;
> > +  }
> > +}
> > +
> > diff --git a/gcc/testsuite/gcc.c-torture/compile/pr115143-2.c
> > b/gcc/testsuite/gcc.c-torture/compile/pr115143-2.c
> > new file mode 100644
> > index 000..05c3bbe9738
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.c-torture/compile/pr115143-2.c
> > @@ -0,0 +1,30 @@
> > +/* { dg-options "-fgimple" } */
> > +/* PR tree-optimization/115143 */
> > +/* This used to ICE.
> > +   minmax part of phiopt would transform,
> > +   would transform `a!=0?min(a, b) : 0` into `min(a,b)`
> > +   which was correct except b was defined by a phi in the
> inner
> > +   bb which was not handled. */
> > +unsigned __GIMPLE (ssa,startwith("phiopt")) foo (unsigned
> a, unsigned
> > +b) {
> > +  unsigned j;
> > +  unsigned _23;
> > +  unsigned _12;
> > +
> > +  __BB(2):
> > +  if (a_6(D) != 0u)
> > +goto __BB3;
> >

Re: [PATCH] aarch64: Fold vget_low_* intrinsics to BIT_FIELD_REF [PR102171]

2024-05-20 Thread Andrew Pinski
On Mon, May 20, 2024 at 2:57 AM Richard Sandiford
 wrote:
>
> Pengxuan Zheng  writes:
> > This patch folds vget_low_* intrinsics to BIT_FIELD_REF to open up more
> > optimization opportunities for gimple optimizers.
> >
> > While we are here, we also remove the vget_low_* definitions from 
> > arm_neon.h and
> > use the new intrinsics framework.
> >
> > PR target/102171
> >
> > gcc/ChangeLog:
> >
> >   * config/aarch64/aarch64-builtins.cc (AARCH64_SIMD_VGET_LOW_BUILTINS):
> >   New macro to create definitions for all vget_low intrinsics.
> >   (VGET_LOW_BUILTIN): Likewise.
> >   (enum aarch64_builtins): Add vget_low function codes.
> >   (aarch64_general_fold_builtin): Fold vget_low calls.
> >   * config/aarch64/aarch64-simd-builtins.def: Delete vget_low builtins.
> >   * config/aarch64/aarch64-simd.md (aarch64_get_low): Delete.
> >   (aarch64_vget_lo_halfv8bf): Likewise.
> >   * config/aarch64/arm_neon.h (__attribute__): Delete.
> >   (vget_low_f16): Likewise.
> >   (vget_low_f32): Likewise.
> >   (vget_low_f64): Likewise.
> >   (vget_low_p8): Likewise.
> >   (vget_low_p16): Likewise.
> >   (vget_low_p64): Likewise.
> >   (vget_low_s8): Likewise.
> >   (vget_low_s16): Likewise.
> >   (vget_low_s32): Likewise.
> >   (vget_low_s64): Likewise.
> >   (vget_low_u8): Likewise.
> >   (vget_low_u16): Likewise.
> >   (vget_low_u32): Likewise.
> >   (vget_low_u64): Likewise.
> >   (vget_low_bf16): Likewise.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/aarch64/pr113573.c: Replace __builtin_aarch64_get_lowv8hi
> >   with vget_low_s16.
> >   * gcc.target/aarch64/vget_low_2.c: New test.
> >   * gcc.target/aarch64/vget_low_2_be.c: New test.
>
> Ok, thanks.  I suppose the patch has the side effect of allowing
> vget_low_bf16 to be called without +bf16.  IMO that's the correct
> behaviour though, and is consistent with how we handle reinterprets.

Pushed as r15-697-ga2e4fe5a53cf75cd055f64e745ebd51253e42254 .

Thanks,
Andrew

>
> Richard
>
> > Signed-off-by: Pengxuan Zheng 
> > ---
> >  gcc/config/aarch64/aarch64-builtins.cc|  60 ++
> >  gcc/config/aarch64/aarch64-simd-builtins.def  |   5 +-
> >  gcc/config/aarch64/aarch64-simd.md|  23 +---
> >  gcc/config/aarch64/arm_neon.h | 105 --
> >  gcc/testsuite/gcc.target/aarch64/pr113573.c   |   2 +-
> >  gcc/testsuite/gcc.target/aarch64/vget_low_2.c |  30 +
> >  .../gcc.target/aarch64/vget_low_2_be.c|  31 ++
> >  7 files changed, 124 insertions(+), 132 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/vget_low_2.c
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/vget_low_2_be.c
> >
> > diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
> > b/gcc/config/aarch64/aarch64-builtins.cc
> > index 75d21de1401..4afe7c86ae3 100644
> > --- a/gcc/config/aarch64/aarch64-builtins.cc
> > +++ b/gcc/config/aarch64/aarch64-builtins.cc
> > @@ -658,6 +658,23 @@ static aarch64_simd_builtin_datum 
> > aarch64_simd_builtin_data[] = {
> >VREINTERPRET_BUILTINS \
> >VREINTERPRETQ_BUILTINS
> >
> > +#define AARCH64_SIMD_VGET_LOW_BUILTINS \
> > +  VGET_LOW_BUILTIN(f16) \
> > +  VGET_LOW_BUILTIN(f32) \
> > +  VGET_LOW_BUILTIN(f64) \
> > +  VGET_LOW_BUILTIN(p8) \
> > +  VGET_LOW_BUILTIN(p16) \
> > +  VGET_LOW_BUILTIN(p64) \
> > +  VGET_LOW_BUILTIN(s8) \
> > +  VGET_LOW_BUILTIN(s16) \
> > +  VGET_LOW_BUILTIN(s32) \
> > +  VGET_LOW_BUILTIN(s64) \
> > +  VGET_LOW_BUILTIN(u8) \
> > +  VGET_LOW_BUILTIN(u16) \
> > +  VGET_LOW_BUILTIN(u32) \
> > +  VGET_LOW_BUILTIN(u64) \
> > +  VGET_LOW_BUILTIN(bf16)
> > +
> >  typedef struct
> >  {
> >const char *name;
> > @@ -697,6 +714,9 @@ typedef struct
> >  #define VREINTERPRET_BUILTIN(A, B, L) \
> >AARCH64_SIMD_BUILTIN_VREINTERPRET##L##_##A##_##B,
> >
> > +#define VGET_LOW_BUILTIN(A) \
> > +  AARCH64_SIMD_BUILTIN_VGET_LOW_##A,
> > +
> >  #undef VAR1
> >  #define VAR1(T, N, MAP, FLAG, A) \
> >AARCH64_SIMD_BUILTIN_##T##_##N##A,
> > @@ -732,6 +752,7 @@ enum aarch64_builtins
> >AARCH64_CRC32_BUILTIN_MAX,
> >/* SIMD intrinsic builtins.  */
> >AARCH64_SIMD_VREINTERPRET_BUILTINS
> > +  AARCH64_SIMD_VGET_LOW_BUILTINS
> >/* ARMv8.3-A Pointer Authentication Builtins.  */
> >AARCH64_PAUTH_BUILTIN_AUTIA1716,
> >AARCH64_PAUTH_BUILTIN_PACIA1716,
> > @@ -823,8 +844,37 @@ static aarch64_fcmla_laneq_builtin_datum 
> > aarch64_fcmla_lane_builtin_data[] = {
> >   && SIMD_INTR_QUAL(A) == SIMD_INTR_QUAL(B) \
> >},
> >
> > +#undef VGET_LOW_BUILTIN
> > +#define VGET_LOW_BUILTIN(A) \
> > +  {"vget_low_" #A, \
> > +   AARCH64_SIMD_BUILTIN_VGET_LOW_##A, \
> > +   2, \
> > +   { SIMD_INTR_MODE(A, d), SIMD_INTR_MODE(A, q) }, \
> > +   { SIMD_INTR_QUAL(A), SIMD_INTR_QUAL(A) }, \
> > +   FLAG_AUTO_FP, \
> > +   false \
> > +  },
> > +
> > +#define AARCH64_SIMD_VGET_LOW_BUILTINS \
> > +  VGET_LOW_BUILTIN(f16) \
> > +  

[gcc r15-697] aarch64: Fold vget_low_* intrinsics to BIT_FIELD_REF [PR102171]

2024-05-20 Thread Andrew Pinski via Gcc-cvs
https://gcc.gnu.org/g:a2e4fe5a53cf75cd055f64e745ebd51253e42254

commit r15-697-ga2e4fe5a53cf75cd055f64e745ebd51253e42254
Author: Pengxuan Zheng 
Date:   Mon May 13 10:47:10 2024 -0700

aarch64: Fold vget_low_* intrinsics to BIT_FIELD_REF [PR102171]

This patch folds vget_low_* intrinsics to BIT_FIELD_REF to open up more
optimization opportunities for gimple optimizers.

While we are here, we also remove the vget_low_* definitions from 
arm_neon.h and
use the new intrinsics framework.

PR target/102171

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc 
(AARCH64_SIMD_VGET_LOW_BUILTINS):
New macro to create definitions for all vget_low intrinsics.
(VGET_LOW_BUILTIN): Likewise.
(enum aarch64_builtins): Add vget_low function codes.
(aarch64_general_fold_builtin): Fold vget_low calls.
* config/aarch64/aarch64-simd-builtins.def: Delete vget_low 
builtins.
* config/aarch64/aarch64-simd.md (aarch64_get_low): Delete.
(aarch64_vget_lo_halfv8bf): Likewise.
* config/aarch64/arm_neon.h (__attribute__): Delete.
(vget_low_f16): Likewise.
(vget_low_f32): Likewise.
(vget_low_f64): Likewise.
(vget_low_p8): Likewise.
(vget_low_p16): Likewise.
(vget_low_p64): Likewise.
(vget_low_s8): Likewise.
(vget_low_s16): Likewise.
(vget_low_s32): Likewise.
(vget_low_s64): Likewise.
(vget_low_u8): Likewise.
(vget_low_u16): Likewise.
(vget_low_u32): Likewise.
(vget_low_u64): Likewise.
(vget_low_bf16): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/pr113573.c: Replace 
__builtin_aarch64_get_lowv8hi
with vget_low_s16.
* gcc.target/aarch64/vget_low_2.c: New test.
* gcc.target/aarch64/vget_low_2_be.c: New test.

Signed-off-by: Pengxuan Zheng 

Diff:
---
 gcc/config/aarch64/aarch64-builtins.cc   |  60 +
 gcc/config/aarch64/aarch64-simd-builtins.def |   5 +-
 gcc/config/aarch64/aarch64-simd.md   |  23 +
 gcc/config/aarch64/arm_neon.h| 105 ---
 gcc/testsuite/gcc.target/aarch64/pr113573.c  |   2 +-
 gcc/testsuite/gcc.target/aarch64/vget_low_2.c|  30 +++
 gcc/testsuite/gcc.target/aarch64/vget_low_2_be.c |  31 +++
 7 files changed, 124 insertions(+), 132 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
b/gcc/config/aarch64/aarch64-builtins.cc
index 75d21de14011..11b888016ed7 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -658,6 +658,23 @@ static aarch64_simd_builtin_datum 
aarch64_simd_builtin_data[] = {
   VREINTERPRET_BUILTINS \
   VREINTERPRETQ_BUILTINS
 
+#define AARCH64_SIMD_VGET_LOW_BUILTINS \
+  VGET_LOW_BUILTIN(f16) \
+  VGET_LOW_BUILTIN(f32) \
+  VGET_LOW_BUILTIN(f64) \
+  VGET_LOW_BUILTIN(p8) \
+  VGET_LOW_BUILTIN(p16) \
+  VGET_LOW_BUILTIN(p64) \
+  VGET_LOW_BUILTIN(s8) \
+  VGET_LOW_BUILTIN(s16) \
+  VGET_LOW_BUILTIN(s32) \
+  VGET_LOW_BUILTIN(s64) \
+  VGET_LOW_BUILTIN(u8) \
+  VGET_LOW_BUILTIN(u16) \
+  VGET_LOW_BUILTIN(u32) \
+  VGET_LOW_BUILTIN(u64) \
+  VGET_LOW_BUILTIN(bf16)
+
 typedef struct
 {
   const char *name;
@@ -697,6 +714,9 @@ typedef struct
 #define VREINTERPRET_BUILTIN(A, B, L) \
   AARCH64_SIMD_BUILTIN_VREINTERPRET##L##_##A##_##B,
 
+#define VGET_LOW_BUILTIN(A) \
+  AARCH64_SIMD_BUILTIN_VGET_LOW_##A,
+
 #undef VAR1
 #define VAR1(T, N, MAP, FLAG, A) \
   AARCH64_SIMD_BUILTIN_##T##_##N##A,
@@ -732,6 +752,7 @@ enum aarch64_builtins
   AARCH64_CRC32_BUILTIN_MAX,
   /* SIMD intrinsic builtins.  */
   AARCH64_SIMD_VREINTERPRET_BUILTINS
+  AARCH64_SIMD_VGET_LOW_BUILTINS
   /* ARMv8.3-A Pointer Authentication Builtins.  */
   AARCH64_PAUTH_BUILTIN_AUTIA1716,
   AARCH64_PAUTH_BUILTIN_PACIA1716,
@@ -823,8 +844,37 @@ static aarch64_fcmla_laneq_builtin_datum 
aarch64_fcmla_lane_builtin_data[] = {
  && SIMD_INTR_QUAL(A) == SIMD_INTR_QUAL(B) \
   },
 
+#undef VGET_LOW_BUILTIN
+#define VGET_LOW_BUILTIN(A) \
+  {"vget_low_" #A, \
+   AARCH64_SIMD_BUILTIN_VGET_LOW_##A, \
+   2, \
+   { SIMD_INTR_MODE(A, d), SIMD_INTR_MODE(A, q) }, \
+   { SIMD_INTR_QUAL(A), SIMD_INTR_QUAL(A) }, \
+   FLAG_AUTO_FP, \
+   false \
+  },
+
+#define AARCH64_SIMD_VGET_LOW_BUILTINS \
+  VGET_LOW_BUILTIN(f16) \
+  VGET_LOW_BUILTIN(f32) \
+  VGET_LOW_BUILTIN(f64) \
+  VGET_LOW_BUILTIN(p8) \
+  VGET_LOW_BUILTIN(p16) \
+  VGET_LOW_BUILTIN(p64) \
+  VGET_LOW_BUILTIN(s8) \
+  VGET_LOW_BUILTIN(s16) \
+  VGET_LOW_BUILTIN(s32) \
+  VGET_LOW_BUILTIN(s64) \
+  VGET_LOW_BUILTIN(u8) \
+  VGET_LOW_BUILTIN(u16) \
+  VGET_LOW_BUILTIN(u32) \
+  VGET_LOW_BUILTIN(u64) \
+  VGET_LOW_BUILTIN(bf16)
+
 static const aarch64_simd_intrinsic_datum aarch64_simd_intrinsic_data[] = {
   

Re: [pshed] testsuite, C++, Darwin: Skip cxa_atexit-6, which is not applicable.

2024-05-19 Thread Andrew Pinski
On Sun, May 19, 2024 at 6:38 AM Iain Sandoe  wrote:
>
> As per the analysis in the PR, tested on x86_64, i686 and aarch64 Darwin
> (and on x86_64 linux), pushed to trunk, thanks,
> Iain

Thanks for doing this.

Thanks,
Andrew

>
> --- 8< ---
>
> For Darwin, non-weak functions defined in a TU always bind locally
> and so cxa_atexit-6.C is not applicable here.
>
> PR testsuite/114982
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/tree-ssa/cxa_atexit-6.C: Skip for Darwin.
>
> Signed-off-by: Iain Sandoe 
> ---
>  gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-6.C | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-6.C 
> b/gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-6.C
> index f6599a3c9f4..e22036067dd 100644
> --- a/gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-6.C
> +++ b/gcc/testsuite/g++.dg/tree-ssa/cxa_atexit-6.C
> @@ -2,10 +2,14 @@
>  /* { dg-require-effective-target fpic } */
>  /* { dg-options "-O2 -fdump-tree-cddce1-details -fdump-tree-optimized -fPIC" 
> } */
>  // { dg-require-effective-target cxa_atexit }
> +/* This test is not appropriate for targets where non-weak functions defined
> +   in the TU always bind locally; see PR114982.  */
> +/* { dg-skip-if "PR114982" { *-*-darwin* } } */
>  /* PR tree-optimization/19661 */
>
>  /* The call to axexit should not be removed as A::~A() cannot be figured if 
> it
> -   is a pure/const function call as the function call g does not bind 
> locally. */
> +   is a pure/const function call for platforms where the function call g does
> +   not bind locally. */
>
>  __attribute__((noinline))
>  void g() {}
> --
> 2.39.2 (Apple Git-143)
>


Re: [to-be-committed][RISC-V] Eliminate redundant bitmanip operation

2024-05-19 Thread Andrew Pinski
On Sun, May 19, 2024 at 10:58 AM Jeff Law  wrote:
>
> perl has some internal bitmap code.  One of its implementation
> properties is that if you ask it to set a bit, the bit is first cleared.
>
>
> Unfortunately this is fairly hard to see in gimple/match due to type
> changes in the IL.  But it is easy to see in the code we get from
> combine.  So we just match the relevant cases.


So looking into this from a GIMPLE point of view, we can see the issue
on x86_64 if you explicitly use `unsigned char`.
We have:
```
  c_8 = (unsigned char) _1;
  _2 = *a_10(D);
  c.0_3 = (signed char) _1;
  _4 = ~c.0_3;
  _12 = (unsigned char) _4;
```
So for this, we could push the nop conversion from `signed char` to
`unsigned char` past the `bit_not`, and I think that would fix the issue
at the GIMPLE level.
So something like:
```
/* Push no_op conversion past the bit_not expression if it was single use. */
(simplify
 (convert (bit_not:s @0))
 (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
  (bit_not (convert @0))))
```

Thanks,
Andrew Pinski

>
>
>
> Regression tested in Ventana's CI system as well as my own.  Waiting on
> the Rivos CI system before moving forward.
>
>
>
> Jeff


Re: [PATCH v1] Match: Extract integer_types_ternary_match helper to avoid code dup [NFC]

2024-05-18 Thread Andrew Pinski
On Sat, May 18, 2024, 9:17 PM  wrote:

> From: Pan Li 
>
> There are several match patterns for the SAT-related cases, and they
> duplicate the code that checks whether dest, op_0 and op_1 have the same
> tree type (a ternary tree type match).  Thus, extract a helper function
> to do this and avoid the duplication in match.pd.
>
> The below test suites are passed for this patch:
> * The rv64gcv fully regression test.
> * The x86 bootstrap test.
> * The x86 regression test.
>
> gcc/ChangeLog:
>
> * generic-match-head.cc (integer_types_ternary_match): New helper
> function to check ternary tree type matches or not.
> * gimple-match-head.cc (integer_types_ternary_match): Ditto but
> for match.
> * match.pd: Leverage above helper function to avoid code dup.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/generic-match-head.cc | 17 +
>  gcc/gimple-match-head.cc  | 17 +
>  gcc/match.pd  | 25 +
>  3 files changed, 39 insertions(+), 20 deletions(-)
>
> diff --git a/gcc/generic-match-head.cc b/gcc/generic-match-head.cc
> index 0d3f648fe8d..cdd48c7a5cc 100644
> --- a/gcc/generic-match-head.cc
> +++ b/gcc/generic-match-head.cc
> @@ -59,6 +59,23 @@ types_match (tree t1, tree t2)
>return TYPE_MAIN_VARIANT (t1) == TYPE_MAIN_VARIANT (t2);
>  }
>
> +/* Routine to determine if the types T1,  T2 and T3 are effectively
> +   the same integer type for GENERIC.  If T1,  T2 or T3 is not a type,
> +   the test applies to their TREE_TYPE.  */
> +
> +static inline bool
> +integer_types_ternary_match (tree t1, tree t2, tree t3)
> +{
> +  t1 = TYPE_P (t1) ? t1 : TREE_TYPE (t1);
> +  t2 = TYPE_P (t2) ? t2 : TREE_TYPE (t2);
> +  t3 = TYPE_P (t3) ? t3 : TREE_TYPE (t3);
> +
> +  if (!INTEGRAL_TYPE_P (t1) || !INTEGRAL_TYPE_P (t2) || !INTEGRAL_TYPE_P
> (t3))
> +return false;
> +
> +  return types_match (t1, t2) && types_match (t1, t3);
> +}
> +
>  /* Return if T has a single use.  For GENERIC, we assume this is
> always true.  */
>
> diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
> index 5f8a1a1ad8e..91f2e56b8ef 100644
> --- a/gcc/gimple-match-head.cc
> +++ b/gcc/gimple-match-head.cc
> @@ -79,6 +79,23 @@ types_match (tree t1, tree t2)
>return types_compatible_p (t1, t2);
>  }
>
> +/* Routine to determine if the types T1,  T2 and T3 are effectively
> +   the same integer type for GIMPLE.  If T1,  T2 or T3 is not a type,
> +   the test applies to their TREE_TYPE.  */
> +
> +static inline bool
> +integer_types_ternary_match (tree t1, tree t2, tree t3)
> +{
> +  t1 = TYPE_P (t1) ? t1 : TREE_TYPE (t1);
> +  t2 = TYPE_P (t2) ? t2 : TREE_TYPE (t2);
> +  t3 = TYPE_P (t3) ? t3 : TREE_TYPE (t3);
> +
> +  if (!INTEGRAL_TYPE_P (t1) || !INTEGRAL_TYPE_P (t2) || !INTEGRAL_TYPE_P
> (t3))
> +return false;
> +
> +  return types_match (t1, t2) && types_match (t1, t3);
> +}
> +
>  /* Return if T has a single use.  For GIMPLE, we also allow any
> non-SSA_NAME (ie constants) and zero uses to cope with uses
> that aren't linked up yet.  */
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 0f9c34fa897..b291e34bbe4 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3046,38 +3046,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  /* Unsigned Saturation Add */
>  (match (usadd_left_part_1 @0 @1)
>   (plus:c @0 @1)
> - (if (INTEGRAL_TYPE_P (type)
> -  && TYPE_UNSIGNED (TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@1)
> + (if (TYPE_UNSIGNED (type) && integer_types_ternary_match (type, @0,
> @1
>


Even though the TYPE_UNSIGNED check might be cheaper, you may need to restore
the original order so that the integral check comes first.

Otherwise this is a nice cleanup (note that I can't approve it, though).

Thanks,
Andrew


>  (match (usadd_left_part_2 @0 @1)
>   (realpart (IFN_ADD_OVERFLOW:c @0 @1))
> - (if (INTEGRAL_TYPE_P (type)
> -  && TYPE_UNSIGNED (TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@1)
> + (if (TYPE_UNSIGNED (type) && integer_types_ternary_match (type, @0,
> @1
>
>  (match (usadd_right_part_1 @0 @1)
>   (negate (convert (lt (plus:c @0 @1) @0)))
> - (if (INTEGRAL_TYPE_P (type)
> -  && TYPE_UNSIGNED (TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@1)
> + (if (TYPE_UNSIGNED (type) && integer_types_ternary_match (type, @0,
> @1
>
>  (match (usadd_right_part_1 @0 @1)
>   (negate (convert (gt @0 (plus:c @0 @1
> - (if (INTEGRAL_TYPE_P (type)
> -  && TYPE_UNSIGNED (TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@0))
> -  && types_match (type, TREE_TYPE (@1)
> + (if (TYPE_UNSIGNED (type) && integer_types_ternary_match (type, @0,
> @1
>
>  (match (usadd_right_part_2 @0 @1)
>   (negate (convert (ne (imagpart (IFN_ADD_OVERFLOW:c @0 @1))
> integer_zerop)))
> - (if (INTEGRAL_TYPE_P (type)
> -  && 

[PATCH] PHIOPT: Don't transform minmax if middle bb contains a phi [PR115143]

2024-05-18 Thread Andrew Pinski
The problem here is even if last_and_only_stmt returns a statement,
the bb might still contain a phi node which defines a ssa name
which is used in that statement so we need to add a check to make sure
that the phi nodes are empty for the middle bbs in both the
`CMP?MINMAX:MINMAX` case and the `CMP?MINMAX:B` cases.

OK for trunk and backport to all open branches since r14-3827-g30e6ee074588ba 
was backported?
Bootstrapped and tested on x86_64_linux-gnu with no regressions.

PR tree-optimization/115143

gcc/ChangeLog:

* tree-ssa-phiopt.cc (minmax_replacement): Check for empty
phi nodes for middle bbs for the case where middle bb is not empty.

gcc/testsuite/ChangeLog:

* gcc.c-torture/compile/pr115143-1.c: New test.
* gcc.c-torture/compile/pr115143-2.c: New test.
* gcc.c-torture/compile/pr115143-3.c: New test.

Signed-off-by: Andrew Pinski 
---
 .../gcc.c-torture/compile/pr115143-1.c| 21 +
 .../gcc.c-torture/compile/pr115143-2.c| 30 +++
 .../gcc.c-torture/compile/pr115143-3.c| 29 ++
 gcc/tree-ssa-phiopt.cc| 12 
 4 files changed, 92 insertions(+)
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr115143-1.c
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr115143-2.c
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr115143-3.c

diff --git a/gcc/testsuite/gcc.c-torture/compile/pr115143-1.c 
b/gcc/testsuite/gcc.c-torture/compile/pr115143-1.c
new file mode 100644
index 000..5cb119ea432
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr115143-1.c
@@ -0,0 +1,21 @@
+/* PR tree-optimization/115143 */
+/* This used to ICE.
+   minmax part of phiopt would transform,
+   would transform `a!=0?min(a, b) : 0` into `min(a,b)`
+   which was correct except b was defined by a phi in the inner
+   bb which was not handled. */
+short a, d;
+char b;
+long c;
+unsigned long e, f;
+void g(unsigned long h) {
+  if (c ? e : b)
+if (e)
+  if (d) {
+a = f ? ({
+  unsigned long i = d ? f : 0, j = e ? h : 0;
+  i < j ? i : j;
+}) : 0;
+  }
+}
+
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr115143-2.c 
b/gcc/testsuite/gcc.c-torture/compile/pr115143-2.c
new file mode 100644
index 000..05c3bbe9738
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr115143-2.c
@@ -0,0 +1,30 @@
+/* { dg-options "-fgimple" } */
+/* PR tree-optimization/115143 */
+/* This used to ICE.
+   minmax part of phiopt would transform,
+   would transform `a!=0?min(a, b) : 0` into `min(a,b)`
+   which was correct except b was defined by a phi in the inner
+   bb which was not handled. */
+unsigned __GIMPLE (ssa,startwith("phiopt"))
+foo (unsigned a, unsigned b)
+{
+  unsigned j;
+  unsigned _23;
+  unsigned _12;
+
+  __BB(2):
+  if (a_6(D) != 0u)
+goto __BB3;
+  else
+goto __BB4;
+
+  __BB(3):
+  j_10 = __PHI (__BB2: b_11(D));
+  _23 = __MIN (a_6(D), j_10);
+  goto __BB4;
+
+  __BB(4):
+  _12 = __PHI (__BB3: _23, __BB2: 0u);
+  return _12;
+
+}
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr115143-3.c 
b/gcc/testsuite/gcc.c-torture/compile/pr115143-3.c
new file mode 100644
index 000..53c5fb5588e
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr115143-3.c
@@ -0,0 +1,29 @@
+/* { dg-options "-fgimple" } */
+/* PR tree-optimization/115143 */
+/* This used to ICE.
+   minmax part of phiopt would transform,
+   would transform `a!=0?min(a, b) : 0` into `min(a,b)`
+   which was correct except b was defined by a phi in the inner
+   bb which was not handled. */
+unsigned __GIMPLE (ssa,startwith("phiopt"))
+foo (unsigned a, unsigned b)
+{
+  unsigned j;
+  unsigned _23;
+  unsigned _12;
+
+  __BB(2):
+  if (a_6(D) > 0u)
+goto __BB3;
+  else
+goto __BB4;
+
+  __BB(3):
+  j_10 = __PHI (__BB2: b_7(D));
+  _23 = __MIN (a_6(D), j_10);
+  goto __BB4;
+
+  __BB(4):
+  _12 = __PHI (__BB3: _23, __BB2: 0u);
+  return _12;
+}
diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index f166c3132cb..918cf50b589 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -1925,6 +1925,10 @@ minmax_replacement (basic_block cond_bb, basic_block middle_bb, basic_block alt_
  || gimple_code (assign) != GIMPLE_ASSIGN)
return false;
 
+  /* There cannot be any phi nodes in the middle bb. */
+  if (!gimple_seq_empty_p (phi_nodes (middle_bb)))
+   return false;
+
   lhs = gimple_assign_lhs (assign);
   ass_code = gimple_assign_rhs_code (assign);
   if (ass_code != MAX_EXPR && ass_code != MIN_EXPR)
@@ -1938,6 +1942,10 @@ minmax_replacement (basic_block cond_bb, basic_block middle_bb, basic_block alt_
  || gimple_code (assign) != GIMPLE_ASSIGN)
return false;
 
+  /* There cannot be any phi nodes in the alt middle bb. */
+  if (!gimple_seq_empty_p (phi_nodes (alt_middle_bb)))
+   

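The identity behind the minmax transform is easy to check on its own: for unsigned operands min(0, b) is 0, so `a != 0 ? min(a, b) : 0` always equals `min(a, b)` (pr115143-3.c uses the equivalent `a > 0u` test). The ICE came from the CFG shape, a PHI defining `b` in the middle block, not from the arithmetic. A minimal C sketch, with hypothetical helper names standing in for the GIMPLE `__MIN`:

```c
#include <stdbool.h>

/* Hypothetical helper modeling GIMPLE's MIN_EXPR on unsigned values. */
static unsigned umin(unsigned a, unsigned b)
{
  return a < b ? a : b;
}

/* The shape matched by the minmax part of phiopt: a != 0 ? min(a, b) : 0. */
static unsigned before_phiopt(unsigned a, unsigned b)
{
  return a != 0 ? umin(a, b) : 0;
}

/* The form phiopt reduces it to.  When a == 0, umin(0, b) == 0 for every
   unsigned b, so dropping the condition loses nothing.  */
static unsigned after_phiopt(unsigned a, unsigned b)
{
  return umin(a, b);
}

/* Exhaustively compare both forms for all a, b in [0, limit]. */
static bool forms_agree(unsigned limit)
{
  for (unsigned a = 0; a <= limit; a++)
    for (unsigned b = 0; b <= limit; b++)
      if (before_phiopt(a, b) != after_phiopt(a, b))
        return false;
  return true;
}
```
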
Re: [PATCH] Optab: add isfinite_optab for __builtin_isfinite

2024-05-18 Thread Andrew Pinski
On Thu, Apr 11, 2024 at 8:07 PM HAO CHEN GUI  wrote:
>
> Hi,
>   This patch adds an optab for __builtin_isfinite. The finite check can be
> implemented on rs6000 by a single instruction. It needs an optab to be
> expanded to that specific sequence of instructions.
>
>   The subsequent patches will implement the expand on rs6000.
>
>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions. Is this OK for next stage-1?


This is missing documentation for the new optab.
It should be documented in md.texi, in the `Standard Pattern Names For Generation` section.

Thanks,
Andrew


>
> Thanks
> Gui Haochen
>
> ChangeLog
> optab: Add isfinite_optab for isfinite builtin
>
> gcc/
> * builtins.cc (interclass_mathfn_icode): Set optab to isfinite_optab
> for isfinite builtin.
> * optabs.def (isfinite_optab): New.
>
> patch.diff
> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> index d2786f207b8..5262aa01660 100644
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -2459,8 +2459,9 @@ interclass_mathfn_icode (tree arg, tree fndecl)
>errno_set = true; builtin_optab = ilogb_optab; break;
>  CASE_FLT_FN (BUILT_IN_ISINF):
>builtin_optab = isinf_optab; break;
> -case BUILT_IN_ISNORMAL:
>  case BUILT_IN_ISFINITE:
> +  builtin_optab = isfinite_optab; break;
> +case BUILT_IN_ISNORMAL:
>  CASE_FLT_FN (BUILT_IN_FINITE):
>  case BUILT_IN_FINITED32:
>  case BUILT_IN_FINITED64:
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index ad14f9328b9..dcd77315c2a 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -352,6 +352,7 @@ OPTAB_D (fmod_optab, "fmod$a3")
>  OPTAB_D (hypot_optab, "hypot$a3")
>  OPTAB_D (ilogb_optab, "ilogb$a2")
>  OPTAB_D (isinf_optab, "isinf$a2")
> +OPTAB_D (isfinite_optab, "isfinite$a2")
>  OPTAB_D (issignaling_optab, "issignaling$a2")
>  OPTAB_D (ldexp_optab, "ldexp$a3")
>  OPTAB_D (log10_optab, "log10$a2")

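Why a finite check can map to a single instruction: for IEEE binary64 it reduces to testing the exponent field. The sketch below is a software model of that predicate, not the rs6000 expansion; it assumes `double` is IEEE binary64, and the function names are hypothetical. The optab string `isfinite$a2` would correspond to per-mode patterns such as `isfinitedf2` for DFmode, which is what the requested md.texi entry would describe.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Bit-level model of isfinite for IEEE binary64: a value is finite unless
   its 11-bit exponent field is all ones (infinity or NaN).  */
static bool model_isfinite(double x)
{
  uint64_t bits;
  memcpy(&bits, &x, sizeof bits);            /* well-defined type punning */
  uint64_t exponent = (bits >> 52) & 0x7ffu; /* bits 62..52 */
  return exponent != 0x7ffu;
}

/* Compare the model against the C library's isfinite macro. */
static bool isfinite_model_matches_libm(double x)
{
  return model_isfinite(x) == (isfinite(x) != 0);
}
```
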

Re: [PATCH] Optab: add isnormal_optab for __builtin_isnormal

2024-05-18 Thread Andrew Pinski
On Fri, Apr 12, 2024 at 1:10 AM HAO CHEN GUI  wrote:
>
> Hi,
>   This patch adds an optab for __builtin_isnormal. The normal check can be
> implemented on rs6000 by a single instruction. It needs an optab to be
> expanded to that specific sequence of instructions.
>
>   The subsequent patches will implement the expand on rs6000.
>
>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions. Is this OK for next stage-1?

This is missing documentation for the new optab.
It should be documented in md.texi, in the `Standard Pattern Names For Generation` section.

Thanks,
Andrew

>
> Thanks
> Gui Haochen
> ChangeLog
> optab: Add isnormal_optab for isnormal builtin
>
> gcc/
> * builtins.cc (interclass_mathfn_icode): Set optab to isnormal_optab
> for isnormal builtin.
> * optabs.def (isnormal_optab): New.
>
> patch.diff
> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> index 3174f52ebe8..defb39de95f 100644
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -2462,6 +2462,7 @@ interclass_mathfn_icode (tree arg, tree fndecl)
>  case BUILT_IN_ISFINITE:
>builtin_optab = isfinite_optab; break;
>  case BUILT_IN_ISNORMAL:
> +  builtin_optab = isnormal_optab; break;
>  CASE_FLT_FN (BUILT_IN_FINITE):
>  case BUILT_IN_FINITED32:
>  case BUILT_IN_FINITED64:
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index dcd77315c2a..3c401fc0b4c 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -353,6 +353,7 @@ OPTAB_D (hypot_optab, "hypot$a3")
>  OPTAB_D (ilogb_optab, "ilogb$a2")
>  OPTAB_D (isinf_optab, "isinf$a2")
>  OPTAB_D (isfinite_optab, "isfinite$a2")
> +OPTAB_D (isnormal_optab, "isnormal$a2")
>  OPTAB_D (issignaling_optab, "issignaling$a2")
>  OPTAB_D (ldexp_optab, "ldexp$a3")
>  OPTAB_D (log10_optab, "log10$a2")

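The isnormal check is similarly a test on the exponent field: for IEEE binary64 a normal number has an exponent that is neither 0 (zero or subnormal) nor all ones (infinity or NaN). A software model of the predicate only, assuming IEEE binary64 `double`; names are hypothetical:

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Bit-level model of isnormal for IEEE binary64. */
static bool model_isnormal(double x)
{
  uint64_t bits;
  memcpy(&bits, &x, sizeof bits);            /* well-defined type punning */
  uint64_t exponent = (bits >> 52) & 0x7ffu; /* bits 62..52 */
  /* 0 means zero or subnormal; 0x7ff means infinity or NaN. */
  return exponent != 0 && exponent != 0x7ffu;
}

/* Compare the model against the C library's isnormal macro. */
static bool isnormal_model_matches_libm(double x)
{
  return model_isnormal(x) == (isnormal(x) != 0);
}
```
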

Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

2024-05-16 Thread Andrew Pinski
On Thu, May 16, 2024, 7:46 PM Tamar Christina wrote:

> Hi Victor,
>
> > -Original Message-
> > From: Victor Do Nascimento 
> > Sent: Thursday, May 16, 2024 3:39 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Richard Sandiford ; Richard Earnshaw
> > ; Victor Do Nascimento
> > 
> > Subject: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer
> >
> > From: Victor Do Nascimento 
> >
> > At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
> > optabs for dealing with vectorizable dot product code sequences.  The
> > consequence of using a direct optab for this is that backend-pattern
> > selection is only ever able to match against one datatype - Either
> > that of the operands or of the accumulated value, never both.
> >
> > With the introduction of the 2-way (un)signed dot-product insn [1][2]
> > in AArch64 SVE2, the existing direct opcode approach is no longer
> > sufficient for full specification of all the possible dot product
> > machine instructions to be matched to the code sequence; a dot product
> > resulting in VNx4SI may result from either dot products on VNx16QI or
> > VNx8HI values for the 4- and 2-way dot product operations, respectively.
> >
> > This means that the following example fails autovectorization:
> >
> > uint32_t foo(int n, uint16_t* data) {
> >   uint32_t sum = 0;
> >   for (int i=0; i<n; i++) {
> >     sum += data[i] * data[i];
> >   }
> >   return sum;
> > }
> >
> > To remedy the issue a new optab is added, tentatively named
> > `udot_prod_twoway_optab', whose selection is dependent upon checking
> > of both input and output types involved in the operation.
> >
> > In order to minimize changes to the existing codebase,
> > `optab_for_tree_code' is renamed `optab_for_tree_code_1' and a new
> > argument is added to its signature - `const_tree otype', allowing type
> > information to be specified for both input and output types.  The
> > existing interface is retained by defining a new `optab_for_tree_code',
> > which serves as a shim to `optab_for_tree_code_1', passing old
> > parameters as-is and setting the new `otype' argument to `NULL_TREE'.
> >
> > For DOT_PROD_EXPR tree codes, we can call `optab_for_tree_code_1'
> > directly, passing it both types, adding the internal logic to the
> > function to distinguish between competing optabs.
> >
> > Finally, necessary changes are made to `expand_widen_pattern_expr' to
> > ensure the new icode can be correctly selected, given the new optab.
> >
> > [1] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-
> > Instructions/UDOT--2-way--vectors---Unsigned-integer-dot-product-
> > [2] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-
> > Instructions/SDOT--2-way--vectors---Signed-integer-dot-product-
> >
> > gcc/ChangeLog:
> >
> >   * config/aarch64/aarch64-sve2.md
> > (@aarch64_sve_dotvnx4sivnx8hi):
> >   renamed to `dot_prod_twoway_vnx8hi'.
> >   * config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl.expand):
> >   update icodes used in line with above rename.
>
> Please split the target specific bits from the target agnostic parts.
> I.e. this patch series should be split in two.
>
> >   * optabs-tree.cc (optab_for_tree_code_1): Renamed
> >   `optab_for_tree_code' and added new argument.
> >   (optab_for_tree_code): Now a call to `optab_for_tree_code_1'.
> >   * optabs-tree.h (optab_for_tree_code_1): New.
> >   * optabs.cc (expand_widen_pattern_expr): Expand support for
> >   DOT_PROD_EXPR patterns.
> >   * optabs.def (udot_prod_twoway_optab): New.
> >   (sdot_prod_twoway_optab): Likewise.
> >   * tree-vect-patterns.cc (vect_supportable_direct_optab_p): Add
> >   support for misc optabs that use two modes.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.dg/vect/vect-dotprod-twoway.c: New.
> > ---
> >  .../aarch64/aarch64-sve-builtins-base.cc  |  4 ++--
> >  gcc/config/aarch64/aarch64-sve2.md|  2 +-
> >  gcc/optabs-tree.cc| 23 --
> >  gcc/optabs-tree.h |  2 ++
> >  gcc/optabs.cc |  2 +-
> >  gcc/optabs.def|  2 ++
> >  .../gcc.dg/vect/vect-dotprod-twoway.c | 24 +++
> >  gcc/tree-vect-patterns.cc |  2 +-
> >  8 files changed, 54 insertions(+), 7 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
> >
> > diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > index 0d2edf3f19e..e457db09f66 100644
> > --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > @@ -764,8 +764,8 @@ public:
> >icode = (e.type_suffix (0).float_p
> >  ? CODE_FOR_aarch64_sve_fdotvnx4sfvnx8hf
> >  : e.type_suffix (0).unsigned_p
> > -? 

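To make the 2-way case concrete, here is a scalar model of a 2-way unsigned dot product applied to the example loop. It assumes 4 accumulator lanes (SVE vector length is implementation-defined, so `LANES` is an arbitrary choice) and the lane layout described in the Arm documentation cited above; all names are hypothetical. Note the `(uint32_t)` casts: without them `data[i] * data[i]` is computed in `int` after promotion and can overflow for values near 65535.

```c
#include <stdint.h>

#define LANES 4 /* assumed number of 32-bit accumulator lanes */

/* Scalar reference for the loop in the patch description.  The cast avoids
   signed int overflow from integer promotion of uint16_t operands.  */
static uint32_t sum_squares_scalar(int n, const uint16_t *data)
{
  uint32_t sum = 0;
  for (int i = 0; i < n; i++)
    sum += (uint32_t) data[i] * data[i];
  return sum;
}

/* Model of a 2-way unsigned dot product: each 32-bit lane j accumulates
   the products of the two 16-bit elements at positions 2*j and 2*j+1.  */
static void udot_2way(uint32_t acc[LANES], const uint16_t *a,
                      const uint16_t *b)
{
  for (int j = 0; j < LANES; j++)
    acc[j] += (uint32_t) a[2 * j] * b[2 * j]
              + (uint32_t) a[2 * j + 1] * b[2 * j + 1];
}

/* Vector-style version: consume 2 * LANES elements per step, reduce the
   lanes at the end, then finish any tail elements in scalar code.  */
static uint32_t sum_squares_2way(int n, const uint16_t *data)
{
  uint32_t acc[LANES] = {0};
  int i = 0;
  for (; i + 2 * LANES <= n; i += 2 * LANES)
    udot_2way(acc, data + i, data + i);
  uint32_t sum = 0;
  for (int j = 0; j < LANES; j++)
    sum += acc[j];
  for (; i < n; i++)
    sum += (uint32_t) data[i] * data[i];
  return sum;
}
```

Because unsigned 32-bit addition wraps modulo 2^32, reordering the accumulation across lanes gives the same result as the scalar loop.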
Re: [PATCH] middle-end: Expand {u|s}dot product support in autovectorizer

2024-05-16 Thread Andrew Pinski
On Thu, May 16, 2024, 4:40 PM Victor Do Nascimento <victor.donascime...@arm.com> wrote:

> From: Victor Do Nascimento 
>
> At present, the compiler offers the `{u|s|us}dot_prod_optab' direct
> optabs for dealing with vectorizable dot product code sequences.  The
> consequence of using a direct optab for this is that backend-pattern
> selection is only ever able to match against one datatype - Either
> that of the operands or of the accumulated value, never both.
>
> With the introduction of the 2-way (un)signed dot-product insn [1][2]
> in AArch64 SVE2, the existing direct opcode approach is no longer
> sufficient for full specification of all the possible dot product
> machine instructions to be matched to the code sequence; a dot product
> resulting in VNx4SI may result from either dot products on VNx16QI or
> VNx8HI values for the 4- and 2-way dot product operations, respectively.
>
> This means that the following example fails autovectorization:
>
> uint32_t foo(int n, uint16_t* data) {
>   uint32_t sum = 0;
>   for (int i=0; i<n; i++) {
>     sum += data[i] * data[i];
>   }
>   return sum;
> }
>
> To remedy the issue a new optab is added, tentatively named
> `udot_prod_twoway_optab', whose selection is dependent upon checking
> of both input and output types involved in the operation.
>
> In order to minimize changes to the existing codebase,
> `optab_for_tree_code' is renamed `optab_for_tree_code_1' and a new
> argument is added to its signature - `const_tree otype', allowing type
> information to be specified for both input and output types.  The
> existing interface is retained by defining a new `optab_for_tree_code',
> which serves as a shim to `optab_for_tree_code_1', passing old
> parameters as-is and setting the new `otype' argument to `NULL_TREE'.
>
> For DOT_PROD_EXPR tree codes, we can call `optab_for_tree_code_1'
> directly, passing it both types, adding the internal logic to the
> function to distinguish between competing optabs.
>
> Finally, necessary changes are made to `expand_widen_pattern_expr' to
> ensure the new icode can be correctly selected, given the new optab.
>

Since you are adding an optab, please update the internals manual with the
documentation of the optab (the standard pattern names section).

Thanks,
Andrew


> [1] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/UDOT--2-way--vectors---Unsigned-integer-dot-product-
> [2] https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/SDOT--2-way--vectors---Signed-integer-dot-product-
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-sve2.md
> (@aarch64_sve_dotvnx4sivnx8hi):
> renamed to `dot_prod_twoway_vnx8hi'.
> * config/aarch64/aarch64-sve-builtins-base.cc (svdot_impl.expand):
> update icodes used in line with above rename.
> * optabs-tree.cc (optab_for_tree_code_1): Renamed
> `optab_for_tree_code' and added new argument.
> (optab_for_tree_code): Now a call to `optab_for_tree_code_1'.
> * optabs-tree.h (optab_for_tree_code_1): New.
> * optabs.cc (expand_widen_pattern_expr): Expand support for
> DOT_PROD_EXPR patterns.
> * optabs.def (udot_prod_twoway_optab): New.
> (sdot_prod_twoway_optab): Likewise.
> * tree-vect-patterns.cc (vect_supportable_direct_optab_p): Add
> support for misc optabs that use two modes.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/vect/vect-dotprod-twoway.c: New.
> ---
>  .../aarch64/aarch64-sve-builtins-base.cc  |  4 ++--
>  gcc/config/aarch64/aarch64-sve2.md|  2 +-
>  gcc/optabs-tree.cc| 23 --
>  gcc/optabs-tree.h |  2 ++
>  gcc/optabs.cc |  2 +-
>  gcc/optabs.def|  2 ++
>  .../gcc.dg/vect/vect-dotprod-twoway.c | 24 +++
>  gcc/tree-vect-patterns.cc |  2 +-
>  8 files changed, 54 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-dotprod-twoway.c
>
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index 0d2edf3f19e..e457db09f66 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> @@ -764,8 +764,8 @@ public:
>icode = (e.type_suffix (0).float_p
>? CODE_FOR_aarch64_sve_fdotvnx4sfvnx8hf
>: e.type_suffix (0).unsigned_p
> -  ? CODE_FOR_aarch64_sve_udotvnx4sivnx8hi
> -  : CODE_FOR_aarch64_sve_sdotvnx4sivnx8hi);
> +  ? CODE_FOR_udot_prod_twoway_vnx8hi
> +  : CODE_FOR_sdot_prod_twoway_vnx8hi);
>  return e.use_unpred_insn (icode);
>}
>  };
> diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md
> index 934e57055d3..5677de7108d 100644
> --- a/gcc/config/aarch64/aarch64-sve2.md
> 

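For comparison with the 2-way case, the existing 4-way form consumes four 8-bit elements per 32-bit lane (VNx16QI inputs to a VNx4SI accumulator). Both forms produce the same 32-bit accumulator type, which is why, as the description argues, the accumulator mode alone cannot pick between them. A scalar model only; `LANES` and all names are assumptions:

```c
#include <stdint.h>

#define LANES 4 /* assumed number of 32-bit accumulator lanes */

/* Model of a 4-way unsigned dot product (8-bit inputs, 32-bit lanes):
   lane j accumulates the four products at positions 4*j .. 4*j+3.  */
static void udot_4way(uint32_t acc[LANES], const uint8_t *a, const uint8_t *b)
{
  for (int j = 0; j < LANES; j++)
    for (int k = 0; k < 4; k++)
      acc[j] += (uint32_t) a[4 * j + k] * b[4 * j + k];
}

/* Scalar reference dot product of two uint8_t arrays. */
static uint32_t dot_u8_scalar(int n, const uint8_t *a, const uint8_t *b)
{
  uint32_t sum = 0;
  for (int i = 0; i < n; i++)
    sum += (uint32_t) a[i] * b[i];
  return sum;
}

/* Vector-style version built on the 4-way model, with a scalar tail. */
static uint32_t dot_u8_4way(int n, const uint8_t *a, const uint8_t *b)
{
  uint32_t acc[LANES] = {0};
  int i = 0;
  for (; i + 4 * LANES <= n; i += 4 * LANES)
    udot_4way(acc, a + i, b + i);
  uint32_t sum = 0;
  for (int j = 0; j < LANES; j++)
    sum += acc[j];
  for (; i < n; i++)
    sum += (uint32_t) a[i] * b[i];
  return sum;
}
```
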
Re: [PATCH] middle-end: Drop __builtin_prefetch calls in autovectorization [PR114061]

2024-05-16 Thread Andrew Pinski
On Thu, May 16, 2024, 3:58 PM Victor Do Nascimento <victor.donascime...@arm.com> wrote:

> At present the autovectorizer fails to vectorize simple loops
> involving calls to `__builtin_prefetch'.  A simple example of such a
> loop is given below:
>
> void foo(double * restrict a, double * restrict b, int n){
>   int i;
>   for(i=0; i<n; i++){
>     a[i] = a[i] + b[i];
>     __builtin_prefetch(&(b[i+8]));
>   }
> }
>
> The failure stems from two issues:
>
> 1. Given that it is typically not possible to fully reason about a
>function call due to the possibility of side effects, the
>autovectorizer does not attempt to vectorize loops which make such
>calls.
>
>Given the memory reference passed to `__builtin_prefetch', in the
>absence of assurances about its effect on the passed memory
>location the compiler deems the function unsafe to vectorize,
>marking it as clobbering memory in `vect_find_stmt_data_reference'.
>This leads to the failure in autovectorization.
>
> 2. Notwithstanding the above issue, though the prefetch statement
>would be classed as `vect_unused_in_scope', the loop invariant that
>is used in the address of the prefetch is the scalar loop's and not
>the vector loop's IV. That is, it still uses `i' and not `vec_iv'
>because the instruction wasn't vectorized, causing DCE to think the
>value is live, such that we now have both the vector and scalar loop
>invariant actively used in the loop.
>
> This patch addresses both of these:
>
> 1. About the issue regarding the memory clobber, data prefetch does
>not generate faults if its address argument is invalid and does not
>write to memory.  Therefore, it does not alter the internal state
>of the program or its control flow under any circumstance.  As
>such, it is reasonable that the function be marked as not affecting
>memory contents.
>
>To achieve this, we add the necessary logic to
>`get_references_in_stmt' to ensure that builtin functions are
>given the same treatment as internal functions.  If the gimple call
>is to a builtin function and its function code is
>`BUILT_IN_PREFETCH', we mark `clobbers_memory' as false.
>
> 2. Finding precedence in the way clobber statements are handled,
>whereby the vectorizer drops these from both the scalar and
>vectorized versions of a given loop, we choose to drop prefetch
>hints in a similar fashion.  This seems appropriate given how
>software prefetch hints are typically ignored by processors across
>architectures, as they seldom lead to performance gain over their
>hardware counterparts.
>
>PR target/114061
>

This should most likely be tree-optimization/114061, since it is a generic
vectorizer issue. Also, maybe reference the bug number in the summary next
time, just for easier reference.

Thanks,
Andrew


> gcc/ChangeLog:
>
> * tree-data-ref.cc (get_references_in_stmt): set
> `clobbers_memory' to false for __builtin_prefetch.
> * tree-vect-loop.cc (vect_transform_loop): Drop all
> __builtin_prefetch calls from loops.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/vect/vect-prefetch-drop.c: New test.
> ---
>  gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c | 14 ++
>  gcc/tree-data-ref.cc   |  9 +
>  gcc/tree-vect-loop.cc  |  7 ++-
>  3 files changed, 29 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c b/gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c
> new file mode 100644
> index 000..57723a8c972
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-prefetch-drop.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target { aarch64*-*-* } } } */
> +/* { dg-additional-options "-O3 -march=armv9.2-a+sve -fdump-tree-vect-details" { target { aarch64*-*-* } } } */
> +
> +void foo(double * restrict a, double * restrict b, int n){
> +  int i;
> +  for(i=0; i<n; i++){
> +    a[i] = a[i] + b[i];
> +    __builtin_prefetch(&(b[i+8]));
> +  }
> +}
> +
> +/* { dg-final { scan-assembler-not "prfm" } } */
> +/* { dg-final { scan-assembler "fadd\tz\[0-9\]+.d, p\[0-9\]+/m, z\[0-9\]+.d, z\[0-9\]+.d" } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  } } */
> diff --git a/gcc/tree-data-ref.cc b/gcc/tree-data-ref.cc
> index f37734b5340..47bfec0f922 100644
> --- a/gcc/tree-data-ref.cc
> +++ b/gcc/tree-data-ref.cc
> @@ -5843,6 +5843,15 @@ get_references_in_stmt (gimple *stmt, vec *references)
> clobbers_memory = true;
> break;
>   }
> +
> +  else if (gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
> +   {
> + enum built_in_function fn_type = DECL_FUNCTION_CODE (TREE_OPERAND (gimple_call_fn (stmt), 0));
> + if (fn_type == BUILT_IN_PREFETCH)
> +   clobbers_memory = false;
> + else
> +   clobbers_memory = 

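The safety argument for dropping the hint is that a prefetch never changes program results. A minimal sketch of the example loop with and without the hint (GCC/Clang `__builtin_prefetch`; the function names and `PREFETCH_DISTANCE` are hypothetical). `b` is assumed to have at least `n + PREFETCH_DISTANCE` elements: the prefetch instruction itself does not fault, but forming a pointer beyond one-past-the-end is undefined in ISO C.

```c
#define PREFETCH_DISTANCE 8 /* matches b[i+8] in the example above */

/* The loop from the patch, with the prefetch hint. */
static void add_with_prefetch(double *restrict a, const double *restrict b,
                              int n)
{
  for (int i = 0; i < n; i++) {
    a[i] = a[i] + b[i];
    __builtin_prefetch(&b[i + PREFETCH_DISTANCE]); /* hint only */
  }
}

/* The same loop with the hint dropped, as the patch proposes. */
static void add_without_prefetch(double *restrict a, const double *restrict b,
                                 int n)
{
  for (int i = 0; i < n; i++)
    a[i] = a[i] + b[i];
}
```
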
Re: [PATCH] Add extra copy of the ifcombine pass after pre [PR102793]

2024-05-16 Thread Andrew Pinski
On Thu, May 16, 2024, 12:55 PM Oleg Endo  wrote:

>
> On Thu, 2024-05-16 at 10:35 +0200, Richard Biener wrote:
> > On Fri, Apr 5, 2024 at 8:14 PM Andrew Pinski  wrote:
> > >
> > > On Fri, Apr 5, 2024 at 5:28 AM Manolis Tsamis wrote:
> > > >
> > > > If we consider code like:
> > > >
> > > > if (bar1 == x)
> > > >   return foo();
> > > > if (bar2 != y)
> > > >   return foo();
> > > > return 0;
> > > >
> > > > We would like the ifcombine pass to convert this to:
> > > >
> > > > if (bar1 == x || bar2 != y)
> > > >   return foo();
> > > > return 0;
> > > >
> > > > The ifcombine pass can handle this transformation, but it is run very early and
> > > > it misses the opportunity because there are two separate blocks for foo().
> > > > The pre pass is good at removing duplicate code and blocks, and due to that
> > > > running ifcombine again after it can increase the number of successful
> > > > conversions.
> > >
> > > I do think we should have something similar to re-running
> > > ssa-ifcombine but I think it should be much later, like after the loop
> > > optimizations are done.
> > > Maybe just a simplified version of it (that does the combining and not
> > > the optimizations part) included in isel or pass_optimize_widening_mul
> > > (which itself should most likely become part of isel or renamed since
> > > it handles more than just widening multiply these days).
> >
> > I've long wished we had a (late?) pass that can also undo if-conversion
> > (basically do what RTL expansion would later do).  Maybe
> > gimple-predicate-analysis.cc (what's used by uninit analysis) can
> > represent mixed CFG + if-converted conditions so we can optimize
> > it and code-gen the condition in a more optimal manner much like
> > we have if-to-switch, switch-conversion and switch-expansion.
> >
> > That said, I agree that re-running ifcombine should be later.  And there's
> > still the old task of splitting tail-merging from PRE (and possibly making
> > it more effective).
>
> Sorry to butt in, but it might be a little bit relevant and it caught my
> attention.
>
> I've got this SH patch sitting around
> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55543
>
> The idea is basically to run an additional loop pass after combine and
> split1.  The main purpose is to hoist constant loads out of loops. Such
> constant loads might be formed (in this particular case) during combine
> transformations.
>
> The patch adds a new file gcc/config/sh/sh_loop.cc, which has some
> boilerplate code copy-pasted from other places to get the loop pass set up
> and going.
>
> Any thoughts on this way of doing it?
>

I have been looking at a similar issue on aarch64 for a few cases: csinc
and nand. What I decided to do for nand was to not depend on combine in the
end, and instead create new infrastructure to expand better from GIMPLE to
RTL, and maybe even have target-specific pattern matching at the GIMPLE
level, so that the constant is not part of the other instruction.

I should have a write-up/first draft of an implementation by the August time
frame or so. The write-up will most likely come earlier.

Thanks,
Andrew



>
> Best regards,
> Oleg Endo
>

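The transformation in the quoted description can be checked at the source level. The sketch below (hypothetical names; `foo` is a stand-in returning a constant, and the operands are assumed to be plain `int` values) shows that the two shapes return the same value for every input. What the thread is about is *when* in the pipeline ifcombine gets to see the merged form, which this does not model.

```c
/* Hypothetical stand-in for foo() from the example. */
static int foo(void)
{
  return 42;
}

/* Original shape: two separate conditions, each leading to foo(). */
static int before_ifcombine(int bar1, int x, int bar2, int y)
{
  if (bar1 == x)
    return foo();
  if (bar2 != y)
    return foo();
  return 0;
}

/* Combined shape ifcombine aims for once the duplicate foo() blocks
   have been merged into one.  */
static int after_ifcombine(int bar1, int x, int bar2, int y)
{
  if (bar1 == x || bar2 != y)
    return foo();
  return 0;
}
```
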

Re: [PATCH] rs6000: Don't clobber return value when eh_return called [PR114846]

2024-05-15 Thread Andrew Pinski
On Thu, May 16, 2024, 4:09 AM Kewen.Lin  wrote:

> Hi,
>
> As the associated test case in PR114846 shows, currently
> with eh_return involved, some register restoring for EH
> RETURN DATA in the epilogue can clobber the one holding
> the return value.  Referring to the existing handlings in
> some other targets, this patch makes eh_return expander
> call one new define_insn_and_split eh_return_internal which
> directly calls rs6000_emit_epilogue with epilogue_type
> EPILOGUE_TYPE_EH_RETURN instead of the previous treating
> normal return with crtl->calls_eh_return specially.
>
> Bootstrapped and regtested on powerpc64-linux-gnu P8/P9 and
> powerpc64le-linux-gnu P9 and P10.
>
> I'm going to push this next week if no objections.
>


Thanks for fixing this for powerpc. I hope my patch for aarch64 gets
reviewed soon and it will contain many more testcases. Hopefully someone
will fix the arm target too.

Thanks,
Andrew



> BR,
> Kewen
> -
> PR target/114846
>
> gcc/ChangeLog:
>
> * config/rs6000/rs6000-logue.cc (rs6000_emit_epilogue): As
> EPILOGUE_TYPE_EH_RETURN would be passed as epilogue_type directly
> now, adjust the relevant handlings on it.
> * config/rs6000/rs6000.md (eh_return expander): Append by calling
> gen_eh_return_internal and emit_barrier.
> (eh_return_internal): New define_insn_and_split, call function
> rs6000_emit_epilogue with epilogue type EPILOGUE_TYPE_EH_RETURN.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/powerpc/pr114846.c: New test.
> ---
>  gcc/config/rs6000/rs6000-logue.cc   |  7 +++
>  gcc/config/rs6000/rs6000.md | 15 +++
>  gcc/testsuite/gcc.target/powerpc/pr114846.c | 20 
>  3 files changed, 38 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr114846.c
>
> diff --git a/gcc/config/rs6000/rs6000-logue.cc b/gcc/config/rs6000/rs6000-logue.cc
> index 60ba15a8bc3..bd5d56ba002 100644
> --- a/gcc/config/rs6000/rs6000-logue.cc
> +++ b/gcc/config/rs6000/rs6000-logue.cc
> @@ -4308,9 +4308,6 @@ rs6000_emit_epilogue (enum epilogue_type epilogue_type)
>
>rs6000_stack_t *info = rs6000_stack_info ();
>
> -  if (epilogue_type == EPILOGUE_TYPE_NORMAL && crtl->calls_eh_return)
> -epilogue_type = EPILOGUE_TYPE_EH_RETURN;
> -
>int strategy = info->savres_strategy;
>bool using_load_multiple = !!(strategy & REST_MULTIPLE);
>bool restoring_GPRs_inline = !!(strategy & REST_INLINE_GPRS);
> @@ -4788,7 +4785,9 @@ rs6000_emit_epilogue (enum epilogue_type epilogue_type)
>
>/* In the ELFv2 ABI we need to restore all call-saved CR fields from
>   *separate* slots if the routine calls __builtin_eh_return, so
> - that they can be independently restored by the unwinder.  */
> + that they can be independently restored by the unwinder.  Since
> + it is for CR fields restoring, it should be done for any epilogue
> + types (not EPILOGUE_TYPE_EH_RETURN specific).  */
>if (DEFAULT_ABI == ABI_ELFv2 && crtl->calls_eh_return)
>  {
>int i, cr_off = info->ehcr_offset;
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index ac5651d7420..d4120c3b9ce 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -14281,6 +14281,8 @@ (define_expand "eh_return"
>""
>  {
>emit_insn (gen_eh_set_lr (Pmode, operands[0]));
> +  emit_jump_insn (gen_eh_return_internal ());
> +  emit_barrier ();
>DONE;
>  })
>
> @@ -14297,6 +14299,19 @@ (define_insn_and_split "@eh_set_lr_"
>DONE;
>  })
>
> +(define_insn_and_split "eh_return_internal"
> +  [(eh_return)]
> +  ""
> +  "#"
> +  "epilogue_completed"
> +  [(const_int 0)]
> +{
> +  if (!TARGET_SCHED_PROLOG)
> +    emit_insn (gen_blockage ());
> +  rs6000_emit_epilogue (EPILOGUE_TYPE_EH_RETURN);
> +  DONE;
> +})
> +
>  (define_insn "prefetch"
>[(prefetch (match_operand 0 "indexed_or_indirect_address" "a")
>  (match_operand:SI 1 "const_int_operand" "n")
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr114846.c b/gcc/testsuite/gcc.target/powerpc/pr114846.c
> new file mode 100644
> index 000..efe2300b73a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr114846.c
> @@ -0,0 +1,20 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target builtin_eh_return } */
> +
> +/* Ensure it runs successfully.  */
> +
> +__attribute__ ((noipa))
> +int f (int *a, long offset, void *handler)
> +{
> +  if (*a == 5)
> +    return 5;
> +  __builtin_eh_return (offset, handler);
> +}
> +
> +int main ()
> +{
> +  int t = 5;
> +  if (f (&t, 0, 0) != 5)
> +    __builtin_abort ();
> +  return 0;
> +}
> --
> 2.39.3
>


[RFC] New optab for `a&~b` (and future expand improvements)

2024-05-15 Thread Andrew Pinski via Gcc
Hi all,
  This is an RFC more than anything, and I will be implementing the ideas here.
I have been thinking about how to improve code generation in general and depend
less on RTL passes (like combine) to do some instruction selection.
There are 2 ways of implementing this, but both involve adding optabs.
Whichever proposal we decide to go forward with, I will write it up in a
more generic form and put it up on the wiki so folks can follow the same
pattern going forward. And if I implement proposal 2, I will make sure
the internals document is updated for each item too.

Proposal 1 (improve expand):
* Add an optab for andnot (`a & ~b`)
* Use TER to match the andnot pattern, see if there is an optab for it,
and expand it using the optab.
* Use TER to pattern match `((A ^ B) & C) ^ B` and expand it as
`(A & C) | (B & ~C)` using the optab (if it exists); should we do some cost
check here or assume the optab is the same cost as bit_and?

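The third bullet of proposal 1 relies on a bitwise-select identity: `((A ^ B) & C) ^ B` yields A in the bit positions where C is set and B elsewhere, which is `(A & C) | (B & ~C)`, so the `B & ~C` half maps onto the proposed andnot operation. A quick exhaustive check over 8-bit values (the helper names are hypothetical, not the proposed optab or internal function):

```c
#include <stdbool.h>
#include <stdint.h>

/* The operation an andnot optab would expose: a & ~b. */
static uint8_t andnot8(uint8_t a, uint8_t b)
{
  return (uint8_t) (a & ~b);
}

/* The pattern to be matched: ((A ^ B) & C) ^ B. */
static uint8_t xor_and_xor8(uint8_t a, uint8_t b, uint8_t c)
{
  return (uint8_t) (((a ^ b) & c) ^ b);
}

/* The equivalent bitwise select: (A & C) | (B & ~C). */
static uint8_t select_with_andnot8(uint8_t a, uint8_t b, uint8_t c)
{
  return (uint8_t) ((a & c) | andnot8(b, c));
}

/* Check the identity for every combination of 8-bit inputs. */
static bool identity_holds(void)
{
  for (unsigned a = 0; a < 256; a++)
    for (unsigned b = 0; b < 256; b++)
      for (unsigned c = 0; c < 256; c++)
        if (xor_and_xor8((uint8_t) a, (uint8_t) b, (uint8_t) c)
            != select_with_andnot8((uint8_t) a, (uint8_t) b, (uint8_t) c))
          return false;
  return true;
}
```

Because every operation is bitwise, checking all 8-bit combinations covers every per-bit case; whether the rewrite pays off on a given target is the cost question raised in the bullet.
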
Proposal 2 (use math-opt/ISEL)
* Add an optab for andnot (`a & ~b`) [same as above]
* Add an internal function for andnot
* Create a subpass of math-opt (or isel) that uses a new match-and-simplify-like
format to create the internal function for the simple `a & ~b` if there is an
optab
* Do a similar thing for `((A ^B) & C) ^ B` to use the new internal function.

The pros and cons of each proposal:
* pros of proposal 1:
** does not need much refactoring or new code added
** patches would be smaller to review
** Patches could be implemented within a week
* cons of proposal 1:
** keeps around TER longer
** does not scale for additional changes
** need manual matching since TER has its own rules
** Can't use ranges due to the way CFG is in transition between Gimple and RTL

* pros of proposal 2:
** Can be used to simplify expand later on
** Easier to add new rules via match syntax
*** can still use manual matching like the current math-opt pass is done
** Start removal of TER
** Can use ranger much more easily
* cons of proposal 2:
** genmatch will need to change
** patches will take a month to write


I like proposal 2 better than proposal 1 since it allows for cleanups later on.
I am thinking about starting this in July if folks think proposal 2 is the way
forward.

Thanks,
Andrew Pinski


Re: [PATCH] AArch64: Improve costing of ctz

2024-05-15 Thread Andrew Pinski
On Wed, May 15, 2024, 12:17 PM Wilco Dijkstra wrote:

> Improve costing of ctz - both TARGET_CSSC and vector cases were not
> handled yet.
>
> Passes regress & bootstrap - OK for commit?
>

I should note that popcount has a similar issue, which I hope to fix next
week. The popcount cost is used during expand, so it is very useful for it
to be slightly more correct.

Thanks,
Andrew



> gcc:
> * config/aarch64/aarch64.cc (aarch64_rtx_costs): Improve CTZ
> costing.
>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index fe13c9a0d4863041eb9101882ea57c2094240d16..2a6f76f4008839bf0aa158504430af9b971c 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -14309,10 +14309,24 @@ aarch64_rtx_costs (rtx x, machine_mode mode, int outer ATTRIBUTE_UNUSED,
>return false;
>
>  case CTZ:
> -  *cost = COSTS_N_INSNS (2);
> -
> -  if (speed)
> -   *cost += extra_cost->alu.clz + extra_cost->alu.rev;
> +  if (VECTOR_MODE_P (mode))
> +    {
> +      *cost = COSTS_N_INSNS (3);
> +      if (speed)
> +        *cost += extra_cost->vect.alu * 3;
> +    }
> +  else if (TARGET_CSSC)
> +    {
> +      *cost = COSTS_N_INSNS (1);
> +      if (speed)
> +        *cost += extra_cost->alu.clz;
> +    }
> +  else
> +    {
> +      *cost = COSTS_N_INSNS (2);
> +      if (speed)
> +        *cost += extra_cost->alu.clz + extra_cost->alu.rev;
> +    }
>return false;
>
>  case COMPARE:
>
>

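The scalar non-CSSC cost is two instructions because, without CSSC, AArch64 has no scalar CTZ instruction: ctz is usually expanded as a bit reverse (RBIT) followed by CLZ, which is what the `alu.clz + alu.rev` cost models, while CSSC adds a single-instruction CTZ. A portable C model of the two-instruction sequence (names are hypothetical; AArch64 CLZ returns the register width for a zero input, which the model mirrors):

```c
#include <stdint.h>

/* Portable model of RBIT on a 32-bit value. */
static uint32_t bit_reverse32(uint32_t x)
{
  uint32_t r = 0;
  for (int i = 0; i < 32; i++) {
    r = (r << 1) | (x & 1u);
    x >>= 1;
  }
  return r;
}

/* Model of CLZ: counts leading zero bits, returning 32 for x == 0. */
static int clz32(uint32_t x)
{
  int n = 0;
  for (uint32_t m = 0x80000000u; m != 0 && !(x & m); m >>= 1)
    n++;
  return n;
}

/* ctz without CSSC: RBIT then CLZ. */
static int ctz32_rbit_clz(uint32_t x)
{
  return clz32(bit_reverse32(x));
}
```
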

[gcc r15-501] tree-cfg: Move the returns_twice check to be last statement only [PR114301]

2024-05-15 Thread Andrew Pinski via Gcc-cvs
https://gcc.gnu.org/g:642f31d6286b8a342130fbface51530befd975fd

commit r15-501-g642f31d6286b8a342130fbface51530befd975fd
Author: Andrew Pinski 
Date:   Tue May 14 06:29:18 2024 -0700

tree-cfg: Move the returns_twice check to be last statement only [PR114301]

When I was checking to make sure that all of the bugs dealing
with the case where gimple_can_duplicate_bb_p would return false were fixed,
I noticed that the code checking whether a call statement was
returns_twice was checking all call statements rather than just the
last statement. Since calling gimple_call_flags has a small non-zero
overhead due to a few string comparisons, removing the uses of it
can give a small performance improvement. Calls to returns_twice
functions will always end the basic block due to the check in
stmt_can_terminate_bb_p (and others), so checking only the last statement
is a small optimization and is safe.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/114301
gcc/ChangeLog:

* tree-cfg.cc (gimple_can_duplicate_bb_p): Check returns_twice
only on the last call statement rather than all.

Signed-off-by: Andrew Pinski 

Diff:
---
 gcc/tree-cfg.cc | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
index b2d47b720847..7fb7b92966be 100644
--- a/gcc/tree-cfg.cc
+++ b/gcc/tree-cfg.cc
@@ -6495,6 +6495,13 @@ gimple_can_duplicate_bb_p (const_basic_block bb)
&& gimple_call_internal_p (last)
&& gimple_call_internal_unique_p (last))
   return false;
+
+/* Prohibit duplication of returns_twice calls, otherwise associated
+   abnormal edges also need to be duplicated properly.
+   return_twice functions will always be the last statement.  */
+if (is_gimple_call (last)
+   && (gimple_call_flags (last) & ECF_RETURNS_TWICE))
+  return false;
   }
 
   for (gimple_stmt_iterator gsi = gsi_start_bb (CONST_CAST_BB (bb));
@@ -6502,15 +6509,12 @@ gimple_can_duplicate_bb_p (const_basic_block bb)
 {
   gimple *g = gsi_stmt (gsi);
 
-  /* Prohibit duplication of returns_twice calls, otherwise associated
-abnormal edges also need to be duplicated properly.
-An IFN_GOMP_SIMT_ENTER_ALLOC/IFN_GOMP_SIMT_EXIT call must be
+  /* An IFN_GOMP_SIMT_ENTER_ALLOC/IFN_GOMP_SIMT_EXIT call must be
 duplicated as part of its group, or not at all.
 The IFN_GOMP_SIMT_VOTE_ANY and IFN_GOMP_SIMT_XCHG_* are part of such a
 group, so the same holds there.  */
   if (is_gimple_call (g)
- && (gimple_call_flags (g) & ECF_RETURNS_TWICE
- || gimple_call_internal_p (g, IFN_GOMP_SIMT_ENTER_ALLOC)
+ && (gimple_call_internal_p (g, IFN_GOMP_SIMT_ENTER_ALLOC)
  || gimple_call_internal_p (g, IFN_GOMP_SIMT_EXIT)
  || gimple_call_internal_p (g, IFN_GOMP_SIMT_VOTE_ANY)
  || gimple_call_internal_p (g, IFN_GOMP_SIMT_XCHG_BFLY)
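The argument above is that a returns_twice call can only be the last statement of its block, so scanning every statement and inspecting only the last one must agree on well-formed blocks. A heavily simplified C model of the two shapes of the check; `struct model_stmt` and both functions are hypothetical illustrations, not GCC's GIMPLE API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced statement: only what this check looks at.  */
struct model_stmt
{
  bool is_call;
  bool returns_twice;  /* Stands in for gimple_call_flags () & ECF_RETURNS_TWICE.  */
};

/* Old shape: query the flags of every call in the block.  */
static bool can_duplicate_scan_all (const struct model_stmt *stmts, size_t n)
{
  assert (stmts != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (stmts[i].is_call && stmts[i].returns_twice)
      return false;
  return true;
}

/* New shape: a returns_twice call always ends its block, so only the
   last statement needs its flags queried.  */
static bool can_duplicate_last_only (const struct model_stmt *stmts, size_t n)
{
  assert (stmts != NULL || n == 0);
  if (n == 0)
    return true;
  const struct model_stmt *last = &stmts[n - 1];
  return !(last->is_call && last->returns_twice);
}
```

The saving is one flags query per block instead of one per call statement; the answers only diverge for a block that violates the "returns_twice ends the block" invariant.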


[PATCH] tree-cfg: Move the returns_twice check to be last statement only [PR114301]

2024-05-14 Thread Andrew Pinski
When I was checking to make sure that all of the bugs dealing
with the case where gimple_can_duplicate_bb_p would return false were fixed,
I noticed that the code which was checking if a call statement was
returns_twice was checking all call statements rather than just the
last statement. Since calling gimple_call_flags has a small non-zero
overhead due to a few string comparisons, removing the uses of it
can have a small performance improvement. A returns_twice
function call will always end the basic block due to the check in
stmt_can_terminate_bb_p (and others), so checking only the last statement
is a small optimization and is safe.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/114301
gcc/ChangeLog:

* tree-cfg.cc (gimple_can_duplicate_bb_p): Check returns_twice
only on the last call statement rather than all.

Signed-off-by: Andrew Pinski 
---
 gcc/tree-cfg.cc | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
index b2d47b72084..7fb7b92966b 100644
--- a/gcc/tree-cfg.cc
+++ b/gcc/tree-cfg.cc
@@ -6495,6 +6495,13 @@ gimple_can_duplicate_bb_p (const_basic_block bb)
&& gimple_call_internal_p (last)
&& gimple_call_internal_unique_p (last))
   return false;
+
+/* Prohibit duplication of returns_twice calls, otherwise associated
+   abnormal edges also need to be duplicated properly.
+   return_twice functions will always be the last statement.  */
+if (is_gimple_call (last)
+   && (gimple_call_flags (last) & ECF_RETURNS_TWICE))
+  return false;
   }
 
   for (gimple_stmt_iterator gsi = gsi_start_bb (CONST_CAST_BB (bb));
@@ -6502,15 +6509,12 @@ gimple_can_duplicate_bb_p (const_basic_block bb)
 {
   gimple *g = gsi_stmt (gsi);
 
-  /* Prohibit duplication of returns_twice calls, otherwise associated
-abnormal edges also need to be duplicated properly.
-An IFN_GOMP_SIMT_ENTER_ALLOC/IFN_GOMP_SIMT_EXIT call must be
+  /* An IFN_GOMP_SIMT_ENTER_ALLOC/IFN_GOMP_SIMT_EXIT call must be
 duplicated as part of its group, or not at all.
 The IFN_GOMP_SIMT_VOTE_ANY and IFN_GOMP_SIMT_XCHG_* are part of such a
 group, so the same holds there.  */
   if (is_gimple_call (g)
- && (gimple_call_flags (g) & ECF_RETURNS_TWICE
- || gimple_call_internal_p (g, IFN_GOMP_SIMT_ENTER_ALLOC)
+ && (gimple_call_internal_p (g, IFN_GOMP_SIMT_ENTER_ALLOC)
  || gimple_call_internal_p (g, IFN_GOMP_SIMT_EXIT)
  || gimple_call_internal_p (g, IFN_GOMP_SIMT_VOTE_ANY)
  || gimple_call_internal_p (g, IFN_GOMP_SIMT_XCHG_BFLY)
-- 
2.34.1
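For context on why such calls must not be duplicated: setjmp is the classic returns_twice function. It returns once directly (with 0) and again when a later longjmp transfers control back into it, which is what the abnormal edges mentioned in the comment model. A small standard-C illustration; `count_returns` and `jump_back` are hypothetical names:

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf env;

/* Transfer control back into the setjmp call in count_returns.  */
static void jump_back (void)
{
  longjmp (env, 1);
}

/* Return how many times control came back out of setjmp.  */
static int count_returns (void)
{
  volatile int returns = 0;  /* volatile: modified between setjmp and longjmp.  */
  if (setjmp (env) == 0)
    {
      returns++;     /* First return: setjmp returned 0 directly.  */
      jump_back ();  /* Never returns normally.  */
    }
  /* Second return: setjmp "returned" 1 via longjmp.  */
  assert (returns == 1);
  returns++;
  return returns;
}
```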



Re: [RFC][PATCH] PR tree-optimization/109071 - -Warray-bounds false positive warnings due to code duplication from jump threading

2024-05-13 Thread Andrew Pinski
On Mon, May 13, 2024, 11:41 PM Kees Cook  wrote:

> On Mon, May 13, 2024 at 02:46:32PM -0600, Jeff Law wrote:
> >
> >
> > On 5/13/24 1:48 PM, Qing Zhao wrote:
> > > -Warray-bounds is an important option to enable the Linux kernel to keep
> > > array out-of-bounds errors out of the source tree.
> > >
> > > However, due to the false positive warnings reported in PR109071
> > > (-Warray-bounds false positive warnings due to code duplication from
> > > jump threading), -Warray-bounds=1 cannot be enabled by default.
> > >
> > > Although it's impossible to eliminate all the false positive warnings
> > > from -Warray-bounds=1 (see PR104355, Misleading -Warray-bounds
> > > documentation says "always out of bounds"), we should minimize the
> > > false positive warnings in -Warray-bounds=1.
> > >
> > > The root cause of the false positive warnings reported in PR109071 is:
> > >
> > > When the thread jump optimization tries to reduce the # of branches
> > > inside the routine, sometimes it needs to duplicate the code and
> > > split it into two conditional paths. For example:
> > >
> > > The original code:
> > >
> > > void sparx5_set (int * ptr, struct nums * sg, int index)
> > > {
> > >if (index >= 4)
> > >  warn ();
> > >*ptr = 0;
> > >*val = sg->vals[index];
> > >if (index >= 4)
> > >  warn ();
> > >*ptr = *val;
> > >
> > >return;
> > > }
> > >
> > > With the thread jump, the above becomes:
> > >
> > > void sparx5_set (int * ptr, struct nums * sg, int index)
> > > {
> > >if (index >= 4)
> > >  {
> > >warn ();
> > >*ptr = 0;// Code duplications since "warn" does return;
> > >*val = sg->vals[index];  // same this line.
> > > // In this path, since it's under the condition
> > > // "index >= 4", the compiler knows the value
> > > // of "index" is at least 4, therefore the
> > > // out-of-bound warning.
> > >warn ();
> > >  }
> > >else
> > >  {
> > >*ptr = 0;
> > >*val = sg->vals[index];
> > >  }
> > >*ptr = *val;
> > >return;
> > > }
> > >
> > > We can see, after the thread jump optimization, the # of branches inside
> > > the routine "sparx5_set" is reduced from 2 to 1; however, due to the
> > > code duplication (which is needed for the correctness of the code), we
> > > got a false positive out-of-bound warning.
> > >
> > > In order to eliminate such false positive out-of-bound warning,
> > >
> > > A. Add one more flag for GIMPLE: is_splitted.
> > > B. During the thread jump optimization, when the basic blocks are
> > > duplicated, mark all the STMTs inside the original and duplicated
> > > basic blocks as "is_splitted";
> > > C. Inside the array bound checker, add the following new heuristic:
> > >
> > > If
> > > 1. the stmt is duplicated and split into two conditional paths;
> > > 2. the warning level < 2;
> > > 3. the current block is not dominating the exit block;
> > > then do not report the warning.
> > >
> > > The false positive warnings are moved from -Warray-bounds=1 to
> > >   -Warray-bounds=2 now.
> > >
> > > Bootstrapped and regression tested on both x86 and aarch64. Adjusted
> > > -Warray-bounds-61.c due to the false positive warnings.
> > >
> > > Let me know if you have any comments and suggestions.
> > This sounds horribly wrong.   In the code above, the warning is correct.
>
> It's not sensible from a user's perspective.
>
> If this doesn't warn:
>
> void sparx5_set (int * ptr, struct nums * sg, int index)
> {
>*ptr = 0;
>*val = sg->vals[index];
>*ptr = *val;
> }
>
> ... because the value range tracking of "index" spans [INT_MIN,INT_MAX],
> and warnings based on the value range are silenced if they haven't been
> clamped at all. (Otherwise warnings would be produced everywhere: only
> when a limited set of values is known is it useful to produce a warning.)
>
>
> But it makes no sense to warn about:
>
> void sparx5_set (int * ptr, struct nums * sg, int index)
> {
>if (index >= 4)
>  warn ();
>*ptr = 0;
>*val = sg->vals[index];
>if (index >= 4)
>  warn ();
>*ptr = *val;
> }
>
> Because at "*val = sg->vals[index];" the actual value range tracking for
> index is _still_ [INT_MIN,INT_MAX]. (Only within the "then" side of the
> "if" statements is the range tracking [4,INT_MAX].)
>
> However, in the case where jump threading has split the execution flow
> and produced a copy of "*val = sg->vals[index];" where the value range
> tracking for "index" is now [4,INT_MAX], is the warning valid. But it
> is only for that instance. Reporting it for effectively both (there is
> only 1 source line for the array indexing) is misleading because there
> is nothing the user can do about it -- the compiler created the copy and
> then noticed it had a range it could apply to that array index.
>
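A compilable version of the example under discussion, to make the duplication concrete. The quoted code uses an undeclared `val` and never defines `struct nums` or `warn`, so the definitions below (a four-element `vals` array, a local `int val`, a counting `warn`) are assumptions for illustration only. The second function hand-writes the threaded form:

```c
#include <assert.h>

/* Assumed definitions: the quoted example does not show them.  */
struct nums { int vals[4]; };
static int warn_count;
static void warn (void) { warn_count++; }

/* The function as written, with the undeclared pointer `val` replaced by a
   local int: two identical tests of INDEX.  */
static void sparx5_set (int *ptr, struct nums *sg, int index)
{
  int val;
  if (index >= 4)
    warn ();
  *ptr = 0;
  val = sg->vals[index];
  if (index >= 4)
    warn ();
  *ptr = val;
}

/* Hand-threaded equivalent: a single test, with the middle of the function
   copied onto each path.  On the first path INDEX is known to be >= 4, so
   that copy of sg->vals[index] is provably out of bounds, even though the
   source contains only one such access.  */
static void sparx5_set_threaded (int *ptr, struct nums *sg, int index)
{
  int val;
  if (index >= 4)
    {
      warn ();
      *ptr = 0;
      val = sg->vals[index];
      warn ();
    }
  else
    {
      assert (index < 4);  /* The path condition on this copy.  */
      *ptr = 0;
      val = sg->vals[index];
    }
  *ptr = val;
}
```

For in-bounds indices both functions behave identically; only the compiler-created copy on the `index >= 4` path carries the narrowed range that the warning keys on.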

"there is 

[PATCH] Match: optimize `a == CST & unary(a)` [PR111487]

2024-05-13 Thread Andrew Pinski
This extends the `a == CST & a` optimization
to handle more than just casts. It adds the optimization
for unary operators.
The patch for binary operators will come later.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/111487
gcc/ChangeLog:

* match.pd (tcc_int_unary): New operator list.
(`a == CST & unary(a)`): New pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/and-unary-1.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/match.pd| 12 
 gcc/testsuite/gcc.dg/tree-ssa/and-unary-1.c | 61 +
 2 files changed, 73 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/and-unary-1.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 07e743ae464..3ee28a3d8fc 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -57,6 +57,10 @@ along with GCC; see the file COPYING3.  If not see
 
 #include "cfn-operators.pd"
 
+/* integer unary operators that return the same type. */
+(define_operator_list tcc_int_unary
+ abs absu negate bit_not BSWAP POPCOUNT CTZ CLZ PARITY)
+
 /* Define operand lists for math rounding functions {,i,l,ll}FN,
where the versions prefixed with "i" return an int, those prefixed with
"l" return a long and those prefixed with "ll" return a long long.
@@ -5451,6 +5455,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   @2
   { build_zero_cst (type); }))
 
+/* `(a == CST) & unary(a)` can be simplified to `(a == CST) & unary(CST)`. */
+(simplify
+ (bit_and:c (convert@2 (eq @0 INTEGER_CST@1))
+(convert? (tcc_int_unary @3)))
+ (if (bitwise_equal_p (@0, @3))
+  (with { tree  inner_type = TREE_TYPE (@3); }
+   (bit_and @2 (convert (tcc_int_unary (convert:inner_type @1)))
+
 /* Optimize
# x_5 in range [cst1, cst2] where cst2 = cst1 + 1
x_5 == cstN ? cst4 : cst3
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/and-unary-1.c 
b/gcc/testsuite/gcc.dg/tree-ssa/and-unary-1.c
new file mode 100644
index 000..c157bc11b00
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/and-unary-1.c
@@ -0,0 +1,61 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-forwprop1-raw -fdump-tree-optimized-raw" } */
+/* unary part of PR tree-optimization/111487 */
+
+int abs1(int a)
+{
+  int b = __builtin_abs(a);
+  return (a == 1) & b;
+}
+int absu1(int a)
+{
+  int b;
+  b = a > 0 ? -a:a;
+  b = -b;
+return (a == 1) & b;
+}
+
+int bswap1(int a)
+{
+  int b = __builtin_bswap32(a);
+  return (a == 1) & b;
+}
+
+int ctz1(int a)
+{
+  int b = __builtin_ctz(a);
+  return (a == 1) & b;
+}
+int pop1(int a)
+{
+  int b = __builtin_popcount(a);
+  return (a == 1) & b;
+}
+int neg1(int a)
+{
+  int b = -(a);
+  return (a == 1) & b;
+}
+int not1(int a)
+{
+  int b = ~(a);
+  return (a == 1) & b;
+}
+int partity1(int a)
+{
+  int b = __builtin_parity(a);
+  return (a == 1) & b;
+}
+
+
+/* We should optimize out the unary operator for each.
+   For ctz we can optimize directly to `return 0`.
+   For bswap1 and not1, we can do the same but not until after forwprop1.  */
+/* { dg-final { scan-tree-dump-times "eq_expr, " 7 "forwprop1" } } */
+/* { dg-final { scan-tree-dump-times "eq_expr, " 5 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "abs_expr, "  "forwprop1" } } */
+/* { dg-final { scan-tree-dump-not "absu_expr, "  "forwprop1" } } */
+/* { dg-final { scan-tree-dump-not "bit_not_expr, "  "forwprop1" } } */
+/* { dg-final { scan-tree-dump-not "negate_expr, "  "forwprop1" } } */
+/* { dg-final { scan-tree-dump-not "gimple_call <"  "forwprop1" } } */
+/* { dg-final { scan-tree-dump-not "bit_and_expr,  "  "forwprop1" } } */
-- 
2.34.1
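The pattern is a substitution: `(a == CST)` is 0 or 1, so `(a == CST) & unary(a)` can only be nonzero when `a == CST`, and then `unary(a)` equals `unary(CST)`, which folds to a constant. A small C check of that reasoning for CST = 1, mirroring the functions in and-unary-1.c. The helper names are hypothetical, and `op_ctz` maps 0 to 32 only to stay defined, since `__builtin_ctz (0)` is undefined. It also shows why the test expects ctz, bswap and not to disappear entirely: `ctz (1)`, `bswap32 (1)` and `~1` all have a clear low bit.

```c
#include <assert.h>
#include <stdint.h>

typedef int (*unary_fn) (int);

/* Integer unary operations from the tcc_int_unary list, as plain C.  */
static int op_abs (int a) { return a < 0 ? -a : a; }
static int op_neg (int a) { return -a; }
static int op_not (int a) { return ~a; }
static int op_popcount (int a) { return __builtin_popcount ((unsigned) a); }
static int op_parity (int a) { return __builtin_parity ((unsigned) a); }
static int op_bswap (int a) { return (int) __builtin_bswap32 ((uint32_t) a); }
static int op_ctz (int a) { return a == 0 ? 32 : __builtin_ctz ((unsigned) a); }

/* Check (a == cst) & f (a) == (a == cst) & f (cst) for every A in [LO, HI].
   The range must avoid overflow in F (e.g. INT_MIN for abs and neg).  */
static int and_eq_identity_holds (unary_fn f, int cst, int lo, int hi)
{
  assert (lo <= hi);
  for (int a = lo; a <= hi; a++)
    if (((a == cst) & f (a)) != ((a == cst) & f (cst)))
      return 0;
  return 1;
}

/* The value the whole expression takes when a == cst: the low bit of f (cst).
   If it is 0, the expression is 0 for every A.  */
static int folded_value (unary_fn f, int cst)
{
  return 1 & f (cst);
}
```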



[gcc r11-11422] Fix PR 110386: backprop vs ABSU_EXPR

2024-05-09 Thread Andrew Pinski via Gcc-cvs
https://gcc.gnu.org/g:dbfc2d075f10149bd94e16c1210ffe4bac7e60c3

commit r11-11422-gdbfc2d075f10149bd94e16c1210ffe4bac7e60c3
Author: Andrew Pinski 
Date:   Sat Sep 23 21:53:09 2023 -0700

Fix PR 110386: backprop vs ABSU_EXPR

The issue here is that when backprop tries to go
and strip sign ops, it skips over ABSU_EXPR, but
ABSU_EXPR not only does an ABS, it also changes the
type to unsigned.
Since strip_sign_op_1 is only supposed to strip off
sign-changing operands and not ones that change types,
removing ABSU_EXPR here is correct. We don't handle
nop conversions, so this does not cause any missed optimizations either.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/110386

gcc/ChangeLog:

* gimple-ssa-backprop.c (strip_sign_op_1): Remove ABSU_EXPR.

gcc/testsuite/ChangeLog:

* gcc.c-torture/compile/pr110386-1.c: New test.
* gcc.c-torture/compile/pr110386-2.c: New test.

(cherry picked from commit 2bbac12ea7bd8a3eef5382e1b13f6019df4ec03f)

Diff:
---
 gcc/gimple-ssa-backprop.c|  1 -
 gcc/testsuite/gcc.c-torture/compile/pr110386-1.c |  9 +
 gcc/testsuite/gcc.c-torture/compile/pr110386-2.c | 11 +++
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/gcc/gimple-ssa-backprop.c b/gcc/gimple-ssa-backprop.c
index 4b62bb92a21d..8c0a37e6e97d 100644
--- a/gcc/gimple-ssa-backprop.c
+++ b/gcc/gimple-ssa-backprop.c
@@ -688,7 +688,6 @@ strip_sign_op_1 (tree rhs)
 switch (gimple_assign_rhs_code (assign))
   {
   case ABS_EXPR:
-  case ABSU_EXPR:
   case NEGATE_EXPR:
return gimple_assign_rhs1 (assign);
 
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr110386-1.c b/gcc/testsuite/gcc.c-torture/compile/pr110386-1.c
new file mode 100644
index ..4fcc977ad16f
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr110386-1.c
@@ -0,0 +1,9 @@
+
+int f(int a)
+{
+int c = c < 0 ? c : -c;
+c = -c;
+unsigned b =  c;
+unsigned t = b*a;
+return t*t;
+}
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr110386-2.c b/gcc/testsuite/gcc.c-torture/compile/pr110386-2.c
new file mode 100644
index ..c60e1b6994b7
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr110386-2.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-mavx" } */
+
+#include 
+
+__m128i do_stuff(__m128i XMM0) {
+   __m128i ABS0 = _mm_abs_epi32(XMM0);
+   __m128i MUL0 = _mm_mullo_epi32(ABS0, XMM0);
+   __m128i MUL1 = _mm_mullo_epi32(MUL0, MUL0);
+   return MUL1;
+}
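ABSU_EXPR computes |x| but yields the unsigned counterpart of x's type, so it is well defined even for INT_MIN. A C sketch of those semantics (`absu` and the two square helpers are hypothetical names, modelled on the tree code) shows the distinction the fix relies on: when only the magnitude matters, as in a square, the kind of rewrite backprop performs is still valid numerically, but the equivalent form without the ABSU must still convert the operand to unsigned. Handing back the signed operand itself, as stripping ABSU_EXPR in strip_sign_op_1 did, changes the type of the value.

```c
#include <assert.h>
#include <limits.h>

/* Model of ABSU_EXPR on int: |x| computed in unsigned arithmetic, so
   absu (INT_MIN) is well defined and equals INT_MAX + 1u.  */
static unsigned absu (int x)
{
  return x < 0 ? 0u - (unsigned) x : (unsigned) x;
}

/* A use that ignores the sign of its operand.  */
static unsigned square_of_absu (int x)
{
  unsigned b = absu (x);
  return b * b;
}

/* The same value without the ABSU: the operand still has to be converted
   to unsigned, because the result type differs from X's type.  */
static unsigned square_without_absu (int x)
{
  unsigned b = (unsigned) x;
  return b * b;
}
```

Both squares agree because `absu (x)` is congruent to plus or minus `(unsigned) x` modulo 2^32.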


Re: [COMMITTED/13] Fix PR 110386: backprop vs ABSU_EXPR

2024-05-09 Thread Andrew Pinski
On Sun, Oct 1, 2023 at 12:28 PM Andrew Pinski  wrote:
>
> From: Andrew Pinski 
>
> The issue here is that when backprop tries to go
> and strip sign ops, it skips over ABSU_EXPR, but
> ABSU_EXPR not only does an ABS, it also changes the
> type to unsigned.
> Since strip_sign_op_1 is only supposed to strip off
> sign-changing operands and not ones that change types,
> removing ABSU_EXPR here is correct. We don't handle
> nop conversions, so this does not cause any missed optimizations either.
>
> Committed to the GCC 13 branch after bootstrapping and
> testing on x86_64-linux-gnu with no regressions.

And to the GCC 12 and 11 branches too.

Thanks,
Andrew

>
> PR tree-optimization/110386
>
> gcc/ChangeLog:
>
> * gimple-ssa-backprop.cc (strip_sign_op_1): Remove ABSU_EXPR.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.c-torture/compile/pr110386-1.c: New test.
> * gcc.c-torture/compile/pr110386-2.c: New test.
>
> (cherry picked from commit 2bbac12ea7bd8a3eef5382e1b13f6019df4ec03f)
> ---
>  gcc/gimple-ssa-backprop.cc   |  1 -
>  gcc/testsuite/gcc.c-torture/compile/pr110386-1.c |  9 +
>  gcc/testsuite/gcc.c-torture/compile/pr110386-2.c | 11 +++
>  3 files changed, 20 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr110386-1.c
>  create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr110386-2.c
>
> diff --git a/gcc/gimple-ssa-backprop.cc b/gcc/gimple-ssa-backprop.cc
> index 65a65590017..dcb15ed4f61 100644
> --- a/gcc/gimple-ssa-backprop.cc
> +++ b/gcc/gimple-ssa-backprop.cc
> @@ -694,7 +694,6 @@ strip_sign_op_1 (tree rhs)
>  switch (gimple_assign_rhs_code (assign))
>{
>case ABS_EXPR:
> -  case ABSU_EXPR:
>case NEGATE_EXPR:
> return gimple_assign_rhs1 (assign);
>
> diff --git a/gcc/testsuite/gcc.c-torture/compile/pr110386-1.c b/gcc/testsuite/gcc.c-torture/compile/pr110386-1.c
> new file mode 100644
> index 000..4fcc977ad16
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/compile/pr110386-1.c
> @@ -0,0 +1,9 @@
> +
> +int f(int a)
> +{
> +int c = c < 0 ? c : -c;
> +c = -c;
> +unsigned b =  c;
> +unsigned t = b*a;
> +return t*t;
> +}
> diff --git a/gcc/testsuite/gcc.c-torture/compile/pr110386-2.c b/gcc/testsuite/gcc.c-torture/compile/pr110386-2.c
> new file mode 100644
> index 000..c60e1b6994b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/compile/pr110386-2.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
> +/* { dg-options "-mavx" } */
> +
> +#include 
> +
> +__m128i do_stuff(__m128i XMM0) {
> +   __m128i ABS0 = _mm_abs_epi32(XMM0);
> +   __m128i MUL0 = _mm_mullo_epi32(ABS0, XMM0);
> +   __m128i MUL1 = _mm_mullo_epi32(MUL0, MUL0);
> +   return MUL1;
> +}
> --
> 2.39.3
>


[gcc r12-10434] Fix PR 110386: backprop vs ABSU_EXPR

2024-05-09 Thread Andrew Pinski via Gcc-cvs
https://gcc.gnu.org/g:f5c7306d7f039e5c74c5e82cf06610f0ae07a0e8

commit r12-10434-gf5c7306d7f039e5c74c5e82cf06610f0ae07a0e8
Author: Andrew Pinski 
Date:   Sat Sep 23 21:53:09 2023 -0700

Fix PR 110386: backprop vs ABSU_EXPR

The issue here is that when backprop tries to go
and strip sign ops, it skips over ABSU_EXPR, but
ABSU_EXPR not only does an ABS, it also changes the
type to unsigned.
Since strip_sign_op_1 is only supposed to strip off
sign-changing operands and not ones that change types,
removing ABSU_EXPR here is correct. We don't handle
nop conversions, so this does not cause any missed optimizations either.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/110386

gcc/ChangeLog:

* gimple-ssa-backprop.cc (strip_sign_op_1): Remove ABSU_EXPR.

gcc/testsuite/ChangeLog:

* gcc.c-torture/compile/pr110386-1.c: New test.
* gcc.c-torture/compile/pr110386-2.c: New test.

(cherry picked from commit 2bbac12ea7bd8a3eef5382e1b13f6019df4ec03f)

Diff:
---
 gcc/gimple-ssa-backprop.cc   |  1 -
 gcc/testsuite/gcc.c-torture/compile/pr110386-1.c |  9 +
 gcc/testsuite/gcc.c-torture/compile/pr110386-2.c | 11 +++
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/gcc/gimple-ssa-backprop.cc b/gcc/gimple-ssa-backprop.cc
index 74f981112567..68ea403e847f 100644
--- a/gcc/gimple-ssa-backprop.cc
+++ b/gcc/gimple-ssa-backprop.cc
@@ -688,7 +688,6 @@ strip_sign_op_1 (tree rhs)
 switch (gimple_assign_rhs_code (assign))
   {
   case ABS_EXPR:
-  case ABSU_EXPR:
   case NEGATE_EXPR:
return gimple_assign_rhs1 (assign);
 
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr110386-1.c b/gcc/testsuite/gcc.c-torture/compile/pr110386-1.c
new file mode 100644
index ..4fcc977ad16f
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr110386-1.c
@@ -0,0 +1,9 @@
+
+int f(int a)
+{
+int c = c < 0 ? c : -c;
+c = -c;
+unsigned b =  c;
+unsigned t = b*a;
+return t*t;
+}
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr110386-2.c b/gcc/testsuite/gcc.c-torture/compile/pr110386-2.c
new file mode 100644
index ..c60e1b6994b7
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr110386-2.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-options "-mavx" } */
+
+#include 
+
+__m128i do_stuff(__m128i XMM0) {
+   __m128i ABS0 = _mm_abs_epi32(XMM0);
+   __m128i MUL0 = _mm_mullo_epi32(ABS0, XMM0);
+   __m128i MUL1 = _mm_mullo_epi32(MUL0, MUL0);
+   return MUL1;
+}


Re: Ping [PATCH/RFC] target, hooks: Allow a target to trap on unreachable [PR109267].

2024-05-08 Thread Andrew Pinski
On Wed, May 8, 2024 at 12:37 PM Iain Sandoe  wrote:
>
> Hi Folks,
>
> I’d like to land a viable solution to this issue if possible (it is a show-stopper
> for the aarch64-darwin development branch).
>
> > On 9 Apr 2024, at 14:55, Iain Sandoe  wrote:
> >
> > So far, tested lightly on aarch64-darwin; if this is acceptable then
> > it will be possible to back out of the ad hoc fixes used on x86 and
> > powerpc darwin.
> > Comments welcome, thanks,
>
> @Andrew - you were also (at one stage) talking about some ideas about
> how to handle this in the middle end.
> Is that something you are likely to have time to do?
> Would it still be reasonable to have a target hook to control the behaviour?
> (the implementation below allows one to make the effect per TU)

I won't be able to implement the idea until July at the earliest, though.

Thanks,
Andrew

>
>
> > Iain
> >
> > --- 8< ---
> >
> >
> > In the case cited in the PR, a target linker cannot handle empty FDEs;
> > arguably this is a linker bug, but in some cases we might still
> > wish to work around it.
> >
> > In the case of Darwin, the ABI does not allow two global symbols
> > to have the same address, so emitting empty functions has the
> > potential (almost a guarantee) to break the ABI.
> >
> > This patch allows a target to ask that __builtin_unreachable is
> > expanded in the same way as __builtin_trap (either to a trap
> > instruction or to abort() if there is no such insn).
> >
> > This means that the middle end's use of unreachability for
> > optimisation should not be altered.
> >
> > __builtin_unreachable is currently expanded to a barrier and
> > __builtin_trap is expanded to a trap insn + a barrier, so it
> > seems we should not be unduly affecting RTL optimisations.
> >
> > For Darwin, we enable this by default, but allow it to be disabled
> > per TU using -mno-unreachable-traps.
> >
> >   PR middle-end/109267
> >
> > gcc/ChangeLog:
> >
> >   * builtins.cc (expand_builtin_unreachable): Allow for
> >   a target to expand this as a trap.
> >   * config/darwin-protos.h (darwin_unreachable_traps_p): New.
> >   * config/darwin.cc (darwin_unreachable_traps_p): New.
> >   * config/darwin.h (TARGET_UNREACHABLE_SHOULD_TRAP): New.
> >   * config/darwin.opt (munreachable-traps): New.
> >   * doc/invoke.texi: Document -munreachable-traps.
> >   * doc/tm.texi: Regenerate.
> >   * doc/tm.texi.in: Document TARGET_UNREACHABLE_SHOULD_TRAP.
> >   * target.def (TARGET_UNREACHABLE_SHOULD_TRAP): New hook.
> >
> > Signed-off-by: Iain Sandoe 
> > ---
> > gcc/builtins.cc|  7 +++
> > gcc/config/darwin-protos.h |  1 +
> > gcc/config/darwin.cc   |  7 +++
> > gcc/config/darwin.h|  4 
> > gcc/config/darwin.opt  |  4 
> > gcc/doc/invoke.texi|  7 ++-
> > gcc/doc/tm.texi|  5 +
> > gcc/doc/tm.texi.in |  2 ++
> > gcc/target.def | 10 ++
> > 9 files changed, 46 insertions(+), 1 deletion(-)
> >
> > diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> > index f8d94c4b435..13f321b6be6 100644
> > --- a/gcc/builtins.cc
> > +++ b/gcc/builtins.cc
> > @@ -5929,6 +5929,13 @@ expand_builtin_trap (void)
> > static void
> > expand_builtin_unreachable (void)
> > {
> > +  /* If the target wants a trap in place of the fall-through, use that.  */
> > +  if (targetm.unreachable_should_trap ())
> > +{
> > +  expand_builtin_trap ();
> > +  return;
> > +}
> > +
> >   /* Use gimple_build_builtin_unreachable or builtin_decl_unreachable
> >  to avoid this.  */
> >   gcc_checking_assert (!sanitize_flags_p (SANITIZE_UNREACHABLE));
> > diff --git a/gcc/config/darwin-protos.h b/gcc/config/darwin-protos.h
> > index b67e05264e1..48a32b2ccc2 100644
> > --- a/gcc/config/darwin-protos.h
> > +++ b/gcc/config/darwin-protos.h
> > extern void darwin_enter_string_into_cfstring_table (tree);
> > extern void darwin_asm_output_anchor (rtx symbol);
> > extern bool darwin_use_anchors_for_symbol_p (const_rtx symbol);
> > extern bool darwin_kextabi_p (void);
> > +extern bool darwin_unreachable_traps_p (void);
> > extern void darwin_override_options (void);
> > extern void darwin_patch_builtins (void);
> > extern void darwin_rename_builtins (void);
> > diff --git a/gcc/config/darwin.cc b/gcc/config/darwin.cc
> > index dcfccb4952a..018547d09c6 100644
> > --- a/gcc/config/darwin.cc
> > +++ b/gcc/config/darwin.cc
> > @@ -3339,6 +3339,13 @@ darwin_kextabi_p (void) {
> >   return flag_apple_kext;
> > }
> >
> > +/* True, iff we want to map __builtin_unreachable to a trap.  */
> > +
> > +bool
> > +darwin_unreachable_traps_p (void) {
> > +  return darwin_unreachable_traps;
> > +}
> > +
> > void
> > darwin_override_options (void)
> > {
> > diff --git a/gcc/config/darwin.h b/gcc/config/darwin.h
> > index d335ffe7345..17f41cf30ef 100644
> > --- a/gcc/config/darwin.h
> > +++ b/gcc/config/darwin.h
> > @@ -1225,6 +1225,10 @@ void add_framework_path (char *);
> > 
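The proposed control flow is small enough to model in C. Everything below is a hypothetical stand-in, not GCC internals: the enum names what expand_builtin_unreachable would emit, and the struct plays the role of `targetm` with the new `unreachable_should_trap` hook. The generic default returning false is an assumption (the target.def hunk is not shown here); the Darwin variant follows the patch description, on by default and cleared by -mno-unreachable-traps.

```c
#include <assert.h>
#include <stdbool.h>

/* What the expander emits for __builtin_unreachable in this model.  */
enum unreachable_expansion
{
  EXPAND_BARRIER,          /* Existing behaviour: a barrier only.  */
  EXPAND_TRAP_AND_BARRIER  /* As for __builtin_trap: trap insn (or abort call) + barrier.  */
};

/* Stand-in for targetm, holding only the proposed hook.  */
struct target_model
{
  bool (*unreachable_should_trap) (void);
};

/* Assumed generic default.  */
static bool hook_false (void) { return false; }

/* Models the Darwin option variable; -mno-unreachable-traps would clear it.  */
static bool model_darwin_unreachable_traps = true;
static bool model_darwin_hook (void) { return model_darwin_unreachable_traps; }

static enum unreachable_expansion
model_expand_unreachable (const struct target_model *t)
{
  assert (t->unreachable_should_trap != 0);
  if (t->unreachable_should_trap ())
    return EXPAND_TRAP_AND_BARRIER;
  return EXPAND_BARRIER;
}
```

Because the decision is made only at expansion time, GIMPLE-level uses of unreachability for optimisation are unaffected, which matches the point made in the patch description.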

Re: [COMMITTED] warn-access: Fix handling of unnamed types [PR109804]

2024-05-08 Thread Andrew Pinski
On Thu, Feb 22, 2024 at 9:28 AM Andrew Pinski  wrote:
>
> This looks like an oversight of handling DEMANGLE_COMPONENT_UNNAMED_TYPE.
> DEMANGLE_COMPONENT_UNNAMED_TYPE only has the u.s_number.number set while
> the code expected newc.u.s_binary.left to be valid.
> So this treats DEMANGLE_COMPONENT_UNNAMED_TYPE like we treat function parameters
> (DEMANGLE_COMPONENT_FUNCTION_PARAM) and template parameters (DEMANGLE_COMPONENT_TEMPLATE_PARAM).
>
> Note the code in the demangler does this when it sets DEMANGLE_COMPONENT_UNNAMED_TYPE:
>   ret->type = DEMANGLE_COMPONENT_UNNAMED_TYPE;
>   ret->u.s_number.number = num;
>
> Committed as obvious after bootstrap/test on x86_64-linux-gnu
> Will commit to other branches in a few days.

Now committed (with the testcase fix backported too) to the GCC 12 branch.

Thanks,
Andrew Pinski

>
> PR tree-optimization/109804
>
> gcc/ChangeLog:
>
> * gimple-ssa-warn-access.cc (new_delete_mismatch_p): Handle
> DEMANGLE_COMPONENT_UNNAMED_TYPE.
>
> gcc/testsuite/ChangeLog:
>
>     * g++.dg/warn/Wmismatched-new-delete-8.C: New test.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/gimple-ssa-warn-access.cc |  1 +
>  .../g++.dg/warn/Wmismatched-new-delete-8.C| 42 +++
>  2 files changed, 43 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/warn/Wmismatched-new-delete-8.C
>
> diff --git a/gcc/gimple-ssa-warn-access.cc b/gcc/gimple-ssa-warn-access.cc
> index cd083ab2237..dedaae27b31 100644
> --- a/gcc/gimple-ssa-warn-access.cc
> +++ b/gcc/gimple-ssa-warn-access.cc
> @@ -1701,6 +1701,7 @@ new_delete_mismatch_p (const demangle_component ,
>
>  case DEMANGLE_COMPONENT_FUNCTION_PARAM:
>  case DEMANGLE_COMPONENT_TEMPLATE_PARAM:
> +case DEMANGLE_COMPONENT_UNNAMED_TYPE:
>return newc.u.s_number.number != delc.u.s_number.number;
>
>  case DEMANGLE_COMPONENT_CHARACTER:
> diff --git a/gcc/testsuite/g++.dg/warn/Wmismatched-new-delete-8.C b/gcc/testsuite/g++.dg/warn/Wmismatched-new-delete-8.C
> new file mode 100644
> index 000..0ddc056c6df
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/warn/Wmismatched-new-delete-8.C
> @@ -0,0 +1,42 @@
> +/* PR tree-optimization/109804 */
> +/* { dg-do compile { target c++11 } } */
> +/* { dg-options "-Wall" } */
> +
> +/* Here we used to ICE in new_delete_mismatch_p because
> +   we didn't handle unnamed types from the demangler (DEMANGLE_COMPONENT_UNNAMED_TYPE). */
> +
> +template 
> +static inline T * construct_at(void *at, ARGS && args)
> +{
> + struct Placeable : T
> + {
> +  Placeable(ARGS && args) : T(args) { }
> +  void * operator new (long unsigned int, void *ptr) { return ptr; }
> +  void operator delete (void *, void *) { }
> + };
> + return new (at) Placeable(static_cast(args));
> +}
> +template 
> +struct Reconstructible
> +{
> +  char _space[sizeof(MT)];
> +  Reconstructible() { }
> +};
> +template 
> +struct Constructible : Reconstructible
> +{
> + Constructible(){}
> +};
> +struct A { };
> +struct B
> +{
> + Constructible a { };
> + B(int) { }
> +};
> +Constructible b { };
> +void f()
> +{
> +  enum { ENUM_A = 1 };
> +  enum { ENUM_B = 1 };
> +  construct_at(b._space, ENUM_B);
> +}
> --
> 2.43.0
>
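The bug is a tagged-union mismatch: each demangle component kind uses a different member of the union, and reading the wrong member yields garbage. A reduced, hypothetical model of the fixed comparison; the type and member names only loosely mirror libiberty's `struct demangle_component` and are not its real definition:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_kind { KIND_FUNCTION_PARAM, KIND_TEMPLATE_PARAM, KIND_UNNAMED_TYPE, KIND_BINARY };

struct model_component
{
  enum model_kind type;
  union
  {
    struct { long number; } s_number;  /* Used by the *_PARAM and UNNAMED_TYPE kinds.  */
    struct { const struct model_component *left, *right; } s_binary;
  } u;
};

/* Return true if the two components differ.  Kinds that carry only a
   number must be compared through u.s_number; the bug was, roughly, that
   UNNAMED_TYPE was handled by code that read u.s_binary.left instead.  */
static bool model_mismatch_p (const struct model_component *a,
                              const struct model_component *b)
{
  if (a == NULL || b == NULL)
    return a != b;
  if (a->type != b->type)
    return true;
  switch (a->type)
    {
    case KIND_FUNCTION_PARAM:
    case KIND_TEMPLATE_PARAM:
    case KIND_UNNAMED_TYPE:
      return a->u.s_number.number != b->u.s_number.number;
    case KIND_BINARY:
      return model_mismatch_p (a->u.s_binary.left, b->u.s_binary.left)
             || model_mismatch_p (a->u.s_binary.right, b->u.s_binary.right);
    }
  return true;
}
```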


[gcc r12-10433] testsuite: fix Wmismatched-new-delete-8.C with -m32

2024-05-08 Thread Andrew Pinski via Gcc-cvs
https://gcc.gnu.org/g:58d11bfc27d5412619c0919738158a4c05cca2cf

commit r12-10433-g58d11bfc27d5412619c0919738158a4c05cca2cf
Author: Marek Polacek 
Date:   Thu Feb 22 18:52:32 2024 -0500

testsuite: fix Wmismatched-new-delete-8.C with -m32

This fixes
error: 'operator new' takes type 'size_t' ('unsigned int') as first parameter [-fpermissive]

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wmismatched-new-delete-8.C: Use __SIZE_TYPE__.

(cherry picked from commit d34d7c74d51d365a3a4ddcd4383fc7c9f29020a1)

Diff:
---
 gcc/testsuite/g++.dg/warn/Wmismatched-new-delete-8.C | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/g++.dg/warn/Wmismatched-new-delete-8.C b/gcc/testsuite/g++.dg/warn/Wmismatched-new-delete-8.C
index 0ddc056c6df2..e8fd7a85b8c9 100644
--- a/gcc/testsuite/g++.dg/warn/Wmismatched-new-delete-8.C
+++ b/gcc/testsuite/g++.dg/warn/Wmismatched-new-delete-8.C
@@ -11,7 +11,7 @@ static inline T * construct_at(void *at, ARGS && args)
  struct Placeable : T
  {
   Placeable(ARGS && args) : T(args) { }
-  void * operator new (long unsigned int, void *ptr) { return ptr; }
+  void * operator new (__SIZE_TYPE__, void *ptr) { return ptr; }
   void operator delete (void *, void *) { }
  };
  return new (at) Placeable(static_cast(args));
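The -m32 failure comes from hard-coding `long unsigned int` where `operator new` requires `size_t`; on ILP32 targets such as -m32 x86, size_t is `unsigned int`, as the error message shows. GCC predefines `__SIZE_TYPE__` as the type size_t is defined as, which is why the fix uses it. A C11 sketch using `_Generic` to check which spelling matches size_t on the current target; the macro and function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* 1 if EXPR has exactly the type size_t, 0 otherwise.  */
#define HAS_SIZE_T_TYPE(expr) _Generic ((expr), size_t: 1, default: 0)

/* Always 1 with GCC: __SIZE_TYPE__ names the type behind size_t.  */
static int size_type_macro_matches (void)
{
  __SIZE_TYPE__ n = 0;
  return HAS_SIZE_T_TYPE (n);
}

/* 1 on LP64 targets such as x86_64, but 0 with -m32, where size_t is
   unsigned int: the mismatch the original test ran into.  */
static int unsigned_long_matches (void)
{
  unsigned long n = 0;
  return HAS_SIZE_T_TYPE (n);
}
```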


[gcc r12-10432] warn-access: Fix handling of unnamed types [PR109804]

2024-05-08 Thread Andrew Pinski via Gcc-cvs
https://gcc.gnu.org/g:16319f8fba6c049d743046488588f40da2349048

commit r12-10432-g16319f8fba6c049d743046488588f40da2349048
Author: Andrew Pinski 
Date:   Wed Feb 21 20:12:21 2024 -0800

warn-access: Fix handling of unnamed types [PR109804]

This looks like an oversight of handling DEMANGLE_COMPONENT_UNNAMED_TYPE.
DEMANGLE_COMPONENT_UNNAMED_TYPE only has the u.s_number.number set while
the code expected newc.u.s_binary.left to be valid.
So this treats DEMANGLE_COMPONENT_UNNAMED_TYPE like we treat function parameters
(DEMANGLE_COMPONENT_FUNCTION_PARAM) and template parameters (DEMANGLE_COMPONENT_TEMPLATE_PARAM).

Note the code in the demangler does this when it sets DEMANGLE_COMPONENT_UNNAMED_TYPE:
  ret->type = DEMANGLE_COMPONENT_UNNAMED_TYPE;
  ret->u.s_number.number = num;

Committed as obvious after bootstrap/test on x86_64-linux-gnu

PR tree-optimization/109804

gcc/ChangeLog:

* gimple-ssa-warn-access.cc (new_delete_mismatch_p): Handle
DEMANGLE_COMPONENT_UNNAMED_TYPE.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wmismatched-new-delete-8.C: New test.

Signed-off-by: Andrew Pinski 
(cherry picked from commit 1076ffda6ce5e6d5fc9577deaf8233e549e5787a)

Diff:
---
 gcc/gimple-ssa-warn-access.cc  |  1 +
 .../g++.dg/warn/Wmismatched-new-delete-8.C | 42 ++
 2 files changed, 43 insertions(+)

diff --git a/gcc/gimple-ssa-warn-access.cc b/gcc/gimple-ssa-warn-access.cc
index 8d088ad33f2f..e70a6f1fb877 100644
--- a/gcc/gimple-ssa-warn-access.cc
+++ b/gcc/gimple-ssa-warn-access.cc
@@ -1688,6 +1688,7 @@ new_delete_mismatch_p (const demangle_component ,
 
 case DEMANGLE_COMPONENT_FUNCTION_PARAM:
 case DEMANGLE_COMPONENT_TEMPLATE_PARAM:
+case DEMANGLE_COMPONENT_UNNAMED_TYPE:
   return newc.u.s_number.number != delc.u.s_number.number;
 
 case DEMANGLE_COMPONENT_CHARACTER:
diff --git a/gcc/testsuite/g++.dg/warn/Wmismatched-new-delete-8.C b/gcc/testsuite/g++.dg/warn/Wmismatched-new-delete-8.C
new file mode 100644
index ..0ddc056c6df2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wmismatched-new-delete-8.C
@@ -0,0 +1,42 @@
+/* PR tree-optimization/109804 */
+/* { dg-do compile { target c++11 } } */
+/* { dg-options "-Wall" } */
+
+/* Here we used to ICE in new_delete_mismatch_p because
+   we didn't handle unnamed types from the demangler (DEMANGLE_COMPONENT_UNNAMED_TYPE). */
+
+template <typename T, typename ARGS>
+static inline T * construct_at(void *at, ARGS && args)
+{
+ struct Placeable : T
+ {
+  Placeable(ARGS && args) : T(args) { }
+  void * operator new (long unsigned int, void *ptr) { return ptr; }
+  void operator delete (void *, void *) { }
+ };
+ return new (at) Placeable(static_cast<ARGS&&>(args));
+}
+template <typename MT>
+struct Reconstructible
+{
+  char _space[sizeof(MT)];
+  Reconstructible() { }
+};
+template <typename MT>
+struct Constructible : Reconstructible<MT>
+{
+ Constructible(){}
+};
+struct A { };
+struct B
+{
+ Constructible<A> a { };
+ B(int) { }
+};
+Constructible<B> b { };
+void f()
+{
+  enum { ENUM_A = 1 };
+  enum { ENUM_B = 1 };
+  construct_at<B>(b._space, ENUM_B);
+}


[gcc r12-10431] Fix PR 111331: wrong code for `a > 28 ? MIN : 29`

2024-05-08 Thread Andrew Pinski via Gcc-cvs
https://gcc.gnu.org/g:39d56b96996dd8336652ceac97983c26fd8de4c6

commit r12-10431-g39d56b96996dd8336652ceac97983c26fd8de4c6
Author: Andrew Pinski 
Date:   Thu Sep 7 22:13:31 2023 -0700

Fix PR 111331: wrong code for `a > 28 ? MIN : 29`

The problem here is that after r6-7425-ga9fee7cdc3c62d0e51730,
the comparison that checks whether the transformation can be done
used the wrong value. Instead of checking whether the inner value was
LE (for MIN) or GE (for MAX) the outer value, it compared the inner
value to the value used in the comparison, which was wrong.

Committed to GCC 13 branch after bootstrap and test on
x86_64-linux-gnu.

gcc/ChangeLog:

PR tree-optimization/111331
* tree-ssa-phiopt.cc (minmax_replacement):
Fix the LE/GE comparison for the
`(a CMP CST1) ? max : a` optimization.

gcc/testsuite/ChangeLog:

PR tree-optimization/111331
* gcc.c-torture/execute/pr111331-1.c: New test.
* gcc.c-torture/execute/pr111331-2.c: New test.
* gcc.c-torture/execute/pr111331-3.c: New test.

(cherry picked from commit 30e6ee074588bacefd2dfe745b188bb20c81fe5e)

Diff:
---
 gcc/testsuite/gcc.c-torture/execute/pr111331-1.c | 17 +
 gcc/testsuite/gcc.c-torture/execute/pr111331-2.c | 19 +++
 gcc/testsuite/gcc.c-torture/execute/pr111331-3.c | 15 +++
 gcc/tree-ssa-phiopt.cc   |  8 
 4 files changed, 55 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.c-torture/execute/pr111331-1.c b/gcc/testsuite/gcc.c-torture/execute/pr111331-1.c
new file mode 100644
index ..4c7f4fdbaa9d
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr111331-1.c
@@ -0,0 +1,17 @@
+int a;
+int b;
+int c(int d, int e, int f) {
+  if (d < e)
+return e;
+  if (d > f)
+return f;
+  return d;
+}
+int main() {
+  int g = -1;
+  a = c(b + 30, 29, g + 29);
+  volatile t = a;
+  if (t != 28)
+__builtin_abort();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr111331-2.c b/gcc/testsuite/gcc.c-torture/execute/pr111331-2.c
new file mode 100644
index ..5c677f2caa9f
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr111331-2.c
@@ -0,0 +1,19 @@
+
+int a;
+int b;
+
+int main() {
+  int d = b+30;
+  {
+int t;
+if (d < 29)
+  t =  29;
+else
+  t = (d > 28) ? 28 : d;
+a = t;
+  }
+  volatile int t = a;
+  if (a != 28)
+__builtin_abort();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr111331-3.c b/gcc/testsuite/gcc.c-torture/execute/pr111331-3.c
new file mode 100644
index ..213d9bdd539d
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr111331-3.c
@@ -0,0 +1,15 @@
+int a;
+int b;
+
+int main() {
+  int d = b+30;
+  {
+int t;
+t = d < 29 ? 29 : ((d > 28) ? 28 : d);
+a = t;
+  }
+  volatile int t = a;
+  if (a != 28)
+__builtin_abort();
+  return 0;
+}
diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index c56d0b9ff151..e2dba56383b4 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -2014,7 +2014,7 @@ minmax_replacement (basic_block cond_bb, basic_block middle_bb,
 
  /* We need BOUND <= LARGER.  */
  if (!integer_nonzerop (fold_build2 (LE_EXPR, boolean_type_node,
- bound, larger)))
+ bound, arg_false)))
return false;
}
  else if (operand_equal_for_phi_arg_p (arg_false, smaller)
@@ -2045,7 +2045,7 @@ minmax_replacement (basic_block cond_bb, basic_block middle_bb,
 
  /* We need BOUND >= SMALLER.  */
  if (!integer_nonzerop (fold_build2 (GE_EXPR, boolean_type_node,
- bound, smaller)))
+ bound, arg_false)))
return false;
}
  else
@@ -2085,7 +2085,7 @@ minmax_replacement (basic_block cond_bb, basic_block middle_bb,
 
  /* We need BOUND >= LARGER.  */
  if (!integer_nonzerop (fold_build2 (GE_EXPR, boolean_type_node,
- bound, larger)))
+ bound, arg_true)))
return false;
}
  else if (operand_equal_for_phi_arg_p (arg_true, smaller)
@@ -2112,7 +2112,7 @@ minmax_replacement (basic_block cond_bb, basic_block middle_bb,
 
  /* We need BOUND <= SMALLER.  */
  if (!integer_nonzerop (fold_build2 (LE_EXPR, boolean_type_node,
- bound, smaller)))
+ bound, arg_true)))
return false;
}
  else


[gcc r11-11421] Fix PR 111331: wrong code for `a > 28 ? MIN : 29`

2024-05-08 Thread Andrew Pinski via Gcc-cvs
https://gcc.gnu.org/g:16e27b6d03756bf1fae22607fa93107787a7b9cb

commit r11-11421-g16e27b6d03756bf1fae22607fa93107787a7b9cb
Author: Andrew Pinski 
Date:   Thu Sep 7 22:13:31 2023 -0700

Fix PR 111331: wrong code for `a > 28 ? MIN : 29`

The problem here is that after r6-7425-ga9fee7cdc3c62d0e51730,
the comparison that checks whether the transformation can be done
used the wrong value. Instead of checking whether the inner value was
LE (for MIN) or GE (for MAX) the outer value, it compared the inner
value to the value used in the comparison, which was wrong.

Committed to GCC 13 branch after bootstrap and test on
x86_64-linux-gnu.

gcc/ChangeLog:

PR tree-optimization/111331
* tree-ssa-phiopt.c (minmax_replacement):
Fix the LE/GE comparison for the
`(a CMP CST1) ? max : a` optimization.

gcc/testsuite/ChangeLog:

PR tree-optimization/111331
* gcc.c-torture/execute/pr111331-1.c: New test.
* gcc.c-torture/execute/pr111331-2.c: New test.
* gcc.c-torture/execute/pr111331-3.c: New test.

(cherry picked from commit 30e6ee074588bacefd2dfe745b188bb20c81fe5e)

Diff:
---
 gcc/testsuite/gcc.c-torture/execute/pr111331-1.c | 17 +
 gcc/testsuite/gcc.c-torture/execute/pr111331-2.c | 19 +++
 gcc/testsuite/gcc.c-torture/execute/pr111331-3.c | 15 +++
 gcc/tree-ssa-phiopt.c|  8 
 4 files changed, 55 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.c-torture/execute/pr111331-1.c b/gcc/testsuite/gcc.c-torture/execute/pr111331-1.c
new file mode 100644
index ..4c7f4fdbaa9d
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr111331-1.c
@@ -0,0 +1,17 @@
+int a;
+int b;
+int c(int d, int e, int f) {
+  if (d < e)
+return e;
+  if (d > f)
+return f;
+  return d;
+}
+int main() {
+  int g = -1;
+  a = c(b + 30, 29, g + 29);
+  volatile t = a;
+  if (t != 28)
+__builtin_abort();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr111331-2.c b/gcc/testsuite/gcc.c-torture/execute/pr111331-2.c
new file mode 100644
index ..5c677f2caa9f
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr111331-2.c
@@ -0,0 +1,19 @@
+
+int a;
+int b;
+
+int main() {
+  int d = b+30;
+  {
+int t;
+if (d < 29)
+  t =  29;
+else
+  t = (d > 28) ? 28 : d;
+a = t;
+  }
+  volatile int t = a;
+  if (a != 28)
+__builtin_abort();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr111331-3.c b/gcc/testsuite/gcc.c-torture/execute/pr111331-3.c
new file mode 100644
index ..213d9bdd539d
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr111331-3.c
@@ -0,0 +1,15 @@
+int a;
+int b;
+
+int main() {
+  int d = b+30;
+  {
+int t;
+t = d < 29 ? 29 : ((d > 28) ? 28 : d);
+a = t;
+  }
+  volatile int t = a;
+  if (a != 28)
+__builtin_abort();
+  return 0;
+}
diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
index 5831a7764a49..d26d7889d952 100644
--- a/gcc/tree-ssa-phiopt.c
+++ b/gcc/tree-ssa-phiopt.c
@@ -1676,7 +1676,7 @@ minmax_replacement (basic_block cond_bb, basic_block middle_bb,
 
  /* We need BOUND <= LARGER.  */
  if (!integer_nonzerop (fold_build2 (LE_EXPR, boolean_type_node,
- bound, larger)))
+ bound, arg_false)))
return false;
}
  else if (operand_equal_for_phi_arg_p (arg_false, smaller)
@@ -1707,7 +1707,7 @@ minmax_replacement (basic_block cond_bb, basic_block middle_bb,
 
  /* We need BOUND >= SMALLER.  */
  if (!integer_nonzerop (fold_build2 (GE_EXPR, boolean_type_node,
- bound, smaller)))
+ bound, arg_false)))
return false;
}
  else
@@ -1747,7 +1747,7 @@ minmax_replacement (basic_block cond_bb, basic_block middle_bb,
 
  /* We need BOUND >= LARGER.  */
  if (!integer_nonzerop (fold_build2 (GE_EXPR, boolean_type_node,
- bound, larger)))
+ bound, arg_true)))
return false;
}
  else if (operand_equal_for_phi_arg_p (arg_true, smaller)
@@ -1774,7 +1774,7 @@ minmax_replacement (basic_block cond_bb, basic_block middle_bb,
 
  /* We need BOUND <= SMALLER.  */
  if (!integer_nonzerop (fold_build2 (LE_EXPR, boolean_type_node,
- bound, smaller)))
+ bound, arg_true)))
return false;
}
  else


Re: [COMMITTED/13] Fix PR 111331: wrong code for `a > 28 ? MIN : 29`

2024-05-08 Thread Andrew Pinski
On Sun, Oct 1, 2023 at 1:23 PM Andrew Pinski  wrote:
>
> From: Andrew Pinski 
>
> The problem here is that after r6-7425-ga9fee7cdc3c62d0e51730,
> the comparison that checks whether the transformation can be done
> used the wrong value. Instead of checking whether the inner value was
> LE (for MIN) or GE (for MAX) the outer value, it compared the inner
> value to the value used in the comparison, which was wrong.
>
> Committed to GCC 13 branch after bootstrap and test on x86_64-linux-gnu.

Committed also to GCC 12 and 11 branches.

>
> gcc/ChangeLog:
>
> PR tree-optimization/111331
> * tree-ssa-phiopt.cc (minmax_replacement):
> Fix the LE/GE comparison for the
> `(a CMP CST1) ? max : a` optimization.
>
> gcc/testsuite/ChangeLog:
>
> PR tree-optimization/111331
> * gcc.c-torture/execute/pr111331-1.c: New test.
> * gcc.c-torture/execute/pr111331-2.c: New test.
> * gcc.c-torture/execute/pr111331-3.c: New test.
>
> (cherry picked from commit 30e6ee074588bacefd2dfe745b188bb20c81fe5e)
> ---
>  .../gcc.c-torture/execute/pr111331-1.c| 17 +
>  .../gcc.c-torture/execute/pr111331-2.c| 19 +++
>  .../gcc.c-torture/execute/pr111331-3.c| 15 +++
>  gcc/tree-ssa-phiopt.cc|  8 
>  4 files changed, 55 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr111331-1.c
>  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr111331-2.c
>  create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr111331-3.c
>
> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr111331-1.c b/gcc/testsuite/gcc.c-torture/execute/pr111331-1.c
> new file mode 100644
> index 000..4c7f4fdbaa9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/pr111331-1.c
> @@ -0,0 +1,17 @@
> +int a;
> +int b;
> +int c(int d, int e, int f) {
> +  if (d < e)
> +return e;
> +  if (d > f)
> +return f;
> +  return d;
> +}
> +int main() {
> +  int g = -1;
> +  a = c(b + 30, 29, g + 29);
> +  volatile t = a;
> +  if (t != 28)
> +__builtin_abort();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr111331-2.c b/gcc/testsuite/gcc.c-torture/execute/pr111331-2.c
> new file mode 100644
> index 000..5c677f2caa9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/pr111331-2.c
> @@ -0,0 +1,19 @@
> +
> +int a;
> +int b;
> +
> +int main() {
> +  int d = b+30;
> +  {
> +int t;
> +if (d < 29)
> +  t =  29;
> +else
> +  t = (d > 28) ? 28 : d;
> +a = t;
> +  }
> +  volatile int t = a;
> +  if (a != 28)
> +__builtin_abort();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr111331-3.c b/gcc/testsuite/gcc.c-torture/execute/pr111331-3.c
> new file mode 100644
> index 000..213d9bdd539
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/pr111331-3.c
> @@ -0,0 +1,15 @@
> +int a;
> +int b;
> +
> +int main() {
> +  int d = b+30;
> +  {
> +int t;
> +t = d < 29 ? 29 : ((d > 28) ? 28 : d);
> +a = t;
> +  }
> +  volatile int t = a;
> +  if (a != 28)
> +__builtin_abort();
> +  return 0;
> +}
> diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
> index a7ab6ce4ad9..c3d78d1400b 100644
> --- a/gcc/tree-ssa-phiopt.cc
> +++ b/gcc/tree-ssa-phiopt.cc
> @@ -2270,7 +2270,7 @@ minmax_replacement (basic_block cond_bb, basic_block middle_bb, basic_block alt_
>
>   /* We need BOUND <= LARGER.  */
>   if (!integer_nonzerop (fold_build2 (LE_EXPR, boolean_type_node,
> - bound, larger)))
> + bound, arg_false)))
> return false;
> }
>   else if (operand_equal_for_phi_arg_p (arg_false, smaller)
> @@ -2301,7 +2301,7 @@ minmax_replacement (basic_block cond_bb, basic_block middle_bb, basic_block alt_
>
>   /* We need BOUND >= SMALLER.  */
>   if (!integer_nonzerop (fold_build2 (GE_EXPR, boolean_type_node,
> - bound, smaller)))
> + bound, arg_false)))
> return false;
> }
>   else
> @@ -2341,7 +2341,7 @@ minmax_replacement (basic_block cond_bb, basic_block middle_bb, basic_block alt_
>
>   /* We need BOUND >= LARGER.  */
>   if (!integer_nonzerop (fold_build

Re: [COMMITTED] Fold: Fix up merge_truthop_with_opposite_arm for NaNs [PR95351]

2024-05-08 Thread Andrew Pinski
On Mon, Mar 11, 2024 at 11:41 PM Andrew Pinski (QUIC)
 wrote:
>
> > -----Original Message-----
> > From: Andrew Pinski (QUIC) 
> > Sent: Sunday, March 10, 2024 7:58 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Andrew Pinski (QUIC) 
> > Subject: [COMMITTED] Fold: Fix up merge_truthop_with_opposite_arm for
> > NaNs [PR95351]
> >
> > The problem here is that merge_truthop_with_opposite_arm would use the
> > type of the result of the comparison rather than the operands of the
> > comparison to figure out if we are honoring NaNs.
> > This fixes that oversight and now we get the correct results in this case.
> >
> > Committed as obvious after a bootstrap/test on x86_64-linux-gnu.
>
> Committed to the GCC 13 branch too.

And the GCC 12 and 11 branches too.


>
> Thanks,
> Andrew
>
> >
> >   PR middle-end/95351
> >
> > gcc/ChangeLog:
> >
> >   * fold-const.cc (merge_truthop_with_opposite_arm): Use
> >   the type of the operands of the comparison and not the type
> >   of the comparison.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.dg/float_opposite_arm-1.c: New test.
> >
> > Signed-off-by: Andrew Pinski 
> > ---
> >  gcc/fold-const.cc   |  3 ++-
> >  gcc/testsuite/gcc.dg/float_opposite_arm-1.c | 17 +
> >  2 files changed, 19 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/float_opposite_arm-1.c
> >
> > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> > index 43105d20be3..299c22bf391 100644
> > --- a/gcc/fold-const.cc
> > +++ b/gcc/fold-const.cc
> > @@ -6420,7 +6420,6 @@ static tree
> >  merge_truthop_with_opposite_arm (location_t loc, tree op, tree cmpop,
> >bool rhs_only)
> >  {
> > -  tree type = TREE_TYPE (cmpop);
> >enum tree_code code = TREE_CODE (cmpop);
> >enum tree_code truthop_code = TREE_CODE (op);
> >tree lhs = TREE_OPERAND (op, 0);
> > @@ -6436,6 +6435,8 @@ merge_truthop_with_opposite_arm (location_t loc, tree op, tree cmpop,
> >if (TREE_CODE_CLASS (code) != tcc_comparison)
> >  return NULL_TREE;
> >
> > +  tree type = TREE_TYPE (TREE_OPERAND (cmpop, 0));
> > +
> >if (rhs_code == truthop_code)
> >  {
> >    tree newrhs = merge_truthop_with_opposite_arm (loc, rhs, cmpop, rhs_only);
> > diff --git a/gcc/testsuite/gcc.dg/float_opposite_arm-1.c b/gcc/testsuite/gcc.dg/float_opposite_arm-1.c
> > new file mode 100644
> > index 000..d2dbff35066
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/float_opposite_arm-1.c
> > @@ -0,0 +1,17 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O1 -fdump-tree-original -fdump-tree-optimized" } */
> > +/* { dg-add-options ieee } */
> > +/* PR middle-end/95351 */
> > +
> > +int Foo(double possiblyNAN, double b, double c)
> > +{
> > +return (possiblyNAN <= 2.0) || ((possiblyNAN  > 2.0) && (b > c));
> > +}
> > +
> > +/* Make sure we don't remove either >/<=  */
> > +
> > +/* { dg-final { scan-tree-dump "possiblyNAN > 2.0e.0" "original" } } */
> > +/* { dg-final { scan-tree-dump "possiblyNAN_\[0-9\]+.D. > 2.0e.0" "optimized" } } */
> > +
> > +/* { dg-final { scan-tree-dump "possiblyNAN <= 2.0e.0" "original" } } */
> > +/* { dg-final { scan-tree-dump "possiblyNAN_\[0-9\]+.D. <= 2.0e.0" "optimized" } } */
> > --
> > 2.43.0
>


[gcc r11-11420] Fold: Fix up merge_truthop_with_opposite_arm for NaNs [PR95351]

2024-05-08 Thread Andrew Pinski via Gcc-cvs
https://gcc.gnu.org/g:6c00c3245e688d00dae3e928f0d03f530640caae

commit r11-11420-g6c00c3245e688d00dae3e928f0d03f530640caae
Author: Andrew Pinski 
Date:   Sun Mar 10 22:17:09 2024 +

Fold: Fix up merge_truthop_with_opposite_arm for NaNs [PR95351]

The problem here is that merge_truthop_with_opposite_arm would
use the type of the result of the comparison rather than the operands
of the comparison to figure out if we are honoring NaNs.
This fixes that oversight and now we get the correct results in this
case.

Committed as obvious after a bootstrap/test on x86_64-linux-gnu.

PR middle-end/95351

gcc/ChangeLog:

* fold-const.c (merge_truthop_with_opposite_arm): Use
the type of the operands of the comparison and not the type
of the comparison.

gcc/testsuite/ChangeLog:

* gcc.dg/float_opposite_arm-1.c: New test.

Signed-off-by: Andrew Pinski 
(cherry picked from commit 31ce2e993d09dcad1ce139a2848a28de5931056d)

Diff:
---
 gcc/fold-const.c|  3 ++-
 gcc/testsuite/gcc.dg/float_opposite_arm-1.c | 17 +
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index f4fd980dbbc8..97f77da5b93f 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -6171,7 +6171,6 @@ static tree
 merge_truthop_with_opposite_arm (location_t loc, tree op, tree cmpop,
 bool rhs_only)
 {
-  tree type = TREE_TYPE (cmpop);
   enum tree_code code = TREE_CODE (cmpop);
   enum tree_code truthop_code = TREE_CODE (op);
   tree lhs = TREE_OPERAND (op, 0);
@@ -6187,6 +6186,8 @@ merge_truthop_with_opposite_arm (location_t loc, tree op, tree cmpop,
   if (TREE_CODE_CLASS (code) != tcc_comparison)
 return NULL_TREE;
 
+  tree type = TREE_TYPE (TREE_OPERAND (cmpop, 0));
+
   if (rhs_code == truthop_code)
 {
   tree newrhs = merge_truthop_with_opposite_arm (loc, rhs, cmpop, rhs_only);
diff --git a/gcc/testsuite/gcc.dg/float_opposite_arm-1.c b/gcc/testsuite/gcc.dg/float_opposite_arm-1.c
new file mode 100644
index ..d2dbff350663
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/float_opposite_arm-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-original -fdump-tree-optimized" } */
+/* { dg-add-options ieee } */
+/* PR middle-end/95351 */
+
+int Foo(double possiblyNAN, double b, double c)
+{
+return (possiblyNAN <= 2.0) || ((possiblyNAN  > 2.0) && (b > c));
+}
+
+/* Make sure we don't remove either >/<=  */
+
+/* { dg-final { scan-tree-dump "possiblyNAN > 2.0e.0" "original" } } */
+/* { dg-final { scan-tree-dump "possiblyNAN_\[0-9\]+.D. > 2.0e.0" "optimized" } } */
+
+/* { dg-final { scan-tree-dump "possiblyNAN <= 2.0e.0" "original" } } */
+/* { dg-final { scan-tree-dump "possiblyNAN_\[0-9\]+.D. <= 2.0e.0" "optimized" } } */


[gcc r12-10430] Fold: Fix up merge_truthop_with_opposite_arm for NaNs [PR95351]

2024-05-08 Thread Andrew Pinski via Gcc-cvs
https://gcc.gnu.org/g:d88fe8210e4edc2f4ddf722ba788924452c6f6a0

commit r12-10430-gd88fe8210e4edc2f4ddf722ba788924452c6f6a0
Author: Andrew Pinski 
Date:   Sun Mar 10 22:17:09 2024 +

Fold: Fix up merge_truthop_with_opposite_arm for NaNs [PR95351]

The problem here is that merge_truthop_with_opposite_arm would
use the type of the result of the comparison rather than the operands
of the comparison to figure out if we are honoring NaNs.
This fixes that oversight and now we get the correct results in this
case.

Committed as obvious after a bootstrap/test on x86_64-linux-gnu.

PR middle-end/95351

gcc/ChangeLog:

* fold-const.cc (merge_truthop_with_opposite_arm): Use
the type of the operands of the comparison and not the type
of the comparison.

gcc/testsuite/ChangeLog:

* gcc.dg/float_opposite_arm-1.c: New test.

Signed-off-by: Andrew Pinski 
(cherry picked from commit 31ce2e993d09dcad1ce139a2848a28de5931056d)

Diff:
---
 gcc/fold-const.cc   |  3 ++-
 gcc/testsuite/gcc.dg/float_opposite_arm-1.c | 17 +
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index cd410e50d779..da96ed34a4c3 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -6188,7 +6188,6 @@ static tree
 merge_truthop_with_opposite_arm (location_t loc, tree op, tree cmpop,
 bool rhs_only)
 {
-  tree type = TREE_TYPE (cmpop);
   enum tree_code code = TREE_CODE (cmpop);
   enum tree_code truthop_code = TREE_CODE (op);
   tree lhs = TREE_OPERAND (op, 0);
@@ -6204,6 +6203,8 @@ merge_truthop_with_opposite_arm (location_t loc, tree op, tree cmpop,
   if (TREE_CODE_CLASS (code) != tcc_comparison)
 return NULL_TREE;
 
+  tree type = TREE_TYPE (TREE_OPERAND (cmpop, 0));
+
   if (rhs_code == truthop_code)
 {
   tree newrhs = merge_truthop_with_opposite_arm (loc, rhs, cmpop, rhs_only);
diff --git a/gcc/testsuite/gcc.dg/float_opposite_arm-1.c b/gcc/testsuite/gcc.dg/float_opposite_arm-1.c
new file mode 100644
index ..d2dbff350663
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/float_opposite_arm-1.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-original -fdump-tree-optimized" } */
+/* { dg-add-options ieee } */
+/* PR middle-end/95351 */
+
+int Foo(double possiblyNAN, double b, double c)
+{
+return (possiblyNAN <= 2.0) || ((possiblyNAN  > 2.0) && (b > c));
+}
+
+/* Make sure we don't remove either >/<=  */
+
+/* { dg-final { scan-tree-dump "possiblyNAN > 2.0e.0" "original" } } */
+/* { dg-final { scan-tree-dump "possiblyNAN_\[0-9\]+.D. > 2.0e.0" "optimized" } } */
+
+/* { dg-final { scan-tree-dump "possiblyNAN <= 2.0e.0" "original" } } */
+/* { dg-final { scan-tree-dump "possiblyNAN_\[0-9\]+.D. <= 2.0e.0" "optimized" } } */


[gcc r13-8728] Fix PR 110066: crash with -pg -static on riscv

2024-05-08 Thread Andrew Pinski via Gcc-cvs
https://gcc.gnu.org/g:929b0fffe4d3d836e07e5a398a8e176e65f8b2c2

commit r13-8728-g929b0fffe4d3d836e07e5a398a8e176e65f8b2c2
Author: Andrew Pinski 
Date:   Sat Jul 22 08:52:42 2023 -0700

Fix PR 110066: crash with -pg -static on riscv

The problem is that -fasynchronous-unwind-tables is on by default for riscv linux.
We need to turn it off for crt*.o because it would make __EH_FRAME_BEGIN__ point
during static linking.

This turns it off.

OK?

libgcc/ChangeLog:

* config.host (riscv*-*-linux*): Add t-crtstuff to tmake_file.
(riscv*-*-freebsd*): Likewise.
* config/riscv/t-crtstuff: New file.

(cherry picked from commit bbc1a102735c72e3c5a4dede8ab382813d12b058)

Diff:
---
 libgcc/config.host | 4 ++--
 libgcc/config/riscv/t-crtstuff | 5 +
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/libgcc/config.host b/libgcc/config.host
index 9d7212028d06..c94d69d84b7c 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -1304,12 +1304,12 @@ pru-*-*)
tm_file="$tm_file pru/pru-abi.h"
;;
 riscv*-*-linux*)
-   tmake_file="${tmake_file} riscv/t-softfp${host_address} t-softfp riscv/t-elf riscv/t-elf${host_address} t-slibgcc-libgcc"
+   tmake_file="${tmake_file} riscv/t-crtstuff riscv/t-softfp${host_address} t-softfp riscv/t-elf riscv/t-elf${host_address} t-slibgcc-libgcc"
extra_parts="$extra_parts crtbegin.o crtend.o crti.o crtn.o crtendS.o crtbeginT.o"
md_unwind_header=riscv/linux-unwind.h
;;
 riscv*-*-freebsd*)
-   tmake_file="${tmake_file} riscv/t-softfp${host_address} t-softfp riscv/t-elf riscv/t-elf${host_address} t-slibgcc-libgcc"
+   tmake_file="${tmake_file} riscv/t-crtstuff riscv/t-softfp${host_address} t-softfp riscv/t-elf riscv/t-elf${host_address} t-slibgcc-libgcc"
extra_parts="$extra_parts crtbegin.o crtend.o crti.o crtn.o crtendS.o crtbeginT.o"
;;
 riscv*-*-*)
diff --git a/libgcc/config/riscv/t-crtstuff b/libgcc/config/riscv/t-crtstuff
new file mode 100644
index ..685d11b3e66d
--- /dev/null
+++ b/libgcc/config/riscv/t-crtstuff
@@ -0,0 +1,5 @@
+# -fasynchronous-unwind-tables -funwind-tables is on by default for riscv linux
+# We turn it off for crt*.o because it would make __EH_FRAME_BEGIN__ point
+# to .eh_frame data from crtbeginT.o instead of the user-defined object
+# during static linking.
+CRTSTUFF_T_CFLAGS += -fno-asynchronous-unwind-tables -fno-unwind-tables


Re: [PATCH 1/2] Fix PR 110066: crash with -pg -static on riscv

2024-05-08 Thread Andrew Pinski
On Sat, Jul 22, 2023 at 8:36 PM Kito Cheng via Gcc-patches wrote:
>
> OK for trunk, thanks:)

I have now backported it to 13 branch.

Thanks,
Andrew


>
> Andrew Pinski via Gcc-patches wrote on Sun, Jul 23, 2023 at 09:07:
>
> > The problem is that -fasynchronous-unwind-tables is on by default for riscv linux.
> > We need to turn it off for crt*.o because it would make __EH_FRAME_BEGIN__
> > point to .eh_frame data from crtbeginT.o instead of the user-defined object
> > during static linking.
> > during static linking.
> >
> > This turns it off.
> >
> > OK?
> >
> > libgcc/ChangeLog:
> >
> > * config.host (riscv*-*-linux*): Add t-crtstuff to tmake_file.
> > (riscv*-*-freebsd*): Likewise.
> > * config/riscv/t-crtstuff: New file.
> > ---
> >  libgcc/config.host | 4 ++--
> >  libgcc/config/riscv/t-crtstuff | 5 +
> >  2 files changed, 7 insertions(+), 2 deletions(-)
> >  create mode 100644 libgcc/config/riscv/t-crtstuff
> >
> > diff --git a/libgcc/config.host b/libgcc/config.host
> > index 9d7212028d0..c94d69d84b7 100644
> > --- a/libgcc/config.host
> > +++ b/libgcc/config.host
> > @@ -1304,12 +1304,12 @@ pru-*-*)
> > tm_file="$tm_file pru/pru-abi.h"
> > ;;
> >  riscv*-*-linux*)
> > -   tmake_file="${tmake_file} riscv/t-softfp${host_address} t-softfp riscv/t-elf riscv/t-elf${host_address} t-slibgcc-libgcc"
> > +   tmake_file="${tmake_file} riscv/t-crtstuff riscv/t-softfp${host_address} t-softfp riscv/t-elf riscv/t-elf${host_address} t-slibgcc-libgcc"
> > extra_parts="$extra_parts crtbegin.o crtend.o crti.o crtn.o crtendS.o crtbeginT.o"
> > md_unwind_header=riscv/linux-unwind.h
> > ;;
> >  riscv*-*-freebsd*)
> > -   tmake_file="${tmake_file} riscv/t-softfp${host_address} t-softfp riscv/t-elf riscv/t-elf${host_address} t-slibgcc-libgcc"
> > +   tmake_file="${tmake_file} riscv/t-crtstuff riscv/t-softfp${host_address} t-softfp riscv/t-elf riscv/t-elf${host_address} t-slibgcc-libgcc"
> > extra_parts="$extra_parts crtbegin.o crtend.o crti.o crtn.o crtendS.o crtbeginT.o"
> > ;;
> >  riscv*-*-*)
> > diff --git a/libgcc/config/riscv/t-crtstuff b/libgcc/config/riscv/t-crtstuff
> > new file mode 100644
> > index 000..685d11b3e66
> > --- /dev/null
> > +++ b/libgcc/config/riscv/t-crtstuff
> > @@ -0,0 +1,5 @@
> > +# -fasynchronous-unwind-tables -funwind-tables is on by default for riscv linux
> > +# We turn it off for crt*.o because it would make __EH_FRAME_BEGIN__ point
> > +# to .eh_frame data from crtbeginT.o instead of the user-defined object
> > +# during static linking.
> > +CRTSTUFF_T_CFLAGS += -fno-asynchronous-unwind-tables -fno-unwind-tables
> > --
> > 2.39.1
> >
> >


[gcc r15-328] match: `a CMP nonnegative ? a : ABS<a>` simplified to just `ABS<a>` [PR112392]

2024-05-08 Thread Andrew Pinski via Gcc-cvs
https://gcc.gnu.org/g:5726de79e2154a16d8a045567d2cfad035f7ed19

commit r15-328-g5726de79e2154a16d8a045567d2cfad035f7ed19
Author: Andrew Pinski 
Date:   Mon May 6 23:53:41 2024 -0700

match: `a CMP nonnegative ? a : ABS<a>` simplified to just `ABS<a>` [PR112392]

We can optimize `a == nonnegative ? a : ABS<a>`, `a > nonnegative ? a : ABS<a>`
and `a >= nonnegative ? a : ABS<a>` into `ABS<a>`. This allows removal of
some extra comparisons and extra conditional moves in some cases.
I don't remember where I found this, but it is simple to add, so
let's add it.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Note there is a secondary pattern for the equality case, since either a
or the nonnegative operand could be the value returned.

PR tree-optimization/112392

gcc/ChangeLog:

* match.pd (`x CMP nonnegative ? x : ABS<x>`): New pattern;
where CMP is ==, > and >=.
(`x CMP nonnegative@y ? y : ABS<x>`): New pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/phi-opt-41.c: New test.

Signed-off-by: Andrew Pinski 

Diff:
---
 gcc/match.pd   | 15 +
 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-41.c | 34 ++
 2 files changed, 49 insertions(+)

diff --git a/gcc/match.pd b/gcc/match.pd
index 03a03c31233c..07e743ae464b 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -5876,6 +5876,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (convert (absu:utype @0)))
 @3
 
+/* X >  Positive ? X : ABS(X) -> ABS(X) */
+/* X >= Positive ? X : ABS(X) -> ABS(X) */
+/* X == Positive ? X : ABS(X) -> ABS(X) */
+(for cmp (eq gt ge)
+ (simplify
+  (cond (cmp:c @0 tree_expr_nonnegative_p@1) @0 (abs@3 @0))
+  (if (INTEGRAL_TYPE_P (type))
+   @3)))
+
+/* X == Positive ? Positive : ABS(X) -> ABS(X) */
+(simplify
+ (cond (eq:c @0 tree_expr_nonnegative_p@1) @1 (abs@3 @0))
+ (if (INTEGRAL_TYPE_P (type))
+  @3))
+
 /* (X + 1) > Y ? -X : 1 simplifies to X >= Y ? -X : 1 when
X is unsigned, as when X + 1 overflows, X is -1, so -X == 1.  */
 (simplify
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-41.c b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-41.c
new file mode 100644
index ..9774e283a7ba
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-41.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-phiopt1" } */
+/* PR tree-optimization/112392 */
+
+int feq_1(int a, unsigned char b)
+{
+  int absb = b;
+  if (a == absb)  return absb;
+  return a > 0 ? a : -a;
+}
+int feq_2(int a, unsigned char b)
+{
+  int absb = b;
+  if (a == absb)  return a;
+  return a > 0 ? a : -a;
+}
+
+int fgt(int a, unsigned char b)
+{
+  int absb = b;
+  if (a > absb)  return a;
+  return a > 0 ? a : -a;
+}
+
+int fge(int a, unsigned char b)
+{
+  int absb = b;
+  if (a >= absb)  return a;
+  return a > 0 ? a : -a;
+}
+
+
+/* { dg-final { scan-tree-dump-not "if " "phiopt1" } } */
+/* { dg-final { scan-tree-dump-times "ABS_EXPR <" 4 "phiopt1" } } */


[PATCH] match: `a CMP nonnegative ? a : ABS<a>` simplified to just `ABS<a>` [PR112392]

2024-05-07 Thread Andrew Pinski
We can optimize `a == nonnegative ? a : ABS<a>`, `a > nonnegative ? a : ABS<a>`
and `a >= nonnegative ? a : ABS<a>` into `ABS<a>`. This allows removal of
some extra comparisons and extra conditional moves in some cases.
I don't remember where I found this, but it is simple to add, so
let's add it.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Note there is a secondary pattern for the equality case, since either a
or the nonnegative operand could be the value returned.

PR tree-optimization/112392

gcc/ChangeLog:

* match.pd (`x CMP nonnegative ? x : ABS<x>`): New pattern;
where CMP is ==, > and >=.
(`x CMP nonnegative@y ? y : ABS<x>`): New pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/phi-opt-41.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/match.pd   | 15 ++
 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-41.c | 34 ++
 2 files changed, 49 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-41.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 03a03c31233..07e743ae464 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -5876,6 +5876,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (convert (absu:utype @0)))
 @3
 
+/* X >  Positive ? X : ABS(X) -> ABS(X) */
+/* X >= Positive ? X : ABS(X) -> ABS(X) */
+/* X == Positive ? X : ABS(X) -> ABS(X) */
+(for cmp (eq gt ge)
+ (simplify
+  (cond (cmp:c @0 tree_expr_nonnegative_p@1) @0 (abs@3 @0))
+  (if (INTEGRAL_TYPE_P (type))
+   @3)))
+
+/* X == Nonnegative ? Nonnegative : ABS(X) -> ABS(X) */
+(simplify
+ (cond (eq:c @0 tree_expr_nonnegative_p@1) @1 (abs@3 @0))
+ (if (INTEGRAL_TYPE_P (type))
+  @3))
+
 /* (X + 1) > Y ? -X : 1 simplifies to X >= Y ? -X : 1 when
X is unsigned, as when X + 1 overflows, X is -1, so -X == 1.  */
 (simplify
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-41.c b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-41.c
new file mode 100644
index 000..9774e283a7b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-41.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-phiopt1" } */
+/* PR tree-optimization/112392 */
+
+int feq_1(int a, unsigned char b)
+{
+  int absb = b;
+  if (a == absb)  return absb;
+  return a > 0 ? a : -a;
+}
+int feq_2(int a, unsigned char b)
+{
+  int absb = b;
+  if (a == absb)  return a;
+  return a > 0 ? a : -a;
+}
+
+int fgt(int a, unsigned char b)
+{
+  int absb = b;
+  if (a > absb)  return a;
+  return a > 0 ? a : -a;
+}
+
+int fge(int a, unsigned char b)
+{
+  int absb = b;
+  if (a >= absb)  return a;
+  return a > 0 ? a : -a;
+}
+
+
+/* { dg-final { scan-tree-dump-not "if " "phiopt1" } } */
+/* { dg-final { scan-tree-dump-times "ABS_EXPR <" 4 "phiopt1" } } */
-- 
2.43.0



[gcc r11-11419] c++/c-common: Fix convert_vector_to_array_for_subscript for qualified vector types [PR89224]

2024-05-07 Thread Andrew Pinski via Gcc-cvs
https://gcc.gnu.org/g:046aeffba336295fbdaf0e1ecf64b582d08f0aa6

commit r11-11419-g046aeffba336295fbdaf0e1ecf64b582d08f0aa6
Author: Andrew Pinski 
Date:   Tue Feb 20 13:38:28 2024 -0800

c++/c-common: Fix convert_vector_to_array_for_subscript for qualified vector types [PR89224]

After r7-987-gf17a223de829cb, accessing an element of a vector type lost
the vector's qualifiers: for `constvector[0]`, the element type of the
array the vector is viewed as did not carry `const`.
This was due to a missing build_qualified_type call for the inner type of
the vector when building the array type.
We need to add back the call to build_qualified_type so that the access has
the correct qualifiers. With that, overload resolution is correct, and so is
the handling of lvalue versus rvalue accesses.

Note that we now correctly reject the testcase gcc.dg/pr83415.c, which had
been incorrectly accepted (with only a warning) since r7-987-gf17a223de829cb.

Built and tested for aarch64-linux-gnu.

PR c++/89224

gcc/c-family/ChangeLog:

* c-common.c (convert_vector_to_array_for_subscript): Call build_qualified_type
for the inner type.

gcc/cp/ChangeLog:

* constexpr.c (cxx_eval_array_reference): Compare main variants
for the vector/array types instead of the types directly.

gcc/testsuite/ChangeLog:

* g++.dg/torture/vector-subaccess-1.C: New test.
* gcc.dg/pr83415.c: Change warning to error.

Signed-off-by: Andrew Pinski 
(cherry picked from commit 4421d35167b3083e0f2e4c84c91fded09a30cf22)

Diff:
---
 gcc/c-family/c-common.c   |  7 ++-
 gcc/cp/constexpr.c|  3 ++-
 gcc/testsuite/g++.dg/torture/vector-subaccess-1.C | 23 +++
 gcc/testsuite/gcc.dg/pr83415.c|  2 +-
 4 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 9417b7fb4d1f..ae3ef89b05cb 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -8274,6 +8274,7 @@ convert_vector_to_array_for_subscript (location_t loc,
   if (gnu_vector_type_p (TREE_TYPE (*vecp)))
 {
   tree type = TREE_TYPE (*vecp);
+  tree newitype;
 
   ret = !lvalue_p (*vecp);
 
@@ -8288,8 +8289,12 @@ convert_vector_to_array_for_subscript (location_t loc,
 for function parameters.  */
   c_common_mark_addressable_vec (*vecp);
 
+  /* Make sure qualifiers are copied from the vector type to the new element
+     of the array type.  */
+  newitype = build_qualified_type (TREE_TYPE (type), TYPE_QUALS (type));
+
   *vecp = build1 (VIEW_CONVERT_EXPR,
- build_array_type_nelts (TREE_TYPE (type),
+ build_array_type_nelts (newitype,
  TYPE_VECTOR_SUBPARTS (type)),
  *vecp);
 }
diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 38f684144f0c..eb18b5b35378 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -3767,7 +3767,8 @@ cxx_eval_array_reference (const constexpr_ctx *ctx, tree t,
   if (!lval
   && TREE_CODE (ary) == VIEW_CONVERT_EXPR
   && VECTOR_TYPE_P (TREE_TYPE (TREE_OPERAND (ary, 0)))
-  && TREE_TYPE (t) == TREE_TYPE (TREE_TYPE (TREE_OPERAND (ary, 0))))
+  && (TYPE_MAIN_VARIANT (TREE_TYPE (t))
+ == TYPE_MAIN_VARIANT (TREE_TYPE (TREE_TYPE (TREE_OPERAND (ary, 0))))))
 ary = TREE_OPERAND (ary, 0);
 
   tree oldidx = TREE_OPERAND (t, 1);
diff --git a/gcc/testsuite/g++.dg/torture/vector-subaccess-1.C b/gcc/testsuite/g++.dg/torture/vector-subaccess-1.C
new file mode 100644
index ..0c8958a4e034
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/vector-subaccess-1.C
@@ -0,0 +1,23 @@
+/* PR c++/89224 */
+
+/* The access `vector[i]` must have the same qualifiers as the original
+   vector; previously they were dropped. */
+
+typedef __attribute__((vector_size(16))) unsigned char  Int8x8_t;
+
+template <typename T>
+void g(T &) {
+__builtin_abort();
+}
+template <typename T>
+void g(const T &) {
+  __builtin_exit(0);
+}
+void f(const Int8x8_t x) {
+  g(x[0]);
+}
+int main(void)
+{
+Int8x8_t x ={};
+f(x);
+}
diff --git a/gcc/testsuite/gcc.dg/pr83415.c b/gcc/testsuite/gcc.dg/pr83415.c
index 5934c16d97cb..2fc85031505d 100644
--- a/gcc/testsuite/gcc.dg/pr83415.c
+++ b/gcc/testsuite/gcc.dg/pr83415.c
@@ -7,6 +7,6 @@ int
 main (int argc, short *argv[])
 {
   int i = argc;
-  y[i] = 7 - i; /* { dg-warning "read-only" } */
+  y[i] = 7 - i; /* { dg-error "read-only" } */
   return 0;
 }

