Re: RISC-V: Folding memory for FP + constant case

2023-07-14 Thread Jeff Law via Gcc-patches




On 7/12/23 14:59, Jivan Hakobyan via Gcc-patches wrote:

Accessing a local array element turns into a load from the address
(fp + (index << C1)) + C2. When the access is inside a loop, we get a
loop-invariant computation. For some reason, that part cannot be moved
out by the loop-invariant passes, but we can handle it in the
target-specific hook (legitimize_address), which provides an
opportunity to rewrite the memory access into a form more suitable for
the target architecture.

This patch handles that case by rewriting the address to
((fp + C2) + (index << C1)). I evaluated it on SPEC2017 and got an
improvement on leela (over 7b instructions, .39% of the dynamic count)
that dwarfs the regression for gcc (14m instructions, .0012% of the
dynamic count).
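A source-level sketch of the identity the rewrite relies on may help. The patch itself works on RTL inside riscv_legitimize_address; here `kShift` and `kOffset` are hypothetical stand-ins for C1 and C2. Because unsigned address arithmetic wraps, both forms always compute the same address, but only the second exposes fp + C2 as a value that does not depend on the index:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the RTL constants: C1 (shift) and C2 (offset).
constexpr unsigned kShift = 3;          // index << 3, e.g. 8-byte elements
constexpr std::uintptr_t kOffset = 64;  // offset of the array in the frame

// Form seen before the patch: (fp + (index << C1)) + C2.
std::uintptr_t addr_original(std::uintptr_t fp, std::uintptr_t index) {
  return (fp + (index << kShift)) + kOffset;
}

// Form produced by the rewrite: (fp + C2) + (index << C1).
std::uintptr_t addr_rewritten(std::uintptr_t fp, std::uintptr_t index) {
  return (fp + kOffset) + (index << kShift);
}

// With the rewritten form, fp + C2 can be computed once outside the loop;
// only base + (i << C1) remains inside it.
std::uintptr_t sum_addresses(std::uintptr_t fp, std::uintptr_t n) {
  const std::uintptr_t base = fp + kOffset;  // loop-invariant
  std::uintptr_t acc = 0;
  for (std::uintptr_t i = 0; i < n; ++i)
    acc += base + (i << kShift);
  return acc;
}

// Both forms agree for every index, since unsigned arithmetic is modular.
void check_forms_agree(std::uintptr_t fp, std::uintptr_t count) {
  for (std::uintptr_t i = 0; i < count; ++i)
    assert(addr_original(fp, i) == addr_rewritten(fp, i));
}
```

On RISC-V with the Zba extension, the remaining base + (i << C1) for a shift of 1 to 3 can be a single sh1add/sh2add/sh3add instruction.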


gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_legitimize_address): Handle folding.
(mem_shadd_or_shadd_rtx_p): New predicate.

So I still need to give the new version a review.  But a high-level 
question: did you re-run the benchmarks with this version to verify 
that we still see the same nice improvement in leela?


The reason I ask is when I use this on Ventana's internal tree I don't 
see any notable differences in the dynamic instruction counts.  And 
probably the most critical difference between the upstream tree and 
Ventana's tree in this space is Ventana's internal tree has an earlier 
version of the fold-mem-offsets work from Manolis.


It may ultimately be the case that this work and Manolis's f-m-o patch 
have a lot of overlap in terms of their final effect on code generation. 
 Manolis's pass runs much later (after register allocation), so it's 
not going to address the loop-invariant-code-motion issue that 
originally got us looking into this space.  But his pass is generic 
enough that it helps other targets.  So we may ultimately want both.


Anyway, just wanted to verify if this variant is still showing the nice 
improvement on leela that the prior version did.


Jeff

ps.  I know you're on PTO.  No rush on responding -- enjoy the time off.



Re: [PATCH v2 3/3] libstdc++: Optimize is_fundamental performance by __is_arithmetic built-in

2023-07-14 Thread Ken Matsui via Gcc-patches
Hi,

Here are the benchmarks for this change:

* is_fundamental

https://github.com/ken-matsui/gcc-benches/blob/main/is_fundamental.md#fri-jul-14-091146-pm-pdt-2023

Time: -37.1619%
Peak Memory Usage: -29.4294%
Total Memory Usage: -29.4783%

* is_fundamental_v

https://github.com/ken-matsui/gcc-benches/blob/main/is_fundamental_v.md#fri-jul-14-091757-pm-pdt-2023

Time: -35.5446%
Peak Memory Usage: -30.0096%
Total Memory Usage: -30.6021%

* is_fundamental with bool_constant (on trunk
[18dac101678b8c0aed4bd995351e47f26cd54dec])

https://github.com/ken-matsui/gcc-benches/blob/main/is_fundamental-bool_constant.md#fri-jul-14-094237-pm-pdt-2023

Time: -28.3908%
Peak Memory Usage: -18.5403%
Total Memory Usage: -19.9045%

---

It appears that using bool_constant is faster than using disjunction. If
my understanding is correct, disjunction can avoid later instantiations
when short-circuiting, but might evaluating the disjunction itself be
more expensive than evaluating is_void and is_null_pointer? Or my
benchmark might just be incorrect.
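The two formulations being compared can be sketched side by side. The names `is_fundamental_disj` and `is_fundamental_bool` are hypothetical, and the sketch uses the public C++17 std::disjunction and std::bool_constant rather than libstdc++'s internal __or_ and __bool_constant; it shows the shape of each form, not a benchmark:

```cpp
#include <cstddef>
#include <type_traits>

namespace demo {

// Short-circuiting form: std::disjunction stops at the first trait whose value
// is true, so later traits need not be instantiated, but disjunction itself is
// an extra class template that is instantiated for every query.
template <typename T>
struct is_fundamental_disj
    : std::disjunction<std::is_arithmetic<T>, std::is_void<T>,
                       std::is_null_pointer<T>>::type {};

// Plain form: one bool_constant over an ordinary || expression. All three
// traits are named in the expression, so all three are instantiated, but no
// helper template is involved.
template <typename T>
struct is_fundamental_bool
    : std::bool_constant<std::is_arithmetic<T>::value ||
                         std::is_void<T>::value ||
                         std::is_null_pointer<T>::value> {};

}  // namespace demo

// Both forms agree with each other and with the standard trait.
static_assert(demo::is_fundamental_disj<int>::value ==
                  demo::is_fundamental_bool<int>::value, "int");
static_assert(demo::is_fundamental_bool<std::nullptr_t>::value ==
                  std::is_fundamental<std::nullptr_t>::value, "nullptr_t");
```

Which one compiles faster is exactly the question the benchmarks above try to answer; the sketch only makes the instantiation difference visible.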

Sincerely,
Ken Matsui

On Fri, Jul 14, 2023 at 9:57 PM Ken Matsui  wrote:
>
> This patch optimizes the performance of the is_fundamental trait by
> dispatching to the new __is_arithmetic built-in trait.
>
> libstdc++-v3/ChangeLog:
>
> * include/std/type_traits (is_fundamental_v): Use __is_arithmetic
> built-in trait.
> (is_fundamental): Likewise. Optimize the original implementation.
>
> Signed-off-by: Ken Matsui 
> ---
>  libstdc++-v3/include/std/type_traits | 21 +
>  1 file changed, 17 insertions(+), 4 deletions(-)
>
> diff --git a/libstdc++-v3/include/std/type_traits b/libstdc++-v3/include/std/type_traits
> index 7ebbe04c77b..cf24de2fcac 100644
> --- a/libstdc++-v3/include/std/type_traits
> +++ b/libstdc++-v3/include/std/type_traits
> @@ -668,11 +668,21 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  #endif
>
>/// is_fundamental
> +#if __has_builtin(__is_arithmetic)
> +  template
> +struct is_fundamental
> +: public __bool_constant<__is_arithmetic(_Tp)
> + || is_void<_Tp>::value
> + || is_null_pointer<_Tp>::value>
> +{ };
> +#else
>template
>  struct is_fundamental
> -: public __or_, is_void<_Tp>,
> -  is_null_pointer<_Tp>>::type
> +: public __bool_constant::value
> + || is_void<_Tp>::value
> + || is_null_pointer<_Tp>::value>
>  { };
> +#endif
>
>/// is_object
>template
> @@ -3209,13 +3219,16 @@ template 
>  #if __has_builtin(__is_arithmetic)
>  template 
>inline constexpr bool is_arithmetic_v = __is_arithmetic(_Tp);
> +template 
> +  inline constexpr bool is_fundamental_v
> += __is_arithmetic(_Tp) || is_void_v<_Tp> || is_null_pointer_v<_Tp>;
>  #else
>  template 
>inline constexpr bool is_arithmetic_v = is_arithmetic<_Tp>::value;
> -#endif
> -
>  template 
>inline constexpr bool is_fundamental_v = is_fundamental<_Tp>::value;
> +#endif
> +
>  template 
>inline constexpr bool is_object_v = is_object<_Tp>::value;
>  template 
> --
> 2.41.0
>


[PATCH v2 3/3] libstdc++: Optimize is_fundamental performance by __is_arithmetic built-in

2023-07-14 Thread Ken Matsui via Gcc-patches
This patch optimizes the performance of the is_fundamental trait by
dispatching to the new __is_arithmetic built-in trait.

libstdc++-v3/ChangeLog:

* include/std/type_traits (is_fundamental_v): Use __is_arithmetic
built-in trait.
(is_fundamental): Likewise. Optimize the original implementation.

Signed-off-by: Ken Matsui 
---
 libstdc++-v3/include/std/type_traits | 21 +
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/include/std/type_traits b/libstdc++-v3/include/std/type_traits
index 7ebbe04c77b..cf24de2fcac 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -668,11 +668,21 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #endif
 
   /// is_fundamental
+#if __has_builtin(__is_arithmetic)
+  template
+struct is_fundamental
+: public __bool_constant<__is_arithmetic(_Tp)
+ || is_void<_Tp>::value
+ || is_null_pointer<_Tp>::value>
+{ };
+#else
   template
 struct is_fundamental
-: public __or_, is_void<_Tp>,
-  is_null_pointer<_Tp>>::type
+: public __bool_constant::value
+ || is_void<_Tp>::value
+ || is_null_pointer<_Tp>::value>
 { };
+#endif
 
   /// is_object
   template
@@ -3209,13 +3219,16 @@ template 
 #if __has_builtin(__is_arithmetic)
 template 
   inline constexpr bool is_arithmetic_v = __is_arithmetic(_Tp);
+template 
+  inline constexpr bool is_fundamental_v
+= __is_arithmetic(_Tp) || is_void_v<_Tp> || is_null_pointer_v<_Tp>;
 #else
 template 
   inline constexpr bool is_arithmetic_v = is_arithmetic<_Tp>::value;
-#endif
-
 template 
   inline constexpr bool is_fundamental_v = is_fundamental<_Tp>::value;
+#endif
+
 template 
   inline constexpr bool is_object_v = is_object<_Tp>::value;
 template 
-- 
2.41.0
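The same dispatch pattern can be tried in user code as a minimal sketch: use the built-in when `__has_builtin` reports it and fall back to the library traits otherwise. `DEMO_HAS_IS_ARITHMETIC_BUILTIN` and the `demo` namespace are hypothetical, and whether an `__is_arithmetic` built-in exists depends on the compiler and its version, so either branch may be the one that compiles:

```cpp
#include <cstddef>
#include <type_traits>

// __has_builtin is not available on every compiler, so test for it first.
#if defined(__has_builtin)
#  if __has_builtin(__is_arithmetic)
#    define DEMO_HAS_IS_ARITHMETIC_BUILTIN 1
#  endif
#endif

namespace demo {

#ifdef DEMO_HAS_IS_ARITHMETIC_BUILTIN
// Fast path: the front end answers the arithmetic part directly, without
// instantiating a class template for it.
template <typename T>
inline constexpr bool is_fundamental_v =
    __is_arithmetic(T) || std::is_void_v<T> || std::is_null_pointer_v<T>;
#else
// Fallback: go through the library trait.
template <typename T>
inline constexpr bool is_fundamental_v =
    std::is_arithmetic_v<T> || std::is_void_v<T> || std::is_null_pointer_v<T>;
#endif

}  // namespace demo

static_assert(demo::is_fundamental_v<long> == std::is_fundamental_v<long>,
              "agrees with std::is_fundamental_v");
```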



[PATCH v2 2/3] libstdc++: Optimize is_arithmetic performance by __is_arithmetic built-in

2023-07-14 Thread Ken Matsui via Gcc-patches
This patch optimizes the performance of the is_arithmetic trait by
dispatching to the new __is_arithmetic built-in trait.

libstdc++-v3/ChangeLog:

* include/std/type_traits (is_arithmetic): Use __is_arithmetic
built-in trait.
(is_arithmetic_v): Likewise.

Signed-off-by: Ken Matsui 
---
 libstdc++-v3/include/std/type_traits | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/libstdc++-v3/include/std/type_traits b/libstdc++-v3/include/std/type_traits
index 0e7a9c9c7f3..7ebbe04c77b 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -655,10 +655,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { };
 
   /// is_arithmetic
+#if __has_builtin(__is_arithmetic)
+  template
+struct is_arithmetic
+: public __bool_constant<__is_arithmetic(_Tp)>
+{ };
+#else
   template
 struct is_arithmetic
 : public __or_, is_floating_point<_Tp>>::type
 { };
+#endif
 
   /// is_fundamental
   template
@@ -3198,8 +3205,15 @@ template 
   inline constexpr bool is_reference_v<_Tp&> = true;
 template 
   inline constexpr bool is_reference_v<_Tp&&> = true;
+
+#if __has_builtin(__is_arithmetic)
+template 
+  inline constexpr bool is_arithmetic_v = __is_arithmetic(_Tp);
+#else
 template 
   inline constexpr bool is_arithmetic_v = is_arithmetic<_Tp>::value;
+#endif
+
 template 
   inline constexpr bool is_fundamental_v = is_fundamental<_Tp>::value;
 template 
-- 
2.41.0



[PATCH v2 1/3] c++, libstdc++: Implement __is_arithmetic built-in trait

2023-07-14 Thread Ken Matsui via Gcc-patches
This patch implements a built-in trait for std::is_arithmetic.
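As a hedged reference for what the new built-in should report, the sketch below mirrors the idea of the SA_TEST_CATEGORY macro from the new test, but uses the standard std::is_arithmetic (integral or floating-point types, under every cv-qualification) instead of the built-in. `check_category` and `Color` are hypothetical helpers, not part of the patch:

```cpp
#include <type_traits>

// Checks that T and all of its cv-qualified variants agree on Expect, in the
// spirit of SA_TEST_CATEGORY.
template <typename T, bool Expect>
constexpr bool check_category() {
  return std::is_arithmetic<T>::value == Expect &&
         std::is_arithmetic<const T>::value == Expect &&
         std::is_arithmetic<volatile T>::value == Expect &&
         std::is_arithmetic<const volatile T>::value == Expect;
}

enum class Color { red, green };

static_assert(check_category<void, false>(), "void");
static_assert(check_category<bool, true>(), "bool");
static_assert(check_category<char, true>(), "char");
static_assert(check_category<wchar_t, true>(), "wchar_t");
static_assert(check_category<double, true>(), "double");
static_assert(check_category<Color, false>(), "enums are not arithmetic");
static_assert(check_category<int *, false>(), "pointers are not arithmetic");
```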

gcc/cp/ChangeLog:

* cp-trait.def: Define __is_arithmetic.
* constraint.cc (diagnose_trait_expr): Handle CPTK_IS_ARITHMETIC.
* semantics.cc (trait_expr_value): Likewise.
(finish_trait_expr): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of __is_arithmetic.
* g++.dg/ext/is_arithmetic.C: New test.
* g++.dg/tm/pr46567.C (__is_arithmetic): Rename to ...
(is_arithmetic): ... this.
* g++.dg/torture/pr57107.C: Likewise.

libstdc++-v3/ChangeLog:

* include/bits/cpp_type_traits.h (__is_arithmetic): Rename to ...
(is_arithmetic): ... this.
* include/c_global/cmath: Use is_arithmetic instead.
* include/c_std/cmath: Likewise.
* include/tr1/cmath: Likewise.

Signed-off-by: Ken Matsui 
---
 gcc/cp/constraint.cc|  3 ++
 gcc/cp/cp-trait.def |  1 +
 gcc/cp/semantics.cc |  4 ++
 gcc/testsuite/g++.dg/ext/has-builtin-1.C|  3 ++
 gcc/testsuite/g++.dg/ext/is_arithmetic.C| 33 ++
 gcc/testsuite/g++.dg/tm/pr46567.C   |  6 +--
 gcc/testsuite/g++.dg/torture/pr57107.C  |  4 +-
 libstdc++-v3/include/bits/cpp_type_traits.h |  4 +-
 libstdc++-v3/include/c_global/cmath | 48 ++---
 libstdc++-v3/include/c_std/cmath| 24 +--
 libstdc++-v3/include/tr1/cmath  | 24 +--
 11 files changed, 99 insertions(+), 55 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ext/is_arithmetic.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 8cf0f2d0974..bd517d08843 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -3754,6 +3754,9 @@ diagnose_trait_expr (tree expr, tree args)
 case CPTK_IS_AGGREGATE:
   inform (loc, "  %qT is not an aggregate", t1);
   break;
+case CPTK_IS_ARITHMETIC:
+  inform (loc, "  %qT is not an arithmetic type", t1);
+  break;
 case CPTK_IS_TRIVIALLY_COPYABLE:
   inform (loc, "  %qT is not trivially copyable", t1);
   break;
diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
index 8b7fece0cc8..a95aeeaf778 100644
--- a/gcc/cp/cp-trait.def
+++ b/gcc/cp/cp-trait.def
@@ -82,6 +82,7 @@ DEFTRAIT_EXPR (IS_TRIVIALLY_ASSIGNABLE, "__is_trivially_assignable", 2)
 DEFTRAIT_EXPR (IS_TRIVIALLY_CONSTRUCTIBLE, "__is_trivially_constructible", -1)
 DEFTRAIT_EXPR (IS_TRIVIALLY_COPYABLE, "__is_trivially_copyable", 1)
 DEFTRAIT_EXPR (IS_UNION, "__is_union", 1)
+DEFTRAIT_EXPR (IS_ARITHMETIC, "__is_arithmetic", 1)
 DEFTRAIT_EXPR (REF_CONSTRUCTS_FROM_TEMPORARY, "__reference_constructs_from_temporary", 2)
 DEFTRAIT_EXPR (REF_CONVERTS_FROM_TEMPORARY, "__reference_converts_from_temporary", 2)
 /* FIXME Added space to avoid direct usage in GCC 13.  */
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 8fb47fd179e..4531f047d73 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -12118,6 +12118,9 @@ trait_expr_value (cp_trait_kind kind, tree type1, tree type2)
 case CPTK_IS_UNION:
   return type_code1 == UNION_TYPE;
 
+case CPTK_IS_ARITHMETIC:
+  return ARITHMETIC_TYPE_P (type1);
+
 case CPTK_IS_ASSIGNABLE:
   return is_xible (MODIFY_EXPR, type1, type2);
 
@@ -12296,6 +12299,7 @@ finish_trait_expr (location_t loc, cp_trait_kind kind, tree type1, tree type2)
 case CPTK_IS_ENUM:
 case CPTK_IS_UNION:
 case CPTK_IS_SAME:
+case CPTK_IS_ARITHMETIC:
   break;
 
 case CPTK_IS_LAYOUT_COMPATIBLE:
diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
index f343e153e56..3d63b0101d1 100644
--- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
+++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
@@ -146,3 +146,6 @@
 #if !__has_builtin (__remove_cvref)
 # error "__has_builtin (__remove_cvref) failed"
 #endif
+#if !__has_builtin (__is_arithmetic)
+# error "__has_builtin (__is_arithmetic) failed"
+#endif
diff --git a/gcc/testsuite/g++.dg/ext/is_arithmetic.C b/gcc/testsuite/g++.dg/ext/is_arithmetic.C
new file mode 100644
index 000..fd35831f646
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/is_arithmetic.C
@@ -0,0 +1,33 @@
+// { dg-do compile { target c++11 } }
+
+#include 
+
+using namespace __gnu_test;
+
+#define SA(X) static_assert((X),#X)
+#define SA_TEST_CATEGORY(TRAIT, TYPE, EXPECT)  \
+  SA(TRAIT(TYPE) == EXPECT);   \
+  SA(TRAIT(const TYPE) == EXPECT); \
+  SA(TRAIT(volatile TYPE) == EXPECT);  \
+  SA(TRAIT(const volatile TYPE) == EXPECT)
+
+SA_TEST_CATEGORY(__is_arithmetic, void, false);
+
+SA_TEST_CATEGORY(__is_arithmetic, char, true);
+SA_TEST_CATEGORY(__is_arithmetic, signed char, true);
+SA_TEST_CATEGORY(__is_arithmetic, unsigned char, true);
+SA_TEST_CATEGORY(__is_arithmetic, wchar_t, true);
+SA_TEST_CATEGORY(__is_arithmetic, short, tr

[PATCH 2/2] [PATCH] Fix tree-opt/110252: wrong code due to phiopt using flow sensitive info during match

2023-07-14 Thread Andrew Pinski via Gcc-patches
Match will query ranger via tree_nonzero_bits/get_nonzero_bits for the 2nd
and 3rd operands of the COND_EXPR, and phiopt tries to create the COND_EXPR
even if we are moving one statement. That one statement could have some
flow-sensitive information on it based on the condition of the COND_EXPR,
which might create wrong code if the statement is moved out.

This is similar to the previous version of the patch except that now we use
flow_sensitive_info_storage instead of manually doing the save/restore,
and we also handle all defs on a gimple statement rather than just the lhs
of the gimple statement. Oh, and a few more testcases that were failing
before were added.
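The save/clear/restore idea behind the new auto_flow_sensitive class can be sketched as a small RAII guard. Everything here (`SsaName`, `Range`, `AutoFlowSensitive`, `known_nonnegative`) is a hypothetical, simplified model rather than GCC's SSA_NAME API; the point is only that information cleared on entry is restored on every exit from the scope:

```cpp
#include <cassert>
#include <optional>
#include <utility>

// Hypothetical, simplified model of an SSA name with a flow-sensitive value
// range. This is not GCC's SSA_NAME / range_info API.
struct Range {
  long lo, hi;
};
struct SsaName {
  std::optional<Range> range_info;  // may only be valid on some paths
};

// RAII guard: the constructor saves and clears the info, the destructor
// restores it, so anything run while the guard is alive cannot use facts
// that depended on the original condition.
class AutoFlowSensitive {
 public:
  explicit AutoFlowSensitive(SsaName &name)
      : name_(name), saved_(std::exchange(name.range_info, std::nullopt)) {}
  ~AutoFlowSensitive() { name_.range_info = saved_; }
  AutoFlowSensitive(const AutoFlowSensitive &) = delete;
  AutoFlowSensitive &operator=(const AutoFlowSensitive &) = delete;

 private:
  SsaName &name_;
  std::optional<Range> saved_;
};

// A hypothetical query a simplifier might make.
bool known_nonnegative(const SsaName &x) {
  return x.range_info.has_value() && x.range_info->lo >= 0;
}

// Usage: the range is hidden only while the guard is alive.
void demo_guard() {
  SsaName x{Range{0, 1}};
  assert(known_nonnegative(x));
  {
    AutoFlowSensitive guard(x);
    assert(!known_nonnegative(x));
  }
  assert(known_nonnegative(x));
}
```

In the pr110252-1.c shape, a range such as "min_need_stall >= 0" is only true on one arm of the condition; hiding it while match runs keeps the simplifier from baking it into an unconditional expression.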

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/110252

gcc/ChangeLog:

* tree-ssa-phiopt.cc (class auto_flow_sensitive): New class.
(auto_flow_sensitive::auto_flow_sensitive): New constructor.
(auto_flow_sensitive::~auto_flow_sensitive): New destructor.
(match_simplify_replacement): Temporarily
remove the flow sensitive info on the two statements that might
be moved.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/phi-opt-25b.c: Updated as
__builtin_parity loses the nonzerobits info.
* gcc.c-torture/execute/pr110252-1.c: New test.
* gcc.c-torture/execute/pr110252-2.c: New test.
* gcc.c-torture/execute/pr110252-3.c: New test.
* gcc.c-torture/execute/pr110252-4.c: New test.
---
 .../gcc.c-torture/execute/pr110252-1.c| 15 ++
 .../gcc.c-torture/execute/pr110252-2.c| 10 
 .../gcc.c-torture/execute/pr110252-3.c| 13 +
 .../gcc.c-torture/execute/pr110252-4.c|  8 +++
 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-25b.c   |  6 +--
 gcc/tree-ssa-phiopt.cc| 51 +--
 6 files changed, 96 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr110252-1.c
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr110252-2.c
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr110252-3.c
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr110252-4.c

diff --git a/gcc/testsuite/gcc.c-torture/execute/pr110252-1.c b/gcc/testsuite/gcc.c-torture/execute/pr110252-1.c
new file mode 100644
index 000..4ae93ca0647
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr110252-1.c
@@ -0,0 +1,15 @@
+/* This is reduced from sel-sched.cc which was noticed was being miscompiled too. */
+int g(int min_need_stall) __attribute__((__noipa__));
+int g(int min_need_stall)
+{
+  return  min_need_stall < 0 ? 1 : ((min_need_stall) < (1) ? (min_need_stall) : (1));
+}
+int main(void)
+{
+  for(int i = -100; i <= 100; i++)
+{
+  int t = g(i);
+  if (t != (i!=0))
+__builtin_abort();
+}
+}
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr110252-2.c b/gcc/testsuite/gcc.c-torture/execute/pr110252-2.c
new file mode 100644
index 000..7f1a7dbf134
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr110252-2.c
@@ -0,0 +1,10 @@
+signed char f() __attribute__((__noipa__));
+signed char f() { return 0; }
+int main()
+{
+  int g = f() - 1;
+  int e = g < 0 ? 1 : ((g >> (8-2))!=0);
+  asm("":"+r"(e));
+  if (e != 1)
+__builtin_abort();
+}
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr110252-3.c b/gcc/testsuite/gcc.c-torture/execute/pr110252-3.c
new file mode 100644
index 000..c24bf1ab1e4
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr110252-3.c
@@ -0,0 +1,13 @@
+
+unsigned int a = 1387579096U;
+void sinkandcheck(unsigned b) __attribute__((noipa));
+void sinkandcheck(unsigned b)
+{
+if (a != b)
+__builtin_abort();
+}
+int main() {
+a = 1 < (~a) ? 1 : (~a);
+sinkandcheck(1);
+return 0;
+}
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr110252-4.c b/gcc/testsuite/gcc.c-torture/execute/pr110252-4.c
new file mode 100644
index 000..f97edd3f069
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr110252-4.c
@@ -0,0 +1,8 @@
+
+int a, b = 2, c = 2;
+int main() {
+  b = ~(1 % (a ^ (b - (1 && c) || c & b)));
+  if (b < -1)
+__builtin_abort();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-25b.c b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-25b.c
index 7298da0c96e..0fd9b004a03 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-25b.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-25b.c
@@ -65,8 +65,6 @@ int test_popcountll(unsigned long long x, unsigned long long y)
   return x ? __builtin_popcountll(y) : 0;
 }
 
-/* 3 types of functions (not including parity), each with 3 types and there are 2 goto each */
-/* { dg-final { scan-tree-dump-times "goto " 18 "optimized" } } */
+/* 4 types of functions, each with 3 types and there are 2 goto each */
+/* { dg-final { scan-tree-dump-times "goto " 24 "optimized" } } */
 /* { dg-final { scan-tree-dump-times "x_..D. != 0" 12 "optimized" } } */
-/* parity case will be optimized

[PATCH 1/2] Add flow_sensitive_info_storage and use it in gimple-fold.

2023-07-14 Thread Andrew Pinski via Gcc-patches
This adds flow_sensitive_info_storage and uses it in
maybe_fold_comparisons_from_match_pd as mentioned in
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621817.html .
Since using it in maybe_fold_comparisons_from_match_pd was easy
and allowed me to test the storage earlier, I did it.

This also better hides how the flow-sensitive information is
stored, so only a single place needs to be updated if that
ever changes (again).
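A simplified model of the tri-state storage (nothing saved / integer range / pointer info) and of the unwind stack used in maybe_fold_comparisons_from_match_pd is sketched below. `Name`, `RangeInfo`, `PtrInfo`, `FlowSensitiveStorage` and `with_cleared_info` are hypothetical; the real class works on tree SSA names and, for pointers, saves the alignment, misalignment and pt.null flag:

```cpp
#include <cassert>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical, simplified flow-sensitive info; not GCC's data structures.
struct RangeInfo { long lo = 0, hi = 0; bool known = false; };
struct PtrInfo { unsigned align = 0, misalign = 0; bool null = true; };
struct Name {
  bool is_pointer = false;
  RangeInfo range;  // meaningful when !is_pointer
  PtrInfo ptr;      // meaningful when is_pointer
};

// Tagged storage: std::monostate plays the role of "nothing saved" (state 0),
// RangeInfo of the integer case (state 1) and PtrInfo of the pointer case
// (state -1).
class FlowSensitiveStorage {
 public:
  void save_and_clear(Name &n) {
    assert(std::holds_alternative<std::monostate>(saved_));
    if (n.is_pointer) {
      saved_ = n.ptr;
      n.ptr = PtrInfo{};      // unknown alignment, may be null
    } else {
      saved_ = n.range;
      n.range = RangeInfo{};  // no range known
    }
  }
  void restore(Name &n) const {
    if (n.is_pointer)
      n.ptr = std::get<PtrInfo>(saved_);  // throws if the kinds do not match
    else
      n.range = std::get<RangeInfo>(saved_);
  }

 private:
  std::variant<std::monostate, RangeInfo, PtrInfo> saved_;
};

// Unwind stack: clear the info of every name, run the simplifier, then
// restore everything whether or not simplification succeeded.
template <typename Simplify>
bool with_cleared_info(const std::vector<Name *> &names, Simplify simplify) {
  std::vector<std::pair<Name *, FlowSensitiveStorage>> unwind;
  for (Name *n : names) {
    FlowSensitiveStorage storage;
    storage.save_and_clear(*n);
    unwind.emplace_back(n, storage);
  }
  const bool ok = simplify();
  for (auto &entry : unwind)
    entry.second.restore(*entry.first);
  return ok;
}
```

Keeping the representation behind save/restore is what lets gimple-fold stop touching ssa_name.info.range_info directly.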

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* gimple-fold.cc (fosa_unwind): Replace `vrange_storage *`
with flow_sensitive_info_storage.
(follow_outer_ssa_edges): Update how to save off the flow
sensitive info.
(maybe_fold_comparisons_from_match_pd): Update restoring
of flow sensitive info.
* tree-ssanames.cc (flow_sensitive_info_storage::save): New method.
(flow_sensitive_info_storage::restore): New method.
(flow_sensitive_info_storage::save_and_clear): New method.
(flow_sensitive_info_storage::clear_storage): New method.
* tree-ssanames.h (class flow_sensitive_info_storage): New class.
---
 gcc/gimple-fold.cc   | 17 +--
 gcc/tree-ssanames.cc | 72 
 gcc/tree-ssanames.h  | 21 +
 3 files changed, 100 insertions(+), 10 deletions(-)

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index 4027ff71e10..de94efbcff7 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -6947,7 +6947,7 @@ and_comparisons_1 (tree type, enum tree_code code1, tree op1a, tree op1b,
 }
 
 static basic_block fosa_bb;
-static vec > *fosa_unwind;
+static vec > *fosa_unwind;
 static tree
 follow_outer_ssa_edges (tree val)
 {
@@ -6967,14 +6967,11 @@ follow_outer_ssa_edges (tree val)
   || POINTER_TYPE_P (TREE_TYPE (val)))
  && !TYPE_OVERFLOW_WRAPS (TREE_TYPE (val)))
return NULL_TREE;
+  flow_sensitive_info_storage storage;
+  storage.save_and_clear (val);
   /* If the definition does not dominate fosa_bb temporarily reset
 flow-sensitive info.  */
-  if (val->ssa_name.info.range_info)
-   {
- fosa_unwind->safe_push (std::make_pair
-   (val, val->ssa_name.info.range_info));
- val->ssa_name.info.range_info = NULL;
-   }
+  fosa_unwind->safe_push (std::make_pair (val, storage));
   return val;
 }
   return val;
@@ -7034,14 +7031,14 @@ maybe_fold_comparisons_from_match_pd (tree type, enum tree_code code,
  type, gimple_assign_lhs (stmt1),
  gimple_assign_lhs (stmt2));
   fosa_bb = outer_cond_bb;
-  auto_vec, 8> unwind_stack;
+  auto_vec, 8> unwind_stack;
   fosa_unwind = &unwind_stack;
   if (op.resimplify (NULL, (!outer_cond_bb
? follow_all_ssa_edges : follow_outer_ssa_edges)))
 {
   fosa_unwind = NULL;
   for (auto p : unwind_stack)
-   p.first->ssa_name.info.range_info = p.second;
+   p.second.restore (p.first);
   if (gimple_simplified_result_is_gimple_val (&op))
{
  tree res = op.ops[0];
@@ -7065,7 +7062,7 @@ maybe_fold_comparisons_from_match_pd (tree type, enum tree_code code,
 }
   fosa_unwind = NULL;
   for (auto p : unwind_stack)
-p.first->ssa_name.info.range_info = p.second;
+p.second.restore (p.first);
 
   return NULL_TREE;
 }
diff --git a/gcc/tree-ssanames.cc b/gcc/tree-ssanames.cc
index 5fdb6a37e9f..f81332451fc 100644
--- a/gcc/tree-ssanames.cc
+++ b/gcc/tree-ssanames.cc
@@ -916,3 +916,75 @@ make_pass_release_ssa_names (gcc::context *ctxt)
 {
   return new pass_release_ssa_names (ctxt);
 }
+
+/* Save and restore of flow sensitive information. */
+
+/* Save off the flow sensitive info from NAME. */
+
+void
+flow_sensitive_info_storage::save (tree name)
+{
+  gcc_assert (state == 0);
+  if (!POINTER_TYPE_P (TREE_TYPE (name)))
+{
+  range_info = SSA_NAME_RANGE_INFO (name);
+  state = 1;
+  return;
+}
+  state = -1;
+  auto ptr_info = SSA_NAME_PTR_INFO (name);
+  if (ptr_info)
+{
+  align = ptr_info->align;
+  misalign = ptr_info->misalign;
+  null = SSA_NAME_PTR_INFO (name)->pt.null;
+}
+  else
+{
+  align = 0;
+  misalign = 0;
+  null = true;
+}
+}
+
+/* Restore the flow sensitive info from NAME. */
+
+void
+flow_sensitive_info_storage::restore (tree name)
+{
+  gcc_assert (state != 0);
+  if (!POINTER_TYPE_P (TREE_TYPE (name)))
+{
+  gcc_assert (state == 1);
+  SSA_NAME_RANGE_INFO (name) = range_info;
+  return;
+}
+  gcc_assert (state == -1);
+  auto ptr_info = SSA_NAME_PTR_INFO (name);
+  /* If there was no flow sensitive info on the pointer
+ just return, there is nothing to restore to.  */
+  if (!ptr_info)
+return;
+  if (align != 0)
+set_ptr_info_alignment (ptr_info, align, misalign);
+  else
+mark_ptr_info_alignment_unknown (ptr_info);
+  SSA_NAME_PTR_INFO (name)->pt.nul

[PATCH] libstdc++: Use __bool_constant entirely

2023-07-14 Thread Ken Matsui via Gcc-patches
This patch uses __bool_constant throughout instead of integral_constant
in the type_traits header, specifically for true_type, false_type,
and bool_constant.
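The layering the patch moves to can be sketched in a local namespace (the `demo` names are hypothetical; libstdc++'s real alias is the internal __bool_constant). Because alias templates do not create new types, true_type and false_type remain exactly the same types as before, which is why the change should be invisible to users:

```cpp
#include <type_traits>

namespace demo {
// One alias template for compile-time booleans...
template <bool B>
using bool_constant = std::integral_constant<bool, B>;

// ...and the named aliases expressed through it.
using true_type = bool_constant<true>;
using false_type = bool_constant<false>;
}  // namespace demo

// Alias templates introduce no new types: these are the same types as the
// standard std::true_type / std::false_type.
static_assert(std::is_same<demo::true_type, std::true_type>::value,
              "true_type");
static_assert(std::is_same<demo::false_type,
                           std::integral_constant<bool, false>>::value,
              "false_type");
```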

libstdc++-v3/ChangeLog:

* include/std/type_traits (true_type): Use __bool_constant
instead.
(false_type): Likewise.
(bool_constant): Likewise.

Signed-off-by: Ken Matsui 
---
 libstdc++-v3/include/std/type_traits | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/libstdc++-v3/include/std/type_traits b/libstdc++-v3/include/std/type_traits
index 9f086992ebc..7dc5791a7c5 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -78,24 +78,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 constexpr _Tp integral_constant<_Tp, __v>::value;
 #endif
 
-  /// The type used as a compile-time boolean with true value.
-  using true_type =  integral_constant;
-
-  /// The type used as a compile-time boolean with false value.
-  using false_type = integral_constant;
-
   /// @cond undocumented
   /// bool_constant for C++11
   template
 using __bool_constant = integral_constant;
   /// @endcond
 
+  /// The type used as a compile-time boolean with true value.
+  using true_type =  __bool_constant;
+
+  /// The type used as a compile-time boolean with false value.
+  using false_type = __bool_constant;
+
 #if __cplusplus >= 201703L
 # define __cpp_lib_bool_constant 201505L
   /// Alias template for compile-time boolean constant types.
   /// @since C++17
   template
-using bool_constant = integral_constant;
+using bool_constant = __bool_constant<__v>;
 #endif
 
   // Metaprogramming helper types.
-- 
2.41.0



[PATCH v3] Introduce attribute reverse_alias

2023-07-14 Thread Alexandre Oliva via Gcc-patches


This patch introduces an attribute to add extra aliases to a symbol
when its definition is output.  The main goal is to ease interfacing
C++ with Ada, as C++ mangled names have to be named, and in some cases
(e.g. when using stdint.h typedefs in function arguments) the symbol
names may vary across platforms.
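For contrast, GCC's existing `alias` attribute can already give a C++ definition a second, unmangled name, but the programmer must spell the mangled target by hand, which is exactly the part that varies across platforms. A minimal sketch, assuming GCC or Clang on an ELF target with the Itanium C++ ABI; `twice`, `twice_for_ada` and `check_alias` are hypothetical names, and this uses `alias`, not the proposed reverse_alias:

```cpp
#include <cassert>

// Under the Itanium C++ ABI, int twice(int) is mangled as _Z5twicei.
int twice(int x) { return 2 * x; }

// A second, unmangled name for the same definition. The target string must be
// the mangled name, which would change if the parameter were std::int64_t
// (long on x86_64 GNU/Linux, long long on some other targets).
extern "C" int twice_for_ada(int) __attribute__((alias("_Z5twicei")));

void check_alias() { assert(twice_for_ada(21) == twice(21)); }
```

With reverse_alias, the extra name would instead be attached to the definition itself, so no mangled spelling appears in the source.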

The attribute is usable in C and C++, presumably in all C-family
languages.  It can be attached to global variables and functions.  In
C++, it can also be attached to namespace-scoped variables and
functions, static data members, member functions, explicit
instantiations and specializations of template functions, members and
classes.

When applied to constructors or destructors, additional reverse_aliases
with _Base and _Del suffixes are defined for variants other than
complete-object ones.  This changes the assumption that clones always
carry the same attributes as their abstract declarations, so there is
now a function to adjust them.

C++ also had a bug in which attributes from local extern declarations
failed to be propagated to a preexisting corresponding
namespace-scoped decl.  I've fixed that, and adjusted acc tests that
distinguished between C and C++ in this regard.

Applying the attribute to class types is only valid in C++, and the
effect is to attach the alias to the RTTI object associated with the
class type.

Regstrapped on x86_64-linux-gnu.  Ok to install?

This is refreshed and renamed from earlier versions that named the
attribute 'exalias', and that AFAICT got stuck in name bikeshedding.
https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551614.html


for  gcc/ChangeLog

* attribs.cc: Include cgraph.h.
(decl_attributes): Allow late introduction of reverse_alias in
types.
(create_reverse_alias_decl, create_reverse_alias_decls): New.
* attribs.h: Declare them.
(FOR_EACH_REVERSE_ALIAS): New macro.
* cgraph.cc (cgraph_node::create): Create reverse_alias decls.
* varpool.cc (varpool_node::get_create): Create reverse_alias
decls.
* cgraph.h (symtab_node::remap_reverse_alias_target): New.
* symtab.cc (symtab_node::remap_reverse_alias_target):
Define.
* cgraphunit.cc (cgraph_node::analyze): Create alias_target
node if needed.
(analyze_functions): Fixup visibility of implicit alias only
after its node is analyzed.
* doc/extend.texi (reverse_alias): Document for variables,
functions and types.

for  gcc/ada/ChangeLog

* doc/gnat_rm/interfacing_to_other_languages.rst: Mention
attribute reverse_alias to give RTTI symbols mnemonic names.
* doc/gnat_ugn/the_gnat_compilation_model.rst: Mention
attribute reverse_alias.  Fix incorrect ref to C1 ctor variant.

for  gcc/c-family/ChangeLog

* c-ada-spec.cc (pp_asm_name): Use first reverse_alias if
available.
* c-attribs.cc (handle_reverse_alias_attribute): New.
(c_common_attribute_table): Add reverse_alias.
(handle_copy_attribute): Do not copy reverse_alias.

for  gcc/c/ChangeLog

* c-decl.cc (duplicate_decls): Remap reverse_alias target.

for  gcc/cp/ChangeLog

* class.cc (adjust_clone_attributes): New.
(copy_fndecl_with_name, build_clone): Call it.
* cp-tree.h (adjust_clone_attributes): Declare.
(update_reverse_alias_interface): Declare.
(update_tinfo_reverse_alias): Declare.
* decl.cc (duplicate_decls): Remap reverse_alias target.
Adjust clone attributes.
(grokfndecl): Tentatively create reverse_alias decls after
adding attributes in e.g. a template member function explicit
instantiation.
* decl2.cc (cplus_decl_attributes): Update tinfo
reverse_alias.
(copy_interface, update_reverse_alias_interface): New.
(determine_visibility): Update reverse_alias interface.
(tentative_decl_linkage, import_export_decl): Likewise.
* name-lookup.cc: Include target.h and cgraph.h.
(push_local_extern_decl_alias): Merge attributes with
namespace-scoped decl, and drop duplicate reverse_alias.
* optimize.cc (maybe_clone_body): Re-adjust attributes after
cloning them.  Update reverse_alias interface.
* rtti.cc: Include attribs.h and cgraph.h.
(get_tinfo_decl): Copy reverse_alias attributes from type to
tinfo decl.  Create reverse_alias decls.
(update_tinfo_reverse_alias): New.

for  gcc/testsuite/ChangeLog

* c-c++-common/goacc/declare-1.c: Adjust.
* c-c++-common/goacc/declare-2.c: Adjust.
* c-c++-common/torture/attr-revalias-1.c: New.
* c-c++-common/torture/attr-revalias-2.c: New.
* c-c++-common/torture/attr-revalias-3.c: New.
* c-c++-common/torture/attr-revalias-4.c: New.
* g++.dg/torture/attr-revalias-1.C: New.
* g++.dg/torture/attr-revalias-2.C: New.
* g++.dg/torture/attr-revalias-3.C: 

[PATCH] VECT: Add mask_len_fold_left_plus for in-order floating-point reduction

2023-07-14 Thread juzhe . zhong
From: Ju-Zhe Zhong 

Hi, Richard and Richi.

This patch adds the mask_len_fold_left_plus pattern to support in-order
floating-point reduction for targets that support length-based loop control.

Consider the following case:
double
foo2 (double *__restrict a,
 double init,
 int *__restrict cond,
 int n)
{
for (int i = 0; i < n; i++)
  if (cond[i])
init += a[i];
return init;
}

ARM SVE:

...
vec_mask_and_60 = loop_mask_54 & mask__23.33_57;
vect__ifc__35.37_64 = .VCOND_MASK (vec_mask_and_60, vect__8.36_61, { 0.0, ... });
_36 = .MASK_FOLD_LEFT_PLUS (init_20, vect__ifc__35.37_64, loop_mask_54);
...

For RVV, we want to see:
...
_36 = .MASK_LEN_FOLD_LEFT_PLUS (init_20, vect__ifc__35.37_64, control_mask, loop_len, bias);
...
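The in-order semantics documented for the new pattern can be modelled with a scalar reference function. This is an illustration, not GCC code: std::vector stands in for the vector and mask operands, and `mask_len_fold_left_plus` here is a hypothetical helper. The strict left-to-right order matters because floating-point addition is not associative, so the reduction cannot be reassociated into a tree:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar model of mask_len_fold_left_plus_m as described in md.texi:
//   operand0 = operand1;
//   for (i = 0; i < LEN + BIAS; i++) if (operand3[i]) operand0 += operand2[i];
// Active elements are added strictly left to right, one at a time.
double mask_len_fold_left_plus(double init, const std::vector<double> &vec,
                               const std::vector<bool> &mask,
                               std::ptrdiff_t len, std::ptrdiff_t bias) {
  assert(mask.size() == vec.size());
  assert(len + bias <= static_cast<std::ptrdiff_t>(vec.size()));
  double acc = init;
  for (std::ptrdiff_t i = 0; i < len + bias; ++i) {
    const auto idx = static_cast<std::size_t>(i);
    if (mask[idx])
      acc += vec[idx];
  }
  return acc;
}
```

For foo2, one vector iteration corresponds to calling this with `init` as the running sum, the loaded a[i] lanes as the vector, the cond[i] != 0 lanes as the mask, and loop_len as the length.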

gcc/ChangeLog:

* doc/md.texi: Add mask_len_fold_left_plus.
* internal-fn.cc (mask_len_fold_left_direct): Ditto.
(expand_mask_len_fold_left_optab_fn): Ditto.
(direct_mask_len_fold_left_optab_supported_p): Ditto.
* internal-fn.def (MASK_LEN_FOLD_LEFT_PLUS): Ditto.
* optabs.def (OPTAB_D): Ditto.

---
 gcc/doc/md.texi | 13 +
 gcc/internal-fn.cc  |  5 +
 gcc/internal-fn.def |  3 +++
 gcc/optabs.def  |  1 +
 4 files changed, 22 insertions(+)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index cbcb992e5d7..6f44e66399d 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5615,6 +5615,19 @@ no reassociation.
 Like @samp{fold_left_plus_@var{m}}, but takes an additional mask operand
 (operand 3) that specifies which elements of the source vector should be added.
 
+@cindex @code{mask_len_fold_left_plus_@var{m}} instruction pattern
+@item @code{mask_len_fold_left_plus_@var{m}}
+Like @samp{fold_left_plus_@var{m}}, but takes an additional mask operand
+(operand 3), len operand (operand 4) and bias operand (operand 5) that
+performs following operations strictly in-order (no reassociation):
+
+@smallexample
+operand0 = operand1;
+for (i = 0; i < LEN + BIAS; i++)
+  if (operand3[i])
+operand0 += operand2[i];
+@end smallexample
+
 @cindex @code{sdot_prod@var{m}} instruction pattern
 @item @samp{sdot_prod@var{m}}
 
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index e698f0bffc7..2bf4fc492fe 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -190,6 +190,7 @@ init_internal_fns ()
 #define fold_extract_direct { 2, 2, false }
 #define fold_left_direct { 1, 1, false }
 #define mask_fold_left_direct { 1, 1, false }
+#define mask_len_fold_left_direct { 1, 1, false }
 #define check_ptrs_direct { 0, 0, false }
 
 const direct_internal_fn_info direct_internal_fn_array[IFN_LAST + 1] = {
@@ -3890,6 +3891,9 @@ expand_convert_optab_fn (internal_fn fn, gcall *stmt, convert_optab optab,
 #define expand_mask_fold_left_optab_fn(FN, STMT, OPTAB) \
   expand_direct_optab_fn (FN, STMT, OPTAB, 3)
 
+#define expand_mask_len_fold_left_optab_fn(FN, STMT, OPTAB) \
+  expand_direct_optab_fn (FN, STMT, OPTAB, 5)
+
 #define expand_check_ptrs_optab_fn(FN, STMT, OPTAB) \
   expand_direct_optab_fn (FN, STMT, OPTAB, 4)
 
@@ -3997,6 +4001,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
 #define direct_fold_extract_optab_supported_p direct_optab_supported_p
 #define direct_fold_left_optab_supported_p direct_optab_supported_p
 #define direct_mask_fold_left_optab_supported_p direct_optab_supported_p
+#define direct_mask_len_fold_left_optab_supported_p direct_optab_supported_p
 #define direct_check_ptrs_optab_supported_p direct_optab_supported_p
 #define direct_vec_set_optab_supported_p direct_optab_supported_p
 #define direct_vec_extract_optab_supported_p direct_optab_supported_p
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index ea750a921ed..d3aec51b1f2 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -319,6 +319,9 @@ DEF_INTERNAL_OPTAB_FN (FOLD_LEFT_PLUS, ECF_CONST | ECF_NOTHROW,
 DEF_INTERNAL_OPTAB_FN (MASK_FOLD_LEFT_PLUS, ECF_CONST | ECF_NOTHROW,
   mask_fold_left_plus, mask_fold_left)
 
+DEF_INTERNAL_OPTAB_FN (MASK_LEN_FOLD_LEFT_PLUS, ECF_CONST | ECF_NOTHROW,
+  mask_len_fold_left_plus, mask_len_fold_left)
+
 /* Unary math functions.  */
 DEF_INTERNAL_FLT_FN (ACOS, ECF_CONST, acos, unary)
 DEF_INTERNAL_FLT_FN (ACOSH, ECF_CONST, acosh, unary)
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 3dae228fba6..7023392979e 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -385,6 +385,7 @@ OPTAB_D (reduc_ior_scal_optab,  "reduc_ior_scal_$a")
 OPTAB_D (reduc_xor_scal_optab,  "reduc_xor_scal_$a")
 OPTAB_D (fold_left_plus_optab, "fold_left_plus_$a")
 OPTAB_D (mask_fold_left_plus_optab, "mask_fold_left_plus_$a")
+OPTAB_D (mask_len_fold_left_plus_optab, "mask_len_fold_left_plus_$a")
 
 OPTAB_D (extract_last_optab, "extract_last_$a")
 OPTAB_D (fold_extract_last_optab, "fold_extract_last_$a")
-- 
2.36.1
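A scalar reference model can make the md.texi semantics above easier to check. This is only a sketch, not part of the patch: the function name and the use of std::vector are assumptions for illustration. It follows the documented pseudocode: start from operand 1, then, strictly in order, add each of the first LEN + BIAS elements of operand 2 whose mask bit (operand 3) is set.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar model of mask_len_fold_left_plus_<m> (not GCC code).
// acc = operand 1, src = operand 2, mask = operand 3,
// len = operand 4, bias = operand 5.
double mask_len_fold_left_plus_model(double acc,
                                     const std::vector<double> &src,
                                     const std::vector<bool> &mask,
                                     long len, long bias)
{
  long n = len + bias;  // number of leading elements that take part
  assert(src.size() == mask.size());
  assert(n >= 0 && static_cast<std::size_t>(n) <= src.size());

  // Strictly in order, no reassociation: element i is accumulated only
  // after elements 0 .. i-1, just like the original scalar loop.
  for (long i = 0; i < n; i++)
    if (mask[i])
      acc += src[i];
  return acc;
}
```

The in-order requirement matters because floating-point addition is not associative; a tree-shaped reduction could round differently from the scalar loop it replaces.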



Re: [PATCH v3 1/3] c++: Track lifetimes in constant evaluation [PR70331,PR96630,PR98675]

2023-07-14 Thread Jason Merrill via Gcc-patches

On 7/14/23 11:16, Jason Merrill wrote:
I'm not seeing either a copyright assignment or DCO certification for 
you; please see https://gcc.gnu.org/contribute.html#legal for more 
information.


Oops, now I see the DCO sign-off; not sure how I missed it.

Jason



[pushed] c++: c++26 regression fixes

2023-07-14 Thread Jason Merrill via Gcc-patches
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

Apparently I wasn't actually running the testsuite in C++26 mode like I
thought I was, so there were some failures I wasn't seeing.

The constexpr hunk fixes regressions with the P2738 implementation; we still
need to use the old handling for casting from void pointers to heap
variables.
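A minimal sketch of the user-visible effect of P2738, assuming a compiler that advertises it through `__cpp_constexpr >= 202306L` (the feature-test value associated with P2738R1): a `void *` that really points to an `int` may be cast back to `int *` during constant evaluation. The function name is hypothetical; at run time the code is valid in any C++14-or-later mode.

```cpp
#include <cassert>

// Hypothetical example: round-trip an int through void * and back.
constexpr int roundtrip_through_void_ptr()
{
  int x = 7;
  void *vp = &x;
  // Always valid at run time; valid in a constant expression only under
  // P2738, because vp points to an object whose type is similar to int.
  int *ip = static_cast<int *>(vp);
  return *ip;
}

#if __cpp_constexpr >= 202306L
// Only require constant evaluation when the compiler advertises P2738.
static_assert(roundtrip_through_void_ptr() == 7, "P2738 constexpr void* cast");
#endif
```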

PR c++/110344

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_constant_expression): Move P2738 handling
after heap handling.
* name-lookup.cc (get_cxx_dialect_name): Add C++26.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-cast2.C: Adjust for P2738.
* g++.dg/ipa/devirt-45.C: Handle -fimplicit-constexpr.
---
 gcc/cp/constexpr.cc  | 21 ++--
 gcc/cp/name-lookup.cc|  2 ++
 gcc/testsuite/g++.dg/cpp0x/constexpr-cast2.C |  6 +++---
 gcc/testsuite/g++.dg/ipa/devirt-45.C |  2 +-
 4 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index cca0435bafc..9f96a6c41ea 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -7681,17 +7681,6 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, tree t,
&& !is_std_construct_at (ctx->call)
&& !is_std_allocator_allocate (ctx->call))
  {
-   /* P2738 (C++26): a conversion from a prvalue P of type "pointer to
-  cv void" to a pointer-to-object type T unless P points to an
-  object whose type is similar to T.  */
-   if (cxx_dialect > cxx23)
- if (tree ob
- = cxx_fold_indirect_ref (ctx, loc, TREE_TYPE (type), op))
-   {
- r = build1 (ADDR_EXPR, type, ob);
- break;
-   }
-
/* Likewise, don't error when casting from void* when OP is
   &heap uninit and similar.  */
tree sop = tree_strip_nop_conversions (op);
@@ -7699,6 +7688,16 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, tree t,
&& VAR_P (TREE_OPERAND (sop, 0))
&& DECL_ARTIFICIAL (TREE_OPERAND (sop, 0)))
  /* OK */;
+   /* P2738 (C++26): a conversion from a prvalue P of type "pointer to
+  cv void" to a pointer-to-object type T unless P points to an
+  object whose type is similar to T.  */
+   else if (cxx_dialect > cxx23
+&& (sop = cxx_fold_indirect_ref (ctx, loc,
+ TREE_TYPE (type), sop)))
+ {
+   r = build1 (ADDR_EXPR, type, sop);
+   break;
+ }
else
  {
if (!ctx->quiet)
diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
index 74565184403..2d747561e1f 100644
--- a/gcc/cp/name-lookup.cc
+++ b/gcc/cp/name-lookup.cc
@@ -6731,6 +6731,8 @@ get_cxx_dialect_name (enum cxx_dialect dialect)
   return "C++20";
 case cxx23:
   return "C++23";
+case cxx26:
+  return "C++26";
 }
 }
 
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-cast2.C b/gcc/testsuite/g++.dg/cpp0x/constexpr-cast2.C
index b79e8a90131..3efbd92f043 100644
--- a/gcc/testsuite/g++.dg/cpp0x/constexpr-cast2.C
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-cast2.C
@@ -6,11 +6,11 @@ static int i;
 constexpr void *vp0 = nullptr;
 constexpr void *vpi = &i;
 constexpr int *p1 = (int *) vp0; // { dg-error "cast from .void\\*. is not allowed" }
-constexpr int *p2 = (int *) vpi; // { dg-error "cast from .void\\*. is not allowed" }
+constexpr int *p2 = (int *) vpi; // { dg-error "cast from .void\\*. is not allowed" "" { target c++23_down } }
 constexpr int *p3 = static_cast<int *>(vp0); // { dg-error "cast from .void\\*. is not allowed" }
-constexpr int *p4 = static_cast<int *>(vpi); // { dg-error "cast from .void\\*. is not allowed" }
+constexpr int *p4 = static_cast<int *>(vpi); // { dg-error "cast from .void\\*. is not allowed" "" { target c++23_down } }
 constexpr void *p5 = vp0;
 constexpr void *p6 = vpi;
 
 constexpr int *pi = &i;
-constexpr bool b = ((int *)(void *) pi == pi); // { dg-error "cast from .void\\*. is not allowed" }
+constexpr bool b = ((int *)(void *) pi == pi); // { dg-error "cast from .void\\*. is not allowed" "" { target c++23_down } }
diff --git a/gcc/testsuite/g++.dg/ipa/devirt-45.C b/gcc/testsuite/g++.dg/ipa/devirt-45.C
index c26be21964c..019b454835c 100644
--- a/gcc/testsuite/g++.dg/ipa/devirt-45.C
+++ b/gcc/testsuite/g++.dg/ipa/devirt-45.C
@@ -37,5 +37,5 @@ int main()
 }
 
 /* One invocation is A::foo () other is B::foo () even though the type is destroyed and rebuilt in test() */
-/* { dg-final { scan-ipa-dump-times "Discovered a virtual call to a known target\[^\\n\]*A::foo" 2 "inline"  } } */
+/* { dg-final { scan-ipa-dump-times "Discovered a virtual call to a known target\[^\\n\]*A::foo" 2 "inline" { target { ! implicit_constexpr } } } }*/
 /* { dg-final { scan

Re: [PATCH] c++: mangling template-id of unknown template [PR110524]

2023-07-14 Thread Jason Merrill via Gcc-patches

On 7/13/23 09:20, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
for trunk and perhaps 13?


OK for both.


-- >8 --

This fixes a crash when mangling an ADL-enabled call to a template-id
naming an unknown template (as per P0846R0).
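For readers unfamiliar with P0846R0, here is a small sketch of the construct involved (names are hypothetical and unrelated to the testcase): in C++20, an unqualified name followed by `<` is treated as a template name even when ordinary lookup finds nothing, so the call can still find `N::g` through argument-dependent lookup. The fallback branch only keeps the snippet compilable in older modes.

```cpp
#include <cassert>

namespace N {
  struct A { int v; };
  template<class T> int g(T t) { return t.v; }
}

int call_g()
{
  N::A a{3};
#if __cplusplus >= 202002L
  // No `g` is visible here.  Under P0846R0, `g<` still starts a
  // template-id, and N::g is then found by argument-dependent lookup.
  return g<N::A>(a);
#else
  // Before C++20, `g<N::A>(a)` would not parse as a call, so qualify it.
  return N::g<N::A>(a);
#endif
}
```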

PR c++/110524

gcc/cp/ChangeLog:

* mangle.cc (write_expression): Handle TEMPLATE_ID_EXPR
whose template is already an IDENTIFIER_NODE.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/fn-template26.C: New test.
---
  gcc/cp/mangle.cc   |  3 ++-
  gcc/testsuite/g++.dg/cpp2a/fn-template26.C | 16 
  2 files changed, 18 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/fn-template26.C

diff --git a/gcc/cp/mangle.cc b/gcc/cp/mangle.cc
index 7dab4e62bc9..bef0fda6d22 100644
--- a/gcc/cp/mangle.cc
+++ b/gcc/cp/mangle.cc
@@ -3312,7 +3312,8 @@ write_expression (tree expr)
else if (TREE_CODE (expr) == TEMPLATE_ID_EXPR)
  {
tree fn = TREE_OPERAND (expr, 0);
-  fn = OVL_NAME (fn);
+  if (!identifier_p (fn))
+   fn = OVL_NAME (fn);
if (IDENTIFIER_ANY_OP_P (fn))
write_string ("on");
write_unqualified_id (fn);
diff --git a/gcc/testsuite/g++.dg/cpp2a/fn-template26.C b/gcc/testsuite/g++.dg/cpp2a/fn-template26.C
new file mode 100644
index 000..d4a17eb9bd1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/fn-template26.C
@@ -0,0 +1,16 @@
+// PR c++/110524
+// { dg-do compile { target c++20 } }
+
+template<class T>
+auto f(T t) -> decltype(g<T>(t));
+
+namespace N {
+  struct A { };
+  template<class T> void g(T);
+};
+
+int main() {
+  f(N::A{});
+}
+
+// { dg-final { scan-assembler "_Z1fIN1N1AEEDTcl1gIT_Efp_EES2_" } }




Re: [PATCH] c++: copy elision of object arg in static memfn call [PR110441]

2023-07-14 Thread Jason Merrill via Gcc-patches

On 7/13/23 14:49, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?


OK.


-- >8 --

Here the call A().f() is represented as a COMPOUND_EXPR whose first
operand is the otherwise unused object argument A() and second operand
is the call result (both are TARGET_EXPRs).  Within the return statement,
this outermost COMPOUND_EXPR ends up foiling the copy elision check in
build_special_member_call, resulting in us introducing a bogus call to the
deleted move constructor.  (Within the variable initialization, which goes
through ocp_convert instead of convert_for_initialization, we've already
been eliding the copy despite the outermost COMPOUND_EXPR ever since
r10-7410-g72809d6fe8e085 made ocp_convert look through COMPOUND_EXPR).

In contrast, I noticed '(A(), A::f())' (which should be equivalent to
the above call) is represented with the COMPOUND_EXPR inside the RHS's
TARGET_EXPR initializer thanks to a special case in cp_build_compound_expr,
thus avoiding the issue.

So this patch fixes this by making keep_unused_object_arg
use cp_build_compound_expr as well.
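To illustrate why the unused object argument is kept at all: the object expression of a static member function call is still evaluated, so its side effects must survive, which is what the COMPOUND_EXPR described above preserves. A minimal sketch with hypothetical names, using only standard C++:

```cpp
#include <cassert>

static int constructions = 0;

struct A {
  A() { ++constructions; }       // observable side effect
  static int f() { return 42; }  // does not use *this
};

int call_through_temporary()
{
  // f is static, so the temporary A() is otherwise unused, but it is
  // still evaluated, and its constructor still runs.
  return A().f();
}
```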

PR c++/110441

gcc/cp/ChangeLog:

* call.cc (keep_unused_object_arg): Use cp_build_compound_expr
instead of building a COMPOUND_EXPR directly.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/elide8.C: New test.
---
  gcc/cp/call.cc  |  2 +-
  gcc/testsuite/g++.dg/cpp1z/elide8.C | 25 +
  2 files changed, 26 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1z/elide8.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index 119063979fa..b0a69cb46d4 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -5218,7 +5218,7 @@ keep_unused_object_arg (tree result, tree obj, tree fn)
if (TREE_THIS_VOLATILE (a))
  a = build_this (a);
if (TREE_SIDE_EFFECTS (a))
-return build2 (COMPOUND_EXPR, TREE_TYPE (result), a, result);
+return cp_build_compound_expr (a, result, tf_warning_or_error);
return result;
  }
  
diff --git a/gcc/testsuite/g++.dg/cpp1z/elide8.C b/gcc/testsuite/g++.dg/cpp1z/elide8.C
new file mode 100644
index 000..7d471be8a2a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/elide8.C
@@ -0,0 +1,25 @@
+// PR c++/110441
+// { dg-do compile { target c++11 } }
+
+struct immovable {
+  immovable(immovable &&) = delete;
+};
+
+struct A {
+  static immovable f();
+};
+
+immovable f() {
+  immovable m = A().f(); // { dg-error "deleted" "" { target c++14_down } }
+  return A().f(); // { dg-error "deleted" "" { target c++14_down } }
+}
+
+struct B {
+  A* operator->();
+};
+
+immovable g() {
+  B b;
+  immovable m = b->f(); // { dg-error "deleted" "" { target c++14_down } }
+  return b->f(); // { dg-error "deleted" "" { target c++14_down } }
+}




Re: [PATCH] c++: redundant targ coercion for var/alias tmpls

2023-07-14 Thread Jason Merrill via Gcc-patches

On 7/14/23 14:07, Patrick Palka wrote:

On Thu, 13 Jul 2023, Jason Merrill wrote:


On 7/13/23 11:48, Patrick Palka wrote:

On Wed, 28 Jun 2023, Patrick Palka wrote:


On Wed, Jun 28, 2023 at 11:50 AM Jason Merrill  wrote:


On 6/23/23 12:23, Patrick Palka wrote:

On Fri, 23 Jun 2023, Jason Merrill wrote:


On 6/21/23 13:19, Patrick Palka wrote:

When stepping through the variable/alias template specialization code
paths, I noticed we perform template argument coercion twice: first from
instantiate_alias_template / finish_template_variable and again from
tsubst_decl (during instantiate_template).  It should suffice to perform
coercion once.

To that end, this patch elides this second coercion from tsubst_decl when
possible.  We can't get rid of it completely because we don't always
specialize a variable template from finish_template_variable: we could
also be doing so directly from instantiate_template during variable
template partial specialization selection, in which case the coercion
from tsubst_decl would be the first and only coercion.


Perhaps we should be coercing in lookup_template_variable rather than
finish_template_variable?


Ah yes, there's a patch for that at
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617377.html :)


So after that patch, can we get rid of the second coercion completely?


On second thought it should be possible to get rid of it, if we
rearrange things to always pass the primary arguments to tsubst_decl,
and perform partial specialization selection from there instead of
instantiate_template.  Let me try...


Like so?  Bootstrapped and regtested on x86_64-pc-linux-gnu.

-- >8 --

When stepping through the variable/alias template specialization code
paths, I noticed we perform template argument coercion twice: first from
instantiate_alias_template / finish_template_variable and again from
tsubst_decl (during instantiate_template).  It'd be good to avoid this
redundant coercion.

It turns out that this coercion could be safely elided whenever
specializing a primary variable/alias template, because we can rely on
lookup_template_variable and instantiate_alias_template to already have
coerced the arguments.

The other situation to consider is when fully specializing a partial
variable template specialization (from instantiate_template), in which
case the passed 'args' are the (already coerced) arguments relative to
the partial template and 'argvec', the result of substitution into
DECL_TI_ARGS, are the (uncoerced) arguments relative to the primary
template, so coercion is still necessary.  We can still avoid this
coercion however if we always pass the primary variable template to
tsubst_decl from instantiate_template, and instead perform partial
specialization selection directly from tsubst_decl.  This patch
implements this approach.
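A small sketch of what "coercion" and partial specialization selection mean for a variable template, with hypothetical names: coercion turns the written argument list into the full parameter list (here by appending the default argument), and partial specialization selection then operates on the coerced list. This only illustrates the language rules, not GCC's internal control flow.

```cpp
#include <cassert>

// Hypothetical primary variable template with a defaulted second parameter.
template<class T, class U = int>
constexpr int kind = 1;

// Partial specialization: pointer first argument, second argument int.
template<class T>
constexpr int kind<T *, int> = 2;

// kind<char *> is written with one argument.  Coercion produces the full
// list <char *, int> (the default for U is appended), and only then does
// partial specialization selection pick kind<T *, int>.
static_assert(kind<char *> == 2, "default applied before selection");
static_assert(kind<char *, long> == 1, "U is not int: primary template");
static_assert(kind<int> == 1, "not a pointer: primary template");
```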


The relationship between instantiate_template and tsubst_decl is pretty
tangled.  We use the former to substitute (often deduced) template arguments
into a template, and the latter to substitute template arguments into a use of
a template...and also to implement the former.

For substitution of uses of a template, we expect to need to coerce the
arguments after substitution.  But we avoid this issue for variable templates
by keeping them as TEMPLATE_ID_EXPR until substitution time, so if we see a
VAR_DECL in tsubst_decl it's either a non-template variable or under
instantiate_template.


FWIW it seems we could also be in tsubst_decl for a VAR_DECL if

   * we're partially instantiating a class-scope variable template
 during instantiation of the class


Hmm, why don't partial instantiations stay as TEMPLATE_ID_EXPR?


   * we're substituting a use of an already non-dependent variable
 template specialization


Sure.


So it seems like the current coercion for variable templates is only needed in
this case to support the redundant hash table lookup that we just did in
instantiate_template.  Perhaps instead of doing coercion here or moving the
partial spec lookup, we could skip the hash table lookup for the case of a
variable template?


It seems we'd then also have to make instantiate_template responsible
for registering the variable template specialization since tsubst_decl
no longer necessarily has the arguments relative to the primary template
('args' could be relative to the partial template).

Like so?  The following makes us perform all the specialization table
manipulation in instantiate_template instead of tsubst_decl for variable
template specializations.


Looks good.


I wonder if we might want to do this for alias template specializations too?


That would make sense.


@@ -15222,20 +15230,21 @@ tsubst_decl (tree t, tree args, tsubst_flags_t complain)
  {
tmpl = DECL_TI_TEMPLATE (t);
gen_tmpl = most_general_template (tmpl);
-   argvec = tsubst (DECL_TI_ARGS (t), args, complain, in_decl);
-   if (argvec != error_mark_node
-   && PRIMARY_TEMPLAT

Re: [WIP RFC] Add support for keyword-based attributes

2023-07-14 Thread Nathan Sidwell via Gcc-patches

On 7/14/23 11:56, Richard Sandiford wrote:

Summary: We'd like to be able to specify some attributes using
keywords, rather than the traditional __attribute__ or [[...]]
syntax.  Would that be OK?

In more detail:

We'd like to add some new target-specific attributes for Arm SME.
These attributes affect semantics and code generation and so they
can't simply be ignored.

Traditionally we've done this kind of thing by adding GNU attributes,
via TARGET_ATTRIBUTE_TABLE in GCC's case.  The problem is that both
GCC and Clang have traditionally only warned about unrecognised GNU
attributes, rather than raising an error.  Older compilers might
therefore be able to look past some uses of the new attributes and
still produce object code, even though that object code is almost
certainly going to be wrong.  (The compilers will also emit a default-on
warning, but that might go unnoticed when building a big project.)

There are some existing attributes that similarly affect semantics
in ways that cannot be ignored.  vector_size is one obvious example.
But that doesn't make it a good thing. :)

Also, C++ says this for standard [[...]] attributes:

   For an attribute-token (including an attribute-scoped-token)
   not specified in this document, the behavior is implementation-defined;
   any such attribute-token that is not recognized by the implementation
   is ignored.

which doubles down on the idea that attributes should not be used
for necessary semantic information.
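To make the concern concrete, a sketch with a made-up attribute name: a compiler that does not recognize a scoped `[[...]]` attribute ignores it (GCC typically issues a -Wattributes warning) and still compiles the function as if the attribute were absent.

```cpp
#include <cassert>

// hypothetical_vendor::must_not_ignore is not a real attribute.  An
// implementation that does not recognize it ignores it, so this compiles
// to ordinary code, which is exactly the silent failure mode described.
[[hypothetical_vendor::must_not_ignore]] int streaming_sum(int a, int b)
{
  return a + b;
}
```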


There's been quite a bit of discussion about the practicalities of that.  As you
say, there are existing, std-specified attributes, [[no_unique_address]] for
instance, that affect user-visible object layout if ignored.
Further, my understanding is that implementation-specific attributes are
permitted to affect program semantics -- they're implementation extensions.


IMHO, attributes are the accepted mechanism for what you're describing. 
Compilers already have a way of dealing with them -- both parsing and, in 
general, representing them.  I would be wary of inventing a different mechanism.


Have you approached C or C++ std bodies for input?



One of the attributes we'd like to add provides a new way of compiling
existing code.  The attribute doesn't require SME to be available;
it just says that the code must be compiled so that it can run in either
of two modes.  This is probably the most dangerous attribute of the set,
since compilers that ignore it would just produce normal code.  That
code might work in some test scenarios, but it would fail in others.

The feeling from the Clang community was therefore that these SME
attributes should use keywords instead, so that the keywords trigger
an error with older compilers.

However, it seemed wrong to define new SME-specific grammar rules,
since the underlying problem is pretty generic.  We therefore
proposed having a type of keyword that can appear exactly where
a standard [[...]] attribute can appear and that appertains to
exactly what a standard [[...]] attribute would appertain to.
No divergence or cherry-picking is allowed.

For example:

   [[arm::foo]]

would become:

   __arm_foo

and:

   [[arm::bar(args)]]

would become:

   __arm_bar(args)

It wouldn't be possible to retrofit arguments to a keyword that
previously didn't take arguments, since that could lead to parsing
ambiguities.  So when a keyword is first added, a binding decision
would need to be made whether the keyword always takes arguments
or is always standalone.

For that reason, empty argument lists are allowed for keywords,
even though they're not allowed for [[...]] attributes.

The argument-less version was accepted into Clang, and I have a follow-on
patch for handling arguments.  Would the same thing be OK for GCC,
in both the C and C++ frontends?

The patch below is a proof of concept for the C frontend.  It doesn't
bootstrap due to warnings about uninitialised fields.  And it doesn't
have tests.  But I did test it locally with various combinations of
attribute_spec and it seemed to work as expected.

The impact on the C frontend seems to be pretty small.  It looks like
the impact on the C++ frontend would be a bit bigger, but not much.

The patch contains a logically unrelated change: c-common.h set aside
16 keywords for address spaces, but of the in-tree ports, the maximum
number of keywords used is 6 (for amdgcn).  The patch therefore changes
the limit to 8 and uses 8 keywords for the new attributes.  This keeps
the number of reserved ids <= 256.

A real, non-proof-of-concept patch series would:

- Change the address-space keywords separately, and deal with any fallout.

- Clean up the way that attributes are specified, so that it isn't
   necessary to update all definitions when adding a new field.

- Allow more precise attribute requirements, such as "function decl only".

- Add tests :)

WDYT?  Does this approach look OK in principle, or is it a non-starter?

If it is a non-starter, the fallback would be to predefi

Re: [PATCH V4] Optimize '(X - N * M) / N' to 'X / N - M' if valid

2023-07-14 Thread Andrew MacLeod via Gcc-patches



On 7/14/23 09:37, Richard Biener wrote:

On Fri, 14 Jul 2023, Aldy Hernandez wrote:


I don't know what you're trying to accomplish here, as I haven't been
following the PR, but adding all these helper functions to the ranger header
file seems wrong, especially since there's only one use of them. I see you're
tweaking the irange API, adding helper functions to range-op (which is only
for code dealing with implementing range operators for tree codes), etc etc.

If you need these helper functions, I suggest you put them closer to their
uses (i.e. wherever the match.pd support machinery goes).

Note I suggested the opposite because I thought these kinds of helpers
are closer to value-range support than to match.pd.



Probably vr-values.{cc,h} and the simplify_using_ranges paradigm would be
the most sensible place to put these kinds of auxiliary routines?





But I take away from your answer that there's nothing close in the
value-range machinery that answers the question whether A op B may
overflow?


We don't track it in ranges themselves.  During calculation of a range
we obviously know, but propagating that generally when we rarely care
doesn't seem worthwhile.  The very first generation of irange 6 years
ago had an overflow_p() flag, but it was removed as not being worth
keeping.  It's easier to simply ask the question when it matters.


As the routines show, it's pretty easy to figure out when the need arises,
so I think that should suffice, at least for now.


Should we decide we would like it in general, it wouldn't be hard to add
to irange.  wi_fold() currently returns nothing; it could easily return a
bool indicating if an overflow happened, and wi_fold_in_parts and
fold_range would simply OR together the results of the component
wi_fold() calls.  It would require updating/auditing a number of
range-op entries and adding an overflowed_p() query to irange.
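As a rough illustration of "simply ask the question when it matters", here is a sketch that answers "may A op B overflow?" from operand bounds alone. It is not the irange/range-op API (which works on wide_int and range types); the helper names are hypothetical, and it applies GCC's `__builtin_add_overflow` and `__builtin_mul_overflow` to the interval endpoints.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper: may a + b overflow int for a in [a_lo, a_hi] and
// b in [b_lo, b_hi]?  Addition is monotonic in each operand, so only the
// smallest and largest sums need checking.
bool add_may_overflow(int a_lo, int a_hi, int b_lo, int b_hi)
{
  assert(a_lo <= a_hi && b_lo <= b_hi);
  int tmp;
  return __builtin_add_overflow(a_lo, b_lo, &tmp)
         || __builtin_add_overflow(a_hi, b_hi, &tmp);
}

// Hypothetical helper for a * b: over a rectangle of integers the extreme
// products occur at its corners, so four checks suffice.
bool mul_may_overflow(int a_lo, int a_hi, int b_lo, int b_hi)
{
  assert(a_lo <= a_hi && b_lo <= b_hi);
  int tmp;
  return __builtin_mul_overflow(a_lo, b_lo, &tmp)
         || __builtin_mul_overflow(a_lo, b_hi, &tmp)
         || __builtin_mul_overflow(a_hi, b_lo, &tmp)
         || __builtin_mul_overflow(a_hi, b_hi, &tmp);
}
```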


Andrew



[PATCH] Fix PR 110666: `(a != 2) == a` produces wrong code

2023-07-14 Thread Andrew Pinski via Gcc-patches
I had messed up the case where the outer operator is `==`.
The check for the result should have been `==` and not `!=`.
This patch fixes that and adds a full runtime testcase covering
all cases to make sure it works.

OK? Bootstrapped and tested on x86-64-linux-gnu with no regressions.

gcc/ChangeLog:

PR tree-optimization/110666
* match.pd (A NEEQ (A NEEQ CST)): Fix Outer EQ case.

gcc/testsuite/ChangeLog:

PR tree-optimization/110666
* gcc.c-torture/execute/pr110666-1.c: New test.
---
 gcc/match.pd  | 34 -
 .../gcc.c-torture/execute/pr110666-1.c| 51 +++
 2 files changed, 71 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr110666-1.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 351d9285e92..88061fa4a6f 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -6431,8 +6431,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 
 /* x != (typeof x)(x == CST) -> CST == 0 ? 1 : (CST == 1 ? (x!=0&&x!=1) : x != 0) */
 /* x != (typeof x)(x != CST) -> CST == 1 ? 1 : (CST == 0 ? (x!=0&&x!=1) : x != 1) */
-/* x == (typeof x)(x == CST) -> CST == 0 ? 0 : (CST == 1 ? (x==0||x==1) : x != 0) */
-/* x == (typeof x)(x != CST) -> CST == 1 ? 0 : (CST == 0 ? (x==0||x==1) : x != 1) */
+/* x == (typeof x)(x == CST) -> CST == 0 ? 0 : (CST == 1 ? (x==0||x==1) : x == 0) */
+/* x == (typeof x)(x != CST) -> CST == 1 ? 0 : (CST == 0 ? (x==0||x==1) : x == 1) */
 (for outer (ne eq)
  (for inner (ne eq)
   (simplify
@@ -6443,23 +6443,29 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  bool innereq = inner == EQ_EXPR;
  bool outereq = outer == EQ_EXPR;
 }
-   (switch
-(if (innereq ? cst0 : cst1)
- { constant_boolean_node (!outereq, type); })
-(if (innereq ? cst1 : cst0)
+(switch
+ (if (innereq ? cst0 : cst1)
+  { constant_boolean_node (!outereq, type); })
+ (if (innereq ? cst1 : cst0)
+  (with {
+tree utype = unsigned_type_for (TREE_TYPE (@0));
+tree ucst1 = build_one_cst (utype);
+   }
+   (if (!outereq)
+(gt (convert:utype @0) { ucst1; })
+(le (convert:utype @0) { ucst1; })
+   )
+  )
+ )
  (with {
-   tree utype = unsigned_type_for (TREE_TYPE (@0));
-   tree ucst1 = build_one_cst (utype);
+   tree value = build_int_cst (TREE_TYPE (@0), !innereq);
   }
-  (if (!outereq)
-   (gt (convert:utype @0) { ucst1; })
-   (le (convert:utype @0) { ucst1; })
+  (if (outereq)
+   (eq @0 { value; })
+   (ne @0 { value; })
   )
  )
 )
-(if (innereq)
- (ne @0 { build_zero_cst (TREE_TYPE (@0)); }))
-(ne @0 { build_one_cst (TREE_TYPE (@0)); }))
)
   )
  )
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr110666-1.c b/gcc/testsuite/gcc.c-torture/execute/pr110666-1.c
new file mode 100644
index 000..b22eb7781da
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr110666-1.c
@@ -0,0 +1,51 @@
+
+#define func_name(outer,inner,cst) outer##inner##_##cst
+#define func_name_v(outer,inner,cst) outer##inner##_##cst##_v
+
+#define func_decl(outer,inner,cst) \
+int outer##inner##_##cst (int) __attribute__((noipa)); \
+int outer##inner##_##cst (int a) { \
+  return (a op_##inner cst) op_##outer a; \
+} \
+int outer##inner##_##cst##_v (int) __attribute__((noipa)); \
+int outer##inner##_##cst##_v (volatile int a) { \
+  return (a op_##inner cst) op_##outer a; \
+}
+
+#define functions_n(outer, inner) \
+func_decl(outer,inner,0) \
+func_decl(outer,inner,1) \
+func_decl(outer,inner,2)
+
+#define functions() \
+functions_n(eq,eq) \
+functions_n(eq,ne) \
+functions_n(ne,eq) \
+functions_n(ne,ne)
+
+#define op_ne !=
+#define op_eq ==
+
+#define test(inner,outer,cst,arg) \
+func_name_v (inner,outer,cst)(arg) != func_name(inner,outer,cst)(arg)
+
+functions()
+
+#define tests_n(inner,outer,arg) \
+if (test(inner,outer,0,arg)) __builtin_abort(); \
+if (test(inner,outer,1,arg)) __builtin_abort(); \
+if (test(inner,outer,2,arg)) __builtin_abort();
+
+#define tests(arg) \
+tests_n(eq,eq,arg) \
+tests_n(eq,ne,arg) \
+tests_n(ne,eq,arg) \
+tests_n(ne,ne,arg)
+
+
+int main()
+{
+  for(int n = -1; n <= 2; n++) {
+tests(n)
+  }
+}
-- 
2.31.1
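The corrected comment table can be checked by brute force. The sketch below (hypothetical helper names, plain `int` rather than arbitrary integer types) compares direct evaluation of `x OUTER (x INNER CST)` with the simplified forms from the fixed match.pd comments over a small range of `x` and `CST`.

```cpp
#include <cassert>

// Direct evaluation of  x OUTER (typeof x)(x INNER cst).
static bool direct(bool outer_eq, bool inner_eq, int x, int cst)
{
  int inner = inner_eq ? (x == cst) : (x != cst);
  return outer_eq ? (x == inner) : (x != inner);
}

// The simplified forms from the corrected match.pd comments.
static bool folded(bool outer_eq, bool inner_eq, int x, int cst)
{
  bool zero_or_one = (x == 0 || x == 1);
  if (!outer_eq && inner_eq)   // x != (x == CST)
    return cst == 0 ? true : (cst == 1 ? !zero_or_one : x != 0);
  if (!outer_eq && !inner_eq)  // x != (x != CST)
    return cst == 1 ? true : (cst == 0 ? !zero_or_one : x != 1);
  if (outer_eq && inner_eq)    // x == (x == CST)
    return cst == 0 ? false : (cst == 1 ? zero_or_one : x == 0);
  // x == (x != CST)
  return cst == 1 ? false : (cst == 0 ? zero_or_one : x == 1);
}

// True if the folded forms agree with direct evaluation everywhere tested.
bool all_forms_agree()
{
  for (int o = 0; o < 2; o++)
    for (int i = 0; i < 2; i++)
      for (int cst = -1; cst <= 3; cst++)
        for (int x = -3; x <= 3; x++)
          if (direct(o, i, x, cst) != folded(o, i, x, cst))
            return false;
  return true;
}
```

For the PR case `(a != 2) == a` with `a == 2`, direct evaluation gives false, whereas the pre-patch rule `x != 1` would have given true.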



Re: [PATCH][RFC] tree-optimization/88540 - FP x > y ? x : y if-conversion without -ffast-math

2023-07-14 Thread Andrew Pinski via Gcc-patches
On Thu, Jul 13, 2023 at 2:54 AM Richard Biener via Gcc-patches
 wrote:
>
> The following makes sure that FP x > y ? x : y style max/min operations
> are if-converted at the GIMPLE level.  While we can match it neither to
> MAX_EXPR nor to .FMAX, as both have different semantics with IEEE than
> the ternary ?: operation, we can make sure to maintain this form
> as a COND_EXPR so backends have the chance to match this to instructions
> their ISA offers.
>
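A small sketch of why the ternary cannot simply become MAX_EXPR or .FMAX when NaNs must be honored: `x > y ? x : y` returns `y` whenever the comparison is false, so with a NaN operand its result depends on operand order, whereas `fmax` returns the non-NaN operand. The function name is hypothetical, and the behavior shown assumes no -ffast-math.

```cpp
#include <cassert>
#include <cmath>

// The source-level idiom; any comparison with NaN is false, so y is returned.
double ternary_max(double x, double y)
{
  return x > y ? x : y;
}
```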
> The patch does this in phiopt where we recognize min/max and instead
> of giving up when we have to honor NaNs we alter the generated code
> to a COND_EXPR.
>
> This resolves PR88540 and we can then SLP vectorize the min operation
> for its testcase.  It also resolves part of the regressions observed
> with the change matching bit-inserts of bit-field-refs to vec_perm.
>
> Expansion from a COND_EXPR rather than from compare-and-branch
> regresses gcc.target/i386/pr54855-13.c and gcc.target/i386/pr54855-9.c
> by producing extra moves: while the corresponding min/max operations
> are now already synthesized by RTL expansion, register selection
> isn't optimal.  This can be also provoked without this change by
> altering the operand order in the source.
>
> It regresses gcc.target/i386/pr110170.c where we end up CSEing the
> condition which makes RTL expansion no longer produce the min/max
> directly and code generation is obfuscated enough to confuse
> RTL if-conversion.
>
> It also regresses gcc.target/i386/ssefp-[12].c where oddly one
> variant isn't if-converted and ix86_expand_fp_movcc doesn't
> match directly (the FP constants get expanded twice).  A fix
> could be in emit_conditional_move where both prepare_cmp_insn
> and emit_conditional_move_1 force the constants to (different)
> registers.
>
> Otherwise bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> PR tree-optimization/88540
> * tree-ssa-phiopt.cc (minmax_replacement): Do not give up
> with NaNs but handle the simple case by if-converting to a
> COND_EXPR.

One thing which I was thinking about adding to phiopt is having the
last pass do the conversion to COND_EXPR if the target supports a
conditional move for that expression. That should fix this one right?
This was one of the things I was working towards with the moving to use
match-and-simplify too.

Thanks,
Andrew

>
> * gcc.target/i386/pr88540.c: New testcase.
> * gcc.target/i386/pr54855-12.c: Adjust.
> * gcc.target/i386/pr54855-13.c: Likewise.
> ---
>  gcc/testsuite/gcc.target/i386/pr54855-12.c |  2 +-
>  gcc/testsuite/gcc.target/i386/pr54855-13.c |  2 +-
>  gcc/testsuite/gcc.target/i386/pr88540.c| 10 ++
>  gcc/tree-ssa-phiopt.cc | 21 -
>  4 files changed, 28 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr88540.c
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr54855-12.c b/gcc/testsuite/gcc.target/i386/pr54855-12.c
> index 2f8af392c83..09e8ab8ae39 100644
> --- a/gcc/testsuite/gcc.target/i386/pr54855-12.c
> +++ b/gcc/testsuite/gcc.target/i386/pr54855-12.c
> @@ -1,6 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-options "-O2 -mavx512fp16" } */
> -/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]" 1 } } */
> +/* { dg-final { scan-assembler-times "vm\[ai\]\[nx\]sh\[ \\t\]" 1 } } */
>  /* { dg-final { scan-assembler-not "vcomish\[ \\t\]" } } */
>  /* { dg-final { scan-assembler-not "vmovsh\[ \\t\]" { target { ! ia32 } } } } */
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr54855-13.c b/gcc/testsuite/gcc.target/i386/pr54855-13.c
> index 87b4f459a5a..a4f25066f81 100644
> --- a/gcc/testsuite/gcc.target/i386/pr54855-13.c
> +++ b/gcc/testsuite/gcc.target/i386/pr54855-13.c
> @@ -1,6 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-options "-O2 -mavx512fp16" } */
> -/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]" 1 } } */
> +/* { dg-final { scan-assembler-times "vm\[ai\]\[nx\]sh\[ \\t\]" 1 } } */
>  /* { dg-final { scan-assembler-not "vcomish\[ \\t\]" } } */
>  /* { dg-final { scan-assembler-not "vmovsh\[ \\t\]" { target { ! ia32 } } } } */
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr88540.c b/gcc/testsuite/gcc.target/i386/pr88540.c
> new file mode 100644
> index 000..b927d0c57d5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr88540.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse2" } */
> +
> +void test(double* __restrict d1, double* __restrict d2, double* __restrict d3)
> +{
> +  for (int n = 0; n < 2; ++n)
> +d3[n] = d1[n] < d2[n] ? d1[n] : d2[n];
> +}
> +
> +/* { dg-final { scan-assembler "minpd" } } */
> diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
> index 467c9fd108a..13ee486831d 100644
> --- a/gcc/tree-ssa-phiopt.cc
> +++ b/gcc/tree-ssa-phiopt.cc
> @@ -1580,10 +1580,6 @@ minmax_replacement (basic_block cond_bb, basic_block middle_bb, basic_block alt_
>
>tree type = TREE_TYPE (PHI_RESULT (phi));
>
> -  /*

Re: [PATCH v2 1/2] c++, libstdc++: implement __is_pointer built-in trait

2023-07-14 Thread Ken Matsui via Gcc-patches
On Fri, Jul 14, 2023 at 3:49 AM Jonathan Wakely  wrote:
>
> On Fri, 14 Jul 2023 at 11:48, Jonathan Wakely  wrote:
> >
> > On Thu, 13 Jul 2023 at 21:04, Ken Matsui  wrote:
> > >
> > > On Thu, Jul 13, 2023 at 2:22 AM Jonathan Wakely wrote:
> > > >
> > > > On Wed, 12 Jul 2023 at 21:42, Ken Matsui wrote:
> > > > >
> > > > > On Wed, Jul 12, 2023 at 3:01 AM Jonathan Wakely wrote:
> > > > > >
> > > > > > On Mon, 10 Jul 2023 at 06:51, Ken Matsui via Libstdc++ wrote:
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > Here is the benchmark result for is_pointer:
> > > > > > >
> > > > > > > https://github.com/ken-matsui/gcc-benches/blob/main/is_pointer.md#sun-jul--9-103948-pm-pdt-2023
> > > > > > >
> > > > > > > Time: -62.1344%
> > > > > > > Peak Memory Usage: -52.4281%
> > > > > > > Total Memory Usage: -53.5889%
> > > > > >
> > > > > > Wow!
> > > > > >
> > > > > > Although maybe we could have improved our std::is_pointer_v anyway, like so:
> > > > > >
> > > > > > template 
> > > > > >   inline constexpr bool is_pointer_v = false;
> > > > > > template 
> > > > > >   inline constexpr bool is_pointer_v<_Tp*> = true;
> > > > > > template 
> > > > > >   inline constexpr bool is_pointer_v<_Tp* const> = true;
> > > > > > template 
> > > > > >   inline constexpr bool is_pointer_v<_Tp* volatile> = true;
> > > > > > template 
> > > > > >   inline constexpr bool is_pointer_v<_Tp* const volatile> = true;
> > > > > >
> > > > > > I'm not sure why I didn't already do that.
> > > > > >
> > > > > > Could you please benchmark that? And if it is better than the current
> > > > > > impl using is_pointer<_Tp>::value then we should do this in the
> > > > > > library:
> > > > > >
> > > > > > #if __has_builtin(__is_pointer)
> > > > > > template 
> > > > > >   inline constexpr bool is_pointer_v = __is_pointer(_Tp);
> > > > > > #else
> > > > > > template 
> > > > > >   inline constexpr bool is_pointer_v = false;
> > > > > > template 
> > > > > >   inline constexpr bool is_pointer_v<_Tp*> = true;
> > > > > > template 
> > > > > >   inline constexpr bool is_pointer_v<_Tp* const> = true;
> > > > > > template 
> > > > > >   inline constexpr bool is_pointer_v<_Tp* volatile> = true;
> > > > > > template 
> > > > > >   inline constexpr bool is_pointer_v<_Tp* const volatile> = true;
> > > > > > #endif
> > > > >
> > > > > Hi François and Jonathan,
> > > > >
> > > > > Thank you for your reviews! I will rename the four underscores to the
> > > > > appropriate name and take a benchmark once I get home.
> > > > >
> > > > > If I apply your change to is_pointer_v, is it better to add the
> > > > > `Co-authored-by:` line to the commit?
> > > >
> > > > Yes, that would be the correct thing to do (although in this case the
> > > > change is small enough that I don't really care about getting credit
> > > > for it :-)
> > > >
> > > Thank you! I will include it in my commit :) I see that you included
> > > the DCO sign-off in the MAINTAINERS file. However, if a reviewer
> > > doesn't, should I include the `Signed-off-by:` line for the reviewer
> > > as well?
> >
> > No, reviewers should not sign-off, that's for the code author. And
> > authors should add that themselves (or clearly state that they agree
> > to the DCO terms). You should not sign-off on someone else's behalf.
>
> You can add Reviewed-by: if you want to record that information.
>
I see. Thank you!
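The fallback Jonathan suggests for is_pointer_v can be tried in isolation. The following C++17 sketch reproduces the shape of those partial specializations under a hypothetical name, my_is_pointer_v. The name is chosen so it cannot collide with the library's std::is_pointer_v. The sketch then cross-checks a few cv-qualified cases at compile time. It only models the suggestion and is not the libstdc++ code.

```cpp
#include <type_traits>

// Hypothetical stand-in for the suggested library fallback: the primary
// template says "not a pointer", and one partial specialization exists for
// each cv-qualification of a pointer type.
template<typename T>
  inline constexpr bool my_is_pointer_v = false;
template<typename T>
  inline constexpr bool my_is_pointer_v<T*> = true;
template<typename T>
  inline constexpr bool my_is_pointer_v<T* const> = true;
template<typename T>
  inline constexpr bool my_is_pointer_v<T* volatile> = true;
template<typename T>
  inline constexpr bool my_is_pointer_v<T* const volatile> = true;

// Compile-time cross-checks against the existing library trait.
static_assert(my_is_pointer_v<int*> == std::is_pointer_v<int*>);
static_assert(my_is_pointer_v<const int* volatile>);  // T = const int
static_assert(my_is_pointer_v<int* const volatile>);
static_assert(!my_is_pointer_v<int>);
static_assert(!my_is_pointer_v<int&>);
static_assert(!my_is_pointer_v<int[3]>);
```

Partial-specialization matching is exact, so int* const volatile matches only the T* const volatile form. That is why the specializations never conflict, and why no class template has to be instantiated.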


Re: [PATCH] Fix bootstrap failure (with g++ 4.8.5) in tree-if-conv.cc.

2023-07-14 Thread Andrew Pinski via Gcc-patches
On Fri, Jul 14, 2023 at 11:56 AM Roger Sayle  wrote:
>
> This patch fixes the bootstrap failure I'm seeing using gcc 4.8.5 as
> the host compiler.  Ok for mainline?  [I might be missing something]

I think adding const here makes this well-defined C++20 too.
See http://cplusplus.github.io/LWG/lwg-defects.html#3031 .
Also see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107850 .

(I could be reading these wrong too).

Thanks,
Andrew

>
> 2023-07-14  Roger Sayle
>
> gcc/ChangeLog
>
> * tree-if-conv.cc (predicate_scalar_phi): Make the arguments
> to the std::sort comparison lambda function const.
>
> Cheers,
>
> Roger
>
> --


[PATCH] Fix bootstrap failure (with g++ 4.8.5) in tree-if-conv.cc.

2023-07-14 Thread Roger Sayle

This patch fixes the bootstrap failure I'm seeing using gcc 4.8.5 as
the host compiler.  Ok for mainline?  [I might be missing something]

2023-07-14  Roger Sayle

gcc/ChangeLog

* tree-if-conv.cc (predicate_scalar_phi): Make the arguments
to the std::sort comparison lambda function const.

Cheers,

Roger

--

diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 91e2eff..799f071 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -2204,7 +2204,8 @@ predicate_scalar_phi (gphi *phi, gimple_stmt_iterator *gsi)
 }
 
   /* Sort elements based on rankings ARGS.  */
-  std::sort(argsKV.begin(), argsKV.end(), [](ArgEntry &left, ArgEntry &right) {
+  std::sort(argsKV.begin(), argsKV.end(), [](const ArgEntry &left,
+const ArgEntry &right) {
 return left.second < right.second;
   });
 

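Below is a self-contained sketch of the fixed comparator shape. ArgEntry here is a hypothetical stand-in (a name/rank pair), because the patch does not show the real element type. Both lambda parameters are const references, so the comparator only reads its operands and can be called on const lvalues as well as non-const ones. That is the situation the LWG 3031 issue cited above is concerned with.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for the element type sorted in predicate_scalar_phi:
// 'second' plays the role of the rank compared by the lambda.
using ArgEntry = std::pair<const char *, unsigned>;

void sort_by_rank(std::vector<ArgEntry> &argsKV)
{
  // Comparator taking const references: it never modifies its operands.
  auto by_rank = [](const ArgEntry &left, const ArgEntry &right) {
    return left.second < right.second;
  };
  std::sort(argsKV.begin(), argsKV.end(), by_rank);
  // Postcondition: the entries are now ordered by rank.
  assert(std::is_sorted(argsKV.begin(), argsKV.end(), by_rank));
}
```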

Re: [PATCH] c++: redundant targ coercion for var/alias tmpls

2023-07-14 Thread Patrick Palka via Gcc-patches
On Thu, 13 Jul 2023, Jason Merrill wrote:

> On 7/13/23 11:48, Patrick Palka wrote:
> > On Wed, 28 Jun 2023, Patrick Palka wrote:
> > 
> > > On Wed, Jun 28, 2023 at 11:50 AM Jason Merrill  wrote:
> > > > 
> > > > On 6/23/23 12:23, Patrick Palka wrote:
> > > > > On Fri, 23 Jun 2023, Jason Merrill wrote:
> > > > > 
> > > > > > On 6/21/23 13:19, Patrick Palka wrote:
> > > > > > > When stepping through the variable/alias template specialization
> > > > > > > code paths, I noticed we perform template argument coercion
> > > > > > > twice: first from instantiate_alias_template /
> > > > > > > finish_template_variable and again from tsubst_decl (during
> > > > > > > instantiate_template).  It should suffice to perform coercion
> > > > > > > once.
> > > > > > > 
> > > > > > > To that end this patch elides this second coercion from
> > > > > > > tsubst_decl when possible.  We can't get rid of it completely
> > > > > > > because we don't always specialize a variable template from
> > > > > > > finish_template_variable: we could also be doing so directly
> > > > > > > from instantiate_template during variable template partial
> > > > > > > specialization selection, in which case the coercion from
> > > > > > > tsubst_decl would be the first and only coercion.
> > > > > > 
> > > > > > Perhaps we should be coercing in lookup_template_variable rather
> > > > > > than finish_template_variable?
> > > > > 
> > > > > Ah yes, there's a patch for that at
> > > > > https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617377.html :)
> > > > 
> > > > So after that patch, can we get rid of the second coercion completely?
> > > 
> > > On second thought it should be possible to get rid of it, if we
> > > rearrange things to always pass the primary arguments to tsubst_decl,
> > > and perform partial specialization selection from there instead of
> > > instantiate_template.  Let me try...
> > 
> > Like so?  Bootstrapped and regtested on x86_64-pc-linux-gnu.
> > 
> > -- >8 --
> > 
> > When stepping through the variable/alias template specialization code
> > paths, I noticed we perform template argument coercion twice: first from
> > instantiate_alias_template / finish_template_variable and again from
> > tsubst_decl (during instantiate_template).  It'd be good to avoid this
> > redundant coercion.
> > 
> > It turns out that this coercion could be safely elided whenever
> > specializing a primary variable/alias template, because we can rely on
> > lookup_template_variable and instantiate_alias_template to already have
> > coerced the arguments.
> > 
> > The other situation to consider is when fully specializing a partial
> > variable template specialization (from instantiate_template), in which
> > case the passed 'args' are the (already coerced) arguments relative to
> > the partial template and 'argvec', the result of substitution into
> > DECL_TI_ARGS, are the (uncoerced) arguments relative to the primary
> > template, so coercion is still necessary.  We can still avoid this
> > coercion however if we always pass the primary variable template to
> > tsubst_decl from instantiate_template, and instead perform partial
> > specialization selection directly from tsubst_decl.  This patch
> > implements this approach.
> 
> The relationship between instantiate_template and tsubst_decl is pretty
> tangled.  We use the former to substitute (often deduced) template arguments
> into a template, and the latter to substitute template arguments into a use of
> a template...and also to implement the former.
> 
> For substitution of uses of a template, we expect to need to coerce the
> arguments after substitution.  But we avoid this issue for variable templates
> by keeping them as TEMPLATE_ID_EXPR until substitution time, so if we see a
> VAR_DECL in tsubst_decl it's either a non-template variable or under
> instantiate_template.

FWIW it seems we could also be in tsubst_decl for a VAR_DECL if

  * we're partially instantiating a class-scope variable template
during instantiation of the class
  * we're substituting a use of an already non-dependent variable
template specialization

> 
> So it seems like the current coercion for variable templates is only needed in
> this case to support the redundant hash table lookup that we just did in
> instantiate_template.  Perhaps instead of doing coercion here or moving the
> partial spec lookup, we could skip the hash table lookup for the case of a
> variable template?

It seems we'd then also have to make instantiate_template responsible
for registering the variable template specialization since tsubst_decl
no longer necessarily has the arguments relative to the primary template
('args' could be relative to the partial template).

Like so?  The following makes us perform all the specialization table
manipulation in instantiate_template instead of tsubst_decl for variable
template specializations.
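At the source level, the partial-specialization case discussed above looks roughly like the following C++14 sketch. All names in it are hypothetical. For kind<int*>, the arguments relative to the primary template are <int*, long> once the default argument has been filled in. The arguments relative to the selected partial specialization are just <int>. The snippet only illustrates those two argument lists and says nothing about which GCC function performs the coercion.

```cpp
// Primary variable template with a defaulted second parameter.
template<typename T, typename U = long>
constexpr int kind = 0;

// Partial specialization: its own parameter list is <T>, while the matching
// arguments for the primary template have the form <T*, long>.
template<typename T>
constexpr int kind<T*, long> = 1;

// kind<int*> becomes <int*, long> after the default argument is supplied,
// which then matches the partial specialization with T = int.
static_assert(kind<int*> == 1, "partial specialization selected");
static_assert(kind<int*, int> == 0, "U is not long: primary template");
static_assert(kind<int> == 0, "not a pointer: primary template");
```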

Re: [WIP RFC] Add support for keyword-based attributes

2023-07-14 Thread Jakub Jelinek via Gcc-patches
On Fri, Jul 14, 2023 at 04:56:18PM +0100, Richard Sandiford via Gcc-patches wrote:
> Summary: We'd like to be able to specify some attributes using
> keywords, rather than the traditional __attribute__ or [[...]]
> syntax.  Would that be OK?

Will defer to C/C++ maintainers, but as you mentioned, there are many
attributes which really can't be ignored and change behavior significantly.
vector_size is one of those, mode attribute another,
no_unique_address another one (changes ABI in various cases),
the OpenMP attributes (omp::directive, omp::sequence) can change
behavior with -fopenmp, etc.
One can easily emit an error with
#ifdef __has_cpp_attribute
#if !__has_cpp_attribute (arm::whatever)
#error arm::whatever attribute unsupported
#endif
#else
#error __has_cpp_attribute unsupported
#endif
Adding keywords instead of attributes seems to be too ugly to me.

Jakub



Re: [PATCH v2] c++: wrong error with static constexpr var in tmpl [PR109876]

2023-07-14 Thread Jason Merrill via Gcc-patches

On 7/13/23 14:54, Marek Polacek wrote:

On Fri, May 26, 2023 at 09:47:10PM -0400, Jason Merrill wrote:

On 5/26/23 19:18, Marek Polacek wrote:

The is_really_empty_class check is sort of non-obvious, but the
comment should explain why I added it.

+  /* When there's nothing to initialize, we'll never mark the
+ VAR_DECL TREE_CONSTANT, therefore it would remain
+ value-dependent and we wouldn't instantiate.  */
  
Sorry it's taken so long to get back to this.



Interesting.  Can we change that (i.e. mark it TREE_CONSTANT) rather than
work around it here?


I think we can.  Maybe as in the below:

-- >8 --
Since r8-509, we'll no longer create a static temporary var for
the initializer '{ 1, 2 }' for num in the attached test because
the code in finish_compound_literal is now guarded by
'&& fcl_context == fcl_c99' but it's fcl_functional here.  This
causes us to reject num as non-constant when evaluating it in
a template.

Jason's idea was to treat num as value-dependent even though it
actually isn't.  This patch implements that suggestion.

We weren't marking objects of empty class type as constant.  This
patch changes that so that v_d_e_p doesn't need to check
is_really_empty_class.

Co-authored-by: Jason Merrill 

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK, thanks.

Incidentally, I prefer to put the "ok?" line above the scissors line 
since it isn't intended to be part of the commit message.



PR c++/109876

gcc/cp/ChangeLog:

* decl.cc (cp_finish_decl): Set TREE_CONSTANT when initializing
an object of empty class type.
* pt.cc (value_dependent_expression_p) : Treat a
constexpr-declared non-constant variable as value-dependent.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-template12.C: New test.
* g++.dg/cpp1z/constexpr-template1.C: New test.
* g++.dg/cpp1z/constexpr-template2.C: New test.
---
  gcc/cp/decl.cc| 13 +--
  gcc/cp/pt.cc  |  7 
  .../g++.dg/cpp0x/constexpr-template12.C   | 38 +++
  .../g++.dg/cpp1z/constexpr-template1.C| 25 
  .../g++.dg/cpp1z/constexpr-template2.C| 25 
  5 files changed, 105 insertions(+), 3 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-template12.C
  create mode 100644 gcc/testsuite/g++.dg/cpp1z/constexpr-template1.C
  create mode 100644 gcc/testsuite/g++.dg/cpp1z/constexpr-template2.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 60f107d50c4..792ab330dd0 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -8200,7 +8200,6 @@ void
  cp_finish_decl (tree decl, tree init, bool init_const_expr_p,
tree asmspec_tree, int flags)
  {
-  tree type;
vec *cleanups = NULL;
const char *asmspec = NULL;
int was_readonly = 0;
@@ -8220,7 +8219,7 @@ cp_finish_decl (tree decl, tree init, bool init_const_expr_p,
/* Parameters are handled by store_parm_decls, not cp_finish_decl.  */
gcc_assert (TREE_CODE (decl) != PARM_DECL);
  
-  type = TREE_TYPE (decl);
+  tree type = TREE_TYPE (decl);
if (type == error_mark_node)
  return;
  
@@ -8410,7 +8409,7 @@ cp_finish_decl (tree decl, tree init, bool init_const_expr_p,
  if (decl_maybe_constant_var_p (decl)
  /* FIXME setting TREE_CONSTANT on refs breaks the back end.  */
  && !TYPE_REF_P (type))
-   TREE_CONSTANT (decl) = 1;
+   TREE_CONSTANT (decl) = true;
}
/* This is handled mostly by gimplify.cc, but we have to deal with
 not warning about int x = x; as it is a GCC extension to turn off
@@ -8421,6 +8420,14 @@ cp_finish_decl (tree decl, tree init, bool init_const_expr_p,
  && !warning_enabled_at (DECL_SOURCE_LOCATION (decl), OPT_Winit_self))
suppress_warning (decl, OPT_Winit_self);
  }
+  else if (VAR_P (decl)
+  && COMPLETE_TYPE_P (type)
+  && !TYPE_REF_P (type)
+  && !dependent_type_p (type)
+  && is_really_empty_class (type, /*ignore_vptr*/false))
+/* We have no initializer but there's nothing to initialize anyway.
+   Treat DECL as constant due to c++/109876.  */
+TREE_CONSTANT (decl) = true;
  
if (flag_openmp
&& TREE_CODE (decl) == FUNCTION_DECL
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index fa15b75b9c5..255d18b9539 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -27983,6 +27983,13 @@ value_dependent_expression_p (tree expression)
else if (TYPE_REF_P (TREE_TYPE (expression)))
/* FIXME cp_finish_decl doesn't fold reference initializers.  */
return true;
+  /* We have a constexpr variable and we're processing a template.  When
+there's lifetime extension involved (for which finish_compound_literal
+used to create a temporary), we'll not be able to evaluate the
+variable until instantiating, so 

[PATCH] [og13] OpenMP: Dimension ordering for array-shaping operator for C and C++

2023-07-14 Thread Julian Brown
This patch fixes a bug in non-contiguous 'target update' operations using
the new array-shaping operator for C and C++, processing the dimensions
of the array the wrong way round during the OpenMP lowering pass.
Fortran was also using the wrong ordering, but the second reversal in
omp-low.cc made it produce the correct result.

The C and C++ bug only affected array shapes where the dimension sizes
are different ([X][Y]); several existing tests used the same value
for all dimensions ([X][X]), which masked the problem.  Only the
array dimensions (extents) are affected, not e.g. the indices, lengths
or strides for array sections.

This patch reverses the order used in both omp-low.cc and the Fortran
front-end, so the order should now be correct for all supported base
languages.
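The semantics the new test checks can be worked through on the host without any offloading. The following C sketch assumes OpenMP's [lower:length:stride] section syntax and row-major layout. It lists which flat offsets a two-dimensional section of an [ext0][ext1] shape should touch. The helper section_offsets is hypothetical and is not a libgomp function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: write to OUT the row-major offsets selected by the
   section [lo0:len0:st0][lo1:len1:st1] of an array shaped [ext0][ext1],
   and return how many offsets were written (len0 * len1).  */
static size_t
section_offsets (size_t ext0, size_t ext1,
                 size_t lo0, size_t len0, size_t st0,
                 size_t lo1, size_t len1, size_t st1,
                 size_t *out)
{
  size_t n = 0;
  for (size_t a = 0; a < len0; a++)
    for (size_t b = 0; b < len1; b++)
      {
        size_t j = lo0 + a * st0;   /* index in the first (outer) dimension */
        size_t i = lo1 + b * st1;   /* index in the second (inner) dimension */
        assert (j < ext0 && i < ext1);
        /* Row-major: the inner extent is the stride of the outer index.  */
        out[n++] = j * ext1 + i;
      }
  return n;
}
```

For the ([9][11]) shape and the [3:3:2][1:4:3] section used in array-shaping-14.c, this selects rows 3, 5 and 7 and columns 1, 4, 7 and 10. That gives 12 elements at flat offsets 34 through 87, which are exactly the elements the test's checking loop expects to be copied back. If the extents were swapped to [11][9], column 10 would exceed the inner extent and the helper's assertion would fire. The row stride would also become 9 instead of 11. With a square [X][X] shape both extents are equal, so swapping them changes nothing.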

Tested with offloading to AMD GCN.  I will apply (to og13) shortly.

2023-07-14  Julian Brown  

gcc/fortran/
* trans-openmp.cc (gfc_trans_omp_arrayshape_type): Reverse dimension
ordering for created array type.

gcc/
* omp-low.cc (lower_omp_target): Reverse iteration over array
dimensions.

libgomp/
* testsuite/libgomp.c-c++-common/array-shaping-14.c: New test.
---
 gcc/fortran/trans-openmp.cc   |  2 +-
 gcc/omp-low.cc|  6 ++--
 .../libgomp.c-c++-common/array-shaping-14.c   | 34 +++
 3 files changed, 38 insertions(+), 4 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/array-shaping-14.c

diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc
index 6cb5340687e..6b9a0430eba 100644
--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -4271,7 +4271,7 @@ gfc_trans_omp_arrayshape_type (tree type, vec *dims)
 {
   gcc_assert (dims->length () > 0);
 
-  for (int i = dims->length () - 1; i >= 0; i--)
+  for (unsigned i = 0; i < dims->length (); i++)
 {
   tree dim = fold_convert (sizetype, (*dims)[i]);
   /* We need the index of the last element, not the array size.  */
diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index c7706a5921f..ab2e4145ab2 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -14290,7 +14290,7 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
  dims++;
}
 
-   int tdim = tdims.length () - 1;
+   unsigned tdim = 0;
 
vec *vdim;
vec *vindex;
@@ -14365,7 +14365,7 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
nc = nc2;
  }
 
-   if (tdim >= 0)
+   if (tdim < tdims.length ())
  {
/* We have an array shape -- use that to find the
   total size of the data on the target to look up
@@ -14403,7 +14403,7 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
"for array");
dim = index = len = stride = error_mark_node;
  }
-   tdim--;
+   tdim++;
 
c = nc;
  }
diff --git a/libgomp/testsuite/libgomp.c-c++-common/array-shaping-14.c b/libgomp/testsuite/libgomp.c-c++-common/array-shaping-14.c
new file mode 100644
index 000..4ca6f794f93
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/array-shaping-14.c
@@ -0,0 +1,34 @@
+/* { dg-do run { target offload_device_nonshared_as } } */
+
+#include 
+#include 
+#include 
+
+typedef struct {
+  int *ptr;
+} S;
+
+int main(void)
+{
+  S q;
+  q.ptr = (int *) calloc (9 * 11, sizeof (int));
+
+#pragma omp target enter data map(to: q.ptr, q.ptr[0:9*11])
+
+#pragma omp target
+  for (int i = 0; i < 9*11; i++)
+q.ptr[i] = i;
+
+#pragma omp target update from(([9][11]) q.ptr[3:3:2][1:4:3])
+
+  for (int j = 0; j < 9; j++)
+for (int i = 0; i < 11; i++)
+  if (j >= 3 && j <= 7 && ((j - 3) % 2) == 0
+ && i >= 1 && i <= 10 && ((i - 1) % 3) == 0)
+   assert (q.ptr[j * 11 + i] == j * 11 + i);
+  else
+   assert (q.ptr[j * 11 + i] == 0);
+
+#pragma omp target exit data map(release: q.ptr, q.ptr[0:9*11])
+  return 0;
+}
-- 
2.25.1



[PATCH] [og13] OpenMP: Enable c-c++-common/gomp/declare-mapper-3.c for C

2023-07-14 Thread Julian Brown
This patch enables the c-c++-common/gomp/declare-mapper-3.c test for C.
This was seemingly overlooked in commit 393fd99c90e.

Tested with offloading to AMD GCN.  I will apply (to og13) shortly.

2023-07-14  Julian Brown  

gcc/testsuite/
* c-c++-common/gomp/declare-mapper-3.c: Enable for C.
---
 gcc/testsuite/c-c++-common/gomp/declare-mapper-3.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/c-c++-common/gomp/declare-mapper-3.c b/gcc/testsuite/c-c++-common/gomp/declare-mapper-3.c
index 983d979d68c..e491bcd0ce6 100644
--- a/gcc/testsuite/c-c++-common/gomp/declare-mapper-3.c
+++ b/gcc/testsuite/c-c++-common/gomp/declare-mapper-3.c
@@ -1,4 +1,4 @@
-// { dg-do compile { target c++ } }
+// { dg-do compile }
 // { dg-additional-options "-fdump-tree-gimple" }
 
 #include 
-- 
2.25.1



RE: [PATCH 1/6] arm: [MVE intrinsics] Factorize vcaddq vhcaddq

2023-07-14 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Christophe Lyon 
> Sent: Thursday, July 13, 2023 11:22 AM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> Richard Earnshaw ; Richard Sandiford
> 
> Cc: Christophe Lyon 
> Subject: [PATCH 1/6] arm: [MVE intrinsics] Factorize vcaddq vhcaddq
> 
> Factorize vcaddq, vhcaddq so that they use the same parameterized
> names.
> 
> To be able to use the same patterns, we add a suffix to vcaddq.
> 
> Note that vcadd uses UNSPEC_VCADDxx for builtins without predication,
> and VCADDQ_ROTxx_M_x (that is, not starting with "UNSPEC_") for the
> predicated ones.  The UNSPEC_* names are also used by neon.md.

Thanks for working on this.
The series is ok.
Kyrill

> 
> 2023-07-13  Christophe Lyon  
> 
>   gcc/
>   * config/arm/arm_mve_builtins.def (vcaddq_rot90_, vcaddq_rot270_)
>   (vcaddq_rot90_f, vcaddq_rot270_f): Add "_" or "_f" suffix.
>   * config/arm/iterators.md (mve_insn): Add vcadd, vhcadd.
>   (isu): Add UNSPEC_VCADD90, UNSPEC_VCADD270, VCADDQ_ROT270_M_U,
>   VCADDQ_ROT270_M_S, VCADDQ_ROT90_M_U, VCADDQ_ROT90_M_S,
>   VHCADDQ_ROT90_M_S, VHCADDQ_ROT270_M_S, VHCADDQ_ROT90_S,
>   VHCADDQ_ROT270_S.
>   (rot): Add VCADDQ_ROT90_M_F, VCADDQ_ROT90_M_S, VCADDQ_ROT90_M_U,
>   VCADDQ_ROT270_M_F, VCADDQ_ROT270_M_S, VCADDQ_ROT270_M_U,
>   VHCADDQ_ROT90_S, VHCADDQ_ROT270_S, VHCADDQ_ROT90_M_S,
>   VHCADDQ_ROT270_M_S.
>   (mve_rot): Add VCADDQ_ROT90_M_F, VCADDQ_ROT90_M_S,
>   VCADDQ_ROT90_M_U, VCADDQ_ROT270_M_F, VCADDQ_ROT270_M_S,
>   VCADDQ_ROT270_M_U, VHCADDQ_ROT90_S, VHCADDQ_ROT270_S,
>   VHCADDQ_ROT90_M_S, VHCADDQ_ROT270_M_S.
>   (supf): Add VHCADDQ_ROT90_M_S, VHCADDQ_ROT270_M_S,
>   VHCADDQ_ROT90_S, VHCADDQ_ROT270_S, UNSPEC_VCADD90,
>   UNSPEC_VCADD270.
>   (VCADDQ_ROT270_M): Delete.
>   (VCADDQ_M_F VxCADDQ VxCADDQ_M): New.
>   (VCADDQ_ROT90_M): Delete.
>   * config/arm/mve.md (mve_vcaddq)
>   (mve_vhcaddq_rot270_s, mve_vhcaddq_rot90_s): Merge
>   into ...
>   (@mve_q_): ... this.
>   (mve_vcaddq): Rename into ...
>   (@mve_q_f): ... this.
>   (mve_vcaddq_rot270_m_)
>   (mve_vcaddq_rot90_m_, mve_vhcaddq_rot270_m_s)
>   (mve_vhcaddq_rot90_m_s): Merge into ...
>   (@mve_q_m_): ... this.
>   (mve_vcaddq_rot270_m_f, mve_vcaddq_rot90_m_f): Merge
>   into ...
>   (@mve_q_m_f): ... this.
> ---
>  gcc/config/arm/arm_mve_builtins.def |   6 +-
>  gcc/config/arm/iterators.md |  38 +++-
>  gcc/config/arm/mve.md   | 135 +---
>  3 files changed, 62 insertions(+), 117 deletions(-)
> 
> diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
> index 8de765de3b0..63ad1845593 100644
> --- a/gcc/config/arm/arm_mve_builtins.def
> +++ b/gcc/config/arm/arm_mve_builtins.def
> @@ -187,6 +187,10 @@ VAR3 (BINOP_NONE_NONE_NONE, vmaxvq_s, v16qi, v8hi, v4si)
>  VAR3 (BINOP_NONE_NONE_NONE, vmaxq_s, v16qi, v8hi, v4si)
>  VAR3 (BINOP_NONE_NONE_NONE, vhsubq_s, v16qi, v8hi, v4si)
>  VAR3 (BINOP_NONE_NONE_NONE, vhsubq_n_s, v16qi, v8hi, v4si)
> +VAR3 (BINOP_NONE_NONE_NONE, vcaddq_rot90_, v16qi, v8hi, v4si)
> +VAR3 (BINOP_NONE_NONE_NONE, vcaddq_rot270_, v16qi, v8hi, v4si)
> +VAR2 (BINOP_NONE_NONE_NONE, vcaddq_rot90_f, v8hf, v4sf)
> +VAR2 (BINOP_NONE_NONE_NONE, vcaddq_rot270_f, v8hf, v4sf)
>  VAR3 (BINOP_NONE_NONE_NONE, vhcaddq_rot90_s, v16qi, v8hi, v4si)
>  VAR3 (BINOP_NONE_NONE_NONE, vhcaddq_rot270_s, v16qi, v8hi, v4si)
>  VAR3 (BINOP_NONE_NONE_NONE, vhaddq_s, v16qi, v8hi, v4si)
> @@ -870,8 +874,6 @@ VAR3 (QUADOP_UNONE_UNONE_UNONE_IMM_PRED, vshlcq_m_vec_u, v16qi, v8hi, v4si)
>  VAR3 (QUADOP_UNONE_UNONE_UNONE_IMM_PRED, vshlcq_m_carry_u, v16qi, v8hi, v4si)
> 
>  /* optabs without any suffixes.  */
> -VAR5 (BINOP_NONE_NONE_NONE, vcaddq_rot90, v16qi, v8hi, v4si, v8hf, v4sf)
> -VAR5 (BINOP_NONE_NONE_NONE, vcaddq_rot270, v16qi, v8hi, v4si, v8hf, v4sf)
>  VAR2 (BINOP_NONE_NONE_NONE, vcmulq_rot90, v8hf, v4sf)
>  VAR2 (BINOP_NONE_NONE_NONE, vcmulq_rot270, v8hf, v4sf)
>  VAR2 (BINOP_NONE_NONE_NONE, vcmulq_rot180, v8hf, v4sf)
> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> index 9e77af55d60..da1ead34e58 100644
> --- a/gcc/config/arm/iterators.md
> +++ b/gcc/config/arm/iterators.md
> @@ -902,6 +902,7 @@
>])
> 
>  (define_int_attr mve_insn [
> +  (UNSPEC_VCADD90 "vcadd") (UNSPEC_VCADD270 "vcadd")
>(VABAVQ_P_S "vabav") (VABAVQ_P_U "vabav")
>(VABAVQ_S "vabav") (VABAVQ_U "vabav")
>(VABDQ_M_S "vabd") (VABDQ_M_U "vabd") (VABDQ_M_F "vabd")
> @@ -925,6 +926,8 @@
>(VBICQ_N_S "vbic") (VBICQ_N_U "vbic")
>(VBRSRQ_M_N_S "vbrsr") (VBRSRQ_M_N_U "vbrsr") (VBRSRQ_M_N_F "vbrsr")
>(VBRSRQ_N_S "vbrsr") (VBRSRQ_N_U "vbrsr") (VBRSRQ_N_F "vbrsr")
> +  (VCADDQ_ROT270_M_U "vcadd") (VCADDQ_ROT270_M_S "vcadd") (VCADDQ_ROT270_M_F "vcadd")
> +  (VCADDQ_ROT9

[WIP RFC] Add support for keyword-based attributes

2023-07-14 Thread Richard Sandiford via Gcc-patches
Summary: We'd like to be able to specify some attributes using
keywords, rather than the traditional __attribute__ or [[...]]
syntax.  Would that be OK?

In more detail:

We'd like to add some new target-specific attributes for Arm SME.
These attributes affect semantics and code generation and so they
can't simply be ignored.

Traditionally we've done this kind of thing by adding GNU attributes,
via TARGET_ATTRIBUTE_TABLE in GCC's case.  The problem is that both
GCC and Clang have traditionally only warned about unrecognised GNU
attributes, rather than raising an error.  Older compilers might
therefore be able to look past some uses of the new attributes and
still produce object code, even though that object code is almost
certainly going to be wrong.  (The compilers will also emit a default-on
warning, but that might go unnoticed when building a big project.)

There are some existing attributes that similarly affect semantics
in ways that cannot be ignored.  vector_size is one obvious example.
But that doesn't make it a good thing. :)

Also, C++ says this for standard [[...]] attributes:

  For an attribute-token (including an attribute-scoped-token)
  not specified in this document, the behavior is implementation-defined;
  any such attribute-token that is not recognized by the implementation
  is ignored.

which doubles down on the idea that attributes should not be used
for necessary semantic information.

One of the attributes we'd like to add provides a new way of compiling
existing code.  The attribute doesn't require SME to be available;
it just says that the code must be compiled so that it can run in either
of two modes.  This is probably the most dangerous attribute of the set,
since compilers that ignore it would just produce normal code.  That
code might work in some test scenarios, but it would fail in others.

The feeling from the Clang community was therefore that these SME
attributes should use keywords instead, so that the keywords trigger
an error with older compilers.

However, it seemed wrong to define new SME-specific grammar rules,
since the underlying problem is pretty generic.  We therefore
proposed having a type of keyword that can appear exactly where
a standard [[...]] attribute can appear and that appertains to
exactly what a standard [[...]] attribute would appertain to.
No divergence or cherry-picking is allowed.

For example:

  [[arm::foo]]

would become:

  __arm_foo

and:

  [[arm::bar(args)]]

would become:

  __arm_bar(args)

It wouldn't be possible to retrofit arguments to a keyword that
previously didn't take arguments, since that could lead to parsing
ambiguities.  So when a keyword is first added, a binding decision
would need to be made whether the keyword always takes arguments
or is always standalone.

For that reason, empty argument lists are allowed for keywords,
even though they're not allowed for [[...]] attributes.

The argument-less version was accepted into Clang, and I have a follow-on
patch for handling arguments.  Would the same thing be OK for GCC,
in both the C and C++ frontends?

The patch below is a proof of concept for the C frontend.  It doesn't
bootstrap due to warnings about uninitialised fields.  And it doesn't
have tests.  But I did test it locally with various combinations of
attribute_spec and it seemed to work as expected.

The impact on the C frontend seems to be pretty small.  It looks like
the impact on the C++ frontend would be a bit bigger, but not much.

The patch contains a logically unrelated change: c-common.h set aside
16 keywords for address spaces, but of the in-tree ports, the maximum
number of keywords used is 6 (for amdgcn).  The patch therefore changes
the limit to 8 and uses 8 keywords for the new attributes.  This keeps
the number of reserved ids <= 256.

A real, non-proof-of-concept patch series would:

- Change the address-space keywords separately, and deal with any fallout.

- Clean up the way that attributes are specified, so that it isn't
  necessary to update all definitions when adding a new field.

- Allow more precise attribute requirements, such as "function decl only".

- Add tests :)

WDYT?  Does this approach look OK in principle, or is it a non-starter?

If it is a non-starter, the fallback would be to predefine macros
that expand to [[...]] or __attribute__.  Having the keywords gives
more precise semantics and better error messages though.

Thanks,
Richard
---
 gcc/attribs.cc| 30 +++-
 gcc/c-family/c-common.h   | 13 ++
 gcc/c/c-parser.cc | 88 +--
 gcc/config/aarch64/aarch64.cc |  1 +
 gcc/tree-core.h   | 19 
 5 files changed, 135 insertions(+), 16 deletions(-)

diff --git a/gcc/attribs.cc b/gcc/attribs.cc
index b8cb55b97df..706cd81329c 100644
--- a/gcc/attribs.cc
+++ b/gcc/attribs.cc
@@ -752,6 +752,11 @@ decl_attributes (tree *node, tree attributes, int flags,
 
   if (spec->decl_required && !DECL_P (*anode)

RE: [PATCH 2/2] [testsuite, arm]: Make mve_fp_fpu[12].c accept single or double precision FPU

2023-07-14 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Christophe Lyon 
> Sent: Thursday, July 13, 2023 11:22 AM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> Richard Earnshaw 
> Cc: Christophe Lyon 
> Subject: [PATCH 2/2] [testsuite,arm]: Make mve_fp_fpu[12].c accept single or double precision FPU
> 
> These tests currently expect a directive containing .fpu fpv5-sp-d16
> and thus may fail if they are executed with, for instance,
> -march=armv8.1-m.main+mve.fp+fp.dp
> 
> This patch accepts either fpv5-sp-d16 or fpv5-d16 to avoid the failure.
> 

Ok.
Thanks,
Kyrill

> 2023-06-28  Christophe Lyon  
> 
>   gcc/testsuite/
>   * gcc.target/arm/mve/intrinsics/mve_fp_fpu1.c: Fix .fpu
>   scan-assembler.
>   * gcc.target/arm/mve/intrinsics/mve_fp_fpu2.c: Likewise.
> ---
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_fp_fpu1.c | 2 +-
>  gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_fp_fpu2.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_fp_fpu1.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_fp_fpu1.c
> index e375327fb97..8358a616bb5 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_fp_fpu1.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_fp_fpu1.c
> @@ -12,4 +12,4 @@ foo1 (int8x16_t value)
>return b;
>  }
> 
> -/* { dg-final { scan-assembler "\.fpu fpv5-sp-d16" }  } */
> +/* { dg-final { scan-assembler "\.fpu fpv5(-sp|)-d16" }  } */
> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_fp_fpu2.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_fp_fpu2.c
> index 1fca1100cf0..5dd2feefc35 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_fp_fpu2.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_fp_fpu2.c
> @@ -12,4 +12,4 @@ foo1 (int8x16_t value)
>return b;
>  }
> 
> -/* { dg-final { scan-assembler "\.fpu fpv5-sp-d16" }  } */
> +/* { dg-final { scan-assembler "\.fpu fpv5(-sp|)-d16" }  } */
> --
> 2.34.1
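As a rough illustration of what the relaxed scan-assembler pattern accepts, the C sketch below runs an equivalent POSIX extended regular expression over sample directive lines. It is a stand-in under stated assumptions: the testsuite evaluates the pattern with Tcl's regexp, while here `(-sp|)` is rewritten as the equivalent `(-sp)?` because an empty alternative is undefined in POSIX EREs; `line_matches` and `fpu_pattern` are hypothetical names.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if LINE contains a match for the extended regular
   expression PATTERN.  Returns false if the pattern fails to compile.  */
static bool
line_matches (const char *pattern, const char *line)
{
  regex_t re;
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return false;
  bool found = regexec (&re, line, 0, NULL, 0) == 0;
  regfree (&re);
  return found;
}

/* POSIX ERE spelling of the test's "\.fpu fpv5(-sp|)-d16": the empty
   alternative becomes an optional group (an assumption for portability).  */
static const char *const fpu_pattern = "\\.fpu fpv5(-sp)?-d16";
```

Both the single-precision (`fpv5-sp-d16`) and double-precision (`fpv5-d16`) directives match, while an unrelated FPU name such as `fpv4-sp-d16` does not.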



RE: [PATCH 1/2] [testsuite,arm]: Make nomve_fp_1.c require arm_fp

2023-07-14 Thread Kyrylo Tkachov via Gcc-patches



> -----Original Message-----
> From: Christophe Lyon 
> Sent: Thursday, July 13, 2023 11:22 AM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> Richard Earnshaw 
> Cc: Christophe Lyon 
> Subject: [PATCH 1/2] [testsuite,arm]: Make nomve_fp_1.c require arm_fp
> 
> If GCC is configured with the default (soft) -mfloat-abi, and we don't
> override the target_board test flags appropriately,
> gcc.target/arm/mve/general-c/nomve_fp_1.c fails for lack of
> -mfloat-abi=softfp or -mfloat-abi=hard, because it doesn't use
> dg-add-options arm_v8_1m_mve (on purpose, see comment in the test).
> 
> Require and use the options needed for arm_fp to fix this problem.

Ok.
Thanks,
Kyrill

> 
> 2023-06-28  Christophe Lyon  
> 
>   gcc/testsuite/
>   * gcc.target/arm/mve/general-c/nomve_fp_1.c: Require arm_fp.
> ---
>  gcc/testsuite/gcc.target/arm/mve/general-c/nomve_fp_1.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/mve/general-c/nomve_fp_1.c b/gcc/testsuite/gcc.target/arm/mve/general-c/nomve_fp_1.c
> index 21c2af16a61..c9d279ead68 100644
> --- a/gcc/testsuite/gcc.target/arm/mve/general-c/nomve_fp_1.c
> +++ b/gcc/testsuite/gcc.target/arm/mve/general-c/nomve_fp_1.c
> @@ -1,9 +1,11 @@
>  /* { dg-do compile } */
>  /* { dg-require-effective-target arm_v8_1m_mve_ok } */
> +/* { dg-require-effective-target arm_fp_ok } */
>  /* Do not use dg-add-options arm_v8_1m_mve, because this might expand to "",
>     which could imply mve+fp depending on the user settings. We want to make
>     sure the '+fp' extension is not enabled.  */
>  /* { dg-options "-mfpu=auto -march=armv8.1-m.main+mve" } */
> +/* { dg-add-options arm_fp } */
> 
>  #include 
> 
> --
> 2.34.1



Re: [PATCH v3 1/3] c++: Track lifetimes in constant evaluation [PR70331,PR96630,PR98675]

2023-07-14 Thread Jason Merrill via Gcc-patches

On 6/30/23 23:28, Nathaniel Shead via Gcc-patches wrote:

This adds rudimentary lifetime tracking in C++ constexpr contexts,


Thanks!

I'm not seeing either a copyright assignment or DCO certification for 
you; please see https://gcc.gnu.org/contribute.html#legal for more 
information.



diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index cca0435bafc..bc59b4aab67 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -1188,7 +1190,12 @@ public:
  if (!already_in_map && modifiable)
modifiable->add (t);
}
-  void remove_value (tree t) { values.remove (t); }
+  void remove_value (tree t)
+  {
+if (DECL_P (t))
+  outside_lifetime.add (t);
+values.remove (t);


What if, instead of removing the variable from one hash table and adding 
it to another, we change the value to, say, void_node?



+ /* Also don't cache a call if we return a pointer to an expired
+value.  */
+ if (cacheable && (cp_walk_tree_without_duplicates
+   (&result, find_expired_values,
+&ctx->global->outside_lifetime)))
+   cacheable = false;


I think we need to reconsider cacheability in general; I think we only 
want to cache calls that are themselves valid constant expressions, in 
that the return value is a "permitted result of a constant expression" 
(https://eel.is/c++draft/expr.const#13).  A pointer to an automatic 
variable is not, whether or not it is currently within its lifetime.


That is, only cacheable if reduced_constant_expression_p (result).

I'm experimenting with this now, you don't need to mess with it.


@@ -7085,7 +7138,7 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, tree t,
  case PARM_DECL:
if (lval && !TYPE_REF_P (TREE_TYPE (t)))
/* glvalue use.  */;
-  else if (tree v = ctx->global->get_value (r))
+  else if (tree v = ctx->global->get_value (t))


I agree with this change, but it doesn't have any actual effect, right? 
I'll go ahead and apply it separately.



@@ -7328,17 +7386,28 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, tree t,
auto_vec cleanups;
vec *prev_cleanups = ctx->global->cleanups;
ctx->global->cleanups = &cleanups;
-   r = cxx_eval_constant_expression (ctx, TREE_OPERAND (t, 0),
+
+   auto_vec save_exprs;


Now that we're going to track temporaries for each full-expression, I 
think we shouldn't also need to track them for loops and calls.


Jason



[PATCH v2] vect: Handle demoting FLOAT and promoting FIX_TRUNC.

2023-07-14 Thread Robin Dapp via Gcc-patches
>>> Can you add testcases?  Also the current restriction is because
>>> the variants you add are not always correct and I don't see any
>>> checks that the intermediate type doesn't lose significant bits?

I didn't manage to create one for aarch64 or x86 because AVX512
has direct conversions, e.g. for int64_t -> _Float16, so the new code
will not be triggered.  Instead I added two separate RISC-V tests.

The attached V2 always checks trapping_math when converting float
to integer and, like the NARROW_DST case, checks if the operand fits
the intermediate type when demoting from int to float.
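A scalar sketch of why the operand must fit the intermediate type: converting through a narrower signed integer only matches the direct conversion when no significant bits are dropped. For portability it uses `float` in place of `_Float16` and assumes a 32-bit intermediate; `fits_int32`, `convert_direct` and `convert_via_int32` are illustrative names, not vectorizer code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if V survives a round trip through the 32-bit intermediate type.  */
static bool
fits_int32 (int64_t v)
{
  return v >= INT32_MIN && v <= INT32_MAX;
}

/* Direct int64 -> float conversion: the scalar semantics.  */
static float
convert_direct (int64_t v)
{
  return (float) v;
}

/* Two-step conversion through a narrower signed integer, as a multi-step
   vector conversion would do.  Only valid when the operand fits the
   intermediate type, so the precondition is asserted here.  */
static float
convert_via_int32 (int64_t v)
{
  assert (fits_int32 (v));
  int32_t narrow = (int32_t) v;
  return (float) narrow;
}
```

When the range check fails (for example for 1 << 40), the two-step path would silently lose the high bits, which is why the demotion must be rejected in that case.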

Would that be sufficient?

riscv seems to be the only backend not (yet?) providing pack/unpack
expanders for the vect conversions, relying instead on extend/trunc,
which now seems a disadvantage, particularly for the cases requiring
!flag_trapping_math with NONE but not for NARROW_DST.  That might
be reason enough to implement pack/unpack in the backend.

Nevertheless the patch might improve the status quo a bit?

Regards
 Robin


The recent changes that allowed multi-step conversions for
"non-packing/unpacking" (i.e. modifier == NONE) targets included
promoting to-float and demoting to-int variants.  This patch
adds the missing demoting to-float and promoting to-int handling.

gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_conversion): Handle
more demotion/promotion for modifier == NONE.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/conversions/vec-narrow-int64-float16.c: 
New test.
* gcc.target/riscv/rvv/autovec/conversions/vec-widen-float16-int64.c: 
New test.
---
 .../conversions/vec-narrow-int64-float16.c | 12 
 .../conversions/vec-widen-float16-int64.c  | 12 
 gcc/tree-vect-stmts.cc                     | 58 +++
 3 files changed, 71 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vec-narrow-int64-float16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vec-widen-float16-int64.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vec-narrow-int64-float16.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vec-narrow-int64-float16.c
new file mode 100644
index 000..ebee1cfa888
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vec-narrow-int64-float16.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-std=c99 -fno-vect-cost-model -march=rv64gcv_zvfh -mabi=lp64d --param=riscv-autovec-preference=scalable" } */
+
+#include 
+
+void convert (_Float16 *restrict dst, int64_t *restrict a, int n)
+{
+  for (int i = 0; i < n; i++)
+dst[i] = (_Float16) (a[i] & 0x7fff);
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vec-widen-float16-int64.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vec-widen-float16-int64.c
new file mode 100644
index 000..eb0a17e99bc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vec-widen-float16-int64.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-std=c99 -fno-vect-cost-model -march=rv64gcv_zvfh -mabi=lp64d --param=riscv-autovec-preference=scalable -fno-trapping-math" } */
+
+#include 
+
+void convert (int64_t *restrict dst, _Float16 *restrict a, int n)
+{
+  for (int i = 0; i < n; i++)
+dst[i] = (int64_t) a[i];
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index c08d0ef951f..c78a750301d 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -5192,29 +5192,65 @@ vectorizable_conversion (vec_info *vinfo,
break;
   }
 
-  /* For conversions between float and smaller integer types try whether we
-can use intermediate signed integer types to support the
+  /* For conversions between float and integer types try whether
+we can use intermediate signed integer types to support the
 conversion.  */
   if ((code == FLOAT_EXPR
-  && GET_MODE_SIZE (lhs_mode) > GET_MODE_SIZE (rhs_mode))
+  && GET_MODE_SIZE (lhs_mode) != GET_MODE_SIZE (rhs_mode))
  || (code == FIX_TRUNC_EXPR
- && GET_MODE_SIZE (rhs_mode) > GET_MODE_SIZE (lhs_mode)
- && !flag_trapping_math))
+ && (GET_MODE_SIZE (rhs_mode) != GET_MODE_SIZE (lhs_mode)
+ && !flag_trapping_math)))
{
+ bool demotion = GET_MODE_SIZE (rhs_mode) > GET_MODE_SIZE (lhs_mode);
  bool float_expr_p = code == FLOAT_EXPR;
- scalar_mode imode = float_expr_p ? rhs_mode : lhs_mode;
- fltsz = GET_MODE_SIZE (float_expr_p ? lhs_mode : rhs_mode);
+ unsigned short target_size;
+ scalar_mode intermediate_mode;
+ if (demotion)
+   {
+ intermed

vectorizer: Avoid an OOB access from vectorization

2023-07-14 Thread Matthew Malcomson via Gcc-patches
Our checks for whether the vectorization of a given loop would make an
out-of-bounds access miss the case when the vector we load is so large
as to span multiple iterations' worth of data (while only being there to
implement a single iteration).

This patch adds a check for such an access.

Example where this was going wrong (a smaller version of the added testcase):

```
  extern unsigned short multi_array[5][16][16];
  extern void initialise_s(int *);
  extern int get_sval();

  void foo() {
int s0 = get_sval();
int s[31];
int i,j;
initialise_s(&s[0]);
s0 = get_sval();
for (j=0; j < 16; j++)
  for (i=0; i < 16; i++)
multi_array[1][j][i]=s[j*2];
  }
```

With the above loop we would load the `s[j*2]` integer into a 4 element
vector, which reads 3 more elements than the scalar loop would.
`get_group_load_store_type` identifies that the loop requires a scalar
epilogue due to gaps.  However we do not identify that the above code
requires *two* scalar iterations to be peeled, because each
iteration loads data belonging to the *next* iteration (while not
using it).
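The arithmetic behind the two-iteration requirement can be checked with a small C function using the numbers from the example: 16 iterations, a stride of 2 ints, a 4-element vector load and a 31-element array. `iterations_to_peel` is a hypothetical helper for illustration, not the vectorizer's logic.

```c
#include <assert.h>

/* Count how many trailing iterations of "for j in [0, n_iters)" would read
   past the end of an ARRAY_LEN-element array when each iteration loads a
   whole VEC_LEN-element vector starting at index STRIDE * j, even though
   only the first element is used.  Those iterations must instead run as
   scalar epilogue iterations.  */
static int
iterations_to_peel (int n_iters, int stride, int vec_len, int array_len)
{
  assert (n_iters > 0 && stride > 0 && vec_len > 0);
  int count = 0;
  for (int j = n_iters - 1; j >= 0; j--)
    {
      int last_index = stride * j + vec_len - 1;
      if (last_index < array_len)
        break;  /* Earlier iterations start lower, so they are in bounds.  */
      count++;
    }
  return count;
}
```

For the example, j = 15 reads s[30..33] and j = 14 reads s[28..31], both past s[30], so two iterations must be peeled; with a 2-element vector only the final iteration over-reads.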

Bootstrapped and regtested on aarch64-none-linux-gnu.
N.b. out of interest we came across this working with Morello.




diff --git a/gcc/testsuite/gcc.dg/vect/vect-multi-peel-gaps.c b/gcc/testsuite/gcc.dg/vect/vect-multi-peel-gaps.c
new file mode 100644
index ..1b721fd26cab8d5583b153dd6b28c914db870ec3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-multi-peel-gaps.c
@@ -0,0 +1,60 @@
+/* For some targets we end up vectorizing the below loop such that the `sp`
+   single integer is loaded into a 4 integer vector.
+   While the writes are all safe, without 2 scalar loops being peeled into the
+   epilogue we would read past the end of the 31 integer array.  This happens
+   because we load a 4 integer chunk to only use the first integer and
+   increment by 2 integers at a time, hence the last load needs s[30-33] and
+   the penultimate load needs s[28-31].
+   This testcase ensures that we do not crash due to that behaviour.  */
+/* { dg-require-effective-target mmap } */
+#include 
+#include 
+
+#define MMAP_SIZE 0x2
+#define ADDRESS 0x112200
+
+#define MB_BLOCK_SIZE 16
+#define VERT_PRED_16 0
+#define HOR_PRED_16 1
+#define DC_PRED_16 2
+int *sptr;
+extern void intrapred_luma_16x16();
+unsigned short mprr_2[5][16][16];
+void initialise_s(int *s) { }
+int main() {
+void *s_mapping;
+void *end_s;
+s_mapping = mmap ((void *)ADDRESS, MMAP_SIZE, PROT_READ | PROT_WRITE,
+ MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+if (s_mapping == MAP_FAILED)
+  {
+   perror ("mmap");
+   return 1;
+  }
+end_s = (s_mapping + MMAP_SIZE);
+sptr = (int*)(end_s - sizeof(int[31]));
+intrapred_luma_16x16(sptr);
+return 0;
+}
+
+void intrapred_luma_16x16(int * restrict sp) {
+for (int j=0; j < MB_BLOCK_SIZE; j++)
+  {
+   mprr_2[VERT_PRED_16][j][0]=sp[j*2];
+   mprr_2[VERT_PRED_16][j][1]=sp[j*2];
+   mprr_2[VERT_PRED_16][j][2]=sp[j*2];
+   mprr_2[VERT_PRED_16][j][3]=sp[j*2];
+   mprr_2[VERT_PRED_16][j][4]=sp[j*2];
+   mprr_2[VERT_PRED_16][j][5]=sp[j*2];
+   mprr_2[VERT_PRED_16][j][6]=sp[j*2];
+   mprr_2[VERT_PRED_16][j][7]=sp[j*2];
+   mprr_2[VERT_PRED_16][j][8]=sp[j*2];
+   mprr_2[VERT_PRED_16][j][9]=sp[j*2];
+   mprr_2[VERT_PRED_16][j][10]=sp[j*2];
+   mprr_2[VERT_PRED_16][j][11]=sp[j*2];
+   mprr_2[VERT_PRED_16][j][12]=sp[j*2];
+   mprr_2[VERT_PRED_16][j][13]=sp[j*2];
+   mprr_2[VERT_PRED_16][j][14]=sp[j*2];
+   mprr_2[VERT_PRED_16][j][15]=sp[j*2];
+  }
+}
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index c08d0ef951fc63adcfffc601917134ddf51ece45..1c8c6784cc7b5f2d327339ff55a5a5ea08835aab 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2217,7 +2217,9 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
 but the access in the loop doesn't cover the full vector
 we can end up with no gap recorded but still excess
 elements accessed, see PR103116.  Make sure we peel for
-gaps if necessary and sufficient and give up if not.  */
+gaps if necessary and sufficient and give up if not.
+If there is a combination of the access not covering the full vector and
+a gap recorded then we may need to peel twice.  */
  if (loop_vinfo
  && *memory_access_type == VMAT_CONTIGUOUS
  && SLP_TREE_LOAD_PERMUTATION (slp_node).exists ()
@@ -2233,7 +2235,7 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
 access excess elements.
 ???  Enhancements include peeling multiple iterations
 or using masked loads with a static mask.  */
-   

Turn TODO_rebuild_frequencies to a pass

2023-07-14 Thread Jan Hubicka via Gcc-patches
Hi,
currently we rebuild profile_counts from profile_probability after inlining,
because there is a chance that producing large loop nests may get
unrealistically large profile_count values.  This is much less of a concern
since we switched to the new profile_count representation a while back.

This propagation can also compensate for profile inconsistencies caused by
optimization passes.  Since the inliner is followed by basic cleanup passes
that do not use the profile, we get a more realistic profile by delaying the
recomputation until the basic optimizations exposed by inlining are finished.

This does not fit into the TODO machinery, so I turn rebuilding into a
standalone pass and schedule it before the first consumer of the profile in
the optimization queue.

I also added logic that avoids repropagating when the CFG is good and not too
close to overflow.  Propagating visits every basic block loop_depth times, so
it is not linear and avoiding it may help a bit.

On tramp3d we get 14 functions repropagated and 916 are OK.  The repropagated
functions are RB tree ones where we produce crazy loop nests by recursive
inlining.  This is something to fix independently.

Bootstrapped/regtested x86_64-linux.  Plan to commit it later today
if there are no complaints.

Honza

gcc/ChangeLog:

* passes.cc (execute_function_todo): Remove
TODO_rebuild_frequencies
* passes.def: Add rebuild_frequencies pass.
* predict.cc (estimate_bb_frequencies): Drop
force parameter.
(tree_estimate_probability): Update call of
estimate_bb_frequencies.
(rebuild_frequencies): Turn into a pass; verify CFG profile consistency
first and do not rebuild if not necessary.
(class pass_rebuild_frequencies): New.
(make_pass_rebuild_frequencies): New.
* profile-count.h: Add profile_count::very_large_p.
* tree-inline.cc (optimize_inline_calls): Do not return
TODO_rebuild_frequencies
* tree-pass.h (TODO_rebuild_frequencies): Remove.
(make_pass_rebuild_frequencies): Declare.

diff --git a/gcc/passes.cc b/gcc/passes.cc
index 2f0e378b8b2..d7b0ad271a1 100644
--- a/gcc/passes.cc
+++ b/gcc/passes.cc
@@ -2075,9 +2075,6 @@ execute_function_todo (function *fn, void *data)
   if (flags & TODO_remove_unused_locals)
 remove_unused_locals ();
 
-  if (flags & TODO_rebuild_frequencies)
-rebuild_frequencies ();
-
   if (flags & TODO_rebuild_cgraph_edges)
 cgraph_edge::rebuild_edges ();
 
diff --git a/gcc/passes.def b/gcc/passes.def
index faa5208b26b..f2893ae8a8b 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -206,6 +206,10 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_post_ipa_warn);
   /* Must run before loop unrolling.  */
   NEXT_PASS (pass_warn_access, /*early=*/true);
+  /* Profile count may overflow as a result of inlinining very large
+ loop nests.  This pass should run before any late pass that makes
+use of profile.  */
+  NEXT_PASS (pass_rebuild_frequencies);
   NEXT_PASS (pass_complete_unrolli);
   NEXT_PASS (pass_backprop);
   NEXT_PASS (pass_phiprop);
@@ -395,6 +399,10 @@ along with GCC; see the file COPYING3.  If not see
  to forward object-size and builtin folding results properly.  */
   NEXT_PASS (pass_copy_prop);
   NEXT_PASS (pass_dce);
+  /* Profile count may overflow as a result of inlinining very large
+ loop nests.  This pass should run before any late pass that makes
+use of profile.  */
+  NEXT_PASS (pass_rebuild_frequencies);
   NEXT_PASS (pass_sancov);
   NEXT_PASS (pass_asan);
   NEXT_PASS (pass_tsan);
diff --git a/gcc/predict.cc b/gcc/predict.cc
index 1aa4c25eb70..26f9f3f6a88 100644
--- a/gcc/predict.cc
+++ b/gcc/predict.cc
@@ -89,7 +89,7 @@ static void predict_paths_leading_to_edge (edge, enum br_predictor,
 static bool can_predict_insn_p (const rtx_insn *);
 static HOST_WIDE_INT get_predictor_value (br_predictor, HOST_WIDE_INT);
 static void determine_unlikely_bbs ();
-static void estimate_bb_frequencies (bool force);
+static void estimate_bb_frequencies ();
 
 /* Information we hold about each branch predictor.
Filled using information from predict.def.  */
@@ -3169,8 +3169,9 @@ tree_estimate_probability (bool dry_run)
   delete bb_predictions;
   bb_predictions = NULL;
 
-  if (!dry_run)
-estimate_bb_frequencies (false);
+  if (!dry_run
+  && profile_status_for_fn (cfun) != PROFILE_READ)
+estimate_bb_frequencies ();
   free_dominance_info (CDI_POST_DOMINATORS);
   remove_fake_exit_edges ();
 }
@@ -3923,103 +3924,97 @@ determine_unlikely_bbs ()
 }
 
 /* Estimate and propagate basic block frequencies using the given branch
-   probabilities.  If FORCE is true, the frequencies are used to estimate
-   the counts even when there are already non-zero profile counts.  */
+   probabilities.  */
 
 static void
-estimate_bb_frequencies (bool force)
+estimate_bb_frequencies ()
 {
   basic_bloc

Re: [PATCH V4] Optimize '(X - N * M) / N' to 'X / N - M' if valid

2023-07-14 Thread Aldy Hernandez via Gcc-patches




On 7/14/23 15:37, Richard Biener wrote:

On Fri, 14 Jul 2023, Aldy Hernandez wrote:


I don't know what you're trying to accomplish here, as I haven't been
following the PR, but adding all these helper functions to the ranger header
file seems wrong, especially since there's only one use of them. I see you're
tweaking the irange API, adding helper functions to range-op (which is only
for code dealing with implementing range operators for tree codes), etc etc.

If you need these helper functions, I suggest you put them closer to their
uses (i.e. wherever the match.pd support machinery goes).


Note I suggested the opposite because I thought these kinds of helpers
are closer to value-range support than to match.pd.


Oh sorry, I missed that.



But I take away from your answer that there's nothing close in the
value-range machinery that answers the question whether A op B may
overflow?


Not currently.

I vaguely recall we talked about some mechanism for doing range 
operations in a wider precision and comparing them with the result of 
doing it in the natural precision, and if the results differ, it must 
have overflowed.
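The wider-precision idea can be sketched for 32-bit ranges in plain C: evaluate the bounds in 64 bits and report possible overflow if any bound is not representable in 32 bits, which is exactly when the natural-precision result would differ. This is not the ranger or range-op API; `struct range32`, `plus_may_overflow` and `mult_may_overflow` are hypothetical, and a real implementation would presumably work on wide_int and ranges with multiple sub-ranges.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A closed interval [lo, hi] of 32-bit signed values.  */
struct range32 { int32_t lo, hi; };

static bool
outside_int32 (int64_t v)
{
  return v < INT32_MIN || v > INT32_MAX;
}

/* Addition: the extremes are lo + lo and hi + hi.  If both fit back into
   32 bits, every sum in between does too.  */
static bool
plus_may_overflow (struct range32 a, struct range32 b)
{
  int64_t lo = (int64_t) a.lo + b.lo;
  int64_t hi = (int64_t) a.hi + b.hi;
  return outside_int32 (lo) || outside_int32 (hi);
}

/* Multiplication: the extremes are among the four corner products, all of
   which fit in 64 bits for 32-bit inputs.  */
static bool
mult_may_overflow (struct range32 a, struct range32 b)
{
  int64_t p[4] = {
    (int64_t) a.lo * b.lo, (int64_t) a.lo * b.hi,
    (int64_t) a.hi * b.lo, (int64_t) a.hi * b.hi
  };
  for (int i = 0; i < 4; i++)
    if (outside_int32 (p[i]))
      return true;
  return false;
}
```

The appeal of this shape is that each operator only needs its ordinary range computation run at a wider precision, rather than a hand-written overflow check per range-op entry.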


*hunts down PR*

Comment 23 here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100499#c23

Would something like that work?

I would prefer something more general, rather than having to re-invent 
every range-op entry to check for overflow.


Aldy



Re: [PATCH V4] Optimize '(X - N * M) / N' to 'X / N - M' if valid

2023-07-14 Thread Richard Biener via Gcc-patches
On Fri, 14 Jul 2023, Aldy Hernandez wrote:

> I don't know what you're trying to accomplish here, as I haven't been
> following the PR, but adding all these helper functions to the ranger header
> file seems wrong, especially since there's only one use of them. I see you're
> tweaking the irange API, adding helper functions to range-op (which is only
> for code dealing with implementing range operators for tree codes), etc etc.
> 
> If you need these helper functions, I suggest you put them closer to their
> uses (i.e. wherever the match.pd support machinery goes).

Note I suggested the opposite because I thought these kinds of helpers
are closer to value-range support than to match.pd.

But I take away from your answer that there's nothing close in the 
value-range machinery that answers the question whether A op B may
overflow?

Richard.

> Aldy
> 
> On 7/11/23 11:04, Jiufu Guo wrote:
> > Hi,
> > 
> > Integer expression "(X - N * M) / N" can be optimized to "X / N - M"
> > if there is no wrap/overflow/underflow and "X - N * M" has the same
> > sign as "X".
> > 
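A worked illustration of why the sign condition matters: C integer division truncates toward zero, so when "X - N * M" and "X" have different signs the two forms can round differently even without any overflow. `lhs` and `rhs` are illustrative names for the two sides of the pattern, using small values that cannot overflow.

```c
#include <assert.h>

/* Original expression; C division truncates toward zero.  */
static int
lhs (int x, int n, int m)
{
  return (x - n * m) / n;
}

/* Folded form proposed by the pattern.  */
static int
rhs (int x, int n, int m)
{
  return x / n - m;
}
```

With X = 17, N = 2, M = 5 both sides give 3, but with X = 7 the intermediate 7 - 10 = -3 is negative while X is positive, and the results differ (-1 versus -2).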
> > Compared to the previous version:
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-June/623028.html
> > - The APIs for checking overflow of range operation are moved to
> > other files: range-op and gimple-range.
> > - Improve the patterns with '(X + C)' for unsigned type.
> > 
> > Bootstrap & regtest pass on ppc64{,le} and x86_64.
> > Is this patch ok for trunk?
> > 
> > BR,
> > Jeff (Jiufu Guo)
> > 
> > 
> >  PR tree-optimization/108757
> > 
> > gcc/ChangeLog:
> > 
> >  * gimple-range.cc (arith_without_overflow_p): New function.
> >  (same_sign_p): New function.
> >  * gimple-range.h (arith_without_overflow_p): New declare.
> >  (same_sign_p): New declare.
> >  * match.pd ((X - N * M) / N): New pattern.
> >  ((X + N * M) / N): New pattern.
> >  ((X + C) div_rshift N): New pattern.
> >  * range-op.cc (plus_without_overflow_p): New function.
> >  (minus_without_overflow_p): New function.
> >  (mult_without_overflow_p): New function.
> >  * range-op.h (plus_without_overflow_p): New declare.
> >  (minus_without_overflow_p): New declare.
> >  (mult_without_overflow_p): New declare.
> >  * value-query.h (get_range): New function
> >  * value-range.cc (irange::nonnegative_p): New function.
> >  (irange::nonpositive_p): New function.
> >  * value-range.h (irange::nonnegative_p): New declare.
> >  (irange::nonpositive_p): New declare.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> >  * gcc.dg/pr108757-1.c: New test.
> >  * gcc.dg/pr108757-2.c: New test.
> >  * gcc.dg/pr108757.h: New test.
> > 
> > ---
> >   gcc/gimple-range.cc   |  50 +++
> >   gcc/gimple-range.h|   2 +
> >   gcc/match.pd  |  64 
> >   gcc/range-op.cc   |  77 ++
> >   gcc/range-op.h|   4 +
> >   gcc/value-query.h |  10 ++
> >   gcc/value-range.cc|  12 ++
> >   gcc/value-range.h |   2 +
> >   gcc/testsuite/gcc.dg/pr108757-1.c |  18 +++
> >   gcc/testsuite/gcc.dg/pr108757-2.c |  19 +++
> >   gcc/testsuite/gcc.dg/pr108757.h   | 233 ++
> >   11 files changed, 491 insertions(+)
> >   create mode 100644 gcc/testsuite/gcc.dg/pr108757-1.c
> >   create mode 100644 gcc/testsuite/gcc.dg/pr108757-2.c
> >   create mode 100644 gcc/testsuite/gcc.dg/pr108757.h
> > 
> > diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
> > index 01e62d3ff3901143bde33dc73c0debf41d0c0fdd..620fe32e85e5fe3847a933554fc656b2939cf02d 100644
> > --- a/gcc/gimple-range.cc
> > +++ b/gcc/gimple-range.cc
> > @@ -926,3 +926,53 @@ assume_query::dump (FILE *f)
> >   }
> > fprintf (f, "--\n");
> >   }
> > +
> > +/* Return true if the operation "X CODE Y" in type does not overflow
> > +   underflow or wrap with value range info, otherwise return false.  */
> > +
> > +bool
> > +arith_without_overflow_p (tree_code code, tree x, tree y, tree type)
> > +{
> > +  gcc_assert (INTEGRAL_TYPE_P (type));
> > +
> > +  if (TYPE_OVERFLOW_UNDEFINED (type))
> > +return true;
> > +
> > +  value_range vr0;
> > +  value_range vr1;
> > +  if (!(get_range (vr0, x) && get_range (vr1, y)))
> > +return false;
> > +
> > +  switch (code)
> > +{
> > +case PLUS_EXPR:
> > +  return plus_without_overflow_p (vr0, vr1, type);
> > +case MINUS_EXPR:
> > +  return minus_without_overflow_p (vr0, vr1, type);
> > +case MULT_EXPR:
> > +  return mult_without_overflow_p (vr0, vr1, type);
> > +default:
> > +  gcc_unreachable ();
> > +}
> > +
> > +  return false;
> > +}
> > +
> > +/* Return true if "X" and "Y" have the same sign or zero.  */
> > +
> > +bool
> > +same_sign_p (tree x, tree y, tree type)
> > +{
> > +  gcc_assert (INTEGRAL_TYPE_P (type));
> > +
> > +  if (TYPE_UNSIGNED (type))
> > +return true;
> > +
> > +  value_range vr0;
> > +  value_range vr1;
> > +  if (!(get_range (vr0, x) && get_range (

RE: [PATCH 12/19]middle-end: implement loop peeling and IV updates for early break.

2023-07-14 Thread Richard Biener via Gcc-patches
On Thu, 13 Jul 2023, Tamar Christina wrote:

> > -----Original Message-----
> > From: Richard Biener 
> > Sent: Thursday, July 13, 2023 6:31 PM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> > Subject: Re: [PATCH 12/19]middle-end: implement loop peeling and IV
> > updates for early break.
> > 
> > On Wed, 28 Jun 2023, Tamar Christina wrote:
> > 
> > > Hi All,
> > >
> > > This patch updates the peeling code to maintain LCSSA during peeling.
> > > The rewrite also naturally takes into account multiple exits and so it
> > > didn't make sense to split them off.
> > >
> > > For the purposes of peeling the only change for multiple exits is that the
> > > secondary exits are all wired to the start of the new loop preheader when
> > > doing epilogue peeling.
> > >
> > > When doing prologue peeling the CFG is kept intact.
> > >
> > > For both epilogue and prologue peeling we wire through between the two
> > > loops any PHI nodes that escape the first loop into the second loop if
> > > flow_loops is specified.  The reason for this conditionality is that
> > > slpeel_tree_duplicate_loop_to_edge_cfg is used in the compiler in 3 ways:
> > >   - prologue peeling
> > >   - epilogue peeling
> > >   - loop distribution
> > >
> > > For the last case the loops should remain independent, and so not be
> > > connected.  Because only used PHI nodes are propagated, get_current_def
> > > can be used to easily find the previous definitions.  However live
> > > statements that are not used inside the loop itself are not propagated
> > > (since if unused, the moment we add the guard in between the two loops
> > > the value across the bypass edge can be wrong if the loop has been
> > > peeled.)
> > >
> > > This is dealt with easily enough in find_guard_arg.
> > >
> > > For multiple exits, while we are in LCSSA form, and have a correct DOM
> > > tree, the moment we add the guard block we will change the dominators
> > > again.  To deal with this slpeel_tree_duplicate_loop_to_edge_cfg can
> > > optionally return the blocks to update without having to recompute the
> > > list of blocks to update again.
> > >
> > > With multiple exits and epilogue peeling we will also temporarily have
> > > an incorrect VUSES chain for the secondary exits, as it anticipates the
> > > final result after the VDEFs have been moved.  This will thus be
> > > corrected once the code motion is applied.
> > >
> > > Lastly, by doing things this way we can remove the helper functions that
> > > previously did lock-step iterations to update things as they went along.
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > >
> > > Ok for master?
> > 
> > Not sure if I get through all of this in one go - so be prepared that
> > the rest of the review follows another day.
> 
> No worries, I appreciate the reviews!
> Just giving some quick replies for when you continue.

Continuing.

> > 
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > >   * tree-loop-distribution.cc (copy_loop_before): Pass flow_loops = false.
> > >   * tree-ssa-loop-niter.cc (loop_only_exit_p):  Fix bug when exit==null.
> > >   * tree-vect-loop-manip.cc (adjust_phi_and_debug_stmts): Add additional
> > >   assert.
> > >   (vect_set_loop_condition_normal): Skip modifying loop IV for multiple
> > >   exits.
> > >   (slpeel_tree_duplicate_loop_to_edge_cfg): Support multiple exit peeling.
> > >   (slpeel_can_duplicate_loop_p): Likewise.
> > >   (vect_update_ivs_after_vectorizer): Don't enter this...
> > >   (vect_update_ivs_after_early_break): ...but instead enter here.
> > >   (find_guard_arg): Update for new peeling code.
> > >   (slpeel_update_phi_nodes_for_loops): Remove.
> > >   (slpeel_update_phi_nodes_for_guard2): Remove hardcoded edge 0 checks.
> > >   (slpeel_update_phi_nodes_for_lcssa): Remove.
> > >   (vect_do_peeling): Fix VF for multiple exits and force epilogue.
> > >   * tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
> > >   non_break_control_flow and early_breaks.
> > >   (vect_need_peeling_or_partial_vectors_p): Force partial vector if
> > >   multiple exits and VLA.
> > >   (vect_analyze_loop_form): Support inner loop multiple exits.
> > >   (vect_create_loop_vinfo): Set LOOP_VINFO_EARLY_BREAKS.
> > >   (vect_create_epilog_for_reduction):  Update live phi nodes.
> > >   (vectorizable_live_operation): Ignore live operations in vector loop
> > >   when multiple exits.
> > >   (vect_transform_loop): Force unrolling for VF loops and multiple exits.
> > >   * tree-vect-stmts.cc (vect_stmt_relevant_p): Analyze ctrl statements.
> > >   (vect_mark_stmts_to_be_vectorized): Check for non-exit control flow and
> > >   analyze gcond params.
> > >   (vect_analyze_stmt): Support gcond.
> > >   * tree-vectorizer.cc (pass_vectorize::execute): Support multiple exits
> > >   in RPO pass.
> > >   * tree-vectoriz

[PATCH v1] RISC-V: Fix RVV frm run test failure on RV32

2023-07-14 Thread Pan Li via Gcc-patches
From: Pan Li 

Refine the run test case to avoid interactive checking on RV32 by
separating each check into its own function.
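The token-pasting technique the patch uses to stamp out one `noinline` function per rounding mode can be sketched without RVV. The `fake_mode` variable and the `set_mode`/`get_mode` helpers are hypothetical stand-ins for the frm accessors in the real test, and the second `set_mode` call stands in for the `_rm` intrinsic, after which the test expects frm to hold the requested mode.

```c
#include <assert.h>

/* Hypothetical stand-in for the frm register; the real test reads and
   writes the hardware register in its get_frm/set_frm helpers.  */
static int fake_mode;

static void set_mode (int mode) { fake_mode = mode; }
static int get_mode (void) { return fake_mode; }

/* One noinline function per mode value, named run_mode_<MODE> via token
   pasting, in the style of DEFINE_TEST_FRM_FUNC.  */
#define DEFINE_MODE_FUNC(MODE)                           \
  int __attribute__ ((noinline)) run_mode_##MODE (void)  \
  {                                                      \
    set_mode (0);                                        \
    set_mode (MODE); /* stands in for the _rm call */    \
    return get_mode ();                                  \
  }

#define CALL_MODE_FUNC(MODE) run_mode_##MODE ()

DEFINE_MODE_FUNC (0)
DEFINE_MODE_FUNC (1)
DEFINE_MODE_FUNC (2)
DEFINE_MODE_FUNC (3)
DEFINE_MODE_FUNC (4)
```

Each check then lives in its own call, so a failure message identifies the single mode under test.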

Signed-off-by: Pan Li 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-frm-run-1.c: Fix failure
on RV32.
---
 .../riscv/rvv/base/float-point-frm-run-1.c| 58 ++-
 1 file changed, 31 insertions(+), 27 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-run-1.c
index 210c49c5e8d..1d90b4f50d9 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-run-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-run-1.c
@@ -5,6 +5,24 @@
 #include 
 #include 
 
+#define DEFINE_TEST_FRM_FUNC(FRM) \
+vfloat32m1_t __attribute__ ((noinline)) \
+test_float_point_frm_run_##FRM (vfloat32m1_t op1, vfloat32m1_t op2, size_t vl) \
+{ \
+  vfloat32m1_t result; \
+ \
+  set_frm (0); \
+ \
+  result = __riscv_vfadd_vv_f32m1_rm (op1, result, FRM, vl); \
+ \
+  assert_equal (FRM, get_frm (), "The value of frm should be " #FRM "."); \
+ \
+  return result; \
+}
+
+#define CALL_TEST_FUNC(FRM, op1, op2, vl) \
+  test_float_point_frm_run_##FRM (op1, op2, vl)
+
 static int
 get_frm ()
 {
@@ -31,40 +49,22 @@ set_frm (int frm)
   );
 }
 
-static inline void
+void __attribute__ ((noinline)) \
 assert_equal (int a, int b, char *message)
 {
   if (a != b)
 {
-  printf (message);
+  fprintf (stdout, message);
+  fflush (stdout);
   __builtin_abort ();
 }
 }
 
-vfloat32m1_t __attribute__ ((noinline))
-test_float_point_frm_run (vfloat32m1_t op1, vfloat32m1_t op2, size_t vl)
-{
-  set_frm (0);
-
-  vfloat32m1_t result;
-
-  result = __riscv_vfadd_vv_f32m1_rm (op1, result, 1, vl);
-  assert_equal (1, get_frm (), "The value of frm register should be 1.");
-
-  result = __riscv_vfadd_vv_f32m1_rm (op1, result, 2, vl);
-  assert_equal (2, get_frm (), "The value of frm register should be 2.");
-
-  result = __riscv_vfadd_vv_f32m1_rm (op1, result, 3, vl);
-  assert_equal (3, get_frm (), "The value of frm register should be 3.");
-
-  result = __riscv_vfadd_vv_f32m1_rm (op1, result, 4, vl);
-  assert_equal (4, get_frm (), "The value of frm register should be 4.");
-
-  result = __riscv_vfadd_vv_f32m1_rm (op1, result, 0, vl);
-  assert_equal (0, get_frm (), "The value of frm register should be 0.");
-
-  return result;
-}
+DEFINE_TEST_FRM_FUNC (0)
+DEFINE_TEST_FRM_FUNC (1)
+DEFINE_TEST_FRM_FUNC (2)
+DEFINE_TEST_FRM_FUNC (3)
+DEFINE_TEST_FRM_FUNC (4)
 
 int
 main ()
@@ -73,7 +73,11 @@ main ()
   vfloat32m1_t op1;
   vfloat32m1_t op2;
 
-  test_float_point_frm_run (op1, op2, vl);
+  CALL_TEST_FUNC (0, op1, op2, vl);
+  CALL_TEST_FUNC (1, op1, op2, vl);
+  CALL_TEST_FUNC (2, op1, op2, vl);
+  CALL_TEST_FUNC (3, op1, op2, vl);
+  CALL_TEST_FUNC (4, op1, op2, vl);
 
   return 0;
 }
-- 
2.34.1



RE: [PATCH V2] RISC-V: Enable COND_LEN_FMA auto-vectorization

2023-07-14 Thread Li, Pan2 via Gcc-patches
Committed, thanks Kito and Robin.

Pan

-Original Message-
From: Gcc-patches On Behalf Of Kito Cheng via Gcc-patches
Sent: Friday, July 14, 2023 8:33 PM
To: Robin Dapp 
Cc: Juzhe-Zhong ; GCC Patches ; Kito Cheng ; Palmer Dabbelt ; Jeff Law
Subject: Re: [PATCH V2] RISC-V: Enable COND_LEN_FMA auto-vectorization

LGTM

Robin Dapp via Gcc-patches wrote on Fri, Jul 14, 2023 at 15:05:

> Hi Juzhe,
>
> thanks, looks good to me now - did before already actually ;).
>
> Regards
>  Robin
>


Re: Re: [PATCH] RISC-V: Support non-SLP unordered reduction

2023-07-14 Thread 钟居哲
So to be safe, I think it should be backported to GCC 13, even though I don't
have an intrinsic test case that reproduces it.



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-07-14 20:38
To: 钟居哲
CC: GCC Patches; Kito Cheng; Palmer Dabbelt; Robin Dapp; Jeff Law
Subject: Re: [PATCH] RISC-V: Support non-SLP unordered reduction


 wrote on Fri, Jul 14, 2023 at 20:31:
From: Ju-Zhe Zhong 

This patch adds reduc_*_scal patterns to support reduction auto-vectorization.

Use COND_LEN_* + reduc_*_scal to support unordered non-SLP auto-vectorization.

Consider the following case:
int __attribute__((noipa))
and_loop (int32_t * __restrict x,
          int32_t n, int res)
{
  for (int i = 0; i < n; ++i)
    res &= x[i];
  return res;
}

ASM:
and_loop:
ble a1,zero,.L4
vsetvli a3,zero,e32,m1,ta,ma
vmv.v.i v1,-1
.L3:
vsetvli a5,a1,e32,m1,tu,ma   > MUST BE "TU".
slli a4,a5,2
sub a1,a1,a5
vle32.v v2,0(a0)
add a0,a0,a4
vand.vv v1,v2,v1
bne a1,zero,.L3
vsetivli zero,1,e32,m1,ta,ma
vmv.v.i v2,-1
vsetvli a3,zero,e32,m1,ta,ma
vredand.vs  v1,v1,v2
vmv.x.s a5,v1
and a0,a2,a5
ret
.L4:
mv  a0,a2
ret

Fix a bug in the VSETVL pass that is exposed by the reduction test cases.


Is it a performance bug or a correctness bug? If it's a correctness bug,
does it also appear in GCC 13?


SLP reduction and floating-point in-order reduction are not supported yet.

gcc/ChangeLog:

* config/riscv/autovec.md (reduc_plus_scal_): New pattern.
(reduc_smax_scal_): Ditto.
(reduc_umax_scal_): Ditto.
(reduc_smin_scal_): Ditto.
(reduc_umin_scal_): Ditto.
(reduc_and_scal_): Ditto.
(reduc_ior_scal_): Ditto.
(reduc_xor_scal_): Ditto.
* config/riscv/riscv-protos.h (enum insn_type): New enum.
(emit_nonvlmax_integer_move_insn): Add reduction.
(expand_reduction): New function.
* config/riscv/riscv-v.cc (emit_vlmax_reduction_insn): Ditto.
(emit_vlmax_fp_reduction_insn): Ditto.
(get_m1_mode): Ditto.
(expand_cond_len_binop): Fix name.
(expand_reduction): New function.
* config/riscv/riscv-vsetvl.cc (gen_vsetvl_pat): Fix bug.
(change_insn): Ditto.
(change_vsetvl_insn): Ditto.
(pass_vsetvl::backward_demand_fusion): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp: Add reduction tests.
* gcc.target/riscv/rvv/autovec/reduc/reduc-1.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc-2.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc-3.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc-4.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_run-4.c: New test.

---
 gcc/config/riscv/autovec.md   | 138 ++
 gcc/config/riscv/riscv-protos.h   |   3 +
 gcc/config/riscv/riscv-v.cc   |  84 ++-
 gcc/config/riscv/riscv-vsetvl.cc  |  28 +++-
 .../riscv/rvv/autovec/reduc/reduc-1.c | 118 +++
 .../riscv/rvv/autovec/reduc/reduc-2.c | 129 
 .../riscv/rvv/autovec/reduc/reduc-3.c |  65 +
 .../riscv/rvv/autovec/reduc/reduc-4.c |  59 
 .../riscv/rvv/autovec/reduc/reduc_run-1.c |  56 +++
 .../riscv/rvv/autovec/reduc/reduc_run-2.c |  79 ++
 .../riscv/rvv/autovec/reduc/reduc_run-3.c |  49 +++
 .../riscv/rvv/autovec/reduc/reduc_run-4.c |  66 +
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|   2 +
 13 files changed, 868 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-4.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 0476b1dea45..a74f66f41ac 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1531,3 +1531,141 @@
   riscv_vector::expand_cond_len_binop (, operands);
   DONE;
 })
+
+;; =
+;; == Reductions
+;; 

Re: Re: [PATCH] RISC-V: Support non-SLP unordered reduction

2023-07-14 Thread 钟居哲
>> Is it a performance bug or a correctness bug? If it's a correctness bug,
>> does it also appear in GCC 13?

It's a correctness bug.

The bug is as follows:

vsetvli zero, 1, e16, m1, ta, ma  -> the VSETVL pass detects that this can be fused as
"t1,zero,e16,m2,ta,ma", but the fusion fails in change_insn
vmv.s.x v1,a5
...
vsetvli t1,zero,e16,m2,ta,ma  -> elided 
vlse16.v v2...

So finally, we end up with:

vsetvli zero, 1, e16, m1, ta, ma 
vmv.s.x v1,a5
...
vlse16.v v2...

which is incorrect.
I tried to reproduce this situation with intrinsics but failed.
It seems that it can only be reproduced by reduction auto-vectorization.



juzhe.zh...@rivai.ai
 

Re: [PATCH] RISC-V: Support non-SLP unordered reduction

2023-07-14 Thread Kito Cheng via Gcc-patches
 wrote on Fri, Jul 14, 2023 at 20:31:

> From: Ju-Zhe Zhong 
>
> This patch adds reduc_*_scal patterns to support reduction auto-vectorization.
>
> Use COND_LEN_* + reduc_*_scal to support unordered non-SLP
> auto-vectorization.
>
> Consider the following case:
> int __attribute__((noipa))
> and_loop (int32_t * __restrict x,
>           int32_t n, int res)
> {
>   for (int i = 0; i < n; ++i)
>     res &= x[i];
>   return res;
> }
>
> ASM:
> and_loop:
> ble a1,zero,.L4
> vsetvli a3,zero,e32,m1,ta,ma
> vmv.v.i v1,-1
> .L3:
> vsetvli a5,a1,e32,m1,tu,ma   > MUST BE "TU".
> slli a4,a5,2
> sub a1,a1,a5
> vle32.v v2,0(a0)
> add a0,a0,a4
> vand.vv v1,v2,v1
> bne a1,zero,.L3
> vsetivli zero,1,e32,m1,ta,ma
> vmv.v.i v2,-1
> vsetvli a3,zero,e32,m1,ta,ma
> vredand.vs  v1,v1,v2
> vmv.x.s a5,v1
> and a0,a2,a5
> ret
> .L4:
> mv  a0,a2
> ret
>
> Fix a bug in the VSETVL pass that is exposed by the reduction test cases.
>


Is it a performance bug or a correctness bug? If it's a correctness bug,
does it also appear in GCC 13?


> SLP reduction and floating-point in-order reduction are not supported yet.
>
> gcc/ChangeLog:
>
> * config/riscv/autovec.md (reduc_plus_scal_): New pattern.
> (reduc_smax_scal_): Ditto.
> (reduc_umax_scal_): Ditto.
> (reduc_smin_scal_): Ditto.
> (reduc_umin_scal_): Ditto.
> (reduc_and_scal_): Ditto.
> (reduc_ior_scal_): Ditto.
> (reduc_xor_scal_): Ditto.
> * config/riscv/riscv-protos.h (enum insn_type): New enum.
> (emit_nonvlmax_integer_move_insn): Add reduction.
> (expand_reduction): New function.
> * config/riscv/riscv-v.cc (emit_vlmax_reduction_insn): Ditto.
> (emit_vlmax_fp_reduction_insn): Ditto.
> (get_m1_mode): Ditto.
> (expand_cond_len_binop): Fix name.
> (expand_reduction): New function.
> * config/riscv/riscv-vsetvl.cc (gen_vsetvl_pat): Fix bug.
> (change_insn): Ditto.
> (change_vsetvl_insn): Ditto.
> (pass_vsetvl::backward_demand_fusion): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/rvv.exp: Add reduction tests.
> * gcc.target/riscv/rvv/autovec/reduc/reduc-1.c: New test.
> * gcc.target/riscv/rvv/autovec/reduc/reduc-2.c: New test.
> * gcc.target/riscv/rvv/autovec/reduc/reduc-3.c: New test.
> * gcc.target/riscv/rvv/autovec/reduc/reduc-4.c: New test.
> * gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c: New test.
> * gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c: New test.
> * gcc.target/riscv/rvv/autovec/reduc/reduc_run-3.c: New test.
> * gcc.target/riscv/rvv/autovec/reduc/reduc_run-4.c: New test.
>
> ---
>  gcc/config/riscv/autovec.md   | 138 ++
>  gcc/config/riscv/riscv-protos.h   |   3 +
>  gcc/config/riscv/riscv-v.cc   |  84 ++-
>  gcc/config/riscv/riscv-vsetvl.cc  |  28 +++-
>  .../riscv/rvv/autovec/reduc/reduc-1.c | 118 +++
>  .../riscv/rvv/autovec/reduc/reduc-2.c | 129 
>  .../riscv/rvv/autovec/reduc/reduc-3.c |  65 +
>  .../riscv/rvv/autovec/reduc/reduc-4.c |  59 
>  .../riscv/rvv/autovec/reduc/reduc_run-1.c |  56 +++
>  .../riscv/rvv/autovec/reduc/reduc_run-2.c |  79 ++
>  .../riscv/rvv/autovec/reduc/reduc_run-3.c |  49 +++
>  .../riscv/rvv/autovec/reduc/reduc_run-4.c |  66 +
>  gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|   2 +
>  13 files changed, 868 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-3.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-4.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-3.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-4.c
>
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index 0476b1dea45..a74f66f41ac 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -1531,3 +1531,141 @@
>riscv_vector::expand_cond_len_binop (, operands);
>DONE;
>  })
> +
> +;; =
> +;; == Reductions
> +;; =
> +
> +;; -

Re: [PATCH V2] RISC-V: Enable COND_LEN_FMA auto-vectorization

2023-07-14 Thread Kito Cheng via Gcc-patches
LGTM

Robin Dapp via Gcc-patches wrote on Fri, Jul 14, 2023 at 15:05:

> Hi Juzhe,
>
> thanks, looks good to me now - did before already actually ;).
>
> Regards
>  Robin
>


Re: [PATCH] RISC-V: Remove the redundant expressions in the and3.

2023-07-14 Thread Kito Cheng via Gcc-patches
Committed :)

Jeff Law via Gcc-patches wrote on Fri, Jul 14, 2023 at 10:52:

>
>
> On 7/13/23 20:41, Kito Cheng via Gcc-patches wrote:
> > Expanding without DONE or FAIL will leave the pattern in place as well, so
> > this patch is fine IMO and LGTM; anyway, I will test it and commit it if
> > it passes :)
> Thanks.  It looked fine to me, but I wasn't going to have the time to
> commit/push it tonight.
>
> jeff
>


[PATCH] RISC-V: Support non-SLP unordered reduction

2023-07-14 Thread juzhe . zhong
From: Ju-Zhe Zhong 

This patch adds reduc_*_scal patterns to support reduction auto-vectorization.

Use COND_LEN_* + reduc_*_scal to support unordered non-SLP auto-vectorization.

Consider the following case:
int __attribute__((noipa))
and_loop (int32_t * __restrict x,
          int32_t n, int res)
{
  for (int i = 0; i < n; ++i)
    res &= x[i];
  return res;
}

ASM:
and_loop:
ble a1,zero,.L4
vsetvli a3,zero,e32,m1,ta,ma
vmv.v.i v1,-1
.L3:
vsetvli a5,a1,e32,m1,tu,ma   > MUST BE "TU".
slli a4,a5,2
sub a1,a1,a5
vle32.v v2,0(a0)
add a0,a0,a4
vand.vv v1,v2,v1
bne a1,zero,.L3
vsetivli zero,1,e32,m1,ta,ma
vmv.v.i v2,-1
vsetvli a3,zero,e32,m1,ta,ma
vredand.vs  v1,v1,v2
vmv.x.s a5,v1
and a0,a2,a5
ret
.L4:
mv  a0,a2
ret

Fix a bug in the VSETVL pass that is exposed by the reduction test cases.

SLP reduction and floating-point in-order reduction are not supported yet.
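A minimal scalar C model of the loop above shows why its vsetvli must use the tail-undisturbed policy. The model is hypothetical: VLMAX is fixed at an assumed 4 lanes, whereas real hardware determines it at run time, and acc[] stands in for v1.

```c
#include <stdint.h>

/* Assumed number of 32-bit lanes per vector register.  */
#define VLMAX 4

/* Scalar model of the strip-mined AND reduction emitted for and_loop.
   Each iteration updates only the first vl lanes of acc[] and leaves
   the rest untouched, which is what "tu" guarantees.  */
static int32_t
and_loop_model (const int32_t *x, int32_t n, int32_t res)
{
  int32_t acc[VLMAX];
  for (int lane = 0; lane < VLMAX; ++lane)
    acc[lane] = -1;                       /* vmv.v.i v1,-1 (AND identity).  */

  while (n > 0)
    {
      int vl = n < VLMAX ? (int) n : VLMAX; /* vsetvli a5,a1,...  */
      for (int lane = 0; lane < vl; ++lane)
        acc[lane] &= x[lane];             /* vand.vv on active lanes only.  */
      x += vl;
      n -= vl;
    }

  int32_t red = -1;                       /* Scalar seed, like v2 = -1.  */
  for (int lane = 0; lane < VLMAX; ++lane)
    red &= acc[lane];                     /* vredand.vs over all VLMAX lanes.  */
  return res & red;                       /* and a0,a2,a5.  */
}

/* Reference scalar loop from the patch description.  */
static int32_t
and_loop_ref (const int32_t *x, int32_t n, int32_t res)
{
  for (int32_t i = 0; i < n; ++i)
    res &= x[i];
  return res;
}
```

Under the tail-agnostic ("ta") policy, the tail lanes may instead be overwritten with all ones. In the final, shorter iteration this could discard the partial results that lanes at index vl and above carried from earlier iterations, and the reduction would be wrong.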

gcc/ChangeLog:

* config/riscv/autovec.md (reduc_plus_scal_): New pattern.
(reduc_smax_scal_): Ditto.
(reduc_umax_scal_): Ditto.
(reduc_smin_scal_): Ditto.
(reduc_umin_scal_): Ditto.
(reduc_and_scal_): Ditto.
(reduc_ior_scal_): Ditto.
(reduc_xor_scal_): Ditto.
* config/riscv/riscv-protos.h (enum insn_type): New enum.
(emit_nonvlmax_integer_move_insn): Add reduction.
(expand_reduction): New function.
* config/riscv/riscv-v.cc (emit_vlmax_reduction_insn): Ditto.
(emit_vlmax_fp_reduction_insn): Ditto.
(get_m1_mode): Ditto.
(expand_cond_len_binop): Fix name.
(expand_reduction): New function.
* config/riscv/riscv-vsetvl.cc (gen_vsetvl_pat): Fix bug.
(change_insn): Ditto.
(change_vsetvl_insn): Ditto.
(pass_vsetvl::backward_demand_fusion): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp: Add reduction tests.
* gcc.target/riscv/rvv/autovec/reduc/reduc-1.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc-2.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc-3.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc-4.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_run-4.c: New test.

---
 gcc/config/riscv/autovec.md   | 138 ++
 gcc/config/riscv/riscv-protos.h   |   3 +
 gcc/config/riscv/riscv-v.cc   |  84 ++-
 gcc/config/riscv/riscv-vsetvl.cc  |  28 +++-
 .../riscv/rvv/autovec/reduc/reduc-1.c | 118 +++
 .../riscv/rvv/autovec/reduc/reduc-2.c | 129 
 .../riscv/rvv/autovec/reduc/reduc-3.c |  65 +
 .../riscv/rvv/autovec/reduc/reduc-4.c |  59 
 .../riscv/rvv/autovec/reduc/reduc_run-1.c |  56 +++
 .../riscv/rvv/autovec/reduc/reduc_run-2.c |  79 ++
 .../riscv/rvv/autovec/reduc/reduc_run-3.c |  49 +++
 .../riscv/rvv/autovec/reduc/reduc_run-4.c |  66 +
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|   2 +
 13 files changed, 868 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/reduc/reduc_run-4.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 0476b1dea45..a74f66f41ac 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1531,3 +1531,141 @@
   riscv_vector::expand_cond_len_binop (, operands);
   DONE;
 })
+
+;; =
+;; == Reductions
+;; =
+
+;; -
+;;  [INT] Tree reductions
+;; -
+;; Includes:
+;; - vredsum.vs
+;; - vredmaxu.vs
+;; - vredmax.vs
+;; - vredminu.vs
+;; - vredmin.vs
+;; - vredand.vs
+;; - vredor.vs
+;; - vredxor.vs
+;; -
+
+(define_expan

Loop-ch improvements, part 3

2023-07-14 Thread Jan Hubicka via Gcc-patches
Hi,
loop-ch currently uses ranger to analyze all loops and identify
candidates.  This is followed by a phase where headers are duplicated (which
breaks SSA and ranger).  The second stage does more analysis (to see how
many BBs we want to duplicate) but can't use ranger, and thus misses
information about static conditionals.

This patch moves all analysis into the first stage.  We record how many
BBs to duplicate, and the second stage just duplicates as many as it is told.
This also makes it possible to extend the range query to basic
blocks that are not headers.  This is easy to do: we already do a
path-specific query, so we only need to extend the path by the headers we
decided to duplicate earlier.

This makes it possible to track situations where an exit is always
false in the first iteration, even for tests that are not in the original
loop header.  Doing so lets us update the profile better and use better
heuristics.  In particular, I changed the logic as follows:
  1) should_duplicate_loop_header_p counts the size of the duplicated region.
     When we know that a given conditional will be constant true or constant
     false, either in the duplicated region (by range query) or in the loop
     body after duplication (since it is loop invariant), we do not account
     it to code size costs.
  2) We don't need to account for loop invariant computations that will be
     duplicated, as they will become fully invariant
     (maybe we want to have some cap for register pressure eventually?).
  3) The optimize_size logic is now different.  Originally we started
     duplicating iff the first conditional was known to be true by a ranger
     query, but then we used the same limits as for -O2.

     I now simply lower the limits to 0.  This means that every conditional
     in the duplicated sequence must be either loop invariant or constant when
     duplicated, and we only duplicate statements computing loop invariants,
     which we account as size 0 anyway.
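Rules 1) to 3) can be sketched as a simple accounting loop. This assumes a simplified per-statement summary. The struct and function names are hypothetical and do not come from tree-ssa-loop-ch.cc, which works on GIMPLE and estimates sizes itself.

```c
#include <stdbool.h>

/* Hypothetical summary of one statement in a candidate header region.  */
struct hdr_stmt
{
  int size;          /* Estimated size of the statement.  */
  bool static_exit;  /* Conditional known constant in the copy, or invariant.  */
  bool invariant;    /* Computes a loop-invariant value.  */
};

/* Return true if duplicating the region fits within LIMIT.  Statements
   that will be optimized away after duplication cost nothing, so with
   LIMIT == 0 (the optimize_size case) only regions made entirely of
   such statements qualify.  */
static bool
should_duplicate_sketch (const struct hdr_stmt *stmts, int nstmts, int limit)
{
  int cost = 0;
  for (int i = 0; i < nstmts; i++)
    if (!stmts[i].static_exit && !stmts[i].invariant)
      cost += stmts[i].size;
  return cost <= limit;
}
```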

This makes the code more streamlined IMO (and will hopefully let us merge
bits of it with the loop peeling logic), but makes little difference in
practice.  The problem is that for the loop:

void test2();
void test(int n)
{
  for (int i = 0; n && i < 10; i++)
  test2();
}

We produce:
   [local count: 1073741824 freq: 9.090909]:
  # i_4 = PHI <0(2), i_9(3)>
  _1 = n_7(D) != 0;
  _2 = i_4 <= 9;
  _3 = _1 & _2;
  if (_3 != 0)
goto ; [89.00%]
  else
goto ; [11.00%]

and we do not understand that the final conditional is a combination of a
conditional that is always true in the first iteration and a conditional
that is loop invariant.
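At the source level, copying the header of the first example conceptually produces something like the following. This is a hand-written C sketch, not GCC's actual GIMPLE output. test2 is given a hypothetical body only so that the two versions can be compared.

```c
/* Counter standing in for the external test2 ().  */
static int calls;
static void test2 (void) { ++calls; }

/* The loop as written (mirrors test () above).  */
static void
test_original (int n)
{
  for (int i = 0; n && i < 10; i++)
    test2 ();
}

/* Equivalent form after copying the loop header once.  In the copy
   i == 0, so "i < 10" is known true and only the test of n remains.
   Inside the loop n never changes and is known nonzero, so the
   remaining exit test effectively depends only on i.  */
static void
test_header_copied (int n)
{
  int i = 0;
  if (n)
    do
      {
        test2 ();
        i++;
      }
    while (n && i < 10);
}
```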

This is also the case of
void test2();
void test(int n)
{
  for (int i = 0; n; i++)
{
  if (i > 10)
break;
  test2();
}
}
which ifcombine turns into the earlier case.

With ifcombine disabled, however, things work as expected.  This is something
I plan to handle incrementally.  However, extending the loop-ch and peeling
passes to understand such combined conditionals is still not good enough: by
the time ifcombine has merged the two conditionals, we have lost the profile
information on how often n is 0, so we can't recover a correct profile or know
the expected number of iterations after the transform.

Bootstrapped/regtested x86_64-linux, OK?

Honza


gcc/ChangeLog:

* tree-ssa-loop-ch.cc (edge_range_query): Take loop argument; be ready
for queries not in headers.
(static_loop_exit): Add basic block parameter; update use of
edge_range_query.
(should_duplicate_loop_header_p): Add ranger and static_exits
parameter.  Do not account statements that will be optimized
out after duplication in overall size.  Add ranger query to
find static exits.
(update_profile_after_ch): Take static_exits as a set instead of
single eliminated_edge.
(ch_base::copy_headers): Do all analysis in the first pass;
remember invariant_exits and static_exits.
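The new edge_range_query builds its path by walking single predecessors from the block containing the condition back to the loop header, then appending the header and the preheader. That is the same innermost-first order the old code used (e->dest before e->src). Below is a sketch on a toy CFG whose blocks are plain integers. The function is hypothetical and does not use GCC's basic_block API.

```c
#include <stddef.h>

/* Build the block path for a path-specific range query, innermost
   block first and preheader last.  single_pred[b] is assumed to be the
   unique predecessor of every block between COND_BB and HEADER.
   Returns the path length, or 0 if PATH (of MAX_LEN entries) is too
   small.  */
static size_t
build_reverse_path (const int *single_pred, int cond_bb, int header,
                    int preheader, int *path, size_t max_len)
{
  size_t len = 0;
  for (int bb = cond_bb; bb != header; bb = single_pred[bb])
    {
      if (len == max_len)
        return 0;
      path[len++] = bb;
    }
  if (len + 2 > max_len)
    return 0;
  path[len++] = header;
  path[len++] = preheader;
  return len;
}
```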

diff --git a/gcc/tree-ssa-loop-ch.cc b/gcc/tree-ssa-loop-ch.cc
index 24e7fbc805a..e0139cb432c 100644
--- a/gcc/tree-ssa-loop-ch.cc
+++ b/gcc/tree-ssa-loop-ch.cc
@@ -49,11 +49,13 @@ along with GCC; see the file COPYING3.  If not see
the range of the solved conditional in R.  */
 
 static void
-edge_range_query (irange &r, edge e, gcond *cond, gimple_ranger &ranger)
+edge_range_query (irange &r, class loop *loop, gcond *cond, gimple_ranger &ranger)
 {
-  auto_vec path (2);
-  path.safe_push (e->dest);
-  path.safe_push (e->src);
+  auto_vec path;
+  for (basic_block bb = gimple_bb (cond); bb != loop->header; bb = single_pred_edge (bb)->src)
+path.safe_push (bb);
+  path.safe_push (loop->header);
+  path.safe_push (loop_preheader_edge (loop)->src);
   path_range_query query (ranger, path);
   if (!query.range_of_stmt (r, cond))
 r.set_varying (boolean_type_node);
@@ -63,17 +65,16 @@ edge_range_query (irange &r, edge e, gcond *cond, gimple_ranger &ranger)
and NULL otherwise.  */
 
 static edge
-static_loop_exit (class loop *l, gimple_ranger *ra

[COMMITTED] bpf: enable instruction scheduling

2023-07-14 Thread Jose E. Marchesi via Gcc-patches


commit 53d12ecd624ec901d8449cfa1917f6f90e910927 (HEAD -> master, origin/master, origin/HEAD)
Author: Jose E. Marchesi 
Date:   Fri Jul 14 13:54:06 2023 +0200

bpf: enable instruction scheduling

This patch adds a dummy FSM to bpf.md in order to get INSN_SCHEDULING
defined.  If the latter is not defined, the `combine' pass generates
paradoxical subregs of mems, which then seem to be mishandled by LRA,
resulting in invalid code.

Tested in bpf-unknown-none.

gcc/ChangeLog:

2023-07-14  Jose E. Marchesi  

PR target/110657
* config/bpf/bpf.md: Enable instruction scheduling.

diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
index f6be0a21234..329f62f55c3 100644
--- a/gcc/config/bpf/bpf.md
+++ b/gcc/config/bpf/bpf.md
@@ -20,6 +20,17 @@
 (include "predicates.md")
 (include "constraints.md")
 
+ Instruction Scheduler FSM
+
+;; This is just to get INSN_SCHEDULING defined, so that combine does
+;; not make paradoxical subregs of memory.  These subregs seems to
+;; confuse LRA that ends generating wrong instructions.
+
+(define_automaton "frob")
+(define_cpu_unit "frob_unit" "frob")
+(define_insn_reservation "frobnicator" 814
+  (const_int 0) "frob_unit")
+
  Unspecs
 
 (define_c_enum "unspec" [


Re: [PATCH V4] Optimize '(X - N * M) / N' to 'X / N - M' if valid

2023-07-14 Thread Aldy Hernandez via Gcc-patches
I don't know what you're trying to accomplish here, as I haven't been 
following the PR, but adding all these helper functions to the ranger 
header file seems wrong, especially since there's only one use of them. 
I see you're tweaking the irange API, adding helper functions to 
range-op (which is only for code dealing with implementing range 
operators for tree codes), etc etc.


If you need these helper functions, I suggest you put them closer to 
their uses (i.e. wherever the match.pd support machinery goes).


Aldy

On 7/11/23 11:04, Jiufu Guo wrote:

Hi,

Integer expression "(X - N * M) / N" can be optimized to "X / N - M"
if there is no wrap/overflow/underflow and "X - N * M" has the same
sign as "X".
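The sign condition is needed because C division truncates towards zero. With X = 5, N = 3 and M = 2, the original expression is (5 - 6) / 3 = 0, while X / N - M = 1 - 2 = -1. Below is a small C check of the conditions on concrete values. The helper names are hypothetical, and the concrete values stand in for the value ranges the patch queries.

```c
#include <stdbool.h>
#include <stdint.h>

/* Both sides are evaluated in 64 bits so the demonstration itself
   cannot overflow; C division truncates towards zero.  */
static int64_t
orig_expr (int32_t x, int32_t n, int32_t m)
{
  return ((int64_t) x - (int64_t) n * m) / n;
}

static int64_t
folded_expr (int32_t x, int32_t n, int32_t m)
{
  return (int64_t) x / n - m;
}

/* N * M and X - N * M must fit in int32_t, and X - N * M must have the
   same sign as X (zero counts as either sign).  */
static bool
fold_is_valid (int32_t x, int32_t n, int32_t m)
{
  if (n == 0)
    return false;
  int64_t nm = (int64_t) n * m;
  int64_t d = (int64_t) x - nm;
  if (nm < INT32_MIN || nm > INT32_MAX || d < INT32_MIN || d > INT32_MAX)
    return false;
  return (x >= 0 && d >= 0) || (x <= 0 && d <= 0);
}
```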

Compared with the previous version:
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/623028.html
- The APIs for checking overflow of range operation are moved to
other files: range-op and gimple-range.
- Improve the patterns with '(X + C)' for unsigned type.

Bootstrap & regtest pass on ppc64{,le} and x86_64.
Is this patch ok for trunk?

BR,
Jeff (Jiufu Guo)


PR tree-optimization/108757

gcc/ChangeLog:

* gimple-range.cc (arith_without_overflow_p): New function.
(same_sign_p): New function.
* gimple-range.h (arith_without_overflow_p): New declare.
(same_sign_p): New declare.
* match.pd ((X - N * M) / N): New pattern.
((X + N * M) / N): New pattern.
((X + C) div_rshift N): New pattern.
* range-op.cc (plus_without_overflow_p): New function.
(minus_without_overflow_p): New function.
(mult_without_overflow_p): New function.
* range-op.h (plus_without_overflow_p): New declare.
(minus_without_overflow_p): New declare.
(mult_without_overflow_p): New declare.
* value-query.h (get_range): New function
* value-range.cc (irange::nonnegative_p): New function.
(irange::nonpositive_p): New function.
* value-range.h (irange::nonnegative_p): New declare.
(irange::nonpositive_p): New declare.

gcc/testsuite/ChangeLog:

* gcc.dg/pr108757-1.c: New test.
* gcc.dg/pr108757-2.c: New test.
* gcc.dg/pr108757.h: New test.
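The range-op helpers named in the ChangeLog (plus_without_overflow_p and friends) are not shown in this excerpt. Because arith_without_overflow_p returns true immediately for types with undefined overflow, the range check only matters for wrapping types. Below is a rough sketch of the interval reasoning such helpers could use. It assumes a wrapping 32-bit signed type and plain [lo, hi] intervals rather than GCC's irange, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical closed interval [lo, hi] of int32_t values.  */
struct range32
{
  int32_t lo, hi;
};

static bool
fits_int32 (int64_t v)
{
  return v >= INT32_MIN && v <= INT32_MAX;
}

/* X + Y stays in range for all X in A, Y in B iff the extreme sums do.  */
static bool
plus_no_overflow (struct range32 a, struct range32 b)
{
  return fits_int32 ((int64_t) a.lo + b.lo) && fits_int32 ((int64_t) a.hi + b.hi);
}

/* X - Y: the extremes are A.lo - B.hi and A.hi - B.lo.  */
static bool
minus_no_overflow (struct range32 a, struct range32 b)
{
  return fits_int32 ((int64_t) a.lo - b.hi) && fits_int32 ((int64_t) a.hi - b.lo);
}

/* X * Y: the extremes are among the four corner products, each of
   which fits in int64_t.  */
static bool
mult_no_overflow (struct range32 a, struct range32 b)
{
  return fits_int32 ((int64_t) a.lo * b.lo) && fits_int32 ((int64_t) a.lo * b.hi)
         && fits_int32 ((int64_t) a.hi * b.lo) && fits_int32 ((int64_t) a.hi * b.hi);
}
```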

---
  gcc/gimple-range.cc   |  50 +++
  gcc/gimple-range.h|   2 +
  gcc/match.pd  |  64 
  gcc/range-op.cc   |  77 ++
  gcc/range-op.h|   4 +
  gcc/value-query.h |  10 ++
  gcc/value-range.cc|  12 ++
  gcc/value-range.h |   2 +
  gcc/testsuite/gcc.dg/pr108757-1.c |  18 +++
  gcc/testsuite/gcc.dg/pr108757-2.c |  19 +++
  gcc/testsuite/gcc.dg/pr108757.h   | 233 ++
  11 files changed, 491 insertions(+)
  create mode 100644 gcc/testsuite/gcc.dg/pr108757-1.c
  create mode 100644 gcc/testsuite/gcc.dg/pr108757-2.c
  create mode 100644 gcc/testsuite/gcc.dg/pr108757.h

diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index 01e62d3ff3901143bde33dc73c0debf41d0c0fdd..620fe32e85e5fe3847a933554fc656b2939cf02d 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -926,3 +926,53 @@ assume_query::dump (FILE *f)
  }
fprintf (f, "--\n");
  }
+
+/* Return true if the operation "X CODE Y" in type does not overflow
+   underflow or wrap with value range info, otherwise return false.  */
+
+bool
+arith_without_overflow_p (tree_code code, tree x, tree y, tree type)
+{
+  gcc_assert (INTEGRAL_TYPE_P (type));
+
+  if (TYPE_OVERFLOW_UNDEFINED (type))
+return true;
+
+  value_range vr0;
+  value_range vr1;
+  if (!(get_range (vr0, x) && get_range (vr1, y)))
+return false;
+
+  switch (code)
+{
+case PLUS_EXPR:
+  return plus_without_overflow_p (vr0, vr1, type);
+case MINUS_EXPR:
+  return minus_without_overflow_p (vr0, vr1, type);
+case MULT_EXPR:
+  return mult_without_overflow_p (vr0, vr1, type);
+default:
+  gcc_unreachable ();
+}
+
+  return false;
+}
+
+/* Return true if "X" and "Y" have the same sign or zero.  */
+
+bool
+same_sign_p (tree x, tree y, tree type)
+{
+  gcc_assert (INTEGRAL_TYPE_P (type));
+
+  if (TYPE_UNSIGNED (type))
+return true;
+
+  value_range vr0;
+  value_range vr1;
+  if (!(get_range (vr0, x) && get_range (vr1, y)))
+return false;
+
+  return (vr0.nonnegative_p () && vr1.nonnegative_p ())
+|| (vr0.nonpositive_p () && vr1.nonpositive_p ());
+}
diff --git a/gcc/gimple-range.h b/gcc/gimple-range.h
index 6587e4923ff44e10826a697ecced237a0ad23c88..84eac87392b642ed3305011415c804f5b319e09f 100644
--- a/gcc/gimple-range.h
+++ b/gcc/gimple-range.h
@@ -101,5 +101,7 @@ protected:
gori_compute m_gori;
  };
  
+bool arith_without_overflow_p (tree_code code, tree x, tree y, tree type);

+bool same_sign_p (tree x, tree y, tree type);
  
  #endif // GCC_GIMPLE_RANGE_H

diff --git a/gcc/match.pd b/gcc/match.pd
index 8543f777a28e4f39b2b2a40d0702aed88786b

Re: [COMMITTED] [range-op] Take known set bits into account in popcount [PR107053]

2023-07-14 Thread Aldy Hernandez via Gcc-patches




On 7/12/23 23:50, Jeff Law wrote:



On 7/12/23 15:15, Aldy Hernandez via Gcc-patches wrote:

This patch teaches popcount about known set bits which are now
available in the irange.

PR tree-optimization/107053

gcc/ChangeLog:

* gimple-range-op.cc (cfn_popcount): Use known set bits.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr107053.c: New test.
You could probably play similar games with ctz/clz, though it's hard to 
know if it's worth the effort.


One way to find out might be to build jemalloc which uses those idioms 
heavily.  Similarly for deepsjeng from spec2017.


Jeff



See class cfn_clz and class cfn_ctz in gimple-range-op.cc.  There's 
already code for both of these, although they're a throwback from the VRP 
era, so there's definitely room for improvement.  I think they came from 
vr-values.cc.


Aldy
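To make the known-bits idea concrete, here is a minimal sketch. It assumes a hypothetical KnownBits summary (bits known to be one, plus the mask of bits that may be one) rather than GCC's irange interface. Under that assumption, popcount is bounded by the two population counts, and for a value known to be nonzero, ctz is bounded by the lowest possibly-set bit and the lowest known-set bit.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical summary of what is known about a 32-bit value x:
//   must_be_one: bits known to be set in x
//   may_be_one:  bits that might be set in x (must_be_one is a subset)
struct KnownBits {
  uint32_t must_be_one;
  uint32_t may_be_one;
};

// popcount(x) is at least the number of known-set bits and at most the
// number of possibly-set bits.
std::pair<int, int> popcount_range(KnownBits k) {
  return {__builtin_popcount(k.must_be_one), __builtin_popcount(k.may_be_one)};
}

// Requires k.must_be_one != 0, so x is known nonzero and ctz is defined.
// The lowest set bit of x cannot sit below the lowest possibly-set bit,
// and cannot sit above the lowest known-set bit.
std::pair<int, int> ctz_range(KnownBits k) {
  return {__builtin_ctz(k.may_be_one), __builtin_ctz(k.must_be_one)};
}
```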



[committed] libgomp.texi: Extend memory allocation documentation

2023-07-14 Thread Tobias Burnus

When looking at the documentation, I found some more gaps and a missing
cross-reference between OMP_ALLOCATOR and the 'Memory allocation' section.

Hence, I applied the attached patch as r14-2518-ga85a106c35c6d1

For more documentation tasks, see for instance: https://gcc.gnu.org/PR110364

Tobias
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
commit a85a106c35c6d1d9fd40627e149501e5e854bcc3
Author: Tobias Burnus 
Date:   Fri Jul 14 13:15:07 2023 +0200

libgomp.texi: Extend memory allocation documentation

libgomp/
* libgomp.texi (OMP_ALLOCATOR): Document the default values for
the traits. Add crossref to 'Memory allocation'.
(Memory allocation): Refer to OMP_ALLOCATOR for the available
traits and allocators/mem spaces; document the default value
for the pool_size trait.
---
 libgomp/libgomp.texi | 37 -
 1 file changed, 32 insertions(+), 5 deletions(-)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 1645cc0a2d3..639dd05eb7b 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -2005,7 +2005,7 @@ the @code{omp_set_default_allocator} API routine can be used to change
 value.
 
 @multitable @columnfractions .45 .45
-@headitem Predefined allocators @tab Predefined memory spaces
+@headitem Predefined allocators @tab Associated predefined memory spaces
 @item omp_default_mem_alloc @tab omp_default_mem_space
 @item omp_large_cap_mem_alloc   @tab omp_large_cap_mem_space
 @item omp_const_mem_alloc   @tab omp_const_mem_space
@@ -2016,22 +2016,40 @@ value.
 @item omp_thread_mem_alloc  @tab --
 @end multitable
 
-@multitable @columnfractions .30 .60
-@headitem Trait @tab Allowed values
+The predefined allocators use the default values for the traits,
+as listed below, except that the last three allocators have the
+@code{access} trait set to @code{cgroup}, @code{pteam}, and
+@code{thread}, respectively.
+
+@multitable @columnfractions .25 .40 .25
+@headitem Trait @tab Allowed values @tab Default value
 @item @code{sync_hint} @tab @code{contended}, @code{uncontended},
 @code{serialized}, @code{private}
+   @tab @code{contended}
 @item @code{alignment} @tab Positive integer being a power of two
+   @tab 1 byte
 @item @code{access}@tab @code{all}, @code{cgroup},
 @code{pteam}, @code{thread}
+   @tab @code{all}
 @item @code{pool_size} @tab Positive integer
+   @tab See @ref{Memory allocation}
 @item @code{fallback}  @tab @code{default_mem_fb}, @code{null_fb},
 @code{abort_fb}, @code{allocator_fb}
+   @tab See below
 @item @code{fb_data}   @tab @emph{unsupported as it needs an allocator handle}
+   @tab (none)
 @item @code{pinned}@tab @code{true}, @code{false}
+   @tab @code{false}
 @item @code{partition} @tab @code{environment}, @code{nearest},
 @code{blocked}, @code{interleaved}
+   @tab @code{environment}
 @end multitable
 
+For the @code{fallback} trait, the default value is @code{null_fb} for the
+@code{omp_default_mem_alloc} allocator and any allocator that is associated
+with device memory; for all other allocators, it is @code{default_mem_fb}
+by default.
+
 Examples:
 @smallexample
 OMP_ALLOCATOR=omp_high_bw_mem_alloc
@@ -2039,7 +2057,8 @@ OMP_ALLOCATOR=omp_large_cap_mem_space
 OMP_ALLOCATOR=omp_low_lat_mem_space:pinned=true,partition=nearest
 @end smallexample
 
-@c @item @emph{See also}:
+@item @emph{See also}:
+@ref{Memory allocation}
 
 @item @emph{Reference}:
 @uref{https://www.openmp.org, OpenMP specification v5.0}, Section 6.21
@@ -4632,6 +4651,11 @@ smaller number.  On non-host devices, the value of the
 @node Memory allocation
 @section Memory allocation
 
+For the available predefined allocators and, as applicable, their associated
+predefined memory spaces and for the available traits and their default values,
+see @ref{OMP_ALLOCATOR}.  Predefined allocators without an associated memory
+space use the @code{omp_default_mem_space} memory space.
+
 For the memory spaces, the following applies:
 @itemize
 @item @code{omp_default_mem_space} is supported
@@ -4674,9 +4698,12 @@ current node; therefore, unless the memory placement policy has been overridden,
 the @code{partition} trait @code{environment} (the default) will be effectively
 a @code{nearest} allocation.
 
-Additional notes:
+Additional notes regarding the traits:
 @itemize
 @item The @code{pinned} trait is unsupported.
+@item The default for the @code{pool_size} trait is no pool and for every
+  (re)allocation the

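The OMP_ALLOCATOR examples above pair a name with an optional colon-separated list of trait=value pairs, and in the examples traits appear only after a memory-space name. The sketch below splits that shape apart and fills in the default trait values from the table. It is a hypothetical illustration, not libgomp's parser, and it does no validation. It also leaves out pool_size and fallback, whose defaults depend on the allocator, and models only the allocator-name part of the fallback rule.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical parsed form of an OMP_ALLOCATOR value.
struct AllocatorSpec {
  std::string name;                          // allocator or memory space
  std::map<std::string, std::string> traits; // trait -> value
};

AllocatorSpec parse_omp_allocator(const std::string &value) {
  AllocatorSpec spec;
  // Defaults taken from the trait table above.
  spec.traits = {{"sync_hint", "contended"},
                 {"alignment", "1"},
                 {"access", "all"},
                 {"pinned", "false"},
                 {"partition", "environment"}};

  std::string::size_type colon = value.find(':');
  spec.name = value.substr(0, colon);  // whole string if there is no ':'
  if (colon == std::string::npos)
    return spec;

  // Walk the comma-separated trait=value list after the colon.
  std::string rest = value.substr(colon + 1);
  std::string::size_type pos = 0;
  while (pos <= rest.size()) {
    std::string::size_type comma = rest.find(',', pos);
    if (comma == std::string::npos)
      comma = rest.size();
    std::string item = rest.substr(pos, comma - pos);
    std::string::size_type eq = item.find('=');
    if (eq != std::string::npos)
      spec.traits[item.substr(0, eq)] = item.substr(eq + 1);
    pos = comma + 1;
  }
  return spec;
}

// Fallback default as described in the patch: null_fb for
// omp_default_mem_alloc (device-memory allocators are not modeled here),
// default_mem_fb otherwise.
std::string default_fallback(const std::string &allocator) {
  return allocator == "omp_default_mem_alloc" ? "null_fb" : "default_mem_fb";
}
```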
Re: [PATCH v2 1/2] c++, libstdc++: implement __is_pointer built-in trait

2023-07-14 Thread Jonathan Wakely via Gcc-patches
On Fri, 14 Jul 2023 at 11:48, Jonathan Wakely  wrote:
>
> On Thu, 13 Jul 2023 at 21:04, Ken Matsui  wrote:
> >
> > On Thu, Jul 13, 2023 at 2:22 AM Jonathan Wakely  wrote:
> > >
> > > On Wed, 12 Jul 2023 at 21:42, Ken Matsui  
> > > wrote:
> > > >
> > > > On Wed, Jul 12, 2023 at 3:01 AM Jonathan Wakely  
> > > > wrote:
> > > > >
> > > > > On Mon, 10 Jul 2023 at 06:51, Ken Matsui via Libstdc++
> > > > >  wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Here is the benchmark result for is_pointer:
> > > > > >
> > > > > > https://github.com/ken-matsui/gcc-benches/blob/main/is_pointer.md#sun-jul--9-103948-pm-pdt-2023
> > > > > >
> > > > > > Time: -62.1344%
> > > > > > Peak Memory Usage: -52.4281%
> > > > > > Total Memory Usage: -53.5889%
> > > > >
> > > > > Wow!
> > > > >
> > > > > Although maybe we could have improved our std::is_pointer_v anyway, 
> > > > > like so:
> > > > >
> > > > > template<typename _Tp>
> > > > >   inline constexpr bool is_pointer_v = false;
> > > > > template<typename _Tp>
> > > > >   inline constexpr bool is_pointer_v<_Tp*> = true;
> > > > > template<typename _Tp>
> > > > >   inline constexpr bool is_pointer_v<_Tp* const> = true;
> > > > > template<typename _Tp>
> > > > >   inline constexpr bool is_pointer_v<_Tp* volatile> = true;
> > > > > template<typename _Tp>
> > > > >   inline constexpr bool is_pointer_v<_Tp* const volatile> = true;
> > > > >
> > > > > I'm not sure why I didn't already do that.
> > > > >
> > > > > Could you please benchmark that? And if it is better than the current
> > > > > impl using is_pointer<_Tp>::value then we should do this in the
> > > > > library:
> > > > >
> > > > > #if __has_builtin(__is_pointer)
> > > > > template<typename _Tp>
> > > > >   inline constexpr bool is_pointer_v = __is_pointer(_Tp);
> > > > > #else
> > > > > template<typename _Tp>
> > > > >   inline constexpr bool is_pointer_v = false;
> > > > > template<typename _Tp>
> > > > >   inline constexpr bool is_pointer_v<_Tp*> = true;
> > > > > template<typename _Tp>
> > > > >   inline constexpr bool is_pointer_v<_Tp* const> = true;
> > > > > template<typename _Tp>
> > > > >   inline constexpr bool is_pointer_v<_Tp* volatile> = true;
> > > > > template<typename _Tp>
> > > > >   inline constexpr bool is_pointer_v<_Tp* const volatile> = true;
> > > > > #endif
> > > >
> > > > Hi François and Jonathan,
> > > >
> > > > Thank you for your reviews! I will rename the four underscores to the
> > > > appropriate name and take a benchmark once I get home.
> > > >
> > > > If I apply your change on is_pointer_v, is it better to add the
> > > > `Co-authored-by:` line in the commit?
> > >
> > > Yes, that would be the correct thing to do (although in this case the
> > > change is small enough that I don't really care about getting credit
> > > for it :-)
> > >
> > Thank you! I will include it in my commit :) I see that you included
> > the DCO sign-off in the MAINTAINERS file. However, if a reviewer
> > doesn't, should I include the `Signed-off-by:` line for the reviewer
> > as well?
>
> No, reviewers should not sign-off, that's for the code author. And
> authors should add that themselves (or clearly state that they agree
> to the DCO terms). You should not sign-off on someone else's behalf.

You can add Reviewed-by: if you want to record that information.
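For reference, here is a standalone version of the partial-specialization approach suggested above, with a few compile-time checks. It uses the hypothetical name my_is_pointer_v and the parameter name T, since _Tp-style reserved names belong to the implementation. It needs C++17 for inline variables and for static_assert without a message.

```cpp
#include <cassert>
#include <type_traits>

// One partial specialization per cv-qualification of the pointer itself;
// qualifiers on the pointee are absorbed into T.
template<typename T>
  inline constexpr bool my_is_pointer_v = false;
template<typename T>
  inline constexpr bool my_is_pointer_v<T*> = true;
template<typename T>
  inline constexpr bool my_is_pointer_v<T* const> = true;
template<typename T>
  inline constexpr bool my_is_pointer_v<T* volatile> = true;
template<typename T>
  inline constexpr bool my_is_pointer_v<T* const volatile> = true;

static_assert(my_is_pointer_v<int*>);
static_assert(my_is_pointer_v<const int* const volatile>);
static_assert(!my_is_pointer_v<int>);
static_assert(!my_is_pointer_v<int*&>);  // a reference to a pointer is not a pointer
static_assert(my_is_pointer_v<void (*)()> == std::is_pointer_v<void (*)()>);
```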



Re: [PATCH v2 1/2] c++, libstdc++: implement __is_pointer built-in trait

2023-07-14 Thread Jonathan Wakely via Gcc-patches
On Thu, 13 Jul 2023 at 21:04, Ken Matsui  wrote:
>
> On Thu, Jul 13, 2023 at 2:22 AM Jonathan Wakely  wrote:
> >
> > On Wed, 12 Jul 2023 at 21:42, Ken Matsui  wrote:
> > >
> > > On Wed, Jul 12, 2023 at 3:01 AM Jonathan Wakely  
> > > wrote:
> > > >
> > > > On Mon, 10 Jul 2023 at 06:51, Ken Matsui via Libstdc++
> > > >  wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > Here is the benchmark result for is_pointer:
> > > > >
> > > > > https://github.com/ken-matsui/gcc-benches/blob/main/is_pointer.md#sun-jul--9-103948-pm-pdt-2023
> > > > >
> > > > > Time: -62.1344%
> > > > > Peak Memory Usage: -52.4281%
> > > > > Total Memory Usage: -53.5889%
> > > >
> > > > Wow!
> > > >
> > > > Although maybe we could have improved our std::is_pointer_v anyway, 
> > > > like so:
> > > >
> > > > template<typename _Tp>
> > > >   inline constexpr bool is_pointer_v = false;
> > > > template<typename _Tp>
> > > >   inline constexpr bool is_pointer_v<_Tp*> = true;
> > > > template<typename _Tp>
> > > >   inline constexpr bool is_pointer_v<_Tp* const> = true;
> > > > template<typename _Tp>
> > > >   inline constexpr bool is_pointer_v<_Tp* volatile> = true;
> > > > template<typename _Tp>
> > > >   inline constexpr bool is_pointer_v<_Tp* const volatile> = true;
> > > >
> > > > I'm not sure why I didn't already do that.
> > > >
> > > > Could you please benchmark that? And if it is better than the current
> > > > impl using is_pointer<_Tp>::value then we should do this in the
> > > > library:
> > > >
> > > > #if __has_builtin(__is_pointer)
> > > > template<typename _Tp>
> > > >   inline constexpr bool is_pointer_v = __is_pointer(_Tp);
> > > > #else
> > > > template<typename _Tp>
> > > >   inline constexpr bool is_pointer_v = false;
> > > > template<typename _Tp>
> > > >   inline constexpr bool is_pointer_v<_Tp*> = true;
> > > > template<typename _Tp>
> > > >   inline constexpr bool is_pointer_v<_Tp* const> = true;
> > > > template<typename _Tp>
> > > >   inline constexpr bool is_pointer_v<_Tp* volatile> = true;
> > > > template<typename _Tp>
> > > >   inline constexpr bool is_pointer_v<_Tp* const volatile> = true;
> > > > #endif
> > >
> > > Hi François and Jonathan,
> > >
> > > Thank you for your reviews! I will rename the four underscores to the
> > > appropriate name and take a benchmark once I get home.
> > >
> > > If I apply your change on is_pointer_v, is it better to add the
> > > `Co-authored-by:` line in the commit?
> >
> > Yes, that would be the correct thing to do (although in this case the
> > change is small enough that I don't really care about getting credit
> > for it :-)
> >
> Thank you! I will include it in my commit :) I see that you included
> the DCO sign-off in the MAINTAINERS file. However, if a reviewer
> doesn't, should I include the `Signed-off-by:` line for the reviewer
> as well?

No, reviewers should not sign-off, that's for the code author. And
authors should add that themselves (or clearly state that they agree
to the DCO terms). You should not sign-off on someone else's behalf.



Re: [PATCH] x86: replace "extendhfdf2" expander

2023-07-14 Thread Jan Beulich via Gcc-patches
On 14.07.2023 12:10, Uros Bizjak wrote:
> On Fri, Jul 14, 2023 at 11:44 AM Jan Beulich  wrote:
>>
>> The corresponding insn serves this purpose quite fine, and leads to
>> slightly less (generated) code. All we need is the insn to not have a
>> leading * in its name, while retaining that * for "extendhfsf2".
>> Introduce a mode attribute in exchange to achieve that.
>>
>> gcc/
>>
>> * config/i386/i386.md (extendhfdf2): Delete expander.
>> (extendhf): New mode attribute.
>> (*extendhf2): Use it.
> 
> No, please leave the expander, it is there due to extendhfsf2 that
> prevents effective macroization.

Well, okay then.

> FYI, there is no less generated code when the named pattern is used,
> the same code is generated from the named pattern as from the
> expander. Source code can be shrunk, but in this particular case,
> forced macroization complicates things more.

Hmm, I'm pretty sure I checked and found some reduction.

Jan


Re: [PATCH, OpenACC 2.7] Implement default clause support for data constructs

2023-07-14 Thread Chung-Lin Tang via Gcc-patches
Hi Thomas,

On 2023/6/23 6:47 PM, Thomas Schwinge wrote:
>> +
>>ctx->clauses = *orig_list_p;
>>gimplify_omp_ctxp = ctx;
>>  }
> Instead of this, in 'gimplify_omp_workshare', before the
> 'gimplify_scan_omp_clauses' call, do something like:
> 
> if ((ort & ORT_ACC)
>     && !omp_find_clause (OMP_CLAUSES (expr), OMP_CLAUSE_DEFAULT))
>   {
>     /* Determine effective 'default' clause for OpenACC compute construct.  */
>     for (struct gimplify_omp_ctx *ctx = gimplify_omp_ctxp; ctx;
>          ctx = ctx->outer_context)
>       {
>         if (ctx->region_type == ORT_ACC_DATA
>             && ctx->default_kind != OMP_CLAUSE_DEFAULT_SHARED)
>           {
>             [Append actual default clause on compute construct.]
>             break;
>           }
>       }
>   }
> 
> That seems conceptually simpler to me?

I'm not sure if this is conceptually simpler, but using 'oacc_default_kind'
is definitely faster computationally :)

However, as you mention below...

> For the 'build_omp_clause', does using 'ctx->location' instead of
> 'UNKNOWN_LOCATION' help diagnostics in any way?  Like if we add in
> 'gcc/gimplify.cc:oacc_default_clause',
> 'if (ctx->default_kind == OMP_CLAUSE_DEFAULT_NONE)' another 'inform' to
> point to the 'data' construct's 'default' clause?  (But not sure if
> that's easily done; otherwise don't.)

Noticed that we will need to track the actual lexically enclosing OpenACC
construct with the user-set default clause somewhere in 'ctx', in order to
satisfy the current diagnostics in oacc_default_clause().

(The UNKNOWN_LOCATION for the internally created default clause probably
doesn't matter; it only serves as a reminder in internal dumps and probably
never plays a role in user diagnostics.)

> Similar to the ones you've already got, please also add a few test cases
> for nested 'default' clauses, like:
> 
> #pragma acc data // no vs. 'default(none)' vs. 'default(present)'
> {
>   #pragma acc data // no vs. same vs. different 'default' clause
>   {
> #pragma acc data // no vs. same vs. different 'default' clause
> {
>   #pragma acc parallel
> 
> Similarly, test cases where 'default' on the compute construct overrides
> 'default' of an outer 'data' construct.

Okay, will add more testcases.

Thanks,
Chung-Lin
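Here is a sketch of the kind of nested-default test requested above. The function names are hypothetical, and the intended diagnostics are not checked. The first function relies on the innermost data construct's default(present) applying to the parallel loop. In the second, the compute construct's own default clause overrides default(none) on the enclosing data construct. Without -fopenacc the pragmas are ignored and both loops run serially on the host.

```cpp
#include <cassert>

// Innermost 'default' wins: the parallel loop sees default(present) from
// the nearest enclosing data construct; a[0:n] is present via copyout.
void fill_nested(int *a, int n) {
#pragma acc data copyout(a[0:n]) default(none)
  {
#pragma acc data default(present)
    {
#pragma acc parallel loop
      for (int i = 0; i < n; i++)
        a[i] = 2 * i;
    }
  }
}

// A 'default' clause on the compute construct itself is meant to override
// the enclosing data construct's default(none).
void fill_override(int *a, int n) {
#pragma acc data copyout(a[0:n]) default(none)
  {
#pragma acc parallel loop default(present)
    for (int i = 0; i < n; i++)
      a[i] = i + 1;
  }
}
```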


Re: [gcc r14-2455] riscv: Prepare backend for index registers

2023-07-14 Thread Andreas Schwab
Why didn't you test that?

../../gcc/config/riscv/riscv.cc: In function 'int riscv_regno_ok_for_index_p(int)':
../../gcc/config/riscv/riscv.cc:864:33: error: unused parameter 'regno' [-Werror=unused-parameter]
  864 | riscv_regno_ok_for_index_p (int regno)
  | ^
cc1plus: all warnings being treated as errors
make[3]: *** [Makefile:2499: riscv.o] Error 1
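Two common ways to silence -Werror=unused-parameter for a stub like this are sketched below with hypothetical function names. This is not necessarily how the upstream breakage was fixed. GCC's own sources, written in C++11 at the time, more typically use the ATTRIBUTE_UNUSED macro from ansidecl.h or simply drop the parameter name.

```cpp
#include <cassert>

// 1. Leave the parameter unnamed; the comment keeps its meaning visible.
static bool regno_ok_for_index_unnamed(int /* regno */) {
  return false;
}

// 2. C++17 [[maybe_unused]] on a named parameter.
static bool regno_ok_for_index_attr([[maybe_unused]] int regno) {
  return false;
}
```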

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors

2023-07-14 Thread Tejas Belagod via Gcc-patches

On 7/13/23 4:05 PM, Richard Biener wrote:

On Thu, Jul 13, 2023 at 12:15 PM Tejas Belagod  wrote:


On 7/3/23 1:31 PM, Richard Biener wrote:

On Mon, Jul 3, 2023 at 8:50 AM Tejas Belagod  wrote:


On 6/29/23 6:55 PM, Richard Biener wrote:

On Wed, Jun 28, 2023 at 1:26 PM Tejas Belagod  wrote:






From: Richard Biener 
Date: Tuesday, June 27, 2023 at 12:58 PM
To: Tejas Belagod 
Cc: gcc-patches@gcc.gnu.org 
Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors

On Tue, Jun 27, 2023 at 8:30 AM Tejas Belagod  wrote:






From: Richard Biener 
Date: Monday, June 26, 2023 at 2:23 PM
To: Tejas Belagod 
Cc: gcc-patches@gcc.gnu.org 
Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors

On Mon, Jun 26, 2023 at 8:24 AM Tejas Belagod via Gcc-patches
 wrote:


Hi,

Packed Boolean Vectors
--

I'd like to propose a feature addition to GNU Vector extensions to add packed
boolean vectors (PBV).  This has been discussed in the past here[1] and a
variant has been implemented in Clang recently[2].

With predication features being added to vector architectures (SVE, MVE, AVX),
it is a useful feature for modelling predication on targets.  This could
find its use in intrinsics or just be used as is, as a GNU vector extension
mapped to underlying target features.  For example, the packed boolean vector
could directly map to a predicate register on SVE.

Also, this new packed boolean type GNU extension can be used with SVE ACLE
intrinsics to replace a fixed-length svbool_t.

Here are a few options to represent the packed boolean vector type.


The GIMPLE frontend uses a new 'vector_mask' attribute:

typedef int v8si __attribute__((vector_size(8*sizeof(int))));
typedef v8si v8sib __attribute__((vector_mask));

it gets you a vector type that is the appropriate (target-dependent) vector
mask type for the vector data type (v8si in this case).



Thanks Richard.

Having had a quick look at the implementation, it does seem to tick the boxes.

I must admit I haven't dug deep, but if the target hook allows the mask to be
defined in a way that is target-friendly (and I don't know how much effort it will
be to migrate the attribute to more front-ends), it should do the job nicely.

Let me go back and dig a bit deeper and get back with questions if any.



Let me add that the advantage of this is the compiler doesn't need
to support weird explicitly laid out packed boolean vectors that do
not match what the target supports and the user doesn't need to know
what the target supports (and thus have an #ifdef maze around explicitly
specified layouts).

Sorry for the delayed response – I spent a day experimenting with vector_mask.



Yeah, this is what option 4 in the RFC is trying to achieve – be portable enough
to avoid having to sprinkle the code with ifdefs.


It does remove some flexibility though, for example with -mavx512f -mavx512vl
you'll get AVX512 style masks for V4SImode data vectors but of course the
target still supports SSE2/AVX2 style masks as well, but those would not be
available as "packed boolean vectors", though they are of course in fact
equal to V4SImode data vectors with -1 or 0 values, so in this particular
case it might not matter.

That said, the vector_mask attribute will get you V4SImode vectors with
signed boolean elements of 32 bits for V4SImode data vectors with
SSE2/AVX2.



This sounds very much like what the scenario would be with NEON vs SVE. Come to
think of it, vector_mask resembles option 4 in the proposal with ‘n’ implied by
the ‘base’ vector type and a ‘w’ specified for the type.



Given its current implementation, if vector_mask is exposed to the CFE, would
there be any major challenges wrt implementation or defining behaviour
semantics? I played around with a few examples from the testsuite and wrote
some new ones. I mostly tried operations that the new type would have to
support (unary, binary bitwise, initializations etc.) – with a couple of
exceptions most of the ops seem to be supported. I also triggered a couple of
ICEs in some tests involving implicit conversions to wider/narrower
vector_mask types (will raise reports for these). Correct me if I’m wrong
here, but we’d probably have to support a couple of new ops if vector_mask is
exposed to the CFE – initialization and subscript operations?


Yes, either that or restrict how the mask vectors can be used, thus
properly diagnose improper uses.


Indeed.

A question would be for example how to write common mask test
operations like if (any (mask)) or if (all (mask)).


I see 2 options here. New builtins could support new types - they'd
provide a target independent way to test any and all conditions. Another
would be to let the target use its intrinsics to do them in the most
efficient way possible (which the builtins would get lowered down to
anyway).


Likewise writing merge operations - do those as

a = a | (mask ? b : 0);

thus use ternary ?: for this?
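For comparison, today's GNU vector extensions (which vector_mask would refine) represent masks as ordinary full-width integer vectors holding 0 or -1 per element, not as packed booleans. The sketch below builds such a mask with a comparison, performs the merge a | (mask ? b : 0) discussed above, and writes any/all tests as plain loops. mask_any and mask_all are hypothetical helpers, not existing builtins. The vector ?: form requires C++; in C, b & mask gives the same result for 0/-1 masks.

```cpp
#include <cassert>

typedef int v4si __attribute__((vector_size(4 * sizeof(int))));

// Element-wise comparison yields -1 where it holds and 0 where it does not.
v4si positive_mask(v4si x) {
  v4si zero = {0, 0, 0, 0};
  return x > zero;
}

// The merge from the thread: a | (mask ? b : 0), selecting per element.
v4si merge_or(v4si a, v4si b, v4si mask) {
  v4si zero = {0, 0, 0, 0};
  return a | (mask ? b : zero);
}

// Hypothetical any/all tests using element subscripts; a target would
// ideally lower these to its own mask-test instructions.
bool mask_any(v4si mask) {
  for (int i = 0; i < 4; i++)
    if (mask[i])
      return true;
  return false;
}

bool mask_all(v4si mask) {
  for (int i = 0; i < 4; i++)
    if (!mask[i])
      return false;
  return true;
}
```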

Re: [PATCH] x86: replace "extendhfdf2" expander

2023-07-14 Thread Uros Bizjak via Gcc-patches
On Fri, Jul 14, 2023 at 11:44 AM Jan Beulich  wrote:
>
> The corresponding insn serves this purpose quite fine, and leads to
> slightly less (generated) code. All we need is the insn to not have a
> leading * in its name, while retaining that * for "extendhfsf2".
> Introduce a mode attribute in exchange to achieve that.
>
> gcc/
>
> * config/i386/i386.md (extendhfdf2): Delete expander.
> (extendhf): New mode attribute.
> (*extendhf2): Use it.

No, please leave the expander, it is there due to extendhfsf2 that
prevents effective macroization.

FYI, there is no less generated code when the named pattern is used,
the same code is generated from the named pattern as from the
expander. Source code can be shrunk, but in this particular case,
forced macroization complicates things more.

Uros.

> ---
> Of course the mode attribute could as well supply the full names.
>
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -5221,13 +5221,9 @@
>  }
>  })
>
> -(define_expand "extendhfdf2"
> -  [(set (match_operand:DF 0 "register_operand")
> -   (float_extend:DF
> - (match_operand:HF 1 "nonimmediate_operand")))]
> -  "TARGET_AVX512FP16")
> +(define_mode_attr extendhf [(SF "*") (DF "")])
>
> -(define_insn "*extendhf2"
> +(define_insn "extendhf2"
>[(set (match_operand:MODEF 0 "register_operand" "=v")
>  (float_extend:MODEF
>   (match_operand:HF 1 "nonimmediate_operand" "vm")))]


Re: [x86 PATCH] PR target/110588: Add *bt_setncqi_2 to generate btl

2023-07-14 Thread Uros Bizjak via Gcc-patches
On Fri, Jul 14, 2023 at 11:27 AM Roger Sayle  wrote:
>
>
> > From: Uros Bizjak 
> > Sent: 13 July 2023 19:21
> >
> > On Thu, Jul 13, 2023 at 7:10 PM Roger Sayle 
> > wrote:
> > >
> > > This patch resolves PR target/110588 to catch another case in combine
> > > where the i386 backend should be generating a btl instruction.  This
> > > adds another define_insn_and_split to recognize the RTL representation
> > > for this case.
> > >
> > > I also noticed that two related define_insn_and_split weren't using
> > > the preferred string style for single statement
> > > preparation-statements, so I've reformatted these to be consistent in
> > > style with the new one.
> > >
> > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > > and make -k check, both with and without --target_board=unix{-m32}
> > > with no new failures.  Ok for mainline?
> > >
> > >
> > > 2023-07-13  Roger Sayle  
> > >
> > > gcc/ChangeLog
> > > PR target/110588
> > > * config/i386/i386.md (*bt_setcqi): Prefer string form
> > > preparation statement over braces for a single statement.
> > > (*bt_setncqi): Likewise.
> > > (*bt_setncqi_2): New define_insn_and_split.
> > >
> > > gcc/testsuite/ChangeLog
> > > PR target/110588
> > > * gcc.target/i386/pr110588.c: New test case.
> >
> > +;; Help combine recognize bt followed by setnc (PR target/110588)
> > +(define_insn_and_split "*bt_setncqi_2"
> > +  [(set (match_operand:QI 0 "register_operand")
> > +        (eq:QI
> > +          (zero_extract:SWI48
> > +            (match_operand:SWI48 1 "register_operand")
> > +            (const_int 1)
> > +            (zero_extend:SI (match_operand:QI 2 "register_operand")))
> > +          (const_int 0)))
> > +   (clobber (reg:CC FLAGS_REG))]
> > +  "TARGET_USE_BT && ix86_pre_reload_split ()"
> > +  "#"
> > +  "&& 1"
> > +  [(set (reg:CCC FLAGS_REG)
> > +        (compare:CCC
> > +          (zero_extract:SWI48 (match_dup 1) (const_int 1) (match_dup 2))
> > +          (const_int 0)))
> > +   (set (match_dup 0)
> > +        (ne:QI (reg:CCC FLAGS_REG) (const_int 0)))]
> > +  "operands[2] = lowpart_subreg (SImode, operands[2], QImode);")
> >
> > I don't think the above transformation is 100% correct, mainly due to the
> > use of paradoxical subreg.
> >
> > The combined instruction is operating with a zero_extended QImode register,
> > so all bits of the register are well defined. You are splitting using
> > paradoxical subreg, so you don't know what garbage is there in the highpart
> > of the count register. However, BTL/BTQ uses modulo 64 (or 32) of this
> > register, so even with a slightly invalid RTX, everything checks out.
> >
> > +  "operands[2] = lowpart_subreg (SImode, operands[2], QImode);")
> >
> > You probably need mode instead of SImode here.
>
> The define_insn for *bt is:
>
> (define_insn "*bt"
>   [(set (reg:CCC FLAGS_REG)
> (compare:CCC
>   (zero_extract:SWI48
> (match_operand:SWI48 0 "nonimmediate_operand" "r,m")
> (const_int 1)
> (match_operand:SI 1 "nonmemory_operand" "r,"))
>   (const_int 0)))]
>
> So  isn't appropriate here.
>
> But now you've made me think about it, it's inconsistent that all of the
> shifts and rotates in i386.md standardize on QImode for shift counts, but the
> bit test instructions use SImode?  I think this explains where the
> paradoxical SUBREGs come from, and in theory any_extend from QImode to SImode
> here could/should be handled/unnecessary.
>
> Is it worth investigating a follow-up patch to convert all ZERO_EXTRACTs and
> SIGN_EXTRACTs in i386.md to use QImode (instead of SImode)?

IIRC, zero_extract was moved from modeless to a pattern with defined
mode a while ago. Perhaps SImode is just because of these ancient
times, and BT pattern was written that way to satisfy combine. I think
it is definitely worth investigating; perhaps some BT-related pattern
will become obsolete because of the change.

Uros.
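Here is a small model of the semantics under discussion, as a sketch rather than the PR's actual test case. bit_clear is the kind of source idiom that combine may turn into bt plus setnc. bt32 mimics the register form of 32-bit BT, which uses only the bit offset modulo 32. That masking is why, as noted above, garbage in the upper part of the count register cannot change the result.

```cpp
#include <cassert>
#include <cstdint>

// Source-level idiom: is bit n of x clear?  Requires n < 32, since a
// shift by 32 or more is undefined in C and C++.
bool bit_clear(uint32_t x, uint8_t n) {
  return !((x >> n) & 1);
}

// Model of "bt reg, reg" on 32-bit operands: only count % 32 selects the
// bit, so bits above the low five of the count are ignored.
bool bt32(uint32_t x, uint32_t count) {
  return (x >> (count & 31)) & 1;
}
```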


Re: [PATCH] cprop: Do not set REG_EQUAL note when simplifying paradoxical subreg [PR110206]

2023-07-14 Thread Uros Bizjak via Gcc-patches
On Fri, Jul 14, 2023 at 10:53 AM Richard Biener  wrote:
>
> On Fri, 14 Jul 2023, Uros Bizjak wrote:
>
> > On Fri, Jul 14, 2023 at 10:31 AM Richard Biener  wrote:
> > >
> > > On Fri, 14 Jul 2023, Uros Bizjak wrote:
> > >
> > > > cprop1 pass does not consider paradoxical subreg and for (insn 22)
> > > > claims that it equals 8 elements of HImode by setting REG_EQUAL note:
> > > >
> > > > (insn 21 19 22 4 (set (reg:V4QI 98)
> > > > (mem/u/c:V4QI (symbol_ref/u:DI ("*.LC1") [flags 0x2]) [0  S4 A32])) "pr110206.c":12:42 1530 {*movv4qi_internal}
> > > >  (expr_list:REG_EQUAL (const_vector:V4QI [
> > > > (const_int -52 [0xffcc]) repeated x4
> > > > ])
> > > > (nil)))
> > > > (insn 22 21 23 4 (set (reg:V8HI 100)
> > > > (zero_extend:V8HI (vec_select:V8QI (subreg:V16QI (reg:V4QI 98) 0)
> > > > (parallel [
> > > > (const_int 0 [0])
> > > > (const_int 1 [0x1])
> > > > (const_int 2 [0x2])
> > > > (const_int 3 [0x3])
> > > > (const_int 4 [0x4])
> > > > (const_int 5 [0x5])
> > > > (const_int 6 [0x6])
> > > > (const_int 7 [0x7])
> > > > ] "pr110206.c":12:42 7471 {sse4_1_zero_extendv8qiv8hi2}
> > > >  (expr_list:REG_EQUAL (const_vector:V8HI [
> > > > (const_int 204 [0xcc]) repeated x8
> > > > ])
> > > > (expr_list:REG_DEAD (reg:V4QI 98)
> > > > (nil
> > > >
> > > > We rely on the "undefined" vals to have a specific value (from the
> > > > earlier REG_EQUAL note) but actual code generation doesn't ensure this
> > > > (it doesn't need to).  That said, the issue isn't the constant folding
> > > > per se but that we do not actually constant fold but register an
> > > > equality that doesn't hold.
> > > >
> > > > PR target/110206
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > * fwprop.cc (contains_paradoxical_subreg_p): Move to ...
> > > > * rtlanal.cc (contains_paradoxical_subreg_p): ... here.
> > > > * rtlanal.h (contains_paradoxical_subreg_p): Add prototype.
> > > > * cprop.cc (try_replace_reg): Do not set REG_EQUAL note
> > > > when the original source contains a paradoxical subreg.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > * gcc.dg/torture/pr110206.c: New test.
> > > >
> > > > Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
> > > >
> > > > OK for mainline and backports?
> > >
> > > OK.
> > >
> > > I think the testcase can also run on other targets if you add
> > > dg-additional-options "-w -Wno-psabi", all generic vector ops
> > > should be lowered if not supported.
> >
> > True, but with lowered vector ops, the test would not even come close
> > to the problem. The problem is specific to generic vector ops, and can
> > be triggered only when paradoxical subregs are used to implement
> > (partial) vector modes. This is the case on x86, where partial vectors
> > are now heavily used, and even there we need the latest vector ISA
> > enabled to trip the condition.
> >
> > The above is the reason that dg-torture is used, with the hope that
> > the runtime failure will trip when testsuite is run with specific
> > target options.
>
> I see.  I'm fine with this then though moving to gcc.target/i386
> with appropriate triggering options and a dg-require for runtime
> support would also work.

You are right. I'll add the attached testcase to gcc.target/i386 instead.

Uros.
/* PR target/110206 */
/* { dg-do run } */
/* { dg-options "-Os -mavx512bw -mavx512vl" } */
/* { dg-require-effective-target avx512bw } */
/* { dg-require-effective-target avx512vl } */

#define AVX512BW
#define AVX512VL

#include "avx512f-check.h"

typedef unsigned char __attribute__((__vector_size__ (4))) U;
typedef unsigned char __attribute__((__vector_size__ (8))) V;
typedef unsigned short u16;

V g;

void
__attribute__((noinline))
foo (U u, u16 c, V *r)
{
  if (!c)
    abort ();
  V x = __builtin_shufflevector (u, (204 >> u), 7, 0, 5, 1, 3, 5, 0, 2);
  V y = __builtin_shufflevector (g, (V) { }, 7, 6, 6, 7, 2, 6, 3, 5);
  V z = __builtin_shufflevector (y, 204 * x, 3, 9, 8, 1, 4, 6, 14, 5);
  *r = z;
}

static void test_256 (void) { };

static void
test_128 (void)
{
  V r;
  foo ((U){4}, 5, &r);
  if (r[6] != 0x30)
    abort ();
}
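A scalar model shows why the REG_EQUAL note in the dump above is wrong, assuming the layout the dump shows: the vec_select reads elements 0 to 7 of a V16QI paradoxical subreg of a V4QI register, so only elements 0 to 3 are defined. The note's claim of eight copies of 204 holds only if the undefined bytes happen to be 0xcc. The function name and array types below are illustrative only; this is not how cprop represents values.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Model of (zero_extend:V8HI (vec_select:V8QI (subreg:V16QI (reg:V4QI) 0)
// [0..7])): the low 4 bytes come from the V4QI value, the other 12 are
// whatever the register happened to hold.
std::array<uint16_t, 8> zext_low8(const std::array<uint8_t, 4> &v4qi,
                                  const std::array<uint8_t, 12> &upper_garbage) {
  std::array<uint8_t, 16> v16qi{};
  for (int i = 0; i < 4; i++)
    v16qi[i] = v4qi[i];
  for (int i = 0; i < 12; i++)
    v16qi[4 + i] = upper_garbage[i];

  std::array<uint16_t, 8> v8hi{};
  for (int i = 0; i < 8; i++)
    v8hi[i] = v16qi[i];  // zero-extend each selected byte to 16 bits
  return v8hi;
}
```

With zeros in the upper bytes, elements 4 to 7 come out as 0 rather than 204, so the recorded equality does not hold in general.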


[PATCH] x86: replace "extendhfdf2" expander

2023-07-14 Thread Jan Beulich via Gcc-patches
The corresponding insn serves this purpose quite fine, and leads to
slightly less (generated) code. All we need is the insn to not have a
leading * in its name, while retaining that * for "extendhfsf2".
Introduce a mode attribute in exchange to achieve that.

gcc/

* config/i386/i386.md (extendhfdf2): Delete expander.
(extendhf): New mode attribute.
(*extendhf2): Use it.
---
Of course the mode attribute could as well supply the full names.

--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -5221,13 +5221,9 @@
 }
 })
 
-(define_expand "extendhfdf2"
-  [(set (match_operand:DF 0 "register_operand")
-   (float_extend:DF
- (match_operand:HF 1 "nonimmediate_operand")))]
-  "TARGET_AVX512FP16")
+(define_mode_attr extendhf [(SF "*") (DF "")])
 
-(define_insn "*extendhf2"
+(define_insn "extendhf2"
   [(set (match_operand:MODEF 0 "register_operand" "=v")
 (float_extend:MODEF
  (match_operand:HF 1 "nonimmediate_operand" "vm")))]


[PATCH] x86: avoid maybe_gen_...()

2023-07-14 Thread Jan Beulich via Gcc-patches
In the (however unlikely) event that no insn can be found for the
requested mode, using maybe_gen_...() without (really) checking its
result for being a null rtx would lead to silent bad code generation.

gcc/

* config/i386/i386-expand.cc (ix86_expand_vector_init_duplicate):
Use gen_vec_set_0.
(ix86_expand_vector_extract): Use gen_vec_extract_lo /
gen_vec_extract_hi.
(expand_vec_perm_broadcast_1): Use gen_vec_interleave_high /
gen_vec_interleave_low. Rename local variable.

--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -15456,8 +15456,7 @@ ix86_expand_vector_init_duplicate (bool
{
  tmp1 = force_reg (GET_MODE_INNER (mode), val);
  tmp2 = gen_reg_rtx (mode);
- emit_insn (maybe_gen_vec_set_0 (mode, tmp2,
- CONST0_RTX (mode), tmp1));
+ emit_insn (gen_vec_set_0 (mode, tmp2, CONST0_RTX (mode), tmp1));
  tmp1 = gen_lowpart (mode, tmp2);
}
  else
@@ -17419,9 +17418,9 @@ ix86_expand_vector_extract (bool mmx_ok,
 ? gen_reg_rtx (V16HFmode)
 : gen_reg_rtx (V16BFmode));
  if (elt < 16)
-   emit_insn (maybe_gen_vec_extract_lo (mode, tmp, vec));
+   emit_insn (gen_vec_extract_lo (mode, tmp, vec));
  else
-   emit_insn (maybe_gen_vec_extract_hi (mode, tmp, vec));
+   emit_insn (gen_vec_extract_hi (mode, tmp, vec));
  ix86_expand_vector_extract (false, target, tmp, elt & 15);
  return;
}
@@ -17435,9 +17434,9 @@ ix86_expand_vector_extract (bool mmx_ok,
 ? gen_reg_rtx (V8HFmode)
 : gen_reg_rtx (V8BFmode));
  if (elt < 8)
-   emit_insn (maybe_gen_vec_extract_lo (mode, tmp, vec));
+   emit_insn (gen_vec_extract_lo (mode, tmp, vec));
  else
-   emit_insn (maybe_gen_vec_extract_hi (mode, tmp, vec));
+   emit_insn (gen_vec_extract_hi (mode, tmp, vec));
  ix86_expand_vector_extract (false, target, tmp, elt & 7);
  return;
}
@@ -22501,18 +22500,18 @@ expand_vec_perm_broadcast_1 (struct expa
   if (d->testing_p)
return true;
 
-  rtx (*maybe_gen) (machine_mode, int, rtx, rtx, rtx);
+  rtx (*gen_interleave) (machine_mode, int, rtx, rtx, rtx);
   if (elt >= nelt2)
{
- maybe_gen = maybe_gen_vec_interleave_high;
+ gen_interleave = gen_vec_interleave_high;
  elt -= nelt2;
}
   else
-   maybe_gen = maybe_gen_vec_interleave_low;
+   gen_interleave = gen_vec_interleave_low;
   nelt2 /= 2;
 
   dest = gen_reg_rtx (vmode);
-  emit_insn (maybe_gen (vmode, 1, dest, op0, op0));
+  emit_insn (gen_interleave (vmode, 1, dest, op0, op0));
 
   vmode = V4SImode;
   op0 = gen_lowpart (vmode, dest);


[PATCH] x86: slightly enhance "vec_dupv2df"

2023-07-14 Thread Jan Beulich via Gcc-patches
Introduce a new alternative permitting all 32 registers to be used as
source without AVX512VL, by broadcasting to the full 512 bits in that
case. (The insn would also permit all registers to be used as
destination, but V2DFmode doesn't.)

gcc/

* config/i386/sse.md (vec_dupv2df): Add new AVX512F
alternative. Move AVX512VL part of condition to new "enabled"
attribute.
---
Because of the V2DF restriction, in principle the new source constraint
could also omit 'm'.

Can't the latter two of the original alternatives be folded, by using
Yvm instead of xm/vm?

--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -13761,18 +13761,27 @@
(set_attr "mode" "DF,DF,V1DF,V1DF,V1DF,V2DF,V1DF,V1DF,V1DF")])
 
 (define_insn "vec_dupv2df"
-  [(set (match_operand:V2DF 0 "register_operand" "=x,x,v")
+  [(set (match_operand:V2DF 0 "register_operand" "=x,x,v,v")
(vec_duplicate:V2DF
- (match_operand:DF 1 "nonimmediate_operand" " 0,xm,vm")))]
-  "TARGET_SSE2 && "
+ (match_operand:DF 1 "nonimmediate_operand" "0,xm,vm,vm")))]
+  "TARGET_SSE2"
   "@
unpcklpd\t%0, %0
%vmovddup\t{%1, %0|%0, %1}
-   vmovddup\t{%1, %0|%0, %1}"
-  [(set_attr "isa" "noavx,sse3,avx512vl")
-   (set_attr "type" "sselog1")
-   (set_attr "prefix" "orig,maybe_vex,evex")
-   (set_attr "mode" "V2DF,DF,DF")])
+   vmovddup\t{%1, %0|%0, %1}
+   vbroadcastsd\t{%1, }%g0{|, %1}"
+  [(set_attr "isa" "noavx,sse3,avx512vl,*")
+   (set_attr "type" "sselog1,ssemov,ssemov,ssemov")
+   (set_attr "prefix" "orig,maybe_vex,evex,evex")
+   (set_attr "mode" "V2DF,DF,DF,V8DF")
+   (set (attr "enabled")
+   (cond [(eq_attr "alternative" "3")
+(symbol_ref "TARGET_AVX512F && !TARGET_AVX512VL
+ && !TARGET_PREFER_AVX256")
+  (match_test "")
+(const_string "*")
+ ]
+ (symbol_ref "false")))])
 
 (define_insn "vec_concatv2df"
   [(set (match_operand:V2DF 0 "register_operand" "=x,x,v,x,x, v,x,x")


RE: [x86 PATCH] PR target/110588: Add *bt_setncqi_2 to generate btl

2023-07-14 Thread Roger Sayle


> From: Uros Bizjak 
> Sent: 13 July 2023 19:21
> 
> On Thu, Jul 13, 2023 at 7:10 PM Roger Sayle 
> wrote:
> >
> > This patch resolves PR target/110588 to catch another case in combine
> > where the i386 backend should be generating a btl instruction.  This
> > adds another define_insn_and_split to recognize the RTL representation
> > for this case.
> >
> > I also noticed that two related define_insn_and_split weren't using
> > the preferred string style for single statement
> > preparation-statements, so I've reformatted these to be consistent in style
> > with the new one.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with no new failures.  Ok for mainline?
> >
> >
> > 2023-07-13  Roger Sayle  
> >
> > gcc/ChangeLog
> > PR target/110588
> > * config/i386/i386.md (*bt_setcqi): Prefer string form
> > preparation statement over braces for a single statement.
> > (*bt_setncqi): Likewise.
> > (*bt_setncqi_2): New define_insn_and_split.
> >
> > gcc/testsuite/ChangeLog
> > PR target/110588
> > * gcc.target/i386/pr110588.c: New test case.
> 
> +;; Help combine recognize bt followed by setnc (PR target/110588)
> +(define_insn_and_split "*bt_setncqi_2"
> +  [(set (match_operand:QI 0 "register_operand")  (eq:QI
> +  (zero_extract:SWI48
> +(match_operand:SWI48 1 "register_operand")
> +(const_int 1)
> +(zero_extend:SI (match_operand:QI 2 "register_operand")))
> +  (const_int 0)))
> +   (clobber (reg:CC FLAGS_REG))]
> +  "TARGET_USE_BT && ix86_pre_reload_split ()"
> +  "#"
> +  "&& 1"
> +  [(set (reg:CCC FLAGS_REG)
> +(compare:CCC
> + (zero_extract:SWI48 (match_dup 1) (const_int 1) (match_dup 2))
> + (const_int 0)))
> +   (set (match_dup 0)
> +(ne:QI (reg:CCC FLAGS_REG) (const_int 0)))]
> +  "operands[2] = lowpart_subreg (SImode, operands[2], QImode);")
> 
> I don't think the above transformation is 100% correct, mainly due to the use of
> paradoxical subreg.
> 
> The combined instruction is operating with a zero_extended QImode register, so
> all bits of the register are well defined. You are splitting using paradoxical subreg,
> so you don't know what garbage is there in the highpart of the count register.
> However, BTL/BTQ uses modulo 64 (or 32) of this register, so even with a slightly
> invalid RTX, everything checks out.
> 
> +  "operands[2] = lowpart_subreg (SImode, operands[2], QImode);")
> 
> You probably need mode instead of SImode here.
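The observation above that BTL/BTQ only use the count modulo 32 (or 64) is what makes the paradoxical subreg harmless in practice. A small C model of the register form of BT (an illustration, not GCC code; the memory-operand form with a register offset behaves differently and is not modelled) shows that garbage above the low 5 or 6 bits of the count does not change the result, and that setnc then yields 1 exactly when the tested bit is clear:

```c
#include <stdint.h>

/* Register-form BT on a 32-bit operand: only offset & 31 is used.  */
static int
bt32_model (uint32_t value, uint32_t offset)
{
  return (int) ((value >> (offset & 31)) & 1);
}

/* Register-form BT on a 64-bit operand: only offset & 63 is used.  */
static int
bt64_model (uint64_t value, uint64_t offset)
{
  return (int) ((value >> (offset & 63)) & 1);
}

/* setnc after bt yields 1 when the tested bit is clear (CF == 0),
   i.e. the (eq (zero_extract ...) (const_int 0)) form.  */
static int
bt_setnc32_model (uint32_t value, uint32_t offset)
{
  return !bt32_model (value, offset);
}
```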

The define_insn for *bt is:

(define_insn "*bt"
  [(set (reg:CCC FLAGS_REG)
(compare:CCC
  (zero_extract:SWI48
(match_operand:SWI48 0 "nonimmediate_operand" "r,m")
(const_int 1)
(match_operand:SI 1 "nonmemory_operand" "r,"))
  (const_int 0)))]

So  isn't appropriate here.

Now that you've made me think about it, isn't it inconsistent that all of the
shifts and rotates in i386.md standardize on QImode for shift counts, but the
bit test instructions use SImode?  I think this explains where the paradoxical
SUBREGs come from, and in theory the any_extend from QImode to SImode here
could be handled (or should be unnecessary).

Is it worth investigating a follow-up patch to convert all ZERO_EXTRACTs and
SIGN_EXTRACTs in i386.md to use QImode (instead of SImode)?

Thanks in advance,
Roger
--




Re: [PATCH 8/19]middle-end: updated niters analysis to handle multiple exits.

2023-07-14 Thread Richard Biener via Gcc-patches
On Thu, 13 Jul 2023, Richard Biener wrote:

> On Wed, 28 Jun 2023, Tamar Christina wrote:
> 
> > Hi All,
> > 
> > For early break vectorization we have to update niters analysis to record and
> > analyze all exits of the loop, and so all conds.
> > 
> > The niters of the loop are still determined by the main/natural exit of the loop,
> > as this is the O(n) bound.  For now we don't do much with the secondary conds,
> > but their assumptions can be used to generate versioning checks later.
> > 
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > 
> > Ok for master?
> 
> I probably confused vec_init_exit_info in the previous patch - that said,
> I'm missing a clear function that determines the natural exit of the
> original (if-converted) scalar loop.  As vec_init_exit_info seems
> to (re-)compute that I'll comment on it here.
> 
> +  /* The main IV is to be determined by the block that's the first reachable
> + block from the latch.  We cannot rely on the order the loop analysis
> + returns and we don't have any SCEV analysis on the loop.  */
> +  auto_vec<edge> workset;
> +  workset.safe_push (loop_latch_edge (loop));
> +  hash_set<edge> visited;
> +
> +  while (!workset.is_empty ())
> +{
> +  edge e = workset.pop ();
> +  if (visited.contains (e))
> +   continue;
> +
> +  bool found_p = false;
> +  for (edge ex : e->src->succs)
> +   {
> + if (exits.contains (ex))
> +   {
> + found_p = true;
> + e = ex;
> + break;
> +   }
> +   }
> +
> +  if (found_p)
> +   {
> + loop->vec_loop_iv = e;
> + for (edge ex : exits)
> +   if (e != ex)
> + loop->vec_loop_alt_exits.safe_push (ex);
> + return;
> +   }
> +  else
> +   {
> + for (edge ex : e->src->preds)
> +   workset.safe_insert (0, ex);
> +   }
> +  visited.add (e);
> +}
> 
> So this greedily follows edges from the latch and takes the first
> exit.  Why's that better than simply choosing the first?
> 
> I'd have done
> 
>  auto_vec<edge> exits = get_loop_exit_edges (loop);
>  for (e : exits)
>{
>  if (vect_get_loop_niters (...))
>{
>  if no assumptions use that edge, if assumptions continue
>  searching, maybe there's an edge w/o assumptions
>}
>}
>  use (first) exit with assumptions
> 
> we probably want to know 'may_be_zero' as well and prefer an edge
> without that.  So eventually call number_of_iterations_exit_assumptions
> directly and look for the best niter_desc and pass that to
> vect_get_loop_niters (or re-do the work).
> 
> As said for "copying" the exit to the loop copies use the block mapping.
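The exit-selection order suggested in the quoted review could be sketched as follows. The exit_info record and its three flags are hypothetical summaries of what number_of_iterations_exit_assumptions would report per exit; how may_be_zero should rank relative to assumptions is an assumption made here for illustration, not something the review settles.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-exit summary of niter analysis.  */
struct exit_info
{
  bool analyzable;      /* niter analysis succeeded for this exit */
  bool has_assumptions; /* the result only holds under extra assumptions */
  bool may_be_zero;     /* the iteration count may be zero */
};

/* Return the index of the preferred IV exit, or -1 if no exit is
   analyzable.  Ranking (an assumption): no assumptions and not
   may_be_zero, then no assumptions, then any analyzable exit.
   Among equally ranked exits the first one wins.  */
static int
pick_iv_exit (const struct exit_info *exits, size_t n)
{
  int best = -1;
  int best_rank = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (!exits[i].analyzable)
        continue;
      int rank = exits[i].has_assumptions ? 1
                 : exits[i].may_be_zero ? 2 : 3;
      if (rank > best_rank)
        {
          best_rank = rank;
          best = (int) i;
        }
    }
  return best;
}
```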

In case you only support treating the last exit as IV exit you can
also simply walk dominators from the latch block until you reach
the header and pick the first [niter analyzable] exit block you reach.
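The dominator walk from the latch described above, as a minimal sketch over a hypothetical CFG summary (an immediate-dominator array plus a per-block "ends in an analyzable exit" flag); GCC's real code would use its own dominance and loop APIs instead.

```c
#include <stdbool.h>

/* Walk up the dominator tree from LATCH to HEADER (inclusive) and return
   the first block that ends in an analyzable exit, or -1 if none.
   IDOM[b] is the immediate dominator of block b; HEADER must dominate
   LATCH, as it does in a natural loop.  */
static int
first_exit_from_latch (const int *idom, const bool *exit_ok,
                       int latch, int header)
{
  for (int bb = latch;; bb = idom[bb])
    {
      if (exit_ok[bb])
        return bb;
      if (bb == header)
        return -1;
    }
}
```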

But I'm not yet sure this restriction exists.

Richard.

> 
> > Thanks,
> > Tamar
> > 
> > gcc/ChangeLog:
> > 
> > * tree-vect-loop.cc (vect_get_loop_niters): Analyze all exits and return
> > all gconds.
> > (vect_analyze_loop_form): Update code checking for conds.
> > (vect_create_loop_vinfo): Handle having multiple conds.
> > (vect_analyze_loop): Release extra loop conds structures.
> > * tree-vectorizer.h (LOOP_VINFO_LOOP_CONDS,
> > LOOP_VINFO_LOOP_IV_COND): New.
> > (struct vect_loop_form_info): Add conds, loop_iv_cond.
> > 
> > --- inline copy of patch -- 
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index 55e69a7ca0b24e0872477141db6f74dbf90b7981..9065811b3b9c2a550baf44768603172b9e26b94b 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -849,80 +849,106 @@ vect_fixup_scalar_cycles_with_patterns (loop_vec_info loop_vinfo)
> > in NUMBER_OF_ITERATIONSM1.  Place the condition under which the
> > niter information holds in ASSUMPTIONS.
> >  
> > -   Return the loop exit condition.  */
> > +   Return the loop exit conditions.  */
> >  
> >  
> > -static gcond *
> > +static vec<gcond *>
> >  vect_get_loop_niters (class loop *loop, tree *assumptions,
> >   tree *number_of_iterations, tree *number_of_iterationsm1)
> >  {
> > -  edge exit = single_exit (loop);
> > +  auto_vec<edge> exits = get_loop_exit_edges (loop);
> > +  vec<gcond *> conds;
> > +  conds.create (exits.length ());
> >class tree_niter_desc niter_desc;
> >tree niter_assumptions, niter, may_be_zero;
> > -  gcond *cond = get_loop_exit_condition (loop);
> >  
> >*assumptions = boolean_true_node;
> >*number_of_iterationsm1 = chrec_dont_know;
> >*number_of_iterations = chrec_dont_know;
> > +
> >DUMP_VECT_SCOPE ("get_loop_niters");
> >  
> > -  if (!exit)
> > -return cond;
> > +  if (exits.is_empty ())
> > +return conds;
> >  
> > -  may_be_zero = NULL_TREE;
> > -  if (!number_of_iterations_exit_assumptions (loop, exit, &niter_d

Re: [PATCH] cprop: Do not set REG_EQUAL note when simplifying paradoxical subreg [PR110206]

2023-07-14 Thread Richard Biener via Gcc-patches
On Fri, 14 Jul 2023, Uros Bizjak wrote:

> On Fri, Jul 14, 2023 at 10:31 AM Richard Biener  wrote:
> >
> > On Fri, 14 Jul 2023, Uros Bizjak wrote:
> >
> > > cprop1 pass does not consider paradoxical subreg and for (insn 22) claims
> > > that it equals 8 elements of HImode by setting REG_EQUAL note:
> > >
> > > (insn 21 19 22 4 (set (reg:V4QI 98)
> > > (mem/u/c:V4QI (symbol_ref/u:DI ("*.LC1") [flags 0x2]) [0  S4
> > > A32])) "pr110206.c":12:42 1530 {*movv4qi_internal}
> > >  (expr_list:REG_EQUAL (const_vector:V4QI [
> > > (const_int -52 [0xffcc]) repeated x4
> > > ])
> > > (nil)))
> > > (insn 22 21 23 4 (set (reg:V8HI 100)
> > > (zero_extend:V8HI (vec_select:V8QI (subreg:V16QI (reg:V4QI 98) 0)
> > > (parallel [
> > > (const_int 0 [0])
> > > (const_int 1 [0x1])
> > > (const_int 2 [0x2])
> > > (const_int 3 [0x3])
> > > (const_int 4 [0x4])
> > > (const_int 5 [0x5])
> > > (const_int 6 [0x6])
> > > (const_int 7 [0x7])
> > > ] "pr110206.c":12:42 7471 
> > > {sse4_1_zero_extendv8qiv8hi2}
> > >  (expr_list:REG_EQUAL (const_vector:V8HI [
> > > (const_int 204 [0xcc]) repeated x8
> > > ])
> > > (expr_list:REG_DEAD (reg:V4QI 98)
> > > (nil
> > >
> > > We rely on the "undefined" vals to have a specific value (from the earlier
> > > REG_EQUAL note) but actual code generation doesn't ensure this (it doesn't
> > > need to).  That said, the issue isn't the constant folding per-se but that
> > > we do not actually constant fold but register an equality that doesn't hold.
> > >
> > > PR target/110206
> > >
> > > gcc/ChangeLog:
> > >
> > > * fwprop.cc (contains_paradoxical_subreg_p): Move to ...
> > > * rtlanal.cc (contains_paradoxical_subreg_p): ... here.
> > > * rtlanal.h (contains_paradoxical_subreg_p): Add prototype.
> > > * cprop.cc (try_replace_reg): Do not set REG_EQUAL note
> > > when the original source contains a paradoxical subreg.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.dg/torture/pr110206.c: New test.
> > >
> > > Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
> > >
> > > OK for mainline and backports?
> >
> > OK.
> >
> > I think the testcase can also run on other targets if you add
> > dg-additional-options "-w -Wno-psabi", all generic vector ops
> > should be lowered if not supported.
> 
> True, but with lowered vector ops, the test would not even come close
> to the problem. The problem is specific to generic vector ops, and can
> be triggered only when paradoxical subregs are used to implement
> (partial) vector modes. This is the case on x86, where partial vectors
> are now heavily used, and even there we need the latest vector ISA
> enabled to trip the condition.
> 
> The above is the reason that dg-torture is used, with the hope that
> the runtime failure will trip when testsuite is run with specific
> target options.

I see.  I'm fine with this then though moving to gcc.target/i386
with appropriate triggering options and a dg-require for runtime
support would also work.

Richard.


Re: [PATCH] cprop: Do not set REG_EQUAL note when simplifying paradoxical subreg [PR110206]

2023-07-14 Thread Uros Bizjak via Gcc-patches
On Fri, Jul 14, 2023 at 10:31 AM Richard Biener  wrote:
>
> On Fri, 14 Jul 2023, Uros Bizjak wrote:
>
> > cprop1 pass does not consider paradoxical subreg and for (insn 22) claims
> > that it equals 8 elements of HImode by setting REG_EQUAL note:
> >
> > (insn 21 19 22 4 (set (reg:V4QI 98)
> > (mem/u/c:V4QI (symbol_ref/u:DI ("*.LC1") [flags 0x2]) [0  S4
> > A32])) "pr110206.c":12:42 1530 {*movv4qi_internal}
> >  (expr_list:REG_EQUAL (const_vector:V4QI [
> > (const_int -52 [0xffcc]) repeated x4
> > ])
> > (nil)))
> > (insn 22 21 23 4 (set (reg:V8HI 100)
> > (zero_extend:V8HI (vec_select:V8QI (subreg:V16QI (reg:V4QI 98) 0)
> > (parallel [
> > (const_int 0 [0])
> > (const_int 1 [0x1])
> > (const_int 2 [0x2])
> > (const_int 3 [0x3])
> > (const_int 4 [0x4])
> > (const_int 5 [0x5])
> > (const_int 6 [0x6])
> > (const_int 7 [0x7])
> > ] "pr110206.c":12:42 7471 
> > {sse4_1_zero_extendv8qiv8hi2}
> >  (expr_list:REG_EQUAL (const_vector:V8HI [
> > (const_int 204 [0xcc]) repeated x8
> > ])
> > (expr_list:REG_DEAD (reg:V4QI 98)
> > (nil
> >
> > We rely on the "undefined" vals to have a specific value (from the earlier
> > REG_EQUAL note) but actual code generation doesn't ensure this (it doesn't
> > need to).  That said, the issue isn't the constant folding per-se but that
> > we do not actually constant fold but register an equality that doesn't hold.
> >
> > PR target/110206
> >
> > gcc/ChangeLog:
> >
> > * fwprop.cc (contains_paradoxical_subreg_p): Move to ...
> > * rtlanal.cc (contains_paradoxical_subreg_p): ... here.
> > * rtlanal.h (contains_paradoxical_subreg_p): Add prototype.
> > * cprop.cc (try_replace_reg): Do not set REG_EQUAL note
> > when the original source contains a paradoxical subreg.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/torture/pr110206.c: New test.
> >
> > Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
> >
> > OK for mainline and backports?
>
> OK.
>
> I think the testcase can also run on other targets if you add
> dg-additional-options "-w -Wno-psabi", all generic vector ops
> should be lowered if not supported.

True, but with lowered vector ops, the test would not even come close
to the problem. The problem is specific to generic vector ops, and can
be triggered only when paradoxical subregs are used to implement
(partial) vector modes. This is the case on x86, where partial vectors
are now heavily used, and even there we need the latest vector ISA
enabled to trip the condition.

The above is the reason that dg-torture is used, with the hope that
the runtime failure will trip when testsuite is run with specific
target options.
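A small C model of insn 22 from the quoted dump illustrates the point: only the four bytes of (reg:V4QI 98) are defined, yet the vec_select reads the low eight bytes of the V16QI paradoxical subreg. The fill byte below is a stand-in for whatever the upper part of the register happens to contain; this is an illustration, not GCC code.

```c
#include <stdint.h>
#include <string.h>

/* Model of insn 22:
     (zero_extend:V8HI (vec_select:V8QI (subreg:V16QI (reg:V4QI 98) 0)
                                        [0 1 2 3 4 5 6 7]))
   Only the 4 bytes of the V4QI register are defined; FILL stands in for
   whatever the remaining 12 bytes of the hard register contain.  */
static void
model_insn22 (const uint8_t v4qi[4], uint8_t fill, uint16_t out[8])
{
  uint8_t v16qi[16];
  memset (v16qi, fill, sizeof v16qi); /* undefined upper part */
  memcpy (v16qi, v4qi, 4);            /* the defined V4QI value */
  for (int i = 0; i < 8; i++)
    out[i] = v16qi[i];                /* vec_select + zero_extend */
}
```

With a fill of 0, elements 4 to 7 come out as 0 rather than 204 (0xcc), so a REG_EQUAL note claiming eight copies of 204 only holds by accident.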

Uros.


[PATCH] Provide extra checking for phi argument access from edge

2023-07-14 Thread Richard Biener via Gcc-patches
The following adds checking that the edge for which we query the associated
PHI arg is actually an incoming edge of the PHI node's block.  Triggered by
questionable code in one of my reviews.
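The invariant that the new checking asserts can be shown with hypothetical, heavily simplified structures (GCC's real edge, basic_block and gphi types differ): an edge's dest_idx is its position in the destination block's predecessor list, so it only selects the right PHI argument when that destination is the PHI's own block. Plain assert stands in for gcc_checking_assert, which is only active in checking-enabled builds.

```c
#include <assert.h>

/* Hypothetical, heavily simplified stand-ins for GCC's types.  */
struct block { int index; };
struct edge_s { struct block *dest; unsigned dest_idx; };
struct phi_s { struct block *bb; int args[4]; };

/* Return the PHI argument that flows in along edge E.  */
static int
phi_arg_def_from_edge (const struct phi_s *phi, const struct edge_s *e)
{
  /* E->dest_idx indexes E->dest's predecessors; it is only meaningful
     for this PHI if E->dest is the PHI's block.  */
  assert (e->dest == phi->bb);
  return phi->args[e->dest_idx];
}
```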

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

* gimple.h (gimple_phi_arg): New const overload.
(gimple_phi_arg_def): Make gimple arg const.
(gimple_phi_arg_def_from_edge): New inline function.
* tree-phinodes.h (gimple_phi_arg_imm_use_ptr_from_edge):
Likewise.
* tree-ssa-operands.h (PHI_ARG_DEF_FROM_EDGE): Direct to
new inline function.
(PHI_ARG_DEF_PTR_FROM_EDGE): Likewise.
---
 gcc/gimple.h| 25 -
 gcc/tree-phinodes.h |  7 +++
 gcc/tree-ssa-operands.h |  4 ++--
 3 files changed, 33 insertions(+), 3 deletions(-)

diff --git a/gcc/gimple.h b/gcc/gimple.h
index daf55242f68..d3750f95d79 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -4633,6 +4633,13 @@ gimple_phi_arg (const gphi *gs, unsigned index)
   return &(gs->args[index]);
 }
 
+inline const phi_arg_d *
+gimple_phi_arg (const gimple *gs, unsigned index)
+{
+  const gphi *phi_stmt = as_a <const gphi *> (gs);
+  return gimple_phi_arg (phi_stmt, index);
+}
+
 inline struct phi_arg_d *
 gimple_phi_arg (gimple *gs, unsigned index)
 {
@@ -4678,11 +4685,27 @@ gimple_phi_arg_def (const gphi *gs, size_t index)
 }
 
 inline tree
-gimple_phi_arg_def (gimple *gs, size_t index)
+gimple_phi_arg_def (const gimple *gs, size_t index)
 {
   return gimple_phi_arg (gs, index)->def;
 }
 
+/* Return the tree operand for the argument associated with
+   edge E of PHI node GS.  */
+
+inline tree
+gimple_phi_arg_def_from_edge (const gphi *gs, const_edge e)
+{
+  gcc_checking_assert (e->dest == gimple_bb (gs));
+  return gimple_phi_arg (gs, e->dest_idx)->def;
+}
+
+inline tree
+gimple_phi_arg_def_from_edge (const gimple *gs, const_edge e)
+{
+  gcc_checking_assert (e->dest == gimple_bb (gs));
+  return gimple_phi_arg (gs, e->dest_idx)->def;
+}
 
 /* Return a pointer to the tree operand for argument I of phi node PHI.  */
 
diff --git a/gcc/tree-phinodes.h b/gcc/tree-phinodes.h
index 932a461e987..be114e317b4 100644
--- a/gcc/tree-phinodes.h
+++ b/gcc/tree-phinodes.h
@@ -37,6 +37,13 @@ gimple_phi_arg_imm_use_ptr (gimple *gs, int i)
   return &gimple_phi_arg (gs, i)->imm_use;
 }
 
+inline use_operand_p
+gimple_phi_arg_imm_use_ptr_from_edge (gimple *gs, const_edge e)
+{
+  gcc_checking_assert (e->dest == gimple_bb (gs));
+  return &gimple_phi_arg (gs, e->dest_idx)->imm_use;
+}
+
 /* Return the phi argument which contains the specified use.  */
 
 inline int
diff --git a/gcc/tree-ssa-operands.h b/gcc/tree-ssa-operands.h
index ae36bcdb893..c7b74e046d0 100644
--- a/gcc/tree-ssa-operands.h
+++ b/gcc/tree-ssa-operands.h
@@ -83,9 +83,9 @@ struct GTY(()) ssa_operands {
 #define SET_PHI_ARG_DEF(PHI, I, V) \
SET_USE (PHI_ARG_DEF_PTR ((PHI), (I)), (V))
 #define PHI_ARG_DEF_FROM_EDGE(PHI, E)  \
-   PHI_ARG_DEF ((PHI), (E)->dest_idx)
+   gimple_phi_arg_def_from_edge ((PHI), (E))
 #define PHI_ARG_DEF_PTR_FROM_EDGE(PHI, E)  \
-   PHI_ARG_DEF_PTR ((PHI), (E)->dest_idx)
+   gimple_phi_arg_imm_use_ptr_from_edge ((PHI), (E))
 #define PHI_ARG_INDEX_FROM_USE(USE)   phi_arg_index_from_use (USE)
 
 
-- 
2.35.3


Re: [PATCH] cprop: Do not set REG_EQUAL note when simplifying paradoxical subreg [PR110206]

2023-07-14 Thread Richard Biener via Gcc-patches
On Fri, 14 Jul 2023, Uros Bizjak wrote:

> cprop1 pass does not consider paradoxical subreg and for (insn 22) claims
> that it equals 8 elements of HImode by setting REG_EQUAL note:
> 
> (insn 21 19 22 4 (set (reg:V4QI 98)
> (mem/u/c:V4QI (symbol_ref/u:DI ("*.LC1") [flags 0x2]) [0  S4
> A32])) "pr110206.c":12:42 1530 {*movv4qi_internal}
>  (expr_list:REG_EQUAL (const_vector:V4QI [
> (const_int -52 [0xffcc]) repeated x4
> ])
> (nil)))
> (insn 22 21 23 4 (set (reg:V8HI 100)
> (zero_extend:V8HI (vec_select:V8QI (subreg:V16QI (reg:V4QI 98) 0)
> (parallel [
> (const_int 0 [0])
> (const_int 1 [0x1])
> (const_int 2 [0x2])
> (const_int 3 [0x3])
> (const_int 4 [0x4])
> (const_int 5 [0x5])
> (const_int 6 [0x6])
> (const_int 7 [0x7])
> ] "pr110206.c":12:42 7471 
> {sse4_1_zero_extendv8qiv8hi2}
>  (expr_list:REG_EQUAL (const_vector:V8HI [
> (const_int 204 [0xcc]) repeated x8
> ])
> (expr_list:REG_DEAD (reg:V4QI 98)
> (nil
> 
> We rely on the "undefined" vals to have a specific value (from the earlier
> REG_EQUAL note) but actual code generation doesn't ensure this (it doesn't
> need to).  That said, the issue isn't the constant folding per-se but that
> we do not actually constant fold but register an equality that doesn't hold.
> 
> PR target/110206
> 
> gcc/ChangeLog:
> 
> * fwprop.cc (contains_paradoxical_subreg_p): Move to ...
> * rtlanal.cc (contains_paradoxical_subreg_p): ... here.
> * rtlanal.h (contains_paradoxical_subreg_p): Add prototype.
> * cprop.cc (try_replace_reg): Do not set REG_EQUAL note
> when the original source contains a paradoxical subreg.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.dg/torture/pr110206.c: New test.
> 
> Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
> 
> OK for mainline and backports?

OK.

I think the testcase can also run on other targets if you add
dg-additional-options "-w -Wno-psabi", all generic vector ops
should be lowered if not supported.

Richard.


Re: [PATCH] gcc-ar: Handle response files properly [PR77576]

2023-07-14 Thread Costas Argyris via Gcc-patches
Pinging to try and get this bug in gcc-ar fixed.

Note that the patch posted as an attachment in

https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623400.html

is exactly the same as the patch embedded in

https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623855.html

and the one posted in the PR itself

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77576

On Fri, 7 Jul 2023 at 13:00, Costas Argyris 
wrote:

> Bootstrapped successfully on x86_64-pc-linux-gnu
>
> On Fri, 7 Jul 2023 at 11:33, Costas Argyris 
> wrote:
>
>> Problem: gcc-ar fails when a @file is passed to it:
>>
>> $ cat rsp
>> --version
>> $ gcc-ar @rsp
>> /usr/bin/ar: invalid option -- '@'
>>
>> This is because a dash '-' is prepended to the first
>> argument if it doesn't start with one, resulting in
>> the wrong call 'ar -@rsp'.
>>
>> Fix: Expand argv to get rid of any @files and if any
>> expansions were made, pass everything through a
>> temporary response file.
>>
>> $ gcc-ar @rsp
>> GNU ar (GNU Binutils for Debian) 2.35.2
>> ...
>>
>>
>> PR gcc-ar/77576
>> * gcc/gcc-ar.cc (main): Expand argv and use
>> temporary response file to call ar if any
>> expansions were made.
>> ---
>>  gcc/gcc-ar.cc | 47 +++
>>  1 file changed, 47 insertions(+)
>>
>> diff --git a/gcc/gcc-ar.cc b/gcc/gcc-ar.cc
>> index 4e4c525927d..417c4913793 100644
>> --- a/gcc/gcc-ar.cc
>> +++ b/gcc/gcc-ar.cc
>> @@ -135,6 +135,10 @@ main (int ac, char **av)
>>int k, status, err;
>>const char *err_msg;
>>const char **nargv;
>> +  char **old_argv;
>> +  const char *rsp_file = NULL;
>> +  const char *rsp_arg = NULL;
>> +  const char *rsp_argv[3];
>>bool is_ar = !strcmp (PERSONALITY, "ar");
>>int exit_code = FATAL_EXIT_CODE;
>>int i;
>> @@ -209,6 +213,13 @@ main (int ac, char **av)
>>   }
>>  }
>>
>> +  /* Expand any @files before modifying the command line
>> + and use a temporary response file if there were any.  */
>> +  old_argv = av;
>> +  expandargv (&ac, &av);
>> +  if (av != old_argv)
>> +rsp_file = make_temp_file ("");
>> +
>>/* Prepend - if necessary.  */
>>if (is_ar && av[1] && av[1][0] != '-')
>>  av[1] = concat ("-", av[1], NULL);
>> @@ -225,6 +236,39 @@ main (int ac, char **av)
>>  nargv[j + k] = av[k];
>>nargv[j + k] = NULL;
>>
>> +  /* If @file was passed, put nargv into the temporary response
>> + file and then change it to a single @FILE argument, where
>> + FILE is the temporary filename.  */
>> +  if (rsp_file)
>> +{
>> +  FILE *f;
>> +  int status;
>> +  f = fopen (rsp_file, "w");
>> +  if (f == NULL)
>> +{
>> +  fprintf (stderr, "Cannot open temporary file %s\n", rsp_file);
>> +  exit (1);
>> +}
>> +  status = writeargv (
>> +  CONST_CAST2 (char * const *, const char **, nargv) + 1, f);
>> +  if (status)
>> +{
>> +  fprintf (stderr, "Cannot write to temporary file %s\n", rsp_file);
>> +  exit (1);
>> +}
>> +  status = fclose (f);
>> +  if (EOF == status)
>> +{
>> +  fprintf (stderr, "Cannot close temporary file %s\n", rsp_file);
>> +  exit (1);
>> +}
>> +  rsp_arg = concat ("@", rsp_file, NULL);
>> +  rsp_argv[0] = nargv[0];
>> +  rsp_argv[1] = rsp_arg;
>> +  rsp_argv[2] = NULL;
>> +  nargv = rsp_argv;
>> +}
>> +
>>/* Run utility */
>>/* ??? the const is misplaced in pex_one's argv? */
>>err_msg = pex_one (PEX_LAST|PEX_SEARCH,
>> @@ -249,5 +293,8 @@ main (int ac, char **av)
>>else
>>  exit_code = SUCCESS_EXIT_CODE;
>>
>> +  if (rsp_file)
>> +unlink (rsp_file);
>> +
>>return exit_code;
>>  }
>> --
>> 2.30.2
>>
>
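The quoted gcc-ar fix hands ar a temporary response file written with libiberty's writeargv, so that the expanded arguments reach ar intact. The writer below is only a simplified sketch: its escaping rules (a backslash before whitespace, backslashes and quotes; one argument per line) are our assumption of what such a writer does, not a copy of writeargv, and empty arguments are not handled. The point is that an argument containing spaces survives the round trip through @file.

```c
#include <ctype.h>
#include <stdio.h>

/* Write the NULL-terminated ARGV to F, one argument per line, escaping
   whitespace, backslashes and quotes with a backslash.  Returns 0 on
   success and 1 on a write error.  Simplified sketch, not writeargv.  */
static int
write_rsp (const char *const *argv, FILE *f)
{
  for (; *argv != NULL; argv++)
    {
      for (const char *p = *argv; *p != '\0'; p++)
        {
          unsigned char c = (unsigned char) *p;
          if ((isspace (c) || c == '\\' || c == '\'' || c == '"')
              && fputc ('\\', f) == EOF)
            return 1;
          if (fputc (c, f) == EOF)
            return 1;
        }
      if (fputc ('\n', f) == EOF)
        return 1;
    }
  return 0;
}
```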


Re: [x86_64 PATCH] Improved insv of DImode/DFmode {high,low}parts into TImode.

2023-07-14 Thread Uros Bizjak via Gcc-patches
On Thu, Jul 13, 2023 at 6:45 PM Roger Sayle  wrote:
>
>
> This is the next piece towards a fix for (the x86_64 ABI issues affecting)
> PR 88873.  This patch generalizes the recent tweak to ix86_expand_move
> for setting the highpart of a TImode reg from a DImode source using
> *insvti_highpart_1, to handle both DImode and DFmode sources, and also
> use the recently added *insvti_lowpart_1 for setting the lowpart.
>
> Although this is another intermediate step (not yet a fix), towards
> enabling *insvti and *concat* patterns to be candidates for TImode STV
> (by using V2DI/V2DF instructions), it already improves things a little.
>
> For the test case from PR 88873
>
> typedef struct { double x, y; } s_t;
> typedef double v2df __attribute__ ((vector_size (2 * sizeof(double))));
>
> s_t foo (s_t a, s_t b, s_t c)
> {
>   return (s_t) { fma(a.x, b.x, c.x), fma (a.y, b.y, c.y) };
> }
>
>
> With -O2 -march=cascadelake, GCC currently generates:
>
> Before (29 instructions):
> vmovq   %xmm2, -56(%rsp)
> movq-56(%rsp), %rdx
> vmovq   %xmm4, -40(%rsp)
> movq$0, -48(%rsp)
> movq%rdx, -56(%rsp)
> movq-40(%rsp), %rdx
> vmovq   %xmm0, -24(%rsp)
> movq%rdx, -40(%rsp)
> movq-24(%rsp), %rsi
> movq-56(%rsp), %rax
> movq$0, -32(%rsp)
> vmovq   %xmm3, -48(%rsp)
> movq-48(%rsp), %rcx
> vmovq   %xmm5, -32(%rsp)
> vmovq   %rax, %xmm6
> movq-40(%rsp), %rax
> movq$0, -16(%rsp)
> movq%rsi, -24(%rsp)
> movq-32(%rsp), %rsi
> vpinsrq $1, %rcx, %xmm6, %xmm6
> vmovq   %rax, %xmm7
> vmovq   %xmm1, -16(%rsp)
> vmovapd %xmm6, %xmm3
> vpinsrq $1, %rsi, %xmm7, %xmm7
> vfmadd132pd -24(%rsp), %xmm7, %xmm3
> vmovapd %xmm3, -56(%rsp)
> vmovsd  -48(%rsp), %xmm1
> vmovsd  -56(%rsp), %xmm0
> ret
>
> After (20 instructions):
> vmovq   %xmm2, -56(%rsp)
> movq-56(%rsp), %rax
> vmovq   %xmm3, -48(%rsp)
> vmovq   %xmm4, -40(%rsp)
> movq-48(%rsp), %rcx
> vmovq   %xmm5, -32(%rsp)
> vmovq   %rax, %xmm6
> movq-40(%rsp), %rax
> movq-32(%rsp), %rsi
> vpinsrq $1, %rcx, %xmm6, %xmm6
> vmovq   %xmm0, -24(%rsp)
> vmovq   %rax, %xmm7
> vmovq   %xmm1, -16(%rsp)
> vmovapd %xmm6, %xmm2
> vpinsrq $1, %rsi, %xmm7, %xmm7
> vfmadd132pd -24(%rsp), %xmm7, %xmm2
> vmovapd %xmm2, -56(%rsp)
> vmovsd  -48(%rsp), %xmm1
> vmovsd  -56(%rsp), %xmm0
> ret
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  No testcase yet, as the above code will hopefully
> change dramatically with the next pieces.  Ok for mainline?
>
>
> 2023-07-13  Roger Sayle  
>
> gcc/ChangeLog
> * config/i386/i386-expand.cc (ix86_expand_move): Generalize special
> case inserting of 64-bit values into a TImode register, to handle
> both DImode and DFmode using either *insvti_lowpart_1
> or *insvti_highpart_1.

LGTM, but please watch out for fallout.

Thanks,
Uros.
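What the *insvti_lowpart_1 and *insvti_highpart_1 patterns compute, as we read them from their names and the description above, can be modelled in C with GCC's unsigned __int128 extension: a 64-bit value replaces one half of a 128-bit value, and a DFmode source contributes its raw bit pattern. This is a semantic sketch only, not the expander code.

```c
#include <stdint.h>
#include <string.h>

typedef unsigned __int128 u128; /* GCC/Clang extension on 64-bit targets */

/* Replace the high 64 bits of X with HI, keeping the low half.  */
static u128
insert_high (u128 x, uint64_t hi)
{
  return (x & (u128) UINT64_MAX) | ((u128) hi << 64);
}

/* Replace the low 64 bits of X with LO, keeping the high half.  */
static u128
insert_low (u128 x, uint64_t lo)
{
  return (x & ~(u128) UINT64_MAX) | lo;
}

/* A DFmode source contributes its raw IEEE-754 bit pattern.  */
static uint64_t
double_bits (double d)
{
  uint64_t bits;
  memcpy (&bits, &d, sizeof bits);
  return bits;
}
```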


Re: [Patch] libgomp: Use libnuma for OpenMP's partition=nearest allocation trait

2023-07-14 Thread Tobias Burnus

Hi Prathamesh,

On 13.07.23 18:13, Prathamesh Kulkarni wrote:


The newly added tests in above commit -- alloc-11.c and alloc-12.c
seem to fail during execution on armv8l-unknown-linux-gnueabihf:


Thanks for the report, and sorry for the breakage. While I was aware that
libnuma is potentially not available, the code did not actually test for it
properly. (That 200+ packages on this system require libnuma did not help
with testing without it, though.)

I have committed the attached obvious patch as r14-2514-g407d68daed00e0,
which hopefully fixes all issues.
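The fix boils down to selecting the libnuma memkind only when the dlopened symbol was actually resolved, and otherwise keeping the plain malloc path. A minimal sketch with hypothetical names (numa_syms, choose_memkind and allocate are not libgomp's real identifiers):

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the dlopened libnuma data: ALLOC_LOCAL is
   NULL when libnuma or the symbol could not be loaded at run time.  */
struct numa_syms
{
  void *(*alloc_local) (size_t);
};

enum memkind { MEMKIND_NONE, MEMKIND_LIBNUMA };

/* Select libnuma only if the symbol was actually resolved.  */
static enum memkind
choose_memkind (const struct numa_syms *numa)
{
  if (numa != NULL && numa->alloc_local != NULL)
    return MEMKIND_LIBNUMA;
  return MEMKIND_NONE;
}

/* MEMKIND_NONE falls back to the normal malloc.  */
static void *
allocate (const struct numa_syms *numa, size_t size)
{
  if (choose_memkind (numa) == MEMKIND_LIBNUMA)
    return numa->alloc_local (size);
  return malloc (size);
}
```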

Tobias
commit 407d68daed00e040a7d9545b2a18aa27bf93a106
Author: Tobias Burnus 
Date:   Fri Jul 14 09:14:37 2023 +0200

libgomp: Fix allocator handling for Linux when libnuma is not available

Follow up to r14-2462-g450b05ce54d3f0.  The case that libnuma was not
available at runtime was not properly handled; now it falls back to
the normal malloc.

libgomp/

* allocator.c (omp_init_allocator): Check whether symbol from
dlopened libnuma is available before using libnuma for
allocations.
---
 libgomp/allocator.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index b3187ab2911..90f2dcb60d6 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -377,8 +377,9 @@ omp_init_allocator (omp_memspace_handle_t memspace, int ntraits,
 #ifdef LIBGOMP_USE_LIBNUMA
   if (data.memkind == GOMP_MEMKIND_NONE && data.partition == omp_atv_nearest)
 {
-  data.memkind = GOMP_MEMKIND_LIBNUMA;
   libnuma_data = gomp_get_libnuma ();
+  if (libnuma_data->numa_alloc_local != NULL)
+	data.memkind = GOMP_MEMKIND_LIBNUMA;
 }
 #endif
 


Re: [PATCH] i386: Auto vectorize usdot_prod, udot_prod with AVXVNNIINT16 instruction.

2023-07-14 Thread Uros Bizjak via Gcc-patches
On Fri, Jul 14, 2023 at 8:24 AM Haochen Jiang  wrote:
>
> Hi all,
>
> This patch aims to auto-vectorize usdot_prod and udot_prod with the newly
> introduced AVX-VNNI-INT16.
>
> Also I refined the redundant mode iterator in the patch.
>
> Regtested on x86_64-pc-linux-gnu. Ok for trunk after AVX-VNNI-INT16 patch
> checked in?
>
> BRs,
> Haochen
>
> gcc/ChangeLog:
>
> * config/i386/sse.md (VI2_AVX2): Delete V32HI since we actually
> have the same iterator. Also rename all the occurrences to
> VI2_AVX2_AVX512BW.
> (usdot_prod): New define_expand.
> (udot_prod): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/vnniint16-auto-vectorize-1.c: New test.
> * gcc.target/i386/vnniint16-auto-vectorize-2.c: Ditto.

OK with two changes below.

Thanks,
Uros.

> ---
>  gcc/config/i386/sse.md| 98 +--
>  .../i386/vnniint16-auto-vectorize-1.c | 28 ++
>  .../i386/vnniint16-auto-vectorize-2.c | 76 ++
>  3 files changed, 172 insertions(+), 30 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/vnniint16-auto-vectorize-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/vnniint16-auto-vectorize-2.c
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 7471932b27e..98e7f9334bc 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -545,6 +545,9 @@
> V32HI (V16HI "TARGET_AVX512VL")])
>
>  (define_mode_iterator VI2_AVX2
> +  [(V16HI "TARGET_AVX2") V8HI])
> +
> +(define_mode_iterator VI2_AVX2_AVX512BW
>[(V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX2") V8HI])
>
>  (define_mode_iterator VI2_AVX512F
> @@ -637,9 +640,6 @@
> (V16HI "TARGET_AVX2") V8HI
> (V8SI "TARGET_AVX2") V4SI])
>
> -(define_mode_iterator VI2_AVX2_AVX512BW
> -  [(V32HI "TARGET_AVX512BW") (V16HI "TARGET_AVX2") V8HI])
> -
>  (define_mode_iterator VI248_AVX512VL
>[V32HI V16SI V8DI
> (V16HI "TARGET_AVX512VL") (V8SI "TARGET_AVX512VL")
> @@ -15298,16 +15298,16 @@
>  })
>
>  (define_expand "mul3"
> -  [(set (match_operand:VI2_AVX2 0 "register_operand")
> -   (mult:VI2_AVX2 (match_operand:VI2_AVX2 1 "vector_operand")
> -  (match_operand:VI2_AVX2 2 "vector_operand")))]
> +  [(set (match_operand:VI2_AVX2_AVX512BW 0 "register_operand")
> +   (mult:VI2_AVX2_AVX512BW (match_operand:VI2_AVX2_AVX512BW 1 "vector_operand")
> +  (match_operand:VI2_AVX2_AVX512BW 2 "vector_operand")))]
>"TARGET_SSE2 &&  && "
>"ix86_fixup_binary_operands_no_copy (MULT, mode, operands);")
>
>  (define_insn "*mul3"
> -  [(set (match_operand:VI2_AVX2 0 "register_operand" "=x,")
> -   (mult:VI2_AVX2 (match_operand:VI2_AVX2 1 "vector_operand" "%0,")
> -  (match_operand:VI2_AVX2 2 "vector_operand" "xBm,m")))]
> +  [(set (match_operand:VI2_AVX2_AVX512BW 0 "register_operand" "=x,")
> +   (mult:VI2_AVX2_AVX512BW (match_operand:VI2_AVX2_AVX512BW 1 "vector_operand" "%0,")
> +  (match_operand:VI2_AVX2_AVX512BW 2 "vector_operand" "xBm,m")))]
>"TARGET_SSE2 && !(MEM_P (operands[1]) && MEM_P (operands[2]))
> &&  && "
>"@
> @@ -15320,28 +15320,28 @@
> (set_attr "mode" "")])
>
>  (define_expand "mul3_highpart"
> -  [(set (match_operand:VI2_AVX2 0 "register_operand")
> -   (truncate:VI2_AVX2
> +  [(set (match_operand:VI2_AVX2_AVX512BW 0 "register_operand")
> +   (truncate:VI2_AVX2_AVX512BW
>   (lshiftrt:
> (mult:
>   (any_extend:
> -   (match_operand:VI2_AVX2 1 "vector_operand"))
> +   (match_operand:VI2_AVX2_AVX512BW 1 "vector_operand"))
>   (any_extend:
> -   (match_operand:VI2_AVX2 2 "vector_operand")))
> +   (match_operand:VI2_AVX2_AVX512BW 2 "vector_operand")))
> (const_int 16]
>"TARGET_SSE2
> &&  && "
>"ix86_fixup_binary_operands_no_copy (MULT, mode, operands);")
>
>  (define_insn "*mul3_highpart"
> -  [(set (match_operand:VI2_AVX2 0 "register_operand" "=x,")
> -   (truncate:VI2_AVX2
> +  [(set (match_operand:VI2_AVX2_AVX512BW 0 "register_operand" "=x,")
> +   (truncate:VI2_AVX2_AVX512BW
>   (lshiftrt:
> (mult:
>   (any_extend:
> -   (match_operand:VI2_AVX2 1 "vector_operand" "%0,"))
> +   (match_operand:VI2_AVX2_AVX512BW 1 "vector_operand" "%0,"))
>   (any_extend:
> -   (match_operand:VI2_AVX2 2 "vector_operand" "xBm,m")))
> +   (match_operand:VI2_AVX2_AVX512BW 2 "vector_operand" "xBm,m")))
> (const_int 16]
>"TARGET_SSE2 && !(MEM_P (operands[1]) && MEM_P (operands[2]))
> &&  && "
> @@ -15591,8 +15591,8 @@
>  (define_insn "avx512bw_pmaddwd512"
>[(set (match_operand: 0 "register_operand" "=v")
>(unspec:
> -[(match_operand:VI2_AVX2 1 "register_operand" "v")
> - (match_operand:VI2_AVX2 2 "nonim

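For context, `udot_prod` and `usdot_prod` are the optabs the vectorizer uses for dot-product reductions: narrow elements are multiplied and the products are summed into a wider accumulator. A scalar sketch of the two 16-bit reduction shapes this patch targets follows; the function names are illustrative, and whether a given loop is actually turned into AVX-VNNI-INT16 instructions depends on the compiler version and target options.

```c
#include <assert.h>
#include <stddef.h>

/* unsigned x unsigned 16-bit products summed into a 32-bit
   accumulator: the reduction shape udot_prod describes.  */
unsigned int
udot16 (const unsigned short *a, const unsigned short *b, size_t n,
        unsigned int acc)
{
  for (size_t i = 0; i < n; i++)
    acc += (unsigned int) a[i] * b[i];
  return acc;
}

/* unsigned x signed 16-bit products summed into a 32-bit
   accumulator: the mixed-sign shape usdot_prod describes.
   Callers must keep the running sum within int range.  */
int
usdot16 (const unsigned short *a, const short *b, size_t n, int acc)
{
  for (size_t i = 0; i < n; i++)
    acc += (int) a[i] * b[i];
  return acc;
}
```

Each single product fits in 32 bits (65535 * 32767 and 65535 * -32768 are both within int range), which is what lets a VNNI-style instruction accumulate pairs of products directly into 32-bit lanes.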
Re: [PATCH 2/2] RISC-V: Implement locality for __builtin_prefetch

2023-07-14 Thread Kito Cheng via Gcc-patches
The corresponding PR on riscv-c-api-doc is still under discussion, so let's defer
this until that settles down :)

https://github.com/riscv-non-isa/riscv-c-api-doc/pull/46

On Thu, Jul 13, 2023 at 1:40 PM Monk Chiang via Gcc-patches
 wrote:
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.cc (riscv_print_operand):
>   Add 'N' to print a non-temporal locality hint instruction.
> * config/riscv/riscv.md (prefetch):
>   Add NTLH instruction for prefetch.r and prefetch.w.
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/prefetch-zihintntl.c: New test.
> ---
>  gcc/config/riscv/riscv.cc | 22 +++
>  gcc/config/riscv/riscv.md | 10 ++---
>  .../gcc.target/riscv/prefetch-zihintntl.c | 20 +
>  3 files changed, 49 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/prefetch-zihintntl.c
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 706c18416db..42f80088bab 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -4532,6 +4532,7 @@ riscv_memmodel_needs_amo_release (enum memmodel model)
> 'A' Print the atomic operation suffix for memory model OP.
> 'I' Print the LR suffix for memory model OP.
> 'J' Print the SC suffix for memory model OP.
> +   'N' Print a non-temporal locality hints instruction.
> 'z' Print x0 if OP is zero, otherwise print OP normally.
> 'i' Print i if the operand is not a register.
> 'S' Print shift-index of single-bit mask OP.
> @@ -4718,6 +4719,27 @@ riscv_print_operand (FILE *file, rtx op, int letter)
>break;
>  }
>
> +case 'N':
> +  {
> +   const char *ntl_hint = NULL;
> +   switch (INTVAL (op))
> + {
> + case 0:
> +   ntl_hint = "ntl.all";
> +   break;
> + case 1:
> +   ntl_hint = "ntl.pall";
> +   break;
> + case 2:
> +   ntl_hint = "ntl.p1";
> +   break;
> + }
> +
> +  if (ntl_hint)
> +   asm_fprintf (file, "%s\n\t", ntl_hint);
> +  break;
> +  }
> +
>  case 'i':
>if (code != REG)
>  fputs ("i", file);
> diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> index 7988026d129..3357c981b5d 100644
> --- a/gcc/config/riscv/riscv.md
> +++ b/gcc/config/riscv/riscv.md
> @@ -3256,11 +3256,15 @@
>  {
>switch (INTVAL (operands[1]))
>{
> -case 0: return "prefetch.r\t%a0";
> -case 1: return "prefetch.w\t%a0";
> +case 0: return TARGET_ZIHINTNTL ? "%N2prefetch.r\t%a0" : "prefetch.r\t%a0";
> +case 1: return TARGET_ZIHINTNTL ? "%N2prefetch.w\t%a0" : "prefetch.w\t%a0";
>  default: gcc_unreachable ();
>}
> -})
> +}
> +  [(set (attr "length") (if_then_else (and (match_test "TARGET_ZIHINTNTL")
> +  (match_test "INTVAL (operands[2]) != 3"))
> + (const_string "8")
> + (const_string "4")))])
>
>  (define_insn "riscv_prefetchi_"
>[(unspec_volatile:X [(match_operand:X 0 "address_operand" "r")
> diff --git a/gcc/testsuite/gcc.target/riscv/prefetch-zihintntl.c b/gcc/testsuite/gcc.target/riscv/prefetch-zihintntl.c
> new file mode 100644
> index 000..78a3afe6833
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/prefetch-zihintntl.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile target { { rv64-*-*}}} */
> +/* { dg-options "-march=rv64gc_zicbop_zihintntl -mabi=lp64" } */
> +
> +void foo (char *p)
> +{
> +  __builtin_prefetch (p, 0, 0);
> +  __builtin_prefetch (p, 0, 1);
> +  __builtin_prefetch (p, 0, 2);
> +  __builtin_prefetch (p, 0, 3);
> +  __builtin_prefetch (p, 1, 0);
> +  __builtin_prefetch (p, 1, 1);
> +  __builtin_prefetch (p, 1, 2);
> +  __builtin_prefetch (p, 1, 3);
> +}
> +
> +/* { dg-final { scan-assembler-times "ntl.all" 2 } } */
> +/* { dg-final { scan-assembler-times "ntl.pall" 2 } } */
> +/* { dg-final { scan-assembler-times "ntl.p1" 2 } } */
> +/* { dg-final { scan-assembler-times "prefetch.r" 4 } } */
> +/* { dg-final { scan-assembler-times "prefetch.w" 4 } } */
> --
> 2.40.1
>
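
The new 'N' operand code maps the third (locality) argument of `__builtin_prefetch` onto a Zihintntl hint emitted just before `prefetch.r`/`prefetch.w`; locality 3 (maximal temporal locality) gets no hint, which is why the length attribute is 8 only when a hint is present. A standalone sketch of that mapping follows; `ntl_hint_for_locality`, `prefetch_length` and `prefetch_sequence` are hypothetical helpers, not part of GCC, and the sequence omits the address operand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Mirrors the 'N' case in riscv_print_operand: hint mnemonic for a
   __builtin_prefetch locality value, or NULL for no hint.  */
const char *
ntl_hint_for_locality (int locality)
{
  switch (locality)
    {
    case 0:
      return "ntl.all";   /* No temporal locality.  */
    case 1:
      return "ntl.pall";
    case 2:
      return "ntl.p1";
    default:
      return NULL;        /* Locality 3: plain prefetch.  */
    }
}

/* Mirrors the "length" attribute: hint + prefetch = 8 bytes,
   prefetch alone = 4 bytes.  */
int
prefetch_length (int has_zihintntl, int locality)
{
  return (has_zihintntl && locality != 3) ? 8 : 4;
}

/* Builds the mnemonic sequence the output template would produce,
   e.g. "ntl.pall\n\tprefetch.w" (operands omitted).  */
void
prefetch_sequence (char *buf, size_t size, int has_zihintntl,
                   int rw, int locality)
{
  const char *hint = has_zihintntl ? ntl_hint_for_locality (locality) : NULL;
  const char *insn = rw ? "prefetch.w" : "prefetch.r";
  if (hint)
    snprintf (buf, size, "%s\n\t%s", hint, insn);
  else
    snprintf (buf, size, "%s", insn);
}
```

This mapping is also what the new test counts: localities 0, 1 and 2 each appear once for reads and once for writes, giving two of each hint, while all eight calls still produce a `prefetch.r` or `prefetch.w`.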


Re: [PATCH 1/2] RISC-V: Recognized zihintntl extensions

2023-07-14 Thread Kito Cheng via Gcc-patches
Committed, thanks :)

On Thu, Jul 13, 2023 at 1:39 PM Monk Chiang via Gcc-patches
 wrote:
>
> gcc/ChangeLog:
>
> * common/config/riscv/riscv-common.cc:
> (riscv_implied_info): Add zihintntl item.
> (riscv_ext_version_table): Ditto.
> (riscv_ext_flag_table): Ditto.
> * config/riscv/riscv-opts.h (MASK_ZIHINTNTL): New macro.
> (TARGET_ZIHINTNTL): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/arch-22.c: New test.
> * gcc.target/riscv/predef-28.c: New test.
> ---
>  gcc/common/config/riscv/riscv-common.cc|  4 ++
>  gcc/config/riscv/riscv-opts.h  |  2 +
>  gcc/testsuite/gcc.target/riscv/arch-22.c   |  5 +++
>  gcc/testsuite/gcc.target/riscv/predef-28.c | 47 ++
>  4 files changed, 58 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/arch-22.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/predef-28.c
>
> diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
> index 6091d8f281b..28c8f0c1489 100644
> --- a/gcc/common/config/riscv/riscv-common.cc
> +++ b/gcc/common/config/riscv/riscv-common.cc
> @@ -206,6 +206,8 @@ static const struct riscv_ext_version riscv_ext_version_table[] =
>{"zksh",  ISA_SPEC_CLASS_NONE, 1, 0},
>{"zkt",   ISA_SPEC_CLASS_NONE, 1, 0},
>
> +  {"zihintntl", ISA_SPEC_CLASS_NONE, 1, 0},
> +
>{"zicboz",ISA_SPEC_CLASS_NONE, 1, 0},
>{"zicbom",ISA_SPEC_CLASS_NONE, 1, 0},
>{"zicbop",ISA_SPEC_CLASS_NONE, 1, 0},
> @@ -1267,6 +1269,8 @@ static const riscv_ext_flag_table_t riscv_ext_flag_table[] =
>{"zksh",   &gcc_options::x_riscv_zk_subext, MASK_ZKSH},
>{"zkt",&gcc_options::x_riscv_zk_subext, MASK_ZKT},
>
> +  {"zihintntl", &gcc_options::x_riscv_zi_subext, MASK_ZIHINTNTL},
> +
>{"zicboz", &gcc_options::x_riscv_zicmo_subext, MASK_ZICBOZ},
>{"zicbom", &gcc_options::x_riscv_zicmo_subext, MASK_ZICBOM},
>{"zicbop", &gcc_options::x_riscv_zicmo_subext, MASK_ZICBOP},
> diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
> index cfcf608ea62..beee241aa1b 100644
> --- a/gcc/config/riscv/riscv-opts.h
> +++ b/gcc/config/riscv/riscv-opts.h
> @@ -101,9 +101,11 @@ enum riscv_entity
>
>  #define MASK_ZICSR(1 << 0)
>  #define MASK_ZIFENCEI (1 << 1)
> +#define MASK_ZIHINTNTL (1 << 2)
>
>  #define TARGET_ZICSR((riscv_zi_subext & MASK_ZICSR) != 0)
>  #define TARGET_ZIFENCEI ((riscv_zi_subext & MASK_ZIFENCEI) != 0)
> +#define TARGET_ZIHINTNTL ((riscv_zi_subext & MASK_ZIHINTNTL) != 0)
>
>  #define MASK_ZAWRS   (1 << 0)
>  #define TARGET_ZAWRS ((riscv_za_subext & MASK_ZAWRS) != 0)
> diff --git a/gcc/testsuite/gcc.target/riscv/arch-22.c b/gcc/testsuite/gcc.target/riscv/arch-22.c
> new file mode 100644
> index 000..cdc18e13d0f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/arch-22.c
> @@ -0,0 +1,5 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=rv64gc_zihintntl -mabi=lp64 -mcmodel=medlow" } */
> +int foo()
> +{
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/predef-28.c b/gcc/testsuite/gcc.target/riscv/predef-28.c
> new file mode 100644
> index 000..81fdad571e7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/predef-28.c
> @@ -0,0 +1,47 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gc_zihintntl -mabi=lp64 -mcmodel=medlow" } */
> +
> +int main () {
> +
> +#ifndef __riscv_arch_test
> +#error "__riscv_arch_test"
> +#endif
> +
> +#if __riscv_xlen != 64
> +#error "__riscv_xlen"
> +#endif
> +
> +#if !defined(__riscv_i)
> +#error "__riscv_i"
> +#endif
> +
> +#if !defined(__riscv_c)
> +#error "__riscv_c"
> +#endif
> +
> +#if defined(__riscv_e)
> +#error "__riscv_e"
> +#endif
> +
> +#if !defined(__riscv_a)
> +#error "__riscv_a"
> +#endif
> +
> +#if !defined(__riscv_m)
> +#error "__riscv_m"
> +#endif
> +
> +#if !defined(__riscv_f)
> +#error "__riscv_f"
> +#endif
> +
> +#if !defined(__riscv_d)
> +#error "__riscv_d"
> +#endif
> +
> +#if !defined(__riscv_zihintntl)
> +#error "__riscv_zihintntl"
> +#endif
> +
> +  return 0;
> +}
> --
> 2.40.1
>
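
Recognizing `zihintntl` amounts to one more row in the extension flag table: the name is mapped to an options word and a bit (`MASK_ZIHINTNTL`, bit 2 of the `zi` sub-extension word), and `TARGET_ZIHINTNTL` simply tests that bit. A simplified standalone sketch of that lookup, using a single hypothetical `zi_subext` variable and a hypothetical `enable_ext` helper in place of GCC's `gcc_options` machinery:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bit assignments as in the patch's riscv-opts.h hunk.  */
#define MASK_ZICSR     (1 << 0)
#define MASK_ZIFENCEI  (1 << 1)
#define MASK_ZIHINTNTL (1 << 2)

/* Hypothetical stand-in for the riscv_zi_subext option word.  */
static int zi_subext;

#define TARGET_ZIHINTNTL ((zi_subext & MASK_ZIHINTNTL) != 0)

/* Simplified flag table: extension name -> word to update and bit.  */
struct ext_flag
{
  const char *name;
  int *var;
  int mask;
};

static const struct ext_flag ext_flag_table[] = {
  {"zicsr",     &zi_subext, MASK_ZICSR},
  {"zifencei",  &zi_subext, MASK_ZIFENCEI},
  {"zihintntl", &zi_subext, MASK_ZIHINTNTL},
};

/* Set the bit for NAME; returns 1 on success, 0 if NAME is unknown.  */
int
enable_ext (const char *name)
{
  size_t n = sizeof ext_flag_table / sizeof ext_flag_table[0];
  for (size_t i = 0; i < n; i++)
    if (strcmp (ext_flag_table[i].name, name) == 0)
      {
        *ext_flag_table[i].var |= ext_flag_table[i].mask;
        return 1;
      }
  return 0;
}
```

Because each sub-extension group has its own word, the bit indices only need to be unique within that word; the patch's context shows `MASK_ZAWRS` reusing bit 0 in `riscv_za_subext`, just as `MASK_ZICSR` does in `riscv_zi_subext`.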


Re: [PATCH V2] RISC-V: Enable COND_LEN_FMA auto-vectorization

2023-07-14 Thread Robin Dapp via Gcc-patches
Hi Juzhe,

thanks, this looks good to me now; it already did before, actually ;).

Regards
 Robin