[gcc r14-9435] tree-optimization/114297 - SLP reduction with early break fix

2024-03-12 Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:c0c57246d5b47459bdb488734bc2c004a92668b5

commit r14-9435-gc0c57246d5b47459bdb488734bc2c004a92668b5
Author: Richard Biener 
Date:   Mon Mar 11 14:58:57 2024 +0100

tree-optimization/114297 - SLP reduction with early break fix

The following makes sure to pass the SLP node for the live stmts for
which we are generating the reduction epilogue to
vect_create_epilog_for_reduction.  This follows the previous fix for
the non-SLP path.
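For readers unfamiliar with SLP reductions: the testcase's two sums, sw and sh, form a group that the vectorizer can treat as interleaved lanes of one vector, so the epilogue has to know which lanes belong to which reduction. The following is only a hypothetical scalar model of that idea (slp_reduce_model and Sums are invented names; it ignores the early exit and is not how GCC implements the epilogue).

```cpp
#include <array>

// Hypothetical model of an SLP reduction group; not GCC code.
struct Extremes {
  int w;
  int h;
};

struct Sums {
  int sw;
  int sh;
};

// Treat the array as the flat stream w0, h0, w1, h1, ... and accumulate it
// into a 4-lane "vector" whose lanes hold {w, h, w, h}.  Even lanes belong
// to the sw reduction, odd lanes to the sh reduction.
Sums slp_reduce_model(const Extremes* a, int n) {
  std::array<int, 4> acc{};  // zero-initialised vector accumulator
  int i = 0;
  for (; i + 2 <= n; i += 2) {  // "vector" body: two structs per iteration
    acc[0] += a[i].w;
    acc[1] += a[i].h;
    acc[2] += a[i + 1].w;
    acc[3] += a[i + 1].h;
  }
  // Epilogue: each reduction of the group is reduced from its own lanes.
  Sums s{acc[0] + acc[2], acc[1] + acc[3]};
  for (; i < n; ++i) {  // scalar remainder
    s.sw += a[i].w;
    s.sh += a[i].h;
  }
  return s;
}
```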

PR tree-optimization/114297
* tree-vect-loop.cc (vectorizable_live_operation): Pass in the
live stmts SLP node to vect_create_epilog_for_reduction.

* gcc.dg/vect/vect-early-break_123-pr114297.c: New testcase.

Diff:
---
 .../gcc.dg/vect/vect-early-break_123-pr114297.c| 22 ++
 gcc/tree-vect-loop.cc  |  7 ---
 2 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_123-pr114297.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_123-pr114297.c
new file mode 100644
index 000..84487b7903b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_123-pr114297.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-add-options vect_early_break } */
+/* { dg-require-effective-target vect_early_break } */
+
+void h() __attribute__((__noreturn__));
+struct Extremes {
+  int w;
+  int h;
+};
+struct Extremes *array;
+int f(int num, int size1)
+{
+  int sw = 0, sh = 0;
+  for (int i = 0; i < size1; ++i)
+  {
+if (num - i == 0)
+  h();
+sw += array[i].w;
+sh += array[i].h;
+  }
+  return (sw) +  (sh);
+}
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 20ee0aad932..4375ebdcb49 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -10729,17 +10729,18 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
 block, but we have to find an alternate exit first.  */
   if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
{
+ slp_tree phis_node = slp_node ? slp_node_instance->reduc_phis : NULL;
  for (auto exit : get_loop_exit_edges (LOOP_VINFO_LOOP (loop_vinfo)))
if (exit != LOOP_VINFO_IV_EXIT (loop_vinfo))
  {
vect_create_epilog_for_reduction (loop_vinfo, reduc_info,
- slp_node, slp_node_instance,
+ phis_node, slp_node_instance,
  exit);
break;
  }
  if (LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo))
-   vect_create_epilog_for_reduction (loop_vinfo, reduc_info, slp_node,
- slp_node_instance,
+   vect_create_epilog_for_reduction (loop_vinfo, reduc_info,
+ phis_node, slp_node_instance,
  LOOP_VINFO_IV_EXIT (loop_vinfo));
}


[gcc r14-9436] RISC-V: Fix some code style issue(s) in riscv-c.cc [NFC]

2024-03-12 Pan Li via Gcc-cvs
https://gcc.gnu.org/g:cdf0c6604d03afd7f544dd8bd5d43d9ded059ada

commit r14-9436-gcdf0c6604d03afd7f544dd8bd5d43d9ded059ada
Author: Pan Li 
Date:   Tue Mar 12 15:01:57 2024 +0800

RISC-V: Fix some code style issue(s) in riscv-c.cc [NFC]

Noticed some code style issues when adding __riscv_v_fixed_vlen, including:

* Meaningless empty line.
* Line longer than 80 chars.
* Indentation with 3 spaces.
* Misaligned arguments.

gcc/ChangeLog:

* config/riscv/riscv-c.cc (riscv_ext_version_value): Fix
code style: line longer than 80 chars.
(riscv_cpu_cpp_builtins): Remove useless empty line, fix
3-space indentation and misaligned arguments.

Signed-off-by: Pan Li 

Diff:
---
 gcc/config/riscv/riscv-c.cc | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc
index 3755ec0b8ef..7029ba88186 100644
--- a/gcc/config/riscv/riscv-c.cc
+++ b/gcc/config/riscv/riscv-c.cc
@@ -37,7 +37,8 @@ along with GCC; see the file COPYING3.  If not see
 static int
 riscv_ext_version_value (unsigned major, unsigned minor)
 {
-  return (major * RISCV_MAJOR_VERSION_BASE) + (minor * RISCV_MINOR_VERSION_BASE);
+  return (major * RISCV_MAJOR_VERSION_BASE)
++ (minor * RISCV_MINOR_VERSION_BASE);
 }
 
 /* Implement TARGET_CPU_CPP_BUILTINS.  */
@@ -110,7 +111,6 @@ riscv_cpu_cpp_builtins (cpp_reader *pfile)
 case CM_MEDANY:
   builtin_define ("__riscv_cmodel_medany");
   break;
-
 }
 
   if (riscv_user_wants_strict_align)
@@ -142,9 +142,9 @@ riscv_cpu_cpp_builtins (cpp_reader *pfile)
 riscv_ext_version_value (0, 12));
 }
 
-   if (TARGET_XTHEADVECTOR)
- builtin_define_with_int_value ("__riscv_th_v_intrinsic",
-riscv_ext_version_value (0, 11));
+  if (TARGET_XTHEADVECTOR)
+builtin_define_with_int_value ("__riscv_th_v_intrinsic",
+  riscv_ext_version_value (0, 11));
 
   /* Define architecture extension test macros.  */
   builtin_define_with_int_value ("__riscv_arch_test", 1);
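The reformatted helper packs an extension version into a single integer, which is then published through macros such as __riscv_th_v_intrinsic. A minimal sketch, assuming base values of 1000000 and 1000 for RISCV_MAJOR_VERSION_BASE and RISCV_MINOR_VERSION_BASE (our reading of riscv-c.cc; the values are not shown in this diff), so that version 0.11 becomes 11000:

```cpp
// Sketch of the encoding done by riscv_ext_version_value.
// ASSUMPTION: these constants mirror RISCV_MAJOR_VERSION_BASE and
// RISCV_MINOR_VERSION_BASE in riscv-c.cc; check the source before relying on them.
constexpr int kMajorVersionBase = 1000000;
constexpr int kMinorVersionBase = 1000;

constexpr int ext_version_value(unsigned major, unsigned minor) {
  // major.minor -> major * 1000000 + minor * 1000
  return static_cast<int>(major * kMajorVersionBase + minor * kMinorVersionBase);
}
```

A check such as `__riscv_th_v_intrinsic >= 11000` would then test for version 0.11 or later under these assumed bases.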


[gcc r14-9437] strlen: Fix another spot that can create invalid ranges [PR114293]

2024-03-12 Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:39737cdf002637c7a652e9c3e36f369cfce581e5

commit r14-9437-g39737cdf002637c7a652e9c3e36f369cfce581e5
Author: Jakub Jelinek 
Date:   Tue Mar 12 10:23:19 2024 +0100

strlen: Fix another spot that can create invalid ranges [PR114293]

This PR is similar to PR110603, fixed with r14-8487, except in a different
spot.  From a memset of a non-zero value with size -1, we determine a minimum
of (size_t) -1, while the code uses PTRDIFF_MAX - 2 as the maximum, so again
the range is invalid.  (I'm not really sure I understand why it is - 2 and
not - 1; e.g. a heap-allocated array with PTRDIFF_MAX char elements which
contains '\0' in the last element should be fine, no?  One can still
represent arr[PTRDIFF_MAX] - arr[0] and arr[0] - arr[PTRDIFF_MAX] in
ptrdiff_t, and strlen (arr) == PTRDIFF_MAX - 1.)
As in the other case, only UB can lead to this, and we have the
choice to keep only the min and use +inf for max, keep only the max
and use 0 for min, not set the range at all, use [min, min] or
[max, max], etc.  The following patch uses [min, +inf].
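The fix boils down to a clamp on the computed range. A minimal model using plain size_t in place of GCC's wide_int (make_strlen_range and StrlenRange are hypothetical names, not pass internals):

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical model of the range repair in handle_builtin_strlen.
struct StrlenRange {
  std::size_t min;
  std::size_t max;
};

StrlenRange make_strlen_range(std::size_t min) {
  // Upper bound the pass uses for a string length: PTRDIFF_MAX - 2.
  std::size_t max = static_cast<std::size_t>(PTRDIFF_MAX) - 2;
  // Only UB (e.g. a memset with size -1) can push min above that bound.
  // Instead of emitting an invalid [min, max] with min > max, widen max to
  // the type's maximum, giving [min, +inf].
  if (min > max)
    max = SIZE_MAX;
  return {min, max};
}
```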

2024-03-12  Jakub Jelinek  

PR tree-optimization/114293
* tree-ssa-strlen.cc (strlen_pass::handle_builtin_strlen): If
max is smaller than min, set max to ~(size_t)0.

* gcc.dg/pr114293.c: New test.

Diff:
---
 gcc/testsuite/gcc.dg/pr114293.c | 10 ++
 gcc/tree-ssa-strlen.cc  |  2 ++
 2 files changed, 12 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/pr114293.c b/gcc/testsuite/gcc.dg/pr114293.c
new file mode 100644
index 000..eb49ede0657
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr114293.c
@@ -0,0 +1,10 @@
+/* PR tree-optimization/114293 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -w" } */
+
+int
+foo (int x)
+{
+  __builtin_memset (&x, 5, -1);
+  return __builtin_strlen ((char *) &x);
+}
diff --git a/gcc/tree-ssa-strlen.cc b/gcc/tree-ssa-strlen.cc
index 20540c52948..e09c9cc081f 100644
--- a/gcc/tree-ssa-strlen.cc
+++ b/gcc/tree-ssa-strlen.cc
@@ -2341,6 +2341,8 @@ strlen_pass::handle_builtin_strlen ()
  wide_int min = wi::to_wide (old);
  wide_int max
= wi::to_wide (TYPE_MAX_VALUE (ptrdiff_type_node)) - 2;
+ if (wi::gtu_p (min, max))
+   max = wi::to_wide (TYPE_MAX_VALUE (TREE_TYPE (lhs)));
  set_strlen_range (lhs, min, max);
}
  else


[gcc r14-9438] asan: Instrument stores in callees rather than callers [PR112709]

2024-03-12 Jakub Jelinek via Gcc-cvs
https://gcc.gnu.org/g:ad860cc27b3312f9119c7fecb8638a7c1f6d77c9

commit r14-9438-gad860cc27b3312f9119c7fecb8638a7c1f6d77c9
Author: Jakub Jelinek 
Date:   Tue Mar 12 11:34:50 2024 +0100

asan: Instrument  stores in callees rather than callers [PR112709]

Since the PR69276 fix (r6-6758), asan instruments calls which store
the return value into memory on the caller side: before the call, it
verifies that the memory is writable.
Now PR112709, where we ICE when trying to instrument such calls, made me
think about whether that is what we want to do.

There are 3 different cases.

One is when a function returns an aggregate which is passed e.g. in
registers, such as struct S { int a[4]; }; returned on x86_64.
That would ideally be instrumented between the actual call and the
store of the aggregate into memory, but asan currently mostly
works as a GIMPLE pass and arranging for the instrumentation to happen
at that spot would be really hard.  We could diagnose after the call,
but generally asan attempts to diagnose things before something is
overwritten rather than after.  We could keep the current behavior (that
is what this patch does; it has the disadvantage that it can complain
about UB even for functions which never return and so never actually store,
and it doesn't check whether the memory was e.g. poisoned during the call).
Or we could instrument both before and after the call (that would have
the same disadvantage as the current state, but would at least check the
store again post factum).

Another case is when a function returns an aggregate through a hidden
reference, struct T { int a[128]; }; on x86_64 or even the above struct S
on ia32 as example.  In the actual program such stores happen when storing
something to  or its parts in the callee, because  there
expands to *hidden_retval.  So, IMHO we should instrument those in the
callee rather than caller, that is where the writes are and we can do that
easily.  This is what the patch below does.

And the last case is for builtins/internal functions.  Usually those don't
return aggregates, but in case they did and could be expanded inline, it is
better to instrument them in the caller (as before) than not to
instrument the return stores at all.
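The three cases reduce to one decision, mirroring the condition the patch adds to has_stmt_been_instrumented_p and maybe_instrument_call. A sketch with invented names (CallKind, CheckSite, return_store_check_site), where a boolean stands in for aggregate_value_p on the call's return type:

```cpp
// Hypothetical summary of where the return-value store gets checked.
enum class CallKind { Builtin, Internal, Ordinary };
enum class CheckSite { CallerBeforeCall, CalleeResultDeclStores };

// returns_by_hidden_ref stands in for aggregate_value_p (...) being true.
CheckSite return_store_check_site(CallKind kind, bool returns_by_hidden_ref) {
  // Builtins, internal functions and register-returned aggregates keep the
  // old caller-side check made before the call.
  if (kind != CallKind::Ordinary || !returns_by_hidden_ref)
    return CheckSite::CallerBeforeCall;
  // Hidden-reference returns: the stores happen in the callee, through its
  // RESULT_DECL, so that is where they are instrumented.
  return CheckSite::CalleeResultDeclStores;
}
```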

I had to tweak the expected output of the PR69276 testcase, because
with the patch the previous behavior is kept on x86_64 (structure returned
in registers, stored in the caller, so reported as UB in A::A()),
while on i686 the behavior changed: the UB is reported in the
vnull::operator vec which stores the structure, and A::A() is then a frame
above it in the backtrace.

2024-03-12  Jakub Jelinek  

PR sanitizer/112709
* asan.cc (has_stmt_been_instrumented_p): Don't instrument call
stores on the caller side unless it is a call to a builtin or
internal function or function doesn't return by hidden reference.
(maybe_instrument_call): Likewise.
(instrument_derefs): Instrument stores to RESULT_DECL if
returning by hidden reference.

* gcc.dg/asan/pr112709-1.c: New test.
* g++.dg/asan/pr69276.C: Adjust expected output for some targets.

Diff:
---
 gcc/asan.cc| 17 +--
 gcc/testsuite/g++.dg/asan/pr69276.C|  3 +-
 gcc/testsuite/gcc.dg/asan/pr112709-1.c | 52 ++
 3 files changed, 68 insertions(+), 4 deletions(-)

diff --git a/gcc/asan.cc b/gcc/asan.cc
index d621ec9c323..c533b09b1a1 100644
--- a/gcc/asan.cc
+++ b/gcc/asan.cc
@@ -1372,7 +1372,12 @@ has_stmt_been_instrumented_p (gimple *stmt)
  return true;
}
 }
-  else if (is_gimple_call (stmt) && gimple_store_p (stmt))
+  else if (is_gimple_call (stmt)
+  && gimple_store_p (stmt)
+  && (gimple_call_builtin_p (stmt)
+  || gimple_call_internal_p (stmt)
+  || !aggregate_value_p (TREE_TYPE (gimple_call_lhs (stmt)),
+ gimple_call_fntype (stmt
 {
   asan_mem_ref r;
   asan_mem_ref_init (&r, NULL, 1);
@@ -2751,7 +2756,9 @@ instrument_derefs (gimple_stmt_iterator *iter, tree t,
 return;
 
   poly_int64 decl_size;
-  if ((VAR_P (inner) || TREE_CODE (inner) == RESULT_DECL)
+  if ((VAR_P (inner)
+   || (TREE_CODE (inner) == RESULT_DECL
+  && !aggregate_value_p (inner, current_function_decl)))
   && offset == NULL_TREE
   && DECL_SIZE (inner)
   && poly_int_tree_p (DECL_SIZE (inner), &decl_size)
@@ -3023,7 +3030,11 @@ maybe_instrument_call (gimple_stmt_iterator *iter)
 }
 
   bool instrumented = false;
-  if (gimple_store_p (stmt))
+  if (gimple_store_p (stmt)
+  && (gimple_call_builtin_p (stmt)
+ || gimple_call_internal_p (stmt)
+ || !aggregate_value_p (TREE_TY

[gcc r14-9439] c++: Support target-specific nodes when streaming modules [PR111224]

2024-03-12 Nathaniel Shead via Gcc-cvs
https://gcc.gnu.org/g:4aa87b856067d4911de8fb66b3a27659dc75ca6d

commit r14-9439-g4aa87b856067d4911de8fb66b3a27659dc75ca6d
Author: Nathaniel Shead 
Date:   Sun Mar 10 22:06:18 2024 +1100

c++: Support target-specific nodes when streaming modules [PR111224]

Some targets make use of POLY_INT_CSTs and other custom builtin types,
which currently violate some assumptions made when streaming. This patch
adds support for them, including types such as AArch64 __fp16, PowerPC
__ibm128, and vector types thereof.

This patch doesn't provide "full" support for AArch64 SVE, however, since
for that we would need to support 'target' nodes (tracked in PR108080).

Adding the new builtin types means that on AArch64 we now have 217
global trees created on initialisation (up from 191), so this patch also
slightly bumps the initial size of the fixed_trees allocation to 250.
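The key change for vector types is that TYPE_VECTOR_SUBPARTS is a poly_uint64, whose value may depend on a runtime vector length (for SVE, a count such as 4 + 4x), so every coefficient has to be streamed rather than only coeff[0]. A hypothetical round-trip sketch, assuming two coefficients as on AArch64, with a std::vector standing in for the module byte stream (PolyU64, write_poly and read_poly are invented names):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// ASSUMPTION: two coefficients, as on AArch64 (value = c0 + c1 * x, where x
// is a runtime vector-length factor).  Other targets may use only one.
constexpr unsigned kNumPolyCoeffs = 2;

struct PolyU64 {
  std::uint64_t coeffs[kNumPolyCoeffs];
};

// Writer: emit every coefficient instead of asserting the value is constant
// and emitting only coeffs[0].
void write_poly(std::vector<std::uint64_t>& out, const PolyU64& p) {
  for (unsigned ix = 0; ix != kNumPolyCoeffs; ix++)
    out.push_back(p.coeffs[ix]);
}

// Reader: consume the coefficients in the same order they were written.
PolyU64 read_poly(const std::vector<std::uint64_t>& in, std::size_t& pos) {
  PolyU64 p{};
  for (unsigned ix = 0; ix != kNumPolyCoeffs; ix++)
    p.coeffs[ix] = in.at(pos++);
  return p;
}
```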

PR c++/98645
PR c++/98688
PR c++/111224

gcc/cp/ChangeLog:

* module.cc (enum tree_tag): Add new tag for builtin types.
(trees_out::start): POLY_INT_CSTs can be emitted.
(trees_in::start): Likewise.
(trees_out::core_vals): Stream POLY_INT_CSTs.
(trees_in::core_vals): Likewise.
(trees_out::type_node): Handle vectors with multiple coeffs.
(trees_in::tree_node): Likewise.
(init_modules): Register target-specific builtin types. Bump
initial capacity slightly.

gcc/testsuite/ChangeLog:

* g++.dg/modules/target-aarch64-1_a.C: New test.
* g++.dg/modules/target-aarch64-1_b.C: New test.
* g++.dg/modules/target-powerpc-1_a.C: New test.
* g++.dg/modules/target-powerpc-1_b.C: New test.
* g++.dg/modules/target-powerpc-2_a.C: New test.
* g++.dg/modules/target-powerpc-2_b.C: New test.

Signed-off-by: Nathaniel Shead 
Reviewed-by: Patrick Palka 

Diff:
---
 gcc/cp/module.cc  | 32 ---
 gcc/testsuite/g++.dg/modules/target-aarch64-1_a.C | 17 
 gcc/testsuite/g++.dg/modules/target-aarch64-1_b.C | 13 +
 gcc/testsuite/g++.dg/modules/target-powerpc-1_a.C |  7 +
 gcc/testsuite/g++.dg/modules/target-powerpc-1_b.C | 10 +++
 gcc/testsuite/g++.dg/modules/target-powerpc-2_a.C | 20 ++
 gcc/testsuite/g++.dg/modules/target-powerpc-2_b.C | 12 +
 7 files changed, 101 insertions(+), 10 deletions(-)

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 99055523d91..8aab9ea0bae 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -5173,7 +5173,6 @@ trees_out::start (tree t, bool code_streamed)
   break;
 
 case FIXED_CST:
-case POLY_INT_CST:
   gcc_unreachable (); /* Not supported in C++.  */
   break;
 
@@ -5259,7 +5258,6 @@ trees_in::start (unsigned code)
 
 case FIXED_CST:
 case IDENTIFIER_NODE:
-case POLY_INT_CST:
 case SSA_NAME:
 case TARGET_MEM_REF:
 case TRANSLATION_UNIT_DECL:
@@ -6106,7 +6104,10 @@ trees_out::core_vals (tree t)
   break;
 
 case POLY_INT_CST:
-  gcc_unreachable (); /* Not supported in C++.  */
+  if (streaming_p ())
+   for (unsigned ix = 0; ix != NUM_POLY_INT_COEFFS; ix++)
+ WT (POLY_INT_CST_COEFF (t, ix));
+  break;
 
 case REAL_CST:
   if (streaming_p ())
@@ -6615,8 +6616,9 @@ trees_in::core_vals (tree t)
   break;
 
 case POLY_INT_CST:
-  /* Not suported in C++.  */
-  return false;
+  for (unsigned ix = 0; ix != NUM_POLY_INT_COEFFS; ix++)
+   RT (POLY_INT_CST_COEFF (t, ix));
+  break;
 
 case REAL_CST:
   if (const void *bytes = buf (sizeof (real_value)))
@@ -9068,8 +9070,8 @@ trees_out::type_node (tree type)
   if (streaming_p ())
{
  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (type);
- /* to_constant asserts that only coeff[0] is of interest.  */
- wu (static_cast (nunits.to_constant ()));
+ for (unsigned ix = 0; ix != NUM_POLY_INT_COEFFS; ix++)
+   wu (nunits.coeffs[ix]);
}
   break;
 }
@@ -9630,9 +9632,11 @@ trees_in::tree_node (bool is_use)
 
  case VECTOR_TYPE:
{
- unsigned HOST_WIDE_INT nunits = wu ();
+ poly_uint64 nunits;
+ for (unsigned ix = 0; ix != NUM_POLY_INT_COEFFS; ix++)
+   nunits.coeffs[ix] = wu ();
  if (!get_overrun ())
-   res = build_vector_type (res, static_cast (nunits));
+   res = build_vector_type (res, nunits);
}
break;
  }
@@ -20151,7 +20155,7 @@ init_modules (cpp_reader *reader)
  some global trees are lazily created and we don't want that to
  mess with our syndrome of fixed trees.  */
   unsigned crc = 0;
-  vec_alloc (fixed_trees, 200);
+  vec_alloc (fixed_trees, 250);
 
   dump () && dump ("+Creating globals");
 

[gcc r14-9440] tree-optimization/114121 - chrec_fold_{plus, multiply} and recursion

2024-03-12 Richard Biener via Gcc-cvs
https://gcc.gnu.org/g:73dac51b32575f980289c073969c6d825963d076

commit r14-9440-g73dac51b32575f980289c073969c6d825963d076
Author: Richard Biener 
Date:   Tue Mar 12 14:00:05 2024 +0100

tree-optimization/114121 - chrec_fold_{plus,multiply} and recursion

The following addresses endless recursion in the
chrec_fold_{plus,multiply} functions when handling sign-conversions.
We only need to apply tricks when we'd otherwise fail (there's a chrec
in the converted operand), and we need to make sure not to turn the
other operand into something worse (for the chrec-vs-chrec case).
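The "trick" being guarded is the one described in the patched code comment: strip a conversion to a signed type by performing the operation in the corresponding unsigned type. The identity it relies on, shown on plain integers rather than chrecs (add_via_unsigned is a hypothetical name):

```cpp
#include <cstdint>

// Do the addition in the unsigned type of the same width, where wraparound
// is well defined, then convert the result back to the signed type.
std::int32_t add_via_unsigned(std::int32_t a, std::int32_t b) {
  std::uint32_t sum =
      static_cast<std::uint32_t>(a) + static_cast<std::uint32_t>(b);
  // Modular conversion: guaranteed since C++20; GCC documents the same
  // behavior for earlier language modes.
  return static_cast<std::int32_t>(sum);
}
```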

PR tree-optimization/114121
* tree-chrec.cc (chrec_fold_plus_1): Guard recursion with
converted operand properly.
(chrec_fold_multiply): Likewise.  Handle missed recursion.

* gcc.dg/torture/pr114312.c: New testcase.

Diff:
---
 gcc/testsuite/gcc.dg/torture/pr114312.c |  15 +++
 gcc/tree-chrec.cc   | 176 +---
 2 files changed, 107 insertions(+), 84 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/torture/pr114312.c b/gcc/testsuite/gcc.dg/torture/pr114312.c
new file mode 100644
index 000..c508c64ed19
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr114312.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target bitint } */
+
+#if __BITINT_MAXWIDTH__ >= 129
+typedef _BitInt(129) B;
+B b;
+
+B
+foo(void)
+{
+  _BitInt(64) a = 1;
+  a &= b * b;
+  return b << a;
+}
+#endif
diff --git a/gcc/tree-chrec.cc b/gcc/tree-chrec.cc
index 7cd0ebc1010..1b2ed753551 100644
--- a/gcc/tree-chrec.cc
+++ b/gcc/tree-chrec.cc
@@ -251,23 +251,27 @@ chrec_fold_plus_1 (enum tree_code code, tree type,
  return chrec_fold_plus_poly_poly (code, type, op0, op1);
 
CASE_CONVERT:
- {
-   /* We can strip sign-conversions to signed by performing the
-  operation in unsigned.  */
-   tree optype = TREE_TYPE (TREE_OPERAND (op1, 0));
-   if (INTEGRAL_TYPE_P (type)
-   && INTEGRAL_TYPE_P (optype)
-   && tree_nop_conversion_p (type, optype)
-   && TYPE_UNSIGNED (optype))
- return chrec_convert (type,
-   chrec_fold_plus_1 (code, optype,
-  chrec_convert (optype,
- op0, NULL),
-  TREE_OPERAND (op1, 0)),
-   NULL);
-   if (tree_contains_chrecs (op1, NULL))
+ if (tree_contains_chrecs (op1, NULL))
+   {
+ /* We can strip sign-conversions to signed by performing the
+operation in unsigned.  */
+ tree optype = TREE_TYPE (TREE_OPERAND (op1, 0));
+ if (INTEGRAL_TYPE_P (type)
+ && INTEGRAL_TYPE_P (optype)
+ && tree_nop_conversion_p (type, optype)
+ && TYPE_UNSIGNED (optype))
+   {
+ tree tem = chrec_convert (optype, op0, NULL);
+ if (TREE_CODE (tem) == POLYNOMIAL_CHREC)
+   return chrec_convert (type,
+ chrec_fold_plus_1 (code, optype,
+tem,
+TREE_OPERAND
+  (op1, 0)),
+ NULL);
+   }
  return chrec_dont_know;
- }
+   }
  /* FALLTHRU */
 
default:
@@ -284,26 +288,27 @@ chrec_fold_plus_1 (enum tree_code code, tree type,
}
 
 CASE_CONVERT:
-  {
-   /* We can strip sign-conversions to signed by performing the
-  operation in unsigned.  */
-   tree optype = TREE_TYPE (TREE_OPERAND (op0, 0));
-   if (INTEGRAL_TYPE_P (type)
-   && INTEGRAL_TYPE_P (optype)
-   && tree_nop_conversion_p (type, optype)
-   && TYPE_UNSIGNED (optype))
- return chrec_convert (type,
-   chrec_fold_plus_1 (code, optype,
-  TREE_OPERAND (op0, 0),
-  chrec_convert (optype,
- op1, NULL)),
-   NULL);
-   if (tree_contains_chrecs (op0, NULL))
+  if (tree_contains_chrecs (op0, NULL))
+   {
+ /* We can strip sign-conversions to signed by performing the
+operation in unsigned.  */
+ tree optype = TREE_TYPE (TREE_OPERAND (op0, 0));
+ if (INTEGRAL_TYPE_P (type)
+ && INTEGRAL_TYPE_P (optype)
+ && tree_nop_conversion_p (type, optype)
+ && TYPE_UNSIGNED (optype))
+   retur

[gcc r13-8421] libstdc++: Optimize std::to_array for trivial types [PR110167]

2024-03-12 Jonathan Wakely via Libstdc++-cvs
https://gcc.gnu.org/g:4c6bb36e88d5c8e510b10d12c01e3461c2aa4259

commit r13-8421-g4c6bb36e88d5c8e510b10d12c01e3461c2aa4259
Author: Jonathan Wakely 
Date:   Thu Jun 8 12:24:43 2023 +0100

libstdc++: Optimize std::to_array for trivial types [PR110167]

As reported in PR libstdc++/110167, std::to_array compiles extremely
slowly for very large arrays. It needs to instantiate a very large
specialization of std::index_sequence and then create a very large
aggregate initializer from the pack expansion. For trivial types we can
simply default-initialize the std::array and then use memcpy to copy the
values. For non-trivial types we need to use the existing
implementation, despite the compilation cost.

As also noted in the PR, using a generic lambda instead of the
__to_array helper compiles faster as of GCC 13. It also produces
slightly smaller code at -O1, due to additional inlining. The code at
-Os, -O2 and -O3 seems to be the same. This new implementation requires
__cpp_generic_lambdas >= 201707L (i.e. P0428R2), which is supported
by Clang 10, Intel icc 2021.5.0 and GCC 10.1 onwards.
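The two strategies can be sketched outside libstdc++ as follows. This is not the library code: to_array_sketch and to_array_sketch_demo are hypothetical names, there is no constant-evaluation path, and the lambda with an explicit template parameter list needs C++20.

```cpp
#include <array>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>
#include <utility>

template <typename T, std::size_t N>
std::array<std::remove_cv_t<T>, N> to_array_sketch(T (&a)[N]) {
  using U = std::remove_cv_t<T>;
  if constexpr (std::is_trivial_v<U>) {
    // Trivial types: default-initialize, then copy all bytes in one go,
    // avoiding a huge index_sequence and aggregate initializer.
    std::array<U, N> arr;
    if constexpr (N != 0)
      std::memcpy(arr.data(), a, sizeof(a));
    return arr;
  } else {
    // Non-trivial types: copy-construct each element via a pack expansion
    // inside a lambda (P0428R2 template parameter list).
    return [&a]<std::size_t... I>(std::index_sequence<I...>) {
      return std::array<U, N>{{a[I]...}};
    }(std::make_index_sequence<N>{});
  }
}

// Usage: exercises both branches.
inline bool to_array_sketch_demo() {
  const int ints[3] = {1, 2, 3};
  std::string strs[2] = {"a", "bc"};
  auto ai = to_array_sketch(ints);  // std::array<int, 3>, memcpy path
  auto as = to_array_sketch(strs);  // std::array<std::string, 2>, lambda path
  return ai[2] == 3 && as[1] == "bc";
}
```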

libstdc++-v3/ChangeLog:

PR libstdc++/110167
* include/std/array (to_array): Initialize arrays of trivial
types using memcpy. For non-trivial types, use lambda
expressions instead of a separate helper function.
(__to_array): Remove.
* testsuite/23_containers/array/creation/110167.cc: New test.

(cherry picked from commit 960de5dd886572711ef86fa1e15e30d3810eccb9)

Diff:
---
 libstdc++-v3/include/std/array | 53 +++---
 .../23_containers/array/creation/110167.cc | 14 ++
 2 files changed, 51 insertions(+), 16 deletions(-)

diff --git a/libstdc++-v3/include/std/array b/libstdc++-v3/include/std/array
index 97cca454ef9..edcac892b52 100644
--- a/libstdc++-v3/include/std/array
+++ b/libstdc++-v3/include/std/array
@@ -414,19 +414,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   return std::move(std::get<_Int>(__arr));
 }
 
-#if __cplusplus > 201703L
+#if __cplusplus >= 202002L && __cpp_generic_lambdas >= 201707L
 #define __cpp_lib_to_array 201907L
-
-  template
-constexpr array, sizeof...(_Idx)>
-__to_array(_Tp (&__a)[sizeof...(_Idx)], index_sequence<_Idx...>)
-{
-  if constexpr (_Move)
-   return {{std::move(__a[_Idx])...}};
-  else
-   return {{__a[_Idx]...}};
-}
-
   template
 [[nodiscard]]
 constexpr array, _Nm>
@@ -436,8 +425,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   static_assert(!is_array_v<_Tp>);
   static_assert(is_constructible_v<_Tp, _Tp&>);
   if constexpr (is_constructible_v<_Tp, _Tp&>)
-   return std::__to_array(__a, make_index_sequence<_Nm>{});
-  __builtin_unreachable(); // FIXME: see PR c++/91388
+   {
+ if constexpr (is_trivial_v<_Tp>)
+   {
+ array, _Nm> __arr;
+ if (!__is_constant_evaluated() && _Nm != 0)
+   __builtin_memcpy((void*)__arr.data(), (void*)__a, sizeof(__a));
+ else
+   for (size_t __i = 0; __i < _Nm; ++__i)
+ __arr._M_elems[__i] = __a[__i];
+ return __arr;
+   }
+ else
+   return [&__a](index_sequence<_Idx...>) {
+ return array, _Nm>{{ __a[_Idx]... }};
+   }(make_index_sequence<_Nm>{});
+   }
+  else
+   __builtin_unreachable(); // FIXME: see PR c++/91388
 }
 
   template
@@ -449,8 +454,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   static_assert(!is_array_v<_Tp>);
   static_assert(is_move_constructible_v<_Tp>);
   if constexpr (is_move_constructible_v<_Tp>)
-   return std::__to_array<1>(__a, make_index_sequence<_Nm>{});
-  __builtin_unreachable(); // FIXME: see PR c++/91388
+   {
+ if constexpr (is_trivial_v<_Tp>)
+   {
+ array, _Nm> __arr;
+ if (!__is_constant_evaluated() && _Nm != 0)
+   __builtin_memcpy((void*)__arr.data(), (void*)__a, sizeof(__a));
+ else
+   for (size_t __i = 0; __i < _Nm; ++__i)
+ __arr._M_elems[__i] = __a[__i];
+ return __arr;
+   }
+ else
+   return [&__a](index_sequence<_Idx...>) {
+ return array, _Nm>{{ std::move(__a[_Idx])... }};
+   }(make_index_sequence<_Nm>{});
+   }
+  else
+   __builtin_unreachable(); // FIXME: see PR c++/91388
 }
 #endif // C++20
 
diff --git a/libstdc++-v3/testsuite/23_containers/array/creation/110167.cc b/libstdc++-v3/testsuite/23_containers/array/creation/110167.cc
new file mode 100644
index 000..c2aecc911bd
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/array/creation/110167.cc
@@ -0,0 +1,14 @@
+// { dg-options "-std=gnu++20" }
+// { dg-do compile { target c++20 } }
+
+// PR libstdc++/110167 - excessive compile time w

[gcc r13-8422] libstdc++: Fix a -Wsign-compare warning in std::list

2024-03-12 Jonathan Wakely via Gcc-cvs
https://gcc.gnu.org/g:66c55e4f57135f2df09daeea94e0900862c54799

commit r13-8422-g66c55e4f57135f2df09daeea94e0900862c54799
Author: Jonathan Wakely 
Date:   Wed Aug 9 11:28:56 2023 +0100

libstdc++: Fix a -Wsign-compare warning in std::list

libstdc++-v3/ChangeLog:

* include/bits/list.tcc (list::sort(Cmp)): Fix -Wsign-compare
warning for loop condition.

(cherry picked from commit 9bd194434acb47fac80aad45ed04039e0535d1fe)

Diff:
---
 libstdc++-v3/include/bits/list.tcc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/bits/list.tcc b/libstdc++-v3/include/bits/list.tcc
index 3e5b1f7b972..344386aa4d0 100644
--- a/libstdc++-v3/include/bits/list.tcc
+++ b/libstdc++-v3/include/bits/list.tcc
@@ -654,7 +654,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
{
  // Move all nodes back into *this.
  __carry._M_put_all(end()._M_node);
- for (int __i = 0; __i < sizeof(__tmp)/sizeof(__tmp[0]); ++__i)
+ for (size_t __i = 0; __i < sizeof(__tmp)/sizeof(__tmp[0]); ++__i)
__tmp[__i]._M_put_all(end()._M_node);
  __throw_exception_again;
}
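The warning arises because sizeof(__tmp)/sizeof(__tmp[0]) has type size_t, so an int counter compares a signed value against an unsigned one. The same pattern in isolation, with the counter typed to match (sum_elements is a hypothetical helper, not part of std::list):

```cpp
#include <cstddef>

template <typename T, std::size_t N>
T sum_elements(const T (&arr)[N]) {
  T total{};
  // sizeof(arr) / sizeof(arr[0]) is a std::size_t; an int counter here would
  // compare signed with unsigned and trigger -Wsign-compare.
  for (std::size_t i = 0; i < sizeof(arr) / sizeof(arr[0]); ++i)
    total += arr[i];
  return total;
}
```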


[gcc r13-8423] libstdc++: Add [[nodiscard]] to std::span members

2024-03-12 Jonathan Wakely via Libstdc++-cvs
https://gcc.gnu.org/g:779563cff2e18e7891abf57aeee90e8db5035eb5

commit r13-8423-g779563cff2e18e7891abf57aeee90e8db5035eb5
Author: Jonathan Wakely 
Date:   Sat Nov 4 08:30:54 2023 +

libstdc++: Add [[nodiscard]] to std::span members

All std::span member functions are pure functions that have no side
effects. They are only useful for their return value, so they should all
warn if that value is not used.
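The same idea on a hypothetical read-only view type in the style of std::span (int_view is an invented name, not libstdc++ code): every observer is pure, so each is marked [[nodiscard]].

```cpp
#include <cstddef>

class int_view {
public:
  constexpr int_view(const int* p, std::size_t n) noexcept : ptr_(p), len_(n) {}

  // Pure observers: calling them without using the result is a likely bug.
  [[nodiscard]] constexpr std::size_t size() const noexcept { return len_; }
  [[nodiscard]] constexpr bool empty() const noexcept { return len_ == 0; }
  [[nodiscard]] constexpr const int& front() const noexcept { return ptr_[0]; }
  [[nodiscard]] constexpr const int& back() const noexcept { return ptr_[len_ - 1]; }

private:
  const int* ptr_;
  std::size_t len_;
};
```

A statement such as `v.size();` now draws a warning because its result is ignored; code that deliberately calls an observer only to trigger a precondition check can cast the call to void.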

libstdc++-v3/ChangeLog:

* include/std/span (span, as_bytes, as_writable_bytes): Add
[[nodiscard]] attribute on all non-void functions.
* testsuite/23_containers/span/back_assert_neg.cc: Suppress
nodiscard warning.
* testsuite/23_containers/span/back_neg.cc: Likewise.
* testsuite/23_containers/span/first_2_assert_neg.cc: Likewise.
* testsuite/23_containers/span/first_assert_neg.cc: Likewise.
* testsuite/23_containers/span/first_neg.cc: Likewise.
* testsuite/23_containers/span/front_assert_neg.cc: Likewise.
* testsuite/23_containers/span/front_neg.cc: Likewise.
* testsuite/23_containers/span/index_op_assert_neg.cc: Likewise.
* testsuite/23_containers/span/index_op_neg.cc: Likewise.
* testsuite/23_containers/span/last_2_assert_neg.cc: Likewise.
* testsuite/23_containers/span/last_assert_neg.cc: Likewise.
* testsuite/23_containers/span/last_neg.cc: Likewise.
* testsuite/23_containers/span/subspan_2_assert_neg.cc:
Likewise.
* testsuite/23_containers/span/subspan_3_assert_neg.cc:
Likewise.
* testsuite/23_containers/span/subspan_4_assert_neg.cc:
Likewise.
* testsuite/23_containers/span/subspan_5_assert_neg.cc:
Likewise.
* testsuite/23_containers/span/subspan_6_assert_neg.cc:
Likewise.
* testsuite/23_containers/span/subspan_assert_neg.cc: Likewise.
* testsuite/23_containers/span/subspan_neg.cc: Likewise.
* testsuite/23_containers/span/nodiscard.cc: New test.

(cherry picked from commit a92a434024c59f57dc24328d946f97a5e71cee94)

Diff:
---
 libstdc++-v3/include/std/span  | 26 +-
 .../23_containers/span/back_assert_neg.cc  |  2 +-
 .../testsuite/23_containers/span/back_neg.cc   |  2 +-
 .../23_containers/span/first_2_assert_neg.cc   |  2 +-
 .../23_containers/span/first_assert_neg.cc |  2 +-
 .../testsuite/23_containers/span/first_neg.cc  |  2 +-
 .../23_containers/span/front_assert_neg.cc |  2 +-
 .../testsuite/23_containers/span/front_neg.cc  |  2 +-
 .../23_containers/span/index_op_assert_neg.cc  |  2 +-
 .../testsuite/23_containers/span/index_op_neg.cc   |  2 +-
 .../23_containers/span/last_2_assert_neg.cc|  2 +-
 .../23_containers/span/last_assert_neg.cc  |  2 +-
 .../testsuite/23_containers/span/last_neg.cc   |  2 +-
 .../testsuite/23_containers/span/nodiscard.cc  | 58 ++
 .../23_containers/span/subspan_2_assert_neg.cc |  2 +-
 .../23_containers/span/subspan_3_assert_neg.cc |  2 +-
 .../23_containers/span/subspan_4_assert_neg.cc |  2 +-
 .../23_containers/span/subspan_5_assert_neg.cc |  2 +-
 .../23_containers/span/subspan_6_assert_neg.cc |  2 +-
 .../23_containers/span/subspan_assert_neg.cc   |  2 +-
 .../testsuite/23_containers/span/subspan_neg.cc|  6 +--
 21 files changed, 103 insertions(+), 23 deletions(-)

diff --git a/libstdc++-v3/include/std/span b/libstdc++-v3/include/std/span
index 67633899665..b70893779d8 100644
--- a/libstdc++-v3/include/std/span
+++ b/libstdc++-v3/include/std/span
@@ -248,20 +248,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   // observers
 
+  [[nodiscard]]
   constexpr size_type
   size() const noexcept
   { return this->_M_extent._M_extent(); }
 
+  [[nodiscard]]
   constexpr size_type
   size_bytes() const noexcept
   { return this->_M_extent._M_extent() * sizeof(element_type); }
 
-  [[nodiscard]] constexpr bool
+  [[nodiscard]]
+  constexpr bool
   empty() const noexcept
   { return size() == 0; }
 
   // element access
 
+  [[nodiscard]]
   constexpr reference
   front() const noexcept
   {
@@ -269,6 +273,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
return *this->_M_ptr;
   }
 
+  [[nodiscard]]
   constexpr reference
   back() const noexcept
   {
@@ -276,6 +281,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
return *(this->_M_ptr + (size() - 1));
   }
 
+  [[nodiscard]]
   constexpr reference
   operator[](size_type __idx) const noexcept
   {
@@ -283,41 +289,50 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
return *(this->_M_ptr + __idx);
   }
 
+  [[nodiscard]]
   constexpr pointer
   data() const noexcept
   { return this->_M_ptr; }
 

[gcc r13-8424] libstdc++: Fix UB in weekday::weekday(sys_days) and add test

2024-03-12 Jonathan Wakely via Libstdc++-cvs
https://gcc.gnu.org/g:d1472711efc77d5ddc2fa6d5eff57baca584c8ef

commit r13-8424-gd1472711efc77d5ddc2fa6d5eff57baca584c8ef
Author: Cassio Neri 
Date:   Sun Nov 12 01:33:52 2023 +

libstdc++: Fix UB in weekday::weekday(sys_days) and add test

The following has undefined behaviour (signed overflow) [1]:
weekday max{sys_days{days{numeric_limits::max()}}};

The issue is in this line when __n is very large and __n + 4 overflows:
return weekday(__n >= -4 ? (__n + 4) % 7 : (__n + 5) % 7 + 6);

In addition to fixing this bug, the new implementation makes the compiler
emit shorter and branchless code for x86-64 and ARM [2].

[1] https://godbolt.org/z/1s5bv7KfT
[2] https://godbolt.org/z/zKsabzrhs
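For comparison, here is another way to avoid the overflow; it is not the approach the patch takes (the patch works in the unsigned type instead). It reduces modulo 7 before adding the epoch offset, so n + 4 is never formed. A sketch with a hypothetical weekday_from_days returning the encoding of weekday::c_encoding() (0 = Sunday, 4 = Thursday):

```cpp
#include <cstdint>

// Weekday of the day n days after 1970-01-01, which was a Thursday (4).
constexpr unsigned weekday_from_days(std::int64_t n) {
  // n % 7 is in [-6, 6] (C++ truncates toward zero), so adding 7 + 4 keeps
  // the sum in [5, 17]: no overflow, and the final % 7 gives 0..6.
  return static_cast<unsigned>((n % 7 + 7 + 4) % 7);
}
```

The extreme inputs from the commit message (the largest and smallest 64-bit day counts) are handled without overflow.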

libstdc++-v3/ChangeLog:

* include/std/chrono (weekday::_S_from_days): Fix UB.
* testsuite/std/time/weekday/1.cc: Add test for overflow.

(cherry picked from commit f6ce081d0ffb5f25d71eb2f30fcfdff7f20dba22)

Diff:
---
 libstdc++-v3/include/std/chrono  | 11 +--
 libstdc++-v3/testsuite/std/time/weekday/1.cc |  9 +
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index ac7febbaa2c..fb8d6c82e8a 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -936,8 +936,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   static constexpr weekday
   _S_from_days(const days& __d)
   {
-   auto __n = __d.count();
-   return weekday(__n >= -4 ? (__n + 4) % 7 : (__n + 5) % 7 + 6);
+   using _Rep = days::rep;
+   using _URep = make_unsigned_t<_Rep>;
+   const auto __n = __d.count();
+   const auto __m = static_cast<_URep>(__n);
+
+   // 1970-01-01 (__n =  0, __m = 0) -> Thursday (4)
+   // 1969-31-12 (__n = -1, __m = _URep(-1)) -> Wednesday (3)
+   const auto __offset = __n >= 0 ? _URep(4) : 3 - _URep(-1) % 7 - 7;
+   return weekday((__m + __offset) % 7);
   }
 
 public:
diff --git a/libstdc++-v3/testsuite/std/time/weekday/1.cc b/libstdc++-v3/testsuite/std/time/weekday/1.cc
index 1e018eaa3e0..bfc617c4cc8 100644
--- a/libstdc++-v3/testsuite/std/time/weekday/1.cc
+++ b/libstdc++-v3/testsuite/std/time/weekday/1.cc
@@ -21,6 +21,7 @@
 // Class template day [time.cal.weekday]
 
 #include <chrono>
+#include <limits>
 
 constexpr void
 constexpr_weekday()
@@ -38,6 +39,14 @@ constexpr_weekday()
   static_assert(weekday{3}[2].weekday() == weekday{3});
   static_assert(weekday{3}[last].weekday() == weekday{3});
 
+  // Test for UB (overflow).
+  {
+using rep = days::rep;
+using std::numeric_limits;
+constexpr weekday max{sys_days{days{numeric_limits<rep>::max()}}};
+constexpr weekday min{sys_days{days{numeric_limits<rep>::min()}}};
+  }
+
   static_assert(weekday{sys_days{1900y/January/1}} == Monday);
   static_assert(weekday{sys_days{1970y/January/1}} == Thursday);
   static_assert(weekday{sys_days{2020y/August/21}} == Friday);


[gcc r13-8425] libstdc++: Remove unnecessary "& 1" from year_month_day_last::day()

2024-03-12 Thread Jonathan Wakely via Libstdc++-cvs
https://gcc.gnu.org/g:29dc5fb5b62364b3a0ef8272c7dab528b91b7ae1

commit r13-8425-g29dc5fb5b62364b3a0ef8272c7dab528b91b7ae1
Author: Cassio Neri 
Date:   Sat Nov 11 16:44:58 2023 +

libstdc++: Remove unnecessary "& 1" from year_month_day_last::day()

When year_month_day_last::day() was implemented, Dr. Matthias Kretz realised
that the operation "& 1" wasn't necessary, but we did not patch it at that
time. This patch removes the unnecessary operation.
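A standalone check of the branchless formula makes the argument easy to verify. This sketch assumes a valid month m in [1, 12] and a separately computed leap-year flag; last_day_of_month is a hypothetical name, not part of <chrono>.

```cpp
// Hypothetical standalone version of year_month_day_last::day() after the patch.
// m must be in [1, 12]; leap says whether the year is a leap year.
constexpr unsigned last_day_of_month(unsigned m, bool leap)
{
  // b = (m ^ (m >> 3)) & 1 is 1 exactly for the 31-day months.  For m <= 12,
  // m ^ (m >> 3) <= 13 < 32, and 30 == 0b11110 already has bits 1..4 set,
  // so OR-ing with 30 lets only bit 0 through: the "& 1" is redundant.
  return m != 2 ? (m ^ (m >> 3)) | 30 : leap ? 29 : 28;
}
```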

libstdc++-v3/ChangeLog:

* include/std/chrono (year_month_day_last::day): Remove &1.

(cherry picked from commit b011535456396a6846ff24fb5b1baea8fe0a33b1)

Diff:
---
 libstdc++-v3/include/std/chrono | 24 ++--
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index fb8d6c82e8a..f22b8097174 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -1813,22 +1813,26 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   {
const auto __m = static_cast<unsigned>(month());
 
-   // Excluding February, the last day of month __m is either 30 or 31 or,
-   // in another words, it is 30 + b = 30 | b, where b is in {0, 1}.
+   // The result is unspecified if __m < 1 or __m > 12.  Hence, assume
+   // 1 <= __m <= 12.  For __m != 2, day() == 30 or day() == 31 or, in
+   // other words, day () == 30 | b, where b is in {0, 1}.
 
-   // If __m in {1, 3, 4, 5, 6, 7}, then b is 1 if, and only if __m is odd.
-   // Hence, b = __m & 1 = (__m ^ 0) & 1.
+   // If __m in {1, 3, 4, 5, 6, 7}, then b is 1 if, and only if, __m is
+   // odd.  Hence, b = __m & 1 = (__m ^ 0) & 1.
 
-   // If __m in {8, 9, 10, 11, 12}, then b is 1 if, and only if __m is even.
-   // Hence, b = (__m ^ 1) & 1.
+   // If __m in {8, 9, 10, 11, 12}, then b is 1 if, and only if, __m is
+   // even.  Hence, b = (__m ^ 1) & 1.
 
// Therefore, b = (__m ^ c) & 1, where c = 0, if __m < 8, or c = 1 if
// __m >= 8, that is, c = __m >> 3.
 
-   // The above mathematically justifies this implementation whose
-   // performance does not depend on look-up tables being on the L1 cache.
-   return chrono::day{__m != 2 ? ((__m ^ (__m >> 3)) & 1) | 30
-   : _M_y.is_leap() ? 29 : 28};
+   // Since 30 = (11110)_2 and __m <= 31 = (11111)_2, the "& 1" in b's
+   // calculation is unnecessary.
+
+   // The performance of this implementation does not depend on look-up
+   // tables being on the L1 cache.
+   return chrono::day{__m != 2 ? (__m ^ (__m >> 3)) | 30
+ : _M_y.is_leap() ? 29 : 28};
   }
 
   constexpr


[gcc r13-8426] libstdc++: Simplify year::is_leap()

2024-03-12 Thread Jonathan Wakely via Libstdc++-cvs
https://gcc.gnu.org/g:3cbaada7d9186410a4da6575c27a156e72820ebf

commit r13-8426-g3cbaada7d9186410a4da6575c27a156e72820ebf
Author: Cassio Neri 
Date:   Sat Nov 11 22:59:50 2023 +

libstdc++: Simplify year::is_leap()

The current implementation returns
(_M_y & (__is_multiple_of_100 ? 15 : 3)) == 0;
where __is_multiple_of_100 is calculated using an obfuscated algorithm which
saves one ror instruction when compared to _M_y % 100 == 0 [1].

In leap-year calculations, it is correct to replace the divisibility check by
100 with one by 25. It turns out that _M_y % 25 == 0 also saves the ror
instruction [2], so the obfuscation is no longer required.

[1] https://godbolt.org/z/5PaEv6a6b
[2] https://godbolt.org/z/55G8rn77e
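The equivalence argued in the patch can be checked mechanically. This sketch uses plain int with hypothetical function names and compares the simplified test against the textbook Gregorian rule; it relies on two's-complement representation (guaranteed since C++20) for & on negative years.

```cpp
// Simplified test from the patch: multiples of 25 must be divisible by 16,
// all other years by 4 ((y & 15) == 0 <=> 16 | y, (y & 3) == 0 <=> 4 | y).
constexpr bool is_leap_fast(int y)
{
  return (y & (y % 25 == 0 ? 15 : 3)) == 0;
}

// Textbook Gregorian rule, used as the reference.
constexpr bool is_leap_reference(int y)
{
  return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
}

// True if both rules agree on every year in [lo, hi].
constexpr bool rules_agree(int lo, int hi)
{
  for (int y = lo; y <= hi; ++y)
    if (is_leap_fast(y) != is_leap_reference(y))
      return false;
  return true;
}
```

Calling rules_agree(-32767, 32767) at run time covers the whole range of std::chrono::year.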

libstdc++-v3/ChangeLog:

* include/std/chrono (year::is_leap): Clear code.

(cherry picked from commit 86a0df1a6c7fe4a835620b868e76ea78d42d6620)

Diff:
---
 libstdc++-v3/include/std/chrono | 40 
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index f22b8097174..57cc803f1af 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -841,29 +841,29 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   constexpr bool
   is_leap() const noexcept
   {
-   // Testing divisibility by 100 first gives better performance, that is,
-   // return (_M_y % 100 != 0 || _M_y % 400 == 0) && _M_y % 4 == 0;
-
-   // It gets even faster if _M_y is in [-536870800, 536870999]
-   // (which is the case here) and _M_y % 100 is replaced by
-   // __is_multiple_of_100 below.
+   // Testing divisibility by 100 first gives better performance [1], i.e.,
+   // return _M_y % 100 == 0 ? _M_y % 400 == 0 : _M_y % 16 == 0;
+   // Furthermore, if _M_y % 100 == 0, then _M_y % 400 == 0 is equivalent
+   // to _M_y % 16 == 0, so we can simplify it to
+   // return _M_y % 100 == 0 ? _M_y % 16 == 0 : _M_y % 4 == 0.  // #1
+   // Similarly, we can replace 100 with 25 (which is good since
+   // _M_y % 25 == 0 requires one fewer instruction than _M_y % 100 == 0
+   // [2]):
+   // return _M_y % 25 == 0 ? _M_y % 16 == 0 : _M_y % 4 == 0.  // #2
+   // Indeed, first assume _M_y % 4 != 0.  Then _M_y % 16 != 0 and hence,
+   // _M_y % 4 == 0 and _M_y % 16 == 0 are both false.  Therefore, #2
+   // returns false as it should (regardless of _M_y % 25.) Now assume
+   // _M_y % 4 == 0.  In this case, _M_y % 25 == 0 if, and only if,
+   // _M_y % 100 == 0, that is, #1 and #2 are equivalent.  Finally, #2 is
+   // equivalent to
+   // return (_M_y & (_M_y % 25 == 0 ? 15 : 3)) == 0.
 
// References:
// [1] https://github.com/cassioneri/calendar
-   // [2] https://accu.org/journals/overload/28/155/overload155.pdf#page=16
-
-   // Furthermore, if y%100 == 0, then y%400==0 is equivalent to y%16==0,
-   // so we can simplify it to (!mult_100 && y % 4 == 0) || y % 16 == 0,
-   // which is equivalent to (y & (mult_100 ? 15 : 3)) == 0.
-   // See https://gcc.gnu.org/pipermail/libstdc++/2021-June/052815.html
-
-   constexpr uint32_t __multiplier   = 42949673;
-   constexpr uint32_t __bound= 42949669;
-   constexpr uint32_t __max_dividend = 1073741799;
-   constexpr uint32_t __offset   = __max_dividend / 2 / 100 * 100;
-   const bool __is_multiple_of_100
- = __multiplier * (_M_y + __offset) < __bound;
-   return (_M_y & (__is_multiple_of_100 ? 15 : 3)) == 0;
+   // [2] https://godbolt.org/z/55G8rn77e
+   // [3] https://gcc.gnu.org/pipermail/libstdc++/2021-June/052815.html
+
+   return (_M_y & (_M_y % 25 == 0 ? 15 : 3)) == 0;
   }
 
   explicit constexpr


[gcc r13-8428] libstdc++: Remove UB from month and weekday additions and subtractions.

2024-03-12 Thread Jonathan Wakely via Gcc-cvs
https://gcc.gnu.org/g:2d3cc6806a9fc3c9ac299bb021819bcb5e7605ea

commit r13-8428-g2d3cc6806a9fc3c9ac299bb021819bcb5e7605ea
Author: Cassio Neri 
Date:   Sun Dec 10 11:31:31 2023 +

libstdc++: Remove UB from month and weekday additions and subtractions.

The following invoke signed integer overflow (UB) [1]:

  month   + months{MAX} // where MAX is the maximum value of months::rep
  month   + months{MIN} // where MIN is the minimum value of months::rep
  month   - months{MIN} // where MIN is the minimum value of months::rep
  weekday + days  {MAX} // where MAX is the maximum value of days::rep
  weekday - days  {MIN} // where MIN is the minimum value of days::rep

For the additions to MAX, the crux of the problem is that, in libstdc++,
months::rep and days::rep are int64_t. Other implementations use int32_t, so
they can cast operands to int64_t and perform the arithmetic without risk of
overflow.

For month + months{MIN}, the implementation follows the Standard's "returns
clause" and evaluates:

   modulo(static_cast<long long>(unsigned{__x}) + (__y.count() - 1), 12);

Overflow occurs when MIN - 1 is evaluated. Casting to a larger type could help
but, unfortunately again, this is not possible for libstdc++.

For the subtraction of MIN, the problem is that -MIN is not representable.

It's fair to say that the intention is for these additions/subtractions to
be performed in modular arithmetic (modulo 12 or 7), so that no overflow is
expected.

To fix this UB, this patch implements:

  template 
  unsigned __add_modulo(unsigned __x, _T __y);

  template 
  unsigned __sub_modulo(unsigned __x, _T __y);

which return, respectively, the remainder of the Euclidean division of __x + __y
and __x - __y by __d without overflowing. These functions replace

  constexpr unsigned __modulo(long long __n, unsigned __d);

which also calculates the remainder of __n divided by __d, but __n is the
result of the addition or subtraction. Hence, these operations might invoke UB
before __modulo is called, and __modulo can do nothing to remedy the issue.

In addition to solving the UB issues, __add_modulo and __sub_modulo allow better
codegen (shorter and branchless) on x86-64 and ARM [2].

[1] https://godbolt.org/z/a9YfWdn57
[2] https://godbolt.org/z/Gh36cr7E4
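The goal (a Euclidean remainder of x + y or x - y that never forms the possibly overflowing sum) can be illustrated with a simpler, hypothetical technique: reduce y modulo d first, which cannot overflow, then combine two small values. This is not the libstdc++ code, which instead uses a precomputed unsigned offset (__modulo_offset) to get branchless codegen; the names and the fixed std::int64_t parameter type are assumptions.

```cpp
#include <cstdint>

// Hypothetical helpers, not the libstdc++ implementation: Euclidean remainder
// of x + y and of x - y modulo d, never forming x + y or x - y in a signed type.
template <unsigned d>
constexpr unsigned add_modulo(unsigned x, std::int64_t y)
{
  static_assert(d > 0);
  // % truncates toward zero, so r is in [-(d - 1), d - 1]; this cannot overflow.
  std::int64_t r = y % static_cast<std::int64_t>(d);
  if (r < 0)
    r += d;  // r is now the Euclidean remainder of y, in [0, d - 1]
  return (x % d + static_cast<unsigned>(r)) % d;
}

template <unsigned d>
constexpr unsigned sub_modulo(unsigned x, std::int64_t y)
{
  static_assert(d > 0);
  // -y overflows for INT64_MIN, so negate the (small) remainder instead.
  const std::int64_t r = y % static_cast<std::int64_t>(d);  // [-(d - 1), d - 1]
  // d - r is in [1, 2d - 1] and congruent to -y modulo d.
  const auto neg_y = static_cast<unsigned>((static_cast<std::int64_t>(d) - r) % d);
  return (x % d + neg_y) % d;
}
```

For example, taking x = 0 as a zero-based January index (an assumption of this sketch), add_modulo<12>(0, INT64_MAX) yields 7 (August) instead of overflowing.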

libstdc++-v3/ChangeLog:

* include/std/chrono: Fix + and - for months and weekdays.
* testsuite/std/time/month/1.cc: Add constexpr tests against overflow.
* testsuite/std/time/month/2.cc: New test for extreme values.
* testsuite/std/time/weekday/1.cc: Add constexpr tests against overflow.
* testsuite/std/time/weekday/2.cc: New test for extreme values.

(cherry picked from commit 2cb3d42d3f3e7a5345ee7a6f3676a10c84864d72)

Diff:
---
 libstdc++-v3/include/std/chrono  | 79 +++-
 libstdc++-v3/testsuite/std/time/month/1.cc   | 19 +++
 libstdc++-v3/testsuite/std/time/month/2.cc   | 32 +++
 libstdc++-v3/testsuite/std/time/weekday/1.cc | 16 +-
 libstdc++-v3/testsuite/std/time/weekday/2.cc | 32 +++
 5 files changed, 151 insertions(+), 27 deletions(-)

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index c303eedd464..b2abf90cf71 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -503,18 +503,47 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 namespace __detail
 {
-  // Compute the remainder of the Euclidean division of __n divided by __d.
-  // Euclidean division truncates toward negative infinity and always
-  // produces a remainder in the range of [0,__d-1] (whereas standard
-  // division truncates toward zero and yields a nonpositive remainder
-  // for negative __n).
+  // Helper to __add_modulo and __sub_modulo.
+  template 
+  consteval auto
+  __modulo_offset()
+  {
+   using _Up = make_unsigned_t<_Tp>;
+   auto constexpr __a = _Up(-1) - _Up(255 + __d - 2);
+   auto constexpr __b = _Up(__d * (__a / __d) - 1);
+   // Notice: b <= a - 1 <= _Up(-1) - (255 + d - 1) and b % d = d - 1.
+   return _Up(-1) - __b; // >= 255 + d - 1
+  }
+
+  // Compute the remainder of the Euclidean division of __x + __y divided by
+  // __d without overflowing.  Typically, __x <= 255 + d - 1 is sum of
+  // weekday/month with a shift in [0, d - 1] and __y is a duration count.
+  template 
+  constexpr unsigned
+  __add_modulo(unsigned __x, _Tp __y)
+  {
+   using _Up = make_unsigned_t<_Tp>;
+   // For __y >= 0, _Up(__y) has the same mathematical value as __y and
+   // this function simply returns (__x + _Up(__y)) % d.  Typically, this
+   // doesn't overflow since the range of _Up contains many more positive
+

[gcc r13-8430] libstdc++: Fix std::basic_format_arg::handle for BasicFormatters

2024-03-12 Thread Jonathan Wakely via Gcc-cvs
https://gcc.gnu.org/g:826f7e5ca3bddf3ff82bc52c09e84f5d35b24dbf

commit r13-8430-g826f7e5ca3bddf3ff82bc52c09e84f5d35b24dbf
Author: Jonathan Wakely 
Date:   Wed Feb 28 15:05:08 2024 +

libstdc++: Fix std::basic_format_arg::handle for BasicFormatters

std::basic_format_arg::handle is supposed to format its value as const
if that is valid, to reduce the number of instantiations of the
formatter's format function. I made a silly typo so that it checks
whether _Tp is formattable instead of whether const _Tp is formattable,
which breaks support for BasicFormatters, i.e., ones that can only format
non-const types.

There's a static_assert in the handle constructor which is supposed to
improve diagnostics for trying to format a const argument with a
formatter that doesn't support it. That condition can't fail, because
the std::basic_format_arg constructor is already constrained to check
that the argument type is formattable. The static_assert can be removed.

libstdc++-v3/ChangeLog:

* include/std/format (basic_format_arg::handle::__maybe_const_t):
Fix condition to check if const type is formattable.
(basic_format_arg::handle::handle(T&)): Remove redundant
static_assert.
* testsuite/std/format/formatter/basic.cc: New test.

(cherry picked from commit 02ca9d3f0c5d2b0255df28f021834dd67ad79bc2)

Diff:
---
 libstdc++-v3/include/std/format|  6 +-
 .../testsuite/std/format/formatter/basic.cc| 24 ++
 2 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index 7bcaddb3715..a938d65a7b9 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -2866,7 +2866,7 @@ namespace __format
// Format as const if possible, to reduce instantiations.
template
  using __maybe_const_t
-   = __conditional_t<__formattable<_Tp>, const _Tp, _Tp>;
+   = __conditional_t<__formattable<const _Tp>, const _Tp, _Tp>;
 
template
  static void
@@ -2884,10 +2884,6 @@ namespace __format
  explicit
  handle(_Tp& __val) noexcept
  {
-   if constexpr (!__formattable)
- static_assert(!is_const_v<_Tp>, "std::format argument must be "
- "non-const for this type");
-
this->_M_ptr = __builtin_addressof(__val);
auto __func = _S_format<__maybe_const_t<_Tp>>;
this->_M_func = reinterpret_cast(__func);
diff --git a/libstdc++-v3/testsuite/std/format/formatter/basic.cc b/libstdc++-v3/testsuite/std/format/formatter/basic.cc
new file mode 100644
index 000..56c18864135
--- /dev/null
+++ b/libstdc++-v3/testsuite/std/format/formatter/basic.cc
@@ -0,0 +1,24 @@
+// { dg-do compile { target c++20 } }
+
+// BasicFormatter requirements do not require a const parameter.
+
+#include <format>
+
+struct X { };
+
+template<> struct std::formatter<X>
+{
+  constexpr auto parse(format_parse_context& ctx)
+  { return ctx.begin(); }
+
+  // Takes non-const X&
+  format_context::iterator format(X&, format_context& ctx) const
+  {
+auto out = ctx.out();
+*out++ = 'x';
+return out;
+  }
+};
+
+X x;
+auto s = std::format("{}", x);


[gcc r13-8427] libstdc++: Improve operator-(weekday x, weekday y)

2024-03-12 Thread Jonathan Wakely via Libstdc++-cvs
https://gcc.gnu.org/g:e3e5bdee78df9cb44803af6813e0eb10aa8341c0

commit r13-8427-ge3e5bdee78df9cb44803af6813e0eb10aa8341c0
Author: Cassio Neri 
Date:   Tue Nov 14 00:27:39 2023 +

libstdc++: Improve operator-(weekday x, weekday y)

The current implementation calls __detail::__modulo which is relatively
expensive.

A better implementation is possible if we assume that x.ok() && y.ok() is true,
so that n = x.c_encoding() - y.c_encoding() is in [-6, 6]. In this case, it
suffices to return n >= 0 ? n : n + 7.

The above is allowed by [time.cal.wd.nonmembers]/5: the returned value is
unspecified when x.ok() && y.ok() is false.

The assembly emitted for x86-64 and ARM can be seen in:
https://godbolt.org/z/nMdc5vv9n.
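A minimal sketch of the optimized subtraction on raw C encodings, assuming both operands are already in [0, 6] (that is, x.ok() and y.ok() hold); weekday_diff is a hypothetical name.

```cpp
// x and y are C-encoded weekdays in [0, 6] (0 = Sunday); the result is in [0, 6].
constexpr int weekday_diff(unsigned x, unsigned y)
{
  const int n = static_cast<int>(x) - static_cast<int>(y);  // in [-6, 6]
  return n >= 0 ? n : n + 7;  // one conditional add instead of a general modulo
}
```

For instance, Sunday minus Saturday, weekday_diff(0, 6), is 1 day.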

libstdc++-v3/ChangeLog:

* include/std/chrono (operator-(const weekday&, const weekday&)):
Optimize.

(cherry picked from commit f71352c71d78ac977ea0e71a6900699a8cf09219)

Diff:
---
 libstdc++-v3/include/std/chrono | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 57cc803f1af..c303eedd464 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -1049,8 +1049,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   friend constexpr days
   operator-(const weekday& __x, const weekday& __y) noexcept
   {
-   auto __n = static_cast(__x._M_wd) - __y._M_wd;
-   return days{__detail::__modulo(__n, 7)};
+   const auto __n = __x.c_encoding() - __y.c_encoding();
+   return static_cast(__n) >= 0 ? days{__n} : days{__n + 7};
   }
 };


[gcc r13-8431] libstdc++: Update expiry times for leap seconds lists

2024-03-12 Thread Jonathan Wakely via Libstdc++-cvs
https://gcc.gnu.org/g:1870ee44351f182e8782238e9a6732e842eebf1d

commit r13-8431-g1870ee44351f182e8782238e9a6732e842eebf1d
Author: Jonathan Wakely 
Date:   Fri Mar 1 20:55:10 2024 +

libstdc++: Update expiry times for leap seconds lists

The list in tzdb.cc isn't the only hardcoded list of leap seconds in the
library: there's the one defined inline in <chrono> (to avoid loading
the tzdb for the common case) and another in a testcase. This updates
them to note that there are no new leap seconds in 2024 either, until at
least 2024-12-28.
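The new hardcoded expiry can be cross-checked with a little calendar arithmetic. The sketch uses Howard Hinnant's well-known days_from_civil algorithm (proleptic Gregorian calendar, day 0 = 1970-01-01) rather than anything in libstdc++; seconds_at_midnight_utc is a hypothetical helper. 2024-12-28 is day 20085, and 20085 * 86400 = 1735344000.

```cpp
#include <cstdint>

// Days since 1970-01-01 for a proleptic Gregorian date (Hinnant's days_from_civil).
constexpr std::int64_t days_from_civil(std::int64_t y, unsigned m, unsigned d)
{
  y -= m <= 2;  // count years from March so the leap day is last
  const std::int64_t era = (y >= 0 ? y : y - 399) / 400;
  const auto yoe = static_cast<unsigned>(y - era * 400);               // [0, 399]
  const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1; // [0, 365]
  const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;          // [0, 146096]
  return era * 146097 + static_cast<std::int64_t>(doe) - 719468;
}

// Unix time of 00:00:00 UTC on the given date (leap seconds are not counted,
// matching sys_seconds).
constexpr std::int64_t seconds_at_midnight_utc(std::int64_t y, unsigned m, unsigned d)
{
  return days_from_civil(y, m, d) * 86400;
}
```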

libstdc++-v3/ChangeLog:

* include/std/chrono (__get_leap_second_info): Update expiry
time for hardcoded list of leap seconds.
* testsuite/std/time/tzdb/leap_seconds.cc: Update comment.

(cherry picked from commit ddd347fca0685804bf68d6c768282573f3ea6442)

Diff:
---
 libstdc++-v3/include/std/chrono  | 2 +-
 libstdc++-v3/testsuite/std/time/tzdb/leap_seconds.cc | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index b2abf90cf71..edb782f6f10 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -3253,7 +3253,7 @@ namespace __detail
   };
   // The list above is known to be valid until (at least) this date
   // and only contains positive leap seconds.
-  const sys_seconds __expires(1703721600s); // 2023-12-28 00:00:00 UTC
+  const sys_seconds __expires(1735344000s); // 2024-12-28 00:00:00 UTC
 
 #if _GLIBCXX_USE_CXX11_ABI || ! _GLIBCXX_USE_DUAL_ABI
   if (__ss > __expires)
diff --git a/libstdc++-v3/testsuite/std/time/tzdb/leap_seconds.cc b/libstdc++-v3/testsuite/std/time/tzdb/leap_seconds.cc
index d27038225c8..537fb0670ff 100644
--- a/libstdc++-v3/testsuite/std/time/tzdb/leap_seconds.cc
+++ b/libstdc++-v3/testsuite/std/time/tzdb/leap_seconds.cc
@@ -22,7 +22,7 @@ void
 test_load_leapseconds()
 {
   std::ofstream("leapseconds") << R"(
-# These are all the real leap seconds as of 2022:
+# These are all the real leap seconds as of 2024:
 Leap   1972Jun 30  23:59:60+   S
 Leap   1972Dec 31  23:59:60+   S
 Leap   1973Dec 31  23:59:60+   S


[gcc r13-8429] libstdc++: Implement P2905R2 "Runtime format strings" for C++20

2024-03-12 Thread Jonathan Wakely via Libstdc++-cvs
https://gcc.gnu.org/g:3c8faeac3d03e032d55fae390618e577c292a83e

commit r13-8429-g3c8faeac3d03e032d55fae390618e577c292a83e
Author: Jonathan Wakely 
Date:   Sun Jan 7 22:21:08 2024 +

libstdc++: Implement P2905R2 "Runtime format strings" for C++20

This change makes std::make_format_args refuse to create dangling
references to temporaries. This makes the std::vformat API safer. This
was approved in Kona 2023 as a DR for C++20 so the change is implemented
unconditionally.

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (__formatter_chrono): Always use
lvalue arguments to make_format_args.
* include/std/format (make_format_args): Change parameter pack
from forwarding references to lvalue references. Remove use of
remove_reference_t which is now unnecessary.
(format_to, formatted_size): Remove incorrect forwarding of
arguments.
* testsuite/20_util/duration/io.cc: Use lvalues as arguments to
make_format_args.
* testsuite/std/format/arguments/args.cc: Likewise.
* testsuite/std/format/arguments/lwg3810.cc: Likewise.
* testsuite/std/format/functions/format.cc: Likewise.
* testsuite/std/format/functions/vformat_to.cc: Likewise.
* testsuite/std/format/string.cc: Likewise.
* testsuite/std/time/day/io.cc: Likewise.
* testsuite/std/time/month/io.cc: Likewise.
* testsuite/std/time/weekday/io.cc: Likewise.
* testsuite/std/time/year/io.cc: Likewise.
* testsuite/std/time/year_month_day/io.cc: Likewise.
* testsuite/std/format/arguments/args_neg.cc: New test.

(cherry picked from commit 2a8ee2592e48735d88df786cbafa6b0da39fc4d6)

Diff:
---
 libstdc++-v3/include/bits/chrono_io.h  | 15 +++
 libstdc++-v3/include/std/format| 30 +++---
 libstdc++-v3/testsuite/20_util/duration/io.cc  |  3 ++-
 .../testsuite/std/format/arguments/args.cc | 26 ++-
 .../testsuite/std/format/arguments/args_neg.cc | 12 +
 .../testsuite/std/format/arguments/lwg3810.cc  |  8 --
 .../testsuite/std/format/functions/format.cc   |  6 +++--
 .../testsuite/std/format/functions/vformat_to.cc   |  9 +--
 libstdc++-v3/testsuite/std/format/string.cc|  7 +++--
 libstdc++-v3/testsuite/std/time/day/io.cc  |  4 +--
 libstdc++-v3/testsuite/std/time/month/io.cc|  4 +--
 libstdc++-v3/testsuite/std/time/weekday/io.cc  |  4 +--
 libstdc++-v3/testsuite/std/time/year/io.cc |  4 +--
 .../testsuite/std/time/year_month_day/io.cc|  4 +--
 14 files changed, 91 insertions(+), 45 deletions(-)

diff --git a/libstdc++-v3/include/bits/chrono_io.h b/libstdc++-v3/include/bits/chrono_io.h
index c42797f64c4..1c08130bf65 100644
--- a/libstdc++-v3/include/bits/chrono_io.h
+++ b/libstdc++-v3/include/bits/chrono_io.h
@@ -2195,7 +2195,8 @@ namespace chrono
   _Str __s = _GLIBCXX_WIDEN("{:02d} is not a valid day");
   if (__d.ok())
__s = __s.substr(0, 6);
-  __os << std::vformat(__s, make_format_args<_Ctx>((unsigned)__d));
+  auto __u = (unsigned)__d;
+  __os << std::vformat(__s, make_format_args<_Ctx>(__u));
   return __os;
 }
 
@@ -2213,8 +2214,10 @@ namespace chrono
__os << std::vformat(__os.getloc(), __s.substr(0, 6),
 make_format_args<_Ctx>(__m));
   else
-   __os << std::vformat(__s.substr(6),
-make_format_args<_Ctx>((unsigned)__m));
+   {
+ auto __u = (unsigned)__m;
+ __os << std::vformat(__s.substr(6), make_format_args<_Ctx>(__u));
+   }
   return __os;
 }
 
@@ -2253,8 +2256,10 @@ namespace chrono
__os << std::vformat(__os.getloc(), __s.substr(0, 6),
 make_format_args<_Ctx>(__wd));
   else
-   __os << std::vformat(__s.substr(6),
-make_format_args<_Ctx>(__wd.c_encoding()));
+   {
+ auto __c = __wd.c_encoding();
+ __os << std::vformat(__s.substr(6), make_format_args<_Ctx>(__c));
+   }
   return __os;
 }
 
diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index 807c97680c6..7bcaddb3715 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -3117,7 +3117,7 @@ namespace __format
 
   template
friend auto
-   make_format_args(_Argz&&...) noexcept;
+   make_format_args(_Argz&...) noexcept;
 
   template
friend decltype(auto)
@@ -3287,7 +3287,7 @@ namespace __format
 
   template
friend auto
-   make_format_args(_Args&&...) noexcept;
+   make_format_args(_Args&...) noexcept;
 
   // An array of _Arg_t enums corresponding to _Args...
   template
@@ -3325,7 +3325,7 @@ namespace __format
 
   template

[gcc r14-9441] libgomp/libgomp.texi: Fix @node order in @menu

2024-03-12 Thread Tobias Burnus via Gcc-cvs
https://gcc.gnu.org/g:ef79c64cb5762c86ee04ddfcedb7fe31eaa3bac8

commit r14-9441-gef79c64cb5762c86ee04ddfcedb7fe31eaa3bac8
Author: Tobias Burnus 
Date:   Tue Mar 12 15:42:50 2024 +0100

libgomp/libgomp.texi: Fix @node order in @menu

While texinfo 7.0.3 does not warn, an older texinfo did complain about:
libgomp.texi:1964: warning: node next `omp_target_memcpy' in menu
`omp_target_memcpy_rect' and in sectioning `omp_target_memcpy_async' differ

libgomp/

* libgomp.texi (Device Memory Routines): Swap item order to match
the order of the '@node's of the '@subsection's.

Diff:
---
 libgomp/libgomp.texi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index bf5c7a76fc9..57165e0e981 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -1783,8 +1783,8 @@ pointers on devices. They have C linkage and do not throw exceptions.
 * omp_target_is_present:: Check whether storage is mapped
 * omp_target_is_accessible:: Check whether memory is device accessible
 * omp_target_memcpy:: Copy data between devices
-* omp_target_memcpy_rect:: Copy a subvolume of data between devices
 * omp_target_memcpy_async:: Copy data between devices asynchronously
+* omp_target_memcpy_rect:: Copy a subvolume of data between devices
 * omp_target_memcpy_rect_async:: Copy a subvolume of data between devices asynchronously
 @c * omp_target_memset:: /TR12
 @c * omp_target_memset_async:: /TR12


[gcc r14-9442] Fortran: handle procedure pointer component in DT array [PR110826]

2024-03-12 Thread Harald Anlauf via Gcc-cvs
https://gcc.gnu.org/g:81ee1298b47d3f3b3712ef3f3b2929ca26c4bcd2

commit r14-9442-g81ee1298b47d3f3b3712ef3f3b2929ca26c4bcd2
Author: Harald Anlauf 
Date:   Mon Mar 11 22:05:51 2024 +0100

Fortran: handle procedure pointer component in DT array [PR110826]

gcc/fortran/ChangeLog:

PR fortran/110826
* array.cc (gfc_array_dimen_size): When walking the ref chain of an
array and the ultimate component is a procedure pointer, do not try
to figure out its dimension even if it is an array-valued function.

gcc/testsuite/ChangeLog:

PR fortran/110826
* gfortran.dg/proc_ptr_comp_53.f90: New test.

Diff:
---
 gcc/fortran/array.cc   |  7 +
 gcc/testsuite/gfortran.dg/proc_ptr_comp_53.f90 | 43 ++
 2 files changed, 50 insertions(+)

diff --git a/gcc/fortran/array.cc b/gcc/fortran/array.cc
index 3a6e3a7c95b..e9934f1491b 100644
--- a/gcc/fortran/array.cc
+++ b/gcc/fortran/array.cc
@@ -2597,6 +2597,13 @@ gfc_array_dimen_size (gfc_expr *array, int dimen, mpz_t *result)
 case EXPR_FUNCTION:
   for (ref = array->ref; ref; ref = ref->next)
{
+ /* Ultimate component is a procedure pointer.  */
+ if (ref->type == REF_COMPONENT
+ && !ref->next
+ && ref->u.c.component->attr.function
+ && IS_PROC_POINTER (ref->u.c.component))
+   return false;
+
  if (ref->type != REF_ARRAY)
continue;
 
diff --git a/gcc/testsuite/gfortran.dg/proc_ptr_comp_53.f90 b/gcc/testsuite/gfortran.dg/proc_ptr_comp_53.f90
new file mode 100644
index 000..affb5922235
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/proc_ptr_comp_53.f90
@@ -0,0 +1,43 @@
+! { dg-do compile }
+! PR fortran/110826 - procedure pointer component in DT array
+
+module m
+  implicit none
+
+  type pp
+procedure(func_template), pointer, nopass :: f =>null()
+  end type pp
+
+  abstract interface
+ function func_template(state) result(dstate)
+   implicit none
+   real, dimension(:,:), intent(in)  :: state
+   real, dimension(size(state,1), size(state,2)) :: dstate
+ end function
+  end interface
+
+contains
+
+  function zero_state(state) result(dstate)
+real, dimension(:,:), intent(in)  :: state
+real, dimension(size(state,1), size(state,2)) :: dstate
+dstate = 0.
+  end function zero_state
+
+end module m
+
+program test_func_array
+  use m
+  implicit none
+
+  real, dimension(4,6) :: state
+  type(pp) :: func_scalar
+  type(pp) :: func_array(4)
+
+  func_scalar  %f => zero_state
+  func_array(1)%f => zero_state
+  print *, func_scalar  %f(state)
+  print *, func_array(1)%f(state)
+  if (.not. all (shape (func_scalar  %f(state)) == shape (state))) stop 1
+  if (.not. all (shape (func_array(1)%f(state)) == shape (state))) stop 2
+end program test_func_array


[gcc/meissner/heads/work162-ajit] (16 commits) Merge commit 'refs/users/meissner/heads/work162-ajit' of gi

2024-03-12 Thread Michael Meissner via Gcc-cvs
The branch 'meissner/heads/work162-ajit' was updated to point to:

 cc383c3f802... Merge commit 'refs/users/meissner/heads/work162-ajit' of gi

It previously pointed to:

 5d73f63c135... Merge commit 'refs/users/meissner/heads/work162-ajit' of gi

Diff:

Summary of changes (added commits):
---

  cc383c3... Merge commit 'refs/users/meissner/heads/work162-ajit' of gi
  a3aa724... Add ChangeLog.ajit and update REVISION.
  0746a20... Add -mcpu=future tuning support. (*)
  5180f01... Add -mcpu=future support. (*)
  c87c9fa... Add -mcpu=power11 tests. (*)
  34edbd6... Add -mcpu=power11 tuning support. (*)
  a2e7314... Add -mcpu=power11 support. (*)
  ae3aa18... Revert all changes (*)
  6712015... Add -mcpu=future support part 3. (*)
  ea0193b... Add -mcpu=future support part 2 (*)
  522bd06... Add -mcpu=future support. (*)
  35fe360... Add -mcpu=power11 tests. (*)
  8939ee2... Add -mcpu=power11 support part 3. (*)
  a09c97d... Add -mcpu=power11 support part 2 (*)
  79df8d6... Add -mcpu=power11 support. (*)
  448253a... Revert some changes (*)

(*) This commit already exists in another branch.
Because the reference `refs/users/meissner/heads/work162-ajit' matches
your hooks.email-new-commits-only configuration,
no separate email is sent for this commit.


[gcc(refs/users/meissner/heads/work162-ajit)] Add ChangeLog.ajit and update REVISION.

2024-03-12 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:a3aa724cc83ce2f56cfaa04fa6b3ccd19674eb98

commit a3aa724cc83ce2f56cfaa04fa6b3ccd19674eb98
Author: Michael Meissner 
Date:   Thu Mar 7 11:08:43 2024 -0500

Add ChangeLog.ajit and update REVISION.

2024-03-07  Michael Meissner  

gcc/

* ChangeLog.ajit: New file for branch.
* REVISION: Update.

Diff:
---
 gcc/ChangeLog.ajit | 6 ++
 gcc/REVISION   | 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/ChangeLog.ajit b/gcc/ChangeLog.ajit
new file mode 100644
index 000..eb5570e2484
--- /dev/null
+++ b/gcc/ChangeLog.ajit
@@ -0,0 +1,6 @@
+ Branch work162-ajit, baseline 
+
+2024-03-07   Michael Meissner  
+
+   Clone branch
+
diff --git a/gcc/REVISION b/gcc/REVISION
index 1f2b7b56b83..1fa4bd9178d 100644
--- a/gcc/REVISION
+++ b/gcc/REVISION
@@ -1 +1 @@
-work162 branch
+work162-ajit branch


[gcc(refs/users/meissner/heads/work162-ajit)] Merge commit 'refs/users/meissner/heads/work162-ajit' of git+ssh://gcc.gnu.org/git/gcc into me/work1

2024-03-12 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:cc383c3f802f476cbb7d89df27de371a5bffa0ca

commit cc383c3f802f476cbb7d89df27de371a5bffa0ca
Merge: a3aa724cc83 5d73f63c135
Author: Michael Meissner 
Date:   Tue Mar 12 17:51:42 2024 -0400

Merge commit 'refs/users/meissner/heads/work162-ajit' of git+ssh://gcc.gnu.org/git/gcc into me/work162-ajit

Diff:


[gcc/meissner/heads/work162-dmf] (16 commits) Merge commit 'refs/users/meissner/heads/work162-dmf' of git

2024-03-12 Thread Michael Meissner via Gcc-cvs
The branch 'meissner/heads/work162-dmf' was updated to point to:

 6d46ee8dac6... Merge commit 'refs/users/meissner/heads/work162-dmf' of git

It previously pointed to:

 f8660bb40a9... Merge commit 'refs/users/meissner/heads/work162-dmf' of git

Diff:

Summary of changes (added commits):
---

  6d46ee8... Merge commit 'refs/users/meissner/heads/work162-dmf' of git
  1aef508... Add ChangeLog.dmf and update REVISION.
  0746a20... Add -mcpu=future tuning support. (*)
  5180f01... Add -mcpu=future support. (*)
  c87c9fa... Add -mcpu=power11 tests. (*)
  34edbd6... Add -mcpu=power11 tuning support. (*)
  a2e7314... Add -mcpu=power11 support. (*)
  ae3aa18... Revert all changes (*)
  6712015... Add -mcpu=future support part 3. (*)
  ea0193b... Add -mcpu=future support part 2 (*)
  522bd06... Add -mcpu=future support. (*)
  35fe360... Add -mcpu=power11 tests. (*)
  8939ee2... Add -mcpu=power11 support part 3. (*)
  a09c97d... Add -mcpu=power11 support part 2 (*)
  79df8d6... Add -mcpu=power11 support. (*)
  448253a... Revert some changes (*)

(*) This commit already exists in another branch.
Because the reference `refs/users/meissner/heads/work162-dmf' matches
your hooks.email-new-commits-only configuration,
no separate email is sent for this commit.


[gcc(refs/users/meissner/heads/work162-dmf)] Add ChangeLog.dmf and update REVISION.

2024-03-12 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:1aef508da5a8c94561e0805d3f91a9a0ca2722c1

commit 1aef508da5a8c94561e0805d3f91a9a0ca2722c1
Author: Michael Meissner 
Date:   Thu Mar 7 11:05:59 2024 -0500

Add ChangeLog.dmf and update REVISION.

2024-03-07  Michael Meissner  

gcc/

* ChangeLog.dmf: New file for branch.
* REVISION: Update.

Diff:
---
 gcc/ChangeLog.dmf | 6 ++
 gcc/REVISION  | 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/ChangeLog.dmf b/gcc/ChangeLog.dmf
new file mode 100644
index 000..4bf550e6556
--- /dev/null
+++ b/gcc/ChangeLog.dmf
@@ -0,0 +1,6 @@
+ Branch work162-dmf, baseline 
+
+2024-03-07   Michael Meissner  
+
+   Clone branch
+
diff --git a/gcc/REVISION b/gcc/REVISION
index 1f2b7b56b83..58945f7a1ad 100644
--- a/gcc/REVISION
+++ b/gcc/REVISION
@@ -1 +1 @@
-work162 branch
+work162-dmf branch


[gcc(refs/users/meissner/heads/work162-dmf)] Merge commit 'refs/users/meissner/heads/work162-dmf' of git+ssh://gcc.gnu.org/git/gcc into me/work16

2024-03-12 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:6d46ee8dac66b56b73f3470eabb6f19dd7de162d

commit 6d46ee8dac66b56b73f3470eabb6f19dd7de162d
Merge: 1aef508da5a f8660bb40a9
Author: Michael Meissner 
Date:   Tue Mar 12 17:58:00 2024 -0400

Merge commit 'refs/users/meissner/heads/work162-dmf' of git+ssh://gcc.gnu.org/git/gcc into me/work162-dmf

Diff:


[gcc/meissner/heads/work162-test] (16 commits) Merge commit 'refs/users/meissner/heads/work162-test' of gi

2024-03-12 Thread Michael Meissner via Gcc-cvs
The branch 'meissner/heads/work162-test' was updated to point to:

 1fd55caa3fe... Merge commit 'refs/users/meissner/heads/work162-test' of gi

It previously pointed to:

 f8f47c34771... Merge commit 'refs/users/meissner/heads/work162-test' of gi

Diff:

Summary of changes (added commits):
---

  1fd55ca... Merge commit 'refs/users/meissner/heads/work162-test' of gi
  6b796c8... Add ChangeLog.test and update REVISION.
  0746a20... Add -mcpu=future tuning support. (*)
  5180f01... Add -mcpu=future support. (*)
  c87c9fa... Add -mcpu=power11 tests. (*)
  34edbd6... Add -mcpu=power11 tuning support. (*)
  a2e7314... Add -mcpu=power11 support. (*)
  ae3aa18... Revert all changes (*)
  6712015... Add -mcpu=future support part 3. (*)
  ea0193b... Add -mcpu=future support part 2 (*)
  522bd06... Add -mcpu=future support. (*)
  35fe360... Add -mcpu=power11 tests. (*)
  8939ee2... Add -mcpu=power11 support part 3. (*)
  a09c97d... Add -mcpu=power11 support part 2 (*)
  79df8d6... Add -mcpu=power11 support. (*)
  448253a... Revert some changes (*)

(*) This commit already exists in another branch.
Because the reference `refs/users/meissner/heads/work162-test' matches
your hooks.email-new-commits-only configuration,
no separate email is sent for this commit.


[gcc(refs/users/meissner/heads/work162-test)] Add ChangeLog.test and update REVISION.

2024-03-12 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:6b796c8c6e10d3991dae549a4ba6e3a3bed22bf1

commit 6b796c8c6e10d3991dae549a4ba6e3a3bed22bf1
Author: Michael Meissner 
Date:   Thu Mar 7 11:09:39 2024 -0500

Add ChangeLog.test and update REVISION.

2024-03-07  Michael Meissner  

gcc/

* ChangeLog.test: New file for branch.
* REVISION: Update.

Diff:
---
 gcc/ChangeLog.test | 6 ++
 gcc/REVISION   | 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/ChangeLog.test b/gcc/ChangeLog.test
new file mode 100644
index 000..9512bcddaf9
--- /dev/null
+++ b/gcc/ChangeLog.test
@@ -0,0 +1,6 @@
+ Branch work162-test, baseline 
+
+2024-03-07   Michael Meissner  
+
+   Clone branch
+
diff --git a/gcc/REVISION b/gcc/REVISION
index 1f2b7b56b83..6bf4941fb03 100644
--- a/gcc/REVISION
+++ b/gcc/REVISION
@@ -1 +1 @@
-work162 branch
+work162-test branch


[gcc(refs/users/meissner/heads/work162-test)] Merge commit 'refs/users/meissner/heads/work162-test' of git+ssh://gcc.gnu.org/git/gcc into me/work1

2024-03-12 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:1fd55caa3fe37bc777e52181c9b01f51d1f50ac3

commit 1fd55caa3fe37bc777e52181c9b01f51d1f50ac3
Merge: 6b796c8c6e1 f8f47c34771
Author: Michael Meissner 
Date:   Tue Mar 12 18:03:24 2024 -0400

Merge commit 'refs/users/meissner/heads/work162-test' of git+ssh://gcc.gnu.org/git/gcc into me/work162-test

Diff:


[gcc/meissner/heads/work162-vpair] (16 commits) Merge commit 'refs/users/meissner/heads/work162-vpair' of g

2024-03-12 Thread Michael Meissner via Gcc-cvs
The branch 'meissner/heads/work162-vpair' was updated to point to:

 3ca2a9f1c96... Merge commit 'refs/users/meissner/heads/work162-vpair' of g

It previously pointed to:

 ed10bc0b1be... Merge commit 'refs/users/meissner/heads/work162-vpair' of g

Diff:

Summary of changes (added commits):
---

  3ca2a9f... Merge commit 'refs/users/meissner/heads/work162-vpair' of g
  e73aa4f... Add ChangeLog.vpair and update REVISION.
  0746a20... Add -mcpu=future tuning support. (*)
  5180f01... Add -mcpu=future support. (*)
  c87c9fa... Add -mcpu=power11 tests. (*)
  34edbd6... Add -mcpu=power11 tuning support. (*)
  a2e7314... Add -mcpu=power11 support. (*)
  ae3aa18... Revert all changes (*)
  6712015... Add -mcpu=future support part 3. (*)
  ea0193b... Add -mcpu=future support part 2 (*)
  522bd06... Add -mcpu=future support. (*)
  35fe360... Add -mcpu=power11 tests. (*)
  8939ee2... Add -mcpu=power11 support part 3. (*)
  a09c97d... Add -mcpu=power11 support part 2 (*)
  79df8d6... Add -mcpu=power11 support. (*)
  448253a... Revert some changes (*)

(*) This commit already exists in another branch.
Because the reference `refs/users/meissner/heads/work162-vpair' matches
your hooks.email-new-commits-only configuration,
no separate email is sent for this commit.


[gcc(refs/users/meissner/heads/work162-vpair)] Add ChangeLog.vpair and update REVISION.

2024-03-12 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:e73aa4f8e5fc9de58d5aaca7c290b0a9d516664f

commit e73aa4f8e5fc9de58d5aaca7c290b0a9d516664f
Author: Michael Meissner 
Date:   Thu Mar 7 11:07:00 2024 -0500

Add ChangeLog.vpair and update REVISION.

2024-03-07  Michael Meissner  

gcc/

* ChangeLog.vpair: New file for branch.
* REVISION: Update.

Diff:
---
 gcc/ChangeLog.vpair | 6 ++
 gcc/REVISION| 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/ChangeLog.vpair b/gcc/ChangeLog.vpair
new file mode 100644
index 000..faeb03cac7b
--- /dev/null
+++ b/gcc/ChangeLog.vpair
@@ -0,0 +1,6 @@
+ Branch work162-vpair, baseline 
+
+2024-03-07   Michael Meissner  
+
+   Clone branch
+
diff --git a/gcc/REVISION b/gcc/REVISION
index 1f2b7b56b83..5f53efe48c3 100644
--- a/gcc/REVISION
+++ b/gcc/REVISION
@@ -1 +1 @@
-work162 branch
+work162-vpair branch


[gcc(refs/users/meissner/heads/work162-vpair)] Merge commit 'refs/users/meissner/heads/work162-vpair' of git+ssh://gcc.gnu.org/git/gcc into me/work

2024-03-12 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:3ca2a9f1c968d61a4de44f410f89b4f98fefd2f2

commit 3ca2a9f1c968d61a4de44f410f89b4f98fefd2f2
Merge: e73aa4f8e5f ed10bc0b1be
Author: Michael Meissner 
Date:   Tue Mar 12 18:07:04 2024 -0400

Merge commit 'refs/users/meissner/heads/work162-vpair' of git+ssh://gcc.gnu.org/git/gcc into me/work162-vpair

Diff:


[gcc(refs/users/meissner/heads/work162-vpair)] Power10: Add options to disable load and store vector pair.

2024-03-12 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:8135a35053e1bf1723ef225a3d75c19a0684f6f2

commit 8135a35053e1bf1723ef225a3d75c19a0684f6f2
Author: Michael Meissner 
Date:   Tue Mar 12 20:09:21 2024 -0400

Power10: Add options to disable load and store vector pair.

In working on some future patches that involve utilizing vector pair
instructions, I wanted to be able to tune my program to enable or disable using
the vector pair load or store operations while still keeping the other
operations on the vector pair.

This patch adds two undocumented tuning options.  The -mno-load-vector-pair
option tells GCC to generate two load vector instructions instead of a single
load vector pair.  The -mno-store-vector-pair option tells GCC to generate two
store vector instructions instead of a single store vector pair.

If -mno-load-vector-pair is used, GCC will not generate the indexed stxvpx
instruction.  Similarly, if -mno-store-vector-pair is used, GCC will not
generate the indexed lxvpx instruction.  This allows the {,p}lxvp and
{,p}stxvp instructions to be split after reload without needing a scratch
GPR.

The default for -mcpu=power10 is that both load vector pair and store vector
pair are enabled.

I added code so that user code can modify these settings using either a
'#pragma GCC target' directive or __attribute__((__target__(...))) in the
function declaration.

I added tests for the switches, #pragma, and attribute options.

I have built this on both little endian power10 systems and big endian power9
systems doing the normal bootstrap and test.  There were no regressions in any
of the tests, and the new tests passed.  Can I check this patch into the master
branch?

2024-03-12  Michael Meissner  

gcc/

* config/rs6000/mma.md (movoo): Add support for -mno-load-vector-pair and
-mno-store-vector-pair.
* config/rs6000/rs6000-cpus.def (OTHER_POWER10_MASKS): Add support for
-mload-vector-pair and -mstore-vector-pair.
(POWERPC_MASKS): Likewise.
* config/rs6000/rs6000.cc (rs6000_setup_reg_addr_masks): Only allow
indexed mode for OOmode if we are generating both load vector pair and
store vector pair instructions.
(rs6000_option_override_internal): Add support for -mno-load-vector-pair
and -mno-store-vector-pair.
(rs6000_opt_masks): Likewise.
* config/rs6000/rs6000.md (isa attribute): Add lxvp and stxvp
attributes.
(enabled attribute): Likewise.
* config/rs6000/rs6000.opt (-mload-vector-pair): New option.
(-mstore-vector-pair): Likewise.

gcc/testsuite/

* gcc.target/powerpc/vector-pair-attribute.c: New test.
* gcc.target/powerpc/vector-pair-pragma.c: New test.
* gcc.target/powerpc/vector-pair-switch1.c: New test.
* gcc.target/powerpc/vector-pair-switch2.c: New test.
* gcc.target/powerpc/vector-pair-switch3.c: New test.
* gcc.target/powerpc/vector-pair-switch4.c: New test.
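To illustrate the splitting rationale above, here is a portable C sketch. It is not GCC code, and the `vec16`/`vpair32` types and both functions are hypothetical. A 32-byte vector pair access is equivalent to two 16-byte accesses at offsets 0 and 16. With a base+displacement address, the second half only needs a new displacement. A base+index address, the form lxvpx/stxvpx use, would instead need base+index+16 computed into a spare register. That spare register is the scratch GPR the message wants to avoid after reload.

```c
#include <string.h>

/* Hypothetical stand-ins: a 128-bit VSX vector and a 256-bit vector pair.  */
typedef struct { unsigned char b[16]; } vec16;
typedef struct { vec16 half[2]; } vpair32;

/* One 32-byte access, as a single lxvp would perform.  */
static vpair32 load_pair (const unsigned char *base, long disp)
{
  vpair32 r;
  memcpy (&r, base + disp, sizeof r);
  return r;
}

/* The split form: two 16-byte accesses.  With base + displacement
   addressing the second access just uses disp + 16, so no extra register
   is needed.  With base + index addressing, base + index + 16 would first
   have to be computed into a spare register.  */
static vpair32 load_pair_split (const unsigned char *base, long disp)
{
  vpair32 r;
  memcpy (&r.half[0], base + disp, sizeof (vec16));
  memcpy (&r.half[1], base + disp + 16, sizeof (vec16));
  return r;
}
```

This models memory contents only; how the two halves map onto the registers of the pair (which may depend on endianness) is not shown.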

Diff:
---
 gcc/config/rs6000/mma.md   | 19 +---
 gcc/config/rs6000/rs6000-cpus.def  |  8 +++-
 gcc/config/rs6000/rs6000.cc| 30 +++-
 gcc/config/rs6000/rs6000.md| 10 +++-
 gcc/config/rs6000/rs6000.opt   |  8 
 .../gcc.target/powerpc/vector-pair-attribute.c | 39 +++
 .../gcc.target/powerpc/vector-pair-pragma.c| 55 ++
 .../gcc.target/powerpc/vector-pair-switch1.c   | 16 +++
 .../gcc.target/powerpc/vector-pair-switch2.c   | 17 +++
 .../gcc.target/powerpc/vector-pair-switch3.c   | 17 +++
 .../gcc.target/powerpc/vector-pair-switch4.c   | 17 +++
 11 files changed, 225 insertions(+), 11 deletions(-)

diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index 04e2d0066df..6a7d8a836db 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -292,27 +292,34 @@
 gcc_assert (false);
 })
 
+;; If the user used -mno-store-vector-pair or -mno-load-vector pair, use an
+;; alternative that does not allow indexed addresses so we can split the load
+;; or store.
 (define_insn_and_split "*movoo"
-  [(set (match_operand:OO 0 "nonimmediate_operand" "=wa,ZwO,wa")
-   (match_operand:OO 1 "input_operand" "ZwO,wa,wa"))]
+  [(set (match_operand:OO 0 "nonimmediate_operand" "=wa,wa,ZwO,QwO,wa")
+   (match_operand:OO 1 "input_operand" "ZwO,QwO,wa,wa,wa"))]
   "TARGET_MMA
&& (gpc_reg_operand (operands[0], OOmode)
|| gpc_reg_operand (operands[1], OOmode))"
   "@
lxvp%X1 %x0,%1
+   #
stxvp%X0 %x1,%0
+   #
#"
   "&& reload_completed
-   && (!MEM_P (operands[0]) && !MEM_P (ope

[gcc(refs/users/meissner/heads/work162-vpair)] Peter's patches for subreg support.

2024-03-12 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:5368f7b97a553d623c5787b3b6d71505732a9c47

commit 5368f7b97a553d623c5787b3b6d71505732a9c47
Author: Michael Meissner 
Date:   Tue Mar 12 20:18:26 2024 -0400

Peter's patches for subreg support.

2024-03-12  Peter Bergner  

gcc/

PR target/109116
* gcc/config/rs6000/rs6000.cc (rs6000_modes_tieable_p): Make OOmode
tieable with 128-bit vector modes.

2024-01-23  Peter Bergner  

gcc/

PR target/109116
* gcc/config/rs6000/mma.md (vsx_disassemble_pair): Use SUBREGs instead
of UNSPECs.
(mma_disassemble_acc): Likewise.
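The rewritten expanders replace the UNSPEC_MMA_EXTRACT patterns with a plain SUBREG. Its byte offset is INTVAL (operands[2]) * GET_MODE_SIZE (V16QImode), where the old split computed a hard register number (reg + regoff) after reload. The C sketch below shows that byte-offset arithmetic. It assumes the usual sizes: 16 bytes for V16QImode, 32 for the OOmode vector pair and 64 for the XOmode accumulator. The function and constants are illustrative only.

```c
/* Assumed mode sizes in bytes: one VSX register (V16QImode), a vector
   pair (OOmode) and an MMA accumulator (XOmode).  */
enum { V16QI_SIZE = 16, OO_SIZE = 32, XO_SIZE = 64 };

/* Byte offset of the SUBREG selecting 128-bit vector INDEX out of a
   TOTAL-byte value, i.e. INDEX * GET_MODE_SIZE (V16QImode).  Returns -1
   for an out-of-range index; in the patch, const_0_to_1_operand and
   const_0_to_3_operand keep the index in range instead.  */
static long disassemble_subreg_byte (long index, long total)
{
  if (index < 0 || (index + 1) * V16QI_SIZE > total)
    return -1;
  return index * V16QI_SIZE;
}
```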

Diff:
---
 gcc/config/rs6000/mma.md| 50 -
 gcc/config/rs6000/rs6000.cc |  9 +---
 2 files changed, 10 insertions(+), 49 deletions(-)

diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index 6a7d8a836db..831e646c473 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -405,29 +405,8 @@
(match_operand 2 "const_0_to_1_operand")]
   "TARGET_MMA"
 {
-  rtx src;
-  int regoff = INTVAL (operands[2]);
-  src = gen_rtx_UNSPEC (V16QImode,
-   gen_rtvec (2, operands[1], GEN_INT (regoff)),
-   UNSPEC_MMA_EXTRACT);
-  emit_move_insn (operands[0], src);
-  DONE;
-})
-
-(define_insn_and_split "*vsx_disassemble_pair"
-  [(set (match_operand:V16QI 0 "mma_disassemble_output_operand" "=mwa")
-   (unspec:V16QI [(match_operand:OO 1 "vsx_register_operand" "wa")
- (match_operand 2 "const_0_to_1_operand")]
- UNSPEC_MMA_EXTRACT))]
-  "TARGET_MMA
-   && vsx_register_operand (operands[1], OOmode)"
-  "#"
-  "&& reload_completed"
-  [(const_int 0)]
-{
-  int reg = REGNO (operands[1]);
-  int regoff = INTVAL (operands[2]);
-  rtx src = gen_rtx_REG (V16QImode, reg + regoff);
+  int regoff = INTVAL (operands[2]) * GET_MODE_SIZE (V16QImode);
+  rtx src = simplify_gen_subreg (V16QImode, operands[1], OOmode, regoff);
   emit_move_insn (operands[0], src);
   DONE;
 })
@@ -479,29 +458,8 @@
(match_operand 2 "const_0_to_3_operand")]
   "TARGET_MMA"
 {
-  rtx src;
-  int regoff = INTVAL (operands[2]);
-  src = gen_rtx_UNSPEC (V16QImode,
-   gen_rtvec (2, operands[1], GEN_INT (regoff)),
-   UNSPEC_MMA_EXTRACT);
-  emit_move_insn (operands[0], src);
-  DONE;
-})
-
-(define_insn_and_split "*mma_disassemble_acc"
-  [(set (match_operand:V16QI 0 "mma_disassemble_output_operand" "=mwa")
-   (unspec:V16QI [(match_operand:XO 1 "fpr_reg_operand" "d")
- (match_operand 2 "const_0_to_3_operand")]
- UNSPEC_MMA_EXTRACT))]
-  "TARGET_MMA
-   && fpr_reg_operand (operands[1], XOmode)"
-  "#"
-  "&& reload_completed"
-  [(const_int 0)]
-{
-  int reg = REGNO (operands[1]);
-  int regoff = INTVAL (operands[2]);
-  rtx src = gen_rtx_REG (V16QImode, reg + regoff);
+  int regoff = INTVAL (operands[2]) * GET_MODE_SIZE (V16QImode);
+  rtx src = simplify_gen_subreg (V16QImode, operands[1], XOmode, regoff);
   emit_move_insn (operands[0], src);
   DONE;
 })
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 08198fa9fdf..e37e0a74ebe 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -1975,9 +1975,12 @@ rs6000_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
 static bool
 rs6000_modes_tieable_p (machine_mode mode1, machine_mode mode2)
 {
-  if (mode1 == PTImode || mode1 == OOmode || mode1 == XOmode
-  || mode2 == PTImode || mode2 == OOmode || mode2 == XOmode)
-return mode1 == mode2;
+   if (mode1 == PTImode || mode1 == OOmode || mode1 == XOmode
+   || mode2 == PTImode || mode2 == XOmode)
+ return mode1 == mode2;
+ 
+  if (mode2 == OOmode)
+return ALTIVEC_OR_VSX_VECTOR_MODE (mode1);
 
   if (ALTIVEC_OR_VSX_VECTOR_MODE (mode1))
 return ALTIVEC_OR_VSX_VECTOR_MODE (mode2);


[gcc(refs/users/meissner/heads/work162-vpair)] Add support for vector pair unary and binary operations.

2024-03-12 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:581eca4771f36cbcb9241daeef266dbe36ed27a8

commit 581eca4771f36cbcb9241daeef266dbe36ed27a8
Author: Michael Meissner 
Date:   Tue Mar 12 20:23:48 2024 -0400

Add support for vector pair unary and binary operations.

2024-03-12  Michael Meissner  

gcc/

* config/rs6000/rs6000-builtins.def (__builtin_vpair_*): Add new
built-in functions for vector pair support.
* config/rs6000/rs6000-protos.h (enum vpair_split_unary): New
enumeration.
(vpair_split_unary): New declaration.
(vpair_split_binary): Likewise.
* config/rs6000/rs6000.cc (vpair_split_unary): New function to split
vector pair operations.
(vpair_split_binary): Likewise.
* config/rs6000/rs6000.md (toplevel): Include vector-pair.md.
* config/rs6000/t-rs6000 (MD_INCLUDES): Add vector-pair.md.
* config/rs6000/vector-pair.md: New file.
* doc/extend.texi (PowerPC Vector Pair Built-in Functions): Add
documentation for the new vector pair built-in functions.

gcc/testsuite/

* gcc.target/powerpc/vector-pair-1.c: New test.
* gcc.target/powerpc/vector-pair-2.c: Likewise.
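The split helpers declared in rs6000-protos.h turn one 256-bit vector pair operation into two 128-bit vector operations, and add a NEG when the operation is NABS. The following C model sketches only those semantics, not the GCC implementation. The `v2df`/`vpair_v4df` types and the helper functions are hypothetical, and `nabs_scalar` assumes IEEE 754 binary64 doubles.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the V4DF view of a vector pair: two 128-bit
   halves, each holding two doubles.  */
typedef struct { double e[2]; } v2df;
typedef struct { v2df half[2]; } vpair_v4df;

typedef double (*binop_fn) (double, double);

static double add_fn (double a, double b) { return a + b; }

/* Split a binary vector pair operation: apply the 128-bit operation to
   each half independently.  */
static vpair_v4df split_binary (vpair_v4df a, vpair_v4df b, binop_fn op)
{
  vpair_v4df r;
  for (int h = 0; h < 2; h++)
    for (int i = 0; i < 2; i++)
      r.half[h].e[i] = op (a.half[h].e[i], b.half[h].e[i]);
  return r;
}

/* NABS is NEG (ABS (x)); for IEEE doubles that forces the sign bit on.  */
static double nabs_scalar (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  bits |= UINT64_C (1) << 63;
  memcpy (&x, &bits, sizeof x);
  return x;
}

/* Split a unary NABS: ABS on each half followed by the extra negation
   (what VPAIR_SPLIT_NEGATE requests), here combined in nabs_scalar.  */
static vpair_v4df split_nabs (vpair_v4df a)
{
  vpair_v4df r;
  for (int h = 0; h < 2; h++)
    for (int i = 0; i < 2; i++)
      r.half[h].e[i] = nabs_scalar (a.half[h].e[i]);
  return r;
}
```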

Diff:
---
 gcc/config/rs6000/rs6000-builtins.def|  56 
 gcc/config/rs6000/rs6000-protos.h|  12 ++
 gcc/config/rs6000/rs6000.cc  |  67 ++
 gcc/config/rs6000/rs6000.md  |   1 +
 gcc/config/rs6000/t-rs6000   |   1 +
 gcc/config/rs6000/vector-pair.md | 160 +++
 gcc/doc/extend.texi  |  51 
 gcc/testsuite/gcc.target/powerpc/vector-pair-1.c |  87 
 gcc/testsuite/gcc.target/powerpc/vector-pair-2.c |  86 
 9 files changed, 521 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index 3bc7fed6956..83e7206e989 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -4131,3 +4131,59 @@
 
   void __builtin_vsx_stxvp (v256, unsigned long, const v256 *);
 STXVP nothing {mma,pair}
+
+;; Vector pair built-in functions with float elements
+  v256 __builtin_vpair_f32_abs (v256);
+VPAIR_F32_ABS vpair_abs_v8sf2 {mma}
+
+  v256 __builtin_vpair_f32_add (v256, v256);
+VPAIR_F32_ADD vpair_add_v8sf3 {mma}
+
+  v256 __builtin_vpair_f32_div (v256, v256);
+VPAIR_F32_DIV vpair_div_v8sf3 {mma}
+
+  v256 __builtin_vpair_f32_max (v256, v256);
+VPAIR_F32_MAX vpair_smax_v8sf3 {mma}
+
+  v256 __builtin_vpair_f32_min (v256, v256);
+VPAIR_F32_MIN vpair_smin_v8sf3 {mma}
+
+  v256 __builtin_vpair_f32_mul (v256, v256);
+VPAIR_F32_MUL vpair_mul_v8sf3 {mma}
+
+  v256 __builtin_vpair_f32_nabs (v256);
+VPAIR_F32_NABS vpair_nabs_v8sf2 {mma}
+
+  v256 __builtin_vpair_f32_neg (v256);
+VPAIR_F32_NEG vpair_neg_v8sf2 {mma}
+
+  v256 __builtin_vpair_f32_sub (v256, v256);
+VPAIR_F32_SUB vpair_sub_v8sf3 {mma}
+
+;; Vector pair built-in functions with double elements
+  v256 __builtin_vpair_f64_abs (v256);
+VPAIR_F64_ABS vpair_abs_v4df2 {mma}
+
+  v256 __builtin_vpair_f64_add (v256, v256);
+VPAIR_F64_ADD vpair_add_v4df3 {mma}
+
+  v256 __builtin_vpair_f64_div (v256, v256);
+VPAIR_F64_DIV vpair_div_v4df3 {mma}
+
+  v256 __builtin_vpair_f64_max (v256, v256);
+VPAIR_F64_MAX vpair_smax_v4df3 {mma}
+
+  v256 __builtin_vpair_f64_min (v256, v256);
+VPAIR_F64_MIN vpair_smin_v4df3 {mma}
+
+  v256 __builtin_vpair_f64_mul (v256, v256);
+VPAIR_F64_MUL vpair_mul_v4df3 {mma}
+
+  v256 __builtin_vpair_f64_nabs (v256);
+VPAIR_F64_NABS vpair_nabs_v4df2 {mma}
+
+  v256 __builtin_vpair_f64_neg (v256);
+VPAIR_F64_NEG vpair_neg_v4df2 {mma}
+
+  v256 __builtin_vpair_f64_sub (v256, v256);
+VPAIR_F64_SUB vpair_sub_v4df3 {mma}
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 09a57a806fa..4d6ecc83436 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -162,6 +162,18 @@ extern bool rs6000_pcrel_p (void);
 extern bool rs6000_fndecl_pcrel_p (const_tree);
 extern void rs6000_output_addr_vec_elt (FILE *, int);
 
+/* If we are splitting a vector pair unary operator into two separate vector
+   operations, we need to generate a NEG if this is NABS.  */
+
+enum vpair_split_unary {
+  VPAIR_SPLIT_NORMAL,  /* No extra processing is needed.  */
+  VPAIR_SPLIT_NEGATE   /* Wrap operation with a NEG.  */
+};
+
+extern void vpair_split_unary (rtx [], machine_mode, enum rtx_code,
+  enum vpair_split_unary);
+extern void vpair_split_binary (rtx [], machine_mode, enum rtx_code);
+
 /* Different PowerPC instruction formats that are used by GCC.  There are
various other instruction formats used by the PowerPC hardware, but these
formats are not currently 

[gcc(refs/users/meissner/heads/work162-vpair)] Add support for vector pair fma operations.

2024-03-12 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:e1939c7a8b72c315ba15751d40bb439231499a1e

commit e1939c7a8b72c315ba15751d40bb439231499a1e
Author: Michael Meissner 
Date:   Tue Mar 12 20:29:24 2024 -0400

Add support for vector pair fma operations.

2024-03-12  Michael Meissner  

gcc/

* config/rs6000/rs6000-builtins.def (__builtin_vpair_f32_fma): New
built-in.
(__builtin_vpair_f32_fms): Likewise.
(__builtin_vpair_f32_nfma): Likewise.
(__builtin_vpair_f32_nfms): Likewise.
(__builtin_vpair_f64_fma): Likewise.
(__builtin_vpair_f64_fms): Likewise.
(__builtin_vpair_f64_nfma): Likewise.
(__builtin_vpair_f64_nfms): Likewise.
* config/rs6000/rs6000-protos.h (enum vpair_split_fma): New
enumeration.
(vpair_split_fma): New declaration.
* config/rs6000/rs6000.cc (vpair_split_fma): New function to split
vector pair FMA operations.
* config/rs6000/vector-pair.md (UNSPEC_VPAIR_FMA): New unspec.
(vpair_stdname): Add UNSPEC_VPAIR_FMA.
(VPAIR_OP): Likewise.
(vpair_fma_4): New insns.
(vpair_fms_4): Likewise.
(vpair_nfma_4): Likewise.
(vpair_nfms_4): Likewise.
* doc/extend.texi (PowerPC Vector Pair Built-in Functions): Document new
vector pair fma built-in functions.

gcc/testsuite/

* gcc.target/powerpc/vector-pair-3.c: New test.
* gcc.target/powerpc/vector-pair-4.c: Likewise.
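The four FMA flavours in `enum vpair_split_fma` differ only in where the negations go. The sketch below writes them out per element. It assumes they follow the existing rs6000 nfma/nfms patterns, where nfma is -(a*b + c) and nfms is -(a*b - c). The enum comments suggest this, but the excerpt does not show the splitter body. The names are hypothetical. The sketch also rounds the product separately through a `volatile` temporary, so unlike the hardware instructions (or C's `fma` from `<math.h>`) it is not fused. Only the sign conventions are modelled.

```c
/* Hypothetical mirror of enum vpair_split_fma.  */
enum fma_kind { KIND_FMA, KIND_FMS, KIND_NFMA, KIND_NFMS };

/* Per-element sign conventions of the four variants.  The volatile
   temporary forces the product to be rounded on its own, so this is
   NOT a fused operation.  */
static double fma_variant (enum fma_kind kind, double a, double b, double c)
{
  volatile double prod = a * b;
  switch (kind)
    {
    case KIND_FMA:  return prod + c;      /* a*b + c     */
    case KIND_FMS:  return prod - c;      /* a*b - c     */
    case KIND_NFMA: return -(prod + c);   /* -(a*b + c)  */
    case KIND_NFMS: return -(prod - c);   /* -(a*b - c)  */
    }
  return 0.0;  /* not reached for valid kinds */
}
```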

Diff:
---
 gcc/config/rs6000/rs6000-builtins.def| 24 ++
 gcc/config/rs6000/rs6000-protos.h| 13 
 gcc/config/rs6000/rs6000.cc  | 71 ++
 gcc/config/rs6000/vector-pair.md | 96 
 gcc/doc/extend.texi  | 25 ++
 gcc/testsuite/gcc.target/powerpc/vector-pair-3.c | 57 ++
 gcc/testsuite/gcc.target/powerpc/vector-pair-4.c | 57 ++
 7 files changed, 343 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index 83e7206e989..4362cbb8fc7 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -4142,6 +4142,12 @@
   v256 __builtin_vpair_f32_div (v256, v256);
 VPAIR_F32_DIV vpair_div_v8sf3 {mma}
 
+  v256 __builtin_vpair_f32_fma (v256, v256, v256);
+VPAIR_F32_FMA vpair_fma_v8sf4 {mma}
+
+  v256 __builtin_vpair_f32_fms (v256, v256, v256);
+VPAIR_F32_FMS vpair_fms_v8sf4 {mma}
+
   v256 __builtin_vpair_f32_max (v256, v256);
 VPAIR_F32_MAX vpair_smax_v8sf3 {mma}
 
@@ -4157,6 +4163,12 @@
   v256 __builtin_vpair_f32_neg (v256);
 VPAIR_F32_NEG vpair_neg_v8sf2 {mma}
 
+  v256 __builtin_vpair_f32_nfma (v256, v256, v256);
+VPAIR_F32_NFMA vpair_nfma_v8sf4 {mma}
+
+  v256 __builtin_vpair_f32_nfms (v256, v256, v256);
+VPAIR_F32_NFMS vpair_nfms_v8sf4 {mma}
+
   v256 __builtin_vpair_f32_sub (v256, v256);
 VPAIR_F32_SUB vpair_sub_v8sf3 {mma}
 
@@ -4170,6 +4182,12 @@
   v256 __builtin_vpair_f64_div (v256, v256);
 VPAIR_F64_DIV vpair_div_v4df3 {mma}
 
+  v256 __builtin_vpair_f64_fma (v256, v256, v256);
+VPAIR_F64_FMA vpair_fma_v4df4 {mma}
+
+  v256 __builtin_vpair_f64_fms (v256, v256, v256);
+VPAIR_F64_FMS vpair_fms_v4df4 {mma}
+
   v256 __builtin_vpair_f64_max (v256, v256);
 VPAIR_F64_MAX vpair_smax_v4df3 {mma}
 
@@ -4185,5 +4203,11 @@
   v256 __builtin_vpair_f64_neg (v256);
 VPAIR_F64_NEG vpair_neg_v4df2 {mma}
 
+  v256 __builtin_vpair_f64_nfma (v256, v256, v256);
+VPAIR_F64_NFMA vpair_nfma_v4df4 {mma}
+
+  v256 __builtin_vpair_f64_nfms (v256, v256, v256);
+VPAIR_F64_NFMS vpair_nfms_v4df4 {mma}
+
   v256 __builtin_vpair_f64_sub (v256, v256);
 VPAIR_F64_SUB vpair_sub_v4df3 {mma}
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 4d6ecc83436..aed4081c87b 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -174,6 +174,19 @@ extern void vpair_split_unary (rtx [], machine_mode, enum rtx_code,
   enum vpair_split_unary);
 extern void vpair_split_binary (rtx [], machine_mode, enum rtx_code);
 
+/* When we are splitting a vector pair FMA operation into two vector operations, we
+   may need to modify the code generated.  This enumeration encodes the
+   different choices.  */
+
+enum vpair_split_fma {
+  VPAIR_SPLIT_FMA, /* Fused multiply-add.  */
+  VPAIR_SPLIT_FMS, /* Fused multiply-subtract.  */
+  VPAIR_SPLIT_NFMA,/* Fused negate multiply-add.  */
+  VPAIR_SPLIT_NFMS /* Fused negate multiply-subtract.  */
+};
+
+extern void vpair_split_fma (rtx [], machine_mode, enum vpair_split_fma);
+
 /* Different PowerPC instruction formats that are used by GCC.  There are
various other instruction formats used by the PowerPC hardware, but these
formats are

[gcc(refs/users/meissner/heads/work162-vpair)] Add vector pair init and splat.

2024-03-12 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:0d1d819a6872d7e6098b00925ce343c39efc7dcf

commit 0d1d819a6872d7e6098b00925ce343c39efc7dcf
Author: Michael Meissner 
Date:   Tue Mar 12 20:56:54 2024 -0400

Add vector pair init and splat.

2024-03-12  Michael Meissner  

gcc/

* config/rs6000/rs6000-builtins.def (__builtin_vpair_zero): New
built-in function.
(__builtin_vpair_f32_splat): Likewise.
(__builtin_vpair_f64_splat): Likewise.
* config/rs6000/vector-pair.md (UNSPEC_VPAIR_ZERO): New unspec.
(UNSPEC_VPAIR_SPLAT): Likewise.
(VPAIR_SPLAT_VMODE): New mode iterator.
(VPAIR_SPLAT_ELEMENT_TO_VMODE): New mode attribute.
(vpair_splat_name): Likewise.
(vpair_zero): New insn.
(vpair_splat_): New define_expand.
(vpair_splat__internal): New insns.

gcc/testsuite/

* gcc.target/powerpc/vector-pair-5.c: New test.
* gcc.target/powerpc/vector-pair-6.c: Likewise.
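A splat copies one scalar into every element of the 256-bit vector pair: eight floats for the v8sf form and four doubles for the v4df form. The portable C sketch below shows only the resulting values, using hypothetical types and names. In the patch, a constant zero operand is routed to vpair_zero instead. After reload that becomes two 128-bit zero moves, which produces the same all-(+0.0) value with a different instruction sequence.

```c
/* Hypothetical models of the two element views of a 256-bit vector pair.  */
typedef struct { float e[8]; } vpair_v8sf;
typedef struct { double e[4]; } vpair_v4df;

/* Like __builtin_vpair_f32_splat: copy one float into all 8 elements.  */
static vpair_v8sf splat_v8sf (float x)
{
  vpair_v8sf r;
  for (int i = 0; i < 8; i++)
    r.e[i] = x;
  return r;
}

/* Like __builtin_vpair_f64_splat: copy one double into all 4 elements.  */
static vpair_v4df splat_v4df (double x)
{
  vpair_v4df r;
  for (int i = 0; i < 4; i++)
    r.e[i] = x;
  return r;
}
```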

Diff:
---
 gcc/config/rs6000/rs6000-builtins.def|  10 +++
 gcc/config/rs6000/vector-pair.md | 102 ++-
 gcc/doc/extend.texi  |   9 ++
 gcc/testsuite/gcc.target/powerpc/vector-pair-5.c |  56 +
 gcc/testsuite/gcc.target/powerpc/vector-pair-6.c |  56 +
 5 files changed, 232 insertions(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index 4362cbb8fc7..b757a8630ff 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -4132,6 +4132,10 @@
   void __builtin_vsx_stxvp (v256, unsigned long, const v256 *);
 STXVP nothing {mma,pair}
 
+;; Vector pair built-in functions.
+  v256 __builtin_vpair_zero ();
+VPAIR_ZERO vpair_zero {mma}
+
 ;; Vector pair built-in functions with float elements
   v256 __builtin_vpair_f32_abs (v256);
 VPAIR_F32_ABS vpair_abs_v8sf2 {mma}
@@ -4169,6 +4173,9 @@
   v256 __builtin_vpair_f32_nfms (v256, v256, v256);
 VPAIR_F32_NFMS vpair_nfms_v8sf4 {mma}
 
+  v256 __builtin_vpair_f32_splat (float);
+VPAIR_F32_SPLAT vpair_splat_v8sf {mma}
+
   v256 __builtin_vpair_f32_sub (v256, v256);
 VPAIR_F32_SUB vpair_sub_v8sf3 {mma}
 
@@ -4209,5 +4216,8 @@
   v256 __builtin_vpair_f64_nfms (v256, v256, v256);
 VPAIR_F64_NFMS vpair_nfms_v4df4 {mma}
 
+  v256 __builtin_vpair_f64_splat (double);
+VPAIR_F64_SPLAT vpair_splat_v4df {mma}
+
   v256 __builtin_vpair_f64_sub (v256, v256);
 VPAIR_F64_SUB vpair_sub_v4df3 {mma}
diff --git a/gcc/config/rs6000/vector-pair.md b/gcc/config/rs6000/vector-pair.md
index 73ae46e6d40..39b419c6814 100644
--- a/gcc/config/rs6000/vector-pair.md
+++ b/gcc/config/rs6000/vector-pair.md
@@ -38,7 +38,9 @@
UNSPEC_VPAIR_NEG
UNSPEC_VPAIR_PLUS
UNSPEC_VPAIR_SMAX
-   UNSPEC_VPAIR_SMIN])
+   UNSPEC_VPAIR_SMIN
+   UNSPEC_VPAIR_ZERO
+   UNSPEC_VPAIR_SPLAT])
 
 ;; Vector pair element ID that defines the scaler element within the vector pair.
 (define_c_enum "vpair_element"
@@ -98,6 +100,104 @@
 ;; Map the scalar element ID into the appropriate insn type for divide.
 (define_int_attr vpair_divtype [(VPAIR_ELEMENT_FLOAT  "vecfdiv")
(VPAIR_ELEMENT_DOUBLE "vecdiv")])
+
+;; Mode iterator for the vector modes that we provide splat operations for.
+(define_mode_iterator VPAIR_SPLAT_VMODE [V4SF V2DF])
+
+;; Map element mode to 128-bit vector mode for splat operations
+(define_mode_attr VPAIR_SPLAT_ELEMENT_TO_VMODE [(SF "V4SF")
+   (DF "V2DF")])
+
+;; Map either element mode or vector mode into the name for the splat insn.
+(define_mode_attr vpair_splat_name [(SF   "v8sf")
+   (DF   "v4df")
+   (V4SF "v8sf")
+   (V2DF "v4df")])
+
+;; Initialize a vector pair to 0
+(define_insn_and_split "vpair_zero"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa")
+   (unspec:OO [(const_int 0)] UNSPEC_VPAIR_ZERO))]
+  "TARGET_MMA"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 1) (match_dup 3))
+   (set (match_dup 2) (match_dup 3))]
+{
+  rtx op0 = operands[0];
+
+  operands[1] = simplify_gen_subreg (V2DFmode, op0, OOmode, 0);
+  operands[2] = simplify_gen_subreg (V2DFmode, op0, OOmode, 16);
+  operands[3] = CONST0_RTX (V2DFmode);
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "vecperm")])
+
+;; Create a vector pair with a value splat'ed (duplicated) to all of the
+;; elements.
+(define_expand "vpair_splat_"
+  [(use (match_operand:OO 0 "vsx_register_operand"))
+   (use (match_operand:SFDF 1 "input_operand"))]
+  "TARGET_MMA"
+{
+  rtx op0 = operands[0];
+  rtx op1 = operands[1];
+  machine_mode element_mode = mode;
+
+  if (op1 == CONST0_RTX (element_mode))
+{
+  emit_insn (gen_vpair_zero (op0));
+  DONE;
+}
+
+  machine_mode vector_mode = mode;

[gcc(refs/users/meissner/heads/work162-vpair)] Add vector pair optimizations.

2024-03-12 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:9627a2f2476f7a5eb84de3ef83a9d373e678d619

commit 9627a2f2476f7a5eb84de3ef83a9d373e678d619
Author: Michael Meissner 
Date:   Tue Mar 12 21:03:17 2024 -0400

Add vector pair optimizations.

2024-03-12  Michael Meissner  

gcc/

* config/rs6000/vector-pair.md (vpair_add_neg_3): New
combiner insn to convert vector plus/neg into a minus operation.
(vpair_fma__merge): Optimize multiply, add/subtract, and
negation into fma operations when FP contraction is allowed
(-ffp-contract=fast).
(vpair_fma__merge): Likewise.
(vpair_fma__merge2): Likewise.
(vpair_nfma__merge): Likewise.
(vpair_nfms__merge): Likewise.
(vpair_nfms__merge2): Likewise.

gcc/testsuite/

* gcc.target/powerpc/vector-pair-7.c: New test.
* gcc.target/powerpc/vector-pair-8.c: Likewise.
* gcc.target/powerpc/vector-pair-9.c: Likewise.
* gcc.target/powerpc/vector-pair-10.c: Likewise.
* gcc.target/powerpc/vector-pair-11.c: Likewise.
* gcc.target/powerpc/vector-pair-12.c: Likewise.
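The fma merge patterns in the diff only match when `flag_fp_contract_mode == FP_CONTRACT_FAST`. Fusing a multiply with an add or subtract changes results, because the fused form skips the rounding of the product. The portable C sketch below shows a case where the two differ. It computes the exact (fused) answer with integer arithmetic rather than relying on a hardware FMA. The function names are hypothetical.

```c
#include <stdint.h>

/* Multiply, round the product to double (the volatile temporary stops
   the compiler from contracting this into an FMA), then subtract.  */
static double unfused_mul_sub (double a, double b, double c)
{
  volatile double prod = a * b;
  return prod - c;
}

/* For a = 1 + 2^-30 and b = 1 - 2^-30 the exact product is 1 - 2^-60,
   so the exact value of a*b - 1 is -2^-60.  In integers:
   (2^30 + 1) * (2^30 - 1) - 2^60 == -1, in units of 2^-60.  A fused
   multiply-subtract rounds only once and returns exactly this value.  */
static double exact_mul_sub_example (void)
{
  int64_t units = ((INT64_C (1) << 30) + 1) * ((INT64_C (1) << 30) - 1)
                  - (INT64_C (1) << 60);
  return (double) units * 0x1p-60;
}
```

The unfused version rounds 1 - 2^-60 up to 1.0 and returns 0.0, while the fused result is -2^-60. That difference is why the merge is only done when the user has opted into contraction.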

Diff:
---
 gcc/config/rs6000/vector-pair.md  | 224 ++
 gcc/testsuite/gcc.target/powerpc/vector-pair-10.c |  61 ++
 gcc/testsuite/gcc.target/powerpc/vector-pair-11.c |  65 +++
 gcc/testsuite/gcc.target/powerpc/vector-pair-12.c |  65 +++
 gcc/testsuite/gcc.target/powerpc/vector-pair-7.c  |  18 ++
 gcc/testsuite/gcc.target/powerpc/vector-pair-8.c  |  18 ++
 gcc/testsuite/gcc.target/powerpc/vector-pair-9.c  |  61 ++
 7 files changed, 512 insertions(+)

diff --git a/gcc/config/rs6000/vector-pair.md b/gcc/config/rs6000/vector-pair.md
index 39b419c6814..7a81acbdc05 100644
--- a/gcc/config/rs6000/vector-pair.md
+++ b/gcc/config/rs6000/vector-pair.md
@@ -261,6 +261,31 @@
(set (attr "type") (if_then_else (match_test " == DIV")
(const_string "")
(const_string "")))])
+
+;; Optimize vector pair add of a negative value into a subtract.
+(define_insn_and_split "*vpair_add_neg_3"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa")
+   (unspec:OO
+[(match_operand:OO 1 "vsx_register_operand" "wa")
+ (unspec:OO
+  [(match_operand:OO 2 "vsx_register_operand" "wa")
+   (const_int VPAIR_FP_ELEMENT)]
+  UNSPEC_VPAIR_NEG)
+ (const_int VPAIR_FP_ELEMENT)]
+VPAIR_FP_BINARY))]
+  "TARGET_MMA"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+   (unspec:OO
+[(match_dup 1)
+ (match_dup 2)
+ (const_int VPAIR_FP_ELEMENT)]
+UNSPEC_VPAIR_MINUS))]
+{
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "")])
 
 ;; Vector pair fused-multiply (FMA) operations.  The last argument in the
 ;; UNSPEC is a CONST_INT which identifies what the scalar element is.
@@ -354,3 +379,202 @@
 }
   [(set_attr "length" "8")
(set_attr "type" "")])
+
+;; Optimize vector pair multiply and vector pair add into vector pair fma,
+;; providing the compiler would do this optimization for scalar and vectors.
+;; Unlike most of the define_insn_and_splits, this can be done before register
+;; allocation.
+(define_insn_and_split "*vpair_fma__merge"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa,wa")
+   (unspec:OO
+[(unspec:OO
+  [(match_operand:OO 1 "vsx_register_operand" "%wa,wa")
+   (match_operand:OO 2 "vsx_register_operand" "wa,0")
+   (const_int VPAIR_FP_ELEMENT)]
+  UNSPEC_VPAIR_MULT)
+ (match_operand:OO 3 "vsx_register_operand" "0,wa")
+ (const_int VPAIR_FP_ELEMENT)]
+UNSPEC_VPAIR_PLUS))]
+  "TARGET_MMA && flag_fp_contract_mode == FP_CONTRACT_FAST"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+   (unspec:OO
+[(match_dup 1)
+ (match_dup 2)
+ (match_dup 3)
+ (const_int VPAIR_FP_ELEMENT)]
+UNSPEC_VPAIR_FMA))]
+{
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "")])
+
+;; Merge multiply and subtract.
+(define_insn_and_split "*vpair_fma__merge"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa,wa")
+   (unspec:OO
+[(unspec:OO
+  [(match_operand:OO 1 "vsx_register_operand" "%wa,wa")
+   (match_operand:OO 2 "vsx_register_operand" "wa,0")
+   (const_int VPAIR_FP_ELEMENT)]
+  UNSPEC_VPAIR_MULT)
+ (match_operand:OO 3 "vsx_register_operand" "0,wa")
+ (const_int VPAIR_FP_ELEMENT)]
+UNSPEC_VPAIR_MINUS))]
+  "TARGET_MMA && flag_fp_contract_mode == FP_CONTRACT_FAST"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+   (unspec:OO
+[(match_dup 1)
+ (match_dup 2)
+ (unspec:OO
+  [(match_dup 3)
+   (const_int VPAIR_FP_ELEMENT)]
+  UNSPEC_VPAIR_NEG)
+ (const_int VPAIR_FP_ELEMENT)]
+UNSPEC_VPAIR_FMA))]
+{
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "
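The *vpair_add_neg_3 pattern turns "a + (-b)" on a vector pair into a single subtract. Its condition is only TARGET_MMA, with no fp-contract requirement, unlike the fma merge patterns in the same diff. That is safe because, for non-NaN operands, IEEE 754 gives a + (-b) and a - b the same rounded result, signed zeros included. Below is a minimal C sketch that checks this lane by lane. It assumes, for illustration only, that a vector pair of doubles can be modeled as a struct of four doubles (two 128-bit VSX registers with two doubles each). The vpair_* helpers are hypothetical stand-ins for the unspecs, not GCC internals or built-in functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of a vector pair of doubles: two 128-bit VSX
   registers holding two doubles each, i.e. four lanes.  */
typedef struct {
  double lane[4];
} vpair_f64;

/* Lane-wise negation (stand-in for UNSPEC_VPAIR_NEG).  */
static vpair_f64 vpair_neg(vpair_f64 a) {
  vpair_f64 r;
  for (int i = 0; i < 4; i++)
    r.lane[i] = -a.lane[i];
  return r;
}

/* Lane-wise addition (stand-in for UNSPEC_VPAIR_PLUS).  */
static vpair_f64 vpair_add(vpair_f64 a, vpair_f64 b) {
  vpair_f64 r;
  for (int i = 0; i < 4; i++)
    r.lane[i] = a.lane[i] + b.lane[i];
  return r;
}

/* Lane-wise subtraction (stand-in for UNSPEC_VPAIR_MINUS).  */
static vpair_f64 vpair_sub(vpair_f64 a, vpair_f64 b) {
  vpair_f64 r;
  for (int i = 0; i < 4; i++)
    r.lane[i] = a.lane[i] - b.lane[i];
  return r;
}

/* Bitwise comparison, so that +0.0 and -0.0 are told apart.  */
static bool vpair_same_bits(vpair_f64 a, vpair_f64 b) {
  assert(sizeof a.lane == 32); /* a vector pair is 256 bits */
  return memcmp(a.lane, b.lane, sizeof a.lane) == 0;
}
```

For example, with a = {1.5, -0.0, 1e308, 3.0} and b = {0.25, 0.0, -1e308, 3.0}, vpair_add(a, vpair_neg(b)) and vpair_sub(a, b) match bit for bit, including the -0.0 lane and the lane that overflows to infinity.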

[gcc(refs/users/meissner/heads/work162-vpair)] Update ChangeLog.*

2024-03-12 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:66de2c74aebd4e587a9aa4e20eb0b71dfa7450e2

commit 66de2c74aebd4e587a9aa4e20eb0b71dfa7450e2
Author: Michael Meissner 
Date:   Tue Mar 12 21:09:53 2024 -0400

Update ChangeLog.*

Diff:
---
 gcc/ChangeLog.vpair | 211 
 1 file changed, 211 insertions(+)

diff --git a/gcc/ChangeLog.vpair b/gcc/ChangeLog.vpair
index faeb03cac7b..184e4f8bccc 100644
--- a/gcc/ChangeLog.vpair
+++ b/gcc/ChangeLog.vpair
@@ -1,5 +1,216 @@
+ Branch work162-vpair, patch #205 
+
+Add vector pair optimizations.
+
+2024-03-12  Michael Meissner  
+
+gcc/
+
+   * config/rs6000/vector-pair.md (vpair_add_neg_3): New
+   combiner insn to convert vector plus/neg into a minus operation.
+   (vpair_fma__merge): Optimize multiply, add/subtract, and
+   negation into fma operations if the user specifies to create fmas.
+   (vpair_fma__merge): Likewise.
+   (vpair_fma__merge2): Likewise.
+   (vpair_nfma__merge): Likewise.
+   (vpair_nfms__merge): Likewise.
+   (vpair_nfms__merge2): Likewise.
+
+gcc/testsuite/
+
+   * gcc.target/powerpc/vector-pair-7.c: New test.
+   * gcc.target/powerpc/vector-pair-8.c: Likewise.
+   * gcc.target/powerpc/vector-pair-9.c: Likewise.
+   * gcc.target/powerpc/vector-pair-10.c: Likewise.
+   * gcc.target/powerpc/vector-pair-11.c: Likewise.
+   * gcc.target/powerpc/vector-pair-12xs.c: Likewise.
+
+ Branch work162-vpair, patch #204 
+
+Add vector pair init and splat.
+
+2024-03-12  Michael Meissner  
+
+gcc/
+
+   * config/rs6000/rs6000-builtins.def (__builtin_vpair_zero): New
+   built-in function.
+   (__builtin_vpair_f32_splat): Likewise.
+   (__builtin_vpair_f64_splat): Likewise.
+   * config/rs6000/vector-pair.md (UNSPEC_VPAIR_ZERO): New unspec.
+   (UNSPEC_VPAIR_SPLAT): Likewise.
+   (VPAIR_SPLAT_VMODE): New mode iterator.
+   (VPAIR_SPLAT_ELEMENT_TO_VMODE): New mode attribute.
+   (vpair_splat_name): Likewise.
+   (vpair_zero): New insn.
+   (vpair_splat_): New define_expand.
+   (vpair_splat__internal): New insns.
+
+gcc/testsuite/
+
+   * gcc.target/powerpc/vector-pair-5.c: New test.
+   * gcc.target/powerpc/vector-pair-6.c: Likewise.
+
+ Branch work162-vpair, patch #203 
+
+Add support for vector pair fma operations.
+
+2024-03-12  Michael Meissner  
+
+gcc/
+
+   * config/rs6000/rs6000-builtins.def (__builtin_vpair_f32_fma): New
+   built-in.
+   (__builtin_vpair_f32_fms): Likewise.
+   (__builtin_vpair_f32_nfma): Likewise.
+   (__builtin_vpair_f32_nfms): Likewise.
+   (__builtin_vpair_f64_fma): Likewise.
+   (__builtin_vpair_f64_fms): Likewise.
+   (__builtin_vpair_f64_nfma): Likewise.
+   * config/rs6000/rs6000/rs6000-proto.h (enum vpair_split_fma): New
+   enumeration.
+   (vpair_split_fma): New declaration.
+   * config/rs6000/rs6000.cc (vpair_split_fma): New function to split
+   vector pair FMA operations.
+   * config/rs6000/vector-pair.md (UNSPEC_VPAIR_FMA): New unspec.
+   (vpair_stdname): Add UNSPEC_VPAIR_FMA.
+   (VPAIR_OP): Likewise.
+   (vpair_fma_4): New insns.
+   (vpair_fms_4): Likewise.
+   (vpair_nfma_4): Likewise.
+   (vpair_nfms_4): Likewise.
+   * doc/extend.texi (PowerPC Vector Pair Built-in Functions): Document new
+   vector pair fma built-in functions.
+
+gcc/testsuite/
+
+   * gcc.target/powerpc/vector-pair-3.c: New test.
+   * gcc.target/powerpc/vector-pair-4.c: Likewise.
+
+ Branch work162-vpair, patch #202 
+
+Add support for vector pair unary and binary operations.
+
+2024-03-12  Michael Meissner  
+
+gcc/
+
+   * config/rs6000/rs6000-builtins.def (__builtin_vpair_*): Add new
+   built-in functions for vector pair support.
+   * config/rs6000/rs6000-protos.h (enum vpair_split_unary): New
+   enumeration.
+   (vpair_split_unary): New declaration.
+   (vpair_split_binary): Likewise.
+   * config/rs6000/rs6000.cc (vpair_split_unary): New function to split
+   vector pair operations.
+   (vpair_split_binary): Likewise.
+   * config/rs6000/rs6000.md (toplevel): Include vector-pair.md.
+   * config/rs6000/t-rs6000 (MD_INCLUDES): Add vector-pair.md.
+   * config/rs6000/vector-pair.md: New file.
+   * doc/extend.texi (PowerPC Vector Pair Built-in Functions): Add
+   documentation for the new vector pair built-in functions.
+
+gcc/testsuite/
+
+   * gcc.target/powerpc/vector-pair-1.c: New test.
+   * gcc.target/powerpc/vector-pair-2.c: Likewise.
+
+ Branch work162-vpair, patch #201 
+
+Peter's patches for subreg support.
+
+2024-03-12  Peter Bergner  
+
+gcc/
+
+   PR target/109116
+   * gcc/config/rs6000/rs6000.cc (rs6000_modes_tieable_p): M

[gcc(refs/users/meissner/heads/work162-dmf)] Use vector pair load/store for memcpy with -mcpu=future

2024-03-12 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:86949afcea130e0b6cb621f55385ca0f90a56a1f

commit 86949afcea130e0b6cb621f55385ca0f90a56a1f
Author: Michael Meissner 
Date:   Wed Mar 13 01:33:25 2024 -0400

Use vector pair load/store for memcpy with -mcpu=future

During the development of power10 support, GCC did not enable the load vector
pair and store vector pair instructions when optimizing operations such as
memory copies.  This patch enables those instructions when -mcpu=future is
used.

2024-03-12  Michael Meissner  

gcc/

* config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS_SERVER): Enable using
load vector pair and store vector pair instructions for memory copy
operations.
(POWERPC_MASKS): Make the bit for enabling using load vector pair and
store vector pair operations set and reset when the PowerPC processor is
changed.

Diff:
---
 gcc/config/rs6000/rs6000-cpus.def | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
index 47365534af8..4ddba142e44 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -90,6 +90,7 @@
  | OPTION_MASK_POWER11)
 
 #define ISA_FUTURE_MASKS_SERVER (ISA_POWER11_MASKS_SERVER \
+| OPTION_MASK_BLOCK_OPS_VECTOR_PAIR\
 | OPTION_MASK_FUTURE)
 
 /* Flags that need to be turned off if -mno-vsx.  */
@@ -121,6 +122,7 @@
 
 /* Mask of all options to set the default isa flags based on -mcpu=.  */
 #define POWERPC_MASKS  (OPTION_MASK_ALTIVEC\
+| OPTION_MASK_BLOCK_OPS_VECTOR_PAIR\
 | OPTION_MASK_CMPB \
 | OPTION_MASK_CRYPTO   \
 | OPTION_MASK_DFP  \
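Adding OPTION_MASK_BLOCK_OPS_VECTOR_PAIR to POWERPC_MASKS is what lets the bit be set and cleared when the processor changes. A simplified model of that behavior follows, assuming invented bit values (the SK_* names are hypothetical) and ignoring the other cases GCC's real option handling covers: bits in the processor-controlled mask are recomputed from the selected CPU's defaults, except for bits the user set explicitly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values; the real OPTION_MASK_* values differ.  */
#define SK_MASK_ALTIVEC               UINT32_C(0x1)
#define SK_MASK_BLOCK_OPS_VECTOR_PAIR UINT32_C(0x2)
#define SK_MASK_MMA                   UINT32_C(0x4)
#define SK_MASK_FUTURE                UINT32_C(0x8)

/* Bits that choosing a processor may set or clear (cf. POWERPC_MASKS).  */
#define SK_POWERPC_MASKS                           \
  (SK_MASK_ALTIVEC | SK_MASK_BLOCK_OPS_VECTOR_PAIR \
   | SK_MASK_MMA | SK_MASK_FUTURE)

/* Recompute the processor-controlled bits of FLAGS from CPU_ENABLE,
   leaving alone any bit in EXPLICIT_FLAGS (set by the user).  */
static uint32_t apply_cpu_masks(uint32_t flags, uint32_t cpu_enable,
                                uint32_t explicit_flags) {
  /* A CPU default should only mention processor-controlled bits.  */
  assert((cpu_enable & ~SK_POWERPC_MASKS) == 0);
  uint32_t settable = SK_POWERPC_MASKS & ~explicit_flags;
  return (flags & ~settable) | (cpu_enable & settable);
}
```

In this model, switching from a future CPU back to power10 clears the vector pair bit again. If the bit were missing from SK_POWERPC_MASKS, it would keep whatever value the previous CPU left behind.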


[gcc(refs/users/meissner/heads/work162-dmf)] Add wD constraint.

2024-03-12 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:5bd41ca9c05b8483af758e5010f9d000182d3e88

commit 5bd41ca9c05b8483af758e5010f9d000182d3e88
Author: Michael Meissner 
Date:   Wed Mar 13 02:21:06 2024 -0400

Add wD constraint.

This patch adds a new constraint ('wD') that matches the accumulator registers
that overlap with VSX registers 0..31 on power10.  Future patches will add
support for a separate accumulator register class that will be used when
support for dense math registers is added.
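As a rough illustration of what wD is meant to accept, here is a hypothetical C predicate. The register-kind enum and the numbering are assumptions for this sketch, not GCC's hard register numbers. On power10 an accumulator overlaps four consecutive VSX registers inside 0..31, so a usable register must start on a multiple of 4. In GCC that alignment is presumably enforced by the mode checks rather than by the constraint itself. Once dense math registers exist, wD is intended to select only the 8 separate DMRs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical register kinds for this sketch only.  */
typedef enum { SK_REG_VSX, SK_REG_DMR } sk_reg_kind;

/* Could an accumulator operand with constraint "wD" live in register
   REGNO of kind KIND?  Without dense math, an accumulator is four
   consecutive VSX registers starting at a multiple of 4 within 0..31.
   With dense math, only the 8 separate DMRs qualify.  */
static bool wd_allows(sk_reg_kind kind, int regno, bool dense_math) {
  assert(regno >= 0);
  if (dense_math)
    return kind == SK_REG_DMR && regno < 8;
  return kind == SK_REG_VSX && regno < 32 && regno % 4 == 0;
}
```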

2024-03-13   Michael Meissner  

* config/rs6000/constraints.md (wD): New constraint.
* config/rs6000/mma.md (mma_disassemble_acc): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d")
-   (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0")]
+  [(set (match_operand:XO 0 "accumulator_operand" "=&wD")
+   (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0")]
MMA_ACC))]
   "TARGET_MMA"
   " %A0"
@@ -515,7 +513,7 @@
 ;; UNSPEC_VOLATILE.
 
 (define_insn "mma_xxsetaccz"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=d")
+  [(set (match_operand:XO 0 "accumulator_operand" "=wD")
(unspec_volatile:XO [(const_int 0)]
UNSPECV_MMA_XXSETACCZ))]
   "TARGET_MMA"
@@ -523,7 +521,7 @@
   [(set_attr "type" "mma")])
 
 (define_insn "mma_"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
+  [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD")
(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa")
(match_operand:V16QI 2 "vsx_register_operand" "v,?wa")]
MMA_VV))]
@@ -532,8 +530,8 @@
   [(set_attr "type" "mma")])
 
 (define_insn "mma_"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
-   (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0")
+  [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD")
+   (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0")
(match_operand:V16QI 2 "vsx_register_operand" "v,?wa")
(match_operand:V16QI 3 "vsx_register_operand" "v,?wa")]
MMA_AVV))]
@@ -542,7 +540,7 @@
   [(set_attr "type" "mma")])
 
 (define_insn "mma_"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
+  [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD")
(unspec:XO [(match_operand:OO 1 "vsx_register_operand" "v,?wa")
(match_operand:V16QI 2 "vsx_register_operand" "v,?wa")]
MMA_PV))]
@@ -551,8 +549,8 @@
   [(set_attr "type" "mma")])
 
 (define_insn "mma_"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
-   (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0")
+  [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD")
+   (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0")
(match_operand:OO 2 "vsx_register_operand" "v,?wa")
(match_operand:V16QI 3 "vsx_register_operand" "v,?wa")]
MMA_APV))]
@@ -561,7 +559,7 @@
   [(set_attr "type" "mma")])
 
 (define_insn "mma_"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
+  [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD")
(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa")
(match_operand:V16QI 2 "vsx_register_operand" "v,?wa")
(match_operand:SI 3 "const_0_to_15_operand" "n,n")
@@ -574,8 +572,8 @@
(set_attr "prefixed" "yes")])
 
 (define_insn "mma_"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
-   (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0")
+  [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD")
+   (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0")
(match_operand:V16QI 2 "vsx_register_operand" "v,?wa")
(match_operand:V16QI 3 "vsx_register_operand" "v,?wa")
(match_operand:SI 4 "const_0_to_15_operand" "n,n")
@@ -588,7 +586,7 @@
(set_attr "prefixed" "yes")])
 
 (define_insn "mma_"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
+  [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD")
(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa")
(match_operand:V16QI 2 "vsx_register_operand" "v,?wa")
(match_operand:SI 3 "const_0_to_15_operand" "n,n")
@@ -601,8 +599,8 @@
(set_attr "prefixed" "yes")])
 
 (define_insn "mma_"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
-   (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0"

[gcc(refs/users/meissner/heads/work162-dmf)] Add support for dense math registers.

2024-03-12 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:6d9972ad5488900014564a0b5f3447a7c1fed0ca

commit 6d9972ad5488900014564a0b5f3447a7c1fed0ca
Author: Michael Meissner 
Date:   Wed Mar 13 02:26:35 2024 -0400

Add support for dense math registers.

The MMA subsystem added the notion of accumulator registers as an optional
feature of ISA 3.1 (power10).  In ISA 3.1, these accumulators overlapped with
the VSX registers 0..31, but logically the accumulator registers were separate
from the FPR registers.  In ISA 3.1, it was anticipated that in future systems,
the accumulator registers might not overlap with the FPR registers.  This patch
adds support for dense math registers as separate registers.

This particular patch does not change the MMA support to use the accumulators
within the dense math registers.  This patch just adds the basic support for
having separate DMRs.  The next patch will switch the MMA support to use the
accumulators if -mcpu=future is used.

For testing purposes, I added an undocumented option '-mdense-math' to enable
or disable the dense math support.

This patch adds a new constraint (wD).  If MMA is selected but dense math is
not selected (i.e. -mcpu=power10), the wD constraint will allow access to
accumulators that overlap with VSX registers 0..31.  If both MMA and dense math
are selected (i.e. -mcpu=future), the wD constraint will only allow dense math
registers.

This patch modifies the existing %A output modifier.  If MMA is selected but
dense math is not selected, then %A output modifier converts the VSX register
number to the accumulator number, by dividing it by 4.  If both MMA and dense
math are selected, then %A will map the separate DMR registers into 0..7.
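The %A numbering can be summarized in a few lines of C. This sketch assumes a hypothetical SKETCH_FIRST_DMR_REGNO for where the DMRs start in the hard register numbering; the real value is internal to GCC. Without dense math, accumulator N overlaps VSX registers 4N..4N+3, so %A prints the VSX number divided by 4. With dense math, the DMRs are printed as 0..7.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical first hard register number of the DMRs.  */
#define SKETCH_FIRST_DMR_REGNO 200

/* Number printed by %A for hard register REGNO.  */
static int percent_a_number(int regno, bool dense_math) {
  if (dense_math) {
    /* Separate DMRs are printed as 0..7.  */
    assert(regno >= SKETCH_FIRST_DMR_REGNO
           && regno < SKETCH_FIRST_DMR_REGNO + 8);
    return regno - SKETCH_FIRST_DMR_REGNO;
  }
  /* Accumulator N overlaps VSX registers 4N..4N+3.  */
  assert(regno >= 0 && regno < 32 && regno % 4 == 0);
  return regno / 4;
}
```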

The intention is that user code using extended asm can be modified to run on
both MMA without dense math and MMA with dense math:

1)  If possible, don't use extended asm; use the MMA built-in functions
    instead;

2)  If you do need to write extended asm, change any d constraints that
    target accumulators to wD;

3)  Only use the built-in zero, assemble and disassemble functions to create
    or move data between vector quad types and dense math accumulators.
    I.e. do not use the xxmfacc, xxmtacc, and xxsetaccz instructions directly
    in the extended asm code.  The reason is that these instructions assume
    there is a 1-to-1 correspondence between 4 adjacent FPR registers and the
    accumulator that overlaps with those registers.  With accumulators now
    being separate registers, there is no longer a 1-to-1 correspondence.

It is possible that the mangling for DMRs and the GDB register numbers will
change in the future.

2024-03-13   Michael Meissner  

* config/rs6000/mma.md (movxo): Add comments about dense math registers.
(movxo_nodm): Rename from movxo and restrict the usage to machines
without dense math registers.
(movxo_dm): New insn for movxo support for machines with dense math
registers.
(mma_): Restrict usage to machines without dense math registers.
(mma_xxsetaccz): Make a define_expand, and add support for dense math
registers.
(mma_xxsetaccz_nodm): Rename from mma_xxsetaccz, and restrict to
machines without dense math registers.
(mma_dmsetaccz): New insn.
* config/rs6000/predicates.md (dmr_operand): New predicate.
(accumulator_operand): Add support for dense math registers.
* config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_mma_builtin): Do
not de-prime accumulator when disassembling a vector quad.
* config/rs6000/rs6000.cc (enum rs6000_reg_type): Add DMR_REG_TYPE.
(enum rs6000_reload_reg_type): Add RELOAD_REG_DMR.
(LAST_RELOAD_REG_CLASS): Add support for DMR registers and the wD
constraint.
(reload_reg_map): Likewise.
(rs6000_reg_names): Likewise.
(alt_reg_names): Likewise.
(rs6000_hard_regno_nregs_internal): Likewise.
(rs6000_hard_regno_mode_ok_uncached): Likewise.
(rs6000_debug_reg_global): Likewise.
(rs6000_setup_reg_addr_masks): Likewise.
(rs6000_init_hard_regno_mode_ok): Likewise.
(rs6000_secondary_reload_memory): Add support for DMR registers.
(rs6000_secondary_reload_simple_move): Likewise.
(rs6000_preferred_reload_class): Likewise.
(rs6000_secondary_reload_class): Likewise.
(print_operand): Make %A handle both FPRs and DMRs.
(rs6000_dmr_register_move_cost): New helper function.
(rs6000_register_move_cost): Add support for DMR registers.

[gcc(refs/users/meissner/heads/work162-dmf)] PowerPC: Switch to dense math names for all MMA operations.

2024-03-12 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:26e7b15b3a259f753b9862ca7a999ce3e70a8c3d

commit 26e7b15b3a259f753b9862ca7a999ce3e70a8c3d
Author: Michael Meissner 
Date:   Wed Mar 13 02:28:07 2024 -0400

PowerPC: Switch to dense math names for all MMA operations.

This patch changes the assembler instruction names for MMA instructions from
the original name used in power10 to the new name when used with the dense math
system.  I.e. xvf64gerpp becomes dmxvf64gerpp.  The assembler will emit the
same bits for either spelling.

For the non-prefixed MMA instructions, we add a 'dm' prefix in front of the
instruction.  However, the prefixed instructions have a 'pm' prefix, and we add
the 'dm' prefix afterwards.  To prevent having two sets of parallel int
attributes, we remove the "pm" prefix from the instruction string in the
attributes, and add it later, both in the insn name and in the output template.
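The naming scheme reduces to string concatenation: an optional "pm" for the prefixed (masked) forms, then an optional "dm" for dense math, then the shared base name now stored in the int attributes. A minimal sketch follows, with mma_mnemonic as a hypothetical helper; the md file itself does this with insn names and output templates, not with a C function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Build an MMA mnemonic from a prefix-free base name such as "xvi4ger8":
   "pm" first for the prefixed (masked) forms, then "dm" for dense math.
   Returns the length that snprintf reports.  */
static int mma_mnemonic(char *buf, size_t size, const char *base,
                        bool prefixed, bool dense_math) {
  /* The attribute strings no longer carry the "pm" prefix themselves.  */
  assert(buf != NULL && base != NULL && strncmp(base, "pm", 2) != 0);
  return snprintf(buf, size, "%s%s%s", prefixed ? "pm" : "",
                  dense_math ? "dm" : "", base);
}
```

For example, "xvf64gerpp" becomes "dmxvf64gerpp" with dense math, and "xvi4ger8" becomes "pmdmxvi4ger8" (dense math) or "pmxvi4ger8" (original MMA) in its prefixed form.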

2024-03-13   Michael Meissner  

gcc/

* config/rs6000/mma.md (vvi4i4i8): Change the instruction to not have a
"pm" prefix.
(avvi4i4i8): Likewise.
(vvi4i4i2): Likewise.
(avvi4i4i2): Likewise.
(vvi4i4): Likewise.
(avvi4i4): Likewise.
(pvi4i2): Likewise.
(apvi4i2): Likewise.
(vvi4i4i4): Likewise.
(avvi4i4i4): Likewise.
(mma_xxsetaccz): Add support for running on DMF systems, generating the
dense math instruction and using the dense math accumulators.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_): Likewise.
(mma_pm): Add support for running on DMF systems, generating
the dense math instruction and using the dense math accumulators.
Rename the insn with a 'pm' prefix and add either 'pm' or 'pmdm'
prefixes based on whether we have the original MMA specification or if
we have dense math support.
(mma_pm): Likewise.
(mma_pm): Likewise.
(mma_pm): Likewise.
(mma_pm): Likewise.
(mma_pm): Likewise.
(mma_pm): Likewise.
(mma_pm): Likewise.

Diff:
---
 gcc/config/rs6000/mma.md | 161 +++
 1 file changed, 107 insertions(+), 54 deletions(-)

diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index 2ce613b46cc..f3870eac51a 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -224,44 +224,47 @@
 (UNSPEC_MMA_XVF64GERNP "xvf64gernp")
 (UNSPEC_MMA_XVF64GERNN "xvf64gernn")])
 
-(define_int_attr vvi4i4i8  [(UNSPEC_MMA_PMXVI4GER8 "pmxvi4ger8")])
+;; The "pm" prefix is not in these expansions, so that we can generate
+;; pmdmxvi4ger8 on systems with dense math registers and xvi4ger8 on systems
+;; without dense math registers.
+(define_int_attr vvi4i4i8  [(UNSPEC_MMA_PMXVI4GER8 "xvi4ger8")])
 
-(define_int_attr avvi4i4i8 [(UNSPEC_MMA_PMXVI4GER8PP   "pmxvi4ger8pp")])
+(define_int_attr avvi4i4i8 [(UNSPEC_MMA_PMXVI4GER8PP   "xvi4ger8pp")])
 
-(define_int_attr vvi4i4i2  [(UNSPEC_MMA_PMXVI16GER2"pmxvi16ger2")
-(UNSPEC_MMA_PMXVI16GER2S   "pmxvi16ger2s")
-(UNSPEC_MMA_PMXVF16GER2"pmxvf16ger2")
-(UNSPEC_MMA_PMXVBF16GER2   "pmxvbf16ger2")])
+(define_int_attr vvi4i4i2  [(UNSPEC_MMA_PMXVI16GER2"xvi16ger2")
+(UNSPEC_MMA_PMXVI16GER2S   "xvi16ger2s")
+(UNSPEC_MMA_PMXVF16GER2"xvf16ger2")
+(UNSPEC_MMA_PMXVBF16GER2   "xvbf16ger2")])
 
-(define_int_attr avvi4i4i2 [(UNSPEC_MMA_PMXVI16GER2PP  "pmxvi16ger2pp")
-(UNSPEC_MMA_PMXVI16GER2SPP "pmxvi16ger2spp")
-(UNSPEC_MMA_PMXVF16GER2PP  "pmxvf16ger2pp")
-(UNSPEC_MMA_PMXVF16GER2PN  "pmxvf16ger2pn")
-(UNSPEC_MMA_PMXVF16GER2NP  "pmxvf16ger2np")
-(UNSPEC_MMA_PMXVF16GER2NN  "pmxvf16ger2nn")
-(UNSPEC_MMA_PMXVBF16GER2PP "pmxvbf16ger2pp")
-(UNSPEC_MMA_PMXVBF16GER2PN "pmxvbf16ger2pn")
-(UNSPEC_MMA_PMXVBF16GER2NP "pmxvbf16ger2np")
-(UNSPEC_MMA_PMXVBF16GER2NN "pmxvbf16ger2nn")])
+(define_int_attr avvi4i4i2 [(UNSPEC_MMA_PMXVI16GER2PP  "xvi16ger2pp")
+(UNSPEC_MMA_PMXVI16GER2SPP "xvi16ger2spp")
+(UNSPEC_MMA_PMXVF16GER2PP  "xvf16ger2pp")
+(UNSPE

[gcc(refs/users/meissner/heads/work162-dmf)] Add dense math test for new instruction names.

2024-03-12 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:da950b93278df73899ae4a6e027fca4c01aa00b2

commit da950b93278df73899ae4a6e027fca4c01aa00b2
Author: Michael Meissner 
Date:   Wed Mar 13 02:28:55 2024 -0400

Add dense math test for new instruction names.

2024-03-13   Michael Meissner  

gcc/testsuite/

* gcc.target/powerpc/dm-double-test.c: New test.
* lib/target-supports.exp (check_effective_target_ppc_dmr_ok): New
target test.

Diff:
---
 gcc/testsuite/gcc.target/powerpc/dm-double-test.c | 194 ++
 gcc/testsuite/lib/target-supports.exp |  23 +++
 2 files changed, 217 insertions(+)

diff --git a/gcc/testsuite/gcc.target/powerpc/dm-double-test.c b/gcc/testsuite/gcc.target/powerpc/dm-double-test.c
new file mode 100644
index 000..66c19779585
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/dm-double-test.c
@@ -0,0 +1,194 @@
+/* Test derived from mma-double-1.c, modified for dense math.  */
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_dense_math_ok } */
+/* { dg-options "-mdejagnu-cpu=future -O2" } */
+
+#include 
+#include 
+#include 
+
+typedef unsigned char vec_t __attribute__ ((vector_size (16)));
+typedef double v4sf_t __attribute__ ((vector_size (16)));
+#define SAVE_ACC(ACC, ldc, J)  \
+ __builtin_mma_disassemble_acc (result, ACC); \
+ rowC = (v4sf_t *) &CO[0*ldc+J]; \
+  rowC[0] += result[0]; \
+  rowC = (v4sf_t *) &CO[1*ldc+J]; \
+  rowC[0] += result[1]; \
+  rowC = (v4sf_t *) &CO[2*ldc+J]; \
+  rowC[0] += result[2]; \
+  rowC = (v4sf_t *) &CO[3*ldc+J]; \
+ rowC[0] += result[3];
+
+void
+DM (int m, int n, int k, double *A, double *B, double *C)
+{
+  __vector_quad acc0, acc1, acc2, acc3, acc4, acc5, acc6, acc7;
+  v4sf_t result[4];
+  v4sf_t *rowC;
+  for (int l = 0; l < n; l += 4)
+{
+  double *CO;
+  double *AO;
+  AO = A;
+  CO = C;
+  C += m * 4;
+  for (int j = 0; j < m; j += 16)
+   {
+ double *BO = B;
+ __builtin_mma_xxsetaccz (&acc0);
+ __builtin_mma_xxsetaccz (&acc1);
+ __builtin_mma_xxsetaccz (&acc2);
+ __builtin_mma_xxsetaccz (&acc3);
+ __builtin_mma_xxsetaccz (&acc4);
+ __builtin_mma_xxsetaccz (&acc5);
+ __builtin_mma_xxsetaccz (&acc6);
+ __builtin_mma_xxsetaccz (&acc7);
+ unsigned long i;
+
+ for (i = 0; i < k; i++)
+   {
+ vec_t *rowA = (vec_t *) & AO[i * 16];
+ __vector_pair rowB;
+ vec_t *rb = (vec_t *) & BO[i * 4];
+ __builtin_mma_assemble_pair (&rowB, rb[1], rb[0]);
+ __builtin_mma_xvf64gerpp (&acc0, rowB, rowA[0]);
+ __builtin_mma_xvf64gerpp (&acc1, rowB, rowA[1]);
+ __builtin_mma_xvf64gerpp (&acc2, rowB, rowA[2]);
+ __builtin_mma_xvf64gerpp (&acc3, rowB, rowA[3]);
+ __builtin_mma_xvf64gerpp (&acc4, rowB, rowA[4]);
+ __builtin_mma_xvf64gerpp (&acc5, rowB, rowA[5]);
+ __builtin_mma_xvf64gerpp (&acc6, rowB, rowA[6]);
+ __builtin_mma_xvf64gerpp (&acc7, rowB, rowA[7]);
+   }
+ SAVE_ACC (&acc0, m, 0);
+ SAVE_ACC (&acc2, m, 4);
+ SAVE_ACC (&acc1, m, 2);
+ SAVE_ACC (&acc3, m, 6);
+ SAVE_ACC (&acc4, m, 8);
+ SAVE_ACC (&acc6, m, 12);
+ SAVE_ACC (&acc5, m, 10);
+ SAVE_ACC (&acc7, m, 14);
+ AO += k * 16;
+ BO += k * 4;
+ CO += 16;
+   }
+  B += k * 4;
+}
+}
+
+void
+init (double *matrix, int row, int column)
+{
+  for (int j = 0; j < column; j++)
+{
+  for (int i = 0; i < row; i++)
+   {
+ matrix[j * row + i] = (i * 16 + 2 + j) / 0.123;
+   }
+}
+}
+
+void
+init0 (double *matrix, double *matrix1, int row, int column)
+{
+  for (int j = 0; j < column; j++)
+for (int i = 0; i < row; i++)
+  matrix[j * row + i] = matrix1[j * row + i] = 0;
+}
+
+
+void
+print (const char *name, const double *matrix, int row, int column)
+{
+  printf ("Matrix %s has %d rows and %d columns:\n", name, row, column);
+  for (int i = 0; i < row; i++)
+{
+  for (int j = 0; j < column; j++)
+   {
+ printf ("%f ", matrix[j * row + i]);
+   }
+  printf ("\n");
+}
+  printf ("\n");
+}
+
+int
+main (int argc, char *argv[])
+{
+  int rowsA, colsB, common;
+  int i, j, k;
+  int ret = 0;
+
+  for (int t = 16; t <= 128; t += 16)
+{
+  for (int t1 = 4; t1 <= 16; t1 += 4)
+   {
+ rowsA = t;
+ colsB = t1;
+ common = 1;
+ /* printf ("Running test for rows = %d,cols = %d\n", t, t1); */
+ double A[rowsA * common];
+ double B[common * colsB];
+ double C[rowsA * colsB];
+ double D[rowsA * colsB];
+
+
+ init (A, rowsA, common);
+ init (B, common, colsB);
+ init0 (C, D, rowsA, colsB);
+ DM (rowsA, colsB, common, A, B, C);
+
+ 
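The test is compile-only (dg-do compile), so it checks which instructions are emitted rather than the numbers computed. For reading the kernel, a scalar reference model of one __builtin_mma_xvf64gerpp call may help. It assumes the ISA 3.1 description of xvf64gerpp: the accumulator is a 4x2 block of doubles, the vector pair supplies 4 doubles and the vector supplies 2, and the "pp" form adds the outer product to the existing accumulator value. Element ordering and endianness are deliberately ignored, and acc_4x2 and xvf64gerpp_model are hypothetical names.

```c
#include <assert.h>

/* 4x2 accumulator of doubles (hypothetical model of a __vector_quad
   used with the 64-bit floating-point GER instructions).  */
typedef struct {
  double v[4][2];
} acc_4x2;

/* acc += outer product of PAIR (4 doubles) and VEC (2 doubles).  */
static void xvf64gerpp_model(acc_4x2 *acc, const double pair[4],
                             const double vec[2]) {
  assert(acc && pair && vec);
  for (int i = 0; i < 4; i++)
    for (int j = 0; j < 2; j++)
      acc->v[i][j] += pair[i] * vec[j]; /* accumulate, do not overwrite */
}
```

In this model, SAVE_ACC corresponds to reading v[0] through v[3] back as four 2-double rows (result[0..3]) and adding each row into C.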

[gcc(refs/users/meissner/heads/work162-dmf)] PowerPC: Add support for 1,024 bit DMR registers.

2024-03-12 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:18c91326e38cead42b2101729a3b97e1816832e8

commit 18c91326e38cead42b2101729a3b97e1816832e8
Author: Michael Meissner 
Date:   Wed Mar 13 02:33:43 2024 -0400

PowerPC: Add support for 1,024 bit DMR registers.

This patch is a preliminary patch to add the full 1,024 bit dense math
registers (DMRs) for -mcpu=future.  The MMA 512-bit accumulators map onto the
top of the DMR register.

This patch only adds the new 1,024 bit register support.  It does not add
support for any instructions that need 1,024 bit registers instead of 512 bit
registers.

I used the new mode 'TDOmode' as the opaque mode for 1,024 bit registers.  The
'wD' constraint added in previous patches is used for these registers.  I added
support to load and store DMRs via the VSX registers, since there are no
load/store dense math instructions.  I added the new keyword '__dmr' to create
1,024 bit types that can be loaded into DMRs.  At present, I don't have aliases
for __dmr512 and __dmr1024 that we've discussed internally.

The patches have been tested on both little and big endian systems.  Can I check
it into the master branch?
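To make the layout concrete, here is a hypothetical byte-level model of a 1,024-bit DMR. It assumes that the 512-bit accumulator occupies bytes 0..63 (the "top" half) and that loads and stores are staged as two 512-bit halves, echoing the UNSPEC_DM_INSERT512_UPPER/LOWER and UNSPEC_DM_EXTRACT512 names. The instruction sequence, element order, and VSX staging that GCC actually emits are not modeled, and all sk_*/dmr_* names are invented for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical byte-level models: a 1,024-bit DMR (TDOmode-sized) and a
   512-bit accumulator (XOmode-sized).  Byte 0 is taken as the "top".  */
typedef struct { uint8_t bytes[128]; } sk_dmr1024;
typedef struct { uint8_t bytes[64]; } sk_acc512;

/* Write a 512-bit value into the upper (bytes 0..63) or lower half.  */
static void dmr_insert512(sk_dmr1024 *d, const sk_acc512 *a, bool upper) {
  memcpy(d->bytes + (upper ? 0 : 64), a->bytes, sizeof a->bytes);
}

/* Read back the upper or lower 512-bit half.  */
static sk_acc512 dmr_extract512(const sk_dmr1024 *d, bool upper) {
  sk_acc512 a;
  memcpy(a.bytes, d->bytes + (upper ? 0 : 64), sizeof a.bytes);
  return a;
}

/* With no dense math load/store instructions, move the DMR to memory one
   512-bit half at a time (each half is four 128-bit VSX registers' worth). */
static void dmr_store(uint8_t mem[128], const sk_dmr1024 *d) {
  assert(sizeof d->bytes == 2 * sizeof(sk_acc512));
  sk_acc512 hi = dmr_extract512(d, true);
  sk_acc512 lo = dmr_extract512(d, false);
  memcpy(mem, hi.bytes, 64);
  memcpy(mem + 64, lo.bytes, 64);
}

/* The reverse direction: two 512-bit halves inserted from memory.  */
static void dmr_load(sk_dmr1024 *d, const uint8_t mem[128]) {
  sk_acc512 hi, lo;
  memcpy(hi.bytes, mem, 64);
  memcpy(lo.bytes, mem + 64, 64);
  dmr_insert512(d, &hi, true);
  dmr_insert512(d, &lo, false);
}
```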

2024-03-13   Michael Meissner  

gcc/

* config/rs6000/mma.md (UNSPEC_DM_INSERT512_UPPER): New unspec.
(UNSPEC_DM_INSERT512_LOWER): Likewise.
(UNSPEC_DM_EXTRACT512): Likewise.
(UNSPEC_DMR_RELOAD_FROM_MEMORY): Likewise.
(UNSPEC_DMR_RELOAD_TO_MEMORY): Likewise.
(movtdo): New define_expand and define_insn_and_split to implement 1,024
bit DMR registers.
(movtdo_insert512_upper): New insn.
(movtdo_insert512_lower): Likewise.
(movtdo_extract512): Likewise.
(reload_dmr_from_memory): Likewise.
(reload_dmr_to_memory): Likewise.
* config/rs6000/rs6000-builtin.cc (rs6000_type_string): Add DMR
support.
(rs6000_init_builtins): Add support for __dmr keyword.
* config/rs6000/rs6000-call.cc (rs6000_return_in_memory): Add support
for TDOmode.
(rs6000_function_arg): Likewise.
* config/rs6000/rs6000-modes.def (TDOmode): New mode.
* config/rs6000/rs6000.cc (rs6000_hard_regno_nregs_internal): Add
support for TDOmode.
(rs6000_hard_regno_mode_ok_uncached): Likewise.
(rs6000_hard_regno_mode_ok): Likewise.
(rs6000_modes_tieable_p): Likewise.
(rs6000_debug_reg_global): Likewise.
(rs6000_setup_reg_addr_masks): Likewise.
(rs6000_init_hard_regno_mode_ok): Add support for TDOmode.  Setup reload
hooks for DMR mode.
(reg_offset_addressing_ok_p): Add support for TDOmode.
(rs6000_emit_move): Likewise.
(rs6000_secondary_reload_simple_move): Likewise.
(rs6000_preferred_reload_class): Likewise.
(rs6000_secondary_reload_class): Likewise.
(rs6000_mangle_type): Add mangling for __dmr type.
(rs6000_dmr_register_move_cost): Add support for TDOmode.
(rs6000_split_multireg_move): Likewise.
(rs6000_invalid_conversion): Likewise.
* config/rs6000/rs6000.h (VECTOR_ALIGNMENT_P): Add TDOmode.
(enum rs6000_builtin_type_index): Add DMR type nodes.
(dmr_type_node): Likewise.
(ptr_dmr_type_node): Likewise.

gcc/testsuite/

* gcc.target/powerpc/dm-1024bit.c: New test.

Diff:
---
 gcc/config/rs6000/mma.md  | 154 ++
 gcc/config/rs6000/rs6000-builtin.cc   |  17 +++
 gcc/config/rs6000/rs6000-call.cc  |  10 +-
 gcc/config/rs6000/rs6000-modes.def|   4 +
 gcc/config/rs6000/rs6000.cc   | 101 -
 gcc/config/rs6000/rs6000.h|   6 +-
 gcc/testsuite/gcc.target/powerpc/dm-1024bit.c |  63 +++
 7 files changed, 321 insertions(+), 34 deletions(-)

diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index f3870eac51a..4f9c59046ea 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -91,6 +91,11 @@
UNSPEC_MMA_XVI8GER4SPP
UNSPEC_MMA_XXMFACC
UNSPEC_MMA_XXMTACC
+   UNSPEC_DM_INSERT512_UPPER
+   UNSPEC_DM_INSERT512_LOWER
+   UNSPEC_DM_EXTRACT512
+   UNSPEC_DMR_RELOAD_FROM_MEMORY
+   UNSPEC_DMR_RELOAD_TO_MEMORY
   ])
 
 (define_c_enum "unspecv"
@@ -770,3 +775,152 @@
 }
   [(set_attr "type" "mma")
(set_attr "prefixed" "yes")])
+
+;; TDOmode (__dmr keyword for 1,024 bit registers).
+(define_expand "movtdo"
+  [(set (match_operand:TDO 0 "nonimmediate_operand")
+   (match_operand:TDO 1 "input_operand"))]
+  "TARGET_MMA_DENSE_MATH"
+{
+  rs6000_emit_move (operands[0], operands[1], TDOmode);
+  DONE;
+})
+
+(define_insn_and_split "*movtdo"
+  [(set (match_operand:TDO 0 "noni

[gcc(refs/users/meissner/heads/work162-dmf)] Update ChangeLog.*

2024-03-12 Thread Michael Meissner via Gcc-cvs
https://gcc.gnu.org/g:9e9f7da1148b547dd4aa1f2084cd7df1d407d2dd

commit 9e9f7da1148b547dd4aa1f2084cd7df1d407d2dd
Author: Michael Meissner 
Date:   Wed Mar 13 02:36:19 2024 -0400

Update ChangeLog.*

Diff:
---
 gcc/ChangeLog.dmf | 307 ++
 1 file changed, 307 insertions(+)

diff --git a/gcc/ChangeLog.dmf b/gcc/ChangeLog.dmf
index 4bf550e6556..03ab4ad714c 100644
--- a/gcc/ChangeLog.dmf
+++ b/gcc/ChangeLog.dmf
@@ -1,5 +1,312 @@
+ Branch work162-dmf, patch #106 
+
+PowerPC: Add support for 1,024 bit DMR registers.
+
+This patch is a prelimianry patch to add the full 1,024 bit dense math register
+(DMRs) for -mcpu=future.  The MMA 512-bit accumulators map onto the top of the
+DMR register.
+
+This patch only adds the new 1,024 bit register support.  It does not add
+support for any instructions that need 1,024 bit registers instead of 512 bit
+registers.
+
+I used the new mode 'TDOmode' to be the opaque mode used for 1,024 bit
+registers.  The 'wD' constraint added in previous patches is used for these
+registers.  I added support to do load and store of DMRs via the VSX registers,
+since there are no load/store dense math instructions.  I added the new keyword
+'__dmr' to create 1,024 bit types that can be loaded into DMRs.  At present, I
+don't have aliases for __dmr512 and __dmr1024 that we've discussed internally.
+
+The patches have been tested on both little and big endian systems.  Can I check
+it into the master branch?
+
+2024-03-13   Michael Meissner  
+
+gcc/
+
+   * config/rs6000/mma.md (UNSPEC_DM_INSERT512_UPPER): New unspec.
+   (UNSPEC_DM_INSERT512_LOWER): Likewise.
+   (UNSPEC_DM_EXTRACT512): Likewise.
+   (UNSPEC_DMR_RELOAD_FROM_MEMORY): Likewise.
+   (UNSPEC_DMR_RELOAD_TO_MEMORY): Likewise.
+   (movtdo): New define_expand and define_insn_and_split to implement 1,024
+   bit DMR registers.
+   (movtdo_insert512_upper): New insn.
+   (movtdo_insert512_lower): Likewise.
+   (movtdo_extract512): Likewise.
+   (reload_dmr_from_memory): Likewise.
+   (reload_dmr_to_memory): Likewise.
+   * config/rs6000/rs6000-builtin.cc (rs6000_type_string): Add DMR
+   support.
+   (rs6000_init_builtins): Add support for __dmr keyword.
+   * config/rs6000/rs6000-call.cc (rs6000_return_in_memory): Add support
+   for TDOmode.
+   (rs6000_function_arg): Likewise.
+   * config/rs6000/rs6000-modes.def (TDOmode): New mode.
+   * config/rs6000/rs6000.cc (rs6000_hard_regno_nregs_internal): Add
+   support for TDOmode.
+   (rs6000_hard_regno_mode_ok_uncached): Likewise.
+   (rs6000_hard_regno_mode_ok): Likewise.
+   (rs6000_modes_tieable_p): Likewise.
+   (rs6000_debug_reg_global): Likewise.
+   (rs6000_setup_reg_addr_masks): Likewise.
+   (rs6000_init_hard_regno_mode_ok): Add support for TDOmode.  Setup reload
+   hooks for DMR mode.
+   (reg_offset_addressing_ok_p): Add support for TDOmode.
+   (rs6000_emit_move): Likewise.
+   (rs6000_secondary_reload_simple_move): Likewise.
+   (rs6000_preferred_reload_class): Likewise.
+   (rs6000_secondary_reload_class): Likewise.
+   (rs6000_mangle_type): Add mangling for __dmr type.
+   (rs6000_dmr_register_move_cost): Add support for TDOmode.
+   (rs6000_split_multireg_move): Likewise.
+   (rs6000_invalid_conversion): Likewise.
+   * config/rs6000/rs6000.h (VECTOR_ALIGNMENT_P): Add TDOmode.
+   (enum rs6000_builtin_type_index): Add DMR type nodes.
+   (dmr_type_node): Likewise.
+   (ptr_dmr_type_node): Likewise.
+
+gcc/testsuite/
+
+   * gcc.target/powerpc/dm-1024bit.c: New test.
+
+ Branch work162-dmf, patch #105 
+
+Add dense math test for new instruction names.
+
+2024-03-13   Michael Meissner  
+
+gcc/testsuite/
+
+   * gcc.target/powerpc/dm-double-test.c: New test.
+   * lib/target-supports.exp (check_effective_target_ppc_dmr_ok): New
+   target test.
+
+ Branch work162-dmf, patch #104 
+
+PowerPC: Switch to dense math names for all MMA operations.
+
+This patch changes the assembler instruction names for MMA instructions from
+the original name used in power10 to the new name when used with the dense math
+system.  I.e. xvf64gerpp becomes dmxvf64gerpp.  The assembler will emit the
+same bits for either spelling.
+
+For the non-prefixed MMA instructions, we add a 'dm' prefix in front of the
+instruction.  However, the prefixed instructions have a 'pm' prefix, and we add
+the 'dm' prefix afterwards.  To prevent having two sets of parallel int
+attributes, we remove the "pm" prefix from the instruction string in the
+attributes, and add it later, both in the insn name and in the output template.
+
+2024-03-13   Michael Meissner  
+
+gcc/
+
+   * config/rs6000/mma.md (vvi4i4i8): Change the instruction to not have a
+   "p