[gcc r14-9435] tree-optimization/114297 - SLP reduction with early break fix
https://gcc.gnu.org/g:c0c57246d5b47459bdb488734bc2c004a92668b5

commit r14-9435-gc0c57246d5b47459bdb488734bc2c004a92668b5
Author: Richard Biener
Date:   Mon Mar 11 14:58:57 2024 +0100

    tree-optimization/114297 - SLP reduction with early break fix

    The following makes sure to pass in the SLP node for the live stmts
    we are generating the reduction epilogue for to
    vect_create_epilog_for_reduction.  This follows the previous fix for
    the non-SLP path.

    PR tree-optimization/114297
    * tree-vect-loop.cc (vectorizable_live_operation): Pass in the
    live stmts SLP node to vect_create_epilog_for_reduction.

    * gcc.dg/vect/vect-early-break_123-pr114297.c: New testcase.

Diff:
---
 .../gcc.dg/vect/vect-early-break_123-pr114297.c | 22 ++
 gcc/tree-vect-loop.cc                           |  7 ---
 2 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_123-pr114297.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_123-pr114297.c
new file mode 100644
index 000..84487b7903b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_123-pr114297.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-add-options vect_early_break } */
+/* { dg-require-effective-target vect_early_break } */
+
+void h() __attribute__((__noreturn__));
+struct Extremes {
+  int w;
+  int h;
+};
+struct Extremes *array;
+int f(int num, int size1)
+{
+  int sw = 0, sh = 0;
+  for (int i = 0; i < size1; ++i)
+  {
+    if (num - i == 0)
+      h();
+    sw += array[i].w;
+    sh += array[i].h;
+  }
+  return (sw) + (sh);
+}
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 20ee0aad932..4375ebdcb49 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -10729,17 +10729,18 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
          block, but we have to find an alternate exit first.  */
       if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
         {
+          slp_tree phis_node = slp_node ? slp_node_instance->reduc_phis : NULL;
           for (auto exit : get_loop_exit_edges (LOOP_VINFO_LOOP (loop_vinfo)))
             if (exit != LOOP_VINFO_IV_EXIT (loop_vinfo))
               {
                 vect_create_epilog_for_reduction (loop_vinfo, reduc_info,
-                                                  slp_node, slp_node_instance,
+                                                  phis_node, slp_node_instance,
                                                   exit);
                 break;
               }
           if (LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo))
-            vect_create_epilog_for_reduction (loop_vinfo, reduc_info, slp_node,
-                                              slp_node_instance,
+            vect_create_epilog_for_reduction (loop_vinfo, reduc_info,
+                                              phis_node, slp_node_instance,
                                               LOOP_VINFO_IV_EXIT (loop_vinfo));
         }
[gcc r14-9436] RISC-V: Fix some code style issue(s) in riscv-c.cc [NFC]
https://gcc.gnu.org/g:cdf0c6604d03afd7f544dd8bd5d43d9ded059ada

commit r14-9436-gcdf0c6604d03afd7f544dd8bd5d43d9ded059ada
Author: Pan Li
Date:   Tue Mar 12 15:01:57 2024 +0800

    RISC-V: Fix some code style issue(s) in riscv-c.cc [NFC]

    Noticed some code style issue(s) when adding __riscv_v_fixed_vlen,
    including:

    * Meaningless empty line.
    * Line longer than 80 chars.
    * Indent with 3 space(s).
    * Misaligned arguments.

    gcc/ChangeLog:

    * config/riscv/riscv-c.cc (riscv_ext_version_value): Fix code
    style greater than 80 chars.
    (riscv_cpu_cpp_builtins): Fix useless empty line, indent
    with 3 space(s) and argument unalignment.

    Signed-off-by: Pan Li

Diff:
---
 gcc/config/riscv/riscv-c.cc | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc
index 3755ec0b8ef..7029ba88186 100644
--- a/gcc/config/riscv/riscv-c.cc
+++ b/gcc/config/riscv/riscv-c.cc
@@ -37,7 +37,8 @@ along with GCC; see the file COPYING3.  If not see
 static int
 riscv_ext_version_value (unsigned major, unsigned minor)
 {
-  return (major * RISCV_MAJOR_VERSION_BASE) + (minor * RISCV_MINOR_VERSION_BASE);
+  return (major * RISCV_MAJOR_VERSION_BASE)
+    + (minor * RISCV_MINOR_VERSION_BASE);
 }
 
 /* Implement TARGET_CPU_CPP_BUILTINS.  */
@@ -110,7 +111,6 @@ riscv_cpu_cpp_builtins (cpp_reader *pfile)
     case CM_MEDANY:
       builtin_define ("__riscv_cmodel_medany");
       break;
-
     }
 
   if (riscv_user_wants_strict_align)
@@ -142,9 +142,9 @@ riscv_cpu_cpp_builtins (cpp_reader *pfile)
                                      riscv_ext_version_value (0, 12));
     }
 
-   if (TARGET_XTHEADVECTOR)
-     builtin_define_with_int_value ("__riscv_th_v_intrinsic",
-                                    riscv_ext_version_value (0, 11));
+  if (TARGET_XTHEADVECTOR)
+    builtin_define_with_int_value ("__riscv_th_v_intrinsic",
+                                   riscv_ext_version_value (0, 11));
 
   /* Define architecture extension test macros.  */
   builtin_define_with_int_value ("__riscv_arch_test", 1);
[gcc r14-9437] strlen: Fix another spot that can create invalid ranges [PR114293]
https://gcc.gnu.org/g:39737cdf002637c7a652e9c3e36f369cfce581e5

commit r14-9437-g39737cdf002637c7a652e9c3e36f369cfce581e5
Author: Jakub Jelinek
Date:   Tue Mar 12 10:23:19 2024 +0100

    strlen: Fix another spot that can create invalid ranges [PR114293]

    This PR is similar to PR110603 fixed with r14-8487, except in a
    different spot.  From the memset with -1 size of non-zero value we
    determine minimum of (size_t) -1 and the code uses PTRDIFF_MAX - 2
    (not really sure I understand why it is - 2 and not - 1, e.g. heap
    allocated array with PTRDIFF_MAX char elements which contain '\0' in
    the last element should be fine, no?  One can still represent
    arr[PTRDIFF_MAX] - arr[0] and arr[0] - arr[PTRDIFF_MAX] in ptrdiff_t
    and strlen (arr) == PTRDIFF_MAX - 1) as the maximum, so again invalid
    range.  As in the other case, it is just UB that can lead to that, and
    we have choice to only keep the min and use +inf for max, or only keep
    max and use 0 for min, or not set the range at all, or use [min, min]
    or [max, max] etc.  The following patch uses [min, +inf].

    2024-03-12  Jakub Jelinek

    PR tree-optimization/114293
    * tree-ssa-strlen.cc (strlen_pass::handle_builtin_strlen): If
    max is smaller than min, set max to ~(size_t)0.

    * gcc.dg/pr114293.c: New test.

Diff:
---
 gcc/testsuite/gcc.dg/pr114293.c | 10 ++
 gcc/tree-ssa-strlen.cc          |  2 ++
 2 files changed, 12 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/pr114293.c b/gcc/testsuite/gcc.dg/pr114293.c
new file mode 100644
index 000..eb49ede0657
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr114293.c
@@ -0,0 +1,10 @@
+/* PR tree-optimization/114293 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -w" } */
+
+int
+foo (int x)
+{
+  __builtin_memset (&x, 5, -1);
+  return __builtin_strlen ((char *) &x);
+}
diff --git a/gcc/tree-ssa-strlen.cc b/gcc/tree-ssa-strlen.cc
index 20540c52948..e09c9cc081f 100644
--- a/gcc/tree-ssa-strlen.cc
+++ b/gcc/tree-ssa-strlen.cc
@@ -2341,6 +2341,8 @@ strlen_pass::handle_builtin_strlen ()
           wide_int min = wi::to_wide (old);
           wide_int max
             = wi::to_wide (TYPE_MAX_VALUE (ptrdiff_type_node)) - 2;
+          if (wi::gtu_p (min, max))
+            max = wi::to_wide (TYPE_MAX_VALUE (TREE_TYPE (lhs)));
           set_strlen_range (lhs, min, max);
         }
       else
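
The [min, +inf] choice described in the commit message can be sketched outside of GCC roughly as follows. This is only an illustration under stated assumptions: `LenRange` and `make_strlen_range` are hypothetical names, and plain 64-bit integers stand in for GCC's `wide_int`, `size_t` and `ptrdiff_t`.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-in for a [min, max] strlen range; not GCC's wide_int API.
struct LenRange {
  std::uint64_t min;
  std::uint64_t max;
};

// Mirrors the patched logic: the default upper bound is PTRDIFF_MAX - 2, but
// if the known minimum already exceeds it (reachable only through UB such as
// a memset with size -1), fall back to the unsigned type's maximum so that
// min <= max always holds.
constexpr LenRange make_strlen_range(std::uint64_t min) {
  constexpr std::uint64_t ptrdiff_max =
      static_cast<std::uint64_t>(std::numeric_limits<std::int64_t>::max());
  std::uint64_t max = ptrdiff_max - 2;
  if (min > max)  // unsigned comparison, playing the role of wi::gtu_p
    max = std::numeric_limits<std::uint64_t>::max();
  return {min, max};
}
```

With an ordinary minimum the upper bound stays at PTRDIFF_MAX - 2; only an out-of-range minimum widens it.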
[gcc r14-9438] asan: Instrument stores in callees rather than callers [PR112709]
https://gcc.gnu.org/g:ad860cc27b3312f9119c7fecb8638a7c1f6d77c9

commit r14-9438-gad860cc27b3312f9119c7fecb8638a7c1f6d77c9
Author: Jakub Jelinek
Date:   Tue Mar 12 11:34:50 2024 +0100

    asan: Instrument stores in callees rather than callers [PR112709]

    asan currently instruments since PR69276 r6-6758 fix calls which store
    the return value into memory on the caller side, before the call it
    verifies the memory is writable.
    Now PR112709 where we ICE on trying to instrument such calls made me
    think about whether that is what we want to do.

    There are 3 different cases.

    One is when a function returns an aggregate which is passed e.g. in
    registers, say like struct S { int a[4]; }; returning on x86_64.
    That would be ideally instrumented in between the actual call and
    storing of the aggregate into memory, but asan currently mostly
    works as a GIMPLE pass and arranging for the instrumentation to happen
    at that spot would be really hard.  We could diagnose after the call
    but generally asan attempts to diagnose stuff before something is
    overwritten rather than after, or keep the current behavior (that is
    what this patch does, which has the disadvantage that it can complain
    about UB even for functions which never return and so never actually
    store, and doesn't check whether the memory wasn't e.g. poisoned during
    the call) or could e.g. instrument both before and after the call
    (that would have the disadvantage the current state has but at least
    would check post-factum the store again afterwards).

    Another case is when a function returns an aggregate through a hidden
    reference, struct T { int a[128]; }; on x86_64 or even the above struct S
    on ia32 as example.  In the actual program such stores happen when
    storing something to <retval> or its parts in the callee, because there
    <retval> expands to *hidden_retval.  So, IMHO we should instrument those
    in the callee rather than caller, that is where the writes are and we
    can do that easily.  This is what the patch below does.

    And the last case is for builtins/internal functions.  Usually those
    don't return aggregates, but in case they'd do and can be expanded
    inline, it is better to instrument them in the caller (as before) rather
    than not instrumenting the return stores at all.

    I had to tweak the expected output on the PR69276 testcase, because
    with the patch it keeps previous behavior on x86_64 (structure returned
    in registers, stored in the caller, so reported as UB in A::A()),
    while on i686 it changed the behavior and is reported as UB in the
    vnull::operator vec which stores the structure, A::A() is then a frame
    above it in the backtrace.

    2024-03-12  Jakub Jelinek

    PR sanitizer/112709
    * asan.cc (has_stmt_been_instrumented_p): Don't instrument call
    stores on the caller side unless it is a call to a builtin or
    internal function or function doesn't return by hidden reference.
    (maybe_instrument_call): Likewise.
    (instrument_derefs): Instrument stores to RESULT_DECL if
    returning by hidden reference.

    * gcc.dg/asan/pr112709-1.c: New test.
    * g++.dg/asan/pr69276.C: Adjust expected output for some targets.
Diff: --- gcc/asan.cc| 17 +-- gcc/testsuite/g++.dg/asan/pr69276.C| 3 +- gcc/testsuite/gcc.dg/asan/pr112709-1.c | 52 ++ 3 files changed, 68 insertions(+), 4 deletions(-) diff --git a/gcc/asan.cc b/gcc/asan.cc index d621ec9c323..c533b09b1a1 100644 --- a/gcc/asan.cc +++ b/gcc/asan.cc @@ -1372,7 +1372,12 @@ has_stmt_been_instrumented_p (gimple *stmt) return true; } } - else if (is_gimple_call (stmt) && gimple_store_p (stmt)) + else if (is_gimple_call (stmt) + && gimple_store_p (stmt) + && (gimple_call_builtin_p (stmt) + || gimple_call_internal_p (stmt) + || !aggregate_value_p (TREE_TYPE (gimple_call_lhs (stmt)), + gimple_call_fntype (stmt { asan_mem_ref r; asan_mem_ref_init (&r, NULL, 1); @@ -2751,7 +2756,9 @@ instrument_derefs (gimple_stmt_iterator *iter, tree t, return; poly_int64 decl_size; - if ((VAR_P (inner) || TREE_CODE (inner) == RESULT_DECL) + if ((VAR_P (inner) + || (TREE_CODE (inner) == RESULT_DECL + && !aggregate_value_p (inner, current_function_decl))) && offset == NULL_TREE && DECL_SIZE (inner) && poly_int_tree_p (DECL_SIZE (inner), &decl_size) @@ -3023,7 +3030,11 @@ maybe_instrument_call (gimple_stmt_iterator *iter) } bool instrumented = false; - if (gimple_store_p (stmt)) + if (gimple_store_p (stmt) + && (gimple_call_builtin_p (stmt) + || gimple_call_internal_p (stmt) + || !aggregate_value_p (TREE_TY
[gcc r14-9439] c++: Support target-specific nodes when streaming modules [PR111224]
https://gcc.gnu.org/g:4aa87b856067d4911de8fb66b3a27659dc75ca6d

commit r14-9439-g4aa87b856067d4911de8fb66b3a27659dc75ca6d
Author: Nathaniel Shead
Date:   Sun Mar 10 22:06:18 2024 +1100

    c++: Support target-specific nodes when streaming modules [PR111224]

    Some targets make use of POLY_INT_CSTs and other custom builtin types,
    which currently violate some assumptions when streaming.  This patch
    adds support for them, such as types like Aarch64 __fp16, PowerPC
    __ibm128, and vector types thereof.

    This patch doesn't provide "full" support of AArch64 SVE, however,
    since for that we would need to support 'target' nodes (tracked in
    PR108080).

    Adding the new builtin types means that on Aarch64 we now have 217
    global trees created on initialisation (up from 191), so this patch
    also slightly bumps the initial size of the fixed_trees allocation
    to 250.

    PR c++/98645
    PR c++/98688
    PR c++/111224

    gcc/cp/ChangeLog:

    * module.cc (enum tree_tag): Add new tag for builtin types.
    (trees_out::start): POLY_INT_CSTs can be emitted.
    (trees_in::start): Likewise.
    (trees_out::core_vals): Stream POLY_INT_CSTs.
    (trees_in::core_vals): Likewise.
    (trees_out::type_node): Handle vectors with multiple coeffs.
    (trees_in::tree_node): Likewise.
    (init_modules): Register target-specific builtin types.  Bump
    initial capacity slightly.

    gcc/testsuite/ChangeLog:

    * g++.dg/modules/target-aarch64-1_a.C: New test.
    * g++.dg/modules/target-aarch64-1_b.C: New test.
    * g++.dg/modules/target-powerpc-1_a.C: New test.
    * g++.dg/modules/target-powerpc-1_b.C: New test.
    * g++.dg/modules/target-powerpc-2_a.C: New test.
    * g++.dg/modules/target-powerpc-2_b.C: New test.
Signed-off-by: Nathaniel Shead Reviewed-by: Patrick Palka Diff: --- gcc/cp/module.cc | 32 --- gcc/testsuite/g++.dg/modules/target-aarch64-1_a.C | 17 gcc/testsuite/g++.dg/modules/target-aarch64-1_b.C | 13 + gcc/testsuite/g++.dg/modules/target-powerpc-1_a.C | 7 + gcc/testsuite/g++.dg/modules/target-powerpc-1_b.C | 10 +++ gcc/testsuite/g++.dg/modules/target-powerpc-2_a.C | 20 ++ gcc/testsuite/g++.dg/modules/target-powerpc-2_b.C | 12 + 7 files changed, 101 insertions(+), 10 deletions(-) diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc index 99055523d91..8aab9ea0bae 100644 --- a/gcc/cp/module.cc +++ b/gcc/cp/module.cc @@ -5173,7 +5173,6 @@ trees_out::start (tree t, bool code_streamed) break; case FIXED_CST: -case POLY_INT_CST: gcc_unreachable (); /* Not supported in C++. */ break; @@ -5259,7 +5258,6 @@ trees_in::start (unsigned code) case FIXED_CST: case IDENTIFIER_NODE: -case POLY_INT_CST: case SSA_NAME: case TARGET_MEM_REF: case TRANSLATION_UNIT_DECL: @@ -6106,7 +6104,10 @@ trees_out::core_vals (tree t) break; case POLY_INT_CST: - gcc_unreachable (); /* Not supported in C++. */ + if (streaming_p ()) + for (unsigned ix = 0; ix != NUM_POLY_INT_COEFFS; ix++) + WT (POLY_INT_CST_COEFF (t, ix)); + break; case REAL_CST: if (streaming_p ()) @@ -6615,8 +6616,9 @@ trees_in::core_vals (tree t) break; case POLY_INT_CST: - /* Not suported in C++. */ - return false; + for (unsigned ix = 0; ix != NUM_POLY_INT_COEFFS; ix++) + RT (POLY_INT_CST_COEFF (t, ix)); + break; case REAL_CST: if (const void *bytes = buf (sizeof (real_value))) @@ -9068,8 +9070,8 @@ trees_out::type_node (tree type) if (streaming_p ()) { poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (type); - /* to_constant asserts that only coeff[0] is of interest. 
*/ - wu (static_cast (nunits.to_constant ())); + for (unsigned ix = 0; ix != NUM_POLY_INT_COEFFS; ix++) + wu (nunits.coeffs[ix]); } break; } @@ -9630,9 +9632,11 @@ trees_in::tree_node (bool is_use) case VECTOR_TYPE: { - unsigned HOST_WIDE_INT nunits = wu (); + poly_uint64 nunits; + for (unsigned ix = 0; ix != NUM_POLY_INT_COEFFS; ix++) + nunits.coeffs[ix] = wu (); if (!get_overrun ()) - res = build_vector_type (res, static_cast (nunits)); + res = build_vector_type (res, nunits); } break; } @@ -20151,7 +20155,7 @@ init_modules (cpp_reader *reader) some global trees are lazily created and we don't want that to mess with our syndrome of fixed trees. */ unsigned crc = 0; - vec_alloc (fixed_trees, 200); + vec_alloc (fixed_trees, 250); dump () && dump ("+Creating globals");
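
The core of the vector-type change, streaming every coefficient of the element count instead of asserting that only the first one matters, can be illustrated with a small stand-alone sketch. Everything here is an assumption made for illustration: `kNumCoeffs` stands in for NUM_POLY_INT_COEFFS, `PolyU64` for poly_uint64, and a `std::vector` of words for the module byte stream; none of these names exist in module.cc.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for NUM_POLY_INT_COEFFS.
constexpr std::size_t kNumCoeffs = 2;

// Hypothetical stand-in for poly_uint64: one entry per coefficient.
using PolyU64 = std::array<std::uint64_t, kNumCoeffs>;

// Writer side: emit every coefficient, not only the first one.
inline void write_poly(std::vector<std::uint64_t>& out, const PolyU64& value) {
  for (std::size_t ix = 0; ix != kNumCoeffs; ++ix)
    out.push_back(value[ix]);
}

// Reader side: consume the coefficients in the same order.
// Returns false instead of reading past the end of the stream.
inline bool read_poly(const std::vector<std::uint64_t>& in, std::size_t& pos,
                      PolyU64& value) {
  if (pos > in.size() || in.size() - pos < kNumCoeffs)
    return false;
  for (std::size_t ix = 0; ix != kNumCoeffs; ++ix)
    value[ix] = in[pos++];
  return true;
}
```

The only property the sketch is meant to show is that the reader and the writer agree on the number and order of coefficients, which is what lets a non-constant element count round-trip.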
[gcc r14-9440] tree-optimization/114121 - chrec_fold_{plus, multiply} and recursion
https://gcc.gnu.org/g:73dac51b32575f980289c073969c6d825963d076 commit r14-9440-g73dac51b32575f980289c073969c6d825963d076 Author: Richard Biener Date: Tue Mar 12 14:00:05 2024 +0100 tree-optimization/114121 - chrec_fold_{plus,multiply} and recursion The following addresses endless recursion in the chrec_fold_{plus,multiply} functions when handling sign-conversions. We only need to apply tricks when we'd fail (there's a chrec in the converted operand) and we need to make sure to not turn the other operand into something worse (for the chrec-vs-chrec case). PR tree-optimization/114121 * tree-chrec.cc (chrec_fold_plus_1): Guard recursion with converted operand properly. (chrec_fold_multiply): Likewise. Handle missed recursion. * gcc.dg/torture/pr114312.c: New testcase. Diff: --- gcc/testsuite/gcc.dg/torture/pr114312.c | 15 +++ gcc/tree-chrec.cc | 176 +--- 2 files changed, 107 insertions(+), 84 deletions(-) diff --git a/gcc/testsuite/gcc.dg/torture/pr114312.c b/gcc/testsuite/gcc.dg/torture/pr114312.c new file mode 100644 index 000..c508c64ed19 --- /dev/null +++ b/gcc/testsuite/gcc.dg/torture/pr114312.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target bitint } */ + +#if __BITINT_MAXWIDTH__ >= 129 +typedef _BitInt(129) B; +B b; + +B +foo(void) +{ + _BitInt(64) a = 1; + a &= b * b; + return b << a; +} +#endif diff --git a/gcc/tree-chrec.cc b/gcc/tree-chrec.cc index 7cd0ebc1010..1b2ed753551 100644 --- a/gcc/tree-chrec.cc +++ b/gcc/tree-chrec.cc @@ -251,23 +251,27 @@ chrec_fold_plus_1 (enum tree_code code, tree type, return chrec_fold_plus_poly_poly (code, type, op0, op1); CASE_CONVERT: - { - /* We can strip sign-conversions to signed by performing the - operation in unsigned. 
*/ - tree optype = TREE_TYPE (TREE_OPERAND (op1, 0)); - if (INTEGRAL_TYPE_P (type) - && INTEGRAL_TYPE_P (optype) - && tree_nop_conversion_p (type, optype) - && TYPE_UNSIGNED (optype)) - return chrec_convert (type, - chrec_fold_plus_1 (code, optype, - chrec_convert (optype, - op0, NULL), - TREE_OPERAND (op1, 0)), - NULL); - if (tree_contains_chrecs (op1, NULL)) + if (tree_contains_chrecs (op1, NULL)) + { + /* We can strip sign-conversions to signed by performing the +operation in unsigned. */ + tree optype = TREE_TYPE (TREE_OPERAND (op1, 0)); + if (INTEGRAL_TYPE_P (type) + && INTEGRAL_TYPE_P (optype) + && tree_nop_conversion_p (type, optype) + && TYPE_UNSIGNED (optype)) + { + tree tem = chrec_convert (optype, op0, NULL); + if (TREE_CODE (tem) == POLYNOMIAL_CHREC) + return chrec_convert (type, + chrec_fold_plus_1 (code, optype, +tem, +TREE_OPERAND + (op1, 0)), + NULL); + } return chrec_dont_know; - } + } /* FALLTHRU */ default: @@ -284,26 +288,27 @@ chrec_fold_plus_1 (enum tree_code code, tree type, } CASE_CONVERT: - { - /* We can strip sign-conversions to signed by performing the - operation in unsigned. */ - tree optype = TREE_TYPE (TREE_OPERAND (op0, 0)); - if (INTEGRAL_TYPE_P (type) - && INTEGRAL_TYPE_P (optype) - && tree_nop_conversion_p (type, optype) - && TYPE_UNSIGNED (optype)) - return chrec_convert (type, - chrec_fold_plus_1 (code, optype, - TREE_OPERAND (op0, 0), - chrec_convert (optype, - op1, NULL)), - NULL); - if (tree_contains_chrecs (op0, NULL)) + if (tree_contains_chrecs (op0, NULL)) + { + /* We can strip sign-conversions to signed by performing the +operation in unsigned. */ + tree optype = TREE_TYPE (TREE_OPERAND (op0, 0)); + if (INTEGRAL_TYPE_P (type) + && INTEGRAL_TYPE_P (optype) + && tree_nop_conversion_p (type, optype) + && TYPE_UNSIGNED (optype)) + retur
[gcc r13-8421] libstdc++: Optimize std::to_array for trivial types [PR110167]
https://gcc.gnu.org/g:4c6bb36e88d5c8e510b10d12c01e3461c2aa4259

commit r13-8421-g4c6bb36e88d5c8e510b10d12c01e3461c2aa4259
Author: Jonathan Wakely
Date:   Thu Jun 8 12:24:43 2023 +0100

    libstdc++: Optimize std::to_array for trivial types [PR110167]

    As reported in PR libstdc++/110167, std::to_array compiles extremely
    slowly for very large arrays. It needs to instantiate a very large
    specialization of std::index_sequence and then create a very large
    aggregate initializer from the pack expansion.

    For trivial types we can simply default-initialize the std::array and
    then use memcpy to copy the values. For non-trivial types we need to use
    the existing implementation, despite the compilation cost.

    As also noted in the PR, using a generic lambda instead of the
    __to_array helper compiles faster since gcc-13. It also produces
    slightly smaller code at -O1, due to additional inlining. The code at
    -Os, -O2 and -O3 seems to be the same. This new implementation requires
    __cpp_generic_lambdas >= 201707L (i.e. P0428R2) but that is supported
    since Clang 10 and since Intel icc 2021.5.0 (and since GCC 10.1).

    libstdc++-v3/ChangeLog:

    PR libstdc++/110167
    * include/std/array (to_array): Initialize arrays of trivial
    types using memcpy. For non-trivial types, use lambda
    expressions instead of a separate helper function.
    (__to_array): Remove.
    * testsuite/23_containers/array/creation/110167.cc: New test.
(cherry picked from commit 960de5dd886572711ef86fa1e15e30d3810eccb9) Diff: --- libstdc++-v3/include/std/array | 53 +++--- .../23_containers/array/creation/110167.cc | 14 ++ 2 files changed, 51 insertions(+), 16 deletions(-) diff --git a/libstdc++-v3/include/std/array b/libstdc++-v3/include/std/array index 97cca454ef9..edcac892b52 100644 --- a/libstdc++-v3/include/std/array +++ b/libstdc++-v3/include/std/array @@ -414,19 +414,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION return std::move(std::get<_Int>(__arr)); } -#if __cplusplus > 201703L +#if __cplusplus >= 202002L && __cpp_generic_lambdas >= 201707L #define __cpp_lib_to_array 201907L - - template -constexpr array, sizeof...(_Idx)> -__to_array(_Tp (&__a)[sizeof...(_Idx)], index_sequence<_Idx...>) -{ - if constexpr (_Move) - return {{std::move(__a[_Idx])...}}; - else - return {{__a[_Idx]...}}; -} - template [[nodiscard]] constexpr array, _Nm> @@ -436,8 +425,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION static_assert(!is_array_v<_Tp>); static_assert(is_constructible_v<_Tp, _Tp&>); if constexpr (is_constructible_v<_Tp, _Tp&>) - return std::__to_array(__a, make_index_sequence<_Nm>{}); - __builtin_unreachable(); // FIXME: see PR c++/91388 + { + if constexpr (is_trivial_v<_Tp>) + { + array, _Nm> __arr; + if (!__is_constant_evaluated() && _Nm != 0) + __builtin_memcpy((void*)__arr.data(), (void*)__a, sizeof(__a)); + else + for (size_t __i = 0; __i < _Nm; ++__i) + __arr._M_elems[__i] = __a[__i]; + return __arr; + } + else + return [&__a](index_sequence<_Idx...>) { + return array, _Nm>{{ __a[_Idx]... 
}}; + }(make_index_sequence<_Nm>{}); + } + else + __builtin_unreachable(); // FIXME: see PR c++/91388 } template @@ -449,8 +454,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION static_assert(!is_array_v<_Tp>); static_assert(is_move_constructible_v<_Tp>); if constexpr (is_move_constructible_v<_Tp>) - return std::__to_array<1>(__a, make_index_sequence<_Nm>{}); - __builtin_unreachable(); // FIXME: see PR c++/91388 + { + if constexpr (is_trivial_v<_Tp>) + { + array, _Nm> __arr; + if (!__is_constant_evaluated() && _Nm != 0) + __builtin_memcpy((void*)__arr.data(), (void*)__a, sizeof(__a)); + else + for (size_t __i = 0; __i < _Nm; ++__i) + __arr._M_elems[__i] = __a[__i]; + return __arr; + } + else + return [&__a](index_sequence<_Idx...>) { + return array, _Nm>{{ std::move(__a[_Idx])... }}; + }(make_index_sequence<_Nm>{}); + } + else + __builtin_unreachable(); // FIXME: see PR c++/91388 } #endif // C++20 diff --git a/libstdc++-v3/testsuite/23_containers/array/creation/110167.cc b/libstdc++-v3/testsuite/23_containers/array/creation/110167.cc new file mode 100644 index 000..c2aecc911bd --- /dev/null +++ b/libstdc++-v3/testsuite/23_containers/array/creation/110167.cc @@ -0,0 +1,14 @@ +// { dg-options "-std=gnu++20" } +// { dg-do compile { target c++20 } } + +// PR libstdc++/110167 - excessive compile time w
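
As a rough illustration of the trivial-type path described in the commit message, the following sketch default-initializes a std::array and copies the elements with memcpy. It is not the libstdc++ implementation: `to_array_memcpy` is a hypothetical name, the constant-evaluation branch is left out, and non-trivial element types are rejected rather than routed through a lambda with an index sequence.

```cpp
#include <array>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical helper covering only the trivial-type path of the patch.
template <typename T, std::size_t N>
std::array<std::remove_cv_t<T>, N> to_array_memcpy(T (&a)[N]) {
  static_assert(!std::is_array_v<T>, "multidimensional arrays are not handled");
  static_assert(std::is_trivial_v<T>, "sketch covers trivial types only");
  // Default-initialize, then overwrite every element in one bulk copy.
  // No index_sequence of length N has to be instantiated.
  std::array<std::remove_cv_t<T>, N> arr;
  std::memcpy(arr.data(), a, sizeof(a));
  return arr;
}
```

Because the helper never expands a pack of N indices, its compile-time cost should not grow with the array length in the way the pack-expansion approach does.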
[gcc r13-8422] libstdc++: Fix a -Wsign-compare warning in std::list
https://gcc.gnu.org/g:66c55e4f57135f2df09daeea94e0900862c54799

commit r13-8422-g66c55e4f57135f2df09daeea94e0900862c54799
Author: Jonathan Wakely
Date:   Wed Aug 9 11:28:56 2023 +0100

    libstdc++: Fix a -Wsign-compare warning in std::list

    libstdc++-v3/ChangeLog:

    * include/bits/list.tcc (list::sort(Cmp)): Fix -Wsign-compare
    warning for loop condition.

    (cherry picked from commit 9bd194434acb47fac80aad45ed04039e0535d1fe)

Diff:
---
 libstdc++-v3/include/bits/list.tcc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/bits/list.tcc b/libstdc++-v3/include/bits/list.tcc
index 3e5b1f7b972..344386aa4d0 100644
--- a/libstdc++-v3/include/bits/list.tcc
+++ b/libstdc++-v3/include/bits/list.tcc
@@ -654,7 +654,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
           {
             // Move all nodes back into *this.
             __carry._M_put_all(end()._M_node);
-            for (int __i = 0; __i < sizeof(__tmp)/sizeof(__tmp[0]); ++__i)
+            for (size_t __i = 0; __i < sizeof(__tmp)/sizeof(__tmp[0]); ++__i)
               __tmp[__i]._M_put_all(end()._M_node);
             __throw_exception_again;
           }
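
The pattern behind this one-line fix can be shown in isolation. The helper below is a hypothetical example, not code from list.tcc: a sizeof-based bound has type std::size_t, so the loop index is given the same unsigned type to keep the comparison free of mixed signedness.

```cpp
#include <cstddef>

// Hypothetical helper: visits every element of a built-in array and counts them.
template <typename T, std::size_t N>
constexpr std::size_t count_elements(const T (&arr)[N]) {
  std::size_t n = 0;
  // sizeof yields std::size_t; declaring the index as `int` here is what
  // makes -Wsign-compare warn about the loop condition.
  for (std::size_t i = 0; i < sizeof(arr) / sizeof(arr[0]); ++i)
    ++n;
  return n;
}
```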
[gcc r13-8423] libstdc++: Add [[nodiscard]] to std::span members
https://gcc.gnu.org/g:779563cff2e18e7891abf57aeee90e8db5035eb5

commit r13-8423-g779563cff2e18e7891abf57aeee90e8db5035eb5
Author: Jonathan Wakely
Date:   Sat Nov 4 08:30:54 2023 +

    libstdc++: Add [[nodiscard]] to std::span members

    All std::span member functions are pure functions that have no side
    effects. They are only useful for their return value, so they should all
    warn if that value is not used.

    libstdc++-v3/ChangeLog:

    * include/std/span (span, as_bytes, as_writable_bytes): Add
    [[nodiscard]] attribute on all non-void functions.
    * testsuite/23_containers/span/back_assert_neg.cc: Suppress
    nodiscard warning.
    * testsuite/23_containers/span/back_neg.cc: Likewise.
    * testsuite/23_containers/span/first_2_assert_neg.cc: Likewise.
    * testsuite/23_containers/span/first_assert_neg.cc: Likewise.
    * testsuite/23_containers/span/first_neg.cc: Likewise.
    * testsuite/23_containers/span/front_assert_neg.cc: Likewise.
    * testsuite/23_containers/span/front_neg.cc: Likewise.
    * testsuite/23_containers/span/index_op_assert_neg.cc: Likewise.
    * testsuite/23_containers/span/index_op_neg.cc: Likewise.
    * testsuite/23_containers/span/last_2_assert_neg.cc: Likewise.
    * testsuite/23_containers/span/last_assert_neg.cc: Likewise.
    * testsuite/23_containers/span/last_neg.cc: Likewise.
    * testsuite/23_containers/span/subspan_2_assert_neg.cc:
    Likewise.
    * testsuite/23_containers/span/subspan_3_assert_neg.cc:
    Likewise.
    * testsuite/23_containers/span/subspan_4_assert_neg.cc:
    Likewise.
    * testsuite/23_containers/span/subspan_5_assert_neg.cc:
    Likewise.
    * testsuite/23_containers/span/subspan_6_assert_neg.cc:
    Likewise.
    * testsuite/23_containers/span/subspan_assert_neg.cc: Likewise.
    * testsuite/23_containers/span/subspan_neg.cc: Likewise.
    * testsuite/23_containers/span/nodiscard.cc: New test.
(cherry picked from commit a92a434024c59f57dc24328d946f97a5e71cee94) Diff: --- libstdc++-v3/include/std/span | 26 +- .../23_containers/span/back_assert_neg.cc | 2 +- .../testsuite/23_containers/span/back_neg.cc | 2 +- .../23_containers/span/first_2_assert_neg.cc | 2 +- .../23_containers/span/first_assert_neg.cc | 2 +- .../testsuite/23_containers/span/first_neg.cc | 2 +- .../23_containers/span/front_assert_neg.cc | 2 +- .../testsuite/23_containers/span/front_neg.cc | 2 +- .../23_containers/span/index_op_assert_neg.cc | 2 +- .../testsuite/23_containers/span/index_op_neg.cc | 2 +- .../23_containers/span/last_2_assert_neg.cc| 2 +- .../23_containers/span/last_assert_neg.cc | 2 +- .../testsuite/23_containers/span/last_neg.cc | 2 +- .../testsuite/23_containers/span/nodiscard.cc | 58 ++ .../23_containers/span/subspan_2_assert_neg.cc | 2 +- .../23_containers/span/subspan_3_assert_neg.cc | 2 +- .../23_containers/span/subspan_4_assert_neg.cc | 2 +- .../23_containers/span/subspan_5_assert_neg.cc | 2 +- .../23_containers/span/subspan_6_assert_neg.cc | 2 +- .../23_containers/span/subspan_assert_neg.cc | 2 +- .../testsuite/23_containers/span/subspan_neg.cc| 6 +-- 21 files changed, 103 insertions(+), 23 deletions(-) diff --git a/libstdc++-v3/include/std/span b/libstdc++-v3/include/std/span index 67633899665..b70893779d8 100644 --- a/libstdc++-v3/include/std/span +++ b/libstdc++-v3/include/std/span @@ -248,20 +248,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION // observers + [[nodiscard]] constexpr size_type size() const noexcept { return this->_M_extent._M_extent(); } + [[nodiscard]] constexpr size_type size_bytes() const noexcept { return this->_M_extent._M_extent() * sizeof(element_type); } - [[nodiscard]] constexpr bool + [[nodiscard]] + constexpr bool empty() const noexcept { return size() == 0; } // element access + [[nodiscard]] constexpr reference front() const noexcept { @@ -269,6 +273,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION return *this->_M_ptr; } + [[nodiscard]] constexpr 
reference back() const noexcept { @@ -276,6 +281,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION return *(this->_M_ptr + (size() - 1)); } + [[nodiscard]] constexpr reference operator[](size_type __idx) const noexcept { @@ -283,41 +289,50 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION return *(this->_M_ptr + __idx); } + [[nodiscard]] constexpr pointer data() const noexcept { return this->_M_ptr; }
[gcc r13-8424] libstdc++: Fix UB in weekday::weekday(sys_days) and add test
https://gcc.gnu.org/g:d1472711efc77d5ddc2fa6d5eff57baca584c8ef

commit r13-8424-gd1472711efc77d5ddc2fa6d5eff57baca584c8ef
Author: Cassio Neri
Date:   Sun Nov 12 01:33:52 2023 +

    libstdc++: Fix UB in weekday::weekday(sys_days) and add test

    The following has undefined behaviour (signed overflow) [1]:

      weekday max{sys_days{days{numeric_limits<days::rep>::max()}}};

    The issue is in this line when __n is very large and __n + 4 overflows:

      return weekday(__n >= -4 ? (__n + 4) % 7 : (__n + 5) % 7 + 6);

    In addition to fixing this bug, the new implementation makes the compiler
    emit shorter and branchless code for x86-64 and ARM [2].

    [1] https://godbolt.org/z/1s5bv7KfT
    [2] https://godbolt.org/z/zKsabzrhs

    libstdc++-v3/ChangeLog:

    * include/std/chrono (weekday::_S_from_days): Fix UB.
    * testsuite/std/time/weekday/1.cc: Add test for overflow.

    (cherry picked from commit f6ce081d0ffb5f25d71eb2f30fcfdff7f20dba22)

Diff:
---
 libstdc++-v3/include/std/chrono              | 11 +--
 libstdc++-v3/testsuite/std/time/weekday/1.cc |  9 +
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index ac7febbaa2c..fb8d6c82e8a 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -936,8 +936,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       static constexpr weekday
       _S_from_days(const days& __d)
       {
-        auto __n = __d.count();
-        return weekday(__n >= -4 ? (__n + 4) % 7 : (__n + 5) % 7 + 6);
+        using _Rep = days::rep;
+        using _URep = make_unsigned_t<_Rep>;
+        const auto __n = __d.count();
+        const auto __m = static_cast<_URep>(__n);
+
+        // 1970-01-01 (__n = 0, __m = 0) -> Thursday (4)
+        // 1969-31-12 (__n = -1, __m = _URep(-1)) -> Wednesday (3)
+        const auto __offset = __n >= 0 ? _URep(4) : 3 - _URep(-1) % 7 - 7;
+        return weekday((__m + __offset) % 7);
       }
 
     public:
diff --git a/libstdc++-v3/testsuite/std/time/weekday/1.cc b/libstdc++-v3/testsuite/std/time/weekday/1.cc
index 1e018eaa3e0..bfc617c4cc8 100644
--- a/libstdc++-v3/testsuite/std/time/weekday/1.cc
+++ b/libstdc++-v3/testsuite/std/time/weekday/1.cc
@@ -21,6 +21,7 @@
 // Class template day [time.cal.weekday]
 
 #include <chrono>
+#include <limits>
 
 constexpr void
 constexpr_weekday()
@@ -38,6 +39,14 @@ constexpr_weekday()
   static_assert(weekday{3}[2].weekday() == weekday{3});
   static_assert(weekday{3}[last].weekday() == weekday{3});
 
+  // Test for UB (overflow).
+  {
+    using rep = days::rep;
+    using std::numeric_limits;
+    constexpr weekday max{sys_days{days{numeric_limits<rep>::max()}}};
+    constexpr weekday min{sys_days{days{numeric_limits<rep>::min()}}};
+  }
+
   static_assert(weekday{sys_days{1900y/January/1}} == Monday);
   static_assert(weekday{sys_days{1970y/January/1}} == Thursday);
   static_assert(weekday{sys_days{2020y/August/21}} == Friday);
[gcc r13-8425] libstdc++: Remove unnecessary "& 1" from year_month_day_last::day()
https://gcc.gnu.org/g:29dc5fb5b62364b3a0ef8272c7dab528b91b7ae1 commit r13-8425-g29dc5fb5b62364b3a0ef8272c7dab528b91b7ae1 Author: Cassio Neri Date: Sat Nov 11 16:44:58 2023 + libstdc++: Remove unnecessary "& 1" from year_month_day_last::day() When year_month_day_last::day() was implemented, Dr. Matthias Kretz realised that the operation "& 1" wasn't necessary but we did not patch it at that time. This patch removes the unnecessary operation. libstdc++-v3/ChangeLog: * include/std/chrono (year_month_day_last::day): Remove &1. (cherry picked from commit b011535456396a6846ff24fb5b1baea8fe0a33b1) Diff: --- libstdc++-v3/include/std/chrono | 24 ++-- 1 file changed, 14 insertions(+), 10 deletions(-) diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono index fb8d6c82e8a..f22b8097174 100644 --- a/libstdc++-v3/include/std/chrono +++ b/libstdc++-v3/include/std/chrono @@ -1813,22 +1813,26 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION { const auto __m = static_cast(month()); - // Excluding February, the last day of month __m is either 30 or 31 or, - // in another words, it is 30 + b = 30 | b, where b is in {0, 1}. + // The result is unspecified if __m < 1 or __m > 12. Hence, assume + // 1 <= __m <= 12. For __m != 2, day() == 30 or day() == 31 or, in + // other words, day () == 30 | b, where b is in {0, 1}. - // If __m in {1, 3, 4, 5, 6, 7}, then b is 1 if, and only if __m is odd. - // Hence, b = __m & 1 = (__m ^ 0) & 1. + // If __m in {1, 3, 4, 5, 6, 7}, then b is 1 if, and only if, __m is + // odd. Hence, b = __m & 1 = (__m ^ 0) & 1. - // If __m in {8, 9, 10, 11, 12}, then b is 1 if, and only if __m is even. - // Hence, b = (__m ^ 1) & 1. + // If __m in {8, 9, 10, 11, 12}, then b is 1 if, and only if, __m is + // even. Hence, b = (__m ^ 1) & 1. // Therefore, b = (__m ^ c) & 1, where c = 0, if __m < 8, or c = 1 if // __m >= 8, that is, c = __m >> 3. 
- // The above mathematically justifies this implementation whose - // performance does not depend on look-up tables being on the L1 cache. - return chrono::day{__m != 2 ? ((__m ^ (__m >> 3)) & 1) | 30 - : _M_y.is_leap() ? 29 : 28}; + // Since 30 = (11110)_2 and __m <= 31 = (11111)_2, the "& 1" in b's + // calculation is unnecessary. + + // The performance of this implementation does not depend on look-up + // tables being on the L1 cache. + return chrono::day{__m != 2 ? (__m ^ (__m >> 3)) | 30 + : _M_y.is_leap() ? 29 : 28}; } constexpr
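The reasoning in the comment above can be checked exhaustively, since there are only twelve valid months. The following is a minimal sketch, not the libstdc++ code: last_day_of_month, last_day_reference and last_day_formulas_agree are hypothetical free functions introduced here for illustration, and the leap-year flag is passed in instead of being read from a year object.

```cpp
#include <cassert>

// Hypothetical free-function version of the patched expression.
// m is the month number, assumed to be in [1, 12]; leap tells whether
// the year is a leap year.
constexpr unsigned last_day_of_month(unsigned m, bool leap)
{
  // 30 is 0b11110, so OR-ing with 30 lets only bit 0 of (m ^ (m >> 3))
  // affect the result; this is why the "& 1" can be dropped.
  return m != 2 ? ((m ^ (m >> 3)) | 30u) : (leap ? 29u : 28u);
}

// Table-based reference, used only for checking the bit trick.
inline unsigned last_day_reference(unsigned m, bool leap)
{
  static const unsigned table[12] =
    {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
  return (m == 2 && leap) ? 29u : table[m - 1];
}

// Returns true when both versions agree for every valid month.
inline bool last_day_formulas_agree()
{
  for (unsigned m = 1; m <= 12; ++m)
    for (int leap = 0; leap < 2; ++leap)
      if (last_day_of_month(m, leap != 0) != last_day_reference(m, leap != 0))
        return false;
  return true;
}
```

Calling last_day_formulas_agree() compares the two versions for all 24 combinations of month and leap flag.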
[gcc r13-8426] libstdc++: Simplify year::is_leap()
https://gcc.gnu.org/g:3cbaada7d9186410a4da6575c27a156e72820ebf commit r13-8426-g3cbaada7d9186410a4da6575c27a156e72820ebf Author: Cassio Neri Date: Sat Nov 11 22:59:50 2023 + libstdc++: Simplify year::is_leap() The current implementation returns (_M_y & (__is_multiple_of_100 ? 15 : 3)) == 0; where __is_multiple_of_100 is calculated using an obfuscated algorithm which saves one ror instruction when compared to _M_y % 100 == 0 [1]. In leap years calculation, it's correct to replace the divisibility check by 100 with the one by 25. It turns out that _M_y % 25 == 0 also saves the ror instruction [2]. Therefore, the obfuscation is not required. [1] https://godbolt.org/z/5PaEv6a6b [2] https://godbolt.org/z/55G8rn77e libstdc++-v3/ChangeLog: * include/std/chrono (year::is_leap): Clear code. (cherry picked from commit 86a0df1a6c7fe4a835620b868e76ea78d42d6620) Diff: --- libstdc++-v3/include/std/chrono | 40 1 file changed, 20 insertions(+), 20 deletions(-) diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono index f22b8097174..57cc803f1af 100644 --- a/libstdc++-v3/include/std/chrono +++ b/libstdc++-v3/include/std/chrono @@ -841,29 +841,29 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION constexpr bool is_leap() const noexcept { - // Testing divisibility by 100 first gives better performance, that is, - // return (_M_y % 100 != 0 || _M_y % 400 == 0) && _M_y % 4 == 0; - - // It gets even faster if _M_y is in [-536870800, 536870999] - // (which is the case here) and _M_y % 100 is replaced by - // __is_multiple_of_100 below. + // Testing divisibility by 100 first gives better performance [1], i.e., + // return _M_y % 100 == 0 ? _M_y % 400 == 0 : _M_y % 16 == 0; + // Furthermore, if _M_y % 100 == 0, then _M_y % 400 == 0 is equivalent + // to _M_y % 16 == 0, so we can simplify it to + // return _M_y % 100 == 0 ? _M_y % 16 == 0 : _M_y % 4 == 0. 
// #1 + // Similarly, we can replace 100 with 25 (which is good since + // _M_y % 25 == 0 requires one fewer instruction than _M_y % 100 == 0 + // [2]): + // return _M_y % 25 == 0 ? _M_y % 16 == 0 : _M_y % 4 == 0. // #2 + // Indeed, first assume _M_y % 4 != 0. Then _M_y % 16 != 0 and hence, + // _M_y % 4 == 0 and _M_y % 16 == 0 are both false. Therefore, #2 + // returns false as it should (regardless of _M_y % 25.) Now assume + // _M_y % 4 == 0. In this case, _M_y % 25 == 0 if, and only if, + // _M_y % 100 == 0, that is, #1 and #2 are equivalent. Finally, #2 is + // equivalent to + // return (_M_y & (_M_y % 25 == 0 ? 15 : 3)) == 0. // References: // [1] https://github.com/cassioneri/calendar - // [2] https://accu.org/journals/overload/28/155/overload155.pdf#page=16 - - // Furthermore, if y%100 == 0, then y%400==0 is equivalent to y%16==0, - // so we can simplify it to (!mult_100 && y % 4 == 0) || y % 16 == 0, - // which is equivalent to (y & (mult_100 ? 15 : 3)) == 0. - // See https://gcc.gnu.org/pipermail/libstdc++/2021-June/052815.html - - constexpr uint32_t __multiplier = 42949673; - constexpr uint32_t __bound= 42949669; - constexpr uint32_t __max_dividend = 1073741799; - constexpr uint32_t __offset = __max_dividend / 2 / 100 * 100; - const bool __is_multiple_of_100 - = __multiplier * (_M_y + __offset) < __bound; - return (_M_y & (__is_multiple_of_100 ? 15 : 3)) == 0; + // [2] https://godbolt.org/z/55G8rn77e + // [3] https://gcc.gnu.org/pipermail/libstdc++/2021-June/052815.html + + return (_M_y & (_M_y % 25 == 0 ? 15 : 3)) == 0; } explicit constexpr
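The equivalence argued in the comment above can also be checked by brute force. The sketch below is an illustration under stated assumptions, not the libstdc++ code: is_leap_reference, is_leap_simplified and leap_rules_agree are hypothetical names, the year is a plain int, and the bitwise test relies on two's complement representation of negative values.

```cpp
#include <cassert>

// Textbook Gregorian rule, used as the reference.
constexpr bool is_leap_reference(int y)
{
  return y % 4 == 0 && (y % 100 != 0 || y % 400 == 0);
}

// Form #2 from the comment, written like the patched return statement:
// multiples of 25 must also be multiples of 16, others only of 4.
constexpr bool is_leap_simplified(int y)
{
  return (y & (y % 25 == 0 ? 15 : 3)) == 0;
}

// Returns true when both rules agree for every year in [first, last].
inline bool leap_rules_agree(int first, int last)
{
  for (int y = first; y <= last; ++y)
    if (is_leap_reference(y) != is_leap_simplified(y))
      return false;
  return true;
}
```

A call such as leap_rules_agree(-32767, 32767) covers the range of years that std::chrono::year can represent.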
[gcc r13-8428] libstdc++: Remove UB from month and weekday additions and subtractions.
https://gcc.gnu.org/g:2d3cc6806a9fc3c9ac299bb021819bcb5e7605ea commit r13-8428-g2d3cc6806a9fc3c9ac299bb021819bcb5e7605ea Author: Cassio Neri Date: Sun Dec 10 11:31:31 2023 + libstdc++: Remove UB from month and weekday additions and subtractions. The following expressions invoke signed integer overflow (UB) [1]: month + months{MAX} // where MAX is the maximum value of months::rep month + months{MIN} // where MIN is the minimum value of months::rep month - months{MIN} // where MIN is the minimum value of months::rep weekday + days {MAX} // where MAX is the maximum value of days::rep weekday - days {MIN} // where MIN is the minimum value of days::rep For the additions to MAX, the crux of the problem is that, in libstdc++, months::rep and days::rep are int64_t. Other implementations use int32_t, cast operands to int64_t and perform arithmetic operations without risk of overflowing. For month + months{MIN}, the implementation follows the Standard's "returns clause" and evaluates: modulo(static_cast<long long>(unsigned{__x}) + (__y.count() - 1), 12); Overflow occurs when MIN - 1 is evaluated. Casting to a larger type could help but, unfortunately again, this is not possible for libstdc++. For the subtraction of MIN, the problem is that -MIN is not representable. It's fair to say that the intention is for these additions/subtractions to be performed in modular arithmetic (modulo 12 or 7) so that no overflow is expected. To fix these UB cases, this patch implements: template <unsigned __d, typename _T> unsigned __add_modulo(unsigned __x, _T __y); template <unsigned __d, typename _T> unsigned __sub_modulo(unsigned __x, _T __y); which return, respectively, the remainder of the Euclidean division of __x + __y and __x - __y by __d without overflowing. These functions replace constexpr unsigned __modulo(long long __n, unsigned __d); which also calculates the remainder of __n, where __n is the result of the addition or subtraction. Hence, these operations might invoke UB before __modulo is called and thus, __modulo can't do anything to remediate the issue.
In addition to solving the UB issues, __add_modulo and __sub_modulo allow better codegen (shorter and branchless) on x86-64 and ARM [2]. [1] https://godbolt.org/z/a9YfWdn57 [2] https://godbolt.org/z/Gh36cr7E4 libstdc++-v3/ChangeLog: * include/std/chrono: Fix + and - for months and weekdays. * testsuite/std/time/month/1.cc: Add constexpr tests against overflow. * testsuite/std/time/month/2.cc: New test for extreme values. * testsuite/std/time/weekday/1.cc: Add constexpr tests against overflow. * testsuite/std/time/weekday/2.cc: New test for extreme values. (cherry picked from commit 2cb3d42d3f3e7a5345ee7a6f3676a10c84864d72) Diff: --- libstdc++-v3/include/std/chrono | 79 +++- libstdc++-v3/testsuite/std/time/month/1.cc | 19 +++ libstdc++-v3/testsuite/std/time/month/2.cc | 32 +++ libstdc++-v3/testsuite/std/time/weekday/1.cc | 16 +- libstdc++-v3/testsuite/std/time/weekday/2.cc | 32 +++ 5 files changed, 151 insertions(+), 27 deletions(-) diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono index c303eedd464..b2abf90cf71 100644 --- a/libstdc++-v3/include/std/chrono +++ b/libstdc++-v3/include/std/chrono @@ -503,18 +503,47 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION namespace __detail { - // Compute the remainder of the Euclidean division of __n divided by __d. - // Euclidean division truncates toward negative infinity and always - // produces a remainder in the range of [0,__d-1] (whereas standard - // division truncates toward zero and yields a nonpositive remainder - // for negative __n). + // Helper to __add_modulo and __sub_modulo. + template<unsigned __d, typename _Tp> + consteval auto + __modulo_offset() + { + using _Up = make_unsigned_t<_Tp>; + auto constexpr __a = _Up(-1) - _Up(255 + __d - 2); + auto constexpr __b = _Up(__d * (__a / __d) - 1); + // Notice: b <= a - 1 <= _Up(-1) - (255 + d - 1) and b % d = d - 1. + return _Up(-1) - __b; // >= 255 + d - 1 + } + + // Compute the remainder of the Euclidean division of __x + __y divided by + // __d without overflowing.
Typically, __x <= 255 + d - 1 is sum of + // weekday/month with a shift in [0, d - 1] and __y is a duration count. + template<unsigned __d, typename _Tp> + constexpr unsigned + __add_modulo(unsigned __x, _Tp __y) + { + using _Up = make_unsigned_t<_Tp>; + // For __y >= 0, _Up(__y) has the same mathematical value as __y and + // this function simply returns (__x + _Up(__y)) % d. Typically, this + // doesn't overflow since the range of _Up contains many more positive +
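The patch's own implementation is cut off in this archive, so the sketch below does not reproduce it. It illustrates the same goal (the Euclidean remainder of x + y or x - y modulo d with no signed overflow) with a simpler, branching technique: reduce y modulo d first, then add small unsigned values. The names euclid_rem, add_modulo_sketch and sub_modulo_sketch are hypothetical, and the duration count is assumed to be a std::int64_t.

```cpp
#include <cassert>
#include <cstdint>

// Euclidean remainder of y modulo d, always in [0, d - 1].
// y % d cannot overflow, even for the minimum value of std::int64_t.
template <unsigned d>
unsigned euclid_rem(std::int64_t y)
{
  const std::int64_t r = y % static_cast<std::int64_t>(d); // in (-d, d)
  return static_cast<unsigned>(r < 0 ? r + static_cast<std::int64_t>(d) : r);
}

// Remainder of (x + y) modulo d. Both operands are reduced before the
// addition, so the sum stays below 2 * d.
template <unsigned d>
unsigned add_modulo_sketch(unsigned x, std::int64_t y)
{
  return (x % d + euclid_rem<d>(y)) % d;
}

// Remainder of (x - y) modulo d. Adding d keeps the unsigned
// intermediate value non-negative, and -y is never evaluated.
template <unsigned d>
unsigned sub_modulo_sketch(unsigned x, std::int64_t y)
{
  return (x % d + d - euclid_rem<d>(y)) % d;
}
```

Unlike the functions described in the commit message, this version branches on the sign of the remainder, so it says nothing about the codegen improvements the patch reports.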
[gcc r13-8430] libstdc++: Fix std::basic_format_arg::handle for BasicFormatters
https://gcc.gnu.org/g:826f7e5ca3bddf3ff82bc52c09e84f5d35b24dbf commit r13-8430-g826f7e5ca3bddf3ff82bc52c09e84f5d35b24dbf Author: Jonathan Wakely Date: Wed Feb 28 15:05:08 2024 + libstdc++: Fix std::basic_format_arg::handle for BasicFormatters std::basic_format_arg::handle is supposed to format its value as const if that is valid, to reduce the number of instantiations of the formatter's format function. I made a silly typo so that it checks formattable_with<T, Context> not formattable_with<const T, Context>, which breaks support for BasicFormatters i.e. ones that can only format non-const types. There's a static_assert in the handle constructor which is supposed to improve diagnostics for trying to format a const argument with a formatter that doesn't support it. That condition can't fail, because the std::basic_format_arg constructor is already constrained to check that the argument type is formattable. The static_assert can be removed. libstdc++-v3/ChangeLog: * include/std/format (basic_format_arg::handle::__maybe_const_t): Fix condition to check if const type is formattable. (basic_format_arg::handle::handle(T&)): Remove redundant static_assert. * testsuite/std/format/formatter/basic.cc: New test. (cherry picked from commit 02ca9d3f0c5d2b0255df28f021834dd67ad79bc2) Diff: --- libstdc++-v3/include/std/format| 6 +- .../testsuite/std/format/formatter/basic.cc| 24 ++ 2 files changed, 25 insertions(+), 5 deletions(-) diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format index 7bcaddb3715..a938d65a7b9 100644 --- a/libstdc++-v3/include/std/format +++ b/libstdc++-v3/include/std/format @@ -2866,7 +2866,7 @@ namespace __format // Format as const if possible, to reduce instantiations.
template<typename _Tp> using __maybe_const_t - = __conditional_t<__formattable<_Tp>, const _Tp, _Tp>; + = __conditional_t<__formattable<const _Tp>, const _Tp, _Tp>; template static void @@ -2884,10 +2884,6 @@ namespace __format explicit handle(_Tp& __val) noexcept { - if constexpr (!__formattable<const _Tp>) - static_assert(!is_const_v<_Tp>, "std::format argument must be " - "non-const for this type"); - this->_M_ptr = __builtin_addressof(__val); auto __func = _S_format<__maybe_const_t<_Tp>>; this->_M_func = reinterpret_cast<void(*)()>(__func); diff --git a/libstdc++-v3/testsuite/std/format/formatter/basic.cc b/libstdc++-v3/testsuite/std/format/formatter/basic.cc new file mode 100644 index 000..56c18864135 --- /dev/null +++ b/libstdc++-v3/testsuite/std/format/formatter/basic.cc @@ -0,0 +1,24 @@ +// { dg-do compile { target c++20 } } + +// BasicFormatter requirements do not require a const parameter. + +#include <format> + +struct X { }; + +template<> struct std::formatter<X> +{ + constexpr auto parse(format_parse_context& ctx) + { return ctx.begin(); } + + // Takes non-const X& + format_context::iterator format(X&, format_context& ctx) const + { +auto out = ctx.out(); +*out++ = 'x'; +return out; + } +}; + +X x; +auto s = std::format("{}", x);
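The effect of the typo can be reproduced without the format library. The sketch below is an illustration only: ConstFriendly and NonConstOnly are hypothetical stand-ins for formatters, can_format is a hypothetical detection trait, and none of this is the std::formatter or libstdc++ API. It shows that probing the const-qualified type selects const only when the formatter can actually take it, whereas probing the unqualified type selects const even for a formatter that needs a non-const argument.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical stand-ins for formatters (not the std::formatter API).
struct ConstFriendly { void format(const int&) const {} };
struct NonConstOnly  { void format(int&) const {} };

// True when F::format can be called with an lvalue of type Arg.
template <class F, class Arg, class = void>
struct can_format : std::false_type {};

template <class F, class Arg>
struct can_format<F, Arg,
                  decltype(void(std::declval<const F&>().format(
                      std::declval<Arg&>())))>
  : std::true_type {};

// Fixed condition: probe the const-qualified type.
template <class F, class T>
using maybe_const_t =
  typename std::conditional<can_format<F, const T>::value, const T, T>::type;

// Condition with the typo: probes T, then selects const T anyway.
template <class F, class T>
using maybe_const_buggy_t =
  typename std::conditional<can_format<F, T>::value, const T, T>::type;

static_assert(std::is_same<maybe_const_t<ConstFriendly, int>, const int>::value,
              "const is used when the formatter accepts it");
static_assert(std::is_same<maybe_const_t<NonConstOnly, int>, int>::value,
              "non-const is kept when the formatter needs it");
static_assert(std::is_same<maybe_const_buggy_t<NonConstOnly, int>, const int>::value,
              "the typo selects const for a formatter that cannot take it");
```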
[gcc r13-8427] libstdc++: Improve operator-(weekday x, weekday y)
https://gcc.gnu.org/g:e3e5bdee78df9cb44803af6813e0eb10aa8341c0 commit r13-8427-ge3e5bdee78df9cb44803af6813e0eb10aa8341c0 Author: Cassio Neri Date: Tue Nov 14 00:27:39 2023 + libstdc++: Improve operator-(weekday x, weekday y) The current implementation calls __detail::__modulo which is relatively expensive. A better implementation is possible if we assume that x.ok() && y.ok() is true, so that n = x.c_encoding() - y.c_encoding() is in [-6, 6]. In this case, it suffices to return n >= 0 ? n : n + 7. The above is allowed by [time.cal.wd.nonmembers]/5: the returned value is unspecified when x.ok() && y.ok() is false. The assembly emitted for x86-64 and ARM can be seen in: https://godbolt.org/z/nMdc5vv9n. libstdc++-v3/ChangeLog: * include/std/chrono (operator-(const weekday&, const weekday&)): Optimize. (cherry picked from commit f71352c71d78ac977ea0e71a6900699a8cf09219) Diff: --- libstdc++-v3/include/std/chrono | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono index 57cc803f1af..c303eedd464 100644 --- a/libstdc++-v3/include/std/chrono +++ b/libstdc++-v3/include/std/chrono @@ -1049,8 +1049,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION friend constexpr days operator-(const weekday& __x, const weekday& __y) noexcept { - auto __n = static_cast<long long>(__x._M_wd) - __y._M_wd; - return days{__detail::__modulo(__n, 7)}; + const auto __n = __x.c_encoding() - __y.c_encoding(); + return static_cast<int>(__n) >= 0 ? days{__n} : days{__n + 7}; } };
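Because valid weekday encodings lie in [0, 6], the claim in the commit message can be verified over all 49 pairs. The following sketch assumes plain unsigned encodings instead of std::chrono::weekday objects; weekday_diff_reference, weekday_diff_fast and weekday_diffs_agree are hypothetical names. It compares the Euclidean remainder modulo 7 with the shortcut "n >= 0 ? n : n + 7" applied to a wrapped unsigned difference.

```cpp
#include <cassert>

// Reference: Euclidean remainder of (x - y) modulo 7 on signed values.
inline unsigned weekday_diff_reference(unsigned x, unsigned y)
{
  const int n = (static_cast<int>(x) - static_cast<int>(y)) % 7;
  return static_cast<unsigned>(n < 0 ? n + 7 : n);
}

// Shortcut in the style of the patch: unsigned subtraction, then a sign
// test on the wrapped value. Only meaningful for x and y in [0, 6].
inline unsigned weekday_diff_fast(unsigned x, unsigned y)
{
  const unsigned n = x - y;                    // wraps around when x < y
  return static_cast<int>(n) >= 0 ? n : n + 7; // n + 7 wraps back to [1, 6]
}

// Returns true when both versions agree for all valid encodings.
inline bool weekday_diffs_agree()
{
  for (unsigned x = 0; x <= 6; ++x)
    for (unsigned y = 0; y <= 6; ++y)
      if (weekday_diff_fast(x, y) != weekday_diff_reference(x, y))
        return false;
  return true;
}
```

For encodings outside [0, 6] the two functions may differ, which matches the commit message's remark that the result is unspecified in that case.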
[gcc r13-8431] libstdc++: Update expiry times for leap seconds lists
https://gcc.gnu.org/g:1870ee44351f182e8782238e9a6732e842eebf1d commit r13-8431-g1870ee44351f182e8782238e9a6732e842eebf1d Author: Jonathan Wakely Date: Fri Mar 1 20:55:10 2024 + libstdc++: Update expiry times for leap seconds lists The list in tzdb.cc isn't the only hardcoded list of leap seconds in the library; there's the one defined inline in <chrono> (to avoid loading the tzdb for the common case) and another in a testcase. This updates them to note that there are no new leap seconds in 2024 either, until at least 2024-12-28. libstdc++-v3/ChangeLog: * include/std/chrono (__get_leap_second_info): Update expiry time for hardcoded list of leap seconds. * testsuite/std/time/tzdb/leap_seconds.cc: Update comment. (cherry picked from commit ddd347fca0685804bf68d6c768282573f3ea6442) Diff: --- libstdc++-v3/include/std/chrono | 2 +- libstdc++-v3/testsuite/std/time/tzdb/leap_seconds.cc | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono index b2abf90cf71..edb782f6f10 100644 --- a/libstdc++-v3/include/std/chrono +++ b/libstdc++-v3/include/std/chrono @@ -3253,7 +3253,7 @@ namespace __detail }; // The list above is known to be valid until (at least) this date // and only contains positive leap seconds. - const sys_seconds __expires(1703721600s); // 2023-12-28 00:00:00 UTC + const sys_seconds __expires(1735344000s); // 2024-12-28 00:00:00 UTC #if _GLIBCXX_USE_CXX11_ABI || !
_GLIBCXX_USE_DUAL_ABI if (__ss > __expires) diff --git a/libstdc++-v3/testsuite/std/time/tzdb/leap_seconds.cc b/libstdc++-v3/testsuite/std/time/tzdb/leap_seconds.cc index d27038225c8..537fb0670ff 100644 --- a/libstdc++-v3/testsuite/std/time/tzdb/leap_seconds.cc +++ b/libstdc++-v3/testsuite/std/time/tzdb/leap_seconds.cc @@ -22,7 +22,7 @@ void test_load_leapseconds() { std::ofstream("leapseconds") << R"( -# These are all the real leap seconds as of 2022: +# These are all the real leap seconds as of 2024: Leap 1972Jun 30 23:59:60+ S Leap 1972Dec 31 23:59:60+ S Leap 1973Dec 31 23:59:60+ S
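The two constants in the diff above can be cross-checked against their date comments with a small calendar computation. This sketch does not use the library code touched by the patch: days_from_civil is the widely used civil-date-to-day-count algorithm, and utc_midnight_seconds is a hypothetical helper that ignores leap seconds, as Unix time does.

```cpp
#include <cassert>
#include <cstdint>

// Days since 1970-01-01 for a proleptic Gregorian date (y, m, d).
inline std::int64_t days_from_civil(std::int64_t y, unsigned m, unsigned d)
{
  y -= m <= 2;  // treat January and February as part of the previous year
  const std::int64_t era = (y >= 0 ? y : y - 399) / 400;
  const unsigned yoe = static_cast<unsigned>(y - era * 400);            // [0, 399]
  const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1; // [0, 365]
  const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;           // [0, 146096]
  return era * 146097 + static_cast<std::int64_t>(doe) - 719468;
}

// Seconds since the Unix epoch at 00:00:00 UTC on the given date.
inline std::int64_t utc_midnight_seconds(std::int64_t y, unsigned m, unsigned d)
{
  return days_from_civil(y, m, d) * 86400;
}
```

Evaluating utc_midnight_seconds(2023, 12, 28) and utc_midnight_seconds(2024, 12, 28) gives the old and new values of __expires.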
[gcc r13-8429] libstdc++: Implement P2905R2 "Runtime format strings" for C++20
https://gcc.gnu.org/g:3c8faeac3d03e032d55fae390618e577c292a83e commit r13-8429-g3c8faeac3d03e032d55fae390618e577c292a83e Author: Jonathan Wakely Date: Sun Jan 7 22:21:08 2024 + libstdc++: Implement P2905R2 "Runtime format strings" for C++20 This change makes std::make_format_args refuse to create dangling references to temporaries. This makes the std::vformat API safer. This was approved in Kona 2023 as a DR for C++20 so the change is implemented unconditionally. libstdc++-v3/ChangeLog: * include/bits/chrono_io.h (__formatter_chrono): Always use lvalue arguments to make_format_args. * include/std/format (make_format_args): Change parameter pack from forwarding references to lvalue references. Remove use of remove_reference_t which is now unnecessary. (format_to, formatted_size): Remove incorrect forwarding of arguments. * testsuite/20_util/duration/io.cc: Use lvalues as arguments to make_format_args. * testsuite/std/format/arguments/args.cc: Likewise. * testsuite/std/format/arguments/lwg3810.cc: Likewise. * testsuite/std/format/functions/format.cc: Likewise. * testsuite/std/format/functions/vformat_to.cc: Likewise. * testsuite/std/format/string.cc: Likewise. * testsuite/std/time/day/io.cc: Likewise. * testsuite/std/time/month/io.cc: Likewise. * testsuite/std/time/weekday/io.cc: Likewise. * testsuite/std/time/year/io.cc: Likewise. * testsuite/std/time/year_month_day/io.cc: Likewise. * testsuite/std/format/arguments/args_neg.cc: New test. 
(cherry picked from commit 2a8ee2592e48735d88df786cbafa6b0da39fc4d6) Diff: --- libstdc++-v3/include/bits/chrono_io.h | 15 +++ libstdc++-v3/include/std/format| 30 +++--- libstdc++-v3/testsuite/20_util/duration/io.cc | 3 ++- .../testsuite/std/format/arguments/args.cc | 26 ++- .../testsuite/std/format/arguments/args_neg.cc | 12 + .../testsuite/std/format/arguments/lwg3810.cc | 8 -- .../testsuite/std/format/functions/format.cc | 6 +++-- .../testsuite/std/format/functions/vformat_to.cc | 9 +-- libstdc++-v3/testsuite/std/format/string.cc| 7 +++-- libstdc++-v3/testsuite/std/time/day/io.cc | 4 +-- libstdc++-v3/testsuite/std/time/month/io.cc| 4 +-- libstdc++-v3/testsuite/std/time/weekday/io.cc | 4 +-- libstdc++-v3/testsuite/std/time/year/io.cc | 4 +-- .../testsuite/std/time/year_month_day/io.cc| 4 +-- 14 files changed, 91 insertions(+), 45 deletions(-) diff --git a/libstdc++-v3/include/bits/chrono_io.h b/libstdc++-v3/include/bits/chrono_io.h index c42797f64c4..1c08130bf65 100644 --- a/libstdc++-v3/include/bits/chrono_io.h +++ b/libstdc++-v3/include/bits/chrono_io.h @@ -2195,7 +2195,8 @@ namespace chrono _Str __s = _GLIBCXX_WIDEN("{:02d} is not a valid day"); if (__d.ok()) __s = __s.substr(0, 6); - __os << std::vformat(__s, make_format_args<_Ctx>((unsigned)__d)); + auto __u = (unsigned)__d; + __os << std::vformat(__s, make_format_args<_Ctx>(__u)); return __os; } @@ -2213,8 +2214,10 @@ namespace chrono __os << std::vformat(__os.getloc(), __s.substr(0, 6), make_format_args<_Ctx>(__m)); else - __os << std::vformat(__s.substr(6), -make_format_args<_Ctx>((unsigned)__m)); + { + auto __u = (unsigned)__m; + __os << std::vformat(__s.substr(6), make_format_args<_Ctx>(__u)); + } return __os; } @@ -2253,8 +2256,10 @@ namespace chrono __os << std::vformat(__os.getloc(), __s.substr(0, 6), make_format_args<_Ctx>(__wd)); else - __os << std::vformat(__s.substr(6), -make_format_args<_Ctx>(__wd.c_encoding())); + { + auto __c = __wd.c_encoding(); + __os << std::vformat(__s.substr(6), 
make_format_args<_Ctx>(__c)); + } return __os; } diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format index 807c97680c6..7bcaddb3715 100644 --- a/libstdc++-v3/include/std/format +++ b/libstdc++-v3/include/std/format @@ -3117,7 +3117,7 @@ namespace __format template friend auto - make_format_args(_Argz&&...) noexcept; + make_format_args(_Argz&...) noexcept; template friend decltype(auto) @@ -3287,7 +3287,7 @@ namespace __format template friend auto - make_format_args(_Args&&...) noexcept; + make_format_args(_Args&...) noexcept; // An array of _Arg_t enums corresponding to _Args... template @@ -3325,7 +3325,7 @@ namespace __format template
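The signature change from forwarding references to lvalue references is what makes temporaries unusable as arguments. The sketch below illustrates that mechanism only: make_args_sketch and the accepted trait are hypothetical, and the returned tuple of references is a loose stand-in for an argument store, not the type returned by std::make_format_args.

```cpp
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a function that stores references to its
// arguments. Taking Args&... means an rvalue such as a temporary int
// cannot bind, so no dangling reference can be created from it.
template <class... Args>
std::tuple<const Args&...> make_args_sketch(Args&... args) noexcept
{
  return std::tuple<const Args&...>(args...);
}

// True when make_args_sketch can be called with an expression of type T
// (T = int& models an lvalue, T = int models a temporary).
template <class T, class = void>
struct accepted : std::false_type {};

template <class T>
struct accepted<T, decltype(void(make_args_sketch(std::declval<T>())))>
  : std::true_type {};

static_assert(accepted<int&>::value, "lvalues are accepted");
static_assert(!accepted<int>::value, "temporaries are rejected");
```

This is why the call sites in the diff above first copy the value into a named local variable (for example __u or __c) and then pass that variable.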
[gcc r14-9441] libgomp/libgomp.texi: Fix @node order in @menu
https://gcc.gnu.org/g:ef79c64cb5762c86ee04ddfcedb7fe31eaa3bac8 commit r14-9441-gef79c64cb5762c86ee04ddfcedb7fe31eaa3bac8 Author: Tobias Burnus Date: Tue Mar 12 15:42:50 2024 +0100 libgomp/libgomp.texi: Fix @node order in @menu While texinfo 7.0.3 does not warn, an older texinfo did complain about: libgomp.texi:1964: warning: node next `omp_target_memcpy' in menu `omp_target_memcpy_rect' and in sectioning `omp_target_memcpy_async' differ libgomp/ * libgomp.texi (Device Memory Routines): Swap item order to match the order of the '@node's of the '@subsection's. Diff: --- libgomp/libgomp.texi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi index bf5c7a76fc9..57165e0e981 100644 --- a/libgomp/libgomp.texi +++ b/libgomp/libgomp.texi @@ -1783,8 +1783,8 @@ pointers on devices. They have C linkage and do not throw exceptions. * omp_target_is_present:: Check whether storage is mapped * omp_target_is_accessible:: Check whether memory is device accessible * omp_target_memcpy:: Copy data between devices -* omp_target_memcpy_rect:: Copy a subvolume of data between devices * omp_target_memcpy_async:: Copy data between devices asynchronously +* omp_target_memcpy_rect:: Copy a subvolume of data between devices * omp_target_memcpy_rect_async:: Copy a subvolume of data between devices asynchronously @c * omp_target_memset:: /TR12 @c * omp_target_memset_async:: /TR12
[gcc r14-9442] Fortran: handle procedure pointer component in DT array [PR110826]
https://gcc.gnu.org/g:81ee1298b47d3f3b3712ef3f3b2929ca26c4bcd2 commit r14-9442-g81ee1298b47d3f3b3712ef3f3b2929ca26c4bcd2 Author: Harald Anlauf Date: Mon Mar 11 22:05:51 2024 +0100 Fortran: handle procedure pointer component in DT array [PR110826] gcc/fortran/ChangeLog: PR fortran/110826 * array.cc (gfc_array_dimen_size): When walking the ref chain of an array and the ultimate component is a procedure pointer, do not try to figure out its dimension even if it is an array-valued function. gcc/testsuite/ChangeLog: PR fortran/110826 * gfortran.dg/proc_ptr_comp_53.f90: New test. Diff: --- gcc/fortran/array.cc | 7 + gcc/testsuite/gfortran.dg/proc_ptr_comp_53.f90 | 43 ++ 2 files changed, 50 insertions(+) diff --git a/gcc/fortran/array.cc b/gcc/fortran/array.cc index 3a6e3a7c95b..e9934f1491b 100644 --- a/gcc/fortran/array.cc +++ b/gcc/fortran/array.cc @@ -2597,6 +2597,13 @@ gfc_array_dimen_size (gfc_expr *array, int dimen, mpz_t *result) case EXPR_FUNCTION: for (ref = array->ref; ref; ref = ref->next) { + /* Ultimate component is a procedure pointer. */ + if (ref->type == REF_COMPONENT + && !ref->next + && ref->u.c.component->attr.function + && IS_PROC_POINTER (ref->u.c.component)) + return false; + if (ref->type != REF_ARRAY) continue; diff --git a/gcc/testsuite/gfortran.dg/proc_ptr_comp_53.f90 b/gcc/testsuite/gfortran.dg/proc_ptr_comp_53.f90 new file mode 100644 index 000..affb5922235 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/proc_ptr_comp_53.f90 @@ -0,0 +1,43 @@ +! { dg-do compile } +!
PR fortran/110826 - procedure pointer component in DT array + +module m + implicit none + + type pp +procedure(func_template), pointer, nopass :: f =>null() + end type pp + + abstract interface + function func_template(state) result(dstate) + implicit none + real, dimension(:,:), intent(in) :: state + real, dimension(size(state,1), size(state,2)) :: dstate + end function + end interface + +contains + + function zero_state(state) result(dstate) +real, dimension(:,:), intent(in) :: state +real, dimension(size(state,1), size(state,2)) :: dstate +dstate = 0. + end function zero_state + +end module m + +program test_func_array + use m + implicit none + + real, dimension(4,6) :: state + type(pp) :: func_scalar + type(pp) :: func_array(4) + + func_scalar %f => zero_state + func_array(1)%f => zero_state + print *, func_scalar %f(state) + print *, func_array(1)%f(state) + if (.not. all (shape (func_scalar %f(state)) == shape (state))) stop 1 + if (.not. all (shape (func_array(1)%f(state)) == shape (state))) stop 2 +end program test_func_array
[gcc/meissner/heads/work162-ajit] (16 commits) Merge commit 'refs/users/meissner/heads/work162-ajit' of gi
The branch 'meissner/heads/work162-ajit' was updated to point to: cc383c3f802... Merge commit 'refs/users/meissner/heads/work162-ajit' of gi It previously pointed to: 5d73f63c135... Merge commit 'refs/users/meissner/heads/work162-ajit' of gi Diff: Summary of changes (added commits): --- cc383c3... Merge commit 'refs/users/meissner/heads/work162-ajit' of gi a3aa724... Add ChangeLog.ajit and update REVISION. 0746a20... Add -mcpu=future tuning support. (*) 5180f01... Add -mcpu=future support. (*) c87c9fa... Add -mcpu=power11 tests. (*) 34edbd6... Add -mcpu=power11 tuning support. (*) a2e7314... Add -mcpu=power11 support. (*) ae3aa18... Revert all changes (*) 6712015... Add -mcpu=future support part 3. (*) ea0193b... Add -mcpu=future support part 2 (*) 522bd06... Add -mcpu=future support. (*) 35fe360... Add -mcpu=power11 tests. (*) 8939ee2... Add -mcpu=power11 support part 3. (*) a09c97d... Add -mcpu=power11 support part 2 (*) 79df8d6... Add -mcpu=power11 support. (*) 448253a... Revert some changes (*) (*) This commit already exists in another branch. Because the reference `refs/users/meissner/heads/work162-ajit' matches your hooks.email-new-commits-only configuration, no separate email is sent for this commit.
[gcc(refs/users/meissner/heads/work162-ajit)] Add ChangeLog.ajit and update REVISION.
https://gcc.gnu.org/g:a3aa724cc83ce2f56cfaa04fa6b3ccd19674eb98 commit a3aa724cc83ce2f56cfaa04fa6b3ccd19674eb98 Author: Michael Meissner Date: Thu Mar 7 11:08:43 2024 -0500 Add ChangeLog.ajit and update REVISION. 2024-03-07 Michael Meissner gcc/ * ChangeLog.ajit: New file for branch. * REVISION: Update. Diff: --- gcc/ChangeLog.ajit | 6 ++ gcc/REVISION | 2 +- 2 files changed, 7 insertions(+), 1 deletion(-) diff --git a/gcc/ChangeLog.ajit b/gcc/ChangeLog.ajit new file mode 100644 index 000..eb5570e2484 --- /dev/null +++ b/gcc/ChangeLog.ajit @@ -0,0 +1,6 @@ + Branch work162-ajit, baseline + +2024-03-07 Michael Meissner + + Clone branch + diff --git a/gcc/REVISION b/gcc/REVISION index 1f2b7b56b83..1fa4bd9178d 100644 --- a/gcc/REVISION +++ b/gcc/REVISION @@ -1 +1 @@ -work162 branch +work162-ajit branch
[gcc(refs/users/meissner/heads/work162-ajit)] Merge commit 'refs/users/meissner/heads/work162-ajit' of git+ssh://gcc.gnu.org/git/gcc into me/work1
https://gcc.gnu.org/g:cc383c3f802f476cbb7d89df27de371a5bffa0ca commit cc383c3f802f476cbb7d89df27de371a5bffa0ca Merge: a3aa724cc83 5d73f63c135 Author: Michael Meissner Date: Tue Mar 12 17:51:42 2024 -0400 Merge commit 'refs/users/meissner/heads/work162-ajit' of git+ssh://gcc.gnu.org/git/gcc into me/work162-ajit Diff:
[gcc/meissner/heads/work162-dmf] (16 commits) Merge commit 'refs/users/meissner/heads/work162-dmf' of git
The branch 'meissner/heads/work162-dmf' was updated to point to: 6d46ee8dac6... Merge commit 'refs/users/meissner/heads/work162-dmf' of git It previously pointed to: f8660bb40a9... Merge commit 'refs/users/meissner/heads/work162-dmf' of git Diff: Summary of changes (added commits): --- 6d46ee8... Merge commit 'refs/users/meissner/heads/work162-dmf' of git 1aef508... Add ChangeLog.dmf and update REVISION. 0746a20... Add -mcpu=future tuning support. (*) 5180f01... Add -mcpu=future support. (*) c87c9fa... Add -mcpu=power11 tests. (*) 34edbd6... Add -mcpu=power11 tuning support. (*) a2e7314... Add -mcpu=power11 support. (*) ae3aa18... Revert all changes (*) 6712015... Add -mcpu=future support part 3. (*) ea0193b... Add -mcpu=future support part 2 (*) 522bd06... Add -mcpu=future support. (*) 35fe360... Add -mcpu=power11 tests. (*) 8939ee2... Add -mcpu=power11 support part 3. (*) a09c97d... Add -mcpu=power11 support part 2 (*) 79df8d6... Add -mcpu=power11 support. (*) 448253a... Revert some changes (*) (*) This commit already exists in another branch. Because the reference `refs/users/meissner/heads/work162-dmf' matches your hooks.email-new-commits-only configuration, no separate email is sent for this commit.
[gcc(refs/users/meissner/heads/work162-dmf)] Add ChangeLog.dmf and update REVISION.
https://gcc.gnu.org/g:1aef508da5a8c94561e0805d3f91a9a0ca2722c1 commit 1aef508da5a8c94561e0805d3f91a9a0ca2722c1 Author: Michael Meissner Date: Thu Mar 7 11:05:59 2024 -0500 Add ChangeLog.dmf and update REVISION. 2024-03-07 Michael Meissner gcc/ * ChangeLog.dmf: New file for branch. * REVISION: Update. Diff: --- gcc/ChangeLog.dmf | 6 ++ gcc/REVISION | 2 +- 2 files changed, 7 insertions(+), 1 deletion(-) diff --git a/gcc/ChangeLog.dmf b/gcc/ChangeLog.dmf new file mode 100644 index 000..4bf550e6556 --- /dev/null +++ b/gcc/ChangeLog.dmf @@ -0,0 +1,6 @@ + Branch work162-dmf, baseline + +2024-03-07 Michael Meissner + + Clone branch + diff --git a/gcc/REVISION b/gcc/REVISION index 1f2b7b56b83..58945f7a1ad 100644 --- a/gcc/REVISION +++ b/gcc/REVISION @@ -1 +1 @@ -work162 branch +work162-dmf branch
[gcc(refs/users/meissner/heads/work162-dmf)] Merge commit 'refs/users/meissner/heads/work162-dmf' of git+ssh://gcc.gnu.org/git/gcc into me/work16
https://gcc.gnu.org/g:6d46ee8dac66b56b73f3470eabb6f19dd7de162d commit 6d46ee8dac66b56b73f3470eabb6f19dd7de162d Merge: 1aef508da5a f8660bb40a9 Author: Michael Meissner Date: Tue Mar 12 17:58:00 2024 -0400 Merge commit 'refs/users/meissner/heads/work162-dmf' of git+ssh://gcc.gnu.org/git/gcc into me/work162-dmf Diff:
[gcc/meissner/heads/work162-test] (16 commits) Merge commit 'refs/users/meissner/heads/work162-test' of gi
The branch 'meissner/heads/work162-test' was updated to point to: 1fd55caa3fe... Merge commit 'refs/users/meissner/heads/work162-test' of gi It previously pointed to: f8f47c34771... Merge commit 'refs/users/meissner/heads/work162-test' of gi Diff: Summary of changes (added commits): --- 1fd55ca... Merge commit 'refs/users/meissner/heads/work162-test' of gi 6b796c8... Add ChangeLog.test and update REVISION. 0746a20... Add -mcpu=future tuning support. (*) 5180f01... Add -mcpu=future support. (*) c87c9fa... Add -mcpu=power11 tests. (*) 34edbd6... Add -mcpu=power11 tuning support. (*) a2e7314... Add -mcpu=power11 support. (*) ae3aa18... Revert all changes (*) 6712015... Add -mcpu=future support part 3. (*) ea0193b... Add -mcpu=future support part 2 (*) 522bd06... Add -mcpu=future support. (*) 35fe360... Add -mcpu=power11 tests. (*) 8939ee2... Add -mcpu=power11 support part 3. (*) a09c97d... Add -mcpu=power11 support part 2 (*) 79df8d6... Add -mcpu=power11 support. (*) 448253a... Revert some changes (*) (*) This commit already exists in another branch. Because the reference `refs/users/meissner/heads/work162-test' matches your hooks.email-new-commits-only configuration, no separate email is sent for this commit.
[gcc(refs/users/meissner/heads/work162-test)] Add ChangeLog.test and update REVISION.
https://gcc.gnu.org/g:6b796c8c6e10d3991dae549a4ba6e3a3bed22bf1 commit 6b796c8c6e10d3991dae549a4ba6e3a3bed22bf1 Author: Michael Meissner Date: Thu Mar 7 11:09:39 2024 -0500 Add ChangeLog.test and update REVISION. 2024-03-07 Michael Meissner gcc/ * ChangeLog.test: New file for branch. * REVISION: Update. Diff: --- gcc/ChangeLog.test | 6 ++ gcc/REVISION | 2 +- 2 files changed, 7 insertions(+), 1 deletion(-) diff --git a/gcc/ChangeLog.test b/gcc/ChangeLog.test new file mode 100644 index 000..9512bcddaf9 --- /dev/null +++ b/gcc/ChangeLog.test @@ -0,0 +1,6 @@ + Branch work162-test, baseline + +2024-03-07 Michael Meissner + + Clone branch + diff --git a/gcc/REVISION b/gcc/REVISION index 1f2b7b56b83..6bf4941fb03 100644 --- a/gcc/REVISION +++ b/gcc/REVISION @@ -1 +1 @@ -work162 branch +work162-test branch
[gcc(refs/users/meissner/heads/work162-test)] Merge commit 'refs/users/meissner/heads/work162-test' of git+ssh://gcc.gnu.org/git/gcc into me/work1
https://gcc.gnu.org/g:1fd55caa3fe37bc777e52181c9b01f51d1f50ac3 commit 1fd55caa3fe37bc777e52181c9b01f51d1f50ac3 Merge: 6b796c8c6e1 f8f47c34771 Author: Michael Meissner Date: Tue Mar 12 18:03:24 2024 -0400 Merge commit 'refs/users/meissner/heads/work162-test' of git+ssh://gcc.gnu.org/git/gcc into me/work162-test Diff:
[gcc/meissner/heads/work162-vpair] (16 commits) Merge commit 'refs/users/meissner/heads/work162-vpair' of g
The branch 'meissner/heads/work162-vpair' was updated to point to: 3ca2a9f1c96... Merge commit 'refs/users/meissner/heads/work162-vpair' of g It previously pointed to: ed10bc0b1be... Merge commit 'refs/users/meissner/heads/work162-vpair' of g Diff: Summary of changes (added commits): --- 3ca2a9f... Merge commit 'refs/users/meissner/heads/work162-vpair' of g e73aa4f... Add ChangeLog.vpair and update REVISION. 0746a20... Add -mcpu=future tuning support. (*) 5180f01... Add -mcpu=future support. (*) c87c9fa... Add -mcpu=power11 tests. (*) 34edbd6... Add -mcpu=power11 tuning support. (*) a2e7314... Add -mcpu=power11 support. (*) ae3aa18... Revert all changes (*) 6712015... Add -mcpu=future support part 3. (*) ea0193b... Add -mcpu=future support part 2 (*) 522bd06... Add -mcpu=future support. (*) 35fe360... Add -mcpu=power11 tests. (*) 8939ee2... Add -mcpu=power11 support part 3. (*) a09c97d... Add -mcpu=power11 support part 2 (*) 79df8d6... Add -mcpu=power11 support. (*) 448253a... Revert some changes (*) (*) This commit already exists in another branch. Because the reference `refs/users/meissner/heads/work162-vpair' matches your hooks.email-new-commits-only configuration, no separate email is sent for this commit.
[gcc(refs/users/meissner/heads/work162-vpair)] Add ChangeLog.vpair and update REVISION.
https://gcc.gnu.org/g:e73aa4f8e5fc9de58d5aaca7c290b0a9d516664f commit e73aa4f8e5fc9de58d5aaca7c290b0a9d516664f Author: Michael Meissner Date: Thu Mar 7 11:07:00 2024 -0500 Add ChangeLog.vpair and update REVISION. 2024-03-07 Michael Meissner gcc/ * ChangeLog.vpair: New file for branch. * REVISION: Update. Diff: --- gcc/ChangeLog.vpair | 6 ++ gcc/REVISION| 2 +- 2 files changed, 7 insertions(+), 1 deletion(-) diff --git a/gcc/ChangeLog.vpair b/gcc/ChangeLog.vpair new file mode 100644 index 000..faeb03cac7b --- /dev/null +++ b/gcc/ChangeLog.vpair @@ -0,0 +1,6 @@ + Branch work162-vpair, baseline + +2024-03-07 Michael Meissner + + Clone branch + diff --git a/gcc/REVISION b/gcc/REVISION index 1f2b7b56b83..5f53efe48c3 100644 --- a/gcc/REVISION +++ b/gcc/REVISION @@ -1 +1 @@ -work162 branch +work162-vpair branch
[gcc(refs/users/meissner/heads/work162-vpair)] Merge commit 'refs/users/meissner/heads/work162-vpair' of git+ssh://gcc.gnu.org/git/gcc into me/work
https://gcc.gnu.org/g:3ca2a9f1c968d61a4de44f410f89b4f98fefd2f2 commit 3ca2a9f1c968d61a4de44f410f89b4f98fefd2f2 Merge: e73aa4f8e5f ed10bc0b1be Author: Michael Meissner Date: Tue Mar 12 18:07:04 2024 -0400 Merge commit 'refs/users/meissner/heads/work162-vpair' of git+ssh://gcc.gnu.org/git/gcc into me/work162-vpair Diff:
[gcc(refs/users/meissner/heads/work162-vpair)] Power10: Add options to disable load and store vector pair.
https://gcc.gnu.org/g:8135a35053e1bf1723ef225a3d75c19a0684f6f2 commit 8135a35053e1bf1723ef225a3d75c19a0684f6f2 Author: Michael Meissner Date: Tue Mar 12 20:09:21 2024 -0400 Power10: Add options to disable load and store vector pair. In working on some future patches that involve utilizing vector pair instructions, I wanted to be able to tune my program to enable or disable using the vector pair load or store operations while still keeping the other operations on the vector pair. This patch adds two undocumented tuning options. The -mno-load-vector-pair option would tell GCC to generate two load vector instructions instead of a single load vector pair. The -mno-store-vector-pair option would tell GCC to generate two store vector instructions instead of a single store vector pair. If either -mno-load-vector-pair or -mno-store-vector-pair is used, GCC will not generate the indexed lxvpx or stxvpx instructions. The reason for this is to enable splitting the {,p}lxvp or {,p}stxvp instructions after reload without needing a scratch GPR register. The default for -mcpu=power10 is that both load vector pair and store vector pair are enabled. I added code so that the user code can modify these settings using either a '#pragma GCC target' directive or __attribute__((__target__(...))) in the function declaration. I added tests for the switches, #pragma, and attribute options. I have built this on both little endian power10 systems and big endian power9 systems doing the normal bootstrap and test. There were no regressions in any of the tests, and the new tests passed. Can I check this patch into the master branch? 2024-03-12 Michael Meissner gcc/ * config/rs6000/mma.md (movoo): Add support for -mno-load-vector-pair and -mno-store-vector-pair. * config/rs6000/rs6000-cpus.def (OTHER_POWER10_MASKS): Add support for -mload-vector-pair and -mstore-vector-pair. (POWERPC_MASKS): Likewise. 
* config/rs6000/rs6000.cc (rs6000_setup_reg_addr_masks): Only allow indexed mode for OOmode if we are generating both load vector pair and store vector pair instructions. (rs6000_option_override_internal): Add support for -mno-load-vector-pair and -mno-store-vector-pair. (rs6000_opt_masks): Likewise. * config/rs6000/rs6000.md (isa attribute): Add lxvp and stxvp attributes. (enabled attribute): Likewise. * config/rs6000/rs6000.opt (-mload-vector-pair): New option. (-mstore-vector-pair): Likewise. gcc/testsuite/ * gcc.target/powerpc/vector-pair-attribute.c: New test. * gcc.target/powerpc/vector-pair-pragma.c: New test. * gcc.target/powerpc/vector-pair-switch1.c: New test. * gcc.target/powerpc/vector-pair-switch2.c: New test. * gcc.target/powerpc/vector-pair-switch3.c: New test. * gcc.target/powerpc/vector-pair-switch4.c: New test. Diff: --- gcc/config/rs6000/mma.md | 19 +--- gcc/config/rs6000/rs6000-cpus.def | 8 +++- gcc/config/rs6000/rs6000.cc| 30 +++- gcc/config/rs6000/rs6000.md| 10 +++- gcc/config/rs6000/rs6000.opt | 8 .../gcc.target/powerpc/vector-pair-attribute.c | 39 +++ .../gcc.target/powerpc/vector-pair-pragma.c| 55 ++ .../gcc.target/powerpc/vector-pair-switch1.c | 16 +++ .../gcc.target/powerpc/vector-pair-switch2.c | 17 +++ .../gcc.target/powerpc/vector-pair-switch3.c | 17 +++ .../gcc.target/powerpc/vector-pair-switch4.c | 17 +++ 11 files changed, 225 insertions(+), 11 deletions(-) diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md index 04e2d0066df..6a7d8a836db 100644 --- a/gcc/config/rs6000/mma.md +++ b/gcc/config/rs6000/mma.md @@ -292,27 +292,34 @@ gcc_assert (false); }) +;; If the user used -mno-store-vector-pair or -mno-load-vector pair, use an +;; alternative that does not allow indexed addresses so we can split the load +;; or store. 
(define_insn_and_split "*movoo" - [(set (match_operand:OO 0 "nonimmediate_operand" "=wa,ZwO,wa") - (match_operand:OO 1 "input_operand" "ZwO,wa,wa"))] + [(set (match_operand:OO 0 "nonimmediate_operand" "=wa,wa,ZwO,QwO,wa") + (match_operand:OO 1 "input_operand" "ZwO,QwO,wa,wa,wa"))] "TARGET_MMA && (gpc_reg_operand (operands[0], OOmode) || gpc_reg_operand (operands[1], OOmode))" "@ lxvp%X1 %x0,%1 + # stxvp%X0 %x1,%0 + # #" "&& reload_completed - && (!MEM_P (operands[0]) && !MEM_P (ope
[gcc(refs/users/meissner/heads/work162-vpair)] Peter's patches for subreg support.
https://gcc.gnu.org/g:5368f7b97a553d623c5787b3b6d71505732a9c47 commit 5368f7b97a553d623c5787b3b6d71505732a9c47 Author: Michael Meissner Date: Tue Mar 12 20:18:26 2024 -0400 Peter's patches for subreg support. 2024-03-12 Peter Bergner gcc/ PR target/109116 * gcc/config/rs6000/rs6000.cc (rs6000_modes_tieable_p): Make OOmode tieable with 128-bit vector modes. 2024-01-23 Peter Bergner gcc/ PR target/109116 * gcc/config/rs6000/mma.md (vsx_disassemble_pair): Use SUBREG's instead of UNSPEC's. (mma_disassemble_acc): Likewise. Diff: --- gcc/config/rs6000/mma.md| 50 - gcc/config/rs6000/rs6000.cc | 9 +--- 2 files changed, 10 insertions(+), 49 deletions(-) diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md index 6a7d8a836db..831e646c473 100644 --- a/gcc/config/rs6000/mma.md +++ b/gcc/config/rs6000/mma.md @@ -405,29 +405,8 @@ (match_operand 2 "const_0_to_1_operand")] "TARGET_MMA" { - rtx src; - int regoff = INTVAL (operands[2]); - src = gen_rtx_UNSPEC (V16QImode, - gen_rtvec (2, operands[1], GEN_INT (regoff)), - UNSPEC_MMA_EXTRACT); - emit_move_insn (operands[0], src); - DONE; -}) - -(define_insn_and_split "*vsx_disassemble_pair" - [(set (match_operand:V16QI 0 "mma_disassemble_output_operand" "=mwa") - (unspec:V16QI [(match_operand:OO 1 "vsx_register_operand" "wa") - (match_operand 2 "const_0_to_1_operand")] - UNSPEC_MMA_EXTRACT))] - "TARGET_MMA - && vsx_register_operand (operands[1], OOmode)" - "#" - "&& reload_completed" - [(const_int 0)] -{ - int reg = REGNO (operands[1]); - int regoff = INTVAL (operands[2]); - rtx src = gen_rtx_REG (V16QImode, reg + regoff); + int regoff = INTVAL (operands[2]) * GET_MODE_SIZE (V16QImode); + rtx src = simplify_gen_subreg (V16QImode, operands[1], OOmode, regoff); emit_move_insn (operands[0], src); DONE; }) @@ -479,29 +458,8 @@ (match_operand 2 "const_0_to_3_operand")] "TARGET_MMA" { - rtx src; - int regoff = INTVAL (operands[2]); - src = gen_rtx_UNSPEC (V16QImode, - gen_rtvec (2, operands[1], GEN_INT (regoff)), - 
UNSPEC_MMA_EXTRACT); - emit_move_insn (operands[0], src); - DONE; -}) - -(define_insn_and_split "*mma_disassemble_acc" - [(set (match_operand:V16QI 0 "mma_disassemble_output_operand" "=mwa") - (unspec:V16QI [(match_operand:XO 1 "fpr_reg_operand" "d") - (match_operand 2 "const_0_to_3_operand")] - UNSPEC_MMA_EXTRACT))] - "TARGET_MMA - && fpr_reg_operand (operands[1], XOmode)" - "#" - "&& reload_completed" - [(const_int 0)] -{ - int reg = REGNO (operands[1]); - int regoff = INTVAL (operands[2]); - rtx src = gen_rtx_REG (V16QImode, reg + regoff); + int regoff = INTVAL (operands[2]) * GET_MODE_SIZE (V16QImode); + rtx src = simplify_gen_subreg (V16QImode, operands[1], XOmode, regoff); emit_move_insn (operands[0], src); DONE; }) diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index 08198fa9fdf..e37e0a74ebe 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -1975,9 +1975,12 @@ rs6000_hard_regno_mode_ok (unsigned int regno, machine_mode mode) static bool rs6000_modes_tieable_p (machine_mode mode1, machine_mode mode2) { - if (mode1 == PTImode || mode1 == OOmode || mode1 == XOmode - || mode2 == PTImode || mode2 == OOmode || mode2 == XOmode) -return mode1 == mode2; + if (mode1 == PTImode || mode1 == OOmode || mode1 == XOmode + || mode2 == PTImode || mode2 == XOmode) + return mode1 == mode2; + + if (mode2 == OOmode) +return ALTIVEC_OR_VSX_VECTOR_MODE (mode1); if (ALTIVEC_OR_VSX_VECTOR_MODE (mode1)) return ALTIVEC_OR_VSX_VECTOR_MODE (mode2);
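The new expander bodies above turn the piece index in operands[2] into a byte offset (index times the size of V16QImode) before asking for a subreg of the pair or accumulator. A minimal value-level sketch of that arithmetic in plain C follows. The type and function names (vec128, vec_pair, piece_offset, pair_extract) are hypothetical stand-ins, not GCC identifiers, and the sketch assumes a simple memory-order layout; it does not model how subregs map onto hard registers or any endianness adjustment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: a 128-bit vector is 16 bytes, a vector pair is 32.  */
enum { VEC_BYTES = 16 };

typedef struct { unsigned char bytes[VEC_BYTES]; } vec128;        /* stands in for V16QImode */
typedef struct { unsigned char bytes[2 * VEC_BYTES]; } vec_pair;  /* stands in for OOmode */

/* Byte offset of piece INDEX, mirroring
   "INTVAL (operands[2]) * GET_MODE_SIZE (V16QImode)" in the patch.  */
size_t piece_offset (int index)
{
  return (size_t) index * VEC_BYTES;
}

/* Copy one 16-byte piece out of the pair.  Only indexes 0 and 1 are
   valid, matching the const_0_to_1_operand predicate.  */
vec128 pair_extract (const vec_pair *pair, int index)
{
  vec128 out;
  assert (index >= 0 && index <= 1);
  memcpy (out.bytes, pair->bytes + piece_offset (index), VEC_BYTES);
  return out;
}
```

The same offset computation would cover the accumulator case with indexes 0 to 3 and a 64-byte source.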
[gcc(refs/users/meissner/heads/work162-vpair)] Add support for vector pair unary and binary operations.
https://gcc.gnu.org/g:581eca4771f36cbcb9241daeef266dbe36ed27a8 commit 581eca4771f36cbcb9241daeef266dbe36ed27a8 Author: Michael Meissner Date: Tue Mar 12 20:23:48 2024 -0400 Add support for vector pair unary and binary operations. 2024-03-12 Michael Meissner gcc/ * config/rs6000/rs6000-builtins.def (__builtin_vpair_*): Add new built-in functions for vector pair support. * config/rs6000/rs6000-protos.h (enum vpair_split_unary): New enumeration. (vpair_split_unary): New declaration. (vpair_split_binary): Likewise. * config/rs6000/rs6000.cc (vpair_split_unary): New function to split vector pair operations. (vpair_split_binary): Likewise. * config/rs6000/rs6000.md (toplevel): Include vector-pair.md. * config/rs6000/t-rs6000 (MD_INCLUDES): Add vector-pair.md. * config/rs6000/vector-pair.md: New file. * doc/extend.texi (PowerPC Vector Pair Built-in Functions): Add documentation for the new vector pair built-in functions. gcc/testsuite/ * gcc.target/powerpc/vector-pair-1.c: New test. * gcc.target/powerpc/vector-pair-2.c: Likewise. 
Diff: --- gcc/config/rs6000/rs6000-builtins.def| 56 gcc/config/rs6000/rs6000-protos.h| 12 ++ gcc/config/rs6000/rs6000.cc | 67 ++ gcc/config/rs6000/rs6000.md | 1 + gcc/config/rs6000/t-rs6000 | 1 + gcc/config/rs6000/vector-pair.md | 160 +++ gcc/doc/extend.texi | 51 gcc/testsuite/gcc.target/powerpc/vector-pair-1.c | 87 gcc/testsuite/gcc.target/powerpc/vector-pair-2.c | 86 9 files changed, 521 insertions(+) diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def index 3bc7fed6956..83e7206e989 100644 --- a/gcc/config/rs6000/rs6000-builtins.def +++ b/gcc/config/rs6000/rs6000-builtins.def @@ -4131,3 +4131,59 @@ void __builtin_vsx_stxvp (v256, unsigned long, const v256 *); STXVP nothing {mma,pair} + +;; Vector pair built-in functions with float elements + v256 __builtin_vpair_f32_abs (v256); +VPAIR_F32_ABS vpair_abs_v8sf2 {mma} + + v256 __builtin_vpair_f32_add (v256, v256); +VPAIR_F32_ADD vpair_add_v8sf3 {mma} + + v256 __builtin_vpair_f32_div (v256, v256); +VPAIR_F32_DIV vpair_div_v8sf3 {mma} + + v256 __builtin_vpair_f32_max (v256, v256); +VPAIR_F32_MAX vpair_smax_v8sf3 {mma} + + v256 __builtin_vpair_f32_min (v256, v256); +VPAIR_F32_MIN vpair_smin_v8sf3 {mma} + + v256 __builtin_vpair_f32_mul (v256, v256); +VPAIR_F32_MUL vpair_mul_v8sf3 {mma} + + v256 __builtin_vpair_f32_nabs (v256); +VPAIR_F32_NABS vpair_nabs_v8sf2 {mma} + + v256 __builtin_vpair_f32_neg (v256); +VPAIR_F32_NEG vpair_neg_v8sf2 {mma} + + v256 __builtin_vpair_f32_sub (v256, v256); +VPAIR_F32_SUB vpair_sub_v8sf3 {mma} + +;; Vector pair built-in functions with double elements + v256 __builtin_vpair_f64_abs (v256); +VPAIR_F64_ABS vpair_abs_v4df2 {mma} + + v256 __builtin_vpair_f64_add (v256, v256); +VPAIR_F64_ADD vpair_add_v4df3 {mma} + + v256 __builtin_vpair_f64_div (v256, v256); +VPAIR_F64_DIV vpair_div_v4df3 {mma} + + v256 __builtin_vpair_f64_max (v256, v256); +VPAIR_F64_MAX vpair_smax_v4df3 {mma} + + v256 __builtin_vpair_f64_min (v256, v256); +VPAIR_F64_MIN vpair_smin_v4df3 
{mma} + + v256 __builtin_vpair_f64_mul (v256, v256); +VPAIR_F64_MUL vpair_mul_v4df3 {mma} + + v256 __builtin_vpair_f64_nabs (v256); +VPAIR_F64_NABS vpair_nabs_v4df2 {mma} + + v256 __builtin_vpair_f64_neg (v256); +VPAIR_F64_NEG vpair_neg_v4df2 {mma} + + v256 __builtin_vpair_f64_sub (v256, v256); +VPAIR_F64_SUB vpair_sub_v4df3 {mma} diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h index 09a57a806fa..4d6ecc83436 100644 --- a/gcc/config/rs6000/rs6000-protos.h +++ b/gcc/config/rs6000/rs6000-protos.h @@ -162,6 +162,18 @@ extern bool rs6000_pcrel_p (void); extern bool rs6000_fndecl_pcrel_p (const_tree); extern void rs6000_output_addr_vec_elt (FILE *, int); +/* If we are splitting a vector pair unary operator into two separate vector + operations, we need to generate a NEG if this is NABS. */ + +enum vpair_split_unary { + VPAIR_SPLIT_NORMAL, /* No extra processing is needed. */ + VPAIR_SPLIT_NEGATE /* Wrap operation with a NEG. */ +}; + +extern void vpair_split_unary (rtx [], machine_mode, enum rtx_code, + enum vpair_split_unary); +extern void vpair_split_binary (rtx [], machine_mode, enum rtx_code); + /* Different PowerPC instruction formats that are used by GCC. There are various other instruction formats used by the PowerPC hardware, but these formats are not currently
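The comment on enum vpair_split_unary says that a pair operation is split into two separate vector operations, and that NABS needs an extra NEG wrapped around the result. The sketch below models that idea on plain values. Everything in it (ex_vec2d, ex_pair4d, ex_pair_unary and friends) is a hypothetical illustration; the real vpair_split_unary works on RTL operands, and the simplified ex_abs here ignores signed zeros and NaNs.

```c
#include <assert.h>

/* Mirrors VPAIR_SPLIT_NORMAL / VPAIR_SPLIT_NEGATE from the patch.  */
enum ex_split_unary { EX_SPLIT_NORMAL, EX_SPLIT_NEGATE };

typedef struct { double e[2]; } ex_vec2d;        /* models one V2DF vector */
typedef struct { ex_vec2d half[2]; } ex_pair4d;  /* models a pair of V2DF vectors */

typedef double (*ex_unary_fn) (double);

double ex_abs (double x) { return x < 0 ? -x : x; }  /* simplified: no -0.0/NaN care */
double ex_neg (double x) { return -x; }

/* Apply OP to one 128-bit half, optionally wrapping the result in a
   negation, which is how NABS is expressed as NEG (ABS (x)).  */
ex_vec2d ex_vec_unary (ex_vec2d v, ex_unary_fn op, enum ex_split_unary kind)
{
  ex_vec2d r;
  for (int i = 0; i < 2; i++)
    {
      double t = op (v.e[i]);
      r.e[i] = (kind == EX_SPLIT_NEGATE) ? -t : t;
    }
  return r;
}

/* Split the pair operation into two independent vector operations.  */
ex_pair4d ex_pair_unary (ex_pair4d p, ex_unary_fn op, enum ex_split_unary kind)
{
  ex_pair4d r;
  r.half[0] = ex_vec_unary (p.half[0], op, kind);
  r.half[1] = ex_vec_unary (p.half[1], op, kind);
  return r;
}

ex_pair4d ex_pair_nabs (ex_pair4d p)
{
  return ex_pair_unary (p, ex_abs, EX_SPLIT_NEGATE);
}
```

A binary operation would split the same way, applying the operator to the matching halves of both inputs.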
[gcc(refs/users/meissner/heads/work162-vpair)] Add support for vector pair fma operations.
https://gcc.gnu.org/g:e1939c7a8b72c315ba15751d40bb439231499a1e commit e1939c7a8b72c315ba15751d40bb439231499a1e Author: Michael Meissner Date: Tue Mar 12 20:29:24 2024 -0400 Add support for vector pair fma operations. 2024-03-12 Michael Meissner gcc/ * config/rs6000/rs6000-builtins.def (__builtin_vpair_f32_fma): New built-in. (__builtin_vpair_f32_fms): Likewise. (__builtin_vpair_f32_nfma): Likewise. (__builtin_vpair_f32_nfms): Likewise. (__builtin_vpair_f64_fma): Likewise. (__builtin_vpair_f64_fms): Likewise. (__builtin_vpair_f64_nfma): Likewise. (__builtin_vpair_f64_nfms): Likewise. * config/rs6000/rs6000-protos.h (enum vpair_split_fma): New enumeration. (vpair_split_fma): New declaration. * config/rs6000/rs6000.cc (vpair_split_fma): New function to split vector pair FMA operations. * config/rs6000/vector-pair.md (UNSPEC_VPAIR_FMA): New unspec. (vpair_stdname): Add UNSPEC_VPAIR_FMA. (VPAIR_OP): Likewise. (vpair_fma_4): New insns. (vpair_fms_4): Likewise. (vpair_nfma_4): Likewise. (vpair_nfms_4): Likewise. * doc/extend.texi (PowerPC Vector Pair Built-in Functions): Document new vector pair fma built-in functions. gcc/testsuite/ * gcc.target/powerpc/vector-pair-3.c: New test. * gcc.target/powerpc/vector-pair-4.c: Likewise. 
Diff: --- gcc/config/rs6000/rs6000-builtins.def| 24 ++ gcc/config/rs6000/rs6000-protos.h| 13 gcc/config/rs6000/rs6000.cc | 71 ++ gcc/config/rs6000/vector-pair.md | 96 gcc/doc/extend.texi | 25 ++ gcc/testsuite/gcc.target/powerpc/vector-pair-3.c | 57 ++ gcc/testsuite/gcc.target/powerpc/vector-pair-4.c | 57 ++ 7 files changed, 343 insertions(+) diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def index 83e7206e989..4362cbb8fc7 100644 --- a/gcc/config/rs6000/rs6000-builtins.def +++ b/gcc/config/rs6000/rs6000-builtins.def @@ -4142,6 +4142,12 @@ v256 __builtin_vpair_f32_div (v256, v256); VPAIR_F32_DIV vpair_div_v8sf3 {mma} + v256 __builtin_vpair_f32_fma (v256, v256, v256); +VPAIR_F32_FMA vpair_fma_v8sf4 {mma} + + v256 __builtin_vpair_f32_fms (v256, v256, v256); +VPAIR_F32_FMS vpair_fms_v8sf4 {mma} + v256 __builtin_vpair_f32_max (v256, v256); VPAIR_F32_MAX vpair_smax_v8sf3 {mma} @@ -4157,6 +4163,12 @@ v256 __builtin_vpair_f32_neg (v256); VPAIR_F32_NEG vpair_neg_v8sf2 {mma} + v256 __builtin_vpair_f32_nfma (v256, v256, v256); +VPAIR_F32_NFMA vpair_nfma_v8sf4 {mma} + + v256 __builtin_vpair_f32_nfms (v256, v256, v256); +VPAIR_F32_NFMS vpair_nfms_v8sf4 {mma} + v256 __builtin_vpair_f32_sub (v256, v256); VPAIR_F32_SUB vpair_sub_v8sf3 {mma} @@ -4170,6 +4182,12 @@ v256 __builtin_vpair_f64_div (v256, v256); VPAIR_F64_DIV vpair_div_v4df3 {mma} + v256 __builtin_vpair_f64_fma (v256, v256, v256); +VPAIR_F64_FMA vpair_fma_v4df4 {mma} + + v256 __builtin_vpair_f64_fms (v256, v256, v256); +VPAIR_F64_FMS vpair_fms_v4df4 {mma} + v256 __builtin_vpair_f64_max (v256, v256); VPAIR_F64_MAX vpair_smax_v4df3 {mma} @@ -4185,5 +4203,11 @@ v256 __builtin_vpair_f64_neg (v256); VPAIR_F64_NEG vpair_neg_v4df2 {mma} + v256 __builtin_vpair_f64_nfma (v256, v256, v256); +VPAIR_F64_NFMA vpair_nfma_v4df4 {mma} + + v256 __builtin_vpair_f64_nfms (v256, v256, v256); +VPAIR_F64_NFMS vpair_nfms_v4df4 {mma} + v256 __builtin_vpair_f64_sub (v256, v256); VPAIR_F64_SUB 
vpair_sub_v4df3 {mma} diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h index 4d6ecc83436..aed4081c87b 100644 --- a/gcc/config/rs6000/rs6000-protos.h +++ b/gcc/config/rs6000/rs6000-protos.h @@ -174,6 +174,19 @@ extern void vpair_split_unary (rtx [], machine_mode, enum rtx_code, enum vpair_split_unary); extern void vpair_split_binary (rtx [], machine_mode, enum rtx_code); +/* When we are splitting a vector pair FMA operation into two vector operations, we + may need to modify the code generated. This enumeration encodes the + different choices. */ + +enum vpair_split_fma { + VPAIR_SPLIT_FMA, /* Fused multiply-add. */ + VPAIR_SPLIT_FMS, /* Fused multiply-subtract. */ + VPAIR_SPLIT_NFMA,/* Fused negate multiply-add. */ + VPAIR_SPLIT_NFMS /* Fused negate multiply-subtract. */ +}; + +extern void vpair_split_fma (rtx [], machine_mode, enum vpair_split_fma); + /* Different PowerPC instruction formats that are used by GCC. There are various other instruction formats used by the PowerPC hardware, but these formats are
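The four vpair_split_fma choices differ only in the sign of the addend and of the final result. The sketch below spells that out on scalar values and then applies it element by element across a four-double pair. It assumes the usual PowerPC convention that the negated forms negate the whole result, that is nfma = -(a*b + c) and nfms = -(a*b - c). All names are hypothetical, and the arithmetic is written as an ordinary multiply followed by an add, so it does not model the single rounding of a real fused operation.

```c
#include <assert.h>

/* Mirrors VPAIR_SPLIT_FMA .. VPAIR_SPLIT_NFMS from the patch.  */
enum ex_split_fma { EX_FMA, EX_FMS, EX_NFMA, EX_NFMS };

typedef struct { double e[4]; } ex_pair_f64;  /* models 4 doubles in a vector pair */

/* Scalar meaning assumed for each variant (unfused, illustration only).  */
double ex_fma_variant (double a, double b, double c, enum ex_split_fma kind)
{
  switch (kind)
    {
    case EX_FMA:  return a * b + c;
    case EX_FMS:  return a * b - c;
    case EX_NFMA: return -(a * b + c);
    case EX_NFMS: return -(a * b - c);
    }
  assert (!"unknown ex_split_fma kind");
  return 0.0;
}

/* Process the pair as two 128-bit halves of two doubles each.  */
ex_pair_f64 ex_pair_fma (ex_pair_f64 a, ex_pair_f64 b, ex_pair_f64 c,
                         enum ex_split_fma kind)
{
  ex_pair_f64 r;
  for (int half = 0; half < 2; half++)
    for (int i = 0; i < 2; i++)
      {
        int k = half * 2 + i;
        r.e[k] = ex_fma_variant (a.e[k], b.e[k], c.e[k], kind);
      }
  return r;
}
```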
[gcc(refs/users/meissner/heads/work162-vpair)] Add vector pair init and splat.
https://gcc.gnu.org/g:0d1d819a6872d7e6098b00925ce343c39efc7dcf commit 0d1d819a6872d7e6098b00925ce343c39efc7dcf Author: Michael Meissner Date: Tue Mar 12 20:56:54 2024 -0400 Add vector pair init and splat. 2024-03-12 Michael Meissner gcc/ * config/rs6000/rs6000-builtins.def (__builtin_vpair_zero): New built-in function. (__builtin_vpair_f32_splat): Likewise. (__builtin_vpair_f64_splat): Likewise. * config/rs6000/vector-pair.md (UNSPEC_VPAIR_ZERO): New unspec. (UNSPEC_VPAIR_SPLAT): Likewise. (VPAIR_SPLAT_VMODE): New mode iterator. (VPAIR_SPLAT_ELEMENT_TO_VMODE): New mode attribute. (vpair_splat_name): Likewise. (vpair_zero): New insn. (vpair_splat_): New define_expand. (vpair_splat__internal): New insns. gcc/testsuite/ * gcc.target/powerpc/vector-pair-5.c: New test. * gcc.target/powerpc/vector-pair-6.c: Likewise. Diff: --- gcc/config/rs6000/rs6000-builtins.def| 10 +++ gcc/config/rs6000/vector-pair.md | 102 ++- gcc/doc/extend.texi | 9 ++ gcc/testsuite/gcc.target/powerpc/vector-pair-5.c | 56 + gcc/testsuite/gcc.target/powerpc/vector-pair-6.c | 56 + 5 files changed, 232 insertions(+), 1 deletion(-) diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def index 4362cbb8fc7..b757a8630ff 100644 --- a/gcc/config/rs6000/rs6000-builtins.def +++ b/gcc/config/rs6000/rs6000-builtins.def @@ -4132,6 +4132,10 @@ void __builtin_vsx_stxvp (v256, unsigned long, const v256 *); STXVP nothing {mma,pair} +;; Vector pair built-in functions. 
+ v256 __builtin_vpair_zero (); +VPAIR_ZERO vpair_zero {mma} + ;; Vector pair built-in functions with float elements v256 __builtin_vpair_f32_abs (v256); VPAIR_F32_ABS vpair_abs_v8sf2 {mma} @@ -4169,6 +4173,9 @@ v256 __builtin_vpair_f32_nfms (v256, v256, v256); VPAIR_F32_NFMS vpair_nfms_v8sf4 {mma} + v256 __builtin_vpair_f32_splat (float); +VPAIR_F32_SPLAT vpair_splat_v8sf {mma} + v256 __builtin_vpair_f32_sub (v256, v256); VPAIR_F32_SUB vpair_sub_v8sf3 {mma} @@ -4209,5 +4216,8 @@ v256 __builtin_vpair_f64_nfms (v256, v256, v256); VPAIR_F64_NFMS vpair_nfms_v4df4 {mma} + v256 __builtin_vpair_f64_splat (double); +VPAIR_F64_SPLAT vpair_splat_v4df {mma} + v256 __builtin_vpair_f64_sub (v256, v256); VPAIR_F64_SUB vpair_sub_v4df3 {mma} diff --git a/gcc/config/rs6000/vector-pair.md b/gcc/config/rs6000/vector-pair.md index 73ae46e6d40..39b419c6814 100644 --- a/gcc/config/rs6000/vector-pair.md +++ b/gcc/config/rs6000/vector-pair.md @@ -38,7 +38,9 @@ UNSPEC_VPAIR_NEG UNSPEC_VPAIR_PLUS UNSPEC_VPAIR_SMAX - UNSPEC_VPAIR_SMIN]) + UNSPEC_VPAIR_SMIN + UNSPEC_VPAIR_ZERO + UNSPEC_VPAIR_SPLAT]) ;; Vector pair element ID that defines the scaler element within the vector pair. (define_c_enum "vpair_element" @@ -98,6 +100,104 @@ ;; Map the scalar element ID into the appropriate insn type for divide. (define_int_attr vpair_divtype [(VPAIR_ELEMENT_FLOAT "vecfdiv") (VPAIR_ELEMENT_DOUBLE "vecdiv")]) + +;; Mode iterator for the vector modes that we provide splat operations for. +(define_mode_iterator VPAIR_SPLAT_VMODE [V4SF V2DF]) + +;; Map element mode to 128-bit vector mode for splat operations +(define_mode_attr VPAIR_SPLAT_ELEMENT_TO_VMODE [(SF "V4SF") + (DF "V2DF")]) + +;; Map either element mode or vector mode into the name for the splat insn. 
+(define_mode_attr vpair_splat_name [(SF "v8sf") + (DF "v4df") + (V4SF "v8sf") + (V2DF "v4df")]) + +;; Initialize a vector pair to 0 +(define_insn_and_split "vpair_zero" + [(set (match_operand:OO 0 "vsx_register_operand" "=wa") + (unspec:OO [(const_int 0)] UNSPEC_VPAIR_ZERO))] + "TARGET_MMA" + "#" + "&& reload_completed" + [(set (match_dup 1) (match_dup 3)) + (set (match_dup 2) (match_dup 3))] +{ + rtx op0 = operands[0]; + + operands[1] = simplify_gen_subreg (V2DFmode, op0, OOmode, 0); + operands[2] = simplify_gen_subreg (V2DFmode, op0, OOmode, 16); + operands[3] = CONST0_RTX (V2DFmode); +} + [(set_attr "length" "8") + (set_attr "type" "vecperm")]) + +;; Create a vector pair with a value splat'ed (duplicated) to all of the +;; elements. +(define_expand "vpair_splat_" + [(use (match_operand:OO 0 "vsx_register_operand")) + (use (match_operand:SFDF 1 "input_operand"))] + "TARGET_MMA" +{ + rtx op0 = operands[0]; + rtx op1 = operands[1]; + machine_mode element_mode = mode; + + if (op1 == CONST0_RTX (element_mode)) +{ + emit_insn (gen_vpair_zero (op0)); + DONE; +} + + machine_mode vector_mode = mode;
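The splat expander above first checks whether the scalar is the constant zero and, if so, emits vpair_zero instead of a real splat. A rough value-level model of that shortcut is sketched below. The names (ex_pair8f, ex_pair_zero, ex_pair_splat, ex_is_plus_zero) are hypothetical, and the sketch assumes the zero test should match only positive zero, since CONST0_RTX is the +0.0 constant; the real check happens on compile-time RTL constants, not on run-time values.

```c
#include <assert.h>
#include <string.h>

enum { EX_PAIR_FLOATS = 8 };
typedef struct { float e[EX_PAIR_FLOATS]; } ex_pair8f;  /* models 8 floats in a pair */

/* All-zero bits, which is +0.0f in every element on IEEE targets.  */
ex_pair8f ex_pair_zero (void)
{
  ex_pair8f r;
  memset (&r, 0, sizeof r);
  return r;
}

/* Bitwise test so that -0.0f is not mistaken for the zero constant.  */
int ex_is_plus_zero (float x)
{
  const float zero = 0.0f;
  return memcmp (&x, &zero, sizeof x) == 0;
}

/* Duplicate X into every element, using the zero shortcut when possible.  */
ex_pair8f ex_pair_splat (float x)
{
  if (ex_is_plus_zero (x))
    return ex_pair_zero ();

  ex_pair8f r;
  for (int i = 0; i < EX_PAIR_FLOATS; i++)
    r.e[i] = x;
  return r;
}
```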
[gcc(refs/users/meissner/heads/work162-vpair)] Add vector pair optimizations.
https://gcc.gnu.org/g:9627a2f2476f7a5eb84de3ef83a9d373e678d619 commit 9627a2f2476f7a5eb84de3ef83a9d373e678d619 Author: Michael Meissner Date: Tue Mar 12 21:03:17 2024 -0400 Add vector pair optimizations. 2024-03-12 Michael Meissner gcc/ * config/rs6000/vector-pair.md (vpair_add_neg_3): New combiner insn to convert vector plus/neg into a minus operation. (vpair_fma__merge): Optimize multiply, add/subtract, and negation into fma operations if the user specifies to create fmas. (vpair_fma__merge): Likewise. (vpair_fma__merge2): Likewise. (vpair_nfma__merge): Likewise. (vpair_nfms__merge): Likewise. (vpair_nfms__merge2): Likewise. gcc/testsuite/ * gcc.target/powerpc/vector-pair-7.c: New test. * gcc.target/powerpc/vector-pair-8.c: Likewise. * gcc.target/powerpc/vector-pair-9.c: Likewise. * gcc.target/powerpc/vector-pair-10.c: Likewise. * gcc.target/powerpc/vector-pair-11.c: Likewise. * gcc.target/powerpc/vector-pair-12.c: Likewise. Diff: --- gcc/config/rs6000/vector-pair.md | 224 ++ gcc/testsuite/gcc.target/powerpc/vector-pair-10.c | 61 ++ gcc/testsuite/gcc.target/powerpc/vector-pair-11.c | 65 +++ gcc/testsuite/gcc.target/powerpc/vector-pair-12.c | 65 +++ gcc/testsuite/gcc.target/powerpc/vector-pair-7.c | 18 ++ gcc/testsuite/gcc.target/powerpc/vector-pair-8.c | 18 ++ gcc/testsuite/gcc.target/powerpc/vector-pair-9.c | 61 ++ 7 files changed, 512 insertions(+) diff --git a/gcc/config/rs6000/vector-pair.md b/gcc/config/rs6000/vector-pair.md index 39b419c6814..7a81acbdc05 100644 --- a/gcc/config/rs6000/vector-pair.md +++ b/gcc/config/rs6000/vector-pair.md @@ -261,6 +261,31 @@ (set (attr "type") (if_then_else (match_test " == DIV") (const_string "") (const_string "")))]) + +;; Optimize vector pair add of a negative value into a subtract. 
+(define_insn_and_split "*vpair_add_neg_3" + [(set (match_operand:OO 0 "vsx_register_operand" "=wa") + (unspec:OO +[(match_operand:OO 1 "vsx_register_operand" "wa") + (unspec:OO + [(match_operand:OO 2 "vsx_register_operand" "wa") + (const_int VPAIR_FP_ELEMENT)] + UNSPEC_VPAIR_NEG) + (const_int VPAIR_FP_ELEMENT)] +VPAIR_FP_BINARY))] + "TARGET_MMA" + "#" + "&& 1" + [(set (match_dup 0) + (unspec:OO +[(match_dup 1) + (match_dup 2) + (const_int VPAIR_FP_ELEMENT)] +UNSPEC_VPAIR_MINUS))] +{ +} + [(set_attr "length" "8") + (set_attr "type" "")]) ;; Vector pair fused-multiply (FMA) operations. The last argument in the ;; UNSPEC is a CONST_INT which identifies what the scalar element is. @@ -354,3 +379,202 @@ } [(set_attr "length" "8") (set_attr "type" "")]) + +;; Optimize vector pair multiply and vector pair add into vector pair fma, +;; providing the compiler would do this optimization for scalar and vectors. +;; Unlike most of the define_insn_and_splits, this can be done before register +;; allocation. +(define_insn_and_split "*vpair_fma__merge" + [(set (match_operand:OO 0 "vsx_register_operand" "=wa,wa") + (unspec:OO +[(unspec:OO + [(match_operand:OO 1 "vsx_register_operand" "%wa,wa") + (match_operand:OO 2 "vsx_register_operand" "wa,0") + (const_int VPAIR_FP_ELEMENT)] + UNSPEC_VPAIR_MULT) + (match_operand:OO 3 "vsx_register_operand" "0,wa") + (const_int VPAIR_FP_ELEMENT)] +UNSPEC_VPAIR_PLUS))] + "TARGET_MMA && flag_fp_contract_mode == FP_CONTRACT_FAST" + "#" + "&& 1" + [(set (match_dup 0) + (unspec:OO +[(match_dup 1) + (match_dup 2) + (match_dup 3) + (const_int VPAIR_FP_ELEMENT)] +UNSPEC_VPAIR_FMA))] +{ +} + [(set_attr "length" "8") + (set_attr "type" "")]) + +;; Merge multiply and subtract. 
+(define_insn_and_split "*vpair_fma__merge" + [(set (match_operand:OO 0 "vsx_register_operand" "=wa,wa") + (unspec:OO +[(unspec:OO + [(match_operand:OO 1 "vsx_register_operand" "%wa,wa") + (match_operand:OO 2 "vsx_register_operand" "wa,0") + (const_int VPAIR_FP_ELEMENT)] + UNSPEC_VPAIR_MULT) + (match_operand:OO 3 "vsx_register_operand" "0,wa") + (const_int VPAIR_FP_ELEMENT)] +UNSPEC_VPAIR_MINUS))] + "TARGET_MMA && flag_fp_contract_mode == FP_CONTRACT_FAST" + "#" + "&& 1" + [(set (match_dup 0) + (unspec:OO +[(match_dup 1) + (match_dup 2) + (unspec:OO + [(match_dup 3) + (const_int VPAIR_FP_ELEMENT)] + UNSPEC_VPAIR_NEG) + (const_int VPAIR_FP_ELEMENT)] +UNSPEC_VPAIR_FMA))] +{ +} + [(set_attr "length" "8") + (set_attr "type" "
[gcc(refs/users/meissner/heads/work162-vpair)] Update ChangeLog.*
https://gcc.gnu.org/g:66de2c74aebd4e587a9aa4e20eb0b71dfa7450e2 commit 66de2c74aebd4e587a9aa4e20eb0b71dfa7450e2 Author: Michael Meissner Date: Tue Mar 12 21:09:53 2024 -0400 Update ChangeLog.* Diff: --- gcc/ChangeLog.vpair | 211 1 file changed, 211 insertions(+) diff --git a/gcc/ChangeLog.vpair b/gcc/ChangeLog.vpair index faeb03cac7b..184e4f8bccc 100644 --- a/gcc/ChangeLog.vpair +++ b/gcc/ChangeLog.vpair @@ -1,5 +1,216 @@ + Branch work162-vpair, patch #205 + +Add vector pair optimizations. + +2024-03-12 Michael Meissner + +gcc/ + + * config/rs6000/vector-pair.md (vpair_add_neg_3): New + combiner insn to convert vector plus/neg into a minus operation. + (vpair_fma__merge): Optimize multiply, add/subtract, and + negation into fma operations if the user specifies to create fmas. + (vpair_fma__merge): Likewise. + (vpair_fma__merge2): Likewise. + (vpair_nfma__merge): Likewise. + (vpair_nfms__merge): Likewise. + (vpair_nfms__merge2): Likewise. + +gcc/testsuite/ + + * gcc.target/powerpc/vector-pair-7.c: New test. + * gcc.target/powerpc/vector-pair-8.c: Likewise. + * gcc.target/powerpc/vector-pair-9.c: Likewise. + * gcc.target/powerpc/vector-pair-10.c: Likewise. + * gcc.target/powerpc/vector-pair-11.c: Likewise. + * gcc.target/powerpc/vector-pair-12xs.c: Likewise. + + Branch work162-vpair, patch #204 + +Add vector pair init and splat. + +2024-03-12 Michael Meissner + +gcc/ + + * config/rs6000/rs6000-builtins.def (__builtin_vpair_zero): New + built-in function. + (__builtin_vpair_f32_splat): Likewise. + (__builtin_vpair_f64_splat): Likewise. + * config/rs6000/vector-pair.md (UNSPEC_VPAIR_ZERO): New unspec. + (UNSPEC_VPAIR_SPLAT): Likewise. + (VPAIR_SPLAT_VMODE): New mode iterator. + (VPAIR_SPLAT_ELEMENT_TO_VMODE): New mode attribute. + (vpair_splat_name): Likewise. + (vpair_zero): New insn. + (vpair_splat_): New define_expand. + (vpair_splat__internal): New insns. + +gcc/testsuite/ + + * gcc.target/powerpc/vector-pair-5.c: New test. 
+ * gcc.target/powerpc/vector-pair-6.c: Likewise. + + Branch work162-vpair, patch #203 + +Add support for vector pair fma operations. + +2024-03-12 Michael Meissner + +gcc/ + + * config/rs6000/rs6000-builtins.def (__builtin_vpair_f32_fma): New + built-in. + (__builtin_vpair_f32_fms): Likewise. + (__builtin_vpair_f32_nfma): Likewise. + (__builtin_vpair_f32_nfms): Likewise. + (__builtin_vpair_f64_fma): Likewise. + (__builtin_vpair_f64_fms): Likewise. + (__builtin_vpair_f64_nfma): Likewise. + * config/rs6000/rs6000/rs6000-proto.h (enum vpair_split_fma): New + enumeration. + (vpair_split_fma): New declaration. + * config/rs6000/rs6000.cc (vpair_split_fma): New function to split + vector pair FMA operations. + * config/rs6000/vector-pair.md (UNSPEC_VPAIR_FMA): New unspec. + (vpair_stdname): Add UNSPEC_VPAIR_FMA. + (VPAIR_OP): Likewise. + (vpair_fma_4): New insns. + (vpair_fms_4): Likewise. + (vpair_nfma_4): Likewise. + (vpair_nfms_4): Likewise. + * doc/extend.texi (PowerPC Vector Pair Built-in Functions): Document new + vector pair fma built-in functions. + +gcc/testsuite/ + + * gcc.target/powerpc/vector-pair-3.c: New test. + * gcc.target/powerpc/vector-pair-4.c: Likewise. + + Branch work162-vpair, patch #202 + +Add support for vector pair unary and binary operations. + +2024-03-12 Michael Meissner + +gcc/ + + * config/rs6000/rs6000-builtins.def (__builtin_vpair_*): Add new + built-in functions for vector pair support. + * config/rs6000/rs6000-protos.h (enum vpair_split_unary): New + enumeration. + (vpair_split_unary): New declaration. + (vpair_split_binary): Likewise. + * config/rs6000/rs6000.cc (vpair_split_unary): New function to split + vector pair operations. + (vpair_split_binary): Likewise. + * config/rs6000/rs6000.md (toplevel): Include vector-pair.md. + * config/rs6000/t-rs6000 (MD_INCLUDES): Add vector-pair.md. + * config/rs6000/vector-pair.md: New file. 
+ * doc/extend.texi (PowerPC Vector Pair Built-in Functions): Add + documentation for the new vector pair built-in functions. + +gcc/testsuite/ + + * gcc.target/powerpc/vector-pair-1.c: New test. + * gcc.target/powerpc/vector-pair-2.c: Likewise. + + Branch work162-vpair, patch #201 + +Peter's patches for subreg support. + +2024-03-12 Peter Bergner + +gcc/ + + PR target/109116 + * gcc/config/rs6000/rs6000.cc (rs6000_modes_tieable_p): M
[gcc(refs/users/meissner/heads/work162-dmf)] Use vector pair load/store for memcpy with -mcpu=future
https://gcc.gnu.org/g:86949afcea130e0b6cb621f55385ca0f90a56a1f commit 86949afcea130e0b6cb621f55385ca0f90a56a1f Author: Michael Meissner Date: Wed Mar 13 01:33:25 2024 -0400 Use vector pair load/store for memcpy with -mcpu=future In the development for the power10 processor, GCC did not enable using the load vector pair and store vector pair instructions when optimizing things like memory copy. This patch enables using those instructions if -mcpu=future is used. 2024-03-12 Michael Meissner gcc/ * config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS_SERVER): Enable using load vector pair and store vector pair instructions for memory copy operations. (POWERPC_MASKS): Make the bit for enabling using load vector pair and store vector pair operations set and reset when the PowerPC processor is changed. Diff: --- gcc/config/rs6000/rs6000-cpus.def | 2 ++ 1 file changed, 2 insertions(+) diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def index 47365534af8..4ddba142e44 100644 --- a/gcc/config/rs6000/rs6000-cpus.def +++ b/gcc/config/rs6000/rs6000-cpus.def @@ -90,6 +90,7 @@ | OPTION_MASK_POWER11) #define ISA_FUTURE_MASKS_SERVER(ISA_POWER11_MASKS_SERVER \ +| OPTION_MASK_BLOCK_OPS_VECTOR_PAIR\ | OPTION_MASK_FUTURE) /* Flags that need to be turned off if -mno-vsx. */ @@ -121,6 +122,7 @@ /* Mask of all options to set the default isa flags based on -mcpu=. */ #define POWERPC_MASKS (OPTION_MASK_ALTIVEC\ +| OPTION_MASK_BLOCK_OPS_VECTOR_PAIR\ | OPTION_MASK_CMPB \ | OPTION_MASK_CRYPTO \ | OPTION_MASK_DFP \
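The commit message above says that adding the bit to POWERPC_MASKS makes it "set and reset when the PowerPC processor is changed". The sketch below is a simplified model of that behaviour: bits covered by the CPU-controlled mask are cleared and then reloaded from the defaults of the newly selected processor, while other bits are left alone. The bit positions, macro names and the ex_switch_cpu helper are hypothetical, and the model assumes no explicit user override of the individual options, which the real option handling also has to respect.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit assignments; the real OPTION_MASK_* values are
   generated by GCC and are not reproduced here.  */
#define EX_MASK_ALTIVEC               (UINT64_C (1) << 0)
#define EX_MASK_BLOCK_OPS_VECTOR_PAIR (UINT64_C (1) << 1)
#define EX_MASK_FUTURE                (UINT64_C (1) << 2)
#define EX_MASK_OTHER                 (UINT64_C (1) << 3)  /* not CPU-controlled */

/* Plays the role of POWERPC_MASKS: bits that follow the -mcpu= choice.  */
#define EX_CPU_CONTROLLED (EX_MASK_ALTIVEC \
                           | EX_MASK_BLOCK_OPS_VECTOR_PAIR \
                           | EX_MASK_FUTURE)

/* Per-processor defaults, in the style of the ISA_*_MASKS_SERVER macros.  */
#define EX_POWER10_DEFAULTS (EX_MASK_ALTIVEC)
#define EX_FUTURE_DEFAULTS  (EX_POWER10_DEFAULTS \
                             | EX_MASK_BLOCK_OPS_VECTOR_PAIR \
                             | EX_MASK_FUTURE)

/* Clear every CPU-controlled bit, then take the new processor's defaults.  */
uint64_t ex_switch_cpu (uint64_t flags, uint64_t cpu_defaults)
{
  flags &= ~EX_CPU_CONTROLLED;
  flags |= cpu_defaults & EX_CPU_CONTROLLED;
  return flags;
}
```

With this model, a bit that is missing from the CPU-controlled mask would survive a switch back to an older processor, which is the situation the second hunk of the patch avoids.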
[gcc(refs/users/meissner/heads/work162-dmf)] Add wD constraint.
https://gcc.gnu.org/g:5bd41ca9c05b8483af758e5010f9d000182d3e88 commit 5bd41ca9c05b8483af758e5010f9d000182d3e88 Author: Michael Meissner Date: Wed Mar 13 02:21:06 2024 -0400 Add wD constraint. This patch adds a new constraint ('wD') that matches the accumulator registers that overlap with VSX registers 0..31 on power10. Future patches will add the support for a separate accumulator register class that will be used when the support for dense math registers is added. 2024-03-13 Michael Meissner * config/rs6000/constraints.md (wD): New constraint. * config/rs6000/mma.md (mma_disassemble_acc): Likewise. (mma_): Likewise. (mma_): Likewise. (mma_): Likewise. (mma_): Likewise. (mma_): Likewise. (mma_): Likewise. (mma_): Likewise. (mma_): Likewise. (mma_): Likewise. (mma_): Likewise. (mma_): Likewise. (mma_): Likewise. (mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d") - (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0")] + [(set (match_operand:XO 0 "accumulator_operand" "=&wD") + (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0")] MMA_ACC))] "TARGET_MMA" " %A0" @@ -515,7 +513,7 @@ ;; UNSPEC_VOLATILE. 
(define_insn "mma_xxsetaccz" - [(set (match_operand:XO 0 "fpr_reg_operand" "=d") + [(set (match_operand:XO 0 "accumulator_operand" "=wD") (unspec_volatile:XO [(const_int 0)] UNSPECV_MMA_XXSETACCZ))] "TARGET_MMA" @@ -523,7 +521,7 @@ [(set_attr "type" "mma")]) (define_insn "mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") + [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD") (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa") (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")] MMA_VV))] @@ -532,8 +530,8 @@ [(set_attr "type" "mma")]) (define_insn "mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") - (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0") + [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD") + (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0") (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") (match_operand:V16QI 3 "vsx_register_operand" "v,?wa")] MMA_AVV))] @@ -542,7 +540,7 @@ [(set_attr "type" "mma")]) (define_insn "mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") + [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD") (unspec:XO [(match_operand:OO 1 "vsx_register_operand" "v,?wa") (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")] MMA_PV))] @@ -551,8 +549,8 @@ [(set_attr "type" "mma")]) (define_insn "mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") - (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0") + [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD") + (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0") (match_operand:OO 2 "vsx_register_operand" "v,?wa") (match_operand:V16QI 3 "vsx_register_operand" "v,?wa")] MMA_APV))] @@ -561,7 +559,7 @@ [(set_attr "type" "mma")]) (define_insn "mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") + [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD") (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa") (match_operand:V16QI 2 "vsx_register_operand" 
"v,?wa") (match_operand:SI 3 "const_0_to_15_operand" "n,n") @@ -574,8 +572,8 @@ (set_attr "prefixed" "yes")]) (define_insn "mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") - (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0") + [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD") + (unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0") (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") (match_operand:V16QI 3 "vsx_register_operand" "v,?wa") (match_operand:SI 4 "const_0_to_15_operand" "n,n") @@ -588,7 +586,7 @@ (set_attr "prefixed" "yes")]) (define_insn "mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") + [(set (match_operand:XO 0 "accumulator_operand" "=&wD,&wD") (unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa") (match_operand:V16QI 2 "vsx_register_operand" "v,?wa") (match_operand:SI 3 "const_0_to_15_operand" "n,n") @@ -601,8 +599,8 @@ (set_attr "prefixed" "yes")]) (define_insn "mma_" - [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d") - (unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0"
[gcc(refs/users/meissner/heads/work162-dmf)] Add support for dense math registers.
https://gcc.gnu.org/g:6d9972ad5488900014564a0b5f3447a7c1fed0ca commit 6d9972ad5488900014564a0b5f3447a7c1fed0ca Author: Michael Meissner Date: Wed Mar 13 02:26:35 2024 -0400 Add support for dense math registers. The MMA subsystem added the notion of accumulator registers as an optional feature of ISA 3.1 (power10). In ISA 3.1, these accumulators overlapped with the VSX registers 0..31, but logically the accumulator registers were separate from the FPR registers. In ISA 3.1, it was anticipated that in future systems, the accumulator registers may not overlap with the FPR registers. This patch adds the support for dense math registers as separate registers. This particular patch does not change the MMA support to use the accumulators within the dense math registers. This patch just adds the basic support for having separate DMRs. The next patch will switch the MMA support to use the accumulators if -mcpu=future is used. For testing purposes, I added an undocumented option '-mdense-math' to enable or disable the dense math support. This patch adds a new constraint (wD). If MMA is selected but dense math is not selected (i.e. -mcpu=power10), the wD constraint will allow access to accumulators that overlap with VSX registers 0..31. If both MMA and dense math are selected (i.e. -mcpu=future), the wD constraint will only allow dense math registers. This patch modifies the existing %A output modifier. If MMA is selected but dense math is not selected, then the %A output modifier converts the VSX register number to the accumulator number, by dividing it by 4. If both MMA and dense math are selected, then %A will map the separate DMR registers into 0..7. 
The intention is that user code using extended asm can be modified to run on both MMA without dense math and MMA with dense math: 1) If possible, don't use extended asm, but instead use the MMA built-in functions; 2) If you do need to write extended asm, change the d constraints targeting accumulators to wD; 3) Only use the built-in zero, assemble and disassemble functions to move data between vector quad types and dense math accumulators. I.e. do not use the xxmfacc, xxmtacc, and xxsetaccz instructions directly in the extended asm code. The reason is these instructions assume there is a 1-to-1 correspondence between 4 adjacent FPR registers and an accumulator that overlaps with those registers. With accumulators now being separate registers, there no longer is a 1-to-1 correspondence. It is possible that the mangling for DMRs and the GDB register numbers may produce other changes in the future. 2024-03-13 Michael Meissner * config/rs6000/mma.md (movxo): Add comments about dense math registers. (movxo_nodm): Rename from movxo and restrict the usage to machines without dense math registers. (movxo_dm): New insn for movxo support for machines with dense math registers. (mma_): Restrict usage to machines without dense math registers. (mma_xxsetaccz): Make a define_expand, and add support for dense math registers. (mma_xxsetaccz_nodm): Rename from mma_xxsetaccz, and restrict to machines without dense math registers. (mma_dmsetaccz): New insn. * config/rs6000/predicates.md (dmr_operand): New predicate. (accumulator_operand): Add support for dense math registers. * config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_mma_builtin): Do not de-prime accumulator when disassembling a vector quad. * config/rs6000/rs6000.cc (enum rs6000_reg_type): Add DMR_REG_TYPE. (enum rs6000_reload_reg_type): Add RELOAD_REG_DMR. (LAST_RELOAD_REG_CLASS): Add support for DMR registers and the wD constraint. (reload_reg_map): Likewise. (rs6000_reg_names): Likewise. 
(alt_reg_names): Likewise. (rs6000_hard_regno_nregs_internal): Likewise. (rs6000_hard_regno_mode_ok_uncached): Likewise. (rs6000_debug_reg_global): Likewise. (rs6000_setup_reg_addr_masks): Likewise. (rs6000_init_hard_regno_mode_ok): Likewise. (rs6000_secondary_reload_memory): Add support for DMR registers. (rs6000_secondary_reload_simple_move): Likewise. (rs6000_preferred_reload_class): Likewise. (rs6000_secondary_reload_class): Likewise. (print_operand): Make %A handle both FPRs and DMRs. (rs6000_dmr_register_move_cost): New helper function. (rs6000_register_move_cost): Add support for DMR registers.
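The %A behaviour described in the commit message above can be modelled with a small sketch. This is not the code in print_operand; the function names acc_from_vsx and acc_from_dmr are hypothetical, the first DMR hard register number is passed in as a parameter because the real value is internal to the rs6000 port, and returning -1 for a register that does not start an accumulator is a choice made for this sketch only.

```c
/* Model of %A when MMA is selected without dense math: the operand is a
   VSX register number in 0..31 and the accumulator number is that
   register number divided by 4.  Rejecting registers that are not a
   multiple of 4 (with -1) is an assumption of this sketch.  */
int
acc_from_vsx (int vsx_regno)
{
  if (vsx_regno < 0 || vsx_regno > 31 || vsx_regno % 4 != 0)
    return -1;
  return vsx_regno / 4;
}

/* Model of %A when both MMA and dense math are selected: the operand is
   a separate DMR hard register, mapped into 0..7 by subtracting the
   number of the first DMR.  FIRST_DMR_REGNO is a hypothetical parameter.  */
int
acc_from_dmr (int hard_regno, int first_dmr_regno)
{
  int n = hard_regno - first_dmr_regno;
  return (n >= 0 && n < 8) ? n : -1;
}
```

For example, acc_from_vsx (28) gives 7, and acc_from_dmr (103, 100) gives 3. Both paths produce a number in 0..7, which is why the same extended asm template can be used in either configuration.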
[gcc(refs/users/meissner/heads/work162-dmf)] PowerPC: Switch to dense math names for all MMA operations.
https://gcc.gnu.org/g:26e7b15b3a259f753b9862ca7a999ce3e70a8c3d commit 26e7b15b3a259f753b9862ca7a999ce3e70a8c3d Author: Michael Meissner Date: Wed Mar 13 02:28:07 2024 -0400 PowerPC: Switch to dense math names for all MMA operations. This patch changes the assembler instruction names for MMA instructions from the original name used in power10 to the new name when used with the dense math system. I.e. xvf64gerpp becomes dmxvf64gerpp. The assembler will emit the same bits for either spelling. For the non-prefixed MMA instructions, we add a 'dm' prefix in front of the instruction. However, the prefixed instructions have a 'pm' prefix, and we add the 'dm' prefix afterwards. To prevent having two sets of parallel int attributes, we remove the "pm" prefix from the instruction string in the attributes, and add it later, both in the insn name and in the output template. 2024-03-13 Michael Meissner gcc/ * config/rs6000/mma.md (vvi4i4i8): Change the instruction to not have a "pm" prefix. (avvi4i4i8): Likewise. (vvi4i4i2): Likewise. (avvi4i4i2): Likewise. (vvi4i4): Likewise. (avvi4i4): Likewise. (pvi4i2): Likewise. (apvi4i2): Likewise. (vvi4i4i4): Likewise. (avvi4i4i4): Likewise. (mma_xxsetaccz): Add support for running on DMF systems, generating the dense math instruction and using the dense math accumulators. (mma_): Likewise. (mma_): Likewise. (mma_): Likewise. (mma_): Likewise. (mma_pm): Add support for running on DMF systems, generating the dense math instruction and using the dense math accumulators. Rename the insn with a 'pm' prefix and add either 'pm' or 'pmdm' prefixes based on whether we have the original MMA specification or if we have dense math support. (mma_pm): Likewise. (mma_pm): Likewise. (mma_pm): Likewise. (mma_pm): Likewise. (mma_pm): Likewise. (mma_pm): Likewise. (mma_pm): Likewise. 
Diff: --- gcc/config/rs6000/mma.md | 161 +++ 1 file changed, 107 insertions(+), 54 deletions(-) diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md index 2ce613b46cc..f3870eac51a 100644 --- a/gcc/config/rs6000/mma.md +++ b/gcc/config/rs6000/mma.md @@ -224,44 +224,47 @@ (UNSPEC_MMA_XVF64GERNP "xvf64gernp") (UNSPEC_MMA_XVF64GERNN "xvf64gernn")]) -(define_int_attr vvi4i4i8 [(UNSPEC_MMA_PMXVI4GER8 "pmxvi4ger8")]) +;; The "pm" prefix is not in these expansions, so that we can generate +;; pmdmxvi4ger8 on systems with dense math registers and xvi4ger8 on systems +;; without dense math registers. +(define_int_attr vvi4i4i8 [(UNSPEC_MMA_PMXVI4GER8 "xvi4ger8")]) -(define_int_attr avvi4i4i8 [(UNSPEC_MMA_PMXVI4GER8PP "pmxvi4ger8pp")]) +(define_int_attr avvi4i4i8 [(UNSPEC_MMA_PMXVI4GER8PP "xvi4ger8pp")]) -(define_int_attr vvi4i4i2 [(UNSPEC_MMA_PMXVI16GER2"pmxvi16ger2") -(UNSPEC_MMA_PMXVI16GER2S "pmxvi16ger2s") -(UNSPEC_MMA_PMXVF16GER2"pmxvf16ger2") -(UNSPEC_MMA_PMXVBF16GER2 "pmxvbf16ger2")]) +(define_int_attr vvi4i4i2 [(UNSPEC_MMA_PMXVI16GER2"xvi16ger2") +(UNSPEC_MMA_PMXVI16GER2S "xvi16ger2s") +(UNSPEC_MMA_PMXVF16GER2"xvf16ger2") +(UNSPEC_MMA_PMXVBF16GER2 "xvbf16ger2")]) -(define_int_attr avvi4i4i2 [(UNSPEC_MMA_PMXVI16GER2PP "pmxvi16ger2pp") -(UNSPEC_MMA_PMXVI16GER2SPP "pmxvi16ger2spp") -(UNSPEC_MMA_PMXVF16GER2PP "pmxvf16ger2pp") -(UNSPEC_MMA_PMXVF16GER2PN "pmxvf16ger2pn") -(UNSPEC_MMA_PMXVF16GER2NP "pmxvf16ger2np") -(UNSPEC_MMA_PMXVF16GER2NN "pmxvf16ger2nn") -(UNSPEC_MMA_PMXVBF16GER2PP "pmxvbf16ger2pp") -(UNSPEC_MMA_PMXVBF16GER2PN "pmxvbf16ger2pn") -(UNSPEC_MMA_PMXVBF16GER2NP "pmxvbf16ger2np") -(UNSPEC_MMA_PMXVBF16GER2NN "pmxvbf16ger2nn")]) +(define_int_attr avvi4i4i2 [(UNSPEC_MMA_PMXVI16GER2PP "xvi16ger2pp") +(UNSPEC_MMA_PMXVI16GER2SPP "xvi16ger2spp") +(UNSPEC_MMA_PMXVF16GER2PP "xvf16ger2pp") +(UNSPE
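The naming rule in the commit message above (an optional 'pm' prefix first, then an optional 'dm' prefix, then the base name) can be sketched as a small helper. The function mma_mnemonic is hypothetical and only models the string construction; the patch itself does this in the machine description through int attributes and output templates.

```c
#include <stdbool.h>
#include <stdio.h>

/* Build an MMA mnemonic from BASE, the instruction name without any
   prefix (e.g. "xvf64gerpp" or "xvi4ger8").  PREFIXED selects the "pm"
   form and DENSE_MATH selects the "dm" spelling; "pm" always comes
   before "dm".  Returns the snprintf result.  */
int
mma_mnemonic (char *buf, size_t size, const char *base,
	      bool prefixed, bool dense_math)
{
  return snprintf (buf, size, "%s%s%s",
		   prefixed ? "pm" : "",
		   dense_math ? "dm" : "",
		   base);
}
```

With dense math, "xvf64gerpp" becomes "dmxvf64gerpp" and the prefixed "xvi4ger8" becomes "pmdmxvi4ger8". Without dense math, the same calls give "xvf64gerpp" and "pmxvi4ger8".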
[gcc(refs/users/meissner/heads/work162-dmf)] Add dense math test for new instruction names.
https://gcc.gnu.org/g:da950b93278df73899ae4a6e027fca4c01aa00b2 commit da950b93278df73899ae4a6e027fca4c01aa00b2 Author: Michael Meissner Date: Wed Mar 13 02:28:55 2024 -0400 Add dense math test for new instruction names. 2024-03-13 Michael Meissner gcc/testsuite/ * gcc.target/powerpc/dm-double-test.c: New test. * lib/target-supports.exp (check_effective_target_ppc_dmr_ok): New target test. Diff: --- gcc/testsuite/gcc.target/powerpc/dm-double-test.c | 194 ++ gcc/testsuite/lib/target-supports.exp | 23 +++ 2 files changed, 217 insertions(+) diff --git a/gcc/testsuite/gcc.target/powerpc/dm-double-test.c b/gcc/testsuite/gcc.target/powerpc/dm-double-test.c new file mode 100644 index 000..66c19779585 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/dm-double-test.c @@ -0,0 +1,194 @@ +/* Test derived from mma-double-1.c, modified for dense math. */ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_dense_math_ok } */ +/* { dg-options "-mdejagnu-cpu=future -O2" } */ + +#include +#include +#include + +typedef unsigned char vec_t __attribute__ ((vector_size (16))); +typedef double v4sf_t __attribute__ ((vector_size (16))); +#define SAVE_ACC(ACC, ldc, J) \ + __builtin_mma_disassemble_acc (result, ACC); \ + rowC = (v4sf_t *) &CO[0*ldc+J]; \ + rowC[0] += result[0]; \ + rowC = (v4sf_t *) &CO[1*ldc+J]; \ + rowC[0] += result[1]; \ + rowC = (v4sf_t *) &CO[2*ldc+J]; \ + rowC[0] += result[2]; \ + rowC = (v4sf_t *) &CO[3*ldc+J]; \ + rowC[0] += result[3]; + +void +DM (int m, int n, int k, double *A, double *B, double *C) +{ + __vector_quad acc0, acc1, acc2, acc3, acc4, acc5, acc6, acc7; + v4sf_t result[4]; + v4sf_t *rowC; + for (int l = 0; l < n; l += 4) +{ + double *CO; + double *AO; + AO = A; + CO = C; + C += m * 4; + for (int j = 0; j < m; j += 16) + { + double *BO = B; + __builtin_mma_xxsetaccz (&acc0); + __builtin_mma_xxsetaccz (&acc1); + __builtin_mma_xxsetaccz (&acc2); + __builtin_mma_xxsetaccz (&acc3); + __builtin_mma_xxsetaccz (&acc4); + 
__builtin_mma_xxsetaccz (&acc5); + __builtin_mma_xxsetaccz (&acc6); + __builtin_mma_xxsetaccz (&acc7); + unsigned long i; + + for (i = 0; i < k; i++) + { + vec_t *rowA = (vec_t *) & AO[i * 16]; + __vector_pair rowB; + vec_t *rb = (vec_t *) & BO[i * 4]; + __builtin_mma_assemble_pair (&rowB, rb[1], rb[0]); + __builtin_mma_xvf64gerpp (&acc0, rowB, rowA[0]); + __builtin_mma_xvf64gerpp (&acc1, rowB, rowA[1]); + __builtin_mma_xvf64gerpp (&acc2, rowB, rowA[2]); + __builtin_mma_xvf64gerpp (&acc3, rowB, rowA[3]); + __builtin_mma_xvf64gerpp (&acc4, rowB, rowA[4]); + __builtin_mma_xvf64gerpp (&acc5, rowB, rowA[5]); + __builtin_mma_xvf64gerpp (&acc6, rowB, rowA[6]); + __builtin_mma_xvf64gerpp (&acc7, rowB, rowA[7]); + } + SAVE_ACC (&acc0, m, 0); + SAVE_ACC (&acc2, m, 4); + SAVE_ACC (&acc1, m, 2); + SAVE_ACC (&acc3, m, 6); + SAVE_ACC (&acc4, m, 8); + SAVE_ACC (&acc6, m, 12); + SAVE_ACC (&acc5, m, 10); + SAVE_ACC (&acc7, m, 14); + AO += k * 16; + BO += k * 4; + CO += 16; + } + B += k * 4; +} +} + +void +init (double *matrix, int row, int column) +{ + for (int j = 0; j < column; j++) +{ + for (int i = 0; i < row; i++) + { + matrix[j * row + i] = (i * 16 + 2 + j) / 0.123; + } +} +} + +void +init0 (double *matrix, double *matrix1, int row, int column) +{ + for (int j = 0; j < column; j++) +for (int i = 0; i < row; i++) + matrix[j * row + i] = matrix1[j * row + i] = 0; +} + + +void +print (const char *name, const double *matrix, int row, int column) +{ + printf ("Matrix %s has %d rows and %d columns:\n", name, row, column); + for (int i = 0; i < row; i++) +{ + for (int j = 0; j < column; j++) + { + printf ("%f ", matrix[j * row + i]); + } + printf ("\n"); +} + printf ("\n"); +} + +int +main (int argc, char *argv[]) +{ + int rowsA, colsB, common; + int i, j, k; + int ret = 0; + + for (int t = 16; t <= 128; t += 16) +{ + for (int t1 = 4; t1 <= 16; t1 += 4) + { + rowsA = t; + colsB = t1; + common = 1; + /* printf ("Running test for rows = %d,cols = %d\n", t, t1); */ + double A[rowsA * 
common]; + double B[common * colsB]; + double C[rowsA * colsB]; + double D[rowsA * colsB]; + + + init (A, rowsA, common); + init (B, common, colsB); + init0 (C, D, rowsA, colsB); + DM (rowsA, colsB, common, A, B, C); + +
[gcc(refs/users/meissner/heads/work162-dmf)] PowerPC: Add support for 1,024 bit DMR registers.
https://gcc.gnu.org/g:18c91326e38cead42b2101729a3b97e1816832e8 commit 18c91326e38cead42b2101729a3b97e1816832e8 Author: Michael Meissner Date: Wed Mar 13 02:33:43 2024 -0400 PowerPC: Add support for 1,024 bit DMR registers. This patch is a preliminary patch to add the full 1,024 bit dense math registers (DMRs) for -mcpu=future. The MMA 512-bit accumulators map onto the top of the DMR register. This patch only adds the new 1,024 bit register support. It does not add support for any instructions that need 1,024 bit registers instead of 512 bit registers. I used the new mode 'TDOmode' to be the opaque mode used for 1,024 bit registers. The 'wD' constraint added in previous patches is used for these registers. I added support to do load and store of DMRs via the VSX registers, since there are no load/store dense math instructions. I added the new keyword '__dmr' to create 1,024 bit types that can be loaded into DMRs. At present, I don't have aliases for __dmr512 and __dmr1024 that we've discussed internally. The patches have been tested on both little and big endian systems. Can I check it into the master branch? 2024-03-13 Michael Meissner gcc/ * config/rs6000/mma.md (UNSPEC_DM_INSERT512_UPPER): New unspec. (UNSPEC_DM_INSERT512_LOWER): Likewise. (UNSPEC_DM_EXTRACT512): Likewise. (UNSPEC_DMR_RELOAD_FROM_MEMORY): Likewise. (UNSPEC_DMR_RELOAD_TO_MEMORY): Likewise. (movtdo): New define_expand and define_insn_and_split to implement 1,024 bit DMR registers. (movtdo_insert512_upper): New insn. (movtdo_insert512_lower): Likewise. (movtdo_extract512): Likewise. (reload_dmr_from_memory): Likewise. (reload_dmr_to_memory): Likewise. * config/rs6000/rs6000-builtin.cc (rs6000_type_string): Add DMR support. (rs6000_init_builtins): Add support for __dmr keyword. * config/rs6000/rs6000-call.cc (rs6000_return_in_memory): Add support for TDOmode. (rs6000_function_arg): Likewise. * config/rs6000/rs6000-modes.def (TDOmode): New mode. 
* config/rs6000/rs6000.cc (rs6000_hard_regno_nregs_internal): Add support for TDOmode. (rs6000_hard_regno_mode_ok_uncached): Likewise. (rs6000_hard_regno_mode_ok): Likewise. (rs6000_modes_tieable_p): Likewise. (rs6000_debug_reg_global): Likewise. (rs6000_setup_reg_addr_masks): Likewise. (rs6000_init_hard_regno_mode_ok): Add support for TDOmode. Setup reload hooks for DMR mode. (reg_offset_addressing_ok_p): Add support for TDOmode. (rs6000_emit_move): Likewise. (rs6000_secondary_reload_simple_move): Likewise. (rs6000_preferred_reload_class): Likewise. (rs6000_secondary_reload_class): Likewise. (rs6000_mangle_type): Add mangling for __dmr type. (rs6000_dmr_register_move_cost): Add support for TDOmode. (rs6000_split_multireg_move): Likewise. (rs6000_invalid_conversion): Likewise. * config/rs6000/rs6000.h (VECTOR_ALIGNMENT_P): Add TDOmode. (enum rs6000_builtin_type_index): Add DMR type nodes. (dmr_type_node): Likewise. (ptr_dmr_type_node): Likewise. gcc/testsuite/ * gcc.target/powerpc/dm-1024bit.c: New test. Diff: --- gcc/config/rs6000/mma.md | 154 ++ gcc/config/rs6000/rs6000-builtin.cc | 17 +++ gcc/config/rs6000/rs6000-call.cc | 10 +- gcc/config/rs6000/rs6000-modes.def| 4 + gcc/config/rs6000/rs6000.cc | 101 - gcc/config/rs6000/rs6000.h| 6 +- gcc/testsuite/gcc.target/powerpc/dm-1024bit.c | 63 +++ 7 files changed, 321 insertions(+), 34 deletions(-) diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md index f3870eac51a..4f9c59046ea 100644 --- a/gcc/config/rs6000/mma.md +++ b/gcc/config/rs6000/mma.md @@ -91,6 +91,11 @@ UNSPEC_MMA_XVI8GER4SPP UNSPEC_MMA_XXMFACC UNSPEC_MMA_XXMTACC + UNSPEC_DM_INSERT512_UPPER + UNSPEC_DM_INSERT512_LOWER + UNSPEC_DM_EXTRACT512 + UNSPEC_DMR_RELOAD_FROM_MEMORY + UNSPEC_DMR_RELOAD_TO_MEMORY ]) (define_c_enum "unspecv" @@ -770,3 +775,152 @@ } [(set_attr "type" "mma") (set_attr "prefixed" "yes")]) + +;; TDOmode (__dmr keyword for 1,024 bit registers). 
+(define_expand "movtdo" + [(set (match_operand:TDO 0 "nonimmediate_operand") + (match_operand:TDO 1 "input_operand"))] + "TARGET_MMA_DENSE_MATH" +{ + rs6000_emit_move (operands[0], operands[1], TDOmode); + DONE; +}) + +(define_insn_and_split "*movtdo" + [(set (match_operand:TDO 0 "noni
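The commit message above says that a DMR is 1,024 bits wide and that loads and stores go through the VSX registers because there are no load/store dense math instructions. The sketch below is a plain software model of that idea, not the patch's implementation. Everything in it is hypothetical: struct dmr_model, the helper names, the use of a 64-byte staging buffer to stand in for a group of four 128-bit VSX registers, and in particular the assumption that the first 64 bytes in memory go to the upper half.

```c
#include <assert.h>
#include <string.h>

enum
{
  DMR_BYTES = 128,	/* 1,024 bits.  */
  HALF_BYTES = 64	/* 512 bits, the size of one MMA accumulator.  */
};

/* Hypothetical software model of one dense math register.  */
struct dmr_model
{
  unsigned char upper[HALF_BYTES];
  unsigned char lower[HALF_BYTES];
};

/* Copy 512 bits into the upper (UPPER_P nonzero) or lower half.  */
void
dmr_insert512 (struct dmr_model *dmr, int upper_p, const unsigned char *src)
{
  memcpy (upper_p ? dmr->upper : dmr->lower, src, HALF_BYTES);
}

/* Copy 512 bits out of the upper (UPPER_P nonzero) or lower half.  */
void
dmr_extract512 (const struct dmr_model *dmr, int upper_p, unsigned char *dst)
{
  memcpy (dst, upper_p ? dmr->upper : dmr->lower, HALF_BYTES);
}

/* Load a DMR from memory in two 512-bit steps.  VSX_GROUP stands in for
   four 128-bit VSX registers; the memory-to-half ordering is assumed.  */
void
dmr_load (struct dmr_model *dmr, const unsigned char *mem)
{
  unsigned char vsx_group[HALF_BYTES];

  assert (dmr != NULL && mem != NULL);
  memcpy (vsx_group, mem, HALF_BYTES);
  dmr_insert512 (dmr, 1, vsx_group);
  memcpy (vsx_group, mem + HALF_BYTES, HALF_BYTES);
  dmr_insert512 (dmr, 0, vsx_group);
}

/* Store a DMR to memory, again staging each half through VSX_GROUP.  */
void
dmr_store (const struct dmr_model *dmr, unsigned char *mem)
{
  unsigned char vsx_group[HALF_BYTES];

  assert (dmr != NULL && mem != NULL);
  dmr_extract512 (dmr, 1, vsx_group);
  memcpy (mem, vsx_group, HALF_BYTES);
  dmr_extract512 (dmr, 0, vsx_group);
  memcpy (mem + HALF_BYTES, vsx_group, HALF_BYTES);
}
```

The point of the model is only that a full DMR move decomposes into two 512-bit insert or extract steps, which matches the shape suggested by the unspec names UNSPEC_DM_INSERT512_UPPER, UNSPEC_DM_INSERT512_LOWER and UNSPEC_DM_EXTRACT512 in the ChangeLog.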
[gcc(refs/users/meissner/heads/work162-dmf)] Update ChangeLog.*
https://gcc.gnu.org/g:9e9f7da1148b547dd4aa1f2084cd7df1d407d2dd commit 9e9f7da1148b547dd4aa1f2084cd7df1d407d2dd Author: Michael Meissner Date: Wed Mar 13 02:36:19 2024 -0400 Update ChangeLog.* Diff: --- gcc/ChangeLog.dmf | 307 ++ 1 file changed, 307 insertions(+) diff --git a/gcc/ChangeLog.dmf b/gcc/ChangeLog.dmf index 4bf550e6556..03ab4ad714c 100644 --- a/gcc/ChangeLog.dmf +++ b/gcc/ChangeLog.dmf @@ -1,5 +1,312 @@ + Branch work162-dmf, patch #106 + +PowerPC: Add support for 1,024 bit DMR registers. + +This patch is a prelimianry patch to add the full 1,024 bit dense math register +(DMRs) for -mcpu=future. The MMA 512-bit accumulators map onto the top of the +DMR register. + +This patch only adds the new 1,024 bit register support. It does not add +support for any instructions that need 1,024 bit registers instead of 512 bit +registers. + +I used the new mode 'TDOmode' to be the opaque mode used for 1,024 bit +registers. The 'wD' constraint added in previous patches is used for these +registers. I added support to do load and store of DMRs via the VSX registers, +since there are no load/store dense math instructions. I added the new keyword +'__dmr' to create 1,024 bit types that can be loaded into DMRs. At present, I +don't have aliases for __dmr512 and __dmr1024 that we've discussed internally. + +The patches have been tested on both little and big endian systems. Can I check +it into the master branch? + +2024-03-13 Michael Meissner + +gcc/ + + * config/rs6000/mma.md (UNSPEC_DM_INSERT512_UPPER): New unspec. + (UNSPEC_DM_INSERT512_LOWER): Likewise. + (UNSPEC_DM_EXTRACT512): Likewise. + (UNSPEC_DMR_RELOAD_FROM_MEMORY): Likewise. + (UNSPEC_DMR_RELOAD_TO_MEMORY): Likewise. + (movtdo): New define_expand and define_insn_and_split to implement 1,024 + bit DMR registers. + (movtdo_insert512_upper): New insn. + (movtdo_insert512_lower): Likewise. + (movtdo_extract512): Likewise. + (reload_dmr_from_memory): Likewise. + (reload_dmr_to_memory): Likewise. 
+ * config/rs6000/rs6000-builtin.cc (rs6000_type_string): Add DMR + support. + (rs6000_init_builtins): Add support for __dmr keyword. + * config/rs6000/rs6000-call.cc (rs6000_return_in_memory): Add support + for TDOmode. + (rs6000_function_arg): Likewise. + * config/rs6000/rs6000-modes.def (TDOmode): New mode. + * config/rs6000/rs6000.cc (rs6000_hard_regno_nregs_internal): Add + support for TDOmode. + (rs6000_hard_regno_mode_ok_uncached): Likewise. + (rs6000_hard_regno_mode_ok): Likewise. + (rs6000_modes_tieable_p): Likewise. + (rs6000_debug_reg_global): Likewise. + (rs6000_setup_reg_addr_masks): Likewise. + (rs6000_init_hard_regno_mode_ok): Add support for TDOmode. Setup reload + hooks for DMR mode. + (reg_offset_addressing_ok_p): Add support for TDOmode. + (rs6000_emit_move): Likewise. + (rs6000_secondary_reload_simple_move): Likewise. + (rs6000_preferred_reload_class): Likewise. + (rs6000_secondary_reload_class): Likewise. + (rs6000_mangle_type): Add mangling for __dmr type. + (rs6000_dmr_register_move_cost): Add support for TDOmode. + (rs6000_split_multireg_move): Likewise. + (rs6000_invalid_conversion): Likewise. + * config/rs6000/rs6000.h (VECTOR_ALIGNMENT_P): Add TDOmode. + (enum rs6000_builtin_type_index): Add DMR type nodes. + (dmr_type_node): Likewise. + (ptr_dmr_type_node): Likewise. + +gcc/testsuite/ + + * gcc.target/powerpc/dm-1024bit.c: New test. + + Branch work162-dmf, patch #105 + +Add dense math test for new instruction names. + +2024-03-13 Michael Meissner + +gcc/testsuite/ + + * gcc.target/powerpc/dm-double-test.c: New test. + * lib/target-supports.exp (check_effective_target_ppc_dmr_ok): New + target test. + + Branch work162-dmf, patch #104 + +PowerPC: Switch to dense math names for all MMA operations. + +This patch changes the assembler instruction names for MMA instructions from +the original name used in power10 to the new name when used with the dense math +system. I.e. xvf64gerpp becomes dmxvf64gerpp. 
The assembler will emit the +same bits for either spelling. + +For the non-prefixed MMA instructions, we add a 'dm' prefix in front of the +instruction. However, the prefixed instructions have a 'pm' prefix, and we add +the 'dm' prefix afterwards. To prevent having two sets of parallel int +attributes, we remove the "pm" prefix from the instruction string in the +attributes, and add it later, both in the insn name and in the output template. + +2024-03-13 Michael Meissner + +gcc/ + + * config/rs6000/mma.md (vvi4i4i8): Change the instruction to not have a + "p