[gcc/aoliva/heads/testbase] (71 commits) Align tight&hot loop without considering max skipping bytes
The branch 'aoliva/heads/testbase' was updated to point to:

  b644126237a... Align tight&hot loop without considering max skipping bytes

It previously pointed to:

  7acd5d71547... testsuite: adjust iteration count for ppc costmodel 76b

Diff:

Summary of changes (added commits):
---

  b644126... Align tight&hot loop without considering max skipping bytes (*)
  00ed542... Adjust generic loop alignment from 16:11:8 to 16 for Intel (*)
  d9933e8... testsuite, rs6000: Replace powerpc_vsx_ok with powerpc_vsx (*)
  a19f588... Gori_on_edge tweaks. (*)
  e5fc5d4... rs6000: Don't clobber return value when eh_return called [P (*)
  2b84169... Daily bump. (*)
  1d6199e... Reduce cost of MEM (A + imm). (*)
  6f36cc2... More tweaks from gimple_outgoing_range changes. (*)
  802a98d... resource.cc: Remove redundant conditionals (*)
  e1abce5... resource.cc (mark_target_live_regs): Remove check for bb no (*)
  933ab59... resource.cc: Replace calls to find_basic_block with cfgrtl (*)
  84b4ed4... resource.cc (mark_target_live_regs): Don't look past target (*)
  91d7905... i386: Improve access to _Atomic DImode location via XMM reg (*)
  21fc89b... diagnostics: consolidate global state in diagnostic-color.c (*)
  9bda2c4... libcpp: move label_text to its own header (*)
  fb7a943... selftests: split out make_fndecl from selftest.h to its own (*)
  7cc529f... regenerate-opt-urls.py: fix transposed values for "vax" and (*)
  efaaae4... c++: extend -Wself-move for mem-init-list [PR109396] (*)
  5ada486... Do not invoke SCEV if it will use a different range query. (*)
  d52b44a... Strlen pass should set current range query. (*)
  5bc731b... c++: mark TARGET_EXPRs for function arguments eliding [PR11 (*)
  c0d7828... testsuite/*/gomp: Remove 'dg-prune-output "not supported ye (*)
  2dbb1c1... diagnostics: disable localization of events in selftest pat (*)
  b544ff8... Fix bootstrap on AIX by adding c-family/c-type-mismatch.cc (*)
  2361160... [to-be-committed] [RISC-V] Some basic patterns for zbkb cod (*)
  a3aeff4... vect: Use vect representative statement instead of original (*)
  d8d70b7... target/115254 - fix gcc.dg/vect/vect-gather-4.c dump scanni (*)
  c08b0d3... tree-optimization/115236 - more points-to *ANYTHING = x fix (*)
  19cc611... Avoid pessimistic constraints for asm memory constraints (*)
  eaaa4b8... tree-optimization/115254 - don't account single-lane SLP ag (*)
  65aa46f... Fix SLP reduction neutral op value for pointer reductions (*)
  c650023... Fix predicate mismatch between vfcmaddcph's define_insn and (*)
  ded91d8... LoongArch: Guard REGNO with REG_P in loongarch_expand_condi (*)
  4fcdc37... Fix bitops-9.c for -m32 and other targets that don't have v (*)
  958a682... Daily bump. (*)
  c5a7628... match: Use uniform_integer_cst_p in bitwise_inverted_equal_ (*)
  a209f21... modula2: simplify xref usage in documentation, remove exter (*)
  07cdba6... Fix points-to SCC collapsing bug (*)
  f9fbb47... tree-optimization/115220 - fix store sinking virtual operan (*)
  f6c5f83... Define which threading model is in use on Windows (*)
  311d7f5... tree-optimization/115232 - demangle failure during -Waccess (*)
  88c9b96... Add testcase for PR c++/105229: ICE in lookup_template_clas (*)
  6e97482... doc: Use https for our own site (and GCC for the project) (*)
  06bb125... RISC-V: Fix missing boolean_expression in zmmul extension (*)
  314448f... VAX/doc: Fix issues with FP format option documentation (*)
  a7f6543... vax: Fix descriptions of the FP format options [PR79646] (*)
  1609294... [to-be-committed][RISC-V] Reassociate constants in logical (*)
  0022064... x86: Fix Logical Shift Issue in expand_vec_perm_psrlw_psllw (*)
  5d99cf7... Gen-Match: Fix gen_kids_1 right hand braces mis-alignment (*)
  56d0d0d... Daily bump. (*)
  3a915d6... [to-be-committed] [RISC-V] Try inverting for constant synth (*)
  a06df66... go: Move web references from golang.org to go.dev. (*)
  53d9198... doc: Quote singular '=' signs (*)
  9566022... [to-be-committed][RISC-V] Generate nearby constant, then ad (*)
  8746373... [PATCH] libcpp: Correct typo 'r' -> '\r' (*)
  f981072... Delete gori_map during destruction of GORI. (*)
  3c7ae57... Daily bump. (*)
  05daf61... [committed] [v2] More logical op simplifications in simplif (*)
  28b5082... c++/modules: Improve diagnostic when redeclaring builtin in (*)
  6c0b7e1... Daily bump. (*)
  9561cf5... Fortran: improve attribute conflict checking [PR93635] (*)
  9376573... Fortran: fix bounds check for assignment, class component [ (*)
  73eef7a... Small enhancement to implementation of -fdump-ada-spec (*)
  9f1798c... c: Fix for some variably modified types not being recognize (*)
  dae606a... c++/modules: Improve errors for bad module-directives [PR11 (*)
  03531ec... c++/modules: Remember that header units have CMIs (*)
  0173dcc... c++/modules: Fix treatment of unnamed types (*)
  401994d... [to-be-committed,v2,RISC-V] Use bclri in constant synt
[gcc/aoliva/heads/testme] (78 commits) [testsuite] [powerpc] adjust -m32 counts for fold-vec-extra
The branch 'aoliva/heads/testme' was updated to point to:

  ca809ee3fbe... [testsuite] [powerpc] adjust -m32 counts for fold-vec-extra

It previously pointed to:

  3bcf4294d89... [rs6000] adjust return_pc debug attrs

Diff:

!!! WARNING: THE FOLLOWING COMMITS ARE NO LONGER ACCESSIBLE (LOST):
---

  3bcf429... [rs6000] adjust return_pc debug attrs
  a56062c... enable adjustment of return_pc debug attrs

Summary of changes (added commits):
---

  ca809ee... [testsuite] [powerpc] adjust -m32 counts for fold-vec-extra
  0276651... [libstdc++-v3] [rtems] enable filesystem support
  1c34040... [tree-prof] skip if errors were seen [PR113681]
  1b22d42... [testsuite] [arm] add effective target and options for pacb
  d34a3eb... add explicit ABI and align options to pr88233.c
  0bb10f1... [rs6000] adjust return_pc debug attrs
  99047b7... enable adjustment of return_pc debug attrs
  b644126... Align tight&hot loop without considering max skipping bytes (*)
  00ed542... Adjust generic loop alignment from 16:11:8 to 16 for Intel (*)
  d9933e8... testsuite, rs6000: Replace powerpc_vsx_ok with powerpc_vsx (*)
  a19f588... Gori_on_edge tweaks. (*)
  e5fc5d4... rs6000: Don't clobber return value when eh_return called [P (*)
  2b84169... Daily bump. (*)
  1d6199e... Reduce cost of MEM (A + imm). (*)
  6f36cc2... More tweaks from gimple_outgoing_range changes. (*)
  802a98d... resource.cc: Remove redundant conditionals (*)
  e1abce5... resource.cc (mark_target_live_regs): Remove check for bb no (*)
  933ab59... resource.cc: Replace calls to find_basic_block with cfgrtl (*)
  84b4ed4... resource.cc (mark_target_live_regs): Don't look past target (*)
  91d7905... i386: Improve access to _Atomic DImode location via XMM reg (*)
  21fc89b... diagnostics: consolidate global state in diagnostic-color.c (*)
  9bda2c4... libcpp: move label_text to its own header (*)
  fb7a943... selftests: split out make_fndecl from selftest.h to its own (*)
  7cc529f... regenerate-opt-urls.py: fix transposed values for "vax" and (*)
  efaaae4... c++: extend -Wself-move for mem-init-list [PR109396] (*)
  5ada486... Do not invoke SCEV if it will use a different range query. (*)
  d52b44a... Strlen pass should set current range query. (*)
  5bc731b... c++: mark TARGET_EXPRs for function arguments eliding [PR11 (*)
  c0d7828... testsuite/*/gomp: Remove 'dg-prune-output "not supported ye (*)
  2dbb1c1... diagnostics: disable localization of events in selftest pat (*)
  b544ff8... Fix bootstrap on AIX by adding c-family/c-type-mismatch.cc (*)
  2361160... [to-be-committed] [RISC-V] Some basic patterns for zbkb cod (*)
  a3aeff4... vect: Use vect representative statement instead of original (*)
  d8d70b7... target/115254 - fix gcc.dg/vect/vect-gather-4.c dump scanni (*)
  c08b0d3... tree-optimization/115236 - more points-to *ANYTHING = x fix (*)
  19cc611... Avoid pessimistic constraints for asm memory constraints (*)
  eaaa4b8... tree-optimization/115254 - don't account single-lane SLP ag (*)
  65aa46f... Fix SLP reduction neutral op value for pointer reductions (*)
  c650023... Fix predicate mismatch between vfcmaddcph's define_insn and (*)
  ded91d8... LoongArch: Guard REGNO with REG_P in loongarch_expand_condi (*)
  4fcdc37... Fix bitops-9.c for -m32 and other targets that don't have v (*)
  958a682... Daily bump. (*)
  c5a7628... match: Use uniform_integer_cst_p in bitwise_inverted_equal_ (*)
  a209f21... modula2: simplify xref usage in documentation, remove exter (*)
  07cdba6... Fix points-to SCC collapsing bug (*)
  f9fbb47... tree-optimization/115220 - fix store sinking virtual operan (*)
  f6c5f83... Define which threading model is in use on Windows (*)
  311d7f5... tree-optimization/115232 - demangle failure during -Waccess (*)
  88c9b96... Add testcase for PR c++/105229: ICE in lookup_template_clas (*)
  6e97482... doc: Use https for our own site (and GCC for the project) (*)
  06bb125... RISC-V: Fix missing boolean_expression in zmmul extension (*)
  314448f... VAX/doc: Fix issues with FP format option documentation (*)
  a7f6543... vax: Fix descriptions of the FP format options [PR79646] (*)
  1609294... [to-be-committed][RISC-V] Reassociate constants in logical (*)
  0022064... x86: Fix Logical Shift Issue in expand_vec_perm_psrlw_psllw (*)
  5d99cf7... Gen-Match: Fix gen_kids_1 right hand braces mis-alignment (*)
  56d0d0d... Daily bump. (*)
  3a915d6... [to-be-committed] [RISC-V] Try inverting for constant synth (*)
  a06df66... go: Move web references from golang.org to go.dev. (*)
  53d9198... doc: Quote singular '=' signs (*)
  9566022... [to-be-committed][RISC-V] Generate nearby constant, then ad (*)
  8746373... [PATCH] libcpp: Correct typo 'r' -> '\r' (*)
  f981072... Delete gori_map during destruction of GORI. (*)
  3c7ae57... Daily bump. (*)
  05daf61... [committed] [v2] More logical op simplifications in simplif (*)
  28b5082...
[gcc(refs/users/aoliva/heads/testme)] add explicit ABI and align options to pr88233.c
https://gcc.gnu.org/g:d34a3eb9286d547533a7226a504c229b2ab6d4b3

commit d34a3eb9286d547533a7226a504c229b2ab6d4b3
Author: Alexandre Oliva
Date: Wed May 29 02:52:14 2024 -0300

add explicit ABI and align options to pr88233.c

We've observed failures of this test on powerpc configurations that default to different calling conventions and alignment requirements. Both settings are needed for the original expectations to be met.

The test was later modified to have different expectations for big and little endian code generation. This patch restores the original codegen expectations, that, with the explicit options, don't vary any more.

for gcc/testsuite/ChangeLog

* gcc.target/powerpc/pr88233.c: Make some alignment strictness and calling conventions assumptions explicit. Restore uniform codegen expectations

Diff:
---
gcc/testsuite/gcc.target/powerpc/pr88233.c | 7 +++
1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/pr88233.c b/gcc/testsuite/gcc.target/powerpc/pr88233.c
index 27c73717a3f..46a3ebfa287 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr88233.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr88233.c
@@ -1,5 +1,5 @@
 /* { dg-require-effective-target lp64 } */
-/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
+/* { dg-options "-O2 -mdejagnu-cpu=power8 -mno-strict-align -fpcc-struct-return" } */
 
 typedef struct { double a[2]; } A;
 A
@@ -9,6 +9,5 @@ foo (const A *a)
 }
 
 /* { dg-final { scan-assembler-not {\mmtvsr} } } */
-/* { dg-final { scan-assembler-times {\mlxvd2x\M} 1 { target { be } } } } */
-/* { dg-final { scan-assembler-times {\mstxvd2x\M} 1 { target { be } } } } */
-/* { dg-final { scan-assembler-times {\mlfd\M} 2 { target { le } } } } */
+/* { dg-final { scan-assembler-times {\mlxvd2x\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mstxvd2x\M} 1 } } */
[gcc r15-889] libstdc++: Avoid MMX return types from __builtin_shufflevector
https://gcc.gnu.org/g:241a6cc88d866fb36bd35ddb3edb659453d6322e

commit r15-889-g241a6cc88d866fb36bd35ddb3edb659453d6322e
Author: Matthias Kretz
Date: Wed May 15 11:02:22 2024 +0200

libstdc++: Avoid MMX return types from __builtin_shufflevector

This resolves a regression on i686 that was introduced with r15-429-gfb1649f8b4ad50.

Signed-off-by: Matthias Kretz

libstdc++-v3/ChangeLog:

PR libstdc++/115247
* include/experimental/bits/simd.h (__as_vector): Don't use vector_size(8) on __i386__.
(__vec_shuffle): Never return MMX vectors, widen to 16 bytes instead.
(concat): Fix padding calculation to pick up widening logic from __as_vector.

Diff:
---
libstdc++-v3/include/experimental/bits/simd.h | 39 +++
1 file changed, 28 insertions(+), 11 deletions(-)

diff --git a/libstdc++-v3/include/experimental/bits/simd.h b/libstdc++-v3/include/experimental/bits/simd.h
index 6a6fd4f109d..7c524625719 100644
--- a/libstdc++-v3/include/experimental/bits/simd.h
+++ b/libstdc++-v3/include/experimental/bits/simd.h
@@ -1665,7 +1665,12 @@ template { static_assert(is_simd<_V>::value); using _Tp = typename _V::value_type; +#ifdef __i386__ + constexpr auto __bytes = sizeof(_Tp) == 8 ?
16 : sizeof(_Tp); + using _RV [[__gnu__::__vector_size__(__bytes)]] = _Tp; +#else using _RV [[__gnu__::__vector_size__(sizeof(_Tp))]] = _Tp; +#endif return _RV{__data(__x)}; } } @@ -2081,11 +2086,14 @@ template > // }}} // __vec_shuffle{{{ template - _GLIBCXX_SIMD_INTRINSIC constexpr auto + _GLIBCXX_SIMD_INTRINSIC constexpr + __vector_type_t()[0])>, sizeof...(_Is)> __vec_shuffle(_T0 __x, _T1 __y, index_sequence<_Is...> __seq, _Fun __idx_perm) { constexpr int _N0 = sizeof(__x) / sizeof(__x[0]); constexpr int _N1 = sizeof(__y) / sizeof(__y[0]); +using _Tp = remove_reference_t()[0])>; +using _RV [[maybe_unused]] = __vector_type_t<_Tp, sizeof...(_Is)>; #if __has_builtin(__builtin_shufflevector) #ifdef __clang__ // Clang requires _T0 == _T1 @@ -2105,14 +2113,23 @@ template }); else #endif - return __builtin_shufflevector(__x, __y, [=] { - constexpr int __j = __idx_perm(_Is); - static_assert(__j < _N0 + _N1); - return __j; -}()...); + { + const auto __r = __builtin_shufflevector(__x, __y, [=] { + constexpr int __j = __idx_perm(_Is); + static_assert(__j < _N0 + _N1); + return __j; +}()...); +#ifdef __i386__ + if constexpr (sizeof(__r) == sizeof(_RV)) + return __r; + else + return _RV {__r[_Is]...}; +#else + return __r; +#endif + } #else -using _Tp = __remove_cvref_t; -return __vector_type_t<_Tp, sizeof...(_Is)> { +return _RV { [=]() -> _Tp { constexpr int __j = __idx_perm(_Is); static_assert(__j < _N0 + _N1); @@ -4393,9 +4410,9 @@ template __vec_shuffle(__as_vector(__xs)..., std::make_index_sequence<_RW::_S_full_size>(), [](int __i) { constexpr int __sizes[2] = {int(simd_size_v<_Tp, _As>)...}; - constexpr int __padding0 - = sizeof(__vector_type_t<_Tp, __sizes[0]>) / sizeof(_Tp) - - __sizes[0]; + constexpr int __vsizes[2] + = {int(sizeof(__as_vector(__xs)) / sizeof(_Tp))...}; + constexpr int __padding0 = __vsizes[0] - __sizes[0]; return __i >= _Np ? -1 : __i < __sizes[0] ? __i : __i + __padding0; })}; }
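The `__vec_shuffle` change amounts to this: when the shuffle result would be an 8-byte (MMX-sized) vector on i386, rebuild it element by element as a 16-byte vector instead (`_RV {__r[_Is]...}`). The C sketch below illustrates only that widening idea with the GCC/Clang `vector_size` extension. It is not the libstdc++ code, which is C++ and uses `__builtin_shufflevector`; the type names `v2si`/`v4si` and the function `shuffle_widened` are hypothetical.

```c
#include <assert.h>

/* Hypothetical 8-byte (MMX-sized) and 16-byte (SSE-sized) vector types,
   declared with the GCC/Clang vector_size extension. */
typedef int v2si __attribute__((vector_size(8)));
typedef int v4si __attribute__((vector_size(16)));

/* Select two lanes from the concatenation of x and y (indices 0..3, in the
   style of __builtin_shufflevector), but return them in the low lanes of a
   16-byte vector whose upper lanes are zero padding, so the result never
   has an 8-byte vector type. */
static v4si shuffle_widened(v2si x, v2si y, int i0, int i1)
{
    assert(i0 >= 0 && i0 < 4 && i1 >= 0 && i1 < 4);
    int lane0 = i0 < 2 ? x[i0] : y[i0 - 2];
    int lane1 = i1 < 2 ? x[i1] : y[i1 - 2];
    v4si r = { lane0, lane1, 0, 0 };
    return r;
}
```

For example, `shuffle_widened((v2si){1, 2}, (v2si){3, 4}, 3, 0)` yields the lanes `{4, 1, 0, 0}` in a 16-byte vector. The padding lanes are why the commit also adjusts the padding calculation in `concat`: the widened vector holds more lanes than the logical element count.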
[gcc r15-890] libstdc++: Build libbacktrace and 19_diagnostics/stacktrace with -funwind-tables [PR111641]
https://gcc.gnu.org/g:a99ebb88f8f25e76ebed5afc22e64fa77a2f0d3f

commit r15-890-ga99ebb88f8f25e76ebed5afc22e64fa77a2f0d3f
Author: Rainer Orth
Date: Wed May 29 10:08:07 2024 +0200

libstdc++: Build libbacktrace and 19_diagnostics/stacktrace with -funwind-tables [PR111641]

Several of the 19_diagnostics/stacktrace tests FAIL on Solaris/SPARC (32 and 64-bit), Solaris/x86 (32-bit only), and several other targets:

FAIL: 19_diagnostics/stacktrace/current.cc -std=gnu++23 execution test
FAIL: 19_diagnostics/stacktrace/current.cc -std=gnu++26 execution test
FAIL: 19_diagnostics/stacktrace/entry.cc -std=gnu++23 execution test
FAIL: 19_diagnostics/stacktrace/entry.cc -std=gnu++26 execution test
FAIL: 19_diagnostics/stacktrace/output.cc -std=gnu++23 execution test
FAIL: 19_diagnostics/stacktrace/output.cc -std=gnu++26 execution test
FAIL: 19_diagnostics/stacktrace/stacktrace.cc -std=gnu++23 execution test
FAIL: 19_diagnostics/stacktrace/stacktrace.cc -std=gnu++26 execution test

As it turns out, both the copy of libbacktrace in libstdc++ and the testcases proper need to be compiled with -funwind-tables, as is done for libbacktrace itself.

This isn't an issue on Linux/x86_64 and Solaris/amd64 since 64-bit x86 always defaults to -funwind-tables. 32-bit x86 does, too, when -fomit-frame-pointer is enabled, as on Linux/i686 but unlike Solaris/i386.

So this patch always enables the option both for the libbacktrace copy and the testcases.

Tested on i386-pc-solaris2.11, sparc-sun-solaris2.11, and x86_64-pc-linux-gnu.

2024-05-23 Rainer Orth

libstdc++-v3:
PR libstdc++/111641
* src/libbacktrace/Makefile.am (AM_CFLAGS): Add -funwind-tables.
* src/libbacktrace/Makefile.in: Regenerate.
* testsuite/19_diagnostics/stacktrace/current.cc (dg-options): Add -funwind-tables.
* testsuite/19_diagnostics/stacktrace/entry.cc: Likewise.
* testsuite/19_diagnostics/stacktrace/hash.cc: Likewise.
* testsuite/19_diagnostics/stacktrace/output.cc: Likewise.
* testsuite/19_diagnostics/stacktrace/stacktrace.cc: Likewise.

Diff:
---
libstdc++-v3/src/libbacktrace/Makefile.am | 2 +-
libstdc++-v3/src/libbacktrace/Makefile.in | 2 +-
libstdc++-v3/testsuite/19_diagnostics/stacktrace/current.cc | 2 +-
libstdc++-v3/testsuite/19_diagnostics/stacktrace/entry.cc | 2 +-
libstdc++-v3/testsuite/19_diagnostics/stacktrace/hash.cc | 2 +-
libstdc++-v3/testsuite/19_diagnostics/stacktrace/output.cc | 2 +-
libstdc++-v3/testsuite/19_diagnostics/stacktrace/stacktrace.cc | 2 +-
7 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/libstdc++-v3/src/libbacktrace/Makefile.am b/libstdc++-v3/src/libbacktrace/Makefile.am
index a2e78671259..82205db46de 100644
--- a/libstdc++-v3/src/libbacktrace/Makefile.am
+++ b/libstdc++-v3/src/libbacktrace/Makefile.am
@@ -51,7 +51,7 @@ C_WARN_FLAGS = $(WARN_FLAGS) -Wstrict-prototypes -Wmissing-prototypes -Wold-styl CXX_WARN_FLAGS = $(WARN_FLAGS) -Wno-unused-parameter AM_CFLAGS = \ $(glibcxx_lt_pic_flag) $(glibcxx_compiler_shared_flag) \ - $(C_WARN_FLAGS) + $(C_WARN_FLAGS) -funwind-tables AM_CFLAGS += $(EXTRA_CFLAGS) AM_CXXFLAGS = \ $(glibcxx_lt_pic_flag) $(glibcxx_compiler_shared_flag) \

diff --git a/libstdc++-v3/src/libbacktrace/Makefile.in b/libstdc++-v3/src/libbacktrace/Makefile.in
index b5713b0c616..51c8092335a 100644
--- a/libstdc++-v3/src/libbacktrace/Makefile.in
+++ b/libstdc++-v3/src/libbacktrace/Makefile.in
@@ -473,7 +473,7 @@ libstdc___libbacktrace_la_CPPFLAGS = \ C_WARN_FLAGS = $(WARN_FLAGS) -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition -Wno-unused-but-set-variable CXX_WARN_FLAGS = $(WARN_FLAGS) -Wno-unused-parameter AM_CFLAGS = $(glibcxx_lt_pic_flag) $(glibcxx_compiler_shared_flag) \ - $(C_WARN_FLAGS) $(EXTRA_CFLAGS) + $(C_WARN_FLAGS) -funwind-tables $(EXTRA_CFLAGS) AM_CXXFLAGS = $(glibcxx_lt_pic_flag) $(glibcxx_compiler_shared_flag) \ $(CXX_WARN_FLAGS) -fno-rtti -fno-exceptions $(EXTRA_CXXFLAGS) obj_prefix = std_stacktrace

diff --git
a/libstdc++-v3/testsuite/19_diagnostics/stacktrace/current.cc b/libstdc++-v3/testsuite/19_diagnostics/stacktrace/current.cc index b1af5f74fb2..cdebd5f1daa 100644 --- a/libstdc++-v3/testsuite/19_diagnostics/stacktrace/current.cc +++ b/libstdc++-v3/testsuite/19_diagnostics/stacktrace/current.cc @@ -1,4 +1,4 @@ -// { dg-options "-lstdc++exp" } +// { dg-options "-funwind-tables -lstdc++exp" } // { dg-do run { target c++23 } } // { dg-require-cpp-feature-test __cpp_lib_stacktrace } diff --git a/libstdc++-v3/testsuite/19_diagnostics/stacktrace/entry.cc b/libstdc++-v3/testsuite/19_diagnostics/stacktrace/entry.cc index bb348ebef8f..90671e68f8b 100644 --- a/libstdc++-v3/testsuite/19_diagnostics
[gcc r15-891] Fix memory leak.
https://gcc.gnu.org/g:2f97d98d174e3ef9f3a9a83c179d787abde5e066

commit r15-891-g2f97d98d174e3ef9f3a9a83c179d787abde5e066
Author: Andre Vehreschild
Date: Wed Jul 12 16:52:15 2023 +0200

Fix memory leak.

Prevent double call of function returning a class object and free the object after copy.

gcc/fortran/ChangeLog:

PR fortran/90069
* trans-expr.cc (gfc_conv_procedure_call): Evaluate expressions with side-effects only once and ensure old is freed.

gcc/testsuite/ChangeLog:

PR fortran/90069
* gfortran.dg/class_76.f90: New test.

Diff:
---
gcc/fortran/trans-expr.cc | 29 +--
gcc/testsuite/gfortran.dg/class_76.f90 | 66 ++
2 files changed, 92 insertions(+), 3 deletions(-)

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index dfc5b8e9b4a..9f6cc8f871e 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -6725,9 +6725,32 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym, { tree efield; - /* Evaluate arguments just once. */ - if (e->expr_type != EXPR_VARIABLE) - parmse.expr = save_expr (parmse.expr); + /* Evaluate arguments just once, when they have +side effects. */ + if (TREE_SIDE_EFFECTS (parmse.expr)) + { + tree cldata, zero; + + parmse.expr = gfc_evaluate_now (parmse.expr, + &parmse.pre); + + /* Prevent memory leak, when old component +was allocated already. */ + cldata = gfc_class_data_get (parmse.expr); + zero = build_int_cst (TREE_TYPE (cldata), + 0); + tmp = fold_build2_loc (input_location, NE_EXPR, +logical_type_node, +cldata, zero); + tmp = build3_v (COND_EXPR, tmp, + gfc_call_free (cldata), + build_empty_stmt ( + input_location)); + gfc_add_expr_to_block (&parmse.finalblock, +tmp); + gfc_add_modify (&parmse.finalblock, + cldata, zero); + } /* Set the _data field. */ tmp = gfc_class_data_get (var);

diff --git a/gcc/testsuite/gfortran.dg/class_76.f90 b/gcc/testsuite/gfortran.dg/class_76.f90
new file mode 100644
index 000..1ee1e1fc25f
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/class_76.f90
@@ -0,0 +1,66 @@
+! { dg-do compile }
+! { dg-additional-options "-fdump-tree-original" }
+!
+! PR fortran/90069
+!
+! Contributed by Brad Richardson
+!
+
+program returned_memory_leak
+implicit none
+
+type, abstract :: base
+end type base
+
+type, extends(base) :: extended
+end type extended
+
+type :: container
+class(*), allocatable :: thing
+end type
+
+call run()
+contains
+subroutine run()
+type(container) :: a_container
+
+a_container = theRightWay()
+a_container = theWrongWay()
+end subroutine
+
+function theRightWay()
+type(container) :: theRightWay
+
+class(base), allocatable :: thing
+
+allocate(thing, source = newAbstract())
+theRightWay = newContainer(thing)
+end function theRightWay
+
+function theWrongWay()
+type(container) :: theWrongWay
+
+theWrongWay = newContainer(newAbstract())
+end function theWrongWay
+
+function newAbstract()
+class(base), allocatable :: newAbstract
+
+allocate(newAbstract, source = newExtended())
+end function newAbstract
+
+function newExtended()
+type(extended) :: newExtended
+end function newExtended
+
+function newContainer(thing)
+class(*), intent(in) :: thing
+type(container) :: newContainer
+
+allocate(newContainer%thing, source = thing)
+end function newContainer
+end program returned_memory_leak
+
+! { dg-final { scan-tree-dump-times "newabstract" 14 "original" } }
+! { dg-final { scan-tree-dump-times "__builtin_free" 8 "original" } }
+
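The fix has two parts. First, an argument with side effects (such as the result of `newAbstract()` passed straight into `newContainer`) is evaluated once into a temporary. Second, after the copy, the temporary's `_data` is freed if it is non-null and then cleared. The C sketch below is a rough analogue of that pattern, not gfortran's generated code. `struct container`, `make_container`, `assign_copy` and the call counter are hypothetical, and freeing the destination's old component before the copy is an added assumption standing in for Fortran's reallocate-on-assignment semantics.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical C stand-in for a derived type with an allocatable component. */
struct container {
    int *thing;
};

static int producer_calls;  /* how many times the producing "function" ran */

/* Stand-in for a function result that owns freshly allocated memory,
   like newAbstract() in the test case. */
static struct container make_container(int value)
{
    struct container c;
    producer_calls++;
    c.thing = malloc(sizeof *c.thing);
    if (c.thing)
        *c.thing = value;
    return c;
}

/* Deep-copy a temporary result into *dst: evaluate the producer exactly
   once, copy its data, then release the temporary's component. */
static void assign_copy(struct container *dst, int value)
{
    assert(dst != NULL);
    struct container tmp = make_container(value);  /* evaluate once */

    free(dst->thing);            /* assumption: drop dst's old allocation */
    dst->thing = NULL;
    if (tmp.thing) {
        dst->thing = malloc(sizeof *dst->thing);
        if (dst->thing)
            *dst->thing = *tmp.thing;
    }

    /* Like the fix's final block: free the data if allocated, then zero it. */
    if (tmp.thing != NULL)
        free(tmp.thing);
    tmp.thing = NULL;
}
```

Evaluating the producer twice, once for the copy and once for the cleanup, would both duplicate its side effects and leak one allocation. That is the double call the commit message refers to.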
[gcc r15-892] c++: canonicity of fn types w/ instantiated eh specs [PR115223]
https://gcc.gnu.org/g:58b8c87b7fb281e35a6817cc91a292096fdc02dc

commit r15-892-g58b8c87b7fb281e35a6817cc91a292096fdc02dc
Author: Patrick Palka
Date: Wed May 29 04:49:37 2024 -0400

c++: canonicity of fn types w/ instantiated eh specs [PR115223]

When propagating structural equality in build_cp_fntype_variant, we should consider structural equality of the exception-less variant, not of the given type which might use structural equality only because it has a (complex) noexcept-spec that we're intending to replace, as in maybe_instantiate_noexcept which calls build_exception_variant using the deferred-noexcept function type. Otherwise we might pessimistically use structural equality for a function type with a simple instantiated noexcept-spec, leading to an LTO-triggered type verification failure if we later use that (structural-equality) type as the canonical version of some other variant.

PR c++/115223

gcc/cp/ChangeLog:

* tree.cc (build_cp_fntype_variant): Propagate structural equality of the exception-less variant.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/noexcept87.C: New test.

Reviewed-by: Jason Merrill

Diff:
---
gcc/cp/tree.cc | 4
gcc/testsuite/g++.dg/cpp0x/noexcept87.C | 11 +++
2 files changed, 15 insertions(+)

diff --git a/gcc/cp/tree.cc b/gcc/cp/tree.cc
index fe3f034d000..72dd46e1bd1 100644
--- a/gcc/cp/tree.cc
+++ b/gcc/cp/tree.cc
@@ -2796,6 +2796,10 @@ build_cp_fntype_variant (tree type, cp_ref_qualifier rqual, bool complex_eh_spec_p = (cr && cr != noexcept_true_spec && !UNPARSED_NOEXCEPT_SPEC_P (cr)); + if (!complex_eh_spec_p && TYPE_RAISES_EXCEPTIONS (type)) +/* We want to consider structural equality of the exception-less + variant since we'll be replacing the exception specification. */ +type = build_cp_fntype_variant (type, rqual, /*raises=*/NULL_TREE, late); if (TYPE_STRUCTURAL_EQUALITY_P (type) || complex_eh_spec_p) /* Propagate structural equality.
And always use structural equality for function types with a complex noexcept-spec since their identity diff --git a/gcc/testsuite/g++.dg/cpp0x/noexcept87.C b/gcc/testsuite/g++.dg/cpp0x/noexcept87.C new file mode 100644 index 000..339569d15ae --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/noexcept87.C @@ -0,0 +1,11 @@ +// PR c++/115223 +// { dg-do compile { target c++11 } } +// { dg-additional-options -flto } + +template +void f() noexcept(bool(T() || true)); + +void g() { f(); } + +using type = void; +type callDestructorIfNecessary() noexcept {}
[gcc r15-893] i386: Fix ix86_option override after change [PR 113719]
https://gcc.gnu.org/g:499d00127d39ba894b0f7216d73660b380bdc325

commit r15-893-g499d00127d39ba894b0f7216d73660b380bdc325
Author: Hongyu Wang
Date: Wed May 15 11:24:34 2024 +0800

i386: Fix ix86_option override after change [PR 113719]

In ix86_override_options_after_change, calls to ix86_default_align and ix86_recompute_optlev_based_flags will cause mismatched target opt_set when doing cl_optimization_restore. Move them back to ix86_option_override_internal to solve the issue.

gcc/ChangeLog:

PR target/113719
* config/i386/i386-options.cc (ix86_override_options_after_change): Remove call to ix86_default_align and ix86_recompute_optlev_based_flags.
(ix86_option_override_internal): Call ix86_default_align and ix86_recompute_optlev_based_flags.

Diff:
---
gcc/config/i386/i386-options.cc | 10 +-
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index 78602a17f7e..f2cecc0e254 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -1916,11 +1916,6 @@ ix86_recompute_optlev_based_flags (struct gcc_options *opts, void ix86_override_options_after_change (void) { - /* Default align_* from the processor table. */ - ix86_default_align (&global_options); - - ix86_recompute_optlev_based_flags (&global_options, &global_options_set); - /* Disable unrolling small loops when there's explicit -f{,no}unroll-loop. */ if ((OPTION_SET_P (flag_unroll_loops))
@@ -2491,6 +2486,8 @@ ix86_option_override_internal (bool main_args_p, set_ix86_tune_features (opts, ix86_tune, opts->x_ix86_dump_tunes); + ix86_recompute_optlev_based_flags (opts, opts_set); + ix86_override_options_after_change (); ix86_tune_cost = processor_cost_table[ix86_tune];
@@ -2526,6 +2523,9 @@ ix86_option_override_internal (bool main_args_p, || TARGET_64BIT_P (opts->x_ix86_isa_flags)) opts->x_ix86_regparm = REGPARM_MAX; + /* Default align_* from the processor table.
*/ + ix86_default_align (&global_options); + /* Provide default for -mbranch-cost= value. */ SET_OPTION_IF_UNSET (opts, opts_set, ix86_branch_cost, ix86_tune_cost->branch_cost);
[gcc r15-894] Fix link failure of GNAT tools on 32-bit SPARC/Linux
https://gcc.gnu.org/g:9c6e75a6d1cc2858fc945266a5edb700edb44389

commit r15-894-g9c6e75a6d1cc2858fc945266a5edb700edb44389
Author: Eric Botcazou
Date: Wed May 29 12:06:32 2024 +0200

Fix link failure of GNAT tools on 32-bit SPARC/Linux

There is an incorrect binding to the 64-bit compare-and-exchange builtin.

gcc/ada/
PR ada/115270
* Makefile.rtl (PowerPC/Linux): Use libgnat/s-atopri__32.ads for the 32-bit library.
(SPARC/Linux): Likewise.

Diff:
---
gcc/ada/Makefile.rtl | 13 ++---
1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/gcc/ada/Makefile.rtl b/gcc/ada/Makefile.rtl
index 570d0b2703d..0f5ebb87d73 100644
--- a/gcc/ada/Makefile.rtl
+++ b/gcc/ada/Makefile.rtl
@@ -2266,15 +2266,18 @@ ifeq ($(strip $(filter-out powerpc% linux%,$(target_cpu) $(target_os))),) system.ads
[gcc r14-10258] Fix link failure of GNAT tools on 32-bit SPARC/Linux
https://gcc.gnu.org/g:fba2843b9b35b9700155677f90555700b6ad4e16

commit r14-10258-gfba2843b9b35b9700155677f90555700b6ad4e16
Author: Eric Botcazou
Date: Wed May 29 12:06:32 2024 +0200

Fix link failure of GNAT tools on 32-bit SPARC/Linux

There is an incorrect binding to the 64-bit compare-and-exchange builtin.

gcc/ada/
PR ada/115270
* Makefile.rtl (PowerPC/Linux): Use libgnat/s-atopri__32.ads for the 32-bit library.
(SPARC/Linux): Likewise.

Diff:
---
gcc/ada/Makefile.rtl | 13 ++---
1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/gcc/ada/Makefile.rtl b/gcc/ada/Makefile.rtl
index 6e1ca305faf..32cbdb69247 100644
--- a/gcc/ada/Makefile.rtl
+++ b/gcc/ada/Makefile.rtl
@@ -2238,15 +2238,18 @@ ifeq ($(strip $(filter-out powerpc% linux%,$(target_cpu) $(target_os))),) system.ads
[gcc r15-895] tree-optimization/114435 - pcom left around copies confusing SLP
https://gcc.gnu.org/g:1065a7db6f2a69770a85b4d53b9123b090dd1771

commit r15-895-g1065a7db6f2a69770a85b4d53b9123b090dd1771
Author: Richard Biener
Date: Wed May 29 10:41:51 2024 +0200

tree-optimization/114435 - pcom left around copies confusing SLP

The following arranges for the pre-SLP vectorization scalar cleanup to be run when predictive commoning was applied to a loop in the function. This is similar to the complete unroll situation and facilitating SLP vectorization. Avoiding the SSA copies in predictive commoning itself isn't easy (and predcom also sometimes unrolls, asking for scalar cleanup).

PR tree-optimization/114435
* tree-predcom.cc (tree_predictive_commoning): Queue the next scalar cleanup sub-pipeline to be run when we did something.
* gcc.dg/vect/bb-slp-pr114435.c: New testcase.

Diff:
---
gcc/testsuite/gcc.dg/vect/bb-slp-pr114435.c | 37 +
gcc/tree-predcom.cc | 3 +++
2 files changed, 40 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr114435.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr114435.c
new file mode 100644
index 000..d1eecf7979a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr114435.c
@@ -0,0 +1,37 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_double } */ +/* Predictive commining is supposed to happen.
*/ +/* { dg-additional-options "-O3 -fdump-tree-pcom" } */ + +struct res { +double r0; +double r1; +double r2; +double r3; +}; + +struct pxl { +double v0; +double v1; +double v2; +double v3; +}; + +#define IS_NAN(x) ((x) == (x)) + +void fold(struct res *r, struct pxl *in, double k, int sz) +{ + int i; + + for (i = 0; i < sz; i++) { + if (IS_NAN(k)) continue; + r->r0 += in[i].v0 * k; + r->r1 += in[i].v1 * k; + r->r2 += in[i].v2 * k; + r->r3 += in[i].v3 * k; + } +} + +/* { dg-final { scan-tree-dump "# r__r0_lsm\[^\r\n\]* = PHI" "pcom" } } */ +/* { dg-final { scan-tree-dump "optimized: basic block part vectorized" "slp1" } } */ +/* { dg-final { scan-tree-dump "# vect\[^\r\n\]* = PHI" "slp1" } } */ diff --git a/gcc/tree-predcom.cc b/gcc/tree-predcom.cc index 75a4c85164c..9844fee1e97 100644 --- a/gcc/tree-predcom.cc +++ b/gcc/tree-predcom.cc @@ -3522,6 +3522,9 @@ tree_predictive_commoning (bool allow_unroll_p) } } + if (ret != 0) +cfun->pending_TODOs |= PENDING_TODO_force_next_scalar_cleanup; + return ret; }
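Predictive commoning reuses values loaded in one loop iteration in a later iteration, carrying them in scalars instead of reloading them. The C sketch below shows the classic textbook case by hand. It is not the PR's test case (which involves reductions into `r->r0`..`r->r3`), and the function names are hypothetical. In GIMPLE the rotation `prev = next` becomes a loop PHI plus SSA copies, and according to the commit message, copies like these can confuse SLP unless the scalar cleanup that this change now schedules runs first.

```c
#include <assert.h>
#include <stddef.h>

/* Straightforward loop: each iteration loads a[i] and a[i + 1], so every
   interior element is loaded twice. */
static void sum_adjacent(const double *a, double *b, size_t n)
{
    for (size_t i = 0; i + 1 < n; i++)
        b[i] = a[i] + a[i + 1];
}

/* Hand-written version of what predictive commoning aims for: the value
   loaded as a[i + 1] is carried in a scalar into the next iteration, where
   it is reused as a[i], so each element is loaded only once. */
static void sum_adjacent_pcom(const double *a, double *b, size_t n)
{
    if (n < 2)
        return;
    assert(a != NULL && b != NULL);
    double prev = a[0];
    for (size_t i = 0; i + 1 < n; i++) {
        double next = a[i + 1];
        b[i] = prev + next;
        prev = next;  /* rotate the carried value */
    }
}
```

Both functions compute the same results. The second one trades a load per iteration for a register-to-register copy, and the cleanup is meant to remove copies of that kind.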
[gcc r15-896] tree-optimization/115252 - enhance peeling for gaps avoidance
https://gcc.gnu.org/g:f46eaad445e680034df51bd0dec4e6c7b1f372a4

commit r15-896-gf46eaad445e680034df51bd0dec4e6c7b1f372a4
Author: Richard Biener
Date: Mon May 27 16:04:35 2024 +0200

tree-optimization/115252 - enhance peeling for gaps avoidance

Code generation for contiguous load vectorization can already deal with generalized avoidance of loading from a gap. The following extends detection of peeling for gaps requirement with that, gets rid of the old special casing of a half load and makes sure when we do access the gap we have peeling for gaps enabled.

PR tree-optimization/115252
* tree-vect-stmts.cc (get_group_load_store_type): Enhance detecting the number of cases where we can avoid accessing a gap during code generation.
(vectorizable_load): Remove old half-vector peeling for gap avoidance which is now redundant. Add gap-aligned case where it's OK to access the gap. Add assert that we have peeling for gaps enabled when we access a gap.
* gcc.dg/vect/slp-gap-1.c: New testcase.

Diff:
---
gcc/testsuite/gcc.dg/vect/slp-gap-1.c | 18 +++
gcc/tree-vect-stmts.cc | 58 +--
2 files changed, 46 insertions(+), 30 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/slp-gap-1.c b/gcc/testsuite/gcc.dg/vect/slp-gap-1.c
new file mode 100644
index 000..36463ca22c5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/slp-gap-1.c
@@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-O3" } */ + +typedef unsigned char uint8_t; +typedef short int16_t; +void pixel_sub_wxh(int16_t * __restrict diff, uint8_t *pix1, uint8_t *pix2) { + for (int y = 0; y < 4; y++) { +for (int x = 0; x < 4; x++) + diff[x + y * 4] = pix1[x] - pix2[x]; +pix1 += 16; +pix2 += 32; + } +} + +/* We can vectorize this without peeling for gaps and thus without epilogue, + but the only thing we can reliably scan is the zero-padding trick for the + partial loads.
*/ +/* { dg-final { scan-tree-dump-times "\{_\[0-9\]\+, 0" 6 "vect" { target vect64 } } } */ diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index 4219ad832db..935d80f0e1b 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -2072,16 +2072,22 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info, dr_alignment_support alss; int misalign = dr_misalignment (first_dr_info, vectype); tree half_vtype; + poly_uint64 remain; + unsigned HOST_WIDE_INT tem, num; if (overrun_p && !masked_p && (((alss = vect_supportable_dr_alignment (vinfo, first_dr_info, vectype, misalign))) == dr_aligned || alss == dr_unaligned_supported) - && known_eq (nunits, (group_size - gap) * 2) - && known_eq (nunits, group_size) - && (vector_vector_composition_type (vectype, 2, &half_vtype) - != NULL_TREE)) + && can_div_trunc_p (group_size + * LOOP_VINFO_VECT_FACTOR (loop_vinfo) - gap, + nunits, &tem, &remain) + && (known_eq (remain, 0u) + || (constant_multiple_p (nunits, remain, &num) + && (vector_vector_composition_type (vectype, num, + &half_vtype) + != NULL_TREE overrun_p = false; if (overrun_p && !can_overrun_p) @@ -11513,33 +11519,14 @@ vectorizable_load (vec_info *vinfo, unsigned HOST_WIDE_INT gap = DR_GROUP_GAP (first_stmt_info); unsigned int vect_align = vect_known_alignment_in_bytes (first_dr_info, vectype); - unsigned int scalar_dr_size - = vect_get_scalar_dr_size (first_dr_info); - /* If there's no peeling for gaps but we have a gap - with slp loads then load the lower half of the - vector only. See get_group_load_store_type for - when we apply this optimization. */ - if (slp - && loop_vinfo - && !LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) && gap != 0 - && known_eq (nunits, (group_size - gap) * 2) - && known_eq (nunits, group_size) - && gap >= (vect_align / scalar_dr_size)) - { - tree half_vtype; - new_vtype - = vector_vector_composition_type (vectype, 2, - &half_vtype); - if (new_vtype != NULL_TREE) -
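The new overrun test in get_group_load_store_type comes down to integer arithmetic. Below is a minimal sketch under stated assumptions. Plain `unsigned` values stand in for GCC's `poly_uint64`. The alignment check and the `vector_vector_composition_type` target check are left out. The helper name `gap_avoiding_pieces` is hypothetical.

```cpp
#include <cassert>
#include <optional>

// Hypothetical helper: a plain-integer sketch of the overrun test that
// r15-896 adds to get_group_load_store_type.  GCC works with poly_uint64 and
// also checks alignment and target support; both are omitted here.
//
//   group_size: elements per group in one scalar iteration
//   vf:         vectorization factor
//   gap:        trailing elements of the group that are never used
//   nunits:     lanes per vector
//
// Returns the number of equal pieces the last vector must be composed of so
// that the load stops before the gap (1 means every vector is fully used),
// or std::nullopt when no such split exists and the gap would be read.
std::optional<unsigned> gap_avoiding_pieces (unsigned group_size, unsigned vf,
                                             unsigned gap, unsigned nunits)
{
  assert (nunits > 0 && group_size * vf >= gap);
  unsigned needed = group_size * vf - gap;  // elements the loop really uses
  unsigned remain = needed % nunits;        // used elements in the last vector
  if (remain == 0)
    return 1u;
  if (nunits % remain == 0)                 // last vector = nunits/remain pieces
    return nunits / remain;
  return std::nullopt;
}
```

Take a group of 4 with a gap of 2, 4-lane vectors and a vectorization factor of 1. The sketch returns 2, which is the half-vector load that the removed special case handled. Other splits come from the same rule, such as 4 pieces for a group of 16 with a gap of 12.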
[gcc r15-897] c-family: add hints for strerror
https://gcc.gnu.org/g:19c491d1848a8410559247183597096778967edf commit r15-897-g19c491d1848a8410559247183597096778967edf Author: Oskari Pirhonen Date: Tue Feb 27 19:13:30 2024 -0600 c-family: add hints for strerror Add proper hints for implicit declaration of strerror. The results could be confusing depending on the other included headers. These example messages are from compiling a trivial program to print the string for an errno value. It only includes stdio.h (cstdio for C++). Before: $ /tmp/gcc-master/bin/gcc test.c -o test_c test.c: In function ‘main’: test.c:4:20: warning: implicit declaration of function ‘strerror’; did you mean ‘perror’? [-Wimplicit-function-declaration] 4 | printf("%s\n", strerror(0)); |^~~~ |perror $ /tmp/gcc-master/bin/g++ test.cpp -o test_cpp test.cpp: In function ‘int main()’: test.cpp:4:20: error: ‘strerror’ was not declared in this scope; did you mean ‘stderr’? 4 | printf("%s\n", strerror(0)); |^~~~ |stderr After: $ /tmp/gcc-known-headers/bin/gcc test.c -o test_c test.c: In function ‘main’: test.c:4:20: warning: implicit declaration of function ‘strerror’ [-Wimplicit-function-declaration] 4 | printf("%s\n", strerror(0)); |^~~~ test.c:2:1: note: ‘strerror’ is defined in header ‘’; this is probably fixable by adding ‘#include ’ 1 | #include +++ |+#include 2 | $ /tmp/gcc-known-headers/bin/g++ test.cpp -o test_cpp test.cpp: In function ‘int main()’: test.cpp:4:20: error: ‘strerror’ was not declared in this scope 4 | printf("%s\n", strerror(0)); |^~~~ test.cpp:2:1: note: ‘strerror’ is defined in header ‘’; this is probably fixable by adding ‘#include ’ 1 | #include +++ |+#include 2 | gcc/c-family/ChangeLog: * known-headers.cc (get_stdlib_header_for_name): Add strerror. gcc/testsuite/ChangeLog: * g++.dg/spellcheck-stdlib.C: Add check for strerror. * gcc.dg/spellcheck-stdlib-2.c: New test. 
Signed-off-by: Oskari Pirhonen Diff: --- gcc/c-family/known-headers.cc | 1 + gcc/testsuite/g++.dg/spellcheck-stdlib.C | 2 ++ gcc/testsuite/gcc.dg/spellcheck-stdlib-2.c | 8 3 files changed, 11 insertions(+) diff --git a/gcc/c-family/known-headers.cc b/gcc/c-family/known-headers.cc index dbc42eacde1..871fd714eb5 100644 --- a/gcc/c-family/known-headers.cc +++ b/gcc/c-family/known-headers.cc @@ -182,6 +182,7 @@ get_stdlib_header_for_name (const char *name, enum stdlib lib) {"strchr", {"", ""} }, {"strcmp", {"", ""} }, {"strcpy", {"", ""} }, +{"strerror", {"", ""} }, {"strlen", {"", ""} }, {"strncat", {"", ""} }, {"strncmp", {"", ""} }, diff --git a/gcc/testsuite/g++.dg/spellcheck-stdlib.C b/gcc/testsuite/g++.dg/spellcheck-stdlib.C index fd0f3a9b8c9..33718b8034e 100644 --- a/gcc/testsuite/g++.dg/spellcheck-stdlib.C +++ b/gcc/testsuite/g++.dg/spellcheck-stdlib.C @@ -104,6 +104,8 @@ void test_cstring (char *dest, char *src) // { dg-message "'#include '" "" { target *-*-* } .-1 } strcpy(dest, "test"); // { dg-error "was not declared" } // { dg-message "'#include '" "" { target *-*-* } .-1 } + strerror(0); // { dg-error "was not declared" } + // { dg-message "'#include '" "" { target *-*-* } .-1 } strlen("test"); // { dg-error "was not declared" } // { dg-message "'#include '" "" { target *-*-* } .-1 } strncat(dest, "test", 3); // { dg-error "was not declared" } diff --git a/gcc/testsuite/gcc.dg/spellcheck-stdlib-2.c b/gcc/testsuite/gcc.dg/spellcheck-stdlib-2.c new file mode 100644 index 000..4762e2ddbbd --- /dev/null +++ b/gcc/testsuite/gcc.dg/spellcheck-stdlib-2.c @@ -0,0 +1,8 @@ +/* { dg-options "-Wimplicit-function-declaration" } */ + +/* Missing . */ +void test_string_h (void) +{ + strerror (0); /* { dg-error "implicit declaration of function 'strerror'" } */ + /* { dg-message "'strerror' is defined in header ''" "" { target *-*-* } .-1 } */ +}
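The fix adds one more row to a name-to-header table. As a hedged illustration only, here is a simplified, hypothetical model of such a lookup. The struct, array and function names are not GCC's. The header names for `strerror` are the standard C and C++ ones.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical, simplified model of the name -> header table consulted by
// get_stdlib_header_for_name; names here are illustrative, not GCC's.
struct header_hint
{
  const char *name;        // identifier used without a declaration
  const char *c_header;    // header to suggest when compiling C
  const char *cxx_header;  // header to suggest when compiling C++
};

static const header_hint known_hints[] = {
  {"strcpy",   "<string.h>", "<cstring>"},
  {"strerror", "<string.h>", "<cstring>"},  // the kind of entry r15-897 adds
  {"strlen",   "<string.h>", "<cstring>"},
};

// Returns the header to suggest, or nullptr when the name is not in the table.
const char *header_for_name (const char *name, bool cxx)
{
  for (const header_hint &h : known_hints)
    if (std::strcmp (h.name, name) == 0)
      return cxx ? h.cxx_header : h.c_header;
  return nullptr;
}
```

The "Before" output in the commit message shows what happens when no entry exists. The compiler falls back to a spelling suggestion (`perror`, `stderr`) instead of naming a header.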
[gcc r15-898] libgomp: Enable USM for some nvptx devices
https://gcc.gnu.org/g:4ccb3366ade6ec9493f8ca20ab73b0da4b9816db commit r15-898-g4ccb3366ade6ec9493f8ca20ab73b0da4b9816db Author: Tobias Burnus Date: Wed May 29 15:14:38 2024 +0200 libgomp: Enable USM for some nvptx devices A few high-end nvptx devices support the attribute CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS; for those, unified shared memory is supported in hardware. This patch enables support for those - if all installed nvptx devices have this feature (as the capabilities are per device type). This exposes a bug in gomp_copy_back_icvs: it previously used omp_get_mapped_ptr to find mapped variables, but that returns the unchanged pointer in case of shared memory. In this case, however, a few pointers are actually mapped - like the ICV variables. Additionally, there was a mismatch with regard to '-1' for the device number, as gomp_copy_back_icvs and omp_get_mapped_ptr count differently. Hence, do the lookup manually. include/ChangeLog: * cuda/cuda.h (CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS): Add. libgomp/ChangeLog: * libgomp.texi (nvptx): Update USM description. * plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_num_devices): Claim support when requesting USM and all devices support CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS. * target.c (gomp_copy_back_icvs): Fix device ptr lookup. (gomp_target_init): Set GOMP_OFFLOAD_CAP_SHARED_MEM if the devices support USM.
Diff: --- include/cuda/cuda.h | 3 ++- libgomp/libgomp.texi | 7 +-- libgomp/plugin/plugin-nvptx.c | 15 +++ libgomp/target.c | 24 +++- 4 files changed, 45 insertions(+), 4 deletions(-) diff --git a/include/cuda/cuda.h b/include/cuda/cuda.h index 0dca4b3a5c0..804d08ca57e 100644 --- a/include/cuda/cuda.h +++ b/include/cuda/cuda.h @@ -83,7 +83,8 @@ typedef enum { CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_MULTIPROCESSOR = 39, CU_DEVICE_ATTRIBUTE_ASYNC_ENGINE_COUNT = 40, CU_DEVICE_ATTRIBUTE_UNIFIED_ADDRESSING = 41, - CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR = 82 + CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR = 82, + CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS = 88 } CUdevice_attribute; enum { diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi index 71d62105a20..22868635230 100644 --- a/libgomp/libgomp.texi +++ b/libgomp/libgomp.texi @@ -6435,8 +6435,11 @@ The implementation remark: the next reverse offload region is only executed after the previous one returned. @item OpenMP code that has a @code{requires} directive with - @code{unified_shared_memory} will remove any nvptx device from the - list of available devices (``host fallback''). + @code{unified_shared_memory} runs on nvptx devices if and only if + all of those support the @code{pageableMemoryAccess} property;@footnote{ + @uref{https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-requirements}} + otherwise, all nvptx device are removed from the list of available + devices (``host fallback''). @item The default per-warp stack size is 128 kiB; see also @code{-msoft-stack} in the GCC manual. 
@item The OpenMP routines @code{omp_target_memcpy_rect} and diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c index 5aad3448a8d..4cedc5390a3 100644 --- a/libgomp/plugin/plugin-nvptx.c +++ b/libgomp/plugin/plugin-nvptx.c @@ -1201,8 +1201,23 @@ GOMP_OFFLOAD_get_num_devices (unsigned int omp_requires_mask) if (num_devices > 0 && ((omp_requires_mask & ~(GOMP_REQUIRES_UNIFIED_ADDRESS + | GOMP_REQUIRES_UNIFIED_SHARED_MEMORY | GOMP_REQUIRES_REVERSE_OFFLOAD)) != 0)) return -1; + /* Check whether host page access (direct or via migration) is supported; + if so, enable USM. Currently, capabilities is per device type, hence, + check all devices. */ + if (num_devices > 0 + && (omp_requires_mask & GOMP_REQUIRES_UNIFIED_SHARED_MEMORY)) +for (int dev = 0; dev < num_devices; dev++) + { + int pi; + CUresult r; + r = CUDA_CALL_NOCHECK (cuDeviceGetAttribute, &pi, + CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS, dev); + if (r != CUDA_SUCCESS || pi == 0) + return -1; + } return num_devices; } diff --git a/libgomp/target.c b/libgomp/target.c index 5ec19ae489e..48689920d4a 100644 --- a/libgomp/target.c +++ b/libgomp/target.c @@ -2969,8 +2969,25 @@ gomp_copy_back_icvs (struct gomp_device_descr *devicep, int device) if (item == NULL) return; + gomp_mutex_lock (&devicep->lock); + + struct splay_tree_s *mem_map = &devicep->mem_map; + struct splay_tree_key_s cur_node; + void *dev_ptr = NULL; + void *host_ptr = &item->icvs; - void *dev_ptr = omp_get_mapped_ptr (host_ptr, device); +
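The plugin change enables USM only if every nvptx device reports CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS. Below is a minimal sketch of that all-or-nothing loop. It assumes a caller-supplied query function in place of the CUDA driver call. The names `pageable_query` and `usable_devices_for_usm` are hypothetical.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for cuDeviceGetAttribute (..., 
// CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS, dev): stores the attribute in
// *value and returns false if the query itself fails.
using pageable_query = std::function<bool (int dev, int *value)>;

// Sketch of the all-or-nothing rule from r15-898: when unified shared memory
// is required, every device must report pageable memory access; otherwise
// -1 is returned and all devices are dropped (host fallback).
int usable_devices_for_usm (int num_devices, bool usm_required,
                            const pageable_query &query)
{
  if (num_devices > 0 && usm_required)
    for (int dev = 0; dev < num_devices; dev++)
      {
        int pageable = 0;
        if (!query (dev, &pageable) || pageable == 0)
          return -1;
      }
  return num_devices;
}
```

Injecting the query keeps the sketch testable without a GPU. A single device without pageable memory access, or a failed query, makes the whole plugin report -1.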
[gcc r15-899] libgomp: Enable USM for AMD APUs and MI200 devices
https://gcc.gnu.org/g:18f477980c8597fe3dca2c2e8bd533c0c2b17aa6 commit r15-899-g18f477980c8597fe3dca2c2e8bd533c0c2b17aa6 Author: Tobias Burnus Date: Wed May 29 15:29:06 2024 +0200 libgomp: Enable USM for AMD APUs and MI200 devices If HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT is true, all GPUs on the system support unified shared memory. That's the case for APUs and MI200 devices when XNACK is enabled. XNACK can be enabled by setting HSA_XNACK=1 as env var for supported devices; otherwise, if XNACK is disabled, USM code will use host fallback. gcc/ChangeLog: * config/gcn/gcn-hsa.h (gcn_local_sym_hash): Fix typo. include/ChangeLog: * hsa.h (HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT): Add enum value. libgomp/ChangeLog: * libgomp.texi (gcn): Update USM handling. * plugin/plugin-gcn.c (GOMP_OFFLOAD_get_num_devices): Handle USM if HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT is true. Diff: --- gcc/config/gcn/gcn-hsa.h| 2 +- include/hsa.h | 4 +++- libgomp/libgomp.texi| 9 +++-- libgomp/plugin/plugin-gcn.c | 17 + 4 files changed, 28 insertions(+), 4 deletions(-) diff --git a/gcc/config/gcn/gcn-hsa.h b/gcc/config/gcn/gcn-hsa.h index 4611bc55392..03220555075 100644 --- a/gcc/config/gcn/gcn-hsa.h +++ b/gcc/config/gcn/gcn-hsa.h @@ -80,7 +80,7 @@ extern unsigned int gcn_local_sym_hash (const char *name); writes a new AMD GPU object file and the ABI version needs to be the same. - LLVM <= 17 defaults to 4 while LLVM >= 18 defaults to 5. GCC supports LLVM >= 13.0.1 and only LLVM >= 14 supports version 5. - Note that Fiji is only suppored with LLVM <= 17 as version 3 is no longer + Note that Fiji is only supported with LLVM <= 17 as version 3 is no longer supported in LLVM >= 18.
*/ #define ABI_VERSION_SPEC "march=fiji:--amdhsa-code-object-version=3;" \ "!march=*|march=*:--amdhsa-code-object-version=4" diff --git a/include/hsa.h b/include/hsa.h index f9b5d9daf85..3c7be95d7fd 100644 --- a/include/hsa.h +++ b/include/hsa.h @@ -466,7 +466,9 @@ typedef enum { /** * String containing the ROCr build identifier. */ - HSA_AMD_SYSTEM_INFO_BUILD_VERSION = 0x200 + HSA_AMD_SYSTEM_INFO_BUILD_VERSION = 0x200, + + HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT = 0x202 } hsa_system_info_t; /** diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi index 22868635230..e79bd7a3392 100644 --- a/libgomp/libgomp.texi +++ b/libgomp/libgomp.texi @@ -6360,8 +6360,13 @@ The implementation remark: such that the next reverse offload region is only executed after the previous one returned. @item OpenMP code that has a @code{requires} directive with - @code{unified_shared_memory} will remove any GCN device from the list of - available devices (``host fallback''). + @code{unified_shared_memory} is only supported if all AMD GPUs have the + @code{HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT} property; for + discrete GPUs, this may require setting the @code{HSA_XNACK} environment + variable to @samp{1}; for systems with both an APU and a discrete GPU that + does not support XNACK, consider using @code{ROCR_VISIBLE_DEVICES} to + enable only the APU. If not supported, all AMD GPU devices are removed + from the list of available devices (``host fallback''). @item The available stack size can be changed using the @code{GCN_STACK_SIZE} environment variable; the default is 32 kiB per thread. 
@item Low-latency memory (@code{omp_low_lat_mem_space}) is supported when the diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c index 3cdc7ba929f..3d882b5ab63 100644 --- a/libgomp/plugin/plugin-gcn.c +++ b/libgomp/plugin/plugin-gcn.c @@ -3355,8 +3355,25 @@ GOMP_OFFLOAD_get_num_devices (unsigned int omp_requires_mask) if (hsa_context.agent_count > 0 && ((omp_requires_mask & ~(GOMP_REQUIRES_UNIFIED_ADDRESS + | GOMP_REQUIRES_UNIFIED_SHARED_MEMORY | GOMP_REQUIRES_REVERSE_OFFLOAD)) != 0)) return -1; + /* Check whether host page access is supported; this is per system level + (all GPUs supported by HSA). While intrinsically true for APUs, it + requires XNACK support for discrete GPUs. */ + if (hsa_context.agent_count > 0 + && (omp_requires_mask & GOMP_REQUIRES_UNIFIED_SHARED_MEMORY)) +{ + bool b; + hsa_system_info_t type = HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT; + hsa_status_t status = hsa_fns.hsa_system_get_info_fn (type, &b); + if (status != HSA_STATUS_SUCCESS) + GOMP_PLUGIN_error ("HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT " + "failed"); + if (!b) + return -1; +} + return hsa_context.agent_count; }
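Unlike the nvptx check, the GCN plugin asks one system-wide question. The sketch below models the combined decision in GOMP_OFFLOAD_get_num_devices. The flag values and names are hypothetical. The GOMP_PLUGIN_error call that fires when the HSA query fails is omitted.

```cpp
#include <cassert>

// Hypothetical bit values: the real GOMP_REQUIRES_* constants are defined
// inside GCC/libgomp and are not reproduced here.
enum : unsigned
{
  REQ_UNIFIED_ADDRESS       = 1u << 0,
  REQ_UNIFIED_SHARED_MEMORY = 1u << 1,
  REQ_REVERSE_OFFLOAD       = 1u << 2,
  REQ_OTHER                 = 1u << 3  // any requirement the plugin cannot meet
};

// Sketch of the decision after r15-899.  svm_by_default models the result of
// the system-wide HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT query.
int usable_gpus (int agent_count, unsigned requires_mask, bool svm_by_default)
{
  const unsigned handled = REQ_UNIFIED_ADDRESS | REQ_UNIFIED_SHARED_MEMORY
                           | REQ_REVERSE_OFFLOAD;
  if (agent_count > 0 && (requires_mask & ~handled) != 0)
    return -1;  // unsupported requirement: host fallback
  if (agent_count > 0 && (requires_mask & REQ_UNIFIED_SHARED_MEMORY) != 0
      && !svm_by_default)
    return -1;  // USM requested, but not every GPU can access host memory
  return agent_count;
}
```

Because the answer is per system rather than per device, one `false` from the query drops every AMD GPU. The texi change above therefore suggests `ROCR_VISIBLE_DEVICES` for mixed APU/discrete setups.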
[gcc r15-900] c++: add module extensions
https://gcc.gnu.org/g:ff41abdca0ab9993b6170b9b1f46b3a40921f1b0 commit r15-900-gff41abdca0ab9993b6170b9b1f46b3a40921f1b0 Author: Jason Merrill Date: Thu May 16 16:09:12 2024 -0400 c++: add module extensions There is a trend in the broader C++ community to use a different extension for module interface units, even though (in GCC) they are compiled in the same way as other source files. Let's recognize these extensions as C++. .ixx is the MSVC standard, while the .c*m are supported by Clang. libc++ standard headers use .cppm, as their other source files use .cpp. Perhaps libstdc++ might use .ccm for parallel consistency? One issue with .c++m is that libcpp/mkdeps.cc has been using it for the phony dependencies to express module dependencies, so I'm changing mkdeps to something less likely to be an actual file, ".c++-module". gcc/cp/ChangeLog: * lang-specs.h: Add module interface extensions. gcc/ChangeLog: * doc/invoke.texi: Update module extension docs. libcpp/ChangeLog: * mkdeps.cc (make_write): Change .c++m to .c++-module. gcc/testsuite/ChangeLog: * g++.dg/modules/dep-1_a.C * g++.dg/modules/dep-1_b.C * g++.dg/modules/dep-2.C: Change .c++m to .c++-module. Diff: --- gcc/doc/invoke.texi| 20 ++-- gcc/cp/lang-specs.h| 6 ++ gcc/testsuite/g++.dg/modules/dep-1_a.C | 4 ++-- gcc/testsuite/g++.dg/modules/dep-1_b.C | 8 gcc/testsuite/g++.dg/modules/dep-2.C | 4 ++-- libcpp/mkdeps.cc | 13 ++--- 6 files changed, 30 insertions(+), 25 deletions(-) diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 2cba380718b..517a782987d 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -2317,9 +2317,12 @@ other language. C++ source files conventionally use one of the suffixes @samp{.C}, @samp{.cc}, @samp{.cpp}, @samp{.CPP}, @samp{.c++}, @samp{.cp}, or @samp{.cxx}; C++ header files often use @samp{.hh}, @samp{.hpp}, -@samp{.H}, or (for shared template code) @samp{.tcc}; and -preprocessed C++ files use the suffix @samp{.ii}. 
GCC recognizes -files with these names and compiles them as C++ programs even if you +@samp{.H}, or (for shared template code) @samp{.tcc}; +preprocessed C++ files use the suffix @samp{.ii}; and C++20 module interface +units sometimes use @samp{.ixx}, @samp{.cppm}, @samp{.cxxm}, @samp{.c++m}, +or @samp{.ccm}. + +GCC recognizes files with these names and compiles them as C++ programs even if you call the compiler the same way as for compiling C programs (usually with the name @command{gcc}). @@ -37705,13 +37708,10 @@ Modular compilation is @emph{not} enabled with just the version selected, although in pre-C++20 versions, it is of course an extension. -No new source file suffixes are required or supported. If you wish to -use a non-standard suffix (@pxref{Overall Options}), you also need -to provide a @option{-x c++} option too.@footnote{Some users like to -distinguish module interface files with a new suffix, such as naming -the source @code{module.cppm}, which involves -teaching all tools about the new suffix. A different scheme, such as -naming @code{module-m.cpp} would be less invasive.} +No new source file suffixes are required. A few suffixes preferred +for module interface units by other compilers (e.g. @samp{.ixx}, +@samp{.cppm}) are supported, but files with these suffixes are treated +the same as any other C++ source file. Compiling a module interface unit produces an additional output (to the assembly or object file), called a Compiled Module Interface diff --git a/gcc/cp/lang-specs.h b/gcc/cp/lang-specs.h index 7a7f5ff0ab5..e5651567a2d 100644 --- a/gcc/cp/lang-specs.h +++ b/gcc/cp/lang-specs.h @@ -39,6 +39,12 @@ along with GCC; see the file COPYING3. If not see {".HPP", "@c++-header", 0, 0, 0}, {".tcc", "@c++-header", 0, 0, 0}, {".hh", "@c++-header", 0, 0, 0}, + /* Module interface unit. Should there also be a .C counterpart? 
*/ + {".ixx", "@c++", 0, 0, 0}, /* MSVC */ + {".cppm", "@c++", 0, 0, 0}, /* Clang/libc++ */ + {".cxxm", "@c++", 0, 0, 0}, + {".c++m", "@c++", 0, 0, 0}, + {".ccm", "@c++", 0, 0, 0}, {"@c++-header", "%{E|M|MM:cc1plus -E %{fmodules-ts:-fdirectives-only -fmodule-header}" " %(cpp_options) %2 %(cpp_debug_options)}" diff --git a/gcc/testsuite/g++.dg/modules/dep-1_a.C b/gcc/testsuite/g++.dg/modules/dep-1_a.C index 5ec5dd30f6d..3e92eeaef9f 100644 --- a/gcc/testsuite/g++.dg/modules/dep-1_a.C +++ b/gcc/testsuite/g++.dg/modules/dep-1_a.C @@ -4,6 +4,6 @@ export module m:part; // { dg-module-cmi m:part } // All The Backslashes! -// { dg-final { scan-file dep-1_a.d {\nm:part\.c\+\+m: gcm.cache/m-part\.gcm} } } +// { dg-final { scan-file dep-1_a.d {\nm:part\.c\+\+-module: gcm.cache/m-part\.gcm} } } // { dg-final { scan-f
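The new lang-specs.h entries map every listed suffix to `@c++`. Recognizing these files therefore reduces to a suffix match. The helper below is hypothetical and for illustration only. GCC's driver does its own suffix handling.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: does FILENAME end in one of the module interface
// suffixes that r15-900 adds to gcc/cp/lang-specs.h?  In GCC these entries
// simply map to "@c++", so such files are compiled like any other C++ source.
bool has_module_interface_suffix (const char *filename)
{
  static const char *const suffixes[]
    = {".ixx", ".cppm", ".cxxm", ".c++m", ".ccm"};
  const std::size_t len = std::strlen (filename);
  for (const char *suffix : suffixes)
    {
      const std::size_t slen = std::strlen (suffix);
      if (len >= slen && std::strcmp (filename + len - slen, suffix) == 0)
        return true;
    }
  return false;
}
```

A name like `m:part.c++-module`, the phony target that mkdeps now writes, no longer matches the `.c++m` source suffix.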
[gcc r15-901] [to-be-committed] [RISC-V] Use pack to handle repeating constants
https://gcc.gnu.org/g:3ae02dcb108df426838bbbcc73d7d01855bc1196 commit r15-901-g3ae02dcb108df426838bbbcc73d7d01855bc1196 Author: Jeff Law Date: Wed May 29 07:41:55 2024 -0600 [to-be-committed] [RISC-V] Use pack to handle repeating constants This patch utilizes zbkb to improve the code we generate for 64-bit constants when the high half is a duplicate of the low half. Basically we generate the low half and use a pack instruction with that same register repeated, i.e. pack dest,src,src. That gives us a maximum sequence of 3 instructions and sometimes it will be just 2 instructions (say if the low 32 bits can be constructed with a single addi or lui). As with shadd, I'm abusing an RTL opcode. This time it's CONCAT. It's reasonably close to what we're doing. Obviously it's just how we identify the desire to generate a pack in the array of opcodes. We don't actually emit a CONCAT. Note that we don't care about the potential sign extension from bit 31. pack will only look at bits 0..31 of each input (for rv64). So we go ahead and sign extend before synthesizing the low part as that allows us to handle more cases trivially. I had my testsuite generator chew on random cases of a repeating constant without any surprises. I don't see much point in including all those in the testcase (after all there are 2**32 of them). I've got a set of 10 I'm including. Nothing particularly interesting in them. An enterprising developer that needs this improved without zbkb could probably do so with a bit of work. First, increase the cost by 1 unit. Second, avoid cases where bit 31 is set and restrict it to cases when we can still create pseudos. On the codegen side, when encountering the CONCAT, generate the appropriate shift of "X" into a temporary register, then IOR the temporary with "X" into the new destination. Anyway, I've tested this in my tester (though it doesn't turn on zbkb, yet). I'll let the CI system chew on it overnight, but like mine, I don't think it lights up zbkb.
So it's unlikely to spit out anything interesting. gcc/ * config/riscv/crypto.md (riscv_xpack___2): Remove '*' to allow it to be used via the gen_* interface. * config/riscv/riscv.cc (riscv_build_integer): Identify when Zbkb can be used to profitably synthesize repeating constants. (riscv_move_integer): Codegen changes to generate those Zbkb sequences. gcc/testsuite/ * gcc.target/riscv/synthesis-9.c: New test. Diff: --- gcc/config/riscv/crypto.md | 2 +- gcc/config/riscv/riscv.cc| 23 +++ gcc/testsuite/gcc.target/riscv/synthesis-9.c | 28 3 files changed, 52 insertions(+), 1 deletion(-) diff --git a/gcc/config/riscv/crypto.md b/gcc/config/riscv/crypto.md index b632312ade2..b9cac78fce1 100644 --- a/gcc/config/riscv/crypto.md +++ b/gcc/config/riscv/crypto.md @@ -107,7 +107,7 @@ ;; This is slightly more complex than the other pack patterns ;; that fully expose the RTL as it needs to self-adjust to ;; rv32 and rv64. But it's not that hard. -(define_insn "*riscv_xpack__2" +(define_insn "riscv_xpack___2" [(set (match_operand:X 0 "register_operand" "=r") (ior:X (ashift:X (match_operand:X 1 "register_operand" "r") (match_operand 2 "immediate_operand" "n")) diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc index a99211d56b1..91fefacee80 100644 --- a/gcc/config/riscv/riscv.cc +++ b/gcc/config/riscv/riscv.cc @@ -1123,6 +1123,22 @@ riscv_build_integer (struct riscv_integer_op *codes, HOST_WIDE_INT value, } } + /* With pack we can generate a 64 bit constant with the same high + and low 32 bits triviall.
*/ + if (cost > 3 && TARGET_64BIT && TARGET_ZBKB) +{ + unsigned HOST_WIDE_INT loval = value & 0x; + unsigned HOST_WIDE_INT hival = value & ~loval; + if (hival >> 32 == loval) + { + cost = 1 + riscv_build_integer_1 (codes, sext_hwi (loval, 32), mode); + codes[cost - 1].code = CONCAT; + codes[cost - 1].value = 0; + codes[cost - 1].use_uw = false; + } + +} + return cost; } @@ -2679,6 +2695,13 @@ riscv_move_integer (rtx temp, rtx dest, HOST_WIDE_INT value, rtx t = can_create_pseudo_p () ? gen_reg_rtx (mode) : temp; x = riscv_emit_set (t, x); } + else if (codes[i].code == CONCAT) + { + rtx t = can_create_pseudo_p () ? gen_reg_rtx (mode) : temp; + rtx t2 = gen_lowpart (SImode, x); + emit_insn (gen_riscv_xpack_di_si_2 (t, x, GEN_INT (32), t2)); + x = t; + } else x = gen_rtx_fmt_ee (codes[i].code, mode,
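The arithmetic behind the trick can be checked in isolation. Below is a minimal sketch that assumes the rv64 Zbkb definition of `pack`: the low half of rs1 goes into bits 0..31 and the low half of rs2 into bits 32..63. The helper names are hypothetical. The cost model in riscv_build_integer is not modeled.

```cpp
#include <cassert>
#include <cstdint>

// True when the upper and lower 32-bit halves of VALUE are identical, i.e.
// when the constant qualifies for "synthesize the low half, then pack".
bool is_repeating_32 (std::uint64_t value)
{
  return (value >> 32) == (value & 0xffffffffu);
}

// Models rv64 Zbkb "pack rd, rs1, rs2": the low 32 bits of rs1 form the low
// half of rd and the low 32 bits of rs2 form the high half.  Bits 32..63 of
// both inputs are ignored, so a sign-extended low half does no harm.
std::uint64_t pack64 (std::uint64_t rs1, std::uint64_t rs2)
{
  return (rs2 << 32) | (rs1 & 0xffffffffu);
}
```

Packing the sign-extended register `0xffffffff87654321` with itself yields `0x8765432187654321`. This is why the patch can sign extend the low part before synthesizing it.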
[gcc r15-902] c++: pragma target and static init [PR109753]
https://gcc.gnu.org/g:eff00046409a7289bfdc1861e68b532895f91c0e commit r15-902-geff00046409a7289bfdc1861e68b532895f91c0e Author: Jason Merrill Date: Wed Feb 14 17:18:17 2024 -0500 c++: pragma target and static init [PR109753] #pragma target and optimize should also apply to implicitly-generated functions like static initialization functions and defaulted special member functions. The handle_optimize_attribute change is necessary to avoid regressing g++.dg/opt/pr105306.C; maybe_clone_body creates a cgraph_node for the ~B alias before handle_optimize_attribute, and the alias never goes through finalize_function, so we need to adjust semantic_interposition somewhere else. PR c++/109753 gcc/c-family/ChangeLog: * c-attribs.cc (handle_optimize_attribute): Set cgraph_node::semantic_interposition. gcc/cp/ChangeLog: * decl.cc (start_preparsed_function): Call decl_attributes. gcc/testsuite/ChangeLog: * g++.dg/opt/always_inline1.C: New test. Diff: --- gcc/c-family/c-attribs.cc | 4 gcc/cp/decl.cc| 3 +++ gcc/testsuite/g++.dg/opt/always_inline1.C | 8 3 files changed, 15 insertions(+) diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc index 04e39b41bdf..605469dd7dd 100644 --- a/gcc/c-family/c-attribs.cc +++ b/gcc/c-family/c-attribs.cc @@ -5971,6 +5971,10 @@ handle_optimize_attribute (tree *node, tree name, tree args, if (prev_target_node != target_node) DECL_FUNCTION_SPECIFIC_TARGET (*node) = target_node; + /* Also update the cgraph_node, if it's already built. */ + if (cgraph_node *cn = cgraph_node::get (*node)) + cn->semantic_interposition = flag_semantic_interposition; + /* Restore current options. */ cl_optimization_restore (&global_options, &global_options_set, &cur_opts); diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc index a992d54dc8f..d481e1ec074 100644 --- a/gcc/cp/decl.cc +++ b/gcc/cp/decl.cc @@ -17832,6 +17832,9 @@ start_preparsed_function (tree decl1, tree attrs, int flags) doing_friend = true; } + /* Adjust for #pragma target/optimize. 
*/ + decl_attributes (&decl1, NULL_TREE, 0); + if (DECL_DECLARED_INLINE_P (decl1) && lookup_attribute ("noinline", attrs)) warning_at (DECL_SOURCE_LOCATION (decl1), 0, diff --git a/gcc/testsuite/g++.dg/opt/always_inline1.C b/gcc/testsuite/g++.dg/opt/always_inline1.C new file mode 100644 index 000..a042a1cf0c6 --- /dev/null +++ b/gcc/testsuite/g++.dg/opt/always_inline1.C @@ -0,0 +1,8 @@ +// PR c++/109753 +// { dg-do compile { target x86_64-*-* } } + +#pragma GCC target("avx2") +struct aa { +__attribute__((__always_inline__)) aa() {} +}; +aa _M_impl;
[gcc r15-903] vect: Unify bbs in loop_vec_info and bb_vec_info
https://gcc.gnu.org/g:9c747183efa555e45200523c162021e385511be5 commit r15-903-g9c747183efa555e45200523c162021e385511be5 Author: Feng Xue Date: Thu May 16 11:08:38 2024 +0800 vect: Unify bbs in loop_vec_info and bb_vec_info Both derived classes have their own "bbs" field, which serves exactly the same purpose of recording all basic blocks inside the corresponding vect region, but the fields have different data types: one is a plain array, the other is an auto_vec. This difference causes some duplicated code that handles the same thing, mostly in tree-vect-patterns. One refinement is to lift this field into the base class "vec_info" and, in each constructor of the derived classes, set its value to the contiguous memory area pointed to by the two old "bbs" fields. 2024-05-16 Feng Xue gcc/ * tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Move initialization of bbs to explicit construction code. Adjust the definition of nbbs. (update_epilogue_loop_vinfo): Update nbbs for epilog vinfo. * tree-vect-patterns.cc (vect_determine_precisions): Make loop_vec_info and bb_vec_info share the same code. (vect_pattern_recog): Remove duplicated vect_pattern_recog_1 loop. * tree-vect-slp.cc (vect_get_and_check_slp_defs): Access to bbs[0] via base vec_info class. (_bb_vec_info::_bb_vec_info): Initialize bbs and nbbs using data fields of input auto_vec<> bbs. (vect_slp_region): Use access to nbbs to replace original bbs.length(). (vect_schedule_slp_node): Access to bbs[0] via base vec_info class. * tree-vectorizer.cc (vec_info::vec_info): Add initialization of bbs and nbbs. (vec_info::insert_seq_on_entry): Access to bbs[0] via base vec_info class. * tree-vectorizer.h (vec_info): Add new fields bbs and nbbs. (LOOP_VINFO_NBBS): New macro. (BB_VINFO_BBS): Rename BB_VINFO_BB to BB_VINFO_BBS. (BB_VINFO_NBBS): New macro. (_loop_vec_info): Remove field bbs. (_bb_vec_info): Rename field bbs.
Diff: --- gcc/tree-vect-loop.c | 0 gcc/tree-vect-loop.cc | 7 ++- gcc/tree-vect-patterns.cc | 142 +- gcc/tree-vect-slp.cc | 23 +--- gcc/tree-vectorizer.cc| 7 ++- gcc/tree-vectorizer.h | 23 6 files changed, 74 insertions(+), 128 deletions(-) diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c new file mode 100644 index 000..e69de29bb2d diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index 3b94bb13a8b..04a9ac64df7 100644 --- a/gcc/tree-vect-loop.cc +++ b/gcc/tree-vect-loop.cc @@ -1028,7 +1028,6 @@ bb_in_loop_p (const_basic_block bb, const void *data) _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared) : vec_info (vec_info::loop, shared), loop (loop_in), -bbs (XCNEWVEC (basic_block, loop->num_nodes)), num_itersm1 (NULL_TREE), num_iters (NULL_TREE), num_iters_unchanged (NULL_TREE), @@ -1079,8 +1078,9 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared) case of the loop forms we allow, a dfs order of the BBs would the same as reversed postorder traversal, so we are safe. */ - unsigned int nbbs = dfs_enumerate_from (loop->header, 0, bb_in_loop_p, - bbs, loop->num_nodes, loop); + bbs = XCNEWVEC (basic_block, loop->num_nodes); + nbbs = dfs_enumerate_from (loop->header, 0, bb_in_loop_p, bbs, +loop->num_nodes, loop); gcc_assert (nbbs == loop->num_nodes); for (unsigned int i = 0; i < nbbs; i++) @@ -11667,6 +11667,7 @@ update_epilogue_loop_vinfo (class loop *epilogue, tree advance) free (LOOP_VINFO_BBS (epilogue_vinfo)); LOOP_VINFO_BBS (epilogue_vinfo) = epilogue_bbs; + LOOP_VINFO_NBBS (epilogue_vinfo) = epilogue->num_nodes; /* Advance data_reference's with the number of iterations of the previous loop and its prologue. 
*/ diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc index 8929e5aa7f3..88e7e34d78d 100644 --- a/gcc/tree-vect-patterns.cc +++ b/gcc/tree-vect-patterns.cc @@ -6925,81 +6925,41 @@ vect_determine_stmt_precisions (vec_info *vinfo, stmt_vec_info stmt_info) void vect_determine_precisions (vec_info *vinfo) { + basic_block *bbs = vinfo->bbs; + unsigned int nbbs = vinfo->nbbs; + DUMP_VECT_SCOPE ("vect_determine_precisions"); - if (loop_vec_info loop_vinfo = dyn_cast (vinfo)) + for (unsigned int i = 0; i < nbbs; i++) { - class loop *loop = LOOP_VINFO_LOOP (loop_vinfo); - basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo); - unsigned int nbbs = loop->num_nodes; - - for (unsigned int i = 0; i < nbbs; i++) + basic_block bb = bbs[i]; + for (auto gsi
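The refactoring pattern does not depend on the vectorizer. Each derived class keeps its own storage but publishes a pointer and a count through the base class. Shared code can then iterate over `bbs`/`nbbs` without knowing which kind of region it has. Below is a minimal sketch in which all type names are hypothetical stand-ins, not GCC's classes.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a basic block.
struct bb_sketch { int index; };

// Base class: the single bbs/nbbs view shared by both region kinds.
struct vec_info_sketch
{
  bb_sketch **bbs = nullptr;
  unsigned nbbs = 0;

  vec_info_sketch () = default;
  // bbs points into storage owned by the derived object, so forbid copies.
  vec_info_sketch (const vec_info_sketch &) = delete;
  vec_info_sketch &operator= (const vec_info_sketch &) = delete;
};

// Loop region: owns a plain heap array (GCC uses XCNEWVEC here).
struct loop_info_sketch : vec_info_sketch
{
  std::unique_ptr<bb_sketch *[]> storage;

  loop_info_sketch (bb_sketch *const *blocks, unsigned n)
    : storage (new bb_sketch *[n])
  {
    for (unsigned i = 0; i < n; i++)
      storage[i] = blocks[i];
    bbs = storage.get ();
    nbbs = n;
  }
};

// Basic-block region: owns a growable vector (GCC uses auto_vec here).
struct bb_info_sketch : vec_info_sketch
{
  std::vector<bb_sketch *> storage;

  explicit bb_info_sketch (std::vector<bb_sketch *> blocks)
    : storage (std::move (blocks))
  {
    bbs = storage.data ();
    nbbs = static_cast<unsigned> (storage.size ());
  }
};

// One loop serves both region kinds; no dyn_cast-style dispatch is needed.
int sum_block_indices (const vec_info_sketch &vinfo)
{
  int sum = 0;
  for (unsigned i = 0; i < vinfo.nbbs; i++)
    sum += vinfo.bbs[i]->index;
  return sum;
}
```

The base class only borrows the storage, so the sketch disables copying. A copied base would otherwise keep pointing into the original object's array.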
[gcc r15-904] Delete a file due to push error
https://gcc.gnu.org/g:b24b081113c696f4e523c8ae53fc3ab89c3b4e4d commit r15-904-gb24b081113c696f4e523c8ae53fc3ab89c3b4e4d Author: Feng Xue Date: Wed May 29 22:20:45 2024 +0800 Delete a file due to push error gcc/ * tree-vect-loop.c : Removed. Diff: --- gcc/tree-vect-loop.c | 0 1 file changed, 0 insertions(+), 0 deletions(-) diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c deleted file mode 100644 index e69de29bb2d..000
[gcc r15-905] libstdc++: Use RAII to replace try/catch blocks
https://gcc.gnu.org/g:d22eaeca7634b57e80ea61cadd82902fdc7e57ea commit r15-905-gd22eaeca7634b57e80ea61cadd82902fdc7e57ea Author: François Dumont Date: Thu May 16 06:59:50 2024 +0200 libstdc++: Use RAII to replace try/catch blocks Move _Guard into std::vector declaration and use it to guard all calls to vector _M_allocate. Doing so gives the compiler more visibility into what is done with the pointers, so it no longer raises the -Wfree-nonheap-object warning. libstdc++-v3/ChangeLog: * include/bits/vector.tcc (_Guard): Move all the nested duplicated class... * include/bits/stl_vector.h (_Guard_alloc): ...here and rename. (_M_allocate_and_copy): Use the latter. (_M_initialize_dispatch): Small code simplification. (_M_range_initialize): Likewise and set _M_finish first from the result of __uninitialize_fill_n_a that can throw. Diff: --- libstdc++-v3/include/bits/stl_vector.h | 77 ++--- libstdc++-v3/include/bits/vector.tcc | 78 ++ 2 files changed, 55 insertions(+), 100 deletions(-) diff --git a/libstdc++-v3/include/bits/stl_vector.h b/libstdc++-v3/include/bits/stl_vector.h index 31169711a48..182ad41ed94 100644 --- a/libstdc++-v3/include/bits/stl_vector.h +++ b/libstdc++-v3/include/bits/stl_vector.h @@ -1607,6 +1607,39 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER clear() _GLIBCXX_NOEXCEPT { _M_erase_at_end(this->_M_impl._M_start); } +private: + // RAII guard for allocated storage. + struct _Guard_alloc + { + pointer _M_storage; // Storage to deallocate + size_type _M_len; + _Base& _M_vect; + + _GLIBCXX20_CONSTEXPR + _Guard_alloc(pointer __s, size_type __l, _Base& __vect) + : _M_storage(__s), _M_len(__l), _M_vect(__vect) + { } + + _GLIBCXX20_CONSTEXPR + ~_Guard_alloc() + { + if (_M_storage) + _M_vect._M_deallocate(_M_storage, _M_len); + } + + _GLIBCXX20_CONSTEXPR + pointer + _M_release() + { + pointer __res = _M_storage; + _M_storage = pointer(); + return __res; + } + + private: + _Guard_alloc(const _Guard_alloc&); + }; + protected: /** * Memory expansion handler.
Uses the member allocation function to @@ -1618,18 +1651,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER _M_allocate_and_copy(size_type __n, _ForwardIterator __first, _ForwardIterator __last) { - pointer __result = this->_M_allocate(__n); - __try - { - std::__uninitialized_copy_a(__first, __last, __result, - _M_get_Tp_allocator()); - return __result; - } - __catch(...) - { - _M_deallocate(__result, __n); - __throw_exception_again; - } + _Guard_alloc __guard(this->_M_allocate(__n), __n, *this); + std::__uninitialized_copy_a + (__first, __last, __guard._M_storage, _M_get_Tp_allocator()); + return __guard._M_release(); } @@ -1642,13 +1667,14 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER // 438. Ambiguity in the "do the right thing" clause template void - _M_initialize_dispatch(_Integer __n, _Integer __value, __true_type) + _M_initialize_dispatch(_Integer __int_n, _Integer __value, __true_type) { - this->_M_impl._M_start = _M_allocate(_S_check_init_len( - static_cast(__n), _M_get_Tp_allocator())); - this->_M_impl._M_end_of_storage = - this->_M_impl._M_start + static_cast(__n); - _M_fill_initialize(static_cast(__n), __value); + const size_type __n = static_cast(__int_n); + pointer __start = + _M_allocate(_S_check_init_len(__n, _M_get_Tp_allocator())); + this->_M_impl._M_start = __start; + this->_M_impl._M_end_of_storage = __start + __n; + _M_fill_initialize(__n, __value); } // Called by the range constructor to implement [23.1.1]/9 @@ -1690,13 +1716,14 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER std::forward_iterator_tag) { const size_type __n = std::distance(__first, __last); - this->_M_impl._M_start - = this->_M_allocate(_S_check_init_len(__n, _M_get_Tp_allocator())); - this->_M_impl._M_end_of_storage = this->_M_impl._M_start + __n; - this->_M_impl._M_finish = - std::__uninitialized_copy_a(__first, __last, - this->_M_impl._M_start, - _M_get_Tp_allocator()); + pointer __start = + this->_M_allocate(_S_check_init_len(__n, _M_get_Tp_allocator())); + _Guard_alloc __guard(__start, __n, *this); 
+ this->_M_im
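The guard pattern described above can be sketched outside libstdc++ as follows. This is a minimal illustration, not the library code: `Storage`, `GuardAlloc` and `allocate_and_fill` are hypothetical names, and a plain `int` buffer stands in for `pointer`. The guard frees the buffer if element construction throws, and `release()` hands ownership to the caller on success, which is the role `_M_release()` plays in `_M_allocate_and_copy`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>

// Hypothetical stand-in for the vector base class: it owns the
// allocator and counts live allocations so leaks are observable.
struct Storage {
  int live = 0;
  int* allocate(std::size_t n) {
    ++live;
    return std::allocator<int>().allocate(n);
  }
  void deallocate(int* p, std::size_t n) {
    --live;
    std::allocator<int>().deallocate(p, n);
  }
};

// Sketch of a guard in the spirit of _Guard_alloc: it frees the
// storage on scope exit unless release() transferred ownership.
class GuardAlloc {
 public:
  GuardAlloc(int* p, std::size_t n, Storage& s)
      : storage_(p), len_(n), owner_(s) {}
  GuardAlloc(const GuardAlloc&) = delete;
  GuardAlloc& operator=(const GuardAlloc&) = delete;
  ~GuardAlloc() {
    if (storage_) owner_.deallocate(storage_, len_);
  }
  int* get() const { return storage_; }
  int* release() {
    int* res = storage_;
    storage_ = nullptr;
    return res;
  }

 private:
  int* storage_;
  std::size_t len_;
  Storage& owner_;
};

// Allocate and fill n ints. Element i == fail_at throws, standing in
// for a copy constructor that throws inside __uninitialized_copy_a.
int* allocate_and_fill(Storage& s, std::size_t n, std::size_t fail_at) {
  GuardAlloc guard(s.allocate(n), n, s);
  for (std::size_t i = 0; i < n; ++i) {
    if (i == fail_at) throw std::runtime_error("element copy failed");
    guard.get()[i] = static_cast<int>(i);
  }
  return guard.release();  // success: the caller now owns the storage
}
```

Compared with a `try`/`catch` that deallocates and rethrows, the cleanup path lives in one destructor and cannot be forgotten on a new early exit.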
[gcc r15-906] aarch64: Split aarch64_combinev16qi before RA [PR115258]
https://gcc.gnu.org/g:39263ed2d39ac1cebde59bc5e72ddcad5dc7a1ec commit r15-906-g39263ed2d39ac1cebde59bc5e72ddcad5dc7a1ec Author: Richard Sandiford Date: Wed May 29 16:43:33 2024 +0100 aarch64: Split aarch64_combinev16qi before RA [PR115258] Two-vector TBL instructions are fed by an aarch64_combinev16qi, whose purpose is to put the two input data vectors into consecutive registers. This aarch64_combinev16qi was then split after reload into individual moves (from the first input to the first half of the output, and from the second input to the second half of the output). In the worst case, the RA might allocate things so that the destination of the aarch64_combinev16qi is the second input followed by the first input. In that case, the split form of aarch64_combinev16qi uses three eors to swap the registers around. This PR is about a test where this worst case occurred. And given the insn description, that allocation doesn't seem unreasonable. early-ra should (hopefully) mean that we're now better at allocating subregs of vector registers. The upcoming RA subreg patches should improve things further. The best fix for the PR therefore seems to be to split the combination before RA, so that the RA can see the underlying moves. Perhaps it even makes sense to do this at expand time, avoiding the need for aarch64_combinev16qi entirely. That deserves more experimentation though. gcc/ PR target/115258 * config/aarch64/aarch64-simd.md (aarch64_combinev16qi): Allow the split before reload. * config/aarch64/aarch64.cc (aarch64_split_combinev16qi): Generalize into a form that handles pseudo registers. gcc/testsuite/ PR target/115258 * gcc.target/aarch64/pr115258.c: New test.
Diff: --- gcc/config/aarch64/aarch64-simd.md | 2 +- gcc/config/aarch64/aarch64.cc | 29 ++--- gcc/testsuite/gcc.target/aarch64/pr115258.c | 19 +++ 3 files changed, 34 insertions(+), 16 deletions(-) diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index c311888e4bd..868f4486218 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -8474,7 +8474,7 @@ UNSPEC_CONCAT))] "TARGET_SIMD" "#" - "&& reload_completed" + "&& 1" [(const_int 0)] { aarch64_split_combinev16qi (operands); diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index ee12d8897a8..13191ec8e34 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -25333,27 +25333,26 @@ aarch64_output_sve_ptrues (rtx const_unspec) void aarch64_split_combinev16qi (rtx operands[3]) { - unsigned int dest = REGNO (operands[0]); - unsigned int src1 = REGNO (operands[1]); - unsigned int src2 = REGNO (operands[2]); machine_mode halfmode = GET_MODE (operands[1]); - unsigned int halfregs = REG_NREGS (operands[1]); - rtx destlo, desthi; gcc_assert (halfmode == V16QImode); - if (src1 == dest && src2 == dest + halfregs) + rtx destlo = simplify_gen_subreg (halfmode, operands[0], + GET_MODE (operands[0]), 0); + rtx desthi = simplify_gen_subreg (halfmode, operands[0], + GET_MODE (operands[0]), + GET_MODE_SIZE (halfmode)); + + bool skiplo = rtx_equal_p (destlo, operands[1]); + bool skiphi = rtx_equal_p (desthi, operands[2]); + + if (skiplo && skiphi) { /* No-op move. Can't split to nothing; emit something. */ emit_note (NOTE_INSN_DELETED); return; } - /* Preserve register attributes for variable tracking. */ - destlo = gen_rtx_REG_offset (operands[0], halfmode, dest, 0); - desthi = gen_rtx_REG_offset (operands[0], halfmode, dest + halfregs, - GET_MODE_SIZE (halfmode)); - /* Special case of reversed high/low parts. 
*/ if (reg_overlap_mentioned_p (operands[2], destlo) && reg_overlap_mentioned_p (operands[1], desthi)) @@ -25366,16 +25365,16 @@ aarch64_split_combinev16qi (rtx operands[3]) { /* Try to avoid unnecessary moves if part of the result is in the right place already. */ - if (src1 != dest) + if (!skiplo) emit_move_insn (destlo, operands[1]); - if (src2 != dest + halfregs) + if (!skiphi) emit_move_insn (desthi, operands[2]); } else { - if (src2 != dest + halfregs) + if (!skiphi) emit_move_insn (desthi, operands[2]); - if (src1 != dest) + if (!skiplo) emit_move_insn (destlo, operands[1]); } } diff --git a/gcc/testsuite/gcc.target/aarch64/pr115258.c b/gcc/testsuite/gcc.target/aarch64/pr115258.c new file mode 100644 index 000..9a489d4604c --- /dev/null +++ b
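The worst case mentioned in the message, where the destination pair holds the inputs in reversed order, requires a swap. Below is a minimal sketch of the three-EOR (XOR) swap, modeling each 128-bit Q register as two 64-bit halves. `qreg`, `eor` and `swap_with_three_eors` are hypothetical names, and the sketch shows only the arithmetic, not the code GCC emits.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of a 128-bit Q register as two 64-bit halves.
struct qreg {
  std::uint64_t lo, hi;
};

// One whole-vector EOR: dst ^= src.
static void eor(qreg& dst, const qreg& src) {
  dst.lo ^= src.lo;
  dst.hi ^= src.hi;
}

// Swap two distinct registers with three EORs and no scratch register.
// If a and b were the same register, the first EOR would zero it.
static void swap_with_three_eors(qreg& a, qreg& b) {
  assert(&a != &b);
  eor(a, b);  // a = a ^ b
  eor(b, a);  // b = b ^ (a ^ b) = original a
  eor(a, b);  // a = (a ^ b) ^ original a = original b
}
```

Splitting before RA exposes the two plain moves to the allocator, which can then, ideally, pick registers so that neither this swap nor the moves are needed.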
[gcc r15-907] Match: Add maybe_bit_not instead of plain matching
https://gcc.gnu.org/g:0a9154d154957b21eb2c9e4fbe9869e50fb9742f commit r15-907-g0a9154d154957b21eb2c9e4fbe9869e50fb9742f Author: Andrew Pinski Date: Sat May 25 23:29:48 2024 -0700 Match: Add maybe_bit_not instead of plain matching While working on adding matching of negative expressions of `a - b`, I noticed that we started to have "duplicated" patterns due to not having a way to match maybe negative expressions. So I went back to what I did for bit_not and decided to improve the situation there so for some patterns where we had 2 operands of an expression where one could have been a bit_not, add back maybe_bit_not. This does not add maybe_bit_not in every place where bitwise_inverted_equal_p is used, just the ones where 2 operands of an expression could be swapped. Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: * match.pd (bit_not_with_nop): Unconditionalize. (maybe_cmp): Likewise. (maybe_bit_not): New match pattern. (`~X & X`): Use maybe_bit_not and add `:c` back. (`~x ^ x`/`~x | x`): Likewise. Signed-off-by: Andrew Pinski Diff: --- gcc/match.pd | 14 ++ 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/gcc/match.pd b/gcc/match.pd index 024e3350465..090ad4e08b0 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -167,7 +167,6 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) TYPE_VECTOR_SUBPARTS (TREE_TYPE (@0))) && tree_nop_conversion_p (TREE_TYPE (type), TREE_TYPE (TREE_TYPE (@0)) -#if GIMPLE /* These are used by gimple_bitwise_inverted_equal_p to simplify detection of BIT_NOT and comparisons. */ (match (bit_not_with_nop @0) @@ -188,7 +187,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (bit_xor@0 @1 @2) (if (INTEGRAL_TYPE_P (type) && TYPE_PRECISION (type) == 1))) -#endif +/* maybe_bit_not is used to match what + is acceptable for bitwise_inverted_equal_p.
*/ +(match (maybe_bit_not @0) + (bit_not_with_nop@0 @1)) +(match (maybe_bit_not @0) + (INTEGER_CST@0)) +(match (maybe_bit_not @0) + (maybe_cmp@0 @1)) /* Transform likes of (char) ABS_EXPR <(int) x> into (char) ABSU_EXPR ABSU_EXPR returns unsigned absolute value of the operand and the operand @@ -1332,7 +1338,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) /* Simplify ~X & X as zero. */ (simplify - (bit_and (convert? @0) (convert? @1)) + (bit_and:c (convert? @0) (convert? (maybe_bit_not @1))) (with { bool wascmp; } (if (types_match (TREE_TYPE (@0), TREE_TYPE (@1)) && bitwise_inverted_equal_p (@0, @1, wascmp)) @@ -1597,7 +1603,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) /* ~x ^ x -> -1 */ (for op (bit_ior bit_xor) (simplify - (op (convert? @0) (convert? @1)) + (op:c (convert? @0) (convert? (maybe_bit_not @1))) (with { bool wascmp; } (if (types_match (TREE_TYPE (@0), TREE_TYPE (@1)) && bitwise_inverted_equal_p (@0, @1, wascmp))
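The folds that `maybe_bit_not` and the restored `:c` flag target are plain bitwise identities. The sketch below checks them in either operand order, assuming two's-complement `int` (which GCC guarantees). It also covers a comparison paired with its inverse, which behaves like a 1-bit `X` and `~X` where "all ones" is `1`. The helper names are hypothetical.

```cpp
#include <cassert>
#include <climits>

// ~X & X folds to 0, in either operand order (what :c allows).
static bool and_folds_to_zero(int x) {
  return (~x & x) == 0 && (x & ~x) == 0;
}

// ~x ^ x and ~x | x fold to -1 (all ones), in either order.
static bool xor_ior_fold_to_all_ones(int x) {
  return (~x ^ x) == -1 && (x ^ ~x) == -1 &&
         (~x | x) == -1 && (x | ~x) == -1;
}

// A comparison and its inverse behave like 1-bit X and ~X:
// their AND is 0 and their XOR/IOR is 1 (1-bit "all ones").
static bool cmp_pair_folds(int a, int b) {
  bool lt = a < b;
  bool ge = a >= b;
  return (ge & lt) == 0 && (lt ^ ge) == 1 && (ge | lt) == 1;
}
```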
[gcc r15-908] match: Add support for `a ^ CST` to bitwise_inverted_equal_p [PR115224]
https://gcc.gnu.org/g:547143df5aa0960fb149a26933dad7ca1c363afb commit r15-908-g547143df5aa0960fb149a26933dad7ca1c363afb Author: Andrew Pinski Date: Sun May 26 17:38:37 2024 -0700 match: Add support for `a ^ CST` to bitwise_inverted_equal_p [PR115224] While looking into something else, I noticed that `a ^ CST` needed to be special-cased in bitwise_inverted_equal_p, as its bitwise not would simplify to `a ^ ~CST`. Bootstrapped and tested on x86_64-linux-gnu with no regressions. PR tree-optimization/115224 gcc/ChangeLog: * generic-match-head.cc (bitwise_inverted_equal_p): Add `a ^ CST` case. * gimple-match-head.cc (gimple_bit_xor_cst): New declaration. (gimple_bitwise_inverted_equal_p): Add `a ^ CST` case. * match.pd (bit_xor_cst): New match. (maybe_bit_not): Add bit_xor_cst case. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/bitops-8.c: New test. Signed-off-by: Andrew Pinski Diff: --- gcc/generic-match-head.cc| 10 ++ gcc/gimple-match-head.cc | 13 + gcc/match.pd | 4 gcc/testsuite/gcc.dg/tree-ssa/bitops-8.c | 15 +++ 4 files changed, 42 insertions(+) diff --git a/gcc/generic-match-head.cc b/gcc/generic-match-head.cc index 55ba369c6b3..641d8e9b2de 100644 --- a/gcc/generic-match-head.cc +++ b/gcc/generic-match-head.cc @@ -158,6 +158,16 @@ bitwise_inverted_equal_p (tree expr1, tree expr2, bool &wascmp) if (TREE_CODE (expr2) == BIT_NOT_EXPR && bitwise_equal_p (expr1, TREE_OPERAND (expr2, 0))) return true; + + /* `X ^ CST` and `X ^ ~CST` match for ~.
*/ + if (TREE_CODE (expr1) == BIT_XOR_EXPR && TREE_CODE (expr2) == BIT_XOR_EXPR + && bitwise_equal_p (TREE_OPERAND (expr1, 0), TREE_OPERAND (expr2, 0))) +{ + tree cst1 = uniform_integer_cst_p (TREE_OPERAND (expr1, 1)); + tree cst2 = uniform_integer_cst_p (TREE_OPERAND (expr2, 1)); + if (cst1 && cst2 && wi::to_wide (cst1) == ~wi::to_wide (cst2)) + return true; +} if (COMPARISON_CLASS_P (expr1) && COMPARISON_CLASS_P (expr2)) { diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc index 6220725b259..e26fa0860ee 100644 --- a/gcc/gimple-match-head.cc +++ b/gcc/gimple-match-head.cc @@ -283,6 +283,7 @@ gimple_bitwise_equal_p (tree expr1, tree expr2, tree (*valueize) (tree)) bool gimple_bit_not_with_nop (tree, tree *, tree (*) (tree)); bool gimple_maybe_cmp (tree, tree *, tree (*) (tree)); +bool gimple_bit_xor_cst (tree, tree *, tree (*) (tree)); /* Helper function for bitwise_inverted_equal_p macro. */ @@ -301,6 +302,18 @@ gimple_bitwise_inverted_equal_p (tree expr1, tree expr2, bool &wascmp, tree (*va if (operand_equal_p (expr1, expr2, 0)) return false; + tree xor1[2]; + tree xor2[2]; + /* `X ^ CST` and `X ^ ~CST` match for ~. */ + if (gimple_bit_xor_cst (expr1, xor1, valueize) + && gimple_bit_xor_cst (expr2, xor2, valueize)) +{ + if (operand_equal_p (xor1[0], xor2[0], 0) + && (wi::to_wide (uniform_integer_cst_p (xor1[1])) + == ~wi::to_wide (uniform_integer_cst_p (xor2[1] + return true; +} + tree other; /* Try if EXPR1 was defined as ~EXPR2. 
*/ if (gimple_bit_not_with_nop (expr1, &other, valueize)) diff --git a/gcc/match.pd b/gcc/match.pd index 090ad4e08b0..480e36bbbaf 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -174,6 +174,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (match (bit_not_with_nop @0) (convert (bit_not @0)) (if (tree_nop_conversion_p (type, TREE_TYPE (@0) +(match (bit_xor_cst @0 @1) + (bit_xor @0 uniform_integer_cst_p@1)) (for cmp (tcc_comparison) (match (maybe_cmp @0) (cmp@0 @1 @2)) @@ -195,6 +197,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (INTEGER_CST@0)) (match (maybe_bit_not @0) (maybe_cmp@0 @1)) +(match (maybe_bit_not @0) + (bit_xor_cst@0 @1 @2)) /* Transform likes of (char) ABS_EXPR <(int) x> into (char) ABSU_EXPR ABSU_EXPR returns unsigned absolute value of the operand and the operand diff --git a/gcc/testsuite/gcc.dg/tree-ssa/bitops-8.c b/gcc/testsuite/gcc.dg/tree-ssa/bitops-8.c new file mode 100644 index 000..40f756e4455 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/bitops-8.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized-raw" } */ +/* PR tree-optimization/115224 */ + +int f1(int a, int b) +{ +a = a ^ 1; +int c = ~a; +return c | (a ^ b); +// ~((a ^ 1) & b) or (a ^ -2) | ~b +} +/* { dg-final { scan-tree-dump-times "bit_xor_expr, " 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "bit_ior_expr, " 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "bit_not_expr, " 1 "optimized" } } */ +
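The new `a ^ CST` case rests on the identity `~(X ^ CST) == X ^ ~CST`. The sketch below checks that identity, and also checks that `f1` from `bitops-8.c` agrees with both forms given in the test's comment. Apart from `f1`, the helper names are hypothetical.

```cpp
#include <cassert>

// X ^ CST and X ^ ~CST are bitwise inverses of each other.
static bool xor_cst_pair_inverted(int x, int cst) {
  return (x ^ cst) == ~(x ^ ~cst);
}

// f1 from bitops-8.c. After a = a ^ 1, c = ~a equals a_orig ^ ~1,
// i.e. a_orig ^ -2, so c and a are an X ^ CST / X ^ ~CST pair.
static int f1(int a, int b) {
  a = a ^ 1;
  int c = ~a;
  return c | (a ^ b);
}

// The two equivalent forms from the test's comment.
static int f1_form1(int a, int b) { return ~((a ^ 1) & b); }
static int f1_form2(int a, int b) { return (a ^ -2) | ~b; }
```

Both forms contain exactly one XOR, one IOR/AND-plus-NOT pair, which is what the `scan-tree-dump-times` directives in the test count.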
[gcc r15-909] PR modula2/115276 bugfix libgm2 wraptime.InitTM returns NIL
https://gcc.gnu.org/g:d1a1f7e9f0bedea55c558ab95127679bc3e9ff72 commit r15-909-gd1a1f7e9f0bedea55c558ab95127679bc3e9ff72 Author: Gaius Mulley Date: Wed May 29 17:26:59 2024 +0100 PR modula2/115276 bugfix libgm2 wraptime.InitTM returns NIL This patch fixes libgm2/libm2iso/wraptime.cc:InitTM so that it does not always return NULL. The incorrect autoconf macro was used (inside InitTM) and the function short circuited to return NULL. The fix is to use HAVE_SYS_TIME_H and use AC_HEADER_TIME in libgm2/configure.ac. libgm2/ChangeLog: PR modula2/115276 * config.h.in: Regenerate. * configure: Regenerate. * configure.ac: Use AC_HEADER_TIME. * libm2iso/wraptime.cc (InitTM): Check HAVE_SYS_TIME_H before using struct tm to obtain the size. gcc/testsuite/ChangeLog: PR modula2/115276 * gm2/isolib/run/pass/testinittm.mod: New test. Signed-off-by: Gaius Mulley Diff: --- gcc/testsuite/gm2/isolib/run/pass/testinittm.mod | 17 +++ libgm2/config.h.in | 3 ++ libgm2/configure | 39 ++-- libgm2/configure.ac | 1 + libgm2/libm2iso/wraptime.cc | 2 +- 5 files changed, 59 insertions(+), 3 deletions(-) diff --git a/gcc/testsuite/gm2/isolib/run/pass/testinittm.mod b/gcc/testsuite/gm2/isolib/run/pass/testinittm.mod new file mode 100644 index 000..dfe041140f1 --- /dev/null +++ b/gcc/testsuite/gm2/isolib/run/pass/testinittm.mod @@ -0,0 +1,17 @@ +MODULE testinittm ; + +FROM wraptime IMPORT InitTM, tm ; +FROM libc IMPORT printf, exit ; + +VAR + m: tm ; +BEGIN + m := InitTM () ; + IF m = NIL + THEN + printf ("InitTM failed\n"); + exit (1) + ELSE + printf ("InitTM passed\n") + END +END testinittm. diff --git a/libgm2/config.h.in b/libgm2/config.h.in index 7426cb26cf8..321ef3b807f 100644 --- a/libgm2/config.h.in +++ b/libgm2/config.h.in @@ -335,6 +335,9 @@ /* Define to 1 if you have the ANSI C header files. */ #undef STDC_HEADERS +/* Define to 1 if you can safely include both and . */ +#undef TIME_WITH_SYS_TIME + /* Enable extensions on AIX 3, Interix. 
*/ #ifndef _ALL_SOURCE # undef _ALL_SOURCE diff --git a/libgm2/configure b/libgm2/configure index 13861f0ff93..c36fd7d4cac 100755 --- a/libgm2/configure +++ b/libgm2/configure @@ -6837,6 +6837,41 @@ $as_echo "#define HAVE_SYS_WAIT_H 1" >>confdefs.h fi +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether time.h and sys/time.h may both be included" >&5 +$as_echo_n "checking whether time.h and sys/time.h may both be included... " >&6; } +if ${ac_cv_header_time+:} false; then : + $as_echo_n "(cached) " >&6 +else + cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. */ +#include +#include +#include + +int +main () +{ +if ((struct tm *) 0) +return 0; + ; + return 0; +} +_ACEOF +if ac_fn_c_try_compile "$LINENO"; then : + ac_cv_header_time=yes +else + ac_cv_header_time=no +fi +rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext +fi +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_header_time" >&5 +$as_echo "$ac_cv_header_time" >&6; } +if test $ac_cv_header_time = yes; then + +$as_echo "#define TIME_WITH_SYS_TIME 1" >>confdefs.h + +fi + ac_fn_c_check_header_mongrel "$LINENO" "math.h" "ac_cv_header_math_h" "$ac_includes_default" if test "x$ac_cv_header_math_h" = xyes; then : @@ -14544,7 +14579,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat > conftest.$ac_ext <<_LT_EOF -#line 14547 "configure" +#line 14582 "configure" #include "confdefs.h" #if HAVE_DLFCN_H @@ -14650,7 +14685,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat > conftest.$ac_ext <<_LT_EOF -#line 14653 "configure" +#line 14688 "configure" #include "confdefs.h" #if HAVE_DLFCN_H diff --git a/libgm2/configure.ac b/libgm2/configure.ac index 9563831ddc5..1e6b82305ff 100644 --- a/libgm2/configure.ac +++ b/libgm2/configure.ac @@ -88,6 +88,7 @@ AC_ARG_WITH(cross-host, # Checks for header files. 
AC_HEADER_STDC AC_HEADER_SYS_WAIT +AC_HEADER_TIME AC_CHECK_HEADER([math.h], [AC_DEFINE([HAVE_MATH_H], [1], [have math.h])]) diff --git a/libgm2/libm2iso/wraptime.cc b/libgm2/libm2iso/wraptime.cc index 158086b75cc..4bbd5f9701d 100644 --- a/libgm2/libm2iso/wraptime.cc +++ b/libgm2/libm2iso/wraptime.cc @@ -113,7 +113,7 @@ EXPORT(KillTimezone) (struct timezone *tv) /* InitTM - returns a newly created opaque type. */ -#if defined(HAVE_STRUCT_TM) && defined(HAVE_MALLOC_H) +#if defined(HAVE_SYS_TIME_H) && defined(HAVE_MALLOC_H) extern "C" struct tm * EXPORT(InitTM) (void) {
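Here is a sketch of why a wrong feature macro can make a function like `InitTM` return NULL unconditionally. When the macro tested by `#if` is never defined by configure, the preprocessor keeps only the fallback branch, and the compiler sees nothing wrong. `init_tm_sketch` is hypothetical, not the `wraptime.cc` source. `HAVE_SYS_TIME_H` is defined by hand here in place of `config.h`, and the real guard's `HAVE_MALLOC_H` test is omitted.

```cpp
#include <cassert>
#include <cstdlib>
#include <ctime>

// Hypothetical: configure would put this in config.h after finding
// <sys/time.h>; it is defined by hand so the sketch stands alone.
#define HAVE_SYS_TIME_H 1

// Sketch in the style of InitTM (not the libgm2 source). If the macro
// tested here were never defined, only the NULL branch would survive
// preprocessing and every call would return NULL.
static std::tm* init_tm_sketch() {
#if defined(HAVE_SYS_TIME_H)
  return static_cast<std::tm*>(std::malloc(sizeof(std::tm)));
#else
  return nullptr;
#endif
}
```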
[gcc r15-910] MIPS/testsuite: Fix bseli.b fail in msa-builtins.c
https://gcc.gnu.org/g:9a92e5e56a7f2b19928b8cb7634f59d9c7b2b582 commit r15-910-g9a92e5e56a7f2b19928b8cb7634f59d9c7b2b582 Author: YunQiang Su Date: Tue May 28 23:44:49 2024 +0800 MIPS/testsuite: Fix bseli.b fail in msa-builtins.c commit 05daf617ea22e1d818295ed2d037456937e23530 Author: Jeff Law Date: Sat May 25 12:39:05 2024 -0600 [committed] [v2] More logical op simplifications in simplify-rtx.cc does some simplifications, after which `bseli.b $w1,$w0,255` is recognized as the same as `or.v $w1,$w0,$w1`. So no bseli.b instruction is generated any more. Let's use 254 instead of 255 to test the generation of `bseli.b`. gcc/testsuite * gcc.target/mips/msa-builtins.c: Use 254 instead of 255 for bseli.b, as `bseli.b $w0,$w1,255` is the same as `or.v $w0,$w0,$w1`. Diff: --- gcc/testsuite/gcc.target/mips/msa-builtins.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/testsuite/gcc.target/mips/msa-builtins.c b/gcc/testsuite/gcc.target/mips/msa-builtins.c index a679f065f34..6a146b3e6ae 100644 --- a/gcc/testsuite/gcc.target/mips/msa-builtins.c +++ b/gcc/testsuite/gcc.target/mips/msa-builtins.c @@ -705,7 +705,7 @@ #define BNEG(T) NOMIPS16 T FN (bneg, T ## _DF) (T i, T j) { return BUILTIN (bneg, T ## _DF) (i, j); } #define BNEGI(T) NOMIPS16 T FN (bnegi, T ## _DF) (T i) { return BUILTIN (bnegi, T ## _DF) (i, 0); } #define BSEL(T) NOMIPS16 T FN (bsel, v) (T i, T j, T k) { return BUILTIN (bsel, v) (i, j, k); } -#define BSELI(T) NOMIPS16 T FN (bseli, T ## _DF) (T i, T j) { return BUILTIN (bseli, T ## _DF) (i, j, U8MAX); } +#define BSELI(T) NOMIPS16 T FN (bseli, T ## _DF) (T i, T j) { return BUILTIN (bseli, T ## _DF) (i, j, U8MAX-1); } #define BSET(T) NOMIPS16 T FN (bset, T ## _DF) (T i, T j) { return BUILTIN (bset, T ## _DF) (i, j); } #define BSETI(T) NOMIPS16 T FN (bseti, T ## _DF) (T i) { return BUILTIN (bseti, T ## _DF) (i, 0); } #define NLOC(T) NOMIPS16 T FN (nloc, T ## _DF) (T i) { return BUILTIN (nloc, T ## _DF) (i); }
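A per-byte model shows why an immediate of 255 makes `bseli.b` redundant. The sketch assumes the MSA definition in which each set bit of `wd` selects the corresponding immediate bit and each clear bit selects the bit from `ws`, i.e. `wd = (ws & ~wd) | (imm & wd)`. With `imm = 255` that reduces to `ws | wd`, while 254 leaves bit 0 distinguishable. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Per-byte model of BSELI.B, assuming: each set bit of wd selects the
// immediate's bit, each clear bit selects the bit from ws.
static std::uint8_t bseli_b(std::uint8_t wd, std::uint8_t ws, std::uint8_t imm) {
  return static_cast<std::uint8_t>((ws & ~wd) | (imm & wd));
}

// True if bseli.b with this immediate equals OR.V for every byte pair.
static bool bseli_is_or(std::uint8_t imm) {
  for (int wd = 0; wd < 256; ++wd)
    for (int ws = 0; ws < 256; ++ws)
      if (bseli_b(static_cast<std::uint8_t>(wd), static_cast<std::uint8_t>(ws), imm) !=
          static_cast<std::uint8_t>(wd | ws))
        return false;
  return true;
}
```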
[gcc r15-911] MIPS16: Mark $2/$3 as clobbered if GP is used
https://gcc.gnu.org/g:915440eed21de367cb41857afb5273aff5bcb737 commit r15-911-g915440eed21de367cb41857afb5273aff5bcb737 Author: YunQiang Su Date: Wed May 29 02:28:25 2024 +0800 MIPS16: Mark $2/$3 as clobbered if GP is used PR Target/84790. The gp init sequence li $2,%hi(_gp_disp) addiu $3,$pc,%lo(_gp_disp) sll $2,16 addu $2,$3 is generated directly in `mips_output_function_prologue`, and does not appear in the RTL. Thus the IRA/IPA passes are not aware that $2/$3 have been clobbered, so they may be used across (local) function calls. Let's mark $2/$3 as clobbered in both places: - Just after the UNSPEC_GP RTL of a function; - Just after a function call. Reported-by: Matthias Schiffer Origin-Patch-by: Felix Fietkau . gcc * config/mips/mips.cc(mips16_gp_pseudo_reg): Mark MIPS16_PIC_TEMP and MIPS_PROLOGUE_TEMP clobbered. (mips_emit_call_insn): Mark MIPS16_PIC_TEMP and MIPS_PROLOGUE_TEMP clobbered if MIPS16 and CALL_CLOBBERED_GP. Diff: --- gcc/config/mips/mips.cc | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc index b63d40a357b..b478cddc8ad 100644 --- a/gcc/config/mips/mips.cc +++ b/gcc/config/mips/mips.cc @@ -3233,6 +3233,9 @@ mips_emit_call_insn (rtx pattern, rtx orig_addr, rtx addr, bool lazy_p) { rtx post_call_tmp_reg = gen_rtx_REG (word_mode, POST_CALL_TMP_REG); clobber_reg (&CALL_INSN_FUNCTION_USAGE (insn), post_call_tmp_reg); + clobber_reg (&CALL_INSN_FUNCTION_USAGE (insn), MIPS16_PIC_TEMP); + clobber_reg (&CALL_INSN_FUNCTION_USAGE (insn), + MIPS_PROLOGUE_TEMP (word_mode)); } return insn; @@ -3329,7 +3332,13 @@ mips16_gp_pseudo_reg (void) rtx set = gen_load_const_gp (cfun->machine->mips16_gp_pseudo_rtx); rtx_insn *insn = emit_insn_after (set, scan); INSN_LOCATION (insn) = 0; - + /* NewABI support hasn't been implement. NewABI should generate RTL +sequence instead of ASM sequence directly.
*/ + if (mips_current_loadgp_style () == LOADGP_OLDABI) + { + emit_clobber (MIPS16_PIC_TEMP); + emit_clobber (MIPS_PROLOGUE_TEMP (Pmode)); + } pop_topmost_sequence (); }
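The gp init sequence builds a 32-bit value from a `%hi`/`%lo` pair and writes both `$2` and `$3` along the way. The sketch below shows the usual MIPS split, assuming `%lo` is consumed as a signed 16-bit immediate, so `%hi` is rounded up when bit 15 is set. The pc-relative part of the real `_gp_disp` relocations is not modeled (pc is taken as 0), and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// %lo: the low 16 bits, read back as a signed 16-bit immediate.
static std::int32_t lo16(std::uint32_t x) {
  std::int32_t lo = static_cast<std::int32_t>(x & 0xffffu);
  return lo >= 0x8000 ? lo - 0x10000 : lo;
}

// %hi: the upper 16 bits, plus one when %lo is negative.
static std::uint32_t hi16(std::uint32_t x) {
  return ((x + 0x8000u) >> 16) & 0xffffu;
}

// Mirrors li $2,%hi; sll $2,16; addiu $3,$pc,%lo; addu $2,$3 with pc = 0.
// Both temporaries are written, which is why they must count as clobbered.
static std::uint32_t rebuild(std::uint32_t x) {
  std::uint32_t r2 = hi16(x) << 16;                        // li + sll into $2
  std::uint32_t r3 = static_cast<std::uint32_t>(lo16(x));  // addiu into $3
  return r2 + r3;                                          // addu $2,$3
}
```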
[gcc/ibm/heads/gcc-13-branch] (553 commits) ibm: Merge up to top of releases/gcc-13
The branch 'ibm/heads/gcc-13-branch' was updated to point to: c3db5f495a1... ibm: Merge up to top of releases/gcc-13 It previously pointed to: efb4bfb219d... ibm: Merge up to top of releases/gcc-13 Diff: Summary of changes (added commits): --- c3db5f4... ibm: Merge up to top of releases/gcc-13 ebca600... Daily bump. (*) fd91953... libstdc++: Fix up 19_diagnostics/stacktrace/hash.cc on 13 b (*) 3185cfe... Fortran: Fix SHAPE for zero-size arrays (*) 67434fe... libstdc++: Guard use of sized deallocation [PR114940] (*) d7f9f23... Daily bump. (*) b954f15... Daily bump. (*) 513d050... Daily bump. (*) 91c7ec5... Daily bump. (*) 53cdaa7... c++: unroll pragma in templates [PR111529] (*) 5f14578... c++: array of PMF [PR113598] (*) cf76815... Daily bump. (*) 6f8933c... Daily bump. (*) 75d394c... testsuite: Verify r0-r3 are extended with CMSE (*) f0b88ec... Fortran: fix issues with class(*) assignment [PR114827] (*) 2ebf3af... Fortran: fix reallocation on assignment of polymorphic vari (*) 53bc98f... strlen: Fix up !si->full_string_p handling in count_nonzero (*) 35ac28b... ubsan: Use right address space for MEM_REF created for bool (*) a841964... Daily bump. (*) 9433e30... libstdc++: testsuite: Enhance codecvt_unicode with tests fo (*) bd5e672... libstdc++: Fix handling of surrogate CP in codecvt [PR10897 (*) 0a9df2c... c++: Fix std dialect hint for std::to_address [PR107800] (*) 5ed32d0... Fortran: fix dependency checks for inquiry refs [PR115039] (*) c827f46... testsuite: Adjust pr113359-2_*.c with unsigned long long [P (*) 3f6a425... PHIOPT: Don't transform minmax if middle bb contains a phi (*) d6cf49e... match: Disable `(type)zero_one_valuep*CST` for 1bit signed (*) bde5894... Bump BASE-VER. (*) b71f1de... Update ChangeLog and version files for release (*) a021b58... Daily bump. (*) 4416023... Daily bump. (*) 94509b6... Daily bump. (*) 162c441... [committed] Fix RISC-V missing stack tie (*) 5b5342e... Daily bump. (*) 851aa3b... Daily bump. (*) 1db45e8... 
ipa: Compare jump functions in ICF (PR 113907) (*) 10bf53a... ICF&SRA: Make ICF and SRA agree on padding (*) 7dca716... libstdc++: Fix typo in std::stacktrace::max_size [PR115063] (*) 71e941b... libstdc++: Fix infinite loop in std::binomial_distribution (*) b9e2a32... libstdc++: Adjust expected locale-dependent date formats in (*) ebc61a9... libstdc++: Fix typo in Doxygen comment (*) bce15a5... libstdc++: Fix run_doxygen for Doxygen 1.10 man page format (*) 47cac09... c++: build_extra_args recapturing local specs [PR114303] (*) 12ee04d... Daily bump. (*) d3659e2... c++: constexpr union member access folding [PR114709] (*) 2e353c6... Manually add ChangeLog entries for various commits from 202 (*) d629308... rtl-optimization/54052 - RTL SSA PHI insertion compile-time (*) 6d1801f... Daily bump. (*) b7a2697... diagnostics: fix corrupt json/SARIF on stderr [PR114348] (*) 2a6f99a... Fix ICE in -fdiagnostics-generate-patch [PR112684] (*) 230f672... diagnostics: fix ICE on sarif output when source file is un (*) 96f7a36... analyzer: fix ICE and false positive with -Wanalyzer-deref- (*) 810d35a... analyzer: fix ICE due to type mismatch when replaying call (*) ed02610... analyzer: fix -Wanalyzer-deref-before-check false positive (*) 67d104f... analyzer: fix -Wanalyzer-va-arg-type-mismatch false +ve on (*) 2c688f6... analyzer: fix skipping of debug stmts [PR113253] (*) 0593151... analyzer: fix defaults in compound assignments from non-zer (*) 132eb1a... analyzer: casting all zeroes should give all zeroes [PR1133 (*) 994477c... analyzer: fix deref-before-check false positives due to inl (*) a1cb188... analyzer: fix ICE for 2 bits before the start of base regio (*) b8c772c... jit: dump string literal initializers correctly (*) 44968a0... testsuite, analyzer: add test case [PR108171] (*) a0b13d0... analyzer: fix ICE on zero-sized arrays [PR110882] (*) 0df1ee0... analyzer: fix ICE on division of tainted floating-point val (*) 60dcb71... 
jit.exp: handle dwarf version mismatch in jit-check-debug-i (*) b38472f... jit: avoid using __vector in testcase [PR110466] (*) e0c5290... testsuite: Add more allocation size tests for conjured sval (*) ccf8d3e... analyzer: Fix allocation size false positive on conjured sv (*) 89feb35... analyzer: add caching to globals with initializers [PR11011 (*) e30211c... [PR114415][scheduler]: Fixing wrong code generation (*) 421311a... Fix range-ops operator_addr. (*) fefdb9f... Daily bump. (*) 6f7674a... testsuite: Fix up vector-subaccess-1.C test for ia32 [PR892 (*) adba85b... AVR: target/114981 - Support __builtin_powi[l] / __powidf2. (*) 44d84db... reassoc: Fix up optimize_range_tests_to_bit_test [PR114965] (*) cad27df... expansion: Use __trunchfbf2 calls rather than __extendhfbf2 (*) d1ec7bc... tree-inline: Remove .ASAN_MARK calls when inlining function
[gcc(refs/vendors/ibm/heads/gcc-13-branch)] ibm: Merge up to top of releases/gcc-13
https://gcc.gnu.org/g:c3db5f495a1543fb22f725be910dc46249a15e57 commit c3db5f495a1543fb22f725be910dc46249a15e57 Merge: efb4bfb219d ebca6006f44 Author: Peter Bergner Date: Wed May 29 10:48:31 2024 -0500 ibm: Merge up to top of releases/gcc-13 2024-05-29 Peter Bergner Merge up to releases/gcc-13 ebca6006f44408b8084868da6613f185b810db74 Diff: ChangeLog | 15 + Makefile.in| 30 + Makefile.tpl | 24 + c++tools/ChangeLog |4 + config/ChangeLog |4 + contrib/ChangeLog | 13 + contrib/dg-extract-results.sh | 17 +- contrib/header-tools/ChangeLog |4 + contrib/reghunt/ChangeLog |4 + contrib/regression/ChangeLog |4 + fixincludes/ChangeLog |4 + gcc/BASE-VER |2 +- gcc/ChangeLog | 1964 ++ gcc/ChangeLog.ibm |4 + gcc/DATESTAMP |2 +- gcc/ada/ChangeLog | 50 + gcc/ada/exp_attr.adb | 63 +- gcc/ada/exp_ch4.adb|2 - gcc/ada/exp_ch7.adb| 13 + gcc/ada/exp_util.adb | 15 +- gcc/ada/sem_aggr.adb |9 +- gcc/ada/sem_ch13.adb | 12 +- gcc/ada/sem_res.adb| 14 +- gcc/analyzer/ChangeLog | 148 + gcc/analyzer/call-summary.cc | 12 + gcc/analyzer/checker-event.cc | 40 - gcc/analyzer/constraint-manager.cc | 131 + gcc/analyzer/constraint-manager.h |1 + gcc/analyzer/engine.cc |7 + gcc/analyzer/inlining-iterator.h | 40 + gcc/analyzer/kf.cc | 22 + gcc/analyzer/region-model-manager.cc |9 +- gcc/analyzer/region-model.cc | 110 +- gcc/analyzer/region.cc | 77 +- gcc/analyzer/region.h | 14 +- gcc/analyzer/sm-malloc.cc | 40 + gcc/analyzer/sm-taint.cc |6 + gcc/analyzer/state-purge.cc|9 + gcc/analyzer/store.cc | 11 +- gcc/analyzer/store.h | 10 +- gcc/analyzer/supergraph.cc |4 + gcc/analyzer/varargs.cc| 38 +- gcc/asan.cc| 52 +- gcc/attribs.cc | 17 +- gcc/bb-reorder.cc |3 +- gcc/bitmap.cc |2 +- gcc/c-family/ChangeLog | 49 + gcc/c-family/c-attribs.cc | 32 +- gcc/c-family/c-common.cc |8 +- gcc/c-family/c-lex.cc | 32 +- gcc/c-family/c-pch.cc |5 +- gcc/c/ChangeLog| 14 + gcc/c/c-decl.cc|7 +- gcc/calls.cc |7 +- gcc/cfgexpand.cc | 32 +- gcc/cfgrtl.cc | 27 +- gcc/cfgrtl.h |1 + gcc/cgraph.cc | 10 +- gcc/cgraph.h | 15 +- 
gcc/cgraphunit.cc |2 + gcc/combine.cc | 12 +- gcc/common.opt |2 +- gcc/common/config/avr/avr-common.cc|6 - gcc/common/config/i386/i386-common.cc |2 +- gcc/config.gcc |1 + gcc/config.in | 21 +- gcc/config/aarch64/aarch64-arches.def |2 +- gcc/config/aarch64/aarch64-builtins.cc |2 +- gcc/config/aarch64/aarch64-cores.def |2 +- gcc/config/aarch64/aarch64.cc | 31 +- gcc/config/aarch64/aarch64.md | 35 +- gcc/config/aarch64/iterators.md|3 + gcc/config/aarch64/t-aarch64-rtems | 42 + gcc/config/alpha/alpha.cc |3 +- gcc/config/arc/arc.cc |
[gcc r15-912] C23: fix aliasing for structures/unions with incomplete types
https://gcc.gnu.org/g:86b98d939989427ff025bcfd536ad361fcdc699c commit r15-912-g86b98d939989427ff025bcfd536ad361fcdc699c Author: Martin Uecker Date: Sat Mar 30 19:49:48 2024 +0100 C23: fix aliasing for structures/unions with incomplete types When incomplete structure/union types are completed later, compatibility of struct types that contain pointers to such types changes. When forming equivalence classes for TYPE_CANONICAL, we therefore need to be conservative and treat all structs with the same tag which are pointer targets as equivalent for purposes of determining equivalency of structure/union types which contain such types as members. This avoids having to update TYPE_CANONICAL of such structure/unions recursively. The pointer types themselves are updated in c_update_type_canonical. gcc/c/ * c-typeck.cc (comptypes_internal): Add flag to track whether a struct is the target of a pointer. (tagged_types_tu_compatible): When forming equivalence classes, treat nested pointed-to structs as equivalent. gcc/testsuite/ * gcc.dg/c23-tag-incomplete-alias-1.c: New test. Diff: --- gcc/c/c-typeck.cc | 43 +-- gcc/testsuite/gcc.dg/c23-tag-incomplete-alias-1.c | 36 +++ 2 files changed, 76 insertions(+), 3 deletions(-) diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc index ad4c7add562..09b2c265a46 100644 --- a/gcc/c/c-typeck.cc +++ b/gcc/c/c-typeck.cc @@ -1172,6 +1172,7 @@ struct comptypes_data { bool different_types_p; bool warning_needed; bool anon_field; + bool pointedto; bool equiv; const struct tagged_tu_seen_cache* cache; @@ -1235,8 +1236,36 @@ comptypes_check_different_types (tree type1, tree type2, } -/* Like comptypes, but if it returns nonzero for struct and union - types considered equivalent for aliasing purposes. */ +/* Like comptypes, but if it returns true for struct and union types + considered equivalent for aliasing purposes, i.e. for setting + TYPE_CANONICAL after completing a struct or union.
+ + This function must return false only for types which are not + compatible according to C language semantics (cf. comptypes), + otherwise the middle-end would make incorrect aliasing decisions. + It may return true for some similar types that are not compatible + according to those stricter rules. + + In particular, we ignore size expression in arrays so that the + following structs are in the same equivalence class: + + struct foo { char (*buf)[]; }; + struct foo { char (*buf)[3]; }; + struct foo { char (*buf)[4]; }; + + We also treat unions / structs with members which are pointers to + structures or unions with the same tag as equivalent (if they are not + incompatible for other reasons). Although incomplete structure + or union types are not compatible to any other type, they may become + compatible to different types when completed. To avoid having to update + TYPE_CANONICAL at this point, we only consider the tag when forming + the equivalence classes. For example, the following types with tag + 'foo' are all considered equivalent: + + struct bar; + struct foo { struct bar *x }; + struct foo { struct bar { int a; } *x }; + struct foo { struct bar { char b; } *x }; */ bool comptypes_equiv_p (tree type1, tree type2) @@ -1357,6 +1386,7 @@ comptypes_internal (const_tree type1, const_tree type2, /* Do not remove mode information. */ if (TYPE_MODE (t1) != TYPE_MODE (t2)) return false; + data->pointedto = true; return comptypes_internal (TREE_TYPE (t1), TREE_TYPE (t2), data); case FUNCTION_TYPE: @@ -1375,7 +1405,7 @@ comptypes_internal (const_tree type1, const_tree type2, if ((d1 == NULL_TREE) != (d2 == NULL_TREE)) data->different_types_p = true; - /* Ignore size mismatches. */ + /* Ignore size mismatches when forming equivalence classes. */ if (data->equiv) return true; /* Sizes must match unless one is missing or variable. 
*/ @@ -1515,6 +1545,12 @@ tagged_types_tu_compatible_p (const_tree t1, const_tree t2, if (TYPE_NAME (t1) != TYPE_NAME (t2)) return false; + /* When forming equivalence classes for TYPE_CANONICAL in C23, we treat + structs with the same tag as equivalent, but only when they are targets + of pointers inside other structs. */ + if (data->equiv && data->pointedto) +return true; + if (!data->anon_field && NULL_TREE == TYPE_NAME (t1)) return false; @@ -1610,6 +1646,7 @@ tagged_types_tu_compatible_p (const_tree t1, const_tree t2, return false; data->anon_field = !DECL_NAME (s1); + data->pointedto = false; data->cache = &entry; if (!comptypes_internal (TREE_TYPE (s1), TREE_TYPE (s2), data)) diff --git a/gcc/testsuite/gcc.d
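To illustrate the equivalence rule this commit describes, here is a minimal toy model in C. It does not use GCC's tree API: `struct ty`, the TY_* kinds and `equiv_p` are hypothetical names invented for this sketch. Like the new `pointedto` flag in `comptypes_data`, the flag below is set when the comparison descends through a pointer and cleared when it descends into struct members. As a result, `struct foo { struct bar *x; }` variants stay in one class however `struct bar` is (or is not) completed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified type representation (not GCC's tree). */
enum ty_kind { TY_INT, TY_CHAR, TY_PTR, TY_STRUCT };

struct ty {
  enum ty_kind kind;
  const char *tag;                 /* TY_STRUCT: tag name */
  const struct ty *target;         /* TY_PTR: pointed-to type */
  const struct ty *const *members; /* TY_STRUCT: NULL if incomplete */
  size_t nmembers;
};

/* Return true if A and B fall in the same equivalence class.
   POINTEDTO mirrors the flag added to comptypes_data: it is set when
   descending through a pointer and cleared for struct members.  */
static bool
equiv_p (const struct ty *a, const struct ty *b, bool pointedto)
{
  assert (a != NULL && b != NULL);
  if (a->kind != b->kind)
    return false;
  switch (a->kind)
    {
    case TY_INT:
    case TY_CHAR:
      return true;
    case TY_PTR:
      return equiv_p (a->target, b->target, true);
    case TY_STRUCT:
      if (strcmp (a->tag, b->tag) != 0)
        return false;
      /* Pointer targets: only the tag matters, so later completion of
         an incomplete struct cannot split the class.  */
      if (pointedto)
        return true;
      if (a->members == NULL || b->members == NULL
          || a->nmembers != b->nmembers)
        return false;
      for (size_t i = 0; i < a->nmembers; i++)
        if (!equiv_p (a->members[i], b->members[i], false))
          return false;
      return true;
    }
  return false;
}
```

A `struct bar` member embedded directly (not through a pointer) must still match member by member, which mirrors the commit's restriction of the tag-only rule to pointer targets.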
[gcc/ibm/heads/gcc-12-branch] (363 commits) ibm: Merge up to top of releases/gcc-12
The branch 'ibm/heads/gcc-12-branch' was updated to point to:

 92786addfe0... ibm: Merge up to top of releases/gcc-12

It previously pointed to:

 9f2e51a88fb... ibm: Merge up to top of releases/gcc-12

Diff:

Summary of changes (added commits):
---

  92786ad... ibm: Merge up to top of releases/gcc-12
  342f577... Daily bump. (*)
  da9b7a5... ubsan: Use right address space for MEM_REF created for bool (*)
  e0b2c4f... Fortran: Fix SHAPE for zero-size arrays (*)
  72f6b7e... ipa: Compare jump functions in ICF (PR 113907) (*)
  3bb534d... Daily bump. (*)
  4507501... Daily bump. (*)
  0bd259a... Daily bump. (*)
  e11d3dd... Daily bump. (*)
  ba57a52... c++: __is_constructible ref binding [PR100667] (*)
  6a5dcdb... c++: fix PR111529 backport (*)
  1982783... c++: unroll pragma in templates [PR111529] (*)
  419b5e1... c++: array of PMF [PR113598] (*)
  7076c56... c++: binding reference to comma expr [PR114561] (*)
  a1ff317... Daily bump. (*)
  df19155... Daily bump. (*)
  d9c8940... testsuite: Verify r0-r3 are extended with CMSE (*)
  13ced60... Daily bump. (*)
  113ddbe... Daily bump. (*)
  2f0c2cc... Daily bump. (*)
  1ba6e8b... Daily bump. (*)
  65e5547... middle-end/110176 - wrong zext (bool) <= (int) 4294967295u (*)
  47e6bff... tree-optimization/111039 - abnormals and bit test merging (*)
  5db4b54... tree-optimization/112281 - loop distribution and zero depen (*)
  dbb5273... tree-optimization/112495 - alias versioning and address spa (*)
  4a71557... tree-optimization/112505 - bit-precision induction vectoriz (*)
  1f41e8e... debug/112718 - reset all type units with -ffat-lto-objects (*)
  9bad5cf... tree-optimization/112793 - SLP of constant/external code-ge (*)
  2d650c0... tree-optimization/114027 - fix testcase (*)
  6661a7c... tree-optimization/114027 - conditional reduction chain (*)
  c1b2185... tree-optimization/114375 - disallow SLP discovery of permut (*)
  a7b1d81... tree-optimization/114231 - use patterns for BB SLP discover (*)
  46b2e98... middle-end/114734 - wrong code with expand_call_mem_ref (*)
  42a0393... lto/114655 - -flto=4 at link time doesn't override -flto=au (*)
  56415e3... gcov-profile/114715 - missing coverage for switch (*)
  b656e65... Daily bump. (*)
  2183e5b... ipa: Self-DCE of uses of removed call LHSs (PR 108007) (*)
  4419198... ipa: Force args obtined through pass-through maps to the ex (*)
  de66146... Daily bump. (*)
  2beef72... Daily bump. (*)
  c5c3a4a... Fix range-ops operator_addr. (*)
  f7db003... Daily bump. (*)
  587596d... Objective-C, NeXT, v2: Correct a regression in code-gen. (*)
  3349a6c... Daily bump. (*)
  ffa41c6... testsuite: Fix up vector-subaccess-1.C test for ia32 [PR892 (*)
  f5c7306... Fix PR 110386: backprop vs ABSU_EXPR (*)
  58d11bf... testsuite: fix Wmismatched-new-delete-8.C with -m32 (*)
  16319f8... warn-access: Fix handling of unnamed types [PR109804] (*)
  39d56b9... Fix PR 111331: wrong code for `a > 28 ? MIN : 29` (*)
  d88fe82... Fold: Fix up merge_truthop_with_opposite_arm for NaNs [PR95 (*)
  0ab30fb... libstdc++: Fix conversion of simd to vector builtin (*)
  79aa696... libstdc++: Silence irrelevant warnings in (*)
  7abc861... libstdc++: Fix -Wsystem-headers warnings in tests (*)
  c0c1207... libstdc++: Update synopsis test for C++11 and late (*)
  2d174d4... libstdc++: Fix -Wsystem-headers warnings (*)
  14876f3... libstdc++: Improve doxygen docs for (*)
  0a9cfae... libstdc++: Improve doxygen docs for some of (*)
  0d128f5... libstdc++: Improve doxygen docs for algorithms and more (*)
  54de91d... libstdc++: Improve doxygen docs for std::allocator (*)
  e1800b8... libstdc++: Improve doxygen docs for (*)
  f0db5df... libstdc++: Improve doxygen docs for (*)
  914a226... libstdc++: Stop defining C++0x compat symbols for versioned (*)
  f8ab9b7... libstdc++: Add macros for the inline namespace std::_V2 (*)
  d9f006d... libstdc++: Disable Doxygen GROUP_NESTED_COMPOUNDS config op (*)
  f3d4e25... libstdc++: Simplify fs::path construction using variable te (*)
  57eb035... libstdc++: Update std::pointer_traits to match new LWG 3545 (*)
  1bb467f... libstdc++: Simplify detection idiom using concepts (*)
  51e9dcc... libstdc++: Improve doxygen docs for std::pointer_traits (*)
  c6f80dc... libstdc++: use grep -E instead of egrep in scripts (*)
  5c156f5... libstdc++: Fix allocator propagation in regex algorithms [P (*)
  e35b26c... libstdc++: Define std::basic_stringbuf::view() for old std: (*)
  0135f93... libstdc++: Add autoconf checks for mkdir, chmod, chdir, and (*)
  a389921... libstdc++: Explicitly default some copy ctors and assignmen (*)
  dc0964f... libstdc++: Add static_assert to std::integer_sequence [PR11 (*)
  15c5170... libstdc++: Remove non-void static assertions in variant's s (*)
  c285c1b... libstdc++: Fix exception thrown by std::shared_lock::unlock (*)
  6f5dcea... libstdc++: Fix conditions for using memcmp in std::lexicogr (*)
  8ec265c... libstdc++: Do not use memmove
[gcc(refs/vendors/ibm/heads/gcc-12-branch)] ibm: Merge up to top of releases/gcc-12
https://gcc.gnu.org/g:92786addfe0797790a97ddc50f7709a1bf4791a9

commit 92786addfe0797790a97ddc50f7709a1bf4791a9
Merge: 9f2e51a88fb 342f577d8ea
Author: Peter Bergner
Date:   Wed May 29 14:42:14 2024 -0500

    ibm: Merge up to top of releases/gcc-12

    2024-05-29  Peter Bergner

            Merge up to releases/gcc-12 342f577d8ea60c3473a6c1e66ef038b96f99f9d2

Diff:

 ChangeLog | 8 +
 configure | 2 +-
 configure.ac | 2 +-
 fixincludes/ChangeLog | 20 +
 fixincludes/fixincl.x | 109 +-
 fixincludes/inclhack.def | 47 +
 fixincludes/tests/base/objc/runtime.h | 24 +
 fixincludes/tests/base/stdio.h | 7 +
 gcc/ChangeLog | 954 +++
 gcc/ChangeLog.ibm | 4 +
 gcc/DATESTAMP | 2 +-
 gcc/ada/ChangeLog | 18 +
 gcc/ada/exp_ch4.adb | 2 -
 gcc/ada/exp_ch7.adb | 13 +
 gcc/ada/exp_util.adb | 15 +-
 gcc/ada/sem_res.adb | 14 +-
 gcc/asan.cc | 15 +-
 gcc/c-family/ChangeLog | 16 +
 gcc/c-family/c-common.cc | 7 +-
 gcc/c-family/c-pch.cc | 5 +-
 gcc/cfgexpand.cc | 2 +-
 gcc/cfgrtl.cc | 24 +-
 gcc/cfgrtl.h | 1 +
 gcc/cgraph.cc | 10 +-
 gcc/cgraph.h | 18 +-
 gcc/cgraphunit.cc | 2 +
 gcc/config.in | 24 +
 gcc/config/aarch64/aarch64-cores.def | 2 +-
 gcc/config/aarch64/aarch64.cc | 29 +-
 gcc/config/aarch64/aarch64.h | 2 +-
 gcc/config/aarch64/aarch64.md | 35 +-
 gcc/config/aarch64/iterators.md | 3 +
 gcc/config/arm/arm.cc | 69 ++
 gcc/config/arm/neon.md | 4 +-
 gcc/config/avr/avr-mcus.def | 83 +-
 gcc/config/avr/avr.cc | 10 +
 gcc/config/darwin-protos.h | 11 +
 gcc/config/darwin-sections.def | 4 +-
 gcc/config/darwin.cc | 224 +++-
 gcc/config/darwin.h | 92 +-
 gcc/config/darwin.opt | 4 +
 gcc/config/i386/amxtileintrin.h | 4 +-
 gcc/config/i386/darwin.h | 4 +-
 gcc/config/i386/i386-builtin.def | 4 +
 gcc/config/i386/i386-expand.cc | 19 +
 gcc/config/i386/i386-features.cc | 50 +-
 gcc/config/i386/i386-features.h | 1 +
 gcc/config/i386/i386.md | 24 +
 gcc/config/loongarch/genopts/loongarch.opt.in | 31 +-
 gcc/config/loongarch/gnu-user.h | 4 +-
 gcc/config/loongarch/loongarch-opts.cc | 22 +
 gcc/config/loongarch/loongarch-opts.h | 18 +
 gcc/config/loongarch/loongarch-protos.h | 2 +-
 gcc/config/loongarch/loongarch.cc | 69 +-
 gcc/config/loongarch/loongarch.h | 22 +-
 gcc/config/loongarch/loongarch.md | 23 +-
 gcc/config/loongarch/loongarch.opt | 31 +-
 gcc/config/loongarch/sync.md | 46 +-
 gcc/config/mips/mips-msa.md | 18 +-
 gcc/config/pa/pa.md | 6 +-
 gcc/config/riscv/sync.md | 9 +
 gcc/config/rs6000/darwin.h | 6 +-
 gcc/config/rs6000/mma.md | 8 +-
 gcc/config/rs6000/predicates.md | 2 +-
 gcc/config/rs6000/rs6000-builtin.cc | 6 +-
 gcc/config/rs6000/rs6000-c.cc | 14 +-
 gcc/config/rs6000/rs6000-cpus.def | 5 +-
 gcc/config/rs6000/rs6000.cc | 19 +-
 gcc/config/rs6000/rs6000.h | 4 +-
 gcc/config/rs6000/rs6000.md | 8 +-
 gcc/config/rs6000/rs6000.opt | 6 +-
 gcc/config/rs6000/vsx.md | 4 +-
 gcc/config/sh/sh.cc | 3 +-
 gcc/configure | 149 ++-
 gcc/configure.ac
[gcc r15-914] Revert "resource.cc: Remove redundant conditionals"
https://gcc.gnu.org/g:c31a9d3152d6119aab83c403308ddb933fe905c5 commit r15-914-gc31a9d3152d6119aab83c403308ddb933fe905c5 Author: Hans-Peter Nilsson Date: Thu May 30 01:57:16 2024 +0200 Revert "resource.cc: Remove redundant conditionals" This reverts commit 802a98d128f9b0eea2432f6511328d14e0bd721b. Diff: --- gcc/resource.cc | 123 1 file changed, 71 insertions(+), 52 deletions(-) diff --git a/gcc/resource.cc b/gcc/resource.cc index 7c1de886432..62bd46f786e 100644 --- a/gcc/resource.cc +++ b/gcc/resource.cc @@ -658,41 +658,48 @@ mark_target_live_regs (rtx_insn *insns, rtx target_maybe_return, struct resource res->cc = 0; /* See if we have computed this value already. */ - for (tinfo = target_hash_table[INSN_UID (target) % TARGET_HASH_PRIME]; - tinfo; tinfo = tinfo->next) -if (tinfo->uid == INSN_UID (target)) - break; - - /* Start by getting the basic block number. If we have saved - information, we can get it from there unless the insn at the - start of the basic block has been deleted. */ - if (tinfo && tinfo->block != -1 - && ! BB_HEAD (BASIC_BLOCK_FOR_FN (cfun, tinfo->block))->deleted ()) -b = tinfo->block; + if (target_hash_table != NULL) +{ + for (tinfo = target_hash_table[INSN_UID (target) % TARGET_HASH_PRIME]; + tinfo; tinfo = tinfo->next) + if (tinfo->uid == INSN_UID (target)) + break; + + /* Start by getting the basic block number. If we have saved +information, we can get it from there unless the insn at the +start of the basic block has been deleted. */ + if (tinfo && tinfo->block != -1 + && ! BB_HEAD (BASIC_BLOCK_FOR_FN (cfun, tinfo->block))->deleted ()) + b = tinfo->block; +} if (b == -1) b = BLOCK_FOR_INSN (target)->index; gcc_assert (b != -1); - if (tinfo) + if (target_hash_table != NULL) { - /* If the information is up-to-date, use it. Otherwise, we will -update it below. */ - if (b == tinfo->block && tinfo->bb_tick == bb_ticks[b]) + if (tinfo) { - res->regs = tinfo->live_regs; - return; + /* If the information is up-to-date, use it. 
Otherwise, we will +update it below. */ + if (b == tinfo->block && tinfo->bb_tick == bb_ticks[b]) + { + res->regs = tinfo->live_regs; + return; + } + } + else + { + /* Allocate a place to put our results and chain it into the +hash table. */ + tinfo = XNEW (struct target_info); + tinfo->uid = INSN_UID (target); + tinfo->block = b; + tinfo->next + = target_hash_table[INSN_UID (target) % TARGET_HASH_PRIME]; + target_hash_table[INSN_UID (target) % TARGET_HASH_PRIME] = tinfo; } -} - else -{ - /* Allocate a place to put our results and chain it into the hash -table. */ - tinfo = XNEW (struct target_info); - tinfo->uid = INSN_UID (target); - tinfo->block = b; - tinfo->next = target_hash_table[INSN_UID (target) % TARGET_HASH_PRIME]; - target_hash_table[INSN_UID (target) % TARGET_HASH_PRIME] = tinfo; } CLEAR_HARD_REG_SET (pending_dead_regs); @@ -818,12 +825,13 @@ mark_target_live_regs (rtx_insn *insns, rtx target_maybe_return, struct resource to be live here still are. The fallthrough edge may have left a live register uninitialized. */ bb = BLOCK_FOR_INSN (real_insn); - gcc_assert (bb); - - HARD_REG_SET extra_live; + if (bb) + { + HARD_REG_SET extra_live; - REG_SET_TO_HARD_REG_SET (extra_live, DF_LR_IN (bb)); - current_live_regs |= extra_live; + REG_SET_TO_HARD_REG_SET (extra_live, DF_LR_IN (bb)); + current_live_regs |= extra_live; + } } /* The beginning of the epilogue corresponds to the end of the @@ -839,8 +847,10 @@ mark_target_live_regs (rtx_insn *insns, rtx target_maybe_return, struct resource { tinfo->block = b; tinfo->bb_tick = bb_ticks[b]; - tinfo->live_regs = res->regs; } + + if (tinfo != NULL) +tinfo->live_regs = res->regs; } /* Initialize the resources required by mark_target_live_regs (). 
@@ -929,25 +939,31 @@ init_resource_info (rtx_insn *epilogue_insn) void free_resource_info (void) { - int i; - - for (i = 0; i < TARGET_HASH_PRIME; ++i) + if (target_hash_table != NULL) { - struct target_info *ti = target_hash_table[i]; + int i; - while (ti) + for (i = 0; i < TARGET_HASH_PRIME; ++i) { - struct target_info *next = ti->next; - free (ti); - ti = next; + struct target_info *ti = target_hash_table[i]; + + while (ti) + { + struct target_info *next = ti->next; + free
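The NULL checks that this revert restores guard a separately chained hash table keyed by INSN_UID modulo a prime, with new entries pushed onto the head of their bucket. The following is a minimal C sketch of that lookup-or-insert pattern under stated assumptions: `struct info`, `lookup_or_insert`, `free_table` and `MODEL_HASH_PRIME` are hypothetical names, the prime is arbitrary (not necessarily GCC's TARGET_HASH_PRIME), and UIDs are assumed non-negative. A NULL table simply disables caching, as in the restored code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Arbitrary prime for this sketch; not necessarily TARGET_HASH_PRIME.  */
#define MODEL_HASH_PRIME 257

struct info {
  int uid;
  int block;
  struct info *next;
};

/* Find the entry for UID, or allocate one and chain it at the head of
   its bucket.  A NULL TABLE means caching is disabled and NULL is
   returned.  *FOUND tells the caller whether the entry already existed.  */
static struct info *
lookup_or_insert (struct info **table, int uid, int block, bool *found)
{
  *found = false;
  if (table == NULL)
    return NULL;

  assert (uid >= 0);
  struct info **bucket = &table[uid % MODEL_HASH_PRIME];
  for (struct info *p = *bucket; p != NULL; p = p->next)
    if (p->uid == uid)
      {
        *found = true;
        return p;
      }

  struct info *fresh = malloc (sizeof *fresh);
  assert (fresh != NULL);
  fresh->uid = uid;
  fresh->block = block;
  fresh->next = *bucket;
  *bucket = fresh;
  return fresh;
}

/* Free every chained entry, mirroring free_resource_info's walk.  */
static void
free_table (struct info **table)
{
  if (table == NULL)
    return;
  for (int i = 0; i < MODEL_HASH_PRIME; i++)
    {
      struct info *p = table[i];
      while (p != NULL)
        {
          struct info *next = p->next;
          free (p);
          p = next;
        }
      table[i] = NULL;
    }
}
```

Two UIDs that differ by a multiple of the prime share a bucket; the newer entry is found first because insertion happens at the head of the chain.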
[gcc r15-915] Revert "resource.cc (mark_target_live_regs): Remove check for bb not found"
https://gcc.gnu.org/g:afe48a45b8baa310c8373499b1e5b5407a3e2b94 commit r15-915-gafe48a45b8baa310c8373499b1e5b5407a3e2b94 Author: Hans-Peter Nilsson Date: Thu May 30 01:57:29 2024 +0200 Revert "resource.cc (mark_target_live_regs): Remove check for bb not found" This reverts commit e1abce5b6ad8f5aee86ec7729b516d81014db09e. Diff: --- gcc/resource.cc | 270 +--- 1 file changed, 138 insertions(+), 132 deletions(-) diff --git a/gcc/resource.cc b/gcc/resource.cc index 62bd46f786e..0d8cde93570 100644 --- a/gcc/resource.cc +++ b/gcc/resource.cc @@ -704,150 +704,156 @@ mark_target_live_regs (rtx_insn *insns, rtx target_maybe_return, struct resource CLEAR_HARD_REG_SET (pending_dead_regs); - /* Get the live registers from the basic block and update them with - anything set or killed between its start and the insn before - TARGET; this custom life analysis is really about registers so we - need to use the LR problem. Otherwise, we must assume everything - is live. */ - regset regs_live = DF_LR_IN (BASIC_BLOCK_FOR_FN (cfun, b)); - rtx_insn *start_insn, *stop_insn; - df_ref def; - - /* Compute hard regs live at start of block. */ - REG_SET_TO_HARD_REG_SET (current_live_regs, regs_live); - FOR_EACH_ARTIFICIAL_DEF (def, b) -if (DF_REF_FLAGS (def) & DF_REF_AT_TOP) - SET_HARD_REG_BIT (current_live_regs, DF_REF_REGNO (def)); - - /* Get starting and ending insn, handling the case where each might - be a SEQUENCE. */ - start_insn = (b == ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb->index ? 
- insns : BB_HEAD (BASIC_BLOCK_FOR_FN (cfun, b))); - stop_insn = target; - - if (NONJUMP_INSN_P (start_insn) - && GET_CODE (PATTERN (start_insn)) == SEQUENCE) -start_insn = as_a (PATTERN (start_insn))->insn (0); - - if (NONJUMP_INSN_P (stop_insn) - && GET_CODE (PATTERN (stop_insn)) == SEQUENCE) -stop_insn = next_insn (PREV_INSN (stop_insn)); - - for (insn = start_insn; insn != stop_insn; - insn = next_insn_no_annul (insn)) + /* If we found a basic block, get the live registers from it and update + them with anything set or killed between its start and the insn before + TARGET; this custom life analysis is really about registers so we need + to use the LR problem. Otherwise, we must assume everything is live. */ + if (b != -1) { - rtx link; - rtx_insn *real_insn = insn; - enum rtx_code code = GET_CODE (insn); - - if (DEBUG_INSN_P (insn)) - continue; - - /* If this insn is from the target of a branch, it isn't going to -be used in the sequel. If it is used in both cases, this -test will not be true. */ - if ((code == INSN || code == JUMP_INSN || code == CALL_INSN) - && INSN_FROM_TARGET_P (insn)) - continue; - - /* If this insn is a USE made by update_block, we care about the -underlying insn. */ - if (code == INSN - && GET_CODE (PATTERN (insn)) == USE - && INSN_P (XEXP (PATTERN (insn), 0))) - real_insn = as_a (XEXP (PATTERN (insn), 0)); - - if (CALL_P (real_insn)) + regset regs_live = DF_LR_IN (BASIC_BLOCK_FOR_FN (cfun, b)); + rtx_insn *start_insn, *stop_insn; + df_ref def; + + /* Compute hard regs live at start of block. */ + REG_SET_TO_HARD_REG_SET (current_live_regs, regs_live); + FOR_EACH_ARTIFICIAL_DEF (def, b) + if (DF_REF_FLAGS (def) & DF_REF_AT_TOP) + SET_HARD_REG_BIT (current_live_regs, DF_REF_REGNO (def)); + + /* Get starting and ending insn, handling the case where each might +be a SEQUENCE. */ + start_insn = (b == ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb->index ? 
+ insns : BB_HEAD (BASIC_BLOCK_FOR_FN (cfun, b))); + stop_insn = target; + + if (NONJUMP_INSN_P (start_insn) + && GET_CODE (PATTERN (start_insn)) == SEQUENCE) + start_insn = as_a (PATTERN (start_insn))->insn (0); + + if (NONJUMP_INSN_P (stop_insn) + && GET_CODE (PATTERN (stop_insn)) == SEQUENCE) + stop_insn = next_insn (PREV_INSN (stop_insn)); + + for (insn = start_insn; insn != stop_insn; + insn = next_insn_no_annul (insn)) { - /* Values in call-clobbered registers survive a COND_EXEC CALL -if that is not executed; this matters for resoure use because -they may be used by a complementarily (or more strictly) -predicated instruction, or if the CALL is NORETURN. */ - if (GET_CODE (PATTERN (real_insn)) != COND_EXEC) + rtx link; + rtx_insn *real_insn = insn; + enum rtx_code code = GET_CODE (insn); + + if (DEBUG_INSN_P (insn)) + continue; + + /* If this insn is from the target of a branch, it isn't going to +be used in the sequel. If it is used in both cases, this +test will not be true. */ + if ((code == INSN
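The restored `if (b != -1)` branch starts from the block's live-in registers and walks the insns up to TARGET; when no block was found, every register must be assumed live. A minimal C sketch of that conservative fallback follows, assuming a hypothetical 32-register bitmask model (`regmask`, `ALL_REGS`, `struct model_insn` and `live_before` are invented names). The update `live = (live & ~dies) | sets` is a simplification: GCC's real loop also handles calls, SEQUENCEs, annulled branches and `pending_dead_regs`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: hard registers as bits of a 32-bit mask.  */
typedef uint32_t regmask;
#define ALL_REGS ((regmask) 0xffffffffu)

struct model_insn {
  regmask sets;   /* registers written by the insn */
  regmask dies;   /* registers whose last use is in the insn */
};

/* Registers live just before INSNS[TARGET], scanning forward from the
   start of the block.  BLOCK == -1 models "no basic block found":
   without dataflow information every register is assumed live.  */
static regmask
live_before (int block, regmask live_in,
             const struct model_insn *insns, size_t target)
{
  if (block == -1)
    return ALL_REGS;

  assert (insns != NULL || target == 0);
  regmask live = live_in;
  for (size_t i = 0; i < target; i++)
    {
      live &= ~insns[i].dies;
      live |= insns[i].sets;
    }
  return live;
}
```

Returning ALL_REGS is the safe answer for a delay-slot filler: over-estimating liveness only forgoes some optimizations, while under-estimating it could clobber a live value.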
[gcc r15-916] Revert "resource.cc: Replace calls to find_basic_block with cfgrtl BLOCK_FOR_INSN"
https://gcc.gnu.org/g:c68bd7e8023f65d1dc23237f5a04a863344b1264 commit r15-916-gc68bd7e8023f65d1dc23237f5a04a863344b1264 Author: Hans-Peter Nilsson Date: Thu May 30 01:57:39 2024 +0200 Revert "resource.cc: Replace calls to find_basic_block with cfgrtl BLOCK_FOR_INSN" This reverts commit 933ab59c59bdc1ac9e3ca3a56527836564e1821b. Diff: --- gcc/resource.cc | 66 - 1 file changed, 56 insertions(+), 10 deletions(-) diff --git a/gcc/resource.cc b/gcc/resource.cc index 0d8cde93570..06fcfd3e44c 100644 --- a/gcc/resource.cc +++ b/gcc/resource.cc @@ -28,7 +28,6 @@ along with GCC; see the file COPYING3. If not see #include "tm_p.h" #include "regs.h" #include "emit-rtl.h" -#include "cfgrtl.h" #include "resource.h" #include "insn-attr.h" #include "function-abi.h" @@ -76,6 +75,7 @@ static HARD_REG_SET current_live_regs; static HARD_REG_SET pending_dead_regs; static void update_live_status (rtx, const_rtx, void *); +static int find_basic_block (rtx_insn *, int); static rtx_insn *next_insn_no_annul (rtx_insn *); /* Utility function called from mark_target_live_regs via note_stores. @@ -113,6 +113,46 @@ update_live_status (rtx dest, const_rtx x, void *data ATTRIBUTE_UNUSED) CLEAR_HARD_REG_BIT (pending_dead_regs, i); } } + +/* Find the number of the basic block with correct live register + information that starts closest to INSN. Return -1 if we couldn't + find such a basic block or the beginning is more than + SEARCH_LIMIT instructions before INSN. Use SEARCH_LIMIT = -1 for + an unlimited search. + + The delay slot filling code destroys the control-flow graph so, + instead of finding the basic block containing INSN, we search + backwards toward a BARRIER where the live register information is + correct. */ + +static int +find_basic_block (rtx_insn *insn, int search_limit) +{ + /* Scan backwards to the previous BARRIER. Then see if we can find a + label that starts a basic block. Return the basic block number. 
*/ + for (insn = prev_nonnote_insn (insn); + insn && !BARRIER_P (insn) && search_limit != 0; + insn = prev_nonnote_insn (insn), --search_limit) +; + + /* The closest BARRIER is too far away. */ + if (search_limit == 0) +return -1; + + /* The start of the function. */ + else if (insn == 0) +return ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb->index; + + /* See if any of the upcoming CODE_LABELs start a basic block. If we reach + anything other than a CODE_LABEL or note, we can't find this code. */ + for (insn = next_nonnote_insn (insn); + insn && LABEL_P (insn); + insn = next_nonnote_insn (insn)) +if (BLOCK_FOR_INSN (insn)) + return BLOCK_FOR_INSN (insn)->index; + + return -1; +} /* Similar to next_insn, but ignores insns in the delay slots of an annulled branch. */ @@ -674,8 +714,7 @@ mark_target_live_regs (rtx_insn *insns, rtx target_maybe_return, struct resource } if (b == -1) -b = BLOCK_FOR_INSN (target)->index; - gcc_assert (b != -1); +b = find_basic_block (target, param_max_delay_slot_live_search); if (target_hash_table != NULL) { @@ -683,7 +722,7 @@ mark_target_live_regs (rtx_insn *insns, rtx target_maybe_return, struct resource { /* If the information is up-to-date, use it. Otherwise, we will update it below. */ - if (b == tinfo->block && tinfo->bb_tick == bb_ticks[b]) + if (b == tinfo->block && b != -1 && tinfo->bb_tick == bb_ticks[b]) { res->regs = tinfo->live_regs; return; @@ -866,6 +905,7 @@ void init_resource_info (rtx_insn *epilogue_insn) { int i; + basic_block bb; /* Indicate what resources are required to be valid at the end of the current function. The condition code never is and memory always is. @@ -935,8 +975,10 @@ init_resource_info (rtx_insn *epilogue_insn) target_hash_table = XCNEWVEC (struct target_info *, TARGET_HASH_PRIME); bb_ticks = XCNEWVEC (int, last_basic_block_for_fn (cfun)); - /* Set the BLOCK_FOR_INSN for each insn. */ - compute_bb_for_insn (); + /* Set the BLOCK_FOR_INSN of each label that starts a basic block. 
*/ + FOR_EACH_BB_FN (bb, cfun) +if (LABEL_P (BB_HEAD (bb))) + BLOCK_FOR_INSN (BB_HEAD (bb)) = bb; } /* Free up the resources allocated to mark_target_live_regs (). This @@ -945,6 +987,8 @@ init_resource_info (rtx_insn *epilogue_insn) void free_resource_info (void) { + basic_block bb; + if (target_hash_table != NULL) { int i; @@ -971,7 +1015,9 @@ free_resource_info (void) bb_ticks = NULL; } - free_bb_for_insn (); + FOR_EACH_BB_FN (bb, cfun) +if (LABEL_P (BB_HEAD (bb))) + BLOCK_FOR_INSN (BB_HEAD (bb)) = NULL; } /* Clear any hashed information that we have stored for INSN. */ @@ -1017,10 +1063,10 @@ clear_hashed_info_until_next_barrier (rtx_insn *insn) void incr_ticks
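The restored `find_basic_block` scans backwards to the nearest BARRIER, giving up after SEARCH_LIMIT insns, and then scans forward over CODE_LABELs looking for one that starts a basic block. The sketch below replays that control flow over a hypothetical array-based insn stream. `struct minsn`, the K_* kinds, the helper names and `first_block` (standing in for `ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb->index`) are invented for illustration; array indices replace insn pointers, with -1 playing the role of NULL.

```c
#include <assert.h>

enum kind { K_NOTE, K_INSN, K_BARRIER, K_LABEL };

struct minsn {
  enum kind kind;
  int block;   /* K_LABEL: index of the block it starts, or -1 */
};

/* Index of the previous non-note insn, or -1 if there is none.  */
static int
prev_nonnote (const struct minsn *v, int i)
{
  for (i--; i >= 0 && v[i].kind == K_NOTE; i--)
    ;
  return i;
}

/* Index of the next non-note insn, or -1 if there is none.  */
static int
next_nonnote (const struct minsn *v, int n, int i)
{
  for (i++; i < n && v[i].kind == K_NOTE; i++)
    ;
  return i < n ? i : -1;
}

static int
model_find_basic_block (const struct minsn *v, int n, int insn,
                        int search_limit, int first_block)
{
  assert (insn >= 0 && insn < n);

  /* Scan backwards to the previous BARRIER.  */
  for (insn = prev_nonnote (v, insn);
       insn != -1 && v[insn].kind != K_BARRIER && search_limit != 0;
       insn = prev_nonnote (v, insn), --search_limit)
    ;

  if (search_limit == 0)
    return -1;                 /* The closest BARRIER is too far away.  */
  if (insn == -1)
    return first_block;        /* Reached the start of the function.  */

  /* Look for a following label that starts a basic block.  */
  for (insn = next_nonnote (v, n, insn);
       insn != -1 && v[insn].kind == K_LABEL;
       insn = next_nonnote (v, n, insn))
    if (v[insn].block != -1)
      return v[insn].block;

  return -1;
}
```

As in the original, a SEARCH_LIMIT of -1 requests an unlimited search, because decrementing a negative limit never reaches zero.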
[gcc r14-10260] MIPS16: Mark $2/$3 as clobbered if GP is used
https://gcc.gnu.org/g:201cfa725587d13867b4dc25955434ebe90aff7b

commit r14-10260-g201cfa725587d13867b4dc25955434ebe90aff7b
Author: YunQiang Su
Date:   Wed May 29 02:28:25 2024 +0800

    MIPS16: Mark $2/$3 as clobbered if GP is used

    PR Target/84790.
    The gp init sequence
            li      $2,%hi(_gp_disp)
            addiu   $3,$pc,%lo(_gp_disp)
            sll     $2,16
            addu    $2,$3
    is generated directly in `mips_output_function_prologue`, and does
    not appear in the RTL.

    So the IRA/IPA passes are not aware that $2/$3 have been clobbered,
    so they may be used for cross (local) function call.

    Let's mark $2/$3 clobber both:
      - Just after the UNSPEC_GP RTL of a function;
      - Just after a function call.

    Reported-by: Matthias Schiffer
    Origin-Patch-by: Felix Fietkau

    gcc
            * config/mips/mips.cc(mips16_gp_pseudo_reg): Mark
            MIPS16_PIC_TEMP and MIPS_PROLOGUE_TEMP clobbered.
            (mips_emit_call_insn): Mark MIPS16_PIC_TEMP and
            MIPS_PROLOGUE_TEMP clobbered if MIPS16 and CALL_CLOBBERED_GP.

    (cherry picked from commit 915440eed21de367cb41857afb5273aff5bcb737)

Diff:
---
 gcc/config/mips/mips.cc | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index ce764a5cb35..1156d212c1f 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -3233,6 +3233,9 @@ mips_emit_call_insn (rtx pattern, rtx orig_addr, rtx addr, bool lazy_p)
     {
       rtx post_call_tmp_reg = gen_rtx_REG (word_mode, POST_CALL_TMP_REG);
       clobber_reg (&CALL_INSN_FUNCTION_USAGE (insn), post_call_tmp_reg);
+      clobber_reg (&CALL_INSN_FUNCTION_USAGE (insn), MIPS16_PIC_TEMP);
+      clobber_reg (&CALL_INSN_FUNCTION_USAGE (insn),
+                   MIPS_PROLOGUE_TEMP (word_mode));
     }

   return insn;
@@ -3329,7 +3332,13 @@ mips16_gp_pseudo_reg (void)
       rtx set = gen_load_const_gp (cfun->machine->mips16_gp_pseudo_rtx);
       rtx_insn *insn = emit_insn_after (set, scan);
       INSN_LOCATION (insn) = 0;
-
+      /* NewABI support hasn't been implement. NewABI should generate RTL
+         sequence instead of ASM sequence directly. */
+      if (mips_current_loadgp_style () == LOADGP_OLDABI)
+        {
+          emit_clobber (MIPS16_PIC_TEMP);
+          emit_clobber (MIPS_PROLOGUE_TEMP (Pmode));
+        }
       pop_topmost_sequence ();
     }
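The gp init sequence quoted above writes both $2 and $3, which is why both need clobbers once the sequence is invisible to the RTL passes. The C sketch below simulates the four instructions on a register array to show the %hi/%lo arithmetic, assuming the usual MIPS convention that %hi is rounded so that adding the sign-extended %lo restores the full 32-bit value. `base` stands in for the $pc-relative base used by addiu; how the linker actually resolves `_gp_disp` is not modeled, and `mips_hi`, `mips_lo` and `run_gp_init` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* %hi: upper half, rounded up when bit 15 is set so that adding the
   sign-extended %lo gives back the original value.  */
static uint32_t
mips_hi (uint32_t x)
{
  return ((x + 0x8000u) >> 16) & 0xffffu;
}

/* %lo: low 16 bits, interpreted as a signed immediate.  */
static int32_t
mips_lo (uint32_t x)
{
  int32_t lo = (int32_t) (x & 0xffffu);
  if (lo >= 0x8000)
    lo -= 0x10000;
  return lo;
}

/* Simulate the sequence on REGS; both regs[2] and regs[3] are
   overwritten, which is exactly what the new clobbers describe.  */
static uint32_t
run_gp_init (uint32_t regs[32], uint32_t base, uint32_t gp_disp)
{
  assert (regs != NULL);
  regs[2] = mips_hi (gp_disp);                   /* li    $2,%hi(_gp_disp)     */
  regs[3] = base + (uint32_t) mips_lo (gp_disp); /* addiu $3,$pc,%lo(_gp_disp) */
  regs[2] <<= 16;                                /* sll   $2,16                */
  regs[2] += regs[3];                            /* addu  $2,$3                */
  return regs[2];
}
```

For example, 0x0001ffff splits into %hi = 2 and %lo = -1, and (2 << 16) - 1 recovers 0x0001ffff.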
[gcc r12-10480] MIPS16: Mark $2/$3 as clobbered if GP is used
https://gcc.gnu.org/g:e26f16424f6279662efb210bc87c77148e956fed

commit r12-10480-ge26f16424f6279662efb210bc87c77148e956fed
Author: YunQiang Su
Date:   Wed May 29 02:28:25 2024 +0800

    MIPS16: Mark $2/$3 as clobbered if GP is used

    PR Target/84790.
    The gp init sequence
            li      $2,%hi(_gp_disp)
            addiu   $3,$pc,%lo(_gp_disp)
            sll     $2,16
            addu    $2,$3
    is generated directly in `mips_output_function_prologue`, and does
    not appear in the RTL.

    So the IRA/IPA passes are not aware that $2/$3 have been clobbered,
    so they may be used for cross (local) function call.

    Let's mark $2/$3 clobber both:
      - Just after the UNSPEC_GP RTL of a function;
      - Just after a function call.

    Reported-by: Matthias Schiffer
    Origin-Patch-by: Felix Fietkau

    gcc
            * config/mips/mips.cc(mips16_gp_pseudo_reg): Mark
            MIPS16_PIC_TEMP and MIPS_PROLOGUE_TEMP clobbered.
            (mips_emit_call_insn): Mark MIPS16_PIC_TEMP and
            MIPS_PROLOGUE_TEMP clobbered if MIPS16 and CALL_CLOBBERED_GP.

    (cherry picked from commit 915440eed21de367cb41857afb5273aff5bcb737)

Diff:
---
 gcc/config/mips/mips.cc | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index e64928f4113..d26630b20ce 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -3140,6 +3140,9 @@ mips_emit_call_insn (rtx pattern, rtx orig_addr, rtx addr, bool lazy_p)
     {
       rtx post_call_tmp_reg = gen_rtx_REG (word_mode, POST_CALL_TMP_REG);
       clobber_reg (&CALL_INSN_FUNCTION_USAGE (insn), post_call_tmp_reg);
+      clobber_reg (&CALL_INSN_FUNCTION_USAGE (insn), MIPS16_PIC_TEMP);
+      clobber_reg (&CALL_INSN_FUNCTION_USAGE (insn),
+                   MIPS_PROLOGUE_TEMP (word_mode));
     }

   return insn;
@@ -3236,7 +3239,13 @@ mips16_gp_pseudo_reg (void)
       rtx set = gen_load_const_gp (cfun->machine->mips16_gp_pseudo_rtx);
       rtx_insn *insn = emit_insn_after (set, scan);
       INSN_LOCATION (insn) = 0;
-
+      /* NewABI support hasn't been implement. NewABI should generate RTL
+         sequence instead of ASM sequence directly. */
+      if (mips_current_loadgp_style () == LOADGP_OLDABI)
+        {
+          emit_clobber (MIPS16_PIC_TEMP);
+          emit_clobber (MIPS_PROLOGUE_TEMP (Pmode));
+        }
       pop_topmost_sequence ();
     }
[gcc r13-8809] MIPS16: Mark $2/$3 as clobbered if GP is used
https://gcc.gnu.org/g:3be8fa7b19d218ca5812d71801e3e83ee2260ea0

commit r13-8809-g3be8fa7b19d218ca5812d71801e3e83ee2260ea0
Author: YunQiang Su
Date:   Wed May 29 02:28:25 2024 +0800

    MIPS16: Mark $2/$3 as clobbered if GP is used

    PR Target/84790.
    The gp init sequence
            li      $2,%hi(_gp_disp)
            addiu   $3,$pc,%lo(_gp_disp)
            sll     $2,16
            addu    $2,$3
    is generated directly in `mips_output_function_prologue`, and does
    not appear in the RTL.

    So the IRA/IPA passes are not aware that $2/$3 have been clobbered,
    so they may be used for cross (local) function call.

    Let's mark $2/$3 clobber both:
      - Just after the UNSPEC_GP RTL of a function;
      - Just after a function call.

    Reported-by: Matthias Schiffer
    Origin-Patch-by: Felix Fietkau

    gcc
            * config/mips/mips.cc(mips16_gp_pseudo_reg): Mark
            MIPS16_PIC_TEMP and MIPS_PROLOGUE_TEMP clobbered.
            (mips_emit_call_insn): Mark MIPS16_PIC_TEMP and
            MIPS_PROLOGUE_TEMP clobbered if MIPS16 and CALL_CLOBBERED_GP.

    (cherry picked from commit 915440eed21de367cb41857afb5273aff5bcb737)

Diff:
---
 gcc/config/mips/mips.cc | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index 8e3dc313cb3..9bc73b2e77d 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -3140,6 +3140,9 @@ mips_emit_call_insn (rtx pattern, rtx orig_addr, rtx addr, bool lazy_p)
     {
       rtx post_call_tmp_reg = gen_rtx_REG (word_mode, POST_CALL_TMP_REG);
       clobber_reg (&CALL_INSN_FUNCTION_USAGE (insn), post_call_tmp_reg);
+      clobber_reg (&CALL_INSN_FUNCTION_USAGE (insn), MIPS16_PIC_TEMP);
+      clobber_reg (&CALL_INSN_FUNCTION_USAGE (insn),
+                   MIPS_PROLOGUE_TEMP (word_mode));
     }

   return insn;
@@ -3236,7 +3239,13 @@ mips16_gp_pseudo_reg (void)
       rtx set = gen_load_const_gp (cfun->machine->mips16_gp_pseudo_rtx);
       rtx_insn *insn = emit_insn_after (set, scan);
       INSN_LOCATION (insn) = 0;
-
+      /* NewABI support hasn't been implement. NewABI should generate RTL
+         sequence instead of ASM sequence directly. */
+      if (mips_current_loadgp_style () == LOADGP_OLDABI)
+        {
+          emit_clobber (MIPS16_PIC_TEMP);
+          emit_clobber (MIPS_PROLOGUE_TEMP (Pmode));
+        }
       pop_topmost_sequence ();
     }
[gcc r15-917] tree-ssa-pre.c/115214(ICE in find_or_generate_expression, at tree-ssa-pre.c:2780): Return NULL_TREE
https://gcc.gnu.org/g:c9842f99042454bef99fe82506c6dd50f34e283e

commit r15-917-gc9842f99042454bef99fe82506c6dd50f34e283e
Author: Jiawei
Date:   Mon May 27 15:40:51 2024 +0800

    tree-ssa-pre.c/115214(ICE in find_or_generate_expression, at tree-ssa-pre.c:2780): Return NULL_TREE when deal special cases.

    Return NULL_TREE when genop3 equal EXACT_DIV_EXPR.
    https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652641.html

    version log v3: remove additional POLY_INT_CST check.
    https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652795.html

    gcc/ChangeLog:

            * tree-ssa-pre.cc (create_component_ref_by_pieces_1): New conditions.

    gcc/testsuite/ChangeLog:

            * gcc.target/riscv/rvv/vsetvl/pr115214.c: New test.

Diff:
---
 .../gcc.target/riscv/rvv/vsetvl/pr115214.c | 52 ++
 gcc/tree-ssa-pre.cc | 10 +++--
 2 files changed, 59 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr115214.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr115214.c
new file mode 100644
index 000..fce2e9da766
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr115214.c
@@ -0,0 +1,52 @@
+/* { dg-do compile } */
+/* { dg-options "-mrvv-vector-bits=scalable -march=rv64gcv -mabi=lp64d -O3 -w" } */
+/* { dg-skip-if "" { *-*-* } { "-flto" } } */
+
+#include 
+
+static inline __attribute__(()) int vaddq_f32();
+static inline __attribute__(()) int vload_tillz_f32(int nlane) {
+  vint32m1_t __trans_tmp_9;
+  {
+    int __trans_tmp_0 = nlane;
+    {
+      vint64m1_t __trans_tmp_1;
+      vint64m1_t __trans_tmp_2;
+      vint64m1_t __trans_tmp_3;
+      vint64m1_t __trans_tmp_4;
+      if (__trans_tmp_0 == 1) {
+        {
+          __trans_tmp_3 =
+              __riscv_vslideup_vx_i64m1(__trans_tmp_1, __trans_tmp_2, 1, 2);
+        }
+        __trans_tmp_4 = __trans_tmp_2;
+      }
+      __trans_tmp_4 = __trans_tmp_3;
+      __trans_tmp_9 = __riscv_vreinterpret_v_i64m1_i32m1(__trans_tmp_3);
+    }
+  }
+  return vaddq_f32(__trans_tmp_9); /* { dg-error {RVV type 'vint32m1_t' cannot be passed to an unprototyped function} } */
+}
+
+char CFLOAT_add_args[3];
+const int *CFLOAT_add_steps;
+const int CFLOAT_steps;
+
+__attribute__(()) void CFLOAT_add() {
+  char *b_src0 = &CFLOAT_add_args[0], *b_src1 = &CFLOAT_add_args[1],
+       *b_dst = &CFLOAT_add_args[2];
+  const float *src1 = (float *)b_src1;
+  float *dst = (float *)b_dst;
+  const int ssrc1 = CFLOAT_add_steps[1] / sizeof(float);
+  const int sdst = CFLOAT_add_steps[2] / sizeof(float);
+  const int hstep = 4 / 2;
+  vfloat32m1x2_t a;
+  int len = 255;
+  for (; len > 0; len -= hstep, src1 += 4, dst += 4) {
+    int b = vload_tillz_f32(len);
+    int r = vaddq_f32(a.__val[0], b); /* { dg-error {RVV type '__rvv_float32m1_t' cannot be passed to an unprototyped function} } */
+  }
+  for (; len > 0; --len, b_src0 += CFLOAT_steps,
+         b_src1 += CFLOAT_add_steps[1], b_dst += CFLOAT_add_steps[2])
+    ;
+}
diff --git a/gcc/tree-ssa-pre.cc b/gcc/tree-ssa-pre.cc
index 75217f5cde1..5cf1968bc26 100644
--- a/gcc/tree-ssa-pre.cc
+++ b/gcc/tree-ssa-pre.cc
@@ -2685,11 +2685,15 @@ create_component_ref_by_pieces_1 (basic_block block, vn_reference_t ref,
                here as the element alignment may be not visible. See PR43783.
                Simply drop the element size for constant sizes. */
-            if (TREE_CODE (genop3) == INTEGER_CST
+            if ((TREE_CODE (genop3) == INTEGER_CST
                  && TREE_CODE (TYPE_SIZE_UNIT (elmt_type)) == INTEGER_CST
                  && wi::eq_p (wi::to_offset (TYPE_SIZE_UNIT (elmt_type)),
-                              (wi::to_offset (genop3)
-                               * vn_ref_op_align_unit (currop))))
+                              (wi::to_offset (genop3) * vn_ref_op_align_unit (currop))))
+                || (TREE_CODE (genop3) == EXACT_DIV_EXPR
+                    && TREE_CODE (TREE_OPERAND (genop3, 1)) == INTEGER_CST
+                    && operand_equal_p (TREE_OPERAND (genop3, 0), TYPE_SIZE_UNIT (elmt_type))
+                    && wi::eq_p (wi::to_offset (TREE_OPERAND (genop3, 1)),
+                                 vn_ref_op_align_unit (currop))))
               genop3 = NULL_TREE;
             else
               {
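The new condition in `create_component_ref_by_pieces_1` also drops the element-size operand when it is an EXACT_DIV_EXPR of the element size by the alignment unit, which is presumably how a size that is not an INTEGER_CST (for example a scalable RVV vector type's) shows up. Below is a toy C model of both the old and the new checks. All names are hypothetical: `struct expr` stands in for trees, `expr_equal_p` for `operand_equal_p`, and `vlenb` is only an illustrative symbol for a non-constant size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical expression model: a constant, an opaque non-constant
   size, or an exact division of an expression by a constant.  */
enum expr_kind { E_CST, E_SYM, E_EXACT_DIV };

struct expr {
  enum expr_kind kind;
  long cst;                 /* E_CST: value; E_EXACT_DIV: constant divisor */
  const char *sym;          /* E_SYM: name of an opaque size */
  const struct expr *op0;   /* E_EXACT_DIV: dividend */
};

/* Structural equality, standing in for operand_equal_p.  */
static bool
expr_equal_p (const struct expr *a, const struct expr *b)
{
  if (a->kind != b->kind)
    return false;
  switch (a->kind)
    {
    case E_CST:
      return a->cst == b->cst;
    case E_SYM:
      return strcmp (a->sym, b->sym) == 0;
    case E_EXACT_DIV:
      return a->cst == b->cst && expr_equal_p (a->op0, b->op0);
    }
  return false;
}

/* True if GENOP3 (element size in units of ALIGN) is redundant with
   ELT_SIZE and can be dropped.  */
static bool
elt_size_operand_redundant_p (const struct expr *genop3,
                              const struct expr *elt_size, long align)
{
  assert (align > 0);
  /* Pre-existing case: both constant and ELT_SIZE == GENOP3 * ALIGN.  */
  if (genop3->kind == E_CST && elt_size->kind == E_CST)
    return elt_size->cst == genop3->cst * align;
  /* New case: GENOP3 == ELT_SIZE /[ex] ALIGN.  */
  if (genop3->kind == E_EXACT_DIV)
    return expr_equal_p (genop3->op0, elt_size) && genop3->cst == align;
  return false;
}
```

Without the second case, a symbolic size divided by the alignment would keep GENOP3 alive, which is the situation the PR's testcase reaches.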
[gcc r11-11457] MIPS16: Mark $2/$3 as clobbered if GP is used
https://gcc.gnu.org/g:1bc4a777b21ae36b116e1842b7c482340ec929ef

commit r11-11457-g1bc4a777b21ae36b116e1842b7c482340ec929ef
Author: YunQiang Su
Date:   Wed May 29 02:28:25 2024 +0800

    MIPS16: Mark $2/$3 as clobbered if GP is used

    PR target/84790.
    The gp init sequence
            li      $2,%hi(_gp_disp)
            addiu   $3,$pc,%lo(_gp_disp)
            sll     $2,16
            addu    $2,$3
    is generated directly in `mips_output_function_prologue`, and does
    not appear in the RTL.

    As a result, the IRA/IPA passes are not aware that $2/$3 have been
    clobbered, so they may allocate them to values that live across
    (local) function calls.

    Mark $2/$3 as clobbered in both places:
      - Just after the UNSPEC_GP RTL of a function;
      - Just after a function call.

    Reported-by: Matthias Schiffer
    Origin-Patch-by: Felix Fietkau

    gcc
            * config/mips/mips.c (mips16_gp_pseudo_reg): Mark
            MIPS16_PIC_TEMP and MIPS_PROLOGUE_TEMP clobbered.
            (mips_emit_call_insn): Mark MIPS16_PIC_TEMP and
            MIPS_PROLOGUE_TEMP clobbered if MIPS16 and CALL_CLOBBERED_GP.

    (cherry picked from commit 915440eed21de367cb41857afb5273aff5bcb737)

Diff:
---
 gcc/config/mips/mips.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index bb6ff08e94c..3cf09494aec 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -3138,6 +3138,9 @@ mips_emit_call_insn (rtx pattern, rtx orig_addr, rtx addr, bool lazy_p)
     {
       rtx post_call_tmp_reg = gen_rtx_REG (word_mode, POST_CALL_TMP_REG);
       clobber_reg (&CALL_INSN_FUNCTION_USAGE (insn), post_call_tmp_reg);
+      clobber_reg (&CALL_INSN_FUNCTION_USAGE (insn), MIPS16_PIC_TEMP);
+      clobber_reg (&CALL_INSN_FUNCTION_USAGE (insn),
+                   MIPS_PROLOGUE_TEMP (word_mode));
     }
 
   return insn;
@@ -3234,7 +3237,13 @@ mips16_gp_pseudo_reg (void)
       rtx set = gen_load_const_gp (cfun->machine->mips16_gp_pseudo_rtx);
       rtx_insn *insn = emit_insn_after (set, scan);
       INSN_LOCATION (insn) = 0;
-
+      /* NewABI support hasn't been implement.  NewABI should generate RTL
+         sequence instead of ASM sequence directly.  */
+      if (mips_current_loadgp_style () == LOADGP_OLDABI)
+        {
+          emit_clobber (MIPS16_PIC_TEMP);
+          emit_clobber (MIPS_PROLOGUE_TEMP (Pmode));
+        }
 
       pop_topmost_sequence ();
     }
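The following C model shows what the four-instruction gp init sequence computes, and why both registers end up clobbered. It is a sketch under stated assumptions. `%hi`/`%lo` follow the usual MIPS convention: the low half is sign-extended, so the high half is rounded by adding 0x8000 first. `pc` is treated as an opaque base value, ignoring MIPS16 PC-alignment details. The helper names are hypothetical. $3 is written as a temporary and $2 receives the final `pc + _gp_disp`.

```c
#include <assert.h>
#include <stdint.h>

/* %hi(x): upper 16 bits, rounded to compensate for the sign-extended %lo.  */
static uint32_t
mips_hi (uint32_t x)
{
  return ((x + 0x8000u) >> 16) & 0xffffu;
}

/* %lo(x): low 16 bits interpreted as a signed 16-bit immediate.  */
static int32_t
mips_lo (uint32_t x)
{
  int32_t lo = (int32_t) (x & 0xffffu);
  return lo >= 0x8000 ? lo - 0x10000 : lo;
}

struct regs { uint32_t r2, r3; };

/* Model of:  li $2,%hi(_gp_disp); addiu $3,$pc,%lo(_gp_disp);
              sll $2,16;          addu $2,$3  */
static struct regs
mips16_gp_init (uint32_t pc, uint32_t gp_disp)
{
  struct regs r;
  r.r2 = mips_hi (gp_disp);                  /* li    $2,%hi(_gp_disp)     */
  r.r3 = pc + (uint32_t) mips_lo (gp_disp);  /* addiu $3,$pc,%lo(_gp_disp) */
  r.r2 <<= 16;                               /* sll   $2,16                */
  r.r2 += r.r3;                              /* addu  $2,$3                */
  /* $2 now holds pc + _gp_disp; $3 has been overwritten as a scratch.  */
  assert (r.r2 == pc + gp_disp);
  return r;
}
```

For example, with `pc = 0x00400000` and `_gp_disp = 0x00018010`, the model leaves `0x00418010` in $2 and `0x003F8010` in $3. Neither value is visible in the RTL, which is why the patch adds explicit clobbers.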
[gcc r15-918] [testsuite] conditionalize dg-additional-sources on target and type
https://gcc.gnu.org/g:bdc264a16e327c63d133131a695a202fbbc0a6a0

commit r15-918-gbdc264a16e327c63d133131a695a202fbbc0a6a0
Author: Alexandre Oliva
Date:   Thu May 30 02:06:48 2024 -0300

    [testsuite] conditionalize dg-additional-sources on target and type

    g++.dg/vect/pr95401.cc has dg-additional-sources, and that fails when
    check_vect_support_and_set_flags finds vector support lacking for
    execution tests: tests decay to compile tests, and additional sources
    are rejected by the compiler when compiling to a named output file.

    At first I considered using some effective target to conditionalize
    the additional sources.  There was no support for target-specific
    additional sources, so I added that.  But then, I found that adding an
    effective target to check whether the test involves linking would
    just make for busy work in this case, and so I went ahead and adjusted
    the handling of additional sources to refrain from adding them on
    compile tests, reporting them as unsupported.

    That solves the problem without using the newly-added machinery for
    per-target additional sources, but I figured since I'd implemented it
    I might as well contribute it, since there might be other uses for it.


    for  gcc/ChangeLog

            * doc/sourcebuild.texi (dg-additional-sources): Document
            newly-added support for target selectors, and implicit discard on
            non-linking tests that name the compiler output explicitly.

    for  gcc/testsuite/ChangeLog

            * lib/gcc-defs.exp (dg-additional-sources): Support target
            selectors.  Make it cumulative.
            (dg-additional-files-options): Take dest and type.  Note
            unsupported additional sources when not linking and naming the
            compiler output.  Adjust source dirname prepending to cope with
            leading blanks.
            * lib/g++.exp (g++_target_compile): Pass dest and type on to
            dg-additional-files-options.
            * lib/gcc.exp (gcc_target_compile): Likewise.
            * lib/gdc.exp (gdc_target_compile): Likewise.
            * lib/gfortran.exp (gfortran_target_compile): Likewise.
            * lib/go.exp (go_target_compile): Likewise.
            * lib/obj-c++.exp (obj-c++_target_compile): Likewise.
            * lib/objc.exp (objc_target_compile): Likewise.
            * lib/rust.exp (rust_target_compile): Likewise.
            * lib/profopt.exp (profopt-execute): Likewise-ish.

Diff:
---
 gcc/doc/sourcebuild.texi       |  8 +++-
 gcc/testsuite/lib/g++.exp      |  2 +-
 gcc/testsuite/lib/gcc-defs.exp | 35 ++-
 gcc/testsuite/lib/gcc.exp      |  2 +-
 gcc/testsuite/lib/gdc.exp      |  2 +-
 gcc/testsuite/lib/gfortran.exp |  2 +-
 gcc/testsuite/lib/go.exp       |  2 +-
 gcc/testsuite/lib/obj-c++.exp  |  2 +-
 gcc/testsuite/lib/objc.exp     |  2 +-
 gcc/testsuite/lib/profopt.exp  |  2 +-
 gcc/testsuite/lib/rust.exp     |  2 +-
 11 files changed, 46 insertions(+), 15 deletions(-)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 8e4e59ac44c..e997dbec333 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1320,9 +1320,15 @@ to @var{var_value} before execution of the program created by the test.
 Specify additional files, other than source files, that must be copied to
 the system where the compiler runs.
 
-@item @{ dg-additional-sources "@var{filelist}" @}
+@item @{ dg-additional-sources "@var{filelist}" [@{ target @var{selector} @}] @}
 Specify additional source files to appear in the compile line
 following the main test file.
+If the directive includes the optional @samp{@{ @var{selector} @}}
+then the additional sources are only added if the target system
+matches the @var{selector}.
+Additional sources are generally used only in @samp{link} and @samp{run}
+tests; they are reported as unsupported and discarded in other kinds of
+tests that direct the compiler to output to a single file.
 @end table
 
 @subsubsection Add checks at the end of a test
diff --git a/gcc/testsuite/lib/g++.exp b/gcc/testsuite/lib/g++.exp
index 0e47769c25b..a6b34d5d3a2 100644
--- a/gcc/testsuite/lib/g++.exp
+++ b/gcc/testsuite/lib/g++.exp
@@ -326,7 +326,7 @@ proc g++_target_compile { source dest type options } {
         append board_info($tboard,multilib_flags) " $flags_to_postpone"
     }
 
-    set options [dg-additional-files-options $options $source]
+    set options [dg-additional-files-options $options $source $dest $type]
 
     if { [target_info needs_status_wrapper] != "" && [info exists gluefile] } {
         lappend options "libs=${gluefile}"
diff --git a/gcc/testsuite/lib/gcc-defs.exp b/gcc/testsuite/lib/gcc-defs.exp
index 70215ed4905..cdca4c254d6 100644
--- a/gcc/testsuite/lib/gcc-defs.exp
+++ b/gcc/testsuite/lib/gcc-defs.exp
@@ -307,7 +307,22 @@ set additional_sources_used ""
 proc dg-additional-sources { args } {
     g
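The documented behaviour can be summarized as a three-way decision per directive. Below is a hypothetical C model of that decision, not the Tcl in gcc-defs.exp, and its names are invented for illustration. It assumes that `type` takes the DejaGnu `target_compile` values, with "executable" as the only linking type, and that an empty `dest` means the compiler chooses the output names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical classification of one dg-additional-sources directive.  */
enum extra_sources_action {
  EXTRA_SOURCES_ADD,         /* append the files to the compile line     */
  EXTRA_SOURCES_SKIP,        /* optional target selector did not match   */
  EXTRA_SOURCES_UNSUPPORTED  /* discarded and reported as unsupported    */
};

static enum extra_sources_action
classify_additional_sources (bool selector_matches, const char *type,
                             const char *dest)
{
  assert (type != NULL);
  if (!selector_matches)
    return EXTRA_SOURCES_SKIP;

  bool links = strcmp (type, "executable") == 0;  /* assumed link type */
  bool names_output = dest != NULL && dest[0] != '\0';

  /* Several sources cannot be compiled into one named, non-linked output.  */
  if (!links && names_output)
    return EXTRA_SOURCES_UNSUPPORTED;

  return EXTRA_SOURCES_ADD;
}
```

In this model, a run test that decays to a compile test (for example an "assembly" compile with a named `.s` output) gets `EXTRA_SOURCES_UNSUPPORTED` instead of a compiler error. That is the pr95401.cc situation the commit describes.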
[gcc r15-919] Don't reduce estimated unrolled size for innermost loop.
https://gcc.gnu.org/g:ef27b91b62c3aa8841c02665dffa8914c742fd37

commit r15-919-gef27b91b62c3aa8841c02665dffa8914c742fd37
Author: liuhongt
Date:   Tue Feb 27 15:34:57 2024 +0800

    Don't reduce estimated unrolled size for innermost loop.

    For the innermost loop, complete unrolling will most likely not
    reduce the body size to 2/3.  The current 2/3 reduction causes some
    larger loops to be completely unrolled during cunrolli, which then
    prevents them from being vectorized.  It also increases register
    pressure.

    The patch moves the 2/3 reduction from estimated_unrolled_size to
    try_unroll_loop_completely.

    gcc/ChangeLog:

            PR tree-optimization/112325
            * tree-ssa-loop-ivcanon.cc (estimated_unrolled_size): Move the
            2 / 3 loop body size reduction to ..
            (try_unroll_loop_completely): .. here, add it for the check of
            body size shrink, and the check of comparison against
            param_max_completely_peeled_insns when
            (!cunrolli || loop->inner).
            (canonicalize_loop_induction_variables): Add new parameter
            cunrolli and pass down.
            (tree_unroll_loops_completely_1): Ditto.
            (canonicalize_induction_variables): Pass cunrolli as false to
            canonicalize_loop_induction_variables.
            (tree_unroll_loops_completely): Set cunrolli to true at
            beginning and set it to false after CHANGED is true.

    gcc/testsuite/ChangeLog:

            * gcc.dg/vect/pr112325.c: New test.
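A small C model can illustrate the effect of moving the 2/3 factor. The struct fields mirror the names that `estimated_unrolled_size` uses in the diff of this commit. The formula for the non-final copies (`nunroll * (overall - eliminated_by_peeling)`) and the insn limit of 200 in the example are assumptions for illustration. The real checks in `try_unroll_loop_completely` involve more conditions than this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the struct loop_size fields used below.  */
struct loop_size_model {
  long overall;
  long eliminated_by_peeling;
  long last_iteration;
  long last_iteration_eliminated_by_peeling;
};

/* After the patch: the estimate no longer includes the 2/3 discount.  */
static long
estimated_unrolled_size (const struct loop_size_model *size, long nunroll)
{
  assert (nunroll >= 0);
  long unr_insns = nunroll * (size->overall - size->eliminated_by_peeling);
  unr_insns += size->last_iteration - size->last_iteration_eliminated_by_peeling;
  return unr_insns;
}

/* The discount that used to be applied unconditionally.  */
static long
reduce_by_one_third (long unr_insns)
{
  unr_insns = unr_insns * 2 / 3;
  return unr_insns <= 0 ? 1 : unr_insns;
}

/* Size check against an insn limit: the discount is applied only when not
   in cunrolli or when the loop has an inner loop.  */
static bool
fits_size_limit (long unr_insns, bool cunrolli, bool has_inner_loop,
                 long limit)
{
  if (!cunrolli || has_inner_loop)
    unr_insns = reduce_by_one_third (unr_insns);
  return unr_insns <= limit;
}
```

For a 15-insn body unrolled 15 times plus a 15-insn last copy, the estimate is 240. The old discount gives 160, which fits a 200-insn limit. In cunrolli, an innermost loop is now checked against the undiscounted 240 and left rolled for the vectorizer. Outer loops and later cunroll runs keep the discount.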
Diff:
---
 gcc/testsuite/gcc.dg/vect/pr112325.c | 59
 gcc/tree-ssa-loop-ivcanon.cc         | 49 --
 2 files changed, 86 insertions(+), 22 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/pr112325.c b/gcc/testsuite/gcc.dg/vect/pr112325.c
new file mode 100644
index 000..71cf4099253
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr112325.c
@@ -0,0 +1,59 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -funroll-loops -fdump-tree-vect-details" } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-mavx2" { target x86_64-*-* i?86-*-* } } */
+
+typedef unsigned short ggml_fp16_t;
+static float table_f32_f16[1 << 16];
+
+inline static float ggml_lookup_fp16_to_fp32(ggml_fp16_t f) {
+    unsigned short s;
+    __builtin_memcpy(&s, &f, sizeof(unsigned short));
+    return table_f32_f16[s];
+}
+
+typedef struct {
+    ggml_fp16_t d;
+    ggml_fp16_t m;
+    unsigned char qh[4];
+    unsigned char qs[32 / 2];
+} block_q5_1;
+
+typedef struct {
+    float d;
+    float s;
+    char qs[32];
+} block_q8_1;
+
+void ggml_vec_dot_q5_1_q8_1(const int n, float * restrict s, const void * restrict vx, const void * restrict vy) {
+    const int qk = 32;
+    const int nb = n / qk;
+
+    const block_q5_1 * restrict x = vx;
+    const block_q8_1 * restrict y = vy;
+
+    float sumf = 0.0;
+
+    for (int i = 0; i < nb; i++) {
+        unsigned qh;
+        __builtin_memcpy(&qh, x[i].qh, sizeof(qh));
+
+        int sumi = 0;
+
+        for (int j = 0; j < qk/2; ++j) {
+            const unsigned char xh_0 = ((qh >> (j + 0)) << 4) & 0x10;
+            const unsigned char xh_1 = ((qh >> (j + 12)) ) & 0x10;
+
+            const int x0 = (x[i].qs[j] & 0xF) | xh_0;
+            const int x1 = (x[i].qs[j] >> 4) | xh_1;
+
+            sumi += (x0 * y[i].qs[j]) + (x1 * y[i].qs[j + qk/2]);
+        }
+
+        sumf += (ggml_lookup_fp16_to_fp32(x[i].d)*y[i].d)*sumi + ggml_lookup_fp16_to_fp32(x[i].m)*y[i].s;
+    }
+
+    *s = sumf;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
diff --git a/gcc/tree-ssa-loop-ivcanon.cc b/gcc/tree-ssa-loop-ivcanon.cc
index bf017137260..5ef24a91917 100644
--- a/gcc/tree-ssa-loop-ivcanon.cc
+++ b/gcc/tree-ssa-loop-ivcanon.cc
@@ -437,11 +437,7 @@ tree_estimate_loop_size (class loop *loop, edge exit, edge edge_to_cancel,
    It is (NUNROLL + 1) * size of loop body with taking into account
    the fact that in last copy everything after exit conditional
    is dead and that some instructions will be eliminated after
-   peeling.
-
-   Loop body is likely going to simplify further, this is difficult
-   to guess, we just decrease the result by 1/3.  */
-
+   peeling.  */
 static unsigned HOST_WIDE_INT
 estimated_unrolled_size (struct loop_size *size,
                          unsigned HOST_WIDE_INT nunroll)
@@ -453,10 +449,6 @@ estimated_unrolled_size (struct loop_size *size,
     unr_insns = 0;
   unr_insns += size->last_iteration - size->last_iteration_eliminated_by_peeling;
 
-  unr_insns = unr_insns * 2 / 3;
-  if (unr_insns <= 0)
-    unr_insns = 1;
-
   return unr_insns;
 }
 
@@ -734,7 +726,8 @@ try_unroll_loop_completely (class loop *loop,
                             edge exit, tree niter, bool may_be_zero,
                             enum unroll_level ul,
[gcc r15-920] Support vcond_mask_qiqi and friends.
https://gcc.gnu.org/g:b6c6d5abf0d31c936f50f8f9073c5e335b9e24b7

commit r15-920-gb6c6d5abf0d31c936f50f8f9073c5e335b9e24b7
Author: liuhongt
Date:   Wed Feb 28 11:17:10 2024 +0800

    Support vcond_mask_qiqi and friends.

    gcc/ChangeLog:

            * config/i386/sse.md (vcond_mask_): New expander.

    gcc/testsuite/ChangeLog:

            * gcc.target/i386/pr114125.c: New test.

Diff:
---
 gcc/config/i386/sse.md                   | 20
 gcc/testsuite/gcc.target/i386/pr114125.c | 10 ++
 2 files changed, 30 insertions(+)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 0f4fbcb2c5d..7cd912eeeb1 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -4807,6 +4807,26 @@
   DONE;
 })
 
+(define_expand "vcond_mask_"
+  [(match_operand:SWI1248_AVX512BW 0 "register_operand")
+   (match_operand:SWI1248_AVX512BW 1 "register_operand")
+   (match_operand:SWI1248_AVX512BW 2 "register_operand")
+   (match_operand:SWI1248_AVX512BW 3 "register_operand")]
+  "TARGET_AVX512F"
+{
+  /* (operand[1] & operand[3]) | (operand[2] & ~operand[3]) */
+  rtx op1 = gen_reg_rtx (mode);
+  rtx op2 = gen_reg_rtx (mode);
+  rtx op3 = gen_reg_rtx (mode);
+
+  emit_insn (gen_and3 (op1, operands[1], operands[3]));
+  emit_insn (gen_one_cmpl2 (op3, operands[3]));
+  emit_insn (gen_and3 (op2, operands[2], op3));
+  emit_insn (gen_ior3 (operands[0], op1, op2));
+
+  DONE;
+})
+
 ;
 ;;
 ;; Parallel floating point logical operations
diff --git a/gcc/testsuite/gcc.target/i386/pr114125.c b/gcc/testsuite/gcc.target/i386/pr114125.c
new file mode 100644
index 000..e63fbffe965
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr114125.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=x86-64-v4 -fdump-tree-forwprop3-raw " } */
+
+typedef long vec __attribute__((vector_size(16)));
+vec f(vec x){
+  vec y = x < 10;
+  return y & (y == 0);
+}
+
+/* { dg-final { scan-tree-dump-not "_expr" "forwprop3" } } */
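With scalar mask modes, each lane's condition is a single bit of an integer. The select done by the new expander therefore reduces to the bitwise formula in its comment, `(operand[1] & operand[3]) | (operand[2] & ~operand[3])`. Below is a minimal C model of that formula. It assumes a `uint64_t` can stand in for a QI/HI/SI/DI mask register, and `vcond_mask_bits` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Bit i of the result is bit i of IF_TRUE when bit i of MASK is set,
   and bit i of IF_FALSE otherwise.  */
static uint64_t
vcond_mask_bits (uint64_t if_true, uint64_t if_false, uint64_t mask)
{
  uint64_t op1 = if_true & mask;   /* first AND:  operand[1] & operand[3]  */
  uint64_t op3 = ~mask;            /* NOT:        ~operand[3]              */
  uint64_t op2 = if_false & op3;   /* second AND: operand[2] & ~operand[3] */
  /* The two halves never share a set bit, so IOR merges them.  */
  assert ((op1 & op2) == 0);
  return op1 | op2;                /* IOR into operand[0]                  */
}
```

The expander emits the same four steps (AND, NOT, AND, IOR), using fresh pseudos for the intermediates.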