Re: [PATCH] Do not allow "x + 0.0" to "x" optimization with -fsignaling-nans
Hi Jeff,

Thank you for taking care of it.

Toru

From: Jeff Law
Sent: Monday, June 19, 2023 7:55 PM
To: Richard Biener; Toru Kisuki
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] Do not allow "x + 0.0" to "x" optimization with -fsignaling-nans

On 6/19/23 05:41, Richard Biener via Gcc-patches wrote:
> On Mon, Jun 19, 2023 at 12:33 PM Toru Kisuki via Gcc-patches wrote:
>>
>> Hi,
>>
>> With -O3 -fsignaling-nans -fno-signed-zeros, compiler should not simplify
>> 'x + 0.0' to 'x'.
>
> OK if you bootstrapped / tested this change.

I suspect Toru doesn't have write access. So I went ahead and did an x86 bootstrap & regression test, which passed. The ChangeLog entry needed fleshing out a bit, and I fixed a minor whitespace problem in the patch itself.

Pushed to the trunk.

jeff
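
The rule this patch is about can be sketched as a small decision function. This is only a model under stated assumptions, not the code from the patch: `FpFlags` and `may_fold_add_zero` are hypothetical names, and rounding modes are ignored. The reasoning is that an addition quiets a signaling NaN (and raises the invalid exception), so dropping the addition changes behaviour, and that `-0.0 + 0.0` is `+0.0`, so the fold is only safe with `+0.0` when signed zeros need not be honored.

```cpp
// Hypothetical model of the flags that matter for this fold.
struct FpFlags {
  bool signaling_nans;  // -fsignaling-nans in effect
  bool signed_zeros;    // -fsigned-zeros in effect (off with -fno-signed-zeros)
};

// Returns true if "x + zero" may be replaced by "x" in this simplified model.
// zero_is_negative tells whether the constant is -0.0 rather than +0.0.
bool may_fold_add_zero(const FpFlags& f, bool zero_is_negative)
{
  // An sNaN operand must still go through the addition so that it is
  // quieted and the exception is raised: never fold.
  if (f.signaling_nans)
    return false;

  // Without signed zeros, +0.0 and -0.0 are interchangeable.
  if (!f.signed_zeros)
    return true;

  // With signed zeros, only "x + (-0.0)" keeps the sign of x for x == -0.0.
  return zero_is_negative;
}
```

With the flags from the report (`-fsignaling-nans -fno-signed-zeros`) the model answers "do not fold", which is the behaviour the patch asks for.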
Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.
On 6/19/23 22:52, Tamar Christina wrote:
>> It's a bit hackish, but could we reject the stack pointer for operand1
>> in the stack-tie? And if we do so, does it help?
>
> Yeah, this one I had to defer until later this week to look at closer,
> because what I'm wondering about is whether the optimization should apply
> to frame-related RTX as well.
>
> Looking at the description of RTX_FRAME_RELATED_P, this optimization may
> end up de-optimizing RISC targets by creating an offset that is larger
> than the offset which can be used from SP, making reload have to spill,
> i.e. sometimes the move was explicitly done. So perhaps it should not
> apply to RTX_FRAME_RELATED_P in find_oldest_value_reg and
> copyprop_hardreg_forward_1?
>
> Other parts of this pass already seem to bail out in similar situations.
> So I needed to write some testcases to check what would happen in these
> cases, hence the deferral to later in the week.

Rejecting for RTX_FRAME_RELATED_P would seem reasonable and probably better in general to me. The cases where we're looking to clean things up aren't really in the prologue/epilogue, but instead in the main body after register elimination has turned fp into sp + offset, thus making all kinds of things no longer valid.

jeff
RE: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.
> -----Original Message-----
> From: Jeff Law
> Sent: Tuesday, June 20, 2023 3:17 AM
> To: Andrew Pinski; Thiago Jung Bauermann
> Cc: Manolis Tsamis; Philipp Tomsich; Richard Biener; Palmer Dabbelt;
> Kito Cheng; gcc-patches@gcc.gnu.org; Tamar Christina
> Subject: Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack
> pointer if possible.
>
> On 6/19/23 17:48, Andrew Pinski wrote:
> > On Mon, Jun 19, 2023 at 4:40 PM Andrew Pinski wrote:
> >>
> >> On Mon, Jun 19, 2023 at 9:58 AM Thiago Jung Bauermann via Gcc-patches
> >> wrote:
> >>>
> >>> Hello Manolis,
> >>>
> >>> Philipp Tomsich writes:
> >>>
> >>>> On Thu, 8 Jun 2023 at 00:18, Jeff Law wrote:
> >>>>>
> >>>>> On 5/25/23 06:35, Manolis Tsamis wrote:
> >>>>>> Propagation of the stack pointer in cprop_hardreg is currently
> >>>>>> forbidden in all cases, due to maybe_mode_change returning NULL.
> >>>>>> Relax this restriction and allow propagation when no mode change
> >>>>>> is requested.
> >>>>>>
> >>>>>> gcc/ChangeLog:
> >>>>>>
> >>>>>>         * regcprop.cc (maybe_mode_change): Enable stack pointer
> >>>>>>         propagation.
> >>>>> Thanks for the clarification. This is OK for the trunk. It looks
> >>>>> generic enough to have value going forward now rather than waiting.
> >>>>
> >>>> Rebased, retested, and applied to trunk. Thanks!
> >>>
> >>> Our CI found a couple of tests that started failing on aarch64-linux
> >>> after this commit. I was able to confirm manually that they don't
> >>> happen in the commit immediately before this one, and also that
> >>> these failures are still present in today's trunk.
> >>>
> >>> I have testsuite logs for last good commit, first bad commit and
> >>> current trunk here:
> >>>
> >>> https://people.linaro.org/~thiago.bauermann/gcc-regression-6a2e8dcbbd4b/
> >>>
> >>> Could you please check?
> >>>
> >>> These are the new failures:
> >>>
> >>> Running gcc:gcc.target/aarch64/aarch64.exp ...
> >>> FAIL: gcc.target/aarch64/stack-check-cfa-3.c scan-assembler-times
> >>> mov\\tx11, sp 1
> >>
> >> So for the above before this change we had:
> >> ```
> >> (insn:TI 597 596 598 2 (set (reg:DI 11 x11)
> >>         (reg/f:DI 31 sp)) "stack-check-prologue-16.c":16:1 65 {*movdi_aarch64}
> >>      (nil))
> >> (insn 598 597 599 2 (set (mem:BLK (scratch) [0 A8])
> >>         (unspec:BLK [
> >>                 (reg:DI 11 x11)
> >>                 (reg/f:DI 31 sp)
> >>             ] UNSPEC_PRLG_STK)) "stack-check-prologue-16.c":16:1 1169 {stack_tie}
> >>      (expr_list:REG_DEAD (reg:DI 11 x11)
> >>         (nil)))
> >> ```
> >>
> >> After we get:
> >> ```
> >> (insn 598 596 599 2 (set (mem:BLK (scratch) [0 A8])
> >>         (unspec:BLK [
> >>                 (reg:DI 31 sp [11]) repeated x2
> >>             ] UNSPEC_PRLG_STK)) "stack-check-prologue-16.c":16:1 1169 {stack_tie}
> >>      (nil))
> >> ```
> >> Which seems to be ok, except we still have:
> >>         .cfi_def_cfa_register 11
> >>
> >> That is because on:
> >> (insn/f 596 595 598 2 (set (reg:DI 12 x12)
> >>         (plus:DI (reg:DI 12 x12)
> >>             (const_int 272 [0x110]))) "stack-check-prologue-16.c":16:1 153 {*adddi3_aarch64}
> >>      (expr_list:REG_CFA_DEF_CFA (reg:DI 11 x11)
> >>         (nil)))
> >>
> >> We record x11 but never update it though that came before the mov for
> >> x11 ... So it seems like cprop_hardreg had no idea it needed to
> >> update it.
> >>
> >> I suspect the other testcases are just propagation of sp into the
> >> stores and such and just needed update. But the above testcase seems
> >> getting broken cfi though I don't know how to fix it.

Yeah, we noticed the failures internally but left them broken since we have an upcoming AArch64 patch which requires them to be updated anyway and are rolling up the updates into that patch.

> >
> > The code from aarch64.cc:
> > ```
> >   /* This is done to provide unwinding information for the stack
> >      adjustments we're about to do, however to prevent the optimizers
> >      from removing the R11 move and leaving the CFA note (which would be
> >      very wrong) we tie the old and new stack pointer together.
> >      The tie will expand to nothing but the optimizers will not touch
> >      the instruction.  */
> >   rtx stack_ptr_copy = gen_rtx_REG (Pmode, STACK_CLASH_SVE_CFA_REGNUM);
> >   emit_move_insn (stack_ptr_copy, stack_pointer_rtx);
> >   emit_insn (gen_stack_tie (stack_ptr_copy, stack_pointer_rtx));
> >
> >   /* We want the CFA independent of the stack pointer for the
> >      duration of the loop.  */
> >   add_reg_note (insn, REG_CFA_DEF_CFA, stack_ptr_copy);
> >   RTX_FRAME_RELATED_P (insn) = 1;
> > ```
> >
> > Well except now with this change, the optimizers touch this
> > instruction. Maybe the move instruction should not be a move but an
> > unspec so optimizers don't know what
[PATCH] Change fma_reassoc_width tuning for ampere1
This patch enables reassociation of floating-point additions on ampere1.
This brings about 1% overall benefit on spec2017 fprate cases. (There
are minor regressions in 510.parest_r and 508.namd_r, analyzed here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110279 .)

Bootstrapped and tested on aarch64-unknown-linux-gnu. Is this OK for trunk?

Thanks,
Di Zhao

gcc/ChangeLog:

        * config/aarch64/aarch64.cc: Change fma_reassoc_width for ampere1

---
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index d16565b5581..301c9f6c0cd 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -1927,7 +1927,7 @@ static const struct tune_params ampere1_tunings =
   "32:12",     /* loop_align.  */
   2,           /* int_reassoc_width.  */
   4,           /* fp_reassoc_width.  */
-  1,           /* fma_reassoc_width.  */
+  4,           /* fma_reassoc_width.  */
   2,           /* vec_reassoc_width.  */
   2,           /* min_div_recip_mul_sf.  */
   2,           /* min_div_recip_mul_df.  */
--
2.25.1
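
To illustrate what a reassociation width of 1 versus 4 means for a chain of multiply-adds, here is a minimal hand-written sketch. It is an assumption about the effect, not output from GCC: `dot_serial` and `dot_width4` are hypothetical names, and the compiler only performs such a rewrite when the floating-point flags allow reassociation. With width 1 every multiply-add waits for the previous one; with width 4 there are four independent dependency chains that are combined at the end.

```cpp
#include <cstddef>

// Width 1: a single accumulator, so each multiply-add depends on the
// result of the previous one.
double dot_serial(const double* a, const double* b, std::size_t n)
{
  double acc = 0.0;
  for (std::size_t i = 0; i < n; ++i)
    acc += a[i] * b[i];
  return acc;
}

// Width 4: four independent accumulators, summed once after the loop.
double dot_width4(const double* a, const double* b, std::size_t n)
{
  double acc[4] = {0.0, 0.0, 0.0, 0.0};
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4)
    for (std::size_t k = 0; k < 4; ++k)
      acc[k] += a[i + k] * b[i + k];
  // Leftover elements go into the first chain.
  for (; i < n; ++i)
    acc[0] += a[i] * b[i];
  return (acc[0] + acc[1]) + (acc[2] + acc[3]);
}
```

The two functions add the products in a different order, so for general inputs they can differ in the last bits; that is why this kind of transformation is tied to relaxed floating-point semantics.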
[Bug testsuite/110230] new test case gcc.target/powerpc/pr109932-1.c in r14-1705-g2764335bd336f2 fails for 32 bits
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110230

--- Comment #5 from CVS Commits ---
The releases/gcc-13 branch has been updated by Kewen Lin:

https://gcc.gnu.org/g:4b4a21c93406aef276fbff00d3e9491285d7b4a9

commit r13-7458-g4b4a21c93406aef276fbff00d3e9491285d7b4a9
Author: Kewen Lin
Date:   Tue Jun 13 03:04:54 2023 -0500

    testsuite: Check int128 effective target for pr109932-{1,2}.c [PR110230]

    This patch is to make newly added test cases pr109932-{1,2}.c
    check int128 effective target to avoid unsupported type error
    on 32-bit. I did hit this failure during testing and fixed
    it, but made a stupid mistake not updating the local formatted
    patch which was actually out of date.

            PR testsuite/110230
            PR target/109932

    gcc/testsuite/ChangeLog:

            * gcc.target/powerpc/pr109932-1.c: Adjust with int128 effective target.
            * gcc.target/powerpc/pr109932-2.c: Ditto.

    (cherry picked from commit 16eb9d69079d769b2aa2c07ce54aca20f5547c14)
[Bug target/109932] ICE in in extract_insn, at recog.cc:2791 on ppc64le with -mno-vsx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109932

--- Comment #7 from CVS Commits ---
The releases/gcc-13 branch has been updated by Kewen Lin:

https://gcc.gnu.org/g:4b4a21c93406aef276fbff00d3e9491285d7b4a9

commit r13-7458-g4b4a21c93406aef276fbff00d3e9491285d7b4a9
Author: Kewen Lin
Date:   Tue Jun 13 03:04:54 2023 -0500

    testsuite: Check int128 effective target for pr109932-{1,2}.c [PR110230]

    This patch is to make newly added test cases pr109932-{1,2}.c
    check int128 effective target to avoid unsupported type error
    on 32-bit. I did hit this failure during testing and fixed
    it, but made a stupid mistake not updating the local formatted
    patch which was actually out of date.

            PR testsuite/110230
            PR target/109932

    gcc/testsuite/ChangeLog:

            * gcc.target/powerpc/pr109932-1.c: Adjust with int128 effective target.
            * gcc.target/powerpc/pr109932-2.c: Ditto.

    (cherry picked from commit 16eb9d69079d769b2aa2c07ce54aca20f5547c14)
[Bug target/109932] ICE in in extract_insn, at recog.cc:2791 on ppc64le with -mno-vsx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109932

--- Comment #6 from CVS Commits ---
The releases/gcc-13 branch has been updated by Kewen Lin:

https://gcc.gnu.org/g:4e67d73ee5100c12993c79852e4ede13d2606cad

commit r13-7457-g4e67d73ee5100c12993c79852e4ede13d2606cad
Author: Kewen Lin
Date:   Mon Jun 12 01:08:22 2023 -0500

    rs6000: Guard __builtin_{un,}pack_vector_int128 with vsx [PR109932]

    As PR109932 shows, builtins __builtin_{un,}pack_vector_int128
    should be guarded under vsx rather than power7, as their
    corresponding bif patterns have the conditions TARGET_VSX
    and VECTOR_MEM_ALTIVEC_OR_VSX_P (V1TImode). This patch is to
    move __builtin_{un,}pack_vector_int128 to stanza vsx to ensure
    their supports.

            PR target/109932

    gcc/ChangeLog:

            * config/rs6000/rs6000-builtins.def (__builtin_pack_vector_int128,
            __builtin_unpack_vector_int128): Move from stanza power7 to vsx.

    gcc/testsuite/ChangeLog:

            * gcc.target/powerpc/pr109932-1.c: New test.
            * gcc.target/powerpc/pr109932-2.c: New test.

    (cherry picked from commit ff83d1b47aadcdaf80a4fda84b0dc00bb2cd3641)
[Bug target/110011] -mfull-toc (-mfp-in-toc) yields incorrect _Float128 constants on power9
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110011

--- Comment #10 from CVS Commits ---
The releases/gcc-13 branch has been updated by Kewen Lin:

https://gcc.gnu.org/g:cefe925fe49af81bb4ae7a27fa2c96f0926fe22e

commit r13-7456-gcefe925fe49af81bb4ae7a27fa2c96f0926fe22e
Author: Kewen Lin
Date:   Mon Jun 12 01:07:52 2023 -0500

    rs6000: Don't use TFmode for 128 bits fp constant in toc [PR110011]

    As PR110011 shows, when encoding 128 bits fp constant into
    toc, we adopts REAL_VALUE_TO_TARGET_LONG_DOUBLE which is
    to find the first float mode with LONG_DOUBLE_TYPE_SIZE
    bits of precision, it would be TFmode here. But the 128
    bits fp constant can be with mode IFmode or KFmode, which
    doesn't necessarily have the same underlying float format
    as the one of TFmode, like this PR exposes, with option
    -mabi=ibmlongdouble TFmode has ibm_extended_format while
    KFmode has ieee_quad_format, mixing up the formats (the
    encoding/decoding ways) would cause unexpected results.

    This patch is to make it use constant's own mode instead
    of TFmode for real_to_target call.

            PR target/110011

    gcc/ChangeLog:

            * config/rs6000/rs6000.cc (output_toc): Use the mode of the 128-bit
            floating constant itself for real_to_target call.

    gcc/testsuite/ChangeLog:

            * gcc.target/powerpc/pr110011.c: New test.

    (cherry picked from commit 388809f2afde874180da0669c669e241037eeba0)
[Bug testsuite/110230] new test case gcc.target/powerpc/pr109932-1.c in r14-1705-g2764335bd336f2 fails for 32 bits
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110230

--- Comment #4 from CVS Commits ---
The releases/gcc-12 branch has been updated by Kewen Lin:

https://gcc.gnu.org/g:4591c2c8a6b15ca99ba049d84e0e694f12db4f60

commit r12-9714-g4591c2c8a6b15ca99ba049d84e0e694f12db4f60
Author: Kewen Lin
Date:   Tue Jun 13 03:04:54 2023 -0500

    testsuite: Check int128 effective target for pr109932-{1,2}.c [PR110230]

    This patch is to make newly added test cases pr109932-{1,2}.c
    check int128 effective target to avoid unsupported type error
    on 32-bit. I did hit this failure during testing and fixed
    it, but made a stupid mistake not updating the local formatted
    patch which was actually out of date.

            PR testsuite/110230
            PR target/109932

    gcc/testsuite/ChangeLog:

            * gcc.target/powerpc/pr109932-1.c: Adjust with int128 effective target.
            * gcc.target/powerpc/pr109932-2.c: Ditto.

    (cherry picked from commit 16eb9d69079d769b2aa2c07ce54aca20f5547c14)
[Bug target/109932] ICE in in extract_insn, at recog.cc:2791 on ppc64le with -mno-vsx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109932

--- Comment #5 from CVS Commits ---
The releases/gcc-12 branch has been updated by Kewen Lin:

https://gcc.gnu.org/g:4591c2c8a6b15ca99ba049d84e0e694f12db4f60

commit r12-9714-g4591c2c8a6b15ca99ba049d84e0e694f12db4f60
Author: Kewen Lin
Date:   Tue Jun 13 03:04:54 2023 -0500

    testsuite: Check int128 effective target for pr109932-{1,2}.c [PR110230]

    This patch is to make newly added test cases pr109932-{1,2}.c
    check int128 effective target to avoid unsupported type error
    on 32-bit. I did hit this failure during testing and fixed
    it, but made a stupid mistake not updating the local formatted
    patch which was actually out of date.

            PR testsuite/110230
            PR target/109932

    gcc/testsuite/ChangeLog:

            * gcc.target/powerpc/pr109932-1.c: Adjust with int128 effective target.
            * gcc.target/powerpc/pr109932-2.c: Ditto.

    (cherry picked from commit 16eb9d69079d769b2aa2c07ce54aca20f5547c14)
[Bug target/109932] ICE in in extract_insn, at recog.cc:2791 on ppc64le with -mno-vsx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109932

--- Comment #4 from CVS Commits ---
The releases/gcc-12 branch has been updated by Kewen Lin:

https://gcc.gnu.org/g:31d88c795a0eb05df5a0684c34ec74116cce133f

commit r12-9713-g31d88c795a0eb05df5a0684c34ec74116cce133f
Author: Kewen Lin
Date:   Mon Jun 12 01:08:22 2023 -0500

    rs6000: Guard __builtin_{un,}pack_vector_int128 with vsx [PR109932]

    As PR109932 shows, builtins __builtin_{un,}pack_vector_int128
    should be guarded under vsx rather than power7, as their
    corresponding bif patterns have the conditions TARGET_VSX
    and VECTOR_MEM_ALTIVEC_OR_VSX_P (V1TImode). This patch is to
    move __builtin_{un,}pack_vector_int128 to stanza vsx to ensure
    their supports.

            PR target/109932

    gcc/ChangeLog:

            * config/rs6000/rs6000-builtins.def (__builtin_pack_vector_int128,
            __builtin_unpack_vector_int128): Move from stanza power7 to vsx.

    gcc/testsuite/ChangeLog:

            * gcc.target/powerpc/pr109932-1.c: New test.
            * gcc.target/powerpc/pr109932-2.c: New test.

    (cherry picked from commit ff83d1b47aadcdaf80a4fda84b0dc00bb2cd3641)
[Bug target/110011] -mfull-toc (-mfp-in-toc) yields incorrect _Float128 constants on power9
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110011

--- Comment #9 from CVS Commits ---
The releases/gcc-12 branch has been updated by Kewen Lin:

https://gcc.gnu.org/g:90e1030d4c6d981c2293d89db6d1d57c057ad61d

commit r12-9712-g90e1030d4c6d981c2293d89db6d1d57c057ad61d
Author: Kewen Lin
Date:   Mon Jun 12 01:07:52 2023 -0500

    rs6000: Don't use TFmode for 128 bits fp constant in toc [PR110011]

    As PR110011 shows, when encoding 128 bits fp constant into
    toc, we adopts REAL_VALUE_TO_TARGET_LONG_DOUBLE which is
    to find the first float mode with LONG_DOUBLE_TYPE_SIZE
    bits of precision, it would be TFmode here. But the 128
    bits fp constant can be with mode IFmode or KFmode, which
    doesn't necessarily have the same underlying float format
    as the one of TFmode, like this PR exposes, with option
    -mabi=ibmlongdouble TFmode has ibm_extended_format while
    KFmode has ieee_quad_format, mixing up the formats (the
    encoding/decoding ways) would cause unexpected results.

    This patch is to make it use constant's own mode instead
    of TFmode for real_to_target call.

            PR target/110011

    gcc/ChangeLog:

            * config/rs6000/rs6000.cc (output_toc): Use the mode of the 128-bit
            floating constant itself for real_to_target call.

    gcc/testsuite/ChangeLog:

            * gcc.target/powerpc/pr110011.c: New test.

    (cherry picked from commit 388809f2afde874180da0669c669e241037eeba0)
[PATCH] RISC-V: Optimize codegen of VLA SLP
Recently, I figured out a better approach for codegen of VLA stepped vectors.

Here is the detailed description:

Case 1:
void
f (uint8_t *restrict a, uint8_t *restrict b)
{
  for (int i = 0; i < 100; ++i)
    {
      a[i * 8] = b[i * 8 + 37] + 1;
      a[i * 8 + 1] = b[i * 8 + 37] + 2;
      a[i * 8 + 2] = b[i * 8 + 37] + 3;
      a[i * 8 + 3] = b[i * 8 + 37] + 4;
      a[i * 8 + 4] = b[i * 8 + 37] + 5;
      a[i * 8 + 5] = b[i * 8 + 37] + 6;
      a[i * 8 + 6] = b[i * 8 + 37] + 7;
      a[i * 8 + 7] = b[i * 8 + 37] + 8;
    }
}

We need to generate the stepped vector:
NPATTERNS = 8.
{ 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8 }

Before this patch:
vid.v    v4         ;; {0,1,2,3,4,5,6,7,...}
vsrl.vi  v4,v4,3    ;; {0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,...}
li       a3,8       ;; {8}
vmul.vx  v4,v4,a3   ;; {0,0,0,0,0,0,0,0,8,8,8,8,8,8,8,8,...}

After this patch:
vid.v    v4                    ;; {0,1,2,3,4,5,6,7,...}
vand.vi  v4,v4,-8(-NPATTERNS)  ;; {0,0,0,0,0,0,0,0,8,8,8,8,8,8,8,8,...}

Case 2:
void
f (uint8_t *restrict a, uint8_t *restrict b)
{
  for (int i = 0; i < 100; ++i)
    {
      a[i * 8] = b[i * 8 + 3] + 1;
      a[i * 8 + 1] = b[i * 8 + 2] + 2;
      a[i * 8 + 2] = b[i * 8 + 1] + 3;
      a[i * 8 + 3] = b[i * 8 + 0] + 4;
      a[i * 8 + 4] = b[i * 8 + 7] + 5;
      a[i * 8 + 5] = b[i * 8 + 6] + 6;
      a[i * 8 + 6] = b[i * 8 + 5] + 7;
      a[i * 8 + 7] = b[i * 8 + 4] + 8;
    }
}

We need to generate the stepped vector:
NPATTERNS = 4.
{ 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12, ... }

Before this patch:
li       a6,134221824
slli     a6,a6,5
addi     a6,a6,3    ;; 64-bit: 0x000300020001
vmv.v.x  v6,a6      ;; {3, 2, 1, 0, ... }
vid.v    v4         ;; {0, 1, 2, 3, 4, 5, 6, 7, ... }
vsrl.vi  v4,v4,2    ;; {0, 0, 0, 0, 1, 1, 1, 1, ... }
li       a3,4       ;; {4}
vmul.vx  v4,v4,a3   ;; {0, 0, 0, 0, 4, 4, 4, 4, ... }
vadd.vv  v4,v4,v6   ;; {3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12, ... }

After this patch:
li       a3,-536875008
slli     a3,a3,4
addi     a3,a3,1
slli     a3,a3,16
vmv.v.x  v2,a3      ;; {3, 1, -1, -3, ... }
vid.v    v4         ;; {0, 1, 2, 3, 4, 5, 6, 7, ... }
vadd.vv  v4,v4,v2   ;; {3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12, ... }

gcc/ChangeLog:

        * config/riscv/riscv-v.cc (expand_const_vector): Optimize codegen.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/autovec/partial/slp-1.c: Adapt testcase.
        * gcc.target/riscv/rvv/autovec/partial/slp-16.c: New test.
        * gcc.target/riscv/rvv/autovec/partial/slp_run-16.c: New test.

---
 gcc/config/riscv/riscv-v.cc                   | 78 ---
 .../riscv/rvv/autovec/partial/slp-1.c         |  2 +
 .../riscv/rvv/autovec/partial/slp-16.c        | 24 ++
 .../riscv/rvv/autovec/partial/slp_run-16.c    | 66
 4 files changed, 125 insertions(+), 45 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-16.c

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 79c0337327d..aa143c864d6 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1128,7 +1128,7 @@ expand_const_vector (rtx target, rtx src)
         builder.quick_push (CONST_VECTOR_ELT (src, i * npatterns + j));
     }
   builder.finalize ();
-
+
   if (CONST_VECTOR_DUPLICATE_P (src))
     {
       /* Handle the case with repeating sequence that NELTS_PER_PATTERN = 1
@@ -1204,61 +1204,49 @@ expand_const_vector (rtx target, rtx src)
       if (builder.single_step_npatterns_p ())
         {
           /* Describe the case by choosing NPATTERNS = 4 as an example.  */
-          rtx base, step;
+          insn_code icode;
+
+          /* Step 1: Generate vid = { 0, 1, 2, 3, 4, 5, 6, 7, ... }.  */
+          rtx vid = gen_reg_rtx (builder.mode ());
+          rtx vid_ops[] = {vid};
+          icode = code_for_pred_series (builder.mode ());
+          emit_vlmax_insn (icode, RVV_MISC_OP, vid_ops);
+
           if (builder.npatterns_all_equal_p ())
             {
               /* Generate the variable-length vector following this rule:
                  { a, a, a + step, a + step, a + step * 2, a + step * 2, ...}
                  E.g. { 0, 0, 8, 8, 16, 16, ... } */
-              /* Step 1: Generate base = { 0, 0, 0, 0, 0, 0, 0, ... }.  */
-              base = expand_vector_broadcast (builder.mode (), builder.elt (0));
+              /* Step 2: VID AND -NPATTERNS:
+                 { 0&-4, 1&-4, 2&-4, 3 &-4, 4 &-4, 5 &-4, 6 &-4, 7 &-4, ... }
+              */
+              rtx imm
+                = gen_int_mode (-builder.npatterns (), builder.inner_mode ());
+              rtx and_ops[] = {target, vid, imm};
+              icode = code_for_pred_scalar (AND, builder.mode ());
+              emit_vlmax_insn (icode, RVV_BINOP, and_ops);
             }
           else
             {
               /* Generate the
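
The arithmetic behind the two cases in the description can be checked with a scalar model. This is only a sketch: it works on plain `std::vector<int>` rather than RVV registers, the function names are hypothetical, and it assumes NPATTERNS is a power of two, which is what makes the AND with `-NPATTERNS` equivalent to the shift-and-multiply sequence.

```cpp
#include <vector>

// Model of vid.v: { 0, 1, 2, ..., n-1 }.
std::vector<int> vid(int n)
{
  std::vector<int> v(n);
  for (int i = 0; i < n; ++i)
    v[i] = i;
  return v;
}

// Case 1, old sequence: (vid >> log2(NPATTERNS)) * NPATTERNS.
std::vector<int> stepped_shift_mul(int n, int npatterns_log2)
{
  std::vector<int> v = vid(n);
  for (int& e : v)
    e = (e >> npatterns_log2) * (1 << npatterns_log2);
  return v;
}

// Case 1, new sequence: vid & -NPATTERNS clears the low log2(NPATTERNS) bits.
std::vector<int> stepped_and(int n, int npatterns)
{
  std::vector<int> v = vid(n);
  for (int& e : v)
    e &= -npatterns;
  return v;
}

// Case 2, new sequence: vid plus a repeating delta { 3, 1, -1, -3 }
// reverses each group of four indices.
std::vector<int> reversed_groups_of_4(int n)
{
  static const int delta[4] = {3, 1, -1, -3};
  std::vector<int> v = vid(n);
  for (int i = 0; i < n; ++i)
    v[i] += delta[i % 4];
  return v;
}
```

For 16 elements and NPATTERNS = 8, both Case 1 variants give eight zeros followed by eight eights, and `reversed_groups_of_4(8)` gives 3, 2, 1, 0, 7, 6, 5, 4, matching the vectors listed in the description.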
[Bug c++/110304] __builtin_adcs missing and jakub you miss the point of builtin_adcb
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110304

--- Comment #11 from cqwrteur ---
Actually mine

template<::std::unsigned_integral T>
inline constexpr T add_carry(T a,T b,T carryin,T& carryout) noexcept
{
    [[assume(carryin==0||carryin==1)]];
    a+=b;
    carryout=a
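
The comment above is cut off in this archive, so the rest of that function is not known. For readers who want a complete reference point, here is one common portable formulation of add-with-carry for unsigned types. It is a sketch and not a reconstruction of the truncated code: the parameter names and the C++17 `static_assert` (instead of the C++20 concept) are choices made for this example.

```cpp
#include <type_traits>

// Adds a + b + carry_in and reports the outgoing carry (0 or 1).
// carry_in is expected to be 0 or 1.
template <typename T>
constexpr T add_carry(T a, T b, T carry_in, T& carry_out) noexcept
{
  static_assert(std::is_unsigned_v<T>, "add_carry needs an unsigned type");

  // First addition: unsigned wrap-around means the sum is smaller than
  // an operand exactly when a carry was produced.
  const T sum = static_cast<T>(a + b);
  const T carry1 = static_cast<T>(sum < a);

  // Second addition: fold in the incoming carry the same way.
  const T result = static_cast<T>(sum + carry_in);
  const T carry2 = static_cast<T>(result < sum);

  // At most one of the two additions can carry when carry_in <= 1.
  carry_out = static_cast<T>(carry1 | carry2);
  return result;
}
```

A typical use is chaining limbs of a multi-word integer, passing each call's `carry_out` as the next call's `carry_in`.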
Re: [PATCH, rs6000] Add two peephole2 patterns for mr. insn
HP,
  It makes sense. I will update the patch.

Thanks
Gui Haochen

On 2023/6/20 8:07, Hans-Peter Nilsson wrote:
> On Tue, 30 May 2023, HAO CHEN GUI via Gcc-patches wrote:
>
>> +++ b/gcc/config/rs6000/rs6000.md
>> @@ -7891,6 +7891,36 @@ (define_insn "*mov_internal2"
>>     (set_attr "dot" "yes")
>>     (set_attr "length" "4,4,8")])
>>
>> +(define_peephole2
>> +  [(set (match_operand:CC 2 "cc_reg_operand" "")
>> +        (compare:CC (match_operand:P 1 "int_reg_operand" "")
>> +                    (const_int 0)))
>> +   (set (match_operand:P 0 "int_reg_operand" "")
>
> A random comment from the sideline: I'd suggest to remove the
> (empty) constraints string from your peephole2's.
>
> It can be a matter of port-specific-taste but it seems removing
> them would be consistent with the other peephole2's in
> rs6000.md.
>
> (In this matter, I believe the examples in md.texi are bad.)
>
> brgds, H-P
Re: [PATCH ver 6] rs6000: Add builtins for IEEE 128-bit floating point values
Hi Carl,

on 2023/6/20 02:54, Carl Love wrote:
>
> Kewen, GCC maintainers:
>
> Version 6, Fixed missing change log entry. Changed builtin id names as
> requested. Missed making the change on the last version. Fixed
> comment in the three test cases. Reran regression suite on Power 10,
> no regressions.
>
> Version 5, Tested the patch on P9 BE per request. Fixed up test case
> to get the correct expected values for BE and LE. Fixed typos.
> Updated the doc/extend.texi to clarify the vector arguments. Changed
> test file names per request. Moved builtin defs next to related
> definitions. Renamed new mode_attr. Removed new mode_iterator, used
> existing iterator instead. Renamed mode_iterator VSEEQP_DI to V2DI_DI.
> Fixed up overloaded definitions per request.
>
> Version 4, added missing cases for new xxexpqp, xsxexpdp and xsxsigqp
> cases to rs6000_expand_builtin. Merged the new define_insn definitions
> with the existing definitions. Renamed the builtins by removing the
> __builtin_ prefix from the names. Fixed the documentation for the
> builtins. Updated the test files to check the desired instructions
> were generated. Retested patch on Power 10 with no regressions.
>
> Version 3, was able to get the overloaded version of scalar_insert_exp
> to work and the change to xsxexpqp_f128_ define instruction to
> work with the suggestions from Kewen.
>
> Version 2, I have addressed the various comments from Kewen. I had
> issues with adding an additional overloaded version of
> scalar_insert_exp with vector arguments. The overload infrastructure
> didn't work with a mix of scalar and vector arguments. I did rename
> the __builtin_insertf128_exp to __builtin_vsx_scalar_insert_exp_qp make
> it similar to the existing builtin. I also wasn't able to get the
> suggested merge of xsxexpqp_f128_ with xsxexpqp_ to work so
> I left the two simpler definitions.
>
> The patch adds three new builtins to extract the significand and
> exponent of an IEEE float 128-bit value where the builtin argument is a
> vector. Additionally, a builtin to insert the exponent into an IEEE
> float 128-bit vector argument is added. These builtins were requested
> since there is no clean and optimal way to transfer between a vector
> and a scalar IEEE 128 bit value.
>
> The patch has been tested on Power 9 BE and Power 10 LE with no
> regressions. Please let me know if the patch is acceptable or not.
> Thanks.

OK for trunk with some nits fixed in changelog (sorry that I didn't catch all of them in previous review, but I don't think you need to post a new version). Thanks!

>
>                Carl
>
>
> rs6000: Add builtins for IEEE 128-bit floating point values
>
> Add support for the following builtins:
>
>  __vector unsigned long long int scalar_extract_exp_to_vec (__ieee128);
>  __vector unsigned __int128 scalar_extract_sig_to_vec (__ieee128);
>  __ieee128 scalar_insert_exp (__vector unsigned __int128,
>                               __vector unsigned long long);
>
> The instructions used in the builtins operate on vector registers. Thus
> the result must be moved to a scalar type. There is no clean, performant
> way to do this. The user code typically needs the result as a vector
> anyway.
>
> gcc/
>       * config/rs6000/rs6000-builtin.cc (rs6000_expand_builtin):
>       Rename CCDE_FOR_xsxexpqp_tf to CODE_FOR_xsxexpqp_tf_di.
>       Rename CODE_FOR_xsxexpqp_kf to CODE_FOR_xsxexpqp_kf_di.

Miss
"Rename CODE_FOR_xsxsigqp_tf to CODE_FOR_xsxsigqp_tf_ti."
"Rename CODE_FOR_xsxsigqp_kf to CODE_FOR_xsxsigqp_kf_ti."
"Rename CODE_FOR_xsiexpqp_tf to CODE_FOR_xsiexpqp_tf_di."
"Rename CODE_FOR_xsiexpqp_kf to CODE_FOR_xsiexpqp_kf_di."

>       (CODE_FOR_xsxexpqp_kf_v2di, CODE_FOR_xsxsigqp_kf_v1ti,
>       CODE_FOR_xsiexpqp_kf_v2di): Add case statements.
>       * config/rs6000/rs6000-buildin.def (__builtin_extractf128_exp,
>       __builtin_extractf128_sig, __builtin_insertf128_exp): Add new
>       builtin definitions.

Should be with correct names:

(__builtin_vsx_scalar_extract_exp_to_vec,
__builtin_vsx_scalar_extract_sig_to_vec,
__builtin_vsx_scalar_insert_exp_vqp):

>       Rename xsxexpqp_kf, xsxsigqp_kf, xsiexpqp_kf to xsexpqp_kf_di,
>       xsxsigqp_kf_ti, xsiexpqp_kf_di respectively.
>       * config/rs6000/rs6000-c.cc (altivec_resolve_overloaded_builtin):
>       Update case RS6000_OVLD_VEC_VSIE to handle MODE_VECTOR_INT for new
>       overloaded instance. Update comments.
>       * config/rs6000/rs6000-overload.def
>       (__builtin_vec_scalar_insert_exp): Add new overload definition with
>       vector arguments.
>       (scalar_extract_exp_to_vec, scalar_extract_sig_to_vec): New
>       overloaded definitions.
>       * config/vsx.md (V2DI_DI): New mode iterator.
>       (DI_to_TI): New mode attribute.
>       Rename xsxexpqp_ to sxexpqp__.
>       Rename xsxsigqp_ to xsxsigqp__.
>
Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.
On 6/19/23 17:48, Andrew Pinski wrote:
> On Mon, Jun 19, 2023 at 4:40 PM Andrew Pinski wrote:
>>
>> On Mon, Jun 19, 2023 at 9:58 AM Thiago Jung Bauermann via Gcc-patches wrote:
>>>
>>> Hello Manolis,
>>>
>>> Philipp Tomsich writes:
>>>
>>>> On Thu, 8 Jun 2023 at 00:18, Jeff Law wrote:
>>>>>
>>>>> On 5/25/23 06:35, Manolis Tsamis wrote:
>>>>>> Propagation of the stack pointer in cprop_hardreg is currently
>>>>>> forbidden in all cases, due to maybe_mode_change returning NULL.
>>>>>> Relax this restriction and allow propagation when no mode change
>>>>>> is requested.
>>>>>>
>>>>>> gcc/ChangeLog:
>>>>>>
>>>>>>         * regcprop.cc (maybe_mode_change): Enable stack pointer
>>>>>>         propagation.
>>>>> Thanks for the clarification. This is OK for the trunk. It looks
>>>>> generic enough to have value going forward now rather than waiting.
>>>>
>>>> Rebased, retested, and applied to trunk. Thanks!
>>>
>>> Our CI found a couple of tests that started failing on aarch64-linux
>>> after this commit. I was able to confirm manually that they don't
>>> happen in the commit immediately before this one, and also that
>>> these failures are still present in today's trunk.
>>>
>>> I have testsuite logs for last good commit, first bad commit and
>>> current trunk here:
>>>
>>> https://people.linaro.org/~thiago.bauermann/gcc-regression-6a2e8dcbbd4b/
>>>
>>> Could you please check?
>>>
>>> These are the new failures:
>>>
>>> Running gcc:gcc.target/aarch64/aarch64.exp ...
>>> FAIL: gcc.target/aarch64/stack-check-cfa-3.c scan-assembler-times mov\\tx11, sp 1
>>
>> So for the above before this change we had:
>> ```
>> (insn:TI 597 596 598 2 (set (reg:DI 11 x11)
>>         (reg/f:DI 31 sp)) "stack-check-prologue-16.c":16:1 65 {*movdi_aarch64}
>>      (nil))
>> (insn 598 597 599 2 (set (mem:BLK (scratch) [0 A8])
>>         (unspec:BLK [
>>                 (reg:DI 11 x11)
>>                 (reg/f:DI 31 sp)
>>             ] UNSPEC_PRLG_STK)) "stack-check-prologue-16.c":16:1 1169 {stack_tie}
>>      (expr_list:REG_DEAD (reg:DI 11 x11)
>>         (nil)))
>> ```
>>
>> After we get:
>> ```
>> (insn 598 596 599 2 (set (mem:BLK (scratch) [0 A8])
>>         (unspec:BLK [
>>                 (reg:DI 31 sp [11]) repeated x2
>>             ] UNSPEC_PRLG_STK)) "stack-check-prologue-16.c":16:1 1169 {stack_tie}
>>      (nil))
>> ```
>> Which seems to be ok, except we still have:
>>         .cfi_def_cfa_register 11
>>
>> That is because on:
>> (insn/f 596 595 598 2 (set (reg:DI 12 x12)
>>         (plus:DI (reg:DI 12 x12)
>>             (const_int 272 [0x110]))) "stack-check-prologue-16.c":16:1 153 {*adddi3_aarch64}
>>      (expr_list:REG_CFA_DEF_CFA (reg:DI 11 x11)
>>         (nil)))
>>
>> We record x11 but never update it though that came before the mov for
>> x11 ... So it seems like cprop_hardreg had no idea it needed to
>> update it.
>>
>> I suspect the other testcases are just propagation of sp into the
>> stores and such and just needed update. But the above testcase seems
>> getting broken cfi though I don't know how to fix it.
>
> The code from aarch64.cc:
> ```
>   /* This is done to provide unwinding information for the stack
>      adjustments we're about to do, however to prevent the optimizers
>      from removing the R11 move and leaving the CFA note (which would be
>      very wrong) we tie the old and new stack pointer together.
>      The tie will expand to nothing but the optimizers will not touch
>      the instruction.  */
>   rtx stack_ptr_copy = gen_rtx_REG (Pmode, STACK_CLASH_SVE_CFA_REGNUM);
>   emit_move_insn (stack_ptr_copy, stack_pointer_rtx);
>   emit_insn (gen_stack_tie (stack_ptr_copy, stack_pointer_rtx));
>
>   /* We want the CFA independent of the stack pointer for the
>      duration of the loop.  */
>   add_reg_note (insn, REG_CFA_DEF_CFA, stack_ptr_copy);
>   RTX_FRAME_RELATED_P (insn) = 1;
> ```
>
> Well except now with this change, the optimizers touch this
> instruction. Maybe the move instruction should not be a move but an
> unspec so optimizers don't know what the move was.
> Adding Tamar to the CC who added this code to aarch64 originally for
> comments on the above understanding here.

It's a bit hackish, but could we reject the stack pointer for operand1 in the stack-tie? And if we do so, does it help?

jeff
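
The failure mode described in this thread can be reproduced in a toy model. The sketch below is not GCC's regcprop.cc: `Insn`, `propagate_copies` and the constants are hypothetical, and the model merges copy propagation and dead-copy removal into one function for brevity. It assumes a straight-line sequence in which operands are rewritten through known copies while notes attached to instructions are left alone, which is the gap Andrew Pinski points out.

```cpp
#include <cstddef>
#include <vector>

constexpr int kNumRegs = 32;
constexpr int kSp = 31;     // stack pointer register number, as in the dumps
constexpr int kNone = -1;   // "no register"

// A drastically simplified instruction.
struct Insn {
  int dest;       // register written, or kNone
  int src0;       // first register read, or kNone
  int src1;       // second register read, or kNone
  bool is_copy;   // true for "dest = src0"
  int cfa_note;   // register named by a CFA-style note, or kNone
};

// Forward copy propagation followed by removal of copies whose
// destination is no longer read by any later operand.
std::vector<Insn> propagate_copies(std::vector<Insn> insns)
{
  int copy_of[kNumRegs];
  for (int r = 0; r < kNumRegs; ++r)
    copy_of[r] = r;

  for (Insn& in : insns) {
    // Rewrite operands through known copies. Notes are NOT rewritten.
    if (in.src0 != kNone) in.src0 = copy_of[in.src0];
    if (in.src1 != kNone) in.src1 = copy_of[in.src1];
    if (in.dest != kNone) {
      // A write invalidates every copy relation involving dest.
      for (int r = 0; r < kNumRegs; ++r)
        if (copy_of[r] == in.dest) copy_of[r] = r;
      copy_of[in.dest] = in.is_copy ? in.src0 : in.dest;
    }
  }

  std::vector<Insn> out;
  for (std::size_t i = 0; i < insns.size(); ++i) {
    bool read_later = false;
    for (std::size_t j = i + 1; j < insns.size(); ++j)
      if (insns[j].src0 == insns[i].dest || insns[j].src1 == insns[i].dest)
        read_later = true;
    if (!(insns[i].is_copy && !read_later))
      out.push_back(insns[i]);
  }
  return out;
}
```

Feeding it the three instructions from the dump (the add on x12 carrying a note that names x11, the `mov x11, sp`, and the stack tie reading x11 and sp) leaves two instructions: the tie now reads sp twice, the move is gone, and the note still names x11, which nothing defines any more.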
RE: Re: [PATCH] RISC-V: Fix fails of testcases
Committed, thanks Jeff.

Pan

-----Original Message-----
From: Gcc-patches On Behalf Of ???
Sent: Tuesday, June 20, 2023 7:15 AM
To: Jeff Law; gcc-patches
Cc: kito.cheng; palmer; rdapp.gcc
Subject: Re: Re: [PATCH] RISC-V: Fix fails of testcases

>> Presumably the target selector in the dg-do ensures we only build/run
>> these on the appropriate targets now and we don't need explicitly -march
>> arguments?
Yes.

>> Assuming that's correct, this is fine for the trunk.
Thanks.

juzhe.zh...@rivai.ai

From: Jeff Law
Date: 2023-06-20 07:13
To: Juzhe-Zhong; gcc-patches
CC: kito.cheng; palmer; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Fix fails of testcases

On 6/19/23 17:04, Juzhe-Zhong wrote:
> FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c -std=c99 -O3 -ftree-vectorize --param riscv-autovec-preference=fixed-vlmax (test for excess errors)
> Excess errors:
> xgcc: fatal error: Cannot find suitable multilib set for '-march=rv64imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b'/'-mabi=lp64d'
> compilation terminated.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c: Fix fail.
>         * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-zvfh-run.c: Ditto.
>         * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c: Ditto.
>         * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-zvfh-run.c: Ditto.

Presumably the target selector in the dg-do ensures we only build/run these on the appropriate targets now and we don't need explicitly -march arguments?

Assuming that's correct, this is fine for the trunk.

jeff
RE: [PATCH] RISC-V: Add tuple vector mode psABI checking and simplify code
Committed, thanks Jeff. -----Original Message----- From: Gcc-patches On Behalf Of Jeff Law via Gcc-patches Sent: Tuesday, June 20, 2023 2:04 AM To: 钟居哲 ; 丁乐华 ; gcc-patches Cc: Wang, Yanzhang ; kito.cheng ; palmer ; rdapp.gcc Subject: Re: [PATCH] RISC-V: Add tuple vector mode psABI checking and simplify code On 6/18/23 07:16, 钟居哲 wrote: > Thanks for cleaning up the code for the future ABI support patch. > Let's wait for comments from Jeff or Robin. Looks reasonable to me given the state we're in WRT psabi and vectors. jeff
Re: [committed] libstdc++: Optimize std::to_array for trivial types [PR110167]
On Fri, 9 Jun 2023, Jonathan Wakely via Libstdc++ wrote: > Tested powerpc64le-linux. Pushed to trunk. > > This makes sense to backport after some soak time on trunk. > > -- >8 -- > > As reported in PR libstdc++/110167, std::to_array compiles extremely > slowly for very large arrays. It needs to instantiate a very large > specialization of std::index_sequence and then create a very large > aggregate initializer from the pack expansion. For trivial types we can > simply default-initialize the std::array and then use memcpy to copy the > values. For non-trivial types we need to use the existing > implementation, despite the compilation cost. > > As also noted in the PR, using a generic lambda instead of the > __to_array helper compiles faster since gcc-13. It also produces > slightly smaller code at -O1, due to additional inlining. The code at > -Os, -O2 and -O3 seems to be the same. This new implementation requires > __cpp_generic_lambdas >= 201707L (i.e. P0428R2) but that is supported > since Clang 10 and since Intel icc 2021.5.0 (and since GCC 10.1). > > libstdc++-v3/ChangeLog: > > PR libstdc++/110167 > * include/std/array (to_array): Initialize arrays of trivial > types using memcpy. For non-trivial types, use lambda > expressions instead of a separate helper function. > (__to_array): Remove. > * testsuite/23_containers/array/creation/110167.cc: New test. 
> --- > libstdc++-v3/include/std/array| 53 +-- > .../23_containers/array/creation/110167.cc| 14 + > 2 files changed, 51 insertions(+), 16 deletions(-) > create mode 100644 > libstdc++-v3/testsuite/23_containers/array/creation/110167.cc > > diff --git a/libstdc++-v3/include/std/array b/libstdc++-v3/include/std/array > index 70280c1beeb..b791d86ddb2 100644 > --- a/libstdc++-v3/include/std/array > +++ b/libstdc++-v3/include/std/array > @@ -414,19 +414,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >return std::move(std::get<_Int>(__arr)); > } > > -#if __cplusplus > 201703L > +#if __cplusplus >= 202002L && __cpp_generic_lambdas >= 201707L > #define __cpp_lib_to_array 201907L > - > - template > -constexpr array, sizeof...(_Idx)> > -__to_array(_Tp (&__a)[sizeof...(_Idx)], index_sequence<_Idx...>) > -{ > - if constexpr (_Move) > - return {{std::move(__a[_Idx])...}}; > - else > - return {{__a[_Idx]...}}; > -} > - >template > [[nodiscard]] > constexpr array, _Nm> > @@ -436,8 +425,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >static_assert(!is_array_v<_Tp>); >static_assert(is_constructible_v<_Tp, _Tp&>); >if constexpr (is_constructible_v<_Tp, _Tp&>) > - return __to_array(__a, make_index_sequence<_Nm>{}); > - __builtin_unreachable(); // FIXME: see PR c++/91388 > + { > + if constexpr (is_trivial_v<_Tp> && _Nm != 0) redundant _Nm != 0 test? > + { > + array, _Nm> __arr; > + if (!__is_constant_evaluated() && _Nm != 0) > + __builtin_memcpy(__arr.data(), __a, sizeof(__a)); > + else > + for (size_t __i = 0; __i < _Nm; ++__i) > + __arr._M_elems[__i] = __a[__i]; > + return __arr; > + } > + else > + return [&__a](index_sequence<_Idx...>) { > + return array, _Nm>{{ __a[_Idx]... 
}}; > + }(make_index_sequence<_Nm>{}); > + } > + else > + __builtin_unreachable(); // FIXME: see PR c++/91388 > } > >template > @@ -449,8 +454,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >static_assert(!is_array_v<_Tp>); >static_assert(is_move_constructible_v<_Tp>); >if constexpr (is_move_constructible_v<_Tp>) > - return __to_array<1>(__a, make_index_sequence<_Nm>{}); > - __builtin_unreachable(); // FIXME: see PR c++/91388 > + { > + if constexpr (is_trivial_v<_Tp>) > + { > + array, _Nm> __arr; > + if (!__is_constant_evaluated() && _Nm != 0) > + __builtin_memcpy(__arr.data(), __a, sizeof(__a)); > + else > + for (size_t __i = 0; __i < _Nm; ++__i) > + __arr._M_elems[__i] = std::move(__a[__i]); IIUC this std::move is unnecessary for trivial arrays? > + return __arr; > + } > + else > + return [&__a](index_sequence<_Idx...>) { > + return array, _Nm>{{ std::move(__a[_Idx])... }}; > + }(make_index_sequence<_Nm>{}); > + } > + else > + __builtin_unreachable(); // FIXME: see PR c++/91388 > } > #endif // C++20 > > diff --git a/libstdc++-v3/testsuite/23_containers/array/creation/110167.cc > b/libstdc++-v3/testsuite/23_containers/array/creation/110167.cc > new file mode 100644 > index 000..c2aecc911bd > --- /dev/null > +++ b/libstdc++-v3/testsuite/23_containers/array/creation/110167.cc > @@ -0,0 +1,14 @@ > +// { dg-options "-std=gnu++20" } > +// { dg-do compile { target c++20 } } > + > +// PR
Re: [PATCH v6 0/4] P1689R5 support
On 6/17/23 10:43, Ben Boeckel wrote: On Fri, Jun 16, 2023 at 23:55:53 -0400, Jason Merrill wrote: I see the same thing with patch 4 on x86_64-pc-linux-gnu, e.g. FAIL: g++.dg/modules/ben-1_a.C -std=c++17 (test for excess errors) Excess errors: /home/jason/gt/gcc/testsuite/g++.dg/modules/ben-1_a.C:9:1: internal compiler error: Segmentation fault 0x19e2f3c crash_signal /home/jason/gt/gcc/toplev.cc:314 0x340f3f8 mkdeps::vec::size() const /home/jason/gt/libcpp/mkdeps.cc:57 0x340dc1f apply_vpath /home/jason/gt/libcpp/mkdeps.cc:194 0x340e08e deps_add_dep(mkdeps*, char const*) /home/jason/gt/libcpp/mkdeps.cc:318 0xea7b51 module_client::open_module_client(unsigned int, char const*, mkdeps*, void (*)(char const*), char const*) /home/jason/gt/gcc/cp/mapper-client.cc:291 0xef2ba8 make_mapper /home/jason/gt/gcc/cp/module.cc:14042 0xf0896c get_mapper(unsigned int, mkdeps*) /home/jason/gt/gcc/cp/module.cc:3977 0xf032ac name_pending_imports /home/jason/gt/gcc/cp/module.cc:19623 0xf03a7d preprocessed_module(cpp_reader*) /home/jason/gt/gcc/cp/module.cc:19817 0xe85104 module_token_cdtor(cpp_reader*, unsigned long) /home/jason/gt/gcc/cp/lex.cc:548 0xf467b2 cp_lexer_new_main /home/jason/gt/gcc/cp/parser.cc:756 0xfc1e3a c_parse_file() /home/jason/gt/gcc/cp/parser.cc:49725 0x11c5bf5 c_common_parse_file() /home/jason/gt/gcc/c-family/c-opts.cc:1268 Thanks. I missed a `nullptr` check before calling `deps_add_dep`. I think I got misled by `make check` returning a zero exit code even if there are failures. Aha! Patches 3 and 4 could also use testcases. Jason
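The kind of guard Ben mentions can be sketched as follows. This is an illustration only: `dep_collector` and `add_dep_if_tracking` are hypothetical stand-ins, not the real `mkdeps` structure or `deps_add_dep` function, and the sketch assumes that the tracker pointer is null whenever no dependency output was requested.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a dependency tracker; NOT the real libcpp mkdeps.
struct dep_collector
{
  std::vector<std::string> deps;
};

// Record `name` only when a tracker actually exists.
// Returns true if the dependency was recorded, false if it was skipped.
inline bool add_dep_if_tracking(dep_collector *tracker, const char *name)
{
  if (tracker == nullptr || name == nullptr)
    return false; // nothing to record into; avoids dereferencing null
  tracker->deps.emplace_back(name);
  return true;
}
```

A caller that may or may not have a tracker can then call `add_dep_if_tracking(tracker, "ben-1_a.gcm")` unconditionally; the file name here is made up for the example.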
[Bug testsuite/110316] New: [14 regression] g++.dg/ext/timevar1.C and timevar2.C fail erratically
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110316

Bug ID: 110316
Summary: [14 regression] g++.dg/ext/timevar1.C and timevar2.C fail erratically
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: testsuite
Assignee: unassigned at gcc dot gnu.org
Reporter: seurer at gcc dot gnu.org
Target Milestone: ---

Unfortunately I do not have a clear starting point for this, but recently the g++.dg/ext/timevar1.C and timevar2.C tests began failing on some runs and passing on the next. It is happening on one of our newer/faster machines, where it did not use to fail.

The last run where I did not see any failures (neither in that run nor previously) was 47fa3cef59a031f1b0fdce309ff634fab717606d, r14-1906-g47fa3cef59a031
The first run with failures was 0f9bb3e7a4aab95fd449f60b5f891ed9a6e5f352, r14-1910-g0f9bb3e7a4aab9

I don't see anything in that range that might cause this, though.

FAIL: g++.dg/ext/timevar1.C -std=gnu++17 (internal compiler error: in validate_phases, at timevar.cc:626)
FAIL: g++.dg/ext/timevar1.C -std=gnu++17 (test for excess errors)
FAIL: g++.dg/ext/timevar2.C -std=gnu++20 (internal compiler error: in validate_phases, at timevar.cc:626)
FAIL: g++.dg/ext/timevar2.C -std=gnu++20 (test for excess errors)

spawn -ignore SIGHUP /home/gccbuild/build/nightly/build-gcc-trunk/gcc/testsuite/g++1/../../xg++ -B/home/gccbuild/build/nightly/build-gcc-trunk/gcc/testsuite/g++1/../../ /home/gccbuild/gcc_trunk_git/gcc/gcc/testsuite/g++.dg/ext/timevar2.C -fdiagnostics-plain-output -nostdinc++ -I/home/gccbuild/build/nightly/build-gcc-trunk/powerpc64le-unknown-linux-gnu/libstdc++-v3/include/powerpc64le-unknown-linux-gnu -I/home/gccbuild/build/nightly/build-gcc-trunk/powerpc64le-unknown-linux-gnu/libstdc++-v3/include -I/home/gccbuild/gcc_trunk_git/gcc/libstdc++-v3/libsupc++ -I/home/gccbuild/gcc_trunk_git/gcc/libstdc++-v3/include/backward -I/home/gccbuild/gcc_trunk_git/gcc/libstdc++-v3/testsuite/util -fmessage-length=0 -std=gnu++98 -ftime-report -S -o timevar2.s

Time variable             usr            sys            wall           GGC
 phase setup     :   0.00 (  0%)   0.00 (  0%)   0.01 (100%)   2835k ( 81%)
 phase parsing   :   0.01 (100%)   0.00 (  0%)   0.00 (  0%)    603k ( 17%)
 |name lookup    :   0.00 (  0%)   0.00 (  0%)   0.01 (100%)    174k (  5%)
 parser (global) :   0.01 (100%)   0.00 (  0%)   0.00 (  0%)    587k ( 17%)
 TOTAL           :   0.01          0.00          0.01          3496k
Extra diagnostic checks enabled; compiler may run slowly.
Configure with --enable-checking=release to disable checks.
Timing error: total of phase timers exceeds total time.
wall    1.02666800281492e-02 > 1.00150810810562e-02
internal compiler error: in validate_phases, at timevar.cc:626
0x10ff92bb toplev::~toplev()
        /home/gccbuild/gcc_trunk_git/gcc/gcc/toplev.cc:2155
xg++: internal compiler error: Segmentation fault signal terminated program cc1plus

Note that the two phase timings are both 0.01 and both report 100% while the total time is also 0.01. Is this maybe a rounding issue?
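One way to look at the rounding question is to replay the failing comparison with the two full-precision values printed after "wall" in the log. The sketch below is a model under assumptions, not the code in timevar.cc: the function name `phases_exceed_total` and the tolerance value are illustrative choices.

```cpp
#include <numeric>
#include <vector>

// Hypothetical model of a "sum of phase timers vs. total" sanity check.
// `tolerance` is an assumed slack factor for floating-point accumulation.
inline bool phases_exceed_total(const std::vector<double> &phase_wall,
                                double total_wall,
                                double tolerance = 1.000005)
{
  // Add up the wall-clock time attributed to each phase.
  const double sum =
    std::accumulate(phase_wall.begin(), phase_wall.end(), 0.0);
  // Report an inconsistency only if the sum is above the total plus slack.
  return sum > total_wall * tolerance;
}
```

Called with a phase sum of 1.02666800281492e-02 and a total of 1.00150810810562e-02, this returns true: the sum is about 2.5% above the total, far more than a slack factor of this size. That would suggest the two-decimal 0.01 figures in the table hide a real difference in the unrounded values, rather than the check itself tripping on display rounding.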
Re: [PATCH, rs6000] Add two peephole2 patterns for mr. insn
On Tue, 30 May 2023, HAO CHEN GUI via Gcc-patches wrote:
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -7891,6 +7891,36 @@ (define_insn "*mov_internal2"
> (set_attr "dot" "yes")
> (set_attr "length" "4,4,8")])
>
> +(define_peephole2
> + [(set (match_operand:CC 2 "cc_reg_operand" "")
> + (compare:CC (match_operand:P 1 "int_reg_operand" "")
> + (const_int 0)))
> + (set (match_operand:P 0 "int_reg_operand" "")

A random comment from the sideline: I'd suggest removing the (empty) constraint strings from your peephole2's. It can be a matter of port-specific taste, but removing them would be consistent with the other peephole2's in rs6000.md. (In this matter, I believe the examples in md.texi are bad.)

brgds, H-P
[Bug tree-optimization/110315] [13 Regression] g++ crashes with a segmentation fault while compiling a large const std::vector of std::string since r13-6566-ge0324e2629e25a90
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110315

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |13.2
          Component|c++                         |tree-optimization

--- Comment #1 from Andrew Pinski ---
My bet is you could reproduce it before r13-4562-g3da5ae7a347b7d74765053f4a08 and your bisect just produced one revision which just happened to undo part of the front-end optimizations done in r13-4562 (and a few others afterwards).
Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.
On Mon, Jun 19, 2023 at 4:40 PM Andrew Pinski wrote:
>
> On Mon, Jun 19, 2023 at 9:58 AM Thiago Jung Bauermann via Gcc-patches
> wrote:
> >
> >
> > Hello Manolis,
> >
> > Philipp Tomsich writes:
> >
> > > On Thu, 8 Jun 2023 at 00:18, Jeff Law wrote:
> > >>
> > >> On 5/25/23 06:35, Manolis Tsamis wrote:
> > >> > Propagation of the stack pointer in cprop_hardreg is currently forbidden
> > >> > in all cases, due to maybe_mode_change returning NULL. Relax this
> > >> > restriction and allow propagation when no mode change is requested.
> > >> >
> > >> > gcc/ChangeLog:
> > >> >
> > >> >         * regcprop.cc (maybe_mode_change): Enable stack pointer
> > >> >         propagation.
> > >> Thanks for the clarification. This is OK for the trunk. It looks
> > >> generic enough to have value going forward now rather than waiting.
> > >
> > > Rebased, retested, and applied to trunk. Thanks!
> >
> > Our CI found a couple of tests that started failing on aarch64-linux
> > after this commit. I was able to confirm manually that they don't happen
> > in the commit immediately before this one, and also that these failures
> > are still present in today's trunk.
> >
> > I have testsuite logs for last good commit, first bad commit and current
> > trunk here:
> >
> > https://people.linaro.org/~thiago.bauermann/gcc-regression-6a2e8dcbbd4b/
> >
> > Could you please check?
> >
> > These are the new failures:
> >
> > Running gcc:gcc.target/aarch64/aarch64.exp ...
> > FAIL: gcc.target/aarch64/stack-check-cfa-3.c scan-assembler-times
> > mov\\tx11, sp 1
>
> So for the above before this change we had:
> ```
> (insn:TI 597 596 598 2 (set (reg:DI 11 x11)
>         (reg/f:DI 31 sp)) "stack-check-prologue-16.c":16:1 65 {*movdi_aarch64}
>      (nil))
> (insn 598 597 599 2 (set (mem:BLK (scratch) [0 A8])
>         (unspec:BLK [
>                 (reg:DI 11 x11)
>                 (reg/f:DI 31 sp)
>             ] UNSPEC_PRLG_STK)) "stack-check-prologue-16.c":16:1 1169
> {stack_tie}
>      (expr_list:REG_DEAD (reg:DI 11 x11)
>         (nil)))
> ```
>
> After we get:
> ```
> (insn 598 596 599 2 (set (mem:BLK (scratch) [0 A8])
>         (unspec:BLK [
>                 (reg:DI 31 sp [11]) repeated x2
>             ] UNSPEC_PRLG_STK)) "stack-check-prologue-16.c":16:1 1169
> {stack_tie}
>      (nil))
> ```
> Which seems to be ok, except we still have:
> .cfi_def_cfa_register 11
>
> That is because on:
> (insn/f 596 595 598 2 (set (reg:DI 12 x12)
>         (plus:DI (reg:DI 12 x12)
>             (const_int 272 [0x110]))) "stack-check-prologue-16.c":16:1
> 153 {*adddi3_aarch64}
>      (expr_list:REG_CFA_DEF_CFA (reg:DI 11 x11)
>         (nil)))
>
> We record x11 but never update it though that came before the mov for
> x11 ... So it seems like cprop_hardreg had no idea it needed to update
> it.
>
> I suspect the other testcases are just propagation of sp into the
> stores and such and just needed update. But the above testcase seems
> getting broken cfi though I don't know how to fix it.

The code from aarch64.cc:
```
      /* This is done to provide unwinding information for the stack
         adjustments we're about to do, however to prevent the optimizers
         from removing the R11 move and leaving the CFA note (which would be
         very wrong) we tie the old and new stack pointer together.
         The tie will expand to nothing but the optimizers will not touch
         the instruction.  */
      rtx stack_ptr_copy = gen_rtx_REG (Pmode, STACK_CLASH_SVE_CFA_REGNUM);
      emit_move_insn (stack_ptr_copy, stack_pointer_rtx);
      emit_insn (gen_stack_tie (stack_ptr_copy, stack_pointer_rtx));

      /* We want the CFA independent of the stack pointer for the
         duration of the loop.  */
      add_reg_note (insn, REG_CFA_DEF_CFA, stack_ptr_copy);
      RTX_FRAME_RELATED_P (insn) = 1;
```
Well, except that with this change the optimizers now do touch this instruction. Maybe the move instruction should not be a move but an unspec, so that the optimizers don't know what the move was.
Adding Tamar, who originally added this code to aarch64, to the CC for comments on the above understanding.

Thanks,
Andrew

>
> Thanks,
> Andrew Pinski
>
> >
> >
> > Running gcc:gcc.target/aarch64/sve/pcs/aarch64-sve-pcs.exp ...
> > FAIL: gcc.target/aarch64/sve/pcs/args_1.c -march=armv8.2-a+sve
> > -fno-stack-protector check-function-bodies caller_pred
> > FAIL: gcc.target/aarch64/sve/pcs/args_2.c -march=armv8.2-a+sve
> > -fno-stack-protector scan-assembler \\tmov\\t(z[0-9]+\\.b),
> > #8\\n.*\\tst1b\\t\\1, p[0-7], \\[x4\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_3.c -march=armv8.2-a+sve
> > -fno-stack-protector scan-assembler \\tmov\\t(z[0-9]+\\.b),
> > #8\\n.*\\tst1b\\t\\1, p[0-7], \\[x4\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_4.c -march=armv8.2-a+sve
> > -fno-stack-protector scan-assembler \\tfmov\\t(z[0-9]+\\.h),
> > #8\\.0.*\\tst1h\\t\\1, p[0-7], \\[x4\\]\\n
> > FAIL:
Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.
On Mon, Jun 19, 2023 at 9:58 AM Thiago Jung Bauermann via Gcc-patches wrote:
>
>
> Hello Manolis,
>
> Philipp Tomsich writes:
>
> > On Thu, 8 Jun 2023 at 00:18, Jeff Law wrote:
> >>
> >> On 5/25/23 06:35, Manolis Tsamis wrote:
> >> > Propagation of the stack pointer in cprop_hardreg is currently forbidden
> >> > in all cases, due to maybe_mode_change returning NULL. Relax this
> >> > restriction and allow propagation when no mode change is requested.
> >> >
> >> > gcc/ChangeLog:
> >> >
> >> >         * regcprop.cc (maybe_mode_change): Enable stack pointer
> >> >         propagation.
> >> Thanks for the clarification. This is OK for the trunk. It looks
> >> generic enough to have value going forward now rather than waiting.
> >
> > Rebased, retested, and applied to trunk. Thanks!
>
> Our CI found a couple of tests that started failing on aarch64-linux
> after this commit. I was able to confirm manually that they don't happen
> in the commit immediately before this one, and also that these failures
> are still present in today's trunk.
>
> I have testsuite logs for last good commit, first bad commit and current
> trunk here:
>
> https://people.linaro.org/~thiago.bauermann/gcc-regression-6a2e8dcbbd4b/
>
> Could you please check?
>
> These are the new failures:
>
> Running gcc:gcc.target/aarch64/aarch64.exp ...
> FAIL: gcc.target/aarch64/stack-check-cfa-3.c scan-assembler-times mov\\tx11,
> sp 1

So for the above, before this change we had:
```
(insn:TI 597 596 598 2 (set (reg:DI 11 x11)
        (reg/f:DI 31 sp)) "stack-check-prologue-16.c":16:1 65 {*movdi_aarch64}
     (nil))
(insn 598 597 599 2 (set (mem:BLK (scratch) [0 A8])
        (unspec:BLK [
                (reg:DI 11 x11)
                (reg/f:DI 31 sp)
            ] UNSPEC_PRLG_STK)) "stack-check-prologue-16.c":16:1 1169 {stack_tie}
     (expr_list:REG_DEAD (reg:DI 11 x11)
        (nil)))
```

After the change we get:
```
(insn 598 596 599 2 (set (mem:BLK (scratch) [0 A8])
        (unspec:BLK [
                (reg:DI 31 sp [11]) repeated x2
            ] UNSPEC_PRLG_STK)) "stack-check-prologue-16.c":16:1 1169 {stack_tie}
     (nil))
```
Which seems to be OK, except we still have:
.cfi_def_cfa_register 11

That is because on:
(insn/f 596 595 598 2 (set (reg:DI 12 x12)
        (plus:DI (reg:DI 12 x12)
            (const_int 272 [0x110]))) "stack-check-prologue-16.c":16:1 153 {*adddi3_aarch64}
     (expr_list:REG_CFA_DEF_CFA (reg:DI 11 x11)
        (nil)))

We record x11 but never update it, even though that came before the mov for x11 ... So it seems like cprop_hardreg had no idea it needed to update it.

I suspect the other testcases are just propagation of sp into the stores and such, and just need updating. But the above testcase seems to be getting broken CFI, though I don't know how to fix it.

Thanks,
Andrew Pinski

>
> Running gcc:gcc.target/aarch64/sve/pcs/aarch64-sve-pcs.exp ...
> FAIL: gcc.target/aarch64/sve/pcs/args_1.c -march=armv8.2-a+sve > -fno-stack-protector check-function-bodies caller_pred > FAIL: gcc.target/aarch64/sve/pcs/args_2.c -march=armv8.2-a+sve > -fno-stack-protector scan-assembler \\tmov\\t(z[0-9]+\\.b), > #8\\n.*\\tst1b\\t\\1, p[0-7], \\[x4\\]\\n > FAIL: gcc.target/aarch64/sve/pcs/args_3.c -march=armv8.2-a+sve > -fno-stack-protector scan-assembler \\tmov\\t(z[0-9]+\\.b), > #8\\n.*\\tst1b\\t\\1, p[0-7], \\[x4\\]\\n > FAIL: gcc.target/aarch64/sve/pcs/args_4.c -march=armv8.2-a+sve > -fno-stack-protector scan-assembler \\tfmov\\t(z[0-9]+\\.h), > #8\\.0.*\\tst1h\\t\\1, p[0-7], \\[x4\\]\\n > FAIL: gcc.target/aarch64/sve/pcs/args_5_be_bf16.c -march=armv8.2-a+sve > -fno-stack-protector scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - > z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n > FAIL: gcc.target/aarch64/sve/pcs/args_5_be_f16.c -march=armv8.2-a+sve > -fno-stack-protector scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - > z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n > FAIL: gcc.target/aarch64/sve/pcs/args_5_be_f32.c -march=armv8.2-a+sve > -fno-stack-protector scan-assembler \\tld2w\\t{(z[0-9]+\\.s) - > z[0-9]+\\.s}.*\\tst1w\\t\\1, p[0-7], \\[x1\\]\\n > FAIL: gcc.target/aarch64/sve/pcs/args_5_be_f64.c -march=armv8.2-a+sve > -fno-stack-protector scan-assembler \\tld2d\\t{(z[0-9]+\\.d) - > z[0-9]+\\.d}.*\\tst1d\\t\\1, p[0-7], \\[x1\\]\\n > FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s16.c -march=armv8.2-a+sve > -fno-stack-protector scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - > z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n > FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s32.c -march=armv8.2-a+sve > -fno-stack-protector scan-assembler \\tld2w\\t{(z[0-9]+\\.s) - > z[0-9]+\\.s}.*\\tst1w\\t\\1, p[0-7], \\[x1\\]\\n > FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s64.c -march=armv8.2-a+sve > -fno-stack-protector scan-assembler \\tld2d\\t{(z[0-9]+\\.d) - > z[0-9]+\\.d}.*\\tst1d\\t\\1, p[0-7], \\[x1\\]\\n > FAIL: 
gcc.target/aarch64/sve/pcs/args_5_be_s8.c -march=armv8.2-a+sve > -fno-stack-protector scan-assembler \\tld2b\\t{(z[0-9]+\\.b) -
[Bug c++/110315] New: [13 Regression] g++ crashes with a segmentation fault while compiling a large const std::vector of std::string since r13-6566-ge0324e2629e25a90
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110315 Bug ID: 110315 Summary: [13 Regression] g++ crashes with a segmentation fault while compiling a large const std::vector of std::string since r13-6566-ge0324e2629e25a90 Product: gcc Version: 13.1.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: glebfm at altlinux dot org Target Milestone: --- Created attachment 55366 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55366=edit preprocessed source $ ~/gcc-test/bin/g++ --version g++ (GCC) 13.1.1 20230619 Copyright (C) 2023 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. $ ~/gcc-test/bin/g++ -O -c test.ii g++: internal compiler error: Segmentation fault signal terminated program cc1plus Please submit a full bug report, with preprocessed source (by using -freport-bug). See <https://gcc.gnu.org/bugs/> for instructions. $ gdb -q --args /usr/src/gcc-test/libexec/gcc/x86_64-pc-linux-gnu/13.1.1/cc1plus -quiet -O test.cpp -o test.o Reading symbols from /usr/src/gcc-test/libexec/gcc/x86_64-pc-linux-gnu/13.1.1/cc1plus... (gdb) run Starting program: /usr/src/gcc-test/libexec/gcc/x86_64-pc-linux-gnu/13.1.1/cc1plus -quiet -O test.cpp -o test.o [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". Program received signal SIGSEGV, Segmentation fault. 0x01aafe51 in fold_using_range::range_of_phi (this=this@entry=0x7bfffcaf, r=warning: RTTI symbol not found for class 'int_range<255u>' ..., phi=phi@entry=0x72d4c000, src=...) at ../../gcc/gimple-range-fold.cc:733 733 { (gdb) bt #0 0x01aafe51 in fold_using_range::range_of_phi (this=this@entry=0x7bfffcaf, r=warning: RTTI symbol not found for class 'int_range<255u>' ..., phi=phi@entry=0x72d4c000, src=...) 
at ../../gcc/gimple-range-fold.cc:733 #1 0x01ab2289 in fold_using_range::fold_stmt (this=0x7bfffcaf, r=warning: RTTI symbol not found for class 'int_range<255u>' ..., s=0x72d4c000, src=..., name=0x72d4b480) at ../../gcc/gimple-range-fold.cc:491 #2 0x01aa5292 in gimple_ranger::fold_range_internal (name=0x72d4b480, s=0x72d4c000, r=warning: RTTI symbol not found for class 'int_range<255u>' ..., this=0x29069a0) at ../../gcc/gimple-range.cc:257 #3 gimple_ranger::range_of_stmt (this=0x29069a0, r=warning: RTTI symbol not found for class 'int_range<255u>' ..., s=0x72d4c000, name=) at ../../gcc/gimple-range.cc:318 #4 0x01aa3fbb in gimple_ranger::range_on_entry (this=this@entry=0x29069a0, r=warning: RTTI symbol not found for class 'int_range<255u>' ..., bb=0x734b0720, name=name@entry=0x72d4b480) at ../../gcc/gimple-range.cc:153 #5 0x01aa6cff in gimple_ranger::range_of_expr (this=0x29069a0, r=warning: RTTI symbol not found for class 'int_range<255u>' ..., expr=0x72d4b480, stmt=) at ../../gcc/gimple-range.cc:130 #6 0x01aa418a in gimple_ranger::range_on_exit (this=0x29069a0, r=warning: RTTI symbol not found for class 'int_range<255u>' ..., bb=0x734b0720, name=0x72d4b480) at ../../gcc/gimple-range.cc:187 #7 0x01aa728a in gimple_ranger::range_on_edge (this=0x29069a0, r=warning: RTTI symbol not found for class 'int_range<255u>' ..., e=0x7325a720, name=0x72d4b480) at ../../gcc/gimple-range.cc:233 #8 0x01ab012f in fold_using_range::range_of_phi (this=this@entry=0x7c0073af, r=warning: RTTI symbol not found for class 'int_range<255u>' ..., phi=phi@entry=0x72d4c100, src=...) 
at ../../gcc/value-range.h:634 #9 0x01ab2289 in fold_using_range::fold_stmt (this=0x7c0073af, r=warning: RTTI symbol not found for class 'int_range<255u>' ..., s=0x72d4c100, src=..., name=0x72d4b4c8) at ../../gcc/gimple-range-fold.cc:491 #10 0x01aa5292 in gimple_ranger::fold_range_internal (name=0x72d4b4c8, s=0x72d4c100, r=warning: RTTI symbol not found for class 'int_range<255u>' #11 gimple_ranger::range_of_stmt (this=0x29069a0, r=warning: RTTI symbol not found for class 'int_range<255u>' ..., s=0x72d4c100, name=) at ../../gcc/gimple-range.cc:318 #12 0x01aa3fbb in gimple_ranger::range_on_entry (this=this@entry=0x29069a0, r=warning: RTTI symbol not found for class 'int_range<255u>' ..., bb=0x734b0900, name=name@entry=0x72d4b4c8) at ../../gcc/gimple-range.cc:153 #13 0x01aa6cff in gimple_ranger::range_of_expr (this=0x29069a0, r=warning: RTTI symbol not found for class 'int_range<255u>' ..., expr=0x72d4b4c8, stmt=) at ../../gcc/gimple-range.cc:130 #14 0x01aa418a in gimpl
Re: Re: [PATCH] RISC-V: Fix fails of testcases
>> Presumably the target selector in the dg-do ensures we only build/run >> these on the appropriate targets now and we don't need explicitly -march >> arguments? Yes. >> Assuming that's correct, this is fine for the trunk. Thanks. juzhe.zh...@rivai.ai From: Jeff Law Date: 2023-06-20 07:13 To: Juzhe-Zhong; gcc-patches CC: kito.cheng; palmer; rdapp.gcc Subject: Re: [PATCH] RISC-V: Fix fails of testcases On 6/19/23 17:04, Juzhe-Zhong wrote: > FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c -std=c99 -O3 > -ftree-vectorize --param riscv-autovec-preference=fixed-vlmax (test for > excess errors) > Excess errors: > xgcc: fatal error: Cannot find suitable multilib set for > '-march=rv64imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b'/'-mabi=lp64d' > compilation terminated. > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c: Fix fail. > * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-zvfh-run.c: > Ditto. > * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c: Ditto. > * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-zvfh-run.c: Ditto. Presumably the target selector in the dg-do ensures we only build/run these on the appropriate targets now and we don't need explicitly -march arguments? Assuming that's correct, this is fine for the trunk. jeff
Re: [PATCH] RISC-V: Fix fails of testcases
On 6/19/23 17:04, Juzhe-Zhong wrote:
> FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c -std=c99 -O3 -ftree-vectorize --param riscv-autovec-preference=fixed-vlmax (test for excess errors)
> Excess errors:
> xgcc: fatal error: Cannot find suitable multilib set for '-march=rv64imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b'/'-mabi=lp64d'
> compilation terminated.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c: Fix fail.
> * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-zvfh-run.c: Ditto.
> * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c: Ditto.
> * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-zvfh-run.c: Ditto.

Presumably the target selector in the dg-do ensures we only build/run these on the appropriate targets now and we don't need explicit -march arguments?

Assuming that's correct, this is fine for the trunk.

jeff
[PATCH] RISC-V: Fix fails of testcases
FAIL: gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c -std=c99 -O3 -ftree-vectorize --param riscv-autovec-preference=fixed-vlmax (test for excess errors) Excess errors: xgcc: fatal error: Cannot find suitable multilib set for '-march=rv64imafdcv_zicsr_zifencei_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b'/'-mabi=lp64d' compilation terminated. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c: Fix fail. * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-zvfh-run.c: Ditto. * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c: Ditto. * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-zvfh-run.c: Ditto. --- .../gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c| 2 +- .../riscv/rvv/autovec/vls-vlmax/vec_extract-zvfh-run.c | 2 +- .../gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c| 2 +- .../gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-zvfh-run.c | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c index 82bf6d674ec..dd22dae5eb9 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-run.c @@ -1,5 +1,5 @@ /* { dg-do run { target { riscv_vector } } } */ -/* { dg-additional-options "-std=c99 -march=rv64gcv -Wno-pedantic" } */ +/* { dg-additional-options "-std=c99 -Wno-pedantic" } */ #include diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-zvfh-run.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-zvfh-run.c index a0b2cf97afe..db54acc6535 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-zvfh-run.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-zvfh-run.c @@ -1,5 +1,5 @@ /* { dg-do run {target { riscv_zvfh_hw } } } */ -/* { dg-additional-options 
"-march=rv64gcv_zvfh -Wno-pedantic" } */ +/* { dg-additional-options "-Wno-pedantic" } */ #include diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c index 7e5e0e69d51..bf04a3d029e 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c @@ -1,5 +1,5 @@ /* { dg-do run { target { riscv_vector } } } */ -/* { dg-additional-options "-std=c99 -march=rv64gcv -Wno-pedantic" } */ +/* { dg-additional-options "-std=c99 -Wno-pedantic" } */ #include diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-zvfh-run.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-zvfh-run.c index bf514f9426b..df8363e0428 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-zvfh-run.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-zvfh-run.c @@ -1,5 +1,5 @@ /* { dg-do run { target { riscv_zvfh_hw } } } */ -/* { dg-additional-options "-march=rv64gcv_zvfh -Wno-pedantic" } */ +/* { dg-additional-options "-Wno-pedantic" } */ #include -- 2.36.1
[Bug rtl-optimization/110307] ICE in move_insn, at haifa-sched.cc:5473 when building Ruby on alpha with -fPIC -O2 (or -fpeephole2 -fschedule-insns2)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110307 --- Comment #4 from matoro --- (In reply to Alexander Monakov from comment #3) > Do you have older versions of GCC to check on this testcase? No, for the same reason I didn't get a complete backtrace, it takes a while to compile on this machine. I can go ahead and kick it off though, and update with results as I find them.
[PR target/110201] Fix operand types for various scalar crypto insns
A handful of the scalar crypto instructions are supposed to take a constant integer argument 0..3 inclusive. A suitable constraint was created and used for this purpose (D03), but the operand's predicate is "register_operand". That's just wrong. This patch adds a new predicate "const_0_3_operand" and fixes the relevant insns to use it. One could argue the constraint is redundant now (and one would be correct). I wouldn't lose sleep if someone wanted that removed, in which case I'll spin up a V2. The testsuite was broken in a way that made it consistent with the compiler, so the tests passed when they really should have been issuing errors all along. This patch adjusts the existing tests so that they all expect a diagnostic on the invalid operand usage (including out-of-range constants). It adds new tests with proper constants, testing the extremes of valid values. OK for the trunk, or should we remove the D03 constraint? Jeff PR target/110201 gcc/ * config/riscv/predicates.md (const_0_3_operand): New predicate. * config/riscv/crypto.md (riscv_aes32dsi): Use new predicate. (riscv_aes32dsmi, riscv_aes32esi, riscv_aes32esmi): Likewise. 
(riscv_sm4ed_, riscv_sm4ks_" [(set (match_operand:X 0 "register_operand" "=r") (unspec:X [(match_operand:X 1 "register_operand" "r") (match_operand:X 2 "register_operand" "r") - (match_operand:SI 3 "register_operand" "D03")] + (match_operand:SI 3 "const_0_3_operand" "D03")] UNSPEC_SM4_ED))] "TARGET_ZKSED" "sm4ed\t%0,%1,%2,%3" @@ -404,7 +404,7 @@ (define_insn "riscv_sm4ks_" [(set (match_operand:X 0 "register_operand" "=r") (unspec:X [(match_operand:X 1 "register_operand" "r") (match_operand:X 2 "register_operand" "r") - (match_operand:SI 3 "register_operand" "D03")] + (match_operand:SI 3 "const_0_3_operand" "D03")] UNSPEC_SM4_KS))] "TARGET_ZKSED" "sm4ks\t%0,%1,%2,%3" diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md index 04ca6ceabc7..7aed71b5123 100644 --- a/gcc/config/riscv/predicates.md +++ b/gcc/config/riscv/predicates.md @@ -45,6 +45,10 @@ (define_predicate "const_csr_operand" (and (match_code "const_int") (match_test "IN_RANGE (INTVAL (op), 0, 31)"))) +(define_predicate "const_0_3_operand" + (and (match_code "const_int") + (match_test "IN_RANGE (INTVAL (op), 0, 3)"))) + (define_predicate "csr_operand" (ior (match_operand 0 "const_csr_operand") (match_operand 0 "register_operand"))) diff --git a/gcc/testsuite/gcc.target/riscv/zknd32-2.c b/gcc/testsuite/gcc.target/riscv/zknd32-2.c new file mode 100644 index 000..f8e68c6e56b --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/zknd32-2.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=rv32gc_zknd -mabi=ilp32d" } */ +/* { dg-skip-if "" { *-*-* } { "-g" "-flto"} } */ + +#include + +int32_t foo1(int32_t rs1, int32_t rs2) +{ +return __builtin_riscv_aes32dsi(rs1,rs2,0); +} + +int32_t foo2(int32_t rs1, int32_t rs2) +{ +return __builtin_riscv_aes32dsmi(rs1,rs2,0); +} + +int32_t foo3(int32_t rs1, int32_t rs2) +{ +return __builtin_riscv_aes32dsi(rs1,rs2,3); +} + +int32_t foo4(int32_t rs1, int32_t rs2) +{ +return __builtin_riscv_aes32dsmi(rs1,rs2,3); +} + +/* { dg-final { 
scan-assembler-times "aes32dsi" 2 } } */ +/* { dg-final { scan-assembler-times "aes32dsmi" 2 } } */ diff --git a/gcc/testsuite/gcc.target/riscv/zknd32.c b/gcc/testsuite/gcc.target/riscv/zknd32.c index 5fcc66da901..7370a2c1812 100644 --- a/gcc/testsuite/gcc.target/riscv/zknd32.c +++ b/gcc/testsuite/gcc.target/riscv/zknd32.c @@ -6,13 +6,30 @@ int32_t foo1(int32_t rs1, int32_t rs2, int bs) { -return __builtin_riscv_aes32dsi(rs1,rs2,bs); +return __builtin_riscv_aes32dsi(rs1,rs2,bs); /* { dg-error "invalid argument to built-in function" } */ } int32_t foo2(int32_t rs1, int32_t rs2, int bs) { -return __builtin_riscv_aes32dsmi(rs1,rs2,bs); +return __builtin_riscv_aes32dsmi(rs1,rs2,bs); /* { dg-error "invalid argument to built-in function" } */ } -/* { dg-final { scan-assembler-times "aes32dsi" 1 } } */ -/* { dg-final { scan-assembler-times "aes32dsmi" 1 } } */ +int32_t foo3(int32_t rs1, int32_t rs2) +{ +return __builtin_riscv_aes32dsi(rs1,rs2,-1); /* { dg-error "invalid argument to built-in function" } */ +} + +int32_t foo4(int32_t rs1, int32_t rs2) +{ +return __builtin_riscv_aes32dsmi(rs1,rs2,-1); /* { dg-error "invalid argument to built-in function" } */ +} + +int32_t foo5(int32_t rs1, int32_t rs2) +{ +return __builtin_riscv_aes32dsi(rs1,rs2,4); /* { dg-error "invalid argument to built-in function" } */ +} + +int32_t foo6(int32_t rs1, int32_t rs2) +{ +return __builtin_riscv_aes32dsmi(rs1,rs2,4); /* { dg-error "invalid argument to built-in function" } */ +} diff --git
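The range check behind the new predicate can be sketched in plain C. This is only an illustration under stated assumptions: `in_range` and `is_const_0_3` are hypothetical helpers that mimic the inclusive `IN_RANGE (INTVAL (op), 0, 3)` test quoted in the patch; they are not GCC internals, and they assume the operand is already known to be a constant integer.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the IN_RANGE test used by the predicate:
   true when lower <= value <= upper, both bounds inclusive.  */
bool
in_range (long value, long lower, long upper)
{
  return value >= lower && value <= upper;
}

/* Hypothetical model of const_0_3_operand for an operand that is
   already known to be a constant integer.  */
bool
is_const_0_3 (long value)
{
  return in_range (value, 0, 3);
}
```

With this model, 0 and 3 (the extremes used in zknd32-2.c) are accepted, while -1 and 4 (the out-of-range constants added to zknd32.c) are rejected.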
[Bug rtl-optimization/110307] ICE in move_insn, at haifa-sched.cc:5473 when building Ruby on alpha with -fPIC -O2 (or -fpeephole2 -fschedule-insns2)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110307 --- Comment #3 from Alexander Monakov --- Do you have older versions of GCC to check on this testcase?
Re: [PATCH] RISC-V: Add VLS modes for GNU vectors
On 6/19/23 15:45, 钟居哲 wrote: Hi, Jeff. Thanks for the comment. I added INCLUDE_ALGORITHM because I use std::min; the build failed without it. Is INCLUDE_ALGORITHM so expensive that you would rather avoid it? It just stood out as unexpected. There are no concerns with std::min and the like. Jeff
Re: [PATCH] VECT: Apply LEN_MASK_{LOAD,STORE} into vectorizer
Hi, this patch refactors the code in tree-vect-stmts.cc that generates the gimple IR. I realize the change is large, and I am not sure whether you are happy with it. Originally, the code looked like this:
  if (final_mask) { generate IFN_MASK_LOAD... }
  else if (loop_len) { generate IFN_LEN_LOAD; handle BIAS. }
  else { NORMAL_LOAD }
Now I have refactored it to:
  if (final_mask || loop_len)
    {
      if (get_len_load_store ().exists ())
        {
          /* LEN_MASK_LOAD or LEN_LOAD */
          get len..
          if (LEN_MASK_LOAD) { get mask... generate IFN_LEN_MASK_LOAD... }
          else { generate IFN_LEN_LOAD... }
          Handle BIAS
        }
      else
        {
          gcc_assert (final_mask)
          /* MASK_LOAD */
        }
    }
  else { NORMAL_LOAD }
I refactored it because LEN_MASK_LOAD and LEN_LOAD share some common code, and avoiding the duplication keeps the code cleaner. Bootstrap and regression testing are in progress. juzhe.zh...@rivai.ai From: juzhe.zhong Date: 2023-06-20 00:17 To: gcc-patches CC: rguenther; richard.sandiford; Ju-Zhe Zhong Subject: [PATCH] VECT: Apply LEN_MASK_{LOAD,STORE} into vectorizer From: Ju-Zhe Zhong This patch applies LEN_MASK_{LOAD,STORE} in the vectorizer. I refactored the gimple IR build to make the code cleaner. gcc/ChangeLog: * internal-fn.cc (expand_partial_store_optab_fn): Add LEN_MASK_{LOAD,STORE} vectorizer support. (internal_load_fn_p): Ditto. (internal_store_fn_p): Ditto. (internal_fn_mask_index): Ditto. (internal_fn_stored_value_index): Ditto. (internal_len_load_store_bias): Ditto. * optabs-query.cc (can_vec_mask_load_store_p): Ditto. (get_len_load_store_mode): Ditto. * tree-vect-stmts.cc (check_load_store_for_partial_vectors): Ditto. (get_all_ones_mask): New function. (vectorizable_store): Add LEN_MASK_{LOAD,STORE} vectorizer support. (vectorizable_load): Ditto. 
--- gcc/internal-fn.cc | 35 +- gcc/optabs-query.cc| 25 +++- gcc/tree-vect-stmts.cc | 259 + 3 files changed, 213 insertions(+), 106 deletions(-) diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc index c911ae790cb..e10c21de5f1 100644 --- a/gcc/internal-fn.cc +++ b/gcc/internal-fn.cc @@ -2949,7 +2949,7 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab) * OPTAB. */ static void -expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab) +expand_partial_store_optab_fn (internal_fn ifn, gcall *stmt, convert_optab optab) { class expand_operand ops[5]; tree type, lhs, rhs, maskt, biast; @@ -2957,7 +2957,7 @@ expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab) insn_code icode; maskt = gimple_call_arg (stmt, 2); - rhs = gimple_call_arg (stmt, 3); + rhs = gimple_call_arg (stmt, internal_fn_stored_value_index (ifn)); type = TREE_TYPE (rhs); lhs = expand_call_mem_ref (type, stmt, 0); @@ -4435,6 +4435,7 @@ internal_load_fn_p (internal_fn fn) case IFN_GATHER_LOAD: case IFN_MASK_GATHER_LOAD: case IFN_LEN_LOAD: +case IFN_LEN_MASK_LOAD: return true; default: @@ -4455,6 +4456,7 @@ internal_store_fn_p (internal_fn fn) case IFN_SCATTER_STORE: case IFN_MASK_SCATTER_STORE: case IFN_LEN_STORE: +case IFN_LEN_MASK_STORE: return true; default: @@ -4494,6 +4496,10 @@ internal_fn_mask_index (internal_fn fn) case IFN_MASK_STORE_LANES: return 2; +case IFN_LEN_MASK_LOAD: +case IFN_LEN_MASK_STORE: + return 3; + case IFN_MASK_GATHER_LOAD: case IFN_MASK_SCATTER_STORE: return 4; @@ -4519,6 +4525,9 @@ internal_fn_stored_value_index (internal_fn fn) case IFN_LEN_STORE: return 3; +case IFN_LEN_MASK_STORE: + return 4; + default: return -1; } @@ -4583,13 +4592,31 @@ internal_len_load_store_bias (internal_fn ifn, machine_mode mode) { optab optab = direct_internal_fn_optab (ifn); insn_code icode = direct_optab_handler (optab, mode); + int bias_argno = 3; + if (icode == CODE_FOR_nothing) +{ + machine_mode mask_mode + = 
targetm.vectorize.get_mask_mode (mode).require (); + if (ifn == IFN_LEN_LOAD) + { + /* Try LEN_MASK_LOAD. */ + optab = direct_internal_fn_optab (IFN_LEN_MASK_LOAD); + } + else + { + /* Try LEN_MASK_STORE. */ + optab = direct_internal_fn_optab (IFN_LEN_MASK_STORE); + } + icode = convert_optab_handler (optab, mode, mask_mode); + bias_argno = 4; +} if (icode != CODE_FOR_nothing) { /* For now we only support biases of 0 or -1. Try both of them. */ - if (insn_operand_matches (icode, 3, GEN_INT (0))) + if (insn_operand_matches (icode, bias_argno, GEN_INT (0))) return 0; - if (insn_operand_matches (icode, 3, GEN_INT (-1))) + if (insn_operand_matches (icode, bias_argno, GEN_INT (-1))) return -1; } diff --git a/gcc/optabs-query.cc
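The refactored control flow outlined in the message can be modelled as a small decision function. This is a sketch only, not the vectorizer code: `access_kind`, `choose_access`, `has_len_mode` and `use_len_mask` are hypothetical names. `has_len_mode` stands in for the "len load/store mode exists" test and `use_len_mask` for the "LEN_MASK_LOAD" test in the outline.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the kind of access that would be generated.  */
enum access_kind { NORMAL_ACCESS, MASK_ACCESS, LEN_ACCESS, LEN_MASK_ACCESS };

/* Mirror the outline: length-based forms are preferred when the target
   has a length mode; otherwise a mask must be present.  */
enum access_kind
choose_access (bool final_mask, bool loop_len, bool has_len_mode,
	       bool use_len_mask)
{
  if (!final_mask && !loop_len)
    return NORMAL_ACCESS;

  if (has_len_mode)
    /* Both length-based forms share the "get len" and bias handling.  */
    return use_len_mask ? LEN_MASK_ACCESS : LEN_ACCESS;

  /* Without a length mode only the masked form remains.  */
  assert (final_mask);
  return MASK_ACCESS;
}
```

The shared length and bias handling sits on the single `has_len_mode` path, which is the duplication the message says the refactoring removes.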
[Bug middle-end/110282] Segmentation fault with specific optimizations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110282 --- Comment #5 from Andrew Pinski --- Note I suspect r12-248-gb58dc0b803057c0e6032e0d9b made the problem latent in GCC 12+. But turning off DSE in GCC 12.1.0 does not reproduce the bug
Re: Different ASM for ReLU function between GCC11 and GCC12
On Mon, 19 Jun 2023, André Günther via Gcc wrote: I noticed that a simple function like
  auto relu( float x ) { return x > 0.f ? x : 0.f; }
compiles to different ASM using GCC11 (or lower) and GCC12 (or higher). On -O3 -mavx2 the former compiles above function to
  relu(float):
    vmaxss xmm0, xmm0, DWORD PTR .LC0[rip]
    ret
  .LC0:
    .long 0
which is what I would naively expect and what also clang essentially does (clang actually uses an xor before the maxss to get the zero). The latter, however, compiles the function to
  relu(float):
    vxorps xmm1, xmm1, xmm1
    vcmpltss xmm2, xmm1, xmm0
    vblendvps xmm0, xmm1, xmm0, xmm2
    ret
which looks like a missed optimisation. Does anyone know if there's a reason for the changed behaviour?
With -fno-signed-zeros -ffinite-math-only, gcc-12 still uses max instead of cmp+blend. So the first thing to check would be if both versions give the same result on negative 0 and NaN. -- Marc Glisse
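The check suggested in the reply can be written as a small C harness. This is a minimal sketch: `relu` is the expression from the question rewritten in C, and `same_float` is a hypothetical helper that treats two values as equal only when the sign of zero also matches, or when both are NaN.

```c
#include <math.h>
#include <stdbool.h>

/* The expression from the question, written in C.  */
float
relu (float x)
{
  return x > 0.f ? x : 0.f;
}

/* Hypothetical helper: true when a and b are the same value including
   the sign of zero, or when both are NaN.  */
bool
same_float (float a, float b)
{
  if (isnan (a) || isnan (b))
    return isnan (a) && isnan (b);
  return a == b && !signbit (a) == !signbit (b);
}
```

For the source-level expression, both `relu (-0.0f)` and `relu (NAN)` yield positive zero, because the comparison `x > 0.f` is false in both cases. To compare two compiler versions, the same inputs could be fed to each build and the results compared with `same_float`.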
Re: Re: [PATCH] RISC-V: Add VLS modes for GNU vectors
Hi, Jeff. Thanks for the comment. I added INCLUDE_ALGORITHM because I use std::min; the build failed without it. Is INCLUDE_ALGORITHM so expensive that you would rather avoid it? juzhe.zh...@rivai.ai From: Jeff Law Date: 2023-06-20 02:25 To: Juzhe-Zhong; gcc-patches CC: kito.cheng; palmer; rdapp.gcc Subject: Re: [PATCH] RISC-V: Add VLS modes for GNU vectors On 6/18/23 17:06, Juzhe-Zhong wrote: > This patch is a proposal and is **NOT** ready to push, since > after this patch the total number of machine modes will exceed 255, which causes an ICE > in LTO: >internal compiler error: in bp_pack_int_in_range, at data-streamer.h:290 Right. Note that an ack from Jakub or Richi will be sufficient for the LTO fixes to go forward. > > We need to add VLS modes for the following reasons: > 1. Enhance GNU vectors codegen: > For example: > typedef int32_t vnx8si __attribute__ ((vector_size (32))); > > __attribute__ ((noipa)) void > f_vnx8si (int32_t * in, int32_t * out) > { > vnx8si v = *(vnx8si*)in; > *(vnx8si *) out = v; > } > > compile option: --param=riscv-autovec-preference=scalable > before this patch: > f_vnx8si: > ld a2,0(a0) > ld a3,8(a0) > ld a4,16(a0) > ld a5,24(a0) > addi sp,sp,-32 > sd a2,0(a1) > sd a3,8(a1) > sd a4,16(a1) > sd a5,24(a1) > addi sp,sp,32 > jr ra > > After this patch: > f_vnx8si: > vsetivli zero,8,e32,m2,ta,ma > vle32.v v2,0(a0) > vse32.v v2,0(a1) > ret > > 2. Enhance VLA SLP: > void > f (uint8_t *restrict a, uint8_t *restrict b, uint8_t *restrict c) > { >for (int i = 0; i < 100; ++i) > { >a[i * 8] = b[i * 8] + c[i * 8]; >a[i * 8 + 1] = b[i * 8] + c[i * 8 + 1]; >a[i * 8 + 2] = b[i * 8 + 2] + c[i * 8 + 2]; >a[i * 8 + 3] = b[i * 8 + 2] + c[i * 8 + 3]; >a[i * 8 + 4] = b[i * 8 + 4] + c[i * 8 + 4]; >a[i * 8 + 5] = b[i * 8 + 4] + c[i * 8 + 5]; >a[i * 8 + 6] = b[i * 8 + 6] + c[i * 8 + 6]; >a[i * 8 + 7] = b[i * 8 + 6] + c[i * 8 + 7]; > } > } > > > .. > Loop body: > ... > vrgatherei16.vv... > ... 
> > Tail: > lbu a4,792(a1) > lbu a5,792(a2) > addwa5,a5,a4 > sb a5,792(a0) > lbu a5,793(a2) > addwa5,a5,a4 > sb a5,793(a0) > lbu a4,794(a1) > lbu a5,794(a2) > addwa5,a5,a4 > sb a5,794(a0) > lbu a5,795(a2) > addwa5,a5,a4 > sb a5,795(a0) > lbu a4,796(a1) > lbu a5,796(a2) > addwa5,a5,a4 > sb a5,796(a0) > lbu a5,797(a2) > addwa5,a5,a4 > sb a5,797(a0) > lbu a4,798(a1) > lbu a5,798(a2) > addwa5,a5,a4 > sb a5,798(a0) > lbu a5,799(a2) > addwa5,a5,a4 > sb a5,799(a0) > ret > > The tail elements need VLS modes to vectorize like ARM SVE: > > f: > mov x3, 0 > cntbx5 > mov x4, 792 > whilelo p7.b, xzr, x4 > .L2: > ld1bz31.b, p7/z, [x1, x3] > ld1bz30.b, p7/z, [x2, x3] > trn1z31.b, z31.b, z31.b > add z31.b, z31.b, z30.b > st1bz31.b, p7, [x0, x3] > add x3, x3, x5 > whilelo p7.b, x3, x4 > b.any .L2 > Tail: > ldr b31, [x1, 792] > ldr b27, [x1, 794] > ldr b28, [x1, 796] > dup v31.8b, v31.b[0] > ldr b29, [x1, 798] > ldr d30, [x2, 792] > ins v31.b[2], v27.b[0] > ins v31.b[3], v27.b[0] > ins v31.b[4], v28.b[0] > ins v31.b[5], v28.b[0] > ins v31.b[6], v29.b[0] > ins v31.b[7], v29.b[0] > add v31.8b, v30.8b, v31.8b > str d31, [x0, 792] > ret > > Notice ARM SVE use ADVSIMD modes (Neon) to vectorize the tail. > > gcc/ChangeLog: > > * config/riscv/riscv-modes.def (VECTOR_BOOL_MODE): Add VLS modes for > GNU vectors. > (ADJUST_ALIGNMENT): Ditto. > (ADJUST_BYTESIZE): Ditto. > > (ADJUST_PRECISION): Ditto. > (VECTOR_MODES): Ditto. > * config/riscv/riscv-protos.h (riscv_v_ext_vls_mode_p): Ditto. > (get_regno_alignment): Ditto. > * config/riscv/riscv-v.cc (INCLUDE_ALGORITHM): Ditto. > (const_vlmax_p): Ditto. > (legitimize_move): Ditto. > (get_vlmul): Ditto. > (get_regno_alignment): Ditto. > (get_ratio): Ditto. > (get_vector_mode): Ditto. > * config/riscv/riscv-vector-switch.def (VLS_ENTRY): Ditto. > * config/riscv/riscv.cc
Re: [PATCH v6 1/4] libcpp: reject codepoints above 0x10FFFF
On 6/6/23 16:50, Ben Boeckel wrote: Unicode does not support such values because they are unrepresentable in UTF-16. Pushed. libcpp/ * charset.cc: Reject encodings of codepoints above 0x10FFFF. UTF-16 does not support such codepoints and therefore all Unicode rejects such values. Signed-off-by: Ben Boeckel --- libcpp/charset.cc | 7 +++ 1 file changed, 7 insertions(+) diff --git a/libcpp/charset.cc b/libcpp/charset.cc index d7f323b2cd5..3b34d804cf1 100644 --- a/libcpp/charset.cc +++ b/libcpp/charset.cc @@ -1886,6 +1886,13 @@ cpp_valid_utf8_p (const char *buffer, size_t num_bytes) int err = one_utf8_to_cppchar (&iter, &bytesleft, &cp); if (err) return false; + + /* Additionally, Unicode declares that all codepoints above 0010FFFF are +invalid because they cannot be represented in UTF-16. + +Reject such values.*/ + if (cp >= 0x10FFFF) + return false; } /* No problems encountered. */ return true;
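Why 0x10FFFF is the limit can be worked out from the UTF-16 surrogate encoding. The sketch below is illustrative only: `max_utf16_codepoint` and `codepoint_in_range` are hypothetical helpers, not libcpp functions, and the check reads "above 0x10FFFF" as strictly greater, which is an assumption about the stated intent rather than a description of the quoted hunk.

```c
#include <stdbool.h>
#include <stdint.h>

/* Largest code point a UTF-16 surrogate pair can encode: each surrogate
   carries 10 payload bits, added on top of the 0x10000 offset.  */
uint32_t
max_utf16_codepoint (void)
{
  uint32_t high_payload = 0x3FF;	/* high surrogate 0xDBFF */
  uint32_t low_payload = 0x3FF;		/* low surrogate 0xDFFF */
  return 0x10000 + ((high_payload << 10) | low_payload);
}

/* Hypothetical helper: true when cp is not above the UTF-16 limit.  */
bool
codepoint_in_range (uint32_t cp)
{
  return cp <= max_utf16_codepoint ();
}
```

Under this reading, U+10FFFF itself is accepted and 0x110000 is the first rejected value.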
Re: [PATCH v5 3/5] p1689r5: initial support
On 5/12/23 10:24, Ben Boeckel wrote: On Tue, Feb 14, 2023 at 16:50:27 -0500, Jason Merrill wrote: I notice that the actual flags are all -fdep-*, though some of them are -fdeps-* here, and the internal variables all seem to be fdeps_*. I lean toward harmonizing on "deps", I think. Done. I don't love the three separate options, but I suppose it's fine. I'd prefer "target" instead of "output". Done. It should be possible to omit both -file and -target and get reasonable defaults, like the ones for -MD/-MQ in gcc.cc:cpp_unique_options. `file` can be omitted (the `output_stream` will be used then). I *think* I see that adding: %{fdeps_file:-fdeps-file=%{!o:%b.ddi}%{o*:%.ddi%*}} %{!fdeps-file: but yes. would at least do for `-fdeps-file` defaults? I don't know if there's a reasonable default for `-fdeps-target=` though given that this command line has no information about the object file that will be used (`-o` is used for preprocessor output since we're leaning on `-E` here). I would think it could default to %b.o? I had quite a few more comments on the v5 patch that you didn't respond to here or address in the v6 patch; did your mail client hide them from you? Jason
[PATCH 12/14] OpenACC: "declare create" fixes wrt. "allocatable" variables
This patch fixes a case revealed by the previous patch where a synthetic "acc data" region created for a "declare create" variable could interact strangely with lexical inheritance behaviour. In fact, it doesn't seem right to create the "acc data" region for allocatable variables at all -- doing so means that a data region is likely to be created for an unallocated variable. The fix is not to add such variables to the synthetic "acc data" region at all, and defer to the code that performs "enter data"/"exit data" for them when allocated/deallocated on the host instead. Then, "declare create" variables are implicitly turned into "present" clauses on in-scope offload regions. 2023-06-16 Julian Brown gcc/fortran/ * trans-openmp.cc (gfc_omp_finish_clause): Handle "declare create" for scalar allocatable variables. (gfc_trans_omp_clauses): Don't include allocatable vars in synthetic "acc data" region created for "declare create" variables. Mark such variables with the "oacc declare create" attribute instead. Don't create ALWAYS_POINTER mapping for target-to-host updates of declare create variables. (gfc_trans_oacc_declare): Handle empty clause list. gcc/ * gimplify.cc (gimplify_adjust_omp_clauses_1): Handle "oacc declare create" attribute. libgomp/ * testsuite/libgomp.oacc-fortran/declare-create-1.f90: New test. * testsuite/libgomp.oacc-fortran/declare-create-2.f90: New test. * testsuite/libgomp.oacc-fortran/declare-create-3.f90: New test. 
--- gcc/fortran/trans-openmp.cc | 45 --- gcc/gimplify.cc | 8 .../libgomp.oacc-fortran/declare-create-1.f90 | 21 + .../libgomp.oacc-fortran/declare-create-2.f90 | 25 +++ .../libgomp.oacc-fortran/declare-create-3.f90 | 25 +++ 5 files changed, 119 insertions(+), 5 deletions(-) create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/declare-create-1.f90 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/declare-create-2.f90 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/declare-create-3.f90 diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc index 1a14d2bc068..819d79cda28 100644 --- a/gcc/fortran/trans-openmp.cc +++ b/gcc/fortran/trans-openmp.cc @@ -1619,7 +1619,16 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p, bool openacc) orig_decl = decl; c4 = build_omp_clause (OMP_CLAUSE_LOCATION (c), OMP_CLAUSE_MAP); - OMP_CLAUSE_SET_MAP_KIND (c4, GOMP_MAP_POINTER); + if (openacc + && GFC_DECL_GET_SCALAR_ALLOCATABLE (decl) + && OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FORCE_PRESENT) + /* This allows "declare create" to work for scalar allocatables. The + resulting mapping nodes are: +force_present(*var) firstprivate_pointer(var) + which is the same as an explicit "present" clause gives. */ + OMP_CLAUSE_SET_MAP_KIND (c4, GOMP_MAP_FIRSTPRIVATE_POINTER); + else + OMP_CLAUSE_SET_MAP_KIND (c4, GOMP_MAP_POINTER); OMP_CLAUSE_DECL (c4) = decl; OMP_CLAUSE_SIZE (c4) = size_int (0); decl = build_fold_indirect_ref (decl); @@ -4588,6 +4597,29 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses, if (!n->sym->attr.referenced) continue; + /* We do not want to include allocatable vars in a synthetic +"acc data" region created for "!$acc declare create" vars. +Such variables are handled by augmenting allocate/deallocate +statements elsewhere (with +"acc enter data declare_allocate(...)", etc.). 
*/ + if (op == EXEC_OACC_DECLARE + && n->u.map_op == OMP_MAP_ALLOC + && n->sym->attr.allocatable + && n->sym->attr.oacc_declare_create) + { + tree tree_var = gfc_get_symbol_decl (n->sym); + if (!lookup_attribute ("oacc declare create", +DECL_ATTRIBUTES (tree_var))) + DECL_ATTRIBUTES (tree_var) + = tree_cons (get_identifier ("oacc declare create"), + NULL_TREE, DECL_ATTRIBUTES (tree_var)); + /* We might need to turn what would normally be a +"firstprivate" mapping into a "present" mapping. For the +latter, we need the decl to be addressable. */ + TREE_ADDRESSABLE (tree_var) = 1; + continue; + } + bool always_modifier = false; tree node = build_omp_clause (input_location, OMP_CLAUSE_MAP); tree node2 = NULL_TREE; @@ -4780,7 +4812,8 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses, tree orig_decl = decl;
[PATCH 11/14] OpenACC: Reimplement "inheritance" for lexically-nested offload regions
This patch reimplements "lexical inheritance" for OpenACC offload regions inside "data" regions, allowing e.g. this to work: int *ptr; [...] #pragma acc data copyin(ptr[10:2]) { #pragma acc parallel { ... } } here, the "copyin" is mirrored on the inner "acc parallel" as "present(ptr[10:2])" -- allowing code within the parallel to use that section of the array even though the mapping is implicit. In terms of implementation, this works by expanding mapping nodes for "acc data" to include pointer mappings that might be needed by inner offload regions. The resulting mapping group is then copied to the inner offload region as needed, rewriting the first node to "force_present". The pointer mapping nodes are then removed from the "acc data" later during gimplification. For OpenMP, pointer mapping nodes on equivalent "omp data" regions are not needed, so remain suppressed during expansion. 2023-06-16 Julian Brown gcc/c-family/ * c-omp.cc (c_omp_address_inspector::expand_array_base): Don't omit pointer nodes for OpenACC. gcc/ * gimplify.cc (omp_tsort_mark, omp_mapping_group): Move before gimplify_omp_ctx. Add constructor to omp_mapping_group. (gimplify_omp_ctx): Add DECL_DATA_CLAUSE field. (new_omp_context, delete_omp_context): Initialise and free above field. (omp_gather_mapping_groups_1): Use constructor for omp_mapping_group. (gimplify_scan_omp_clauses): Record mappings that might be lexically inherited. Don't remove GOMP_MAP_FIRSTPRIVATE_POINTER/GOMP_MAP_FIRSTPRIVATE_REFERENCE yet. (gomp_oacc_needs_data_present): New function. (gimplify_adjust_omp_clauses_1): Implement lexical inheritance behaviour for OpenACC. (gimplify_adjust_omp_clauses): Remove GOMP_MAP_FIRSTPRIVATE_POINTER/GOMP_MAP_FIRSTPRIVATE_REFERENCE here instead, after lexical inheritance is done. gcc/testsuite/ * c-c++-common/goacc/acc-data-chain.c: Re-enable scan test. * gfortran.dg/goacc/pr70828.f90: Likewise. * gfortran.dg/goacc/assumed-size.f90: New test. 
libgomp/ * testsuite/libgomp.oacc-c-c++-common/pr70828.c: Un-XFAIL. * testsuite/libgomp.oacc-c-c++-common/pr70828-2.c: Un-XFAIL. * testsuite/libgomp.oacc-fortran/pr70828.f90: Un-XFAIL. * testsuite/libgomp.oacc-fortran/pr70828-2.f90: Un-XFAIL. * testsuite/libgomp.oacc-fortran/pr70828-3.f90: Un-XFAIL. * testsuite/libgomp.oacc-fortran/pr70828-4.f90: Un-XFAIL. * testsuite/libgomp.oacc-fortran/pr70828-5.f90: Un-XFAIL. * testsuite/libgomp.oacc-fortran/pr70828-6.f90: Un-XFAIL. --- gcc/c-family/c-omp.cc | 13 +- gcc/gimplify.cc | 208 +- .../c-c++-common/goacc/acc-data-chain.c | 4 +- .../gfortran.dg/goacc/assumed-size.f90| 35 +++ gcc/testsuite/gfortran.dg/goacc/pr70828.f90 | 3 +- .../libgomp.oacc-c-c++-common/pr70828-2.c | 2 - .../libgomp.oacc-c-c++-common/pr70828.c | 2 - .../libgomp.oacc-fortran/pr70828-2.f90| 2 - .../libgomp.oacc-fortran/pr70828-3.f90| 2 - .../libgomp.oacc-fortran/pr70828-4.f90| 2 - .../libgomp.oacc-fortran/pr70828-5.f90| 2 - .../libgomp.oacc-fortran/pr70828-6.f90| 2 - .../libgomp.oacc-fortran/pr70828.f90 | 2 - 13 files changed, 202 insertions(+), 77 deletions(-) create mode 100644 gcc/testsuite/gfortran.dg/goacc/assumed-size.f90 diff --git a/gcc/c-family/c-omp.cc b/gcc/c-family/c-omp.cc index e55b2aec920..291a26293ef 100644 --- a/gcc/c-family/c-omp.cc +++ b/gcc/c-family/c-omp.cc @@ -4313,7 +4313,8 @@ c_omp_address_inspector::expand_array_base (tree c, /* The code handling "firstprivatize_array_bases" in gimplify.cc is relevant here. What do we need to create for arrays at this stage? (This condition doesn't feel quite right. FIXME?) 
*/ - if (!target + if (openmp + && !target && (TREE_CODE (TREE_TYPE (addr_tokens[i + 1]->expr)) == ARRAY_TYPE)) break; @@ -4324,7 +4325,7 @@ c_omp_address_inspector::expand_array_base (tree c, virtual_origin); tree data_addr = omp_accessed_addr (addr_tokens, i + 1, expr); c2 = build_omp_clause (loc, OMP_CLAUSE_MAP); - if (decl_p && target) + if (decl_p && (!openmp || target)) OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_FIRSTPRIVATE_POINTER); else { @@ -4375,9 +4376,11 @@ c_omp_address_inspector::expand_array_base (tree c, tree data_addr = omp_accessed_addr (addr_tokens, last_access, expr); c2 = build_omp_clause (loc, OMP_CLAUSE_MAP); /* For OpenACC, use FIRSTPRIVATE_POINTER for decls even on non-compute - regions (e.g. "acc data" constructs). It'll be removed anyway in - gimplify.cc, but doing it this way
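The nesting described at the top of this message can be written out as a complete function. This is a minimal sketch, not taken from the patch or its testsuite: `sum_section` and `demo_sum` are hypothetical names, and the `num_gangs(1) copy(total)` clauses are added here only so the scalar result is well defined. Built without OpenACC support, the pragmas are ignored and the code runs on the host.

```c
/* Sum the two elements covered by the enclosing data region.  */
int
sum_section (int *ptr)
{
  int total = 0;

  /* ptr[10:2] names two elements starting at index 10.  */
#pragma acc data copyin(ptr[10:2])
  {
    /* No data clause for ptr here: the message describes the outer
       copyin being mirrored onto this region as present(ptr[10:2]).  */
#pragma acc parallel num_gangs(1) copy(total)
    {
      total = ptr[10] + ptr[11];
    }
  }
  return total;
}

/* Hypothetical driver: fill a small array and sum the mapped section.  */
int
demo_sum (void)
{
  int data[12] = { 0 };
  data[10] = 3;
  data[11] = 4;
  return sum_section (data);
}
```

Here `demo_sum ()` adds 3 and 4. The point of interest is that the inner region carries no explicit clause for `ptr`.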
[PATCH 10/14] OpenMP/OpenACC: Reorganise OMP map clause handling in gimplify.cc
This patch has been separated out from the C++ "declare mapper" support patch. It contains just the gimplify.cc rearrangement work, mostly moving gimplification from gimplify_scan_omp_clauses to gimplify_adjust_omp_clauses for map clauses. The motivation for doing this was that we don't know if we need to instantiate mappers implicitly until the body of an offload region has been scanned, i.e. in gimplify_adjust_omp_clauses, but we also need the un-gimplified form of clauses to sort by base-pointer dependencies after mapper instantiation has taken place. The patch also reimplements the "present" clause sorting code to avoid another sorting pass on mapping nodes. 2023-06-16 Julian Brown gcc/ * gimplify.cc (omp_segregate_mapping_groups): Handle "present" groups. (gimplify_scan_omp_clauses): Use mapping group functionality to iterate through mapping nodes. Remove most gimplification of OMP_CLAUSE_MAP nodes from here, but still populate ctx->variables splay tree. (gimplify_adjust_omp_clauses): Move most gimplification of OMP_CLAUSE_MAP nodes here. gcc/testsuite/ * gfortran.dg/gomp/map-12.f90: Adjust scan output. --- gcc/gimplify.cc | 670 -- gcc/testsuite/gfortran.dg/gomp/map-12.f90 | 2 +- 2 files changed, 378 insertions(+), 294 deletions(-) diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc index 9ce1f5b983a..e21e9d99cc9 100644 --- a/gcc/gimplify.cc +++ b/gcc/gimplify.cc @@ -9779,10 +9779,15 @@ omp_tsort_mapping_groups (vec *groups, return outlist; } -/* Split INLIST into two parts, moving groups corresponding to - ALLOC/RELEASE/DELETE mappings to one list, and other mappings to another. - The former list is then appended to the latter. Each sub-list retains the - order of the original list. +/* Split INLIST into four parts: + + - "present" to/from groups + - "present" alloc groups + - other to/from groups + - other alloc/release/delete groups + + These sub-lists are then concatenated together to form the final list. + Each sub-list retains the order of the original list. 
Note that ATTACH nodes are later moved to the end of the list in gimplify_adjust_omp_clauses, for target regions. */ @@ -9790,7 +9795,9 @@ static omp_mapping_group * omp_segregate_mapping_groups (omp_mapping_group *inlist) { omp_mapping_group *ard_groups = NULL, *tf_groups = NULL; + omp_mapping_group *pa_groups = NULL, *ptf_groups = NULL; omp_mapping_group **ard_tail = &ard_groups, **tf_tail = &tf_groups; + omp_mapping_group **pa_tail = &pa_groups, **ptf_tail = &ptf_groups; for (omp_mapping_group *w = inlist; w;) { @@ -9809,6 +9816,20 @@ omp_segregate_mapping_groups (omp_mapping_group *inlist) ard_tail = &w->next; break; + case GOMP_MAP_PRESENT_ALLOC: + *pa_tail = w; + w->next = NULL; + pa_tail = &w->next; + break; + + case GOMP_MAP_PRESENT_FROM: + case GOMP_MAP_PRESENT_TO: + case GOMP_MAP_PRESENT_TOFROM: + *ptf_tail = w; + w->next = NULL; + ptf_tail = &w->next; + break; + default: *tf_tail = w; w->next = NULL; @@ -9820,8 +9841,10 @@ omp_segregate_mapping_groups (omp_mapping_group *inlist) /* Now splice the lists together... 
*/ *tf_tail = ard_groups; + *pa_tail = tf_groups; + *ptf_tail = pa_groups; - return tf_groups; + return ptf_groups; } /* Given a list LIST_P containing groups of mappings given by GROUPS, reorder @@ -11673,119 +11696,30 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p, break; } - if (code == OMP_TARGET - || code == OMP_TARGET_DATA - || code == OMP_TARGET_ENTER_DATA - || code == OMP_TARGET_EXIT_DATA) -{ - vec *groups; - groups = omp_gather_mapping_groups (list_p); - if (groups) - { - hash_map *grpmap; - grpmap = omp_index_mapping_groups (groups); + vec *groups = omp_gather_mapping_groups (list_p); + hash_map *grpmap = NULL; + unsigned grpnum = 0; + tree *grp_start_p = NULL, grp_end = NULL_TREE; - omp_resolve_clause_dependencies (code, groups, grpmap); - omp_build_struct_sibling_lists (code, region_type, groups, , - list_p); - - omp_mapping_group *outlist = NULL; - bool enter_exit = (code == OMP_TARGET_ENTER_DATA -|| code == OMP_TARGET_EXIT_DATA); - - /* Topological sorting may fail if we have duplicate nodes, which -we should have detected and shown an error for already. Skip -sorting in that case. */ - if (seen_error ()) - goto failure; - - delete grpmap; - delete groups; - - /* Rebuild now we have struct sibling lists. */ - groups = omp_gather_mapping_groups (list_p); - grpmap = omp_index_mapping_groups (groups); - -
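The four-way split and splice in omp_segregate_mapping_groups can be illustrated on a plain linked list. This is a simplified sketch under stated assumptions: `struct group`, `enum group_kind`, `segregate_groups` and `demo_order` are hypothetical and stand in for omp_mapping_group and the GOMP_MAP_* kinds; only the tail-pointer technique is the same.

```c
#include <stddef.h>

/* Hypothetical group kinds, in the order the sub-lists are concatenated.  */
enum group_kind
{
  PRESENT_TOFROM, PRESENT_ALLOC, OTHER_TOFROM, ALLOC_RELEASE_DELETE
};

struct group
{
  enum group_kind kind;
  int id;
  struct group *next;
};

/* Stable partition into four sub-lists, then splice them together.  */
struct group *
segregate_groups (struct group *inlist)
{
  struct group *heads[4] = { NULL, NULL, NULL, NULL };
  struct group **tails[4] = { &heads[0], &heads[1], &heads[2], &heads[3] };

  for (struct group *w = inlist; w;)
    {
      struct group *next = w->next;
      *tails[w->kind] = w;	/* append to the sub-list for this kind */
      w->next = NULL;
      tails[w->kind] = &w->next;
      w = next;
    }

  /* Splice from the back; an empty sub-list's tail still points at its
     head, so the assignment simply forwards the following list.  */
  *tails[2] = heads[3];
  *tails[1] = heads[2];
  *tails[0] = heads[1];
  return heads[0];
}

/* Hypothetical driver: returns the resulting ids as decimal digits.  */
int
demo_order (void)
{
  struct group n[5] = {
    { ALLOC_RELEASE_DELETE, 1, NULL }, { OTHER_TOFROM, 2, NULL },
    { PRESENT_ALLOC, 3, NULL }, { PRESENT_TOFROM, 4, NULL },
    { OTHER_TOFROM, 5, NULL }
  };
  for (int i = 0; i < 4; i++)
    n[i].next = &n[i + 1];

  int digits = 0;
  for (struct group *w = segregate_groups (&n[0]); w; w = w->next)
    digits = digits * 10 + w->id;
  return digits;
}
```

For the five groups built in `demo_order`, the resulting order is 4, 3, 2, 5, 1: "present" to/from first, then "present" alloc, then the other to/from groups in their original order, and alloc/release/delete last.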
[PATCH 08/14] OpenMP: Pointers and member mappings
This patch changes the mapping node arrangement used for array components
of derived types, e.g.:

  type T
    integer, pointer, dimension(:) :: arrptr
  end type T

  type(T) :: tvar
  [...]
  !$omp target map(tofrom: tvar%arrptr)

This will currently be mapped using three mapping nodes:

  GOMP_MAP_TO tvar%arrptr                   (the descriptor)
  GOMP_MAP_TOFROM *tvar%arrptr%data         (the actual array data)
  GOMP_MAP_ALWAYS_POINTER tvar%arrptr%data  (a pointer to the array data)

This follows OpenMP 5.0, 2.19.7.1 (or OpenMP 5.2, 5.8.3) "map Clause":

  "If a list item in a map clause is an associated pointer and the
   pointer is not the base pointer of another list item in a map clause
   on the same construct, then it is treated as if its pointer target is
   implicitly mapped in the same clause.  For the purposes of the map
   clause, the mapped pointer target is treated as if its base pointer is
   the associated pointer."

However, we can also write this:

  map(to: tvar%arrptr) map(tofrom: tvar%arrptr(3:8))

and then instead we should follow (OpenMP 5.2, 5.8.3 "map Clause"):

  "For map clauses on map-entering constructs, if any list item has a
   base pointer for which a corresponding pointer exists in the data
   environment upon entry to the region and either a new list item or
   the corresponding pointer is created in the device data environment
   on entry to the region, then:

   1. [Fortran] The corresponding pointer variable is associated with a
      pointer target that has the same rank and bounds as the pointer
      target of the original pointer, such that the corresponding list
      item can be accessed through the pointer in a target region.

   2. The corresponding pointer variable becomes an attached pointer for
      the corresponding list item."

With this patch you can write the above mappings, and the mapping nodes
used to map pointers to array sections (with descriptors) now look like
this:

  1) map(to: tvar%arrptr) -->

     GOMP_MAP_TO [implicit] *tvar%arrptr%data  (the array data)
     GOMP_MAP_TO_PSET tvar%arrptr              (the descriptor)
     GOMP_MAP_ATTACH_DETACH tvar%arrptr%data

  2) map(tofrom: tvar%arrptr(3:8)) -->

     GOMP_MAP_TOFROM *tvar%arrptr%data(3)      (size 8-3+1, etc.)
     GOMP_MAP_TO_PSET tvar%arrptr
     GOMP_MAP_ATTACH_DETACH tvar%arrptr%data   (bias 3, etc.)

In this case, we can determine in the front-end that the
whole-array/pointer mapping (1) is only needed to map the pointer -- so
we drop it entirely.  (Note also that we set -- early -- the
OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P flag for whole-array-via-pointer
mappings.  See below.)

In the middle end, we process mappings using the struct sibling-list
handling machinery by moving the "GOMP_MAP_TO_PSET" node from the middle
of the group of three mapping nodes to the proper sorted position after
the GOMP_MAP_STRUCT mapping:

  GOMP_MAP_STRUCT tvar (len: 1)
  GOMP_MAP_TO_PSET tvar%arrptr (size: 64, etc.)  <--. moved here
  [...]                                             |
  GOMP_MAP_TOFROM *tvar%arrptr%data(3)           ___|
  GOMP_MAP_ATTACH_DETACH tvar%arrptr%data

In another case, if we have an array of derived-type values "dtarr", and
mappings like:

  i = 1
  j = 1
  map(to: dtarr(i)%arrptr) map(tofrom: dtarr(j)%arrptr(3:8))

We still map the same way, but this time we cannot prove in the
front-end that the base expressions "dtarr(i)" and "dtarr(j)" are the
same.  So we keep both mappings, but we move the "[implicit]" mapping of
the full-array reference to the end of the clause list in gimplify.cc (by
adjusting the topological sorting algorithm):

  GOMP_MAP_STRUCT dtarr (len: 2)
  GOMP_MAP_TO_PSET dtarr(i)%arrptr
  GOMP_MAP_TO_PSET dtarr(j)%arrptr
  [...]
  GOMP_MAP_TOFROM *dtarr(j)%arrptr%data(3)  (size: 8-3+1)
  GOMP_MAP_ATTACH_DETACH dtarr(j)%arrptr%data
  GOMP_MAP_TO [implicit] *dtarr(i)%arrptr%data(1)  (size: whole array)
  GOMP_MAP_ATTACH_DETACH dtarr(i)%arrptr%data

Always moving "[implicit]" full-array mappings after array-section
mappings (without that bit set) means that we'll avoid copying the whole
array unnecessarily -- even in cases where we can't prove that the arrays
are the same.

The patch also fixes some bugs with "enter data" and "exit data"
directives with this new mapping arrangement.  Also now if you have
mappings like this:

  !$omp target enter data map(to: dv, dv%arr(1:20))

The whole of the derived-type variable "dv" is mapped, so the
GOMP_MAP_TO_PSET for the array-section mapping can be dropped:

  GOMP_MAP_TO dv
  GOMP_MAP_TO *dv%arr%data
  GOMP_MAP_TO_PSET dv%arr  <-- deleted (array section mapping)
  GOMP_MAP_ATTACH_DETACH dv%arr%data

To accommodate recent changes to mapping nodes made by Tobias, this
version of the patch avoids using GOMP_MAP_TO_PSET for "exit data"
directives, in favour of using the "correct"
GOMP_MAP_RELEASE/GOMP_MAP_DELETE kinds during early expansion.
[PATCH 09/14] OpenMP/OpenACC: Unordered/non-constant component offset runtime diagnostic
This patch adds support for non-constant component offsets in "map"
clauses for OpenMP (and the equivalents for OpenACC), which cannot be
sorted into order at compile time.  Normally struct accesses in such
clauses are gathered together and sorted into increasing address order
after a "GOMP_MAP_STRUCT" node: if we have variable indices, that is no
longer possible.

This version of the patch scales back the previously-posted version to
merely add a diagnostic for incorrect usage of component accesses with
variably-indexed arrays of structs: the only permitted variant is where
we have multiple indices that are the same, but we could not prove so at
compile time.  Rather than silently producing the wrong result for cases
where the indices are in fact different, we error out (e.g.,
"map(dtarr(i)%arrptr, dtarr(j)%arrptr(4:8))", for different i/j).

For now, multiple *constant* array indices are still supported (see
map-arrayofstruct-1.c).  That could perhaps be addressed with a follow-up
patch, if necessary.

This version of the patch renumbers the GOMP_MAP_STRUCT_UNORD kind to
avoid clashing with the OpenACC "non-contiguous" dynamic array support.

2023-06-16  Julian Brown

gcc/fortran/
	* trans-openmp.cc (gfc_omp_deep_map_kind_p): Add
	GOMP_MAP_STRUCT_UNORD.

gcc/
	* gimplify.cc (extract_base_bit_offset): Add VARIABLE_OFFSET
	parameter.
	(omp_get_attachment, omp_group_last, omp_group_base,
	omp_directive_maps_explicitly): Add GOMP_MAP_STRUCT_UNORD support.
	(omp_accumulate_sibling_list): Update calls to
	extract_base_bit_offset.  Support GOMP_MAP_STRUCT_UNORD.
	(omp_build_struct_sibling_lists, gimplify_scan_omp_clauses,
	gimplify_adjust_omp_clauses, gimplify_omp_target_update): Add
	GOMP_MAP_STRUCT_UNORD support.
	* omp-low.cc (lower_omp_target): Add GOMP_MAP_STRUCT_UNORD support.
	* tree-pretty-print.cc (dump_omp_clause): Likewise.

include/
	* gomp-constants.h (gomp_map_kind): Add GOMP_MAP_STRUCT_UNORD.

libgomp/ * oacc-mem.c (find_group_last, goacc_enter_data_internal, goacc_exit_data_internal, GOACC_enter_exit_data): Add GOMP_MAP_STRUCT_UNORD support. * target.c (gomp_map_vars_internal): Add GOMP_MAP_STRUCT_UNORD support. Detect incorrect use of variable indexing of arrays of structs. (GOMP_target_enter_exit_data, gomp_target_task_fn): Add GOMP_MAP_STRUCT_UNORD support. * testsuite/libgomp.c-c++-common/map-arrayofstruct-1.c: New test. * testsuite/libgomp.c-c++-common/map-arrayofstruct-2.c: New test. * testsuite/libgomp.c-c++-common/map-arrayofstruct-3.c: New test. * testsuite/libgomp.fortran/map-subarray-5.f90: New test. --- gcc/fortran/trans-openmp.cc | 1 + gcc/gimplify.cc | 110 ++ gcc/omp-low.cc| 1 + gcc/tree-pretty-print.cc | 3 + include/gomp-constants.h | 6 + libgomp/oacc-mem.c| 6 +- libgomp/target.c | 60 +- .../map-arrayofstruct-1.c | 38 ++ .../map-arrayofstruct-2.c | 58 + .../map-arrayofstruct-3.c | 68 +++ .../libgomp.fortran/map-subarray-5.f90| 54 + 11 files changed, 378 insertions(+), 27 deletions(-) create mode 100644 libgomp/testsuite/libgomp.c-c++-common/map-arrayofstruct-1.c create mode 100644 libgomp/testsuite/libgomp.c-c++-common/map-arrayofstruct-2.c create mode 100644 libgomp/testsuite/libgomp.c-c++-common/map-arrayofstruct-3.c create mode 100644 libgomp/testsuite/libgomp.fortran/map-subarray-5.f90 diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc index a108f718ffa..1a14d2bc068 100644 --- a/gcc/fortran/trans-openmp.cc +++ b/gcc/fortran/trans-openmp.cc @@ -2961,6 +2961,7 @@ gfc_omp_deep_map_kind_p (tree clause) case GOMP_MAP_FORCE_TOFROM: case GOMP_MAP_USE_DEVICE_PTR_IF_PRESENT: case GOMP_MAP_STRUCT: +case GOMP_MAP_STRUCT_UNORD: case GOMP_MAP_ALWAYS_POINTER: case GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION: case GOMP_MAP_DELETE_ZERO_LEN_ARRAY_SECTION: diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc index da81582da1c..9ce1f5b983a 100644 --- a/gcc/gimplify.cc +++ b/gcc/gimplify.cc @@ -8952,7 +8952,8 @@ 
build_omp_struct_comp_nodes (enum tree_code code, tree grp_start, tree grp_end, static tree extract_base_bit_offset (tree base, poly_int64 *bitposp, -poly_offset_int *poffsetp) +poly_offset_int *poffsetp, +bool *variable_offset) { tree offset; poly_int64 bitsize, bitpos; @@ -8970,10 +8971,13 @@ extract_base_bit_offset (tree base, poly_int64 *bitposp, if (offset && poly_int_tree_p (offset)) { poffset = wi::to_poly_offset (offset); -
[PATCH 14/14] OpenACC: Improve implicit mapping for non-lexically nested offload regions
This patch enables use of the OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P flag for
OpenACC.

This allows code like this to work correctly:

  int arr[100];
  [...]
  #pragma acc enter data copyin(arr[20:10])

  /* No explicit mapping of 'arr' here.  */
  #pragma acc parallel
  { /* use of arr[20:10]... */ }

  #pragma acc exit data copyout(arr[20:10])

Otherwise, the implicit "copy" ("present_or_copy") on the parallel
corresponds to the whole array, and that fails at runtime when only the
subarray is mapped.

The numbering of the GOMP_MAP_IMPLICIT bit clashes with the OpenACC
"non-contiguous" dynamic array support, so the GOMP_MAP_NONCONTIG_ARRAY_P
macro has been adjusted to account for that.

This behaviour relates to upstream OpenACC issue 490 (not yet resolved).

2023-06-16  Julian Brown

gcc/
	* gimplify.cc (gimplify_adjust_omp_clauses_1): Set
	OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P for OpenACC also.

gcc/testsuite/
	* c-c++-common/goacc/combined-reduction.c: Adjust scan output.
	* c-c++-common/goacc/reduction-1.c: Likewise.
	* c-c++-common/goacc/reduction-2.c: Likewise.
	* c-c++-common/goacc/reduction-3.c: Likewise.
	* c-c++-common/goacc/reduction-4.c: Likewise.
	* c-c++-common/goacc/reduction-10.c: Likewise.
	* gfortran.dg/goacc/loop-tree-1.f90: Likewise.

include/
	* gomp-constants.h (GOMP_MAP_NONCONTIG_ARRAY_P): Tweak condition.

libgomp/
	* testsuite/libgomp.oacc-c-c++-common/implicit-mapping-1.c: New test.

--- gcc/gimplify.cc | 5 +--- .../c-c++-common/goacc/combined-reduction.c | 2 +- .../c-c++-common/goacc/reduction-1.c | 4 ++-- .../c-c++-common/goacc/reduction-10.c | 9 +++ .../c-c++-common/goacc/reduction-2.c | 4 ++-- .../c-c++-common/goacc/reduction-3.c | 4 ++-- .../c-c++-common/goacc/reduction-4.c | 4 ++-- .../gfortran.dg/goacc/loop-tree-1.f90 | 2 +- include/gomp-constants.h | 3 ++- .../implicit-mapping-1.c | 24 +++ 10 files changed, 42 insertions(+), 19 deletions(-) create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/implicit-mapping-1.c diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc index 0706f130ebb..1e90d2ed031 100644 --- a/gcc/gimplify.cc +++ b/gcc/gimplify.cc @@ -13413,10 +13413,7 @@ gimplify_adjust_omp_clauses_1 (splay_tree_node n, void *data) gcc_unreachable (); } OMP_CLAUSE_SET_MAP_KIND (clause, kind); - /* Setting of the implicit flag for the runtime is currently disabled for -OpenACC. */ - if ((gimplify_omp_ctxp->region_type & ORT_ACC) == 0) - OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P (clause) = 1; + OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P (clause) = 1; if (DECL_SIZE (decl) && TREE_CODE (DECL_SIZE (decl)) != INTEGER_CST) { diff --git a/gcc/testsuite/c-c++-common/goacc/combined-reduction.c b/gcc/testsuite/c-c++-common/goacc/combined-reduction.c index ecf23f59d66..40b93acc9ea 100644 --- a/gcc/testsuite/c-c++-common/goacc/combined-reduction.c +++ b/gcc/testsuite/c-c++-common/goacc/combined-reduction.c @@ -25,5 +25,5 @@ main () /* { dg-final { scan-tree-dump-times "omp target oacc_parallel reduction.+:v1. map.tofrom:v1" 1 "gimple" } } */ /* { dg-final { scan-tree-dump-times "acc loop reduction.+:v1. private.i." 1 "gimple" } } */ -/* { dg-final { scan-tree-dump-times "omp target oacc_kernels map.force_tofrom:n .len: 4.. map.force_tofrom:v1 .len: 4.." 1 "gimple" } } */ +/* { dg-final { scan-tree-dump-times "omp target oacc_kernels map.force_tofrom:n .len: 4..implicit.. map.force_tofrom:v1 .len: 4..implicit.." 
1 "gimple" } } */ /* { dg-final { scan-tree-dump-times "acc loop reduction.+:v1. private.i." 1 "gimple" } } */ diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-1.c b/gcc/testsuite/c-c++-common/goacc/reduction-1.c index 35bfc868708..d9e3c380b8e 100644 --- a/gcc/testsuite/c-c++-common/goacc/reduction-1.c +++ b/gcc/testsuite/c-c++-common/goacc/reduction-1.c @@ -68,5 +68,5 @@ main(void) } /* Check that default copy maps are generated for loop reductions. */ -/* { dg-final { scan-tree-dump-times "map\\(tofrom:result \\\[len: \[0-9\]+\\\]\\)" 7 "gimple" } } */ -/* { dg-final { scan-tree-dump-times "map\\(tofrom:lresult \\\[len: \[0-9\]+\\\]\\)" 2 "gimple" } } */ +/* { dg-final { scan-tree-dump-times "map\\(tofrom:result \\\[len: \[0-9\]+\\\]\\\[implicit\\\]\\)" 7 "gimple" } } */ +/* { dg-final { scan-tree-dump-times "map\\(tofrom:lresult \\\[len: \[0-9\]+\\\]\\\[implicit\\\]\\)" 2 "gimple" } } */ diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-10.c b/gcc/testsuite/c-c++-common/goacc/reduction-10.c index 579aa561479..36c330e9267 100644 --- a/gcc/testsuite/c-c++-common/goacc/reduction-10.c +++ b/gcc/testsuite/c-c++-common/goacc/reduction-10.c @@ -87,7 +87,8 @@ main(void) /* Check that default copy maps are generated for loop reductions. */ /* {
[PATCH 13/14] OpenACC: Allow implicit uses of assumed-size arrays in offload regions
This patch reimplements the functionality of the previously-reverted patch "Assumed-size arrays with non-lexical data mappings". The purpose is to support implicit uses of assumed-size arrays for Fortran when those arrays have already been mapped on the target some other way (e.g. by "acc enter data"). This relates to upstream OpenACC issue 489 (not yet resolved). 2023-06-16 Julian Brown gcc/fortran/ * trans-openmp.cc (gfc_omp_finish_clause): Treat implicitly-mapped assumed-size arrays as zero-sized for OpenACC, rather than an error. gcc/testsuite/ * gfortran.dg/goacc/assumed-size.f90: Don't expect error. libgomp/ * testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-1.f90: New test. * testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-2.f90: New test. --- gcc/fortran/trans-openmp.cc | 16 ++-- .../gfortran.dg/goacc/assumed-size.f90| 4 +- .../nonlexical-assumed-size-1.f90 | 28 + .../nonlexical-assumed-size-2.f90 | 40 +++ 4 files changed, 82 insertions(+), 6 deletions(-) create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-1.f90 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-2.f90 diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc index 819d79cda28..230cebf250b 100644 --- a/gcc/fortran/trans-openmp.cc +++ b/gcc/fortran/trans-openmp.cc @@ -1587,6 +1587,7 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p, bool openacc) return; tree decl = OMP_CLAUSE_DECL (c); + bool assumed_size = false; /* Assumed-size arrays can't be mapped implicitly, they have to be mapped explicitly using array sections. 
*/ @@ -1597,9 +1598,14 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p, bool openacc) GFC_TYPE_ARRAY_RANK (TREE_TYPE (decl)) - 1) == NULL) { - error_at (OMP_CLAUSE_LOCATION (c), - "implicit mapping of assumed size array %qD", decl); - return; + if (openacc) + assumed_size = true; + else + { + error_at (OMP_CLAUSE_LOCATION (c), + "implicit mapping of assumed size array %qD", decl); + return; + } } if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FORCE_DEVICEPTR) @@ -1654,7 +1660,9 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p, bool openacc) else { OMP_CLAUSE_DECL (c) = decl; - OMP_CLAUSE_SIZE (c) = NULL_TREE; + OMP_CLAUSE_SIZE (c) = assumed_size ? size_zero_node : NULL_TREE; + if (assumed_size) + OMP_CLAUSE_MAP_MAYBE_ZERO_LENGTH_ARRAY_SECTION (c) = 1; } if (TREE_CODE (TREE_TYPE (orig_decl)) == REFERENCE_TYPE && (GFC_DECL_GET_SCALAR_POINTER (orig_decl) diff --git a/gcc/testsuite/gfortran.dg/goacc/assumed-size.f90 b/gcc/testsuite/gfortran.dg/goacc/assumed-size.f90 index 4fced2e70c9..12f44c4743a 100644 --- a/gcc/testsuite/gfortran.dg/goacc/assumed-size.f90 +++ b/gcc/testsuite/gfortran.dg/goacc/assumed-size.f90 @@ -4,7 +4,8 @@ ! exit data, respectively. ! This does not appear to be supported by the OpenACC standard as of version -! 3.0. Check for an appropriate error message. +! 3.0. There is however real-world code that relies on this working, so we +! make an attempt to support it. program test implicit none @@ -26,7 +27,6 @@ subroutine dtest (a, n) !$acc enter data copyin(a(1:n)) !$acc parallel loop -! { dg-error {implicit mapping of assumed size array 'a'} "" { target *-*-* } .-1 } do i = 1, n a(i) = i end do diff --git a/libgomp/testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-1.f90 new file mode 100644 index 000..4b61e1cee9b --- /dev/null +++ b/libgomp/testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-1.f90 @@ -0,0 +1,28 @@ +! 
{ dg-do run } + +program p +implicit none +integer :: myarr(10) + +myarr = 0 + +call subr(myarr) + +if (myarr(5).ne.5) stop 1 + +contains + +subroutine subr(arr) +implicit none +integer :: arr(*) + +!$acc enter data copyin(arr(1:10)) + +!$acc serial +arr(5) = 5 +!$acc end serial + +!$acc exit data copyout(arr(1:10)) + +end subroutine subr +end program p diff --git a/libgomp/testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-2.f90 b/libgomp/testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-2.f90 new file mode 100644 index 000..daf7089915f --- /dev/null +++ b/libgomp/testsuite/libgomp.oacc-fortran/nonlexical-assumed-size-2.f90 @@ -0,0 +1,40 @@ +! { dg-do run } + +program p +implicit none +integer :: myarr(10) + +myarr = 0 + +call subr(myarr) + +if (myarr(5).ne.5) stop 1 + +contains + +subroutine subr(arr) +implicit none +integer :: arr(*) + +! At first glance, it might not be obvious how this works. The "enter data" +! and "exit data" operations expand to a pair
[PATCH 05/14] OpenMP/OpenACC: Reindent TO/FROM/_CACHE_ stanza in {c_}finish_omp_clause
This patch trivially adds braces and reindents the OMP_CLAUSE_TO/OMP_CLAUSE_FROM/OMP_CLAUSE__CACHE_ stanza in c_finish_omp_clause and finish_omp_clause, in preparation for the following patch (to clarify the diff a little). 2022-09-13 Julian Brown gcc/c/ * c-typeck.cc (c_finish_omp_clauses): Add braces and reindent OMP_CLAUSE_TO/OMP_CLAUSE_FROM/OMP_CLAUSE__CACHE_ stanza. gcc/cp/ * semantics.cc (finish_omp_clause): Add braces and reindent OMP_CLAUSE_TO/OMP_CLAUSE_FROM/OMP_CLAUSE__CACHE_ stanza. --- gcc/c/c-typeck.cc | 615 +- gcc/cp/semantics.cc | 788 ++-- 2 files changed, 707 insertions(+), 696 deletions(-) diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc index 9591d67251e..2cfe2174bab 100644 --- a/gcc/c/c-typeck.cc +++ b/gcc/c/c-typeck.cc @@ -15520,321 +15520,326 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort) case OMP_CLAUSE_TO: case OMP_CLAUSE_FROM: case OMP_CLAUSE__CACHE_: - t = OMP_CLAUSE_DECL (c); - if (TREE_CODE (t) == TREE_LIST) - { - grp_start_p = pc; - grp_sentinel = OMP_CLAUSE_CHAIN (c); + { + t = OMP_CLAUSE_DECL (c); + if (TREE_CODE (t) == TREE_LIST) + { + grp_start_p = pc; + grp_sentinel = OMP_CLAUSE_CHAIN (c); - if (handle_omp_array_sections (c, ort)) - remove = true; - else - { - t = OMP_CLAUSE_DECL (c); - if (!omp_mappable_type (TREE_TYPE (t))) - { - error_at (OMP_CLAUSE_LOCATION (c), - "array section does not have mappable type " - "in %qs clause", - omp_clause_code_name[OMP_CLAUSE_CODE (c)]); - remove = true; - } - else if (TYPE_ATOMIC (TREE_TYPE (t))) - { - error_at (OMP_CLAUSE_LOCATION (c), - "%<_Atomic%> %qE in %qs clause", t, - omp_clause_code_name[OMP_CLAUSE_CODE (c)]); - remove = true; - } - while (TREE_CODE (t) == ARRAY_REF) - t = TREE_OPERAND (t, 0); - if (TREE_CODE (t) == COMPONENT_REF - && TREE_CODE (TREE_TYPE (t)) == ARRAY_TYPE) - { - do - { - t = TREE_OPERAND (t, 0); - if (TREE_CODE (t) == MEM_REF - || TREE_CODE (t) == INDIRECT_REF) - { - t = TREE_OPERAND (t, 0); - STRIP_NOPS (t); - if (TREE_CODE (t) == 
POINTER_PLUS_EXPR) - t = TREE_OPERAND (t, 0); - } - } - while (TREE_CODE (t) == COMPONENT_REF -|| TREE_CODE (t) == ARRAY_REF); - - if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP - && OMP_CLAUSE_MAP_IMPLICIT (c) - && (bitmap_bit_p (_head, DECL_UID (t)) - || bitmap_bit_p (_field_head, DECL_UID (t)) - || bitmap_bit_p (_firstprivate_head, - DECL_UID (t - { - remove = true; - break; - } - if (bitmap_bit_p (_field_head, DECL_UID (t))) - break; - if (bitmap_bit_p (_head, DECL_UID (t))) - { - if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_MAP) - error_at (OMP_CLAUSE_LOCATION (c), - "%qD appears more than once in motion " - "clauses", t); - else if (ort == C_ORT_ACC) - error_at (OMP_CLAUSE_LOCATION (c), - "%qD appears more than once in data " - "clauses", t); - else - error_at (OMP_CLAUSE_LOCATION (c), - "%qD appears more than once in map " - "clauses", t); - remove = true; - } - else - { - bitmap_set_bit (_head, DECL_UID (t)); - bitmap_set_bit (_field_head, DECL_UID (t)); - } - } - } - if
[PATCH 07/14] OpenMP: implicitly map base pointer for array-section pointer components
Following from discussion in:

  https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570075.html

and:

  https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608100.html

and also upstream OpenMP issue 342, this patch changes mapping for array
sections of pointer components on compute regions like this:

  #pragma omp target map(s.ptr[0:10])
  {
    ...use of 's'...
  }

so the base pointer 's.ptr' is implicitly mapped, and thus pointer
attachment happens.

This is subtly different in the "enter data" case, e.g.:

  #pragma omp target enter data map(s.ptr[0:10])

if 's.ptr' (or the whole of 's') is not present on the target before the
directive is executed, the array section is copied to the target but
pointer attachment does *not* take place, since 's' (or 's.ptr') is not
mapped implicitly for "enter data".

To get a pointer attachment with "enter data", you can do, e.g.:

  #pragma omp target enter data map(s.ptr, s.ptr[0:10])

  #pragma omp target
  {
    ...implicit use of 's'...
  }

That is, once the attachment has happened, implicit mapping of 's' and
uses of 's.ptr[...]' work correctly in the target region.

ChangeLog

2022-12-12  Julian Brown

gcc/
	* gimplify.cc (omp_accumulate_sibling_list): Don't require
	explicitly-mapped base pointer for compute regions.

gcc/testsuite/
	* c-c++-common/gomp/target-implicit-map-2.c: Update expected scan
	output.

libgomp/
	* testsuite/libgomp.c-c++-common/target-implicit-map-2.c: Fix missing
	"free".
	* testsuite/libgomp.c-c++-common/target-implicit-map-3.c: New test.
	* testsuite/libgomp.c-c++-common/target-map-zlas-1.c: New test.
	* testsuite/libgomp.c/target-22.c: Remove explicit base pointer
	mappings.

--- gcc/gimplify.cc | 9 ++-- .../c-c++-common/gomp/target-implicit-map-2.c | 3 +- .../target-implicit-map-2.c | 2 + .../target-implicit-map-5.c | 50 +++ .../libgomp.c-c++-common/target-map-zlas-1.c | 36 + libgomp/testsuite/libgomp.c/target-22.c | 3 +- 6 files changed, 97 insertions(+), 6 deletions(-) create mode 100644 libgomp/testsuite/libgomp.c-c++-common/target-implicit-map-5.c create mode 100644 libgomp/testsuite/libgomp.c-c++-common/target-map-zlas-1.c diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc index 9be5d9c5328..6a43c792450 100644 --- a/gcc/gimplify.cc +++ b/gcc/gimplify.cc @@ -10696,6 +10696,7 @@ omp_accumulate_sibling_list (enum omp_region_type region_type, poly_int64 cbitpos; tree ocd = OMP_CLAUSE_DECL (grp_end); bool openmp = !(region_type & ORT_ACC); + bool target = (region_type & ORT_TARGET) != 0; tree *continue_at = NULL; while (TREE_CODE (ocd) == ARRAY_REF) @@ -10800,9 +10801,9 @@ omp_accumulate_sibling_list (enum omp_region_type region_type, } /* For OpenMP semantics, we don't want to implicitly allocate -space for the pointer here. A FRAGILE_P node is only being -created so that omp-low.cc is able to rewrite the struct -properly. +space for the pointer here for non-compute regions (e.g. "enter +data"). A FRAGILE_P node is only being created so that +omp-low.cc is able to rewrite the struct properly. For references (to pointers), we want to actually allocate the space for the reference itself in the sorted list following the struct node. @@ -10810,6 +10811,7 @@ omp_accumulate_sibling_list (enum omp_region_type region_type, mapping of the attachment point, but not otherwise. 
*/ if (*fragile_p || (openmp + && !target && attach_detach && TREE_CODE (TREE_TYPE (ocd)) == POINTER_TYPE && !OMP_CLAUSE_ATTACHMENT_MAPPING_ERASED (grp_end))) @@ -11122,6 +11124,7 @@ omp_accumulate_sibling_list (enum omp_region_type region_type, if (*fragile_p || (openmp + && !target && attach_detach && TREE_CODE (TREE_TYPE (ocd)) == POINTER_TYPE && !OMP_CLAUSE_ATTACHMENT_MAPPING_ERASED (grp_end))) diff --git a/gcc/testsuite/c-c++-common/gomp/target-implicit-map-2.c b/gcc/testsuite/c-c++-common/gomp/target-implicit-map-2.c index 5ba1d7efe08..72df5b1 100644 --- a/gcc/testsuite/c-c++-common/gomp/target-implicit-map-2.c +++ b/gcc/testsuite/c-c++-common/gomp/target-implicit-map-2.c @@ -49,4 +49,5 @@ main (void) /* { dg-final { scan-tree-dump {#pragma omp target num_teams.* map\(tofrom:a \[len: [0-9]+\]\[implicit\]\)} "gimple" } } */ -/* { dg-final { scan-tree-dump {#pragma omp target num_teams.* map\(struct:a \[len: 1\]\) map\(alloc:a\.ptr \[len: 0\]\) map\(tofrom:\*_[0-9]+ \[len: [0-9]+\]\) map\(attach:a\.ptr \[bias: 0\]\)} "gimple" } } */ +/* { dg-final { scan-tree-dump {#pragma omp target num_teams.* map\(struct:a \[len:
[PATCH 03/14] Revert "Fix implicit mapping for array slices on lexically-enclosing data constructs (PR70828)"
This reverts commit a84b89b8f070f1efe86ea347e98d57e6bc32ae2d.

Relevant tests are temporarily disabled or XFAILed.

2023-06-16  Julian Brown

gcc/
	Revert:
	* gimplify.cc (oacc_array_mapping_info): New struct.
	(gimplify_omp_ctx): Add decl_data_clause hash map.
	(new_omp_context): Zero-initialise above.
	(delete_omp_context): Delete above if allocated.
	(gimplify_scan_omp_clauses): Scan for array mappings on data
	constructs, and record in above map.
	(gomp_oacc_needs_data_present): New function.
	(gimplify_adjust_omp_clauses_1): Handle data mappings (e.g. array
	slices) declared in lexically-enclosing data constructs.
	* omp-low.cc (lower_omp_target): Allow decl for bias not to be
	present in OpenACC context.

gcc/fortran/
	Revert:
	* trans-openmp.cc: Handle implicit "present".

gcc/testsuite/
	* c-c++-common/goacc/acc-data-chain.c: Partly disable test.
	* gfortran.dg/goacc/pr70828.f90: Likewise.

libgomp/
	* testsuite/libgomp.oacc-c-c++-common/pr70828.c: XFAIL test.
	* testsuite/libgomp.oacc-c-c++-common/pr70828-2.c: XFAIL test.
	* testsuite/libgomp.oacc-fortran/pr70828.f90: XFAIL test.
	* testsuite/libgomp.oacc-fortran/pr70828-2.f90: XFAIL test.
	* testsuite/libgomp.oacc-fortran/pr70828-3.f90: XFAIL test.
	* testsuite/libgomp.oacc-fortran/pr70828-4.f90: XFAIL test.
	* testsuite/libgomp.oacc-fortran/pr70828-5.f90: XFAIL test.
	* testsuite/libgomp.oacc-fortran/pr70828-6.f90: XFAIL test.

--- gcc/fortran/trans-openmp.cc | 10 +- gcc/gimplify.cc | 143 +- gcc/omp-low.cc| 10 +- .../c-c++-common/goacc/acc-data-chain.c | 4 +- gcc/testsuite/gfortran.dg/goacc/pr70828.f90 | 3 +- .../libgomp.oacc-c-c++-common/pr70828-2.c | 2 + .../libgomp.oacc-c-c++-common/pr70828.c | 2 + .../libgomp.oacc-fortran/pr70828-2.f90| 2 + .../libgomp.oacc-fortran/pr70828-3.f90| 2 + .../libgomp.oacc-fortran/pr70828-4.f90| 2 + .../libgomp.oacc-fortran/pr70828-5.f90| 2 + .../libgomp.oacc-fortran/pr70828-6.f90| 2 + .../libgomp.oacc-fortran/pr70828.f90 | 2 + 13 files changed, 28 insertions(+), 158 deletions(-) diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc index 96e91a3bc50..809b96bc220 100644 --- a/gcc/fortran/trans-openmp.cc +++ b/gcc/fortran/trans-openmp.cc @@ -1587,13 +1587,9 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p, bool openacc) tree decl = OMP_CLAUSE_DECL (c); - /* Assumed-size arrays can't be mapped implicitly, they have to be mapped - explicitly using array sections. An exception is if the array is - mapped explicitly in an enclosing data construct for OpenACC, in which - case we see GOMP_MAP_FORCE_PRESENT here and do not need to raise an - error. */ - if (OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_FORCE_PRESENT - && TREE_CODE (decl) == PARM_DECL + /* Assumed-size arrays can't be mapped implicitly, they have to be + mapped explicitly using array sections. */ + if (TREE_CODE (decl) == PARM_DECL && GFC_ARRAY_TYPE_P (TREE_TYPE (decl)) && GFC_TYPE_ARRAY_AKIND (TREE_TYPE (decl)) == GFC_ARRAY_UNKNOWN && GFC_TYPE_ARRAY_UBOUND (TREE_TYPE (decl), diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc index 80f1f3a657f..e3384c7f65b 100644 --- a/gcc/gimplify.cc +++ b/gcc/gimplify.cc @@ -218,17 +218,6 @@ enum gimplify_defaultmap_kind GDMK_POINTER }; -/* Used to record clauses representing array slices on data directives that - may affect implicit mapping semantics on enclosed OpenACC parallel/kernels - regions. 
PSET is used for Fortran array slices with array descriptors, - or NULL otherwise. */ -struct oacc_array_mapping_info -{ - tree mapping; - tree pset; - tree pointer; -}; - struct gimplify_omp_ctx { struct gimplify_omp_ctx *outer_context; @@ -250,7 +239,6 @@ struct gimplify_omp_ctx bool in_for_exprs; bool ompacc; int defaultmap[5]; - hash_map *decl_data_clause; }; struct privatize_reduction @@ -485,7 +473,6 @@ new_omp_context (enum omp_region_type region_type) c->defaultmap[GDMK_AGGREGATE] = GOVD_MAP; c->defaultmap[GDMK_ALLOCATABLE] = GOVD_MAP; c->defaultmap[GDMK_POINTER] = GOVD_MAP; - c->decl_data_clause = NULL; return c; } @@ -498,8 +485,6 @@ delete_omp_context (struct gimplify_omp_ctx *c) splay_tree_delete (c->variables); delete c->privatized_types; c->loop_iter_var.release (); - if (c->decl_data_clause) -delete c->decl_data_clause; XDELETE (c); } @@ -11235,41 +11220,8 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p, case OMP_TARGET: break; case OACC_DATA: - { - tree base_ptr = OMP_CLAUSE_CHAIN (c); - tree pset = NULL; -
[PATCH 01/14] Revert "Assumed-size arrays with non-lexical data mappings"
This reverts commit 72733f6e6f6ec1bb9884fea8bfbebd3de03d9374. 2023-06-16 Julian Brown gcc/ Revert: * gimplify.cc (gimplify_adjust_omp_clauses_1): Raise error for assumed-size arrays in map clauses for Fortran/OpenMP. * omp-low.cc (lower_omp_target): Set the size of assumed-size Fortran arrays to one to allow use of data already mapped on the offload device. gcc/fortran/ Revert: * trans-openmp.cc (gfc_omp_finish_clause): Change clauses mapping assumed-size arrays to use the GOMP_MAP_FORCE_PRESENT map type. --- gcc/fortran/trans-openmp.cc | 22 +- gcc/gimplify.cc | 14 -- gcc/omp-low.cc | 5 - 3 files changed, 9 insertions(+), 32 deletions(-) diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc index e8f3b24e5f8..e55c8292d05 100644 --- a/gcc/fortran/trans-openmp.cc +++ b/gcc/fortran/trans-openmp.cc @@ -1588,18 +1588,10 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p, bool openacc) tree decl = OMP_CLAUSE_DECL (c); /* Assumed-size arrays can't be mapped implicitly, they have to be mapped - explicitly using array sections. For OpenACC this restriction is lifted - if the array has already been mapped: - - - Using a lexically-enclosing data region: in that case we see the - GOMP_MAP_FORCE_PRESENT mapping kind here. - - - Using a non-lexical data mapping ("acc enter data"). - - In the latter case we change the mapping type to GOMP_MAP_FORCE_PRESENT. - This raises an error for OpenMP in the caller - (gimplify.c:gimplify_adjust_omp_clauses_1). OpenACC will raise a runtime - error if the implicitly-referenced assumed-size array is not mapped. */ + explicitly using array sections. An exception is if the array is + mapped explicitly in an enclosing data construct for OpenACC, in which + case we see GOMP_MAP_FORCE_PRESENT here and do not need to raise an + error. 
*/ if (OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_FORCE_PRESENT && TREE_CODE (decl) == PARM_DECL && GFC_ARRAY_TYPE_P (TREE_TYPE (decl)) @@ -1607,7 +1599,11 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p, bool openacc) && GFC_TYPE_ARRAY_UBOUND (TREE_TYPE (decl), GFC_TYPE_ARRAY_RANK (TREE_TYPE (decl)) - 1) == NULL) -OMP_CLAUSE_SET_MAP_KIND (c, GOMP_MAP_FORCE_PRESENT); +{ + error_at (OMP_CLAUSE_LOCATION (c), + "implicit mapping of assumed size array %qD", decl); + return; +} if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FORCE_DEVICEPTR) return; diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc index 09c596f026e..3729b986801 100644 --- a/gcc/gimplify.cc +++ b/gcc/gimplify.cc @@ -12828,26 +12828,12 @@ gimplify_adjust_omp_clauses_1 (splay_tree_node n, void *data) *list_p = clause; struct gimplify_omp_ctx *ctx = gimplify_omp_ctxp; gimplify_omp_ctxp = ctx->outer_context; - gomp_map_kind kind = (code == OMP_CLAUSE_MAP) ? OMP_CLAUSE_MAP_KIND (clause) - : (gomp_map_kind) GOMP_MAP_LAST; /* Don't call omp_finish_clause on implicitly added OMP_CLAUSE_PRIVATE in simd. Those are only added for the local vars inside of simd body and they don't need to be e.g. default constructible. */ if (code != OMP_CLAUSE_PRIVATE || ctx->region_type != ORT_SIMD) lang_hooks.decls.omp_finish_clause (clause, pre_p, (ctx->region_type & ORT_ACC) != 0); - /* Allow OpenACC to have implicit assumed-size arrays via FORCE_PRESENT, - which should work as long as the array has previously been mapped - explicitly on the target (e.g. by "enter data"). Raise an error for - OpenMP. 
*/ - if (lang_GNU_Fortran () - && code == OMP_CLAUSE_MAP - && (ctx->region_type & ORT_ACC) == 0 - && kind == GOMP_MAP_TOFROM - && OMP_CLAUSE_MAP_KIND (clause) == GOMP_MAP_FORCE_PRESENT) -error_at (OMP_CLAUSE_LOCATION (clause), - "implicit mapping of assumed size array %qD", - OMP_CLAUSE_DECL (clause)); if (gimplify_omp_ctxp) for (; clause != chain; clause = OMP_CLAUSE_CHAIN (clause)) if (OMP_CLAUSE_CODE (clause) == OMP_CLAUSE_MAP diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc index 3424eba2217..59143d8efe5 100644 --- a/gcc/omp-low.cc +++ b/gcc/omp-low.cc @@ -14353,11 +14353,6 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx) s = OMP_CLAUSE_SIZE (c); if (s == NULL_TREE) s = TYPE_SIZE_UNIT (TREE_TYPE (ovar)); - /* Fortran assumed-size arrays have zero size because the type is - incomplete. Set the size to one to allow the runtime to remap - any existing data that is already present on the accelerator. */ - if (s == NULL_TREE && is_gimple_omp_oacc (ctx->stmt)) - s = integer_one_node;
[PATCH 02/14] Revert "Fix references declared in lexically-enclosing OpenACC data region"
This reverts commit c9cd2bac6a5127a01c6f47e5636a926ac39b5e21. 2023-06-16 Julian Brown gcc/fortran/ Revert: * trans-openmp.cc (gfc_omp_finish_clause): Guard addition of clauses for pointers with DECL_P. gcc/ Revert: * gimplify.cc (oacc_array_mapping_info): Add REF field. (gimplify_scan_omp_clauses): Initialise above field for data blocks passed by reference. (gomp_oacc_needs_data_present): Handle references. (gimplify_adjust_omp_clauses_1): Handle references and optional arguments for variables declared in lexically-enclosing OpenACC data region. --- gcc/fortran/trans-openmp.cc | 2 +- gcc/gimplify.cc | 55 + 2 files changed, 8 insertions(+), 49 deletions(-) diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc index e55c8292d05..96e91a3bc50 100644 --- a/gcc/fortran/trans-openmp.cc +++ b/gcc/fortran/trans-openmp.cc @@ -1611,7 +1611,7 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p, bool openacc) tree c2 = NULL_TREE, c3 = NULL_TREE, c4 = NULL_TREE; tree present = gfc_omp_check_optional_argument (decl, true); tree orig_decl = NULL_TREE; - if (DECL_P (decl) && POINTER_TYPE_P (TREE_TYPE (decl))) + if (POINTER_TYPE_P (TREE_TYPE (decl))) { if (!gfc_omp_privatize_by_reference (decl) && !GFC_DECL_GET_SCALAR_POINTER (decl) diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc index 3729b986801..80f1f3a657f 100644 --- a/gcc/gimplify.cc +++ b/gcc/gimplify.cc @@ -227,7 +227,6 @@ struct oacc_array_mapping_info tree mapping; tree pset; tree pointer; - tree ref; }; struct gimplify_omp_ctx @@ -11248,9 +11247,6 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p, } if (base_ptr && OMP_CLAUSE_CODE (base_ptr) == OMP_CLAUSE_MAP - && !(OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP -&& (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ALLOC -|| OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_POINTER)) && OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_TO_PSET && ((OMP_CLAUSE_MAP_KIND (base_ptr) == GOMP_MAP_FIRSTPRIVATE_POINTER) @@ -11269,19 +11265,6 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq 
*pre_p, ai.mapping = unshare_expr (c); ai.pset = pset ? unshare_expr (pset) : NULL; ai.pointer = unshare_expr (base_ptr); - ai.ref = NULL_TREE; - if (TREE_CODE (base_addr) == INDIRECT_REF - && (TREE_CODE (TREE_TYPE (TREE_OPERAND (base_addr, 0))) - == REFERENCE_TYPE)) - { - base_addr = TREE_OPERAND (base_addr, 0); - tree ref_clause = OMP_CLAUSE_CHAIN (base_ptr); - gcc_assert ((OMP_CLAUSE_CODE (ref_clause) -== OMP_CLAUSE_MAP) - && (OMP_CLAUSE_MAP_KIND (ref_clause) - == GOMP_MAP_POINTER)); - ai.ref = unshare_expr (ref_clause); - } ctx->decl_data_clause->put (base_addr, ai); } if (TREE_CODE (TREE_TYPE (decl)) != ARRAY_TYPE) @@ -12464,15 +12447,11 @@ gomp_oacc_needs_data_present (tree decl) && gimplify_omp_ctxp->region_type != ORT_ACC_KERNELS) return NULL; - tree type = TREE_TYPE (decl); - if (TREE_CODE (type) == REFERENCE_TYPE) -type = TREE_TYPE (type); - - if (TREE_CODE (type) != ARRAY_TYPE - && TREE_CODE (type) != POINTER_TYPE - && TREE_CODE (type) != RECORD_TYPE - && (TREE_CODE (type) != POINTER_TYPE - || TREE_CODE (TREE_TYPE (type)) != ARRAY_TYPE)) + if (TREE_CODE (TREE_TYPE (decl)) != ARRAY_TYPE + && TREE_CODE (TREE_TYPE (decl)) != POINTER_TYPE + && TREE_CODE (TREE_TYPE (decl)) != RECORD_TYPE + && (TREE_CODE (TREE_TYPE (decl)) != POINTER_TYPE + || TREE_CODE (TREE_TYPE (TREE_TYPE (decl))) != ARRAY_TYPE)) return NULL; decl = get_base_address (decl); @@ -12626,12 +12605,6 @@ gimplify_adjust_omp_clauses_1 (splay_tree_node n, void *data) { tree mapping = array_info->mapping; tree pointer = array_info->pointer; - gomp_map_kind presence_kind = GOMP_MAP_FORCE_PRESENT; - bool no_alloc = (OMP_CLAUSE_CODE (mapping) == OMP_CLAUSE_MAP - && OMP_CLAUSE_MAP_KIND (mapping) == GOMP_MAP_IF_PRESENT); - - if (no_alloc || omp_check_optional_argument (decl, false)) -presence_kind = GOMP_MAP_IF_PRESENT; if (code == OMP_CLAUSE_FIRSTPRIVATE) /* Oops, we have the wrong type of clause. Rebuild it. */ @@ -12639,15 +12612,14 @@ gimplify_adjust_omp_clauses_1 (splay_tree_node n, void *data)
[PATCH 04/14] Revert "openmp: Handle C/C++ array reference base-pointers in array sections"
This reverts commit 3385743fd2fa15a2a750a29daf6d4f97f5aad0ae. 2023-06-16 Julian Brown Revert: 2022-02-24 Chung-Lin Tang gcc/c/ChangeLog: * c-typeck.cc (handle_omp_array_sections): Add handling for creating array-reference base-pointer attachment clause. gcc/cp/ChangeLog: * semantics.cc (handle_omp_array_sections): Add handling for creating array-reference base-pointer attachment clause. gcc/testsuite/ChangeLog: * c-c++-common/gomp/target-enter-data-1.c: Adjust testcase. libgomp/ChangeLog: * testsuite/libgomp.c-c++-common/ptr-attach-2.c: New test. --- gcc/c/c-typeck.cc | 27 + gcc/cp/semantics.cc | 28 + .../c-c++-common/gomp/target-enter-data-1.c | 3 +- .../libgomp.c-c++-common/ptr-attach-2.c | 60 --- 4 files changed, 3 insertions(+), 115 deletions(-) delete mode 100644 libgomp/testsuite/libgomp.c-c++-common/ptr-attach-2.c diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc index 450214556f9..9591d67251e 100644 --- a/gcc/c/c-typeck.cc +++ b/gcc/c/c-typeck.cc @@ -14113,10 +14113,6 @@ handle_omp_array_sections (tree c, enum c_omp_region_type ort) if (int_size_in_bytes (TREE_TYPE (first)) <= 0) maybe_zero_len = true; - struct dim { tree low_bound, length; }; - auto_vec dims (num); - dims.safe_grow (num); - for (i = num, t = OMP_CLAUSE_DECL (c); i > 0; t = TREE_CHAIN (t)) { @@ -14238,9 +14234,6 @@ handle_omp_array_sections (tree c, enum c_omp_region_type ort) else size = size_binop (MULT_EXPR, size, l); } - - dim d = { low_bound, length }; - dims[i] = d; } if (non_contiguous) { @@ -14288,23 +14281,6 @@ handle_omp_array_sections (tree c, enum c_omp_region_type ort) OMP_CLAUSE_DECL (c) = t; return false; } - - tree aref = t; - for (i = 0; i < dims.length (); i++) - { - if (dims[i].length && integer_onep (dims[i].length)) - { - tree lb = dims[i].low_bound; - aref = build_array_ref (OMP_CLAUSE_LOCATION (c), aref, lb); - } - else - { - if (TREE_CODE (TREE_TYPE (aref)) == POINTER_TYPE) - t = aref; - break; - } - } - first = c_fully_fold (first, false, NULL); OMP_CLAUSE_DECL (c) 
= first; if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_HAS_DEVICE_ADDR) @@ -14339,8 +14315,7 @@ handle_omp_array_sections (tree c, enum c_omp_region_type ort) break; } tree c2 = build_omp_clause (OMP_CLAUSE_LOCATION (c), OMP_CLAUSE_MAP); - if (TREE_CODE (t) == COMPONENT_REF || TREE_CODE (t) == ARRAY_REF - || TREE_CODE (t) == INDIRECT_REF) + if (TREE_CODE (t) == COMPONENT_REF) OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_ATTACH_DETACH); else OMP_CLAUSE_SET_MAP_KIND (c2, GOMP_MAP_FIRSTPRIVATE_POINTER); diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc index e7bda6fa060..93ff7cf5e1b 100644 --- a/gcc/cp/semantics.cc +++ b/gcc/cp/semantics.cc @@ -5605,10 +5605,6 @@ handle_omp_array_sections (tree c, enum c_omp_region_type ort) if (processing_template_decl && maybe_zero_len) return false; - struct dim { tree low_bound, length; }; - auto_vec dims (num); - dims.safe_grow (num); - for (i = num, t = OMP_CLAUSE_DECL (c); i > 0; t = TREE_CHAIN (t)) { @@ -5728,9 +5724,6 @@ handle_omp_array_sections (tree c, enum c_omp_region_type ort) else size = size_binop (MULT_EXPR, size, l); } - - dim d = { low_bound, length }; - dims[i] = d; } if (!processing_template_decl) { @@ -5782,24 +5775,6 @@ handle_omp_array_sections (tree c, enum c_omp_region_type ort) OMP_CLAUSE_DECL (c) = t; return false; } - - tree aref = t; - for (i = 0; i < dims.length (); i++) - { - if (dims[i].length && integer_onep (dims[i].length)) - { - tree lb = dims[i].low_bound; - aref = convert_from_reference (aref); - aref = build_array_ref (OMP_CLAUSE_LOCATION (c), aref, lb); - } - else - { - if (TREE_CODE (TREE_TYPE (aref)) == POINTER_TYPE) - t = aref; - break; - } - } - OMP_CLAUSE_DECL (c) = first; if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_HAS_DEVICE_ADDR) return false; @@ -5841,8 +5816,7 @@ handle_omp_array_sections (tree c, enum c_omp_region_type ort) bool reference_always_pointer = true; tree c2 = build_omp_clause (OMP_CLAUSE_LOCATION (c), OMP_CLAUSE_MAP); -
[PATCH 00/14] [og13] OpenMP/OpenACC: map clause and OMP gimplify rework
This series (for the og13 branch) is a rebased and merged version of the first few patches of the series previously sent upstream for mainline: https://gcc.gnu.org/pipermail/gcc-patches/2022-December/609031.html The series contains patches 1-6 and the parts of 8 ("C++ "declare mapper" support) that pertain to reorganisation of gimplify.cc:gimplify_{scan,adjust}_omp_clauses. The series also contains reversions and rewrites of several patches that needed adjustment in order to fit in with the new clause-processing arrangements. Tested with offloading to AMD GCN. I will apply shortly. Thanks, Julian Julian Brown (14): Revert "Assumed-size arrays with non-lexical data mappings" Revert "Fix references declared in lexically-enclosing OpenACC data region" Revert "Fix implicit mapping for array slices on lexically-enclosing data constructs (PR70828)" Revert "openmp: Handle C/C++ array reference base-pointers in array sections" OpenMP/OpenACC: Reindent TO/FROM/_CACHE_ stanza in {c_}finish_omp_clause OpenMP/OpenACC: Rework clause expansion and nested struct handling OpenMP: implicitly map base pointer for array-section pointer components OpenMP: Pointers and member mappings OpenMP/OpenACC: Unordered/non-constant component offset runtime diagnostic OpenMP/OpenACC: Reorganise OMP map clause handling in gimplify.cc OpenACC: Reimplement "inheritance" for lexically-nested offload regions OpenACC: "declare create" fixes wrt. 
"allocatable" variables OpenACC: Allow implicit uses of assumed-size arrays in offload regions OpenACC: Improve implicit mapping for non-lexically nested offload regions gcc/c-family/c-common.h | 74 +- gcc/c-family/c-omp.cc | 837 - gcc/c/c-parser.cc | 17 +- gcc/c/c-typeck.cc | 773 ++-- gcc/cp/parser.cc | 17 +- gcc/cp/pt.cc |4 +- gcc/cp/semantics.cc | 1065 +++--- gcc/fortran/dependency.cc | 128 + gcc/fortran/dependency.h |1 + gcc/fortran/gfortran.h|1 + gcc/fortran/trans-openmp.cc | 376 +- gcc/gimplify.cc | 2239 gcc/omp-general.cc| 424 +++ gcc/omp-general.h | 69 + gcc/omp-low.cc| 23 +- .../c-c++-common/goacc/acc-data-chain.c |2 +- .../c-c++-common/goacc/combined-reduction.c |2 +- .../c-c++-common/goacc/reduction-1.c |4 +- .../c-c++-common/goacc/reduction-10.c |9 +- .../c-c++-common/goacc/reduction-2.c |4 +- .../c-c++-common/goacc/reduction-3.c |4 +- .../c-c++-common/goacc/reduction-4.c |4 +- gcc/testsuite/c-c++-common/gomp/clauses-2.c |2 +- gcc/testsuite/c-c++-common/gomp/target-50.c |2 +- .../c-c++-common/gomp/target-enter-data-1.c |4 +- .../c-c++-common/gomp/target-implicit-map-2.c |3 +- .../g++.dg/gomp/static-component-1.C | 23 + gcc/testsuite/gcc.dg/gomp/target-3.c |2 +- .../gfortran.dg/goacc/assumed-size.f90| 35 + .../gfortran.dg/goacc/loop-tree-1.f90 |2 +- gcc/testsuite/gfortran.dg/gomp/map-12.f90 |2 +- gcc/testsuite/gfortran.dg/gomp/map-9.f90 |2 +- .../gfortran.dg/gomp/map-subarray-2.f90 | 57 + .../gfortran.dg/gomp/map-subarray.f90 | 40 + gcc/tree-pretty-print.cc |3 + gcc/tree.h|8 + include/gomp-constants.h |9 +- libgomp/oacc-mem.c|6 +- libgomp/target.c | 91 +- libgomp/testsuite/libgomp.c++/baseptrs-3.C| 275 ++ libgomp/testsuite/libgomp.c++/baseptrs-4.C| 3154 + libgomp/testsuite/libgomp.c++/baseptrs-5.C| 62 + libgomp/testsuite/libgomp.c++/class-array-1.C | 59 + libgomp/testsuite/libgomp.c++/target-48.C | 32 + libgomp/testsuite/libgomp.c++/target-49.C | 37 + .../libgomp.c-c++-common/baseptrs-1.c | 50 + .../libgomp.c-c++-common/baseptrs-2.c | 70 + 
.../map-arrayofstruct-1.c | 38 + .../map-arrayofstruct-2.c | 58 + .../map-arrayofstruct-3.c | 68 + .../target-implicit-map-2.c |2 + .../target-implicit-map-5.c | 50 + .../libgomp.c-c++-common/target-map-zlas-1.c | 36 + .../libgomp.fortran/map-subarray-2.f90| 108 + .../libgomp.fortran/map-subarray-3.f90| 62 + .../libgomp.fortran/map-subarray-4.f90| 35 + .../libgomp.fortran/map-subarray-5.f90| 54 + .../libgomp.fortran/map-subarray-6.f90| 26 +
[Bug debug/110308] [14 Regression] ICE on audiofile-0.3.6: RTL: vartrack: Segmentation fault in mode_to_precision(machine_mode)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110308 Andrew Pinski changed: What|Removed |Added Component|rtl-optimization|debug --- Comment #4 from Andrew Pinski --- So I think there are two bugs here: first, the loss of debugging info because of ch, and second, the latent segfault.
[Bug rtl-optimization/110308] [14 Regression] ICE on audiofile-0.3.6: RTL: vartrack: Segmentation fault in mode_to_precision(machine_mode)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110308 --- Comment #3 from Andrew Pinski ---
There is a difference in .optimized with respect to debug statements:
GCC 13:
  # DEBUG i => 0
vs GCC trunk:
  # DEBUG i => NULL
This is in BB 5. The change in debug statements starts in ch2.
Before ch2:
  [local count: 715863673]:
  # DEBUG BEGIN_STMT
  _3 = state[i_8];
  _4 = _3->sample1;
  _5 = _4 + 1;
  _3->sample1 = _5;
  # DEBUG BEGIN_STMT
  i_23 = i_8 + 1;
  # DEBUG i => i_23
  [local count: 1073741824]:
  # i_8 = PHI <0(4), i_23(5)>
  # DEBUG i => i_8
  # DEBUG BEGIN_STMT
  if (channelCount.1_1 > i_8) goto ; [66.67%] else goto ; [33.33%]
After:
  [local count: 715863673]:
  # i_9 = PHI
  # DEBUG BEGIN_STMT
  _3 = state[i_9];
  _4 = _3->sample1;
  _5 = _4 + 1;
  _3->sample1 = _5;
  # DEBUG BEGIN_STMT
  i_23 = i_9 + 1;
  # DEBUG i => i_23
  # DEBUG i => i_23
  # DEBUG BEGIN_STMT
  if (channelCount.1_1 > i_23) goto ; [66.67%] else goto ; [33.33%]
While GCC 13 after ch2 looks like:
  [local count: 715863673]:
  # i_9 = PHI
  # DEBUG i => i_9
  # DEBUG BEGIN_STMT
  _3 = state[i_9];
  _4 = _3->sample1;
  _5 = _4 + 1;
  _3->sample1 = _5;
  # DEBUG BEGIN_STMT
  i_23 = i_9 + 1;
  # DEBUG i => i_23
  # DEBUG i => i_23
  # DEBUG BEGIN_STMT
  if (channelCount.1_1 > i_23) goto ; [66.67%] else goto ; [33.33%]
Notice the `i => i_9` debug statement, which is now missing on trunk.
[Bug rtl-optimization/110305] Incorrect optimization with -O3 -fsignaling-nans -fno-signed-zeros
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110305 --- Comment #8 from Michael Morrell --- Interesting information. I still feel that perhaps both functions should use the same logic to determine whether to make this transformation, but, for example, the extra checking for the vector case done by fold_real_zero_addition_p may not be needed in simplify_binary_operation_1 because of when the latter is used.
Re: [PATCH] tree-optimization/110243 - kill off IVOPTs split_offset
Jeff Law writes: > On 6/16/23 06:34, Richard Biener via Gcc-patches wrote: >> IVOPTs has strip_offset which suffers from the same issues regarding >> integer overflow that split_constant_offset did but the latter was >> fixed quite some time ago. The following implements strip_offset >> in terms of split_constant_offset, removing the redundant and >> incorrect implementation. >> >> The implementations are not exactly the same, strip_offset relies >> on ptrdiff_tree_p to fend off too large offsets while split_constant_offset >> simply assumes those do not happen and truncates them. By >> the same means strip_offset also handles POLY_INT_CSTs but >> split_constant_offset does not. Massaging the latter to >> behave like strip_offset in those cases might be the way to go? >> >> Bootstrapped and tested on x86_64-unknown-linux-gnu. >> >> Comments? >> >> Thanks, >> Richard. >> >> PR tree-optimization/110243 >> * tree-ssa-loop-ivopts.cc (strip_offset_1): Remove. >> (strip_offset): Make it a wrapper around split_constant_offset. >> >> * gcc.dg/torture/pr110243.c: New testcase. > Your call -- IMHO you know this code far better than I. +1, but LGTM FWIW. I couldn't see anything obvious (and valid) that split_offset_1 handles and split_constant_offset doesn't. Thanks, Richard
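For readers following along, the behavioural difference discussed above (rejecting offsets that do not fit a signed pointer difference versus assuming they fit and truncating) can be illustrated outside GCC. This is only a minimal sketch assuming 64-bit offsets; `checked_offset` and `truncated_offset` are hypothetical names, and neither is the code in tree-ssa-loop-ivopts.cc or split_constant_offset.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: accept an unsigned constant as an offset only if it
// fits in a signed 64-bit pointer difference (the "fend off" behaviour).
// Returns false and leaves *out untouched when the value is too large.
bool checked_offset(std::uint64_t raw, std::int64_t *out)
{
  const std::uint64_t max_signed =
      static_cast<std::uint64_t>(std::numeric_limits<std::int64_t>::max());
  if (raw > max_signed)
    return false;
  *out = static_cast<std::int64_t>(raw);
  return true;
}

// Hypothetical helper: assume the value fits and reinterpret it modulo 2^64
// (the "truncate" behaviour). The conversion wraps on GCC and Clang and is
// defined to wrap since C++20.
std::int64_t truncated_offset(std::uint64_t raw)
{
  return static_cast<std::int64_t>(raw);
}
```

The first form forces the caller to handle the "too large" case explicitly; the second silently produces a negative offset for the same input.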
[Bug rtl-optimization/110307] ICE in move_insn, at haifa-sched.cc:5473 when building Ruby on alpha with -fPIC -O2 (or -fpeephole2 -fschedule-insns2)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110307 --- Comment #2 from matoro --- Created attachment 55365 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55365&action=edit archive from -fdump-tree-all -fdump-rtl-all (In reply to Alexander Monakov from comment #1) > I tried building a cross-compiler from trunk with > --target=alpha-unknown-linux-gnu --with-gnu-ld --with-gnu-as > --enable-secureplt --enable-languages=c --enable-tls and got > > t.c:8:1: error: unrecognizable insn: > 8 | } > | ^ > (insn 23 22 24 5 (set (reg/f:DI 74) > (symbol_ref:DI ("ruby_current_ec") [flags 0x10] 0x7fb457a6c090 ruby_current_ec>)) "t.c":6:22 -1 > (nil)) > during RTL pass: vregs > > Would you mind compiling the testcase with -fdump-tree-all -fdump-rtl-all > and attaching a tar.gz with the resulting dumps? Absolutely, here you go.
[Bug target/100799] Stackoverflow in optimized code on PPC
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100799 Peter Bergner changed: What|Removed |Added Resolution|--- |INVALID Status|WAITING |RESOLVED --- Comment #22 from Peter Bergner --- I'm closing this as not a bug in GCC. It is a bug in the source code being compiled, which does not follow the rules for calls between Fortran and C. Surya listed two possible solutions in Comment #21.
[Bug target/110201] RISC-V: __builtin_riscv_sm4ks and __builtin_riscv_sm4ed produce invalid assembly
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110201 --- Comment #6 from palmer at gcc dot gnu.org --- (In reply to Craig Topper from comment #3) > I don't have a testsuite. I saw that gcc had crypto builtins and I happened > to notice the tests in gcc weren't passing constant arguments. > > We also have a divergence in names between clang and gcc for some crypto > builtins. We really need to define a scalar crypto intrinsic header file. OK, let's try and get that sorted out? We're generally not supposed to be merging intrinsics without some sort of spec to point at, but we did a pretty poor job at that for the V intrinsics and it looks like we've slipped a bit here too. Unless I'm missing something we haven't released GCC with the crypto intrinsics yet, so we should be safe to fix bugs there as they come up.
[Bug target/110201] RISC-V: __builtin_riscv_sm4ks and __builtin_riscv_sm4ed produce invalid assembly
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110201 --- Comment #5 from palmer at gcc dot gnu.org --- (In reply to Jeffrey A. Law from comment #4) > Yea, the tests aren't great. They'll be better shortly. They'll test > non-constant arguments and out-of-range constants, expecting a suitable > diagnostic. They'll also test the extrema of valid constants. Awesome, thanks!
[Bug target/110201] RISC-V: __builtin_riscv_sm4ks and __builtin_riscv_sm4ed produce invalid assembly
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110201 --- Comment #4 from Jeffrey A. Law --- Yea, the tests aren't great. They'll be better shortly. They'll test non-constant arguments and out-of-range constants, expecting a suitable diagnostic. They'll also test the extrema of valid constants.
[Bug rtl-optimization/110305] Incorrect optimization with -O3 -fsignaling-nans -fno-signed-zeros
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110305 --- Comment #7 from Andrew Pinski --- (In reply to Michael Morrell from comment #6) > I'm curious why this transformation is being done by both > fold_real_zero_addition_p AND simplify_binary_operation_1. The answer there involves the history of GCC and the history of how optimizations were done in GCC. Basically fold_real_zero_addition_p (fold) would only act on a single statement, while simplify_binary_operation_1 could happen between statements (while doing CSE and combine, etc.). That changed with the merge of tree-ssa in r0-58166-g6de9cd9a886ea6 (2004). simplify_binary_operation_1 has had the optimization since the beginning of git (though it moved from cse.c to simplify-rtx in r0-24738-g0cedb36cbd7e0c, and the HONOR_SIGNED_ZEROS check was added by r0-41258-g71925bc04f24a4 in 2002; before that it was just checking the IEEE float format and unsafe-math). fold has had it since the beginning of git also (and changed in a similar fashion as simplify-rtx for HONOR_SIGNED_ZEROS).
[Bug target/110201] RISC-V: __builtin_riscv_sm4ks and __builtin_riscv_sm4ed produce invalid assembly
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110201 --- Comment #3 from Craig Topper --- I don't have a testsuite. I saw that gcc had crypto builtins and I happened to notice the tests in gcc weren't passing constant arguments. We also have a divergence in names between clang and gcc for some crypto builtins. We really need to define a scalar crypto intrinsic header file.
[Bug target/110201] RISC-V: __builtin_riscv_sm4ks and __builtin_riscv_sm4ed produce invalid assembly
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110201 --- Comment #2 from palmer at gcc dot gnu.org --- Do you guys have a test suite for these, or did you just happen to run into it? The intrinsic testing has been a bit of a blind spot in GCC land.
[Bug ada/110314] New: Gnat failed assertion and Allocators with discriminant
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110314 Bug ID: 110314 Summary: Gnat failed assertion and Allocators with discriminant Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: ada Assignee: unassigned at gcc dot gnu.org Reporter: franckbehaghel_gcc at protonmail dot com CC: dkm at gcc dot gnu.org Target Milestone: --- With checks enabled, Gnat failed to build this file. $ cat main_assertion_failed.adb with Ada.Containers.Synchronized_Queue_Interfaces; with Ada.Containers.Unbounded_Synchronized_Queues; procedure main is package Queue_Interfaces is new Ada.Containers.Synchronized_Queue_Interfaces (Integer); package Synchronized_Queues is new Ada.Containers.Unbounded_Synchronized_Queues ( Queue_Interfaces => Queue_Interfaces); subtype Queue is Synchronized_Queues.Queue; type Access_Type is access all Queue; Q1 : Access_Type := new Queue; Q2 : Access_Type := new Queue; begin null; end Main; $ gnatmake main_assertion_failed.adb gcc -c main_assertion_failed.adb +===GNAT BUG DETECTED==+ | 14.0.0 20230617 (experimental) (aarch64-unknown-linux-gnu) Assert_Failure nlists.adb:172| | Error detected at main_assertion_failed.adb:12:21| | Compiling main_assertion_failed.adb | | Please submit a bug report; see https://gcc.gnu.org/bugs/ . | | Use a subject line meaningful to you and us to track the bug.| | Include the entire contents of this bug box in the report. | | Include the exact command that you entered. | | Also include sources listed below. | +==+ It fails also with Compiler Explorer (https://godbolt.org/) +===GNAT BUG DETECTED==+ | 14.0.0 20230619 (experimental) (x86_64-linux-gnu) Assert_Failure nlists.adb:172| | Error detected at example.adb:12:21 | | Compiling| | Please submit a bug report; see https://gcc.gnu.org/bugs/ . | | Use a subject line meaningful to you and us to track the bug.| | Include the entire contents of this bug box in the report. | | Include the exact command that you entered. 
| | Also include sources listed below. | +==+ regards,
Re: Different ASM for ReLU function between GCC11 and GCC12
On Mon, Jun 19, 2023 at 09:10:53PM +0200, André Günther via Gcc wrote: > I noticed that a simple function like > auto relu( float x ) { > return x > 0.f ? x : 0.f; > } > compiles to different ASM using GCC11 (or lower) and GCC12 (or higher). On > -O3 -mavx2 the former compiles above function to Such reports should go into gcc.gnu.org/bugzilla/, not to the mailing list, if you are convinced that loading the constant from memory is faster. Another possibility is vxorps xmm1, xmm1, xmm1 vmaxss xmm0, xmm0, xmm1 ret which doesn't need to wait for the memory. This changed with https://gcc.gnu.org/r12-7693 > > relu(float): > vmaxss xmm0, xmm0, DWORD PTR .LC0[rip] > ret > .LC0: > .long 0 > > which is what I would naively expect and what also clang essentially does > (clang actually uses an xor before the maxss to get the zero). The latter, > however, compiles the function to > > relu(float): > vxorps xmm1, xmm1, xmm1 > vcmpltss xmm2, xmm1, xmm0 > vblendvps xmm0, xmm1, xmm0, xmm2 > ret > > which looks like a missed optimisation. Does anyone know if there's a > reason for the changed behaviour? Jakub
[Bug target/110201] RISC-V: __builtin_riscv_sm4ks and __builtin_riscv_sm4ed produce invalid assembly
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110201 Jeffrey A. Law changed: What|Removed |Added Ever confirmed|0 |1 Last reconfirmed||2023-06-19 Status|UNCONFIRMED |NEW --- Comment #1 from Jeffrey A. Law --- It looks like some of the aes patterns have the same problem. It may just have been Liao not understanding the difference between an operand constraint and an operand predicate.
[Bug go/110297] [13/14 Regression] all libgo tests fail on arm-linux-gnueabi and arm-linux-gnueabihf
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110297 --- Comment #4 from Ian Lance Taylor --- Thanks. I suspect this was broken by https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604158.html.
Different ASM for ReLU function between GCC11 and GCC12
Hi, I noticed that a simple function like auto relu( float x ) { return x > 0.f ? x : 0.f; } compiles to different ASM using GCC11 (or lower) and GCC12 (or higher). On -O3 -mavx2 the former compiles above function to relu(float): vmaxss xmm0, xmm0, DWORD PTR .LC0[rip] ret .LC0: .long 0 which is what I would naively expect and what also clang essentially does (clang actually uses an xor before the maxss to get the zero). The latter, however, compiles the function to relu(float): vxorps xmm1, xmm1, xmm1 vcmpltss xmm2, xmm1, xmm0 vblendvps xmm0, xmm1, xmm0, xmm2 ret which looks like a missed optimisation. Does anyone know if there's a reason for the changed behaviour? Andre
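As a reference point for comparing the two instruction sequences, here is a minimal sketch of the source-level behaviour that any valid code generation has to preserve. It is the function from the report with an explicit return type; the remarks about NaN and negative zero follow from the C++ comparison itself, not from anything specific to GCC.

```cpp
#include <cmath>
#include <limits>

// The function from the report, with an explicit return type.
float relu(float x)
{
  // x is selected only when the comparison is true. For NaN and for -0.0f
  // the comparison is false, so the second operand (+0.0f) is returned.
  return x > 0.f ? x : 0.f;
}
```

A single max instruction can only implement this directly if it returns its second operand for unordered inputs and for equal zeros, which is how the x86 scalar max instructions are documented to behave. That is an observation about the ISA, not a statement about why GCC 12 chooses the longer sequence.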
Re: [PATCH v2] RISC-V: Save and restore FCSR in interrupt functions to avoid program errors.
On 6/14/23 01:57, Jin Ma wrote: To prevent interrupt functions from changing the FCSR, it needs to be saved at the beginning of the function and restored at the end. gcc/ChangeLog: * config/riscv/riscv.cc (riscv_compute_frame_info): Allocate frame for FCSR. (riscv_for_each_saved_reg): Save and restore FCSR in interrupt functions. * config/riscv/riscv.md (riscv_frcsr): New patterns. (riscv_fscsr): Likewise. gcc/testsuite/ChangeLog: * gcc.target/riscv/interrupt-fcsr-1.c: New test. * gcc.target/riscv/interrupt-fcsr-2.c: New test. * gcc.target/riscv/interrupt-fcsr-3.c: New test. Thanks. I pushed this to the trunk. jeff
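The save-on-entry, restore-on-exit idea can be sketched portably with the standard floating-point environment functions. This is only an analogy: the patch itself emits FCSR reads and writes in the RISC-V prologue and epilogue of interrupt functions, whereas the `FpEnvGuard` class and `handler_like` function below are hypothetical names that use `<cfenv>` on whatever host runs them.

```cpp
#include <cfenv>

// Hypothetical RAII guard: captures the floating-point environment
// (rounding mode and exception flags) on construction and puts it back
// on destruction, analogous to saving and restoring FCSR.
class FpEnvGuard
{
public:
  FpEnvGuard() { std::fegetenv(&saved_); }
  ~FpEnvGuard() { std::fesetenv(&saved_); }
  FpEnvGuard(const FpEnvGuard &) = delete;
  FpEnvGuard &operator=(const FpEnvGuard &) = delete;

private:
  std::fenv_t saved_;
};

// Stand-in for an interrupt handler that disturbs both the rounding mode
// and the accrued exception flags while it runs.
void handler_like()
{
  FpEnvGuard guard;
  std::fesetround(FE_UPWARD);
  std::feraiseexcept(FE_INEXACT);
  // The guard's destructor restores the caller's environment here.
}
```

Without the guard, the interrupted code would resume with a different rounding mode and a spurious inexact flag, which is the kind of program error the patch subject refers to.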
[PATCH ver 6] rs6000: Add builtins for IEEE 128-bit floating point values
Kewen, GCC maintainers:
Version 6, Fixed missing change log entry. Changed builtin id names as requested. Missed making the change on the last version. Fixed comment in the three test cases. Reran regression suite on Power 10, no regressions.
Version 5, Tested the patch on P9 BE per request. Fixed up test case to get the correct expected values for BE and LE. Fixed typos. Updated the doc/extend.texi to clarify the vector arguments. Changed test file names per request. Moved builtin defs next to related definitions. Renamed new mode_attr. Removed new mode_iterator, used existing iterator instead. Renamed mode_iterator VSEEQP_DI to V2DI_DI. Fixed up overloaded definitions per request.
Version 4, added missing cases for new xxexpqp, xsxexpdp and xsxsigqp cases to rs6000_expand_builtin. Merged the new define_insn definitions with the existing definitions. Renamed the builtins by removing the __builtin_ prefix from the names. Fixed the documentation for the builtins. Updated the test files to check the desired instructions were generated. Retested patch on Power 10 with no regressions.
Version 3, was able to get the overloaded version of scalar_insert_exp to work and the change to xsxexpqp_f128_ define instruction to work with the suggestions from Kewen.
Version 2, I have addressed the various comments from Kewen. I had issues with adding an additional overloaded version of scalar_insert_exp with vector arguments. The overload infrastructure didn't work with a mix of scalar and vector arguments. I did rename the __builtin_insertf128_exp to __builtin_vsx_scalar_insert_exp_qp to make it similar to the existing builtin. I also wasn't able to get the suggested merge of xsxexpqp_f128_ with xsxexpqp_ to work so I left the two simpler definitions.
The patch adds three new builtins to extract the significand and exponent of an IEEE float 128-bit value where the builtin argument is a vector. 
Additionally, a builtin to insert the exponent into an IEEE float 128-bit vector argument is added. These builtins were requested since there is no clean and optimal way to transfer between a vector and a scalar IEEE 128-bit value.

The patch has been tested on Power 9 BE and Power 10 LE with no regressions.

Please let me know if the patch is acceptable or not. Thanks.

Carl

rs6000: Add builtins for IEEE 128-bit floating point values

Add support for the following builtins:

  __vector unsigned long long int scalar_extract_exp_to_vec (__ieee128);
  __vector unsigned __int128 scalar_extract_sig_to_vec (__ieee128);
  __ieee128 scalar_insert_exp (__vector unsigned __int128,
                               __vector unsigned long long);

The instructions used in the builtins operate on vector registers. Thus the result must be moved to a scalar type. There is no clean, performant way to do this. The user code typically needs the result as a vector anyway.

gcc/
    * config/rs6000/rs6000-builtin.cc (rs6000_expand_builtin): Rename
    CODE_FOR_xsxexpqp_tf to CODE_FOR_xsxexpqp_tf_di.
    Rename CODE_FOR_xsxexpqp_kf to CODE_FOR_xsxexpqp_kf_di.
    (CODE_FOR_xsxexpqp_kf_v2di, CODE_FOR_xsxsigqp_kf_v1ti,
    CODE_FOR_xsiexpqp_kf_v2di): Add case statements.
    * config/rs6000/rs6000-builtins.def (__builtin_extractf128_exp,
    __builtin_extractf128_sig, __builtin_insertf128_exp): Add new
    builtin definitions.
    Rename xsxexpqp_kf, xsxsigqp_kf, xsiexpqp_kf to xsxexpqp_kf_di,
    xsxsigqp_kf_ti, xsiexpqp_kf_di respectively.
    * config/rs6000/rs6000-c.cc (altivec_resolve_overloaded_builtin):
    Update case RS6000_OVLD_VEC_VSIE to handle MODE_VECTOR_INT for new
    overloaded instance. Update comments.
    * config/rs6000/rs6000-overload.def
    (__builtin_vec_scalar_insert_exp): Add new overload definition with
    vector arguments.
    (scalar_extract_exp_to_vec, scalar_extract_sig_to_vec): New
    overloaded definitions.
    * config/vsx.md (V2DI_DI): New mode iterator.
    (DI_to_TI): New mode attribute.
    Rename xsxexpqp_ to xsxexpqp__.
    Rename xsxsigqp_ to xsxsigqp__.
    Rename xsiexpqp_ to xsiexpqp__.
    * doc/extend.texi (__builtin_extractf128_exp,
    __builtin_extractf128_sig): Add documentation for new builtins.
    (scalar_insert_exp): Add new overloaded builtin definition.

gcc/testsuite/
    * gcc.target/powerpc/bfp/extract-exp-8.c: New test case.
    * gcc.target/powerpc/bfp/extract-sig-8.c: New test case.
    * gcc.target/powerpc/bfp/insert-exp-16.c: New test case.
---
 gcc/config/rs6000/rs6000-builtin.cc   | 21 +++-
 gcc/config/rs6000/rs6000-builtins.def | 15 ++-
 gcc/config/rs6000/rs6000-c.cc         | 10 +-
 gcc/config/rs6000/rs6000-overload.def | 12 ++
 gcc/config/rs6000/vsx.md              | 25 +++--
 gcc/doc/extend.texi                   | 24 +++-
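
[Editor's illustration, not part of the patch.] For readers without Power hardware, the field layout these builtins work on can be sketched in portable C. This is only a model of the IEEE 754 binary128 format (1 sign bit, 15 exponent bits, 112 fraction bits); the struct and the b128_* names are hypothetical, and the sketch does not claim to match the exact register contents the xsxexpqp, xsxsigqp and xsiexpqp instructions produce (for example, it returns the stored fraction only and does not add an implicit leading bit).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for one binary128 value split into two
   64-bit halves.  */
struct binary128_bits
{
  uint64_t hi;  /* bits 127..64: sign, exponent, top 48 fraction bits */
  uint64_t lo;  /* bits 63..0: low 64 fraction bits */
};

/* Return the 15-bit biased exponent field.  */
uint16_t
b128_extract_exp (struct binary128_bits v)
{
  return (uint16_t) ((v.hi >> 48) & 0x7fff);
}

/* Return the stored 112-bit fraction field, with sign and exponent
   cleared.  The implicit leading bit is not added.  */
struct binary128_bits
b128_extract_fraction (struct binary128_bits v)
{
  struct binary128_bits f;
  f.hi = v.hi & UINT64_C (0x0000ffffffffffff);
  f.lo = v.lo;
  return f;
}

/* Return V with its exponent field replaced by EXP; the sign and
   fraction bits are kept.  */
struct binary128_bits
b128_insert_exp (struct binary128_bits v, uint16_t exp)
{
  assert (exp <= 0x7fff);
  v.hi &= ~(UINT64_C (0x7fff) << 48);
  v.hi |= (uint64_t) exp << 48;
  return v;
}
```

In this model, 1.0 is { 0x3fff000000000000, 0 }, so extracting the exponent gives the bias 0x3fff, and inserting 0x4000 turns the value into 2.0.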
[Bug rtl-optimization/110305] Incorrect optimization with -O3 -fsignaling-nans -fno-signed-zeros
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110305

Michael Morrell changed:

           What    |Removed                     |Added
                 CC|                            |mmorrell at tachyum dot com

--- Comment #6 from Michael Morrell ---
I'm curious why this transformation is being done by both fold_real_zero_addition_p AND simplify_binary_operation_1. The checks in fold_real_zero_addition_p are more complex and will leave "a + 0.0" unchanged in more cases, yet later simplify_binary_operation_1 transforms the expression for less complex reasons.

I also wonder if there aren't similar expressions (perhaps "a * 1.0" -> a) that need to be looked at.
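
[Editor's illustration, not part of the bug report.] A minimal sketch of why "x + 0.0" is not the same as "x" once signaling NaNs matter: performing the addition turns a signaling NaN into a quiet one (and raises the invalid exception), while skipping it leaves the signaling NaN untouched. The helper names are hypothetical, and the sketch assumes an IEEE 754 binary64 double whose quiet bit is bit 51, as on x86-64 and AArch64. The volatile operands are only there so the addition really happens at run time.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EXP_MASK  UINT64_C (0x7ff0000000000000)
#define FRAC_MASK UINT64_C (0x000fffffffffffff)
#define QUIET_BIT UINT64_C (0x0008000000000000) /* assumed: bit 51 */

/* Return the raw bit pattern of D.  */
uint64_t
double_bits (double d)
{
  uint64_t bits;
  assert (sizeof bits == sizeof d);
  memcpy (&bits, &d, sizeof bits);
  return bits;
}

/* Nonzero if D is a NaN whose quiet bit is clear.  */
int
is_signaling_nan (double d)
{
  uint64_t b = double_bits (d);
  return (b & EXP_MASK) == EXP_MASK && (b & FRAC_MASK) != 0
         && (b & QUIET_BIT) == 0;
}

/* Nonzero if D is a NaN whose quiet bit is set.  */
int
is_quiet_nan (double d)
{
  uint64_t b = double_bits (d);
  return (b & EXP_MASK) == EXP_MASK && (b & QUIET_BIT) != 0;
}

/* Compute x + 0.0 with operands the compiler cannot fold away.  */
double
add_zero (double x)
{
  volatile double v = x;
  volatile double zero = 0.0;
  return v + zero;
}
```

Passing __builtin_nans ("") (a GCC builtin that yields a signaling NaN) to add_zero gives back a quiet NaN on such targets, which is the observable effect the transformation would drop.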
[Bug fortran/92887] [F2008] Passing nullified/disassociated pointer or unalloc allocatable to OPTIONAL + VALUE dummy fails
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92887 --- Comment #6 from anlauf at gcc dot gnu.org --- (In reply to Mikael Morin from comment #5) > (In reply to anlauf from comment #4) > > > > I'll need broader feedback, so unless someone adds to this pr, I'll submit > > the present patch - with testcases - to get attention. > > > Here you go: > > > diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc > > index 45a984b6bdb..d9dcc11e5bd 100644 > > --- a/gcc/fortran/trans-expr.cc > > +++ b/gcc/fortran/trans-expr.cc > > > @@ -6396,7 +6399,28 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * > > sym, > > && fsym->ts.type != BT_CLASS > > && fsym->ts.type != BT_DERIVED) > > { > > - if (e->expr_type != EXPR_VARIABLE > > + /* F2018:15.5.2.12 Argument presence and > > + restrictions on arguments not present. */ > > + if (e->expr_type == EXPR_VARIABLE > > + && (e->symtree->n.sym->attr.allocatable > > + || e->symtree->n.sym->attr.pointer)) > > Beware of expressions like derived%alloc_comp or derived%pointer_comp which > don't match the above. Right. This is fixable by using && (gfc_expr_attr (e).allocatable || gfc_expr_attr (e).pointer)) instead. > > @@ -7072,6 +7096,42 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * > > sym, > > } > > } > > > > + /* F2023:15.5.3, 15.5.4: Actual argument expressions are evaluated > > +before they are associated and a procedure is executed. */ > > + if (e && e->expr_type != EXPR_VARIABLE && !gfc_is_constant_expr (e)) > > + { > > + /* Create temporary except for functions returning pointers that > > +can appear in a variable definition context. */ > > Maybe explain *why* we have to create a temporary, that is some data > references may become undefined by the procedure call (intent(out) dummies) > so we have to evaluate values depending on them beforehand (PR 92178). That is one reason. 
Another one, also pointed out in PR92178 by Tobias' review of Steve's draft, is the first testcase at

https://gcc.gnu.org/legacy-ml/gcc-patches/2019-10/msg01970.html

This is reminiscent of an issue reported for the MERGE intrinsic (pr107874, fixed so far, but there is a remaining issue in pr105371).

> > + if (e->expr_type != EXPR_FUNCTION
> > + || !(gfc_expr_attr (e).pointer || gfc_expr_attr (e).proc_pointer))
>
> Merge with the outer condition?

Yes. The above form was intended more for proof-of-concept and readability than for coding standards.

> > + need_temp = true;
> > + }
> > +
> > + if (need_temp)
> > + {
> > + if (cond_temp == NULL_TREE)
> > + parmse.expr = gfc_evaluate_now (parmse.expr, );
>
> I'm not sure about this. The condition to set need_temp looks quite general
> (especially it matches non-scalar cases, doesn't it?), but
> gfc_conv_expr_reference should already take care of creating a variable, so
> that a temporary is missing only for value dummies, I think. I would rather
> move this to the place specific to value dummies.

I agree in principle. The indentation level is already awful in the specific place, which calls for thoughts about refactoring that mega-loop over the arguments that currently spans far more than 1000 source code lines.

> I think this PR is only about scalars with basic types, is there the same
> problem with derived types? with classes?
> I guess arrays are different as they are always by reference?

For the current documentation of the argument passing convention see:

https://gcc.gnu.org/onlinedocs/gfortran/Argument-passing-conventions.html

"For OPTIONAL dummy arguments, an absent argument is denoted by a NULL pointer, except for scalar dummy arguments of intrinsic type which have the VALUE attribute. For those, a hidden Boolean argument (logical(kind=C_bool),value) is used to indicate whether the argument is present."

My understanding is that for these scalar arguments we do need something that can be passed by value.
We currently do not support VALUE with array arguments (F2008+), character of length > 1, and character actual arguments are broken unless they are constants. There are several open PRs. > > + else > > I would rather move the else part to the place above where cond_temp is set, > so that the code is easier to follow. > > > + { > > + /* "Conditional temporary" to handle variables that possibly > > +cannot be dereferenced. Use null value as fallback. */ > > + tree dflt_temp; > > + gcc_assert (e->ts.type != BT_DERIVED && e->ts.type != BT_CLASS); > > + gcc_assert (e->rank == 0); > > + dflt_temp = gfc_create_var (TREE_TYPE (parmse.expr), "temp"); > > + TREE_STATIC (dflt_temp) = 1; > > +
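
[Editor's illustration, not part of the bug comment.] The convention quoted above, and the "conditional temporary" idea in the patch, can be modelled in plain C. This is only a sketch under assumptions: callee and call_with_optional are hypothetical names, a C pointer stands in for a Fortran pointer or allocatable actual argument, and the zero fallback mirrors the "use null value as fallback" comment in the quoted diff. It is not the code gfortran generates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical C view of a procedure with an
   "integer, value, optional :: x" dummy: the value itself plus a
   hidden presence flag.  Returns -1 when the argument is absent.  */
int
callee (int x, bool x_present)
{
  /* An absent argument is expected to carry the zero fallback.  */
  assert (x_present || x == 0);
  return x_present ? x : -1;
}

/* Caller side.  ACTUAL models a pointer that may be disassociated
   (NULL here).  It must not be dereferenced unless it is present.  */
int
call_with_optional (const int *actual)
{
  bool present = actual != NULL;
  /* Conditional temporary: read the value only when present,
     otherwise pass a zero placeholder by value.  */
  int tmp = present ? *actual : 0;
  return callee (tmp, present);
}
```

The point of the temporary is that something must be passed by value even when the actual argument cannot be dereferenced; the hidden flag then tells the callee to ignore it.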
Re: [PATCH] Introduce hardbool attribute for C
On 16 June 2023 07:35:27 CEST, Alexandre Oliva via Gcc-patches wrote:

index 0..634feaed4deef
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/hardbool-err.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "" } */
+
+typedef _Bool __attribute__ ((__hardbool__))
+hbbl; /* { dg-error "integral types" } */
+
+typedef double __attribute__ ((__hardbool__))
+hbdbl; /* { dg-error "integral types" } */
+
+enum x;
+typedef enum x __attribute__ ((__hardbool__))
+hbenum; /* { dg-error "integral types" } */
+
+struct s;
+typedef struct s __attribute__ ((__hardbool__))
+hbstruct; /* { dg-error "integral types" } */
+
+typedef int __attribute__ ((__hardbool__ (0, 0)))
+hb00; /* { dg-error "different values" } */
+
+typedef int __attribute__ ((__hardbool__ (4, 16))) hb4x;
+struct s {
+ hb4x m:2;
+}; /* { dg-error "is a GCC extension|different values" } */
+/* { dg-warning "changes value" "warning" { target *-*-* } .-1 } */
+
+hb4x __attribute__ ((vector_size (4 * sizeof (hb4x))))
+vvar; /* { dg-error "invalid vector type" } */

Arm-chair, tinfoil hat still on, didn't look closely, hence: I don't see explicit tests with _Complex nor __complex__. Would we want to check these here, or are they handled through the "underlying" tests above?

I'd welcome a fortran interop note in the docs as hinted previously to cover out of the box behavior. It's probably reasonably unlikely but better be safe than sorry?

cheers,
Re: [PATCH ver 5] rs6000: Add builtins for IEEE 128-bit floating point values
Kewen:

On Mon, 2023-06-19 at 14:08 +0800, Kewen.Lin wrote:
> Hi Carl,
>
> on 2023/6/17 01:57, Carl Love wrote:
> > overloaded instance. Update comments.
> > * config/rs6000/rs6000-overload.def
> > (__builtin_vec_scalar_insert_exp): Add new overload definition with
> > vector arguments.
> > (scalar_extract_exp_to_vec, scalar_extract_sig_to_vec): New
> > overloaded definitions.
> > * config/vsx.md (V2DI_DI): New mode iterator.
>
> Missing an entry for DI_to_TI.

Oops, missed that. Sorry, fixed.

> > const signed long long __builtin_vsx_scalar_extract_expq (_Float128);
> > -VSEEQP xsxexpqp_kf {}
> > +VSEEQP xsxexpqp_kf_di {}
> > +
> > + vull __builtin_vsx_scalar_extract_exp_to_vec (_Float128);
> > +VSEEXPKF xsxexpqp_kf_v2di {}
>
> As I pointed out previously, the related id is VSEEQP, since both of them

Oops, I guess I forgot to change that. Sorry.

> have kf in their names, having KF in its id doesn't look good IMHO.
> How about VSEEQPV instead of VSEEXPKF? It's also consistent with what
> we use for VSIEQP.

Yup, makes sense, changed to VSEEQPV.

> > const signed __int128 __builtin_vsx_scalar_extract_sigq (_Float128);
> > -VSESQP xsxsigqp_kf {}
> > +VSESQP xsxsigqp_kf_ti {}
> > +
> > + vuq __builtin_vsx_scalar_extract_sig_to_vec (_Float128);
> > +VSESIGKF xsxsigqp_kf_v1ti {}
>
> Similar to the above, s/VSESIGKF/VSESQPV/

Changed to VSESQPV.
> > > > >const _Float128 __builtin_vsx_scalar_insert_exp_q (unsigned > > __int128, \ > > unsigned long > > long); > > -VSIEQP xsiexpqp_kf {} > > +VSIEQP xsiexpqp_kf_di {} > > > >const _Float128 __builtin_vsx_scalar_insert_exp_qp (_Float128, \ > >unsigned > > long long); > > VSIEQPF xsiexpqpf_kf {} > > > > + const _Float128 __builtin_vsx_scalar_insert_exp_vqp (vuq, vull); > > +VSIEQPV xsiexpqp_kf_v2di {} > > + > >const signed int __builtin_vsx_scalar_test_data_class_qp > > (_Float128, \ > > const > > int<7>); > > VSTDCQP xststdcqp_kf {} > > diff --git a/gcc/config/rs6000/rs6000-c.cc > > b/gcc/config/rs6000/rs6000-c.cc > > index 8555174d36e..11060f697db 100644 > > --- a/gcc/config/rs6000/rs6000-c.cc > > +++ b/gcc/config/rs6000/rs6000-c.cc > > @@ -1929,11 +1929,15 @@ altivec_resolve_overloaded_builtin > > (location_t loc, tree fndecl, > >128-bit variant of built-in function. */ > > if (GET_MODE_PRECISION (arg1_mode) > 64) > > { > > - /* If first argument is of float variety, choose variant > > - that expects __ieee128 argument. Otherwise, expect > > - __int128 argument. */ > > + /* If first argument is of float variety, choose the > > variant that > > + expects __ieee128 argument. If the first argument is > > vector > > + int, choose the variant that expects vector unsigned > > + __int128 argument. Otherwise, expect scalar __int128 > > argument. 
> > + */ > > if (GET_MODE_CLASS (arg1_mode) == MODE_FLOAT) > > instance_code = RS6000_BIF_VSIEQPF; > > + else if (GET_MODE_CLASS (arg1_mode) == MODE_VECTOR_INT) > > + instance_code = RS6000_BIF_VSIEQPV; > > else > > instance_code = RS6000_BIF_VSIEQP; > > } > > diff --git a/gcc/config/rs6000/rs6000-overload.def > > b/gcc/config/rs6000/rs6000-overload.def > > index c582490c084..05a5ca6a04d 100644 > > --- a/gcc/config/rs6000/rs6000-overload.def > > +++ b/gcc/config/rs6000/rs6000-overload.def > > @@ -4515,6 +4515,18 @@ > > VSIEQP > >_Float128 __builtin_vec_scalar_insert_exp (_Float128, unsigned > > long long); > > VSIEQPF > > + _Float128 __builtin_vec_scalar_insert_exp (vuq, vull); > > +VSIEQPV > > + > > +[VEC_VSEEV, scalar_extract_exp_to_vec, \ > > +__builtin_vec_scalar_extract_exp_to_vector] > > + vull __builtin_vec_scalar_extract_exp_to_vector (_Float128); > > +VSEEXPKF > > + > > Need to update if the above changes. changed > > > +[VEC_VSESV, scalar_extract_sig_to_vec, \ > > +__builtin_vec_scalar_extract_sig_to_vector] > > + vuq __builtin_vec_scalar_extract_sig_to_vector (_Float128); > > +VSESIGKF > > > > Ditto. changed > > > > > diff --git a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract- > > exp-8.c b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp- > > 8.c > > new file mode 100644 > > index 000..e24e09012d9 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-8.c > > @@ -0,0 +1,58 @@ > > +/* { dg-do run { target { powerpc*-*-* } } } */ > > +/* { dg-require-effective-target lp64 } */ > > +/* { dg-require-effective-target p9vector_hw } */ > > +/* { dg-options "-mdejagnu-cpu=power9 -save-temps" } */ > > + > > +#include > > +#include > > + > > +#if
Re: [PATCH] tree-optimization/110243 - kill off IVOPTs split_offset
On 6/16/23 06:34, Richard Biener via Gcc-patches wrote: IVOPTs has strip_offset which suffers from the same issues regarding integer overflow that split_constant_offset did but the latter was fixed quite some time ago. The following implements strip_offset in terms of split_constant_offset, removing the redundant and incorrect implementation. The implementations are not exactly the same, strip_offset relies on ptrdiff_tree_p to fend off too large offsets while split_constant_offset simply assumes those do not happen and truncates them. By the same means strip_offset also handles POLY_INT_CSTs but split_constant_offset does not. Massaging the latter to behave like strip_offset in those cases might be the way to go? Bootstrapped and tested on x86_64-unknown-linux-gnu. Comments? Thanks, Richard. PR tree-optimization/110243 * tree-ssa-loop-ivopts.cc (strip_offset_1): Remove. (strip_offset): Make it a wrapper around split_constant_offset. * gcc.dg/torture/pr110243.c: New testcase. Your call -- IMHO you know this code far better than I. jeff
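
[Editor's illustration, not part of the patch.] The difference described above, between fending off offsets that are too large and silently truncating them, can be shown on plain 64-bit integers. This is a loose analogy only: the function names are hypothetical, and the real code works on trees and poly-int offsets, not on int64_t. __builtin_add_overflow is the GCC/Clang checked-arithmetic builtin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Truncating variant: accumulate DELTA into OFF and wrap silently.
   The addition is done in unsigned arithmetic to avoid undefined
   behaviour; converting back relies on the implementation-defined
   (modulo) conversion that GCC documents.  */
int64_t
add_offset_truncating (int64_t off, int64_t delta)
{
  return (int64_t) ((uint64_t) off + (uint64_t) delta);
}

/* Checked variant: store the sum in *RESULT and return true, or
   return false when the sum is not representable so the caller can
   decline to split off the constant.  */
bool
add_offset_checked (int64_t off, int64_t delta, int64_t *result)
{
  assert (result != NULL);
  return !__builtin_add_overflow (off, delta, result);
}
```

A caller using the checked variant keeps the original expression whenever false comes back, which is the conservative behaviour; the truncating variant always produces some offset, correct or not.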
Re: [PATCH] RISC-V: Add VLS modes for GNU vectors
On 6/18/23 17:06, Juzhe-Zhong wrote:

This patch is a proposal and is **NOT** ready to push, since after this patch the total number of machine modes will exceed 255, which causes an ICE in LTO:

  internal compiler error: in bp_pack_int_in_range, at data-streamer.h:290

Right. Note that an ack from Jakub or Richi will be sufficient for the LTO fixes to go forward.

We need to add VLS modes for the following reasons:

1. Enhance GNU vectors codegen:

For example:

  typedef int32_t vnx8si __attribute__ ((vector_size (32)));

  __attribute__ ((noipa)) void
  f_vnx8si (int32_t * in, int32_t * out)
  {
    vnx8si v = *(vnx8si*)in;
    *(vnx8si *) out = v;
  }

Compile option: --param=riscv-autovec-preference=scalable

Before this patch:

  f_vnx8si:
    ld a2,0(a0)
    ld a3,8(a0)
    ld a4,16(a0)
    ld a5,24(a0)
    addi sp,sp,-32
    sd a2,0(a1)
    sd a3,8(a1)
    sd a4,16(a1)
    sd a5,24(a1)
    addi sp,sp,32
    jr ra

After this patch:

  f_vnx8si:
    vsetivli zero,8,e32,m2,ta,ma
    vle32.v v2,0(a0)
    vse32.v v2,0(a1)
    ret

2. Enhance VLA SLP:

  void
  f (uint8_t *restrict a, uint8_t *restrict b, uint8_t *restrict c)
  {
    for (int i = 0; i < 100; ++i)
      {
        a[i * 8] = b[i * 8] + c[i * 8];
        a[i * 8 + 1] = b[i * 8] + c[i * 8 + 1];
        a[i * 8 + 2] = b[i * 8 + 2] + c[i * 8 + 2];
        a[i * 8 + 3] = b[i * 8 + 2] + c[i * 8 + 3];
        a[i * 8 + 4] = b[i * 8 + 4] + c[i * 8 + 4];
        a[i * 8 + 5] = b[i * 8 + 4] + c[i * 8 + 5];
        a[i * 8 + 6] = b[i * 8 + 6] + c[i * 8 + 6];
        a[i * 8 + 7] = b[i * 8 + 6] + c[i * 8 + 7];
      }
  }

  ..
  Loop body:
  ...
  vrgatherei16.vv...
  ...
  Tail:
    lbu a4,792(a1)
    lbu a5,792(a2)
    addw a5,a5,a4
    sb a5,792(a0)
    lbu a5,793(a2)
    addw a5,a5,a4
    sb a5,793(a0)
    lbu a4,794(a1)
    lbu a5,794(a2)
    addw a5,a5,a4
    sb a5,794(a0)
    lbu a5,795(a2)
    addw a5,a5,a4
    sb a5,795(a0)
    lbu a4,796(a1)
    lbu a5,796(a2)
    addw a5,a5,a4
    sb a5,796(a0)
    lbu a5,797(a2)
    addw a5,a5,a4
    sb a5,797(a0)
    lbu a4,798(a1)
    lbu a5,798(a2)
    addw a5,a5,a4
    sb a5,798(a0)
    lbu a5,799(a2)
    addw a5,a5,a4
    sb a5,799(a0)
    ret

The tail elements need VLS modes to vectorize, as is done for ARM SVE:

  f:
    mov x3, 0
    cntb x5
    mov x4, 792
    whilelo p7.b, xzr, x4
  .L2:
    ld1b z31.b, p7/z, [x1, x3]
    ld1b z30.b, p7/z, [x2, x3]
    trn1 z31.b, z31.b, z31.b
    add z31.b, z31.b, z30.b
    st1b z31.b, p7, [x0, x3]
    add x3, x3, x5
    whilelo p7.b, x3, x4
    b.any .L2
  Tail:
    ldr b31, [x1, 792]
    ldr b27, [x1, 794]
    ldr b28, [x1, 796]
    dup v31.8b, v31.b[0]
    ldr b29, [x1, 798]
    ldr d30, [x2, 792]
    ins v31.b[2], v27.b[0]
    ins v31.b[3], v27.b[0]
    ins v31.b[4], v28.b[0]
    ins v31.b[5], v28.b[0]
    ins v31.b[6], v29.b[0]
    ins v31.b[7], v29.b[0]
    add v31.8b, v30.8b, v31.8b
    str d31, [x0, 792]
    ret

Note that ARM SVE uses Advanced SIMD (Neon) modes to vectorize the tail.

gcc/ChangeLog:

    * config/riscv/riscv-modes.def (VECTOR_BOOL_MODE): Add VLS modes
    for GNU vectors.
    (ADJUST_ALIGNMENT): Ditto.
    (ADJUST_BYTESIZE): Ditto.
    (ADJUST_PRECISION): Ditto.
    (VECTOR_MODES): Ditto.
    * config/riscv/riscv-protos.h (riscv_v_ext_vls_mode_p): Ditto.
    (get_regno_alignment): Ditto.
    * config/riscv/riscv-v.cc (INCLUDE_ALGORITHM): Ditto.
    (const_vlmax_p): Ditto.
    (legitimize_move): Ditto.
    (get_vlmul): Ditto.
    (get_regno_alignment): Ditto.
    (get_ratio): Ditto.
    (get_vector_mode): Ditto.
    * config/riscv/riscv-vector-switch.def (VLS_ENTRY): Ditto.
    * config/riscv/riscv.cc (riscv_v_ext_vls_mode_p): Ditto.
    (VLS_ENTRY): Ditto.
    (riscv_v_ext_mode_p): Ditto.
    (riscv_hard_regno_nregs): Ditto.
    (riscv_hard_regno_mode_ok): Ditto.
    * config/riscv/riscv.md: Ditto.
    * config/riscv/vector-iterators.md: Ditto.
    * config/riscv/vector.md: Ditto.
    * config/riscv/autovec-vls.md: New file.
--- So I expected we were going to have to define some static length patterns at some point. So this isn't a huge surprise. diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index 6421e933ca9..6fc1c433069 100644 --- a/gcc/config/riscv/riscv-v.cc +++
Re: [PATCH] c-family: implement -ffp-contract=on
> On 19.06.2023 at 19:03, Alexander Monakov wrote:
>
> Ping. OK for trunk?

OK if the FE maintainers do not object within 48h.

Thanks,
Richard

>> On Mon, 5 Jun 2023, Alexander Monakov wrote:
>>
>> Ping for the front-end maintainers' input.
>>
>>> On Mon, 22 May 2023, Richard Biener wrote:
>>>
>>> On Thu, May 18, 2023 at 11:04 PM Alexander Monakov via Gcc-patches
>>> wrote:

Implement -ffp-contract=on for C and C++ without changing default behavior (=off for -std=cNN, =fast for C++ and -std=gnuNN).

>>> The documentation changes mention the defaults are changed for
>>> standard modes, I suppose you want to remove that hunk.

gcc/c-family/ChangeLog:

    * c-gimplify.cc (fma_supported_p): New helper.
    (c_gimplify_expr) [PLUS_EXPR, MINUS_EXPR]: Implement FMA
    contraction.

gcc/ChangeLog:

    * common.opt (fp_contract_mode) [on]: Remove fallback.
    * config/sh/sh.md (*fmasf4): Correct flag_fp_contract_mode test.
    * doc/invoke.texi (-ffp-contract): Update.
    * trans-mem.cc (diagnose_tm_1): Skip internal function calls.
---
 gcc/c-family/c-gimplify.cc | 78 ++
 gcc/common.opt             |  3 +-
 gcc/config/sh/sh.md        |  2 +-
 gcc/doc/invoke.texi        |  8 ++--
 gcc/trans-mem.cc           |  3 ++
 5 files changed, 88 insertions(+), 6 deletions(-)

diff --git a/gcc/c-family/c-gimplify.cc b/gcc/c-family/c-gimplify.cc
index ef5c7d919f..f7635d3b0c 100644
--- a/gcc/c-family/c-gimplify.cc
+++ b/gcc/c-family/c-gimplify.cc
@@ -41,6 +41,8 @@ along with GCC; see the file COPYING3. If not see
 #include "c-ubsan.h"
 #include "tree-nested.h"
 #include "context.h"
+#include "tree-pass.h"
+#include "internal-fn.h"

 /* The gimplification pass converts the language-dependent trees
    (ld-trees) emitted by the parser into language-independent trees
@@ -686,6 +688,14 @@ c_build_bind_expr (location_t loc, tree block, tree body)
   return bind;
 }

+/* Helper for c_gimplify_expr: test if target supports fma-like FN.
*/ + +static bool +fma_supported_p (enum internal_fn fn, tree type) +{ + return direct_internal_fn_supported_p (fn, type, OPTIMIZE_FOR_BOTH); +} + /* Gimplification of expression trees. */ /* Do C-specific gimplification on *EXPR_P. PRE_P and POST_P are as in @@ -739,6 +749,74 @@ c_gimplify_expr (tree *expr_p, gimple_seq *pre_p ATTRIBUTE_UNUSED, break; } +case PLUS_EXPR: +case MINUS_EXPR: + { + tree type = TREE_TYPE (*expr_p); + /* For -ffp-contract=on we need to attempt FMA contraction only + during initial gimplification. Late contraction across statement + boundaries would violate language semantics. */ + if (SCALAR_FLOAT_TYPE_P (type) + && flag_fp_contract_mode == FP_CONTRACT_ON + && cfun && !(cfun->curr_properties & PROP_gimple_any) + && fma_supported_p (IFN_FMA, type)) + { + bool neg_mul = false, neg_add = code == MINUS_EXPR; + + tree *op0_p = _OPERAND (*expr_p, 0); + tree *op1_p = _OPERAND (*expr_p, 1); + + /* Look for ±(x * y) ± z, swapping operands if necessary. */ + if (TREE_CODE (*op0_p) == NEGATE_EXPR + && TREE_CODE (TREE_OPERAND (*op0_p, 0)) == MULT_EXPR) + /* '*EXPR_P' is '-(x * y) ± z'. This is fine. */; + else if (TREE_CODE (*op0_p) != MULT_EXPR) + { + std::swap (op0_p, op1_p); + std::swap (neg_mul, neg_add); + } + if (TREE_CODE (*op0_p) == NEGATE_EXPR) + { + op0_p = _OPERAND (*op0_p, 0); + neg_mul = !neg_mul; + } + if (TREE_CODE (*op0_p) != MULT_EXPR) + break; + auto_vec ops (3); + ops.quick_push (TREE_OPERAND (*op0_p, 0)); + ops.quick_push (TREE_OPERAND (*op0_p, 1)); + ops.quick_push (*op1_p); + + enum internal_fn ifn = IFN_FMA; + if (neg_mul) + { + if (fma_supported_p (IFN_FNMA, type)) + ifn = IFN_FNMA; + else + ops[0] = build1 (NEGATE_EXPR, type, ops[0]); + } + if (neg_add) + { + enum internal_fn ifn2 = ifn == IFN_FMA ? IFN_FMS : IFN_FNMS; + if (fma_supported_p (ifn2, type)) + ifn = ifn2; + else +
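
[Editor's illustration, not part of the patch.] The comment in the hunk above says contraction must happen only during initial gimplification because "late contraction across statement boundaries would violate language semantics". The two hypothetical functions below sketch that distinction at the source level: under -ffp-contract=on a compiler may fuse the multiply and add in the first, but not in the second, where the product is rounded when it is assigned. (With -ffp-contract=fast, GCC may fuse both; whether any fusion happens also depends on the target having an FMA instruction.)

```c
#include <assert.h>

/* Single expression: eligible for contraction into one fused
   multiply-add under -ffp-contract=on.  */
double
mul_add_one_expression (double a, double b, double c)
{
  return a * b + c;
}

/* Two statements: the product is rounded when stored in T, so
   fusing across the statement boundary is not permitted under
   -ffp-contract=on.  */
double
mul_add_two_statements (double a, double b, double c)
{
  double t = a * b;
  return t + c;
}

/* When every intermediate result is exactly representable, both
   forms agree regardless of contraction.  */
void
check_exact_case (void)
{
  assert (mul_add_one_expression (2.0, 3.0, 4.0)
          == mul_add_two_statements (2.0, 3.0, 4.0));
}
```

The results can differ only when the product a * b is not exactly representable, because a fused operation rounds once instead of twice.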
Re: Tiny phiprop compile time optimization
> On 19.06.2023 at 20:08, Andrew Pinski via Gcc-patches wrote:
>
> On Mon, Jun 19, 2023 at 1:32 AM Richard Biener via Gcc-patches wrote:
>>
>>> On Mon, 19 Jun 2023, Jan Hubicka wrote:
>>>
>>> Hi,
>>> this patch avoids unnecessary post dominator and update_ssa in phiprop.
>>>
>>> Bootstrapped/regtested x86_64-linux, OK?
>>>
>>> gcc/ChangeLog:
>>>
>>> * tree-ssa-phiprop.cc (propagate_with_phi): Add
>>> post_dominators_computed;
>>> compute post dominators lazily.
>>> (const pass_data pass_data_phiprop): Remove TODO_update_ssa.
>>> (pass_phiprop::execute): Update; return TODO_update_ssa if something
>>> changed.
>>>
>>> diff --git a/gcc/tree-ssa-phiprop.cc b/gcc/tree-ssa-phiprop.cc
>>> index 3cb4900b6be..87e3a2ccf3a 100644
>>> --- a/gcc/tree-ssa-phiprop.cc
>>> +++ b/gcc/tree-ssa-phiprop.cc
>>> @@ -260,7 +260,7 @@ chk_uses (tree, tree *idx, void *data)
>>>
>>> static bool
>>> propagate_with_phi (basic_block bb, gphi *phi, struct phiprop_d *phivn,
>>> - size_t n)
>>> + size_t n, bool *post_dominators_computed)
>>> {
>>> tree ptr = PHI_RESULT (phi);
>>> gimple *use_stmt;
>>> @@ -324,6 +324,12 @@ propagate_with_phi (basic_block bb, gphi *phi, struct
>>> phiprop_d *phivn,
>>> gimple *def_stmt;
>>> tree vuse;
>>>
>>> + if (!*post_dominators_computed)
>>> +{
>>> + calculate_dominance_info (CDI_POST_DOMINATORS);
>>> + *post_dominators_computed = true;
>>
>> I think you can save the parameter by using dom_info_available_p () here
>> and ...
>>
>>> + }
>>> +
>>> /* Only replace loads in blocks that post-dominate the PHI node. That
>>> makes sure we don't end up speculating loads.
*/ >>> if (!dominated_by_p (CDI_POST_DOMINATORS, >>> @@ -465,7 +471,7 @@ const pass_data pass_data_phiprop = >>> 0, /* properties_provided */ >>> 0, /* properties_destroyed */ >>> 0, /* todo_flags_start */ >>> - TODO_update_ssa, /* todo_flags_finish */ >>> + 0, /* todo_flags_finish */ >>> }; >>> >>> class pass_phiprop : public gimple_opt_pass >>> @@ -490,9 +497,9 @@ pass_phiprop::execute (function *fun) >>> gphi_iterator gsi; >>> unsigned i; >>> size_t n; >>> + bool post_dominators_computed = false; >>> >>> calculate_dominance_info (CDI_DOMINATORS); >>> - calculate_dominance_info (CDI_POST_DOMINATORS); >>> >>> n = num_ssa_names; >>> phivn = XCNEWVEC (struct phiprop_d, n); >>> @@ -508,7 +515,8 @@ pass_phiprop::execute (function *fun) >>> if (bb_has_abnormal_pred (bb)) >>> continue; >>> for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next ()) >>> - did_something |= propagate_with_phi (bb, gsi.phi (), phivn, n); >>> + did_something |= propagate_with_phi (bb, gsi.phi (), phivn, n, >>> + _dominators_computed); >>> } >>> >>> if (did_something) >>> @@ -516,9 +524,10 @@ pass_phiprop::execute (function *fun) >>> >>> free (phivn); >>> >>> - free_dominance_info (CDI_POST_DOMINATORS); >>> + if (post_dominators_computed) >>> +free_dominance_info (CDI_POST_DOMINATORS); >> >> unconditionally free_dominance_info here. >> >>> - return 0; >>> + return did_something ? TODO_update_ssa : 0; >> >> I guess that change is following general practice and good to catch >> undesired changes (update_ssa will exit early when there's nothing >> to do anyway). > > I wonder if TODO_update_ssa_only_virtuals should be used here rather > than TODO_update_ssa as the code produces ssa names already and just > adds memory loads/stores. But I could be wrong. I guess it should be able to update virtual SSA form itself. But it’s been some time since I wrote the pass … > > Thanks, > Andrew Pinski > > >> >> OK with those changes.
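
[Editor's illustration, not part of the thread.] The shape of the change under review (compute an expensive analysis only on first use, free it unconditionally, and report whether anything changed) can be sketched outside GCC. Everything here is hypothetical: struct analysis stands in for the post-dominator information, its available flag plays the role that dom_info_available_p () would play per Richard's suggestion, and times_computed exists only so the example can be checked.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an expensive, cached analysis.  */
struct analysis
{
  bool available;      /* is the cached result valid?  */
  int times_computed;  /* instrumentation for this example only  */
};

/* Compute the analysis on first use; later calls are no-ops.  */
void
ensure_analysis (struct analysis *a)
{
  if (!a->available)
    {
      a->times_computed++;
      a->available = true;
    }
}

/* Safe to call unconditionally: freeing an analysis that was never
   computed does nothing harmful.  */
void
free_analysis (struct analysis *a)
{
  a->available = false;
}

/* Walk N items; only candidates need the analysis.  Returns true if
   anything was changed, so the caller can request follow-up work
   only in that case.  */
bool
process (struct analysis *a, const bool *candidate, int n)
{
  bool did_something = false;
  assert (n >= 0);
  for (int i = 0; i < n; i++)
    if (candidate[i])
      {
        ensure_analysis (a);
        did_something = true;
      }
  free_analysis (a);
  return did_something;
}
```

Querying availability from the analysis itself removes the need to thread a separate "computed" flag through the call chain, which is the simplification suggested in the review.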
Re: Tiny phiprop compile time optimization
On Mon, Jun 19, 2023 at 1:32 AM Richard Biener via Gcc-patches wrote: > > On Mon, 19 Jun 2023, Jan Hubicka wrote: > > > Hi, > > this patch avoids unnecessary post dominator and update_ssa in phiprop. > > > > Bootstrapped/regtested x86_64-linux, OK? > > > > gcc/ChangeLog: > > > > * tree-ssa-phiprop.cc (propagate_with_phi): Add > > post_dominators_computed; > > compute post dominators lazilly. > > (const pass_data pass_data_phiprop): Remove TODO_update_ssa. > > (pass_phiprop::execute): Update; return TODO_update_ssa if something > > changed. > > > > diff --git a/gcc/tree-ssa-phiprop.cc b/gcc/tree-ssa-phiprop.cc > > index 3cb4900b6be..87e3a2ccf3a 100644 > > --- a/gcc/tree-ssa-phiprop.cc > > +++ b/gcc/tree-ssa-phiprop.cc > > @@ -260,7 +260,7 @@ chk_uses (tree, tree *idx, void *data) > > > > static bool > > propagate_with_phi (basic_block bb, gphi *phi, struct phiprop_d *phivn, > > - size_t n) > > + size_t n, bool *post_dominators_computed) > > { > >tree ptr = PHI_RESULT (phi); > >gimple *use_stmt; > > @@ -324,6 +324,12 @@ propagate_with_phi (basic_block bb, gphi *phi, struct > > phiprop_d *phivn, > >gimple *def_stmt; > >tree vuse; > > > > + if (!*post_dominators_computed) > > +{ > > + calculate_dominance_info (CDI_POST_DOMINATORS); > > + *post_dominators_computed = true; > > I think you can save the parameter by using dom_info_available_p () here > and ... > > > + } > > + > >/* Only replace loads in blocks that post-dominate the PHI node. > > That > > makes sure we don't end up speculating loads. 
*/ > >if (!dominated_by_p (CDI_POST_DOMINATORS, > > @@ -465,7 +471,7 @@ const pass_data pass_data_phiprop = > >0, /* properties_provided */ > >0, /* properties_destroyed */ > >0, /* todo_flags_start */ > > - TODO_update_ssa, /* todo_flags_finish */ > > + 0, /* todo_flags_finish */ > > }; > > > > class pass_phiprop : public gimple_opt_pass > > @@ -490,9 +497,9 @@ pass_phiprop::execute (function *fun) > >gphi_iterator gsi; > >unsigned i; > >size_t n; > > + bool post_dominators_computed = false; > > > >calculate_dominance_info (CDI_DOMINATORS); > > - calculate_dominance_info (CDI_POST_DOMINATORS); > > > >n = num_ssa_names; > >phivn = XCNEWVEC (struct phiprop_d, n); > > @@ -508,7 +515,8 @@ pass_phiprop::execute (function *fun) > >if (bb_has_abnormal_pred (bb)) > > continue; > >for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next ()) > > - did_something |= propagate_with_phi (bb, gsi.phi (), phivn, n); > > + did_something |= propagate_with_phi (bb, gsi.phi (), phivn, n, > > + _dominators_computed); > > } > > > >if (did_something) > > @@ -516,9 +524,10 @@ pass_phiprop::execute (function *fun) > > > >free (phivn); > > > > - free_dominance_info (CDI_POST_DOMINATORS); > > + if (post_dominators_computed) > > +free_dominance_info (CDI_POST_DOMINATORS); > > unconditionally free_dominance_info here. > > > - return 0; > > + return did_something ? TODO_update_ssa : 0; > > I guess that change is following general practice and good to catch > undesired changes (update_ssa will exit early when there's nothing > to do anyway). I wonder if TODO_update_ssa_only_virtuals should be used here rather than TODO_update_ssa as the code produces ssa names already and just adds memory loads/stores. But I could be wrong. Thanks, Andrew Pinski > > OK with those changes.
Re: [PATCH] RISC-V: Add tuple vector mode psABI checking and simplify code
On 6/18/23 07:16, 钟居哲 wrote:

Thanks for cleaning up the code for the future ABI support patch. Let's wait for comments from Jeff or Robin.

Looks reasonable to me given the state we're in WRT psabi and vectors.

jeff
Re: [PATCH] Do not allow "x + 0.0" to "x" optimization with -fsignaling-nans
On 6/19/23 05:41, Richard Biener via Gcc-patches wrote:

On Mon, Jun 19, 2023 at 12:33 PM Toru Kisuki via Gcc-patches wrote:

Hi,

With -O3 -fsignaling-nans -fno-signed-zeros, compiler should not simplify 'x + 0.0' to 'x'.

OK if you bootstrapped / tested this change.

I suspect Toru doesn't have write access. So I went ahead and did an x86 bootstrap & regression test, which passed.

The ChangeLog entry needed fleshing out a bit and I fixed a minor whitespace problem in the patch itself.

Pushed to the trunk.

jeff
[Bug rtl-optimization/110305] Incorrect optimization with -O3 -fsignaling-nans -fno-signed-zeros
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110305 --- Comment #5 from CVS Commits --- The master branch has been updated by Jeff Law : https://gcc.gnu.org/g:827b2a279fc6ad5bb76e4d2c2eb3432955b5e11c commit r14-1952-g827b2a279fc6ad5bb76e4d2c2eb3432955b5e11c Author: Toru Kisuki Date: Mon Jun 19 11:51:09 2023 -0600 Do not allow "x + 0.0" to "x" optimization with -fsignaling-nans gcc/ PR rtl-optimization/110305 * simplify-rtx.cc (simplify_context::simplify_binary_operation_1): Handle HONOR_SNANS for x + 0.0.
[Bug c/102989] Implement C2x's n2763 (_BitInt)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102989

Jakub Jelinek changed:

           What    |Removed                     |Added
  Attachment #55329|0                           |1
        is obsolete|                            |

--- Comment #66 from Jakub Jelinek ---
Created attachment 55364
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55364&action=edit
gcc14-bitint-wip.patch

Updated patch. This can already do some simple lowering of the large/huge _BitInt operations, like:

void
foo (_BitInt(192) *x, _BitInt(192) *y, _BitInt(135) *z, _BitInt(135) *w)
{
  x[0] &= y[0];
  x[1] |= y[1];
  x[2] ^= y[2];
  x[3] = ~y[3];
  z[0] &= w[0];
  z[1] |= w[1];
  z[2] ^= w[2];
  z[3] = ~w[3];
}

_BitInt(517) a, b, c, d, e, f;

void
bar (void)
{
  a &= b;
  c |= b;
  d ^= b;
  e = ~f;
}

Additions/subtractions/left shift by small constant next.
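
[Editor's illustration, not part of the bug comment.] "Lowering" a large _BitInt bitwise operation means rewriting it as a loop over machine-word limbs. The sketch below shows the idea by hand for a 192-bit AND and for an N-bit NOT. The names are hypothetical, 64-bit little-endian limb order is an assumption, and masking the unused high bits of the top limb is a choice made for this example; how the actual patch treats padding bits is not stated in the comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LIMBS_192 3 /* 192 bits as three 64-bit limbs */

/* x &= y for 192-bit values; limb 0 is the least significant.  */
void
and_192 (uint64_t x[LIMBS_192], const uint64_t y[LIMBS_192])
{
  for (size_t i = 0; i < LIMBS_192; i++)
    x[i] &= y[i];
}

/* x = ~y for an NBITS-wide value stored in (NBITS + 63) / 64 limbs.
   Bits above NBITS in the top limb are cleared (example's choice).  */
void
not_bits (uint64_t *x, const uint64_t *y, unsigned nbits)
{
  assert (nbits > 0);
  size_t nlimbs = (nbits + 63) / 64;
  for (size_t i = 0; i < nlimbs; i++)
    x[i] = ~y[i];
  unsigned rem = nbits % 64;
  if (rem != 0)
    x[nlimbs - 1] &= (UINT64_C (1) << rem) - 1;
}
```

For a 135-bit value this uses three limbs with 7 significant bits in the top one, so complementing zero yields two all-ones limbs followed by 0x7f. Bitwise operations need no carries between limbs, which is presumably why they are the "simple" cases handled first.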
[Bug c++/110312] -Wcast-align=strict warning despite alignas
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110312

--- Comment #2 from Frank Heckenbach ---
(In reply to Andrew Pinski from comment #1)
> The decl has the increased alignment but the type does not in this case.
>
> So I think the warning is still correct.

So there's no way around it other than disabling the warning, correct?
[Bug c++/110312] -Wcast-align=strict warning despite alignas
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110312

Andrew Pinski changed:

           What    |Removed                     |Added
           Keywords|                            |diagnostic

--- Comment #1 from Andrew Pinski ---
The decl has the increased alignment but the type does not in this case.

So I think the warning is still correct.
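
[Editor's illustration, not part of the bug report.] The distinction drawn in comment #1, between the alignment of a declaration and the alignment of its type, can be made concrete in C11. The report itself is about C++, and its testcase is not shown here, so the buffer and function names below are hypothetical. The object is over-aligned by its declaration, yet the element type still only promises alignment 1, and a cast-alignment check that looks at types alone sees only the latter.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* The declaration carries the stricter alignment; the element type
   is still plain unsigned char.  */
static alignas (int) unsigned char buffer[4 * sizeof (int)];

static_assert (sizeof buffer >= sizeof (int),
               "buffer must hold at least one int");

/* Alignment promised by the element type alone.  */
size_t
element_type_alignment (void)
{
  return alignof (unsigned char);
}

/* Nonzero if the object itself is suitably aligned for int.  */
int
object_is_int_aligned (void)
{
  return (uintptr_t) buffer % alignof (int) == 0;
}
```

Here the type-level alignment is 1 while the object is int-aligned, which is the gap between what the type says and what the declaration guarantees.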
Re: [PATCH] debug/110295 - mixed up early/late debug for member DIEs
On 6/19/23 06:15, Richard Biener wrote:

When we process a scope typedef during early debug creation and we have already created a DIE for the type when the decl is TYPE_DECL_IS_STUB and this DIE is still in limbo we end up just re-parenting that type DIE instead of properly creating a DIE for the decl, eventually picking up the now completed type and creating DIEs for the members. Instead this is currently deferred to the second time we come here, when we annotate the DIEs with locations late where now the type DIE is no longer in limbo and we fall through doing the job for the decl.

The following makes sure we perform the necessary early tasks for this by continuing with the decl DIE creation after setting a parent for the limbo type DIE.

[LTO] Bootstrapped on x86_64-unknown-linux-gnu.

OK for trunk?

Thanks,
Richard.

    PR debug/110295
    * dwarf2out.cc (process_scope_var): Continue processing
    the decl after setting a parent in case the existing DIE
    was in limbo.

    * g++.dg/debug/pr110295.C: New testcase.
---
 gcc/dwarf2out.cc                      |  3 ++-
 gcc/testsuite/g++.dg/debug/pr110295.C | 19 +++
 2 files changed, 21 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/debug/pr110295.C

diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index d89ffa66847..e70c47cec8d 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -26533,7 +26533,8 @@ process_scope_var (tree stmt, tree decl, tree origin, dw_die_ref context_die)
   if (die != NULL && die->die_parent == NULL)
     add_child_die (context_die, die);

I wonder about reorganizing the function a bit to unify this parent setting with the one a bit below, which already falls through to gen_decl_die:

  if (decl && DECL_P (decl))
    {
      die = lookup_decl_die (decl);

      /* Early created DIEs do not have a parent as the decls refer
         to the function as DECL_CONTEXT rather than the BLOCK.  */
      if (die && die->die_parent == NULL)
        {
          gcc_assert (in_lto_p);
          add_child_die (context_die, die);
        }
    }

OK either way.

Jason
Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.
On Mon, Jun 19, 2023 at 7:57 PM Thiago Jung Bauermann wrote:
>
>
> Hello Manolis,
>
> Philipp Tomsich writes:
>
> > On Thu, 8 Jun 2023 at 00:18, Jeff Law wrote:
> >>
> >> On 5/25/23 06:35, Manolis Tsamis wrote:
> >> > Propagation of the stack pointer in cprop_hardreg is currently forbidden
> >> > in all cases, due to maybe_mode_change returning NULL. Relax this
> >> > restriction and allow propagation when no mode change is requested.
> >> >
> >> > gcc/ChangeLog:
> >> >
> >> >         * regcprop.cc (maybe_mode_change): Enable stack pointer
> >> >         propagation.
> >> Thanks for the clarification. This is OK for the trunk. It looks
> >> generic enough to have value going forward now rather than waiting.
> >
> > Rebased, retested, and applied to trunk. Thanks!
>
> Our CI found a couple of tests that started failing on aarch64-linux
> after this commit. I was able to confirm manually that they don't happen
> in the commit immediately before this one, and also that these failures
> are still present in today's trunk.
>
> I have testsuite logs for last good commit, first bad commit and current
> trunk here:
>
> https://people.linaro.org/~thiago.bauermann/gcc-regression-6a2e8dcbbd4b/
>
> Could you please check?
>
> These are the new failures:
>
> Running gcc:gcc.target/aarch64/aarch64.exp ...
> FAIL: gcc.target/aarch64/stack-check-cfa-3.c scan-assembler-times mov\\tx11, sp 1
>
> Running gcc:gcc.target/aarch64/sve/pcs/aarch64-sve-pcs.exp ...
> FAIL: gcc.target/aarch64/sve/pcs/args_1.c -march=armv8.2-a+sve > -fno-stack-protector check-function-bodies caller_pred > FAIL: gcc.target/aarch64/sve/pcs/args_2.c -march=armv8.2-a+sve > -fno-stack-protector scan-assembler \\tmov\\t(z[0-9]+\\.b), > #8\\n.*\\tst1b\\t\\1, p[0-7], \\[x4\\]\\n > FAIL: gcc.target/aarch64/sve/pcs/args_3.c -march=armv8.2-a+sve > -fno-stack-protector scan-assembler \\tmov\\t(z[0-9]+\\.b), > #8\\n.*\\tst1b\\t\\1, p[0-7], \\[x4\\]\\n > FAIL: gcc.target/aarch64/sve/pcs/args_4.c -march=armv8.2-a+sve > -fno-stack-protector scan-assembler \\tfmov\\t(z[0-9]+\\.h), > #8\\.0.*\\tst1h\\t\\1, p[0-7], \\[x4\\]\\n > FAIL: gcc.target/aarch64/sve/pcs/args_5_be_bf16.c -march=armv8.2-a+sve > -fno-stack-protector scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - > z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n > FAIL: gcc.target/aarch64/sve/pcs/args_5_be_f16.c -march=armv8.2-a+sve > -fno-stack-protector scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - > z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n > FAIL: gcc.target/aarch64/sve/pcs/args_5_be_f32.c -march=armv8.2-a+sve > -fno-stack-protector scan-assembler \\tld2w\\t{(z[0-9]+\\.s) - > z[0-9]+\\.s}.*\\tst1w\\t\\1, p[0-7], \\[x1\\]\\n > FAIL: gcc.target/aarch64/sve/pcs/args_5_be_f64.c -march=armv8.2-a+sve > -fno-stack-protector scan-assembler \\tld2d\\t{(z[0-9]+\\.d) - > z[0-9]+\\.d}.*\\tst1d\\t\\1, p[0-7], \\[x1\\]\\n > FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s16.c -march=armv8.2-a+sve > -fno-stack-protector scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - > z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n > FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s32.c -march=armv8.2-a+sve > -fno-stack-protector scan-assembler \\tld2w\\t{(z[0-9]+\\.s) - > z[0-9]+\\.s}.*\\tst1w\\t\\1, p[0-7], \\[x1\\]\\n > FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s64.c -march=armv8.2-a+sve > -fno-stack-protector scan-assembler \\tld2d\\t{(z[0-9]+\\.d) - > z[0-9]+\\.d}.*\\tst1d\\t\\1, p[0-7], \\[x1\\]\\n > FAIL: 
gcc.target/aarch64/sve/pcs/args_5_be_s8.c -march=armv8.2-a+sve > -fno-stack-protector scan-assembler \\tld2b\\t{(z[0-9]+\\.b) - > z[0-9]+\\.b}.*\\tst1b\\t\\1, p[0-7], \\[x1\\]\\n > FAIL: gcc.target/aarch64/sve/pcs/args_5_be_u16.c -march=armv8.2-a+sve > -fno-stack-protector scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - > z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n > FAIL: gcc.target/aarch64/sve/pcs/args_5_be_u32.c -march=armv8.2-a+sve > -fno-stack-protector scan-assembler \\tld2w\\t{(z[0-9]+\\.s) - > z[0-9]+\\.s}.*\\tst1w\\t\\1, p[0-7], \\[x1\\]\\n > FAIL: gcc.target/aarch64/sve/pcs/args_5_be_u64.c -march=armv8.2-a+sve > -fno-stack-protector scan-assembler \\tld2d\\t{(z[0-9]+\\.d) - > z[0-9]+\\.d}.*\\tst1d\\t\\1, p[0-7], \\[x1\\]\\n > FAIL: gcc.target/aarch64/sve/pcs/args_5_be_u8.c -march=armv8.2-a+sve > -fno-stack-protector scan-assembler \\tld2b\\t{(z[0-9]+\\.b) - > z[0-9]+\\.b}.*\\tst1b\\t\\1, p[0-7], \\[x1\\]\\n > FAIL: gcc.target/aarch64/sve/pcs/args_5_le_bf16.c -march=armv8.2-a+sve > -fno-stack-protector scan-assembler \\tld2h\\t{(z[0-9]+)\\.h - > z[0-9]+\\.h}.*\\tstr\\t\\1, \\[x1\\]\\n > FAIL: gcc.target/aarch64/sve/pcs/args_5_le_f16.c -march=armv8.2-a+sve > -fno-stack-protector scan-assembler \\tld2h\\t{(z[0-9]+)\\.h - > z[0-9]+\\.h}.*\\tstr\\t\\1, \\[x1\\]\\n > FAIL: gcc.target/aarch64/sve/pcs/args_5_le_f32.c -march=armv8.2-a+sve > -fno-stack-protector scan-assembler \\tld2w\\t{(z[0-9]+)\\.s - > z[0-9]+\\.s}.*\\tstr\\t\\1, \\[x1\\]\\n > FAIL:
Re: [PATCH] c-family: implement -ffp-contract=on
Ping. OK for trunk?

On Mon, 5 Jun 2023, Alexander Monakov wrote:

> Ping for the front-end maintainers' input.
>
> On Mon, 22 May 2023, Richard Biener wrote:
>
> > On Thu, May 18, 2023 at 11:04 PM Alexander Monakov via Gcc-patches
> > wrote:
> > >
> > > Implement -ffp-contract=on for C and C++ without changing default
> > > behavior (=off for -std=cNN, =fast for C++ and -std=gnuNN).
> >
> > The documentation changes mention the defaults are changed for
> > standard modes, I suppose you want to remove that hunk.
> >
> > > gcc/c-family/ChangeLog:
> > >
> > >         * c-gimplify.cc (fma_supported_p): New helper.
> > >         (c_gimplify_expr) [PLUS_EXPR, MINUS_EXPR]: Implement FMA
> > >         contraction.
> > >
> > > gcc/ChangeLog:
> > >
> > >         * common.opt (fp_contract_mode) [on]: Remove fallback.
> > >         * config/sh/sh.md (*fmasf4): Correct flag_fp_contract_mode test.
> > >         * doc/invoke.texi (-ffp-contract): Update.
> > >         * trans-mem.cc (diagnose_tm_1): Skip internal function calls.
> > > ---
> > >  gcc/c-family/c-gimplify.cc | 78 ++
> > >  gcc/common.opt             |  3 +-
> > >  gcc/config/sh/sh.md        |  2 +-
> > >  gcc/doc/invoke.texi        |  8 ++--
> > >  gcc/trans-mem.cc           |  3 ++
> > >  5 files changed, 88 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/gcc/c-family/c-gimplify.cc b/gcc/c-family/c-gimplify.cc
> > > index ef5c7d919f..f7635d3b0c 100644
> > > --- a/gcc/c-family/c-gimplify.cc
> > > +++ b/gcc/c-family/c-gimplify.cc
> > > @@ -41,6 +41,8 @@ along with GCC; see the file COPYING3. If not see
> > >  #include "c-ubsan.h"
> > >  #include "tree-nested.h"
> > >  #include "context.h"
> > > +#include "tree-pass.h"
> > > +#include "internal-fn.h"
> > >
> > >  /* The gimplification pass converts the language-dependent trees
> > >      (ld-trees) emitted by the parser into language-independent trees
> > > @@ -686,6 +688,14 @@ c_build_bind_expr (location_t loc, tree block, tree body)
> > >    return bind;
> > >  }
> > >
> > > +/* Helper for c_gimplify_expr: test if target supports fma-like FN.  */
> > > +
> > > +static bool
> > > +fma_supported_p (enum internal_fn fn, tree type)
> > > +{
> > > +  return direct_internal_fn_supported_p (fn, type, OPTIMIZE_FOR_BOTH);
> > > +}
> > > +
> > >  /* Gimplification of expression trees.  */
> > >
> > >  /* Do C-specific gimplification on *EXPR_P.  PRE_P and POST_P are as in
> > > @@ -739,6 +749,74 @@ c_gimplify_expr (tree *expr_p, gimple_seq *pre_p ATTRIBUTE_UNUSED,
> > >         break;
> > >        }
> > >
> > > +    case PLUS_EXPR:
> > > +    case MINUS_EXPR:
> > > +      {
> > > +        tree type = TREE_TYPE (*expr_p);
> > > +        /* For -ffp-contract=on we need to attempt FMA contraction only
> > > +           during initial gimplification.  Late contraction across statement
> > > +           boundaries would violate language semantics.  */
> > > +        if (SCALAR_FLOAT_TYPE_P (type)
> > > +            && flag_fp_contract_mode == FP_CONTRACT_ON
> > > +            && cfun && !(cfun->curr_properties & PROP_gimple_any)
> > > +            && fma_supported_p (IFN_FMA, type))
> > > +          {
> > > +            bool neg_mul = false, neg_add = code == MINUS_EXPR;
> > > +
> > > +            tree *op0_p = &TREE_OPERAND (*expr_p, 0);
> > > +            tree *op1_p = &TREE_OPERAND (*expr_p, 1);
> > > +
> > > +            /* Look for ±(x * y) ± z, swapping operands if necessary.  */
> > > +            if (TREE_CODE (*op0_p) == NEGATE_EXPR
> > > +                && TREE_CODE (TREE_OPERAND (*op0_p, 0)) == MULT_EXPR)
> > > +              /* '*EXPR_P' is '-(x * y) ± z'.  This is fine.  */;
> > > +            else if (TREE_CODE (*op0_p) != MULT_EXPR)
> > > +              {
> > > +                std::swap (op0_p, op1_p);
> > > +                std::swap (neg_mul, neg_add);
> > > +              }
> > > +            if (TREE_CODE (*op0_p) == NEGATE_EXPR)
> > > +              {
> > > +                op0_p = &TREE_OPERAND (*op0_p, 0);
> > > +                neg_mul = !neg_mul;
> > > +              }
> > > +            if (TREE_CODE (*op0_p) != MULT_EXPR)
> > > +              break;
> > > +            auto_vec<tree, 3> ops (3);
> > > +            ops.quick_push (TREE_OPERAND (*op0_p, 0));
> > > +            ops.quick_push (TREE_OPERAND (*op0_p, 1));
> > > +            ops.quick_push (*op1_p);
> > > +
> > > +            enum internal_fn ifn = IFN_FMA;
> > > +            if (neg_mul)
> > > +              {
> > > +                if (fma_supported_p (IFN_FNMA, type))
> > > +                  ifn = IFN_FNMA;
> > > +                else
> > > +                  ops[0] = build1 (NEGATE_EXPR, type, ops[0]);
> > > +              }
> > > +            if (neg_add)
> > > +              {
> > > +                enum internal_fn ifn2 = ifn == IFN_FMA ? IFN_FMS : IFN_FNMS;
> > > +                if (fma_supported_p (ifn2, type))
> > > +                  ifn = ifn2;
> > > +                else
> > > +                  ops[2] =
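(Editorial illustration, not part of the quoted patch.) The comment in the hunk above says contraction must happen only within an expression, never across statement boundaries. A minimal C++ sketch of the source-level difference follows; the function names are hypothetical and chosen only for this example, and whether a fused instruction is actually emitted depends on the target and on the -ffp-contract setting in effect.

```cpp
#include <cmath>

// One source-level expression: under -ffp-contract=on the compiler may
// (but need not) evaluate this as a single fused multiply-add.
double within_expression(double a, double b, double c) {
    return a * b + c;
}

// The product is a separate statement, so under -ffp-contract=on it is
// rounded before the addition and must not be fused with it.
// (Under -ffp-contract=fast the compiler may still fuse the two.)
double across_statements(double a, double b, double c) {
    double t = a * b;
    return t + c;
}

// Explicitly fused form: always a single rounding, independent of the flag.
double explicit_fma(double a, double b, double c) {
    return std::fma(a, b, c);
}
```

For inputs whose product is exactly representable all three agree; they can differ in the last bit only when a * b needs rounding.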
Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.
Hello Manolis,

Philipp Tomsich writes:

> On Thu, 8 Jun 2023 at 00:18, Jeff Law wrote:
>>
>> On 5/25/23 06:35, Manolis Tsamis wrote:
>> > Propagation of the stack pointer in cprop_hardreg is currently forbidden
>> > in all cases, due to maybe_mode_change returning NULL. Relax this
>> > restriction and allow propagation when no mode change is requested.
>> >
>> > gcc/ChangeLog:
>> >
>> >         * regcprop.cc (maybe_mode_change): Enable stack pointer
>> >         propagation.
>> Thanks for the clarification. This is OK for the trunk. It looks
>> generic enough to have value going forward now rather than waiting.
>
> Rebased, retested, and applied to trunk. Thanks!

Our CI found a couple of tests that started failing on aarch64-linux
after this commit. I was able to confirm manually that they don't happen
in the commit immediately before this one, and also that these failures
are still present in today's trunk.

I have testsuite logs for last good commit, first bad commit and current
trunk here:

https://people.linaro.org/~thiago.bauermann/gcc-regression-6a2e8dcbbd4b/

Could you please check?

These are the new failures:

Running gcc:gcc.target/aarch64/aarch64.exp ...
FAIL: gcc.target/aarch64/stack-check-cfa-3.c scan-assembler-times mov\\tx11, sp 1

Running gcc:gcc.target/aarch64/sve/pcs/aarch64-sve-pcs.exp ...
FAIL: gcc.target/aarch64/sve/pcs/args_1.c -march=armv8.2-a+sve -fno-stack-protector check-function-bodies caller_pred FAIL: gcc.target/aarch64/sve/pcs/args_2.c -march=armv8.2-a+sve -fno-stack-protector scan-assembler \\tmov\\t(z[0-9]+\\.b), #8\\n.*\\tst1b\\t\\1, p[0-7], \\[x4\\]\\n FAIL: gcc.target/aarch64/sve/pcs/args_3.c -march=armv8.2-a+sve -fno-stack-protector scan-assembler \\tmov\\t(z[0-9]+\\.b), #8\\n.*\\tst1b\\t\\1, p[0-7], \\[x4\\]\\n FAIL: gcc.target/aarch64/sve/pcs/args_4.c -march=armv8.2-a+sve -fno-stack-protector scan-assembler \\tfmov\\t(z[0-9]+\\.h), #8\\.0.*\\tst1h\\t\\1, p[0-7], \\[x4\\]\\n FAIL: gcc.target/aarch64/sve/pcs/args_5_be_bf16.c -march=armv8.2-a+sve -fno-stack-protector scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n FAIL: gcc.target/aarch64/sve/pcs/args_5_be_f16.c -march=armv8.2-a+sve -fno-stack-protector scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n FAIL: gcc.target/aarch64/sve/pcs/args_5_be_f32.c -march=armv8.2-a+sve -fno-stack-protector scan-assembler \\tld2w\\t{(z[0-9]+\\.s) - z[0-9]+\\.s}.*\\tst1w\\t\\1, p[0-7], \\[x1\\]\\n FAIL: gcc.target/aarch64/sve/pcs/args_5_be_f64.c -march=armv8.2-a+sve -fno-stack-protector scan-assembler \\tld2d\\t{(z[0-9]+\\.d) - z[0-9]+\\.d}.*\\tst1d\\t\\1, p[0-7], \\[x1\\]\\n FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s16.c -march=armv8.2-a+sve -fno-stack-protector scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s32.c -march=armv8.2-a+sve -fno-stack-protector scan-assembler \\tld2w\\t{(z[0-9]+\\.s) - z[0-9]+\\.s}.*\\tst1w\\t\\1, p[0-7], \\[x1\\]\\n FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s64.c -march=armv8.2-a+sve -fno-stack-protector scan-assembler \\tld2d\\t{(z[0-9]+\\.d) - z[0-9]+\\.d}.*\\tst1d\\t\\1, p[0-7], \\[x1\\]\\n FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s8.c -march=armv8.2-a+sve -fno-stack-protector scan-assembler 
\\tld2b\\t{(z[0-9]+\\.b) - z[0-9]+\\.b}.*\\tst1b\\t\\1, p[0-7], \\[x1\\]\\n FAIL: gcc.target/aarch64/sve/pcs/args_5_be_u16.c -march=armv8.2-a+sve -fno-stack-protector scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n FAIL: gcc.target/aarch64/sve/pcs/args_5_be_u32.c -march=armv8.2-a+sve -fno-stack-protector scan-assembler \\tld2w\\t{(z[0-9]+\\.s) - z[0-9]+\\.s}.*\\tst1w\\t\\1, p[0-7], \\[x1\\]\\n FAIL: gcc.target/aarch64/sve/pcs/args_5_be_u64.c -march=armv8.2-a+sve -fno-stack-protector scan-assembler \\tld2d\\t{(z[0-9]+\\.d) - z[0-9]+\\.d}.*\\tst1d\\t\\1, p[0-7], \\[x1\\]\\n FAIL: gcc.target/aarch64/sve/pcs/args_5_be_u8.c -march=armv8.2-a+sve -fno-stack-protector scan-assembler \\tld2b\\t{(z[0-9]+\\.b) - z[0-9]+\\.b}.*\\tst1b\\t\\1, p[0-7], \\[x1\\]\\n FAIL: gcc.target/aarch64/sve/pcs/args_5_le_bf16.c -march=armv8.2-a+sve -fno-stack-protector scan-assembler \\tld2h\\t{(z[0-9]+)\\.h - z[0-9]+\\.h}.*\\tstr\\t\\1, \\[x1\\]\\n FAIL: gcc.target/aarch64/sve/pcs/args_5_le_f16.c -march=armv8.2-a+sve -fno-stack-protector scan-assembler \\tld2h\\t{(z[0-9]+)\\.h - z[0-9]+\\.h}.*\\tstr\\t\\1, \\[x1\\]\\n FAIL: gcc.target/aarch64/sve/pcs/args_5_le_f32.c -march=armv8.2-a+sve -fno-stack-protector scan-assembler \\tld2w\\t{(z[0-9]+)\\.s - z[0-9]+\\.s}.*\\tstr\\t\\1, \\[x1\\]\\n FAIL: gcc.target/aarch64/sve/pcs/args_5_le_f64.c -march=armv8.2-a+sve -fno-stack-protector scan-assembler \\tld2d\\t{(z[0-9]+)\\.d - z[0-9]+\\.d}.*\\tstr\\t\\1, \\[x1\\]\\n FAIL: gcc.target/aarch64/sve/pcs/args_5_le_s16.c -march=armv8.2-a+sve -fno-stack-protector scan-assembler
Re: gcc tricore porting
On Mon, 19 Jun 2023, Mikael Pettersson via Gcc wrote: > (Note I'm reading the gcc mailing list via the Web archives, which > doesn't let me create "proper" replies. Oh well.) (there's a public-inbox instance at https://inbox.sourceware.org/gcc/ but some messages are not available there) Alexander
Re: gcc tricore porting
On Mon, Jun 19, 2023, 10:36 AM Mikael Pettersson via Gcc wrote:

> (Note I'm reading the gcc mailing list via the Web archives, which
> doesn't let me create "proper" replies. Oh well.)
>
> On Sun Jun 18 09:58:56 GMT 2023, wrote:
> > Hi, this is my first time with open source development. I worked in
> > automotive for 22 years and we (generally) were using tricore series for
> > these products. GCC doesn't compile on that platform. I left my work some
> > days ago and so I'll have some spare time in the next few months. I would
> > like to know how difficult it is to port the tricore platform on gcc and if
> > during this process somebody can support me as tutor and... also if the gcc
> > team is interested in this item...
>
> https://github.com/volumit has a port of gcc + binutils + newlib + gdb
> to Tricore, and it's not _that_ ancient. I have no idea where it
> originates from or how complete it is, but I do know the gcc-4.9.4
> based one builds with some tweaks.

https://github.com/volumit/package_494 says there is a port in process to
gcc 9. Perhaps digging in and assessing that would be a good start.

One question is whether that code has proper assignments on file for
ultimate inclusion. That should be part of your assessment.

--joel

> I don't know anything more about it, I'm just a collector of
> cross-compilers for obscure / lost / forgotten / abandoned targets.
>
> /Mikael
[Bug target/109811] libjxl 0.7 is a lot slower in GCC 13.1 vs Clang 16
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109811

--- Comment #12 from CVS Commits ---
The master branch has been updated by Jan Hubicka:

https://gcc.gnu.org/g:7b34cacc5735385e7e2855d7c0a6fad60ef4a99b

commit r14-1951-g7b34cacc5735385e7e2855d7c0a6fad60ef4a99b
Author: Jan Hubicka
Date:   Mon Jun 19 18:28:17 2023 +0200

    optimize std::max early

    We currently produce very bad code on loops using std::vector as a stack,
    since we fail to inline push_back, which in turn prevents SRA, and we fail
    to optimize out some store-to-load pairs.

    I looked into why this function is not inlined even though it is inlined
    by clang. We currently estimate it at 66 instructions, and the inline
    limits are 15 at -O2 and 30 at -O3. Clang has a similar estimate, but
    still decides to inline at -O2.

    I looked into the reason why the body is so large, and one problem I
    spotted is the way std::max is implemented: it takes and returns
    references to the values.

      const T& max( const T& a, const T& b );

    This makes it necessary to store the values to memory and load them
    later. max is used by the code computing the new size of the vector on
    resize. We optimize this to MAX_EXPR, but only during late optimizations.
    I think this is a common enough coding pattern that we ought to make it
    transparent to early opts and IPA. The following is the easiest fix: it
    simply adds a phiprop pass that turns the PHI of address values into a
    PHI of values, so that later FRE can propagate values across memory,
    phiopt can discover the MAX_EXPR pattern, and DSE can remove the memory
    stores.

    gcc/ChangeLog:

            PR tree-optimization/109811
            PR tree-optimization/109849
            * passes.def: Add phiprop to early optimization passes.
            * tree-ssa-phiprop.cc: Allow cloning.

    gcc/testsuite/ChangeLog:

            PR tree-optimization/109811
            PR tree-optimization/109849
            * gcc.dg/tree-ssa/phiprop-1.c: New test.
            * gcc.dg/tree-ssa/pr21463.c: Adjust template.
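(Editorial illustration, not part of the commit.) The pattern the commit message describes can be sketched as follows. The function names are hypothetical, and the growth formula is only an assumed stand-in for the size computation done when a vector reallocates; it is not copied from libstdc++.

```cpp
#include <algorithm>
#include <cstddef>

// std::max takes and returns const references, so before optimization both
// arguments have to be addressable: they are stored to memory and the
// selected one is loaded back through the returned reference.
std::size_t grow_by_ref(std::size_t size, std::size_t extra) {
    const std::size_t &larger = std::max(size, extra);
    return size + larger;
}

// Value-based equivalent: a plain select on two values, which is the shape
// the compiler reaches once the address PHI has become a value PHI and the
// result is recognized as MAX_EXPR.
std::size_t grow_by_value(std::size_t size, std::size_t extra) {
    return size + (size < extra ? extra : size);
}
```

Both functions compute the same result; the point is only that the first form involves memory until the reference indirection is optimized away.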
[Bug middle-end/109849] suboptimal code for vector walking loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849

--- Comment #16 from CVS Commits ---
The master branch has been updated by Jan Hubicka:

https://gcc.gnu.org/g:7b34cacc5735385e7e2855d7c0a6fad60ef4a99b

commit r14-1951-g7b34cacc5735385e7e2855d7c0a6fad60ef4a99b
Author: Jan Hubicka
Date:   Mon Jun 19 18:28:17 2023 +0200

    optimize std::max early

    We currently produce very bad code on loops using std::vector as a stack,
    since we fail to inline push_back, which in turn prevents SRA, and we fail
    to optimize out some store-to-load pairs.

    I looked into why this function is not inlined even though it is inlined
    by clang. We currently estimate it at 66 instructions, and the inline
    limits are 15 at -O2 and 30 at -O3. Clang has a similar estimate, but
    still decides to inline at -O2.

    I looked into the reason why the body is so large, and one problem I
    spotted is the way std::max is implemented: it takes and returns
    references to the values.

      const T& max( const T& a, const T& b );

    This makes it necessary to store the values to memory and load them
    later. max is used by the code computing the new size of the vector on
    resize. We optimize this to MAX_EXPR, but only during late optimizations.
    I think this is a common enough coding pattern that we ought to make it
    transparent to early opts and IPA. The following is the easiest fix: it
    simply adds a phiprop pass that turns the PHI of address values into a
    PHI of values, so that later FRE can propagate values across memory,
    phiopt can discover the MAX_EXPR pattern, and DSE can remove the memory
    stores.

    gcc/ChangeLog:

            PR tree-optimization/109811
            PR tree-optimization/109849
            * passes.def: Add phiprop to early optimization passes.
            * tree-ssa-phiprop.cc: Allow cloning.

    gcc/testsuite/ChangeLog:

            PR tree-optimization/109811
            PR tree-optimization/109849
            * gcc.dg/tree-ssa/phiprop-1.c: New test.
            * gcc.dg/tree-ssa/pr21463.c: Adjust template.
[PATCH] VECT: Apply LEN_MASK_{LOAD,STORE} into vectorizer
From: Ju-Zhe Zhong

This patch applies LEN_MASK_{LOAD,STORE} in the vectorizer.
I refactored the gimple IR building to make the code cleaner.

gcc/ChangeLog:

        * internal-fn.cc (expand_partial_store_optab_fn): Add LEN_MASK_{LOAD,STORE} vectorizer support.
        (internal_load_fn_p): Ditto.
        (internal_store_fn_p): Ditto.
        (internal_fn_mask_index): Ditto.
        (internal_fn_stored_value_index): Ditto.
        (internal_len_load_store_bias): Ditto.
        * optabs-query.cc (can_vec_mask_load_store_p): Ditto.
        (get_len_load_store_mode): Ditto.
        * tree-vect-stmts.cc (check_load_store_for_partial_vectors): Ditto.
        (get_all_ones_mask): New function.
        (vectorizable_store): Add LEN_MASK_{LOAD,STORE} vectorizer support.
        (vectorizable_load): Ditto.

---
 gcc/internal-fn.cc     |  35 +-
 gcc/optabs-query.cc    |  25 +++-
 gcc/tree-vect-stmts.cc | 259 +
 3 files changed, 213 insertions(+), 106 deletions(-)

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index c911ae790cb..e10c21de5f1 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -2949,7 +2949,7 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
  * OPTAB.  */

 static void
-expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
+expand_partial_store_optab_fn (internal_fn ifn, gcall *stmt, convert_optab optab)
 {
   class expand_operand ops[5];
   tree type, lhs, rhs, maskt, biast;
@@ -2957,7 +2957,7 @@ expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
   insn_code icode;

   maskt = gimple_call_arg (stmt, 2);
-  rhs = gimple_call_arg (stmt, 3);
+  rhs = gimple_call_arg (stmt, internal_fn_stored_value_index (ifn));
   type = TREE_TYPE (rhs);
   lhs = expand_call_mem_ref (type, stmt, 0);

@@ -4435,6 +4435,7 @@ internal_load_fn_p (internal_fn fn)
     case IFN_GATHER_LOAD:
     case IFN_MASK_GATHER_LOAD:
     case IFN_LEN_LOAD:
+    case IFN_LEN_MASK_LOAD:
       return true;

     default:
@@ -4455,6 +4456,7 @@ internal_store_fn_p (internal_fn fn)
     case IFN_SCATTER_STORE:
     case IFN_MASK_SCATTER_STORE:
     case IFN_LEN_STORE:
+    case IFN_LEN_MASK_STORE:
       return true;

     default:
@@ -4494,6 +4496,10 @@ internal_fn_mask_index (internal_fn fn)
     case IFN_MASK_STORE_LANES:
       return 2;

+    case IFN_LEN_MASK_LOAD:
+    case IFN_LEN_MASK_STORE:
+      return 3;
+
     case IFN_MASK_GATHER_LOAD:
     case IFN_MASK_SCATTER_STORE:
       return 4;
@@ -4519,6 +4525,9 @@ internal_fn_stored_value_index (internal_fn fn)
     case IFN_LEN_STORE:
       return 3;

+    case IFN_LEN_MASK_STORE:
+      return 4;
+
     default:
       return -1;
     }
@@ -4583,13 +4592,31 @@ internal_len_load_store_bias (internal_fn ifn, machine_mode mode)
 {
   optab optab = direct_internal_fn_optab (ifn);
   insn_code icode = direct_optab_handler (optab, mode);
+  int bias_argno = 3;
+  if (icode == CODE_FOR_nothing)
+    {
+      machine_mode mask_mode
+        = targetm.vectorize.get_mask_mode (mode).require ();
+      if (ifn == IFN_LEN_LOAD)
+        {
+          /* Try LEN_MASK_LOAD.  */
+          optab = direct_internal_fn_optab (IFN_LEN_MASK_LOAD);
+        }
+      else
+        {
+          /* Try LEN_MASK_STORE.  */
+          optab = direct_internal_fn_optab (IFN_LEN_MASK_STORE);
+        }
+      icode = convert_optab_handler (optab, mode, mask_mode);
+      bias_argno = 4;
+    }

   if (icode != CODE_FOR_nothing)
     {
       /* For now we only support biases of 0 or -1.  Try both of them.  */
-      if (insn_operand_matches (icode, 3, GEN_INT (0)))
+      if (insn_operand_matches (icode, bias_argno, GEN_INT (0)))
         return 0;
-      if (insn_operand_matches (icode, 3, GEN_INT (-1)))
+      if (insn_operand_matches (icode, bias_argno, GEN_INT (-1)))
         return -1;
     }

diff --git a/gcc/optabs-query.cc b/gcc/optabs-query.cc
index 276f8408dd7..4394d391200 100644
--- a/gcc/optabs-query.cc
+++ b/gcc/optabs-query.cc
@@ -566,11 +566,14 @@ can_vec_mask_load_store_p (machine_mode mode,
                            bool is_load)
 {
   optab op = is_load ? maskload_optab : maskstore_optab;
+  optab len_op = is_load ? len_maskload_optab : len_maskstore_optab;
   machine_mode vmode;

   /* If mode is vector mode, check it directly.  */
   if (VECTOR_MODE_P (mode))
-    return convert_optab_handler (op, mode, mask_mode) != CODE_FOR_nothing;
+    return convert_optab_handler (op, mode, mask_mode) != CODE_FOR_nothing
+           || convert_optab_handler (len_op, mode, mask_mode)
+                != CODE_FOR_nothing;

   /* Otherwise, return true if there is some vector mode with
      the mask load/store supported.  */
@@ -584,7 +587,9 @@ can_vec_mask_load_store_p (machine_mode mode,
       vmode = targetm.vectorize.preferred_simd_mode (smode);
       if (VECTOR_MODE_P (vmode)
           && targetm.vectorize.get_mask_mode (vmode).exists (&mask_mode)
-          &&
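(Editorial illustration, not part of the patch.) As a rough scalar model of what a length-plus-mask load means, the sketch below treats lane i as active when i < len + bias and the mask bit for that lane is set. This is an assumption about the intended semantics, based on the patch combining the existing length-controlled and mask-controlled forms. The function name, the fixed lane count and the fill value are hypothetical; in GCC the inactive lanes are not guaranteed to hold any particular value.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kLanes = 4;  // hypothetical fixed vector width

// Scalar emulation of a load controlled by both a length and a mask.
// Inactive lanes receive `fill` here purely so the result is observable.
std::array<int, kLanes> len_mask_load_sketch(const int *src,
                                             std::size_t len,
                                             const std::array<bool, kLanes> &mask,
                                             int bias,
                                             int fill) {
    std::array<int, kLanes> out;
    out.fill(fill);
    // Effective number of length-enabled lanes; the patch only considers
    // biases of 0 or -1.
    const std::ptrdiff_t active = static_cast<std::ptrdiff_t>(len) + bias;
    for (std::size_t i = 0; i < kLanes; ++i) {
        if (static_cast<std::ptrdiff_t>(i) < active && mask[i])
            out[i] = src[i];
    }
    return out;
}
```

With an all-ones mask this degenerates to a length-only load, which is presumably the role of the get_all_ones_mask helper named in the ChangeLog.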