[Bug tree-optimization/51030] PHI opt does not handle value-replacement with a transfer function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51030 --- Comment #5 from Andrew Pinski --- Created attachment 56317 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56317&action=edit First set of patches. Note the last patch is still being worked on. The first patch is just a small speedup and makes the last patch easier to write. The middle 2 patches move some of the optimizations that value_replacement does over to match. I think I still need the `a == 1` cases.
[Bug tree-optimization/51030] PHI opt does not handle value-replacement with a transfer function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51030 Andrew Pinski changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |pinskia at gcc dot gnu.org --- Comment #4 from Andrew Pinski --- I am rewriting value-replacement in phi-opt to use match-and-simplify, which should simplify much of this.
[Bug tree-optimization/111972] [14 regression] missed vectorzation for bool a = j != 1; j = (long int)a;
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111972 --- Comment #7 from Hongtao.liu --- (In reply to Andrew Pinski from comment #3)
> First off does this even make sense to vectorize but rather do some kind of
> scalar reduction with respect to j = j^1 here . Filed PR 112104 for that.
>
> Basically vectorizing this loop is a waste compared to that.
Yes, it's always zero; it would be nice if the middle end could optimize the whole loop away. So this PR is more about the missed optimization of the redundant loop (better to finalize the induction variable with a simple assignment), not about vectorization.
[Bug tree-optimization/111972] [14 regression] missed vectorzation for bool a = j != 1; j = (long int)a;
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111972 --- Comment #6 from Hongtao.liu --- (In reply to Andrew Pinski from comment #5) > Oh this is the original code: > https://github.com/kdlucas/byte-unixbench/blob/master/UnixBench/src/whets.c > Yes, it's from unixbench.
[Bug libstdc++/82366] std::regex constructor called from shared library throws std::bad_cast
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82366 Alibek Omarov changed: What|Removed |Added CC||a1ba.omarov at gmail dot com --- Comment #7 from Alibek Omarov --- Can confirm, it still happens with GCC/libstdc++ 13.0. However, in my case it's in a static initializer.
[Bug tree-optimization/112104] loop of ^1 should just be reduced to ^(n&1)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112104 --- Comment #1 from Andrew Pinski --- This shows up in a really really bad benchmark: https://github.com/kdlucas/byte-unixbench/blob/master/UnixBench/src/whets.c
[Bug tree-optimization/111972] [14 regression] missed vectorzation for bool a = j != 1; j = (long int)a;
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111972 --- Comment #5 from Andrew Pinski --- Oh, this is the original code: https://github.com/kdlucas/byte-unixbench/blob/master/UnixBench/src/whets.c
Basically, after optimizing:
  _9 = j_19 != 1;
  _14 = (long int) _9;
over to:
  _14 = j_19 ^ 1;
we could optimize this whole loop out. Note this is a bad benchmark.
[Bug tree-optimization/111972] [14 regression] missed vectorzation for bool a = j != 1; j = (long int)a;
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111972 --- Comment #4 from Andrew Pinski --- Is there a non-reduced testcase here? Or does the loop really just do j = j^1 ?
[Bug tree-optimization/111972] [14 regression] missed vectorzation for bool a = j != 1; j = (long int)a;
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111972 Andrew Pinski changed: What|Removed |Added See Also||https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112104 --- Comment #3 from Andrew Pinski --- First off, does this even make sense to vectorize, rather than doing some kind of scalar reduction with respect to j = j^1 here? Filed PR 112104 for that. Basically, vectorizing this loop is a waste compared to that.
[Bug tree-optimization/112104] New: loop of ^1 should just be reduced to ^(n&1)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112104 Bug ID: 112104 Summary: loop of ^1 should just be reduced to ^(n&1) Product: gcc Version: 14.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: enhancement Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: pinskia at gcc dot gnu.org Target Milestone: --- Take:
```
int foo(long n3)
{
  int j = 0;
  for(int i = 0; i < n3; i++)
    j = j ^ 1;
  return j;
}
```
This should just be reduced to j = j ^ (n3&1). We should figure out in SCCP pattern matching that j as the result just alternates between 0 and 1 (VRP figures out that this is the range). I noticed this while looking into PR 111972.
[Bug tree-optimization/111820] [13 Regression] Compiler time hog in the vectorizer with `-O3 -fno-tree-vrp`
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111820 --- Comment #17 from Andrew Pinski --- *** Bug 111833 has been marked as a duplicate of this bug. ***
[Bug tree-optimization/111833] [13/14 Regression] GCC: 14: hangs on a simple for loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111833 Andrew Pinski changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |DUPLICATE --- Comment #6 from Andrew Pinski --- Dup. *** This bug has been marked as a duplicate of bug 111820 ***
[Bug target/103861] [i386] vectorize v2qi vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103861 --- Comment #15 from CVS Commits --- The master branch has been updated by hongtao Liu : https://gcc.gnu.org/g:7eed861e8ca3f533e56dea6348573caa09f16f5e commit r14-4964-g7eed861e8ca3f533e56dea6348573caa09f16f5e Author: liuhongt Date: Mon Oct 23 13:40:10 2023 +0800 Support vec_cmpmn/vcondmn for v2hf/v4hf. gcc/ChangeLog: PR target/103861 * config/i386/i386-expand.cc (ix86_expand_sse_movcc): Handle V2HF/V2BF/V4HF/V4BFmode. * config/i386/i386.cc (ix86_get_mask_mode): Return QImode when data_mode is V4HF/V2HFmode. * config/i386/mmx.md (vec_cmpv4hfqi): New expander. (vcond_mask_v4hi): Ditto. (vcond_mask_qi): Ditto. (vec_cmpv2hfqi): Ditto. (vcond_mask_v2hi): Ditto. (mmx_plendvb_): Add 2 combine splitters after the patterns. (mmx_pblendvb_v8qi): Ditto. (v2hi3): Add a combine splitter after the pattern. (3): Ditto. (v8qi3): Ditto. (3): Ditto. * config/i386/sse.md (vcond): Merge this with .. (vcond): .. this into .. (vcond): .. this, and extend to V8BF/V16BF/V32BFmode. gcc/testsuite/ChangeLog: * g++.target/i386/part-vect-vcondhf.C: New test. * gcc.target/i386/part-vect-vec_cmphf.c: New test.
[Bug target/111318] RISC-V: Redundant vsetvl instructions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111318 Lehua Ding changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #2 from Lehua Ding --- Confirmed.
[Bug tree-optimization/111833] [13/14 Regression] GCC: 14: hangs on a simple for loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111833 --- Comment #5 from Hongtao.liu --- It's the same issue as PR111820, thus should be fixed.
[Bug tree-optimization/111820] [13 Regression] Compiler time hog in the vectorizer with `-O3 -fno-tree-vrp`
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111820 Andrew Pinski changed: What|Removed |Added Resolution|--- |FIXED Status|NEW |RESOLVED --- Comment #16 from Andrew Pinski --- .
[Bug tree-optimization/111820] [13 Regression] Compiler time hog in the vectorizer with `-O3 -fno-tree-vrp`
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111820 --- Comment #15 from Hongtao.liu --- (In reply to Richard Biener from comment #13) > (In reply to Hongtao.liu from comment #12) > > Fixed in GCC14, not sure if we want to backport the patch. > > If so, the patch needs to be adjusted since GCC13 doesn't support auto_mpz. > > Yes, we want to backport. Also fixed in GCC13.
[Bug tree-optimization/111820] [13 Regression] Compiler time hog in the vectorizer with `-O3 -fno-tree-vrp`
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111820 --- Comment #14 from CVS Commits --- The releases/gcc-13 branch has been updated by hongtao Liu : https://gcc.gnu.org/g:82919cf4cb232166fed03d84a91fefd07feef6bb commit r13-7988-g82919cf4cb232166fed03d84a91fefd07feef6bb Author: liuhongt Date: Wed Oct 18 10:08:24 2023 +0800 Avoid compile time hog on vect_peel_nonlinear_iv_init for nonlinear induction vec_step_op_mul when iteration count is too big. There's loop in vect_peel_nonlinear_iv_init to get init_expr * pow (step_expr, skip_niters). When skipn_iters is too big, compile time hogs. To avoid that, optimize init_expr * pow (step_expr, skip_niters) to init_expr << (exact_log2 (step_expr) * skip_niters) when step_expr is pow of 2, otherwise give up vectorization when skip_niters >= TYPE_PRECISION (TREE_TYPE (init_expr)). Also give up vectorization when niters_skip is negative which will be used for fully masked loop. gcc/ChangeLog: PR tree-optimization/111820 PR tree-optimization/111833 * tree-vect-loop-manip.cc (vect_can_peel_nonlinear_iv_p): Give up vectorization for nonlinear iv vect_step_op_mul when step_expr is not exact_log2 and niters is greater than TYPE_PRECISION (TREE_TYPE (step_expr)). Also don't vectorize for nagative niters_skip which will be used by fully masked loop. (vect_can_advance_ivs_p): Pass whole phi_info to vect_can_peel_nonlinear_iv_p. * tree-vect-loop.cc (vect_peel_nonlinear_iv_init): Optimize init_expr * pow (step_expr, skipn) to init_expr << (log2 (step_expr) * skipn) when step_expr is exact_log2. gcc/testsuite/ChangeLog: * gcc.target/i386/pr111820-1.c: New test. * gcc.target/i386/pr111820-2.c: New test. * gcc.target/i386/pr111820-3.c: New test. * gcc.target/i386/pr103144-mul-1.c: Adjust testcase. * gcc.target/i386/pr103144-mul-2.c: Adjust testcase.
[Bug tree-optimization/111833] [13/14 Regression] GCC: 14: hangs on a simple for loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111833 --- Comment #4 from CVS Commits --- The releases/gcc-13 branch has been updated by hongtao Liu : https://gcc.gnu.org/g:82919cf4cb232166fed03d84a91fefd07feef6bb commit r13-7988-g82919cf4cb232166fed03d84a91fefd07feef6bb Author: liuhongt Date: Wed Oct 18 10:08:24 2023 +0800 Avoid compile time hog on vect_peel_nonlinear_iv_init for nonlinear induction vec_step_op_mul when iteration count is too big. There's loop in vect_peel_nonlinear_iv_init to get init_expr * pow (step_expr, skip_niters). When skipn_iters is too big, compile time hogs. To avoid that, optimize init_expr * pow (step_expr, skip_niters) to init_expr << (exact_log2 (step_expr) * skip_niters) when step_expr is pow of 2, otherwise give up vectorization when skip_niters >= TYPE_PRECISION (TREE_TYPE (init_expr)). Also give up vectorization when niters_skip is negative which will be used for fully masked loop. gcc/ChangeLog: PR tree-optimization/111820 PR tree-optimization/111833 * tree-vect-loop-manip.cc (vect_can_peel_nonlinear_iv_p): Give up vectorization for nonlinear iv vect_step_op_mul when step_expr is not exact_log2 and niters is greater than TYPE_PRECISION (TREE_TYPE (step_expr)). Also don't vectorize for nagative niters_skip which will be used by fully masked loop. (vect_can_advance_ivs_p): Pass whole phi_info to vect_can_peel_nonlinear_iv_p. * tree-vect-loop.cc (vect_peel_nonlinear_iv_init): Optimize init_expr * pow (step_expr, skipn) to init_expr << (log2 (step_expr) * skipn) when step_expr is exact_log2. gcc/testsuite/ChangeLog: * gcc.target/i386/pr111820-1.c: New test. * gcc.target/i386/pr111820-2.c: New test. * gcc.target/i386/pr111820-3.c: New test. * gcc.target/i386/pr103144-mul-1.c: Adjust testcase. * gcc.target/i386/pr103144-mul-2.c: Adjust testcase.
[Bug target/112092] RISC-V: Wrong RVV code produced for vsetvl-11.c and vsetvlmax-8.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112092 --- Comment #9 from JuzheZhong --- (In reply to Maciej W. Rozycki from comment #7)
> Thank you for all your explanations. I think I'm still missing something
> here, so I'll write it differently (and let's ignore the tail-agnostic vs
> tail-undisturbed choice for the purpose of this consideration).
>
> Why is the `vl' value determined by hardware from `avl' by an explicit
> request (!) of the programmer who inserted the vsetvl intrinsics ignored?
> Is the compiler able to prove the use of `avl' in place of `vl' does not
> affect the operation of the VLE32.V and VSE32.V instructions in any way?
> What is the purpose of these intrinsics if they can be freely ignored?
>
> Please forgive me if my questions seem to you obvious to answer or
> irrelevant, I'm still rather new to this RVV stuff.

As long as the ratio of a user vsetvl intrinsic is the same as that of the following RVV instruction, the compiler is free to optimize it. For example:

  vl = __riscv_vsetvl_e32m1 (avl)
  __riscv_vadd_vv_i32m1 (..., vl)

A naive way to insert vsetvls:

  vsetvl VL, AVL e32 m1
  vsetvl zero, VL e32 m1
  vadd.vv

However, since they have the same ratio, we can do:

  vsetvl zero, AVL e32 m1
  vadd.vv

This is absolutely correct, independent of the hardware. However, with different ratios:

  vl = __riscv_vsetvl_e32m1 (avl)
  __riscv_vadd_vv_i64m1 (..., vl)

  vsetvl VL, AVL e32 m1
  vsetvl zero, VL e64 m1
  vadd.vv

we can't optimize it. This is the only correct codegen. Thanks.
[Bug target/112092] RISC-V: Wrong RVV code produced for vsetvl-11.c and vsetvlmax-8.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112092 --- Comment #8 from JuzheZhong --- (In reply to Maciej W. Rozycki from comment #7)
> Thank you for all your explanations. I think I'm still missing something
> here, so I'll write it differently (and let's ignore the tail-agnostic vs
> tail-undisturbed choice for the purpose of this consideration).
>
> Let me paste the whole assembly code produced here (sans decorations):
>
>         beq a5,zero,.L2
>         vsetvli zero,a6,e32,m1,tu,ma
> .L3:
>         beq a4,zero,.L7
>         li a5,0
> .L5:
>         vle32.v v1,0(a0)
>         vle32.v v1,0(a1)
>         vle32.v v1,0(a2)
>         vse32.v v1,0(a3)
>         addi a5,a5,1
>         bne a4,a5,.L5
> .L7:
>         ret
> .L2:
>         vsetvli zero,a6,e32,m1,tu,ma
>         j .L3
>
> This seems to me to correspond to this source code:
>
>   if (cond)
>     __riscv_vsetvl_e32m1(avl);
>   else
>     __riscv_vsetvl_e16mf2(avl);
>   for (size_t i = 0; i < n; i += 1) {
>     vint32m1_t a = __riscv_vle32_v_i32m1(in1, avl);
>     vint32m1_t b = __riscv_vle32_v_i32m1_tu(a, in2, avl);
>     vint32m1_t c = __riscv_vle32_v_i32m1_tu(b, in3, avl);
>     __riscv_vse32_v_i32m1(out, c, avl);
>   }
>
> And in that case I'd expect the conditional to be optimised away, as its
> result is ignored (along with the intrinsics) and does not affect actual
> code executed except for the different execution path, i.e.:
>
>         beq a4,zero,.L7
>         vsetvli zero,a6,e32,m1,tu,ma
>         li a5,0
> .L5:
>         vle32.v v1,0(a0)
>         vle32.v v1,0(a1)
>         vle32.v v1,0(a2)
>         vse32.v v1,0(a3)
>         addi a5,a5,1
>         bne a4,a5,.L5
> .L7:
>         ret
>
Good catch! I think we have a missed optimization here, and I agree this is the correct and optimal codegen for this case. We have a close-to-optimal (not optimal enough) codegen for now. And this optimization should not be done by the VSETVL PASS. After VSETVL PASS fusion, both the e16mf2 and e32m1 user vsetvl intrinsics are fused into e32m1, tu. They are totally the same, so it's meaningless to separate them into different blocks (they should be in the same single block).
The reason why we missed an optimization here is that we expand the user vsetvls __riscv_vsetvl_e32m1 and __riscv_vsetvl_e16mf2 into 2 different RTL expressions. The earlier passes (before VSETVL) don't know they are equivalent, so they separate them into different blocks. If you change the code as follows:

  if (cond)
    vl = __riscv_vsetvl_e32m1(avl);
  else
    vl = __riscv_vsetvl_e32m1(avl);

I am sure the codegen will be as you said above (a single vsetvl e32m1 tu in a single block). To optimize it, one alternative approach is to expand all user vsetvl intrinsics into the same RTL expression (as long as they have the same ratio). Meaning, expand __riscv_vsetvl_e64m1, __riscv_vsetvl_e32m1, __riscv_vsetvl_e16mf2, and __riscv_vsetvl_e8mf8 into the same RTL expression, since their VL outputs are definitely the same. I don't see that it would cause any problems here. But different ratios like e32m1 and e32mf2 should be different RTL expressions. I am not sure kito agrees with this idea. Another alternative approach is to enhance the bb_reorder PASS. The VSETVL PASS runs before the bb_reorder PASS, and the current bb_reorder PASS is unable to fuse these 2 vsetvls (e32m1 tu) into the same block because we split them into "real" vsetvls, i.e. RTL patterns with side effects. The "real" vsetvl patterns which generate assembly should have side effects, since vsetvl does change the global VL/VTYPE status and also sets a general register. No matter which approach we take to optimize it, I won't do it in GCC 14 since stage 1 is soon to close. We have a few more features (which are much more important) that we are planning and working to support in GCC 14. I am confident that our current RVV GCC VSETVL PASS is optimal and fancy enough. After stage 1 closes, we won't do any more optimizations; we will only run full coverage testing (for example, using different LMUL and different -march to run the whole gcc testsuite) and fix bugs.
[Bug target/112092] RISC-V: Wrong RVV code produced for vsetvl-11.c and vsetvlmax-8.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112092 --- Comment #7 from Maciej W. Rozycki --- Thank you for all your explanations. I think I'm still missing something here, so I'll write it differently (and let's ignore the tail-agnostic vs tail-undisturbed choice for the purpose of this consideration).

Let me paste the whole assembly code produced here (sans decorations):

        beq a5,zero,.L2
        vsetvli zero,a6,e32,m1,tu,ma
.L3:
        beq a4,zero,.L7
        li a5,0
.L5:
        vle32.v v1,0(a0)
        vle32.v v1,0(a1)
        vle32.v v1,0(a2)
        vse32.v v1,0(a3)
        addi a5,a5,1
        bne a4,a5,.L5
.L7:
        ret
.L2:
        vsetvli zero,a6,e32,m1,tu,ma
        j .L3

This seems to me to correspond to this source code:

  if (cond)
    __riscv_vsetvl_e32m1(avl);
  else
    __riscv_vsetvl_e16mf2(avl);
  for (size_t i = 0; i < n; i += 1) {
    vint32m1_t a = __riscv_vle32_v_i32m1(in1, avl);
    vint32m1_t b = __riscv_vle32_v_i32m1_tu(a, in2, avl);
    vint32m1_t c = __riscv_vle32_v_i32m1_tu(b, in3, avl);
    __riscv_vse32_v_i32m1(out, c, avl);
  }

And in that case I'd expect the conditional to be optimised away, as its result is ignored (along with the intrinsics) and does not affect actual code executed except for the different execution path, i.e.:

        beq a4,zero,.L7
        vsetvli zero,a6,e32,m1,tu,ma
        li a5,0
.L5:
        vle32.v v1,0(a0)
        vle32.v v1,0(a1)
        vle32.v v1,0(a2)
        vse32.v v1,0(a3)
        addi a5,a5,1
        bne a4,a5,.L5
.L7:
        ret

However, actual source code is as follows:

  size_t vl;
  if (cond)
    vl = __riscv_vsetvl_e32m1(avl);
  else
    vl = __riscv_vsetvl_e16mf2(avl);
  for (size_t i = 0; i < n; i += 1) {
    vint32m1_t a = __riscv_vle32_v_i32m1(in1, vl);
    vint32m1_t b = __riscv_vle32_v_i32m1_tu(a, in2, vl);
    vint32m1_t c = __riscv_vle32_v_i32m1_tu(b, in3, vl);
    __riscv_vse32_v_i32m1(out, c, vl);
  }

Based on what you write I'd expect code like this instead:

        beq a5,zero,.L2
        vsetvli a6,a6,e16,mf2,ta,ma
.L3:
        beq a4,zero,.L7
        vsetvli zero,a6,e32,m1,tu,ma
        li a5,0
.L5:
        vle32.v v1,0(a0)
        vle32.v v1,0(a1)
        vle32.v v1,0(a2)
        vse32.v v1,0(a3)
        addi a5,a5,1
        bne a4,a5,.L5
.L7:
        ret
.L2:
        vsetvli a6,a6,e32,m1,ta,ma
        j .L3

which is roughly what you say LLVM produces.

Why is the `vl' value determined by hardware from `avl' by an explicit request (!) of the programmer who inserted the vsetvl intrinsics ignored? Is the compiler able to prove the use of `avl' in place of `vl' does not affect the operation of the VLE32.V and VSE32.V instructions in any way? What is the purpose of these intrinsics if they can be freely ignored?

Please forgive me if my questions seem to you obvious to answer or irrelevant, I'm still rather new to this RVV stuff.
[Bug target/111888] RISC-V: Horrible redundant number vsetvl instructions in vectorized codes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111888 JuzheZhong changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED --- Comment #2 from JuzheZhong --- Fixed
[Bug target/111888] RISC-V: Horrible redundant number vsetvl instructions in vectorized codes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111888 --- Comment #1 from CVS Commits --- The master branch has been updated by Pan Li : https://gcc.gnu.org/g:e37bc2cf00671e3bc4d82f2627330c0f885a6f29 commit r14-4961-ge37bc2cf00671e3bc4d82f2627330c0f885a6f29 Author: Juzhe-Zhong Date: Thu Oct 26 16:13:51 2023 +0800 RISC-V: Add AVL propagation PASS for RVV auto-vectorization This patch addresses the redundant AVL/VL toggling in RVV partial auto-vectorization which is a known issue for a long time and I finally find the time to address it. Consider a simple vector addition operation: https://godbolt.org/z/7hfGfEjW3 void foo (int *__restrict a, int *__restrict b, int *__restrict n) { for (int i = 0; i < n; i++) a[i] = a[i] + b[i]; } Optimized IR: Loop body: _38 = .SELECT_VL (ivtmp_36, POLY_INT_CST [4, 4]); -> vsetvli a5,a2,e8,mf4,ta,ma ... vect__4.8_27 = .MASK_LEN_LOAD (vectp_a.6_29, 32B, { -1, ... }, _38, 0); -> vle32.v v2,0(a0) vect__6.11_20 = .MASK_LEN_LOAD (vectp_b.9_25, 32B, { -1, ... }, _38, 0); -> vle32.v v1,0(a1) vect__7.12_19 = vect__6.11_20 + vect__4.8_27; -> vsetvli a6,zero,e32,m1,ta,ma + vadd.vv v1,v1,v2 .MASK_LEN_STORE (vectp_a.13_11, 32B, { -1, ... }, _38, 0, vect__7.12_19); -> vsetvli zero,a5,e32,m1,ta,ma + vse32.v v1,0(a4) We can see 2 redundant vsetvls inside the loop body due to AVL/VL toggling. The AVL/VL toggling is because we are missing LEN information in simple PLUS_EXPR GIMPLE assignment: vect__7.12_19 = vect__6.11_20 + vect__4.8_27; GCC apply partial predicate load/store and un-predicated full vector operation on partial vectorization. Such flow are used by all other targets like ARM SVE (RVV also uses such flow): ARM SVE: .L3: ld1wz30.s, p7/z, [x0, x3, lsl 2] -> predicated load ld1wz31.s, p7/z, [x1, x3, lsl 2] -> predicated load add z31.s, z31.s, z30.s-> un-predicated add st1wz31.s, p7, [x0, x3, lsl 2] -> predicated store Such vectorization flow causes AVL/VL toggling on RVV so we need AVL propagation PASS for it. 
Also, it's very unlikely that we can apply predicated operations to all vectorization, for the following reasons: 1. It's a very heavy workload to support them on all vectorization, and we don't see any benefits if we can handle that in the target's backend. 2. Changing the loop vectorizer for it will make the code base ugly and hard to maintain. 3. We will need so many patterns for all operations. Not only COND_LEN_ADD and COND_LEN_SUB, we also need COND_LEN_EXTEND, COND_LEN_CEIL, ... over 100+ patterns, an unreasonable number of patterns. To conclude, we prefer un-predicated operations here, and design a nice and clean AVL propagation PASS to elide the redundant vsetvls due to AVL/VL toggling. The second question is why we add a separate PASS called AVL propagation. Why not optimize it in the VSETVL PASS (we definitely can optimize AVL in the VSETVL PASS)? Frankly, I was planning to address this issue in the VSETVL PASS; that's why we recently refactored the VSETVL PASS. However, I changed my mind recently after several experiments and tries. The reasons are as follows: 1. For code base management and maintainability. The current VSETVL PASS is complicated enough and already has enough aggressive and fancy optimizations, which turn out to generate optimal codegen in most cases. It's not a good idea to keep adding more features into the VSETVL PASS, making it heavier and heavier, so that we will need to refactor it again in the future. Actually, the VSETVL PASS is very stable and optimal after the recent refactoring. Hopefully, we should not change the VSETVL PASS any more except for minor fixes. 2. vsetvl insertion (which the VSETVL PASS does) and AVL propagation are 2 different things; I don't think we should fuse them into the same PASS. 3. The VSETVL PASS is a post-RA PASS, whereas AVL propagation should be done before RA, which can reduce register allocation pressure. 4. This patch's AVL propagation PASS only does AVL propagation for RVV partial auto-vectorization situations. 
This patch's codes are only hundreds lines which is very managable and can be very easily extended features and enhancements. We can easily extend and enhance more AVL propagation in a clean and separate PASS in the future. (If we do it on VSETVL PASS, we will complicate VSETVL PASS again which is already so complicated.) Here is an example to demonstrate more: https://godbolt.org/z/bE86sv3q5 void foo2 (int *__restrict a, int *__restrict b, int *__restrict c, int *__restrict a2, int *__restrict b2,
[Bug target/111318] RISC-V: Redundant vsetvl instructions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111318 --- Comment #1 from CVS Commits --- The master branch has been updated by Pan Li : https://gcc.gnu.org/g:e37bc2cf00671e3bc4d82f2627330c0f885a6f29 commit r14-4961-ge37bc2cf00671e3bc4d82f2627330c0f885a6f29 Author: Juzhe-Zhong Date: Thu Oct 26 16:13:51 2023 +0800 RISC-V: Add AVL propagation PASS for RVV auto-vectorization This patch addresses the redundant AVL/VL toggling in RVV partial auto-vectorization which is a known issue for a long time and I finally find the time to address it. Consider a simple vector addition operation: https://godbolt.org/z/7hfGfEjW3 void foo (int *__restrict a, int *__restrict b, int *__restrict n) { for (int i = 0; i < n; i++) a[i] = a[i] + b[i]; } Optimized IR: Loop body: _38 = .SELECT_VL (ivtmp_36, POLY_INT_CST [4, 4]); -> vsetvli a5,a2,e8,mf4,ta,ma ... vect__4.8_27 = .MASK_LEN_LOAD (vectp_a.6_29, 32B, { -1, ... }, _38, 0); -> vle32.v v2,0(a0) vect__6.11_20 = .MASK_LEN_LOAD (vectp_b.9_25, 32B, { -1, ... }, _38, 0); -> vle32.v v1,0(a1) vect__7.12_19 = vect__6.11_20 + vect__4.8_27; -> vsetvli a6,zero,e32,m1,ta,ma + vadd.vv v1,v1,v2 .MASK_LEN_STORE (vectp_a.13_11, 32B, { -1, ... }, _38, 0, vect__7.12_19); -> vsetvli zero,a5,e32,m1,ta,ma + vse32.v v1,0(a4) We can see 2 redundant vsetvls inside the loop body due to AVL/VL toggling. The AVL/VL toggling is because we are missing LEN information in simple PLUS_EXPR GIMPLE assignment: vect__7.12_19 = vect__6.11_20 + vect__4.8_27; GCC apply partial predicate load/store and un-predicated full vector operation on partial vectorization. Such flow are used by all other targets like ARM SVE (RVV also uses such flow): ARM SVE: .L3: ld1wz30.s, p7/z, [x0, x3, lsl 2] -> predicated load ld1wz31.s, p7/z, [x1, x3, lsl 2] -> predicated load add z31.s, z31.s, z30.s-> un-predicated add st1wz31.s, p7, [x0, x3, lsl 2] -> predicated store Such vectorization flow causes AVL/VL toggling on RVV so we need AVL propagation PASS for it. 
Also, it's very unlikely that we can apply predicated operations to all vectorization, for the following reasons: 1. It's a very heavy workload to support them on all vectorization, and we don't see any benefits if we can handle that in the target's backend. 2. Changing the loop vectorizer for it will make the code base ugly and hard to maintain. 3. We will need so many patterns for all operations. Not only COND_LEN_ADD and COND_LEN_SUB, we also need COND_LEN_EXTEND, COND_LEN_CEIL, ... over 100+ patterns, an unreasonable number of patterns. To conclude, we prefer un-predicated operations here, and design a nice and clean AVL propagation PASS to elide the redundant vsetvls due to AVL/VL toggling. The second question is why we add a separate PASS called AVL propagation. Why not optimize it in the VSETVL PASS (we definitely can optimize AVL in the VSETVL PASS)? Frankly, I was planning to address this issue in the VSETVL PASS; that's why we recently refactored the VSETVL PASS. However, I changed my mind recently after several experiments and tries. The reasons are as follows: 1. For code base management and maintainability. The current VSETVL PASS is complicated enough and already has enough aggressive and fancy optimizations, which turn out to generate optimal codegen in most cases. It's not a good idea to keep adding more features into the VSETVL PASS, making it heavier and heavier, so that we will need to refactor it again in the future. Actually, the VSETVL PASS is very stable and optimal after the recent refactoring. Hopefully, we should not change the VSETVL PASS any more except for minor fixes. 2. vsetvl insertion (which the VSETVL PASS does) and AVL propagation are 2 different things; I don't think we should fuse them into the same PASS. 3. The VSETVL PASS is a post-RA PASS, whereas AVL propagation should be done before RA, which can reduce register allocation pressure. 4. This patch's AVL propagation PASS only does AVL propagation for RVV partial auto-vectorization situations. 
This patch's codes are only hundreds lines which is very managable and can be very easily extended features and enhancements. We can easily extend and enhance more AVL propagation in a clean and separate PASS in the future. (If we do it on VSETVL PASS, we will complicate VSETVL PASS again which is already so complicated.) Here is an example to demonstrate more: https://godbolt.org/z/bE86sv3q5 void foo2 (int *__restrict a, int *__restrict b, int *__restrict c, int *__restrict a2, int *__restrict b2,
[Bug target/112103] [14 regression] gcc.target/powerpc/rlwinm-0.c fails after r14-4941-gd1bb9569d70304
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112103 Roger Sayle changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED |NEW CC||roger at nextmovesoftware dot com Last reconfirmed||2023-10-26
[Bug fortran/104649] ICE in gfc_match_formal_arglist, at fortran/decl.cc:6733 since r6-1958-g4668d6f9c00d4767
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104649 anlauf at gcc dot gnu.org changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |anlauf at gcc dot gnu.org --- Comment #5 from anlauf at gcc dot gnu.org --- Submitted: https://gcc.gnu.org/pipermail/fortran/2023-October/059872.html
[Bug libstdc++/112089] std::shared_lock::unlock should throw operation_not_permitted instead resource_deadlock_would_occur
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112089 Jonathan Wakely changed: What|Removed |Added Target Milestone|--- |11.5 --- Comment #3 from Jonathan Wakely --- Fixed on trunk so far.
[Bug tree-optimization/112096] `(a || b) ? a : b` should be simplified to a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112096 --- Comment #3 from Andrew Pinski --- (In reply to Andrew Pinski from comment #2)
> ```
> int t01(int x, int y)
> {
>   bool t = x == 5 && y == 5;
>   if (t) return 5;
>   return y;
> } // y
> ```
> Is able to be detected in phiopt2. Just not the == 0/!=0 case.
r0-125639-gc9ef86a1717dd6 added that code. https://inbox.sourceware.org/gcc-patches/01cebdbf$a3155310$e93ff930$@arm.com/ I really have not read this code in over 10 years and I forgot I even reviewed it slightly. Anyway, I am thinking about ways of improving this even further, and maybe even rewriting part of it since it has become way too complex.
[Bug libstdc++/112100] ubsan: misses UB when modifying std::string's trailing \0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112100 --- Comment #4 from Jonathan Wakely --- (In reply to Jonathan Wakely from comment #2) > It would need a completely new category of "memory location that you can > read and write to but nothing else" That was supposed to say "read and write zero to but nothing else".
[Bug libstdc++/112100] ubsan: misses UB when modifying std::string's trailing \0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112100 --- Comment #3 from Jonathan Wakely --- (In reply to Andrew Pinski from comment #1) > Maybe some how libstdc++ debug mode can catch this > https://gcc.gnu.org/onlinedocs/gcc-13.2.0/libstdc++/manual/manual/ > debug_mode_using.html#debug_mode.using.mode > -D_GLIBCXX_DEBUG Only by adding a "past-the-end character is still null" check to std::string member functions (which ones, all of them? Just accessors that would let you read the null, like c_str, operator[], data etc.?) That would be doable, but sounds pretty expensive.
[Bug libstdc++/112100] ubsan: misses UB when modifying std::string's trailing \0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112100 --- Comment #2 from Jonathan Wakely --- (In reply to Jan Engelhardt from comment #0) > ==55843==ERROR: AddressSanitizer: heap-buffer-overflow on address 0xsomething How would that even be possible? The terminating nul clearly has to be in allocated memory, because you are allowed to read it. So asan can't treat it as overflow. It's valid memory. Not only that, it's valid *writable* memory. You are allowed to store '\0' there. It would need a completely new category of "memory location that you can read and write to but nothing else". That's not an asan or ubsan check. > https://eel.is/c++draft/string.access specifies the modification of the NUL > char's position to values other than \0 is UB, so it should warn about this. There are hundreds of things the standard says are undefined that asan and ubsan can never detect. It's unreasonable to expect it IMHO.
[Bug tree-optimization/112096] `(a || b) ? a : b` should be simplified to a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112096 Andrew Pinski changed: What|Removed |Added Last reconfirmed||2023-10-26 Ever confirmed|0 |1 Status|UNCONFIRMED |ASSIGNED Assignee|unassigned at gcc dot gnu.org |pinskia at gcc dot gnu.org --- Comment #2 from Andrew Pinski ---
```
int t01(int x, int y)
{
  bool t = x == 5 && y == 5;
  if (t) return 5;
  return y;
}
// y
```
is able to be detected in phiopt2, just not the == 0/!= 0 case. Nor:
```
int t1(int x, int y)
{
  bool t = x != 5 || y != 5;
  if (t) return x;
  return 5;
}
// x
```
I have to look at where phiopt is able to detect this and improve it for these 2 cases ...
[Bug libstdc++/112089] std::shared_lock::unlock should throw operation_not_permitted instead resource_deadlock_would_occur
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112089 --- Comment #2 from CVS Commits --- The master branch has been updated by Jonathan Wakely : https://gcc.gnu.org/g:0c305f3dec9a992dd775a3b9607b7b1e8c051859 commit r14-4960-g0c305f3dec9a992dd775a3b9607b7b1e8c051859 Author: Jonathan Wakely Date: Thu Oct 26 16:51:30 2023 +0100 libstdc++: Fix exception thrown by std::shared_lock::unlock() [PR112089] The incorrect errc constant here looks like a copy error. libstdc++-v3/ChangeLog: PR libstdc++/112089 * include/std/shared_mutex (shared_lock::unlock): Change errc constant to operation_not_permitted. * testsuite/30_threads/shared_lock/locking/112089.cc: New test.
[Bug c/112101] feature request: typeof_arg for extracting the type of a function's (or function pointer's) arguments
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112101 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement
[Bug middle-end/112098] suboptimal optimization of inverted bit extraction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112098 --- Comment #3 from Andrew Pinski ---
Trying 6, 7, 8 -> 9:
    6: {r105:SI=r108:SI 0>>0x9;clobber flags:CC;}
      REG_DEAD r108:SI
      REG_UNUSED flags:CC
    7: {r106:SI=r105:SI&0x1;clobber flags:CC;}
      REG_DEAD r105:SI
      REG_UNUSED flags:CC
    8: {r107:SI=r106:SI^0x1;clobber flags:CC;}
      REG_DEAD r106:SI
      REG_UNUSED flags:CC
    9: {r103:SI=r107:SI<<0x4;clobber flags:CC;}
      REG_DEAD r107:SI
      REG_UNUSED flags:CC
Failed to match this instruction:
(parallel [
        (set (reg:SI 103)
            (and:SI (lshiftrt:SI (xor:SI (reg:SI 108)
                        (const_int 512 [0x200]))
                    (const_int 5 [0x5]))
                (const_int 16 [0x10])))
        (clobber (reg:CC 17 flags))
    ])
The xor here maybe should have been a not. But I can't remember if we allow 4->3 combining or just 4->2.
[Bug modula2/111530] Unable to build GM2 standard library on BSD due to a `getopt_long_only' GNU extension dependency
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111530 Gaius Mulley changed: What|Removed |Added CC||gaius at gcc dot gnu.org --- Comment #2 from Gaius Mulley --- Created attachment 56316 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56316=edit Proposed fix Here is a proposed patch which is currently undergoing bootstrap testing. I thought I'd post the proposed patch for testing and potential comments. It uses the libiberty getopt long functions (wrapped up inside libgm2/libm2pim/cgetopt.cc) and only enables this implementation if configure detects no getopt_long and friends on the target.
[Bug libstdc++/112100] ubsan: misses UB when modifying std::string's trailing \0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112100 Andrew Pinski changed: What|Removed |Added Component|sanitizer |libstdc++ --- Comment #1 from Andrew Pinski --- Maybe somehow the libstdc++ debug mode can catch this: https://gcc.gnu.org/onlinedocs/gcc-13.2.0/libstdc++/manual/manual/debug_mode_using.html#debug_mode.using.mode -D_GLIBCXX_DEBUG
[Bug target/112102] Inefficient Integer multiplication on MIPS processors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112102 --- Comment #8 from Kaze Emanuar --- This code is just an example, but I have seen this issue appear in many of my collision functions. I agree it's not a huge issue in my use case, but it'd still be cool to have this work well. I can work around it with inline assembly if this is not deemed an important enough issue to address.
[Bug tree-optimization/111957] `a ? abs(a) : 0` is not simplified to just abs(a)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111957 Andrew Pinski changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED Target Milestone|--- |14.0 --- Comment #6 from Andrew Pinski --- fixed.
[Bug tree-optimization/111957] `a ? abs(a) : 0` is not simplified to just abs(a)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111957 --- Comment #5 from CVS Commits --- The trunk branch has been updated by Andrew Pinski : https://gcc.gnu.org/g:662655e22dddf5392d9aa67fce45beee980e5454 commit r14-4955-g662655e22dddf5392d9aa67fce45beee980e5454 Author: Andrew Pinski Date: Tue Oct 24 23:13:18 2023 + match: Simplify `a != C1 ? abs(a) : C2` when C2 == abs(C1) [PR111957] This adds a match pattern for `a != C1 ? abs(a) : C2` which gets simplified to `abs(a)`. if C1 was originally *_MIN then change it over to use absu instead of abs. Bootstrapped and tested on x86_64-linux-gnu with no regressions. PR tree-optimization/111957 gcc/ChangeLog: * match.pd (`a != C1 ? abs(a) : C2`): New pattern. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/phi-opt-40.c: New test.
[Bug target/112102] Inefficient Integer multiplication on MIPS processors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112102 --- Comment #7 from Andrew Pinski --- Also, is this function from real code or just an example to show the issue? I suspect that in real code you either have 2 extra nops or a scheduling bubble. The nops might not make a huge difference ...
[Bug testsuite/111969] RISC-V rv32gcv: 12 grouped flaky failures
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111969 --- Comment #6 from Patrick O'Neill --- Mixed up my hashes when copy/pasting. r14-4875-g9cf2e7441ee passes locally/CI
[Bug target/112102] Inefficient Integer multiplication on MIPS processors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112102 --- Comment #6 from Andrew Pinski --- It just happened that the scheduler didn't schedule it that way. Scheduling is an NP-complete problem, too.
[Bug target/112102] Inefficient Integer multiplication on MIPS processors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112102 --- Comment #5 from Andrew Pinski ---
/* True if mflo and mfhi can be immediately followed by instructions
   which write to the HI and LO registers.

   According to MIPS specifications, MIPS ISAs I, II, and III need
   (at least) two instructions between the reads of HI/LO and
   instructions which write them, and later ISAs do not.

   Contradicting the MIPS specifications, some MIPS IV processor user
   manuals (e.g. the UM for the NEC Vr5000) document needing the
   instructions between HI/LO reads and writes, as well.  Therefore, we
   declare only MIPS32, MIPS64 and later ISAs to have the interlocks,
   plus any specific earlier-ISA CPUs for which CPU documentation
   declares that the instructions are really interlocked.  */
#define ISA_HAS_HILO_INTERLOCKS (mips_isa_rev >= 1 \
                                 || TARGET_MIPS5500 \
                                 || TARGET_MIPS5900 \
                                 || TARGET_LOONGSON_2EF)
So the question becomes: what are you compiling for?
[Bug target/112102] Inefficient Integer multiplication on MIPS processors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112102 --- Comment #4 from Kaze Emanuar --- I'm using the VR4300 (Nintendo 64). It does have the hazard between mult and mflo: MULT can't be within 2 instructions of the MFLO. This shouldn't be an issue here, though, since there were 3 instructions available to fill the 2 NOP slots that the MULT<>MFLO clash caused.
[Bug target/112102] Inefficient Integer multiplication on MIPS processors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112102 --- Comment #3 from Andrew Pinski --- Which MIPS arch are you really trying to compile for? MIPS 1, 2, 4, or mips32 (r1-r5 or r6)? There are many different ones, and mips32 (and above) does not have any delay slots/hazards for the mult instruction.
[Bug target/112102] Inefficient Integer multiplication on MIPS processors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112102 --- Comment #2 from Andrew Pinski --- -march=mips32r2 removes the nops. IIRC there was a hazard between the mflo and mult instructions for older architectures.
[Bug rtl-optimization/111971] [12/13/14 regression] ICE: maximum number of generated reload insns per insn achieved (90) since r12-6803-g85419ac59724b7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111971 --- Comment #6 from Vladimir Makarov --- (In reply to Andrew Pinski from comment #4) > But r1 is the argument register. It is even worse: r1 is a stack pointer. Still, the compilation should not end with an LRA failure. I've just started to work on this problem. I hope a patch fixing this will be committed this week or at the beginning of next week.
[Bug middle-end/111632] gcc fails to bootstrap when using libc++
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111632 --- Comment #6 from Sam James --- Try replying to the patch with 'ping'. I'm not a reviewer, but it LGTM and we're using it in Gentoo with no reported problems.
[Bug middle-end/111632] gcc fails to bootstrap when using libc++
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111632 --- Comment #5 from Dimitry Andric --- Is there any further action required to get this patch in? :)
[Bug target/112103] New: [14 regression] gcc.target/powerpc/rlwinm-0.c fails after r14-4941-gd1bb9569d70304
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112103 Bug ID: 112103 Summary: [14 regression] gcc.target/powerpc/rlwinm-0.c fails after r14-4941-gd1bb9569d70304 Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: seurer at gcc dot gnu.org Target Milestone: ---

g:d1bb9569d7030490fe7bb35af432f934560d689d, r14-4941-gd1bb9569d70304

make -k check-gcc RUNTESTFLAGS="powerpc.exp=gcc.target/powerpc/rlwinm-0.c"
FAIL: gcc.target/powerpc/rlwinm-0.c scan-assembler-times (?n)^\\s+rldicl 3081
FAIL: gcc.target/powerpc/rlwinm-0.c scan-assembler-times (?n)^\\s+rlwinm 3093
# of expected passes 5
# of unexpected failures 2

These changes in code output are OK as neither the original rlwinm nor the rldicl actually have any effect. So in the short term the test case just needs to update its instruction counts. We are tracking something to get rid of the extraneous ops later.

seurer@ltcden2-lp1:~/gcc/git/build/gcc-test$ diff rlwinm-0.s.r14-4940 rlwinm-0.s.r14-4941
5371c5371
< rlwinm 3,3,0,0x
---
> rldicl 3,3,0,32
6089c6089
< rlwinm 3,3,0,0xff
---
> rldicl 3,3,0,32
8959c8959
< rlwinm 3,3,0,0x
---
> rldicl 3,3,0,32
9677c9677
< rlwinm 3,3,0,0xff
---
> rldicl 3,3,0,32
12546c12546
< rlwinm 3,3,0,0x
---
> rldicl 3,3,0,32
13264c13264
< rlwinm 3,3,0,0xff
---
> rldicl 3,3,0,32
16131c16131
< rlwinm 3,3,0,0x
---
> rldicl 3,3,0,32
19715c19715
< rlwinm 3,3,0,0x
---
> rldicl 3,3,0,32
23298c23298
< rlwinm 3,3,0,0x
---
> rldicl 3,3,0,32

commit d1bb9569d7030490fe7bb35af432f934560d689d (HEAD)
Author: Roger Sayle
Date: Thu Oct 26 10:06:59 2023 +0100

    PR 91865: Avoid ZERO_EXTEND of ZERO_EXTEND in make_compound_operation.
[Bug c++/100470] std::is_nothrow_move_constructible incorrect behavior for explicitly defaulted members
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100470 Johel Ernesto Guerrero Peña changed: What|Removed |Added CC||johelegp at gmail dot com --- Comment #6 from Johel Ernesto Guerrero Peña --- Is this a duplicate of Bug 96090?
[Bug testsuite/109951] [14 Regression] libgomp, testsuite: non-native multilib c++ tests fail on Darwin.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109951 --- Comment #14 from CVS Commits --- The master branch has been updated by Thomas Schwinge : https://gcc.gnu.org/g:d8ff4b96b4be3bb4346c045bd0a7337079eabf90 commit r14-4949-gd8ff4b96b4be3bb4346c045bd0a7337079eabf90 Author: Thomas Schwinge Date: Mon Sep 11 11:36:31 2023 +0200 libatomic: Consider '--with-build-sysroot=[...]' for target libraries' build-tree testing (instead of build-time 'CC' etc.) [PR109951] Similar to commit fb5d27be272b71fb9026224535fc73f125ce3be7 "libgomp: Consider '--with-build-sysroot=[...]' for target libraries' build-tree testing (instead of build-time 'CC' etc.) [PR91884, PR109951]", this is commit 5ff06d762a88077aff0fb637c931c64e6f47f93d "libatomic/test: Fix compilation for build sysroot" done differently, avoiding build-tree testing use of any random gunk that may appear in build-time 'CC'. PR testsuite/109951 libatomic/ * configure.ac: 'AC_SUBST(SYSROOT_CFLAGS_FOR_TARGET)'. * Makefile.in: Regenerate. * configure: Likewise. * testsuite/Makefile.in: Likewise. * testsuite/lib/libatomic.exp (libatomic_init): If '--with-build-sysroot=[...]' was specified, use it for build-tree testing. * testsuite/libatomic-site-extra.exp.in (GCC_UNDER_TEST): Don't set. (SYSROOT_CFLAGS_FOR_TARGET): Set.
[Bug testsuite/109951] [14 Regression] libgomp, testsuite: non-native multilib c++ tests fail on Darwin.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109951 --- Comment #13 from CVS Commits --- The master branch has been updated by Thomas Schwinge : https://gcc.gnu.org/g:967d4171b2eb0557e86ba28996423353f0f1b141 commit r14-4948-g967d4171b2eb0557e86ba28996423353f0f1b141 Author: Thomas Schwinge Date: Mon Sep 11 10:50:00 2023 +0200 libffi: Consider '--with-build-sysroot=[...]' for target libraries' build-tree testing (instead of build-time 'CC' etc.) [PR109951] Similar to commit fb5d27be272b71fb9026224535fc73f125ce3be7 "libgomp: Consider '--with-build-sysroot=[...]' for target libraries' build-tree testing (instead of build-time 'CC' etc.) [PR91884, PR109951]", this is commit a0b48358cb1e70e161a87ec5deb7a4b25defba6b "libffi/test: Fix compilation for build sysroot" done differently, avoiding build-tree testing use of any random gunk that may appear in build-time 'CC', 'CXX'. PR testsuite/109951 libffi/ * configure.ac: 'AC_SUBST(SYSROOT_CFLAGS_FOR_TARGET)'. : Don't set 'CC_FOR_TARGET', 'CXX_FOR_TARGET', instead set 'SYSROOT_CFLAGS_FOR_TARGET'. * Makefile.in: Regenerate. * configure: Likewise. * include/Makefile.in: Likewise. * man/Makefile.in: Likewise. * testsuite/Makefile.in: Likewise. * testsuite/lib/libffi.exp (libffi_target_compile): If '--with-build-sysroot=[...]' was specified, use it for build-tree testing.
[Bug c++/112099] GCC doesn't recognize matching friend operator!= to resolve ambiguity in operator==
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112099 Daniel Krügler changed: What|Removed |Added CC||daniel.kruegler@googlemail. ||com --- Comment #1 from Daniel Krügler --- This could be related to https://cplusplus.github.io/CWG/issues/2804.html
[Bug c/112102] Inefficient Integer multiplication on MIPS processors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112102 --- Comment #1 from Kaze Emanuar --- Ignore the line about cycle counts. That was only applicable to my use case before I realized GCC does this for all MIPS architectures. Sorry!
[Bug c++/101631] gcc allows for the changing of an union active member to be changed via a reference
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101631 Patrick Palka changed: What|Removed |Added Resolution|--- |FIXED CC||ppalka at gcc dot gnu.org Status|UNCONFIRMED |RESOLVED Target Milestone|--- |14.0 --- Comment #7 from Patrick Palka --- Marking this fixed for GCC 14 then, thanks!
[Bug c/112102] New: Inefficient Integer multiplication on MIPS processors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112102 Bug ID: 112102 Summary: Inefficient Integer multiplication on MIPS processors Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: kazeemanuar at googlemail dot com Target Milestone: ---

Running integer multiplication with the -Os flag enabled can generate 2 unnecessary NOP instructions. This increases the cost of integer multiplication from 7 to 9 cycles in most cases.

Example code:
int test(int a, int b, int c, int d) {
    return 788*a + 789*b + 187 + c + d;
}

output:
        li      $2,788
        mult    $4,$2
        li      $2,789     <--- could be moved down into one of the NOPs
        mflo    $4
        nop
        nop
        mult    $5,$2
        mflo    $5
        addu    $4,$4,$5
        addiu   $4,$4,187  <--- could be moved up into one of the NOPs
        addu    $4,$4,$6   <--- could be moved up into one of the NOPs
        jr      $31
        addu    $2,$4,$7

This happens on all GCC versions as far as I can tell. Compiler explorer link: https://godbolt.org/z/M3x3s3KhM
[Bug c/112101] feature request: typeof_arg for extracting the type of a function's (or function pointer's) arguments
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112101 --- Comment #1 from Abdulmalek Almkainzi --- Correction for the gurantee_type macro:

#define gurantee_type(exp, type) \
    _Generic(exp, type: exp, default: (type){0})
[Bug c/112101] New: feature request: typeof_arg for extracting the type of a function's (or function pointer's) arguments
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112101 Bug ID: 112101 Summary: feature request: typeof_arg for extracting the type of a function's (or function pointer's) arguments Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: malekwryyy at gmail dot com Target Milestone: ---

C23 will add typeof (although gcc has had it as an extension for a while), which gives the type of an expression or a type. By using it, it is possible to get the return type of a function like so:
```
int func();
typeof(func()) x; // int x;
```
But there's no way to extract the type of the argument of a function:
```
void func(int);
?? x;
```
I think something like 'typeof_arg' would be a good addition. It takes 2 operands: the first is a function or function pointer, and the second is an integer constant for the index of the argument, which must be within [0, arg_count). For example:
```
#define print_func(f) \
    printf(#f \
           "(" \
           _Generic( (__typeof_arg(f, 0)){0}, \
                     int: "int", \
                     long: "long", \
                     float: "float", \
                     char*: "char*", \
                     default: "other ") \
           ")")
```
This would print a single-argument function's name and arg type like this: "puts(char*)". Another example:
```
#define gurantee_type(exp, type) \
    _Generic(exp, type: exp, default: (typeof(exp)){0})

#define call_with_empty(f) \
    _Generic( (__typeof_arg(f, 0)){0}, \
              char*: gurantee_type(f, void(*)(char*))(""), \
              default: f( (__typeof_arg(f, 0)){0} ) \
    )
```
which calls the function 'f' with an empty string if it takes char*, or 0 of the correct type otherwise. This wouldn't work for variadic functions, so __typeof_arg(printf, 1) would be an error. I think a feature like this would be really helpful for generic programming in C.
[Bug sanitizer/112100] New: ubsan: misses UB when modifying std::string's trailing \0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112100 Bug ID: 112100 Summary: ubsan: misses UB when modifying std::string's trailing \0 Product: gcc Version: 13.2.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: sanitizer Assignee: unassigned at gcc dot gnu.org Reporter: jengelh at inai dot de CC: dodji at gcc dot gnu.org, dvyukov at gcc dot gnu.org, jakub at gcc dot gnu.org, kcc at gcc dot gnu.org, marxin at gcc dot gnu.org Target Milestone: ---

Input:

#include <string>
int main() {
    std::string s = "fooo";
    s[s.size()] = 0xff;
}

Observed:

$ g++ x.cpp -v -Wall -ggdb3 -fsanitize=undefined,address && ./a.out
gcc version 13.2.1 20230912 [revision b96e66fd4ef3e36983969fb8cdd1956f551a074b] (SUSE Linux)
(no runtime output by executable)

Expected:

==55843==ERROR: AddressSanitizer: heap-buffer-overflow on address 0xsomething

https://eel.is/c++draft/string.access specifies that modifying the NUL char's position to values other than \0 is UB, so it should warn about this.
[Bug c++/112099] New: GCC doesn't recognize matching friend operator!= to resolve ambiguity in operator==
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112099 Bug ID: 112099 Summary: GCC doesn't recognize matching friend operator!= to resolve ambiguity in operator== Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: usaxena95 at gmail dot com Target Milestone: ---

https://godbolt.org/z/13T5vhETK
```cpp
struct S {
  operator int();
  friend bool operator==(const S &, int);
  friend bool operator!=(const S &, int);
};
struct A : S {};
struct B : S {};
bool x = A{} == B{}; // ambiguous!!
```
Adding a decl for `operator!=` to the **namespace scope** makes it work fine: https://godbolt.org/z/zzGWxb9zG
```cpp
struct S {
  operator int();
  friend bool operator==(const S &, int);
  friend bool operator!=(const S &, int);
};
bool operator!=(const S &, int);
struct A : S {};
struct B : S {};
bool x = A{} == B{};
```
According to [P2468R2 - The Equality Operator You Are Looking For](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2468r2.html):
"""
A non-template function or function template F named operator== is a rewrite target with first operand o unless a search for the name operator!= in the scope S from the instantiation context of the operator expression finds a function or function template that would correspond ([basic.scope.scope]) to F if its name were operator==, **where S is the scope of the class type of o if F is a class member, and the namespace scope of which F is a member otherwise**. A function template specialization named operator== is a rewrite target if its function template is a rewrite target.
"""
It feels like for `friend` functions (which are not class members), `S` is the namespace scope. A lookup in the namespace scope does not find a matching `operator!=` unless it is declared outside the class scope. The need to add a re-declaration of the friend outside of class scope looks unreasonable to me. This looks to me like an oversight in this paper OR this is a compiler bug and namespace lookup should actually find the `friend operator!=` in class scope and resolve the ambiguity.
[Bug middle-end/112098] suboptimal optimization of inverted bit extraction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112098 Andrew Pinski changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Assignee|unassigned at gcc dot gnu.org |pinskia at gcc dot gnu.org Last reconfirmed||2023-10-26 Ever confirmed|0 |1 --- Comment #2 from Andrew Pinski --- Mine. The problem reduces to something even simpler than that: we recognize `(A & C) != 0 ? D : 0`, but not `(A & C) == 0 ? D : 0`. Also the order of matching causes issues for:

unsigned int foo_ (unsigned int x)
{
  int t = x & 0x200;
  if (t) return 0x10;
  return 0;
}

(the /* A few simplifications of "a ? CST1 : CST2". */ section of match.pd) And a few other issues too.
[Bug fortran/67740] Wrong association status of allocatable character pointer in derived types
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67740 --- Comment #14 from Bálint Aradi --- Thanks a lot for fixing it!
[Bug tree-optimization/111520] [14 Regression] ICE: verify_flow_info failed (error: probability of edge 3->8 not initialized) with -O -fsignaling-nans -fharden-compares -fnon-call-exceptions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111520 Alexandre Oliva changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #4 from Alexandre Oliva --- Fixed.
[Bug tree-optimization/111943] ICE in gimple_split_edge, at tree-cfg.cc:3019 on 20050510-1.c with new -fharden-control-flow-redundancy with computed gotos
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111943 Alexandre Oliva changed: What|Removed |Added Ever confirmed|0 |1 Last reconfirmed||2023-10-26 Status|UNCONFIRMED |ASSIGNED Assignee|unassigned at gcc dot gnu.org |aoliva at gcc dot gnu.org --- Comment #1 from Alexandre Oliva --- Created attachment 56315 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56315=edit candidate patch under test Mine. Thanks for the report, testing a fix.
[Bug middle-end/112098] suboptimal optimization of inverted bit extraction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112098 --- Comment #1 from Bruno Haible --- The code that gets executed inside gcc is maybe the one mentioned in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109907#c2 .
[Bug middle-end/112098] New: suboptimal optimization of inverted bit extraction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112098 Bug ID: 112098 Summary: suboptimal optimization of inverted bit extraction Product: gcc Version: 13.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: bruno at clisp dot org Target Milestone: ---

gcc optimizes quite well a bit extraction such as

-- foo.c --
unsigned int foo (unsigned int x)
{
  return (x & 0x200 ? 0x10 : 0);
}
---

$ gcc -O2 -S foo.c && cat foo.s
...
        shrl    $5, %eax
        andl    $16, %eax
...

That is perfect: 2 arithmetic instructions. However, for the inverted bit extraction

== foo.c ==
unsigned int foo (unsigned int x)
{
  return (x & 0x200 ? 0 : 0x10);
}
===

the resulting code has 4 arithmetic instructions:

$ gcc -O2 -S foo.c && cat foo.s
...
        shrl    $9, %eax
        xorl    $1, %eax
        andl    $1, %eax
        sall    $4, %eax
...

Very clearly, the last shift instruction could be saved by transforming this code to

...
        shrl    $5, %eax
        xorl    $16, %eax
        andl    $16, %eax
...

clang 16 even replaces the "xorl $16, %eax" instruction with a "notl %eax". So, the optimal instruction sequence is one of

...
        shrl    $5, %eax
        notl    %eax
        andl    $16, %eax
...

or

...
        notl    %eax
        shrl    $5, %eax
        andl    $16, %eax
...

$ gcc --version
gcc (GCC) 13.2.0
Copyright (C) 2023 Free Software Foundation, Inc.

This is for x86_64. But similar optimization opportunities exist for other CPUs as well. For example, arm:

...
        lsr     r0, r0, #9
        eor     r0, r0, #1
        and     r0, r0, #1
        lsl     r0, r0, #4
...

which can be optimized to

...
        lsr     r0, r0, #5
        eor     r0, r0, #16
        and     r0, r0, #16
...

Or for sparc64:

...
        and     %o0, 512, %o0
        cmp     %g0, %o0
        subx    %g0, -1, %o0
        sll     %o0, 4, %o0
        jmp     %o7+8
         srl    %o0, 0, %o0
...

which can be optimized to

...
        xnor    %g0, %o0, %o0
        srl     %o0, 5, %o0
        jmp     %o7+8
         and    %o0, 16, %o0
...
[Bug libstdc++/112097] _PSTL_EARLYEXIT_PRESENT macro doesn't correctly identify intel compilers.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112097 Jonathan Wakely changed: What|Removed |Added Last reconfirmed||2023-10-26 Status|UNCONFIRMED |NEW Ever confirmed|0 |1 --- Comment #1 from Jonathan Wakely --- This is set in <pstl/pstl_config.h>, which does:

#if defined(__INTEL_COMPILER) && __INTEL_COMPILER >= 1800
# define _PSTL_EARLYEXIT_PRESENT
# define _PSTL_MONOTONIC_PRESENT
#endif

That was written by Intel, but maybe before icc was replaced by icx. The relevant macros for icx are:

#define __INTEL_CLANG_COMPILER 20230200
#define __INTEL_LLVM_COMPILER 20230200
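One possible shape of a fix, purely my sketch rather than a committed patch, would be to also accept the icx identification macro alongside the classic icc one in that config block:

```cpp
// Hypothetical adjustment: also recognize the LLVM-based Intel compiler
// (icx), which defines __INTEL_LLVM_COMPILER instead of __INTEL_COMPILER,
// so that the early-exit SIMD paths are enabled for it as well.
#if (defined(__INTEL_COMPILER) && __INTEL_COMPILER >= 1800) \
    || defined(__INTEL_LLVM_COMPILER)
# define _PSTL_EARLYEXIT_PRESENT
# define _PSTL_MONOTONIC_PRESENT
#endif
```

Whether icx supports `#pragma omp simd early_exit` in all configurations would need to be confirmed before such a change.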
[Bug fortran/67740] Wrong association status of allocatable character pointer in derived types
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67740 Paul Thomas changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #13 from Paul Thomas --- Fixed on 13-branch and trunk. Thanks for the report Paul
[Bug fortran/67740] Wrong association status of allocatable character pointer in derived types
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67740 --- Comment #12 from CVS Commits --- The releases/gcc-13 branch has been updated by Paul Thomas : https://gcc.gnu.org/g:6fb12d3a0456a3503a670d95803aef10549f0134 commit r13-7986-g6fb12d3a0456a3503a670d95803aef10549f0134 Author: Paul Thomas Date: Thu Oct 12 07:26:59 2023 +0100 Fortran: Set hidden string length for pointer components [PR67740]. 2023-10-11 Paul Thomas gcc/fortran PR fortran/67740 * trans-expr.cc (gfc_trans_pointer_assignment): Set the hidden string length component for pointer assignment to character pointer components. gcc/testsuite/ PR fortran/67740 * gfortran.dg/pr67740.f90: New test (cherry picked from commit 701363d827d45d3e3601735fa42f95644fda8b64)
[Bug rtl-optimization/91865] Combine misses opportunity to remove (sign_extend (zero_extend)) before searching for insn patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91865 --- Comment #6 from CVS Commits --- The master branch has been updated by Roger Sayle : https://gcc.gnu.org/g:d1bb9569d7030490fe7bb35af432f934560d689d

commit r14-4941-gd1bb9569d7030490fe7bb35af432f934560d689d
Author: Roger Sayle
Date: Thu Oct 26 10:06:59 2023 +0100

    PR 91865: Avoid ZERO_EXTEND of ZERO_EXTEND in make_compound_operation.

    This patch is my proposed solution to PR rtl-optimization/91865.
    Normally RTX simplification canonicalizes a ZERO_EXTEND of a ZERO_EXTEND
    to a single ZERO_EXTEND, but as shown in this PR it is possible for
    combine's make_compound_operation to unintentionally generate a
    non-canonical ZERO_EXTEND of a ZERO_EXTEND, which is unlikely to be
    matched by the backend.

    For the new test case:

    const int table[2] = {1, 2};
    int foo (char i) { return table[i]; }

    compiling with -O2 -mlarge on msp430 we currently see:

    Trying 2 -> 7:
        2: r25:HI=zero_extend(R12:QI)
          REG_DEAD R12:QI
        7: r28:PSI=sign_extend(r25:HI)#0
          REG_DEAD r25:HI
    Failed to match this instruction:
    (set (reg:PSI 28 [ iD.1772 ])
        (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ]))))

    which results in the following code:

    foo:    AND     #0xff, R12
            RLAM.A #4, R12 { RRAM.A #4, R12
            RLAM.A  #1, R12
            MOVX.W  table(R12), R12
            RETA

    With this patch, we now see:

    Trying 2 -> 7:
        2: r25:HI=zero_extend(R12:QI)
          REG_DEAD R12:QI
        7: r28:PSI=sign_extend(r25:HI)#0
          REG_DEAD r25:HI
    Successfully matched this instruction:
    (set (reg:PSI 28 [ iD.1772 ])
        (zero_extend:PSI (reg:QI 12 R12 [ iD.1772 ])))
    allowing combination of insns 2 and 7
    original costs 4 + 8 = 12
    replacement cost 8

    foo:    MOV.B   R12, R12
            RLAM.A  #1, R12
            MOVX.W  table(R12), R12
            RETA

    2023-10-26  Roger Sayle
                Richard Biener

    gcc/ChangeLog
        PR rtl-optimization/91865
        * combine.cc (make_compound_operation): Avoid creating a
        ZERO_EXTEND of a ZERO_EXTEND.

    gcc/testsuite/ChangeLog
        PR rtl-optimization/91865
        * gcc.target/msp430/pr91865.c: New test case.
[Bug fortran/104625] ICE in fixup_array_ref, at fortran/resolve.cc:9275 since r10-2912-g70570ec192745095
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104625 --- Comment #8 from Paul Thomas --- (In reply to anlauf from comment #6) > Steve Lionel of Intel confirmed that the code is valid, and that if X is > polymorphic, so is (X): > > community.intel.com/t5/Intel-Fortran-Compiler/SELECT-TYPE-statement-and- > parenthesized-selector/m-p/1537256#M168843 Indeed: "R1105: selector is expr or variable" is totally unambiguous. I will post the latest version of the patch at end-of-play today. I have dealt with nested parentheses but find that array references to 'z' in the original testcase are generating "unclassifiable statement" errors. I have also corrected the error message generated when 'z' is put in an assignment context from, "‘z’ at (1) associated to vector-indexed target cannot be used in a variable definition context (assignment)" to "‘z’ at (1) associated to expression cannot be used in a variable definition context (assignment)", simply by checking for vector-indexing at expr.cc:6477. Cheers Paul
[Bug libstdc++/112097] New: _PSTL_EARLYEXIT_PRESENT macro doesn't correctly identify intel compilers.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112097 Bug ID: 112097 Summary: _PSTL_EARLYEXIT_PRESENT macro doesn't correctly identify intel compilers. Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: denis.yaroshevskij at gmail dot com Target Milestone: --- simd first code: https://github.com/gcc-mirror/gcc/blob/be34a8b538c0f04b11a428bd1a9340eb19dec13f/libstdc%2B%2B-v3/include/pstl/unseq_backend_simd.h#L164C13-L164C36 find_if(std::unseq on icx goes to the block = 8 part https://godbolt.org/z/6fdT4j4cz despite it supporting the `#pragma omp simd early_exit` https://godbolt.org/z/Yre19vxdG
[Bug target/111828] rs6000: Parse inline asm string to figure out it requires HTM feature or not.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111828

Kewen Lin changed:

           What    |Removed     |Added
----------------------------------------------------------------
             Status|UNCONFIRMED |ASSIGNED
     Ever confirmed|0           |1
   Last reconfirmed|            |2023-10-26
[Bug middle-end/111942] ICE in rtl_split_edge, at cfgrtl.cc:1943 on pr98096.c with new -fharden-control-flow-redundancy with asm goto
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111942

Alexandre Oliva changed:

           What    |Removed     |Added
----------------------------------------------------------------
             Status|UNCONFIRMED |NEW
   Last reconfirmed|            |2023-10-26
     Ever confirmed|0           |1

--- Comment #1 from Alexandre Oliva ---
Thanks for the report.

This latent bug is independent of -fharden-control-flow-redundancy. The issue is that volatile asm stmts are considered throw points when -fnon-call-exceptions is enabled. I'm not sure where C++ wires non-call exceptions to enclosing handlers, but it appears that it doesn't: even with a handler added around f's body, no EH edges are added. However, inlining f() into another function with a handler for the call will wire all escaping exceptions to that handler, creating the same arrangement of 3 outgoing edges (fallthrough, asm jmp, and EH) that the RTL edge splitter barfs at.

/* compile with -fnon-call-exceptions */
int i, j;
int f(void) {
  asm goto ("# %0 %2" : "+r" (i) ::: jmp);
  i += 2;
  asm goto ("# %0 %1 %l[jmp]" : "+r" (i), "+r" (j) ::: jmp);
 jmp:
  return i;
}
int inline __attribute__ ((__always_inline__)) f(void);
int g(void) {
  try {
    return f();
  } catch (...) {
    i++;
    throw;
  }
}

./xgcc -B./ pr98096.cc -fno-harden-control-flow-redundancy -fnon-call-exceptions
during RTL pass: expand
pr98096.cc: In function ‘int g()’:
pr98096.cc:19:1: internal compiler error: in rtl_split_edge, at cfgrtl.cc:1943
   19 | }
[Bug target/112092] RISC-V: Wrong RVV code produced for vsetvl-11.c and vsetvlmax-8.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112092

--- Comment #6 from JuzheZhong ---
> I have troubles chasing one down and the source code is so
> convoluted with macros I can't even find the implementation.

I am sorry for causing confusion here. Because the RVV fusion rules are so complicated, we define them in riscv-vsetvl.def. To understand the code, I suggest reading riscv-vsetvl.def directly; we define all the compatible, fusion, and available rules there.

For example, vle16.v (e16, m1) is compatible with vadd.vv (e32, mf2). In this case, the two adjacent instructions "vle16.v" (e16, m1) and "vadd.vv" (e32, mf2) can share the same vsetvl (vsetvl e32, mf2). Whereas vsub.vv (e16, m1) and vadd.vv (e32, mf2) are not compatible.
[Bug target/112092] RISC-V: Wrong RVV code produced for vsetvl-11.c and vsetvlmax-8.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112092

--- Comment #5 from JuzheZhong ---
Yes, I agree that some arches prefer agnostic over undisturbed even at the cost of more vsetvls. That's why I have posted an issue asking whether we can have an option like -mprefer-agnostic:
https://github.com/riscv-non-isa/riscv-toolchain-conventions/issues/37

But I think Maciej is worrying about why GCC fuses the vsetvl and changes the e16mf2 vsetvl into e32m1. For example, https://godbolt.org/z/6G9G7Pbe9 -- no 'TU' is involved there. I think the LLVM codegen looks more reasonable:

        beqz    a5, .LBB0_4
        vsetvli a1, a6, e32, m1, ta, ma
        beqz    a4, .LBB0_3
.LBB0_2:                        # =>This Inner Loop Header: Depth=1
        vsetvli zero, a1, e32, m1, ta, ma
        vle32.v v8, (a0)
        vadd.vv v8, v8, v8
        addi    a4, a4, -1
        vse32.v v8, (a3)
        bnez    a4, .LBB0_2
.LBB0_3:
        ret
.LBB0_4:
        srai    a1, a6, 2
        vsetvli a1, a1, e16, mf2, ta, ma
        bnez    a4, .LBB0_2
        j       .LBB0_3

But GCC is correct with its optimizations:

foo(int*, int*, int*, int*, unsigned long, int, int):
        beq     a5,zero,.L2
        vsetvli a5,a6,e32,m1,ta,ma
.L3:
        beq     a4,zero,.L10
        li      a2,0
.L5:
        vle32.v v1,0(a0)
        addi    a2,a2,1
        vadd.vv v1,v1,v1
        vse32.v v1,0(a3)
        bne     a4,a2,.L5
.L10:
        ret
.L2:
        sraiw   a5,a6,2
        vsetvli zero,a5,e32,m1,ta,ma
        j       .L3
[Bug target/112092] RISC-V: Wrong RVV code produced for vsetvl-11.c and vsetvlmax-8.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112092

Kito Cheng changed:

           What    |Removed     |Added
----------------------------------------------------------------
                 CC|            |kito at gcc dot gnu.org

--- Comment #4 from Kito Cheng ---
The testcase itself looks tricky but is correct; this kind of fusion can typically be used to optimize mixed-width (mixed-SEW) operations. You can refer to the EEW material in the V spec [1]: most loads and stores encode a static EEW, so this vsetvli fusion optimization can apply to them.

[1] https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#52-vector-operands

To give a (more) practical example:

```c
#include "riscv_vector.h"

void foo(int32_t *in1, int16_t *in2, int16_t *in3, int32_t *out,
         size_t n, int cond, int avl) {
  size_t vl = __riscv_vsetvl_e16mf2(avl);
  vint32m1_t a = __riscv_vle32_v_i32m1(in1, vl);
  vint16mf2_t b = __riscv_vle16_v_i16mf2(in2, vl);
  vint16mf2_t c = __riscv_vle16_v_i16mf2(in3, vl);
  vint32m1_t x = __riscv_vwmacc_vv_i32m1(a, b, c, vl);
  __riscv_vse32_v_i32m1(out, x, vl);
}
```

> Is it guaranteed by the RVV specification that the value of `vl' produced
> (which is then supplied as an argument to `__riscv_vle32_v_i32m1', etc.;
> I presume implicitly via the VL CSR as I can't see it in actual assembly
> produced) is going to be the same for all microarchitectures for both:
>
>   vsetvli zero,a6,e32,m1,tu,ma
>
> and:
>
>   vsetvli zero,a6,e16,mf2,ta,ma

This is the other trick in this case: tail agnostic vs. tail undisturbed. Tail undisturbed has stronger semantics than tail agnostic, so using tail undisturbed in place of tail agnostic is always safe and satisfies the semantics; the same goes for mask agnostic vs. mask undisturbed.

Performance is another story, though. As far as I know, some uArches implement agnostic as undisturbed, in which case agnostic vs. undisturbed makes no real difference, so fusing those two vsetvlis becomes a kind of optimization.
However, as you can imagine, that also means some uArch may implement agnostic the other way: agnostic MAY have better performance than undisturbed, and we should not fuse those vsetvlis IF we are targeting such a uArch. Anyway, our cost model for RVV is still in an initial state, so personally I am fine with this for now, but I guess we need to add some more stuff to -mtune to handle those differences.
[Bug tree-optimization/111520] [14 Regression] ICE: verify_flow_info failed (error: probability of edge 3->8 not initialized) with -O -fsignaling-nans -fharden-compares -fnon-call-exceptions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111520

--- Comment #3 from CVS Commits ---
The master branch has been updated by Alexandre Oliva :

https://gcc.gnu.org/g:33d38b431cced81e575b1d17d36cb9e43d64b02b

commit r14-4936-g33d38b431cced81e575b1d17d36cb9e43d64b02b
Author: Alexandre Oliva
Date:   Thu Oct 26 03:06:09 2023 -0300

    set hardcmp eh probs

    Set execution count of EH blocks, and probability of EH edges.

    for gcc/ChangeLog

            PR tree-optimization/111520
            * gimple-harden-conditionals.cc
            (pass_harden_compares::execute): Set EH edge probability and
            EH block execution count.

    for gcc/testsuite/ChangeLog

            PR tree-optimization/111520
            * g++.dg/torture/harden-comp-pr111520.cc: New.