[Bug tree-optimization/111792] [14 Regression] wrong code at -O3 on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111792

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|wrong code at -O3 on        |[14 Regression] wrong code
                   |x86_64-linux-gnu            |at -O3 on x86_64-linux-gnu
   Target Milestone|---                         |14.0
                 CC|                            |rguenth at gcc dot gnu.org
           Keywords|                            |needs-bisection, wrong-code
             Status|UNCONFIRMED                 |NEW
            Version|unknown                     |14.0
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-10-13

--- Comment #1 from Richard Biener ---
Confirmed.  -fno-tree-loop-vectorize fixes it.  We vectorize

  for (; l; l--) {
    long a[1];
    for (r = 0; r < 1; r++) {
      h = a[0];
      if (g)
        goto L;
    }
  }
[Bug tree-optimization/111791] RISC-V: Strange loop vectorizaion on popcount function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111791

--- Comment #2 from Richard Biener ---
It does do that (if there's a POPCOUNT optab, that is).  Replacing with __builtin_popcount would eventually lead to an infinite recursion in this case ;)  (see that old memcpy + ldist bug)
[Bug c++/111788] g++ DWARF for void foo(...) missing unspecified parameters DIE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111788

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jason at gcc dot gnu.org

--- Comment #2 from Richard Biener ---
I wonder if foo (...) is a GNU extension (it was for C).
[Bug bootstrap/111787] [14 regression] r14-4592-g0d00385eaf72cc breaks build
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111787

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|UNCONFIRMED                 |RESOLVED
   Target Milestone|---                         |14.0

--- Comment #4 from Richard Biener ---
Fixed.
[Bug target/111784] [14 Regression] aarch64: ldp_stp_{15, 16, 17, 18}.c test failures since r14-4579
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111784

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |14.0
[Bug tree-optimization/111779] Fail to vectorize the struct include struct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111779

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #5 from Richard Biener ---
This should be fixed now.
[Bug target/111778] PowerPC constant code change uses an undefined shift
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111778

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #4 from Richard Biener ---
Fixed.
[Bug ipa/111773] Inconsistent optimization of replaced operator new()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111773

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |UNCONFIRMED
     Ever confirmed|1                           |0
           Assignee|rguenth at gcc dot gnu.org  |unassigned at gcc dot gnu.org

--- Comment #5 from Richard Biener ---
So the second example is fixed, but it's quite a corner case, so probably not worth backporting.

Since this single bug report covers two issues, I am moving it back to UNCONFIRMED for the first issue.  I agree with Andrew that _that_ issue behaves within the constraints of the standard.  ISTR it says that 'operator new' has to return a pointer that nothing else points to, which means it acts as if it were restrict qualified.  That allows GCC to conclude x != 0, because it rewrites x == 0 as a - p == 0 and then as a == p.  The difference operation cannot be constant folded based on this, but during IPA optimization we inline 'new', which exposes p == a and thus a - p == 0.
[Bug tree-optimization/111779] Fail to vectorize the struct include struct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111779

--- Comment #4 from CVS Commits ---
The master branch has been updated by Richard Biener:

https://gcc.gnu.org/g:6decda1a35be5764101987c210b5693a0d914e58

commit r14-4612-g6decda1a35be5764101987c210b5693a0d914e58
Author: Richard Biener
Date:   Thu Oct 12 11:34:57 2023 +0200

    tree-optimization/111779 - Handle some BIT_FIELD_REFs in SRA

    The following handles byte-aligned, power-of-two and byte-multiple
    sized BIT_FIELD_REF reads in SRA.  In particular this should cover
    BIT_FIELD_REFs created by optimize_bit_field_compare.

    For gcc.dg/tree-ssa/ssa-dse-26.c we now SRA the BIT_FIELD_REF
    appearing there leading to more DSE, fully eliding the aggregates.

    This results in the same false positive -Wuninitialized as the
    older attempt to remove the folding from optimize_bit_field_compare,
    fixed by initializing part of the aggregate unconditionally.

            PR tree-optimization/111779
    gcc/
            * tree-sra.cc (sra_handled_bf_read_p): New function.
            (build_access_from_expr_1): Handle some BIT_FIELD_REFs.
            (sra_modify_expr): Likewise.
            (make_fancy_name_1): Skip over BIT_FIELD_REF.

    gcc/fortran/
            * trans-expr.cc (gfc_trans_assignment_1): Initialize
            lhs_caf_attr and rhs_caf_attr codimension flag to avoid
            false positive -Wuninitialized.

    gcc/testsuite/
            * gcc.dg/tree-ssa/ssa-dse-26.c: Adjust for more DSE.
            * gcc.dg/vect/vect-pr111779.c: New testcase.
[Bug ipa/111773] Inconsistent optimization of replaced operator new()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111773

--- Comment #4 from CVS Commits ---
The master branch has been updated by Richard Biener:

https://gcc.gnu.org/g:35b5bb475375dba4ea9101d6db13a6012c4e84ca

commit r14-4611-g35b5bb475375dba4ea9101d6db13a6012c4e84ca
Author: Richard Biener
Date:   Thu Oct 12 10:13:58 2023 +0200

    tree-optimization/111773 - avoid CD-DCE of noreturn special calls

    The support to elide calls to allocation functions in DCE runs into
    the issue that when implementations are discovered noreturn we end
    up DCEing the calls anyway, leaving blocks without termination and
    without outgoing edges which is both invalid IL and wrong-code when
    as in the example the noreturn call would throw.  The following
    avoids taking advantage of both noreturn and the ability to elide
    allocation at the same time.

    For the testcase it's valid to throw or return 10 by eliding the
    allocation.  But we have to do either where currently we'd run
    off the function.

            PR tree-optimization/111773
            * tree-ssa-dce.cc (mark_stmt_if_obviously_necessary): Do
            not elide noreturn calls that are reflected to the IL.

            * g++.dg/torture/pr111773.C: New testcase.
[Bug tree-optimization/111727] [14 Regression] wrong code at -O3 on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111727

--- Comment #2 from Zhendong Su ---
Another similar/related test:

[553] % gcctk -O2 small.c; ./a.out
[554] %
[554] % gcctk -O3 small.c
[555] % ./a.out
Aborted
[556] % cat small.c
int a, b;
int main() {
  for (; a < 4; a += 2)
    if (a > 2)
      while (b++)
        ;
  ;
  if (a != 4)
    __builtin_abort();
  return 0;
}
[Bug tree-optimization/111792] New: wrong code at -O3 on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111792

            Bug ID: 111792
           Summary: wrong code at -O3 on x86_64-linux-gnu
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: zhendong.su at inf dot ethz.ch
  Target Milestone: ---

It appears to be a recent regression.

Compiler Explorer: https://godbolt.org/z/7vfzPYTrn

[537] % gcctk -v
Using built-in specs.
COLLECT_GCC=gcctk
COLLECT_LTO_WRAPPER=/local/suz-local/software/local/gcc-trunk/libexec/gcc/x86_64-pc-linux-gnu/14.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-trunk/configure --disable-bootstrap --enable-checking=yes --prefix=/local/suz-local/software/local/gcc-trunk --enable-sanitizers --enable-languages=c,c++ --disable-werror --enable-multilib
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 14.0.0 20231013 (experimental) (GCC)
[538] %
[538] % gcctk -O2 small.c; ./a.out
[539] %
[539] % gcctk -O3 small.c
[540] % ./a.out
Aborted
[541] % cat small.c
int c, d, h, i, j, l, *n = &h;
short e, f, g, *k, m;
long o;
short p(short p1, int q) {
  return q >= 32 || p1 > 5 >> q ? 1 : p1 << q;
}
long u(unsigned p1) {
  int r = 50, s, *t = &c;
L:
  m && (*k = 0);
  for (d = 1; d; d--)
    for (s = 0; s < 3; s++) {
      *n = i ^ p1;
      *t = p1 > (unsigned)p((unsigned)(o = 4073709551615) >= p1 && 5, r);
      if (f)
        goto L;
    }
  for (; e < 1;)
    return j;
  int *b[2] = {&s, &r};
  for (; l; l--) {
    long a[1];
    for (r = 0; r < 1; r++) {
      h = a[0];
      if (g)
        goto L;
    }
  }
  return 0;
}
int main() {
  u(6);
  if (c != 1)
    __builtin_abort();
  return 0;
}
[Bug target/111001] SH: ICE during RTL pass: sh_treg_combine2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111001

--- Comment #3 from Oleg Endo ---
(In reply to Oleg Endo from comment #2)
> I've briefly tried on a local gcc version 13.1.1 20230714
>
> While it doesn't crash, the sh_treg_combine2 pass seems to be stuck in an
> infinite loop.  It produces a log file > 200 MByte.

It trips on the following insn:

(insn 1431 1430 179 19 (set (reg/v:DI 264 [ blk_cnt ])
        (reg/v:DI 264 [ blk_cnt ])) "rw_bitmaps.c":341:11 -1
     (nil))

... which is a reg-reg move on itself (i.e. a nop).  For some reason this insn is emitted by the split1 pass, which runs before sh_treg_combine2.
[Bug target/111001] SH: ICE during RTL pass: sh_treg_combine2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111001

Oleg Endo changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2023-10-13
     Ever confirmed|0                           |1
                 CC|                            |olegendo at gcc dot gnu.org
             Status|UNCONFIRMED                 |NEW

--- Comment #2 from Oleg Endo ---
I've briefly tried on a local gcc version 13.1.1 20230714

While it doesn't crash, the sh_treg_combine2 pass seems to be stuck in an infinite loop.  It produces a log file > 200 MByte.
[Bug target/54089] [SH] Refactor shift patterns
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54089

--- Comment #103 from Oleg Endo ---
(In reply to Alexander Klepikov from comment #102)
> Created attachment 55543 [details]
> Arithmetic right shift late expanding v2
>
> Here's the patch. I hope I did not miss anything.
>

Sorry, I've been busy with other things.  I think your patch is OK, but I wanted to test it in more detail before committing it.
[Bug target/111600] [14 Regression] RISC-V bootstrap time regression
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111600

JuzheZhong changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |juzhe.zhong at rivai dot ai

--- Comment #24 from JuzheZhong ---
(In reply to Robin Dapp from comment #23)
> For the lack of a better idea (and time constraints as looking for compiler
> bottlenecks is slow and tedious) I went with Kito's suggestion of splitting
> insn-emit.cc
>
> This reduces this part of the compilation with eight threads to 40s (from 10
> min before).  I evenly split the number of patterns into the 10 files but it
> just so happens that the last file will receive all the problematical
> maybe_code_for functions, so that file makes up for most of the 40s.  The
> rest usually takes 5-20s.
>
> Doing bootstrapping tests now, going to post an initial patch once it's
> "presentable".

Hi, Robin.

I believe your patch can solve the compile-time issue.  But I wonder whether it can also fix the memory consumption?
[Bug target/108315] -mcpu=power10 changes ABI
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108315

--- Comment #21 from Rui Ueyama ---
I fixed several issues in mold related to POWER10 compatibility, and all its unit tests pass on gcc120!  I also confirmed that mold can now bootstrap itself with `-mcpu=power10`.  So I believe it's now usable on POWER10 machines.
[Bug target/111424] LoongArch: Enable vect test suite
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111424

--- Comment #3 from Chenghui Pan ---
vect.exp is now enabled on the master branch, but some check_effective_target procs in gcc/testsuite/lib/target-supports.exp still seem to need modifying in order to enable more vectorization tests.
[Bug target/107704] [13/14 Regression] Testsuite regression after recent DCE changes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107704

--- Comment #5 from Oleg Endo ---
(In reply to Jeffrey A. Law from comment #2)
> ACK.  And as I mentioned, the RTL form looks like it ought to be caught by
> the SH specific code to optimize T reg handling.  I don't care enough about
> the SH to try and debug a missed optimization though.

Haven't seen this one.  Will try to have a look at it, since I wrote those parts.
[Bug target/111784] [14 Regression] aarch64: ldp_stp_{15, 16, 17, 18}.c test failures since r14-4579
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111784

Kewen Lin changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[14 Regression] aarch64:    |[14 Regression] aarch64:
                   |ldp_stp_{15,16,17,18}.c     |ldp_stp_{15,16,17,18}.c
                   |test failures               |test failures since
                   |                            |r14-4579
           Assignee|unassigned at gcc dot gnu.org |linkw at gcc dot gnu.org
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2023-10-13
                 CC|                            |linkw at gcc dot gnu.org

--- Comment #2 from Kewen Lin ---
Thanks for reporting!

(In reply to Alex Coplan from comment #1)
> More context/details about the issue in 8/10 of the original patch series
> (that the above revision comes from):
> https://gcc.gnu.org/pipermail/gcc-patches/2023-September/630234.html

Yeah, there is some discussion in this thread.  I'll make a patch to cover both load and store.
[Bug rtl-optimization/111782] [11/12/13/14 Regression] Extra move in double argument and multiplication and return
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111782

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[11/12/13/14 Regression]    |[11/12/13/14 Regression]
                   |Extra move in complex       |Extra move in double
                   |double argument and         |argument and multiplication
                   |multiplication              |and return

--- Comment #2 from Andrew Pinski ---
Here is another testcase without complex (or even needing -ffast-math) being involved:
```
struct cmplx
{
  double r;
  double i;
};

struct cmplx f(double ar, double ai, double br, double bi, __complex double *r)
{
  double t = ai * bi;
  double t1 = ai * br;
  double t2 = ar * bi + t1;
  double t3 = ar * br - t;
  return (struct cmplx){t3,t2};
}
```

This only shows up when both the arguments and the return value use the same registers.  I am 99% sure this does not show up that much in practice either.
[Bug libfortran/83282] missing comma in format changes output
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83282

anlauf at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2017-12-05 00:00:00         |2023-10-12
                 CC|                            |anlauf at gcc dot gnu.org
          Component|fortran                     |libfortran
           Keywords|                            |wrong-code

--- Comment #3 from anlauf at gcc dot gnu.org ---
I looked at parse_format_list and got lost in the forest...

@Jerry: it's a missing comma *after* the A descriptor triggering the error.
[Bug rtl-optimization/111782] [11/12/13/14 Regression] Extra move in complex double multiplication
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111782

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |ra
   Last reconfirmed|                            |2023-10-12
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from Andrew Pinski ---
Confirmed.  The first difference in the IR between GCC 10 and 11 is in reload (LRA).  I suspect this is only an argument register allocation issue.
[Bug c/111786] No tail recursion for simple program
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111786

--- Comment #3 from Lukas Grätz ---
(In reply to Jakub Jelinek from comment #1)
> We completely intentionally don't emit tail calls to noreturn functions, so
> that e.g. in case of abort one doesn't need to virtually reconstruct
> backtrace.
> In your case, the interprocedural optimizations determine expr_main_original
> is noreturn and so calls it normally (and optimizes away anything after that
> call).

Thank you very much indeed! (Ah yes, this also explains why there is no "ret".) And sorry for not realizing that this is a duplicate.

So the "call" is intentionally emitted by gcc for a better debugging experience. I agree, this makes perfect sense in many cases. However, the price of better debugging seems to be the danger of a stack overflow.

After I understood your "complete" intention, it took me about 20 min to construct an example which causes a stack overflow following that intention.

---
void foo(int n) {
    if (n == 0)
        exit(0);
    int x[200];
    for (int i = 0; i < 200; i++)
        extern_function(x[i], x[200-i]);
    return foo(n-1);
}
---

After adding __attribute__((noreturn)), compiling with -O3 and passing 1 to foo(), I get a segmentation fault. There is still a warning "function declared ‘noreturn’ has a ‘return’ statement". But in our case, the noreturn attribute is not wrong, because none of the recursive calls actually return.

This might be something that interprocedural optimizations detect in the future. So even without the noreturn attribute, gcc could decide to produce no tail recursion (because it is a noreturn function, regardless of the noreturn attribute).

Last remark, then I remain silent. I just learned that clang actually has the attribute musttail, which would help for my reported C file as well as in the foo() example above to prevent the stack overflow. But I guess it is not planned to add musttail to gcc?
[Bug tree-optimization/111791] RISC-V: Strange loop vectorizaion on popcount function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111791

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2023-10-12
                 CC|                            |pinskia at gcc dot gnu.org
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1

--- Comment #1 from Andrew Pinski ---
Confirmed.

I almost want to say SCCP should always turn this loop into __builtin_popcountN if there are no other statements in it; the loop would then be removed and not be vectorized.  But maybe there is another way to fix ...
[Bug fortran/86120] ICE caused by unassociated pointer in SIZE intrinsic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86120

anlauf at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |DUPLICATE
             Status|NEW                         |RESOLVED
                 CC|                            |anlauf at gcc dot gnu.org

--- Comment #6 from anlauf at gcc dot gnu.org ---
The testcase discussed here is almost identical to pr66969.

*** This bug has been marked as a duplicate of bug 66969 ***
[Bug fortran/66969] Internal compiler error, segmentation fault
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66969

anlauf at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |simon.kluepfel at gmail dot com

--- Comment #3 from anlauf at gcc dot gnu.org ---
*** Bug 86120 has been marked as a duplicate of this bug. ***
[Bug target/111778] PowerPC constant code change uses an undefined shift
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111778

--- Comment #3 from CVS Commits ---
The master branch has been updated by Michael Meissner:

https://gcc.gnu.org/g:611eef7609f732db65c119a7eab6d50a5fdd5985

commit r14-4600-g611eef7609f732db65c119a7eab6d50a5fdd5985
Author: Michael Meissner
Date:   Thu Oct 12 16:17:59 2023 -0400

    PR111778, PowerPC: Do not depend on an undefined shift

    I was building a cross compiler to PowerPC on my x86_64 workstation with the
    latest version of GCC on October 11th.  I could not build the compiler on the
    x86_64 system as it died in building libgcc.  I looked into it, and I
    discovered the compiler was recursing until it ran out of stack space.  If I
    build a native compiler with the same sources on a PowerPC system, it builds
    fine.

    I traced this down to a change made around October 10th:

    | commit 8f1a70a4fbcc6441c70da60d4ef6db1e5635e18a (HEAD)
    | Author: Jiufu Guo
    | Date:   Tue Jan 10 20:52:33 2023 +0800
    |
    |   rs6000: build constant via li/lis;rldicl/rldicr
    |
    |   If a constant is possible left/right cleaned on a rotated value from
    |   a negative value of "li/lis".  Then, using "li/lis ; rldicl/rldicr"
    |   to build the constant.

    The code was doing a -1 << 64 which is undefined behavior because different
    machines produce different results.  On the x86_64 system, (-1 << 64) produces
    -1 while on a PowerPC 64-bit system, (-1 << 64) produces 0.  The x86_64 then
    recurses until the stack runs out of space.

    If I apply this patch, the compiler builds fine on both x86_64 as a PowerPC
    cross compiler and on a native PowerPC system.

    2023-10-12  Michael Meissner

    gcc/

            PR target/111778
            * config/rs6000/rs6000.cc (can_be_built_by_li_lis_and_rldicl): Protect
            code from shifts that are undefined.
            (can_be_built_by_li_lis_and_rldicr): Likewise.
            (can_be_built_by_li_and_rldic): Protect code from shifts that are
            undefined.  Also replace uses of 1ULL with HOST_WIDE_INT_1U.
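
The hazard described in this commit message can be shown outside of GCC.  What follows is only a minimal sketch, not the code from rs6000.cc: the helper name mask_from_shift is hypothetical, and it uses plain uint64_t rather than GCC's HOST_WIDE_INT.  It assumes the intended result of shifting by the full width is 0 (the PowerPC behaviour mentioned above) and makes that explicit instead of evaluating the undefined expression.

```c
#include <stdint.h>

/* Hypothetical helper, not taken from rs6000.cc: return all-ones shifted
   left by N bits.  Shifting a 64-bit value by 64 or more is undefined in
   ISO C, so that case is handled explicitly instead of relying on what the
   host CPU happens to do.  */
uint64_t
mask_from_shift (unsigned int n)
{
  if (n >= 64)
    return 0;
  /* Shift an unsigned value: left-shifting a negative signed value is
     undefined in ISO C as well.  */
  return UINT64_MAX << n;
}
```

With a guard like this the host and the target agree on the result for every shift count, which is the property the cross build was missing.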
[Bug tree-optimization/111791] New: RISC-V: Strange loop vectorizaion on popcount function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111791

            Bug ID: 111791
           Summary: RISC-V: Strange loop vectorizaion on popcount function
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kito at gcc dot gnu.org
  Target Milestone: ---

Symptom:
This is a typical popcount implementation using Brian Kernighan’s algorithm. The vectorizer has recognized it as popcount, but it comes with a strange vectorization result. I know that might be because I added -fno-vect-cost-model, but I still don't understand why it was vectorized, so I guess it may be worth reporting.

NOTE: The bad/strange code generation goes away once a scalar popcount instruction is available.

Case:
```
int popcount(unsigned long value)
{
    int nbits;
    for (nbits = 0; value != 0; value &= value - 1)
        nbits++;
    return nbits;
}
```

Command to reproduce:
```
$ riscv64-unknown-linux-gnu-gcc x.c -march=rv64gcv -o - -S -fno-vect-cost-model -O3
```

Sha1: g:faae30c49560f1481f036061fa2f894b0f7257f8 (some random point of top of trunk)

Current output:
```
        .globl  popcount
        .type   popcount, @function
popcount:
.LFB0:
        .cfi_startproc
        beq     a0,zero,.L4
        addi    sp,sp,-16
        .cfi_def_cfa_offset 16
        sd      ra,8(sp)
        .cfi_offset 1, -8
        call    __popcountdi2
        csrr    a2,vlenb
        sext.w  a0,a0
        srli    a2,a2,2
        vsetvli a3,zero,e32,m1,ta,ma
        vid.v   v1
.L3:
        vsetvli a5,a0,e8,mf4,ta,ma
        sub     a0,a0,a5
        vsetvli a3,zero,e32,m1,ta,ma
        vmv1r.v v3,v1
        vmv.v.x v2,a2
        vadd.vv v1,v1,v2
        bne     a0,zero,.L3
        ld      ra,8(sp)
        .cfi_restore 1
        addi    a5,a5,-1
        vadd.vi v3,v3,1
        vslidedown.vx   v3,v3,a5
        addi    sp,sp,16
        .cfi_def_cfa_offset 0
        vmv.x.s a0,v3
        jr      ra
.L4:
        li      a0,0
        ret
        .cfi_endproc
.LFE0:
        .size   popcount, .-popcount
```
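
For reference, the loop in the testcase can be checked against GCC's own builtin.  This is only a small sketch: popcount_matches_builtin is a hypothetical helper name introduced here for illustration, and __builtin_popcountl is the GCC builtin for unsigned long operands.

```c
/* Kernighan-style popcount from the report: each iteration clears the
   lowest set bit, so the loop runs once per set bit.  */
int
popcount (unsigned long value)
{
  int nbits;
  for (nbits = 0; value != 0; value &= value - 1)
    nbits++;
  return nbits;
}

/* Hypothetical helper: returns 1 if the loop agrees with the GCC builtin
   for VALUE, 0 otherwise.  */
int
popcount_matches_builtin (unsigned long value)
{
  return popcount (value) == __builtin_popcountl (value);
}
```

Since the loop is semantically a popcount, recognizing it as such is correct; the question raised in the report is only about the vector code emitted around the call.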
[Bug fortran/111783] 'exit' intrinsic should be marked as noreturn
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111783

--- Comment #3 from anlauf at gcc dot gnu.org ---
(In reply to anlauf from comment #2)
> This leaves ABORT and EXIT to deal with.

Speaking to myself:

subroutine s1()
  call exit(1)
  stop 98
end

subroutine s2()
  call abort
  stop 99
end

Here the STOP statements do not show up in .optimized.

So I am wondering where the _gfortran_exit_i4 in comment#0 comes from?
[Bug fortran/111783] 'exit' intrinsic should be marked as noreturn
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111783

anlauf at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |anlauf at gcc dot gnu.org

--- Comment #2 from anlauf at gcc dot gnu.org ---
(In reply to Tobias Burnus from comment #0)
> There are probably more, at least the ABORT intrinsic subroutine and
> the functions associated with STOP / ERROR STOP like _gfortran_stop_numeric.

trans-decl.cc has:

  /* STOP doesn't return.  */
  TREE_THIS_VOLATILE (gfor_fndecl_stop_numeric) = 1;
  TREE_THIS_VOLATILE (gfor_fndecl_stop_string) = 1;
  TREE_THIS_VOLATILE (gfor_fndecl_error_stop_numeric) = 1;
  TREE_THIS_VOLATILE (gfor_fndecl_error_stop_string) = 1;

plus a few more, so these are already accounted for.

Try:

  stop 1
  error stop 2
  stop "3"
end

The dump-tree-optimized only contains the first even at -O0, as it should be, shouldn't it?

This leaves ABORT and EXIT to deal with.
[Bug c++/111660] [14 Regression] Compilation of constexpr function returning enum takes exponential time with -std=c++2a since r14-4140-g6851e3423c2b5e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111660

--- Comment #12 from Marek Polacek ---
Candidate fix:

--- a/gcc/cp/cp-gimplify.cc
+++ b/gcc/cp/cp-gimplify.cc
@@ -1072,7 +1072,7 @@ cp_fold_immediate_r (tree *stmt_p, int *walk_subtrees, void *data_)
       /* We're done here.  Don't clear *walk_subtrees here though: we're called
         from cp_fold_r and we must let it recurse on the expression with
         cp_fold.  */
-      break;
+      return integer_zero_node;
     case PTRMEM_CST:
       if (TREE_CODE (PTRMEM_CST_MEMBER (stmt)) == FUNCTION_DECL
          && DECL_IMMEDIATE_FUNCTION_P (PTRMEM_CST_MEMBER (stmt)))
@@ -1145,7 +1145,8 @@ cp_fold_immediate (tree *tp, mce_value manifestly_const_eval)
     flags |= ff_mce_false;

   cp_fold_data data (flags);
-  return !!cp_walk_tree_without_duplicates (tp, cp_fold_immediate_r, &data);
+  tree r = cp_walk_tree_without_duplicates (tp, cp_fold_immediate_r, &data);
+  return r == error_mark_node;
 }

 /* Perform any pre-gimplification folding of C++ front end trees to
[Bug tree-optimization/27504] x && (x & y) not optimized to x & y
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=27504

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org |pinskia at gcc dot gnu.org

--- Comment #6 from Andrew Pinski ---
Note there are also other bugs associated with a similar ifcombine issue (besides this one and PR 88280).  I don't know if I will get to that part until next year though.
[Bug tree-optimization/27504] x && (x & y) not optimized to x & y
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=27504

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

--- Comment #5 from Andrew Pinski ---
Mine.  Bug 88280 comment #5 fixes part of the issue.  The secondary issue is that ifcombine does not do the combining needed to get to that point.
[Bug tree-optimization/88280] missing folding of logical and bitwise AND
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88280

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |pinskia at gcc dot gnu.org

--- Comment #5 from Andrew Pinski ---
Match pattern:
```
(for cmp (eq      ne)
     bop (bit_ior bit_and)
 (bop:c
  (cmp @0 integer_zerop)
  (cmp@2 (bit_and:c @0 @1) integer_zerop))
 @2)
```

Testcase that shows the issue (of the swapped order of the ifs):
```
_Bool f0(int a, int b)
{
  if (a == 0)
    return 1;
  return (a & b) == 0;
  // (a==0) | ((a &b) == 0) -> ((a &b) == 0)
}

_Bool f0_(int a, int b)
{
  if ((a & b) == 0)
    return 1;
  return a == 0;
  // ((a &b) == 0) | (a==0) -> ((a &b) == 0)
}

_Bool f1(int a, int b)
{
  if (a != 0)
    return (a & b) != 0;
  return 0;
  // (a!=0) & ((a & b) != 0) -> ((a & b) != 0)
}

_Bool f1_(int a, int b)
{
  if ((a & b) != 0)
    return a != 0;
  return 0;
  // ((a & b) != 0) & (a!=0) -> ((a & b) != 0)
}
```
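
The two identities behind the match pattern can be checked by brute force over a small range.  This is just an illustrative sketch; identities_hold is a hypothetical helper introduced here and is not part of GCC or its testsuite.

```c
/* Hypothetical checker: returns 1 if, for all A and B in [-LIMIT, LIMIT],
   the unsimplified expressions agree with the simplified forms from the
   match pattern, and 0 otherwise.  */
int
identities_hold (int limit)
{
  for (int a = -limit; a <= limit; a++)
    for (int b = -limit; b <= limit; b++)
      {
        int z = (a & b) == 0;
        int nz = (a & b) != 0;
        /* (a == 0) | ((a & b) == 0)  ->  (a & b) == 0  */
        if (((a == 0) | z) != z)
          return 0;
        /* (a != 0) & ((a & b) != 0)  ->  (a & b) != 0  */
        if (((a != 0) & nz) != nz)
          return 0;
      }
  return 1;
}
```

The reasoning is that a == 0 implies (a & b) == 0, so the first comparison adds nothing to the OR; dually, (a & b) != 0 implies a != 0, so the first comparison adds nothing to the AND.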
[Bug c++/111790] Unwarranted missing template keyword warning
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111790

--- Comment #1 from Tinko Sebastian Bartels ---
A command line that can trigger the behavior is

g++ main.cpp

One of the versions for which it occurs is

g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-linux-gnu/13.2.1/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /build/gcc/src/gcc/configure --enable-languages=ada,c,c++,d,fortran,go,lto,objc,obj-c++ --enable-bootstrap --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/ --with-build-config=bootstrap-lto --with-linker-hash-style=gnu --with-system-zlib --enable-__cxa_atexit --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-linker-build-id --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --disable-libssp --disable-libstdcxx-pch --disable-werror
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 13.2.1 20230801 (GCC)
[Bug c++/111790] New: Unwarranted missing template keyword warning
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111790

            Bug ID: 111790
           Summary: Unwarranted missing template keyword warning
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: t.bart...@tu-berlin.de
  Target Milestone: ---

Consider the following code:

template //no warning if this is not a template
struct X { T v; };

template
inline void f()
{
    X x; //no warning with e.g. X
    if(x.v < 0 && 0 > -1) {} //No warning without the "-"
}

int main()
{
    f(); //Still get a warning without this instantiation.
}

GCC 12, 13, current trunk and seemingly everything after b8ffa71e427 give me a warning for this:

warning: expected 'template' keyword before dependent template name [-Wmissing-template-keyword]

But GCC compiles it, so it seems to interpret it correctly at some point after triggering the warning.

I tried to look into it and thought that maybe

cp_lexer_peek_token (parser->lexer)->type <= CPP_LAST_PUNCTUATOR

in line 6447 of gcc/cp/parser.cc is too general but I do not have a proposed patch.
[Bug c++/108993] Value initialization does not occur for derived class , for gcc versions > 5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108993

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |iamsupermouse at mail dot ru

--- Comment #15 from Andrew Pinski ---
*** Bug 111771 has been marked as a duplicate of this bug. ***
[Bug c++/111771] Incorrect "is used uninitialized" warning, as if zero-initialization didn't propagate through user-provided default constructors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111771

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |DUPLICATE
             Status|NEW                         |RESOLVED

--- Comment #5 from Andrew Pinski ---
Dup of bug 108993.

*** This bug has been marked as a duplicate of bug 108993 ***
[Bug tree-optimization/111622] [13 Regression] EVRP compile-time hog compiling risc-v insn-opinit.cc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111622

--- Comment #6 from Andrew Macleod ---
Interesting.

The "fix" turns out to be:

commit 9ea74d235c7e7816b996a17c61288f02ef767985
Author: Richard Biener
Date:   Thu Sep 14 09:31:23 2023 +0200

    tree-optimization/111294 - better DCE after forwprop

    The following adds more aggressive DCE to forwprop to clean up dead
    stmts when folding a stmt leaves some operands unused.  The patch
    uses simple_dce_from_worklist for this purpose, queueing original
    operands before substitution and folding, but only if we folded the
    stmt.

    This removes one dead stmt biasing threading costs in a later pass
    but it doesn't resolve the optimization issue in the PR yet.

Which implies something pathological was triggering in VRP, so I dug a little deeper...

It seems to be a massive number of partial equivalencies generated by sequences like:

  _5 = (unsigned int) _1;
  _10 = (unsigned int) _1;
  _15 = (unsigned int) _1;
  _20 = (unsigned int) _1;
  _25 = (unsigned int) _1;
  <...>

for a couple of hundred statements.  These are all then members of a partial equivalence set, and we end up doing obscene amounts of pointless looping and recomputing of ranges of things in the set when, say, _1 may change.

The intent of partial equivalence is to allow us to reflect known subranges through casts, but not to build up large groups like in an equivalence.  There should be a limit to the size.  We start to lose most of the usefulness when the grouping gets too large.
[Bug middle-end/111789] [14 Regression] runtime Segmentation fault with '-O3 -fno-inline -fno-toplevel-reorder'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111789

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
           Keywords|                            |needs-bisection,
                   |                            |needs-reduction
   Last reconfirmed|                            |2023-10-12

--- Comment #3 from Andrew Pinski ---
This looks wrong:
```
;; Function func_28.constprop.isra (func_28.constprop.0.isra.0, funcdef_no=115, decl_uid=6855, cgraph_uid=120, symbol_order=162)

void func_28.constprop.isra (int32_t * * p_30)
{
  [local count: 1073741824]:
  return;

}
```

From :
```
static uint16_t func_28(int16_t p_29, int32_t ** p_30, int32_t * p_31, uint16_t p_32)
{
    int32_t *l_668 = &g_10;
    (*p_30) = l_668;
    return p_29;
}
```
[Bug middle-end/111789] [14 Regression] runtime Segmentation fault with '-O3 -fno-inline -fno-toplevel-reorder'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111789

--- Comment #2 from Andrew Pinski ---
#0  0x00402521 in func_4 (p_5=, p_6=p_6@entry=0x406044 , p_7=21909, p_8=, p_9=398526839) at /home/cuisk/gcc/tmp/a.c:134

=> 0x00402521 <+193>:   mov    (%r9),%r11d

r9             0x4064
[Bug middle-end/111789] [14 Regression] runtime Segmentation fault with '-O3 -fno-inline -fno-toplevel-reorder'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111789

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |14.0
            Summary|Segmentation fault with     |[14 Regression] runtime
                   |'-O3 -fno-inline            |Segmentation fault with
                   |-fno-toplevel-reorder'      |'-O3 -fno-inline
                   |                            |-fno-toplevel-reorder'
          Component|c                           |middle-end
           Keywords|                            |wrong-code
             Target|                            |x86_64-linux-gnu
[Bug c/111789] Segmentation fault with '-O3 -fno-inline -fno-toplevel-reorder'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111789 --- Comment #1 from CTC <19373742 at buaa dot edu.cn> --- Created attachment 56100 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56100&action=edit The compiler output
[Bug c/111789] New: Segmentation fault with '-O3 -fno-inline -fno-toplevel-reorder'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111789 Bug ID: 111789 Summary: Segmentation fault with '-O3 -fno-inline -fno-toplevel-reorder' Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: 19373742 at buaa dot edu.cn Target Milestone: --- Created attachment 56099 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56099&action=edit The preprocessed file *** OS and Platform: Ubuntu 20.04.4 LTS *** gcc version: $ gcc -v Using built-in specs. COLLECT_GCC=/home/ctc/gcc-releases/gcc-14/bin/gcc COLLECT_LTO_WRAPPER=/home/ctc/gcc-releases/gcc-14/libexec/gcc/x86_64-pc-linux-gnu/14.0.0/lto-wrapper Target: x86_64-pc-linux-gnu Configured with: ./configure --prefix=/home/cuisk/ctc/gcc-releases/gcc-14 --disable-multilib --enable-language=c,c++ Thread model: posix Supported LTO compression algorithms: zlib gcc version 14.0.0 20231008 (experimental) (GCC) *** Command Lines: $ gcc -O3 -fno-inline -fno-toplevel-reorder -I /home/csmith/include/csmith-2.3.0/ a.c -o cra 2>cra.txt $ ./cra Segmentation fault (core dumped) $ gcc -I /home/csmith/include/csmith-2.3.0/ -fsanitize=undefined a.c -o ncra $ ./ncra checksum = 91C2E0C4
[Bug c++/111788] g++ DWARF for void foo(...) missing unspecified parameters DIE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111788 Andrew Pinski changed: What|Removed |Added Last reconfirmed||2023-10-12 Ever confirmed|0 |1 Known to fail||4.1.2 Status|UNCONFIRMED |NEW --- Comment #1 from Andrew Pinski --- Confirmed, does not look like a regression either.
[Bug c++/111788] New: g++ DWARF for void foo(...) missing unspecified parameters DIE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111788 Bug ID: 111788 Summary: g++ DWARF for void foo(...) missing unspecified parameters DIE Product: gcc Version: 13.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: gprocida at google dot com Target Milestone: --- I'm putting this in component c++ rather than debug, because the debug information is correct for C compilation. FWIW, clang++-15 does emit DW_TAG_unspecified_parameters. $ cat v.c int foo1(int x, int y, ...) { return x + y; } int foo2(int x, ...) { return x; } int foo3(...) { return 0; } $ gcc -Wall -Wextra -g -c v.c v.o # DWARF for foo3 includes DW_TAG_unspecified_parameters $ g++ -Wall -Wextra -g -c v.c v.o # DWARF for foo3 does NOT include DW_TAG_unspecified_parameters $ g++ -v Using built-in specs. COLLECT_GCC=g++ COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-linux-gnu/13/lto-wrapper OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa OFFLOAD_TARGET_DEFAULT=1 Target: x86_64-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Debian 13.2.0-4' --with-bugurl=file:///usr/share/doc/gcc-13/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-13 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/libexec --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic 
--enable-offload-targets=nvptx-none=/build/gcc-13-oyarai/gcc-13-13.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-oyarai/gcc-13-13.2.0/debian/tmp-gcn/usr --enable-offload-defaulted --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-serialization=28 Thread model: posix Supported LTO compression algorithms: zlib zstd gcc version 13.2.0 (Debian 13.2.0-4)
[Bug fortran/111783] 'exit' intrinsic should be marked as noreturn
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111783 Andrew Pinski changed: What|Removed |Added Summary|'exit' intrinsic should be |'exit' intrinsic should be |marked as |marked as noreturn Last reconfirmed||2023-10-12 Status|UNCONFIRMED |NEW Ever confirmed|0 |1 --- Comment #1 from Andrew Pinski --- Confirmed.
[Bug tree-optimization/80917] missed bit information propagation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80917 Andrew Pinski changed: What|Removed |Added Known to work||14.0 Status|NEW |RESOLVED Known to fail||10.5.0 Target Milestone|--- |14.0 Resolution|--- |FIXED --- Comment #3 from Andrew Pinski --- Fixed on the trunk, most likely via r14-2377-g0c888665dfbd517525 (and the follow-ups).
[Bug fortran/39627] [meta-bug] Fortran 2008 support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=39627 --- Comment #3 from Paul Thomas --- Created attachment 56098 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56098&action=edit Evidence for replies in last attachment As promised in the previous entry in this PR. Paul
[Bug fortran/39627] [meta-bug] Fortran 2008 support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=39627 Paul Thomas changed: What|Removed |Added CC||pault at gcc dot gnu.org --- Comment #2 from Paul Thomas --- Created attachment 56097 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56097&action=edit Ian Chivers and Jane Sleightholme F2008 compliance table version 4 With Harald Anlauf's help, this version of the compliance table has been completely filled out. Please provide any comments or corrections as further comments to this PR. I intend to return it to Ian Chivers early in November. Previously, a number of lines were not filled out. The "evidence" for the responses is provided in the next comment to this PR. Paul
[Bug target/111755] The built-in memset function in GCC inadvertently generates code like "vst1.8 {d8-d9}, [sp:64]", which assumes an 8-byte alignment on the stack pointer $sp, leading to alignment vi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111755 Andrew Pinski changed: What|Removed |Added Status|WAITING |RESOLVED Resolution|--- |INVALID --- Comment #5 from Andrew Pinski --- .
[Bug target/111784] [14 Regression] aarch64: ldp_stp_{15, 16, 17, 18}.c test failures
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111784 --- Comment #1 from Alex Coplan --- More context/details about the issue in 8/10 of the original patch series (that the above revision comes from): https://gcc.gnu.org/pipermail/gcc-patches/2023-September/630234.html
[Bug bootstrap/111787] [14 regression] r14-4592-g0d00385eaf72cc breaks build
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111787 --- Comment #3 from CVS Commits --- The master branch has been updated by Jakub Jelinek : https://gcc.gnu.org/g:53a94071fa9e90e268a94adbdc903bd868ddeec1 commit r14-4596-g53a94071fa9e90e268a94adbdc903bd868ddeec1 Author: Jakub Jelinek Date: Thu Oct 12 17:20:36 2023 +0200 wide-int: Fix build with gcc < 12 or clang++ [PR111787] While my wide_int patch bootstrapped/regtested fine when I used GCC 12 as system gcc, apparently it doesn't with GCC 11 and older or clang++. For GCC before PR96555 C++ DR1315 implementation the compiler complains about template argument involving template parameters, for clang++ the same + complains about missing needs_write_val_arg static data member in some wi::int_traits specializations. 2023-10-12 Jakub Jelinek PR bootstrap/111787 * tree.h (wi::int_traits ::needs_write_val_arg): New static data member. (int_traits >::needs_write_val_arg): Likewise. (wi::ints_for): Provide separate partial specializations for generic_wide_int > and INL_CONST_PRECISION or that and CONST_PRECISION, rather than using int_traits >::precision_type as the second template argument. * rtl.h (wi::int_traits ::needs_write_val_arg): New static data member. * double-int.h (wi::int_traits ::needs_write_val_arg): Likewise.
[Bug bootstrap/111787] [14 regression] r14-4592-g0d00385eaf72cc breaks build
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111787 --- Comment #2 from seurer at gcc dot gnu.org --- This system is on RHEL 8 which has gcc 8.5 as the distro compiler. And yes, the patch worked.
[Bug middle-end/111777] [14 regression] build breaks after r14-4558-g400efdddf3d849
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111777 Jeffrey A. Law changed: What|Removed |Added Resolution|--- |FIXED Status|NEW |RESOLVED --- Comment #11 from Jeffrey A. Law --- Fixed by Mary's patch on the trunk.
[Bug middle-end/111777] [14 regression] build breaks after r14-4558-g400efdddf3d849
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111777 --- Comment #10 from CVS Commits --- The master branch has been updated by Jeff Law : https://gcc.gnu.org/g:e99ad401f84ca6cd2717a58a116e44274d55da70 commit r14-4595-ge99ad401f84ca6cd2717a58a116e44274d55da70 Author: Mary Bennett Date: Thu Oct 12 09:17:24 2023 -0600 RISCV: Bugfix for incorrect documentation heading nesting PR middle-end/111777 gcc/ChangeLog: * doc/extend.texi: Change subsubsection to subsection for CORE-V built-ins.
[Bug bootstrap/111787] [14 regression] r14-4592-g0d00385eaf72cc breaks build
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111787 --- Comment #1 from Jakub Jelinek --- Does 2023-10-12 Jakub Jelinek PR bootstrap/111787 * tree.h (wi::int_traits ::needs_write_val_arg): New static data member. (int_traits >::needs_write_val_arg): Likewise. (wi::ints_for): Provide separate partial specializations for generic_wide_int > and INL_CONST_PRECISION or that and CONST_PRECISION, rather than using int_traits >::precision_type as the second template argument. * rtl.h (wi::int_traits ::needs_write_val_arg): New static data member. * double-int.h (wi::int_traits ::needs_write_val_arg): Likewise. --- gcc/tree.h.jj 2023-10-12 16:01:04.0 +0200 +++ gcc/tree.h 2023-10-12 16:52:51.977954615 +0200 @@ -6237,6 +6237,7 @@ namespace wi static const enum precision_type precision_type = VAR_PRECISION; static const bool host_dependent_precision = false; static const bool is_sign_extended = false; +static const bool needs_write_val_arg = false; }; template @@ -6262,6 +6263,7 @@ namespace wi = N == ADDR_MAX_PRECISION ? 
INL_CONST_PRECISION : CONST_PRECISION; static const bool host_dependent_precision = false; static const bool is_sign_extended = true; +static const bool needs_write_val_arg = false; static const unsigned int precision = N; }; @@ -6293,8 +6295,14 @@ namespace wi tree_to_poly_wide_ref to_poly_wide (const_tree); template - struct ints_for >, - int_traits >::precision_type> + struct ints_for >, INL_CONST_PRECISION> + { +typedef generic_wide_int > extended; +static extended zero (const extended &); + }; + + template + struct ints_for >, CONST_PRECISION> { typedef generic_wide_int > extended; static extended zero (const extended &); @@ -6532,8 +6540,15 @@ wi::to_poly_wide (const_tree t) template inline generic_wide_int > wi::ints_for >, - wi::int_traits >::precision_type ->::zero (const extended &x) + wi::INL_CONST_PRECISION>::zero (const extended &x) +{ + return build_zero_cst (TREE_TYPE (x.get_tree ())); +} + +template +inline generic_wide_int > +wi::ints_for >, + wi::CONST_PRECISION>::zero (const extended &x) { return build_zero_cst (TREE_TYPE (x.get_tree ())); } --- gcc/rtl.h.jj2023-09-29 22:04:44.463012421 +0200 +++ gcc/rtl.h 2023-10-12 16:54:59.915240074 +0200 @@ -2270,6 +2270,7 @@ namespace wi /* This ought to be true, except for the special case that BImode is canonicalized to STORE_FLAG_VALUE, which might be 1. 
*/ static const bool is_sign_extended = false; +static const bool needs_write_val_arg = false; static unsigned int get_precision (const rtx_mode_t &); static wi::storage_ref decompose (HOST_WIDE_INT *, unsigned int, const rtx_mode_t &); --- gcc/double-int.h.jj 2023-10-12 16:01:04.260164202 +0200 +++ gcc/double-int.h2023-10-12 16:53:41.401292272 +0200 @@ -442,6 +442,7 @@ namespace wi { static const enum precision_type precision_type = INL_CONST_PRECISION; static const bool host_dependent_precision = true; +static const bool needs_write_val_arg = false; static const unsigned int precision = HOST_BITS_PER_DOUBLE_INT; static unsigned int get_precision (const double_int &); static wi::storage_ref decompose (HOST_WIDE_INT *, unsigned int, fix this? It works for me in stage3 gcc subdir, but that worked fine without it too (I've been using g++ 12 as stage1 compiler).
[Bug bootstrap/111787] New: [14 regression] xxx breaks build
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111787 Bug ID: 111787 Summary: [14 regression] xxx breaks build Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: bootstrap Assignee: unassigned at gcc dot gnu.org Reporter: seurer at gcc dot gnu.org Target Milestone: --- g:0d00385eaf72ccacff17935b0d214a26773e095f, r14-4592-g0d00385eaf72cc g++ -c -g -O2 -DIN_GCC-fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Wconditionally-supported -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common -DHAVE_CONFIG_H -DGENERATOR_FILE -I. -Ibuild -I/home/seurer/gcc/git/gcc-trunk/gcc -I/home/seurer/gcc/git/gcc-trunk/gcc/build -I/home/seurer/gcc/git/gcc-trunk/gcc/../include -I/home/seurer/gcc/git/gcc-trunk/gcc/../libcpp/include \ -o build/gencondmd.o build/gencondmd.cc In file included from /home/seurer/gcc/git/gcc-trunk/gcc/recog.h:24, from build/gencondmd.cc:40: /home/seurer/gcc/git/gcc-trunk/gcc/tree.h:6296:10: error: template argument 'wi::int_traits >::precision_type' involves template parameter(s) struct ints_for >, ^~~~ int_traits >::precision_type> make[2]: *** [Makefile:2951: build/gencondmd.o] Error 1 commit 0d00385eaf72ccacff17935b0d214a26773e095f (HEAD) Author: Jakub Jelinek Date: Thu Oct 12 16:01:12 2023 +0200 wide-int: Allow up to 16320 bits wide_int and change widest_int precision to 32640 bits [PR102989]
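The error above comes from a partial specialization whose non-type template argument depends on the specialization's own template parameter, which compilers predating the DR1315 implementation reject. A reduced sketch of the workaround, one partial specialization per enumerator, might look like this. All names are simplified stand-ins rather than the real tree.h declarations, and the `N <= 64` threshold is an assumption made only for illustration.

```cpp
#include <cassert>

enum precision_type { VAR_PRECISION, INL_CONST_PRECISION, CONST_PRECISION };

// Stand-in for a width-parameterized integer wrapper.
template <int N> struct extended {};

template <typename T> struct int_traits;

// Hypothetical rule: narrow widths are "inline constant" precision.
template <int N> struct int_traits<extended<N> > {
  static const precision_type kind =
      (N <= 64) ? INL_CONST_PRECISION : CONST_PRECISION;
};

// Primary template; the second argument defaults to the trait value.
template <typename T, precision_type P = int_traits<T>::kind>
struct ints_for;

// The form rejected by older compilers would be:
//   template <int N>
//   struct ints_for<extended<N>, int_traits<extended<N> >::kind> {...};
// Instead, spell out one partial specialization per possible value.
template <int N> struct ints_for<extended<N>, INL_CONST_PRECISION> {
  static int which() { return 1; }
};

template <int N> struct ints_for<extended<N>, CONST_PRECISION> {
  static int which() { return 2; }
};
```

The cost of this form is that the two specializations repeat their bodies, which is also what the posted patch accepts for tree.h.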
[Bug middle-end/111777] [14 regression] build breaks after r14-4558-g400efdddf3d849
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111777 --- Comment #9 from seurer at gcc dot gnu.org --- That patch works fine on a system where the build was failing.
[Bug fortran/52994] [OOP] [F08] internal compiler error: in gfc_trans_assignment_1, at fortran/trans-expr.c:6881
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52994 Paul Thomas changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED CC||pault at gcc dot gnu.org --- Comment #15 from Paul Thomas --- From as far back as GNU Fortran (GCC) 11.2.1 20210728, the pointer function assignment gives the correct result, arr(-1) = -666.0. Marking it as fixed. Paul
[Bug fortran/39627] [meta-bug] Fortran 2008 support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=39627 Bug 39627 depends on bug 52994, which changed state. Bug 52994 Summary: [OOP] [F08] internal compiler error: in gfc_trans_assignment_1, at fortran/trans-expr.c:6881 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52994 What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED
[Bug c/111786] No tail recursion for simple program
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111786 Xi Ruoyao changed: What|Removed |Added Resolution|INVALID |DUPLICATE --- Comment #2 from Xi Ruoyao --- *** This bug has been marked as a duplicate of bug 10837 ***
[Bug rtl-optimization/10837] noreturn attribute causes no sibling calling optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=10837 Xi Ruoyao changed: What|Removed |Added CC||lukas.graetz@tu-darmstadt.d ||e --- Comment #14 from Xi Ruoyao --- *** Bug 111786 has been marked as a duplicate of this bug. ***
[Bug c/102989] Implement C2x's n2763 (_BitInt)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102989 --- Comment #112 from CVS Commits --- The master branch has been updated by Jakub Jelinek : https://gcc.gnu.org/g:0d00385eaf72ccacff17935b0d214a26773e095f commit r14-4592-g0d00385eaf72ccacff17935b0d214a26773e095f Author: Jakub Jelinek Date: Thu Oct 12 16:01:12 2023 +0200 wide-int: Allow up to 16320 bits wide_int and change widest_int precision to 32640 bits [PR102989] As mentioned in the _BitInt support thread, _BitInt(N) is currently limited by the wide_int/widest_int maximum precision limitation, which is depending on target 191, 319, 575 or 703 bits (one less than WIDE_INT_MAX_PRECISION). That is fairly low limit for _BitInt, especially on the targets with the 191 bit limitation. The following patch bumps that limit to 16319 bits on all arches (which support _BitInt at all), which is the limit imposed by INTEGER_CST representation (unsigned char members holding number of HOST_WIDE_INT limbs). In order to achieve that, wide_int is changed from a trivially copyable type which contained just an inline array of WIDE_INT_MAX_ELTS (3, 5, 9 or 11 limbs depending on target) limbs into a non-trivially copy constructible, copy assignable and destructible type which for the usual small cases (up to WIDE_INT_MAX_INL_ELTS which is the former WIDE_INT_MAX_ELTS) still uses an inline array of limbs, but for larger precisions uses heap allocated limb array. This makes wide_int unusable in GC structures, so for dwarf2out which was the only place which needed it there is a new rwide_int type (restricted wide_int) which supports only up to RWIDE_INT_MAX_ELTS limbs inline and is trivially copyable (dwarf2out should never deal with large _BitInt constants, those should have been lowered earlier). 
Similarly, widest_int has been changed from a trivially copyable type which contained also an inline array of WIDE_INT_MAX_ELTS limbs (but unlike wide_int didn't contain precision and assumed that to be WIDE_INT_MAX_PRECISION) into a non-trivially copy constructible, copy assignable and destructible type which has always WIDEST_INT_MAX_PRECISION precision (32640 bits currently, twice as much as INTEGER_CST limitation allows) and unlike wide_int decides depending on get_len () value whether it uses an inline array (again, up to WIDE_INT_MAX_INL_ELTS) or heap allocated one. In wide-int.h this means we need to estimate an upper bound on how many limbs will wide-int.cc (usually, sometimes wide-int.h) need to write, heap allocate if needed based on that estimation and upon set_len which is done at the end if we guessed over WIDE_INT_MAX_INL_ELTS and allocated dynamically, while we actually need less than that copy/deallocate. The unexact guesses are needed because the exact computation of the length in wide-int.cc is sometimes quite complex and especially canonicalize at the end can decrease it. widest_int is again because of this not usable in GC structures, so cfgloop.h has been changed to use fixed_wide_int_storage and punt if we'd have larger _BitInt based iterators, programs having more than 128-bit iterators will be hopefully rare and I think it is fine to treat loops with more than 2^127 iterations as effectively possibly infinite, omp-general.cc is changed to use fixed_wide_int_storage <1024>, as it better should support scores with the same precision on all arches. Code which used WIDE_INT_PRINT_BUFFER_SIZE sized buffers for printing wide_int/widest_int into buffer had to be changed to use XALLOCAVEC for larger lengths. 
On x86_64, the patch in --enable-checking=yes,rtl,extra configured bootstrapped cc1plus enlarges the .text section by 1.01% - from 0x25725a5 to 0x25e and similarly at least when compiling insn-recog.cc with the usual bootstrap option slows compilation down by 1.01%, user 4m22.046s and 4m22.384s on vanilla trunk vs. 4m25.947s and 4m25.581s on patched trunk. I'm afraid some code size growth and compile time slowdown is unavoidable in this case, we use wide_int and widest_int everywhere, and while the rare cases are marked with UNLIKELY macros, it still means extra checks for it. The patch also regresses +FAIL: gm2/pim/fail/largeconst.mod, -O +FAIL: gm2/pim/fail/largeconst.mod, -O -g +FAIL: gm2/pim/fail/largeconst.mod, -O3 -fomit-frame-pointer +FAIL: gm2/pim/fail/largeconst.mod, -O3 -fomit-frame-pointer -finline-functions +FAIL: gm2/pim/fail/largeconst.mod, -Os +FAIL: gm2/pim/fail/largeconst.mod, -g +FAIL: gm2/pim/fail/largeconst2.mod, -O +FAIL: gm2/pim/fail/largeconst2.mod, -O -g +FAIL: gm2/pim/fail/largeconst2.mod, -O3 -fomit-frame-pointer +FAIL: gm2/pim/fail/largeconst2.mod, -O3 -fomit-frame-pointer -finline-functions +FAIL: gm2/pim/fail/largeconst2.mod, -Os +
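The storage scheme described in the commit message (inline limbs for the usual small cases, a heap-allocated limb array for larger precisions) is a form of small-buffer optimization. A minimal sketch of the idea, with hypothetical names and an assumed inline capacity of 3 limbs, could look like this. It is not the wide_int implementation; copying is left out entirely, which is why this toy class is move-only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Assumed inline capacity; the real value depends on the target.
constexpr std::size_t kInlineLimbs = 3;

// Toy limb container: small lengths live inside the object, larger ones
// get a zero-initialized heap array.
class LimbStorage {
 public:
  explicit LimbStorage(std::size_t len) : len_(len) {
    if (len_ > kInlineLimbs)
      heap_.reset(new std::uint64_t[len_]());
  }

  bool on_heap() const { return heap_ != nullptr; }
  std::size_t size() const { return len_; }

  // Callers see one contiguous limb array regardless of where it lives.
  std::uint64_t *data() { return heap_ ? heap_.get() : inline_; }

 private:
  std::size_t len_;
  std::uint64_t inline_[kInlineLimbs] = {};
  std::unique_ptr<std::uint64_t[]> heap_;
};
```

An object like this is no longer trivially copyable, which matches the reason the commit message gives for keeping such types out of GC structures.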
[Bug c/111786] No tail recursion for simple program
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111786 Jakub Jelinek changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |INVALID CC||jakub at gcc dot gnu.org --- Comment #1 from Jakub Jelinek --- We completely intentionally don't emit tail calls to noreturn functions, so that e.g. in case of abort one doesn't need to virtually reconstruct backtrace. In your case, the interprocedural optimizations determine expr_main_original is noreturn and so calls it normally (and optimizes away anything after that call).
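The situation in comment #1 can be sketched with an explicitly annotated callee. The names below are hypothetical, and the explicit `[[noreturn]]` attribute stands in for what the interprocedural optimizations deduce on their own in the reported test case. The call in `wrapper` is in tail position, but since the callee is noreturn it is the kind of call that, per the comment, is intentionally emitted as a normal call so that a backtrace still shows the caller.

```cpp
#include <cassert>
#include <stdexcept>

// Never returns normally; leaves by throwing so the sketch stays testable.
[[noreturn]] void fail_hard(int code) {
  throw std::runtime_error(code == 0 ? "zero" : "nonzero");
}

// The call is the last thing wrapper does, yet the callee is noreturn.
void wrapper(int code) {
  fail_hard(code);
}

// Helper for observing that control never comes back from wrapper.
bool wrapper_returns_normally(int code) {
  try {
    wrapper(code);
  } catch (const std::runtime_error &) {
    return false;
  }
  return true;
}
```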
[Bug c/111786] New: No tail recursion for simple program
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111786 Bug ID: 111786 Summary: No tail recursion for simple program Product: gcc Version: 13.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: lukas.gra...@tu-darmstadt.de Target Milestone: --- Created attachment 56096 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56096&action=edit C code of expr_main Follow up with nearly the same source file as 111643, only without the flatten attribute. Sorry for taking so long for that. I learned the optimized compiler should output a tail recursion. But this seems not to be the case: With "sub" and "call", 16 bytes on the stack are used. The attached file contains: --- int expr_main(int argc, char **argv) { return expr_main_original(argc, argv); } --- And after cc1 -O3 on amd64, the output contains: -- gcc 13.2.0 -- expr_main: subq $8, %rsp call expr_main_original --- -- gcc 9.4.0 shipped with ubuntu 20.04 --- expr_main: endbr64 pushq %rax popq %rax pushq %rax call expr_main_original --- -- Expected -- expr_main: jmp expr_main_original --- If I compile the above snippet only, I get the expected result. But not when compiling the whole C file which also includes the body of expr_main_original(). I also suspect there are some other factors I don't know, since many other functions I tested yield the expected result. In my case, the overhead seems to be negligible. However, I think it should be possible to construct similar recursive programs where the overhead compared to tail recursion is not negligible.
[Bug target/111768] X86: -march=native does not support alder lake big.little cache infor correctly
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111768 --- Comment #13 from David C. Manuelda --- I'd suggest for now to pick a common value in order to prevent the compilation failure (in stage comparison) while a proper fix/workaround is picked.
[Bug c++/111785] New: [modules] ICE when compiling fmt lib as module
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111785 Bug ID: 111785 Summary: [modules] ICE when compiling fmt lib as module Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: mends-sputter.0z at icloud dot com Target Milestone: --- Created attachment 56095 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56095&action=edit preprocessed output generated by -freport-bug GCC 14.0 from recent snapshot (20231008) on aarch64 Using built-in specs. COLLECT_GCC=g++ COLLECT_LTO_WRAPPER=/opt/gcc14/libexec/gcc/aarch64-unknown-linux-gnu/14.0.0/lto-wrapper Target: aarch64-unknown-linux-gnu Configured with: ../configure --prefix=/opt/gcc14 --enable-languages=c,c++ --disable-multilib Thread model: posix Supported LTO compression algorithms: zlib gcc version 14.0.0 20231008 (experimental) (GCC) When attempting to build fmt from https://github.com/fmtlib/fmt, with the following command: g++ -std=c++20 -fmodules-ts -I../include fmt.cc The compiler ICEs with the following error: fmt.cc:73:8: internal compiler error: in core_vals, at cp/module.cc:6262 73 | export module fmt; |^ 0x9a9b67 trees_out::core_vals(tree_node*) ../../gcc/cp/module.cc:6262 0x9add5f trees_out::tree_node_vals(tree_node*) ../../gcc/cp/module.cc:7218 0x9add5f trees_out::tree_value(tree_node*) ../../gcc/cp/module.cc:9083 0x9a7c73 trees_out::tree_node(tree_node*) ../../gcc/cp/module.cc:9281 0x9a9537 trees_out::core_vals(tree_node*) ../../gcc/cp/module.cc:6171 0x9a5fb3 trees_out::tree_node_vals(tree_node*) ../../gcc/cp/module.cc:7218 0x9a5fb3 trees_out::decl_value(tree_node*, depset*) ../../gcc/cp/module.cc:7797 0x9b0823 depset::hash::find_dependencies(module_state*) ../../gcc/cp/module.cc:13328 0x9b16c7 module_state::write_begin(elf_out*, cpp_reader*, module_state_config&, unsigned int&) ../../gcc/cp/module.cc:17895 0x9b2a0f finish_module_processing(cpp_reader*) ../../gcc/cp/module.cc:20241 0x91df57 c_parse_final_cleanups() 
../../gcc/cp/decl2.cc:5255 0xbe773f c_common_parse_file() ../../gcc/c-family/c-opts.cc:1296 I am attaching the file generated by -freport-bug, compressed with gzip. Note that this is similar to, and involves the same file as, the failure reported in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108080#c7 (a comment on a separate report). However, with GCC 14.0 snapshots this fails at every optimization level, although in each case the crash dump is slightly different and fails on a different line.
[Bug target/111784] New: [14 Regression] aarch64: ldp_stp_{15,16,17,18}.c test failures
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111784 Bug ID: 111784 Summary: [14 Regression] aarch64: ldp_stp_{15,16,17,18}.c test failures Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: acoplan at gcc dot gnu.org Target Milestone: --- Since r14-4579-g0bdb9bb5607edd7df1ee74ddfcadb87324ca00c2 the following aarch64 tests are failing: FAIL: gcc.target/aarch64/ldp_stp_15.c check-function-bodies dup_8_int32_t FAIL: gcc.target/aarch64/ldp_stp_15.c check-function-bodies cons2_16_int32_t FAIL: gcc.target/aarch64/ldp_stp_15.c check-function-bodies cons4_8_int32_t FAIL: gcc.target/aarch64/ldp_stp_16.c check-function-bodies cons2_4_float FAIL: gcc.target/aarch64/ldp_stp_17.c check-function-bodies dup_16_int64_t FAIL: gcc.target/aarch64/ldp_stp_17.c check-function-bodies cons2_16_int64_t FAIL: gcc.target/aarch64/ldp_stp_17.c check-function-bodies cons4_16_int64_t FAIL: gcc.target/aarch64/ldp_stp_18.c check-function-bodies dup_8_double FAIL: gcc.target/aarch64/ldp_stp_18.c check-function-bodies dup_16_double FAIL: gcc.target/aarch64/ldp_stp_18.c check-function-bodies cons2_4_double FAIL: gcc.target/aarch64/ldp_stp_18.c check-function-bodies cons2_8_double FAIL: gcc.target/aarch64/ldp_stp_18.c check-function-bodies cons2_8_double FAIL: gcc.target/aarch64/ldp_stp_18.c check-function-bodies cons2_8_double E.g. for dup_8_int32_t, we now generate: dup_8_int32_t: .LFB9: .cfi_startproc stp w1, w1, [x0] stp w1, w1, [x0, 8] stp w1, w1, [x0, 16] stp w1, w1, [x0, 24] ret instead of a dup with a q-register stp. Most likely we need to update the costs on the aarch64 side.
[Bug fortran/111783] New: 'exit' intrinsic should be marked as
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111783 Bug ID: 111783 Summary: 'exit' intrinsic should be marked as Product: gcc Version: 14.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: fortran Assignee: unassigned at gcc dot gnu.org Reporter: burnus at gcc dot gnu.org Target Milestone: --- Compiling testsuite/gfortran.dg/team_number_1.f90 with -O3 produces the following optimized dump: _gfortran_exit_i4 (0); _gfortran_exit_i4 (0); _gfortran_stop_numeric (2, 0); } The last three statements could be removed as the 'EXIT' intrinsic subroutine is known not to return. Thus, we should set ATTR_NORETURN_NOTHROW_LIST (cf. fortran/f95-lang.cc). There are probably more, at least the ABORT intrinsic subroutine and the functions associated with STOP / ERROR STOP like _gfortran_stop_numeric.
[Bug target/111600] [14 Regression] RISC-V bootstrap time regression
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111600 --- Comment #23 from Robin Dapp --- For lack of a better idea (and given time constraints, as looking for compiler bottlenecks is slow and tedious) I went with Kito's suggestion of splitting insn-emit.cc. This reduces this part of the compilation with eight threads to 40s (from 10 min before). I split the patterns evenly across the 10 files, but it just so happens that the last file receives all the problematic maybe_code_for functions, so that file accounts for most of the 40s. The rest usually take 5-20s. Doing bootstrapping tests now, going to post an initial patch once it's "presentable".
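An even split by pattern count, as described in comment #23, might be computed as in the sketch below. `split_evenly` is a hypothetical helper; how the actual patch assigns patterns to output files is not shown in the comment.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Return how many patterns each of n_files output files receives when
// n_patterns are distributed as evenly as possible. Any remainder goes
// to the first files, one extra pattern each.
std::vector<std::size_t> split_evenly(std::size_t n_patterns,
                                      std::size_t n_files) {
  std::vector<std::size_t> counts;
  if (n_files == 0)
    return counts;
  const std::size_t base = n_patterns / n_files;
  const std::size_t extra = n_patterns % n_files;
  for (std::size_t i = 0; i < n_files; ++i)
    counts.push_back(base + (i < extra ? 1 : 0));
  return counts;
}
```

Balancing by count does not balance compile time: as the comment notes, a file that happens to collect the expensive functions still dominates the total.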
[Bug fortran/90608] Inline non-scalar minloc/maxloc calls
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90608 Mikael Morin changed: What|Removed |Added Attachment #56091|0 |1 is obsolete|| --- Comment #11 from Mikael Morin --- Created attachment 56094 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56094&action=edit Improved patch This improved patch (still single argument only) passes the fortran regression testsuite. (In reply to Mikael Morin from comment #8) > It regresses on minloc_1.f90 at least, but I haven't been able to pinpoint the > problem in the original tree dump so far. > The problem was an initialization of the result to the first element of the array that the patch removed, which seemed useless to me but made a difference in the questionable case where the array argument is filled with NaNs. > The problem could be with the initialization of loop iteration variables. > (...) > Unfortunately, this conditional initialization seems to > confuse the optimizers a lot. > On closer look, the conditional initialization doesn't seem to be that confusing (at least in the problematic case), as it's removed early (ccp1) in the pipeline. The loop iteration variables remain initialized with phis, but that's because of the loops.
[Bug target/111768] X86: -march=native does not support alder lake big.little cache infor correctly
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111768 Richard Biener changed: What|Removed |Added Status|UNCONFIRMED |NEW Ever confirmed|0 |1 Last reconfirmed||2023-10-12 --- Comment #12 from Richard Biener --- Looking at the 'hybrid' flag in cpuid sounds like the most reasonable thing to do, possibly simply skipping auto-detection for the problematical parts (L1 and L2 cache sizes) as Alex suggests.
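Checking the hybrid flag as suggested in comment #12 could look roughly like the sketch below. It relies on CPUID leaf 07H, sub-leaf 0, where EDX bit 15 reports a hybrid part, and on `__get_cpuid_count` from the compiler-provided `<cpuid.h>`. The helper names are hypothetical, and this is not the driver's actual -march=native detection code.

```cpp
#include <cassert>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

// CPUID.(EAX=07H, ECX=0):EDX[15] is the "hybrid" feature flag.
constexpr unsigned kHybridBit = 1u << 15;

// Pure helper so the bit test can be checked on any host.
bool edx_says_hybrid(unsigned edx) {
  return (edx & kHybridBit) != 0;
}

// Query the running CPU; reports false on non-x86 hosts or when
// leaf 7 is not available.
bool cpu_is_hybrid() {
#if defined(__x86_64__) || defined(__i386__)
  unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
  if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
    return false;
  return edx_says_hybrid(edx);
#else
  return false;
#endif
}
```

A detection routine could then skip the per-core cache size queries whenever `cpu_is_hybrid()` is true, which is one reading of the suggestion above.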
[Bug target/111768] X86: -march=native does not support alder lake big.little cache infor correctly
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111768 --- Comment #11 from Alexander Monakov --- (In reply to Hongtao.liu from comment #10) > > indeed (but I believe it did happen with Alder Lake already, by accident, > > with AVX512 on P-cores but not on E-cores). > > AVX512 is physically fused off for Alderlake P-core, P-core and E-core share > the same ISA level(AVX2). I think Arsen means initial Alder Lake batches, where AVX-512 wasn't yet fused off (but BIOS support was unofficial/experimental anyway).
[Bug target/111768] X86: -march=native does not support alder lake big.little cache infor correctly
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111768 --- Comment #10 from Hongtao.liu --- > indeed (but I believe it did happen with Alder Lake already, by accident, > with AVX512 on P-cores but not on E-cores). AVX512 is physically fused off for Alderlake P-core, P-core and E-core share the same ISA level(AVX2).
[Bug middle-end/111782] [11/12/13/14 Regression] Extra move in complex double multiplication
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111782 Richard Biener changed: What|Removed |Added Target Milestone|--- |11.5 Keywords||needs-bisection
[Bug tree-optimization/111779] Fail to vectorize the struct include struct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111779 Richard Biener changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org Status|NEW |ASSIGNED --- Comment #3 from Richard Biener --- I posted a patch for SRA.
[Bug fortran/111781] Fortran compiler complains about variable bound in array dummy argument
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111781 Mikael Morin changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Keywords||patch CC||mikael at gcc dot gnu.org Last reconfirmed||2023-10-12 --- Comment #1 from Mikael Morin ---
Confirmed.

This should fix it:

diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
index 1042b8c18e8..e2e0fc8eba3 100644
--- a/gcc/fortran/resolve.cc
+++ b/gcc/fortran/resolve.cc
@@ -285,6 +285,7 @@ gfc_resolve_formal_arglist (gfc_symbol *proc)
       sym->attr.always_explicit = 1;
     }
 
+  bool saved_formal_arg_flag = formal_arg_flag;
   formal_arg_flag = true;
 
   for (f = proc->formal; f; f = f->next)
@@ -533,7 +534,7 @@ gfc_resolve_formal_arglist (gfc_symbol *proc)
 	    }
 	}
     }
-  formal_arg_flag = false;
+  formal_arg_flag = saved_formal_arg_flag;
 }
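The patch above replaces an unconditional reset of a global flag with a save/restore pair. A minimal sketch of why that matters under recursion follows; `struct proc`, `formal_flag`, and both function names are hypothetical stand-ins and not gfortran data structures. A plausible reading of the patch is that resolving a procedure dummy argument re-enters the resolver and clears the flag for the remaining arguments of the outer procedure.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a procedure symbol.  */
struct proc
{
  struct proc *nested;     /* a formal argument that is itself a procedure */
  bool flag_after_nested;  /* flag value seen after resolving 'nested' */
};

bool formal_flag = false;  /* plays the role of formal_arg_flag */

/* Old behaviour: the flag is cleared unconditionally on exit, so the
   outer call sees 'false' while it is still walking its own formals.  */
void
resolve_formals_reset (struct proc *p)
{
  formal_flag = true;
  if (p->nested != NULL)
    {
      resolve_formals_reset (p->nested);
      p->flag_after_nested = formal_flag;
    }
  formal_flag = false;
}

/* Patched behaviour: the previous value is saved and restored, so a
   nested call leaves the outer state untouched.  */
void
resolve_formals_saved (struct proc *p)
{
  bool saved = formal_flag;
  formal_flag = true;
  if (p->nested != NULL)
    {
      resolve_formals_saved (p->nested);
      p->flag_after_nested = formal_flag;
    }
  formal_flag = saved;
}
```

With one level of nesting, the first variant records `false` in `flag_after_nested` and the second records `true`; both leave the flag `false` once the outermost call returns.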
[Bug target/111768] X86: -march=native does not support alder lake big.little cache infor correctly
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111768 --- Comment #9 from Alexander Monakov ---
(In reply to Arsen Arsenović from comment #8)
> indeed (but I believe it did happen with Alder Lake already, by accident,
> with AVX512 on P-cores but not on E-cores).

AFAIK on those Alder Lake CPUs you could only get AVX-512 by disabling E-cores in the BIOS, so you couldn't boot in a configuration where both the E-cores and AVX-512 on the P-cores are available.
[Bug target/111768] X86: -march=native does not support alder lake big.little cache infor correctly
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111768 --- Comment #8 from Arsen Arsenović ---
(In reply to Alexander Monakov from comment #7)
> I'm afraid hybrid CPUs with varying ISA feature sets are not practical for
> the current ecosystem: you wouldn't be able to reschedule from a higher- to
> lower-capable core. Not to mention scenarios like Mesa on-disk llvmpipe
> shader cache.

indeed (but I believe it did happen with Alder Lake already, by accident, with AVX512 on P-cores but not on E-cores).

> "Always" probing all cores is not a good idea (the compiler would have to
> manually reschedule itself to all cores, of which there could be hundreds).
> Plus, portable API for such probing across available cores does not exist
> afaik.

I'd consider this close enough to 'not possible' ;P

my thinking was: does cpuid provide a way to query cross-CPU (or CPU 'group', I suppose)? if not, we're definitely better off just using a common, smaller cache size for Intel hybrid CPUs (at least for now)

> I think releasing an x86 hybrid CPU with varying capabilities across cores
> would require substantial preparatory work in the kernel and likely in the
> userland as well, so probably best to leave it until the time comes and
> specifics of what can differ are known.
[Bug middle-end/111782] New: [11/12/13/14 Regression] Extra move in complex double multiplication
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111782 Bug ID: 111782 Summary: [11/12/13/14 Regression] Extra move in complex double multiplication Product: gcc Version: 14.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: ktkachov at gcc dot gnu.org Target Milestone: --- Target: aarch64

The testcase:

__complex double
foo (__complex double a, __complex double b)
{
  return a * b;
}

With GCC trunk at -Ofast I see on aarch64:

foo(double _Complex, double _Complex):
        fmov    d31, d1
        fmul    d1, d1, d2
        fmadd   d1, d0, d3, d1
        fmul    d31, d31, d3
        fnmsub  d0, d0, d2, d31
        ret

with GCC 10 the codegen used to be tighter:

foo(double _Complex, double _Complex):
        fmul    d4, d1, d3
        fmul    d5, d1, d2
        fmadd   d1, d0, d3, d5
        fnmsub  d0, d0, d2, d4
        ret

There's an extra fmov emitted on trunk. I noticed this regressed with the GCC 11 series
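For reference, the tighter GCC 10 sequence corresponds to the textbook expansion of a complex product, which can be sketched in plain C as below. `struct cplx` and `cplx_mul` are hypothetical names used in place of `__complex double`, and the register mapping in the comments assumes the usual AArch64 convention of `a` in d0/d1 and `b` in d2/d3. Whether a compiler fuses the multiply and add into one instruction depends on its floating-point contraction settings.

```c
/* Hypothetical plain-struct stand-in for __complex double.  */
struct cplx
{
  double re;
  double im;
};

/* Expansion without NaN/infinity recovery, as permitted under -Ofast:
   re = a.re*b.re - a.im*b.im, im = a.re*b.im + a.im*b.re.  */
struct cplx
cplx_mul (struct cplx a, struct cplx b)
{
  double t4 = a.im * b.im;   /* fmul   d4, d1, d3     */
  double t5 = a.im * b.re;   /* fmul   d5, d1, d2     */
  struct cplx r;
  r.im = a.re * b.im + t5;   /* fmadd  d1, d0, d3, d5 */
  r.re = a.re * b.re - t4;   /* fnmsub d0, d0, d2, d4 */
  return r;
}
```

Four arithmetic instructions are enough because every input is still live in its argument register when it is needed; the trunk sequence copies d1 to d31 first because it overwrites d1 early.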
[Bug middle-end/111777] [14 regression] build breaks after r14-4558-g400efdddf3d849
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111777 --- Comment #8 from mary.bennett at embecosm dot com --- Thanks for pinging me, Jeff
[Bug middle-end/111777] [14 regression] build breaks after r14-4558-g400efdddf3d849
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111777 mary.bennett at embecosm dot com changed: What|Removed |Added CC||mary.bennett at embecosm dot com --- Comment #7 from mary.bennett at embecosm dot com --- Created attachment 56093 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56093&action=edit Bugfix for incorrect documentation heading nesting
[Bug tree-optimization/111779] Fail to vectorize the struct include struct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111779 Richard Biener changed: What|Removed |Added See Also||https://gcc.gnu.org/bugzill ||a/show_bug.cgi?id=108070 --- Comment #2 from Richard Biener --- Ah, did that already, it regressed, I filed PR108070 for it.
[Bug fortran/111781] New: Compiler error on valid code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111781 Bug ID: 111781 Summary: Compiler error on valid code Product: gcc Version: 12.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: fortran Assignee: unassigned at gcc dot gnu.org Reporter: rasmus.vikhamar-sandberg at uit dot no Target Milestone: --- Created attachment 56092 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56092&action=edit Minimal fortran program to trigger error message I have attached a minimal Fortran code example.f90 to trigger the compiler error. I ran "gfortran example.f90 -o example.x" and got error message example.f90:8:31: 8 | real, intent(out) :: A(n) | 1 Error: Variable ‘n’ cannot appear in the expression at (1) If I replace f(g,A) with f(A,g) it compiles. I think the code should be valid Fortran code since explicit-shape arrays that are dummy arguments are allowed to have global variables as bounds.
[Bug target/111755] The built-in memset function in GCC inadvertently generates code like "vst1.8 {d8-d9}, [sp:64]", which assumes an 8-byte alignment on the stack pointer $sp, leading to alignment vi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111755 --- Comment #4 from kuzume --- I apologize, but I will retract this report. I've realized that the IRQ handler call of a certain RTOS I'm using is invoking with $sp as a multiple of 4, not 8. This violates the ARM ABI convention.
[Bug ipa/111773] Inconsistent optimization of replaced operator new()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111773 Sam James changed: What|Removed |Added CC||sjames at gcc dot gnu.org --- Comment #3 from Sam James --- (Assigned to who?)
[Bug tree-optimization/111779] Fail to vectorize the struct include struct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111779 Richard Biener changed: What|Removed |Added Component|c |tree-optimization Status|UNCONFIRMED |NEW Ever confirmed|0 |1 CC||jamborm at gcc dot gnu.org Last reconfirmed||2023-10-12 --- Comment #1 from Richard Biener ---
The issue is the aggregate copy:

t.c:26:22: missed: not vectorized: more than one data ref in stmt: a = *_3;

which SRA fails to scalarize:

  [local count: 955630224]:
  # s_23 = PHI
  # i_25 = PHI
  _1 = (long unsigned int) i_25;
  _2 = _1 * 24;
  _3 = x_16(D) + _2;
  a = *_3;
  _4 = BIT_FIELD_REF ;
  _12 = _4 & 1;
  _6 = (int) _12;
  s_18 = _6 + s_23;
  a ={v} {CLOBBER(eol)};
  i_20 = i_25 + 1;
  if (y_14(D) > i_20)

Candidate (2778): a
...
! Disqualifying a - No scalar replacements to be created.

the BIT_FIELD_REF is already created by early folding in optimize_bit_field_compare folding (int) a.b4.f != 0

  s = ((int) NON_LVALUE_EXPR > & 1) + s;

SRA could handle BIT_FIELD_REFs just fine - esp. quantities with a byte size. And then this folding is just premature...

Removing the folding that handles BF != CST fixes it. I know removing all of it, esp. BF != BF will regress some stuff. I'll put this half-way patch through testing.

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 82299bb7f1d..3db383360d6 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -4695,7 +4695,7 @@ optimize_bit_field_compare (location_t loc, enum tree_code code,
     return 0;
 
   if (const_p)
-    rreversep = lreversep;
+    return 0;
   else
     {
       /* If this is not a constant, we can only do something if bit positions,
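Since the analysis above points at the aggregate copy `a = *_3` as the obstacle, a source-level variant that reads the bit-field in place can be sketched as follows. The struct layouts are copied from the reported testcase; `count_flags_no_copy` is a hypothetical name, and no claim is made here about whether a given GCC version or target vectorizes this form.

```c
#include <stdbool.h>

struct C
{
  int c;
  int d;
  bool f : 1;
  float e;
};

struct A
{
  unsigned int a;
  unsigned char c1, c2;
  bool b1 : 1;
  bool b2 : 1;
  bool b3 : 1;
  struct C b4;
};

/* Hypothetical variant of foo from the report: the bit-field is read
   through the pointer, so no whole-struct temporary is created in the
   source.  */
int
count_flags_no_copy (const struct A *restrict x, int y)
{
  int s = 0;
  for (int i = 0; i < y; ++i)
    s += x[i].b4.f ? 1 : 0;
  return s;
}
```

This only changes what the front end hands to the optimizers; the compiler-side fix discussed in this bug is a separate matter.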
[Bug tree-optimization/111780] Missed optimization of '(t*4)/(t*2) -> 2'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111780 Richard Biener changed: What|Removed |Added Keywords||missed-optimization Status|UNCONFIRMED |NEW Last reconfirmed||2023-10-12 Ever confirmed|0 |1 --- Comment #1 from Richard Biener ---
Confirmed. Same for

int foo (int a, int b, int c)
{
  return 2*c*(a*b) / (a*b);
}

Note when we cannot remove the division like for c*a / d*a we have to watch out for -INT_MIN / -1, but I think the only way this would not invoke undefined behavior before the transform is when the factor is equal to -1 but then c*a cannot be -INT_MIN so it should be safe in general, not only when m and n are constants?

Note we cannot re-associate for the transform. We only have

/* Simplify (t * 2) / 2) -> t.  */
(for div (trunc_div ceil_div floor_div round_div exact_div)
 (simplify
  (div (mult:c @0 @1) @1)
  (if (ANY_INTEGRAL_TYPE_P (type))
   (if (TYPE_OVERFLOW_UNDEFINED (type))
    @0
#if GIMPLE
    (with {value_range vr0, vr1;}
     (if (INTEGRAL_TYPE_P (type)
	  && get_range_query (cfun)->range_of_expr (vr0, @0)
	  && get_range_query (cfun)->range_of_expr (vr1, @1)
	  && range_op_handler (MULT_EXPR).overflow_free_p (vr0, vr1))
      @0))
#endif

but not the case with two multiplies.
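The quoted pattern only fires when overflow is undefined or when range information proves the multiplication cannot overflow. A small sketch of why that guard is needed for wrapping types: with 32-bit unsigned arithmetic, `(t * 2) / 2` is not always `t`. The function name `mul2_div2_u32` is hypothetical.

```c
#include <stdint.h>

/* Computes (t * 2) / 2 in wrapping 32-bit unsigned arithmetic.  */
uint32_t
mul2_div2_u32 (uint32_t t)
{
  uint32_t doubled = (uint32_t) (t * 2u);  /* wraps modulo 2^32 */
  return doubled / 2u;
}
```

For `t = 5` the result is 5, but for `t = 0x80000000` the product wraps to 0 and the result is 0, so folding the expression to `t` would change behaviour unless the value range rules out the wrap.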
[Bug tree-optimization/111764] [11/12/13 Regression] Wrong code at -O3 on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111764 Richard Biener changed: What|Removed |Added Priority|P3 |P2 Known to work||14.0 Keywords|needs-bisection | Summary|[11/12/13/14 Regression]|[11/12/13 Regression] Wrong |Wrong code at -O3 on|code at -O3 on |x86_64-linux-gnu|x86_64-linux-gnu --- Comment #8 from Richard Biener --- Fixed on trunk so far.
[Bug tree-optimization/111764] [11/12/13/14 Regression] Wrong code at -O3 on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111764 --- Comment #7 from CVS Commits --- The master branch has been updated by Richard Biener : https://gcc.gnu.org/g:05f98310b54da95e468d799f4a910174320cccbb commit r14-4588-g05f98310b54da95e468d799f4a910174320cccbb Author: Richard Biener Date: Thu Oct 12 09:09:46 2023 +0200 tree-optimization/111764 - wrong reduction vectorization The following removes a misguided attempt to allow x + x in a reduction path, also allowing x * x which isn't valid. x + x actually never arrives this way but instead is canonicalized to 2 * x. This makes reduction path handling consistent with how we handle the single-stmt reduction case. PR tree-optimization/111764 * tree-vect-loop.cc (check_reduction_path): Remove the attempt to allow x + x via special-casing of assigns. * gcc.dg/vect/pr111764.c: New testcase.
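To illustrate why a statement that uses the accumulator twice (such as `x * x`) is not a valid link in a reduction chain, the sketch below compares a scalar loop with a naive two-lane model that keeps partial accumulators and combines them at the end. This is a hand-written model under stated assumptions (even `n`, two lanes, product as the combining operation); it is not the testcase from the PR and not GCC's vectorizer.

```c
/* Scalar loop: the accumulator appears twice in each update.  */
int
scalar_square_mul (const int *a, int n)
{
  int s = 1;
  for (int i = 0; i < n; i++)
    s = s * s * a[i];
  return s;
}

/* Naive two-lane model: each lane runs the same update on every other
   element and the partial results are multiplied together at the end,
   as would be correct for a plain product reduction.  n is assumed to
   be even.  */
int
two_lane_square_mul (const int *a, int n)
{
  int lane0 = 1;
  int lane1 = 1;
  for (int i = 0; i + 1 < n; i += 2)
    {
      lane0 = lane0 * lane0 * a[i];
      lane1 = lane1 * lane1 * a[i + 1];
    }
  return lane0 * lane1;
}
```

For the input `{2, 2, 2, 2}` the scalar loop yields 32768 while the two-lane model yields 64, so treating such a loop as an ordinary reduction produces wrong code.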
[Bug ipa/111773] Inconsistent optimization of replaced operator new()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111773 Richard Biener changed: What|Removed |Added Last reconfirmed||2023-10-12 Status|UNCONFIRMED |ASSIGNED Component|c++ |ipa Ever confirmed|0 |1 CC||hubicka at gcc dot gnu.org, ||marxin at gcc dot gnu.org, ||rguenth at gcc dot gnu.org Keywords||wrong-code --- Comment #2 from Richard Biener --- For the second case I think we do something wrong. local-pure-const figures operator new is 'noreturn': Function is locally looping. Function is locally throwing. Function is locally malloc. Function found to be noreturn: operator new and fixup_cfg in turn turns main into int main () { int * D.3130; int * p1; int * _3(D); : operator new (4); } which would be fine I think. But then CDDCE decides Eliminating unnecessary statements: Deleting : operator new (4); and we end up with int main () { int * D.3130; int * p1; int * _3(D); : } and local-pure-const adds an unreachable: local analysis of int main()/18 checking previously known:Function is locally const. Function found to be noreturn: main Function found to be const: int main()/18 Declaration updated to be const: int main()/18 Function found to be nothrow: main Introduced new external node (void __builtin_unreachable()/32). int main () { int * D.3130; int * p1; int * _3(D); [count: 0]: __builtin_unreachable (); I think CD-DCE shouldn't remove the call as it's looping and noreturn. It doesn't mark the allocation as necessary because of -fallocation-dce: if (callee != NULL_TREE && flag_allocation_dce && DECL_IS_REPLACEABLE_OPERATOR_NEW_P (callee)) return; we fail to check gimple_call_from_new_or_delete here I think (we later do it in most other places). But we maybe should never remove a control stmt which a noreturn call is, even more so as it can throw (yeah, we remove "dead" exceptions, but together with noreturn this doesn't quite match). 
Adding gimple_call_from_new_or_delete () will fix the testcase at hand but I think the same issue would exist with a class scope operator new triggered by a new expression. So, it's maybe not wrong we remove the call to ::operator new(), but if we do we have to preserve the 'return 10;' - we cannot do both, take advantage of 'noreturn' _and_ elide it. The behavior for the other testcase is OK I think.
[Bug tree-optimization/111780] New: Missed optimization of '(t*4)/(t*2) -> 2'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111780 Bug ID: 111780 Summary: Missed optimization of '(t*4)/(t*2) -> 2' Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: 652023330028 at smail dot nju.edu.cn Target Milestone: ---

Hello, we found some optimizations (regarding arithmetic optimization) that GCC may have missed. We would greatly appreciate it if you could take a look and let us know what you think.

Given the following code: https://godbolt.org/z/G9rWK7c3q

int n1;
void func1(int a){
    if(a>1&&a<4)
        n1=(a+a+a+a)/(a+a);
}

Different from PR 111718, this missed optimization appears to be due to a missed pattern: (t*4)/(t*2) -> 2

  # DEBUG BEGIN_STMT
  # RANGE [irange] int [8, 8][12, 12] NONZERO 0xc
  _3 = a_7(D) * 4;
  # RANGE [irange] int [4, 4][6, 6] NONZERO 0x6
  _4 = a_7(D) * 2;
  # RANGE [irange] int [1, 3] NONZERO 0x3
  _5 = _3 / _4;
  # .MEM_9 = VDEF <.MEM_8(D)>
  n1D.2761 = _5;

Or a more general pattern: (t*m)/(t*n) -> m/n, where m and n are constants.

Thank you very much for your time and effort! We look forward to hearing from you.
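The general pattern proposed in the report can be sanity-checked by brute force over a small range. The sketch below is such a check, not a proof and not compiler code; `check_mul_div_identity` is a hypothetical name, and the bounds are assumed small enough that `t*m` and `t*n` cannot overflow `int`.

```c
/* Returns 1 if (t*m)/(t*n) == m/n for every nonzero t in [-tmax, tmax]
   and all m, n in [-cmax, cmax] with n != 0; returns 0 otherwise.  */
int
check_mul_div_identity (int tmax, int cmax)
{
  for (int t = -tmax; t <= tmax; t++)
    {
      if (t == 0)
        continue;
      for (int m = -cmax; m <= cmax; m++)
        for (int n = -cmax; n <= cmax; n++)
          {
            if (n == 0)
              continue;
            if ((t * m) / (t * n) != m / n)
              return 0;
          }
    }
  return 1;
}
```

The identity holds in this range because both quotients truncate the same rational number toward zero; the interesting cases for the compiler are the ones excluded here, namely `t == 0` and products that overflow.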
[Bug c/111779] New: Fail to vectorize the struct include struct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111779 Bug ID: 111779 Summary: Fail to vectorize the struct include struct Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: juzhe.zhong at rivai dot ai Target Milestone: ---

#include <assert.h>
#include <stdbool.h>

struct C
{
  int c;
  int d;
  bool f : 1;
  float e;
};

struct A
{
  unsigned int a;
  unsigned char c1, c2;
  bool b1 : 1;
  bool b2 : 1;
  bool b3 : 1;
  struct C b4;
};

void
foo (const struct A * __restrict x, int y)
{
  int s = 0, i = 0;
  for (i = 0; i < y; ++i)
    {
      const struct A a = x[i];
      s += a.b4.f ? 1 : 0;
    }
  assert (s == 0);
  //__builtin_abort ();
}

int
main ()
{
  struct A x[100];
  int i;
  __builtin_memset (x, -1, sizeof (x));
  for (i = 0; i < 100; i++)
    {
      x[i].b1 = false;
      x[i].b2 = false;
      x[i].b3 = false;
      x[i].b4.f = false;
    }
  foo (x, 100);
  return 0;
}

https://godbolt.org/z/KWb7c1n5h

Both SVE GCC and RVV GCC failed to vectorize it, but Clang succeeds in vectorizing it.
[Bug c++/111771] Incorrect "is used uninitialized" warning, as if zero-initialization didn't propagate through user-provided default constructors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111771 --- Comment #4 from Richard Biener --- -fno-lifetime-dse fixes the issue (and the diagnostic)
[Bug c/111769] Annotate function definitions and calls to facilitate link-time checking
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111769 --- Comment #4 from David Brown ---
(In reply to Richard Biener from comment #1)
> If you compile with debug info enabled the info should be already there,
> just nothing looks at this (and mismatches) at link time.

Perhaps I should file this as an enhancement request for binutils? If there is enough information already generated by gcc, then only the linker part needs to be implemented. (It could also be checked by an external program, but that would be a lot more inconvenient for the user.)

Will the binutils folk be familiar with the debug information and its layout, or is there a document I can point them to? I assume this kind of information will be stable and documented, because gdb will need it. (I can read through the material for the details, but a rough pointer to get me started would help.)
[Bug c++/111771] Incorrect "is used uninitialized" warning, as if zero-initialization didn't propagate through user-provided default constructors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111771 Richard Biener changed: What|Removed |Added Ever confirmed|0 |1 Last reconfirmed||2023-10-12 Keywords||diagnostic, wrong-code Status|UNCONFIRMED |NEW CC||jason at gcc dot gnu.org Known to fail||13.2.1, 7.5.0 --- Comment #3 from Richard Biener --- We have int main () { int D.2848; { struct B b; try { b.D.2778.x = 0; B::B (&b); D.2848 = b.D.2778.x; return D.2848; but: void B::B (struct B * const this) { _1 = &this->D.2778; A::A (_1); } void A::A (struct A * const this) { *this = {CLOBBER}; so A::A invoked by B::B invalidates the initialized storage. Maybe a "different" B::B should have been called (one not invoking A::A?).
[Bug target/111424] LoongArch: Enable vect test suite
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111424 --- Comment #2 from CVS Commits --- The master branch has been updated by LuluCheng : https://gcc.gnu.org/g:a2a51b6982c895ff3e37bda622303e92b3ac1d16 commit r14-4585-ga2a51b6982c895ff3e37bda622303e92b3ac1d16 Author: Chenghui Pan Date: Tue Sep 26 14:39:18 2023 +0800 LoongArch: Enable vect.exp for LoongArch. [PR111424] gcc/testsuite/ChangeLog: PR target/111424 * lib/target-supports.exp: Enable vect.exp for LoongArch.
[Bug c/111769] Annotate function definitions and calls to facilitate link-time checking
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111769 --- Comment #3 from David Brown ---
(In reply to Andrew Pinski from comment #2)
> IIRC there was a bug about this specific thing which was closed as fixed
> with the use of LTO ...

Certainly if you use LTO, then this is not necessary. But LTO use is far from universal, and there can be good reasons not to use it. (It is rarely seen in small embedded systems, for various reasons - some good, some less good.)

The thought here is for a very low cost (for users) enhancement - the check could be introduced without any change to the users' existing build process, code, or linker scripts.