[Bug c/79217] Feature request: high half of an integer multiplication
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79217 --- Comment #6 from Andrew Pinski --- (In reply to H. Peter Anvin from comment #5) > b) it seems likely that getting __intN where N > CHAR_BIT*sizeof(uintmax_t) > into a standard would be very hard, and thus would not be possible to > standard-track (although it could be used as an implementation on gcc using > a header inline, of course.) I saw a proposal for C23 (I think it was C23) for arbitrary bit size integers. I don't know if that included big integers either.
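For readers following the request, here is a minimal sketch of how the high half of a 64x64-bit multiplication can be obtained today. It is not taken from the bug report: the function names are made up for illustration, the first variant assumes the GCC/Clang `unsigned __int128` extension (available on 64-bit targets only), and the second is a plain C fallback built from 32-bit partial products.

```c
#include <stdint.h>

/* High 64 bits of the full 128-bit product a * b.
   Assumes the GCC/Clang extension unsigned __int128 is available. */
static inline uint64_t mulhi_u64(uint64_t a, uint64_t b)
{
    return (uint64_t)(((unsigned __int128)a * b) >> 64);
}

/* Same result without any 128-bit type, using four 32x32->64 products. */
static inline uint64_t mulhi_u64_portable(uint64_t a, uint64_t b)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    uint64_t lo_lo = a_lo * b_lo;
    uint64_t hi_lo = a_hi * b_lo;
    uint64_t lo_hi = a_lo * b_hi;
    uint64_t hi_hi = a_hi * b_hi;

    /* Sum of the pieces that land in bits 32..63; at most three
       values below 2^32, so this cannot overflow 64 bits. */
    uint64_t mid = (lo_lo >> 32) + (uint32_t)hi_lo + (uint32_t)lo_hi;

    return hi_hi + (hi_lo >> 32) + (lo_hi >> 32) + (mid >> 32);
}
```

Both helpers should agree for every input; the second one is what a header-only implementation might fall back to where no wider integer type exists.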
[Bug tree-optimization/54116] suboptimal code for tight loops
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54116 Andrew Pinski changed: What|Removed |Added Ever confirmed|0 |1 Last reconfirmed||2021-08-07 Status|UNCONFIRMED |WAITING --- Comment #3 from Andrew Pinski --- GCC, clang and ICC all optimize it this way. Do you have a testcase where the increased register pressure causes a performance problem?
[Bug rtl-optimization/78085] extra sign extend if used to store in 32bit and return 64bit and the upper bits are known to be zeroed.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78085 Andrew Pinski changed: What|Removed |Added Component|target |rtl-optimization Last reconfirmed||2021-08-07 Ever confirmed|0 |1 Summary|Unexpected cltq instruction |extra sign extend if used |on Linux x86-64 for |to store in 32bit and |conversion of positive int |return 64bit and the upper |to long |bits are known to be ||zeroed. Status|UNCONFIRMED |NEW Keywords||missed-optimization Severity|normal |enhancement --- Comment #1 from Andrew Pinski --- Confirmed.
[Bug middle-end/48609] Inefficient complex float argument passing/return
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48609 --- Comment #4 from Andrew Pinski --- *** Bug 77851 has been marked as a duplicate of this bug. ***
[Bug target/77851] Odd code for _Complex float return value
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77851 Andrew Pinski changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |DUPLICATE --- Comment #2 from Andrew Pinski --- Dup of bug 48609. *** This bug has been marked as a duplicate of bug 48609 ***
[Bug target/77702] suffix or operands invalid for `movq'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77702 Andrew Pinski changed: What|Removed |Added Resolution|--- |INVALID Status|UNCONFIRMED |RESOLVED --- Comment #2 from Andrew Pinski --- https://stackoverflow.com/questions/43101509/suffix-or-operands-invalid-for-move-with-gcc https://github.com/Homebrew/legacy-homebrew/issues/45258#issuecomment-150955783
[Bug tree-optimization/51499] -Ofast does not vectorize while -O3 does.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51499 Andrew Pinski changed: What|Removed |Added Last reconfirmed||2021-08-07 Ever confirmed|0 |1 Summary|vectorizer missing simple |-Ofast does not vectorize |case|while -O3 does. Status|UNCONFIRMED |NEW --- Comment #15 from Andrew Pinski ---
So here is the interesting part for the trunk: with -O3 we can vectorize the loop because we are using the SLP vectorizer, but with -Ofast we don't, as we say the vectorization is too costly.
The innermost loop for -O3:
.L3:
        addq    $1, %rax
        addpd   %xmm1, %xmm2
        addpd   %xmm1, %xmm3
        addpd   %xmm1, %xmm4
        cmpq    %rax, %rdi
        jne     .L3
The SLP vectorizer has done it since 11+.
Here is the inner loop for -Ofast:
.L3:
        addq    $1, %rax
        addsd   %xmm0, %xmm3
        addsd   %xmm0, %xmm6
        addsd   %xmm0, %xmm1
        addsd   %xmm0, %xmm5
        addsd   %xmm0, %xmm2
        addsd   %xmm0, %xmm4
        cmpq    %rax, %rdi
        jne     .L3
As you can see, we don't vectorize it.
[Bug tree-optimization/61724] Some loops not vectorized
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61724 Andrew Pinski changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED --- Comment #2 from Andrew Pinski --- /app/example.cpp:12:21: note: LOOP VECTORIZED /app/example.cpp:22:31: note: LOOP VECTORIZED That is A::f and A::h. A::g and A::k are optimized away as the results were not used. If I add a little code to keep them from being optimized away, we use __builtin_memcpy instead. So this is fixed. I am not even going to check when, as this was fixed a long time ago.
[Bug middle-end/74113] by_pieces_ninsns doesn't support TImode/OImode/XImode
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=74113 --- Comment #1 from Andrew Pinski --- Isn't this fixed now?
[Bug tree-optimization/71726] Simplify (intptr_t)p+4-(intptr_t)(p+4)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71726 Andrew Pinski changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2021-08-07 Ever confirmed|0 |1 --- Comment #1 from Andrew Pinski --- Confirmed.
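To make the expression in the summary concrete, here is a small sketch. The function name and the choice of `char *` for `p` are assumptions for illustration (the summary does not state the pointee type). Pointer-to-integer conversion is implementation-defined, but on the flat address spaces GCC usually targets the two sides cancel, so the whole function could in principle fold to `return 0;`.

```c
#include <stdint.h>

/* Hypothetical test function: with p pointing into an object of at
   least 4 bytes, p + 4 is valid, and on a flat address space
   (intptr_t)(p + 4) equals (intptr_t)p + 4, so the result is 0. */
intptr_t ptr_int_difference(char *p)
{
    return (intptr_t)p + 4 - (intptr_t)(p + 4);
}
```

Calling it with a pointer to the start of any buffer of four or more bytes is enough to observe the value at run time.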
[Bug rtl-optimization/71775] Redundant move instruction for sign extension
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71775 Andrew Pinski changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Severity|normal |enhancement Component|target |rtl-optimization Keywords||missed-optimization Last reconfirmed||2021-08-07 --- Comment #2 from Andrew Pinski --- Confirmed: Trying 11 -> 13: 11: {r87:DI=ctz(r86:DI);clobber flags:CC;} REG_UNUSED flags:CC 13: r88:DI=sign_extend(r87:DI#0) REG_DEAD r87:DI Failed to match this instruction: (set (reg:DI 88 [ _1 ]) (sign_extend:DI (subreg:SI (ctz:DI (reg/v:DI 86 [ x ])) 0))) Part of the problem is ctz has an unknown value at 0 but we know x is non-zero (well kinda, at the gimple level we do). We do the right thing on aarch64 because we know the value at 0. Trying 11 -> 13: 11: r97:DI=ctz(r96:DI) 13: r98:DI=sign_extend(r97:DI#0) REG_DEAD r97:DI Successfully matched this instruction: (set (reg:DI 98 [ _1 ]) (ctz:DI (reg/v:DI 96 [ x ]))) allowing combination of insns 11 and 13 original costs 8 + 4 = 12 replacement cost 8 deferring deletion of insn with uid = 11. modifying insn i313: r98:DI=ctz(r96:DI) deferring rescan insn with uid = 13. So this requires us to bring the range down from gimple to RTL. Here is the range: # RANGE [1, 18446744073709551615] # x_12 = PHI
[Bug tree-optimization/68500] Remove in_loop_pipeline usage
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68500 Andrew Pinski changed: What|Removed |Added Component|other |tree-optimization --- Comment #7 from Andrew Pinski --- This still seems like some good cleanup with respect to PROP_scev. I can't remember how LOOP_CLOSED_SSA is handled these days though.
[Bug other/68500] Remove in_loop_pipeline usage
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68500 Andrew Pinski changed: What|Removed |Added Status|UNCONFIRMED |NEW Ever confirmed|0 |1 Last reconfirmed||2021-08-07 --- Comment #6 from Andrew Pinski --- (In reply to Tom de Vries from comment #4) > First patch posted here: > https://gcc.gnu.org/ml/gcc-patches/2015-11/msg02634.html The first was applied, I see. > > Last two patches fyi-posted here: > https://gcc.gnu.org/ml/gcc-patches/2015-11/msg02688.html The last two were not. PROP_scev did not go in.
[Bug plugins/101810] libiberty/simple-object-xcoff.c segmentation fault
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101810 --- Comment #1 from Alan Modra --- Created attachment 51272 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51272&action=edit Proposed fix
[Bug c++/70793] C++11: not accepting some integeral types
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70793 --- Comment #3 from Andrew Pinski --- This has been broken since GCC added support for non-class-key friends for C++11, which was added in 4.7.1.
[Bug c++/70793] C++11: not accepting some integeral types
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70793 Andrew Pinski changed: What|Removed |Added Summary|g++ does not accept some|C++11: not accepting some |forms of "friend" |integeral types |declaration for builtin | |types | Keywords||diagnostic --- Comment #2 from Andrew Pinski --- Here is an example which gives out a diagnostic which does not make any sense: struct S { friend unsigned long long short; };
[Bug plugins/101810] New: libiberty/simple-object-xcoff.c segmentation fault
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101810 Bug ID: 101810 Summary: libiberty/simple-object-xcoff.c segmentation fault Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: plugins Assignee: unassigned at gcc dot gnu.org Reporter: amodra at gmail dot com Target Milestone: --- From https://sourceware.org/bugzilla/show_bug.cgi?id=28179 binutils/nm-new --plugin ~/build/gcc-virgin/lto-plugin/.libs/liblto_plugin.so -a pr28179 AddressSanitizer:DEADLYSIGNAL = ==3630013==ERROR: AddressSanitizer: SEGV on unknown address 0x6021000a (pc 0x7fc28ca928ea bp 0x sp 0x7ffd425c36d0 T0) ==3630013==The signal is caused by a READ memory access. #0 0x7fc28ca928ea in simple_object_xcoff_find_sections /home/alan/src/gcc-virgin/libiberty/simple-object-xcoff.c:529:26 #1 0x7fc28ca874f7 in claim_file_handler /home/alan/src/gcc-virgin/lto-plugin/lto-plugin.c:1189:16 #2 0x9ad923 in try_claim /home/alan/src/binutils-gdb/bfd/plugin.c:323:7 [snip] A little analysis of the binutils testcase reveals the xcoff file header has nsyms of 0x8000. The file contains a number of places where ocr->nsyms * SYMESZ is calculated. Since ocr->nsyms is an unsigned int and SYMESZ a plain number (18), the expression overflows to zero. That results in a zero length buffer being allocated and read from file, but 0x8000 syms processed from the buffer.
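The overflow described above can be sketched in isolation. This is not the libiberty code or the attached fix: the helper names are hypothetical, `SYMESZ` is set to the 18 quoted in the report, and the example input 0x80000000 is an assumption chosen because it is a value whose product with 18 wraps to exactly zero in 32-bit unsigned arithmetic.

```c
#include <stddef.h>
#include <stdint.h>

#define SYMESZ 18 /* symbol entry size quoted in the report */

/* The wrapping computation: unsigned int * int is done in unsigned int,
   so the product is reduced modulo 2^32 on targets with 32-bit int. */
static unsigned int symtab_bytes_wrapping(unsigned int nsyms)
{
    return nsyms * SYMESZ;
}

/* Hypothetical checked variant: widen to size_t and reject counts whose
   byte size would not fit. Returns 1 and stores the size on success,
   returns 0 on overflow. */
static int symtab_bytes_checked(unsigned int nsyms, size_t *out)
{
    if (nsyms > SIZE_MAX / SYMESZ)
        return 0;
    *out = (size_t)nsyms * SYMESZ;
    return 1;
}
```

With the wrapping form, a huge symbol count can yield a zero-length allocation while later loops still iterate over the original count; a checked size (or a comparison against the file size) avoids that mismatch.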
[Bug c++/70793] g++ does not accept some forms of "friend" declaration for builtin types
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70793 Andrew Pinski changed: What|Removed |Added Keywords||rejects-valid Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Last reconfirmed||2021-08-07 --- Comment #1 from Andrew Pinski --- Even accepts this: typedef int t; struct S { friend t; } s;
[Bug target/70079] missed constant propagation in memcpy expansion
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70079 Andrew Pinski changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Last reconfirmed||2021-08-07 Keywords||missed-optimization --- Comment #4 from Andrew Pinski ---
        movq    (%rsi), %rax
        movq    %rdi, %rcx
        leaq    8(%rdi), %rdi
        movq    %rax, -8(%rdi)
        movq    504(%rsi), %rax
        movq    %rax, 496(%rdi)
        andq    $-8, %rdi
        xorl    %eax, %eax
        subq    %rdi, %rcx
        subq    %rcx, %rsi
        addl    $512, %ecx
        shrl    $3, %ecx
Confirmed.
[Bug target/51837] Use of result from 64*64->128 bit multiply via __uint128_t not optimized
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51837 Andrew Pinski changed: What|Removed |Added Component|middle-end |target Target||i?86-linux-gnu Known to fail|5.1.0, 5.5.0| Known to work|6.1.0, 7.1.0, 9.4.0 | --- Comment #2 from Andrew Pinski --- GCC 6+ improved/fixed the __uint128_t case but the -m32 case for uint64_t is still there.
[Bug middle-end/51837] Use of result from 64*64->128 bit multiply via __uint128_t not optimized
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51837 Andrew Pinski changed: What|Removed |Added Keywords||missed-optimization Target Milestone|--- |6.0 Known to work||6.1.0, 7.1.0, 9.4.0 Known to fail||5.1.0, 5.5.0
[Bug target/69519] STV doesn't use xmm register for DImove move
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69519 Andrew Pinski changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Target Milestone|--- |10.0 Keywords||missed-optimization Severity|normal |enhancement Resolution|--- |FIXED Known to work||10.1.0, 11.1.0 See Also||https://gcc.gnu.org/bugzill ||a/show_bug.cgi?id=91154 Known to fail||9.4.0 --- Comment #4 from Andrew Pinski --- 10+ produce:
fn1():
        subl    $28, %esp
        movq    a, %xmm0
        movq    b, %xmm1
        pxor    %xmm0, %xmm1
        movq    %xmm0, 8(%esp)
        movq    %xmm1, a
        call    fn2()
        movq    8(%esp), %xmm0
        movq    %xmm0, a
        addl    $28, %esp
        ret
Most likely done by the work for PR 91154.
[Bug c++/100720] inconsistent return type deduction behavior with user defined conversion function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100720 --- Comment #1 from Andrew Pinski --- clang rejects the first example: :7:20: error: function 'f' with deduced return type cannot be used before it is defined return f(0); ^ :2:10: note: 'f' declared here auto f(auto); ^ And rejects the second with a similar message as GCC.
[Bug c++/95127] Self-calling lambda with auto return type gives misleading error message
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95127 --- Comment #2 from Andrew Pinski --- All 4 compilers (GCC, ICC, clang and MSVC) I have access to reject this code.
[Bug c++/88557] Lambda in template parameter list compiler segmentation fault (ICE)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88557 Andrew Pinski changed: What|Removed |Added Ever confirmed|0 |1 Keywords|ice-on-invalid-code |ice-on-valid-code Last reconfirmed||2021-08-07 Status|UNCONFIRMED |NEW --- Comment #2 from Andrew Pinski --- The first example works on the trunk. The second example has a template missing, but even after adding that, the ICE is still there.
[Bug c++/68938] [C++11] use of lambda before deduction of auto does not fail in templated function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68938 Andrew Pinski changed: What|Removed |Added Keywords||accepts-invalid Status|UNCONFIRMED |NEW Ever confirmed|0 |1 Last reconfirmed||2021-08-07 --- Comment #1 from Andrew Pinski --- Confirmed.
[Bug other/69722] [6 Regression] gcc/doc/extend.texi:7526: warning: node `Constraints' is next for `Extended Asm' in menu but not in sectioning
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69722 Andrew Pinski changed: What|Removed |Added CC||burnus at gcc dot gnu.org --- Comment #6 from Andrew Pinski --- *** Bug 68900 has been marked as a duplicate of this bug. ***
[Bug middle-end/68900] extended.texi/md.texi: Texinfo warnings regarding @node and @menu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68900 Andrew Pinski changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |DUPLICATE --- Comment #2 from Andrew Pinski --- Dup of bug 69722. *** This bug has been marked as a duplicate of bug 69722 ***
[Bug driver/68808] "--sysroot" not propagated to linker when "--specs" is used
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68808 --- Comment #1 from Andrew Pinski --- Yes, because the specs are set up with the specific --sysroot option supplied. The code sets up the sysroot spec only if --sysroot is supplied:
  /* Pass the --sysroot option to the linker, if it supports that.  If
     there is a sysroot_suffix_spec, it has already been processed by
     this point, so target_system_root really is the system root we
     should be using.  */
  if (target_system_root)
    {
      obstack_grow (&obstack, "%(sysroot_spec) ", strlen ("%(sysroot_spec) "));
      obstack_grow0 (&obstack, link_spec, strlen (link_spec));
      set_spec ("link", XOBFINISH (&obstack, const char *), false);
    }
[Bug libgcc/67902] Undefined negation in __divdi3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67902 Andrew Pinski changed: What|Removed |Added Ever confirmed|0 |1 Last reconfirmed||2021-08-07 Status|UNCONFIRMED |NEW --- Comment #1 from Andrew Pinski --- Confirmed. __moddi3 has the same issue. if (c) w = -w; So does __divmoddi4: if (c1) w = -w; if (c2) r = -r;
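The problem with a bare `w = -w` is that negating the most negative value of a signed type overflows, which is undefined behavior in C. A common way around it, sketched here as an illustration and not as the libgcc fix, is to negate in the unsigned domain. The function name is hypothetical, and the final conversion back to the signed type is implementation-defined; GCC documents it as reduction modulo 2^N.

```c
#include <stdint.h>

/* Negate w without signed overflow. Unsigned subtraction wraps by
   definition, so 0 - (uint64_t)w is always well defined. Converting an
   out-of-range result back to int64_t is implementation-defined
   (modulo 2^64 on GCC), not undefined. */
static int64_t negate_wrapping(int64_t w)
{
    return (int64_t)(0 - (uint64_t)w);
}
```

For every input except INT64_MIN this returns the ordinary negation; for INT64_MIN it returns INT64_MIN again on GCC, which matches what the hardware negate instruction produces.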
[Bug target/61407] Build errors on latest OS X 10.10 Yosemite with Xcode 6 on GCC 4.8.3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61407 Andrew Pinski changed: What|Removed |Added CC||275438859 at qq dot com --- Comment #58 from Andrew Pinski --- *** Bug 67734 has been marked as a duplicate of this bug. ***
[Bug target/67734] Gcc warning "gcc: warning: couldn’t understand kern.osversion ‘14.5.0"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67734 Andrew Pinski changed: What|Removed |Added Resolution|--- |DUPLICATE Status|UNCONFIRMED |RESOLVED Target Milestone|--- |4.9.2 --- Comment #2 from Andrew Pinski --- Dup of bug 61407 which did not make it until 4.9.2. *** This bug has been marked as a duplicate of bug 61407 ***
[Bug middle-end/66989] poor performance of builtin_isfinite on x64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66989 Andrew Pinski changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |INVALID --- Comment #3 from Andrew Pinski --- Your testcase program is broken; once I fix it (similar to the way I fixed PR 66986), I get better results using the builtin.
[Bug target/66986] poor performance of __builtin_isinf on x64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66986 Andrew Pinski changed: What|Removed |Added Resolution|--- |INVALID Status|UNCONFIRMED |RESOLVED --- Comment #6 from Andrew Pinski --- Your defined isinf2 is incorrect:
int I2 isinf2 (double dx)
{
  unsigned long x;
  memcpy(&x, &dx, sizeof(dx));
  if (2 * x == 0xffe0)
    return 0;
  else
    return (int) (x >> 32);
}
With that change, the GCC version that is produced is faster.
isinf2:
.LFB22:
        .cfi_startproc
#APP
# 19 "/app/example.cpp" 1
        movq %xmm0, %rax
# 0 "" 2
#NO_APP
        movabsq $-9007199254740992, %rdx
        leaq    (%rax,%rax), %rcx
        shrq    $32, %rax
        cmpq    %rdx, %rcx
        movl    $0, %edx
        cmove   %edx, %eax
        ret
vs
isinf2:
.LFB22:
        .cfi_startproc
        xorl    %eax, %eax
        andpd   .LC0(%rip), %xmm0
        ucomisd .LC1(%rip), %xmm0
        seta    %al
        ret
For the inlined case (for the T1):
.L15:
        movsd   (%rax), %xmm0
        addsd   %xmm4, %xmm0
        andpd   %xmm3, %xmm0
        ucomisd %xmm2, %xmm0
        jbe     .L14
        addsd   %xmm5, %xmm1
.L14:
        addq    $8, %rax
        cmpq    %rax, %rdx
        jne     .L15
vs
.L19:
        movsd   (%rax), %xmm3
        addsd   %xmm0, %xmm3
        movq    %xmm3, %rdx
        leaq    (%rdx,%rdx), %rcx
        cmpq    %rdi, %rcx
        je      .L18
        shrq    $32, %rdx
        testl   %edx, %edx
        je      .L18
        addsd   %xmm2, %xmm1
.L18:
        addq    $8, %rax
        cmpq    %rsi, %rax
        jne     .L19
A double jump
[Bug c++/101681] PMF comparison to nullptr is not considered a constexpr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101681 --- Comment #3 from Steven Sun --- By the way, in the current design, the class definition is processed in two passes so that we can see every member data/function declaration before parsing NSDMIs and member function bodies. The class is complete after all declarations have been parsed, which means `&C::f == nullptr` can be folded to false from that point on. So, under the current design, the following code compiles on GCC. https://godbolt.org/z/fMTsf4KoM
```
struct C {
  C() {
    static_assert(&C::f != 0); // complete type
  }
  void f() noexcept(&C::f != 0) {
    static_assert(&C::f != 0); // complete type
  }
  static_assert(__builtin_constant_p(&C::f)); // incomplete type
  static_assert(!__builtin_constant_p(&C::f == 0)); // incomplete type
};
static_assert(&C::f != 0); // complete type
```
[Bug c++/101681] PMF comparison to nullptr is not considered a constexpr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101681 --- Comment #2 from Steven Sun --- The root cause is that the compiler refuses to constant-fold expressions involving a PMF of an incomplete class. https://gcc.gnu.org/git?p=gcc.git;a=blob;f=gcc/cp/expr.c;h=d16d1896f2ddd08264b389b02b9640cca332ec13;hb=refs/heads/master#l42 (gcc/cp/expr.c)
> 42 /* We can't lower this until the class is complete. */
> 43 if (!COMPLETE_TYPE_P (DECL_CONTEXT (member)))
> 44 return cst;
If we comment out this `if`, the constant folding will succeed at (gcc/cp/expr.c)
> 67 expand_ptrmemfunc_cst (cst, &delta, &pfn);
> 68 cst = build_ptrmemfunc1 (type, delta, pfn);
solving everything.
[Bug target/66589] AVX instruction set extension is not enabled by default for bdver2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66589 --- Comment #2 from Andrew Pinski --- I don't know why sometimes it shows up as enabled and other times it does not.
[Bug tree-optimization/101793] Incorrect -Wmaybe-uninitialized on an unreachable use at -O1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101793 Martin Sebor changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |msebor at gcc dot gnu.org --- Comment #6 from Martin Sebor --- Let me see if I can handle this.
[Bug tree-optimization/101793] Incorrect production of ‘may be used uninitialized in this function [-Werror=maybe-uninitialized]'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101793 Martin Sebor changed: What|Removed |Added CC||msebor at gcc dot gnu.org --- Comment #5 from Martin Sebor --- (In reply to Andrew Pinski from comment #2) > I don't know enough about the uninit predicated code to understand why it > can't find that bb15 is predicated on p_9(D) != 0 The pass sees that saved_10(D)(15) is uninitialized in bb 8 where the PHI with it as an operand is used and it's missing logic to figure out that the predicate guarding the uninitialized operand's definition is impossible to satisfy (i.e., that bb 15 is unreachable). The condition is p == 0 && v == 0 && fn () != 0 && p != 0 but the pass doesn't even compute it.
[Bug middle-end/101809] New: emulated gather capability doesn't support 32-bit target
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101809 Bug ID: 101809 Summary: emulated gather capability doesn't support 32-bit target Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: hjl.tools at gmail dot com CC: crazylht at gmail dot com, rguenth at gcc dot gnu.org Target Milestone: ---
On Linux/x86-64, I get
[hjl@gnu-cfl-2 xxx]$ cat x.c
#include <stdint.h>
#define loop_t uint32_t
#define idx_t uint32_t
void loop(double * const __restrict__ dst,
          double const * const __restrict__ src,
          idx_t const * const __restrict__ idx,
          loop_t const begin, loop_t const end)
{
  for (loop_t i = begin; i < end; ++i)
    dst[i] = 42.0 * src[idx[i]];
}
[hjl@gnu-cfl-2 xxx]$ make x.s
/export/build/gnu/tools-build/gcc-gitlab-debug/build-x86_64-linux/gcc/xgcc -B/export/build/gnu/tools-build/gcc-gitlab-debug/build-x86_64-linux/gcc/ -O3 -m32 -march=x86-64 -mfpmath=sse -S x.c
[hjl@gnu-cfl-2 xxx]$ cat x.s
        .file   "x.c"
        .text
        .p2align 4
        .globl  loop
        .type   loop, @function
loop:
.LFB0:
        .cfi_startproc
        pushl   %edi
        .cfi_def_cfa_offset 8
        .cfi_offset 7, -8
        pushl   %esi
        .cfi_def_cfa_offset 12
        .cfi_offset 6, -12
        pushl   %ebx
        .cfi_def_cfa_offset 16
        .cfi_offset 3, -16
        movl    28(%esp), %eax
        movl    32(%esp), %ecx
        movl    16(%esp), %ebx
        movl    20(%esp), %esi
        movl    24(%esp), %edi
        cmpl    %ecx, %eax
        jnb     .L1
        movsd   .LC0, %xmm1
        .p2align 4,,10
        .p2align 3
.L3:
        movl    (%edi,%eax,4), %edx
        movsd   (%esi,%edx,8), %xmm0
        mulsd   %xmm1, %xmm0
        movsd   %xmm0, (%ebx,%eax,8)
        addl    $1, %eax
        cmpl    %eax, %ecx
        jne     .L3
.L1:
        popl    %ebx
        .cfi_restore 3
        .cfi_def_cfa_offset 12
        popl    %esi
        .cfi_restore 6
        .cfi_def_cfa_offset 8
        popl    %edi
        .cfi_restore 7
        .cfi_def_cfa_offset 4
        ret
        .cfi_endproc
.LFE0:
        .size   loop, .-loop
        .section        .rodata.cst8,"aM",@progbits,8
        .align 8
.LC0:
        .long   0
        .long   1078263808
        .ident  "GCC: (GNU) 12.0.0 20210806 (experimental)"
        .section        .note.GNU-stack,"",@progbits
[hjl@gnu-cfl-2 xxx]$
The emulated gather capability isn't enabled.
[Bug middle-end/96989] SSA_NAMEs in Wuninitialized warning messages after r11-959
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96989 --- Comment #6 from Martin Sebor --- The tree pretty-printer would do better by obviating the internal differences: e.g., it could convert the IL for h(): a = __builtin_malloc (_1); _2 = a + 8; _3 = *_2; directly to a[2].
[Bug middle-end/96989] SSA_NAMEs in Wuninitialized warning messages after r11-959
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96989 Martin Sebor changed: What|Removed |Added Assignee|ibuclaw at gdcproject dot org |unassigned at gcc dot gnu.org Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Component|d |middle-end Last reconfirmed||2021-08-06 --- Comment #5 from Martin Sebor --- Confirmed. The format depends on the IL which is less than ideal since that exposes internal differences that users don't care about and that can be confusing. $ cat pr96989.c && gcc -S -Wall pr96989.c int f (void) { unsigned a[3]; return a[2]; } int g (unsigned n) { int a[n]; return a[2]; } int h (unsigned n) { unsigned *a = __builtin_malloc (n); return a[2]; } pr96989.c: In function ‘f’: pr96989.c:4:11: warning: ‘a’ is used uninitialized [-Wuninitialized] 4 | return a[2]; | ~^~~ pr96989.c:3:12: note: ‘a’ declared here 3 | unsigned a[3]; |^ pr96989.c: In function ‘g’: pr96989.c:10:11: warning: ‘*a[2]’ is used uninitialized [-Wuninitialized] 10 | return a[2]; | ~^~~ pr96989.c: In function ‘h’: pr96989.c:16:11: warning: ‘*a_7 + 8’ is used uninitialized [-Wuninitialized] 16 | return a[2]; | ~^~~
[Bug tree-optimization/101805] Max -> bool0 | bool1 Min -> a & b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101805 Andrew Pinski changed: What|Removed |Added Status|UNCONFIRMED |NEW Assignee|unassigned at gcc dot gnu.org |pinskia at gcc dot gnu.org Last reconfirmed||2021-08-06 Ever confirmed|0 |1 --- Comment #1 from Andrew Pinski --- Mine. #if GIMPLE (match gimple_truth SSA_NAME@0 (if (get_nonzero_bits (@0) == 1))) (simplify (min gimple_truth@0 gimple_truth@1) (bit_and @0 @1)) (simplify (max gimple_truth@0 gimple_truth@1) (bit_ior @0 @1)) #endif
[Bug tree-optimization/101808] comparison0 < comparison1 should be transformed into comparison0` & comparison1; likewise for <=
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101808 --- Comment #2 from Andrew Pinski --- bit_or should be bit_ior
[Bug tree-optimization/101808] comparison0 < comparison1 should be transformed into comparison0` & comparison1; likewise for <=
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101808 Andrew Pinski changed: What|Removed |Added Last reconfirmed||2021-08-06 Status|UNCONFIRMED |ASSIGNED Assignee|unassigned at gcc dot gnu.org |pinskia at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #1 from Andrew Pinski --- I am going to try to fix this. A simple thing like this might work (I am going to reuse gimple_truth in other places too): #if GIMPLE (match gimple_truth SSA_NAME@0 (if (get_nonzero_bits (@0) == 1))) (simplify (lt gimple_truth@0 gimple_truth@1) (bit_and (xor! @0 { build_one_cst (type); }) @1) (simplify (le gimple_truth@0 gimple_truth@1) (bit_or (xor! @0 { build_one_cst (type); }) @1) (simplify (gt gimple_truth@0 gimple_truth@1) (bit_and (xor! @1 { build_one_cst (type); }) @0) (simplify (ge gimple_truth@0 gimple_truth@1) (bit_or (xor! @1 { build_one_cst (type); }) @0) #endif I have never used the ! in match-and-simplify before either.
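The identities behind the proposed patterns can be checked exhaustively, since each operand only takes the values 0 and 1. The sketch below is plain C written for illustration, not the match.pd code; all function names are made up. It spells `!a` as `a ^ 1`, mirroring the `xor` with `build_one_cst` in the patterns.

```c
#include <stdbool.h>

/* Bitwise forms of the four ordered comparisons on 0/1 values. */
static bool lt_bits(bool a, bool b) { return (a ^ 1) & b; } /* a <  b */
static bool le_bits(bool a, bool b) { return (a ^ 1) | b; } /* a <= b */
static bool gt_bits(bool a, bool b) { return (b ^ 1) & a; } /* a >  b */
static bool ge_bits(bool a, bool b) { return (b ^ 1) | a; } /* a >= b */

/* Returns true if every bitwise form agrees with the comparison it is
   meant to replace, for all four combinations of inputs. */
static bool bool_compare_identities_hold(void)
{
    for (int a = 0; a <= 1; a++)
        for (int b = 0; b <= 1; b++) {
            if (lt_bits(a, b) != (a < b)) return false;
            if (le_bits(a, b) != (a <= b)) return false;
            if (gt_bits(a, b) != (a > b)) return false;
            if (ge_bits(a, b) != (a >= b)) return false;
        }
    return true;
}
```

The check only covers operands known to be 0 or 1, which is the condition the `gimple_truth` predicate is meant to guarantee.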
[Bug tree-optimization/101808] New: comparison0 < comparison1 should be transformed into comparison0` & comparison1; likewise for <=
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101808 Bug ID: 101808 Summary: comparison0 < comparison1 should be transformed into comparison0` & comparison1; likewise for <= Product: gcc Version: 12.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: enhancement Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: pinskia at gcc dot gnu.org Target Milestone: --- Take: bool f(int ai, int bi) { bool a = ai, b = bi; return a
[Bug middle-end/101807] New: bool0 < bool1 Should expand as !bool0 and bool0 <= bool1 as !bool0 | bool1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101807 Bug ID: 101807 Summary: bool0 < bool1 Should expand as !bool0 and bool0 <= bool1 as !bool0 | bool1 Product: gcc Version: 12.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: enhancement Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: pinskia at gcc dot gnu.org Target Milestone: --- Take: bool f(bool a, bool b) { return a
[Bug rtl-optimization/101806] Extra zero extends for some arguments in some cases
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101806 --- Comment #1 from Andrew Pinski --- It happens to work on x86-64 (with -march=skylake-avx512) because we get a zero_extend instead of an and there. I still don't understand how x86 is able to figure out the &1 part. Trying 11, 9 -> 12: 11: r94:SI=zero_extend(r97:SI#0) REG_DEAD r97:SI 9: r92:SI=zero_extend(r96:SI#0) REG_DEAD r96:SI 12: {r95:SI=~r92:SI:SI;clobber flags:CC;} REG_DEAD r92:SI REG_UNUSED flags:CC REG_DEAD r94:SI Failed to match this instruction: (parallel [ (set (reg:SI 95) (zero_extend:SI (and:QI (not:QI (subreg:QI (reg:SI 96) 0)) (subreg:QI (reg:SI 97) 0 (clobber (reg:CC 17 flags)) ]) Failed to match this instruction: (set (reg:SI 95) (zero_extend:SI (and:QI (not:QI (subreg:QI (reg:SI 96) 0)) (subreg:QI (reg:SI 97) 0 Successfully matched this instruction: (set (reg:QI 94 [ b ]) (and:QI (not:QI (subreg:QI (reg:SI 96) 0)) (subreg:QI (reg:SI 97) 0))) Successfully matched this instruction: (set (reg:SI 95) (zero_extend:SI (reg:QI 94 [ b ])))
[Bug rtl-optimization/101806] New: Extra zero extends for some arguments in some cases
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101806 Bug ID: 101806 Summary: Extra zero extends for some arguments in some cases Product: gcc Version: 12.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: enhancement Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: pinskia at gcc dot gnu.org Target Milestone: --- Target: aarch64-*-* Take: bool g(bool a, bool b) { return ~a & b; } CUT --- Currently we produce: and w1, w1, 255 and w0, w0, 255 bic w0, w1, w0 and w0, w0, 1 CUT --- But we should produce: bic w0, w1, w0 and w0, w0, 1 The zero extends are not needed. This happens because combine does the correct thing until it tries to figure out the cutting point: Trying 2, 8 -> 16: 2: r98:SI=zero_extend(x0:QI) REG_DEAD x0:QI 8: r102:SI=~r98:SI:SI REG_DEAD r98:SI REG_DEAD r99:SI 16: x0:SI=r102:SI&0x1 REG_DEAD r102:SI Failed to match this instruction: (set (reg:SI 0 x0) (and:SI (and:SI (not:SI (reg:SI 0 x0 [ a ])) (reg/v:SI 99 [ b ])) (const_int 1 [0x1]))) Successfully matched this instruction: (set (reg:SI 102) (not:SI (reg:SI 0 x0 [ a ]))) Failed to match this instruction: (set (reg:SI 0 x0) (and:SI (and:SI (reg:SI 102) (reg/v:SI 99 [ b ])) (const_int 1 [0x1]))) If we had chosen (and:SI (not:SI (reg:SI 0 x0 [ a ])) (reg/v:SI 99 [ b ])) instead, we would have gotten the correct thing.
[Bug fortran/68568] ICE with automatic character object and save, in combination with some options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68568 anlauf at gcc dot gnu.org changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |anlauf at gcc dot gnu.org --- Comment #9 from anlauf at gcc dot gnu.org --- Submitted: https://gcc.gnu.org/pipermail/fortran/2021-August/056328.html
[Bug fortran/68568] ICE with automatic character object and save, in combination with some options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68568 anlauf at gcc dot gnu.org changed: What|Removed |Added CC||anlauf at gcc dot gnu.org --- Comment #8 from anlauf at gcc dot gnu.org --- I'm testing the following (almost obvious) fix: diff --git a/gcc/fortran/primary.c b/gcc/fortran/primary.c index 9fe8d1ee20c..56a78d6f89f 100644 --- a/gcc/fortran/primary.c +++ b/gcc/fortran/primary.c @@ -2779,7 +2779,7 @@ gfc_expr_attr (gfc_expr *e) && e->value.function.isym->transformational && e->ts.type == BT_CLASS) attr = CLASS_DATA (e)->attr; - else + else if (e->symtree) attr = gfc_variable_attr (e, NULL); /* TODO: NULL() returns pointers. May have to take care of this
[Bug sanitizer/90589] In Fedora 30 ps hangs using address sanitizer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90589 francis.deslauriers at efficios dot com changed: What|Removed |Added CC||francis.deslauriers@efficio ||s.com --- Comment #12 from francis.deslauriers at efficios dot com --- I am witnessing the exact same issue on Ubuntu 20.04.2. I see the same symptoms with ps, pgrep, w, w.procps, uptime, vmstat, top, and free. It seems that most of the tools of the procps project have this issue. I see the same callstack that mccannd@ shared. I could find one reference to this issue[1]. [1] https://unix.stackexchange.com/questions/652288/why-pgrep-hangs-when-clang-addresssanitizer-library-is-preloaded-using-ld-preloa
[Bug tree-optimization/101805] New: Max -> bool0 | bool1 Min -> a & b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101805 Bug ID: 101805 Summary: Max -> bool0 | bool1 Min -> a & b Product: gcc Version: 12.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: enhancement Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: pinskia at gcc dot gnu.org Target Milestone: --- Take: int maxbool(bool ab, bool bb) { int a = ab; int b = bb; int c; c = (a > b)?a : b; return c; } int minbool(bool ab, bool bb) { int a = ab; int b = bb; int c; c = (a < b)?a : b; return c; } CUT These two should be optimized to just: int maxbool_or(bool ab, bool bb) { int c = ab | bb; return c; } int minbool_and(bool ab, bool bb) { int c = ab & bb; return c; } -- CUT Neither GCC, ICC, clang nor MSVC does this optimization.
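The claimed equivalence is easy to confirm by brute force, because there are only four input combinations. The sketch below reuses the four function bodies from the report in condensed form and adds one checking helper whose name, `minmax_bool_equivalent`, is an assumption introduced for illustration.

```c
#include <stdbool.h>

/* Conditional forms, as in the report. */
static int maxbool(bool ab, bool bb) { int a = ab, b = bb; return (a > b) ? a : b; }
static int minbool(bool ab, bool bb) { int a = ab, b = bb; return (a < b) ? a : b; }

/* Proposed bitwise replacements, as in the report. */
static int maxbool_or(bool ab, bool bb) { return ab | bb; }
static int minbool_and(bool ab, bool bb) { return ab & bb; }

/* Returns true if max matches OR and min matches AND for every
   combination of the two boolean inputs. */
static bool minmax_bool_equivalent(void)
{
    for (int a = 0; a <= 1; a++)
        for (int b = 0; b <= 1; b++)
            if (maxbool(a, b) != maxbool_or(a, b)
                || minbool(a, b) != minbool_and(a, b))
                return false;
    return true;
}
```

Comparing the assembly of `maxbool` and `maxbool_or` at -O2 is a quick way to see whether a given compiler version performs the transformation.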
[Bug target/101804] float_vector_all_ones_operand should be used more
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101804 H.J. Lu changed: What|Removed |Added Attachment #51270|0 |1 is obsolete|| --- Comment #1 from H.J. Lu --- Created attachment 51271 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51271=edit A patch
[Bug tree-optimization/101769] loop->finite_p is not always true for some loops even with -ffinite-loops being used
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101769 Andrew Pinski changed: What|Removed |Added Target Milestone|--- |12.0
[Bug tree-optimization/96542] Failure to optimize simple code to a constant when storing part of the operation in a variable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96542 Andrew Pinski changed: What|Removed |Added Target Milestone|--- |12.0
[Bug target/101804] New: float_vector_all_ones_operand should be used more
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101804 Bug ID: 101804 Summary: float_vector_all_ones_operand should be used more Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: hjl.tools at gmail dot com CC: crazylht at gmail dot com Target Milestone: --- Target: i386,x86-64 Created attachment 51270 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51270=edit A patch float_vector_all_ones_operand should be used more.
[Bug c++/101681] PMF comparison to nullptr is not considered a constexpr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101681 Steven Sun changed: What|Removed |Added CC||StevenSun2021 at hotmail dot com --- Comment #1 from Steven Sun --- The following program compiles. https://godbolt.org/z/aTvchYxYW ``` struct C { void f() {} static_assert(__builtin_constant_p(::f)); static_assert(!__builtin_constant_p(::f == nullptr)); // not nonzero yet }; static_assert(__builtin_constant_p(::f == nullptr)); // nonzero now struct D { void f() {} static_assert(__builtin_constant_p(::f == nullptr)); static_assert(!__builtin_constant_p(::f == nullptr)); }; static_assert(__builtin_constant_p(::f == nullptr)); static_assert(__builtin_constant_p(::f == nullptr)); ``` It looks like `::f` is known to be constexpr right after the function is parsed, but its value is assigned only once the class is completely parsed; only then can we compare it to nullptr. To make the code in comment 0 accepted, we need some kind of `not null' mark on the expression tree. One possible way is to assign the `::f` in advance, right after it is parsed.
[Bug middle-end/101799] Warning messages for PMF leak internal names like ::__pfn and ::__delta
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101799 --- Comment #2 from Martin Sebor --- PR 96989 is related only in that it also involves the pretty printer. Otherwise, to avoid SSA_NAMEs the pretty-printer needs to recursively expand them into their assignments from DECLs or expressions (e.g., results of function calls etc.)
[Bug middle-end/101799] Warning messages for PMF leak internal names like ::__pfn and ::__delta
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101799 Martin Sebor changed: What|Removed |Added Status|UNCONFIRMED |NEW Blocks||24639 Component|c++ |middle-end CC||msebor at gcc dot gnu.org Ever confirmed|0 |1 Keywords||diagnostic Last reconfirmed||2021-08-06 --- Comment #1 from Martin Sebor --- The internal names are the C++ front end representation of member pointers as seen in the dump below. The pretty printer might be able to do a better job formatting them (i.e., recognize it's a member pointer and print the name as it appears in the source) but then we'd end up with a duplicate warning. To avoid that -Wuninitialized would also have to recognize member pointers and treat them as special. It might help to mark the internal names DECL_ARTIFICIAL. bool f () { struct { void S:: (struct S *) * __pfn; long int __delta; } mp; ... void S:: (struct S *) * _1; ... : # VUSE <.MEM_4(D)> _1 = mp.__pfn; <<< -Wuninitialized Referenced Bugs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=24639 [Bug 24639] [meta-bug] bug to track all Wuninitialized issues
[Bug c++/101795] (x > QNaNf) is not a constant expression
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101795 --- Comment #1 from Marc Glisse --- Hint: -fno-trapping-math lets it compile. It should probably be accepted in a manifestly_const_eval context, although some in the committee wanted to prevent the use of NaN (and sometimes even infinity!) in constant expressions...
[Bug target/101723] arm: incorrect order of .fpu and .arch_extension directives leads to unsupported instructions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101723 --- Comment #4 from CVS Commits --- The master branch has been updated by Christophe Lyon : https://gcc.gnu.org/g:aff75af3b50f8c039ed6fbfa3f313ba45d44f6e6 commit r12-2791-gaff75af3b50f8c039ed6fbfa3f313ba45d44f6e6 Author: Christophe Lyon Date: Fri Aug 6 14:25:47 2021 + arm: Fix pr69245.c testcase for reorder assembler architecture directives [PR101723] In gcc.target/arm/pr69245.c, to have a .fpu neon-vfpv4 directive, make sure code for fn1() is emitted, by removing the static keyword. Fix a typo in gcc.target/arm/pr69245.c, where \s should be \\s. 2021-08-06 Christophe Lyon gcc/testsuite/ PR target/101723 * gcc.target/arm/pr69245.c: Make sure to emit code for fn1, fix typo.
[Bug target/101723] arm: incorrect order of .fpu and .arch_extension directives leads to unsupported instructions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101723 --- Comment #3 from CVS Commits --- The master branch has been updated by Christophe Lyon : https://gcc.gnu.org/g:a22b3e022c2b45047a28d901042888eb77620499 commit r12-2790-ga22b3e022c2b45047a28d901042888eb77620499 Author: Christophe Lyon Date: Fri Aug 6 14:06:44 2021 + arm: Fix typos for reorder assembler architecture directives [PR101723] Two tests had typos preventing them from passing, committed as obvious. 2021-08-06 Christophe Lyon gcc/testsuite/ PR target/101723 * gcc.target/arm/attr-neon3.c: Fix typo. * gcc.target/arm/pragma_fpu_attribute_2.c: Fix typo.
[Bug rtl-optimization/101260] [10/11/12 Regression] Backport 27381e78925 to GCC 11
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101260 --- Comment #10 from Stefan Schulze Frielinghaus --- In regcprop we call find_oldest_value_reg which itself calls maybe_mode_change (TImode, TImode, DImode, 10, 18) where we have regno += subreg_regno_offset (regno, orig_mode, offset, new_mode); The call is made where offset equals 8 which is wrong since we are interested in the high part which is contained in r10 and not r11. The following patch fixes this: diff --git a/gcc/regcprop.c b/gcc/regcprop.c index d2a01130fe1..0e1ac12458a 100644 --- a/gcc/regcprop.c +++ b/gcc/regcprop.c @@ -414,9 +414,14 @@ maybe_mode_change (machine_mode orig_mode, machine_mode copy_mode, copy_nregs, _per_reg)) return NULL_RTX; poly_uint64 copy_offset = bytes_per_reg * (copy_nregs - use_nregs); - poly_uint64 offset - = subreg_size_lowpart_offset (GET_MODE_SIZE (new_mode) + copy_offset, - GET_MODE_SIZE (orig_mode)); + poly_uint64 offset = +#if WORDS_BIG_ENDIAN + subreg_size_highpart_offset +#else + subreg_size_lowpart_offset +#endif + (GET_MODE_SIZE (new_mode) + copy_offset, +GET_MODE_SIZE (orig_mode)); regno += subreg_regno_offset (regno, orig_mode, offset, new_mode); if (targetm.hard_regno_mode_ok (regno, new_mode)) return gen_raw_REG (new_mode, regno); With the patch (insn 234 222 235 14 (set (reg:DI 10 %r10 [ a ]) (reg:DI 18 %f4)) 1376 {*movdi_64} (nil)) is first modified into a noop (insn 234 222 235 14 (set (reg:DI 10 %r10 [ a ]) (reg:DI 10 %r10 [18])) 1376 {*movdi_64} (nil)) and then deleted within regcprop.
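To see why an offset of 8 selects the wrong register, consider a deliberately simplified model. It is an assumption for illustration, not GCC's actual subreg offset logic (which also distinguishes byte and word endianness): a 16-byte value occupies two consecutive 8-byte registers starting at r10, and on a big-endian target the first register holds the high part. All function names below are hypothetical.

```cpp
#include <cstdint>

// Simplified model: byte offset of the low-order `inner` bytes inside an
// `outer`-byte value. On big-endian the low part sits at the end.
constexpr std::uint64_t lowpart_offset(std::uint64_t inner, std::uint64_t outer,
                                       bool big_endian) {
  return big_endian ? outer - inner : 0;
}

// Byte offset of the high-order `inner` bytes: the mirror image.
constexpr std::uint64_t highpart_offset(std::uint64_t inner, std::uint64_t outer,
                                        bool big_endian) {
  return big_endian ? 0 : outer - inner;
}

// Register selected by a byte offset when the value starts at `first_regno`.
constexpr unsigned regno_for_offset(unsigned first_regno, std::uint64_t offset,
                                    std::uint64_t bytes_per_reg) {
  return first_regno + static_cast<unsigned>(offset / bytes_per_reg);
}

// Big-endian, 16-byte value in r10/r11, 8-byte part requested.
static_assert(regno_for_offset(10, lowpart_offset(8, 16, true), 8) == 11,
              "offset 8 lands in r11 (low part)");
static_assert(regno_for_offset(10, highpart_offset(8, 16, true), 8) == 10,
              "offset 0 lands in r10 (high part)");
```

In this model the low-part offset of 8 steps to r11, while the high part described in the comment lives at offset 0 in r10, which matches the register the patch ends up selecting.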
[Bug c++/101803] New: CTAD fails for nested designated initializers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101803 Bug ID: 101803 Summary: CTAD fails for nested designated initializers Product: gcc Version: 11.2.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: h2+bugs at fsfe dot org Target Milestone: --- See the code below. Two initializing statements fail to build, although they shouldn't influence CTAD of the outer type at all (IMHO). struct Inner { int i = 0; }; struct Outer2 { Inner s{}; }; template struct Outer { Inner s{}; }; int main() { Outer2 o21{ .s = {} };// works Outer2 o22{ .s = Inner{ .i = 1} };// works Outer2 o23{ .s = { .i = 1} }; // works Outer2 o24{ .s{} }; // works Outer2 o25{ .s{Inner{ .i = 1} } };// works Outer2 o26{ .s{ .i = 1} };// works Outer o1{ .s = {} };// works Outer o2{ .s = Inner{ .i = 1} };// works //Outer o3{ .s = { .i = 1} }; // does not Outer o4{ .s{} }; // works Outer o5{ .s{Inner{ .i = 1} } };// works //Outer o6{ .s{ .i = 1} };// does not }
[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947 Bug 53947 depends on bug 101801, which changed state. Bug 101801 Summary: vect_worthwhile_without_simd_p is broken https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101801 What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED
[Bug tree-optimization/101801] vect_worthwhile_without_simd_p is broken
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101801 Richard Biener changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #6 from Richard Biener --- Fixed.
[Bug tree-optimization/101801] vect_worthwhile_without_simd_p is broken
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101801 --- Comment #5 from CVS Commits --- The master branch has been updated by Richard Biener : https://gcc.gnu.org/g:f31da42e047e8018ca6ad9809273bc7efb6ffcaf commit r12-2789-gf31da42e047e8018ca6ad9809273bc7efb6ffcaf Author: Richard Biener Date: Fri Aug 6 14:39:05 2021 +0200 tree-optimization/101801 - remove vect_worthwhile_without_simd_p This removes the cost part of vect_worthwhile_without_simd_p, retaining only the correctness bits. The reason is that the cost heuristic do not properly account for SLP plus the check whether "without simd" applies misfires for AVX512 mask vectors at the moment, leading to missed vectorizations there. Any costing decision should take place in the cost modeling, no single stmt is to disable all vectorization on its own. 2021-08-06 Richard Biener PR tree-optimization/101801 * tree-vectorizer.h (vect_worthwhile_without_simd_p): Rename... (vect_can_vectorize_without_simd_p): ... to this. * tree-vect-loop.c (vect_worthwhile_without_simd_p): Rename... (vect_can_vectorize_without_simd_p): ... to this and fold in vect_min_worthwhile_factor. (vect_min_worthwhile_factor): Remove. (vectorizable_reduction): Adjust and remove the cost part. * tree-vect-stmts.c (vectorizable_shift): Likewise. (vectorizable_operation): Likewise.
[Bug tree-optimization/88531] Index data types when targeting AVX-512 vectorization with gather/scatter
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88531 --- Comment #13 from H.J. Lu --- Here is the equivalent C code: --- #include <stdint.h> #define loop_t uint32_t #define idx_t uint32_t void loop(double * const __restrict__ dst, double const * const __restrict__ src, idx_t const * const __restrict__ idx, loop_t const begin, loop_t const end) { for (loop_t i = begin; i < end; ++i) dst[i] = 42.0 * src[idx[i]]; } ---
[Bug tree-optimization/88531] Index data types when targeting AVX-512 vectorization with gather/scatter
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88531 --- Comment #12 from H.J. Lu --- For some reason, -march=x86-64 -mx32 and -march=x86-64 -m32 -mfpmath=sse won't vectorize the loop.
[Bug tree-optimization/88531] Index data types when targeting AVX-512 vectorization with gather/scatter
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88531 --- Comment #11 from Richard Biener --- OK, that probably was an unintended side-effect of now doing /* Include the conversion if it is widening and we're using the IFN path or the target can handle the converted from offset or the current size is not already the same as the data vector element size. */ if ((TYPE_PRECISION (TREE_TYPE (op0)) < TYPE_PRECISION (TREE_TYPE (off))) && (use_ifn_p || (DR_IS_READ (dr) ? (targetm.vectorize.builtin_gather && targetm.vectorize.builtin_gather (vectype, TREE_TYPE (op0), scale)) : (targetm.vectorize.builtin_scatter && targetm.vectorize.builtin_scatter (vectype, TREE_TYPE (op0), scale))) || !operand_equal_p (TYPE_SIZE (TREE_TYPE (off)), TYPE_SIZE (TREE_TYPE (vectype)), 0))) { off = op0; offtype = TREE_TYPE (off); STRIP_NOPS (off); continue; } that is, we no longer try to consume the conversion because with the conversion source the gather is not supported and the offset is also already of the size of the data. We should probably add this testcase to make sure any other heuristic improvements in the above code don't break it again.
[Bug tree-optimization/88531] Index data types when targeting AVX-512 vectorization with gather/scatter
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88531 --- Comment #10 from H.J. Lu --- It is fixed by r12-2733.
[Bug target/101797] ICE on valid code at -O2 and -O3: in extract_constrain_insn, at recog.c:2670
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101797 Uroš Bizjak changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #4 from Uroš Bizjak --- Fixed.
[Bug target/101797] ICE on valid code at -O2 and -O3: in extract_constrain_insn, at recog.c:2670
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101797 --- Comment #3 from CVS Commits --- The master branch has been updated by Uros Bizjak : https://gcc.gnu.org/g:cd04e829c3ae244abd711e2597f8b72d6c58c713 commit r12-2787-gcd04e829c3ae244abd711e2597f8b72d6c58c713 Author: Uros Bizjak Date: Fri Aug 6 14:21:27 2021 +0200 i386: Fix conditional move reg-to-reg move elimination peepholes [PR101797] Add missing operand predicate, otherwise any RTX will match. 2021-08-06 Uroš Bizjak gcc/ PR target/101797 * config/i386/i386.md (cmove reg-to-reg move elimination peephole2s): Add general_gr_operand predicate to operand 3. gcc/testsuite/ PR target/101797 * gcc.target/i386/pr101797.c: New test.
[Bug gcov-profile/99440] [GCOV] Wrong coverage with callsite
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99440 --- Comment #1 from Yang Wang --- And also line 26 should be executed 5 times
[Bug c++/53660] function pointer conversion function template with nested-name-specifier ignored
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53660 David Krauss changed: What|Removed |Added Resolution|--- |INVALID Status|UNCONFIRMED |RESOLVED --- Comment #3 from David Krauss --- (The standardese mentions "for each … where the conversion-type-id denotes a type" which excludes the case of being a dependent type.)
[Bug tree-optimization/101801] vect_worthwhile_without_simd_p is broken
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101801 --- Comment #4 from Richard Biener --- (In reply to rsand...@gcc.gnu.org from comment #3) > So maybe a less invasive fix would be to add && !VECTOR_BOOLEAN_TYPE_P > to the condition. Still no objection to killing it off instead though. Yeah, I've pondered adding && !mask_op_p - but I'm not sure if VECTOR_BOOLEAN_TYPE_P can be generic vectors (I guess not at the moment). So detecting what is a generic vector and what not seems fragile. Btw, we _do_ vectorize using vector booleans, it's just the check for vect_worthwhile_without_simd_p misclassifies the non-vector-mode types as "generic". I'm currently checking whether there's any testsuite fallout in making vect_worthwhile_without_simd_p return true unconditionally.
[Bug analyzer/97110] [meta-bug] tracker bug for supporting C++ in -fanalyzer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97110 Bug 97110 depends on bug 101781, which changed state. Bug 101781 Summary: make_unique generating a warning with -fanalyzer https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101781 What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |WORKSFORME
[Bug analyzer/101781] make_unique generating a warning with -fanalyzer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101781 KL changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |WORKSFORME --- Comment #2 from KL --- Ok I will wait then. Thanks! Solved
[Bug tree-optimization/101801] vect_worthwhile_without_simd_p is broken
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101801 --- Comment #3 from rsandifo at gcc dot gnu.org --- So maybe a less invasive fix would be to add && !VECTOR_BOOLEAN_TYPE_P to the condition. Still no objection to killing it off instead though.
[Bug tree-optimization/101801] vect_worthwhile_without_simd_p is broken
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101801 --- Comment #2 from rsandifo at gcc dot gnu.org --- Never really looked at the SIMD-without-SIMD stuff. When I first saw this, I was hoping you were suggesting killing off the whole thing :-) So no, no objection from me. It sounds like in the motivating case we should really be vectorising the mask operations as vectors of booleans though, even if the vectors have an integer TYPE_MODE.
[Bug c++/99901] [9/10 Regression] static const class var implemented with constexpr doesn't emit symbols in C++17 mode
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99901 Jakub Jelinek changed: What|Removed |Added CC||koncek.marian at gmail dot com --- Comment #7 from Jakub Jelinek --- *** Bug 101800 has been marked as a duplicate of this bug. ***
[Bug c++/101800] mingw-64 link error with constexpr static variable definition
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101800 Jakub Jelinek changed: What|Removed |Added Resolution|--- |DUPLICATE CC||jakub at gcc dot gnu.org Status|UNCONFIRMED |RESOLVED --- Comment #1 from Jakub Jelinek --- Dup *** This bug has been marked as a duplicate of bug 99901 ***
[Bug tree-optimization/101802] Vectorization can end up creating vector bool CTORs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101802 Richard Biener changed: What|Removed |Added Blocks||53947 Keywords||missed-optimization --- Comment #1 from Richard Biener --- See PR101636 for a testcase where this triggers. Referenced Bugs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947 [Bug 53947] [meta-bug] vectorizer missed-optimizations
[Bug c++/70413] Class template names in anonymous namespaces are not globally unique
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70413 Jonathan Wakely changed: What|Removed |Added See Also|https://gcc.gnu.org/bugzill | |a/show_bug.cgi?id=101695| CC||tom_maly at volny dot cz --- Comment #5 from Jonathan Wakely --- *** Bug 101695 has been marked as a duplicate of this bug. ***
[Bug c++/101695] calling incorrect destructor of same-name class in anonymous namespaces in separate translation units
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101695 Jonathan Wakely changed: What|Removed |Added Resolution|--- |DUPLICATE Status|NEW |RESOLVED See Also|https://gcc.gnu.org/bugzill | |a/show_bug.cgi?id=70413 | --- Comment #5 from Jonathan Wakely --- Definitely a dup *** This bug has been marked as a duplicate of bug 70413 ***
[Bug tree-optimization/101802] New: Vectorization can end up creating vector bool CTORs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101802 Bug ID: 101802 Summary: Vectorization can end up creating vector bool CTORs Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: rguenth at gcc dot gnu.org Target Milestone: --- When SLP vectorization creates code for external bool defs that are used in condition composition we can end up combining them with vector booleans and thus push types like vector(16) on them. vect_create_constant_vectors then converts the components to signed-boolean:1 using _3 = _2 ? -1 : 0 and builds a CTOR with signed-boolean:1 components. It's probably better to code-generate the "conversion" to vector bool by using a CTOR with the original bools and then producing the vector bool mask by a comparison against zero (if supported?).
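The two code-generation strategies contrasted above can be modelled in scalar code. This is only a sketch under stated assumptions: the 4-lane width, the `int8_t` lane type standing in for the signed one-bit boolean, and both function names are hypothetical, and real vector code would use a target compare instruction rather than loops.

```cpp
#include <array>
#include <cstdint>

constexpr int kLanes = 4;                          // hypothetical vector width
using BoolVec = std::array<bool, kLanes>;          // the external bool defs
using MaskVec = std::array<std::int8_t, kLanes>;   // 0 or -1 per lane

// Current approach: convert each component with `b ? -1 : 0`,
// then build the constructor from the converted values.
MaskVec mask_by_select(const BoolVec& bools) {
  MaskVec mask{};
  for (int i = 0; i < kLanes; ++i)
    mask[i] = static_cast<std::int8_t>(bools[i] ? -1 : 0);
  return mask;
}

// Suggested approach: build the constructor from the original bools first,
// then derive the mask with one lane-wise comparison against zero.
MaskVec mask_by_compare(const BoolVec& bools) {
  std::array<std::uint8_t, kLanes> ctor{};
  for (int i = 0; i < kLanes; ++i)
    ctor[i] = bools[i];                            // plain 0/1 lanes
  MaskVec mask{};
  for (int i = 0; i < kLanes; ++i)                 // stands in for a vector compare
    mask[i] = static_cast<std::int8_t>(-(ctor[i] != 0));
  return mask;
}
```

Both functions return the same mask for any input; the difference in the real compiler is where the per-element work happens, not the resulting values.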
[Bug tree-optimization/101801] vect_worthwhile_without_simd_p is broken
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101801 Richard Biener changed: What|Removed |Added CC||rsandifo at gcc dot gnu.org Blocks||53947 Keywords||missed-optimization --- Comment #1 from Richard Biener --- I'm not sure why we have vect_worthwhile_without_simd_p at all (it looks like a cost thing and thus should be an overall assessment and not local spread across vectorizable_*). Any objection to kill it off completely? Referenced Bugs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947 [Bug 53947] [meta-bug] vectorizer missed-optimizations
[Bug tree-optimization/101801] New: vect_worthwhile_without_simd_p is broken
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101801 Bug ID: 101801 Summary: vect_worthwhile_without_simd_p is broken Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: rguenth at gcc dot gnu.org Target Milestone: --- vect_worthwhile_without_simd_p is currently bool vect_worthwhile_without_simd_p (vec_info *vinfo, tree_code code) { loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo); unsigned HOST_WIDE_INT value; return (loop_vinfo && LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant (&value) && value >= vect_min_worthwhile_factor (code)); } which means it's never worthwhile to BB vectorize. Also the VF check doesn't honor SLP so that a fully SLPed loop with VF == 1 is never considered worthwhile to vectorize. I ran into this beast when looking at vectorization of mask condition operations like cond_mask1 & cond_mask2 which, for AVX512, have integer mode but vectorizable_operation does /* Worthwhile without SIMD support? Check only during analysis. */ if (!VECTOR_MODE_P (vec_mode) && !vec_stmt && !vect_worthwhile_without_simd_p (vinfo, code)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, "not worthwhile without SIMD support.\n"); return false; } and in my case with SLP the VF was indeed one and vectorization failed. I think the code should not look at the vectorization factor but instead at the vector type (and its number of components).
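Both failure modes described in this report follow from the shape of the predicate alone. A toy model, with a hypothetical `ToyLoopInfo` standing in for the loop vectorization info and a caller-supplied threshold in place of `vect_min_worthwhile_factor` (whose real values are not reproduced here):

```cpp
#include <optional>

// Hypothetical stand-in: loop info carrying a vectorization factor
// only when that factor is a compile-time constant.
struct ToyLoopInfo {
  std::optional<unsigned long> vect_factor;
};

// Same structure as the predicate in the report: requires loop info,
// a constant VF, and VF >= threshold. `loop` is null when the caller
// is doing basic-block (BB) vectorization.
bool toy_worthwhile_without_simd(const ToyLoopInfo* loop,
                                 unsigned long min_factor) {
  return loop != nullptr
         && loop->vect_factor.has_value()
         && *loop->vect_factor >= min_factor;
}
```

With a null `loop` the function can only return false, which models "never worthwhile to BB vectorize"; with a vectorization factor of 1 it returns false for any threshold above 1, which models the fully SLPed loop case.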
[Bug c++/101695] calling incorrect destructor of same-name class in anonymous namespaces in separate translation units
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101695 --- Comment #4 from Jonathan Wakely --- PR 70413 comment 3 has a suggested fix
[Bug c++/101695] calling incorrect destructor of same-name class in anonymous namespaces in separate translation units
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101695 Jonathan Wakely changed: What|Removed |Added See Also||https://gcc.gnu.org/bugzill ||a/show_bug.cgi?id=70413 --- Comment #3 from Jonathan Wakely --- Possibly a dup of PR 70413
[Bug c++/101695] calling incorrect destructor of same-name class in anonymous namespaces in separate translation units
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101695 Jonathan Wakely changed: What|Removed |Added Attachment #51268|0 |1 is obsolete|| --- Comment #2 from Jonathan Wakely --- Created attachment 51269 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51269=edit Tar file with reproducer Slightly further reduced. The bug is the visibility of this symbol: W _ZN8DelegateIFvPvEE4bindI11MemoryArenaXadL_ZNS4_7destroyIN12_GLOBAL__N_16TesterEEEvS0_RS2_PT_ That should have internal linkage. Because both aaa.o and bbb.o contain that as a weak symbol, the linker merges them and only keeps the first one, which runs the destroy<{aaa.cpp anonymous namespace}::Tester> specialization, which casts the void* to that type and so deletes it as the wrong type, which runs the wrong destructor, and decrements the wrong counter.
[Bug tree-optimization/101636] [11/12 Regression] ICE: verify_gimple failed (error: conversion of register to a different size in 'view_convert_expr')
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101636 --- Comment #6 from Richard Biener --- So what happens is that we have a vector(16) constructor _151 = {_150, _149, _148, _147, _146, _145, _144, _143, _142, _141, _140, _139, _138, _137, _136, _135}; fed by a series of _150 = _75 ? -1 : 0; stmts that compute a from a _Bool. We're now trying to vectorize that CTOR (I think that's good). Now, bool pattern detection doesn't consider a vector CTOR of to be a mask precision "sink" which means we end up with t.i:26:1: note: using boolean precision 32 for _49 = _17 != 0; t.i:26:1: note: using boolean precision 32 for _74 = _1 != 0; t.i:26:1: note: using boolean precision 32 for _75 = _73 & _74; t.i:26:1: note: using boolean precision 32 for _70 = _4 != 0; t.i:26:1: note: using boolean precision 32 for _71 = _69 & _70; ... because eventually the compares are 'int' loads. Now, there's of course the issue that the vectorizer produces this inefficient code because of similar issues when analyzing the following if-conversion result in BB vect mode from the loop vectorizer: _16 = MEM[(int *)a_81 + 60B]; _47 = _16 != 0; _45 = _47 & _49; iftmp.0_43 = _45 ? _16 : 0; MEM[(int *)e_82 + 60B] = iftmp.0_43; here we end up with the same precisions. I'm actually unsure how things should go here, vect_recog_bool_pattern seems to look at COND_EXPR conditions, but then it does else if (rhs_code == COND_EXPR && TREE_CODE (var) == SSA_NAME) { vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (lhs)); if (vectype == NULL_TREE) return NULL; /* Build a scalar type for the boolean result that when vectorized matches the vector type of the result in size and number of elements. 
*/ unsigned prec = vector_element_size (tree_to_poly_uint64 (TYPE_SIZE (vectype)), TYPE_VECTOR_SUBPARTS (vectype)); tree type = build_nonstandard_integer_type (prec, TYPE_UNSIGNED (TREE_TYPE (var))); if (get_vectype_for_scalar_type (vinfo, type) == NULL_TREE) return NULL; if (!check_bool_pattern (var, vinfo, bool_stmts)) return NULL; going the classic way of using a non-mask type. For the testcase in question check_bool_pattern fails though. But we fail in vectorizable_operation because for a MASK and we run into /* Worthwhile without SIMD support? Check only during analysis. */ if (!VECTOR_MODE_P (vec_mode) && !vec_stmt && !vect_worthwhile_without_simd_p (vinfo, code)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, "not worthwhile without SIMD support.\n"); return false; } that looks like an inefficiency (only triggering for low tripcount loops). Also vect_worthwhile_without_simd_p looks at the VF only which is insufficient for SLP. Even with that fixed the BB vectorization triggered from loop vect does not see the invariant compared defs of one arm of the bit-and so we just create another vector CTOR with and we repeat the same mistakes.
[Bug c++/101800] New: mingw-64 link error with constexpr static variable definition
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101800 Bug ID: 101800 Summary: mingw-64 link error with constexpr static variable definition Product: gcc Version: 10.3.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: koncek.marian at gmail dot com Target Milestone: --- Version: x86_64-w64-mingw32-g++ (GCC) 10.3.1 20210422 (Fedora MinGW 10.3.1-1.fc34) Using 3 source files: #pragma once struct S { const static int value; }; #include "test.hpp" constexpr int S::value = 0; #include "test.hpp" int main() { return S::value; } Compile with: $ x86_64-w64-mingw32-g++ -std=c++2a -c test.cpp && x86_64-w64-mingw32-g++ -std=c++2a test.o main.cpp Results in: /usr/lib/gcc/x86_64-w64-mingw32/10.3.1/../../../../x86_64-w64-mingw32/bin/ld: /tmp/ccANpWMi.o:main.cpp:(.rdata$.refptr._ZN1S5valueE[.refptr._ZN1S5valueE]+0x0): undefined reference to `S::value' collect2: error: ld returned 1 exit status Whereas plain g++ is able to link the files in the executable. Changing the `constexpr` definition to `const` makes mingw link the files successfully.
[Bug c++/101695] calling incorrect destructor of same-name class in anonymous namespaces in separate translation units
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101695 Jonathan Wakely changed: What|Removed |Added Attachment #51225|0 |1 is obsolete|| --- Comment #1 from Jonathan Wakely --- Created attachment 51268 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51268=edit Tar file with reproducer This is a slightly reduced version of the repro, containing only the source files. It can be built using g++ *.cpp with any version from 4.7.0 onwards to demonstrate the bug.
[Bug c++/101695] calling incorrect destructor of same-name class in anonymous namespaces in separate translation units
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101695 Jonathan Wakely changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Last reconfirmed||2021-08-06
[Bug tree-optimization/101512] [11 Regression] ICE in maybe_trim_constructor_store, at tree-ssa-dse.c:379
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101512 Richard Biener changed: What|Removed |Added Resolution|--- |FIXED Known to fail||11.2.0 Known to work||11.2.1 Status|ASSIGNED|RESOLVED --- Comment #8 from Richard Biener --- Fixed.
[Bug target/101505] [10/11 Regression] ICE: verify_gimple failed (error: incompatible types in 'PHI' argument 0)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101505 --- Comment #9 from CVS Commits --- The releases/gcc-11 branch has been updated by Richard Biener : https://gcc.gnu.org/g:c891d268c446bd01f82e256e24580afcb3b526ae commit r11-8831-gc891d268c446bd01f82e256e24580afcb3b526ae Author: Richard Biener Date: Mon Jul 19 13:29:16 2021 +0200 tree-optimization/101505 - properly determine stmt precision for PHIs Loop vectorization pattern recog fails to walk PHIs when determining stmt precisions. This fails to recognize non-mask uses for bools in PHIs and outer loop vectorization. 2021-07-19 Richard Biener PR tree-optimization/101505 * tree-vect-patterns.c (vect_determine_precisions): Walk PHIs also for loop vectorization. * gcc.dg/vect/pr101505.c: New testcase. (cherry picked from commit 8df3ee8f7d85d0708f3c3ca96b55c9230c2ae9f0)
[Bug tree-optimization/101512] [11 Regression] ICE in maybe_trim_constructor_store, at tree-ssa-dse.c:379
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101512 --- Comment #7 from CVS Commits --- The releases/gcc-11 branch has been updated by Richard Biener : https://gcc.gnu.org/g:129bf4f2efc0ec22ee14affd8c8a3bfe07896356 commit r11-8830-g129bf4f2efc0ec22ee14affd8c8a3bfe07896356 Author: Richard Biener Date: Wed Jul 21 09:14:24 2021 +0200 c/101512 - fix missing address-taking in c_common_mark_addressable_vec c_common_mark_addressable_vec fails to look through C_MAYBE_CONST_EXPR in the case it isn't at the toplevel. 2021-07-21 Richard Biener PR c/101512 gcc/c-family/ * c-common.c (c_common_mark_addressable_vec): Look through C_MAYBE_CONST_EXPR even if not at the toplevel. gcc/testsuite/ * gcc.dg/torture/pr101512.c: New testcase. (cherry picked from commit e63d76234d18cac731c4f3610d513bd8b39b5520)
[Bug c++/85087] call to a non-const member function on a const lvalue accepted
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85087 Jonathan Wakely changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |INVALID --- Comment #3 from Jonathan Wakely --- (In reply to Martin Sebor from comment #0) > While looking at bug 85043 I noticed that in the test case below, GCC > correctly rejects the attempt to convert the const reference to B to A in > the call to g() No, it allows the conversion, but that produces an rvalue which can't bind to the A& parameter of g. > but it accepts the same invalid conversion in the context > where a non-const member function on the result of the conversion is > called. Other compilers reject both conversions. Why should it be rejected? ((A)b) calls the conversion operator to get a const A& and then initializes an rvalue of type A from that. The rvalue is not const, and you can call the member function. i.e. equivalent to: static_cast<A>(b.operator const A&()).f(); or: A(b).f(); both of which are accepted, as they should be. EDG accepts the static_cast version, but not A(b).f(), I don't know why. But I think EDG has the bug here, not GCC.
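The argument in this comment can be made concrete with a small sketch. The class bodies below are assumptions made for illustration (the original test case only declares the members); a counter is added so the effect of calling the non-const function on the temporary is observable, and the copy constructor is declared explicitly so that no implicit move constructor takes part in overload resolution.

```cpp
struct A {
  explicit A(int start) : n(start) {}
  A(const A&) = default;      // user-declared copy ctor: no implicit move ctor
  int f() { return ++n; }     // non-const member function
  int n;
};

struct B {
  A a{0};
  operator const A&() const { return a; }  // conversion yields a const lvalue
};

// Each form initializes a fresh, non-const temporary A from the const A&,
// so calling the non-const f() on it is allowed and leaves b.a untouched.
int via_c_cast(const B& b)      { return ((A)b).f(); }
int via_static_cast(const B& b) { return static_cast<A>(b.operator const A&()).f(); }
int via_functional(const B& b)  { return A(b).f(); }
```

Each function returns 1 because `f()` increments the copy, while `b.a.n` stays 0: the const object reached through the conversion operator is never modified.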