[Bug libstdc++/115040] Missed optimization opportunity in std::find of std::vector elements
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115040 --- Comment #9 from AK --- Thanks for merging the patch!
[Bug libstdc++/115040] Missed optimization opportunity in std::find of std::vector elements
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115040 --- Comment #6 from AK --- The duplicate part of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88545, the first loop, will get fixed with Jonathan's patch.
[Bug libstdc++/88545] std::find compile to memchr in trivial random access cases (patch)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88545 --- Comment #10 from AK --- With this patch find of int8_t gets converted to memchr. Using testcase from https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115040 as example. With the patch posted in https://gcc.gnu.org/pipermail/gcc-patches/2024-June/653731.html ``` bool find_epi8(const std::vector& v) { return std::find(v.begin(), v.end(), 42) != v.end(); } ``` $ gcc -O3 -ftree-vectorize -march=pantherlake test.cpp -S -o test.s ``` .globl _Z9find_epi8RKSt6vectorIaSaIaEE .type _Z9find_epi8RKSt6vectorIaSaIaEE, @function _Z9find_epi8RKSt6vectorIaSaIaEE: .LFB1535: .cfi_startproc pushq %rbx .cfi_def_cfa_offset 16 .cfi_offset 3, -16 movq 8(%rdi), %rbx movq (%rdi), %rdi testq %rdi, %rdi je .L2 movq %rbx, %rdx subq %rdi, %rdx xorl %eax, %eax testq %rdx, %rdx jle .L1 movl $42, %esi call memchr movq %rax, %rdx cmpq %rax, %rbx setne %al testq %rdx, %rdx setne %dl andl %edx, %eax .L1: popq %rbx .cfi_remember_state .cfi_def_cfa_offset 8 ret .p2align 4,,10 .p2align 3 .L2: .cfi_restore_state cmpq $3, %rbx jg .L4 cmpq $2, %rbx je .L10 cmpq $3, %rbx je .L6 xorl %eax, %eax cmpq $1, %rbx jne .L1 .L7: cmpb $42, (%rdi) sete %al cmpq %rdi, %rbx setne %dl andl %edx, %eax popq %rbx .cfi_def_cfa_offset 8 ret .L10: .cfi_restore_state xorl %edx, %edx movl $1, %edi .L5: cmpb $42, (%rdx) movl $1, %eax jne .L7 popq %rbx .cfi_remember_state .cfi_def_cfa_offset 8 ret .L6: .cfi_restore_state cmpb $42, 0 movl $1, %eax je .L1 movl $2, %edi movl $1, %edx jmp .L5 .cfi_endproc .section .text.unlikely .cfi_startproc .type _Z9find_epi8RKSt6vectorIaSaIaEE.cold, @function _Z9find_epi8RKSt6vectorIaSaIaEE.cold: .LFSB1535: .L4: .cfi_def_cfa_offset 16 .cfi_offset 3, -16 movzbl 0, %eax ud2 .cfi_endproc .LFE1535: .text .size _Z9find_epi8RKSt6vectorIaSaIaEE, .-_Z9find_epi8RKSt6vectorIaSaIaEE .section .text.unlikely .size _Z9find_epi8RKSt6vectorIaSaIaEE.cold, .-_Z9find_epi8RKSt6vectorIaSaIaEE.cold ```
[Bug libstdc++/88545] std::find compile to memchr in trivial random access cases (patch)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88545 --- Comment #7 from AK --- Is there a plan to push a patch for this?
[Bug libstdc++/88545] std::find compile to memchr in trivial random access cases (patch)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88545 --- Comment #6 from AK ---
> We can use memchr to find a char in a range of signed char, or even to find
> an int in a range of signed char, as long as we're careful about values.

+1, this approach should fix the bug I reported: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115040
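To illustrate what "careful about values" could mean when searching a range of signed char for an int, here is a minimal sketch. The helper name `find_schar` is hypothetical, and this is not the libstdc++ patch itself; it only shows the range check that has to happen before the value is narrowed to a byte for memchr.

```cpp
#include <cstddef>
#include <cstring>
#include <limits>

// Hypothetical helper (not libstdc++ code): returns a pointer to the first
// element equal to `value`, or `last` if there is none.
inline const signed char* find_schar(const signed char* first,
                                     const signed char* last, int value) {
    // An int outside the range of signed char can never compare equal to an
    // element, so it must be rejected before it is truncated to a byte.
    if (value < std::numeric_limits<signed char>::min() ||
        value > std::numeric_limits<signed char>::max())
        return last;

    const std::size_t n = static_cast<std::size_t>(last - first);
    if (n == 0)
        return last;  // avoid handing a possibly null pointer to memchr

    // memchr compares bytes as unsigned char, so narrow through signed char.
    const unsigned char byte =
        static_cast<unsigned char>(static_cast<signed char>(value));
    const void* hit = std::memchr(first, byte, n);
    return hit ? static_cast<const signed char*>(hit) : last;
}
```

Without the range check, a search for 253 would wrongly match an element holding -3, because both have the same low byte.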
[Bug libstdc++/88545] std::find compile to memchr in trivial random access cases (patch)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88545 AK changed: What|Removed |Added CC||hiraditya at msn dot com --- Comment #5 from AK ---
> I think we're going to remove the manual loop unrolling in __find_if for GCC
> 15, which should allow the compiler to optimize it better, potentially
> auto-vectorizing. That might make memchr less advantageous, but I think it's
> worth doing anyway.

Even with code-size flags (-Os), memchr still gives the best of both worlds, since auto-vectorizing increases code size.
[Bug tree-optimization/115041] New: Missed optimization opportunity in std::find of std::vector elements
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115041 Bug ID: 115041 Summary: Missed optimization opportunity in std::find of std::vector elements Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: hiraditya at msn dot com Target Milestone: --- https://gcc.godbolt.org/z/s3hv15935 ``` #include #include #include bool find_epi8(const std::vector& v) { return std::find(v.begin(), v.end(), 42) != v.end(); } bool find_epi32(const std::vector& v) { return std::find(v.begin(), v.end(), 42) != v.end(); } ``` $ gcc -O3 -ftree-vectorize -march=pantherlake ``` find_epi8(std::vector > const&): mov rcx, QWORD PTR [rdi+8] mov rdx, QWORD PTR [rdi] mov rsi, rcx sub rsi, rdx mov rax, rsi sar rax, 2 testrax, rax jle .L2 lea rax, [rdx+rax*4] jmp .L8 .L3: cmp BYTE PTR [rdx+1], 42 je .L23 cmp BYTE PTR [rdx+2], 42 je .L24 cmp BYTE PTR [rdx+3], 42 je .L25 add rdx, 4 cmp rdx, rax je .L26 .L8: cmp BYTE PTR [rdx], 42 jne .L3 .L21: cmp rcx, rdx setne al ret .L26: mov rsi, rcx sub rsi, rdx .L2: cmp rsi, 2 je .L9 cmp rsi, 3 je .L10 cmp rsi, 1 je .L11 xor eax, eax ret .L10: cmp BYTE PTR [rdx], 42 je .L21 add rdx, 1 .L9: cmp BYTE PTR [rdx], 42 je .L21 add rdx, 1 .L11: cmp BYTE PTR [rdx], 42 seteal cmp rcx, rdx setne dl and eax, edx ret .L23: add rdx, 1 cmp rcx, rdx setne al ret .L24: add rdx, 2 cmp rcx, rdx setne al ret .L25: add rdx, 3 cmp rcx, rdx setne al ret find_epi32(std::vector > const&): mov rcx, QWORD PTR [rdi+8] mov rdx, QWORD PTR [rdi] mov rax, rcx sub rax, rdx mov rsi, rax sar rax, 4 sar rsi, 2 testrax, rax jle .L28 sal rax, 4 add rax, rdx jmp .L34 .L29: cmp DWORD PTR [rdx+4], 42 je .L48 cmp DWORD PTR [rdx+8], 42 je .L49 cmp DWORD PTR [rdx+12], 42 je .L50 add rdx, 16 cmp rdx, rax je .L51 .L34: cmp DWORD PTR [rdx], 42 jne .L29 .L47: cmp rcx, rdx setne al ret .L51: mov rsi, rcx sub rsi, rdx sar rsi, 2 .L28: cmp rsi, 2 je .L35 cmp rsi, 3 je .L36 cmp rsi, 1 je .L37 xor eax, eax ret 
.L36: cmp DWORD PTR [rdx], 42 je .L47 add rdx, 4 .L35: cmp DWORD PTR [rdx], 42 je .L47 add rdx, 4 .L37: cmp DWORD PTR [rdx], 42 seteal cmp rcx, rdx setne dl and eax, edx ret .L48: add rdx, 4 cmp rcx, rdx setne al ret .L49: add rdx, 8 cmp rcx, rdx setne al ret .L50: add rdx, 12 cmp rcx, rdx setne al ret ``` clang lowers both the calls to (w)memchr
[Bug tree-optimization/107263] Memcpy not elided when initializing struct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107263 AK changed: What|Removed |Added CC||hiraditya at msn dot com --- Comment #3 from AK --- Seems like a duplicate of #59863 ?
[Bug middle-end/59863] const array in function is placed on stack
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59863 AK changed: What|Removed |Added CC||hiraditya at msn dot com --- Comment #9 from AK --- *** Bug 114342 has been marked as a duplicate of this bug. ***
[Bug middle-end/114342] suboptimal codegen of vector::vector(range)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114342 AK changed: What|Removed |Added Resolution|--- |DUPLICATE Version|unknown |14.0 Status|NEW |RESOLVED --- Comment #3 from AK --- I see. Marking as a duplicate. Thanks for clarifying! *** This bug has been marked as a duplicate of bug 59863 ***
[Bug c++/114342] New: suboptimal codegen of vector::vector(range)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114342 Bug ID: 114342 Summary: suboptimal codegen of vector::vector(range) Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: hiraditya at msn dot com Target Milestone: --- #include #include std::vector td() { int arr[]{-5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15,-5, 10, 15 -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10,-5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10,-5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10,-5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10,-5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10, 15, -5, 10,}; auto b = std::ranges::begin(arr); auto e = std::ranges::end(arr); std::vector dd(b, e); return dd; } What is the reason for calling `rep movsq` twice? $ gcc -O3 -std=c++23 ``` td(): pushrbp mov esi, OFFSET FLAT:.LC0 mov ecx, 55 pxorxmm0, xmm0 pushrbx mov rbx, rdi sub rsp, 456 mov QWORD PTR [rbx+16], 0 mov rbp, rsp movups XMMWORD PTR [rbx], xmm0 mov rdi, rbp rep movsq mov eax, DWORD PTR [rsi] mov DWORD PTR [rdi], eax mov edi, 444 calloperator new(unsigned long) lea rdx, [rax+444] mov QWORD PTR [rbx], rax lea rdi, [rax+8] mov rsi, rbp mov QWORD PTR [rbx+16], rdx mov rcx, QWORD PTR [rsp] and rdi, -8 mov QWORD PTR [rax], rcx mov rcx, QWORD PTR [rsp+436] mov QWORD PTR [rax+436], rcx sub rax, rdi sub rsi, rax add eax, 444 shr eax, 3 mov ecx, eax mov rax, rbx rep movsq mov QWORD PTR [rbx+8], rdx add rsp, 456 pop rbx pop rbp ret mov rbp, rax jmp .L2 td() [clone .cold]: .L2: mov rdi, QWORD PTR [rbx] mov rsi, QWORD PTR [rbx+16] sub rsi, rdi testrdi, rdi je .L3 calloperator delete(void*, unsigned long) .L3: mov rdi, rbp call_Unwind_Resume ``` https://godbolt.org/z/5333db8Px
[Bug c++/111806] g++ generates better code for variant at -Os compared to -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111806 --- Comment #1 from AK --- It seems like we could 'sink' the 4 common instructions (of .L2) at -O3:
.L2:
  add rsp, 48
  xor eax, eax
  pop rbx
  ret
Or is it due to some kind of tail duplication?
[Bug c++/111806] New: g++ generates better code for variant at -Os compared to -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111806 Bug ID: 111806 Summary: g++ generates better code for variant at -Os compared to -O3 Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: hiraditya at msn dot com Target Milestone: --- #include #include #include int foo() { std::variant v {"abc"}; std::cout << std::get<0>(v); return 0; } g++ -O3 -std=c++20 -g0 -fno-exceptions foo(): .LFB2484: pushrbx mov eax, 25185 mov edx, 3 mov edi, OFFSET FLAT:_ZSt4cout sub rsp, 48 lea rbx, [rsp+16] mov WORD PTR [rsp+16], ax mov rsi, rbx mov QWORD PTR [rsp], rbx mov BYTE PTR [rsp+18], 99 mov QWORD PTR [rsp+8], 3 mov BYTE PTR [rsp+19], 0 mov BYTE PTR [rsp+32], 0 callstd::basic_ostream >& std::__ostream_insert >(std::basic_ostream >&, char const*, long) cmp BYTE PTR [rsp+32], 0 je .L5 .L2: add rsp, 48 xor eax, eax pop rbx ret .L5: mov rdi, QWORD PTR [rsp] cmp rdi, rbx je .L2 mov rax, QWORD PTR [rsp+16] lea rsi, [rax+1] calloperator delete(void*, unsigned long) add rsp, 48 xor eax, eax pop rbx ret .LFE2484: g++ -Os -std=c++20 -g0 -fno-exceptions foo(): .LFB2463: pushrbx mov edx, 3 mov edi, OFFSET FLAT:_ZSt4cout sub rsp, 48 lea rbx, [rsp+24] mov WORD PTR [rsp+24], 25185 mov rsi, rbx mov QWORD PTR [rsp+8], rbx mov BYTE PTR [rsp+26], 99 mov QWORD PTR [rsp+16], 3 mov BYTE PTR [rsp+27], 0 mov BYTE PTR [rsp+40], 0 callstd::basic_ostream >& std::__ostream_insert >(std::basic_ostream >&, char const*, long) cmp BYTE PTR [rsp+40], 0 jne .L2 mov rdi, QWORD PTR [rsp+8] cmp rdi, rbx je .L2 mov rax, QWORD PTR [rsp+24] lea rsi, [rax+1] calloperator delete(void*, unsigned long) .L2: add rsp, 48 xor eax, eax pop rbx ret .LFE2463: https://godbolt.org/z/3xKh35Mrv
[Bug c++/111805] New: suboptimal codegen of variant
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111805 Bug ID: 111805 Summary: suboptimal codegen of variant Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: hiraditya at msn dot com Target Milestone: --- #include #include std::string foo() { std::variant v {"abc"}; return std::get<0>(v); } g++-13.2 -O2 -std=c++20 foo[abi:cxx11](): lea rdx, [rdi+16] mov BYTE PTR [rdi+18], 99 mov rax, rdi mov QWORD PTR [rdi], rdx mov edx, 25185 mov WORD PTR [rdi+16], dx mov QWORD PTR [rdi+8], 3 mov BYTE PTR [rdi+19], 0 ret clang++ -O2 -std=c++20 foo():# @foo() mov rax, rdi mov byte ptr [rdi], 6 mov dword ptr [rdi + 1], 6513249 ret https://godbolt.org/z/nTv5rYanM
[Bug target/111420] relocation truncated to fit: R_RISCV_JAL against `.L12287'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111420 --- Comment #6 from AK --- To confirm what Andrew mentioned, the release build (-O3) built successfully.
[Bug target/111420] relocation truncated to fit: R_RISCV_JAL against `.L12287'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111420 AK changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |MOVED --- Comment #5 from AK --- Created: https://sourceware.org/bugzilla/show_bug.cgi?id=30855
[Bug target/111420] relocation truncated to fit: R_RISCV_JAL against `.L12287'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111420 --- Comment #4 from AK --- Good catch. I built at -O0 by mistake; I meant to build at -O3.
[Bug c/111420] relocation truncated to fit: R_RISCV_JAL against `.L12287'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111420 --- Comment #1 from AK --- I got this error while building clang (ninja clang) on a riscv machine. root@lpi4a:~# gcc -v Using built-in specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/usr/libexec/gcc/riscv64-linux-gnu/13/lto-wrapper Target: riscv64-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Debian 13.1.0-6' --with-bugurl=file:///usr/share/doc/gcc-13/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2,rust --prefix=/usr --with-gcc-major-version-only --program-suffix=-13 --program-prefix=riscv64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/libexec --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libitm --disable-libquadmath --disable-libquadmath-support --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --disable-multilib --with-arch=rv64gc --with-abi=lp64d --enable-checking=release --build=riscv64-linux-gnu --host=riscv64-linux-gnu --target=riscv64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-serialization=32 Thread model: posix Supported LTO compression algorithms: zlib zstd gcc version 13.1.0 (Debian 13.1.0-6) -- root@lpi4a:~# uname -a Linux lpi4a 5.10.113-g7b352f5ac2ba #1 SMP PREEMPT Wed Apr 12 12:06:11 UTC 2023 riscv64 GNU/Linux
[Bug c/111420] New: relocation truncated to fit: R_RISCV_JAL against `.L12287'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111420 Bug ID: 111420 Summary: relocation truncated to fit: R_RISCV_JAL against `.L12287' Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: hiraditya at msn dot com Target Milestone: --- CGBuiltin.cpp:(.text._ZN5clang7CodeGen15CodeGenFunction20EmitRISCVBuiltinExprEjPKNS_8CallExprENS0_15ReturnValueSlotE+0x10d0): relocation truncated to fit: R_RISCV_JAL against `.L12287' command: : && /usr/bin/c++ -fPIC -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -fno-lifetime-dse -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wno-maybe-uninitialized -Wno-nonnull -Wno-class-memaccess -Wno-redundant-move -Wno-pessimizing-move -Wno-noexcept-type -Wdelete-non-virtual-dtor -Wsuggest-override -Wno-comment -Wno-misleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual -fno-strict-aliasing -Wl,-z,defs -Wl,-z,nodelete -Wl,-rpath-link,/media/root/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/build/./lib -Wl,--gc-sections -shared -Wl,-soname,libclangCodeGen.so.18git -o lib/libclangCodeGen.so.18git tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/ABIInfo.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/ABIInfoImpl.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/BackendUtil.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGAtomic.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGBlocks.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGBuiltin.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGCUDANV.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGCUDARuntime.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGCXX.cpp.o 
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGCXXABI.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGCall.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGClass.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGCleanup.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGCoroutine.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGDebugInfo.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGDecl.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGDeclCXX.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGException.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGExpr.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGExprAgg.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGExprCXX.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGExprComplex.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGExprConstant.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGExprScalar.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGGPUBuiltin.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGHLSLRuntime.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGLoopInfo.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGNonTrivialStruct.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGObjC.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGObjCGNU.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGObjCMac.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGObjCRuntime.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGOpenCLRuntime.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGOpenMPRuntime.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGOpenMPRuntimeGPU.cpp.o 
tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGRecordLayoutBuilder.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGStmt.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGStmtOpenMP.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGVTT.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CGVTables.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CodeGenABITypes.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CodeGenAction.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CodeGenFunction.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CodeGenModule.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CodeGenPGO.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CodeGenTBAA.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/CodeGenTypes.cpp.o tools/clang/lib/CodeGen/CMakeFiles/obj.clangCodeGen.dir/Cons
[Bug tree-optimization/111393] ICE: Segmentation fault src/gcc/toplev.cc:314
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111393 --- Comment #9 from AK --- I think it is okay to close this bug, as this doesn't seem to be related to GCC.
[Bug tree-optimization/111393] ICE: Segmentation fault src/gcc/toplev.cc:314
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111393 --- Comment #8 from AK ---
> this does seem like a HW issue. Are you sure you have a decent RISCV machine
> without any memory issues?
> I suspect ninja is building with all of the cores which pushes the memory
> usage high.

Possible. I have the LicheePi 4A board (https://sipeed.com/licheepi4a).

> Maybe lower the clock speed of the CPU you are using.

Will do. Thanks.
[Bug tree-optimization/111393] ICE: Segmentation fault src/gcc/toplev.cc:314
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111393 --- Comment #5 from AK --- Created attachment 55890 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55890&action=edit GlobalModuleIndex.cpp preprocessed files. Every time, the crash is in a different file. It could just be because of memory issues.
[Bug tree-optimization/111393] ICE: Segmentation fault src/gcc/toplev.cc:314
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111393 --- Comment #3 from AK --- gcc -v COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/usr/libexec/gcc/riscv64-linux-gnu/13/lto-wrapper Target: riscv64-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Debian 13.1.0-6' --with-bugurl=file:///usr/share/doc/gcc-13/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2,rust --prefix=/usr --with-gcc-major-version-only --program-suffix=-13 --program-prefix=riscv64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/libexec --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libitm --disable-libquadmath --disable-libquadmath-support --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --disable-multilib --with-arch=rv64gc --with-abi=lp64d --enable-checking=release --build=riscv64-linux-gnu --host=riscv64-linux-gnu --target=riscv64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-serialization=32 Thread model: posix Supported LTO compression algorithms: zlib zstd gcc version 13.1.0 (Debian 13.1.0-6) root@lpi4a:/media/root/d2fc9f48-c166-4a9
[Bug tree-optimization/111393] ICE: Segmentation fault src/gcc/toplev.cc:314
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111393 --- Comment #1 from AK --- oot/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/build# ninja clang check-clang [100/845] Building CXX object tools/clang/lib/Serialization/CMakeFiles/obj.clangSerialization.dir/GlobalModuleIndex.cpp.o FAILED: tools/clang/lib/Serialization/CMakeFiles/obj.clangSerialization.dir/GlobalModuleIndex.cpp.o /usr/bin/c++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D_LIBCPP_ENABLE_HARDENED_MODE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/media/root/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/build/tools/clang/lib/Serialization -I/media/root/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/clang/lib/Serialization -I/media/root/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/clang/include -I/media/root/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/build/tools/clang/include -I/media/root/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/build/include -I/media/root/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -fno-lifetime-dse -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wno-maybe-uninitialized -Wno-nonnull -Wno-class-memaccess -Wno-redundant-move -Wno-pessimizing-move -Wno-noexcept-type -Wdelete-non-virtual-dtor -Wsuggest-override -Wno-comment -Wno-misleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual -fno-strict-aliasing -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -std=c++17 -MD -MT tools/clang/lib/Serialization/CMakeFiles/obj.clangSerialization.dir/GlobalModuleIndex.cpp.o -MF tools/clang/lib/Serialization/CMakeFiles/obj.clangSerialization.dir/GlobalModuleIndex.cpp.o.d -o 
tools/clang/lib/Serialization/CMakeFiles/obj.clangSerialization.dir/GlobalModuleIndex.cpp.o -c /media/root/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/clang/lib/Serialization/GlobalModuleIndex.cpp In file included from /media/root/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/llvm/include/llvm/ADT/DenseMapInfo.h:20, from /media/root/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/llvm/include/llvm/ADT/DenseMap.h:17, from /media/root/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/clang/include/clang/Serialization/GlobalModuleIndex.h:18, from /media/root/d2fc9f48-c166-4a9e-9868-133a1db7af88/llvm-project/clang/lib/Serialization/GlobalModuleIndex.cpp:13: /usr/include/c++/13/tuple: In instantiation of ‘struct std::_Tuple_impl<0, clang::ModuleFileExtensionReader*, std::default_delete >’: /usr/include/c++/13/tuple:1232:11: required from ‘class std::tuple >’ /usr/include/c++/13/bits/unique_ptr.h:232:27: required from ‘class std::__uniq_ptr_impl >’ /usr/include/c++/13/bits/unique_ptr.h:239:12: required from ‘struct std::__uniq_ptr_data, true, true>’ /usr/include/c++/13/bits/unique_ptr.h:283:33: required from ‘class std::unique_ptr’ /usr/include/c++/13/bits/stl_vector.h:367:35: required from ‘std::_Vector_base<_Tp, _Alloc>::~_Vector_base() [with _Tp = std::unique_ptr; _Alloc = std::allocator >]’ /usr/include/c++/13/bits/stl_vector.h:528:7: required from here /usr/include/c++/13/tuple:269:7: internal compiler error: Segmentation fault 269 | _M_head(_Tuple_impl& __t) noexcept { return _Base::_M_head(__t); } | ^~~ 0x85d7c5 crash_signal ../../src/gcc/toplev.cc:314 0xa0d5e0 profile_count::operator==(profile_count const&) const ../../src/gcc/profile-count.h:865 0xa0d5e0 profile_count::apply_probability(profile_probability) const ../../src/gcc/profile-count.h:1104 0xa0d5e0 edge_def::count() const ../../src/gcc/basic-block.h:639 0xa0d5e0 eliminate_tail_call ../../src/gcc/tree-tailcall.cc:982 0xa0d5e0 optimize_tail_call ../../src/gcc/tree-tailcall.cc:1053 0xa0d5e0 
tree_optimize_tail_calls_1 ../../src/gcc/tree-tailcall.cc:1193 Please submit a full bug report, with preprocessed source (by using -freport-bug). Please include the complete backtrace with any bug report. See for instructions.
[Bug tree-optimization/111393] New: ICE: Segmentation fault src/gcc/toplev.cc:314
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111393 Bug ID: 111393 Summary: ICE: Segmentation fault src/gcc/toplev.cc:314 Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: hiraditya at msn dot com Target Milestone: --- GCC for riscv64-linux-gnu (Debian 13.1) crashed with an ICE while building llvm-project (GlobalModuleIndex.cpp). src/gcc/toplev.cc:314 profile_count::operator==(profile_count const&) const ../../src/gcc/profile-count.h:865 profile_count::apply_probability(profile_probability) const ../../src/gcc/profile-count.h:1104
[Bug c++/110909] New: Suboptimal codegen in vector copy assignment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110909 Bug ID: 110909 Summary: Suboptimal codegen in vector copy assignment Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: hiraditya at msn dot com Target Milestone: ---

#include <vector>
using Container = std::vector<int>;
int copy_assignment(const Container &v1, Container &v2) {
  v2 = v1;
  return 0;
}

I'd expect this to only generate a memcpy, but I'm not sure why memmoves are generated.

$ gcc -std=c++2a -O3 -fno-exceptions
copy_assignment(std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> >&):
  cmp rsi, rdi
  je .L21
  push r13
  push r12
  push rbp
  mov rbp, rdi
  push rbx
  mov rbx, rsi
  sub rsp, 8
  mov rax, QWORD PTR [rdi+8]
  mov r13, QWORD PTR [rdi]
  mov rdx, QWORD PTR [rsi+16]
  mov rdi, QWORD PTR [rsi]
  mov r12, rax
  sub r12, r13
  sub rdx, rdi
  cmp rdx, r12
  jb .L25
  mov rcx, QWORD PTR [rsi+8]
  mov rdx, rcx
  sub rdx, rdi
  cmp rdx, r12
  jnb .L26
  cmp rdx, 4
  jle .L12
  mov rsi, r13
  call memmove
  mov rcx, QWORD PTR [rbx+8]
  mov rdi, QWORD PTR [rbx]
  mov rax, QWORD PTR [rbp+8]
  mov r13, QWORD PTR [rbp+0]
  mov rdx, rcx
  sub rdx, rdi
.L13:
  lea rsi, [r13+0+rdx]
  sub rax, rsi
  mov rdx, rax
  cmp rax, 4
  jle .L14
  mov rdi, rcx
  call memmove
  mov rax, QWORD PTR [rbx]
  add rax, r12
.L8:
  mov QWORD PTR [rbx+8], rax
  add rsp, 8
  xor eax, eax
  pop rbx
  pop rbp
  pop r12
  pop r13
  ret
.L21:
  xor eax, eax
  ret
.L25:
  movabs rax, 9223372036854775804
  cmp rax, r12
  jb .L27
  mov rdi, r12
  call operator new(unsigned long)
  mov rbp, rax
  cmp r12, 4
  jle .L5
  mov rdx, r12
  mov rsi, r13
  mov rdi, rax
  call memcpy
.L6:
  mov rdi, QWORD PTR [rbx]
  test rdi, rdi
  je .L7
  mov rsi, QWORD PTR [rbx+16]
  sub rsi, rdi
  call operator delete(void*, unsigned long)
.L7:
  lea rax, [rbp+0+r12]
  mov QWORD PTR [rbx], rbp
  mov QWORD PTR [rbx+16], rax
  jmp .L8
.L26:
  cmp r12, 4
  jle .L10
  mov rdx, r12
  mov rsi, r13
  call memmove
  mov rax, QWORD PTR [rbx]
  add rax, r12
  jmp .L8
.L14:
  lea rax, [rdi+r12]
  jne .L8
  mov edx, DWORD PTR [rsi]
  mov DWORD PTR [rcx], edx
  jmp .L8
.L12:
  jne .L13
  mov esi, DWORD PTR [r13+0]
  mov DWORD PTR [rdi], esi
  jmp .L13
.L10:
  lea rax, [rdi+r12]
  jne .L8
  mov edx, DWORD PTR [r13+0]
  mov DWORD PTR [rdi], edx
  jmp .L8
.L5:
  mov eax, DWORD PTR [r13+0]
  mov DWORD PTR [rbp+0], eax
  jmp .L6
.L27:
  call std::__throw_bad_array_new_length()

Ideally, the above C++ code should translate to an equivalent of the following C++ code:

using Container = std::vector<int>;
int copy_assignment(const Container &v1, Container &v2) {
  v2.reserve(v1.size());
  std::memcpy(&v2[0], &v1[0], v1.size()*sizeof(int));
  // change the size: v2.size() = v1.size()
  return 0;
}
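The "ideal" version in the report is pseudocode: user code cannot assign to `v2.size()`, and writing through `&v2[0]` past the current size is not valid. A compilable approximation, assuming a hypothetical function name `copy_assignment_sketch`, could look like the following. It uses `resize` in place of the internal size update, which means any newly added elements are value-initialized before being overwritten, so it is not a pure single-memcpy lowering.

```cpp
#include <cstring>
#include <vector>

using Container = std::vector<int>;

// Hypothetical user-level approximation of the intended lowering.
int copy_assignment_sketch(const Container& v1, Container& v2) {
    if (&v1 == &v2)
        return 0;                 // self-assignment: nothing to copy
    v2.resize(v1.size());         // grows or shrinks v2 to the source size
    if (!v1.empty())              // data() may be null for an empty vector
        std::memcpy(v2.data(), v1.data(), v1.size() * sizeof(int));
    return 0;
}
```

The two buffers belong to distinct vectors here, so memcpy's no-overlap requirement holds once self-assignment is excluded.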
[Bug c++/110137] implement clang -fassume-sane-operator-new
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110137 --- Comment #3 from AK ---
1. clang also has noalias on the nothrow versions of operator new. Will `-fassume-sane-operator-new` enable that as well?
2. As per http://eel.is/c++draft/basic.stc.dynamic#allocation-2: """If the request succeeds, the value returned by a replaceable allocation function is a non-null pointer value ([basic.compound]) p0 different from any previously returned value p1, unless that value p1 was subsequently passed to a replaceable deallocation function.""" Does this mean that every successful new allocation can be assumed to be noalias as long as the pointer wasn't passed to a deallocation function? In that case, when possible, can the compiler infer from a bottom-up analysis that an allocation is noalias?
[Bug tree-optimization/110819] Missed optimization: when vector's size is 0 but vector::reserve has been called.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110819 --- Comment #2 from AK ---
> When compiled with clang, libstdc++'s std::vector uses __builtin_operator_new
> which always has the -fassume-sane-operator-new semantics, and so can be
> optimized.

Yes, clang optimizes this with libstdc++ as well. What can be done in GCC for it to detect that the new+delete pair can be optimized away?
[Bug c++/110819] New: Missed optimization: when vector size is 0 but vector::reserve has been called.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110819 Bug ID: 110819 Summary: Missed optimization: when vector size is 0 but vector::reserve has been called. Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: hiraditya at msn dot com Target Milestone: ---

#include <vector>
void f(int);
void use_idx_const_size_reserve() {
  std::vector<int> v;
  v.reserve(10);
  auto s = v.size();
  for (std::vector<int>::size_type i = 0; i < s; i++)
    f(v[i]);
}

$ g++ -O3
use_idx_const_size_reserve():
  sub rsp, 8
  mov edi, 40
  call operator new(unsigned long)
  mov esi, 40
  add rsp, 8
  mov rdi, rax
  jmp operator delete(void*, unsigned long)

$ clang++ -O3 -stdlib=libc++
use_idx_const_size_reserve(): # @use_idx_const_size_reserve()
  ret
[Bug libstdc++/109442] Dead local copy of std::vector not removed from function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109442 --- Comment #17 from AK --- With recent changes in libc++ (https://reviews.llvm.org/D147741) clang optimizes away the new-delete pair. https://godbolt.org/z/a6PG54Pvb $ clang++ -O3 -stdlib=libc++ -fno-exceptions vat1(std::__1::vector >): # @vat1(std::__1::vector >) sub rsp, 24 xorps xmm0, xmm0 movaps xmmword ptr [rsp], xmm0 mov qword ptr [rsp + 16], 0 mov rax, qword ptr [rdi + 8] sub rax, qword ptr [rdi] je .LBB0_2 js .LBB0_3 .LBB0_2: mov eax, 10 add rsp, 24 ret .LBB0_3: mov rdi, rsp call std::__1::vector >::__throw_length_error[abi:v17]() const .L.str: .asciz "vector" .L.str.1: .asciz "length_error was thrown in -fno-exceptions mode with message \"%s\"" Previously clang couldn't even convert the copy to a memmove and would generate a raw loop e.g., https://godbolt.org/z/G8ax1o5bc .LBB0_6: # =>This Inner Loop Header: Depth=1 movups xmm0, xmmword ptr [r15 + 4*rdi] movups xmm1, xmmword ptr [r15 + 4*rdi + 16] movups xmmword ptr [rax + 4*rdi], xmm0 movups xmmword ptr [rax + 4*rdi + 16], xmm1 add rdi, 8 cmp rsi, rdi jne .LBB0_6 cmp rbx, rsi jne .LBB0_8 jmp .LBB0_9 .LBB0_3:
[Bug c++/109443] missed optimization of std::vector access (Related to issue 35269)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109443 --- Comment #17 from AK --- Even after vector::size() is hoisted, the codegen is sub-optimal compared to the iterator version.
```
void use_idx_const_size(std::vector<int> v) {
    auto s = v.size();
    for (std::vector<int>::size_type i = 0; i < s; i++)
        f(v[i]);
}
```
$ g++ -O3
use_idx_const_size(std::vector<int, std::allocator<int> >):
        push r12
        push rbp
        push rbx
        mov rdx, QWORD PTR [rdi+8]
        mov rax, QWORD PTR [rdi]
        mov r12, rdx
        sub r12, rax
        sar r12, 2
        cmp rax, rdx
        je .L1
        mov rbp, rdi
        xor ebx, ebx
        jmp .L3
.L6:
        mov rax, QWORD PTR [rbp+0]
.L3:
        mov edi, DWORD PTR [rax+rbx*4]
        add rbx, 1
        call f(int)
        cmp rbx, r12
        jb .L6
.L1:
        pop rbx
        pop rbp
        pop r12
        ret
It seems the compiler is assuming that the vector `v` is not loop-invariant?
[Bug target/100811] Consider not omitting frame pointers by default on targets with many registers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100811 --- Comment #8 from AK --- Should we enable frame-pointers by default for RISCV64 as well?
[Bug target/100811] Consider not omitting frame pointers by default on targets with many registers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100811
AK changed:
           What    |Removed                     |Added
                 CC|                            |hiraditya at msn dot com
--- Comment #4 from AK --- On AArch64 (typically mobile platforms), app developers usually enable frame pointers by default because it helps with crash reporting.
[Bug c++/87628] Redundant check of pointer when operator delete is called
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87628 --- Comment #6 from AK --- Opened a bug for clang as well: https://github.com/llvm/llvm-project/issues/62783
[Bug c++/87628] Redundant check of pointer when operator delete is called
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87628 --- Comment #5 from AK --- As per: https://en.cppreference.com/w/cpp/memory/new/operator_delete """ In all cases, if ptr is a null pointer, the standard library deallocation functions do nothing. If the pointer passed to the standard library deallocation function was not obtained from the corresponding standard library allocation function, the behavior is undefined. """ So it should be fine to remove the check `if(p)`
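To make the quoted rule concrete, here is a minimal sketch. The helper names `release_checked`, `release_unchecked` and `self_check` are hypothetical and not part of the bug report; the sketch only assumes the pointers come from the matching standard `operator new`. Under that assumption both helpers behave the same, because the standard deallocation function does nothing when handed a null pointer.

```cpp
#include <cassert>
#include <new>

// Hypothetical helper: guards the deallocation with an explicit null check,
// like the `if (p)` pattern discussed in the report.
void release_checked(char* p) {
    if (p)
        ::operator delete(p);
}

// Hypothetical helper: no guard; relies on operator delete(nullptr) being a no-op.
void release_unchecked(char* p) {
    ::operator delete(p);
}

// Exercises both helpers on null and non-null pointers.
void self_check() {
    release_checked(nullptr);
    release_unchecked(nullptr);  // well-defined: does nothing

    char* a = static_cast<char*>(::operator new(16));
    char* b = static_cast<char*>(::operator new(16));
    assert(a != nullptr && b != nullptr);
    release_checked(a);
    release_unchecked(b);
}
```

This only shows that the unguarded call is valid at the source level; whether the compiler should drop the check it emits is the question the bug asks.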
[Bug tree-optimization/109441] missed optimization when all elements of vector are known
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109441 --- Comment #3 from AK ---
> But IMHO it's academic, right?
Yes, I was just experimenting with vector codegen. But when all the elements of a vector/array are the same, maybe the loop can be replaced with an equivalent computation?
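As a rough illustration of what such an "equivalent computation" could look like at the source level, here is a sketch. The names `sum_loop`, `sum_uniform` and `self_check` are made up for this example, and the closed form assumes `n * value` does not overflow `int`. It is not a claim about what GCC does or should emit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Element-by-element sum, as in the test case (with `s` initialized here).
int sum_loop(const std::vector<int>& v) {
    int s = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        s += v[i];
    return s;
}

// Hypothetical closed form for a vector known to hold `n` copies of `value`.
// Assumes the product fits in an int.
int sum_uniform(std::size_t n, int value) {
    return static_cast<int>(n) * value;
}

// Checks that the loop and the closed form agree on uniform vectors.
void self_check() {
    std::vector<int> v(1000, 3);
    assert(sum_loop(v) == 3000);
    assert(sum_uniform(v.size(), 3) == sum_loop(v));

    v.assign(1000, 0);  // the all-zero case from the original report
    assert(sum_loop(v) == 0);
    assert(sum_uniform(v.size(), 0) == 0);
}
```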
[Bug tree-optimization/35269] missed optimization of std::vector access.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35269 AK changed: What|Removed |Added CC||hiraditya at msn dot com --- Comment #2 from AK --- I posted a revised version of this bug here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109443
[Bug tree-optimization/109443] missed optimization of std::vector access (Related to issue 35269)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109443 --- Comment #1 from AK --- Link to issue: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35269 where I derived the testcase from.
[Bug tree-optimization/109443] New: missed optimization of std::vector access (Related to issue 35269)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109443
Bug ID: 109443
Summary: missed optimization of std::vector access (Related to issue 35269)
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: hiraditya at msn dot com
Target Milestone: ---

Here is a slightly modified code example from issue #35269. Both accesses are similar, but different code is generated: the function `h` has better codegen than `g` for some reason.

$ g++ -O3 -std=c++20 -fno-exceptions

void f(int);
void g(std::vector<int> v) {
    for (std::vector<int>::size_type i = 0; i < v.size(); i++)
        f( v[ i ] );
}
void h(std::vector<int> v) {
    for (std::vector<int>::const_iterator i = v.begin(); i != v.end(); ++i)
        f( *i );
}

g(std::vector<int, std::allocator<int> >):
        mov rdx, QWORD PTR [rdi]
        cmp QWORD PTR [rdi+8], rdx
        je .L6
        push rbp
        mov rbp, rdi
        push rbx
        xor ebx, ebx
        sub rsp, 8
.L3:
        mov edi, DWORD PTR [rdx+rbx*4]
        add rbx, 1
        call f(int)
        mov rdx, QWORD PTR [rbp+0]
        mov rax, QWORD PTR [rbp+8]
        sub rax, rdx
        sar rax, 2
        cmp rbx, rax
        jb .L3
        add rsp, 8
        pop rbx
        pop rbp
        ret
.L6:
        ret
h(std::vector<int, std::allocator<int> >):
        push rbp
        push rbx
        sub rsp, 8
        mov rbx, QWORD PTR [rdi]
        cmp rbx, QWORD PTR [rdi+8]
        je .L10
        mov rbp, rdi
.L12:
        mov edi, DWORD PTR [rbx]
        add rbx, 4
        call f(int)
        cmp QWORD PTR [rbp+8], rbx
        jne .L12
.L10:
        add rsp, 8
        pop rbx
        pop rbp
        ret
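For comparison, one possible source-level workaround is to read the data pointer and the size once, before the loop, so that nothing has to be reloaded from the vector object after each call. This is only a sketch: `g_hoisted`, `g_total` and `self_check` are hypothetical names, and `f` is given a trivial body here (it is an opaque external function in the report) so that the example links and can be checked.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the external f(int) of the report; it just accumulates its input.
static long g_total = 0;
void f(int x) { g_total += x; }

// Index-based loop with the data pointer and the size copied into locals,
// so the loop condition does not depend on re-reading the vector's members.
void g_hoisted(const std::vector<int>& v) {
    const int* data = v.data();
    const std::size_t n = v.size();
    for (std::size_t i = 0; i < n; ++i)
        f(data[i]);
}

// Checks that every element is visited exactly once.
void self_check() {
    g_total = 0;
    g_hoisted({1, 2, 3, 4});
    assert(g_total == 10);
}
```

Whether this produces the same code as `h` depends on the compiler version and flags; it would need to be checked on the actual testcase.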
[Bug tree-optimization/109442] New: Dead local copy of std::vector not removed from function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109442 Bug ID: 109442 Summary: Dead local copy of std::vector not removed from function Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: hiraditya at msn dot com Target Milestone: --- T vat1(std::vector v1) { auto v = v1; return 10; } g++ -O3 -std=c++20 -fno-exceptions vat1(std::vector >): mov rax, QWORD PTR [rdi+8] sub rax, QWORD PTR [rdi] je .L11 pushrbp mov rbp, rax movabs rax, 9223372036854775804 pushrbx sub rsp, 8 cmp rax, rbp jb .L15 mov rbx, rdi mov rdi, rbp calloperator new(unsigned long) mov rsi, QWORD PTR [rbx] mov rdx, QWORD PTR [rbx+8] mov rdi, rax sub rdx, rsi cmp rdx, 4 jle .L16 callmemmove mov rdi, rax .L6: mov rsi, rbp calloperator delete(void*, unsigned long) add rsp, 8 mov eax, 10 pop rbx pop rbp ret .L11: mov eax, 10 ret .L15: callstd::__throw_bad_array_new_length() .L16: jne .L6 mov eax, DWORD PTR [rsi] mov DWORD PTR [rdi], eax jmp .L6
[Bug tree-optimization/109441] missed optimization when all elements of vector are known
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109441 --- Comment #1 from AK --- I guess a better test case is this: #include using namespace std; using T = int; T v(std::vector v) { T s; std::fill(v.begin(), v.end(), T()); for (auto i = 0; i < v.size(); ++i) { s += v[i]; } return s; } which has similar effect. $ g++ -O3 -std=c++17 v(std::vector >): pushrbp pushrbx sub rsp, 8 mov rbp, QWORD PTR [rdi+8] mov rcx, QWORD PTR [rdi] cmp rcx, rbp je .L7 sub rbp, rcx mov rdi, rcx xor esi, esi mov rbx, rcx mov rdx, rbp callmemset mov rdi, rbp mov edx, 1 mov rcx, rbx sar rdi, 2 testrbp, rbp cmovne rdx, rdi cmp rbp, 12 jbe .L8 mov rax, rdx pxorxmm0, xmm0 shr rax, 2 sal rax, 4 add rax, rbx .L4: movdqu xmm2, XMMWORD PTR [rbx] add rbx, 16 paddd xmm0, xmm2 cmp rbx, rax jne .L4 movdqa xmm1, xmm0 psrldq xmm1, 8 paddd xmm0, xmm1 movdqa xmm1, xmm0 psrldq xmm1, 4 paddd xmm0, xmm1 movdeax, xmm0 testdl, 3 je .L1 and rdx, -4 mov esi, edx .L3: add eax, DWORD PTR [rcx+rdx*4] lea edx, [rsi+1] movsx rdx, edx cmp rdx, rdi jnb .L1 add esi, 2 lea r8, [0+rdx*4] add eax, DWORD PTR [rcx+rdx*4] movsx rsi, esi cmp rsi, rdi jnb .L1 add eax, DWORD PTR [rcx+4+r8] .L1: add rsp, 8 pop rbx pop rbp ret .L7: add rsp, 8 xor eax, eax pop rbx pop rbp ret .L8: xor eax, eax xor esi, esi xor edx, edx jmp .L3
[Bug tree-optimization/109441] New: missed optimization when all elements of vector are known
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109441 Bug ID: 109441 Summary: missed optimization when all elements of vector are known Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: hiraditya at msn dot com Target Milestone: --- Reference: https://godbolt.org/z/af4x6zhz9 When all elements of vector are 0, then the compiler should be able to remove the loop and just return 0. Testcase: #include using namespace std; using T = int; T v() { T s; std::vector v; v.resize(1000, 0); for (auto i = 0; i < v.size(); ++i) { s += v[i]; } return s; } $ g++ -O3 -std=c++17 .LC0: .string "vector::_M_fill_insert" v(): push rbx pxor xmm0, xmm0 mov edx, 1000 xor esi, esi sub rsp, 48 lea rcx, [rsp+12] lea rdi, [rsp+16] mov QWORD PTR [rsp+32], 0 mov DWORD PTR [rsp+12], 0 movaps XMMWORD PTR [rsp+16], xmm0 call std::vector >::_M_fill_insert(__gnu_cxx::__normal_iterator > >, unsigned long, int const&) mov rdx, QWORD PTR [rsp+24] mov rdi, QWORD PTR [rsp+16] mov rax, rdx sub rax, rdi mov rsi, rax sar rsi, 2 cmp rdx, rdi je .L99 test rax, rax mov ecx, 1 cmovne rcx, rsi cmp rax, 12 jbe .L107 mov rdx, rcx pxor xmm0, xmm0 mov rax, rdi shr rdx, 2 sal rdx, 4 add rdx, rdi .L101: movdqu xmm2, XMMWORD PTR [rax] add rax, 16 paddd xmm0, xmm2 cmp rdx, rax jne .L101 movdqa xmm1, xmm0 psrldq xmm1, 8 paddd xmm0, xmm1 movdqa xmm1, xmm0 psrldq xmm1, 4 paddd xmm0, xmm1 movd ebx, xmm0 test cl, 3 je .L99 and rcx, -4 mov eax, ecx .L100: lea edx, [rax+1] add ebx, DWORD PTR [rdi+rcx*4] movsx rdx, edx cmp rdx, rsi jnb .L99 add eax, 2 lea rcx, [0+rdx*4] add ebx, DWORD PTR [rdi+rdx*4] cdqe cmp rax, rsi jnb .L99 add ebx, DWORD PTR [rdi+4+rcx] .L99: test rdi, rdi je .L98 mov rsi, QWORD PTR [rsp+32] sub rsi, rdi call operator delete(void*, unsigned long) .L98: add rsp, 48 mov eax, ebx pop rbx ret .L107: xor eax, eax xor ecx, ecx jmp .L100 mov rbx, rax jmp .L105 v() [clone .cold]:
[Bug tree-optimization/109440] New: Missed optimization of vector::at when a function is called inside the loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109440 Bug ID: 109440 Summary: Missed optimization of vector::at when a function is called inside the loop Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: hiraditya at msn dot com Target Milestone: --- #include #include using namespace std; bool bar(); using T = int; T vat(std::vector v) { T s; for (auto i = 0; i < v.size(); ++i) { if (bar()) s += v.at(i); } return s; } $ gcc -O2 -fexceptions -fno-unroll-loops .LC0: .string "vector::_M_range_check: __n (which is %zu) >= this->size() (which is %zu)" vat(std::vector >): mov rax, QWORD PTR [rdi] cmp QWORD PTR [rdi+8], rax je .L9 pushr12 pushrbp mov rbp, rdi pushrbx xor ebx, ebx jmp .L6 .L14: mov rax, QWORD PTR [rbp+8] sub rax, QWORD PTR [rbp+0] add rbx, 1 sar rax, 2 cmp rbx, rax jnb .L13 .L6: callbar() testal, al je .L14 mov rcx, QWORD PTR [rbp+0] mov rdx, QWORD PTR [rbp+8] sub rdx, rcx sar rdx, 2 mov rax, rdx cmp rbx, rdx jnb .L15 add r12d, DWORD PTR [rcx+rbx*4] add rbx, 1 cmp rbx, rax jb .L6 .L13: mov eax, r12d pop rbx pop rbp pop r12 ret .L9: mov eax, r12d ret .L15: mov rsi, rbx mov edi, OFFSET FLAT:.LC0 xor eax, eax callstd::__throw_out_of_range_fmt(char const*, ...)
[Bug tree-optimization/108915] invalid pointer access preserved in optimized code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108915 --- Comment #6 from AK --- For reference, I had opened a related bug in clang: https://github.com/llvm/llvm-project/issues/60967
[Bug c++/109017] ICE on unexpanded pack from C++20 explicit-template-parameter lambda syntax
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109017 AK changed: What|Removed |Added CC||hiraditya at msn dot com --- Comment #1 from AK --- Example from twitter: https://twitter.com/seanbax/status/1631689332007337985 which had discussion on similar bug. ``` template struct outer1_t { void g() { // Compiles for mysterious reasons. int array[] { [](){ int i = Is2; return i; }.template operator()() ... }; } }; int main() { // Compiles OKAY when this is commented out. // ICEs when it's compiled. outer1_t<1, 5, 10>().g(); } ``` clang issues a compiler error: https://godbolt.org/z/7f6E55svM ``` :6:15: error: initializer contains unexpanded parameter pack 'Is2' int i = Is2; ```
[Bug tree-optimization/108915] invalid pointer access preserved in optimized code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108915 AK changed: What|Removed |Added Resolution|INVALID |FIXED --- Comment #4 from AK --- Adding `__attribute__((used))` also fixed it. Does it reflect the same behavior as using `asm` as you suggested?
[Bug c/108915] New: invalid pointer access preserved in optimized code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108915 Bug ID: 108915 Summary: invalid pointer access preserved in optimized code Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: hiraditya at msn dot com Target Milestone: --- Testcase has been reduced from u-boot's linker-list macro: https://github.com/u-boot/u-boot/blob/master/include/linker_lists.h#L127 #include char* bar() { static char start_bar[0] __attribute__((aligned(16))) __attribute__((unused)) __attribute__((section("__u_boot_list_2_1"))); char *p = (char *)start_bar; for (int i = p[0]; i < p[9]; i++) printf("asdfasd"); return 0; } $ gcc -O3 -fno-unroll-loops -S -o - .LC0: .string "asdfasd" bar: pushrbx movsx eax, BYTE PTR start_bar.1[rip+9] movsx ebx, BYTE PTR start_bar.1[rip] cmp ebx, eax jge .L2 .L3: mov edi, OFFSET FLAT:.LC0 xor eax, eax add ebx, 1 callprintf movsx eax, BYTE PTR start_bar.1[rip+9] cmp eax, ebx jg .L3 .L2: xor eax, eax pop rbx ret - $ clang -O3 -fno-unroll-loops -S -o - bar:# @bar xor eax, eax ret
[Bug libstdc++/107335] call to throw_bad_cast even with -fno-exceptions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107335 AK changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |INVALID --- Comment #7 from AK --- not a bug
[Bug libstdc++/107335] call to throw_bad_cast even with -fno-exceptions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107335 --- Comment #5 from AK --- Is this the definition of throw_bad_cast? https://github.com/gcc-mirror/gcc/blob/16e2427f50c208dfe07d07f18009969502c25dc8/gcc/cp/rtti.c#L221
[Bug libstdc++/107335] call to throw_bad_cast even with -fno-exceptions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107335 --- Comment #4 from AK --- I wasn't sure if this is expected. Thanks for clarifying.
[Bug c++/107335] New: call to throw_bad_cast even with -fno-exceptions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107335 Bug ID: 107335 Summary: call to throw_bad_cast even with -fno-exceptions Product: gcc Version: 13.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: hiraditya at msn dot com Target Milestone: --- Testcase: #include void foo() { std::cout << std::endl; } $ g++ -std=c++17 -O3 -fno-exceptions ```asm foo(): mov rax, QWORD PTR std::cout[rip] pushrbx mov rax, QWORD PTR [rax-24] mov rbx, QWORD PTR std::cout[rax+240] testrbx, rbx je .L10 cmp BYTE PTR [rbx+56], 0 je .L5 movsx esi, BYTE PTR [rbx+67] .L6: mov edi, OFFSET FLAT:std::cout callstd::basic_ostream >::put(char) pop rbx mov rdi, rax jmp std::basic_ostream >::flush() .L5: mov rdi, rbx callstd::ctype::_M_widen_init() const mov rax, QWORD PTR [rbx] mov esi, 10 mov rax, QWORD PTR [rax+48] cmp rax, OFFSET FLAT:_ZNKSt5ctypeIcE8do_widenEc je .L6 mov rdi, rbx callrax movsx esi, al jmp .L6 .L10: callstd::__throw_bad_cast() <--- call to __throw_bad_cast _GLOBAL__sub_I_foo(): sub rsp, 8 mov edi, OFFSET FLAT:_ZStL8__ioinit callstd::ios_base::Init::Init() [complete object constructor] mov edx, OFFSET FLAT:__dso_handle mov esi, OFFSET FLAT:_ZStL8__ioinit mov edi, OFFSET FLAT:_ZNSt8ios_base4InitD1Ev add rsp, 8 jmp __cxa_atexit ```
[Bug tree-optimization/85611] Suboptimal code generation for (potentially) redundant atomic loads
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85611 AK changed: What|Removed |Added Status|WAITING |RESOLVED Resolution|--- |INVALID --- Comment #2 from AK --- Don't remember what I was expecting.
[Bug rtl-optimization/107063] New: [X86_64 codegen] Using inc eax instead of inc dword ptr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107063
Bug ID: 107063
Summary: [X86_64 codegen] Using inc eax instead of inc dword ptr
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: hiraditya at msn dot com
Target Milestone: ---

int volatile gv = 0;
void foo() { ++gv; }

$ gcc -Os
foo():
        mov eax, DWORD PTR gv[rip]
        inc eax
        mov DWORD PTR gv[rip], eax
        ret
gv:
        .zero 4

$ clang -Os
foo(): # @foo()
        inc dword ptr [rip + gv]
        ret
gv:
        .long 0

https://godbolt.org/z/vzq4jr5vj
[Bug tree-optimization/107011] instruction with undefined behavior not optimized away
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107011 --- Comment #2 from AK --- ah ok. sorry for the noise.
[Bug tree-optimization/107011] New: instruction with undefined behavior not optimized away
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107011 Bug ID: 107011 Summary: instruction with undefined behavior not optimized away Product: gcc Version: 13.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: hiraditya at msn dot com Target Milestone: --- #include int main() { return INT_MIN / -1; } $ gcc -O3 main: mov eax, -2147483648 ret $ clang -O3 main: # @main ret https://godbolt.org/z/393EMqs1E PS: I reported this bug yesterday as well but for some reason it does not appear in bugzilla so I'm creating another one.
[Bug tree-optimization/95565] [Feature request] add a flag to only instrument function entry.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95565 --- Comment #2 from AK --- clang has `-finstrument-function-entry-bare` to this effect: https://reviews.llvm.org/D40276
[Bug tree-optimization/107005] New: gcc not exploiting undefined behavior to optimize away the result of division
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107005 Bug ID: 107005 Summary: gcc not exploiting undefined behavior to optimize away the result of division Product: gcc Version: 13.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: hiraditya at msn dot com Target Milestone: --- #include int main() { return INT_MIN / -1; } gcc -O2 main: mov eax, -2147483648 ret clang -O2 main: # @main ret https://godbolt.org/z/Tjxx3KGdK
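For context, `INT_MIN / -1` is undefined because the mathematical result does not fit in an `int`, which is why either compiler output above is permitted. Code that has to avoid this case usually checks for it first. A minimal sketch follows; `checked_div` and `self_check` are hypothetical names, not something from the report or from a library.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper: stores a / b in `out` and returns true, or returns false
// without dividing when the division would be undefined
// (zero divisor, or INT_MIN / -1 overflowing int).
bool checked_div(int a, int b, int& out) {
    if (b == 0 || (a == INT_MIN && b == -1))
        return false;
    out = a / b;
    return true;
}

// Exercises the ordinary cases and both rejected cases.
void self_check() {
    int q = 0;
    assert(checked_div(7, 2, q) && q == 3);
    assert(checked_div(INT_MIN, 1, q) && q == INT_MIN);
    assert(!checked_div(INT_MIN, -1, q));  // the case from the report
    assert(!checked_div(1, 0, q));
}
```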
[Bug ipa/106991] new+delete pair not optimized by g++ at -O3 but optimized at -Os
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106991 --- Comment #3 from AK --- Thanks for identifying the underlying issue, @Jan. After modifying the definition of operator delete, gcc does optimize it at -O3 as well. https://godbolt.org/z/1WPqaWrEr

// source code
#include
#include
int volatile gv = 0;
void* operator new(long unsigned sz ) { ++gv; return malloc( sz ); }
void operator delete(void *p, unsigned long) noexcept { --gv; free(p); }
class c {
    int l;
public:
    c() : l(0) {}
    int get(){ return l; }
};
int caller( void ){
    c *f = new c();
    assert( f->get() == 0 );
    delete f;
    return gv;
}

$ g++ -std=c++20 -O3
```
operator new(unsigned long):
        mov eax, DWORD PTR gv[rip]
        add eax, 1
        mov DWORD PTR gv[rip], eax
        jmp malloc
operator delete(void*, unsigned long):
        mov eax, DWORD PTR gv[rip]
        sub eax, 1
        mov DWORD PTR gv[rip], eax
        jmp free
caller():
        mov eax, DWORD PTR gv[rip]
        add eax, 1
        mov DWORD PTR gv[rip], eax
        mov eax, DWORD PTR gv[rip]
        sub eax, 1
        mov DWORD PTR gv[rip], eax
        mov eax, DWORD PTR gv[rip]
        ret
gv:
        .zero 4
```
[Bug c++/87628] Redundant check of pointer when operator delete is called
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87628 --- Comment #4 from AK --- Seems like clang now added the check:
$ clang++ -Oz -fno-exceptions
if_delete(char*): # @if_delete(char*)
        test rdi, rdi
        jne operator delete(void*)@PLT # TAILCALL
        ret
[Bug c++/106991] New: new+delete pair not optimized by g++ at -O3 but optimized at -Os
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106991 Bug ID: 106991 Summary: new+delete pair not optimized by g++ at -O3 but optimized at -Os Product: gcc Version: 13.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: hiraditya at msn dot com Target Milestone: --- https://godbolt.org/z/PeYcoqTKn --- #include #include int volatile gv = 0; void* operator new(long unsigned sz ) { ++gv; return malloc( sz ); } void operator delete(void *p) noexcept { --gv; free(p); } class c { int l; public: c() : l(0) {} int get(){ return l; } }; int caller( void ){ c *f = new c(); assert( f->get() == 0 ); delete f; return gv; } --- $ g++ -std=c++20 -O3 operator new(unsigned long): mov eax, DWORD PTR gv[rip] add eax, 1 mov DWORD PTR gv[rip], eax jmp malloc operator delete(void*): mov eax, DWORD PTR gv[rip] sub eax, 1 mov DWORD PTR gv[rip], eax jmp free caller(): sub rsp, 8 mov eax, DWORD PTR gv[rip] mov edi, 4 add eax, 1 mov DWORD PTR gv[rip], eax callmalloc mov esi, 4 mov rdi, rax calloperator delete(void*, unsigned long) mov eax, DWORD PTR gv[rip] add rsp, 8 ret gv: .zero 4 --- $ g++ -std=c++20 -Os operator new(unsigned long): mov eax, DWORD PTR gv[rip] inc eax mov DWORD PTR gv[rip], eax jmp malloc operator delete(void*): mov eax, DWORD PTR gv[rip] dec eax mov DWORD PTR gv[rip], eax jmp free caller(): mov eax, DWORD PTR gv[rip] ret gv: .zero 4
[Bug c++/87628] Redundant check of pointer when operator delete is called
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87628 --- Comment #3 from AK --- Still happening with gcc trunk. https://godbolt.org/z/5K94665GK
[Bug rtl-optimization/82889] Unnecessary sign extension of int32 to int64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82889 --- Comment #5 from AK --- Link to compiler explorer: https://godbolt.org/z/dGYG4dG15
[Bug rtl-optimization/82889] Unnecessary sign extension of int32 to int64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82889 --- Comment #4 from AK --- Seems like clang doesn't sign extend. $ clang -O3 -std=c++14 -g0 ``` .text .intel_syntax noprefix .file "example.cpp" .globl lol(int*, int*, unsigned int, unsigned int) # -- Begin function lol(int*, int*, unsigned int, unsigned int) .p2align4, 0x90 .type lol(int*, int*, unsigned int, unsigned int),@function lol(int*, int*, unsigned int, unsigned int): # @lol(int*, int*, unsigned int, unsigned int) .cfi_startproc # %bb.0: # kill: def $edx killed $edx def $rdx and edx, ecx mov r8d, ecx mov ecx, 1 jmp .LBB0_1 .p2align4, 0x90 .LBB0_4:# in Loop: Header=BB0_1 Depth=1 testal, 1 jne .LBB0_5 .LBB0_7:# in Loop: Header=BB0_1 Depth=1 add edx, ecx and edx, r8d inc rcx .LBB0_1:# =>This Inner Loop Header: Depth=1 mov eax, dword ptr [rsi + 4*rdx] testeax, eax js .LBB0_4 # %bb.2:# in Loop: Header=BB0_1 Depth=1 cmp dword ptr [rdi + 4*rax], 42 jne .LBB0_7 # %bb.3: mov eax, 1 ret .LBB0_5: xor eax, eax ret .Lfunc_end0: .size lol(int*, int*, unsigned int, unsigned int), .Lfunc_end0-lol(int*, int*, unsigned int, unsigned int) .cfi_endproc # -- End function .ident "clang version 16.0.0 (https://github.com/llvm/llvm-project.git 5e22ef3198d1686f7978dd150a3eefad4f737bfc)" .section".note.GNU-stack","",@progbits .addrsig ``` $ gcc -O3 -std=c++14 -g0 ``` lol(int*, int*, unsigned int, unsigned int): and edx, ecx mov r8d, 1 mov ecx, ecx jmp .L5 .L10: cmp DWORD PTR [rdi+rax*4], 42 je .L9 .L4: add rdx, r8 add r8, 1 and rdx, rcx .L5: movsx rax, DWORD PTR [rsi+rdx*4] <--- sign extend testeax, eax jns .L10 testal, 1 je .L4 xor eax, eax ret .L9: mov eax, 1 ret ```
[Bug libstdc++/78717] no definition of string::find when lowered to gimple
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78717 --- Comment #3 from AK --- Even with a high inline limit, string::find was not inlined.
g++-11.0.2 -O3 -finline-limit=10 -S -o a.s s.cpp
cat a.s
```
_Z3fooRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES6_i:
.LFB1240:
        .cfi_startproc
        endbr64
        pushq %rbx
        .cfi_def_cfa_offset 16
        .cfi_offset 3, -16
        movq 8(%rsi), %rcx
        movslq %edx, %rbx
        xorl %edx, %edx
        movq (%rsi), %rsi
        call _ZNKSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE4findEPKcmm@PLT
        cmpq %rax, %rbx
        popq %rbx
        .cfi_def_cfa_offset 8
        sete %al
        movzbl %al, %eax
        ret
```
[Bug other/92396] -ftime-trace support
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92396
AK changed:
           What    |Removed                     |Added
                 CC|                            |hiraditya at msn dot com
--- Comment #12 from AK --- I was building a giant file that takes around 100 minutes to build. The -ftime-report output gave nothing useful for finding hotspots. It is also unclear what is being reported, as there is no documentation for it in `man gcc`. The percentages don't add up to 100, which makes it confusing. I wonder whether making this task a GSoC project would get it more attention?
[Bug libstdc++/80331] unused const std::string not optimized away
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80331 --- Comment #9 from AK --- I can't reproduce this with gcc 12.1. It seems to be fixed? https://godbolt.org/z/e6n94zK4E
[Bug tree-optimization/105830] call to memcpy when -nostdlib -nodefaultlibs flags provided
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105830 --- Comment #3 from AK --- with -ffreestanding the calls to memcpy did disappear. Thanks.
[Bug tree-optimization/105830] New: call to memcpy when -nostdlib -nodefaultlibs flags provided
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105830
Bug ID: 105830
Summary: call to memcpy when -nostdlib -nodefaultlibs flags provided
Product: gcc
Version: 12.1.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: hiraditya at msn dot com
Target Milestone: ---

https://godbolt.org/z/jTEa6ajn3
```
// test.c
// Type your code here, or load an example.
/* Nonzero if either X or Y is not aligned on a "long" boundary. */
#define UNALIGNED(X, Y) \
  (((unsigned long)X & (sizeof (unsigned long) - 1)) | ((unsigned long)Y & (sizeof (unsigned long) - 1)))

#define UNALIGNED1(a) \
  ((unsigned long)(a) & (sizeof(unsigned long)-1))

/* How many bytes are copied each iteration of the 4X unrolled loop. */
#define BIGBLOCKSIZE (sizeof (unsigned long) * 4)

/* How many bytes are copied each iteration of the word copy loop. */
#define LITTLEBLOCKSIZE (sizeof (unsigned long))

/* Threshold for punting to the byte copier. */
#define TOO_SMALL(LEN) ((LEN) < BIGBLOCKSIZE)

void *
memcpy (void *__restrict dst0, const void *__restrict src0, unsigned long len0)
{
  unsigned char *dst = dst0;
  const unsigned char *src = src0;

  /* If the size is small, or either SRC or DST is unaligned,
     then punt into the byte copy loop. This should be rare. */
  if (len0 >= LITTLEBLOCKSIZE && !UNALIGNED (src, dst))
    {
      unsigned long *aligned_dst;
      const unsigned long *aligned_src;

      aligned_dst = (unsigned long*)dst;
      aligned_src = (const unsigned long*)src;

      /* Copy one long word at a time if possible. */
      do
        {
          *aligned_dst++ = *aligned_src++;
          len0 -= LITTLEBLOCKSIZE;
        }
      while (len0 >= LITTLEBLOCKSIZE);

      /* Pick up any residual with a byte copier. */
      dst = (unsigned char*)aligned_dst;
      src = (const unsigned char*)aligned_src;
    }

  for (; len0; len0--)
    *dst++ = *src++;

  return dst0;
}
```
// ARM gcc trunk
gcc -O3 -nostdlib -nodefaultlibs -S -o -
memcpy:
        push {r3, r4, r5, r6, r7, lr}
        cmp r2, #3
        mov r4, r2
        mov r5, r0
        mov r6, r1
        bls .L5
        orr r3, r0, r1
        lsls r3, r3, #30
        beq .L9
.L3:
        mov r2, r4
        mov r1, r6
        bl memcpy ; <- call to memcpy
        mov r0, r5
        pop {r3, r4, r5, r6, r7, pc}
.L9:
        subs r7, r2, #4
        and r4, r2, #3
        bic r7, r7, #3
        adds r7, r7, #4
        mov r2, r7
        add r6, r6, r7
        bl memcpy ; <- call to memcpy
        adds r0, r5, r7
.L5:
        cmp r4, #0
        bne .L3
        mov r0, r5
        pop {r3, r4, r5, r6, r7, pc}
[Bug c++/105796] New: error: no matching function for call with template function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105796 Bug ID: 105796 Summary: error: no matching function for call with template function Product: gcc Version: 12.1.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: hiraditya at msn dot com Target Milestone: --- test.cpp ``` int func(int, char); template int testFunc(int (*)(TArgs..., char)); int x = testFunc(func); ``` With gcc trunk: g++ -std=c++20 test.cpp -c :6:22: error: no matching function for call to 'testFunc(int (&)(int, char))' 6 | int x = testFunc(func); | ~^~ :4:5: note: candidate: 'template int testFunc(int (*)(TArgs ..., char))' 4 | int testFunc(int (*)(TArgs..., char)); | ^~~~ :4:5: note: template argument deduction/substitution failed: :6:22: note: mismatched types 'char' and 'int' 6 | int x = testFunc(func); | ~^~ Compiler returned: 1
[Bug c++/101138] New: Ambiguous code (with operator==) compiled without error
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101138 Bug ID: 101138 Summary: Ambiguous code (with operator==) compiled without error Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: hiraditya at msn dot com Target Milestone: --- $ cat test.cpp #include using namespace std; template struct D { template bool operator==(Y a) const { cout << "f" < bool operator==(T a, D b) { cout << "fD" < a, b; if (a == b) return 0; return 1; } gcc compiles this code fine, but clang errors out. https://godbolt.org/z/c13EExxeY
[Bug tree-optimization/101116] New: missed peephole optimization not of bitwise and
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101116 Bug ID: 101116 Summary: missed peephole optimization not of bitwise and Product: gcc Version: 11.1.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: hiraditya at msn dot com Target Milestone: --- $ cat test.c bool foo(unsigned i) { return !(i & 1); } gcc -O2 test.c -S -o- foo(unsigned int): mov eax, edi not eax and eax, 1 ret clang -O2 test.c -S -o- foo(unsigned int): # @foo(unsigned int) testb $1, %dil sete %al retq Ref: https://godbolt.org/z/Tndb1dM8Y
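The two instruction sequences above compute the same value in different ways. A small sketch, written as plain C++ so it can be checked, makes the equivalence explicit. The function names are invented for this example and only mirror what each compiler's output appears to compute.

```cpp
#include <cassert>

// The expression from the report.
constexpr bool is_even_not_and(unsigned i) { return !(i & 1); }

// Mirrors the gcc output: invert the value, then mask bit 0.
constexpr bool is_even_not_then_mask(unsigned i) { return (~i & 1u) != 0; }

// Mirrors the clang output: test bit 0 and set the result if it is zero.
constexpr bool is_even_test_bit(unsigned i) { return (i & 1u) == 0; }

static_assert(is_even_not_and(4u) && !is_even_not_and(7u), "sanity check");

// Checks that all three forms agree on a range of inputs.
void self_check() {
    for (unsigned i = 0; i < 1000; ++i) {
        assert(is_even_not_and(i) == is_even_not_then_mask(i));
        assert(is_even_not_and(i) == is_even_test_bit(i));
    }
    assert(!is_even_not_then_mask(0xFFFFFFFFu));
    assert(!is_even_test_bit(0xFFFFFFFFu));
}
```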
[Bug tree-optimization/100004] Dead write not removed when indirection is introduced.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100004 --- Comment #1 from AK --- godbolt link: https://gcc.godbolt.org/z/f7Y6G1svf
[Bug tree-optimization/100004] New: Dead write not removed when indirection is introduced.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100004
Bug ID: 100004
Summary: Dead write not removed when indirection is introduced.
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: hiraditya at msn dot com
Target Milestone: ---

struct Foo { int x; };
struct Bar { int x; };
void alias(Foo* foo, Bar* bar) {
    foo->x = 5;
    foo->x = bar->x;
}
struct Wrap1 { Foo foo; };
struct Wrap2 { Foo foo; };
void assign_direct(Wrap1* w1, Wrap2* w2) {
    w1->foo.x = 5;
    w1->foo.x = w2->foo.x;
}
void assign_via_pointer(Wrap1* w1, Wrap2* w2) {
    Foo* f1 = &w1->foo;
    Foo* f2 = &w2->foo;
    f1->x = 5;
    f1->x = f2->x;
}

$ gcc-arm64 -O2 -std=c++17 -fstrict-aliasing -S -o -
alias(Foo*, Bar*):
        ldr w1, [x1]
        str w1, [x0]
        ret
assign_direct(Wrap1*, Wrap2*):
        ldr w1, [x1]
        str w1, [x0]
        ret
assign_via_pointer(Wrap1*, Wrap2*):
        mov w2, 5
        str w2, [x0]
        ldr w1, [x1]
        str w1, [x0]
        ret
[Bug libstdc++/59048] operator== between std::string and const char* slower than strcmp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59048 AK changed: What|Removed |Added CC||hiraditya at msn dot com --- Comment #17 from AK --- Now that we have string_view, will it be possible to avoid creating a copy?
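A rough sketch of what a comparison through `std::string_view` could look like (C++17 or later). `equals_no_copy` and `self_check` are hypothetical names used only here; the sketch shows that no temporary `std::string` is needed at the source level, and says nothing about how libstdc++ implements `operator==` or how it performs against `strcmp`.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper: compares a std::string with a C string through two views.
// Building a string_view from `rhs` only computes its length; no characters are
// copied and nothing is allocated.
bool equals_no_copy(const std::string& lhs, const char* rhs) {
    return std::string_view(lhs) == std::string_view(rhs);
}

// Checks equal, different-content, and different-length inputs.
void self_check() {
    const std::string s = "abc";
    assert(equals_no_copy(s, "abc"));
    assert(!equals_no_copy(s, "abd"));
    assert(!equals_no_copy(s, "ab"));
    assert(equals_no_copy(std::string(), ""));
}
```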
[Bug tree-optimization/98497] New: [Potential Perf regression] jne to hot branch instead je to cold
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98497
Bug ID: 98497
Summary: [Potential Perf regression] jne to hot branch instead je to cold
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: hiraditya at msn dot com
Target Milestone: ---

In the following code generated by gcc 10.2
```
.L2:
        movups xmm3, XMMWORD PTR [rax]
        add rax, 16
        addps xmm0, xmm3
        cmp rax, rdx
        je .L6
        jmp .L2
matrix_sum_column_major.cold:
.L6:
        movaps xmm2, xmm0
        # .
```
I think `jne .L2; jmp .L6` should be more efficient, as it avoids one instruction in the hot path.

C code:
```
float matrix_sum_column_major(float* x, int n) {
    n = 32767;
    float sum = 0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            sum += x[j * n + i];
    return sum;
}
```
gcc -Ofast -floop-nest-optimize -o -
```
matrix_sum_column_major:
        mov eax, 4294836212
        lea rdx, [rdi+131056]
        pxor xmm1, xmm1
        lea rcx, [rdi+rax]
.L3:
        mov rax, rdi
        pxor xmm0, xmm0
.L2:
        movups xmm3, XMMWORD PTR [rax]
        add rax, 16
        addps xmm0, xmm3
        cmp rax, rdx
        je .L6
        jmp .L2
matrix_sum_column_major.cold:
.L6:
        movaps xmm2, xmm0
        addss xmm1, DWORD PTR [rax+8]
        lea rdx, [rax+131068]
        add rdi, 131068
        movhlps xmm2, xmm0
        addps xmm2, xmm0
        movaps xmm0, xmm2
        shufps xmm0, xmm2, 85
        addps xmm0, xmm2
        movss xmm2, DWORD PTR [rax+4]
        addss xmm2, DWORD PTR [rax]
        addss xmm1, xmm2
        addss xmm1, xmm0
        cmp rdx, rcx
        jne .L3
        movaps xmm0, xmm1
        ret
```
Link to godbolt: https://gcc.godbolt.org/z/ac7YY1
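As background for the generated code above: the inner loop in the assembly walks memory in 16-byte steps, which suggests the loop nest was interchanged so that the inner loop is contiguous. The following sketch shows that transformation at the source level, with `n` kept as a real parameter. The function names are made up for this example, and the two versions add the elements in a different order, so for general floating-point data the results may differ slightly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Traversal order from the report: the inner loop strides by n floats.
float sum_column_major(const float* x, int n) {
    float sum = 0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            sum += x[static_cast<std::size_t>(j) * n + i];
    return sum;
}

// Interchanged loops: the inner loop walks contiguous memory.
float sum_row_major(const float* x, int n) {
    float sum = 0;
    for (int j = 0; j < n; j++)
        for (int i = 0; i < n; i++)
            sum += x[static_cast<std::size_t>(j) * n + i];
    return sum;
}

// Uses small exact values so both summation orders give the same result.
void self_check() {
    const int n = 4;
    std::vector<float> m(n * n, 1.0f);
    assert(sum_column_major(m.data(), n) == 16.0f);
    assert(sum_row_major(m.data(), n) == 16.0f);
}
```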