[Bug c++/109127] New: More advanced constexpr value compile time evaluation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109127

Bug ID: 109127
Summary: More advanced constexpr value compile time evaluation
Product: gcc
Version: 12.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: dmitriy.ovdienko at gmail dot com
Target Milestone: ---

Hello,

I'd like to report an idea that could improve application performance. It concerns `constexpr` math, which can be performed at compile time. To some degree the C++ compiler already performs this optimization, but in my more realistic example it does not.

Let's start with a simple example that explains the idea and that works. The following function serializes a `constexpr` unsigned value into a string. Its output is reversed, which we will address later.

```cpp
// The expected output is "543\0"
void foo1(char* ptr)
{
    constexpr unsigned Tag = 345;

    auto v = Tag;
    do
    {
        *ptr++ = (v % 10) + '0';
        v /= 10;
    }
    while(v);

    *ptr = 0;
}
```

The generated assembly is as follows:

```asm
foo1(char*):
        mov     eax, DWORD PTR .LC0[rip]
        mov     DWORD PTR [rdi], eax
        ret
.LC0:
        .byte   53
        .byte   52
        .byte   51
        .byte   0
```

That is good enough. I would replace the load from `.LC0` with an immediate operand, though, so that the CPU does not have to access another memory location (on little-endian x86 the bytes 53, 52, 51, 0 form the value 0x00333435):

```
mov eax, 0x00333435 ; instead of mov eax, DWORD PTR .LC0[rip]
```

Now I change the code a bit to use base-16 math. This is an intermediate step before we get to the real code:

```cpp
void foo2(char* ptr)
{
    constexpr unsigned Tag = 0xF345;

    auto v = Tag;
    while(v != 0xF)
    {
        *ptr++ = (v % 16) + '0';
        v /= 16;
    }

    *ptr = 0;
}
```

The assembly is the same as above, which is good.
What does not work is reversing the output bytes: in that case the compiler no longer performs the `constexpr` math at compile time:

```cpp
void foo3(char* ptr)
{
    constexpr unsigned Tag = 0x345;

    // Convert 0x345 -> 0xF543
    auto v = Tag;
    auto reversed = 0xFu; // 0xF is a stop value
    while(v)
    {
        reversed <<= 4;
        reversed |= v & 0xFu;
        v >>= 4;
    }

    // Now serialize 0xF543 into "345\0"
    while(reversed != 0xF)
    {
        *ptr++ = (reversed % 16) + '0';
        reversed /= 16;
    }

    *ptr = 0;
}
```

The assembly output is as follows:

```asm
foo3(char*):
        mov     eax, 62277
.L2:
        mov     edx, eax
        add     rdi, 1
        shr     eax, 4
        and     edx, 15
        add     edx, 48
        mov     BYTE PTR [rdi-1], dl
        cmp     eax, 15
        jne     .L2
        mov     BYTE PTR [rdi], 0
        ret
```

The assembly above contains a loop, `.L2`, which could have been evaluated during compilation.

The workaround is to force the compiler to calculate the reversed value and store it as `constexpr`:

```cpp
constexpr unsigned reverse(unsigned v)
{
    auto reversed = 0xFu;
    while(v)
    {
        reversed <<= 4;
        reversed |= v & 0xFu;
        v >>= 4;
    }
    return reversed;
}

void foo3(char* ptr)
{
    constexpr unsigned Tag = 0x543;
    constexpr unsigned ReversedTag = reverse(Tag);

    auto reversed = ReversedTag;
    while(reversed != 0xF)
    {
        *ptr++ = (reversed % 16) + '0';
        reversed /= 16;
    }

    *ptr = 0;
}
```

The assembly is back to normal:

```asm
foo3(char*):
        mov     eax, DWORD PTR .LC0[rip]
        mov     DWORD PTR [rdi], eax
        ret
.LC0:
        .byte   53
        .byte   52
        .byte   51
        .byte   0
```
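The workaround above can be taken one step further by producing the whole string in a constant expression, so that no serialization loop is left for the optimizer to fold. The following is a minimal sketch under stated assumptions, not part of the original report: `TagText`, `render_tag` and `foo4` are hypothetical names, the code needs C++14 or later for loops in `constexpr` functions, and, like the original, it only handles hex digits 0 to 9. Whether GCC turns the final copy into a single store has not been verified here.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper type: room for up to 8 hex digits plus the terminator.
struct TagText
{
    char data[9];
    unsigned size; // number of bytes to copy, including the terminator
};

// Hypothetical helper: renders v most-significant digit first,
// entirely inside a constant expression.
constexpr TagText render_tag(unsigned v)
{
    TagText out{};

    // Count the hex digits of v.
    unsigned n = 0;
    for (unsigned t = v; t != 0; t >>= 4)
    {
        ++n;
    }

    // Fill the buffer from the last digit backwards.
    for (unsigned i = 0; i < n; ++i)
    {
        out.data[n - 1 - i] = static_cast<char>((v & 0xFu) + '0');
        v >>= 4;
    }

    out.data[n] = '\0';
    out.size = n + 1;
    return out;
}

void foo4(char* ptr)
{
    assert(ptr != nullptr);

    // Forced compile-time evaluation, as with ReversedTag in the workaround.
    constexpr TagText text = render_tag(0x345);
    std::memcpy(ptr, text.data, text.size);
}
```

Calling `foo4` on a buffer of at least four bytes writes "345" followed by the terminator.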
[Bug c++/98840] Why does baz call the delete operator for moved unique_ptr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98840

--- Comment #4 from Dmitriy Ovdienko ---
What if we introduced a new ABI version and encoded it in the mangled function name? There would then be two options:

* Either compile the code and store both versions (ABI v1 and v2) in the lib file. This applies only to functions that take arguments of a non-trivial class type by value.
* Or compile for ABI v2 only. If the linker finds the referenced ABI v2 function, it uses it as is (assuming that the v2 function destroys the object internally). If the v2 function is not found, it calls the v1 function and adds the code to destroy the objects passed by value.

This applies to destruction only. The stack is still cleaned up by the calling function, as before.
[Bug c++/98840] Why does baz call the delete operator for moved unique_ptr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98840

--- Comment #3 from Dmitriy Ovdienko ---
> This is not a GCC bug.

No, it is not. But can we improve on it? This approach emits the destruction code at every call site, so if `baz` is called from many places, the binary size grows accordingly.
[Bug c++/98840] New: Why does baz call the delete operator for moved unique_ptr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98840

Bug ID: 98840
Summary: Why does baz call the delete operator for moved unique_ptr
Product: gcc
Version: 10.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: dmitriy.ovdienko at gmail dot com
Target Milestone: ---

I'm trying to evaluate the overhead of `unique_ptr`, and I do not understand why GCC executes the destructor of a `unique_ptr` passed by value.

Let's assume we have two examples of code.

C style:

```
void foo(int* ptr);

void baz(int value)
{
    int* ptr = new int(value);
    try
    {
        foo(ptr);
    }
    catch(...)
    {
        delete ptr;
        throw;
    }
}
```

The asm (-O3):

```
baz(int):
        push    rbp
        push    rbx
        mov     ebx, edi
        mov     edi, 4
        sub     rsp, 8
        call    operator new(unsigned long)
        mov     DWORD PTR [rax], ebx
        mov     rdi, rax
        mov     rbp, rax
        call    foo(int*)
        add     rsp, 8
        pop     rbx
        pop     rbp
        ret
        mov     rdi, rax
        jmp     .L2
baz(int) [clone .cold]:
.L2:
        call    __cxa_begin_catch
        mov     esi, 4
        mov     rdi, rbp
        call    operator delete(void*, unsigned long)
        call    __cxa_rethrow
        mov     rbp, rax
        call    __cxa_end_catch
        mov     rdi, rbp
        call    _Unwind_Resume
```

And C++ style:

```
#include <memory>

void foo(std::unique_ptr<int> ptr);

void baz(int value)
{
    foo(std::make_unique<int>(value));
}
```

The asm (-O3):

```
baz(int):
        push    rbp
        push    rbx
        mov     ebx, edi
        mov     edi, 4
        sub     rsp, 24
        call    operator new(unsigned long)
        lea     rdi, [rsp+8]
        mov     DWORD PTR [rax], ebx
        mov     QWORD PTR [rsp+8], rax
        call    foo(std::unique_ptr<int, std::default_delete<int> >)
        mov     rdi, QWORD PTR [rsp+8]
        test    rdi, rdi
        je      .L1
        mov     esi, 4
        ; << Why do we need to call operator delete here? `foo` is responsible for that.
        call    operator delete(void*, unsigned long)
.L1:
        add     rsp, 24
        pop     rbx
        pop     rbp
        ret
        mov     rbp, rax
        jmp     .L3
baz(int) [clone .cold]:
```
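The language rule behind the extra check can be observed without reading assembly: a by-value parameter is a separate object, and its destructor still runs after the callee has moved out of it. The sketch below is an illustration under assumptions, not code from the report: `Tracker`, `Counters`, `sink` and `run_demo` are hypothetical names, and `Tracker` merely stands in for `std::unique_ptr<int>`. It shows only that the destructor of the moved-from parameter runs. Which side of the call emits that destructor is decided by the platform ABI and cannot be seen from this code.

```cpp
#include <cassert>
#include <utility>

// Hypothetical bookkeeping for the demonstration.
struct Counters
{
    int destructor_calls = 0; // every ~Tracker()
    int releases = 0;         // ~Tracker() on an object that still owns something
};

inline Counters& counters()
{
    static Counters c;
    return c;
}

// Hypothetical move-only stand-in for std::unique_ptr<int>.
struct Tracker
{
    bool owns = true;

    Tracker() = default;
    Tracker(Tracker&& other) noexcept : owns(other.owns) { other.owns = false; }
    Tracker(Tracker const&) = delete;

    ~Tracker()
    {
        ++counters().destructor_calls;
        if (owns)
        {
            ++counters().releases;
        }
    }
};

// Takes ownership by value and moves it on, like foo in the report might.
void sink(Tracker t)
{
    assert(t.owns);               // the caller is expected to hand over ownership
    Tracker local = std::move(t); // t is now empty, but it is still an object
}                                 // local releases here; t is destroyed as well

Counters run_demo()
{
    counters() = Counters{};
    {
        Tracker a;
        sink(std::move(a));
    }
    return counters();
}
```

`run_demo()` reports three destructor calls (for `a`, the parameter `t`, and `local`) but only one release, which mirrors the `test rdi, rdi` / `je .L1` sequence in the listing: the destructor of the parameter must run, and it only skips the deallocation when the pointer turns out to be null.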
[Bug c++/97641] Wrong codegen if optimizer is enabled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97641

--- Comment #7 from Dmitriy Ovdienko ---
If I change the body of the loop like this, it also works:

```
while ('\x01' != *ptr)
{
    result = result * 10 - '0' + *ptr++;
}
```

It looks like a signed integer overflow happens in the last iteration, and the compiler treats it as undefined behavior.
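The arithmetic behind that observation can be checked by evaluating the last loop step in a wider type. The sketch below is an illustration, not code from the report; the helper names are hypothetical, and it assumes a 32-bit `int` and an ASCII execution character set (`'0' == 48`). Because `+` and `-` associate left to right, `result * 10 + *ptr++ - '0'` first adds the raw character code and only then subtracts `'0'`.

```cpp
#include <climits>

// Hypothetical helper: the intermediate value of (result * 10 + c) that is
// formed before '0' is subtracted, computed in long long so it cannot overflow.
constexpr long long ungrouped_intermediate(int result, char c)
{
    return static_cast<long long>(result) * 10 + c;
}

// Hypothetical helper: the value of result * 10 + (c - '0'), where the digit
// is converted first and no oversized intermediate appears.
constexpr long long grouped_sum(int result, char c)
{
    return static_cast<long long>(result) * 10 + (c - '0');
}

// Last step of parsing "2147483600": result is 214748360, the character is '0'.
static_assert(ungrouped_intermediate(214748360, '0') == 2147483648LL,
              "intermediate is INT_MAX + 1 with 32-bit int and ASCII");
static_assert(ungrouped_intermediate(214748360, '0') > INT_MAX,
              "does not fit in int, so the int expression overflows");
static_assert(grouped_sum(214748360, '0') == 2147483600LL,
              "the parenthesised form stays in range");

// Last step of parsing "2147483599": the intermediate is exactly INT_MAX.
static_assert(ungrouped_intermediate(214748359, '9') == INT_MAX,
              "largest input whose intermediate still fits");
```

This matches the thread: 2147483599 is the largest value reported to work because its intermediate sum is exactly `INT_MAX`, while 2147483600 needs `INT_MAX + 1` for a moment even though the final result fits in an `int`.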
[Bug c++/97641] Wrong codegen if optimizer is enabled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97641

--- Comment #6 from Dmitriy Ovdienko ---
This code does not work:

```
#include <cstdio>

int Parse1(char const* ptr) noexcept
{
    int result = 0;
    while ('\x01' != *ptr)
    {
        result = result * 10 + *ptr++ - '0';
    }
    return result;
}

int main()
{
    if(2147483600 != Parse1("2147483600\x01"))
        printf("does not match\n");
    else
        printf("matches\n");
}
```

But this does work:

```
#include <cstdio>

int Parse1(char const* ptr) noexcept
{
    int result = 0;
    while ('\x01' != *ptr)
    {
        result = result * 10 + (*ptr++ - '0');
    }
    return result;
}

int main()
{
    if(2147483600 != Parse1("2147483600\x01"))
        printf("does not match\n");
    else
        printf("matches\n");
}
```
[Bug c++/97641] Wrong codegen if optimizer is enabled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97641

Dmitriy Ovdienko changed:

           What    |Removed     |Added
           --------|------------|------------
             Status|RESOLVED    |UNCONFIRMED
         Resolution|INVALID     |---

--- Comment #5 from Dmitriy Ovdienko ---
The maximum value that works is 2147483599. 2147483600 does not work.

My function is correct. It works with clang and VC++.
[Bug c++/97641] Wrong codegen if optimizer is enabled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97641

--- Comment #4 from Dmitriy Ovdienko ---
It also happens with 2147483646, 2147483647 and std::numeric_limits<int>::min().
[Bug c++/97641] Wrong codegen if optimizer is enabled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97641

--- Comment #1 from Dmitriy Ovdienko ---
OS: Windows 10
Distribution: MSys2 (https://www.msys2.org/)
Version: (Rev4, Built by MSYS2 project) 10.2.0

I tried to reproduce this issue on https://gcc.godbolt.org/. gcc (trunk) is also unable to compile this code correctly.
[Bug c++/97641] New: Wrong codegen if optimizer is enabled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97641

Bug ID: 97641
Summary: Wrong codegen if optimizer is enabled
Product: gcc
Version: 10.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: dmitriy.ovdienko at gmail dot com
Target Milestone: ---

The g++ optimizer produces wrong code when -O3 is used. With -O2 and -O1 the app works as expected.

Expected output: matches
Actual output: does not match

```
//
// g++ -O3 test.cpp
//
#include <cstdio>

int Parse1(char const* ptr) noexcept
{
    bool const negative = '-' == *ptr;
    if (negative)
    {
        ++ptr;
    }

    int result = 0;
    while ('\x01' != *ptr)
    {
        result = result * 10 + *ptr++ - '0';
    }

    return negative ? -result : result;
}

int main()
{
    if(-2147483648 != Parse1("-2147483648\x01"))
        printf("does not match\n");
    else
        printf("matches\n");
}
```
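For this input, parenthesising the digit conversion is not enough, because the magnitude 2147483648 does not fit in a 32-bit `int` at all. One common way around that is to accumulate the value as a negative number, since the negative range is one larger than the positive one. The following is a minimal sketch under stated assumptions rather than a fix proposed in the thread: `parse_until_soh` is a hypothetical name, the input is assumed to be terminated by `'\x01'` and to contain only decimal digits after an optional `'-'`, and out-of-range input is not detected.

```cpp
#include <cassert>

// Hypothetical variant of Parse1 that never forms a value outside the int
// range for inputs between INT_MIN and INT_MAX.
int parse_until_soh(char const* ptr) noexcept
{
    assert(ptr != nullptr);

    bool const negative = ('-' == *ptr);
    if (negative)
    {
        ++ptr;
    }

    // Accumulate as a non-positive number: every in-range magnitude,
    // including that of INT_MIN, is representable this way.
    int result = 0;
    while ('\x01' != *ptr)
    {
        result = result * 10 - (*ptr++ - '0');
    }

    // Negating is safe for positive inputs because -INT_MAX is representable.
    return negative ? result : -result;
}
```

With a 32-bit `int`, `parse_until_soh("-2147483648\x01")` yields `INT_MIN` without any intermediate overflow, and `parse_until_soh("2147483600\x01")` yields 2147483600.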