[Bug c++/109127] New: More advanced constexpr value compile time evaluation

2023-03-14 Thread dmitriy.ovdienko at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109127

            Bug ID: 109127
           Summary: More advanced constexpr value compile time evaluation
           Product: gcc
           Version: 12.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dmitriy.ovdienko at gmail dot com
  Target Milestone: ---

Hello,

I'd like to suggest an idea that could improve the performance of generated
code.

The idea concerns arithmetic on `constexpr` values, which can be performed at
compile time. To some degree the compiler already performs this optimization,
but in my more realistic example it does not.

Let's start with a simple example that explains the idea and that works. The
following function serializes a `constexpr` unsigned value into a string. The
output is reversed, which is not what I ultimately want, but we will get to
that later.

```cpp
// The expected output is "543\0"
void foo1(char* ptr)
{
    constexpr unsigned Tag = 345;

    auto v = Tag;

    do
    {
        *ptr++ = (v % 10) + '0';
        v /= 10;
    }
    while(v);

    *ptr = 0;
}
```

The generated assembly is as follows:

```asm
foo1(char*):
mov eax, DWORD PTR .LC0[rip]
mov DWORD PTR [rdi], eax
ret

.LC0:
.byte   53
.byte   52
.byte   51
.byte   0
```

This is good enough. I would replace the load from `.LC0` with an immediate
operand though, so that the CPU does not have to access another memory
location. Since x86 is little-endian, the bytes 53, 52, 51, 0 correspond to the
32-bit value 0x00333435:

```asm
mov eax, 0x00333435
; instead of
mov eax, DWORD PTR .LC0[rip]
```

Now I change the code a bit to use base-16 arithmetic. This is an intermediate
step before we get to the real code:

```cpp
void foo2(char* ptr)
{
    constexpr unsigned Tag = 0xF345;

    auto v = Tag;

    while(v != 0xF)
    {
        *ptr++ = (v % 16) + '0';
        v /= 16;
    }

    *ptr = 0;
}
```

The assembly is the same as above, which is good.

What does not work is reversing the digits first. In that case the compiler no
longer performs the `constexpr` arithmetic at compile time:

```cpp
void foo3(char* ptr)
{
    constexpr unsigned Tag = 0x345;

    // Convert 0x345 -> 0xF543
    auto v = Tag;
    auto reversed = 0xFu; // 0xF is a stop value
    while(v)
    {
        reversed <<= 4;
        reversed |= v & 0xFu;
        v >>= 4;
    }

    // Now serialize 0xF543 into "345\0"
    while(reversed != 0xF)
    {
        *ptr++ = (reversed % 16) + '0';
        reversed /= 16;
    }

    *ptr = 0;
}
```

The assembly output is as follows:

```asm
foo3(char*):
mov eax, 62277
.L2:
mov edx, eax
add rdi, 1
shr eax, 4
and edx, 15
add edx, 48
mov BYTE PTR [rdi-1], dl
cmp eax, 15
jne .L2
mov BYTE PTR [rdi], 0
ret
```

In the assembly above there is a `.L2` loop, which could have been evaluated at
compile time.

The workaround is to force the compiler to calculate the reversed value and
store it in a `constexpr` variable:

```cpp
constexpr unsigned reverse(unsigned v)
{
    auto reversed = 0xFu;
    while(v)
    {
        reversed <<= 4;
        reversed |= v & 0xFu;
        v >>= 4;
    }

    return reversed;
}

void foo3(char* ptr)
{
    constexpr unsigned Tag = 0x543;
    constexpr unsigned ReversedTag = reverse(Tag);

    auto reversed = ReversedTag;
    while(reversed != 0xF)
    {
        *ptr++ = (reversed % 16) + '0';
        reversed /= 16;
    }

    *ptr = 0;
}
```

The assembly is back to normal:

```asm
foo3(char*):
mov eax, DWORD PTR .LC0[rip]
mov DWORD PTR [rdi], eax
ret
.LC0:
.byte   53
.byte   52
.byte   51
.byte   0
```
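
The same workaround can be taken one step further by moving the whole
serialization, not just the digit reversal, into a `constexpr` function. The
following is only a minimal sketch under stated assumptions: it requires C++17,
it uses decimal digits as in `foo1`, and the names `digit_count`, `tag_digits`
and `write_tag` are hypothetical and do not come from the report. What code the
optimizer emits for the final copy is not guaranteed.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: number of decimal digits in v (at least 1).
constexpr std::size_t digit_count(unsigned v)
{
    std::size_t n = 1;
    while (v /= 10)
    {
        ++n;
    }
    return n;
}

// Hypothetical helper: decimal digits of Tag, most significant first,
// followed by a terminating '\0'.
template <unsigned Tag>
constexpr std::array<char, digit_count(Tag) + 1> tag_digits()
{
    std::array<char, digit_count(Tag) + 1> out{}; // last element stays '\0'
    unsigned v = Tag;
    for (std::size_t i = digit_count(Tag); i-- > 0; v /= 10)
    {
        out[i] = static_cast<char>('0' + v % 10);
    }
    return out;
}

// The run-time part only copies bytes that were computed at compile time.
template <unsigned Tag>
void write_tag(char* ptr)
{
    constexpr auto digits = tag_digits<Tag>();
    for (char c : digits)
    {
        *ptr++ = c;
    }
}

// The digits are available in constant expressions.
constexpr auto k345 = tag_digits<345>();
static_assert(k345[0] == '3' && k345[1] == '4' && k345[2] == '5' && k345[3] == '\0',
              "345 serializes to \"345\\0\"");
```

Because `digits` is a `constexpr` variable, the compiler has to evaluate the
digit loop during compilation; only the fixed-size copy is left for the
optimizer.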

[Bug c++/98840] Why does baz call the delete operator for moved unique_ptr

2021-01-26 Thread dmitriy.ovdienko at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98840

--- Comment #4 from Dmitriy Ovdienko  ---
What if we introduce a new ABI version and encode it into the function name
(name mangling)?

There would then be two options:

* Either compile the code and store both versions (ABI v1 and v2) in the lib
file. This applies only to functions that take arguments of a non-trivial class
type by value.
* Or compile for ABI v2 only. If the linker can find the referenced ABI v2
function, it uses it as is (assuming that the v2 function destroys the object
itself). If the v2 function is not found, it calls the v1 function and adds the
code to destroy the objects passed by value.

This applies to destruction only. The stack is cleaned up by the calling
function, as before.

[Bug c++/98840] Why does baz call the delete operator for moved unique_ptr

2021-01-26 Thread dmitriy.ovdienko at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98840

--- Comment #3 from Dmitriy Ovdienko  ---
> This is not a GCC bug.

No, it is not. But can we improve it?

That approach increases the binary size, and the more places `baz` is called
from, the larger the increase.

[Bug c++/98840] New: Why does baz call the delete operator for moved unique_ptr

2021-01-26 Thread dmitriy.ovdienko at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98840

            Bug ID: 98840
           Summary: Why does baz call the delete operator for moved
                    unique_ptr
           Product: gcc
           Version: 10.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dmitriy.ovdienko at gmail dot com
  Target Milestone: ---

I'm trying to evaluate the overhead of `unique_ptr`, and I do not understand
why GCC executes the destructor of a `unique_ptr` passed by value.

Let's assume we have two examples of code:

C style:

```cpp
void foo(int* ptr);

void baz(int value)
{
    int* ptr = new int(value);

    try
    {
        foo(ptr);
    }
    catch(...)
    {
        delete ptr;
        throw;
    }
}
```

The asm (-O3):

```asm
baz(int):
push rbp
push rbx
mov ebx, edi
mov edi, 4
sub rsp, 8
call operator new(unsigned long)
mov DWORD PTR [rax], ebx
mov rdi, rax
mov rbp, rax
call foo(int*)
add rsp, 8
pop rbx
pop rbp
ret
mov rdi, rax
jmp .L2

baz(int) [clone .cold]:
.L2:
call __cxa_begin_catch
mov esi, 4
mov rdi, rbp
call operator delete(void*, unsigned long)
call __cxa_rethrow
mov rbp, rax
call __cxa_end_catch
mov rdi, rbp
call _Unwind_Resume
```


And C++ style:

```cpp
#include <memory>

void foo(std::unique_ptr<int> ptr);

void baz(int value)
{
    foo(std::make_unique<int>(value));
}
```

The asm (-O3):

```asm
baz(int):
push rbp
push rbx
mov ebx, edi
mov edi, 4
sub rsp, 24
call operator new(unsigned long)
lea rdi, [rsp+8]
mov DWORD PTR [rax], ebx
mov QWORD PTR [rsp+8], rax
call foo(std::unique_ptr<int, std::default_delete<int> >)
mov rdi, QWORD PTR [rsp+8]
test rdi, rdi
je .L1
mov esi, 4
; << Here: why do we need to call the delete operator?
;    It is `foo` that is responsible for that.
call operator delete(void*, unsigned long)
.L1:
add rsp, 24
pop rbx
pop rbp
ret
mov rbp, rax
jmp .L3
baz(int) [clone .cold]:
```

[Bug c++/97641] Wrong codegen if optimizer is enabled

2020-10-30 Thread dmitriy.ovdienko at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97641

--- Comment #7 from Dmitriy Ovdienko  ---
If I change the body of the loop like this, it also works:

```cpp
while ('\x01' != *ptr)
{
    result = result * 10 - '0' + *ptr++;
}
```

It looks like signed integer overflow happens on the last iteration, and the
compiler treats it as UB.
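
To make the overflow visible, the last loop iteration for the input
"2147483600" can be redone in 64-bit arithmetic, where the intermediate values
are representable. This is an illustrative sketch only; it assumes a 32-bit
`int` and an ASCII character set, and the variable names are made up for the
illustration.

```cpp
#include <climits>

// State before the final digit of "2147483600" is consumed.
constexpr long long prev  = 214748360; // value of `result` so far
constexpr long long digit = '0';       // last character, 48 in ASCII

// result * 10 + *ptr++ - '0' groups as (result * 10 + *ptr++) - '0'
constexpr long long left_to_right = prev * 10 + digit;         // 2147483648

// result * 10 + (*ptr++ - '0')
constexpr long long digit_first   = prev * 10 + (digit - '0'); // 2147483600

// result * 10 - '0' + *ptr++ groups as (result * 10 - '0') + *ptr++
constexpr long long minus_first   = prev * 10 - '0';           // 2147483552

static_assert(left_to_right > INT_MAX,
              "the intermediate sum does not fit into a 32-bit int");
static_assert(digit_first <= INT_MAX,
              "subtracting '0' first keeps the sum in range");
static_assert(minus_first <= INT_MAX && minus_first + digit <= INT_MAX,
              "subtracting '0' before adding the character stays in range");

// For "2147483599" the intermediate sum is 2147483590 + '9' == INT_MAX,
// so that input is the largest one that does not overflow.
static_assert(214748359LL * 10 + '9' == INT_MAX,
              "2147483599 just fits");
```

Only the first grouping produces an intermediate value above `INT_MAX`, which
matches the observation that the other two loop bodies behave as expected.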

[Bug c++/97641] Wrong codegen if optimizer is enabled

2020-10-30 Thread dmitriy.ovdienko at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97641

--- Comment #6 from Dmitriy Ovdienko  ---
This code does not work:

```cpp
#include <stdio.h>

int Parse1(char const* ptr) noexcept
{
    int result = 0;
    while ('\x01' != *ptr)
    {
        result = result * 10 + *ptr++ - '0';
    }
    return result;
}

int main()
{
    if(2147483600 != Parse1("2147483600\x01"))
        printf("does not match\n");
    else
        printf("matches\n");
}
```

But this does work:

```cpp
#include <stdio.h>

int Parse1(char const* ptr) noexcept
{
    int result = 0;
    while ('\x01' != *ptr)
    {
        result = result * 10 + (*ptr++ - '0');
    }
    return result;
}

int main()
{
    if(2147483600 != Parse1("2147483600\x01"))
        printf("does not match\n");
    else
        printf("matches\n");
}
```

[Bug c++/97641] Wrong codegen if optimizer is enabled

2020-10-30 Thread dmitriy.ovdienko at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97641

Dmitriy Ovdienko  changed:

           What|Removed |Added
----------------------------------------
         Status|RESOLVED|UNCONFIRMED
     Resolution|INVALID |---

--- Comment #5 from Dmitriy Ovdienko  ---
The maximum value that works is 2147483599. 2147483600 does not work.

My function is correct. On clang and vc++ it works.

[Bug c++/97641] Wrong codegen if optimizer is enabled

2020-10-30 Thread dmitriy.ovdienko at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97641

--- Comment #4 from Dmitriy Ovdienko  ---
It happens with 2147483646, 2147483647 and `std::numeric_limits<int>::min()`.

[Bug c++/97641] Wrong codegen if optimizer is enabled

2020-10-30 Thread dmitriy.ovdienko at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97641

--- Comment #1 from Dmitriy Ovdienko  ---
OS: Windows 10
Distribution: MSys2 (https://www.msys2.org/)
Version: (Rev4, Built by MSYS2 project) 10.2.0

I tried to reproduce this issue on https://gcc.godbolt.org/. gcc (trunk) is
also unable to compile this code correctly.

[Bug c++/97641] New: Wrong codegen if optimizer is enabled

2020-10-30 Thread dmitriy.ovdienko at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97641

            Bug ID: 97641
           Summary: Wrong codegen if optimizer is enabled
           Product: gcc
           Version: 10.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dmitriy.ovdienko at gmail dot com
  Target Milestone: ---

The g++ optimizer produces wrong code if -O3 is used. With -O2 and -O1 the app
works as expected.

Expected output: matches
Actual output: does not match

```cpp
//
// g++ -O3 test.cpp
//

#include <stdio.h>

int Parse1(char const* ptr) noexcept
{
    bool const negative = '-' == *ptr;
    if (negative)
    {
        ++ptr;
    }

    int result = 0;
    while ('\x01' != *ptr)
    {
        result = result * 10 + *ptr++ - '0';
    }
    return negative ? -result : result;
}

int main()
{
    if(-2147483648 != Parse1("-2147483648\x01"))
        printf("does not match\n");
    else
        printf("matches\n");
}
```
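
For comparison, here is a hedged sketch of a variant that avoids signed `int`
overflow for this input. For "-2147483648" the magnitude 2147483648 does not
fit into a 32-bit `int` regardless of how the loop body is parenthesized, so the
sketch accumulates in `long long` and narrows only at the end. The name
`ParseWide` is hypothetical and not part of the report. The function is marked
`constexpr` so that the checks can run at compile time, where undefined
behaviour such as signed overflow is rejected rather than silently accepted.
Inputs with more digits than `long long` can hold are not handled.

```cpp
#include <climits>

// Hypothetical overflow-free variant of Parse1 for inputs whose decimal
// value lies within [INT_MIN, INT_MAX]. The input is terminated by '\x01',
// as in the report.
constexpr int ParseWide(char const* ptr) noexcept
{
    bool const negative = '-' == *ptr;
    if (negative)
    {
        ++ptr;
    }

    // long long is at least 64 bits wide, so 2147483648 is representable.
    long long result = 0;
    while ('\x01' != *ptr)
    {
        result = result * 10 + (*ptr++ - '0');
    }

    // Negate in the wide type first, then narrow to int.
    return static_cast<int>(negative ? -result : result);
}

// These checks assume a 32-bit int.
static_assert(ParseWide("-2147483648\x01") == INT_MIN,
              "most negative int parses without overflow");
static_assert(ParseWide("2147483600\x01") == 2147483600,
              "value from the later comments parses without overflow");
```

Accumulating in a wider type keeps the loop body unchanged. An alternative
would be to accumulate the negated value in `int`, at the cost of a less
obvious loop.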