[Bug gcov-profile/114851] Alternative to -Wmisexpect from LLVM in GCC

2024-04-25 Thread zamazan4ik at tut dot by via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114851

--- Comment #3 from Alexander Zaitsev  ---
> Though I do wonder if the "hints" are used instead of the PGO here.

We already discussed this question a bit in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112806 . If I understand
correctly, there is no clear answer yet:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112806#c4 .

[Bug gcov-profile/114851] New: Alternative to -Wmisexpect from LLVM in GCC

2024-04-25 Thread zamazan4ik at tut dot by via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114851

Bug ID: 114851
   Summary: Alternative to -Wmisexpect from LLVM in GCC
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: gcov-profile
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zamazan4ik at tut dot by
  Target Milestone: ---

LLVM infrastructure supports a diagnostic for checking mismatches between
user-provided __builtin_expect/[[likely]] hints and PGO profiles:
https://clang.llvm.org/docs/DiagnosticsReference.html#wmisexpect +
https://llvm.org/docs/MisExpect.html (and an example of its usage in Chromium:
https://issues.chromium.org/issues/40694104).

I looked for a similar diagnostic in GCC but found nothing. Is there anything
similar in GCC? If not, could this issue be treated as a feature request for
it? Such a diagnostic is helpful in practice because it finds misplaced user
hints in the sources.
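
A minimal sketch of the kind of mismatch such a diagnostic would report. The
function names and the sample workload are hypothetical; the counting is done
by hand here, whereas a real diagnostic would read the counts from a PGO
profile.

```cpp
#include <cstddef>

// Hypothetical function: the hint claims that negative input is the common case.
constexpr int sign_bucket(int value) {
    if (value < 0) [[likely]] {
        return -1;
    }
    return 1;
}

// Counts how often the hinted branch is actually taken for a sample workload.
// A profile-based check would compare a ratio like this against the hint.
constexpr std::size_t hinted_branch_hits(const int* samples, std::size_t count) {
    std::size_t hits = 0;
    for (std::size_t i = 0; i < count; ++i) {
        if (sign_bucket(samples[i]) == -1) {
            ++hits;
        }
    }
    return hits;
}
```

If the hinted branch is taken for only one of eight samples, the [[likely]]
hint contradicts the observed behavior; that is the case the requested warning
would flag.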

[Bug tree-optimization/114761] Ignored [[likely]] attribute with multiple if statements doing the same thing

2024-04-18 Thread zamazan4ik at tut dot by via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114761

--- Comment #5 from Alexander Zaitsev  ---
> Is this based on real code or you just was looking at the differences between 
> gcc and clang here?

No, it is not based on real code. I came up with this example when I found that
GCC doesn't reorder the branches according to PGO profiles here, while Clang
does. I was just curious about this difference in behavior between the
compilers and am trying to figure out which compiler is "right" here.

Regarding the efficiency of the code generated by Clang and GCC in this case:
am I right that ignoring the branch probabilities (as GCC does) doesn't affect
actual performance here? I am asking because I am not very proficient in
compiler optimizations.

[Bug tree-optimization/114761] New: Ignored [[likely]] attribute

2024-04-17 Thread zamazan4ik at tut dot by via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114761

Bug ID: 114761
   Summary: Ignored [[likely]] attribute
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zamazan4ik at tut dot by
  Target Milestone: ---

For the following code:

bool foo(int var)
{
if (var == 42) [[unlikely]] return true;
if (var == 322) [[unlikely]] return true;
if (var == 1337) [[likely]] return true;

return false;
}

GCC (trunk) with "-O3 -std=c++20" generates the following:

foo(int):
cmp edi, 322
sete al
cmp edi, 42
sete dl
or  eax, edx
cmp edi, 1337
sete dl
or  eax, edx
ret

Clang (18) with "-O3 -std=c++20", however, generates a slightly different
version:

foo(int): # @foo(int)
mov al, 1
cmp edi, 1337
jne .LBB0_1
.LBB0_4:
ret
.LBB0_1:
cmp edi, 42
je  .LBB0_4
cmp edi, 322
je  .LBB0_4
xor eax, eax
ret

For some reason GCC ignores the [[likely]] attribute and doesn't place the
branch for 1337 at the beginning of the function, whereas Clang does. Placing
this branch first should be more efficient when the hint is accurate. I also
tested GCC 13.2 (on my Fedora machine) with __builtin_expect and with PGO; the
result is the same: GCC does not perform this reordering.

Godbolt link: https://godbolt.org/z/o8KMx8M33
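
For comparison, a hand-reordered sketch of the same function. The name
foo_reordered is hypothetical; it only illustrates the source-level order that
corresponds to the Clang layout above and does not claim anything about how
either compiler decides on block placement.

```cpp
// Original order from the report.
constexpr bool foo(int var) {
    if (var == 42) [[unlikely]] return true;
    if (var == 322) [[unlikely]] return true;
    if (var == 1337) [[likely]] return true;
    return false;
}

// Hand-reordered variant: the comparison marked [[likely]] is tested first.
// The three conditions are mutually exclusive, so the result is unchanged.
constexpr bool foo_reordered(int var) {
    if (var == 1337) [[likely]] return true;
    if (var == 42) [[unlikely]] return true;
    if (var == 322) [[unlikely]] return true;
    return false;
}
```

Both functions return the same value for every input, so the reordering is a
pure layout decision.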

[Bug gcov-profile/112829] Dump PGO profiles to a memory buffer

2023-12-03 Thread zamazan4ik at tut dot by via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112829

--- Comment #2 from Alexander Zaitsev  ---
Am I right that GCC currently has no ready-to-use alternative to "int
__llvm_profile_write_buffer(char *Buffer)" from LLVM, and that it has to be
implemented manually (as you described)?

[Bug gcov-profile/112829] New: Dump PGO profiles to a memory buffer

2023-12-02 Thread zamazan4ik at tut dot by via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112829

Bug ID: 112829
   Summary: Dump PGO profiles to a memory buffer
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: gcov-profile
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zamazan4ik at tut dot by
CC: marxin at gcc dot gnu.org
  Target Milestone: ---

According to the GCC documentation
(https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html), the only
option is to dump PGO profiles to a filesystem. I am looking for a way to dump
PGO profiles into a memory buffer. LLVM has such an ability; it is documented
here:
https://clang.llvm.org/docs/SourceBasedCodeCoverage.html#using-the-profiling-runtime-without-a-filesystem
. If GCC has such an ability too, it would be great to describe it somewhere in
the Instrumentation documentation (or in any other place you find more
suitable).

The use case is simple: on some systems the filesystem can be read-only (e.g.
due to security concerns) or too small to hold the PGO profile. With the
in-memory approach, we would be able to collect PGO profiles and then deliver
and expose them via other interfaces such as HTTP or MQTT.

I guess some related information can be found here
(https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libgcc/libgcov-profiler.c), but I
am not sure.

[Bug gcov-profile/112806] Profile-Guided Optimization (PGO) policy regarding explicit user optimization hint behavior

2023-12-01 Thread zamazan4ik at tut dot by via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112806

--- Comment #3 from Alexander Zaitsev  ---
> https://gcc.gnu.org/onlinedocs/gcc-13.2.0/gcc/Other-Builtins.html#index-fprofile-arcs-1

I have already read this and still do not understand the actual behavior. If
PGO profiles show that a branch is "cold" but the user marks that branch as
"hot" via __builtin_expect/[[likely]], which decision will the optimizer make?

The page linked above only says "In general, you should prefer to use actual
profile feedback for this (-fprofile-arcs), as programmers are notoriously bad
at predicting how their programs actually perform.". It does not specify the
actual behavior; it is just a recommendation to use PGO instead of manual
[[likely]] hints.

[Bug gcov-profile/112717] .gcda profiles compatibility guarantees between GCC versions

2023-11-26 Thread zamazan4ik at tut dot by via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112717

--- Comment #3 from Alexander Zaitsev  ---
> I thought this was documented but I don't see. There is no guarantee for 
> forward or backwards compatibility at all. In fact iirc there is a version 
> stored in the files to make sure the correct version is used with the version 
> of tools/compiler.

Could we add this information to the documentation? It would be really helpful
for users to know this detail.

Given your answer, am I right that it is currently a strong
recommendation/requirement to regenerate PGO profiles with each GCC update?

[Bug gcov-profile/112717] New: .gcda profiles compatibility guarantees between GCC versions

2023-11-26 Thread zamazan4ik at tut dot by via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112717

Bug ID: 112717
   Summary: .gcda profiles compatibility guarantees between GCC
versions
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: gcov-profile
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zamazan4ik at tut dot by
CC: marxin at gcc dot gnu.org
  Target Milestone: ---

Hi. I have several questions regarding the reuse of .gcda profiles between GCC
versions for Profile-Guided Optimization (PGO) purposes.

The first question is about forward and backward compatibility guarantees for
.gcda profiles. I didn't find related information in the GCC documentation. Are
there any guarantees in this area? Something like "it's guaranteed that .gcda
profiles from GCC version N will always be readable by GCC version N+1", where
N is a minor/major GCC version. This is an important question for us since we
are thinking about caching .gcda profiles in storage so that PGO profiles can
be reused later, possibly with a newer compiler. The same applies in the other
direction, for example if we generated the PGO profile with GCC 10 and some
time later decided to revert the compiler to GCC 9. If there are guarantees in
this area, it would be great to see them documented (probably in a place like
https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html).

The second question is about the reusability of PGO profiles between GCC
versions. As far as I understand, PGO profiles track some "counters" about the
code. Possibly these counters depend on the optimizations performed by GCC
(it's just my guess). Let's imagine that GCC 11 added more optimization passes
that affect the generated code (e.g. much more aggressive inlining compared to
GCC 10). In this case, PGO profiles from GCC 10 would probably no longer be
useful and we would need to regenerate them with GCC 11. Is this scenario
realistic? If yes, are there ways to mitigate it?

I have the same questions for LLVM; they are discussed here:
https://discourse.llvm.org/t/profile-guided-optimization-pgo-related-questions-and-suggestions/75232
. As far as I understand, GCC also implements PGO on something like an "IR" (I
don't know what it is properly called in GCC: "GENERIC" or "GIMPLE"?), so some
answers from LLVM would probably be applicable to GCC as well.

[Bug other/112492] New: Add LLVM BOLT support to the GCC build scripts

2023-11-12 Thread zamazan4ik at tut dot by via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112492

Bug ID: 112492
   Summary: Add LLVM BOLT support to the GCC build scripts
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: other
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zamazan4ik at tut dot by
  Target Milestone: ---

Hi!

According to the Facebook Research paper
(https://research.facebook.com/publications/bolt-a-practical-binary-optimizer-for-data-centers-and-beyond/),
LLVM BOLT (https://github.com/llvm/llvm-project/blob/main/bolt/README.md) helps
to achieve better performance for GCC even after a PGO-optimized GCC build.

I think it would be a good idea to add support for building GCC with BOLT, as
is already done for the PGO-optimized GCC build with the "make
profiledbootstrap" target. Integrating LLVM BOLT into the build scripts would
make it much easier for maintainers to enable LLVM BOLT for GCC in their .spec
files.

Here are some examples of how LLVM BOLT is already integrated into other
projects:

* Rustc: https://github.com/rust-lang/rust/pull/116352
* CPython: https://github.com/python/cpython/pull/95908
* Pyston:
  - https://github.com/pyston/pyston#building
  - https://github.com/pyston/pyston/blob/pyston_main/Makefile#L200
* Clang: https://github.com/llvm/llvm-project/blob/main/clang/cmake/caches/BOLT.cmake

More about LLVM BOLT results for other projects can be found in:

* Rustc:
  - https://github.com/rust-lang/rust/pull/116352
  - https://www.reddit.com/r/rust/comments/y4w2kr/llvm_used_by_rustc_is_now_optimized_with_bolt_on/
* CPython: https://github.com/python/cpython/pull/95908
* YDB: https://github.com/ydb-platform/ydb/issues/140
* Clang:
  - [Slides](https://llvm.org/devmtg/2022-11/slides/Lightning15-OptimizingClangWithBOLTUsingCMake.pdf)
  - [Results on building Clang](https://github.com/ptr1337/llvm-bolt-scripts/blob/master/results.md)
  - [Linaro results](https://android-review.linaro.org/plugins/gitiles/toolchain/llvm_android/+/f36c64eeddf531b7b1a144c40f61d6c9a78eee7a)
  - [on AMD 7950X3D](https://github.com/llvm/llvm-project/issues/65010#issuecomment-1701255347)
* LDC: https://github.com/ldc-developers/ldc/issues/4228#issuecomment-1334499428
* NodeJS: https://aaupov.github.io/blog/2020/10/08/bolt-nodejs
* Chromium: https://aaupov.github.io/blog/2022/11/12/bolt-chromium
* MySQL, MongoDB, memcached, Verilator: https://people.ucsc.edu/~hlitz/papers/ocolos.pdf

More information can be found here: https://github.com/zamazan4ik/awesome-pgo

[Bug c++/96821] [concepts] Incorrect evaluation of concept with ill-formed expression

2021-05-09 Thread zamazan4ik at tut dot by via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96821

Alexander Zaitsev  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |zamazan4ik at tut dot by

--- Comment #6 from Alexander Zaitsev  ---
Are there any updates on this issue? The behaviour is also strange because
Clang and MSVC disagree with GCC on this code.

[Bug tree-optimization/91540] Missed optimization: simplification CFG

2019-08-24 Thread zamazan4ik at tut dot by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91540

--- Comment #1 from Alexander Zaitsev  ---
Godbolt playground: https://godbolt.org/z/MFSH1D

[Bug tree-optimization/91540] New: Missed optimization: simplification CFG

2019-08-24 Thread zamazan4ik at tut dot by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91540

Bug ID: 91540
   Summary: Missed optimization: simplification CFG
   Product: gcc
   Version: tree-ssa
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zamazan4ik at tut dot by
  Target Milestone: ---

For the code below:

int Test(bool cond1, bool cond2)
{
if (cond1)
{
if (cond2)
{
return 42;
}
}
return 43;
}

gcc(trunk) with '-O3' produces:

Test(bool, bool):
  test dil, dil
  je .L3
  test sil, sil
  jne .L5
.L3:
  mov eax, 43
  ret
.L5:
  mov eax, 42
  ret

clang(trunk) with '-O3' produces:

Test(bool, bool): # @Test(bool, bool)
  mov eax, edi
  and eax, esi
  xor eax, 43
  ret

I think GCC can do better here.
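
The Clang output corresponds to a branch-free source form, which can be
sketched as follows. The name TestBranchless is hypothetical; the sketch relies
on 43 ^ 1 == 42 and 43 ^ 0 == 43.

```cpp
// Original function from the report.
constexpr int Test(bool cond1, bool cond2) {
    if (cond1) {
        if (cond2) {
            return 42;
        }
    }
    return 43;
}

// Branch-free equivalent mirroring the and/xor sequence emitted by Clang:
// the low bit is flipped only when both conditions are true.
constexpr int TestBranchless(bool cond1, bool cond2) {
    return 43 ^ (static_cast<int>(cond1) & static_cast<int>(cond2));
}
```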

[Bug tree-optimization/91250] Missed optimization: is not used vfnmsub213ss

2019-07-24 Thread zamazan4ik at tut dot by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91250

--- Comment #2 from Alexander Zaitsev  ---
But for this example everything is fine:


float foo(float a, float b, float c)
{
return -a * -b - c;
}

[Bug tree-optimization/91250] Missed optimization: is not used vfnmsub213ss

2019-07-24 Thread zamazan4ik at tut dot by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91250

--- Comment #1 from Alexander Zaitsev  ---
Another example of missed optimization:

float foo(float a, float b, float c)
{
return -a * b - c;
}

[Bug tree-optimization/91250] New: Missed optimization: is not used vfnmsub213ss

2019-07-24 Thread zamazan4ik at tut dot by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91250

Bug ID: 91250
   Summary: Missed optimization: is not used vfnmsub213ss
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zamazan4ik at tut dot by
  Target Milestone: ---

For this code:

float foo(float a, float b, float c)
{
return a * -b - c;
}


gcc(trunk) with '-O3 -ffast-math -march=haswell' produces this:


foo(float, float, float):
vfmadd132ss xmm0, xmm2, xmm1
vxorps  xmm0, xmm0, XMMWORD PTR .LC0[rip]
ret


clang (trunk) with '-O3 -ffast-math -march=haswell' produces this:


foo(float, float, float):
vfnmsub213ss xmm0, xmm1, xmm2 # xmm0 = -(xmm1 * xmm0) - xmm2
ret


Note: playground on godbolt - https://godbolt.org/z/NTMVdg
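
A small sketch of the algebraic identity behind the single-instruction form:
a * -b - c equals -(a * b) - c, which is the shape given in the comment on the
vfnmsub213ss line above. The name foo_negated_product is hypothetical.

```cpp
// Expression from the report.
constexpr float foo(float a, float b, float c) {
    return a * -b - c;
}

// Same value written as a negated product minus c, i.e. -(a * b) - c.
constexpr float foo_negated_product(float a, float b, float c) {
    return -(a * b) - c;
}
```

The test values below are small integers, so both forms are exact and compare
equal without any tolerance.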

[Bug tree-optimization/91249] New: Missed optimization: division and multiplying ops in ffast-math mode

2019-07-24 Thread zamazan4ik at tut dot by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91249

Bug ID: 91249
   Summary: Missed optimization: division and multiplying ops in
ffast-math mode
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zamazan4ik at tut dot by
  Target Milestone: ---

For this code:

float foo(float x, float y)
{
return x * y/y;
}


gcc(trunk,9,8,7,6,5) with '-O3 -ffast-math' produces this:


foo(float, float):
mulss   xmm0, xmm1
divss   xmm0, xmm1
ret


clang(trunk) with '-O3 -ffast-math' produces this:


foo(float, float):   # @foo(float, float)
ret


Notes: playground on godbolt - https://godbolt.org/z/Qjr3OD
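
A sketch of the folded form that the Clang output corresponds to. The name
foo_folded is hypothetical. The fold is only valid under assumptions like
those of -ffast-math (y is finite and non-zero, and rounding or overflow of
the intermediate product is ignored); it is not valid under strict IEEE rules.

```cpp
// Expression from the report.
constexpr float foo(float x, float y) {
    return x * y / y;
}

// Folded form: with y finite and non-zero and rounding ignored, x * y / y is x.
constexpr float foo_folded(float x, float /*y*/) {
    return x;
}
```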

[Bug c++/87849] Missed optimization: useless for-loop must be eliminated

2018-11-01 Thread zamazan4ik at tut dot by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87849

--- Comment #1 from Alexander Zaitsev  ---
Same for code without using STL algorithms and containers:

int min(int a, int b)
{
return a < b ? a : b;
}

int max(int a, int b)
{
return a > b ? a : b;
}

int foo(int* v, int size) {
int l = v[0];
for(int i=0; i < size; ++i)
{
l = min(l, v[i]);
}

for(int i=0; i < size; ++i)
{
l = max(l, v[i]);
}

return l;
}


The result of the function doesn't depend on the result of the first loop, so
the first loop can be eliminated.
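
A sketch of why the first loop is dead. After the first loop, l is the minimum
of the array, which is less than or equal to every element, so the second loop
yields the maximum either way. The names min_of, max_of and foo_max_only are
hypothetical.

```cpp
constexpr int min_of(int a, int b) { return a < b ? a : b; }
constexpr int max_of(int a, int b) { return a > b ? a : b; }

// Version from the report: a min loop followed by a max loop.
constexpr int foo(const int* v, int size) {
    int l = v[0];
    for (int i = 0; i < size; ++i) {
        l = min_of(l, v[i]);
    }
    for (int i = 0; i < size; ++i) {
        l = max_of(l, v[i]);
    }
    return l;
}

// Version with the first loop removed; it returns the same value.
constexpr int foo_max_only(const int* v, int size) {
    int l = v[0];
    for (int i = 0; i < size; ++i) {
        l = max_of(l, v[i]);
    }
    return l;
}
```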

[Bug c++/87849] New: Missed optimization: useless for-loop must be eliminated

2018-11-01 Thread zamazan4ik at tut dot by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87849

Bug ID: 87849
   Summary: Missed optimization: useless for-loop must be
eliminated
   Product: gcc
   Version: tree-ssa
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zamazan4ik at tut dot by
  Target Milestone: ---

gcc(trunk) with -O3 -std=c++17 for this code:

#include <algorithm>
#include <vector>

int foo(std::vector<int> v) {
int l = v[0];
for(const auto& x : v) {
l = std::min(l, x);
}

for(const auto& x : v) {
l = std::max(l, x);
}

return l;
}

gcc doesn't eliminate the first loop, although it could, because the first loop
has no effect on the result of this function.

[Bug tree-optimization/83354] Missed optimization in math expression: pow(cbrt(x), y) == pow(x, y / 3)

2018-11-01 Thread zamazan4ik at tut dot by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83354

--- Comment #2 from Alexander Zaitsev  ---
Yes, you are right. My bad. Closing this issue.

[Bug tree-optimization/83350] Missed optimization in math expression: missing cube of the sum formula

2018-11-01 Thread zamazan4ik at tut dot by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83350

--- Comment #2 from Alexander Zaitsev  ---
Sure. Do you know of any ongoing work in gcc on implementing or integrating a
built-in math engine for optimizing such expressions?

[Bug tree-optimization/83348] Missed optimization in math expression: can be used std::pow function

2018-11-01 Thread zamazan4ik at tut dot by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83348

--- Comment #2 from Alexander Zaitsev  ---
Thank you for the great comment! Should I close this issue?

[Bug tree-optimization/83354] Missed optimization in math expression: pow(cbrt(x), y) == pow(x, y / 3)

2018-11-01 Thread zamazan4ik at tut dot by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83354

Alexander Zaitsev  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |RESOLVED
         Resolution|---                         |INVALID

[Bug tree-optimization/83353] Missed optimization in math expression: sin(asin(a)) == a

2018-11-01 Thread zamazan4ik at tut dot by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83353

--- Comment #3 from Alexander Zaitsev  ---
From my point of view, the compiler should optimize as much as possible. If
that is too time-consuming, I would prefer an additional compiler option like
-f-do-some-math-time-consuming-optimization.

And yes, developers cannot write all math optimizations manually. We need some
math engine inside (something like the Souper optimizer).

[Bug tree-optimization/83352] Missed optimization in math expression: sqrt(sqrt(a)) == pow(a, 1/4)

2018-11-01 Thread zamazan4ik at tut dot by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83352

--- Comment #2 from Alexander Zaitsev  ---
What about longer chains of sqrt calls?

gcc(trunk) -O3 -ffast-math:

double test(double a)
{
return sqrt(sqrt(sqrt(sqrt(sqrt(a)))));
}

test(double):
andpd   xmm0, XMMWORD PTR .LC0[rip]
sqrtsd  xmm0, xmm0
sqrtsd  xmm0, xmm0
sqrtsd  xmm0, xmm0
sqrtsd  xmm0, xmm0
sqrtsd  xmm0, xmm0
ret

I don't see any optimizations on godbolt.

[Bug c++/87429] New: Strange overload resolution with decltype in template function

2018-09-25 Thread zamazan4ik at tut dot by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87429

Bug ID: 87429
   Summary: Strange overload resolution with decltype in template
function
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zamazan4ik at tut dot by
  Target Milestone: ---

gcc(trunk) for this code:

#include <iostream>

template <typename T> int foo(int) { return 1; }
template <typename T> int foo(decltype(T{})) { return 2; }
template <typename T> int foo(decltype(int(T{}))) { return 3;}

int main()
{
    std::cout << foo<int>(0);
}

prints '2'. But as I understand it, this should be a compilation error because
the call to the overloaded function is ambiguous.

[Bug c++/85747] suboptimal code without constexpr

2018-08-06 Thread zamazan4ik at tut dot by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85747

Alexander Zaitsev  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |zamazan4ik at tut dot by

--- Comment #8 from Alexander Zaitsev  ---
(In reply to Marc Glisse from comment #5)
> (In reply to Antony Polukhin from comment #4)
> > Does providing some kind of -Oon-the-fly switch solves the issue with JIT
> > compile times while still allows more optimizations for the traditional non
> > JIT  -O2 builds?
> 
> Not sure what you mean by -Oon-the-fly. If you want to JIT compile the code,
> you more or less need to embed a compiler in the executable...
> 
> The closest I can think of is -fprofile-values. It is possible to collect
> the values of constants during a training run and use them during a second
> compilation. But then knowing more constants in one compilation than the
> other may change the code in ways that the PGO framework will not like.

As I understand it, Antony meant a compiler option that enables an "aggressive"
mode in which the compiler evaluates as much code as possible at compile time,
even if that significantly increases compilation time. Of course, this flag
must be disabled by default, even with -O3.

[Bug tree-optimization/86707] New: Missed optimization: optimizing set of if statements

2018-07-27 Thread zamazan4ik at tut dot by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86707

Bug ID: 86707
   Summary: Missed optimization: optimizing set of if statements
   Product: gcc
   Version: tree-ssa
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zamazan4ik at tut dot by
  Target Milestone: ---

gcc(trunk) with -O3 -std=c++17 for this code:

unsigned int foo(unsigned int x)
{
if(x % 2 == 0)
{
return x * 2;
}
if(x % 4 == 0)
{
return x * 4;
}
if(x % 8 == 0)
{
return x * 8;
}
if(x % 16 == 0)
{
return x * 16;
}
if(x % 32 == 0)
{
return x * 32;
}
return 100;
}

generates this:

foo(unsigned int):
  test dil, 1
  je .L9
  test dil, 3
  je .L10
  test dil, 7
  je .L11
  mov eax, edi
  test dil, 15
  je .L12
  sal eax, 5
  and edi, 31
  mov edi, 100
  cmovne eax, edi
  ret
.L10:
  lea eax, [0+rdi*4]
  ret
.L9:
  lea eax, [rdi+rdi]
  ret
.L12:
  sal eax, 4
  ret
.L11:
  lea eax, [0+rdi*8]
  ret


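As you can see, the generated code is suboptimal: only the first 'if' statement
needs to be kept, with 100 returned otherwise. If x is not divisible by 2, it
cannot be divisible by 4, 8, 16 or 32 either, so the later checks never
succeed.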
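
A sketch of the simplified form together with a brute-force comparison over a
small range. The names foo_simplified and agree_up_to are hypothetical.

```cpp
// Function from the report.
constexpr unsigned int foo(unsigned int x) {
    if (x % 2 == 0) { return x * 2; }
    if (x % 4 == 0) { return x * 4; }
    if (x % 8 == 0) { return x * 8; }
    if (x % 16 == 0) { return x * 16; }
    if (x % 32 == 0) { return x * 32; }
    return 100;
}

// Simplified form: an odd x fails every later divisibility test as well.
constexpr unsigned int foo_simplified(unsigned int x) {
    return x % 2 == 0 ? x * 2 : 100;
}

// Compares both versions for every x in [0, limit).
constexpr bool agree_up_to(unsigned int limit) {
    for (unsigned int x = 0; x < limit; ++x) {
        if (foo(x) != foo_simplified(x)) {
            return false;
        }
    }
    return true;
}
```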

[Bug libstdc++/84688] New: Use pdqsort instead of introsort for std::sort

2018-03-03 Thread zamazan4ik at tut dot by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84688

Bug ID: 84688
   Summary: Use pdqsort instead of introsort for std::sort
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zamazan4ik at tut dot by
  Target Milestone: ---

I suggest using pdqsort for std::sort, because pdqsort is faster in the
benchmarks published here: https://github.com/orlp/pdqsort

[Bug c++/84560] Internal error in std::function with std::memset

2018-02-25 Thread zamazan4ik at tut dot by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84560

--- Comment #1 from Alexander Zaitsev  ---
On GCC 7.3.1 for this code I have:


internal compiler error: in expand_expr_real_1, at expr.c:9908
 memset(d[n - 1], 0, sizeof(int));
^
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.
Preprocessed source stored into /tmp/ccsFhYKj.out file, please attach this to
your bugreport.
gmake[3]: *** [CMakeFiles/test_proj.dir/build.make:63:
CMakeFiles/test_proj.dir/main.cpp.o] Error 1
gmake[2]: *** [CMakeFiles/Makefile2:68: CMakeFiles/test_proj.dir/all] Error 2
gmake[1]: *** [CMakeFiles/Makefile2:80: CMakeFiles/test_proj.dir/rule] Error 2
gmake: *** [Makefile:118: test_proj] Error 2

[Bug c++/84560] New: Internal error in std::function with std::memset

2018-02-25 Thread zamazan4ik at tut dot by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84560

Bug ID: 84560
   Summary: Internal error in std::function with std::memset
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zamazan4ik at tut dot by
  Target Milestone: ---

gcc(trunk) with '-O3 -march=native --std=c++11 -Wall' for this code:


#include <functional>
#include <cstring>

using namespace std;

int main() {
int n = 1;
int m = 1;
int d[n][m];
function<void()> rec = [&]() {
memset(d[n - 1], 0, sizeof(int));
};
  return 0;
}


generates this:


<source>: In lambda function:
<source>:10:32: error: Local declaration from a different function
 function<void()> rec = [&]() {
^
D.34164
<source>:11:16: note: in statement
 memset(d[n - 1], 0, sizeof(int));
^
_2 = (sizetype) D.34164;
<source>:10:32: error: Local declaration from a different function
 function<void()> rec = [&]() {
^
D.34164
<source>:11:16: note: in statement
 memset(d[n - 1], 0, sizeof(int));
^
_9 = (sizetype) D.34164;
<source>:10:32: error: Local declaration from a different function
 function<void()> rec = [&]() {
^
D.34167
<source>:11:23: note: in statement
 memset(d[n - 1], 0, sizeof(int));
~~~^
_13 = D.34167 /[ex] 4;
during GIMPLE pass: cfg
<source>:10:32: internal compiler error: verify_gimple failed
 function<void()> rec = [&]() {
^
mmap: Cannot allocate memory
Please submit a full bug report,
with preprocessed source if appropriate.
See <https://gcc.gnu.org/bugs/> for instructions.
Compiler returned: 1


I think an internal compiler error isn't acceptable here.

Clang works fine.

[Bug tree-optimization/84512] New: Missed optimization: should be precalculated in compile-time

2018-02-22 Thread zamazan4ik at tut dot by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84512

Bug ID: 84512
   Summary: Missed optimization: should be precalculated in
compile-time
   Product: gcc
   Version: tree-ssa
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zamazan4ik at tut dot by
  Target Milestone: ---

gcc(trunk) with '-O3' option for this code:


int foo()
{
int a[10];
for(int i = 0; i < 10; ++i)
{
a[i] = i*i;
}
int res = 0;
for(int i = 0; i < 10; ++i)
{
res += a[i];
}
return res;
}


produces this:


foo():
  movabs rax, 4294967296
  mov QWORD PTR [rsp-56], rax
  movabs rax, 38654705668
  mov QWORD PTR [rsp-48], rax
  movabs rax, 107374182416
  mov QWORD PTR [rsp-40], rax
  movabs rax, 210453397540
  mov QWORD PTR [rsp-32], rax
  movdqa xmm0, XMMWORD PTR [rsp-40]
  paddd xmm0, XMMWORD PTR [rsp-56]
  movdqa xmm1, xmm0
  psrldq xmm1, 8
  paddd xmm0, xmm1
  movdqa xmm1, xmm0
  psrldq xmm1, 4
  paddd xmm0, xmm1
  movd eax, xmm0
  add eax, 145
  ret


but clang(trunk) with '-O3' produces this one:


foo(): # @foo()
  mov eax, 285
  ret
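
The expected constant can be checked at compile time with a constexpr sketch of
the same function. The array is value-initialized here (an addition compared to
the report) because a constexpr function may not contain an uninitialized
variable before C++20.

```cpp
// Same computation as in the report, made constexpr so that the compiler
// must evaluate it during translation.
constexpr int foo() {
    int a[10] = {};  // value-initialized; the report leaves it uninitialized
    for (int i = 0; i < 10; ++i) {
        a[i] = i * i;
    }
    int res = 0;
    for (int i = 0; i < 10; ++i) {
        res += a[i];
    }
    return res;
}

// 0 + 1 + 4 + 9 + 16 + 25 + 36 + 49 + 64 + 81 == 285
static_assert(foo() == 285, "sum of squares 0..9");
```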

[Bug c++/82478] Rejects valid access to private member type that should be allowed by friend

2018-02-07 Thread zamazan4ik at tut dot by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82478

Alexander Zaitsev  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |zamazan4ik at tut dot by

--- Comment #6 from Alexander Zaitsev  ---
One more problematic case: 

#include <type_traits>

class A {
    int a;
    friend class B;
};

class B {
public:

    template <typename T, typename = void>
    struct trait: std::false_type {};

    template <typename T>
    struct trait<T, std::void_t<decltype(std::declval<T&>().a)>>: std::true_type {};
};

int main() {
    static_assert(B::trait<A>{});
    return 0;
}


On clang(trunk) this code works well. On gcc(trunk) we have: 

prog.cc: In function 'int main()':
prog.cc:15:61: error: 'int A::a' is private within this context
 struct trait<T, std::void_t<decltype(std::declval<T&>().a)>>:
std::true_type {};
  ~~~^
prog.cc:4:9: note: declared private here
 int a;

[Bug tree-optimization/83715] New: Missed optimization in math expression: optimize double comparing

2018-01-06 Thread zamazan4ik at tut dot by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83715

Bug ID: 83715
   Summary: Missed optimization in math expression: optimize
double comparing
   Product: gcc
   Version: tree-ssa
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zamazan4ik at tut dot by
  Target Milestone: ---

gcc trunk with '-O3 -ffast-math -std=c++17' for this code:


double test(double x, double y)
{
if(x != y)
{
return 42.0;
}
return x/y;
}


generates this code:

test(double, double):
  comisd xmm0, xmm1
  jne .L3
  divsd xmm0, xmm1
  ret
.L3:
  movsd xmm0, QWORD PTR .LC0[rip]
  ret
.LC0:
  .long 0
  .long 1078263808


but the division can be optimized away here: on the path that reaches it,
x == y, so with -ffast-math the function can simply return 1.0.
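
A sketch of the folded form. The name test_folded is hypothetical. It assumes
finite, non-zero operands, as -ffast-math permits; for x == y == 0.0 strict
IEEE arithmetic would give NaN, not 1.0.

```cpp
// Function from the report.
constexpr double test(double x, double y) {
    if (x != y) {
        return 42.0;
    }
    return x / y;
}

// Folded form: on the fall-through path x == y, so x / y is 1.0 for finite,
// non-zero operands.
constexpr double test_folded(double x, double y) {
    return x != y ? 42.0 : 1.0;
}
```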

[Bug tree-optimization/83541] New: Missed optimization with int overflow

2017-12-21 Thread zamazan4ik at tut dot by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83541

Bug ID: 83541
   Summary: Missed optimization with int overflow
   Product: gcc
   Version: tree-ssa
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zamazan4ik at tut dot by
  Target Milestone: ---

gcc(trunk) with '-O3 -std=c++17 -ffast-math' for this:

#include <limits>

int test(int x)
{
    if(x == std::numeric_limits<int>::max())
    {
        return x+1;
    }
    return 42;
}

generates this:

test(int):
  cmp edi, 2147483647
  mov edx, -2147483648
  mov eax, 42
  cmove eax, edx
  ret


But the branch taken under this condition is UB (x+1 overflows a signed int),
so it can be deleted and the function can simply return 42.

[Bug tree-optimization/83518] New: Missing optimization: useless instructions should be dropped

2017-12-20 Thread zamazan4ik at tut dot by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83518

Bug ID: 83518
   Summary: Missing optimization: useless instructions should be
dropped
   Product: gcc
   Version: tree-ssa
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zamazan4ik at tut dot by
  Target Milestone: ---

gcc (trunk) with '-O3 -std=c++17' for this code:

unsigned test()
{
int arr[] = {5,4,3,2,1};
int sum = 0;

for(int i = 0;i < 5;++i)
{
for(int j = 0; j < 5; ++j)
{
int t = arr[i];
arr[i] = arr[j];
arr[j] = t;
}
}

for(int i = 0; i < 5; ++i)
{
sum += arr[i];
}

return sum;
}


generates this:

test():
  movdqa xmm0, XMMWORD PTR .LC0[rip]
  movaps XMMWORD PTR [rsp-40], xmm0
  mov rax, QWORD PTR [rsp-32]
  mov DWORD PTR [rsp-32], 1
  mov QWORD PTR [rsp-40], rax
  mov DWORD PTR [rsp-28], 5
  movdqa xmm0, XMMWORD PTR [rsp-40]
  movdqa xmm1, xmm0
  psrldq xmm1, 8
  paddd xmm0, xmm1
  movdqa xmm1, xmm0
  psrldq xmm1, 4
  paddd xmm0, xmm1
  movd eax, xmm0
  add eax, 4
  ret
.LC0:
  .long 5
  .long 4
  .long 3
  .long 2


clang (trunk) with '-O3 -std=c++17':

test(): # @test()
  mov eax, 15
  ret
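
A small sketch of why the constant 15 is correct: the nested loops only swap
elements, so the array stays a permutation of {5,4,3,2,1} and its sum does not
change. The names sum_after_swaps and check_sum_after_swaps are made up for
this illustration.

```cpp
#include <cassert>
#include <numeric>
#include <utility>

// Same computation as in the report, written with std::swap and
// std::accumulate. Swapping never changes the multiset of elements,
// so the sum is always 5 + 4 + 3 + 2 + 1.
unsigned sum_after_swaps()
{
    int arr[] = {5, 4, 3, 2, 1};
    for (int i = 0; i < 5; ++i)
    {
        for (int j = 0; j < 5; ++j)
        {
            std::swap(arr[i], arr[j]);
        }
    }
    return std::accumulate(arr, arr + 5, 0);
}

void check_sum_after_swaps()
{
    assert(sum_after_swaps() == 15u);
}
```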

[Bug tree-optimization/83517] New: Missed optimization in math expression: (x+x)/x == 2

2017-12-20 Thread zamazan4ik at tut dot by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83517

Bug ID: 83517
   Summary: Missed optimization in math expression: (x+x)/x == 2
   Product: gcc
   Version: tree-ssa
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zamazan4ik at tut dot by
  Target Milestone: ---

gcc (trunk) with '-O3 -std=c++17 -ffast-math -funsafe-math-optimizations' for
this code:

int test(int x)
{
    return (x+x)/x;
}


generates:

test(int):
  lea eax, [rdi+rdi]
  cdq
  idiv edi
  ret


Why? In this case we can simply return 2, because there are only two corner
cases: x == 0, where we divide by zero (UB), and x+x overflowing the int
range (also UB). For every other x the result is exactly 2.

So we can simply optimize it. There are a lot of similar cases.
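
A quick sketch that checks the claim on inputs with defined behavior (x != 0
and x+x representable as int). The names twice_over_x and check_twice_over_x
are made up for this illustration.

```cpp
#include <cassert>
#include <limits>

// The expression from the report.
int twice_over_x(int x)
{
    return (x + x) / x;
}

// Only nonzero values whose doubling fits into int are tested,
// since everything else is undefined behavior.
void check_twice_over_x()
{
    const int half_max = std::numeric_limits<int>::max() / 2;
    const int half_min = std::numeric_limits<int>::min() / 2;
    const int samples[] = {half_min, -1000, -3, -1, 1, 2, 7, 1000, half_max};
    for (int x : samples)
    {
        assert(twice_over_x(x) == 2);
    }
}
```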

[Bug c++/83384] New: Optimize heap allocation as stack allocation

2017-12-11 Thread zamazan4ik at tut dot by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83384

Bug ID: 83384
   Summary: Optimize heap allocation as stack allocation
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zamazan4ik at tut dot by
  Target Milestone: ---

gcc(trunk) with '--std=c++17 -O3' for this code:


int test(int a)
{
    int res = 0;
    int* i = new int;

    for(*i = 0; *i < a; *i = *i + 1)
    {
        res += *i;
    }

    return res;
}


generates this:


test(int):
push rbx
mov ebx, edi
mov edi, 4
call operator new(unsigned long)
test ebx, ebx
jle .L8
lea eax, [rbx-1]
cmp eax, 17
jbe .L9
mov edx, ebx
movdqa  xmm1, XMMWORD PTR .LC0[rip]
xor eax, eax
pxor xmm0, xmm0
movdqa  xmm2, XMMWORD PTR .LC1[rip]
shr edx, 2
.L5:
add eax, 1
paddd   xmm0, xmm1
paddd   xmm1, xmm2
cmp eax, edx
jne .L5
movdqa  xmm1, xmm0
mov edx, ebx
psrldq  xmm1, 8
and edx, -4
paddd   xmm0, xmm1
movdqa  xmm1, xmm0
psrldq  xmm1, 4
paddd   xmm0, xmm1
movd eax, xmm0
cmp ebx, edx
je  .L1
.L7:
add eax, edx
add edx, 1
cmp ebx, edx
jg  .L7
.L1:
pop rbx
ret
.L8:
xor eax, eax
pop rbx
ret
.L9:
xor eax, eax
xor edx, edx
jmp .L7
.LC0:
.long   0
.long   1
.long   2
.long   3
.LC1:
.long   4
.long   4
.long   4
.long   4


clang(trunk) with '--std=c++17 -O3':


test(int): # @test(int)
  test edi, edi
  jle .LBB0_1
  lea eax, [rdi - 1]
  lea ecx, [rdi - 2]
  imul rcx, rax
  shr rcx
  lea eax, [rcx + rdi]
  add eax, -1
  ret
.LBB0_1:
  xor eax, eax
  ret



Since C++14 the compiler is allowed to remove the heap allocation and use
storage on the stack instead.
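
A sketch of the transformation being asked for, assuming the allocation is
replaced by a local variable. The names sum_heap, sum_stack, sum_closed_form
and check_sums are made up for this illustration; sum_heap also adds a delete
(not in the report) so the sketch does not leak. The closed form a*(a-1)/2 is
what the clang output above appears to compute.

```cpp
#include <cassert>

// Heap version from the report, plus a matching delete.
int sum_heap(int a)
{
    int res = 0;
    int* i = new int;
    for (*i = 0; *i < a; *i = *i + 1)
    {
        res += *i;
    }
    delete i;
    return res;
}

// Same loop with the counter kept in a local variable.
int sum_stack(int a)
{
    int res = 0;
    for (int i = 0; i < a; ++i)
    {
        res += i;
    }
    return res;
}

// Closed form of 0 + 1 + ... + (a - 1), computed in a wider type.
int sum_closed_form(int a)
{
    if (a <= 0)
    {
        return 0;
    }
    return static_cast<int>(static_cast<long long>(a) * (a - 1) / 2);
}

void check_sums()
{
    const int samples[] = {-5, 0, 1, 2, 10, 1000};
    for (int a : samples)
    {
        assert(sum_heap(a) == sum_stack(a));
        assert(sum_stack(a) == sum_closed_form(a));
    }
}
```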

[Bug tree-optimization/83354] New: Missed optimization in math expression: pow(cbrt(x), y) == pow(x, y / 3)

2017-12-10 Thread zamazan4ik at tut dot by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83354

Bug ID: 83354
   Summary: Missed optimization in math expression: pow(cbrt(x),
y) == pow(x, y / 3)
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zamazan4ik at tut dot by
  Target Milestone: ---

gcc(trunk) with '--std=c++17 -O3 -march=native -ffast-math' flags for this
code:

#include <cmath>

double test(double x, double y)
{
    return pow(cbrt(x),y);
}

generates this assembly:


test(double, double):
sub rsp, 24
vmovsd  QWORD PTR [rsp+8], xmm1
call cbrt
vmovsd  xmm1, QWORD PTR [rsp+8]
add rsp, 24
jmp __pow_finite


As you can see, this can be simplified to a single call, pow(x, y/3), which
should be faster.
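
A numeric sketch of the identity, assuming x > 0. The names pow_of_cbrt,
pow_third and check_pow_cbrt are made up for this illustration. The last two
assertions show a caveat: for negative x the two forms can differ, because
cbrt is defined there while pow with a non-integer exponent returns NaN.

```cpp
#include <cassert>
#include <cmath>

double pow_of_cbrt(double x, double y) { return std::pow(std::cbrt(x), y); }
double pow_third(double x, double y) { return std::pow(x, y / 3.0); }

void check_pow_cbrt()
{
    const double xs[] = {0.5, 2.5, 8.0, 1000.0};
    const double ys[] = {-2.0, 0.5, 1.7, 3.0};
    for (double x : xs)
    {
        for (double y : ys)
        {
            const double lhs = pow_of_cbrt(x, y);
            const double rhs = pow_third(x, y);
            // Equal up to rounding for positive x.
            assert(std::fabs(lhs - rhs) <= 1e-12 * std::fabs(rhs));
        }
    }
    // Negative base: cbrt(-8) is about -2, so the left side is about 4,
    // while pow(-8, 2.0 / 3.0) is NaN.
    assert(std::fabs(pow_of_cbrt(-8.0, 2.0) - 4.0) <= 1e-12);
    assert(std::isnan(pow_third(-8.0, 2.0)));
}
```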

[Bug tree-optimization/83353] Missed optimization in math expression: sin(asin(a)) == a

2017-12-10 Thread zamazan4ik at tut dot by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83353

--- Comment #1 from Alexander Zaitsev  ---
The same issue about cos(acos(x)).

[Bug tree-optimization/83353] New: Missed optimization in math expression: sin(asin(a)) == a

2017-12-10 Thread zamazan4ik at tut dot by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83353

Bug ID: 83353
   Summary: Missed optimization in math expression: sin(asin(a))
== a
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zamazan4ik at tut dot by
  Target Milestone: ---

gcc(trunk) with '--std=c++17 -O3 -march=native -ffast-math' flags for this
code:


#include <cmath>

double test(double a)
{
    return sin(asin(a));
}


generates this assembly:


test(double):
sub rsp, 8
call __asin_finite
add rsp, 8
jmp sin


But sin(asin(a)) == a for every a in [-1, 1], so with -ffast-math there is no
reason to call anything.
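
A numeric sketch of the identity and of its domain, covering the cos(acos(x))
case from comment #1 as well. The name check_inverse_trig is made up for this
illustration. Outside [-1, 1] the unoptimized code yields NaN, so the fold
only preserves results when that range (or finite-math-only) is assumed.

```cpp
#include <cassert>
#include <cmath>

void check_inverse_trig()
{
    const double inside[] = {-1.0, -0.5, 0.0, 0.25, 1.0};
    for (double a : inside)
    {
        // Within the domain both round trips return a, up to rounding.
        assert(std::fabs(std::sin(std::asin(a)) - a) <= 1e-12);
        assert(std::fabs(std::cos(std::acos(a)) - a) <= 1e-12);
    }
    // Outside the domain asin and acos return NaN, and so does the result.
    assert(std::isnan(std::sin(std::asin(2.0))));
    assert(std::isnan(std::cos(std::acos(2.0))));
}
```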

[Bug tree-optimization/83352] New: Missed optimization in math expression: sqrt(sqrt(a)) == pow(a, 1/4)

2017-12-10 Thread zamazan4ik at tut dot by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83352

Bug ID: 83352
   Summary: Missed optimization in math expression: sqrt(sqrt(a))
== pow(a, 1/4)
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zamazan4ik at tut dot by
  Target Milestone: ---

gcc(trunk) with '--std=c++17 -O3 -march=native -ffast-math' flags for this
code:


#include <cmath>

double test(double a)
{
    return sqrt(sqrt(a));
}


generates this assembly:


test(double): # @test(double)
  vsqrtsd xmm0, xmm0, xmm0
  vsqrtsd xmm0, xmm0, xmm0
  ret


But there is a formula: sqrt(sqrt(a)) == pow(a, 1/4), i.e. a to the power
0.25. And it can be compiled in a faster way.
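
A numeric sketch of the identity, assuming a >= 0. The names fourth_root_sqrt,
fourth_root_pow and check_fourth_root are made up for this illustration. Note
that the exponent has to be written as 0.25 in C++: the expression 1/4 is
integer division and evaluates to 0. The sketch checks equivalence only and
says nothing about which form is faster.

```cpp
#include <cassert>
#include <cmath>

double fourth_root_sqrt(double a) { return std::sqrt(std::sqrt(a)); }
double fourth_root_pow(double a) { return std::pow(a, 0.25); }

void check_fourth_root()
{
    const double samples[] = {0.0, 1.0, 2.0, 16.0, 12345.678};
    for (double a : samples)
    {
        const double lhs = fourth_root_sqrt(a);
        const double rhs = fourth_root_pow(a);
        assert(std::fabs(lhs - rhs) <= 1e-12 * (1.0 + std::fabs(rhs)));
    }
    // Pitfall: 1 / 4 is the integer 0, so this is pow(16.0, 0) == 1.
    assert(std::pow(16.0, 1 / 4) == 1.0);
}
```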

[Bug tree-optimization/83351] New: Missed optimization in math expression: sin^2(a) + cos^2(a) == 1

2017-12-10 Thread zamazan4ik at tut dot by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83351

Bug ID: 83351
   Summary: Missed optimization in math expression: sin^2(a) +
cos^2(a) == 1
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zamazan4ik at tut dot by
  Target Milestone: ---

gcc(trunk) with '--std=c++17 -O3 -march=native -ffast-math' flags for this
code:


#include <cmath>

double test(double a)
{
    return cos(a) * cos(a) + sin(a) * sin(a);
}


generates this assembly:


test(double):
sub rsp, 24
mov rsi, rsp
lea rdi, [rsp+8]
call sincos
vmovsd  xmm1, QWORD PTR [rsp+8]
vmovsd  xmm0, QWORD PTR [rsp]
add rsp, 24
vmulsd  xmm1, xmm1, xmm1
vfmadd132sd xmm0, xmm1, xmm0
ret


But there is a formula: sin^2(a) + cos^2(a) == 1. So the whole function can be
compiled to return the constant 1.
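
A numeric sketch of the identity for finite arguments. The names
pythagorean_sum and check_pythagorean_sum are made up for this illustration.
The computed value matches 1 only up to rounding, which is why such a fold
would belong under -ffast-math.

```cpp
#include <cassert>
#include <cmath>

// The expression from the report.
double pythagorean_sum(double a)
{
    return std::cos(a) * std::cos(a) + std::sin(a) * std::sin(a);
}

void check_pythagorean_sum()
{
    const double samples[] = {-10.0, -1.0, 0.0, 0.5, 3.0, 1.0e6};
    for (double a : samples)
    {
        assert(std::fabs(pythagorean_sum(a) - 1.0) <= 1e-12);
    }
}
```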

[Bug tree-optimization/83350] New: Missed optimization in math expression: missing cube of the sum formula

2017-12-10 Thread zamazan4ik at tut dot by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83350

Bug ID: 83350
   Summary: Missed optimization in math expression: missing cube
of the sum formula
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zamazan4ik at tut dot by
  Target Milestone: ---

gcc(trunk) with '--std=c++17 -O3 -march=native -ffast-math' flags for this
code:


double test(double a, double b)
{
    return a*a*a + 3.0*a*a*b + 3.0*a*b*b + b*b*b;
}


generates this assembly:


test(double, double):
vmovsd  xmm2, QWORD PTR .LC0[rip]
vmulsd  xmm3, xmm0, xmm0
vmovapd xmm4, xmm0
vfmadd132sd xmm4, xmm1, xmm2
vmulsd  xmm4, xmm4, xmm1
vfmadd132sd xmm2, xmm4, xmm3
vmulsd  xmm1, xmm2, xmm1
vfmadd132sd xmm0, xmm1, xmm3
ret
.LC0:
.long   0
.long   1074266112


But there is a formula: a*a*a + 3.0*a*a*b + 3.0*a*b*b + b*b*b == (a + b)^3.
And it can be compiled in a faster way (one addition and two multiplications).
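
A numeric sketch comparing the expanded and the factored form. The names
cube_expanded, cube_factored and check_cube_of_sum are made up for this
illustration. The samples use a and b of the same sign; when a is close to -b
the expanded form suffers from cancellation and the two results can differ
noticeably, so the rewrite is only valid when reassociation is allowed.

```cpp
#include <cassert>
#include <cmath>

// Expanded form from the report.
double cube_expanded(double a, double b)
{
    return a * a * a + 3.0 * a * a * b + 3.0 * a * b * b + b * b * b;
}

// Factored form: one addition and two multiplications.
double cube_factored(double a, double b)
{
    const double s = a + b;
    return s * s * s;
}

void check_cube_of_sum()
{
    const double pairs[][2] = {{1.5, 2.25}, {0.1, 0.2}, {3.0, 4.0}, {-2.0, -5.5}};
    for (const auto& p : pairs)
    {
        const double lhs = cube_expanded(p[0], p[1]);
        const double rhs = cube_factored(p[0], p[1]);
        assert(std::fabs(lhs - rhs) <= 1e-12 * std::fabs(rhs));
    }
}
```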

[Bug tree-optimization/83349] New: Missed optimization in math expression: aggressive optimization with std::pow

2017-12-10 Thread zamazan4ik at tut dot by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83349

Bug ID: 83349
   Summary: Missed optimization in math expression: aggressive
optimization with std::pow
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zamazan4ik at tut dot by
  Target Milestone: ---

gcc(trunk) with '--std=c++17 -O3 -march=native -ffast-math' flags for this
code:

#include <cmath>

double test(double a, double x)
{
    return pow(a, x) * a * a * a * a;
}

generates this assembly:


test(double, double):
sub rsp, 24
vmovsd  QWORD PTR [rsp+8], xmm0
call __pow_finite
vmovsd  xmm2, QWORD PTR [rsp+8]
vmulsd  xmm2, xmm2, xmm2
vmulsd  xmm2, xmm2, xmm2
vmulsd  xmm0, xmm2, xmm0
add rsp, 24
ret


As you can see, we can simplify it by adding 4 to the 'x' variable and then
calling std::pow once.
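
A numeric sketch of the identity pow(a, x) * a^4 == pow(a, x + 4), assuming
a > 0. The names pow_times_a4, pow_shifted and check_pow_shift are made up for
this illustration.

```cpp
#include <cassert>
#include <cmath>

// The expression from the report.
double pow_times_a4(double a, double x)
{
    return std::pow(a, x) * a * a * a * a;
}

// Proposed form: fold the four multiplications into the exponent.
double pow_shifted(double a, double x)
{
    return std::pow(a, x + 4.0);
}

void check_pow_shift()
{
    const double bases[] = {0.5, 1.5, 2.0, 10.0};
    const double exponents[] = {-3.0, 0.5, 2.5};
    for (double a : bases)
    {
        for (double x : exponents)
        {
            const double lhs = pow_times_a4(a, x);
            const double rhs = pow_shifted(a, x);
            assert(std::fabs(lhs - rhs) <= 1e-12 * std::fabs(rhs));
        }
    }
}
```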

[Bug tree-optimization/83348] New: Missed optimization in math expression: can be used std::pow function

2017-12-10 Thread zamazan4ik at tut dot by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83348

Bug ID: 83348
   Summary: Missed optimization in math expression: can be used
std::pow function
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zamazan4ik at tut dot by
  Target Milestone: ---

gcc(trunk) with optimization flags '--std=c++17 -O3 -march=native -ffast-math'
for this code:


double f(double a)
{
    return a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*
           a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*
           a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*
           a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*
           a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*
           a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a;
}


generates this assembly:

f(double):
vmulsd  xmm1, xmm0, xmm0
vmulsd  xmm1, xmm1, xmm1
vmulsd  xmm1, xmm1, xmm1
vmulsd  xmm1, xmm1, xmm1
vmulsd  xmm0, xmm1, xmm0
vmulsd  xmm1, xmm0, xmm0
vmulsd  xmm0, xmm1, xmm0
vmulsd  xmm0, xmm0, xmm0
ret


But here it can be simplified by using the std::pow function. If you increase
the length of this multiplication chain, gcc will just add more and more
multiplications.
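
A sketch of what the generated code computes. The source multiplies 102
factors of a (6 lines of 17), and the eight vmulsd instructions above form the
chain a^2, a^4, a^8, a^16, a^17, a^34, a^51, a^102. The names power102_chain,
power102_naive and check_power102 are made up for this illustration; the
sketch only checks that the chain agrees with std::pow(a, 102) up to rounding
and makes no claim about which is faster.

```cpp
#include <cassert>
#include <cmath>

// Mirrors the eight multiplications in the gcc output.
double power102_chain(double a)
{
    double t = a * a;  // a^2
    t = t * t;         // a^4
    t = t * t;         // a^8
    t = t * t;         // a^16
    double r = t * a;  // a^17
    t = r * r;         // a^34
    r = t * r;         // a^51
    return r * r;      // a^102
}

// The product as written in the source: 101 multiplications.
double power102_naive(double a)
{
    double r = a;
    for (int i = 1; i < 102; ++i)
    {
        r *= a;
    }
    return r;
}

void check_power102()
{
    const double samples[] = {0.9, 1.01, 1.5, 2.0};
    for (double a : samples)
    {
        const double expected = std::pow(a, 102.0);
        assert(std::fabs(power102_chain(a) - expected) <= 1e-12 * expected);
        assert(std::fabs(power102_naive(a) - expected) <= 1e-12 * expected);
    }
}
```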