https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87047

            Bug ID: 87047
           Summary: gcc 7 & 8 - performance regression because of
                    if-conversion
           Product: gcc
           Version: 7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: already5chosen at yahoo dot com
  Target Milestone: ---

Created attachment 44570
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44570&action=edit
demonstrate performance regression because of if-conversion

Very significant performance regression from gcc 6.x to 7.x and 8.x, caused by
if-conversion of a predictable branch.

Compilation flags: -O2 -Wall
Target: x86-64 (my test machine is IvyBridge)
It is possible that the problem is specific to the x86-64 target. I tested the
aarch64 target (by inspecting the compiler output) and it looks o.k.

The problem occurs here:
    if ((i & 15)==0) {
      const uint64_t PROD_ONE = (uint64_t)(1) << 19;
      uint64_t prod = umulh(invRange, range);
      invRange = umulh(invRange, (PROD_ONE*2-1-prod)<<44)<<1;
    }
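(For reference: umulh returns the high 64 bits of a 64x64-bit unsigned
multiply. Its real definition is presumably in the attached test case; on
x86-64 it would typically be something like the sketch below.)

    #include <stdint.h>

    /* Sketch of the assumed helper: high 64 bits of a 64x64-bit unsigned
       multiply. The authoritative definition is in attachment 44570. */
    static inline uint64_t umulh(uint64_t a, uint64_t b)
    {
      return (uint64_t)(((unsigned __int128)a * b) >> 64);
    }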

The condition has low probability and is easily predicted by the branch
predictor, while the code within the if has relatively high latency.
gcc, starting from 7.x and up to the latest version, is convinced that always
executing the inner part of the if is a bright idea. Measurements on my
real-world code do not agree and show a 30% slowdown. I'm sure that on
artificial sequences I could demonstrate a slowdown of 100% or more.

What is special about this case is that the compiler is VERY confident in its
stupid decision. It does not change its mind even when I replace
    if ((i & 15)==0) {
by
    if (__builtin_expect((i & 15)==0, 0)) {

I found only two ways of forcing sane code generation:
1. -fno-if-conversion
2.
    if ((i & 15)==0) {
      asm volatile("");
      ...
    }
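
Applied to the snippet above, the second workaround looks like this (a sketch;
the empty asm statement acts as an optimization barrier that keeps the branch
from being if-converted):

    if ((i & 15)==0) {
      asm volatile("");  /* optimization barrier: blocks if-conversion */
      const uint64_t PROD_ONE = (uint64_t)(1) << 19;
      uint64_t prod = umulh(invRange, range);
      invRange = umulh(invRange, (PROD_ONE*2-1-prod)<<44)<<1;
    }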
