https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91645

            Bug ID: 91645
           Summary: Missed optimization with sqrt(x*x)
           Product: gcc
           Version: 9.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: lisyarus at gmail dot com
  Target Milestone: ---

Based on a discussion on Stack Overflow:
https://stackoverflow.com/questions/57673825/how-to-force-gcc-to-assume-that-a-floating-point-expression-is-non-negative

With gcc-trunk and '-std=c++17 -O3', the function 

#include <cmath>

float test (float x)
{
    return std::sqrt(x*x);
}

produces the following assembly:

test(float):
        mulss   xmm0, xmm0
        pxor    xmm2, xmm2
        ucomiss xmm2, xmm0
        movaps  xmm1, xmm0
        sqrtss  xmm1, xmm1
        ja      .L8
        movaps  xmm0, xmm1
        ret
.L8:
        sub     rsp, 24
        movss   DWORD PTR [rsp+12], xmm1
        call    sqrtf
        movss   xmm1, DWORD PTR [rsp+12]
        add     rsp, 24
        movaps  xmm0, xmm1
        ret


As far as I can tell, it uses the inline sqrtss result when the argument to
sqrt is >= 0, and otherwise falls back to calling sqrtf so that an invalid
(negative) argument is handled and the appropriate errno is set. This behavior
is reasonable and expected.
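
To illustrate the errno behavior that the slow path preserves, here is a
minimal sketch. The helper name 'sqrt_sets_edom' is hypothetical, and the
sketch assumes a C library that reports math errors through errno (glibc does,
which 'math_errhandling & MATH_ERRNO' indicates) and a build without
'-fno-math-errno' or '-ffast-math':

```cpp
#include <cerrno>
#include <cmath>

// Hypothetical helper: returns true if std::sqrt reported a domain error
// through errno for the given input.
bool sqrt_sets_edom(float value)
{
    // volatile keeps the compiler from constant-folding or dropping the call.
    volatile float input = value;
    errno = 0;
    volatile float result = std::sqrt(input);
    (void)result;
    return errno == EDOM;
}
```

On such a platform, 'sqrt_sets_edom(-1.0f)' should return true and
'sqrt_sets_edom(4.0f)' should return false; the call to sqrtf in the assembly
above is what makes the first case work.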

Adding '-fno-math-errno', '-ffast-math', or '-ffinite-math-only' removes all
the clutter and compiles the same code into the neat

test(float):
        mulss   xmm0, xmm0
        sqrtss  xmm0, xmm0
        ret


Now, the problem is that GCC doesn't seem to optimize away the call to sqrtf
based on the surrounding code. As an example, it would be nice if this (or
something similar) compiled to the same mulss-sqrtss-ret sequence:

#include <cmath>

float test (float x)
{
    float y = x*x;
    if (y >= 0.f)
        return std::sqrt(y);
    __builtin_unreachable();
}

If I understand it correctly, the 'y >= 0.f' check excludes 'y' being NaN and
'y' being negative (though the latter is already excluded by 'y = x*x'), so
there is no need to check whether the argument to 'std::sqrt' is invalid, and
the compiler could just emit 'sqrtss' and return.
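
The reasoning about the guard can be checked with a small sketch. The names
'passes_guard', 'square_passes_guard' and 'kQuietNaN' are hypothetical, and
IEEE 754 floats without '-ffast-math' or '-ffinite-math-only' are assumed:

```cpp
#include <limits>

// Hypothetical helper mirroring the guard 'y >= 0.f' from the example above.
bool passes_guard(float y)
{
    return y >= 0.f;
}

// Hypothetical helper: applies the guard to x*x, as the 'test' function does.
bool square_passes_guard(float x)
{
    return passes_guard(x * x);
}

// A quiet NaN to probe the guard with (assumes IEEE 754 floats).
const float kQuietNaN = std::numeric_limits<float>::quiet_NaN();
```

Under these assumptions the guard rejects NaN and negative values and accepts
-0.0f, because every ordered comparison with NaN is false and -0.0f compares
equal to 0.0f. Every value that reaches 'std::sqrt' is therefore one for which
sqrtf would not set errno.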

Furthermore, adding e.g. '#pragma GCC optimize ("no-math-errno")' before the
'test' function doesn't lead to it being optimized either, though I'm not sure
whether this is expected to work and/or requires a separate bugtracker issue.
