[Bug tree-optimization/88713] Vectorized code slow vs. flang

elrodc at gmail dot com Tue, 22 Jan 2019 20:42:53 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88713


--- Comment #30 from Chris Elrod <elrodc at gmail dot com> ---
(In reply to Marc Glisse from comment #29)
> The main difference I can see is that clang computes rsqrt directly, while
> gcc first computes sqrt and then computes the inverse. Also gcc seems afraid
> of getting NaN for sqrt(0) so it masks out this value. ix86_emit_swsqrtsf in
> gcc/config/i386/i386.c seems like a good place to look at.

gcc does calculate the rsqrt directly with -funsafe-math-optimizations and a couple of
other flags (or just -ffast-math):

        vmovups (%rsi), %zmm0
        vxorps  %xmm1, %xmm1, %xmm1
        vcmpps  $4, %zmm0, %zmm1, %k1
        vrsqrt14ps      %zmm0, %zmm1{%k1}{z}
