https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79359
--- Comment #2 from Raphael C <drraph at gmail dot com> ---
As an additional data point in relation to Part 2 (that is, without
-ffast-math): in gcc 7, -O3 -ffinite-math-only gives
f:
movq QWORD PTR [rsp-16], xmm0
movss xmm3, DWORD PTR [rsp-12]
movss xmm2, DWORD PTR [rsp-16]
movaps xmm1, xmm3
movaps xmm0, xmm2
jmp __mulsc3
whereas in clang trunk it gives
f: # @f
movaps xmm1, xmm0
shufps xmm1, xmm1, 229 # xmm1 = xmm1[1,1,2,3]
movaps xmm2, xmm0
mulss xmm2, xmm1
addss xmm2, xmm2
mulss xmm0, xmm0
mulss xmm1, xmm1
subss xmm0, xmm1
unpcklps xmm0, xmm2 # xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
ret
I am no longer convinced ICC is handling NaN and Inf correctly, so I have posted
a query to their forum. However, it looks like gcc is not optimising as well as
it could when -ffinite-math-only is enabled: it still calls __mulsc3, whereas
clang emits the multiplication inline.
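
For reference, here is a minimal C sketch of the two code shapes being compared. It assumes f squares a float complex value. That is inferred from the assembly above (gcc passes the same real/imaginary pair twice to __mulsc3) and is not quoted from the test case. The helpers make_complexf and square_finite_only are hypothetical names used only for illustration.

```c
#include <complex.h>
#include <string.h>

/* Assumed shape of the test function: square a complex float.
   Without -ffinite-math-only the compiler may route this through
   __mulsc3 so that NaN/Inf results can be recovered per C99 Annex G. */
float complex f(float complex z)
{
    return z * z;
}

/* Hypothetical helper: build a complex value from its two parts.
   C99 guarantees float complex has the layout of float[2]. */
static float complex make_complexf(float re, float im)
{
    float parts[2] = { re, im };
    float complex z;
    memcpy(&z, parts, sizeof z);
    return z;
}

/* Hypothetical hand expansion mirroring the clang sequence:
   real = re*re - im*im, imag = re*im + re*im, with no NaN/Inf fix-up. */
float complex square_finite_only(float complex z)
{
    float re = crealf(z);
    float im = cimagf(z);
    float cross = re * im;
    return make_complexf(re * re - im * im, cross + cross);
}
```

For finite inputs the two functions agree, e.g. both map 3 + 4i to -7 + 24i. The hand expansion is only a valid replacement when no Inf or NaN operands or results can occur, which is what -ffinite-math-only asserts.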