https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62041
Bug ID: 62041
Summary: vector fneg codegen uses a subtract instead of an xor (x86-64)
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: spatel at rotateright dot com

$ cat fneg.c
#include <xmmintrin.h>
__m128 fneg4(__m128 x) {
    return _mm_sub_ps(_mm_set1_ps(-0.0), x);
}

$ ~gcc49/local/bin/gcc -march=core-avx2 -O2 -S fneg.c -o -
...
_fneg4:
LFB513:
    vmovaps LC0(%rip), %xmm1
    vsubps  %xmm0, %xmm1, %xmm0
    ret
...
LC0:
    .long 2147483648
    .long 2147483648
    .long 2147483648
    .long 2147483648

------------------------------------

Instead of generating 'vsubps' here, it would be better to generate 'vxorps'
because we know we're just flipping the sign bit of each element. This is what
gcc does for the scalar version of this code.

Note that there is no difference if I use -ffast-math with this testcase. With
-ffast-math enabled, we should generate the same 'xorps' code even if the
"-0.0" is "+0.0". Again, that's what the scalar codegen does, so I think this
is just a deficiency when generating vector code.

I can file the -ffast-math case as a separate bug if that would be better.
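To illustrate the equivalence the report relies on, here is a minimal sketch comparing the subtract form with an explicit sign-bit xor. The helper names (fneg4_sub, fneg4_sub_poszero, fneg4_xor, same_bits) are hypothetical and not part of the original testcase. It assumes default IEEE round-to-nearest, no -ffast-math, and non-NaN inputs, since the sign of a NaN result may differ between the two forms.

```c
#include <string.h>
#include <xmmintrin.h>

/* Negation as written in the report: (-0.0) - x in each lane. */
static __m128 fneg4_sub(__m128 x) {
    return _mm_sub_ps(_mm_set1_ps(-0.0f), x);
}

/* Same idea with +0.0; only a valid negation when signed zeros are ignored. */
static __m128 fneg4_sub_poszero(__m128 x) {
    return _mm_sub_ps(_mm_set1_ps(0.0f), x);
}

/* Negation by flipping the sign bit of each lane (mask 0x80000000). */
static __m128 fneg4_xor(__m128 x) {
    return _mm_xor_ps(x, _mm_set1_ps(-0.0f));
}

/* Returns 1 if all four lanes of a and b have identical bit patterns. */
static int same_bits(__m128 a, __m128 b) {
    float fa[4], fb[4];
    _mm_storeu_ps(fa, a);
    _mm_storeu_ps(fb, b);
    return memcmp(fa, fb, sizeof fa) == 0;
}
```

For non-NaN lanes, fneg4_sub and fneg4_xor should agree bit for bit, including for both signed zeros. fneg4_sub_poszero differs on a +0.0 input (it yields +0.0 where the xor yields -0.0), which is why the "+0.0" variant can only be folded to an xor when signed zeros may be ignored, as under -ffast-math.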