https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71725
Bug ID: 71725
Summary: Backend decides to generate larger and possibly slower float ops
         for integer ops that appear in source
Product: gcc
Version: 7.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---
Target: x86_64-*-*

The following testcase, derived from gcc.target/i386/xorps-sse2.c (see
PR54716), generates an FP op for the xor, which uses a larger opcode and is
possibly slower when g is a trap/denormal representation(?):

#define vector __attribute__ ((vector_size (16)))

vector int x(vector float f, vector int h)
{
  vector int g = { 0x80000000, 0, 0x80000000, 0 };
  vector int f_int = (vector int) f;
  return (f_int ^ g) + h;
}

x:
.LFB1:
	.cfi_startproc
	xorps	.LC0(%rip), %xmm0
	paddd	%xmm1, %xmm0
	ret

Flags used are -O -msse2 -mno-sse3.

Today, r191827 might be better implemented in something like the STV pass,
which can apply logic that isn't localized to a single instruction but
really considers the context, as the gcc.target/i386/xorps-sse2.c testcase
claims to test.