https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71725

            Bug ID: 71725
           Summary: Backend decides to generate larger and possibly slower
                    float ops for integer ops that appear in source
           Product: gcc
           Version: 7.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---
            Target: x86_64-*-*

The following testcase, derived from gcc.target/i386/xorps-sse2.c (see PR54716),
generates an FP op (xorps) for the xor. The FP form uses a larger opcode and is
possibly slower when g is a trap/denormal representation(?).

#define vector __attribute__ ((vector_size (16)))

vector int x(vector float f, vector int h)
{
  vector int g = { 0x80000000, 0, 0x80000000, 0 }; /* sign bit set in lanes 0 and 2 */
  vector int f_int = (vector int) f;
  return (f_int ^ g) + h;
}

x:
.LFB1:
        .cfi_startproc
        xorps   .LC0(%rip), %xmm0
        paddd   %xmm1, %xmm0
        ret

Flags used are -O -msse2 -mno-sse3.
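
For comparison, a hand-written sketch (not compiler output) of the
integer-domain sequence one might expect instead, assuming the same .LC0
constant pool entry; pxor is the SSE2 integer xor:

x:
.LFB1:
        .cfi_startproc
        pxor    .LC0(%rip), %xmm0      # integer xor, stays in the integer domain
        paddd   %xmm1, %xmm0
        ret

Since the xor result feeds paddd, an integer op, pxor keeps the value in the
integer domain and avoids a potential domain-crossing (bypass) delay.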

Today, r191827 might be better implemented in something like the STV pass,
which can apply logic that isn't localized to a single instruction but takes
the whole context into account, as the gcc.target/i386/xorps-sse2.c testcase
claims to test.
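
To illustrate the context sensitivity in question, a hypothetical contrasting
case (negate_even is an illustrative name, not a testcase from the tree): when
the xor result is consumed as floats, staying in the FP domain with xorps is
the natural choice; it is only when the result feeds integer ops, like the
paddd above, that pxor would avoid crossing domains.

#define vector __attribute__ ((vector_size (16)))

vector float negate_even(vector float f)
{
  vector int g = { 0x80000000, 0, 0x80000000, 0 }; /* sign bit in lanes 0 and 2 */
  /* The xor result is used as floats here, so an FP-domain xorps is
     preferable, unlike in the testcase above where it feeds paddd.  */
  return (vector float) ((vector int) f ^ g);
}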
