https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82281

            Bug ID: 82281
           Summary: Bulldozer/Zen tuning: uses XMM for single 64-bit
                    integer AND, even with a simple mask
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Keywords: missed-optimization, ssemmx
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: peter at cordes dot ca
  Target Milestone: ---

long long test_and(long long x) {
        return x & 0x77ffffffffULL;
}
// https://godbolt.org/g/D6XujV
# -O3 -march=znver1 -m32 -mno-avx
        movaps  .LC0, %xmm1
        movq    4(%esp), %xmm0
        andps   %xmm1, %xmm0
        movd    %xmm0, %eax
        pextrd  $1, %xmm0, %edx
        ret

# -O3 -m32
        movl    8(%esp), %edx
        movl    4(%esp), %eax
        andl    $119, %edx
        ret
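
The scalar version is possible because the low 32 bits of the mask
(0x77ffffffffULL) are all-ones, so only the high half needs any work; 0x77
is 119 decimal, matching the andl above.  A minimal C sketch of the 32-bit
decomposition (illustration only, not part of the testcase):

        // low half:  mask 0xffffffff is a no-op, just copy
        unsigned int lo = (unsigned int)x;
        // high half: mask 0x00000077 -> andl $119
        unsigned int hi = (unsigned int)(x >> 32) & 0x77u;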

We get this with znver1 and bdver1-4, but not barcelona or btver2.

Also not haswell, skylake or knl.
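
To reproduce locally (test_and.c is just a placeholder filename for the
testcase above), the same flags as the Godbolt link show the difference:

        gcc -O3 -m32 -march=znver1 -mno-avx -S test_and.c   # XMM sequence
        gcc -O3 -m32 -S test_and.c                          # scalar sequence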

So something is wrong with the tuning for recent AMD CPUs that makes them
over-eager to move 64-bit integer operations into vector registers, even in
the most trivial case possible.  Fortunately it only happens when the value
comes from memory; when it arrives in integer registers, we get scalar code:

long long ext();
long long test_and() {
        long long x = ext();
        return x & 0x77ffffffffULL;
}
# -O3 -march=znver1 -m32
        subl    $12, %esp
        call    ext()
        addl    $12, %esp
        andl    $119, %edx
        ret
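
A possible workaround sketch until the tuning is sorted out (assuming the
rest of the translation unit doesn't need SSE code-gen in 32-bit mode):
with SSE disabled there are no XMM registers to choose from, so the first
testcase compiles to the scalar sequence:

        gcc -O3 -m32 -march=znver1 -mno-sse -S test_and.c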
