https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82281
           Bug ID: 82281
          Summary: Bulldozer/Zen tuning: uses XMM for single 64-bit
                   integer AND, even with a simple mask
          Product: gcc
          Version: 8.0
           Status: UNCONFIRMED
         Keywords: missed-optimization, ssemmx
         Severity: normal
         Priority: P3
        Component: target
         Assignee: unassigned at gcc dot gnu.org
         Reporter: peter at cordes dot ca
 Target Milestone: ---

long long test_and(long long x) {
    return x & 0x77ffffffffULL;
}
// https://godbolt.org/g/D6XujV

# -O3 -march=znver1 -m32 -mno-avx
        movaps  .LC0, %xmm1
        movq    4(%esp), %xmm0
        andps   %xmm1, %xmm0
        movd    %xmm0, %eax
        pextrd  $1, %xmm0, %edx
        ret

# -O3 -m32
        movl    8(%esp), %edx
        movl    4(%esp), %eax
        andl    $119, %edx
        ret

We get this with znver1 and bdver1-4, but not barcelona or btver2.  Also not
haswell, skylake, or knl.  So something is wrong with the tunings for recent
AMD that makes them over-eager to go to vector registers for 64-bit integers,
even in the most trivial case possible.

Fortunately it only happens when the value starts in memory; when it's
already in integer registers (here, ext()'s return value in edx:eax), we get
plain integer code:

long long ext();
long long test_and() {
    long long x = ext();
    return x & 0x77ffffffffULL;
}

# -O3 -march=znver1 -m32
        subl    $12, %esp
        call    ext()
        addl    $12, %esp
        andl    $119, %edx
        ret
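
For reference (my illustration, not part of the original report): the reason
a single andl on %edx suffices is that under -m32 a 64-bit value lives in the
edx:eax register pair, and the low 32 bits of this particular mask are
all-ones, so only the high half needs an AND:

/* Standalone sketch showing how 0x77ffffffffULL splits into the 32-bit
   halves that -m32 code operates on.  Not GCC output; just arithmetic. */
#include <stdio.h>

int main(void) {
    unsigned long long mask = 0x77ffffffffULL;
    unsigned int lo = (unsigned int)mask;          /* 0xffffffff: all-ones, so eax needs no AND */
    unsigned int hi = (unsigned int)(mask >> 32);  /* 0x77 = 119: one andl on edx */
    printf("lo = %#x, hi = %u\n", lo, hi);
    return 0;
}

This prints "lo = 0xffffffff, hi = 119", matching the single
"andl $119, %edx" in the good code above.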