https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81602

--- Comment #3 from Peter Cordes <peter at cordes dot ca> ---
Forgot to mention: memory-source popcnt with an indexed addressing mode would
also be worse on SnB/IvB: it can't stay micro-fused, so the front-end
un-laminates it in the issue stage.

Haswell and later can keep  popcnt (%rdi, %rdx), %eax  micro-fused throughout
the pipeline, so it's always 1 fused-domain uop instead of expanding to 2, but
it's still 2 unfused-domain uops so it takes more room in the scheduler than
the reg-reg form.
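To make the fused-domain vs. unfused-domain distinction concrete, here is a minimal C sketch that encodes the popcnt uop counts described above as a lookup. It is a hypothetical model, not an Intel or GCC API: the names `uarch`, `popcnt_form`, `uop_count` and `popcnt_uops` are made up for illustration. The numbers come from this comment, plus the assumption (standard for SnB and later) that a base-only memory operand such as `popcnt (%rdi), %eax` micro-fuses normally. Nothing here is measured.

```c
#include <assert.h>

/* Hypothetical model, not a real API: uop counts for popcnt forms
 * as described in this bug comment. */
typedef enum { UARCH_SNB_IVB, UARCH_HSW_PLUS } uarch;
typedef enum { POPCNT_REG, POPCNT_MEM_BASE, POPCNT_MEM_INDEXED } popcnt_form;

typedef struct {
    int fused;   /* fused-domain uops: what issue/rename bandwidth sees */
    int unfused; /* unfused-domain uops: scheduler entries / port dispatches */
} uop_count;

uop_count popcnt_uops(uarch u, popcnt_form f)
{
    uop_count r;
    switch (f) {
    case POPCNT_REG:          /* popcnt %rdx, %eax: a single ALU uop */
        r.fused = 1; r.unfused = 1;
        break;
    case POPCNT_MEM_BASE:     /* popcnt (%rdi), %eax: load + ALU, micro-fused
                                 (assumed for all uarches modeled here) */
        r.fused = 1; r.unfused = 2;
        break;
    case POPCNT_MEM_INDEXED:  /* popcnt (%rdi,%rdx), %eax */
        /* SnB/IvB un-laminate it at issue -> 2 fused-domain uops.
         * HSW and later keep it micro-fused -> 1 fused-domain uop. */
        r.fused = (u == UARCH_SNB_IVB) ? 2 : 1;
        r.unfused = 2;        /* still a load uop plus an ALU uop */
        break;
    default:
        r.fused = 0; r.unfused = 0;
        break;
    }
    /* Micro-fusion can only shrink the fused-domain count. */
    assert(r.fused <= r.unfused);
    return r;
}
```

The scheduler-pressure point above corresponds to the `unfused` field: in this model the indexed form costs 2 scheduler entries on every listed uarch, versus 1 for the reg-reg form, even where it issues as a single fused-domain uop.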

When Intel fixes popcnt's false output dependency in some future uarch, it might
un-laminate again with indexed addressing modes.  That's what happens on
Skylake for tzcnt/lzcnt, because SKL fixed their output dependency.  (And
judging from the published errata, they meant to fix popcnt as well.)  But
indexed addressing modes can only stay micro-fused with an ALU uop for
"traditional" x86-style 2-operand instructions where the destination is
read/write, not write-only.  (Tested on Haswell and Skylake.)  And yes, this
makes indexed addressing modes with AVX instructions worse than with the SSE
equivalent, since the 3-operand VEX forms have a write-only destination. :/
