https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118818
Alexander Monakov <amonakov at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org
--- Comment #1 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
-mno-recip disables this; the documentation probably needs an update:
https://gcc.gnu.org/onlinedocs/gcc-14.2.0/gcc/x86-Options.html#index-mrecip-2
Your benchmark looks latency-limited, but using rcpss only improves throughput
(latency increases from ~10 cycles for divps to ~16 cycles for the
rcpps-mul-mul-add-sub replacement sequence).
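
For illustration only, here is a rough C sketch of the kind of replacement
sequence described above: a reciprocal estimate followed by one Newton-Raphson
step written as mul-mul-add-sub. It is not the exact code GCC emits. The names
approx_rcp, div_via_rcp and rel_error are hypothetical, it uses the scalar
rcpss intrinsic, and the non-SSE fallback is an assumed stand-in that merely
perturbs 1/x so the sketch builds elsewhere. Every step consumes the result of
the previous one, which is consistent with the point that the sequence can help
throughput without shortening a dependent chain.

```c
#include <math.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Rough reciprocal estimate. Uses rcpss where SSE is available; the
   fallback is an assumption for portability, not what hardware does. */
static float approx_rcp(float x)
{
#if defined(__SSE__)
    return _mm_cvtss_f32(_mm_rcp_ss(_mm_set_ss(x)));
#else
    return (1.0f / x) * (1.0f + 1.0f / 4096.0f);
#endif
}

/* a / b via estimate plus one Newton-Raphson refinement:
   r1 = r0 * (2 - b * r0), written as (r0 + r0) - (b * r0) * r0. */
static float div_via_rcp(float a, float b)
{
    float r0 = approx_rcp(b);   /* rcp: low-precision estimate of 1/b */
    float t  = b * r0;          /* mul */
    t = t * r0;                 /* mul */
    float r1 = (r0 + r0) - t;   /* add, sub */
    return a * r1;              /* final multiply by the numerator */
}

/* Relative error of `got` against the reference value `want`. */
static float rel_error(float got, float want)
{
    return fabsf(got - want) / fabsf(want);
}
```

Comparing div_via_rcp(a, b) against a / b with rel_error shows how close the
refined estimate gets. Timing the two forms in a dependent loop versus an
independent one is one way to see the latency/throughput distinction on a
given machine.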