https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118276
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords| |missed-optimization
Target|X86_64 |x86_64-*-*
CC| |hubicka at gcc dot gnu.org
--- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
So this is probably a tuning issue in the backend then: it thinks (for generic
tuning) that for 11 elements rep stosq is better (size/speed) than the
unrolled SSE code.
What's faster will ultimately depend on the uarch (some have a
low-overhead rep stosq, some do not).
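
A minimal sketch for comparing the two expansions locally. The testcase from
this PR is not reproduced here, so the struct, the function names and the use
of memset on 11 64-bit elements are assumptions for illustration only; the
actual code in the report may reach the backend through a different path.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: "11 elements" is modelled as 11 x 8 bytes = 88 bytes. */
#define N_ELEMS 11

struct buf {
    uint64_t v[N_ELEMS];
};

/* Hypothetical helper: zero the whole 88-byte object.  With a constant
   size like this, the x86 backend chooses how to expand the memset
   (e.g. rep stosq vs. unrolled or vector stores) based on tuning. */
void clear_buf(struct buf *b)
{
    memset(b, 0, sizeof *b);
}

/* Hypothetical helper: sum the elements so the effect of clear_buf
   can be checked from a caller. */
uint64_t sum_buf(const struct buf *b)
{
    uint64_t s = 0;
    for (int i = 0; i < N_ELEMS; i++)
        s += b->v[i];
    return s;
}
```

Compiling this with `gcc -O2 -mtune=generic -S` and again with
`-mstringop-strategy=rep_8byte` or `-mstringop-strategy=vector_loop` should
show which expansion each setting selects for clear_buf; timing both variants
on the target uarch is what would actually settle which is faster there.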