https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118057
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |riscv
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
I would expect this to always be slower when vectorized unless the core is
seriously bottlenecked on the frontend. The loads/stores need to be
decomposed into separate uops, and there's no actual vector operation. The
vector op introduces an artificial dependence between otherwise independent
lanes, which could execute out of order (OOO) in scalar code.
I think GCC behaves better here.
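
For illustration only, here is a minimal sketch of the kind of loop the comment above seems to describe. It is an assumption, not the test case attached to this report: the function name strided_copy and its parameters are hypothetical. The loop only moves data, so each iteration is one load and one store with no arithmetic in between, and no iteration depends on another.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel, not taken from the bug report.
   Copies n elements from src (read with src_stride) to dst (written
   with dst_stride).  The loop body is a plain load followed by a plain
   store; nothing is computed on the loaded value. */
void strided_copy(int *dst, size_t dst_stride,
                  const int *src, size_t src_stride, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++) {
        /* Iteration i touches only its own source and destination
           element, so scalar iterations are independent of each other. */
        dst[i * dst_stride] = src[i * src_stride];
    }
}
```

If a vectorizer turns such a loop into strided vector loads and stores, and the hardware splits those into one access per element, the element accesses are still all there, but they are now grouped behind a single vector register. That grouping would be the artificial dependence between lanes mentioned above.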