https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96918

--- Comment #12 from Cory Fields <lists at coryfields dot com> ---
> probably the target could advertise a rotate insn for that mode, restricted 
> to an argument of 8?

It seems this is already the case for avx512vl? There, my example above becomes
a vprold.

This missing optimization causes a 25% slowdown for ChaCha20 on avx2 compared
to clang, due to the pessimized rotates by 8 and 16 bits.
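For context, here is a minimal portable C sketch of why these two rotate amounts are special. It is not GCC's or clang's actual code. The ChaCha20 quarter round rotates 32-bit words by 16, 12, 8 and 7 bits. Rotates by 8 and 16 move whole bytes, so within each 32-bit element they are just a byte permutation. On avx2 a single byte shuffle (vpshufb) can apply that permutation, instead of the shift/shift/or sequence needed for arbitrary amounts. The helper names below are hypothetical, and `shuffle_lane` is only a scalar model of vpshufb within one 16-byte lane, not the intrinsic itself. Since vpshufb shuffles within 128-bit lanes, a 256-bit version would reuse the same 16-byte mask for each half.

```c
#include <assert.h>
#include <stdint.h>

/* Portable rotate-left idiom (shift/shift/or); compilers recognize this shape. */
static inline uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return (x << n) | (x >> ((32u - n) & 31u));
}

/* Hypothetical helper: build a pshufb-style control mask for one 16-byte lane
   that rotates each 32-bit element left by nbytes whole bytes
   (nbytes = 1 for a rotate by 8, 2 for a rotate by 16).
   Destination byte j of an element takes source byte (j - nbytes) mod 4. */
static void build_rotl_mask(uint8_t mask[16], unsigned nbytes)
{
    assert(nbytes < 4);
    for (unsigned i = 0; i < 16; i++) {
        unsigned base = i & ~3u, j = i & 3u;
        mask[i] = (uint8_t)(base + ((j + 4u - nbytes) & 3u));
    }
}

/* Scalar model of a byte shuffle within one 16-byte lane. Only indices 0..15
   are modelled; pshufb's zeroing for bytes with bit 7 set is not, because
   build_rotl_mask never produces such indices. */
static void shuffle_lane(uint8_t dst[16], const uint8_t src[16], const uint8_t mask[16])
{
    for (unsigned i = 0; i < 16; i++)
        dst[i] = src[mask[i] & 15u];
}

/* Rotate four 32-bit words left by 8*nbytes bits using only the byte shuffle. */
static void rotl32x4_bytes(uint32_t w[4], unsigned nbytes)
{
    uint8_t src[16], dst[16], mask[16];
    for (unsigned k = 0; k < 16; k++)   /* little-endian byte order, as on x86 */
        src[k] = (uint8_t)(w[k / 4] >> (8 * (k % 4)));
    build_rotl_mask(mask, nbytes);
    shuffle_lane(dst, src, mask);
    for (unsigned e = 0; e < 4; e++)
        w[e] = (uint32_t)dst[4 * e] | (uint32_t)dst[4 * e + 1] << 8 |
               (uint32_t)dst[4 * e + 2] << 16 | (uint32_t)dst[4 * e + 3] << 24;
}

/* Cross-check: the byte-shuffle form must agree with the shift/or idiom. */
static int shuffle_matches_shifts(const uint32_t in[4])
{
    for (unsigned nbytes = 1; nbytes <= 2; nbytes++) {
        uint32_t w[4] = { in[0], in[1], in[2], in[3] };
        rotl32x4_bytes(w, nbytes);
        for (unsigned e = 0; e < 4; e++)
            if (w[e] != rotl32(in[e], 8 * nbytes))
                return 0;
    }
    return 1;
}
```

The rotate-by-8 mask this builds is {3,0,1,2, 7,4,5,6, ...} and the rotate-by-16 mask is {2,3,0,1, 6,7,4,5, ...}. The rotates by 12 and 7 are not whole-byte moves, so they still need the shift-based sequence.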

Would having avx2 advertise this as a rotate be the preferred solution here? I'm
not familiar with the codebase, but I could try to implement that if so.
