https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96918
--- Comment #12 from Cory Fields <lists at coryfields dot com> ---
> probably the target could advertise a rotate insn for that mode, restricted
> to an argument of 8?

It seems this is already the case for avx512vl? There, my example above becomes
a vprold.

This missing optimization leads to a 25% slowdown for chacha20 on avx2 compared
to clang due to the pessimized 8bit/16bit rotates.

Would avx2 advertising this as a rotate be the preferred solution here? I'm not
familiar with the codebase, but I could try to implement that if so.
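For reference, here is a minimal sketch of the kind of rotate under discussion. It is not the example from earlier in this thread. The names rotl32, rotl32_by8_bytes and rotl8_lanes are made up for illustration. The point is only that a 32-bit rotate by 8 is a pure byte permutation, which is why a target without a native vector rotate could still handle it with a byte shuffle. Which instructions a given compiler actually emits for the loop is an assumption to check by inspecting the generated assembly.

```c
#include <stdint.h>

/* Plain rotate-left idiom; n must be in 1..31 to avoid an undefined shift. */
static inline uint32_t rotl32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

/* The same rotate by 8, written as a permutation of the four bytes.
   b[0] is the least significant byte. */
static inline uint32_t rotl32_by8_bytes(uint32_t x)
{
    uint8_t b[4], r[4];
    for (int i = 0; i < 4; i++)
        b[i] = (uint8_t)(x >> (8 * i));
    /* Each result byte comes from the next lower byte, wrapping around. */
    for (int i = 0; i < 4; i++)
        r[i] = b[(i + 3) & 3];
    return (uint32_t)r[0] | ((uint32_t)r[1] << 8) |
           ((uint32_t)r[2] << 16) | ((uint32_t)r[3] << 24);
}

/* Eight 32-bit lanes, i.e. one 256-bit vector's worth of data (hypothetical
   stand-in for a chacha20 row). A vectorizer may turn this into shifts plus
   an OR, a byte shuffle, or a native rotate, depending on the target. */
static void rotl8_lanes(uint32_t v[8])
{
    for (int i = 0; i < 8; i++)
        v[i] = rotl32(v[i], 8);
}
```

Compiling rotl8_lanes with -O3 -mavx2 and again with AVX-512VL enabled, then comparing the assembly, should show whether the rotate is recognized on each target.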
