https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
Slow:

Samples: 4K of event 'cycles:u', Event count (approx.): 4565667242
Overhead  Samples  Command  Shared Object     Symbol
  30.88%     1252  botan    libbotan-2.so.17  [.] Botan::Block_Cipher_Fixed_Params<16ul, 16ul, 0ul, 1ul, Botan
  30.24%     1235  botan    libbotan-2.so.17  [.] Botan::Block_Cipher_Fixed_Params<16ul, 16ul, 0ul, 1ul, Botan
  26.04%     1055  botan    libbotan-2.so.17  [.] Botan::poly_double_n_le

Fast:

Samples: 4K of event 'cycles:u', Event count (approx.): 4427277434
Overhead  Samples  Command  Shared Object     Symbol
  33.59%     1372  botan    libbotan-2.so.17  [.] Botan::Block_Cipher_Fixed_Params<16ul, 16ul, 0ul, 1ul, Bo
  33.16%     1356  botan    libbotan-2.so.17  [.] Botan::Block_Cipher_Fixed_Params<16ul, 16ul, 0ul, 1ul, Bo
  18.71%      765  botan    libbotan-2.so.17  [.] Botan::poly_double_n_le

Also fast on trunk when not vectorizing, so the rev does what it was
intended to (more vectorization). I'll look into what we do to
poly_double_n_le.