https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115069
--- Comment #9 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Uroš Bizjak from comment #8)
> A better patch:

The real issue is that the following permutation (truncation):

+  for (i = 0; i < d.nelt; ++i)
+    d.perm[i] = i * 2;
+
+  ok = ix86_expand_vec_perm_const_1 (&d);

results in slow code involving VPERMQ. Ideally, ix86_expand_vec_perm_const_1
should emit faster code for truncation, because this will benefit other code
as well.
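
For illustration, here is a minimal scalar sketch of why the selector perm[i] = i * 2 amounts to a truncation. It assumes byte elements packed in pairs into 16-bit words in little-endian order, as on x86. None of this is GCC code: select_even_bytes, truncate_words, store_le16 and permute_matches_truncation are hypothetical helpers written only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not GCC code: gather the even-indexed bytes of
   SRC, i.e. apply the selector perm[i] = i * 2.  SRC must hold at
   least 2 * NELT bytes.  */
static void
select_even_bytes (const uint8_t *src, uint8_t *dst, size_t nelt)
{
  for (size_t i = 0; i < nelt; ++i)
    dst[i] = src[i * 2];
}

/* Hypothetical helper: truncate each 16-bit element to its low byte.  */
static void
truncate_words (const uint16_t *src, uint8_t *dst, size_t nelt)
{
  for (size_t i = 0; i < nelt; ++i)
    dst[i] = (uint8_t) src[i];
}

/* Store WORDS into BYTES in little-endian order (assumed layout).  */
static void
store_le16 (const uint16_t *words, uint8_t *bytes, size_t nelt)
{
  for (size_t i = 0; i < nelt; ++i)
    {
      bytes[2 * i] = (uint8_t) (words[i] & 0xff);
      bytes[2 * i + 1] = (uint8_t) (words[i] >> 8);
    }
}

/* Return 1 if even-byte selection and truncation give the same result
   for up to 32 words, 0 otherwise.  */
static int
permute_matches_truncation (const uint16_t *words, size_t nelt)
{
  uint8_t bytes[64], by_perm[32], by_trunc[32];

  assert (nelt <= 32);
  store_le16 (words, bytes, nelt);
  select_even_bytes (bytes, by_perm, nelt);
  truncate_words (words, by_trunc, nelt);

  for (size_t i = 0; i < nelt; ++i)
    if (by_perm[i] != by_trunc[i])
      return 0;
  return 1;
}
```

Calling permute_matches_truncation on any array of up to 32 words should return 1 under the little-endian assumption. This is the equivalence that would let the permutation be treated as a truncation rather than as a general cross-lane shuffle.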