https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91201
--- Comment #12 from Marc Glisse <glisse at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #11)
> I'm not aware of vcompressb insn, only vcompressps and vcompresspd.

Intel lists it under VBMI2, so icelake+.

> Sure,
> one could just emit whatever we emit for __builtin_shuffle with (__v64qi) {
> 0, 8, 16, 24, 32, 40, 48, 56, 0, 8, 16, 24, 32, 40, 48, 56, 0, 8, 16, 24,
> 32, 40, 48, 56, 0, 8, 16, 24, 32, 40, 48, 56, 0, 8, 16, 24, 32, 40, 48, 56,
> 0, 8, 16, 24, 32, 40, 48, 56, 0, 8, 16, 24, 32, 40, 48, 56, 0, 8, 16, 24,
> 32, 40, 48, 56 } or similar perm, the question is if it will be faster that
> way or not.

Exactly.
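
For anyone who wants to look at the generated code, the permutation quoted above can be written with GCC's generic vector extension roughly as follows. This is only a sketch: the typedef `v64qi` and the functions `gather_every_eighth_byte` and `self_check` are names made up for this example (not GCC's internal `__v64qi` or anything from the patch), and `__builtin_shuffle` is GCC-specific. What the compiler emits for it depends on the GCC version and the -march flags, so this does not answer the speed question by itself.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical local typedef: 64 one-byte lanes, i.e. one 512-bit vector. */
typedef unsigned char v64qi __attribute__((vector_size(64)));

/* Selects bytes 0, 8, 16, ..., 56 of in[] and repeats that group of eight
   bytes eight times in out[], using the mask quoted in the comment above.
   Data crosses the function boundary through pointers so the example does
   not rely on how 64-byte vectors are passed by value. */
void gather_every_eighth_byte(const unsigned char in[64], unsigned char out[64])
{
    const v64qi mask = {
        0, 8, 16, 24, 32, 40, 48, 56,  0, 8, 16, 24, 32, 40, 48, 56,
        0, 8, 16, 24, 32, 40, 48, 56,  0, 8, 16, 24, 32, 40, 48, 56,
        0, 8, 16, 24, 32, 40, 48, 56,  0, 8, 16, 24, 32, 40, 48, 56,
        0, 8, 16, 24, 32, 40, 48, 56,  0, 8, 16, 24, 32, 40, 48, 56
    };
    v64qi src, dst;

    memcpy(&src, in, sizeof src);
    /* One-operand form: dst[i] = src[mask[i]] for every lane i. */
    dst = __builtin_shuffle(src, mask);
    memcpy(out, &dst, sizeof dst);
}

/* Compares the vector shuffle against a plain scalar loop. */
void self_check(void)
{
    unsigned char in[64], out[64];

    for (int i = 0; i < 64; i++)
        in[i] = (unsigned char)(i + 100);
    gather_every_eighth_byte(in, out);
    for (int i = 0; i < 64; i++)
        assert(out[i] == in[(i % 8) * 8]);
}
```

Compiling this with and without AVX-512 VBMI/VBMI2 enabled and comparing the assembly is one way to see what the generic shuffle expansion produces for this mask.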