On Thu, Aug 12, 2021 at 11:22:48AM +0200, Jakub Jelinek via Gcc-patches wrote:
> So, I wonder if your new routine shouldn't instead be done in
> ix86_expand_vec_perm_const_1 after vec_perm_1, among the other 2-insn cases,
> and the other vpmovdw etc. cases be handled in combine splitters (which would
> see that we only use the low half or quarter of the result and transform
> whatever permutation we've used into what we want).
E.g. in the first function, combine tries:

(set (reg:V16HI 85)
    (vec_select:V16HI
        (unspec:V32HI [
                (mem/u/c:V32HI (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0 S64 A512])
                (reg:V32HI 88) repeated x2
            ] UNSPEC_VPERMT2)
        (parallel [
                (const_int 0 [0])
                (const_int 1 [0x1])
                (const_int 2 [0x2])
                (const_int 3 [0x3])
                (const_int 4 [0x4])
                (const_int 5 [0x5])
                (const_int 6 [0x6])
                (const_int 7 [0x7])
                (const_int 8 [0x8])
                (const_int 9 [0x9])
                (const_int 10 [0xa])
                (const_int 11 [0xb])
                (const_int 12 [0xc])
                (const_int 13 [0xd])
                (const_int 14 [0xe])
                (const_int 15 [0xf])
            ])))

A combine splitter could run avoid_constant_pool_reference on the first
UNSPEC_VPERMT2 argument and check whether the permutation can be optimized,
ideally by calling a shared helper function so that we wouldn't need too
many splitters.

	Jakub
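As a rough illustration of the permutation check such a shared helper might do, here is a standalone C sketch. It does not use GCC internals: `truncation_stride` is a hypothetical name, and the sketch assumes the constant selector has already been pulled out of the constant pool (e.g. via avoid_constant_pool_reference) into a plain int array, and that both data operands are the same register, as in the dump above. It reports whether the used low elements pick every 2nd word (a vpmovdw-style truncation) or every 4th word (vpmovqw-style).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not part of GCC.  SEL holds the NELTS constant
   indices of a two-source word permutation (vpermt2w-style) whose two
   data operands are the same register, and only the first NUSED result
   elements are used.  Return 2 if those elements are words 0, 2, 4, ...
   (vpmovdw-style truncation), 4 if they are words 0, 4, 8, ...
   (vpmovqw-style), and 0 otherwise.  */
static int
truncation_stride (const int *sel, int nelts, int nused)
{
  static const int strides[] = { 2, 4 };

  /* NELTS must be a power of two so that NELTS - 1 masks the index.  */
  assert (nelts > 0 && (nelts & (nelts - 1)) == 0);

  for (int s = 0; s < 2; s++)
    {
      int stride = strides[s];
      /* Every wanted index i * STRIDE must fit within one source vector.  */
      bool ok = nused * stride <= nelts;
      for (int i = 0; ok && i < nused; i++)
        /* Both sources are the same register, so the source-select bit
           (and anything above it) is irrelevant: compare only the index
           within one source.  */
        ok = (sel[i] & (nelts - 1)) == i * stride;
      if (ok)
        return stride;
    }
  return 0;
}
```

For the dump above, the .LC0 selector would be checked with nelts = 32 and nused = 16, since the vec_select keeps elements 0 to 15 of the V32HI result.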