On Thu, Aug 12, 2021 at 11:22:48AM +0200, Jakub Jelinek via Gcc-patches wrote:
> So, I wonder if your new routine shouldn't instead be done in
> ix86_expand_vec_perm_const_1 after vec_perm_1 among other 2 insn cases
> and handle the other vpmovdw etc. cases in combine splitters (see that we
> only use low half or quarter of the result and transform whatever
> permutation we've used into what we want).

E.g. in the first function, combine tries:
(set (reg:V16HI 85)
    (vec_select:V16HI (unspec:V32HI [
                (mem/u/c:V32HI (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0  S64 A512])
                (reg:V32HI 88) repeated x2
            ] UNSPEC_VPERMT2)
        (parallel [
                (const_int 0 [0])
                (const_int 1 [0x1])
                (const_int 2 [0x2])
                (const_int 3 [0x3])
                (const_int 4 [0x4])
                (const_int 5 [0x5])
                (const_int 6 [0x6])
                (const_int 7 [0x7])
                (const_int 8 [0x8])
                (const_int 9 [0x9])
                (const_int 10 [0xa])
                (const_int 11 [0xb])
                (const_int 12 [0xc])
                (const_int 13 [0xd])
                (const_int 14 [0xe])
                (const_int 15 [0xf])
            ])))
A combine splitter could run avoid_constant_pool_reference on the
first UNSPEC_VPERMT2 argument (the permutation control vector) and check
whether the permutation can be optimized given that only the low half of
the result is used, ideally through a single function call so that we
wouldn't need too many splitters.
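
To illustrate the kind of check such a shared function might perform, here
is a minimal standalone C++ sketch. It is not GCC code: the name
low_part_is_truncation and its parameters are hypothetical, the control
vector is modelled as a plain array of indices rather than RTL, and it
assumes both data operands are the same register (as with reg 88 above), so
the source-select bit of each index can be ignored.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of GCC).  SEL is the full permutation
// control vector, USED is how many low result elements are actually
// consumed (e.g. 16 of 32), and STRIDE is the truncation factor
// (2 for a vpmovdw-like pattern, 4 for a vpmovqw-like one).
// Assumes both permuted sources are the same register, so only the low
// log2(nelt) bits of each index matter.
bool low_part_is_truncation(const std::vector<unsigned> &sel,
                            std::size_t used, unsigned stride)
{
    const std::size_t nelt = sel.size();
    // Require a non-empty, power-of-two element count and a sane USED.
    if (nelt == 0 || (nelt & (nelt - 1)) != 0 || used > nelt)
        return false;

    const unsigned mask = static_cast<unsigned>(nelt - 1);
    for (std::size_t i = 0; i < used; ++i) {
        // Element i of the result must come from element i * stride.
        if ((sel[i] & mask) != i * stride)
            return false;
    }
    // Elements at positions >= USED are don't-care: nobody reads them.
    return true;
}
```

A splitter for each destination mode could then call one such routine with
the appropriate USED and STRIDE instead of open-coding the index check.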

        Jakub
