Re: [PATCH] i386: Optimize _mm_unpacklo_epi8 of 0 vector as second argument or similar VEC_PERM_EXPRs into pmovzx [PR95905]

Jakub Jelinek via Gcc-patches Tue, 12 Jan 2021 04:34:39 -0800

On Tue, Jan 12, 2021 at 11:47:48AM +0100, Jakub Jelinek via Gcc-patches wrote:
> > OTOH, perhaps some of the new testcases can be handled in x86
> > target_fold_builtin? In the long term, maybe target_fold_shuffle can
> > be introduced to map __builtin_shufle to various target builtins, so
> > the builtin can be processed further in target_fold_builtin. As
> > pointed out below, vector insn patterns can be quite complex, and push
> > RTL combiners to their limits, so perhaps they can be more efficiently
> > handled by tree passes.
> 
> My primary motivation was to generate good code from __builtin_shuffle here
> and trying to find the best permutation and map it back from insns to
> builtins would be a nightmare.
> I'll see how many targets I need to modify to try the no middle-end
> force_reg for CONST0_RTX case...


For the folding, I think best would be to change _mm_unpacklo_epi8
and all the similar intrinsics for hardcoded specific permutations
from using a builtin to just using __builtin_shuffle (together with
verification that we emit as good or better code from it for each case of
course), and keep __builtin_shuffle -> VEC_PERM_EXPR as the canonical
form (with which the GIMPLE code can do any optimizations it wants).

        Jakub

Re: [PATCH] i386: Optimize _mm_unpacklo_epi8 of 0 vector as second argument or similar VEC_PERM_EXPRs into pmovzx [PR95905]

Reply via email to