On Wed, Jan 13, 2021 at 08:26:49AM +0100, Richard Biener wrote:
> On Wed, 13 Jan 2021, Jakub Jelinek wrote:
> 
> > Hi!
> > 
> > The following patch implements what I've talked about, i.e. to no longer
> > force operands of vec_perm_const into registers in the generic code, but let
> > each of the (currently 8) targets force it into registers individually,
> > giving the targets better control on if it does that and when and allowing
> > them to do something special with some particular operands.
> > And then defines the define_insn_and_split for the 256-bit and 512-bit
> > permutations into vpmovzx* (only the bw, wd and dq cases, in theory we could
> > add define_insn_and_split patterns also for the bd, bq and wq).
> > 
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> I wonder if it's worth handling the v0 == v1 case this way given the
> weird
> 
> +  if (op1 && op0 != op1)
> +    op1 = force_reg (vmode, op1);
> 
> code (presumably to handle RTX sharing here)?

The v0 == v1 case is not about RTX sharing; it is a quick way of telling
the targets that this is a one-argument permutation (a two-argument
__builtin_shuffle, or a three-argument one where the expander can determine
that both vectors are the same, etc.).  Sometimes that can also be determined
from the permutation itself, but not always.
If we just forced both arguments into registers separately, this information
would be lost.
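
To illustrate why forcing the operands separately loses the information, here
is a toy model in plain C.  It is not GCC code: struct toy_rtx, toy_force_reg
and both helper functions are hypothetical stand-ins, and the only property
modelled is that a force_reg-like copy of a non-register operand yields a
fresh pseudo on every call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an RTX (hypothetical, not GCC's rtx).  */
struct toy_rtx { bool is_reg; int id; };

static struct toy_rtx toy_pool[16];
static size_t toy_pool_used;
static int toy_next_pseudo = 100;

/* Hypothetical analogue of force_reg: return X unchanged if it already is
   a register, otherwise hand out a fresh pseudo register.  */
static struct toy_rtx *
toy_force_reg (struct toy_rtx *x)
{
  if (x->is_reg)
    return x;
  assert (toy_pool_used < sizeof toy_pool / sizeof toy_pool[0]);
  struct toy_rtx *r = &toy_pool[toy_pool_used++];
  r->is_reg = true;
  r->id = toy_next_pseudo++;
  return r;
}

/* Force both operands independently.  If they were the same non-register
   operand, they end up in two different pseudos and op0 == op1 no longer
   holds.  */
static bool
one_operand_after_naive (struct toy_rtx *op0, struct toy_rtx *op1)
{
  op0 = toy_force_reg (op0);
  op1 = toy_force_reg (op1);
  return op0 == op1;
}

/* Remember the identity first, force op0 once and reuse the result for op1,
   forcing op1 on its own only when it really is a different operand.  */
static bool
one_operand_after_shared (struct toy_rtx *op0, struct toy_rtx *op1)
{
  bool same = (op0 == op1);
  op0 = toy_force_reg (op0);
  op1 = same ? op0 : toy_force_reg (op1);
  return op0 == op1;
}
```

With a single non-register operand passed as both arguments, the naive
variant reports two distinct operands, while the variant that preserves the
identity still reports a one-operand permutation.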

> Otherwise it looks sensible - for x86, only constant op1/op0 are 
> interesting, correct?  Wouldn't that simplify things (to only handle
> constants this way)?

Yes, and only CONST0_RTX at the moment.  I originally had the generic code
treat only CONST0_RTX that way and force_reg everything else, but that didn't
really simplify anything: the backends still need to force_reg if they
require a REG, even when CONST0_RTX is the only non-register operand that
could make it through.
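
Roughly, the per-target decision looks like the following toy sketch.  Again
this is not GCC code: enum toy_kind and toy_prepare_operand are hypothetical
names, and "pattern_accepts_zero" stands for whatever check a backend would
use to decide that a particular expansion can consume a zero vector directly.

```c
#include <stdbool.h>

/* Hypothetical classification of a permutation operand.  */
enum toy_kind { TOY_REG, TOY_CONST0, TOY_OTHER };

/* Return the kind the operand has after a backend prepared it.  A zero
   constant is kept only when the chosen expansion can use it as such;
   everything else, including a zero constant the expansion cannot use,
   is forced into a register.  */
static enum toy_kind
toy_prepare_operand (enum toy_kind op, bool pattern_accepts_zero)
{
  if (op == TOY_CONST0 && pattern_accepts_zero)
    return TOY_CONST0;
  return TOY_REG;
}
```

The backend needs the "force into a register" branch in either case, which is
why restricting the generic code to CONST0_RTX would not remove any code from
the targets.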

        Jakub
