On Wed, Apr 06, 2022 at 03:01:33PM -0500, will schmidt wrote:
> In this context it's not clear what is the "old code" ?
> The mtvsrdd
> instruction is referenced in this code path.  I see no direct reference
> to lxvrdx here, though I suppose it's assumed somewhere behind the
> emit_ calls.

The lxvrdx comes from this pattern:

(define_insn "vsx_lxvr<wd>x"
  [(set (match_operand:TI 0 "vsx_register_operand" "=wa")
       (zero_extend:TI (match_operand:INT_ISA3  1 "memory_operand" "Z")))]
  "TARGET_POWER10"
  "lxvr<wd>x %x0,%y1"
  [(set_attr "type" "vecload")])

However, since we don't currently provide a zero_extendditi2 insn, the compiler
won't generate the zero_extend operation directly.  Instead, it uses the machine
independent code to copy the bottom bits and zero the top bits as separate
insns.  Providing the pattern below opens up the possibility of generating the
zero_extend insn.

However, if the pattern doesn't also support GPRs, it will force everything
into the vector registers, which means adding direct moves to get the values
into the appropriate registers.
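
As a rough illustration of the GPR case (a sketch only, not code from the
patch), the split amounts to copying the source into the low doubleword and
clearing the high doubleword.  The names zero_extend_di_to_ti, ti_halves,
zero_extend_split and check_zero_extend are made up for this example, and it
assumes a compiler that provides unsigned __int128 (GCC or Clang on a 64-bit
target):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of (zero_extend:TI (reg:DI)): a plain widening cast.  */
static unsigned __int128
zero_extend_di_to_ti (uint64_t src)
{
  return (unsigned __int128) src;
}

/* Hypothetical model of a TImode value held in a pair of 64-bit GPRs.  */
struct ti_halves
{
  uint64_t hi;
  uint64_t lo;
};

/* Mirrors the two moves emitted by the GPR split: low part gets the
   source, high part gets zero.  */
static struct ti_halves
zero_extend_split (uint64_t src)
{
  struct ti_halves dest;
  dest.lo = src;	/* emit_move_insn (dest_lo, src)  */
  dest.hi = 0;		/* emit_move_insn (dest_hi, const0_rtx)  */
  return dest;
}

/* Check that the two-move form agrees with the widening cast.  */
static void
check_zero_extend (uint64_t src)
{
  unsigned __int128 wide = zero_extend_di_to_ti (src);
  struct ti_halves h = zero_extend_split (src);

  assert ((uint64_t) wide == h.lo);
  assert ((uint64_t) (wide >> 64) == h.hi);
}
```

Calling check_zero_extend on any 64-bit value should pass, since both forms
leave the upper 64 bits zero.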

> 
> > +(define_insn_and_split "zero_extendditi2"
> > +  [(set (match_operand:TI 0 "register_operand"                 "=r,r, wa,&wa")
> > +   (zero_extend:TI (match_operand:DI 1 "register_operand"  "r,wa,r,  wa")))]
> > +  "TARGET_POWERPC64 && TARGET_P9_VECTOR"
> > +  "@
> > +   #
> > +   #
> > +   mtvsrdd %x0,0,%1
> > +   #"
> > +  "&& reload_completed
> > +   && (int_reg_operand (operands[0], TImode)
> > +       || vsx_register_operand (operands[1], DImode))"
> > +  [(pc)]
> > +{
> > +  rtx dest = operands[0];
> > +  rtx src = operands[1];
> > +  int dest_regno = reg_or_subregno (dest);
> > +
> > +  /* Handle conversion to GPR registers.  Load up the low part and then do
> > +     zero out the upper part.  */
> > +  if (INT_REGNO_P (dest_regno))
> > +    {
> > +      rtx dest_hi = gen_highpart (DImode, dest);
> > +      rtx dest_lo = gen_lowpart (DImode, dest);
> > +
> > +      emit_move_insn (dest_lo, src);
> > +      emit_move_insn (dest_hi, const0_rtx);
> > +      DONE;
> > +    }
> > +
> > +  /* For settomg a VSX register from another VSX register, clear the result
> > +     register, and use XXPERMDI to shift the value into the lower 64-bits.  */
> 
> setting
> 
> No reference to xxpermdi in the code chunk here, though we are pretty
> sure it will be generated via gen_vsx_concat_v2di.

Yes.

For example, on power9 for:

        void
        gpr_to_vsx (__uint128_t *p, unsigned long long a)
        {
          /* mtvsrdd 0,0,4; stxv 0,0(3).  */
          __uint128_t b = a;
          __asm__ (" # %x0" : "+wa" (b));
          *p = b;
        }

        void
        vsx_to_vsx (__uint128_t *p, double d)
        {
          /* fctiduz 1,1; xxspltib 0,0; xxpermdi 0,0,1,0; stxv 0,0(3).  */
          __uint128_t a = (unsigned long long)d;
          __asm__ (" # %x0" : "+wa" (a));
          *p = a;
        }

The code generated is:

        old gpr_to_vsx          new gpr_to_vsx
        ==============          ==============
        mr 10,4                 mtvsrdd 0,0,4
        li 11,0
        mtvsrdd 0,11,10

        old vsx_to_vsx          new vsx_to_vsx
        ==============          ==============
        fctiduz 0,1             fctiduz 1,1
        li 11,0                 xxspltib 0,0
        mfvsrd 10,0             xxpermdi 0,0,1,0
        mtvsrdd 0,11,10
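
To spell out why the new vsx_to_vsx sequence leaves the value in the lower 64
bits, here is a small C model of the two vector instructions.  It is only a
sketch: struct vsr, model_xxpermdi and model_zero_extend_vsx are names invented
for this example, and it assumes the ISA's numbering where doubleword 0 is the
most significant half of the register and where the scalar result of fctiduz
sits in doubleword 0:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 128-bit VSX register.  dw[0] is the most
   significant doubleword, dw[1] the least significant.  */
struct vsr
{
  uint64_t dw[2];
};

/* Model of xxpermdi XT,XA,XB,DM: the high bit of DM picks the doubleword
   of XA that becomes dw[0] of the result, the low bit of DM picks the
   doubleword of XB that becomes dw[1].  */
static struct vsr
model_xxpermdi (struct vsr xa, struct vsr xb, unsigned dm)
{
  struct vsr xt;
  xt.dw[0] = xa.dw[(dm >> 1) & 1];
  xt.dw[1] = xb.dw[dm & 1];
  return xt;
}

/* Model of "xxspltib 0,0; xxpermdi 0,0,1,0": clear the result register,
   then combine its (zero) dw[0] with dw[0] of the source.  */
static struct vsr
model_zero_extend_vsx (struct vsr src)
{
  struct vsr zero = { { 0, 0 } };	/* xxspltib 0,0  */
  return model_xxpermdi (zero, src, 0);	/* xxpermdi 0,0,1,0  */
}
```

In this model the source's dw[0] ends up in the result's dw[1] (the low 64
bits), and whatever was in the source's dw[1] is discarded.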

In addition, on power10 for:

        void
        mem_to_vsx (__uint128_t *p, unsigned long long *q)
        {
          /* lxvrdx 0,0,4; stxv 0,0(3).  */
          __uint128_t a = *q;
          __asm__ (" # %x0" : "+wa" (a));
          *p = a;
        }

The code generated is:

        old mem_to_vsx          new mem_to_vsx
        ==============          ==============
        ld 10,0(4)              lxvrdx 0,0,4
        li 11,0
        mtvsrdd 0,11,10


-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com
