On Fri, Jan 14, 2022 at 10:00 AM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> Hi Uros,
> Here's a revised version of this patch incorporating your suggestion of using
> force_reg instead of emit_move_insn to a pseudo allocated by gen_reg_rtx.
> I also took the opportunity to transition the rest of the function (and 
> clean-up
> those around it) to use this preferred idiom.
>
> This patch has been tested on x86_64-pc-linux-gnu with a make bootstrap
> and make -k check with no new failures.  OK for mainline?
>
>
> 2022-01-14  Roger Sayle  <ro...@nextmovesoftware.com>
>             Uroš Bizjak  <ubiz...@gmail.com>
>
> gcc/ChangeLog
>         * config/i386/i386-expand.c (ix86_expand_v1ti_to_ti): Use force_reg.
>         (ix86_expand_ti_to_v1ti): Use force_reg.
>         (ix86_expand_v1ti_shift): Use force_reg.
>         (ix86_expand_v1ti_rotate): Use force_reg.
>         (ix86_expand_v1ti_ashiftrt): Provide new three operation
>         implementations for shifts by 111..126 bits.  Use force_reg.

LGTM, as far as I can review the patch due to code churn...

Thanks,
Uros.

>
> Thanks again,
> Roger
> --
>
> > -----Original Message-----
> > From: Uros Bizjak <ubiz...@gmail.com>
> > Sent: 12 January 2022 19:18
> > To: Roger Sayle <ro...@nextmovesoftware.com>
> > Cc: GCC Patches <gcc-patches@gcc.gnu.org>
> > Subject: Re: [PATCH] x86_64: Improvements to arithmetic right shifts of
> > V1TImode values.
> >
> > On Tue, Jan 11, 2022 at 2:26 PM Roger Sayle <ro...@nextmovesoftware.com>
> > wrote:
> > >
> > >
> > > This patch to the i386 backend's ix86_expand_v1ti_ashiftrt provides
> > > improved (shorter) implementations of V1TI mode arithmetic right
> > > shifts for constant amounts between 111 and 126 bits.  The
> > > significance of this range is that this functionality is useful for
> > > (eventually) providing sign extension from HImode and QImode to V1TImode.
> > >
> > > For example, x>>112 (to sign extend a 16-bit value), was previously
> > > generated as a four operation sequence:
> > >
> > >         movdqa  %xmm0, %xmm1            // word    7 6 5 4 3 2 1 0
> > >         psrad   $31, %xmm0              // V8HI = [S,S,?,?,?,?,?,?]
> > >         psrad   $16, %xmm1              // V8HI = [S,X,?,?,?,?,?,?]
> > >         punpckhqdq      %xmm0, %xmm1    // V8HI = [S,S,?,?,S,X,?,?]
> > >         pshufd  $253, %xmm1, %xmm0      // V8HI = [S,S,S,S,S,S,S,X]
> > >
> > > with this patch, we now generates a three operation sequence:
> > >
> > >         psrad   $16, %xmm0              // V8HI = [S,X,?,?,?,?,?,?]
> > >         pshufhw $254, %xmm0, %xmm0      // V8HI = [S,S,S,X,?,?,?,?]
> > >         pshufd  $254, %xmm0, %xmm0      // V8HI = [S,S,S,S,S,S,S,X]
> > >
> > > The correctness of generated code is confirmed by the existing
> > > run-time test gcc.target/i386/sse2-v1ti-ashiftrt-1.c in the testsuite.
> > > This idiom is safe to use for shifts by 127, but that case gets
> > > handled by a two operation sequence earlier in this function.
> > >
> > >
> > > This patch has been tested on x86_64-pc-linux-gnu with a make
> > > bootstrap and make -k check with no new failures.  OK for mainline?
> > >
> > >
> > > 2022-01-11  Roger Sayle  <ro...@nextmovesoftware.com>
> > >
> > > gcc/ChangeLog
> > >         * config/i386/i386-expand.c (ix86_expand_v1ti_ashiftrt): Provide
> > >         new three operation implementations for shifts by 111..126 bits.
> >
> > +  if (bits >= 111)
> > +    {
> > +      /* Three operations.  */
> > +      rtx tmp1 = gen_reg_rtx (V4SImode);
> > +      rtx tmp2 = gen_reg_rtx (V4SImode);
> > +      emit_move_insn (tmp1, gen_lowpart (V4SImode, op1));
> > +      emit_insn (gen_ashrv4si3 (tmp2, tmp1, GEN_INT (bits - 96)));
> >
> > This can be written as:
> >
> > rtx tmp1 = force_reg (V4SImode, gen_lowpart (V4SImode, op1)); emit_insn
> > (gen_ashrv4i3 (tmp2, tmp1, GEN_INT ...));
> >
> > +      rtx tmp3 = gen_reg_rtx (V8HImode);
> > +      rtx tmp4 = gen_reg_rtx (V8HImode);
> > +      emit_move_insn (tmp3, gen_lowpart (V8HImode, tmp2));
> > +      emit_insn (gen_sse2_pshufhw (tmp4, tmp3, GEN_INT (0xfe)));
> >
> > Here in a similar way...
> >
> > +      rtx tmp5 = gen_reg_rtx (V4SImode);
> > +      rtx tmp6 = gen_reg_rtx (V4SImode);
> > +      emit_move_insn (tmp5, gen_lowpart (V4SImode, tmp4));
> > +      emit_insn (gen_sse2_pshufd (tmp6, tmp5, GEN_INT (0xfe)));
> >
> > ... also here.
> >
> > +      rtx tmp7 = gen_reg_rtx (V1TImode);
> > +      emit_move_insn (tmp7, gen_lowpart (V1TImode, tmp6));
> > +      emit_move_insn (operands[0], tmp7);
> >
> > And here a simple:
> >
> > emit_move_insn (operands[0], gen_lowpart (V1TImode, tmp6);
> >
> > +      return;
> > +    }
> > +
> >
> > Uros.

Reply via email to