On Wed, Feb 26, 2020 at 2:42 PM Jeff Law <l...@redhat.com> wrote:
>
> On Sat, 2020-02-15 at 07:26 -0800, H.J. Lu wrote:
> > On x86, when AVX and AVX512 are enabled, vector move instructions can
> > be encoded with either 2-byte/3-byte VEX (AVX) or 4-byte EVEX (AVX512):
> >
> >    0: c5 f9 6f d1             vmovdqa %xmm1,%xmm2
> >    4: 62 f1 fd 08 6f d1       vmovdqa64 %xmm1,%xmm2
> >
> > We prefer VEX encoding over EVEX since VEX is shorter.  Also AVX512F
> > only supports 512-bit vector moves.  AVX512F + AVX512VL supports 128-bit
> > and 256-bit vector moves.  Mode attributes on x86 vector move patterns
> > indicate the target's preferred vector move encoding.  For vector
> > register to vector register moves, we can use 512-bit vector move
> > instructions to move 128-bit/256-bit vectors if AVX512VL isn't
> > available.  With AVX512F and AVX512VL, we should use VEX encoding for
> > 128-bit/256-bit vector moves if the upper 16 vector registers aren't
> > used.  This patch adds a function,
> > ix86_output_ssemov, to generate vector moves:
> >
> > 1. If zmm registers are used, use EVEX encoding.
> > 2. If xmm16-xmm31/ymm16-ymm31 registers aren't used, SSE or VEX encoding
> > will be generated.
> > 3. If xmm16-xmm31/ymm16-ymm31 registers are used:
> >    a. With AVX512VL, AVX512VL vector moves will be generated.
> >    b. Without AVX512VL, xmm16-xmm31/ymm16-ymm31 register to register
> >       moves will be done with a zmm register move.
> >
> >
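> > A rough sketch of how these rules combine (the helper below is only
> > an illustration for this description; it is not part of the patch):
> >
> >    /* Illustration only: pick the aligned integer vector move
> >       mnemonic following rules 1-3 above.  */
> >    static const char *
> >    sketch_pick_vmovdqa (bool zmm_p, bool ext_rex_reg_p)
> >    {
> >      if (zmm_p)
> >        return "vmovdqa64";   /* 1. zmm always needs EVEX.  */
> >      if (!ext_rex_reg_p)
> >        return "%vmovdqa";    /* 2. SSE or VEX ("%v" adds the VEX
> >                                 form when AVX is enabled).  */
> >      if (TARGET_AVX512VL)
> >        return "vmovdqa64";   /* 3a. EVEX with AVX512VL.  */
> >      /* 3b. No AVX512VL: the caller performs the register to
> >         register move with zmm registers, still via vmovdqa64.  */
> >      return "vmovdqa64";
> >    }
> >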
> [ ... ]
>
> >
> > +/* Return the opcode of the TYPE_SSEMOV instruction.  To move from
> > +   or to xmm16-xmm31/ymm16-ymm31 registers, we either require
> > +   TARGET_AVX512VL or it is a register to register move which can
> > +   be done with zmm register move. */
> > +
> > +static const char *
> > +ix86_get_ssemov (rtx *operands, unsigned size,
> > +              enum attr_mode insn_mode, machine_mode mode)
> > +{
> > +  char buf[128];
> > +  bool misaligned_p = (misaligned_operand (operands[0], mode)
> > +                    || misaligned_operand (operands[1], mode));
> > +  bool evex_reg_p = (EXT_REX_SSE_REG_P (operands[0])
> > +                  || EXT_REX_SSE_REG_P (operands[1]));
> > +  machine_mode scalar_mode;
> > +
> [ ... ]
> > +  else if (SCALAR_INT_MODE_P (scalar_mode))
> > +    {
> > +      switch (scalar_mode)
> > +     {
> > +     case E_QImode:
> > +       if (size == 64)
> > +         opcode = (misaligned_p
> > +                   ? (TARGET_AVX512BW
> > +                      ? "vmovdqu8"
> > +                      : "vmovdqu64")
> > +                   : "vmovdqa64");
> > +       else if (evex_reg_p)
> > +         {
> > +           if (TARGET_AVX512VL)
> > +             opcode = (misaligned_p
> > +                       ? (TARGET_AVX512BW
> > +                          ? "vmovdqu8"
> > +                          : "vmovdqu64")
> > +                       : "vmovdqa64");
> > +         }
> > +       else
> > +         opcode = (misaligned_p
> > +                   ? (TARGET_AVX512BW
> > +                      ? "vmovdqu8"
> > +                      : "%vmovdqu")
> > +                   : "%vmovdqa");
> > +       break;
> > +     case E_HImode:
> > +       if (size == 64)
> > +         opcode = (misaligned_p
> > +                   ? (TARGET_AVX512BW
> > +                      ? "vmovdqu16"
> > +                      : "vmovdqu64")
> > +                   : "vmovdqa64");
> > +       else if (evex_reg_p)
> > +         {
> > +           if (TARGET_AVX512VL)
> > +             opcode = (misaligned_p
> > +                       ? (TARGET_AVX512BW
> > +                          ? "vmovdqu16"
> > +                          : "vmovdqu64")
> > +                       : "vmovdqa64");
> > +         }
> > +       else
> > +         opcode = (misaligned_p
> > +                   ? (TARGET_AVX512BW
> > +                      ? "vmovdqu16"
> > +                      : "%vmovdqu")
> > +                   : "%vmovdqa");
> > +       break;
> > +     case E_SImode:
> > +       if (size == 64)
> > +         opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
> > +       else if (evex_reg_p)
> > +         {
> > +           if (TARGET_AVX512VL)
> > +             opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
> > +         }
> > +       else
> > +         opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
> > +       break;
> > +     case E_DImode:
> > +     case E_TImode:
> > +     case E_OImode:
> > +       if (size == 64)
> > +         opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
> > +       else if (evex_reg_p)
> > +         {
> > +           if (TARGET_AVX512VL)
> > +             opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
> > +         }
> > +       else
> > +         opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
> > +       break;
> > +     case E_XImode:
> > +       opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
> > +       break;
> > +     default:
> > +       gcc_unreachable ();
> > +     }
> > +    }
> > +  else
> > +    gcc_unreachable ();
> > +
> > +  if (!opcode)
> > +    {
> > +      /* NB: We get here only because we move xmm16-xmm31/ymm16-ymm31
> > +      registers without AVX512VL by using zmm register move.  */
> So the overall flow control in here is rather convoluted.  I hate the way you
> don't set OPCODE above and then do it down here.  I would suggest breaking
> the !opcode bits into its own little function.  Then above in those places
> where you do
>
> if (TARGET_AVX512VL)
>    opcode = <whatever>;
>
>
> Instead change those to something like
>
> if (TARGET_AVX512VL)
>    opcode = <whatever>;
> else
>    opcode = new_function (...)
>
> That way opcode is set on every path through the major if-else in this
> function.
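>
> Roughly, using the E_SImode case as an example (the helper name below
> is just a placeholder):
>
>   if (TARGET_AVX512VL)
>     opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
>   else
>     /* The current !opcode handling, i.e. the zmm reg-reg fallback.  */
>     opcode = ssemov_zmm_fallback (operands, insn_mode);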
>
> Second, when I suggested you break the patch up on a per-pattern basis, I
> probably should have also said that I would start with the minimal support in
> ix86_get_ssemov and ix86_output_ssemov to support the pattern you just
> converted.  That way the mapping from current code to new code is more 
> obvious.

I will do these.  On x86, different instructions can move vector
registers.  They all do the same thing.  But some are preferred over
others, depending on tuning options.
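
For example (a hypothetical sketch, not code from either patch):
movaps, movdqa and vmovaps all copy one xmm register to another, so
the choice only affects code size and the ISA level we require:

  /* Illustration only: these mnemonics all perform the same
     xmm-to-xmm copy.  For a register-to-register copy, "movaps"
     (SSE) encodes in 3 bytes, "movdqa" (SSE2) in 4 bytes and
     "vmovaps" (AVX, 2-byte VEX prefix) in 4 bytes.  */
  static const char *
  sketch_pick_xmm_copy (bool avx_p, bool prefer_shorter_p)
  {
    if (!avx_p)
      return prefer_shorter_p ? "movaps" : "movdqa";
    return "vmovaps";
  }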

>
> As it stands the breaking into separate patches didn't really help much 
> because
> we still have all the complexity in ix86_get_ssemov and ix86_output_ssemov in
> patch #1 and that's the code I'm most worried about verifying we get right,
> particularly at this stage.  I literally can't take any patch and map from the
> old code to the new code without having to understand all of patch #1.

The old code is very convoluted and wrong in some cases.  I am trying to
clean it up.  I will update my patches based on your feedback.

-- 
H.J.
