On Sat, 2020-02-15 at 07:26 -0800, H.J. Lu wrote:
> On x86, when AVX and AVX512 are enabled, vector move instructions can
> be encoded with either 2-byte/3-byte VEX (AVX) or 4-byte EVEX (AVX512):
> 
>    0: c5 f9 6f d1             vmovdqa %xmm1,%xmm2
>    4: 62 f1 fd 08 6f d1       vmovdqa64 %xmm1,%xmm2
> 
> We prefer the VEX encoding over EVEX since VEX is shorter.  Note also that
> AVX512F alone only supports 512-bit vector moves; 128-bit and 256-bit vector
> moves require AVX512F + AVX512VL.  Mode attributes on the x86 vector move
> patterns indicate the target's preferred vector move encoding.  For a vector
> register to vector register move, we can use a 512-bit vector move
> instruction to move a 128-bit/256-bit vector if AVX512VL isn't available.
> With AVX512F and AVX512VL, we should use the VEX encoding for 128-bit/256-bit
> vector moves if the upper 16 vector registers aren't used.  This patch adds
> a function, ix86_output_ssemov, to generate vector moves:
> 
> 1. If zmm registers are used, use EVEX encoding.
> 2. If xmm16-xmm31/ymm16-ymm31 registers aren't used, SSE or VEX encoding
> will be generated.
> 3. If xmm16-xmm31/ymm16-ymm31 registers are used:
>    a. With AVX512VL, AVX512VL vector moves will be generated.
> +    b. Without AVX512VL, an xmm16-xmm31/ymm16-ymm31 register-to-register
> +       move will be done as a zmm register move.
> 
> 
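As a quick cross-check of the three numbered rules above, here is a standalone
C sketch of how an aligned integer vector move would be chosen.  It is not part
of the patch: the function name and boolean flags are made up for illustration,
and SIZE is the vector size in bytes, as in ix86_get_ssemov below.

#include <stdbool.h>
#include <stdio.h>

/* Illustration only: choose the mnemonic for an aligned integer vector
   register-to-register move.  SIZE is the vector size in bytes,
   EXT_REGS means xmm16-xmm31/ymm16-ymm31 are involved, and AVX512VL
   stands in for TARGET_AVX512VL.  */
static const char *
pick_int_move (unsigned size, bool ext_regs, bool avx512vl)
{
  if (size == 64)
    return "vmovdqa64";   /* Rule 1: zmm moves always use EVEX.  */
  if (!ext_regs)
    return "vmovdqa";     /* Rule 2: the shorter SSE/VEX form.  */
  if (avx512vl)
    return "vmovdqa64";   /* Rule 3a: 128/256-bit EVEX move.  */
  /* Rule 3b: without AVX512VL the operands have to be printed with
     their zmm names and moved as full 512-bit registers.  */
  return "vmovdqa64";
}

int
main (void)
{
  printf ("%s\n", pick_int_move (16, false, false)); /* vmovdqa */
  printf ("%s\n", pick_int_move (16, true, true));   /* vmovdqa64, xmm names */
  printf ("%s\n", pick_int_move (16, true, false));  /* vmovdqa64, zmm names */
  return 0;
}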
[ ... ]

>  
> +/* Return the opcode of the TYPE_SSEMOV instruction.  To move from
> +   or to xmm16-xmm31/ymm16-ymm31 registers, we either require
> +   TARGET_AVX512VL, or the move must be a register-to-register move,
> +   which can be done with a zmm register move.  */
> +
> +static const char *
> +ix86_get_ssemov (rtx *operands, unsigned size,
> +              enum attr_mode insn_mode, machine_mode mode)
> +{
> +  char buf[128];
> +  bool misaligned_p = (misaligned_operand (operands[0], mode)
> +                    || misaligned_operand (operands[1], mode));
> +  bool evex_reg_p = (EXT_REX_SSE_REG_P (operands[0])
> +                  || EXT_REX_SSE_REG_P (operands[1]));
> +  machine_mode scalar_mode;
> +
[ ... ]
> +  else if (SCALAR_INT_MODE_P (scalar_mode))
> +    {
> +      switch (scalar_mode)
> +     {
> +     case E_QImode:
> +       if (size == 64)
> +         opcode = (misaligned_p
> +                   ? (TARGET_AVX512BW
> +                      ? "vmovdqu8"
> +                      : "vmovdqu64")
> +                   : "vmovdqa64");
> +       else if (evex_reg_p)
> +         {
> +           if (TARGET_AVX512VL)
> +             opcode = (misaligned_p
> +                       ? (TARGET_AVX512BW
> +                          ? "vmovdqu8"
> +                          : "vmovdqu64")
> +                       : "vmovdqa64");
> +         }
> +       else
> +         opcode = (misaligned_p
> +                   ? (TARGET_AVX512BW
> +                      ? "vmovdqu8"
> +                      : "%vmovdqu")
> +                   : "%vmovdqa");
> +       break;
> +     case E_HImode:
> +       if (size == 64)
> +         opcode = (misaligned_p
> +                   ? (TARGET_AVX512BW
> +                      ? "vmovdqu16"
> +                      : "vmovdqu64")
> +                   : "vmovdqa64");
> +       else if (evex_reg_p)
> +         {
> +           if (TARGET_AVX512VL)
> +             opcode = (misaligned_p
> +                       ? (TARGET_AVX512BW
> +                          ? "vmovdqu16"
> +                          : "vmovdqu64")
> +                       : "vmovdqa64");
> +         }
> +       else
> +         opcode = (misaligned_p
> +                   ? (TARGET_AVX512BW
> +                      ? "vmovdqu16"
> +                      : "%vmovdqu")
> +                   : "%vmovdqa");
> +       break;
> +     case E_SImode:
> +       if (size == 64)
> +         opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
> +       else if (evex_reg_p)
> +         {
> +           if (TARGET_AVX512VL)
> +             opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
> +         }
> +       else
> +         opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
> +       break;
> +     case E_DImode:
> +     case E_TImode:
> +     case E_OImode:
> +       if (size == 64)
> +         opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
> +       else if (evex_reg_p)
> +         {
> +           if (TARGET_AVX512VL)
> +             opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
> +         }
> +       else
> +         opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
> +       break;
> +     case E_XImode:
> +       opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
> +       break;
> +     default:
> +       gcc_unreachable ();
> +     }
> +    }
> +  else
> +    gcc_unreachable ();
> +
> +  if (!opcode)
> +    {
> +      /* NB: We get here only because we move xmm16-xmm31/ymm16-ymm31
> +      registers without AVX512VL by using zmm register move.  */
So the overall control flow in here is rather convoluted.  I hate the way you
don't set OPCODE above and then do it down here.  I would suggest breaking
the !opcode bits out into their own little function.  Then, above, in the
places where you do

if (TARGET_AVX512VL)
   opcode = <whatever>;


Instead change those to something like

if (TARGET_AVX512VL)
   opcode = <whatever>;
else
   opcode = new_function (...)

That way opcode is set on every path through the major if-else in this
function.
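
To make that shape concrete, here is a minimal standalone sketch of the
suggested structure, modeled on the E_SImode case from the quoted hunk.  The
names are hypothetical and plain bools stand in for the real target macros and
register predicates; it is not the actual i386 code.  The point is only that
the no-AVX512VL fallback lives in one helper and opcode is assigned on every
path.

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper along the lines of "new_function (...)" above: it
   covers the case where xmm16-xmm31/ymm16-ymm31 are involved but
   AVX512VL is not available, so the move has to go through the zmm
   aliases of the registers.  */
static const char *
zmm_fallback_opcode (bool misaligned_p)
{
  /* A misaligned operand cannot occur in a register-to-register move,
     so the real code would assert here rather than ignore it.  */
  (void) misaligned_p;
  return "vmovdqa64";   /* printed with the zmm names of the operands */
}

/* Shape of one case after the suggested change: opcode is assigned on
   every path, so no trailing "if (!opcode)" block is needed.  */
static const char *
simode_opcode (unsigned size, bool evex_reg_p, bool misaligned_p,
               bool avx512vl_p)
{
  const char *opcode;

  if (size == 64)
    opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
  else if (evex_reg_p)
    {
      if (avx512vl_p)
        opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
      else
        opcode = zmm_fallback_opcode (misaligned_p);
    }
  else
    opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";

  return opcode;
}

int
main (void)
{
  /* A 128-bit aligned move in xmm16-xmm31 without AVX512VL falls back
     to the zmm form; with AVX512VL it stays a 128-bit EVEX move.  */
  printf ("%s\n", simode_opcode (16, true, false, false));
  printf ("%s\n", simode_opcode (16, true, false, true));
  return 0;
}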

Second, when I suggested you break the patch up on a per-pattern basis, I
probably should have also said that I would start with just enough of
ix86_get_ssemov and ix86_output_ssemov to support the pattern you just
converted.  That way the mapping from the current code to the new code is
more obvious.
 

As it stands, breaking the series into separate patches didn't really help
much: we still have all the complexity of ix86_get_ssemov and
ix86_output_ssemov in patch #1, and that's the code I'm most worried about
verifying we get right, particularly at this stage.  I literally can't take
any of the patches and map from the old code to the new code without having
to understand all of patch #1.



Jeff
