On Wed, Feb 26, 2020 at 2:42 PM Jeff Law <l...@redhat.com> wrote:
>
> On Sat, 2020-02-15 at 07:26 -0800, H.J. Lu wrote:
> > On x86, when AVX and AVX512 are enabled, vector move instructions can
> > be encoded with either 2-byte/3-byte VEX (AVX) or 4-byte EVEX (AVX512):
> >
> >    0:  c5 f9 6f d1             vmovdqa   %xmm1,%xmm2
> >    4:  62 f1 fd 08 6f d1       vmovdqa64 %xmm1,%xmm2
> >
> > We prefer VEX encoding over EVEX since VEX is shorter.  Also AVX512F
> > only supports 512-bit vector moves.  AVX512F + AVX512VL supports 128-bit
> > and 256-bit vector moves.  Mode attributes on x86 vector move patterns
> > indicate target preferences of vector move encoding.  For vector register
> > to vector register move, we can use 512-bit vector move instructions to
> > move 128-bit/256-bit vector if AVX512VL isn't available.  With AVX512F
> > and AVX512VL, we should use VEX encoding for 128-bit/256-bit vector moves
> > if upper 16 vector registers aren't used.  This patch adds a function,
> > ix86_output_ssemov, to generate vector moves:
> >
> > 1. If zmm registers are used, use EVEX encoding.
> > 2. If xmm16-xmm31/ymm16-ymm31 registers aren't used, SSE or VEX encoding
> >    will be generated.
> > 3. If xmm16-xmm31/ymm16-ymm31 registers are used:
> >    a. With AVX512VL, AVX512VL vector moves will be generated.
> >    b. Without AVX512VL, xmm16-xmm31/ymm16-ymm31 register to register
> >       move will be done with zmm register move.
> >
>
> [ ... ]
>
> > +/* Return the opcode of the TYPE_SSEMOV instruction.  To move from
> > +   or to xmm16-xmm31/ymm16-ymm31 registers, we either require
> > +   TARGET_AVX512VL or it is a register to register move which can
> > +   be done with zmm register move.
> > +  */
> > +
> > +static const char *
> > +ix86_get_ssemov (rtx *operands, unsigned size,
> > +                 enum attr_mode insn_mode, machine_mode mode)
> > +{
> > +  char buf[128];
> > +  bool misaligned_p = (misaligned_operand (operands[0], mode)
> > +                       || misaligned_operand (operands[1], mode));
> > +  bool evex_reg_p = (EXT_REX_SSE_REG_P (operands[0])
> > +                     || EXT_REX_SSE_REG_P (operands[1]));
> > +  machine_mode scalar_mode;
> > +
> > +  else if (SCALAR_INT_MODE_P (scalar_mode))
> > +    {
> > +      switch (scalar_mode)
> > +        {
> > +        case E_QImode:
> > +          if (size == 64)
> > +            opcode = (misaligned_p
> > +                      ? (TARGET_AVX512BW
> > +                         ? "vmovdqu8"
> > +                         : "vmovdqu64")
> > +                      : "vmovdqa64");
> > +          else if (evex_reg_p)
> > +            {
> > +              if (TARGET_AVX512VL)
> > +                opcode = (misaligned_p
> > +                          ? (TARGET_AVX512BW
> > +                             ? "vmovdqu8"
> > +                             : "vmovdqu64")
> > +                          : "vmovdqa64");
> > +            }
> > +          else
> > +            opcode = (misaligned_p
> > +                      ? (TARGET_AVX512BW
> > +                         ? "vmovdqu8"
> > +                         : "%vmovdqu")
> > +                      : "%vmovdqa");
> > +          break;
> > +        case E_HImode:
> > +          if (size == 64)
> > +            opcode = (misaligned_p
> > +                      ? (TARGET_AVX512BW
> > +                         ? "vmovdqu16"
> > +                         : "vmovdqu64")
> > +                      : "vmovdqa64");
> > +          else if (evex_reg_p)
> > +            {
> > +              if (TARGET_AVX512VL)
> > +                opcode = (misaligned_p
> > +                          ? (TARGET_AVX512BW
> > +                             ? "vmovdqu16"
> > +                             : "vmovdqu64")
> > +                          : "vmovdqa64");
> > +            }
> > +          else
> > +            opcode = (misaligned_p
> > +                      ? (TARGET_AVX512BW
> > +                         ? "vmovdqu16"
> > +                         : "%vmovdqu")
> > +                      : "%vmovdqa");
> > +          break;
> > +        case E_SImode:
> > +          if (size == 64)
> > +            opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
> > +          else if (evex_reg_p)
> > +            {
> > +              if (TARGET_AVX512VL)
> > +                opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
> > +            }
> > +          else
> > +            opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
> > +          break;
> > +        case E_DImode:
> > +        case E_TImode:
> > +        case E_OImode:
> > +          if (size == 64)
> > +            opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
> > +          else if (evex_reg_p)
> > +            {
> > +              if (TARGET_AVX512VL)
> > +                opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
> > +            }
> > +          else
> > +            opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
> > +          break;
> > +        case E_XImode:
> > +          opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
> > +          break;
> > +        default:
> > +          gcc_unreachable ();
> > +        }
> > +    }
> > +  else
> > +    gcc_unreachable ();
> > +
> > +  if (!opcode)
> > +    {
> > +      /* NB: We get here only because we move xmm16-xmm31/ymm16-ymm31
> > +         registers without AVX512VL by using zmm register move.  */
> So the overall flow control in here is rather convoluted.  I hate the way you
> don't set OPCODE above and then do it down here.  I would suggest breaking
> the !opcode bits into its own little function.  Then above in those places
> where you do
>
>   if (TARGET_AVX512VL)
>     opcode = <whatever>;
>
> Instead change those to something like
>
>   if (TARGET_AVX512VL)
>     opcode = <whatever>;
>   else
>     opcode = new_function (...)
>
> That way opcode is set on every path through the major if-else in this
> function.
>
> Second when I suggested you break the patch up on a per-pattern basis, I
> probably should have also said that I would start with the minimal support in
> ix86_get_ssemov and ix86_output_ssemov to support the pattern you just
> converted.  That way the mapping from current code to new code is more
> obvious.
I will do these.

On x86, several different instructions can move vector registers, and they
all do the same thing, but some are preferred over others depending on
tuning options.

> As it stands the breaking into separate patches didn't really help much
> because we still have all the complexity in ix86_get_ssemov and
> ix86_output_ssemov in patch #1, and that's the code I'm most worried about
> verifying we get right, particularly at this stage.  I literally can't take
> any patch and map from the old code to the new code without having to
> understand all of patch #1.

The old code is very convoluted and wrong in some cases.  I am trying to
clean it up.  I will update my patches based on your feedback.

-- 
H.J.