On 16/05/2019 23:50, Richard Henderson wrote:

> The VBROADCASTSD instruction only allows %ymm registers as destination.
> Rather than forcing VEX.L and writing to the entire 256-bit register,
> revert to using MOVDDUP with an %xmm register.  This is sufficient for
> an avx1 host since we do not support TCG_TYPE_V256 for that case.
> 
> Also fix the 32-bit avx2, which should have used VPBROADCASTW.
> 
> Fixes: 1e262b49b533
> Reported-by: Mark Cave-Ayland <mark.cave-ayl...@ilande.co.uk>
> Signed-off-by: Richard Henderson <richard.hender...@linaro.org>
> ---
>  tcg/i386/tcg-target.inc.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
> index aafd01cb49..b3601446cd 100644
> --- a/tcg/i386/tcg-target.inc.c
> +++ b/tcg/i386/tcg-target.inc.c
> @@ -358,6 +358,7 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type,
>  #define OPC_MOVBE_MyGy  (0xf1 | P_EXT38)
>  #define OPC_MOVD_VyEy   (0x6e | P_EXT | P_DATA16)
>  #define OPC_MOVD_EyVy   (0x7e | P_EXT | P_DATA16)
> +#define OPC_MOVDDUP     (0x12 | P_EXT | P_SIMDF2)
>  #define OPC_MOVDQA_VxWx (0x6f | P_EXT | P_DATA16)
>  #define OPC_MOVDQA_WxVx (0x7f | P_EXT | P_DATA16)
>  #define OPC_MOVDQU_VxWx (0x6f | P_EXT | P_SIMDF3)
> @@ -921,7 +922,7 @@ static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece,
>      } else {
>          switch (vece) {
>          case MO_64:
> -            tcg_out_vex_modrm_offset(s, OPC_VBROADCASTSD, r, 0, base, offset);
> +            tcg_out_vex_modrm_offset(s, OPC_MOVDDUP, r, 0, base, offset);
>              break;
>          case MO_32:
>              tcg_out_vex_modrm_offset(s, OPC_VBROADCASTSS, r, 0, base, offset);
> @@ -963,12 +964,12 @@ static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
>          } else if (have_avx2) {
>              tcg_out_vex_modrm_pool(s, OPC_VPBROADCASTQ + vex_l, ret);
>          } else {
> -            tcg_out_vex_modrm_pool(s, OPC_VBROADCASTSD, ret);
> +            tcg_out_vex_modrm_pool(s, OPC_MOVDDUP, ret);
>          }
>          new_pool_label(s, arg, R_386_PC32, s->code_ptr - 4, -4);
>      } else {
>          if (have_avx2) {
> -            tcg_out_vex_modrm_pool(s, OPC_VBROADCASTSD + vex_l, ret);
> +            tcg_out_vex_modrm_pool(s, OPC_VPBROADCASTW + vex_l, ret);
>          } else {
>              tcg_out_vex_modrm_pool(s, OPC_VBROADCASTSS, ret);
>          }

Indeed, this fixes the issue for me here - thank you!
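
For anyone following the reasoning in the commit message, here is a rough
sketch of the lane behaviour involved, written as plain C rather than real
SIMD code. The Vec128/Vec256 types and the model_* helpers are hypothetical
names made up for illustration only; they are not part of QEMU. MOVDDUP
fills a 128-bit register with two copies of a 64-bit value, while
VBROADCASTSD fills a 256-bit register with four copies, which is more than
an avx1 host needs when TCG_TYPE_V256 is not in use.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical register models: an %xmm register as two 64-bit lanes,
 * a %ymm register as four 64-bit lanes. */
typedef struct {
    uint64_t lane[2];
} Vec128;

typedef struct {
    uint64_t lane[4];
} Vec256;

/* Models MOVDDUP xmm, m64: load 64 bits and duplicate into both lanes. */
static Vec128 model_movddup(const void *mem)
{
    Vec128 r;
    uint64_t v;

    memcpy(&v, mem, sizeof(v));
    r.lane[0] = v;
    r.lane[1] = v;
    return r;
}

/* Models VBROADCASTSD ymm, m64: load 64 bits and copy into all four lanes. */
static Vec256 model_vbroadcastsd(const void *mem)
{
    Vec256 r;
    uint64_t v;
    int i;

    memcpy(&v, mem, sizeof(v));
    for (i = 0; i < 4; i++) {
        r.lane[i] = v;
    }
    return r;
}
```

The low 128 bits come out the same either way, which is why the %xmm form
is enough for a 128-bit vector type.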

Tested-by: Mark Cave-Ayland <mark.cave-ayl...@ilande.co.uk>


ATB,

Mark.
