date:20190518

Re: [Qemu-devel] [PATCH v4 5/5] target/mips: Refactor and fix INSERT.<df> instructions

2019-05-18 Thread Aleksandar Markovic

On Apr 2, 2019 3:49 PM, "Mateja Marjanovic" 
wrote:
>
> From: Mateja Marjanovic 
>
> The old version of the helper for the INSERT.<df> MSA instructions
> has been replaced with four helpers that don't use a switch, and that
> remap the given element index when executed on a big-endian host.
>
> Signed-off-by: Mateja Marjanovic 
> ---

Reviewed-by: Aleksandar Markovic 

I'll make minor corrections (resulting from the discussion in this mail
thread) while applying this patch to my pull request.
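
For readers following the index handling in the quoted helpers, here is a
minimal sketch of the byte case. It assumes, as the n < 8 / else split in the
patch suggests, that the register is stored as two host-endian 64-bit halves,
so a big-endian host sees the bytes of each half in reverse order. The
function names are made up for illustration and are not part of the patch.
The sketch also checks that the branchy remap is the same mapping as XOR
with 7 for every index 0..15.

```c
#include <assert.h>
#include <stdint.h>

/* Branchy remap, copied in spirit from the quoted byte helper
 * (the HOST_WORDS_BIGENDIAN path).  Name is hypothetical. */
static uint32_t remap_byte_index_branchy(uint32_t n)
{
    n %= 16;
    if (n < 8) {
        n = 8 - n - 1;      /* reverse within the first 64-bit half  */
    } else {
        n = 24 - n - 1;     /* reverse within the second 64-bit half */
    }
    return n;
}

/* Equivalent formulation: flip the low three bits of the index. */
uint32_t remap_byte_index_xor(uint32_t n)
{
    return (n % 16) ^ 7;
}

/* Returns 1 if both formulations agree for every byte index. */
int remaps_agree(void)
{
    for (uint32_t n = 0; n < 16; n++) {
        if (remap_byte_index_branchy(n) != remap_byte_index_xor(n)) {
            return 0;
        }
    }
    return 1;
}
```

The same reasoning would give XOR with 3 for halfwords and XOR with 1 for
words; doublewords need no remap, which matches the quoted helper_msa_insert_d.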

>  target/mips/helper.h |  5 +++-
>  target/mips/msa_helper.c | 65

>  target/mips/translate.c  | 19 +-
>  3 files changed, 71 insertions(+), 18 deletions(-)
>
> diff --git a/target/mips/helper.h b/target/mips/helper.h
> index 8b6703c..82f6a40 100644
> --- a/target/mips/helper.h
> +++ b/target/mips/helper.h
> @@ -875,7 +875,6 @@ DEF_HELPER_5(msa_hsub_u_df, void, env, i32, i32, i32, i32)
>  DEF_HELPER_5(msa_sldi_df, void, env, i32, i32, i32, i32)
>  DEF_HELPER_5(msa_splati_df, void, env, i32, i32, i32, i32)
>
> -DEF_HELPER_5(msa_insert_df, void, env, i32, i32, i32, i32)
>  DEF_HELPER_5(msa_insve_df, void, env, i32, i32, i32, i32)
>  DEF_HELPER_3(msa_ctcmsa, void, env, tl, i32)
>  DEF_HELPER_2(msa_cfcmsa, tl, env, i32)
> @@ -942,6 +941,10 @@ DEF_HELPER_4(msa_copy_s_d, void, env, i32, i32, i32)
>  DEF_HELPER_4(msa_copy_u_b, void, env, i32, i32, i32)
>  DEF_HELPER_4(msa_copy_u_h, void, env, i32, i32, i32)
>  DEF_HELPER_4(msa_copy_u_w, void, env, i32, i32, i32)
> +DEF_HELPER_4(msa_insert_b, void, env, i32, i32, i32)
> +DEF_HELPER_4(msa_insert_h, void, env, i32, i32, i32)
> +DEF_HELPER_4(msa_insert_w, void, env, i32, i32, i32)
> +DEF_HELPER_4(msa_insert_d, void, env, i32, i32, i32)
>
>  DEF_HELPER_4(msa_fclass_df, void, env, i32, i32, i32)
>  DEF_HELPER_4(msa_ftrunc_s_df, void, env, i32, i32, i32)
> diff --git a/target/mips/msa_helper.c b/target/mips/msa_helper.c
> index d5bf4dc..d5c3842 100644
> --- a/target/mips/msa_helper.c
> +++ b/target/mips/msa_helper.c
> @@ -1323,28 +1323,61 @@ void helper_msa_copy_u_w(CPUMIPSState *env, uint32_t rd,
>  env->active_tc.gpr[rd] = (uint32_t)env->active_fpu.fpr[ws].wr.w[n];
>  }
>
> -void helper_msa_insert_df(CPUMIPSState *env, uint32_t df, uint32_t wd,
> +void helper_msa_insert_b(CPUMIPSState *env, uint32_t wd,
>uint32_t rs_num, uint32_t n)
>  {
>  wr_t *pwd = &(env->active_fpu.fpr[wd].wr);
>  target_ulong rs = env->active_tc.gpr[rs_num];
> +n %= 16;
> +#if defined(HOST_WORDS_BIGENDIAN)
> +if (n < 8) {
> +n = 8 - n - 1;
> +} else {
> +n = 24 - n - 1;
> +}
> +#endif
> +pwd->b[n] = (int8_t)rs;
> +}
>
> -switch (df) {
> -case DF_BYTE:
> -pwd->b[n] = (int8_t)rs;
> -break;
> -case DF_HALF:
> -pwd->h[n] = (int16_t)rs;
> -break;
> -case DF_WORD:
> -pwd->w[n] = (int32_t)rs;
> -break;
> -case DF_DOUBLE:
> -pwd->d[n] = (int64_t)rs;
> -break;
> -default:
> -assert(0);
> +void helper_msa_insert_h(CPUMIPSState *env, uint32_t wd,
> +  uint32_t rs_num, uint32_t n)
> +{
> +wr_t *pwd = &(env->active_fpu.fpr[wd].wr);
> +target_ulong rs = env->active_tc.gpr[rs_num];
> +n %= 8;
> +#if defined(HOST_WORDS_BIGENDIAN)
> +if (n < 4) {
> +n = 4 - n - 1;
> +} else {
> +n = 12 - n - 1;
> +}
> +#endif
> +pwd->h[n] = (int16_t)rs;
> +}
> +
> +void helper_msa_insert_w(CPUMIPSState *env, uint32_t wd,
> +  uint32_t rs_num, uint32_t n)
> +{
> +wr_t *pwd = &(env->active_fpu.fpr[wd].wr);
> +target_ulong rs = env->active_tc.gpr[rs_num];
> +n %= 4;
> +#if defined(HOST_WORDS_BIGENDIAN)
> +if (n < 2) {
> +n = 2 - n - 1;
> +} else {
> +n = 6 - n - 1;
>  }
> +#endif
> +pwd->w[n] = (int32_t)rs;
> +}
> +
> +void helper_msa_insert_d(CPUMIPSState *env, uint32_t wd,
> +  uint32_t rs_num, uint32_t n)
> +{
> +wr_t *pwd = &(env->active_fpu.fpr[wd].wr);
> +target_ulong rs = env->active_tc.gpr[rs_num];
> +n %= 2;
> +pwd->d[n] = (int64_t)rs;
>  }
>
>  void helper_msa_insve_df(CPUMIPSState *env, uint32_t df, uint32_t wd,
> diff --git a/target/mips/translate.c b/target/mips/translate.c
> index 72ed0a8..64587c4 100644
> --- a/target/mips/translate.c
> +++ b/target/mips/translate.c
> @@ -29446,7 +29446,24 @@ static void gen_msa_elm_df(CPUMIPSState *env, DisasContext *ctx, uint32_t df,
>  }
>  break;
>  case OPC_INSERT_df:
> -gen_helper_msa_insert_df(cpu_env, tdf, twd, tws, tn);
> +switch (df) {
> +case DF_BYTE:
> +gen_helper_msa_insert_b(cpu_env, twd, tws, tn);
> +break;
> +case DF_HALF:
> +gen_helper_msa_insert_h(cpu_env, twd, tws, tn);
> +break;
> +case DF_WORD:
> +

Re: [Qemu-devel] [PATCH v4 3/7] tcg/ppc: Support vector multiply

2019-05-18 Thread Aleksandar Markovic

On May 19, 2019 6:35 AM, "Richard Henderson" 
wrote:
>
> For Altivec, this is always an expansion.
>
> Signed-off-by: Richard Henderson 
> ---

Large portions of this patch have nothing to do with what the title or the
commit message says. Please reorganize.

Thanks, Aleksandar

>  tcg/ppc/tcg-target.h |   2 +-
>  tcg/ppc/tcg-target.opc.h |   8 +++
>  tcg/ppc/tcg-target.inc.c | 112 ++-
>  3 files changed, 120 insertions(+), 2 deletions(-)
>
> diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
> index 766706fd30..a130192cbd 100644
> --- a/tcg/ppc/tcg-target.h
> +++ b/tcg/ppc/tcg-target.h
> @@ -154,7 +154,7 @@ extern bool have_isa_3_00;
>  #define TCG_TARGET_HAS_shs_vec  0
>  #define TCG_TARGET_HAS_shv_vec  1
>  #define TCG_TARGET_HAS_cmp_vec  1
> -#define TCG_TARGET_HAS_mul_vec  0
> +#define TCG_TARGET_HAS_mul_vec  1
>  #define TCG_TARGET_HAS_sat_vec  1
>  #define TCG_TARGET_HAS_minmax_vec   1
>  #define TCG_TARGET_HAS_bitsel_vec   0
> diff --git a/tcg/ppc/tcg-target.opc.h b/tcg/ppc/tcg-target.opc.h
> index 4816a6c3d4..5c6a5ad52c 100644
> --- a/tcg/ppc/tcg-target.opc.h
> +++ b/tcg/ppc/tcg-target.opc.h
> @@ -1,3 +1,11 @@
>  /* Target-specific opcodes for host vector expansion.  These will be
> emitted by tcg_expand_vec_op.  For those familiar with GCC internals,
> consider these to be UNSPEC with names.  */
> +
> +DEF(ppc_mrgh_vec, 1, 2, 0, IMPLVEC)
> +DEF(ppc_mrgl_vec, 1, 2, 0, IMPLVEC)
> +DEF(ppc_msum_vec, 1, 3, 0, IMPLVEC)
> +DEF(ppc_muleu_vec, 1, 2, 0, IMPLVEC)
> +DEF(ppc_mulou_vec, 1, 2, 0, IMPLVEC)
> +DEF(ppc_pkum_vec, 1, 2, 0, IMPLVEC)
> +DEF(ppc_rotl_vec, 1, 2, 0, IMPLVEC)
> diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
> index 62a8c428e0..9d58db9eb1 100644
> --- a/tcg/ppc/tcg-target.inc.c
> +++ b/tcg/ppc/tcg-target.inc.c
> @@ -526,6 +526,25 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
>  #define VSRAB  VX4(772)
>  #define VSRAH  VX4(836)
>  #define VSRAW  VX4(900)
> +#define VRLB   VX4(4)
> +#define VRLH   VX4(68)
> +#define VRLW   VX4(132)
> +
> +#define VMULEUB VX4(520)
> +#define VMULEUH VX4(584)
> +#define VMULOUB VX4(8)
> +#define VMULOUH VX4(72)
> +#define VMSUMUHM   VX4(38)
> +
> +#define VMRGHB VX4(12)
> +#define VMRGHH VX4(76)
> +#define VMRGHW VX4(140)
> +#define VMRGLB VX4(268)
> +#define VMRGLH VX4(332)
> +#define VMRGLW VX4(396)
> +
> +#define VPKUHUM VX4(14)
> +#define VPKUWUM VX4(78)
>
>  #define VAND   VX4(1028)
>  #define VANDC  VX4(1092)
> @@ -2892,6 +2911,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
>  case INDEX_op_sarv_vec:
>  return vece <= MO_32;
>  case INDEX_op_cmp_vec:
> +case INDEX_op_mul_vec:
>  case INDEX_op_shli_vec:
>  case INDEX_op_shri_vec:
>  case INDEX_op_sari_vec:
> @@ -3005,7 +3025,13 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
>  smax_op[4] = { VMAXSB, VMAXSH, VMAXSW, 0 },
>  shlv_op[4] = { VSLB, VSLH, VSLW, 0 },
>  shrv_op[4] = { VSRB, VSRH, VSRW, 0 },
> -sarv_op[4] = { VSRAB, VSRAH, VSRAW, 0 };
> +sarv_op[4] = { VSRAB, VSRAH, VSRAW, 0 },
> +mrgh_op[4] = { VMRGHB, VMRGHH, VMRGHW, 0 },
> +mrgl_op[4] = { VMRGLB, VMRGLH, VMRGLW, 0 },
> +muleu_op[4] = { VMULEUB, VMULEUH, 0, 0 },
> +mulou_op[4] = { VMULOUB, VMULOUH, 0, 0 },
> +pkum_op[4] = { VPKUHUM, VPKUWUM, 0, 0 },
> +rotl_op[4] = { VRLB, VRLH, VRLW, 0 };
>
>  TCGType type = vecl + TCG_TYPE_V64;
>  TCGArg a0 = args[0], a1 = args[1], a2 = args[2];
> @@ -3094,6 +3120,29 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
>  }
>  break;
>
> +case INDEX_op_ppc_mrgh_vec:
> +insn = mrgh_op[vece];
> +break;
> +case INDEX_op_ppc_mrgl_vec:
> +insn = mrgl_op[vece];
> +break;
> +case INDEX_op_ppc_muleu_vec:
> +insn = muleu_op[vece];
> +break;
> +case INDEX_op_ppc_mulou_vec:
> +insn = mulou_op[vece];
> +break;
> +case INDEX_op_ppc_pkum_vec:
> +insn = pkum_op[vece];
> +break;
> +case INDEX_op_ppc_rotl_vec:
> +insn = rotl_op[vece];
> +break;
> +case INDEX_op_ppc_msum_vec:
> +tcg_debug_assert(vece == MO_16);
> +tcg_out32(s, VMSUMUHM | VRT(a0) | VRA(a1) | VRB(a2) | VRC(args[3]));
> +return;
> +
>  case INDEX_op_mov_vec:  /* Always emitted via tcg_out_mov.  */
>  case INDEX_op_dupi_vec: /* Always emitted via tcg_out_movi.  */
>  case INDEX_op_dup_vec:  /* Always emitted via tcg_out_dup_vec.  */
> @@ -3163,6 +3212,53 @@ static void expand_vec_cmp(TCGType type, unsigned vece, TCGv_vec v0,
>  }
>  }
>
> +static void expand_vec_mul(TCGType type, unsigned vece, TCGv_vec v0,
> +   TCGv_vec v1, TCGv_vec v2)
> +{
> +TCGv_vec t1 =

[Qemu-devel] [PATCH v4 2/7] tcg/ppc: Support vector shift by immediate

2019-05-18 Thread Richard Henderson

For Altivec, this is done via vector shift by vector,
and loading the immediate into a register.

Signed-off-by: Richard Henderson 
---
 tcg/ppc/tcg-target.h |  2 +-
 tcg/ppc/tcg-target.inc.c | 58 ++--
 2 files changed, 57 insertions(+), 3 deletions(-)

diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index 368c250c6a..766706fd30 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -152,7 +152,7 @@ extern bool have_isa_3_00;
 #define TCG_TARGET_HAS_abs_vec  0
 #define TCG_TARGET_HAS_shi_vec  0
 #define TCG_TARGET_HAS_shs_vec  0
-#define TCG_TARGET_HAS_shv_vec  0
+#define TCG_TARGET_HAS_shv_vec  1
 #define TCG_TARGET_HAS_cmp_vec  1
 #define TCG_TARGET_HAS_mul_vec  0
 #define TCG_TARGET_HAS_sat_vec  1
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index 479e653da6..62a8c428e0 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -517,6 +517,16 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define VCMPGTUH   VX4(582)
 #define VCMPGTUW   VX4(646)
 
+#define VSLB   VX4(260)
+#define VSLH   VX4(324)
+#define VSLW   VX4(388)
+#define VSRB   VX4(516)
+#define VSRH   VX4(580)
+#define VSRW   VX4(644)
+#define VSRAB  VX4(772)
+#define VSRAH  VX4(836)
+#define VSRAW  VX4(900)
+
 #define VAND   VX4(1028)
 #define VANDC  VX4(1092)
 #define VNOR   VX4(1284)
@@ -2877,8 +2887,14 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 case INDEX_op_sssub_vec:
 case INDEX_op_usadd_vec:
 case INDEX_op_ussub_vec:
+case INDEX_op_shlv_vec:
+case INDEX_op_shrv_vec:
+case INDEX_op_sarv_vec:
 return vece <= MO_32;
 case INDEX_op_cmp_vec:
+case INDEX_op_shli_vec:
+case INDEX_op_shri_vec:
+case INDEX_op_sari_vec:
 return vece <= MO_32 ? -1 : 0;
 default:
 return 0;
@@ -2986,7 +3002,10 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 umin_op[4] = { VMINUB, VMINUH, VMINUW, 0 },
 smin_op[4] = { VMINSB, VMINSH, VMINSW, 0 },
 umax_op[4] = { VMAXUB, VMAXUH, VMAXUW, 0 },
-smax_op[4] = { VMAXSB, VMAXSH, VMAXSW, 0 };
+smax_op[4] = { VMAXSB, VMAXSH, VMAXSW, 0 },
+shlv_op[4] = { VSLB, VSLH, VSLW, 0 },
+shrv_op[4] = { VSRB, VSRH, VSRW, 0 },
+sarv_op[4] = { VSRAB, VSRAH, VSRAW, 0 };
 
 TCGType type = vecl + TCG_TYPE_V64;
 TCGArg a0 = args[0], a1 = args[1], a2 = args[2];
@@ -3033,6 +3052,15 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 case INDEX_op_umax_vec:
 insn = umax_op[vece];
 break;
+case INDEX_op_shlv_vec:
+insn = shlv_op[vece];
+break;
+case INDEX_op_shrv_vec:
+insn = shrv_op[vece];
+break;
+case INDEX_op_sarv_vec:
+insn = sarv_op[vece];
+break;
 case INDEX_op_and_vec:
 insn = VAND;
 break;
@@ -3077,6 +3105,18 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 tcg_out32(s, insn | VRT(a0) | VRA(a1) | VRB(a2));
 }
 
+static void expand_vec_shi(TCGType type, unsigned vece, TCGv_vec v0,
+   TCGv_vec v1, TCGArg imm, TCGOpcode opci)
+{
+TCGv_vec t1 = tcg_temp_new_vec(type);
+
+/* Splat w/bytes for xxspltib.  */
+tcg_gen_dupi_vec(MO_8, t1, imm & ((8 << vece) - 1));
+vec_gen_3(opci, type, vece, tcgv_vec_arg(v0),
+  tcgv_vec_arg(v1), tcgv_vec_arg(t1));
+tcg_temp_free_vec(t1);
+}
+
 static void expand_vec_cmp(TCGType type, unsigned vece, TCGv_vec v0,
TCGv_vec v1, TCGv_vec v2, TCGCond cond)
 {
@@ -3128,14 +3168,25 @@ void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
 {
 va_list va;
 TCGv_vec v0, v1, v2;
+TCGArg a2;
 
 va_start(va, a0);
 v0 = temp_tcgv_vec(arg_temp(a0));
 v1 = temp_tcgv_vec(arg_temp(va_arg(va, TCGArg)));
-v2 = temp_tcgv_vec(arg_temp(va_arg(va, TCGArg)));
+a2 = va_arg(va, TCGArg);
 
 switch (opc) {
+case INDEX_op_shli_vec:
+expand_vec_shi(type, vece, v0, v1, a2, INDEX_op_shlv_vec);
+break;
+case INDEX_op_shri_vec:
+expand_vec_shi(type, vece, v0, v1, a2, INDEX_op_shrv_vec);
+break;
+case INDEX_op_sari_vec:
+expand_vec_shi(type, vece, v0, v1, a2, INDEX_op_sarv_vec);
+break;
 case INDEX_op_cmp_vec:
+v2 = temp_tcgv_vec(arg_temp(a2));
 expand_vec_cmp(type, vece, v0, v1, v2, va_arg(va, TCGArg));
 break;
 default:
@@ -3336,6 +3387,9 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
 case INDEX_op_smin_vec:
 case INDEX_op_umax_vec:
 case INDEX_op_umin_vec:
+case INDEX_op_shlv_vec:
+case INDEX_op_shrv_vec:
+case INDEX_op_sarv_vec:
 return _v_v;
 case INDEX_op_not_vec:
 case INDEX_op_dup_vec:
-- 
2.17.1
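
The expansion in patch 2/7 above (splat the immediate, then use the
shift-by-vector insn) can be modelled on a single 16-bit lane. This is only a
sketch under stated assumptions: the function names are hypothetical, the
lane model relies on the Altivec rule that a halfword shift uses only the low
4 bits of the corresponding count element, and the mask mirrors the patch's
imm & ((8 << vece) - 1) with vece = 1 for 16-bit elements.

```c
#include <assert.h>
#include <stdint.h>

/* One 16-bit lane of a shift-left-by-vector: only the low 4 bits of the
 * count lane are used, the rest is ignored. */
static uint16_t lane_shl16(uint16_t value, uint16_t count_lane)
{
    return (uint16_t)(value << (count_lane & 15));
}

/* What a byte-wise splat of imm looks like when viewed as a 16-bit lane. */
static uint16_t splat_bytes16(uint8_t imm)
{
    return (uint16_t)(imm * 0x0101u);
}

/* Shift-left-by-immediate expressed as splat + shift-by-vector. */
uint16_t shli16_via_shlv(uint16_t value, unsigned imm)
{
    unsigned vece = 1;                                /* 16-bit elements */
    uint8_t masked = (uint8_t)(imm & ((8u << vece) - 1));
    return lane_shl16(value, splat_bytes16(masked));
}
```

Because the shift count only looks at the low bits of each lane, the byte
copy in the upper half of the lane is harmless, which is why a byte splat is
sufficient for every element size.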

[Qemu-devel] [PATCH v4 1/7] tcg/ppc: Initial backend support for Altivec

2019-05-18 Thread Richard Henderson

A few operations are still missing, such as the expansion of
multiply and shifts, but this has move, load, store, and
basic arithmetic.

Signed-off-by: Richard Henderson 
---
 tcg/ppc/tcg-target.h |  36 +-
 tcg/ppc/tcg-target.opc.h |   3 +
 tcg/ppc/tcg-target.inc.c | 707 +++
 3 files changed, 685 insertions(+), 61 deletions(-)
 create mode 100644 tcg/ppc/tcg-target.opc.h

diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index 7627fb62d3..368c250c6a 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -31,7 +31,7 @@
 # define TCG_TARGET_REG_BITS  32
 #endif
 
-#define TCG_TARGET_NB_REGS 32
+#define TCG_TARGET_NB_REGS 64
 #define TCG_TARGET_INSN_UNIT_SIZE 4
 #define TCG_TARGET_TLB_DISPLACEMENT_BITS 16
 
@@ -45,10 +45,20 @@ typedef enum {
 TCG_REG_R24, TCG_REG_R25, TCG_REG_R26, TCG_REG_R27,
 TCG_REG_R28, TCG_REG_R29, TCG_REG_R30, TCG_REG_R31,
 
+TCG_REG_V0,  TCG_REG_V1,  TCG_REG_V2,  TCG_REG_V3,
+TCG_REG_V4,  TCG_REG_V5,  TCG_REG_V6,  TCG_REG_V7,
+TCG_REG_V8,  TCG_REG_V9,  TCG_REG_V10, TCG_REG_V11,
+TCG_REG_V12, TCG_REG_V13, TCG_REG_V14, TCG_REG_V15,
+TCG_REG_V16, TCG_REG_V17, TCG_REG_V18, TCG_REG_V19,
+TCG_REG_V20, TCG_REG_V21, TCG_REG_V22, TCG_REG_V23,
+TCG_REG_V24, TCG_REG_V25, TCG_REG_V26, TCG_REG_V27,
+TCG_REG_V28, TCG_REG_V29, TCG_REG_V30, TCG_REG_V31,
+
 TCG_REG_CALL_STACK = TCG_REG_R1,
 TCG_AREG0 = TCG_REG_R27
 } TCGReg;
 
+extern bool have_isa_altivec;
 extern bool have_isa_2_06;
 extern bool have_isa_3_00;
 
@@ -126,6 +136,30 @@ extern bool have_isa_3_00;
 #define TCG_TARGET_HAS_mulsh_i64 1
 #endif
 
+/*
+ * While technically Altivec could support V64, it has no 64-bit store
+ * instruction and substituting two 32-bit stores makes the generated
+ * code quite large.
+ */
+#define TCG_TARGET_HAS_v64  0
+#define TCG_TARGET_HAS_v128 have_isa_altivec
+#define TCG_TARGET_HAS_v256 0
+
+#define TCG_TARGET_HAS_andc_vec 1
+#define TCG_TARGET_HAS_orc_vec  0
+#define TCG_TARGET_HAS_not_vec  1
+#define TCG_TARGET_HAS_neg_vec  0
+#define TCG_TARGET_HAS_abs_vec  0
+#define TCG_TARGET_HAS_shi_vec  0
+#define TCG_TARGET_HAS_shs_vec  0
+#define TCG_TARGET_HAS_shv_vec  0
+#define TCG_TARGET_HAS_cmp_vec  1
+#define TCG_TARGET_HAS_mul_vec  0
+#define TCG_TARGET_HAS_sat_vec  1
+#define TCG_TARGET_HAS_minmax_vec   1
+#define TCG_TARGET_HAS_bitsel_vec   0
+#define TCG_TARGET_HAS_cmpsel_vec   0
+
 void flush_icache_range(uintptr_t start, uintptr_t stop);
 void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t);
 
diff --git a/tcg/ppc/tcg-target.opc.h b/tcg/ppc/tcg-target.opc.h
new file mode 100644
index 00..4816a6c3d4
--- /dev/null
+++ b/tcg/ppc/tcg-target.opc.h
@@ -0,0 +1,3 @@
+/* Target-specific opcodes for host vector expansion.  These will be
+   emitted by tcg_expand_vec_op.  For those familiar with GCC internals,
+   consider these to be UNSPEC with names.  */
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index 30c095d3d5..479e653da6 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -42,6 +42,9 @@
 # define TCG_REG_TMP1   TCG_REG_R12
 #endif
 
+#define TCG_VEC_TMP1 TCG_REG_V0
+#define TCG_VEC_TMP2 TCG_REG_V1
+
 #define TCG_REG_TB TCG_REG_R31
 #define USE_REG_TB (TCG_TARGET_REG_BITS == 64)
 
@@ -61,6 +64,7 @@
 
 static tcg_insn_unit *tb_ret_addr;
 
+bool have_isa_altivec;
 bool have_isa_2_06;
 bool have_isa_3_00;
 
@@ -72,39 +76,15 @@ bool have_isa_3_00;
 #endif
 
 #ifdef CONFIG_DEBUG_TCG
-static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
-"r0",
-"r1",
-"r2",
-"r3",
-"r4",
-"r5",
-"r6",
-"r7",
-"r8",
-"r9",
-"r10",
-"r11",
-"r12",
-"r13",
-"r14",
-"r15",
-"r16",
-"r17",
-"r18",
-"r19",
-"r20",
-"r21",
-"r22",
-"r23",
-"r24",
-"r25",
-"r26",
-"r27",
-"r28",
-"r29",
-"r30",
-"r31"
+static const char tcg_target_reg_names[TCG_TARGET_NB_REGS][4] = {
+"r0",  "r1",  "r2",  "r3",  "r4",  "r5",  "r6",  "r7",
+"r8",  "r9",  "r10", "r11", "r12", "r13", "r14", "r15",
+"r16", "r17", "r18", "r19", "r20", "r21", "r22", "r23",
+"r24", "r25", "r26", "r27", "r28", "r29", "r30", "r31",
+"v0",  "v1",  "v2",  "v3",  "v4",  "v5",  "v6",  "v7",
+"v8",  "v9",  "v10", "v11", "v12", "v13", "v14", "v15",
+"v16", "v17", "v18", "v19", "v20", "v21", "v22", "v23",
+"v24", "v25", "v26", "v27", "v28", "v29", "v30", "v31",
 };
 #endif
 
@@ -139,6 +119,26 @@ static const int tcg_target_reg_alloc_order[] = {
 TCG_REG_R5,
 TCG_REG_R4,
 TCG_REG_R3,
+
+/* V0 and V1 reserved as temporaries; V20 - V31 are call-saved */
+TCG_REG_V2,   /* call clobbered, vectors */
+TCG_REG_V3,
+TCG_REG_V4,
+TCG_REG_V5,
+TCG_REG_V6,
+TCG_REG_V7,
+

[Qemu-devel] [PATCH v4 3/7] tcg/ppc: Support vector multiply

2019-05-18 Thread Richard Henderson

For Altivec, this is always an expansion.

Signed-off-by: Richard Henderson 
---
 tcg/ppc/tcg-target.h |   2 +-
 tcg/ppc/tcg-target.opc.h |   8 +++
 tcg/ppc/tcg-target.inc.c | 112 ++-
 3 files changed, 120 insertions(+), 2 deletions(-)

diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index 766706fd30..a130192cbd 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -154,7 +154,7 @@ extern bool have_isa_3_00;
 #define TCG_TARGET_HAS_shs_vec  0
 #define TCG_TARGET_HAS_shv_vec  1
 #define TCG_TARGET_HAS_cmp_vec  1
-#define TCG_TARGET_HAS_mul_vec  0
+#define TCG_TARGET_HAS_mul_vec  1
 #define TCG_TARGET_HAS_sat_vec  1
 #define TCG_TARGET_HAS_minmax_vec   1
 #define TCG_TARGET_HAS_bitsel_vec   0
diff --git a/tcg/ppc/tcg-target.opc.h b/tcg/ppc/tcg-target.opc.h
index 4816a6c3d4..5c6a5ad52c 100644
--- a/tcg/ppc/tcg-target.opc.h
+++ b/tcg/ppc/tcg-target.opc.h
@@ -1,3 +1,11 @@
 /* Target-specific opcodes for host vector expansion.  These will be
emitted by tcg_expand_vec_op.  For those familiar with GCC internals,
consider these to be UNSPEC with names.  */
+
+DEF(ppc_mrgh_vec, 1, 2, 0, IMPLVEC)
+DEF(ppc_mrgl_vec, 1, 2, 0, IMPLVEC)
+DEF(ppc_msum_vec, 1, 3, 0, IMPLVEC)
+DEF(ppc_muleu_vec, 1, 2, 0, IMPLVEC)
+DEF(ppc_mulou_vec, 1, 2, 0, IMPLVEC)
+DEF(ppc_pkum_vec, 1, 2, 0, IMPLVEC)
+DEF(ppc_rotl_vec, 1, 2, 0, IMPLVEC)
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index 62a8c428e0..9d58db9eb1 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -526,6 +526,25 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define VSRAB  VX4(772)
 #define VSRAH  VX4(836)
 #define VSRAW  VX4(900)
+#define VRLB   VX4(4)
+#define VRLH   VX4(68)
+#define VRLW   VX4(132)
+
+#define VMULEUB VX4(520)
+#define VMULEUH VX4(584)
+#define VMULOUB VX4(8)
+#define VMULOUH VX4(72)
+#define VMSUMUHM   VX4(38)
+
+#define VMRGHB VX4(12)
+#define VMRGHH VX4(76)
+#define VMRGHW VX4(140)
+#define VMRGLB VX4(268)
+#define VMRGLH VX4(332)
+#define VMRGLW VX4(396)
+
+#define VPKUHUM VX4(14)
+#define VPKUWUM VX4(78)
 
 #define VAND   VX4(1028)
 #define VANDC  VX4(1092)
@@ -2892,6 +2911,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 case INDEX_op_sarv_vec:
 return vece <= MO_32;
 case INDEX_op_cmp_vec:
+case INDEX_op_mul_vec:
 case INDEX_op_shli_vec:
 case INDEX_op_shri_vec:
 case INDEX_op_sari_vec:
@@ -3005,7 +3025,13 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 smax_op[4] = { VMAXSB, VMAXSH, VMAXSW, 0 },
 shlv_op[4] = { VSLB, VSLH, VSLW, 0 },
 shrv_op[4] = { VSRB, VSRH, VSRW, 0 },
-sarv_op[4] = { VSRAB, VSRAH, VSRAW, 0 };
+sarv_op[4] = { VSRAB, VSRAH, VSRAW, 0 },
+mrgh_op[4] = { VMRGHB, VMRGHH, VMRGHW, 0 },
+mrgl_op[4] = { VMRGLB, VMRGLH, VMRGLW, 0 },
+muleu_op[4] = { VMULEUB, VMULEUH, 0, 0 },
+mulou_op[4] = { VMULOUB, VMULOUH, 0, 0 },
+pkum_op[4] = { VPKUHUM, VPKUWUM, 0, 0 },
+rotl_op[4] = { VRLB, VRLH, VRLW, 0 };
 
 TCGType type = vecl + TCG_TYPE_V64;
 TCGArg a0 = args[0], a1 = args[1], a2 = args[2];
@@ -3094,6 +3120,29 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 }
 break;
 
+case INDEX_op_ppc_mrgh_vec:
+insn = mrgh_op[vece];
+break;
+case INDEX_op_ppc_mrgl_vec:
+insn = mrgl_op[vece];
+break;
+case INDEX_op_ppc_muleu_vec:
+insn = muleu_op[vece];
+break;
+case INDEX_op_ppc_mulou_vec:
+insn = mulou_op[vece];
+break;
+case INDEX_op_ppc_pkum_vec:
+insn = pkum_op[vece];
+break;
+case INDEX_op_ppc_rotl_vec:
+insn = rotl_op[vece];
+break;
+case INDEX_op_ppc_msum_vec:
+tcg_debug_assert(vece == MO_16);
+tcg_out32(s, VMSUMUHM | VRT(a0) | VRA(a1) | VRB(a2) | VRC(args[3]));
+return;
+
 case INDEX_op_mov_vec:  /* Always emitted via tcg_out_mov.  */
 case INDEX_op_dupi_vec: /* Always emitted via tcg_out_movi.  */
 case INDEX_op_dup_vec:  /* Always emitted via tcg_out_dup_vec.  */
@@ -3163,6 +3212,53 @@ static void expand_vec_cmp(TCGType type, unsigned vece, TCGv_vec v0,
 }
 }
 
+static void expand_vec_mul(TCGType type, unsigned vece, TCGv_vec v0,
+   TCGv_vec v1, TCGv_vec v2)
+{
+TCGv_vec t1 = tcg_temp_new_vec(type);
+TCGv_vec t2 = tcg_temp_new_vec(type);
+TCGv_vec t3, t4;
+
+switch (vece) {
+case MO_8:
+case MO_16:
+vec_gen_3(INDEX_op_ppc_muleu_vec, type, vece, tcgv_vec_arg(t1),
+  tcgv_vec_arg(v1), tcgv_vec_arg(v2));
+vec_gen_3(INDEX_op_ppc_mulou_vec, type, vece, tcgv_vec_arg(t2),
+  tcgv_vec_arg(v1), tcgv_vec_arg(v2));
+
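
The MO_8/MO_16 case above starts from widening even/odd multiplies. As a
rough scalar model of why that is enough for a modular multiply: each
widening product holds the wanted result in its low half. The patch text is
cut off here in the archive, so the recombination step below simply takes
the low byte of each product directly; how the patch itself recombines the
lanes is not shown and is not claimed here. All names are hypothetical, and
"even"/"odd" refer to this model's own array indices.

```c
#include <assert.h>
#include <stdint.h>

/* Widening product of one pair of unsigned byte lanes. */
static uint16_t widening_mul_u8(uint8_t a, uint8_t b)
{
    return (uint16_t)((unsigned)a * b);
}

/* 8-bit modular multiply of 16 lanes via even/odd widening products. */
void model_mul_u8x16(uint8_t out[16], const uint8_t a[16], const uint8_t b[16])
{
    uint16_t even[8], odd[8];

    for (int i = 0; i < 8; i++) {
        even[i] = widening_mul_u8(a[2 * i], b[2 * i]);
        odd[i] = widening_mul_u8(a[2 * i + 1], b[2 * i + 1]);
    }
    /* Keep only the low byte of every 16-bit product. */
    for (int i = 0; i < 8; i++) {
        out[2 * i] = (uint8_t)even[i];
        out[2 * i + 1] = (uint8_t)odd[i];
    }
}

/* Convenience wrapper: fill every lane with (a, b), return one result lane. */
uint8_t model_mul_lane(uint8_t a, uint8_t b, int lane)
{
    uint8_t va[16], vb[16], out[16];

    for (int i = 0; i < 16; i++) {
        va[i] = a;
        vb[i] = b;
    }
    model_mul_u8x16(out, va, vb);
    return out[lane];
}
```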

[Qemu-devel] [PATCH v4 0/7] tcg/ppc: Add vector opcodes

2019-05-18 Thread Richard Henderson

Based-on: <20190518190157.21255-1-richard.hender...@linaro.org>
Aka "tcg: misc gvec improvements".

Version 3 was last posted in March,
https://lists.gnu.org/archive/html/qemu-devel/2019-03/msg05859.html

Changes since v3:
  * Add support for bitsel, with the vsx xxsel insn.
  * Rely on the new relocation overflow handling, so
we don't require 3 insns for a vector load.

Changes since v2:
  * Several generic tcg patches to improve dup vs dupi vs dupm.
In particular, if a global temp (like guest r10) is not in
a host register, we should duplicate from memory instead of
loading to an integer register, spilling to stack, loading
to a vector register, and then duplicating.
  * I have more confidence that 32-bit ppc host should work
this time around.  No testing on that front yet, but I've
unified some code sequences with 64-bit ppc host.
  * Base altivec now supports V128 only.  Moved V64 support to
Power7 (v2.06), which has 64-bit load/store.
  * Dropped support for 64-bit vector multiply using Power8.
The expansion was too large compared to using integer regs.


r~


Richard Henderson (7):
  tcg/ppc: Initial backend support for Altivec
  tcg/ppc: Support vector shift by immediate
  tcg/ppc: Support vector multiply
  tcg/ppc: Support vector dup2
  tcg/ppc: Update vector support to v2.06
  tcg/ppc: Update vector support to v2.07
  tcg/ppc: Update vector support to v3.00

 tcg/ppc/tcg-target.h |   39 +-
 tcg/ppc/tcg-target.opc.h |   11 +
 tcg/ppc/tcg-target.inc.c | 1077 +++---
 3 files changed, 1063 insertions(+), 64 deletions(-)
 create mode 100644 tcg/ppc/tcg-target.opc.h

-- 
2.17.1

[Qemu-devel] [PATCH v4 4/7] tcg/ppc: Support vector dup2

2019-05-18 Thread Richard Henderson

This is only used for 32-bit hosts.

Signed-off-by: Richard Henderson 
---
 tcg/ppc/tcg-target.inc.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index 9d58db9eb1..3219df2e90 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -3120,6 +3120,14 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 }
 break;
 
+case INDEX_op_dup2_vec:
+assert(TCG_TARGET_REG_BITS == 32);
+/* With inputs a1 = xLxx, a2 = xHxx  */
+tcg_out32(s, VMRGHW | VRT(a0) | VRA(a2) | VRB(a1));  /* a0  = xxHL */
+tcg_out_vsldoi(s, TCG_VEC_TMP1, a0, a0, 8);  /* tmp = HLxx */
+tcg_out_vsldoi(s, a0, a0, TCG_VEC_TMP1, 8);  /* a0  = HLHL */
+return;
+
 case INDEX_op_ppc_mrgh_vec:
 insn = mrgh_op[vece];
 break;
@@ -3498,6 +3506,7 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
 case INDEX_op_ppc_mulou_vec:
 case INDEX_op_ppc_pkum_vec:
 case INDEX_op_ppc_rotl_vec:
+case INDEX_op_dup2_vec:
 return _v_v;
 case INDEX_op_not_vec:
 case INDEX_op_dup_vec:
-- 
2.17.1
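
The three-insn dup2 sequence in patch 4/7 above can be checked with a small
word-level model. This is a sketch, not the backend code: vec128 and the
model_* functions are hypothetical, word 0 is taken to be the most
significant word (big-endian element numbering), and the shift is expressed
in whole words, so the patch's byte count of 8 becomes 2 here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint32_t w[4]; } vec128;   /* w[0] = most significant word */

/* Merge high words: interleave words 0 and 1 of a and b. */
static vec128 model_vmrghw(vec128 a, vec128 b)
{
    vec128 r = { { a.w[0], b.w[0], a.w[1], b.w[1] } };
    return r;
}

/* Shift-left-double by whole words: take 4 words of a||b from sh_words. */
static vec128 model_vsldoi_words(vec128 a, vec128 b, unsigned sh_words)
{
    uint32_t cat[8];
    vec128 r;

    memcpy(cat, a.w, sizeof(a.w));
    memcpy(cat + 4, b.w, sizeof(b.w));
    memcpy(r.w, cat + sh_words, sizeof(r.w));
    return r;
}

/* Inputs a1 = xLxx, a2 = xHxx; returns word idx of the final HLHL vector. */
uint32_t model_dup2_word(uint32_t lo, uint32_t hi, unsigned idx)
{
    vec128 a1 = { { 0xdead0001u, lo, 0xdead0002u, 0xdead0003u } };
    vec128 a2 = { { 0xbeef0001u, hi, 0xbeef0002u, 0xbeef0003u } };

    vec128 a0 = model_vmrghw(a2, a1);                /* a0  = xxHL */
    vec128 tmp = model_vsldoi_words(a0, a0, 2);      /* tmp = HLxx */
    a0 = model_vsldoi_words(a0, tmp, 2);             /* a0  = HLHL */
    return a0.w[idx];
}
```

The junk values in the "x" positions never reach the result, which is the
property the comments in the patch rely on.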

[Qemu-devel] [PATCH v4 7/7] tcg/ppc: Update vector support to v3.00

2019-05-18 Thread Richard Henderson

This includes vector load/store with immediate offset, some extra
move and splat insns, compare ne, and negate.

Signed-off-by: Richard Henderson 
---
 tcg/ppc/tcg-target.h |   3 +-
 tcg/ppc/tcg-target.inc.c | 103 ++-
 2 files changed, 94 insertions(+), 12 deletions(-)

diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index b8355d0a56..533f0ef510 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -63,6 +63,7 @@ extern bool have_isa_2_06;
 extern bool have_isa_2_06_vsx;
 extern bool have_isa_2_07_vsx;
 extern bool have_isa_3_00;
+extern bool have_isa_3_00_vsx;
 
 /* optional instructions automatically implemented */
 #define TCG_TARGET_HAS_ext8u_i32 0 /* andi */
@@ -150,7 +151,7 @@ extern bool have_isa_3_00;
 #define TCG_TARGET_HAS_andc_vec 1
 #define TCG_TARGET_HAS_orc_vec  have_isa_2_07_vsx
 #define TCG_TARGET_HAS_not_vec  1
-#define TCG_TARGET_HAS_neg_vec  0
+#define TCG_TARGET_HAS_neg_vec  have_isa_3_00_vsx
 #define TCG_TARGET_HAS_abs_vec  0
 #define TCG_TARGET_HAS_shi_vec  0
 #define TCG_TARGET_HAS_shs_vec  0
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index dedf0de04d..4ee77df178 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -69,6 +69,7 @@ bool have_isa_2_06;
 bool have_isa_2_06_vsx;
 bool have_isa_2_07_vsx;
 bool have_isa_3_00;
+bool have_isa_3_00_vsx;
 
 #define HAVE_ISA_2_06  have_isa_2_06
 #define HAVE_ISEL  have_isa_2_06
@@ -475,11 +476,16 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define LXSDX  XO31(588)  /* v2.06 */
 #define LXVDSX XO31(332)  /* v2.06 */
 #define LXSIWZX XO31(12)   /* v2.07 */
+#define LXV (OPCD(61) | 1) /* v3.00 */
+#define LXSD   (OPCD(57) | 2) /* v3.00 */
+#define LXVWSX XO31(364)  /* v3.00 */
 
 #define STVX   XO31(231)
 #define STVEWX XO31(199)
 #define STXSDX XO31(716)  /* v2.06 */
 #define STXSIWX XO31(140)  /* v2.07 */
+#define STXV   (OPCD(61) | 5) /* v3.00 */
+#define STXSD  (OPCD(61) | 2) /* v3.00 */
 
 #define VADDSBS VX4(768)
 #define VADDUBS VX4(512)
@@ -503,6 +509,9 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define VSUBUWM VX4(1152)
 #define VSUBUDM VX4(1216)  /* v2.07 */
 
+#define VNEGW  (VX4(1538) | (6 << 16))  /* v3.00 */
+#define VNEGD  (VX4(1538) | (7 << 16))  /* v3.00 */
+
 #define VMAXSB VX4(258)
 #define VMAXSH VX4(322)
 #define VMAXSW VX4(386)
@@ -532,6 +541,9 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define VCMPGTUH   VX4(582)
 #define VCMPGTUW   VX4(646)
 #define VCMPGTUD   VX4(711)   /* v2.07 */
+#define VCMPNEB VX4(7)   /* v3.00 */
+#define VCMPNEH VX4(71)  /* v3.00 */
+#define VCMPNEW VX4(135) /* v3.00 */
 
 #define VSLB   VX4(260)
 #define VSLH   VX4(324)
@@ -589,11 +601,14 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 
 #define XXPERMDI   (OPCD(60) | (10 << 3))   /* v2.06 */
 #define XXSEL  (OPCD(60) | (3 << 4))/* v2.06 */
+#define XXSPLTIB   (OPCD(60) | (360 << 1))  /* v3.00 */
 
 #define MFVSRD XO31(51)   /* v2.07 */
 #define MFVSRWZ XO31(115)  /* v2.07 */
 #define MTVSRD XO31(179)   /* v2.07 */
 #define MTVSRWZ XO31(179)  /* v2.07 */
+#define MTVSRDD XO31(435)  /* v3.00 */
+#define MTVSRWS XO31(403)  /* v3.00 */
 
 #define RT(r) ((r)<<21)
 #define RS(r) ((r)<<21)
@@ -917,6 +932,10 @@ static void tcg_out_dupi_vec(TCGContext *s, TCGType type, TCGReg ret,
 return;
 }
 }
+if (have_isa_3_00_vsx && val == (tcg_target_long)dup_const(MO_8, val)) {
+tcg_out32(s, XXSPLTIB | VRT(ret) | ((val & 0xff) << 11) | 1);
+return;
+}
 
 /*
  * Otherwise we must load the value from the constant pool.
@@ -1105,7 +1124,7 @@ static void tcg_out_mem_long(TCGContext *s, int opi, int opx, TCGReg rt,
  TCGReg base, tcg_target_long offset)
 {
 tcg_target_long orig = offset, l0, l1, extra = 0, align = 0;
-bool is_store = false;
+bool is_int_store = false;
 TCGReg rs = TCG_REG_TMP1;
 
 switch (opi) {
@@ -1118,11 +1137,20 @@ static void tcg_out_mem_long(TCGContext *s, int opi, int opx, TCGReg rt,
 break;
 }
 break;
+case LXSD:
+case STXSD:
+align = 3;
+break;
+case LXV: case LXV | 8:
+case STXV: case STXV | 8:
+/* The |8 cases force altivec registers.  */
+align = 15;
+break;
 case STD:
 align = 3;
 /* FALLTHRU */
 case STB: case STH: case STW:
-is_store = true;
+is_int_store = true;
 break;
 }
 
@@ -1131,7 +1159,7 @@ static void tcg_out_mem_long(TCGContext *s, int opi, int opx, TCGReg rt,
 if (rs == base) {
 rs =
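
The tcg_out_dupi_vec hunk in patch 7/7 above emits xxspltib only when the
constant is the same byte repeated in every position, which it tests with
val == dup_const(MO_8, val). A minimal sketch of that test follows; splat8
is a hypothetical stand-in written for this example and is only assumed to
behave like a byte replication across 64 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Replicate the low byte of c into all eight bytes of a 64-bit value.
 * Hypothetical helper standing in for the byte case of dup_const. */
static uint64_t splat8(uint64_t c)
{
    return 0x0101010101010101ull * (uint8_t)c;
}

/* True if val is one byte repeated, i.e. a candidate for a byte splat. */
bool is_byte_splat(uint64_t val)
{
    return val == splat8(val);
}
```

When the test passes, only the low byte (val & 0xff) has to be encoded in
the instruction, which is what the hunk shifts into the immediate field.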

[Qemu-devel] [PATCH v4 6/7] tcg/ppc: Update vector support to v2.07

2019-05-18 Thread Richard Henderson

This includes single-word loads and stores, lots of double-word
arithmetic, and a few extra logical operations.

Signed-off-by: Richard Henderson 
---
 tcg/ppc/tcg-target.h |   3 +-
 tcg/ppc/tcg-target.inc.c | 111 +++
 2 files changed, 91 insertions(+), 23 deletions(-)

diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index 40544f996d..b8355d0a56 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -61,6 +61,7 @@ typedef enum {
 extern bool have_isa_altivec;
 extern bool have_isa_2_06;
 extern bool have_isa_2_06_vsx;
+extern bool have_isa_2_07_vsx;
 extern bool have_isa_3_00;
 
 /* optional instructions automatically implemented */
@@ -147,7 +148,7 @@ extern bool have_isa_3_00;
 #define TCG_TARGET_HAS_v256 0
 
 #define TCG_TARGET_HAS_andc_vec 1
-#define TCG_TARGET_HAS_orc_vec  0
+#define TCG_TARGET_HAS_orc_vec  have_isa_2_07_vsx
 #define TCG_TARGET_HAS_not_vec  1
 #define TCG_TARGET_HAS_neg_vec  0
 #define TCG_TARGET_HAS_abs_vec  0
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index 6cb8c8f0eb..dedf0de04d 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -67,6 +67,7 @@ static tcg_insn_unit *tb_ret_addr;
 bool have_isa_altivec;
 bool have_isa_2_06;
 bool have_isa_2_06_vsx;
+bool have_isa_2_07_vsx;
 bool have_isa_3_00;
 
 #define HAVE_ISA_2_06  have_isa_2_06
@@ -473,10 +474,12 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define LVEWX  XO31(71)
 #define LXSDX  XO31(588)  /* v2.06 */
 #define LXVDSX XO31(332)  /* v2.06 */
+#define LXSIWZX XO31(12)   /* v2.07 */
 
 #define STVX   XO31(231)
 #define STVEWX XO31(199)
 #define STXSDX XO31(716)  /* v2.06 */
+#define STXSIWX XO31(140)  /* v2.07 */
 
 #define VADDSBS VX4(768)
 #define VADDUBS VX4(512)
@@ -487,6 +490,7 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define VADDSWS VX4(896)
 #define VADDUWS VX4(640)
 #define VADDUWM VX4(128)
+#define VADDUDM VX4(192)   /* v2.07 */
 
 #define VSUBSBS VX4(1792)
 #define VSUBUBS VX4(1536)
@@ -497,47 +501,62 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define VSUBSWS VX4(1920)
 #define VSUBUWS VX4(1664)
 #define VSUBUWM VX4(1152)
+#define VSUBUDM VX4(1216)  /* v2.07 */
 
 #define VMAXSB VX4(258)
 #define VMAXSH VX4(322)
 #define VMAXSW VX4(386)
+#define VMAXSD VX4(450)   /* v2.07 */
 #define VMAXUB VX4(2)
 #define VMAXUH VX4(66)
 #define VMAXUW VX4(130)
+#define VMAXUD VX4(194)   /* v2.07 */
 #define VMINSB VX4(770)
 #define VMINSH VX4(834)
 #define VMINSW VX4(898)
+#define VMINSD VX4(962)   /* v2.07 */
 #define VMINUB VX4(514)
 #define VMINUH VX4(578)
 #define VMINUW VX4(642)
+#define VMINUD VX4(706)   /* v2.07 */
 
 #define VCMPEQUB   VX4(6)
 #define VCMPEQUH   VX4(70)
 #define VCMPEQUW   VX4(134)
+#define VCMPEQUD   VX4(199)   /* v2.07 */
 #define VCMPGTSB   VX4(774)
 #define VCMPGTSH   VX4(838)
 #define VCMPGTSW   VX4(902)
+#define VCMPGTSD   VX4(967)   /* v2.07 */
 #define VCMPGTUB   VX4(518)
 #define VCMPGTUH   VX4(582)
 #define VCMPGTUW   VX4(646)
+#define VCMPGTUD   VX4(711)   /* v2.07 */
 
 #define VSLB   VX4(260)
 #define VSLH   VX4(324)
 #define VSLW   VX4(388)
+#define VSLD   VX4(1476)  /* v2.07 */
 #define VSRB   VX4(516)
 #define VSRH   VX4(580)
 #define VSRW   VX4(644)
+#define VSRD   VX4(1732)  /* v2.07 */
 #define VSRAB  VX4(772)
 #define VSRAH  VX4(836)
 #define VSRAW  VX4(900)
+#define VSRAD  VX4(964)   /* v2.07 */
 #define VRLB   VX4(4)
 #define VRLH   VX4(68)
 #define VRLW   VX4(132)
+#define VRLD   VX4(196)   /* v2.07 */
 
 #define VMULEUB VX4(520)
 #define VMULEUH VX4(584)
+#define VMULEUW VX4(648)   /* v2.07 */
 #define VMULOUB VX4(8)
 #define VMULOUH VX4(72)
+#define VMULOUW VX4(136)   /* v2.07 */
+#define VMULUWM VX4(137)   /* v2.07 */
 #define VMSUMUHM   VX4(38)
 
 #define VMRGHB VX4(12)
@@ -555,6 +574,9 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define VNOR   VX4(1284)
 #define VOR    VX4(1156)
 #define VXOR   VX4(1220)
+#define VEQV   VX4(1668)  /* v2.07 */
+#define VNAND  VX4(1412)  /* v2.07 */
+#define VORC   VX4(1348)  /* v2.07 */
 
 #define VSPLTB VX4(524)
 #define VSPLTH VX4(588)
@@ -568,6 +590,11 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define XXPERMDI   (OPCD(60) | (10 << 3))   /* v2.06 */
 #define XXSEL  (OPCD(60) | (3 << 4))    /* v2.06 */
 
+#define MFVSRD XO31(51)   /* v2.07 */
+#define MFVSRWZ XO31(115)  /* v2.07 */
+#define MTVSRD XO31(179)  /* v2.07 */
+#define MTVSRWZ XO31(179)  /* v2.07 */
+
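
For readers following the opcode tables, here is a minimal Python sketch of how such macros compose a 32-bit instruction word. The OPCD, XO31 and VX4 helpers are not shown in the hunks above; the definitions below are assumptions about their layout (primary opcode in the top 6 bits, X-form extended opcode shifted left by one bit), so treat the bit positions as illustrative rather than as the backend's actual code.

```python
# Illustrative model of the opcode helper macros; the layouts are assumed,
# not copied from tcg-target.inc.c.

def OPCD(opc):
    """Place the 6-bit primary opcode in the top bits of a 32-bit word."""
    return opc << 26

def XO31(xo):
    """X-form instruction under primary opcode 31 (extended opcode at bit 1)."""
    return OPCD(31) | (xo << 1)

def VX4(xo):
    """VX-form vector instruction under primary opcode 4."""
    return OPCD(4) | xo

# A few of the values added by the patch.
LXSIWZX = XO31(12)
VADDUDM = VX4(192)
MTVSRD = XO31(179)
MTVSRWZ = XO31(179)   # value exactly as posted in the patch

if __name__ == "__main__":
    for name in ("LXSIWZX", "VADDUDM", "MTVSRD", "MTVSRWZ"):
        print("%-8s 0x%08x" % (name, globals()[name]))
```

Under this model MTVSRD and MTVSRWZ, both posted with extended opcode 179, encode to the same word, which may be worth double-checking against the ISA document.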

[Qemu-devel] [PATCH v4 5/7] tcg/ppc: Update vector support to v2.06

2019-05-18 Thread Richard Henderson

This includes double-word loads and stores, double-word load and splat,
double-word permute, and bit select.  All of which require multiple
operations in the base Altivec instruction set.

Signed-off-by: Richard Henderson 
---
 tcg/ppc/tcg-target.h |  5 ++--
 tcg/ppc/tcg-target.inc.c | 51 
 2 files changed, 50 insertions(+), 6 deletions(-)

diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index a130192cbd..40544f996d 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -60,6 +60,7 @@ typedef enum {
 
 extern bool have_isa_altivec;
 extern bool have_isa_2_06;
+extern bool have_isa_2_06_vsx;
 extern bool have_isa_3_00;
 
 /* optional instructions automatically implemented */
@@ -141,7 +142,7 @@ extern bool have_isa_3_00;
  * instruction and substituting two 32-bit stores makes the generated
  * code quite large.
  */
-#define TCG_TARGET_HAS_v64  0
+#define TCG_TARGET_HAS_v64  have_isa_2_06_vsx
 #define TCG_TARGET_HAS_v128 have_isa_altivec
 #define TCG_TARGET_HAS_v256 0
 
@@ -157,7 +158,7 @@ extern bool have_isa_3_00;
 #define TCG_TARGET_HAS_mul_vec  1
 #define TCG_TARGET_HAS_sat_vec  1
 #define TCG_TARGET_HAS_minmax_vec   1
-#define TCG_TARGET_HAS_bitsel_vec   0
+#define TCG_TARGET_HAS_bitsel_vec   have_isa_2_06_vsx
 #define TCG_TARGET_HAS_cmpsel_vec   0
 
 void flush_icache_range(uintptr_t start, uintptr_t stop);
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index 3219df2e90..6cb8c8f0eb 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -66,6 +66,7 @@ static tcg_insn_unit *tb_ret_addr;
 
 bool have_isa_altivec;
 bool have_isa_2_06;
+bool have_isa_2_06_vsx;
 bool have_isa_3_00;
 
 #define HAVE_ISA_2_06  have_isa_2_06
@@ -470,9 +471,12 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define LVEBX  XO31(7)
 #define LVEHX  XO31(39)
 #define LVEWX  XO31(71)
+#define LXSDX  XO31(588)  /* v2.06 */
+#define LXVDSX XO31(332)  /* v2.06 */
 
 #define STVX   XO31(231)
 #define STVEWX XO31(199)
+#define STXSDX XO31(716)  /* v2.06 */
 
 #define VADDSBS VX4(768)
 #define VADDUBS VX4(512)
@@ -561,6 +565,9 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 
 #define VSLDOI VX4(44)
 
+#define XXPERMDI   (OPCD(60) | (10 << 3))   /* v2.06 */
+#define XXSEL  (OPCD(60) | (3 << 4))    /* v2.06 */
+
 #define RT(r) ((r)<<21)
 #define RS(r) ((r)<<21)
 #define RA(r) ((r)<<16)
@@ -887,11 +894,21 @@ static void tcg_out_dupi_vec(TCGContext *s, TCGType type, TCGReg ret,
 add = 0;
 }
 
-load_insn = LVX | VRT(ret) | RB(TCG_REG_TMP1);
-if (TCG_TARGET_REG_BITS == 64) {
-new_pool_l2(s, rel, s->code_ptr, add, val, val);
+if (have_isa_2_06_vsx) {
+load_insn = type == TCG_TYPE_V64 ? LXSDX : LXVDSX;
+load_insn |= VRT(ret) | RB(TCG_REG_TMP1) | 1;
+if (TCG_TARGET_REG_BITS == 64) {
+new_pool_label(s, val, rel, s->code_ptr, add);
+} else {
+new_pool_l2(s, rel, s->code_ptr, add, val, val);
+}
 } else {
-new_pool_l4(s, rel, s->code_ptr, add, val, val, val, val);
+load_insn = LVX | VRT(ret) | RB(TCG_REG_TMP1);
+if (TCG_TARGET_REG_BITS == 64) {
+new_pool_l2(s, rel, s->code_ptr, add, val, val);
+} else {
+new_pool_l4(s, rel, s->code_ptr, add, val, val, val, val);
+}
 }
 
 if (USE_REG_TB) {
@@ -1138,6 +1155,10 @@ static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret,
 /* fallthru */
 case TCG_TYPE_V64:
 tcg_debug_assert(ret >= 32);
+if (have_isa_2_06_vsx) {
+tcg_out_mem_long(s, 0, LXSDX | 1, ret & 31, base, offset);
+break;
+}
 assert((offset & 7) == 0);
 tcg_out_mem_long(s, 0, LVX, ret & 31, base, offset & -16);
 if (offset & 8) {
@@ -1181,6 +1202,10 @@ static void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg,
 /* fallthru */
 case TCG_TYPE_V64:
 tcg_debug_assert(arg >= 32);
+if (have_isa_2_06_vsx) {
+tcg_out_mem_long(s, 0, STXSDX | 1, arg & 31, base, offset);
+break;
+}
 assert((offset & 7) == 0);
 if (offset & 8) {
 tcg_out_vsldoi(s, TCG_VEC_TMP1, arg, arg, 8);
@@ -2916,6 +2941,8 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 case INDEX_op_shri_vec:
 case INDEX_op_sari_vec:
 return vece <= MO_32 ? -1 : 0;
+case INDEX_op_bitsel_vec:
+return have_isa_2_06_vsx;
 default:
 return 0;
 }
@@ -2942,6 +2969,10 @@ static bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
 tcg_out32(s, VSPLTW | VRT(dst) | VRB(src) | (1 << 16));
 break;
 case MO_64:
+if (have_isa_2_06_vsx) {
+tcg_out32(s, XXPERMDI | 7

Re: [Qemu-devel] QEMU on OpenBSD is broken?

2019-05-18 Thread Brad Smith


I just noticed when I had replied that my e-mail was sent from a different
name, by accident, as I was testing something with my e-mail client.

On 5/18/2019 5:27 PM, Jim Payne wrote:

On 5/16/2019 9:04 PM, Thomas Huth wrote:


On 10/05/2019 12.46, Gerd Hoffmann wrote:

This patch series changes the way virtual machines for test builds are
managed.  They are created locally on the developer machine now.  The
installer is booted on the serial console and the script walks through
the dialogs to install and configure the guest.

That takes the download.patchew.org server out of the loop and makes it
a lot easier to tweak the guest images (adding build dependencies for
example).

The install scripts take care to apply host proxy settings (from *_proxy
environment variables) to the guest, so any package downloads will be
routed through the proxy and can be cached that way.  This also makes
them work behind strict firewalls.

There are also a bunch of smaller tweaks for tests/vm to fix issues I
was struggling with.  See commit messages of individual patches for
details.

Gerd Hoffmann (13):
   scripts: use git archive in archive-source
   tests/vm: send proxy environment variables over ssh
   tests/vm: use ssh with pty unconditionally
   tests/vm: run test builds on snapshot
   tests/vm: proper guest shutdown
   tests/vm: add vm-boot-{ssh,serial}- targets
   tests/vm: add DEBUG=1 to help text
   tests/vm: serial console support helpers
   tests/vm: openbsd autoinstall, using serial console
   tests/vm: freebsd autoinstall, using serial console
   tests/vm: netbsd autoinstall, using serial console
   tests/vm: fedora autoinstall, using serial console
   tests/vm: ubuntu.i386: apt proxy setup

freebsd, netbsd and fedora targets work fine for me, so for the patches
1 - 8 and 10 - 12 :

Tested-by: Thomas Huth 

openbsd still fails for me:

   TEST    check-qtest-arm: tests/tmp105-test
   TEST    check-qtest-arm: tests/pca9552-test
   TEST    check-qtest-arm: tests/ds1338-test
   TEST    check-qtest-arm: tests/microbit-test
   TEST    check-qtest-arm: tests/m25p80-test
   TEST    check-qtest-arm: tests/test-arm-mptimer
   TEST    check-qtest-arm: tests/boot-serial-test
qemu-system-arm: cannot set up guest memory 'ram': Cannot allocate memory
Broken pipe


How much memory is QEMU trying to allocate here?

The default maximum data size is set to 768MB. If there is a requirement
to go beyond that, then the default has to be adjusted in /etc/login.conf.
The relevant settings are datasize-max and datasize-cur:

default:\
    :path=/usr/bin /bin /usr/sbin /sbin /usr/X11R6/bin /usr/local/bin /usr/local/sbin:\
    :datasize-max=768M:\
    :datasize-cur=768M:\
    :maxproc-max=256:\
    :maxproc-cur=128:\
    :openfiles-max=1024:\
    :openfiles-cur=512:\
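
To see which limit a process actually runs under, something like the following Python sketch could be used. It only reads the current data-size limit and compares it with a requested guest RAM size; the helper names and the 768/1024 MiB figures in the usage note are illustrative, not taken from the test suite.

```python
import resource

MIB = 1024 * 1024

def data_limit_mib():
    """Return (soft, hard) RLIMIT_DATA in MiB; None means unlimited."""
    def to_mib(value):
        return None if value == resource.RLIM_INFINITY else value // MIB
    soft, hard = resource.getrlimit(resource.RLIMIT_DATA)
    return to_mib(soft), to_mib(hard)

def fits(request_mib, soft_mib):
    """True if an allocation of request_mib could fit under the soft limit."""
    return soft_mib is None or request_mib <= soft_mib

if __name__ == "__main__":
    soft, hard = data_limit_mib()
    print("datasize-cur (MiB):", soft, "datasize-max (MiB):", hard)
```

With the default 768M limit, fits(1024, 768) is False, which would be consistent with the "Cannot allocate memory" failure if the test asks for a guest of that size.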

[Qemu-devel] [PATCHv2 1/3] RISC-V: Raise access fault exceptions on PMP violations

2019-05-18 Thread Hesham Almatary

Section 3.6 of the RISC-V v1.10 privileged specification states that PMP
violations report "access exceptions." The current PMP implementation has
a bug which wrongly reports "page exceptions" on PMP violations.

This patch fixes this bug by reporting the correct PMP access exception
trap values.

Signed-off-by: Hesham Almatary 
---
 target/riscv/cpu_helper.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 41d6db41c3..b48de36114 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -318,12 +318,13 @@ restart:
 }

 static void raise_mmu_exception(CPURISCVState *env, target_ulong address,
-MMUAccessType access_type)
+MMUAccessType access_type, bool pmp_violation)
 {
 CPUState *cs = CPU(riscv_env_get_cpu(env));
 int page_fault_exceptions =
 (env->priv_ver >= PRIV_VERSION_1_10_0) &&
-get_field(env->satp, SATP_MODE) != VM_1_10_MBARE;
+get_field(env->satp, SATP_MODE) != VM_1_10_MBARE &&
+!pmp_violation;
 switch (access_type) {
 case MMU_INST_FETCH:
 cs->exception_index = page_fault_exceptions ?
@@ -389,6 +390,7 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, int size,
 CPURISCVState *env = &cpu->env;
 hwaddr pa = 0;
 int prot;
+bool pmp_violation = false;
 int ret = TRANSLATE_FAIL;

 qemu_log_mask(CPU_LOG_MMU, "%s ad %" VADDR_PRIx " rw %d mmu_idx %d\n",
@@ -402,6 +404,7 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, int size,

 if (riscv_feature(env, RISCV_FEATURE_PMP) &&
 !pmp_hart_has_privs(env, pa, TARGET_PAGE_SIZE, 1 << access_type)) {
+pmp_violation = true;
 ret = TRANSLATE_FAIL;
 }
 if (ret == TRANSLATE_SUCCESS) {
@@ -411,7 +414,7 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, int size,
 } else if (probe) {
 return false;
 } else {
-raise_mmu_exception(env, address, access_type);
+raise_mmu_exception(env, address, access_type, pmp_violation);
 riscv_raise_exception(env, cs->exception_index, retaddr);
 }
 #else
--
2.17.1
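
The effect of the new pmp_violation argument can be modelled in a few lines of Python. This is a sketch under stated assumptions, not the QEMU code: the enum names are made up for illustration, paging_enabled stands in for the priv_ver and satp checks, and the cause numbers are the exception codes from the privileged specification.

```python
from enum import IntEnum

class Access(IntEnum):
    LOAD = 0
    STORE = 1
    FETCH = 2

class Cause(IntEnum):
    INST_ACCESS_FAULT = 1
    LOAD_ACCESS_FAULT = 5
    STORE_AMO_ACCESS_FAULT = 7
    INST_PAGE_FAULT = 12
    LOAD_PAGE_FAULT = 13
    STORE_PAGE_FAULT = 15

# (page fault, access fault) pair for each access type
_FAULTS = {
    Access.FETCH: (Cause.INST_PAGE_FAULT, Cause.INST_ACCESS_FAULT),
    Access.LOAD: (Cause.LOAD_PAGE_FAULT, Cause.LOAD_ACCESS_FAULT),
    Access.STORE: (Cause.STORE_PAGE_FAULT, Cause.STORE_AMO_ACCESS_FAULT),
}

def select_exception(access, paging_enabled, pmp_violation):
    """Pick the trap cause: a PMP violation always forces an access fault."""
    page_fault = paging_enabled and not pmp_violation
    page, acc = _FAULTS[access]
    return page if page_fault else acc
```

Before the patch the last term was missing, so a PMP failure with paging enabled was reported as a page fault.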

[Qemu-devel] [PATCHv2 2/3] RISC-V: Only Check PMP if MMU translation succeeds

2019-05-18 Thread Hesham Almatary

The current implementation unnecessarily checks for PMP even if MMU translation
failed. This may trigger a wrong PMP access exception instead of
a page exception.

For example, the very first instruction fetched after the first satp write in
S-Mode will trigger a PMP access fault instead of an instruction fetch page
fault.

This patch prioritises MMU exceptions over PMP exceptions and only checks for
PMP if MMU translation succeeds.

Signed-off-by: Hesham Almatary 
---
 target/riscv/cpu_helper.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index b48de36114..7c7282c680 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -403,6 +403,7 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, int size,
   " prot %d\n", __func__, address, ret, pa, prot);

 if (riscv_feature(env, RISCV_FEATURE_PMP) &&
+(ret == TRANSLATE_SUCCESS) &&
 !pmp_hart_has_privs(env, pa, TARGET_PAGE_SIZE, 1 << access_type)) {
 pmp_violation = true;
 ret = TRANSLATE_FAIL;
--
2.17.1

[Qemu-devel] [PATCHv3 3/3] RISC-V: Check PMP during Page Table Walks

2019-05-18 Thread Hesham Almatary

The PMP should be checked when doing a page table walk, and an access
fault exception reported if the to-be-read PTE address fails the PMP check.

Suggested-by: Jonathan Behrens 
Signed-off-by: Hesham Almatary 
---
 target/riscv/cpu.h|  1 +
 target/riscv/cpu_helper.c | 10 +-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index c17184f4e4..ab3ba3f15a 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -94,6 +94,7 @@ enum {
 #define PRIV_VERSION_1_09_1 0x00010901
 #define PRIV_VERSION_1_10_0 0x00011000

+#define TRANSLATE_PMP_FAIL 2
 #define TRANSLATE_FAIL 1
 #define TRANSLATE_SUCCESS 0
 #define NB_MMU_MODES 4
diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 7c7282c680..d0b0f9cf88 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -211,6 +211,12 @@ restart:

 /* check that physical address of PTE is legal */
 target_ulong pte_addr = base + idx * ptesize;
+
+if (riscv_feature(env, RISCV_FEATURE_PMP) &&
+!pmp_hart_has_privs(env, pte_addr, sizeof(target_ulong),
+1 << MMU_DATA_LOAD)) {
+return TRANSLATE_PMP_FAIL;
+}
 #if defined(TARGET_RISCV32)
 target_ulong pte = ldl_phys(cs->as, pte_addr);
 #elif defined(TARGET_RISCV64)
@@ -405,8 +411,10 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, int size,
 if (riscv_feature(env, RISCV_FEATURE_PMP) &&
 (ret == TRANSLATE_SUCCESS) &&
 !pmp_hart_has_privs(env, pa, TARGET_PAGE_SIZE, 1 << access_type)) {
+ret = TRANSLATE_PMP_FAIL;
+}
+if (ret == TRANSLATE_PMP_FAIL) {
 pmp_violation = true;
-ret = TRANSLATE_FAIL;
 }
 if (ret == TRANSLATE_SUCCESS) {
 tlb_set_page(cs, address & TARGET_PAGE_MASK, pa & TARGET_PAGE_MASK,
--
2.17.1
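
The resulting control flow in riscv_cpu_tlb_fill can be summarised with a small Python model. It is only a sketch: the function name and boolean parameters are hypothetical, and the page table walk itself is reduced to a result code passed in by the caller.

```python
TRANSLATE_SUCCESS = 0
TRANSLATE_FAIL = 1
TRANSLATE_PMP_FAIL = 2

def tlb_fill_result(walk_result, pmp_enabled, pmp_allows_final_access):
    """Return (ret, pmp_violation) after the post-walk PMP check.

    walk_result is what the page table walk returned; it may already be
    TRANSLATE_PMP_FAIL if a PTE address failed its PMP check.
    """
    ret = walk_result
    # The final physical address is only checked if translation succeeded.
    if pmp_enabled and ret == TRANSLATE_SUCCESS and not pmp_allows_final_access:
        ret = TRANSLATE_PMP_FAIL
    # Both PMP failure sources end up flagged the same way.
    pmp_violation = ret == TRANSLATE_PMP_FAIL
    return ret, pmp_violation
```

A plain TRANSLATE_FAIL keeps pmp_violation false, so an ordinary translation failure is still reported as a page fault when paging is enabled.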

Re: [Qemu-devel] [Qemu-riscv] [PATCH 1/2] RISC-V: Raise access fault exceptions on PMP violations

2019-05-18 Thread Hesham Almatary

Hi Jonathan,

Thanks for your feedback.

On Sat, 18 May 2019 at 22:51, Jonathan Behrens  wrote:
>
> This patch assumes that translation failure should always raise a paging 
> fault, but it should be possible for it to raise an access fault as well 
> (since according to the spec "PMP  checks  are  also  applied  to  page-table 
>  accesses  for  virtual-address translation, for which the effective 
> privilege mode is S."). I think the code to actually do the PMP checking 
> during page table walks is currently unimplemented though...
>

The patch actually fixes (rather than assumes) one issue of the
current implementation which always raises a paging fault "when
translation succeeds and PMP fails". The second issue that you report
here which happens "when the PTW fails the PMP check" could be another
future separate fix.

I am happy to submit another patch to fix the second issue.

> Jonathan
>
> On Sat, May 18, 2019 at 3:14 PM Hesham Almatary 
>  wrote:
>>
>> Section 3.6 in RISC-V v1.10 privilege specification states that PMP 
>> violations
>> report "access exceptions." The current PMP implementation has
>> a bug which wrongly reports "page exceptions" on PMP violations.
>>
>> This patch fixes this bug by reporting the correct PMP access exceptions
>> trap values.
>>
>> Signed-off-by: Hesham Almatary 
>> ---
>>  target/riscv/cpu_helper.c | 9 ++---
>>  1 file changed, 6 insertions(+), 3 deletions(-)
>>
>> diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
>> index 41d6db41c3..b48de36114 100644
>> --- a/target/riscv/cpu_helper.c
>> +++ b/target/riscv/cpu_helper.c
>> @@ -318,12 +318,13 @@ restart:
>>  }
>>
>>  static void raise_mmu_exception(CPURISCVState *env, target_ulong address,
>> -MMUAccessType access_type)
>> +MMUAccessType access_type, bool 
>> pmp_violation)
>>  {
>>  CPUState *cs = CPU(riscv_env_get_cpu(env));
>>  int page_fault_exceptions =
>>  (env->priv_ver >= PRIV_VERSION_1_10_0) &&
>> -get_field(env->satp, SATP_MODE) != VM_1_10_MBARE;
>> +get_field(env->satp, SATP_MODE) != VM_1_10_MBARE &&
>> +!pmp_violation;
>>  switch (access_type) {
>>  case MMU_INST_FETCH:
>>  cs->exception_index = page_fault_exceptions ?
>> @@ -389,6 +390,7 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, int 
>> size,
>>  CPURISCVState *env = &cpu->env;
>>  hwaddr pa = 0;
>>  int prot;
>> +bool pmp_violation = false;
>>  int ret = TRANSLATE_FAIL;
>>
>>  qemu_log_mask(CPU_LOG_MMU, "%s ad %" VADDR_PRIx " rw %d mmu_idx %d\n",
>> @@ -402,6 +404,7 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, int 
>> size,
>>
>>  if (riscv_feature(env, RISCV_FEATURE_PMP) &&
>>  !pmp_hart_has_privs(env, pa, TARGET_PAGE_SIZE, 1 << access_type)) {
>> +pmp_violation = true;
>>  ret = TRANSLATE_FAIL;
>>  }
>>  if (ret == TRANSLATE_SUCCESS) {
>> @@ -411,7 +414,7 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, int 
>> size,
>>  } else if (probe) {
>>  return false;
>>  } else {
>> -raise_mmu_exception(env, address, access_type);
>> +raise_mmu_exception(env, address, access_type, pmp_violation);
>>  riscv_raise_exception(env, cs->exception_index, retaddr);
>>  }
>>  #else
>> --
>> 2.17.1
>>
>>

Re: [Qemu-devel] [PATCH v2 10/13] tests/vm: freebsd autoinstall, using serial console

2019-05-18 Thread Philippe Mathieu-Daudé

Hi Gerd,

On 5/10/19 12:46 PM, Gerd Hoffmann wrote:
> Instead of fetching the prebuilt image from patchew download the install
> iso and prepare the image locally.  Install to disk, using the serial
> console.  Create qemu user, configure ssh login.  Install packages
> needed for qemu builds.

I'm impressed by how nicely this works :)

3 comments so far.

1/ We could record (in tests/vm/freebsd header?) roughly how much local
storage will be used (or display it in 'make vm-help'?). FYI this image
takes ~3.1GiB.

2/ "Autoboot in 9 seconds, hit [Enter] to boot or any other key to stop"

3/ I am a bit annoyed it overwrote my previous
~/.cache/qemu-vm/images/freebsd.img VM. Not sure what's the best hash to
use, maybe "git log -n 1 --pretty=format:%H -- tests/vm/freebsd"?
(Similarly for other images).
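
To make point 3/ concrete, a rough Python sketch of the idea follows. The function names, the 12-character truncation and the file naming scheme are all hypothetical; only the git command line is the one quoted above.

```python
import os
import subprocess

def script_revision(script_path, repo_dir="."):
    """Hash of the last commit touching script_path ('' if there is none)."""
    out = subprocess.check_output(
        ["git", "log", "-n", "1", "--pretty=format:%H", "--", script_path],
        cwd=repo_dir)
    return out.decode().strip()

def image_path(cache_dir, name, revision):
    """Build a per-revision image file name so older images are kept."""
    suffix = "-" + revision[:12] if revision else ""
    return os.path.join(cache_dir, "%s%s.img" % (name, suffix))
```

With that, image_path(cache, "freebsd", script_revision("tests/vm/freebsd")) would change whenever the install script changes, instead of overwriting freebsd.img in place.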

> Note that freebsd package downloads are delivered as non-cachable
> content, so I had to configure squid with "ignore-no-store
> ignore-private ignore-reload" for pkgmir.geo.freebsd.org to make the
> caching actually work.
> 
> Signed-off-by: Gerd Hoffmann 

Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 

> ---
>  tests/vm/freebsd | 175 ---
>  1 file changed, 165 insertions(+), 10 deletions(-)
> 
> diff --git a/tests/vm/freebsd b/tests/vm/freebsd
> index b0066017a617..57e5c97f3b26 100755
> --- a/tests/vm/freebsd
> +++ b/tests/vm/freebsd
> @@ -2,43 +2,198 @@
>  #
>  # FreeBSD VM image
>  #
> -# Copyright 2017 Red Hat Inc.
> +# Copyright 2017-2019 Red Hat Inc.
>  #
>  # Authors:
>  #  Fam Zheng 
> +#  Gerd Hoffmann 
>  #
>  # This code is licensed under the GPL version 2 or later.  See
>  # the COPYING file in the top-level directory.
>  #
>  
>  import os
> +import re
>  import sys
> +import time
> +import socket
>  import subprocess
>  import basevm
>  
>  class FreeBSDVM(basevm.BaseVM):
>  name = "freebsd"
>  arch = "x86_64"
> +
> +link = "https://download.freebsd.org/ftp/releases/ISO-IMAGES/12.0/FreeBSD-12.0-RELEASE-amd64-disc1.iso.xz"
> +csum = "1d40015bea89d05b8bd13e2ed80c40b522a9ec1abd8e7c8b80954fb485fb99db"
> +size = "20G"
> +pkgs = [
> +# build tools
> +"git",
> +"pkgconf",
> +"bzip2",
> +
> +# gnu tools
> +"bash",
> +"gmake",
> +"gsed",
> +"flex", "bison",
> +
> +# libs: crypto
> +"gnutls",
> +
> +# libs: images
> +"jpeg-turbo",
> +"png",
> +
> +# libs: ui
> +"sdl2",
> +"gtk3",
> +"libxkbcommon",
> +
> +# libs: opengl
> +"libepoxy",
> +"mesa-libs",
> +]
> +
>  BUILD_SCRIPT = """
>  set -e;
> -rm -rf /var/tmp/qemu-test.*
> -cd $(mktemp -d /var/tmp/qemu-test.XX);
> +rm -rf /home/qemu/qemu-test.*
> +cd $(mktemp -d /home/qemu/qemu-test.XX);
> +mkdir src build; cd src;
>  tar -xf /dev/vtbd1;
> -./configure {configure_opts};
> +cd ../build
> +../src/configure --python=python3.6 {configure_opts};
>  gmake --output-sync -j{jobs} {target} {verbose};
>  """
>  
> +def console_boot_serial(self):
> +self.console_wait_send("Autoboot", "3")
> +self.console_wait_send("OK", "set console=comconsole\n")
> +self.console_wait_send("OK", "boot\n")
> +
>  def build_image(self, img):
> -cimg = self._download_with_cache("http://download.patchew.org/freebsd-11.1-amd64.img.xz",
> -sha256sum='adcb771549b37bc63826c501f05121a206ed3d9f55f49145908f7e1432d65891')
> -img_tmp_xz = img + ".tmp.xz"
> +self.print_step("Downloading install iso")
> +cimg = self._download_with_cache(self.link, sha256sum=self.csum)
>  img_tmp = img + ".tmp"
> -sys.stderr.write("Extracting the image...\n")
> -subprocess.check_call(["cp", "-f", cimg, img_tmp_xz])
> -subprocess.check_call(["xz", "-dvf", img_tmp_xz])
> +iso = img + ".install.iso"
> +iso_xz = iso + ".xz"
> +
> +self.print_step("Preparing iso and disk image")
> +subprocess.check_call(["cp", "-f", cimg, iso_xz])
> +subprocess.check_call(["xz", "-dvf", iso_xz])
> +subprocess.check_call(["qemu-img", "create", "-f", "qcow2",
> +   img_tmp, self.size])
> +
> +self.print_step("Booting installer")
> +self.boot(img_tmp, extra_args = [
> +"-machine", "graphics=off",
> +"-cdrom", iso
> +])
> +self.console_init()
> +self.console_boot_serial()
> +self.console_wait_send("Console type",  "xterm\n")
> +
> +# pre-install configuration
> +self.console_wait_send("Welcome",   "\n")
> +self.console_wait_send("Keymap Selection",  "\n")
> +self.console_wait_send("Set Hostname",  "freebsd\n")
> +

Re: [Qemu-devel] [PATCH v2 07/13] tests/vm: add DEBUG=1 to help text

2019-05-18 Thread Philippe Mathieu-Daudé

Hi Gerd,

On 5/10/19 12:46 PM, Gerd Hoffmann wrote:
> Signed-off-by: Gerd Hoffmann 
> ---
>  tests/vm/Makefile.include | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/tests/vm/Makefile.include b/tests/vm/Makefile.include
> index 47084d5717c6..8714b5947958 100644
> --- a/tests/vm/Makefile.include
> +++ b/tests/vm/Makefile.include
> @@ -25,6 +25,8 @@ vm-test:
>   @echo "  vm-boot-ssh- - Boot guest and login via ssh"
>   @echo
>   @echo "Special variables:"
> + @echo "DEBUG=1   - be verbose, also start 
> interactive"
> + @echo "shell on build failures"

Can you replace the tabs by spaces? See:

Special variables:
DEBUG=1  - be verbose, also start interactive
   shell on build failures
BUILD_TARGET=foo - override the build target
TARGET_LIST=a,b,c- Override target list in builds.
EXTRA_CONFIGURE_OPTS="..."

Using spaces:
Reviewed-by: Philippe Mathieu-Daudé 

>   @echo "BUILD_TARGET=foo  - override the build target"
>   @echo "TARGET_LIST=a,b,c - Override target list in 
> builds."
>   @echo 'EXTRA_CONFIGURE_OPTS="..."'
>

Re: [Qemu-devel] [PATCH v2 13/13] tests/vm: ubuntu.i386: apt proxy setup

2019-05-18 Thread Philippe Mathieu-Daudé

On 5/10/19 12:46 PM, Gerd Hoffmann wrote:
> Configure apt proxy so package downloads
> can be cached and can pass firewalls.
> 
> Signed-off-by: Gerd Hoffmann 
> ---
>  tests/vm/ubuntu.i386 | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/tests/vm/ubuntu.i386 b/tests/vm/ubuntu.i386
> index a22d137e76df..b869afd212fa 100755
> --- a/tests/vm/ubuntu.i386
> +++ b/tests/vm/ubuntu.i386
> @@ -51,6 +51,10 @@ class UbuntuX86VM(basevm.BaseVM):
>"ssh-authorized-keys:\n",
>"- %s\n" % basevm.SSH_PUB_KEY,
>"locale: en_US.UTF-8\n"])
> +proxy = os.environ.get("http_proxy")
> +if not proxy is None:
> +udata.writelines(["apt:\n",
> +  "  proxy: %s" % proxy])
>  udata.close()
>  subprocess.check_call(["genisoimage", "-output", "cloud-init.iso",
> "-volid", "cidata", "-joliet", "-rock",
> 

Reviewed-by: Philippe Mathieu-Daudé
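
As a side note, the added lines could be factored into a small helper. This is a sketch only: apt_proxy_lines is a hypothetical name, it uses the more common "is not None" spelling, and it terminates the last line with a newline, which the posted hunk does not.

```python
def apt_proxy_lines(environ):
    """Return the cloud-init user-data lines configuring an apt proxy.

    environ is any mapping such as os.environ; an empty list is returned
    when http_proxy is not set.
    """
    proxy = environ.get("http_proxy")
    if proxy is None:
        return []
    return ["apt:\n",
            "  proxy: %s\n" % proxy]
```

In the patch context this would be used as udata.writelines(apt_proxy_lines(os.environ)).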

Re: [Qemu-devel] [PATCH v2 02/13] tests/vm: send proxy environment variables over ssh

2019-05-18 Thread Philippe Mathieu-Daudé

On 5/10/19 12:46 PM, Gerd Hoffmann wrote:
> Packages are fetched via proxy that way, if configured on the host.
> That might be required to pass firewalls, and it allows to route
> package downloads through a caching proxy server.
> 
> Needs AcceptEnv setup in sshd_config on the guest side to work.
> 
> Signed-off-by: Gerd Hoffmann 

Reviewed-by: Philippe Mathieu-Daudé 

> ---
>  tests/vm/basevm.py | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/tests/vm/basevm.py b/tests/vm/basevm.py
> index 0556bdcf9e9f..6b46674f4497 100755
> --- a/tests/vm/basevm.py
> +++ b/tests/vm/basevm.py
> @@ -38,6 +38,13 @@ class BaseVM(object):
>  GUEST_PASS = "qemupass"
>  ROOT_PASS = "qemupass"
>  
> +envvars = [
> +"https_proxy",
> +"http_proxy",
> +"ftp_proxy",
> +"no_proxy",
> +]
> +
>  # The script to run in the guest that builds QEMU
>  BUILD_SCRIPT = ""
>  # The guest name, to be overridden by subclasses
> @@ -105,6 +112,8 @@ class BaseVM(object):
> "-o", "UserKnownHostsFile=" + os.devnull,
> "-o", "ConnectTimeout=1",
> "-p", self.ssh_port, "-i", self._ssh_key_file]
> +for var in self.envvars:
> +ssh_cmd += ['-o', "SendEnv=%s" % var ]
>  if interactive:
>  ssh_cmd += ['-t']
>  assert not isinstance(cmd, str)
>
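
For reference, the host and guest halves of this mechanism can be sketched as below. The helper names are hypothetical; the guest needs a matching AcceptEnv line in its sshd_config, as the commit message notes.

```python
ENVVARS = ["https_proxy", "http_proxy", "ftp_proxy", "no_proxy"]

def sendenv_options(envvars):
    """ssh client options asking ssh to forward each variable to the guest."""
    opts = []
    for var in envvars:
        opts += ["-o", "SendEnv=%s" % var]
    return opts

def acceptenv_line(envvars):
    """Matching sshd_config line for the guest side."""
    return "AcceptEnv " + " ".join(envvars)

if __name__ == "__main__":
    print(" ".join(sendenv_options(ENVVARS)))
    print(acceptenv_line(ENVVARS))
```

Variables that are unset on the host are simply not sent, so listing all four unconditionally should be harmless.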

Re: [Qemu-devel] [PATCH for-4.1 2/2] target/riscv: Add support for -bios "firmware_filename" flag

2019-05-18 Thread Jonathan Behrens

> I've never been fully convinced of this, why not just use the generic
> loader?

If I understand correctly, you are proposing passing bbl (or other firmware)
with the -kernel flag, and then vmlinux (or another kernel) with the -initrd flag?
Wouldn't this result in losing the ability to pass a real init ramdisk to
Linux? It also seems to open the possibility for strange bugs/compatibility
issues later if firmware starts recognizing any "initrd" entries in the
device tree as kernel code to jump into.

I do wonder though how compatible the current design is with providing
default firmware for riscv in the future.

> This should be in a generic boot.c file and support added to all RISC-V
> boards.

I can do this for v2.

Jonathan

Re: [Qemu-devel] [Qemu-riscv] [PATCH 1/2] RISC-V: Raise access fault exceptions on PMP violations

2019-05-18 Thread Jonathan Behrens

This patch assumes that translation failure should always raise a paging
fault, but it should be possible for it to raise an access fault as well
(since according to the spec "PMP  checks  are  also  applied  to
page-table  accesses  for  virtual-address translation, for which the
effective privilege mode is S."). I think the code to actually do the PMP
checking during page table walks is currently unimplemented though...

Jonathan

On Sat, May 18, 2019 at 3:14 PM Hesham Almatary <
hesham.almat...@cl.cam.ac.uk> wrote:

> Section 3.6 in RISC-V v1.10 privilege specification states that PMP
> violations
> report "access exceptions." The current PMP implementation has
> a bug which wrongly reports "page exceptions" on PMP violations.
>
> This patch fixes this bug by reporting the correct PMP access exceptions
> trap values.
>
> Signed-off-by: Hesham Almatary 
> ---
>  target/riscv/cpu_helper.c | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
> index 41d6db41c3..b48de36114 100644
> --- a/target/riscv/cpu_helper.c
> +++ b/target/riscv/cpu_helper.c
> @@ -318,12 +318,13 @@ restart:
>  }
>
>  static void raise_mmu_exception(CPURISCVState *env, target_ulong address,
> -MMUAccessType access_type)
> +MMUAccessType access_type, bool
> pmp_violation)
>  {
>  CPUState *cs = CPU(riscv_env_get_cpu(env));
>  int page_fault_exceptions =
>  (env->priv_ver >= PRIV_VERSION_1_10_0) &&
> -get_field(env->satp, SATP_MODE) != VM_1_10_MBARE;
> +get_field(env->satp, SATP_MODE) != VM_1_10_MBARE &&
> +!pmp_violation;
>  switch (access_type) {
>  case MMU_INST_FETCH:
>  cs->exception_index = page_fault_exceptions ?
> @@ -389,6 +390,7 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address,
> int size,
>  CPURISCVState *env = &cpu->env;
>  hwaddr pa = 0;
>  int prot;
> +bool pmp_violation = false;
>  int ret = TRANSLATE_FAIL;
>
>  qemu_log_mask(CPU_LOG_MMU, "%s ad %" VADDR_PRIx " rw %d mmu_idx %d\n",
> @@ -402,6 +404,7 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address,
> int size,
>
>  if (riscv_feature(env, RISCV_FEATURE_PMP) &&
>  !pmp_hart_has_privs(env, pa, TARGET_PAGE_SIZE, 1 << access_type))
> {
> +pmp_violation = true;
>  ret = TRANSLATE_FAIL;
>  }
>  if (ret == TRANSLATE_SUCCESS) {
> @@ -411,7 +414,7 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address,
> int size,
>  } else if (probe) {
>  return false;
>  } else {
> -raise_mmu_exception(env, address, access_type);
> +raise_mmu_exception(env, address, access_type, pmp_violation);
>  riscv_raise_exception(env, cs->exception_index, retaddr);
>  }
>  #else
> --
> 2.17.1
>
>
>

Re: [Qemu-devel] QEMU on OpenBSD is broken?

2019-05-18 Thread Jim Payne


On 5/16/2019 9:04 PM, Thomas Huth wrote:


On 10/05/2019 12.46, Gerd Hoffmann wrote:

This patch series changes the way virtual machines for test builds are
managed.  They are created locally on the developer machine now.  The
installer is booted on the serial console and the script walks through
the dialogs to install and configure the guest.

That takes the download.patchew.org server out of the loop and makes it
a lot easier to tweak the guest images (adding build dependencies for
example).

The install scripts take care to apply host proxy settings (from *_proxy
environment variables) to the guest, so any package downloads will be
routed through the proxy and can be cached that way.  This also makes
them work behind strict firewalls.

There are also a bunch of smaller tweaks for tests/vm to fix issues I
was struggling with.  See commit messages of individual patches for
details.

Gerd Hoffmann (13):
   scripts: use git archive in archive-source
   tests/vm: send proxy environment variables over ssh
   tests/vm: use ssh with pty unconditionally
   tests/vm: run test builds on snapshot
   tests/vm: proper guest shutdown
   tests/vm: add vm-boot-{ssh,serial}- targets
   tests/vm: add DEBUG=1 to help text
   tests/vm: serial console support helpers
   tests/vm: openbsd autoinstall, using serial console
   tests/vm: freebsd autoinstall, using serial console
   tests/vm: netbsd autoinstall, using serial console
   tests/vm: fedora autoinstall, using serial console
   tests/vm: ubuntu.i386: apt proxy setup

freebsd, netbsd and fedora targets work fine for me, so for the patches
1 - 8 and 10 - 12 :

Tested-by: Thomas Huth 

openbsd still fails for me:

   TEST    check-qtest-arm: tests/tmp105-test
   TEST    check-qtest-arm: tests/pca9552-test
   TEST    check-qtest-arm: tests/ds1338-test
   TEST    check-qtest-arm: tests/microbit-test
   TEST    check-qtest-arm: tests/m25p80-test
   TEST    check-qtest-arm: tests/test-arm-mptimer
   TEST    check-qtest-arm: tests/boot-serial-test
qemu-system-arm: cannot set up guest memory 'ram': Cannot allocate memory
Broken pipe


How much memory is trying to be allocated here?

The default maximum data size is set to 768MB. If there is a requirement
to go beyond that, then the default has to be adjusted in
/etc/login.conf. The relevant settings are datasize-max and
datasize-cur:

default:\
    :path=/usr/bin /bin /usr/sbin /sbin /usr/X11R6/bin /usr/local/bin /usr/local/sbin:\
    :umask=022:\
    :datasize-max=768M:\
    :datasize-cur=768M:\
    :maxproc-max=256:\
    :maxproc-cur=128:\
    :openfiles-max=1024:\
    :openfiles-cur=512:\
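
To see which data-size limit a process actually inherits from its login
class, a small check along these lines may help.  It is only a sketch
built on the POSIX getrlimit() call; the helper names show_limit and
print_data_limit are made up for illustration, and it assumes that
datasize-cur/datasize-max correspond to the soft and hard RLIMIT_DATA
values.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: print one limit value in megabytes. */
static void show_limit(const char *name, rlim_t value)
{
    if (value == RLIM_INFINITY) {
        printf("%s: unlimited\n", name);
    } else {
        printf("%s: %llu MB\n", name, (unsigned long long)value >> 20);
    }
}

/*
 * Hypothetical helper: report the soft and hard data-segment limits.
 * Returns 0 on success, -1 if getrlimit() fails.
 */
int print_data_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_DATA, &rl) != 0) {
        perror("getrlimit");
        return -1;
    }
    show_limit("datasize-cur", rl.rlim_cur);
    show_limit("datasize-max", rl.rlim_max);
    return 0;
}
```

Calling print_data_limit() from the shell session that runs the tests
shows whether a guest RAM size requested with -m can fit under the
limit at all.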

[Qemu-devel] [Bug 1829576] Re: QEMU-SYSTEM-PPC64 Regression QEMU-4.0.0

2019-05-18 Thread Jose Santiago

I applied the four patches you indicated and the image boots up and
runs. Everything seems to be working now. Thank You.

-- 
You received this bug notification because you are a member of
qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1829576

Title:
  QEMU-SYSTEM-PPC64 Regression QEMU-4.0.0

Status in QEMU:
  New

Bug description:
  I have been using QEMU-SYSTEM-PPC64 v3.1.0 to run CentOS7 PPC emulated
  system. It stopped working when I upgraded to QEMU-4.0.0 . I
  downgraded back to QEMU-3.1.0 and it started working again. The
  problem is that my CentOS7 image will not boot up under QEMU-4.0.0,
  but works fine under QEMU-3.1.0.

  I have a QCOW2 image available at
  https://www.mediafire.com/file/d8dda05ro85whn1/linux-centos7-ppc64.qcow2/file
  NOTE: It is 15GB. Kind of large.

  I run it as follows:

  qemu-system-ppc64 \
    -name "CENTOS7-PPC64" \
    -cpu POWER7 -machine pseries \
    -m 4096 \
    -netdev bridge,id=netbr0,br=br0 \
    -device e1000,netdev=netbr0,mac=52:54:3c:13:21:33 \
    -hda "./linux-centos7-ppc64.qcow2" \
    -monitor stdio

  HOST: I am using Manjaro Linux on an Intel i7 machine with the QEMU
  packages installed via the package manager of the distribution.

  [jsantiago@jlsws0 ~]$ uname -a
  Linux jlsws0.haivision.com 4.19.42-1-MANJARO #1 SMP PREEMPT Fri May 10 20:52:43 UTC 2019 x86_64 GNU/Linux

  [jsantiago@jlsws0 ~]$ cpuinfo
  Intel(R) processor family information utility, Version 2019 Update 3 Build 20190214 (id: b645a4a54)
  Copyright (C) 2005-2019 Intel Corporation.  All rights reserved.

  =  Processor composition  =
  Processor name    : Intel(R) Core(TM) i7-6700K
  Packages(sockets) : 1
  Cores             : 4
  Processors(CPUs)  : 8
  Cores per package : 4
  Threads per core  : 2

  =  Processor identification  =
  Processor   Thread Id.   Core Id.   Package Id.
  0           0            0          0
  1           0            1          0
  2           0            2          0
  3           0            3          0
  4           1            0          0
  5           1            1          0
  6           1            2          0
  7           1            3          0

  =  Placement on packages  =
  Package Id.   Core Id.   Processors
  0             0,1,2,3    (0,4)(1,5)(2,6)(3,7)

  =  Cache sharing  =
  Cache   Size     Processors
  L1      32  KB   (0,4)(1,5)(2,6)(3,7)
  L2      256 KB   (0,4)(1,5)(2,6)(3,7)
  L3      8   MB   (0,1,2,3,4,5,6,7)

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1829576/+subscriptions

[Qemu-devel] [PATCH 0/2] target/arm: make use of new gvec expanders

2019-05-18 Thread Richard Henderson

Based-on: <20190518190157.21255-1-richard.hender...@linaro.org>
Aka "tcg: misc gvec improvements".

We've added (or are adding) generic support for variable vector shifts
and bitsel.  This trivially replaces the implementations of BSL, BIT,
and BIF.  It enables a reasonable implementation of {U,S}SHL.


r~


Richard Henderson (2):
  target/arm: Vectorize USHL and SSHL
  target/arm: Use tcg_gen_gvec_bitsel

 target/arm/helper.h|  15 +-
 target/arm/translate-a64.h |   2 +
 target/arm/translate.h |   9 +-
 target/arm/neon_helper.c   |  33 
 target/arm/translate-a64.c |  33 ++--
 target/arm/translate.c | 362 -
 target/arm/vec_helper.c| 176 ++
 7 files changed, 486 insertions(+), 144 deletions(-)

-- 
2.17.1

[Qemu-devel] [PATCH 1/2] target/arm: Vectorize USHL and SSHL

2019-05-18 Thread Richard Henderson

These instructions shift left or right depending on the sign
of the input, and 7 bits are significant to the shift.  This
requires several masks and selects in addition to the actual
shifts to form the complete answer.

That said, the operation is still a small improvement even for
two 64-bit elements -- 13 vector operations instead of 2 * 7
integer operations.

Signed-off-by: Richard Henderson 
---
 target/arm/helper.h|  15 +-
 target/arm/translate.h |   6 +
 target/arm/neon_helper.c   |  33 -
 target/arm/translate-a64.c |  18 +--
 target/arm/translate.c | 284 +++--
 target/arm/vec_helper.c| 176 +++
 6 files changed, 466 insertions(+), 66 deletions(-)
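
For readers who want the per-element semantics spelled out, here is a
scalar reference model for one 64-bit element.  It mirrors the
neon_shl_u64 and neon_shl_s64 helpers removed by this patch; the
function names ushl64_model and sshl64_model are made up for this
sketch, and the signed variant assumes an arithmetic right shift on
negative values, which C leaves implementation-defined.

```c
#include <stdint.h>

/* Unsigned: a negative count shifts right, out-of-range counts give 0. */
uint64_t ushl64_model(uint64_t val, uint64_t shiftop)
{
    int8_t shift = (int8_t)shiftop;   /* only the low byte is used */

    if (shift >= 64 || shift <= -64) {
        return 0;
    }
    if (shift < 0) {
        return val >> -shift;
    }
    return val << shift;
}

/* Signed: large right shifts saturate to the sign bit. */
uint64_t sshl64_model(uint64_t valop, uint64_t shiftop)
{
    int8_t shift = (int8_t)shiftop;
    int64_t val = (int64_t)valop;

    if (shift >= 64) {
        return 0;
    }
    if (shift <= -64) {
        return (uint64_t)(val >> 63);
    }
    if (shift < 0) {
        return (uint64_t)(val >> -shift);
    }
    return valop << shift;            /* shift as unsigned to avoid UB */
}
```

The vectorized expansion has to reproduce each of these branches with
masks and selects, which is where the extra vector operations mentioned
above come from.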

diff --git a/target/arm/helper.h b/target/arm/helper.h
index 132aa1682e..ac2d8fb407 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -297,14 +297,8 @@ DEF_HELPER_2(neon_abd_s16, i32, i32, i32)
 DEF_HELPER_2(neon_abd_u32, i32, i32, i32)
 DEF_HELPER_2(neon_abd_s32, i32, i32, i32)
 
-DEF_HELPER_2(neon_shl_u8, i32, i32, i32)
-DEF_HELPER_2(neon_shl_s8, i32, i32, i32)
 DEF_HELPER_2(neon_shl_u16, i32, i32, i32)
 DEF_HELPER_2(neon_shl_s16, i32, i32, i32)
-DEF_HELPER_2(neon_shl_u32, i32, i32, i32)
-DEF_HELPER_2(neon_shl_s32, i32, i32, i32)
-DEF_HELPER_2(neon_shl_u64, i64, i64, i64)
-DEF_HELPER_2(neon_shl_s64, i64, i64, i64)
 DEF_HELPER_2(neon_rshl_u8, i32, i32, i32)
 DEF_HELPER_2(neon_rshl_s8, i32, i32, i32)
 DEF_HELPER_2(neon_rshl_u16, i32, i32, i32)
@@ -691,6 +685,15 @@ DEF_HELPER_FLAGS_2(frint64_s, TCG_CALL_NO_RWG, f32, f32, ptr)
 DEF_HELPER_FLAGS_2(frint32_d, TCG_CALL_NO_RWG, f64, f64, ptr)
 DEF_HELPER_FLAGS_2(frint64_d, TCG_CALL_NO_RWG, f64, f64, ptr)
 
+DEF_HELPER_FLAGS_4(gvec_sshl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_sshl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_sshl_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_sshl_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_ushl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_ushl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_ushl_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(gvec_ushl_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
 #ifdef TARGET_AARCH64
 #include "helper-a64.h"
 #include "helper-sve.h"
diff --git a/target/arm/translate.h b/target/arm/translate.h
index c2348def0d..f357b767cb 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -244,6 +244,8 @@ extern const GVecGen3 bif_op;
 extern const GVecGen3 mla_op[4];
 extern const GVecGen3 mls_op[4];
 extern const GVecGen3 cmtst_op[4];
+extern const GVecGen3 sshl_op[4];
+extern const GVecGen3 ushl_op[4];
 extern const GVecGen2i ssra_op[4];
 extern const GVecGen2i usra_op[4];
 extern const GVecGen2i sri_op[4];
@@ -253,6 +255,10 @@ extern const GVecGen4 sqadd_op[4];
 extern const GVecGen4 uqsub_op[4];
 extern const GVecGen4 sqsub_op[4];
 void gen_cmtst_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
+void gen_ushl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b);
+void gen_sshl_i32(TCGv_i32 d, TCGv_i32 a, TCGv_i32 b);
+void gen_ushl_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
+void gen_sshl_i64(TCGv_i64 d, TCGv_i64 a, TCGv_i64 b);
 
 /*
  * Forward to the isar_feature_* tests given a DisasContext pointer.
diff --git a/target/arm/neon_helper.c b/target/arm/neon_helper.c
index 4259056723..c581ffb7d3 100644
--- a/target/arm/neon_helper.c
+++ b/target/arm/neon_helper.c
@@ -615,24 +615,9 @@ NEON_VOP(abd_u32, neon_u32, 1)
 } else { \
 dest = src1 << tmp; \
 }} while (0)
-NEON_VOP(shl_u8, neon_u8, 4)
 NEON_VOP(shl_u16, neon_u16, 2)
-NEON_VOP(shl_u32, neon_u32, 1)
 #undef NEON_FN
 
-uint64_t HELPER(neon_shl_u64)(uint64_t val, uint64_t shiftop)
-{
-int8_t shift = (int8_t)shiftop;
-if (shift >= 64 || shift <= -64) {
-val = 0;
-} else if (shift < 0) {
-val >>= -shift;
-} else {
-val <<= shift;
-}
-return val;
-}
-
 #define NEON_FN(dest, src1, src2) do { \
 int8_t tmp; \
 tmp = (int8_t)src2; \
@@ -645,27 +630,9 @@ uint64_t HELPER(neon_shl_u64)(uint64_t val, uint64_t shiftop)
 } else { \
 dest = src1 << tmp; \
 }} while (0)
-NEON_VOP(shl_s8, neon_s8, 4)
 NEON_VOP(shl_s16, neon_s16, 2)
-NEON_VOP(shl_s32, neon_s32, 1)
 #undef NEON_FN
 
-uint64_t HELPER(neon_shl_s64)(uint64_t valop, uint64_t shiftop)
-{
-int8_t shift = (int8_t)shiftop;
-int64_t val = valop;
-if (shift >= 64) {
-val = 0;
-} else if (shift <= -64) {
-val >>= 63;
-} else if (shift < 0) {
-val >>= -shift;
-} else {
-val <<= shift;
-}
-return val;
-}
-
 #define NEON_FN(dest, src1, src2) do { \
 int8_t tmp; \
 tmp = (int8_t)src2; \
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index b7c5a928b4..2c280243a9 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c

[Qemu-devel] [PATCH 2/2] target/arm: Use tcg_gen_gvec_bitsel

2019-05-18 Thread Richard Henderson

This replaces 3 target-specific implementations for BIT, BIF, and BSL.

Signed-off-by: Richard Henderson 
---
 target/arm/translate-a64.h |  2 +
 target/arm/translate.h |  3 --
 target/arm/translate-a64.c | 15 ++--
 target/arm/translate.c | 78 +++---
 4 files changed, 20 insertions(+), 78 deletions(-)
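
As a sanity check on the operand ordering, the following scalar sketch
compares the bitsel form, d = (b & a) | (c & ~a), against the XOR-based
expanders removed below.  The operand orders are read off the
gen_gvec_fn4 calls in this patch; the function names are invented for
the illustration.

```c
#include <stdint.h>

/* d = (b & a) | (c & ~a): 'a' is the selector. */
static uint64_t bitsel(uint64_t a, uint64_t b, uint64_t c)
{
    return (b & a) | (c & ~a);
}

/* New forms, operands ordered as in the gen_gvec_fn4 calls. */
uint64_t bsl_new(uint64_t rd, uint64_t rn, uint64_t rm)
{
    return bitsel(rd, rn, rm);
}

uint64_t bit_new(uint64_t rd, uint64_t rn, uint64_t rm)
{
    return bitsel(rm, rn, rd);
}

uint64_t bif_new(uint64_t rd, uint64_t rn, uint64_t rm)
{
    return bitsel(rm, rd, rn);
}

/* Old forms, transcribed from the removed gen_*_i64 expanders. */
uint64_t bsl_old(uint64_t rd, uint64_t rn, uint64_t rm)
{
    rn ^= rm;
    rn &= rd;
    return rm ^ rn;
}

uint64_t bit_old(uint64_t rd, uint64_t rn, uint64_t rm)
{
    rn ^= rd;
    rn &= rm;
    return rd ^ rn;
}

uint64_t bif_old(uint64_t rd, uint64_t rn, uint64_t rm)
{
    rn ^= rd;
    rn &= ~rm;
    return rd ^ rn;
}
```

For BSL the destination doubles as the selector, while BIT and BIF
select on rm and differ only in which of rn and rd is taken when the
selector bit is set.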

diff --git a/target/arm/translate-a64.h b/target/arm/translate-a64.h
index 63d958cf50..9569bc5963 100644
--- a/target/arm/translate-a64.h
+++ b/target/arm/translate-a64.h
@@ -122,5 +122,7 @@ typedef void GVecGen2iFn(unsigned, uint32_t, uint32_t, int64_t,
  uint32_t, uint32_t);
 typedef void GVecGen3Fn(unsigned, uint32_t, uint32_t,
 uint32_t, uint32_t, uint32_t);
+typedef void GVecGen4Fn(unsigned, uint32_t, uint32_t, uint32_t,
+uint32_t, uint32_t, uint32_t);
 
 #endif /* TARGET_ARM_TRANSLATE_A64_H */
diff --git a/target/arm/translate.h b/target/arm/translate.h
index f357b767cb..01ae454dcf 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -238,9 +238,6 @@ static inline void gen_ss_advance(DisasContext *s)
 }
 
 /* Vector operations shared between ARM and AArch64.  */
-extern const GVecGen3 bsl_op;
-extern const GVecGen3 bit_op;
-extern const GVecGen3 bif_op;
 extern const GVecGen3 mla_op[4];
 extern const GVecGen3 mls_op[4];
 extern const GVecGen3 cmtst_op[4];
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 2c280243a9..955ab63ff8 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -704,6 +704,15 @@ static void gen_gvec_fn3(DisasContext *s, bool is_q, int rd, int rn, int rm,
 vec_full_reg_offset(s, rm), is_q ? 16 : 8, vec_full_reg_size(s));
 }
 
+/* Expand a 4-operand AdvSIMD vector operation using an expander function.  */
+static void gen_gvec_fn4(DisasContext *s, bool is_q, int rd, int rn, int rm,
+ int rx, GVecGen4Fn *gvec_fn, int vece)
+{
+gvec_fn(vece, vec_full_reg_offset(s, rd), vec_full_reg_offset(s, rn),
+vec_full_reg_offset(s, rm), vec_full_reg_offset(s, rx),
+is_q ? 16 : 8, vec_full_reg_size(s));
+}
+
 /* Expand a 2-operand + immediate AdvSIMD vector operation using
  * an op descriptor.
  */
@@ -10916,13 +10925,13 @@ static void disas_simd_3same_logic(DisasContext *s, uint32_t insn)
 return;
 
 case 5: /* BSL bitwise select */
-gen_gvec_op3(s, is_q, rd, rn, rm, &bsl_op);
+gen_gvec_fn4(s, is_q, rd, rd, rn, rm, tcg_gen_gvec_bitsel, 0);
 return;
 case 6: /* BIT, bitwise insert if true */
-gen_gvec_op3(s, is_q, rd, rn, rm, &bit_op);
+gen_gvec_fn4(s, is_q, rd, rm, rn, rd, tcg_gen_gvec_bitsel, 0);
 return;
 case 7: /* BIF, bitwise insert if false */
-gen_gvec_op3(s, is_q, rd, rn, rm, &bif_op);
+gen_gvec_fn4(s, is_q, rd, rm, rd, rn, tcg_gen_gvec_bitsel, 0);
 return;
 
 default:
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 49dfcdc90d..3abcae3a50 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -5755,72 +5755,6 @@ static int do_v81_helper(DisasContext *s, gen_helper_gvec_3_ptr *fn,
 return 1;
 }
 
-/*
- * Expanders for VBitOps_VBIF, VBIT, VBSL.
- */
-static void gen_bsl_i64(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm)
-{
-tcg_gen_xor_i64(rn, rn, rm);
-tcg_gen_and_i64(rn, rn, rd);
-tcg_gen_xor_i64(rd, rm, rn);
-}
-
-static void gen_bit_i64(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm)
-{
-tcg_gen_xor_i64(rn, rn, rd);
-tcg_gen_and_i64(rn, rn, rm);
-tcg_gen_xor_i64(rd, rd, rn);
-}
-
-static void gen_bif_i64(TCGv_i64 rd, TCGv_i64 rn, TCGv_i64 rm)
-{
-tcg_gen_xor_i64(rn, rn, rd);
-tcg_gen_andc_i64(rn, rn, rm);
-tcg_gen_xor_i64(rd, rd, rn);
-}
-
-static void gen_bsl_vec(unsigned vece, TCGv_vec rd, TCGv_vec rn, TCGv_vec rm)
-{
-tcg_gen_xor_vec(vece, rn, rn, rm);
-tcg_gen_and_vec(vece, rn, rn, rd);
-tcg_gen_xor_vec(vece, rd, rm, rn);
-}
-
-static void gen_bit_vec(unsigned vece, TCGv_vec rd, TCGv_vec rn, TCGv_vec rm)
-{
-tcg_gen_xor_vec(vece, rn, rn, rd);
-tcg_gen_and_vec(vece, rn, rn, rm);
-tcg_gen_xor_vec(vece, rd, rd, rn);
-}
-
-static void gen_bif_vec(unsigned vece, TCGv_vec rd, TCGv_vec rn, TCGv_vec rm)
-{
-tcg_gen_xor_vec(vece, rn, rn, rd);
-tcg_gen_andc_vec(vece, rn, rn, rm);
-tcg_gen_xor_vec(vece, rd, rd, rn);
-}
-
-const GVecGen3 bsl_op = {
-.fni8 = gen_bsl_i64,
-.fniv = gen_bsl_vec,
-.prefer_i64 = TCG_TARGET_REG_BITS == 64,
-.load_dest = true
-};
-
-const GVecGen3 bit_op = {
-.fni8 = gen_bit_i64,
-.fniv = gen_bit_vec,
-.prefer_i64 = TCG_TARGET_REG_BITS == 64,
-.load_dest = true
-};
-
-const GVecGen3 bif_op = {
-.fni8 = gen_bif_i64,
-.fniv = gen_bif_vec,
-.prefer_i64 = TCG_TARGET_REG_BITS == 64,
-.load_dest = true
-};
-
 static void gen_ssra8_i64(TCGv_i64 d, TCGv_i64 a, int64_t shift)
 {
 tcg_gen_vec_sar8i_i64(a, a, shift);

[Qemu-devel] [PATCH 2/2] RISC-V: Only Check PMP if MMU translation succeeds

2019-05-18 Thread Hesham Almatary

The current implementation unnecessarily checks for PMP even if MMU translation
failed. This may trigger a wrong PMP access exception instead of
a page exception.

For example, the very first instruction fetched after the first satp write in
S-Mode will trigger a PMP access fault instead of an instruction fetch page
fault.

This patch prioritises MMU exceptions over PMP exceptions and only checks for
PMP if MMU translation succeeds.

Signed-off-by: Hesham Almatary 
---
 target/riscv/cpu_helper.c | 1 +
 1 file changed, 1 insertion(+)
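
The intended ordering can be summarised with a small decision function.
This is a simplified model, not the QEMU code: the enum and the name
classify_fault are hypothetical, and the two booleans stand in for the
results of the page-table walk and the PMP check.

```c
#include <stdbool.h>

enum fault_kind { FAULT_NONE, FAULT_PAGE, FAULT_ACCESS };

/*
 * Translation is evaluated first.  PMP is only consulted for an
 * address that was translated successfully.
 */
enum fault_kind classify_fault(bool translate_ok, bool pmp_ok)
{
    if (!translate_ok) {
        return FAULT_PAGE;      /* PMP result is ignored here */
    }
    if (!pmp_ok) {
        return FAULT_ACCESS;
    }
    return FAULT_NONE;
}
```

With the old code the (false, false) case would have been reported as
an access fault, which is the behaviour described in the example above.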

diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index b48de36114..7c7282c680 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -403,6 +403,7 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, int size,
   " prot %d\n", __func__, address, ret, pa, prot);

 if (riscv_feature(env, RISCV_FEATURE_PMP) &&
+(ret == TRANSLATE_SUCCESS) &&
 !pmp_hart_has_privs(env, pa, TARGET_PAGE_SIZE, 1 << access_type)) {
 pmp_violation = true;
 ret = TRANSLATE_FAIL;
--
2.17.1

[Qemu-devel] [PATCH 1/2] target/ppc: Use vector variable shifts for VSL, VSR, VSRA

2019-05-18 Thread Richard Henderson

The gvec expanders take care of masking the shift amount
against the element width.

Signed-off-by: Richard Henderson 
---
 target/ppc/helper.h | 12 --
 target/ppc/int_helper.c | 37 -
 target/ppc/translate/vmx-impl.inc.c | 24 +--
 3 files changed, 12 insertions(+), 61 deletions(-)
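
For reference, the per-element behaviour that the removed VSL/VSR
helper macros implemented looks like this in scalar form.  The function
names are made up for this sketch; the masks (element width minus one)
are the ones used in the removed macros, and the arithmetic variant
assumes that >> on a negative int32_t is an arithmetic shift.

```c
#include <stdint.h>

/* vslb: shift left, count taken modulo 8. */
uint8_t vslb_elem(uint8_t a, uint8_t b)
{
    return (uint8_t)(a << (b & 0x7));
}

/* vsrh: logical shift right, count taken modulo 16. */
uint16_t vsrh_elem(uint16_t a, uint16_t b)
{
    return (uint16_t)(a >> (b & 0xF));
}

/* vsraw: arithmetic shift right, count taken modulo 32. */
int32_t vsraw_elem(int32_t a, uint32_t b)
{
    return a >> (b & 0x1F);
}
```

Since the generic expanders apply the same masking, the translator can
call them directly and the helpers become redundant.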

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 638a6e99c4..02b67a333e 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -180,18 +180,6 @@ DEF_HELPER_3(vmuloub, void, avr, avr, avr)
 DEF_HELPER_3(vmulouh, void, avr, avr, avr)
 DEF_HELPER_3(vmulouw, void, avr, avr, avr)
 DEF_HELPER_3(vmuluwm, void, avr, avr, avr)
-DEF_HELPER_3(vsrab, void, avr, avr, avr)
-DEF_HELPER_3(vsrah, void, avr, avr, avr)
-DEF_HELPER_3(vsraw, void, avr, avr, avr)
-DEF_HELPER_3(vsrad, void, avr, avr, avr)
-DEF_HELPER_3(vsrb, void, avr, avr, avr)
-DEF_HELPER_3(vsrh, void, avr, avr, avr)
-DEF_HELPER_3(vsrw, void, avr, avr, avr)
-DEF_HELPER_3(vsrd, void, avr, avr, avr)
-DEF_HELPER_3(vslb, void, avr, avr, avr)
-DEF_HELPER_3(vslh, void, avr, avr, avr)
-DEF_HELPER_3(vslw, void, avr, avr, avr)
-DEF_HELPER_3(vsld, void, avr, avr, avr)
 DEF_HELPER_3(vslo, void, avr, avr, avr)
 DEF_HELPER_3(vsro, void, avr, avr, avr)
 DEF_HELPER_3(vsrv, void, avr, avr, avr)
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index f6a088ac08..40a7035df0 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -1776,23 +1776,6 @@ VSHIFT(l, 1)
 VSHIFT(r, 0)
 #undef VSHIFT
 
-#define VSL(suffix, element, mask)  \
-void helper_vsl##suffix(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)   \
-{   \
-int i;  \
-\
-for (i = 0; i < ARRAY_SIZE(r->element); i++) {  \
-unsigned int shift = b->element[i] & mask;  \
-\
-r->element[i] = a->element[i] << shift; \
-}   \
-}
-VSL(b, u8, 0x7)
-VSL(h, u16, 0x0F)
-VSL(w, u32, 0x1F)
-VSL(d, u64, 0x3F)
-#undef VSL
-
 void helper_vslv(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
 {
 int i;
@@ -1965,26 +1948,6 @@ VNEG(vnegw, s32)
 VNEG(vnegd, s64)
 #undef VNEG
 
-#define VSR(suffix, element, mask)  \
-void helper_vsr##suffix(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)   \
-{   \
-int i;  \
-\
-for (i = 0; i < ARRAY_SIZE(r->element); i++) {  \
-unsigned int shift = b->element[i] & mask;  \
-r->element[i] = a->element[i] >> shift; \
-}   \
-}
-VSR(ab, s8, 0x7)
-VSR(ah, s16, 0xF)
-VSR(aw, s32, 0x1F)
-VSR(ad, s64, 0x3F)
-VSR(b, u8, 0x7)
-VSR(h, u16, 0xF)
-VSR(w, u32, 0x1F)
-VSR(d, u64, 0x3F)
-#undef VSR
-
 void helper_vsro(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
 {
 int sh = (b->VsrB(0xf) >> 3) & 0xf;
diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
index 6861f4c5b9..663275b729 100644
--- a/target/ppc/translate/vmx-impl.inc.c
+++ b/target/ppc/translate/vmx-impl.inc.c
@@ -530,21 +530,21 @@ GEN_VXFORM(vmuleuw, 4, 10);
 GEN_VXFORM(vmulesb, 4, 12);
 GEN_VXFORM(vmulesh, 4, 13);
 GEN_VXFORM(vmulesw, 4, 14);
-GEN_VXFORM(vslb, 2, 4);
-GEN_VXFORM(vslh, 2, 5);
-GEN_VXFORM(vslw, 2, 6);
+GEN_VXFORM_V(vslb, MO_8, tcg_gen_gvec_shlv, 2, 4);
+GEN_VXFORM_V(vslh, MO_16, tcg_gen_gvec_shlv, 2, 5);
+GEN_VXFORM_V(vslw, MO_32, tcg_gen_gvec_shlv, 2, 6);
 GEN_VXFORM(vrlwnm, 2, 6);
 GEN_VXFORM_DUAL(vslw, PPC_ALTIVEC, PPC_NONE, \
 vrlwnm, PPC_NONE, PPC2_ISA300)
-GEN_VXFORM(vsld, 2, 23);
-GEN_VXFORM(vsrb, 2, 8);
-GEN_VXFORM(vsrh, 2, 9);
-GEN_VXFORM(vsrw, 2, 10);
-GEN_VXFORM(vsrd, 2, 27);
-GEN_VXFORM(vsrab, 2, 12);
-GEN_VXFORM(vsrah, 2, 13);
-GEN_VXFORM(vsraw, 2, 14);
-GEN_VXFORM(vsrad, 2, 15);
+GEN_VXFORM_V(vsld, MO_64, tcg_gen_gvec_shlv, 2, 23);
+GEN_VXFORM_V(vsrb, MO_8, tcg_gen_gvec_shrv, 2, 8);
+GEN_VXFORM_V(vsrh, MO_16, tcg_gen_gvec_shrv, 2, 9);
+GEN_VXFORM_V(vsrw, MO_32, tcg_gen_gvec_shrv, 2, 10);
+GEN_VXFORM_V(vsrd, MO_64, tcg_gen_gvec_shrv, 2, 27);
+GEN_VXFORM_V(vsrab, MO_8, tcg_gen_gvec_sarv, 2, 12);
+GEN_VXFORM_V(vsrah, MO_16, tcg_gen_gvec_sarv, 2, 13);
+GEN_VXFORM_V(vsraw, MO_32, tcg_gen_gvec_sarv, 2, 14);
+GEN_VXFORM_V(vsrad, MO_64, tcg_gen_gvec_sarv, 2, 15);
 GEN_VXFORM(vsrv, 2, 28);
 GEN_VXFORM(vslv, 2, 29);
 GEN_VXFORM(vslo, 6, 16);
--

[Qemu-devel] [PATCH 1/2] RISC-V: Raise access fault exceptions on PMP violations

2019-05-18 Thread Hesham Almatary

Section 3.6 in RISC-V v1.10 privilege specification states that PMP violations
report "access exceptions." The current PMP implementation has
a bug which wrongly reports "page exceptions" on PMP violations.

This patch fixes this bug by reporting the correct PMP access exceptions
trap values.

Signed-off-by: Hesham Almatary 
---
 target/riscv/cpu_helper.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)
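
To illustrate the effect on the reported trap cause, here is a
stand-alone sketch of the selection logic.  The cause numbers are the
exception codes from the RISC-V privileged specification; the enum
names, the paging_enabled flag (standing in for the priv_ver and satp
mode checks) and the function pick_cause are assumptions made for the
example.

```c
#include <stdbool.h>

enum access_type { ACC_LOAD, ACC_STORE, ACC_FETCH };

enum {
    CAUSE_INST_ACCESS_FAULT  = 1,
    CAUSE_LOAD_ACCESS_FAULT  = 5,
    CAUSE_STORE_ACCESS_FAULT = 7,
    CAUSE_INST_PAGE_FAULT    = 12,
    CAUSE_LOAD_PAGE_FAULT    = 13,
    CAUSE_STORE_PAGE_FAULT   = 15,
};

/* A PMP violation always maps to an access fault, never a page fault. */
int pick_cause(enum access_type type, bool paging_enabled,
               bool pmp_violation)
{
    bool page_fault = paging_enabled && !pmp_violation;

    switch (type) {
    case ACC_FETCH:
        return page_fault ? CAUSE_INST_PAGE_FAULT
                          : CAUSE_INST_ACCESS_FAULT;
    case ACC_LOAD:
        return page_fault ? CAUSE_LOAD_PAGE_FAULT
                          : CAUSE_LOAD_ACCESS_FAULT;
    default:
        return page_fault ? CAUSE_STORE_PAGE_FAULT
                          : CAUSE_STORE_ACCESS_FAULT;
    }
}
```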

diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 41d6db41c3..b48de36114 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -318,12 +318,13 @@ restart:
 }

 static void raise_mmu_exception(CPURISCVState *env, target_ulong address,
-MMUAccessType access_type)
+MMUAccessType access_type, bool pmp_violation)
 {
 CPUState *cs = CPU(riscv_env_get_cpu(env));
 int page_fault_exceptions =
 (env->priv_ver >= PRIV_VERSION_1_10_0) &&
-get_field(env->satp, SATP_MODE) != VM_1_10_MBARE;
+get_field(env->satp, SATP_MODE) != VM_1_10_MBARE &&
+!pmp_violation;
 switch (access_type) {
 case MMU_INST_FETCH:
 cs->exception_index = page_fault_exceptions ?
@@ -389,6 +390,7 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, int size,
 CPURISCVState *env = &cpu->env;
 hwaddr pa = 0;
 int prot;
+bool pmp_violation = false;
 int ret = TRANSLATE_FAIL;

 qemu_log_mask(CPU_LOG_MMU, "%s ad %" VADDR_PRIx " rw %d mmu_idx %d\n",
@@ -402,6 +404,7 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, int size,

 if (riscv_feature(env, RISCV_FEATURE_PMP) &&
 !pmp_hart_has_privs(env, pa, TARGET_PAGE_SIZE, 1 << access_type)) {
+pmp_violation = true;
 ret = TRANSLATE_FAIL;
 }
 if (ret == TRANSLATE_SUCCESS) {
@@ -411,7 +414,7 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, int size,
 } else if (probe) {
 return false;
 } else {
-raise_mmu_exception(env, address, access_type);
+raise_mmu_exception(env, address, access_type, pmp_violation);
 riscv_raise_exception(env, cs->exception_index, retaddr);
 }
 #else
--
2.17.1

[Qemu-devel] [PATCH 02/16] tcg: Fix missing checks and clears in tcg_gen_gvec_dup_mem

2019-05-18 Thread Richard Henderson

The paths through tcg_gen_dup_mem_vec and through MO_128 were
missing the check_size_align.  The path through MO_128 was also
missing the expand_clr.  This last was not visible because the
only user is ARM SVE, which would set oprsz == maxsz, and not
require the clear.

Fix by adding the check_size_align and using do_dup directly
instead of duplicating the check in tcg_gen_gvec_dup_{i32,i64}.

Signed-off-by: Richard Henderson 
---
 tcg/tcg-op-gvec.c | 48 ---
 1 file changed, 25 insertions(+), 23 deletions(-)
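
The contract being restored here can be modelled in a few lines of
plain C: replicate the element across the first oprsz bytes and zero
everything from oprsz up to maxsz.  This is only a sketch; dup64_model
is a made-up name, and the assert is a stand-in for check_size_align,
whose actual rules are not reproduced.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Duplicate a 64-bit value over oprsz bytes, then clear the tail. */
void dup64_model(uint8_t *dst, uint64_t val, uint32_t oprsz,
                 uint32_t maxsz)
{
    uint32_t i;

    assert(oprsz % 8 == 0 && oprsz <= maxsz);

    for (i = 0; i < oprsz; i += 8) {
        memcpy(dst + i, &val, 8);
    }
    if (oprsz < maxsz) {
        /* The equivalent of expand_clr(dofs + oprsz, maxsz - oprsz). */
        memset(dst + oprsz, 0, maxsz - oprsz);
    }
}
```

A caller that always passes oprsz == maxsz never reaches the memset,
which matches the explanation of why the missing clear went unnoticed.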

diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 338ddd9d9e..bbf70e3cd9 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -1446,36 +1446,35 @@ void tcg_gen_gvec_dup_i64(unsigned vece, uint32_t dofs, uint32_t oprsz,
 void tcg_gen_gvec_dup_mem(unsigned vece, uint32_t dofs, uint32_t aofs,
   uint32_t oprsz, uint32_t maxsz)
 {
+check_size_align(oprsz, maxsz, dofs);
 if (vece <= MO_64) {
-TCGType type = choose_vector_type(0, vece, oprsz, 0);
+TCGType type = choose_vector_type(NULL, vece, oprsz, 0);
 if (type != 0) {
 TCGv_vec t_vec = tcg_temp_new_vec(type);
 tcg_gen_dup_mem_vec(vece, t_vec, cpu_env, aofs);
 do_dup_store(type, dofs, oprsz, maxsz, t_vec);
 tcg_temp_free_vec(t_vec);
-return;
+} else if (vece <= MO_32) {
+TCGv_i32 in = tcg_temp_new_i32();
+switch (vece) {
+case MO_8:
+tcg_gen_ld8u_i32(in, cpu_env, aofs);
+break;
+case MO_16:
+tcg_gen_ld16u_i32(in, cpu_env, aofs);
+break;
+default:
+tcg_gen_ld_i32(in, cpu_env, aofs);
+break;
+}
+do_dup(vece, dofs, oprsz, maxsz, in, NULL, 0);
+tcg_temp_free_i32(in);
+} else {
+TCGv_i64 in = tcg_temp_new_i64();
+tcg_gen_ld_i64(in, cpu_env, aofs);
+do_dup(vece, dofs, oprsz, maxsz, NULL, in, 0);
+tcg_temp_free_i64(in);
 }
-}
-if (vece <= MO_32) {
-TCGv_i32 in = tcg_temp_new_i32();
-switch (vece) {
-case MO_8:
-tcg_gen_ld8u_i32(in, cpu_env, aofs);
-break;
-case MO_16:
-tcg_gen_ld16u_i32(in, cpu_env, aofs);
-break;
-case MO_32:
-tcg_gen_ld_i32(in, cpu_env, aofs);
-break;
-}
-tcg_gen_gvec_dup_i32(vece, dofs, oprsz, maxsz, in);
-tcg_temp_free_i32(in);
-} else if (vece == MO_64) {
-TCGv_i64 in = tcg_temp_new_i64();
-tcg_gen_ld_i64(in, cpu_env, aofs);
-tcg_gen_gvec_dup_i64(MO_64, dofs, oprsz, maxsz, in);
-tcg_temp_free_i64(in);
 } else {
 /* 128-bit duplicate.  */
 /* ??? Dup to 256-bit vector.  */
@@ -1504,6 +1503,9 @@ void tcg_gen_gvec_dup_mem(unsigned vece, uint32_t dofs, uint32_t aofs,
 tcg_temp_free_i64(in0);
 tcg_temp_free_i64(in1);
 }
+if (oprsz < maxsz) {
+expand_clr(dofs + oprsz, maxsz - oprsz);
+}
 }
 }
 
-- 
2.17.1

[Qemu-devel] [PATCH 2/2] target/ppc: Use tcg_gen_gvec_bitsel

2019-05-18 Thread Richard Henderson

Replace the target-specific implementation of XXSEL.

Signed-off-by: Richard Henderson 
---
 target/ppc/translate/vsx-impl.inc.c | 24 ++--
 1 file changed, 2 insertions(+), 22 deletions(-)

diff --git a/target/ppc/translate/vsx-impl.inc.c b/target/ppc/translate/vsx-impl.inc.c
index 11d9b75d01..7a5d0e1f46 100644
--- a/target/ppc/translate/vsx-impl.inc.c
+++ b/target/ppc/translate/vsx-impl.inc.c
@@ -1290,28 +1290,8 @@ static void glue(gen_, name)(DisasContext *ctx) \
 VSX_XXMRG(xxmrghw, 1)
 VSX_XXMRG(xxmrglw, 0)
 
-static void xxsel_i64(TCGv_i64 t, TCGv_i64 a, TCGv_i64 b, TCGv_i64 c)
-{
-tcg_gen_and_i64(b, b, c);
-tcg_gen_andc_i64(a, a, c);
-tcg_gen_or_i64(t, a, b);
-}
-
-static void xxsel_vec(unsigned vece, TCGv_vec t, TCGv_vec a,
-  TCGv_vec b, TCGv_vec c)
-{
-tcg_gen_and_vec(vece, b, b, c);
-tcg_gen_andc_vec(vece, a, a, c);
-tcg_gen_or_vec(vece, t, a, b);
-}
-
 static void gen_xxsel(DisasContext *ctx)
 {
-static const GVecGen4 g = {
-.fni8 = xxsel_i64,
-.fniv = xxsel_vec,
-.vece = MO_64,
-};
 int rt = xT(ctx->opcode);
 int ra = xA(ctx->opcode);
 int rb = xB(ctx->opcode);
@@ -1321,8 +1301,8 @@ static void gen_xxsel(DisasContext *ctx)
 gen_exception(ctx, POWERPC_EXCP_VSXU);
 return;
 }
-tcg_gen_gvec_4(vsr_full_offset(rt), vsr_full_offset(ra),
-   vsr_full_offset(rb), vsr_full_offset(rc), 16, 16, &g);
+tcg_gen_gvec_bitsel(MO_64, vsr_full_offset(rt), vsr_full_offset(rc),
+vsr_full_offset(rb), vsr_full_offset(ra), 16, 16);
 }
 
 static void gen_xxspltw(DisasContext *ctx)
-- 
2.17.1

[Qemu-devel] [PATCH 0/2] target/ppc: make use of new gvec expanders

2019-05-18 Thread Richard Henderson

Based-on: <20190518190157.21255-1-richard.hender...@linaro.org>
Aka "tcg: misc gvec improvements".

Since Mark's initial patches, we've added (or are adding)
generic support for variable vector shifts and bitsel.


r~


Richard Henderson (2):
  target/ppc: Use vector variable shifts for VSL, VSR, VSRA
  target/ppc: Use tcg_gen_gvec_bitsel

 target/ppc/helper.h | 12 --
 target/ppc/int_helper.c | 37 -
 target/ppc/translate/vmx-impl.inc.c | 24 +--
 target/ppc/translate/vsx-impl.inc.c | 24 ++-
 4 files changed, 14 insertions(+), 83 deletions(-)

-- 
2.17.1

[Qemu-devel] [PATCH 01/16] tcg/i386: Fix dupi/dupm for avx1 and 32-bit hosts

2019-05-18 Thread Richard Henderson

The VBROADCASTSD instruction only allows %ymm registers as destination.
Rather than forcing VEX.L and writing to the entire 256-bit register,
revert to using MOVDDUP with an %xmm register.  This is sufficient for
an avx1 host since we do not support TCG_TYPE_V256 for that case.

Also fix the 32-bit avx2, which should have used VPBROADCASTW.

Fixes: 1e262b49b533
Tested-by: Mark Cave-Ayland 
Reported-by: Mark Cave-Ayland 
Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.inc.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index aafd01cb49..b3601446cd 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -358,6 +358,7 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define OPC_MOVBE_MyGy  (0xf1 | P_EXT38)
 #define OPC_MOVD_VyEy   (0x6e | P_EXT | P_DATA16)
 #define OPC_MOVD_EyVy   (0x7e | P_EXT | P_DATA16)
+#define OPC_MOVDDUP (0x12 | P_EXT | P_SIMDF2)
 #define OPC_MOVDQA_VxWx (0x6f | P_EXT | P_DATA16)
 #define OPC_MOVDQA_WxVx (0x7f | P_EXT | P_DATA16)
 #define OPC_MOVDQU_VxWx (0x6f | P_EXT | P_SIMDF3)
@@ -921,7 +922,7 @@ static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece,
 } else {
 switch (vece) {
 case MO_64:
-tcg_out_vex_modrm_offset(s, OPC_VBROADCASTSD, r, 0, base, offset);
+tcg_out_vex_modrm_offset(s, OPC_MOVDDUP, r, 0, base, offset);
 break;
 case MO_32:
 tcg_out_vex_modrm_offset(s, OPC_VBROADCASTSS, r, 0, base, offset);
@@ -963,12 +964,12 @@ static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
 } else if (have_avx2) {
 tcg_out_vex_modrm_pool(s, OPC_VPBROADCASTQ + vex_l, ret);
 } else {
-tcg_out_vex_modrm_pool(s, OPC_VBROADCASTSD, ret);
+tcg_out_vex_modrm_pool(s, OPC_MOVDDUP, ret);
 }
 new_pool_label(s, arg, R_386_PC32, s->code_ptr - 4, -4);
 } else {
 if (have_avx2) {
-tcg_out_vex_modrm_pool(s, OPC_VBROADCASTSD + vex_l, ret);
+tcg_out_vex_modrm_pool(s, OPC_VPBROADCASTW + vex_l, ret);
 } else {
 tcg_out_vex_modrm_pool(s, OPC_VBROADCASTSS, ret);
 }
-- 
2.17.1

[Qemu-devel] [PATCH 03/16] tcg: Add support for vector bitwise select

2019-05-18 Thread Richard Henderson

This operation performs d = (b & a) | (c & ~a), and is present
on a majority of host vector units.  Include gvec expanders.

Signed-off-by: Richard Henderson 
---
 accel/tcg/tcg-runtime.h  |  2 ++
 tcg/aarch64/tcg-target.h |  1 +
 tcg/i386/tcg-target.h|  1 +
 tcg/tcg-op-gvec.h|  7 +++
 tcg/tcg-op.h |  3 +++
 tcg/tcg-opc.h|  2 ++
 tcg/tcg.h|  1 +
 accel/tcg/tcg-runtime-gvec.c | 14 ++
 tcg/tcg-op-gvec.c| 23 +++
 tcg/tcg-op-vec.c | 26 ++
 tcg/tcg.c|  2 ++
 tcg/README   |  4 
 12 files changed, 86 insertions(+)
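
A plain-C model of the out-of-line fallback may make the operation
easier to follow.  It is a sketch under assumptions: bitsel_model is a
made-up name, it walks the operands in 64-bit chunks with memcpy
instead of the vec64 type used by the real helper, and it takes the
operation size directly rather than decoding it from a descriptor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* d = (b & a) | (c & ~a), applied over oprsz bytes. */
void bitsel_model(void *d, const void *a, const void *b,
                  const void *c, size_t oprsz)
{
    size_t i;

    assert(oprsz % sizeof(uint64_t) == 0);

    for (i = 0; i < oprsz; i += sizeof(uint64_t)) {
        uint64_t aa, bb, cc, dd;

        memcpy(&aa, (const char *)a + i, sizeof(aa));
        memcpy(&bb, (const char *)b + i, sizeof(bb));
        memcpy(&cc, (const char *)c + i, sizeof(cc));
        dd = (bb & aa) | (cc & ~aa);
        memcpy((char *)d + i, &dd, sizeof(dd));
    }
}
```

Because the operation is purely bitwise, the result does not depend on
the element size.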

diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h
index 6d73dc2d65..4fa61b49b4 100644
--- a/accel/tcg/tcg-runtime.h
+++ b/accel/tcg/tcg-runtime.h
@@ -303,3 +303,5 @@ DEF_HELPER_FLAGS_4(gvec_leu8, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_leu16, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_leu32, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(gvec_leu64, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(gvec_bitsel, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index e43554c3c7..52ee66424f 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -140,6 +140,7 @@ typedef enum {
 #define TCG_TARGET_HAS_mul_vec  1
 #define TCG_TARGET_HAS_sat_vec  1
 #define TCG_TARGET_HAS_minmax_vec   1
+#define TCG_TARGET_HAS_bitsel_vec   0
 
 #define TCG_TARGET_DEFAULT_MO (0)
 #define TCG_TARGET_HAS_MEMORY_BSWAP 1
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 66f16fbe3c..08a0386433 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -190,6 +190,7 @@ extern bool have_avx2;
 #define TCG_TARGET_HAS_mul_vec  1
 #define TCG_TARGET_HAS_sat_vec  1
 #define TCG_TARGET_HAS_minmax_vec   1
+#define TCG_TARGET_HAS_bitsel_vec   0
 
 #define TCG_TARGET_deposit_i32_valid(ofs, len) \
 (((ofs) == 0 && (len) == 8) || ((ofs) == 8 && (len) == 8) || \
diff --git a/tcg/tcg-op-gvec.h b/tcg/tcg-op-gvec.h
index 52a398c190..2a9e0c7c0a 100644
--- a/tcg/tcg-op-gvec.h
+++ b/tcg/tcg-op-gvec.h
@@ -342,6 +342,13 @@ void tcg_gen_gvec_cmp(TCGCond cond, unsigned vece, uint32_t dofs,
   uint32_t aofs, uint32_t bofs,
   uint32_t oprsz, uint32_t maxsz);
 
+/*
+ * Perform vector bit select: d = (b & a) | (c & ~a).
+ */
+void tcg_gen_gvec_bitsel(unsigned vece, uint32_t dofs, uint32_t aofs,
+ uint32_t bofs, uint32_t cofs,
+ uint32_t oprsz, uint32_t maxsz);
+
 /*
  * 64-bit vector operations.  Use these when the register has been allocated
  * with tcg_global_mem_new_i64, and so we cannot also address it via pointer.
diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index 660fe205d0..268860ed2f 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -1000,6 +1000,9 @@ void tcg_gen_sarv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec s);
 void tcg_gen_cmp_vec(TCGCond cond, unsigned vece, TCGv_vec r,
  TCGv_vec a, TCGv_vec b);
 
+void tcg_gen_bitsel_vec(unsigned vece, TCGv_vec r, TCGv_vec a,
+TCGv_vec b, TCGv_vec c);
+
 void tcg_gen_ld_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset);
 void tcg_gen_st_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset);
 void tcg_gen_stl_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset, TCGType t);
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index 4a2dd116eb..c05b71427c 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -256,6 +256,8 @@ DEF(sarv_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_shv_vec))
 
 DEF(cmp_vec, 1, 2, 1, IMPLVEC)
 
+DEF(bitsel_vec, 1, 3, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_bitsel_vec))
+
 DEF(last_generic, 0, 0, 0, TCG_OPF_NOT_PRESENT)
 
 #if TCG_TARGET_MAYBE_vec
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 0e01a70d66..72f9f6c70b 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -187,6 +187,7 @@ typedef uint64_t TCGRegSet;
 #define TCG_TARGET_HAS_mul_vec  0
 #define TCG_TARGET_HAS_sat_vec  0
 #define TCG_TARGET_HAS_minmax_vec   0
+#define TCG_TARGET_HAS_bitsel_vec   0
 #else
 #define TCG_TARGET_MAYBE_vec1
 #endif
diff --git a/accel/tcg/tcg-runtime-gvec.c b/accel/tcg/tcg-runtime-gvec.c
index 0f09e0ef38..3b6052fe97 100644
--- a/accel/tcg/tcg-runtime-gvec.c
+++ b/accel/tcg/tcg-runtime-gvec.c
@@ -1444,3 +1444,17 @@ void HELPER(gvec_umax64)(void *d, void *a, void *b, uint32_t desc)
 }
 clear_high(d, oprsz, desc);
 }
+
+void HELPER(gvec_bitsel)(void *d, void *a, void *b, void *c, uint32_t desc)
+{
+intptr_t oprsz = simd_oprsz(desc);
+intptr_t i;
+
+for (i = 0; i < oprsz; i += sizeof(vec64)) {
+vec64 aa = *(vec64 *)(a + i);
+vec64 bb = *(vec64 *)(b + i);
+vec64 cc = *(vec64 *)(c + i);
+

[Qemu-devel] [PATCH 04/16] tcg: Add support for vector compare select

2019-05-18 Thread Richard Henderson

Perform a per-element conditional move.  This combination operation is
easier to implement on some host vector units than plain cmp+bitsel.
Omit the usual gvec interface, as this is intended to be used by
target-specific gvec expansion call-backs.

Signed-off-by: Richard Henderson 
---
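A scalar sketch of the semantics described above, for illustration only;
it is not the TCG implementation.  It assumes 32-bit lanes and the signed
less-than condition, and the names cmpsel_lt_s32 and cmp_bitsel_lt_s32 are
hypothetical.  The second function spells out the cmp+bitsel pair that the
combined operation stands in for.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model: r[i] = (a[i] < b[i]) ? c[i] : d[i]. */
static void cmpsel_lt_s32(uint32_t *r, const int32_t *a, const int32_t *b,
                          const uint32_t *c, const uint32_t *d, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        r[i] = a[i] < b[i] ? c[i] : d[i];
    }
}

/*
 * The same result in two steps: a compare that yields an all-ones or
 * all-zeros mask per lane, followed by a bitwise select on that mask.
 */
static void cmp_bitsel_lt_s32(uint32_t *r, const int32_t *a, const int32_t *b,
                              const uint32_t *c, const uint32_t *d, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t mask = a[i] < b[i] ? UINT32_MAX : 0;
        r[i] = (c[i] & mask) | (d[i] & ~mask);
    }
}
```

Both functions produce identical lanes; a host with a native compare-select
can do the first form in fewer operations.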
 tcg/aarch64/tcg-target.h |  1 +
 tcg/i386/tcg-target.h|  1 +
 tcg/tcg-op.h |  2 ++
 tcg/tcg-opc.h|  1 +
 tcg/tcg.h|  1 +
 tcg/tcg-op-vec.c | 59 
 tcg/tcg.c|  3 ++
 tcg/README   |  7 +
 8 files changed, 75 insertions(+)

diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 52ee66424f..b4a9d36bbc 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -141,6 +141,7 @@ typedef enum {
 #define TCG_TARGET_HAS_sat_vec  1
 #define TCG_TARGET_HAS_minmax_vec   1
 #define TCG_TARGET_HAS_bitsel_vec   0
+#define TCG_TARGET_HAS_cmpsel_vec   0
 
 #define TCG_TARGET_DEFAULT_MO (0)
 #define TCG_TARGET_HAS_MEMORY_BSWAP 1
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 08a0386433..16a83a7f7b 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -191,6 +191,7 @@ extern bool have_avx2;
 #define TCG_TARGET_HAS_sat_vec  1
 #define TCG_TARGET_HAS_minmax_vec   1
 #define TCG_TARGET_HAS_bitsel_vec   0
+#define TCG_TARGET_HAS_cmpsel_vec   0
 
 #define TCG_TARGET_deposit_i32_valid(ofs, len) \
 (((ofs) == 0 && (len) == 8) || ((ofs) == 8 && (len) == 8) || \
diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index 268860ed2f..2d4dd5cd7d 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -1002,6 +1002,8 @@ void tcg_gen_cmp_vec(TCGCond cond, unsigned vece, TCGv_vec r,
 
 void tcg_gen_bitsel_vec(unsigned vece, TCGv_vec r, TCGv_vec a,
 TCGv_vec b, TCGv_vec c);
+void tcg_gen_cmpsel_vec(TCGCond cond, unsigned vece, TCGv_vec r,
+TCGv_vec a, TCGv_vec b, TCGv_vec c, TCGv_vec d);
 
 void tcg_gen_ld_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset);
 void tcg_gen_st_vec(TCGv_vec r, TCGv_ptr base, TCGArg offset);
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index c05b71427c..c7d971fa3d 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -257,6 +257,7 @@ DEF(sarv_vec, 1, 2, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_shv_vec))
 DEF(cmp_vec, 1, 2, 1, IMPLVEC)
 
 DEF(bitsel_vec, 1, 3, 0, IMPLVEC | IMPL(TCG_TARGET_HAS_bitsel_vec))
+DEF(cmpsel_vec, 1, 4, 1, IMPLVEC | IMPL(TCG_TARGET_HAS_cmpsel_vec))
 
 DEF(last_generic, 0, 0, 0, TCG_OPF_NOT_PRESENT)
 
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 72f9f6c70b..21cd6f1249 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -188,6 +188,7 @@ typedef uint64_t TCGRegSet;
 #define TCG_TARGET_HAS_sat_vec  0
 #define TCG_TARGET_HAS_minmax_vec   0
 #define TCG_TARGET_HAS_bitsel_vec   0
+#define TCG_TARGET_HAS_cmpsel_vec   0
 #else
 #define TCG_TARGET_MAYBE_vec 1
 #endif
diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
index 99cbf29e0b..a888c02df8 100644
--- a/tcg/tcg-op-vec.c
+++ b/tcg/tcg-op-vec.c
@@ -119,6 +119,11 @@ bool tcg_can_emit_vecop_list(const TCGOpcode *list,
 continue;
 }
 break;
+case INDEX_op_cmpsel_vec:
+if (tcg_can_emit_vec_op(INDEX_op_cmp_vec, type, vece)) {
+continue;
+}
+break;
 default:
 break;
 }
@@ -159,6 +164,20 @@ void vec_gen_4(TCGOpcode opc, TCGType type, unsigned vece,
 op->args[3] = c;
 }
 
+static void vec_gen_6(TCGOpcode opc, TCGType type, unsigned vece, TCGArg r,
+  TCGArg a, TCGArg b, TCGArg c, TCGArg d, TCGArg e)
+{
+TCGOp *op = tcg_emit_op(opc);
+TCGOP_VECL(op) = type - TCG_TYPE_V64;
+TCGOP_VECE(op) = vece;
+op->args[0] = r;
+op->args[1] = a;
+op->args[2] = b;
+op->args[3] = c;
+op->args[4] = d;
+op->args[5] = e;
+}
+
 static void vec_gen_op2(TCGOpcode opc, unsigned vece, TCGv_vec r, TCGv_vec a)
 {
 TCGTemp *rt = tcgv_vec_temp(r);
@@ -717,3 +736,43 @@ void tcg_gen_bitsel_vec(unsigned vece, TCGv_vec r, TCGv_vec a,
 tcg_temp_free_vec(t);
 }
 }
+
+void tcg_gen_cmpsel_vec(TCGCond cond, unsigned vece, TCGv_vec r,
+TCGv_vec a, TCGv_vec b, TCGv_vec c, TCGv_vec d)
+{
+TCGTemp *rt = tcgv_vec_temp(r);
+TCGTemp *at = tcgv_vec_temp(a);
+TCGTemp *bt = tcgv_vec_temp(b);
+TCGTemp *ct = tcgv_vec_temp(c);
+TCGTemp *dt = tcgv_vec_temp(d);
+TCGArg ri = temp_arg(rt);
+TCGArg ai = temp_arg(at);
+TCGArg bi = temp_arg(bt);
+TCGArg ci = temp_arg(ct);
+TCGArg di = temp_arg(dt);
+TCGType type = rt->base_type;
+const TCGOpcode *hold_list;
+int can;
+
+tcg_debug_assert(at->base_type >= type);
+tcg_debug_assert(bt->base_type >= type);
+tcg_debug_assert(ct->base_type >= type);
+tcg_debug_assert(dt->base_type >= type);
+
+tcg_assert_listed_vecop(INDEX_op_cmpsel_vec);

[Qemu-devel] [PATCH 08/16] tcg/i386: Support vector comparison select value

2019-05-18 Thread Richard Henderson

We already had backend support for this feature.  Expand the new
cmpsel opcode using vpblendvb.  The combination allows us to avoid
an extra NOT for some comparison codes.

Signed-off-by: Richard Henderson 
---
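A scalar sketch of why the NOT can be avoided, under the assumption that
the host only provides a signed greater-than compare.  The helper names
(blend, cmp_gt, sel_le_with_not, sel_le_with_swap) are hypothetical and
model one 32-bit lane; blend mirrors a mask-driven select such as
vpblendvb only loosely.

```c
#include <stdint.h>

/* Bitwise blend: take if_set where mask bits are 1, else if_clear. */
static uint32_t blend(uint32_t if_clear, uint32_t if_set, uint32_t mask)
{
    return (if_set & mask) | (if_clear & ~mask);
}

/* Assumed host primitive: all-ones if a > b (signed), else zero. */
static uint32_t cmp_gt(int32_t a, int32_t b)
{
    return a > b ? UINT32_MAX : 0;
}

/* (a <= b) ? t : f, computed by inverting the GT mask first. */
static uint32_t sel_le_with_not(int32_t a, int32_t b, uint32_t t, uint32_t f)
{
    uint32_t mask = ~cmp_gt(a, b);
    return blend(f, t, mask);
}

/* Same result without the NOT: keep the GT mask and swap t and f. */
static uint32_t sel_le_with_swap(int32_t a, int32_t b, uint32_t t, uint32_t f)
{
    uint32_t mask = cmp_gt(a, b);
    return blend(t, f, mask);
}
```

Swapping the two selected values costs nothing at code-generation time,
whereas inverting the mask costs an extra vector operation.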
 tcg/i386/tcg-target.h |  2 +-
 tcg/i386/tcg-target.inc.c | 39 +++
 2 files changed, 36 insertions(+), 5 deletions(-)

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 16a83a7f7b..928e8b87bb 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -191,7 +191,7 @@ extern bool have_avx2;
 #define TCG_TARGET_HAS_sat_vec  1
 #define TCG_TARGET_HAS_minmax_vec   1
 #define TCG_TARGET_HAS_bitsel_vec   0
-#define TCG_TARGET_HAS_cmpsel_vec   0
+#define TCG_TARGET_HAS_cmpsel_vec   -1
 
 #define TCG_TARGET_deposit_i32_valid(ofs, len) \
 (((ofs) == 0 && (len) == 8) || ((ofs) == 8 && (len) == 8) || \
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index b3601446cd..ffcafb1e14 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -3246,6 +3246,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 case INDEX_op_andc_vec:
 return 1;
 case INDEX_op_cmp_vec:
+case INDEX_op_cmpsel_vec:
 return -1;
 
 case INDEX_op_shli_vec:
@@ -3464,8 +3465,8 @@ static void expand_vec_mul(TCGType type, unsigned vece,
 }
 }
 
-static void expand_vec_cmp(TCGType type, unsigned vece, TCGv_vec v0,
-   TCGv_vec v1, TCGv_vec v2, TCGCond cond)
+static bool expand_vec_cmp_noinv(TCGType type, unsigned vece, TCGv_vec v0,
+ TCGv_vec v1, TCGv_vec v2, TCGCond cond)
 {
 enum {
 NEED_SWAP = 1,
@@ -3522,11 +3523,34 @@ static void expand_vec_cmp(TCGType type, unsigned vece, TCGv_vec v0,
 tcg_temp_free_vec(t2);
 }
 }
-if (fixup & NEED_INV) {
+return fixup & NEED_INV;
+}
+
+static void expand_vec_cmp(TCGType type, unsigned vece, TCGv_vec v0,
+   TCGv_vec v1, TCGv_vec v2, TCGCond cond)
+{
+if (expand_vec_cmp_noinv(type, vece, v0, v1, v2, cond)) {
 tcg_gen_not_vec(vece, v0, v0);
 }
 }
 
+static void expand_vec_cmpsel(TCGType type, unsigned vece, TCGv_vec v0,
+  TCGv_vec c1, TCGv_vec c2,
+  TCGv_vec v3, TCGv_vec v4, TCGCond cond)
+{
+TCGv_vec t = tcg_temp_new_vec(type);
+
+if (expand_vec_cmp_noinv(type, vece, t, c1, c2, cond)) {
+/* Invert the sense of the compare by swapping arguments.  */
+TCGv_vec x;
+x = v3, v3 = v4, v4 = x;
+}
+vec_gen_4(INDEX_op_x86_vpblendvb_vec, type, vece,
+  tcgv_vec_arg(v0), tcgv_vec_arg(v4),
+  tcgv_vec_arg(v3), tcgv_vec_arg(t));
+tcg_temp_free_vec(t);
+}
+
 static void expand_vec_minmax(TCGType type, unsigned vece,
   TCGCond cond, bool min,
   TCGv_vec v0, TCGv_vec v1, TCGv_vec v2)
@@ -3551,7 +3575,7 @@ void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
 {
 va_list va;
 TCGArg a2;
-TCGv_vec v0, v1, v2;
+TCGv_vec v0, v1, v2, v3, v4;
 
 va_start(va, a0);
 v0 = temp_tcgv_vec(arg_temp(a0));
@@ -3578,6 +3602,13 @@ void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
 expand_vec_cmp(type, vece, v0, v1, v2, va_arg(va, TCGArg));
 break;
 
+case INDEX_op_cmpsel_vec:
+v2 = temp_tcgv_vec(arg_temp(a2));
+v3 = temp_tcgv_vec(arg_temp(va_arg(va, TCGArg)));
+v4 = temp_tcgv_vec(arg_temp(va_arg(va, TCGArg)));
+expand_vec_cmpsel(type, vece, v0, v1, v2, v3, v4, va_arg(va, TCGArg));
+break;
+
 case INDEX_op_smin_vec:
 v2 = temp_tcgv_vec(arg_temp(a2));
 expand_vec_minmax(type, vece, TCG_COND_GT, true, v0, v1, v2);
-- 
2.17.1

[Qemu-devel] [PATCH 00/16] tcg: misc gvec improvements

2019-05-18 Thread Richard Henderson

Add support for bitsel and cmpsel primitives, which will be
used by target/* patches that I'll post shortly.

Improvements to the i386 and aarch64 backends.

A handful of bug fixes.

Assert that we haven't forgotten a QEMU_ALIGNED() marker,
by using MOVDQA for x86_64.


r~


Richard Henderson (16):
  tcg/i386: Fix dupi/dupm for avx1 and 32-bit hosts
  tcg: Fix missing checks and clears in tcg_gen_gvec_dup_mem
  tcg: Add support for vector bitwise select
  tcg: Add support for vector compare select
  tcg: Introduce do_op3_nofail for vector expansion
  tcg: Expand vector minmax using cmp+cmpsel
  tcg: Add TCG_OPF_NOT_PRESENT if TCG_TARGET_HAS_foo is negative
  tcg/i386: Support vector comparison select value
  tcg/i386: Remove expansion for missing minmax
  tcg/i386: Use umin/umax in expanding unsigned compare
  tcg/aarch64: Support vector bitwise select value
  tcg/aarch64: Split up is_fimm
  tcg/aarch64: Use MVNI in tcg_out_dupi_vec
  tcg/aarch64: Build vector immediates with two insns
  tcg/aarch64: Allow immediates for vector ORR and BIC
  tcg/i386: Use MOVDQA for TCG_TYPE_V128 load/store

 accel/tcg/tcg-runtime.h  |   2 +
 tcg/aarch64/tcg-target.h |   2 +
 tcg/i386/tcg-target.h|   2 +
 tcg/tcg-op-gvec.h|   7 +
 tcg/tcg-op.h |   5 +
 tcg/tcg-opc.h|   5 +-
 tcg/tcg.h|   2 +
 accel/tcg/tcg-runtime-gvec.c |  14 ++
 tcg/aarch64/tcg-target.inc.c | 371 ++-
 tcg/i386/tcg-target.inc.c| 169 ++--
 tcg/tcg-op-gvec.c|  71 ---
 tcg/tcg-op-vec.c | 142 --
 tcg/tcg.c|   5 +
 tcg/README   |  11 ++
 14 files changed, 620 insertions(+), 188 deletions(-)

-- 
2.17.1

[Qemu-devel] [PATCH 05/16] tcg: Introduce do_op3_nofail for vector expansion

2019-05-18 Thread Richard Henderson

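This makes do_op3 match do_op2 in being allowed to fail, which lets
callers provide fallback expansions.  Existing callers are switched
to do_op3_nofail, which asserts that the operation was emitted.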

Signed-off-by: Richard Henderson 
---
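A minimal sketch of the pattern, assuming the tri-state convention visible
in the diff: a positive value means the op can be emitted directly, a
negative value means the backend expands it, and zero means unsupported.
The names EmitPath, classify, try_op3 and op3_nofail are hypothetical
stand-ins, not QEMU functions.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { PATH_DIRECT, PATH_EXPANDED, PATH_UNSUPPORTED } EmitPath;

/* Map the tri-state "can emit" answer onto the path taken. */
static EmitPath classify(int can)
{
    if (can > 0) {
        return PATH_DIRECT;
    } else if (can < 0) {
        return PATH_EXPANDED;
    }
    return PATH_UNSUPPORTED;
}

/* Fallible variant: reports failure so the caller can fall back. */
static bool try_op3(int can)
{
    return classify(can) != PATH_UNSUPPORTED;
}

/* Non-failing variant for ops that every backend must support. */
static void op3_nofail(int can)
{
    bool ok = try_op3(can);
    assert(ok);
    (void)ok;
}
```

Keeping the assertion in a thin wrapper leaves the fallible function free
for callers that have a generic expansion to try next.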
 tcg/tcg-op-vec.c | 45 +++--
 1 file changed, 27 insertions(+), 18 deletions(-)

diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
index a888c02df8..004a34935b 100644
--- a/tcg/tcg-op-vec.c
+++ b/tcg/tcg-op-vec.c
@@ -562,7 +562,7 @@ void tcg_gen_cmp_vec(TCGCond cond, unsigned vece,
 }
 }
 
-static void do_op3(unsigned vece, TCGv_vec r, TCGv_vec a,
+static bool do_op3(unsigned vece, TCGv_vec r, TCGv_vec a,
TCGv_vec b, TCGOpcode opc)
 {
 TCGTemp *rt = tcgv_vec_temp(r);
@@ -580,82 +580,91 @@ static void do_op3(unsigned vece, TCGv_vec r, TCGv_vec a,
 can = tcg_can_emit_vec_op(opc, type, vece);
 if (can > 0) {
 vec_gen_3(opc, type, vece, ri, ai, bi);
-} else {
+} else if (can < 0) {
 const TCGOpcode *hold_list = tcg_swap_vecop_list(NULL);
-tcg_debug_assert(can < 0);
 tcg_expand_vec_op(opc, type, vece, ri, ai, bi);
 tcg_swap_vecop_list(hold_list);
+} else {
+return false;
 }
+return true;
+}
+
+static void do_op3_nofail(unsigned vece, TCGv_vec r, TCGv_vec a,
+  TCGv_vec b, TCGOpcode opc)
+{
+bool ok = do_op3(vece, r, a, b, opc);
+tcg_debug_assert(ok);
 }
 
 void tcg_gen_add_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-do_op3(vece, r, a, b, INDEX_op_add_vec);
+do_op3_nofail(vece, r, a, b, INDEX_op_add_vec);
 }
 
 void tcg_gen_sub_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-do_op3(vece, r, a, b, INDEX_op_sub_vec);
+do_op3_nofail(vece, r, a, b, INDEX_op_sub_vec);
 }
 
 void tcg_gen_mul_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-do_op3(vece, r, a, b, INDEX_op_mul_vec);
+do_op3_nofail(vece, r, a, b, INDEX_op_mul_vec);
 }
 
 void tcg_gen_ssadd_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-do_op3(vece, r, a, b, INDEX_op_ssadd_vec);
+do_op3_nofail(vece, r, a, b, INDEX_op_ssadd_vec);
 }
 
 void tcg_gen_usadd_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-do_op3(vece, r, a, b, INDEX_op_usadd_vec);
+do_op3_nofail(vece, r, a, b, INDEX_op_usadd_vec);
 }
 
 void tcg_gen_sssub_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-do_op3(vece, r, a, b, INDEX_op_sssub_vec);
+do_op3_nofail(vece, r, a, b, INDEX_op_sssub_vec);
 }
 
 void tcg_gen_ussub_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-do_op3(vece, r, a, b, INDEX_op_ussub_vec);
+do_op3_nofail(vece, r, a, b, INDEX_op_ussub_vec);
 }
 
 void tcg_gen_smin_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-do_op3(vece, r, a, b, INDEX_op_smin_vec);
+do_op3_nofail(vece, r, a, b, INDEX_op_smin_vec);
 }
 
 void tcg_gen_umin_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-do_op3(vece, r, a, b, INDEX_op_umin_vec);
+do_op3_nofail(vece, r, a, b, INDEX_op_umin_vec);
 }
 
 void tcg_gen_smax_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-do_op3(vece, r, a, b, INDEX_op_smax_vec);
+do_op3_nofail(vece, r, a, b, INDEX_op_smax_vec);
 }
 
 void tcg_gen_umax_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-do_op3(vece, r, a, b, INDEX_op_umax_vec);
+do_op3_nofail(vece, r, a, b, INDEX_op_umax_vec);
 }
 
 void tcg_gen_shlv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-do_op3(vece, r, a, b, INDEX_op_shlv_vec);
+do_op3_nofail(vece, r, a, b, INDEX_op_shlv_vec);
 }
 
 void tcg_gen_shrv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-do_op3(vece, r, a, b, INDEX_op_shrv_vec);
+do_op3_nofail(vece, r, a, b, INDEX_op_shrv_vec);
 }
 
 void tcg_gen_sarv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-do_op3(vece, r, a, b, INDEX_op_sarv_vec);
+do_op3_nofail(vece, r, a, b, INDEX_op_sarv_vec);
 }
 
 static void do_shifts(unsigned vece, TCGv_vec r, TCGv_vec a,
@@ -691,7 +700,7 @@ static void do_shifts(unsigned vece, TCGv_vec r, TCGv_vec a,
 } else {
 tcg_gen_dup_i32_vec(vece, vec_s, s);
 }
-do_op3(vece, r, a, vec_s, opc_v);
+do_op3_nofail(vece, r, a, vec_s, opc_v);
 tcg_temp_free_vec(vec_s);
 }
 tcg_swap_vecop_list(hold_list);
-- 
2.17.1

[Qemu-devel] [PATCH 09/16] tcg/i386: Remove expansion for missing minmax

2019-05-18 Thread Richard Henderson

This is now handled by code within tcg-op-vec.c.

Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.inc.c | 37 -
 1 file changed, 37 deletions(-)

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index ffcafb1e14..569a2c2120 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -3297,7 +3297,6 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 case INDEX_op_smax_vec:
 case INDEX_op_umin_vec:
 case INDEX_op_umax_vec:
-return vece <= MO_32 ? 1 : -1;
 case INDEX_op_abs_vec:
 return vece <= MO_32;
 
@@ -3551,25 +3550,6 @@ static void expand_vec_cmpsel(TCGType type, unsigned vece, TCGv_vec v0,
 tcg_temp_free_vec(t);
 }
 
-static void expand_vec_minmax(TCGType type, unsigned vece,
-  TCGCond cond, bool min,
-  TCGv_vec v0, TCGv_vec v1, TCGv_vec v2)
-{
-TCGv_vec t1 = tcg_temp_new_vec(type);
-
-tcg_debug_assert(vece == MO_64);
-
-tcg_gen_cmp_vec(cond, vece, t1, v1, v2);
-if (min) {
-TCGv_vec t2;
-t2 = v1, v1 = v2, v2 = t2;
-}
-vec_gen_4(INDEX_op_x86_vpblendvb_vec, type, vece,
-  tcgv_vec_arg(v0), tcgv_vec_arg(v1),
-  tcgv_vec_arg(v2), tcgv_vec_arg(t1));
-tcg_temp_free_vec(t1);
-}
-
 void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
TCGArg a0, ...)
 {
@@ -3609,23 +3589,6 @@ void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
 expand_vec_cmpsel(type, vece, v0, v1, v2, v3, v4, va_arg(va, TCGArg));
 break;
 
-case INDEX_op_smin_vec:
-v2 = temp_tcgv_vec(arg_temp(a2));
-expand_vec_minmax(type, vece, TCG_COND_GT, true, v0, v1, v2);
-break;
-case INDEX_op_smax_vec:
-v2 = temp_tcgv_vec(arg_temp(a2));
-expand_vec_minmax(type, vece, TCG_COND_GT, false, v0, v1, v2);
-break;
-case INDEX_op_umin_vec:
-v2 = temp_tcgv_vec(arg_temp(a2));
-expand_vec_minmax(type, vece, TCG_COND_GTU, true, v0, v1, v2);
-break;
-case INDEX_op_umax_vec:
-v2 = temp_tcgv_vec(arg_temp(a2));
-expand_vec_minmax(type, vece, TCG_COND_GTU, false, v0, v1, v2);
-break;
-
 default:
 break;
 }
-- 
2.17.1

[Qemu-devel] [PATCH 10/16] tcg/i386: Use umin/umax in expanding unsigned compare

2019-05-18 Thread Richard Henderson

Using umin(a, b) == a as an expansion for TCG_COND_LEU is a
better alternative to (a - INT_MIN) <= (b - INT_MIN).

Signed-off-by: Richard Henderson 
---
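A scalar sketch of the two expansions compared above, on one 32-bit lane.
The function names are hypothetical; umin32 and umax32 stand in for the
host's unsigned min/max instructions, and leu_via_bias models the older
bias-then-signed-compare approach.

```c
#include <stdbool.h>
#include <stdint.h>

static uint32_t umin32(uint32_t a, uint32_t b) { return a < b ? a : b; }
static uint32_t umax32(uint32_t a, uint32_t b) { return a > b ? a : b; }

/* a <= b (unsigned) holds exactly when umin(a, b) == a. */
static bool leu_via_umin(uint32_t a, uint32_t b)
{
    return umin32(a, b) == a;
}

/* a >= b (unsigned) holds exactly when umax(a, b) == a. */
static bool geu_via_umax(uint32_t a, uint32_t b)
{
    return umax32(a, b) == a;
}

/* Older approach: bias both operands by INT_MIN, then compare signed. */
static bool leu_via_bias(uint32_t a, uint32_t b)
{
    int32_t sa = (int32_t)((int64_t)a + INT32_MIN);
    int32_t sb = (int32_t)((int64_t)b + INT32_MIN);
    return sa <= sb;
}
```

The min/max form needs one min or max plus an equality compare, while the
bias form needs a constant and two subtractions before the compare.
GTU and LTU follow by inverting LEU and GEU respectively.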
 tcg/i386/tcg-target.inc.c | 80 +--
 1 file changed, 61 insertions(+), 19 deletions(-)

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 569a2c2120..6ec5e60448 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -3468,28 +3468,61 @@ static bool expand_vec_cmp_noinv(TCGType type, unsigned vece, TCGv_vec v0,
  TCGv_vec v1, TCGv_vec v2, TCGCond cond)
 {
 enum {
-NEED_SWAP = 1,
-NEED_INV  = 2,
-NEED_BIAS = 4
-};
-static const uint8_t fixups[16] = {
-[0 ... 15] = -1,
-[TCG_COND_EQ] = 0,
-[TCG_COND_NE] = NEED_INV,
-[TCG_COND_GT] = 0,
-[TCG_COND_LT] = NEED_SWAP,
-[TCG_COND_LE] = NEED_INV,
-[TCG_COND_GE] = NEED_SWAP | NEED_INV,
-[TCG_COND_GTU] = NEED_BIAS,
-[TCG_COND_LTU] = NEED_BIAS | NEED_SWAP,
-[TCG_COND_LEU] = NEED_BIAS | NEED_INV,
-[TCG_COND_GEU] = NEED_BIAS | NEED_SWAP | NEED_INV,
+NEED_INV  = 1,
+NEED_SWAP = 2,
+NEED_BIAS = 4,
+NEED_UMIN = 8,
+NEED_UMAX = 16,
 };
 TCGv_vec t1, t2;
 uint8_t fixup;
 
-fixup = fixups[cond & 15];
-tcg_debug_assert(fixup != 0xff);
+switch (cond) {
+case TCG_COND_EQ:
+case TCG_COND_GT:
+fixup = 0;
+break;
+case TCG_COND_NE:
+case TCG_COND_LE:
+fixup = NEED_INV;
+break;
+case TCG_COND_LT:
+fixup = NEED_SWAP;
+break;
+case TCG_COND_GE:
+fixup = NEED_SWAP | NEED_INV;
+break;
+case TCG_COND_LEU:
+if (vece <= MO_32) {
+fixup = NEED_UMIN;
+} else {
+fixup = NEED_BIAS | NEED_INV;
+}
+break;
+case TCG_COND_GTU:
+if (vece <= MO_32) {
+fixup = NEED_UMIN | NEED_INV;
+} else {
+fixup = NEED_BIAS;
+}
+break;
+case TCG_COND_GEU:
+if (vece <= MO_32) {
+fixup = NEED_UMAX;
+} else {
+fixup = NEED_BIAS | NEED_SWAP | NEED_INV;
+}
+break;
+case TCG_COND_LTU:
+if (vece <= MO_32) {
+fixup = NEED_UMAX | NEED_INV;
+} else {
+fixup = NEED_BIAS | NEED_SWAP;
+}
+break;
+default:
+g_assert_not_reached();
+}
 
 if (fixup & NEED_INV) {
 cond = tcg_invert_cond(cond);
@@ -3500,7 +3533,16 @@ static bool expand_vec_cmp_noinv(TCGType type, unsigned vece, TCGv_vec v0,
 }
 
 t1 = t2 = NULL;
-if (fixup & NEED_BIAS) {
+if (fixup & (NEED_UMIN | NEED_UMAX)) {
+t1 = tcg_temp_new_vec(type);
+if (fixup & NEED_UMIN) {
+tcg_gen_umin_vec(vece, t1, v1, v2);
+} else {
+tcg_gen_umax_vec(vece, t1, v1, v2);
+}
+v2 = t1;
+cond = TCG_COND_EQ;
+} else if (fixup & NEED_BIAS) {
 t1 = tcg_temp_new_vec(type);
 t2 = tcg_temp_new_vec(type);
 tcg_gen_dupi_vec(vece, t2, 1ull << ((8 << vece) - 1));
-- 
2.17.1

[Qemu-devel] [PATCH 14/16] tcg/aarch64: Build vector immediates with two insns

2019-05-18 Thread Richard Henderson

Use MOVI+ORR or MVNI+BIC in order to build some vector constants,
as opposed to dropping them to the constant pool.  This includes
all 16-bit constants and a similar set of 32-bit constants.

Signed-off-by: Richard Henderson 
---
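A simplified scalar sketch of the 32-bit search, for illustration.  It
assumes the first instruction can only load a single non-zero byte at one
of the four byte positions; the patch itself also accepts "shifting ones"
immediates for that step.  The names one_byte32 and split_movi_orr are
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if at most one byte of v is non-zero. */
static bool one_byte32(uint32_t v)
{
    for (unsigned s = 0; s < 32; s += 8) {
        if ((v & ~(0xffu << s)) == 0) {
            return true;
        }
    }
    return false;
}

/*
 * Try to split v into a single-byte base plus one more byte to OR in.
 * Bytes at bit positions 24, 16 and 8 are tried in that order, which
 * mirrors the loop over i = 6, 4, 2 (shift = i * 4) in the patch.
 */
static bool split_movi_orr(uint32_t v, uint32_t *base,
                           uint32_t *orr_byte, unsigned *orr_shift)
{
    for (unsigned s = 24; s > 0; s -= 8) {
        uint32_t rest = v & ~(0xffu << s);
        if (one_byte32(rest)) {
            *base = rest;
            *orr_byte = (v >> s) & 0xff;
            *orr_shift = s;
            return true;
        }
    }
    return false;
}
```

When the split succeeds, base | (orr_byte << orr_shift) reproduces v, so
the constant can be built in two instructions.  Applying the same search
to the complement of v gives the MVNI+BIC variant.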
 tcg/aarch64/tcg-target.inc.c | 47 
 1 file changed, 47 insertions(+)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 0b8b733805..52c18074ae 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -273,6 +273,26 @@ static bool is_fimm64(uint64_t v64, int *cmode, int *imm8)
 return false;
 }
 
+/*
+ * Return non-zero if v32 can be formed by MOVI+ORR.
+ * Place the parameters for MOVI in (cmode, imm8).
+ * Return the cmode for ORR; the imm8 can be had via extraction from v32.
+ */
+static int is_shimm32_pair(uint32_t v32, int *cmode, int *imm8)
+{
+int i;
+
+for (i = 6; i > 0; i -= 2) {
+/* Mask out one byte we can add with ORR.  */
+uint32_t tmp = v32 & ~(0xffu << (i * 4));
+if (is_shimm32(tmp, cmode, imm8) ||
+is_soimm32(tmp, cmode, imm8)) {
+break;
+}
+}
+return i;
+}
+
 static int tcg_target_const_match(tcg_target_long val, TCGType type,
   const TCGArgConstraint *arg_ct)
 {
@@ -495,6 +515,8 @@ typedef enum {
 /* AdvSIMD modified immediate */
 I3606_MOVI  = 0x0f000400,
 I3606_MVNI  = 0x2f000400,
+I3606_BIC   = 0x2f001400,
+I3606_ORR   = 0x0f001400,
 
 /* AdvSIMD shift by immediate */
 I3614_SSHR  = 0x0f000400,
@@ -843,6 +865,14 @@ static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
 tcg_out_insn(s, 3606, MVNI, q, rd, 0, cmode, imm8);
 return;
 }
+
+/*
+ * Otherwise, all remaining constants can be loaded in two insns:
+ * rd = v16 & 0xff, rd |= v16 & 0xff00.
+ */
+tcg_out_insn(s, 3606, MOVI, q, rd, 0, 0x8, v16 & 0xff);
+tcg_out_insn(s, 3606, ORR, q, rd, 0, 0xa, v16 >> 8);
+return;
 } else if (v64 == dup_const(MO_32, v64)) {
 uint32_t v32 = v64;
 uint32_t n32 = ~v32;
@@ -858,6 +888,23 @@ static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
 tcg_out_insn(s, 3606, MVNI, q, rd, 0, cmode, imm8);
 return;
 }
+
+/*
+ * Restrict the set of constants to those we can load with
+ * two instructions.  Others we load from the pool.
+ */
+i = is_shimm32_pair(v32, &cmode, &imm8);
+if (i) {
+tcg_out_insn(s, 3606, MOVI, q, rd, 0, cmode, imm8);
+tcg_out_insn(s, 3606, ORR, q, rd, 0, i, extract32(v32, i * 4, 8));
+return;
+}
+i = is_shimm32_pair(n32, &cmode, &imm8);
+if (i) {
+tcg_out_insn(s, 3606, MVNI, q, rd, 0, cmode, imm8);
+tcg_out_insn(s, 3606, BIC, q, rd, 0, i, extract32(n32, i * 4, 8));
+return;
+}
 } else if (is_fimm64(v64, &cmode, &imm8)) {
 tcg_out_insn(s, 3606, MOVI, q, rd, 1, cmode, imm8);
 return;
-- 
2.17.1

[Qemu-devel] [PATCH 16/16] tcg/i386: Use MOVDQA for TCG_TYPE_V128 load/store

2019-05-18 Thread Richard Henderson

MOVDQA raises #GP, aka SIGSEGV, if the effective address
is not aligned to 16 bytes.

We have assertions in tcg-op-gvec.c that the offset from ENV is
aligned, for vector types <= V128.  But the offset itself does not
validate that the final pointer is aligned -- one must also remember
to use the QEMU_ALIGNED() attribute on the vector member within ENV.

PowerPC Altivec has vector load/store instructions that silently
discard the low 4 bits of the address, making alignment mistakes
difficult to discover.  Aid discovery by making the most popular
host visibly signal the error.

Signed-off-by: Richard Henderson 
---
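A small sketch of the point about offsets versus final pointers.  It uses
the C11 _Alignas specifier as a stand-in for QEMU_ALIGNED(16); the types
Vec128 and EnvAligned and both helper functions are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A 128-bit vector whose natural alignment is only that of uint64_t. */
typedef struct { uint64_t d[2]; } Vec128;

/* ENV-like struct; _Alignas(16) plays the role of QEMU_ALIGNED(16). */
typedef struct {
    uint64_t pc;
    _Alignas(16) Vec128 vreg;
} EnvAligned;

static bool is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15) == 0;
}

/* An aligned offset only helps if the base pointer is aligned too. */
static bool final_pointer_aligned16(const unsigned char *base, size_t offset)
{
    return is_aligned16(base + offset);
}
```

With the alignment attribute on the member, the compiler raises the
alignment of the whole struct, so an aligned offset then implies an
aligned final pointer for any properly allocated instance.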
 tcg/i386/tcg-target.inc.c | 24 ++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 6ec5e60448..c0443da4af 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -1082,14 +1082,24 @@ static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret,
 }
 /* FALLTHRU */
 case TCG_TYPE_V64:
+/* There is no instruction that can validate 8-byte alignment.  */
 tcg_debug_assert(ret >= 16);
 tcg_out_vex_modrm_offset(s, OPC_MOVQ_VqWq, ret, 0, arg1, arg2);
 break;
 case TCG_TYPE_V128:
+/*
+ * The gvec infrastructure asserts that v128 vector loads
+ * and stores use a 16-byte aligned offset.  Validate that the
+ * final pointer is aligned by using an insn that will SIGSEGV.
+ */
 tcg_debug_assert(ret >= 16);
-tcg_out_vex_modrm_offset(s, OPC_MOVDQU_VxWx, ret, 0, arg1, arg2);
+tcg_out_vex_modrm_offset(s, OPC_MOVDQA_VxWx, ret, 0, arg1, arg2);
 break;
 case TCG_TYPE_V256:
+/*
+ * The gvec infrastructure only requires 16-byte alignment,
+ * so here we must use an unaligned load.
+ */
 tcg_debug_assert(ret >= 16);
 tcg_out_vex_modrm_offset(s, OPC_MOVDQU_VxWx | P_VEXL,
  ret, 0, arg1, arg2);
@@ -1117,14 +1127,24 @@ static void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg,
 }
 /* FALLTHRU */
 case TCG_TYPE_V64:
+/* There is no instruction that can validate 8-byte alignment.  */
 tcg_debug_assert(arg >= 16);
 tcg_out_vex_modrm_offset(s, OPC_MOVQ_WqVq, arg, 0, arg1, arg2);
 break;
 case TCG_TYPE_V128:
+/*
+ * The gvec infrastructure asserts that v128 vector loads
+ * and stores use a 16-byte aligned offset.  Validate that the
+ * final pointer is aligned by using an insn that will SIGSEGV.
+ */
 tcg_debug_assert(arg >= 16);
-tcg_out_vex_modrm_offset(s, OPC_MOVDQU_WxVx, arg, 0, arg1, arg2);
+tcg_out_vex_modrm_offset(s, OPC_MOVDQA_WxVx, arg, 0, arg1, arg2);
 break;
 case TCG_TYPE_V256:
+/*
+ * The gvec infrastructure only requires 16-byte alignment,
+ * so here we must use an unaligned store.
+ */
 tcg_debug_assert(arg >= 16);
 tcg_out_vex_modrm_offset(s, OPC_MOVDQU_WxVx | P_VEXL,
  arg, 0, arg1, arg2);
-- 
2.17.1

[Qemu-devel] [PATCH 13/16] tcg/aarch64: Use MVNI in tcg_out_dupi_vec

2019-05-18 Thread Richard Henderson

The complement of a subset of immediates can be computed
with a single instruction.

Signed-off-by: Richard Henderson 
---
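A scalar sketch of the 16-bit selection, for illustration.  It assumes an
immediate is loadable in one instruction when only one of its two bytes
is non-zero, and that the inverted form applies when the complement has
that shape.  DupiChoice, one_byte16 and choose16 are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { USE_MOVI, USE_MVNI, USE_OTHER } DupiChoice;

/* True if at most one byte of v16 is non-zero; reports that byte. */
static bool one_byte16(uint16_t v16, uint8_t *imm8)
{
    if ((v16 & 0xff00) == 0) {
        *imm8 = v16 & 0xff;
        return true;
    }
    if ((v16 & 0x00ff) == 0) {
        *imm8 = v16 >> 8;
        return true;
    }
    return false;
}

/* Prefer the plain form, then try the complemented form. */
static DupiChoice choose16(uint16_t v16, uint8_t *imm8)
{
    if (one_byte16(v16, imm8)) {
        return USE_MOVI;
    }
    if (one_byte16((uint16_t)~v16, imm8)) {
        return USE_MVNI;
    }
    return USE_OTHER;
}
```

For example, 0xff12 has two non-zero bytes, but its complement 0x00ed has
one, so the inverted form can load it with imm8 = 0xed.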
 tcg/aarch64/tcg-target.inc.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 1422dfebe2..0b8b733805 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -494,6 +494,7 @@ typedef enum {
 
 /* AdvSIMD modified immediate */
 I3606_MOVI  = 0x0f000400,
+I3606_MVNI  = 0x2f000400,
 
 /* AdvSIMD shift by immediate */
 I3614_SSHR  = 0x0f000400,
@@ -838,8 +839,13 @@ static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
 tcg_out_insn(s, 3606, MOVI, q, rd, 0, cmode, imm8);
 return;
 }
+if (is_shimm16(~v16, &cmode, &imm8)) {
+tcg_out_insn(s, 3606, MVNI, q, rd, 0, cmode, imm8);
+return;
+}
 } else if (v64 == dup_const(MO_32, v64)) {
 uint32_t v32 = v64;
+uint32_t n32 = ~v32;
 
 if (is_shimm32(v32, &cmode, &imm8) ||
 is_soimm32(v32, &cmode, &imm8) ||
@@ -847,6 +853,11 @@ static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
 tcg_out_insn(s, 3606, MOVI, q, rd, 0, cmode, imm8);
 return;
 }
+if (is_shimm32(n32, &cmode, &imm8) ||
+is_soimm32(n32, &cmode, &imm8)) {
+tcg_out_insn(s, 3606, MVNI, q, rd, 0, cmode, imm8);
+return;
+}
 } else if (is_fimm64(v64, &cmode, &imm8)) {
 tcg_out_insn(s, 3606, MOVI, q, rd, 1, cmode, imm8);
 return;
-- 
2.17.1

[Qemu-devel] [PATCH 06/16] tcg: Expand vector minmax using cmp+cmpsel

2019-05-18 Thread Richard Henderson

Provide a generic fallback for the min/max operations.

Signed-off-by: Richard Henderson 
---
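A scalar sketch of the fallback, on one 32-bit lane.  It assumes cmpsel
means "(a cond b) ? c : d" and shows that min and max fall out when the
selected values are the compared values themselves.  The Cond enum and
all function names here are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { COND_LT, COND_GT, COND_LTU, COND_GTU } Cond;

/* Signed less-than on raw bits: flipping the sign bit maps signed
 * order onto unsigned order. */
static bool lt_signed(uint32_t a, uint32_t b)
{
    return (a ^ 0x80000000u) < (b ^ 0x80000000u);
}

/* Scalar model of compare-select: (a cond b) ? c : d. */
static uint32_t cmpsel(Cond cond, uint32_t a, uint32_t b,
                       uint32_t c, uint32_t d)
{
    bool take_c = false;
    switch (cond) {
    case COND_LT:  take_c = lt_signed(a, b); break;
    case COND_GT:  take_c = lt_signed(b, a); break;
    case COND_LTU: take_c = a < b; break;
    case COND_GTU: take_c = a > b; break;
    }
    return take_c ? c : d;
}

/* min/max as cmpsel(cond, a, b, a, b), as in the fallback path. */
static uint32_t smin(uint32_t a, uint32_t b) { return cmpsel(COND_LT, a, b, a, b); }
static uint32_t smax(uint32_t a, uint32_t b) { return cmpsel(COND_GT, a, b, a, b); }
static uint32_t umin(uint32_t a, uint32_t b) { return cmpsel(COND_LTU, a, b, a, b); }
static uint32_t umax(uint32_t a, uint32_t b) { return cmpsel(COND_GTU, a, b, a, b); }
```

The bit pattern 0xffffffff is -1 when read as signed, so smin and umin
disagree on it, which is why each operation carries its own condition.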
 tcg/tcg-op-vec.c | 20 
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
index 004a34935b..501d9630a2 100644
--- a/tcg/tcg-op-vec.c
+++ b/tcg/tcg-op-vec.c
@@ -120,6 +120,10 @@ bool tcg_can_emit_vecop_list(const TCGOpcode *list,
 }
 break;
 case INDEX_op_cmpsel_vec:
+case INDEX_op_smin_vec:
+case INDEX_op_smax_vec:
+case INDEX_op_umin_vec:
+case INDEX_op_umax_vec:
 if (tcg_can_emit_vec_op(INDEX_op_cmp_vec, type, vece)) {
 continue;
 }
@@ -632,24 +636,32 @@ void tcg_gen_ussub_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 do_op3_nofail(vece, r, a, b, INDEX_op_ussub_vec);
 }
 
+static void do_minmax(unsigned vece, TCGv_vec r, TCGv_vec a,
+  TCGv_vec b, TCGOpcode opc, TCGCond cond)
+{
+if (!do_op3(vece, r, a, b, opc)) {
+tcg_gen_cmpsel_vec(cond, vece, r, a, b, a, b);
+}
+}
+
 void tcg_gen_smin_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-do_op3_nofail(vece, r, a, b, INDEX_op_smin_vec);
+do_minmax(vece, r, a, b, INDEX_op_smin_vec, TCG_COND_LT);
 }
 
 void tcg_gen_umin_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-do_op3_nofail(vece, r, a, b, INDEX_op_umin_vec);
+do_minmax(vece, r, a, b, INDEX_op_umin_vec, TCG_COND_LTU);
 }
 
 void tcg_gen_smax_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-do_op3_nofail(vece, r, a, b, INDEX_op_smax_vec);
+do_minmax(vece, r, a, b, INDEX_op_smax_vec, TCG_COND_GT);
 }
 
 void tcg_gen_umax_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
 {
-do_op3_nofail(vece, r, a, b, INDEX_op_umax_vec);
+do_minmax(vece, r, a, b, INDEX_op_umax_vec, TCG_COND_GTU);
 }
 
 void tcg_gen_shlv_vec(unsigned vece, TCGv_vec r, TCGv_vec a, TCGv_vec b)
-- 
2.17.1

[Qemu-devel] [PATCH 11/16] tcg/aarch64: Support vector bitwise select value

2019-05-18 Thread Richard Henderson

The instruction set has 3 insns that perform the same operation,
only varying in which operand must overlap the destination.  We
can represent the operation without overlap and choose based on
the operands seen.

Signed-off-by: Richard Henderson 
---
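A scalar sketch of the three forms on one 64-bit lane, for illustration.
It assumes the usual AdvSIMD meanings: BSL uses the destination as the
mask, BIT inserts where the third operand is 1, and BIF inserts where it
is 0.  The function names are hypothetical, and bitsel follows the
definition d = (b & a) | (c & ~a) with a as the mask.

```c
#include <stdint.h>

/* Reference: d = (b & a) | (c & ~a), with a as the mask. */
static uint64_t bitsel(uint64_t a, uint64_t b, uint64_t c)
{
    return (b & a) | (c & ~a);
}

/* BSL: the destination holds the mask on entry. */
static void bsl(uint64_t *vd, uint64_t vn, uint64_t vm)
{
    *vd = (vn & *vd) | (vm & ~*vd);
}

/* BIT: insert bits of vn into the destination where vm is 1. */
static void bit(uint64_t *vd, uint64_t vn, uint64_t vm)
{
    *vd = (vn & vm) | (*vd & ~vm);
}

/* BIF: insert bits of vn into the destination where vm is 0. */
static void bif(uint64_t *vd, uint64_t vn, uint64_t vm)
{
    *vd = (vn & ~vm) | (*vd & vm);
}
```

Which form applies depends on what the destination already holds: use
bit(&d, b, a) when d starts as c, bif(&d, c, a) when d starts as b, and
bsl(&d, b, c) when d starts as the mask a (copying the mask in first if
needed).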
 tcg/aarch64/tcg-target.h |  2 +-
 tcg/aarch64/tcg-target.inc.c | 24 +++-
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index b4a9d36bbc..ca214f6909 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -140,7 +140,7 @@ typedef enum {
 #define TCG_TARGET_HAS_mul_vec  1
 #define TCG_TARGET_HAS_sat_vec  1
 #define TCG_TARGET_HAS_minmax_vec   1
-#define TCG_TARGET_HAS_bitsel_vec   0
+#define TCG_TARGET_HAS_bitsel_vec   1
 #define TCG_TARGET_HAS_cmpsel_vec   0
 
 #define TCG_TARGET_DEFAULT_MO (0)
diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 40bf35079a..e99149cda7 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -523,6 +523,9 @@ typedef enum {
 I3616_ADD   = 0x0e208400,
 I3616_AND   = 0x0e201c00,
 I3616_BIC   = 0x0e601c00,
+I3616_BIF   = 0x2ee01c00,
+I3616_BIT   = 0x2ea01c00,
+I3616_BSL   = 0x2e601c00,
 I3616_EOR   = 0x2e201c00,
 I3616_MUL   = 0x0e209c00,
 I3616_ORR   = 0x0ea01c00,
@@ -2181,7 +2184,7 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 
 TCGType type = vecl + TCG_TYPE_V64;
 unsigned is_q = vecl;
-TCGArg a0, a1, a2;
+TCGArg a0, a1, a2, a3;
 
 a0 = args[0];
 a1 = args[1];
@@ -2304,6 +2307,20 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 }
 break;
 
+case INDEX_op_bitsel_vec:
+a3 = args[3];
+if (a0 == a3) {
+tcg_out_insn(s, 3616, BIT, is_q, 0, a0, a2, a1);
+} else if (a0 == a2) {
+tcg_out_insn(s, 3616, BIF, is_q, 0, a0, a3, a1);
+} else {
+if (a0 != a1) {
+tcg_out_mov(s, type, a0, a1);
+}
+tcg_out_insn(s, 3616, BSL, is_q, 0, a0, a2, a3);
+}
+break;
+
 case INDEX_op_mov_vec:  /* Always emitted via tcg_out_mov.  */
 case INDEX_op_dupi_vec: /* Always emitted via tcg_out_movi.  */
 case INDEX_op_dup_vec:  /* Always emitted via tcg_out_dup_vec.  */
@@ -2334,6 +2351,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 case INDEX_op_usadd_vec:
 case INDEX_op_ussub_vec:
 case INDEX_op_shlv_vec:
+case INDEX_op_bitsel_vec:
 return 1;
 case INDEX_op_shrv_vec:
 case INDEX_op_sarv_vec:
@@ -2408,6 +2426,8 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
 = { .args_ct_str = { "r", "r", "rA", "rZ", "rZ" } };
 static const TCGTargetOpDef add2
 = { .args_ct_str = { "r", "r", "rZ", "rZ", "rA", "rMZ" } };
+static const TCGTargetOpDef w_w_w_w
+= { .args_ct_str = { "w", "w", "w", "w" } };
 
 switch (op) {
 case INDEX_op_goto_ptr:
@@ -2580,6 +2600,8 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
 return &w_wr;
 case INDEX_op_cmp_vec:
 return &w_w_wZ;
+case INDEX_op_bitsel_vec:
+return &w_w_w_w;
 
 default:
 return NULL;
-- 
2.17.1

[Qemu-devel] [PATCH 12/16] tcg/aarch64: Split up is_fimm

2019-05-18 Thread Richard Henderson

There are several sub-classes of vector immediate, and only MOVI
can use them all.  This will enable usage of MVNI and ORRI, which
use progressively fewer sub-classes.

This patch adds no new functionality, merely splits the function
and moves part of the logic into tcg_out_dupi_vec.

Signed-off-by: Richard Henderson 
---
 tcg/aarch64/tcg-target.inc.c | 205 ---
 1 file changed, 120 insertions(+), 85 deletions(-)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index e99149cda7..1422dfebe2 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -190,103 +190,86 @@ static inline bool is_limm(uint64_t val)
 return (val & (val - 1)) == 0;
 }
 
-/* Match a constant that is valid for vectors.  */
-static bool is_fimm(uint64_t v64, int *op, int *cmode, int *imm8)
+/* Return true if v16 is a valid 16-bit shifted immediate.  */
+static bool is_shimm16(uint16_t v16, int *cmode, int *imm8)
 {
-int i;
-
-*op = 0;
-/* Match replication across 8 bits.  */
-if (v64 == dup_const(MO_8, v64)) {
-*cmode = 0xe;
-*imm8 = v64 & 0xff;
+if (v16 == (v16 & 0xff)) {
+*cmode = 0x8;
+*imm8 = v16 & 0xff;
+return true;
+} else if (v16 == (v16 & 0xff00)) {
+*cmode = 0xa;
+*imm8 = v16 >> 8;
 return true;
 }
-/* Match replication across 16 bits.  */
-if (v64 == dup_const(MO_16, v64)) {
-uint16_t v16 = v64;
+return false;
+}
 
-if (v16 == (v16 & 0xff)) {
-*cmode = 0x8;
-*imm8 = v16 & 0xff;
-return true;
-} else if (v16 == (v16 & 0xff00)) {
-*cmode = 0xa;
-*imm8 = v16 >> 8;
-return true;
-}
+/* Return true if v32 is a valid 32-bit shifted immediate.  */
+static bool is_shimm32(uint32_t v32, int *cmode, int *imm8)
+{
+if (v32 == (v32 & 0xff)) {
+*cmode = 0x0;
+*imm8 = v32 & 0xff;
+return true;
+} else if (v32 == (v32 & 0xff00)) {
+*cmode = 0x2;
+*imm8 = (v32 >> 8) & 0xff;
+return true;
+} else if (v32 == (v32 & 0xff0000)) {
+*cmode = 0x4;
+*imm8 = (v32 >> 16) & 0xff;
+return true;
+} else if (v32 == (v32 & 0xff000000)) {
+*cmode = 0x6;
+*imm8 = v32 >> 24;
+return true;
 }
-/* Match replication across 32 bits.  */
-if (v64 == dup_const(MO_32, v64)) {
-uint32_t v32 = v64;
+return false;
+}
 
-if (v32 == (v32 & 0xff)) {
-*cmode = 0x0;
-*imm8 = v32 & 0xff;
-return true;
-} else if (v32 == (v32 & 0xff00)) {
-*cmode = 0x2;
-*imm8 = (v32 >> 8) & 0xff;
-return true;
-} else if (v32 == (v32 & 0xff0000)) {
-*cmode = 0x4;
-*imm8 = (v32 >> 16) & 0xff;
-return true;
-} else if (v32 == (v32 & 0xff000000)) {
-*cmode = 0x6;
-*imm8 = v32 >> 24;
-return true;
-} else if ((v32 & 0xffff00ff) == 0xff) {
-*cmode = 0xc;
-*imm8 = (v32 >> 8) & 0xff;
-return true;
-} else if ((v32 & 0xff00ffff) == 0xffff) {
-*cmode = 0xd;
-*imm8 = (v32 >> 16) & 0xff;
-return true;
-}
-/* Match forms of a float32.  */
-if (extract32(v32, 0, 19) == 0
-&& (extract32(v32, 25, 6) == 0x20
-|| extract32(v32, 25, 6) == 0x1f)) {
-*cmode = 0xf;
-*imm8 = (extract32(v32, 31, 1) << 7)
-  | (extract32(v32, 25, 1) << 6)
-  | extract32(v32, 19, 6);
-return true;
-}
+/* Return true if v32 is a valid 32-bit shifting ones immediate.  */
+static bool is_soimm32(uint32_t v32, int *cmode, int *imm8)
+{
+if ((v32 & 0xffff00ff) == 0xff) {
+*cmode = 0xc;
+*imm8 = (v32 >> 8) & 0xff;
+return true;
+} else if ((v32 & 0xff00ffff) == 0xffff) {
+*cmode = 0xd;
+*imm8 = (v32 >> 16) & 0xff;
+return true;
 }
-/* Match forms of a float64.  */
+return false;
+}
+
+/* Return true if v32 is a valid float32 immediate.  */
+static bool is_fimm32(uint32_t v32, int *cmode, int *imm8)
+{
+if (extract32(v32, 0, 19) == 0
+&& (extract32(v32, 25, 6) == 0x20
+|| extract32(v32, 25, 6) == 0x1f)) {
+*cmode = 0xf;
+*imm8 = (extract32(v32, 31, 1) << 7)
+  | (extract32(v32, 25, 1) << 6)
+  | extract32(v32, 19, 6);
+return true;
+}
+return false;
+}
+
+/* Return true if v64 is a valid float64 immediate.  */
+static bool is_fimm64(uint64_t v64, int *cmode, int *imm8)
+{
 if (extract64(v64, 0, 48) == 0
 && (extract64(v64, 54, 9) == 0x100
 || extract64(v64, 54, 9) == 0x0ff)) {
 *cmode = 0xf;
-*op = 1;
 *imm8 = (extract64(v64, 63, 1) << 7)

[Qemu-devel] [PATCH 15/16] tcg/aarch64: Allow immediates for vector ORR and BIC

2019-05-18 Thread Richard Henderson

This allows immediates to be used for ORR and BIC,
as well as the trivial inversions, ORC and AND.

Signed-off-by: Richard Henderson 
---
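A note for readers, not part of the commit: the "trivial inversions"
mentioned above rely on x & c being the same as BIC with ~c, so the
constant has to be checked after inversion. A minimal sketch of that
matching step follows, assuming 32-bit lanes only (the patch also accepts
16-bit lanes). All sketch_* names are hypothetical stand-ins, not QEMU
functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the low 32 bits of val are repeated in the high 32 bits. */
static bool sketch_is_dup32(uint64_t val)
{
    return (uint32_t)(val >> 32) == (uint32_t)val;
}

/* True if at most one byte lane of v32 is non-zero. */
static bool sketch_is_one_byte32(uint32_t v32)
{
    for (int lane = 0; lane < 4; lane++) {
        uint32_t mask = (uint32_t)0xff << (lane * 8);

        if ((v32 & ~mask) == 0) {
            return true;
        }
    }
    return false;
}

/* Models the 'O' constraint (ORR/BIC immediate), 32-bit lanes only. */
static bool sketch_match_orri(uint64_t val)
{
    return sketch_is_dup32(val) && sketch_is_one_byte32((uint32_t)val);
}

/* Models the 'N' constraint: the constant is inverted before the check. */
static bool sketch_match_andi(uint64_t val)
{
    return sketch_match_orri(~val);
}

/* Reference semantics of BIC: clear the bits that are set in imm. */
static uint64_t sketch_bic(uint64_t x, uint64_t imm)
{
    return x & ~imm;
}
```

For example, an AND with 0xffffff00ffffff00 passes sketch_match_andi
because its inverse, 0x000000ff000000ff, is a one-byte immediate that BIC
can encode.
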
 tcg/aarch64/tcg-target.inc.c | 90 +---
 1 file changed, 83 insertions(+), 7 deletions(-)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 52c18074ae..9e1dad9696 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -119,6 +119,8 @@ static inline bool patch_reloc(tcg_insn_unit *code_ptr, int 
type,
 #define TCG_CT_CONST_LIMM 0x200
 #define TCG_CT_CONST_ZERO 0x400
 #define TCG_CT_CONST_MONE 0x800
+#define TCG_CT_CONST_ORRI 0x1000
+#define TCG_CT_CONST_ANDI 0x2000
 
 /* parse target specific constraints */
 static const char *target_parse_constraint(TCGArgConstraint *ct,
@@ -154,6 +156,12 @@ static const char 
*target_parse_constraint(TCGArgConstraint *ct,
 case 'M': /* minus one */
 ct->ct |= TCG_CT_CONST_MONE;
 break;
+case 'O': /* vector orr/bic immediate */
+ct->ct |= TCG_CT_CONST_ORRI;
+break;
+case 'N': /* vector orr/bic immediate, inverted */
+ct->ct |= TCG_CT_CONST_ANDI;
+break;
 case 'Z': /* zero */
 ct->ct |= TCG_CT_CONST_ZERO;
 break;
@@ -293,6 +301,16 @@ static int is_shimm32_pair(uint32_t v32, int *cmode, int 
*imm8)
 return i;
 }
 
+/* Return true if V is a valid 16-bit or 32-bit shifted immediate.  */
+static bool is_shimm1632(uint32_t v32, int *cmode, int *imm8)
+{
+if (v32 == deposit32(v32, 16, 16, v32)) {
+return is_shimm16(v32, cmode, imm8);
+} else {
+return is_shimm32(v32, cmode, imm8);
+}
+}
+
 static int tcg_target_const_match(tcg_target_long val, TCGType type,
   const TCGArgConstraint *arg_ct)
 {
@@ -317,6 +335,23 @@ static int tcg_target_const_match(tcg_target_long val, 
TCGType type,
 return 1;
 }
 
+switch (ct & (TCG_CT_CONST_ORRI | TCG_CT_CONST_ANDI)) {
+case 0:
+break;
+case TCG_CT_CONST_ANDI:
+val = ~val;
+/* fallthru */
+case TCG_CT_CONST_ORRI:
+if (val == deposit64(val, 32, 32, val)) {
+int cmode, imm8;
+return is_shimm1632(val, &cmode, &imm8);
+}
+break;
+default:
+/* Both bits should not be set for the same insn.  */
+g_assert_not_reached();
+}
+
 return 0;
 }
 
@@ -2278,6 +2313,7 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 TCGType type = vecl + TCG_TYPE_V64;
 unsigned is_q = vecl;
 TCGArg a0, a1, a2, a3;
+int cmode, imm8;
 
 a0 = args[0];
 a1 = args[1];
@@ -2309,20 +2345,56 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 tcg_out_insn(s, 3617, ABS, is_q, vece, a0, a1);
 break;
 case INDEX_op_and_vec:
+if (const_args[2]) {
+is_shimm1632(~a2, &cmode, &imm8);
+if (a0 == a1) {
+tcg_out_insn(s, 3606, BIC, is_q, a0, 0, cmode, imm8);
+return;
+}
+tcg_out_insn(s, 3606, MVNI, is_q, a0, 0, cmode, imm8);
+a2 = a0;
+}
 tcg_out_insn(s, 3616, AND, is_q, 0, a0, a1, a2);
 break;
 case INDEX_op_or_vec:
+if (const_args[2]) {
+is_shimm1632(a2, &cmode, &imm8);
+if (a0 == a1) {
+tcg_out_insn(s, 3606, ORR, is_q, a0, 0, cmode, imm8);
+return;
+}
+tcg_out_insn(s, 3606, MOVI, is_q, a0, 0, cmode, imm8);
+a2 = a0;
+}
 tcg_out_insn(s, 3616, ORR, is_q, 0, a0, a1, a2);
 break;
-case INDEX_op_xor_vec:
-tcg_out_insn(s, 3616, EOR, is_q, 0, a0, a1, a2);
-break;
 case INDEX_op_andc_vec:
+if (const_args[2]) {
+is_shimm1632(a2, &cmode, &imm8);
+if (a0 == a1) {
+tcg_out_insn(s, 3606, BIC, is_q, a0, 0, cmode, imm8);
+return;
+}
+tcg_out_insn(s, 3606, MOVI, is_q, a0, 0, cmode, imm8);
+a2 = a0;
+}
 tcg_out_insn(s, 3616, BIC, is_q, 0, a0, a1, a2);
 break;
 case INDEX_op_orc_vec:
+if (const_args[2]) {
+is_shimm1632(~a2, &cmode, &imm8);
+if (a0 == a1) {
+tcg_out_insn(s, 3606, ORR, is_q, a0, 0, cmode, imm8);
+return;
+}
+tcg_out_insn(s, 3606, MVNI, is_q, a0, 0, cmode, imm8);
+a2 = a0;
+}
 tcg_out_insn(s, 3616, ORN, is_q, 0, a0, a1, a2);
 break;
+case INDEX_op_xor_vec:
+tcg_out_insn(s, 3616, EOR, is_q, 0, a0, a1, a2);
+break;
 case INDEX_op_ssadd_vec:
 tcg_out_insn(s, 3616, SQADD, is_q, vece, a0, a1, a2);
 break;
@@ -2505,6 +2577,8 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode 
op)
 static const TCGTargetOpDef lZ_l = { .args_ct_str = { "lZ", "l" } };
 static const TCGTargetOpDef r_r_r = { .args_ct_str = { "r", "r", "r" } };

[Qemu-devel] [PATCH 07/16] tcg: Add TCG_OPF_NOT_PRESENT if TCG_TARGET_HAS_foo is negative

2019-05-18 Thread Richard Henderson

If INDEX_op_foo is always expanded by tcg_expand_vec_op, then
there may be no reasonable set of constraints to return from
tcg_target_op_def for that opcode.

Let TCG_TARGET_HAS_foo be specified as -1 in that case.  Thus a
boolean test for TCG_TARGET_HAS_foo is true, but we will not
assert within process_op_defs when no constraints are specified.

Compare this with tcg_can_emit_vec_op, which already uses this
tri-state indication.

Signed-off-by: Richard Henderson 
---
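A note for readers, not part of the commit: the effect of the macro
change can be tried in isolation. This is a minimal sketch, assuming a
compiler that provides __builtin_constant_p (GCC or Clang). The flag
value and the SKETCH_* names are hypothetical and only mirror the shape
of the macro in tcg-opc.h.

```c
/* Hypothetical flag value standing in for TCG_OPF_NOT_PRESENT. */
#define SKETCH_OPF_NOT_PRESENT 0x10

/* Before the patch: only a constant 0 marks the opcode as absent. */
#define SKETCH_IMPL_OLD(X) \
    (__builtin_constant_p(X) && !(X) ? SKETCH_OPF_NOT_PRESENT : 0)

/* After the patch: a constant 0 or a negative constant does. */
#define SKETCH_IMPL_NEW(X) \
    (__builtin_constant_p(X) && (X) <= 0 ? SKETCH_OPF_NOT_PRESENT : 0)

/* A plain boolean test still treats -1 as "available". */
static int sketch_is_available(int has)
{
    return has != 0;
}
```

With a setting of -1, SKETCH_IMPL_OLD yields 0 (the opcode looks present
and would need constraints), while SKETCH_IMPL_NEW yields the not-present
flag even though sketch_is_available(-1) is still true.
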
 tcg/tcg-opc.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index c7d971fa3d..242d608e6d 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -35,7 +35,7 @@ DEF(call, 0, 0, 3, TCG_OPF_CALL_CLOBBER | TCG_OPF_NOT_PRESENT)
 
 DEF(br, 0, 0, 1, TCG_OPF_BB_END)
 
-#define IMPL(X) (__builtin_constant_p(X) && !(X) ? TCG_OPF_NOT_PRESENT : 0)
+#define IMPL(X) (__builtin_constant_p(X) && (X) <= 0 ? TCG_OPF_NOT_PRESENT : 0)
 #if TCG_TARGET_REG_BITS == 32
 # define IMPL64  TCG_OPF_64BIT | TCG_OPF_NOT_PRESENT
 #else
-- 
2.17.1

Re: [Qemu-devel] [PATCH] nvme: fix copy direction in DMA reads going to CMB

2019-05-18 Thread Heitke, Kenneth





On 5/18/2019 1:39 AM, Klaus Birkelund Jensen wrote:

`nvme_dma_read_prp` erroneously used `qemu_iovec_*to*_buf` instead of
`qemu_iovec_*from*_buf` when the request involved the controller memory
buffer.

Signed-off-by: Klaus Birkelund Jensen 
---
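A note for readers, not part of the commit: the two copy directions are
easy to mix up, so here is a minimal sketch with a one-segment stand-in
for the I/O vector. The struct and the sketch_* functions are
hypothetical models. Only the direction of each copy is meant to match
the QEMU functions named in the commit message.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical one-segment stand-in for a scatter/gather vector. */
struct sketch_iov {
    unsigned char *base;
    size_t len;
};

/* Copy out of the vector into buf (the "to_buf" direction). */
static size_t sketch_iov_to_buf(const struct sketch_iov *iov, size_t off,
                                void *buf, size_t bytes)
{
    if (off > iov->len) {
        return 0;
    }
    if (bytes > iov->len - off) {
        bytes = iov->len - off;
    }
    memcpy(buf, iov->base + off, bytes);
    return bytes;
}

/* Copy from buf into the vector (the "from_buf" direction). */
static size_t sketch_iov_from_buf(struct sketch_iov *iov, size_t off,
                                  const void *buf, size_t bytes)
{
    if (off > iov->len) {
        return 0;
    }
    if (bytes > iov->len - off) {
        bytes = iov->len - off;
    }
    memcpy(iov->base + off, buf, bytes);
    return bytes;
}
```

A DMA read hands controller data to the memory described by the vector,
so it needs the from_buf direction. Using to_buf there would overwrite
the controller's buffer instead.
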
  hw/block/nvme.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 7caf92532a09..63a5b58849fb 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -238,7 +238,7 @@ static uint16_t nvme_dma_read_prp(NvmeCtrl *n, uint8_t 
*ptr, uint32_t len,
  }
  qemu_sglist_destroy(&qsg);
  } else {
-if (unlikely(qemu_iovec_to_buf(&iov, 0, ptr, len) != len)) {
+if (unlikely(qemu_iovec_from_buf(&iov, 0, ptr, len) != len)) {
  trace_nvme_err_invalid_dma();
  status = NVME_INVALID_FIELD | NVME_DNR;
  }



Reviewed-by: Kenneth Heitke

Re: [Qemu-devel] [RISU v2 11/11] risu_reginfo_i386: accept named feature sets for --xfeature

2019-05-18 Thread Richard Henderson

On 5/17/19 3:44 PM, Jan Bobek wrote:
> Have the --xfeature option accept "sse", "avx" and "avx512" in
> addition to a plain numerical value, purely for users' convenience.
> 
> Suggested-by: Richard Henderson 
> Signed-off-by: Jan Bobek 
> ---
>  risu_reginfo_i386.c | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)

Reviewed-by: Richard Henderson 


r~

Re: [Qemu-devel] [RISU v2 10/11] risu_reginfo_i386: replace xfeature constants with symbolic names

2019-05-18 Thread Richard Henderson

On 5/17/19 3:44 PM, Jan Bobek wrote:
> The original code used "magic numbers", which made it unclear in
> some places. Include a reference to the Intel manual where the
> constants' meaning is discussed.
> 
> Signed-off-by: Jan Bobek 
> ---
>  risu_reginfo_i386.c | 48 +++--
>  1 file changed, 33 insertions(+), 15 deletions(-)

Reviewed-by: Richard Henderson 


r~

Re: [Qemu-devel] [PATCH v4 4/5] target/mips: Refactor and fix COPY_U. instructions

2019-05-18 Thread Aleksandar Markovic

On Apr 2, 2019 3:44 PM, "Mateja Marjanovic" 
wrote:
>
> From: Mateja Marjanovic 
>
> The old version of the helper for the COPY_U. MSA instructions
> has been replaced with four helpers that don't use switch, and change
> the endianness of the given index, when executed on a big endian host.
>
> Signed-off-by: Mateja Marjanovic 
> ---

Reviewed-by: Aleksandar Markovic 
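
For readers following the index arithmetic in the helpers quoted below:
the big-endian adjustment reverses the element order inside each 64-bit
half of the 128-bit register. A minimal sketch of that mapping,
generalised over the element count, is given here. The function name is
hypothetical and is not part of the patch.

```c
#include <stdint.h>

/*
 * Remap element index n for a 128-bit register kept as two 64-bit
 * halves. "elems" is the element count (16, 8 or 4). Each half holds
 * elems / 2 elements, and their order is reversed within the half.
 */
static uint32_t sketch_be_index(uint32_t n, uint32_t elems)
{
    uint32_t half = elems / 2;

    n %= elems;
    if (n < half) {
        return half - n - 1;     /* 8 - n - 1 for bytes */
    }
    return 3 * half - n - 1;     /* 24 - n - 1 for bytes */
}
```

Because half is a power of two, the same mapping can be written as
n ^ (half - 1), which may be easier to check by eye.
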

>  target/mips/helper.h |  4 +++-
>  target/mips/msa_helper.c | 55 +++-
>  target/mips/translate.c  | 21 +-
>  3 files changed, 59 insertions(+), 21 deletions(-)
>
> diff --git a/target/mips/helper.h b/target/mips/helper.h
> index 4e49618..8b6703c 100644
> --- a/target/mips/helper.h
> +++ b/target/mips/helper.h
> @@ -875,7 +875,6 @@ DEF_HELPER_5(msa_hsub_u_df, void, env, i32, i32, i32,
i32)
>  DEF_HELPER_5(msa_sldi_df, void, env, i32, i32, i32, i32)
>  DEF_HELPER_5(msa_splati_df, void, env, i32, i32, i32, i32)
>
> -DEF_HELPER_5(msa_copy_u_df, void, env, i32, i32, i32, i32)
>  DEF_HELPER_5(msa_insert_df, void, env, i32, i32, i32, i32)
>  DEF_HELPER_5(msa_insve_df, void, env, i32, i32, i32, i32)
>  DEF_HELPER_3(msa_ctcmsa, void, env, tl, i32)
> @@ -940,6 +939,9 @@ DEF_HELPER_4(msa_copy_s_b, void, env, i32, i32, i32)
>  DEF_HELPER_4(msa_copy_s_h, void, env, i32, i32, i32)
>  DEF_HELPER_4(msa_copy_s_w, void, env, i32, i32, i32)
>  DEF_HELPER_4(msa_copy_s_d, void, env, i32, i32, i32)
> +DEF_HELPER_4(msa_copy_u_b, void, env, i32, i32, i32)
> +DEF_HELPER_4(msa_copy_u_h, void, env, i32, i32, i32)
> +DEF_HELPER_4(msa_copy_u_w, void, env, i32, i32, i32)
>
>  DEF_HELPER_4(msa_fclass_df, void, env, i32, i32, i32)
>  DEF_HELPER_4(msa_ftrunc_s_df, void, env, i32, i32, i32)
> diff --git a/target/mips/msa_helper.c b/target/mips/msa_helper.c
> index 5a06579..d5bf4dc 100644
> --- a/target/mips/msa_helper.c
> +++ b/target/mips/msa_helper.c
> @@ -1281,29 +1281,46 @@ void helper_msa_copy_s_d(CPUMIPSState *env,
uint32_t rd,
>  env->active_tc.gpr[rd] = (int64_t)env->active_fpu.fpr[ws].wr.d[n];
>  }
>
> -void helper_msa_copy_u_df(CPUMIPSState *env, uint32_t df, uint32_t rd,
> -  uint32_t ws, uint32_t n)
> +void helper_msa_copy_u_b(CPUMIPSState *env, uint32_t rd,
> + uint32_t ws, uint32_t n)
>  {
> -n %= DF_ELEMENTS(df);
> +n %= 16;
> +#if defined(HOST_WORDS_BIGENDIAN)
> +if (n < 8) {
> +n = 8 - n - 1;
> +} else {
> +n = 24 - n - 1;
> +}
> +#endif
> +env->active_tc.gpr[rd] = (uint8_t)env->active_fpu.fpr[ws].wr.b[n];
> +}
>
> -switch (df) {
> -case DF_BYTE:
> -env->active_tc.gpr[rd] =
(uint8_t)env->active_fpu.fpr[ws].wr.b[n];
> -break;
> -case DF_HALF:
> -env->active_tc.gpr[rd] =
(uint16_t)env->active_fpu.fpr[ws].wr.h[n];
> -break;
> -case DF_WORD:
> -env->active_tc.gpr[rd] =
(uint32_t)env->active_fpu.fpr[ws].wr.w[n];
> -break;
> -#ifdef TARGET_MIPS64
> -case DF_DOUBLE:
> -env->active_tc.gpr[rd] =
(uint64_t)env->active_fpu.fpr[ws].wr.d[n];
> -break;
> +void helper_msa_copy_u_h(CPUMIPSState *env, uint32_t rd,
> + uint32_t ws, uint32_t n)
> +{
> +n %= 8;
> +#if defined(HOST_WORDS_BIGENDIAN)
> +if (n < 4) {
> +n = 4 - n - 1;
> +} else {
> +n = 12 - n - 1;
> +}
>  #endif
> -default:
> -assert(0);
> +env->active_tc.gpr[rd] = (uint16_t)env->active_fpu.fpr[ws].wr.h[n];
> +}
> +
> +void helper_msa_copy_u_w(CPUMIPSState *env, uint32_t rd,
> + uint32_t ws, uint32_t n)
> +{
> +n %= 4;
> +#if defined(HOST_WORDS_BIGENDIAN)
> +if (n < 2) {
> +n = 2 - n - 1;
> +} else {
> +n = 6 - n - 1;
>  }
> +#endif
> +env->active_tc.gpr[rd] = (uint32_t)env->active_fpu.fpr[ws].wr.w[n];
>  }
>
>  void helper_msa_insert_df(CPUMIPSState *env, uint32_t df, uint32_t wd,
> diff --git a/target/mips/translate.c b/target/mips/translate.c
> index f2ea378..72ed0a8 100644
> --- a/target/mips/translate.c
> +++ b/target/mips/translate.c
> @@ -29397,6 +29397,11 @@ static void gen_msa_elm_df(CPUMIPSState *env,
DisasContext *ctx, uint32_t df,
>  generate_exception_end(ctx, EXCP_RI);
>  break;
>  }
> +if ((MASK_MSA_ELM(ctx->opcode) == OPC_COPY_U_df) &&
> +  (df == DF_WORD)) {
> +generate_exception_end(ctx, EXCP_RI);
> +break;
> +}
>  #endif
>  switch (MASK_MSA_ELM(ctx->opcode)) {
>  case OPC_COPY_S_df:
> @@ -29423,7 +29428,21 @@ static void gen_msa_elm_df(CPUMIPSState *env,
DisasContext *ctx, uint32_t df,
>  break;
>  case OPC_COPY_U_df:
>  if (likely(wd != 0)) {
> -gen_helper_msa_copy_u_df(cpu_env, tdf, twd, tws, tn);
> +switch (df) {
> +case DF_BYTE:
> +gen_helper_msa_copy_u_b(cpu_env, twd, tws, tn);
> +break;
> +

Re: [Qemu-devel] [RISU v2 04/11] risu_reginfo_i386: implement arch-specific reginfo interface

2019-05-18 Thread Richard Henderson

On 5/17/19 3:44 PM, Jan Bobek wrote:
> CPU-specific code in risu_reginfo_* is expected to define and export
> the following symbols:
> 
> - arch_long_opts, arch_extra_help, process_arch_opt
> - reginfo_size
> - reginfo_init
> - reginfo_is_eq
> - reginfo_dump, reginfo_dump_mismatch
> 
> Make risu_reginfo_i386.c implement this interface; and while we're at
> it, expand the support to x86_64 as well.
> 
> Suggested-by: Richard Henderson 
> Signed-off-by: Jan Bobek 
> ---
>  risu_reginfo_i386.h |  24 
>  risu_reginfo_i386.c | 147 ++--
>  2 files changed, 127 insertions(+), 44 deletions(-)

Reviewed-by: Richard Henderson 


r~

Re: [Qemu-devel] [RISU v2 06/11] risu_i386: remove old unused code

2019-05-18 Thread Richard Henderson

On 5/17/19 3:44 PM, Jan Bobek wrote:
> The code being removed is a remnant of the past implementation; it has
> since been replaced by its more powerful, architecture-independent
> counterpart in reginfo.c.
> 
> Reviewed-by: Alex Bennée 
> Signed-off-by: Jan Bobek 
> ---
>  risu_i386.c | 58 -
>  1 file changed, 58 deletions(-)

Reviewed-by: Richard Henderson 


r~

Re: [Qemu-devel] [PATCH v4 3/5] target/mips: Refactor and fix COPY_S. instructions

2019-05-18 Thread Aleksandar Markovic

On Apr 2, 2019 3:44 PM, "Mateja Marjanovic" 
wrote:
>
> From: Mateja Marjanovic 
>
> The old version of the helper for the COPY_S. MSA instructions
> has been replaced with four helpers that don't use switch, and change
> the endianness of the given index, when executed on a big endian host.
>
> Signed-off-by: Mateja Marjanovic 
> ---

Reviewed-by: Aleksandar Markovic 
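
For readers comparing this with the COPY_U. patch: the only difference
in the final assignment is the cast, which decides whether the element
is sign-extended or zero-extended into the wider register. A minimal
sketch follows, assuming a 64-bit destination and signed 8-bit lanes.
The sketch_* names are hypothetical.

```c
#include <stdint.h>

/* COPY_S-style read: the lane is sign-extended to the register width. */
static int64_t sketch_copy_s_b(const int8_t *lanes, unsigned n)
{
    return (int8_t)lanes[n % 16];
}

/* COPY_U-style read: the same bits are zero-extended instead. */
static int64_t sketch_copy_u_b(const int8_t *lanes, unsigned n)
{
    return (uint8_t)lanes[n % 16];
}
```

For a lane holding -128, the signed variant returns -128 and the
unsigned variant returns 128.
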

>  target/mips/helper.h |  7 +-
>  target/mips/msa_helper.c | 62 +---
>  target/mips/translate.c  | 19 ++-
>  3 files changed, 67 insertions(+), 21 deletions(-)
>
> diff --git a/target/mips/helper.h b/target/mips/helper.h
> index 2f23b0d..4e49618 100644
> --- a/target/mips/helper.h
> +++ b/target/mips/helper.h
> @@ -874,7 +874,7 @@ DEF_HELPER_5(msa_hsub_u_df, void, env, i32, i32, i32,
i32)
>
>  DEF_HELPER_5(msa_sldi_df, void, env, i32, i32, i32, i32)
>  DEF_HELPER_5(msa_splati_df, void, env, i32, i32, i32, i32)
> -DEF_HELPER_5(msa_copy_s_df, void, env, i32, i32, i32, i32)
> +
>  DEF_HELPER_5(msa_copy_u_df, void, env, i32, i32, i32, i32)
>  DEF_HELPER_5(msa_insert_df, void, env, i32, i32, i32, i32)
>  DEF_HELPER_5(msa_insve_df, void, env, i32, i32, i32, i32)
> @@ -936,6 +936,11 @@ DEF_HELPER_4(msa_pcnt_df, void, env, i32, i32, i32)
>  DEF_HELPER_4(msa_nloc_df, void, env, i32, i32, i32)
>  DEF_HELPER_4(msa_nlzc_df, void, env, i32, i32, i32)
>
> +DEF_HELPER_4(msa_copy_s_b, void, env, i32, i32, i32)
> +DEF_HELPER_4(msa_copy_s_h, void, env, i32, i32, i32)
> +DEF_HELPER_4(msa_copy_s_w, void, env, i32, i32, i32)
> +DEF_HELPER_4(msa_copy_s_d, void, env, i32, i32, i32)
> +
>  DEF_HELPER_4(msa_fclass_df, void, env, i32, i32, i32)
>  DEF_HELPER_4(msa_ftrunc_s_df, void, env, i32, i32, i32)
>  DEF_HELPER_4(msa_ftrunc_u_df, void, env, i32, i32, i32)
> diff --git a/target/mips/msa_helper.c b/target/mips/msa_helper.c
> index a500c59..5a06579 100644
> --- a/target/mips/msa_helper.c
> +++ b/target/mips/msa_helper.c
> @@ -1232,29 +1232,53 @@ void helper_msa_splati_df(CPUMIPSState *env,
uint32_t df, uint32_t wd,
>  msa_splat_df(df, pwd, pws, n);
>  }
>
> -void helper_msa_copy_s_df(CPUMIPSState *env, uint32_t df, uint32_t rd,
> -  uint32_t ws, uint32_t n)
> +void helper_msa_copy_s_b(CPUMIPSState *env, uint32_t rd,
> + uint32_t ws, uint32_t n)
>  {
> -n %= DF_ELEMENTS(df);
> +n %= 16;
> +#if defined(HOST_WORDS_BIGENDIAN)
> +if (n < 8) {
> +n = 8 - n - 1;
> +} else {
> +n = 24 - n - 1;
> +}
> +#endif
> +env->active_tc.gpr[rd] = (int8_t)env->active_fpu.fpr[ws].wr.b[n];
> +}
>
> -switch (df) {
> -case DF_BYTE:
> -env->active_tc.gpr[rd] = (int8_t)env->active_fpu.fpr[ws].wr.b[n];
> -break;
> -case DF_HALF:
> -env->active_tc.gpr[rd] =
(int16_t)env->active_fpu.fpr[ws].wr.h[n];
> -break;
> -case DF_WORD:
> -env->active_tc.gpr[rd] =
(int32_t)env->active_fpu.fpr[ws].wr.w[n];
> -break;
> -#ifdef TARGET_MIPS64
> -case DF_DOUBLE:
> -env->active_tc.gpr[rd] =
(int64_t)env->active_fpu.fpr[ws].wr.d[n];
> -break;
> +void helper_msa_copy_s_h(CPUMIPSState *env, uint32_t rd,
> + uint32_t ws, uint32_t n)
> +{
> +n %= 8;
> +#if defined(HOST_WORDS_BIGENDIAN)
> +if (n < 4) {
> +n = 4 - n - 1;
> +} else {
> +n = 12 - n - 1;
> +}
>  #endif
> -default:
> -assert(0);
> +env->active_tc.gpr[rd] = (int16_t)env->active_fpu.fpr[ws].wr.h[n];
> +}
> +
> +void helper_msa_copy_s_w(CPUMIPSState *env, uint32_t rd,
> + uint32_t ws, uint32_t n)
> +{
> +n %= 4;
> +#if defined(HOST_WORDS_BIGENDIAN)
> +if (n < 2) {
> +n = 2 - n - 1;
> +} else {
> +n = 6 - n - 1;
>  }
> +#endif
> +env->active_tc.gpr[rd] = (int32_t)env->active_fpu.fpr[ws].wr.w[n];
> +}
> +
> +void helper_msa_copy_s_d(CPUMIPSState *env, uint32_t rd,
> + uint32_t ws, uint32_t n)
> +{
> +n %= 2;
> +env->active_tc.gpr[rd] = (int64_t)env->active_fpu.fpr[ws].wr.d[n];
>  }
>
>  void helper_msa_copy_u_df(CPUMIPSState *env, uint32_t df, uint32_t rd,
> diff --git a/target/mips/translate.c b/target/mips/translate.c
> index 189bbc4..f2ea378 100644
> --- a/target/mips/translate.c
> +++ b/target/mips/translate.c
> @@ -29401,7 +29401,24 @@ static void gen_msa_elm_df(CPUMIPSState *env,
DisasContext *ctx, uint32_t df,
>  switch (MASK_MSA_ELM(ctx->opcode)) {
>  case OPC_COPY_S_df:
>  if (likely(wd != 0)) {
> -gen_helper_msa_copy_s_df(cpu_env, tdf, twd, tws, tn);
> +switch (df) {
> +case DF_BYTE:
> +gen_helper_msa_copy_s_b(cpu_env, twd, tws, tn);
> +break;
> +case DF_HALF:
> +gen_helper_msa_copy_s_h(cpu_env, twd, tws, tn);
> +break;
> +case DF_WORD:
> +

Re: [Qemu-devel] [RISU v2 05/11] risu_i386: implement missing CPU-specific functions

2019-05-18 Thread Richard Henderson

On 5/17/19 3:44 PM, Jan Bobek wrote:
> risu_i386.c is expected to implement the following functions:
> 
> - advance_pc
> - get_reginfo_paramreg, set_ucontext_paramreg
> - get_risuop
> - get_pc
> 
> This patch adds the necessary code. We use EAX as the parameter
> register and opcode "UD1 %xxx,%eax" for triggering RISU actions.
> 
> Suggested-by: Richard Henderson 
> Signed-off-by: Jan Bobek 
> ---
>  risu_i386.c | 35 ++-
>  1 file changed, 30 insertions(+), 5 deletions(-)

Reviewed-by: Richard Henderson 


r~

Re: [Qemu-devel] [RISU v2 07/11] test_i386: change syntax from nasm to gas

2019-05-18 Thread Richard Henderson

On 5/17/19 3:44 PM, Jan Bobek wrote:
> This allows us to drop dependency on NASM and build the test image
> with GCC only. Adds support for x86_64, too.
> 
> Suggested-by: Richard Henderson 
> Signed-off-by: Jan Bobek 
> ---
>  Makefile|  3 +++
>  test_i386.S | 41 +
>  test_i386.s | 27 ---
>  3 files changed, 44 insertions(+), 27 deletions(-)
>  create mode 100644 test_i386.S
>  delete mode 100644 test_i386.s

Reviewed-by: Richard Henderson 


r~

Re: [Qemu-devel] [RISU v2 08/11] configure: add i386/x86_64 architectures

2019-05-18 Thread Richard Henderson

On 5/17/19 3:44 PM, Jan Bobek wrote:
> Now that i386 and x86_64 architectures are supported by RISU, we want
> to detect them and build RISU for them automatically.
> 
> Suggested-by: Richard Henderson 
> Signed-off-by: Jan Bobek 
> ---
>  configure | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)


Reviewed-by: Richard Henderson 


r~

Re: [Qemu-devel] [RISU v2 03/11] risu_i386: move reginfo-related code to risu_reginfo_i386.c

2019-05-18 Thread Richard Henderson

On 5/17/19 3:44 PM, Jan Bobek wrote:
> In order to build risu successfully for i386, we need files
> risu_reginfo_i386.{h,c}; this patch adds the latter by extracting the
> relevant code from risu_i386.c.
> 
> This patch is pure code motion; no functional changes were made.
> 
> Reviewed-by: Alex Bennée 
> Signed-off-by: Jan Bobek 
> ---
>  risu_i386.c | 54 ---
>  risu_reginfo_i386.c | 68 +
>  2 files changed, 68 insertions(+), 54 deletions(-)
>  create mode 100644 risu_reginfo_i386.c

Reviewed-by: Richard Henderson 


r~

Re: [Qemu-devel] [RISU v2 02/11] risu_i386: move reginfo_t and related defines to risu_reginfo_i386.h

2019-05-18 Thread Richard Henderson

On 5/17/19 3:44 PM, Jan Bobek wrote:
> In order to build risu successfully for i386, we need files
> risu_reginfo_i386.{h,c}; this patch adds the former by extracting the
> relevant code from risu_i386.c.
> 
> This patch is pure code motion; no functional changes were made.
> 
> Reviewed-by: Alex Bennée 
> Signed-off-by: Jan Bobek 
> ---
>  risu_reginfo_i386.h | 37 +
>  risu_i386.c | 23 +--
>  2 files changed, 38 insertions(+), 22 deletions(-)
>  create mode 100644 risu_reginfo_i386.h

Reviewed-by: Richard Henderson 


r~

Re: [Qemu-devel] [RISU v2 01/11] Makefile: undefine the arch name symbol

2019-05-18 Thread Richard Henderson

On 5/17/19 3:44 PM, Jan Bobek wrote:
> At least GCC defines the symbol "i386" to 1 to signal the target
> platform. We need to use "i386" as an undefined symbol in order to
> correctly include risu_reginfo_i386.h from risu.h. Add an -U option to
> the build command to make sure the symbol remains undefined.
> 
> Suggested-by: Richard Henderson 
> Signed-off-by: Jan Bobek 
> ---
>  Makefile | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Richard Henderson 
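
For readers wondering why a predefined symbol matters here: if a header
name is built by stringifying the arch name, a macro with that name is
expanded first. A minimal sketch follows, assuming a stringification
scheme of this general shape (the exact macros in risu.h may differ). A
hypothetical symbol "myarch" stands in for "i386", so the result does not
depend on the host compiler.

```c
#define SKETCH_STR_(x) #x
#define SKETCH_STR(x) SKETCH_STR_(x)

/* Hypothetical arch selector, as if passed with -DSKETCH_ARCH=myarch. */
#define SKETCH_ARCH myarch

/* Simulate a compiler that predefines the arch name to 1. */
#define myarch 1
static const char sketch_header_before[] =
    "risu_reginfo_" SKETCH_STR(SKETCH_ARCH) ".h";

/* Equivalent of passing -Umyarch on the command line. */
#undef myarch
static const char sketch_header_after[] =
    "risu_reginfo_" SKETCH_STR(SKETCH_ARCH) ".h";
```

With the symbol defined, the name collapses to risu_reginfo_1.h. Once it
is undefined, the intended risu_reginfo_myarch.h is produced.
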


r~

Re: [Qemu-devel] [PATCH 4/4] pci: msix: move 'MSIX_CAP_LENGTH' to header file

2019-05-18 Thread Alex Williamson

On Fri, 17 May 2019 20:28:11 -0700
Li Qiang  wrote:

Lacking commit message.

> CC: qemu-triv...@nongnu.org
> Signed-off-by: Li Qiang 
> ---
>  hw/pci/msix.c | 2 --
>  hw/vfio/pci.c | 2 --
>  include/hw/pci/msix.h | 2 ++
>  3 files changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/pci/msix.c b/hw/pci/msix.c
> index 4e336416a7..d39dcf32e8 100644
> --- a/hw/pci/msix.c
> +++ b/hw/pci/msix.c
> @@ -24,8 +24,6 @@
>  #include "qapi/error.h"
>  #include "trace.h"
>  
> -#define MSIX_CAP_LENGTH 12
> -
>  /* MSI enable bit and maskall bit are in byte 1 in FLAGS register */
>  #define MSIX_CONTROL_OFFSET (PCI_MSIX_FLAGS + 1)
>  #define MSIX_ENABLE_MASK (PCI_MSIX_FLAGS_ENABLE >> 8)
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 08729e5875..8e555db12e 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -35,8 +35,6 @@
>  #include "trace.h"
>  #include "qapi/error.h"
>  
> -#define MSIX_CAP_LENGTH 12
> -
>  #define TYPE_VFIO_PCI "vfio-pci"
>  #define PCI_VFIO(obj)OBJECT_CHECK(VFIOPCIDevice, obj, TYPE_VFIO_PCI)
>  
> diff --git a/include/hw/pci/msix.h b/include/hw/pci/msix.h
> index 1f27658d35..08acfa836e 100644
> --- a/include/hw/pci/msix.h
> +++ b/include/hw/pci/msix.h
> @@ -4,6 +4,8 @@
>  #include "qemu-common.h"
>  #include "hw/pci/pci.h"
>  
> +#define MSIX_CAP_LENGTH 12
> +
>  void msix_set_message(PCIDevice *dev, int vector, MSIMessage msg);
>  MSIMessage msix_get_message(PCIDevice *dev, unsigned int vector);
>  int msix_init(PCIDevice *dev, unsigned short nentries,

Re: [Qemu-devel] [PATCH 3/4] vfio: platform: fix a typo

2019-05-18 Thread Alex Williamson

On Fri, 17 May 2019 20:28:10 -0700
Li Qiang  wrote:

An actual trivial patch, but it could still use a commit message.

> CC: qemu-triv...@nongnu.org
> Signed-off-by: Li Qiang 
> ---
>  hw/vfio/platform.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
> index e59a0234dd..d52d6552e0 100644
> --- a/hw/vfio/platform.c
> +++ b/hw/vfio/platform.c
> @@ -72,7 +72,7 @@ static VFIOINTp *vfio_init_intp(VFIODevice *vbasedev,
>  g_free(intp->interrupt);
>  g_free(intp);
>  error_setg_errno(errp, -ret,
> - "failed to initialize trigger eventd notifier");
> + "failed to initialize trigger eventfd notifier");
>  return NULL;
>  }
>  if (vfio_irq_is_automasked(intp)) {
> @@ -84,7 +84,7 @@ static VFIOINTp *vfio_init_intp(VFIODevice *vbasedev,
>  g_free(intp->unmask);
>  g_free(intp);
>  error_setg_errno(errp, -ret,
> - "failed to initialize resample eventd 
> notifier");
> + "failed to initialize resample eventfd 
> notifier");
>  return NULL;
>  }
>  }

Re: [Qemu-devel] [PATCH 2/4] hw: vfio: drop TYPE_FOO MACRO in VMStateDescription

2019-05-18 Thread Alex Williamson

On Fri, 17 May 2019 20:28:09 -0700
Li Qiang  wrote:

> As the vmstate structure names aren't related with
> the QOM type names.

Seems contrary to the first patch in the series.
 
> CC: qemu-triv...@nongnu.org
> Signed-off-by: Li Qiang 
> ---
>  hw/vfio/amd-xgbe.c  | 2 +-
>  hw/vfio/ap.c| 2 +-
>  hw/vfio/calxeda-xgmac.c | 2 +-
>  hw/vfio/ccw.c   | 2 +-
>  hw/vfio/platform.c  | 2 +-
>  5 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/vfio/amd-xgbe.c b/hw/vfio/amd-xgbe.c
> index ee64a3b4a2..1b06c0f3ea 100644
> --- a/hw/vfio/amd-xgbe.c
> +++ b/hw/vfio/amd-xgbe.c
> @@ -26,7 +26,7 @@ static void amd_xgbe_realize(DeviceState *dev, Error **errp)
>  }
>  
>  static const VMStateDescription vfio_platform_amd_xgbe_vmstate = {
> -.name = TYPE_VFIO_AMD_XGBE,
> +.name = "vfio-amd-xgbe",
>  .unmigratable = 1,
>  };
>  
> diff --git a/hw/vfio/ap.c b/hw/vfio/ap.c
> index d8b79ebe53..564751650f 100644
> --- a/hw/vfio/ap.c
> +++ b/hw/vfio/ap.c
> @@ -155,7 +155,7 @@ static void vfio_ap_reset(DeviceState *dev)
>  }
>  
>  static const VMStateDescription vfio_ap_vmstate = {
> -.name = VFIO_AP_DEVICE_TYPE,
> +.name = "vfio-ap",
>  .unmigratable = 1,
>  };
>  
> diff --git a/hw/vfio/calxeda-xgmac.c b/hw/vfio/calxeda-xgmac.c
> index e7767c4b02..6cc608b6ca 100644
> --- a/hw/vfio/calxeda-xgmac.c
> +++ b/hw/vfio/calxeda-xgmac.c
> @@ -26,7 +26,7 @@ static void calxeda_xgmac_realize(DeviceState *dev, Error 
> **errp)
>  }
>  
>  static const VMStateDescription vfio_platform_calxeda_xgmac_vmstate = {
> -.name = TYPE_VFIO_CALXEDA_XGMAC,
> +.name = "vfio-calxeda-xgmac",
>  .unmigratable = 1,
>  };
>  
> diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
> index 31dd3a2a87..d9e39552e2 100644
> --- a/hw/vfio/ccw.c
> +++ b/hw/vfio/ccw.c
> @@ -468,7 +468,7 @@ static Property vfio_ccw_properties[] = {
>  };
>  
>  static const VMStateDescription vfio_ccw_vmstate = {
> -.name = TYPE_VFIO_CCW,
> +.name = "vfio-ccw",
>  .unmigratable = 1,
>  };
>  
> diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
> index 398db38f14..e59a0234dd 100644
> --- a/hw/vfio/platform.c
> +++ b/hw/vfio/platform.c
> @@ -697,7 +697,7 @@ out:
>  }
>  
>  static const VMStateDescription vfio_platform_vmstate = {
> -.name = TYPE_VFIO_PLATFORM,
> +.name = "vfio-platform",
>  .unmigratable = 1,
>  };
>

Re: [Qemu-devel] [PATCH 1/4] vfio: pci: make "vfio-pci-nohotplug" as MACRO

2019-05-18 Thread Alex Williamson

On Fri, 17 May 2019 20:28:08 -0700
Li Qiang  wrote:

Why?  (No commit message, nor cover letter)

> CC: qemu-triv...@nongnu.org
> Signed-off-by: Li Qiang 
> ---
>  hw/vfio/pci.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 8cecb53d5c..08729e5875 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -40,6 +40,8 @@
>  #define TYPE_VFIO_PCI "vfio-pci"
>  #define PCI_VFIO(obj)OBJECT_CHECK(VFIOPCIDevice, obj, TYPE_VFIO_PCI)
>  
> +#define TYPE_VFIO_PCI_NOHOTPLUG "vfio-pci-nohotplug"
> +
>  static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
>  static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
>  
> @@ -3304,8 +3306,8 @@ static void 
> vfio_pci_nohotplug_dev_class_init(ObjectClass *klass, void *data)
>  }
>  
>  static const TypeInfo vfio_pci_nohotplug_dev_info = {
> -.name = "vfio-pci-nohotplug",
> -.parent = "vfio-pci",
> +.name = TYPE_VFIO_PCI_NOHOTPLUG,
> +.parent = TYPE_VFIO_PCI,
>  .instance_size = sizeof(VFIOPCIDevice),
>  .class_init = vfio_pci_nohotplug_dev_class_init,
>  };

Re: [Qemu-devel] [RISU v2 00/11] Support for i386/x86_64 with vector extensions

2019-05-18 Thread Alex Bennée



Jan Bobek  writes:

> This patch series adds support for i386 and x86_64 architectures to
> RISU.  Notably, vector registers (SSE, AVX, AVX-512) are supported for
> verification of the apprentice. This is V2 of the series posted in
> [1].
>
> I decided not to drop the register definitions from the second patch
> as suggested by Alex Bennée [4], but replaced them in the fourth patch
> instead. This keeps the second and third patches code-motion only.
>
> I wasn't 100% sure how to acknowledge Richard's contributions in some
> of the patches, and eventually decided to include a Suggested-by:
> line. Let me know if that's (not) acceptable.

Suggested-by: is a common tag for this sort of thing ;-)

--
Alex Bennée

[Qemu-devel] [PATCH] target/sparc: Remove multiple errors and warnings generated by checkpatch tool within the file target/sparc/asi.h.

2019-05-18 Thread Jules Irenge

Remove multiple errors and warnings generated by checkpatch.pl tool.

ERROR: code indent should never use tabs
ERROR: trailing whitespace
WARNING: Block comments use a leading /* on a separate line
---
 target/sparc/asi.h | 352 -
 1 file changed, 183 insertions(+), 169 deletions(-)

diff --git a/target/sparc/asi.h b/target/sparc/asi.h
index d8d6284..1c8cd35 100644
--- a/target/sparc/asi.h
+++ b/target/sparc/asi.h
@@ -1,7 +1,8 @@
 #ifndef _SPARC_ASI_H
 #define _SPARC_ASI_H
 
-/* asi.h:  Address Space Identifier values for the sparc.
+/*
+ * asi.h:  Address Space Identifier values for the sparc.
  *
  * Copyright (C) 1995,1996 David S. Miller (da...@caip.rutgers.edu)
  *
@@ -53,7 +54,8 @@
 #define ASI_M_DATAC_TAG     0x0E   /* Data Cache Tag; rw, ss */
 #define ASI_M_DATAC_DATA    0x0F   /* Data Cache Data; rw, ss */
 
-/* The following cache flushing ASIs work only with the 'sta'
+/*
+ * The following cache flushing ASIs work only with the 'sta'
  * instruction. Results are unpredictable for 'swap' and 'ldstuba',
  * so don't do it.
  */
@@ -68,7 +70,9 @@
 /* Block-copy operations are available only on certain V8 cpus. */
 #define ASI_M_BCOPY 0x17   /* Block copy */
 
-/* These affect only the ICACHE and are Ross HyperSparc and TurboSparc specific. */
+/*
+ * These affect only the ICACHE and are Ross HyperSparc and TurboSparc specific.
+ */
 #define ASI_M_IFLUSH_PAGE   0x18   /* Flush I Cache Line (page); wo, ss */
 #define ASI_M_IFLUSH_SEG    0x19   /* Flush I Cache Line (seg); wo, ss */
 #define ASI_M_IFLUSH_REGION 0x1A   /* Flush I Cache Line (region); wo, ss */
@@ -78,7 +82,8 @@
 /* Block-fill operations are available on certain V8 cpus */
 #define ASI_M_BFILL 0x1F
 
-/* This allows direct access to main memory, actually 0x20 to 0x2f are
+/*
+ * This allows direct access to main memory, actually 0x20 to 0x2f are
  * the available ASI's for physical ram pass-through, but I don't have
  * any idea what the other ones do
  */
@@ -101,8 +106,7 @@
 #define ASI_M_DC_FLCLEAR   0x37
 
 #define ASI_M_DCDR 0x39   /* Data Cache Diagnostics Register rw, ss */
-
-#define ASI_M_VIKING_TMP1  0x40  /* Emulation temporary 1 on Viking */
+#define ASI_M_VIKING_TMP1  0x40   /* Emulation temporary 1 on Viking */
 /* only available on SuperSparc I */
 /* #define ASI_M_VIKING_TMP2  0x41 */  /* Emulation temporary 2 on Viking */
 
@@ -123,190 +127,200 @@
 #define ASI_LEON_FLUSH_PAGE 0x10
 
 /* V9 Architecture mandary ASIs. */
-#define ASI_N       0x04 /* Nucleus */
-#define ASI_NL      0x0c /* Nucleus, little endian */
-#define ASI_AIUP    0x10 /* Primary, user */
-#define ASI_AIUS    0x11 /* Secondary, user */
-#define ASI_AIUPL   0x18 /* Primary, user, little endian */
-#define ASI_AIUSL   0x19 /* Secondary, user, little endian */
-#define ASI_P       0x80 /* Primary, implicit */
-#define ASI_S       0x81 /* Secondary, implicit */
-#define ASI_PNF     0x82 /* Primary, no fault */
-#define ASI_SNF     0x83 /* Secondary, no fault */
-#define ASI_PL      0x88 /* Primary, implicit, l-endian */
-#define ASI_SL      0x89 /* Secondary, implicit, l-endian */
-#define ASI_PNFL    0x8a /* Primary, no fault, l-endian */
-#define ASI_SNFL    0x8b /* Secondary, no fault, l-endian */
+#define ASI_N       0x04 /* Nucleus */
+#define ASI_NL      0x0c /* Nucleus, little endian */
+#define ASI_AIUP    0x10 /* Primary, user */
+#define ASI_AIUS    0x11 /* Secondary, user */
+#define ASI_AIUPL   0x18 /* Primary, user, little endian */
+#define ASI_AIUSL   0x19 /* Secondary, user, little endian */
+#define ASI_P       0x80 /* Primary, implicit */
+#define ASI_S       0x81 /* Secondary, implicit */
+#define ASI_PNF     0x82 /* Primary, no fault */
+#define ASI_SNF     0x83 /* Secondary, no fault */
+#define ASI_PL      0x88 /* Primary, implicit, l-endian */
+#define ASI_SL      0x89 /* Secondary, implicit, l-endian */
+#define ASI_PNFL    0x8a /* Primary, no fault, l-endian */
+#define ASI_SNFL    0x8b /* Secondary, no fault, l-endian */
 
-/* SpitFire and later extended ASIs.  The "(III)" marker designates
+/*
+ * SpitFire and later extended ASIs.  The "(III)" marker designates
  * UltraSparc-III and later specific ASIs.  The "(CMT)" marker designates
  * Chip Multi Threading specific ASIs.  "(NG)" designates Niagara specific
  * ASIs, "(4V)"

[Qemu-devel] [Bug 1824622] Re: Qemu 4.0.0-rc3 COLO Primary Crashes with "Assertion `event_unhandled_count > 0' failed."

2019-05-18 Thread Lukas Straub

Fix applied to qemu 4.1

** Changed in: qemu
   Status: New => Fix Released

-- 
You received this bug notification because you are a member of
qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1824622

Title:
  Qemu 4.0.0-rc3 COLO Primary Crashes with "Assertion
  `event_unhandled_count > 0' failed."

Status in QEMU:
  Fix Released

Bug description:
  Hello Everyone,
  Now with Qemu 4.0.0-rc3, COLO is finally working so I gave it a try, but the
  Primary is always crashing during network use. Typing fast in ssh or running
  "top" with 0.1 second delay (change with 'd') reliably triggers the crash for
  me. I use the attached scripts to run Qemu; in my case both primary and
  secondary run on the same host for testing purposes. See the files in the
  attached .tar.bz2 for more info; they also contain a coredump.

  Regards,
  Lukas Straub

  Configure CMDline:
  ./configure --target-list=x86_64-softmmu,i386-softmmu --enable-debug-info

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1824622/+subscriptions

[Qemu-devel] [Bug 1829576] Re: QEMU-SYSTEM-PPC64 Regression QEMU-4.0.0

2019-05-18 Thread Mark Cave-Ayland

I suspect that this may be related to the VSR register conversion. Can
you try applying all of the patches below on top of 4.0 to see if they
resolve the issue?

https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg01254.html
https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg01256.html
https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg01257.html
https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg01260.html

-- 
You received this bug notification because you are a member of
qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1829576

Title:
  QEMU-SYSTEM-PPC64 Regression QEMU-4.0.0

Status in QEMU:
  New

Bug description:
  I have been using QEMU-SYSTEM-PPC64 v3.1.0 to run CentOS7 PPC emulated
  system. It stopped working when I upgraded to QEMU-4.0.0 . I
  downgraded back to QEMU-3.1.0 and it started working again. The
  problem is that my CentOS7 image will not boot up under QEMU-4.0.0,
  but works fine under QEMU-3.1.0.

  I have a QCOW2 image available at
  https://www.mediafire.com/file/d8dda05ro85whn1/linux-centos7-ppc64.qcow2/file .
  NOTE: It is 15GB. Kind of large.

  I run it as follows:

    qemu-system-ppc64 \
      -name "CENTOS7-PPC64" \
      -cpu POWER7 -machine pseries \
      -m 4096 \
      -netdev bridge,id=netbr0,br=br0 \
      -device e1000,netdev=netbr0,mac=52:54:3c:13:21:33 \
      -hda "./linux-centos7-ppc64.qcow2" \
      -monitor stdio

  HOST: I am using Manjaro Linux on an Intel i7 machine with the QEMU
  packages installed via the package manager of the distribution.

  [jsantiago@jlsws0 ~]$ uname -a
  Linux jlsws0.haivision.com 4.19.42-1-MANJARO #1 SMP PREEMPT Fri May 10 20:52:43 UTC 2019 x86_64 GNU/Linux

  [jsantiago@jlsws0 ~]$ cpuinfo
  Intel(R) processor family information utility, Version 2019 Update 3 Build 20190214 (id: b645a4a54)
  Copyright (C) 2005-2019 Intel Corporation.  All rights reserved.

  =  Processor composition  =
  Processor name    : Intel(R) Core(TM) i7-6700K
  Packages(sockets) : 1
  Cores             : 4
  Processors(CPUs)  : 8
  Cores per package : 4
  Threads per core  : 2

  =  Processor identification  =
  Processor   Thread Id.   Core Id.   Package Id.
  0           0            0          0
  1           0            1          0
  2           0            2          0
  3           0            3          0
  4           1            0          0
  5           1            1          0
  6           1            2          0
  7           1            3          0

  =  Placement on packages  =
  Package Id.   Core Id.   Processors
  0             0,1,2,3    (0,4)(1,5)(2,6)(3,7)

  =  Cache sharing  =
  Cache   Size     Processors
  L1      32  KB   (0,4)(1,5)(2,6)(3,7)
  L2      256 KB   (0,4)(1,5)(2,6)(3,7)
  L3      8   MB   (0,1,2,3,4,5,6,7)

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1829576/+subscriptions

[Qemu-devel] [PATCH] nvme: fix copy direction in DMA reads going to CMB

2019-05-18 Thread Klaus Birkelund Jensen

`nvme_dma_read_prp` erroneously used `qemu_iovec_*to*_buf` instead of
`qemu_iovec_*from*_buf` when the request involved the controller memory
buffer.

Signed-off-by: Klaus Birkelund Jensen 
---
 hw/block/nvme.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 7caf92532a09..63a5b58849fb 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -238,7 +238,7 @@ static uint16_t nvme_dma_read_prp(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
 }
 qemu_sglist_destroy();
 } else {
-if (unlikely(qemu_iovec_to_buf(, 0, ptr, len) != len)) {
+if (unlikely(qemu_iovec_from_buf(, 0, ptr, len) != len)) {
 trace_nvme_err_invalid_dma();
 status = NVME_INVALID_FIELD | NVME_DNR;
 }
-- 
2.21.0
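
For readers tracking the direction argument, here is a minimal sketch of the two copy directions. It assumes plain POSIX `struct iovec` in place of QEMU's `QEMUIOVector`, and `sketch_iov_to_buf()` / `sketch_iov_from_buf()` are hypothetical stand-ins written for this note, not QEMU code. The real `qemu_iovec_to_buf()` and `qemu_iovec_from_buf()` also take an offset argument, which is left out here.

```c
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Hypothetical stand-ins for illustration only; these are NOT QEMU's
 * implementations. They only mirror the copy direction implied by the
 * names qemu_iovec_to_buf() and qemu_iovec_from_buf(), and return the
 * number of bytes actually copied.
 */

/* "to buf": gather the vector's contents into the flat buffer. */
size_t sketch_iov_to_buf(const struct iovec *iov, int niov,
                         void *buf, size_t bytes)
{
    size_t done = 0;

    for (int i = 0; i < niov && done < bytes; i++) {
        size_t n = iov[i].iov_len;

        if (n > bytes - done) {
            n = bytes - done;
        }
        memcpy((char *)buf + done, iov[i].iov_base, n);
        done += n;
    }
    return done;
}

/* "from buf": scatter the flat buffer's contents into the vector. */
size_t sketch_iov_from_buf(const struct iovec *iov, int niov,
                           const void *buf, size_t bytes)
{
    size_t done = 0;

    for (int i = 0; i < niov && done < bytes; i++) {
        size_t n = iov[i].iov_len;

        if (n > bytes - done) {
            n = bytes - done;
        }
        memcpy(iov[i].iov_base, (const char *)buf + done, n);
        done += n;
    }
    return done;
}
```

In these terms, `nvme_dma_read_prp` holds controller data in `ptr` that has to land in the memory the vector describes, which is the from-buf direction. A command that receives data into `ptr` wants the to-buf direction instead. Both helpers return a short count when the vector is smaller than `bytes`, which is what the `!= len` check in the hunk above relies on.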

Re: [Qemu-devel] [Qemu-block] [PATCH] nvme: add Get/Set Feature Timestamp support

2019-05-18 Thread Klaus Birkelund

On Fri, May 17, 2019 at 07:49:18PM -0600, Heitke, Kenneth wrote:
> > > > > > +if (qemu_iovec_from_buf(, 0, ptr, len) != len) {
> > > > > 
> > > > > This should be `qemu_iovec_to_buf`.
> > > > > 
> > > > 
> > > > This function is transferring data from the "host" to the device so I
> > > > believe I am using the correct function.
> > > > 
> > > 
> > > Exactly, but this means that you need to populate `ptr` with data
> > > described by the prps, hence dma_buf_*write* and qemu_iovec_*to*_buf. In
> > > this case `ptr` is set to the address of the uint64_t timestamp, and
> > > that is what we need to write to.
> > > 
> > 
> > I was going to argue with the fact that nvme_dma_read_prp uses
> > qemu_iovec_from_buf. But it uses _to_buf which as far as I can tell is
> > also wrong.
> > 
> 
> Okay, I'm onboard. You're correct. I'll update my patch and re-submit. I can
> also submit a patch to fix nvme_dma_read_prp() unless you or someone else
> wants to.
> 

Hi Kenneth,

The `nvme_dma_read_prp` case is actually already fixed in one of the
patches I sent yesterday ("nvme: simplify PRP mappings"), but I'll
submit it as a separate patch.

Cheers

Re: [Qemu-devel] [PATCH] configure: Fix spelling of sdl-image in --help

2019-05-18 Thread Thomas Huth

On 17/05/2019 20.32, Markus Armbruster wrote:
> Fixes: a442fe2f2b2f20e7be0934277e9400b844b11999
> Cc: qemu-triv...@nongnu.org
> Signed-off-by: Markus Armbruster 
> ---
>  configure | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/configure b/configure
> index d2fc346302..cef51b2a0b 100755
> --- a/configure
> +++ b/configure
> @@ -1745,7 +1745,7 @@ disabled with --disable-FEATURE, default is enabled if available:
>    gcrypt          libgcrypt cryptography support
>    auth-pam        PAM access control
>    sdl             SDL UI
> -  sdl_image       SDL Image support for icons
> +  sdl-image       SDL Image support for icons
>    gtk             gtk UI
>    vte             vte support for the gtk UI
>    curses          curses UI
> 

Reviewed-by: Thomas Huth
