On 3/16/20 1:04 AM, LIU Zhiwei wrote:
>> As a preference, I think you can do away with this helper.
>> Simply use the slideup helper with argument 1, and then
>> afterwards store the integer register into element 0.  You should
>> be able to re-use code from vmv.s.x for that.
>
> When I try it, I find it is some difficult, because vmv.s.x will clean
> the elements (0 < index < VLEN/SEW).
Well, two things about that:

(1) The 0.8 version of vmv.s.x does *not* zero the other elements,
    so we'll want to be prepared for that.

(2) We have 8 insns that, in the end, come down to a direct element
    access, possibly with some other processing.  So we'll want basic
    helper functions that can locate an element by immediate offset
    and by variable offset:

/*
 * Compute the offset of vreg[idx] relative to cpu_env.
 * The index must be in range of VLMAX.
 */
int vec_element_ofsi(DisasContext *s, int vreg, int idx, int sew);

/*
 * Compute a pointer to vreg[idx].  If need_bound is true, mask idx
 * into VLMAX; otherwise we know a priori that idx is already in bounds.
 */
void vec_element_ofsx(DisasContext *s, TCGv_ptr base,
                      TCGv idx, int sew, bool need_bound);

/* Load idx >= VLMAX ? 0 : vreg[idx]. */
void vec_element_loadi(DisasContext *s, TCGv_i64 val,
                       int vreg, int idx, int sew);
void vec_element_loadx(DisasContext *s, TCGv_i64 val,
                       int vreg, TCGv idx, int sew);

/* Store vreg[imm] = val.  The index must be in range of VLMAX. */
void vec_element_storei(DisasContext *s, int vreg,
                        int imm, TCGv_i64 val);
void vec_element_storex(DisasContext *s, int vreg,
                        TCGv idx, TCGv_i64 val);

(3) It would be handy to have TCGv cpu_vl.

Then:

vext.x.v:
    If rs1 == 0,
        use vec_element_loadi(s, x[rd], vs2, 0, s->sew);
    else
        use vec_element_loadx(s, x[rd], vs2, x[rs1], s->sew).

vmv.s.x:
    over = gen_new_label();
    tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over);
    For 0.7.1:
        use tcg_gen_gvec_dup8i to zero all VLMAX elements of vd.
    If rs1 == 0, goto done.
    Use vec_element_storei(s, vd, 0, x[rs1]).
 done:
    gen_set_label(over);

vfmv.f.s:
    Use vec_element_loadi(s, f[rd], vs2, 0, s->sew).
    NaN-box f[rd] as necessary for SEW.

vfmv.s.f:
    tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over);
    For 0.7.1:
        use tcg_gen_gvec_dup8i to zero all VLMAX elements of vd.
    Let tmp = f[rs1], NaN-boxed as necessary for SEW.
    Use vec_element_storei(s, vd, 0, tmp).
    gen_set_label(over);

vslide1up.vx:
    Ho hum, I forgot about masking.  Some options:
    (1) Call a helper just as you did in your original patch.
    (2) Call a helper only for !vm; for vm, expand inline as below.
    (3) Call vslideup w/ 1, then:

    tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over);
    If !vm,
        // inline test for v0[0]
        vec_element_loadi(s, tmp, 0, 0, MO_8);
        tcg_gen_andi_i64(tmp, tmp, 1);
        tcg_gen_brcondi_i64(TCG_COND_EQ, tmp, 0, over);
    Use vec_element_storei(s, vd, 0, x[rs1]).
    gen_set_label(over);

vslide1down.vx:
    For !vm, this is complicated enough for a helper.
    If using option 3 for vslide1up, then the store becomes:
        tcg_gen_subi_tl(tmp, cpu_vl, 1);
        vec_element_storex(s, vd, tmp, x[rs1]);

vrgather.vx:
    If !vm or !vl_eq_vlmax, use a helper.  Otherwise:
        vec_element_loadx(s, tmp, vs2, x[rs1]);
        use tcg_gen_gvec_dup_i64 to store tmp to vd.

vrgather.vi:
    If !vm or !vl_eq_vlmax, use a helper.  Otherwise:
    If imm >= vlmax,
        use tcg_gen_gvec_dup8i to zero vd;
    else,
        ofs = vec_element_ofsi(s, vs2, imm, s->sew);
        tcg_gen_gvec_dup_mem(sew, vreg_ofs(vd), ofs, vlmax, vlmax);
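To make (2) concrete, here is roughly how the immediate-offset pair
could look in trans_rvv.inc.c.  This is an untested sketch: it assumes
a little-endian host and the vreg_ofs() helper from the load/store
patches (spelled here with an explicit DisasContext).  A real version
also needs an endianness fixup, because the register file is held as
host-order uint64_t units:

static int vec_element_ofsi(DisasContext *s, int vreg, int idx, int sew)
{
    /* Caller guarantees idx < VLMAX; sew is log2 of the element size. */
    return vreg_ofs(s, vreg) + (idx << sew);
}

static void vec_element_loadi(DisasContext *s, TCGv_i64 val,
                              int vreg, int idx, int sew)
{
    int ofs = vec_element_ofsi(s, vreg, idx, sew);

    switch (sew) {
    case MO_8:
        tcg_gen_ld8u_i64(val, cpu_env, ofs);
        break;
    case MO_16:
        tcg_gen_ld16u_i64(val, cpu_env, ofs);
        break;
    case MO_32:
        tcg_gen_ld32u_i64(val, cpu_env, ofs);
        break;
    default:
        tcg_gen_ld_i64(val, cpu_env, ofs);
        break;
    }
}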
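Putting the vmv.s.x pieces together might then look like the below,
also untested.  The arg_vmv_s_x layout and vext_check_isa_ill() are
assumed from your series, vec_element_storei() is the obvious mirror
of the load above, and LMUL is ignored in the byte counts:

static bool trans_vmv_s_x(DisasContext *s, arg_vmv_s_x *a)
{
    TCGLabel *over;

    if (!vext_check_isa_ill(s)) {
        return false;
    }

    over = gen_new_label();
    /* Nothing at all is written when vl == 0.  */
    tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_vl, 0, over);

    /* 0.7.1 semantics: zero the whole register first.  */
    tcg_gen_gvec_dup8i(vreg_ofs(s, a->rd), s->vlen / 8, s->vlen / 8, 0);

    if (a->rs1 != 0) {
        TCGv_i64 t = tcg_temp_new_i64();
        tcg_gen_extu_tl_i64(t, cpu_gpr[a->rs1]);
        vec_element_storei(s, a->rd, 0, t);
        tcg_temp_free_i64(t);
    }

    gen_set_label(over);
    return true;
}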
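And the vrgather.vi fast path, written out as it might appear in a
trans_vrgather_vi body, where a->rs1 holds the immediate.  The vlmax
expression is only one plausible spelling; it depends on how the
series tracks SEW and LMUL.  Untested as well:

    if (a->vm && s->vl_eq_vlmax) {
        int vlmax = ((s->vlen >> 3) >> s->sew) << s->lmul;

        if (a->rs1 >= vlmax) {
            /* The index is out of range for every element: zero vd. */
            tcg_gen_gvec_dup8i(vreg_ofs(s, a->rd),
                               s->vlen / 8, s->vlen / 8, 0);
        } else {
            /* Splat vs2[imm] across all of vd.  */
            int ofs = vec_element_ofsi(s, a->rs2, a->rs1, s->sew);
            tcg_gen_gvec_dup_mem(s->sew, vreg_ofs(s, a->rd), ofs,
                                 s->vlen / 8, s->vlen / 8);
        }
        return true;
    }
    /* !vm or vl != vlmax: generate the call to the out-of-line
       helper, exactly as for vrgather.vx.  */

r~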