On 8/13/24 21:34, LIU Zhiwei wrote:
@@ -827,14 +850,59 @@ static void tcg_out_ldst(TCGContext *s, RISCVInsn opc, TCGReg data,
 static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg arg,
                        TCGReg arg1, intptr_t arg2)
 {
-    RISCVInsn insn = type == TCG_TYPE_I32 ? OPC_LW : OPC_LD;
+    RISCVInsn insn;
+
+    if (type < TCG_TYPE_V64) {
+        insn = type == TCG_TYPE_I32 ? OPC_LW : OPC_LD;
+    } else {
+        tcg_debug_assert(arg >= TCG_REG_V1);
+        switch (prev_vece) {
+        case MO_8:
+            insn = OPC_VLE8_V;
+            break;
+        case MO_16:
+            insn = OPC_VLE16_V;
+            break;
+        case MO_32:
+            insn = OPC_VLE32_V;
+            break;
+        case MO_64:
+            insn = OPC_VLE64_V;
+            break;
+        default:
+            g_assert_not_reached();
+        }
+    }
     tcg_out_ldst(s, insn, arg, arg1, arg2);

tcg_out_ld/st are called directly from register allocation spill/fill.
You'll need to set vtype here, and cannot rely on this having been done in
tcg_out_vec_op.
That said, with a little-endian host, the selected element size doesn't matter *too* much.
A write of 8 uint16_t or a write of 2 uint64_t produces the same bits in memory.
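
To make the little-endian point concrete, here is a standalone sketch (plain C, not QEMU code; the function names are made up for illustration). It builds the same 128-bit value as 2 uint64_t and as 8 uint16_t and compares the bytes each leaves in memory:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 otherwise. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 1;
}

/*
 * Returns 1 if the same 128-bit value, stored as 2 x uint64_t or as
 * 8 x uint16_t, leaves identical bytes in memory on this host.
 */
static int e16_and_e64_stores_match(void)
{
    const uint64_t e64[2] = { 0x0706050403020100ull, 0x0f0e0d0c0b0a0908ull };
    uint16_t e16[8];

    assert(sizeof(e16) == sizeof(e64));
    /* Element i of the 16-bit view is bits [16*i, 16*i+15] of the value. */
    for (int i = 0; i < 8; i++) {
        e16[i] = (uint16_t)(e64[i / 4] >> (16 * (i % 4)));
    }
    return memcmp(e16, e64, sizeof(e64)) == 0;
}
```

The two layouts match only when the host is little-endian; on a big-endian host the bytes within each element are swapped and the comparison fails.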
Therefore you can examine prev_vtype and adjust only if the vector length changes. The
length does change in practice, though: e.g. load V256, store V256, store V128 to perform
a 384-bit store for AArch64 SVE when VQ=3.
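
Roughly what I have in mind, as a standalone sketch rather than real backend code. SketchCtx, sketch_set_vlen and the other names are hypothetical, and the counter merely stands in for emitting one of the vset* instructions:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for backend state: the vector length (in bits)
 * configured by the most recently emitted vset* instruction.
 */
typedef struct {
    int prev_vlen_bits;   /* 0 means nothing has been configured yet */
    int vset_emitted;     /* number of vset* insns "emitted" so far */
} SketchCtx;

/*
 * Reconfigure only when the requested length differs from the cached one.
 * The element size is deliberately left alone (little-endian host).
 */
static void sketch_set_vlen(SketchCtx *c, int vlen_bits)
{
    assert(vlen_bits > 0);
    if (c->prev_vlen_bits != vlen_bits) {
        c->vset_emitted++;            /* stand-in for emitting vset* */
        c->prev_vlen_bits = vlen_bits;
    }
}

/* Replay a sequence of spill/fill lengths; return the vset* count. */
static int sketch_replay(const int *lens, size_t n)
{
    SketchCtx c = { 0, 0 };

    for (size_t i = 0; i < n; i++) {
        /* Each ld/st sets up vtype itself instead of trusting the caller. */
        sketch_set_vlen(&c, lens[i]);
    }
    return c.vset_emitted;
}

/* The V256, V256, V128 sequence from the 384-bit example above. */
static int sketch_vq3_store_vsets(void)
{
    static const int lens[] = { 256, 256, 128 };

    return sketch_replay(lens, sizeof(lens) / sizeof(lens[0]));
}
```

For the V256, V256, V128 sequence this reconfigures twice: once for the first access and once when the length drops to 128 bits.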
Is there an advantage to using the vector load/store whole register insns, if the
requested length is not too small? IIRC the NF field can be used to store multiple
registers, but we can't store half of a register with these.
r~