On 8/13/24 21:34, LIU Zhiwei wrote:
@@ -827,14 +850,59 @@ static void tcg_out_ldst(TCGContext *s, RISCVInsn opc, TCGReg data,
  static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg arg,
                         TCGReg arg1, intptr_t arg2)
  {
-    RISCVInsn insn = type == TCG_TYPE_I32 ? OPC_LW : OPC_LD;
+    RISCVInsn insn;
+
+    if (type < TCG_TYPE_V64) {
+        insn = type == TCG_TYPE_I32 ? OPC_LW : OPC_LD;
+    } else {
+        tcg_debug_assert(arg >= TCG_REG_V1);
+        switch (prev_vece) {
+        case MO_8:
+            insn = OPC_VLE8_V;
+            break;
+        case MO_16:
+            insn = OPC_VLE16_V;
+            break;
+        case MO_32:
+            insn = OPC_VLE32_V;
+            break;
+        case MO_64:
+            insn = OPC_VLE64_V;
+            break;
+        default:
+            g_assert_not_reached();
+        }
+    }
      tcg_out_ldst(s, insn, arg, arg1, arg2);

tcg_out_ld/st are called directly from register allocation spill/fill.
You'll need to set vtype here, and cannot rely on this having been done in 
tcg_out_vec_op.

That said, with a little-endian host, the selected element size doesn't matter *too* much. A write of 8 uint16_t or a write of 2 uint64_t produces the same bits in memory.
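To illustrate the little-endian point, here is a minimal standalone sketch in plain C. It is not QEMU code, and the helper names are made up for this example. It builds the same 128 bits once as 2 uint64_t elements and once as 8 uint16_t elements, then compares the bytes each layout leaves in memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 when the host is little-endian. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 1;
}

/*
 * Hypothetical helper: returns 1 when storing the same 128-bit value
 * as 2 x uint64_t and as 8 x uint16_t leaves identical bytes in memory.
 */
static int same_bytes_for_e16_and_e64(void)
{
    const uint64_t e64[2] = {
        0x0706050403020100ull, 0x0f0e0d0c0b0a0908ull
    };
    uint16_t e16[8];
    unsigned char m64[16], m16[16];

    assert(sizeof(e16) == sizeof(e64));

    /* 16-bit element i holds bits [16*i, 16*i+15] of the 128-bit value. */
    for (int i = 0; i < 8; i++) {
        e16[i] = (uint16_t)(e64[i / 4] >> (16 * (i % 4)));
    }

    memcpy(m64, e64, sizeof(m64));
    memcpy(m16, e16, sizeof(m16));
    return memcmp(m64, m16, sizeof(m64)) == 0;
}
```

On a little-endian host same_bytes_for_e16_and_e64() returns 1. On a big-endian host the two layouts differ, which is why the argument is restricted to little-endian.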

Therefore you can examine prev_vtype and adjust only if the vector length changes. The length does change in practice, though: e.g. load V256, store V256, store V128 to perform a 384-bit store for AArch64 SVE when VQ=3.
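A rough model of that check, as a sketch only: the struct and function below are hypothetical stand-ins, not the backend's real prev_vtype handling, and the counter merely stands in for emitting a vector configuration instruction. The point is that a spill/fill reconfigures only when the total length differs from what was last set, and leaves the element width alone otherwise.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the tracked vector configuration. */
typedef struct {
    bool valid;         /* false until a configuration has been emitted */
    unsigned len_bits;  /* total vector length currently configured */
    unsigned sew_bits;  /* element width currently configured */
    unsigned emitted;   /* count of configuration insns "emitted" */
} VecCfgState;

/*
 * Hypothetical helper: make sure the configured length matches len_bits
 * before a vector load/store. The element width is not touched when the
 * length already matches, since on a little-endian host it does not
 * change the bytes transferred.
 */
static void ensure_len_for_ldst(VecCfgState *st, unsigned len_bits)
{
    assert(len_bits != 0);

    if (st->valid && st->len_bits == len_bits) {
        return;                 /* nothing to emit */
    }
    if (!st->valid) {
        st->sew_bits = 8;       /* arbitrary choice for the first setup */
    }
    st->len_bits = len_bits;
    st->valid = true;
    st->emitted++;              /* stands in for a vsetvli-style insn */
}
```

With the sequence from the example (V256, V256, V128), this model reconfigures twice: once for the first 256-bit access and once when the length drops to 128.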

Is there an advantage to using the vector load/store whole register insns, if the requested length is not too small? IIRC the NF field can be used to store multiples, but we can't store half of a register with these.


r~
