On 8/13/24 21:34, LIU Zhiwei wrote:
@@ -641,6 +645,13 @@ static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
     case TCG_TYPE_I64:
         tcg_out_opc_imm(s, OPC_ADDI, ret, arg, 0);
         break;
+    case TCG_TYPE_V64:
+    case TCG_TYPE_V128:
+    case TCG_TYPE_V256:
+        tcg_debug_assert(ret > TCG_REG_V0 && arg > TCG_REG_V0);
+        tcg_target_set_vec_config(s, type, prev_vece);
+        tcg_out_opc_vv(s, OPC_VMV_V_V, ret, TCG_REG_V0, arg, true);
I suggest these asserts be placed in tcg_out_opc_*; that way you don't need to replicate them at every use.
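For example, as an untested sketch (encode_vv stands in for whatever encoding helper this series already defines). Note the encoder itself can only assert >= TCG_REG_V0, since v0 is a legal operand, as in the vmv.v.v call above:

static void tcg_out_opc_vv(TCGContext *s, RISCVInsn opc,
                           TCGReg vd, TCGReg vs2, TCGReg vs1, bool vm)
{
    /* Check once here that all operands are vector registers,
       instead of asserting at every call site. */
    tcg_debug_assert(vd >= TCG_REG_V0);
    tcg_debug_assert(vs2 >= TCG_REG_V0);
    tcg_debug_assert(vs1 >= TCG_REG_V0);
    tcg_out32(s, encode_vv(opc, vd, vs2, vs1, vm));
}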
+static inline bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
+                                   TCGReg dst, TCGReg src)
Oh, please drop all of the inline markup from all patches; let the compiler decide.
+static inline bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece,
+                                    TCGReg dst, TCGReg base, intptr_t offset)
+{
+    tcg_out_ld(s, TCG_TYPE_REG, TCG_REG_TMP0, base, offset);
+    return tcg_out_dup_vec(s, type, vece, dst, TCG_REG_TMP0);
+}
Is this really better than using a strided load with rs2 = x0?
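Untested sketch of what I mean; OPC_VLSE*_V and tcg_out_opc_ldst_vec are made-up names standing in for whatever vector load encoders this series defines:

static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece,
                             TCGReg dst, TCGReg base, intptr_t offset)
{
    /* Hypothetical element-sized strided-load opcodes, indexed by vece. */
    static const RISCVInsn vlse[4] = {
        OPC_VLSE8_V, OPC_VLSE16_V, OPC_VLSE32_V, OPC_VLSE64_V
    };

    tcg_target_set_vec_config(s, type, vece);
    if (offset != 0) {
        /* Strided loads have no immediate offset; fold it into the base. */
        tcg_out_addi(s, TCG_REG_TMP0, base, offset);
        base = TCG_REG_TMP0;
    }
    /* With rs2 = x0 the stride is zero, so every lane loads the same
       element: a broadcast straight from memory, with no scalar
       load/dup pair and no scratch integer register for the data. */
    tcg_out_opc_ldst_vec(s, vlse[vece], dst, base, TCG_REG_ZERO, true);
    return true;
}

r~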