On 2023/9/3 09:06, Richard Henderson wrote:
On 9/1/23 22:02, Jiajie Chen wrote:
If LSX is available, use LSX instructions to implement 128-bit load &
store.
Is this really guaranteed to be an atomic 128-bit operation?
Song Gao, please check this.
Or, as for many vector processors, is this really two separate 64-bit
memory operations under the hood?
+static void tcg_out_qemu_ldst_i128(TCGContext *s, TCGReg data_lo, TCGReg data_hi,
+                                   TCGReg addr_reg, MemOpIdx oi, bool is_ld)
+{
+    TCGLabelQemuLdst *ldst;
+    HostAddress h;
+
+    ldst = prepare_host_addr(s, &h, addr_reg, oi, true);
+    if (is_ld) {
+        tcg_out_opc_vldx(s, TCG_VEC_TMP0, h.base, h.index);
+        tcg_out_opc_vpickve2gr_d(s, data_lo, TCG_VEC_TMP0, 0);
+        tcg_out_opc_vpickve2gr_d(s, data_hi, TCG_VEC_TMP0, 1);
+    } else {
+        tcg_out_opc_vinsgr2vr_d(s, TCG_VEC_TMP0, data_lo, 0);
+        tcg_out_opc_vinsgr2vr_d(s, TCG_VEC_TMP0, data_hi, 1);
+        tcg_out_opc_vstx(s, TCG_VEC_TMP0, h.base, h.index);
+    }
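For reference, the data placement that the two branches above rely on can be modeled in plain C. This is a hypothetical, host-independent sketch: the model_* and *_le64 helpers are invented for illustration and are not QEMU code. It shows that lane 0 of the 128-bit access maps to data_lo and lane 1 to data_hi on little-endian LoongArch, and deliberately says nothing about whether the hardware access is a single atomic operation.

```c
#include <assert.h>
#include <stdint.h>

/* Assemble a 64-bit value from 8 bytes in little-endian order. */
static uint64_t load_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Scatter a 64-bit value into 8 bytes in little-endian order. */
static void store_le64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        p[i] = (uint8_t)(v >> (8 * i));
    }
}

/*
 * Hypothetical model of the is_ld branch: vldx reads 16 bytes into the
 * vector register, then vpickve2gr_d copies lane 0 to data_lo and lane 1
 * to data_hi. LoongArch is little-endian, so lane 0 holds bytes 0..7.
 * This models data placement only: it performs two plain reads and
 * says nothing about atomicity.
 */
static void model_ld_i128(const uint8_t mem[16],
                          uint64_t *data_lo, uint64_t *data_hi)
{
    assert(mem && data_lo && data_hi);
    *data_lo = load_le64(mem);      /* lane 0 */
    *data_hi = load_le64(mem + 8);  /* lane 1 */
}

/*
 * Hypothetical model of the store branch: vinsgr2vr_d inserts data_lo
 * into lane 0 and data_hi into lane 1, then vstx writes all 16 bytes.
 */
static void model_st_i128(uint8_t mem[16], uint64_t data_lo, uint64_t data_hi)
{
    assert(mem);
    store_le64(mem, data_lo);       /* lane 0 */
    store_le64(mem + 8, data_hi);   /* lane 1 */
}
```

With memory bytes 0x00..0x0f, the load model yields data_lo = 0x0706050403020100 and data_hi = 0x0f0e0d0c0b0a0908, and storing them back reproduces the original 16 bytes.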
You should use h.aa.atom < MO_128 to determine whether 128-bit atomicity,
and therefore the vector operation, is required. I assume the gr<->vr
moves have a cost, so two integer operations are preferable when
allowed.
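A minimal sketch of the decision suggested above, assuming QEMU's convention that MemOp sizes are encoded as log2 of the byte count, so MO_64 < MO_128. The AtomSize enum, I128Strategy and choose_i128_strategy are hypothetical names for illustration only; in the real backend the check would sit inside tcg_out_qemu_ldst_i128 and select between emitting two 64-bit integer accesses and the LSX sequence.

```c
#include <assert.h>

/*
 * Hypothetical stand-ins: QEMU's MemOp encodes sizes as log2 of the byte
 * count (MO_8 = 0 ... MO_128 = 4); the names are borrowed for illustration.
 */
typedef enum { MO_8, MO_16, MO_32, MO_64, MO_128 } AtomSize;

typedef enum {
    I128_TWO_INT64,   /* two 64-bit integer accesses, no gr<->vr moves */
    I128_VECTOR       /* one 128-bit LSX access (vldx/vstx) */
} I128Strategy;

/* Pick the cheapest sequence that still meets the required atomicity. */
static I128Strategy choose_i128_strategy(AtomSize atom)
{
    assert(atom <= MO_128);
    if (atom < MO_128) {
        /* 128-bit atomicity not required: two integer ops suffice. */
        return I128_TWO_INT64;
    }
    /* Only valid if the LSX access really is 128-bit atomic. */
    return I128_VECTOR;
}
```

The vector path is only correct if LSX loads and stores are in fact single-copy atomic at 128 bits, which is the open question raised at the top of this thread.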
Compare the other implementations of this function.
r~