On 2023/10/30 7:54 PM, Jiajie Chen wrote:
On 2023/10/30 16:23, gaosong wrote:
On 2023/10/28 9:09 PM, Jiajie Chen wrote:
On 2023/10/26 14:54, gaosong wrote:
On 2023/10/26 9:38 AM, Jiajie Chen wrote:
On 2023/10/26 03:04, Richard Henderson wrote:
On 10/25/23 10:13, Jiajie Chen wrote:
On 2023/10/24 07:26, Richard Henderson wrote:
See target/arm/tcg/translate-a64.c, gen_store_exclusive,
TCGv_i128 block.
See target/ppc/translate.c, gen_stqcx_.
The situation here is slightly different: aarch64 and ppc64
have both 128-bit ll and sc, however LoongArch v1.1 only has
64-bit ll and 128-bit sc.
Ah, that does complicate things.
Possibly use the combination of ll.d and ld.d:
ll.d lo, base, 0
ld.d hi, base, 8
# do some computation
sc.q lo, hi, base
# try again if sc failed
Then a possible implementation of gen_ll() would be: align
base to 128-bit boundary, read 128-bit from memory, save
64-bit part to rd and record whole 128-bit data in llval.
Then, in gen_sc_q(), it uses a 128-bit cmpxchg.
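The gen_ll()/gen_sc_q() idea above can be sketched as a small single-threaded model in plain C. This is only an illustration under stated assumptions: the names LLState, model_ll_d and model_sc_q are hypothetical and are not QEMU code, guest memory is modelled as an array of 64-bit words indexed by byte offset, and the compare-and-store in model_sc_q stands in for what would have to be a single atomic 128-bit cmpxchg in a real implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the LL/SC bookkeeping (not QEMU's data structures). */
typedef struct {
    int      llbit;     /* 1 while a reservation is active */
    uint64_t lladdr;    /* 16-byte-aligned byte offset of the reservation */
    uint64_t llval_lo;  /* low 64 bits seen at ll.d time */
    uint64_t llval_hi;  /* high 64 bits seen at ll.d time */
} LLState;

/*
 * Model of ll.d: align the address down to 16 bytes, record the whole
 * 128-bit block in llval, and return only the 64-bit half that was asked for.
 */
static uint64_t model_ll_d(LLState *s, const uint64_t *mem, uint64_t addr)
{
    assert(addr % 8 == 0);                  /* ll.d needs 8-byte alignment */
    uint64_t base = addr & ~(uint64_t)15;   /* 128-bit boundary */

    s->llbit = 1;
    s->lladdr = base;
    s->llval_lo = mem[base / 8];
    s->llval_hi = mem[base / 8 + 1];
    return (addr & 8) ? s->llval_hi : s->llval_lo;
}

/*
 * Model of sc.q: succeeds only if the reservation matches and the full
 * 128-bit block is unchanged. Returns 1 on success, 0 on failure.
 * A real implementation needs this compare-and-store to be atomic.
 */
static int model_sc_q(LLState *s, uint64_t *mem, uint64_t addr,
                      uint64_t new_lo, uint64_t new_hi)
{
    int ok = 0;

    if (s->llbit && s->lladdr == addr) {
        uint64_t *p = &mem[addr / 8];
        if (p[0] == s->llval_lo && p[1] == s->llval_hi) {
            p[0] = new_lo;
            p[1] = new_hi;
            ok = 1;
        }
    }
    s->llbit = 0;   /* the reservation is consumed either way */
    return ok;
}
```

In this model a write to either half of the block between model_ll_d and model_sc_q makes the store fail, because the comparison covers all 128 bits recorded at ll.d time.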
But what about the reversed instruction pattern: ll.d hi,
base, 8; ld.d lo, base, 0?
It would be worth asking your hardware engineers about the
bounds of legal behaviour. Ideally there would be some very
explicit language, similar to
I'm a community developer not affiliated with Loongson. Song
Gao, could you provide some detail from Loongson Inc.?
ll.d r1, base, 0
dbar 0x700 ==> see 2.2.8.1
ld.d r2, base, 8
...
sc.q r1, r2, base
Thanks! I think we may need to detect the ll.d-dbar-ld.d sequence
and translate the sequence into one tcg_gen_qemu_ld_i128 and split
the result into two 64-bit parts. Can we do this in QEMU?
Oh, I'm not sure.
I think we just need to implement sc.q. We don't need to care about
'll.d-dbar-ld.d'. It's just like 'll.q'.
The user needs to ensure that.
'll.q' is
1) ll.d r1, base, 0 ==> set LLbit, load the low 64 bits into r1
2) dbar 0x700
3) ld.d r2, base, 8 ==> load the high 64 bits into r2
sc.q needs to
1) Use 64-bit cmpxchg.
2) Write 128 bits to memory.
Consider the following code:
ll.d r1, base, 0
dbar 0x700
ld.d r2, base, 8
addi.d r2, r2, 1
sc.q r1, r2, base
We translate them into native code:
ld.d r1, base, 0
mv LLbit, 1
mv LLaddr, base
mv LLval, r1
dbar 0x700
ld.d r2, base, 8
addi.d r2, r2, 1
if (LLbit == 1 && LLaddr == base) {
cmpxchg addr=base compare=LLval new=r1
128-bit write {r2, r1} to base if cmpxchg succeeded
}
set r1 if sc.q succeeded
If the memory content of base+8 has changed between ld.d r2 and
addi.d r2, the atomicity is not guaranteed, i.e. only the high part
has changed, the low part hasn't.
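The lost update described above can be reproduced with a minimal single-threaded model in C. This is a sketch under assumptions, not QEMU code: Pair128, sc_q_compare64, sc_q_compare128 and lost_update_demo are hypothetical names, the "other CPU" is simulated by a plain store between the loads and the sc.q, and the starting values 1, 2 and 100 are arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* A 128-bit value as two 64-bit halves (illustrative type). */
typedef struct {
    uint64_t lo, hi;
} Pair128;

/* sc.q modelled with a 64-bit compare on the low half only. */
static int sc_q_compare64(uint64_t *mem, uint64_t llval_lo, Pair128 newv)
{
    if (mem[0] != llval_lo) {
        return 0;
    }
    mem[0] = newv.lo;
    mem[1] = newv.hi;
    return 1;
}

/* sc.q modelled with a compare on the full 128 bits. */
static int sc_q_compare128(uint64_t *mem, Pair128 llval, Pair128 newv)
{
    if (mem[0] != llval.lo || mem[1] != llval.hi) {
        return 0;
    }
    mem[0] = newv.lo;
    mem[1] = newv.hi;
    return 1;
}

/*
 * Replays the sequence ll.d / dbar / ld.d / addi.d / sc.q with a simulated
 * write to base+8 after the loads. Returns the final high half and reports
 * through *sc_ok whether the modelled sc.q succeeded.
 */
static uint64_t lost_update_demo(int use_128, int *sc_ok)
{
    uint64_t mem[2] = { 1, 2 };
    Pair128 seen = { mem[0], mem[1] };        /* ll.d r1; ld.d r2 */

    mem[1] = 100;                             /* another CPU writes base+8 */

    Pair128 newv = { seen.lo, seen.hi + 1 };  /* addi.d r2, r2, 1 */
    *sc_ok = use_128 ? sc_q_compare128(mem, seen, newv)
                     : sc_q_compare64(mem, seen.lo, newv);
    return mem[1];
}
```

With the 64-bit compare the modelled sc.q succeeds and overwrites the other writer's value in the high half; with the 128-bit compare it fails and the guest would retry.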
Sorry, my mistake. We need to use cmpxchg_i128. See
target/arm/tcg/translate-a64.c, gen_store_exclusive().
gen_scq(rd, rk, rj)
{
...
TCGv_i128 t16 = tcg_temp_new_i128();
TCGv_i128 c16 = tcg_temp_new_i128();
TCGv_i64 low = tcg_temp_new_i64();
TCGv_i64 high = tcg_temp_new_i64();
TCGv_i64 temp = tcg_temp_new_i64();
tcg_gen_concat_i64_i128(t16, cpu_gpr[rd], cpu_gpr[rk]);
tcg_gen_qemu_ld_i64(low, cpu_lladdr, ctx->mem_idx, MO_TEUQ);
tcg_gen_addi_tl(temp, cpu_lladdr, 8);
tcg_gen_mb(TCG_BAR_SC | TCG_MO_LD_LD);
tcg_gen_qemu_ld_i64(high, temp, ctx->mem_idx, MO_TEUQ);