Re: [PATCH 0/5] Add LoongArch v1.1 instructions

gaosong Mon, 30 Oct 2023 01:24:42 -0700

On 2023/10/28 9:09 PM, Jiajie Chen wrote:

On 2023/10/26 14:54, gaosong wrote:
On 2023/10/26 9:38 AM, Jiajie Chen wrote:
On 2023/10/26 03:04, Richard Henderson wrote:
On 10/25/23 10:13, Jiajie Chen wrote:
On 2023/10/24 07:26, Richard Henderson wrote:
See target/arm/tcg/translate-a64.c, gen_store_exclusive, TCGv_i128 block.
See target/ppc/translate.c, gen_stqcx_.
The situation here is slightly different: aarch64 and ppc64 have both 128-bit ll and sc, however LoongArch v1.1 only has 64-bit ll and 128-bit sc.
Ah, that does complicate things.
Possibly use the combination of ll.d and ld.d:


ll.d lo, base, 0
ld.d hi, base, 4

# do some computation

sc.q lo, hi, base

# try again if sc failed
Then a possible implementation of gen_ll() would be: align base to a 128-bit boundary, read 128 bits from memory, save the 64-bit part to rd and record the whole 128-bit data in llval. Then, in gen_sc_q(), it uses a 128-bit cmpxchg.
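To make the idea above concrete, here is a minimal single-threaded C model of it. This is only a sketch and not QEMU code: ToyCPU, toy_ll_d() and toy_sc_q() are hypothetical names, guest memory is a small byte array in host byte order, and the "128-bit cmpxchg" is a plain compare-then-store rather than an atomic operation. Selecting the returned half by the low address bits is an assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical guest state: toy memory plus LL/SC bookkeeping. */
typedef struct {
    uint8_t  mem[64];   /* toy guest memory, host byte order */
    uint64_t lladdr;    /* 16-byte-aligned address recorded by ll.d */
    uint64_t llval_lo;  /* low half of the recorded 128-bit value */
    uint64_t llval_hi;  /* high half of the recorded 128-bit value */
    bool     llbit;
} ToyCPU;

static uint64_t load64(const ToyCPU *cpu, uint64_t addr)
{
    uint64_t v;
    memcpy(&v, &cpu->mem[addr], sizeof(v));
    return v;
}

static void store64(ToyCPU *cpu, uint64_t addr, uint64_t v)
{
    memcpy(&cpu->mem[addr], &v, sizeof(v));
}

/* ll.d: record the whole aligned 128-bit block, return the requested half. */
static uint64_t toy_ll_d(ToyCPU *cpu, uint64_t addr)
{
    assert((addr & 7) == 0 && addr + 8 <= sizeof(cpu->mem));
    uint64_t base = addr & ~(uint64_t)15;   /* align down to 128 bits */
    cpu->lladdr   = base;
    cpu->llval_lo = load64(cpu, base);
    cpu->llval_hi = load64(cpu, base + 8);
    cpu->llbit    = true;
    return (addr & 8) ? cpu->llval_hi : cpu->llval_lo;
}

/* sc.q: stand-in for a 128-bit cmpxchg against the recorded value.
 * Not atomic in this model. Returns 1 on success, 0 on failure. */
static int toy_sc_q(ToyCPU *cpu, uint64_t addr, uint64_t lo, uint64_t hi)
{
    assert((addr & 15) == 0 && addr + 16 <= sizeof(cpu->mem));
    bool ok = cpu->llbit && cpu->lladdr == addr
           && load64(cpu, addr) == cpu->llval_lo
           && load64(cpu, addr + 8) == cpu->llval_hi;
    if (ok) {
        store64(cpu, addr, lo);
        store64(cpu, addr + 8, hi);
    }
    cpu->llbit = false;
    return ok;
}
```

In this model a store to either half between toy_ll_d() and toy_sc_q() makes the sc.q fail, because the comparison covers all 128 bits. It says nothing about what real hardware guarantees.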
But what about the reversed instruction pattern: ll.d hi, base, 4; ld.d lo, base, 0?
It would be worth asking your hardware engineers about the bounds of legal behaviour. Ideally there would be some very explicit language, similar to
I'm a community developer not affiliated with Loongson. Song Gao, could you provide some detail from Loongson Inc.?
ll.d   r1, base, 0
dbar 0x700          ==> see 2.2.8.1
ld.d  r2, base,  8
...
sc.q r1, r2, base
Thanks! I think we may need to detect the ll.d-dbar-ld.d sequence and translate the sequence into one tcg_gen_qemu_ld_i128 and split the result into two 64-bit parts. Can we do this in QEMU?

Oh, I'm not sure.

I think we just need to implement sc.q. We don't need to care about 'll.d-dbar-ld.d'. It's just like 'll.q'.

The user needs to ensure that.

'll.q' is
1) ll.d r1, base, 0 ==> set LLbit, load the low 64 bits into r1
2) dbar 0x700
3) ld.d r2, base, 8 ==> load the high 64 bits into r2

sc.q needs to
1) Use 64-bit cmpxchg.
2) Write 128 bits to memory.
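As a rough illustration of the two lists above, the following single-threaded C model tracks only the 64-bit value loaded by ll.d and lets sc.q compare that half before writing all 128 bits. It is a sketch under stated assumptions, not the actual patch: SimCPU and the sim_* functions are hypothetical names, memory is a small byte array in host byte order, the compare-and-store is not atomic, and dbar 0x700 is treated as a no-op because nothing else runs concurrently here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical guest state: toy memory plus 64-bit LL/SC bookkeeping. */
typedef struct {
    uint8_t  mem[64];  /* toy guest memory, host byte order */
    uint64_t lladdr;   /* address used by the last ll.d */
    uint64_t llval;    /* 64-bit value loaded by the last ll.d */
    bool     llbit;
} SimCPU;

static void sim_store64(SimCPU *cpu, uint64_t addr, uint64_t v)
{
    memcpy(&cpu->mem[addr], &v, sizeof(v));
}

/* ld.d: plain 64-bit load, no effect on LLbit. */
static uint64_t sim_ld_d(const SimCPU *cpu, uint64_t addr)
{
    uint64_t v;
    memcpy(&v, &cpu->mem[addr], sizeof(v));
    return v;
}

/* ll.d: set LLbit and remember the address and the loaded low 64 bits. */
static uint64_t sim_ll_d(SimCPU *cpu, uint64_t addr)
{
    assert((addr & 7) == 0 && addr + 8 <= sizeof(cpu->mem));
    cpu->lladdr = addr;
    cpu->llval  = sim_ld_d(cpu, addr);
    cpu->llbit  = true;
    return cpu->llval;
}

/* sc.q: 1) 64-bit compare against llval, 2) write 128 bits on success.
 * Returns 1 on success, 0 on failure. */
static int sim_sc_q(SimCPU *cpu, uint64_t base, uint64_t lo, uint64_t hi)
{
    assert((base & 15) == 0 && base + 16 <= sizeof(cpu->mem));
    bool ok = cpu->llbit && cpu->lladdr == base
           && sim_ld_d(cpu, base) == cpu->llval;
    if (ok) {
        sim_store64(cpu, base, lo);
        sim_store64(cpu, base + 8, hi);
    }
    cpu->llbit = false;
    return ok;
}
```

One consequence visible in this model: a store that changes only the high 64 bits between ll.d and sc.q is not detected, since only the low half is compared. That is the part the guest code has to take care of itself.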

Thanks.
Song Gao

For this series,
I think we need to set the new config bits on the 'max' cpu, and change 'any' to 'max' in linux-user/target_elf.h, so that we can use these new instructions in linux-user mode.
I will work on it.
Thanks
Song Gao
https://developer.arm.com/documentation/ddi0487/latest/
B2.9.5 Load-Exclusive and Store-Exclusive instruction usage restrictions
But you could do the same thing, aligning and recording the entire 128-bit quantity, then extract the ll.d result based on address bit 6. This would complicate the implementation of sc.d as well, but would perhaps bring us "close enough" to the actual architecture.
Note that our Arm store-exclusive implementation isn't quite in spec either. There is quite a large comment within translate-a64.c store_exclusive() about the ways things are not quite right. But it seems to be close enough for actual usage to succeed.
r~
