Alex Bennée <[email protected]> writes:
> Akihiko Odaki <[email protected]> writes:
>
>> On 2026/05/19 4:35, Alex Bennée wrote:
>>> Akihiko Odaki <[email protected]> writes:
>>>
>>>> This fixes a deadlock I previously observed with the test in [1].
>>>>
>>>> However, I can no longer reproduce the issue reliably with that test, so
>>>> I used Codex, a coding agent, to write a more reliable local test case,
>>>> shown below. I applied to Codex for Open Source to get access. The test
>>>> case is not intended for merge: current policy prohibits that, and it is
>>>> probably not worth carrying anyway because race-condition tests are
>>>> inherently fragile.
>>> What sort of hit rate were you getting with the race? So far they have
>>> both been rock solid without the additional patches for me.
>>
>> I hit the deadlock in 8 out of 10 trials.
>
> The hang is much rarer on my system (~1 in 100 runs), but even with
> these patches I still see it; it just takes a lot longer to get there.

TSan shows:

[INFO] mapping blob object resource
[INFO] resource_map_blob response is CtrlHeader { hdr_type: Command(4358), flags: 0, fence_id: 0, ctx_id: 0, _padding: 0 }
[INFO] unmapping blob object resource
==================
WARNING: ThreadSanitizer: data race (pid=3564641)
  Write of size 8 at 0x55c8ce6d4250 by thread T1 (mutexes: write M0, write M1):
    #0 qemu_ram_free <null> (qemu-system-aarch64+0x98f863) (BuildId: 9e57c19eb7cc79d8195b5fb05324859b4db6fbbc)
    #1 memory_region_destructor_ram <null> (qemu-system-aarch64+0x977046) (BuildId: 9e57c19eb7cc79d8195b5fb05324859b4db6fbbc)
    #2 memory_region_finalize <null> (qemu-system-aarch64+0x9830e5) (BuildId: 9e57c19eb7cc79d8195b5fb05324859b4db6fbbc)
    #3 object_unref <null> (qemu-system-aarch64+0xfa741c) (BuildId: 9e57c19eb7cc79d8195b5fb05324859b4db6fbbc)
    #4 object_finalize_child_property <null> (qemu-system-aarch64+0xfa765f) (BuildId: 9e57c19eb7cc79d8195b5fb05324859b4db6fbbc)
    #5 object_unref <null> (qemu-system-aarch64+0xfa73d6) (BuildId: 9e57c19eb7cc79d8195b5fb05324859b4db6fbbc)
    #6 flatview_destroy <null> (qemu-system-aarch64+0x978e7d) (BuildId: 9e57c19eb7cc79d8195b5fb05324859b4db6fbbc)
    #7 call_rcu_thread <null> (qemu-system-aarch64+0x122e268) (BuildId: 9e57c19eb7cc79d8195b5fb05324859b4db6fbbc)
    #8 qemu_thread_start <null> (qemu-system-aarch64+0x121cc8d) (BuildId: 9e57c19eb7cc79d8195b5fb05324859b4db6fbbc)
  Previous atomic read of size 8 at 0x55c8ce6d4250 by thread T7:
    #0 qemu_ram_block_from_host <null> (qemu-system-aarch64+0x98fabb) (BuildId: 9e57c19eb7cc79d8195b5fb05324859b4db6fbbc)
    #1 qemu_ram_addr_from_host_nofail <null> (qemu-system-aarch64+0x98ff16) (BuildId: 9e57c19eb7cc79d8195b5fb05324859b4db6fbbc)
    #2 get_page_addr_code_hostp <null> (qemu-system-aarch64+0x4bbd0b) (BuildId: 9e57c19eb7cc79d8195b5fb05324859b4db6fbbc)
    #3 tb_htable_lookup <null> (qemu-system-aarch64+0x49f7bc) (BuildId: 9e57c19eb7cc79d8195b5fb05324859b4db6fbbc)
    #4 cpu_exec_loop <null> (qemu-system-aarch64+0x4a08a5) (BuildId: 9e57c19eb7cc79d8195b5fb05324859b4db6fbbc)
    #5 cpu_exec_setjmp <null> (qemu-system-aarch64+0x4a112b) (BuildId: 9e57c19eb7cc79d8195b5fb05324859b4db6fbbc)
    #6 cpu_exec <null> (qemu-system-aarch64+0x4a1b74) (BuildId: 9e57c19eb7cc79d8195b5fb05324859b4db6fbbc)
    #7 tcg_cpu_exec <null> (qemu-system-aarch64+0x4cb92b) (BuildId: 9e57c19eb7cc79d8195b5fb05324859b4db6fbbc)
    #8 mttcg_cpu_thread_fn <null> (qemu-system-aarch64+0x4cbe81) (BuildId: 9e57c19eb7cc79d8195b5fb05324859b4db6fbbc)
    #9 do_st2_mmu <null> (qemu-system-aarch64+0x4ba389) (BuildId: 9e57c19eb7cc79d8195b5fb05324859b4db6fbbc)
    #10 helper_stw_mmu <null> (qemu-system-aarch64+0x4bc571) (BuildId: 9e57c19eb7cc79d8195b5fb05324859b4db6fbbc)
    #11 <null> <null> (0x7f936faabdb2)
    #12 cpu_exec_loop <null> (qemu-system-aarch64+0x4a04fc) (BuildId: 9e57c19eb7cc79d8195b5fb05324859b4db6fbbc)
    #13 cpu_exec_setjmp <null> (qemu-system-aarch64+0x4a112b) (BuildId: 9e57c19eb7cc79d8195b5fb05324859b4db6fbbc)
    #14 cpu_loop_exit_noexc <null> (qemu-system-aarch64+0x4a2242) (BuildId: 9e57c19eb7cc79d8195b5fb05324859b4db6fbbc)
    #15 cpu_io_recompile <null> (qemu-system-aarch64+0x4b0a9b) (BuildId: 9e57c19eb7cc79d8195b5fb05324859b4db6fbbc)
    #16 do_ld_mmio_beN <null> (qemu-system-aarch64+0x4b47c9) (BuildId: 9e57c19eb7cc79d8195b5fb05324859b4db6fbbc)
    #17 do_ld2_mmu <null> (qemu-system-aarch64+0x4b93aa) (BuildId: 9e57c19eb7cc79d8195b5fb05324859b4db6fbbc)
    #18 helper_lduw_mmu <null> (qemu-system-aarch64+0x4bc0a7) (BuildId: 9e57c19eb7cc79d8195b5fb05324859b4db6fbbc)
    #19 <null> <null> (0x7f936faab758)
<snip>

So I guess we are freeing the RAM block from the RCU thread while a vCPU
thread is still running and looking it up?

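For reference, what the report flags is a plain 8-byte write in one thread (T1,
the RCU thread) paired with an atomic read of the same location in another (T7,
a vCPU thread). Below is a minimal standalone sketch of that access pattern in
C. It is not QEMU code: `struct block`, `cached`, `reader`, `reclaimer` and
`run_demo` are all made-up names, and the sketch uses an atomic store on the
writer side so that it is itself race-free. It only shows how the two sides of
such a slot pair up; it does not model the lifetime of the object being pointed
to, which is the part my guess above is about.

```c
/* Illustrative only: hypothetical names, not taken from QEMU. */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

struct block {
    int id;
};

static struct block the_block = { 42 };

/* Shared slot: read by one thread, cleared by another. */
static _Atomic(struct block *) cached = &the_block;

/* Reader side: atomic load of the slot, like the "atomic read" in the report. */
static void *reader(void *arg)
{
    long *hits = arg;

    for (int i = 0; i < 100000; i++) {
        struct block *b = atomic_load_explicit(&cached, memory_order_acquire);
        if (b != NULL) {
            (*hits)++;
        }
    }
    return NULL;
}

/*
 * Reclaim side: clears the slot with an atomic store.  If `cached` were a
 * plain pointer cleared with an ordinary assignment, this is the access a
 * race detector would report against the reader's load.
 */
static void *reclaimer(void *arg)
{
    (void)arg;
    atomic_store_explicit(&cached, NULL, memory_order_release);
    return NULL;
}

/* Runs both threads to completion; returns 1 if the slot ended up cleared. */
int run_demo(void)
{
    pthread_t r, w;
    long hits = 0;
    int rc;

    rc = pthread_create(&r, NULL, reader, &hits);
    assert(rc == 0);
    rc = pthread_create(&w, NULL, reclaimer, NULL);
    assert(rc == 0);

    pthread_join(r, NULL);
    pthread_join(w, NULL);

    return atomic_load(&cached) == NULL;
}
```

Call `run_demo()` from a `main()` and build with `-pthread`; adding
`-fsanitize=thread` lets you compare against a variant where the slot is a
plain pointer.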
--
Alex Bennée
Virtualisation Tech Lead @ Linaro