On 7/21/25 10:25 AM, Pierrick Bouvier wrote:
On 7/21/25 10:14 AM, Michael Tokarev wrote:
On 21.07.2025 19:29, Pierrick Bouvier wrote:
On 7/21/25 9:23 AM, Pierrick Bouvier wrote:
..
looks like a good target for TSAN, which might expose the race without
really having to trigger it.
https://www.qemu.org/docs/master/devel/testing/main.html#building-and-testing-with-tsan
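
For reference, the build step is roughly this (a sketch: --enable-tsan is
the documented configure switch, the clang choice is an assumption, see
the page above for the full recipe):

  ../configure --cc=clang --cxx=clang++ --enable-tsan
  make
  # then run the failing workload as usual; TSAN prints a report for each
  # data race it detects, no crash needed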

I think I tried with TSAN and it even gave something useful.
The problem now is getting someone more familiar with this stuff
than me to reproduce it :)

Otherwise, you can reproduce your run using rr record -h (chaos mode) [1],
which randomly schedules threads, until it catches the segfault; then
you'll have a reproducible case to debug.

In case you never had the opportunity to use rr: it is quite convenient,
because you can set a hardware watchpoint on your faulty pointer (watch
-l), do a reverse-continue, and in most cases you'll land directly where
the bug happened. Feels like cheating.
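
Roughly, the session looks like this (the QEMU binary and the watched
expression are placeholders, not taken from your report):

  rr record -h ./qemu-system-x86_64 ...   # repeat until the crash is recorded
  rr replay                               # replays the latest trace under gdb
  (rr) continue                           # run forward until the SIGSEGV
  (rr) watch -l ptr->field                # hw watchpoint on the faulty location
  (rr) reverse-continue                   # stops at the last write to it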

rr is the first thing I tried.  Nope, it's absolutely hopeless.  It
tried to boot just the kernel for over 30 minutes, after which I just
gave up.


I had a similar thing to debug recently, and with a simple loop, I
couldn't expose it easily. The bug I had was triggered with 3%
probability, which seems close to yours.
As rr record -h is single-threaded, I found it useful to write a wrapper
script [1] to run one instance, and then run it in parallel using:
./run_one.sh | head -n 10000 | parallel --bar -j$(nproc)
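
The shape of such a wrapper could be something like this (a rough sketch,
not the actual script linked as [1] below; the QEMU command line is a
placeholder to replace with the failing invocation):

  #!/usr/bin/env bash
  # - no argument: print "$0 run" forever, to be piped into head/parallel
  # - "run": record one chaos-mode attempt, keep the trace only if it failed
  set -u

  # placeholder: replace with the command line that occasionally crashes
  qemu_cmd=(./qemu-system-x86_64 -nographic)

  if [ $# -eq 0 ]; then
      while true; do echo "$0 run"; done
      exit 0
  fi

  # private trace dir so parallel instances don't collide; rr honors
  # _RR_TRACE_DIR for where the recording goes
  trace=$(mktemp -d)
  if _RR_TRACE_DIR="$trace" rr record -h "${qemu_cmd[@]}" >/dev/null 2>&1; then
      rm -rf "$trace"        # clean run, drop its trace
  else
      echo "crash caught, trace kept in $trace" >&2
  fi

Any directory that survives holds a recording of a failing run, ready to
be replayed.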

With that, I could expose the bug in 2 minutes reliably (vs. trying for
more than an hour before). With your 64 cores, I'm sure it will quickly
expose it.

Might be worth a try, as you only need to catch the bug once to be able
to reproduce it.

[1] https://github.com/pbo-linaro/qemu/blob/master/try_rme.sh


In this script, I ended up using QEMU's own record/replay feature (QEMU
itself was working fine, but there was a bug in the software stack itself
that I wanted to investigate under gdbstub). But I was describing the same
approach using rr (the tool).

Thanks,

/mjt


