Hi Mark,
"recorded section therefore seems to be incorrect".
do you observe a crash, or on assert failing at execution?
I don't know the code you mention in detail, but after investigating
and fixing https://gitlab.com/qemu-project/qemu/-/issues/3040, I can
share a few things.
Overall, what you describe looks like a race condition exposing a
lifetime issue, especially when you say "we 'lose' the address space
that has been returned by the translate function".
Either a value was not updated as expected and is out of sync, or it
was freed too early. The lifetime of memory regions is definitely
tricky in QEMU, and when you mix that with RCU, things can become very
obscure in multithreaded scenarios.
In the bug above, the solution was to stop duplicating this information
and to read it from a single source instead. The overhead of reading
such atomic data is quite small, thanks to the use of RCU.
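To give a rough idea of the shape of that fix (a sketch from memory,
not the exact patch; see the issue above for the real code): instead of
keeping a cached copy of the dispatch pointer, re-derive it from the
address space under the RCU read lock every time it is needed:

    /* Sketch only: re-derive the dispatch rather than caching a copy.
     * The caller is expected to hold the RCU read lock. */
    static AddressSpaceDispatch *current_dispatch(CPUState *cpu, int asidx)
    {
        CPUAddressSpace *cpuas = &cpu->cpu_ases[asidx];
        return flatview_to_dispatch(address_space_to_flatview(cpuas->as));
    }

Going through the flatview like this is cheap, and it guarantees you
always see the dispatch that matches the current memory topology.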
At KVM Forum, Paolo told me he introduced this copy precisely to avoid
issues, but the opposite happened in practice, which we both found
quite funny.
Additional questions:
- At which point of execution does it happen? Is it during PCI device
initialization, or when remapping specific memory sections?
- Is the bug deterministic or random? If random, does increasing the
number of PCI devices attached increase the probability of hitting it?
Additional tools:
- If you observe a crash, build with ASan (see the build sketch after
this list). If you get a use-after-free error, it's probably an issue
with RCU cleaning things up before you expect; this is what I had in
the bug mentioned above.
- If your assert fails, I recommend capturing the execution with rr
(https://github.com/rr-debugger/rr), using chaos mode (rr record
--chaos), which randomizes the scheduling of threads. I don't know if
you're familiar with it, but it allows you to debug your execution
"backward". Once you've captured a faulty execution, you can reach the
crash or failing assert, then execute backward (reverse-continue) with
a watchpoint set on the (correct) value that was updated in the
meantime; see the example session after this list. This way, you'll
find which sequence led to the desynchronization, which gives you a
good starting point for deducing the root cause.
- Spend some time making the crash/assert almost deterministic; it will
save you time later, especially when implementing a possible fix and
proving that it works.
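For the ASan build, QEMU's configure has direct support (the target
list below is just an example, and flag names may differ slightly
across versions):

    # Build QEMU with AddressSanitizer (and UBSan) enabled
    ./configure --target-list=aarch64-softmmu --enable-sanitizers --enable-debug
    make -j$(nproc)

A use-after-free will then abort with the allocation, free and use
backtraces, which usually point straight at the RCU reclamation.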
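For rr, a session looks something like this (the binary, its options,
and the watched expression are placeholders; watch whatever variable
turns out to be stale in your case):

    $ rr record --chaos ./qemu-system-aarch64 <your usual options>
    $ rr replay                           # opens gdb at the start of the recording
    (rr) continue                         # run forward to the crash / failed assert
    (rr) watch -l cpuas->memory_dispatch  # location watchpoint on the stale value
    (rr) reverse-continue                 # run backward to whoever last wrote it

At that point you are stopped on the write that desynchronized the two
sides, with a full backtrace of the offending thread.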
I hope it helps.
Regards,
Pierrick
On 10/9/25 2:10 AM, Mark Burton wrote:
(Adding Pierrick)
Thanks for getting back to me, Mark.
I initially thought the same, and I think I have seen that issue; I
have also taken that patch. However…
For MMIO accesses, as best I can tell, the initial calculation of the
dispatch is based on the iotlb reported by the translate function
(correct), while the subsequent use of the section number uses the
dispatch table from the CPU's address space… which gives you the wrong
section.
I would very happily do a live debug with you (or anybody) if it would help…
I’m more than willing to believe I’ve made a mistake, but I just don’t see how
it’s supposed to work.
I have been looking at solutions, and right now I don't see anything
obvious. As best I can tell, we "lose" the address space that has been
returned by the translate function - so either we would need a way to
hold onto it, or we would have to re-call the function, or…
All of those options look really, really nasty to me.
The issue is going to be systems where SMMUs are used all over the
place, specifically in front of MMIO. (Memory works OK because we get
the memory pointer itself and all is fine; the issue seems only to be
with MMIO accesses through IOMMU regions.)
Cheers
Mark.
On 9 Oct 2025, at 10:43, Mark Cave-Ayland <[email protected]> wrote:
On 08/10/2025 13:38, Mark Burton wrote:
All, sorry for the wide CC; I'm trying to find somebody who understands
this corner of the code… This is perhaps obscure, but I think it should
work.
I am trying to access an MMIO region through an IOMMU, from TCG.
The IOMMU translation has provided an address space that is different from the
CPU’s own address space.
In address_space_translate_for_iotlb, the section is calculated using
the address space provided by the IOMMU translation:
d = flatview_to_dispatch(address_space_to_flatview(iotlb.target_as));
Later, when we come to do the actual access (via e.g. do_st_mmio_leN),
we pick up the CPU's address space in iotlb_to_section, which is
different, and the recorded section therefore seems to be incorrect:
CPUAddressSpace *cpuas = &cpu->cpu_ases[asidx];
AddressSpaceDispatch *d = cpuas->memory_dispatch;
int section_index = index & ~TARGET_PAGE_MASK;
MemoryRegionSection *ret;
assert(section_index < d->map.sections_nb);
ret = d->map.sections + section_index;
What I don’t fully understand is how this is supposed to work….?
Have I missed something obvious?
Cheers
Mark.
What version of QEMU are you using? I'm wondering if you're getting caught out
by a variant of this: https://gitlab.com/qemu-project/qemu/-/issues/3040.
ATB,
Mark.