Hi Mark,
"recorded section therefore seems to be incorrect".
do you observe a crash, or on assert failing at execution?
I don't know the code you mention in detail, but after investigating
and fixing https://gitlab.com/qemu-project/qemu/-/issues/3040, I can
share a few things.
Overall, what you describe looks like a race condition exposing a
lifetime issue, especially when you say "we 'lose' the address space
that has been returned by the translate function".
Either a value was not updated as expected and is out of sync, or it
was freed too early. The lifetime of memory regions is definitely
tricky in QEMU, and when you mix that with RCU, things can become very
obscure in multithreaded scenarios.
In the bug above, the solution was to stop duplicating this information
and to read it from a single source instead. The overhead of reading
such atomic data is quite small, thanks to the use of RCU.
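To give a rough idea of the shape of that fix (a sketch from memory,
not the exact patch; see the issue above for the real code): instead of
keeping a cached copy of the dispatch pointer, re-derive it from the
address space under the RCU read lock every time it is needed:

    /* Sketch only: re-derive the dispatch rather than caching a copy.
     * The caller is expected to hold the RCU read lock. */
    static AddressSpaceDispatch *current_dispatch(CPUState *cpu, int asidx)
    {
        CPUAddressSpace *cpuas = &cpu->cpu_ases[asidx];
        return flatview_to_dispatch(address_space_to_flatview(cpuas->as));
    }

Going through the flatview like this is cheap, and it guarantees you
always see the dispatch that matches the current memory topology.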
At KVM Forum, Paolo told me he introduced this copy precisely to avoid
issues, but the opposite happened in practice, which we both found
quite funny.
Additional questions:
- At which point of execution does it happen? Is it during PCI device
initialization, or when remapping specific memory sections?
- Is the bug deterministic or random? If random, does increasing the
number of PCI devices attached increase the probability of hitting it?
Additional tools:
- If you observe a crash, build with ASan (see the build sketch after
this list). If you get a use-after-free error, it's probably an issue
with RCU cleaning things up before you expect; this is what I had in
the bug mentioned above.
- If your assert fails, I recommend capturing the execution with rr
(https://github.com/rr-debugger/rr), using chaos mode (rr record
--chaos), which randomizes the scheduling of threads. I don't know if
you're familiar with it, but it allows you to debug your execution
"backward". Once you've captured a faulty execution, you can reach the
crash or failing assert, then execute backward (reverse-continue) with
a watchpoint set on the (correct) value that was updated in the
meantime; see the example session after this list. This way, you'll
find which sequence led to the desynchronization, which gives you a
good starting point for deducing the root cause.
- Spend some time making the crash/assert almost deterministic; it will
save you time later, especially when implementing a possible fix and
proving that it works.
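For the ASan build, QEMU's configure has direct support (the target
list below is just an example, and flag names may differ slightly
across versions):

    # Build QEMU with AddressSanitizer (and UBSan) enabled
    ./configure --target-list=aarch64-softmmu --enable-sanitizers --enable-debug
    make -j$(nproc)

A use-after-free will then abort with the allocation, free and use
backtraces, which usually point straight at the RCU reclamation.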
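For rr, a session looks something like this (the binary, its options,
and the watched expression are placeholders; watch whatever variable
turns out to be stale in your case):

    $ rr record --chaos ./qemu-system-aarch64 <your usual options>
    $ rr replay                           # opens gdb at the start of the recording
    (rr) continue                         # run forward to the crash / failed assert
    (rr) watch -l cpuas->memory_dispatch  # location watchpoint on the stale value
    (rr) reverse-continue                 # run backward to whoever last wrote it

At that point you are stopped on the write that desynchronized the two
sides, with a full backtrace of the offending thread.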
I hope it helps.
Regards,
Pierrick
On 10/9/25 2:10 AM, Mark Burton wrote:
(Adding Pierrick)
Thanks for getting back to me, Mark.
I initially thought the same, and I think I have seen that issue; I
have also taken that patch. However…
For MMIO accesses, as best I can tell, the initial calculation of the
dispatch is based on the iotlb reported by the translate function
(correct), while the subsequent use of the section number uses the
dispatch table from the CPU's address space… which gives you the wrong
section.
I would very happily do a live debug with you (or anybody) if it would help…
I’m more than willing to believe I’ve made a mistake, but I just don’t see how
it’s supposed to work.
I have been looking at solutions, and right now I don't see anything
obvious. As best I can tell, we "lose" the address space that has been
returned by the translate function - so either we would need a way to
hold onto it, or we would have to re-call the function, or…
All of those options look really, really nasty to me.
The issue is going to be systems where SMMUs are used all over the
place, specifically in front of MMIO. (Memory works OK because we get
the memory pointer itself and all is fine; the issue seems only to be
with MMIO accesses through IOMMU regions.)
Cheers
Mark.
On 9 Oct 2025, at 10:43, Mark Cave-Ayland <[email protected]> wrote:
On 08/10/2025 13:38, Mark Burton wrote:
All, sorry for the wide CC; I'm trying to find somebody who understands
this corner of the code… This is perhaps obscure, but I think it should
work.
I am trying to access an MMIO region through an IOMMU, from TCG.
The IOMMU translation has provided an address space that is different from the
CPU’s own address space.
In address_space_translate_for_iotlb, the section is calculated using
the address space provided by the IOMMU translation:
d = flatview_to_dispatch(address_space_to_flatview(iotlb.target_as));
Later, when we come to do the actual access (via e.g. do_st_mmio_leN),
we pick up the CPU's address space in iotlb_to_section, which is
different, and the recorded section therefore seems to be incorrect:
CPUAddressSpace *cpuas = &cpu->cpu_ases[asidx];
AddressSpaceDispatch *d = cpuas->memory_dispatch;
int section_index = index & ~TARGET_PAGE_MASK;
MemoryRegionSection *ret;
assert(section_index < d->map.sections_nb);
ret = d->map.sections + section_index;
What I don’t fully understand is how this is supposed to work….?
Have I missed something obvious?
Cheers
Mark.
What version of QEMU are you using? I'm wondering if you're getting caught out
by a variant of this: https://gitlab.com/qemu-project/qemu/-/issues/3040.
ATB,
Mark.