On 2026/06/20 1:32, Paolo Bonzini wrote:
On 6/18/26 00:44, BALATON Zoltan wrote:
On Wed, 17 Jun 2026, Paolo Bonzini wrote:
On 6/16/26 17:55, Daniel P. Berrangé wrote:
Prepare for the move to dynamically allocated memory regions by
introducing memory_region_new and memory_region_new_io functions
which call through to object_new instead of object_initialize.
MemoryRegionOps callback will certainly access fields in the parent.
Having a separate object gives you no benefit because you still have
the same use-after-free if the MemoryRegion outlives the device.
Likewise for buses.
If the device itself is embedded, then all the invariants must hold
recursively:
- the device must be kept alive by mutual references between the
device itself and its parent;
- unparent for the device should wait for all guest memory accesses
to be either completed or cancelled.
So, the problem is that "QEMU automatically references the owning
Device to keep the MemoryRegion alive" is a hack that should die.
[...] I disagree that removing embedding will fix any bugs
I'm trying to understand this better. The way it originally meant to
work according to the memory region docs and implementation is that
memory region is added to an owner as a child which takes a reference
and when the owner is freed it will unparent its children which will
then get unrefed and freed as well if nothing else increased their ref
count.
Yes, and that's broken.
The problem is that when something accesses the memory region it has
to keep the owner alive as well so instead of referencing the memory
region it references the owner (not directly but using
memory_region_ref which does this).
A mechanism to keep the owner alive exists, and it is mutual references
(so that neither owner nor MR die) + unparent of owner (which should
wait for pending users to complete before returning).
The fact that this mechanism is not used, i.e. unparent of a PCIDevice
does not wait for pending BAR accesses to complete, is a bug.
So embedding seems unrelated to this passing refs on
memory regions to owner and could be fixed separately that my patches
attempted.
Your patches add the possibility of not embedding; I don't like having
two separate API, but I think we only have a disagreement on aesthetics.
This is different for Daniel's patches, which try to fix something by
not embedding, and I don't think they do. Currently, if the owner dies,
the MemoryRegion dies with it while it shouldn't (which is a problem,
absolutely). But these patches, while preventing the MR from dying,
still leave around a dangling pointer to the owner from the MR, so
there's not much gained if anything.
I think we can stack follow-up patches that reject accesses to
MemoryRegions owned by unrealized devices. That would make this a more
complete solution. Accessing an unrealized device is an unsafe corner
case anyway, so it is better to close it explicitly.
memory_region_init_ram_ptr() and memory_region_init_ram_device_ptr() are
a little more complicated, because direct accesses to those regions can
happen outside the BQL. Still, we can unmap the memory when the
MemoryRegion is finalized, as memory_region_init_ram() already does.
However, after looking into this further, I found one complication:
x-vfio-user-server. In that case, the unmap timing is controlled by
libvfio-user, so QEMU cannot keep the memory mapped until the
MemoryRegion reference is dropped. This is not a new issue; it is a
pre-existing use-after-free hazard. I think a practical fix would be to
extend the libvfio-user API to support asynchronous unmapping.
Cc'ing the vfio-user maintainers.
Regards,
Akihiko Odaki