> -----Original Message-----
> From: Akihiko Odaki <[email protected]>
> Sent: 22 June 2026 15:37
> To: Paolo Bonzini <[email protected]>; BALATON Zoltan
> <[email protected]>
> Cc: Daniel P. Berrangé <[email protected]>; [email protected];
> Philippe Mathieu-Daudé <[email protected]>; Pierrick Bouvier
> <[email protected]>; Peter Xu <[email protected]>;
> Hervé Poussineau <[email protected]>; Alex Bennée
> <[email protected]>; Michael S. Tsirkin <[email protected]>; Aurelien
> Jarno <[email protected]>; Fabiano Rosas <[email protected]>; Mark Cave-
> Ayland <[email protected]>; Marc-André Lureau
> <[email protected]>; John Levon <[email protected]>;
> Thanos Makatos <[email protected]>; Cédric Le Goater
> <[email protected]>
> Subject: Re: [RFC 4/7] system: add memory_region_new /
> memory_region_new_io
> 
> !-------------------------------------------------------------------|
>   CAUTION: External Email
> 
> |-------------------------------------------------------------------!
> 
> On 2026/06/20 1:32, Paolo Bonzini wrote:
> > On 6/18/26 00:44, BALATON Zoltan wrote:
> >> On Wed, 17 Jun 2026, Paolo Bonzini wrote:
> >>> On 6/16/26 17:55, Daniel P. Berrangé wrote:
> >>>> Prepare for the move to dynamically allocated memory regions by
> >>>> introducing memory_region_new and memory_region_new_io functions
> >>>> which call through to object_new instead of object_initialize.
> >>>
> >>> MemoryRegionOps callback will certainly access fields in the parent.
> >>> Having a separate object gives you no benefit because you still have
> >>> the same use-after-free if the MemoryRegion outlives the device.
> >>> Likewise for buses.
> >>>
> >>> If the device itself is embedded, then all the invariants must hold
> >>> recursively:
> >>>
> >>> - the device must be kept alive by mutual references between the
> >>> device itself and its parent;
> >>>
> >>> - unparent for the device should wait for all guest memory accesses
> >>> to be either completed or cancelled.
> >>>
> >>> So, the problem is that "QEMU automatically references the owning
> >>> Device to keep the MemoryRegion alive" is a hack that should die.
> >>> [...] I disagree that removing embedding will fix any bugs
> >>
> >> I'm trying to understand this better. The way it originally meant to
> >> work according to the memory region docs and implementation is that
> >> memory region is added to an owner as a child which takes a reference
> >> and when the owner is freed it will unparent its children which will
> >> then get unrefed and freed as well if nothing else increased their ref
> >> count.
> >
> > Yes, and that's broken.
> >
> >> The problem is that when something accesses the memory region it has
> >> to keep the owner alive as well so instead of referencing the memory
> >> region it references the owner (not directly but using
> >> memory_region_ref which does this).
> >
> > A mechanism to keep the owner alive exists, and it is mutual references
> > (so that neither owner nor MR die) + unparent of owner (which should
> > wait for pending users to complete before returning).
> >
> > The fact that this mechanism is not used, i.e. unparent of a PCIDevice
> > does not wait for pending BAR accesses to complete, is a bug.
> >
> >> So embedding seems unrelated to this passing refs on
> >> memory regions to owner and could be fixed separately that my patches
> >> attempted.
> >
> > Your patches add the possibility of not embedding; I don't like having
> > two separate API, but I think we only have a disagreement on aesthetics.
> >
> > This is different for Daniel's patches, which try to fix something by
> > not embedding, and I don't think they do.  Currently, if the owner dies,
> > the MemoryRegion dies with it while it shouldn't (which is a problem,
> > absolutely).  But these patches, while preventing the MR from dying,
> > still leave around a dangling pointer to the owner from the MR, so
> > there's not much gained if anything.
> 
> I think we can stack follow-up patches that reject accesses to
> MemoryRegions owned by unrealized devices. That would make this a more
> complete solution. Accessing an unrealized device is an unsafe corner
> case anyway, so it is better to close it explicitly.
> 
> memory_region_init_ram_ptr() and memory_region_init_ram_device_ptr()
> are
> a little more complicated, because direct accesses to those regions can
> happen outside the BQL. Still, we can unmap the memory when the
> MemoryRegion is finalized, as memory_region_init_ram() already does.
> 
> However, after looking into this further, I found one complication:
> x-vfio-user-server. In that case, the unmap timing is controlled by
> libvfio-user, so QEMU cannot keep the memory mapped until the
> MemoryRegion reference is dropped. This is not a new issue; it is a
> pre-existing use-after-free hazard. I think a practical fix would be to
> extend the libvfio-user API to support asynchronous unmapping.
> 
> Cc'ing the vfio-user maintainers.

VFIO_USER_DMA_UNMAP is already asynchronous, it sends the response only after 
it's done unmapping. What would that asynchronous unmapping API look like? 
VFIO_USER_DMA_UNMAP would send a respond that it received the operation and 
started unmapping, and then another one to say it's done unmapping?
Also, since (AFAIK) x-vfio-user-server is orphaned is it worth putting the 
effort in this?

> 
> Regards,
> Akihiko Odaki

Reply via email to