> -----Original Message-----
> From: Akihiko Odaki <[email protected]>
> Sent: 23 June 2026 06:42
> To: Thanos Makatos <[email protected]>; Paolo Bonzini
> <[email protected]>; BALATON Zoltan <[email protected]>
> Cc: Daniel P. Berrangé <[email protected]>; [email protected];
> Philippe Mathieu-Daudé <[email protected]>; Pierrick Bouvier
> <[email protected]>; Peter Xu <[email protected]>;
> Hervé Poussineau <[email protected]>; Alex Bennée
> <[email protected]>; Michael S. Tsirkin <[email protected]>; Aurelien
> Jarno <[email protected]>; Fabiano Rosas <[email protected]>; Mark Cave-
> Ayland <[email protected]>; Marc-André Lureau
> <[email protected]>; John Levon <[email protected]>;
> Cédric Le Goater <[email protected]>
> Subject: Re: [RFC 4/7] system: add memory_region_new /
> memory_region_new_io
> 
> !-------------------------------------------------------------------|
>   CAUTION: External Email
> 
> |-------------------------------------------------------------------!
> 
> On 2026/06/23 6:24, Thanos Makatos wrote:
> >> -----Original Message-----
> >> From: Akihiko Odaki <[email protected]>
> >> Sent: 22 June 2026 15:37
> >> To: Paolo Bonzini <[email protected]>; BALATON Zoltan
> >> <[email protected]>
> >> Cc: Daniel P. Berrangé <[email protected]>; [email protected];
> >> Philippe Mathieu-Daudé <[email protected]>; Pierrick Bouvier
> >> <[email protected]>; Peter Xu <[email protected]>;
> >> Hervé Poussineau <[email protected]>; Alex Bennée
> >> <[email protected]>; Michael S. Tsirkin <[email protected]>; Aurelien
> >> Jarno <[email protected]>; Fabiano Rosas <[email protected]>; Mark
> Cave-
> >> Ayland <[email protected]>; Marc-André Lureau
> >> <[email protected]>; John Levon <[email protected]>;
> >> Thanos Makatos <[email protected]>; Cédric Le Goater
> >> <[email protected]>
> >> Subject: Re: [RFC 4/7] system: add memory_region_new /
> >> memory_region_new_io
> >>
> >> !-------------------------------------------------------------------|
> >>    CAUTION: External Email
> >>
> >> |-------------------------------------------------------------------!
> >>
> >> On 2026/06/20 1:32, Paolo Bonzini wrote:
> >>> On 6/18/26 00:44, BALATON Zoltan wrote:
> >>>> On Wed, 17 Jun 2026, Paolo Bonzini wrote:
> >>>>> On 6/16/26 17:55, Daniel P. Berrangé wrote:
> >>>>>> Prepare for the move to dynamically allocated memory regions by
> >>>>>> introducing memory_region_new and memory_region_new_io
> functions
> >>>>>> which call through to object_new instead of object_initialize.
> >>>>>
> >>>>> MemoryRegionOps callback will certainly access fields in the parent.
> >>>>> Having a separate object gives you no benefit because you still have
> >>>>> the same use-after-free if the MemoryRegion outlives the device.
> >>>>> Likewise for buses.
> >>>>>
> >>>>> If the device itself is embedded, then all the invariants must hold
> >>>>> recursively:
> >>>>>
> >>>>> - the device must be kept alive by mutual references between the
> >>>>> device itself and its parent;
> >>>>>
> >>>>> - unparent for the device should wait for all guest memory accesses
> >>>>> to be either completed or cancelled.
> >>>>>
> >>>>> So, the problem is that "QEMU automatically references the owning
> >>>>> Device to keep the MemoryRegion alive" is a hack that should die.
> >>>>> [...] I disagree that removing embedding will fix any bugs
> >>>>
> >>>> I'm trying to understand this better. The way it originally meant to
> >>>> work according to the memory region docs and implementation is that
> >>>> memory region is added to an owner as a child which takes a reference
> >>>> and when the owner is freed it will unparent its children which will
> >>>> then get unrefed and freed as well if nothing else increased their ref
> >>>> count.
> >>>
> >>> Yes, and that's broken.
> >>>
> >>>> The problem is that when something accesses the memory region it has
> >>>> to keep the owner alive as well so instead of referencing the memory
> >>>> region it references the owner (not directly but using
> >>>> memory_region_ref which does this).
> >>>
> >>> A mechanism to keep the owner alive exists, and it is mutual references
> >>> (so that neither owner nor MR die) + unparent of owner (which should
> >>> wait for pending users to complete before returning).
> >>>
> >>> The fact that this mechanism is not used, i.e. unparent of a PCIDevice
> >>> does not wait for pending BAR accesses to complete, is a bug.
> >>>
> >>>> So embedding seems unrelated to this passing refs on
> >>>> memory regions to owner and could be fixed separately that my patches
> >>>> attempted.
> >>>
> >>> Your patches add the possibility of not embedding; I don't like having
> >>> two separate API, but I think we only have a disagreement on aesthetics.
> >>>
> >>> This is different for Daniel's patches, which try to fix something by
> >>> not embedding, and I don't think they do.  Currently, if the owner dies,
> >>> the MemoryRegion dies with it while it shouldn't (which is a problem,
> >>> absolutely).  But these patches, while preventing the MR from dying,
> >>> still leave around a dangling pointer to the owner from the MR, so
> >>> there's not much gained if anything.
> >>
> >> I think we can stack follow-up patches that reject accesses to
> >> MemoryRegions owned by unrealized devices. That would make this a
> more
> >> complete solution. Accessing an unrealized device is an unsafe corner
> >> case anyway, so it is better to close it explicitly.
> >>
> >> memory_region_init_ram_ptr() and memory_region_init_ram_device_ptr()
> >> are
> >> a little more complicated, because direct accesses to those regions can
> >> happen outside the BQL. Still, we can unmap the memory when the
> >> MemoryRegion is finalized, as memory_region_init_ram() already does.
> >>
> >> However, after looking into this further, I found one complication:
> >> x-vfio-user-server. In that case, the unmap timing is controlled by
> >> libvfio-user, so QEMU cannot keep the memory mapped until the
> >> MemoryRegion reference is dropped. This is not a new issue; it is a
> >> pre-existing use-after-free hazard. I think a practical fix would be to
> >> extend the libvfio-user API to support asynchronous unmapping.
> >>
> >> Cc'ing the vfio-user maintainers.
> >
> > VFIO_USER_DMA_UNMAP is already asynchronous, it sends the response
> only after it's done unmapping. What would that asynchronous unmapping
> API look like? VFIO_USER_DMA_UNMAP would send a respond that it
> received the operation and started unmapping, and then another one to say
> it's done unmapping?
> 
> The problem is of libvfio-user's API, not the vfio-user protocol.
> vfu_setup_device_dma() takes a DMA region unregistration callback, which
> is expected to synchronously unmap the region from the guest, and
> x-vfio-user-server attempts to do so by calling object_unparent().

I misunderstood.

libvfio-user provides an optional mechanism where the device is asked to 
quiesce for certain operations, including VFIO_USER_DMA_UNMAP. Once quiesced, 
the device must not modify state. If the device cannot quiesce immediately it 
can continue operating and once ready to quiesce it can call a libvfio-user 
function to tell the library it's quiesced. Would that solve the problem?

> 
> However, object_unparent() does not actually synchronously unmap the
> region from the guest so it results in a use-after-free hazard. A
> possible solution is to have the following three changes in combination:
> - This series, ensuring the MemoryRegion stays alive
> - A change of memory_region_init_ram_ptr() to add a callback for
>    asynchronous unmapping
> - A libvfio-user API change to asynchronously complete unmapping
> 
> > Also, since (AFAIK) x-vfio-user-server is orphaned is it worth putting the
> effort in this?
> 
> I rechecked MAINTAINERS and found its status is Odd Fixes, not Orphan.
> That said, it is possible to leave a FIXME comment instead of fixing it
> right now, as it's already done by virtio-gpu-rutabaga:
> 
>  > There is small risk of the MemoryRegion dereferencing the pointer
>  > after rutabaga unmaps it. Please see discussion here:
>  >
>  > https://urldefense.proofpoint.com/v2/url?u=https-
> 3A__lists.gnu.org_archive_html_qemu-2Ddevel_2023-
> 2D09_msg05141.html&d=DwIDaQ&c=s883GpUCOChKOHiocYtGcg&r=XTpYsh
> 5Ps2zJvtw6ogtti46atk736SI4vgsJiUKIyDE&m=t9M_TxgBz_6K8y6rUD0ypaL_El
> PVMTV5GPtiCVHrHb9BMKzTkgWiOKmcvA812-
> 5H&s=4N2t0WV5D358WZZsjHWVvc6CwQrYydmofzhTAlZ4eVM&e=
>  >
>  > It is highly unlikely to happen in practice and doesn't affect known
>  > use cases. However, it should be fixed and is noted here for
>  > posterity.
> 
> A difference from virtio-gpu-rutabaga is that the Rutabaga library API
> does not interfere with asynchronous unmapping so it can be fixed
> entirely on the QEMU side. On contrary, anyone who would want to fix
> x-vfio-user-server would need to change the libvfio-user API because its
> current form is too restrictive.
> 
> Regards,
> Akihiko Odaki

Reply via email to