> -----Original Message----- > From: Akihiko Odaki <[email protected]> > Sent: 23 June 2026 06:42 > To: Thanos Makatos <[email protected]>; Paolo Bonzini > <[email protected]>; BALATON Zoltan <[email protected]> > Cc: Daniel P. Berrangé <[email protected]>; [email protected]; > Philippe Mathieu-Daudé <[email protected]>; Pierrick Bouvier > <[email protected]>; Peter Xu <[email protected]>; > Hervé Poussineau <[email protected]>; Alex Bennée > <[email protected]>; Michael S. Tsirkin <[email protected]>; Aurelien > Jarno <[email protected]>; Fabiano Rosas <[email protected]>; Mark Cave- > Ayland <[email protected]>; Marc-André Lureau > <[email protected]>; John Levon <[email protected]>; > Cédric Le Goater <[email protected]> > Subject: Re: [RFC 4/7] system: add memory_region_new / > memory_region_new_io > > !-------------------------------------------------------------------| > CAUTION: External Email > > |-------------------------------------------------------------------! > > On 2026/06/23 6:24, Thanos Makatos wrote: > >> -----Original Message----- > >> From: Akihiko Odaki <[email protected]> > >> Sent: 22 June 2026 15:37 > >> To: Paolo Bonzini <[email protected]>; BALATON Zoltan > >> <[email protected]> > >> Cc: Daniel P. Berrangé <[email protected]>; [email protected]; > >> Philippe Mathieu-Daudé <[email protected]>; Pierrick Bouvier > >> <[email protected]>; Peter Xu <[email protected]>; > >> Hervé Poussineau <[email protected]>; Alex Bennée > >> <[email protected]>; Michael S. Tsirkin <[email protected]>; Aurelien > >> Jarno <[email protected]>; Fabiano Rosas <[email protected]>; Mark > Cave- > >> Ayland <[email protected]>; Marc-André Lureau > >> <[email protected]>; John Levon <[email protected]>; > >> Thanos Makatos <[email protected]>; Cédric Le Goater > >> <[email protected]> > >> Subject: Re: [RFC 4/7] system: add memory_region_new / > >> memory_region_new_io > >> > >> !-------------------------------------------------------------------| > >> CAUTION: External Email > >> > >> |-------------------------------------------------------------------! > >> > >> On 2026/06/20 1:32, Paolo Bonzini wrote: > >>> On 6/18/26 00:44, BALATON Zoltan wrote: > >>>> On Wed, 17 Jun 2026, Paolo Bonzini wrote: > >>>>> On 6/16/26 17:55, Daniel P. Berrangé wrote: > >>>>>> Prepare for the move to dynamically allocated memory regions by > >>>>>> introducing memory_region_new and memory_region_new_io > functions > >>>>>> which call through to object_new instead of object_initialize. > >>>>> > >>>>> MemoryRegionOps callback will certainly access fields in the parent. > >>>>> Having a separate object gives you no benefit because you still have > >>>>> the same use-after-free if the MemoryRegion outlives the device. > >>>>> Likewise for buses. > >>>>> > >>>>> If the device itself is embedded, then all the invariants must hold > >>>>> recursively: > >>>>> > >>>>> - the device must be kept alive by mutual references between the > >>>>> device itself and its parent; > >>>>> > >>>>> - unparent for the device should wait for all guest memory accesses > >>>>> to be either completed or cancelled. > >>>>> > >>>>> So, the problem is that "QEMU automatically references the owning > >>>>> Device to keep the MemoryRegion alive" is a hack that should die. > >>>>> [...] I disagree that removing embedding will fix any bugs > >>>> > >>>> I'm trying to understand this better. The way it originally meant to > >>>> work according to the memory region docs and implementation is that > >>>> memory region is added to an owner as a child which takes a reference > >>>> and when the owner is freed it will unparent its children which will > >>>> then get unrefed and freed as well if nothing else increased their ref > >>>> count. > >>> > >>> Yes, and that's broken. > >>> > >>>> The problem is that when something accesses the memory region it has > >>>> to keep the owner alive as well so instead of referencing the memory > >>>> region it references the owner (not directly but using > >>>> memory_region_ref which does this). > >>> > >>> A mechanism to keep the owner alive exists, and it is mutual references > >>> (so that neither owner nor MR die) + unparent of owner (which should > >>> wait for pending users to complete before returning). > >>> > >>> The fact that this mechanism is not used, i.e. unparent of a PCIDevice > >>> does not wait for pending BAR accesses to complete, is a bug. > >>> > >>>> So embedding seems unrelated to this passing refs on > >>>> memory regions to owner and could be fixed separately that my patches > >>>> attempted. > >>> > >>> Your patches add the possibility of not embedding; I don't like having > >>> two separate API, but I think we only have a disagreement on aesthetics. > >>> > >>> This is different for Daniel's patches, which try to fix something by > >>> not embedding, and I don't think they do. Currently, if the owner dies, > >>> the MemoryRegion dies with it while it shouldn't (which is a problem, > >>> absolutely). But these patches, while preventing the MR from dying, > >>> still leave around a dangling pointer to the owner from the MR, so > >>> there's not much gained if anything. > >> > >> I think we can stack follow-up patches that reject accesses to > >> MemoryRegions owned by unrealized devices. That would make this a > more > >> complete solution. Accessing an unrealized device is an unsafe corner > >> case anyway, so it is better to close it explicitly. > >> > >> memory_region_init_ram_ptr() and memory_region_init_ram_device_ptr() > >> are > >> a little more complicated, because direct accesses to those regions can > >> happen outside the BQL. Still, we can unmap the memory when the > >> MemoryRegion is finalized, as memory_region_init_ram() already does. > >> > >> However, after looking into this further, I found one complication: > >> x-vfio-user-server. In that case, the unmap timing is controlled by > >> libvfio-user, so QEMU cannot keep the memory mapped until the > >> MemoryRegion reference is dropped. This is not a new issue; it is a > >> pre-existing use-after-free hazard. I think a practical fix would be to > >> extend the libvfio-user API to support asynchronous unmapping. > >> > >> Cc'ing the vfio-user maintainers. > > > > VFIO_USER_DMA_UNMAP is already asynchronous, it sends the response > only after it's done unmapping. What would that asynchronous unmapping > API look like? VFIO_USER_DMA_UNMAP would send a respond that it > received the operation and started unmapping, and then another one to say > it's done unmapping? > > The problem is of libvfio-user's API, not the vfio-user protocol. > vfu_setup_device_dma() takes a DMA region unregistration callback, which > is expected to synchronously unmap the region from the guest, and > x-vfio-user-server attempts to do so by calling object_unparent().
I misunderstood. libvfio-user provides an optional mechanism where the device is asked to quiesce for certain operations, including VFIO_USER_DMA_UNMAP. Once quiesced, the device must not modify state. If the device cannot quiesce immediately it can continue operating and once ready to quiesce it can call a libvfio-user function to tell the library it's quiesced. Would that solve the problem? > > However, object_unparent() does not actually synchronously unmap the > region from the guest so it results in a use-after-free hazard. A > possible solution is to have the following three changes in combination: > - This series, ensuring the MemoryRegion stays alive > - A change of memory_region_init_ram_ptr() to add a callback for > asynchronous unmapping > - A libvfio-user API change to asynchronously complete unmapping > > > Also, since (AFAIK) x-vfio-user-server is orphaned is it worth putting the > effort in this? > > I rechecked MAINTAINERS and found its status is Odd Fixes, not Orphan. > That said, it is possible to leave a FIXME comment instead of fixing it > right now, as it's already done by virtio-gpu-rutabaga: > > > There is small risk of the MemoryRegion dereferencing the pointer > > after rutabaga unmaps it. Please see discussion here: > > > > https://urldefense.proofpoint.com/v2/url?u=https- > 3A__lists.gnu.org_archive_html_qemu-2Ddevel_2023- > 2D09_msg05141.html&d=DwIDaQ&c=s883GpUCOChKOHiocYtGcg&r=XTpYsh > 5Ps2zJvtw6ogtti46atk736SI4vgsJiUKIyDE&m=t9M_TxgBz_6K8y6rUD0ypaL_El > PVMTV5GPtiCVHrHb9BMKzTkgWiOKmcvA812- > 5H&s=4N2t0WV5D358WZZsjHWVvc6CwQrYydmofzhTAlZ4eVM&e= > > > > It is highly unlikely to happen in practice and doesn't affect known > > use cases. However, it should be fixed and is noted here for > > posterity. > > A difference from virtio-gpu-rutabaga is that the Rutabaga library API > does not interfere with asynchronous unmapping so it can be fixed > entirely on the QEMU side. On contrary, anyone who would want to fix > x-vfio-user-server would need to change the libvfio-user API because its > current form is too restrictive. > > Regards, > Akihiko Odaki
