> -----Original Message----- > From: Akihiko Odaki <[email protected]> > Sent: 22 June 2026 15:37 > To: Paolo Bonzini <[email protected]>; BALATON Zoltan > <[email protected]> > Cc: Daniel P. Berrangé <[email protected]>; [email protected]; > Philippe Mathieu-Daudé <[email protected]>; Pierrick Bouvier > <[email protected]>; Peter Xu <[email protected]>; > Hervé Poussineau <[email protected]>; Alex Bennée > <[email protected]>; Michael S. Tsirkin <[email protected]>; Aurelien > Jarno <[email protected]>; Fabiano Rosas <[email protected]>; Mark Cave- > Ayland <[email protected]>; Marc-André Lureau > <[email protected]>; John Levon <[email protected]>; > Thanos Makatos <[email protected]>; Cédric Le Goater > <[email protected]> > Subject: Re: [RFC 4/7] system: add memory_region_new / > memory_region_new_io > > !-------------------------------------------------------------------| > CAUTION: External Email > > |-------------------------------------------------------------------! > > On 2026/06/20 1:32, Paolo Bonzini wrote: > > On 6/18/26 00:44, BALATON Zoltan wrote: > >> On Wed, 17 Jun 2026, Paolo Bonzini wrote: > >>> On 6/16/26 17:55, Daniel P. Berrangé wrote: > >>>> Prepare for the move to dynamically allocated memory regions by > >>>> introducing memory_region_new and memory_region_new_io functions > >>>> which call through to object_new instead of object_initialize. > >>> > >>> MemoryRegionOps callback will certainly access fields in the parent. > >>> Having a separate object gives you no benefit because you still have > >>> the same use-after-free if the MemoryRegion outlives the device. > >>> Likewise for buses. > >>> > >>> If the device itself is embedded, then all the invariants must hold > >>> recursively: > >>> > >>> - the device must be kept alive by mutual references between the > >>> device itself and its parent; > >>> > >>> - unparent for the device should wait for all guest memory accesses > >>> to be either completed or cancelled. > >>> > >>> So, the problem is that "QEMU automatically references the owning > >>> Device to keep the MemoryRegion alive" is a hack that should die. > >>> [...] I disagree that removing embedding will fix any bugs > >> > >> I'm trying to understand this better. The way it originally meant to > >> work according to the memory region docs and implementation is that > >> memory region is added to an owner as a child which takes a reference > >> and when the owner is freed it will unparent its children which will > >> then get unrefed and freed as well if nothing else increased their ref > >> count. > > > > Yes, and that's broken. > > > >> The problem is that when something accesses the memory region it has > >> to keep the owner alive as well so instead of referencing the memory > >> region it references the owner (not directly but using > >> memory_region_ref which does this). > > > > A mechanism to keep the owner alive exists, and it is mutual references > > (so that neither owner nor MR die) + unparent of owner (which should > > wait for pending users to complete before returning). > > > > The fact that this mechanism is not used, i.e. unparent of a PCIDevice > > does not wait for pending BAR accesses to complete, is a bug. > > > >> So embedding seems unrelated to this passing refs on > >> memory regions to owner and could be fixed separately that my patches > >> attempted. > > > > Your patches add the possibility of not embedding; I don't like having > > two separate API, but I think we only have a disagreement on aesthetics. > > > > This is different for Daniel's patches, which try to fix something by > > not embedding, and I don't think they do. Currently, if the owner dies, > > the MemoryRegion dies with it while it shouldn't (which is a problem, > > absolutely). But these patches, while preventing the MR from dying, > > still leave around a dangling pointer to the owner from the MR, so > > there's not much gained if anything. > > I think we can stack follow-up patches that reject accesses to > MemoryRegions owned by unrealized devices. That would make this a more > complete solution. Accessing an unrealized device is an unsafe corner > case anyway, so it is better to close it explicitly. > > memory_region_init_ram_ptr() and memory_region_init_ram_device_ptr() > are > a little more complicated, because direct accesses to those regions can > happen outside the BQL. Still, we can unmap the memory when the > MemoryRegion is finalized, as memory_region_init_ram() already does. > > However, after looking into this further, I found one complication: > x-vfio-user-server. In that case, the unmap timing is controlled by > libvfio-user, so QEMU cannot keep the memory mapped until the > MemoryRegion reference is dropped. This is not a new issue; it is a > pre-existing use-after-free hazard. I think a practical fix would be to > extend the libvfio-user API to support asynchronous unmapping. > > Cc'ing the vfio-user maintainers.
VFIO_USER_DMA_UNMAP is already asynchronous, it sends the response only after it's done unmapping. What would that asynchronous unmapping API look like? VFIO_USER_DMA_UNMAP would send a respond that it received the operation and started unmapping, and then another one to say it's done unmapping? Also, since (AFAIK) x-vfio-user-server is orphaned is it worth putting the effort in this? > > Regards, > Akihiko Odaki
