On 2026/06/23 6:24, Thanos Makatos wrote:
-----Original Message-----
From: Akihiko Odaki <[email protected]>
Sent: 22 June 2026 15:37
To: Paolo Bonzini <[email protected]>; BALATON Zoltan
<[email protected]>
Cc: Daniel P. Berrangé <[email protected]>; [email protected];
Philippe Mathieu-Daudé <[email protected]>; Pierrick Bouvier
<[email protected]>; Peter Xu <[email protected]>;
Hervé Poussineau <[email protected]>; Alex Bennée
<[email protected]>; Michael S. Tsirkin <[email protected]>; Aurelien
Jarno <[email protected]>; Fabiano Rosas <[email protected]>; Mark Cave-
Ayland <[email protected]>; Marc-André Lureau
<[email protected]>; John Levon <[email protected]>;
Thanos Makatos <[email protected]>; Cédric Le Goater
<[email protected]>
Subject: Re: [RFC 4/7] system: add memory_region_new /
memory_region_new_io

!-------------------------------------------------------------------|
   CAUTION: External Email

|-------------------------------------------------------------------!

On 2026/06/20 1:32, Paolo Bonzini wrote:
On 6/18/26 00:44, BALATON Zoltan wrote:
On Wed, 17 Jun 2026, Paolo Bonzini wrote:
On 6/16/26 17:55, Daniel P. Berrangé wrote:
Prepare for the move to dynamically allocated memory regions by
introducing memory_region_new and memory_region_new_io functions
which call through to object_new instead of object_initialize.

MemoryRegionOps callback will certainly access fields in the parent.
Having a separate object gives you no benefit because you still have
the same use-after-free if the MemoryRegion outlives the device.
Likewise for buses.

If the device itself is embedded, then all the invariants must hold
recursively:

- the device must be kept alive by mutual references between the
device itself and its parent;

- unparent for the device should wait for all guest memory accesses
to be either completed or cancelled.

So, the problem is that "QEMU automatically references the owning
Device to keep the MemoryRegion alive" is a hack that should die.
[...] I disagree that removing embedding will fix any bugs

I'm trying to understand this better. The way it originally meant to
work according to the memory region docs and implementation is that
memory region is added to an owner as a child which takes a reference
and when the owner is freed it will unparent its children which will
then get unrefed and freed as well if nothing else increased their ref
count.

Yes, and that's broken.

The problem is that when something accesses the memory region it has
to keep the owner alive as well so instead of referencing the memory
region it references the owner (not directly but using
memory_region_ref which does this).

A mechanism to keep the owner alive exists, and it is mutual references
(so that neither owner nor MR die) + unparent of owner (which should
wait for pending users to complete before returning).

The fact that this mechanism is not used, i.e. unparent of a PCIDevice
does not wait for pending BAR accesses to complete, is a bug.

So embedding seems unrelated to this passing refs on
memory regions to owner and could be fixed separately that my patches
attempted.

Your patches add the possibility of not embedding; I don't like having
two separate API, but I think we only have a disagreement on aesthetics.

This is different for Daniel's patches, which try to fix something by
not embedding, and I don't think they do.  Currently, if the owner dies,
the MemoryRegion dies with it while it shouldn't (which is a problem,
absolutely).  But these patches, while preventing the MR from dying,
still leave around a dangling pointer to the owner from the MR, so
there's not much gained if anything.

I think we can stack follow-up patches that reject accesses to
MemoryRegions owned by unrealized devices. That would make this a more
complete solution. Accessing an unrealized device is an unsafe corner
case anyway, so it is better to close it explicitly.

memory_region_init_ram_ptr() and memory_region_init_ram_device_ptr()
are
a little more complicated, because direct accesses to those regions can
happen outside the BQL. Still, we can unmap the memory when the
MemoryRegion is finalized, as memory_region_init_ram() already does.

However, after looking into this further, I found one complication:
x-vfio-user-server. In that case, the unmap timing is controlled by
libvfio-user, so QEMU cannot keep the memory mapped until the
MemoryRegion reference is dropped. This is not a new issue; it is a
pre-existing use-after-free hazard. I think a practical fix would be to
extend the libvfio-user API to support asynchronous unmapping.

Cc'ing the vfio-user maintainers.

VFIO_USER_DMA_UNMAP is already asynchronous, it sends the response only after 
it's done unmapping. What would that asynchronous unmapping API look like? 
VFIO_USER_DMA_UNMAP would send a respond that it received the operation and 
started unmapping, and then another one to say it's done unmapping?

The problem is of libvfio-user's API, not the vfio-user protocol. vfu_setup_device_dma() takes a DMA region unregistration callback, which is expected to synchronously unmap the region from the guest, and x-vfio-user-server attempts to do so by calling object_unparent().

However, object_unparent() does not actually synchronously unmap the region from the guest so it results in a use-after-free hazard. A possible solution is to have the following three changes in combination:
- This series, ensuring the MemoryRegion stays alive
- A change of memory_region_init_ram_ptr() to add a callback for
  asynchronous unmapping
- A libvfio-user API change to asynchronously complete unmapping

Also, since (AFAIK) x-vfio-user-server is orphaned is it worth putting the 
effort in this?

I rechecked MAINTAINERS and found its status is Odd Fixes, not Orphan. That said, it is possible to leave a FIXME comment instead of fixing it right now, as it's already done by virtio-gpu-rutabaga:

> There is small risk of the MemoryRegion dereferencing the pointer
> after rutabaga unmaps it. Please see discussion here:
>
> https://lists.gnu.org/archive/html/qemu-devel/2023-09/msg05141.html
>
> It is highly unlikely to happen in practice and doesn't affect known
> use cases. However, it should be fixed and is noted here for
> posterity.

A difference from virtio-gpu-rutabaga is that the Rutabaga library API does not interfere with asynchronous unmapping so it can be fixed entirely on the QEMU side. On contrary, anyone who would want to fix x-vfio-user-server would need to change the libvfio-user API because its current form is too restrictive.

Regards,
Akihiko Odaki

Reply via email to