On 2025/10/04 0:26, Peter Xu wrote:
On Fri, Oct 03, 2025 at 11:01:38PM +0900, Akihiko Odaki wrote:
On 2025/10/03 4:40, Peter Xu wrote:
On Thu, Oct 02, 2025 at 03:23:10PM -0400, Peter Xu wrote:
On Wed, Sep 17, 2025 at 07:32:48PM +0900, Akihiko Odaki wrote:
+static int del_memory_region(Object *child, void *opaque)
+{
+    MemoryRegion *mr = (MemoryRegion *)object_dynamic_cast(child, 
TYPE_MEMORY_REGION);
+
+    if (mr && mr->container) {
+        memory_region_del_subregion(mr->container, mr);
+    }
+
+    return 0;
+}
+
   static void device_set_realized(Object *obj, bool value, Error **errp)
   {
       DeviceState *dev = DEVICE(obj);
@@ -582,6 +593,7 @@ static void device_set_realized(Object *obj, bool value, 
Error **errp)
           if (dc->unrealize) {
               dc->unrealize(dev);
           }
+        object_child_foreach(OBJECT(dev), del_memory_region, NULL);

PS: I'll keep throwing some pure questions here, again, Paolo - it doesn't
need to block merging if you're confident with the general approach.

Said that, a few things I still want to mention after I read this series..

One thing I really feel hard to review such work is, you hardly describe
the problems the series is resolving.

For example, this patch proposed auto-detach MRs in unrealize() for qdev,
however there's nowhere describing "what will start to work, while it
doesn't", "how bad is the problem", etc..  All the rest patches are about
"what we can avoid do" after this patch.

For this part, I should be more clear on what I'm requesting on the
answers.

I think I get the whole point that MRs (while still with MR refcount
piggypacked, as of current QEMU master does) can circular reference itself
if not always detached properly, so explicitly my question is about:

- What devices / use case you encountered, that QEMU has such issue?
    Especially, this is about after we have merged commit ac7a892fd3 "memory:
    Fix leaks due to owner-shared MRs circular references".  Asking because I
    believe most of them should already auto-detach when owner is shared.

- From above list of broken devices, are there any devices that are
    hot-unpluggable (aka, high priority)?  Is it a problem if we do not
    finalize a MR if it is never removable anyway?

[1]



Meanwhile, the cover letter is misleading. It is:

[PATCH 00/14] Fix memory region use-after-finalization

I believe it's simply wrong, because the whole series is not about
finalize() but unrealize().  For Device class, it also includes the exit()
which will be invoked in pci_qdev_unrealize(), but that is also part of the
unrealize() routine, not finalize().

The subject of the cover letter "fix memory region use-after-finalization"
is confusing. While this series has such fixes, it also contain refactoring
patches. The cover letter says:

Patch "qdev: Automatically delete memory subregions" and the
succeeding patches are for refactoring, but patch "vfio-user: Do not
delete the subregion" does fix use-after-finalization.

More concretely, patch "qdev: Automatically delete memory subregions"
implements a common pattern of device unrealization, and the suceeding
patches remove ad-hoc implementations of it.

And since patch "hw/pci-bridge: Do not assume immediate MemoryRegion
finalization" fixes nothing as you pointed out, only patch "vfio-user: Do
not delete the subregion" fixes something.

Without patch "vfio-user: Do not delete the subregion",
vfio_user_msix_teardown() calls memory_region_del_subregion(). However, this
function is called from instance_finalize, so the subregion is already
finalized and results in a use-after-finalization scenario.

Anything else is for refactoring and it's quite unlike patch "memory: Fix
leaks due to owner-shared MRs circular references", which is a bug fix.

I think I'll drop patch "hw/pci-bridge: Do not assume immediate MemoryRegion
finalization" and rename this series simply to "qdev: Automatically delete
memory subregions" to avoid confusion.

Yes, thanks.  I went over quite a few follow up patches but I missed this
one.  IMHO you can also split the only fix out, so that can be better
looked at by vfio-user developers.  It'll also be easier for them to verify
if they want.

I'll split the VFIO patch out since I found patch "qdev: Automatically delete memory subregions" is unnecessary; I'll describe that later.




The other question is, what if a MR has a owner that is not the device
itself?  There's no place enforcing this, hence a qdev can logically have
some sub-objects (which may not really be qdev) that can be the owner of
the memory regions.  Then the device emulation will found that some MRs are
auto-detached and some are not.

One example that I'm aware of is this:

https://lore.kernel.org/all/[email protected]/#t

TYPE_VIRTIO_SHARED_MEMORY_MAPPING is an object, not qdev here, which can be
the owner of the MR.

Patch "qdev: Automatically delete memory subregions" and the succeeding
patches are for refactoring of qdev. MRs not directly owned by qdev are out
of scope of the change.

Do you have a rough answer of above question [1], on what might suffer from
lost MRs?  I sincerely want to know how much we are missing after we could
already auto-detach owner-shared MRs.

There is no functionally "broken" device. The patch that does fix something is the VFIO patch, and it fixes use-after-finalization, which is semantically problematic but does not cause a functional problem (like use-after-free).


 From a quick glance, at least patch 4-14 are detaching MRs that shares the
same owner.  IIUC, it means at least patch 4-14 do not rely on patch 2.

If the container and the subregion shares the same owner, memory_region_del_subregion() is unnecessary in the first place and can be removed even without patch 2, and it seems there are a number of such unnecessary memory_region_del_subregion() calls.

That said, there are cases where memory_region_del_subregion() is truly needed. In general, a device exposes its MMIO functionality to the guest by adding its memory region to external containers, so there should be some containers and subregions that do not share memory regions.

Reviewing removed memory_region_del_subregion() calls in each patch, patch 3, 6, 7, 8, 9, 10, 13, and 14 are for memory regions with shared owners (that's why patch 3 can be applied without the qdev patch).
Patch 4, 5, 11 and 12 are for memory regions with different owners.

The point here is that we don't have to care whether memory regions share owners if we delete all subregions for defensive programming.


Then I wonder how much patch 2 helps in real life.

There's indeed a difference though when a qdev may realize(), unrealize()
and realize() in a sequence, in which case patch 2 could help whil commit
ac7a892fd3 won't, however I don't know whether there's real use case,
either.

It is not relevant what order realize() and unrealize() are performed. This patch is to de-duplicate the code to unrealize devices, and commit ac7a892fd3 fixes a bug instead.


I also wished if there's such device, it'll have explicit detach code so
that when I debug a problem on the device I can easily see when the MRs
will be available, instead of remembering all the rules when something in
some layer will auto-detach...

Indeed, people who read the implementation of devices wonder where are memory_region_del_subregion() calls that corresponding to memory_region_add_subregion() calls they see.

That said, anyone who cares device code already needs to know that subregions (that do not share owners) need to be deleted during unrealization, or devices will remain visible to the guest. I see avoiding this consequence is important, and it is just nice to have less code by de-deuplicating the memory_del_subregion() calls.


Personally I think such automation adds burden to developers' mind if
there're still a bunch of MRs that are not used in this way (there
definitely are, otherwise we should be able to completely unexport
memory_region_del_subregion). But that's subjective; I believe at least
Paolo agrees with your general approach.

Indeed, memory_region_del_subregion() is still necessary for memory regions that are not directly owned by devices. This is a trade-off situation so there is no clear answer what's the best.

Regards,
Akihiko Odaki

Reply via email to