On Fri, Oct 03, 2025 at 11:01:38PM +0900, Akihiko Odaki wrote:
> On 2025/10/03 4:40, Peter Xu wrote:
> > On Thu, Oct 02, 2025 at 03:23:10PM -0400, Peter Xu wrote:
> > > On Wed, Sep 17, 2025 at 07:32:48PM +0900, Akihiko Odaki wrote:
> > > > +static int del_memory_region(Object *child, void *opaque)
> > > > +{
> > > > + MemoryRegion *mr = (MemoryRegion *)object_dynamic_cast(child,
> > > > TYPE_MEMORY_REGION);
> > > > +
> > > > + if (mr && mr->container) {
> > > > + memory_region_del_subregion(mr->container, mr);
> > > > + }
> > > > +
> > > > + return 0;
> > > > +}
> > > > +
> > > > static void device_set_realized(Object *obj, bool value, Error **errp)
> > > > {
> > > > DeviceState *dev = DEVICE(obj);
> > > > @@ -582,6 +593,7 @@ static void device_set_realized(Object *obj, bool
> > > > value, Error **errp)
> > > > if (dc->unrealize) {
> > > > dc->unrealize(dev);
> > > > }
> > > > + object_child_foreach(OBJECT(dev), del_memory_region, NULL);
> > >
> > > PS: I'll keep throwing some pure questions here, again, Paolo - it doesn't
> > > need to block merging if you're confident with the general approach.
> > >
> > > Said that, a few things I still want to mention after I read this series..
> > >
> > > One thing I really feel hard to review such work is, you hardly describe
> > > the problems the series is resolving.
> > >
> > > For example, this patch proposed auto-detach MRs in unrealize() for qdev,
> > > however there's nowhere describing "what will start to work, while it
> > > doesn't", "how bad is the problem", etc.. All the rest patches are about
> > > "what we can avoid do" after this patch.
> >
> > For this part, I should be more clear on what I'm requesting on the
> > answers.
> >
> > I think I get the whole point that MRs (while still with MR refcount
> > piggypacked, as of current QEMU master does) can circular reference itself
> > if not always detached properly, so explicitly my question is about:
> >
> > - What devices / use case you encountered, that QEMU has such issue?
> > Especially, this is about after we have merged commit ac7a892fd3 "memory:
> > Fix leaks due to owner-shared MRs circular references". Asking because I
> > believe most of them should already auto-detach when owner is shared.
> >
> > - From above list of broken devices, are there any devices that are
> > hot-unpluggable (aka, high priority)? Is it a problem if we do not
> > finalize a MR if it is never removable anyway?
[1]
> >
> > >
> > > Meanwhile, the cover letter is misleading. It is:
> > >
> > > [PATCH 00/14] Fix memory region use-after-finalization
> > >
> > > I believe it's simply wrong, because the whole series is not about
> > > finalize() but unrealize(). For Device class, it also includes the exit()
> > > which will be invoked in pci_qdev_unrealize(), but that is also part of
> > > the
> > > unrealize() routine, not finalize().
>
> The subject of the cover letter "fix memory region use-after-finalization"
> is confusing. While this series has such fixes, it also contain refactoring
> patches. The cover letter says:
>
> > Patch "qdev: Automatically delete memory subregions" and the
> > succeeding patches are for refactoring, but patch "vfio-user: Do not
> > delete the subregion" does fix use-after-finalization.
>
> More concretely, patch "qdev: Automatically delete memory subregions"
> implements a common pattern of device unrealization, and the suceeding
> patches remove ad-hoc implementations of it.
>
> And since patch "hw/pci-bridge: Do not assume immediate MemoryRegion
> finalization" fixes nothing as you pointed out, only patch "vfio-user: Do
> not delete the subregion" fixes something.
>
> Without patch "vfio-user: Do not delete the subregion",
> vfio_user_msix_teardown() calls memory_region_del_subregion(). However, this
> function is called from instance_finalize, so the subregion is already
> finalized and results in a use-after-finalization scenario.
>
> Anything else is for refactoring and it's quite unlike patch "memory: Fix
> leaks due to owner-shared MRs circular references", which is a bug fix.
>
> I think I'll drop patch "hw/pci-bridge: Do not assume immediate MemoryRegion
> finalization" and rename this series simply to "qdev: Automatically delete
> memory subregions" to avoid confusion.
Yes, thanks. I went over quite a few follow up patches but I missed this
one. IMHO you can also split the only fix out, so that can be better
looked at by vfio-user developers. It'll also be easier for them to verify
if they want.
>
> > >
> > > The other question is, what if a MR has a owner that is not the device
> > > itself? There's no place enforcing this, hence a qdev can logically have
> > > some sub-objects (which may not really be qdev) that can be the owner of
> > > the memory regions. Then the device emulation will found that some MRs
> > > are
> > > auto-detached and some are not.
> > >
> > > One example that I'm aware of is this:
> > >
> > > https://lore.kernel.org/all/[email protected]/#t
> > >
> > > TYPE_VIRTIO_SHARED_MEMORY_MAPPING is an object, not qdev here, which can
> > > be
> > > the owner of the MR.
>
> Patch "qdev: Automatically delete memory subregions" and the succeeding
> patches are for refactoring of qdev. MRs not directly owned by qdev are out
> of scope of the change.
Do you have a rough answer of above question [1], on what might suffer from
lost MRs? I sincerely want to know how much we are missing after we could
already auto-detach owner-shared MRs.
>From a quick glance, at least patch 4-14 are detaching MRs that shares the
same owner. IIUC, it means at least patch 4-14 do not rely on patch 2.
Then I wonder how much patch 2 helps in real life.
There's indeed a difference though when a qdev may realize(), unrealize()
and realize() in a sequence, in which case patch 2 could help whil commit
ac7a892fd3 won't, however I don't know whether there's real use case,
either.
I also wished if there's such device, it'll have explicit detach code so
that when I debug a problem on the device I can easily see when the MRs
will be available, instead of remembering all the rules when something in
some layer will auto-detach...
Personally I think such automation adds burden to developers' mind if
there're still a bunch of MRs that are not used in this way (there
definitely are, otherwise we should be able to completely unexport
memory_region_del_subregion). But that's subjective; I believe at least
Paolo agrees with your general approach.
Thanks,
--
Peter Xu