On 6/29/23 13:45, Joao Martins wrote:
On 29/06/2023 09:40, Zhenzhong Duan wrote:
When vfio_realize() succeeds, hot unplug will call vfio_exitfn()
to free resources allocated in vfio_realize(); when vfio_realize()
fails, vfio_exitfn() is never called and we need to free resources
in vfio_realize().

In the case that vfio_migration_realize() fails,
e.g: with -only-migratable & enable-migration=off, we see below:

(qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,enable-migration=off
0000:81:11.1: Migration disabled
Error: disallowing migration blocker (--only-migratable) for: 0000:81:11.1: 
Migration is disabled for VFIO device

If we hotplug again we should see same log as above, but we see:
(qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,enable-migration=off
Error: vfio 0000:81:11.1: device is already attached

That's because some references to VFIO device isn't released,
we should check return value of vfio_migration_realize() and
release the references, then VFIO device will be truely
released when hotplug fails.

Fixes: a22651053b59 ("vfio: Make vfio-pci device migration capable")
Signed-off-by: Zhenzhong Duan <zhenzhong.d...@intel.com>
---
  hw/vfio/pci.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 54a8179d1c64..dc69d3031b24 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3210,6 +3210,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
          ret = vfio_migration_realize(vbasedev, errp);
          if (ret) {
              error_report("%s: Migration disabled", vbasedev->name);
+            goto out_vfio_migration;
          }
      }
@@ -3219,6 +3220,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp) return; +out_vfio_migration:
+    vfio_migration_exit(vbasedev);
  out_deregister:
      vfio_disable_interrupts(vdev);
  out_intx_disable:

I agree with the general sentiment behind the change.
Clearly vfio::migration and vfio::migration_blocker are leaking from inside the
migration_realize() function.

But it is kinda awkward semantic that vfio_migration_realize() (or any realize)
failures need to be accompanied with a vfio_migration_exit() that tears down
state *leaked* by its realize() failure.

It sounds to me that this should be inside the vfio_migration_realize() not on
the caller? Unless QEMU ::realize() is expected to do this.


I agree. vfio_migration_realize() should handle the cleanup of the resources
it allocated if there is a failure.

Thanks

C.


Reply via email to