On Tue, Mar 01, 2022 at 02:07:08PM +0100, Klaus Jensen wrote:
> On Feb 17 18:45, Lukasz Maniak wrote:
> > From: Łukasz Gieryk <lukasz.gie...@linux.intel.com>
> >
> > With the new command one can:
> >  - assign flexible resources (queues, interrupts) to primary and
> >    secondary controllers,
> >  - toggle the online/offline state of given controller.
> >
>
> QEMU segfaults (or asserts depending on the wind blowing) if the SR-IOV
> enabled device is hotplugged after being configured (i.e. follow the
> docs for a simple setup and then do a `device_del <nvme-device>` in the
> monitor. I suspect this is related to freeing the queues and something
> getting double-freed.
>

I’ve finally found some time to look at the issue.

Long story short: the hot-plug mechanism deletes all VFs without the PF
knowing, then the PF tries to reset and delete devices that no longer
exist.

I have a solution for the problem, but there’s a high chance it’s not the
correct one. I’m still reading through the specs, as my knowledge in the
area of hot-plug/ACPI is quite limited.

Soon we will release the next patch set, with the fix included. I hope
the ACPI maintainers will chime in then. Until that happens, this is the
summary of my findings:

1) The current SR-IOV implementation assumes it’s the PF that creates and
   deletes VFs.
2) It’s a design decision (for the NVMe device at least) for the VFs to
   be of the same class as the PF. Effectively, they share the
   dc->hotpluggable value.
3) When a VF is created, it’s added as a child node to the PF’s PCI bus
   slot.
4) Monitor/device_del triggers the ACPI mechanism. The implementation is
   not aware of SR-IOV and ejects the PF’s PCI slot, directly unrealizing
   all hot-pluggable (!acpi_pcihp_pc_no_hotplug) child nodes.
5) VFs are unrealized directly, which doesn’t work well with (1). The
   SR-IOV structures are not updated, so when it’s the PF’s turn to be
   unrealized, it works on stale pointers to already-deleted VFs.

My proposed ‘fix’ is to make the PCI ACPI code aware of SR-IOV:

diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
index f4d706e47d..090bdb8e74 100644
--- a/hw/acpi/pcihp.c
+++ b/hw/acpi/pcihp.c
@@ -196,8 +196,12 @@ static bool acpi_pcihp_pc_no_hotplug(AcpiPciHpState *s, PCIDevice *dev)
      * ACPI doesn't allow hotplug of bridge devices.  Don't allow
      * hot-unplug of bridge devices unless they were added by hotplug
      * (and so, not described by acpi).
+     *
+     * Don't allow hot-unplug of SR-IOV Virtual Functions, as they
+     * will be removed implicitly, when Physical Function is unplugged.
      */
-    return (pc->is_bridge && !dev->qdev.hotplugged) || !dc->hotpluggable;
+    return (pc->is_bridge && !dev->qdev.hotplugged) || !dc->hotpluggable ||
+        pci_is_vf(dev);
 }
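
To make findings (4) and (5) easier to follow, here is a small standalone
model of the eject path. It is only a sketch, not QEMU code: ToyDev,
toy_no_hotplug(), toy_eject_slot() and toy_make_slot() are hypothetical
names, every non-VF function is treated as the PF, and the slot order (VFs
visited before the PF) is an assumption chosen to match the failure
described above. The sriov_aware flag stands in for the pci_is_vf() check
added by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for one PCI function in a slot; NOT QEMU's PCIDevice. */
typedef struct ToyDev {
    bool is_bridge;
    bool hotplugged;
    bool hotpluggable;
    bool is_vf;      /* true for SR-IOV Virtual Functions */
    bool realized;
} ToyDev;

/*
 * Same shape as the predicate in the patch. sriov_aware == false models
 * the old behaviour, sriov_aware == true models the added VF check.
 */
static bool toy_no_hotplug(const ToyDev *d, bool sriov_aware)
{
    return (d->is_bridge && !d->hotplugged) || !d->hotpluggable ||
           (sriov_aware && d->is_vf);
}

/*
 * Eject all functions in a slot. A VF is unrealized directly; the PF
 * (any non-VF in this toy) also tears down its VFs. Returns the number
 * of VFs the PF found already unrealized, i.e. the stale references
 * from finding (5).
 */
static size_t toy_eject_slot(ToyDev *slot, size_t n, bool sriov_aware)
{
    size_t stale = 0;

    for (size_t i = 0; i < n; i++) {
        ToyDev *d = &slot[i];

        if (!d->realized || toy_no_hotplug(d, sriov_aware)) {
            continue;
        }
        if (!d->is_vf) {
            /* PF teardown: the PF expects to own the VF lifecycle. */
            for (size_t j = 0; j < n; j++) {
                if (!slot[j].is_vf) {
                    continue;
                }
                if (slot[j].realized) {
                    slot[j].realized = false;
                } else {
                    stale++; /* VF was removed behind the PF's back */
                }
            }
        }
        d->realized = false;
    }
    return stale;
}

/* Fill a 3-entry slot: two VFs followed by their PF (assumed order). */
static void toy_make_slot(ToyDev slot[3])
{
    for (size_t i = 0; i < 3; i++) {
        slot[i] = (ToyDev){ .hotpluggable = true, .realized = true };
        slot[i].is_vf = (i < 2);
    }
}
```

Building a slot with toy_make_slot() and calling toy_eject_slot() with
sriov_aware set to false should report both VFs as stale; with it set to
true, the VFs are skipped by the eject loop and removed by the PF instead,
so no stale reference is counted.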