On Wed, Mar 09, 2022 at 01:41:27PM +0100, Łukasz Gieryk wrote:
> On Tue, Mar 01, 2022 at 02:07:08PM +0100, Klaus Jensen wrote:
> > On Feb 17 18:45, Lukasz Maniak wrote:
> > > From: Łukasz Gieryk <lukasz.gie...@linux.intel.com>
> > >
> > > With the new command one can:
> > >  - assign flexible resources (queues, interrupts) to primary and
> > >    secondary controllers,
> > >  - toggle the online/offline state of a given controller.
> > >
> >
> > QEMU segfaults (or asserts depending on the wind blowing) if the SR-IOV
> > enabled device is hot-unplugged after being configured (i.e. follow the
> > docs for a simple setup and then do a `device_del <nvme-device>` in the
> > monitor). I suspect this is related to freeing the queues and something
> > getting double-freed.
> >
>
> I’ve finally found some time to look at the issue.
>
> Long story short: the hot-plug mechanism deletes all VFs without the PF
> knowing, then the PF tries to reset and delete all the already
> non-existent devices.
>
> I have a solution for the problem, but there’s a high chance it’s not
> the correct one. I’m still reading through the specs, as my knowledge in
> the area of hot-plug/ACPI is quite limited.
>
> Soon we will release the next patch set, with the fix included. I hope
> the ACPI maintainers will chime in then. Till that happens, this is the
> summary of my findings:
>
> 1) The current SR-IOV implementation assumes it’s the PF that creates
>    and deletes VFs.
> 2) It’s a design decision (for the NVMe device at least) for the VFs to
>    be of the same class as the PF. Effectively, they share the
>    dc->hotpluggable value.
> 3) When a VF is created, it’s added as a child node to the PF’s PCI bus
>    slot.
> 4) Monitor/device_del triggers the ACPI mechanism. The implementation is
>    not aware of SR-IOV and ejects the PF’s PCI slot, directly unrealizing
>    all hot-pluggable (!acpi_pcihp_pc_no_hotplug) child nodes.
> 5) VFs are unrealized directly, and it doesn’t work well with (1).
>    SR-IOV structures are not updated, so when it’s the PF’s turn to be
>    unrealized, it works on stale pointers to already-deleted VFs.
>
> My proposed ‘fix’ is to make the PCI ACPI code aware of SR-IOV:
>

CC'ing ACPI/SMBIOS maintainers/reviewers on the proposed fix.

>
> diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
> index f4d706e47d..090bdb8e74 100644
> --- a/hw/acpi/pcihp.c
> +++ b/hw/acpi/pcihp.c
> @@ -196,8 +196,12 @@ static bool acpi_pcihp_pc_no_hotplug(AcpiPciHpState *s, PCIDevice *dev)
>       * ACPI doesn't allow hotplug of bridge devices.  Don't allow
>       * hot-unplug of bridge devices unless they were added by hotplug
>       * (and so, not described by acpi).
> +     *
> +     * Don't allow hot-unplug of SR-IOV Virtual Functions, as they
> +     * will be removed implicitly, when Physical Function is unplugged.
>       */
> -    return (pc->is_bridge && !dev->qdev.hotplugged) || !dc->hotpluggable;
> +    return (pc->is_bridge && !dev->qdev.hotplugged) || !dc->hotpluggable ||
> +           pci_is_vf(dev);
>  }
>
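
For anyone not familiar with this code path, the failure described in
points (1), (4) and (5) and the effect of the proposed check can be
illustrated with a small toy model. This is only a sketch and not QEMU
code: ToyDev and every toy_* function are hypothetical names made up for
the illustration, bridges are left out, and a per-device teardown counter
stands in for the real double free.

```c
/* Toy model only: ToyDev and all toy_* helpers are hypothetical, not QEMU APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_VFS 4

typedef struct ToyDev {
    bool is_vf;                      /* stands in for pci_is_vf(dev) */
    bool hotpluggable;               /* stands in for dc->hotpluggable */
    int unrealize_count;             /* number of times the device was torn down */
    struct ToyDev *vfs[TOY_MAX_VFS]; /* PF only: the VFs it believes it owns */
    size_t num_vfs;
} ToyDev;

/* Tear down one device. A PF also tears down the VFs it tracks, as in (1). */
static void toy_unrealize(ToyDev *dev)
{
    dev->unrealize_count++;
    for (size_t i = 0; i < dev->num_vfs; i++) {
        toy_unrealize(dev->vfs[i]);
    }
    dev->num_vfs = 0;
}

/* Same shape as the patched predicate, minus the bridge case. */
static bool toy_no_hotplug(const ToyDev *dev, bool sriov_aware)
{
    return !dev->hotpluggable || (sriov_aware && dev->is_vf);
}

/* Eject a slot: unrealize every child that the predicate does not protect. */
static void toy_eject_slot(ToyDev **children, size_t n, bool sriov_aware)
{
    for (size_t i = 0; i < n; i++) {
        if (!toy_no_hotplug(children[i], sriov_aware)) {
            toy_unrealize(children[i]);
        }
    }
}

/* Build one PF with two VFs, eject the slot, return the highest teardown count. */
static int toy_max_unrealize_count(bool sriov_aware)
{
    ToyDev vf0 = { .is_vf = true, .hotpluggable = true };
    ToyDev vf1 = { .is_vf = true, .hotpluggable = true };
    ToyDev pf = { .hotpluggable = true, .vfs = { &vf0, &vf1 }, .num_vfs = 2 };
    /* VFs are visited before the PF, matching the order described in (5). */
    ToyDev *slot[] = { &vf0, &vf1, &pf };
    int max = 0;

    toy_eject_slot(slot, 3, sriov_aware);
    for (size_t i = 0; i < 3; i++) {
        if (slot[i]->unrealize_count > max) {
            max = slot[i]->unrealize_count;
        }
    }
    return max;
}
```

Calling toy_max_unrealize_count(false) models the current behaviour: each
VF is torn down once by the slot eject and a second time by the PF. With
true, the eject skips the VFs and only the PF removes them, so every
device is torn down exactly once.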