On 07/05/2018 09:11 PM, Michael S. Tsirkin wrote:
> On Thu, Jul 05, 2018 at 08:11:48PM +0200, Cédric Le Goater wrote:
>> PCI devices needing a ROM allocate an optional MemoryRegion with
>> pci_add_option_rom(). pci_del_option_rom() does the cleanup when the
>> device is destroyed. The only action taken by this routine is to call
>> vmstate_unregister_ram() which clears the id string of the optional
>> ROM RAMBlock and now, also flags the RAMBlock as non-migratable. This
>> was recently added by commit b895de502717 ("migration: discard
>> non-migratable RAMBlocks").
>>
>> VFIO devices do their own allocation of the PCI ROM region. It is
>> initialized in vfio_pci_size_rom() in which the PCI attribute
>> 'has_rom' is set to true but the RAMBlock of the ROM region is not
>> allocated. When the associated PCI device is deleted,
>> pci_del_option_rom() calls vmstate_unregister_ram() which tries to
>> flag a NULL RAMBlock because 'has_rom' is set, leading to a SEGV.
>>
>> The use of vmstate_unregister_ram() in the PCI device was added in
>> commit b0e56e0b63f350691b52d3e75e89bb64143fbeff ("unset RAMBlock idstr
>> when unregister MemoryRegion")
>
> I don't see it in that commit. I think it was part of the original
> split by Avi.

ok. That was pointed out to me by Paolo. It is hard to track all the changes.

>> and from the archive in
>> http://lists.gnu.org/archive/html/qemu-devel/2014-04/msg00282.html, it
>> seems that it was trying to fix a reference count issue.
>>
>> vmstate_unregister_ram() being a work around, let's remove it to fix
>> the current SEGV issue
>> and let's try to find a fix for the initial ref
>> count issue if we can reproduce.
>>
>> Signed-off-by: Cédric Le Goater <c...@kaod.org>
>
> What kind of testing did you do on this patch? Could you include
> that info in the commit log pls?
>
> I think you need to at least add/remove some devices, then migrate.

ok, for next round. I plugged/unplugged a:

  0034:01:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

and then migrated. Here is the original segv backtrace:

I caught this bug while deleting a passthrough device from a pseries
machine. Here is the stack:

#0  qemu_ram_unset_migratable (rb=0x0) at /home/legoater/work/qemu/qemu-xive-3.0.git/exec.c:1994
#1  0x000000010072def0 in vmstate_unregister_ram (mr=0x101796af0, dev=<optimized out>)
#2  0x0000000100694e5c in pci_del_option_rom (pdev=0x101796330)
#3  pci_qdev_unrealize (dev=<optimized out>, errp=<optimized out>)
#4  0x00000001005ff910 in device_set_realized (obj=0x101796330, value=<optimized out>, errp=0x0)
#5  0x00000001007a487c in property_set_bool (obj=0x101796330, v=<optimized out>, name=<optimized out>,
#6  0x00000001007a7878 in object_property_set (obj=0x101796330, v=0x7fff70033110,
#7  0x00000001007aaf1c in object_property_set_qobject (obj=0x101796330, value=<optimized out>,
#8  0x00000001007a7b90 in object_property_set_bool (obj=0x101796330, value=<optimized out>,
#9  0x00000001005fcdd8 in device_unparent (obj=0x101796330)
#10 0x00000001007a6dd0 in object_finalize_child_property (obj=<optimized out>, name=<optimized out>,
#11 0x00000001007a50c0 in object_property_del_child (obj=0x10111f800, child=0x101796330,
#12 0x0000000100425cc0 in spapr_phb_remove_pci_device_cb (dev=0x101796330)
#13 0x0000000100427974 in spapr_drc_release (drc=0x1017e2df0)
#14 0x0000000100429098 in spapr_drc_detach (drc=0x1017e2df0)
#15 0x00000001004294e0 in drc_isolate_physical (drc=0x1017e2df0)
#16 0x000000010042a50c in rtas_set_isolation_state (state=0, idx=<optimized out>)

C.

>
>> ---
>>  hw/pci/pci.c | 11 -----------
>>  1 file changed, 11 deletions(-)
>>
>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>> index 80bc45930dee..78bf74e19f22 100644
>> --- a/hw/pci/pci.c
>> +++ b/hw/pci/pci.c
>> @@ -191,7 +191,6 @@ static PCIBus *pci_find_bus_nr(PCIBus *bus, int bus_num);
>>  static void pci_update_mappings(PCIDevice *d);
>>  static void pci_irq_handler(void *opaque, int irq_num, int level);
>>  static void pci_add_option_rom(PCIDevice *pdev, bool is_default_rom, Error **);
>> -static void pci_del_option_rom(PCIDevice *pdev);
>>
>>  static uint16_t pci_default_sub_vendor_id = PCI_SUBVENDOR_ID_REDHAT_QUMRANET;
>>  static uint16_t pci_default_sub_device_id = PCI_SUBDEVICE_ID_QEMU;
>> @@ -1096,7 +1095,6 @@ static void pci_qdev_unrealize(DeviceState *dev, Error **errp)
>>      PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(pci_dev);
>>
>>      pci_unregister_io_regions(pci_dev);
>> -    pci_del_option_rom(pci_dev);
>>
>>      if (pc->exit) {
>>          pc->exit(pci_dev);
>> @@ -2262,15 +2260,6 @@ static void pci_add_option_rom(PCIDevice *pdev, bool is_default_rom,
>>      pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom);
>>  }
>>
>> -static void pci_del_option_rom(PCIDevice *pdev)
>> -{
>> -    if (!pdev->has_rom)
>> -        return;
>> -
>> -    vmstate_unregister_ram(&pdev->rom, &pdev->qdev);
>> -    pdev->has_rom = false;
>> -}
>> -
>>  /*
>>   * On success, pci_add_capability() returns a positive value
>>   * that the offset of the pci capability.
>> --
>> 2.13.6
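
For anyone following the thread without a QEMU tree at hand, the failure mode from the commit message can be modelled in a few lines of standalone C. This is only a sketch under loud assumptions: the Mock* types, RomResult and mock_del_option_rom() are hypothetical stand-ins invented for illustration, not QEMU's MemoryRegion/RAMBlock API, and the model reports the NULL case instead of dereferencing it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; NOT the real QEMU types. */
typedef struct MockRAMBlock {
    char idstr[32];     /* id string cleared on unregister */
    bool migratable;    /* flag cleared on unregister */
} MockRAMBlock;

typedef struct MockMemoryRegion {
    MockRAMBlock *ram_block;    /* NULL when no RAM was ever allocated */
} MockMemoryRegion;

typedef struct MockPCIDevice {
    bool has_rom;
    MockMemoryRegion rom;
} MockPCIDevice;

typedef enum {
    ROM_NOTHING_TO_DO,  /* has_rom was not set */
    ROM_UNREGISTERED,   /* block existed and was cleaned up */
    ROM_WOULD_CRASH     /* has_rom set but no block: the SEGV case */
} RomResult;

/*
 * Models the cleanup described in the thread: it trusts has_rom alone
 * to decide whether there is a RAM block to touch.
 */
static RomResult mock_del_option_rom(MockPCIDevice *dev)
{
    assert(dev != NULL);

    if (!dev->has_rom) {
        return ROM_NOTHING_TO_DO;
    }
    if (dev->rom.ram_block == NULL) {
        /* The code in the backtrace dereferences the block at this
         * point; the model reports the condition instead. */
        return ROM_WOULD_CRASH;
    }
    dev->rom.ram_block->idstr[0] = '\0';
    dev->rom.ram_block->migratable = false;
    dev->has_rom = false;
    return ROM_UNREGISTERED;
}
```

A device whose ROM block was really allocated yields ROM_UNREGISTERED, while a device that only sets has_rom (the VFIO situation described above) yields ROM_WOULD_CRASH, which corresponds to frame #0 of the backtrace with rb=0x0.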