On 07/05/2018 09:11 PM, Michael S. Tsirkin wrote:
> On Thu, Jul 05, 2018 at 08:11:48PM +0200, Cédric Le Goater wrote:
>> PCI devices needing a ROM allocate an optional MemoryRegion with
>> pci_add_option_rom(). pci_del_option_rom() does the cleanup when the
>> device is destroyed. The only action taken by this routine is to call
>> vmstate_unregister_ram() which clears the id string of the optional
>> ROM RAMBlock and now, also flags the RAMBlock as non-migratable. This
>> was recently added by commit b895de502717 ("migration: discard
>> non-migratable RAMBlocks"), .
>>
>> VFIO devices do their own allocation of the PCI ROM region. It is
>> initialized in vfio_pci_size_rom() in which the PCI attribute
>> 'has_rom' is set to true but the RAMBlock of the ROM region is not
>> allocated. When the associated PCI device is deleted,
>> pci_del_option_rom() calls vmstate_unregister_ram() which tries to
>> flag a NULL RAMBlock because 'has_rom' is set, leading to a SEGV .
>>
>> The use of vmstate_unregister_ram() in the PCI device was added in
>> commit b0e56e0b63f350691b52d3e75e89bb64143fbeff ("unset RAMBlock idstr
>> when unregister MemoryRegion")
> 
> I don't see it in that commit. I think it was part of the original
> split by Avi.

ok. That was pointed out to me by Paolo. It is hard to track all
the changes.

>> and from the archive in
>> http://lists.gnu.org/archive/html/qemu-devel/2014-04/msg00282.html, it
>> seems that it was trying to fix a reference count issue.
>>
>> vmstate_unregister_ram() being a work around, let's remove it to fix
>> the current SEGV issue
>> and let's try to find a fix for the initial ref
>> count issue if we can reproduce.
>>
>> Signed-off-by: Cédric Le Goater <c...@kaod.org>
> 
> What kind of testing did you do on this patch? Could you include
> that info in the commit log pls?
> 
> I think you need to at least add/remove some devices, then migrate.

ok, for next round. I plugged/unplugged a :

        0034:01:00.0 Ethernet controller: Mellanox Technologies MT27710 Family 
[ConnectX-4 Lx]

and then migrated.

Here is the original segv backtrace:

 I caught this bug while deleting a passthrough device from a pseries
 machine. Here is the stack:
   
   #0  qemu_ram_unset_migratable (rb=0x0) at 
/home/legoater/work/qemu/qemu-xive-3.0.git/exec.c:1994
   #1  0x000000010072def0 in vmstate_unregister_ram (mr=0x101796af0, 
dev=<optimized out>)
   #2  0x0000000100694e5c in pci_del_option_rom (pdev=0x101796330)
   #3  pci_qdev_unrealize (dev=<optimized out>, errp=<optimized out>)
   #4  0x00000001005ff910 in device_set_realized (obj=0x101796330, 
value=<optimized out>, errp=0x0)
   #5  0x00000001007a487c in property_set_bool (obj=0x101796330, v=<optimized 
out>, name=<optimized out>, 
   #6  0x00000001007a7878 in object_property_set (obj=0x101796330, 
v=0x7fff70033110, 
   #7  0x00000001007aaf1c in object_property_set_qobject (obj=0x101796330, 
value=<optimized out>, 
   #8  0x00000001007a7b90 in object_property_set_bool (obj=0x101796330, 
value=<optimized out>, 
   #9  0x00000001005fcdd8 in device_unparent (obj=0x101796330)
   #10 0x00000001007a6dd0 in object_finalize_child_property (obj=<optimized 
out>, name=<optimized out>, 
   #11 0x00000001007a50c0 in object_property_del_child (obj=0x10111f800, 
child=0x101796330, 
   #12 0x0000000100425cc0 in spapr_phb_remove_pci_device_cb (dev=0x101796330)
   #13 0x0000000100427974 in spapr_drc_release (drc=0x1017e2df0)
   #14 0x0000000100429098 in spapr_drc_detach (drc=0x1017e2df0)
   #15 0x00000001004294e0 in drc_isolate_physical (drc=0x1017e2df0)
   #16 0x000000010042a50c in rtas_set_isolation_state (state=0, idx=<optimized 
out>)

C. 

> 
>> ---
>>  hw/pci/pci.c | 11 -----------
>>  1 file changed, 11 deletions(-)
>>
>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>> index 80bc45930dee..78bf74e19f22 100644
>> --- a/hw/pci/pci.c
>> +++ b/hw/pci/pci.c
>> @@ -191,7 +191,6 @@ static PCIBus *pci_find_bus_nr(PCIBus *bus, int bus_num);
>>  static void pci_update_mappings(PCIDevice *d);
>>  static void pci_irq_handler(void *opaque, int irq_num, int level);
>>  static void pci_add_option_rom(PCIDevice *pdev, bool is_default_rom, Error 
>> **);
>> -static void pci_del_option_rom(PCIDevice *pdev);
>>  
>>  static uint16_t pci_default_sub_vendor_id = 
>> PCI_SUBVENDOR_ID_REDHAT_QUMRANET;
>>  static uint16_t pci_default_sub_device_id = PCI_SUBDEVICE_ID_QEMU;
>> @@ -1096,7 +1095,6 @@ static void pci_qdev_unrealize(DeviceState *dev, Error 
>> **errp)
>>      PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(pci_dev);
>>  
>>      pci_unregister_io_regions(pci_dev);
>> -    pci_del_option_rom(pci_dev);
>>  
>>      if (pc->exit) {
>>          pc->exit(pci_dev);
>> @@ -2262,15 +2260,6 @@ static void pci_add_option_rom(PCIDevice *pdev, bool 
>> is_default_rom,
>>      pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom);
>>  }
>>  
>> -static void pci_del_option_rom(PCIDevice *pdev)
>> -{
>> -    if (!pdev->has_rom)
>> -        return;
>> -
>> -    vmstate_unregister_ram(&pdev->rom, &pdev->qdev);
>> -    pdev->has_rom = false;
>> -}
>> -
>>  /*
>>   * On success, pci_add_capability() returns a positive value
>>   * that the offset of the pci capability.
>> -- 
>> 2.13.6


Reply via email to