> Subject: Re: [Qemu-devel] [PATCH v4 2/3] pcie: add check for ari capability of > pcie devices > > On Fri, 2014-10-03 at 13:22 +0200, Knut Omang wrote: > > On Wed, 2014-10-01 at 17:08 +0300, Marcel Apfelbaum wrote: > > > On Wed, 2014-10-01 at 07:26 +0200, Knut Omang wrote: > > > > On Tue, 2014-09-30 at 21:38 +0800, Gonglei wrote: > > > > > > Subject: Re: [Qemu-devel] [PATCH v4 2/3] pcie: add check for ari > capability of > > > > > > pcie devices > > > > > > > > > > > > On Tue, Sep 30, 2014 at 06:11:25PM +0800, arei.gong...@huawei.com > wrote: > > > > > > > From: Gonglei <arei.gong...@huawei.com> > > > > > > > > > > > > > > In QEMU, ARI Forwarding is enabled default at emulation of PCIe > > > > > > > ports. ARI Forwarding enable setting at firmware/OS Control > handoff. > > > > > > > If the bit is Set when a non-ARI Device is present, the non-ARI > > > > > > > Device can respond to Configuration Space accesses under what it > > > > > > > interprets as being different Device Numbers, and its Functions > > > > > > > can > > > > > > > be aliased under multiple Device Numbers, generally leading to > > > > > > > undesired behavior. > > > > > > > > > > > > So what is the undesired behaviour? > > > > > > Did you observe such? > > > > > > > > > > I just observe the PCI device don't work, and the dmesg show me: > > > > > > > > > > [ 159.035250] pciehp 0000:05:00.0:pcie24: Button pressed on Slot (0 - > > > > > 4) > > > > > [ 159.035274] pciehp 0000:05:00.0:pcie24: Card present on Slot (0 - 4) > > > > > [ 159.036517] pciehp 0000:05:00.0:pcie24: PCI slot #0 - 4 - powering > > > > > on > due to button press. > > > > > [ 159.188049] pciehp 0000:05:00.0:pcie24: Failed to check link status > > > > > [ 159.201968] pciehp 0000:05:00.0:pcie24: Card not present on Slot (0 > > > > > - > 4) > > > > > [ 159.202529] pciehp 0000:05:00.0:pcie24: Already disabled on Slot (0 > > > > > - > 4) > > > > > > > > This seems very much like the symptoms I see when I use hotplug after > > > > the latest changes to the hotplug code - do you have something that > > > > confirms this has something to do with wrong interpretation of device > > > > IDs? My suspicion is that something has gone wrong or is missing in the > > > > hotplug logic but I havent had time to dig deeper into it yet. > > > Can you please describe me the steps to reproduce the issue? > > > > Hmm, while trying to reproduce again I realize my hotplug issues are not > > the same as the one Gonglei reports, let me come back to that in a > > separate mail later, > > > > My main point here is that I don't see how this particular fix would > > alleviate Gonglei's issue, as it does not seem to get triggered unless > > there's a bug in the emulated port/switch? > > > > Gonglei, I assume you are use the TI x3130 in your example: > > IMHO any PCIe x 1 device should fit into any of the (3) PCIe slots > > provided by the TI x3130, and according to the spec: > > > > http://www.ti.com/lit/gpn/xio3130 > > > > these slots appear as separate buses, which means that all devices will > > have devfn 0.0 but on different buses, eg. if you have all three switch > > slots (not to be confused with the slot number given by the PCI_SLOT() > > macro) populated and no other secondary buses before it, you should see > > the downstream switch ports as 01:xx:x and devices in 02:00.0, 03:00.0 > > and 04:00.0 for slots 0, 1 and 2. This seems to correspond well with the > > current Qemu model. > > > > So unless your device itself exposes function# > 8 and is not ARI > > capable (which would be a non-compliant device as far as I read the > > standard) you should never be able to see any device in any of the > > downstream ports have PCI_SLOT(devfn) != 0 ? > > > > With qemu master + my SR/IOV patch set + the igb patch (just to have an > > ARI capable device to play with) here: > > > > https://github.com/knuto/qemu, branch sriov_patches_v3 > > > > and a guest running F20 I am able to boot with a device (such as for > > instance an e1000) inserted in a root port, I am also able to hot plug > > it from the qemu monitor, eg. > > > > qemu-kvm ... \ > > -device ioh3420,slot=0,id=pcie_port.0 > > > > ... > > > > (qemu) device_add e1000,vlan=1,bus=pcie_port.0,id=ne > > (qemu) device_del ne > > > > In both cases the ARIfwd bit is disabled by default and not enabled by > > the port driver (just as I would expect) so at least your comment > > is wrong as far as I can see. > > > > Booting with a two-port downstream switch with no devices plugged, I see > > the same, ARIfwd is not enabled on the downstream ports as long as no > > device is in any slots, which is as expected, isn't it? > > > > qemu-kvm ... \ > > -device x3130-upstream,id=upsw \ > > -device xio3130-downstream,bus=upsw,addr=0.0,chassis=5,id=ds_port.0 \ > > -device xio3130-downstream,bus=upsw,addr=1.0,chassis=6,id=ds_port.1 > > ... > > > > The upstream port does not support ARIfwd in the QEMU model, which I > > suppose is correct as it will only contain individual downstream > > devices, which expose new buses but never have more than one function. > > > > However, either booting or hotplugging an e1000 into the downstream > > switch, eg. > > > > (qemu) device_add e1000,vlan=1,bus=ds_port.0,id=ne > > > > with my current guest image kernel 3.13.10-200.fc20.x86_64+debug I get a > > NULL pointer Oops in the guest, > > > > [ 0.520009] BUG: unable to handle kernel NULL pointer dereference at > 0000000000000038 > > [ 0.521000] IP: [<ffffffff813d942d>] > pcie_aspm_init_link_state+0x6ed/0x7c0 > > [ 0.521000] PGD 0 > > [ 0.521000] Oops: 0000 [#1] SMP > > here is the gdb version of the stack trace: > > 0xffffffff813d9431 in alloc_pcie_link_state (pdev=0xffff880178cec520) at > drivers/pci/pcie/aspm.c:530 > 530 parent = pdev->bus->parent->self->link_state; > (gdb) where > #0 0xffffffff813d9431 in alloc_pcie_link_state (pdev=0xffff880178cec520) at > drivers/pci/pcie/aspm.c:530 > #1 pcie_aspm_init_link_state (pdev=0xffff880178cec520) at > drivers/pci/pcie/aspm.c:578 > #2 0xffffffff813c6bbd in pci_scan_slot (bus=bus@entry=0xffff880178cad388, > devfn=devfn@entry=0) > at drivers/pci/probe.c:1486 > #3 0xffffffff813e0d54 in pciehp_configure_device > (p_slot=p_slot@entry=0xffff880173982760) > at drivers/pci/hotplug/pciehp_pci.c:54 > #4 0xffffffff813e05cb in board_added (p_slot=0xffff880173982760) at > drivers/pci/hotplug/pciehp_ctrl.c:223 > #5 pciehp_enable_slot (p_slot=p_slot@entry=0xffff880173982760) at > drivers/pci/hotplug/pciehp_ctrl.c:510 > #6 0xffffffff813e0ac0 in pciehp_power_thread (work=<optimized out>) at > drivers/pci/hotplug/pciehp_ctrl.c:308 > #7 0xffffffff8109bc31 in process_one_work > (worker=worker@entry=0xffff8801730fd728, work=0xffff8801731bbc30) > at kernel/workqueue.c:2214 > #8 0xffffffff8109c1eb in worker_thread (__worker=0xffff8801730fd728) at > kernel/workqueue.c:2340 > #9 0xffffffff810a43ef in kthread (_create=0xffff88017336edc8) at > kernel/kthread.c:207 > #10 <signal handler called> > #11 0x0000000000000000 in irq_stack_union () > #12 0x0000000000000000 in ?? () > (gdb) p pdev->bus->parent->self > $2 = (struct pci_dev *) 0x0 <irq_stack_union> > (gdb) > > I observe the same behaviour with guest kernel 3.16.3-200.fc20.x86_64. > > Knut > Hi, Knut I haven't encounter this issue you described with guest kernel RLEH 7.0 (Linux 3.10.0-121.e17.x85_64)
The command line: # ./qemu-system-x86_64 -enable-kvm -m 2048 -machine q35 -device ide-drive,bus=ide.2,drive=MacHDD \ -drive id=MacHDD,if=none,file=/home/redhat_q35.img -monitor stdio -vnc :10 \ -readconfig ../docs/q35-chipset.cfg.ori QEMU 2.1.50 monitor - type 'help' for more information (qemu) device_add e1000,vlan=1,bus=pcie-switch-downstream-port-1-1,id=ne --> (it works) (qemu) device_del ne (qemu) device_add e1000,vlan=1,bus=pcie-switch-downstream-port-1-1,id=ne,addr=1.0 --> (it doesn't work) (qemu) device_del ne Device 'ne' not found (qemu) # cat ../docs/q35-chipset.cfg.ori [device "ich9-ehci-2"] driver = "ich9-usb-ehci2" multifunction = "on" bus = "pcie.0" addr = "1a.7" [device "ich9-uhci-4"] driver = "ich9-usb-uhci4" multifunction = "on" bus = "pcie.0" addr = "1a.0" masterbus = "ich9-ehci-2.0" firstport = "0" [device "ich9-uhci-5"] driver = "ich9-usb-uhci5" multifunction = "on" bus = "pcie.0" addr = "1a.1" masterbus = "ich9-ehci-2.0" firstport = "2" [device "ich9-uhci-6"] driver = "ich9-usb-uhci6" multifunction = "on" bus = "pcie.0" addr = "1a.2" masterbus = "ich9-ehci-2.0" firstport = "4" [device "ich9-hda-audio"] driver = "ich9-intel-hda" bus = "pcie.0" addr = "1b.0" [device "ich9-pcie-port-1"] driver = "ioh3420" multifunction = "on" bus = "pcie.0" addr = "1c.0" port = "1" chassis = "1" [device "ich9-pcie-port-2"] driver = "ioh3420" multifunction = "on" bus = "pcie.0" addr = "1c.1" port = "2" chassis = "2" [device "ich9-pcie-port-3"] driver = "ioh3420" multifunction = "on" bus = "pcie.0" addr = "1c.2" port = "3" chassis = "3" [device "ich9-pcie-port-4"] driver = "ioh3420" multifunction = "on" bus = "pcie.0" addr = "1c.3" port = "4" chassis = "4" ## # Example PCIe switch with two downstream ports # [device "pcie-switch-upstream-port-1"] driver = "x3130-upstream" bus = "ich9-pcie-port-4" addr = "00.0" [device "pcie-switch-downstream-port-1-1"] driver = "xio3130-downstream" multifunction = "on" bus = "pcie-switch-upstream-port-1" addr = "00.0" port = "1" chassis = "5" [device "pcie-switch-downstream-port-1-2"] driver = "xio3130-downstream" multifunction = "on" bus = "pcie-switch-upstream-port-1" addr = "00.1" port = "1" chassis = "6" [device "ich9-ehci-1"] driver = "ich9-usb-ehci1" multifunction = "on" bus = "pcie.0" addr = "1d.7" [device "ich9-uhci-1"] driver = "ich9-usb-uhci1" multifunction = "on" bus = "pcie.0" addr = "1d.0" masterbus = "ich9-ehci-1.0" firstport = "0" [device "ich9-uhci-2"] driver = "ich9-usb-uhci2" multifunction = "on" bus = "pcie.0" addr = "1d.1" masterbus = "ich9-ehci-1.0" firstport = "2" [device "ich9-uhci-3"] driver = "ich9-usb-uhci3" multifunction = "on" bus = "pcie.0" addr = "1d.2" masterbus = "ich9-ehci-1.0" firstport = "4" [device "ich9-pci-bridge"] driver = "i82801b11-bridge" bus = "pcie.0" addr = "1e.0" [device "pci.0"] driver = "pci-bridge" bus = "ich9-pci-bridge" addr = "1.0" chassis_nr = "1" [device "pci.1"] driver = "pci-bridge" bus = "ich9-pci-bridge" addr = "2.0" chassis_nr = "2" Best regards, -Gonglei > > The exact same happens if I use the ARI capable igb device instead. > > > > I will upgrade the guest to a newer image (what kernel are you running with, > Gonglei?) > > > > lspci reports this on the guest (before I try to add any device in the > downstream port(s)): > > > > 00:00.0 Host bridge: Intel Corporation 82G33/G31/P35/P31 Express DRAM > Controller > > 00:01.0 VGA compatible controller: Cirrus Logic GD 5446 > > 00:02.0 Ethernet controller: Red Hat, Inc Virtio network device > > 00:03.0 Unclassified device [0002]: Red Hat, Inc Virtio filesystem > > 00:04.0 PCI bridge: Intel Corporation 7500/5520/5500/X58 I/O Hub PCI > Express Root Port 0 (rev 02) > > 00:05.0 PCI bridge: Texas Instruments XIO3130 PCI Express Switch > (Upstream) (rev 02) > > 00:1f.0 ISA bridge: Intel Corporation 82801IB (ICH9) LPC Interface > > Controller > (rev 02) > > 00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 > port SATA Controller [AHCI mode] (rev 02) > > 00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev > 02) > > 02:00.0 PCI bridge: Texas Instruments XIO3130 PCI Express Switch > (Downstream) (rev 01) > > 02:01.0 PCI bridge: Texas Instruments XIO3130 PCI Express Switch > (Downstream) (rev 01) > > > > # lspci -t > > -[0000:00]-+-00.0 > > +-01.0 > > +-02.0 > > +-03.0 > > +-04.0-[01]-- > > +-05.0-[02-04]--+-00.0-[03]-- > > | \-01.0-[04]-- > > +-1f.0 > > +-1f.2 > > \-1f.3 > > > > eg. the devices plugged into the downstream ports will end up at 03:00.0 and > 04:00.0, > > the info qtree command verifies that the e1000 insert ends up in 03:00.0 as > expected. > > > > Similar if I instead use > > > > (qemu) device_add e1000,vlan=1,bus=ds_port.1,id=ne2 > > > > I see this with info qtree: > > > > class Ethernet controller, addr 04:00.0, pci id 8086:100e (sub 1af4:1100) > > > > Anyway I suppose this indicates that something is not right/is missing > > with the downstream switch emulation, and not with the (generic) way the > > ARIfwd cap/ctrl bit works > > > > > It should work correctly by now, maybe a "surprise removal/addition" > > > warning and nothing more. > > > > > > Thank you, > > > Marcel > > > > > > > > > > > I am just concerned that this might alleviate the symptoms you see but > > > > not fix the problem itself, > > > > > > > > Knut > > > > > > > > > > Please make this explicit in the commit log. > > > > > > > > > > > Sorry, I copied the description from PCIe spec. :( > > > > > > > > > > IMPLEMENTATION NOTE at Page 19: > > > > > > > > > > > https://www.pcisig.com/specifications/pciexpress/specifications/ECN-alt-rid-in > terpretation-070604.pdf > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > So, for PCI devices attached in PCIe root ports or downstream > > > > > > > pots, > > > > > > > we should assure that its slot is not non-zero. > > > > > > > > > > > > > For PCIe devices, which > > > > > > > ARP capability is not enabled, we also should assure that its slot > > > > > > > is not non-zero. > > > > > > > > > > > > not non zero => zero > > > > > > > > > > > OK. > > > > > > > > > > > > > > > > > > > Signed-off-by: Gonglei <arei.gong...@huawei.com> > > > > > > > --- > > > > > > > hw/pci/pci.c | 4 ++++ > > > > > > > hw/pci/pcie.c | 51 > > > > > > +++++++++++++++++++++++++++++++++++++++++++++++++++ > > > > > > > include/hw/pci/pcie.h | 1 + > > > > > > > 3 files changed, 56 insertions(+) > > > > > > > > > > > > > > diff --git a/hw/pci/pci.c b/hw/pci/pci.c > > > > > > > index 6ce75aa..22b7ca0 100644 > > > > > > > --- a/hw/pci/pci.c > > > > > > > +++ b/hw/pci/pci.c > > > > > > > @@ -1770,6 +1770,10 @@ static int pci_qdev_init(DeviceState > *qdev) > > > > > > > } > > > > > > > } > > > > > > > > > > > > > > + if (pcie_cap_slot_check(bus, pci_dev)) { > > > > > > > + return -1; > > > > > > > + } > > > > > > > + > > > > > > > /* rom loading */ > > > > > > > is_default_rom = false; > > > > > > > if (pci_dev->romfile == NULL && pc->romfile != NULL) { > > > > > > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c > > > > > > > index 1babddf..b82211a 100644 > > > > > > > --- a/hw/pci/pcie.c > > > > > > > +++ b/hw/pci/pcie.c > > > > > > > @@ -633,3 +633,54 @@ void pcie_ari_init(PCIDevice *dev, uint16_t > offset, > > > > > > uint16_t nextfn) > > > > > > > offset, PCI_ARI_SIZEOF); > > > > > > > pci_set_long(dev->config + offset + PCI_ARI_CAP, (nextfn & > 0xff) << 8); > > > > > > > } > > > > > > > + > > > > > > > +int pcie_cap_slot_check(PCIBus *bus, PCIDevice *dev) > > > > > > > +{ > > > > > > > + Object *obj = OBJECT(bus); > > > > > > > + > > > > > > > + if (pci_bus_is_root(bus)) { > > > > > > > + return 0; > > > > > > > + } > > > > > > > + > > > > > > > + if (object_dynamic_cast(obj, TYPE_PCIE_BUS)) { > > > > > > > + DeviceState *parent = qbus_get_parent(BUS(obj)); > > > > > > > + PCIDevice *pci_dev = PCI_DEVICE(parent); > > > > > > > + uint8_t port_type; > > > > > > > + /* > > > > > > > + * Root ports and downstream ports of switches are the > hot > > > > > > > + * pluggable ports in a PCI Express hierarchy. > > > > > > > + * PCI Express supports chip-to-chip interconnect, a PCIe > link can > > > > > > > + * only connect one PCI device/Switch/EndPoint or > PCI-bridge. > > > > > > > + * > > > > > > > + * 7.3. Configuration Transaction Rules (PCI Express > specification > > > > > > 3.0) > > > > > > > + * 7.3.1. Device Number > > > > > > > + * > > > > > > > + * Downstream Ports that do not have ARI Forwarding > enabled > > > > > > must > > > > > > > + * associate only Device 0 with the device attached to > the Logical > > > > > > Bus > > > > > > > + * representing the Link from the Port. > > > > > > > + * > > > > > > > + * In QEMU, ARI Forwarding is enabled default at > emulation of > > > > > > PCIe > > > > > > > > > > > > s/enabled default/enabled by default/ > > > > It is not when I try this (as discussed above) > > The capability is there but it is disabled by default. > > > > > > > > > > > > > OK. > > > > > > > > > > > > + * ports. ARI Forwarding enable setting at firmware/OS > Control > > > > > > handoff. > > > > > > > > > > > > > > > > > > can parse last sentence. drop it? > > > > > > > > > > > OK. > > > > > > > > > > > > + * If the bit is Set when a non-ARI Device is present, > > > > > > > the > non-ARI > > > > > > > + Device can respond to Configuration Space accesses > under > > > > > > what it > > > > > > > + * interprets as being different Device Numbers, and its > Functions > > > > > > can > > > > > > > + * be aliased under multiple Device Numbers, generally > leading to > > > > > > > + * undesired behavior. > > > > I don't understand how this is possible to achieve. > > > > Knut > > > > > > > > I don't think any badness really happens. > > > > > > Did you observe any? > > > > > > > > > > > Same with above explanation. > > > > > > > > > > > > > > > > > > > > > > > > + */ > > > > > > > + port_type = pcie_cap_get_type(pci_dev); > > > > > > > + if (port_type == PCI_EXP_TYPE_DOWNSTREAM || > > > > > > > + port_type == PCI_EXP_TYPE_ROOT_PORT) { > > > > > > > + if (!pci_is_express(dev) || > > > > > > > + (pci_is_express(dev) && > > > > > > > + !pcie_find_capability(dev, > PCI_EXT_CAP_ID_ARI))) { > > > > > > > > > > > > I would just skip the test for a non express function. > > > > > > > > > > > Sorry? > > > > > > > > > > Best regards, > > > > > -Gonglei > > > > > > > > > > > > + if (PCI_SLOT(dev->devfn) != 0) { > > > > > > > + error_report("PCIe: non-ARI device can't > be > > > > > > populated" > > > > > > > + " in slot %d", > > > > > > PCI_SLOT(dev->devfn)); > > > > > > > + return -1; > > > > > > > + } > > > > > > > + } > > > > > > > + } > > > > > > > + } > > > > > > > + > > > > > > > + return 0; > > > > > > > +} > > > > > > > diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h > > > > > > > index d139d58..1d8f3f4 100644 > > > > > > > --- a/include/hw/pci/pcie.h > > > > > > > +++ b/include/hw/pci/pcie.h > > > > > > > @@ -115,6 +115,7 @@ void pcie_add_capability(PCIDevice *dev, > > > > > > > uint16_t offset, uint16_t size); > > > > > > > > > > > > > > void pcie_ari_init(PCIDevice *dev, uint16_t offset, uint16_t > nextfn); > > > > > > > +int pcie_cap_slot_check(PCIBus *bus, PCIDevice *dev); > > > > > > > > > > > > > > extern const VMStateDescription vmstate_pcie_device; > > > > > > > > > > > > > > -- > > > > > > > 1.7.12.4 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >