> Subject: Re: [Qemu-devel] [PATCH v4 2/3] pcie: add check for ari capability of
> pcie devices
> 
> On Fri, 2014-10-03 at 13:22 +0200, Knut Omang wrote:
> > On Wed, 2014-10-01 at 17:08 +0300, Marcel Apfelbaum wrote:
> > > On Wed, 2014-10-01 at 07:26 +0200, Knut Omang wrote:
> > > > On Tue, 2014-09-30 at 21:38 +0800, Gonglei wrote:
> > > > > > Subject: Re: [Qemu-devel] [PATCH v4 2/3] pcie: add check for ari
> capability of
> > > > > > pcie devices
> > > > > >
> > > > > > On Tue, Sep 30, 2014 at 06:11:25PM +0800, arei.gong...@huawei.com
> wrote:
> > > > > > > From: Gonglei <arei.gong...@huawei.com>
> > > > > > >
> > > > > > > In QEMU, ARI Forwarding is enabled default at emulation of PCIe
> > > > > > > ports. ARI Forwarding enable setting at firmware/OS Control
> handoff.
> > > > > > > If the bit is Set when a non-ARI Device is present, the non-ARI
> > > > > > > Device can respond to Configuration Space accesses under what it
> > > > > > > interprets as being different Device Numbers, and its Functions 
> > > > > > > can
> > > > > > > be aliased under multiple Device Numbers, generally leading to
> > > > > > > undesired behavior.
> > > > > >
> > > > > > So what is the undesired behaviour?
> > > > > > Did you observe such?
> > > > >
> > > > > I just observe the PCI device don't work, and the dmesg show me:
> > > > >
> > > > > [ 159.035250] pciehp 0000:05:00.0:pcie24: Button pressed on Slot (0 - 
> > > > > 4)
> > > > > [ 159.035274] pciehp 0000:05:00.0:pcie24: Card present on Slot (0 - 4)
> > > > > [ 159.036517] pciehp 0000:05:00.0:pcie24: PCI slot #0 - 4 - powering 
> > > > > on
> due to button press.
> > > > > [ 159.188049] pciehp 0000:05:00.0:pcie24: Failed to check link status
> > > > > [ 159.201968] pciehp 0000:05:00.0:pcie24: Card not present on Slot (0 
> > > > > -
> 4)
> > > > > [ 159.202529] pciehp 0000:05:00.0:pcie24: Already disabled on Slot (0 
> > > > > -
> 4)
> > > >
> > > > This seems very much like the symptoms I see when I use hotplug after
> > > > the latest changes to the hotplug code - do you have something that
> > > > confirms this has something to do with wrong interpretation of device
> > > > IDs? My suspicion is that something has gone wrong or is missing in the
> > > > hotplug logic but I havent had time to dig deeper into it yet.
> > > Can you please describe me the steps to reproduce the issue?
> >
> > Hmm, while trying to reproduce again I realize my hotplug issues are not
> > the same as the one Gonglei reports, let me come back to that in a
> > separate mail later,
> >
> > My main point here is that I don't see how this particular fix would
> > alleviate Gonglei's issue, as it does not seem to get triggered unless
> > there's a bug in the emulated port/switch?
> >
> > Gonglei, I assume you are use the TI x3130 in your example:
> > IMHO any PCIe x 1 device should fit into any of the (3) PCIe slots
> > provided by the TI x3130, and according to the spec:
> >
> > http://www.ti.com/lit/gpn/xio3130
> >
> > these slots appear as separate buses, which means that all devices will
> > have devfn 0.0 but on different buses, eg. if you have all three switch
> > slots (not to be confused with the slot number given by the PCI_SLOT()
> > macro) populated and no other secondary buses before it, you should see
> > the downstream switch ports as 01:xx:x and devices in 02:00.0, 03:00.0
> > and 04:00.0 for slots 0, 1 and 2. This seems to correspond well with the
> > current Qemu model.
> >
> > So unless your device itself exposes function# > 8 and is not ARI
> > capable (which would be a non-compliant device as far as I read the
> > standard) you should never be able to see any device in any of the
> > downstream ports have PCI_SLOT(devfn) != 0 ?
> >
> > With qemu master + my SR/IOV patch set + the igb patch (just to have an
> > ARI capable device to play with) here:
> >
> > https://github.com/knuto/qemu, branch sriov_patches_v3
> >
> > and a guest running F20 I am able to boot with a device (such as for
> > instance an e1000) inserted in a root port, I am also able to hot plug
> > it from the qemu monitor, eg.
> >
> > qemu-kvm  ... \
> >   -device ioh3420,slot=0,id=pcie_port.0
> >
> > ...
> >
> > (qemu) device_add e1000,vlan=1,bus=pcie_port.0,id=ne
> > (qemu) device_del ne
> >
> > In both cases the ARIfwd bit is disabled by default and not enabled by
> > the port driver (just as I would expect) so at least your comment
> > is wrong as far as I can see.
> >
> > Booting with a two-port downstream switch with no devices plugged, I see
> > the same, ARIfwd is not enabled on the downstream ports as long as no
> > device is in any slots, which is as expected, isn't it?
> >
> > qemu-kvm  ... \
> >   -device x3130-upstream,id=upsw \
> >   -device xio3130-downstream,bus=upsw,addr=0.0,chassis=5,id=ds_port.0 \
> >   -device xio3130-downstream,bus=upsw,addr=1.0,chassis=6,id=ds_port.1
> > ...
> >
> > The upstream port does not support ARIfwd in the QEMU model, which I
> > suppose is correct as it will only contain individual downstream
> > devices, which expose new buses but never have more than one function.
> >
> > However, either booting or hotplugging an e1000 into the downstream
> > switch, eg.
> >
> > (qemu) device_add e1000,vlan=1,bus=ds_port.0,id=ne
> >
> > with my current guest image kernel 3.13.10-200.fc20.x86_64+debug I get a
> > NULL pointer Oops in the guest,
> >
> > [    0.520009] BUG: unable to handle kernel NULL pointer dereference at
> 0000000000000038
> > [    0.521000] IP: [<ffffffff813d942d>]
> pcie_aspm_init_link_state+0x6ed/0x7c0
> > [    0.521000] PGD 0
> > [    0.521000] Oops: 0000 [#1] SMP
> 
> here is the gdb version of the stack trace:
> 
> 0xffffffff813d9431 in alloc_pcie_link_state (pdev=0xffff880178cec520) at
> drivers/pci/pcie/aspm.c:530
> 530                     parent = pdev->bus->parent->self->link_state;
> (gdb) where
> #0  0xffffffff813d9431 in alloc_pcie_link_state (pdev=0xffff880178cec520) at
> drivers/pci/pcie/aspm.c:530
> #1  pcie_aspm_init_link_state (pdev=0xffff880178cec520) at
> drivers/pci/pcie/aspm.c:578
> #2  0xffffffff813c6bbd in pci_scan_slot (bus=bus@entry=0xffff880178cad388,
> devfn=devfn@entry=0)
>     at drivers/pci/probe.c:1486
> #3  0xffffffff813e0d54 in pciehp_configure_device
> (p_slot=p_slot@entry=0xffff880173982760)
>     at drivers/pci/hotplug/pciehp_pci.c:54
> #4  0xffffffff813e05cb in board_added (p_slot=0xffff880173982760) at
> drivers/pci/hotplug/pciehp_ctrl.c:223
> #5  pciehp_enable_slot (p_slot=p_slot@entry=0xffff880173982760) at
> drivers/pci/hotplug/pciehp_ctrl.c:510
> #6  0xffffffff813e0ac0 in pciehp_power_thread (work=<optimized out>) at
> drivers/pci/hotplug/pciehp_ctrl.c:308
> #7  0xffffffff8109bc31 in process_one_work
> (worker=worker@entry=0xffff8801730fd728, work=0xffff8801731bbc30)
>     at kernel/workqueue.c:2214
> #8  0xffffffff8109c1eb in worker_thread (__worker=0xffff8801730fd728) at
> kernel/workqueue.c:2340
> #9  0xffffffff810a43ef in kthread (_create=0xffff88017336edc8) at
> kernel/kthread.c:207
> #10 <signal handler called>
> #11 0x0000000000000000 in irq_stack_union ()
> #12 0x0000000000000000 in ?? ()
> (gdb) p pdev->bus->parent->self
> $2 = (struct pci_dev *) 0x0 <irq_stack_union>
> (gdb)
> 
> I observe the same behaviour with guest kernel 3.16.3-200.fc20.x86_64.
> 
> Knut
> 
Hi, Knut
 I haven't encounter this issue you described with guest kernel RLEH 7.0 (Linux 
3.10.0-121.e17.x85_64)

The command line:
# ./qemu-system-x86_64 -enable-kvm -m 2048 -machine q35 -device 
ide-drive,bus=ide.2,drive=MacHDD \
 -drive id=MacHDD,if=none,file=/home/redhat_q35.img -monitor stdio -vnc :10 \
 -readconfig ../docs/q35-chipset.cfg.ori 
QEMU 2.1.50 monitor - type 'help' for more information
(qemu) device_add e1000,vlan=1,bus=pcie-switch-downstream-port-1-1,id=ne       
-->   (it works)
(qemu) device_del ne
(qemu) device_add 
e1000,vlan=1,bus=pcie-switch-downstream-port-1-1,id=ne,addr=1.0     -->   (it 
doesn't work)
(qemu) device_del ne
Device 'ne' not found
(qemu)

# cat ../docs/q35-chipset.cfg.ori
[device "ich9-ehci-2"]
  driver = "ich9-usb-ehci2"
  multifunction = "on"
  bus = "pcie.0"
  addr = "1a.7"

[device "ich9-uhci-4"]
  driver = "ich9-usb-uhci4"
  multifunction = "on"
  bus = "pcie.0"
  addr = "1a.0"
  masterbus = "ich9-ehci-2.0"
  firstport = "0"

[device "ich9-uhci-5"]
  driver = "ich9-usb-uhci5"
  multifunction = "on"
  bus = "pcie.0"
  addr = "1a.1"
  masterbus = "ich9-ehci-2.0"
  firstport = "2"

[device "ich9-uhci-6"]
  driver = "ich9-usb-uhci6"
  multifunction = "on"
  bus = "pcie.0"
  addr = "1a.2"
  masterbus = "ich9-ehci-2.0"
  firstport = "4"


[device "ich9-hda-audio"]
  driver = "ich9-intel-hda"
  bus = "pcie.0"
  addr = "1b.0"


[device "ich9-pcie-port-1"]
  driver = "ioh3420"
  multifunction = "on"
  bus = "pcie.0"
  addr = "1c.0"
  port = "1"
  chassis = "1"

[device "ich9-pcie-port-2"]
  driver = "ioh3420"
  multifunction = "on"
  bus = "pcie.0"
  addr = "1c.1"
  port = "2"
  chassis = "2"

[device "ich9-pcie-port-3"]
  driver = "ioh3420"
  multifunction = "on"
  bus = "pcie.0"
  addr = "1c.2"
  port = "3"
  chassis = "3"

[device "ich9-pcie-port-4"]
  driver = "ioh3420"
  multifunction = "on"
  bus = "pcie.0"
  addr = "1c.3"
  port = "4"
  chassis = "4"

##
# Example PCIe switch with two downstream ports
#
[device "pcie-switch-upstream-port-1"]
  driver = "x3130-upstream"
  bus = "ich9-pcie-port-4"
  addr = "00.0"

[device "pcie-switch-downstream-port-1-1"]
  driver = "xio3130-downstream"
  multifunction = "on"
  bus = "pcie-switch-upstream-port-1"
  addr = "00.0"
  port = "1"
  chassis = "5"

[device "pcie-switch-downstream-port-1-2"]
  driver = "xio3130-downstream"
  multifunction = "on"
  bus = "pcie-switch-upstream-port-1"
  addr = "00.1"
  port = "1"
  chassis = "6"

[device "ich9-ehci-1"]
  driver = "ich9-usb-ehci1"
  multifunction = "on"
  bus = "pcie.0"
  addr = "1d.7"

[device "ich9-uhci-1"]
  driver = "ich9-usb-uhci1"
  multifunction = "on"
  bus = "pcie.0"
  addr = "1d.0"
  masterbus = "ich9-ehci-1.0"
  firstport = "0"

[device "ich9-uhci-2"]
  driver = "ich9-usb-uhci2"
  multifunction = "on"
  bus = "pcie.0"
  addr = "1d.1"
  masterbus = "ich9-ehci-1.0"
  firstport = "2"

[device "ich9-uhci-3"]
  driver = "ich9-usb-uhci3"
  multifunction = "on"
  bus = "pcie.0"
  addr = "1d.2"
  masterbus = "ich9-ehci-1.0"
  firstport = "4"


[device "ich9-pci-bridge"]
  driver = "i82801b11-bridge"
  bus = "pcie.0"
  addr = "1e.0"

[device "pci.0"]
driver = "pci-bridge" 
bus = "ich9-pci-bridge" 
addr = "1.0" 
chassis_nr = "1" 

[device "pci.1"]
driver = "pci-bridge" 
bus = "ich9-pci-bridge" 
addr = "2.0" 
chassis_nr = "2"


Best regards,
-Gonglei

> > The exact same happens if I use the ARI capable igb device instead.
> >
> > I will upgrade the guest to a newer image (what kernel are you running with,
> Gonglei?)
> >
> > lspci reports this on the guest (before I try to add any device in the
> downstream port(s)):
> >
> > 00:00.0 Host bridge: Intel Corporation 82G33/G31/P35/P31 Express DRAM
> Controller
> > 00:01.0 VGA compatible controller: Cirrus Logic GD 5446
> > 00:02.0 Ethernet controller: Red Hat, Inc Virtio network device
> > 00:03.0 Unclassified device [0002]: Red Hat, Inc Virtio filesystem
> > 00:04.0 PCI bridge: Intel Corporation 7500/5520/5500/X58 I/O Hub PCI
> Express Root Port 0 (rev 02)
> > 00:05.0 PCI bridge: Texas Instruments XIO3130 PCI Express Switch
> (Upstream) (rev 02)
> > 00:1f.0 ISA bridge: Intel Corporation 82801IB (ICH9) LPC Interface 
> > Controller
> (rev 02)
> > 00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6
> port SATA Controller [AHCI mode] (rev 02)
> > 00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev
> 02)
> > 02:00.0 PCI bridge: Texas Instruments XIO3130 PCI Express Switch
> (Downstream) (rev 01)
> > 02:01.0 PCI bridge: Texas Instruments XIO3130 PCI Express Switch
> (Downstream) (rev 01)
> >
> > # lspci -t
> > -[0000:00]-+-00.0
> >            +-01.0
> >            +-02.0
> >            +-03.0
> >            +-04.0-[01]--
> >            +-05.0-[02-04]--+-00.0-[03]--
> >            |               \-01.0-[04]--
> >            +-1f.0
> >            +-1f.2
> >            \-1f.3
> >
> > eg. the devices plugged into the downstream ports will end up at 03:00.0 and
> 04:00.0,
> > the info qtree command verifies that the e1000 insert ends up in 03:00.0 as
> expected.
> >
> > Similar if I instead use
> >
> > (qemu) device_add e1000,vlan=1,bus=ds_port.1,id=ne2
> >
> > I see this with info qtree:
> >
> >   class Ethernet controller, addr 04:00.0, pci id 8086:100e (sub 1af4:1100)
> >
> > Anyway I suppose this indicates that something is not right/is missing
> > with the downstream switch emulation, and not with the (generic) way the
> > ARIfwd cap/ctrl bit works
> >
> > > It should work correctly by now, maybe a "surprise removal/addition"
> > > warning and nothing more.
> > >
> > > Thank you,
> > > Marcel
> > >
> > > >
> > > > I am just concerned that this might alleviate the symptoms you see but
> > > > not fix the problem itself,
> > > >
> > > > Knut
> > > >
> > > > > > Please make this explicit in the commit log.
> > > > > >
> > > > > Sorry, I copied the description from PCIe spec. :(
> > > > >
> > > > > IMPLEMENTATION NOTE at Page 19:
> > > > >
> > > > >
> https://www.pcisig.com/specifications/pciexpress/specifications/ECN-alt-rid-in
> terpretation-070604.pdf
> > > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > So, for PCI devices attached in PCIe root ports or downstream 
> > > > > > > pots,
> > > > > > > we should assure that its slot is not non-zero.
> >
> >
> >
> > > > > > > For PCIe devices, which
> > > > > > > ARP capability is not enabled, we also should assure that its slot
> > > > > > > is not non-zero.
> > > > > >
> > > > > > not non zero => zero
> > > > > >
> > > > > OK.
> > > > >
> > > > > > >
> > > > > > > Signed-off-by: Gonglei <arei.gong...@huawei.com>
> > > > > > > ---
> > > > > > >  hw/pci/pci.c          |  4 ++++
> > > > > > >  hw/pci/pcie.c         | 51
> > > > > > +++++++++++++++++++++++++++++++++++++++++++++++++++
> > > > > > >  include/hw/pci/pcie.h |  1 +
> > > > > > >  3 files changed, 56 insertions(+)
> > > > > > >
> > > > > > > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> > > > > > > index 6ce75aa..22b7ca0 100644
> > > > > > > --- a/hw/pci/pci.c
> > > > > > > +++ b/hw/pci/pci.c
> > > > > > > @@ -1770,6 +1770,10 @@ static int pci_qdev_init(DeviceState
> *qdev)
> > > > > > >          }
> > > > > > >      }
> > > > > > >
> > > > > > > +    if (pcie_cap_slot_check(bus, pci_dev)) {
> > > > > > > +        return -1;
> > > > > > > +    }
> > > > > > > +
> > > > > > >      /* rom loading */
> > > > > > >      is_default_rom = false;
> > > > > > >      if (pci_dev->romfile == NULL && pc->romfile != NULL) {
> > > > > > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> > > > > > > index 1babddf..b82211a 100644
> > > > > > > --- a/hw/pci/pcie.c
> > > > > > > +++ b/hw/pci/pcie.c
> > > > > > > @@ -633,3 +633,54 @@ void pcie_ari_init(PCIDevice *dev, uint16_t
> offset,
> > > > > > uint16_t nextfn)
> > > > > > >                          offset, PCI_ARI_SIZEOF);
> > > > > > >      pci_set_long(dev->config + offset + PCI_ARI_CAP, (nextfn &
> 0xff) << 8);
> > > > > > >  }
> > > > > > > +
> > > > > > > +int pcie_cap_slot_check(PCIBus *bus, PCIDevice *dev)
> > > > > > > +{
> > > > > > > +    Object *obj = OBJECT(bus);
> > > > > > > +
> > > > > > > +    if (pci_bus_is_root(bus)) {
> > > > > > > +        return 0;
> > > > > > > +    }
> > > > > > > +
> > > > > > > +    if (object_dynamic_cast(obj, TYPE_PCIE_BUS)) {
> > > > > > > +        DeviceState *parent = qbus_get_parent(BUS(obj));
> > > > > > > +        PCIDevice *pci_dev = PCI_DEVICE(parent);
> > > > > > > +        uint8_t port_type;
> > > > > > > +        /*
> > > > > > > +         * Root ports and downstream ports of switches are the
> hot
> > > > > > > +         * pluggable ports in a PCI Express hierarchy.
> > > > > > > +         * PCI Express supports chip-to-chip interconnect, a PCIe
> link can
> > > > > > > +         * only connect one PCI device/Switch/EndPoint or
> PCI-bridge.
> > > > > > > +         *
> > > > > > > +         * 7.3. Configuration Transaction Rules (PCI Express
> specification
> > > > > > 3.0)
> > > > > > > +         * 7.3.1. Device Number
> > > > > > > +         *
> > > > > > > +         * Downstream Ports that do not have ARI Forwarding
> enabled
> > > > > > must
> > > > > > > +         * associate only Device 0 with the device attached to
> the Logical
> > > > > > Bus
> > > > > > > +         * representing the Link from the Port.
> > > > > > > +         *
> > > > > > > +         * In QEMU, ARI Forwarding is enabled default at
> emulation of
> > > > > > PCIe
> > > > > >
> > > > > > s/enabled default/enabled by default/
> >
> > It is not when I try this (as discussed above)
> > The capability is there but it is disabled by default.
> >
> > > > > >
> > > > > OK.
> > > > >
> > > > > > > +         * ports. ARI Forwarding enable setting at firmware/OS
> Control
> > > > > > handoff.
> > > > > >
> > > > > >
> > > > > > can parse last sentence. drop it?
> > > > > >
> > > > > OK.
> > > > >
> > > > > > > +         * If the bit is Set when a non-ARI Device is present, 
> > > > > > > the
> non-ARI
> > > > > > > +           Device can respond to Configuration Space accesses
> under
> > > > > > what it
> > > > > > > +         * interprets as being different Device Numbers, and its
> Functions
> > > > > > can
> > > > > > > +         * be aliased under multiple Device Numbers, generally
> leading to
> > > > > > > +         * undesired behavior.
> >
> > I don't understand how this is possible to achieve.
> >
> > Knut
> >
> > > > > > I don't think any badness really happens.
> > > > > > Did you observe any?
> > > > > >
> > > > > Same with above explanation.
> > > > >
> > > > > >
> > > > > >
> > > > > > > +         */
> > > > > > > +        port_type = pcie_cap_get_type(pci_dev);
> > > > > > > +        if (port_type == PCI_EXP_TYPE_DOWNSTREAM ||
> > > > > > > +            port_type == PCI_EXP_TYPE_ROOT_PORT) {
> > > > > > > +            if (!pci_is_express(dev) ||
> > > > > > > +                (pci_is_express(dev) &&
> > > > > > > +                !pcie_find_capability(dev,
> PCI_EXT_CAP_ID_ARI))) {
> > > > > >
> > > > > > I would just skip the test for a non express function.
> > > > > >
> > > > > Sorry?
> > > > >
> > > > > Best regards,
> > > > > -Gonglei
> > > > >
> > > > > > > +                if (PCI_SLOT(dev->devfn) != 0) {
> > > > > > > +                    error_report("PCIe: non-ARI device can't
> be
> > > > > > populated"
> > > > > > > +                                 " in slot %d",
> > > > > > PCI_SLOT(dev->devfn));
> > > > > > > +                    return -1;
> > > > > > > +                }
> > > > > > > +            }
> > > > > > > +        }
> > > > > > > +    }
> > > > > > > +
> > > > > > > +    return 0;
> > > > > > > +}
> > > > > > > diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
> > > > > > > index d139d58..1d8f3f4 100644
> > > > > > > --- a/include/hw/pci/pcie.h
> > > > > > > +++ b/include/hw/pci/pcie.h
> > > > > > > @@ -115,6 +115,7 @@ void pcie_add_capability(PCIDevice *dev,
> > > > > > >                           uint16_t offset, uint16_t size);
> > > > > > >
> > > > > > >  void pcie_ari_init(PCIDevice *dev, uint16_t offset, uint16_t
> nextfn);
> > > > > > > +int pcie_cap_slot_check(PCIBus *bus, PCIDevice *dev);
> > > > > > >
> > > > > > >  extern const VMStateDescription vmstate_pcie_device;
> > > > > > >
> > > > > > > --
> > > > > > > 1.7.12.4
> > > > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > >
> > >
> > >
> > >
> >
> 

Reply via email to