Propose best practices on how to use PCI Express/PCI devices in PCI Express based machines and explain the reasoning behind them.
Signed-off-by: Marcel Apfelbaum <mar...@redhat.com>
---

Hi,

I am sending the doc twice, as it appears the first attempt did not make it
to the qemu-devel list.

RFC->v2:
 - Addressed a lot of comments from the reviewers
   (many thanks to all, especially to Laszlo)

Since the RFC mail thread was relatively long and a lot of time has passed
since the RFC, I am posting this version even though it is quite possible
that I left some of the comments out; my apologies if so.
I will go over the comments again. In the meantime, please feel free
to comment on this version, even on something you have already
pointed out.

It may take a day or two until I am able to respond, but I
will do my best to address all comments.

Thanks,
Marcel

 docs/pcie.txt | 273 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 273 insertions(+)
 create mode 100644 docs/pcie.txt

diff --git a/docs/pcie.txt b/docs/pcie.txt
new file mode 100644
index 0000000..7d852f1
--- /dev/null
+++ b/docs/pcie.txt
@@ -0,0 +1,273 @@
+PCI EXPRESS GUIDELINES
+======================
+
+1. Introduction
+================
+The doc proposes best practices on how to use PCI Express/PCI devices
+in PCI Express based machines and explains the reasoning behind them.
+
+
+2. Device placement strategy
+============================
+QEMU does not have a clear socket-device matching mechanism
+and allows any PCI/PCI Express device to be plugged into any PCI/PCI Express slot.
+Plugging a PCI device into a PCI Express slot might not always work and
+is odd anyway, since it cannot be done on "bare metal".
+Plugging a PCI Express device into a PCI slot will hide the Extended
+Configuration Space and is therefore also not recommended.
+
+The recommendation is to separate the PCI Express and PCI hierarchies.
+PCI Express devices should be plugged only into PCI Express Root Ports and
+PCI Express Downstream ports.
+
+2.1 Root Bus (pcie.0)
+=====================
+Place only the following kinds of devices directly on the Root Complex:
+    (1) Devices with dedicated, specific functionality (network card,
+        graphics card, IDE controller, etc); place only legacy PCI devices on
+        the Root Complex. These will be considered Integrated Endpoints.
+        Note: Integrated devices are not hot-pluggable.
+
+        Although the PCI Express spec does not forbid PCI Express devices as
+        Integrated Endpoints, existing hardware mostly integrates legacy PCI
+        devices with the Root Complex. Guest OSes are suspected to behave
+        strangely when PCI Express devices are integrated with the Root Complex.
+
+    (2) PCI Express Root Ports (ioh3420), for starting exclusively PCI Express
+        hierarchies.
+
+    (3) DMI-PCI bridges (i82801b11-bridge), for starting legacy PCI hierarchies.
+
+    (4) Extra Root Complexes (pxb-pcie), if multiple PCIe Root Buses are needed.
+
+   pcie.0 bus
+   -----------------------------------------------------------------------------
+        |                |                    |                  |
+   -----------   ------------------   ------------------   --------------
+   | PCI Dev |   | PCIe Root Port |   | DMI-PCI bridge |   |  pxb-pcie  |
+   -----------   ------------------   ------------------   --------------
+
+2.1.1 To plug a device into pcie.0 as a Root Complex Integrated Device use:
+          -device <dev>[,bus=pcie.0]
+2.1.2 To expose a new PCI Express Root Bus use:
+          -device pxb-pcie,id=pcie.1,bus_nr=x[,numa_node=y][,addr=z]
+      Only PCI Express Root Ports and DMI-PCI bridges can be connected to the pcie.1 bus:
+          -device ioh3420,id=root_port1,bus=pcie.1,chassis=x[,slot=y][,addr=z] \
+          -device i82801b11-bridge,id=dmi_pci_bridge1,bus=pcie.1
+
+
+2.2 PCI Express only hierarchy
+==============================
+Always use PCI Express Root Ports to start PCI Express hierarchies.
+
+A PCI Express Root bus supports up to 32 devices.
+Since each PCI Express Root Port is a function and a multi-function
+device may support up to 8 functions, the maximum possible number of
+PCI Express Root Ports per PCI Express Root Bus is 256 (32 x 8).
+
+Prefer coupling PCI Express Root Ports into multi-function devices
+to keep a simple flat hierarchy that is enough for most scenarios.
+Only use PCI Express Switches (x3130-upstream, xio3130-downstream)
+if there is no more room for PCI Express Root Ports.
+Please see section 4 for further justification.
+
+Plug only PCI Express devices into PCI Express Ports.
+
+
+   pcie.0 bus
+   ----------------------------------------------------------------------------------
+         |                |                                |
+   -------------    -------------                    -------------
+   | Root Port |    | Root Port |                    | Root Port |
+   -------------    -------------                    -------------
+         |                        -------------------------|------------------------
+    ------------                  |                -----------------               |
+    | PCIe Dev |                  |    PCI Express | Upstream Port |               |
+    ------------                  |      Switch    -----------------               |
+                                  |                  |           |                 |
+                                  |   -------------------     -------------------  |
+                                  |   | Downstream Port |     | Downstream Port |  |
+                                  |   -------------------     -------------------  |
+                                  -------------|-----------------------|------------
+                                         ------------
+                                         | PCIe Dev |
+                                         ------------
+
+2.2.1 Plugging a PCI Express device into a PCI Express Root Port:
+          -device ioh3420,id=root_port1,chassis=x[,bus=pcie.0][,slot=y][,addr=z] \
+          -device <dev>,bus=root_port1
+      Note that the chassis parameter is compulsory and must be unique
+      for each PCI Express Root Port.
+2.2.2 Using multi-function PCI Express Root Ports:
+      -device ioh3420,id=root_port1,multifunction=on,chassis=x,addr=z.0[,bus=pcie.0][,slot=y] \
+      -device ioh3420,id=root_port2,chassis=x1,addr=z.1[,bus=pcie.0][,slot=y1] \
+      -device ioh3420,id=root_port3,chassis=x2,addr=z.2[,bus=pcie.0][,slot=y2]
+2.2.3 Plugging a PCI Express device into a Switch:
+      -device ioh3420,id=root_port1,chassis=x[,bus=pcie.0][,slot=y][,addr=z] \
+      -device x3130-upstream,id=upstream_port1,bus=root_port1[,addr=x] \
+      -device xio3130-downstream,id=downstream_port1,bus=upstream_port1,chassis=x1[,slot=y1][,addr=z1] \
+      -device <dev>,bus=downstream_port1
+
+
+2.3 PCI only hierarchy
+======================
+Legacy PCI devices can be plugged into pcie.0 as Integrated Devices.
+Besides that, use DMI-PCI bridges (i82801b11-bridge) to start PCI hierarchies.
+
+Prefer flat hierarchies. For most scenarios a single DMI-PCI bridge (having 32 slots)
+and several PCI-PCI bridges attached to it (each also supporting 32 slots) will support
+hundreds of legacy devices. The recommendation is to populate one PCI-PCI bridge
+under the DMI-PCI bridge until it is full and then plug in a new PCI-PCI bridge, and so on.
+
+   pcie.0 bus
+   ----------------------------------------------
+        |                            |
+   -----------               ------------------
+   | PCI Dev |               | DMI-PCI BRIDGE |
+   -----------               ------------------
+                               |            |
+                          -----------   ------------------
+                          | PCI Dev |   | PCI-PCI Bridge |
+                          -----------   ------------------
+                                            |         |
+                                       -----------   -----------
+                                       | PCI Dev |   | PCI Dev |
+                                       -----------   -----------
+
+2.3.1 To plug a PCI device into pcie.0 as an Integrated Device use:
+      -device <dev>[,bus=pcie.0]
+2.3.2 Plugging a PCI device into a DMI-PCI bridge:
+      -device i82801b11-bridge,id=dmi_pci_bridge1[,bus=pcie.0] \
+      -device <dev>,bus=dmi_pci_bridge1[,addr=x]
+2.3.3 Plugging a PCI device into a PCI-PCI bridge:
+      -device i82801b11-bridge,id=dmi_pci_bridge1[,bus=pcie.0] \
+      -device pci-bridge,id=pci_bridge1,bus=dmi_pci_bridge1[,chassis_nr=x][,addr=y] \
+      -device <dev>,bus=pci_bridge1[,addr=x]
+
+
+3. IO space issues
+===================
+The PCI Express Root Ports and PCI Express Downstream ports are seen by
+Firmware/Guest OS as PCI-PCI bridges and, as required by the PCI spec,
+a 4K IO range should be reserved for each of them even if only one (multifunction)
+device can be plugged into it, resulting in poor IO space utilization.
+
+The firmware used by QEMU (SeaBIOS/OVMF) may try further optimizations
+by not allocating IO space if possible:
+    (1) - For empty PCI Express Root Ports/PCI Express Downstream ports.
+    (2) - If the device behind the PCI Express Root Port/PCI Express
+          Downstream port has no IO BARs.
+
+The IO space is very limited (65536 byte-wide IO ports) and fragmented,
+resulting in a limit of ~10 PCI Express Root Ports (or PCI Express
+Downstream/Upstream ports) per system if devices with IO BARs are used in
+the PCI Express hierarchy.
+
+The proposed device placement strategy solves this issue
+by using only PCI Express devices within the PCI Express hierarchy.
+
+The PCI Express spec requires the PCI Express devices to work without using IO.
+The PCI hierarchy has no such limitations.
+
+
+4. Bus number issues
+=====================
+Each PCI domain can have at most 256 buses and the QEMU PCI Express
+machines do not support multiple PCI domains even if extra Root
+Complexes (pxb-pcie) are used.
+
+Each element of the PCI Express hierarchy (Root Complexes,
+PCI Express Root Ports, PCI Express Downstream/Upstream ports)
+takes up bus numbers. Since only one (multifunction) device
+can be attached to a PCI Express Root Port or PCI Express Downstream
+Port, it is advised to plan in advance for the expected number of
+devices to prevent bus number starvation.
+
+
+5. Hot Plug
+============
+The PCI Express root buses (pcie.0 and the buses exposed by pxb-pcie devices)
+do not support hot-plug, so any devices plugged into Root Complexes
+cannot be hot-plugged/hot-unplugged:
+    (1) PCI Express Integrated Devices
+    (2) PCI Express Root Ports
+    (3) DMI-PCI bridges
+    (4) pxb-pcie
+
+PCI devices can be hot-plugged into PCI-PCI bridges, but not
+into DMI-PCI bridges.
+The PCI hotplug is ACPI based and can work side by side with the
+PCI Express native hotplug.
+
+PCI Express devices can be natively hot-plugged/hot-unplugged into/from
+PCI Express Root Ports (and PCI Express Downstream Ports).
+
+5.1 Planning for hotplug:
+    (1) PCI hierarchy
+        Leave enough PCI-PCI bridge slots empty or add one
+        or more empty PCI-PCI bridges to the DMI-PCI bridge.
+
+        For each such bridge the Guest Firmware is expected to reserve 4K IO
+        space and 2M MMIO range to be used for all devices behind it.
+
+        Because of the hard IO limit of around 10 PCI bridges (~ 40K space) per system
+        don't use more than 9 bridges, leaving 4K for the Integrated devices
+        and none for the PCI Express Hierarchy.
+
+    (2) PCI Express hierarchy:
+        Leave enough PCI Express Root Ports empty. Use multifunction
+        PCI Express Root Ports to avoid running out of PCI bus numbers.
+        Don't use PCI Express Switches if you don't have to; they use
+        an extra PCI bus that may come in handy for plugging in another device later.
+
+5.2 Hot plug example:
+Using HMP: (add -monitor stdio to QEMU command line)
+  device_add <dev>,id=<id>,bus=<PCI Express Root Port Id/PCI Express Downstream Port Id/PCI-PCI bridge Id>
+
+
+6. Device assignment
+====================
+Host devices are mostly PCI Express and should be plugged only into
+PCI Express Root Ports or PCI Express Downstream Ports.
+PCI-PCI bridge slots can be used for legacy PCI host devices.
+
+6.1 How to detect if a device is PCI Express:
+  > lspci -s 03:00.0 -v (as root)
+
+    03:00.0 Network controller: Intel Corporation Wireless 7260 (rev 83)
+    Subsystem: Intel Corporation Dual Band Wireless-AC 7260
+    Flags: bus master, fast devsel, latency 0, IRQ 50
+    Memory at f0400000 (64-bit, non-prefetchable) [size=8K]
+    Capabilities: [c8] Power Management version 3
+    Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
+    Capabilities: [40] Express Endpoint, MSI 00
+
+    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+    Capabilities: [100] Advanced Error Reporting
+    Capabilities: [140] Device Serial Number 7c-7a-91-ff-ff-90-db-20
+    Capabilities: [14c] Latency Tolerance Reporting
+    Capabilities: [154] Vendor Specific Information: ID=cafe Rev=1 Len=014
+
+
+7. Virtio devices
+=================
+Virtio devices plugged into the PCI hierarchy or as Integrated Devices
+will remain PCI and have transitional behaviour by default.
+Transitional virtio devices work in both IO and MMIO modes depending on
+the guest support.
+
+Virtio devices plugged into PCI Express ports are PCI Express devices and
+have "1.0" behaviour by default, without IO support.
+In both cases the disable-* properties can be used to override the behaviour.
+
+Note that setting disable-legacy=off will enable legacy mode
+for PCI Express virtio devices, causing them to
+require IO space, which, given our PCI Express hierarchy, may quickly
+lead to resource exhaustion, and is therefore strongly discouraged.
+
+
+8. Conclusion
+==============
+The proposal offers a usage model that is easy to understand and follow
+and at the same time overcomes the PCI Express architecture limitations.
+
-- 
2.5.5
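
The multi-function layout from section 2.2.2 is easy to get wrong by hand once
more than a few Root Ports are needed. The following is a minimal Python sketch,
not part of the patch or of QEMU, that generates the -device options for a given
number of Root Ports, eight functions per pcie.0 slot. The helper name
root_port_args, the first_slot default, the sequential numbering of chassis and
slot, and the assumption that slot 0 of the root bus is already taken are all
illustrative choices; only the ioh3420 properties shown in 2.2.2 are used.

```python
# Hypothetical helper (not part of QEMU): emits "-device ioh3420,..." options
# following the pattern of section 2.2.2, packing 8 Root Ports per slot.

def root_port_args(count, first_slot=2, bus="pcie.0"):
    """Return a QEMU argument list for `count` PCI Express Root Ports."""
    if not 1 <= first_slot <= 31:
        raise ValueError("first_slot must be in 1..31 (slot 0 assumed taken)")
    slots_needed = (count + 7) // 8
    if first_slot + slots_needed - 1 > 31:
        raise ValueError("not enough free slots on the root bus")
    args = []
    for i in range(count):
        dev_slot, fn = divmod(i, 8)
        opts = [
            "ioh3420",
            f"id=root_port{i + 1}",
            f"bus={bus}",
            f"chassis={i + 1}",  # assumption: sequential, hence unique
            f"slot={i + 1}",     # assumption: sequential PCIe slot numbers
            f"addr=0x{first_slot + dev_slot:x}.0x{fn:x}",
        ]
        if fn == 0:
            # Function 0 announces that the device is multi-function.
            opts.append("multifunction=on")
        args += ["-device", ",".join(opts)]
    return args


if __name__ == "__main__":
    # Print options for 9 Root Ports: 8 in slot 2, 1 in slot 3.
    generated = root_port_args(9)
    pairs = zip(generated[0::2], generated[1::2])
    print(" \\\n".join(f"{flag} {value}" for flag, value in pairs))
```

The printed options can be pasted into a QEMU command line; devices are then
attached with -device <dev>,bus=root_portN as in 2.2.1.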