This series is based on top of my previously posted series which reworks how devices are added to their IOMMU groups. The two series are largely orthogonal to each other, but they both touch pnv_pci_ioda_dma_dev_setup(), so there's a minor merge conflict if they aren't applied together. I can fix that if people think it's important, but applying them together is probably easiest for everyone.

Base series: https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=168715

With that out of the way, the bulk of the changes in here are in 2/4, which moves the point where we do the HW configuration to allow a bus to be used. Currently it's done when we set up the parent bridge for that bus, and we're moving it to be done when we add the first device to that bus.

For an example of why this change is necessary, this is what happens on the current linux-next master. This has one extra patch applied to print an error when pci_enable_device() is blocked by the platform, since it helps highlight the issue:

/sys/devices/pci0022:00/0022:00:00.0 # echo 1 > 0022\:01\:00.0/remove
e1000e 0022:01:00.0 enP34p1s0: removed PHC
e1000e 0022:01:00.0 enP34p1s0: NIC Link is Down
pci 0022:01:00.0: Removing from iommu group 11

At this point the bus 0022:01 is empty.

/sys/devices/pci0022:00/0022:00:00.0 # echo 1 > rescan
pci 0022:01:00.0: [8086:10d3] type 00 class 0x020000
pci 0022:01:00.0: reg 0x10: [mem 0x3fe9000c0000-0x3fe9000dffff]
pci 0022:01:00.0: reg 0x14: [mem 0x3fe900000000-0x3fe90007ffff]
pci 0022:01:00.0: reg 0x18: [io 0x0000-0x001f]
pci 0022:01:00.0: reg 0x1c: [mem 0x3fe9000e0000-0x3fe9000e3fff]
pci 0022:01:00.0: reg 0x30: [mem 0x00000000-0x0003ffff pref]
pci 0022:01:00.0: BAR3 [mem size 0x00004000]: requesting alignment to 0x10000
pci 0022:01:00.0: PME# supported from D0 D3hot D3cold
pci 0022:00:00.0: BAR 13: no space for [io size 0x1000]
pci 0022:00:00.0: BAR 13: failed to assign [io size 0x1000]
pci 0022:01:00.0: BAR 1: assigned [mem 0x3fe900000000-0x3fe90007ffff]
pci 0022:01:00.0: BAR 6: assigned [mem 0x3fe900080000-0x3fe9000bffff pref]
pci 0022:01:00.0: BAR 0: assigned [mem 0x3fe9000c0000-0x3fe9000dffff]
pci 0022:01:00.0: BAR 3: assigned [mem 0x3fe9000e0000-0x3fe9000e3fff]
pci 0022:01:00.0: BAR 2: no space for [io size 0x0020]
pci 0022:01:00.0: BAR 2: failed to assign [io size 0x0020]
e1000e 0022:01:00.0: pci_enable_device() blocked, no PE assigned.
e1000e: probe of 0022:01:00.0 failed with error -22

So on rescan we can re-discover the device, but the driver probe will always fail at the point where the driver attempts to enable the device, because the PE was deconfigured when the device was removed.

Repeating this same experiment with this series (and dependency) applied:

/sys/devices/pci0022:00/0022:00:00.0 # echo 1 > rescan
pci 0022:01:00.0: [8086:10d3] type 00 class 0x020000
pci 0022:01:00.0: reg 0x10: [mem 0x3fe9000c0000-0x3fe9000dffff]
pci 0022:01:00.0: reg 0x14: [mem 0x3fe900000000-0x3fe90007ffff]
pci 0022:01:00.0: reg 0x18: [io 0x0000-0x001f]
pci 0022:01:00.0: reg 0x1c: [mem 0x3fe9000e0000-0x3fe9000e3fff]
pci 0022:01:00.0: reg 0x30: [mem 0x00000000-0x0003ffff pref]
pci 0022:01:00.0: BAR3 [mem size 0x00004000]: requesting alignment to 0x10000
pci 0022:01:00.0: PME# supported from D0 D3hot D3cold
pci 0022:00:00.0: BAR 13: no space for [io size 0x1000]
pci 0022:00:00.0: BAR 13: failed to assign [io size 0x1000]
pci 0022:01:00.0: BAR 1: assigned [mem 0x3fe900000000-0x3fe90007ffff]
pci 0022:01:00.0: BAR 6: assigned [mem 0x3fe900080000-0x3fe9000bffff pref]
pci 0022:01:00.0: BAR 0: assigned [mem 0x3fe9000c0000-0x3fe9000dffff]
pci 0022:01:00.0: BAR 3: assigned [mem 0x3fe9000e0000-0x3fe9000e3fff]
pci 0022:01:00.0: BAR 2: no space for [io size 0x0020]
pci 0022:01:00.0: BAR 2: failed to assign [io size 0x0020]
pci_bus 0022:01: Configuring PE for bus
pci 0022:01 : [PE# fd] Secondary bus 0x0000000000000001 associated with PE#fd
pci 0022:01 : [PE# fd] Setting up 32-bit TCE table at 0..80000000
pci 0022:01 : [PE# fd] Setting up window#0 0..7fffffffff pg=10000
pci 0022:01 : [PE# fd] Enabling 64-bit DMA bypass
pci 0022:01:00.0: Configured PE#fd
pci 0022:01:00.0: Adding to iommu group 12
e1000e 0022:01:00.0: enabling device (0140 -> 0142)
e1000e 0022:01:00.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
e1000e 0022:01:00.0 0022:01:00.0 (uninitialized): registered PHC clock
e1000e 0022:01:00.0 eth0: (PCI Express:2.5GT/s:Width x1) 68:05:ca:37:9c:d7
e1000e 0022:01:00.0 enP34p1s0: renamed from eth0
e1000e 0022:01:00.0 enP34p1s0: Intel(R) PRO/1000 Network Connection
e1000e 0022:01:00.0 enP34p1s0: MAC: 3, PHY: 8, PBA No: E46981-008
/sys/devices/pci0022:00/0022:00:00.0 #

Now, when the rescan happens, we notice the PE was deconfigured after removing the device and re-configure it. This allows the device to be enabled and everything works. Probably.

Making this change also lays the groundwork for allowing devices to be added to a bus PE as they're enabled, rather than mapping all 256 devfns on a bus to the PE in one go. This is going to be necessary for supporting the native PCIe hotplug driver (rather than pnv_php), since currently scanning an empty slot causes spurious PE freezes. Keeping inactive devices mapped to the reserved PE would prevent that from occurring.

It might also be useful for (ab)using PEs to provide per-device IOMMU contexts rather than per-bus ones. A per-device context would also be necessary for allowing individual functions of a device to be passed through to guests, rather than requiring all of them to be passed as a group.

Oliver