New systems with Thunderbolt are starting to use native PCI enumeration.
Mika Westerberg's patch "PCI: Distribute available resources to hotplug
capable PCIe downstream ports"
(https://patchwork.kernel.org/patch/9972155/) adds code to expand
downstream PCI hotplug bridges to consume all remaining resource space
in the parent bridge. It is a crucial patch for supporting Thunderbolt
native enumeration on Linux.

However, it does not consider bridge alignment in all cases, which rules
out hot-adding an external graphics processor at the end of a
Thunderbolt daisy chain. Hot-adding such a device will likely fail with
the existing code. It also might disrupt the operation of existing
devices, or prevent the subsequent hot-plugging of lower-aligned
devices, if the kernel frees and reallocates upstream resources in an
attempt to assign the resources that failed in the first pass. Under
Intel's policy, Thunderbolt external graphics processors are generally
meant to function only as the first and only device in the chain.
However, there is no technical reason that prevents it from being
possible if sufficient resources are available, and people are likely to
attempt such configurations with Thunderbolt devices if they own such
hardware. Hence, I argue that we should improve the user experience and
reduce the number of constraints placed on the user by always
considering resource alignment, which will make such configurations
possible.

The other problem with the patch is that it is incompatible with
resources allocated by "pci=hpmemsize=nnM" due to a check which does not
allow for dev_res->add_size to be reduced. This check also makes a big
assumption that the hpmemsize is small but non-zero, and no action has
been taken to ensure that. In the current state, any bridge smaller than
hpmemsize will likely fail to size correctly, which would cause major
issues if the default were to change, or if the user also wanted to
configure non-Thunderbolt hotplug bridges at the same time. I argue that
if assumptions and limitations can be removed with no risks and adverse
effects, then it should be done.

The former problem is solved by rewriting the
pci_bus_distribute_available_resources() function with more information
passed in the arguments, eliminating assumptions about the initial
bridge alignment. My patch makes no assumptions about the alignment of
hot-added bridges, allowing any device to be hot-added, provided that
sufficient resources are available.

The latter problem is solved by removing the check preventing the
shrinking of dev_res->add_size, which allows for the distribution of
resources if hpmemsize is non-zero. It can be made to work with zero
hpmemsize with two-line patches in pbus_size_mem() and pbus_size_io(),
or by modifying extend_bridge_window() to add a new entry to the
additional resource list if one does not exist. In theory, and by my
testing, the removal of this check does not affect the functionality of
the resource distribution with firmware-allocated resources. But it does
enable the same functionality when using pci=hpmemsize=nnM, which was
not possible previously. This may be one piece of the puzzle needed to
support Thunderbolt add-in cards that support native PCI enumeration,
without any platform dependencies.

I have tested this proposed patch extensively. Using Linux-allocated
resources, it works perfectly. I have two Gigabyte GC-TITAN RIDGE
Thunderbolt 3 add-in cards in my desktop, and a Dell XPS 9370 with the
Dell XPS 9380 Thunderbolt NVM40 firmware flashed. My peripherals are
three HP ZBook Thunderbolt 3 docks, two external graphics enclosures
with AMD Fiji XT in both, a Promise SANLink3 N1 (AQC107S), and a Dell
Thunderbolt 3 NVMe SSD. All configurations of these devices worked well,
and I can no longer make it fail if I try to. My testing with
firmware-allocated resources is limited due to using computers with
Alpine Ridge BIOS support. However, I did manage to test the patch
with firmware-allocated resources by enabling the Alpine Ridge BIOS
support and forcing pcie_ports=native, and the results were perfect.

Mika Westerberg has agreed to test this on an official platform with
native enumeration firmware support to be sure that it works in this
situation. It is also appropriate that he reviews these changes as he
wrote the original code and any changes made to Thunderbolt-critical
code cannot be allowed to break any of the base requirements for
Thunderbolt. I would like to thank him for putting up with my emails and
trying to help where he can, and for the great work he has done on
Thunderbolt in Linux.

I have more patches in the pipeline to further improve the Thunderbolt
experience on Linux on platforms without BIOS support. This is the most
technical but least user-facing one planned. The most exciting changes
are yet to come.

Edits:

I have made code styling changes as suggested by Mika Westerberg.

I have been testing Thunderbolt devices with my other host card which
happens to be in SL0 mode. This means that devices are discovered much
more quickly. I noticed that multiple devices can be enumerated
together, rather than each getting enumerated before the next appears.
It turns out that this can break the allocation, but I have absolutely
no idea why. I have modified the patch to solve this problem. Before,
extend_bridge_window() used add_size to change the resource size. Now it
simply changes the size of the actual resource, and clears the add_size
to zero if a list entry exists. That solves the issue, and proves that
the calculated resource sizes are not at fault (the algorithm is sound).
Hence, I recommend this version with the modification be applied.

I have removed Mika Westerberg's "Tested-by" line to allow him to give
his approval for this new change.

Observation: the fact that a single Thunderbolt dock can consume 1/4 of
the total I/O space (16-bit, 0xffff) is evidence that the deprecated I/O
BARs need to be dropped from the kernel at some point. The docks have
four bridges each, with 4096-byte alignment. The I/O BARs do not do
anything, crash the system if they are not claimed, and spam dmesg when
they are not assigned.

Signed-off-by: Nicholas-Johnson-opensource 
<nicholas.johnson-opensou...@outlook.com.au>
---
 drivers/pci/setup-bus.c | 188 +++++++++++++++++++++-------------------
 1 file changed, 99 insertions(+), 89 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index ed960436df5e..09310b6fcdb3 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1859,27 +1859,34 @@ static void extend_bridge_window(struct pci_dev *bridge, struct resource *res,
        if (res->parent)
                return;
 
-       if (resource_size(res) >= available)
-               return;
+       /*
+        * Hot-adding multiple Thunderbolt devices in SL0 might result in
+        * multiple devices being enumerated together. This can break the
+        * resource allocation if the resource sizes are specified with
+        * add_size instead of simply changing the resource size.
+        */
+       pci_dbg(bridge, "bridge window %pR extended by 0x%016llx\n", res,
+               available - resource_size(res));
+       res->end = res->start + available - 1;
 
+       /*
+        * If a list entry exists, we need to remove any additional size
+        * requested because that could interfere with the alignment and
+        * sizing done when distributing resources, causing resources to
+        * fail to allocate later on.
+        */
        dev_res = res_to_dev_res(add_list, res);
        if (!dev_res)
                return;
 
-       /* Is there room to extend the window? */
-       if (available - resource_size(res) <= dev_res->add_size)
-               return;
-
-       dev_res->add_size = available - resource_size(res);
-       pci_dbg(bridge, "bridge window %pR extended by %pa\n", res,
-               &dev_res->add_size);
+       dev_res->add_size = 0;
 }
 
 static void pci_bus_distribute_available_resources(struct pci_bus *bus,
-       struct list_head *add_list, resource_size_t available_io,
-       resource_size_t available_mmio, resource_size_t available_mmio_pref)
+       struct list_head *add_list, struct resource io,
+       struct resource mmio, struct resource mmio_pref)
 {
-       resource_size_t remaining_io, remaining_mmio, remaining_mmio_pref;
+       resource_size_t io_per_hp, mmio_per_hp, mmio_pref_per_hp, align;
        unsigned int normal_bridges = 0, hotplug_bridges = 0;
        struct resource *io_res, *mmio_res, *mmio_pref_res;
        struct pci_dev *dev, *bridge = bus->self;
@@ -1889,25 +1896,32 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
        mmio_pref_res = &bridge->resource[PCI_BRIDGE_RESOURCES + 2];
 
        /*
-        * Update additional resource list (add_list) to fill all the
-        * extra resource space available for this port except the space
-        * calculated in __pci_bus_size_bridges() which covers all the
-        * devices currently connected to the port and below.
+        * The alignment of this bridge is yet to be considered, hence it must
+        * be done now before extending its bridge window. A single bridge
+        * might not be able to occupy the whole parent region if the alignment
+        * differs - for example, an external GPU at the end of a Thunderbolt
+        * daisy chain.
         */
-       extend_bridge_window(bridge, io_res, add_list, available_io);
-       extend_bridge_window(bridge, mmio_res, add_list, available_mmio);
-       extend_bridge_window(bridge, mmio_pref_res, add_list,
-                            available_mmio_pref);
+       align = pci_resource_alignment(bridge, io_res);
+       if (!io_res->parent && align)
+               io.start = ALIGN(io.start, align);
+
+       align = pci_resource_alignment(bridge, mmio_res);
+       if (!mmio_res->parent && align)
+               mmio.start = ALIGN(mmio.start, align);
+
+       align = pci_resource_alignment(bridge, mmio_pref_res);
+       if (!mmio_pref_res->parent && align)
+               mmio_pref.start = ALIGN(mmio_pref.start, align);
 
        /*
-        * Calculate the total amount of extra resource space we can
-        * pass to bridges below this one. This is basically the
-        * extra space reduced by the minimal required space for the
-        * non-hotplug bridges.
+        * Update the resources to fill as much remaining resource space in the
+        * parent bridge as possible, while considering alignment.
         */
-       remaining_io = available_io;
-       remaining_mmio = available_mmio;
-       remaining_mmio_pref = available_mmio_pref;
+       extend_bridge_window(bridge, io_res, add_list, resource_size(&io));
+       extend_bridge_window(bridge, mmio_res, add_list, resource_size(&mmio));
+       extend_bridge_window(bridge, mmio_pref_res, add_list,
+               resource_size(&mmio_pref));
 
        /*
         * Calculate how many hotplug bridges and normal bridges there
@@ -1921,80 +1935,79 @@ static void pci_bus_distribute_available_resources(struct pci_bus *bus,
                        normal_bridges++;
        }
 
+       /*
+        * There is only one bridge on the bus so it gets all possible
+        * resources which it can then distribute to the possible
+        * hotplug bridges below.
+        */
+       if (hotplug_bridges + normal_bridges == 1) {
+               dev = list_first_entry(&bus->devices, struct pci_dev, bus_list);
+               if (dev->subordinate)
+                       pci_bus_distribute_available_resources(dev->subordinate,
+                               add_list, io, mmio, mmio_pref);
+               return;
+       }
+
+       /*
+        * Reduce the available resource space by what the
+        * bridge and devices below it occupy.
+        */
        for_each_pci_bridge(dev, bus) {
-               const struct resource *res;
+               struct resource *res;
+               resource_size_t used_size;
 
                if (dev->is_hotplug_bridge)
                        continue;
 
-               /*
-                * Reduce the available resource space by what the
-                * bridge and devices below it occupy.
-                */
                res = &dev->resource[PCI_BRIDGE_RESOURCES + 0];
-               if (!res->parent && available_io > resource_size(res))
-                       remaining_io -= resource_size(res);
+               align = pci_resource_alignment(dev, res);
+               align = align ? ALIGN(io.start, align) - io.start : 0;
+               used_size = align + resource_size(res);
+               if (!res->parent && used_size <= resource_size(&io))
+                       io.start += used_size;
 
                res = &dev->resource[PCI_BRIDGE_RESOURCES + 1];
-               if (!res->parent && available_mmio > resource_size(res))
-                       remaining_mmio -= resource_size(res);
+               align = pci_resource_alignment(dev, res);
+               align = align ? ALIGN(mmio.start, align) - mmio.start : 0;
+               used_size = align + resource_size(res);
+               if (!res->parent && used_size <= resource_size(&mmio))
+                       mmio.start += used_size;
 
                res = &dev->resource[PCI_BRIDGE_RESOURCES + 2];
-               if (!res->parent && available_mmio_pref > resource_size(res))
-                       remaining_mmio_pref -= resource_size(res);
+               align = pci_resource_alignment(dev, res);
+               align = align ? ALIGN(mmio_pref.start, align) -
+                               mmio_pref.start : 0;
+               used_size = align + resource_size(res);
+               if (!res->parent && used_size <= resource_size(&mmio_pref))
+                       mmio_pref.start += used_size;
        }
 
-       /*
-        * There is only one bridge on the bus so it gets all available
-        * resources which it can then distribute to the possible
-        * hotplug bridges below.
-        */
-       if (hotplug_bridges + normal_bridges == 1) {
-               dev = list_first_entry(&bus->devices, struct pci_dev, bus_list);
-               if (dev->subordinate) {
-                       pci_bus_distribute_available_resources(dev->subordinate,
-                               add_list, available_io, available_mmio,
-                               available_mmio_pref);
-               }
+       if (!hotplug_bridges)
                return;
-       }
 
        /*
-        * Go over devices on this bus and distribute the remaining
-        * resource space between hotplug bridges.
+        * Distribute any remaining resources equally between
+        * the hotplug-capable downstream ports.
         */
-       for_each_pci_bridge(dev, bus) {
-               resource_size_t align, io, mmio, mmio_pref;
-               struct pci_bus *b;
+       io_per_hp = div64_ul(resource_size(&io), hotplug_bridges);
+       mmio_per_hp = div64_ul(resource_size(&mmio), hotplug_bridges);
+       mmio_pref_per_hp = div64_ul(resource_size(&mmio_pref),
+               hotplug_bridges);
 
-               b = dev->subordinate;
-               if (!b || !dev->is_hotplug_bridge)
+       for_each_pci_bridge(dev, bus) {
+               if (!dev->subordinate || !dev->is_hotplug_bridge)
                        continue;
 
-               /*
-                * Distribute available extra resources equally between
-                * hotplug-capable downstream ports taking alignment into
-                * account.
-                *
-                * Here hotplug_bridges is always != 0.
-                */
-               align = pci_resource_alignment(bridge, io_res);
-               io = div64_ul(available_io, hotplug_bridges);
-               io = min(ALIGN(io, align), remaining_io);
-               remaining_io -= io;
+               io.end = io.start + io_per_hp - 1;
+               mmio.end = mmio.start + mmio_per_hp - 1;
+               mmio_pref.end = mmio_pref.start + mmio_pref_per_hp - 1;
 
-               align = pci_resource_alignment(bridge, mmio_res);
-               mmio = div64_ul(available_mmio, hotplug_bridges);
-               mmio = min(ALIGN(mmio, align), remaining_mmio);
-               remaining_mmio -= mmio;
+               pci_bus_distribute_available_resources(dev->subordinate,
+                       add_list, io, mmio, mmio_pref);
 
-               align = pci_resource_alignment(bridge, mmio_pref_res);
-               mmio_pref = div64_ul(available_mmio_pref, hotplug_bridges);
-               mmio_pref = min(ALIGN(mmio_pref, align), remaining_mmio_pref);
-               remaining_mmio_pref -= mmio_pref;
-
-               pci_bus_distribute_available_resources(b, add_list, io, mmio,
-                                                      mmio_pref);
+               io.start = io.end + 1;
+               mmio.start = mmio.end + 1;
+               mmio_pref.start = mmio_pref.end + 1;
        }
 }
 
@@ -2002,22 +2015,19 @@ static void
 pci_bridge_distribute_available_resources(struct pci_dev *bridge,
                                          struct list_head *add_list)
 {
-       resource_size_t available_io, available_mmio, available_mmio_pref;
-       const struct resource *res;
+       struct resource io_res, mmio_res, mmio_pref_res;
 
        if (!bridge->is_hotplug_bridge)
                return;
 
+       io_res = bridge->resource[PCI_BRIDGE_RESOURCES + 0];
+       mmio_res = bridge->resource[PCI_BRIDGE_RESOURCES + 1];
+       mmio_pref_res = bridge->resource[PCI_BRIDGE_RESOURCES + 2];
+
        /* Take the initial extra resources from the hotplug port */
-       res = &bridge->resource[PCI_BRIDGE_RESOURCES + 0];
-       available_io = resource_size(res);
-       res = &bridge->resource[PCI_BRIDGE_RESOURCES + 1];
-       available_mmio = resource_size(res);
-       res = &bridge->resource[PCI_BRIDGE_RESOURCES + 2];
-       available_mmio_pref = resource_size(res);
 
        pci_bus_distribute_available_resources(bridge->subordinate,
-               add_list, available_io, available_mmio, available_mmio_pref);
+               add_list, io_res, mmio_res, mmio_pref_res);
 }
 
 void pci_assign_unassigned_bridge_resources(struct pci_dev *bridge)
-- 
2.19.1
