[+cc moubingquan, Ian, Koba, Krzysztof H, Koen, Pali, Artem,
Korneliusz, Alexander
+bcc vsd
in case any of you can verify that this fixes the issue you reported]

On Wed, Apr 22, 2026 at 04:13:43PM +0000, Krzysztof Wilczyński wrote:
> Hello,
> 
> This series converts every dynamically allocated PCI sysfs attribute to
> a static const definition.  After the full series, pci_sysfs_init() and
> sysfs_initialized are gone, and every sysfs file is created by the
> driver model at device_add() time.
> 
> Currently, the PCI resource files (resourceN, resourceN_wc) and the
> legacy bus files (legacy_io, legacy_mem) are created dynamically
> from two unsynchronised paths:
> 
> Path A: late_initcall
> 
>   pci_sysfs_init                        (late_initcall)
>     sysfs_initialized = 1
>     for_each_pci_dev
>       pci_create_sysfs_dev_files
>         sysfs_create_bin_file           (resourceN, resourceN_wc)
>     pci_find_next_bus
>       pci_create_legacy_files
>         sysfs_create_bin_file           (legacy_io, legacy_mem)
> 
> Path B: device registration / hotplug
> 
>   pci_bus_add_devices
>     pci_bus_add_device
>       pci_create_sysfs_dev_files
>         if (!sysfs_initialized) return  <- only guard
>         sysfs_create_bin_file           (resourceN, resourceN_wc)
> 
> On most ACPI systems this does not race because PCI enumeration
> completes at subsys_initcall time, before pci_sysfs_init() runs:
> 
>   subsys_initcall (level 4):
>     acpi_pci_root_add
>       pci_bus_add_device
>         pci_create_sysfs_dev_files
>           if (!sysfs_initialized)          <- not yet set
>             return -EACCES
> 
>   late_initcall (level 7):
>     pci_sysfs_init
>       sysfs_initialized = 1
>       for_each_pci_dev
>         pci_create_sysfs_dev_files         <- creates the files, no race
> 
> On Devicetree platforms the host controller is a platform driver that
> probes via the driver model, often on a workqueue, and overlaps with the
> late_initcall:
> 
>   CPU 0 (late_initcall)                CPU 1 (driver probe)
>   ---------------------------          ----------------------------
>   pci_sysfs_init()
>     sysfs_initialized = 1
>     for_each_pci_dev(pdev)             pci_bus_add_device(pdev)
>       pci_create_sysfs_dev_files()       pci_create_sysfs_dev_files()
>         sysfs_create_bin_file()            sysfs_create_bin_file()
>                                              -> "duplicate filename"
> 
> The same happens on ACPI when probing is asynchronous (hv_pci on
> Azure, RISC-V with ACPI).
> 
> The duplicate causes sysfs_create_bin_file() to fail with -EEXIST.
> pci_create_resource_files() then calls pci_remove_resource_files() in
> its error unwind, tearing down files the other thread created and
> still references through pdev->res_attr[].  This has caused kernel
> panics on i.MX6 and boot failures on other platforms.
> 
> Several different fixes have been proposed over the years: reordering
> the sysfs_initialized assignment, adding locks, checking
> pci_dev_is_added(), setting pdev->res_attr[] to NULL after kfree
> (which only prevents a double-free on the teardown path, not the
> error unwind removing the other thread's files).  None would address the
> root cause.
> 
> This has been reported a few times:
> 
>   - 
> https://lore.kernel.org/linux-pci/[email protected]/
>   - 
> https://lore.kernel.org/linux-pci/[email protected]/
>   - 
> https://lore.kernel.org/linux-pci/[email protected]/
>   - 
> https://lore.kernel.org/linux-pci/sy0p300mb04687548090b73e40af97d8897...@sy0p300mb0468.ausp300.prod.outlook.com/
>   - https://lore.kernel.org/linux-pci/20230105174736.GA1154719@bhelgaas/
>   - https://lore.kernel.org/linux-pci/[email protected]/
>   - https://lore.kernel.org/linux-pci/20200716110423.xtfyb3n6tn5ixedh@pali/
>   - 
> https://lore.kernel.org/linux-pci/[email protected]/
>   - https://bugzilla.kernel.org/show_bug.cgi?id=215515
>   - https://bugzilla.kernel.org/show_bug.cgi?id=216888

Seems like some or all of these should be mentioned in the relevant
patch as "Closes:" tags?

> With static attributes the driver model creates sysfs entries once per
> device at device_add() time, under the device lock, eliminating the
> late_initcall iteration and the race along with it.

Reply via email to