[+cc moubingquan, Ian, Koba, Krzysztof H, Koen, Pali, Artem, Korneliusz, Alexander +bcc vsd in case any of you can verify that this fixes the issue you reported]
On Wed, Apr 22, 2026 at 04:13:43PM +0000, Krzysztof Wilczyński wrote: > Hello, > > This series converts every dynamically allocated PCI sysfs attribute to > a static const definition. After the full series, pci_sysfs_init() and > sysfs_initialized are gone, and every sysfs file is created by the > driver model at device_add() time. > > Currently, the PCI resource files (resourceN, resourceN_wc) and the > legacy bus files (legacy_io, legacy_mem) are created dynamically > from two unsynchronised paths: > > Path A: late_initcall > > pci_sysfs_init (late_initcall) > sysfs_initialized = 1 > for_each_pci_dev > pci_create_sysfs_dev_files > sysfs_create_bin_file (resourceN, resourceN_wc) > pci_find_next_bus > pci_create_legacy_files > sysfs_create_bin_file (legacy_io, legacy_mem) > > Path B: device registration / hotplug > > pci_bus_add_devices > pci_bus_add_device > pci_create_sysfs_dev_files > if (!sysfs_initialized) return <- only guard > sysfs_create_bin_file (resourceN, resourceN_wc) > > On most ACPI systems this does not race because PCI enumeration > completes at subsys_initcall time, before pci_sysfs_init() runs: > > subsys_initcall (level 4): > acpi_pci_root_add > pci_bus_add_device > pci_create_sysfs_dev_files > if (!sysfs_initialized) <- not yet set > return -EACCES > > late_initcall (level 7): > pci_sysfs_init > sysfs_initialized = 1 > for_each_pci_dev > pci_create_sysfs_dev_files <- creates the files, no race > > On Devicetree platforms the host controller is a platform driver that > probes via the driver model, often on a workqueue, and overlaps with the > late_initcall: > > CPU 0 (late_initcall) CPU 1 (driver probe) > --------------------------- ---------------------------- > pci_sysfs_init() > sysfs_initialized = 1 > for_each_pci_dev(pdev) pci_bus_add_device(pdev) > pci_create_sysfs_dev_files() pci_create_sysfs_dev_files() > sysfs_create_bin_file() sysfs_create_bin_file() > -> "duplicate filename" > > The same happens on ACPI when probing is asynchronous (hv_pci on > Azure, RISC-V with ACPI). > > The duplicate causes sysfs_create_bin_file() to fail with -EEXIST. > pci_create_resource_files() then calls pci_remove_resource_files() in > its error unwind, tearing down files the other thread created and > still references through pdev->res_attr[]. This has caused kernel > panics on i.MX6 and boot failures on other platforms. > > Several different fixes have been proposed over the years: reordering > the sysfs_initialized assignment, adding locks, checking > pci_dev_is_added(), setting pdev->res_attr[] to NULL after kfree > (which only prevents a double-free on the teardown path, not the > error unwind removing the other thread's files). None would address the > root cause. > > This has been reported a few times: > > - > https://lore.kernel.org/linux-pci/[email protected]/ > - > https://lore.kernel.org/linux-pci/[email protected]/ > - > https://lore.kernel.org/linux-pci/[email protected]/ > - > https://lore.kernel.org/linux-pci/sy0p300mb04687548090b73e40af97d8897...@sy0p300mb0468.ausp300.prod.outlook.com/ > - https://lore.kernel.org/linux-pci/20230105174736.GA1154719@bhelgaas/ > - https://lore.kernel.org/linux-pci/[email protected]/ > - https://lore.kernel.org/linux-pci/20200716110423.xtfyb3n6tn5ixedh@pali/ > - > https://lore.kernel.org/linux-pci/[email protected]/ > - https://bugzilla.kernel.org/show_bug.cgi?id=215515 > - https://bugzilla.kernel.org/show_bug.cgi?id=216888 Seems like some or all of these should be mentioned in the relevant patch as "Closes:" tags? > With static attributes the driver model creates sysfs entries once per > device at device_add() time, under the device lock, eliminating the > late_initcall iteration and the race along with it.
