Hello Bjorn,
On 9/28/19 12:59 AM, Bjorn Helgaas wrote:
> On Fri, Aug 16, 2019 at 07:50:39PM +0300, Sergey Miroshnichenko wrote:
>> This is yet another approach to fixing an old [1-2] concurrency
>> issue, when:
>>  - two or more devices are being hot-added into a bridge which was
>>    initially empty;
>>  - a bridge with two or more devices is being hot-added;
>>  - during boot, if the BIOS/bootloader/firmware doesn't pre-enable
>>    bridges.
>> The problem is that a bridge is reported as enabled before the MEM/IO
>> bits are actually written to the PCI_COMMAND register, so another
>> driver thread starts memory requests through the not-yet-enabled
>> bridge:
>>        CPU0                                  CPU1
>>
>>        pci_enable_device_mem()               pci_enable_device_mem()
>>          pci_enable_bridge()                   pci_enable_bridge()
>>            pci_is_enabled()
>>              return false;
>>            atomic_inc_return(enable_cnt)
>>            Start actual enabling the bridge
>>            ...                                   pci_is_enabled()
>>            ...                                     return true;
>>            ...                                   Start memory requests <-- FAIL
>>            ...
>>            Set the PCI_COMMAND_MEMORY bit <-- Must wait for this
>> Protect pci_enable/disable_device() and pci_enable_bridge() in a way
>> similar to the previous solution from commit 40f11adc7cd9 ("PCI:
>> Avoid race while enabling upstream bridges"), but adding per-device
>> mutexes and preventing dev->enable_cnt from incrementing early.
> 
> This isn't directly related to the movable BARs functionality; is it
> here because you see the problem more frequently when moving BARs?
The first two patches of this series (including this one) are fixes
for boot and for hotplug, not related to movable BARs; the race window
is sketched below.
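
Roughly, the unpatched paths in drivers/pci/pci.c look like this
(simplified: power management, the BAR mask computation and some error
handling are elided):

    static void pci_enable_bridge(struct pci_dev *dev)
    {
            struct pci_dev *bridge = pci_upstream_bridge(dev);
            int retval;

            if (bridge)
                    pci_enable_bridge(bridge);

            /* CPU1 passes this check as soon as CPU0 has bumped
             * enable_cnt, even though PCI_COMMAND hasn't been
             * written yet:
             */
            if (pci_is_enabled(dev)) {
                    if (!dev->is_busmaster)
                            pci_set_master(dev);
                    return;
            }

            retval = pci_enable_device(dev);
            if (retval)
                    pci_err(dev, "Error enabling bridge (%d), continuing\n",
                            retval);
            pci_set_master(dev);
    }

    static int pci_enable_device_flags(struct pci_dev *dev,
                                       unsigned long flags)
    {
            struct pci_dev *bridge;
            int bars = 0;   /* BAR mask computation elided */
            int err;

            /* From here on pci_is_enabled() returns true... */
            if (atomic_inc_return(&dev->enable_cnt) > 1)
                    return 0;       /* already enabled */

            bridge = pci_upstream_bridge(dev);
            if (bridge)
                    pci_enable_bridge(bridge);

            /* ...but MEM/IO decoding is only turned on here: */
            err = do_pci_enable_device(dev, bars);
            if (err < 0)
                    atomic_dec(&dev->enable_cnt);

            return err;
    }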
Before these fixes, we were suffering from this issue on PowerNV until
commit db2173198b95 ("powerpc/powernv/pci: Work around races in PCI
bridge enabling") was backported to distros: NVMe drives randomly
failed to start during system boot. So we've tested these fixes with
that commit reverted.
On x86 the BIOS does pre-enable the bridges, but they were still prone
to this race when hot-added or initially "empty".
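
To illustrate the direction of the fix, here is a minimal sketch of
the serialized enable path, assuming a new per-device mutex; the field
name "enable_mutex" is only illustrative, not necessarily what the
patch ends up with:

    static int pci_enable_device_flags(struct pci_dev *dev,
                                       unsigned long flags)
    {
            struct pci_dev *bridge = pci_upstream_bridge(dev);
            int bars = 0;   /* BAR mask computation elided */
            int err = 0;

            /* Enable the upstream bridge first, under its own lock */
            if (bridge)
                    pci_enable_bridge(bridge);

            mutex_lock(&dev->enable_mutex);

            /* Re-check under the lock: only the first caller does the
             * actual enabling, later callers just take a reference
             */
            if (pci_is_enabled(dev)) {
                    atomic_inc(&dev->enable_cnt);
                    goto unlock;
            }

            err = do_pci_enable_device(dev, bars);  /* writes PCI_COMMAND */

            /* The counter becomes non-zero only after PCI_COMMAND has
             * been written, so pci_is_enabled() can no longer return
             * true for a half-enabled bridge
             */
            if (!err)
                    atomic_inc(&dev->enable_cnt);

    unlock:
            mutex_unlock(&dev->enable_mutex);
            return err;
    }

pci_disable_device() would take the same mutex around the decrement
and the clearing of the PCI_COMMAND bits, so enable/disable
transitions are fully serialized per device.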
Serge