Hi Srinath,

Thanks for chasing this down!  It must have been a real hassle to find
this.

On Mon, May 08, 2017 at 08:39:50PM +0530, Srinath Mannam wrote:
> We found a concurrency issue during NVMe init when we initialize
> multiple NVMe devices connected behind a PCIe switch.
> 
> Setup details:
>  - SMP system with 8 ARMv8 cores running Linux kernel 4.11.
>  - Two NVMe cards are connected to PCIe RC through bridge as shown
>    in the below figure.
> 
>                    [RC]
>                     |
>                  [BRIDGE]
>                     |
>                -----------
>               |           |
>             [NVMe]      [NVMe]
> 
> Issue description:
> After PCIe enumeration completes, the NVMe driver's probe function is
> called for both devices from two CPUs simultaneously.
> From nvme_probe, pci_enable_device_mem is called for both endpoints.
> This function calls pci_enable_bridge recursively up to the RC.
> 
> Inside the pci_enable_bridge function, the concurrency issue is
> observed at two places.
> 
> Place 1:
>   CPU 0:
>     1. Atomically increments dev->enable_cnt in
>        pci_enable_device_flags
>     2. Enters pci_enable_resources
>     3. Completes pci_read_config_word(dev, PCI_COMMAND, &cmd)
>     4. Is ready to set PCI_COMMAND_MEMORY (0x2) via
>        pci_write_config_word(dev, PCI_COMMAND, cmd)
>   CPU 1:
>     1. Checks pci_is_enabled in pci_enable_bridge;
>        it is true
>     2. Checks (!dev->is_busmaster); also true
>     3. Enters pci_set_master
>     4. Completes pci_read_config_word(dev, PCI_COMMAND, &old_cmd)
>     5. Is ready to set PCI_COMMAND_MASTER (0x4) via
>        pci_write_config_word(dev, PCI_COMMAND, cmd)
> 
> At the last step, both CPUs have read the value 0 and are ready to
> write 2 and 4 respectively.
> After both writes, the final value in the PCI_COMMAND register is 4
> instead of 6.
> 
> Place 2:
>   CPU 0:
>     1. Atomically increments dev->enable_cnt in
>        pci_enable_device_flags
>     2. Enters pci_enable_resources
>     3. Completes pci_read_config_word(dev, PCI_COMMAND, &cmd)
>     4. Is ready to set PCI_COMMAND_MEMORY (0x2) via
>        pci_write_config_word(dev, PCI_COMMAND, cmd);
>   CPU 1:
>     1. Atomically increments dev->enable_cnt in
>        pci_enable_device_flags, finds the device already
>        enabled, and returns 0 from there
>     2. Enters pci_set_master
>     3. Completes pci_read_config_word(dev, PCI_COMMAND, &old_cmd)
>     4. Is ready to set PCI_COMMAND_MASTER (0x4) via
>        pci_write_config_word(dev, PCI_COMMAND, cmd)
> 
> At the last step, both CPUs have read the value 0 and are ready to
> write 2 and 4 respectively.
> After both writes, the final value in the PCI_COMMAND register is 4
> instead of 6.

Most places that write PCI_COMMAND do this sort of read/modify/write
thing.  But in most of those places we only touch one device, and it's
the device we own, e.g., the PCI core enumerating it or a driver.
I don't think we really have a concurrency problem with those.

But in this case, where a driver called pci_enable_device() for device
A, and we're touching an upstream bridge B, we *do* have an issue.
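
Spelled out, both places reduce to an unsynchronized read/modify/write
of the bridge's PCI_COMMAND register.  A minimal sketch of the pattern,
simplified from pci_enable_resources() and pci_set_master() ("bit" is a
stand-in for whichever command bit each CPU is setting):

  u16 cmd;

  /* both CPUs run this on the same bridge, with no lock held */
  pci_read_config_word(dev, PCI_COMMAND, &cmd);   /* both read 0 */
  cmd |= bit;     /* CPU 0: PCI_COMMAND_MEMORY (0x2),
                     CPU 1: PCI_COMMAND_MASTER (0x4) */
  pci_write_config_word(dev, PCI_COMMAND, cmd);   /* last write wins:
                                                     4 instead of 6 */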

Since it's a concurrency issue, the obvious solution is a mutex.  I'm
sure this patch solves it, but it's not obvious to me how it works and
I'd be afraid of breaking it in the future.

Would it work to add a mutex in the struct pci_host_bridge, then hold
it while we enable upstream bridges, e.g., something like the
following?

  bridge = pci_upstream_bridge(dev);
  if (bridge) {
    struct mutex *mutex = &pci_find_host_bridge(dev->bus)->mutex;

    mutex_lock(mutex);
    pci_enable_bridge(bridge);
    mutex_unlock(mutex);
  }
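
That assumes a new "mutex" member in struct pci_host_bridge,
initialized wherever the host bridge structure is allocated.  Roughly
(field name and comment are mine, just a sketch):

  /* include/linux/pci.h, in struct pci_host_bridge: */
  struct mutex mutex;     /* serializes pci_enable_bridge() for all
                             devices below this host bridge */

  /* and wherever we allocate the host bridge: */
  mutex_init(&bridge->mutex);

A per-host-bridge mutex would only serialize the bridge-enable path
under a single root, so devices under different host bridges could
still be enabled in parallel.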

Bjorn

> Signed-off-by: Srinath Mannam <[email protected]>
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 7904d02..6c5744e 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1345,21 +1345,39 @@ static void pci_enable_bridge(struct pci_dev *dev)
>  {
>       struct pci_dev *bridge;
>       int retval;
> +     int err;
> +     int i;
> +     unsigned int bars = 0;
> +     unsigned long flags = IORESOURCE_MEM | IORESOURCE_IO;
>  
>       bridge = pci_upstream_bridge(dev);
>       if (bridge)
>               pci_enable_bridge(bridge);
>  
> -     if (pci_is_enabled(dev)) {
> -             if (!dev->is_busmaster)
> -                     pci_set_master(dev);
> +     if (dev->pm_cap) {
> +             u16 pmcsr;
> +
> +             pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
> +             dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK);
> +     }
> +
> +     if (atomic_inc_return(&dev->enable_cnt) > 1)
> +             return;         /* already enabled */
> +
> +     /* only skip sriov related */
> +     for (i = 0; i <= PCI_ROM_RESOURCE; i++)
> +             if (dev->resource[i].flags & flags)
> +                     bars |= (1 << i);
> +     for (i = PCI_BRIDGE_RESOURCES; i < DEVICE_COUNT_RESOURCE; i++)
> +             if (dev->resource[i].flags & flags)
> +                     bars |= (1 << i);
> +
> +     err = do_pci_enable_device(dev, bars);
> +     if (err < 0) {
> +             atomic_dec(&dev->enable_cnt);
>               return;
>       }
>  
> -     retval = pci_enable_device(dev);
> -     if (retval)
> -             dev_err(&dev->dev, "Error enabling bridge (%d), continuing\n",
> -                     retval);
>       pci_set_master(dev);
>  }
>  
> -- 
> 2.7.4
> 
