> -----Original Message-----
> From: Dexuan Cui <de...@microsoft.com>
> Sent: Thursday, April 18, 2024 9:53 PM
> To: bhelg...@google.com; wei....@kernel.org; KY Srinivasan
> <k...@microsoft.com>; Haiyang Zhang <haiya...@microsoft.com>;
> lpieral...@kernel.org; linux-...@vger.kernel.org
> Cc: linux-hyperv@vger.kernel.org; linux-ker...@vger.kernel.org; Boqun
> Feng <boqun.f...@microsoft.com>; Sunil Muthuswamy
> <sunil...@microsoft.com>; Saurabh Singh Sengar <ssen...@microsoft.com>;
> Dexuan Cui <de...@microsoft.com>
> Subject: [PATCH] PCI: Add a mutex to protect the global list
> pci_domain_busn_res_list
> 
> There has been an effort to make the pci-hyperv driver support
> async-probing to reduce the boot time. With async-probing, multiple
> kernel threads can be running hv_pci_probe() -> create_root_hv_pci_bus()
> -> pci_scan_root_bus_bridge() -> pci_bus_insert_busn_res() at the same
> time to update the global list, causing list corruption.
> 
> Add a mutex to protect the list.
> 
> Signed-off-by: Dexuan Cui <de...@microsoft.com>
> ---
>  drivers/pci/probe.c | 25 ++++++++++++++++++-------
>  1 file changed, 18 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index e19b79821dd6..1327fd820b24 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -37,6 +37,7 @@ LIST_HEAD(pci_root_buses);
>  EXPORT_SYMBOL(pci_root_buses);
> 
>  static LIST_HEAD(pci_domain_busn_res_list);
> +static DEFINE_MUTEX(pci_domain_busn_res_list_lock);
> 
>  struct pci_domain_busn_res {
>       struct list_head list;
> @@ -47,14 +48,22 @@ struct pci_domain_busn_res {
>  static struct resource *get_pci_domain_busn_res(int domain_nr)
>  {
>       struct pci_domain_busn_res *r;
> +     struct resource *ret;
> 
> -     list_for_each_entry(r, &pci_domain_busn_res_list, list)
> -             if (r->domain_nr == domain_nr)
> -                     return &r->res;
> +     mutex_lock(&pci_domain_busn_res_list_lock);
> +
> +     list_for_each_entry(r, &pci_domain_busn_res_list, list) {
> +             if (r->domain_nr == domain_nr) {
> +                     ret = &r->res;
> +                     goto out;
> +             }
> +     }
> 
>       r = kzalloc(sizeof(*r), GFP_KERNEL);
> -     if (!r)
> -             return NULL;
> +     if (!r) {
> +             ret = NULL;
> +             goto out;
> +     }
> 
>       r->domain_nr = domain_nr;
>       r->res.start = 0;
> @@ -62,8 +71,10 @@ static struct resource *get_pci_domain_busn_res(int domain_nr)
>       r->res.flags = IORESOURCE_BUS | IORESOURCE_PCI_FIXED;
> 
>       list_add_tail(&r->list, &pci_domain_busn_res_list);
> -
> -     return &r->res;
> +     ret = &r->res;
> +out:
> +     mutex_unlock(&pci_domain_busn_res_list_lock);
> +     return ret;
>  }

The patch touches common PCI code, so has this bug been there for a while?
Do you have a sample stack trace of the crash?
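
For reference, here is a rough user-space sketch of the find-or-create
pattern the patch serializes. It is not kernel code: pthreads stand in for
the async probe threads, a hand-rolled singly linked list stands in for
list_head, and every name in it (domain_res, get_domain_res,
count_after_concurrent_probes, MAX_THREADS) is made up for illustration.
New entries are also prepended here rather than appended as
list_add_tail() does.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define MAX_THREADS 64

/* Illustrative stand-in for struct pci_domain_busn_res. */
struct domain_res {
	struct domain_res *next;
	int domain_nr;
};

static struct domain_res *res_list;
static pthread_mutex_t res_list_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Find the entry for domain_nr, or allocate and link a new one.
 * Lookup and insertion happen under one lock, so two threads asking
 * for the same domain cannot both miss and both insert.
 */
static struct domain_res *get_domain_res(int domain_nr)
{
	struct domain_res *r;

	pthread_mutex_lock(&res_list_lock);

	for (r = res_list; r; r = r->next)
		if (r->domain_nr == domain_nr)
			goto out;

	r = calloc(1, sizeof(*r));
	if (!r)
		goto out;	/* allocation failed, return NULL */

	r->domain_nr = domain_nr;
	r->next = res_list;	/* prepend; the kernel code appends */
	res_list = r;
out:
	pthread_mutex_unlock(&res_list_lock);
	return r;
}

/* Thread body: one simulated probe looking up its domain. */
static void *probe_thread(void *arg)
{
	return get_domain_res(*(int *)arg);
}

/*
 * Start nthreads simulated probes spread over ndomains domains,
 * wait for them, and return the number of entries on the list.
 */
static int count_after_concurrent_probes(int ndomains, int nthreads)
{
	pthread_t tid[MAX_THREADS];
	int arg[MAX_THREADS];
	struct domain_res *r;
	int i, started = 0, n = 0;

	assert(ndomains >= 1 && nthreads >= 1 && nthreads <= MAX_THREADS);

	for (i = 0; i < nthreads; i++) {
		arg[i] = i % ndomains;
		if (pthread_create(&tid[i], NULL, probe_thread, &arg[i]))
			break;
		started++;
	}
	for (i = 0; i < started; i++)
		pthread_join(tid[i], NULL);

	pthread_mutex_lock(&res_list_lock);
	for (r = res_list; r; r = r->next)
		n++;
	pthread_mutex_unlock(&res_list_lock);

	return n;
}
```

With the lock held across both the search and the insert, calling
count_after_concurrent_probes(4, 32) on an empty list should leave exactly
one entry per domain. Without the lock, duplicate entries or a corrupted
next pointer become possible, which is the kind of failure the commit
message describes.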

I checked pci-hyperv: it doesn't set .driver.probe_type, so
PROBE_DEFAULT_STRATEGY is in effect. driver_allows_async_probing() returns
false unless a kernel or module parameter requests async probing, so async
probing hasn't been exercised here.

If we later change pci-hyperv's probe_type to PROBE_PREFER_ASYNCHRONOUS,
how does that affect probing of the underlying PCI devices within the same
device type?
For example, the MANA driver doesn't set probe_type. Will pci-hyperv's
async probing cause async probing, or potentially nondeterministic naming,
for MANA devices?

Thanks,
- Haiyang

