Dave Jiang wrote:
> The following lockdep splat was observed while kernel auto-online a CXL
> memory region:
> 

The entire spew with timestamps does not need to be saved in the git
history. I would trim it to:

---
 ======================================================
 WARNING: possible circular locking dependency detected
 6.17.0djtest+ #53 Tainted: G        W
 ------------------------------------------------------
 systemd-udevd/3334 is trying to acquire lock:
 ffffffff90346188 (hmem_resource_lock){+.+.}-{4:4}, at: hmem_register_resource+0x31/0x50

 but task is already holding lock:
 ffffffff90338890 ((node_chain).rwsem){++++}-{4:4}, at: blocking_notifier_call_chain+0x2e/0x70

 which lock already depends on the new lock.
 [..]
 Chain exists of:
   hmem_resource_lock --> mem_hotplug_lock --> (node_chain).rwsem

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   rlock((node_chain).rwsem);
                                lock(mem_hotplug_lock);
                                lock((node_chain).rwsem);
   lock(hmem_resource_lock);
---

> The lock ordering can cause potential deadlock. There are instances
> where hmem_resource_lock is taken after (node_chain).rwsem, and vice
> versa. Narrow the scope of hmem_resource_lock in hmem_register_resource()
> to avoid the circular locking dependency. The locking is only needed when
> hmem_active needs to be protected.

It is only strictly necessary for hmem_active, but it happened to be
protecting theoretical concurrent callers of hmem_register_resource(). I
do not think that can happen in practice, but it is called by both the
initial init path and the runtime notifier. The notifier path does:

hmat_callback() -> hmat_register_target()

It seems impossible for that path to add new hmem devices, but it is
burning cycles walking through all the memory ranges associated with a
target only to find that they are already registered. I think that can
be cleaned up with an unlocked check of target->registered.
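
Something like this (untested sketch, written from memory of the
current hmat.c layout where hmat_register_target_devices() walks
target->memregions; the READ_ONCE() early return is the only new bit):

static void hmat_register_target_devices(struct memory_target *target)
{
	struct resource *res;

	/*
	 * Do not bother creating devices if no driver is available to
	 * consume them.
	 */
	if (!IS_ENABLED(CONFIG_DEV_DAX_HMEM))
		return;

	/*
	 * Unlocked fast path for the hotplug notifier: once the target
	 * is registered there are no new ranges for this walk to find.
	 * A racing reader that sees a stale 'false' just repeats the
	 * walk, and hmem_request_resource() picks the winner per range.
	 */
	if (READ_ONCE(target->registered))
		return;

	for (res = target->memregions.child; res; res = res->sibling) {
		int target_nid = pxm_to_node(target->memory_pxm);

		hmem_register_resource(target_nid, res);
	}
}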

If that loses some theoretical race, then your new
hmem_request_resource() will pick a race winner for that target.

Otherwise, the code *looks* like it has a TOCTOU race on
platform_initialized. So it feels like some comments and cleanups are
needed to make that clear.

Really, I think the hmat_callback() path should not be doing *any*
registration work, only update work.
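
I.e. something in this direction (hand-wavy; hmat_update_target() is a
made-up name for whatever the non-registering subset of
hmat_register_target() ends up being):

static int hmat_callback(struct notifier_block *self,
			 unsigned long action, void *arg)
{
	struct memory_target *target;
	struct memory_notify *mnb = arg;
	int pxm, nid = mnb->status_change_nid;

	if (nid == NUMA_NO_NODE || action != MEM_ONLINE)
		return NOTIFY_OK;

	pxm = node_to_pxm(nid);
	target = find_mem_target(pxm);
	if (!target)
		return NOTIFY_OK;

	/*
	 * Only refresh node attributes for the now-online node; device
	 * and platform registration already happened at init time.
	 */
	hmat_update_target(target);
	return NOTIFY_OK;
}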

> Fixes: 7dab174e2e27 ("dax/hmem: Move hmem device registration to dax_hmem.ko")
> Signed-off-by: Dave Jiang <[email protected]>
> ---
>  drivers/dax/hmem/device.c | 42 +++++++++++++++++++++++----------------
>  1 file changed, 25 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/dax/hmem/device.c b/drivers/dax/hmem/device.c
> index f9e1a76a04a9..ab5977d75d1f 100644
> --- a/drivers/dax/hmem/device.c
> +++ b/drivers/dax/hmem/device.c
> @@ -33,21 +33,37 @@ int walk_hmem_resources(struct device *host, walk_hmem_fn fn)
>  }
>  EXPORT_SYMBOL_GPL(walk_hmem_resources);
>  
> -static void __hmem_register_resource(int target_nid, struct resource *res)
> +static struct resource *hmem_request_resource(int target_nid,
> +                                           struct resource *res)
>  {
> -     struct platform_device *pdev;
>       struct resource *new;
> -     int rc;
>  
> -     new = __request_region(&hmem_active, res->start, resource_size(res), "",
> -                            0);
> +     guard(mutex)(&hmem_resource_lock);
> +     new = __request_region(&hmem_active, res->start,
> +                            resource_size(res), "", 0);
>       if (!new) {
>               pr_debug("hmem range %pr already active\n", res);
> -             return;
> +             return ERR_PTR(-ENOMEM);

Probably does not matter since no one consumes this code, but this is
more -EBUSY than -ENOMEM.
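
I.e.:

		return ERR_PTR(-EBUSY);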
