On 10/6/25 6:16 PM, [email protected] wrote:
> Dave Jiang wrote:
>> The following lockdep splat was observed while kernel auto-online a CXL
>> memory region:
>>
> 
> The entire spew with timestamps does not need to be saved in the git
> history. I would trim this to:
> 
> ---
>  ======================================================
>  WARNING: possible circular locking dependency detected
>  6.17.0djtest+ #53 Tainted: G        W
>  ------------------------------------------------------
>  systemd-udevd/3334 is trying to acquire lock:
>  ffffffff90346188 (hmem_resource_lock){+.+.}-{4:4}, at: hmem_register_resource+0x31/0x50
> 
>  but task is already holding lock:
>  ffffffff90338890 ((node_chain).rwsem){++++}-{4:4}, at: blocking_notifier_call_chain+0x2e/0x70
> 
>  which lock already depends on the new lock.
>  [..]
>  Chain exists of:
>    hmem_resource_lock --> mem_hotplug_lock --> (node_chain).rwsem
> 
>   Possible unsafe locking scenario:
> 
>         CPU0                    CPU1
>         ----                    ----
>    rlock((node_chain).rwsem);
>                                 lock(mem_hotplug_lock);
>                                 lock((node_chain).rwsem);
>    lock(hmem_resource_lock);
> ---
> 
>> The lock ordering can cause a potential deadlock. There are instances
>> where hmem_resource_lock is taken after (node_chain).rwsem, and vice
>> versa. Narrow the scope of hmem_resource_lock in hmem_register_resource()
>> to avoid the circular locking dependency. The locking is only needed when
>> hmem_active needs to be protected.
> 
> It is only strictly necessary for hmem_active, but it happened to be
> protecting theoretical concurrent callers of hmem_register_resource(). I
> do not think it can happen in practice, but it is called by both initial
> init and runtime notifier. The notifier path does:
> 
> hmat_callback() -> hmat_register_target()
> 
> It seems impossible for that path to add new hmem devices, but it is burning
> cycles walking through all the memory ranges associated with a target
> only to find that they are already registered. I think that can be
> cleaned up with an unlocked check of target->registered.
> 
> If that loses some theoretical race then your new
> hmem_request_resource() will pick a race winner for that target.
> 
> Otherwise, the code *looks* like it has a TOCTOU race with
> platform_initialized. So feels like some comments and cleanups to make
> that clear are needed.
> 
> Really I think hmat_callback() path should not be doing *any*
> registration work, only update work.

So are you saying that hmat_callback() should skip 
hmat_register_target_devices() when calling hmat_register_target()? hmat_init() 
calls hmat_register_targets(), and hmem_init() does something similar. From 
that perspective, hmat_callback() shouldn't be finding anything new. However, 
if we drop the hmat_register_target() call and a memory device gets 
hot-plugged (i.e. a new card gets inserted), do we lose something?

If we gate the call to hmat_register_target_devices() on target->registered, 
I don't think that removes the lockdep splat, because the locking order is 
unchanged.

DJ

> 
>> Fixes: 7dab174e2e27 ("dax/hmem: Move hmem device registration to dax_hmem.ko")
>> Signed-off-by: Dave Jiang <[email protected]>
>> ---
>>  drivers/dax/hmem/device.c | 42 +++++++++++++++++++++++----------------
>>  1 file changed, 25 insertions(+), 17 deletions(-)
>>
>> diff --git a/drivers/dax/hmem/device.c b/drivers/dax/hmem/device.c
>> index f9e1a76a04a9..ab5977d75d1f 100644
>> --- a/drivers/dax/hmem/device.c
>> +++ b/drivers/dax/hmem/device.c
>> @@ -33,21 +33,37 @@ int walk_hmem_resources(struct device *host, walk_hmem_fn fn)
>>  }
>>  EXPORT_SYMBOL_GPL(walk_hmem_resources);
>>  
>> -static void __hmem_register_resource(int target_nid, struct resource *res)
>> +static struct resource *hmem_request_resource(int target_nid,
>> +                                          struct resource *res)
>>  {
>> -    struct platform_device *pdev;
>>      struct resource *new;
>> -    int rc;
>>  
>> -    new = __request_region(&hmem_active, res->start, resource_size(res), "",
>> -                           0);
>> +    guard(mutex)(&hmem_resource_lock);
>> +    new = __request_region(&hmem_active, res->start,
>> +                           resource_size(res), "", 0);
>>      if (!new) {
>>              pr_debug("hmem range %pr already active\n", res);
>> -            return;
>> +            return ERR_PTR(-ENOMEM);
> 
> Probably does not matter since no one consumes this code, but this is
> more -EBUSY than -ENOMEM.

