On 10/6/25 6:16 PM, [email protected] wrote:
> Dave Jiang wrote:
>> The following lockdep splat was observed while kernel auto-online a CXL
>> memory region:
>>
>
> The entire spew with timestamps does not need to be saved in the git
> history I would trim this to:
>
> ---
> ======================================================
> WARNING: possible circular locking dependency detected
> 6.17.0djtest+ #53 Tainted: G W
> ------------------------------------------------------
> systemd-udevd/3334 is trying to acquire lock:
> ffffffff90346188 (hmem_resource_lock){+.+.}-{4:4}, at:
> hmem_register_resource+0x31/0x50
>
> but task is already holding lock:
> ffffffff90338890 ((node_chain).rwsem){++++}-{4:4}, at:
> blocking_notifier_call_chain+0x2e/0x70
>
> which lock already depends on the new lock.
> [..]
> Chain exists of:
> hmem_resource_lock --> mem_hotplug_lock --> (node_chain).rwsem
>
> Possible unsafe locking scenario:
>
> CPU0 CPU1
> ---- ----
> rlock((node_chain).rwsem);
> lock(mem_hotplug_lock);
> lock((node_chain).rwsem);
> lock(hmem_resource_lock);
> ---
>
>> The lock ordering can cause potential deadlock. There are instances
>> where hmem_resource_lock is taken after (node_chain).rwsem, and vice
>> versa. Narrow the scope of hmem_resource_lock in hmem_register_resource()
>> to avoid the circular locking dependency. The locking is only needed when
>> hmem_active needs to be protected.
>
> It is only strictly necessary for hmem_active, but it happened to be
> protecting theoretical concurrent callers of hmat_register_resource(). I
> do not think it can happen in practice, but it is called by both initial
> init and runtime notifier. The notifier path does:
>
> hmat_callback() -> hmat_register_target()
>
> That path seems impossible to add new hmem devices, but it is burning
> cycles walking through all the memory ranges associated with a target
> only to find that they are already registered. I think that can be
> cleaned up with an unlocked check of target->registered.
>
> If that loses some theoretical race then your new
> hmem_request_resource() will pick a race winner for that target.
>
> Otherwise, the code *looks* like it has a TOCTOU race with
> platform_initialized. So feels like some comments and cleanups to make
> that clear are needed.
>
> Really I think hmat_callback() path should not be doing *any*
> registration work, only update work.
So are you saying that hmat_callback() should skip
hmat_register_target_devices() when calling hmat_register_target()? hmat_init()
calls hmat_register_targets() and hmem_init() also basically does something
similar. So from that perspective, hmat_callback() shouldn't be finding
something new. However if we drop the hmat_register_target() and a memory
device gets hot-plugged (i.e. a new card gets inserted), do we lose something?
If we base the calling of hmat_register_target_devices() on target->registered,
I don't think it removes the lockdep splat because the locking order is
unchanged.
DJ
>
>> Fixes: 7dab174e2e27 ("dax/hmem: Move hmem device registration to
>> dax_hmem.ko")
>> Signed-off-by: Dave Jiang <[email protected]>
>> ---
>> drivers/dax/hmem/device.c | 42 +++++++++++++++++++++++----------------
>> 1 file changed, 25 insertions(+), 17 deletions(-)
>>
>> diff --git a/drivers/dax/hmem/device.c b/drivers/dax/hmem/device.c
>> index f9e1a76a04a9..ab5977d75d1f 100644
>> --- a/drivers/dax/hmem/device.c
>> +++ b/drivers/dax/hmem/device.c
>> @@ -33,21 +33,37 @@ int walk_hmem_resources(struct device *host,
>> walk_hmem_fn fn)
>> }
>> EXPORT_SYMBOL_GPL(walk_hmem_resources);
>>
>> -static void __hmem_register_resource(int target_nid, struct resource *res)
>> +static struct resource *hmem_request_resource(int target_nid,
>> + struct resource *res)
>> {
>> - struct platform_device *pdev;
>> struct resource *new;
>> - int rc;
>>
>> - new = __request_region(&hmem_active, res->start, resource_size(res), "",
>> - 0);
>> + guard(mutex)(&hmem_resource_lock);
>> + new = __request_region(&hmem_active, res->start,
>> + resource_size(res), "", 0);
>> if (!new) {
>> pr_debug("hmem range %pr already active\n", res);
>> - return;
>> + return ERR_PTR(-ENOMEM);
>
> Probably does not matter since noone consumes this code, but this is
> more -EBUSY than -ENOMEM.