Dave Jiang wrote:
> The following lockdep splat was observed while kernel auto-online a CXL
> memory region:
>
The entire spew with timestamps does not need to be saved in the git
history I would trim this to:
---
======================================================
WARNING: possible circular locking dependency detected
6.17.0djtest+ #53 Tainted: G W
------------------------------------------------------
systemd-udevd/3334 is trying to acquire lock:
ffffffff90346188 (hmem_resource_lock){+.+.}-{4:4}, at:
hmem_register_resource+0x31/0x50
but task is already holding lock:
ffffffff90338890 ((node_chain).rwsem){++++}-{4:4}, at:
blocking_notifier_call_chain+0x2e/0x70
which lock already depends on the new lock.
[..]
Chain exists of:
hmem_resource_lock --> mem_hotplug_lock --> (node_chain).rwsem
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
rlock((node_chain).rwsem);
lock(mem_hotplug_lock);
lock((node_chain).rwsem);
lock(hmem_resource_lock);
---
> The lock ordering can cause potential deadlock. There are instances
> where hmem_resource_lock is taken after (node_chain).rwsem, and vice
> versa. Narrow the scope of hmem_resource_lock in hmem_register_resource()
> to avoid the circular locking dependency. The locking is only needed when
> hmem_active needs to be protected.
It is only strictly necessary for hmem_active, but it happened to be
protecting theoretical concurrent callers of hmat_register_resource(). I
do not think it can happen in practice, but it is called by both initial
init and runtime notifier. The notifier path does:
hmat_callback() -> hmat_register_target()
That path seems impossible to add new hmem devices, but it is burning
cycles walking through all the memory ranges associated with a target
only to find that they are already registered. I think that can be
cleaned up with an unlocked check of target->registered.
If that loses some theoretical race then your new
hmem_request_resource() will pick a race winner for that target.
Otherwise, the code *looks* like it has a TOCTOU race with
platform_initialized. So feels like some comments and cleanups to make
that clear are needed.
Really I think hmat_callback() path should not be doing *any*
registration work, only update work.
> Fixes: 7dab174e2e27 ("dax/hmem: Move hmem device registration to dax_hmem.ko")
> Signed-off-by: Dave Jiang <[email protected]>
> ---
> drivers/dax/hmem/device.c | 42 +++++++++++++++++++++++----------------
> 1 file changed, 25 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/dax/hmem/device.c b/drivers/dax/hmem/device.c
> index f9e1a76a04a9..ab5977d75d1f 100644
> --- a/drivers/dax/hmem/device.c
> +++ b/drivers/dax/hmem/device.c
> @@ -33,21 +33,37 @@ int walk_hmem_resources(struct device *host, walk_hmem_fn
> fn)
> }
> EXPORT_SYMBOL_GPL(walk_hmem_resources);
>
> -static void __hmem_register_resource(int target_nid, struct resource *res)
> +static struct resource *hmem_request_resource(int target_nid,
> + struct resource *res)
> {
> - struct platform_device *pdev;
> struct resource *new;
> - int rc;
>
> - new = __request_region(&hmem_active, res->start, resource_size(res), "",
> - 0);
> + guard(mutex)(&hmem_resource_lock);
> + new = __request_region(&hmem_active, res->start,
> + resource_size(res), "", 0);
> if (!new) {
> pr_debug("hmem range %pr already active\n", res);
> - return;
> + return ERR_PTR(-ENOMEM);
Probably does not matter since noone consumes this code, but this is
more -EBUSY than -ENOMEM.