This patch tries to solve the device hot remove locking issues in a different way from commit 5e33bc41, as kernfs already has a mechanism to break active protection.
The active protection is there to keep the file alive by blocking deletion while operations are on-going in the file. This blocking creates a dependency loop when an operation running off a sysfs knob ends up grabbing a lock which may be held while removing the said sysfs knob. So the problem here is the order of s_active, and the series of hotplug related locks. commit 5e33bc41 solves it by taking out the first of the series of hoplug related locks, device_hotplug_lock, with a try lock. And if that try lock fails, it returns a restart syscall error, and drops s_active temporarily to allow the other process to remove the sysfs knob. This doesn't help with lockdep warnings reported against s_active and other hotplug related locks in the series. This patch instead tries to take s_active out of the lock dependency graph using the active protection breaking mechanism. lock_device_hotplug_sysfs() function name is kept here, three more arguments are added, dev, attr, to help find the kernfs_node corresponding to the sysfs knob (which should always be able to be found, WARN if not); kn_out is used to record the found kernfs_node for use in unlock. unlock_device_hotplug_sysfs() is created to help cleanup the operations(get reference, break active, etc) done in lock_device_hotplug_sysfs(), as well as unlock the lock. As we break the active protection here, the device removing process might remove the sysfs entries after that point. So after we grab the device_hotplug_lock, we can check whether that really happens. If so, we should abort and not invoke the online/offline callbacks anymore. lockdep assertion is added in try_offline_node() to make sure device_hotplug_lock is grabbed while removing cpu or memory. As devcie_hotplug_lock is used to protect hotplug operations with multiple subsystems. So there might be devices that won't be removed together with devices of other subsystems, and thus device_hotplug_lock is not needed. In this case, their specific online/offline callbacks needs to check whether the device has been removed. Cc: Tejun Heo <t...@kernel.org> Cc: Rafael J. Wysocki <rafael.j.wyso...@intel.com> Cc: Greg Kroah-Hartman <gre...@linuxfoundation.org> Cc: Toshi Kani <toshi.k...@hp.com> Signed-off-by: Li Zhong <zh...@linux.vnet.ibm.com> --- drivers/base/core.c | 93 +++++++++++++++++++++++++++++++++++++++++++++----- drivers/base/memory.c | 5 +-- include/linux/device.h | 7 +++- mm/memory_hotplug.c | 2 ++ 4 files changed, 95 insertions(+), 12 deletions(-) diff --git a/drivers/base/core.c b/drivers/base/core.c index 20da3ad..329f3b4 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -61,14 +61,87 @@ void unlock_device_hotplug(void) mutex_unlock(&device_hotplug_lock); } -int lock_device_hotplug_sysfs(void) +void device_hotplug_assert_held(void) { - if (mutex_trylock(&device_hotplug_lock)) - return 0; + lockdep_assert_held(&device_hotplug_lock); +} + +/* + * device_hotplug_lock is used to protect hotplugs which may involve multiple + * subsystems. This _sysfs() version is created to solve the deadlock with + * s_active of the device attribute file, which could be used to online/offline + * the device. + * + * Returns 0 on success. -ENODEV if the @dev is removed after we break the + * active protection. (We should always be able to find the kernfs_node + * related to the attribute, WARN if not, and also return -ENODEV). + * + * When success(return 0), the found kernfs_node is passed out in @kn_out, + * else pass NULL to @kn_out. + */ +int lock_device_hotplug_sysfs(struct device *dev, + struct device_attribute *attr, + struct kernfs_node **kn_out) +{ + struct kernfs_node *kn; + + *kn_out = kn = kernfs_find_and_get(dev->kobj.sd, attr->attr.name); + + if (WARN_ON_ONCE(!kn)) + return -ENODEV; + + /* + * Break active protection here to avoid deadlocks with device + * removing process, which tries to remove sysfs entries including this + * "online" attribute while holding some hotplug related locks. + * + * @dev needs to be protected here, or it could go away any time after + * dropping active protection. + */ + get_device(dev); + kernfs_break_active_protection(kn); + + lock_device_hotplug(); + + /* + * For devices which need device_hoplug_lock to protect the + * online/offline operation, i.e. it might be removed together with + * devices in some other subsystems, we can check here whether the + * device has been removed, so we don't call device_{on|off}line + * against removed device. lockdep assertion is added in + * try_offline_node() to make sure the lock is held to remove cpu + * or memory. + * + * If some devices won't be removed together with devices in other + * subsystems, thus device_hotplug_lock is not needed. Then they need + * check whether device has been removed in their devices' specific + * online/offline callbacks. + */ + if (!dev->kobj.sd) { + /* device_del() already called on @dev, we should also abort */ + unlock_device_hotplug(); + kernfs_unbreak_active_protection(kn); + put_device(dev); + kernfs_put(kn); + *kn_out = NULL; + return -ENODEV; + } - /* Avoid busy looping (5 ms of sleep should do). */ - msleep(5); - return restart_syscall(); + return 0; +} + +/* + * This _sysfs version of unlock_device_hotplug help to undo the + * operations (get reference, break active protection, etc) done in + * lock_device_hotplug_sysfs(), as well as unlock the device_hotplug_lock. + */ +void unlock_device_hotplug_sysfs(struct device *dev, + struct kernfs_node *kn) +{ + unlock_device_hotplug(); + kernfs_unbreak_active_protection(kn); + put_device(dev); + kernfs_put(kn); } #ifdef CONFIG_BLOCK @@ -439,17 +512,19 @@ static ssize_t online_store(struct device *dev, struct device_attribute *attr, { bool val; int ret; + struct kernfs_node *kn = NULL; ret = strtobool(buf, &val); if (ret < 0) return ret; - ret = lock_device_hotplug_sysfs(); - if (ret) + ret = lock_device_hotplug_sysfs(dev, attr, &kn); + if (ret < 0) return ret; ret = val ? device_online(dev) : device_offline(dev); - unlock_device_hotplug(); + unlock_device_hotplug_sysfs(dev, kn); + return ret < 0 ? ret : count; } static DEVICE_ATTR_RW(online); diff --git a/drivers/base/memory.c b/drivers/base/memory.c index bece691..f82dea4 100644 --- a/drivers/base/memory.c +++ b/drivers/base/memory.c @@ -320,8 +320,9 @@ store_mem_state(struct device *dev, { struct memory_block *mem = to_memory_block(dev); int ret, online_type; + struct kernfs_node *kn = NULL; - ret = lock_device_hotplug_sysfs(); + ret = lock_device_hotplug_sysfs(dev, attr, &kn); if (ret) return ret; @@ -360,7 +361,7 @@ store_mem_state(struct device *dev, } err: - unlock_device_hotplug(); + unlock_device_hotplug_sysfs(dev, kn); if (ret) return ret; diff --git a/include/linux/device.h b/include/linux/device.h index d1d1c05..19a24f4 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -917,7 +917,12 @@ static inline bool device_supports_offline(struct device *dev) extern void lock_device_hotplug(void); extern void unlock_device_hotplug(void); -extern int lock_device_hotplug_sysfs(void); +extern void device_hotplug_assert_held(void); +extern int lock_device_hotplug_sysfs(struct device *dev, + struct device_attribute *attr, + struct kernfs_node **kn); +extern void unlock_device_hotplug_sysfs(struct device *dev, + struct kernfs_node *parent); extern int device_offline(struct device *dev); extern int device_online(struct device *dev); /* diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index a650db2..bdec24d 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1822,6 +1822,8 @@ void try_offline_node(int nid) struct page *pgdat_page = virt_to_page(pgdat); int i; + device_hotplug_assert_held(); + for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) { unsigned long section_nr = pfn_to_section_nr(pfn); -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/