On 07/04/2017 11:32 AM, Thomas Gleixner wrote:
> Andrey reported a potential deadlock with the memory hotplug lock and the
> cpu hotplug lock.
>
> The reason is that memory hotplug takes the memory hotplug lock and then
> calls stop_machine() which calls get_online_cpus(). That's the reverse lock
> order to get_online_cpus(); get_online_mems(); in mm/slub_common.c
>
> The problem has been there forever. The reason why this was never reported
> is that the cpu hotplug locking had this homebrewn recursive reader writer
> semaphore construct which due to the recursion evaded the full lock dep
> coverage. The memory hotplug code copied that construct verbatim and
> therefor has similar issues.
>
> Three steps to fix this:
>
> 1) Convert the memory hotplug locking to a per cpu rwsem so the potential
>    issues get reported proper by lockdep.
>
> 2) Lock the online cpus in mem_hotplug_begin() before taking the memory
>    hotplug rwsem and use stop_machine_cpuslocked() in the page_alloc code
>    and use to avoid recursive locking.

     ^ s/and use // ?

>
> 3) The cpu hotpluck locking in #2 causes a recursive locking of the cpu
>    hotplug lock via __offline_pages() -> lru_add_drain_all(). Solve this by
>    invoking lru_add_drain_all_cpuslocked() instead.
>
> Reported-by: Andrey Ryabinin <aryabi...@virtuozzo.com>
> Signed-off-by: Thomas Gleixner <t...@linutronix.de>
> Cc: Michal Hocko <mho...@kernel.org>
> Cc: linux...@kvack.org
> Cc: Andrew Morton <a...@linux-foundation.org>
> Cc: Vlastimil Babka <vba...@suse.cz>
> Cc: Vladimir Davydov <vdavydov....@gmail.com>

Acked-by: Vlastimil Babka <vba...@suse.cz>

> ---
>  mm/memory_hotplug.c |   89 ++++++++--------------------------------------------
>  mm/page_alloc.c     |    2 -
>  2 files changed, 16 insertions(+), 75 deletions(-)

Nice! Glad to see the crazy code go.