Re: [PATCH 11/11] ksm: stop hotremove lockdep warning
On Fri, 8 Feb 2013, Gerald Schaefer wrote:
> On Fri, 25 Jan 2013 18:10:18 -0800 (PST)
> Hugh Dickins wrote:
> >
> > Complaints are rare, but lockdep still does not understand the way
> > ksm_memory_callback(MEM_GOING_OFFLINE) takes ksm_thread_mutex, and
> > holds it until the ksm_memory_callback(MEM_OFFLINE): that appears
> > to be a problem because notifier callbacks are made under down_read
> > of blocking_notifier_head->rwsem (so first the mutex is taken while
> > holding the rwsem, then later the rwsem is taken while still holding
> > the mutex); but is not in fact a problem because mem_hotplug_mutex
> > is held throughout the dance.
> >
> > There was an attempt to fix this with mutex_lock_nested(); but if that
> > happened to fool lockdep two years ago, apparently it does so no
> > longer.
> >
> > I had hoped to eradicate this issue in extending KSM page migration
> > not to need the ksm_thread_mutex.  But then realized that although
> > the page migration itself is safe, we do still need to lock out ksmd
> > and other users of get_ksm_page() while offlining memory - at some
> > point between MEM_GOING_OFFLINE and MEM_OFFLINE, the struct pages
> > themselves may vanish, and get_ksm_page()'s accesses to them become a
> > violation.
> >
> > So, give up on holding ksm_thread_mutex itself from MEM_GOING_OFFLINE
> > to MEM_OFFLINE, and add a KSM_RUN_OFFLINE flag, and
> > wait_while_offlining() checks, to achieve the same lockout without
> > being caught by lockdep.  This is less elegant for KSM, but it's more
> > important to keep lockdep useful to other users - and I apologize for
> > how long it took to fix.
>
> Thanks a lot for the patch! I verified that it fixes the lockdep warning
> that we got on memory hotremove.
>
> > Reported-by: Gerald Schaefer
> > Signed-off-by: Hugh Dickins

Thank you for reporting and testing and reporting back: sorry again for
taking so long to fix it.

Hugh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [PATCH 11/11] ksm: stop hotremove lockdep warning
On Fri, 25 Jan 2013 18:10:18 -0800 (PST)
Hugh Dickins wrote:

> Complaints are rare, but lockdep still does not understand the way
> ksm_memory_callback(MEM_GOING_OFFLINE) takes ksm_thread_mutex, and
> holds it until the ksm_memory_callback(MEM_OFFLINE): that appears
> to be a problem because notifier callbacks are made under down_read
> of blocking_notifier_head->rwsem (so first the mutex is taken while
> holding the rwsem, then later the rwsem is taken while still holding
> the mutex); but is not in fact a problem because mem_hotplug_mutex
> is held throughout the dance.
>
> There was an attempt to fix this with mutex_lock_nested(); but if that
> happened to fool lockdep two years ago, apparently it does so no
> longer.
>
> I had hoped to eradicate this issue in extending KSM page migration
> not to need the ksm_thread_mutex.  But then realized that although
> the page migration itself is safe, we do still need to lock out ksmd
> and other users of get_ksm_page() while offlining memory - at some
> point between MEM_GOING_OFFLINE and MEM_OFFLINE, the struct pages
> themselves may vanish, and get_ksm_page()'s accesses to them become a
> violation.
>
> So, give up on holding ksm_thread_mutex itself from MEM_GOING_OFFLINE
> to MEM_OFFLINE, and add a KSM_RUN_OFFLINE flag, and
> wait_while_offlining() checks, to achieve the same lockout without
> being caught by lockdep.  This is less elegant for KSM, but it's more
> important to keep lockdep useful to other users - and I apologize for
> how long it took to fix.

Thanks a lot for the patch! I verified that it fixes the lockdep warning
that we got on memory hotremove.

> Reported-by: Gerald Schaefer
> Signed-off-by: Hugh Dickins
> ---
>  mm/ksm.c | 55 +++--
>  1 file changed, 41 insertions(+), 14 deletions(-)
>
> --- mmotm.orig/mm/ksm.c	2013-01-25 14:37:06.880206290 -0800
> +++ mmotm/mm/ksm.c	2013-01-25 14:38:53.984208836 -0800
> @@ -226,7 +226,9 @@ static unsigned int ksm_merge_across_nod
>  #define KSM_RUN_STOP	0
>  #define KSM_RUN_MERGE	1
>  #define KSM_RUN_UNMERGE	2
> -static unsigned int ksm_run = KSM_RUN_STOP;
> +#define KSM_RUN_OFFLINE	4
> +static unsigned long ksm_run = KSM_RUN_STOP;
> +static void wait_while_offlining(void);
>
>  static DECLARE_WAIT_QUEUE_HEAD(ksm_thread_wait);
>  static DEFINE_MUTEX(ksm_thread_mutex);
> @@ -1700,6 +1702,7 @@ static int ksm_scan_thread(void *nothing
>
>  	while (!kthread_should_stop()) {
>  		mutex_lock(&ksm_thread_mutex);
> +		wait_while_offlining();
>  		if (ksmd_should_run())
>  			ksm_do_scan(ksm_thread_pages_to_scan);
>  		mutex_unlock(&ksm_thread_mutex);
> @@ -2056,6 +2059,22 @@ void ksm_migrate_page(struct page *newpa
>  #endif /* CONFIG_MIGRATION */
>
>  #ifdef CONFIG_MEMORY_HOTREMOVE
> +static int just_wait(void *word)
> +{
> +	schedule();
> +	return 0;
> +}
> +
> +static void wait_while_offlining(void)
> +{
> +	while (ksm_run & KSM_RUN_OFFLINE) {
> +		mutex_unlock(&ksm_thread_mutex);
> +		wait_on_bit(&ksm_run, ilog2(KSM_RUN_OFFLINE),
> +			    just_wait, TASK_UNINTERRUPTIBLE);
> +		mutex_lock(&ksm_thread_mutex);
> +	}
> +}
> +
>  static void ksm_check_stable_tree(unsigned long start_pfn,
>  				  unsigned long end_pfn)
>  {
> @@ -2098,15 +2117,15 @@ static int ksm_memory_callback(struct no
>  	switch (action) {
>  	case MEM_GOING_OFFLINE:
>  		/*
> -		 * Keep it very simple for now: just lock out ksmd and
> -		 * MADV_UNMERGEABLE while any memory is going offline.
> -		 * mutex_lock_nested() is necessary because lockdep was alarmed
> -		 * that here we take ksm_thread_mutex inside notifier chain
> -		 * mutex, and later take notifier chain mutex inside
> -		 * ksm_thread_mutex to unlock it.  But that's safe because both
> -		 * are inside mem_hotplug_mutex.
> +		 * Prevent ksm_do_scan(), unmerge_and_remove_all_rmap_items()
> +		 * and remove_all_stable_nodes() while memory is going offline:
> +		 * it is unsafe for them to touch the stable tree at this time.
> +		 * But unmerge_ksm_pages(), rmap lookups and other entry points
> +		 * which do not need the ksm_thread_mutex are all safe.
>  		 */
> -		mutex_lock_nested(&ksm_thread_mutex, SINGLE_DEPTH_NESTING);
> +		mutex_lock(&ksm_thread_mutex);
> +		ksm_run |= KSM_RUN_OFFLINE;
> +		mutex_unlock(&ksm_thread_mutex);
>  		break;
>
>  	case MEM_OFFLINE:
> @@ -2122,11 +2141,20 @@ static int ksm_memory_callback(struct no
>  		/* fallthrough */
>
>  	case MEM_CANCEL_OFFLINE:
> +		mutex_lock(&ksm_thread_mutex);
> +		ksm_run &= ~KSM_RUN_OFFLINE;
>  		mutex_unlock(&ksm_thread_mutex);
> +
> +
Re: [PATCH 11/11] ksm: stop hotremove lockdep warning
On Sun, 27 Jan 2013, Simon Jeons wrote:
> On Fri, 2013-01-25 at 18:10 -0800, Hugh Dickins wrote:
> > @@ -2098,15 +2117,15 @@ static int ksm_memory_callback(struct no
> >  	switch (action) {
> >  	case MEM_GOING_OFFLINE:
> >  		/*
> > -		 * Keep it very simple for now: just lock out ksmd and
> > -		 * MADV_UNMERGEABLE while any memory is going offline.
> > -		 * mutex_lock_nested() is necessary because lockdep was alarmed
> > -		 * that here we take ksm_thread_mutex inside notifier chain
> > -		 * mutex, and later take notifier chain mutex inside
> > -		 * ksm_thread_mutex to unlock it.  But that's safe because both
> > -		 * are inside mem_hotplug_mutex.
> > +		 * Prevent ksm_do_scan(), unmerge_and_remove_all_rmap_items()
> > +		 * and remove_all_stable_nodes() while memory is going offline:
> > +		 * it is unsafe for them to touch the stable tree at this time.
> > +		 * But unmerge_ksm_pages(), rmap lookups and other entry points
>
> Why unmerge_ksm_pages beneath us is safe for ksm memory hotremove?
>
> > +		 * which do not need the ksm_thread_mutex are all safe.

It's just like userspace doing a write-fault on every KSM page in the
vma.  If that were unsafe for memory hotremove, then it would not be
KSM's problem, memory hotremove would already be unsafe.  (But memory
hotremove is safe because it migrates away from all the pages to be
removed before it can reach MEM_OFFLINE.)
Re: [PATCH 11/11] ksm: stop hotremove lockdep warning
On Fri, 2013-01-25 at 18:10 -0800, Hugh Dickins wrote:
> Complaints are rare, but lockdep still does not understand the way
> ksm_memory_callback(MEM_GOING_OFFLINE) takes ksm_thread_mutex, and
> holds it until the ksm_memory_callback(MEM_OFFLINE): that appears
> to be a problem because notifier callbacks are made under down_read
> of blocking_notifier_head->rwsem (so first the mutex is taken while
> holding the rwsem, then later the rwsem is taken while still holding
> the mutex); but is not in fact a problem because mem_hotplug_mutex
> is held throughout the dance.
>
> There was an attempt to fix this with mutex_lock_nested(); but if that
> happened to fool lockdep two years ago, apparently it does so no longer.
>
> I had hoped to eradicate this issue in extending KSM page migration not
> to need the ksm_thread_mutex.  But then realized that although the page
> migration itself is safe, we do still need to lock out ksmd and other
> users of get_ksm_page() while offlining memory - at some point between
> MEM_GOING_OFFLINE and MEM_OFFLINE, the struct pages themselves may
> vanish, and get_ksm_page()'s accesses to them become a violation.
>
> So, give up on holding ksm_thread_mutex itself from MEM_GOING_OFFLINE to
> MEM_OFFLINE, and add a KSM_RUN_OFFLINE flag, and wait_while_offlining()
> checks, to achieve the same lockout without being caught by lockdep.
> This is less elegant for KSM, but it's more important to keep lockdep
> useful to other users - and I apologize for how long it took to fix.
>
> Reported-by: Gerald Schaefer
> Signed-off-by: Hugh Dickins
> ---
>  mm/ksm.c | 55 +++--
>  1 file changed, 41 insertions(+), 14 deletions(-)
>
> --- mmotm.orig/mm/ksm.c	2013-01-25 14:37:06.880206290 -0800
> +++ mmotm/mm/ksm.c	2013-01-25 14:38:53.984208836 -0800
> @@ -226,7 +226,9 @@ static unsigned int ksm_merge_across_nod
>  #define KSM_RUN_STOP	0
>  #define KSM_RUN_MERGE	1
>  #define KSM_RUN_UNMERGE	2
> -static unsigned int ksm_run = KSM_RUN_STOP;
> +#define KSM_RUN_OFFLINE	4
> +static unsigned long ksm_run = KSM_RUN_STOP;
> +static void wait_while_offlining(void);
>
>  static DECLARE_WAIT_QUEUE_HEAD(ksm_thread_wait);
>  static DEFINE_MUTEX(ksm_thread_mutex);
> @@ -1700,6 +1702,7 @@ static int ksm_scan_thread(void *nothing
>
>  	while (!kthread_should_stop()) {
>  		mutex_lock(&ksm_thread_mutex);
> +		wait_while_offlining();
>  		if (ksmd_should_run())
>  			ksm_do_scan(ksm_thread_pages_to_scan);
>  		mutex_unlock(&ksm_thread_mutex);
> @@ -2056,6 +2059,22 @@ void ksm_migrate_page(struct page *newpa
>  #endif /* CONFIG_MIGRATION */
>
>  #ifdef CONFIG_MEMORY_HOTREMOVE
> +static int just_wait(void *word)
> +{
> +	schedule();
> +	return 0;
> +}
> +
> +static void wait_while_offlining(void)
> +{
> +	while (ksm_run & KSM_RUN_OFFLINE) {
> +		mutex_unlock(&ksm_thread_mutex);
> +		wait_on_bit(&ksm_run, ilog2(KSM_RUN_OFFLINE),
> +			    just_wait, TASK_UNINTERRUPTIBLE);
> +		mutex_lock(&ksm_thread_mutex);
> +	}
> +}
> +
>  static void ksm_check_stable_tree(unsigned long start_pfn,
>  				  unsigned long end_pfn)
>  {
> @@ -2098,15 +2117,15 @@ static int ksm_memory_callback(struct no
>  	switch (action) {
>  	case MEM_GOING_OFFLINE:
>  		/*
> -		 * Keep it very simple for now: just lock out ksmd and
> -		 * MADV_UNMERGEABLE while any memory is going offline.
> -		 * mutex_lock_nested() is necessary because lockdep was alarmed
> -		 * that here we take ksm_thread_mutex inside notifier chain
> -		 * mutex, and later take notifier chain mutex inside
> -		 * ksm_thread_mutex to unlock it.  But that's safe because both
> -		 * are inside mem_hotplug_mutex.
> +		 * Prevent ksm_do_scan(), unmerge_and_remove_all_rmap_items()
> +		 * and remove_all_stable_nodes() while memory is going offline:
> +		 * it is unsafe for them to touch the stable tree at this time.
> +		 * But unmerge_ksm_pages(), rmap lookups and other entry points

Why unmerge_ksm_pages beneath us is safe for ksm memory hotremove?

> +		 * which do not need the ksm_thread_mutex are all safe.
>  		 */
> -		mutex_lock_nested(&ksm_thread_mutex, SINGLE_DEPTH_NESTING);
> +		mutex_lock(&ksm_thread_mutex);
> +		ksm_run |= KSM_RUN_OFFLINE;
> +		mutex_unlock(&ksm_thread_mutex);
>  		break;
>
>  	case MEM_OFFLINE:
> @@ -2122,11 +2141,20 @@ static int ksm_memory_callback(struct no
>  		/* fallthrough */
>
>  	case MEM_CANCEL_OFFLINE:
> +		mutex_lock(&ksm_thread_mutex);
> +		ksm_run &= ~KSM_RUN_OFFLINE;
>  		mutex_unlock(&ksm_thread_mutex);
> +
> +		smp_mb();	/* wake_up_bit advises
[PATCH 11/11] ksm: stop hotremove lockdep warning
Complaints are rare, but lockdep still does not understand the way
ksm_memory_callback(MEM_GOING_OFFLINE) takes ksm_thread_mutex, and
holds it until the ksm_memory_callback(MEM_OFFLINE): that appears
to be a problem because notifier callbacks are made under down_read
of blocking_notifier_head->rwsem (so first the mutex is taken while
holding the rwsem, then later the rwsem is taken while still holding
the mutex); but is not in fact a problem because mem_hotplug_mutex
is held throughout the dance.

There was an attempt to fix this with mutex_lock_nested(); but if that
happened to fool lockdep two years ago, apparently it does so no longer.

I had hoped to eradicate this issue in extending KSM page migration not
to need the ksm_thread_mutex.  But then realized that although the page
migration itself is safe, we do still need to lock out ksmd and other
users of get_ksm_page() while offlining memory - at some point between
MEM_GOING_OFFLINE and MEM_OFFLINE, the struct pages themselves may
vanish, and get_ksm_page()'s accesses to them become a violation.

So, give up on holding ksm_thread_mutex itself from MEM_GOING_OFFLINE to
MEM_OFFLINE, and add a KSM_RUN_OFFLINE flag, and wait_while_offlining()
checks, to achieve the same lockout without being caught by lockdep.
This is less elegant for KSM, but it's more important to keep lockdep
useful to other users - and I apologize for how long it took to fix.

Reported-by: Gerald Schaefer
Signed-off-by: Hugh Dickins
---
 mm/ksm.c | 55 +++--
 1 file changed, 41 insertions(+), 14 deletions(-)

--- mmotm.orig/mm/ksm.c	2013-01-25 14:37:06.880206290 -0800
+++ mmotm/mm/ksm.c	2013-01-25 14:38:53.984208836 -0800
@@ -226,7 +226,9 @@ static unsigned int ksm_merge_across_nod
 #define KSM_RUN_STOP	0
 #define KSM_RUN_MERGE	1
 #define KSM_RUN_UNMERGE	2
-static unsigned int ksm_run = KSM_RUN_STOP;
+#define KSM_RUN_OFFLINE	4
+static unsigned long ksm_run = KSM_RUN_STOP;
+static void wait_while_offlining(void);
 
 static DECLARE_WAIT_QUEUE_HEAD(ksm_thread_wait);
 static DEFINE_MUTEX(ksm_thread_mutex);
@@ -1700,6 +1702,7 @@ static int ksm_scan_thread(void *nothing
 
 	while (!kthread_should_stop()) {
 		mutex_lock(&ksm_thread_mutex);
+		wait_while_offlining();
 		if (ksmd_should_run())
 			ksm_do_scan(ksm_thread_pages_to_scan);
 		mutex_unlock(&ksm_thread_mutex);
@@ -2056,6 +2059,22 @@ void ksm_migrate_page(struct page *newpa
 #endif /* CONFIG_MIGRATION */
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
+static int just_wait(void *word)
+{
+	schedule();
+	return 0;
+}
+
+static void wait_while_offlining(void)
+{
+	while (ksm_run & KSM_RUN_OFFLINE) {
+		mutex_unlock(&ksm_thread_mutex);
+		wait_on_bit(&ksm_run, ilog2(KSM_RUN_OFFLINE),
+			    just_wait, TASK_UNINTERRUPTIBLE);
+		mutex_lock(&ksm_thread_mutex);
+	}
+}
+
 static void ksm_check_stable_tree(unsigned long start_pfn,
 				  unsigned long end_pfn)
 {
@@ -2098,15 +2117,15 @@ static int ksm_memory_callback(struct no
 	switch (action) {
 	case MEM_GOING_OFFLINE:
 		/*
-		 * Keep it very simple for now: just lock out ksmd and
-		 * MADV_UNMERGEABLE while any memory is going offline.
-		 * mutex_lock_nested() is necessary because lockdep was alarmed
-		 * that here we take ksm_thread_mutex inside notifier chain
-		 * mutex, and later take notifier chain mutex inside
-		 * ksm_thread_mutex to unlock it.  But that's safe because both
-		 * are inside mem_hotplug_mutex.
+		 * Prevent ksm_do_scan(), unmerge_and_remove_all_rmap_items()
+		 * and remove_all_stable_nodes() while memory is going offline:
+		 * it is unsafe for them to touch the stable tree at this time.
+		 * But unmerge_ksm_pages(), rmap lookups and other entry points
+		 * which do not need the ksm_thread_mutex are all safe.
 		 */
-		mutex_lock_nested(&ksm_thread_mutex, SINGLE_DEPTH_NESTING);
+		mutex_lock(&ksm_thread_mutex);
+		ksm_run |= KSM_RUN_OFFLINE;
+		mutex_unlock(&ksm_thread_mutex);
		break;

	case MEM_OFFLINE:
@@ -2122,11 +2141,20 @@ static int ksm_memory_callback(struct no
 		/* fallthrough */
 
 	case MEM_CANCEL_OFFLINE:
+		mutex_lock(&ksm_thread_mutex);
+		ksm_run &= ~KSM_RUN_OFFLINE;
 		mutex_unlock(&ksm_thread_mutex);
+
+		smp_mb();	/* wake_up_bit advises this */
+		wake_up_bit(&ksm_run, ilog2(KSM_RUN_OFFLINE));
 		break;
 	}
 	return NOTIFY_OK;
 }
+#else
+static void wait_while_offlining(void)
+{
+}
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
 #ifdef CONFIG_SYSFS
@@ -2189,7