Re: [PATCH 11/11] ksm: stop hotremove lockdep warning

2013-02-11 Thread Hugh Dickins
On Fri, 8 Feb 2013, Gerald Schaefer wrote:
> On Fri, 25 Jan 2013 18:10:18 -0800 (PST)
> Hugh Dickins  wrote:
> 
> > Complaints are rare, but lockdep still does not understand the way
> > ksm_memory_callback(MEM_GOING_OFFLINE) takes ksm_thread_mutex, and
> > holds it until the ksm_memory_callback(MEM_OFFLINE): that appears
> > to be a problem because notifier callbacks are made under down_read
> > of blocking_notifier_head->rwsem (so first the mutex is taken while
> > holding the rwsem, then later the rwsem is taken while still holding
> > the mutex); but is not in fact a problem because mem_hotplug_mutex
> > is held throughout the dance.
> > 
> > There was an attempt to fix this with mutex_lock_nested(); but if that
> > happened to fool lockdep two years ago, apparently it does so no
> > longer.
> > 
> > I had hoped to eradicate this issue in extending KSM page migration
> > not to need the ksm_thread_mutex.  But then realized that although
> > the page migration itself is safe, we do still need to lock out ksmd
> > and other users of get_ksm_page() while offlining memory - at some
> > point between MEM_GOING_OFFLINE and MEM_OFFLINE, the struct pages
> > themselves may vanish, and get_ksm_page()'s accesses to them become a
> > violation.
> > 
> > So, give up on holding ksm_thread_mutex itself from MEM_GOING_OFFLINE
> > to MEM_OFFLINE, and add a KSM_RUN_OFFLINE flag, and
> > wait_while_offlining() checks, to achieve the same lockout without
> > being caught by lockdep. This is less elegant for KSM, but it's more
> > important to keep lockdep useful to other users - and I apologize for
> > how long it took to fix.
> 
> Thanks a lot for the patch! I verified that it fixes the lockdep warning
> that we got on memory hotremove.
> 
> > 
> > Reported-by: Gerald Schaefer 
> > Signed-off-by: Hugh Dickins 

Thank you for reporting the problem, and for testing and reporting back:
sorry again for taking so long to fix it.

Hugh
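
The KSM_RUN_OFFLINE / wait_while_offlining() scheme referred to above is, at
bottom, a flag-based lockout: the hotremove notifier raises a flag rather than
sitting on ksm_thread_mutex across the whole MEM_GOING_OFFLINE..MEM_OFFLINE
window, and ksmd drops the mutex and sleeps whenever it finds the flag set.
The sketch below redoes that shape with POSIX threads purely as an
illustration: it is not part of the patch, every name in it is invented, and
the kernel itself uses wait_on_bit()/wake_up_bit() rather than a condition
variable.  Build with "cc -pthread".

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t thread_mutex = PTHREAD_MUTEX_INITIALIZER; /* plays ksm_thread_mutex */
static pthread_mutex_t flag_lock    = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  flag_cond    = PTHREAD_COND_INITIALIZER;
static bool offlining;		/* plays KSM_RUN_OFFLINE; protected by flag_lock */
static bool stop_scanner;	/* protected by thread_mutex */

/* Called with thread_mutex held; may drop and retake it, like the patch. */
static void wait_while_offlining(void)
{
	for (;;) {
		pthread_mutex_lock(&flag_lock);
		if (!offlining) {
			pthread_mutex_unlock(&flag_lock);
			return;
		}
		pthread_mutex_unlock(&thread_mutex);	/* never block the offliner */
		while (offlining)
			pthread_cond_wait(&flag_cond, &flag_lock);
		pthread_mutex_unlock(&flag_lock);
		pthread_mutex_lock(&thread_mutex);	/* retake, then recheck the flag */
	}
}

/* MEM_GOING_OFFLINE analogue: wait out any current scan, then raise the flag. */
static void begin_offline(void)
{
	pthread_mutex_lock(&thread_mutex);
	pthread_mutex_lock(&flag_lock);
	offlining = true;
	pthread_mutex_unlock(&flag_lock);
	pthread_mutex_unlock(&thread_mutex);	/* nothing stays held across the window */
}

/* MEM_OFFLINE / MEM_CANCEL_OFFLINE analogue: clear the flag and wake waiters. */
static void end_offline(void)
{
	pthread_mutex_lock(&flag_lock);
	offlining = false;
	pthread_cond_broadcast(&flag_cond);
	pthread_mutex_unlock(&flag_lock);
}

static void *scanner(void *arg)
{
	(void)arg;
	for (;;) {
		pthread_mutex_lock(&thread_mutex);
		wait_while_offlining();
		bool done = stop_scanner;
		/* ... scanning that must not overlap an offline window ... */
		pthread_mutex_unlock(&thread_mutex);
		if (done)
			return NULL;
		usleep(10000);
	}
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, scanner, NULL);
	usleep(50000);
	begin_offline();	/* scanner now parks in wait_while_offlining() */
	usleep(50000);
	end_offline();		/* scanner resumes */
	usleep(50000);

	pthread_mutex_lock(&thread_mutex);
	stop_scanner = true;
	pthread_mutex_unlock(&thread_mutex);
	pthread_join(t, NULL);
	puts("done");
	return 0;
}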


Re: [PATCH 11/11] ksm: stop hotremove lockdep warning

2013-02-08 Thread Gerald Schaefer
On Fri, 25 Jan 2013 18:10:18 -0800 (PST)
Hugh Dickins  wrote:

> Complaints are rare, but lockdep still does not understand the way
> ksm_memory_callback(MEM_GOING_OFFLINE) takes ksm_thread_mutex, and
> holds it until the ksm_memory_callback(MEM_OFFLINE): that appears
> to be a problem because notifier callbacks are made under down_read
> of blocking_notifier_head->rwsem (so first the mutex is taken while
> holding the rwsem, then later the rwsem is taken while still holding
> the mutex); but is not in fact a problem because mem_hotplug_mutex
> is held throughout the dance.
> 
> There was an attempt to fix this with mutex_lock_nested(); but if that
> happened to fool lockdep two years ago, apparently it does so no
> longer.
> 
> I had hoped to eradicate this issue in extending KSM page migration
> not to need the ksm_thread_mutex.  But then realized that although
> the page migration itself is safe, we do still need to lock out ksmd
> and other users of get_ksm_page() while offlining memory - at some
> point between MEM_GOING_OFFLINE and MEM_OFFLINE, the struct pages
> themselves may vanish, and get_ksm_page()'s accesses to them become a
> violation.
> 
> So, give up on holding ksm_thread_mutex itself from MEM_GOING_OFFLINE
> to MEM_OFFLINE, and add a KSM_RUN_OFFLINE flag, and
> wait_while_offlining() checks, to achieve the same lockout without
> being caught by lockdep. This is less elegant for KSM, but it's more
> important to keep lockdep useful to other users - and I apologize for
> how long it took to fix.

Thanks a lot for the patch! I verified that it fixes the lockdep warning
that we got on memory hotremove.

> 
> Reported-by: Gerald Schaefer 
> Signed-off-by: Hugh Dickins 
> ---
>  mm/ksm.c |   55 +++--
>  1 file changed, 41 insertions(+), 14 deletions(-)
> 
> --- mmotm.orig/mm/ksm.c	2013-01-25 14:37:06.880206290 -0800
> +++ mmotm/mm/ksm.c	2013-01-25 14:38:53.984208836 -0800
> @@ -226,7 +226,9 @@ static unsigned int ksm_merge_across_nod
>  #define KSM_RUN_STOP	0
>  #define KSM_RUN_MERGE	1
>  #define KSM_RUN_UNMERGE	2
> -static unsigned int ksm_run = KSM_RUN_STOP;
> +#define KSM_RUN_OFFLINE	4
> +static unsigned long ksm_run = KSM_RUN_STOP;
> +static void wait_while_offlining(void);
> 
>  static DECLARE_WAIT_QUEUE_HEAD(ksm_thread_wait);
>  static DEFINE_MUTEX(ksm_thread_mutex);
> @@ -1700,6 +1702,7 @@ static int ksm_scan_thread(void *nothing
> 
>  	while (!kthread_should_stop()) {
>  		mutex_lock(&ksm_thread_mutex);
> +		wait_while_offlining();
>  		if (ksmd_should_run())
>  			ksm_do_scan(ksm_thread_pages_to_scan);
>  		mutex_unlock(&ksm_thread_mutex);
> @@ -2056,6 +2059,22 @@ void ksm_migrate_page(struct page *newpa
>  #endif /* CONFIG_MIGRATION */
> 
>  #ifdef CONFIG_MEMORY_HOTREMOVE
> +static int just_wait(void *word)
> +{
> +	schedule();
> +	return 0;
> +}
> +
> +static void wait_while_offlining(void)
> +{
> +	while (ksm_run & KSM_RUN_OFFLINE) {
> +		mutex_unlock(&ksm_thread_mutex);
> +		wait_on_bit(&ksm_run, ilog2(KSM_RUN_OFFLINE),
> +			    just_wait, TASK_UNINTERRUPTIBLE);
> +		mutex_lock(&ksm_thread_mutex);
> +	}
> +}
> +
>  static void ksm_check_stable_tree(unsigned long start_pfn,
>  				  unsigned long end_pfn)
>  {
> @@ -2098,15 +2117,15 @@ static int ksm_memory_callback(struct no
>  	switch (action) {
>  	case MEM_GOING_OFFLINE:
>  		/*
> -		 * Keep it very simple for now: just lock out ksmd and
> -		 * MADV_UNMERGEABLE while any memory is going offline.
> -		 * mutex_lock_nested() is necessary because lockdep was alarmed
> -		 * that here we take ksm_thread_mutex inside notifier chain
> -		 * mutex, and later take notifier chain mutex inside
> -		 * ksm_thread_mutex to unlock it.   But that's safe because both
> -		 * are inside mem_hotplug_mutex.
> +		 * Prevent ksm_do_scan(), unmerge_and_remove_all_rmap_items()
> +		 * and remove_all_stable_nodes() while memory is going offline:
> +		 * it is unsafe for them to touch the stable tree at this time.
> +		 * But unmerge_ksm_pages(), rmap lookups and other entry points
> +		 * which do not need the ksm_thread_mutex are all safe.
>  		 */
> -		mutex_lock_nested(&ksm_thread_mutex, SINGLE_DEPTH_NESTING);
> +		mutex_lock(&ksm_thread_mutex);
> +		ksm_run |= KSM_RUN_OFFLINE;
> +		mutex_unlock(&ksm_thread_mutex);
>  		break;
> 
>  	case MEM_OFFLINE:
> @@ -2122,11 +2141,20 @@ static int ksm_memory_callback(struct no
>  		/* fallthrough */
> 
>  	case MEM_CANCEL_OFFLINE:
> +		mutex_lock(&ksm_thread_mutex);
> +		ksm_run &= ~KSM_RUN_OFFLINE;
>  		mutex_unlock(&ksm_thread_mutex);
> +
> + 


Re: [PATCH 11/11] ksm: stop hotremove lockdep warning

2013-01-27 Thread Hugh Dickins
On Sun, 27 Jan 2013, Simon Jeons wrote:
> On Fri, 2013-01-25 at 18:10 -0800, Hugh Dickins wrote:
> > @@ -2098,15 +2117,15 @@ static int ksm_memory_callback(struct no
> > switch (action) {
> > case MEM_GOING_OFFLINE:
> > /*
> > -* Keep it very simple for now: just lock out ksmd and
> > -* MADV_UNMERGEABLE while any memory is going offline.
> > -* mutex_lock_nested() is necessary because lockdep was alarmed
> > -* that here we take ksm_thread_mutex inside notifier chain
> > -* mutex, and later take notifier chain mutex inside
> > -* ksm_thread_mutex to unlock it.   But that's safe because both
> > -* are inside mem_hotplug_mutex.
> > +* Prevent ksm_do_scan(), unmerge_and_remove_all_rmap_items()
> > +* and remove_all_stable_nodes() while memory is going offline:
> > +* it is unsafe for them to touch the stable tree at this time.
> > +* But unmerge_ksm_pages(), rmap lookups and other entry points
> 
> Why is unmerge_ksm_pages() beneath us safe for KSM memory hotremove?
> 
> > +* which do not need the ksm_thread_mutex are all safe.

It's just like userspace doing a write-fault on every KSM page in the vma.
If that were unsafe for memory hotremove, then it would not be KSM's
problem, memory hotremove would already be unsafe.  (But memory hotremove
is safe because it migrates away from all the pages to be removed before
it can reach MEM_OFFLINE.)
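
Concretely, the unmerge path being discussed is the one userspace reaches with
madvise(MADV_UNMERGEABLE): breaking the merge copies each shared KSM page back
into the mapping, which is why it is no worse for hotremove than ordinary write
faults.  A minimal userspace sketch of those entry points follows, as an
illustration only (it is not from this thread); it assumes a kernel built with
CONFIG_KSM, and whether any pages actually merge depends on the ksmd settings
under /sys/kernel/mm/ksm.

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	size_t len = 64 * (size_t)sysconf(_SC_PAGESIZE);
	char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	memset(buf, 0x5a, len);			/* identical pages: merge candidates */

	if (madvise(buf, len, MADV_MERGEABLE))	/* hand the range to ksmd */
		perror("MADV_MERGEABLE");
	sleep(2);				/* give ksmd a chance to scan and merge */

	/* Unmerge: the kernel breaks COW on every KSM page in the range,
	 * just as a userspace write to each page would. */
	if (madvise(buf, len, MADV_UNMERGEABLE))
		perror("MADV_UNMERGEABLE");

	munmap(buf, len);
	return 0;
}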


Re: [PATCH 11/11] ksm: stop hotremove lockdep warning

2013-01-26 Thread Simon Jeons
On Fri, 2013-01-25 at 18:10 -0800, Hugh Dickins wrote:
> Complaints are rare, but lockdep still does not understand the way
> ksm_memory_callback(MEM_GOING_OFFLINE) takes ksm_thread_mutex, and
> holds it until the ksm_memory_callback(MEM_OFFLINE): that appears
> to be a problem because notifier callbacks are made under down_read
> of blocking_notifier_head->rwsem (so first the mutex is taken while
> holding the rwsem, then later the rwsem is taken while still holding
> the mutex); but is not in fact a problem because mem_hotplug_mutex
> is held throughout the dance.
> 
> There was an attempt to fix this with mutex_lock_nested(); but if that
> happened to fool lockdep two years ago, apparently it does so no longer.
> 
> I had hoped to eradicate this issue in extending KSM page migration not
> to need the ksm_thread_mutex.  But then realized that although the page
> migration itself is safe, we do still need to lock out ksmd and other
> users of get_ksm_page() while offlining memory - at some point between
> MEM_GOING_OFFLINE and MEM_OFFLINE, the struct pages themselves may
> vanish, and get_ksm_page()'s accesses to them become a violation.
> 
> So, give up on holding ksm_thread_mutex itself from MEM_GOING_OFFLINE to
> MEM_OFFLINE, and add a KSM_RUN_OFFLINE flag, and wait_while_offlining()
> checks, to achieve the same lockout without being caught by lockdep.
> This is less elegant for KSM, but it's more important to keep lockdep
> useful to other users - and I apologize for how long it took to fix.
> 
> Reported-by: Gerald Schaefer 
> Signed-off-by: Hugh Dickins 
> ---
>  mm/ksm.c |   55 +++--
>  1 file changed, 41 insertions(+), 14 deletions(-)
> 
> --- mmotm.orig/mm/ksm.c	2013-01-25 14:37:06.880206290 -0800
> +++ mmotm/mm/ksm.c	2013-01-25 14:38:53.984208836 -0800
> @@ -226,7 +226,9 @@ static unsigned int ksm_merge_across_nod
>  #define KSM_RUN_STOP	0
>  #define KSM_RUN_MERGE	1
>  #define KSM_RUN_UNMERGE	2
> -static unsigned int ksm_run = KSM_RUN_STOP;
> +#define KSM_RUN_OFFLINE	4
> +static unsigned long ksm_run = KSM_RUN_STOP;
> +static void wait_while_offlining(void);
> 
>  static DECLARE_WAIT_QUEUE_HEAD(ksm_thread_wait);
>  static DEFINE_MUTEX(ksm_thread_mutex);
> @@ -1700,6 +1702,7 @@ static int ksm_scan_thread(void *nothing
> 
>  	while (!kthread_should_stop()) {
>  		mutex_lock(&ksm_thread_mutex);
> +		wait_while_offlining();
>  		if (ksmd_should_run())
>  			ksm_do_scan(ksm_thread_pages_to_scan);
>  		mutex_unlock(&ksm_thread_mutex);
> @@ -2056,6 +2059,22 @@ void ksm_migrate_page(struct page *newpa
>  #endif /* CONFIG_MIGRATION */
> 
>  #ifdef CONFIG_MEMORY_HOTREMOVE
> +static int just_wait(void *word)
> +{
> +	schedule();
> +	return 0;
> +}
> +
> +static void wait_while_offlining(void)
> +{
> +	while (ksm_run & KSM_RUN_OFFLINE) {
> +		mutex_unlock(&ksm_thread_mutex);
> +		wait_on_bit(&ksm_run, ilog2(KSM_RUN_OFFLINE),
> +			    just_wait, TASK_UNINTERRUPTIBLE);
> +		mutex_lock(&ksm_thread_mutex);
> +	}
> +}
> +
>  static void ksm_check_stable_tree(unsigned long start_pfn,
>  				  unsigned long end_pfn)
>  {
> @@ -2098,15 +2117,15 @@ static int ksm_memory_callback(struct no
>  	switch (action) {
>  	case MEM_GOING_OFFLINE:
>  		/*
> -		 * Keep it very simple for now: just lock out ksmd and
> -		 * MADV_UNMERGEABLE while any memory is going offline.
> -		 * mutex_lock_nested() is necessary because lockdep was alarmed
> -		 * that here we take ksm_thread_mutex inside notifier chain
> -		 * mutex, and later take notifier chain mutex inside
> -		 * ksm_thread_mutex to unlock it.   But that's safe because both
> -		 * are inside mem_hotplug_mutex.
> +		 * Prevent ksm_do_scan(), unmerge_and_remove_all_rmap_items()
> +		 * and remove_all_stable_nodes() while memory is going offline:
> +		 * it is unsafe for them to touch the stable tree at this time.
> +		 * But unmerge_ksm_pages(), rmap lookups and other entry points

Why is unmerge_ksm_pages() beneath us safe for KSM memory hotremove?

> +		 * which do not need the ksm_thread_mutex are all safe.
>  		 */
> -		mutex_lock_nested(&ksm_thread_mutex, SINGLE_DEPTH_NESTING);
> +		mutex_lock(&ksm_thread_mutex);
> +		ksm_run |= KSM_RUN_OFFLINE;
> +		mutex_unlock(&ksm_thread_mutex);
>  		break;
> 
>  	case MEM_OFFLINE:
> @@ -2122,11 +2141,20 @@ static int ksm_memory_callback(struct no
>  		/* fallthrough */
> 
>  	case MEM_CANCEL_OFFLINE:
> +		mutex_lock(&ksm_thread_mutex);
> +		ksm_run &= ~KSM_RUN_OFFLINE;
>  		mutex_unlock(&ksm_thread_mutex);
> +
> +		smp_mb();	/* wake_up_bit advises 


[PATCH 11/11] ksm: stop hotremove lockdep warning

2013-01-25 Thread Hugh Dickins
Complaints are rare, but lockdep still does not understand the way
ksm_memory_callback(MEM_GOING_OFFLINE) takes ksm_thread_mutex, and
holds it until the ksm_memory_callback(MEM_OFFLINE): that appears
to be a problem because notifier callbacks are made under down_read
of blocking_notifier_head->rwsem (so first the mutex is taken while
holding the rwsem, then later the rwsem is taken while still holding
the mutex); but is not in fact a problem because mem_hotplug_mutex
is held throughout the dance.

There was an attempt to fix this with mutex_lock_nested(); but if that
happened to fool lockdep two years ago, apparently it does so no longer.

I had hoped to eradicate this issue in extending KSM page migration not
to need the ksm_thread_mutex.  But then realized that although the page
migration itself is safe, we do still need to lock out ksmd and other
users of get_ksm_page() while offlining memory - at some point between
MEM_GOING_OFFLINE and MEM_OFFLINE, the struct pages themselves may
vanish, and get_ksm_page()'s accesses to them become a violation.

So, give up on holding ksm_thread_mutex itself from MEM_GOING_OFFLINE to
MEM_OFFLINE, and add a KSM_RUN_OFFLINE flag, and wait_while_offlining()
checks, to achieve the same lockout without being caught by lockdep.
This is less elegant for KSM, but it's more important to keep lockdep
useful to other users - and I apologize for how long it took to fix.

Reported-by: Gerald Schaefer 
Signed-off-by: Hugh Dickins 
---
 mm/ksm.c |   55 +++--
 1 file changed, 41 insertions(+), 14 deletions(-)

--- mmotm.orig/mm/ksm.c	2013-01-25 14:37:06.880206290 -0800
+++ mmotm/mm/ksm.c	2013-01-25 14:38:53.984208836 -0800
@@ -226,7 +226,9 @@ static unsigned int ksm_merge_across_nod
 #define KSM_RUN_STOP	0
 #define KSM_RUN_MERGE	1
 #define KSM_RUN_UNMERGE	2
-static unsigned int ksm_run = KSM_RUN_STOP;
+#define KSM_RUN_OFFLINE	4
+static unsigned long ksm_run = KSM_RUN_STOP;
+static void wait_while_offlining(void);
 
 static DECLARE_WAIT_QUEUE_HEAD(ksm_thread_wait);
 static DEFINE_MUTEX(ksm_thread_mutex);
@@ -1700,6 +1702,7 @@ static int ksm_scan_thread(void *nothing
 
 	while (!kthread_should_stop()) {
 		mutex_lock(&ksm_thread_mutex);
+		wait_while_offlining();
 		if (ksmd_should_run())
 			ksm_do_scan(ksm_thread_pages_to_scan);
 		mutex_unlock(&ksm_thread_mutex);
@@ -2056,6 +2059,22 @@ void ksm_migrate_page(struct page *newpa
 #endif /* CONFIG_MIGRATION */
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
+static int just_wait(void *word)
+{
+	schedule();
+	return 0;
+}
+
+static void wait_while_offlining(void)
+{
+	while (ksm_run & KSM_RUN_OFFLINE) {
+		mutex_unlock(&ksm_thread_mutex);
+		wait_on_bit(&ksm_run, ilog2(KSM_RUN_OFFLINE),
+			    just_wait, TASK_UNINTERRUPTIBLE);
+		mutex_lock(&ksm_thread_mutex);
+	}
+}
+
 static void ksm_check_stable_tree(unsigned long start_pfn,
 				  unsigned long end_pfn)
 {
@@ -2098,15 +2117,15 @@ static int ksm_memory_callback(struct no
 	switch (action) {
 	case MEM_GOING_OFFLINE:
 		/*
-		 * Keep it very simple for now: just lock out ksmd and
-		 * MADV_UNMERGEABLE while any memory is going offline.
-		 * mutex_lock_nested() is necessary because lockdep was alarmed
-		 * that here we take ksm_thread_mutex inside notifier chain
-		 * mutex, and later take notifier chain mutex inside
-		 * ksm_thread_mutex to unlock it.   But that's safe because both
-		 * are inside mem_hotplug_mutex.
+		 * Prevent ksm_do_scan(), unmerge_and_remove_all_rmap_items()
+		 * and remove_all_stable_nodes() while memory is going offline:
+		 * it is unsafe for them to touch the stable tree at this time.
+		 * But unmerge_ksm_pages(), rmap lookups and other entry points
+		 * which do not need the ksm_thread_mutex are all safe.
 		 */
-		mutex_lock_nested(&ksm_thread_mutex, SINGLE_DEPTH_NESTING);
+		mutex_lock(&ksm_thread_mutex);
+		ksm_run |= KSM_RUN_OFFLINE;
+		mutex_unlock(&ksm_thread_mutex);
 		break;
 
 	case MEM_OFFLINE:
@@ -2122,11 +2141,20 @@ static int ksm_memory_callback(struct no
 		/* fallthrough */
 
 	case MEM_CANCEL_OFFLINE:
+		mutex_lock(&ksm_thread_mutex);
+		ksm_run &= ~KSM_RUN_OFFLINE;
 		mutex_unlock(&ksm_thread_mutex);
+
+		smp_mb();	/* wake_up_bit advises this */
+		wake_up_bit(&ksm_run, ilog2(KSM_RUN_OFFLINE));
 		break;
 	}
 	return NOTIFY_OK;
 }
+#else
+static void wait_while_offlining(void)
+{
+}
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
 #ifdef CONFIG_SYSFS
@@ -2189,7 
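
For reference, the KSM_RUN_STOP / KSM_RUN_MERGE / KSM_RUN_UNMERGE values
touched in the first hunk are the ones userspace writes to
/sys/kernel/mm/ksm/run; the new KSM_RUN_OFFLINE bit is internal to the kernel
and is not meant to be set from sysfs.  A small probe of that interface,
offered only as an illustration (not part of the patch; it assumes a
CONFIG_KSM kernel, with root needed for the write):

#include <stdio.h>

static long read_long(const char *path)
{
	long val = -1;
	FILE *f = fopen(path, "r");

	if (f) {
		if (fscanf(f, "%ld", &val) != 1)
			val = -1;
		fclose(f);
	}
	return val;
}

int main(void)
{
	FILE *f = fopen("/sys/kernel/mm/ksm/run", "w");

	if (f) {
		fputs("1\n", f);	/* KSM_RUN_MERGE: let ksmd scan and merge */
		fclose(f);
	} else {
		perror("ksm/run (root needed to write)");
	}

	printf("run           = %ld\n", read_long("/sys/kernel/mm/ksm/run"));
	printf("pages_shared  = %ld\n", read_long("/sys/kernel/mm/ksm/pages_shared"));
	printf("pages_sharing = %ld\n", read_long("/sys/kernel/mm/ksm/pages_sharing"));
	return 0;
}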
