Re: [PATCH 6/11] ksm: remove old stable nodes more thoroughly

2013-02-14 Thread Hugh Dickins
On Thu, 14 Feb 2013, Mel Gorman wrote:
> On Fri, Feb 08, 2013 at 11:33:40AM -0800, Hugh Dickins wrote:
> > 
> > What I found is that a 4th cause emerges once KSM migration
> > is properly working: that interval during page migration when the old
> > page has been fully unmapped but the new not yet mapped in its place.
> > 
> 
> For anyone else watching -- normal page migration expects to be protected
> during that particular window with migration ptes. Any references to the
> PTE mapping a page being migrated faults on a swap-like PTE and waits
> in migration_entry_wait().
> 
> > The KSM COW breaking cannot see a page there then, so it ends up with
> > a (newly migrated) KSM page left behind.  Almost certainly has to be
> > fixed in follow_page(), but I've not yet settled on its final form -
> > the fix I have works well, but a different approach might be better.
> > 

The fix I had (following migration entry to old page) was a bit too
PageKsm specfic, and probably wrong for when get_user_pages() needs
to get a hold on the _new_ page.

> 
> follow_page() is one option. My guess is that you're thinking of adding
> a FOLL_ flag that will cause follow_page() to check is_migration_entry()
> and migration_entry_wait() if the flag is present.

Maybe a FOLL_flag, but I was thinking of doing it always.  The usual
get_user_pages() case will already wait in handle_mm_fault() and works
okay, and I didn't identify a problem case for follow_page() apart from
this ksm.c usage; but I did wonder if someone might have or add code
which gets similarly caught out by the migration case.

It's not a change I'd dare to make (without a FOLL_flag) if Andrea
hadn't already added a wait_split_huge_page() into follow_page();
and I need to convince myself that adding another cause for waiting
is necessarily safe (perhaps adding a might_sleep would be good).

Sorry, I expected to have posted follow-up patches days and days ago,
but in fact my time has vanished elsewhere and I've not even started.

> 
> Otherwise you would need to check for migration ptes in a number of places
> under page lock and then hold the lock for long periods of time to prevent
> migration starting. I did not check this option in depth because it quickly
> looked like it would be a mess, with long page lock hold times and might
> not even be workable.

Yes, I think that's more or less why I quickly decided on doing it in
follow_page().

Another option would be to move the ksm_migrate_page() callsite, and
allow it to reject the migration attempt when "inconvenient" (I haven't
stopped to think of the definition of inconvenient).  Though it wouldn't
fail often enough for anyone out there to care, that option just feels
like a shameful cop-out to me: I'm trying to improve migration, not add
strange cases when it fails.

Hugh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 6/11] ksm: remove old stable nodes more thoroughly

2013-02-14 Thread Mel Gorman
On Fri, Feb 08, 2013 at 11:33:40AM -0800, Hugh Dickins wrote:
> > > 
> > > 
> > > 2. __ksm_enter() has a nice little optimization, to insert the new mm
> > > just behind ksmd's cursor, so there's a full pass for it to stabilize
> > > (or be removed) before ksmd addresses it.  Nice when ksmd is running,
> > > but not so nice when we're trying to unmerge all mms: we were missing
> > > those mms forked and inserted behind the unmerge cursor.  Easily fixed
> > > by inserting at the end when KSM_RUN_UNMERGE.
> > > 
> > > 3. It is possible for a KSM page to be faulted back from swapcache into
> > > an mm, just after unmerge_and_remove_all_rmap_items() scanned past it.
> > > Fix this by copying on fault when KSM_RUN_UNMERGE: but that is private
> > > to ksm.c, so dissolve the distinction between ksm_might_need_to_copy()
> > > and ksm_does_need_to_copy(), doing it all in the one call into ksm.c.
> 
> What I found is that a 4th cause emerges once KSM migration
> is properly working: that interval during page migration when the old
> page has been fully unmapped but the new not yet mapped in its place.
> 

For anyone else watching -- normal page migration expects to be protected
during that particular window with migration ptes. Any references to the
PTE mapping a page being migrated faults on a swap-like PTE and waits
in migration_entry_wait().

> The KSM COW breaking cannot see a page there then, so it ends up with
> a (newly migrated) KSM page left behind.  Almost certainly has to be
> fixed in follow_page(), but I've not yet settled on its final form -
> the fix I have works well, but a different approach might be better.
> 

follow_page() is one option. My guess is that you're thinking of adding
a FOLL_ flag that will cause follow_page() to check is_migration_entry()
and migration_entry_wait() if the flag is present.

Otherwise you would need to check for migration ptes in a number of places
under page lock and then hold the lock for long periods of time to prevent
migration starting. I did not check this option in depth because it quickly
looked like it would be a mess, with long page lock hold times and might
not even be workable.

> > > +static int remove_all_stable_nodes(void)
> > > +{
> > > + struct stable_node *stable_node;
> > > + int nid;
> > > + int err = 0;
> > > +
> > > + for (nid = 0; nid < nr_node_ids; nid++) {
> > > + while (root_stable_tree[nid].rb_node) {
> > > + stable_node = rb_entry(root_stable_tree[nid].rb_node,
> > > + struct stable_node, node);
> > > + if (remove_stable_node(stable_node)) {
> > > + err = -EBUSY;
> > > + break;  /* proceed to next nid */
> > > + }
> > 
> > If remove_stable_node() returns an error then it's quite possible that it'll
> > go boom when that page is encountered later but it's not guaranteed. It'd
> > be best effort to continue removing as many of the stable nodes anyway.
> > We're in trouble either way of course.
> 
> If it returns an error, then indeed something we don't yet understand
> has occurred, and we shall want to debug it.  But unless it's due to
> corruption somewhere, we shouldn't be in much trouble, shouldn't go boom:
> remove_all_stable_nodes() error is ignored at the end of unmerging, it
> will be tried again when changing merge_across_nodes, and an error
> then will just prevent changing merge_across_nodes at that time.  So
> the mysteriously unremovable stable nodes remain the same kind of tree.
> 

Ok.

-- 
Mel Gorman
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 6/11] ksm: remove old stable nodes more thoroughly

2013-02-14 Thread Mel Gorman
On Fri, Feb 08, 2013 at 11:33:40AM -0800, Hugh Dickins wrote:
   SNIP
   
   2. __ksm_enter() has a nice little optimization, to insert the new mm
   just behind ksmd's cursor, so there's a full pass for it to stabilize
   (or be removed) before ksmd addresses it.  Nice when ksmd is running,
   but not so nice when we're trying to unmerge all mms: we were missing
   those mms forked and inserted behind the unmerge cursor.  Easily fixed
   by inserting at the end when KSM_RUN_UNMERGE.
   
   3. It is possible for a KSM page to be faulted back from swapcache into
   an mm, just after unmerge_and_remove_all_rmap_items() scanned past it.
   Fix this by copying on fault when KSM_RUN_UNMERGE: but that is private
   to ksm.c, so dissolve the distinction between ksm_might_need_to_copy()
   and ksm_does_need_to_copy(), doing it all in the one call into ksm.c.
 
 What I found is that a 4th cause emerges once KSM migration
 is properly working: that interval during page migration when the old
 page has been fully unmapped but the new not yet mapped in its place.
 

For anyone else watching -- normal page migration expects to be protected
during that particular window with migration ptes. Any references to the
PTE mapping a page being migrated faults on a swap-like PTE and waits
in migration_entry_wait().

 The KSM COW breaking cannot see a page there then, so it ends up with
 a (newly migrated) KSM page left behind.  Almost certainly has to be
 fixed in follow_page(), but I've not yet settled on its final form -
 the fix I have works well, but a different approach might be better.
 

follow_page() is one option. My guess is that you're thinking of adding
a FOLL_ flag that will cause follow_page() to check is_migration_entry()
and migration_entry_wait() if the flag is present.

Otherwise you would need to check for migration ptes in a number of places
under page lock and then hold the lock for long periods of time to prevent
migration starting. I did not check this option in depth because it quickly
looked like it would be a mess, with long page lock hold times and might
not even be workable.

   +static int remove_all_stable_nodes(void)
   +{
   + struct stable_node *stable_node;
   + int nid;
   + int err = 0;
   +
   + for (nid = 0; nid  nr_node_ids; nid++) {
   + while (root_stable_tree[nid].rb_node) {
   + stable_node = rb_entry(root_stable_tree[nid].rb_node,
   + struct stable_node, node);
   + if (remove_stable_node(stable_node)) {
   + err = -EBUSY;
   + break;  /* proceed to next nid */
   + }
  
  If remove_stable_node() returns an error then it's quite possible that it'll
  go boom when that page is encountered later but it's not guaranteed. It'd
  be best effort to continue removing as many of the stable nodes anyway.
  We're in trouble either way of course.
 
 If it returns an error, then indeed something we don't yet understand
 has occurred, and we shall want to debug it.  But unless it's due to
 corruption somewhere, we shouldn't be in much trouble, shouldn't go boom:
 remove_all_stable_nodes() error is ignored at the end of unmerging, it
 will be tried again when changing merge_across_nodes, and an error
 then will just prevent changing merge_across_nodes at that time.  So
 the mysteriously unremovable stable nodes remain the same kind of tree.
 

Ok.

-- 
Mel Gorman
SUSE Labs
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 6/11] ksm: remove old stable nodes more thoroughly

2013-02-14 Thread Hugh Dickins
On Thu, 14 Feb 2013, Mel Gorman wrote:
 On Fri, Feb 08, 2013 at 11:33:40AM -0800, Hugh Dickins wrote:
  
  What I found is that a 4th cause emerges once KSM migration
  is properly working: that interval during page migration when the old
  page has been fully unmapped but the new not yet mapped in its place.
  
 
 For anyone else watching -- normal page migration expects to be protected
 during that particular window with migration ptes. Any references to the
 PTE mapping a page being migrated faults on a swap-like PTE and waits
 in migration_entry_wait().
 
  The KSM COW breaking cannot see a page there then, so it ends up with
  a (newly migrated) KSM page left behind.  Almost certainly has to be
  fixed in follow_page(), but I've not yet settled on its final form -
  the fix I have works well, but a different approach might be better.
  

The fix I had (following migration entry to old page) was a bit too
PageKsm specfic, and probably wrong for when get_user_pages() needs
to get a hold on the _new_ page.

 
 follow_page() is one option. My guess is that you're thinking of adding
 a FOLL_ flag that will cause follow_page() to check is_migration_entry()
 and migration_entry_wait() if the flag is present.

Maybe a FOLL_flag, but I was thinking of doing it always.  The usual
get_user_pages() case will already wait in handle_mm_fault() and works
okay, and I didn't identify a problem case for follow_page() apart from
this ksm.c usage; but I did wonder if someone might have or add code
which gets similarly caught out by the migration case.

It's not a change I'd dare to make (without a FOLL_flag) if Andrea
hadn't already added a wait_split_huge_page() into follow_page();
and I need to convince myself that adding another cause for waiting
is necessarily safe (perhaps adding a might_sleep would be good).

Sorry, I expected to have posted follow-up patches days and days ago,
but in fact my time has vanished elsewhere and I've not even started.

 
 Otherwise you would need to check for migration ptes in a number of places
 under page lock and then hold the lock for long periods of time to prevent
 migration starting. I did not check this option in depth because it quickly
 looked like it would be a mess, with long page lock hold times and might
 not even be workable.

Yes, I think that's more or less why I quickly decided on doing it in
follow_page().

Another option would be to move the ksm_migrate_page() callsite, and
allow it to reject the migration attempt when inconvenient (I haven't
stopped to think of the definition of inconvenient).  Though it wouldn't
fail often enough for anyone out there to care, that option just feels
like a shameful cop-out to me: I'm trying to improve migration, not add
strange cases when it fails.

Hugh
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 6/11] ksm: remove old stable nodes more thoroughly

2013-02-08 Thread Hugh Dickins
On Tue, 5 Feb 2013, Mel Gorman wrote:
> On Fri, Jan 25, 2013 at 06:01:59PM -0800, Hugh Dickins wrote:
> > Switching merge_across_nodes after running KSM is liable to oops on stale
> > nodes still left over from the previous stable tree.  It's not something
> > that people will often want to do, but it would be lame to demand a reboot
> > when they're trying to determine which merge_across_nodes setting is best.
> > 
> > How can this happen?  We only permit switching merge_across_nodes when
> > pages_shared is 0, and usually set run 2 to force that beforehand, which
> > ought to unmerge everything: yet oopses still occur when you then run 1.
> > 
> 
> When reviewing patch 1, I missed that the pages_shared check would prevent
> most of the problems I was envisioning with leftover entries in the
> stable tree. Sorry about that.

No apology necessary!

> 
> > Three causes:
> > 
> > 1. The old stable tree (built according to the inverse merge_across_nodes)
> > has not been fully torn down.  A stable node lingers until get_ksm_page()
> > notices that the page it references no longer references it: but the page
> > is not necessarily freed as soon as expected, particularly when swapcache.
> > 
> > Fix this with a pass through the old stable tree, applying get_ksm_page()
> > to each of the remaining nodes (most found stale and removed immediately),
> > with forced removal of any left over.  Unless the page is still mapped:
> > I've not seen that case, it shouldn't occur, but better to WARN_ON_ONCE
> > and EBUSY than BUG.

But once I applied the testing for this to the completed patch series,
I did start seeing that WARN_ON_ONCE: it's made safe by the EBUSY,
but not working as intended.  Cause outlined below.

> > 
> > 2. __ksm_enter() has a nice little optimization, to insert the new mm
> > just behind ksmd's cursor, so there's a full pass for it to stabilize
> > (or be removed) before ksmd addresses it.  Nice when ksmd is running,
> > but not so nice when we're trying to unmerge all mms: we were missing
> > those mms forked and inserted behind the unmerge cursor.  Easily fixed
> > by inserting at the end when KSM_RUN_UNMERGE.
> > 
> > 3. It is possible for a KSM page to be faulted back from swapcache into
> > an mm, just after unmerge_and_remove_all_rmap_items() scanned past it.
> > Fix this by copying on fault when KSM_RUN_UNMERGE: but that is private
> > to ksm.c, so dissolve the distinction between ksm_might_need_to_copy()
> > and ksm_does_need_to_copy(), doing it all in the one call into ksm.c.

What I found is that a 4th cause emerges once KSM migration
is properly working: that interval during page migration when the old
page has been fully unmapped but the new not yet mapped in its place.

The KSM COW breaking cannot see a page there then, so it ends up with
a (newly migrated) KSM page left behind.  Almost certainly has to be
fixed in follow_page(), but I've not yet settled on its final form -
the fix I have works well, but a different approach might be better.

I'm also puzzled that I've never in practice been hit by a 5th cause:
swapoff's try_to_unuse() is much like faulting, and ought to have the
same ksm_might_need_to_copy() safeguards as faulting (or at least,
I cannot see why not).

> > --- mmotm.orig/mm/ksm.c 2013-01-25 14:36:58.856206099 -0800
> > +++ mmotm/mm/ksm.c  2013-01-25 14:37:00.768206145 -0800
> > @@ -644,6 +644,57 @@ static int unmerge_ksm_pages(struct vm_a
> >  /*
> >   * Only called through the sysfs control interface:
> >   */
> > +static int remove_stable_node(struct stable_node *stable_node)
> > +{
> > +   struct page *page;
> > +   int err;
> > +
> > +   page = get_ksm_page(stable_node, true);
> > +   if (!page) {
> > +   /*
> > +* get_ksm_page did remove_node_from_stable_tree itself.
> > +*/
> > +   return 0;
> > +   }
> > +
> > +   if (WARN_ON_ONCE(page_mapped(page)))
> > +   err = -EBUSY;
> > +   else {
> > +   /*
> 
> It will probably be very obvious to people familiar with ksm.c but even
> so maybe remind the reader that the pages must already have been unmerged
> 
> * This page must already have been unmerged and should be stale.
> * It might be in a pagevec waiting to be freed or it might be

Okay, I'll add a little more comment there;
but I need to think longer for exactly how to express it.

> ..
> 
> 
> 
> > +* This page might be in a pagevec waiting to be freed,
> > +* or it might be PageSwapCache (perhaps under writeback),
> > +* or it might have been removed from swapcache a moment ago.
> > +*/
> > +   set_page_stable_node(page, NULL);
> > +   remove_node_from_stable_tree(stable_node);
> > +   err = 0;
> > +   }
> > +
> > +   unlock_page(page);
> > +   put_page(page);
> > +   return err;
> > +}
> > +
> > +static int remove_all_stable_nodes(void)
> > +{
> > +   struct stable_node *stable_node;
> > +   int nid;
> > +   int err = 0;
> > +

Re: [PATCH 6/11] ksm: remove old stable nodes more thoroughly

2013-02-08 Thread Hugh Dickins
On Tue, 5 Feb 2013, Mel Gorman wrote:
 On Fri, Jan 25, 2013 at 06:01:59PM -0800, Hugh Dickins wrote:
  Switching merge_across_nodes after running KSM is liable to oops on stale
  nodes still left over from the previous stable tree.  It's not something
  that people will often want to do, but it would be lame to demand a reboot
  when they're trying to determine which merge_across_nodes setting is best.
  
  How can this happen?  We only permit switching merge_across_nodes when
  pages_shared is 0, and usually set run 2 to force that beforehand, which
  ought to unmerge everything: yet oopses still occur when you then run 1.
  
 
 When reviewing patch 1, I missed that the pages_shared check would prevent
 most of the problems I was envisioning with leftover entries in the
 stable tree. Sorry about that.

No apology necessary!

 
  Three causes:
  
  1. The old stable tree (built according to the inverse merge_across_nodes)
  has not been fully torn down.  A stable node lingers until get_ksm_page()
  notices that the page it references no longer references it: but the page
  is not necessarily freed as soon as expected, particularly when swapcache.
  
  Fix this with a pass through the old stable tree, applying get_ksm_page()
  to each of the remaining nodes (most found stale and removed immediately),
  with forced removal of any left over.  Unless the page is still mapped:
  I've not seen that case, it shouldn't occur, but better to WARN_ON_ONCE
  and EBUSY than BUG.

But once I applied the testing for this to the completed patch series,
I did start seeing that WARN_ON_ONCE: it's made safe by the EBUSY,
but not working as intended.  Cause outlined below.

  
  2. __ksm_enter() has a nice little optimization, to insert the new mm
  just behind ksmd's cursor, so there's a full pass for it to stabilize
  (or be removed) before ksmd addresses it.  Nice when ksmd is running,
  but not so nice when we're trying to unmerge all mms: we were missing
  those mms forked and inserted behind the unmerge cursor.  Easily fixed
  by inserting at the end when KSM_RUN_UNMERGE.
  
  3. It is possible for a KSM page to be faulted back from swapcache into
  an mm, just after unmerge_and_remove_all_rmap_items() scanned past it.
  Fix this by copying on fault when KSM_RUN_UNMERGE: but that is private
  to ksm.c, so dissolve the distinction between ksm_might_need_to_copy()
  and ksm_does_need_to_copy(), doing it all in the one call into ksm.c.

What I found is that a 4th cause emerges once KSM migration
is properly working: that interval during page migration when the old
page has been fully unmapped but the new not yet mapped in its place.

The KSM COW breaking cannot see a page there then, so it ends up with
a (newly migrated) KSM page left behind.  Almost certainly has to be
fixed in follow_page(), but I've not yet settled on its final form -
the fix I have works well, but a different approach might be better.

I'm also puzzled that I've never in practice been hit by a 5th cause:
swapoff's try_to_unuse() is much like faulting, and ought to have the
same ksm_might_need_to_copy() safeguards as faulting (or at least,
I cannot see why not).

  --- mmotm.orig/mm/ksm.c 2013-01-25 14:36:58.856206099 -0800
  +++ mmotm/mm/ksm.c  2013-01-25 14:37:00.768206145 -0800
  @@ -644,6 +644,57 @@ static int unmerge_ksm_pages(struct vm_a
   /*
* Only called through the sysfs control interface:
*/
  +static int remove_stable_node(struct stable_node *stable_node)
  +{
  +   struct page *page;
  +   int err;
  +
  +   page = get_ksm_page(stable_node, true);
  +   if (!page) {
  +   /*
  +* get_ksm_page did remove_node_from_stable_tree itself.
  +*/
  +   return 0;
  +   }
  +
  +   if (WARN_ON_ONCE(page_mapped(page)))
  +   err = -EBUSY;
  +   else {
  +   /*
 
 It will probably be very obvious to people familiar with ksm.c but even
 so maybe remind the reader that the pages must already have been unmerged
 
 * This page must already have been unmerged and should be stale.
 * It might be in a pagevec waiting to be freed or it might be

Okay, I'll add a little more comment there;
but I need to think longer for exactly how to express it.

 ..
 
 
 
  +* This page might be in a pagevec waiting to be freed,
  +* or it might be PageSwapCache (perhaps under writeback),
  +* or it might have been removed from swapcache a moment ago.
  +*/
  +   set_page_stable_node(page, NULL);
  +   remove_node_from_stable_tree(stable_node);
  +   err = 0;
  +   }
  +
  +   unlock_page(page);
  +   put_page(page);
  +   return err;
  +}
  +
  +static int remove_all_stable_nodes(void)
  +{
  +   struct stable_node *stable_node;
  +   int nid;
  +   int err = 0;
  +
  +   for (nid = 0; nid  nr_node_ids; nid++) {
  +   while (root_stable_tree[nid].rb_node) {
  +   stable_node = 

Re: [PATCH 6/11] ksm: remove old stable nodes more thoroughly

2013-02-05 Thread Mel Gorman
On Fri, Jan 25, 2013 at 06:01:59PM -0800, Hugh Dickins wrote:
> Switching merge_across_nodes after running KSM is liable to oops on stale
> nodes still left over from the previous stable tree.  It's not something
> that people will often want to do, but it would be lame to demand a reboot
> when they're trying to determine which merge_across_nodes setting is best.
> 
> How can this happen?  We only permit switching merge_across_nodes when
> pages_shared is 0, and usually set run 2 to force that beforehand, which
> ought to unmerge everything: yet oopses still occur when you then run 1.
> 

When reviewing patch 1, I missed that the pages_shared check would prevent
most of the problems I was envisioning with leftover entries in the
stable tree. Sorry about that.

> Three causes:
> 
> 1. The old stable tree (built according to the inverse merge_across_nodes)
> has not been fully torn down.  A stable node lingers until get_ksm_page()
> notices that the page it references no longer references it: but the page
> is not necessarily freed as soon as expected, particularly when swapcache.
> 
> Fix this with a pass through the old stable tree, applying get_ksm_page()
> to each of the remaining nodes (most found stale and removed immediately),
> with forced removal of any left over.  Unless the page is still mapped:
> I've not seen that case, it shouldn't occur, but better to WARN_ON_ONCE
> and EBUSY than BUG.
> 
> 2. __ksm_enter() has a nice little optimization, to insert the new mm
> just behind ksmd's cursor, so there's a full pass for it to stabilize
> (or be removed) before ksmd addresses it.  Nice when ksmd is running,
> but not so nice when we're trying to unmerge all mms: we were missing
> those mms forked and inserted behind the unmerge cursor.  Easily fixed
> by inserting at the end when KSM_RUN_UNMERGE.
> 
> 3. It is possible for a KSM page to be faulted back from swapcache into
> an mm, just after unmerge_and_remove_all_rmap_items() scanned past it.
> Fix this by copying on fault when KSM_RUN_UNMERGE: but that is private
> to ksm.c, so dissolve the distinction between ksm_might_need_to_copy()
> and ksm_does_need_to_copy(), doing it all in the one call into ksm.c.
> 
> A long outstanding, unrelated bugfix sneaks in with that third fix:
> ksm_does_need_to_copy() would copy from a !PageUptodate page (implying
> I/O error when read in from swap) to a page which it then marks Uptodate.
> Fix this case by not copying, letting do_swap_page() discover the error.
> 
> Signed-off-by: Hugh Dickins 
> ---
>  include/linux/ksm.h |   18 ++---
>  mm/ksm.c|   83 +++---
>  mm/memory.c |   19 -
>  3 files changed, 92 insertions(+), 28 deletions(-)
> 
> --- mmotm.orig/include/linux/ksm.h2013-01-25 14:27:58.220193250 -0800
> +++ mmotm/include/linux/ksm.h 2013-01-25 14:37:00.764206145 -0800
> @@ -16,9 +16,6 @@
>  struct stable_node;
>  struct mem_cgroup;
>  
> -struct page *ksm_does_need_to_copy(struct page *page,
> - struct vm_area_struct *vma, unsigned long address);
> -
>  #ifdef CONFIG_KSM
>  int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
>   unsigned long end, int advice, unsigned long *vm_flags);
> @@ -73,15 +70,8 @@ static inline void set_page_stable_node(
>   * We'd like to make this conditional on vma->vm_flags & VM_MERGEABLE,
>   * but what if the vma was unmerged while the page was swapped out?
>   */
> -static inline int ksm_might_need_to_copy(struct page *page,
> - struct vm_area_struct *vma, unsigned long address)
> -{
> - struct anon_vma *anon_vma = page_anon_vma(page);
> -
> - return anon_vma &&
> - (anon_vma->root != vma->anon_vma->root ||
> -  page->index != linear_page_index(vma, address));
> -}
> +struct page *ksm_might_need_to_copy(struct page *page,
> + struct vm_area_struct *vma, unsigned long address);
>  
>  int page_referenced_ksm(struct page *page,
>   struct mem_cgroup *memcg, unsigned long *vm_flags);
> @@ -113,10 +103,10 @@ static inline int ksm_madvise(struct vm_
>   return 0;
>  }
>  
> -static inline int ksm_might_need_to_copy(struct page *page,
> +static inline struct page *ksm_might_need_to_copy(struct page *page,
>   struct vm_area_struct *vma, unsigned long address)
>  {
> - return 0;
> + return page;
>  }
>  
>  static inline int page_referenced_ksm(struct page *page,
> --- mmotm.orig/mm/ksm.c   2013-01-25 14:36:58.856206099 -0800
> +++ mmotm/mm/ksm.c2013-01-25 14:37:00.768206145 -0800
> @@ -644,6 +644,57 @@ static int unmerge_ksm_pages(struct vm_a
>  /*
>   * Only called through the sysfs control interface:
>   */
> +static int remove_stable_node(struct stable_node *stable_node)
> +{
> + struct page *page;
> + int err;
> +
> + page = get_ksm_page(stable_node, true);
> + if (!page) {
> + /*
> +  * 

Re: [PATCH 6/11] ksm: remove old stable nodes more thoroughly

2013-02-05 Thread Mel Gorman
On Fri, Jan 25, 2013 at 06:01:59PM -0800, Hugh Dickins wrote:
 Switching merge_across_nodes after running KSM is liable to oops on stale
 nodes still left over from the previous stable tree.  It's not something
 that people will often want to do, but it would be lame to demand a reboot
 when they're trying to determine which merge_across_nodes setting is best.
 
 How can this happen?  We only permit switching merge_across_nodes when
 pages_shared is 0, and usually set run 2 to force that beforehand, which
 ought to unmerge everything: yet oopses still occur when you then run 1.
 

When reviewing patch 1, I missed that the pages_shared check would prevent
most of the problems I was envisioning with leftover entries in the
stable tree. Sorry about that.

 Three causes:
 
 1. The old stable tree (built according to the inverse merge_across_nodes)
 has not been fully torn down.  A stable node lingers until get_ksm_page()
 notices that the page it references no longer references it: but the page
 is not necessarily freed as soon as expected, particularly when swapcache.
 
 Fix this with a pass through the old stable tree, applying get_ksm_page()
 to each of the remaining nodes (most found stale and removed immediately),
 with forced removal of any left over.  Unless the page is still mapped:
 I've not seen that case, it shouldn't occur, but better to WARN_ON_ONCE
 and EBUSY than BUG.
 
 2. __ksm_enter() has a nice little optimization, to insert the new mm
 just behind ksmd's cursor, so there's a full pass for it to stabilize
 (or be removed) before ksmd addresses it.  Nice when ksmd is running,
 but not so nice when we're trying to unmerge all mms: we were missing
 those mms forked and inserted behind the unmerge cursor.  Easily fixed
 by inserting at the end when KSM_RUN_UNMERGE.
 
 3. It is possible for a KSM page to be faulted back from swapcache into
 an mm, just after unmerge_and_remove_all_rmap_items() scanned past it.
 Fix this by copying on fault when KSM_RUN_UNMERGE: but that is private
 to ksm.c, so dissolve the distinction between ksm_might_need_to_copy()
 and ksm_does_need_to_copy(), doing it all in the one call into ksm.c.
 
 A long outstanding, unrelated bugfix sneaks in with that third fix:
 ksm_does_need_to_copy() would copy from a !PageUptodate page (implying
 I/O error when read in from swap) to a page which it then marks Uptodate.
 Fix this case by not copying, letting do_swap_page() discover the error.
 
 Signed-off-by: Hugh Dickins hu...@google.com
 ---
  include/linux/ksm.h |   18 ++---
  mm/ksm.c|   83 +++---
  mm/memory.c |   19 -
  3 files changed, 92 insertions(+), 28 deletions(-)
 
 --- mmotm.orig/include/linux/ksm.h2013-01-25 14:27:58.220193250 -0800
 +++ mmotm/include/linux/ksm.h 2013-01-25 14:37:00.764206145 -0800
 @@ -16,9 +16,6 @@
  struct stable_node;
  struct mem_cgroup;
  
 -struct page *ksm_does_need_to_copy(struct page *page,
 - struct vm_area_struct *vma, unsigned long address);
 -
  #ifdef CONFIG_KSM
  int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
   unsigned long end, int advice, unsigned long *vm_flags);
 @@ -73,15 +70,8 @@ static inline void set_page_stable_node(
   * We'd like to make this conditional on vma-vm_flags  VM_MERGEABLE,
   * but what if the vma was unmerged while the page was swapped out?
   */
 -static inline int ksm_might_need_to_copy(struct page *page,
 - struct vm_area_struct *vma, unsigned long address)
 -{
 - struct anon_vma *anon_vma = page_anon_vma(page);
 -
 - return anon_vma 
 - (anon_vma-root != vma-anon_vma-root ||
 -  page-index != linear_page_index(vma, address));
 -}
 +struct page *ksm_might_need_to_copy(struct page *page,
 + struct vm_area_struct *vma, unsigned long address);
  
  int page_referenced_ksm(struct page *page,
   struct mem_cgroup *memcg, unsigned long *vm_flags);
 @@ -113,10 +103,10 @@ static inline int ksm_madvise(struct vm_
   return 0;
  }
  
 -static inline int ksm_might_need_to_copy(struct page *page,
 +static inline struct page *ksm_might_need_to_copy(struct page *page,
   struct vm_area_struct *vma, unsigned long address)
  {
 - return 0;
 + return page;
  }
  
  static inline int page_referenced_ksm(struct page *page,
 --- mmotm.orig/mm/ksm.c   2013-01-25 14:36:58.856206099 -0800
 +++ mmotm/mm/ksm.c2013-01-25 14:37:00.768206145 -0800
 @@ -644,6 +644,57 @@ static int unmerge_ksm_pages(struct vm_a
  /*
   * Only called through the sysfs control interface:
   */
 +static int remove_stable_node(struct stable_node *stable_node)
 +{
 + struct page *page;
 + int err;
 +
 + page = get_ksm_page(stable_node, true);
 + if (!page) {
 + /*
 +  * get_ksm_page did remove_node_from_stable_tree itself.
 +  */
 + return 0;
 

Re: [PATCH 6/11] ksm: remove old stable nodes more thoroughly

2013-01-28 Thread Hugh Dickins
On Mon, 28 Jan 2013, Andrew Morton wrote:
> On Fri, 25 Jan 2013 18:01:59 -0800 (PST)
> Hugh Dickins  wrote:
> 
> > +static int remove_all_stable_nodes(void)
> > +{
> > +   struct stable_node *stable_node;
> > +   int nid;
> > +   int err = 0;
> > +
> > +   for (nid = 0; nid < nr_node_ids; nid++) {
> > +   while (root_stable_tree[nid].rb_node) {
> > +   stable_node = rb_entry(root_stable_tree[nid].rb_node,
> > +   struct stable_node, node);
> > +   if (remove_stable_node(stable_node)) {
> > +   err = -EBUSY;
> 
> It's a bit rude to overwrite remove_stable_node()'s return value.

Well yes, but only the tiniest bit rude :)

> 
> > +   break;  /* proceed to next nid */
> > +   }
> > +   cond_resched();
> 
> Why is this here?

Because we don't have a limit on the length of this loop, and if
every node which remove_stable_node() finds is already stale, and
has no rmap_item still attached, then there would be no rescheduling
point in the unbounded loop without this one.  I was taught to worry
about bad latencies even in unpreemptible kernels.

Hugh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 6/11] ksm: remove old stable nodes more thoroughly

2013-01-28 Thread Andrew Morton
On Fri, 25 Jan 2013 18:01:59 -0800 (PST)
Hugh Dickins  wrote:

> +static int remove_all_stable_nodes(void)
> +{
> + struct stable_node *stable_node;
> + int nid;
> + int err = 0;
> +
> + for (nid = 0; nid < nr_node_ids; nid++) {
> + while (root_stable_tree[nid].rb_node) {
> + stable_node = rb_entry(root_stable_tree[nid].rb_node,
> + struct stable_node, node);
> + if (remove_stable_node(stable_node)) {
> + err = -EBUSY;

It's a bit rude to overwrite remove_stable_node()'s return value.

> + break;  /* proceed to next nid */
> + }
> + cond_resched();

Why is this here?

> + }
> + }
> + return err;
> +}

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 6/11] ksm: remove old stable nodes more thoroughly

2013-01-28 Thread Andrew Morton
On Fri, 25 Jan 2013 18:01:59 -0800 (PST)
Hugh Dickins hu...@google.com wrote:

 +static int remove_all_stable_nodes(void)
 +{
 + struct stable_node *stable_node;
 + int nid;
 + int err = 0;
 +
 + for (nid = 0; nid  nr_node_ids; nid++) {
 + while (root_stable_tree[nid].rb_node) {
 + stable_node = rb_entry(root_stable_tree[nid].rb_node,
 + struct stable_node, node);
 + if (remove_stable_node(stable_node)) {
 + err = -EBUSY;

It's a bit rude to overwrite remove_stable_node()'s return value.

 + break;  /* proceed to next nid */
 + }
 + cond_resched();

Why is this here?

 + }
 + }
 + return err;
 +}

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 6/11] ksm: remove old stable nodes more thoroughly

2013-01-28 Thread Hugh Dickins
On Mon, 28 Jan 2013, Andrew Morton wrote:
 On Fri, 25 Jan 2013 18:01:59 -0800 (PST)
 Hugh Dickins hu...@google.com wrote:
 
  +static int remove_all_stable_nodes(void)
  +{
  +   struct stable_node *stable_node;
  +   int nid;
  +   int err = 0;
  +
  +   for (nid = 0; nid  nr_node_ids; nid++) {
  +   while (root_stable_tree[nid].rb_node) {
  +   stable_node = rb_entry(root_stable_tree[nid].rb_node,
  +   struct stable_node, node);
  +   if (remove_stable_node(stable_node)) {
  +   err = -EBUSY;
 
 It's a bit rude to overwrite remove_stable_node()'s return value.

Well yes, but only the tiniest bit rude :)

 
  +   break;  /* proceed to next nid */
  +   }
  +   cond_resched();
 
 Why is this here?

Because we don't have a limit on the length of this loop, and if
every node which remove_stable_node() finds is already stale, and
has no rmap_item still attached, then there would be no rescheduling
point in the unbounded loop without this one.  I was taught to worry
about bad latencies even in unpreemptible kernels.

Hugh
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 6/11] ksm: remove old stable nodes more thoroughly

2013-01-27 Thread Simon Jeons
On Fri, 2013-01-25 at 18:01 -0800, Hugh Dickins wrote:
> Switching merge_across_nodes after running KSM is liable to oops on stale
> nodes still left over from the previous stable tree.  It's not something
> that people will often want to do, but it would be lame to demand a reboot
> when they're trying to determine which merge_across_nodes setting is best.
> 
> How can this happen?  We only permit switching merge_across_nodes when
> pages_shared is 0, and usually set run 2 to force that beforehand, which
> ought to unmerge everything: yet oopses still occur when you then run 1.
> 
> Three causes:
> 
> 1. The old stable tree (built according to the inverse merge_across_nodes)
> has not been fully torn down.  A stable node lingers until get_ksm_page()
> notices that the page it references no longer references it: but the page
> is not necessarily freed as soon as expected, particularly when swapcache.
> 
> Fix this with a pass through the old stable tree, applying get_ksm_page()
> to each of the remaining nodes (most found stale and removed immediately),
> with forced removal of any left over.  Unless the page is still mapped:
> I've not seen that case, it shouldn't occur, but better to WARN_ON_ONCE
> and EBUSY than BUG.
> 
> 2. __ksm_enter() has a nice little optimization, to insert the new mm
> just behind ksmd's cursor, so there's a full pass for it to stabilize
> (or be removed) before ksmd addresses it.  Nice when ksmd is running,
> but not so nice when we're trying to unmerge all mms: we were missing
> those mms forked and inserted behind the unmerge cursor.  Easily fixed
> by inserting at the end when KSM_RUN_UNMERGE.
> 
> 3. It is possible for a KSM page to be faulted back from swapcache into
> an mm, just after unmerge_and_remove_all_rmap_items() scanned past it.
> Fix this by copying on fault when KSM_RUN_UNMERGE: but that is private
> to ksm.c, so dissolve the distinction between ksm_might_need_to_copy()
> and ksm_does_need_to_copy(), doing it all in the one call into ksm.c.
> 
> A long outstanding, unrelated bugfix sneaks in with that third fix:
> ksm_does_need_to_copy() would copy from a !PageUptodate page (implying
> I/O error when read in from swap) to a page which it then marks Uptodate.
> Fix this case by not copying, letting do_swap_page() discover the error.
> 
> Signed-off-by: Hugh Dickins 
> ---
>  include/linux/ksm.h |   18 ++---
>  mm/ksm.c|   83 +++---
>  mm/memory.c |   19 -
>  3 files changed, 92 insertions(+), 28 deletions(-)
> 
> --- mmotm.orig/include/linux/ksm.h2013-01-25 14:27:58.220193250 -0800
> +++ mmotm/include/linux/ksm.h 2013-01-25 14:37:00.764206145 -0800
> @@ -16,9 +16,6 @@
>  struct stable_node;
>  struct mem_cgroup;
>  
> -struct page *ksm_does_need_to_copy(struct page *page,
> - struct vm_area_struct *vma, unsigned long address);
> -
>  #ifdef CONFIG_KSM
>  int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
>   unsigned long end, int advice, unsigned long *vm_flags);
> @@ -73,15 +70,8 @@ static inline void set_page_stable_node(
>   * We'd like to make this conditional on vma->vm_flags & VM_MERGEABLE,
>   * but what if the vma was unmerged while the page was swapped out?
>   */
> -static inline int ksm_might_need_to_copy(struct page *page,
> - struct vm_area_struct *vma, unsigned long address)
> -{
> - struct anon_vma *anon_vma = page_anon_vma(page);
> -
> - return anon_vma &&
> - (anon_vma->root != vma->anon_vma->root ||
> -  page->index != linear_page_index(vma, address));
> -}
> +struct page *ksm_might_need_to_copy(struct page *page,
> + struct vm_area_struct *vma, unsigned long address);
>  
>  int page_referenced_ksm(struct page *page,
>   struct mem_cgroup *memcg, unsigned long *vm_flags);
> @@ -113,10 +103,10 @@ static inline int ksm_madvise(struct vm_
>   return 0;
>  }
>  
> -static inline int ksm_might_need_to_copy(struct page *page,
> +static inline struct page *ksm_might_need_to_copy(struct page *page,
>   struct vm_area_struct *vma, unsigned long address)
>  {
> - return 0;
> + return page;
>  }
>  
>  static inline int page_referenced_ksm(struct page *page,
> --- mmotm.orig/mm/ksm.c   2013-01-25 14:36:58.856206099 -0800
> +++ mmotm/mm/ksm.c2013-01-25 14:37:00.768206145 -0800
> @@ -644,6 +644,57 @@ static int unmerge_ksm_pages(struct vm_a
>  /*
>   * Only called through the sysfs control interface:
>   */
> +static int remove_stable_node(struct stable_node *stable_node)
> +{
> + struct page *page;
> + int err;
> +
> + page = get_ksm_page(stable_node, true);
> + if (!page) {
> + /*
> +  * get_ksm_page did remove_node_from_stable_tree itself.
> +  */
> + return 0;
> + }
> +
> + if (WARN_ON_ONCE(page_mapped(page)))
> + err = 

Re: [PATCH 6/11] ksm: remove old stable nodes more thoroughly

2013-01-27 Thread Hugh Dickins
On Sun, 27 Jan 2013, Simon Jeons wrote:
> On Fri, 2013-01-25 at 18:01 -0800, Hugh Dickins wrote:
> > Switching merge_across_nodes after running KSM is liable to oops on stale
> > nodes still left over from the previous stable tree.  It's not something
> 
> Since this patch solve the problem, so the description of
> merge_across_nodes(Value can be changed only when there is no ksm shared
> pages in system) should be changed in this patch.

No.

The code could be changed to unmerge_and_remove_all_rmap_items()
automatically whenever merge_across_nodes is changed; but that's
not what Petr chose to do, and I didn't feel strongly to change it.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 6/11] ksm: remove old stable nodes more thoroughly

2013-01-27 Thread Hugh Dickins
On Sun, 27 Jan 2013, Simon Jeons wrote:
> On Sun, 2013-01-27 at 15:05 -0800, Hugh Dickins wrote:
> > On Sat, 26 Jan 2013, Simon Jeons wrote:
> > > > How can this happen?  We only permit switching merge_across_nodes when
> > > > pages_shared is 0, and usually set run 2 to force that beforehand, which
> > > > ought to unmerge everything: yet oopses still occur when you then run 1.
> > > > 
> > > > Three causes:
> > > > 
> > > > 1. The old stable tree (built according to the inverse 
> > > > merge_across_nodes)
>^
> How to understand inverse merge_across_nodes here?

How not to understand it?  Either it was 0 before (in which case there
were as many stable trees as NUMA nodes) and is being changed to 1 (in
which case there is to be only one stable tree), or it was 1 before
(for one) and is being changed to 0 (for many).

> 
> > > > has not been fully torn down.  A stable node lingers until 
> > > > get_ksm_page()
> > > > notices that the page it references no longer references it: but the 
> > > > page
> 
> Do you mean page->mapping is NULL when call get_ksm_page()? Who clear it
> NULL?

I think I already pointed you to free_pages_prepare().

> 
> > > > is not necessarily freed as soon as expected, particularly when 
> > > > swapcache.
> 
> Why is not necessarily freed as soon as expected?

As I answered below.

> > > > 
> > > 
> > > When can this happen?  
> > 
> > Whenever there's an additional reference to the page, beyond those for
> > its ptes in userspace - swapcache for example, or pinned by get_user_pages.
> > That delays its being freed (arriving at the "page->mapping = NULL;"
> > in free_pages_prepare()).  Or it might simply be sitting in a pagevec,
> > waiting for that to be filled up, to be freed as part of a batch.

> > > mms forked will be unmerged just after ksmd's cursor since they're
> > > inserted behind it, why will be missing?
> > 
> > unmerge_and_remove_all_rmap_items() makes one pass through the list
> > from start to finish: insert behind the cursor and it will be missed.
> 
> Since mms forked will be insert just after ksmd's cursor, so it is the
> next which will be scan and unmerge, where I miss?

mms forked are normally inserted just behind (== before) ksmd's cursor,
as I've said in comments and explanations several times.

Simon, I've had enough: you clearly have much more time to spare for
asking questions than I have for answering them repeatedly: I would
rather spend my time attending to 100 higher priorities.

Please try much harder to work these things out for yourself from the
source (perhaps with help from kernelnewbies.org), before interrogating
linux-kernel and linux-mm developers.  Sometimes your questions may
help everybody to understand better, but often they just waste our time.

I'll happily admit that mm, and mm/ksm.c in particular, is not the easiest
place to start in understanding the kernel, nor I the best expositor.

Best wishes,
Hugh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 6/11] ksm: remove old stable nodes more thoroughly

2013-01-27 Thread Simon Jeons
On Fri, 2013-01-25 at 18:01 -0800, Hugh Dickins wrote:
> Switching merge_across_nodes after running KSM is liable to oops on stale
> nodes still left over from the previous stable tree.  It's not something

Since this patch solve the problem, so the description of
merge_across_nodes(Value can be changed only when there is no ksm shared
pages in system) should be changed in this patch.

> that people will often want to do, but it would be lame to demand a reboot
> when they're trying to determine which merge_across_nodes setting is best.
> 
> How can this happen?  We only permit switching merge_across_nodes when
> pages_shared is 0, and usually set run 2 to force that beforehand, which
> ought to unmerge everything: yet oopses still occur when you then run 1.
> 
> Three causes:
> 
> 1. The old stable tree (built according to the inverse merge_across_nodes)
> has not been fully torn down.  A stable node lingers until get_ksm_page()
> notices that the page it references no longer references it: but the page
> is not necessarily freed as soon as expected, particularly when swapcache.
> 
> Fix this with a pass through the old stable tree, applying get_ksm_page()
> to each of the remaining nodes (most found stale and removed immediately),
> with forced removal of any left over.  Unless the page is still mapped:
> I've not seen that case, it shouldn't occur, but better to WARN_ON_ONCE
> and EBUSY than BUG.
> 
> 2. __ksm_enter() has a nice little optimization, to insert the new mm
> just behind ksmd's cursor, so there's a full pass for it to stabilize
> (or be removed) before ksmd addresses it.  Nice when ksmd is running,
> but not so nice when we're trying to unmerge all mms: we were missing
> those mms forked and inserted behind the unmerge cursor.  Easily fixed
> by inserting at the end when KSM_RUN_UNMERGE.
> 
> 3. It is possible for a KSM page to be faulted back from swapcache into
> an mm, just after unmerge_and_remove_all_rmap_items() scanned past it.
> Fix this by copying on fault when KSM_RUN_UNMERGE: but that is private
> to ksm.c, so dissolve the distinction between ksm_might_need_to_copy()
> and ksm_does_need_to_copy(), doing it all in the one call into ksm.c.
> 
> A long outstanding, unrelated bugfix sneaks in with that third fix:
> ksm_does_need_to_copy() would copy from a !PageUptodate page (implying
> I/O error when read in from swap) to a page which it then marks Uptodate.
> Fix this case by not copying, letting do_swap_page() discover the error.
> 
> Signed-off-by: Hugh Dickins 
> ---
>  include/linux/ksm.h |   18 ++---
>  mm/ksm.c|   83 +++---
>  mm/memory.c |   19 -
>  3 files changed, 92 insertions(+), 28 deletions(-)
> 
> --- mmotm.orig/include/linux/ksm.h2013-01-25 14:27:58.220193250 -0800
> +++ mmotm/include/linux/ksm.h 2013-01-25 14:37:00.764206145 -0800
> @@ -16,9 +16,6 @@
>  struct stable_node;
>  struct mem_cgroup;
>  
> -struct page *ksm_does_need_to_copy(struct page *page,
> - struct vm_area_struct *vma, unsigned long address);
> -
>  #ifdef CONFIG_KSM
>  int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
>   unsigned long end, int advice, unsigned long *vm_flags);
> @@ -73,15 +70,8 @@ static inline void set_page_stable_node(
>   * We'd like to make this conditional on vma->vm_flags & VM_MERGEABLE,
>   * but what if the vma was unmerged while the page was swapped out?
>   */
> -static inline int ksm_might_need_to_copy(struct page *page,
> - struct vm_area_struct *vma, unsigned long address)
> -{
> - struct anon_vma *anon_vma = page_anon_vma(page);
> -
> - return anon_vma &&
> - (anon_vma->root != vma->anon_vma->root ||
> -  page->index != linear_page_index(vma, address));
> -}
> +struct page *ksm_might_need_to_copy(struct page *page,
> + struct vm_area_struct *vma, unsigned long address);
>  
>  int page_referenced_ksm(struct page *page,
>   struct mem_cgroup *memcg, unsigned long *vm_flags);
> @@ -113,10 +103,10 @@ static inline int ksm_madvise(struct vm_
>   return 0;
>  }
>  
> -static inline int ksm_might_need_to_copy(struct page *page,
> +static inline struct page *ksm_might_need_to_copy(struct page *page,
>   struct vm_area_struct *vma, unsigned long address)
>  {
> - return 0;
> + return page;
>  }
>  
>  static inline int page_referenced_ksm(struct page *page,
> --- mmotm.orig/mm/ksm.c   2013-01-25 14:36:58.856206099 -0800
> +++ mmotm/mm/ksm.c2013-01-25 14:37:00.768206145 -0800
> @@ -644,6 +644,57 @@ static int unmerge_ksm_pages(struct vm_a
>  /*
>   * Only called through the sysfs control interface:
>   */
> +static int remove_stable_node(struct stable_node *stable_node)
> +{
> + struct page *page;
> + int err;
> +
> + page = get_ksm_page(stable_node, true);
> + if (!page) {
> + /*
> +  * 

Re: [PATCH 6/11] ksm: remove old stable nodes more thoroughly

2013-01-27 Thread Simon Jeons
On Sun, 2013-01-27 at 15:05 -0800, Hugh Dickins wrote:
> On Sat, 26 Jan 2013, Simon Jeons wrote:
> > On Fri, 2013-01-25 at 18:01 -0800, Hugh Dickins wrote:
> > > Switching merge_across_nodes after running KSM is liable to oops on stale
> > > nodes still left over from the previous stable tree.  It's not something
> > > that people will often want to do, but it would be lame to demand a reboot
> > > when they're trying to determine which merge_across_nodes setting is best.
> > > 
> > > How can this happen?  We only permit switching merge_across_nodes when
> > > pages_shared is 0, and usually set run 2 to force that beforehand, which
> > > ought to unmerge everything: yet oopses still occur when you then run 1.
> > > 
> > > Three causes:
> > > 
> > > 1. The old stable tree (built according to the inverse merge_across_nodes)
   ^
How to understand inverse merge_across_nodes here?

> > > has not been fully torn down.  A stable node lingers until get_ksm_page()
> > > notices that the page it references no longer references it: but the page

Do you mean page->mapping is NULL when call get_ksm_page()? Who clear it
NULL?

> > > is not necessarily freed as soon as expected, particularly when swapcache.

Why is not necessarily freed as soon as expected?

> > > 
> > 
> > When can this happen?  
> 
> Whenever there's an additional reference to the page, beyond those for
> its ptes in userspace - swapcache for example, or pinned by get_user_pages.
> That delays its being freed (arriving at the "page->mapping = NULL;"
> in free_pages_prepare()).  Or it might simply be sitting in a pagevec,
> waiting for that to be filled up, to be freed as part of a batch.
> 
> > 
> > > Fix this with a pass through the old stable tree, applying get_ksm_page()
> > > to each of the remaining nodes (most found stale and removed immediately),
> > > with forced removal of any left over.  Unless the page is still mapped:
> > > I've not seen that case, it shouldn't occur, but better to WARN_ON_ONCE
> > > and EBUSY than BUG.
> > > 
> > > 2. __ksm_enter() has a nice little optimization, to insert the new mm
> > > just behind ksmd's cursor, so there's a full pass for it to stabilize
> > > (or be removed) before ksmd addresses it.  Nice when ksmd is running,
> > > but not so nice when we're trying to unmerge all mms: we were missing
> > > those mms forked and inserted behind the unmerge cursor.  Easily fixed
> > > by inserting at the end when KSM_RUN_UNMERGE.
> > 
> > mms forked will be unmerged just after ksmd's cursor since they're
> > inserted behind it, why will be missing?
> 
> unmerge_and_remove_all_rmap_items() makes one pass through the list
> from start to finish: insert behind the cursor and it will be missed.

Since mms forked will be insert just after ksmd's cursor, so it is the
next which will be scan and unmerge, where I miss?



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 6/11] ksm: remove old stable nodes more thoroughly

2013-01-27 Thread Hugh Dickins
On Sat, 26 Jan 2013, Simon Jeons wrote:
> On Fri, 2013-01-25 at 18:01 -0800, Hugh Dickins wrote:
> > Switching merge_across_nodes after running KSM is liable to oops on stale
> > nodes still left over from the previous stable tree.  It's not something
> > that people will often want to do, but it would be lame to demand a reboot
> > when they're trying to determine which merge_across_nodes setting is best.
> > 
> > How can this happen?  We only permit switching merge_across_nodes when
> > pages_shared is 0, and usually set run 2 to force that beforehand, which
> > ought to unmerge everything: yet oopses still occur when you then run 1.
> > 
> > Three causes:
> > 
> > 1. The old stable tree (built according to the inverse merge_across_nodes)
> > has not been fully torn down.  A stable node lingers until get_ksm_page()
> > notices that the page it references no longer references it: but the page
> > is not necessarily freed as soon as expected, particularly when swapcache.
> > 
> 
> When can this happen?  

Whenever there's an additional reference to the page, beyond those for
its ptes in userspace - swapcache for example, or pinned by get_user_pages.
That delays its being freed (arriving at the "page->mapping = NULL;"
in free_pages_prepare()).  Or it might simply be sitting in a pagevec,
waiting for that to be filled up, to be freed as part of a batch.

> 
> > Fix this with a pass through the old stable tree, applying get_ksm_page()
> > to each of the remaining nodes (most found stale and removed immediately),
> > with forced removal of any left over.  Unless the page is still mapped:
> > I've not seen that case, it shouldn't occur, but better to WARN_ON_ONCE
> > and EBUSY than BUG.
> > 
> > 2. __ksm_enter() has a nice little optimization, to insert the new mm
> > just behind ksmd's cursor, so there's a full pass for it to stabilize
> > (or be removed) before ksmd addresses it.  Nice when ksmd is running,
> > but not so nice when we're trying to unmerge all mms: we were missing
> > those mms forked and inserted behind the unmerge cursor.  Easily fixed
> > by inserting at the end when KSM_RUN_UNMERGE.
> 
> mms forked will be unmerged just after ksmd's cursor since they're
> inserted behind it, why will be missing?

unmerge_and_remove_all_rmap_items() makes one pass through the list
from start to finish: insert behind the cursor and it will be missed.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 6/11] ksm: remove old stable nodes more thoroughly

2013-01-27 Thread Hugh Dickins
On Sat, 26 Jan 2013, Simon Jeons wrote:
 On Fri, 2013-01-25 at 18:01 -0800, Hugh Dickins wrote:
  Switching merge_across_nodes after running KSM is liable to oops on stale
  nodes still left over from the previous stable tree.  It's not something
  that people will often want to do, but it would be lame to demand a reboot
  when they're trying to determine which merge_across_nodes setting is best.
  
  How can this happen?  We only permit switching merge_across_nodes when
  pages_shared is 0, and usually set run 2 to force that beforehand, which
  ought to unmerge everything: yet oopses still occur when you then run 1.
  
  Three causes:
  
  1. The old stable tree (built according to the inverse merge_across_nodes)
  has not been fully torn down.  A stable node lingers until get_ksm_page()
  notices that the page it references no longer references it: but the page
  is not necessarily freed as soon as expected, particularly when swapcache.
  
 
 When can this happen?  

Whenever there's an additional reference to the page, beyond those for
its ptes in userspace - swapcache for example, or pinned by get_user_pages.
That delays its being freed (arriving at the page-mapping = NULL;
in free_pages_prepare()).  Or it might simply be sitting in a pagevec,
waiting for that to be filled up, to be freed as part of a batch.

 
  Fix this with a pass through the old stable tree, applying get_ksm_page()
  to each of the remaining nodes (most found stale and removed immediately),
  with forced removal of any left over.  Unless the page is still mapped:
  I've not seen that case, it shouldn't occur, but better to WARN_ON_ONCE
  and EBUSY than BUG.
  
  2. __ksm_enter() has a nice little optimization, to insert the new mm
  just behind ksmd's cursor, so there's a full pass for it to stabilize
  (or be removed) before ksmd addresses it.  Nice when ksmd is running,
  but not so nice when we're trying to unmerge all mms: we were missing
  those mms forked and inserted behind the unmerge cursor.  Easily fixed
  by inserting at the end when KSM_RUN_UNMERGE.
 
 mms forked will be unmerged just after ksmd's cursor since they're
 inserted behind it, why will be missing?

unmerge_and_remove_all_rmap_items() makes one pass through the list
from start to finish: insert behind the cursor and it will be missed.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 6/11] ksm: remove old stable nodes more thoroughly

2013-01-27 Thread Simon Jeons
On Sun, 2013-01-27 at 15:05 -0800, Hugh Dickins wrote:
 On Sat, 26 Jan 2013, Simon Jeons wrote:
  On Fri, 2013-01-25 at 18:01 -0800, Hugh Dickins wrote:
   Switching merge_across_nodes after running KSM is liable to oops on stale
   nodes still left over from the previous stable tree.  It's not something
   that people will often want to do, but it would be lame to demand a reboot
   when they're trying to determine which merge_across_nodes setting is best.
   
   How can this happen?  We only permit switching merge_across_nodes when
   pages_shared is 0, and usually set run 2 to force that beforehand, which
   ought to unmerge everything: yet oopses still occur when you then run 1.
   
   Three causes:
   
   1. The old stable tree (built according to the inverse merge_across_nodes)
   ^
How to understand inverse merge_across_nodes here?

   has not been fully torn down.  A stable node lingers until get_ksm_page()
   notices that the page it references no longer references it: but the page

Do you mean page-mapping is NULL when call get_ksm_page()? Who clear it
NULL?

   is not necessarily freed as soon as expected, particularly when swapcache.

Why is not necessarily freed as soon as expected?

   
  
  When can this happen?  
 
 Whenever there's an additional reference to the page, beyond those for
 its ptes in userspace - swapcache for example, or pinned by get_user_pages.
 That delays its being freed (arriving at the page-mapping = NULL;
 in free_pages_prepare()).  Or it might simply be sitting in a pagevec,
 waiting for that to be filled up, to be freed as part of a batch.
 
  
   Fix this with a pass through the old stable tree, applying get_ksm_page()
   to each of the remaining nodes (most found stale and removed immediately),
   with forced removal of any left over.  Unless the page is still mapped:
   I've not seen that case, it shouldn't occur, but better to WARN_ON_ONCE
   and EBUSY than BUG.
   
   2. __ksm_enter() has a nice little optimization, to insert the new mm
   just behind ksmd's cursor, so there's a full pass for it to stabilize
   (or be removed) before ksmd addresses it.  Nice when ksmd is running,
   but not so nice when we're trying to unmerge all mms: we were missing
   those mms forked and inserted behind the unmerge cursor.  Easily fixed
   by inserting at the end when KSM_RUN_UNMERGE.
  
  mms forked will be unmerged just after ksmd's cursor since they're
  inserted behind it, why will be missing?
 
 unmerge_and_remove_all_rmap_items() makes one pass through the list
 from start to finish: insert behind the cursor and it will be missed.

Since mms forked will be insert just after ksmd's cursor, so it is the
next which will be scan and unmerge, where I miss?



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 6/11] ksm: remove old stable nodes more thoroughly

2013-01-27 Thread Simon Jeons
On Fri, 2013-01-25 at 18:01 -0800, Hugh Dickins wrote:
 Switching merge_across_nodes after running KSM is liable to oops on stale
 nodes still left over from the previous stable tree.  It's not something

Since this patch solve the problem, so the description of
merge_across_nodes(Value can be changed only when there is no ksm shared
pages in system) should be changed in this patch.

 that people will often want to do, but it would be lame to demand a reboot
 when they're trying to determine which merge_across_nodes setting is best.
 
 How can this happen?  We only permit switching merge_across_nodes when
 pages_shared is 0, and usually set run 2 to force that beforehand, which
 ought to unmerge everything: yet oopses still occur when you then run 1.
 
 Three causes:
 
 1. The old stable tree (built according to the inverse merge_across_nodes)
 has not been fully torn down.  A stable node lingers until get_ksm_page()
 notices that the page it references no longer references it: but the page
 is not necessarily freed as soon as expected, particularly when swapcache.
 
 Fix this with a pass through the old stable tree, applying get_ksm_page()
 to each of the remaining nodes (most found stale and removed immediately),
 with forced removal of any left over.  Unless the page is still mapped:
 I've not seen that case, it shouldn't occur, but better to WARN_ON_ONCE
 and EBUSY than BUG.
 
 2. __ksm_enter() has a nice little optimization, to insert the new mm
 just behind ksmd's cursor, so there's a full pass for it to stabilize
 (or be removed) before ksmd addresses it.  Nice when ksmd is running,
 but not so nice when we're trying to unmerge all mms: we were missing
 those mms forked and inserted behind the unmerge cursor.  Easily fixed
 by inserting at the end when KSM_RUN_UNMERGE.
 
 3. It is possible for a KSM page to be faulted back from swapcache into
 an mm, just after unmerge_and_remove_all_rmap_items() scanned past it.
 Fix this by copying on fault when KSM_RUN_UNMERGE: but that is private
 to ksm.c, so dissolve the distinction between ksm_might_need_to_copy()
 and ksm_does_need_to_copy(), doing it all in the one call into ksm.c.
 
 A long outstanding, unrelated bugfix sneaks in with that third fix:
 ksm_does_need_to_copy() would copy from a !PageUptodate page (implying
 I/O error when read in from swap) to a page which it then marks Uptodate.
 Fix this case by not copying, letting do_swap_page() discover the error.
 
 Signed-off-by: Hugh Dickins hu...@google.com
 ---
  include/linux/ksm.h |   18 ++---
  mm/ksm.c|   83 +++---
  mm/memory.c |   19 -
  3 files changed, 92 insertions(+), 28 deletions(-)
 
 --- mmotm.orig/include/linux/ksm.h2013-01-25 14:27:58.220193250 -0800
 +++ mmotm/include/linux/ksm.h 2013-01-25 14:37:00.764206145 -0800
 @@ -16,9 +16,6 @@
  struct stable_node;
  struct mem_cgroup;
  
 -struct page *ksm_does_need_to_copy(struct page *page,
 - struct vm_area_struct *vma, unsigned long address);
 -
  #ifdef CONFIG_KSM
  int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
   unsigned long end, int advice, unsigned long *vm_flags);
 @@ -73,15 +70,8 @@ static inline void set_page_stable_node(
   * We'd like to make this conditional on vma-vm_flags  VM_MERGEABLE,
   * but what if the vma was unmerged while the page was swapped out?
   */
 -static inline int ksm_might_need_to_copy(struct page *page,
 - struct vm_area_struct *vma, unsigned long address)
 -{
 - struct anon_vma *anon_vma = page_anon_vma(page);
 -
 - return anon_vma 
 - (anon_vma-root != vma-anon_vma-root ||
 -  page-index != linear_page_index(vma, address));
 -}
 +struct page *ksm_might_need_to_copy(struct page *page,
 + struct vm_area_struct *vma, unsigned long address);
  
  int page_referenced_ksm(struct page *page,
   struct mem_cgroup *memcg, unsigned long *vm_flags);
 @@ -113,10 +103,10 @@ static inline int ksm_madvise(struct vm_
   return 0;
  }
  
 -static inline int ksm_might_need_to_copy(struct page *page,
 +static inline struct page *ksm_might_need_to_copy(struct page *page,
   struct vm_area_struct *vma, unsigned long address)
  {
 - return 0;
 + return page;
  }
  
  static inline int page_referenced_ksm(struct page *page,
 --- mmotm.orig/mm/ksm.c   2013-01-25 14:36:58.856206099 -0800
 +++ mmotm/mm/ksm.c2013-01-25 14:37:00.768206145 -0800
 @@ -644,6 +644,57 @@ static int unmerge_ksm_pages(struct vm_a
  /*
   * Only called through the sysfs control interface:
   */
 +static int remove_stable_node(struct stable_node *stable_node)
 +{
 + struct page *page;
 + int err;
 +
 + page = get_ksm_page(stable_node, true);
 + if (!page) {
 + /*
 +  * get_ksm_page did remove_node_from_stable_tree itself.
 +  */
 + return 0;
 

Re: [PATCH 6/11] ksm: remove old stable nodes more thoroughly

2013-01-27 Thread Hugh Dickins
On Sun, 27 Jan 2013, Simon Jeons wrote:
 On Sun, 2013-01-27 at 15:05 -0800, Hugh Dickins wrote:
  On Sat, 26 Jan 2013, Simon Jeons wrote:
How can this happen?  We only permit switching merge_across_nodes when
pages_shared is 0, and usually set run 2 to force that beforehand, which
ought to unmerge everything: yet oopses still occur when you then run 1.

Three causes:

1. The old stable tree (built according to the inverse 
merge_across_nodes)
^
 How to understand inverse merge_across_nodes here?

How not to understand it?  Either it was 0 before (in which case there
were as many stable trees as NUMA nodes) and is being changed to 1 (in
which case there is to be only one stable tree), or it was 1 before
(for one) and is being changed to 0 (for many).

 
has not been fully torn down.  A stable node lingers until 
get_ksm_page()
notices that the page it references no longer references it: but the 
page
 
 Do you mean page-mapping is NULL when call get_ksm_page()? Who clear it
 NULL?

I think I already pointed you to free_pages_prepare().

 
is not necessarily freed as soon as expected, particularly when 
swapcache.
 
 Why is not necessarily freed as soon as expected?

As I answered below.


   
   When can this happen?  
  
  Whenever there's an additional reference to the page, beyond those for
  its ptes in userspace - swapcache for example, or pinned by get_user_pages.
  That delays its being freed (arriving at the page-mapping = NULL;
  in free_pages_prepare()).  Or it might simply be sitting in a pagevec,
  waiting for that to be filled up, to be freed as part of a batch.

   mms forked will be unmerged just after ksmd's cursor since they're
   inserted behind it, why will be missing?
  
  unmerge_and_remove_all_rmap_items() makes one pass through the list
  from start to finish: insert behind the cursor and it will be missed.
 
 Since mms forked will be insert just after ksmd's cursor, so it is the
 next which will be scan and unmerge, where I miss?

mms forked are normally inserted just behind (== before) ksmd's cursor,
as I've said in comments and explanations several times.

Simon, I've had enough: you clearly have much more time to spare for
asking questions than I have for answering them repeatedly: I would
rather spend my time attending to 100 higher priorities.

Please try much harder to work these things out for yourself from the
source (perhaps with help from kernelnewbies.org), before interrogating
linux-kernel and linux-mm developers.  Sometimes your questions may
help everybody to understand better, but often they just waste our time.

I'll happily admit that mm, and mm/ksm.c in particular, is not the easiest
place to start in understanding the kernel, nor I the best expositor.

Best wishes,
Hugh
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 6/11] ksm: remove old stable nodes more thoroughly

2013-01-27 Thread Hugh Dickins
On Sun, 27 Jan 2013, Simon Jeons wrote:
 On Fri, 2013-01-25 at 18:01 -0800, Hugh Dickins wrote:
  Switching merge_across_nodes after running KSM is liable to oops on stale
  nodes still left over from the previous stable tree.  It's not something
 
 Since this patch solve the problem, so the description of
 merge_across_nodes(Value can be changed only when there is no ksm shared
 pages in system) should be changed in this patch.

No.

The code could be changed to unmerge_and_remove_all_rmap_items()
automatically whenever merge_across_nodes is changed; but that's
not what Petr chose to do, and I didn't feel strongly to change it.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 6/11] ksm: remove old stable nodes more thoroughly

2013-01-27 Thread Simon Jeons
On Fri, 2013-01-25 at 18:01 -0800, Hugh Dickins wrote:
 Switching merge_across_nodes after running KSM is liable to oops on stale
 nodes still left over from the previous stable tree.  It's not something
 that people will often want to do, but it would be lame to demand a reboot
 when they're trying to determine which merge_across_nodes setting is best.
 
 How can this happen?  We only permit switching merge_across_nodes when
 pages_shared is 0, and usually set run 2 to force that beforehand, which
 ought to unmerge everything: yet oopses still occur when you then run 1.
 
 Three causes:
 
 1. The old stable tree (built according to the inverse merge_across_nodes)
 has not been fully torn down.  A stable node lingers until get_ksm_page()
 notices that the page it references no longer references it: but the page
 is not necessarily freed as soon as expected, particularly when swapcache.
 
 Fix this with a pass through the old stable tree, applying get_ksm_page()
 to each of the remaining nodes (most found stale and removed immediately),
 with forced removal of any left over.  Unless the page is still mapped:
 I've not seen that case, it shouldn't occur, but better to WARN_ON_ONCE
 and EBUSY than BUG.
 
 2. __ksm_enter() has a nice little optimization, to insert the new mm
 just behind ksmd's cursor, so there's a full pass for it to stabilize
 (or be removed) before ksmd addresses it.  Nice when ksmd is running,
 but not so nice when we're trying to unmerge all mms: we were missing
 those mms forked and inserted behind the unmerge cursor.  Easily fixed
 by inserting at the end when KSM_RUN_UNMERGE.
 
 3. It is possible for a KSM page to be faulted back from swapcache into
 an mm, just after unmerge_and_remove_all_rmap_items() scanned past it.
 Fix this by copying on fault when KSM_RUN_UNMERGE: but that is private
 to ksm.c, so dissolve the distinction between ksm_might_need_to_copy()
 and ksm_does_need_to_copy(), doing it all in the one call into ksm.c.
 
 A long outstanding, unrelated bugfix sneaks in with that third fix:
 ksm_does_need_to_copy() would copy from a !PageUptodate page (implying
 I/O error when read in from swap) to a page which it then marks Uptodate.
 Fix this case by not copying, letting do_swap_page() discover the error.
 
 Signed-off-by: Hugh Dickins hu...@google.com
 ---
  include/linux/ksm.h |   18 ++---
  mm/ksm.c|   83 +++---
  mm/memory.c |   19 -
  3 files changed, 92 insertions(+), 28 deletions(-)
 
 --- mmotm.orig/include/linux/ksm.h2013-01-25 14:27:58.220193250 -0800
 +++ mmotm/include/linux/ksm.h 2013-01-25 14:37:00.764206145 -0800
 @@ -16,9 +16,6 @@
  struct stable_node;
  struct mem_cgroup;
  
 -struct page *ksm_does_need_to_copy(struct page *page,
 - struct vm_area_struct *vma, unsigned long address);
 -
  #ifdef CONFIG_KSM
  int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
   unsigned long end, int advice, unsigned long *vm_flags);
 @@ -73,15 +70,8 @@ static inline void set_page_stable_node(
   * We'd like to make this conditional on vma-vm_flags  VM_MERGEABLE,
   * but what if the vma was unmerged while the page was swapped out?
   */
 -static inline int ksm_might_need_to_copy(struct page *page,
 - struct vm_area_struct *vma, unsigned long address)
 -{
 - struct anon_vma *anon_vma = page_anon_vma(page);
 -
 - return anon_vma 
 - (anon_vma-root != vma-anon_vma-root ||
 -  page-index != linear_page_index(vma, address));
 -}
 +struct page *ksm_might_need_to_copy(struct page *page,
 + struct vm_area_struct *vma, unsigned long address);
  
  int page_referenced_ksm(struct page *page,
   struct mem_cgroup *memcg, unsigned long *vm_flags);
 @@ -113,10 +103,10 @@ static inline int ksm_madvise(struct vm_
   return 0;
  }
  
 -static inline int ksm_might_need_to_copy(struct page *page,
 +static inline struct page *ksm_might_need_to_copy(struct page *page,
   struct vm_area_struct *vma, unsigned long address)
  {
 - return 0;
 + return page;
  }
  
  static inline int page_referenced_ksm(struct page *page,
 --- mmotm.orig/mm/ksm.c   2013-01-25 14:36:58.856206099 -0800
 +++ mmotm/mm/ksm.c2013-01-25 14:37:00.768206145 -0800
 @@ -644,6 +644,57 @@ static int unmerge_ksm_pages(struct vm_a
  /*
   * Only called through the sysfs control interface:
   */
 +static int remove_stable_node(struct stable_node *stable_node)
 +{
 + struct page *page;
 + int err;
 +
 + page = get_ksm_page(stable_node, true);
 + if (!page) {
 + /*
 +  * get_ksm_page did remove_node_from_stable_tree itself.
 +  */
 + return 0;
 + }
 +
 + if (WARN_ON_ONCE(page_mapped(page)))
 + err = -EBUSY;
 + else {
 + /*
 +  * This page might be in a pagevec waiting to be 

Re: [PATCH 6/11] ksm: remove old stable nodes more thoroughly

2013-01-26 Thread Simon Jeons
Hi Hugh,
On Fri, 2013-01-25 at 18:01 -0800, Hugh Dickins wrote:
> Switching merge_across_nodes after running KSM is liable to oops on stale
> nodes still left over from the previous stable tree.  It's not something
> that people will often want to do, but it would be lame to demand a reboot
> when they're trying to determine which merge_across_nodes setting is best.
> 
> How can this happen?  We only permit switching merge_across_nodes when
> pages_shared is 0, and usually set run 2 to force that beforehand, which
> ought to unmerge everything: yet oopses still occur when you then run 1.
> 
> Three causes:
> 
> 1. The old stable tree (built according to the inverse merge_across_nodes)
> has not been fully torn down.  A stable node lingers until get_ksm_page()
> notices that the page it references no longer references it: but the page
> is not necessarily freed as soon as expected, particularly when swapcache.
> 

When can this happen?  

> Fix this with a pass through the old stable tree, applying get_ksm_page()
> to each of the remaining nodes (most found stale and removed immediately),
> with forced removal of any left over.  Unless the page is still mapped:
> I've not seen that case, it shouldn't occur, but better to WARN_ON_ONCE
> and EBUSY than BUG.
> 
> 2. __ksm_enter() has a nice little optimization, to insert the new mm
> just behind ksmd's cursor, so there's a full pass for it to stabilize
> (or be removed) before ksmd addresses it.  Nice when ksmd is running,
> but not so nice when we're trying to unmerge all mms: we were missing
> those mms forked and inserted behind the unmerge cursor.  Easily fixed
> by inserting at the end when KSM_RUN_UNMERGE.

mms forked will be unmerged just after ksmd's cursor since they're
inserted behind it, why will be missing?

> 
> 3. It is possible for a KSM page to be faulted back from swapcache into
> an mm, just after unmerge_and_remove_all_rmap_items() scanned past it.
> Fix this by copying on fault when KSM_RUN_UNMERGE: but that is private
> to ksm.c, so dissolve the distinction between ksm_might_need_to_copy()
> and ksm_does_need_to_copy(), doing it all in the one call into ksm.c.
> 

Make sense. :)

> A long outstanding, unrelated bugfix sneaks in with that third fix:
> ksm_does_need_to_copy() would copy from a !PageUptodate page (implying
> I/O error when read in from swap) to a page which it then marks Uptodate.
> Fix this case by not copying, letting do_swap_page() discover the error.
> 
> Signed-off-by: Hugh Dickins 
> ---
>  include/linux/ksm.h |   18 ++---
>  mm/ksm.c|   83 +++---
>  mm/memory.c |   19 -
>  3 files changed, 92 insertions(+), 28 deletions(-)
> 
> --- mmotm.orig/include/linux/ksm.h2013-01-25 14:27:58.220193250 -0800
> +++ mmotm/include/linux/ksm.h 2013-01-25 14:37:00.764206145 -0800
> @@ -16,9 +16,6 @@
>  struct stable_node;
>  struct mem_cgroup;
>  
> -struct page *ksm_does_need_to_copy(struct page *page,
> - struct vm_area_struct *vma, unsigned long address);
> -
>  #ifdef CONFIG_KSM
>  int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
>   unsigned long end, int advice, unsigned long *vm_flags);
> @@ -73,15 +70,8 @@ static inline void set_page_stable_node(
>   * We'd like to make this conditional on vma->vm_flags & VM_MERGEABLE,
>   * but what if the vma was unmerged while the page was swapped out?
>   */
> -static inline int ksm_might_need_to_copy(struct page *page,
> - struct vm_area_struct *vma, unsigned long address)
> -{
> - struct anon_vma *anon_vma = page_anon_vma(page);
> -
> - return anon_vma &&
> - (anon_vma->root != vma->anon_vma->root ||
> -  page->index != linear_page_index(vma, address));
> -}
> +struct page *ksm_might_need_to_copy(struct page *page,
> + struct vm_area_struct *vma, unsigned long address);
>  
>  int page_referenced_ksm(struct page *page,
>   struct mem_cgroup *memcg, unsigned long *vm_flags);
> @@ -113,10 +103,10 @@ static inline int ksm_madvise(struct vm_
>   return 0;
>  }
>  
> -static inline int ksm_might_need_to_copy(struct page *page,
> +static inline struct page *ksm_might_need_to_copy(struct page *page,
>   struct vm_area_struct *vma, unsigned long address)
>  {
> - return 0;
> + return page;
>  }
>  
>  static inline int page_referenced_ksm(struct page *page,
> --- mmotm.orig/mm/ksm.c   2013-01-25 14:36:58.856206099 -0800
> +++ mmotm/mm/ksm.c2013-01-25 14:37:00.768206145 -0800
> @@ -644,6 +644,57 @@ static int unmerge_ksm_pages(struct vm_a
>  /*
>   * Only called through the sysfs control interface:
>   */
> +static int remove_stable_node(struct stable_node *stable_node)
> +{
> + struct page *page;
> + int err;
> +
> + page = get_ksm_page(stable_node, true);
> + if (!page) {
> + /*
> +  * get_ksm_page did 

Re: [PATCH 6/11] ksm: remove old stable nodes more thoroughly

2013-01-26 Thread Simon Jeons
Hi Hugh,
On Fri, 2013-01-25 at 18:01 -0800, Hugh Dickins wrote:
 Switching merge_across_nodes after running KSM is liable to oops on stale
 nodes still left over from the previous stable tree.  It's not something
 that people will often want to do, but it would be lame to demand a reboot
 when they're trying to determine which merge_across_nodes setting is best.
 
 How can this happen?  We only permit switching merge_across_nodes when
 pages_shared is 0, and usually set run 2 to force that beforehand, which
 ought to unmerge everything: yet oopses still occur when you then run 1.
 
 Three causes:
 
 1. The old stable tree (built according to the inverse merge_across_nodes)
 has not been fully torn down.  A stable node lingers until get_ksm_page()
 notices that the page it references no longer references it: but the page
 is not necessarily freed as soon as expected, particularly when swapcache.
 

When can this happen?  

 Fix this with a pass through the old stable tree, applying get_ksm_page()
 to each of the remaining nodes (most found stale and removed immediately),
 with forced removal of any left over.  Unless the page is still mapped:
 I've not seen that case, it shouldn't occur, but better to WARN_ON_ONCE
 and EBUSY than BUG.
 
 2. __ksm_enter() has a nice little optimization, to insert the new mm
 just behind ksmd's cursor, so there's a full pass for it to stabilize
 (or be removed) before ksmd addresses it.  Nice when ksmd is running,
 but not so nice when we're trying to unmerge all mms: we were missing
 those mms forked and inserted behind the unmerge cursor.  Easily fixed
 by inserting at the end when KSM_RUN_UNMERGE.

mms forked will be unmerged just after ksmd's cursor since they're
inserted behind it, why will be missing?

 
 3. It is possible for a KSM page to be faulted back from swapcache into
 an mm, just after unmerge_and_remove_all_rmap_items() scanned past it.
 Fix this by copying on fault when KSM_RUN_UNMERGE: but that is private
 to ksm.c, so dissolve the distinction between ksm_might_need_to_copy()
 and ksm_does_need_to_copy(), doing it all in the one call into ksm.c.
 

Make sense. :)

 A long outstanding, unrelated bugfix sneaks in with that third fix:
 ksm_does_need_to_copy() would copy from a !PageUptodate page (implying
 I/O error when read in from swap) to a page which it then marks Uptodate.
 Fix this case by not copying, letting do_swap_page() discover the error.
 
 Signed-off-by: Hugh Dickins hu...@google.com
 ---
  include/linux/ksm.h |   18 ++---
  mm/ksm.c|   83 +++---
  mm/memory.c |   19 -
  3 files changed, 92 insertions(+), 28 deletions(-)
 
 --- mmotm.orig/include/linux/ksm.h2013-01-25 14:27:58.220193250 -0800
 +++ mmotm/include/linux/ksm.h 2013-01-25 14:37:00.764206145 -0800
 @@ -16,9 +16,6 @@
  struct stable_node;
  struct mem_cgroup;
  
 -struct page *ksm_does_need_to_copy(struct page *page,
 - struct vm_area_struct *vma, unsigned long address);
 -
  #ifdef CONFIG_KSM
  int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
   unsigned long end, int advice, unsigned long *vm_flags);
 @@ -73,15 +70,8 @@ static inline void set_page_stable_node(
   * We'd like to make this conditional on vma-vm_flags  VM_MERGEABLE,
   * but what if the vma was unmerged while the page was swapped out?
   */
 -static inline int ksm_might_need_to_copy(struct page *page,
 - struct vm_area_struct *vma, unsigned long address)
 -{
 - struct anon_vma *anon_vma = page_anon_vma(page);
 -
 - return anon_vma 
 - (anon_vma-root != vma-anon_vma-root ||
 -  page-index != linear_page_index(vma, address));
 -}
 +struct page *ksm_might_need_to_copy(struct page *page,
 + struct vm_area_struct *vma, unsigned long address);
  
  int page_referenced_ksm(struct page *page,
   struct mem_cgroup *memcg, unsigned long *vm_flags);
 @@ -113,10 +103,10 @@ static inline int ksm_madvise(struct vm_
   return 0;
  }
  
 -static inline int ksm_might_need_to_copy(struct page *page,
 +static inline struct page *ksm_might_need_to_copy(struct page *page,
   struct vm_area_struct *vma, unsigned long address)
  {
 - return 0;
 + return page;
  }
  
  static inline int page_referenced_ksm(struct page *page,
 --- mmotm.orig/mm/ksm.c   2013-01-25 14:36:58.856206099 -0800
 +++ mmotm/mm/ksm.c2013-01-25 14:37:00.768206145 -0800
 @@ -644,6 +644,57 @@ static int unmerge_ksm_pages(struct vm_a
  /*
   * Only called through the sysfs control interface:
   */
 +static int remove_stable_node(struct stable_node *stable_node)
 +{
 + struct page *page;
 + int err;
 +
 + page = get_ksm_page(stable_node, true);
 + if (!page) {
 + /*
 +  * get_ksm_page did remove_node_from_stable_tree itself.
 +  */
 + return 0;
 + }
 +
 + if 

[PATCH 6/11] ksm: remove old stable nodes more thoroughly

2013-01-25 Thread Hugh Dickins
Switching merge_across_nodes after running KSM is liable to oops on stale
nodes still left over from the previous stable tree.  It's not something
that people will often want to do, but it would be lame to demand a reboot
when they're trying to determine which merge_across_nodes setting is best.

How can this happen?  We only permit switching merge_across_nodes when
pages_shared is 0, and usually set run 2 to force that beforehand, which
ought to unmerge everything: yet oopses still occur when you then run 1.

Three causes:

1. The old stable tree (built according to the inverse merge_across_nodes)
has not been fully torn down.  A stable node lingers until get_ksm_page()
notices that the page it references no longer references it: but the page
is not necessarily freed as soon as expected, particularly when swapcache.

Fix this with a pass through the old stable tree, applying get_ksm_page()
to each of the remaining nodes (most found stale and removed immediately),
with forced removal of any left over.  Unless the page is still mapped:
I've not seen that case, it shouldn't occur, but better to WARN_ON_ONCE
and EBUSY than BUG.

2. __ksm_enter() has a nice little optimization, to insert the new mm
just behind ksmd's cursor, so there's a full pass for it to stabilize
(or be removed) before ksmd addresses it.  Nice when ksmd is running,
but not so nice when we're trying to unmerge all mms: we were missing
those mms forked and inserted behind the unmerge cursor.  Easily fixed
by inserting at the end when KSM_RUN_UNMERGE.

3. It is possible for a KSM page to be faulted back from swapcache into
an mm, just after unmerge_and_remove_all_rmap_items() scanned past it.
Fix this by copying on fault when KSM_RUN_UNMERGE: but that is private
to ksm.c, so dissolve the distinction between ksm_might_need_to_copy()
and ksm_does_need_to_copy(), doing it all in the one call into ksm.c.

A long outstanding, unrelated bugfix sneaks in with that third fix:
ksm_does_need_to_copy() would copy from a !PageUptodate page (implying
I/O error when read in from swap) to a page which it then marks Uptodate.
Fix this case by not copying, letting do_swap_page() discover the error.

Signed-off-by: Hugh Dickins 
---
 include/linux/ksm.h |   18 ++---
 mm/ksm.c|   83 +++---
 mm/memory.c |   19 -
 3 files changed, 92 insertions(+), 28 deletions(-)

--- mmotm.orig/include/linux/ksm.h  2013-01-25 14:27:58.220193250 -0800
+++ mmotm/include/linux/ksm.h   2013-01-25 14:37:00.764206145 -0800
@@ -16,9 +16,6 @@
 struct stable_node;
 struct mem_cgroup;
 
-struct page *ksm_does_need_to_copy(struct page *page,
-   struct vm_area_struct *vma, unsigned long address);
-
 #ifdef CONFIG_KSM
 int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
unsigned long end, int advice, unsigned long *vm_flags);
@@ -73,15 +70,8 @@ static inline void set_page_stable_node(
  * We'd like to make this conditional on vma->vm_flags & VM_MERGEABLE,
  * but what if the vma was unmerged while the page was swapped out?
  */
-static inline int ksm_might_need_to_copy(struct page *page,
-   struct vm_area_struct *vma, unsigned long address)
-{
-   struct anon_vma *anon_vma = page_anon_vma(page);
-
-   return anon_vma &&
-   (anon_vma->root != vma->anon_vma->root ||
-page->index != linear_page_index(vma, address));
-}
+struct page *ksm_might_need_to_copy(struct page *page,
+   struct vm_area_struct *vma, unsigned long address);
 
 int page_referenced_ksm(struct page *page,
struct mem_cgroup *memcg, unsigned long *vm_flags);
@@ -113,10 +103,10 @@ static inline int ksm_madvise(struct vm_
return 0;
 }
 
-static inline int ksm_might_need_to_copy(struct page *page,
+static inline struct page *ksm_might_need_to_copy(struct page *page,
struct vm_area_struct *vma, unsigned long address)
 {
-   return 0;
+   return page;
 }
 
 static inline int page_referenced_ksm(struct page *page,
--- mmotm.orig/mm/ksm.c 2013-01-25 14:36:58.856206099 -0800
+++ mmotm/mm/ksm.c  2013-01-25 14:37:00.768206145 -0800
@@ -644,6 +644,57 @@ static int unmerge_ksm_pages(struct vm_a
 /*
  * Only called through the sysfs control interface:
  */
+static int remove_stable_node(struct stable_node *stable_node)
+{
+   struct page *page;
+   int err;
+
+   page = get_ksm_page(stable_node, true);
+   if (!page) {
+   /*
+* get_ksm_page did remove_node_from_stable_tree itself.
+*/
+   return 0;
+   }
+
+   if (WARN_ON_ONCE(page_mapped(page)))
+   err = -EBUSY;
+   else {
+   /*
+* This page might be in a pagevec waiting to be freed,
+* or it might be PageSwapCache (perhaps under writeback),
+* or it might have been 

[PATCH 6/11] ksm: remove old stable nodes more thoroughly

2013-01-25 Thread Hugh Dickins
Switching merge_across_nodes after running KSM is liable to oops on stale
nodes still left over from the previous stable tree.  It's not something
that people will often want to do, but it would be lame to demand a reboot
when they're trying to determine which merge_across_nodes setting is best.

How can this happen?  We only permit switching merge_across_nodes when
pages_shared is 0, and usually set run 2 to force that beforehand, which
ought to unmerge everything: yet oopses still occur when you then run 1.

Three causes:

1. The old stable tree (built according to the inverse merge_across_nodes)
has not been fully torn down.  A stable node lingers until get_ksm_page()
notices that the page it references no longer references it: but the page
is not necessarily freed as soon as expected, particularly when swapcache.

Fix this with a pass through the old stable tree, applying get_ksm_page()
to each of the remaining nodes (most found stale and removed immediately),
with forced removal of any left over.  Unless the page is still mapped:
I've not seen that case, it shouldn't occur, but better to WARN_ON_ONCE
and EBUSY than BUG.

2. __ksm_enter() has a nice little optimization, to insert the new mm
just behind ksmd's cursor, so there's a full pass for it to stabilize
(or be removed) before ksmd addresses it.  Nice when ksmd is running,
but not so nice when we're trying to unmerge all mms: we were missing
those mms forked and inserted behind the unmerge cursor.  Easily fixed
by inserting at the end when KSM_RUN_UNMERGE.

3. It is possible for a KSM page to be faulted back from swapcache into
an mm, just after unmerge_and_remove_all_rmap_items() scanned past it.
Fix this by copying on fault when KSM_RUN_UNMERGE: but that is private
to ksm.c, so dissolve the distinction between ksm_might_need_to_copy()
and ksm_does_need_to_copy(), doing it all in the one call into ksm.c.

A long outstanding, unrelated bugfix sneaks in with that third fix:
ksm_does_need_to_copy() would copy from a !PageUptodate page (implying
I/O error when read in from swap) to a page which it then marks Uptodate.
Fix this case by not copying, letting do_swap_page() discover the error.

Signed-off-by: Hugh Dickins hu...@google.com
---
 include/linux/ksm.h |   18 ++---
 mm/ksm.c|   83 +++---
 mm/memory.c |   19 -
 3 files changed, 92 insertions(+), 28 deletions(-)

--- mmotm.orig/include/linux/ksm.h  2013-01-25 14:27:58.220193250 -0800
+++ mmotm/include/linux/ksm.h   2013-01-25 14:37:00.764206145 -0800
@@ -16,9 +16,6 @@
 struct stable_node;
 struct mem_cgroup;
 
-struct page *ksm_does_need_to_copy(struct page *page,
-   struct vm_area_struct *vma, unsigned long address);
-
 #ifdef CONFIG_KSM
 int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
unsigned long end, int advice, unsigned long *vm_flags);
@@ -73,15 +70,8 @@ static inline void set_page_stable_node(
  * We'd like to make this conditional on vma-vm_flags  VM_MERGEABLE,
  * but what if the vma was unmerged while the page was swapped out?
  */
-static inline int ksm_might_need_to_copy(struct page *page,
-   struct vm_area_struct *vma, unsigned long address)
-{
-   struct anon_vma *anon_vma = page_anon_vma(page);
-
-   return anon_vma 
-   (anon_vma-root != vma-anon_vma-root ||
-page-index != linear_page_index(vma, address));
-}
+struct page *ksm_might_need_to_copy(struct page *page,
+   struct vm_area_struct *vma, unsigned long address);
 
 int page_referenced_ksm(struct page *page,
struct mem_cgroup *memcg, unsigned long *vm_flags);
@@ -113,10 +103,10 @@ static inline int ksm_madvise(struct vm_
return 0;
 }
 
-static inline int ksm_might_need_to_copy(struct page *page,
+static inline struct page *ksm_might_need_to_copy(struct page *page,
struct vm_area_struct *vma, unsigned long address)
 {
-   return 0;
+   return page;
 }
 
 static inline int page_referenced_ksm(struct page *page,
--- mmotm.orig/mm/ksm.c 2013-01-25 14:36:58.856206099 -0800
+++ mmotm/mm/ksm.c  2013-01-25 14:37:00.768206145 -0800
@@ -644,6 +644,57 @@ static int unmerge_ksm_pages(struct vm_a
 /*
  * Only called through the sysfs control interface:
  */
+static int remove_stable_node(struct stable_node *stable_node)
+{
+   struct page *page;
+   int err;
+
+   page = get_ksm_page(stable_node, true);
+   if (!page) {
+   /*
+* get_ksm_page did remove_node_from_stable_tree itself.
+*/
+   return 0;
+   }
+
+   if (WARN_ON_ONCE(page_mapped(page)))
+   err = -EBUSY;
+   else {
+   /*
+* This page might be in a pagevec waiting to be freed,
+* or it might be PageSwapCache (perhaps under writeback),
+* or it might have