Re: [patch] mm, compaction: avoid isolating pinned pages fix

2014-02-03 Thread Hugh Dickins
On Mon, 3 Feb 2014, David Rientjes wrote:
> On Tue, 4 Feb 2014, Joonsoo Kim wrote:
> 
> > > > Okay. It can't fix your situation. Anyway, *normal* anon pages may be 
> > > > mapped
> > > > and have positive page_count(), so your code such as
> > > > '!page_mapping(page) && page_count(page)' makes compaction skip these 
> > > > *normal*
> > > > anon pages and this is incorrect behaviour.
> > > > 
> > > 
> > > So how does that work with migrate_page_move_mapping() which demands 
> > > page_count(page) == 1 and the get_page_unless_zero() in 
> > > __isolate_lru_page()?
> > 
> > Before doing migrate_page_move_mapping(), try_to_unmap() is called so that 
> > all
> > mapping is unmapped. Then, remained page_count() is 1 which is grabbed by
> > __isolate_lru_page(). Am I missing something?
> > 
> 
> Ah, good point.  I wonder if we can get away with 
> page_count(page) - page_mapcount(page) > 1 to avoid the get_user_pages() 
> pin?

Something like that.  But please go back to migrate_page_move_mapping()
to factor in what it's additionally considering.  Whether you can share
code with it, I don't know - it has to do some things under a lock you
cannot take at the preliminary stage - you haven't isolated or locked
the page yet.

There is a separate issue, that a mapping may supply its own non-default
mapping->a_ops->migratepage(): can we assume that the page_counting is
the same whatever migratepage() is in use?  I'm not sure.

If you stick to special-casing PageAnon pages, you won't face that
issue; but your proposed change would be a lot more satisfying if we
can convince ourselves that it's good for !PageAnon too.  May need a
trawl through the different migratepage() methods that exist in tree.

Hugh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] mm, compaction: avoid isolating pinned pages fix

2014-02-03 Thread David Rientjes
On Tue, 4 Feb 2014, Joonsoo Kim wrote:

> > > Okay. It can't fix your situation. Anyway, *normal* anon pages may be 
> > > mapped
> > > and have positive page_count(), so your code such as
> > > '!page_mapping(page) && page_count(page)' makes compaction skip these 
> > > *normal*
> > > anon pages and this is incorrect behaviour.
> > > 
> > 
> > So how does that work with migrate_page_move_mapping() which demands 
> > page_count(page) == 1 and the get_page_unless_zero() in 
> > __isolate_lru_page()?
> 
> Before doing migrate_page_move_mapping(), try_to_unmap() is called so that all
> mapping is unmapped. Then, remained page_count() is 1 which is grabbed by
> __isolate_lru_page(). Am I missing something?
> 

Ah, good point.  I wonder if we can get away with 
page_count(page) - page_mapcount(page) > 1 to avoid the get_user_pages() 
pin?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] mm, compaction: avoid isolating pinned pages

2014-02-03 Thread Hugh Dickins
On Mon, 3 Feb 2014, David Rientjes wrote:
> On Mon, 3 Feb 2014, Mel Gorman wrote:
> 
> > > Page migration will fail for memory that is pinned in memory with, for
> > > example, get_user_pages().  In this case, it is unnecessary to take
> > > zone->lru_lock or isolating the page and passing it to page migration
> > > which will ultimately fail.
> > > 
> > > This is a racy check, the page can still change from under us, but in
> > > that case we'll just fail later when attempting to move the page.
> > > 
> > > This avoids very expensive memory compaction when faulting transparent
> > > hugepages after pinning a lot of memory with a Mellanox driver.
> > > 
> > > On a 128GB machine and pinning ~120GB of memory, before this patch we
> > > see the enormous disparity in the number of page migration failures
> > > because of the pinning (from /proc/vmstat):

120GB of memory on the active/inactive lrus but longterm pinned,
that's quite worrying: not just a great waste of time for compaction,
but for page reclaim also.  I suppose a fairly easy way around it would
be for the driver to use mlock too, moving them all to unevictable lru.

But in general, you may well  be right that, racy as this isolation/
migration procedure necessarily is, in the face of longterm pinning it
may make more sense to test page_count before proceding to isolation
rather than only after in migration.  We always took the view that it's
better to give up only at the last moment, but that may be a bad bet.

> > > 
> > > compact_blocks_moved 7609
> > > compact_pages_moved 3431
> > > compact_pagemigrate_failed 133219
> > > compact_stall 13
> > > 
> > > After the patch, it is much more efficient:
> > > 
> > > compact_blocks_moved 7998
> > > compact_pages_moved 6403
> > > compact_pagemigrate_failed 3
> > > compact_stall 15
> > > 
> > > Signed-off-by: David Rientjes 
> > > ---
> > >  mm/compaction.c | 8 
> > >  1 file changed, 8 insertions(+)
> > > 
> > > diff --git a/mm/compaction.c b/mm/compaction.c
> > > --- a/mm/compaction.c
> > > +++ b/mm/compaction.c
> > > @@ -578,6 +578,14 @@ isolate_migratepages_range(struct zone *zone, struct 
> > > compact_control *cc,
> > >   continue;
> > >   }
> > >  
> > > + /*
> > > +  * Migration will fail if an anonymous page is pinned in memory,
> > > +  * so avoid taking zone->lru_lock and isolating it unnecessarily
> > > +  * in an admittedly racy check.
> > > +  */
> > > + if (!page_mapping(page) && page_count(page))
> > > + continue;
> > > +
> > 
> > Are you sure about this? The page_count check migration does is this
> > 
> > int expected_count = 1 + extra_count;
> > if (!mapping) {
> > if (page_count(page) != expected_count)
> > return -EAGAIN;
> > return MIGRATEPAGE_SUCCESS;
> > }
> > 
> > spin_lock_irq(>tree_lock);
> > 
> > pslot = radix_tree_lookup_slot(>page_tree,
> > page_index(page));
> > 
> > expected_count += 1 + page_has_private(page);
> > 
> > Migration expects and can migrate pages with no mapping and a page count
> > but you are now skipping them. I think you may have intended to split
> > migrations page count into a helper or copy the logic.
> > 
> 
> Thanks for taking a look!
> 
> The patch is correct, it just shows my lack of a complete commit message 

I don't think so.  I agree with Mel that you should be reconsidering
those tests that migrate_page_move_mapping() makes, but remembering that
it's called at a stage between try_to_unmap() and remove_migration_ptes(),
when page_mapcount has been brought down to 0 - not the case here.

> which I'm struggling with recently.  In the case that this is addressing, 
> get_user_pages() already gives page_count(page) == 1, then 

But get_user_pages() brings the pages into user address space (if not
already there), page_mapcount 1 and page_count 1, and does an additional
pin on the page, page_count 2.  Or if it's a page_mapping page (perhaps
even PageAnon in SwapCache) there's another +1; if page_has_buffers
another +1; mapped into more user address spaces, +more.

Yourif (!page_mapping(page) && page_count(page))
continue;
is letting through any Anon SwapCache pages (probably no great concern
in your 120GB example; but I don't understand why you want to special-
case Anon anyway, beyond your specific testcase); and refusing to
isolate all those unpinned anonymous pages mapped into userspace which
migration is perfectly capable of migrating.  If 120GB out of 128GB is
pinned, that won't be a significant proportion, and of course your
change saves a lot of wasted time and lock contention; but for most
people it's a considerable proportion of their memory, and needs to
be migratable.

I think Joonsoo is making the same point (though I disagree with the
test he suggested); but I've not yet read the 

Re: [patch] mm, compaction: avoid isolating pinned pages fix

2014-02-03 Thread Joonsoo Kim
On Mon, Feb 03, 2014 at 06:00:56PM -0800, David Rientjes wrote:
> On Tue, 4 Feb 2014, Joonsoo Kim wrote:
> 
> > Okay. It can't fix your situation. Anyway, *normal* anon pages may be mapped
> > and have positive page_count(), so your code such as
> > '!page_mapping(page) && page_count(page)' makes compaction skip these 
> > *normal*
> > anon pages and this is incorrect behaviour.
> > 
> 
> So how does that work with migrate_page_move_mapping() which demands 
> page_count(page) == 1 and the get_page_unless_zero() in 
> __isolate_lru_page()?

Before doing migrate_page_move_mapping(), try_to_unmap() is called so that all
mapping is unmapped. Then, remained page_count() is 1 which is grabbed by
__isolate_lru_page(). Am I missing something?

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] mm, compaction: avoid isolating pinned pages fix

2014-02-03 Thread David Rientjes
On Tue, 4 Feb 2014, Joonsoo Kim wrote:

> Okay. It can't fix your situation. Anyway, *normal* anon pages may be mapped
> and have positive page_count(), so your code such as
> '!page_mapping(page) && page_count(page)' makes compaction skip these *normal*
> anon pages and this is incorrect behaviour.
> 

So how does that work with migrate_page_move_mapping() which demands 
page_count(page) == 1 and the get_page_unless_zero() in 
__isolate_lru_page()?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] mm, compaction: avoid isolating pinned pages fix

2014-02-03 Thread Joonsoo Kim
On Mon, Feb 03, 2014 at 05:20:46PM -0800, David Rientjes wrote:
> On Tue, 4 Feb 2014, Joonsoo Kim wrote:
> 
> > I think that you need more code to skip this type of page correctly.
> > Without page_mapped() check, this code makes migratable pages be skipped,
> > since if page_mapped() case, page_count() may be more than zero.
> > 
> > So I think that you need following change.
> > 
> > (!page_mapping(page) && !page_mapped(page) && page_count(page))
> > 
> 
> These pages returned by get_user_pages() will have a mapcount of 1 so this 
> wouldn't actually fix the massive lock contention.  page_mapping() is only 
> going to be NULL for pages off the lru like these are for 
> PAGE_MAPPING_ANON.

Okay. It can't fix your situation. Anyway, *normal* anon pages may be mapped
and have positive page_count(), so your code such as
'!page_mapping(page) && page_count(page)' makes compaction skip these *normal*
anon pages and this is incorrect behaviour.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[patch] mm, compaction: avoid isolating pinned pages fix

2014-02-03 Thread David Rientjes
On Tue, 4 Feb 2014, Joonsoo Kim wrote:

> I think that you need more code to skip this type of page correctly.
> Without page_mapped() check, this code makes migratable pages be skipped,
> since if page_mapped() case, page_count() may be more than zero.
> 
> So I think that you need following change.
> 
> (!page_mapping(page) && !page_mapped(page) && page_count(page))
> 

These pages returned by get_user_pages() will have a mapcount of 1 so this 
wouldn't actually fix the massive lock contention.  page_mapping() is only 
going to be NULL for pages off the lru like these are for 
PAGE_MAPPING_ANON.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] mm, compaction: avoid isolating pinned pages

2014-02-03 Thread Joonsoo Kim
On Mon, Feb 03, 2014 at 02:49:32AM -0800, David Rientjes wrote:
> On Mon, 3 Feb 2014, Mel Gorman wrote:
> 
> > > Page migration will fail for memory that is pinned in memory with, for
> > > example, get_user_pages().  In this case, it is unnecessary to take
> > > zone->lru_lock or isolating the page and passing it to page migration
> > > which will ultimately fail.
> > > 
> > > This is a racy check, the page can still change from under us, but in
> > > that case we'll just fail later when attempting to move the page.
> > > 
> > > This avoids very expensive memory compaction when faulting transparent
> > > hugepages after pinning a lot of memory with a Mellanox driver.
> > > 
> > > On a 128GB machine and pinning ~120GB of memory, before this patch we
> > > see the enormous disparity in the number of page migration failures
> > > because of the pinning (from /proc/vmstat):
> > > 
> > > compact_blocks_moved 7609
> > > compact_pages_moved 3431
> > > compact_pagemigrate_failed 133219
> > > compact_stall 13
> > > 
> > > After the patch, it is much more efficient:
> > > 
> > > compact_blocks_moved 7998
> > > compact_pages_moved 6403
> > > compact_pagemigrate_failed 3
> > > compact_stall 15
> > > 
> > > Signed-off-by: David Rientjes 
> > > ---
> > >  mm/compaction.c | 8 
> > >  1 file changed, 8 insertions(+)
> > > 
> > > diff --git a/mm/compaction.c b/mm/compaction.c
> > > --- a/mm/compaction.c
> > > +++ b/mm/compaction.c
> > > @@ -578,6 +578,14 @@ isolate_migratepages_range(struct zone *zone, struct 
> > > compact_control *cc,
> > >   continue;
> > >   }
> > >  
> > > + /*
> > > +  * Migration will fail if an anonymous page is pinned in memory,
> > > +  * so avoid taking zone->lru_lock and isolating it unnecessarily
> > > +  * in an admittedly racy check.
> > > +  */
> > > + if (!page_mapping(page) && page_count(page))
> > > + continue;
> > > +

Hello,

I think that you need more code to skip this type of page correctly.
Without page_mapped() check, this code makes migratable pages be skipped,
since if page_mapped() case, page_count() may be more than zero.

So I think that you need following change.

(!page_mapping(page) && !page_mapped(page) && page_count(page))

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] mm, compaction: avoid isolating pinned pages

2014-02-03 Thread David Rientjes
On Mon, 3 Feb 2014, Mel Gorman wrote:

> > Page migration will fail for memory that is pinned in memory with, for
> > example, get_user_pages().  In this case, it is unnecessary to take
> > zone->lru_lock or isolating the page and passing it to page migration
> > which will ultimately fail.
> > 
> > This is a racy check, the page can still change from under us, but in
> > that case we'll just fail later when attempting to move the page.
> > 
> > This avoids very expensive memory compaction when faulting transparent
> > hugepages after pinning a lot of memory with a Mellanox driver.
> > 
> > On a 128GB machine and pinning ~120GB of memory, before this patch we
> > see the enormous disparity in the number of page migration failures
> > because of the pinning (from /proc/vmstat):
> > 
> > compact_blocks_moved 7609
> > compact_pages_moved 3431
> > compact_pagemigrate_failed 133219
> > compact_stall 13
> > 
> > After the patch, it is much more efficient:
> > 
> > compact_blocks_moved 7998
> > compact_pages_moved 6403
> > compact_pagemigrate_failed 3
> > compact_stall 15
> > 
> > Signed-off-by: David Rientjes 
> > ---
> >  mm/compaction.c | 8 
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/mm/compaction.c b/mm/compaction.c
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -578,6 +578,14 @@ isolate_migratepages_range(struct zone *zone, struct 
> > compact_control *cc,
> > continue;
> > }
> >  
> > +   /*
> > +* Migration will fail if an anonymous page is pinned in memory,
> > +* so avoid taking zone->lru_lock and isolating it unnecessarily
> > +* in an admittedly racy check.
> > +*/
> > +   if (!page_mapping(page) && page_count(page))
> > +   continue;
> > +
> 
> Are you sure about this? The page_count check migration does is this
> 
> int expected_count = 1 + extra_count;
> if (!mapping) {
> if (page_count(page) != expected_count)
> return -EAGAIN;
> return MIGRATEPAGE_SUCCESS;
> }
> 
> spin_lock_irq(>tree_lock);
> 
> pslot = radix_tree_lookup_slot(>page_tree,
> page_index(page));
> 
> expected_count += 1 + page_has_private(page);
> 
> Migration expects and can migrate pages with no mapping and a page count
> but you are now skipping them. I think you may have intended to split
> migrations page count into a helper or copy the logic.
> 

Thanks for taking a look!

The patch is correct, it just shows my lack of a complete commit message 
which I'm struggling with recently.  In the case that this is addressing, 
get_user_pages() already gives page_count(page) == 1, then 
__isolate_lru_page() does another get_page() that is dropped in 
putback_lru_page() after the call into migrate_pages().  So in the code 
you quote above we always have page_count(page) == 2 and
expected_count == 1.

So what we desperately need to do is avoid isolating any page where 
page_count(page) is non-zero and !page_mapping(page) and do that before 
the get_page() in __isolate_lru_page() because we want to avoid taking 
zone->lru_lock.  On my 128GB machine filled with ~120GB of pinned memory 
for the driver, this lock gets highly contended under compaction and even 
reclaim if the rest of userspace is using a lot of memory.

It's not really relevant to the commit message, but I found that if all 
that ~120GB is faulted and I manually invoke compaction with the procfs 
trigger (with my fix to do cc.ignore_skip_hint = true), this lock gets 
taken ~450,000 times and only 0.05% of isolated pages are actually 
successfully migrated.

Deferred compaction will certainly help for compaction that isn't induced 
via procfs, but we've encountered massive amounts of lock contention in 
this path and extremely low success to failure ratios of page migration on 
average of 2-3 out of 60 runs and the fault path really does grind to a 
halt without this patch (or simply doing MADV_NOHUGEPAGE before the driver 
does ib_umem_get() for 120GB of memory, but we want those hugepages!).
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] mm, compaction: avoid isolating pinned pages

2014-02-03 Thread Mel Gorman
On Sat, Feb 01, 2014 at 09:46:26PM -0800, David Rientjes wrote:
> Page migration will fail for memory that is pinned in memory with, for
> example, get_user_pages().  In this case, it is unnecessary to take
> zone->lru_lock or isolating the page and passing it to page migration
> which will ultimately fail.
> 
> This is a racy check, the page can still change from under us, but in
> that case we'll just fail later when attempting to move the page.
> 
> This avoids very expensive memory compaction when faulting transparent
> hugepages after pinning a lot of memory with a Mellanox driver.
> 
> On a 128GB machine and pinning ~120GB of memory, before this patch we
> see the enormous disparity in the number of page migration failures
> because of the pinning (from /proc/vmstat):
> 
> compact_blocks_moved 7609
> compact_pages_moved 3431
> compact_pagemigrate_failed 133219
> compact_stall 13
> 
> After the patch, it is much more efficient:
> 
> compact_blocks_moved 7998
> compact_pages_moved 6403
> compact_pagemigrate_failed 3
> compact_stall 15
> 
> Signed-off-by: David Rientjes 
> ---
>  mm/compaction.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -578,6 +578,14 @@ isolate_migratepages_range(struct zone *zone, struct 
> compact_control *cc,
>   continue;
>   }
>  
> + /*
> +  * Migration will fail if an anonymous page is pinned in memory,
> +  * so avoid taking zone->lru_lock and isolating it unnecessarily
> +  * in an admittedly racy check.
> +  */
> + if (!page_mapping(page) && page_count(page))
> + continue;
> +

Are you sure about this? The page_count check migration does is this

int expected_count = 1 + extra_count;
if (!mapping) {
if (page_count(page) != expected_count)
return -EAGAIN;
return MIGRATEPAGE_SUCCESS;
}

spin_lock_irq(>tree_lock);

pslot = radix_tree_lookup_slot(>page_tree,
page_index(page));

expected_count += 1 + page_has_private(page);

Migration expects and can migrate pages with no mapping and a page count
but you are now skipping them. I think you may have intended to split
migrations page count into a helper or copy the logic.

-- 
Mel Gorman
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] mm, compaction: avoid isolating pinned pages

2014-02-03 Thread Mel Gorman
On Sat, Feb 01, 2014 at 09:46:26PM -0800, David Rientjes wrote:
 Page migration will fail for memory that is pinned in memory with, for
 example, get_user_pages().  In this case, it is unnecessary to take
 zone-lru_lock or isolating the page and passing it to page migration
 which will ultimately fail.
 
 This is a racy check, the page can still change from under us, but in
 that case we'll just fail later when attempting to move the page.
 
 This avoids very expensive memory compaction when faulting transparent
 hugepages after pinning a lot of memory with a Mellanox driver.
 
 On a 128GB machine and pinning ~120GB of memory, before this patch we
 see the enormous disparity in the number of page migration failures
 because of the pinning (from /proc/vmstat):
 
 compact_blocks_moved 7609
 compact_pages_moved 3431
 compact_pagemigrate_failed 133219
 compact_stall 13
 
 After the patch, it is much more efficient:
 
 compact_blocks_moved 7998
 compact_pages_moved 6403
 compact_pagemigrate_failed 3
 compact_stall 15
 
 Signed-off-by: David Rientjes rient...@google.com
 ---
  mm/compaction.c | 8 
  1 file changed, 8 insertions(+)
 
 diff --git a/mm/compaction.c b/mm/compaction.c
 --- a/mm/compaction.c
 +++ b/mm/compaction.c
 @@ -578,6 +578,14 @@ isolate_migratepages_range(struct zone *zone, struct 
 compact_control *cc,
   continue;
   }
  
 + /*
 +  * Migration will fail if an anonymous page is pinned in memory,
 +  * so avoid taking zone-lru_lock and isolating it unnecessarily
 +  * in an admittedly racy check.
 +  */
 + if (!page_mapping(page)  page_count(page))
 + continue;
 +

Are you sure about this? The page_count check migration does is this

int expected_count = 1 + extra_count;
if (!mapping) {
if (page_count(page) != expected_count)
return -EAGAIN;
return MIGRATEPAGE_SUCCESS;
}

spin_lock_irq(mapping-tree_lock);

pslot = radix_tree_lookup_slot(mapping-page_tree,
page_index(page));

expected_count += 1 + page_has_private(page);

Migration expects and can migrate pages with no mapping and a page count
but you are now skipping them. I think you may have intended to split
migrations page count into a helper or copy the logic.

-- 
Mel Gorman
SUSE Labs
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] mm, compaction: avoid isolating pinned pages

2014-02-03 Thread David Rientjes
On Mon, 3 Feb 2014, Mel Gorman wrote:

  Page migration will fail for memory that is pinned in memory with, for
  example, get_user_pages().  In this case, it is unnecessary to take
  zone-lru_lock or isolating the page and passing it to page migration
  which will ultimately fail.
  
  This is a racy check, the page can still change from under us, but in
  that case we'll just fail later when attempting to move the page.
  
  This avoids very expensive memory compaction when faulting transparent
  hugepages after pinning a lot of memory with a Mellanox driver.
  
  On a 128GB machine and pinning ~120GB of memory, before this patch we
  see the enormous disparity in the number of page migration failures
  because of the pinning (from /proc/vmstat):
  
  compact_blocks_moved 7609
  compact_pages_moved 3431
  compact_pagemigrate_failed 133219
  compact_stall 13
  
  After the patch, it is much more efficient:
  
  compact_blocks_moved 7998
  compact_pages_moved 6403
  compact_pagemigrate_failed 3
  compact_stall 15
  
  Signed-off-by: David Rientjes rient...@google.com
  ---
   mm/compaction.c | 8 
   1 file changed, 8 insertions(+)
  
  diff --git a/mm/compaction.c b/mm/compaction.c
  --- a/mm/compaction.c
  +++ b/mm/compaction.c
  @@ -578,6 +578,14 @@ isolate_migratepages_range(struct zone *zone, struct 
  compact_control *cc,
  continue;
  }
   
  +   /*
  +* Migration will fail if an anonymous page is pinned in memory,
  +* so avoid taking zone-lru_lock and isolating it unnecessarily
  +* in an admittedly racy check.
  +*/
  +   if (!page_mapping(page)  page_count(page))
  +   continue;
  +
 
 Are you sure about this? The page_count check migration does is this
 
 int expected_count = 1 + extra_count;
 if (!mapping) {
 if (page_count(page) != expected_count)
 return -EAGAIN;
 return MIGRATEPAGE_SUCCESS;
 }
 
 spin_lock_irq(mapping-tree_lock);
 
 pslot = radix_tree_lookup_slot(mapping-page_tree,
 page_index(page));
 
 expected_count += 1 + page_has_private(page);
 
 Migration expects and can migrate pages with no mapping and a page count
 but you are now skipping them. I think you may have intended to split
 migrations page count into a helper or copy the logic.
 

Thanks for taking a look!

The patch is correct, it just shows my lack of a complete commit message 
which I'm struggling with recently.  In the case that this is addressing, 
get_user_pages() already gives page_count(page) == 1, then 
__isolate_lru_page() does another get_page() that is dropped in 
putback_lru_page() after the call into migrate_pages().  So in the code 
you quote above we always have page_count(page) == 2 and
expected_count == 1.

So what we desperately need to do is avoid isolating any page where 
page_count(page) is non-zero and !page_mapping(page) and do that before 
the get_page() in __isolate_lru_page() because we want to avoid taking 
zone-lru_lock.  On my 128GB machine filled with ~120GB of pinned memory 
for the driver, this lock gets highly contended under compaction and even 
reclaim if the rest of userspace is using a lot of memory.

It's not really relevant to the commit message, but I found that if all 
that ~120GB is faulted and I manually invoke compaction with the procfs 
trigger (with my fix to do cc.ignore_skip_hint = true), this lock gets 
taken ~450,000 times and only 0.05% of isolated pages are actually 
successfully migrated.

Deferred compaction will certainly help for compaction that isn't induced 
via procfs, but we've encountered massive amounts of lock contention in 
this path and extremely low success to failure ratios of page migration on 
average of 2-3 out of 60 runs and the fault path really does grind to a 
halt without this patch (or simply doing MADV_NOHUGEPAGE before the driver 
does ib_umem_get() for 120GB of memory, but we want those hugepages!).
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] mm, compaction: avoid isolating pinned pages

2014-02-03 Thread Joonsoo Kim
On Mon, Feb 03, 2014 at 02:49:32AM -0800, David Rientjes wrote:
 On Mon, 3 Feb 2014, Mel Gorman wrote:
 
   Page migration will fail for memory that is pinned in memory with, for
   example, get_user_pages().  In this case, it is unnecessary to take
   zone-lru_lock or isolating the page and passing it to page migration
   which will ultimately fail.
   
   This is a racy check, the page can still change from under us, but in
   that case we'll just fail later when attempting to move the page.
   
   This avoids very expensive memory compaction when faulting transparent
   hugepages after pinning a lot of memory with a Mellanox driver.
   
   On a 128GB machine and pinning ~120GB of memory, before this patch we
   see the enormous disparity in the number of page migration failures
   because of the pinning (from /proc/vmstat):
   
   compact_blocks_moved 7609
   compact_pages_moved 3431
   compact_pagemigrate_failed 133219
   compact_stall 13
   
   After the patch, it is much more efficient:
   
   compact_blocks_moved 7998
   compact_pages_moved 6403
   compact_pagemigrate_failed 3
   compact_stall 15
   
   Signed-off-by: David Rientjes rient...@google.com
   ---
mm/compaction.c | 8 
1 file changed, 8 insertions(+)
   
   diff --git a/mm/compaction.c b/mm/compaction.c
   --- a/mm/compaction.c
   +++ b/mm/compaction.c
   @@ -578,6 +578,14 @@ isolate_migratepages_range(struct zone *zone, struct 
   compact_control *cc,
 continue;
 }

   + /*
   +  * Migration will fail if an anonymous page is pinned in memory,
   +  * so avoid taking zone-lru_lock and isolating it unnecessarily
   +  * in an admittedly racy check.
   +  */
   + if (!page_mapping(page)  page_count(page))
   + continue;
   +

Hello,

I think that you need more code to skip this type of page correctly.
Without page_mapped() check, this code makes migratable pages be skipped,
since if page_mapped() case, page_count() may be more than zero.

So I think that you need following change.

(!page_mapping(page)  !page_mapped(page)  page_count(page))

Thanks.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[patch] mm, compaction: avoid isolating pinned pages fix

2014-02-03 Thread David Rientjes
On Tue, 4 Feb 2014, Joonsoo Kim wrote:

 I think that you need more code to skip this type of page correctly.
 Without page_mapped() check, this code makes migratable pages be skipped,
 since if page_mapped() case, page_count() may be more than zero.
 
 So I think that you need following change.
 
 (!page_mapping(page)  !page_mapped(page)  page_count(page))
 

These pages returned by get_user_pages() will have a mapcount of 1 so this 
wouldn't actually fix the massive lock contention.  page_mapping() is only 
going to be NULL for pages off the lru like these are for 
PAGE_MAPPING_ANON.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] mm, compaction: avoid isolating pinned pages fix

2014-02-03 Thread Joonsoo Kim
On Mon, Feb 03, 2014 at 05:20:46PM -0800, David Rientjes wrote:
 On Tue, 4 Feb 2014, Joonsoo Kim wrote:
 
  I think that you need more code to skip this type of page correctly.
  Without page_mapped() check, this code makes migratable pages be skipped,
  since if page_mapped() case, page_count() may be more than zero.
  
  So I think that you need following change.
  
  (!page_mapping(page)  !page_mapped(page)  page_count(page))
  
 
 These pages returned by get_user_pages() will have a mapcount of 1 so this 
 wouldn't actually fix the massive lock contention.  page_mapping() is only 
 going to be NULL for pages off the lru like these are for 
 PAGE_MAPPING_ANON.

Okay. It can't fix your situation. Anyway, *normal* anon pages may be mapped
and have positive page_count(), so your code such as
'!page_mapping(page)  page_count(page)' makes compaction skip these *normal*
anon pages and this is incorrect behaviour.

Thanks.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] mm, compaction: avoid isolating pinned pages fix

2014-02-03 Thread David Rientjes
On Tue, 4 Feb 2014, Joonsoo Kim wrote:

 Okay. It can't fix your situation. Anyway, *normal* anon pages may be mapped
 and have positive page_count(), so your code such as
 '!page_mapping(page)  page_count(page)' makes compaction skip these *normal*
 anon pages and this is incorrect behaviour.
 

So how does that work with migrate_page_move_mapping() which demands 
page_count(page) == 1 and the get_page_unless_zero() in 
__isolate_lru_page()?
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] mm, compaction: avoid isolating pinned pages fix

2014-02-03 Thread Joonsoo Kim
On Mon, Feb 03, 2014 at 06:00:56PM -0800, David Rientjes wrote:
 On Tue, 4 Feb 2014, Joonsoo Kim wrote:
 
  Okay. It can't fix your situation. Anyway, *normal* anon pages may be mapped
  and have positive page_count(), so your code such as
  '!page_mapping(page)  page_count(page)' makes compaction skip these 
  *normal*
  anon pages and this is incorrect behaviour.
  
 
 So how does that work with migrate_page_move_mapping() which demands 
 page_count(page) == 1 and the get_page_unless_zero() in 
 __isolate_lru_page()?

Before doing migrate_page_move_mapping(), try_to_unmap() is called so that all
mapping is unmapped. Then, remained page_count() is 1 which is grabbed by
__isolate_lru_page(). Am I missing something?

Thanks.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] mm, compaction: avoid isolating pinned pages

2014-02-03 Thread Hugh Dickins
On Mon, 3 Feb 2014, David Rientjes wrote:
 On Mon, 3 Feb 2014, Mel Gorman wrote:
 
   Page migration will fail for memory that is pinned in memory with, for
   example, get_user_pages().  In this case, it is unnecessary to take
   zone-lru_lock or isolating the page and passing it to page migration
   which will ultimately fail.
   
   This is a racy check, the page can still change from under us, but in
   that case we'll just fail later when attempting to move the page.
   
   This avoids very expensive memory compaction when faulting transparent
   hugepages after pinning a lot of memory with a Mellanox driver.
   
   On a 128GB machine and pinning ~120GB of memory, before this patch we
   see the enormous disparity in the number of page migration failures
   because of the pinning (from /proc/vmstat):

120GB of memory on the active/inactive lrus but longterm pinned,
that's quite worrying: not just a great waste of time for compaction,
but for page reclaim also.  I suppose a fairly easy way around it would
be for the driver to use mlock too, moving them all to unevictable lru.

But in general, you may well  be right that, racy as this isolation/
migration procedure necessarily is, in the face of longterm pinning it
may make more sense to test page_count before proceding to isolation
rather than only after in migration.  We always took the view that it's
better to give up only at the last moment, but that may be a bad bet.

   
   compact_blocks_moved 7609
   compact_pages_moved 3431
   compact_pagemigrate_failed 133219
   compact_stall 13
   
   After the patch, it is much more efficient:
   
   compact_blocks_moved 7998
   compact_pages_moved 6403
   compact_pagemigrate_failed 3
   compact_stall 15
   
   Signed-off-by: David Rientjes rient...@google.com
   ---
mm/compaction.c | 8 
1 file changed, 8 insertions(+)
   
   diff --git a/mm/compaction.c b/mm/compaction.c
   --- a/mm/compaction.c
   +++ b/mm/compaction.c
   @@ -578,6 +578,14 @@ isolate_migratepages_range(struct zone *zone, struct 
   compact_control *cc,
 continue;
 }

   + /*
   +  * Migration will fail if an anonymous page is pinned in memory,
   +  * so avoid taking zone-lru_lock and isolating it unnecessarily
   +  * in an admittedly racy check.
   +  */
   + if (!page_mapping(page)  page_count(page))
   + continue;
   +
  
  Are you sure about this? The page_count check migration does is this
  
  int expected_count = 1 + extra_count;
  if (!mapping) {
  if (page_count(page) != expected_count)
  return -EAGAIN;
  return MIGRATEPAGE_SUCCESS;
  }
  
  spin_lock_irq(mapping-tree_lock);
  
  pslot = radix_tree_lookup_slot(mapping-page_tree,
  page_index(page));
  
  expected_count += 1 + page_has_private(page);
  
  Migration expects and can migrate pages with no mapping and a page count
  but you are now skipping them. I think you may have intended to split
  migrations page count into a helper or copy the logic.
  
 
 Thanks for taking a look!
 
 The patch is correct, it just shows my lack of a complete commit message 

I don't think so.  I agree with Mel that you should be reconsidering
those tests that migrate_page_move_mapping() makes, but remembering that
it's called at a stage between try_to_unmap() and remove_migration_ptes(),
when page_mapcount has been brought down to 0 - not the case here.

 which I'm struggling with recently.  In the case that this is addressing, 
 get_user_pages() already gives page_count(page) == 1, then 

But get_user_pages() brings the pages into user address space (if not
already there), page_mapcount 1 and page_count 1, and does an additional
pin on the page, page_count 2.  Or if it's a page_mapping page (perhaps
even PageAnon in SwapCache) there's another +1; if page_has_buffers
another +1; mapped into more user address spaces, +more.

Yourif (!page_mapping(page)  page_count(page))
continue;
is letting through any Anon SwapCache pages (probably no great concern
in your 120GB example; but I don't understand why you want to special-
case Anon anyway, beyond your specific testcase); and refusing to
isolate all those unpinned anonymous pages mapped into userspace which
migration is perfectly capable of migrating.  If 120GB out of 128GB is
pinned, that won't be a significant proportion, and of course your
change saves a lot of wasted time and lock contention; but for most
people it's a considerable proportion of their memory, and needs to
be migratable.

I think Joonsoo is making the same point (though I disagree with the
test he suggested); but I've not yet read the latest mails under a
separate subject ( fix appended).

Hugh

 __isolate_lru_page() does another get_page() that is dropped in 
 putback_lru_page() after the call into 

Re: [patch] mm, compaction: avoid isolating pinned pages fix

2014-02-03 Thread David Rientjes
On Tue, 4 Feb 2014, Joonsoo Kim wrote:

   Okay. It can't fix your situation. Anyway, *normal* anon pages may be 
   mapped
   and have positive page_count(), so your code such as
   '!page_mapping(page)  page_count(page)' makes compaction skip these 
   *normal*
   anon pages and this is incorrect behaviour.
   
  
  So how does that work with migrate_page_move_mapping() which demands 
  page_count(page) == 1 and the get_page_unless_zero() in 
  __isolate_lru_page()?
 
 Before doing migrate_page_move_mapping(), try_to_unmap() is called so that all
 mapping is unmapped. Then, remained page_count() is 1 which is grabbed by
 __isolate_lru_page(). Am I missing something?
 

Ah, good point.  I wonder if we can get away with 
page_count(page) - page_mapcount(page)  1 to avoid the get_user_pages() 
pin?
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] mm, compaction: avoid isolating pinned pages fix

2014-02-03 Thread Hugh Dickins
On Mon, 3 Feb 2014, David Rientjes wrote:
 On Tue, 4 Feb 2014, Joonsoo Kim wrote:
 
Okay. It can't fix your situation. Anyway, *normal* anon pages may be 
mapped
and have positive page_count(), so your code such as
'!page_mapping(page)  page_count(page)' makes compaction skip these 
*normal*
anon pages and this is incorrect behaviour.

   
   So how does that work with migrate_page_move_mapping() which demands 
   page_count(page) == 1 and the get_page_unless_zero() in 
   __isolate_lru_page()?
  
  Before doing migrate_page_move_mapping(), try_to_unmap() is called so that 
  all
  mapping is unmapped. Then, remained page_count() is 1 which is grabbed by
  __isolate_lru_page(). Am I missing something?
  
 
 Ah, good point.  I wonder if we can get away with 
 page_count(page) - page_mapcount(page)  1 to avoid the get_user_pages() 
 pin?

Something like that.  But please go back to migrate_page_move_mapping()
to factor in what it's additionally considering.  Whether you can share
code with it, I don't know - it has to do some things under a lock you
cannot take at the preliminary stage - you haven't isolated or locked
the page yet.

There is a separate issue, that a mapping may supply its own non-default
mapping-a_ops-migratepage(): can we assume that the page_counting is
the same whatever migratepage() is in use?  I'm not sure.

If you stick to special-casing PageAnon pages, you won't face that
issue; but your proposed change would be a lot more satisfying if we
can convince ourselves that it's good for !PageAnon too.  May need a
trawl through the different migratepage() methods that exist in tree.

Hugh
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[patch] mm, compaction: avoid isolating pinned pages

2014-02-01 Thread David Rientjes
Page migration will fail for memory that is pinned in memory with, for
example, get_user_pages().  In this case, it is unnecessary to take
zone->lru_lock or isolating the page and passing it to page migration
which will ultimately fail.

This is a racy check, the page can still change from under us, but in
that case we'll just fail later when attempting to move the page.

This avoids very expensive memory compaction when faulting transparent
hugepages after pinning a lot of memory with a Mellanox driver.

On a 128GB machine and pinning ~120GB of memory, before this patch we
see the enormous disparity in the number of page migration failures
because of the pinning (from /proc/vmstat):

compact_blocks_moved 7609
compact_pages_moved 3431
compact_pagemigrate_failed 133219
compact_stall 13

After the patch, it is much more efficient:

compact_blocks_moved 7998
compact_pages_moved 6403
compact_pagemigrate_failed 3
compact_stall 15

Signed-off-by: David Rientjes 
---
 mm/compaction.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/mm/compaction.c b/mm/compaction.c
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -578,6 +578,14 @@ isolate_migratepages_range(struct zone *zone, struct 
compact_control *cc,
continue;
}
 
+   /*
+* Migration will fail if an anonymous page is pinned in memory,
+* so avoid taking zone->lru_lock and isolating it unnecessarily
+* in an admittedly racy check.
+*/
+   if (!page_mapping(page) && page_count(page))
+   continue;
+
/* Check if it is ok to still hold the lock */
locked = compact_checklock_irqsave(>lru_lock, ,
locked, cc);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[patch] mm, compaction: avoid isolating pinned pages

2014-02-01 Thread David Rientjes
Page migration will fail for memory that is pinned in memory with, for
example, get_user_pages().  In this case, it is unnecessary to take
zone-lru_lock or isolating the page and passing it to page migration
which will ultimately fail.

This is a racy check, the page can still change from under us, but in
that case we'll just fail later when attempting to move the page.

This avoids very expensive memory compaction when faulting transparent
hugepages after pinning a lot of memory with a Mellanox driver.

On a 128GB machine and pinning ~120GB of memory, before this patch we
see the enormous disparity in the number of page migration failures
because of the pinning (from /proc/vmstat):

compact_blocks_moved 7609
compact_pages_moved 3431
compact_pagemigrate_failed 133219
compact_stall 13

After the patch, it is much more efficient:

compact_blocks_moved 7998
compact_pages_moved 6403
compact_pagemigrate_failed 3
compact_stall 15

Signed-off-by: David Rientjes rient...@google.com
---
 mm/compaction.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/mm/compaction.c b/mm/compaction.c
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -578,6 +578,14 @@ isolate_migratepages_range(struct zone *zone, struct 
compact_control *cc,
continue;
}
 
+   /*
+* Migration will fail if an anonymous page is pinned in memory,
+* so avoid taking zone-lru_lock and isolating it unnecessarily
+* in an admittedly racy check.
+*/
+   if (!page_mapping(page)  page_count(page))
+   continue;
+
/* Check if it is ok to still hold the lock */
locked = compact_checklock_irqsave(zone-lru_lock, flags,
locked, cc);
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/