[PATCH 5/5] mm: have order > 0 compaction start near a pageblock with free pages

2012-08-09 Thread Mel Gorman
commit [7db8889a: mm: have order > 0 compaction start off where it left]
introduced a caching mechanism to reduce the amount of work the free page
scanner does in compaction. However, it has a problem. Consider two processes
simultaneously scanning free pages

C
Process A   M S F
|---|
Process B   M   FS

C is zone->compact_cached_free_pfn
S is cc->start_free_pfn
M is cc->migrate_pfn
F is cc->free_pfn

In this diagram, Process A has just reached its migrate scanner, wrapped
around and updated compact_cached_free_pfn accordingly.

Simultaneously, Process B finishes isolating in a block and updates
compact_cached_free_pfn again to the location of its free scanner.

Process A moves to "end_of_zone - one_pageblock" and runs this check

if (cc->order > 0 && (!cc->wrapped ||
  zone->compact_cached_free_pfn >
  cc->start_free_pfn))
pfn = min(pfn, zone->compact_cached_free_pfn);

compact_cached_free_pfn is above where it started so the free scanner skips
almost the entire space it should have scanned. When there are multiple
processes compacting, it can end in a situation where the entire zone is
not being scanned at all.  Further, it is possible for two processes to
ping-pong updates to compact_cached_free_pfn, which is effectively random.

Overall, the end result wrecks allocation success rates.

There is not an obvious way around this problem without introducing new
locking and state so this patch takes a different approach.

First, it gets rid of the skip logic because it's not clear that it matters
if two free scanners happen to be in the same block; with racing updates
it's too easy for it to skip over blocks it should not.

Second, it updates compact_cached_free_pfn in a more limited set of
circumstances.

If a scanner has wrapped, it updates compact_cached_free_pfn to the end
of the zone. When a wrapped scanner isolates a page, it updates
compact_cached_free_pfn to point to the highest pageblock it
can isolate pages from.

If a scanner has not wrapped when it has finished isolating pages, it
checks if compact_cached_free_pfn is pointing to the end of the
zone. If so, the value is updated to point to the highest
pageblock that pages were isolated from. This value will not
be updated again until a free page scanner wraps and resets
compact_cached_free_pfn.
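
To make these rules concrete, here is a rough sketch of the resulting update
logic. This is an illustration of the behaviour described above rather than
the exact hunks of the patch; start_free_pfn() is the helper introduced below
and cc->wrapped comes from commit 7db8889a.

	/* On wrap, reset the cached pfn to the end of the zone. */
	if (cc->free_pfn <= cc->migrate_pfn && cc->order > 0 && !cc->wrapped) {
		cc->free_pfn = start_free_pfn(zone);
		zone->compact_cached_free_pfn = cc->free_pfn;
		cc->wrapped = 1;
	}

	/*
	 * In isolate_freepages(): a wrapped scanner keeps tracking the
	 * highest pageblock it isolated free pages from.
	 */
	if (isolated) {
		high_pfn = max(high_pfn, pfn);
		if (cc->order > 0 && cc->wrapped)
			zone->compact_cached_free_pfn = high_pfn;
	}

	/*
	 * A scanner that has not wrapped only records where it found free
	 * pages while the cached pfn still points at the end of the zone.
	 */
	if (cc->order > 0 && !cc->wrapped &&
	    zone->compact_cached_free_pfn == start_free_pfn(zone))
		zone->compact_cached_free_pfn = high_pfn;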

This is not optimal and it can still race but the compact_cached_free_pfn
will be pointing to or very near a pageblock with free pages.

Signed-off-by: Mel Gorman 
Reviewed-by: Rik van Riel 
Reviewed-by: Minchan Kim 
---
 mm/compaction.c |   54 ++++++++++++++++++++++++++++--------------------------
 1 file changed, 28 insertions(+), 26 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index a806a9c..c2d0958 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -437,6 +437,20 @@ static bool suitable_migration_target(struct page *page)
 }
 
 /*
+ * Returns the start pfn of the last page block in a zone.  This is the starting
+ * point for full compaction of a zone.  Compaction searches for free pages from
+ * the end of each zone, while isolate_freepages_block scans forward inside each
+ * page block.
+ */
+static unsigned long start_free_pfn(struct zone *zone)
+{
+   unsigned long free_pfn;
+   free_pfn = zone->zone_start_pfn + zone->spanned_pages;
+   free_pfn &= ~(pageblock_nr_pages-1);
+   return free_pfn;
+}
+
+/*
  * Based on information in the current compact_control, find blocks
  * suitable for isolating free pages from and then isolate them.
  */
@@ -475,17 +489,6 @@ static void isolate_freepages(struct zone *zone,
pfn -= pageblock_nr_pages) {
unsigned long isolated;
 
-   /*
-* Skip ahead if another thread is compacting in the area
-* simultaneously. If we wrapped around, we can only skip
-* ahead if zone->compact_cached_free_pfn also wrapped to
-* above our starting point.
-*/
-   if (cc->order > 0 && (!cc->wrapped ||
- zone->compact_cached_free_pfn >
- cc->start_free_pfn))
-   pfn = min(pfn, zone->compact_cached_free_pfn);
-
if (!pfn_valid(pfn))
continue;
 
@@ -528,7 +531,15 @@ static void isolate_freepages(struct zone *zone,
 */
if (isolated) {
high_pfn = max(high_pfn, pfn);
-   if (cc->order > 0)
+
+   /*
+    * If the free scanner has wrapped, update
+    * compact_cached_free_pfn to point to the highest
+    * pageblock with free pages. This reduces excessive
+    * scanning of full pageblocks near the end of the
+    * zone
+    */
+   if (cc->order > 0 && cc->wrapped)
+       zone->compact_cached_free_pfn = high_pfn;

Re: [PATCH 5/5] mm: have order > 0 compaction start near a pageblock with free pages

2012-08-09 Thread Minchan Kim
On Wed, Aug 08, 2012 at 08:08:44PM +0100, Mel Gorman wrote:
> commit [7db8889a: mm: have order > 0 compaction start off where it left]
> introduced a caching mechanism to reduce the amount work the free page
> scanner does in compaction. However, it has a problem. Consider two process
> simultaneously scanning free pages
> 
>   C
> Process A M S F
>   |---|
> Process B M   FS
> 
> C is zone->compact_cached_free_pfn
> S is cc->start_pfree_pfn
> M is cc->migrate_pfn
> F is cc->free_pfn
> 
> In this diagram, Process A has just reached its migrate scanner, wrapped
> around and updated compact_cached_free_pfn accordingly.
> 
> Simultaneously, Process B finishes isolating in a block and updates
> compact_cached_free_pfn again to the location of its free scanner.
> 
> Process A moves to "end_of_zone - one_pageblock" and runs this check
> 
> if (cc->order > 0 && (!cc->wrapped ||
>   zone->compact_cached_free_pfn >
>   cc->start_free_pfn))
> pfn = min(pfn, zone->compact_cached_free_pfn);
> 
> compact_cached_free_pfn is above where it started so the free scanner skips
> almost the entire space it should have scanned. When there are multiple
> processes compacting it can end in a situation where the entire zone is
> not being scanned at all.  Further, it is possible for two processes to
> ping-pong update to compact_cached_free_pfn which is just random.
> 
> Overall, the end result wrecks allocation success rates.
> 
> There is not an obvious way around this problem without introducing new
> locking and state so this patch takes a different approach.
> 
> First, it gets rid of the skip logic because it's not clear that it matters
> if two free scanners happen to be in the same block but with racing updates
> it's too easy for it to skip over blocks it should not.
> 
> Second, it updates compact_cached_free_pfn in a more limited set of
> circumstances.
> 
> If a scanner has wrapped, it updates compact_cached_free_pfn to the end
>   of the zone. When a wrapped scanner isolates a page, it updates
>   compact_cached_free_pfn to point to the highest pageblock it
>   can isolate pages from.
> 
> If a scanner has not wrapped when it has finished isolated pages it
>   checks if compact_cached_free_pfn is pointing to the end of the
>   zone. If so, the value is updated to point to the highest
>   pageblock that pages were isolated from. This value will not
>   be updated again until a free page scanner wraps and resets
>   compact_cached_free_pfn.
> 
> This is not optimal and it can still race but the compact_cached_free_pfn
> will be pointing to or very near a pageblock with free pages.
> 
> Signed-off-by: Mel Gorman 
> Reviewed-by: Rik van Riel 
Reviewed-by: Minchan Kim 

-- 
Kind regards,
Minchan Kim


Re: [PATCH 5/5] mm: have order > 0 compaction start near a pageblock with free pages

2012-08-09 Thread Minchan Kim
On Thu, Aug 09, 2012 at 09:23:28AM +0100, Mel Gorman wrote:
> On Thu, Aug 09, 2012 at 09:12:12AM +0900, Minchan Kim wrote:
> > > 
> > > 
> > > Second, it updates compact_cached_free_pfn in a more limited set of
> > > circumstances.
> > > 
> > > If a scanner has wrapped, it updates compact_cached_free_pfn to the end
> > >   of the zone. When a wrapped scanner isolates a page, it updates
> > >   compact_cached_free_pfn to point to the highest pageblock it
> > >   can isolate pages from.
> > 
> > Okay until here.
> > 
> 
> Great.
> 
> > > 
> > > If a scanner has not wrapped when it has finished isolated pages it
> > >   checks if compact_cached_free_pfn is pointing to the end of the
> > >   zone. If so, the value is updated to point to the highest
> > >   pageblock that pages were isolated from. This value will not
> > >   be updated again until a free page scanner wraps and resets
> > >   compact_cached_free_pfn.
> > 
> > I tried to understand your intention of this part but unfortunately failed.
> > By this part, the problem you mentioned could happen again?
> > 
> 
> Potentially yes, I did say it still races in the changelog.
> 
> > C
> >  Process A  M S F
> > |---|
> >  Process B  M   FS
> >  
> >  C is zone->compact_cached_free_pfn
> >  S is cc->start_pfree_pfn
> >  M is cc->migrate_pfn
> >  F is cc->free_pfn
> > 
> > In this diagram, Process A has just reached its migrate scanner, wrapped
> > around and updated compact_cached_free_pfn to end of the zone accordingly.
> > 
> 
> Yes. Now that it has wrapped it updates the compact_cached_free_pfn
> every loop of isolate_freepages here.
> 
> if (isolated) {
> high_pfn = max(high_pfn, pfn);
> 
> /*
>  * If the free scanner has wrapped, update
>  * compact_cached_free_pfn to point to the highest
>  * pageblock with free pages. This reduces excessive
>  * scanning of full pageblocks near the end of the
>  * zone
>  */
> if (cc->order > 0 && cc->wrapped)
> zone->compact_cached_free_pfn = high_pfn;
> }
> 
> 
> 
> > Simultaneously, Process B finishes isolating in a block and peek 
> > compact_cached_free_pfn position and know it's end of the zone so
> > update compact_cached_free_pfn to highest pageblock that pages were
> > isolated from.
> > 
> 
> Yes, they race at this point. One of two things happen here and I agree
> that this is racy
> 
> 1. Process A does another iteration of its loop and sets it back
> 2. Process A does not do another iteration of the loop, the cached_pfn
>is further along that it should. The next compacting process will
>wrap early and reset cached_pfn again but continue to scan the zone.
> 
> Either option is relatively harmless because in both cases the zone gets
> scanned. In patch 4 it was possible that large portions of the zone were
> frequently missed.
> 
> > Process A updates compact_cached_free_pfn to the highest pageblock which
> > was set by process B because process A has wrapped. It ends up big jump
> > without any scanning in process A.
> > 
> 
> It recovers quickly and is nowhere near as severe as what patch 4
> suffers from.

Agreed.
Thanks, Mel.

-- 
Kind regards,
Minchan Kim


Re: [PATCH 5/5] mm: have order > 0 compaction start near a pageblock with free pages

2012-08-09 Thread Mel Gorman
On Thu, Aug 09, 2012 at 09:12:12AM +0900, Minchan Kim wrote:
> > 
> > 
> > Second, it updates compact_cached_free_pfn in a more limited set of
> > circumstances.
> > 
> > If a scanner has wrapped, it updates compact_cached_free_pfn to the end
> > of the zone. When a wrapped scanner isolates a page, it updates
> > compact_cached_free_pfn to point to the highest pageblock it
> > can isolate pages from.
> 
> Okay until here.
> 

Great.

> > 
> > If a scanner has not wrapped when it has finished isolated pages it
> > checks if compact_cached_free_pfn is pointing to the end of the
> > zone. If so, the value is updated to point to the highest
> > pageblock that pages were isolated from. This value will not
> > be updated again until a free page scanner wraps and resets
> > compact_cached_free_pfn.
> 
> I tried to understand your intention of this part but unfortunately failed.
> By this part, the problem you mentioned could happen again?
> 

Potentially yes, I did say it still races in the changelog.

>   C
>  Process AM S F
>   |---|
>  Process BM   FS
>  
>  C is zone->compact_cached_free_pfn
>  S is cc->start_pfree_pfn
>  M is cc->migrate_pfn
>  F is cc->free_pfn
> 
> In this diagram, Process A has just reached its migrate scanner, wrapped
> around and updated compact_cached_free_pfn to end of the zone accordingly.
> 

Yes. Now that it has wrapped, it updates compact_cached_free_pfn on
every iteration of the isolate_freepages loop here.

if (isolated) {
high_pfn = max(high_pfn, pfn);

/*
 * If the free scanner has wrapped, update
 * compact_cached_free_pfn to point to the highest
 * pageblock with free pages. This reduces excessive
 * scanning of full pageblocks near the end of the
 * zone
 */
if (cc->order > 0 && cc->wrapped)
zone->compact_cached_free_pfn = high_pfn;
}



> Simultaneously, Process B finishes isolating in a block and peek 
> compact_cached_free_pfn position and know it's end of the zone so
> update compact_cached_free_pfn to highest pageblock that pages were
> isolated from.
> 

Yes, they race at this point. One of two things happens here, and I agree
that this is racy:

1. Process A does another iteration of its loop and sets it back
2. Process A does not do another iteration of the loop, the cached_pfn
   is further along than it should be. The next compacting process will
   wrap early and reset cached_pfn again but continue to scan the zone.

Either option is relatively harmless because in both cases the zone gets
scanned. In patch 4 it was possible that large portions of the zone were
frequently missed.

> Process A updates compact_cached_free_pfn to the highest pageblock which
> was set by process B because process A has wrapped. It ends up big jump
> without any scanning in process A.
> 

It recovers quickly and is nowhere near as severe as what patch 4
suffers from.

-- 
Mel Gorman
SUSE Labs


Re: [PATCH 5/5] mm: have order > 0 compaction start near a pageblock with free pages

2012-08-08 Thread Minchan Kim
Hi Mel,

On Wed, Aug 08, 2012 at 08:08:44PM +0100, Mel Gorman wrote:
> commit [7db8889a: mm: have order > 0 compaction start off where it left]
> introduced a caching mechanism to reduce the amount work the free page
> scanner does in compaction. However, it has a problem. Consider two process
> simultaneously scanning free pages
> 
>   C
> Process A M S F
>   |---|
> Process B M   FS
> 
> C is zone->compact_cached_free_pfn
> S is cc->start_pfree_pfn
> M is cc->migrate_pfn
> F is cc->free_pfn
> 
> In this diagram, Process A has just reached its migrate scanner, wrapped
> around and updated compact_cached_free_pfn accordingly.
> 
> Simultaneously, Process B finishes isolating in a block and updates
> compact_cached_free_pfn again to the location of its free scanner.
> 
> Process A moves to "end_of_zone - one_pageblock" and runs this check
> 
> if (cc->order > 0 && (!cc->wrapped ||
>   zone->compact_cached_free_pfn >
>   cc->start_free_pfn))
> pfn = min(pfn, zone->compact_cached_free_pfn);
> 
> compact_cached_free_pfn is above where it started so the free scanner skips
> almost the entire space it should have scanned. When there are multiple
> processes compacting it can end in a situation where the entire zone is
> not being scanned at all.  Further, it is possible for two processes to
> ping-pong update to compact_cached_free_pfn which is just random.
> 
> Overall, the end result wrecks allocation success rates.
> 
> There is not an obvious way around this problem without introducing new
> locking and state so this patch takes a different approach.
> 
> First, it gets rid of the skip logic because it's not clear that it matters
> if two free scanners happen to be in the same block but with racing updates
> it's too easy for it to skip over blocks it should not.
> 
> Second, it updates compact_cached_free_pfn in a more limited set of
> circumstances.
> 
> If a scanner has wrapped, it updates compact_cached_free_pfn to the end
>   of the zone. When a wrapped scanner isolates a page, it updates
>   compact_cached_free_pfn to point to the highest pageblock it
>   can isolate pages from.

Okay until here.

> 
> If a scanner has not wrapped when it has finished isolated pages it
>   checks if compact_cached_free_pfn is pointing to the end of the
>   zone. If so, the value is updated to point to the highest
>   pageblock that pages were isolated from. This value will not
>   be updated again until a free page scanner wraps and resets
>   compact_cached_free_pfn.

I tried to understand your intention for this part but unfortunately failed.
With this part, couldn't the problem you mentioned happen again?

C
 Process A  M S F
|---|
 Process B  M   FS
 
 C is zone->compact_cached_free_pfn
 S is cc->start_pfree_pfn
 M is cc->migrate_pfn
 F is cc->free_pfn

In this diagram, Process A has just reached its migrate scanner, wrapped
around and updated compact_cached_free_pfn to the end of the zone accordingly.

Simultaneously, Process B finishes isolating in a block, peeks at the
compact_cached_free_pfn position, sees that it is the end of the zone, and so
updates compact_cached_free_pfn to the highest pageblock that pages were
isolated from.

Process A updates compact_cached_free_pfn to the highest pageblock which
was set by Process B, because Process A has wrapped. It ends up as a big jump
without any scanning in Process A.

No?

> 
> This is not optimal and it can still race but the compact_cached_free_pfn
> will be pointing to or very near a pageblock with free pages.
> 
> Signed-off-by: Mel Gorman 
> Reviewed-by: Rik van Riel 
> ---
>  mm/compaction.c |   54 --
>  1 file changed, 28 insertions(+), 26 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index be310f1..df50f73 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -419,6 +419,20 @@ static bool suitable_migration_target(struct page *page)
>  }
>  
>  /*
> + * Returns the start pfn of the last page block in a zone.  This is the starting
> + * point for full compaction of a zone.  Compaction searches for free pages from
> + * the end of each zone, while isolate_freepages_block scans forward inside each
> + * page block.
> + */
> +static unsigned long start_free_pfn(struct zone *zone)
> +{
> + unsigned long free_pfn;
> + free_pfn = zone->zone_start_pfn + zone->spanned_pages;
> + free_pfn &= ~(pageblock_nr_pages-1);
> + return free_pfn;
> +}
> +
> +/*
>   * Based on information in the current compact_control, find blocks
>   * suitable for 

[PATCH 5/5] mm: have order > 0 compaction start near a pageblock with free pages

2012-08-08 Thread Mel Gorman
commit [7db8889a: mm: have order > 0 compaction start off where it left]
introduced a caching mechanism to reduce the amount of work the free page
scanner does in compaction. However, it has a problem. Consider two processes
simultaneously scanning free pages

C
Process A   M S F
|---|
Process B   M   FS

C is zone->compact_cached_free_pfn
S is cc->start_free_pfn
M is cc->migrate_pfn
F is cc->free_pfn

In this diagram, Process A has just reached its migrate scanner, wrapped
around and updated compact_cached_free_pfn accordingly.

Simultaneously, Process B finishes isolating in a block and updates
compact_cached_free_pfn again to the location of its free scanner.

Process A moves to "end_of_zone - one_pageblock" and runs this check

if (cc->order > 0 && (!cc->wrapped ||
  zone->compact_cached_free_pfn >
  cc->start_free_pfn))
pfn = min(pfn, zone->compact_cached_free_pfn);

compact_cached_free_pfn is above where it started so the free scanner skips
almost the entire space it should have scanned. When there are multiple
processes compacting, it can end in a situation where the entire zone is
not being scanned at all.  Further, it is possible for two processes to
ping-pong updates to compact_cached_free_pfn, which is effectively random.

Overall, the end result wrecks allocation success rates.

There is not an obvious way around this problem without introducing new
locking and state so this patch takes a different approach.

First, it gets rid of the skip logic because it's not clear that it matters
if two free scanners happen to be in the same block; with racing updates
it's too easy for it to skip over blocks it should not.

Second, it updates compact_cached_free_pfn in a more limited set of
circumstances.

If a scanner has wrapped, it updates compact_cached_free_pfn to the end
of the zone. When a wrapped scanner isolates a page, it updates
compact_cached_free_pfn to point to the highest pageblock it
can isolate pages from.

If a scanner has not wrapped when it has finished isolating pages, it
checks if compact_cached_free_pfn is pointing to the end of the
zone. If so, the value is updated to point to the highest
pageblock that pages were isolated from. This value will not
be updated again until a free page scanner wraps and resets
compact_cached_free_pfn.

This is not optimal and it can still race but the compact_cached_free_pfn
will be pointing to or very near a pageblock with free pages.

Signed-off-by: Mel Gorman 
Reviewed-by: Rik van Riel 
---
 mm/compaction.c |   54 ++++++++++++++++++++++++++++--------------------------
 1 file changed, 28 insertions(+), 26 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index be310f1..df50f73 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -419,6 +419,20 @@ static bool suitable_migration_target(struct page *page)
 }
 
 /*
+ * Returns the start pfn of the last page block in a zone.  This is the starting
+ * point for full compaction of a zone.  Compaction searches for free pages from
+ * the end of each zone, while isolate_freepages_block scans forward inside each
+ * page block.
+ */
+static unsigned long start_free_pfn(struct zone *zone)
+{
+   unsigned long free_pfn;
+   free_pfn = zone->zone_start_pfn + zone->spanned_pages;
+   free_pfn &= ~(pageblock_nr_pages-1);
+   return free_pfn;
+}
+
+/*
  * Based on information in the current compact_control, find blocks
  * suitable for isolating free pages from and then isolate them.
  */
@@ -457,17 +471,6 @@ static void isolate_freepages(struct zone *zone,
pfn -= pageblock_nr_pages) {
unsigned long isolated;
 
-   /*
-* Skip ahead if another thread is compacting in the area
-* simultaneously. If we wrapped around, we can only skip
-* ahead if zone->compact_cached_free_pfn also wrapped to
-* above our starting point.
-*/
-   if (cc->order > 0 && (!cc->wrapped ||
- zone->compact_cached_free_pfn >
- cc->start_free_pfn))
-   pfn = min(pfn, zone->compact_cached_free_pfn);
-
if (!pfn_valid(pfn))
continue;
 
@@ -510,7 +513,15 @@ static void isolate_freepages(struct zone *zone,
 */
if (isolated) {
high_pfn = max(high_pfn, pfn);
-   if (cc->order > 0)
+
+   /*
+    * If the free scanner has wrapped, update
+    * compact_cached_free_pfn to point to the highest
+    * pageblock with free pages. This reduces excessive
+    * scanning of full pageblocks near the end of the
+    * zone
+    */
+   if (cc->order > 0 && cc->wrapped)
+       zone->compact_cached_free_pfn = high_pfn;
