Hi Mel,

On Wed, Aug 08, 2012 at 08:08:44PM +0100, Mel Gorman wrote:
> commit [7db8889a: mm: have order > 0 compaction start off where it left]
> introduced a caching mechanism to reduce the amount of work the free page
> scanner does in compaction. However, it has a problem. Consider two processes
> simultaneously scanning free pages:
> 
>                                                       C
> Process A             M     S                         F
>               |---------------------------------------|
> Process B             M       FS
> 
> C is zone->compact_cached_free_pfn
> S is cc->start_free_pfn
> M is cc->migrate_pfn
> F is cc->free_pfn
> 
> In this diagram, Process A has just reached its migrate scanner, wrapped
> around and updated compact_cached_free_pfn accordingly.
> 
> Simultaneously, Process B finishes isolating in a block and updates
> compact_cached_free_pfn again to the location of its free scanner.
> 
> Process A moves to "end_of_zone - one_pageblock" and runs this check
> 
>                 if (cc->order > 0 && (!cc->wrapped ||
>                                       zone->compact_cached_free_pfn >
>                                       cc->start_free_pfn))
>                         pfn = min(pfn, zone->compact_cached_free_pfn);
> 
> compact_cached_free_pfn is above where it started so the free scanner skips
> almost the entire space it should have scanned. When there are multiple
> processes compacting, it can end up in a situation where the entire zone is
> not being scanned at all.  Further, it is possible for two processes to
> ping-pong updates to compact_cached_free_pfn, which is effectively random.
> 
> Overall, the end result wrecks allocation success rates.
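
To convince myself of the race, I put the quoted check into a small
userspace sketch. The fake structures, the pfn values and the single racing
update are all made up for illustration; only the check itself mirrors
isolate_freepages():

/*
 * Toy model: process A has wrapped and wants to rescan from the last
 * pageblock, but process B races in and moves compact_cached_free_pfn.
 */
#include <stdio.h>

#define PAGEBLOCK_NR_PAGES 512UL

struct fake_zone {
        unsigned long zone_start_pfn;
        unsigned long spanned_pages;
        unsigned long compact_cached_free_pfn;
};

struct fake_cc {
        int order;
        int wrapped;
        unsigned long start_free_pfn;
};

int main(void)
{
        struct fake_zone zone = { .zone_start_pfn = 0, .spanned_pages = 1UL << 20 };
        struct fake_cc a = { .order = 3, .wrapped = 1, .start_free_pfn = 1UL << 19 };
        unsigned long zone_end_block = (zone.zone_start_pfn + zone.spanned_pages -
                        PAGEBLOCK_NR_PAGES) & ~(PAGEBLOCK_NR_PAGES - 1);
        unsigned long pfn = zone_end_block;     /* A restarts at the zone end */

        /* B caches the position of its own free scanner, which happens to be
         * above where A's free scanner originally started.
         */
        zone.compact_cached_free_pfn = a.start_free_pfn + 4 * PAGEBLOCK_NR_PAGES;

        /* The check quoted above, i.e. pfn = min(pfn, compact_cached_free_pfn) */
        if (a.order > 0 && (!a.wrapped ||
                            zone.compact_cached_free_pfn > a.start_free_pfn))
                pfn = zone.compact_cached_free_pfn < pfn ?
                                zone.compact_cached_free_pfn : pfn;

        printf("A resumes at pfn %lu instead of %lu; %lu pageblocks skipped\n",
               pfn, zone_end_block, (zone_end_block - pfn) / PAGEBLOCK_NR_PAGES);
        return 0;
}

Because the cached pfn raced to above cc->start_free_pfn, the check fires
even though A has wrapped, and A resumes roughly half a zone below the last
pageblock instead of scanning down from it.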
> 
> There is not an obvious way around this problem without introducing new
> locking and state so this patch takes a different approach.
> 
> First, it gets rid of the skip logic because it's not clear that it matters
> if two free scanners happen to be in the same block, but with racing updates
> it's too easy for the scanner to skip over blocks it should not.
> 
> Second, it updates compact_cached_free_pfn in a more limited set of
> circumstances.
> 
> If a scanner has wrapped, it updates compact_cached_free_pfn to the end
>       of the zone. When a wrapped scanner isolates a page, it updates
>       compact_cached_free_pfn to point to the highest pageblock it
>       can isolate pages from.

Okay until here.

> 
> If a scanner has not wrapped, when it has finished isolating pages it
>       checks if compact_cached_free_pfn is pointing to the end of the
>       zone. If so, the value is updated to point to the highest
>       pageblock that pages were isolated from. This value will not
>       be updated again until a free page scanner wraps and resets
>       compact_cached_free_pfn.

I tried to understand your intention for this part but unfortunately failed.
With this part, couldn't the problem you mentioned happen again?

                                                        C
 Process A              M     S                         F
                |---------------------------------------|
 Process B              M       FS
 
 C is zone->compact_cached_free_pfn
 S is cc->start_free_pfn
 M is cc->migrate_pfn
 F is cc->free_pfn

In this diagram, Process A has just reached its migrate scanner, wrapped
around and updated compact_cached_free_pfn to the end of the zone accordingly.

Simultaneously, Process B finishes isolating in a block, peeks at the
compact_cached_free_pfn position, sees that it points to the end of the zone,
and so updates compact_cached_free_pfn to the highest pageblock that pages
were isolated from.

Process A, because it has wrapped, then updates compact_cached_free_pfn to the
highest pageblock, which was just set by process B. Process A ends up making
a big jump without doing any scanning.

No?
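
Put differently, with made-up pfn numbers and nothing modelled except the
two compact_cached_free_pfn updates from this patch:

#include <stdio.h>

#define ZONE_END        1048064UL       /* last pageblock of an invented zone */
#define B_HIGH_PFN       524288UL       /* B's highest isolated pageblock */

int main(void)
{
        unsigned long compact_cached_free_pfn;

        /* 1. Process A wraps and resets the cached pfn to the end of the zone */
        compact_cached_free_pfn = ZONE_END;

        /* 2. Process B (not wrapped) finishes isolating, sees the cached pfn
         *    still at the zone end and moves it down to its own high pageblock.
         */
        if (compact_cached_free_pfn == ZONE_END)
                compact_cached_free_pfn = B_HIGH_PFN;

        /* 3. The cached pfn no longer points at the zone end even though A has
         *    only just wrapped; whether A then jumps here without scanning the
         *    top of the zone is what I am asking about.
         */
        printf("compact_cached_free_pfn = %lu (zone end was %lu)\n",
               compact_cached_free_pfn, ZONE_END);
        return 0;
}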

> 
> This is not optimal and it can still race, but compact_cached_free_pfn
> will be pointing to, or very near, a pageblock with free pages.
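
For reference, here is how I read the new update policy as a whole: a small
userspace sketch with stand-in structures and invented pfn values, where the
two helpers simply restate the quoted hunks and the rounding in
start_free_pfn():

#include <stdio.h>

#define PAGEBLOCK_NR_PAGES 512UL

struct fake_zone {
        unsigned long zone_start_pfn;
        unsigned long spanned_pages;
        unsigned long compact_cached_free_pfn;
};

struct fake_cc {
        int order;
        int wrapped;
};

/* Same rounding as the patch: e.g. 0 + 1000000 pages -> pfn 999936 with
 * 512-page pageblocks.
 */
static unsigned long start_free_pfn(const struct fake_zone *zone)
{
        unsigned long free_pfn = zone->zone_start_pfn + zone->spanned_pages;

        return free_pfn & ~(PAGEBLOCK_NR_PAGES - 1);
}

/* A wrapped scanner publishes the highest pageblock it isolated from */
static void on_isolated(struct fake_zone *zone, struct fake_cc *cc,
                        unsigned long high_pfn)
{
        if (cc->order > 0 && cc->wrapped)
                zone->compact_cached_free_pfn = high_pfn;
}

/* An unwrapped scanner publishes its position only while the cached pfn
 * still points at the end of the zone, i.e. was freshly reset.
 */
static void on_finished(struct fake_zone *zone, struct fake_cc *cc,
                        unsigned long high_pfn)
{
        if (cc->order > 0 && !cc->wrapped &&
            zone->compact_cached_free_pfn == start_free_pfn(zone))
                zone->compact_cached_free_pfn = high_pfn;
}

int main(void)
{
        struct fake_zone zone = { .zone_start_pfn = 0, .spanned_pages = 1000000UL };
        struct fake_cc wrapped = { .order = 3, .wrapped = 1 };
        struct fake_cc unwrapped = { .order = 3, .wrapped = 0 };

        /* A scanner has just wrapped and reset the cached pfn to the zone end */
        zone.compact_cached_free_pfn = start_free_pfn(&zone);

        on_isolated(&zone, &wrapped, start_free_pfn(&zone) - 8 * PAGEBLOCK_NR_PAGES);
        printf("after wrapped isolation:  %lu\n", zone.compact_cached_free_pfn);

        on_finished(&zone, &unwrapped, 524288UL);
        printf("after unwrapped finish:   %lu (guard failed, no update)\n",
               zone.compact_cached_free_pfn);
        return 0;
}

If that reading is right, an unwrapped scanner publishes its position at
most once after a reset, while a wrapped scanner keeps republishing the
highest pageblock it isolates from.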
> 
> Signed-off-by: Mel Gorman <mgor...@suse.de>
> Reviewed-by: Rik van Riel <r...@redhat.com>
> ---
>  mm/compaction.c |   54 ++++++++++++++++++++++++++++--------------------------
>  1 file changed, 28 insertions(+), 26 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index be310f1..df50f73 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -419,6 +419,20 @@ static bool suitable_migration_target(struct page *page)
>  }
>  
>  /*
> + * Returns the start pfn of the last page block in a zone.  This is the starting
> + * point for full compaction of a zone.  Compaction searches for free pages from
> + * the end of each zone, while isolate_freepages_block scans forward inside each
> + * page block.
> + */
> +static unsigned long start_free_pfn(struct zone *zone)
> +{
> +     unsigned long free_pfn;
> +     free_pfn = zone->zone_start_pfn + zone->spanned_pages;
> +     free_pfn &= ~(pageblock_nr_pages-1);
> +     return free_pfn;
> +}
> +
> +/*
>   * Based on information in the current compact_control, find blocks
>   * suitable for isolating free pages from and then isolate them.
>   */
> @@ -457,17 +471,6 @@ static void isolate_freepages(struct zone *zone,
>                                       pfn -= pageblock_nr_pages) {
>               unsigned long isolated;
>  
> -             /*
> -              * Skip ahead if another thread is compacting in the area
> -              * simultaneously. If we wrapped around, we can only skip
> -              * ahead if zone->compact_cached_free_pfn also wrapped to
> -              * above our starting point.
> -              */
> -             if (cc->order > 0 && (!cc->wrapped ||
> -                                   zone->compact_cached_free_pfn >
> -                                   cc->start_free_pfn))
> -                     pfn = min(pfn, zone->compact_cached_free_pfn);
> -
>               if (!pfn_valid(pfn))
>                       continue;
>  
> @@ -510,7 +513,15 @@ static void isolate_freepages(struct zone *zone,
>                */
>               if (isolated) {
>                       high_pfn = max(high_pfn, pfn);
> -                     if (cc->order > 0)
> +
> +                     /*
> +                      * If the free scanner has wrapped, update
> +                      * compact_cached_free_pfn to point to the highest
> +                      * pageblock with free pages. This reduces excessive
> +                      * scanning of full pageblocks near the end of the
> +                      * zone
> +                      */
> +                     if (cc->order > 0 && cc->wrapped)
>                               zone->compact_cached_free_pfn = high_pfn;
>               }
>       }
> @@ -520,6 +531,11 @@ static void isolate_freepages(struct zone *zone,
>  
>       cc->free_pfn = high_pfn;
>       cc->nr_freepages = nr_freepages;
> +
> +     /* If compact_cached_free_pfn is reset then set it now */
> +     if (cc->order > 0 && !cc->wrapped &&
> +                     zone->compact_cached_free_pfn == start_free_pfn(zone))
> +             zone->compact_cached_free_pfn = high_pfn;
>  }
>  
>  /*
> @@ -607,20 +623,6 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
>       return ISOLATE_SUCCESS;
>  }
>  
> -/*
> - * Returns the start pfn of the last page block in a zone.  This is the starting
> - * point for full compaction of a zone.  Compaction searches for free pages from
> - * the end of each zone, while isolate_freepages_block scans forward inside each
> - * page block.
> - */
> -static unsigned long start_free_pfn(struct zone *zone)
> -{
> -     unsigned long free_pfn;
> -     free_pfn = zone->zone_start_pfn + zone->spanned_pages;
> -     free_pfn &= ~(pageblock_nr_pages-1);
> -     return free_pfn;
> -}
> -
>  static int compact_finished(struct zone *zone,
>                           struct compact_control *cc)
>  {
> -- 
> 1.7.9.2
> 

-- 
Kind regards,
Minchan Kim