On Tue, 22 Apr 2014 09:38:52 +0100 Mel Gorman <[email protected]> wrote:

> throttle_direct_reclaim() is meant to trigger during swap-over-network
> during which the min watermark is treated as a pfmemalloc reserve. It
> throttes on the first node in the zonelist but this is flawed.
> 
> On a NUMA machine running a 32-bit kernel (I know) allocation requests
> freom CPUs on node 1 would detect no pfmemalloc reserves and the process
> gets throttled. This patch adjusts throttling of direct reclaim to throttle
> based on the first node in the zonelist that has a usable ZONE_NORMAL or
> lower zone.

I'm unable to determine from the above whether we should backport this
fix.  Please don't forget to describe the end-user visible effects of
a bug when that isn't obvious.  

> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2507,10 +2507,17 @@ static bool pfmemalloc_watermark_ok(pg_data_t *pgdat)
>  
>       for (i = 0; i <= ZONE_NORMAL; i++) {
>               zone = &pgdat->node_zones[i];
> +             if (!populated_zone(zone))
> +                     continue;

What's this?  Performance tweak?  Or does min_wmark_pages() return
non-zero for an unpopulated zone, which seems odd.

>               pfmemalloc_reserve += min_wmark_pages(zone);
>               free_pages += zone_page_state(zone, NR_FREE_PAGES);
>       }
>  
> +     /* If there are no reserves (unexpected config) then do not throttle */
> +     if (!pfmemalloc_reserve)
> +             return true;
> +
>       wmark_ok = free_pages > pfmemalloc_reserve / 2;
>  
>       /* kswapd must be awake if processes are being throttled */
> @@ -2535,9 +2542,9 @@ static bool pfmemalloc_watermark_ok(pg_data_t *pgdat)
>  static bool throttle_direct_reclaim(gfp_t gfp_mask, struct zonelist 
> *zonelist,
>                                       nodemask_t *nodemask)
>  {
> +     struct zoneref *z;
>       struct zone *zone;
> -     int high_zoneidx = gfp_zone(gfp_mask);
> -     pg_data_t *pgdat;
> +     pg_data_t *pgdat = NULL;
>  
>       /*
>        * Kernel threads should not be throttled as they may be indirectly
> @@ -2556,10 +2563,24 @@ static bool throttle_direct_reclaim(gfp_t gfp_mask, 
> struct zonelist *zonelist,
>       if (fatal_signal_pending(current))
>               goto out;
>  
> -     /* Check if the pfmemalloc reserves are ok */
> -     first_zones_zonelist(zonelist, high_zoneidx, NULL, &zone);
> -     pgdat = zone->zone_pgdat;
> -     if (pfmemalloc_watermark_ok(pgdat))
> +     /*
> +      * Check if the pfmemalloc reserves are ok by finding the first node
> +      * with a usable ZONE_NORMAL or lower zone
> +      */

That comment tells us what the code does but not why it does it.

- Why do we ignore zones >= ZONE_NORMAL?

- Why do we throttle when there may be as-yet-unexamined nodes which
  have reclaimable pages?


> +        for_each_zone_zonelist_nodemask(zone, z, zonelist,
> +                                        gfp_mask, nodemask) {

Those two lines have spaces-instead-of-tabs.

> +             if (zone_idx(zone) > ZONE_NORMAL)
> +                     continue;
> +
> +             /* Throttle based on the first usable node */
> +             pgdat = zone->zone_pgdat;
> +             if (pfmemalloc_watermark_ok(pgdat))
> +                     goto out;
> +             break;
> +     }
> +
> +     /* If no zone was usable by the allocation flags then do not throttle */
> +     if (!pgdat)
>               goto out;

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to