Re: [PATCH] Revert "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures"

2012-11-14 Thread Johannes Hirte
On Fri, 9 Nov 2012 08:36:37 +
Mel Gorman wrote:

> On Tue, Nov 06, 2012 at 11:15:54AM +0100, Johannes Hirte wrote:
> > On Mon, 5 Nov 2012 14:24:49 +
> > Mel Gorman wrote:
> > 
> > > Jiri Slaby reported the following:
> > > 
> > >   (It's an effective revert of "mm: vmscan: scale number of
> > > pages reclaimed by reclaim/compaction based on failures".) Given
> > > kswapd had hours of runtime in ps/top output yesterday in the
> > > morning and after the revert it's now 2 minutes in sum for the
> > > last 24h, I would say, it's gone.
> > > 
> > > The intention of the patch in question was to compensate for the
> > > loss of lumpy reclaim. Part of the reason lumpy reclaim worked is
> > > because it aggressively reclaimed pages and this patch was meant
> > > to be a sane compromise.
> > > 
> > > When compaction fails, it gets deferred and both compaction and
> > > reclaim/compaction are deferred to avoid excessive reclaim. However,
> > > since commit c6543459 (mm: remove __GFP_NO_KSWAPD), kswapd is
> > > woken up each time and continues reclaiming which was not taken
> > > into account when the patch was developed.
> > > 
> > > Attempts to address the problem ended up just changing the shape
> > > of the problem instead of fixing it. The release window gets
> > > closer and while a THP allocation failing is not a major problem,
> > > kswapd chewing up a lot of CPU is. This patch reverts "mm:
> > > vmscan: scale number of pages reclaimed by reclaim/compaction
> > > based on failures" and will be revisited in the future.
> > > 
> > > Signed-off-by: Mel Gorman 
> > > ---
> > >  mm/vmscan.c |   25 -
> > >  1 file changed, 25 deletions(-)
> > > 
> > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > index 2624edc..e081ee8 100644
> > > --- a/mm/vmscan.c
> > > +++ b/mm/vmscan.c
> > > @@ -1760,28 +1760,6 @@ static bool in_reclaim_compaction(struct scan_control *sc)
> > >  	return false;
> > >  }
> > >  
> > > -#ifdef CONFIG_COMPACTION
> > > -/*
> > > - * If compaction is deferred for sc->order then scale the number of pages
> > > - * reclaimed based on the number of consecutive allocation failures
> > > - */
> > > -static unsigned long scale_for_compaction(unsigned long pages_for_compaction,
> > > -			struct lruvec *lruvec, struct scan_control *sc)
> > > -{
> > > -	struct zone *zone = lruvec_zone(lruvec);
> > > -
> > > -	if (zone->compact_order_failed <= sc->order)
> > > -		pages_for_compaction <<= zone->compact_defer_shift;
> > > -	return pages_for_compaction;
> > > -}
> > > -#else
> > > -static unsigned long scale_for_compaction(unsigned long pages_for_compaction,
> > > -			struct lruvec *lruvec, struct scan_control *sc)
> > > -{
> > > -	return pages_for_compaction;
> > > -}
> > > -#endif
> > > -
> > >  /*
> > >   * Reclaim/compaction is used for high-order allocation requests. It reclaims
> > >   * order-0 pages before compacting the zone. should_continue_reclaim() returns
> > > @@ -1829,9 +1807,6 @@ static inline bool should_continue_reclaim(struct lruvec *lruvec,
> > >  	 * inactive lists are large enough, continue reclaiming
> > >  	 */
> > >  	pages_for_compaction = (2UL << sc->order);
> > > -
> > > -	pages_for_compaction = scale_for_compaction(pages_for_compaction,
> > > -						    lruvec, sc);
> > >  	inactive_lru_pages = get_lru_size(lruvec, LRU_INACTIVE_FILE);
> > >  	if (nr_swap_pages > 0)
> > >  		inactive_lru_pages += get_lru_size(lruvec, LRU_INACTIVE_ANON);
> > > --
> > 
> > Even with this patch I very often see kswapd0 at the top of the top
> > output, much more than with kernel 3.6.
> 
> How severe is the CPU usage? The higher usage can be explained by "mm:
> remove __GFP_NO_KSWAPD" which allows kswapd to compact memory to
> reduce the amount of time processes spend in compaction but will
> result in the CPU cost being incurred by kswapd.
> 
> Is it really high like the bug was reporting with high usage over long
> periods of time or do you just see it using 2-6% of CPU for short
> periods?

It is really high. During compile jobs (make -j4 on a dual-core machine)
I've seen kswapd0 consuming at least 50% CPU most of the time.
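
A figure like "at least 50% CPU" can be pinned down more precisely than by
watching top: sample the utime and stime fields of /proc/<pid>/stat for
kswapd0 twice and divide the difference by the ticks that elapsed in between.
The following is only a sketch of that calculation, assuming the usual Linux
procfs layout in which utime and stime are fields 14 and 15. The helper names
parse_cpu_ticks() and cpu_percent() are made up for this illustration and are
not part of any kernel or libc interface.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse utime and stime (fields 14 and 15, in clock ticks) from one line of
 * /proc/<pid>/stat. The comm field (field 2) may itself contain spaces or
 * parentheses, so parsing starts after the last ')'.
 * Returns 0 on success, -1 on a malformed line.
 */
static int parse_cpu_ticks(const char *stat_line,
			   unsigned long *utime, unsigned long *stime)
{
	const char *p = strrchr(stat_line, ')');

	if (!p)
		return -1;
	/*
	 * Skipped fields after ")": state, ppid, pgrp, session, tty_nr,
	 * tpgid, flags, minflt, cminflt, majflt, cmajflt.
	 */
	if (sscanf(p + 1,
		   " %*c %*d %*d %*d %*d %*d %*lu %*lu %*lu %*lu %*lu %lu %lu",
		   utime, stime) != 2)
		return -1;
	return 0;
}

/*
 * CPU share in percent for one task between two samples.
 * ticks_before/ticks_after are utime + stime at each sample;
 * elapsed_ticks is the wall-clock interval expressed in the same unit.
 */
static double cpu_percent(unsigned long ticks_before, unsigned long ticks_after,
			  unsigned long elapsed_ticks)
{
	if (elapsed_ticks == 0)
		return 0.0;
	return 100.0 * (double)(ticks_after - ticks_before) /
	       (double)elapsed_ticks;
}
```

The tick unit is whatever sysconf(_SC_CLK_TCK) reports on the machine, so a
one-second interval corresponds to that many elapsed ticks. The result is the
share of a single CPU; on a dual-core box 50% means half of one core.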
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] Revert "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures"

2012-11-09 Thread Mel Gorman
On Mon, Nov 05, 2012 at 02:24:49PM +, Mel Gorman wrote:
> Jiri Slaby reported the following:
> 
>   (It's an effective revert of "mm: vmscan: scale number of pages
>   reclaimed by reclaim/compaction based on failures".) Given kswapd
>   had hours of runtime in ps/top output yesterday in the morning
>   and after the revert it's now 2 minutes in sum for the last 24h,
>   I would say, it's gone.
> 
> The intention of the patch in question was to compensate for the loss
> of lumpy reclaim. Part of the reason lumpy reclaim worked is because
> it aggressively reclaimed pages and this patch was meant to be a sane
> compromise.
> 
> When compaction fails, it gets deferred and both compaction and
> reclaim/compaction are deferred to avoid excessive reclaim. However, since
> commit c6543459 (mm: remove __GFP_NO_KSWAPD), kswapd is woken up each time
> and continues reclaiming which was not taken into account when the patch
> was developed.
> 
> Attempts to address the problem ended up just changing the shape of the
> problem instead of fixing it. The release window gets closer and while a
> THP allocation failing is not a major problem, kswapd chewing up a lot of
> CPU is. This patch reverts "mm: vmscan: scale number of pages reclaimed
> by reclaim/compaction based on failures" and will be revisited in the future.
> 
> Signed-off-by: Mel Gorman 

Andrew, can you pick up this patch please and drop
mm-vmscan-scale-number-of-pages-reclaimed-by-reclaim-compaction-only-in-direct-reclaim.patch
?

There are mixed reports on how much it helps but it comes down to "this
fixes a problem" versus "kswapd is still showing higher usage". I think
the higher kswapd usage is explained by the removal of __GFP_NO_KSWAPD
and so while higher usage is bad, it is not necessarily unjustified.
Ideally it would have been proven that having kswapd doing the work
reduced application stalls in direct reclaim but unfortunately I do not
have concrete evidence of that at this time.

-- 
Mel Gorman
SUSE Labs


Re: [PATCH] Revert "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures"

2012-11-09 Thread Mel Gorman
On Tue, Nov 06, 2012 at 11:15:54AM +0100, Johannes Hirte wrote:
> On Mon, 5 Nov 2012 14:24:49 +
> Mel Gorman wrote:
> 
> > Jiri Slaby reported the following:
> > 
> > (It's an effective revert of "mm: vmscan: scale number of
> > pages reclaimed by reclaim/compaction based on failures".) Given
> > kswapd had hours of runtime in ps/top output yesterday in the morning
> > and after the revert it's now 2 minutes in sum for the last
> > 24h, I would say, it's gone.
> > 
> > The intention of the patch in question was to compensate for the loss
> > of lumpy reclaim. Part of the reason lumpy reclaim worked is because
> > it aggressively reclaimed pages and this patch was meant to be a sane
> > compromise.
> > 
> > When compaction fails, it gets deferred and both compaction and
> > reclaim/compaction are deferred to avoid excessive reclaim. However, since
> > commit c6543459 (mm: remove __GFP_NO_KSWAPD), kswapd is woken up each
> > time and continues reclaiming which was not taken into account when
> > the patch was developed.
> > 
> > Attempts to address the problem ended up just changing the shape of
> > the problem instead of fixing it. The release window gets closer and
> > while a THP allocation failing is not a major problem, kswapd chewing
> > up a lot of CPU is. This patch reverts "mm: vmscan: scale number of
> > pages reclaimed by reclaim/compaction based on failures" and will be
> > revisited in the future.
> > 
> > Signed-off-by: Mel Gorman 
> > ---
> >  mm/vmscan.c |   25 -
> >  1 file changed, 25 deletions(-)
> > 
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 2624edc..e081ee8 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1760,28 +1760,6 @@ static bool in_reclaim_compaction(struct scan_control *sc)
> >  	return false;
> >  }
> >  
> > -#ifdef CONFIG_COMPACTION
> > -/*
> > - * If compaction is deferred for sc->order then scale the number of pages
> > - * reclaimed based on the number of consecutive allocation failures
> > - */
> > -static unsigned long scale_for_compaction(unsigned long pages_for_compaction,
> > -			struct lruvec *lruvec, struct scan_control *sc)
> > -{
> > -	struct zone *zone = lruvec_zone(lruvec);
> > -
> > -	if (zone->compact_order_failed <= sc->order)
> > -		pages_for_compaction <<= zone->compact_defer_shift;
> > -	return pages_for_compaction;
> > -}
> > -#else
> > -static unsigned long scale_for_compaction(unsigned long pages_for_compaction,
> > -			struct lruvec *lruvec, struct scan_control *sc)
> > -{
> > -	return pages_for_compaction;
> > -}
> > -#endif
> > -
> >  /*
> >   * Reclaim/compaction is used for high-order allocation requests. It reclaims
> >   * order-0 pages before compacting the zone. should_continue_reclaim() returns
> > @@ -1829,9 +1807,6 @@ static inline bool should_continue_reclaim(struct lruvec *lruvec,
> >  	 * inactive lists are large enough, continue reclaiming
> >  	 */
> >  	pages_for_compaction = (2UL << sc->order);
> > -
> > -	pages_for_compaction = scale_for_compaction(pages_for_compaction,
> > -						    lruvec, sc);
> >  	inactive_lru_pages = get_lru_size(lruvec, LRU_INACTIVE_FILE);
> >  	if (nr_swap_pages > 0)
> >  		inactive_lru_pages += get_lru_size(lruvec, LRU_INACTIVE_ANON);
> > --
> 
> Even with this patch I very often see kswapd0 at the top of the top
> output, much more than with kernel 3.6.

How severe is the CPU usage? The higher usage can be explained by "mm:
remove __GFP_NO_KSWAPD" which allows kswapd to compact memory to reduce
the amount of time processes spend in compaction but will result in the
CPU cost being incurred by kswapd.

Is it really high like the bug was reporting with high usage over long
periods of time or do you just see it using 2-6% of CPU for short
periods?

Thanks.

-- 
Mel Gorman
SUSE Labs


Re: [PATCH] Revert "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures"

2012-11-06 Thread Johannes Hirte
On Mon, 5 Nov 2012 14:24:49 +
Mel Gorman wrote:

> Jiri Slaby reported the following:
> 
>   (It's an effective revert of "mm: vmscan: scale number of
> pages reclaimed by reclaim/compaction based on failures".) Given
> kswapd had hours of runtime in ps/top output yesterday in the morning
>   and after the revert it's now 2 minutes in sum for the last
> 24h, I would say, it's gone.
> 
> The intention of the patch in question was to compensate for the loss
> of lumpy reclaim. Part of the reason lumpy reclaim worked is because
> it aggressively reclaimed pages and this patch was meant to be a sane
> compromise.
> 
> When compaction fails, it gets deferred and both compaction and
> reclaim/compaction are deferred to avoid excessive reclaim. However, since
> commit c6543459 (mm: remove __GFP_NO_KSWAPD), kswapd is woken up each
> time and continues reclaiming which was not taken into account when
> the patch was developed.
> 
> Attempts to address the problem ended up just changing the shape of
> the problem instead of fixing it. The release window gets closer and
> while a THP allocation failing is not a major problem, kswapd chewing
> up a lot of CPU is. This patch reverts "mm: vmscan: scale number of
> pages reclaimed by reclaim/compaction based on failures" and will be
> revisited in the future.
> 
> Signed-off-by: Mel Gorman 
> ---
>  mm/vmscan.c |   25 -
>  1 file changed, 25 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 2624edc..e081ee8 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1760,28 +1760,6 @@ static bool in_reclaim_compaction(struct scan_control *sc)
>  	return false;
>  }
>  
> -#ifdef CONFIG_COMPACTION
> -/*
> - * If compaction is deferred for sc->order then scale the number of pages
> - * reclaimed based on the number of consecutive allocation failures
> - */
> -static unsigned long scale_for_compaction(unsigned long pages_for_compaction,
> -			struct lruvec *lruvec, struct scan_control *sc)
> -{
> -	struct zone *zone = lruvec_zone(lruvec);
> -
> -	if (zone->compact_order_failed <= sc->order)
> -		pages_for_compaction <<= zone->compact_defer_shift;
> -	return pages_for_compaction;
> -}
> -#else
> -static unsigned long scale_for_compaction(unsigned long pages_for_compaction,
> -			struct lruvec *lruvec, struct scan_control *sc)
> -{
> -	return pages_for_compaction;
> -}
> -#endif
> -
>  /*
>   * Reclaim/compaction is used for high-order allocation requests. It reclaims
>   * order-0 pages before compacting the zone. should_continue_reclaim() returns
> @@ -1829,9 +1807,6 @@ static inline bool should_continue_reclaim(struct lruvec *lruvec,
>  	 * inactive lists are large enough, continue reclaiming
>  	 */
>  	pages_for_compaction = (2UL << sc->order);
> -
> -	pages_for_compaction = scale_for_compaction(pages_for_compaction,
> -						    lruvec, sc);
>  	inactive_lru_pages = get_lru_size(lruvec, LRU_INACTIVE_FILE);
>  	if (nr_swap_pages > 0)
>  		inactive_lru_pages += get_lru_size(lruvec, LRU_INACTIVE_ANON);
> --

Even with this patch I very often see kswapd0 at the top of the top
output, much more than with kernel 3.6.


[PATCH] Revert "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures"

2012-11-05 Thread Mel Gorman
Jiri Slaby reported the following:

(It's an effective revert of "mm: vmscan: scale number of pages
reclaimed by reclaim/compaction based on failures".) Given kswapd
had hours of runtime in ps/top output yesterday in the morning
and after the revert it's now 2 minutes in sum for the last 24h,
I would say, it's gone.

The intention of the patch in question was to compensate for the loss
of lumpy reclaim. Part of the reason lumpy reclaim worked is because
it aggressively reclaimed pages and this patch was meant to be a sane
compromise.

When compaction fails, it gets deferred and both compaction and
reclaim/compaction are deferred to avoid excessive reclaim. However, since
commit c6543459 (mm: remove __GFP_NO_KSWAPD), kswapd is woken up each time
and continues reclaiming which was not taken into account when the patch
was developed.

Attempts to address the problem ended up just changing the shape of the
problem instead of fixing it. The release window gets closer and while a
THP allocation failing is not a major problem, kswapd chewing up a lot of
CPU is. This patch reverts "mm: vmscan: scale number of pages reclaimed
by reclaim/compaction based on failures" and will be revisited in the future.

Signed-off-by: Mel Gorman 
---
 mm/vmscan.c |   25 -
 1 file changed, 25 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 2624edc..e081ee8 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1760,28 +1760,6 @@ static bool in_reclaim_compaction(struct scan_control *sc)
 	return false;
 }
 
-#ifdef CONFIG_COMPACTION
-/*
- * If compaction is deferred for sc->order then scale the number of pages
- * reclaimed based on the number of consecutive allocation failures
- */
-static unsigned long scale_for_compaction(unsigned long pages_for_compaction,
-			struct lruvec *lruvec, struct scan_control *sc)
-{
-	struct zone *zone = lruvec_zone(lruvec);
-
-	if (zone->compact_order_failed <= sc->order)
-		pages_for_compaction <<= zone->compact_defer_shift;
-	return pages_for_compaction;
-}
-#else
-static unsigned long scale_for_compaction(unsigned long pages_for_compaction,
-			struct lruvec *lruvec, struct scan_control *sc)
-{
-	return pages_for_compaction;
-}
-#endif
-
 /*
  * Reclaim/compaction is used for high-order allocation requests. It reclaims
  * order-0 pages before compacting the zone. should_continue_reclaim() returns
@@ -1829,9 +1807,6 @@ static inline bool should_continue_reclaim(struct lruvec *lruvec,
 	 * inactive lists are large enough, continue reclaiming
 	 */
 	pages_for_compaction = (2UL << sc->order);
-
-	pages_for_compaction = scale_for_compaction(pages_for_compaction,
-						    lruvec, sc);
 	inactive_lru_pages = get_lru_size(lruvec, LRU_INACTIVE_FILE);
 	if (nr_swap_pages > 0)
 		inactive_lru_pages += get_lru_size(lruvec, LRU_INACTIVE_ANON);
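
To give a feel for how quickly the removed scaling could inflate the reclaim
target, here is a small userspace model of the arithmetic in
scale_for_compaction(). It is only a sketch under stated assumptions: struct
zone_model is a hypothetical stand-in for the two struct zone fields the
function read, the cap of 6 on compact_defer_shift is recalled from kernels of
that period rather than taken from this patch, and the order-9 and 4 KiB page
figures assume THP on x86-64.

```c
#include <assert.h>

/* Assumption: upper bound on compact_defer_shift in kernels of this era. */
#define COMPACT_MAX_DEFER_SHIFT	6
/* Assumption: 4 KiB base pages, used only for the MiB conversion. */
#define MODEL_PAGE_SIZE		4096UL

/* Hypothetical stand-in for the struct zone fields the removed code read. */
struct zone_model {
	unsigned int compact_defer_shift;	/* grows with each deferral */
	int compact_order_failed;		/* lowest order that failed */
};

/* Same arithmetic as the removed scale_for_compaction(), in userspace. */
static unsigned long scale_for_compaction(unsigned long pages_for_compaction,
					  const struct zone_model *zone,
					  int order)
{
	/* Keep the model's shift bounded so the left shift cannot overflow. */
	assert(zone->compact_defer_shift <= COMPACT_MAX_DEFER_SHIFT);

	if (zone->compact_order_failed <= order)
		pages_for_compaction <<= zone->compact_defer_shift;
	return pages_for_compaction;
}

/* Convert a page count to MiB under the 4 KiB page assumption. */
static unsigned long pages_to_mib(unsigned long pages)
{
	return (pages * MODEL_PAGE_SIZE) >> 20;
}
```

Under those assumptions the unscaled target for an order-9 request is
2UL << 9 = 1024 pages (4 MiB). With compact_defer_shift at 6 the scaled
target becomes 1024 << 6 = 65536 pages (256 MiB) per pass through
should_continue_reclaim(), which is consistent with kswapd burning CPU once
it is woken for every failed allocation.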