Re: commit 0bf1457f0cfca7b "mm: vmscan: do not swap anon pages just because free+file is low" causes heavy performance regression on paging

2014-04-24 Thread Johannes Weiner
Hi Rik,

On Tue, Apr 22, 2014 at 10:40:17AM -0400, Rik van Riel wrote:
> On 04/22/2014 07:57 AM, Christian Borntraeger wrote:
> > On 22/04/14 12:55, Christian Borntraeger wrote:
> >> While preparing/testing some KVM on s390 patches for the next merge window
> >> (target is kvm/next, which is based on 3.15-rc1), I hit a very severe
> >> performance hiccup on guest paging (all anonymous memory).
> >>
> >> All memory-bound guests are now in "D" state and the system is barely
> >> usable.
> >>
> >> Reverting commit 0bf1457f0cfca7bc026a82323ad34bcf58ad035d
> >> ("mm: vmscan: do not swap anon pages just because free+file is low") makes
> >> the problem go away.
> >>
> >> According to /proc/vmstat, the system now enters direct reclaim on almost
> >> every page fault (more than 10x more direct reclaims than kswapd reclaims).
> >> With the patch reverted, everything is fine again.
> >>
> >> Any ideas?
> > 
> > Here is an idea to tackle both my problem and the original problem:
> > 
> > Reverting 0bf1457f0cfca7bc026a82323ad34bcf58ad035d and checking against the
> > low watermark instead also makes my system usable.
> > 
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1923,7 +1923,7 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
> >  */
> > if (global_reclaim(sc)) {
> > free = zone_page_state(zone, NR_FREE_PAGES);
> > -   if (unlikely(file + free <= high_wmark_pages(zone))) {
> > +   if (unlikely(file + free <= low_wmark_pages(zone))) {
> > scan_balance = SCAN_ANON;
> > goto out;
> > }
> > 
> 
> Looks reasonable to me.  Johannes?

I went with a full revert to be on the safe side.  Since kswapd's goal
is the high watermark, I kind of liked the idea that we start swapping
once the file pages alone are no longer enough to restore that watermark.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: commit 0bf1457f0cfca7b "mm: vmscan: do not swap anon pages just because free+file is low" causes heavy performance regression on paging

2014-04-22 Thread Rafael Aquini
On Tue, Apr 22, 2014 at 11:06:56AM -0400, Johannes Weiner wrote:
> Hi Christian,
> 
> On Tue, Apr 22, 2014 at 12:55:37PM +0200, Christian Borntraeger wrote:
> > While preparing/testing some KVM on s390 patches for the next merge window
> > (target is kvm/next, which is based on 3.15-rc1), I hit a very severe
> > performance hiccup on guest paging (all anonymous memory).
> > 
> > All memory-bound guests are now in "D" state and the system is barely
> > usable.
> > 
> > Reverting commit 0bf1457f0cfca7bc026a82323ad34bcf58ad035d
> > ("mm: vmscan: do not swap anon pages just because free+file is low") makes
> > the problem go away.
> > 
> > According to /proc/vmstat, the system now enters direct reclaim on almost
> > every page fault (more than 10x more direct reclaims than kswapd reclaims).
> > With the patch reverted, everything is fine again.
> 
> Ouch.  Yes, I think we have to revert this for now.
> 
> How about this?
> 
> ---
> From: Johannes Weiner 
> Subject: [patch] Revert "mm: vmscan: do not swap anon pages just because
>  free+file is low"
> 
> This reverts commit 0bf1457f0cfc ("mm: vmscan: do not swap anon pages
> just because free+file is low") because it introduced a regression in
> mostly-anonymous workloads, where reclaim would become ineffective and
> trap every allocating task in direct reclaim.
> 
> The problem is that there is a runaway feedback loop in the scan
> balance between file and anon, where the balance tips heavily towards
> a tiny thrashing file LRU and anonymous pages are no longer being
> looked at.  The commit in question removed the safeguard that would
> detect such situations and respond with forced anonymous reclaim.
> 
> This commit was part of a series to fix premature swapping in loads
> with relatively little cache, and while it made a small difference,
> the cure is obviously worse than the disease.  Revert it.
> 
> Reported-by: Christian Borntraeger 
> Signed-off-by: Johannes Weiner 
> Cc:[3.12+]
> ---
>  mm/vmscan.c | 18 ++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 9b6497eda806..169acb8e31c9 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1916,6 +1916,24 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
>   get_lru_size(lruvec, LRU_INACTIVE_FILE);
>  
>   /*
> +  * Prevent the reclaimer from falling into the cache trap: as
> +  * cache pages start out inactive, every cache fault will tip
> +  * the scan balance towards the file LRU.  And as the file LRU
> +  * shrinks, so does the window for rotation from references.
> +  * This means we have a runaway feedback loop where a tiny
> +  * thrashing file LRU becomes infinitely more attractive than
> +  * anon pages.  Try to detect this based on file LRU size.
> +  */
> + if (global_reclaim(sc)) {
> + unsigned long free = zone_page_state(zone, NR_FREE_PAGES);
> +
> + if (unlikely(file + free <= high_wmark_pages(zone))) {
> + scan_balance = SCAN_ANON;
> + goto out;
> + }
> + }
> +
> + /*
>* There is enough inactive page cache, do not reclaim
>* anything from the anonymous working set right now.
>*/
> -- 
> 1.9.2
> 
Acked-by: Rafael Aquini 


Re: commit 0bf1457f0cfca7b "mm: vmscan: do not swap anon pages just because free+file is low" causes heavy performance regression on paging

2014-04-22 Thread Christian Borntraeger
On 22/04/14 17:06, Johannes Weiner wrote:
> Hi Christian,
> 
> On Tue, Apr 22, 2014 at 12:55:37PM +0200, Christian Borntraeger wrote:
>> While preparing/testing some KVM on s390 patches for the next merge window
>> (target is kvm/next, which is based on 3.15-rc1), I hit a very severe
>> performance hiccup on guest paging (all anonymous memory).
>>
>> All memory-bound guests are now in "D" state and the system is barely
>> usable.
>>
>> Reverting commit 0bf1457f0cfca7bc026a82323ad34bcf58ad035d
>> ("mm: vmscan: do not swap anon pages just because free+file is low") makes
>> the problem go away.
>>
>> According to /proc/vmstat, the system now enters direct reclaim on almost
>> every page fault (more than 10x more direct reclaims than kswapd reclaims).
>> With the patch reverted, everything is fine again.
> 
> Ouch.  Yes, I think we have to revert this for now.
> 
> How about this?
> 
> ---
> From: Johannes Weiner 
> Subject: [patch] Revert "mm: vmscan: do not swap anon pages just because
>  free+file is low"
> 
> This reverts commit 0bf1457f0cfc ("mm: vmscan: do not swap anon pages
> just because free+file is low") because it introduced a regression in
> mostly-anonymous workloads, where reclaim would become ineffective and
> trap every allocating task in direct reclaim.
> 
> The problem is that there is a runaway feedback loop in the scan
> balance between file and anon, where the balance tips heavily towards
> a tiny thrashing file LRU and anonymous pages are no longer being
> looked at.  The commit in question removed the safeguard that would
> detect such situations and respond with forced anonymous reclaim.
> 
> This commit was part of a series to fix premature swapping in loads
> with relatively little cache, and while it made a small difference,
> the cure is obviously worse than the disease.  Revert it.
> 
> Reported-by: Christian Borntraeger 
> Signed-off-by: Johannes Weiner 
> Cc:[3.12+]



This is certainly safer than my hack with low_wmark_pages.  We have several
cases where increasing min_free_kbytes avoids direct reclaim on large host
systems with heavy paging.  So my patch is just a trade-off between the two
cases, but it still makes direct reclaim more likely than your revert does.
I prefer your revert.

Acked-by: Christian Borntraeger 

> ---
>  mm/vmscan.c | 18 ++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 9b6497eda806..169acb8e31c9 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1916,6 +1916,24 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
>   get_lru_size(lruvec, LRU_INACTIVE_FILE);
> 
>   /*
> +  * Prevent the reclaimer from falling into the cache trap: as
> +  * cache pages start out inactive, every cache fault will tip
> +  * the scan balance towards the file LRU.  And as the file LRU
> +  * shrinks, so does the window for rotation from references.
> +  * This means we have a runaway feedback loop where a tiny
> +  * thrashing file LRU becomes infinitely more attractive than
> +  * anon pages.  Try to detect this based on file LRU size.
> +  */
> + if (global_reclaim(sc)) {
> + unsigned long free = zone_page_state(zone, NR_FREE_PAGES);
> +
> + if (unlikely(file + free <= high_wmark_pages(zone))) {
> + scan_balance = SCAN_ANON;
> + goto out;
> + }
> + }
> +
> + /*
>* There is enough inactive page cache, do not reclaim
>* anything from the anonymous working set right now.
>*/
> 



Re: commit 0bf1457f0cfca7b "mm: vmscan: do not swap anon pages just because free+file is low" causes heavy performance regression on paging

2014-04-22 Thread Rik van Riel
On 04/22/2014 07:57 AM, Christian Borntraeger wrote:
> On 22/04/14 12:55, Christian Borntraeger wrote:
>> While preparing/testing some KVM on s390 patches for the next merge window
>> (target is kvm/next, which is based on 3.15-rc1), I hit a very severe
>> performance hiccup on guest paging (all anonymous memory).
>>
>> All memory-bound guests are now in "D" state and the system is barely
>> usable.
>>
>> Reverting commit 0bf1457f0cfca7bc026a82323ad34bcf58ad035d
>> ("mm: vmscan: do not swap anon pages just because free+file is low") makes
>> the problem go away.
>>
>> According to /proc/vmstat, the system now enters direct reclaim on almost
>> every page fault (more than 10x more direct reclaims than kswapd reclaims).
>> With the patch reverted, everything is fine again.
>>
>> Any ideas?
> 
> Here is an idea to tackle both my problem and the original problem:
> 
> Reverting 0bf1457f0cfca7bc026a82323ad34bcf58ad035d and checking against the
> low watermark instead also makes my system usable.
> 
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1923,7 +1923,7 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
>  */
> if (global_reclaim(sc)) {
> free = zone_page_state(zone, NR_FREE_PAGES);
> -   if (unlikely(file + free <= high_wmark_pages(zone))) {
> +   if (unlikely(file + free <= low_wmark_pages(zone))) {
> scan_balance = SCAN_ANON;
> goto out;
> }
> 

Looks reasonable to me.  Johannes?

-- 
All rights reversed


Re: commit 0bf1457f0cfca7b "mm: vmscan: do not swap anon pages just because free+file is low" causes heavy performance regression on paging

2014-04-22 Thread Johannes Weiner
Hi Christian,

On Tue, Apr 22, 2014 at 12:55:37PM +0200, Christian Borntraeger wrote:
> While preparing/testing some KVM on s390 patches for the next merge window
> (target is kvm/next, which is based on 3.15-rc1), I hit a very severe
> performance hiccup on guest paging (all anonymous memory).
> 
> All memory-bound guests are now in "D" state and the system is barely
> usable.
> 
> Reverting commit 0bf1457f0cfca7bc026a82323ad34bcf58ad035d
> ("mm: vmscan: do not swap anon pages just because free+file is low") makes
> the problem go away.
> 
> According to /proc/vmstat, the system now enters direct reclaim on almost
> every page fault (more than 10x more direct reclaims than kswapd reclaims).
> With the patch reverted, everything is fine again.

Ouch.  Yes, I think we have to revert this for now.

How about this?

---
From: Johannes Weiner 
Subject: [patch] Revert "mm: vmscan: do not swap anon pages just because
 free+file is low"

This reverts commit 0bf1457f0cfc ("mm: vmscan: do not swap anon pages
just because free+file is low") because it introduced a regression in
mostly-anonymous workloads, where reclaim would become ineffective and
trap every allocating task in direct reclaim.

The problem is that there is a runaway feedback loop in the scan
balance between file and anon, where the balance tips heavily towards
a tiny thrashing file LRU and anonymous pages are no longer being
looked at.  The commit in question removed the safeguard that would
detect such situations and respond with forced anonymous reclaim.

This commit was part of a series to fix premature swapping in loads
with relatively little cache, and while it made a small difference,
the cure is obviously worse than the disease.  Revert it.

Reported-by: Christian Borntraeger 
Signed-off-by: Johannes Weiner 
Cc:  [3.12+]
---
 mm/vmscan.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9b6497eda806..169acb8e31c9 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1916,6 +1916,24 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
get_lru_size(lruvec, LRU_INACTIVE_FILE);
 
/*
+* Prevent the reclaimer from falling into the cache trap: as
+* cache pages start out inactive, every cache fault will tip
+* the scan balance towards the file LRU.  And as the file LRU
+* shrinks, so does the window for rotation from references.
+* This means we have a runaway feedback loop where a tiny
+* thrashing file LRU becomes infinitely more attractive than
+* anon pages.  Try to detect this based on file LRU size.
+*/
+   if (global_reclaim(sc)) {
+   unsigned long free = zone_page_state(zone, NR_FREE_PAGES);
+
+   if (unlikely(file + free <= high_wmark_pages(zone))) {
+   scan_balance = SCAN_ANON;
+   goto out;
+   }
+   }
+
+   /*
 * There is enough inactive page cache, do not reclaim
 * anything from the anonymous working set right now.
 */
-- 
1.9.2



Re: commit 0bf1457f0cfca7b "mm: vmscan: do not swap anon pages just because free+file is low" causes heavy performance regression on paging

2014-04-22 Thread Rafael Aquini
On Tue, Apr 22, 2014 at 10:40:17AM -0400, Rik van Riel wrote:
> On 04/22/2014 07:57 AM, Christian Borntraeger wrote:
> > On 22/04/14 12:55, Christian Borntraeger wrote:
> >> While preparing/testing some KVM on s390 patches for the next merge window
> >> (target is kvm/next, which is based on 3.15-rc1), I hit a very severe
> >> performance hiccup on guest paging (all anonymous memory).
> >>
> >> All memory-bound guests are now in "D" state and the system is barely
> >> usable.
> >>
> >> Reverting commit 0bf1457f0cfca7bc026a82323ad34bcf58ad035d
> >> ("mm: vmscan: do not swap anon pages just because free+file is low") makes
> >> the problem go away.
> >>
> >> According to /proc/vmstat, the system now enters direct reclaim on almost
> >> every page fault (more than 10x more direct reclaims than kswapd reclaims).
> >> With the patch reverted, everything is fine again.
> >>
> >> Any ideas?
> > 
> > Here is an idea to tackle both my problem and the original problem:
> > 
> > Reverting 0bf1457f0cfca7bc026a82323ad34bcf58ad035d and checking against the
> > low watermark instead also makes my system usable.
> > 
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1923,7 +1923,7 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
> >  */
> > if (global_reclaim(sc)) {
> > free = zone_page_state(zone, NR_FREE_PAGES);
> > -   if (unlikely(file + free <= high_wmark_pages(zone))) {
> > +   if (unlikely(file + free <= low_wmark_pages(zone))) {
> > scan_balance = SCAN_ANON;
> > goto out;
> > }
> > 
> 
> Looks reasonable to me.
+1



Re: commit 0bf1457f0cfca7b "mm: vmscan: do not swap anon pages just because free+file is low" causes heavy performance regression on paging

2014-04-22 Thread Christian Borntraeger
On 22/04/14 12:55, Christian Borntraeger wrote:
> While preparing/testing some KVM on s390 patches for the next merge window
> (target is kvm/next, which is based on 3.15-rc1), I hit a very severe
> performance hiccup on guest paging (all anonymous memory).
> 
> All memory-bound guests are now in "D" state and the system is barely
> usable.
> 
> Reverting commit 0bf1457f0cfca7bc026a82323ad34bcf58ad035d
> ("mm: vmscan: do not swap anon pages just because free+file is low") makes
> the problem go away.
> 
> According to /proc/vmstat, the system now enters direct reclaim on almost
> every page fault (more than 10x more direct reclaims than kswapd reclaims).
> With the patch reverted, everything is fine again.
> 
> Any ideas?

Here is an idea to tackle both my problem and the original problem:

Reverting 0bf1457f0cfca7bc026a82323ad34bcf58ad035d and checking against the
low watermark instead also makes my system usable.

--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1923,7 +1923,7 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
 */
if (global_reclaim(sc)) {
free = zone_page_state(zone, NR_FREE_PAGES);
-   if (unlikely(file + free <= high_wmark_pages(zone))) {
+   if (unlikely(file + free <= low_wmark_pages(zone))) {
scan_balance = SCAN_ANON;
goto out;
}

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: commit 0bf1457f0cfca7b mm: vmscan: do not swap anon pages just because free+file is low causes heavy performance regression on paging

2014-04-22 Thread Christian Borntraeger
On 22/04/14 12:55, Christian Borntraeger wrote:
 While preparing/testing some KVM on s390 patches for the next merge window 
 (target is kvm/next which is based on 3.15-rc1) I faced a very severe 
 performance hickup on guest paging (all anonymous memory).
 
 All memory bound guests are in D state now and the system is barely 
 unusable.
 
 Reverting commit 0bf1457f0cfca7bc026a82323ad34bcf58ad035d
 mm: vmscan: do not swap anon pages just because free+file is low makes the 
 problem go away.
 
 According to /proc/vmstat the system is now in direct reclaim almost all the 
 time for every page fault (more than 10x more direct reclaims than kswap 
 reclaims)
 With the patch being reverted everything is fine again.
 
 Any ideas?

Here is an idea to tackle my problem and the original problem:

reverting  0bf1457f0cfca7bc026a82323ad34bcf58ad035d + checking against low, 
also seems to make my system usable.

--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1923,7 +1923,7 @@ static void get_scan_count(struct lruvec *lruvec, struct 
scan_control *sc,
 */
if (global_reclaim(sc)) {
free = zone_page_state(zone, NR_FREE_PAGES);
-   if (unlikely(file + free = high_wmark_pages(zone))) {
+   if (unlikely(file + free = low_wmark_pages(zone))) {
scan_balance = SCAN_ANON;
goto out;
}

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: commit 0bf1457f0cfca7b mm: vmscan: do not swap anon pages just because free+file is low causes heavy performance regression on paging

2014-04-22 Thread Rafael Aquini
On Tue, Apr 22, 2014 at 10:40:17AM -0400, Rik van Riel wrote:
 On 04/22/2014 07:57 AM, Christian Borntraeger wrote:
  On 22/04/14 12:55, Christian Borntraeger wrote:
  While preparing/testing some KVM on s390 patches for the next merge window 
  (target is kvm/next which is based on 3.15-rc1) I faced a very severe 
  performance hickup on guest paging (all anonymous memory).
 
  All memory bound guests are in D state now and the system is barely 
  unusable.
 
  Reverting commit 0bf1457f0cfca7bc026a82323ad34bcf58ad035d
  mm: vmscan: do not swap anon pages just because free+file is low makes 
  the problem go away.
 
  According to /proc/vmstat the system is now in direct reclaim almost all 
  the time for every page fault (more than 10x more direct reclaims than 
  kswap reclaims)
  With the patch being reverted everything is fine again.
 
  Any ideas?
  
  Here is an idea to tackle my problem and the original problem:
  
  reverting  0bf1457f0cfca7bc026a82323ad34bcf58ad035d + checking against low, 
  also seems to make my system usable.
  
  --- a/mm/vmscan.c
  +++ b/mm/vmscan.c
  @@ -1923,7 +1923,7 @@ static void get_scan_count(struct lruvec *lruvec, 
  struct scan_control *sc,
   */
  if (global_reclaim(sc)) {
  free = zone_page_state(zone, NR_FREE_PAGES);
  -   if (unlikely(file + free = high_wmark_pages(zone))) {
  +   if (unlikely(file + free = low_wmark_pages(zone))) {
  scan_balance = SCAN_ANON;
  goto out;
  }
  
 
 Looks reasonable to me.
+1

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: commit 0bf1457f0cfca7b mm: vmscan: do not swap anon pages just because free+file is low causes heavy performance regression on paging

2014-04-22 Thread Johannes Weiner
Hi Christian,

On Tue, Apr 22, 2014 at 12:55:37PM +0200, Christian Borntraeger wrote:
 While preparing/testing some KVM on s390 patches for the next merge window 
 (target is kvm/next which is based on 3.15-rc1) I faced a very severe 
 performance hickup on guest paging (all anonymous memory).
 
 All memory bound guests are in D state now and the system is barely 
 unusable.
 
 Reverting commit 0bf1457f0cfca7bc026a82323ad34bcf58ad035d
 mm: vmscan: do not swap anon pages just because free+file is low makes the 
 problem go away.
 
 According to /proc/vmstat the system is now in direct reclaim almost all the 
 time for every page fault (more than 10x more direct reclaims than kswap 
 reclaims)
 With the patch being reverted everything is fine again.

Ouch.  Yes, I think we have to revert this for now.

How about this?

---
From: Johannes Weiner han...@cmpxchg.org
Subject: [patch] Revert mm: vmscan: do not swap anon pages just because
 free+file is low

This reverts commit 0bf1457f0cfc (mm: vmscan: do not swap anon pages
just because free+file is low) because it introduced a regression in
mostly-anonymous workloads, where reclaim would become ineffective and
trap every allocating task in direct reclaim.

The problem is that there is a runaway feedback loop in the scan
balance between file and anon, where the balance tips heavily towards
a tiny thrashing file LRU and anonymous pages are no longer being
looked at.  The commit in question removed the safe guard that would
detect such situations and respond with forced anonymous reclaim.

This commit was part of a series to fix premature swapping in loads
with relatively little cache, and while it made a small difference,
the cure is obviously worse than the disease.  Revert it.

Reported-by: Christian Borntraeger borntrae...@de.ibm.com
Signed-off-by: Johannes Weiner han...@cmpxchg.org
Cc: sta...@kernel.org [3.12+]
---
 mm/vmscan.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9b6497eda806..169acb8e31c9 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1916,6 +1916,24 @@ static void get_scan_count(struct lruvec *lruvec, struct 
scan_control *sc,
get_lru_size(lruvec, LRU_INACTIVE_FILE);
 
/*
+* Prevent the reclaimer from falling into the cache trap: as
+* cache pages start out inactive, every cache fault will tip
+* the scan balance towards the file LRU.  And as the file LRU
+* shrinks, so does the window for rotation from references.
+* This means we have a runaway feedback loop where a tiny
+* thrashing file LRU becomes infinitely more attractive than
+* anon pages.  Try to detect this based on file LRU size.
+*/
+   if (global_reclaim(sc)) {
+   unsigned long free = zone_page_state(zone, NR_FREE_PAGES);
+
+   if (unlikely(file + free = high_wmark_pages(zone))) {
+   scan_balance = SCAN_ANON;
+   goto out;
+   }
+   }
+
+   /*
 * There is enough inactive page cache, do not reclaim
 * anything from the anonymous working set right now.
 */
-- 
1.9.2

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: commit 0bf1457f0cfca7b mm: vmscan: do not swap anon pages just because free+file is low causes heavy performance regression on paging

2014-04-22 Thread Rik van Riel
On 04/22/2014 07:57 AM, Christian Borntraeger wrote:
 On 22/04/14 12:55, Christian Borntraeger wrote:
 While preparing/testing some KVM on s390 patches for the next merge window 
 (target is kvm/next which is based on 3.15-rc1) I faced a very severe 
 performance hickup on guest paging (all anonymous memory).

 All memory bound guests are in D state now and the system is barely 
 unusable.

 Reverting commit 0bf1457f0cfca7bc026a82323ad34bcf58ad035d
 mm: vmscan: do not swap anon pages just because free+file is low makes the 
 problem go away.

 According to /proc/vmstat the system is now in direct reclaim almost all the 
 time for every page fault (more than 10x more direct reclaims than kswap 
 reclaims)
 With the patch being reverted everything is fine again.

 Any ideas?
 
 Here is an idea to tackle my problem and the original problem:
 
 reverting  0bf1457f0cfca7bc026a82323ad34bcf58ad035d + checking against low, 
 also seems to make my system usable.
 
 --- a/mm/vmscan.c
 +++ b/mm/vmscan.c
 @@ -1923,7 +1923,7 @@ static void get_scan_count(struct lruvec *lruvec, 
 struct scan_control *sc,
  */
 if (global_reclaim(sc)) {
 free = zone_page_state(zone, NR_FREE_PAGES);
 -   if (unlikely(file + free = high_wmark_pages(zone))) {
 +   if (unlikely(file + free = low_wmark_pages(zone))) {
 scan_balance = SCAN_ANON;
 goto out;
 }
 

Looks reasonable to me.  Johannes?

-- 
All rights reversed
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: commit 0bf1457f0cfca7b mm: vmscan: do not swap anon pages just because free+file is low causes heavy performance regression on paging

2014-04-22 Thread Christian Borntraeger
On 22/04/14 17:06, Johannes Weiner wrote:
 Hi Christian,
 
 On Tue, Apr 22, 2014 at 12:55:37PM +0200, Christian Borntraeger wrote:
 While preparing/testing some KVM on s390 patches for the next merge window 
 (target is kvm/next which is based on 3.15-rc1) I faced a very severe 
 performance hickup on guest paging (all anonymous memory).

 All memory bound guests are in D state now and the system is barely 
 unusable.

 Reverting commit 0bf1457f0cfca7bc026a82323ad34bcf58ad035d
 mm: vmscan: do not swap anon pages just because free+file is low makes the 
 problem go away.

 According to /proc/vmstat the system is now in direct reclaim almost all the 
 time for every page fault (more than 10x more direct reclaims than kswap 
 reclaims)
 With the patch being reverted everything is fine again.
 
 Ouch.  Yes, I think we have to revert this for now.
 
 How about this?
 
 ---
 From: Johannes Weiner han...@cmpxchg.org
 Subject: [patch] Revert mm: vmscan: do not swap anon pages just because
  free+file is low
 
 This reverts commit 0bf1457f0cfc (mm: vmscan: do not swap anon pages
 just because free+file is low) because it introduced a regression in
 mostly-anonymous workloads, where reclaim would become ineffective and
 trap every allocating task in direct reclaim.
 
 The problem is that there is a runaway feedback loop in the scan
 balance between file and anon, where the balance tips heavily towards
 a tiny thrashing file LRU and anonymous pages are no longer being
 looked at.  The commit in question removed the safe guard that would
 detect such situations and respond with forced anonymous reclaim.
 
 This commit was part of a series to fix premature swapping in loads
 with relatively little cache, and while it made a small difference,
 the cure is obviously worse than the disease.  Revert it.
 
 Reported-by: Christian Borntraeger borntrae...@de.ibm.com
 Signed-off-by: Johannes Weiner han...@cmpxchg.org
 Cc: sta...@kernel.org   [3.12+]



This is certainly safer than my hack with low_wmark_pages. We have several 
cases where increasing min_free_kbytes avoids going into direct reclaim on 
large host systems with heavy paging. So I guess my patch is just a trade-off 
between the two cases, but it actually makes it still more likely to go into 
direct reclaim than your revert. So I prefer your revert.

Acked-by: Christian Borntraeger borntrae...@de.ibm.com

 ---
  mm/vmscan.c | 18 ++++++++++++++++++
  1 file changed, 18 insertions(+)
 
 diff --git a/mm/vmscan.c b/mm/vmscan.c
 index 9b6497eda806..169acb8e31c9 100644
 --- a/mm/vmscan.c
 +++ b/mm/vmscan.c
 @@ -1916,6 +1916,24 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
   get_lru_size(lruvec, LRU_INACTIVE_FILE);
 
   /*
 +  * Prevent the reclaimer from falling into the cache trap: as
 +  * cache pages start out inactive, every cache fault will tip
 +  * the scan balance towards the file LRU.  And as the file LRU
 +  * shrinks, so does the window for rotation from references.
 +  * This means we have a runaway feedback loop where a tiny
 +  * thrashing file LRU becomes infinitely more attractive than
 +  * anon pages.  Try to detect this based on file LRU size.
 +  */
 + if (global_reclaim(sc)) {
 + unsigned long free = zone_page_state(zone, NR_FREE_PAGES);
 +
 + if (unlikely(file + free <= high_wmark_pages(zone))) {
 + scan_balance = SCAN_ANON;
 + goto out;
 + }
 + }
 +
 + /*
* There is enough inactive page cache, do not reclaim
* anything from the anonymous working set right now.
*/
 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: commit 0bf1457f0cfca7b "mm: vmscan: do not swap anon pages just because free+file is low" causes heavy performance regression on paging

2014-04-22 Thread Rafael Aquini
On Tue, Apr 22, 2014 at 11:06:56AM -0400, Johannes Weiner wrote:
 Hi Christian,
 
 On Tue, Apr 22, 2014 at 12:55:37PM +0200, Christian Borntraeger wrote:
  While preparing/testing some KVM on s390 patches for the next merge window 
  (target is kvm/next which is based on 3.15-rc1) I faced a very severe 
  performance hiccup on guest paging (all anonymous memory).
  
  All memory bound guests are in D state now and the system is barely 
  unusable.
  
  Reverting commit 0bf1457f0cfca7bc026a82323ad34bcf58ad035d
  "mm: vmscan: do not swap anon pages just because free+file is low" makes 
  the problem go away.
  
  According to /proc/vmstat the system is now in direct reclaim almost all 
  the time for every page fault (more than 10x more direct reclaims than 
  kswapd reclaims).
  With the patch being reverted everything is fine again.
 
 Ouch.  Yes, I think we have to revert this for now.
 
 How about this?
 
 ---
 From: Johannes Weiner han...@cmpxchg.org
 Subject: [patch] Revert "mm: vmscan: do not swap anon pages just because
  free+file is low"
 
 This reverts commit 0bf1457f0cfc ("mm: vmscan: do not swap anon pages
 just because free+file is low") because it introduced a regression in
 mostly-anonymous workloads, where reclaim would become ineffective and
 trap every allocating task in direct reclaim.
 
 The problem is that there is a runaway feedback loop in the scan
 balance between file and anon, where the balance tips heavily towards
 a tiny thrashing file LRU and anonymous pages are no longer being
 looked at.  The commit in question removed the safeguard that would
 detect such situations and respond with forced anonymous reclaim.
 
 This commit was part of a series to fix premature swapping in loads
 with relatively little cache, and while it made a small difference,
 the cure is obviously worse than the disease.  Revert it.
 
 Reported-by: Christian Borntraeger borntrae...@de.ibm.com
 Signed-off-by: Johannes Weiner han...@cmpxchg.org
 Cc: sta...@kernel.org   [3.12+]
 ---
  mm/vmscan.c | 18 ++++++++++++++++++
  1 file changed, 18 insertions(+)
 
 diff --git a/mm/vmscan.c b/mm/vmscan.c
 index 9b6497eda806..169acb8e31c9 100644
 --- a/mm/vmscan.c
 +++ b/mm/vmscan.c
 @@ -1916,6 +1916,24 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
   get_lru_size(lruvec, LRU_INACTIVE_FILE);
  
   /*
 +  * Prevent the reclaimer from falling into the cache trap: as
 +  * cache pages start out inactive, every cache fault will tip
 +  * the scan balance towards the file LRU.  And as the file LRU
 +  * shrinks, so does the window for rotation from references.
 +  * This means we have a runaway feedback loop where a tiny
 +  * thrashing file LRU becomes infinitely more attractive than
 +  * anon pages.  Try to detect this based on file LRU size.
 +  */
 + if (global_reclaim(sc)) {
 + unsigned long free = zone_page_state(zone, NR_FREE_PAGES);
 +
 + if (unlikely(file + free <= high_wmark_pages(zone))) {
 + scan_balance = SCAN_ANON;
 + goto out;
 + }
 + }
 +
 + /*
* There is enough inactive page cache, do not reclaim
* anything from the anonymous working set right now.
*/
 -- 
 1.9.2
 
Acked-by: Rafael Aquini aqu...@redhat.com