Re: [PATCH] mm: Only force scan in reclaim when none of the LRUs are big enough.

2014-04-01 Thread Andrew Morton
On Sat, 15 Mar 2014 20:36:02 -0700 (PDT) Hugh Dickins <hu...@google.com> wrote:

> From: Suleiman Souhlal <sulei...@google.com>
> 
> Prior to this change, we would decide whether to force scan a LRU
> during reclaim if that LRU itself was too small for the current
> priority. However, this can lead to the file LRU getting force
> scanned even if there are a lot of anonymous pages we can reclaim,
> leading to hot file pages getting needlessly reclaimed.

Struggling a bit here.  You're referring to this code?

	size = get_lru_size(lruvec, lru);
	scan = size >> sc->priority;

	if (!scan && force_scan)
		scan = min(size, SWAP_CLUSTER_MAX);

So we're talking about the case where the LRU is so small that it
contains fewer than (1 << sc->priority) pages?

If so, then I'd expect that in normal operation this situation rarely
occurs?  Surely the LRUs normally contain many more pages than this.

> To address this, we instead only force scan when none of the
> reclaimable LRUs are big enough.
> 
> Gives huge improvements with zswap. For example, when doing -j20
> kernel build in a 500MB container with zswap enabled, runtime (in
> seconds) is greatly reduced:
> 
> x without this change
> + with this change
>     N       Min       Max     Median       Avg     Stddev
> x   5   700.997   790.076    763.928    754.05   39.59493
> +   5   141.634   197.899    155.706     161.9  21.270224
> Difference at 95.0% confidence
> -592.15 +/- 46.3521
> -78.5293% +/- 6.14709%
> (Student's t, pooled s = 31.7819)

And yet the patch makes a large difference.  What am I missing here?
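
(A minimal standalone sketch, not kernel code, of the arithmetic in
question, assuming DEF_PRIORITY == 12 and SWAP_CLUSTER_MAX == 32 as in
3.14: at the lowest scan priority, an LRU below 1 << 12 = 4096 pages,
i.e. 16MB of 4K pages, rounds its scan target down to zero, which is
the case the old force_scan test overrode.)

#include <stdio.h>

#define DEF_PRIORITY		12	/* sc->priority starts here */
#define SWAP_CLUSTER_MAX	32UL

/* Old behaviour: each LRU decides force-scanning for itself. */
static unsigned long old_scan_target(unsigned long lru_size, int priority,
				     int force_scan)
{
	unsigned long scan = lru_size >> priority;

	if (!scan && force_scan)
		scan = lru_size < SWAP_CLUSTER_MAX ? lru_size : SWAP_CLUSTER_MAX;
	return scan;
}

int main(void)
{
	printf("%lu\n", old_scan_target(4000, DEF_PRIORITY, 0));	/* 0  */
	printf("%lu\n", old_scan_target(4000, DEF_PRIORITY, 1));	/* 32 */
	printf("%lu\n", old_scan_target(8192, DEF_PRIORITY, 0));	/* 2  */
	return 0;
}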

> --- 3.14-rc6/mm/vmscan.c	2014-02-02 18:49:07.949302116 -0800
> +++ linux/mm/vmscan.c	2014-03-15 19:31:44.948977032 -0700
> @@ -1971,39 +1973,49 @@ static void get_scan_count(struct lruvec
>  	fraction[1] = fp;
>  	denominator = ap + fp + 1;
>  out:
> -	for_each_evictable_lru(lru) {
> -		int file = is_file_lru(lru);
> -		unsigned long size;
> -		unsigned long scan;
> -
> -		size = get_lru_size(lruvec, lru);
> -		scan = size >> sc->priority;
> -
> -		if (!scan && force_scan)
> -			scan = min(size, SWAP_CLUSTER_MAX);
> -
> -		switch (scan_balance) {
> -		case SCAN_EQUAL:
> -			/* Scan lists relative to size */
> -			break;
> -		case SCAN_FRACT:
> +	some_scanned = false;
> +	/* Only use force_scan on second pass. */

That's a poor comment.

> +	for (pass = 0; !some_scanned && pass < 2; pass++) {
> +		for_each_evictable_lru(lru) {
> +			int file = is_file_lru(lru);
> +			unsigned long size;
> +			unsigned long scan;
> +
> +			size = get_lru_size(lruvec, lru);
> +			scan = size >> sc->priority;
> +
> +			if (!scan && pass && force_scan)
> +				scan = min(size, SWAP_CLUSTER_MAX);
> +
> +			switch (scan_balance) {
> +			case SCAN_EQUAL:
> +				/* Scan lists relative to size */
> +				break;
> +			case SCAN_FRACT:
> +				/*
> +				 * Scan types proportional to swappiness and
> +				 * their relative recent reclaim efficiency.
> +				 */
> +				scan = div64_u64(scan * fraction[file],
> +							denominator);
> +				break;
> +			case SCAN_FILE:
> +			case SCAN_ANON:
> +				/* Scan one type exclusively */
> +				if ((scan_balance == SCAN_FILE) != file)
> +					scan = 0;
> +				break;
> +			default:
> +				/* Look ma, no brain */
> +				BUG();
> +			}
> +			nr[lru] = scan;
>  			/*
> -			 * Scan types proportional to swappiness and
> -			 * their relative recent reclaim efficiency.
> +			 * Skip the second pass and don't force_scan,
> +			 * if we found something to scan.

And so is that.  Both comments explain *what* the code is doing (which
was fairly obvious from the code!) but they fail to explain *why* the
code is doing what it does.

>  			 */
> -			scan = div64_u64(scan * fraction[file], denominator);
> -			break;
> -		case SCAN_FILE:
> -		case SCAN_ANON:
> -			/* Scan one type exclusively */
> 

Re: [PATCH] mm: Only force scan in reclaim when none of the LRUs are big enough.

2014-03-28 Thread Rafael Aquini
On Sat, Mar 15, 2014 at 08:36:02PM -0700, Hugh Dickins wrote:
> From: Suleiman Souhlal <sulei...@google.com>
> 
> Prior to this change, we would decide whether to force scan a LRU
> during reclaim if that LRU itself was too small for the current
> priority. However, this can lead to the file LRU getting force
> scanned even if there are a lot of anonymous pages we can reclaim,
> leading to hot file pages getting needlessly reclaimed.
> 
> To address this, we instead only force scan when none of the
> reclaimable LRUs are big enough.
> 
> Gives huge improvements with zswap. For example, when doing -j20
> kernel build in a 500MB container with zswap enabled, runtime (in
> seconds) is greatly reduced:
> 
> x without this change
> + with this change
>     N       Min       Max     Median       Avg     Stddev
> x   5   700.997   790.076    763.928    754.05   39.59493
> +   5   141.634   197.899    155.706     161.9  21.270224
> Difference at 95.0% confidence
> -592.15 +/- 46.3521
> -78.5293% +/- 6.14709%
> (Student's t, pooled s = 31.7819)
> 
> Should also give some improvements in regular (non-zswap) swap cases.
> 
> Yes, hughd found significant speedup using regular swap, with several
> memcgs under pressure; and it should also be effective in the non-memcg
> case, whenever one or another zone LRU is forced too small.
> 
> Signed-off-by: Suleiman Souhlal <sulei...@google.com>
> Signed-off-by: Hugh Dickins <hu...@google.com>
> ---
> 

Acked-by: Rafael Aquini <aqu...@redhat.com>

> I apologize to everyone for holding on to this so long: I think it's
> a very helpful patch (which we've been using in Google for months now).
> Been sitting on my TODO list, now prompted to send by related patches
> 
> https://lkml.org/lkml/2014/3/13/217
> https://lkml.org/lkml/2014/3/14/277
> 
> Certainly worth considering all three together, but my understanding
> is that they're actually three independent attacks on different ways
> in which we currently squeeze an LRU too small; and this patch from
> Suleiman seems to be the most valuable of the three, at least for
> the workloads I've tried it on.  But I'm not much of a page reclaim
> performance tester: please try it out to see if it's good for you.
> Thanks!
> 
>  mm/vmscan.c |   72 +-
>  1 file changed, 42 insertions(+), 30 deletions(-)
> 
> We did experiment with different ways of writing the patch, I'm afraid
> the way it came out best indents deeper, making it look more than it is.
> 
> --- 3.14-rc6/mm/vmscan.c	2014-02-02 18:49:07.949302116 -0800
> +++ linux/mm/vmscan.c	2014-03-15 19:31:44.948977032 -0700
> @@ -1852,6 +1852,8 @@ static void get_scan_count(struct lruvec
>  	bool force_scan = false;
>  	unsigned long ap, fp;
>  	enum lru_list lru;
> +	bool some_scanned;
> +	int pass;
>  
>  	/*
>  	 * If the zone or memcg is small, nr[l] can be 0.  This
> @@ -1971,39 +1973,49 @@ static void get_scan_count(struct lruvec
>  	fraction[1] = fp;
>  	denominator = ap + fp + 1;
>  out:
> -	for_each_evictable_lru(lru) {
> -		int file = is_file_lru(lru);
> -		unsigned long size;
> -		unsigned long scan;
> -
> -		size = get_lru_size(lruvec, lru);
> -		scan = size >> sc->priority;
> -
> -		if (!scan && force_scan)
> -			scan = min(size, SWAP_CLUSTER_MAX);
> -
> -		switch (scan_balance) {
> -		case SCAN_EQUAL:
> -			/* Scan lists relative to size */
> -			break;
> -		case SCAN_FRACT:
> +	some_scanned = false;
> +	/* Only use force_scan on second pass. */
> +	for (pass = 0; !some_scanned && pass < 2; pass++) {
> +		for_each_evictable_lru(lru) {
> +			int file = is_file_lru(lru);
> +			unsigned long size;
> +			unsigned long scan;
> +
> +			size = get_lru_size(lruvec, lru);
> +			scan = size >> sc->priority;
> +
> +			if (!scan && pass && force_scan)
> +				scan = min(size, SWAP_CLUSTER_MAX);
> +
> +			switch (scan_balance) {
> +			case SCAN_EQUAL:
> +				/* Scan lists relative to size */
> +				break;
> +			case SCAN_FRACT:
> +				/*
> +				 * Scan types proportional to swappiness and
> +				 * their relative recent reclaim efficiency.
> +				 */
> +				scan = div64_u64(scan * fraction[file],
> +							denominator);
> +				break;
> +			case SCAN_FILE:
> +			case SCAN_ANON:
> +				/* Scan one type exclusively */
> +

Re: [PATCH] mm: Only force scan in reclaim when none of the LRUs are big enough.

2014-03-27 Thread Rik van Riel

On 03/15/2014 11:36 PM, Hugh Dickins wrote:

From: Suleiman Souhlal <sulei...@google.com>

Prior to this change, we would decide whether to force scan a LRU
during reclaim if that LRU itself was too small for the current
priority. However, this can lead to the file LRU getting force
scanned even if there are a lot of anonymous pages we can reclaim,
leading to hot file pages getting needlessly reclaimed.

To address this, we instead only force scan when none of the
reclaimable LRUs are big enough.

Gives huge improvements with zswap. For example, when doing -j20
kernel build in a 500MB container with zswap enabled, runtime (in
seconds) is greatly reduced:

x without this change
+ with this change
    N       Min       Max     Median       Avg     Stddev
x   5   700.997   790.076    763.928    754.05   39.59493
+   5   141.634   197.899    155.706     161.9  21.270224
Difference at 95.0% confidence
 -592.15 +/- 46.3521
 -78.5293% +/- 6.14709%
 (Student's t, pooled s = 31.7819)

Should also give some improvements in regular (non-zswap) swap cases.

Yes, hughd found significant speedup using regular swap, with several
memcgs under pressure; and it should also be effective in the non-memcg
case, whenever one or another zone LRU is forced too small.

Signed-off-by: Suleiman Souhlal <sulei...@google.com>
Signed-off-by: Hugh Dickins <hu...@google.com>


Acked-by: Rik van Riel <r...@redhat.com>



[PATCH] mm: Only force scan in reclaim when none of the LRUs are big enough.

2014-03-15 Thread Hugh Dickins
From: Suleiman Souhlal <sulei...@google.com>

Prior to this change, we would decide whether to force scan a LRU
during reclaim if that LRU itself was too small for the current
priority. However, this can lead to the file LRU getting force
scanned even if there are a lot of anonymous pages we can reclaim,
leading to hot file pages getting needlessly reclaimed.

To address this, we instead only force scan when none of the
reclaimable LRUs are big enough.

Gives huge improvements with zswap. For example, when doing -j20
kernel build in a 500MB container with zswap enabled, runtime (in
seconds) is greatly reduced:

x without this change
+ with this change
    N       Min       Max     Median       Avg     Stddev
x   5   700.997   790.076    763.928    754.05   39.59493
+   5   141.634   197.899    155.706     161.9  21.270224
Difference at 95.0% confidence
-592.15 +/- 46.3521
-78.5293% +/- 6.14709%
(Student's t, pooled s = 31.7819)

Should also give some improvements in regular (non-zswap) swap cases.

Yes, hughd found significant speedup using regular swap, with several
memcgs under pressure; and it should also be effective in the non-memcg
case, whenever one or another zone LRU is forced too small.

Signed-off-by: Suleiman Souhlal <sulei...@google.com>
Signed-off-by: Hugh Dickins <hu...@google.com>
---

I apologize to everyone for holding on to this so long: I think it's
a very helpful patch (which we've been using in Google for months now).
Been sitting on my TODO list, now prompted to send by related patches

https://lkml.org/lkml/2014/3/13/217
https://lkml.org/lkml/2014/3/14/277

Certainly worth considering all three together, but my understanding
is that they're actually three independent attacks on different ways
in which we currently squeeze an LRU too small; and this patch from
Suleiman seems to be the most valuable of the three, at least for
the workloads I've tried it on.  But I'm not much of a page reclaim
performance tester: please try it out to see if it's good for you.
Thanks!

 mm/vmscan.c |   72 +-
 1 file changed, 42 insertions(+), 30 deletions(-)

We did experiment with different ways of writing the patch, I'm afraid
the way it came out best indents deeper, making it look more than it is.

--- 3.14-rc6/mm/vmscan.c	2014-02-02 18:49:07.949302116 -0800
+++ linux/mm/vmscan.c	2014-03-15 19:31:44.948977032 -0700
@@ -1852,6 +1852,8 @@ static void get_scan_count(struct lruvec
 	bool force_scan = false;
 	unsigned long ap, fp;
 	enum lru_list lru;
+	bool some_scanned;
+	int pass;
 
 	/*
 	 * If the zone or memcg is small, nr[l] can be 0.  This
@@ -1971,39 +1973,49 @@ static void get_scan_count(struct lruvec
 	fraction[1] = fp;
 	denominator = ap + fp + 1;
 out:
-	for_each_evictable_lru(lru) {
-		int file = is_file_lru(lru);
-		unsigned long size;
-		unsigned long scan;
-
-		size = get_lru_size(lruvec, lru);
-		scan = size >> sc->priority;
-
-		if (!scan && force_scan)
-			scan = min(size, SWAP_CLUSTER_MAX);
-
-		switch (scan_balance) {
-		case SCAN_EQUAL:
-			/* Scan lists relative to size */
-			break;
-		case SCAN_FRACT:
+	some_scanned = false;
+	/* Only use force_scan on second pass. */
+	for (pass = 0; !some_scanned && pass < 2; pass++) {
+		for_each_evictable_lru(lru) {
+			int file = is_file_lru(lru);
+			unsigned long size;
+			unsigned long scan;
+
+			size = get_lru_size(lruvec, lru);
+			scan = size >> sc->priority;
+
+			if (!scan && pass && force_scan)
+				scan = min(size, SWAP_CLUSTER_MAX);
+
+			switch (scan_balance) {
+			case SCAN_EQUAL:
+				/* Scan lists relative to size */
+				break;
+			case SCAN_FRACT:
+				/*
+				 * Scan types proportional to swappiness and
+				 * their relative recent reclaim efficiency.
+				 */
+				scan = div64_u64(scan * fraction[file],
+							denominator);
+				break;
+			case SCAN_FILE:
+			case SCAN_ANON:
+				/* Scan one type exclusively */
+				if ((scan_balance == SCAN_FILE) != file)
+					scan = 0;
+				break;
+			default:
+				/* Look ma, no brain 
