Re: [PATCH v3 5/7] zram: support idle/huge page writeback

2018-11-28 Thread Minchan Kim
Hi Andrew,

On Wed, Nov 28, 2018 at 03:35:59PM -0800, Andrew Morton wrote:
> On Tue, 27 Nov 2018 14:54:27 +0900 Minchan Kim  wrote:
> 
> > This patch adds a new feature, "zram idle/huge page writeback".
> > In the zram-swap use case, zram usually holds many idle/huge swap pages.
> > It's pointless to keep them in memory (i.e., zram).
> >
> > To solve the problem, this feature introduces idle/huge page
> > writeback to the backing device; the goal is to save more memory
> > space on embedded systems.
> >
> > The normal sequence for using the idle/huge page writeback feature is as follows:
> >
> > while (1) {
> > # mark allocated zram slots as idle
> > echo all > /sys/block/zram0/idle
> > # leave the system working for several hours
> > # Unless some blocks on zram are accessed in the meantime,
> > # they are still IDLE-marked pages.
> >
> > echo "idle" > /sys/block/zram0/writeback
> > # and/or
> > echo "huge" > /sys/block/zram0/writeback
> > # write the IDLE and/or huge marked slots to the backing device
> > # and free the memory.
> > }
> >
> > As per the discussion:
> > https://lore.kernel.org/lkml/20181122065926.GG3441@jagdpanzerIV/T/#u,
> >
> > This patch removes the direct incompressible page writeback feature
> > (d2afd25114f4, zram: write incompressible pages to backing device),
> > so it could be regarded as a regression because incompressible pages
> > no longer go to backing storage automatically. Instead, the user should
> > do it manually via "echo huge > /sys/block/zram/writeback".
> 
> I'm not in any position to determine the regression risk here.
> 
> Why is that feature being removed, anyway?

Because of the concerns below from Sergey:
https://lore.kernel.org/lkml/20181122065926.GG3441@jagdpanzerIV/T/#u

== &< ==
"IDLE writeback" is superior to "incompressible writeback".

"incompressible writeback" is completely unpredictable and
uncontrollable; it depends on data patterns and compression algorithms.
While "IDLE writeback" is predictable.

I even suspect, that, *ideally*, we can remove "incompressible
writeback". "IDLE pages" is a superset which also includes
"incompressible" pages. So, technically, we still can do
"incompressible writeback" from "IDLE writeback" path; but a much
more reasonable one, based on a page idling period.

I understand that you want to keep "direct incompressible writeback"
around. ZRAM is especially popular on devices which do suffer from
flash wearout, so I can see "incompressible writeback" path becoming
dead code, long term.
== &< ==

My concern is that if we enable CONFIG_ZRAM_WRITEBACK in this implementation,
both hugepage and idlepage writeback will be turned on. However, some users want
to enable only idlepage writeback, so we would need to introduce an on/off
knob for hugepage writeback or a new CONFIG_ZRAM_IDLEPAGE_WRITEBACK for those
use cases. I don't want to make it complicated *if possible*.

Long term, I imagine we need to make the VM aware of a new swap hierarchy
that is a little bit different from the current one.
For example, if the first high-priority swap device returns -EIO or -ENOCOMP,
swap tries to fall back to the next lower-priority swap device. With that,
hugepage writeback will work transparently.

> 
> > If we hear some regression, we could restore the function.
> 
> Why not do that now?
> 

We want to remove it at this moment. 


Re: [PATCH v3 5/7] zram: support idle/huge page writeback

2018-11-28 Thread Andrew Morton
On Tue, 27 Nov 2018 14:54:27 +0900 Minchan Kim  wrote:

> This patch adds a new feature, "zram idle/huge page writeback".
> In the zram-swap use case, zram usually holds many idle/huge swap pages.
> It's pointless to keep them in memory (i.e., zram).
>
> To solve the problem, this feature introduces idle/huge page
> writeback to the backing device; the goal is to save more memory
> space on embedded systems.
>
> The normal sequence for using the idle/huge page writeback feature is as follows:
>
> while (1) {
> # mark allocated zram slots as idle
> echo all > /sys/block/zram0/idle
> # leave the system working for several hours
> # Unless some blocks on zram are accessed in the meantime,
> # they are still IDLE-marked pages.
>
> echo "idle" > /sys/block/zram0/writeback
> # and/or
> echo "huge" > /sys/block/zram0/writeback
> # write the IDLE and/or huge marked slots to the backing device
> # and free the memory.
> }
>
> As per the discussion:
> https://lore.kernel.org/lkml/20181122065926.GG3441@jagdpanzerIV/T/#u,
>
> This patch removes the direct incompressible page writeback feature
> (d2afd25114f4, zram: write incompressible pages to backing device),
> so it could be regarded as a regression because incompressible pages
> no longer go to backing storage automatically. Instead, the user should
> do it manually via "echo huge > /sys/block/zram/writeback".

I'm not in any position to determine the regression risk here.

Why is that feature being removed, anyway?

> If we hear some regression, we could restore the function.

Why not do that now?




[PATCH v3 5/7] zram: support idle/huge page writeback

2018-11-26 Thread Minchan Kim
This patch adds a new feature, "zram idle/huge page writeback".
In the zram-swap use case, zram usually holds many idle/huge swap pages.
It's pointless to keep them in memory (i.e., zram).

To solve the problem, this feature introduces idle/huge page
writeback to the backing device; the goal is to save more memory
space on embedded systems.

The normal sequence for using the idle/huge page writeback feature is as follows:

while (1) {
# mark allocated zram slots as idle
echo all > /sys/block/zram0/idle
# leave the system working for several hours
# Unless some blocks on zram are accessed in the meantime,
# they are still IDLE-marked pages.

echo "idle" > /sys/block/zram0/writeback
# and/or
echo "huge" > /sys/block/zram0/writeback
# write the IDLE and/or huge marked slots to the backing device
# and free the memory.
}
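
The loop above is pseudocode. A minimal runnable sketch of one cycle
could look like the following. The function names, the ZRAM_SYSFS and
IDLE_WAIT variables, and the one-hour default are assumptions made for
illustration and are not part of this patch; only the "idle" and
"writeback" sysfs files and the values written to them come from the
description above.

```shell
#!/bin/sh
# Hypothetical helper functions; the names are illustrative only.
# ZRAM_SYSFS can point at a scratch directory for dry runs.
ZRAM_SYSFS="${ZRAM_SYSFS:-/sys/block/zram0}"
# Seconds to let the system run between marking and writeback (assumed value).
IDLE_WAIT="${IDLE_WAIT:-3600}"

# Mark every allocated zram slot as idle.
mark_idle() {
    echo all > "$ZRAM_SYSFS/idle"
}

# Request writeback of "idle" or "huge" slots to the backing device.
writeback() {
    case "$1" in
        idle|huge) echo "$1" > "$ZRAM_SYSFS/writeback" ;;
        *) echo "unknown writeback mode: $1" >&2; return 1 ;;
    esac
}

# One full cycle: mark, wait, then write back idle and huge slots.
writeback_cycle() {
    mark_idle || return 1
    # Slots accessed during this window lose their idle mark.
    sleep "$IDLE_WAIT"
    writeback idle || return 1
    writeback huge
}
```

Calling writeback_cycle from a periodic job would approximate the
while (1) loop; the right waiting period depends on the workload.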

As per the discussion:
https://lore.kernel.org/lkml/20181122065926.GG3441@jagdpanzerIV/T/#u,

This patch removes the direct incompressible page writeback feature
(d2afd25114f4, zram: write incompressible pages to backing device),
so it could be regarded as a regression because incompressible pages
no longer go to backing storage automatically. Instead, the user should
do it manually via "echo huge > /sys/block/zram/writeback".

If we hear of any regression, we could restore the function.

Reviewed-by: Joey Pabalinas 
Signed-off-by: Minchan Kim 
---
 Documentation/ABI/testing/sysfs-block-zram |   7 +
 Documentation/blockdev/zram.txt            |  28 ++-
 drivers/block/zram/Kconfig                 |   5 +-
 drivers/block/zram/zram_drv.c              | 247 +++--
 drivers/block/zram/zram_drv.h              |   1 +
 5 files changed, 209 insertions(+), 79 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
index 04c9a5980bc7..d1f80b077885 100644
--- a/Documentation/ABI/testing/sysfs-block-zram
+++ b/Documentation/ABI/testing/sysfs-block-zram
@@ -106,3 +106,10 @@ Contact:   Minchan Kim 
idle file is write-only and mark zram slot as idle.
If system has mounted debugfs, user can see which slots
are idle via /sys/kernel/debug/zram/zram/block_state
+
+What:  /sys/block/zram/writeback
+Date:  November 2018
+Contact:   Minchan Kim 
+Description:
+   The writeback file is write-only and triggers idle and/or
+   huge page writeback to the backing device.
diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
index f3bcd716d8a9..806cdaabac83 100644
--- a/Documentation/blockdev/zram.txt
+++ b/Documentation/blockdev/zram.txt
@@ -238,11 +238,31 @@ The stat file represents device's mm statistics. It consists of a single
 
 = writeback
 
-With incompressible pages, there is no memory saving with zram.
-Instead, with CONFIG_ZRAM_WRITEBACK, zram can write incompressible page
+With CONFIG_ZRAM_WRITEBACK, zram can write idle/incompressible page
 to backing storage rather than keeping it in memory.
-User should set up backing device via /sys/block/zramX/backing_dev
-before disksize setting.
+To use the feature, admin should set up backing device via
+
+   "echo /dev/sda5 > /sys/block/zramX/backing_dev"
+
+before setting disksize. Only partitions are supported at this moment.
+If the admin wants to use incompressible page writeback, they can do so via
+
+   "echo huge > /sys/block/zramX/writeback"
+
+To use idle page writeback, the user first needs to declare zram pages
+as idle.
+
+   "echo all > /sys/block/zramX/idle"
+
+From now on, all pages on zram are idle pages. The idle mark
+will not be removed until someone requests access to the block.
+IOW, unless there is an access request, those pages remain idle pages.
+
+The admin can request writeback of those idle pages at the right time via
+
+   "echo idle > /sys/block/zramX/writeback"
+
+With the command, zram writes back idle pages from memory to the storage.
 
 = memory tracking
 
diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig
index fcd055457364..1ffc64770643 100644
--- a/drivers/block/zram/Kconfig
+++ b/drivers/block/zram/Kconfig
@@ -15,7 +15,7 @@ config ZRAM
  See Documentation/blockdev/zram.txt for more information.
 
 config ZRAM_WRITEBACK
-   bool "Write back incompressible page to backing device"
+   bool "Write back incompressible or idle page to backing device"
depends on ZRAM
help
 With incompressible page, there is no memory saving to keep it
@@ -23,6 +23,9 @@ config ZRAM_WRITEBACK
 For this feature, admin should set up backing device via
 /sys/block/zramX/backing_dev.
 
+With /sys/block/zramX/{idle,writeback}, an application can request
+writeback of idle pages to the backing device to save memory.
+
 See Documentation/blockdev/zram.txt for more information.
 
 config ZRAM_MEMORY_TRACKING
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
