Re: [PATCH v3 5/7] zram: support idle/huge page writeback
Hi Andrew,

On Wed, Nov 28, 2018 at 03:35:59PM -0800, Andrew Morton wrote:
> On Tue, 27 Nov 2018 14:54:27 +0900 Minchan Kim wrote:
>
> > This patch supports new feature "zram idle/huge page writeback".
> > On zram-swap usecase, zram has usually many idle/huge swap pages.
> > It's pointless to keep in memory(ie, zram).
> >
> > To solve the problem, this feature introduces idle/huge page
> > writeback to backing device so the goal is to save more memory
> > space on embedded system.
> >
> > Normal sequence to use idle/huge page writeback feature is as follows,
> >
> > while (1) {
> >         # mark allocated zram slot to idle
> >         echo all > /sys/block/zram0/idle
> >         # leave system working for several hours
> >         # Unless some blocks on zram are accessed,
> >         # they remain IDLE-marked pages.
> >
> >         echo "idle" > /sys/block/zram0/writeback
> >         or/and
> >         echo "huge" > /sys/block/zram0/writeback
> >         # write the IDLE or/and huge marked slot into backing device
> >         # and free the memory.
> > }
> >
> > Per discussion:
> > https://lore.kernel.org/lkml/20181122065926.GG3441@jagdpanzerIV/T/#u,
> >
> > This patch removes direct incompressible page writeback feature
> > (d2afd25114f4, zram: write incompressible pages to backing device)
> > so we could regard it as regression because incompressible pages
> > don't go to backing storage automatically. Instead, user should
> > do it via "echo huge > /sys/block/zram/writeback" manually.
>
> I'm not in any position to determine the regression risk here.
>
> Why is that feature being removed, anyway?

Below are the concerns from Sergey:
https://lore.kernel.org/lkml/20181122065926.GG3441@jagdpanzerIV/T/#u

== &< ==
"IDLE writeback" is superior to "incompressible writeback".

"incompressible writeback" is completely unpredictable and
uncontrollable; it depends on data patterns and compression algorithms.
While "IDLE writeback" is predictable.

I even suspect, that, *ideally*, we can remove "incompressible
writeback". "IDLE pages" is a superset which also includes
"incompressible" pages. So, technically, we still can do
"incompressible writeback" from "IDLE writeback" path; but a much
more reasonable one, based on a page idling period.

I understand that you want to keep "direct incompressible writeback"
around. ZRAM is especially popular on devices which do suffer from
flash wearout, so I can see "incompressible writeback" path becoming
a dead code, long term.
== &< ==

My concern is that if we enable CONFIG_ZRAM_WRITEBACK in this
implementation, both hugepage and idlepage writeback will turn on.
However, some users want to enable only idlepage writeback, so we would
need to introduce an on/off knob for hugepage writeback or a new
CONFIG_ZRAM_IDLEPAGE_WRITEBACK for that use case.
I don't want to make it complicated *if possible*.

Long term, I imagine we need to make the VM aware of a new swap
hierarchy, a little bit different from what we have now.
For example, the first high-priority swap device can return -EIO or
-ENOCOMP, and swap then tries to fall back to the next lower-priority
swap device. With that, hugepage writeback will work transparently.

>
> > If we hear some regression, we could restore the function.
>
> Why not do that now?
>

We want to remove it at this moment.
Re: [PATCH v3 5/7] zram: support idle/huge page writeback
On Tue, 27 Nov 2018 14:54:27 +0900 Minchan Kim wrote:

> This patch supports new feature "zram idle/huge page writeback".
> On zram-swap usecase, zram has usually many idle/huge swap pages.
> It's pointless to keep in memory(ie, zram).
>
> To solve the problem, this feature introduces idle/huge page
> writeback to backing device so the goal is to save more memory
> space on embedded system.
>
> Normal sequence to use idle/huge page writeback feature is as follows,
>
> while (1) {
>         # mark allocated zram slot to idle
>         echo all > /sys/block/zram0/idle
>         # leave system working for several hours
>         # Unless some blocks on zram are accessed,
>         # they remain IDLE-marked pages.
>
>         echo "idle" > /sys/block/zram0/writeback
>         or/and
>         echo "huge" > /sys/block/zram0/writeback
>         # write the IDLE or/and huge marked slot into backing device
>         # and free the memory.
> }
>
> Per discussion:
> https://lore.kernel.org/lkml/20181122065926.GG3441@jagdpanzerIV/T/#u,
>
> This patch removes direct incompressible page writeback feature
> (d2afd25114f4, zram: write incompressible pages to backing device)
> so we could regard it as regression because incompressible pages
> don't go to backing storage automatically. Instead, user should
> do it via "echo huge > /sys/block/zram/writeback" manually.

I'm not in any position to determine the regression risk here.

Why is that feature being removed, anyway?

> If we hear some regression, we could restore the function.

Why not do that now?
[PATCH v3 5/7] zram: support idle/huge page writeback
This patch supports new feature "zram idle/huge page writeback".
On zram-swap usecase, zram has usually many idle/huge swap pages.
It's pointless to keep in memory(ie, zram).

To solve the problem, this feature introduces idle/huge page
writeback to backing device so the goal is to save more memory
space on embedded system.

Normal sequence to use idle/huge page writeback feature is as follows,

while (1) {
        # mark allocated zram slot to idle
        echo all > /sys/block/zram0/idle
        # leave system working for several hours
        # Unless some blocks on zram are accessed,
        # they remain IDLE-marked pages.

        echo "idle" > /sys/block/zram0/writeback
        or/and
        echo "huge" > /sys/block/zram0/writeback
        # write the IDLE or/and huge marked slot into backing device
        # and free the memory.
}

Per discussion:
https://lore.kernel.org/lkml/20181122065926.GG3441@jagdpanzerIV/T/#u,

This patch removes direct incompressible page writeback feature
(d2afd25114f4, zram: write incompressible pages to backing device)
so we could regard it as regression because incompressible pages
don't go to backing storage automatically. Instead, user should
do it via "echo huge > /sys/block/zram/writeback" manually.

If we hear some regression, we could restore the function.

Reviewed-by: Joey Pabalinas
Signed-off-by: Minchan Kim
---
 Documentation/ABI/testing/sysfs-block-zram |   7 +
 Documentation/blockdev/zram.txt            |  28 ++-
 drivers/block/zram/Kconfig                 |   5 +-
 drivers/block/zram/zram_drv.c              | 247 +++--
 drivers/block/zram/zram_drv.h              |   1 +
 5 files changed, 209 insertions(+), 79 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
index 04c9a5980bc7..d1f80b077885 100644
--- a/Documentation/ABI/testing/sysfs-block-zram
+++ b/Documentation/ABI/testing/sysfs-block-zram
@@ -106,3 +106,10 @@ Contact:	Minchan Kim
 		idle file is write-only and mark zram slot as idle.
 		If system has mounted debugfs, user can see which slots
 		are idle via /sys/kernel/debug/zram/zram/block_state
+
+What:		/sys/block/zram/writeback
+Date:		November 2018
+Contact:	Minchan Kim
+Description:
+		The writeback file is write-only and trigger idle and/or
+		huge page writeback to backing device.
diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
index f3bcd716d8a9..806cdaabac83 100644
--- a/Documentation/blockdev/zram.txt
+++ b/Documentation/blockdev/zram.txt
@@ -238,11 +238,31 @@ The stat file represents device's mm statistics. It consists of a single
 
 = writeback
 
-With incompressible pages, there is no memory saving with zram.
-Instead, with CONFIG_ZRAM_WRITEBACK, zram can write incompressible page
+With CONFIG_ZRAM_WRITEBACK, zram can write idle/incompressible page
 to backing storage rather than keeping it in memory.
-User should set up backing device via /sys/block/zramX/backing_dev
-before disksize setting.
+To use the feature, admin should set up backing device via
+
+	"echo /dev/sda5 > /sys/block/zramX/backing_dev"
+
+before disksize setting. It supports only partition at this moment.
+If admin want to use incompressible page writeback, they could do via
+
+	"echo huge > /sys/block/zramX/write"
+
+To use idle page writeback, first, user need to declare zram pages
+as idle.
+
+	"echo all > /sys/block/zramX/idle"
+
+From now on, any pages on zram are idle pages. The idle mark
+will be removed until someone request access of the block.
+IOW, unless there is access request, those pages are still idle pages.
+
+Admin can request writeback of those idle pages at right timing via
+
+	"echo idle > /sys/block/zramX/writeback"
+
+With the command, zram writeback idle pages from memory to the storage.
 
 = memory tracking
 
diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig
index fcd055457364..1ffc64770643 100644
--- a/drivers/block/zram/Kconfig
+++ b/drivers/block/zram/Kconfig
@@ -15,7 +15,7 @@ config ZRAM
 	  See Documentation/blockdev/zram.txt for more information.
 
 config ZRAM_WRITEBACK
-	bool "Write back incompressible page to backing device"
+	bool "Write back incompressible or idle page to backing device"
 	depends on ZRAM
 	help
 	  With incompressible page, there is no memory saving to keep it
@@ -23,6 +23,9 @@ config ZRAM_WRITEBACK
 	  For this feature, admin should set up backing device via
 	  /sys/block/zramX/backing_dev.
 
+With /sys/block/zramX/{idle,writeback}, application could ask
+idle page's writeback to the backing device to save in memory.
+
 	  See Documentation/blockdev/zram.txt for more information.
 
 config ZRAM_MEMORY_TRACKING
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
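
The "while (1)" sequence in the changelog is pseudocode. A minimal
runnable sketch of one mark/wait/writeback cycle might look like the
following. It assumes a kernel built with CONFIG_ZRAM_WRITEBACK, a
backing device already configured via backing_dev before disksize was
set, and root privileges. The function names and the ZRAM_SYSFS
variable are hypothetical helpers for illustration, not part of the
patch; only the idle and writeback sysfs nodes and the "all", "idle"
and "huge" keywords come from the patch itself.

```shell
#!/bin/sh
# Sketch only: helper names and ZRAM_SYSFS are illustrative assumptions.
# ZRAM_SYSFS is the sysfs directory of the zram device to operate on.
ZRAM_SYSFS="${ZRAM_SYSFS:-/sys/block/zram0}"

# Mark every allocated zram slot as idle. A slot loses the mark
# as soon as it is accessed again.
zram_mark_idle() {
    echo all > "$ZRAM_SYSFS/idle"
}

# Ask zram to write back one class of pages ("idle" or "huge")
# to the backing device and free the memory they occupied.
zram_writeback() {
    case "$1" in
        idle|huge) echo "$1" > "$ZRAM_SYSFS/writeback" ;;
        *) echo "usage: zram_writeback idle|huge" >&2; return 1 ;;
    esac
}

# One full cycle: mark all slots idle, wait $1 seconds (default 0),
# then write back whatever is still idle, followed by huge pages.
zram_cycle() {
    zram_mark_idle || return 1
    sleep "${1:-0}"
    zram_writeback idle && zram_writeback huge
}
```

For example, "zram_cycle 14400" would let the system run for four hours
between marking and writeback. Keeping the wait as a parameter leaves
the policy question of what counts as "idle long enough" to the caller.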