Kirill Tkhai writes:
> When a FUSE process is making shrink, it must not wait
> on page writeback. Otherwise, it may meet a page,
> that is being writebacked by him, and the process will stall.
>
> So, our kernel does not wait writeback after commit a9707947010d
> "mm: vmscan: never wait on writeback pages".
>
> But in case of huge number of writebacked pages and
> memory pressure, this lead to busy loop: many process
> in the system are trying to shrink memory and have
> no success. And the node shows high time, spent in kernel.
>
> This patch reduces the number of processes, which may
> busy looping on shrink. Only one userspace process --
> vstorage -- will be allowed not to sleep on writeback.
> Other processes will sleep up to 5 seconds to wait
> writeback completion on every page.
>
> The detection of vstorage is very simple and it based
> on process name. It seems, there is no a way to detect
NAK. Detection by name is very very bad design style.
fused and others should mark iself as writeback-proof explicitly
via API similar ioctl/madvice/ionice/ulimit,
may be it is reasonable to place such app to speciffic cgroup,
you may pick any recepy you like. But please do not do comm-name
matching.
> all FUSE processes, especially from !ve0, because FUSE
> mount is tricky, and a process doing mount may not be
> a FUSE daemon. So, we remain the vanila kernel behaviour,
> but we don't wait forever, just 5 second. This will save
> us from lookup messages from kernel and will allow
> to kill FUSE daemon if necessary.
>
> https://jira.sw.ru/browse/PSBM-69296
>
> Signed-off-by: Kirill Tkhai
> ---
> mm/vmscan.c | 19 ++-
> 1 file changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index a5db5940bb1..e72d515c111 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -959,8 +959,16 @@ static unsigned long shrink_page_list(struct list_head
> *page_list,
>
> /* Case 3 above */
> } else {
> - nr_immediate++;
> - goto keep_locked;
> + /*
> + * Currently, vstorage is the only fuse process,
> + * exercising writeback; it mustn't sleep to
> avoid
> + * deadlocks.
> + */
> + if (!strncmp(current->comm, "vstorage", 8) ||
> + wait_on_page_bit_killable_timeout(page,
> PG_writeback, 5 * HZ) != 0) {
> + nr_immediate++;
> + goto keep_locked;
> + }
> }
> }
>
> @@ -1592,9 +1600,10 @@ shrink_inactive_list(unsigned long nr_to_scan, struct
> lruvec *lruvec,
> if (nr_writeback && nr_writeback == nr_taken)
> zone_set_flag(zone, ZONE_WRITEBACK);
>
> - if (!global_reclaim(sc) && nr_immediate)
> - congestion_wait(BLK_RW_ASYNC, HZ/10);
> -
> + /*
> + * memcg will stall in page writeback so only consider forcibly
> + * stalling for global reclaim
> + */
> if (global_reclaim(sc)) {
> /*
>* Tag a zone as congested if all the dirty pages scanned were
___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel