Kirill Tkhai <ktk...@virtuozzo.com> writes:

> When a FUSE process is making shrink, it must not wait
> on page writeback. Otherwise, it may meet a page,
> that is being writebacked by him, and the process will stall.
>
> So, our kernel does not wait writeback after commit a9707947010d
> "mm: vmscan: never wait on writeback pages".
>
> But in case of huge number of writebacked pages and
> memory pressure, this lead to busy loop: many process
> in the system are trying to shrink memory and have
> no success. And the node shows high time, spent in kernel.
>
> This patch reduces the number of processes, which may
> busy looping on shrink. Only one userspace process --
> vstorage -- will be allowed not to sleep on writeback.
> Other processes will sleep up to 5 seconds to wait
> writeback completion on every page.
>
> The detection of vstorage is very simple and it based
> on process name. It seems, there is no a way to detect

NAK. Detection by name is very, very bad design style. fused and
others should mark themselves as writeback-proof explicitly via an API
similar to ioctl/madvise/ionice/ulimit; it may also be reasonable to
place such an app in a specific cgroup. You may pick any recipe you
like, but please do not do comm-name matching.
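To make the objection concrete, here is a minimal userspace sketch (not
kernel code) contrasting the two checks. Everything in it is
hypothetical and only for illustration: `struct task` stands in for the
relevant bits of the kernel's task_struct, and `writeback_proof` is an
assumed opt-in flag that some explicit interface would set. No such
flag exists in the patch under review.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the parts of task_struct used here. */
struct task {
	char comm[16];        /* process name, sized like TASK_COMM_LEN */
	bool writeback_proof; /* assumed flag, set only by an explicit opt-in */
};

/* The check from the patch: an 8-byte prefix match on the name. */
static bool skip_wait_by_name(const struct task *t)
{
	return strncmp(t->comm, "vstorage", 8) == 0;
}

/* The suggested alternative: honour only an explicit opt-in. */
static bool skip_wait_by_flag(const struct task *t)
{
	return t->writeback_proof;
}
```

With the prefix match, any process whose name merely starts with
"vstorage" skips the wait, while a real daemon under another name
(fused, for instance) does not. The flag-based check depends only on
what the process declared about itself.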
> all FUSE processes, especially from !ve0, because FUSE
> mount is tricky, and a process doing mount may not be
> a FUSE daemon. So, we remain the vanila kernel behaviour,
> but we don't wait forever, just 5 second. This will save
> us from lookup messages from kernel and will allow
> to kill FUSE daemon if necessary.
>
> https://jira.sw.ru/browse/PSBM-69296
>
> Signed-off-by: Kirill Tkhai <ktk...@virtuozzo.com>
> ---
>  mm/vmscan.c | 19 ++++++++++++++-----
>  1 file changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index a5db5940bb1..e72d515c111 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -959,8 +959,16 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>
>  			/* Case 3 above */
>  			} else {
> -				nr_immediate++;
> -				goto keep_locked;
> +				/*
> +				 * Currently, vstorage is the only fuse process,
> +				 * exercising writeback; it mustn't sleep to avoid
> +				 * deadlocks.
> +				 */
> +				if (!strncmp(current->comm, "vstorage", 8) ||
> +				    wait_on_page_bit_killable_timeout(page, PG_writeback, 5 * HZ) != 0) {
> +					nr_immediate++;
> +					goto keep_locked;
> +				}
>  			}
>  		}
>
> @@ -1592,9 +1600,10 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
>  	if (nr_writeback && nr_writeback == nr_taken)
>  		zone_set_flag(zone, ZONE_WRITEBACK);
>
> -	if (!global_reclaim(sc) && nr_immediate)
> -		congestion_wait(BLK_RW_ASYNC, HZ/10);
> -
> +	/*
> +	 * memcg will stall in page writeback so only consider forcibly
> +	 * stalling for global reclaim
> +	 */
>  	if (global_reclaim(sc)) {
>  		/*
>  		 * Tag a zone as congested if all the dirty pages scanned were