Re: [PATCH 1/2] Revert "mm: don't reclaim inodes with many attached pages"

Jan Kara Fri, 08 Feb 2019 01:55:33 -0800

On Thu 07-02-19 21:37:27, Andrew Morton wrote:
> On Thu, 7 Feb 2019 11:27:50 +0100 Jan Kara <[email protected]> wrote:
> 
> > On Fri 01-02-19 09:19:04, Dave Chinner wrote:
> > > Maybe for memcgs, but that's exactly the opposite of what we want to
> > > do for global caches (e.g. filesystem metadata caches). We need to
> > > make sure that a single, heavily pressured cache doesn't evict small
> > > caches that see lower pressure but are equally important for
> > > performance.
> > > 
> > > e.g. I've noticed recently a significant increase in RMW cycles in
> > > XFS inode cache writeback during various benchmarks. It hasn't
> > > affected performance because the machine has IO and CPU to burn, but
> > > on slower machines and storage, it will have a major impact.
> > 
> > Just as a data point, our performance testing infrastructure has bisected
> > down to the commits discussed in this thread as the cause of a roughly 40%
> > regression in XFS file delete performance in the bonnie++ benchmark.
> > 
> 
> Has anyone done significant testing with Rik's maybe-fix?


I will give it a spin with bonnie++ today. We'll see what comes out.

                                                                Honza

> 
> 
> 
> From: Rik van Riel <[email protected]>
> Subject: mm, slab, vmscan: accumulate gradual pressure on small slabs
> 
> There are a few issues with the way the number of slab objects to scan is
> calculated in do_shrink_slab.  First, for zero-seek slabs, we could leave
> the last object around forever.  That could result in pinning a dying
> cgroup into memory, instead of reclaiming it.  The fix for that is
> trivial.
> 
> Secondly, small slabs receive much more pressure, relative to their size,
> than larger slabs, due to "rounding up" the minimum number of scanned
> objects to batch_size.
> 
> We can keep the pressure on all slabs equal relative to their size by
> accumulating the scan pressure on small slabs over time, resulting in
> sometimes scanning an object, instead of always scanning several.
> 
> This results in lower system CPU use, and a lower major fault rate, as
> actively used entries from smaller caches get reclaimed less aggressively,
> and need to be reloaded/recreated less often.
> 
> [[email protected]: whitespace fixes, per Roman]
> [[email protected]: couple of fixes]
> Link: http://lkml.kernel.org/r/[email protected]
> Link: http://lkml.kernel.org/r/[email protected]
> Fixes: 4b85afbdacd2 ("mm: zero-seek shrinkers")
> Fixes: 172b06c32b94 ("mm: slowly shrink slabs with a relatively small number of objects")
> Signed-off-by: Rik van Riel <[email protected]>
> Tested-by: Chris Mason <[email protected]>
> Acked-by: Roman Gushchin <[email protected]>
> Acked-by: Johannes Weiner <[email protected]>
> Cc: Dave Chinner <[email protected]>
> Cc: Jonathan Lemon <[email protected]>
> Cc: Jan Kara <[email protected]>
> Cc: <[email protected]>
> 
> Signed-off-by: Andrew Morton <[email protected]>
> ---
> 
> 
> --- a/include/linux/shrinker.h~mmslabvmscan-accumulate-gradual-pressure-on-small-slabs
> +++ a/include/linux/shrinker.h
> @@ -65,6 +65,7 @@ struct shrinker {
>  
>       long batch;     /* reclaim batch size, 0 = default */
>       int seeks;      /* seeks to recreate an obj */
> +     int small_scan; /* accumulate pressure on slabs with few objects */
>       unsigned flags;
>  
>       /* These are for internal use */
> --- a/mm/vmscan.c~mmslabvmscan-accumulate-gradual-pressure-on-small-slabs
> +++ a/mm/vmscan.c
> @@ -488,18 +488,30 @@ static unsigned long do_shrink_slab(stru
>                * them aggressively under memory pressure to keep
>                * them from causing refetches in the IO caches.
>                */
> -             delta = freeable / 2;
> +             delta = (freeable + 1) / 2;
>       }
>  
>       /*
>        * Make sure we apply some minimal pressure on default priority
> -      * even on small cgroups. Stale objects are not only consuming memory
> +      * even on small cgroups, by accumulating pressure across multiple
> +      * slab shrinker runs. Stale objects are not only consuming memory
>        * by themselves, but can also hold a reference to a dying cgroup,
>        * preventing it from being reclaimed. A dying cgroup with all
>        * corresponding structures like per-cpu stats and kmem caches
>        * can be really big, so it may lead to a significant waste of memory.
>        */
> -     delta = max_t(unsigned long long, delta, min(freeable, batch_size));
> +     if (!delta && shrinker->seeks) {
> +             unsigned long nr_considered;
> +
> +             shrinker->small_scan += freeable;
> +             nr_considered = shrinker->small_scan >> priority;
> +
> +             delta = 4 * nr_considered;
> +             do_div(delta, shrinker->seeks);
> +
> +             if (delta)
> +                     shrinker->small_scan -= nr_considered << priority;
> +     }
>  
>       total_scan += delta;
>       if (total_scan < 0) {
> _
> 
-- 
Jan Kara <[email protected]>
SUSE Labs, CR
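
For reference, the accumulation logic in the quoted patch can be exercised
outside the kernel. The following is a minimal userspace sketch, not the
kernel code: `struct shrinker_sketch` and `scan_delta()` are hypothetical
names, plain division stands in for do_div(), small_scan is widened to
unsigned long, and the seeks != 0 base computation is an assumption about
the part of do_shrink_slab() that the diff does not show.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two struct shrinker fields the patch uses. */
struct shrinker_sketch {
	int seeks;                /* seeks to recreate an object */
	unsigned long small_scan; /* accumulated pressure (int in the patch) */
};

/*
 * Return how many objects to scan on this pass.
 * freeable: objects the cache reports as freeable
 * priority: reclaim priority (higher value means less pressure)
 */
static uint64_t scan_delta(struct shrinker_sketch *s,
			   unsigned long freeable, int priority)
{
	uint64_t delta;

	if (s->seeks) {
		/* Assumed base computation, not shown in the quoted diff. */
		delta = freeable >> priority;
		delta *= 4;
		delta /= s->seeks;
	} else {
		/* From the patch: round up so the last object is not kept forever. */
		delta = ((uint64_t)freeable + 1) / 2;
	}

	/* From the patch: accumulate pressure instead of rounding up to a batch. */
	if (!delta && s->seeks) {
		unsigned long nr_considered;

		s->small_scan += freeable;
		nr_considered = s->small_scan >> priority;

		delta = 4 * (uint64_t)nr_considered;
		delta /= s->seeks;

		/* Pressure that produced a scan is consumed; the rest carries over. */
		if (delta)
			s->small_scan -= nr_considered << priority;
	}

	return delta;
}
```

With seeks = 2 and priority = 12, a cache reporting 1000 freeable objects
gets a delta of 0 on the first four calls and 2 on the fifth, once the
accumulated 5000 shifted right by 12 reaches 1; the remaining 904 carries
over to later calls.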
