Cc few more people

On Thu 01-08-19 15:42:50, Jan Hadrava wrote:
> There seems to be a bug in the mm/vmscan.c shrink_slab function when the
> kernel is compiled with CONFIG_MEMCG=y and memcg is then disabled at boot
> with the command-line parameter cgroup_disable=memory. SLABs are then not
> getting shrunk if the system memory is consumed by userspace.
This looks similar to
http://lkml.kernel.org/r/[email protected]
although the culprit commit has been identified to be different. Could you
try it out please? Maybe we need more fixes.

Keeping the rest of the email for reference:

> This issue is present in linux-stable 4.19 and all newer lines.
> (Tested on git tags v5.3-rc2, v5.2.5, v5.1.21 and v4.19.63.)
> It is not present in 4.14.135 (v4.14.135).
>
> Git bisect points to commit:
> b0dedc49a2daa0f44ddc51fbf686b2ef012fccbf
>
> Particularly the last hunk seems to be causing it:
>
> @@ -585,13 +657,7 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
>  			.memcg = memcg,
>  		};
>
> -		/*
> -		 * If kernel memory accounting is disabled, we ignore
> -		 * SHRINKER_MEMCG_AWARE flag and call all shrinkers
> -		 * passing NULL for memcg.
> -		 */
> -		if (memcg_kmem_enabled() &&
> -		    !!memcg != !!(shrinker->flags & SHRINKER_MEMCG_AWARE))
> +		if (!!memcg != !!(shrinker->flags & SHRINKER_MEMCG_AWARE))
>  			continue;
>
>  		if (!(shrinker->flags & SHRINKER_NUMA_AWARE))
>
> The following commit aeed1d325d429ac9699c4bf62d17156d60905519 deletes the
> conditional continue (and so it fixes the problem), but it creates a similar
> issue a few lines earlier:
>
> @@ -644,7 +642,7 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
>  	struct shrinker *shrinker;
>  	unsigned long freed = 0;
>
> -	if (memcg && !mem_cgroup_is_root(memcg))
> +	if (!mem_cgroup_is_root(memcg))
>  		return shrink_slab_memcg(gfp_mask, nid, memcg, priority);
>
>  	if (!down_read_trylock(&shrinker_rwsem))
>
> @@ -657,9 +655,6 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
>  			.memcg = memcg,
>  		};
>
> -		if (!!memcg != !!(shrinker->flags & SHRINKER_MEMCG_AWARE))
> -			continue;
> -
>  		if (!(shrinker->flags & SHRINKER_NUMA_AWARE))
>  			sc.nid = 0;
>
> How the bisection was done:
>
> - Compile the kernel with x86_64_defconfig + CONFIG_MEMCG=y.
> - Boot a VM with cgroup_disable=memory and a filesystem with 500k inodes,
>   then run a simple script on it:
>   - Observe the number of active objects of ext4_inode_cache
>     --> around 400, but anything under 1000 was accepted by the bisect script.
>   - Call `find / > /dev/null`.
>   - Again observe the number of active objects of ext4_inode_cache
>     --> around 7000, but anything over 6000 was accepted by the script.
>   - Consume the whole memory with a simple program `while(1){ malloc(1); }`
>     until it gets killed by the oom-killer.
>   - Again observe the number of active objects of ext4_inode_cache
>     --> around 7000, script threshold >= 6000 --> bug is there
>     --> around 100, script threshold <= 1000 --> bug not present
>
> Real-world appearance:
>
> We encountered this issue after upgrading the kernel from 4.9 to 4.19 on our
> backup server. (Debian Stretch userspace, upgraded to the Debian Buster
> distribution kernel or a custom-built 4.19.60.) The server has around 12 M
> used inodes and only 4 GB of RAM. Whenever we run the backup, memory gets
> quickly consumed by kernel SLABs (mainly ext4_inode_cache). Userspace then
> starts receiving a lot of hits from the oom-killer, so the server is
> completely unusable until reboot.
>
> We have just removed the cgroup_disable=memory parameter on our server. The
> memory footprint of memcg is significantly smaller than it used to be when we
> started using this parameter. But I still think that the described behaviour
> is a bug and should be fixed.
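If I read the quoted hunks right, with cgroup_disable=memory kmem accounting
is never enabled and global reclaim ends up calling shrink_slab() with
memcg == NULL, so without the memcg_kmem_enabled() escape hatch the bare
comparison skips every SHRINKER_MEMCG_AWARE shrinker, including the
superblock shrinker that reclaims inodes and dentries. A toy userspace sketch
of just that comparison (not the kernel source; the flag value and names are
simplified here):

  #include <stdbool.h>
  #include <stdio.h>

  #define SHRINKER_MEMCG_AWARE 0x2  /* illustrative; the real flag lives in include/linux/shrinker.h */

  /* Old check (before b0dedc49a2da): bypassed when kmem accounting is off. */
  static bool skip_old(bool kmem_enabled, void *memcg, unsigned int flags)
  {
  	return kmem_enabled && !!memcg != !!(flags & SHRINKER_MEMCG_AWARE);
  }

  /* New check (after b0dedc49a2da): NULL memcg skips all memcg-aware shrinkers. */
  static bool skip_new(void *memcg, unsigned int flags)
  {
  	return !!memcg != !!(flags & SHRINKER_MEMCG_AWARE);
  }

  int main(void)
  {
  	/* cgroup_disable=memory: kmem accounting off, memcg == NULL */
  	printf("old: %d new: %d\n",
  	       skip_old(false, NULL, SHRINKER_MEMCG_AWARE),  /* 0: shrinker still runs */
  	       skip_new(NULL, SHRINKER_MEMCG_AWARE));        /* 1: shrinker is skipped */
  	return 0;
  }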
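And in case anyone wants to retry the reproduction steps quoted above, here is
a self-contained variant of the memory hog (a sketch only; the original was
the bare while(1){ malloc(1); } loop, here each allocation is also touched so
the pages are definitely resident):

  #include <stdlib.h>
  #include <string.h>

  int main(void)
  {
  	const size_t chunk = 1 << 20;  /* 1 MiB per allocation */

  	for (;;) {
  		char *p = malloc(chunk);
  		if (!p)
  			continue;          /* keep the pressure up until the oom-killer fires */
  		memset(p, 0xaa, chunk);    /* touch the memory so it stays resident */
  	}
  }

The active object counts can be watched in parallel with something like
`grep ext4_inode_cache /proc/slabinfo` or slabtop.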
> By the way, it seems like the raspberrypi kernel was fighting this issue as
> well: https://github.com/raspberrypi/linux/issues/2829
> If I'm reading it correctly, they disabled memcg via the command line due to
> some memory leaks. A month later they hit this issue and re-enabled memcg.
>
> Thanks,
> Jan

--
Michal Hocko
SUSE Labs

