Recently a huge one-off slab drop was seen on some VFS metadata-heavy
workloads. It turned out that the shrinker was seeing a huge number of
accumulated nr_deferred objects.

I managed to reproduce this problem with a kernel build workload plus a
negative dentry generator.

First, run the kernel build test script below:

NR_CPUS=`cat /proc/cpuinfo | grep -e processor | wc -l`

cd /root/Buildarea/linux-stable

for i in `seq 1500`; do
        cgcreate -g memory:kern_build
        echo 4G > /sys/fs/cgroup/memory/kern_build/memory.limit_in_bytes

        echo 3 > /proc/sys/vm/drop_caches
        cgexec -g memory:kern_build make clean > /dev/null 2>&1
        cgexec -g memory:kern_build make -j$NR_CPUS > /dev/null 2>&1

        cgdelete -g memory:kern_build
done

That generates a huge number of deferred objects due to GFP_NOFS allocations.
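
A rough model of why such allocations pile up deferred work, as I understand
do_shrink_slab(): when the filesystem shrinker is called without __GFP_FS it
refuses to scan, and the scan target computed for that call is carried over in
nr_deferred instead. The sketch below is a toy model, not kernel code; the
values of freeable and seeks and the number of calls are made up for
illustration.

```shell
#!/bin/sh
# Toy model, not kernel code. All inputs are hypothetical.
freeable=4096000   # objects in the cache (made up)
priority=12        # DEF_PRIORITY, the starting reclaim priority
seeks=2            # assumption: DEFAULT_SEEKS
nr_deferred=0

# One shrinker call. $1 is 1 if the caller may enter the filesystem
# (__GFP_FS set), 0 otherwise.
shrink_once() {
        may_fs=$1
        delta=$(( (freeable >> priority) * 4 / seeks ))
        total=$(( nr_deferred + delta ))
        scanned=0
        if [ "$may_fs" -eq 1 ]; then
                scanned=$total    # simplification: everything gets scanned
        fi
        # Work that was not scanned is deferred to a later call.
        nr_deferred=$(( total - scanned ))
}

i=0
while [ "$i" -lt 1000 ]; do
        shrink_once 0             # reclaim from a NOFS context
        i=$(( i + 1 ))
done
echo "nr_deferred after 1000 NOFS calls: $nr_deferred"
```

In this model every refused call adds its full delta to nr_deferred, and
nothing drains it until a caller with __GFP_FS comes along.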

Then run the negative dentry generator script below:

NR_CPUS=`cat /proc/cpuinfo | grep -e processor | wc -l`

mkdir /sys/fs/cgroup/memory/test
echo $$ > /sys/fs/cgroup/memory/test/tasks

for i in `seq $NR_CPUS`; do
        while true; do
                FILE=`head /dev/urandom | tr -dc A-Za-z0-9 | head -c 64`
                cat $FILE 2>/dev/null
        done &
done
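
To check that the generator is doing its job, the dentry counters can be
sampled while it runs. This is a small helper sketch, not part of the original
reproducer: parse_dentry_state is a hypothetical name, and it assumes a kernel
new enough (5.0 or later, as far as I know) to report the negative dentry
count as the fifth field of /proc/sys/fs/dentry-state.

```shell
#!/bin/sh
# Hypothetical helper: pick the interesting fields out of one line in the
# /proc/sys/fs/dentry-state format:
#   nr_dentry nr_unused age_limit want_pages nr_negative dummy
parse_dentry_state() {
        set -- $1
        echo "nr_dentry=$1 nr_unused=$2 nr_negative=$5"
}

# Print one sample if the file is readable on this system.
if [ -r /proc/sys/fs/dentry-state ]; then
        parse_dentry_state "$(cat /proc/sys/fs/dentry-state)"
fi
```

On older kernels the fifth field is an unused placeholder, so nr_negative
would not be meaningful there.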

Then kswapd shrinks half of the dentry cache in just one loop, as the
tracing result below shows:

        kswapd0-475   [028] .... 305968.252561: mm_shrink_slab_start: super_cache_scan+0x0/0x190 0000000024acf00c: nid: 0 objects to shrink 4994376020 gfp_flags GFP_KERNEL cache items 93689873 delta 45746 total_scan 46844936 priority 12
        kswapd0-475   [021] .... 306013.099399: mm_shrink_slab_end: super_cache_scan+0x0/0x190 0000000024acf00c: nid: 0 unused scan count 4994376020 new scan count 4947576838 total_scan 8 last shrinker return val 46844928
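
The numbers in this trace can be reproduced by hand. The sketch below follows
my reading of do_shrink_slab() after commit 9092c71bb724; it is not a copy of
the kernel code. The seeks value of 2 (DEFAULT_SEEKS) and the batch size of
1024 are assumptions about the superblock shrinker that happen to match the
trace, and the model assumes every batch is scanned in full.

```shell
#!/bin/sh
# Inputs taken from the mm_shrink_slab_start line above.
nr_deferred=4994376020   # "objects to shrink"
freeable=93689873        # "cache items"
priority=12
seeks=2                  # assumption: DEFAULT_SEEKS
batch=1024               # assumption: superblock shrinker batch size

# Scan target for this call, derived from the reclaim priority.
delta=$(( (freeable >> priority) * 4 / seeks ))

# Deferred work from earlier calls is added on top of delta.
total_scan=$(( nr_deferred + delta ))
next_deferred=$total_scan

# delta is below freeable/4, so the total is capped at half the cache.
if [ "$delta" -lt $(( freeable / 4 )) ] && [ "$total_scan" -gt $(( freeable / 2 )) ]; then
        total_scan=$(( freeable / 2 ))
fi
capped=$total_scan

# Objects are scanned in whole batches; the remainder is left over.
scanned=$(( total_scan / batch * batch ))
leftover=$(( total_scan - scanned ))

# Whatever was not scanned is carried over to the next call.
if [ "$next_deferred" -ge "$scanned" ]; then
        next_deferred=$(( next_deferred - scanned ))
else
        next_deferred=0
fi

echo "delta=$delta total_scan=$capped scanned=$scanned leftover=$leftover new_deferred=$next_deferred"
# prints: delta=45746 total_scan=46844936 scanned=46844928 leftover=8 new_deferred=4947576838
```

So the priority-based target for this call was only 45746 objects; the other
roughly 46.8 million scanned objects came entirely from the deferred count,
limited only by the freeable/2 cap.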

There was a huge number of deferred objects before the shrinker was called.
The behavior does match the code, but it might not be desirable from the
user's point of view.

IIUC the deferred objects were used to keep a balance between slab and page
cache, but since commit 9092c71bb724dba2ecba849eae69e5c9d39bd3d2 ("mm: use
sc->priority for slab shrink targets") they have been decoupled.  As that
commit stated, "these two things have nothing to do with each other".

So why do we still have to keep it around?  I can think of one reason: a huge
amount of slab might accumulate if deferred objects are not taken into
account.  But nowadays most workloads are constrained by memcg, which can
limit kmem usage (by default now), so maintaining deferred objects does not
seem that useful anymore.  It seems we could remove it to simplify the
shrinker logic a lot.

I may be overlooking some other important use cases of nr_deferred; comments
are much appreciated.

