Dear all,
TL;DR/summary:
- Tuning vm.watermark_boost_factor to 0 (disable) on Debian
significantly improves performance on memory-intensive tasks that utilise
SWAP space, by stopping preemptive kswapd freeing of memory, and
subsequent page thrashing.
- I suggest that Debian should set vm.watermark_boost_factor=0 by default
to prevent this problem.
I have recently installed Debian 11 on an HP Z8 G4 Workstation (Z3Z16AV)
with 32GB RAM and ~120GB of SWAP on a 2TB solid state drive (specs at the
end of this message).
I have been running some compute-intensive image processing tasks (CPU- and
memory-intensive), which have on occasion had to dip into SWAP space,
depending on image sizes (the processing I am running is image registration
using elastix/transformix).
I had benchmarked the code on my Ubuntu laptop (similar spec) without any
problems, but when running on Debian, whenever SWAP was needed, the system
slowed down significantly and essentially froze.
After much debugging, I have traced this to the vm.watermark_boost_factor
kernel parameter:
Comparing the Ubuntu and Debian kernel parameters using sudo sysctl -a
showed two key differences in virtual memory (vm) management parameters.
- Ubuntu:
  - vm.swappiness=60
  - vm.watermark_boost_factor=0
- Debian:
  - vm.swappiness=10
  - vm.watermark_boost_factor=150
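For anyone wanting to repeat the comparison above, it can be scripted. The
following is only a minimal sketch, assuming the output of sysctl -a has
been saved to a file on each machine; the function name vm_diff and the
file names ubuntu.txt and debian.txt are made up for illustration.

```shell
# vm_diff FILE_A FILE_B
# Print every vm.* parameter whose value differs between two saved
# "sysctl -a" dumps (lines of the form "key = value").
# Hypothetical usage, after "sudo sysctl -a > ubuntu.txt" on one machine
# and "sudo sysctl -a > debian.txt" on the other:
#   vm_diff ubuntu.txt debian.txt
vm_diff() {
    awk -F' = ' '
        # First file: remember the value of each vm.* key.
        NR == FNR { if ($1 ~ /^vm\./) seen[$1] = $2; next }
        # Second file: report keys found in both files with different values.
        ($1 in seen) && seen[$1] != $2 {
            printf "%s: %s -> %s\n", $1, seen[$1], $2
        }
    ' "$1" "$2"
}
```

Parameters that exist on only one of the two systems are not reported by
this sketch.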
I identified what these two parameters control:
- vm.swappiness : a parameter used to calculate the swap tendency
  (https://access.redhat.com/solutions/103833)
- vm.watermark_boost_factor : controls the level of reclaim when memory
  is being fragmented. ... A boost factor of 0 will disable the feature.
  (https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/8.4_release_notes/kernel_parameters_changes)
I changed swappiness and then watermark_boost_factor sequentially, to see
whether tuning these parameters to match my Ubuntu system prevented the
system from freezing under my memory-intensive task.
- sudo sysctl vm.swappiness=60 on my Debian system did not prevent the
freezing behaviour.
- sudo sysctl vm.watermark_boost_factor=0 (disabling it) on my Debian
system prevented the freezing behaviour.
I then set these permanently by adding the following to /etc/sysctl.conf:
    vm.swappiness=60
    vm.watermark_boost_factor=0
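To confirm that the persistent settings are actually in effect (after a
reboot, or after reloading the file with sudo sysctl -p), the running
values can be read back from /proc/sys/vm. A minimal sketch follows; the
helper name check_vm and its optional directory argument are my own
invention for illustration, not part of any existing tool.

```shell
# check_vm NAME EXPECTED [DIR]
# Succeed if the running value of vm.NAME equals EXPECTED.
# DIR defaults to /proc/sys/vm; the optional argument exists only so that
# the function can be tried against a scratch directory.
check_vm() {
    dir=${3:-/proc/sys/vm}
    # Status 2 means the parameter could not be read at all.
    actual=$(cat "$dir/$1") || return 2
    if [ "$actual" = "$2" ]; then
        echo "vm.$1 = $actual (ok)"
    else
        echo "vm.$1 = $actual (expected $2)"
        return 1
    fi
}
```

For the values used here this would be called as
check_vm watermark_boost_factor 0 and check_vm swappiness 60.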
Further searching revealed this Ubuntu bug report, titled "swap storms
kills interactive use":
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1861359
With this key entry:
Sultan Alsawaf (kerneltoast) wrote on 2020-03-27: #56
This problem is caused by an upstream memory management feature called
watermark boosting. Normally, when a memory allocation fails and falls back
to the page allocator, the page allocator will wake up kswapd to free up
pages in order to make the memory allocation succeed. kswapd tries to free
memory until it reaches a minimum amount of memory for each memory zone
called the high watermark.
What watermark boosting does is try to preemptively fire up kswapd to free
memory when there hasn't been an allocation failure. It does this by
increasing kswapd's high watermark goal and then firing up kswapd. The
reason why this causes freezes is because, with the increased high
watermark goal, kswapd will steal memory from processes that need it in
order to make forward progress. These processes will, in turn, try to
allocate memory again, which will cause kswapd to steal necessary pages
from those processes again, in a positive feedback loop known as page
thrashing. When page thrashing occurs, your system is essentially
livelocked until the necessary forward progress can be made to stop
processes from trying to continuously allocate memory and trigger kswapd to
steal it back.
This problem already occurs with kswapd *without* watermark boosting, but
it's usually only encountered on machines with a small amount of memory
and/or a slow CPU. Watermark boosting just makes the existing problem worse
enough to notice on higher spec'd machines.
To fix the issue in this bug, watermark boosting can be disabled with the
following:
# echo 0 > /proc/sys/vm/watermark_boost_factor
There's really no harm in doing so, because watermark boosting is an
inherently broken feature...
So essentially, disabling watermark_boost_factor ensures effective swapping
and reduces page thrashing.
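One way to observe swap activity while a job is running, rather than
judging by responsiveness alone, is to sample the pswpin and pswpout
counters in /proc/vmstat (cumulative pages swapped in and out since boot).
This is only a sketch: the function name swap_counters is hypothetical,
and I did not use this script in the tests described above.

```shell
# swap_counters [FILE]
# Print the cumulative swap-in and swap-out page counters.
# FILE defaults to /proc/vmstat; the argument is optional and mainly
# useful for trying the function on a saved copy.
# Hypothetical usage, sampling twice ten seconds apart:
#   swap_counters; sleep 10; swap_counters
swap_counters() {
    awk '$1 == "pswpin" || $1 == "pswpout" { print $1 "=" $2 }' \
        "${1:-/proc/vmstat}"
}
```

Both counters climbing rapidly between samples while the job makes little
progress would be consistent with the thrashing described in the bug
report.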
*I therefore suggest that Debian should tune vm.watermark_boost_factor=0 by
default.*
Cheers,
Steve.
Below are some more detailed specs of my Debian machine for reference:
$ uname -a
Linux panseer 5.10.0-11-amd64 #1 SMP Debian 5.10.92-1 (2022-01-18) x86_64 GNU/Linux
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 20
On-line CPU(s) list: 0-19
Thread(s) per core: 2
Core(