Re: Debian 11: Tuning kernel parameters swappiness and watermark_boost_factor to stop SWAP Storm

2022-01-29 Thread Steven J. West
Thanks, Nicholas :D  I mainly posted this problem with a solution here for
reference to other Debian users ;)

Yes, my processing memory requirements (typically ~30GB) are close to my
current physical memory limit (~32GB), and as my images are sometimes a
little bigger, this can push the system into swapping, which is indeed not
ideal.

I am modifying my code to batch this procedure to lower the gargantuan
memory requirements of my image registration task, but in the meantime, for
processes with large memory requirements that may occasionally need to
swap, I found this kernel tuning at least allows the process to complete.

Cheers!

Steve.


On Fri, 28 Jan 2022 at 23:56, Nicholas Geovanis 
wrote:

>
>
> On Fri, Jan 28, 2022, 4:33 AM Steven J. West 
> wrote:
>
>> Dear all,
>>
>> TL;DR/summary:
>>
>>    - Tuning vm.watermark_boost_factor to 0 (disable) on Debian
>>    significantly improves performance on memory-intensive tasks that utilise
>>    SWAP space, by stopping preemptive kswapd freeing of memory, and
>>    subsequent page thrashing.
>>    - I suggest that Debian should tune vm.watermark_boost_factor=0 by
>>    default to prevent this problem
>>
> I'm not a Debian maintainer, but this has got to be the best problem
> report I ever saw :-)
>
> But for years I have adopted the philosophy at home which is demanded in
> every data center I've worked in: If your Linux system is swapping, you
> have configured it wrong. In the server farms there is no swapping. You
> make sure you have enough RAM to prevent swapping. EOS.
>
>
>> I have recently installed Debian 11 on a HP Z8 G4 Workstation (Z3Z16AV) -
>> 32GB RAM, installed with ~120GB SWAP on a 2TB solid state drive (specs at
>> end of this message).
>>
>> I have been running some compute-intensive image processing tasks (CPU-
>> and memory- intensive), which has on occasion had to dip into SWAP space,
>> depending on image sizes (the processing I am running is image registration
>> using elastix/transformix).
>>
>> I had benchmarked the code on my Ubuntu laptop (similar spec) without any
>> problems, but when running on Debian, whenever SWAP was needed, the system
>> processing significantly slowed down/essentially froze.
>>
>> After much debugging, I have traced this to the vm.watermark_boost_factor
>> kernel parameter:
>>
>> Comparing the Ubuntu and Debian kernel parameters using sudo sysctl -a
>> showed two key differences in virtual memory (vm) management parameters.
>>
>>    - Ubuntu:
>>       - vm.swappiness=60
>>       - vm.watermark_boost_factor=0
>>    - Debian:
>>       - vm.swappiness=10
>>       - vm.watermark_boost_factor=150
>>
>>
>> I identified what these two parameters control:
>>
>>
>>    - vm.swappiness : a parameter used to calculate the swap tendency
>>    (https://access.redhat.com/solutions/103833)
>>    - vm.watermark_boost_factor : controls the level of reclaim when
>>    memory is being fragmented. A boost factor of 0 will disable the
>>    feature.
>>    (https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/8.4_release_notes/kernel_parameters_changes)
>>
>>
>> I changed swappiness and then watermark_boost_factor sequentially, to
>> see whether tuning these parameters to match my Ubuntu system prevented the
>> system from freezing under my memory-intensive task.
>>
>>
>>    - sudo sysctl vm.swappiness=60 on my Debian system did not prevent
>>    the freezing behaviour.
>>    - sudo sysctl vm.watermark_boost_factor=0 (disabling it) on my Debian
>>    system prevented the freezing behaviour.
>>
>>
>> I then set these permanently by adding the following to /etc/sysctl.conf
>>
>> vm.swappiness=60
>> vm.watermark_boost_factor=0
>>
>>
>> Further searching revealed this Ubuntu bug report:
>>
>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1861359
>>
>> swap storms kills interactive use
>> With this key entry:
>>
>> Sultan Alsawaf (kerneltoast) wrote on 2020-03-27: #56
>>
>> This problem is caused by an upstream memory management feature called
>> watermark boosting. Normally, when a memory allocation fails and falls back
>> to the page allocator, the page allocator will wake up kswapd to free up
>> pages in order to make the memory allocation succeed. kswapd tries to free
>> memory until it reaches a minimum amount of memory for each memory zone
>> called the high watermark.
>>
>> What watermark boosting does is try to preemptively fire up kswapd to
>> free memory when there hasn't been an allocation failure. It does this by
>> increasing kswapd's high watermark goal and then firing up kswapd. The
>> reason why this causes freezes is because, with the increased high
>> watermark goal, kswapd will steal memory from processes that need it in
>> order to make forward progress. These processes will, in turn, try to
>> allocate memory again, which will cause kswapd to steal necessary pages
>> from those processes again, in a positive feedback loop known as page
>> thrashing. When page thrashing o

Re: Fwd: Debian 11: Tuning kernel parameters swappiness and watermark_boost_factor to stop SWAP Storm

2022-01-29 Thread Marco Möller

On 28.01.22 22:55, Tixy wrote:
> On Fri, 2022-01-28 at 17:31 +0100, Marco Möller wrote:
>> On 28.01.22 11:15, Steven J. West wrote:
>>> Comparing the Ubuntu and Debian kernel parameters using sudo sysctl -a
>>> showed two key differences in virtual memory (vm) management
>>> parameters.
>>>
>>>   * Ubuntu:
>>>       o vm.swappiness=60
>>>       o vm.watermark_boost_factor=0
>>>   * Debian:
>>>       o vm.swappiness=10
>>>       o vm.watermark_boost_factor=150
>>
>> Might this "150" be a typographical error and you wanted to write 15?
>> Your reference to the Red Hat documentation states 15 to be the
>> default in Red Hat, and in my Debian, where I have not touched this
>> value, it is also set to 15.
>
> Might 15 be a typographical error too? ;-) On my machine...
>
> # cat /proc/sys/vm/watermark_boost_factor
> 15000
>
> Which matches the default in the kernel source code [1] and 'git blame'
> shows that line hasn't been changed since the original commit in 2018
> [2]
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/page_alloc.c?id=169387e2aa291a4e3cb856053730fe99d6cec06f#n354
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1c30844d2dfe272d58c8fc000960b835d13aa2ac



You are right. 15000 is the value I wanted to write. Shame on me. Sorry.
Marco



Re: Fwd: Debian 11: Tuning kernel parameters swappiness and watermark_boost_factor to stop SWAP Storm

2022-01-29 Thread Andrei POPESCU
On Fri, 28 Jan 22, 10:15:58, Steven J. West wrote:
> Dear all,
> 
> TL;DR/summary:
> 
>    - Tuning vm.watermark_boost_factor to 0 (disable) on Debian
>    significantly improves performance on memory-intensive tasks that utilise
>    SWAP space, by stopping preemptive kswapd freeing of memory, and
>    subsequent page thrashing.
>    - I suggest that Debian should tune vm.watermark_boost_factor=0 by default
>    to prevent this problem.

Hello,

This list is mostly for Debian users.

While some Debian Developers are reading and even actively engaging with 
the community (thanks!) you should probably send this either to 
debian-kernel or file it as a bug against the source package 'linux'.

(reportbug should do this by default if you point it to any linux-image 
package you have installed)

Kind regards,
Andrei
-- 
http://wiki.debian.org/FAQsFromDebianUser




Re: Debian 11: Tuning kernel parameters swappiness and watermark_boost_factor to stop SWAP Storm

2022-01-28 Thread Nicholas Geovanis
On Fri, Jan 28, 2022, 4:33 AM Steven J. West 
wrote:

> Dear all,
>
> TL;DR/summary:
>
>    - Tuning vm.watermark_boost_factor to 0 (disable) on Debian
>    significantly improves performance on memory-intensive tasks that utilise
>    SWAP space, by stopping preemptive kswapd freeing of memory, and
>    subsequent page thrashing.
>    - I suggest that Debian should tune vm.watermark_boost_factor=0 by
>    default to prevent this problem

I'm not a Debian maintainer, but this has got to be the best problem
report I ever saw :-)

But for years I have adopted the philosophy at home which is demanded in
every data center I've worked in: If your Linux system is swapping, you
have configured it wrong. In the server farms there is no swapping. You
make sure you have enough RAM to prevent swapping. EOS.


> I have recently installed Debian 11 on a HP Z8 G4 Workstation (Z3Z16AV) -
> 32GB RAM, installed with ~120GB SWAP on a 2TB solid state drive (specs at
> end of this message).
>
> I have been running some compute-intensive image processing tasks (CPU-
> and memory- intensive), which has on occasion had to dip into SWAP space,
> depending on image sizes (the processing I am running is image registration
> using elastix/transformix).
>
> I had benchmarked the code on my Ubuntu laptop (similar spec) without any
> problems, but when running on Debian, whenever SWAP was needed, the system
> processing significantly slowed down/essentially froze.
>
> After much debugging, I have traced this to the vm.watermark_boost_factor
> kernel parameter:
>
> Comparing the Ubuntu and Debian kernel parameters using sudo sysctl -a
> showed two key differences in virtual memory (vm) management parameters.
>
>    - Ubuntu:
>       - vm.swappiness=60
>       - vm.watermark_boost_factor=0
>    - Debian:
>       - vm.swappiness=10
>       - vm.watermark_boost_factor=150
>
>
> I identified what these two parameters control:
>
>
>    - vm.swappiness : a parameter used to calculate the swap tendency
>    (https://access.redhat.com/solutions/103833)
>    - vm.watermark_boost_factor : controls the level of reclaim when
>    memory is being fragmented. A boost factor of 0 will disable the feature.
>    (https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/8.4_release_notes/kernel_parameters_changes)
>
>
> I changed swappiness and then watermark_boost_factor sequentially, to see
> whether tuning these parameters to match my Ubuntu system prevented the
> system from freezing under my memory-intensive task.
>
>
>    - sudo sysctl vm.swappiness=60 on my Debian system did not prevent the
>    freezing behaviour.
>    - sudo sysctl vm.watermark_boost_factor=0 (disabling it) on my Debian
>    system prevented the freezing behaviour.
>
>
> I then set these permanently by adding the following to /etc/sysctl.conf
>
> vm.swappiness=60
> vm.watermark_boost_factor=0
>
>
> Further searching revealed this Ubuntu bug report:
>
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1861359
>
> swap storms kills interactive use
> With this key entry:
>
> Sultan Alsawaf (kerneltoast) wrote on 2020-03-27: #56
>
> This problem is caused by an upstream memory management feature called
> watermark boosting. Normally, when a memory allocation fails and falls back
> to the page allocator, the page allocator will wake up kswapd to free up
> pages in order to make the memory allocation succeed. kswapd tries to free
> memory until it reaches a minimum amount of memory for each memory zone
> called the high watermark.
>
> What watermark boosting does is try to preemptively fire up kswapd to free
> memory when there hasn't been an allocation failure. It does this by
> increasing kswapd's high watermark goal and then firing up kswapd. The
> reason why this causes freezes is because, with the increased high
> watermark goal, kswapd will steal memory from processes that need it in
> order to make forward progress. These processes will, in turn, try to
> allocate memory again, which will cause kswapd to steal necessary pages
> from those processes again, in a positive feedback loop known as page
> thrashing. When page thrashing occurs, your system is essentially
> livelocked until the necessary forward progress can be made to stop
> processes from trying to continuously allocate memory and trigger kswapd to
> steal it back.
>
> This problem already occurs with kswapd *without* watermark boosting, but
> it's usually only encountered on machines with a small amount of memory
> and/or a slow CPU. Watermark boosting just makes the existing problem worse
> enough to notice on higher spec'd machines.
>
> To fix the issue in this bug, watermark boosting can be disabled with the
> following:
> # echo 0 > /proc/sys/vm/watermark_boost_factor
>
> There's really no harm in doing so, because watermark boosting is an
> inherently broken feature...
>
>
> So essentially, disabling watermark_boost_factor ensures effective
> swapping and reduces page th

Re: Fwd: Debian 11: Tuning kernel parameters swappiness and watermark_boost_factor to stop SWAP Storm

2022-01-28 Thread Tixy
On Fri, 2022-01-28 at 17:31 +0100, Marco Möller wrote:
> On 28.01.22 11:15, Steven J. West wrote:
> > Comparing the Ubuntu and Debian kernel parameters using sudo sysctl
> > -a 
> > showed two key differences in virtual memory (vm) management
> > parameters.
> > 
> >   * Ubuntu:
> >       o vm.swappiness=60
> >       o vm.watermark_boost_factor=0
> >   * Debian:
> >       o vm.swappiness=10
> >       o vm.watermark_boost_factor=150
> 
> Might this "150" be a typographical error and you wanted to write
> 15?
> Your reference to the Red Hat documentation states 15 to be the 
> default in Red Hat, and in my Debian, where I have not touched this 
> value, it is also set to 15.

Might 15 be a typographical error too? ;-) On my machine...

# cat /proc/sys/vm/watermark_boost_factor 
15000

Which matches the default in the kernel source code [1] and 'git blame'
shows that line hasn't been changed since the original commit in 2018
[2]

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/page_alloc.c?id=169387e2aa291a4e3cb856053730fe99d6cec06f#n354
[2] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1c30844d2dfe272d58c8fc000960b835d13aa2ac
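
A note on units, as far as I can tell from the kernel's sysctl
documentation (Documentation/admin-guide/sysctl/vm.rst): the value is
expressed in fractions of 10,000, so the default of 15000 means that up to
150% of a zone's high watermark may be reclaimed. A small sketch of that
conversion follows; the helper name boost_percent is made up for
illustration.

```shell
#!/bin/sh
# Convert a raw watermark_boost_factor value (fractions of 10,000)
# into a percentage of the zone's high watermark.
# boost_percent is a hypothetical helper name, not an existing tool.
boost_percent() {
    # $1: raw value, e.g. as read from /proc/sys/vm/watermark_boost_factor
    # value / 10000 * 100 simplifies to value / 100
    echo "$(( $1 / 100 ))"
}

# Example on a live system:
#   boost_percent "$(cat /proc/sys/vm/watermark_boost_factor)"
```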

-- 
Tixy



Re: Fwd: Debian 11: Tuning kernel parameters swappiness and watermark_boost_factor to stop SWAP Storm

2022-01-28 Thread Marco Möller

On 28.01.22 11:15, Steven J. West wrote:
> Comparing the Ubuntu and Debian kernel parameters using sudo sysctl -a
> showed two key differences in virtual memory (vm) management parameters.
>
>   * Ubuntu:
>       o vm.swappiness=60
>       o vm.watermark_boost_factor=0
>   * Debian:
>       o vm.swappiness=10
>       o vm.watermark_boost_factor=150


Might this "150" be a typographical error and you wanted to write 15?
Your reference to the Red Hat documentation states 15 to be the 
default in Red Hat, and in my Debian, where I have not touched this 
value, it is also set to 15. My Debian was installed as Buster 
(maybe as even older Stretch?) and then upgraded to Bullseye.


Best wishes,
Marco



Fwd: Debian 11: Tuning kernel parameters swappiness and watermark_boost_factor to stop SWAP Storm

2022-01-28 Thread Steven J. West
Dear all,

TL;DR/summary:

   - Tuning vm.watermark_boost_factor to 0 (disable) on Debian
   significantly improves performance on memory-intensive tasks that utilise
   SWAP space, by stopping preemptive kswapd freeing of memory, and
   subsequent page thrashing.
   - I suggest that Debian should tune vm.watermark_boost_factor=0 by default
   to prevent this problem.


I have recently installed Debian 11 on a HP Z8 G4 Workstation (Z3Z16AV) -
32GB RAM, installed with ~120GB SWAP on a 2TB solid state drive (specs at
end of this message).

I have been running some compute-intensive image processing tasks (CPU- and
memory- intensive), which has on occasion had to dip into SWAP space,
depending on image sizes (the processing I am running is image registration
using elastix/transformix).

I had benchmarked the code on my Ubuntu laptop (similar spec) without any
problems, but when running on Debian, whenever SWAP was needed, the system
processing significantly slowed down/essentially froze.

After much debugging, I have traced this to the vm.watermark_boost_factor
kernel parameter:

Comparing the Ubuntu and Debian kernel parameters using sudo sysctl -a
showed two key differences in virtual memory (vm) management parameters.

   - Ubuntu:
      - vm.swappiness=60
      - vm.watermark_boost_factor=0
   - Debian:
      - vm.swappiness=10
      - vm.watermark_boost_factor=150
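
A comparison like this can be scripted. The following is only a minimal
sketch, assuming the output of sudo sysctl -a from each machine has been
saved to a file; the function name diff_vm_sysctl and the file names in the
comment are illustrative.

```shell
#!/bin/sh
# Print the vm.* settings that differ between two saved `sysctl -a` dumps.
# $1, $2: dump files, e.g. created with `sudo sysctl -a > debian.txt`
# diff_vm_sysctl is a hypothetical helper name.
diff_vm_sysctl() {
    tmp_a=$(mktemp)
    tmp_b=$(mktemp)
    # Keep only the virtual memory keys and sort them so diff lines up
    grep '^vm\.' "$1" | sort > "$tmp_a"
    grep '^vm\.' "$2" | sort > "$tmp_b"
    # diff exits with 1 when the files differ; remember that status
    status=0
    diff "$tmp_a" "$tmp_b" || status=$?
    rm -f "$tmp_a" "$tmp_b"
    return "$status"
}

# Example: diff_vm_sysctl ubuntu.txt debian.txt
```

Lines prefixed with "<" come from the first file and lines prefixed with
">" from the second.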


I identified what these two parameters control:


   - vm.swappiness : a parameter used to calculate the swap tendency
   (https://access.redhat.com/solutions/103833)
   - vm.watermark_boost_factor : controls the level of reclaim when memory
   is being fragmented. A boost factor of 0 will disable the feature.
   (https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/8.4_release_notes/kernel_parameters_changes)


I changed swappiness and then watermark_boost_factor sequentially, to see
whether tuning these parameters to match my Ubuntu system prevented the
system from freezing under my memory-intensive task.


   - sudo sysctl vm.swappiness=60 on my Debian system did not prevent the
   freezing behaviour.
   - sudo sysctl vm.watermark_boost_factor=0 (disabling it) on my Debian
   system prevented the freezing behaviour.
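
For anyone repeating this one-parameter-at-a-time test, it helps to record
the value before and after each change. A rough sketch, writing directly to
the /proc/sys/vm files (the same interface the sysctl command uses); the
function name set_vm_param and the SYSCTL_ROOT override are assumptions
added here so the function can be tried on a scratch directory first.

```shell
#!/bin/sh
# Show a vm.* parameter, apply a new value, and show it again.
# SYSCTL_ROOT is a hypothetical override for dry runs against a scratch
# directory; it defaults to the real /proc/sys.
set_vm_param() {
    name="$1"
    value="$2"
    file="${SYSCTL_ROOT:-/proc/sys}/vm/$name"
    echo "before: vm.$name=$(cat "$file")"
    echo "$value" > "$file"    # needs root on the real /proc/sys
    echo "after:  vm.$name=$(cat "$file")"
}

# Example (as root), changing one parameter per test run:
#   set_vm_param swappiness 60
#   set_vm_param watermark_boost_factor 0
```

Changes made this way last only until the next reboot.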


I then set these permanently by adding the following to /etc/sysctl.conf

vm.swappiness=60
vm.watermark_boost_factor=0
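
An alternative to editing /etc/sysctl.conf itself is a separate fragment
under /etc/sysctl.d/, which is easier to remove again later. A minimal
sketch, assuming a fragment name of 99-swap-tuning.conf (the name and the
helper write_vm_tuning are my own choices, not anything Debian ships):

```shell
#!/bin/sh
# Write the two settings to a sysctl configuration fragment.
# $1: destination file, e.g. /etc/sysctl.d/99-swap-tuning.conf
#     (hypothetical file name)
write_vm_tuning() {
    cat > "$1" <<'EOF'
# Match the Ubuntu values discussed in this thread
vm.swappiness=60
vm.watermark_boost_factor=0
EOF
}

# Example (as root); `sysctl --system` reloads all configuration files:
#   write_vm_tuning /etc/sysctl.d/99-swap-tuning.conf && sysctl --system
```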


Further searching revealed this Ubuntu bug report:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1861359

swap storms kills interactive use
With this key entry:

Sultan Alsawaf (kerneltoast) wrote on 2020-03-27: #56

This problem is caused by an upstream memory management feature called
watermark boosting. Normally, when a memory allocation fails and falls back
to the page allocator, the page allocator will wake up kswapd to free up
pages in order to make the memory allocation succeed. kswapd tries to free
memory until it reaches a minimum amount of memory for each memory zone
called the high watermark.

What watermark boosting does is try to preemptively fire up kswapd to free
memory when there hasn't been an allocation failure. It does this by
increasing kswapd's high watermark goal and then firing up kswapd. The
reason why this causes freezes is because, with the increased high
watermark goal, kswapd will steal memory from processes that need it in
order to make forward progress. These processes will, in turn, try to
allocate memory again, which will cause kswapd to steal necessary pages
from those processes again, in a positive feedback loop known as page
thrashing. When page thrashing occurs, your system is essentially
livelocked until the necessary forward progress can be made to stop
processes from trying to continuously allocate memory and trigger kswapd to
steal it back.

This problem already occurs with kswapd *without* watermark boosting, but
it's usually only encountered on machines with a small amount of memory
and/or a slow CPU. Watermark boosting just makes the existing problem worse
enough to notice on higher spec'd machines.

To fix the issue in this bug, watermark boosting can be disabled with the
following:
# echo 0 > /proc/sys/vm/watermark_boost_factor

There's really no harm in doing so, because watermark boosting is an
inherently broken feature...


So essentially, disabling watermark_boost_factor ensures effective swapping
and reduces page thrashing.
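
One way to check whether a swap storm is in progress is to watch the
cumulative swap-in/swap-out page counters (pswpin and pswpout) in
/proc/vmstat while the workload runs. This is only a rough sketch, not a
measurement I made for this report; the helper name swap_counters is
illustrative.

```shell
#!/bin/sh
# Print the cumulative swap-in and swap-out page counters.
# $1: vmstat-style file to read (defaults to /proc/vmstat)
# swap_counters is a hypothetical helper name.
swap_counters() {
    awk '$1 == "pswpin" || $1 == "pswpout" { printf "%s=%s\n", $1, $2 }' \
        "${1:-/proc/vmstat}"
}

# Example: sample every 5 seconds while the job runs
#   while sleep 5; do swap_counters; done
```

If both counters climb rapidly at the same time while the job makes little
progress, pages are being swapped out and straight back in, which is the
thrashing pattern described above.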

*I therefore suggest that Debian should tune vm.watermark_boost_factor=0 by
default.*

Cheers,

Steve.


Below are some more detailed specs of my Debian machine for reference:


  $ uname -a
Linux panseer 5.10.0-11-amd64 #1 SMP Debian 5.10.92-1 (2022-01-18) x86_64 GNU/Linux


  $ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
Address sizes:       46 bits physical, 48 bits virtual
CPU(s):              20
On-line CPU(s) list: 0-19
Thread(s) per core:  2
Core(