Martin Schwarz <debian-li...@alias.kuroi.de> writes:
>
> Here's the output from some commands I hope to be helpful:
>
> The machine in this example is a RADIUS server but has not even gone
> productive ... no incoming client requests yet. (But the problem is not
> related to the RADIUS server software - OSC Radiator - since the same
> symptoms show on different machines: not only RADIUS servers but also
> nameservers, shell servers or jumphosts, etc.)
>
> [values while the problem persists:]
...
> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
> root 34718 12.0 0.5 29596 5672 ? D 09:01 0:00
> /usr/bin/python3 -Es /usr/bin/lsb_release --short --description
> root 26491 3.1 0.2 79328 2860 ? D 08:04 1:50 apt-get
> update -qq
> root 32551 6.8 0.2 119036 2800 ? D 08:51 0:43
> /usr/bin/python3 /usr/bin/unattended-upgrade

Disable this, and do your upgrades on a fixed schedule for as long as you're debugging this problem. Think about system orchestration tools with push mechanisms if you want to minimize the RAM allocated to VMs. We're thinking about deploying ansible for patch management.

> root 12792 2.2 0.1 159720 1748 ? D 06:06 3:54
> /usr/bin/perl -w /usr/bin/apt-show-versions -i
> root 15502 2.4 0.1 167660 1608 ? D 06:25 3:51
> /usr/bin/perl -w /usr/bin/apt-show-versions -i

Do they need to run at 6:06 and then again, in parallel, at 6:25? What does their process tree look like, i.e. what's starting them?

> root 34527 1.7 0.1 14096 1596 ? Ss 09:01 0:00 /bin/bash
> /usr/bin/check_mk_agent

Can you show a zoomed image of the memory graph prior to a problem? And a load graph of the same duration?

I had some webservers which were also prone to death spiraling. The only real solution was to throw RAM at them until they were able to process the requests, and to optimize the database indices to cut the time spent fetching and sorting rows.

Peter
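
One way to answer the process-tree question above is to walk the parent chain of every process stuck in state D (uninterruptible sleep) via /proc. What follows is only a minimal sketch, assuming a Linux /proc filesystem. The function names (parse_stat, ancestry, blocked_processes, report) are made up for illustration and do not come from any tool mentioned in this thread.

```python
import os


def parse_stat(text):
    """Parse the contents of /proc/<pid>/stat into (pid, comm, state, ppid)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lparen = text.index("(")
    rparen = text.rindex(")")
    pid = int(text[:lparen].strip())
    comm = text[lparen + 1:rparen]
    rest = text[rparen + 1:].split()
    state, ppid = rest[0], int(rest[1])
    return pid, comm, state, ppid


def read_stat(pid, proc_root="/proc"):
    """Read and parse the stat file of one process."""
    with open(os.path.join(proc_root, str(pid), "stat")) as f:
        return parse_stat(f.read())


def ancestry(pid, proc_root="/proc"):
    """Return [(pid, comm, state), ...] from the process up to its oldest ancestor."""
    chain = []
    while pid > 0:
        try:
            this_pid, comm, state, ppid = read_stat(pid, proc_root)
        except OSError:
            break  # process exited while we were looking at it
        chain.append((this_pid, comm, state))
        pid = ppid
    return chain


def blocked_processes(proc_root="/proc"):
    """Yield the PIDs of all processes currently in state D."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            pid, _comm, state, _ppid = read_stat(int(entry), proc_root)
        except OSError:
            continue
        if state == "D":
            yield pid


def report(proc_root="/proc"):
    """Return one line per blocked process, showing who started it."""
    lines = []
    for pid in blocked_processes(proc_root):
        chain = ancestry(pid, proc_root)
        lines.append(" <- ".join(f"{comm}[{p}]({state})" for p, comm, state in chain))
    return lines
```

Calling print("\n".join(report())) while the problem persists would list each blocked process followed by its parents, which should show whether cron, the monitoring agent or something else is launching the two apt-show-versions runs.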