Hi,

I would start by analyzing the memory state at the time of the OOM. There should be some lines in the journal/syslog where the kernel dumps what memory looked like, and from that you can figure out why it had to kill a process.
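A rough sketch of what I'd run, assuming systemd's journald is collecting kernel messages (the exact wording of the OOM report varies a bit between kernel versions):

  # locate the OOM events in the current boot's kernel log
  journalctl -k -b | grep -in 'invoked oom-killer\|out of memory'

  # then inspect the full report around each hit: the kernel prints
  # Mem-Info, the per-zone free lists and a per-process RSS table there
  journalctl -k -b | grep -i -A 50 'invoked oom-killer'

That per-process table in particular tells you which tasks actually held the memory when the killer fired.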

It makes little sense that the OOM killer triggers on 64GB hosts with only ~24GB assigned to VMs and, probably, even less real usage. IMHO it's not the VMs that fill your memory up to the point of OOM, but something else: another process, the ZFS ARC, maybe even a memory leak. Some process causing severe memory fragmentation is another possibility.
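If the nodes use ZFS, the ARC is the first thing I'd check and, if needed, cap. A minimal sketch, assuming the standard ZFS-on-Linux paths and module parameters (the 8 GiB value below is just an example, not a recommendation):

  # current ARC size and its maximum target
  awk '/^(size|c_max) / {print $1, $3}' /proc/spl/kstat/zfs/arcstats

  # cap the ARC at runtime (example: 8 GiB)
  echo $((8*1024*1024*1024)) > /sys/module/zfs/parameters/zfs_arc_max

  # make the cap persistent across reboots
  echo "options zfs zfs_arc_max=8589934592" >> /etc/modprobe.d/zfs.conf
  update-initramfs -u

  # quick fragmentation check: mostly zeros in the right-hand
  # (higher-order) columns means large contiguous blocks are scarce
  cat /proc/buddyinfo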

Regards,



On 7/7/25 11:26, Marco Gaiarin wrote:
We have upgraded a set of clusters from PVE6 to PVE8, and we have found that
with the newer kernels the OOM killer is a bit more 'aggressive' and sometimes kills a VM.

Nodes have plenty of RAM (64GB; 2-3 VMs each, with 8GB of RAM per VM), the VMs
have the QEMU guest agent installed and ballooning enabled, but OOM still
happens from time to time. Clearly, if the OOM killer hits the main VM that
runs the local DNS, we get into trouble.


I've looked in the PVE wiki but found nothing. Is there some way to relax the
OOM killer, or to control its behaviour?

There is no swap on the nodes, so probably the best thing to do (but the hardest
one ;-) is to set up some swap with a lower swappiness, but I'm seeking
feedback.


Thanks.
