I have been observing occasional bouts of high load averages on several servers I administer and I am trying to find the cause. (I monitor these machines so that I can implement corrective measures in case of any malicious or abnormal activity. I think this is benign, but I’d still like to find the cause.)
Once the high load average starts, only a reboot seems to return the values to normal, and only temporarily. The actual CPU usage (as measured by vmstat) stays low even while the load average is elevated. The servers are VMs running on a VMware ESXi host. I have seen this with OpenBSD 7.3 and 7.4 amd64.

I cannot find anything inside the VM that causes this. There seems to be no correlation with pfstat(8) graphs, log entries, known events, or anything else I can think of. Restarting all of the rc.d services never made any difference.

Could this be caused by something on the VMware host machine? (The host seems to be operating at its limit regarding RAM, for example. But the VM is only using its normal percentage of allocated RAM: well below 100%, very constant usage, no swap.)

How can I debug this further, keeping in mind that these are production machines and experimentation is limited to benign things that don’t cause outages?

Thanks!
Mike
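
P.S. For reference, here is a minimal sketch of the kind of read-only sampling that could run on these machines to line up the load average with process states over time. The function names (load1, busy_count, sample) are made up for illustration. It assumes that uptime(1) prints "load averages:" (or "load average:") followed by the three values, and that ps(1) accepts "-o state" and marks runnable processes with R and uninterruptible waits with D. I have not verified which states actually feed the load average on these releases, so the count is only a hint.

```sh
#!/bin/sh
# Sketch only: helper names below are hypothetical, not part of any tool.

# Read uptime(1) output on stdin and print the 1-minute load average.
# Accepts both "load averages:" and "load average:" spellings.
load1() {
    sed -n 's/.*load averages\{0,1\}: *\([0-9.]*\).*/\1/p'
}

# Read a column of ps(1) state strings on stdin and print how many
# start with R (runnable) or D (uninterruptible wait).
# A header line such as "STAT" does not match and is ignored.
busy_count() {
    awk '$1 ~ /^[RD]/ { n++ } END { print n + 0 }'
}

# Print one timestamped sample; it only reads system state.
sample() {
    printf '%s load1=%s busy=%s\n' \
        "$(date '+%Y-%m-%dT%H:%M:%S')" \
        "$(uptime | load1)" \
        "$(ps -axo state | busy_count)"
}
```

Calling sample once a minute and appending the output to a file would give a timeline to compare against what the ESXi host is doing when the load climbs. If the load rises while the busy count stays near zero, that would at least suggest the cause is not an ordinary runaway process inside the VM.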