On 28.03.21 at 16:39, Stefan Esser wrote:
After a period of high load, my now idle system needs 4 to 10 seconds to run any trivial command - even after 20 minutes of no load ...

I have run some Monte-Carlo simulations for a few hours, initially with 35 processes running in parallel for some 10 seconds each. The load decreased over time since some parameter sets were faster to process. All in all, 63000 processes ran within some 3 hours.

When the system became idle, interactive performance was very bad. Running any trivial command (e.g. uptime) takes some 5 to 10 seconds. Since I need this system to be working, I plan to reboot it later today, but will keep it in this state for some more time to see whether this state persists or whether the system recovers from it.

Any ideas what might cause such a system state???
It seems that Mateusz Guzik was right to mention performance issues when the system is very low on vnodes. (Thanks!)

I have been able to reproduce the issue and have checked the vnode stats:

kern.maxvnodes: 620370
kern.minvnodes: 155092
vm.stats.vm.v_vnodepgsout: 6890171
vm.stats.vm.v_vnodepgsin: 18475530
vm.stats.vm.v_vnodeout: 228516
vm.stats.vm.v_vnodein: 1592444
vfs.wantfreevnodes: 155092
vfs.freevnodes: 47 <----- obviously too low
...
vfs.vnodes_created: 19554702
vfs.numvnodes: 621284
vfs.cache.debug.vnodes_cel_3_failures: 0
vfs.cache.stats.heldvnodes: 6412

The freevnodes value stayed in this range for several minutes, with typical program start times (e.g. for "uptime") of 10 to 15 seconds.

After raising maxvnodes from 600,000 to 2,000,000, the system performance is restored and I get:

kern.maxvnodes: 2000000
kern.minvnodes: 500000
vm.stats.vm.v_vnodepgsout: 7875198
vm.stats.vm.v_vnodepgsin: 20788679
vm.stats.vm.v_vnodeout: 261179
vm.stats.vm.v_vnodein: 1817599
vfs.wantfreevnodes: 500000
vfs.freevnodes: 205988 <----- still a lot lower than wantfreevnodes
vfs.vnodes_created: 19956502
vfs.numvnodes: 912880
vfs.cache.debug.vnodes_cel_3_failures: 0
vfs.cache.stats.heldvnodes: 20702

I do not know why the performance impact is so high: there are a few free vnodes (more than required to load the shared libraries needed to start, e.g., the uptime program). Most probably each attempt to get a vnode triggers a clean-up attempt that runs for a significant time but has no chance of getting anywhere near the goal of 155k or 500k free vnodes.

Anyway, kern.maxvnodes can be changed at run time, so this is easy to fix. It seems, however, that no message is logged to report this situation. A rate-limited hint to raise the limit should help other affected users.

Regards,
Stefan
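As a rough sketch of the rate-limited hint suggested in the message, the following Python snippet parses sysctl output in the "name: value" form shown in the listings and flags the state seen in the first snapshot. The shortage heuristic (vfs.numvnodes at or above kern.maxvnodes while vfs.freevnodes is below vfs.wantfreevnodes), the names parse_sysctl, vnode_shortage and RateLimitedHint, and the 60 second interval are all hypothetical choices made for illustration; they are not how the FreeBSD kernel detects or reports vnode pressure.

```python
import time


def parse_sysctl(text):
    """Parse "name: value" lines, as printed by sysctl(8), into a dict of ints.

    Lines without a colon or whose value is not a plain integer are skipped.
    """
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def vnode_shortage(stats):
    """Hypothetical heuristic: the vnode count has reached kern.maxvnodes
    while fewer vnodes are free than vfs.wantfreevnodes asks for."""
    return (stats["vfs.numvnodes"] >= stats["kern.maxvnodes"]
            and stats["vfs.freevnodes"] < stats["vfs.wantfreevnodes"])


class RateLimitedHint:
    """Hypothetical helper: hand out a message at most once per `interval` seconds."""

    def __init__(self, interval, clock=time.monotonic):
        self.interval = interval
        self.clock = clock
        self._last = None  # time of the last emitted message, None if never

    def maybe(self, message):
        """Return `message` if the interval has elapsed, otherwise None."""
        now = self.clock()
        if self._last is None or now - self._last >= self.interval:
            self._last = now
            return message
        return None


# Subsets of the two snapshots reported in the message.
BEFORE = """kern.maxvnodes: 620370
vfs.wantfreevnodes: 155092
vfs.freevnodes: 47
vfs.numvnodes: 621284"""

AFTER = """kern.maxvnodes: 2000000
vfs.wantfreevnodes: 500000
vfs.freevnodes: 205988
vfs.numvnodes: 912880"""

if __name__ == "__main__":
    hint = RateLimitedHint(interval=60.0)
    for label, text in (("before", BEFORE), ("after", AFTER)):
        stats = parse_sysctl(text)
        if vnode_shortage(stats):
            print(label, hint.maybe("vnode shortage: consider raising kern.maxvnodes"))
        else:
            print(label, "ok")
```

Applied to the two snapshots, this heuristic reports a shortage only for the first one. The limit itself is raised at run time with sysctl(8), e.g. "sysctl kern.maxvnodes=2000000", which is the change described in the message.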