Re: [SOLVED] Re: Strange behavior after running under high load
Konstantin Belousov writes:
> > B) We lack a nuanced call-back to tell the subsystems to release some of
> > their memory "without major delay".
> The delay in the wall clock sense does not drive the issue.

I didn't say anything about "wall clock" and you're missing my point by a wide margin. We need to make major memory consumers, like vnodes, take action *before* shortages happen, so that *when* they happen, a lot of memory can be released to relieve them.

> We cannot expect any io to proceed while we are low on memory [...]

Which is precisely why the top-level goal should be for that to never happen, while still allowing the "freeable" memory to be used as a cache as much as possible.

> > C) We have never attempted to enlist userland, where jemalloc often hangs on
> > to a lot of unused VM pages.
> The userland does not add to this problem, [...]

No, but userland can help solve it: the unused pages from jemalloc/userland can very quickly be released to relieve any imminent shortage the kernel might have. As can pages from vnodes, and for that matter socket buffers.

But there are always costs: actual costs, i.e. what it will take to release the memory (locking, VM mappings, washing), and potential costs (lack of future caching opportunities). These costs need to be presented to the central memory allocator, so when it decides back-pressure is appropriate, it can decide who to punk for how much memory.

> But normally operating system does not have an issue with user pages.

Only if you disregard all non-UNIX operating systems. Many other kernels have cooperated with userland to balance memory (and for that matter disk space). Just imagine how much better the desktop experience would be if we could send SIGVM to firefox to tell it to stop being a memory-pig. (At least two of the major operating systems in the desktop world do something like that today.)

> Io latency is not the factor there.
> We must avoid situations where instantiating a vnode stalls waiting for KVA
> to appear, similarly we must avoid system state where vnodes allocation
> consumed so much kmem that other allocations stall.

My argument is the precise opposite: we must make vnodes, and the allocations they cause, responsive to the system's overall memory availability, well in advance of the shortage happening in the first place.

> Quite indicative is that we do not shrink the vnode list on low memory
> events. Vnlru also does not account for the memory pressure.

The only reason we do not is that we cannot tell definitively whether freeing a vnode will cause disk I/O (which may not matter with SSDs), or even how much memory it might free, if anything.

--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
p...@freebsd.org  | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
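[Editor's note: the cost-annotated reclaim call-back PHK argues for above could be sketched roughly as below. All names (`vm_consumer`, `vm_reclaim_offer`, `vm_pick_victim`) and the cost model are invented for illustration as a userland model; this is not an existing FreeBSD interface.]

```c
#include <stddef.h>

/*
 * Each major memory consumer (vnodes, socket buffers, jemalloc arenas, ...)
 * advertises how much it could free "without major delay" and at what cost
 * (locking, VM mappings, washing). Lower cost means cheaper to release.
 */
struct vm_reclaim_offer {
	size_t freeable_bytes;	/* releasable without major delay */
	unsigned cost;		/* relative cost of releasing it */
};

typedef void (*vm_reclaim_cb)(size_t wanted);

struct vm_consumer {
	const char *name;
	struct vm_reclaim_offer offer;
	vm_reclaim_cb reclaim;	/* call-back invoked under back-pressure */
};

/*
 * When the central allocator decides back-pressure is appropriate, it
 * picks the consumer that can satisfy the request at the lowest cost.
 * Returns an index into consumers[], or -1 if nobody can help.
 */
int
vm_pick_victim(const struct vm_consumer *consumers, int n, size_t wanted)
{
	int best = -1;

	for (int i = 0; i < n; i++) {
		if (consumers[i].offer.freeable_bytes < wanted)
			continue;
		if (best == -1 ||
		    consumers[i].offer.cost < consumers[best].offer.cost)
			best = i;
	}
	return (best);
}
```

A real version would have to refresh the offers continuously and aggregate them per NUMA domain, which is exactly the "summarizing tree" point A in the quoted message.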
Re: [SOLVED] Re: Strange behavior after running under high load
On Sun, Apr 04, 2021 at 07:01:44PM +, Poul-Henning Kamp wrote:
> Konstantin Belousov writes:
> > But what would you provide as the input for PID controller, and what would
> > be the targets?
>
> Viewing this purely as a vnode-related issue is wrong, this is about memory
> allocation in general.
>
> We may or may not want a PID regulator, but putting it on counts of vnodes
> would not improve things, precisely, as you point out, because the amount of
> memory a vnode ties up has enormous variance.
Yes

> We should focus on the end goal: To ensure "sufficient" memory can always be
> allocated for any purpose "without major delay".
and no

> Architecturally there are three major problems:
>
> A) While each subsystem generally has a good idea about memory that can be
> released "without major delay", the information does not trickle up through a
> summarizing NUMA-aware tree.
>
> B) We lack a nuanced call-back to tell the subsystems to release some of
> their memory "without major delay".

The delay in the wall clock sense does not drive the issue. We cannot expect any io to proceed while we are low on memory, in the sense that allocators cannot respond right now. More and more, our io subsystem requires allocating memory to make any progress with io. This is already quite bad with geom, although some hacks make it not too outstanding. It is very bad with ZFS, where swap on zvols causes deadlocks almost immediately.

> C) We have never attempted to enlist userland, where jemalloc often hangs on
> to a lot of unused VM pages.

Userland does not add to this problem, because pagedaemon typically has enough processing power to convert user-allocated pages into usable clean or free pages. Of course, if there is no swap and dirty anon pages cannot be laundered, the issue would accumulate. But normally the operating system does not have an issue with user pages.
> As far as vnodes go:
>
> It used to be that "without major delay" meant "without disk-I/O", which again
> led to the "dirty buffers/VM pages" heuristic.
>
> With microsecond SSD backing store, that heuristic is not only invalid, it is
> down-right harmful in many cases.
>
> GEOM maintains estimates of per-provider latency and VM+VFS should use that
> to schedule write-back so that more of it happens outside rush-hour, in order
> to increase the amount of memory which can be released "without major delay".
>
> Today that happens largely as a side effect of the periodic syncer, which
> does a really bad job at it, because it still expects VAX-era hardware
> performance and workloads.

IO latency is not the factor there. We must avoid situations where instantiating a vnode stalls waiting for KVA to appear; similarly, we must avoid a system state where vnode allocation has consumed so much kmem that other allocations stall. Quite indicative is that we do not shrink the vnode list on low memory events. Vnlru also does not account for memory pressure. The problem is that it is not clear how to express the relation between a safe allocator state and our desire to cache file system data, which is bound to the vnode identity.
Re: [SOLVED] Re: Strange behavior after running under high load
Konstantin Belousov writes:
> But what would you provide as the input for PID controller, and what would be
> the targets?

Viewing this purely as a vnode-related issue is wrong, this is about memory allocation in general.

We may or may not want a PID regulator, but putting it on counts of vnodes would not improve things, precisely, as you point out, because the amount of memory a vnode ties up has enormous variance.

We should focus on the end goal: To ensure "sufficient" memory can always be allocated for any purpose "without major delay".

Architecturally there are three major problems:

A) While each subsystem generally has a good idea about memory that can be released "without major delay", the information does not trickle up through a summarizing NUMA-aware tree.

B) We lack a nuanced call-back to tell the subsystems to release some of their memory "without major delay".

C) We have never attempted to enlist userland, where jemalloc often hangs on to a lot of unused VM pages.

As far as vnodes go:

It used to be that "without major delay" meant "without disk-I/O", which again led to the "dirty buffers/VM pages" heuristic.

With microsecond SSD backing store, that heuristic is not only invalid, it is down-right harmful in many cases.

GEOM maintains estimates of per-provider latency, and VM+VFS should use that to schedule write-back so that more of it happens outside rush-hour, in order to increase the amount of memory which can be released "without major delay".

Today that happens largely as a side effect of the periodic syncer, which does a really bad job at it, because it still expects VAX-era hardware performance and workloads.

--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
p...@freebsd.org  | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
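[Editor's note: the latency-driven write-back scheduling suggested above could be gated on something as simple as the sketch below. The 3x-idle-latency threshold and the 90% dirty override are invented numbers for illustration; GEOM's actual latency statistics are not consulted here.]

```c
#include <stdbool.h>

/*
 * Decide whether asynchronous write-back should be deferred because the
 * provider is currently in "rush-hour". If too much memory is dirty we
 * write anyway, since the whole point is to keep memory releasable
 * "without major delay".
 */
bool
wb_should_defer(double cur_latency_us, double idle_latency_us,
    double dirty_ratio)
{
	if (dirty_ratio > 0.9)		/* too much dirty memory: write now */
		return (false);
	/* defer while measured latency is well above the idle baseline */
	return (cur_latency_us > 3.0 * idle_latency_us);
}
```

A syncer built around this would naturally push write-back into quiet periods instead of firing on a fixed VAX-era period.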
Re: [SOLVED] Re: Strange behavior after running under high load
On Sun, Apr 04, 2021 at 08:45:41AM -0600, Warner Losh wrote:
> On Sun, Apr 4, 2021, 5:51 AM Mateusz Guzik wrote:
> > On 4/3/21, Poul-Henning Kamp wrote:
> > > Mateusz Guzik writes:
> > > > It is high because of this:
> > > > msleep(&vnlruproc_sig, &vnode_list_mtx, PVFS, "vlruwk", hz);
> > > > i.e. it literally sleeps for 1 second.
> > >
> > > Before the line looked like that, it slept on "lbolt" aka "lightning bolt" which was woken once a second.
> > >
> > > The calculations which come up with those "constants" have always been utterly bogus math, not quite "square-root of shoe-size times sun-angle in Patagonia", but close.
> > >
> > > The original heuristic came from university environments with tons of students doing assignments and nethack behind VT102 terminals, on filesystems where files only seldom grew past 100KB, so it made sense to scale the number of vnodes to how much RAM was in the system, because that also scaled the size of the buffer-cache.
> > >
> > > With a merged VM buffer-cache, whatever validity that heuristic had was lost, and we tweaked the bogomath in various ways until it seemed to mostly work, trusting the users for which it did not to tweak things themselves.
> > >
> > > Please don't tweak the Finagle Constants again.
> > >
> > > Rip all that crap out and come up with something fundamentally better.
> >
> > Some level of pacing is probably useful to control total memory use -- there can be A LOT of memory tied up in the mere fact that a vnode is fully cached. IMO the thing to do is to come up with some watermarks, to be revisited every 1-2 years, and to change the behavior when they get exceeded -- try to whack some stuff, but in the face of trouble just go ahead and allocate without the 1-second sleep. Should the load spike sort itself out, vnlru will slowly get things down to the watermark. If the watermark is too low, maybe it can autotune.
> > Bottom line is that even with the current idea of limiting the preferred total vnode count, the corner-case behavior can be drastically better: suffering SOME perf loss from recycling vnodes, but not sleeping for a second for every single one.
>
> I'd suggest that going directly to a PID controller would be better than the watermarks. That would give a smoother response than high/low watermarks would. While you'd still need some level to keep things at, the laundry stuff has shown that the precise value of that level is less critical than with watermarks.

But what would you provide as the input for the PID controller, and what would be the targets?

The main reason for the (almost) hard cap on the number of vnodes is not that an excessive number of vnodes is harmful by itself. Each allocated vnode typically implies the existence of several second-order allocations that accumulate into significant KVA usage:
- filesystem inode
- vm object
- namecache entries

There are usually even more allocations, third-order; for instance, a UFS inode carries a pointer to the dinode copy in RAM, and possibly an EA area. And of course, the vnode names pages in the page cache owned by the corresponding file, i.e. the number of allocated vnodes regulates the amount of work for pagedaemon.

We are currently trying to put some rational limit on the total number of vnodes, estimating both the KVA and the physical memory consumed by them. If you remove that limit, you need to ensure that we do not create an OOM situation, either for KVA or for physical memory, just by creating too many vnodes; otherwise the system cannot get out of it. So there are some combinations of machine config (RAM) and loads where the default settings are arguably low. Raising the limits needs to handle the indirect resource usage from vnodes.
I do not know how to write the feedback formula taking into account all the consequences of a vnode's existence, and the effects depend also on the underlying filesystem and the patterns of VM paging usage. In this sense ZFS is probably the simplest case, because its caching subsystem is autonomous, while UFS or NFS are tightly integrated with VM.

> Warner
>
> > I think the notion of 'struct vnode' being a separately allocated object is not very useful, and it comes with complexity (and happens to suffer from several bugs).
> >
> > That said, the easiest and safest thing to do in the meantime is to bump the limit. Perhaps the sleep can be whacked as it is, which would largely sort it out.
> >
> > --
> > Mateusz Guzik
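[Editor's note: for reference, the PID regulator discussed above would, in its simplest form, look something like this sketch. The gains and the choice of input (estimated bytes tied down by vnodes against a byte target) are placeholders, not values anyone has tuned, and this is a userland model rather than kernel code.]

```c
/*
 * Minimal PID controller sketch. Input: estimated memory consumed by
 * vnodes and their second-order allocations; target: the budget we are
 * willing to spend on them. Output: how many bytes' worth of vnodes
 * vnlru should try to recycle this interval.
 */
struct pid {
	double kp, ki, kd;	/* proportional/integral/derivative gains */
	double integ;		/* accumulated error */
	double prev_err;	/* error from the previous step */
};

double
pid_step(struct pid *p, double target_bytes, double used_bytes)
{
	double err = used_bytes - target_bytes;	/* positive: over budget */
	double deriv = err - p->prev_err;
	double out;

	p->integ += err;
	p->prev_err = err;
	out = p->kp * err + p->ki * p->integ + p->kd * deriv;
	return (out > 0 ? out : 0);	/* we can only recycle, not "un-recycle" */
}
```

The open question Konstantin raises is precisely that `used_bytes` is hard to estimate, since it spans filesystem inodes, vm objects, namecache entries and cached pages.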
Re: [SOLVED] Re: Strange behavior after running under high load
On Sun, Apr 4, 2021, 5:51 AM Mateusz Guzik wrote:
> On 4/3/21, Poul-Henning Kamp wrote:
> > Mateusz Guzik writes:
> > > It is high because of this:
> > > msleep(&vnlruproc_sig, &vnode_list_mtx, PVFS, "vlruwk", hz);
> > > i.e. it literally sleeps for 1 second.
> >
> > Before the line looked like that, it slept on "lbolt" aka "lightning bolt" which was woken once a second.
> >
> > The calculations which come up with those "constants" have always been utterly bogus math, not quite "square-root of shoe-size times sun-angle in Patagonia", but close.
> >
> > The original heuristic came from university environments with tons of students doing assignments and nethack behind VT102 terminals, on filesystems where files only seldom grew past 100KB, so it made sense to scale the number of vnodes to how much RAM was in the system, because that also scaled the size of the buffer-cache.
> >
> > With a merged VM buffer-cache, whatever validity that heuristic had was lost, and we tweaked the bogomath in various ways until it seemed to mostly work, trusting the users for which it did not to tweak things themselves.
> >
> > Please don't tweak the Finagle Constants again.
> >
> > Rip all that crap out and come up with something fundamentally better.
>
> Some level of pacing is probably useful to control total memory use -- there can be A LOT of memory tied up in the mere fact that a vnode is fully cached. IMO the thing to do is to come up with some watermarks, to be revisited every 1-2 years, and to change the behavior when they get exceeded -- try to whack some stuff, but in the face of trouble just go ahead and allocate without the 1-second sleep. Should the load spike sort itself out, vnlru will slowly get things down to the watermark. If the watermark is too low, maybe it can autotune.
> Bottom line is that even with the current idea of limiting the preferred total vnode count, the corner-case behavior can be drastically better: suffering SOME perf loss from recycling vnodes, but not sleeping for a second for every single one.

I'd suggest that going directly to a PID controller would be better than the watermarks. That would give a smoother response than high/low watermarks would. While you'd still need some level to keep things at, the laundry stuff has shown that the precise value of that level is less critical than with watermarks.

Warner

> I think the notion of 'struct vnode' being a separately allocated object is not very useful, and it comes with complexity (and happens to suffer from several bugs).
>
> That said, the easiest and safest thing to do in the meantime is to bump the limit. Perhaps the sleep can be whacked as it is, which would largely sort it out.
>
> --
> Mateusz Guzik
Re: [SOLVED] Re: Strange behavior after running under high load
On 4/3/21, Poul-Henning Kamp wrote:
> Mateusz Guzik writes:
> > It is high because of this:
> > msleep(&vnlruproc_sig, &vnode_list_mtx, PVFS, "vlruwk", hz);
> > i.e. it literally sleeps for 1 second.
>
> Before the line looked like that, it slept on "lbolt" aka "lightning bolt" which was woken once a second.
>
> The calculations which come up with those "constants" have always been utterly bogus math, not quite "square-root of shoe-size times sun-angle in Patagonia", but close.
>
> The original heuristic came from university environments with tons of students doing assignments and nethack behind VT102 terminals, on filesystems where files only seldom grew past 100KB, so it made sense to scale the number of vnodes to how much RAM was in the system, because that also scaled the size of the buffer-cache.
>
> With a merged VM buffer-cache, whatever validity that heuristic had was lost, and we tweaked the bogomath in various ways until it seemed to mostly work, trusting the users for which it did not to tweak things themselves.
>
> Please don't tweak the Finagle Constants again.
>
> Rip all that crap out and come up with something fundamentally better.

Some level of pacing is probably useful to control total memory use -- there can be A LOT of memory tied up in the mere fact that a vnode is fully cached. IMO the thing to do is to come up with some watermarks, to be revisited every 1-2 years, and to change the behavior when they get exceeded -- try to whack some stuff, but in the face of trouble just go ahead and allocate without the 1-second sleep. Should the load spike sort itself out, vnlru will slowly get things down to the watermark. If the watermark is too low, maybe it can autotune. Bottom line is that even with the current idea of limiting the preferred total vnode count, the corner-case behavior can be drastically better: suffering SOME perf loss from recycling vnodes, but not sleeping for a second for every single one.
I think the notion of 'struct vnode' being a separately allocated object is not very useful, and it comes with complexity (and happens to suffer from several bugs).

That said, the easiest and safest thing to do in the meantime is to bump the limit. Perhaps the sleep can be whacked as it is, which would largely sort it out.

--
Mateusz Guzik
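[Editor's note: the watermark-based pacing sketched in this message amounts to a three-way allocation policy; the sketch below is illustrative only, with invented names and thresholds, not the actual vnode allocation path.]

```c
/*
 * Three-way vnode allocation policy: below the soft watermark allocate
 * immediately; between soft and hard, kick vnlru to recycle but still
 * allocate rather than msleep()ing for a second per vnode; only at the
 * hard limit does the caller actually wait.
 */
enum vn_alloc_action {
	VN_ALLOC_FAST,			/* plenty of headroom */
	VN_ALLOC_RECYCLE_THEN_ALLOC,	/* over soft limit: recycle, no sleep */
	VN_ALLOC_WAIT			/* truly out of budget */
};

enum vn_alloc_action
vn_alloc_policy(long numvnodes, long soft_limit, long hard_limit)
{
	if (numvnodes < soft_limit)
		return (VN_ALLOC_FAST);
	if (numvnodes < hard_limit)
		return (VN_ALLOC_RECYCLE_THEN_ALLOC);
	return (VN_ALLOC_WAIT);
}
```

A load spike then costs some recycling overhead in the middle band instead of a one-second stall per allocation, and vnlru drains the excess back toward the soft limit afterwards.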
Re: [SOLVED] Re: Strange behavior after running under high load
Mateusz Guzik writes:
> It is high because of this:
> msleep(&vnlruproc_sig, &vnode_list_mtx, PVFS, "vlruwk", hz);
>
> i.e. it literally sleeps for 1 second.

Before the line looked like that, it slept on "lbolt" aka "lightning bolt" which was woken once a second.

The calculations which come up with those "constants" have always been utterly bogus math, not quite "square-root of shoe-size times sun-angle in Patagonia", but close.

The original heuristic came from university environments with tons of students doing assignments and nethack behind VT102 terminals, on filesystems where files only seldom grew past 100KB, so it made sense to scale the number of vnodes to how much RAM was in the system, because that also scaled the size of the buffer-cache.

With a merged VM buffer-cache, whatever validity that heuristic had was lost, and we tweaked the bogomath in various ways until it seemed to mostly work, trusting the users for which it did not to tweak things themselves.

Please don't tweak the Finagle Constants again.

Rip all that crap out and come up with something fundamentally better.

--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
p...@freebsd.org  | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
Re: [SOLVED] Re: Strange behavior after running under high load
On 4/2/21, Stefan Esser wrote:
> Am 28.03.21 um 16:39 schrieb Stefan Esser:
> > After a period of high load, my now idle system needs 4 to 10 seconds to run any trivial command - even after 20 minutes of no load ...
> >
> > I have run some Monte-Carlo simulations for a few hours, with initially 35 processes running in parallel for some 10 seconds each.
> >
> > The load decreased over time since some parameter sets were faster to process. All in all 63000 processes ran within some 3 hours.
> >
> > When the system became idle, interactive performance was very bad. Running any trivial command (e.g. uptime) takes some 5 to 10 seconds. Since I have to have this system working, I plan to reboot it later today, but will keep it in this state for some more time to see whether this state persists or whether the system recovers from it.
> >
> > Any ideas what might cause such a system state???
>
> Seems that Mateusz Guzik was right to mention performance issues when the system is very low on vnodes. (Thanks!)
>
> I have been able to reproduce the issue and have checked vnode stats:
>
> kern.maxvnodes: 620370
> kern.minvnodes: 155092
> vm.stats.vm.v_vnodepgsout: 6890171
> vm.stats.vm.v_vnodepgsin: 18475530
> vm.stats.vm.v_vnodeout: 228516
> vm.stats.vm.v_vnodein: 1592444
> vfs.wantfreevnodes: 155092
> vfs.freevnodes: 47            <- obviously too low ...
> vfs.vnodes_created: 19554702
> vfs.numvnodes: 621284
> vfs.cache.debug.vnodes_cel_3_failures: 0
> vfs.cache.stats.heldvnodes: 6412
>
> The freevnodes value stayed in this region over several minutes, with typical program start times (e.g. for "uptime") in the region of 10 to 15 seconds.
> After raising maxvnodes to 2,000,000 from 600,000 the system performance is restored and I get:
>
> kern.maxvnodes: 2000000
> kern.minvnodes: 500000
> vm.stats.vm.v_vnodepgsout: 7875198
> vm.stats.vm.v_vnodepgsin: 20788679
> vm.stats.vm.v_vnodeout: 261179
> vm.stats.vm.v_vnodein: 1817599
> vfs.wantfreevnodes: 500000
> vfs.freevnodes: 205988        <- still a lot higher than wantfreevnodes
> vfs.vnodes_created: 19956502
> vfs.numvnodes: 912880
> vfs.cache.debug.vnodes_cel_3_failures: 0
> vfs.cache.stats.heldvnodes: 20702
>
> I do not know why the performance impact is so high - there are a few free vnodes (more than required for the shared libraries to start e.g. the uptime program). Most probably each attempt to get a vnode triggers a clean-up attempt that runs for a significant time, but has no chance to actually reach near the goal of 155k or 500k free vnodes.

It is high because of this:

    msleep(&vnlruproc_sig, &vnode_list_mtx, PVFS, "vlruwk", hz);

i.e. it literally sleeps for 1 second. The vnode limit is probably too conservative, and the behavior when the limit is reached is rather broken. Probably the thing to do is to let allocations go through while kicking vnlru to free some stuff up. I'll have to sleep on it.

> Anyway, kern.maxvnodes can be changed at run-time and it is thus easy to fix. It seems that no message is logged to report this situation. A rate-limited hint to raise the limit should help other affected users.
>
> Regards, STefan

--
Mateusz Guzik
[SOLVED] Re: Strange behavior after running under high load
Am 28.03.21 um 16:39 schrieb Stefan Esser:
> After a period of high load, my now idle system needs 4 to 10 seconds to run any trivial command - even after 20 minutes of no load ...
>
> I have run some Monte-Carlo simulations for a few hours, with initially 35 processes running in parallel for some 10 seconds each.
>
> The load decreased over time since some parameter sets were faster to process. All in all 63000 processes ran within some 3 hours.
>
> When the system became idle, interactive performance was very bad. Running any trivial command (e.g. uptime) takes some 5 to 10 seconds. Since I have to have this system working, I plan to reboot it later today, but will keep it in this state for some more time to see whether this state persists or whether the system recovers from it.
>
> Any ideas what might cause such a system state???

Seems that Mateusz Guzik was right to mention performance issues when the system is very low on vnodes. (Thanks!)

I have been able to reproduce the issue and have checked vnode stats:

kern.maxvnodes: 620370
kern.minvnodes: 155092
vm.stats.vm.v_vnodepgsout: 6890171
vm.stats.vm.v_vnodepgsin: 18475530
vm.stats.vm.v_vnodeout: 228516
vm.stats.vm.v_vnodein: 1592444
vfs.wantfreevnodes: 155092
vfs.freevnodes: 47            <- obviously too low ...
vfs.vnodes_created: 19554702
vfs.numvnodes: 621284
vfs.cache.debug.vnodes_cel_3_failures: 0
vfs.cache.stats.heldvnodes: 6412

The freevnodes value stayed in this region over several minutes, with typical program start times (e.g. for "uptime") in the region of 10 to 15 seconds.
After raising maxvnodes to 2,000,000 from 600,000 the system performance is restored and I get:

kern.maxvnodes: 2000000
kern.minvnodes: 500000
vm.stats.vm.v_vnodepgsout: 7875198
vm.stats.vm.v_vnodepgsin: 20788679
vm.stats.vm.v_vnodeout: 261179
vm.stats.vm.v_vnodein: 1817599
vfs.wantfreevnodes: 500000
vfs.freevnodes: 205988        <- still a lot higher than wantfreevnodes
vfs.vnodes_created: 19956502
vfs.numvnodes: 912880
vfs.cache.debug.vnodes_cel_3_failures: 0
vfs.cache.stats.heldvnodes: 20702

I do not know why the performance impact is so high - there are a few free vnodes (more than required for the shared libraries to start e.g. the uptime program). Most probably each attempt to get a vnode triggers a clean-up attempt that runs for a significant time, but has no chance to actually reach near the goal of 155k or 500k free vnodes.

Anyway, kern.maxvnodes can be changed at run-time and it is thus easy to fix. It seems that no message is logged to report this situation. A rate-limited hint to raise the limit should help other affected users.

Regards, STefan
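[Editor's note: for anyone else hitting this, the run-time fix described above is a single sysctl. The value 2000000 is simply the one from this report, not a general recommendation; size it to your RAM and workload.]

```shell
# Raise the vnode limit immediately (as root):
sysctl kern.maxvnodes=2000000

# Make the new limit persistent across reboots:
echo 'kern.maxvnodes=2000000' >> /etc/sysctl.conf
```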
Re: Strange behavior after running under high load
Am 29.03.21 um 08:45 schrieb Andrea Venturoli:
> On 3/28/21 4:39 PM, Stefan Esser wrote:
> > After a period of high load, my now idle system needs 4 to 10 seconds to run any trivial command - even after 20 minutes of no load ...
>
> High CPU load or high disk load?

High CPU load, 3 times the number of CPU threads in this particular batch run. Less than 10 files of less than 100 KB per second have been written.

> ZFS? Snapshots?

ZFS and automatic snapshots of the file system every hour.

> 12.x? 13.x?

-CURRENT as of some 24 hours before the issue occurred: FreeBSD 14.0-CURRENT #33 main-n245694-90d2f7c413f9-dirty: Sat Mar 27 15:35:37 CET 2021

> I've seen something similar: after a high load period, system crawled so much that services were not answering in a reasonable time (e.g. mail would fail with "no such mailbox"!).

Program start-up was very slow, but interactive response once running was normal (e.g. execution of internal shell commands like "echo *").

> Even rebooting didn't fix it, until I deleted some autosnapshots.

Rebooting fixed it in my case.

> top or other tools would show no disk activity, although the disks were working as mad.

No disk activity in my case. The system was idle without any load, but the issue persisted over many hours (up to the moment when I decided to reboot the system to get it back into a usable state).

> Not sure it's the same case you experienced, though.

Probably not, but you seem to have hit another case where a resource limit was reached and the system did not gracefully deal with the situation.

Thanks for replying ...

Regards, STefan
Re: Strange behavior after running under high load
Am 29.03.21 um 03:11 schrieb Mateusz Guzik:
> This may be the problem fixed in e9272225e6bed840b00eef1c817b188c172338ee ("vfs: fix vnlru marker handling for filtered/unfiltered cases").

My system was up for less than 24 hours and using a kernel and world built on the latest -CURRENT of less than 1 hour before the reboot: FreeBSD 14.0-CURRENT #33 main-n245694-90d2f7c413f9-dirty: Sat Mar 27 15:35:37 CET 2021

The fix had been committed some 9 days before that kernel was built.

> However, there is a long standing performance bug where if vnode limit is hit, and there is nothing to reclaim, the code is just going to sleep for one second.

There are no log entries that give any hint to what occurred. But I do assume that these events are not logged ... (?)

Yes, I could have checked that and will do so if the issue occurs again. I plan to generate more output files in the same way that triggered the issue yesterday, and since the system is very slow but still able to execute commands, I can try to debug it; I just have to know where to start looking ...

Thank you for your reply!

Regards, STefan
Re: Strange behavior after running under high load
On 3/28/21 4:39 PM, Stefan Esser wrote:
> After a period of high load, my now idle system needs 4 to 10 seconds to run any trivial command - even after 20 minutes of no load ...

High CPU load or high disk load?

ZFS? Snapshots?

12.x? 13.x?

I've seen something similar: after a high load period, the system crawled so much that services were not answering in a reasonable time (e.g. mail would fail with "no such mailbox"!).

Even rebooting didn't fix it, until I deleted some autosnapshots.

top or other tools would show no disk activity, although the disks were working like mad.

Not sure it's the same case you experienced, though.
Re: Strange behavior after running under high load
This may be the problem fixed in e9272225e6bed840b00eef1c817b188c172338ee ("vfs: fix vnlru marker handling for filtered/unfiltered cases").

However, there is a long-standing performance bug where, if the vnode limit is hit and there is nothing to reclaim, the code is just going to sleep for one second.

On 3/28/21, Stefan Esser wrote:
> Am 28.03.21 um 17:44 schrieb Andriy Gapon:
> > On 28/03/2021 17:39, Stefan Esser wrote:
> > > After a period of high load, my now idle system needs 4 to 10 seconds to run any trivial command - even after 20 minutes of no load ...
> > >
> > > I have run some Monte-Carlo simulations for a few hours, with initially 35 processes running in parallel for some 10 seconds each.
> >
> > I saw somewhat similar symptoms with 13-CURRENT some time ago. To me it looked like even small kernel memory allocations took a very long time. But it was hard to properly diagnose that, as my favorite tool, dtrace, was also affected by the same problem.
>
> That could have been the case - but I had to reboot to recover the system.
>
> I had let it sit idle for a few hours, and the last "time uptime" before the reboot took 15 seconds of real time to complete.
>
> Response from within the shell (e.g. "echo *") was instantaneous, though.
>
> I tried to trace the program execution of "uptime" with truss and found that the loading of shared libraries proceeded at about one or two per second until all were attached, and then the program quickly printed the expected results.
>
> I could probably recreate the issue by running the same set of programs that triggered it a few hours ago, but this is a production system and I need it to be operational through the week ...
>
> Regards, STefan

--
Mateusz Guzik
Re: Strange behavior after running under high load
Am 28.03.21 um 17:44 schrieb Andriy Gapon:
> On 28/03/2021 17:39, Stefan Esser wrote:
>> After a period of high load, my now idle system needs 4 to 10 seconds
>> to run any trivial command - even after 20 minutes of no load ...
>>
>> I have run some Monte-Carlo simulations for a few hours, with
>> initially 35 processes running in parallel for some 10 seconds each.
>
> I saw somewhat similar symptoms with 13-CURRENT some time ago.
> To me it looked like even small kernel memory allocations took a very
> long time. But it was hard to properly diagnose that, as my favorite
> tool, dtrace, was also affected by the same problem.

That could have been the case - but I had to reboot to recover the system.

I had let it sit idle for a few hours, and the last "time uptime" before the reboot took 15 seconds of real time to complete.

Response from within the shell (e.g. "echo *") was instantaneous, though.

I tried to trace the program execution of "uptime" with truss and found that the loading of shared libraries proceeded at about one or two per second until all were attached, and then the program quickly printed the expected results.

I could probably recreate the issue by running the same set of programs that triggered it a few hours ago, but this is a production system and I need it to be operational through the week ...

Regards, Stefan
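The truss observation above (shared libraries attaching at roughly one or two per second) can be checked from userland by timing dlopen() directly. A minimal sketch via Python's ctypes, assuming a libm is findable on the system; on a healthy machine each load should take milliseconds, not a second:

```python
import ctypes
import ctypes.util
import time

def time_dlopen(libname):
    """Return the wall-clock seconds one dlopen() of a shared library takes."""
    path = ctypes.util.find_library(libname)
    if path is None:
        raise RuntimeError(f"library {libname!r} not found")
    t0 = time.perf_counter()
    ctypes.CDLL(path)  # loads and links the library into this process
    return time.perf_counter() - t0

elapsed = time_dlopen("m")
print(f"dlopen(libm) took {elapsed * 1000:.3f} ms")
```

If each load on an otherwise idle box takes on the order of a second, as in the truss trace, the delay is in the kernel's fault/allocation path rather than in the dynamic linker itself.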
Re: Strange behavior after running under high load
On 28/03/2021 17:39, Stefan Esser wrote:
> After a period of high load, my now idle system needs 4 to 10 seconds
> to run any trivial command - even after 20 minutes of no load ...
>
> I have run some Monte-Carlo simulations for a few hours, with
> initially 35 processes running in parallel for some 10 seconds each.

I saw somewhat similar symptoms with 13-CURRENT some time ago. To me it looked like even small kernel memory allocations took a very long time. But it was hard to properly diagnose that, as my favorite tool, dtrace, was also affected by the same problem.

> The load decreased over time, since some parameter sets were faster
> to process. All in all, 63000 processes ran within some 3 hours.
>
> When the system became idle, interactive performance was very bad.
> Running any trivial command (e.g. uptime) takes some 5 to 10 seconds.
> Since I have to have this system working, I plan to reboot it later
> today, but will keep it in this state for some more time to see
> whether this state persists or whether the system recovers from it.
>
> Any ideas what might cause such a system state???
>
> The system has a Ryzen 5 3600 CPU (6 cores/12 threads) and 32 GB of RAM.
>
> The following are a few commands that I have tried on this now
> practically idle system:
>
> $ time vmstat -n 1
> procs    memory    page                     disks    faults      cpu
> r b w    avm  fre  flt re pi po   fr  sr nv0  in   sy  cs us sy id
> 2 0 0    26G 922M 1.2K  1  4  0 1.4K 239   0 482 7.2K 934 11  1 88
>
> real    0m9,357s
> user    0m0,001s
> sys     0m0,018s
>
> wait 1 minute
>
> $ time vmstat -n 1
> procs    memory    page                     disks    faults      cpu
> r b w    avm  fre  flt re pi po   fr  sr nv0  in   sy  cs us sy id
> 1 0 0    26G 925M 1.2K  1  4  0 1.4K 239   0 482 7.2K 933 11  1 88
>
> real    0m9,821s
> user    0m0,003s
> sys     0m0,389s
>
> $ systat -vm
>
>     4 users    Load  0.10  0.72  3.57                 Mar 28 16:15
>    Mem usage:  97%Phy  55%Kmem               VN PAGER    SWAP PAGER
>    Mem:      REAL           VIRTUAL           in  out      in  out
>            Tot  Share      Tot  Share   Free  count
>    Act   2387M   460K   26481M   460K   923M  pages
>    All   2605M   218M   27105M   572M         ioflt          Interrupts
>    Proc:                                      cow        132 total
>      r  p  d   s   w  Csw Trp Sys Int Sof Flt  52 zfod    96 hpet0:t0
>            316 356  39 225 132  21  53           ozfod       nvme0:admi
>                                                 %ozfod       nvme0:io0
>    0.1%Sys 0.0%Intr 0.0%User 0.0%Nice 99.9%Idle  daefr       nvme0:io1
>    |    |    |    |    |    |    |    |    |     prcfr       nvme0:io2
>                                                  totfr       nvme0:io3
>                                dtbuf             react       nvme0:io4
>    Namei   Name-cache  Dir-cache  620370 maxvn   pdwak       nvme0:io5
>      Calls   hits  %   hits  %    627486 numvn 168 pdpgs  27 xhci0 66
>         18     14 78         65          frevn     intrn     ahci0 67
>                                  17539M wire                 xhci1 68
>    Disks  nvd0 ada0 ada1 ada2 ada3 ada4  cd0    430M act    9 re0 69
>    KB/t   0.00 0.00 0.00 0.00 0.00 0.00 0.00 12696M inact    hdac0 76
>    tps       0    0    0    0    0    0    0 54276K laund    vgapci0 78
>    MB/s   0.00 0.00 0.00 0.00 0.00 0.00 0.00   923M free
>    %busy     0    0    0    0    0    0    0      0 buf
>
> 5 minutes later
>
> $ time vmstat -n 1
> procs    memory    page                     disks    faults      cpu
> r b w    avm  fre  flt re pi po   fr  sr nv0  in   sy  cs us sy id
> 1 0 0    26G 922M 1.2K  1  4  0 1.4K 239   0 481 7.2K 931 11  1 88
>
> real    0m4,270s
> user    0m0,000s
> sys     0m0,019s
>
> $ time uptime
> 16:20 up 23:23, 4 users, load averages: 0,17 0,39 2,68
>
> real    0m10,840s
> user    0m0,001s
> sys     0m0,374s
>
> $ time uptime
> 16:37 up 23:40, 4 users, load averages: 0,29 0,27 0,96
>
> real    0m9,273s
> user    0m0,000s
> sys     0m0,020s

-- 
Andriy Gapon
Strange behavior after running under high load
After a period of high load, my now idle system needs 4 to 10 seconds to run any trivial command - even after 20 minutes of no load ...

I have run some Monte-Carlo simulations for a few hours, with initially 35 processes running in parallel for some 10 seconds each. The load decreased over time, since some parameter sets were faster to process. All in all, 63000 processes ran within some 3 hours.

When the system became idle, interactive performance was very bad. Running any trivial command (e.g. uptime) takes some 5 to 10 seconds. Since I have to have this system working, I plan to reboot it later today, but will keep it in this state for some more time to see whether this state persists or whether the system recovers from it.

Any ideas what might cause such a system state???

The system has a Ryzen 5 3600 CPU (6 cores/12 threads) and 32 GB of RAM.

The following are a few commands that I have tried on this now practically idle system:

$ time vmstat -n 1
procs    memory    page                     disks    faults      cpu
r b w    avm  fre  flt re pi po   fr  sr nv0  in   sy  cs us sy id
2 0 0    26G 922M 1.2K  1  4  0 1.4K 239   0 482 7.2K 934 11  1 88

real    0m9,357s
user    0m0,001s
sys     0m0,018s

wait 1 minute

$ time vmstat -n 1
procs    memory    page                     disks    faults      cpu
r b w    avm  fre  flt re pi po   fr  sr nv0  in   sy  cs us sy id
1 0 0    26G 925M 1.2K  1  4  0 1.4K 239   0 482 7.2K 933 11  1 88

real    0m9,821s
user    0m0,003s
sys     0m0,389s

$ systat -vm

    4 users    Load  0.10  0.72  3.57                 Mar 28 16:15
   Mem usage:  97%Phy  55%Kmem               VN PAGER    SWAP PAGER
   Mem:      REAL           VIRTUAL           in  out      in  out
           Tot  Share      Tot  Share   Free  count
   Act   2387M   460K   26481M   460K   923M  pages
   All   2605M   218M   27105M   572M         ioflt          Interrupts
   Proc:                                      cow        132 total
     r  p  d   s   w  Csw Trp Sys Int Sof Flt  52 zfod    96 hpet0:t0
           316 356  39 225 132  21  53           ozfod       nvme0:admi
                                                %ozfod       nvme0:io0
   0.1%Sys 0.0%Intr 0.0%User 0.0%Nice 99.9%Idle  daefr       nvme0:io1
   |    |    |    |    |    |    |    |    |     prcfr       nvme0:io2
                                                 totfr       nvme0:io3
                               dtbuf             react       nvme0:io4
   Namei   Name-cache  Dir-cache  620370 maxvn   pdwak       nvme0:io5
     Calls   hits  %   hits  %    627486 numvn 168 pdpgs  27 xhci0 66
        18     14 78         65          frevn     intrn     ahci0 67
                                 17539M wire                 xhci1 68
   Disks  nvd0 ada0 ada1 ada2 ada3 ada4  cd0    430M act    9 re0 69
   KB/t   0.00 0.00 0.00 0.00 0.00 0.00 0.00 12696M inact    hdac0 76
   tps       0    0    0    0    0    0    0 54276K laund    vgapci0 78
   MB/s   0.00 0.00 0.00 0.00 0.00 0.00 0.00   923M free
   %busy     0    0    0    0    0    0    0      0 buf

5 minutes later

$ time vmstat -n 1
procs    memory    page                     disks    faults      cpu
r b w    avm  fre  flt re pi po   fr  sr nv0  in   sy  cs us sy id
1 0 0    26G 922M 1.2K  1  4  0 1.4K 239   0 481 7.2K 931 11  1 88

real    0m4,270s
user    0m0,000s
sys     0m0,019s

$ time uptime
16:20 up 23:23, 4 users, load averages: 0,17 0,39 2,68

real    0m10,840s
user    0m0,001s
sys     0m0,374s

$ time uptime
16:37 up 23:40, 4 users, load averages: 0,29 0,27 0,96

real    0m9,273s
user    0m0,000s
sys     0m0,020s
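The striking pattern in the measurements above is real time around 9-10 s while user+sys stay in the millisecond range: the process is sleeping inside the kernel, not computing. A small sketch of how to capture that same real/user/sys split for a child process (POSIX-only, using os.times() child accounting; the helper name here is ours, not a standard tool):

```python
import os
import subprocess
import time

def time_command(argv):
    """Run a command and return (real, user, sys) seconds for it,
    roughly what time(1) prints."""
    before = os.times()
    t0 = time.perf_counter()
    subprocess.run(argv, check=True, capture_output=True)
    real = time.perf_counter() - t0
    after = os.times()
    return (real,
            after.children_user - before.children_user,
            after.children_system - before.children_system)

# A process that merely sleeps shows the same signature as the stalled
# commands above: large real time, near-zero CPU time.
real, user, sys_ = time_command(["sleep", "0.2"])
print(f"real {real:.3f}s  user {user:.3f}s  sys {sys_:.3f}s")
```

Whenever real dwarfs user+sys like this on a trivial command, the useful next question is where in the kernel the time is spent sleeping, which is what the vnlru hypothesis earlier in this thread addresses.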