Re: [systemd-devel] systemd-journald may crash during memory pressure
On Sat, 10 Feb 2018 09:39:35 -0800, vcaputo wrote:
>> After some more research, I found that vm.watermark_scale_factor may
>> be the knob I am looking for. I'm going to watch behavior now with a
>> higher factor (default = 10, now 200).
>
> Have you reported this to the kernel maintainers? LKML?

No, not yet. I think they are aware of the issues, as there's still
ongoing work on memory allocations within kernel threads, and there's
perceivable improvement with every new kernel version. Btrfs especially
has seen a few patches in this area.

> While this is interesting to read on systemd-devel, it's not the right
> venue. What you describe sounds like a regression that probably should
> be improved upon.

I know it's mostly off-topic. But the problem is most visible in
systemd-journald, and I think there are some users here who may have a
better understanding of the underlying problem, or may even have found
solutions to it. One approach for me was using systemd-specific slices,
so it may be interesting to other people.

> Also, out of curiosity, are you running dmcrypt in this scenario? If
> so, is swap on dmcrypt as well?

No, actually not. I'm using bcache for the rootfs, which may have
similar implications for memory allocations. Swap is just plain swap
distributed across 4 disks.

If I understand correctly, dmcrypt may expose this problem further
because it needs to "double buffer" memory while passing it further down
the storage layer.

I had zswap enabled previously, which may expose this problem, too. I
have now disabled it and later enabled THP again. THP now runs very well
again. It looks like zswap and THP don't play well together. OTOH, these
options were switched on and off across different kernel versions, so it
may also be an effect of fixes in newer kernels.

-- 
Regards,
Kai

Replies to list-only preferred.

_______________________________________________
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel
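For reference, the zswap and THP switches discussed here are both runtime-toggleable via sysfs. A rough sketch (the paths are the standard kernel ones; the values are examples, not necessarily the exact settings used above; writes require root):

```shell
# zswap on/off (module parameter):
cat /sys/module/zswap/parameters/enabled           # prints Y or N
echo N > /sys/module/zswap/parameters/enabled      # disable zswap

# Transparent Huge Pages policy:
cat /sys/kernel/mm/transparent_hugepage/enabled    # e.g. "[always] madvise never"
echo always > /sys/kernel/mm/transparent_hugepage/enabled
```

Note that toggles made this way do not survive a reboot; they would need to go on the kernel command line (zswap.enabled=0, transparent_hugepage=always) or into a boot-time script to persist.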
Re: [systemd-devel] systemd-journald may crash during memory pressure
On Sat, Feb 10, 2018 at 03:05:16PM +0100, Kai Krakow wrote:
> On Sat, 10 Feb 2018 14:23:34 +0100, Kai Krakow wrote:
>
>> On Sat, 10 Feb 2018 02:16:44 +0200, Uoti Urpala wrote:
>>
>>> On Fri, 2018-02-09 at 12:41 +0100, Lennart Poettering wrote:
>>>> These last log lines indicate journald wasn't scheduled for a long
>>>> time, which caused the watchdog to hit, and journald was aborted.
>>>> Consider increasing the watchdog timeout if your system is indeed
>>>> that loaded and that's supposed to be an OK thing...
>>>
>>> BTW, I've seen the same behavior on a system with a single active
>>> process that uses enough memory to trigger significant swap use. I
>>> wonder if there has been a regression in the kernel causing
>>> misbehavior when swapping? The problems aren't specific to journald
>>> - the desktop environment can totally freeze too, etc.
>>
>> This problem seems to have been there since kernel 4.9, which was a
>> real pita in this regard. It has progressively become better since
>> kernel 4.10. The kernel seems to try to prevent swapping at any cost
>> since then, at the cost of much higher latency and of pushing all
>> cache out of RAM.
>>
>> The result is processes stuck for easily 30 seconds and more during
>> memory pressure. Sometimes I see the kernel loudly complaining in
>> dmesg about high wait times for allocating RAM, especially from the
>> btrfs module. Thus, the biggest problem may be that kernel threads
>> themselves get stuck in memory allocations and become victims of
>> high latency.
>>
>> Currently I'm running my user session in a slice with max 80% RAM,
>> which seems to help: it avoids discarding all cache. I also put some
>> potentially high memory users (regarding cache and/or resident mem)
>> into slices with carefully selected memory limits (backup and
>> maintenance services). Slices limited in such a way will start
>> swapping before cache is discarded, and everything works better
>> again. Part of this problem may be that I have one process running
>> which mmaps and locks 1G of memory (bees, a btrfs deduplicator).
>>
>> This system has 16G of RAM, which is usually plenty, but I use tmpfs
>> to build packages in Gentoo, and while that worked wonderfully
>> before 4.9, I have to be really careful now. The kernel happily
>> throws away cache instead of swapping early. Setting vm.swappiness
>> differently seems to have no perceivable effect.
>>
>> Software that uses mmap is the first latency victim of this new
>> behavior. As such, systemd-journald also seems to be hit hard by
>> this.
>>
>> After the system has recovered from high memory pressure (which can
>> take 10-15 minutes, resulting in a loadavg of 400+), it ends up with
>> some gigabytes of inactive memory in swap which it will only swap
>> back in during shutdown (which will then also take some minutes).
>>
>> The problem since 4.9 seems to be that the kernel tends to do swap
>> storms instead of constantly swapping out memory at low rates during
>> usage. The swap storms totally thrash the system.
>>
>> Before 4.9, the kernel had no such latency spikes under memory
>> pressure. Swap would usually grow slowly over time, and the system
>> felt sluggish at one time or another but was still usable wrt
>> latency. I usually ended up with 5-8G of swap usage, and that was no
>> problem. Now, swap only grows significantly during swap storms that
>> leave the system unusable for many minutes, with latencies of 10+
>> seconds around twice per minute.
>>
>> I had no swap storm yet since the last boot, and swap usage is
>> around 16M now. Before kernel 4.9, this would be much higher
>> already.
>
> After some more research, I found that vm.watermark_scale_factor may
> be the knob I am looking for. I'm going to watch behavior now with a
> higher factor (default = 10, now 200).

Have you reported this to the kernel maintainers? LKML?

While this is interesting to read on systemd-devel, it's not the right
venue. What you describe sounds like a regression that probably should
be improved upon.

Also, out of curiosity, are you running dmcrypt in this scenario? If so,
is swap on dmcrypt as well?

Regards,
Vito Caputo
Re: [systemd-devel] systemd-journald may crash during memory pressure
On Sat, 10 Feb 2018 14:23:34 +0100, Kai Krakow wrote:
> On Sat, 10 Feb 2018 02:16:44 +0200, Uoti Urpala wrote:
>
>> On Fri, 2018-02-09 at 12:41 +0100, Lennart Poettering wrote:
>>> These last log lines indicate journald wasn't scheduled for a long
>>> time, which caused the watchdog to hit, and journald was aborted.
>>> Consider increasing the watchdog timeout if your system is indeed
>>> that loaded and that's supposed to be an OK thing...
>>
>> BTW, I've seen the same behavior on a system with a single active
>> process that uses enough memory to trigger significant swap use. I
>> wonder if there has been a regression in the kernel causing
>> misbehavior when swapping? The problems aren't specific to journald -
>> the desktop environment can totally freeze too, etc.
>
> This problem seems to have been there since kernel 4.9, which was a
> real pita in this regard. It has progressively become better since
> kernel 4.10. The kernel seems to try to prevent swapping at any cost
> since then, at the cost of much higher latency and of pushing all
> cache out of RAM.
>
> The result is processes stuck for easily 30 seconds and more during
> memory pressure. Sometimes I see the kernel loudly complaining in
> dmesg about high wait times for allocating RAM, especially from the
> btrfs module. Thus, the biggest problem may be that kernel threads
> themselves get stuck in memory allocations and become victims of high
> latency.
>
> Currently I'm running my user session in a slice with max 80% RAM,
> which seems to help: it avoids discarding all cache. I also put some
> potentially high memory users (regarding cache and/or resident mem)
> into slices with carefully selected memory limits (backup and
> maintenance services). Slices limited in such a way will start
> swapping before cache is discarded, and everything works better again.
> Part of this problem may be that I have one process running which
> mmaps and locks 1G of memory (bees, a btrfs deduplicator).
>
> This system has 16G of RAM, which is usually plenty, but I use tmpfs
> to build packages in Gentoo, and while that worked wonderfully before
> 4.9, I have to be really careful now. The kernel happily throws away
> cache instead of swapping early. Setting vm.swappiness differently
> seems to have no perceivable effect.
>
> Software that uses mmap is the first latency victim of this new
> behavior. As such, systemd-journald also seems to be hit hard by this.
>
> After the system has recovered from high memory pressure (which can
> take 10-15 minutes, resulting in a loadavg of 400+), it ends up with
> some gigabytes of inactive memory in swap which it will only swap back
> in during shutdown (which will then also take some minutes).
>
> The problem since 4.9 seems to be that the kernel tends to do swap
> storms instead of constantly swapping out memory at low rates during
> usage. The swap storms totally thrash the system.
>
> Before 4.9, the kernel had no such latency spikes under memory
> pressure. Swap would usually grow slowly over time, and the system
> felt sluggish at one time or another but was still usable wrt latency.
> I usually ended up with 5-8G of swap usage, and that was no problem.
> Now, swap only grows significantly during swap storms that leave the
> system unusable for many minutes, with latencies of 10+ seconds around
> twice per minute.
>
> I had no swap storm yet since the last boot, and swap usage is around
> 16M now. Before kernel 4.9, this would be much higher already.

After some more research, I found that vm.watermark_scale_factor may be
the knob I am looking for. I'm going to watch behavior now with a higher
factor (default = 10, now 200).

-- 
Regards,
Kai

Replies to list-only preferred.
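For context on the "default = 10, now 200" values: vm.watermark_scale_factor is expressed in units of 1/10000 of each memory zone's size, so the default keeps only ~0.1% of RAM between the watermark that wakes kswapd and the one that lets it sleep. A back-of-the-envelope calculation for the 16G machine described above (simplified to whole-RAM rather than per-zone, so the real numbers differ somewhat):

```shell
# Approximate distance between kswapd watermarks: scale_factor/10000 of RAM.
ram_bytes=$((16 * 1024 * 1024 * 1024))

gap_mib() {  # gap_mib <scale_factor> -> gap in MiB
    echo $(( ram_bytes * $1 / 10000 / 1024 / 1024 ))
}

gap_mib 10    # default: ~16 MiB of headroom before direct reclaim looms
gap_mib 200   # raised:  ~327 MiB, so background reclaim starts much earlier
```

The change itself would be applied with `sysctl -w vm.watermark_scale_factor=200` (and persisted via a file under /etc/sysctl.d/).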
Re: [systemd-devel] systemd-journald may crash during memory pressure
On Sat, 10 Feb 2018 02:16:44 +0200, Uoti Urpala wrote:
> On Fri, 2018-02-09 at 12:41 +0100, Lennart Poettering wrote:
>> These last log lines indicate journald wasn't scheduled for a long
>> time, which caused the watchdog to hit, and journald was aborted.
>> Consider increasing the watchdog timeout if your system is indeed
>> that loaded and that's supposed to be an OK thing...
>
> BTW, I've seen the same behavior on a system with a single active
> process that uses enough memory to trigger significant swap use. I
> wonder if there has been a regression in the kernel causing misbehavior
> when swapping? The problems aren't specific to journald - the desktop
> environment can totally freeze too, etc.

This problem seems to have been there since kernel 4.9, which was a real
pita in this regard. It has progressively become better since kernel
4.10. The kernel seems to try to prevent swapping at any cost since
then, at the cost of much higher latency and of pushing all cache out of
RAM.

The result is processes stuck for easily 30 seconds and more during
memory pressure. Sometimes I see the kernel loudly complaining in dmesg
about high wait times for allocating RAM, especially from the btrfs
module. Thus, the biggest problem may be that kernel threads themselves
get stuck in memory allocations and become victims of high latency.

Currently I'm running my user session in a slice with max 80% RAM, which
seems to help: it avoids discarding all cache. I also put some
potentially high memory users (regarding cache and/or resident mem) into
slices with carefully selected memory limits (backup and maintenance
services). Slices limited in such a way will start swapping before cache
is discarded, and everything works better again. Part of this problem
may be that I have one process running which mmaps and locks 1G of
memory (bees, a btrfs deduplicator).
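The "slice with max 80% RAM" approach can be expressed as a systemd drop-in. A minimal sketch, assuming cgroup v2 and a uid-1000 user session (the file path and the exact directive are my illustration, not necessarily the setup described above):

```ini
# /etc/systemd/system/user-1000.slice.d/memory.conf (hypothetical drop-in)
[Slice]
# Soft ceiling relative to physical RAM: processes in the slice are
# throttled and reclaimed (i.e. pushed toward swap) above this point,
# before they can evict the rest of the system's page cache.
MemoryHigh=80%
```

On cgroup v1, which was still common in 2018, the rough equivalent is the hard MemoryLimit= setting, which lacks MemoryHigh='s gentler pressure-before-OOM behavior.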
This system has 16G of RAM, which is usually plenty, but I use tmpfs to
build packages in Gentoo, and while that worked wonderfully before 4.9,
I have to be really careful now. The kernel happily throws away cache
instead of swapping early. Setting vm.swappiness differently seems to
have no perceivable effect.

Software that uses mmap is the first latency victim of this new
behavior. As such, systemd-journald also seems to be hit hard by this.

After the system has recovered from high memory pressure (which can take
10-15 minutes, resulting in a loadavg of 400+), it ends up with some
gigabytes of inactive memory in swap which it will only swap back in
during shutdown (which will then also take some minutes).

The problem since 4.9 seems to be that the kernel tends to do swap
storms instead of constantly swapping out memory at low rates during
usage. The swap storms totally thrash the system.

Before 4.9, the kernel had no such latency spikes under memory pressure.
Swap would usually grow slowly over time, and the system felt sluggish
at one time or another but was still usable wrt latency. I usually ended
up with 5-8G of swap usage, and that was no problem. Now, swap only
grows significantly during swap storms that leave the system unusable
for many minutes, with latencies of 10+ seconds around twice per minute.

I had no swap storm yet since the last boot, and swap usage is around
16M now. Before kernel 4.9, this would be much higher already.

-- 
Regards,
Kai

Replies to list-only preferred.
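For readers unfamiliar with the tmpfs build setup mentioned above: a typical Gentoo arrangement mounts a size-bounded tmpfs over the build directory, so build artifacts live in RAM but can spill to swap under pressure. A sketch (the 12G size and the /var/tmp/portage mount point are illustrative guesses, not taken from this message; mounting requires root):

```shell
# Bounded tmpfs for package builds; its pages are swappable, unlike
# anonymous memory pinned by mlock.
mount -t tmpfs -o size=12G,mode=775 tmpfs /var/tmp/portage

# Or persistently, as an /etc/fstab line:
#   tmpfs  /var/tmp/portage  tmpfs  size=12G,mode=775  0 0
```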