Re: stable/10: high load average when box is idle
Jeremy Chadwick wrote on 10/29/2015 11:09:
> On Thu, Oct 29, 2015 at 11:00:32AM +0100, Miroslav Lachman wrote:
>> Jeremy Chadwick wrote on 10/27/2015 06:05:
>>> (I am not subscribed to the mailing list, please keep me CC'd)
>>>
>>> Issue: a stable/10 system that has an abnormally high load average
>>> (e.g. 0.15, but may be higher depending on other variables which I
>>> can't account for) when the machine is definitely idle (i.e. it cannot
>>> be traced to high interrupt usage per vmstat -i, cannot be traced to a
>>> userland process or kernel thread, etc.).
>>>
>>> This problem has been discussed many times on the FreeBSD mailing
>>> lists and the FreeBSD forum (including some folks seeing it on 9.x,
>>> but my complaint here is focused on 10.x, so please focus there).
>>>
>>> I'd politely like to request that anyone experiencing this, or who has
>>> experienced it (and if you know when it stopped or why, including what
>>> you may have done, include that), chime in on this ticket from 2012
>>> (made for 9.x, but the style of issue still applies; c#5 is quite
>>> valid):
>>>
>>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=173541
>>>
>>> For those still experiencing it, I'd suggest reading c#8 and seeing if
>>> sysctl kern.eventtimer.periodic=1 relieves the problem for you. (At
>>> this time I would not suggest leaving that set indefinitely, as it
>>> does seem to increase the interrupt rate in cpuX:timer in vmstat -i.
>>> But for me kern.eventtimer.periodic=1 "fixes" the issue.)
>>
>> Is it on a real HW server or in some kind of virtualization? I am
>> seeing load 0.5 - 1.2 on three virtual machines in VMware. The
>> machines are without any traffic. Just a fresh installation of
>> FreeBSD 10.1 and some services without any public content.
>
> I've seen it on both bare-metal and VMs. Please see c#8 in the ticket;
> there's an itemised list of where I've seen it, but I'm sure it's not
> limited to just those.

OK, I have read your c#8 and did some tests on our affected VMs. With sysctl kern.eventtimer.periodic=1 it is better. Where the load was previously about 0.40, it is 0.15 now.

One of these three systems is FreeBSD 10.2, and on this machine the positive effect of kern.eventtimer.periodic=1 is more visible - the load is now 0.00 - 0.05. I don't know if this is a coincidence or whether something is different in 10.2.

The settings of kern.eventtimer are the same on all VMs:

kern.eventtimer.et.LAPIC.flags: 7
kern.eventtimer.et.LAPIC.frequency: 35071418
kern.eventtimer.et.LAPIC.quality: 600
kern.eventtimer.et.i8254.flags: 1
kern.eventtimer.et.i8254.frequency: 1193182
kern.eventtimer.et.i8254.quality: 100
kern.eventtimer.et.RTC.flags: 17
kern.eventtimer.et.RTC.frequency: 32768
kern.eventtimer.et.RTC.quality: 0
kern.eventtimer.choice: LAPIC(600) i8254(100) RTC(0)
kern.eventtimer.singlemul: 4
kern.eventtimer.idletick: 0
kern.eventtimer.timer: LAPIC
kern.eventtimer.periodic: 1

Miroslav Lachman

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
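[Editor's note: for anyone wanting to reproduce the tests above, a minimal sketch of checking and applying the workaround. These are FreeBSD commands taken from the thread; the persistent setting via /etc/sysctl.conf is an assumption of standard practice, not something the posters mention.]

```shell
# Inspect the current event-timer configuration and load average.
sysctl kern.eventtimer.timer kern.eventtimer.periodic
uptime

# Apply the workaround at runtime (no reboot needed).
sysctl kern.eventtimer.periodic=1

# To keep it across reboots, the line can presumably go in
# /etc/sysctl.conf:
#   kern.eventtimer.periodic=1

# Watch the trade-off Jeremy mentions: periodic mode raises the
# cpuX:timer interrupt rate, visible here.
vmstat -i | grep -i timer
```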
10-STABLE buildworld fails at very early stage
I have this in /etc/src.conf (it is the only line there):

MAKEOBJDIRPREFIX=/usr/home/build/obj

% cd /usr/src
% sudo svn up
Updating '.':
At revision 290139.
% sudo make buildworld
[one screen of output]
set -e; cd /usr/src/tools/build; make buildincludes; make installincludes
sh /usr/src/tools/install.sh -C -o root -g wheel -m 444 libegacy.a /usr/home/build/obj/legacy/usr/lib
install: /usr/home/build/obj/legacy/usr/lib: No such file or directory
*** Error code 71

Stop.
make[3]: stopped in /usr/src/tools/build
*** Error code 1

% uname -v
FreeBSD 10.2-PRERELEASE #7 r286065: Thu Jul 30 21:27:35 MSK 2015 root@:/usr/obj/usr/src/sys/BLOB

--
// Lev Serebryakov
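[Editor's note: the failure above is consistent with MAKEOBJDIRPREFIX being set in /etc/src.conf. To my understanding, build(7) expects MAKEOBJDIRPREFIX as an environment variable, not a src.conf setting, and setting it in src.conf can leave the legacy-tools object directory uncreated, matching the install error shown. A sketch of the conventional invocation, assuming the same object directory is wanted:]

```shell
# MAKEOBJDIRPREFIX is honoured from the environment (see build(7));
# remove the line from /etc/src.conf and pass it explicitly instead.
cd /usr/src
env MAKEOBJDIRPREFIX=/usr/home/build/obj make buildworld
```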
Re: Stuck processes in unkillable (STOP) state, listen queue overflow
On Thu, Oct 29, 2015, at 16:46, Zara Kanaeva wrote:
> Hello Дмитрий,
>
> thank you very much for your message.
>
> First of all: I like FreeBSD (the installation logic, the good
> documentation, etc.); this is why I use FreeBSD as a server OS. But in
> my case I must disagree with your strong theoretical probability
> argument. I have one machine (7 years old) that had 1-2 spontaneous
> reboots a year. I got a lot of "already in queue awaiting acceptance"
> errors, and the machine rebooted immediately after them.
>
> I will soon get a replacement for this old machine with at least 32 GB
> RAM and (of course) a new power supply. So I will see if my problem
> (perhaps it is only my problem) still persists.
>
> Greetings, Z. Kanaeva.

I've had resetting network interfaces combined with the queue overflow warnings on 3 different machines with 5 different NICs and 3 different PSUs. It disappeared when I updated to FreeBSD 10 two years ago, so I assumed the cause had either been fixed or worked around.
Re: Stuck processes in unkillable (STOP) state, listen queue overflow
Hello Дмитрий,

thank you very much for your message.

First of all: I like FreeBSD (the installation logic, the good documentation, etc.); this is why I use FreeBSD as a server OS. But in my case I must disagree with your strong theoretical probability argument. I have one machine (7 years old) that had 1-2 spontaneous reboots a year. I got a lot of "already in queue awaiting acceptance" errors, and the machine rebooted immediately after them.

I will soon get a replacement for this old machine with at least 32 GB RAM and (of course) a new power supply. So I will see if my problem (perhaps it is only my problem) still persists.

Greetings, Z. Kanaeva.

Quoting Дмитрий Долбнин:

> Good day everyone! From my point of view it seems like you're
> experiencing degraded hardware performance, which causes the problems
> you are seeing. Try switching to a new power supply at least. Why do I
> think so? Because bad power supplies are met much more often than bad
> FreeBSD source code. Of course I can't tell you that you're completely
> wrong.
>
> Best regards, Dimitry.
>
> On Wednesday, 28 October 2015, 12:00 UTC, freebsd-stable-requ...@freebsd.org wrote:
>> Today's Topics:
>> 1. Re: Stuck processes in unkillable (STOP) state, listen queue
>> overflow (Zara Kanaeva)
>> 2.
Re: Stuck processes in unkillable (STOP) state, listen queue overflow (Nagy, Attila)

--

Message: 1
Date: Tue, 27 Oct 2015 14:42:42 +0100
From: Zara Kanaeva <zara.kana...@ggi.uni-tuebingen.de>
To: freebsd-stable@freebsd.org
Subject: Re: Stuck processes in unkillable (STOP) state, listen queue overflow

Hello,

I have the same experience with apache and mapserver. It happens on a physical machine and ends with a spontaneous reboot. This machine was updated from FreeBSD 9.0-RELEASE to FreeBSD 10.2-PRERELEASE. Perhaps this machine doesn't have enough RAM (only 8 GB), but I think that must not be a reason for a spontaneous reboot. I had no such behavior with the same machine and FreeBSD 9.0-RELEASE on it (I am not 100% sure; I have not yet had the possibility to test it).

Regards, Z. Kanaeva.

Quoting "Nagy, Attila" <b...@fsn.hu>:

> Hi,
>
> Recently I've started to see a lot of cases where the log is full of
> "listen queue overflow" messages and the process behind the network
> socket is unavailable. When I open a TCP connection to it, it opens
> but nothing happens (for example, I get no SMTP banner from postfix,
> nor do I get a log entry about the new connection). I've seen this
> with Java programs, postfix and redis - basically everything which
> opens a TCP socket and listens on the machine.
>
> For example, I have a redis process which listens on 6381. When I
> telnet into it, the TCP connection opens, but the program doesn't
> respond. When I kill it, nothing happens. Even kill -9 yields only
> this state:
>
>   PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
>   776 redis       2  20    0 24112K  2256K STOP    3  16:56   0.00% redis-
>
> When I tcpdrop the connections of the process, tcpdrop reports success
> the first time and failure the second (No such process), but the
> connections remain:
>
> # sockstat -4 | grep 776
> redis  redis-serv  776  6   tcp4  *:6381           *:*
> redis  redis-serv  776  9   tcp4  *:16381          *:*
> redis  redis-serv  776  10  tcp4  127.0.0.1:16381  127.0.0.1:10460
> redis  redis-serv  776  11  tcp4  127.0.0.1:16381  127.0.0.1:35795
> redis  redis-serv  776  13  tcp4  127.0.0.1:30027  127.0.0.1:16379
> redis  redis-serv  776  14  tcp4  127.0.0.1:58802  127.0.0.1:16384
> redis  redis-serv  776  17  tcp4  127.0.0.1:16381  127.0.0.1:24354
> redis  redis-serv  776  18  tcp4  127.0.0.1:16381  127.0.0.1:56999
> redis  redis-serv  776  19  tcp4  127.0.0.1:16381  127.0.0.1:39488
> redis  redis-serv  776  20  tcp4  127.0.0.1:6381   127.0.0.1:39491
> # sockstat -4 | grep 776 | awk '{print "tcpdrop "$6" "$7}' | /bin/sh
> tcpdrop: getaddrinfo: * port 6381: hostname nor servname provided, or not known
> tcpdrop: getaddrinfo: * port 16381: hostname nor servname provided, or not known
> tcpdrop: 127.0.0.1 16381 127.0.0.1 10460: No such process
> tcpdrop: 127.0.0.1 16381 127.0.0.1 35795: No such process
> tcpdrop: 127.0.0.1 30027 127.0.0.1 16379: No such process
> tcpdrop: 127.0.0.1 58802 127.0.0.1 16384: No such process
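[Editor's note: the two getaddrinfo failures above come from feeding the wildcard listen sockets (`*:6381`, `*:16381`) straight to tcpdrop(8), which expects four resolvable host/port arguments. A sketch of filtering those out first and splitting addr:port correctly - demonstrated on sample sockstat-style lines, since the original host is not reproducible here:]

```shell
# Sample sockstat(1)-style output: the first line is a wildcard
# listener that tcpdrop cannot parse, the second a real connection.
sample='redis redis-serv 776 6  tcp4 *:6381          *:*
redis redis-serv 776 10 tcp4 127.0.0.1:16381 127.0.0.1:10460'

# Skip sockets whose foreign address is the wildcard, and split
# local/foreign addr:port into the four arguments tcpdrop expects.
# (Printing the commands instead of piping them to /bin/sh.)
printf '%s\n' "$sample" | awk '$7 != "*:*" {
    split($6, l, ":"); split($7, f, ":")
    print "tcpdrop " l[1] " " l[2] " " f[1] " " f[2]
}'
# Prints: tcpdrop 127.0.0.1 16381 127.0.0.1 10460
```

Piping the printed commands to /bin/sh (as in the original one-liner) would then only attempt drops that tcpdrop can actually parse.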
Re: ZFS, SSDs, and TRIM performance
If you are running NVMe, are you running a version which has this:

https://svnweb.freebsd.org/base?view=revision&revision=285767

I'm pretty sure 10.2 does have that, so you should be good, but it is best to check. Other questions:

1. What does "gstat -d -p" show during the stalls?
2. Do you have any other ZFS tuning in place?

On 29/10/2015 16:54, Sean Kelly wrote:
> Me again. I have a new issue and I'm not sure if it is hardware or
> software.
>
> I have nine servers running 10.2-RELEASE-p5 with Dell OEM'd Samsung
> XS1715 NVMe SSDs. They are paired up in a single mirrored zpool on
> each server. They perform great most of the time. However, I have a
> problem when ZFS fires off TRIMs. Not during vdev creation, but e.g.
> if I delete a 20GB snapshot. If I destroy a 20GB snapshot or delete
> large files, ZFS fires off tons of TRIMs to the disks. I can see the
> kstat.zfs.misc.zio_trim.success and kstat.zfs.misc.zio_trim.bytes
> sysctls skyrocket. While this is happening, any synchronous writes
> seem to block. For example, we're running PostgreSQL, which does
> fsync()s all the time. While these TRIMs happen, Postgres just hangs
> on writes. This causes reads to block due to lock contention as well.
> If I set sync=disabled on my tank/pgsql dataset while this is
> happening, it unblocks for the most part. But obviously this is not an
> ideal way to run PostgreSQL.
>
> I'm working with my vendor to get some Intel SSDs to test, but any
> ideas if this could somehow be a software issue? Or does the Samsung
> XS1715 just suck at TRIM and SYNC? We're thinking of just setting the
> vfs.zfs.trim.enabled=0 tunable for now, since WAL segment turnover
> actually causes a lot of TRIM operations, but unfortunately changing
> it requires a reboot. Disabling TRIM does seem to fix the issue on
> other servers I've tested with the same hardware config.
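[Editor's note: the counters and tools named in this exchange can be combined into a quick live diagnostic; a sketch using only the FreeBSD commands and sysctls the thread itself mentions, not a fix:]

```shell
# TRIM counters from Sean's report: watch them jump after destroying
# a snapshot or deleting large files.
sysctl kstat.zfs.misc.zio_trim.success kstat.zfs.misc.zio_trim.bytes

# Per-disk I/O statistics; -d adds BIO_DELETE (TRIM) columns, -p
# restricts output to physical providers. Look for delete operations
# saturating the disks while synchronous writes stall.
gstat -d -p

# vfs.zfs.trim.enabled is a boot-time tunable, hence the reboot the
# thread mentions:
#   echo 'vfs.zfs.trim.enabled=0' >> /boot/loader.conf
```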
Re: stable/10: high load average when box is idle
Miroslav Lachman wrote on 10/29/2015 12:47:
> Jeremy Chadwick wrote on 10/29/2015 11:09:
> [...]
>> I've seen it on both bare-metal and VMs. Please see c#8 in the
>> ticket; there's an itemised list of where I've seen it, but I'm sure
>> it's not limited to just those.
>
> OK, I have read your c#8 and did some tests on our affected VMs. With
> sysctl kern.eventtimer.periodic=1 it is better. Where the load was
> previously about 0.40, it is 0.15 now.
>
> One of these three systems is FreeBSD 10.2, and on this machine the
> positive effect of kern.eventtimer.periodic=1 is more visible - the
> load is now 0.00 - 0.05. I don't know if this is a coincidence or
> whether something is different in 10.2.
>
> The settings of kern.eventtimer are the same on all VMs:
>
> kern.eventtimer.et.LAPIC.flags: 7
> kern.eventtimer.et.LAPIC.frequency: 35071418
> kern.eventtimer.et.LAPIC.quality: 600
> kern.eventtimer.et.i8254.flags: 1
> kern.eventtimer.et.i8254.frequency: 1193182
> kern.eventtimer.et.i8254.quality: 100
> kern.eventtimer.et.RTC.flags: 17
> kern.eventtimer.et.RTC.frequency: 32768
> kern.eventtimer.et.RTC.quality: 0
> kern.eventtimer.choice: LAPIC(600) i8254(100) RTC(0)
> kern.eventtimer.singlemul: 4
> kern.eventtimer.idletick: 0
> kern.eventtimer.timer: LAPIC
> kern.eventtimer.periodic: 1

Just for the record - I added graphs of CPU load from these three VMs:

FreeBSD 10.1 http://imagebin.ca/v/2Kkyq29M13d3
FreeBSD 10.1 http://imagebin.ca/v/2KkzUccxJEoE
FreeBSD 10.2 http://imagebin.ca/v/2Kl00mS4RQ3n

And the corresponding CPU idle percentages:

FreeBSD 10.1 http://imagebin.ca/v/2Kl0R0U1pRhg
FreeBSD 10.1 http://imagebin.ca/v/2Kl0cYiB0mS4
FreeBSD 10.2 http://imagebin.ca/v/2Kl0lIipTKXc

As I mentioned, the difference with / without kern.eventtimer.periodic=1 is more visible on FreeBSD 10.2. The flat line on the graphs is the interval where I disabled almost all services - crontab too - so there are no measurements in that time. The effect of kern.eventtimer.periodic=1 is visible from 18:00, when I started all the usual services.

Miroslav Lachman
ZFS, SSDs, and TRIM performance
Me again. I have a new issue and I'm not sure if it is hardware or software.

I have nine servers running 10.2-RELEASE-p5 with Dell OEM'd Samsung XS1715 NVMe SSDs. They are paired up in a single mirrored zpool on each server. They perform great most of the time. However, I have a problem when ZFS fires off TRIMs. Not during vdev creation, but e.g. if I delete a 20GB snapshot. If I destroy a 20GB snapshot or delete large files, ZFS fires off tons of TRIMs to the disks. I can see the kstat.zfs.misc.zio_trim.success and kstat.zfs.misc.zio_trim.bytes sysctls skyrocket. While this is happening, any synchronous writes seem to block. For example, we're running PostgreSQL, which does fsync()s all the time. While these TRIMs happen, Postgres just hangs on writes. This causes reads to block due to lock contention as well. If I set sync=disabled on my tank/pgsql dataset while this is happening, it unblocks for the most part. But obviously this is not an ideal way to run PostgreSQL.

I'm working with my vendor to get some Intel SSDs to test, but any ideas if this could somehow be a software issue? Or does the Samsung XS1715 just suck at TRIM and SYNC? We're thinking of just setting the vfs.zfs.trim.enabled=0 tunable for now, since WAL segment turnover actually causes a lot of TRIM operations, but unfortunately changing it requires a reboot. Disabling TRIM does seem to fix the issue on other servers I've tested with the same hardware config.

--
Sean Kelly
smke...@smkelly.org
http://smkelly.org
Re: stable/10: high load average when box is idle
On Thu, Oct 29, 2015 at 11:00:32AM +0100, Miroslav Lachman wrote:
> Jeremy Chadwick wrote on 10/27/2015 06:05:
> >(I am not subscribed to the mailing list, please keep me CC'd)
> >
> >Issue: a stable/10 system that has an abnormally high load average (e.g.
> >0.15, but may be higher depending on other variables which I can't
> >account for) when the machine is definitely idle (i.e. cannot be traced
> >to high interrupt usage per vmstat -i, cannot be traced to a userland
> >process or kernel thread, etc.).
> >
> >This problem has been discussed many times on the FreeBSD mailing lists
> >and the FreeBSD forum (including some folks seeing it on 9.x, but my
> >complaint here is focused on 10.x so please focus there).
> >
> >I'd politely like to request that anyone experiencing this, or who has
> >experienced it (and if you know when it stopped or why, including what
> >you may have done, include that), chime in on this ticket from 2012
> >(made for 9.x but the style of issue still applies; c#5 is quite valid):
> >
> >https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=173541
> >
> >For those still experiencing it, I'd suggest reading c#8 and seeing if
> >sysctl kern.eventtimer.periodic=1 relieves the problem for you. (At
> >this time I would not suggest leaving that set indefinitely, as it does
> >seem to increase the interrupt rate in cpuX:timer in vmstat -i. But for
> >me kern.eventtimer.periodic=1 "fixes" the issue.)
>
> Is it on a real HW server or in some kind of virtualization? I am seeing
> load 0.5 - 1.2 on three virtual machines in VMware. The machines are
> without any traffic. Just a fresh installation of FreeBSD 10.1 and some
> services without any public content.

I've seen it on both bare-metal and VMs. Please see c#8 in the ticket; there's an itemised list of where I've seen it, but I'm sure it's not limited to just those.

--
| Jeremy Chadwick                              j...@koitsu.org |
| UNIX Systems Administrator            http://jdc.koitsu.org/ |
| Making life hard for others since 1977.       PGP 4BD6C0CB   |
Re: stable/10: high load average when box is idle
Jeremy Chadwick wrote on 10/27/2015 06:05:
> (I am not subscribed to the mailing list, please keep me CC'd)
>
> Issue: a stable/10 system that has an abnormally high load average
> (e.g. 0.15, but may be higher depending on other variables which I
> can't account for) when the machine is definitely idle (i.e. cannot be
> traced to high interrupt usage per vmstat -i, cannot be traced to a
> userland process or kernel thread, etc.).
>
> This problem has been discussed many times on the FreeBSD mailing
> lists and the FreeBSD forum (including some folks seeing it on 9.x,
> but my complaint here is focused on 10.x so please focus there).
>
> I'd politely like to request that anyone experiencing this, or who has
> experienced it (and if you know when it stopped or why, including what
> you may have done, include that), chime in on this ticket from 2012
> (made for 9.x but the style of issue still applies; c#5 is quite
> valid):
>
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=173541
>
> For those still experiencing it, I'd suggest reading c#8 and seeing if
> sysctl kern.eventtimer.periodic=1 relieves the problem for you. (At
> this time I would not suggest leaving that set indefinitely, as it
> does seem to increase the interrupt rate in cpuX:timer in vmstat -i.
> But for me kern.eventtimer.periodic=1 "fixes" the issue.)

Is it on a real HW server or in some kind of virtualization? I am seeing load 0.5 - 1.2 on three virtual machines in VMware. The machines are without any traffic. Just a fresh installation of FreeBSD 10.1 and some services without any public content.

Miroslav Lachman