Re: stable/10: high load average when box is idle

2015-10-29 Thread Miroslav Lachman

Jeremy Chadwick wrote on 10/29/2015 11:09:

On Thu, Oct 29, 2015 at 11:00:32AM +0100, Miroslav Lachman wrote:

Jeremy Chadwick wrote on 10/27/2015 06:05:

[...]


Is it on a real HW server or in some kind of virtualization? I am seeing load
0.5 - 1.2 on three virtual machines in VMware. The machines are without any
traffic. Just a fresh installation of FreeBSD 10.1 and some services without
any public content.


I've seen it on both bare-metal and VMs.  Please see c#8 in the ticket;
there's an itemised list of where I've seen it, but I'm sure it's not
limited to just those.


OK, I have read your c#8 and ran some tests on our affected VMs.
With sysctl kern.eventtimer.periodic=1 it is better: where the load was
previously about 0.40, it is now 0.15.
One of these three systems is FreeBSD 10.2 and on this machine the
positive effect of kern.eventtimer.periodic=1 is more visible - the load is
now 0.00 - 0.05.

I don't know if this is a coincidence or something is different in 10.2.

The kern.eventtimer settings are the same on all VMs:

kern.eventtimer.et.LAPIC.flags: 7
kern.eventtimer.et.LAPIC.frequency: 35071418
kern.eventtimer.et.LAPIC.quality: 600
kern.eventtimer.et.i8254.flags: 1
kern.eventtimer.et.i8254.frequency: 1193182
kern.eventtimer.et.i8254.quality: 100
kern.eventtimer.et.RTC.flags: 17
kern.eventtimer.et.RTC.frequency: 32768
kern.eventtimer.et.RTC.quality: 0
kern.eventtimer.choice: LAPIC(600) i8254(100) RTC(0)
kern.eventtimer.singlemul: 4
kern.eventtimer.idletick: 0
kern.eventtimer.timer: LAPIC
kern.eventtimer.periodic: 1
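
For comparison on other boxes, the listing above should be reproducible by
querying the whole tree in one go (my assumption of how it was gathered):

# sysctl kern.eventtimer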

Miroslav Lachman



10-STABLE buildworld fails at very early stage

2015-10-29 Thread Lev Serebryakov


I have this in /etc/src.conf (it is the only line there):

MAKEOBJDIRPREFIX=/usr/home/build/obj


% cd /usr/src
% sudo svn up
Updating '.':
At revision 290139.
% sudo make buildworld
[one screen of output]
set -e; cd /usr/src/tools/build; make buildincludes; make installincludes
sh /usr/src/tools/install.sh -C -o root -g wheel -m 444   libegacy.a
/usr/home/build/obj/legacy/usr/lib
install: /usr/home/build/obj/legacy/usr/lib: No such file or directory
*** Error code 71

Stop.
make[3]: stopped in /usr/src/tools/build
*** Error code 1
% uname -v
FreeBSD 10.2-PRERELEASE #7 r286065: Thu Jul 30 21:27:35 MSK 2015
root@:/usr/obj/usr/src/sys/BLOB
%
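
For what it's worth, build(7) describes MAKEOBJDIRPREFIX as a variable to be
set in the environment (or on the make command line) rather than in a config
file, so a workaround sketch, under the assumption that the src.conf placement
is the trigger here, would be:

% sudo env MAKEOBJDIRPREFIX=/usr/home/build/obj make buildworld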

-- 
// Lev Serebryakov


Re: Stuck processes in unkillable (STOP) state, listen queue overflow

2015-10-29 Thread Schaich Alonso
On Thu, Oct 29, 2015, at 16:46, Zara Kanaeva wrote:
> [...]

I've had resetting network interfaces combined with the queue overflow
warnings on 3 different machines with 5 different NICs and 3 different
PSUs.

It disappeared when I updated to FreeBSD 10 two years ago, so I assumed
the cause had either been fixed or worked around.

Re: Stuck processes in unkillable (STOP) state, listen queue overflow

2015-10-29 Thread Zara Kanaeva

Hello Дмитрий,

thank you very much for your message.

First of all: I like FreeBSD (the installation logic, the good
documentation, etc.); this is why I use FreeBSD as a server OS. But in my
case I must disagree with your theoretical probability argument. I have
one machine (7 years old) that had 1-2 spontaneous reboots a year. I got
a lot of "already in queue awaiting acceptance" errors, and the machine
rebooted immediately after them.

I will soon get a replacement for this old machine with at least
32 GB RAM and (of course) a new power supply, so I will see if my
problem (perhaps it is only my problem) still persists.


Greetings, Z. Kanaeva.

Quoting Дмитрий Долбнин:


Good day everyone!
From my point of view it seems like you're experiencing degraded
hardware performance, which is causing the problems you see.

Try switching to a new power supply, at least.
Why do I think so? Because bad power supplies are encountered far more
often than bad FreeBSD source code. Of course, I can't tell you that
you're completely wrong.

Best regards, Dimitry.

Wednesday, 28 October 2015, 12:00 UTC, from freebsd-stable-requ...@freebsd.org:


Message: 1
Date: Tue, 27 Oct 2015 14:42:42 +0100
From: Zara Kanaeva <zara.kana...@ggi.uni-tuebingen.de>
Subject: Re: Stuck processes in unkillable (STOP) state, listen queue overflow

Hello,

I have the same experience with apache and mapserver. It happens on a
physical machine and ends with a spontaneous reboot. This machine was
updated from FreeBSD 9.0-RELEASE to FreeBSD 10.2-PRERELEASE. Perhaps
this machine doesn't have enough RAM (only 8GB), but I think that should
not be a reason for a spontaneous reboot.

I had no such behavior with the same machine and FreeBSD 9.0-RELEASE
on it (I am not 100% sure; I have no way to test it yet).

Regards, Z. Kanaeva.

Zitat von "Nagy, Attila" < b...@fsn.hu >:


Hi,

Recently I've started to see a lot of cases where the log is full of
"listen queue overflow" messages and the process behind the network
socket is unavailable.
When I open a TCP connection to it, it opens but nothing happens (for
example, I get no SMTP banner from postfix, nor do I get a log entry
about the new connection).

I've seen this with Java programs, postfix and redis, basically
anything that opens a TCP socket and listens on the machine.

For example, I have a redis process which listens on 6381. When I
telnet into it, the TCP connection opens, but the program doesn't respond.
When I kill it, nothing happens. Even kill -9 yields only this state:
  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME  WCPU COMMAN
  776 redis       2  20    0 24112K  2256K STOP    3  16:56 0.00% redis-
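
For anyone debugging a process wedged like this, one standard next step
(my suggestion, not from the original report) is to dump its kernel
thread stacks to see what it is blocked on:

# procstat -kk 776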

When I tcpdrop the connections of the process, tcpdrop reports success
the first time and failure (No such process) the second, but the
connections remain:
# sockstat -4 | grep 776
redis    redis-serv 776   6  tcp4   *:6381          *:*
redis    redis-serv 776   9  tcp4   *:16381         *:*
redis    redis-serv 776   10 tcp4   127.0.0.1:16381 127.0.0.1:10460
redis    redis-serv 776   11 tcp4   127.0.0.1:16381 127.0.0.1:35795
redis    redis-serv 776   13 tcp4   127.0.0.1:30027 127.0.0.1:16379
redis    redis-serv 776   14 tcp4   127.0.0.1:58802 127.0.0.1:16384
redis    redis-serv 776   17 tcp4   127.0.0.1:16381 127.0.0.1:24354
redis    redis-serv 776   18 tcp4   127.0.0.1:16381 127.0.0.1:56999
redis    redis-serv 776   19 tcp4   127.0.0.1:16381 127.0.0.1:39488
redis    redis-serv 776   20 tcp4   127.0.0.1:6381  127.0.0.1:39491
# sockstat -4 | grep 776 | awk '{print "tcpdrop "$6" "$7}' | /bin/sh
tcpdrop: getaddrinfo: * port 6381: hostname nor servname provided,
or not known
tcpdrop: getaddrinfo: * port 16381: hostname nor servname provided,
or not known
tcpdrop: 127.0.0.1 16381 127.0.0.1 10460: No such process
tcpdrop: 127.0.0.1 16381 127.0.0.1 35795: No such process
tcpdrop: 127.0.0.1 30027 127.0.0.1 16379: No such process
tcpdrop: 127.0.0.1 58802 127.0.0.1 16384: No such process
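
As an aside, the two getaddrinfo failures are just the wildcard listening
sockets ("*:6381", "*:16381"), which tcpdrop cannot parse. A hypothetical,
untested refinement that skips them:

# sockstat -4 | awk '$3 == 776 && $6 !~ /^\*/ {print "tcpdrop "$6" "$7}' | /bin/sh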

Re: ZFS, SSDs, and TRIM performance

2015-10-29 Thread Steven Hartland

If you're running NVMe, are you running a version which has this:
https://svnweb.freebsd.org/base?view=revision&revision=285767

I'm pretty sure 10.2 does have that, so you should be good, but best to 
check.


Other questions:
1. What does "gstat -d -p" show during the stalls?
2. Do you have any other zfs tuning in place?

On 29/10/2015 16:54, Sean Kelly wrote:

Me again. I have a new issue and I’m not sure if it is hardware or software. I
have nine servers running 10.2-RELEASE-p5 with Dell OEM’d Samsung XS1715 NVMe
SSDs. They are paired up in a single mirrored zpool on each server. They
perform great most of the time. However, I have a problem when ZFS fires off
TRIMs: not during vdev creation, but, for example, if I delete a 20GB snapshot.

[...]




Re: stable/10: high load average when box is idle

2015-10-29 Thread Miroslav Lachman

Miroslav Lachman wrote on 10/29/2015 12:47:

[...]



Just for the record, I added graphs of CPU load from these three VMs:

FreeBSD 10.1
http://imagebin.ca/v/2Kkyq29M13d3
FreeBSD 10.1
http://imagebin.ca/v/2KkzUccxJEoE
FreeBSD 10.2
http://imagebin.ca/v/2Kl00mS4RQ3n

And the corresponding CPU idle percentages:

FreeBSD 10.1
http://imagebin.ca/v/2Kl0R0U1pRhg
FreeBSD 10.1
http://imagebin.ca/v/2Kl0cYiB0mS4
FreeBSD 10.2
http://imagebin.ca/v/2Kl0lIipTKXc

As I mentioned, the difference with / without kern.eventtimer.periodic=1
is more visible on FreeBSD 10.2.


The flat line on the graphs is the interval where I disabled almost all
services (crontab too), so there are no measurements for that time.
The effect of kern.eventtimer.periodic=1 is visible from 18:00, when I
started all the usual services.


Miroslav Lachman


ZFS, SSDs, and TRIM performance

2015-10-29 Thread Sean Kelly
Me again. I have a new issue and I’m not sure if it is hardware or software. I
have nine servers running 10.2-RELEASE-p5 with Dell OEM’d Samsung XS1715 NVMe
SSDs. They are paired up in a single mirrored zpool on each server. They
perform great most of the time. However, I have a problem when ZFS fires off
TRIMs: not during vdev creation, but, for example, if I delete a 20GB snapshot.

If I destroy a 20GB snapshot or delete large files, ZFS fires off tons of TRIMs 
to the disks. I can see the kstat.zfs.misc.zio_trim.success and 
kstat.zfs.misc.zio_trim.bytes sysctls skyrocket. While this is happening, any 
synchronous writes seem to block. For example, we’re running PostgreSQL which 
does fsync()s all the time. While these TRIMs happen, Postgres just hangs on 
writes. This causes reads to block due to lock contention as well.

If I set sync=disabled on my tank/pgsql dataset while this is happening, it
unblocks for the most part. But obviously this is not an ideal way to run
PostgreSQL.
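
For reference, the toggle in question, and reverting to the default
afterwards (sync=standard is the default):

# zfs set sync=disabled tank/pgsql
# zfs set sync=standard tank/pgsql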

I’m working with my vendor to get some Intel SSDs to test, but any ideas if 
this could somehow be a software issue? Or does the Samsung XS1715 just suck at 
TRIM and SYNC?

We’re thinking of just setting the vfs.zfs.trim.enabled=0 tunable for now since
WAL segment turnover causes a lot of TRIM operations, but unfortunately
changing it requires a reboot. Disabling TRIM does seem to fix the issue on
other servers I’ve tested with the same hardware config.
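
Since vfs.zfs.trim.enabled is a boot-time tunable rather than a runtime
sysctl (hence the reboot), the change would go in /boot/loader.conf; a
minimal sketch:

vfs.zfs.trim.enabled="0"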

-- 
Sean Kelly
smke...@smkelly.org
http://smkelly.org


Re: stable/10: high load average when box is idle

2015-10-29 Thread Jeremy Chadwick
On Thu, Oct 29, 2015 at 11:00:32AM +0100, Miroslav Lachman wrote:
> Jeremy Chadwick wrote on 10/27/2015 06:05:
> > [...]
> 
> Is it on a real HW server or in some kind of virtualization? I am seeing load
> 0.5 - 1.2 on three virtual machines in VMware. The machines are without any
> traffic. Just a fresh installation of FreeBSD 10.1 and some services without
> any public content.

I've seen it on both bare-metal and VMs.  Please see c#8 in the ticket;
there's an itemised list of where I've seen it, but I'm sure it's not
limited to just those.

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administratorhttp://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |



Re: stable/10: high load average when box is idle

2015-10-29 Thread Miroslav Lachman

Jeremy Chadwick wrote on 10/27/2015 06:05:

(I am not subscribed to the mailing list, please keep me CC'd)

Issue: a stable/10 system that has an abnormally high load average (e.g.
0.15, but may be higher depending on other variables which I can't
account for) when the machine is definitely idle (i.e. cannot be traced
to high interrupt usage per vmstat -i, cannot be traced to a userland
process or kernel thread, etc.).

This problem has been discussed many times on the FreeBSD mailing lists
and the FreeBSD forum (including some folks seeing it on 9.x, but my
complaint here is focused on 10.x so please focus there).

I'd politely like to request that anyone experiencing this, or who has
experienced it (if you know when it stopped or why, including what
you may have done, include that), chime in on this ticket from 2012
(made for 9.x but the style of issue still applies; c#5 is quite valid):

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=173541

For those still experiencing it, I'd suggest reading c#8 and seeing if
sysctl kern.eventtimer.periodic=1 relieves the problem for you.  (At
this time I would not suggest leaving that set indefinitely, as it does
seem to increase the interrupt rate in cpuX:timer in vmstat -i.  But for
me kern.eventtimer.periodic=1 "fixes" the issue)
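
For anyone trying this, the setting can be applied at runtime and, if it
helps, persisted across reboots (a minimal sketch; note the caveat above
about leaving it set indefinitely):

# sysctl kern.eventtimer.periodic=1
# echo 'kern.eventtimer.periodic=1' >> /etc/sysctl.conf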


Is it on a real HW server or in some kind of virtualization? I am seeing
load 0.5 - 1.2 on three virtual machines in VMware. The machines are
without any traffic. Just a fresh installation of FreeBSD 10.1 and some
services without any public content.


Miroslav Lachman
