Hi, Ivan!

Thank you much for response and sorry for late answer. We was able to collect some data about the issue to make discussion more objective. See below.

24.01.2011 16:54, Ivan Voras пишет:
On 23.1.2011 23:58, Ruslan Mahmatkhanov wrote:

Good day!

We are using custom php application on FreeBSD 8.1R amd64. It is started
with php-fpm 5.3.3 from ports as backend and nginx 0.8.54 as frontend.
Several times per day this app is making self unavailable.

I think it would be more appropriate to ask this on the stable@ list.

I believe that this issue is not related to stable only and problem resolution will involve some debugging and stuff, so this was the reason why i post this to hackers@.


Simple php-fpm restart solves the problem, but i need to track it down
to the cause of this situation and ask for your assistance and
instructions on how to debug it. Some facts about this:

On one hand, FPM is said to be very experimental...

Personally, I've been using apache22-worker or apache22-event +
mod_fcgid for years without trouble.

We prefer to avoid using apache at all, because in this it's just adds yet another unneeded link and complexity.

It looks very application-specific, possibly not really an OS problem
(or maybe a problem of different expectations from the OS when porting
from Linux).

- `top -mio` shows very high (80000-90000 for VCSW) VCSW/IVCSW values
for php-fpm processes and LA is more than 120

How many "real" user request are in these 120? Do any users at the time
of problem (this doesn't look like a "crash") receive valid responses?

This problem is seen on development server too with < 5 users online. And nobody receives valid response, they all get this message:

Maximum execution time of 30 seconds exceeded in Unknown on line 0.


- user seeing http 502 error code in browser
- php-fpm log has many of this lines in time of crash:
Jan 23 17:56:58.176425 [WARNING] [pool world] server reached
max_children setting (100), consider raising it

Did you try raising it? Does the error happen ONLY when this limit is
reached?

Yes, we did. Nothing changes.
I'm not sure about second question, i believe it's not related.

These are some very varied errors, not especially consistent with each
other.

Did you try some generic socket & TCP tuning like described in
http://serverfault.com/questions/64356/freebsd-performance-tuning-sysctls-loader-conf-kernel
?

Yes we did some on initial stage when server was built. And recently tried to raise kern.ipc.somaxconn, but without success.

There is full current sysctl.conf:

security.bsd.see_other_uids = 0
security.bsd.see_other_gids = 0
security.bsd.unprivileged_proc_debug=0
security.bsd.unprivileged_read_msgbuf=0
security.bsd.conservative_signals=1
security.bsd.hardlink_check_uid=1
security.bsd.hardlink_check_gid=1
vfs.usermount=0
net.inet.tcp.drop_synfin=1
net.inet.icmp.drop_redirect=1
net.inet.icmp.log_redirect=1
net.inet.icmp.bmcastecho=0
net.inet.icmp.maskrepl=0
net.inet.icmp.icmplim=100
net.link.ether.inet.max_age=800
net.inet.ip.redirect=0
net.inet6.ip6.redirect=0
net.inet.ip.sourceroute=0
net.inet.ip.accept_sourceroute=0
net.inet.tcp.blackhole=2
net.inet.udp.blackhole=1
kern.ps_showallprocs=0
net.inet.ip.random_id=1
kern.corefile=/tmp/%U.%N.%P.core

Other than that, you will probably have to debug the php-fpm processes.
Start by observing in which state they are (top without "-mio"). If the
processes are blocking, try "procstat -k <pid>" on them.

Thay are not in blocking state:
  PID USERNAME    THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
39405 www           1 101    0   108M 14832K RUN    1   0:54 25.39% php-fpm
39544 www           1 101    0   108M 14832K CPU1   1   0:42 25.39% php-fpm
39395 www           1 101    0   108M 14828K CPU1   1   0:57 23.39% php-fpm
39401 www           1 101    0   108M 14832K CPU3   3   0:55 23.19% php-fpm
39481 www           1 101    0   108M 14832K RUN    3   0:55 23.10% php-fpm
13024 www           1 101    0   112M 47156K RUN    0   1:30 23.00% php-fpm
39398 www           1   4    0   108M 14832K accept 0   1:00 13.48% php-fpm
39373 www           1   4    0   108M 14832K accept 0   1:00 10.60% php-fpm
39385 www           1   4    0   108M 14832K accept 1   1:00  9.28% php-fpm
22130 www           1   4    0   111M 40200K accept 1   1:32  7.57% php-fpm

There is procstat -k/-t for couple of processes from this list:

PID      TID COMM             TDNAME     KSTACK
39405 100110 php-fpm             -       dmapbase
  PID    TID COMM             TDNAME     CPU  PRI STATE   WCHAN
39405 100110 php-fpm             -        1   185 run      -

  PID    TID COMM             TDNAME     KSTACK
39398 100154 php-fpm          -          tdq_cpu
  PID    TID COMM             TDNAME     CPU  PRI STATE   WCHAN
39398 100154 php-fpm          -           0   88 sleep   accept

When attaching to any hanging php-fpm proccess with truss, than i see a lot of this calls: sched_yield(0x80516c000,0x1,0x4d4828b6,0x8012ef45c,0xffffffff808bfd80,0x7fffffffa078) = 0 (0x0) sched_yield(0x80516c000,0x1,0x4d4828b6,0x8012ef45c,0xffffffff808bfd80,0x7fffffffa078) = 0 (0x0) sched_yield(0x80516c000,0x1,0x4d4828b6,0x8012ef45c,0xffffffff808bfd80,0x7fffffffa078) = 0 (0x0) sched_yield(0x80516c000,0x1,0x4d4828b6,0x8012ef45c,0xffffffff808bfd80,0x7fffffffa078) = 0 (0x0)

Thank you in advance for any tips.

--
Regards,
Ruslan
_______________________________________________
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"

Reply via email to