Re: Tracking down a problem with php on FreeBSD
Ivan Voras wrote: On 5 February 2011 19:43, Ruslan Mahmatkhanov wrote: Hi, Ivan! Thank you much for response and sorry for late answer. We was able to collect some data about the issue to make discussion more objective. See below. Simple php-fpm restart solves the problem, but i need to track it down to the cause of this situation and ask for your assistance and instructions on how to debug it. Some facts about this: On one hand, FPM is said to be very experimental... Personally, I've been using apache22-worker or apache22-event + mod_fcgid for years without trouble. We prefer to avoid using apache at all, because in this it's just adds yet another unneeded link and complexity. I guess it's about tradeoffs beween complexity and stability :) - `top -mio` shows very high (8-9 for VCSW) VCSW/IVCSW values for php-fpm processes and LA is more than 120 I think this is significant, especially with this: When attaching to any hanging php-fpm proccess with truss, than i see a lot of this calls: sched_yield(0x80516c000,0x1,0x4d4828b6,0x8012ef45c,0x808bfd80,0x7fffa078) = 0 (0x0) sched_yield(0x80516c000,0x1,0x4d4828b6,0x8012ef45c,0x808bfd80,0x7fffa078) = 0 (0x0) sched_yield(0x80516c000,0x1,0x4d4828b6,0x8012ef45c,0x808bfd80,0x7fffa078) = 0 (0x0) sched_yield(0x80516c000,0x1,0x4d4828b6,0x8012ef45c,0x808bfd80,0x7fffa078) = 0 (0x0) "Normal" processes of the type PHP is have no need to call sched_yield() arbitrarily, unless they are implementing something they shouldn't - like a synchronization primitive (semaphore/lock). If "a lot" means "of the same order of magnitude as your VCSW rate", this is the reason for it. I've analyzed my php-cgi binary and modules and they don't use sched_yield. And yes, grepping for it in the source finds it only in FPM: sapi/fpm/fpm/fpm_atomic.h:140: sched_yield(); It seems they are trying to implement a spinlock by hand, instead of using what the OS provides. (on the other hand, the implementation might be correct but they may be using it wrong). In any case, this points to bugs in FPM. if so, unfortunately I can't help you further. If you really want to continue using FPM, I guess you should probably replace this hand-made lock implementation by sem(4) or see if "robust" pthreads mutexes can be committed and MFCed (maybe with David Xu). Yes, there is a p4 branch implemented pthread robust mutex, it requires ABI change. Document for the POSIX robust mutex is here: http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_mutexattr_getrobust.html ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: Tracking down a problem with php on FreeBSD
06.02.2011 01:14, Ivan Voras пишет: On 5 February 2011 23:11, Ruslan Mahmatkhanov wrote: Yes, it seems so. But all of this locking/threading is a black magick for me right now, and i don't feel to be able to study out with this fpm issue by myself. So i just sent this last obtained info to php-fpm mailing list. And thank you again, Ivan, for your analysis and explanations. They would likely be more helped if you found a core dump of a crashed PHP process (if that is what's causing it). I'll try to find, but it seems they just hanging, not crashing. -- Regards, Ruslan ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: Tracking down a problem with php on FreeBSD
On 5 February 2011 23:11, Ruslan Mahmatkhanov wrote: > Yes, it seems so. But all of this locking/threading is a black magick for me > right now, and i don't feel to be able to study out with this fpm issue by > myself. So i just sent this last obtained info to php-fpm mailing list. And > thank you again, Ivan, for your analysis and explanations. They would likely be more helped if you found a core dump of a crashed PHP process (if that is what's causing it). ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: Tracking down a problem with php on FreeBSD
06.02.2011 00:39, Ivan Voras пишет: On 5 February 2011 21:22, Ivan Voras wrote: On 5 February 2011 21:03, Ruslan Mahmatkhanov wrote: Can you please tell me more what you mean by ""robust" pthreads mutexes" and It's just a name for properties of a mutex; actually this is imprecise, what's needed here is process-shared& robust (fpm_shm_slots.c: FPM uses shared memory). Actually I think "robustness" is the key here (in this context it means that the locks of a thread / processes are released if a thread / process dies unexpectedly (crashes)); It is very likely that in your case the PHP process with FPM SAPI module dies while holding a lock shared between processes and the other processes get stuck waiting for this lock to unlock. Yes, it seems so. But all of this locking/threading is a black magick for me right now, and i don't feel to be able to study out with this fpm issue by myself. So i just sent this last obtained info to php-fpm mailing list. And thank you again, Ivan, for your analysis and explanations. -- Regards, Ruslan ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: Tracking down a problem with php on FreeBSD
On 5 February 2011 21:22, Ivan Voras wrote: > On 5 February 2011 21:03, Ruslan Mahmatkhanov wrote: > >> >> Can you please tell me more what you mean by ""robust" pthreads mutexes" and > > It's just a name for properties of a mutex; actually this is > imprecise, what's needed here is process-shared & robust > (fpm_shm_slots.c: FPM uses shared memory). Actually I think "robustness" is the key here (in this context it means that the locks of a thread / processes are released if a thread / process dies unexpectedly (crashes)); It is very likely that in your case the PHP process with FPM SAPI module dies while holding a lock shared between processes and the other processes get stuck waiting for this lock to unlock. ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: Tracking down a problem with php on FreeBSD
05.02.2011 23:43, Doug Barton пишет: On 02/05/2011 10:58, Ruslan Mahmatkhanov wrote: I just forget to add. We tried to rebuild php and modules with WITHOUT_LINKTHR=false Due to the way make works the opposite of WITHOUT_* is WITH_*. The actual value that it's defined to is ignored. In any case, you're better off using 'make config', that's what it's there for. :) Doug Yes, i saw the contradiction here, and yes i used make config for enabling this option. Thanks. -- Regards, Ruslan ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: Tracking down a problem with php on FreeBSD
On 02/05/2011 10:58, Ruslan Mahmatkhanov wrote: I just forget to add. We tried to rebuild php and modules with WITHOUT_LINKTHR=false Due to the way make works the opposite of WITHOUT_* is WITH_*. The actual value that it's defined to is ignored. In any case, you're better off using 'make config', that's what it's there for. :) Doug -- Nothin' ever doesn't change, but nothin' changes much. -- OK Go Breadth of IT experience, and depth of knowledge in the DNS. Yours for the right price. :) http://SupersetSolutions.com/ ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: Tracking down a problem with php on FreeBSD
On 5 February 2011 21:03, Ruslan Mahmatkhanov wrote: > > Can you please tell me more what you mean by ""robust" pthreads mutexes" and It's just a name for properties of a mutex; actually this is imprecise, what's needed here is process-shared & robust (fpm_shm_slots.c: FPM uses shared memory). ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: Tracking down a problem with php on FreeBSD
05.02.2011 22:43, Ivan Voras пишет: - `top -mio` shows very high (8-9 for VCSW) VCSW/IVCSW values for php-fpm processes and LA is more than 120 I think this is significant, especially with this: When attaching to any hanging php-fpm proccess with truss, than i see a lot of this calls: sched_yield(0x80516c000,0x1,0x4d4828b6,0x8012ef45c,0x808bfd80,0x7fffa078) = 0 (0x0) sched_yield(0x80516c000,0x1,0x4d4828b6,0x8012ef45c,0x808bfd80,0x7fffa078) = 0 (0x0) sched_yield(0x80516c000,0x1,0x4d4828b6,0x8012ef45c,0x808bfd80,0x7fffa078) = 0 (0x0) sched_yield(0x80516c000,0x1,0x4d4828b6,0x8012ef45c,0x808bfd80,0x7fffa078) = 0 (0x0) "Normal" processes of the type PHP is have no need to call sched_yield() arbitrarily, unless they are implementing something they shouldn't - like a synchronization primitive (semaphore/lock). If "a lot" means "of the same order of magnitude as your VCSW rate", this is the reason for it. I've analyzed my php-cgi binary and modules and they don't use sched_yield. And yes, grepping for it in the source finds it only in FPM: sapi/fpm/fpm/fpm_atomic.h:140: sched_yield(); It seems they are trying to implement a spinlock by hand, instead of using what the OS provides. (on the other hand, the implementation might be correct but they may be using it wrong). In any case, this points to bugs in FPM. if so, unfortunately I can't help you further. Ok, thank you. I'll try to ask FPM people once again (i sent them the same message that is in hackers@ but with no response) If you really want to continue using FPM, I guess you should probably replace this hand-made lock implementation by sem(4) or see if "robust" pthreads mutexes can be committed and MFCed (maybe with David Xu). Can you please tell me more what you mean by ""robust" pthreads mutexes" and if it can help somehow without modifying fpm code? Is there any patches that we can apply to our development server to play with? Here is the FPM file: http://svn.php.net/viewvc/php/php-src/branches/PHP_5_3/sapi/fpm/fpm/fpm_atomic.h?revision=305417&view=markup -- Regards, Ruslan ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: Tracking down a problem with php on FreeBSD
On 5 February 2011 19:43, Ruslan Mahmatkhanov wrote: > Hi, Ivan! > > Thank you much for response and sorry for late answer. We was able to > collect some data about the issue to make discussion more objective. See > below. >>> Simple php-fpm restart solves the problem, but i need to track it down >>> to the cause of this situation and ask for your assistance and >>> instructions on how to debug it. Some facts about this: >> >> On one hand, FPM is said to be very experimental... >> >> Personally, I've been using apache22-worker or apache22-event + >> mod_fcgid for years without trouble. > > We prefer to avoid using apache at all, because in this it's just adds yet > another unneeded link and complexity. I guess it's about tradeoffs beween complexity and stability :) >>> - `top -mio` shows very high (8-9 for VCSW) VCSW/IVCSW values >>> for php-fpm processes and LA is more than 120 I think this is significant, especially with this: > When attaching to any hanging php-fpm proccess with truss, than i see a lot > of this calls: > sched_yield(0x80516c000,0x1,0x4d4828b6,0x8012ef45c,0x808bfd80,0x7fffa078) > = 0 (0x0) > sched_yield(0x80516c000,0x1,0x4d4828b6,0x8012ef45c,0x808bfd80,0x7fffa078) > = 0 (0x0) > sched_yield(0x80516c000,0x1,0x4d4828b6,0x8012ef45c,0x808bfd80,0x7fffa078) > = 0 (0x0) > sched_yield(0x80516c000,0x1,0x4d4828b6,0x8012ef45c,0x808bfd80,0x7fffa078) > = 0 (0x0) "Normal" processes of the type PHP is have no need to call sched_yield() arbitrarily, unless they are implementing something they shouldn't - like a synchronization primitive (semaphore/lock). If "a lot" means "of the same order of magnitude as your VCSW rate", this is the reason for it. I've analyzed my php-cgi binary and modules and they don't use sched_yield. And yes, grepping for it in the source finds it only in FPM: sapi/fpm/fpm/fpm_atomic.h:140: sched_yield(); It seems they are trying to implement a spinlock by hand, instead of using what the OS provides. (on the other hand, the implementation might be correct but they may be using it wrong). In any case, this points to bugs in FPM. if so, unfortunately I can't help you further. If you really want to continue using FPM, I guess you should probably replace this hand-made lock implementation by sem(4) or see if "robust" pthreads mutexes can be committed and MFCed (maybe with David Xu). Here is the FPM file: http://svn.php.net/viewvc/php/php-src/branches/PHP_5_3/sapi/fpm/fpm/fpm_atomic.h?revision=305417&view=markup ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: Tracking down a problem with php on FreeBSD
I just forget to add. We tried to rebuild php and modules with WITHOUT_LINKTHR=false (default is true) w/o any visible changes. I also the this in failures column of vmstat -z, may it be somehow related to the issue? ITEMSIZE LIMIT USED FREE REQUESTS FAILURES 32 Bucket: 280,0, 263,3, 263,1 64 Bucket: 536,0, 323,6, 329, 93 128 Bucket:1048,0, 1937,1, 8897,21199 tcptw: 72, 5150, 4256, 894, 4862309, 6966 -- Regards, Ruslan ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: Tracking down a problem with php on FreeBSD
Hi, Ivan! Thank you much for response and sorry for late answer. We was able to collect some data about the issue to make discussion more objective. See below. 24.01.2011 16:54, Ivan Voras пишет: On 23.1.2011 23:58, Ruslan Mahmatkhanov wrote: Good day! We are using custom php application on FreeBSD 8.1R amd64. It is started with php-fpm 5.3.3 from ports as backend and nginx 0.8.54 as frontend. Several times per day this app is making self unavailable. I think it would be more appropriate to ask this on the stable@ list. I believe that this issue is not related to stable only and problem resolution will involve some debugging and stuff, so this was the reason why i post this to hackers@. Simple php-fpm restart solves the problem, but i need to track it down to the cause of this situation and ask for your assistance and instructions on how to debug it. Some facts about this: On one hand, FPM is said to be very experimental... Personally, I've been using apache22-worker or apache22-event + mod_fcgid for years without trouble. We prefer to avoid using apache at all, because in this it's just adds yet another unneeded link and complexity. It looks very application-specific, possibly not really an OS problem (or maybe a problem of different expectations from the OS when porting from Linux). - `top -mio` shows very high (8-9 for VCSW) VCSW/IVCSW values for php-fpm processes and LA is more than 120 How many "real" user request are in these 120? Do any users at the time of problem (this doesn't look like a "crash") receive valid responses? This problem is seen on development server too with < 5 users online. And nobody receives valid response, they all get this message: Maximum execution time of 30 seconds exceeded in Unknown on line 0. - user seeing http 502 error code in browser - php-fpm log has many of this lines in time of crash: Jan 23 17:56:58.176425 [WARNING] [pool world] server reached max_children setting (100), consider raising it Did you try raising it? Does the error happen ONLY when this limit is reached? Yes, we did. Nothing changes. I'm not sure about second question, i believe it's not related. These are some very varied errors, not especially consistent with each other. Did you try some generic socket & TCP tuning like described in http://serverfault.com/questions/64356/freebsd-performance-tuning-sysctls-loader-conf-kernel ? Yes we did some on initial stage when server was built. And recently tried to raise kern.ipc.somaxconn, but without success. There is full current sysctl.conf: security.bsd.see_other_uids = 0 security.bsd.see_other_gids = 0 security.bsd.unprivileged_proc_debug=0 security.bsd.unprivileged_read_msgbuf=0 security.bsd.conservative_signals=1 security.bsd.hardlink_check_uid=1 security.bsd.hardlink_check_gid=1 vfs.usermount=0 net.inet.tcp.drop_synfin=1 net.inet.icmp.drop_redirect=1 net.inet.icmp.log_redirect=1 net.inet.icmp.bmcastecho=0 net.inet.icmp.maskrepl=0 net.inet.icmp.icmplim=100 net.link.ether.inet.max_age=800 net.inet.ip.redirect=0 net.inet6.ip6.redirect=0 net.inet.ip.sourceroute=0 net.inet.ip.accept_sourceroute=0 net.inet.tcp.blackhole=2 net.inet.udp.blackhole=1 kern.ps_showallprocs=0 net.inet.ip.random_id=1 kern.corefile=/tmp/%U.%N.%P.core Other than that, you will probably have to debug the php-fpm processes. Start by observing in which state they are (top without "-mio"). If the processes are blocking, try "procstat -k " on them. Thay are not in blocking state: PID USERNAMETHR PRI NICE SIZERES STATE C TIME WCPU COMMAND 39405 www 1 1010 108M 14832K RUN1 0:54 25.39% php-fpm 39544 www 1 1010 108M 14832K CPU1 1 0:42 25.39% php-fpm 39395 www 1 1010 108M 14828K CPU1 1 0:57 23.39% php-fpm 39401 www 1 1010 108M 14832K CPU3 3 0:55 23.19% php-fpm 39481 www 1 1010 108M 14832K RUN3 0:55 23.10% php-fpm 13024 www 1 1010 112M 47156K RUN0 1:30 23.00% php-fpm 39398 www 1 40 108M 14832K accept 0 1:00 13.48% php-fpm 39373 www 1 40 108M 14832K accept 0 1:00 10.60% php-fpm 39385 www 1 40 108M 14832K accept 1 1:00 9.28% php-fpm 22130 www 1 40 111M 40200K accept 1 1:32 7.57% php-fpm There is procstat -k/-t for couple of processes from this list: PID TID COMM TDNAME KSTACK 39405 100110 php-fpm - dmapbase PIDTID COMM TDNAME CPU PRI STATE WCHAN 39405 100110 php-fpm -1 185 run - PIDTID COMM TDNAME KSTACK 39398 100154 php-fpm - tdq_cpu PIDTID COMM TDNAME CPU PRI STATE WCHAN 39398 100154 php-fpm - 0 88 sleep accept When attaching to any hanging php-fpm proccess with truss, than i see a lot of this calls: sched_yield(0x80516c000,0x1
Re: Tracking down a problem with php on FreeBSD
On 23.1.2011 23:58, Ruslan Mahmatkhanov wrote: Good day! We are using custom php application on FreeBSD 8.1R amd64. It is started with php-fpm 5.3.3 from ports as backend and nginx 0.8.54 as frontend. Several times per day this app is making self unavailable. I think it would be more appropriate to ask this on the stable@ list. Simple php-fpm restart solves the problem, but i need to track it down to the cause of this situation and ask for your assistance and instructions on how to debug it. Some facts about this: On one hand, FPM is said to be very experimental... Personally, I've been using apache22-worker or apache22-event + mod_fcgid for years without trouble. - I don't know how to manually reproduce this, but it happens several times every day - It happens on FreeBSD 7.x too - It happens with apache+mod_php instead php-fpm - It happens with lighthttpd instead nginx - It happens with both SHED_4BSD and SHED_ULE - It doesn't happen on php =< 5.2.12 - It happens with and w/o eaccelerator It looks very application-specific, possibly not really an OS problem (or maybe a problem of different expectations from the OS when porting from Linux). - `top -mio` shows very high (8-9 for VCSW) VCSW/IVCSW values for php-fpm processes and LA is more than 120 How many "real" user request are in these 120? Do any users at the time of problem (this doesn't look like a "crash") receive valid responses? - user seeing http 502 error code in browser - php-fpm log has many of this lines in time of crash: Jan 23 17:56:58.176425 [WARNING] [pool world] server reached max_children setting (100), consider raising it Did you try raising it? Does the error happen ONLY when this limit is reached? 2011/01/23 17:57:00 [error] 38018#0: *26006023 writev() failed (54: Connection reset by peer) while sending request to upstream, client: xx.xx.xx.xx, server: some.server.org, request: "POST /?ctrl=Chat& a=chatList&__path=chat_list&h=8093b9e1cf448762d5677e21bded97ae& h1=38ca8b747a46098c3b1a4f39e6658170 HTTP/1.1", upstream: "fastcgi://127.0.0.1:9002", host: "some.server.org", referrer: "http://some.server.org/"; 2011/01/23 17:57:00 [error] 38016#0: *26029878 kevent() reported about an closed connection (54: Connection reset by peer) while reading response header from upstream, client: xx.xx.xx.xx, server: some.server.org, request: "POST /?ctrl=Location&a=refresh& __path=refresh&h=276f591df26a65d9a1736f6e1006f4ab& h1=3c0916c16b1fc2e7015b71e90bbc3d23 HTTP/1.1", upstream: "fastcgi://127.0.0.1:9002", host: "some.server.org", referrer: "http://some.server.org/"; 2011/01/23 17:57:02 [crit] 38020#0: *26034390 open() "/tmp/nginx /client_temp/1/74/000741" failed (13: Permission denied) while sending request to upstream, client: xx.xx.xx.xx, server: some.server.org, request: "POST /?ctrl=Chat&a=send&__path=chat_send& h=4a27d8d382ba9b1059412323a451ef84& h1=b0a53c86e3c744a01356a5030559ed1a HTTP/1.1", upstream: "fastcgi://127.0.0.1:9002", host: "some.server.org", referrer: "http://some.server.org/"; 2011/01/23 17:57:02 [alert] 38020#0: *26034390 http request count is zero while sending to client, client: xx.xx.xx.xx, server: some.server.org, request: "POST /?ctrl=Chat&a=send&__path=chat_send& h=4a27d8d382ba9b1059412323a451ef84& h1=b0a53c86e3c744a01356a5030559ed1a HTTP/1.1", upstream: "fastcgi://127.0.0.1:9002", host: "some.server.org", referrer: "http://some.server.org/"; 2011/01/23 17:57:03 [error] 38014#0: *25997903 upstream prematurely closed connection while reading response header from upstream, client: 109.229.69.186, server: some.server.org, request: "POST /?ctrl=Chat&a=chatList&__path=chat_list& h=c8723de73c4f8ebb98f9bf746d75e965& h1=3ab289760a009b07b73c6d96cc94a509 HTTP/1.1", upstream: "fastcgi://127.0.0.1:9002", host: "some.server.org", referrer: "http://some.server.org/"; These are some very varied errors, not especially consistent with each other. Did you try some generic socket & TCP tuning like described in http://serverfault.com/questions/64356/freebsd-performance-tuning-sysctls-loader-conf-kernel ? Other than that, you will probably have to debug the php-fpm processes. Start by observing in which state they are (top without "-mio"). If the processes are blocking, try "procstat -k " on them. ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"