#43610 [Com]: fastcgi socket dies on high concurrency

2007-12-23 Thread olafvdspek at gmail dot com
 ID:   43610
 Comment by:   olafvdspek at gmail dot com
 Reported By:  oliver at realtsp dot com
 Status:   Open
 Bug Type: CGI related
 Operating System: FreeBSD 6.2
 PHP Version:  5.2.5
 New Comment:

> Since the parent process manages this queue

Eh, are you sure it does? As far as I know that's not true.


Previous Comments:


[2007-12-23 11:31:54] oliver at realtsp dot com

@olafvdspek at gmail dot com

That is not in keeping with the FastCGI spec:

#  FCGI_OVERLOADED: rejecting a new request. This happens when the
application runs out of some resource, e.g. database connections.

The situation I am talking about here is a severely overloaded
condition, i.e. all PHP worker (child) processes are already busy and
there is a queue of, in my case, an additional 200+ connections.

My suggestion is that the PHP parent process allows a max_fastcgi_queue
of, say, 200 and then rejects further connections with
FCGI_OVERLOADED. Since the parent process manages this queue, it should
know its size, and it "should" be easy to place a max limit on that
size. The limit could be configured in php.ini.
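
A minimal sketch of what such a limit could look like, written in
Python rather than PHP's C source, purely for illustration. The record
layout and the FCGI_OVERLOADED value (2) follow the FastCGI spec;
max_fastcgi_queue is the hypothetical setting proposed above and
admit() is an invented helper, not anything PHP provides. It also
assumes that some process has already accepted the connection and read
the request ID, which may not be how php-cgi actually works.

```python
import struct
from collections import deque

# Constants from the FastCGI specification.
FCGI_VERSION_1 = 1
FCGI_END_REQUEST = 3   # record type
FCGI_OVERLOADED = 2    # protocolStatus: rejecting a new request


def end_request_record(request_id, app_status, protocol_status):
    """Build a complete FCGI_END_REQUEST record (8-byte header + 8-byte body)."""
    # Body: appStatus (4 bytes), protocolStatus (1 byte), reserved (3 bytes).
    body = struct.pack("!IB3x", app_status, protocol_status)
    # Header: version, type, requestId, contentLength, paddingLength, reserved.
    header = struct.pack("!BBHHBx", FCGI_VERSION_1, FCGI_END_REQUEST,
                         request_id, len(body), 0)
    return header + body


def admit(queue, request_id, max_fastcgi_queue=200):
    """Hypothetical admission check: queue the request or reject it.

    Returns None if the request was queued, or the bytes of an
    FCGI_OVERLOADED end-request record if the queue is already full.
    """
    if len(queue) >= max_fastcgi_queue:
        return end_request_record(request_id, 0, FCGI_OVERLOADED)
    queue.append(request_id)
    return None
```

With a deque as the queue, admit(queue, request_id) returns None until
the limit is reached; after that it returns the 16 bytes that would be
written back to the web server before closing the connection.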



[2007-12-22 15:10:55] olafvdspek at gmail dot com

> Could you explain or perhaps review PHP's behaviour under overloaded
> conditions.

I'm no PHP developer and haven't looked at the code, but my guess:
A PHP process has C children, each being able to handle one connection.
When that connection is closed, it'll do an accept() to handle a new
connection.
When a web server opens more than C connections, those will not be
accepted until an existing connection is closed, which may take a long
time.
So a web server should never open more than C connections to one PHP
process.
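
To make that guess concrete, here is a small simulation in Python. It
is only a model of the behaviour described above, not PHP's actual
code: accept_times() is a made-up helper, every request is assumed to
take the same service_time, and waiting connections are assumed to be
accepted in arrival order.

```python
import heapq


def accept_times(workers, arrivals, service_time):
    """Return the time at which each connection gets accepted.

    workers      -- number of child processes (C above), one connection each
    arrivals     -- times at which the web server opens each connection
    service_time -- how long a child stays busy with one connection
    """
    # Min-heap of the times at which each child next calls accept().
    free_at = [0] * workers
    heapq.heapify(free_at)

    accepted = []
    for opened in sorted(arrivals):
        next_free = heapq.heappop(free_at)
        # A connection is accepted once it has been opened AND a child is free.
        start = max(opened, next_free)
        accepted.append(start)
        heapq.heappush(free_at, start + service_time)
    return accepted
```

For example, accept_times(2, [0, 0, 0, 0, 0], 10) gives
[0, 0, 10, 10, 20]: with C = 2, the third and fourth connections wait
a full request time and the fifth waits two.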



[2007-12-17 13:05:41] oliver at realtsp dot com

Actually..

It turns out that the PHP parent is not dead at all. Even with stable
5.2.5 (rather than 5.2-latest), if you set up the FastCGI server to be
started separately from lighty, i.e. with a lighty config like this:

fastcgi.server = ( ".php" =>
  ( "localhost" =>
    (
      "socket" => "/tmp/php-fastcgi.sock"
    )
  )
)

and then use spawn_fcgi to start the PHP FastCGI server manually, then
all behaves as expected, i.e. you get some (not all!!) 500s while the
overload condition exists, and when the load drops away you get all
normal 200 responses again, i.e. elastic/tolerant performance as hoped
for.

After some investigation into the lighty source, it turns out that
lighty is confused by the fact that PHP just fails to respond (i.e.
times out) rather than returning FCGI_OVERLOADED. Refer to this:

http://bugs.php.net/bug.php?id=39809

where dmitry said:

"PHP cannot return FCGI_OVERLOADED, because all PHP processes are busy
and nobody accepts new connection. The only way to detect this
situation - use connection timeout."
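
A small Python sketch of what that looks like from the web server's
side, under some assumptions: it uses a loopback TCP socket instead of
the Unix socket in the config above, and
probe_unaccepted_listener() is an invented name. The listener never
calls accept(), standing in for a PHP whose children are all busy.

```python
import socket


def probe_unaccepted_listener(timeout=0.2):
    """Connect to a listener that never accepts and report what happens."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))   # any free port on loopback
    server.listen(5)                # the kernel queues connections from here on

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.settimeout(timeout)
    try:
        # connect() succeeds even though nobody ever calls accept():
        # the connection just sits in the kernel's listen queue.
        client.connect(server.getsockname())
        client.sendall(b"request")
        try:
            client.recv(1)
            return "response"
        except socket.timeout:
            # No process picked the connection up, so no reply was written.
            return "timeout"
    finally:
        client.close()
        server.close()
```

The connect itself gives no hint of trouble; the only signal the client
gets is the read timing out, which matches the quote above.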

lighty, however, is sticking to the FastCGI spec and expecting the PHP
parent to be in shutdown mode (i.e. its PID to disappear) when it does
not respond (after which it would respawn a new parent). But because
the PHP parent is just busy and not actually shutting down, the PID
never disappears and lighty gets stuck in a loop.

I have posted a workaround involving starting PHP separately here:

http://trac.lighttpd.net/trac/ticket/1488

which also proposes a "patch" to deal with PHP's non-standard behaviour
regarding FCGI_OVERLOADED.

However, the fundamental problem remains: it is very difficult for a
FastCGI client to determine what is going on, and therefore what to do,
when PHP just times out on connections rather than returning the
correct FCGI_OVERLOADED response.

I did not understand dmitry's original reason for this: "PHP cannot
return FCGI_OVERLOADED, because all PHP processes are busy
and nobody accepts new connection."

Could you explain or perhaps review PHP's behaviour under overloaded
conditions.

Thanks

Oliver



[2007-12-17 10:44:55] oliver at realtsp dot com

We have tried with  

  http://snaps.php.net/php5.2-latest.tar.gz

Result is unchanged. 

NOTE that the PHP worker and parent processes are still showing in ps
after the crash (same as before the crash), but lighty cannot get a
sensible response from them.

[EMAIL PROTECTED] /usr/ports/lang/php5]# pstree  
...
 |-+- 25262 www /usr/local/sbin/lighttpd -f /usr/local/etc/lighttpd.conf
 | \-+= 25263 www /usr/local/bin/php-cgi
 |   |--- 25264 www /usr/local/bin/php-cgi
 |   |--- 25265 www /usr/local/bin/php-cgi
 |   |--- 25266 www /usr/local/bin/php-cgi
 |   |--- 25267 www /usr/local/bin/php-cgi
 |   |--- 25268 www 

#43610 [Com]: fastcgi socket dies on high concurrency

2007-12-22 Thread olafvdspek at gmail dot com
 ID:   43610
 Comment by:   olafvdspek at gmail dot com
 Reported By:  oliver at realtsp dot com
 Status:   Open
 Bug Type: CGI related
 Operating System: FreeBSD 6.2
 PHP Version:  5.2.5
 New Comment:

> Could you explain or perhaps review PHP's behaviour under overloaded
> conditions.

I'm no PHP developer and haven't looked at the code, but my guess:
A PHP process has C children, each being able to handle one connection.
When that connection is closed, it'll do an accept() to handle a new
connection.
When a web server opens more than C connections, those will not be
accepted until an existing connection is closed, which may take a long
time.
So a web server should never open more than C connections to one PHP
process.


Previous Comments:



[2007-12-17 10:44:55] oliver at realtsp dot com

We have tried with  

  http://snaps.php.net/php5.2-latest.tar.gz

Result is unchanged. 

NOTE that the PHP worker and parent processes are still showing in ps
after the crash (same as before the crash), but lighty cannot get a
sensible response from them.

[EMAIL PROTECTED] /usr/ports/lang/php5]# pstree  
...
 |-+- 25262 www /usr/local/sbin/lighttpd -f /usr/local/etc/lighttpd.conf
 | \-+= 25263 www /usr/local/bin/php-cgi
 |   |--- 25264 www /usr/local/bin/php-cgi
 |   |--- 25265 www /usr/local/bin/php-cgi
 |   |--- 25266 www /usr/local/bin/php-cgi
 |   |--- 25267 www /usr/local/bin/php-cgi
 |   |--- 25268 www /usr/local/bin/php-cgi
 |   |--- 25269 www /usr/local/bin/php-cgi
 |   |--- 25270 www /usr/local/bin/php-cgi
 |   |--- 25271 www /usr/local/bin/php-cgi
 |   |--- 25272 www /usr/local/bin/php-cgi
 |   |--- 25273 www /usr/local/bin/php-cgi
 |   |--- 25274 www /usr/local/bin/php-cgi
 |   |--- 25275 www /usr/local/bin/php-cgi
 |   |--- 25276 www /usr/local/bin/php-cgi
 |   |--- 25277 www /usr/local/bin/php-cgi
 |   |--- 25278 www /usr/local/bin/php-cgi
 |   \--- 25279 www /usr/local/bin/php-cgi




[2007-12-17 09:17:30] [EMAIL PROTECTED]

Please try using this CVS snapshot:

  http://snaps.php.net/php5.2-latest.tar.gz
 
For Windows (zip):
 
  http://snaps.php.net/win32/php5.2-win32-latest.zip

For Windows (installer):

  http://snaps.php.net/win32/php5.2-win32-installer-latest.msi





[2007-12-16 21:55:00] oliver at realtsp dot com

Description:

Version information below.

When I load the server with sieg