Bug #61558 [Fbk->Asn]: Runaway spawning of children after pipe error

2012-05-23 Thread phpbug2012 at tgabber dot mine dot nu
Edit report at https://bugs.php.net/bug.php?id=61558&edit=1

 ID: 61558
 User updated by:phpbug2012 at tgabber dot mine dot nu
 Reported by:phpbug2012 at tgabber dot mine dot nu
 Summary:Runaway spawning of children after pipe error
-Status: Feedback
+Status: Assigned
 Type:   Bug
 Package:FPM related
 Operating System:   Debian Linux
 PHP Version:5.3.10
 Assigned To:fat
 Block user comment: N
 Private report: N

 New Comment:

Hi,

The bug is not reproducible as such, however it has happened 3 more times with 
exactly the same symptoms, with uptimes for fpm varying from 2 days to 5 days. 
After this I changed the fpm pool configuration files to have 
catch_workers_output = no (I'd mistakenly thought that this had to be yes to 
have php errors logged to syslog) and the bug has not resurfaced so far 
(maximum uptime reached with this config 16 days).

I'm afraid I don't have anything else from the log files now other than what I 
included in the initial bug report. However I did thoroughly check the logs at 
the time and there were no other errors either from php or any other service.

There are no ulimits in force on the server and I doubt I ran out of file 
descriptors as checking /proc/sys/fs/file-nr just now shows only 2528 out of 
392117 used.

I'm reluctant to change the config back to gather more information on the bug 
as this is a production server and the bug if it hits has a severe impact. I do 
have a backup machine (with the same config) that I could experiment on, but 
without knowing exactly how to trigger the bug and no real-world load I don't 
know how useful that would be.


Previous Comments:

[2012-05-23 18:30:39] f...@php.net

Not enough information was provided for us to be able
to handle this bug. Please re-read the instructions at
http://bugs.php.net/how-to-report.php

If you can provide more information, feel free to add it
to this bug and change the status back to "Open".

Thank you for your interest in PHP.


Does the error happened again since (is it reproductible) ? If so, It would be 
great to have a stacktrace of the last segfault.

Do you still have the error log? I'd like to see what was just before and after 
the "Bad file descriptor" lines.

Do you have, by any chances, reached any ulimit limitations (like number of 
open 
files) ? This could maybe explain the "Bad file descriptor" error (but not the 
others errors)

If you have any clue on how to reproduce the problem, it would be very useful 
to 
fix what seems to be an important bug to me.

thx a lot

------------------------
[2012-03-29 20:28:53] phpbug2012 at tgabber dot mine dot nu

Description:

Relevant software versions

Linux 2.6.32-5-amd64
Server Version: Apache/2.2.16 (Debian) DAV/2 mod_fastcgi/2.4.6
PHP Version 5.3.10-1~dotdeb.1
APC Version 3.1.9

Background

This is a lightly-loaded webserver running around 40 virtual hosts that 
experiences occasional traffic spikes. Around 20 virtual hosts use php and have 
one fpm pool each to run php under their own user, configured as ondemand so 
that they have no fpm processes running when they are not serving pages. 
Generally there are between 1 and 5 fpm processes running in total at any one 
time, rising to maybe 20 to 30 when a spike of traffic comes in.

Error

I became aware of the problem after a service monitor reported that websites 
were no longer being served. Checking, I found that that php-fpm had crashed. 
Time since the last restart of php-fpm when the error occured was about 2 days 
4 hours, with around 2.5million php requests served. The APC cache of 128Mb was 
about three-quarters full.

Looking back through the logs, this is what I found. The problem appears to 
start at 18:55:32 when php-fpm was unable to read a pipe. I have no idea what 
caused this initial error, looking back through other logs I could not see any 
abnormal web requests around this time.

Mar 24 18:55:32 banana php-fpm[21906]: [ERROR] unable to read what child say: 
Bad file descriptor (9)
Mar 24 18:55:32 banana php-fpm[21906]: [ERROR] unable to read what child say: 
Bad file descriptor (9)

There were no further fpm messages in the log until those shown below. This is 
the most worrying aspect of this bug as basically fpm seems to have gone into 
meltdown, continually spawning a process (and using almost all available cpu).


Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 9 exited 
with code 0 after 0.002001 seconds from start
Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22230 started
Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22230 exited 
with code 0 after 0.001790 seconds from start
Mar 24 18:56:48 banana php-fp

[PHP-BUG] Bug #61558 [NEW]: Runaway spawning of children after pipe error

2012-03-29 Thread phpbug2012 at tgabber dot mine dot nu
From: 
Operating system: Debian Linux
PHP version:  5.3.10
Package:  FPM related
Bug Type: Bug
Bug description:Runaway spawning of children after pipe error

Description:

Relevant software versions

Linux 2.6.32-5-amd64
Server Version: Apache/2.2.16 (Debian) DAV/2 mod_fastcgi/2.4.6
PHP Version 5.3.10-1~dotdeb.1
APC Version 3.1.9

Background

This is a lightly-loaded webserver running around 40 virtual hosts that
experiences occasional traffic spikes. Around 20 virtual hosts use php and
have one fpm pool each to run php under their own user, configured as
ondemand so that they have no fpm processes running when they are not
serving pages. Generally there are between 1 and 5 fpm processes running in
total at any one time, rising to maybe 20 to 30 when a spike of traffic
comes in.

Error

I became aware of the problem after a service monitor reported that
websites were no longer being served. Checking, I found that that php-fpm
had crashed. Time since the last restart of php-fpm when the error occured
was about 2 days 4 hours, with around 2.5million php requests served. The
APC cache of 128Mb was about three-quarters full.

Looking back through the logs, this is what I found. The problem appears to
start at 18:55:32 when php-fpm was unable to read a pipe. I have no idea
what caused this initial error, looking back through other logs I could not
see any abnormal web requests around this time.

Mar 24 18:55:32 banana php-fpm[21906]: [ERROR] unable to read what child
say: Bad file descriptor (9)
Mar 24 18:55:32 banana php-fpm[21906]: [ERROR] unable to read what child
say: Bad file descriptor (9)

There were no further fpm messages in the log until those shown below. This
is the most worrying aspect of this bug as basically fpm seems to have gone
into meltdown, continually spawning a process (and using almost all
available cpu).


Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 9
exited with code 0 after 0.002001 seconds from start
Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22230
started
Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22230
exited with code 0 after 0.001790 seconds from start
Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22231
started
Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22231
exited with code 0 after 0.001694 seconds from start
Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22232
started
Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22232
exited with code 0 after 0.001692 seconds from start
Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22233
started
Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22233
exited with code 0 after 0.001685 seconds from start
Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22234
started
Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22234
exited with code 0 after 0.001683 seconds from start
Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22235
started
Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22235
exited with code 0 after 0.001659 seconds from start
Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22236
started
Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22236
exited with code 0 after 0.001677 seconds from start
Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22237
started
Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22237
exited with code 0 after 0.001652 seconds from start
Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22238
started
Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22238
exited with code 0 after 0.001661 seconds from start
Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22239
started
Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22239
exited with code 0 after 0.001653 seconds from start
Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22240
started
Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22240
exited with code 0 after 0.001667 seconds from start
Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22241
started
Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22241
exited with code 0 after 0.001638 seconds from start
Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22242
started
Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22242
exited with code 0 after 0.001669 seconds from start
Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22243
started
Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22243
exited with code 0 after 0.001660 seconds from start
Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22244
started
Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22244
exited wi