Bug #61558 [Fbk->Asn]: Runaway spawning of children after pipe error
Edit report at https://bugs.php.net/bug.php?id=61558&edit=1 ID: 61558 User updated by:phpbug2012 at tgabber dot mine dot nu Reported by:phpbug2012 at tgabber dot mine dot nu Summary:Runaway spawning of children after pipe error -Status: Feedback +Status: Assigned Type: Bug Package:FPM related Operating System: Debian Linux PHP Version:5.3.10 Assigned To:fat Block user comment: N Private report: N New Comment: Hi, The bug is not reproducible as such, however it has happened 3 more times with exactly the same symptoms, with uptimes for fpm varying from 2 days to 5 days. After this I changed the fpm pool configuration files to have catch_workers_output = no (I'd mistakenly thought that this had to be yes to have php errors logged to syslog) and the bug has not resurfaced so far (maximum uptime reached with this config 16 days). I'm afraid I don't have anything else from the log files now other than what I included in the initial bug report. However I did thoroughly check the logs at the time and there were no other errors either from php or any other service. There are no ulimits in force on the server and I doubt I ran out of file descriptors as checking /proc/sys/fs/file-nr just now shows only 2528 out of 392117 used. I'm reluctant to change the config back to gather more information on the bug as this is a production server and the bug if it hits has a severe impact. I do have a backup machine (with the same config) that I could experiment on, but without knowing exactly how to trigger the bug and no real-world load I don't know how useful that would be. Previous Comments: [2012-05-23 18:30:39] f...@php.net Not enough information was provided for us to be able to handle this bug. Please re-read the instructions at http://bugs.php.net/how-to-report.php If you can provide more information, feel free to add it to this bug and change the status back to "Open". Thank you for your interest in PHP. Does the error happened again since (is it reproductible) ? If so, It would be great to have a stacktrace of the last segfault. Do you still have the error log? I'd like to see what was just before and after the "Bad file descriptor" lines. Do you have, by any chances, reached any ulimit limitations (like number of open files) ? This could maybe explain the "Bad file descriptor" error (but not the others errors) If you have any clue on how to reproduce the problem, it would be very useful to fix what seems to be an important bug to me. thx a lot ------------------------ [2012-03-29 20:28:53] phpbug2012 at tgabber dot mine dot nu Description: Relevant software versions Linux 2.6.32-5-amd64 Server Version: Apache/2.2.16 (Debian) DAV/2 mod_fastcgi/2.4.6 PHP Version 5.3.10-1~dotdeb.1 APC Version 3.1.9 Background This is a lightly-loaded webserver running around 40 virtual hosts that experiences occasional traffic spikes. Around 20 virtual hosts use php and have one fpm pool each to run php under their own user, configured as ondemand so that they have no fpm processes running when they are not serving pages. Generally there are between 1 and 5 fpm processes running in total at any one time, rising to maybe 20 to 30 when a spike of traffic comes in. Error I became aware of the problem after a service monitor reported that websites were no longer being served. Checking, I found that that php-fpm had crashed. Time since the last restart of php-fpm when the error occured was about 2 days 4 hours, with around 2.5million php requests served. The APC cache of 128Mb was about three-quarters full. Looking back through the logs, this is what I found. The problem appears to start at 18:55:32 when php-fpm was unable to read a pipe. I have no idea what caused this initial error, looking back through other logs I could not see any abnormal web requests around this time. Mar 24 18:55:32 banana php-fpm[21906]: [ERROR] unable to read what child say: Bad file descriptor (9) Mar 24 18:55:32 banana php-fpm[21906]: [ERROR] unable to read what child say: Bad file descriptor (9) There were no further fpm messages in the log until those shown below. This is the most worrying aspect of this bug as basically fpm seems to have gone into meltdown, continually spawning a process (and using almost all available cpu). Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 9 exited with code 0 after 0.002001 seconds from start Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22230 started Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22230 exited with code 0 after 0.001790 seconds from start Mar 24 18:56:48 banana php-fp
[PHP-BUG] Bug #61558 [NEW]: Runaway spawning of children after pipe error
From: Operating system: Debian Linux PHP version: 5.3.10 Package: FPM related Bug Type: Bug Bug description:Runaway spawning of children after pipe error Description: Relevant software versions Linux 2.6.32-5-amd64 Server Version: Apache/2.2.16 (Debian) DAV/2 mod_fastcgi/2.4.6 PHP Version 5.3.10-1~dotdeb.1 APC Version 3.1.9 Background This is a lightly-loaded webserver running around 40 virtual hosts that experiences occasional traffic spikes. Around 20 virtual hosts use php and have one fpm pool each to run php under their own user, configured as ondemand so that they have no fpm processes running when they are not serving pages. Generally there are between 1 and 5 fpm processes running in total at any one time, rising to maybe 20 to 30 when a spike of traffic comes in. Error I became aware of the problem after a service monitor reported that websites were no longer being served. Checking, I found that that php-fpm had crashed. Time since the last restart of php-fpm when the error occured was about 2 days 4 hours, with around 2.5million php requests served. The APC cache of 128Mb was about three-quarters full. Looking back through the logs, this is what I found. The problem appears to start at 18:55:32 when php-fpm was unable to read a pipe. I have no idea what caused this initial error, looking back through other logs I could not see any abnormal web requests around this time. Mar 24 18:55:32 banana php-fpm[21906]: [ERROR] unable to read what child say: Bad file descriptor (9) Mar 24 18:55:32 banana php-fpm[21906]: [ERROR] unable to read what child say: Bad file descriptor (9) There were no further fpm messages in the log until those shown below. This is the most worrying aspect of this bug as basically fpm seems to have gone into meltdown, continually spawning a process (and using almost all available cpu). Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 9 exited with code 0 after 0.002001 seconds from start Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22230 started Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22230 exited with code 0 after 0.001790 seconds from start Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22231 started Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22231 exited with code 0 after 0.001694 seconds from start Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22232 started Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22232 exited with code 0 after 0.001692 seconds from start Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22233 started Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22233 exited with code 0 after 0.001685 seconds from start Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22234 started Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22234 exited with code 0 after 0.001683 seconds from start Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22235 started Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22235 exited with code 0 after 0.001659 seconds from start Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22236 started Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22236 exited with code 0 after 0.001677 seconds from start Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22237 started Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22237 exited with code 0 after 0.001652 seconds from start Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22238 started Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22238 exited with code 0 after 0.001661 seconds from start Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22239 started Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22239 exited with code 0 after 0.001653 seconds from start Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22240 started Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22240 exited with code 0 after 0.001667 seconds from start Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22241 started Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22241 exited with code 0 after 0.001638 seconds from start Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22242 started Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22242 exited with code 0 after 0.001669 seconds from start Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22243 started Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22243 exited with code 0 after 0.001660 seconds from start Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22244 started Mar 24 18:56:48 banana php-fpm[21906]: [NOTICE] [pool ci] child 22244 exited wi