> On 31 Dec 2016, at 1:57 PM, Cristiano Coelho <[email protected]> wrote:
> 
> Hello, thanks for the quick response!
> 
> This Apache deploy is done automatically by AWS Elastic Beanstalk, so I don't 
> really have control over the version used; I'm amazed it is using a 
> 5-year-old version.

I am not surprised. They base off Ubuntu, and Debian/Ubuntu systems are quite 
bad when it comes to supplying up-to-date versions of packages. People like to 
think RHEL/CentOS are the worst, but RHEL/CentOS do a much better job than 
Debian-based systems of supplying up-to-date, supported versions of packages. 
Debian/Ubuntu are becoming a source of huge pain for Open Source package 
maintainers because their users are always on old versions yet still expect 
support, not realising that package authors will not support such old versions 
and that the Debian/Ubuntu maintainers don't support them either. So you are 
effectively using unsupported software. Amazon compounds the problem by staying 
on older OS versions even longer and not providing more up-to-date versions. 
Not the brightest idea to base a business on unsupported software, but that is 
what companies do.

> I know for sure it is already running in daemon mode, since I have looked at 
> the wsgi config they provide.
> 
> At the end is part of the wsgi config file, plus the logs of the faulty 
> restart that caused the process to stay alive. The configuration is pretty 
> much provided by Amazon, so one would expect it to be ideal.

Far from ideal. There are various things wrong with their configuration.

> Also, now that you mention keep alive settings, is there any chance this 
> issue is caused by the combination of mpm_event and the 100s keep alive 
> setting? Will mod_wsgi/Apache try to wait until the connections are closed, 
> and is that causing issues? This is really odd, because there have been many 
> successful restarts even under load, and many faulty restarts while the 
> servers were probably not being used.

Unrelated. The keep alive settings only apply to Apache child worker processes 
and not to mod_wsgi daemon mode processes. Because of how Apache handles the 
type of child process mod_wsgi uses, you get at most 4 seconds grace, which is 
when that SIGKILL arrives after the repeated SIGTERM signals. You can't block 
SIGKILL, although a process can still fail to die if it is blocked in a kernel 
system call that doesn't return.

    
    http://stackoverflow.com/questions/8600430/cases-in-which-sigkill-will-not-work

Whether other threads can still run in that scenario I don't know. Either way, 
it would imply that filesystem or device I/O against a broken mount is the 
cause, which would mean it is due to Amazon's infrastructure having issues.

>  -- wsgi.conf (partially)
>  
> LoadModule wsgi_module modules/mod_wsgi.so
> WSGIPythonHome /opt/python/run/baselinenv
> WSGISocketPrefix run/wsgi
> WSGIRestrictEmbedded On
> 
> WSGIDaemonProcess wsgi processes=1 threads=10 display-name=%{GROUP} \
>   python-path=/opt/python/current/app:/opt/python/run/venv/lib64/python2.7/site-packages:/opt/python/run/venv/lib/python2.7/site-packages \
>   user=wsgi group=wsgi \
>   home=/opt/python/current/app

Should be using python-home to specify the Python virtual environment, not 
python-path to point at its site-packages directories. Even then, with their 
setup of only a single daemon process group, they could simply use 
WSGIPythonHome at global scope instead.
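
As a rough sketch only, assuming /opt/python/run/venv is the root of the 
virtual environment (which the site-packages paths above suggest), and noting 
that the python-home option itself needs a newer mod_wsgi than they ship, that 
part would look more like:

    WSGIDaemonProcess wsgi processes=1 threads=10 display-name=%{GROUP} \
        python-home=/opt/python/run/venv \
        python-path=/opt/python/current/app \
        home=/opt/python/current/app \
        user=wsgi group=wsgi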

> WSGIProcessGroup wsgi

They should also specify:

    WSGIApplicationGroup %{GLOBAL}

if only the one WSGI application is being run in the daemon process group. This 
avoids issues with some third-party Python packages containing C extension 
modules that will not work in Python sub interpreters.

If they weren't on such an old mod_wsgi version, there is also a whole bunch of 
timeout options for daemon mode they should be setting, to better ensure that 
people's WSGI applications can recover from stuck processes and request 
backlogs.
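
As an illustrative sketch only, with made-up values, on a modern mod_wsgi 
(4.x) that might look something like:

    WSGIDaemonProcess wsgi processes=1 threads=10 display-name=%{GROUP} \
        python-home=/opt/python/run/venv \
        request-timeout=60 queue-timeout=45 \
        graceful-timeout=15 shutdown-timeout=5

The request-timeout option in particular is meant to let a daemon process 
restart itself when requests get stuck, rather than relying on Apache's 
SIGTERM/SIGKILL dance at restart time.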

Overall they have done little to set things up well, and if they don't provide 
a way for users to change the configuration, a user has no way to tune and 
optimise it and could well be wasting resources and money.

> </VirtualHost>
> 
> 
>  -- Restart logs
> [Fri Dec 30 18:26:46.825763 2016] [core:warn] [pid 24265:tid 140339875915840] AH00045: child process 24396 still did not exit, sending a SIGTERM
> [Fri Dec 30 18:26:48.827998 2016] [core:warn] [pid 24265:tid 140339875915840] AH00045: child process 24396 still did not exit, sending a SIGTERM
> [Fri Dec 30 18:26:50.830264 2016] [core:warn] [pid 24265:tid 140339875915840] AH00045: child process 24396 still did not exit, sending a SIGTERM
> [Fri Dec 30 18:26:52.832466 2016] [core:error] [pid 24265:tid 140339875915840] AH00046: child process 24396 still did not exit, sending a SIGKILL
> [Fri Dec 30 18:26:54.539770 2016] [suexec:notice] [pid 12669:tid 140513528571968] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
> [Fri Dec 30 18:26:54.550651 2016] [so:warn] [pid 12669:tid 140513528571968] AH01574: module expires_module is already loaded, skipping
> [Fri Dec 30 18:26:54.550700 2016] [so:warn] [pid 12669:tid 140513528571968] AH01574: module deflate_module is already loaded, skipping
> [Fri Dec 30 18:26:54.550791 2016] [so:warn] [pid 12669:tid 140513528571968] AH01574: module wsgi_module is already loaded, skipping

These warnings about modules already being loaded show that their configuration 
must be broken in other ways as well.

> [Fri Dec 30 18:26:54.552750 2016] [auth_digest:notice] [pid 12669:tid 140513528571968] AH01757: generating secret for digest authentication ...
> [Fri Dec 30 18:26:54.553328 2016] [lbmethod_heartbeat:notice] [pid 12669:tid 140513528571968] AH02282: No slotmem from mod_heartmonitor
> [Fri Dec 30 18:26:54.553663 2016] [:warn] [pid 12669:tid 140513528571968] mod_wsgi: Compiled for Python/2.7.9.
> [Fri Dec 30 18:26:54.553671 2016] [:warn] [pid 12669:tid 140513528571968] mod_wsgi: Runtime using Python/2.7.10.
> [Fri Dec 30 18:26:54.554100 2016] [core:warn] [pid 12669:tid 140513528571968] AH00098: pid file /var/run/httpd/httpd.pid overwritten -- Unclean shutdown of previous Apache run?
> [Fri Dec 30 18:26:54.555343 2016] [mpm_event:notice] [pid 12669:tid 140513528571968] AH00489: Apache/2.4.23 (Amazon) mod_wsgi/3.5 Python/2.7.10 configured -- resuming normal operations
> 
> 
> 
> 
> 
> El viernes, 30 de diciembre de 2016, 23:16:46 (UTC-3), Graham Dumpleton 
> escribió:
> The version of mod_wsgi you are using is over 50 versions behind the latest, 
> and is merely a patch release of a version from over 5 years ago. I can only 
> suggest you upgrade to the latest mod_wsgi version, as the version you have 
> is not supported, unless you can manage to force your operating system 
> vendor to support it.
> 
> There is a known orphaned process issue with mod_wsgi, but it is only known 
> to be a problem with certain versions of Apache 2.2 and has never been seen 
> with Apache 2.4. It also never occurred on a full Apache restart, only on 
> internal daemon process restarts, albeit it was still the result of some bug 
> in Apache which seemed to have been resolved around Apache 2.2.18.
> 
> I can only speculate that you aren't using daemon mode, or not always, and 
> requests are running in, or leaking into, Apache child worker processes. On 
> a graceful restart, Apache can let worker processes linger so they can 
> handle keep alive connections, so if your code was running in embedded mode 
> processes, Apache may well not be shutting them down straight away. Apache 
> could then be losing track of them, as from memory there are cases where it 
> will give up on processes when a graceful restart occurs.
> 
> I would ensure you are using daemon mode of mod_wsgi, not embedded mode. 
> Ensure you set at global Apache configuration scope:
> 
>     WSGIRestrictEmbedded On
> 
> so that use of embedded mode of mod_wsgi is prohibited and you will get 
> errors if a WSGI application request is wrongly delegated to an Apache 
> worker process. This will highlight any issue with your mod_wsgi 
> configuration for delegating requests to the daemon mode processes.
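> 
> A minimal sketch of how that delegation usually fits together (the names and 
> paths here are placeholders only, not taken from any real setup):
> 
>     WSGIRestrictEmbedded On
> 
>     WSGIScriptAlias / /opt/app/wsgi.py
>     WSGIDaemonProcess myapp processes=1 threads=10 display-name=%{GROUP}
>     WSGIProcessGroup myapp
>     WSGIApplicationGroup %{GLOBAL}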
> 
> Graham
> 
>> On 31 Dec 2016, at 12:11 PM, Cristiano Coelho <cristia...@gmail.com> wrote:
>> 
>> Sorry for bringing up such an ancient post, but this is the closest thing 
>> to my issue that I have found.
>> 
>> With Apache 2.4, mod_wsgi 3.5 and Python 2.7, I am having a similar issue, 
>> occurring exactly on Apache restarts.
>> 
>> Not all the time, but sometimes the wsgi processes stay alive after an 
>> Apache restart and need to be killed manually with sudo kill pid. The worst 
>> part is that the process keeps running: the process, which serves a Django 
>> app, starts some background threads that perform tasks periodically, and 
>> when this issue happens those tasks start to stack up, since duplicated 
>> logs appear when only 1 server and 1 process are supposed to be running.
>> The Apache process is restarted through Amazon AWS Elastic Beanstalk, which 
>> is a managed service, but the logs show that a SIGTERM is attempted and, 
>> after 3 failures, a SIGKILL is sent, yet the process stays alive and keeps 
>> performing tasks.
>> 
>> Note that all background tasks are either daemon threads or ThreadPool 
>> instances from the multiprocessing library.
>> 
>> 
>> El miércoles, 28 de julio de 2010, 10:45:04 (UTC-3), Paddy Joy escribió:
>> Graham, 
>> 
>> Haven't found any evidence of Apache crashing; the whole setup has been 
>> running very successfully for the last two years. I usually use 
>> force-reload when changes are made to virtual hosts. 
>> 
>> The memory has definitely been increasing due to the orphaned processes, 
>> especially when I get 2 or 3 orphaned processes per application; however, 
>> this takes a few weeks to occur, and using the mod_wsgi inactivity timeout 
>> helps, as these processes appear to drop down to minimal memory 
>> consumption. 
>> 
>> I have upgraded to v3 of mod_wsgi so will monitor for a few weeks and 
>> report back if I can't resolve. Thanks again for your assistance. 
>> 
>> Paddy 
>> 
>> On Jul 27, 9:51 am, Graham Dumpleton <[email protected]> wrote: 
>> > On 25 July 2010 22:16, Paddy Joy <[email protected]> wrote: 
>> > 
>> > 
>> > 
>> > > Graham, 
>> > 
>> > > Thank you for such a detailed response. As a first step I will update 
>> > > mod_wsgi to a more recent version! 
>> > 
>> > >> But can you confirm you are using daemon mode and what the 
>> > >> WSGIDaemonProcess configuration is? 
>> > 
>> > > WSGIDaemonProcess designcliq user=django group=django threads=25 
>> > > display-name=%{GROUP} inactivity-timeout=3600 
>> > > WSGIProcessGroup designcliq 
>> > 
>> > >> > I usually have to kill them individually to get rid of them and free 
>> > >> > up the memory. 
>> > 
>> > >> Technically you can't kill defunct processes; they are actually 
>> > >> already dead, so I'm not sure what you are doing. 
>> > 
>> > > Late night reboot. 
>> > 
>> > > Here is a more detailed example of what I am trying to get my head 
>> > > around. 
>> > 
>> > > The following command shows some Django applications twice; for 
>> > > example, (wsgi:designcliq) appears twice, under parent ids 10436 and 
>> > > 19648 (top of output). 
>> > 
>> > > paddy@joytech:~$ ps -feA  | grep -i wsgi 
>> > > django   19686 19648  0 19:29 ?        00:00:00 (wsgi:designcliq) -k start 
>> > > django   14118 10436  0 Jul23 ?        00:00:00 (wsgi:designcliq) -k start 
>> > > django     443 19648  0 20:43 ?        00:00:00 (wsgi:erinaheight -k start 
>> > > django     476 19648  0 20:43 ?        00:00:00 (wsgi:simplystyli -k start 
>> > > django     593 19648  0 20:44 ?        00:00:00 (wsgi:gilliantenn -k start 
>> > > django    3719 19648  0 21:00 ?        00:00:00 (wsgi:pipair)     -k start 
>> > > django    5548 19648  0 21:10 ?        00:00:00 (wsgi:keyboardkid -k start 
>> > > django    6779 10436  0 Jul23 ?        00:00:00 (wsgi:funkparty)  -k start 
>> > > django   11371 19648  0 21:42 ?        00:00:00 (wsgi:classicinte -k start 
>> > > paddy    13613  4428  0 21:55 pts/0    00:00:00 grep -i wsgi 
>> > > django   16246 10436  0 Jul24 ?        00:00:00 (wsgi:fasttraku)  -k start 
>> > > django   18161 10436  0 Jul24 ?        00:00:00 (wsgi:hostingssl) -k start 
>> > > django   19651 19648  0 19:29 ?        00:00:00 (wsgi:hostingssl) -k start 
>> > > django   19700 19648  0 19:29 ?        00:00:00 (wsgi:doorssincer -k start 
>> > > django   19769 19648  0 19:29 ?        00:00:00 (wsgi:fasttraku)  -k start 
>> > > django   19853 19648  0 19:29 ?        00:00:00 (wsgi:mariatennan -k start 
>> > > django   19913 19648  0 19:29 ?        00:00:00 (wsgi:talkoftheto -k start 
>> > > django   23082 10436  0 Jul24 ?        00:00:00 (wsgi:mariatennan -k start 
>> > > django   30964 19648  0 20:33 ?        00:00:00 (wsgi:funkparty)  -k start 
>> > 
>> > > If I then stop apache and run the same command some applications still 
>> > > show up running under parent 10436 even though apache has been 
>> > > stopped. 
>> > 
>> > > paddy@joytech:~$ sudo /etc/init.d/apache2 stop 
>> > >  * Stopping web server apache2 
>> > 
>> > > paddy@joytech:~$ ps -feA  | grep -i wsgi 
>> > > django   14118 10436  0 Jul23 ?        00:00:00 (wsgi:designcliq) -k start 
>> > > django    6779 10436  0 Jul23 ?        00:00:00 (wsgi:funkparty)  -k start 
>> > > paddy    14014  4428  0 21:57 pts/0    00:00:00 grep -i wsgi 
>> > > django   16246 10436  0 Jul24 ?        00:00:00 (wsgi:fasttraku)  -k start 
>> > > django   18161 10436  0 Jul24 ?        00:00:00 (wsgi:hostingssl) -k start 
>> > > django   23082 10436  0 Jul24 ?        00:00:00 (wsgi:mariatennan -k start 
>> > 
>> > > Any ideas? 
>> > 
>> > Have you seen any evidence that Apache itself is crashing? 
>> > Alternatively, have you been doing anything like attaching debuggers 
>> > directly to Apache? 
>> > 
>> > Events like that can sometimes leave processes around, as can other 
>> > things. 
>> > 
>> > The operating system generally has a job that goes around and cleans up 
>> > zombie processes that haven't been reclaimed and which may be orphaned 
>> > in some way. 
>> > 
>> > As I pointed out, zombie processes don't actually consume memory; each 
>> > is just an entry in the process table. Thus, unless you are seeing 
>> > issues such as growing system-wide memory usage as a result, or Apache 
>> > no longer serving requests, I wouldn't be overly concerned. 
>> > 
>> > BTW, when you do Apache restarts, are you doing a 'restart' or a 
>> > 'graceful restart'? A graceful restart could possibly result in 
>> > processes hanging around, as in that case Apache doesn't forcibly kill 
>> > them off, so if they don't shut down promptly themselves, and for some 
>> > reason Apache didn't clean them up properly when they do exit, they 
>> > could remain in the process table. 
>> > 
>> > Graham
