> On 1 Jan 2017, at 9:48 AM, Cristiano Coelho <[email protected]> wrote:
>
> With ps auxwww I'm not really sure what to look for. I can see the process that should have died, together with another one spawned on the same date, and also the new ones spawned today by the test:
>
> USER       PID %CPU %MEM     VSZ    RSS TTY   STAT START  TIME COMMAND
> root      5944  0.0  0.4  224520   9096 ?     S    21:38  0:00 /usr/sbin/httpd -DFOREGROUND
> root      5946  0.0  0.1   27304   2372 ?     S    21:38  0:00 /usr/sbin/rotatelogs /var/log/httpd/healthd/application.log.%Y-%m-%d-%H 3600
> apache    5947  0.0  0.3  224524   7436 ?     S    21:38  0:00 /usr/sbin/httpd -DFOREGROUND
> wsgi      5948  7.9  4.8 1233468  98820 ?     Sl   21:38  3:57 (wsgi:wsgi) -DFOREGROUND
> apache    5949  0.0  0.4 1111720   9548 ?     Sl   21:38  0:00 /usr/sbin/httpd -DFOREGROUND
> apache    5950  0.0  0.4  980432   8484 ?     Sl   21:38  0:00 /usr/sbin/httpd -DFOREGROUND
> apache    5951  0.0  0.3  980432   7672 ?     Sl   21:38  0:00 /usr/sbin/httpd -DFOREGROUND
> apache    6075  0.0  0.4  980608   8692 ?     Sl   21:38  0:00 /usr/sbin/httpd -DFOREGROUND
> ec2-user  6938  0.0  0.1  117204   2464 pts/0 R+   22:28  0:00 ps auxwww
> wsgi     12673  0.6  7.3 1239612 149972 ?     Sl   Dec30 11:39 (wsgi:wsgi) -DFOREGROUND
> root     12873  0.0  0.0       0      0 ?     S    Dec30  0:00 [kworker/u30:2]
>
> You can see that this process appears to be in exactly the same state as the one that's fine, so perhaps it was never actually signalled at all: only the PID mentioned in the logs got the SIGKILL, and that didn't kill the actual wsgi process.
>
> About customizing Apache: I can easily add new configuration files via the deployment commands, which also lets me overwrite files and perhaps delete them as well. This is how I add additional wsgi settings and other Apache settings like caching, gzip, etc.
> I can certainly include mod_wsgi in the requirements.txt file so it is installed through pip, but that would probably conflict with the mod_wsgi module that already comes installed in the Apache modules folder.
> The machine keeps all modules inside /usr/lib64/httpd/modules and simply adds a link from /etc/httpd, where all the conf files live.
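> For example, on the instance (output illustrative):
>
> $ ls -ld /etc/httpd/modules
> lrwxrwxrwx 1 root root 29 ... /etc/httpd/modules -> /usr/lib64/httpd/modules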
When mod_wsgi is installed using pip, the module is not installed into the Apache modules
directory, but into the Python virtual environment.
We then need to find a way to suppress:
LoadModule wsgi_module modules/mod_wsgi.so
so that it isn’t used, and instead use what is output from running:
mod_wsgi-express module-config
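For example, on one system the output of that command looks something like the following (the paths here are illustrative and depend on the Python installation or virtual environment in use):

LoadModule wsgi_module "/opt/venv/lib/python2.7/site-packages/mod_wsgi/server/mod_wsgi-py27.so"
WSGIPythonHome "/opt/venv"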
A bigger problem is whether mod_wsgi can be installed using pip. If they do not
include the httpd-dev package on the system, it will not be possible to compile
any additional Apache modules.
Does the program ‘apxs’ or ‘apxs2’ exist anywhere on the system? Usually it
would be in /usr/sbin.
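A quick way to check would be something like this (the httpd-devel/httpd24-devel package names are guesses for an Amazon Linux style system):

which apxs apxs2
ls -l /usr/sbin/apxs
rpm -q httpd-devel httpd24-devel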
Graham
>
> On Saturday, 31 December 2016 at 19:12:03 (UTC-3), Graham Dumpleton
> wrote:
> Use ‘ps auxwww’ instead of top to look at processes. Because the display-name
> option is used with WSGIDaemonProcess, the mod_wsgi daemon processes should
> be named differently, so you can tell them apart from the Apache httpd worker
> processes and the master process.
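>
> For example, with something like the following (the ‘myapp’ name and the
> process/thread counts are just illustrative), the daemon processes show up
> in ‘ps’ as ‘(wsgi:myapp)’:
>
> WSGIDaemonProcess myapp display-name=%{GROUP} processes=2 threads=15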
>
> Also hunt around in the ‘ps’ command options and run it such that it shows
> the ‘STAT’ field as well, so you can see what state a process is truly in.
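>
> For example, something along these lines (the field selection is just a
> suggestion):
>
> ps -eo pid,user,stat,start,time,cmd | grep -E 'wsgi|httpd'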
>
> Also update the Apache httpd configuration so that LogLevel is set to ‘info’
> instead of ‘warn’. That will cause mod_wsgi to output logging about when
> daemon processes are being restarted and why.
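>
> That is, in the generated Apache configuration:
>
> LogLevel info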
>
> BTW, how much ability do you have to customise the generated Apache
> configuration file? With the ability to pip install mod_wsgi now, it shouldn’t be
> that hard to substitute in a newer mod_wsgi version.
>
> Graham
>
>> On 1 Jan 2017, at 8:56 AM, Cristiano Coelho <cristia...@gmail.com> wrote:
>>
>> Tried it and no luck, same issue. It seems to happen more often on the
>> production machine, which is also behind a load balancer; I got the process
>> stuck there on the first try with the config change. Also, the process
>> doesn't seem to be in zombie status either; it looks completely like a normal
>> process (and it probably is, since background threads stay running) but isn't
>> receiving requests. I can't really understand how a process can stay alive
>> like this and keep running normally even after a few SIGTERM and SIGKILL
>> signals!
>>
>> What's odd is that the process id that is "stuck" is not really the one that
>> Apache attempted to kill, but I'm not familiar with mod_wsgi/Apache
>> internals, so that's probably fine. Below are both stuck processes
>> from top, and the logs.
>>
>>
>>   PID USER PR NI  VIRT  RES  SHR S %CPU %MEM    TIME+ RUSER COMMAND
>>  5948 wsgi 20  0 1203m  88m  11m S  0.3  4.4  0:03.40 wsgi  httpd
>> 12673 wsgi 20  0 1210m 146m  11m S  0.3  7.3 11:26.47 wsgi  httpd   ---> this one should not be here.
>>
>>
>> [Sat Dec 31 21:38:43.097424 2016] [core:warn] [pid 12669:tid 140513528571968] AH00045: child process 13723 still did not exit, sending a SIGTERM
>> [Sat Dec 31 21:38:45.099655 2016] [core:warn] [pid 12669:tid 140513528571968] AH00045: child process 13723 still did not exit, sending a SIGTERM
>> [Sat Dec 31 21:38:47.101924 2016] [core:warn] [pid 12669:tid 140513528571968] AH00045: child process 13723 still did not exit, sending a SIGTERM
>> [Sat Dec 31 21:38:49.104142 2016] [core:error] [pid 12669:tid 140513528571968] AH00046: child process 13723 still did not exit, sending a SIGKILL
>> [Sat Dec 31 21:38:50.812271 2016] [suexec:notice] [pid 5944:tid 140156604848192] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
>> [Sat Dec 31 21:38:50.825993 2016] [auth_digest:notice] [pid 5944:tid 140156604848192] AH01757: generating secret for digest authentication ...
>> [Sat Dec 31 21:38:50.826665 2016] [lbmethod_heartbeat:notice] [pid 5944:tid 140156604848192] AH02282: No slotmem from mod_heartmonitor
>> [Sat Dec 31 21:38:50.827032 2016] [:warn] [pid 5944:tid 140156604848192] mod_wsgi: Compiled for Python/2.7.9.
>> [Sat Dec 31 21:38:50.827041 2016] [:warn] [pid 5944:tid 140156604848192] mod_wsgi: Runtime using Python/2.7.10.
>> [Sat Dec 31 21:38:50.827503 2016] [core:warn] [pid 5944:tid 140156604848192] AH00098: pid file /var/run/httpd/httpd.pid overwritten -- Unclean shutdown of previous Apache run?
>> [Sat Dec 31 21:38:50.828766 2016] [mpm_event:notice] [pid 5944:tid 140156604848192] AH00489: Apache/2.4.23 (Amazon) mod_wsgi/3.5 Python/2.7.10 configured -- resuming normal operations
>> [Sat Dec 31 21:38:50.828782 2016] [core:notice] [pid 5944:tid 140156604848192] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'
>>
>>
>> On Saturday, 31 December 2016 at 1:27:46 (UTC-3), Graham Dumpleton
>> wrote:
>>
>>> On 31 Dec 2016, at 3:07 PM, Cristiano Coelho <[email protected]>
>>> wrote:
>>>
>>> Hello,
>>>
>>> So the configuration might not be ideal, but these small config tweaks
>>> shouldn't really be the source of the issue, right? It's a bit odd since I
>>> haven't had issues with similar deploys. This project, I guess, uses more
>>> C-extension libraries (lxml and postgres) and the background threads/pools
>>> perform a lot of IO (email sending, message queue polling, and some
>>> others), although they are all daemon threads and should finish within
>>> the 4-second grace time.
>>
>> If you are using lxml then you definitely need to use:
>>
>> WSGIApplicationGroup %{GLOBAL}
>>
>> as from memory it is one of the libraries which is known to have issues when
>> used in Python sub interpreters. The problem will be if a callback function
>> is registered which lxml calls when parsing XML. Because it doesn’t deal
>> with thread locking properly when using a sub interpreter, it can deadlock
>> its own thread. Other threads can still run, but if other request threads do
>> the same, you can eventually exhaust all the request threads and the process
>> hangs. Background threads you create separately could still run though.
>> Although even if this occurs, it shouldn’t stop an Apache restart from
>> killing the process.
>>
>>> Would setting WSGIApplicationGroup %{GLOBAL} still allow me to use more
>>> than 1 process in the daemon configuration? Although I don't think it will
>>> make any difference at all, since the web servers only listen on port 80 and
>>> are on the same domain, so all requests should always fall into the same
>>> application group, if I interpreted the docs correctly.
>>
>> Application group is the Python interpreter context within each respective
>> process. The value %{GLOBAL} just means the main or first interpreter
>> context of the process. This is the same as if you had run command line
>> Python and behaves the same. Any additional interpreter contexts created in
>> a process are what are referred to as sub interpreter contexts. By default
>> mod_wsgi uses a separate sub interpreter context in each process for each
>> WSGI application delegated to run in the same set of processes.
>>
>> So there is no restriction on setting the ‘processes’ option of
>> WSGIDaemonProcess to more than one at the same time as setting
>> WSGIApplicationGroup to %{GLOBAL}.
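>>
>> In other words, a configuration like the following is perfectly valid (the
>> ‘myapp’ name and the counts are illustrative):
>>
>> WSGIDaemonProcess myapp processes=3 threads=15 display-name=%{GROUP}
>> WSGIProcessGroup myapp
>> WSGIApplicationGroup %{GLOBAL}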
>>
>>> This issue is so random, and since it only happens on cloud deploys it gets
>>> really difficult to test whether a change helped or not; it can take days to
>>> notice it. I guess I will keep playing around with settings and try to
>>> gather more info on the stuck processes when it happens.
>>
>> Which sounds even more like an issue with sub interpreters. If the bit of code
>> which triggers the deadlock is hit infrequently, then the loss of request threads
>> could be slow. This is where newer mod_wsgi versions at least have various
>> timeout options for forcing daemon process restarts when requests time out or
>> block.
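>>
>> As a sketch of the sort of thing the newer options allow (the option values
>> are illustrative; check the documentation for the version you end up with):
>>
>> WSGIDaemonProcess myapp processes=3 threads=15 \
>>     request-timeout=60 deadlock-timeout=60 inactivity-timeout=300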
>>
>> At the least, add:
>>
>> WSGIApplicationGroup %{GLOBAL}
>>
>> and see how it goes.
>>
>> Graham
>>
>>> Thank you so much for all the help!
>>>