Cleaned up blog version now available for the second part, on vertically 
partitioning Python web applications.

http://blog.dscpl.com.au/2014/02/vertically-partitioning-python-web.html

This corrects (hopefully) a number of misstated things.

Graham

On 20/02/2014, at 3:30 PM, Graham Dumpleton <graham.dumple...@gmail.com> wrote:

> Blog version of the explanation of the number of threads seen.
> 
> http://blog.dscpl.com.au/2014/02/use-of-threading-in-modwsgi-daemon-mode.html
> 
> I stole your htop output. :-)
> 
> Note that the blog post explains a bit more, mentioning a transient reaper 
> thread that is created at the time of shutdown.
> 
> It is possible I should create that reaper thread up front as well, making it 
> 4 extra threads. I am wondering whether delaying its creation may be the cause 
> of a rare problem with processes hanging. This could occur if resources were 
> exhausted and the thread could not be created. If request threads or 
> interpreter destruction then subsequently hung, the process would never exit.
> 
> That situation would though produce a specific log message, and I have never 
> seen that message reported. All the same, it may be safer to create the reaper 
> thread at the outset and have it wait on a thread condition variable to know 
> when to activate.
> 
> Graham
> 
> On 20/02/2014, at 2:06 PM, Graham Dumpleton <graham.dumple...@gmail.com> 
> wrote:
> 
>> For each mod_wsgi daemon process where you have set threads=n, you will see 
>> n+3 threads.
>> 
>> The n threads are obviously the configured number of threads for handling 
>> requests.
>> 
>> The other three threads are as follows:
>> 
>> 1. The main thread which was left running after the daemon process forked 
>> from Apache. It is from this thread that the n request threads are initially 
>> created. It will also create the 2 additional threads described below. After 
>> it has done this, the main thread becomes a caretaker for the whole 
>> process. It will wait on a special socketpair, to which a signal handler will 
>> write a character as a flag that the process should shut down (see the 
>> sketch further below). In other words, this main thread just sits there and 
>> stops the process from exiting until told to.
>> 
>> 2. The second thread is a monitor thread. It manages things like the 
>> activity timeout and shutdown timeout. If either of those timeouts occurs it 
>> will send a signal to the same process (i.e., itself) to trigger shutdown of 
>> the process.
>> 
>> 3. The third thread is another monitoring thread, but one which specifically 
>> detects whether the whole Python interpreter itself gets into a complete 
>> deadlock and stops doing anything. If this is detected it will again send a 
>> signal to the same process to trigger a shutdown.
>> 
>> So the additional threads are to manage process shutdown and ensure the 
>> process is still alive and doing stuff.
>> 
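>> mod_wsgi implements all of this in C, but the shutdown flag the main thread 
>> waits on is just the classic self-pipe trick. As a rough illustration only 
>> (not the actual mod_wsgi code), a minimal Python 3 sketch of the same 
>> pattern might look like:
>> 
>> import select
>> import signal
>> import socket
>> 
>> # A connected pair of sockets. The signal handler writes to one end, the
>> # main thread blocks reading the other. Writing a single byte is about the
>> # only thing that is safe to do from a signal handler.
>> wakeup, notify = socket.socketpair()
>> 
>> def shutdown_handler(signum, frame):
>>     # Do no real work here, just flag that shutdown was requested.
>>     notify.send(b'x')
>> 
>> signal.signal(signal.SIGTERM, shutdown_handler)
>> 
>> # ... the request threads and the two monitor threads get created here ...
>> 
>> # The main thread then just parks until a signal arrives and the handler
>> # makes the wakeup socket readable.
>> select.select([wakeup], [], [])
>> 
>> # At this point shutdown has been requested and cleanup can begin.
>> 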
>> As to your memory issue, the problem with web application deployments which 
>> just about no one takes into consideration is that not all URLs in a web 
>> application are equal. I actually proposed a talk for PyCon US this year 
>> about this specific issue and how to deal with it, but the talk was rejected.
>> 
>> In short, because your complete web application runs in the same process 
>> space, if one specific URL, or a small subset of URLs, has special resource 
>> requirements, it dictates the resources required for the complete 
>> application, even if those URLs are only infrequently used.
>> 
>> As an example, the admin pages in a Django application are not frequently 
>> used, but they may have a requirement to process a lot of data. This can 
>> create a large transient memory requirement just for that request, but since 
>> memory allocated from the operating system is generally never given back, 
>> this one infrequent request will blow out memory usage for the whole 
>> application. That memory, once allocated, will be retained by the process 
>> until the process is subsequently restarted.
>> 
>> Because of this, you can end up in a silly situation whereby a request which 
>> is only run once every fifteen minutes could, over the course of a few 
>> hours, progressively be handled by a different process each time in a 
>> multiprocess web server configuration. Thus your overall memory usage will 
>> seem to jump up for no good reason until all the processes finally hit a 
>> plateau, where each has allocated the maximum amount of memory it requires 
>> to handle the worst case transient memory usage of individual requests.
>> 
>> It can get worse though if you also have multithreading in use in each 
>> process. The longer the response time for a memory hungry URL, the greater 
>> the odds that two such memory hungry requests will end up being handled 
>> concurrently within the same process in different threads. What this means 
>> is that your worst case memory usage isn't actually just the worst case 
>> memory requirement for a specific URL, but that multiplied by the number of 
>> threads in the process. For example, a URL which transiently needs 200MB in 
>> a process run with threads=5 implies a worst case closer to 1GB for that 
>> process, not 200MB.
>> 
>> Further examples I have seen in the past where people have been hit by this 
>> are site maps, PDF generation and possibly even RSS feeds where a 
>> significant amount of content is returned with each item rather than just a 
>> summary.
>> 
>> The big problem in all of this is identifying which URL has the large 
>> transient memory requirement. The tools available for this aren't good and 
>> you generally have to fall back on ad hoc solutions to try and work it out. 
>> I'll get to how you can work it out later, possibly as a separate email, as 
>> I have to go find some code I wrote for someone once before.
>> 
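>> In the meantime, as a rough sketch of the sort of thing I mean (this is not 
>> that code, and the names are purely illustrative), a WSGI middleware can 
>> record the process memory size before and after each request and log the 
>> URLs that cause it to grow. On Linux that might look something like:
>> 
>> import os
>> 
>> _PAGE_SIZE = os.sysconf('SC_PAGESIZE')
>> 
>> def _rss_bytes():
>>     # The second field of /proc/self/statm is the resident set size,
>>     # measured in pages.
>>     with open('/proc/self/statm') as f:
>>         return int(f.read().split()[1]) * _PAGE_SIZE
>> 
>> class MemoryGrowthMiddleware(object):
>>     # Log any request which grows the process RSS by more than 'threshold'.
>> 
>>     def __init__(self, application, threshold=10*1024*1024):
>>         self.application = application
>>         self.threshold = threshold
>> 
>>     def __call__(self, environ, start_response):
>>         before = _rss_bytes()
>>         try:
>>             for data in self.application(environ, start_response):
>>                 yield data
>>         finally:
>>             growth = _rss_bytes() - before
>>             if growth > self.threshold:
>>                 environ['wsgi.errors'].write(
>>                     'RSS grew %.1fMB handling %s\n' % (
>>                     growth/(1024.0*1024.0), environ.get('PATH_INFO', '/')))
>> 
>> Wrap the existing application object in the WSGI script file with this, watch 
>> the Apache error log for a while, and the URLs with the large transient 
>> memory requirements should show up. It is crude, and growth caused by one 
>> request can be attributed to another when requests overlap in different 
>> threads, but it is usually enough to point the finger.
>> 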
>> As to solving the problem once you have identified which URLs are the 
>> problem, ideally you would change how the code works to avoid the large 
>> transient memory requirement. If you cannot do that, or not straight away, 
>> then you can fall back on a number of different techniques to at least 
>> lessen the impact, by configuring the web server differently.
>> 
>> You have already identified two ways that this can be done, which are the 
>> inactivity timeout and the maximum number of requests per process before a 
>> restart.
>> 
>> The problem with these as a solution is that the requirement for a small set 
>> of URLs has dictated the configuration for the whole application. Using them 
>> can therefore have an impact on other parts of the application.
>> 
>> In the case of setting a maximum for the number of requests handled for the 
>> process, you can introduce a significant amount of process churn if this is 
>> set too low relative to the overall throughput. That is, the processes will 
>> get restarted on a frequent basis.
>> 
>> I talk about this issue of process churn in my PyCon talk from last year:
>> 
>> http://lanyrd.com/2013/pycon/scdyzk/
>> 
>> but you can also see what I mean in the attached application capacity 
>> analysis report picture.
>> 
>> <PastedGraphic-1.png>
>> The better solution to this problem of not all URLs being equal and having 
>> different resource requirements is to vertically partition your web 
>> application and spread it across multiple processes, where each process only 
>> handles a subset of the URLs. Luckily this is easily done with mod_wsgi by 
>> using multiple daemon process groups and delegating different URLs to 
>> different processes.
>> 
>> Take for example admin URLs in Django. If these are indeed infrequently used 
>> but can have a large transient memory requirement, what we can do is:
>> 
>> WSGIDaemonProcess main processes=5 threads=5
>> WSGIDaemonProcess admin threads=3 inactivity-timeout=30 maximum-requests=20
>> 
>> WSGIApplicationGroup %{GLOBAL}
>> WSGIProcessGroup main
>> 
>> WSGIScriptAlias / /some/path/wsgi.py
>> 
>> <Location /admin>
>> WSGIProcessGroup admin
>> </Location>
>> 
>> So what we have done is create two daemon process groups and shove the 
>> admin pages into a distinct one of their own, where we can be more 
>> aggressive and use the inactivity timeout and maximum requests to combat 
>> excessive memory use. In doing this we have left things alone for the bulk 
>> of the web application.
>> 
>> The end result is that we can tailor configuration settings for different 
>> parts of the application. The only requirement is that we can reasonably 
>> easily separate them out, with the URLs being matchable by a Location or 
>> LocationMatch directive in Apache.
>> 
>> In this example we have done this specifically to separate out misbehaving 
>> parts of an application, but the converse can also be done.
>> 
>> If you think about it, most of the traffic for your site will often hit a 
>> small subset of URLs. The performance of this small set of very frequently 
>> visited URLs could be impeded by having to use a more general configuration 
>> for the server.
>> 
>> What may work better is to delegate the very highly trafficked URLs into 
>> their own daemon process group with a processes/threads mix tuned for that 
>> scenario. Because that daemon process group is only going to handle a small 
>> number of URLs, the actual amount of code from your application that would 
>> ever be executed within those processes would be much smaller. So long as 
>> your code base is set up such that it only lazily imports the code for 
>> specific handlers when first needed, you can keep this optimised process 
>> quite lean as far as memory usage goes.
>> 
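>> By lazy imports I mean something like the following sketch (the module and 
>> view names are made up for illustration). Rather than importing a rarely 
>> used, memory hungry view at the top of the module, defer the import to the 
>> view function which dispatches to it, so a process never pays the import 
>> cost for URLs it doesn't serve:
>> 
>> # Hypothetical wrapper view: the heavy module is imported the first time
>> # this URL is actually hit in a given daemon process, not at startup.
>> 
>> def pdf_report(request, *args, **kwargs):
>>     # Deferred import; processes which never serve this URL never load it.
>>     from myapp.reports.views import pdf_report as real_view
>>     return real_view(request, *args, **kwargs)
>> 
>> The URL pattern is then pointed at this wrapper rather than at the real view.
>> 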
>> So instead of every process having to be very fat and eventually load up 
>> all parts of your application code, you can leave that to a smaller number 
>> of processes which, although they are going to serve up a greater number of 
>> different URLs, won't necessarily get much traffic and so don't need as 
>> much capacity.
>> 
>> You might therefore have the following:
>> 
>> WSGIDaemonProcess main processes=1 threads=5
>> WSGIDaemonProcess volume processes=3 threads=5
>> WSGIDaemonProcess admin threads=3 inactivity-timeout=30 maximum-requests=20
>> 
>> WSGIApplicationGroup %{GLOBAL}
>> WSGIProcessGroup main
>> 
>> WSGIScriptAlias / /some/path/wsgi.py
>> 
>> <Location /publications/article/>
>> WSGIProcessGroup volume
>> </Location>
>> 
>> <Location /admin>
>> WSGIProcessGroup admin
>> </Location>
>> 
>> In your case we are therefore shoving the one URL which accounts for almost 
>> 50% of your total traffic into its own daemon process group. This should 
>> have a lower memory footprint and so we can afford to run it across a few 
>> processes, each with a small number of threads. All other non admin traffic, 
>> where all the remaining code for your application would be loaded, can be 
>> handled by the one process.
>> 
>> So by juggling things like this, handling the worst case URLs for transient 
>> memory usage as special cases, as well as your high traffic URLs, one can 
>> often quite dramatically control the amount of memory used.
>> 
>> Now, what about monitoring all of this so as to be able to gauge 
>> effectiveness?
>> 
>> Because server monitoring in New Relic can't separately identify the 
>> mod_wsgi daemon process groups, even when the display-name option is used, 
>> you cannot readily rely on server monitoring for things like memory 
>> tracking. Everything will be lumped under Apache and you cannot tell what 
>> the memory requirements of each group are.
>> 
>> What you have to do in this case is rely on the memory usage charts on the 
>> main overview dashboard for the web application in New Relic.
>> 
>> <PastedGraphic-3.png>
>> 
>> We do have a problem at this point though, which is that everything will 
>> still report under the same existing application in the New Relic UI, and 
>> so we still don't have separation.
>> 
>> What we can do here though is configure things so that each daemon process 
>> group reports into a separate application, as well as still reporting to a 
>> combined application covering everything. This can be done from the Apache 
>> configuration file using:
>> 
>> WSGIDaemonProcess main processes=1 threads=5
>> WSGIDaemonProcess volume processes=3 threads=5
>> WSGIDaemonProcess admin threads=3 inactivity-timeout=30 maximum-requests=20
>> 
>> WSGIApplicationGroup %{GLOBAL}
>> WSGIProcessGroup main
>> 
>> SetEnv newrelic.app_name 'My Site (main);My Site'
>> 
>> WSGIScriptAlias / /some/path/wsgi.py
>> 
>> <Location /publications/article/>
>> WSGIProcessGroup volume
>> SetEnv newrelic.app_name 'My Site (volume);My Site'
>> </Location>
>> 
>> <Location /admin>
>> WSGIProcessGroup admin
>> SetEnv newrelic.app_name 'My Site (admin);My Site'
>> </Location>
>> 
>> So we are using specialisation via the Location directive to override the 
>> application name that the New Relic Python agent reports under.
>> 
>> We are also in this case using a semicolon separated list of names.
>> 
>> The result is that each daemon process group logs under a separate 
>> application of the form 'My Site (XXX)' but at the same time they also all 
>> report to 'My Site'.
>> 
>> This way you can still have a combined view, but you can also look at each 
>> daemon process group in isolation.
>> 
>> The isolation is important, because you can then do the following separately 
>> for each daemon process group:
>> 
>> - View response times.
>> - View throughput.
>> - View memory usage.
>> - View CPU usage.
>> - View the capacity analysis report.
>> - Trigger the thread profiler.
>> 
>> If things weren't separated and they were all reporting only to the same 
>> application, the data presented would be all mixed up, and for the last 4 
>> could be confusing.
>> 
>> Okay, so that is probably going to be a lot to digest but represents just a 
>> part of what I would have presented at PyCon US if my talk had been 
>> accepted.
>> 
>> Other things I would have talked about would have included dealing with the 
>> request backlog when overloaded due to increased traffic for certain URLs, 
>> dealing with the danger of malicious POST requests with a large content 
>> size, etc.
>> 
>> Am sure the above will keep you busy for a while at least though. :-)
>> 
>> Now that I have done all that, I should clean it up a bit and put it up in a 
>> couple of blog posts.
>> 
>> Graham
>> 
>> On 20/02/2014, at 8:06 AM, scoopseven <m...@kecko.com> wrote:
>> 
>>> Graham, I'm still not sure why with processes=5 threads=2 I see 5 threads 
>>> for each process for mod_wsgi in htop. If you could explain that last 
>>> little hanging chad it would be great. Thanks!
>>> 
>>> Updated SO with summary of solution: 
>>> http://serverfault.com/questions/576527/apache-processes-in-top-more-than-maxclients
>>> 
>>> Mark
>>> 
>>> 
>>> On Wednesday, February 19, 2014 12:05:56 PM UTC-5, scoopseven wrote:
>>> This question started on SO: 
>>> http://serverfault.com/questions/576527/apache-processes-in-top-more-than-maxclients/576600
>>> 
>>> I've updated my Apache config and mod_wsgi settings, but am still 
>>> experiencing memory creep. Here's my site conf and my apache2.conf:
>>> 
>>> WSGIDaemonProcess mywsgi user=www-data group=www-data processes=5 threads=5 
>>> display-name=mod-wsgi 
>>> python-path=/home/admin/.virtualenvs/django/lib/python2.7/site-packages
>>> WSGIPythonHome /home/admin/.virtualenvs/django
>>> WSGIRestrictEmbedded On
>>> WSGILazyInitialization On
>>> 
>>> <VirtualHost 127.0.0.1:8080>
>>>     ServerName www.mysite.com
>>>     DocumentRoot /srv/mysite
>>>     
>>>     SetEnvIf X-Forwarded-Protocol https HTTPS=1
>>>     WSGIScriptAlias / /srv/mysite/system/apache/django.wsgi process-group=mywsgi application-group=%{GLOBAL}
>>>     RequestHeader add X-Queue-Start "%t"
>>> </VirtualHost>
>>> 
>>> <IfModule mpm_worker_module>
>>>     StartServers             1
>>>     ThreadsPerChild          5
>>>     MinSpareThreads          5
>>>     MaxSpareThreads         10
>>>     MaxClients              25
>>>     ServerLimit              5
>>>     MaxRequestsPerChild      0
>>>     MaxMemFree            1024
>>> </IfModule>
>>> 
>>> I'm watching apache and mod_wsgi via htop and apache seems to be playing by 
>>> the rules, never loading more than 25 threads. It usually stays around 
>>> 10-15 threads. We average around 5-6 requests/second monitored by 
>>> /server-status/. The thing that's bothering me is that I'm counting 44 
>>> mod_wsgi threads in htop. I assumed that since I had processes=5 threads=5 
>>> I would only see a maximum of 30 threads below (5 processes + 25 threads). 
>>> 
>>> Partial htop dump:
>>> 
>>>  2249 www-data   20   0  159M 65544  4676 S 26.0  0.8  2:09.93 mod-wsgi     
>>>      -k start
>>>  2248 www-data   20   0  164M 69040  5560 S 148.  0.8  2:10.72 mod-wsgi     
>>>      -k start
>>>  2274 www-data   20   0  159M 65544  4676 S  0.0  0.8  0:12.58 mod-wsgi     
>>>      -k start
>>>  2250 www-data   20   0  157M 62212  5168 S 10.0  0.7  1:50.35 mod-wsgi     
>>>      -k start
>>>  2291 www-data   20   0  164M 69040  5560 S 41.0  0.8  0:17.07 mod-wsgi     
>>>      -k start
>>>  2251 www-data   20   0  165M 69320  4676 S  0.0  0.8  1:59.48 mod-wsgi     
>>>      -k start
>>>  2272 www-data   20   0  159M 65544  4676 S  0.0  0.8  0:28.67 mod-wsgi     
>>>      -k start
>>>  2282 www-data   20   0  165M 69320  4676 S  0.0  0.8  0:33.85 mod-wsgi     
>>>      -k start
>>>  2292 www-data   20   0  164M 69040  5560 S 28.0  0.8  0:28.08 mod-wsgi     
>>>      -k start
>>>  2298 www-data   20   0  157M 62212  5168 S  0.0  0.7  0:14.93 mod-wsgi     
>>>      -k start
>>>  2299 www-data   20   0  157M 62212  5168 S  1.0  0.7  0:23.71 mod-wsgi     
>>>      -k start
>>>  2358 www-data   20   0  164M 69040  5560 S  1.0  0.8  0:02.62 mod-wsgi     
>>>      -k start
>>>  2252 www-data   20   0  165M 70468  4660 S 41.0  0.8  1:55.85 mod-wsgi     
>>>      -k start
>>>  2273 www-data   20   0  159M 65544  4676 S 10.0  0.8  0:29.03 mod-wsgi     
>>>      -k start
>>>  2278 www-data   20   0  159M 65544  4676 S  1.0  0.8  0:02.79 mod-wsgi     
>>>      -k start
>>>  2264 www-data   20   0  165M 70468  4660 S  0.0  0.8  0:07.50 mod-wsgi     
>>>      -k start
>>>  2266 www-data   20   0  165M 70468  4660 S 25.0  0.8  0:39.49 mod-wsgi     
>>>      -k start
>>>  2300 www-data   20   0  157M 62212  5168 S  6.0  0.7  0:28.78 mod-wsgi     
>>>      -k start
>>>  2265 www-data   20   0  165M 70468  4660 S 15.0  0.8  0:31.44 mod-wsgi     
>>>      -k start
>>>  2294 www-data   20   0  164M 69040  5560 R 54.0  0.8  0:34.82 mod-wsgi     
>>>      -k start
>>>  2279 www-data   20   0  165M 69320  4676 S  0.0  0.8  0:32.63 mod-wsgi     
>>>      -k start
>>>  2297 www-data   20   0  157M 62212  5168 S  3.0  0.7  0:09.68 mod-wsgi     
>>>      -k start
>>>  2302 www-data   20   0  157M 62212  5168 S  0.0  0.7  0:27.62 mod-wsgi     
>>>      -k start
>>>  2323 www-data   20   0  157M 62212  5168 S  0.0  0.7  0:02.56 mod-wsgi     
>>>      -k start
>>>  2280 www-data   20   0  165M 69320  4676 S  0.0  0.8  0:13.00 mod-wsgi     
>>>      -k start
>>>  2263 www-data   20   0  165M 70468  4660 S  0.0  0.8  0:19.35 mod-wsgi     
>>>      -k start
>>>  2322 www-data   20   0  165M 69320  4676 S  0.0  0.8  0:03.05 mod-wsgi     
>>>      -k start
>>>  2275 www-data   20   0  165M 70468  4660 S  0.0  0.8  0:02.72 mod-wsgi     
>>>      -k start
>>>  2285 www-data   20   0  164M 69040  5560 S  0.0  0.8  0:00.00 mod-wsgi     
>>>      -k start
>>>  2288 www-data   20   0  164M 69040  5560 S  0.0  0.8  0:00.11 mod-wsgi     
>>>      -k start
>>>  2290 www-data   20   0  164M 69040  5560 S  4.0  0.8  0:15.66 mod-wsgi     
>>>      -k start
>>>  2293 www-data   20   0  164M 69040  5560 S 20.0  0.8  0:29.01 mod-wsgi     
>>>      -k start
>>>  2268 www-data   20   0  159M 65544  4676 S  0.0  0.8  0:00.00 mod-wsgi     
>>>      -k start
>>>  2269 www-data   20   0  159M 65544  4676 S  0.0  0.8  0:00.11 mod-wsgi     
>>>      -k start
>>>  2270 www-data   20   0  159M 65544  4676 S 15.0  0.8  0:26.62 mod-wsgi     
>>>      -k start
>>>  2271 www-data   20   0  159M 65544  4676 S  0.0  0.8  0:26.55 mod-wsgi     
>>>      -k start
>>> 
>>> Last night I had processes=3 threads=3 and my NR capacity report reported 
>>> 100% usage 
>>> (https://rpm.newrelic.com/accounts/67402/applications/1132078/optimize/capacity_analysis),
>>>  so I upped it to processes=5 threads=5 and now I have 44 threads going. 
>>> Despite the instance count reported by NR staying relatively stable, memory 
>>> consumption continues to increase 
>>> (https://rpm.newrelic.com/accounts/67402/servers/1130000/processes#id=152494639).
>>>  I realize that nobody except for Graham can see those NR reports, sorry. 
>>> 
>>> Has anyone dealt with this situation before?
>>> 
>>> Mark
>>> 
>>> 
>> 
> 
