> On 31 Dec 2016, at 6:37 AM, Jason Garber <[email protected]> wrote:
>
> Hi Graham,
>
> Can you comment on why one would use more processes or more threads in a
> database-intensive web application?
>
> For example:
>
> 6 processes
> 10 threads each
>
> Or
>
> 1 process
> 60 threads
>
> My current understanding would say that you could use more threads until you
> started to see a performance drop due to the GIL not allowing concurrent
> execution. But I was wondering what you had to say.
Sort of. The problem is knowing when the GIL is starting to impact on things. What constitutes a performance drop is itself hard to measure. You could keep adding threads until you have way more than you need and still not see a performance drop, as the contention point could be in a place where you just can't get enough requests into the process to make use of them anyway. I have seen this many times: people allocate way too many threads, most of which are never used and just sit there idle chewing up memory.

The design of mod_wsgi daemon mode is such that it will at least not activate threads unless the capacity is needed. This only applies to the Python side of things though; the threads are still allocated on the C side, and a momentary backlog could still see them activated on the Python side and used for a moment, only to go back to not being in use. That is when memory use can start to blow out unnecessarily, and where fewer threads would have been better, living with the potential momentary backlog.

Since Python doesn't provide a way of measuring GIL contention or give any guidance, the question is what we can look at. What I have been using to try and quantify GIL impacts and tune processes/threads is the following:

* Thread capacity used - This is a measure of how much of the capacity of the specified threads is actually being used in a time period. It can tell you when you have too many threads allocated and so they are wasted.

* Per request CPU usage - This is a measure of how much CPU was used in handling a request. If this comes in as a low figure, the request is likely I/O bound. If a high figure, then CPU bound. Unfortunately the granularity of this on some systems is not great, so if you always have very short response times, under 10ms, it may not give a completely accurate picture.

* Process wide CPU usage - This is a measure of how much CPU is used by the whole process in a period of time.

* Per request queue time - This is a measure of how much time a request spent in Apache before it was handled in a daemon process group by a WSGI application. This helps in understanding whether backlogging is occurring due to a lack of capacity in the daemon processes, or to bottlenecks elsewhere.

* Rate of requests timed out - This is a measure of how many requests timed out before being handled by the daemon processes. This helps in understanding when the application got overloaded and requests started to be failed (if that is enabled) to try and discard the backlog.

With the exception of the last one, there are ways in mod_wsgi of getting this information. For the last one I could partly implement something to track it, but it would not be accurate under severe backlogging; it would still show the spike though, which is enough. What can't easily be got at is the case where backlogging in the Apache worker processes is so excessive that the socket connection to the daemon processes fails. I only thought about a way of doing something for that when I saw this email, so may start playing with it.
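For illustration only, here is a rough sketch of how a few of these figures could be approximated from inside the application itself with a simple WSGI middleware, independent of whatever mod_wsgi records internally. It assumes Python 3.7+ for time.thread_time(), and it assumes the mod_wsgi version in use exposes the request queueing timestamp in the WSGI environ under the 'mod_wsgi.queue_start' key as microseconds since the epoch; check the key and its format for your version before relying on it.

    import threading
    import time

    class MetricsMiddleware(object):

        def __init__(self, application):
            self.application = application
            self.lock = threading.Lock()
            self.active = 0        # requests currently being handled
            self.peak_active = 0   # high water mark on thread capacity used

        def __call__(self, environ, start_response):
            with self.lock:
                self.active += 1
                if self.active > self.peak_active:
                    self.peak_active = self.active

            # Per request queue time, if the environ key is available. The
            # value is assumed here to be microseconds since the epoch.
            queue_time = None
            queue_start = environ.get('mod_wsgi.queue_start')
            if queue_start:
                try:
                    queue_time = time.time() - float(queue_start) / 1e6
                except ValueError:
                    pass

            wall_start = time.time()
            cpu_start = time.thread_time()   # CPU time used by this thread only

            try:
                return self.application(environ, start_response)
            finally:
                # Only measures up to the point the iterable is handed back,
                # not until a streamed response is fully consumed.
                wall_time = time.time() - wall_start
                cpu_time = time.thread_time() - cpu_start
                with self.lock:
                    self.active -= 1
                # In practice these would be pushed into a monitoring system
                # rather than printed. A low cpu_time relative to wall_time
                # suggests an I/O bound request, a high one a CPU bound one.
                print('queue=%r wall=%.6f cpu=%.6f peak_threads=%d' %
                      (queue_time, wall_time, cpu_time, self.peak_active))

This is only a sketch of the idea, not what mod_wsgi does internally, and it would need to be wrapped around the WSGI application object in the WSGI script file to take effect.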
An issue related to all this is process memory size. In Python there is a tendency for memory in use to grow up to a plateau as all possible request handlers are visited. This means you can have a lot of memory in use that potentially isn't touched very often. I have talked previously about vertically partitioning an application across multiple daemon process groups so that different subsets of URLs are handled in their own processes. This allows you to separately tune processes/threads for each set of URLs, and for the more frequently visited URLs keep that memory hot, with less memory just sitting there unused.

Another crude way of handling growing memory use, and the extra overhead of too much memory paging, is to restart the daemon processes every so often. Right now this can only be done based on request count or inactivity though. Not for the first time, I have been looking lately at a way of simply restarting processes on a regular time interval to keep memory usage down. The danger here, which isn't simple to solve, is avoiding restarting multiple processes at the same time and causing load spikes. Using graceful timeouts may be enough to spread the restarts though, as they would be random, based on when processes are not handling requests.

Anyway, there are metrics one can use to measure things to try and understand what is going on. Do you have any monitoring system in place into which metrics can be injected? If you do, I can explain how you can get the information out and we can start looking at it.

I would really love to be able to get hold of a data dump of these metrics over a period of time for a real application, so I can try and do some data analysis of it in a Jupyter Notebook and develop some code which could be used to give guidance on tuning when you have the data. One can't get decent data for doing this when using test applications and fake data; you need a real application with a decent amount of traffic and I don't have one.
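As a companion sketch of injecting such figures into a monitoring system, the following assumes a statsd daemon listening on localhost:8125 and uses metric names made up for the example; any other collector could be substituted. It reports the process wide CPU usage figure from a background reporter thread started once per daemon process, typically from the WSGI script file.

    import os
    import socket
    import threading
    import time

    def send_gauge(name, value, addr=('localhost', 8125)):
        # statsd plain text protocol: "<name>:<value>|g" records a gauge.
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            sock.sendto(('%s:%s|g' % (name, value)).encode('ascii'), addr)
        finally:
            sock.close()

    def report_process_cpu(interval=60.0):
        # os.times() gives cumulative user/system CPU time for this process.
        last = os.times()
        last_wall = time.time()
        while True:
            time.sleep(interval)
            now = os.times()
            now_wall = time.time()
            cpu = (now.user - last.user) + (now.system - last.system)
            # Express CPU burn as a percentage of one CPU over the interval.
            percent = 100.0 * cpu / (now_wall - last_wall)
            send_gauge('wsgi.process.cpu_percent', round(percent, 2))
            last, last_wall = now, now_wall

    _reporter = threading.Thread(target=report_process_cpu, daemon=True)
    _reporter.start()

A gauge is used here purely for simplicity; a counter or timer could equally be used depending on what the monitoring system expects.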
Graham