Hi Graham, thank you very much for the very detailed email. It helps explain a lot of the issues to me.
I run a heavily used WSGI application for a rather large client, and I would be more than happy to plug in metric gathering code so that we can analyze statistics over a period of time. If you have something specific in mind, please let me know; I do not currently have any metric gathering in place.

I think the application is configured with 6 processes and 15 threads per process. But since we only use Apache for mod_wsgi requests, and let nginx buffer dynamic responses and serve all static requests, I don't think requests timing out has been a problem.

What brought this to mind recently was some back-end processing that I'm running within the WSGI daemon which processes Amazon SQS messages. The volume we were pushing through SQS was such that the background process was taking days to catch up, so I changed the code to spawn 40 threads to handle the SQS messages and was able to get caught up.
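For reference, the daemon process group is set up with something along these lines (a sketch from memory; the group name and paths are placeholders, not our actual vhost config):

    # Illustrative only -- group name and script path are made up.
    WSGIDaemonProcess myapp processes=6 threads=15 display-name=%{GROUP}
    WSGIScriptAlias / /srv/myapp/app.wsgi process-group=myapp application-group=%{GLOBAL}
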
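The SQS consumer itself is, in rough outline, something like the sketch below. This is simplified and from memory, assuming boto3; the queue name and handle_message() are placeholders for the real thing, and start_consumers() gets called once when the WSGI script is imported.

    # Rough sketch of the threaded SQS consumer -- not the production code.
    import threading

    import boto3

    NUM_WORKERS = 40

    def handle_message(body):
        pass  # placeholder for the real message processing

    def worker():
        # Each thread gets its own boto3 resource (they are not thread safe).
        sqs = boto3.resource('sqs')
        queue = sqs.get_queue_by_name(QueueName='my-queue')
        while True:
            # Long poll, pulling up to 10 messages at a time.
            messages = queue.receive_messages(MaxNumberOfMessages=10,
                                              WaitTimeSeconds=20)
            for message in messages:
                handle_message(message.body)
                message.delete()

    def start_consumers(count=NUM_WORKERS):
        for _ in range(count):
            thread = threading.Thread(target=worker)
            thread.daemon = True
            thread.start()
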
Please let me know how I can help.

Thanks!
Jason

On Dec 30, 2016 5:20 PM, "Graham Dumpleton" <[email protected]> wrote:
>
> On 31 Dec 2016, at 6:37 AM, Jason Garber <[email protected]> wrote:
>
> Hi Graham,
>
> Can you comment on why one would use more processes or more threads in a database-intensive web application?
>
> For example:
>
> 6 processes
> 10 threads each
>
> Or
>
> 1 process
> 60 threads
>
> My current understanding would say that you could use more threads until you started to see a performance drop due to the GIL not allowing concurrent execution. But I was wondering what you had to say.
>
> Sort of.
>
> The problem is knowing when the GIL is starting to impact on things. The measure of what constitutes a performance drop is hard to pin down. You could keep adding threads until you have way more than you need and still not see a performance drop, as the contention point could be in a place where you just can't get enough requests into the process to make use of them anyway. I have seen this many times where people allocate way too many threads and most of them are never used and sit there idle, just chewing up memory. The design of mod_wsgi daemon mode is such that it will at least not activate threads unless the capacity is needed. This only applies to the Python side of things though; they are still allocated on the C side, and a momentary backlog could still see them activated on the Python side and used for a moment, but then go back to not being in use. That is when memory use can start to blow out unnecessarily, and where fewer threads would have been better, living with the potential momentary backlog.
>
> So since Python doesn't provide a way of measuring GIL contention or giving any guidance, the question is what we can look at.
>
> What I have been using to try and quantify GIL impacts and tune processes/threads are the following.
>
> * Thread capacity used - This is a measure of how much of the capacity of the specified threads is actually being used in a time period. This can tell you when you have too many threads allocated and so they are wasted.
>
> * Per request CPU usage - This is a measure of how much CPU was used in handling a request. If this comes in as a low figure, the request is likely I/O bound; if a high figure, then CPU bound. Unfortunately the granularity of this on some systems is not great, so if you always have very short response times, under 10ms, it may not give a completely accurate picture.
>
> * Process wide CPU usage - This is a measure of how much CPU is used by the whole process in a period of time.
>
> * Per request queue time - This is a measure of how much time a request spent in Apache before it was handled in a daemon process group by a WSGI application. This helps in understanding whether backlogging is occurring due to lack of capacity in the daemon processes or bottlenecks elsewhere.
>
> * Rate of requests timed out - This is a measure of how many requests timed out before being handled by the daemon processes. This helps in understanding when the application got overloaded and requests started to be failed, if that is enabled, to try and discard the backlog.
>
> With the exception of the last one, there are ways in mod_wsgi of getting this information. I could partly implement something to track the last one. It will not be accurate for severe backlogging, but would still show the spike, which is enough, as it is only excessive backlogging in the Apache worker processes, such that the socket connection to the daemon processes fails, that I can't easily get access to. I only thought about a way to do something for the latter when I saw this email, so I may start playing with it.
>
> An issue related to all this is process memory size. In Python there is a tendency for memory in use to grow up to a plateau as all possible request handlers are visited. This means you have a lot of memory in use that potentially isn't touched very often. I have talked previously about vertically partitioning an application across multiple daemon process groups so that different subsets of URLs can be handled in their own processes. This allows you to separately tune processes/threads for each set of URLs, and for the more frequently visited URLs keep that memory hot, with less memory that just sits there unused.
>
> Another crude way of handling growing memory use, and the extra overhead of too much memory paging, is to restart the daemon processes every so often. Right now you can only do this based on request count or inactivity though. Not for the first time, I have been looking lately at a way of simply restarting processes on a regular time interval to keep memory usage down. The danger here, which isn't simple to solve, is avoiding restarting multiple processes at the same time and causing load spikes. Using graceful timeouts, though, may be enough to spread restarts, as they would occur at random times based on when processes are not handling requests.
>
> Anyway, there are metrics one can use to measure things to try and understand what is going on.
>
> Do you have any monitoring system in place into which metrics can be injected? If you do, I can explain how you can get the information out and we can start looking at it. I would really love to be able to get hold of a data dump for these metrics over a period of time for a real application, so I can try and do some data analysis of it in a Jupyter Notebook and develop some code which could be used to give guidance on tuning when you have the data. One can't get decent data for doing this when using test applications and fake data; you need a real application with a decent amount of traffic, and I don't have one.
>
> Graham
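To make sure I understand what to collect on the application side, below is the kind of WSGI middleware I had in mind for the first few metrics: a count of in-flight requests as a crude proxy for thread capacity used, plus per-request wall and CPU time. It is only a sketch; record_metric() is a placeholder for whatever monitoring hook we settle on, and the per-thread CPU figure assumes Linux and Python 3 (resource.RUSAGE_THREAD). The queue time and timeout metrics presumably have to come from mod_wsgi/Apache itself, so I have left those out.

    # Sketch only -- record_metric() is a stand-in for a real monitoring hook.
    import resource
    import threading
    import time

    _lock = threading.Lock()
    _in_flight = 0

    def record_metric(name, value):
        pass  # placeholder: push to statsd/graphite/whatever we choose

    def _thread_cpu():
        # Per-thread CPU time; RUSAGE_THREAD is Linux only (Python 3.2+).
        usage = resource.getrusage(resource.RUSAGE_THREAD)
        return usage.ru_utime + usage.ru_stime

    class MetricsMiddleware(object):

        def __init__(self, application):
            self.application = application

        def __call__(self, environ, start_response):
            global _in_flight
            with _lock:
                _in_flight += 1
                record_metric('requests.in_flight', _in_flight)
            wall_start = time.time()
            cpu_start = _thread_cpu()
            try:
                return self.application(environ, start_response)
            finally:
                # Note: this measures until the result iterable is returned,
                # not until the response has been fully written out.
                record_metric('request.wall_time', time.time() - wall_start)
                record_metric('request.cpu_time', _thread_cpu() - cpu_start)
                with _lock:
                    _in_flight -= 1

The WSGI script would then just wrap the application, e.g. application = MetricsMiddleware(application).
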
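On the vertical partitioning idea, am I right that it would look roughly like the following: separate daemon process groups, each tuned on its own, with a subset of URLs delegated to the second group? (Group names, counts and URLs below are invented, just to check my understanding.)

    # Rough idea only -- names and numbers are made up.
    WSGIDaemonProcess main processes=5 threads=15 display-name=%{GROUP}
    WSGIDaemonProcess reports processes=1 threads=5 display-name=%{GROUP}

    WSGIScriptAlias / /srv/myapp/app.wsgi
    WSGIProcessGroup main
    WSGIApplicationGroup %{GLOBAL}

    # Infrequently visited URLs handled in their own, smaller process group.
    <Location /reports>
        WSGIProcessGroup reports
    </Location>
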
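And for the crude restart option in the meantime, I assume you mean the existing directives, along the lines of the following (numbers invented):

    # Recycle each daemon process after a number of requests, let in-progress
    # requests finish gracefully, and also restart after a long idle period.
    WSGIDaemonProcess main processes=5 threads=15 \
        maximum-requests=10000 graceful-timeout=60 inactivity-timeout=600
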
