The thing that contributes the most is maximum-requests=400. Recycling processes so quickly seems to be a trigger, in conjunction with use of prefork MPM.
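To make the comparison concrete, here are the two sets of values as I understand them from your description further down. This is only a sketch: <name> is the placeholder from your post, and the second directive assumes that "8,8,1024" maps to processes, threads and maximum-requests in that order.

```apache
# Configuration reported for the 43 vhosts (recycles each daemon
# process after 400 requests).
WSGIDaemonProcess <name> user=apache group=apache display-name=<name> processes=4 threads=5 maximum-requests=400

# Assumed form of the revised directive on the one higher-traffic vhost,
# reconstructed from the "8,8,1024" description, not from an actual file.
WSGIDaemonProcess <name> user=apache group=apache display-name=<name> processes=8 threads=8 maximum-requests=1024
```

With the second form each daemon process is recycled far less often, which would fit with the failures becoming rarer rather than going away.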
The extra debugging in mod_wsgi subversion trunk tries to record whether the Apache parent is getting the notification of death of a daemon mode process when it is recycled. It will indicate when the events are received and help to identify whether there is an ordering issue with the events or whether events are being lost.

Anyway, that is one aspect of the problem, whereby daemon processes appear never to get replaced when they die.

The other issue, which seems to be linked, is that multiple daemon processes can attempt to accept new connections at the same time, when the cross process mutex locks should allow only one to do so. I can't see how that is possible at the moment, unless daemon processes are inheriting some memory corruption from the Apache parent process, made worse by large numbers of daemon process restarts.

In other words, the problems are very much tied to frequent daemon process restarts. So why exactly must you use the maximum-requests option?

Graham

On 17 November 2010 12:16, mdj <[email protected]> wrote:
>
>
> On Nov 16, 5:15 pm, Graham Dumpleton <[email protected]>
> wrote:
>
>> > On the system that was exhibiting the problem, I moved the daemon
>> > count from 4 to 8, and threads from 5 to 8, and bumped up the max
>> > requests. This appears to have mitigated the problem, but as it
>> > appears for now to be a bug either within mod_wsgi or apache I'm happy
>> > to help any way I can in tracking it down.
>>
>> Are you in a position to be able to upgrade versions of Apache and mod_wsgi?
>>
>> Are you willing to run mod_wsgi from subversion trunk? The subversion
>> version has additional logging to help gather more information about
>> this issue.
>
> Upgraded mod_wsgi I can do. However, since I have the aforementioned
> changes the web server cluster has run with 100% uptime for 5 days,
> when previously we averaged one failure per day (across a 3 machine
> farm).
> Whatever the problem is, having more daemon processes per group
> seems to mitigate it.
>
>
>> Can you try and get stack traces of stuck daemon process group using
>> gdb script recipe right at end of:
>>
>> http://code.google.com/p/modwsgi/wiki/DebuggingTechniques
>>
>> Please also post whether using prefork Apache MPM, plus mod_wsgi
>> daemon process and related directives from mod_wsgi configuration.
>
> We use prefork (haven't seen the need to switch) and run 43 vhosts, each
> of which has the following configuration:
>
> WSGIDaemonProcess <name> user=apache group=apache display-name=<name>
> processes=4 threads=5 maximum-requests=400
>
> The vhosts then scriptalias the vhost root to a wsgi handler.
>
> We isolated the failures to one specific vhost which has higher
> traffic than the others, and it's been changed to 8,8,1024 instead of
> 4,5,400
>
>> I worked with that OP a fair bit off list to help sort this out, but
>> put it aside at the moment as busy in last few weeks of a job before I
>> leave. They weren't able to move to latest mod_wsgi source so can get
>> debug output from extra code I added.
>
> We have our web servers behind a layer 7 switch, and we've already
> built the latest mod_wsgi. I'll keep in touch around the failures and
> once it trips again, I'll upgrade mod_wsgi to get you the logging you
> need.
>
> A stack trace, on the other hand, is harder. I'd really need to bring an
> additional machine into the farm for that, as the chance of a node
> failure would seem to be larger. I can probably spare the staff
> resources to make that happen over the Christmas period should this
> issue remain unresolved by then.
>
> Cheers,
>
> Matt
>
> --
> You received this message because you are subscribed to the Google Groups
> "modwsgi" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/modwsgi?hl=en.
