On Sat, Aug 23, 2008 at 8:16 PM, Graham Dumpleton <[EMAIL PROTECTED]> wrote:
>> Looking back at the majority of tracebacks I have received so far, all
>> of them are 'IOError: client connection closed' and all of them are
>> for the URL used for posting uploads.
>
> FWIW, bit of a discussion about this in:
>
> http://code.google.com/p/modwsgi/issues/detail?id=29
I see. Apart from filling up our logs with spurious error/warning
messages, it's not going to cause any major problems.

>> There is the odd exception for
>> URL's but I did some further digging and correlated them to the
>> numerous "probes" that we are getting.
>
> Probes can be a problem, especially POST requests against arbitrary
> URLs. For embedded mode it isn't a problem, but can be an issue if
> using daemon mode.

The probes are really not too difficult to guard against and are a
source of amusement for me sometimes :)

> The reason there can be a problem is that UNIX sockets on some
> platforms have quite small input/output buffer sizes. For example, on
> MacOS X it is 8 kilobytes. With the way that daemon mode works at the
> moment, all request content is pushed across to the daemon process
> handling the request before any attempt is made by the Apache child
> worker process doing the proxying to read any response.
>
> This means that if a probe sends request content greater than the UNIX
> socket buffer size, and the handler for the URL doesn't consume the
> request content and simply sends a response, and that response is
> itself greater than the UNIX socket buffer size, then you get a
> deadlock. That is, the Apache child worker is stuck because it can't
> send all the request content, and similarly the daemon process can't
> send all the response, because the child worker is also stuck and will
> not read it.
>
> There are a few things in place to contend with this. First, there
> are the receive-buffer-size and send-buffer-size options to
> WSGIDaemonProcess to allow the UNIX socket buffer sizes to be
> increased on those platforms which have stupidly small values.
> Second, mod_wsgi will detect a deadlock on the UNIX socket through a
> timeout and will thus abort the handling of the request.
> Finally, mod_wsgi will honour the Apache LimitRequestBody directive
> and, in particular, even when daemon mode is being used, will evaluate
> whether the condition has failed in the Apache child worker process.
> This means that an HTTP error response can be returned indicating the
> request content is too large even before the content is passed across
> to the daemon process.
>
> Frankly, the LimitRequestBody directive, and also other Apache
> directives such as Limit and LimitExcept, do not get used in the way
> they probably should by people hosting Python applications. The
> attitude of Python developers is that the Python application should
> handle everything and Apache should just be the tunnel for getting
> data to it. Thus features like those in Apache get ignored when they
> are possibly more appropriately handled in Apache. If you really
> wanted to provide maximum protection for your application, you would
> use LimitRequestBody to block requests with request content against
> URLs that shouldn't get it. You probably should block request methods
> as well, except for where they are reasonable.

I used to have LimitRequestBody in our config but for some reason I
commented it out some time ago. I can't remember what it was, but we
were having issues and commenting out LimitRequestBody solved the
problem. But seeing as LimitRequestBody is the obvious and Right Thing
to do, I will have to reinstate it in our config.

> BTW, the deadlock on the UNIX socket is also a problem with mod_cgid
> and some other Apache modules that use proxying to a backend process.
> It is usually worse for UNIX sockets, but technically could also be
> an issue where INET sockets are used; because buffer sizes are much
> larger in that case, though, it is much harder for it to be
> inadvertently triggered.

This would not be a problem in our case. We use lighttpd for
fastcgi/cgi in addition to handling static content. Apache is purely
to run Python apps under mod_wsgi and nothing else.
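Pulling the protections above together, here is a sketch of what the
relevant configuration might look like. The option names
(receive-buffer-size, send-buffer-size, LimitRequestBody, LimitExcept)
are the real directives discussed above, but the paths, byte counts,
and buffer sizes are illustrative only, not taken from any actual
config:

```apache
# Illustrative only: bump the UNIX socket buffers for daemon mode on
# platforms with small defaults (e.g. 8 kilobytes on MacOS X).
WSGIDaemonProcess example processes=2 threads=10 \
    receive-buffer-size=65536 send-buffer-size=65536
WSGIProcessGroup example
WSGIScriptAlias / /some/path/app.wsgi

# Keep request bodies small everywhere by default. Note that
# LimitRequestBody 0 means "unlimited", so a small non-zero value is
# used here to reject oversized probe requests.
<Location />
    LimitRequestBody 4096
    # Apache 2.2 style access control: deny every request method
    # except the ones listed.
    <LimitExcept GET POST HEAD>
        Order deny,allow
        Deny from all
    </LimitExcept>
</Location>

# Allow larger bodies only on the URL that actually accepts uploads.
# Later Location sections are merged after earlier ones, so this
# overrides the default above for this path.
<Location /my/url/for/uploads>
    LimitRequestBody 10485760
</Location>
```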
>> Our config does not have maximum-requests specified. You do bring up
>> a very good way of dealing with scaling issues that we are
>> anticipating.
>
> I am not sure what part of what I said is relevant to 'scaling'. In
> general when scaling is the topic, I wouldn't be using daemon mode.
> This is because when you use embedded mode you can benefit from
> Apache's ability to scale up by creating more Apache child processes
> to handle requests and then kill them off when no longer required. In
> daemon mode the number of daemon processes is fixed, so you have to
> configure the number of daemon processes and threads in line with
> what the maximum concurrent number of requests might realistically be.

The plan is to use the system found in Chapter 20 of the Django book:

http://www.djangobook.com/en/1.0/chapter20/

See figure 20-4. All the Django app servers will be running mod_wsgi
and will be placed behind a server running perlbal or Apache with
mod_proxy.

Looking at our current setup, the main reason we use daemon mode is
because we needed file ownerships and permissions for uploaded files
to be handled correctly. This is why we are opting to use a subdomain
for handling file uploads, since it will allow us to use a dedicated
server to handle file uploads.

>> Excellent! We will probably be using a separate subdomain for
>> handling file uploads. This will allow us to scale horizontally as
>> well as vertically when the need arises. Good to know there is an
>> option if you are unable to use subdomains.
>
> One thing I should point out is that you aren't restricted to
> breaking up the application to run across multiple daemon process
> groups. You could also partly run it in embedded mode.
> For example:
>
> <VirtualHost *>
>
> WSGIDaemonProcess default processes=5 threads=10 maximum-requests=5000
> WSGIDaemonProcess uploads processes=1 threads=10
>
> WSGIProcessGroup default
>
> WSGIScriptAlias / /some/path/app.wsgi
>
> <Location /my/url/for/uploads>
> WSGIProcessGroup uploads
> </Location>
>
> <Location /my/most/frequently/accessed/urls>
> WSGIProcessGroup %{GLOBAL}
> </Location>
>
> ...
>
> </VirtualHost>
>
> What I have done here is delegate a specific subset of URLs to be
> hosted by the application but in embedded mode. That is, in the
> Apache child worker processes.
>
> Thus, for your most frequently handled URLs, especially if it is a
> front page which actually has a small run time overhead and very
> small memory requirement, you could have them handled in the Apache
> child worker processes and skip proxying through to a daemon process.
> Being in the Apache child worker processes, which can be scaled up to
> meet demand, if you get a burst in traffic for the most popular
> pages, Apache will create the additional processes automatically for
> you and you will not bog down the daemon processes handling normal
> requests that do actual work. One could possibly even try to
> Slashdot-protect your application by looking at the referrer and, if
> it comes from Slashdot, handle the request in embedded mode to
> benefit from the scaling abilities of Apache.
>
> Another interesting thing you can do is identify what the memory
> hogging URLs are. You might by default run your application in
> embedded mode, but identify which URLs are the ones which cause the
> size of your processes to balloon out. Delegate these URLs to a
> single or small number of daemon processes. That way, the memory used
> by those URLs is constrained by how many daemon processes you allow.
> In the meantime, the bulk of your application still runs in the more
> scalable embedded mode configuration, but because you have offloaded
> the memory hogging URLs, you don't suffer as readily the problem of
> all your system memory running out if Apache has to create lots of
> extra processes to handle demand; ie., you don't have every process
> holding a full copy of your fat application.

You know, you have a weird sense of timing :) I have been meaning to
profile our app using Guppy-PE to find out just what URL's are eating
up memory. All this works with LocationMatch as well, yes?

> One final thing. WSGIDaemonProcess isn't constrained to be defined
> inside of VirtualHost; it can actually be defined outside of it. When
> defined outside, you can delegate applications running in different
> virtual hosts to run in the same daemon process group. By default
> these would still be separated by virtue of being run in different
> interpreters within the process, but if the application supports the
> concept of receiving requests for multiple virtual hosts, they could
> be made to run in the same interpreter.
>
> An example of where this might be used is with Zope (or the newer
> WSGI capable Grok version). This is because Zope has what they call
> the virtual host monster. This means a single Zope instance can
> handle multiple virtual hosts. Thus, rather than a separate Zope for
> each virtual host, you could direct requests against multiple virtual
> hosts to the same Zope instance. Thus in general one would say
> (ignoring that Zope requires other steps to enable the virtual host
> monster):
>
> WSGIDaemonProcess shared ...
>
> <VirtualHost *>
> ServerName www.one.com
>
> WSGIProcessGroup shared
> WSGIApplicationGroup %{GLOBAL}
>
> WSGIScriptAlias / /some/path/app.wsgi
> </VirtualHost>
>
> <VirtualHost *>
> ServerName www.two.com
>
> WSGIProcessGroup shared
> WSGIApplicationGroup %{GLOBAL}
>
> WSGIScriptAlias / /some/path/app.wsgi
> </VirtualHost>
>
> Hope this gives you more interesting ideas.
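On the LocationMatch question above: WSGIProcessGroup is accepted in
directory-style contexts, so something along these lines should work,
though I haven't verified it yet. The URL pattern and the group name
are hypothetical, just to show the shape of it:

```apache
# Hypothetical: route any URL under /reports/ or /exports/ (imagined
# memory hogs) to a deliberately constrained daemon process group.
WSGIDaemonProcess memhogs processes=1 threads=5

<LocationMatch "^/(reports|exports)/">
    WSGIProcessGroup memhogs
</LocationMatch>
```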
:-) This is just brilliant! I never realized just how flexible and
powerful mod_wsgi was until you brought these up. Thanks!
--
Best Regards,
Nimrod A. Abing

W http://arsenic.ph/
W http://preownedcar.com/
W http://preownedbike.com/
W http://abing.gotdns.com/

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google
Groups "modwsgi" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at
http://groups.google.com/group/modwsgi?hl=en
-~----------~----~----~----~------~----~------~--~---
