2008/8/23 Nimrod A. Abing &lt;[EMAIL PROTECTED]&gt;:
>
> On Sat, Aug 23, 2008 at 1:33 PM, Graham Dumpleton
> <[EMAIL PROTECTED]> wrote:
>>
>> 2008/8/22 Nimrod A. Abing <[EMAIL PROTECTED]>:
>>>>>> Important changes are file descriptor leaks on a graceful restart in
>>>>>> both versions, and for version 2.2 possible truncation of data when
>>>>>> using wsgi.file_wrapper. An upgrade would be recommended, especially
>>>>>> due to the latter, if using wsgi.file_wrapper in a way or on a system
>>>>>> affected by the problem.
>>>>>
>>>>> Thanks for the fix for wsgi.file_wrapper. We had been experiencing
>>>>> problems lately with users uploading large files. All tracebacks seem
>>>>> to point to something in wsgi.file_wrapper. Unfortunately it is one of
>>>>> those "heisenbugs". Some random thing/file upload triggers it, it
>>>>> disappears when you try to debug it and it's impossible to recreate.
>>>>
>>>> You mean downloading, don't you? At least I take uploading to mean
>>>> putting a file on the server, not the other way around, so I wouldn't
>>>> expect wsgi.file_wrapper to be involved. I would also have expected
>>>> you to be using Apache 2.2 on UNIX, where static file downloads
>>>> wouldn't be an issue.
>>>
>>> Uploading usually. But I have reason to suspect that it's happening
>>> because users abort the upload.
>>
>> Which can cause an 'IOError: client connection closed' exception to be
>> raised when attempting to read input.
>>
>>> But we also get weird tracebacks from
>>> time to time; these involve URLs where nothing is POSTed. We are
>>> currently using Apache 2.0. We use lighttpd for static content.
>>
>> Am curious to see what these might be and whether they might be
>> originating in mod_wsgi.
>
> Looking back at the majority of tracebacks I have received so far, all
> of them are 'IOError: client connection closed' and all of them are
> for the URL used for posting uploads.
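As a side note, if the tracebacks are all from aborted uploads, one way
to keep them out of the logs is to catch the IOError around the input
read. A minimal sketch (the handler name and responses are hypothetical,
not anything from the application discussed here):

```python
def upload_app(environ, start_response):
    """Hypothetical WSGI upload handler tolerant of aborted uploads."""
    try:
        length = int(environ.get('CONTENT_LENGTH') or 0)
        # mod_wsgi raises IOError ('client connection closed') here if
        # the client aborts mid-upload.
        body = environ['wsgi.input'].read(length)
    except IOError:
        # Respond cleanly instead of letting a traceback propagate.
        start_response('400 Bad Request', [('Content-Type', 'text/plain')])
        return [b'upload aborted']
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'received %d bytes' % len(body)]
```

Whether you want to swallow the exception or log it at a lower severity
is a judgment call; the point is just that an aborted client is an
expected event, not an application bug.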
FWIW, bit of a discussion about this in:

  http://code.google.com/p/modwsgi/issues/detail?id=29

> There is the odd exception for
> URL's but I did some further digging and correlated them to the
> numerous "probes" that we are getting.

Probes can be a problem, especially POST requests against arbitrary
URLs. For embedded mode it isn't a problem, but it can be an issue if
using daemon mode.

The reason there can be a problem is that the UNIX sockets on some
platforms have quite small input/output buffer sizes. For example, on
MacOS X it is 8 kilobytes.

With the way that daemon mode works at the moment, all request content
is pushed across to the daemon process handling the request before any
attempt is made by the Apache child worker process doing the proxying
to read any response. This means that if a probe sends request content
greater than the UNIX socket buffer size, and the handler for the URL
doesn't consume the request content and simply sends a response, and
that response is itself greater than the UNIX socket buffer size, then
you get a deadlock. That is, the Apache child worker is stuck because
it can't send all the request content, and similarly the daemon process
can't send all of the response, as the child worker is stuck as well
and will not read it.

There are a few things in place to contend with this. First, there are
the receive-buffer-size and send-buffer-size options to
WSGIDaemonProcess, to allow the UNIX socket buffer size to be increased
on those platforms which have stupidly small values. Second, mod_wsgi
will detect a deadlock on the UNIX socket through a timeout and will
thus abort the handling of the request. Finally, mod_wsgi will honour
the Apache LimitRequestBody directive, and in particular, even when
daemon mode is being used, will evaluate whether the condition has
failed in the Apache child worker process. This means that an HTTP
error response indicating the request content is too large can be
returned even before content is passed across to the daemon process.
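Putting those mitigations together, a configuration along these lines
might be used (a sketch only; the process name and all values are
illustrative, not recommendations):

```apache
<VirtualHost *>

    # Enlarge the UNIX socket buffers on platforms with small defaults
    # (e.g. 8KB on MacOS X) to reduce the chance of the proxy deadlock.
    WSGIDaemonProcess main processes=2 threads=10 \
        receive-buffer-size=65536 send-buffer-size=65536

    WSGIProcessGroup main
    WSGIScriptAlias / /some/path/app.wsgi

    # Reject oversized request bodies in the Apache child worker,
    # before any content is proxied across to the daemon process.
    LimitRequestBody 1048576

</VirtualHost>
```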
Frankly, the LimitRequestBody directive, and other Apache directives
such as Limit and LimitExcept, do not get used in the way they probably
should by people hosting Python applications. The attitude of Python
developers is that the Python application should handle everything and
Apache should just be the tunnel for getting data to it. Thus features
like those in Apache get ignored, even when such checks are possibly
more appropriately handled in Apache.

If you really wanted to provide maximum protection for your
application, you would use LimitRequestBody to block requests with
request content against URLs that shouldn't get it. You probably should
block request methods as well, except where they are reasonable.

BTW, the deadlock on the UNIX socket is also a problem with mod_cgid
and some other Apache modules that use proxying to a backend process.
It is usually worse for UNIX sockets, but technically could also be an
issue where INET sockets are used; because buffer sizes are much larger
in that case, it is much harder for it to be inadvertently triggered.

We have had discussions in the past about changing how mod_wsgi handles
communication across the UNIX socket in daemon mode to avoid this
issue, plus give complete end-to-end 100-continue functionality, but
just haven't had time this year to look at it yet.

>>>> Maybe you want to better explain the problem you are seeing in case it
>>>> is something completely different. One concern would be that if it is
>>>> a large file upload that takes more than 5 seconds, and if you are
>>>> using daemon mode and recycling processes based on number of requests,
>>>> that upload is being interrupted when the process is recycled, as by
>>>> default it will only allow at most 5 seconds for the process to
>>>> restart before killing it. In other words, it is not a graceful
>>>> restart where the process persists until all active requests complete.
>>>
>>> We are using daemon mode. It's interesting that you brought that up.
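To illustrate the sort of per-URL protection being suggested (a sketch
only; the upload path and limits are hypothetical, and the access
control syntax is the Apache 2.0/2.2 form):

```apache
# Only allow POST, with a bounded body, on the upload URL ...
<Location /my/url/for/uploads>
    LimitRequestBody 10485760
    <LimitExcept POST>
        Order deny,allow
        Deny from all
    </LimitExcept>
</Location>

# ... and refuse any significant request body everywhere else, so
# probe POSTs against arbitrary URLs are rejected by Apache itself.
<Location />
    LimitRequestBody 1024
</Location>
```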
>>> It had never occurred to me that the daemon could timeout in the
>>> middle of a POST request; in any case I would not expect it to timeout
>>> in the middle of *any* request. Like I said above, it happens more
>>> often on uploads. I suspect that these users are using dial-up, a lot
>>> of the IPs are from the Cox address block. What are the defaults for
>>> the *-timeout options if you do not supply them?
>>
>> The timeouts only come into play when you use the 'maximum-requests'
>> option to WSGIDaemonProcess. It is defined by 'shutdown-timeout' and
>> if not defined defaults to 5 seconds. What occurs is that when maximum
>> requests is reached, the process starts an orderly shutdown sequence
>> of not accepting new requests, allowing running requests to complete
>> and then triggering atexit registered callbacks and destroying Python
>> interpreter instances. If however the active requests do not complete
>> within that shutdown timeout period, then the process is killed off
>> without the cleanup occurring.
>
> Our config does not have maximum-requests specified. You do bring up a
> very good way of dealing with scaling issues that we are anticipating.

I am not sure what part of what I said is relevant to 'scaling'. In
general, when scaling is the topic, I wouldn't be using daemon mode.
This is because when you use embedded mode you can benefit from
Apache's ability to scale up by creating more Apache child processes to
handle requests and then kill them off when no longer required. In
daemon mode the number of daemon processes is fixed, so you have to
configure the number of daemon processes and threads in line with what
the maximum concurrent number of requests might realistically be.

>>>> Thus, long running file uploads are perhaps better handled in embedded
>>>> mode, or delegate the URLs for file upload to a special daemon process
>>>> that doesn't recycle, so as to avoid the problem of uploads being
>>>> interrupted.
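The interaction of the two options described above could be sketched as
(values illustrative only, not recommendations):

```apache
# Recycle each daemon process after 1000 requests, but give in-flight
# requests up to 30 seconds (instead of the 5 second default) to
# complete before the process is killed without cleanup. Slow dial-up
# uploads still in progress past that point would be interrupted.
WSGIDaemonProcess main processes=3 threads=10 \
    maximum-requests=1000 shutdown-timeout=30
```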
>>>
>>> That would require us to have a subdomain just for this purpose. I'm
>>> predicting this will cause the Django authentication system to fail
>>> somehow. But I will look into it and see if it's feasible.
>>
>> No. You start on the right track in later messages, but will answer here.
>>
>>   <VirtualHost *>
>>
>>   WSGIDaemonProcess default processes=5 threads=10 maximum-requests=5000
>>   WSGIDaemonProcess uploads processes=1 threads=10
>>
>>   WSGIProcessGroup default
>>
>>   WSGIScriptAlias / /some/path/app.wsgi
>>
>>   <Location /my/url/for/uploads>
>>   WSGIProcessGroup uploads
>>   </Location>
>>
>>   ...
>>
>>   </VirtualHost>
>>
>> In other words, using the <Location> directive with WSGIProcessGroup
>> within it, you can distribute URLs across multiple daemon process
>> groups with different attributes such as number of processes, maximum
>> requests etc. The WSGIProcessGroup inside the <Location> directive
>> will override that defined at the scope of the <VirtualHost>, but only
>> for the matched URL prefix.
>
> Excellent! We will probably be using a separate subdomain for handling
> file uploads. This will allow us to scale horizontally as well as
> vertically when the need arises. Good to know there is an option if
> you are unable to use subdomains.

One thing I should point out is that you aren't restricted to breaking
up the application to run across multiple daemon process groups. You
could also partly run it in embedded mode. For example:

  <VirtualHost *>

  WSGIDaemonProcess default processes=5 threads=10 maximum-requests=5000
  WSGIDaemonProcess uploads processes=1 threads=10

  WSGIProcessGroup default

  WSGIScriptAlias / /some/path/app.wsgi

  <Location /my/url/for/uploads>
  WSGIProcessGroup uploads
  </Location>

  <Location /my/most/frequently/accessed/urls>
  WSGIProcessGroup %{GLOBAL}
  </Location>

  ...

  </VirtualHost>

What I have done here is delegate a specific subset of URLs to be
hosted by the application, but in embedded mode. That is, in the Apache
child worker processes.
Thus, for your most frequently handled URLs, especially if it is a
front page which actually has small run time overhead and a very small
memory requirement, you could have them handled in the Apache child
worker processes and skip proxying through to a daemon process. Being
in the Apache child worker processes, which can be scaled up to meet
demand, if you get a burst in traffic for the most popular pages,
Apache will create the additional processes automatically for you and
you will not bog down the daemon processes handling the normal requests
that do the actual work.

One could possibly even try and Slashdot-protect your application by
looking at the referrer, and if it comes from Slashdot, handle the
request in embedded mode to benefit from the scaling abilities of
Apache.

Another interesting thing you can do is identify the memory hogging
URLs. You might by default run your application in embedded mode, but
identify which URLs are the ones which cause the size of your processes
to balloon out. Delegate these URLs to a single daemon process, or a
small number of them. That way the memory used by those URLs is
constrained by how many daemon processes you allow. In the meantime,
the bulk of your application still runs in the more scalable embedded
mode configuration, but because you have offloaded the memory hogging
URLs, you don't suffer as readily the problem of all your system memory
running out if Apache has to create lots of extra processes to handle
demand, ie., you don't have every process holding a full copy of your
fat application.

One final thing. WSGIDaemonProcess isn't constrained to being defined
inside of a VirtualHost; it can actually be defined outside of one.
When defined outside, you can delegate applications running in
different virtual hosts to run in the same daemon process group.
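That inverse arrangement, embedded by default with memory hogging URLs
pushed out to a daemon process, might look like this (a sketch only;
the /reports URL is hypothetical):

```apache
<VirtualHost *>

  # No WSGIProcessGroup at this scope, so the application runs in
  # embedded mode (Apache child worker processes) by default.
  WSGIScriptAlias / /some/path/app.wsgi

  # Confine the memory hungry URLs to a single daemon process so their
  # memory use stays bounded, however many child workers Apache creates.
  WSGIDaemonProcess hogs processes=1 threads=5

  <Location /reports>
  WSGIProcessGroup hogs
  </Location>

</VirtualHost>
```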
By default these would still be separated by virtue of being run in
different interpreters within the process, but if an application
supports the concept of receiving requests for multiple virtual hosts,
they could be made to run in the same interpreter.

An example of where this might be used is with Zope (or the newer WSGI
capable Grok version). This is because Zope has what they call the
virtual host monster. This means a single Zope instance can handle
multiple virtual hosts. Thus, rather than a separate Zope instance for
each virtual host, you could direct requests against multiple virtual
hosts to the same Zope instance. In general one would say (ignoring
that Zope requires other steps to enable the virtual host monster):

  WSGIDaemonProcess shared ...

  <VirtualHost *>
  ServerName www.one.com
  WSGIProcessGroup shared
  WSGIApplicationGroup %{GLOBAL}
  WSGIScriptAlias / /some/path/app.wsgi
  </VirtualHost>

  <VirtualHost *>
  ServerName www.two.com
  WSGIProcessGroup shared
  WSGIApplicationGroup %{GLOBAL}
  WSGIScriptAlias / /some/path/app.wsgi
  </VirtualHost>

Hope this gives you more interesting ideas. :-)

Graham

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google
Groups "modwsgi" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at
http://groups.google.com/group/modwsgi?hl=en
-~----------~----~----~----~------~----~------~--~---
