2008/8/23 Nimrod A. Abing &lt;[EMAIL PROTECTED]&gt;:
>
> On Sat, Aug 23, 2008 at 1:33 PM, Graham Dumpleton
> <[EMAIL PROTECTED]> wrote:
>>
>> 2008/8/22 Nimrod A. Abing <[EMAIL PROTECTED]>:
>>>>>> Important changes are file descriptor leaks on a graceful restart in
>>>>>> both versions, and for version 2.2 possible truncation of data when
>>>>>> using wsgi.file_wrapper. An upgrade would be recommended, especially
>>>>>> due to the latter, if using wsgi.file_wrapper in a way or on a system
>>>>>> affected by the problem.
>>>>>
>>>>> Thanks for the fix for wsgi.file_wrapper. We had been experiencing
>>>>> problems lately with users uploading large files. All tracebacks seem
>>>>> to point to something in wsgi.file_wrapper. Unfortunately it is one of
>>>>> those "heisenbugs". Some random thing/file upload triggers it, it
>>>>> disappears when you try to debug it and it's impossible to recreate.
>>>>
>>>> You mean downloading, don't you? At least I take uploading to mean
>>>> putting a file on the server, not the other way around, so I wouldn't
>>>> expect wsgi.file_wrapper to be involved. I would also have expected
>>>> you to be using Apache 2.2 on UNIX, where static file downloads
>>>> wouldn't be an issue.
>>>
>>> Uploading usually. But I have reason to suspect that it's happening
>>> because users abort the upload.
>>
>> Which can cause an 'IOError: client connection closed' exception to be
>> raised when attempting to read input.
>>
>>> But we also get weird tracebacks from
>>> time to time; these involve URLs where nothing is POSTed. We are
>>> currently using Apache 2.0. We use lighttpd for static content.
>>
>> Am curious to see what these might be and whether they might be
>> originating in mod_wsgi.
>
> Looking back at the majority of tracebacks I have received so far, all
> of them are 'IOError: client connection closed' and all of them are
> for the URL used for posting uploads.
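As a side note, if the tracebacks are all from aborted uploads, one way
to keep them out of the logs is to catch the IOError around the input
read. A minimal sketch (the handler name and responses are hypothetical,
not anything from the application discussed here):

```python
def upload_app(environ, start_response):
    """Hypothetical WSGI upload handler tolerant of aborted uploads."""
    try:
        length = int(environ.get('CONTENT_LENGTH') or 0)
        # mod_wsgi raises IOError ('client connection closed') here if
        # the client aborts mid-upload.
        body = environ['wsgi.input'].read(length)
    except IOError:
        # Respond cleanly instead of letting a traceback propagate.
        start_response('400 Bad Request', [('Content-Type', 'text/plain')])
        return [b'upload aborted']
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'received %d bytes' % len(body)]
```

Whether you want to swallow the exception or log it at a lower severity
is a judgment call; the point is just that an aborted client is an
expected event, not an application bug.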
FWIW, bit of a discussion about this in:

  http://code.google.com/p/modwsgi/issues/detail?id=29

> There is the odd exception for
> URL's but I did some further digging and correlated them to the
> numerous "probes" that we are getting.

Probes can be a problem, especially POST requests against arbitrary
URLs. For embedded mode it isn't a problem, but it can be an issue if
using daemon mode.

The reason there can be a problem is that the UNIX sockets on some
platforms have quite small input/output buffer sizes. For example, on
MacOS X it is 8 kilobytes.

With the way that daemon mode works at the moment, all request content
is pushed across to the daemon process handling the request before any
attempt is made by the Apache child worker process doing the proxying
to read any response. This means that if a probe sends request content
greater than the UNIX socket buffer size, and the handler for the URL
doesn't consume the request content and simply sends a response, and
that response is itself greater than the UNIX socket buffer size, then
you get a deadlock. That is, the Apache child worker is stuck because
it can't send all the request content, and similarly the daemon process
can't send all of the response, as the child worker is stuck as well
and will not read it.

There are a few things in place to contend with this. First, there are
the receive-buffer-size and send-buffer-size options to
WSGIDaemonProcess, to allow the UNIX socket buffer size to be increased
on those platforms which have stupidly small values. Second, mod_wsgi
will detect a deadlock on the UNIX socket through a timeout and will
thus abort the handling of the request. Finally, mod_wsgi will honour
the Apache LimitRequestBody directive, and in particular, even when
daemon mode is being used, will evaluate whether the condition has
failed in the Apache child worker process. This means that an HTTP
error response indicating the request content is too large can be
returned even before content is passed across to the daemon process.
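Putting those mitigations together, a configuration along these lines
might be used (a sketch only; the process name and all values are
illustrative, not recommendations):

```apache
<VirtualHost *>

    # Enlarge the UNIX socket buffers on platforms with small defaults
    # (e.g. 8KB on MacOS X) to reduce the chance of the proxy deadlock.
    WSGIDaemonProcess main processes=2 threads=10 \
        receive-buffer-size=65536 send-buffer-size=65536

    WSGIProcessGroup main
    WSGIScriptAlias / /some/path/app.wsgi

    # Reject oversized request bodies in the Apache child worker,
    # before any content is proxied across to the daemon process.
    LimitRequestBody 1048576

</VirtualHost>
```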
Frankly, the LimitRequestBody directive, and other Apache directives
such as Limit and LimitExcept, do not get used in the way they probably
should by people hosting Python applications. The attitude of Python
developers is that the Python application should handle everything and
Apache should just be the tunnel for getting data to it. Thus features
like those in Apache get ignored, even when such checks are possibly
more appropriately handled in Apache.

If you really wanted to provide maximum protection for your
application, you would use LimitRequestBody to block requests with
request content against URLs that shouldn't get it. You probably should
block request methods as well, except where they are reasonable.

BTW, the deadlock on the UNIX socket is also a problem with mod_cgid
and some other Apache modules that use proxying to a backend process.
It is usually worse for UNIX sockets, but technically could also be an
issue where INET sockets are used; because buffer sizes are much larger
in that case, it is much harder for it to be inadvertently triggered.

We have had discussions in the past about changing how mod_wsgi handles
communication across the UNIX socket in daemon mode to avoid this
issue, plus give complete end-to-end 100-continue functionality, but
just haven't had time this year to look at it yet.

>>>> Maybe you want to better explain the problem you are seeing in case it
>>>> is something completely different. One concern would be that if it is
>>>> a large file upload that takes more than 5 seconds, and if you are
>>>> using daemon mode and recycling processes based on number of requests,
>>>> that upload is being interrupted when the process is recycled, as by
>>>> default it will only allow at most 5 seconds for the process to
>>>> restart before killing it. In other words, it is not a graceful
>>>> restart where the process persists until all active requests complete.
>>>
>>> We are using daemon mode. It's interesting that you brought that up.
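To illustrate the sort of per-URL protection being suggested (a sketch
only; the upload path and limits are hypothetical, and the access
control syntax is the Apache 2.0/2.2 form):

```apache
# Only allow POST, with a bounded body, on the upload URL ...
<Location /my/url/for/uploads>
    LimitRequestBody 10485760
    <LimitExcept POST>
        Order deny,allow
        Deny from all
    </LimitExcept>
</Location>

# ... and refuse any significant request body everywhere else, so
# probe POSTs against arbitrary URLs are rejected by Apache itself.
<Location />
    LimitRequestBody 1024
</Location>
```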
>>> It had never occurred to me that the daemon could timeout in the
>>> middle of a POST request; in any case I would not expect it to timeout
>>> in the middle of *any* request. Like I said above, it happens more
>>> often on uploads. I suspect that these users are using dial-up, a lot
>>> of the IPs are from the Cox address block. What are the defaults for
>>> the *-timeout options if you do not supply them?
>>
>> The timeouts only come into play when you use the 'maximum-requests'
>> option to WSGIDaemonProcess. It is defined by 'shutdown-timeout' and
>> if not defined defaults to 5 seconds. What occurs is that when maximum
>> requests is reached, the process starts an orderly shutdown sequence
>> of not accepting new requests, allowing running requests to complete
>> and then triggering atexit registered callbacks and destroying Python
>> interpreter instances. If however the active requests do not complete
>> within that shutdown timeout period, then the process is killed off
>> without the cleanup occurring.
>
> Our config does not have maximum-requests specified. You do bring up a
> very good way of dealing with scaling issues that we are anticipating.

I am not sure what part of what I said is relevant to 'scaling'. In
general, when scaling is the topic, I wouldn't be using daemon mode.
This is because when you use embedded mode you can benefit from
Apache's ability to scale up by creating more Apache child processes to
handle requests and then kill them off when no longer required. In
daemon mode the number of daemon processes is fixed, so you have to
configure the number of daemon processes and threads in line with what
the maximum concurrent number of requests might realistically be.

>>>> Thus, long running file uploads are perhaps better handled in embedded
>>>> mode, or delegate the URLs for file upload to a special daemon process
>>>> that doesn't recycle, so as to avoid the problem of uploads being
>>>> interrupted.
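The interaction of the two options described above could be sketched as
(values illustrative only, not recommendations):

```apache
# Recycle each daemon process after 1000 requests, but give in-flight
# requests up to 30 seconds (instead of the 5 second default) to
# complete before the process is killed without cleanup. Slow dial-up
# uploads still in progress past that point would be interrupted.
WSGIDaemonProcess main processes=3 threads=10 \
    maximum-requests=1000 shutdown-timeout=30
```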
>>>
>>> That would require us to have a subdomain just for this purpose. I'm
>>> predicting this will cause the Django authentication system to fail
>>> somehow. But I will look into it and see if it's feasible.
>>
>> No. You start on the right track in later messages, but will answer here.
>>
>>   <VirtualHost *>
>>
>>   WSGIDaemonProcess default processes=5 threads=10 maximum-requests=5000
>>   WSGIDaemonProcess uploads processes=1 threads=10
>>
>>   WSGIProcessGroup default
>>
>>   WSGIScriptAlias / /some/path/app.wsgi
>>
>>   <Location /my/url/for/uploads>
>>   WSGIProcessGroup uploads
>>   </Location>
>>
>>   ...
>>
>>   </VirtualHost>
>>
>> In other words, using the <Location> directive with WSGIProcessGroup
>> within it, you can distribute URLs across multiple daemon process
>> groups with different attributes such as number of processes, maximum
>> requests etc. The WSGIProcessGroup inside the <Location> directive
>> will override that defined at the scope of the <VirtualHost>, but only
>> for the matched URL prefix.
>
> Excellent! We will probably be using a separate subdomain for handling
> file uploads. This will allow us to scale horizontally as well as
> vertically when the need arises. Good to know there is an option if
> you are unable to use subdomains.

One thing I should point out is that you aren't restricted to breaking
up the application to run across multiple daemon process groups. You
could also partly run it in embedded mode. For example:

  <VirtualHost *>

  WSGIDaemonProcess default processes=5 threads=10 maximum-requests=5000
  WSGIDaemonProcess uploads processes=1 threads=10

  WSGIProcessGroup default

  WSGIScriptAlias / /some/path/app.wsgi

  <Location /my/url/for/uploads>
  WSGIProcessGroup uploads
  </Location>

  <Location /my/most/frequently/accessed/urls>
  WSGIProcessGroup %{GLOBAL}
  </Location>

  ...

  </VirtualHost>

What I have done here is delegate a specific subset of URLs to be
hosted by the application, but in embedded mode. That is, in the Apache
child worker processes.
Thus, for your most frequently handled URLs, especially if it is a
front page which actually has small run time overhead and a very small
memory requirement, you could have them handled in the Apache child
worker processes and skip proxying through to a daemon process. Being
in the Apache child worker processes, which can be scaled up to meet
demand, if you get a burst in traffic for the most popular pages,
Apache will create the additional processes automatically for you and
you will not bog down the daemon processes handling the normal requests
that do the actual work.

One could possibly even try and Slashdot-protect your application by
looking at the referrer, and if it comes from Slashdot, handle the
request in embedded mode to benefit from the scaling abilities of
Apache.

Another interesting thing you can do is identify the memory hogging
URLs. You might by default run your application in embedded mode, but
identify which URLs are the ones which cause the size of your processes
to balloon out. Delegate these URLs to a single daemon process, or a
small number of them. That way the memory used by those URLs is
constrained by how many daemon processes you allow. In the meantime,
the bulk of your application still runs in the more scalable embedded
mode configuration, but because you have offloaded the memory hogging
URLs, you don't suffer as readily the problem of all your system memory
running out if Apache has to create lots of extra processes to handle
demand, ie., you don't have every process holding a full copy of your
fat application.

One final thing. WSGIDaemonProcess isn't constrained to being defined
inside of a VirtualHost; it can actually be defined outside of one.
When defined outside, you can delegate applications running in
different virtual hosts to run in the same daemon process group.
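That inverse arrangement, embedded by default with memory hogging URLs
pushed out to a daemon process, might look like this (a sketch only;
the /reports URL is hypothetical):

```apache
<VirtualHost *>

  # No WSGIProcessGroup at this scope, so the application runs in
  # embedded mode (Apache child worker processes) by default.
  WSGIScriptAlias / /some/path/app.wsgi

  # Confine the memory hungry URLs to a single daemon process so their
  # memory use stays bounded, however many child workers Apache creates.
  WSGIDaemonProcess hogs processes=1 threads=5

  <Location /reports>
  WSGIProcessGroup hogs
  </Location>

</VirtualHost>
```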
By default these would still be separated by virtue of being run in
different interpreters within the process, but if an application
supports the concept of receiving requests for multiple virtual hosts,
they could be made to run in the same interpreter.

An example of where this might be used is with Zope (or the newer WSGI
capable Grok version). This is because Zope has what they call the
virtual host monster. This means a single Zope instance can handle
multiple virtual hosts. Thus, rather than a separate Zope instance for
each virtual host, you could direct requests against multiple virtual
hosts to the same Zope instance. In general one would say (ignoring
that Zope requires other steps to enable the virtual host monster):

  WSGIDaemonProcess shared ...

  <VirtualHost *>
  ServerName www.one.com
  WSGIProcessGroup shared
  WSGIApplicationGroup %{GLOBAL}
  WSGIScriptAlias / /some/path/app.wsgi
  </VirtualHost>

  <VirtualHost *>
  ServerName www.two.com
  WSGIProcessGroup shared
  WSGIApplicationGroup %{GLOBAL}
  WSGIScriptAlias / /some/path/app.wsgi
  </VirtualHost>

Hope this gives you more interesting ideas. :-)

Graham

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google
Groups "modwsgi" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at
http://groups.google.com/group/modwsgi?hl=en
-~----------~----~----~----~------~----~------~--~---
