On 08/10/2013, at 3:43 AM, Rodrigo Campos <[email protected]> wrote:

> On Fri, Oct 04, 2013 at 12:28:21PM -0700, Rob Newman wrote:
>> Hi mod_wsgi folks,
>> 
>> I am an avid user of mod_wsgi, and have been asked for my opinion on how to
>> best host a web-based Python-powered app that has bad memory management.
>> 
>> Basically the Python script uses R (http://www.r-project.org/) via the rpy2
>> (http://rpy.sourceforge.net/rpy2.html) package. The app developer says that R
>> is notorious for leaving "stuff" around in memory, so he is arguing for using
>> a new CGI process per request as a design feature, as it guarantees that each
>> new request starts with a clean slate.
>> 
>> I am unconvinced that this precludes using mod_wsgi, but I am interested in
>> the community opinion, and also how mod_wsgi would handle a script that has poor
> 
> Have you checked 
> https://code.google.com/p/modwsgi/wiki/ConfigurationDirectives#WSGIDaemonProcess
> the maximum-requests option in particular?
> 
> If you run modwsgi in daemon mode and set this maximum-requests option to
> something that works for you, then you are done. If you read the documentation
> it says:
> 
>    Defines a limit on the number of requests a daemon process should process
>    before it is shutdown and restarted. Setting this to a non zero value has
>    the benefit of limiting the amount of memory that a process can consume by
>    (accidental) memory leakage. 
> 
> 
> So, no, it does not prevent you from using mod_wsgi.
> 
> Keep in mind that if you expect high volumes of traffic, using this option is
> not recommended. Starting the Python interpreter takes time, and during a
> traffic peak the maximum-requests threshold will be hit more often, so the
> processes restart more frequently, which only makes the queue bigger...
> 
> But the same should be true for CGI.
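
To put rough numbers on that tradeoff, here is a back-of-envelope model in Python. It is only a sketch under stated assumptions: a constant leak per request, load spread evenly across the daemon processes, and a fixed interpreter startup cost. `estimate_recycling` and every input value below are hypothetical, not measurements of R or rpy2.

```python
from dataclasses import dataclass


@dataclass
class RecycleEstimate:
    peak_mb_per_process: float   # worst-case size just before a restart
    restarts_per_minute: float   # summed over all daemon processes
    startup_load: float          # startup seconds per wall-clock second, per process


def estimate_recycling(base_mb, leak_mb_per_request, max_requests,
                       requests_per_sec, processes, startup_sec):
    """Back-of-envelope model of maximum-requests; assumptions are in comments."""
    # Assumption: a constant leak per request, so a process grows linearly
    # until it has served max_requests and is then shut down and restarted.
    peak = base_mb + leak_mb_per_request * max_requests
    # Assumption: traffic is spread evenly over the daemon processes.
    per_process_rate = requests_per_sec / processes
    # Across all processes, one restart happens every max_requests requests.
    restarts_per_minute = 60 * requests_per_sec / max_requests
    # At 1.0 or more a process spends all its time starting up and the queue
    # only grows. This ignores requests that pile up during each restart.
    startup_load = startup_sec * per_process_rate / max_requests
    return RecycleEstimate(peak, restarts_per_minute, startup_load)


# Hypothetical inputs, not measurements of R or rpy2.
quiet = estimate_recycling(base_mb=50, leak_mb_per_request=2, max_requests=5,
                           requests_per_sec=0.5, processes=1, startup_sec=2)
busy = estimate_recycling(base_mb=50, leak_mb_per_request=2, max_requests=5,
                          requests_per_sec=10, processes=1, startup_sec=2)
print(quiet)
print(busy)
```

With these made-up numbers a process never grows past 60 MB, and at 0.5 requests per second it spends about a fifth of its time restarting. At 10 requests per second the startup load is 4, so the process cannot keep up and the queue grows. CGI behaves like max_requests=1, so it pays the startup cost on every request.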

Since it is unlikely that the Python module for R is used on every single URL,
you can vertically segment your URL namespace and delegate things such that
only the URLs that use R run in a daemon process group with a low
maximum-requests.

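# Default group for the bulk of the application: long-lived processes.
WSGIDaemonProcess main processes=3 threads=5
# Separate group for the URLs that use R via rpy2. It gets a single process
# (the default) that is restarted after every 5 requests, capping how much
# leaked memory can build up.
WSGIDaemonProcess memory-greedy threads=5 maximum-requests=5

# Delegate everything to "main" by default, and run it in the main Python
# interpreter rather than a sub-interpreter.
WSGIProcessGroup main
WSGIApplicationGroup %{GLOBAL}

# Override the process group for just the R-using part of the URL namespace.
<Location /suburl/that/uses/R>
WSGIProcessGroup memory-greedy
</Location>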

The URLs of a web application rarely all have the same requirements, yet people
commonly try to shove them all under one WSGI server configuration. That
configuration therefore often ends up being a compromise to support the worst
URL.

By vertically segmenting an application and delegating URLs to different
mod_wsgi daemon process groups, you can then tailor the configuration for each
group and make things run more efficiently and with less memory.
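
To confirm that the delegation works, a throwaway WSGI script served through the same configuration can report which daemon process group handled each request. A minimal sketch: it relies on mod_wsgi putting `mod_wsgi.process_group` in the WSGI environ (an empty string in embedded mode); the fallback text exists only so the function can also be exercised outside mod_wsgi.

```python
def application(environ, start_response):
    """Report which mod_wsgi daemon process group served this request."""
    # mod_wsgi sets this key; it is empty in embedded mode and absent
    # entirely when the function is called outside mod_wsgi (e.g. in a test).
    group = environ.get("mod_wsgi.process_group", "(not under mod_wsgi)")
    if group == "":
        group = "(embedded mode)"
    path = environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", "")
    body = ("%s -> %s\n" % (path, group)).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

With the configuration above, a URL under /suburl/that/uses/R should report memory-greedy and anything else should report main.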

>> memory management. Should I be telling the app developer to do some internal
>> garbage collection, so that this is not an issue that spirals up to the
> 
> What? I'm not sure I follow you there, but it seems you are mixing things. If
> the "stuff" is really not used, the garbage collector will free it. If garbage
> collection doesn't free it when it runs, forcing the garbage collection
> yourself won't help.

It all depends on what is causing the problem. If the extension module for R is
simply written poorly and doesn't give garbage collection hints properly, or if
R itself just uses a lot of memory, there is possibly not much you can do about
it.
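
Either way, it is worth measuring how quickly the R-using processes grow before choosing a maximum-requests value. A minimal sketch, assuming a Unix host (Python's `resource` module is not available on Windows); `PeakMemoryLogger` and `peak_rss_kib` are hypothetical helpers, not part of mod_wsgi. `ru_maxrss` is a high-water mark, so it only ever goes up.

```python
import os
import resource
import sys
import threading


def peak_rss_kib():
    """Peak resident set size of this process so far, in KiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    if sys.platform == "darwin":
        peak //= 1024
    return peak


class PeakMemoryLogger:
    """Hypothetical WSGI middleware: log peak RSS after every request."""

    def __init__(self, app, log=None):
        self.app = app
        # Default to stderr, which mod_wsgi sends to the Apache error log.
        self.log = log or (lambda msg: print(msg, file=sys.stderr))
        self.count = 0
        self.lock = threading.Lock()  # daemon processes may run several threads

    def __call__(self, environ, start_response):
        result = self.app(environ, start_response)
        try:
            # Buffer the body so all of the app's work is done before we
            # measure. Acceptable for a diagnostic; it defeats streaming.
            body = b"".join(result)
        finally:
            # WSGI requires close() to be called if the iterable has one.
            if hasattr(result, "close"):
                result.close()
        with self.lock:
            self.count += 1
            n = self.count
        self.log("pid %d request %d %s: peak RSS %d KiB"
                 % (os.getpid(), n, environ.get("PATH_INFO", ""), peak_rss_kib()))
        return [body]
```

In the WSGI script file you could wrap the existing callable with `application = PeakMemoryLogger(application)`. The increase in peak RSS per request, multiplied by maximum-requests, gives a rough estimate of how large a process can get before it is recycled.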

Graham

-- 
You received this message because you are subscribed to the Google Groups 
"modwsgi" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/modwsgi.
For more options, visit https://groups.google.com/groups/opt_out.
