[modwsgi] Re: WSGI application files affecting each other

Graham Dumpleton Tue, 21 Jul 2009 05:24:54 -0700

2009/7/21 Malcolm Lalkaka <[email protected]>:
>
> On Tue, 2009-07-21 at 16:12 +1000, Graham Dumpleton wrote:
>> 2009/7/21 Malcolm Lalkaka <[email protected]>:
>> >
>> > On Mon, Jul 20, 2009 at 1:20 AM, Graham
>> > Dumpleton<[email protected]> wrote:
>> >>
>> >> 2009/7/20 Graham Dumpleton <[email protected]>:
>> >>> 2009/7/20 Graham Dumpleton <[email protected]>:
>> >>>> 2009/7/20 Malcolm <[email protected]>:
>> >>>>>
>> >>>>> Hello,
>> >>>>>
>> >>>>> I am using mod_wsgi 2.3 with Apache 2.2.11 on Ubuntu 9.04.
>> >>>>>
>> >>>>> I seem to be having problems where the code I put in one of my WSGI
>> >>>>> application files, (django.wsgi) is affecting the (sub) interpreters
>> >>>>> of other WSGI applications.
>> >>>>>
>> >>>>> Here is the relevant part of the django.wsgi file:
>> >>>>> ----------
>> >>>>> ...
>> >>>>> import warnings
>> >>>>> warnings.filterwarnings(action="ignore",
>> >>>>>    message="^the sets module is deprecated$",
>> >>>>> category=DeprecationWarning,
>> >>>>>    module="MySQLdb", lineno=34)
>> >>>>>
>> >>>>> # Determine the absolute path of the Django project directory that
>> >>>>> # contains this project.
>> >>>>> ...
>> >>>>>
>> >>>>> # If the Django project directory is not in the Python path, add it.
>> >>>>> ...
>> >>>>>
>> >>>>> os.environ['DJANGO_SETTINGS_MODULE'] = DJANGO_PROJ + '.settings'
>> >>>>>
>> >>>>> import django.core.handlers.wsgi
>> >>>>> application = django.core.handlers.wsgi.WSGIHandler()
>> >>>>> ----------
>> >>>>>
>> >>>>> In the above WSGI application file, I suppress a warning emitted by a
>> >>>>> particular module. I expected this warning to be suppressed only for
>> >>>>> this one WSGI application, but this is not the case. Say I go to /
>> >>>>> site1 where the WSGI application file suppresses the warning, then
>> >>>>> sure enough, there will be no warning in the Apache error log.
>> >>>>> However, if after visiting that WSGI application, I now go to /site2
>> >>>>> (another WSGI application that doesn't have that warning suppressed),
>> >>>>> the warning does not show up.
>> >>>>>
>> >>>>> Yet, if I restart Apache and visit /site2 first, the warning will
>> >>>>> appear in the log.
>> >>>>>
>> >>>>> I am running mod_wsgi in daemon mode with multiple WSGI applications.
>> >>>>> All the applications are running within the same process group;
>> >>>>> however, I can change this if I need to. I am using the Apache prefork
>> >>>>> MPM, and I have only one virtual host.
>> >>>>>
>> >>>>> I know this problem seems small and insignificant, since it is just
>> >>>>> affecting spam output to the Apache error log. However, it signals a
>> >>>>> larger problem for me: it means that my WSGI applications are somehow
>> >>>>> sharing state, which I don't want.
>> >>>>
>> >>>> The separation between sub interpreters isn't always perfect. If a C
>> >>>> extension module used in implementing a Python module isn't
>> >>>> implemented correctly, so as to separate data for different sub
>> >>>> interpreters properly, you can have issues.
>> >>>>
>> >>>> In this case though, we are talking about a core Python module and for
>> >>>> it I suspect it is operating on the Python core and so all
>> >>>> interpreters within the process and not just the one the module was
>> >>>> used from are affected. A quick look at the code shows:
>> >>>>
>> >>>> try:
>> >>>>    from _warnings import (filters, default_action, once_registry,
>> >>>>                            warn, warn_explicit)
>> >>>>    defaultaction = default_action
>> >>>>    onceregistry = once_registry
>> >>>>    _warnings_defaults = True
>> >>>> except ImportError:
>> >>>>    filters = []
>> >>>>    defaultaction = "default"
>> >>>>    onceregistry = {}
>> >>>>
>> >>>> So, what it does is try and import 'filters' from C extension module
>> >>>> _warnings. That value is a list and is actually a reference to a
>> >>>> global static C variable. As such, the same list will be imported into
>> >>>> all sub interpreters and changes made in one sub interpreter will
>> >>>> change what happens in other sub interpreters.
>> >>>>
>> >>>> This sharing of data between sub interpreters is actually usually not
>> >>>> a good thing to do and so am surprised to see this. I will have to do
>> >>>> a bit more research on this and post to the Python list asking about
>> >>>> why it is this way since it doesn't provide proper isolation for sub
>> >>>> interpreters.
>> >>>>
>> >>>> BTW, in mod_wsgi 3.0, you can use the WSGIPythonWarnings directive to
>> >>>> control warnings from configuration file.
>> >>>>
>> >>>>  WSGIPythonWarnings ignore::DeprecationWarning::
>> >>>>
>> >>>> This is done when Python is first initialised and affects all sub
>> >>>> interpreters. But then, as you have demonstrated, there is no
>> >>>> separation where control of warnings is concerned.
>> >>>>
>> >>>> Anyway, for proper separation, looks like you will need to delegate
>> >>>> each WSGI application to a different daemon process group.
>> >>>>
>> >>>> Thanks for raising this issue as I didn't know about it.
>> >>>
>> >>> The other thing you may be able to do is:
>> >>>
>> >>>  import warnings
>> >>>  warnings.filters = list(warnings.filters)
>> >>>  warnings.onceregistry = list(warnings.onceregistry)
>> >>
>> >> Whoops.
>> >>
>> >>  warnings.onceregistry = dict(warnings.onceregistry)
>> >
>> > Hi Graham,
>> >
>> > Thanks for the quick reply.
>> >
>> > I don't quite understand what the above three lines would do. Also,
>> > since it could cause some unexpected behaviour, maybe it's better to
>> > go with a separate daemon process per application. But I'm not
>> > quite sure how to do that. Here's my Apache configuration pertaining
>> > to WSGI:
>> > ----------
>> > WSGIDaemonProcess MainGroup
>> > WSGIProcessGroup MainGroup
>> >
>> > WSGIScriptAlias /vc /usr/local/lib/django-projects/vc/apache/django.wsgi
>> > <Directory /usr/local/lib/django-projects>
>> >    Order Deny,Allow
>> >    Allow from all
>> > </Directory>
>> > Alias /vc/media /usr/local/lib/django-projects/vc/media
>> >
>> > # Personal Django workspaces. Basically, this allows users on the
>> > # system to host multiple Django projects in ~/django-projects/, and
>> > # access them via http://<domain>/~<username>/<project_name>.
>> > WSGIScriptAliasMatch ^/~([^/]+)/([^/]+) /home/$1/django-projects/$2/apache/django.wsgi
>>
>> I am not entirely sure this is going to work as you might want. You
>> have to be a bit careful with WSGIScriptAliasMatch as it can adjust
>> SCRIPT_NAME, ie., the mount point of WSGI application as seen by the
>> application in ways you might not expect.
>>
>> If you use a simple hello world WSGI application and echo back
>> REQUEST_URI and SCRIPT_NAME from WSGI 'environ', what does it say?
>>
>> It may be better to use the AddHandler method.
>>
>> > <DirectoryMatch ^/home/([^/]+)/django-projects/([^/]+)/media>
>> >    Order Deny,Allow
>> >    Allow from all
>> > </DirectoryMatch>
>> > AliasMatch ^/~([^/]+)/([^/]+)/media/(.*) /home/$1/django-projects/$2/media/$3
>> > ----------
>> >
>> > All of the above is contained within one virtual host. So, if I
>> > understand the WSGI directives correctly, currently, all WSGI
>> > applications are running within the MainGroup process group. How can I
>> > specify that each application, even within the "personal Django
>> > workspaces", as I call them, should be in a unique process group? Is
>> > this even possible?
>>
>> Providing each user with their own daemon process is a bit of a manual
>> process at the moment unfortunately as you need to enumerate a
>> WSGIDaemonProcess for each user.
>>
>> I'll describe it later when I have a bit of time, but can you post the
>> information about what you get for REQUEST_URI and SCRIPT_NAME with
>> that current setup.
>
> Sure thing. Here's the sample WSGI application I created, based off the
> Hello World example at
> http://code.google.com/p/modwsgi/wiki/QuickConfigurationGuide .
>
> ----- ~mlalkaka/django-projects/wsgitest/apache/django.wsgi -----
> def application(environ, start_response):
>    status = '200 OK'
>    output = 'REQUEST_URI: %s\nSCRIPT_NAME: %s' % \
>        (environ['REQUEST_URI'], environ['SCRIPT_NAME'])
>
>    response_headers = [('Content-type', 'text/plain'),
>                        ('Content-length', str(len(output)))]
>    start_response(status, response_headers)
>
>    return [output]
> -----------------------------------------------------------------
>
> And here's the output when I request this application from the web
> server.
>
> ----- http://<domain>/~mlalkaka/wsgitest/ -----
> REQUEST_URI: /~mlalkaka/wsgitest/
> SCRIPT_NAME: /~mlalkaka/wsgitest


That is okay. I was getting my logic round the wrong way. The problem
case is where people were expecting SCRIPT_NAME to reflect that the WSGI
application was mounted at the root of the web site. In your case you
don't want that, so all is okay.
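
For anyone repeating this check on a newer Python, here is a minimal
sketch of the same echo application. It assumes Python 3, where the
response body must be bytes, and it also reports PATH_INFO, which is not
in the original test. REQUEST_URI is supplied by Apache and is not a
required WSGI key, so the sketch reads it with a default.

```python
def application(environ, start_response):
    # Echo the values that show where the application is mounted.
    # REQUEST_URI comes from Apache; SCRIPT_NAME and PATH_INFO are WSGI keys.
    lines = [
        'REQUEST_URI: %s' % environ.get('REQUEST_URI', ''),
        'SCRIPT_NAME: %s' % environ.get('SCRIPT_NAME', ''),
        'PATH_INFO: %s' % environ.get('PATH_INFO', ''),
    ]
    # Python 3 WSGI servers expect an iterable of bytes, not str.
    output = '\n'.join(lines).encode('utf-8')

    response_headers = [('Content-Type', 'text/plain; charset=utf-8'),
                        ('Content-Length', str(len(output)))]
    start_response('200 OK', response_headers)

    return [output]
```

With the per-user mount shown above, SCRIPT_NAME should come back as the
'/~<username>/<project_name>' prefix and PATH_INFO as whatever follows it.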

> -----------------------------------------------
>
> I've been running this configuration with 4 users for about 3 months.
> Each user is hosting 1-2 WSGI applications in their home directories.
> Additionally, there is one WSGI application that is outside everyone's
> home directory. So I'm usually hosting about 5-6 WSGI applications like
> this. Furthermore, since we're working as a team on the same project,
> most of the WSGI applications have the same Python module name.
>
> So far, none of this has caused a [noticeable] problem (:D), until I
> noticed the warnings issue. Plus, the documentation I found on
> http://code.google.com/p/modwsgi/wiki/ProcessesAndThreading is
> promising:
>        "In other words, a change to the global data within the context
>        of one sub interpreter will not be seen from the sub interpreter
>        corresponding to a different WSGI application. This will be the
>        case whether or not the sub interpreters are in the same
>        process."
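
That holds for ordinary module-level data, but not for the 'filters'
list discussed earlier in this thread, which every sub interpreter
imports from the same C static variable. Below is a minimal sketch of
that aliasing in plain Python. FakeInterpreter and shared_filters are
hypothetical stand-ins, not mod_wsgi or CPython APIs, and the model does
not show whether rebinding 'warnings.filters' is sufficient inside a real
sub interpreter.

```python
# Stand-in for the single list object held in a C static variable.
shared_filters = []


class FakeInterpreter:
    """Hypothetical model of one sub interpreter's view of 'warnings'."""

    def __init__(self, filters):
        # Every interpreter starts with a reference to the same list.
        self.filters = filters

    def filterwarnings(self, action, category):
        # New filters are inserted at the front of whatever list is bound.
        self.filters.insert(0, (action, category))


site1 = FakeInterpreter(shared_filters)
site2 = FakeInterpreter(shared_filters)

# A filter added from site1 is visible from site2: same list object.
site1.filterwarnings("ignore", DeprecationWarning)
leaked = list(site2.filters)

# Rebinding site2's attribute to a copy breaks the aliasing from then on.
site2.filters = list(site2.filters)
site1.filterwarnings("ignore", UserWarning)
isolated = list(site2.filters)
```

After the copy, site2 keeps the filter it had already inherited but no
longer sees later changes made through site1.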

Since you are a small group of people all working together, what I
would suggest is the following.

# Block anyone from accidentally running requests in embedded mode.

WSGIRestrictEmbedded On

# Create a daemon process for main Django instance. This will run as
# the Apache user.

WSGIDaemonProcess main display-name=%{GROUP}

# Map the main application.

WSGIScriptAlias /vc /usr/local/lib/django-projects/vc/apache/django.wsgi

# Map media for main project.

Alias /vc/media /usr/local/lib/django-projects/vc/media

# Setup permissions for this project and the process group.

<Directory /usr/local/lib/django-projects/vc/apache>

   # Delegate main application to main daemon process group.

   WSGIProcessGroup main

   Order Deny,Allow
   Allow from all
</Directory>

# Personal Django workspaces. Basically, this allows users on the
# system to host multiple Django projects in ~/django-projects/, and
# access them via http://<domain>/~<username>/<project_name>.

# Map the .wsgi script files. Set up permissions for this later.

WSGIScriptAliasMatch ^/~([^/]+)/([^/]+) /home/$1/django-projects/$2/apache/django.wsgi

# Map the media files for the projects and set permissions.

AliasMatch ^/~([^/]+)/([^/]+)/media/(.*) /home/$1/django-projects/$2/media/$3

<DirectoryMatch ^/home/([^/]+)/django-projects/([^/]+)/media>
   Order Deny,Allow
   Allow from all
</DirectoryMatch>

# Now create a few daemon process groups for each user who can
# use this. These run as the user. If testing with different configurations
# is important, you might set processes/threads differently for each. Thus
# you could have single process/multithreaded, multiprocess/multithreaded
# and multiprocess/single threaded. Note that the default is a single
# process, but don't set 'processes=1', as it subtly means something
# different: any use of the 'processes' option sets wsgi.multiprocess to
# True, even if set to 1 process explicitly. One example is given here
# for username1.

WSGIDaemonProcess username1/1 display-name=%{GROUP} user=username1 group=username1 threads=15
WSGIDaemonProcess username1/2 display-name=%{GROUP} user=username1 group=username1 processes=2 threads=15
WSGIDaemonProcess username1/3 display-name=%{GROUP} user=username1 group=username1 processes=2 threads=1

WSGIDaemonProcess username2/1 display-name=%{GROUP} user=username2 group=username2
WSGIDaemonProcess username2/2 display-name=%{GROUP} user=username2 group=username2
WSGIDaemonProcess username2/3 display-name=%{GROUP} user=username2 group=username2

WSGIDaemonProcess username3/1 display-name=%{GROUP} user=username3 group=username3
WSGIDaemonProcess username3/2 display-name=%{GROUP} user=username3 group=username3
WSGIDaemonProcess username3/3 display-name=%{GROUP} user=username3 group=username3

WSGIDaemonProcess username4/1 display-name=%{GROUP} user=username4 group=username4
WSGIDaemonProcess username4/2 display-name=%{GROUP} user=username4 group=username4
WSGIDaemonProcess username4/3 display-name=%{GROUP} user=username4 group=username4

<DirectoryMatch ^/home/username1/django-projects/([^/]+)/apache>
# Restrict this user to just their processes.
WSGIRestrictProcess username1/1 username1/2 username1/3

# Allow overrides to be done for process group and application group.
WSGIProcessGroup %{ENV:PROCESS_GROUP}
WSGIApplicationGroup %{ENV:APPLICATION_GROUP}

# Set defaults for process group and application group.
SetEnv PROCESS_GROUP username1/1
SetEnv APPLICATION_GROUP %{RESOURCE}

# Allow process group and application group to be overridden in .htaccess
# file in 'apache' directory of Django project.
AllowOverride FileInfo
</DirectoryMatch>

<DirectoryMatch ^/home/username2/django-projects/([^/]+)/apache>
# Restrict this user to just their processes.
WSGIRestrictProcess username2/1 username2/2 username2/3

# Allow overrides to be done for process group and application group.
WSGIProcessGroup %{ENV:PROCESS_GROUP}
WSGIApplicationGroup %{ENV:APPLICATION_GROUP}

# Set defaults for process group and application group.
SetEnv PROCESS_GROUP username2/1
SetEnv APPLICATION_GROUP %{RESOURCE}

# Allow process group and application group to be overridden in .htaccess
# file in 'apache' directory of Django project.
AllowOverride FileInfo
</DirectoryMatch>

... Keep going for other users.
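
Since the per-user blocks are repetitive, they could be generated rather
than typed by hand. A minimal sketch follows, assuming each user's Unix
group has the same name as the user and that every user gets the same
three process/thread variants as username1. The user_config function and
VARIANTS tuple are hypothetical helpers, not part of mod_wsgi. The
directives are emitted unindented and without comments, and display-name
uses %{GROUP}, as for the main process group.

```python
# Process/thread options for the three daemon process groups per user.
VARIANTS = ("threads=15", "processes=2 threads=15", "processes=2 threads=1")


def user_config(username, variants=VARIANTS):
    """Return the Apache configuration text for one user (hypothetical helper)."""
    # Process group names follow the username/N pattern used above.
    names = ["%s/%d" % (username, i) for i in range(1, len(variants) + 1)]

    lines = []
    for name, options in zip(names, variants):
        lines.append(
            f"WSGIDaemonProcess {name} display-name=%{{GROUP}} "
            f"user={username} group={username} {options}".rstrip()
        )

    lines += [
        "",
        f"<DirectoryMatch ^/home/{username}/django-projects/([^/]+)/apache>",
        f"WSGIRestrictProcess {' '.join(names)}",
        "WSGIProcessGroup %{ENV:PROCESS_GROUP}",
        "WSGIApplicationGroup %{ENV:APPLICATION_GROUP}",
        f"SetEnv PROCESS_GROUP {names[0]}",
        "SetEnv APPLICATION_GROUP %{RESOURCE}",
        "AllowOverride FileInfo",
        "</DirectoryMatch>",
    ]
    return "\n".join(lines)


# Example: build the blocks for two users and join them into one fragment.
config_text = "\n\n".join(user_config(u) for u in ["username1", "username2"])
```

The resulting text could be written to a file and pulled into the main
configuration with Apache's Include directive.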

What will this do? It means each user gets their own process groups,
which run as them.

By default, all their applications will run in the first of their own
process groups.

They can, however, control which of their process groups an application
is delegated to by adding a .htaccess file to the 'apache' directory
alongside the .wsgi file. This is done by setting in the .htaccess
file:

SetEnv PROCESS_GROUP username1/2

They can also, if need be because of issues with third party modules,
delegate an application to run specifically in the main interpreter of
the process. This would be done by having in the .htaccess file for that
application:

SetEnv APPLICATION_GROUP %{GLOBAL}

The WSGIRestrictEmbedded directive is to disable running stuff in
embedded mode by accident.

The WSGIRestrictProcess directive is so that they can only delegate
their applications to their own processes and thus not be able to run
their code as a different user.

Okay, it looks complicated, but that will give you the most
flexibility, including users being able to control for themselves which
daemon process group and application group an application runs in,
without needing to restart Apache. Because the daemon processes are
owned by the users themselves, they can simply send them a SIGINT with
'kill' if they want to restart them to drop an application because they
changed where it was running. The 'display-name' option to the
WSGIDaemonProcess directive allows the 'ps' command to be easily used to
identify which processes are their own and which daemon process group
they belong to.

$ ps -x -o user,pid,command | grep wsgi
grahamd 58337 (wsgi:django)   -D FOREGROUND
grahamd 58338 (wsgi:django)   -D FOREGROUND
grahamd 58339 (wsgi:django)   -D FOREGROUND
grahamd 58355 (wsgi:grahamd) -D FOREGROUND
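
If users want to script this, the '(wsgi:NAME)' label is easy to pick
out of the 'ps' output. Here is a minimal sketch that groups PIDs by user
and daemon process group. The daemon_pids function is a hypothetical
helper, and SAMPLE is simply the output shown above, assumed to be
representative of the 'user,pid,command' column layout.

```python
import re
from collections import defaultdict

# The 'ps' output shown above, used here as sample input.
SAMPLE = """\
grahamd 58337 (wsgi:django)   -D FOREGROUND
grahamd 58338 (wsgi:django)   -D FOREGROUND
grahamd 58339 (wsgi:django)   -D FOREGROUND
grahamd 58355 (wsgi:grahamd) -D FOREGROUND
"""

# user, pid, then the display name that mod_wsgi sets for the process.
PATTERN = re.compile(r'^(\S+)\s+(\d+)\s+\(wsgi:([^)]+)\)')


def daemon_pids(ps_output):
    """Map (user, process group) to a list of PIDs (hypothetical helper)."""
    groups = defaultdict(list)
    for line in ps_output.splitlines():
        match = PATTERN.match(line.strip())
        if match:
            user, pid, group = match.groups()
            groups[(user, group)].append(int(pid))
    return dict(groups)


pids = daemon_pids(SAMPLE)
```

Each PID found this way could then be passed to os.kill() with
signal.SIGINT by the user who owns the process.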

Have a look through that and a bit of a play and let me know of any
questions about it, especially if I stuffed up on details anywhere.
:-)

Graham
