On Oct 22, 3:44 am, Javier Guerra <jav...@guerrag.com> wrote:
> On Wed, Oct 21, 2009 at 9:49 AM, Michael Thon <mike.t...@gmail.com> wrote:
> > Thanks for pointing me towards celery.  It's probably overkill for what
> > I want to do right now but I'm going to try to set it up anyway.
>
> the roll-your-own alternative is just setting up a DB table with the
> queued tasks, and a cron job (or a long-running daemon) that fetches
> the next job from the table and works on it.  it's called a 'ghetto
> queue'.  it works, and for small setups can be much lighter; but for
> complex, high-speed, or availability-critical setups it can quickly
> become a nightmare to get right.
>
> note that if you write the cron job in Python, you can easily import
> Django's ORM to make it really easy to share data with the webapp
>
> AFAIK, the 'Queue' module you mention gets it mostly right, but it
> works only within a single Python interpreter.  If I'm not wrong, it
> can't mediate between the webapp and the background job, unless you
> modify either mod_wsgi or flup to spawn a thread for background
> processing... (Graham? what would it take to add that to mod_wsgi?)

Not sure why people think they can utter my name in some arbitrary
conversation and expect me to appear. :-)

Anyway, I am not sure I understand what you perceive as the problem.
There is no problem with spawning background threads in the context of
a web application running under mod_wsgi. This can easily be done as a
side effect of importing the main WSGI script file, or, if properly
thread protected so that duplicates aren't started, triggered by a
request handler.
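As a very rough sketch, purely illustrative and not from any real
application, the WSGI script file might do something like:

    import threading
    import time

    _worker_lock = threading.Lock()
    _worker_started = False

    def _process_pending_jobs():
        # Illustrative only: poll a jobs table, a queue, etc.
        while True:
            time.sleep(5)

    def start_background_worker():
        # Safe to call from multiple request threads; only the first
        # call actually starts the worker thread.
        global _worker_started
        with _worker_lock:
            if _worker_started:
                return
            thread = threading.Thread(target=_process_pending_jobs)
            thread.daemon = True
            thread.start()
            _worker_started = True

    # Started as a side effect of importing the WSGI script file; could
    # equally be deferred and called from a request handler instead.
    start_background_worker()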

The real problem is the lifetime of the process within the web server,
which depends on your configuration. This is why the suggestion is that
a separate daemon process, independent of the web server, be used, with
data about pending jobs communicated through the database.
Alternatively, the separate daemon process could expose an XML-RPC
interface and the web application could communicate with it via that.
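For the XML-RPC option, the standalone daemon could be as simple as
something like the following sketch. This uses the Python 2 standard
library module names of the time; the function name, port and job
format are made up for illustration only:

    # Standalone daemon, run outside of the web server.
    from SimpleXMLRPCServer import SimpleXMLRPCServer

    pending_jobs = []

    def queue_job(name, data):
        # Record the job; a real daemon would hand it to a worker
        # thread or store it somewhere persistent.
        pending_jobs.append((name, data))
        return len(pending_jobs)

    server = SimpleXMLRPCServer(('localhost', 8001), allow_none=True)
    server.register_function(queue_job)
    server.serve_forever()

The web application side would then just be something like:

    import xmlrpclib

    proxy = xmlrpclib.ServerProxy('http://localhost:8001/')
    proxy.queue_job('send_email', {'to': 'someone@example.com'})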

In both these cases, if using a daemon process separate from the web
server, you then need infrastructure such as supervisor to start it up
and keep it running. This is extra setup and configuration work.

Getting back to why you don't run it in the web server: in embedded
mode you obviously have multiple processes, so in which one does it
run? If you run it in the process the request originally arrived at,
and a future response depends on results cached only in memory, the
problem is that you can't guarantee that later requests go back to the
same process.

You can alleviate this using daemon mode of mod_wsgi, but that does
restrict you to a single process for the application. In both cases you
are at the mercy of the process being restarted: in embedded mode at
the whim of Apache, and in daemon mode whenever someone touches the
WSGI script file or similar. In both cases, if a maximum number of
requests is defined, the process is also restarted when that is
exceeded.

One middle ground, so long as you don't periodically restart Apache,
is to create a special mod_wsgi daemon mode process group consisting
of a single process. This daemon process wouldn't exist for the
purpose of handling requests, but purely to run your background job.

Because web application code normally wouldn't be loaded until the
first request arrives for it, you would need to use the
WSGIImportScript directive to preload a script file at process startup;
that script would start the background thread, which would then begin
pulling data from the database and processing it.
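In terms of Apache configuration that would be something along the
lines of the following, where the process group name and path are just
examples:

    WSGIDaemonProcess background processes=1 threads=1

    # Preload the script at process startup so the background thread
    # is started without waiting for any request to arrive.
    WSGIImportScript /srv/mysite/background.wsgi \
        process-group=background application-group=%{GLOBAL}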

Doing this means that for that process you are using Apache as the
supervisor, and so you at least avoid needing to install that
infrastructure separately.

Now, because it is still a web server process, the script which is
preloaded could itself be a variant of the normal WSGI script file,
including a definition of the application entry point. You could then
delegate part of the URL namespace of the overall application to this
single daemon mode process, thus allowing it to also handle HTTP
requests.

This restricted set of URLs could be those which allow one to monitor
the results of queued jobs, potentially aborting in-progress jobs or
changing their operation. The original URLs which triggered the jobs
could also have been delegated here in the first place.
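Again only as a sketch, delegating a subset of URLs, say '/jobs', to
that single daemon process might look something like:

    WSGIScriptAlias /jobs /srv/mysite/background.wsgi

    <Location /jobs>
    WSGIProcessGroup background
    WSGIApplicationGroup %{GLOBAL}
    </Location>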

It could also be a distinct WSGI application supporting an XML-RPC
interface, like that described before for the separate daemon process
outside of the web server, in this case just running as another daemon
mode process on the same web server. You might then want to block any
requests not coming from localhost, so it is only accessible to the
main application running in the same web server.
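A crude way of doing that in the WSGI script itself, rather than
through Apache access controls, would be a wrapper along these lines,
with 'xmlrpc_application' standing in for whatever your actual WSGI
callable is:

    def localhost_only(application):
        # Reject anything not originating from the local host.
        def wrapper(environ, start_response):
            if environ.get('REMOTE_ADDR') not in ('127.0.0.1', '::1'):
                start_response('403 Forbidden',
                               [('Content-Type', 'text/plain')])
                return ['Forbidden\n']
            return application(environ, start_response)
        return wrapper

    application = localhost_only(xmlrpc_application)

The same restriction could instead be applied with Apache's normal
access control directives on the Location for those URLs.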

Anyway, you could certainly do various odd things with mod_wsgi daemon
mode if you really wanted to.

Graham