Re: concurrency and threading question
On Oct 22, 3:44 am, Javier Guerra wrote:
> On Wed, Oct 21, 2009 at 9:49 AM, Michael Thon wrote:
> > Thanks for pointing me towards celery. It's probably overkill for what
> > I want to do right now but I'm going to try to set it up anyway.
>
> the roll-your-own alternative is just setting a DB table with the
> queued tasks, and a cron job (or a long-running daemon) that fetches
> the next job from the table to work on it. it's called 'Ghetto
> queues'. it works, and for small setups can be much lighter, but for
> complex, high-speed, or critical-availability ones it can quickly
> become a nightmare to set up right.
>
> note that if you write the cron job in Python, you can easily import
> Django's ORM to make it really easy to share data with the webapp
>
> AFAIK, the 'Queue' module you mention gets it mostly right, but works
> only on a single Python interpreter. If I'm not wrong, it can't
> mediate between the webapp and the background job, unless you modify
> either mod_wsgi or flup to spawn a thread for background
> processing (Graham? what would it take to add that to mod_wsgi?)

Not sure why people think they can utter my name in some arbitrary
conversation and expect me to appear. :-)

Anyway, I am not sure I understand what you perceive as the problem.
There is no problem with spawning background threads in the context of
a web application running under mod_wsgi. This can easily be done as a
side effect of importing the main WSGI script file, or, if properly
thread-protected to avoid duplicates being started, triggered by a
request handler.

The real problem is the lifetime of the process in the context of the
web server, which depends on your configuration. This is why the
suggestion is that a separate daemon process, independent of the web
server, be used, and that data about pending jobs be communicated via
the database. Alternatively, the separate daemon process could expose
an XML-RPC interface and the web application could communicate with it
via that.
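A minimal sketch of the pattern Graham describes: start the background thread as a side effect of importing the WSGI script file, with a lock guarding against duplicate starts if it is instead triggered from a request handler. All names here (`do_pending_jobs`, `POLL_SECONDS`, and so on) are hypothetical, not from mod_wsgi or this thread.

```python
# Background worker started at most once per process. The job-fetching
# logic is a placeholder; in practice it would poll the jobs table.
import threading
import time

POLL_SECONDS = 5.0
_worker_lock = threading.Lock()
_worker_started = False

def do_pending_jobs():
    pass  # placeholder: fetch queued jobs from the database and run them

def _worker_loop():
    while True:
        do_pending_jobs()
        time.sleep(POLL_SECONDS)

def start_worker_once():
    """Start the background thread, guarding against duplicates.

    Returns True if this call started the thread, False if it was
    already running (e.g. another request handler got there first).
    """
    global _worker_started
    with _worker_lock:
        if _worker_started:
            return False
        threading.Thread(target=_worker_loop, daemon=True).start()
        _worker_started = True
        return True

# Calling start_worker_once() at module level makes the thread start as
# a side effect of the WSGI script file being imported.
```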
In both of these cases, if using a daemon process separate from the web
server, you then need infrastructure such as supervisor to start it up
and keep it running. That is extra setup and configuration work.

Getting back to why you don't run it in the web server: for embedded
mode you obviously have multiple processes, so in which one does it
run? If you run it in the one where the request originally arrived, and
a future response depends on results cached in memory only, the problem
is that you can't guarantee that subsequent requests go back to the
same process. You can alleviate this using daemon mode of mod_wsgi, but
that does restrict you to a single process for the application. In both
cases you are at the mercy of the process being restarted: for embedded
mode at the whim of Apache, and for daemon mode whenever someone
touches the WSGI script file or similar. In both cases, also whenever a
defined maximum number of requests is exceeded.

One middle ground, so long as you don't periodically restart Apache, is
to create a special mod_wsgi daemon mode process group consisting of a
single process. This daemon process wouldn't exist for the purpose of
handling requests, but purely to run your background job. Because
normally web application code wouldn't be loaded until the first
request arrives for it, you would need to use the WSGIImportScript
directive to preload a script file at process startup to initiate the
background thread and start pulling pending jobs from the database and
processing them. Doing this means that for that process you are using
Apache as a supervisor, and so at least avoid needing to install that
infrastructure separately.

Now, because it is still a web server process, the script which is
preloaded could itself be a variant of the normal WSGI script file,
including a definition of the application entry point. You could then
delegate part of the URL namespace of the overall application to this
single daemon mode process, thus allowing it to also handle HTTP
requests.
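The single-process daemon group Graham outlines might look something like this in the Apache configuration. The process-group name and script path are invented, and exact option spellings vary between mod_wsgi versions, so treat this as a sketch rather than a recipe:

```apache
# One-process daemon group that exists mainly to run the background job.
WSGIDaemonProcess jobrunner processes=1 threads=3

# Preload the script at process startup (rather than on first request)
# so the background thread starts immediately.
WSGIImportScript /srv/app/jobrunner.wsgi process-group=jobrunner application-group=%{GLOBAL}

# Optionally delegate a slice of the URL namespace to the same process
# so it can also serve monitoring/control requests.
WSGIScriptAlias /jobs /srv/app/jobrunner.wsgi
<Location /jobs>
    WSGIProcessGroup jobrunner
    WSGIApplicationGroup %{GLOBAL}
</Location>
```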
This restricted set of URLs could be those which allow one to monitor
the results of queued jobs, potentially aborting in-progress jobs or
changing their operation. The original URLs which triggered the jobs
could also have been delegated here in the first place.

It could also be a distinct WSGI application supporting an XML-RPC
interface, as described before for a separate daemon process outside of
the web server, in this case just running as another daemon mode
process on the same web server. You might want to block any requests
not coming from localhost, so that it is only accessible by the main
application running in the same web server.

Anyway, you could certainly do various odd things with mod_wsgi daemon
mode if you really wanted to.

Graham

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google
Groups "Django users" group.
To post to this group, send email to django-users@googlegroups.com
To unsubscribe from this group, send email to
django-users+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/django-users?hl=en
-~--~~~~--~~--~--~---
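A rough sketch of the XML-RPC variant, using only the standard library. The method name, the queue, and the port handling are all invented for illustration:

```python
# Standalone job daemon exposing an XML-RPC submission interface.
# Binding to 127.0.0.1 means only local processes (i.e. the web app on
# the same host) can reach it.
import queue
from xmlrpc.server import SimpleXMLRPCServer

jobs = queue.Queue()

def submit_job(payload):
    """Called by the web app over XML-RPC; enqueue and return quickly."""
    jobs.put(payload)
    return jobs.qsize()

# Port 0 lets the OS pick a free port; a real deployment would use a
# fixed, agreed-upon port instead.
server = SimpleXMLRPCServer(("127.0.0.1", 0), allow_none=True)
server.register_function(submit_job)
# server.serve_forever()  # worker threads would drain `jobs` meanwhile
```

The web application side would then call something like `xmlrpc.client.ServerProxy("http://127.0.0.1:<port>/").submit_job(...)` from its view code.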
Re: concurrency and threading question
On Oct 21, 6:44 pm, Javier Guerra wrote:
> On Wed, Oct 21, 2009 at 9:49 AM, Michael Thon wrote:
> > Thanks for pointing me towards celery. It's probably overkill for what
> > I want to do right now but I'm going to try to set it up anyway.
>
> the roll-your-own alternative is just setting a DB table with the
> queued tasks, and a cron job (or a long-running daemon) that fetches
> the next job from the table to work on it. it's called 'Ghetto
> queues'. it works, and for small setups can be much lighter, but for
> complex, high-speed, or critical-availability ones it can quickly
> become a nightmare to set up right.

That's what I was thinking of doing after reading Jani's reply. I could
put the data-crunching code into a view and then just set a cron job to
fetch the view every couple of minutes. The jobs could overlap, so I'd
have to make sure somehow that I don't have too many running
concurrently.

I got celery and RabbitMQ installed without any trouble, so if I have
time today I'll tinker with getting jobs running on it. I don't know if
celery will let me call other functions or shell commands outside of
the task function, or if the task function needs to be
'self-contained'. If not, then ghetto queue it is...

Mike
Re: concurrency and threading question
On Wed, Oct 21, 2009 at 9:49 AM, Michael Thon wrote:
> Thanks for pointing me towards celery. It's probably overkill for what
> I want to do right now but I'm going to try to set it up anyway.

the roll-your-own alternative is just setting a DB table with the
queued tasks, and a cron job (or a long-running daemon) that fetches
the next job from the table to work on it. it's called 'Ghetto queues'.
it works, and for small setups can be much lighter, but for complex,
high-speed, or critical-availability ones it can quickly become a
nightmare to set up right.

note that if you write the cron job in Python, you can easily import
Django's ORM to make it really easy to share data with the webapp

AFAIK, the 'Queue' module you mention gets it mostly right, but works
only on a single Python interpreter. If I'm not wrong, it can't mediate
between the webapp and the background job, unless you modify either
mod_wsgi or flup to spawn a thread for background processing (Graham?
what would it take to add that to mod_wsgi?)

--
Javier
Re: concurrency and threading question
On Oct 21, 2009, at 11:55 AM, Daniel Roseman wrote:
> On Oct 21, 9:28 am, Mike Thon wrote:
>> I'm new to web programming and I have a basic question about the
>> design of my Django application. My application will do some number
>> crunching on data files uploaded by users. The data processing will
>> take from minutes to hours for each job. I don't expect to ever get a
>> large number of concurrent users, but I'd still like to set it up so
>> that I can control the maximum number of data processing jobs that
>> are run in parallel. I was planning to write a simple FIFO queue
>> manager (in fact I think there is a Python package for this) and then
>> run the data processing in separate threads. I'm also planning to use
>> the Django data model for storing the data, so I would have multiple
>> threads writing to the data store. What is not clear to me is what
>> happens when I have more than one visitor to the site. Are multiple
>> instances of my Django app launched, one per visitor? I need to
>> ensure that I only have one queue manager running on the server, not
>> one per visitor. I would be using Apache and either MySQL or sqlite3
>> as the database, in case that matters.
>>
>> thanks for any help
>> Mike
>
> Take a look at the Celery project[1]. This is a great distributed task
> queue for Django that I think will do exactly what you need - each job
> request is sent to the queue and managed there, so you don't need to
> worry about multiple instances.
>
> [1]: http://ask.github.com/celery/introduction.html

Thanks for pointing me towards celery. It's probably overkill for what
I want to do right now, but I'm going to try to set it up anyway.

Mike
Re: concurrency and threading question
On Oct 21, 9:28 am, Mike Thon wrote:
> I'm new to web programming and I have a basic question about the
> design of my Django application. My application will do some number
> crunching on data files uploaded by users. The data processing will
> take from minutes to hours for each job. I don't expect to ever get a
> large number of concurrent users, but I'd still like to set it up so
> that I can control the maximum number of data processing jobs that are
> run in parallel. I was planning to write a simple FIFO queue manager
> (in fact I think there is a Python package for this) and then run the
> data processing in separate threads. I'm also planning to use the
> Django data model for storing the data, so I would have multiple
> threads writing to the data store. What is not clear to me is what
> happens when I have more than one visitor to the site. Are multiple
> instances of my Django app launched, one per visitor? I need to
> ensure that I only have one queue manager running on the server, not
> one per visitor. I would be using Apache and either MySQL or sqlite3
> as the database, in case that matters.
>
> thanks for any help
> Mike

Take a look at the Celery project[1]. This is a great distributed task
queue for Django that I think will do exactly what you need - each job
request is sent to the queue and managed there, so you don't need to
worry about multiple instances.

[1]: http://ask.github.com/celery/introduction.html

--
DR.
Re: concurrency and threading question
Use a separate background process (daemon) to handle the queue +
crunching (or launching the crunching). So your web app just posts jobs
to the background process and then returns control back to the user.
Otherwise your idea is quite correct.

Mike Thon wrote:
> I'm new to web programming and I have a basic question about the
> design of my Django application. My application will do some number
> crunching on data files uploaded by users. The data processing will
> take from minutes to hours for each job. I don't expect to ever get a
> large number of concurrent users, but I'd still like to set it up so
> that I can control the maximum number of data processing jobs that are
> run in parallel. I was planning to write a simple FIFO queue manager
> (in fact I think there is a Python package for this) and then run the
> data processing in separate threads. I'm also planning to use the
> Django data model for storing the data, so I would have multiple
> threads writing to the data store. What is not clear to me is what
> happens when I have more than one visitor to the site. Are multiple
> instances of my Django app launched, one per visitor? I need to
> ensure that I only have one queue manager running on the server, not
> one per visitor. I would be using Apache and either MySQL or sqlite3
> as the database, in case that matters.

--
Jani Tiainen
concurrency and threading question
I'm new to web programming and I have a basic question about the design
of my Django application. My application will do some number crunching
on data files uploaded by users. The data processing will take from
minutes to hours for each job. I don't expect to ever get a large
number of concurrent users, but I'd still like to set it up so that I
can control the maximum number of data processing jobs that are run in
parallel. I was planning to write a simple FIFO queue manager (in fact
I think there is a Python package for this) and then run the data
processing in separate threads. I'm also planning to use the Django
data model for storing the data, so I would have multiple threads
writing to the data store.

What is not clear to me is what happens when I have more than one
visitor to the site. Are multiple instances of my Django app launched,
one per visitor? I need to ensure that I only have one queue manager
running on the server, not one per visitor. I would be using Apache and
either MySQL or sqlite3 as the database, in case that matters.

thanks for any help
Mike
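For reference, the FIFO queue manager described above can be sketched with the stdlib `queue` module feeding a fixed pool of worker threads, so at most MAX_WORKERS jobs crunch in parallel. As noted elsewhere in the thread, this only works within a single Python process, which is exactly the "one queue manager per server" problem. The `crunch` function is a stand-in for the real number crunching.

```python
# FIFO queue with a bounded worker pool; `None` is used as a shutdown
# sentinel, one per worker.
import queue
import threading

MAX_WORKERS = 2
jobs = queue.Queue()
results = []
results_lock = threading.Lock()

def crunch(item):
    return item * item  # placeholder for the real number crunching

def worker():
    while True:
        item = jobs.get()
        if item is None:          # sentinel: shut this worker down
            jobs.task_done()
            return
        with results_lock:        # serialise writes to shared state
            results.append(crunch(item))
        jobs.task_done()

threads = [threading.Thread(target=worker) for _ in range(MAX_WORKERS)]
for t in threads:
    t.start()

for n in range(5):                # enqueue five jobs
    jobs.put(n)
for _ in threads:                 # one sentinel per worker
    jobs.put(None)
jobs.join()                       # wait until every job is processed
for t in threads:
    t.join()
```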