Re: concurrency and threading question
On Oct 22, 3:44 am, Javier Guerra wrote:
> On Wed, Oct 21, 2009 at 9:49 AM, Michael Thon wrote:
> > Thanks for pointing me towards celery. It's probably overkill for what
> > I want to do right now but I'm going to try to set it up anyway.
>
> the roll-your-own alternative is just setting a DB table with the
> queued tasks, and a cron job (or a long-running daemon) that fetches
> the next job from the table to work on it. it's called 'Ghetto
> queues'. it works, and for small setups can be much lighter, but for
> complex, high-speed, or critical-availability ones it can quickly
> become a nightmare to set up right.
>
> note that if you write the cron job in Python, you can easily import
> Django's ORM to make it really easy to share data with the webapp
>
> AFAIK, the 'Queue' module you mention gets it mostly right, but works
> only on a single Python interpreter. If I'm not wrong, it can't
> mediate between the webapp and the background job, unless you modify
> either mod_wsgi or flup to spawn a thread for background
> processing (Graham? what would it take to add that to mod_wsgi?)

Not sure why people think they can utter my name in some arbitrary
conversation and expect me to appear. :-)

Anyway, I am not sure I understand what you perceive as the problem.
There is no problem with spawning background threads in the context of
a web application running under mod_wsgi. This can easily be done as a
side effect of importing the main WSGI script file, or, if properly
thread-protected to avoid duplicates being started, triggered by a
request handler.

The real problem is the lifetime of the process in the context of the
web server, which depends on your configuration. This is why the
suggestion is that a separate daemon process, independent of the web
server, be used, and that data about pending jobs be communicated via
the database. Alternatively, the separate daemon process could expose
an XML-RPC interface and the web application could communicate with it
via that.
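A minimal sketch of the pattern Graham describes: start the background thread as a side effect of importing the WSGI script file, with a lock guarding against duplicate starts if it is instead triggered from a request handler. All names here (`do_pending_jobs`, `POLL_SECONDS`, and so on) are hypothetical, not from mod_wsgi or this thread.

```python
# Background worker started at most once per process. The job-fetching
# logic is a placeholder; in practice it would poll the jobs table.
import threading
import time

POLL_SECONDS = 5.0
_worker_lock = threading.Lock()
_worker_started = False

def do_pending_jobs():
    pass  # placeholder: fetch queued jobs from the database and run them

def _worker_loop():
    while True:
        do_pending_jobs()
        time.sleep(POLL_SECONDS)

def start_worker_once():
    """Start the background thread, guarding against duplicates.

    Returns True if this call started the thread, False if it was
    already running (e.g. another request handler got there first).
    """
    global _worker_started
    with _worker_lock:
        if _worker_started:
            return False
        threading.Thread(target=_worker_loop, daemon=True).start()
        _worker_started = True
        return True

# Calling start_worker_once() at module level makes the thread start as
# a side effect of the WSGI script file being imported.
```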
In both of these cases, if using a daemon process separate from the web
server, you then need infrastructure such as supervisor to start it up
and keep it running. That is extra setup and configuration work.

Getting back to why you don't run it in the web server: for embedded
mode you obviously have multiple processes, so in which one does it
run? If you run it in the one where the request originally arrived, and
a future response depends on results cached in memory only, the problem
is that you can't guarantee that subsequent requests go back to the
same process. You can alleviate this using daemon mode of mod_wsgi, but
that does restrict you to a single process for the application. In both
cases you are at the mercy of the process being restarted: for embedded
mode at the whim of Apache, and for daemon mode whenever someone
touches the WSGI script file or similar. In both cases, also whenever a
defined maximum number of requests is exceeded.

One middle ground, so long as you don't periodically restart Apache, is
to create a special mod_wsgi daemon mode process group consisting of a
single process. This daemon process wouldn't exist for the purpose of
handling requests, but purely to run your background job. Because
normally web application code wouldn't be loaded until the first
request arrives for it, you would need to use the WSGIImportScript
directive to preload a script file at process startup to initiate the
background thread and start pulling pending jobs from the database and
processing them. Doing this means that for that process you are using
Apache as a supervisor, and so at least avoid needing to install that
infrastructure separately.

Now, because it is still a web server process, the script which is
preloaded could itself be a variant of the normal WSGI script file,
including a definition of the application entry point. You could then
delegate part of the URL namespace of the overall application to this
single daemon mode process, thus allowing it to also handle HTTP
requests.
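The single-process daemon group Graham outlines might look something like this in the Apache configuration. The process-group name and script path are invented, and exact option spellings vary between mod_wsgi versions, so treat this as a sketch rather than a recipe:

```apache
# One-process daemon group that exists mainly to run the background job.
WSGIDaemonProcess jobrunner processes=1 threads=3

# Preload the script at process startup (rather than on first request)
# so the background thread starts immediately.
WSGIImportScript /srv/app/jobrunner.wsgi process-group=jobrunner application-group=%{GLOBAL}

# Optionally delegate a slice of the URL namespace to the same process
# so it can also serve monitoring/control requests.
WSGIScriptAlias /jobs /srv/app/jobrunner.wsgi
<Location /jobs>
    WSGIProcessGroup jobrunner
    WSGIApplicationGroup %{GLOBAL}
</Location>
```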
This restricted set of URLs could be those which allow one to monitor
the results of queued jobs, potentially aborting in-progress jobs or
changing their operation. The original URLs which triggered the jobs
could also have been delegated here in the first place.

It could also be a distinct WSGI application supporting an XML-RPC
interface, as described before for a separate daemon process outside of
the web server, in this case just running as another daemon mode
process on the same web server. You might want to block any requests
not coming from localhost, so that it is only accessible by the main
application running in the same web server.

Anyway, you could certainly do various odd things with mod_wsgi daemon
mode if you really wanted to.

Graham

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google
Groups "Django users" group.
To post to this group, send email to django-users@googlegroups.com
To unsubscribe from this group, send email to
django-users+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/django-users?hl=en
-~--~~~~--~~--~--~---
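A rough sketch of the XML-RPC variant, using only the standard library. The method name, the queue, and the port handling are all invented for illustration:

```python
# Standalone job daemon exposing an XML-RPC submission interface.
# Binding to 127.0.0.1 means only local processes (i.e. the web app on
# the same host) can reach it.
import queue
from xmlrpc.server import SimpleXMLRPCServer

jobs = queue.Queue()

def submit_job(payload):
    """Called by the web app over XML-RPC; enqueue and return quickly."""
    jobs.put(payload)
    return jobs.qsize()

# Port 0 lets the OS pick a free port; a real deployment would use a
# fixed, agreed-upon port instead.
server = SimpleXMLRPCServer(("127.0.0.1", 0), allow_none=True)
server.register_function(submit_job)
# server.serve_forever()  # worker threads would drain `jobs` meanwhile
```

The web application side would then call something like `xmlrpc.client.ServerProxy("http://127.0.0.1:<port>/").submit_job(...)` from its view code.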
Re: concurrency and threading question
On Oct 21, 6:44 pm, Javier Guerra wrote:
> On Wed, Oct 21, 2009 at 9:49 AM, Michael Thon wrote:
> > Thanks for pointing me towards celery. It's probably overkill for what
> > I want to do right now but I'm going to try to set it up anyway.
>
> the roll-your-own alternative is just setting a DB table with the
> queued tasks, and a cron job (or a long-running daemon) that fetches
> the next job from the table to work on it. it's called 'Ghetto
> queues'. it works, and for small setups can be much lighter, but for
> complex, high-speed, or critical-availability ones it can quickly
> become a nightmare to set up right.

That's what I was thinking of doing after reading Jani's reply. I could
put the data-crunching code into a view and then just set a cron job to
fetch the view every couple of minutes. The jobs could overlap, so I'd
have to make sure somehow that I don't have too many running
concurrently.

I got celery and RabbitMQ installed without any trouble, so if I have
time today I'll tinker with getting jobs running on it. I don't know if
celery will let me call other functions or shell commands outside of
the task function, or if the task function needs to be
'self-contained'. If not, then ghetto queue it is...

Mike
Re: concurrency and threading question
On Wed, Oct 21, 2009 at 9:49 AM, Michael Thon wrote:
> Thanks for pointing me towards celery. It's probably overkill for what
> I want to do right now but I'm going to try to set it up anyway.

the roll-your-own alternative is just setting a DB table with the
queued tasks, and a cron job (or a long-running daemon) that fetches
the next job from the table to work on it. it's called 'Ghetto queues'.
it works, and for small setups can be much lighter, but for complex,
high-speed, or critical-availability ones it can quickly become a
nightmare to set up right.

note that if you write the cron job in Python, you can easily import
Django's ORM to make it really easy to share data with the webapp

AFAIK, the 'Queue' module you mention gets it mostly right, but works
only on a single Python interpreter. If I'm not wrong, it can't mediate
between the webapp and the background job, unless you modify either
mod_wsgi or flup to spawn a thread for background processing (Graham?
what would it take to add that to mod_wsgi?)

--
Javier
Re: concurrency and threading question
On Oct 21, 2009, at 11:55 AM, Daniel Roseman wrote:
> On Oct 21, 9:28 am, Mike Thon wrote:
>> I'm new to web programming and I have a basic question about the
>> design of my Django application. My application will do some number
>> crunching on data files uploaded by users. The data processing will
>> take from minutes to hours for each job. I don't expect to ever get a
>> large number of concurrent users, but I'd still like to set it up so
>> that I can control the maximum number of data processing jobs that
>> are run in parallel. I was planning to write a simple FIFO queue
>> manager (in fact I think there is a Python package for this) and then
>> run the data processing in separate threads. I'm also planning to use
>> the Django data model for storing the data, so I would have multiple
>> threads writing to the data store. What is not clear to me is what
>> happens when I have more than one visitor to the site. Are multiple
>> instances of my Django app launched, one per visitor? I need to
>> ensure that I only have one queue manager running on the server, not
>> one per visitor. I would be using Apache and either MySQL or sqlite3
>> as the database, in case that matters.
>>
>> thanks for any help
>> Mike
>
> Take a look at the Celery project[1]. This is a great distributed task
> queue for Django that I think will do exactly what you need - each job
> request is sent to the queue and managed there, so you don't need to
> worry about multiple instances.
>
> [1]: http://ask.github.com/celery/introduction.html

Thanks for pointing me towards celery. It's probably overkill for what
I want to do right now, but I'm going to try to set it up anyway.

Mike
Re: concurrency and threading question
On Oct 21, 9:28 am, Mike Thon wrote:
> I'm new to web programming and I have a basic question about the
> design of my Django application. My application will do some number
> crunching on data files uploaded by users. The data processing will
> take from minutes to hours for each job. I don't expect to ever get a
> large number of concurrent users, but I'd still like to set it up so
> that I can control the maximum number of data processing jobs that are
> run in parallel. I was planning to write a simple FIFO queue manager
> (in fact I think there is a Python package for this) and then run the
> data processing in separate threads. I'm also planning to use the
> Django data model for storing the data, so I would have multiple
> threads writing to the data store. What is not clear to me is what
> happens when I have more than one visitor to the site. Are multiple
> instances of my Django app launched, one per visitor? I need to
> ensure that I only have one queue manager running on the server, not
> one per visitor. I would be using Apache and either MySQL or sqlite3
> as the database, in case that matters.
>
> thanks for any help
> Mike

Take a look at the Celery project[1]. This is a great distributed task
queue for Django that I think will do exactly what you need - each job
request is sent to the queue and managed there, so you don't need to
worry about multiple instances.

[1]: http://ask.github.com/celery/introduction.html

--
DR.
Re: concurrency and threading question
Use a separate background process (daemon) to handle the queue +
crunching (or launching the crunching). So your web app just posts jobs
to the background process and then returns control back to the user.
Otherwise your idea is quite correct.

Mike Thon wrote:
> I'm new to web programming and I have a basic question about the
> design of my Django application. My application will do some number
> crunching on data files uploaded by users. The data processing will
> take from minutes to hours for each job. I don't expect to ever get a
> large number of concurrent users, but I'd still like to set it up so
> that I can control the maximum number of data processing jobs that are
> run in parallel. I was planning to write a simple FIFO queue manager
> (in fact I think there is a Python package for this) and then run the
> data processing in separate threads. I'm also planning to use the
> Django data model for storing the data, so I would have multiple
> threads writing to the data store. What is not clear to me is what
> happens when I have more than one visitor to the site. Are multiple
> instances of my Django app launched, one per visitor? I need to
> ensure that I only have one queue manager running on the server, not
> one per visitor. I would be using Apache and either MySQL or sqlite3
> as the database, in case that matters.

--
Jani Tiainen
concurrency and threading question
I'm new to web programming and I have a basic question about the design
of my Django application. My application will do some number crunching
on data files uploaded by users. The data processing will take from
minutes to hours for each job. I don't expect to ever get a large
number of concurrent users, but I'd still like to set it up so that I
can control the maximum number of data processing jobs that are run in
parallel. I was planning to write a simple FIFO queue manager (in fact
I think there is a Python package for this) and then run the data
processing in separate threads. I'm also planning to use the Django
data model for storing the data, so I would have multiple threads
writing to the data store.

What is not clear to me is what happens when I have more than one
visitor to the site. Are multiple instances of my Django app launched,
one per visitor? I need to ensure that I only have one queue manager
running on the server, not one per visitor. I would be using Apache and
either MySQL or sqlite3 as the database, in case that matters.

thanks for any help
Mike
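For reference, the FIFO queue manager described above can be sketched with the stdlib `queue` module feeding a fixed pool of worker threads, so at most MAX_WORKERS jobs crunch in parallel. As noted elsewhere in the thread, this only works within a single Python process, which is exactly the "one queue manager per server" problem. The `crunch` function is a stand-in for the real number crunching.

```python
# FIFO queue with a bounded worker pool; `None` is used as a shutdown
# sentinel, one per worker.
import queue
import threading

MAX_WORKERS = 2
jobs = queue.Queue()
results = []
results_lock = threading.Lock()

def crunch(item):
    return item * item  # placeholder for the real number crunching

def worker():
    while True:
        item = jobs.get()
        if item is None:          # sentinel: shut this worker down
            jobs.task_done()
            return
        with results_lock:        # serialise writes to shared state
            results.append(crunch(item))
        jobs.task_done()

threads = [threading.Thread(target=worker) for _ in range(MAX_WORKERS)]
for t in threads:
    t.start()

for n in range(5):                # enqueue five jobs
    jobs.put(n)
for _ in threads:                 # one sentinel per worker
    jobs.put(None)
jobs.join()                       # wait until every job is processed
for t in threads:
    t.join()
```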