John Barham wrote:
On Tue, Mar 3, 2009 at 3:52 AM, hugo rivera <uai...@gmail.com> wrote:

I have to launch many tasks running in parallel (~5000) in a
cluster running Linux. Each of the tasks performs some astronomical
calculations, and I am not sure whether using fork is the best answer
here.
First of all, all the programming is done in python and c...

Take a look at the multiprocessing package
(http://docs.python.org/library/multiprocessing.html), newly
introduced with Python 2.6 and 3.0:

"multiprocessing is a package that supports spawning processes using
an API similar to the threading module. The multiprocessing package
offers both local and remote concurrency, effectively side-stepping
the Global Interpreter Lock by using subprocesses instead of threads."

It should be a quick and easy way to set up a cluster-wide job
processing system (provided all your jobs are driven by Python).
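For example, here is a minimal sketch of fanning many independent jobs out over
the local cores with multiprocessing.Pool; the compute() function and the job
count are just placeholders standing in for the real calculations:

    # Fan ~5000 independent jobs out over the local CPU cores.
    from multiprocessing import Pool

    def compute(task_id):
        # ... one astronomical calculation would go here ...
        return task_id * task_id

    if __name__ == '__main__':
        pool = Pool()                          # defaults to one worker per core
        results = pool.map(compute, range(5000))
        pool.close()
        pool.join()
        print(results[:10])

Because the work is done in separate processes rather than threads, the GIL
never becomes a bottleneck.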

Better: use Parallel Python (www.parallelpython.com). AFAIK multiprocessing is geared towards multi-core systems (one machine), while pp is also suitable for real clusters with multiple PCs. No special cluster software is needed. It will start (here's your fork) one or more Python interpreters on each node, and then you can submit jobs to those 'workers'. The interpreters are kept alive between jobs, so the startup penalty becomes negligible when the number of jobs is large enough. I'm using it here to process massive amounts of satellite data, and it works like a charm.
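
A rough sketch of that workflow, assuming ppserver.py is already running on the
nodes (the node names and the sum_to() function are placeholders):

    import pp

    # Hypothetical cluster nodes running ppserver.py on the default port.
    ppservers = ("node1:60000", "node2:60000")
    job_server = pp.Server(ppservers=ppservers)

    def sum_to(n):
        return sum(range(n))

    # Submit all jobs to the pool of workers, then collect the results.
    jobs = [job_server.submit(sum_to, (i,)) for i in range(5000)]
    results = [job() for job in jobs]
    job_server.print_stats()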

Vincent.

It also looks like it's been (partially?) back-ported to Python 2.4
and 2.5: http://pypi.python.org/pypi/processing.

  John
