Hi GNU Parallel mailing list.
I'm looking for a way to run a large number of jobs (in parallel) on
several computers. The job consists of running an analysis on several
thousand input files. The way I have this set up right now is to split
the list of input files into chunks, copy the chunks to the various
computers, and then run the analysis using GNU parallel on each machine.
This has the downside that I have to keep track of which computers are
doing what.
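Roughly, what I do now looks like this ("analyze" and the file names are
just placeholders for my actual script and lists):

    # split the master list of input paths into one chunk per machine
    split -n l/4 all_input_files.txt chunk_
    scp chunk_aa machine1:    # and so on for the other chunks/machines

    # then, logged in on each machine:
    parallel analyze {} < chunk_aa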
I could perhaps set this up using the ssh functionality of parallel, but
I would need to be able to stop individual machines from running jobs on
the fly, since the computers belong to co-workers who sometimes need them
for their own work.
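As far as I understand the ssh support, that would look something like
this (assuming "analyze" is installed on every machine and hosts.txt
lists the co-workers' computers; the exact options are my best guess):

    parallel --sshloginfile hosts.txt --transferfile {} --cleanup \
        analyze {} :::: all_input_files.txt

but I do not know whether parallel would notice if I removed a host from
hosts.txt while it is running.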
My idea was to have a single file containing the paths of all the files
I want to analyze, accessible to every computer on the network. I would
then set up separate parallel jobs on the various computers that
continuously pull paths from this shared file and run the analysis on
each one.
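Conceptually, each machine would run something like this, where
get_next_path is a made-up helper that atomically hands out one
not-yet-processed path from the shared list:

    # on every machine
    while path=$(get_next_path /shared/input_list.txt); do
        echo "$path"
    done | parallel analyze {}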
Of course the issue here is that I need to be able to keep track of
which paths have already been handed out. Having several computers read
and update the same file at the same time seems like it would lead to
I/O issues.
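The only mechanism I could come up with is locking the list on every
read, roughly like this:

    # naive attempt at an atomic "give me the next path":
    # take the first line off the list and rewrite it, under a lock
    get_next_path() {
        flock "$1.lock" sh -c '
            list=$1
            path=$(head -n 1 "$list")
            [ -n "$path" ] || exit 1            # list exhausted
            tail -n +2 "$list" > "$list.tmp" && mv "$list.tmp" "$list"
            printf "%s\n" "$path"
        ' sh "$1"
    }

but I have no idea how well flock behaves on a network filesystem, which
is exactly the I/O worry above.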
Am I missing a more obvious solution?
Any help is very much appreciated, and I hope I am not abusing the
purpose of this mailing list.