Re: Using parallel over several computers

Anders Lind Wed, 15 Mar 2017 03:12:46 -0700

Hi Andy and Douglas.

Thank you both for your suggestions.

I'll look into both ways. Andy, being able to send a kill signal will, I think, be key for me, since some of these analyses take weeks to finish, so I would be able to kill or suspend them when needed.


Cheers
//Anders

On 15/03/17 04:36, Andy Loftus wrote:

Anders,
Take a look at the --sqlmaster and --sqlworker options.
I use them to effectively create a job queue that any node can pull tasks from. I do this for long-running backups on a parallel filesystem (all nodes have read/write access to the data and the sqljoblog file).
1. Create a list of "tasks" and send that to parallel invoked with the --sqlmaster option. The sqlmaster option will create the joblog and exit.
2. On any machine that has access to the joblog file AND the data, run parallel with the --sqlworker option. As new machines become available, you can start parallel on them in the same manner. To stop work on a particular node, send a KILL signal to the parallel process on that node, which will stop spawning any new jobs and exit after existing tasks have completed.
In my case, each "task" is a bash script file, and I list them, one per line, in a tasklist file, such as:
/path/to/001.cmd
/path/to/002.cmd
...
/path/to/675.cmd
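
A tasklist of that shape can be generated instead of typed by hand. This is only a minimal sketch, assuming the scripts are numbered consecutively with three-digit, zero-padded names. The helper name make_tasklist and its arguments are hypothetical and purely illustrative:

```shell
# Hypothetical helper: write one task script path per line.
#   $1 = directory holding the .cmd scripts
#   $2 = number of scripts (001.cmd .. NNN.cmd)
#   $3 = tasklist file to create (overwritten if it exists)
make_tasklist() {
    dir=$1
    count=$2
    out=$3
    i=1
    : > "$out"                      # start with an empty file
    while [ "$i" -le "$count" ]; do
        printf '%s/%03d.cmd\n' "$dir" "$i" >> "$out"
        i=$((i + 1))
    done
}

# Example call (paths are placeholders):
#   make_tasklist /path/to 675 /path/to/tasklist
```

The helper only writes the path strings. It does not check that the scripts exist.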

The parallel sqlmaster cmdline is then:
parallel -a "/path/to/tasklist" --sqlmaster "$DBURL" bash

The DBURL is now a task queue as well as a joblog.

The parallel sqlworker cmdline is:
parallel --sqlworker "$DBURL"

Some advantages here are:
+ The original (sqlmaster) host does not have to control the parallel process and keep spawning new tasks on all the workers.
+ The worker nodes can each run at their own width (-j option). This might allow you to run a low task count on the worker nodes without interfering with other users on the node. You could even stop and restart with different -j values as needed throughout the day.
+ Worker nodes can be started simply by running parallel on each, and can be stopped by sending a KILL to the local parallel on that node.
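
The per-node start and stop could be wrapped in two small shell functions. This is a rough sketch under stated assumptions, not a tested setup: the function names and the pidfile are hypothetical, and the worker command line simply mirrors the one above with -j added. The signal name is deliberately a required argument, because which signal lets running jobs finish (rather than killing them at once) is documented per version in the GNU Parallel man page and should be checked there.

```shell
# Start a worker in the background and record its PID.
#   $1 = DBURL
#   $2 = number of simultaneous jobs on this node (-j)
#   $3 = pidfile (hypothetical path, one per node)
start_worker() {
    parallel -j "$2" --sqlworker "$1" &
    echo "$!" > "$3"
}

# Send a signal to the worker recorded in the pidfile.
#   $1 = pidfile
#   $2 = signal name; see the man page of your parallel version
#        for the signal that stops new jobs but lets running ones finish
signal_worker() {
    kill -s "$2" "$(cat "$1")"
}

# Example (placeholders):
#   start_worker "$DBURL" 2 /tmp/parallel-worker.pid
#   signal_worker /tmp/parallel-worker.pid <SIGNAL>
```

Restarting with a different width is then a matter of signalling the worker, waiting for it to exit, and calling start_worker again with a new -j value.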
NOTE: The sql* options have very recent changes to them so make sure you are using the most recent version of parallel.
Hope this is helpful.

Cheers,
--Andy
On Tue, Mar 14, 2017 at 10:56 AM Douglas A. Augusto wrote:
    On 14/03/2017 at 10:54, Anders Lind wrote:

    > I could perhaps set this up using the ssh functionality of parallel, but I
    > would need to be able to on the fly stop some machines from running jobs,
    > since the computers belong to co-workers who sometimes need their computers
    > for their own work.

    Hi Anders,

    The following thread may interest you:

       Dynamically changing remote servers list
    https://lists.nongnu.org/archive/html/parallel/2014-08/msg00012.html

    Based on that, at the time I made a shell script that keeps parallel's
    sshloginfile updated by filtering out unreachable remote servers and also
    allowing the user to edit (include and/or exclude remote servers) on-the-fly:

    https://github.com/daaugusto/gnuparallel

    PS: It worked with older versions of GNU Parallel (I haven't tested it with
    more recent ones yet), so your mileage may vary.

    --
    Douglas A. Augusto


--
Anders Lind
Molecular Evolution
Department of Cell and Molecular Biology
Biomedical Centre
Uppsala University
Box 596
751 23 Uppsala
Sweden
phone: +46 18 471 4058
