Anders,
To my knowledge, parallel has no way to suspend a task. If you send a TERM
signal to parallel, it stops starting new tasks and waits for the existing
tasks to complete.

However, there is a way to have parallel retry failed tasks (see the
--retry-failed option). So if your task is written so that it can be
killed and restarted, then you can approximate a "suspend" operation.
This depends entirely on the task being able to save its state and
restart from where it left off. You could then first send TERM to parallel,
and second send an appropriate signal to the individual tasks telling them
to save state and exit. Tasks must exit in a way that tells parallel they
failed (i.e., exit with return code 1), so that parallel will retry them
when asked.
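
To make the "save state and exit" idea concrete, here is a minimal sketch of
a resumable task. Everything in it is an assumption for illustration: the
names run_task and do_step, the state file format, and the choice of USR1
as the "save and exit" signal. The only part taken from the above is the
contract: on the signal, save state and exit non-zero.

```shell
# Sketch of a resumable task. run_task, do_step and the state file layout
# are hypothetical; only the signal/exit-code contract matters.

# Placeholder for one unit of real work (replace with the actual analysis).
do_step() { :; }

run_task() {
    state_file=$1
    total_steps=$2
    step=0

    # Resume from the last checkpoint, if there is one.
    if [ -f "$state_file" ]; then
        step=$(cat "$state_file")
    fi

    # On USR1 (an assumed choice of signal), save progress and exit
    # non-zero so that parallel records the job as failed.
    trap 'echo "$step" > "$state_file"; exit 1' USR1

    while [ "$step" -lt "$total_steps" ]; do
        do_step "$step"
        step=$((step + 1))
    done

    # Finished: the checkpoint is no longer needed.
    rm -f "$state_file"
    trap - USR1
    return 0
}
```

A task script would end with something like run_task /path/to/state 100.
Note that the step that was in flight when the signal arrived is run again
on restart, so in this sketch each step has to be safe to repeat.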

I just looked it up in the manpage: to make parallel stop starting new jobs,
send it the TERM signal:
https://www.gnu.org/software/parallel/man.html#COMPLETE-RUNNING-JOBS-BUT-DO-NOT-START-NEW-JOBS
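
Put together, the sequence might look roughly like the sketch below. The
function names, the joblog path and the USR1 signal are assumptions of mine;
the TERM signal and the --joblog/--retry-failed options are the real pieces.
Check the manpage of your installed version for the exact signal, since this
behaviour may differ between releases.

```shell
# Hypothetical helper names and paths; only the parallel options and the
# TERM signal come from the discussion above.
JOBLOG=${JOBLOG:-/tmp/tasks.joblog}

# Start the jobs, recording each exit status in a joblog.
start_jobs() {
    parallel --joblog "$JOBLOG" -a "$1" bash
}

# Step 1: ask parallel (given its PID) to stop starting new jobs.
drain_parallel() {
    kill -TERM "$1"
}

# Step 2: ask each running task (given its PID) to save state and exit
# non-zero. USR1 is an assumption; use whatever signal your tasks trap.
checkpoint_tasks() {
    for pid in "$@"; do
        kill -USR1 "$pid"
    done
}

# Later: rerun the jobs that the joblog lists as failed.
resume_jobs() {
    parallel --joblog "$JOBLOG" --retry-failed
}
```

As far as I understand it, --retry-failed only looks at jobs already in the
joblog, so tasks that were never started would need --resume-failed (with
the original command line) instead. I have not tested that part.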

To ask parallel to kill tasks, see --halt and --termseq options.
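
A possible way to combine the two, as an untested sketch: with --halt
now,fail=1, the first task that exits non-zero makes parallel kill the
remaining tasks, and --termseq controls which signals it uses for that. The
wrapper name, the USR1 signal and the 5000 ms grace period are assumptions
and would have to match what your tasks actually trap and how long they need
to save state.

```shell
# Hypothetical wrapper around parallel. USR1 and the timings (in ms) are
# assumed values, not recommendations from the manpage.
run_with_checkpoint_halt() {
    tasklist=$1
    joblog=$2
    # On the first failed job, signal the remaining jobs: USR1 first, then
    # TERM after 5000 ms, then KILL after a further 1000 ms.
    parallel --joblog "$joblog" \
             --halt now,fail=1 \
             --termseq USR1,5000,TERM,1000,KILL,25 \
             -a "$tasklist" bash
}
```

With this, sending the "save and exit" signal to a single task would be
enough to bring the whole run down in an orderly way. Whether the killed
tasks then show up as failed in the joblog is something to verify before
relying on it.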

Cheers,
--Andy

On Wed, Mar 15, 2017 at 5:12 AM Anders Lind <[email protected]> wrote:

> Hi Andy and Douglas.
>
> Thank you both for your suggestions.
> I'll look into both ways. Andy, being able to send a kill signal will, I
> think, be key for me, since some of these analyses take weeks to finish
> and I need to be able to kill or suspend them when needed.
>
> Cheers
> //Anders
>
>
> On 15/03/17 04:36, Andy Loftus wrote:
>
> Anders,
> Take a look at the --sqlmaster and --sqlworker options.
>
> I use them to effectively create a jobqueue that any node can pull tasks
> from. I do this for long running backups on a parallel filesystem (all
> nodes have read/write access to the data and the sql joblog file).
>
> 1. Create a list of "tasks" and send that to parallel invoked with the
> --sqlmaster option.  The sqlmaster option will create the joblog and exit.
>
> 2. On any machine that has access to the joblog file AND the data, run
> parallel with the --sqlworker option.  As new machines come available, you
> can start parallel on them in the same manner.  To stop work on a
> particular node, send a TERM signal to the parallel process on that node,
> which will stop spawning any new jobs and exit after existing tasks have
> completed.
>
> In my case, each "task" is a bash script file, and I list them, one per
> line, in a tasklist file, such as:
> /path/to/001.cmd
> /path/to/002.cmd
> ...
> /path/to/675.cmd
>
> The parallel sqlmaster cmdline is then:
> parallel -a "/path/to/tasklist" --sqlmaster "$DBURL" bash
>
> The DBURL is now a task queue as well as a joblog.
>
> The parallel sqlworker cmdline is:
> parallel --sqlworker "$DBURL"
>
> Some advantages here are:
> + The original (sqlmaster) host does not have to control the parallel
> process and keep spawning new tasks on all the workers.
> + The worker nodes can each run at their own width (-j option).  This
> might allow you to run a low task count on the worker nodes without
> interfering with other users on the node.  You could even stop and restart
> with different -j values as needed throughout the day.
> + Worker nodes can be started simply by running parallel on each, and can
> be stopped by sending a TERM signal to the local parallel on that node.
>
> NOTE: The sql* options have very recent changes to them so make sure you
> are using the most recent version of parallel.
>
> Hope this is helpful.
>
> Cheers,
> --Andy
>
> On Tue, Mar 14, 2017 at 10:56 AM Douglas A. Augusto <[email protected]>
> wrote:
>
> On 14/03/2017 at 10:54,
> Anders Lind <[email protected]> wrote:
>
> > I could perhaps set this up using the ssh functionality of parallel, but
> > I would need to be able to on the fly stop some machines from running
> > jobs, since the computers belong to co-workers who sometimes need their
> > computers for their own work.
>
> Hi Anders,
>
> The following thread may interest you:
>
>    Dynamically changing remote servers list
>    https://lists.nongnu.org/archive/html/parallel/2014-08/msg00012.html
>
> Based on that, at the time I made a shell script that keeps parallel's
> sshloginfile updated by filtering out unreachable remote servers and also
> allowing the user to edit (include and/or exclude remote servers)
> on-the-fly:
>
>    https://github.com/daaugusto/gnuparallel
>
> PS: It worked with older versions of GNU Parallel (I haven't tested it with
> more recent ones yet), so your mileage may vary.
>
> --
> Douglas A. Augusto
>
>
> --
> Anders Lind
> Molecular Evolution
> Department of Cell and Molecular Biology
> Biomedical Centre
> Uppsala University
> Box 596
> 751 23 Uppsala
> Sweden
> phone: +46 18 471 4058
>
>
