Anders,

Take a look at the --sqlmaster and --sqlworker options. I use them to effectively create a job queue that any node can pull tasks from. I do this for long-running backups on a parallel filesystem (all nodes have read/write access to the data and the SQL joblog file).

1. Create a list of "tasks" and send that to parallel invoked with the
   --sqlmaster option. The sqlmaster option will create the joblog and exit.

2. On any machine that has access to the joblog file AND the data, run
   parallel with the --sqlworker option.

As new machines become available, you can start parallel on them in the same manner. To stop work on a particular node, send a TERM signal (HUP in newer GNU Parallel releases) to the parallel process on that node; it will then stop spawning new jobs and exit after the running tasks have completed. A KILL signal cannot be caught, so it would end parallel immediately instead.

In my case, each "task" is a bash script file, and I list them, one per line, in a tasklist file, such as:

    /path/to/001.cmd
    /path/to/002.cmd
    ...
    /path/to/675.cmd

The parallel sqlmaster cmdline is then:

    parallel -a "/path/to/tasklist" --sqlmaster "$DBURL" bash

The DBURL is now a task queue as well as a joblog.

The parallel sqlworker cmdline is:

    parallel --sqlworker "$DBURL"

Some advantages here are:

+ The original (sqlmaster) host does not have to control the parallel
  process and keep spawning new tasks on all the workers.

+ The worker nodes can each run at their own width (-j option). This lets
  you run a low task count on a worker node without interfering with other
  users on that node. You could even stop and restart with different -j
  values as needed throughout the day.

+ Worker nodes can be started simply by running parallel on each, and
  stopped by sending a TERM signal (HUP in newer releases) to the local
  parallel on that node.

NOTE: The sql* options have very recent changes to them, so make sure you are using the most recent version of parallel.

Hope this is helpful.

Cheers,
--Andy

On Tue, Mar 14, 2017 at 10:56 AM Douglas A. Augusto <[email protected]> wrote:

> On 14/03/2017 at 10:54,
> Anders Lind <[email protected]> wrote:
>
> > I could perhaps set this up using the ssh functionality of parallel, but I
> > would need to be able to on the fly stop some machines from running jobs,
> > since the computers belong to co-workers who sometimes need their computers
> > for their own work.
>
> Hi Anders,
>
> The following thread may interest you:
>
>    Dynamically changing remote servers list
>    https://lists.nongnu.org/archive/html/parallel/2014-08/msg00012.html
>
> Based on that, at the time I made a shell script that keeps parallel's
> sshloginfile updated by filtering out unreachable remote servers and also
> allowing the user to edit (include and/or exclude remote servers)
> on-the-fly:
>
>    https://github.com/daaugusto/gnuparallel
>
> PS: It worked with older versions of GNU Parallel (I haven't tested it with
> more recent ones yet), so your mileage may vary.
>
> --
> Douglas A. Augusto
>
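
For reference, the master/worker workflow described in this message could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a tested recipe: the temporary directory (standing in for the shared filesystem), the three example tasks, the table name "backup", and the SQLite form of the DBURL are all made up for illustration. The parallel calls only run if RUN_PARALLEL=1 is set and parallel is installed.

```shell
#!/usr/bin/env bash
# Sketch only: every path, the table name "backup", and the DBURL value
# are hypothetical stand-ins, not taken from the original message.
set -eu

workdir=$(mktemp -d)            # stands in for the shared filesystem
tasklist="$workdir/tasklist"

# Each task is a small bash script; the task list holds one path per line.
for i in 001 002 003; do
    printf 'echo "task %s done"\n' "$i" > "$workdir/$i.cmd"
    printf '%s\n' "$workdir/$i.cmd" >> "$tasklist"
done

# Assumed DBURL form: an SQLite file on the shared filesystem, with each
# "/" of the file path written as %2F, followed by the table name.
DBURL="sqlite3:///${workdir//\//%2F}%2Fjobs.db/backup"

if [ "${RUN_PARALLEL:-0}" = 1 ] && command -v parallel >/dev/null 2>&1; then
    # Master: load the task list into the queue, then exit.
    parallel -a "$tasklist" --sqlmaster "$DBURL" bash

    # Worker: run this on any node that can reach the database and the data.
    # -j 2 is this node's own width. The message above runs the worker
    # without a trailing command; "bash" is repeated here on the assumption
    # that the worker should be given the same command as the master,
    # since these example .cmd files are not marked executable.
    parallel -j 2 --sqlworker "$DBURL" bash
fi
```

On each additional worker node only the --sqlworker line is needed, with whatever -j value suits that node.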
