On Thu, Jun 23, 2011 at 2:47 AM, Jon Wilson <[email protected]> wrote:
> 1) submit jobs that will take several hours to run, during which time I
> won't have anything else in particular to do
> 2) Go work on bringing cluster nodes back up
> 3) Change ~/.parallel/sshloginfile
> 4) GNU parallel notices that the file has changed, just like if I were
> using -j procfile, and immediately starts jobs on those additional nodes.
>
> I am using parallel 20110522. Is this behavior already implemented? If
> not, I would like to request this feature.

That is currently not implemented.

A workaround for you may be to put all the nodes in ~/.parallel/sshloginfile and use --retry to retry the job if it fails on a node (e.g. if the node is not up). You should set --retry to number_of_nodes_down+1: if GNU Parallel retries on another node that is also down, it will keep retrying until it reaches at least one node that is up.

This abuses --retry. If a job actually _does_ fail, you will run that job number_of_nodes_down+1 times.

If you still want the feature, file a wishlist item at https://savannah.gnu.org/bugs/?group=parallel

/Ole
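Why number_of_nodes_down+1 is enough, and what it costs, can be checked with a small Python model. This is a hypothetical sketch, not GNU Parallel's scheduler: it assumes each attempt of a job goes to a node that job has not been tried on yet, that a down node always fails, and that the --retry value equals the total number of attempts (which matches "you will run that job number_of_nodes_down+1 times"). The functions `attempts_needed` and `worst_case_order` are illustrative names only.

```python
from typing import List, Tuple

Node = Tuple[str, bool]  # (hostname, is_up), hypothetical representation


def worst_case_order(nodes: List[Node]) -> List[Node]:
    """Order nodes so every down node is tried before any up node.

    False sorts before True, so down nodes come first; sort is stable.
    """
    return sorted(nodes, key=lambda n: n[1])


def attempts_needed(nodes: List[Node], max_attempts: int,
                    job_ok: bool) -> Tuple[bool, int]:
    """Model of the --retry workaround (assumed policy, see lead-in).

    Each attempt uses the next untried node in `nodes`. A down node always
    fails; an up node succeeds only if the job itself is OK.
    Returns (succeeded, number_of_attempts_made).
    """
    attempts = 0
    for _host, is_up in nodes[:max_attempts]:
        attempts += 1
        if is_up and job_ok:
            return True, attempts
    return False, attempts


if __name__ == "__main__":
    nodes = [("node1", True), ("node2", False),
             ("node3", False), ("node4", True)]
    down = sum(1 for _, up in nodes if not up)  # 2 nodes are down
    order = worst_case_order(nodes)
    # down+1 attempts always reach an up node, even in the worst case.
    print(attempts_needed(order, down + 1, job_ok=True))   # (True, 3)
    # A job that genuinely fails is run down+1 times.
    print(attempts_needed(order, down + 1, job_ok=False))  # (False, 3)
    # With only `down` attempts, the worst case never reaches an up node.
    print(attempts_needed(order, down, job_ok=True))       # (False, 2)
```

The worst-case ordering is used deliberately: with any other order an up node is reached sooner, so down+1 attempts is the smallest value that always succeeds under these assumptions.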
