On 16/08/2014 at 11:51,
Achim Gratz wrote:
> Just make it the responsibility of the user that each server in the list
> given to parallel is actually reachable, don't second-guess the user.
> That list may actually be something that the user just gets from
> somewhere else, so you should perhaps be flexible with the expected
> format.
If the ability to dynamically include/exclude servers is implemented (for
instance, by re-reading a file containing the list of servers), then the user
could take care of maintaining a list of active servers with something like
the following (just to give the idea):
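while true; do
  parallel -k 'if ssh -o BatchMode=yes -o ConnectTimeout=5 {} /bin/true; then echo "{}"; fi' \
    ::: host1 host2 ... hostN > active_hosts.tmp && mv active_hosts.tmp active_hosts.slf
  sleep 10; done
Writing to a temporary file and then renaming it means GNU Parallel never
reads a half-written list, and BatchMode/ConnectTimeout keep ssh from hanging
on a password prompt or an unreachable host.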
And then start GNU Parallel as:
parallel --slf active_hosts.slf ...
Of course, jobs that were sent to unavailable servers before they were
detected as down will still fail. In that case, I think it is okay to re-run
GNU Parallel with --resume-failed (which requires that both runs use the same
--joblog file).
Best,
--
Douglas A. Augusto