First: Did you read the man page? Specifically:

  EXAMPLE: Running the same command on remote computers (unimplemented)

(The '(unimplemented)' is not really true anymore.)

  time ./src/parallel -j0 --nonall -S c,d,e,f "hostname ; uptime"

/Ole

On Mon, Jun 6, 2011 at 3:30 PM, Hans Schou <[email protected]> wrote:
> I might be doing something wrong, which makes it take longer. What is
> the right syntax for what I am trying to do? (I expect you can read my
> mind)
>
> $ time ./src/parallel "ssh {} 'hostname ; uptime'" ::: c d e f
> castor
> 13:18:12 up 48 days, 6:14, 1 user, load average: 0.00, 0.00, 0.00
> elvis
> 13:18:12 up 32 days, 2:59, 2 users, load average: 0.34, 0.34, 0.28
> frank
> 13:18:12 up 160 days, 1:45, 0 users, load average: 4.74, 5.83, 5.70
> daimi
> 15:18:12 up 47 days, 22:58, 0 users, load average: 0.00, 0.00, 0.00
>
> real    0m1.031s
> user    0m0.152s
> sys     0m0.032s
>
> $ time ./src/parallel --onall -S c,d,e,f "hostname ; uptime #" ::: 1
> castor
> 13:18:18 up 48 days, 6:14, 1 user, load average: 0.00, 0.00, 0.00
> elvis
> 13:18:19 up 32 days, 2:59, 2 users, load average: 0.32, 0.33, 0.28
> frank
> 13:18:19 up 160 days, 1:45, 0 users, load average: 4.76, 5.82, 5.70
> daimi
> 15:18:19 up 47 days, 22:58, 0 users, load average: 0.00, 0.00, 0.00
>
> real    0m1.291s
> user    0m0.564s
> sys     0m0.112s
>
>
> /hans
>
> 2011/5/26 Ole Tange <[email protected]>:
>> I have been convinced that GNU Parallel should have an --onall option.
>>
>> --onall (unimplemented)
>>     Run all the jobs on all computers given with --sshlogin. GNU
>>     parallel will log into --jobs number of computers in parallel
>>     and run one job at a time on the computer. The order of the
>>     jobs will not be changed, but some computers may finish
>>     before others.
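The -j0 --nonall invocation above amounts to logging in to every host at once and waiting for all of them. A minimal plain-shell sketch of that fan-out, runnable locally: echo stands in for ssh, and the host names c d e f from the thread are purely illustrative:

```shell
#!/bin/sh
# Sketch of 'parallel -j0 --nonall -S c,d,e,f "hostname ; uptime"':
# contact every host in parallel, wait for all to finish, then print
# the results in host order. 'echo' stands in for ssh so the sketch
# runs anywhere.
tmp=$(mktemp -d)
for host in c d e f; do
  # Real code: ssh "$host" 'hostname ; uptime' > "$tmp/$host" &
  echo "would run on $host: hostname ; uptime" > "$tmp/$host" &
done
wait    # -j0 means no limit on simultaneous logins
cat "$tmp"/c "$tmp"/d "$tmp"/e "$tmp"/f
```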
>>
>> I intend this:
>>
>>   parallel --onall -S eos,iris '(echo {3} {2}) | awk \{print\ \$2}' ::: a b c ::: 1 2 3
>>
>> to do:
>>
>>   parallel -S eos '(echo {3} {2}) | awk \{print\ \$2}' ::: a b c ::: 1 2 3
>>   parallel -S iris '(echo {3} {2}) | awk \{print\ \$2}' ::: a b c ::: 1 2 3
>>
>> In practice I believe this could be easily implemented by having GNU
>> Parallel call parallel like this:
>>
>>   parallel -a /tmp/abc -a /tmp/123 -j1 -S eos '(echo {3} {2}) | awk \{print\ \$2}'
>>   parallel -a /tmp/abc -a /tmp/123 -j1 -S iris '(echo {3} {2}) | awk \{print\ \$2}'
>>
>> where I simply put 'a\nb\nc\n' and '1\n2\n3\n' into /tmp/abc and
>> /tmp/123 respectively. As the arguments are already being put into
>> temporary files, the change may be small. I believe this would work
>> out fine.
>>
>> A small penalty is that if you run n jobs in parallel and have 2n
>> hosts, it will do all the jobs for hosts 1..n first and then all the
>> jobs for hosts n..2n. It will not run the first job on all hosts
>> first and then the second.
>>
>> - o -
>>
>> I have a harder time figuring out how to deal with stdin:
>>
>>   cat | parallel --onall -S eos,iris
>>
>> This should run whatever comes from cat on both eos and iris. While
>> the above is easy:
>>
>>   cat | tee >(ssh eos) >(ssh iris) >/dev/null
>>
>> it becomes harder if you have so many hosts (10000s) that you cannot
>> log in to all of them at the same time.
>>
>> Also this one is tricky, as you have to keep the {n} working:
>>
>>   cat | parallel --onall -S eos,iris '(echo {3} {2}) | awk \{print\ \$2}' :::: - ::: a b c ::: 1 2 3
>>
>> Maybe the solution is to accept that we have to read all of stdin
>> first, put that in a file and use -a as above?
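The per-host decomposition proposed above can be sketched in plain shell. This is a local illustration only: nested read loops stand in for the per-host parallel/ssh invocations, and the /tmp file names follow the ones in the mail. It shows that every host receives the full cross product of the two argument files:

```shell
#!/bin/sh
# Write the two argument sources to files, as the proposal says GNU
# Parallel already does internally (paths taken from the mail).
printf 'a\nb\nc\n' > /tmp/abc
printf '1\n2\n3\n' > /tmp/123

# Replay the full 3x3 cross product once per "host". A real
# implementation would run one 'parallel -a /tmp/abc -a /tmp/123 -j1
# -S $host ...' per host; local loops keep the sketch self-contained.
onall_sketch() {
  for host in eos iris; do
    while read x; do
      while read y; do
        echo "$host: {1}=$x {2}=$y"
      done < /tmp/123
    done < /tmp/abc
  done
}
onall_sketch
```

Note that the outer loop over hosts, not over arguments, is exactly the "small penalty" mentioned above: each host runs its whole job list before the next host is considered if logins are limited.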
>>
>> So the tricky one will be executed like:
>>
>>   # Stuff everything from stdin into a file
>>   cat > /tmp/stdin
>>   # Call parallel for each host in parallel
>>   parallel -a /tmp/stdin -a /tmp/abc -a /tmp/123 -j1 -S eos '(echo {3} {2}) | awk \{print\ \$2}' &
>>   parallel -a /tmp/stdin -a /tmp/abc -a /tmp/123 -j1 -S iris '(echo {3} {2}) | awk \{print\ \$2}' &
>>
>> The price will be that if you have a slow program generating the
>> stdin, that program has to finish before GNU Parallel can even begin
>> executing the jobs. Ideally GNU Parallel should start executing the
>> jobs that it already knows have to be run.
>>
>> One way of solving that would be having a job queue for each
>> sshlogin. That, however, looks like a big change to the code.
>>
>> - o -
>>
>> People wanting to use GNU Parallel for running the same commands on a
>> list of hosts: can you please describe your situations, so the design
>> will work well. At the very least I need to know:
>>
>> * number of hosts (can we just log in to all of them simultaneously?)
>> * number of commands to be run (is it just 1 or is it a script
>>   generated on stdin?)
>> * is speed an issue? (would it be OK to ssh for each command?)
>> * how are the commands generated? (is it a fast program, so it is OK
>>   to read everything before executing anything?)
>>
>>
>> /Ole
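The spool-then-fan-out plan above can be sketched as follows, again locally: printf stands in for the slow program generating stdin, and echo stands in for ssh. The cost discussed in the mail is visible in the structure: nothing after step 1 can start until the spool file is complete:

```shell
#!/bin/sh
# Step 1: stuff everything from stdin into a file. Nothing below
# starts until this finishes -- the price discussed in the mail.
spool=$(mktemp)
printf 'hostname\nuptime\n' > "$spool"   # stands in for: cat > "$spool"

# Step 2: replay the spooled commands on every host. Real code would
# background one 'parallel -a "$spool" ... -S $host' per host, as in
# the quoted sketch; echo keeps this runnable locally.
replay_on_all() {
  for host in eos iris; do
    while read cmd; do
      echo "$host would run: $cmd"
    done < "$spool"
  done
}
replay_on_all
```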
