This:
seq 300000 | parallel --pipe 'echo {1}-{2}; md5sum' ::: 1 2 ::: A B
gives:
1-A
1490f57cc8cedc0f3c9c6b26d29ac58a -
1-B
20fbf3f07d8e81789f78b3906d951f09 -
But I think it would be better if it gave:
1-A
1490f57cc8cedc0f3c9c6b26d29ac58a -
1-B
1490f57cc8cedc0f3c9c6b26d29ac58a -
2-A
1490f57cc8cedc0f3c9c6b26d29ac58a -
2-B
1490f57cc8cedc0f3c9c6b26d29ac58a -
1-A
20fbf3f07d8e81789f78b3906d951f09 -
1-B
20fbf3f07d8e81789f78b3906d951f09 -
2-A
20fbf3f07d8e81789f78b3906d951f09 -
2-B
20fbf3f07d8e81789f78b3906d951f09 -
So the input chunk is given to all combinations of the command.
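
To make the proposed semantics concrete, here is a rough emulation using ordinary Unix tools instead of GNU Parallel. It is only a sketch: each_chunk_all_combos is a hypothetical helper name, chunks are cut by line count (parallel --pipe cuts them by block size, --block), and the jobs run one after another rather than in parallel.

```shell
#!/bin/sh
# Sketch only: emulates the proposed behaviour, it is not GNU Parallel code.
# each_chunk_all_combos is a hypothetical helper name.
# Assumption: chunks are cut by line count, not by --block size.
each_chunk_all_combos() {
    lines_per_chunk=$1
    tmpdir=$(mktemp -d) || return 1
    # Split stdin into chunk files of at most $lines_per_chunk lines each.
    split -l "$lines_per_chunk" - "$tmpdir/chunk."
    for chunk in "$tmpdir"/chunk.*; do
        [ -e "$chunk" ] || continue   # no input: the glob matched nothing
        # Run every combination of the two argument lists on this chunk.
        for n in 1 2; do
            for l in A B; do
                echo "$n-$l"
                md5sum < "$chunk"
            done
        done
    done
    rm -rf "$tmpdir"
}

# Two chunks of 150000 lines, each given to all 4 combinations.
seq 300000 | each_chunk_all_combos 150000
```

The checksums differ from the ones above because the chunk boundaries differ, but the shape is the same: every chunk's checksum is printed once per combination.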
The code for retrying --pipe chunks already exists in --retries, so it
is probably doable.
It would mean some backwards-incompatible changes. In particular,
--cat/--fifo would have to be changed into something like:
seq 300000 | parallel --pipe --cat 'echo {} {1}-{2}; md5sum {cat}' ::: 1 2 ::: A B
(Currently, if {} is used with --cat, it is replaced with the name
of the temporary file. But if {} is to mean the same as it does
everywhere else in GNU Parallel, the {} used with --cat needs a new
name. One suggestion is {cat}.)
--roundrobin could work like --tee: all jobs must be started at the
same time, except that every chunk is given to only one job rather
than to all jobs:
seq 300000 | parallel --pipe --roundrobin --cat 'echo {} {1}-{2}; md5sum {cat}' ::: 1 2 ::: A B
seq 300000 | parallel --pipe --tee --cat 'echo {} {1}-{2}; md5sum {cat}' ::: 1 2 ::: A B
But should --jobs then simply be ignored when used with --pipe and :::?
I would expect the --tee example here to run 4 jobs in parallel and
give all input to all 4 of them. The --roundrobin example would also
start 4 jobs, but the input would be spread among them.
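
The difference between the two distributions can be modelled in plain shell. This is only a sketch under stated assumptions: distribute is a hypothetical helper name, chunks are cut by line count, chunks are handed out in strict rotation (a simplification), and nothing actually runs in parallel. It only counts how many input lines each of the 4 jobs would see.

```shell
#!/bin/sh
# Sketch only: models how input is spread over jobs, it is not GNU Parallel code.
# distribute is a hypothetical helper name.
# Usage: ... | distribute tee|roundrobin LINES_PER_CHUNK
distribute() {
    mode=$1
    lines_per_chunk=$2
    tmpdir=$(mktemp -d) || return 1
    # Split stdin into chunk files of at most $lines_per_chunk lines each.
    split -l "$lines_per_chunk" - "$tmpdir/chunk."
    i=0
    for chunk in "$tmpdir"/chunk.*; do
        [ -e "$chunk" ] || continue   # no input: the glob matched nothing
        slot=0
        for combo in 1-A 1-B 2-A 2-B; do
            # tee: every job receives every chunk.
            # roundrobin: chunk i goes only to job number (i mod 4).
            if [ "$mode" = tee ] || [ $((i % 4)) -eq "$slot" ]; then
                cat "$chunk" >> "$tmpdir/in.$combo"
            fi
            slot=$((slot + 1))
        done
        i=$((i + 1))
    done
    for combo in 1-A 1-B 2-A 2-B; do
        # All 4 jobs are started, even one that received no chunk.
        : >> "$tmpdir/in.$combo"
        echo "$combo $(wc -l < "$tmpdir/in.$combo" | tr -d ' ') lines"
    done
    rm -rf "$tmpdir"
}

seq 8 | distribute tee 2          # each of the 4 jobs sees all 8 lines
seq 8 | distribute roundrobin 2   # each of the 4 jobs sees 2 lines
```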
Comments?
/Ole