--pipe and :::

Ole Tange Sun, 09 Jan 2022 09:15:45 -0800

This:

    seq 300000 | parallel --pipe 'echo {1}-{2}; md5sum' ::: 1 2 ::: A B


gives:

    1-A
    1490f57cc8cedc0f3c9c6b26d29ac58a  -
    1-B
    20fbf3f07d8e81789f78b3906d951f09  -

But I think it would be better if it gave:

    1-A
    1490f57cc8cedc0f3c9c6b26d29ac58a  -
    1-B
    1490f57cc8cedc0f3c9c6b26d29ac58a  -
    2-A
    1490f57cc8cedc0f3c9c6b26d29ac58a  -
    2-B
    1490f57cc8cedc0f3c9c6b26d29ac58a  -
    1-A
    20fbf3f07d8e81789f78b3906d951f09  -
    1-B
    20fbf3f07d8e81789f78b3906d951f09  -
    2-A
    20fbf3f07d8e81789f78b3906d951f09  -
    2-B
    20fbf3f07d8e81789f78b3906d951f09  -

So the input chunk is given to all combinations of the command.
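
To make the proposed semantics concrete, here is a minimal sketch in plain shell that emulates them without GNU Parallel. It rests on several assumptions: it runs sequentially, it splits by line count instead of by block size, and fan_out_chunks is a hypothetical helper name, not anything GNU Parallel provides. The hashes will therefore not match the ones above; the point is only that every chunk is paired with every combination.

```shell
#!/bin/sh
# Sketch only: emulate "every chunk goes to every combination" sequentially.
# fan_out_chunks is a hypothetical helper, not part of GNU Parallel.
fan_out_chunks() {
    lines_per_chunk=$1
    tmpdir=$(mktemp -d) || return 1
    # Assumption: split by line count; GNU Parallel splits by block size.
    split -l "$lines_per_chunk" - "$tmpdir/chunk."
    for chunk in "$tmpdir"/chunk.*; do
        [ -e "$chunk" ] || continue      # no input at all: nothing to do
        for a in 1 2; do
            for b in A B; do
                echo "$a-$b"
                md5sum < "$chunk"        # same chunk, once per combination
            done
        done
    done
    rm -rf "$tmpdir"
}

seq 300000 | fan_out_chunks 150000
```

With two chunks and four combinations this prints eight label/hash pairs, in the same order as the listing above.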

The code for retrying --pipe chunks already exists in --retries, so it
is probably doable.

It would mean some backwards-incompatible changes.

--cat/--fifo would have to change into something like:

    seq 300000 | parallel --pipe --cat 'echo {} {1}-{2}; md5sum {cat}' \
      ::: 1 2 ::: A B

Currently, if {} is used with --cat, it is replaced with the name of
the temporary file. But if {} should mean the same as it does
everywhere else in GNU Parallel, the {} used by --cat needs a new
name. A suggestion would be {cat}.
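
The difference between the two placeholders can be sketched in plain shell. This is only an illustration of the proposed naming, not of how GNU Parallel does its substitution: run_with_cat is a hypothetical helper, it handles a single argument, and it substitutes with sed, which would break on arguments containing characters special to sed.

```shell
#!/bin/sh
# Sketch only: {cat} becomes the temporary file holding the chunk,
# while {} keeps its usual meaning (the argument).
# run_with_cat is a hypothetical helper, not part of GNU Parallel.
run_with_cat() {
    template=$1
    arg=$2
    tmp=$(mktemp) || return 1
    cat > "$tmp"                         # store the chunk from stdin
    # Assumption: naive textual substitution, fine for simple arguments.
    cmd=$(printf '%s' "$template" | sed -e "s|{cat}|$tmp|g" -e "s|{}|$arg|g")
    sh -c "$cmd"
    status=$?
    rm -f "$tmp"
    return $status
}

seq 10 | run_with_cat 'echo {}; wc -l < {cat}' 1-A
```

Here {} expands to the argument 1-A and {cat} to the file containing the ten input lines.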

--roundrobin could work like --tee in that all jobs must be started at
the same time, except that every chunk is given to only one job, not
to all jobs:

    seq 300000 | parallel --pipe --roundrobin --cat \
      'echo {} {1}-{2}; md5sum {cat}' ::: 1 2 ::: A B
    seq 300000 | parallel --pipe --tee --cat \
      'echo {} {1}-{2}; md5sum {cat}' ::: 1 2 ::: A B

But should --jobs then just be ignored when used with --pipe and :::?
I would expect the --tee example here to run 4 jobs in parallel and
give all input to all 4 of them. The --roundrobin example would also
start 4 jobs, but the input would be spread between them.
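
The two distribution modes can be compared with a small plain-shell sketch that only counts how many input lines each of the 4 jobs would receive. The assumptions are significant: distribute is a hypothetical helper, chunks are split by line count, nothing actually runs in parallel, and roundrobin is modelled as a strict rotation, which the real option may not follow.

```shell
#!/bin/sh
# Sketch only: count the lines each job would see in "tee" mode
# (every chunk to every job) and "roundrobin" mode (each chunk to one job).
# distribute is a hypothetical helper, not part of GNU Parallel.
distribute() {
    mode=$1
    lines_per_chunk=$2
    tmpdir=$(mktemp -d) || return 1
    split -l "$lines_per_chunk" - "$tmpdir/chunk."
    set -- 1-A 1-B 2-A 2-B               # the 4 combinations, one job each
    i=0
    for chunk in "$tmpdir"/chunk.*; do
        [ -e "$chunk" ] || continue
        n=0
        for job in "$@"; do
            # tee: every job gets the chunk; roundrobin: only job i mod 4.
            if [ "$mode" = tee ] || [ $((i % $#)) -eq "$n" ]; then
                cat "$chunk" >> "$tmpdir/in.$job"
            fi
            n=$((n + 1))
        done
        i=$((i + 1))
    done
    for job in "$@"; do
        touch "$tmpdir/in.$job"          # jobs that got nothing report 0
        echo "$job $(wc -l < "$tmpdir/in.$job" | tr -d ' ')"
    done
    rm -rf "$tmpdir"
}

seq 300000 | distribute tee 25000
seq 300000 | distribute roundrobin 25000
```

With 12 chunks of 25000 lines, tee mode reports 300000 lines for each job, and roundrobin mode reports 75000 lines for each job.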

Comments?


/Ole
