Currently GNU parallel will output the results of the jobs either in the
order in which they are completed or (with --keep-order) in the order
they were specified. How would you feel about adding a --reduce option
that would specify a command to use in order to combine the results?
The command would receive as arguments the files (or file descriptors
via /dev/fd/) holding the output of each job, and its own output would
become the final output of parallel.
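
To make the intended semantics concrete, here is a minimal sketch in
plain POSIX shell. It does not use GNU parallel at all, and the function
name reduce_jobs and its calling convention are hypothetical, invented
only for illustration. It runs one job per argument in the background,
captures each job's output in a temporary file, and then hands those
files to the reducer in argument order.

```sh
# Hypothetical emulation of the proposed --reduce semantics (not GNU parallel code).
# Usage: reduce_jobs REDUCER COMMAND ARG...
# REDUCER and COMMAND are shell command strings; each ARG is appended to COMMAND.
# The body runs in a subshell so that its variables and temp files stay private.
reduce_jobs() (
    reducer=$1
    cmd=$2
    shift 2
    tmpdir=$(mktemp -d) || exit 1
    i=0
    for arg in "$@"; do
        # Run each job in the background, capturing its stdout in its own file.
        sh -c "$cmd \"\$1\"" sh "$arg" >"$tmpdir/job.$i" &
        i=$((i + 1))
    done
    wait    # Block until every background job has finished.
    # Rebuild the positional parameters as the output files, in job order.
    set --
    j=0
    while [ "$j" -lt "$i" ]; do
        set -- "$@" "$tmpdir/job.$j"
        j=$((j + 1))
    done
    # The reducer's stdout becomes the final output.
    sh -c "$reducer \"\$@\"" sh "$@"
    status=$?
    rm -rf "$tmpdir"
    exit "$status"
)
```

With this sketch, reduce_jobs cat sort a.txt b.txt prints the sorted
contents of a.txt followed by those of b.txt, while
reduce_jobs 'sort -m' sort a.txt b.txt prints a single merged, sorted
stream. A real implementation inside parallel would presumably also
limit the number of concurrent jobs, which this sketch does not.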

Here are some examples.

    parallel --reduce cat

is the same as parallel --keep-order.

    parallel --pipepart --reduce 'sort -m' sort :::: file

will sort the file in parallel and then merge-sort the parts.

    <directories parallel --reduce 'tar --concatenate' tar cf -

will create a single tar file from the parallel running ones.

    parallel --pipepart --reduce dgsh-merge-sum \
        "tr -s ' \t\n\r\f' '\n' | sort | uniq -c" :::: file

will count the number of times each word appears in the specified input
file. (The dgsh-merge-sum command sums sorted output from uniq -c; see
https://github.com/dspinellis/dgsh/blob/master/core-tools/src/dgsh-merge-sum.pl.)
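
As a rough illustration of what the word-count reduction computes, the
following sketch works on chunk files that have already been split by
hand, rather than on parts produced by --pipepart. Both function names
are hypothetical. merge_sum is only a stand-in for dgsh-merge-sum: it
sums the counts with an awk array and re-sorts the result, so unlike
the real tool it does not take advantage of its inputs being sorted.

```sh
# Hypothetical stand-in for dgsh-merge-sum: sums the per-word counts found in
# "uniq -c"-style files, then prints "count word" lines ordered by word.
merge_sum() {
    awk '{ count[$2] += $1 } END { for (w in count) print count[w], w }' "$@" |
        LC_ALL=C sort -k2
}

# Count the words of each chunk file concurrently, then combine the partial counts.
# Usage: count_words CHUNK...   (at least one chunk file is expected)
count_words() (
    tmpdir=$(mktemp -d) || exit 1
    i=0
    for chunk in "$@"; do
        # Same per-part pipeline as in the example above, one per chunk.
        tr -s ' \t\n\r\f' '\n' <"$chunk" | sort | uniq -c >"$tmpdir/part.$i" &
        i=$((i + 1))
    done
    wait    # Block until every partial count has been written.
    merge_sum "$tmpdir"/part.*
    status=$?
    rm -rf "$tmpdir"
    exit "$status"
)
```

For two chunks containing "a b a" and "b c", count_words prints
"2 a", "2 b" and "1 c" on separate lines: the count for "b" is the sum
of the two partial counts, which is the job the reducer has to do.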
--
Diomidis