Currently GNU parallel outputs the results of the jobs either in the order in which they complete or (with --keep-order) in the order in which they were specified. How would you feel about adding a --reduce option that specifies a command for combining the results? The command would take as arguments the files (or file descriptors via /dev/fd/) holding the output generated by each job, and would produce the final output of parallel.

Here are some examples.

parallel --reduce cat
is the same as parallel --keep-order

parallel --pipepart --reduce 'sort -m' sort :::: file
will sort the file in parallel and then merge-sort the parts.
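
To make the intended semantics concrete, here is a rough emulation of this example that uses plain shell background jobs rather than GNU parallel itself. It is only a sketch of the idea, not a proposed implementation: reduce_sort is a hypothetical helper name, the line-based chunking with split stands in for --pipepart, and the chunk size is an arbitrary default.

```shell
#!/bin/sh
# Sketch: sort chunks of a file concurrently, then merge the sorted chunks.
# reduce_sort is a hypothetical helper, not part of GNU parallel.
# Usage: reduce_sort FILE [LINES_PER_CHUNK]
reduce_sort() {
    input=$1
    lines=${2:-1000}
    tmpdir=$(mktemp -d) || return 1
    # Split the input into chunks of at most $lines lines each.
    split -l "$lines" "$input" "$tmpdir/part."
    # "Map" step: sort every chunk as a background job.
    for part in "$tmpdir"/part.*; do
        sort "$part" > "$part.sorted" &
    done
    wait
    # "Reduce" step: the command receives one file per job, as --reduce would.
    sort -m "$tmpdir"/part.*.sorted
    rm -rf "$tmpdir"
}
```

Calling reduce_sort file should print the same lines as sort file; the point is that the reducing command sees one output file per job.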

<directories parallel --reduce 'tar --concatenate' tar cf -
will create a single tar file from the archives produced by the parallel jobs.

parallel --pipepart --reduce dgsh-merge-sum \
  "tr -s ' \t\n\r\f' '\n' | sort | uniq -c" :::: file
will count the number of times each word appears in the specified input file. (The dgsh-merge-sum command sums sorted output from uniq -c; see https://github.com/dspinellis/dgsh/blob/master/core-tools/src/dgsh-merge-sum.pl.)
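
The same kind of emulation works for the word count, again with plain shell jobs instead of parallel. This is a sketch under stated assumptions: wordcount_reduce is a hypothetical helper name, split stands in for --pipepart, and a small awk script stands in for dgsh-merge-sum (it sums the counts per word but, unlike the real tool, does not rely on the input being sorted).

```shell
#!/bin/sh
# Sketch: count words per chunk concurrently, then sum the per-chunk counts.
# wordcount_reduce is a hypothetical helper, not part of GNU parallel or dgsh.
# Usage: wordcount_reduce FILE [LINES_PER_CHUNK]
wordcount_reduce() {
    input=$1
    lines=${2:-1000}
    tmpdir=$(mktemp -d) || return 1
    # Split the input into chunks of at most $lines lines each.
    split -l "$lines" "$input" "$tmpdir/part."
    # "Map" step: one word per line, then count the occurrences in each chunk.
    for part in "$tmpdir"/part.*; do
        tr -s ' \t\n\r\f' '\n' < "$part" | sort | uniq -c > "$part.counts" &
    done
    wait
    # "Reduce" step: awk stands in for dgsh-merge-sum (an assumption);
    # it adds up the counts for each word and the result is sorted by word.
    cat "$tmpdir"/part.*.counts |
        awk '{ sum[$2] += $1 } END { for (w in sum) print sum[w], w }' |
        sort -k2
    rm -rf "$tmpdir"
}
```

Each output line is a count followed by a word, with the counts summed across all chunks.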

--
Diomidis
