On Mon, Mar 7, 2022 at 4:22 AM Saint Michael <[email protected]> wrote:
>
> So how would I submit the contents of many files to parallel, without
> concatenating them?
Why do you see this as a problem? If you are going to start a process
for each line of input, cat will not slow things down.
You _can_ avoid the cat, but it seems a bit silly:
< file1.csv parallel --colsep ',' function "{1} {2} {3} {4} {5} {6} {7}"
< file2.csv parallel --colsep ',' function "{1} {2} {3} {4} {5} {6} {7}"
< file3.csv parallel --colsep ',' function "{1} {2} {3} {4} {5} {6} {7}"
< file4.csv parallel --colsep ',' function "{1} {2} {3} {4} {5} {6} {7}"
< file5.csv parallel --colsep ',' function "{1} {2} {3} {4} {5} {6} {7}"
< file6.csv parallel --colsep ',' function "{1} {2} {3} {4} {5} {6} {7}"
< file7.csv parallel --colsep ',' function "{1} {2} {3} {4} {5} {6} {7}"
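(The seven lines above can of course be written as a loop; this is
just shorthand for the same per-file runs, assuming your files match
file*.csv and with 'function' still standing in for your own command:)

for f in file*.csv; do
  < "$f" parallel --colsep ',' function "{1} {2} {3} {4} {5} {6} {7}"
done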
And I think you will find the total run time is longer.
> The function needs to process each file line by line.
> I am sure there must be a better way.
> Why concatenate them at all?
Because you want to feed them into GNU Parallel as a single input source.
cat is way faster than GNU Parallel will ever be, so please explain
why you see cat as a problem. You can compare the two yourself:
seq 10000 > file
time cat file >/dev/null
time parallel echo < file >/dev/null
> There is no relationship between a line and the next line.
If you can change your function to read from stdin (standard input),
then we can do something way more efficient:
myfunc() { wc; }
export -f myfunc
parallel --pipepart --block -1 myfunc :::: *.csv
--pipepart has some limitations, but it is insanely fast (almost as
fast as a parallelized cat).
> Maybe a new feature?
If the above does not answer your question, then it is unclear to me
what you really want to do.
If you read https://stackoverflow.com/help/minimal-reproducible-example,
you will see how to make it easier to help you.
/Ole