Thanks to all who made suggestions. Using parallel for this task did improve performance substantially.
Dan

On Sun, 2011-07-17 at 09:08 -0500, Ole Tange wrote:
> On Fri, Jul 15, 2011 at 8:39 PM, Dan Kokron <[email protected]> wrote:
>
> > I have a bunch (~200) small (1K to 100K) binary files that I want to
> > 'cat' into a larger file. I usually use "cat pe* > diag", but this
> > takes considerable time on the Lustre file system we are using. I am
> > exploring using GNU parallel for this task but have run into some
> > difficulties. Basically the resulting diag file only contains one of
> > the input files.
> >
> > I've tried the following variations.
> >
> > parallel "cat {} >diag_amsua_n18_03.2011041700" ::: pe*
> > parallel cat {} ">"diag_amsua_n18_03.2011041700 ::: pe*
> > ls pe* | parallel cat {} ">"diag_amsua_n18_03.2011041700
> > ls pe* | parallel -j4 -k cat {} ">"diag_amsua_n18_03.2011041700
> > ls pe* | parallel -k cat {} ">"diag_amsua_n18_03.2011041700
> > parallel -j4 -k "cat {} >diag_amsua_n18_03.2011041700" ::: pe*
>
> You are _so_ close.
>
> parallel cat >diag_all ::: pe*
>
> It is probably more readable for UNIX users to write this (It does
> exactly the same):
>
> parallel cat ::: pe* >diag_all
>
> Or if you prefer the order kept:
>
> parallel -k cat ::: pe* >diag_all
>
> I have no experience with Lustre, but I would imagine that Lustre is
> slow at getting the first byte and after that it is pretty fast. Also
> the reason why it is slow is because it is waiting. If that is the
> case then it will be OK to run a lot of cats simultaneously:
>
> parallel -j0 cat ::: pe* >diag_all
>
> These sections of the man page touches the subject of using the output
> from GNU Parallel:
>
> EXAMPLE: Rewriting a for-loop and a while-read-loop
> EXAMPLE: Rewriting nested for-loops
> EXAMPLE: Keep order of output same as order of input
> EXAMPLE: Processing a big file using more cores
>
> If you believe it can be explained better please post your suggestion
> for discussion here.
>
>
> /Ole

-- 
Dan Kokron
Global Modeling and Assimilation Office
NASA Goddard Space Flight Center
Greenbelt, MD 20771
[email protected]
Phone: (301) 614-5192
Fax: (301) 614-5304
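
For anyone finding this thread later, here is a rough sketch of why the quoted ">" variants leave only one input file in the result. When the redirection is part of the command that each job runs, every job reopens and truncates the same target. When the redirection sits outside, the target is opened once and all output goes into that one stream. The sketch uses a plain sh loop as a stand-in for GNU parallel, so it shows only the redirection behaviour, not the parallel speedup. The file names pe0..pe2, diag_inner and diag_outer are made up for the illustration.

```shell
#!/bin/sh
# Scratch directory with three tiny stand-in input files (hypothetical names).
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf 'A\n' > pe0
printf 'B\n' > pe1
printf 'C\n' > pe2

# Redirection inside each job: every child shell opens diag_inner with ">",
# which truncates it, so earlier output is lost. This mirrors the effect of
# passing ">" quoted to parallel, where it becomes part of each job.
for f in pe*; do
    sh -c "cat $f > diag_inner"
done

# Redirection outside the jobs: diag_outer is opened once for the whole loop
# and every cat writes into that same stream. This mirrors the unquoted form
# `parallel -k cat ::: pe* > diag_all`.
for f in pe*; do
    cat "$f"
done > diag_outer

wc -c < diag_inner   # 2 bytes: only the last file (pe2) survived
wc -c < diag_outer   # 6 bytes: all three files are present
```

With real parallel jobs the surviving file in the first case depends on which job finishes last, so it will not necessarily be the last name in the glob.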
