This is more a curiosity question. I have written a bash script which reads a set of bzip2-compressed files. For each record in a file, it writes the record into a file whose name is based on the first two "words" in the record and the "generation number" from the input file name. The input is extremely large: 47 files, each of which would be around 120 GB to 180 GB expanded, or 23 to 27 million lines. There are probably only around 50 or so (I don't know exactly) possible combinations of the "words". I'm wondering if rewriting the script in either Python or Perl (both essentially interpreted) would be worth my while. Or should I go with a compiled language such as C/C++? Or, lastly, is the choice basically irrelevant, because with the extremely large number of records and the minimal per-record processing, I/O will dominate the application?
If you're interested, the bash script looks like:

    #!/bin/bash
    for i in irradu00.g*.bz2; do
        gen=${i#irradu00.}      # remove prefix
        gen=${gen%.bz2}         # remove suffix, leaving the generation number
        bzcat "$i" |
        while read -r line; do
            fn=${line%% *}      # first word: remove everything from the first space on
            ft=${line:9:8}      # second word: 8 characters starting at offset 9
            ft=${ft%% *}        # ... with trailing blanks removed
            echo "${line}" >>"${fn}.${ft}.${gen}.tx2"
        done
    done

If you're curious about the "set ${line}", I just couldn't figure out a way to parse
--
Maranatha! <><
John McKown

----------------------------------------------------------------------
For LINUX-390 subscribe / signoff / archive access instructions,
send email to lists...@vm.marist.edu with the message: INFO LINUX-390 or visit
http://www.marist.edu/htbin/wlvindex?LINUX-390
----------------------------------------------------------------------
For more information on Linux on System z,
visit http://wiki.linuxvm.org/
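For comparison, here is a rough sketch of what the Python rewrite being asked about might look like. It is an assumption-laden illustration, not the poster's code: it assumes the same `irradu00.gNNN.bz2` naming and the same fixed-column record layout the bash script uses. One deliberate difference is that it keeps each output file open in a dictionary instead of reopening it for every record, which is the hidden cost of the `>>` redirection inside the bash loop:

```python
import bz2
import glob
import os

def split_records(pattern="irradu00.g*.bz2"):
    """Split each record into a file named from its first two words
    plus the generation number taken from the input file name.
    (Sketch only; assumes the naming/layout described in the post.)"""
    for path in sorted(glob.glob(pattern)):
        # e.g. irradu00.g001.bz2 -> generation "g001" (assumed naming)
        gen = os.path.basename(path)[len("irradu00."):-len(".bz2")]
        handles = {}  # keep output files open rather than reopening per record
        try:
            with bz2.open(path, "rt") as f:
                for line in f:
                    fn = line.split(" ", 1)[0]        # first word
                    ft = line[9:17].split(" ", 1)[0]  # 8 chars at offset 9, trailing blanks dropped
                    out = f"{fn}.{ft}.{gen}.tx2"
                    if out not in handles:
                        handles[out] = open(out, "a")
                    handles[out].write(line)
        finally:
            for h in handles.values():
                h.close()
```

With only ~50 word combinations, the handle dictionary stays small, so the per-record work reduces to two string slices and one buffered write; whether that beats the bash version in practice still depends mostly on how fast bzip2 decompression and the disk can go.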