Many thanks for that Perl code. I've taken it and will see how fast it is.

On Wed, Jan 30, 2013 at 7:55 AM, Malcolm Beattie <beatt...@uk.ibm.com> wrote:
> John McKown writes:
<snip>
> Perl (and Python) aren't simply interpreted. In the case of perl, it
> compiles the source into an internal op tree (rather like bytecode)
> while performing a decent amount of cheap optimisation (peephole
> optimisation mostly) and then runs that internal structure. Python
> will do something similar but the internal representation is
> different. Most of this isn't relevant to your situation here though.
>
<snip>
> It's not I/O that is dominating in your implementation below, it is
> (as others have spotted) the opening and closing of the relevant file
> on every single line of input. Either Perl or Python will let you
> remove this cost entirely. In fact, on your bash script below, bash
> seems to read each character from its uncompressed input in a
> separate read syscall which is dreadful but may be fixable.
>
<snip>
> This Perl program (or analogue in Python or whatever) is likely to
> give (and strace on some small test data shows) much, much better
> behaviour for larger input files:
>
> #!/usr/bin/perl
> use strict;
> use IO::File;
>
> # Cache of open output handles, keyed by file name
> my %fhcache;
>
> # Open a file for append and remember the handle
> sub newfh {
>     my $filename = shift;
>     my $fh = IO::File->new($filename, "a") or die "$filename: $!\n";
>     $fhcache{$filename} = $fh;
>     return $fh;
> }
>
> # Return the cached handle, opening the file only on first use
> sub getfh {
>     my $filename = shift;
>     return $fhcache{$filename} || newfh($filename);
> }
>
> foreach my $infile (<irradu00.g*.bz2>) {
>     open(IN, "bzcat $infile|") or die "bzcat $infile: $!\n";
>     # The generation is the part between the first dot and ".bz2"
>     my ($gen) = $infile =~ /\.(.*)\.bz2/;
>     while (<IN>) {
>         my ($fn, $ft) = split;
>         getfh("$fn.$ft.$gen.tx2")->print($_);
>     }
>     close(IN);
> }
>
> --Malcolm
>
> --
> Malcolm Beattie
> Mainframe Systems and Software Business, Europe
> IBM UK
>
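
As a rough way to see the open/close cost in isolation, something like the sketch below could be used. It is only an illustration: the record layout, the FNn.FT.out file names and the line count are made up, and the output goes to temporary directories rather than the real data. Only the two strategies (reopen per line versus one cached handle per file) mirror what is discussed above, and the timings will vary by system.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::File;
use File::Temp qw(tempdir);
use Time::HiRes ();

# Hypothetical test data: 20000 lines spread over 5 output files.
my @lines = map { sprintf("FN%d FT record %d\n", $_ % 5, $_) } 1 .. 20000;

# Strategy 1: open and close the output file for every input line.
sub write_reopen {
    my ($dir) = @_;
    for my $line (@lines) {
        my ($fn, $ft) = split ' ', $line;
        my $name = "$dir/$fn.$ft.out";
        my $fh = IO::File->new($name, "a") or die "$name: $!\n";
        $fh->print($line);
        $fh->close;
    }
}

# Strategy 2: keep one open handle per output file for the whole run.
sub write_cached {
    my ($dir) = @_;
    my %fhcache;
    for my $line (@lines) {
        my ($fn, $ft) = split ' ', $line;
        my $name = "$dir/$fn.$ft.out";
        my $fh = $fhcache{$name};
        if (!$fh) {
            $fh = IO::File->new($name, "a") or die "$name: $!\n";
            $fhcache{$name} = $fh;
        }
        $fh->print($line);
    }
    # Close explicitly so buffered data is flushed before we return.
    $_->close for values %fhcache;
}

# Time each strategy against its own fresh temporary directory.
for my $case ([reopen => \&write_reopen], [cached => \&write_cached]) {
    my ($label, $code) = @$case;
    my $dir = tempdir(CLEANUP => 1);
    my $t0  = Time::HiRes::time();
    $code->($dir);
    printf "%-7s %.3f s\n", $label, Time::HiRes::time() - $t0;
}
```

Running the same script under strace -c should also show the difference in the number of open and close calls between the two strategies.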
--
Maranatha! <><
John McKown