Many thanks for that Perl code. I've taken it and will see how fast it is.

On Wed, Jan 30, 2013 at 7:55 AM, Malcolm Beattie <beatt...@uk.ibm.com> wrote:
> John McKown writes:
<snip>
> Perl (and Python) aren't simply interpreted. In the case of perl, it
> compiles the source into an internal op tree (rather like bytecode)
> while performing a decent amount of cheap optimisation (peephole
> optimisation mostly) and then runs that internal structure. Python
> will do something similar, but the internal representation is
> different. Most of this isn't relevant to your situation here, though.
>
<snip>
> It's not I/O that is dominating in your implementation below; it is
> (as others have spotted) the opening and closing of the relevant file
> on every single line of input. Either Perl or Python will let you
> remove this cost entirely. In fact, in your bash script below, bash
> seems to read each character of its uncompressed input with a
> separate read syscall, which is dreadful but may be fixable.
>
<snip>
> This Perl program (or an analogue in Python or whatever) is likely to
> behave much, much better on larger input files, as strace on some
> small test data already shows:
>
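> #!/usr/bin/perl
> use strict;
> use warnings;
> use IO::File;
>
> # Cache of open output handles, keyed by filename, so that each
> # output file is opened once rather than once per input line.
> my %fhcache;
>
> sub newfh {
>         my $filename = shift;
>         my $fh = IO::File->new($filename, "a") or die "$filename: $!\n";
>         $fhcache{$filename} = $fh;
>         return $fh;
> }
>
> sub getfh {
>         my $filename = shift;
>         return $fhcache{$filename} || newfh($filename);
> }
>
> foreach my $infile (glob("irradu00.g*.bz2")) {
>         # List-form pipe open runs bzcat directly, without a shell.
>         open(my $in, "-|", "bzcat", $infile) or die "bzcat $infile: $!\n";
>         my ($gen) = $infile =~ /\.(.*)\.bz2/;
>         while (<$in>) {
>                 my ($fn, $ft) = split;
>                 getfh("$fn.$ft.$gen.tx2")->print($_);
>         }
>         # Closing a pipe returns false if bzcat exited non-zero.
>         close($in) or warn "bzcat $infile exited with status $?\n";
> }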
>
> --Malcolm
>
> --
> Malcolm Beattie
> Mainframe Systems and Software Business, Europe
> IBM UK
>
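
For anyone who wants to try the idea without bzcat or real data, here is a minimal self-contained sketch of the same handle-caching technique. It is only an illustration under stated assumptions: the sub name split_by_key, the sample records, and the "g0001.tx2" suffix are hypothetical stand-ins, input comes from an in-memory array instead of a bzcat pipe, output goes to a temporary directory, and all cached handles are closed explicitly at the end.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use IO::File;

# Hypothetical helper: append each input line to "<field1>.<field2>.<suffix>"
# under $dir, opening each output file at most once. Returns the number of
# distinct files opened, which is the cost the cache keeps small.
sub split_by_key {
    my ($lines, $dir, $suffix) = @_;
    my %fhcache;    # output filename => open IO::File handle
    for my $line (@$lines) {
        my ($fn, $ft) = split ' ', $line;
        next unless defined $ft;    # skip lines with fewer than two fields
        my $name = File::Spec->catfile($dir, "$fn.$ft.$suffix");
        # Open on first use only; later lines reuse the cached handle.
        my $fh = $fhcache{$name} ||= IO::File->new($name, 'a')
            or die "$name: $!\n";
        $fh->print($line);
    }
    # Close every cached handle once, after all input has been processed.
    for my $name (keys %fhcache) {
        $fhcache{$name}->close or die "close $name: $!\n";
    }
    return scalar keys %fhcache;
}

# Hypothetical sample data standing in for one decompressed input file.
my @input = (
    "PROFILE EXEC some data\n",
    "README TXT other data\n",
    "PROFILE EXEC more data\n",
);
my $dir = tempdir(CLEANUP => 1);
my $opened = split_by_key(\@input, $dir, 'g0001.tx2');
print "opened $opened files for ", scalar(@input), " lines\n";
# prints "opened 2 files for 3 lines"
```

With real data the number of open() calls drops from one per input line to one per distinct output file. If there are more distinct output files than the process's open-file limit allows, some handles would have to be closed and reopened along the way.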

--
Maranatha! <><
John McKown

----------------------------------------------------------------------
For LINUX-390 subscribe / signoff / archive access instructions,
send email to lists...@vm.marist.edu with the message: INFO LINUX-390 or visit
http://www.marist.edu/htbin/wlvindex?LINUX-390
----------------------------------------------------------------------
For more information on Linux on System z, visit
http://wiki.linuxvm.org/