Rob Das wrote:
> Hi Rob:
>
> I'm trying to merge a whole bunch of files (possibly tens of thousands) into
> one file. Here's my code (with the error checking removed for readability):
>
> opendir(INDIR, $indir);
> @logfile=grep(/$mask/i, readdir INDIR);
> closedir(INDIR);
> $nbrfiles=@logfile; # number of files matching mask
> open(OUTFILE, ">$outdir$outfile");
> for ( $ctr=0; $ctr<$nbrfiles; $ctr++ ) {
>     open(INFILE, "$indir$logfile[$ctr]");
>     print OUTFILE <INFILE>;
>     close(INFILE);
> }
> close(OUTFILE);
>
> Then I'm writing a file from the @logfile which then gets processed to
> delete the files. It's done this way for restartability, so if I fail after
> creating the merged file, I can restart and know which files need deleting.
>
> I'd appreciate an alternative to reading the entire file - you're right, I
> don't need the whole thing in memory at the same time. However, wouldn't
> processing the file one record at a time be much slower? I'll go that route
> if I have to...

Ultimately the file has to be processed one record at a time. The difference
is whether you read all the records into memory first and then write them
out, or read one record and write it straight out again. The number of reads
and writes is the same either way, so you're better off doing it one record
at a time, if only to stop the system falling back on the swapfile to hold
all your data.

What will improve the speed is increasing the amount read in one go to reduce
the number of reads and writes. That's where your input record size comes in.

I'd write it like this. The files are closed implicitly when the filehandles
go out of scope. Try it this way and then, if you want to, change the record
size and see if it produces a useful speed increase.

HTH,

Rob


  use strict;                # Always
  use warnings;              # Usually

  my ($indir, $mask);        # Need
  my ($outdir, $outfile);    #   initialising

  my @logfile = do {
    opendir my $dirfh, $indir or die $!;
    grep /$mask/i, readdir $dirfh;
  };

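  {
    # readdir returns bare names, so prefix the directory for <> to open them
    local @ARGV = map "$indir$_", @logfile;
    open my $outfh, '>', "$outdir$outfile" or die $!;
    while (<>) {
      # $_ must be explicit: "print $outfh" alone prints the handle itself
      print $outfh $_ or die $!;
    }
  }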
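
If you want to try a larger record size, here is a minimal sketch rather
than a definitive version. The merge_files helper and the 64 KB figure in
the usage line are assumptions made up for illustration. Setting $/ to a
reference to an integer makes <$infh> return fixed-size blocks instead of
lines.

```perl
use strict;
use warnings;

# Hypothetical helper: copy each input file to $outpath in turn,
# reading $recsize bytes at a time instead of line by line.
sub merge_files {
  my ($outpath, $recsize, @inputs) = @_;

  open my $outfh, '>', $outpath or die "$outpath: $!";
  binmode $outfh;

  # A reference to an integer in $/ selects fixed-size records
  local $/ = \$recsize;

  for my $path (@inputs) {
    open my $infh, '<', $path or die "$path: $!";
    binmode $infh;
    while (my $chunk = <$infh>) {
      print $outfh $chunk or die "$outpath: $!";
    }
    close $infh;
  }

  close $outfh or die "$outpath: $!";
  return;
}
```

With the variables from the code above it would be called as
merge_files("$outdir$outfile", 65536, map "$indir$_", @logfile);
then time a few different sizes to see whether any of them helps.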




