Rob Das wrote:
> Hi Rob:
>
> I'm trying to merge a whole bunch of files (possibly tens of thousands) into
> one file. Here's my code (with the error checking removed for readability):
>
> opendir(INDIR, $indir);
> @logfile=grep(/$mask/i, readdir INDIR);
> closedir(INDIR);
> [EMAIL PROTECTED]; # number of files matching mask
> open(OUTFILE, ">$outdir$outfile");
> for ( $ctr=0; $ctr<$nbrfiles; $ctr++ ) {
>     open(INFILE, "$indir$logfile[$ctr]");
>     print OUTFILE <INFILE>;
>     close(INFILE);
> }
> close(OUTFILE);
>
> Then I'm writing a file from the @logfile which then gets processed to
> delete the files. It's done this way for restartability, so if I fail after
> creating the merged file, I can restart and know which files need deleting.
>
> I'd appreciate an alternative to reading the entire file - you're right, I
> don't need the whole thing in memory at the same time. However, wouldn't
> processing the file one record at a time be much slower? I'll go that route
> if I have to...

Ultimately the file has to be processed one record at a time. The
difference is whether you read all the records first, pulling the whole
file into memory, or read just one record at a time and write it straight
out. The number of reads and writes is the same either way, so you're
better off doing it one record at a time, if only to stop the system
dipping into the swapfile to accommodate all your data.

What will improve the speed is increasing the amount read in one go, which
reduces the number of reads and writes. That's where your input record
size comes in.

I'd write it like this. The files are closed implicitly when the
filehandles go out of scope. Try it this way and then, if you want to,
change the record size and see if it produces a useful speed increase.

HTH,

Rob

    use strict;   # Always
    use warnings; # Usually

    my ($indir, $mask);      # Need
    my ($outdir, $outfile);  # initialising

    my @logfile = do {
      opendir my $dirfh, $indir or die $!;
      grep /$mask/i, readdir $dirfh;
    };

    {
      # readdir returns bare names, so prepend the directory
      local @ARGV = map { "$indir$_" } @logfile;
      open my $outfh, '>', "$outdir$outfile" or die $!;
      while (<>) {
        # a lexical filehandle needs an explicit $_ here
        print $outfh $_ or die $!;
      }
    }
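The reply suggests changing the input record size but doesn't show how. As a minimal sketch, assuming fixed-size reads are acceptable for a straight byte-for-byte merge: setting `$/` to a reference to an integer makes the readline operator return records of at most that many bytes instead of lines. The helper name `merge_files` and its argument order are hypothetical, made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: write each input file, in order, to $out_path,
# reading at most $record_size bytes at a time instead of line by line.
sub merge_files {
    my ($out_path, $record_size, @in_paths) = @_;

    open my $outfh, '>', $out_path or die "Cannot write $out_path: $!";
    binmode $outfh;

    # A reference to an integer makes <$fh> return fixed-size records.
    local $/ = \$record_size;

    for my $in_path (@in_paths) {
        open my $infh, '<', $in_path or die "Cannot read $in_path: $!";
        binmode $infh;
        while (my $chunk = <$infh>) {
            print {$outfh} $chunk or die "Write to $out_path failed: $!";
        }
        close $infh;
    }

    close $outfh or die "Cannot close $out_path: $!";
    return;
}
```

With the variables from the thread, a call might look like `merge_files("$outdir$outfile", 65536, map { "$indir$_" } @logfile);`. The 65536 is only an arbitrary starting point; whether a larger or smaller value helps is something to measure on your own data.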