On Wednesday 15 December 2004 15:34, Benjamin Jeeves wrote:
> Hi all

Hi Benjamin,

>
> I'm writing a program in Perl to md5sum about 500,000 files. These files
> are text files of varying sizes, the biggest being about 500KB.
> My code is below:

[..code snipped..]
>
> The thing is that it is taking about 3 to 4 hours to complete. Would that
> be about right for this number of files?

It's hard to estimate how long it should take. I regularly md5sum about
25,000 files, and that takes something like 2 minutes. You have 20 times as
many files, so if the time scales linearly it should take around 40 minutes.

> Is there any way to speed things up? If so, any example would be good, or a
> pointer in the right direction. If I do not md5sum the files, it prints to
> the screen in about 2 mins.

Off the top of my head, I don't know what could be improved in your code.
Maybe it would help not to instantiate a new MD5 object on each iteration of
your for loop?
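
A minimal sketch of that idea, assuming the files to sum are passed as a plain list of paths. The helper name md5_for_files is made up for illustration and is not taken from your code. It relies on the documented Digest::MD5 behaviour that hexdigest() resets the object, so one object can serve every file. Whether this accounts for much of the 3 to 4 hours is untested.

```perl
use strict;
use warnings;
use Digest::MD5;

# Hypothetical helper (not from the original post): returns a hash ref
# mapping each path to its MD5 hex digest, reusing one Digest::MD5 object.
sub md5_for_files {
    my @paths = @_;
    my $md5 = Digest::MD5->new;    # created once, outside the loop
    my %sum_for;
    for my $path (@paths) {
        open(my $fh, '<', $path) or die "cannot open file $path : $!";
        binmode($fh);
        $md5->addfile($fh);
        # hexdigest() returns the digest and resets the object,
        # so the same object is ready for the next file
        $sum_for{$path} = $md5->hexdigest;
        close($fh);
    }
    return \%sum_for;
}

# usage: print "file;md5" lines for the files named on the command line
my $sums = md5_for_files(@ARGV);
print "$_;$sums->{$_}\n" for sort keys %$sums;
```

Run it as, for example, perl md5list.pl file1.txt file2.txt (the script name is arbitrary).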

I'm including the code I'm using below. You may want to run a test to see
how long it needs to scan your files.

HTH,

Philipp

# =========================================================================
# SCAN_DIRECTORY
# =========================================================================
# scan all files of a directory and write the result (filenames + MD5)
# into a specified file
# param $directory_name that should be scanned
# param $filename into which to write
# param (optional) $regexp that should be applied to filter out files

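use strict;
use warnings;
use Cwd;            # getcwd
use Digest::MD5;
use Fcntl;          # O_CREAT, O_RDWR
use File::Find;     # find, $File::Find::name
use FileHandle;

my $digest;
my $out_file;
my $directory;
my $filter;
my $base_file;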

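# callback procedure, invoked by File::Find for every directory entry
sub process {
        # skip anything that is not a plain file
        return if (! -f $File::Find::name);

        # skip files whose path matches the optional filter
        if ($filter) {
                return if ($File::Find::name =~ /$filter/);
        }

        # TODO jar support
        #if ($File::Find::name =~ /\.jar$/) {
        #       $base_file = $File::Find::name;
        #}

        # lexical handle and three-argument open, read-only
        open (my $fh, '<', $File::Find::name)
                or die "cannot open file $File::Find::name : $!";
        binmode($fh);
        # hexdigest below resets $digest, so one object serves all files
        $digest -> addfile($fh);
        # path relative to the scanned directory
        my $file_name = substr($File::Find::name, length($directory)+1);
        print $out_file $file_name . ";" . $digest -> hexdigest . "\n";
        close ($fh);
}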

sub scanDirectory {
        # one MD5 object per scan, shared by all calls to process
        $digest = Digest::MD5 -> new;
        ($directory, my $scan_file, $filter) = @_;

        my $base_dir = getcwd();
        chdir($directory) or die "could not chdir to $directory : $!";

        # O_TRUNC discards stale content if $scan_file already exists
        $out_file = new FileHandle;
        sysopen($out_file, $scan_file, O_CREAT | O_RDWR | O_TRUNC)
                or die "could not open file $scan_file : $!";

        find ( \&process, $directory);

        close($out_file) or die "could not close file $scan_file : $!";

        chdir($base_dir);
}

sub testScanDirectory {
        scanDirectory('d:/temp/tools', 'd:/tools.txt');
        
        scanDirectory('d:/temp/tools', 'd:/tools_filtered.txt', '\.exe$');
}
