Igor Dovgiy wrote:
Great work, Jonathan!
Notice how simple your script has become - and that's a good sign as well
in Perl. :) We can make it even simpler, however.

As you probably know, Perl has two fundamental types of collections: arrays
(where data is stored as a sequence of elements, data chunks) and hashes
(where data chunks are unordered, but stored with some unique key used to
retrieve it). Sometimes hashes are used just to sort out (non-)unique data,
but that's another story.

Now look at this line:
       push @{$files{$filesize}}, $File::Find::name;

Don't you see something... weird? You're using hash where filesizes are the
keys - and because, yes, they may well be non-unique, you have to store
arrays of filenames in your hash instead...

But much more natural (at least, for me) is to organize your hash (let's
call it %filedata) so that filenames (which are unique by their nature)
become the keys. And some info about these files - sizes and md5-hashes -
become the values.

Yes, file names in a given directory _have_ to be unique, however...


For example, our `wanted` (btw, its name is misleading a bit, no? may be
'process' will sound better?) sub may look as follows:

find(\&wanted, $path);

my %filedata;
sub wanted {
   return if substr($_, 0, 1) eq '.' || -d $_;
   my $filesize = -s _;
   open my $fh, '<', $_ or die $!, $/;
   my $filemd5 = Digest::MD5->new->addfile($fh)->hexdigest;
   close $fh;
   $filedata{$_} = [$filesize, $filemd5];

You are traversing a directory tree, so using $_ as the key may cause collisions across different directories. Better to use $File::Find::name which contains the full absolute path name.



John
--
Any intelligent fool can make things bigger and
more complex... It takes a touch of genius -
and a lot of courage to move in the opposite
direction.                   -- Albert Einstein

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/


Reply via email to