On Sat, Dec 31, 2011 at 4:29 AM, John W. Krahn <jwkr...@shaw.ca> wrote:
> Igor Dovgiy wrote:
>
>> Great work, Jonathan!
>> Notice how simple your script has become - and that's a good sign as well
>> in Perl. :) We can make it even simpler, however.
>>
>> As you probably know, Perl has two fundamental types of collections:
>> arrays (where data is stored as a sequence of elements, data chunks) and
>> hashes (where data chunks are unordered, but stored with some unique key
>> used to retrieve it). Sometimes hashes are used just to sort out
>> (non-)unique data, but that's another story.
>>
>> Now look at this line:
>>
>>> push @{$files{$filesize}}, $File::Find::name;
>>
>> Don't you see something... weird? You're using hash where filesizes are
>> the keys - and because, yes, they may well be non-unique, you have to
>> store arrays of filenames in your hash instead...
>>
>> But much more natural (at least, for me) is to organize your hash (let's
>> call it %filedata) so that filenames (which are unique by their nature)
>> become the keys. And some info about these files - sizes and md5-hashes -
>> become the values.
>
> Yes, file names in a given directory _have_ to be unique, however...
>
>> For example, our `wanted` (btw, its name is misleading a bit, no? may be
>> 'process' will sound better?) sub may look as follows:
>>
>> find(\&wanted, $path);
>>
>> my %filedata;
>> sub wanted {
>>     return if substr($_, 0, 1) eq '.' || -d $_;
>>     my $filesize = -s _;
>>     open my $fh, '<', $_ or die $!, $/;
>>     my $filemd5 = Digest::MD5->new->addfile($fh)->hexdigest;
>>     close $fh;
>>     $filedata{$_} = [$filesize, $filemd5];
>
> You are traversing a directory tree, so using $_ as the key may cause
> collisions across different directories. Better to use $File::Find::name
> which contains the full absolute path name.
>
> John
> --
> Any intelligent fool can make things bigger and
> more complex... It takes a touch of genius -
> and a lot of courage to move in the opposite
> direction.
> -- Albert Einstein

Hi to all on the list still following this thread - and Happy New Year!

Igor... Thanks!! :)

It does feel like some really good Perl learning progress is being made
here, and yep, I cannot believe how trimmed down the script has now
become. Looking back on the original script makes me laugh! I wonder if
that will become a consistent theme when writing?!

Looking back to the hash - I agree that it makes far more sense to have
the filenames as the keys.

Quoting yourself and John:

>> filenames (which are unique by their nature)
>> Yes, file names in a given directory _have_ to be unique.....

I think that we can all agree, then, that these entries should be
guaranteed to have unique keys and can have non-unique data such as file
size attributed to them - therefore:

    push @{$files{$filename}}, $File::Find::name;

When sorting the hash, there seems to be well-established code for this,
e.g. sorting by file size:

    foreach (sort { $filedata{$b}[0] <=> $filedata{$a}[0] } keys %filedata) {
        ## should sort so that the highest value file size is first
        ...
    }

As far as I'm aware, <=> and cmp are the same thing. Is there a question
of precedence over them? I assume that <=> has a higher precedence.

Interestingly, this is now the second time in this thread that we have
been warned against using $_

From John:
>> $_ as the key may cause collisions across different directories
From Shlomi:
>> The $_ variable can be easily devastated. You should use a lexical one.

I believe that I understand the use of $_ as the default variable;
indeed, the documentation on CPAN about File::Find states the usage of $_
in the module. However, it seems that it is a variable that is so easily
destroyed, and many have warned against using it. If this is the case,
why would we choose (or be required) to use it in the first place?
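To make the sorting and key questions above concrete, here is a minimal sketch. It assumes the record layout from Igor's example (each value is [size, md5]) and keys the hash by the full path, as John suggests. The sample paths, sizes and digests are made up purely for illustration; nothing here walks a real directory.

```perl
use strict;
use warnings;

# Hypothetical sample data in the layout from Igor's example:
# full path => [ size in bytes, md5 hex digest ] (all values made up).
my %filedata = (
    '/tmp/a/notes.txt' => [ 9,   'aaa' ],
    '/tmp/b/notes.txt' => [ 100, 'bbb' ],   # same base name, other directory
    '/tmp/b/big.iso'   => [ 20,  'ccc' ],
);

# <=> compares its operands as numbers, cmp compares them as strings,
# so the two can order the same list differently.
my @numeric = sort { $b <=> $a } (9, 100, 20);   # 100, 20, 9
my @string  = sort { $b cmp $a } (9, 100, 20);   # 9, 20, 100

# Sort the paths by size, largest first. The size is element [0] of
# each record, so that element is compared, not the array reference.
my @by_size = sort { $filedata{$b}[0] <=> $filedata{$a}[0] } keys %filedata;

# A lexical loop variable is used here instead of the global $_.
for my $path (@by_size) {
    printf "%6d  %s\n", $filedata{$path}[0], $path;
}
```

Note that the two notes.txt entries only coexist because the keys are full paths; keyed by base name alone, one would overwrite the other. On precedence: as far as I can tell from perlop, <=> and cmp sit at the same precedence level, so the difference between them is numeric versus string comparison rather than binding strength.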
I have read the 'Elements to Avoid' page, as recommended by Shlomi:
http://perl-begin.org/tutorials/bad-elements/
which is very useful.

Would it be correct to say that $_ should be re-assigned asap whenever
using Perl? I couldn't find any exceptions that state that it is OK to
use it.

#####

Sincere thanks again to you all for your contributions.
I hope that others reading this list are learning as much as I am!

All the best

Jonathan