I do this to have pretty high confidence of defeating even a deliberate attempt to "pass" a file off as the right file (i.e., the purposeful exploitation of a digest vulnerability). The odds of being able to exploit both MD5 and one of the SHA digest algorithms at the same time AND keep the same file length are vanishingly small.
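As a rough sketch of what that combined key looks like in Perl (using the standard Digest::MD5 and Digest::SHA core modules; the file_key name is just illustrative):

    use strict;
    use warnings;
    use Digest::MD5;
    use Digest::SHA;

    # Composite uniqueness key: MD5 digest . SHA-1 digest . file length
    sub file_key {
        my ($path) = @_;
        open my $fh, '<:raw', $path or die "Can't open $path: $!";

        my $md5 = Digest::MD5->new->addfile($fh)->hexdigest;
        seek $fh, 0, 0;    # rewind so SHA-1 sees the whole file too
        my $sha1 = Digest::SHA->new(1)->addfile($fh)->hexdigest;
        close $fh;

        return $md5 . $sha1 . (-s $path);
    }

An attacker would have to produce a second file that collides on both digests AND has the same byte length before it would compare as "the same" under this key.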
--
Mike Arms

-----Original Message-----
From: Jan Dubois [mailto:[email protected]]
Sent: Thursday, September 23, 2010 2:17 PM
To: Arms, Mike; 'Ken Cornetet'; 'Francisco Zarabozo'; 'Active State Perl Mailing List'
Subject: RE: Best way to compare to files in Perl

On Thu, 23 Sep 2010, Arms, Mike wrote:
>
> I want to second this recommendation. I wrote a script that
> recursively descends and writes out the MD5, SHA1, file length, and
> file path. Using those first three parameters *in combination* is darn
> close to 100% for determining file uniqueness. I have never come
> across two files that differ but still have the same
>
> $MD5 . $SHA1 . $LENGTH
>
> (had to throw in some Perl :-)

I do wonder why you needed to combine all three. Having a collision of
the MD5 by itself is extremely unlikely unless someone intentionally
tried to construct a file that has the same MD5 as another one (this is
an MD5 vulnerability, and you should switch to one of the SHA algorithms
if you have to worry about it). But for random files it would be highly
unlikely; statistically it would take you on average 100 years to find a
collision if you checked several billion files per second continuously.

Concatenating multiple digests will just make your database searches
slower because the index fields are longer, without providing you much
actual benefit.

So I would be really surprised if you had two different files with the
same MD5 on your disk. If you did, how many files did you have in total?

Cheers,
-Jan
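For what it's worth, Jan's 100-year figure is consistent with the birthday bound: MD5 produces 128-bit digests, so a random collision is expected after roughly 2**64 hashes. A back-of-the-envelope check in Perl (the 5e9 files/second rate is an assumed stand-in for "several billion"):

    use strict;
    use warnings;

    # Birthday bound: a random collision among n-bit digests is
    # expected after about 2**(n/2) samples; for MD5 that is 2**64.
    my $hashes  = 2 ** 64;
    my $rate    = 5e9;                      # assumed: 5 billion files/sec
    my $seconds = $hashes / $rate;
    my $years   = $seconds / (365.25 * 24 * 3600);

    printf "about %.0f years of continuous hashing\n", $years;  # ~117

which lands right around the century Jan describes.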
