I do this to have pretty high confidence of catching even a deliberate
attempt to "pass off" a file as the right one (i.e., the purposeful
exploitation of a digest vulnerability). The odds of being able to exploit
both MD5 and one of the SHA digest algorithms at the same time AND keep the
same file length are vanishingly small.
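
In case it helps anyone, here is a minimal sketch of the combined
fingerprint I described (fingerprint() is just a name I picked for this
example); it reads each file once and feeds the same buffer to both
digest objects:

    use Digest::MD5;
    use Digest::SHA;

    # Combined MD5 + SHA-1 + length fingerprint for one file,
    # reading the file only once.
    sub fingerprint {
        my ($path) = @_;
        open my $fh, '<:raw', $path or die "Can't open $path: $!";
        my $md5  = Digest::MD5->new;
        my $sha1 = Digest::SHA->new(1);    # 1 selects SHA-1
        while (read $fh, my $buf, 65536) {
            $md5->add($buf);
            $sha1->add($buf);
        }
        close $fh;
        return $md5->hexdigest . $sha1->hexdigest . (-s $path);
    }

Two files are then treated as identical when their fingerprint()
strings match.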

--
Mike Arms



-----Original Message-----
From: Jan Dubois [mailto:[email protected]] 
Sent: Thursday, September 23, 2010 2:17 PM
To: Arms, Mike; 'Ken Cornetet'; 'Francisco Zarabozo'; 'ActiveState Perl 
Mailing List'
Subject: RE: Best way to compare two files in Perl

On Thu, 23 Sep 2010, Arms, Mike wrote:
> 
> I want to second this recommendation. I wrote a script that
> recursively descends and writes out the MD5, SHA1, file length, and
> file path. Using those first three parameters *in combination* is darn
> close to 100% for determining file uniqueness. I have never come
> across two files that differ but still have the same
> 
>       $MD5 . $SHA1 . $LENGTH
> 
> (had to throw in some Perl :-)

I do wonder why you needed to combine all three.  A collision of the
MD5 by itself is extremely unlikely unless someone has intentionally
constructed a file with the same MD5 as another one (this is a known
MD5 vulnerability, and you should switch to one of the SHA algorithms
if you have to worry about it).

But for random files a collision is highly unlikely: statistically it
would take on the order of 100 years to find one even if you checked
several billion files per second continuously.
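
For the curious, the back-of-the-envelope math behind that figure
(assuming a uniformly distributed 128-bit digest, the birthday bound
says a random collision is expected after roughly 2**64 hashes; the
4-billion-per-second rate is just an illustrative choice):

    perl -e 'print 2**64 / 4e9 / (365.25 * 24 * 3600), " years\n"'
    # prints ~146 years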

Concatenating multiple digests will just make your database searches
slower because the index fields are longer, without providing you much
actual benefit.

So I would be really surprised if you had two different files with the
same MD5 on your disk.  If you did, how many files did you have in total?

Cheers,
-Jan


