Palit, Nilanjan wrote:
I read that the original MD5 algorithm has known issues with collisions.
Do I need to test for collisions myself...or is it pretty well tested (or proved?) to stand up to an intensive application?

A bit of googling turned up this:

http://www.cs.bham.ac.uk/~mdr/teaching/modules04/security/lectures/hash.html

SHA-1 is a direct competitor with MD5, since both of them were developed
at the same time in an effort to strengthen MD4. SHA-1 produces 160
bits; thus its "collision difficulty" is much higher, 2^80 = 10^24
compared to 2^64 = 10^19 for MD5. So it is 10000 times as secure against
brute force attacks. ... SHA-1 is slower, because it has to do more
rounds and process more data. I guess it's between 2 and 10 times slower.

And: http://www.freedom-to-tinker.com/archives/000661.html

A fundamental property of CH functions is the probability of a hash
collision given two different inputs. This is determined by considering
the Birthday Paradox, and approaches 2^(n/2), where n = bits in hash, as
the hash function gets better. For SHA-1, which is 160 bits, this yields
a best-case collision probability of 1 in 2^80. You would *expect* to
find a collision, given enough pairs of inputs, and so would quite
normally chose a hash length based on the expected population of objects
you want to digest in any domain of potential collision.

Posted by: John Wainwright at August 17, 2004 11:50 AM

This is part of a thread discussing a rumored break of the SHA-1 algorithm. Apparently the discussion got muddled, and the author of this comment was clarifying that the fact that collisions exist isn't significant, and in fact is an expected and known property of the algorithm.


What new, as John Saylor pointed out in his post, is that people are figuring out ways to generate data that when hashed, will match an existing digest.

This weakens the algorithm from a cryptography perspective, but doesn't have a bearing on your application.

So it looks like MD5 gives you a 1 in 100000000000000000000 chance of collision. Given you said you'd have < 100,000 keys, this should be adequate. If you don't think so, try SHA-1, as John Saylor also pointed out in his post.


Any experiences with how well Digest::MD5 does when used with many
millions of keys?

I believe Digest::MD5 is a direct implementation of RSA's publicly released code, so it'll behave the same as most other MD5 implementation out there. There's a pure Perl implementation also available, but I'd expect it too to perform identically, just slower.


 -Tom
_______________________________________________
Boston-pm mailing list
Boston-pm@mail.pm.org
http://mail.pm.org/mailman/listinfo/boston-pm

Reply via email to