On Apr 16, 3:16 am, Nigel Rantor <wig...@wiggly.org> wrote:
> Adam Olsen wrote:
> > On Apr 15, 12:56 pm, Nigel Rantor <wig...@wiggly.org> wrote:
> >> Adam Olsen wrote:
> >>> The chance of *accidentally* producing a collision, although
> >>> technically possible, is so extraordinarily rare that it's completely
> >>> overshadowed by the risk of a hardware or software failure producing
> >>> an incorrect result.
> >> Not when you're using them to compare lots of files.
> >
> >> Trust me. Been there, done that, got the t-shirt.
> >
> >> Using hash functions to tell whether or not files are identical is an
> >> error waiting to happen.
> >
> >> But please, do so if it makes you feel happy, you'll just eventually get
> >> an incorrect result and not know it.
> >
> > Please tell us what hash you used and provide the two files that
> > collided.
>
> MD5
>
> > If your hash is 256 bits, then you need around 2**128 files to produce
> > a collision. This is known as a Birthday Attack. I seriously doubt
> > you had that many files, which suggests something else went wrong.
>
> Okay, before I tell you about the empirical, real-world evidence I have
> could you please accept that hashes collide and that no matter how many
> samples you use the probability of finding two files that do collide is
> small but not zero.
>
> Which is the only thing I've been saying.
>
> Yes, it's unlikely. Yes, it's possible. Yes, it happens in practice.
>
> If you are of the opinion though that a hash function can be used to
> tell you whether or not two files are identical then you are wrong. It
> really is that simple.
>
> I'm not sitting here discussing this for my health, I'm just trying to
> give the OP the benefit of my experience, I have worked with other
> people who insisted on this route and had to find out the hard way that
> it was a Bad Idea (tm). They just wouldn't be told.
>
> Regards,
>
> Nige

And yes, he is right: CRCs and hashes all have some probability of reporting that two files are identical when in fact they are not.
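
For what it's worth, here is a minimal sketch of one way to reconcile the two positions in this thread: use the hash only as a cheap filter, and confirm any match with a byte-by-byte comparison. It is not anything either poster wrote. The names file_digest, files_identical and collision_probability are made up for illustration, SHA-256 is an arbitrary choice, and collision_probability uses the standard birthday approximation, which assumes hash outputs are uniformly distributed.

```python
import filecmp
import hashlib
import math


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading in chunks to bound memory use."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def files_identical(path_a, path_b):
    """Differing digests prove the files differ; equal digests are only a hint,
    so a match is confirmed by comparing the actual contents."""
    if file_digest(path_a) != file_digest(path_b):
        return False
    # shallow=False forces filecmp to compare file contents, not just os.stat() data.
    return filecmp.cmp(path_a, path_b, shallow=False)


def collision_probability(n_files, hash_bits):
    """Birthday approximation (assumes an ideal, uniform hash):
    p ~= 1 - exp(-n * (n - 1) / 2**(hash_bits + 1))."""
    exponent = -n_files * (n_files - 1) / 2.0 ** (hash_bits + 1)
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(exponent)
```

Under that approximation, collision_probability(10**9, 128) gives the estimated chance of at least one accidental collision among a billion files with a 128-bit hash. The result is small but never exactly zero for two or more files, which is why files_identical does not trust a digest match on its own.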